Today's applications process vast amounts of textual data. Full-Text Search (FTS) is a great mechanism for fast, efficient access to character-type data elements. c-tree applications with large volumes of text can now be complemented with high-performing text search capabilities.
Similar to a traditional b-tree index over a c-tree data file, you may now define a Full-Text Index (FTI) by specifying which character-type fields to include in this search index. An additional set of FTI files will be maintained on disk.
This support is on a file-by-file basis (the same as a typical c-tree index).
Once an FTI is defined for a file, it is maintained in “real-time” along with any other b-tree type c-tree indexes, including deferred index support.
Accessing data records through a full-text search index is simple. Using new API search functions, you provide a word or phrase (text) for which you are searching. All records whose FTI-indexed fields contain this text are returned. It is left up to the developer to utilize this result as needed for the application.

The first step in using Full-Text Search is creating an FTS Index (FTI). Functions are available through several different APIs for creating the index, adding a field to the index, setting the default index, and managing the FTI handle. Functions also allow options to be set and other parameters to be controlled.
The FTS indexes are maintained as records are added to the table. Internally, a tokenizer divides the text into "tokens" (which are roughly equivalent to a list of categorized words).
When a full-text search is performed, the function is passed one or more words. Each word is compared to the tokens in the FTS Index. The function returns the records that contain those words in their indexed fields.
Additional information can be returned, such as the proximity of multiple tokens. For example, when searching for "FairCom" and "database", they are in closer proximity in the text "FairCom database" than in "FairCom announces new enhancements to its highly acclaimed database." Statistics about relevance can also be returned.
Full-Text Search is presently available in the following APIs:
- FairCom DB API C
- FairCom DB API C++
- FairCom DB API Java
- FairCom DB API .NET
Full-Text Search is supported in client/server only.
Functions are available to create a full-text index on the specified fields of a single table, even an existing table. A Full-Text Index can be defined on any text field (as specified in the data file's schema).
As an example, to add full-text search support to a table at the FairCom DB API C level, call ctdbAddFTI. This function returns a CTHANDLE to be used with FTS functions to add a field to the list of fields that will be indexed by the FTS Index.
A Full-Text Index can be added to an existing table by calling ctdbAddFTI, setting the various properties (including whether the index is created immediately, or as a deferred index, and then calling ctdbAlterTable.
Quick Start Guide to FTS
This chapter will get you up and running with FairCom Full-Text Search (FTS) in a hurry.
Currently, FairCom provides support at the navigational API level for interfacing with FTS. Support is presently being added for the relational APIs.
Navigational Full-Text Search Using FairCom DB API
Adding a Full-Text Index (FTI) to a New Table
To programmatically add FTS support to an empty data file is very straight forward:
- Add a new FTS index using the ctdbAddFTI() function call. This defines the index.
- Add a field to the new index file using the ctdbAddFTIFieldByName() function call. This tells FairCom DB which field to fully index. At this time only a single field can be part of an FTI. Multiple FTI can be defined over the same table.
- To finish creating the new table, call ctdbCreateTable() to complete the file creation, including the new FTI.
See ctdbCreateTable.
That's it! You are now ready to start adding data.
Here is an example:
CTHANDLE dTable;
pFTI = ctdbAddFTI(dTable, "myFTSidx");
if (!pFTI)
Handle_Error(hSession, "ctdbAddFTI");
if (ctdbAddFTIFieldByName(pFTI, 0, "body", CTDB_FTI_MODE_REG))
Handle_Error(hSession, "ctdbAddFTIFieldByName");
Adding an FTI to an Existing File
If you are adding an FTI over an existing FairCom DB data file that is already populated, then once you have performed the first 2 steps above, simply call ctdbAlterTable(dTable, CTDB_ALTER_INDEX).
Performing an FTS Query
The following code snippet demonstrates a query using the FairCom DB API interface:
[...]
if (ctdbFTSearchOn(hRecord, string, 0))
Handle_Error(hSession, "ctdbFTSearchOn()");
rv = ctdbFirstRecord(hRecord);
while (rv == NO_ERROR)
{
if (ctdbGetFieldAsString(hRecord, 0, string, 65536) != CTDBRET_OK)
Handle_Error(hSession, "ctdbGetFieldAsString()");
printf("%s\n",string);
rv = ctdbNextRecord(hRecord);
}
if (ctdbFTSearchOff(hRecord))
Handle_Error(hSession, "ctdbFTSearchOff()");
[...]
Full-Text Query Syntax
Optional Quotation Marks
The use of single quotation marks to delimit words in an FTS query string is optional. Words in a query string may be delimited in either of two ways:
- They may be surrounded by single quotation marks.
or - They may be delimited by a space character (0x32).
A phrase must be surrounded by single quotation marks.
Operators
The following terms are treated as operators when they are not surrounded by single quotes:
and
near
near/x (where x is a numeric value, as explained later under NEAR Queries)
or (treated as an operator but returns an error because we do not currently support this operator)
Examples
The following are examples of valid queries:
To search for documents containing the three words "can", "be", and "found":
"can be found"To search for documents containing the word "can" near the word "found":
"can NEAR found"
Very First Token Search
The caret (^) character indicates to search from the beginning of the column. Only one caret is allowed in an FTS query.
SQL:
All words from the “document” column from the “docs” table for which "linux" is the first token:
SELECT * FROM docs WHERE document MATCH '^linux';FairCom DB API (pseudo code):
ftSearchOn("document:^linux");
Phrase Queries
A phrase query retrieves all documents that contain a nominated set of terms or term prefixes in a specified order with no intervening tokens. Phrase queries are specified by enclosing a space-separated sequence of terms or term prefixes in double quotes ("). For example:
The following example is a query for all documents that contain the phrase "linux applications":
SELECT * FROM docs WHERE document MATCH '"linux applications"';FairCom DB API (pseudo code):
ftSearchOn("document:'linux applications'");
Token NOT present Queries
It is possible to exclude from filter results, rows that contain a keyword by using a “minus” before the keyword.
For example: find all documents which contain the keyword “linux” but exclude those that also include the keyword “applications”
SELECT * FROM docs WHERE document MATCH '"linux” -“applications"';Multiple keywords for exclusion can also be used. For example, find rows where the “document” column contains the “linux” keyword but not “applications” or “database”
SELECT * FROM docs WHERE document MATCH '"linux” -“applications" -”database”';
NEAR Queries
A NEAR query returns documents that contain two or more nominated tokens within a specified proximity of each other. NEAR is a binary operator: it accepts two and only two arguments in the form: A NEAR B.
A NEAR query is specified by putting the keyword NEAR between two phrase, term, or prefix queries. By default, NEAR searches for 10 or fewer intervening tokens between the two arguments. To specify a proximity other than the default, an operator of the form NEAR/<N> may be used, where <N> is the maximum number of intervening terms allowed.
SQL:
Use the following query to search for a document that contains the tokens "faircom" and "database" with not more than 6 intervening tokens. This matches the only document in table docs. Note that the order in which the tokens appear in the document does not have to be the same as the order in which they appear in the query.
SELECT * FROM docs WHERE document MATCH 'database NEAR/6 faircom';FairCom DB API (pseudo code):
ftSearchOn("document:database NEAR/6 faircom");
Multiple NEAR Operators
More than one NEAR operator may appear in a single query. In this case each pair of terms or phrases separated by a NEAR operator must appear within the specified proximity of each other in the document. Using the same table and data as in the block of examples above:
The following query selects documents that contain an instance of the term "FairCom" separated by two or fewer terms from an instance of the term "acid", which is separated by two or fewer terms from an instance of the term "relational".
SELECT * FROM docs WHERE document MATCH 'faircom NEAR/2 acid NEAR/2 relational';This option will require one index lookup for every term in the phrase, saving the doclist and then an intersection of those search results based on the specified distance.
FairCom DB API (pseudo code):
ftSearchOn("document:faircom NEAR/2 acid NEAR/2 relational");
Precedence
NEAR takes precedence over AND. For example:
A AND B NEAR Cinstructs FTS to search for records that contain token A and that also contain token B in close proximity ("NEAR") to token C.
Full-Text Search Query on Hyphenated Words
The FairCom full-text search supports querying on hyphenated words. Hyphenated words are converted to phrases. For example, searching for the word "self-contained" searches for the phrase 'self contained'.
What is commonly considered a "word" may be ignored either because the tokenizer is written to ignore specific words (e.g., words with fewer than a specified number of characters) or because the sequence of characters is on the stop list of words specifically ignored. In the following example, querying the word "give-and-take" would search for the phrase 'give and take', however the word "and" may be ignored.
If the hyphenated word is in a phrase, it is converted to individual words in that phrase.
Example FTS Queries
The following are some examples of valid queries:
To search for documents containing the three words "can", "be", and "found":
'can be found'To search for documents containing the word "can" near the word "found":
'can' NEAR 'found'In the next examples, "near" is used as a search term rather than an operator, which means that it requires single quotes.
To search for the words "cans", "near", and "trash":
'cans' 'near' 'trash'To search for "near" and "and" near "cans":
'cans' NEAR/2 'near' NEAR/2 'and' NEAR/2 'trash'To search for usage of "NEAR/2" in a document:
'near/2'
To perform a phrase match, the phrase must be enclosed in single quotation marks per FTS query syntax defined at the FairCom DB API level. In SQL, that mandates two single quotes since the entire query is surrounded by a single quote (because it is a literal). Therefore, quotes need to be escaped in the standard way (that is double them).
(This implies the entire query is a phrase match since the final query is made by 3 single quotes, the phrase and 3 single quotes.)
SELECT * FROM docs WHERE document MATCH '(''faircom'' AND ''database'')'For convenience, the FairCom DB SQL syntax now allows wrapping phrases within the FTS query literal with double quotes (that do not need to be escaped).
SELECT * FROM docs WHERE document MATCH '(“faircom” AND “database”)'Both the old and the new methods (double quote or double single quotes) of specifying phrase search can be used for SQL FTS queries.
FTS: Quotes, Operators, Examples
Optional Quotation Marks
The use of single quotation marks to delimit words in an FTS query string is optional. Words in a query string may be delimited in either of two ways:
- They may be surrounded by single quotation marks.
or - They may be delimited by a space character (0x32).
A phrase must be surrounded by single quotation marks.
Operators
The following terms are treated as operators when they are not surrounded by single quotes:
and
near
near/x (where x is a numeric value, as explained later under NEAR Queries)
or (treated as an operator but returns an error because we do not currently support this operator)
Examples
The following are examples of valid queries:
To search for documents containing the three words "can", "be", and "found":
"can be found"To search for documents containing the word "can" near the word "found":
"can NEAR found"
FTS: Query Syntax
Very First Token Search
The caret (^) character indicates to search from the beginning of the column. Only one caret is allowed in an FTS query.
SQL:
All words from the “document” column from the “docs” table for which "linux" is the first token:
SELECT * FROM docs WHERE document MATCH '^linux';FairCom DB API (pseudo code):
ftSearchOn("document:^linux");
Phrase Queries
A phrase query retrieves all documents that contain a nominated set of terms or term prefixes in a specified order with no intervening tokens. Phrase queries are specified by enclosing a space-separated sequence of terms or term prefixes in double quotes ("). For example:
The following example is a query for all documents that contain the phrase "linux applications":
SELECT * FROM docs WHERE document MATCH '"linux applications"';FairCom DB API (pseudo code):
ftSearchOn("document:'linux applications'");
Token NOT present Queries
It is possible to exclude from filter results, rows that contain a keyword by using a “minus” before the keyword.
For example: find all documents which contain the keyword “linux” but exclude those that also include the keyword “applications”
SELECT * FROM docs WHERE document MATCH '"linux” -“applications"';Multiple keywords for exclusion can also be used. For example, find rows where the “document” column contains the “linux” keyword but not “applications” or “database”
SELECT * FROM docs WHERE document MATCH '"linux” -“applications" -”database”';
NEAR Queries
A NEAR query returns documents that contain two or more nominated tokens within a specified proximity of each other. NEAR is a binary operator: it accepts two and only two arguments in the form: A NEAR B.
A NEAR query is specified by putting the keyword NEAR between two phrase, term, or prefix queries. By default, NEAR searches for 10 or fewer intervening tokens between the two arguments. To specify a proximity other than the default, an operator of the form NEAR/<N> may be used, where <N> is the maximum number of intervening terms allowed.
SQL:
Use the following query to search for a document that contains the tokens "faircom" and "database" with not more than 6 intervening tokens. This matches the only document in table docs. Note that the order in which the tokens appear in the document does not have to be the same as the order in which they appear in the query.
SELECT * FROM docs WHERE document MATCH 'database NEAR/6 faircom';FairCom DB API (pseudo code):
ftSearchOn("document:database NEAR/6 faircom");
Multiple NEAR Operators
More than one NEAR operator may appear in a single query. In this case each pair of terms or phrases separated by a NEAR operator must appear within the specified proximity of each other in the document. Using the same table and data as in the block of examples above:
The following query selects documents that contain an instance of the term "FairCom" separated by two or fewer terms from an instance of the term "acid", which is separated by two or fewer terms from an instance of the term "relational".
SELECT * FROM docs WHERE document MATCH 'faircom NEAR/2 acid NEAR/2 relational';This option will require one index lookup for every term in the phrase, saving the doclist and then an intersection of those search results based on the specified distance.
FairCom DB API (pseudo code):
ftSearchOn("document:faircom NEAR/2 acid NEAR/2 relational");
Precedence
NEAR takes precedence over AND. For example:
A AND B NEAR Cinstructs FTS to search for records that contain token A and that also contain token B in close proximity ("NEAR") to token C.
Full-Text Search Query on Hyphenated Words
The FairCom full-text search supports querying on hyphenated words. Hyphenated words are converted to phrases. For example, searching for the word "self-contained" searches for the phrase 'self contained'.
What is commonly considered a "word" may be ignored either because the tokenizer is written to ignore specific words (e.g., words with fewer than a specified number of characters) or because the sequence of characters is on the stop list of words specifically ignored. In the following example, querying the word "give-and-take" would search for the phrase 'give and take', however the word "and" may be ignored.
If the hyphenated word is in a phrase, it is converted to individual words in that phrase.
Example FTS Queries
The following are some examples of valid queries:
To search for documents containing the three words "can", "be", and "found":
'can be found'To search for documents containing the word "can" near the word "found":
'can' NEAR 'found'In the next examples, "near" is used as a search term rather than an operator, which means that it requires single quotes.
To search for the words "cans", "near", and "trash":
'cans' 'near' 'trash'To search for "near" and "and" near "cans":
'cans' NEAR/2 'near' NEAR/2 'and' NEAR/2 'trash'To search for usage of "NEAR/2" in a document:
'near/2'Easier Full-Text Search (FTS) MATCH Operator Syntax
To perform a phrase match, the phrase must be enclosed in single quotation marks per FTS query syntax defined at the FairCom DB API level. In SQL, that mandates two single quotes since the entire query is surrounded by a single quote (because it is a literal). Therefore, quotes need to be escaped in the standard way (that is double them).
(This implies the entire query is a phrase match since the final query is made by 3 single quotes, the phrase and 3 single quotes.)
SELECT * FROM docs WHERE document MATCH '(''faircom'' AND ''database'')'For convenience, the FairCom DB SQL syntax now allows wrapping phrases within the FTS query literal with double quotes (that do not need to be escaped).
SELECT * FROM docs WHERE document MATCH '(“faircom” AND “database”)'Both the old and the new methods (double quote or double single quotes) of specifying phrase search can be used for SQL FTS queries.
SQL search examples
SQL full-text search examples using PORTER English stemming tokenizer
The PORTER stemming tokenizer identifies each word in the search using whitespace and punctuation. Then, regardless of case, the tokenizer finds all matching words by reducing each word in the search to its stem by removing trailing s, er, ing, and ed.
A search string identifies each word using whitespace and/or punctuation and may use double quotes to identify a word or a phrase. Phrases contain two or more words within double quotes and may include whitespace between words. Punctuation, such as .!>, is ignored in the search string when it is not a full-text search operator. Reserved search operators include - " ^ * ( ) NEAR NEAR/n AND OR NOT; However, * ( ) OR NOT are not supported and will return an error when used.
/* =====================================================================
* Full-text search demo using the PORTER tokenizer.
* =====================================================================
*/
/* Create the testsearch table. */
CREATE TABLE testsearch
(
name VARCHAR(30)
, description VARCHAR(500)
)
STORAGE_ATTRIBUTES 'hotAlter; huge; rowid_fld; noRecbyt_idx;'
;
/* Create full-text index on the description field of the testsearch table. Use the PORTER English word stem tokenizer to do case insensitive matching of common word forms */
CREATE FULLTEXT INDEX testsearch_description_porter_fts
ON testsearch( description )
TOKENIZER PORTER
UPDATE_MODE ASYNC
;
/* Insert a record into the testsearch table. */
INSERT INTO testsearch
VALUES('Mike','Americans --love-- their __BASEball__ and <FOOTball>!')
;
/* Look at the record inserted into the testsearch table. */
SELECT description FROM testsearch;
/* =====================================================================
* The following queries find the inserted record
* by searching for words and phrases in the description field:
* 'Americans --love-- their __BASEball__ and <FOOTball>!'.
*
* The PORTER tokenizer reduces the text to
* 'american love their baseball and football'.
*
* The tokenizer identifies each word using whitespace and/or punctuation.
* The tokenizer is case insensitive.
* The tokenizer reduces each word to its stem
* by removing trailing s, er, ing, and ed.
* The tokenizer finds all words matching the stem.
* These words match each other -- even when they are invalid words:
* baseball baseballs baseballing baseballer baseballed
*
* A search string identifies each word using whitespace and/or punctuation.
* A search string may use double quotes to identify a word or a phrase.
* A phrase contains two or more words within double quotes.
* A search string may include any amount of whitespace between words.
* A search string may include
* Punctuation, such as .!>, is ignored in the search string
* when it is not a full-text search operator.
* Reserved search operators include - " ^ * ( ) NEAR NEAR/n AND OR NOT
* The following search operators are unsupported: * ( ) OR NOT
* The server returns an error when you use an unsupported search operator.
* =====================================================================
*/
/* WORD SEARCH: Find records in the testsearch table where the description column contains the word 'baseball' anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'baseball';
/* VERB STEM SEARCH: Find records in the testsearch table where the description column contains common English conjugations of verbs, such as 'loving', 'love', 'loved', and 'loves', anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'loving';
/* NOUN STEM SEARCH: Find records in the testsearch table where the description column contains English singular and plural forms of nouns, such as 'american' and 'americans', anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'american';
/* CASE INSENSITIVE SEARCH: Find records in the testsearch table where the description column contains common forms of the word 'baseBALLs', such as 'baseball' and 'Baseballs' anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'baseBALLs';
/* FIRST WORD SEARCH: Find records in the testsearch table where the description column contains the case-insensitive word 'americans' as the first word in the description field. */
SELECT description FROM testsearch WHERE description MATCH '^americans';
/* FIRST PHRASE SEARCH: Find records in the testsearch table where the description column contains the phrase "americans love" at the begining of the text in the description field. */
SELECT description FROM testsearch WHERE description MATCH ' ^"americans love" ';
/* MULTI-WORD SEARCH: Find records in the testsearch table where the description column contains the words 'baseball' and 'love' in any order anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'baseball love';
/* MULTI-WORD SEARCH ACROSS WHITESPACE: Find records in the testsearch table where the description column contains the words 'baseball' and 'love' in any order anywhere in the description field while ignoring case and whitespace. */
SELECT description FROM testsearch WHERE description MATCH ' Baseball >>> LOVE! ';
/* PHRASE SEARCH: Find records in the testsearch table where the description column contains the case-insensitive phrase "love their" anywhere in the description field. The words 'love" and 'their" must follow each other with no intervening words in the description field. */
SELECT description FROM testsearch WHERE description MATCH ' "love their" ';
/* PHRASE SEARCH ACROSS WHITESPACE: Find records in the testsearch table where the description column contains the case-insensitive phrase "LOVE their" anywhere in the description field. The case-insensitive words 'LOVE" and 'theIR" must follow each other with no intervening words, but any amount of whitespace and punctuation may occur between these words in the description field. */
SELECT description FROM testsearch WHERE description MATCH ' "LOVE theIR" ';
/* STEMMED MULTI-WORD SEARCH: Find records in the testsearch table where the description column contains several words anywhere in the description field, such as 'baseBALLs' and 'american'. The PORTER tokenizer matches each word that has the same English stemming. */
SELECT description FROM testsearch WHERE description MATCH 'Loved their american baseBALLs.';
/* STEMMED PHRASE SEARCH: Find records in the testsearch table where the description column contains common forms of each word in the phrase "and football" anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH ' "and footballs" ';
/* STEMMED WORD SEARCH ACROSS INCORRECT WORD FORMS: Find records in the testsearch table where the description column contains correct and incorrect forms of words, such as 'baseballing', 'baseballer', 'baseballed'. The PORTER stemmer applies its rules to every word -- even when the rule is incorrect in English. */
SELECT description FROM testsearch WHERE description MATCH ' american americans americaning americaner americaned baseball baseballs baseballing baseballer baseballed their theirs theiring theired "and" "ands" "anding" "anded" ';
/* STEMMED FIRST WORD SEARCH: Find records in the testsearch table where the description column contain common forms of the word 'americaning' as the first word in the description field. */
SELECT description FROM testsearch WHERE description MATCH '^americaning';
/* STEMMED FIRST PHRASE SEARCH: Find records in the testsearch table where the description column contains common forms of the words in the phrase "american loving" at the begining of the text in the description field. */
SELECT description FROM testsearch WHERE description MATCH ' ^"american loving" ';
/* NEAR SEARCH: Find records in the testsearch table where the description column contains the words 'Americans' and 'baseball' with no more than ten words between each other anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'Americans NEAR football';
/* NEAR PHRASE SEARCH: Find records in the testsearch table where the description column contains the words 'their' and 'baseballs' with no words between each other. This is the same as using a phrase search. */
SELECT description FROM testsearch WHERE description MATCH 'their NEAR/0 baseball';
/* NEAR N SEARCH: Find records in the testsearch table where the description column contains the words 'love' and 'baseball' with no more than 1 word between each other. This pattern may occur anywhere in the description field with any amount of whitespace and punctuation between the words. */
SELECT description FROM testsearch WHERE description MATCH ' baseball NEAR/1 football ';
/* NOT WORD SEARCH: Find records in the testsearch table where the description column contains the words 'love' and 'baseball' but not 'karate'. NOTE: The minus operator is the NOT operator. It must be adjacent to the word with no intervening whitespace. You must also include a term in the search phrase before the NOT operator. */
SELECT description FROM testsearch WHERE description MATCH 'love baseball -karate';
/* NOT PHRASE SEARCH: Find records in the testsearch table where the description column contains the word 'baseball' but not the phrase "hate karate" and not the word 'soccer'. NOTE: The minus operator is the NOT operator. It must be adjacent to the word with no intervening whitespace. You must also include a term in the search phrase before the NOT operator. */
SELECT description FROM testsearch WHERE description MATCH 'baseball -"hate karate" -soccer';
/* STEMMED NEAR SEARCH: Find records in the testsearch table where the description column contains case-insensitive common forms of the words 'American' and 'baseball' with no more than ten words between each other anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'americaner NEAR baseballs ';
/* STEMMED PHRASE SEARCH: Find records in the testsearch table where the description column contains case-insensitive common forms of the words 'their' and 'baseballs' with no words between each other anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'their NEAR/0 baseBALLS ';
/* STEMMED NEAR N SEARCH: Find records in the testsearch table where the description column contains case-insensitive common forms of the words 'baseball' and 'football' with no more than 1 word between each other anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'baseballing NEAR/1 footballed ';
/* STEMMED NOT WORD SEARCH: Find records in the testsearch table where the description column contains case-insensitive common forms of the words 'love' and 'baseball' but not 'karate'. NOTE: The minus operator must be adjacent to the word or phrase with no intervening whitespace. You must also have a term in the search phrase before the minus - operator. */
SELECT description FROM testsearch WHERE description MATCH 'loves baseballing -karate';
/* STEMMED NOT PHRASE SEARCH: Find records in the testsearch table where the description column contains case-insensitive common forms of the word 'baseball' but not the phrase "hate karate". */
SELECT description FROM testsearch WHERE description MATCH 'loving baseballs -"hating karate" ';
/* STEMMED NOT PHRASE SEARCH: Find records in the testsearch table where the description column contains case-insensitive common forms of the word 'love' but not case-insensitive common forms of the words in the phrase "Americans football". */
SELECT description FROM testsearch WHERE description MATCH 'Loved -"ameRICAN footBALLs" ';
/* Remove the testsearch table. */
DROP TABLE testsearch;
SQL full-text search examples using SIMPLE tokenizer
The SIMPLE tokenizer identifies each word using whitespace and/or punctuation. Then regardless of case, the tokenizer finds all matching words in the search. tokenizer is case insensitive.
A search string identifies each word using whitespace and/or punctuation and may use double quotes to identify a word or a phrase. Phrases contain two or more words within double quotes and may include whitespace between words. Punctuation, such as .!>, is ignored in the search string when it is not a full-text search operator. Reserved search operators include - " ^ * ( ) NEAR NEAR/n AND OR NOT; However, * ( ) OR NOT are not supported and will return an error when used.
/* =====================================================================
* Full-text search demo using the SIMPLE tokenizer.
* =====================================================================
*/
/* Create the testsearch table. */
CREATE TABLE testsearch
(
name VARCHAR(30)
, description VARCHAR(500)
)
STORAGE_ATTRIBUTES 'hotAlter; huge; rowid_fld; noRecbyt_idx;'
;
/* Create full-text index on the description field of the testsearch table. Use the PORTER English word stem tokenizer to do case insensitive matching of common word forms */
CREATE FULLTEXT INDEX testsearch_description_simple_fts
ON testsearch( description )
UPDATE_MODE ASYNC
;
/* Insert a record into the testsearch table. */
INSERT INTO testsearch
VALUES('Mike','Americans --love-- their __BASEball__ and <FOOTball>!')
;
/* Look at the record inserted into the testsearch table. */
SELECT description FROM testsearch;
/* =====================================================================
* The following queries find the inserted record
* by searching for words and phrases in the description field:
* 'Americans --love-- their __BASEball__ and <FOOTball>!'.
*
* The SIMPLE tokenizer reduces the text to
* 'americans love their baseball and football'.
*
* The tokenizer identifies each word using whitespace and/or punctuation.
* The tokenizer is case insensitive.
* A search string identifies each word using whitespace and/or punctuation.
* A search string may use double quotes to identify a word or a phrase.
* A phrase contains two or more words within double quotes.
* A search string may include any amount of whitespace between words.
* A search string may include
* Punctuation, such as .!>, is ignored in the search string
* when it is not a full-text search operator.
* Reserved search operators include - " ^ * ( ) NEAR NEAR/n AND OR NOT
* The following search operators are unsupported: * ( ) OR NOT
* The server returns an error when you use an unsupported search operator.
* =====================================================================
*/
/* WORD SEARCH: Find records in the testsearch table where the description column contains the word 'baseball' anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'baseball';
/* CASE INSENSITIVE SEARCH: Find records in the testsearch table where the description column contains case-insensitive forms of the word 'americans', such as 'Americans' and 'AMERIcans' anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'americans';
/* FIRST WORD SEARCH: Find records in the testsearch table where the description column contains the word 'americans' as the first word in the description field. */
SELECT description FROM testsearch WHERE description MATCH '^americans';
/* FIRST PHRASE SEARCH: Find records in the testsearch table where the description column contains the phrase "americans love" at the begining of the text in the description field. */
SELECT description FROM testsearch WHERE description MATCH ' ^"americans love" ';
/* MULTI-WORD SEARCH: Find records in the testsearch table where the description column contains the words 'baseball' and 'love' in any order anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'baseball love';
/* MULTI-WORD SEARCH ACROSS WHITESPACE: Find records in the testsearch table where the description column contains the words 'baseball' and 'love' in any order anywhere in the description field. NOTE: Any amount of whitespace may occur between these words in the search text. Punctuation, such as .!>, is also ignored in the search string when it is not a full-text search operator. */
SELECT description FROM testsearch WHERE description MATCH ' <Baseball> >>> .LOVE. !!! ';
/* PHRASE SEARCH: Find records in the testsearch table where the description column contains the case-insensitive phrase "love their" anywhere in the description field. The words 'love" and 'their" must follow each other with no intervening words in the description field. */
SELECT description FROM testsearch WHERE description MATCH ' "love their" ';
/* PHRASE SEARCH ACROSS WHITESPACE: Find records in the testsearch table where the description column contains the case-insensitive phrase "LOVE their" anywhere in the description field. The case-insensitive words 'LOVE" and 'theIR" must follow each other with no intervening words, but any amount of whitespace and unreserved punctuation may occur between these words in the description field. */
SELECT description FROM testsearch WHERE description MATCH ' "LOVE! theIR" ';
/* NEAR SEARCH: Find records in the testsearch table where the description column contains the words 'Americans' and 'baseball' with no more than ten words between each other anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'Americans NEAR football ';
/* NEAR PHRASE SEARCH: Find records in the testsearch table where the description column contains the words 'their' and 'baseballs' with no words between each other. This is the same as using a phrase search. */
SELECT description FROM testsearch WHERE description MATCH 'their NEAR/0 baseball';
/* NEAR N SEARCH: Find records in the testsearch table where the description column contains the words 'love' and 'baseball' with no more than 1 word between each other. This pattern may occur anywhere in the description field with any amount of whitespace and punctuation between the words. */
SELECT description FROM testsearch WHERE description MATCH ' baseball NEAR/1 football ';
/* NOT WORD SEARCH: Find records in the testsearch table where the description column contains the words 'love' and 'baseball' but not 'karate'. NOTE: The minus - operator is the NOT operator. It must be adjacent to the word with no intervening whitespace. You must also include at least one term in the search phrase before the - operator. */
SELECT description FROM testsearch WHERE description MATCH 'love baseball -karate';
/* NOT PHRASE SEARCH: Find records in the testsearch table where the description column contains the word 'baseball' but not the phrase "hate karate" and not the word 'soccer'. NOTE: The minus operator is the NOT operator. It must be adjacent to the word with no intervening whitespace. You must also include a term in the search phrase before the NOT operator. */
SELECT description FROM testsearch WHERE description MATCH 'baseball -"hate karate" -soccer';
/* AND SEARCH: Find records in the testsearch table where the description column contains the word 'baseball' and 'loved'. NOTE: The AND operator is optional. The server automatically adds it between each term in the search expression. The following two queries are functionally identical. */
SELECT description FROM testsearch WHERE description MATCH ' love AND "their baseball" ';
SELECT description FROM testsearch WHERE description MATCH ' love "their baseball" ';
/* SEARCH for reserved words "and", "or", and "near": Find records in the testsearch table where the description column contains the words "baseball" and "and" but not "or" and "near". NOTE: enclose a reserved word in double quotes to treat it as a word instead of an operator. */
SELECT description FROM testsearch WHERE description MATCH ' love "and" -"or" -"near" ';
/* =====================================================================
* The following queries do NOT find the inserted record
* because they do not match the text:
* 'Americans --love-- their __BASEball__ and <FOOTball>!'
* =====================================================================
*/
/* VERB STEM SEARCH: This query returns no results because the SIMPLE tokenizer does not match common English conjugations of verbs, such as 'loving', 'love', 'loved', and 'loves'. The PORTER tokenizer does. */
SELECT description FROM testsearch WHERE description MATCH 'loving';
/* NOUN STEM SEARCH: This query returns no results because the SIMPLE tokenizer does not match common English singular and plural forms of nouns, such as 'american' and 'americans'. The PORTER tokenizer does. */
SELECT description FROM testsearch WHERE description MATCH 'american';
/* FIRST WORD SEARCH: This query returns no results because 'love' is not the first word in the description. */
SELECT description FROM testsearch WHERE description MATCH '^love';
/* STEMMED MULTI-WORD SEARCH: This query returns no results because the SIMPLE tokenizer does not match word stems like the PORTER tokenizer does. */
SELECT description FROM testsearch WHERE description MATCH 'Baseballs american';
/* STEMMED NEAR SEARCH: This query returns no results because the SIMPLE tokenizer does not match word stems like the PORTER tokenizer does. */
SELECT description FROM testsearch WHERE description MATCH 'American NEAR baseballs ';
/* STEMMED PHRASE SEARCH: This query returns no results because the SIMPLE tokenizer does not match word stems like the PORTER tokenizer does. */
SELECT description FROM testsearch WHERE description MATCH 'their NEAR/0 baseballs ';
/* STEMMED NEAR N SEARCH: This query returns no results because the SIMPLE tokenizer does not match word stems like the PORTER tokenizer does. */
SELECT description FROM testsearch WHERE description MATCH 'loves NEAR/1 baseballs ';
/* Remove the testsearch table. */
DROP TABLE testsearch;
Supported Features
Current Support
The initial release of FairCom Full-Text Search supports the most relevant features typically expected by users of full-text search. Some minor features will be deferred until later phases of product development. This section lists considerations you need to keep in mind when using the current version of FTS.
Multiple FTI Supported per Data File
FTS permits multiple Full-Text Indexes per data file.
Word Dictionary
The Word Dictionary contains a list of tokens (as generated by the tokenizer) for known words appearing at least once in the indexed documents. Each index uses its own Word Dictionary. Because each tokenizer may encode words in its own way, each Word Dictionary uses a single tokenizer (you cannot change tokenizers without rebuilding the Word Dictionary).
Limitations
Several limitations will be addressed in future FTS releases:
Limit of 1 Field per FTI
FTS is currently limited to a single field per FTI. This limitation will be released in the future.
OR Operator - The OR operator is not supported in this release. All search terms are considered to be joined by AND operators. This functionality could be created in the application by breaking the search into multiple searches and presenting the union of their results.
Wildcard Search - Wildcard searches are not supported in this release of FTS.
A limited form of wildcard searches, called "term-prefix searches," allows you to search for words that begin with the specified characters.
Parentheses - The use of parentheses to indicate the precedence of parts of a complex search is not presently supported. However, in this release of FTS, all search terms are ANDed together, so precedence should not matter.
Full-Text Index stop word list
The Full-Text Search feature supports adding a list of stop words to a Full-Text Index when the word dictionary is created.
To set a stop word list when creating a full text index dictionary, call the ctdbSetFTIOption() function with option set to CTDB_FTI_OPTION_STOP_LIST and pvalue set to a UTF-8 string containing the stop words separated by spaces. Example:
rc = ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_STOP_LIST, "a an the", 0);
FairCom DB API FTS API
This section provides reference information about the FairCom DB API API functions that support Full-Text Search.
The following FairCom DB API functions are available for Full-Text Search:
- ctdbAddFTI - Add a new Full-Text Search Index to a table.
- ctdbAddFTIField - Add a new field to a Full-Text Search Index.
- ctdbAddFTIFieldByName - Add a new field to an FTS Index.
- ctdbAllocFTI - Allocate memory for a new Full-Text Search handle.
- ctdbCheckFTIBackgroundLoad - Check the status of the background index loader for the FTI.
- ctdbDelFTI - Delete a table's Full-Text Search configuration.
- ctdbFreeFTI - Release all resources associated with a Full-Text Search handle.
- ctdbFTSearchOff - Disable and free an existing Full-Text Search.
- ctdbFTSearchOn - Set up a new Full-Text Search.
- ctdbGetFTI - Retrieve the Full-Text Search handle.
- ctdbGetFTIByName - Retrieve the index handle given the index name.
- ctdbGetFTIByUID - Retrieve an FTI by its unique ID.
- ctdbGetFTIField - Retrieve the field handle of the field indicated by FieldNumber.
- ctdbGetFTIFieldCount - Retrieve the number of segments associated with this index.
- ctdbGetFTIFieldMode - Return the field mode specified when adding the field to the FTI.
- ctdbGetFTIHandle - Return the Full-Text Search handle or NULL on failure.
- ctdbGetFTIName - Retrieve the name of the FTI.
- ctdbGetFTINbr - Get the number of an FTI in the FTI list.
- ctdbGetFTINbrByName - Retrieve the index number given the index name.
- ctdbGetFTIStatus - Retrieve the status of the index handle.
- ctdbGetTableFTICount - Retrieve the number of FTS indices associated with the table.
- ctdbIsFTSearchOn - Indicate if a Full-Text Search is active or not.
- ctdbSetDefaultFTI - Disable and free an existing Full-Text Search.
- ctdbSetFTICndxExpr - Set the conditional expression for this Full Text Index.
- ctdbSetFTIOption - Allows several Full-Text Search values to be set depending on the option.
- ctdbStartFTIBackgroundLoad - Starts a background index loader for the FTI.
ctdbAddFTI
Declaration
CTHANDLE ctdbAddFTI(CTHANDLE Handle, pTEXT name)Description
Add a new Full-Text Search index to a table.
- Handle is a table handle (CTDBTABLE)
- name is an Index name
Note: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.
Return Values
On success returns the Full-Text Search handle.
On failure returns NULL.
See Errors for a complete listing of valid FairCom DB error values.
See Also
- FairCom DB API Full-Text Search API
ctdbAddFTIField
Declaration
CTDBRET ctdbAddFTIField(CTHANDLE Handle, CTHANDLE FieldHandle, ULONG mode)Description
Add a new field to a Full-Text Search index.
- Handle is a Full-Text Search handle (CTDBFTS)
- FieldHandle is a field handle (CTDBFIELD)
- mode is a Full-Text search field matching mode:
| Mode | Code | Description |
| CTDB_FTI_MODE_REG | 0x00000001 | Regular - Default field treatment. |
| CTDB_FTI_MODE_UTF8 | 0x00000002 | Field format in UTF8. |
| CTDB_FTI_MODE_UTF16 | 0x00000004 | Field format in UTF16. |
Return Values
| Mode | Code | Description |
| CTDBRET_NOTFTS | 4149 | Not a Full-Text Search handle |
| CTDBRET_FTS_DEFINED | 4150 | Full-Text Search already defined on table |
See Errors for a complete listing of valid FairCom DB error values.
Example
See Full-Text Search Example.
See Also
- Full-Text Search ICU Tokenizer
- FairCom DB API Full-Text Search API
- Full-Text Search Example
ctdbAddFTIFieldByName
Declaration
CTDBRET ctdbAddFTIFieldByName(CTHANDLE Handle, NINT Number, pTEXT FieldName, ULONG mode)Description
Add a new field to an FTS index.
- Handle is a table handle (CTDBTABLE).
- Number is a Full-Text Search Index number.
- FieldName is a Field name.
- mode is a Full-Text search field matching mode:
| Mode | Code | Description |
| CTDB_FTI_MODE_REG | 0x00000001 | Regular - Default field treatment. |
| CTDB_FTI_MODE_UTF8 | 0x00000002 | Field format in UTF8. |
| CTDB_FTI_MODE_UTF16 | 0x00000004 | Field format in UTF16. |
Return
Error code
| Mode | Code | Description |
| CTDBRET_NOTFTS | 4149 | Not a Full-Text Search handle |
| CTDBRET_FTS_DEFINED | 4150 | Full-Text Search already defined on table |
See Errors for a complete listing of valid FairCom DB error values.
See Also
- Full-Text Search ICU Tokenizer
- FairCom DB API Full-Text Search API
ctdbAllocFTI
Declaration
pCTDBFTI ctdbAllocFTI(CTHANDLE Handle)Description
Allocate memory for a new Full-Text Index handle.
- Handle is a table handle (CTDBTABLE)
Return
The new allocated Full-Text Search handle (pCTDBFTS), or NULL on failure.
See Errors for a complete listing of valid FairCom DB error values.
ctdbCheckFTIBackgroundLoad
Check the status of the background index loader for the Full-Text Index (FTI).
Declaration
CTDBRET ctdbLOCAL ctdbCheckFTIBackgroundLoad(pCTDBFTI pFTI, pBGLDINF status)Parameters
- pFTI [IN] - Full-Text Index handle.
- status [OUT] - Status information.
Description
Note: It is not possible to delete an FTI for which a background load has been started and still not finished. Attempting this operation causes the ctdbAlterTable() to fail with error CTDBRET_LOADISACTIVE (4166, new error code).
This function was originally named ctdbFTICheckBackgroundLoad, which has been deprecated.
Return
| Value | Symbolic Constant | Explanation |
|---|---|---|
| 0 | CTDBRET_OK | Successful operation. |
| 4165 | CTDBRET_NOLOAD | The background FTI index loader has not been "started" on the specified FTI. |
See Errors for a complete listing of valid c-tree Plus error values.
See Also
- ctdbStartFTIBackgroundLoad()
- FairCom DB API - FTI Background Initial Load
ctdbDelFTI
Declaration
CTDBRET ctdbDelFTI(CTHANDLE Handle, COUNT Number)
Description
Delete a table Full-Text Search configuration.
- Handle is a table handle (CTDBTABLE).
- Number is a Full-Text Search configuration number (currently ignored).
Note: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.
Return Values
| Mode | Code | Description |
| CTDBRET_NOTFTS | 4149 | Not a Full-Text Search handle |
| CTDBRET_FTS_DEFINED | 4150 | Full-Text Search already defined on table |
See Errors for a complete listing of valid FairCom DB error values.
ctdbFreeFTI
Declaration
VOID ctdbFreeFTI(CTHANDLE Handle)Description
Release all resources associated with a Full-Text Search handle.
- Handle is a Full-Text Search Handle (CTDBFTS)
Return Values
None
See Errors for a complete listing of valid FairCom DB error values.
ctdbFTSearchOff
Declaration
CTDBRET ctdbFTSearchOff(CTHANDLE Handle)Description
Disable and free an existing Full-Text Search.
- Handle is a record handle (CTDBRECORD)
Return Values
| Mode | Code | Description |
| CTDBRET_NOTFTS | 4149 | Not a Full-Text Search handle |
| CTDBRET_FTS_DEFINED | 4150 | Full-Text Search already defined on table |
See Errors for a complete listing of valid FairCom DB error values.
See Also
- ctdbFTSearchOn
- ctdbIsFTSearchOn
ctdbFTSearchOn
Declaration
CTDBRET ctdbFTSearchOn(CTHANDLE Handle, pTEXT query, NINT mode)Description
Set up a new Full-Text Search.
- Handle is a record handle (CTDBRECORD)
- query is the text to locate
- mode should be set to 0.
Remember: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.
Return Values
| Mode | Code | Description |
| CTDBRET_NOTFTS | 4149 | Not a Full-Text Search handle |
| CTDBRET_FTS_DEFINED | 4150 | Full-Text Search already defined on table |
See Errors for a complete listing of valid FairCom DB error values.
See Also
- ctdbFTSearchOff
- ctdbIsFTSearchOn
ctdbGetFTI
Declaration
CTHANDLE ctdbDECL ctdbGetFTI(CTHANDLE Handle, COUNT Number)Description
Retrieve the Full-Text Search handle.
- Handle is a table handle (CTDBTABLE)
- Number is a Full-Text Search configuration number (must be set to 0; currently it is ignored)
Return Values
Returns the index handle on success or NULL on failure.
See Errors for a complete listing of valid FairCom DB error values.
ctdbGetFTIByName
Declaration
CTHANDLE ctdbDECL ctdbGetFTIByName(CTHANDLE Handle, pTEXT name)Description
Retrieve the index handle given the index name.
- Handle is a table handle.
- Name is an FTS Index name.
Return
Return the index handle on success or NULL on error.
See Errors for a complete listing of valid FairCom DB error values.
ctdbGetFTIByUID
Retrieve an FTI by its unique ID (the Full Text Index counterpart of ctdbGetIndexByUID).
Return Values
| Value | Symbolic Constant | Explanation |
|---|---|---|
| 0 | CTDBRET_OK | Successful operation. |
See Errors for a complete listing of valid c-tree Plus error values.
ctdbGetFTIField
Declaration
CTHANDLE ctdbDECL ctdbGetFTIField(CTHANDLE Handle, NINT FieldNumber)Description
Retrieve the field handle of the field indicated by FieldNumber.
- Handle is an FTS handle.
- FieldNumber is a field number in the FTS field sequence.
Return Values
Return return a HANDLE that is the CTDBFIELD handle.
See Errors for a complete listing of valid FairCom DB error values.
ctdbGetFTIFieldCount
Declaration
NINT ctdbDECL ctdbGetFTIFieldCount(CTHANDLE Handle)Description
Retrieve the number of segments associated with this index.
- Handle is an FTS handle
Return Values
Return the number of indexes or -1 on error.
See Errors for a complete listing of valid FairCom DB error values.
ctdbGetFTIFieldMode
Declaration
ULONG ctdbDECL ctdbGetFTIFieldMode(CTHANDLE Handle, NINT FieldNumber)Description
Return the field mode specified when adding the field to the FTI.
Parameters
- Handle [IN] - FTS handle
- FieldNumber [IN] - Field number in the FTI field sequence
Return Values
Return the field mode or 0 on error.
ctdbGetFTIHandle
Declaration
CTHANDLE ctdbGetFTIHandle(CTHANDLE Handle)Description
Return the Full-Text Search handle or NULL on failure.
- Handle [IN] - FairCom DB API C API handle
Return
Returns the index handle on success. Returns NULL on failure.
See Errors for a complete listing of valid FairCom DB error values.
ctdbGetFTIName
Declaration
pTEXT ctdbDECL ctdbGetFTIName(CTHANDLE Handle)Description
Retrieve the name of the Full-Text Index (FTI).
Return Values
Return string pointer to FTI name or NULL on failure.
ctdbGetFTINbr
Get the number of an FTI in the FTI list (the Full Text Index counterpart of ctdbGetIndexNbr).
Return Values
| Value | Symbolic Constant | Explanation |
|---|---|---|
| 0 | CTDBRET_OK | Successful operation. |
See Errors for a complete listing of valid c-tree Plus error values.
ctdbGetFTINbrByName
Declaration
COUNT ctdbDECL ctdbGetFTINbrByName(CTHANDLE Handle, cpTEXT name)Description
Retrieve the index number given the index name.
- Handle is a table handle
- Name is a Full-Text Search Index name.
Return Values
Return the index number on success or -1 on error.
See Errors for a complete listing of valid FairCom DB error values.
ctdbGetFTIStatus
Declaration
ULONG ctdbDECL ctdbGetFTIStatus(CTHANDLE Handle)Description
Retrieve the status of the index handle.
- Handle [IN] - FTS handle.
The status of the index handle is a bit map describing one or more of the following:
- CTDBFTI_OLD - Original value (no changes)
- CTDBFTI_NEW - Index added
- CTDBFTI_DEL - Original Index deleted
Return
Return the index handle status value.
See Errors for a complete listing of valid FairCom DB error values.
ctdbGetTableFTICount
Declaration
COUNT ctdbGetTableFTICount(CTHANDLE Handle)Description
Retrieve the number of FTS indexes associated with the table.
- Handle is a table handle (CTDBTABLE)
Return Values
Return the number of indexes or -1 on error.
See Errors for a complete listing of valid FairCom DB error values.
ctdbIsFTSearchOn
Declaration
CTBOOL ctdbIsFTSearchOn(CTHANDLE Handle)Description
Indicate if a Full-Text Search is active or not. A Full-Text Search is active after a successful call to ctdbITSearchOn(). A Full-Text Search can be switched off by calling ctdbFTSearchOff().
- Handle is a record handle (CTDBRECORD)
Return Values
Return YES is record set if active.
Return NO if record set is not active.
See Errors for a complete listing of valid FairCom DB error values.
See Also
- ctdbFTSearchOff
- ctdbFTSearchOn
ctdbSetDefaultFTI
Declaration
CTDBRET ctdbSetDefaultFTI(CTHANDLE Handle, COUNT Number)Description
Set the number of a new default FTI.
- Handle is a record handle (CTDBRECORD)
- Number is the Full-Text Search index to set as default by number
Return Values
Returns CTDBRET_OK on success.
| Mode | Code | Description |
| CTDBRET_NOTFTS | 4149 | Not a Full-Text Search handle |
| CTDBRET_FTS_DEFINED | 4150 | Full-Text Search already defined on table |
See Errors for a complete listing of valid FairCom DB error values.
ctdbSetFTICndxExpr
Declaration
CTDBRET ctdbDECL ctdbSetFTICndxExpr(CTHANDLE Handle, pTEXT conditionExpr)Parameters:
- Handle [IN] - Full Text Index handle
- conditionExpr [IN] - Conditional expression
Description
Set the conditional expression for this Full Text Index.
This function is very similar to the ctdbSetIndexCndxExpr function used on regular indexes.
If this function is called on an open table, it verifies that the expression is valid.
Remember: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.
Addition to ctdbSetFTIOption
Alternatively, a new option for ctdbSetFTIOption allows you to set a conditional expression for an existing Full-Text Index (FTI):
- CTDB_FTI_OPTION_CNDXEXPR - Set the Full-Text Index conditional expression. The pvalue parameter contains the conditional expression to apply. The lvalue is ignored.
Return Values
| Value | Symbolic Constant | Explanation |
|---|---|---|
| 0 | CTDBRET_OK | Successful operation. |
See Errors for a complete listing of valid c-tree Plus error values.
See Also
- ctdbSetFTIOption
ctdbSetFTIOption
Declaration
CTDBRET ctdbDECL ctdbSetFTIOption(CTHANDLE Handle, UCOUNT option, pTEXT pvalue, ULONG lvalue)Description:
Allows several Full-Text Search values to be set depending on the option.
Parameters:
- Handle [IN] - FTI handle
- option [IN] - Option to set
- pvalue [IN] - If the option takes a string value, set it to this value
- lvalue [IN] - If the option takes a long value, set it to this value
In function ctdbSetFTIOption, either lvalue or pvalue needs to be set depending on the option.
Update Options
CTDB_FTI_OPTION_UPDSet lvalue to one of the following to specify when to perform FTS Index updates:
| Mode | Code | Description |
|---|---|---|
| CTDB_FTI_OPTION_UPD_ASYNC | 0x0000 | Perform FTI updates in background (default) |
| CTDB_FTI_OPTION_UPD_IMMEDIATE | 0x0001 | Perform FTI updates immediately |
| CTDB_FTI_OPTION_IGNORE_EXISTING | 0x0100 |
Disable the loading of existing records to create an FTI without loading existing records This setting is not persisted after creation. It is possible to perform a ctdbAlterTable(...CTDB_ALTER_FTI) indicating to not load existing record on rebuild (i.e., empty the indices it rebuilds) by explicitly calling ctdbSetFTIOption (... ,CTDB_FTI_OPTION_IGNORE_EXISTING) on the proper indices. |
Tokenizer Options
CTDB_FTI_OPTION_TOKENSIZESet lvalue to the maximum token size.
CTDB_FTI_OPTION_TOKENIZERSet lvalue to one of the following:
| Mode | Code | Description |
|---|---|---|
| CTDB_FTI_TOKENIZER_CUSTOM | 0 | Use a user-defined custom tokenizer - see Full-Text Search Custom Tokenizer |
| CTDB_FTI_TOKENIZER_SIMPLE | 1 | Use the standard, simple tokenizer, which uses white space to delimit tokens. Default |
| CTDB_FTI_TOKENIZER_PORTER | 2 | Use the Porter Tokenizer, which uses "stemming" to reduce words to a common root so grammatically similar words can be matched. |
| CTDB_FTI_TOKENIZER_ICU | 3 | Use the ICU Unicode Tokenizer - see Full-Text Search ICU Tokenizer |
ICU Tokenizer
CTDB_FTI_OPTION_ICULANGSet pvalue to a string identifying the language for Unicode collation based on ICU library capabilities to use with the ICU tokenizer. See Full-Text Search ICU Tokenizer.
CTDB_FTI_OPTION_ICUOPTIONSet lvalue to an ICU option using c-tree extended segment modes for ICU-based indexes configuration. See Full-Text Search ICU Tokenizer.
Custom Tokenizer
CTDB_FTI_OPTION_CUSTLIBSet pvalue to the name of the library to load implementing a custom tokenizer. See Full-Text Search Custom Tokenizer.
CTDB_FTI_OPTION_CUSTPARAMSet pvalue to a string that will be passed to the custom tokenizer init function with custom details. See Full-Text Search Custom Tokenizer.
Dictionary Domain Options
This option is used internally to define the scope of the word dictionary used for full-text search. It currently has no exposed API uses.
| Dictionary Domain | Code | Description |
|---|---|---|
| CTFTI_DOMAIN_TABLE | 0x0000 | Scope is the table. |
| CTFTI_DOMAIN_DATABASE | 0x0001 | Scope is the database the table belongs. |
| CTFTI_DOMAIN_SERVER | 0x0002 | Scope is the server. |
Returns:
ctdbSetFTIOption() returns CTDBRET_OK on success, or one of the following errors on failure:
- CTDBRET_NOTFTI if called passing a handle not belonging to an FTI.
- CTDBRET_INVARG if called passing an invalid argument combination or unknown option.
- CTDBRET_NOTSUPPORTED if called with the Full-Text index support not enabled.
See Also
- Full-Text Search ICU Tokenizer
- Full-Text Search Custom Tokenizer
ctdbStartFTIBackgroundLoad
Starts a background index loader for the Full-Text Index (FTI).
Declaration
CTDBRET ctdbLOCAL ctdbStartFTIBackgroundLoad(pCTDBFTI pFTI)Description
- pFTI - [IN] Full-Text Index handle.
Options
In addition to the existing FTI options:
CTDB_FTI_OPTION_IGNORE_EXISTING - ignore existing record.
The following FTI option has been added:
CTDB_FTI_OPTION_BACKLOAD_EXISTING - load existing record in background.
The setting is persisted so that, in the case of an Alter Table that requires rebuilding the index, the behavior will be the one specified at creation time.
Note: It is not possible to delete an FTI for which a background load has been started and still not finished. Attempting this operation causes the ctdbAlterTable() to fail with error CTDBRET_LOADISACTIVE (4166, new error code).
This function was originally named ctdbFTIStartBackgroundLoad, which has been deprecated.
Return Values
| Value | Symbolic Constant | Explanation |
|---|---|---|
| 0 | CTDBRET_OK | Successful operation. |
See Errors for a complete listing of valid c-tree Plus error values.
Example
ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_UPD, NULL, CTDB_FTI_OPTION_BACKLOAD_EXISTING) See Also
- ctdbCheckFTIBackgroundLoad()
- FairCom DB API - FTI Background Initial Load
FairCom DB API - FTI Background Initial Load
FairCom Server supports loading a Full-Text Index (FTI) with existing records in the background. To simplify the use of background load thread for FTI at creation time, the FairCom DB API functions listed below have been implemented:
- ctdbFTICheckBackgroundLoad - Check the status of the background index loader for the Full-Text Index (FTI).
- ctdbFTIStartBackgroundLoad - Starts a background index loader for the Full-Text Index (FTI).
Note: It is not possible to delete an FTI for which a background load has been started and still not finished. Attempting this operation causes the ctdbAlterTable() to fail with error CTDBRET_LOADISACTIVE (4166 new error code).
For advanced users, see:
- Asynchronous load of Full-Text Index
Asynchronous Load of Full-Text Index
For Advanced Users
FairCom Server supports loading a Full-Text Index (FTI) with existing records in the background. This means that an FTI can be quickly created with the data file open in exclusive mode, and then the data file can be closed and reopened in shared mode to permit other connections to access the file while the original connection requests a background thread to load the FTI from the records in the data file.
The status of the background load can be monitored. When the background load is requested, it is assigned a user-defined load identifier. The status of all requested background load operations for a particular data file is stored as a resource in that data file.
The following information is stored for each background load:
/* Background load (of index or record update callback) details: */
typedef struct bgldinf {
LONG8 queuedtime; /* time at which the load was queued */
LONG8 statustime; /* time at which current status was reported */
LONG8 offset; /* current offset in file */
BGLDSTT status; /* current load status */
LONG errcod; /* background load error code */
LONG curpct; /* current percentage complete */
LONG pad; /* padding for structure alignment */
TEXT cbname[BGLDIDZ];/* name of the callback function */
TEXT loadid[BGLDIDZ];/* ID for this background load operation */
} BGLDINF, *pBGLDINF;Note: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.
Requesting a background load of an FTI
To request a background load of an FTI, call the ctRecordUpdateCallbackControl() function, specifying an opcode of RUCBCTLqueuecallback and the name of the FTI's record update callback function, which is $FTS$indexname.
Below is an example showing the background load of the FTI named myFTSidx001 on the file qafulltextsearch.dat, using a load identifier of myftsload. This load identifier can be used to monitor the progress of the background load. Note that in this example, datno is the data file number for the data file, which the application has already opened:
RUCBCTL rucbctl;
RUCBQCB rucbqcb;
memset(&rucbctl,0,sizeof(rucbctl));
memset(&rucbqcb,0,sizeof(rucbqcb));
rucbctl.verson = RUCBCTL_VERS_V01;
rucbctl.opcode = RUCBCTLqueuecallback;
rucbctl.bufsiz = sizeof(rucbqcb);
rucbctl.bufptr = (pTEXT) &rucbqcb;
rucbqcb.verson = RUCBQCB_VERS_V01;
rucbqcb.datnam = "qafulltextsearch.dat";
rucbqcb.datno = datno;
rucbqcb.cbname = "$FTS$myFTSidx001";
rucbqcb.loadid = "myftsload";
if ((rc = (COUNT) ctRecordUpdateCallbackControl(&rucbctl))) {
printf("Error: Failed to queue record update callback invocation: %d (%d)\n",
rc,sysiocod);
goto err_ret;
}
printf("Successfully queued record update callback invocation.\n");The dfkutl utility supports an option to request a background load of an FTI. Use the -queuecbkload option, specifying the name of the data file, the callback, and the load ID. Here is an example:
dfkctl -queuecbkload qafulltextsearch.dat $FTS$myFTSidx001 myftsload
Successfully queued callback invocation for file qafulltextsearch.dat.Monitoring a background load of a full text index
To monitor the status of a background load of a full text index, call the ctGetBackgroundLoadStatus() function, specifying the data file number and the load identifier. Below is an example:
printf("Waiting for full text index update to complete...\n");
while (1) {
pBGLDRES
pbgldres;
BGLDRES
bgldres;
NINT rc,reslen;
/* Read status from resource. */
reslen = sizeof(bgldres);
pbgldres = &bgldqres;
if (!(rc = ctGetBackgroundLoadStatus(datno,rucbqcb.loadid,&pbgldres,&reslen))) {
printf("Queued callback load resource: status=%d offset=" ctLLnd(10) " (%3d%%)\n",
bgldres.bgldi[0].status,bgldres.bgldi[0].offset,bgldres.bgldi[0].curpct);
if (pbgldres->bgldi[0].status == BGLD_SUCCEEDED) {
printf("Full text index background load completed successfully.\n");
break;
} else if (pbgldres->bgldi[0].status == BGLD_FAILED) {
rc = bgldres.bgldi[0].errcod;
printf("Error: Full text index background load terminated with error %d\n",
rc);
break;
} else {
ctThrdSleep(200);
}
} else {
printf("Error: Failed to read background load status: %d\n", rc);
break;
}
}The dfkutl utility supports an option to read the status of a background load of a full text index. Use the -getbgload option, specifying the name of the data file and the load id. Here is an example:
dfkctl -getbgload qafulltextsearch.dat myftsload -h 1 -i 1 1
status errcod offset curpct callback
succeeded 0 1214400 100% $FTS$myFTSidx001
loadid: myftsload
queuedtime: Wed Sep 6 12:18:40 2017
statustime: Wed Sep 6 12:16:58 2017If the load id is not specified, the status of all background loads for a data file is displayed. The background load status is stored in a resource in the data file. The status entries remain in the resource even after the background load completes so the status can be examined after load completion. To delete an entry from this resource, use the dfkctl utility's -delbgload option, specifying the name of the data file and optionally the load ID. If no load id is specified, all entries are deleted from the resource. Here is an example:
dfkctl -delbgload qafulltextsearch.dat myftsload
Successfully deleted status of all background loads for file qafulltextsearch.datRequesting a background load of a c-tree B+-tree index
c-tree also supports a background load of a c-tree B+-tree index by calling the ctDeferredIndexControl() function with opcode of ctDeferredIndexControl. It is now possible to assign a load ID to the background load of a B+-tree index.
Below is an example. Note the use of DFKQIL structure version 2 instead of version 1. Version 2 of the structure adds a loadid field:
DFKCTL dfkctl;
DFKQIL dfkqil;
LONG idxlst[1];
memset(&dfkctl,0,sizeof(dfkctl));
memset(&dfkqil,0,sizeof(dfkqil));
dfkctl.verson = DFKCTL_VERS_V01;
dfkctl.opcode = DFKCTLqueueload;
dfkctl.bufsiz = sizeof(dfkqil);
dfkctl.bufptr = (pTEXT) &dfkqil;
dfkqil.verson = DFKQIL_VERS_V02;
dfkqil.datno = datno;
dfkqil.numidx = 1;
idxlst[0] = datno + 1;
dfkqil.idxlst = idxlst;
dfkqil.loadid = "myindexload";
if ((rc = ctDeferredIndexControl(&dfkctl)) != NO_ERROR) {
printf("Error: Failed to schedule index load: %d\n",
rc);
goto err_ret;
}
printf("Successfully scheduled index load.\n");The dfkutl utility supports an option to request a background load of a B+-tree index. Use the -queueidxload option, specifying the name of the data file, the callback, and the load id. Here is an example:
dfkctl -queueidxload qafulltextsearch.dat 1 id=myftsload
Successfully queued callback invocation for file qafulltextsearch.dat.See Also
- ctGetBackgroundLoadStatus
- ctDeleteBackgroundLoadStatus
ctGetBackgroundLoadStatus
Declaration
extern ctCONV NINT ctDECL ctGetBackgroundLoadStatus(COUNT datno,pTEXT loadid,ppBGLDRES ppbgldres,pNINT preslen);Parameters:
- [IN] datno - c-tree data file number.
- [IN] loadid - If not NULL, specifies the ID of one background load operation whose information the caller wants to retrieve.
- [IN,OUT] ppbgldldres - If *ppbgldres is not NULL, *preslen must be set to the size of this buffer. If the buffer is sufficiently large, the function writes the resource data to that address and sets *preslen to the amount of data written.
If *ppbgldres is NULL, the function allocates a sufficiently-sized buffer to hold the resource data and sets *preslen to the size of the buffer. The caller is responsible for freeing this buffer by calling mbfree(). Note that in this case we allocate the entry with space for an additional BGLDINF structure so that the caller can add its entry to this buffer if desired. - [IN,OUT] preslen - Size of the input or allocated resource buffer. See the description of the ppbgldres parameter above for more details.
Description
ctGetBackgroundLoadStatus() reads the resource that contains the status of all background index and record update callback load operations for the specified data file.
Note: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.
Return Values
| Value | Symbolic Constant | Explanation |
|---|---|---|
| 0 | CTDBRET_OK | Successful operation. |
See Errors for a complete listing of valid c-tree Plus error values.
See Also
- ctDeleteBackgroundLoadStatus
- Asynchronous load of Full-Text Index
ctDeleteBackgroundLoadStatus
Declaration
extern ctCONV NINT ctDECL ctDeleteBackgroundLoadStatus(COUNT datno,pTEXT loadid);Parameters:
- [IN] datno - c-tree data file number.
- [IN] loadid - If not NULL, specifies the ID of one background load operation whose information the caller wants to delete from the resource. If NULL, the entire resource is deleted.
Description
ctDeleteBackgroundLoadStatus() deletes the resource that contains the status of all background load operations for the specified data file, or deletes the specified entry from this resource.
Note: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.
Return Values
| Value | Symbolic Constant | Explanation |
|---|---|---|
| 0 | CTDBRET_OK | Successful operation. |
See Errors for a complete listing of valid c-tree Plus error values.
See Also
- ctGetBackgroundLoadStatus
- Asynchronous load of Full-Text Index
FairCom DB API C++, Java, and .NET FTS API
This section provides reference information about the following three APIs, that are all based on the FairCom DB API API:
- FairCom DB API C++
- FairCom DB API Java (see the JTDB Javadocs)
- FairCom DB API .NET
Each of these APIs is very similar in function. The tables below show the function name for each API.
Note: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.
The remainder of this chapter provides function details for the FairCom DB API C++ API. The details are the same for FairCom DB API Java and FairCom DB API .NET. Only the API name will vary, as detailed in the tables below:
Class: CTTable
The following methods are available in the CTTable class:
| c-treeDB C++ Methods | c-treeDB .NET Methods | c-treeDB Java Methods | |
| Add a new index to the table | AddFullTextIndex | AddFullTextIndex |
AddFullTextIndex (java.lang.String name) |
| Remove a full text index from a table |
DelFullTextIndex by index number by index name |
DelFullTextIndex by index number by index name |
DelFullTextIndex by index number by index name |
| Retrieve a Full Text index object |
GetFullTextIndex by index number by index name |
GetFullTextIndex by index number by index name |
GetFullTextIndex by index number by index name |
| Retrieve the number of full text indexes of this table | GetFullTextIndexCount | GetFullTextIndexCount | GetFullTextIndexCount |
| Set Full Text index options |
SetFullTextIndexOption by index object by index number by index name |
SetFullTextIndexOption by index object by index number by index name |
SetFullTextIndexOption by index object by index number by index name |
Class: CTRecord
The following methods are available in the CTRecord class:
| c-treeDB C++ Methods | c-treeDB .NET Methods | c-treeDB Java Methods | |
| Set the new record default Full Text index number | SetDefaultFullTextIndex | SetDefaultFullTextIndex | SetDefaultFullTextIndex |
| Set the full text search criteria | FullTextSearchOn | FullTextSearchOn | FullTextSearchOn |
| Indicate if Full Text Search is active on the record | IsFullTextSearchOn | IsFullTextSearchOn | IsFullTextSearchOn |
| Turn off Full Text Search | FullTextSearchOff | FullTextSearchOff | FullTextSearchOff |
| Get detail about Full Text Search error | GetFullTextSearchErrorDetails | GetFullTextSearchErrorDetails | GetFullTextSearchErrorDetails |
Class: CTFullTextIndex
The following is available in the CTFullTextIndex class:
| c-treeDB C++ Methods | c-treeDB .NET Methods | c-treeDB Java Methods | |
| Constructors |
CTFullTextIndex() () (CTFullTextIndex) |
CTFullTextIndex () (CTFullTextIndex) |
CTFullTextIndex () (CTFullTextIndex) |
| Add a new field to Full Text Index | AddField | AddField | AddField |
| Retrieve a Full Text index field | GetField | GetField | GetField |
| Retrieve the Full Text index field count. | GetFieldCount | GetFieldCount | GetFieldCount |
| Retrieve a Full Text index field mode | GetFieldMode | GetFieldMode | GetFieldMode |
| Retrieve the status of the index handle. | GetStatus | GetStatus | GetStatus |
| Set Full Text index options | SetOption | SetOption | SetOption |
| Retrieve the Full Text index name | GetName | GetName | GetName |
| Retrieve the ordinal number of the current Full Text index | GetNumber | GetNumber | GetNumber |
Class: CTFullTextIndexDictionary
The following is available in the FairCom DB API CTFullTextIndexDictionary class:
| c-treeDB C++ Methods | c-treeDB .NET Methods | c-treeDB Java Methods | |
| Constuctors |
CTFullTextIndexDictionary () (CTDatabase) (CTFullTextIndexDictionary) (CTSession) (CTTable) |
CTFullTextIndexDictionary () (CTDatabase) (CTFullTextIndexDictionary) (CTSession) (CTTable) |
CTFullTextIndexDictionary () (CTDatabase) (CTFullTextIndexDictionary) (CTSession) (CTTable) |
| Set Full Text index Dictionary options | SetOption | SetOption | SetOption |
Class: CTTable
The following methods are available in the CTTable class:
- AddFullTextIndex
- DelFullTextIndex (by index number)
- DelFullTextIndex (by index name)
- GetFullTextIndex (by index number)
- GetFullTextIndex (by index name)
- AddField (by field object)
- AddField (by name)
- AddField (by number)
- GetFullTextIndexCount
- SetFullTextIndexOption (by index object)
- SetFullTextIndexOption (by index number)
- SetFullTextIndexOption (by index name)
See Also
- FairCom DB API C++, Java, and .NET FTS API
AddField (by field object)
Declaration
void CTTable::AddField(CTFullTextIndex& pIndex, const CTField& pField, ULONG mode)Parameters:
- pIndex [IN] - Full-Text Index object.
- pField [IN] - Field to compose the index.
- mode [IN] - Full-Text search field matching mode
Description
Add a new segment to an index.
Return Values
Return a segment object.
See Also
- AddField (by name)
- AddField (by number)
- Class: CTTable
- FairCom DB API C++, Java, and .NET FTS API
AddField (by name)
Declaration
void CTTable::AddField(CTFullTextIndex& pIndex, const CTString& FieldName, ULONG mode)Parameters:
- pIndex [IN] - Full-Text Index object.
- FieldName [IN] - Name of field to compose the index.
- mode [IN] - Full-Text Search field matching mode.
Description
Add a new segment to an index using the field name.
Return Values
Return a segment object.
See Also
- AddField
- AddField (by name)
- Class: CTTable
- FairCom DB API C++, Java, and .NET FTS API
AddField (by number)
Declaration
void CTTable::AddField(CTFullTextIndex& pIndex, NINT FieldNumber, ULONG mode)Parameters:
- pIndex [IN] - Full-Text Index object.
- FieldNumber [IN] - Number of field to compose the index.
- mode [IN] - Full-Text search field matching mode.
Description
Add a new segment to an index using the field number.
Return Values
Return a segment object.
See Also
- AddField
- AddField (by name)
- Class: CTTable
- FairCom DB API C++, Java, and .NET FTS API
AddFullTextIndex
Declaration
CTFullTextIndex CTTable::AddFullTextIndex(const CTString& name)Note: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.
Parameters:
- name [IN] - Full-Text Index name.
Description
Add a new Full-Text Index to the table.
Return Values
Return an index object.
| Value | Symbolic Constant | Explanation |
|---|---|---|
| 0 | CTDBRET_OK | Successful operation. |
See Errors for a complete listing of valid c-tree Plus error values.
See Also
- Class: CTTable
- FairCom DB API C++, Java, and .NET FTS API
DelFullTextIndex (by index name)
Declaration
void CTTable::DelFullTextIndex(const CTString& IndexName)Parameters:
- IndexName [IN]
Description
Delete a Full-Text Index from a table using the index name.
Return Values
| Value | Symbolic Constant | Explanation |
|---|---|---|
| 0 | CTDBRET_OK | Successful operation. |
See Errors for a complete listing of valid c-tree Plus error values.
See Also
- DelFullTextIndex (by index number)
- Class: CTTable
- FairCom DB API C++, Java, and .NET FTS API
DelFullTextIndex (by index number)
Declaration
void CTTable::DelFullTextIndex(NINT IndexNumber)Parameters:
- IndexNumber [IN]
Description
Delete a Full-Text index from a table using the index number.
Return Values
| Value | Symbolic Constant | Explanation |
|---|---|---|
| 0 | CTDBRET_OK | Successful operation. |
See Errors for a complete listing of valid c-tree Plus error values.
See Also
- DelFullTextIndex (by index name)
- Class: CTTable
- FairCom DB API C++, Java, and .NET FTS API
GetFullTextIndex (by index name)
Declaration
CTFullTextIndex CTTable::GetFullTextIndex(const CTString& name)Parameters:
- name [IN] - Index name
Description
Retrieve the Full-Text Index using its name.
Return Values
Return an index object.
See Also
- Class: CTTable
- FairCom DB API C++, Java, and .NET FTS API
GetFullTextIndex (by index number)
Declaration
CTFullTextIndex CTTable::GetFullTextIndex(NINT IndexNumber)Parameters:
- IndexNumber [IN]
Description
Retrieve a Full-Text Index object using the index number.
Return Values
Return an index object.
See Also
- GetFullTextIndex (by name)
- Class: CTTable
- FairCom DB API C++, Java, and .NET FTS API
GetFullTextIndexCount
Declaration
COUNT CTTable::GetFullTextIndexCount() constParameters:
None.
Description
Retrieve the number of Full-Text Indexes of this table.
Return Values
Return the number of indexes.
See Also
- Class: CTTable
- FairCom DB API C++, Java, and .NET FTS API
SetFullTextIndexOption (by index object)
Declaration
void CTTable::SetFullTextIndexOption(CTFullTextIndex& pIndex, UCOUNT option, CTString &pvalue, ULONG lvalue)Parameters:
- pIndex [IN] - Full-Text Index object.
- option [IN] - Option to set.
- pvalue [IN] - Value to set the option to for option taking string value.
- lvalue [IN] - Value to set the option to for option taking long value.
Description
Set Full-Text Index options using the Full-Text Index object.
See Also
- SetFullTextIndexOption (by number)
- SetFullTextIndexOption (by name)
- Class: CTTable
- FairCom DB API C++, Java, and .NET FTS API
SetFullTextIndexOption (by name)
Declaration
void CTTable::SetFullTextIndexOption(const CTString& IndexName, UCOUNT option, CTString &pvalue, ULONG lvalue)Parameters:
- IndexName [IN] - Full-Text Index name.
- option [IN] - Option to set.
- pvalue [IN] - Value to set the option to for option taking string value.
- lvalue [IN] - Value to set the option to for option taking long value.
Description
Set Full-Text Index options using the Full-Text Index name.
See Also
- SetFullTextIndexOption
- SetFullTextIndexOption (by number)
- Class: CTTable
- FairCom DB API C++, Java, and .NET FTS API
SetFullTextIndexOption (by number)
Declaration
void CTTable::SetFullTextIndexOption(NINT IndexNbr, UCOUNT option, CTString &pvalue, ULONG lvalue)Parameters:
- IndexNbr [IN] - Full-Text Index number.
- option [IN] - Option to set.
- pvalue [IN] - Value to set the option to for option taking string value.
- lvalue [IN] - Value to set the option to for option taking long value.
Description
Set Full-Text Index options using the Full-Text Index number.
See Also
- SetFullTextIndexOption
- SetFullTextIndexOption (by name)
- Class: CTTable
- FairCom DB API C++, Java, and .NET FTS API
Class: CTRecord
The following methods are available in the CTRecord class:
- SetDefaultFullTextIndex
- FullTextSearchOn
- IsFullTextSearchOn
- FullTextSearchOff
- GetFullTextSearchErrorDetails
See Also
- FairCom DB API C++, Java, and .NET FTS API
FullTextSearchOff
Declaration
void CTRecord::FullTextSearchOff()Parameters:
None.
Description
Turn off Full-Text Search.
Return Values
None.
See Also
- Class: CTRecord
- FairCom DB API C++, Java, and .NET FTS API
FullTextSearchOn
Declaration
void CTRecord::FullTextSearchOn(CTString &query)Parameters:
- query [IN] - The full text search query
Description
Set the Full-Text Search criteria.
Note: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.
Return Values
None.
See Also
- Class: CTRecord
- FairCom DB API C++, Java, and .NET FTS API
GetFullTextSearchErrorDetails
Declaration
void CTRecord::GetFullTextSearchErrorDetails(pNINT error, pNINT position)Parameters:
- error [OUT] - error occurred.
- position [OUT] - position of the error in the Full Text Search query string.
Returns:
Description
Get detail about Full-Text Search error.
Return Values
None.
See Also
- Class: CTRecord
- FairCom DB API C++, Java, and .NET FTS API
IsFullTextSearchOn
Declaration
CTBOOL CTRecord::IsFullTextSearchOn()Parameters:
None.
Description
Indicate if Full-Text Search is active on the record.
Return Values
Return YES if the Full-Text Search is on.
See Also
- Class: CTRecord
- FairCom DB API C++, Java, and .NET FTS API
SetDefaultFullTextIndex
Declaration
void CTRecord::SetDefaultFullTextIndex(NINT indexno)Parameters:
- indexno [IN] - The new record Full Text index number.
Description
Set the new record default Full Text index number.
Return Values
None.
See Also
- Class: CTRecord
- FairCom DB API C++, Java, and .NET FTS API
Class: CTFullTextIndex::CTFullTextIndex()
The following methods are available in this class:
- AddField
- GetField
- GetFieldCount
- GetFieldMode
- CTFullTextIndex::GetStatus
- CTFullTextIndex::SetOption
- GetName
- GetNumber
Constructor:
CTFullTextIndex::CTFullTextIndex(const CTFullTextIndex& pIndex)Destructor:
CTFullTextIndex::~CTFullTextIndex()Operator: =
Assign a CTIndex object to another.
CTFullTextIndex& CTFullTextIndex::operator=(const CTFullTextIndex& pIndex)Parameters:
None.
Returns:
Return the assigned object.
See Also
- FairCom DB API C++, Java, and .NET FTS API
AddField
Declaration
void CTFullTextIndex::AddField(const CTField& pField, ULONG mode)Parameters :
- pField [IN] - Field object.
- mode [IN] - Full-Text Search field matching mode.
Description
Add a new field to a Full-Text Index.
See Also
- Class: CTFullTextIndex::CTFullTextIndex()
- FairCom DB API C++, Java, and .NET FTS API
GetField
Declaration
CTField CTFullTextIndex::GetField(NINT FieldNumber)Parameters:
- FieldNumber [IN]
Description
Retrieve a Full-Text Index field.
Return Values
Field object.
See Also
- Class: CTFullTextIndex::CTFullTextIndex()
- FairCom DB API C++, Java, and .NET FTS API
GetFieldCount
Declaration
NINT CTFullTextIndex::GetFieldCount() constParameters:
None.
Description
Retrieve the Full-Text Index field count.
Return Values
Return the Full-Text Index field count.
See Also
- Class: CTFullTextIndex::CTFullTextIndex()
- FairCom DB API C++, Java, and .NET FTS API
GetFieldMode
Declaration
ULONG CTFullTextIndex::GetFieldMode(NINT FieldNumber)Parameters:
- FieldNumber [IN]
Description
Return the field mode specified when adding the field to the FTI.
Return Values
Return the field mode or 0 on error.
See Also
- Class: CTFullTextIndex::CTFullTextIndex()
- FairCom DB API C++, Java, and .NET FTS API
GetName
Declaration
CTString CTFullTextIndex::GetName()Parameters:
None.
Description
Retrieve the Full-Text Index name.
Return Values
Return the Full-Text Index name.
See Also
- Class: CTFullTextIndex::CTFullTextIndex()
- FairCom DB API C++, Java, and .NET FTS API
GetNumber
Declaration
NINT CTFullTextIndex::GetNumber()Parameters:
None.
Description
Retrieve the Full-Text Index number.
Return Values
Return the Full-Text Index number.
See Also
- Class: CTFullTextIndex::CTFullTextIndex()
- FairCom DB API C++, Java, and .NET FTS API
GetStatus
Declaration
ULONG CTFullTextIndex::GetStatus() constParameters:
None.
Description
Retrieve the status of the index handle. The status of the index handle is a bit map describing one or more of the following:
- CTDBFTI_OLD - Original value (no changes)
- CTDBFTI_NEW - Index added
- CTDBFTI_DEL - Original Index deleted
- CTDBFTI_OPTUPD - Optinos have been updated
Return Values
Return the index handle status.
See Also
- Class: CTFullTextIndex::CTFullTextIndex()
- FairCom DB API C++, Java, and .NET FTS API
SetOption
Declaration
void CTFullTextIndex::SetOption(UCOUNT option, CTString &pvalue, ULONG lvalue)Parameters:
- option [IN] - Option to set.
- pvalue [IN] - Value to set the option to for option taking string value.
- lvalue [IN] - Value to set the option to for option taking long value.
Description
Set Full-Text Index options.
See Also
- Class: CTFullTextIndex::CTFullTextIndex()
- FairCom DB API C++, Java, and .NET FTS API
Class: CTFullTextIndexDictionary
FairCom DB API C++ API Full-Text Index methods:
- CTFullTextIndexDictionary::SetOption
Constructor:
CTFullTextIndexDictionary::CTFullTextIndexDictionary(const CTSession &session)Constructor:
CTFullTextIndexDictionary::CTFullTextIndexDictionary(const CTDatabase &database)Constructor:
CTFullTextIndexDictionary::CTFullTextIndexDictionary(const CTTable &table)Destructor:
CTFullTextIndexDictionary::~CTFullTextIndexDictionary()See Also
- FairCom DB API C++, Java, and .NET FTS API
CTFullTextIndexDictionary::SetOption
Declaration
void CTFullTextIndexDictionary::SetOption(UCOUNT option, CTString &pvalue, ULONG lvalue)Parameters:
- option [IN] - Option to set.
- pvalue [IN] - Value to set the option to for option taking string value.
- lvalue [IN] - Value to set the option to for option taking long value.
Description
Set Full-Text Index dictionary options.
Return Values
| Value | Symbolic Constant | Explanation |
|---|---|---|
| 0 | CTDBRET_OK | Successful operation. |
See Errors for a complete listing of valid c-tree Plus error values.
See Also
- Class: CTFullTextIndexDictionary
- FairCom DB API C++, Java, and .NET FTS API
ctCopyFile extended to copy Full-Text Indexes
In FairCom DB V11.5 and later, the ctCopyFile and ctfcpCopyFile functions support copying Full-Text Index (FTI) files belonging to the copied table.
- The original FTI name and the new names need to be specified after all physical index names.
- The FTIs need to be specified by the name used when creating them (i.e. name with directory, when used, and without extension).
Tokenizers
Internally, FairCom Full-Text Search uses a "tokenizer" to divide text into "tokens," which are roughly equivalent to a list of categorized words. A tokenizer follows a set of rules for extracting tokens (usually single words) from a text string or search query. Several algorithms can be used to tokenize text. The simplest uses white space to determine the boundaries between words. More advanced algorithms can be used when necessary.
FairCom Full-Text Search provides built-in support for several tokenizers:
| Type | Algorithm | Recommended Usage |
|---|---|---|
| Simple | Essentially uses white space and punctuation to delimit tokens. Not case-sensitive. | This is the default. Allows for quick and easy searching. |
| Porter | Uses "stemming" to reduce words to a common root to allow grammatically similar words to be matched. For example, "searching" and "searched" have the same stem, "search." | The Porter tokenized creates more compact indices by compacting words to their simplest form. This type of stemming can return false positive results. |
| ICU | Allow international support following Unicode rules for handling supported languages. Case-sensitive (depending on configuration). | When Unicode support is required, this is the recommended tokenizer. |
| Custom | Allows FairCom customers to develop their own tokenizers for special requirements. | You can create a custom tokenizer if you have special requirements. Sample code is provided to get you started. |

See also:
- Full-Text Search ICU Tokenizer
- Full-Text Search Custom Tokenizer
Full-Text Search ICU Tokenizer
FairCom Full-Text Search capabilities are available on Unicode text by using a ICU based tokenizer. An application can configure the ICU tokenizer as follows:
- Call ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_TOKENIZER, NULL, CTDB_FTI_TOKENIZER_ICU)
- Call ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_ICULANG, XXXX , 0) where XXXX is a string specifying the locale
- Call ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_ICUOPTION, NULL, YYYY) where YYYY is a ctKSEG_COMPU* combination.
Calls to ctdbAddFTIField or ctdbAddFTIFieldByName can specify the mode parameter using the following values:
- CTDB_FTI_MODE_REG: source string encoding depends on the DODA, CT_*STRING are considered in UTF-8 format, Ct_*UNICODE are considered in UTF-16 format.
- CTDB_FTI_MODE_UTF-8: source string in UTF-8 encoding
- CTDB_FTI_MODE_UTF-16: source string in UTF-16 encoding
The CTDB_FTI_OPTION_ICUSTRENGTH option has been renamed to CTDB_FTI_OPTION_ICUOPTION.
Full-Text Search Custom Tokenizer
In addition to the default tokenizers provided with FairCom DB FTS, support is available to call a DLL containing a custom tokenizer. FairCom Full-Text Search allows programmers to create their own full-text tokenizer and set it as the tokenizer to be used in a full-text index.
To use a custom tokenizer, the programmer must do the following:
- Provide a DLL that the server can load.
- When creating the full-text index, use the ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_CUSTLIB...) function to specify the library name (without the .DLL extension on Windows and without the lib on Unix).
- Use the ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_CUSTPARAM...) function to pass custom configuration parameters to the tokenizer.
The c-tree source code in the sdk\Xtras\ctree.samples\special\tokenizer directory contains an example of a custom tokenizer, easytok.c, and a stub for the tokenizer to be implemented by programmers in tokenizer.c. Both files have no dependency on any c-tree code and can be simply compiled as a DLL (for instance cl /LD easytok.c on Windows) or shared library on Unix and copied to a place where the server can load them.
See the complete FairCom Full-Text Search documentation for a list the functions that must be implemented.
Tokenizer_init
Initialize the tokenizer. This function is called:
- at index creation to verify that the shared library/DLL can be properly open, the functions are resolved, and they are callable.
- before starting the text to be indexed tokenization
- when parsing search query in order to tokenize the request
Parameters:
- texttype [IN] - the type of the text passed in: CTDB_FTI_MODE_REG, CTDB_FTI_MODE_UTF8, CTDB_FTI_MODE_UTF16
-
text [IN] - text to be tokenized. The memory it points to it is guaranteed to be valid until the Tokenizer_reset or the Tokenizer_end call.
Notice that the text is passed only at init time so it is the implementer's responsibility to keep track of it and the "current position" - textsize [IN] - size (in bytes) of the text passed in
- maxtokensize [IN] - size (in bytes) of the maximum token length the Index has been set to
- param [IN] - tokenizer parameter string passed by the application.
- errcode [OUT] - error code. (Guaranteed to be != NULL)
Returns:
Tokenize context handle that will be passed to the other functions.
NULL in case of error.
Usage:
DLLexport void* Tokenizer_init (unsigned long texttype, char* text, size_t textsize, long maxtokensize, char* param, int* errcode)
Tokenizer_reset
Resets the text and its size for an already initialized tokenizer.
This function is used mainly during searches to tokenize the various search items of a search query.
Parameters:
- handle [IN] - tokenizer context handle that needs the text to be refreshed.
- text [IN] - text to be tokenized. The memory it points to it is guaranteed to be valid until the next Tokenizer_reset or the Tokenizer_end call.
- textsize [IN] - size (in bytes) of the text passed in.
Returns:
CTDBRET_OK if successful, or the c-tree error code on failure.
Usage:
DLLexport int Tokenizer_reset (void *handle, char* text, size_t textsize)
Tokenizer_next
Determines and returns the next token in the text.
Parameters:
- handle [IN] - tokenizer context handle
-
size [OUT] - length in bytes of the returned token. (Guaranteed to be != NULL)
In case of error, this needs to be set != 0.
In case of end-of-text (no more tokens) needs to be set to 0.
Returns:
'\0' terminated string containing the next token, which needs to point to memory that needs to stay valid until the next tokenizer function call.
NULL in case of error (size != 0) or end-of-text (size == 0)
Usage:
DLLexport char *Tokenizer_next (void* handle, int *size)
Tokenizer_end
Terminates the use of the tokenizer. It is the implementer's responsibility to release any resource it might have allocated.
Parameters:
- handle [IN] - tokenizer context handle.
Usage:
DLLexport void Tokenizer_end (void* handle)