FairCom DB Full-Text Search

Fast, efficient access to textual data.

Today's applications process vast amounts of textual data. Full-Text Search (FTS) is a great mechanism for fast, efficient access to character-type data elements. c-tree applications with large volumes of text can now be complemented with high-performing text search capabilities.

Similar to a traditional b-tree index over a c-tree data file, you may now define a Full-Text Index (FTI) by specifying which character-type fields to include in this search index. An additional set of FTI files will be maintained on disk.

This support is on a file-by-file basis (the same as a typical c-tree index).

Once an FTI is defined for a file, it is maintained in “real-time” along with any other b-tree type c-tree indexes, including deferred index support.

Accessing data records through a full-text search index is simple. Using new API search functions, you provide a word or phrase (text) for which you are searching. All records whose FTI-indexed fields contain this text are returned. It is left up to the developer to utilize this result as needed for the application.

The first step in using Full-Text Search is creating an FTS Index (FTI). Functions are available through several different APIs for creating the index, adding a field to the index, setting the default index, and managing the FTI handle. Functions also allow options to be set and other parameters to be controlled.

The FTS indexes are maintained as records are added to the table. Internally, a tokenizer divides the text into "tokens" (which are roughly equivalent to a list of categorized words).

When a full-text search is performed, the function is passed one or more words. Each word is compared to the tokens in the FTS Index. The function returns the records that contain those words in their indexed fields.

Additional information can be returned, such as the proximity of multiple tokens. For example, when searching for "FairCom" and "database", they are in closer proximity in the text "FairCom database" than in "FairCom announces new enhancements to its highly acclaimed database." Statistics about relevance can also be returned.

Full-Text Search is presently available in the following APIs:

  1. FairCom DB API C
  2. FairCom DB API C++
  3. FairCom DB API Java
  4. FairCom DB API .NET

Full-Text Search is supported in client/server only.

Functions are available to create a full-text index on the specified fields of a single table, even an existing table. A Full-Text Index can be defined on any text field (as specified in the data file's schema).

As an example, to add full-text search support to a table at the FairCom DB API C level, call ctdbAddFTI. This function returns a CTHANDLE to be used with FTS functions to add a field to the list of fields that will be indexed by the FTS Index.

A Full-Text Index can be added to an existing table by calling ctdbAddFTI, setting the various properties (including whether the index is created immediately, or as a deferred index, and then calling ctdbAlterTable.

 

Quick Start Guide to FTS

This chapter will get you up and running with FairCom Full-Text Search (FTS) in a hurry.

Currently, FairCom provides support at the navigational API level for interfacing with FTS. Support is presently being added for the relational APIs.

 

 

Adding a Full-Text Index (FTI) to a New Table

To programmatically add FTS support to an empty data file is very straight forward: 

  1. Add a new FTS index using the ctdbAddFTI() function call. This defines the index. 
  2. Add a field to the new index file using the ctdbAddFTIFieldByName() function call. This tells FairCom DB which field to fully index. At this time only a single field can be part of an FTI. Multiple FTI can be defined over the same table.
  3. To finish creating the new table, call ctdbCreateTable() to complete the file creation, including the new FTI.

See ctdbCreateTable.

That's it! You are now ready to start adding data. 

Here is an example:

CTHANDLE dTable;
	pFTI = ctdbAddFTI(dTable, "myFTSidx");
	if (!pFTI)
		Handle_Error(hSession, "ctdbAddFTI");
	if (ctdbAddFTIFieldByName(pFTI, 0, "body", CTDB_FTI_MODE_REG))
		Handle_Error(hSession, "ctdbAddFTIFieldByName");

 

Adding an FTI to an Existing File

If you are adding an FTI over an existing FairCom DB data file that is already populated, then once you have performed the first 2 steps above, simply call ctdbAlterTable(dTable, CTDB_ALTER_INDEX).

 

Performing an FTS Query

The following code snippet demonstrates a query using the FairCom DB API interface:

[...]
if (ctdbFTSearchOn(hRecord, string, 0))
 Handle_Error(hSession, "ctdbFTSearchOn()");
rv = ctdbFirstRecord(hRecord);
while (rv == NO_ERROR)
{  
if (ctdbGetFieldAsString(hRecord, 0, string, 65536) != CTDBRET_OK) 
  Handle_Error(hSession, "ctdbGetFieldAsString()");
 printf("%s\n",string);
 rv = ctdbNextRecord(hRecord);
}

  
if (ctdbFTSearchOff(hRecord))
 Handle_Error(hSession, "ctdbFTSearchOff()");
[...]

 

Full-Text Query Syntax

 

Optional Quotation Marks

The use of single quotation marks to delimit words in an FTS query string is optional. Words in a query string may be delimited in either of two ways:

  • They may be surrounded by single quotation marks.
    or
  • They may be delimited by a space character (0x32).

A phrase must be surrounded by single quotation marks.

 

Operators

The following terms are treated as operators when they are not surrounded by single quotes:

and
near
near/x (where x is a numeric value, as explained later under NEAR Queries)
or (treated as an operator but returns an error because we do not currently support this operator)

 

Examples

The following are examples of valid queries:

To search for documents containing the three words "can", "be", and "found":

"can be found"

To search for documents containing the word "can" near the word "found":

"can NEAR found"

 

Very First Token Search

The caret (^) character indicates to search from the beginning of the column. Only one caret is allowed in an FTS query.

SQL:

All words from the “document” column from the “docs” table for which "linux" is the first token:

SELECT * FROM docs WHERE document MATCH '^linux';

FairCom DB API (pseudo code):

ftSearchOn("document:^linux");

 

Phrase Queries

A phrase query retrieves all documents that contain a nominated set of terms or term prefixes in a specified order with no intervening tokens. Phrase queries are specified by enclosing a space-separated sequence of terms or term prefixes in double quotes ("). For example:

The following example is a query for all documents that contain the phrase "linux applications":

SELECT * FROM docs WHERE document MATCH '"linux applications"';

FairCom DB API (pseudo code):

ftSearchOn("document:'linux applications'");

 

Token NOT present Queries

It is possible to exclude from filter results, rows that contain a keyword by using a “minus” before the keyword.

For example: find all documents which contain the keyword “linux” but exclude those that also include the keyword “applications”

SELECT * FROM docs WHERE document MATCH '"linux” -“applications"';

Multiple keywords for exclusion can also be used. For example, find rows where the “document” column contains the “linux” keyword but not “applications” or “database”

SELECT * FROM docs WHERE document MATCH '"linux” -“applications" -”database”';

 

NEAR Queries

A NEAR query returns documents that contain two or more nominated tokens within a specified proximity of each other. NEAR is a binary operator: it accepts two and only two arguments in the form: A NEAR B.

A NEAR query is specified by putting the keyword NEAR between two phrase, term, or prefix queries. By default, NEAR searches for 10 or fewer intervening tokens between the two arguments. To specify a proximity other than the default, an operator of the form NEAR/<N> may be used, where <N> is the maximum number of intervening terms allowed.

SQL:

Use the following query to search for a document that contains the tokens "faircom" and "database" with not more than 6 intervening tokens. This matches the only document in table docs. Note that the order in which the tokens appear in the document does not have to be the same as the order in which they appear in the query.

SELECT * FROM docs WHERE document MATCH 'database NEAR/6 faircom';

FairCom DB API (pseudo code):

ftSearchOn("document:database NEAR/6 faircom");

 

Multiple NEAR Operators

More than one NEAR operator may appear in a single query. In this case each pair of terms or phrases separated by a NEAR operator must appear within the specified proximity of each other in the document. Using the same table and data as in the block of examples above:

The following query selects documents that contain an instance of the term "FairCom" separated by two or fewer terms from an instance of the term "acid", which is separated by two or fewer terms from an instance of the term "relational".

SELECT * FROM docs WHERE document MATCH 'faircom NEAR/2 acid NEAR/2 relational';

This option will require one index lookup for every term in the phrase, saving the doclist and then an intersection of those search results based on the specified distance.

FairCom DB API (pseudo code):

ftSearchOn("document:faircom NEAR/2 acid NEAR/2 relational");

 

Precedence

NEAR takes precedence over AND. For example:

A AND B NEAR C

instructs FTS to search for records that contain token A and that also contain token B in close proximity ("NEAR") to token C.

 

Full-Text Search Query on Hyphenated Words

The FairCom full-text search supports querying on hyphenated words. Hyphenated words are converted to phrases. For example, searching for the word "self-contained" searches for the phrase 'self contained'.

What is commonly considered a "word" may be ignored either because the tokenizer is written to ignore specific words (e.g., words with fewer than a specified number of characters) or because the sequence of characters is on the stop list of words specifically ignored. In the following example, querying the word "give-and-take" would search for the phrase 'give and take', however the word "and" may be ignored.

If the hyphenated word is in a phrase, it is converted to individual words in that phrase.

 

Example FTS Queries

The following are some examples of valid queries:

To search for documents containing the three words "can", "be", and "found":

'can be found'

To search for documents containing the word "can" near the word "found":

'can' NEAR 'found'

In the next examples, "near" is used as a search term rather than an operator, which means that it requires single quotes.

To search for the words "cans", "near", and "trash":

'cans' 'near' 'trash'

To search for "near" and "and" near "cans":

'cans' NEAR/2 'near' NEAR/2 'and' NEAR/2 'trash'

To search for usage of "NEAR/2" in a document:

'near/2'

To perform a phrase match, the phrase must be enclosed in single quotation marks per FTS query syntax defined at the FairCom DB API level. In SQL, that mandates two single quotes since the entire query is surrounded by a single quote (because it is a literal). Therefore, quotes need to be escaped in the standard way (that is double them).

(This implies the entire query is a phrase match since the final query is made by 3 single quotes, the phrase and 3 single quotes.)

SELECT * FROM docs WHERE document MATCH '(''faircom'' AND ''database'')'

For convenience, the FairCom DB SQL syntax now allows wrapping phrases within the FTS query literal with double quotes (that do not need to be escaped).

SELECT * FROM docs WHERE document MATCH '(“faircom” AND “database”)'

Both the old and the new methods (double quote or double single quotes) of specifying phrase search can be used for SQL FTS queries. 

 

FTS: Quotes, Operators, Examples

 

Optional Quotation Marks

The use of single quotation marks to delimit words in an FTS query string is optional. Words in a query string may be delimited in either of two ways:

  • They may be surrounded by single quotation marks.
    or
  • They may be delimited by a space character (0x32).

A phrase must be surrounded by single quotation marks.

 

Operators

The following terms are treated as operators when they are not surrounded by single quotes:

and
near
near/x (where x is a numeric value, as explained later under NEAR Queries)
or (treated as an operator but returns an error because we do not currently support this operator)

 

Examples

The following are examples of valid queries:

To search for documents containing the three words "can", "be", and "found":

"can be found"

To search for documents containing the word "can" near the word "found":

"can NEAR found" 

 

FTS: Query Syntax

 

Very First Token Search

The caret (^) character indicates to search from the beginning of the column. Only one caret is allowed in an FTS query.

SQL:

All words from the “document” column from the “docs” table for which "linux" is the first token:

SELECT * FROM docs WHERE document MATCH '^linux';

FairCom DB API (pseudo code):

ftSearchOn("document:^linux");

 

Phrase Queries

A phrase query retrieves all documents that contain a nominated set of terms or term prefixes in a specified order with no intervening tokens. Phrase queries are specified by enclosing a space-separated sequence of terms or term prefixes in double quotes ("). For example:

The following example is a query for all documents that contain the phrase "linux applications":

SELECT * FROM docs WHERE document MATCH '"linux applications"';

FairCom DB API (pseudo code):

ftSearchOn("document:'linux applications'");

 

Token NOT present Queries

It is possible to exclude from filter results, rows that contain a keyword by using a “minus” before the keyword.

For example: find all documents which contain the keyword “linux” but exclude those that also include the keyword “applications”

SELECT * FROM docs WHERE document MATCH '"linux” -“applications"';

Multiple keywords for exclusion can also be used. For example, find rows where the “document” column contains the “linux” keyword but not “applications” or “database”

SELECT * FROM docs WHERE document MATCH '"linux” -“applications" -”database”';

 

NEAR Queries

A NEAR query returns documents that contain two or more nominated tokens within a specified proximity of each other. NEAR is a binary operator: it accepts two and only two arguments in the form: A NEAR B.

A NEAR query is specified by putting the keyword NEAR between two phrase, term, or prefix queries. By default, NEAR searches for 10 or fewer intervening tokens between the two arguments. To specify a proximity other than the default, an operator of the form NEAR/<N> may be used, where <N> is the maximum number of intervening terms allowed.

SQL:

Use the following query to search for a document that contains the tokens "faircom" and "database" with not more than 6 intervening tokens. This matches the only document in table docs. Note that the order in which the tokens appear in the document does not have to be the same as the order in which they appear in the query.

SELECT * FROM docs WHERE document MATCH 'database NEAR/6 faircom';

FairCom DB API (pseudo code):

ftSearchOn("document:database NEAR/6 faircom");

 

Multiple NEAR Operators

More than one NEAR operator may appear in a single query. In this case each pair of terms or phrases separated by a NEAR operator must appear within the specified proximity of each other in the document. Using the same table and data as in the block of examples above:

The following query selects documents that contain an instance of the term "FairCom" separated by two or fewer terms from an instance of the term "acid", which is separated by two or fewer terms from an instance of the term "relational".

SELECT * FROM docs WHERE document MATCH 'faircom NEAR/2 acid NEAR/2 relational';

This option will require one index lookup for every term in the phrase, saving the doclist and then an intersection of those search results based on the specified distance.

FairCom DB API (pseudo code):

ftSearchOn("document:faircom NEAR/2 acid NEAR/2 relational");

 

Precedence

NEAR takes precedence over AND. For example:

A AND B NEAR C

instructs FTS to search for records that contain token A and that also contain token B in close proximity ("NEAR") to token C.

 

Full-Text Search Query on Hyphenated Words

The FairCom full-text search supports querying on hyphenated words. Hyphenated words are converted to phrases. For example, searching for the word "self-contained" searches for the phrase 'self contained'.

What is commonly considered a "word" may be ignored either because the tokenizer is written to ignore specific words (e.g., words with fewer than a specified number of characters) or because the sequence of characters is on the stop list of words specifically ignored. In the following example, querying the word "give-and-take" would search for the phrase 'give and take', however the word "and" may be ignored.

If the hyphenated word is in a phrase, it is converted to individual words in that phrase.

 

Example FTS Queries

The following are some examples of valid queries:

To search for documents containing the three words "can", "be", and "found":

'can be found'

To search for documents containing the word "can" near the word "found":

'can' NEAR 'found'

In the next examples, "near" is used as a search term rather than an operator, which means that it requires single quotes.

To search for the words "cans", "near", and "trash":

'cans' 'near' 'trash'

To search for "near" and "and" near "cans":

'cans' NEAR/2 'near' NEAR/2 'and' NEAR/2 'trash'

To search for usage of "NEAR/2" in a document:

'near/2'

 

Easier Full-Text Search (FTS) MATCH Operator Syntax

To perform a phrase match, the phrase must be enclosed in single quotation marks per FTS query syntax defined at the FairCom DB API level. In SQL, that mandates two single quotes since the entire query is surrounded by a single quote (because it is a literal). Therefore, quotes need to be escaped in the standard way (that is double them).

(This implies the entire query is a phrase match since the final query is made by 3 single quotes, the phrase and 3 single quotes.)

SELECT * FROM docs WHERE document MATCH '(''faircom'' AND ''database'')'

For convenience, the FairCom DB SQL syntax now allows wrapping phrases within the FTS query literal with double quotes (that do not need to be escaped).

SELECT * FROM docs WHERE document MATCH '(“faircom” AND “database”)'

Both the old and the new methods (double quote or double single quotes) of specifying phrase search can be used for SQL FTS queries. 

 

SQL search examples

 

SQL full-text search examples using PORTER English stemming tokenizer

The PORTER stemming tokenizer identifies each word in the search using whitespace and punctuation. Then, regardless of case, the tokenizer finds all matching words by reducing each word in the search to its stem by removing trailing s, er, ing, and ed.

A search string identifies each word using whitespace and/or punctuation and may use double quotes to identify a word or a phrase. Phrases contain two or more words within double quotes and may include whitespace between words. Punctuation, such as .!>, is ignored in the search string when it is not a full-text search operator. Reserved search operators include - " ^ * ( ) NEAR NEAR/n AND OR NOT; However, * ( ) OR NOT are not supported and will return an error when used.

/* =====================================================================

 * Full-text search demo using the PORTER tokenizer.

 * =====================================================================

 */

/* Create the testsearch table. */

CREATE TABLE testsearch

(

  name VARCHAR(30)

, description VARCHAR(500) 

) 

STORAGE_ATTRIBUTES 'hotAlter; huge; rowid_fld; noRecbyt_idx;'

;

/* Create full-text index on the description field of the testsearch table. Use the PORTER English word stem tokenizer to do case insensitive matching of common word forms */

CREATE FULLTEXT INDEX testsearch_description_porter_fts  

  ON  testsearch( description )  

  TOKENIZER    PORTER  

  UPDATE_MODE  ASYNC

;

/* Insert a record into the testsearch table. */

INSERT INTO testsearch 

VALUES('Mike','Americans   --love--  their __BASEball__ and <FOOTball>!')

;

/* Look at the record inserted into the testsearch table. */

SELECT  description FROM testsearch;

/* =====================================================================

 * The following queries find the inserted record 

 * by searching for words and phrases in the description field:

 * 'Americans   --love--  their __BASEball__ and <FOOTball>!'. 

 * 

 * The PORTER tokenizer reduces the text to 

 * 'american love their baseball and football'. 

 * 

 * The tokenizer identifies each word using whitespace and/or punctuation.

 * The tokenizer is case insensitive.

 * The tokenizer reduces each word to its stem 

 * by removing trailing s, er, ing, and ed.

 * The tokenizer finds all words matching the stem.

 * These words match each other -- even when they are invalid words: 

 * baseball baseballs baseballing baseballer baseballed

 * 

 * A search string identifies each word using whitespace and/or punctuation.

 * A search string may use double quotes to identify a word or a phrase.

 * A phrase contains two or more words within double quotes.

 * A search string may include any amount of whitespace between words.

 * A search string may include 

 * Punctuation, such as .!>, is ignored in the search string 

 * when it is not a full-text search operator. 

 * Reserved search operators include - " ^ * ( ) NEAR NEAR/n AND OR NOT

 * The following search operators are unsupported: * ( ) OR NOT

 * The server returns an error when you use an unsupported search operator.

 * =====================================================================

 */

/* WORD SEARCH: Find records in the testsearch table where the description column contains the word 'baseball' anywhere in the description field. */

SELECT description FROM testsearch WHERE description MATCH 'baseball';

/* VERB STEM SEARCH: Find records in the testsearch table where the description column contains common English conjugations of verbs, such as 'loving', 'love', 'loved', and 'loves', anywhere in the description field. */

SELECT description FROM testsearch WHERE description MATCH 'loving';

/* NOUN STEM SEARCH: Find records in the testsearch table where the description column contains English singular and plural forms of nouns, such as 'american' and 'americans', anywhere in the description field. */

SELECT description FROM testsearch WHERE description MATCH 'american';

/* CASE INSENSITIVE SEARCH: Find records in the testsearch table where the description column contains common forms of the word 'baseBALLs', such as 'baseball' and 'Baseballs' anywhere in the description field. */

SELECT description FROM testsearch WHERE description MATCH 'baseBALLs';

/* FIRST WORD SEARCH: Find records in the testsearch table where the description column contains the case-insensitive word 'americans' as the first word in the description field. */

SELECT description FROM testsearch WHERE description MATCH '^americans';

/* FIRST PHRASE SEARCH: Find records in the testsearch table where the description column contains the phrase "americans love" at the begining of the text in the description field. */

SELECT  description FROM testsearch WHERE description MATCH ' ^"americans love" ';

/* MULTI-WORD SEARCH: Find records in the testsearch table where the description column contains the words 'baseball' and 'love' in any order anywhere in the description field. */

SELECT description FROM testsearch WHERE description MATCH 'baseball love';

/* MULTI-WORD SEARCH ACROSS WHITESPACE: Find records in the testsearch table where the description column contains the words 'baseball' and 'love' in any order anywhere in the description field while ignoring case and whitespace. */

SELECT description FROM testsearch WHERE description MATCH ' Baseball   >>>   LOVE! ';

/* PHRASE SEARCH: Find records in the testsearch table where the description column contains the case-insensitive phrase "love their" anywhere in the description field. The words 'love" and 'their" must follow each other with no intervening words in the description field. */

SELECT description FROM testsearch WHERE description MATCH ' "love their" ';

/* PHRASE SEARCH ACROSS WHITESPACE: Find records in the testsearch table where the description column contains the case-insensitive phrase "LOVE their"  anywhere in the description field. The case-insensitive words 'LOVE" and 'theIR" must follow each other with no intervening words, but any amount of whitespace and punctuation may occur between these words in the description field. */

SELECT description FROM testsearch WHERE description MATCH ' "LOVE   theIR" ';

/* STEMMED MULTI-WORD SEARCH: Find records in the testsearch table where the description column contains several words anywhere in the description field, such as 'baseBALLs' and 'american'. The PORTER tokenizer matches each word that has the same English stemming. */

SELECT description FROM testsearch WHERE description MATCH 'Loved their american baseBALLs.';

/* STEMMED PHRASE SEARCH: Find records in the testsearch table where the description column contains common forms of each word in the phrase "and football" anywhere in the description field. */

SELECT description FROM testsearch WHERE description MATCH ' "and footballs" ';

/* STEMMED WORD SEARCH ACROSS INCORRECT WORD FORMS: Find records in the testsearch table where the description column contains correct and incorrect forms of words, such as 'baseballing', 'baseballer', 'baseballed'. The PORTER stemmer applies its rules to every word -- even when the rule is incorrect in English. */

SELECT description FROM testsearch WHERE description MATCH ' american americans americaning americaner americaned baseball baseballs baseballing baseballer baseballed their theirs theiring theired "and" "ands" "anding" "anded"  ';

/* STEMMED FIRST WORD SEARCH: Find records in the testsearch table where the description column contain common forms of the word 'americaning' as the first word in the description field. */

SELECT description FROM testsearch WHERE description MATCH '^americaning';

/* STEMMED FIRST PHRASE SEARCH: Find records in the testsearch table where the description column contains common forms of the words in the phrase "american loving" at the begining of the text in the description field. */

SELECT  description FROM testsearch WHERE description MATCH ' ^"american loving" ';

/* NEAR SEARCH: Find records in the testsearch table where the description column contains the words 'Americans' and 'baseball' with no more than ten words between each other anywhere in the description field. */

SELECT description FROM testsearch WHERE description MATCH 'Americans NEAR football';

/* NEAR PHRASE SEARCH: Find records in the testsearch table where the description column contains the words 'their' and 'baseballs' with no words between each other. This is the same as using a phrase search. */

SELECT description FROM testsearch WHERE description MATCH 'their NEAR/0 baseball';

/* NEAR N SEARCH: Find records in the testsearch table where the description column contains the words 'love' and 'baseball' with no more than 1 word between each other. This pattern may occur anywhere in the description field with any amount of whitespace and punctuation between the words. */

SELECT description FROM testsearch WHERE description MATCH ' baseball NEAR/1 football ';

/* NOT WORD SEARCH: Find records in the testsearch table where the description column contains the words 'love' and 'baseball' but not 'karate'. NOTE: The minus operator is the NOT operator. It must be adjacent to the word with no intervening whitespace. You must also include a term in the search phrase before the NOT operator. */

SELECT description FROM testsearch WHERE description MATCH 'love baseball -karate';

/* NOT PHRASE SEARCH: Find records in the testsearch table where the description column contains the word 'baseball' but not the phrase "hate karate" and not the word 'soccer'. NOTE: The minus operator is the NOT operator. It must be adjacent to the word with no intervening whitespace. You must also include a term in the search phrase before the NOT operator. */

SELECT description FROM testsearch WHERE description MATCH 'baseball -"hate karate" -soccer';

/* STEMMED NEAR SEARCH: Find records in the testsearch table where the description column contains case-insensitive common forms of the words 'American' and 'baseball' with no more than ten words between each other anywhere in the description field. */

SELECT description FROM testsearch WHERE description MATCH 'americaner NEAR baseballs ';

/* STEMMED PHRASE SEARCH: Find records in the testsearch table where the description column contains case-insensitive common forms of the words 'their' and 'baseballs' with no words between each other anywhere in the description field. */

SELECT description FROM testsearch WHERE description MATCH 'their NEAR/0 baseBALLS ';

/* STEMMED NEAR N SEARCH: Find records in the testsearch table where the description column contains case-insensitive common forms of the words 'baseball' and 'football' with no more than 1 word between each other anywhere in the description field. */

SELECT description FROM testsearch WHERE description MATCH 'baseballing NEAR/1 footballed ';

/* STEMMED NOT WORD SEARCH: Find records in the testsearch table where the description column contains case-insensitive common forms of the words 'love' and 'baseball' but not 'karate'. NOTE: The minus operator must be adjacent to the word or phrase with no intervening whitespace. You must also have a term in the search phrase before the minus - operator. */

SELECT description FROM testsearch WHERE description MATCH 'loves baseballing -karate';

/* STEMMED NOT PHRASE SEARCH: Find records in the testsearch table where the description column contains case-insensitive common forms of the word 'baseball' but not the phrase "hate karate". */

SELECT description FROM testsearch WHERE description MATCH 'loving baseballs -"hating karate" ';

/* STEMMED NOT PHRASE SEARCH: Find records in the testsearch table where the description column contains case-insensitive common forms of the word 'love' but not case-insensitive common forms of the words in the phrase "Americans football". */

SELECT description FROM testsearch WHERE description MATCH 'Loved -"ameRICAN  footBALLs" ';

/* Remove the testsearch table. */

DROP TABLE testsearch;

 

SQL full-text search examples using SIMPLE tokenizer

The SIMPLE tokenizer identifies each word using whitespace and/or punctuation. Then regardless of case, the tokenizer finds all matching words in the search. tokenizer is case insensitive.

A search string identifies each word using whitespace and/or punctuation and may use double quotes to identify a word or a phrase. Phrases contain two or more words within double quotes and may include whitespace between words. Punctuation, such as .!>, is ignored in the search string when it is not a full-text search operator. Reserved search operators include - " ^ * ( ) NEAR NEAR/n AND OR NOT; However, * ( ) OR NOT are not supported and will return an error when used.

/* =====================================================================

 * Full-text search demo using the SIMPLE tokenizer.

 * =====================================================================

 */

/* Create the testsearch table. */

CREATE TABLE testsearch

(

  name VARCHAR(30)

, description VARCHAR(500) 

) 

STORAGE_ATTRIBUTES 'hotAlter; huge; rowid_fld; noRecbyt_idx;'

;

/* Create full-text index on the description field of the testsearch table. Use the PORTER English word stem tokenizer to do case insensitive matching of common word forms */

CREATE FULLTEXT INDEX testsearch_description_simple_fts  

  ON  testsearch( description )  

  UPDATE_MODE  ASYNC

;

/* Insert a record into the testsearch table. */

INSERT INTO testsearch 

VALUES('Mike','Americans   --love--  their __BASEball__ and <FOOTball>!')

;

/* Look at the record inserted into the testsearch table. */

SELECT  description FROM testsearch;

/* =====================================================================

 * The following queries find the inserted record 

 * by searching for words and phrases in the description field:

 * 'Americans   --love--  their __BASEball__ and <FOOTball>!'. 

 * 

 * The SIMPLE tokenizer reduces the text to 

 * 'americans love their baseball and football'. 

 * 

 * The tokenizer identifies each word using whitespace and/or punctuation.

 * The tokenizer is case insensitive.

 * A search string identifies each word using whitespace and/or punctuation.

 * A search string may use double quotes to identify a word or a phrase.

 * A phrase contains two or more words within double quotes.

 * A search string may include any amount of whitespace between words.

 * A search string may include 

 * Punctuation, such as .!>, is ignored in the search string 

 * when it is not a full-text search operator. 

 * Reserved search operators include - " ^ * ( ) NEAR NEAR/n AND OR NOT

 * The following search operators are unsupported: * ( ) OR NOT

 * The server returns an error when you use an unsupported search operator.

 * =====================================================================

 */

/* WORD SEARCH: Find records in the testsearch table where the description column contains the word 'baseball' anywhere in the description field. */

SELECT description FROM testsearch WHERE description MATCH 'baseball';

/* CASE INSENSITIVE SEARCH: Find records in the testsearch table where the description column contains case-insensitive forms of the word 'americans', such as 'Americans' and 'AMERIcans' anywhere in the description field. */

SELECT description FROM testsearch WHERE description MATCH 'americans';

/* FIRST WORD SEARCH: Find records in the testsearch table where the description column contains the word 'americans' as the first word in the description field. */

SELECT  description FROM testsearch WHERE description MATCH '^americans';

/* FIRST PHRASE SEARCH: Find records in the testsearch table where the description column contains the phrase "americans love" at the begining of the text in the description field. */

SELECT  description FROM testsearch WHERE description MATCH ' ^"americans love" ';

/* MULTI-WORD SEARCH: Find records in the testsearch table where the description column contains the words 'baseball' and 'love' in any order anywhere in the description field. */

SELECT description FROM testsearch WHERE description MATCH 'baseball love';

/* MULTI-WORD SEARCH ACROSS WHITESPACE: Find records in the testsearch table where the description column contains the words 'baseball' and 'love' in any order anywhere in the description field. NOTE: Any amount of whitespace may occur between these words in the search text. Punctuation, such as .!>, is also ignored in the search string when it is not a full-text search operator. */

SELECT description FROM testsearch WHERE description MATCH ' <Baseball>   >>>   .LOVE. !!! ';

/* PHRASE SEARCH: Find records in the testsearch table where the description column contains the case-insensitive phrase "love their" anywhere in the description field. The words 'love" and 'their" must follow each other with no intervening words in the description field. */

SELECT description FROM testsearch WHERE description MATCH ' "love their" ';

/* PHRASE SEARCH ACROSS WHITESPACE: Find records in the testsearch table where the description column contains the case-insensitive phrase "LOVE their"  anywhere in the description field. The case-insensitive words 'LOVE" and 'theIR" must follow each other with no intervening words, but any amount of whitespace and unreserved punctuation may occur between these words in the description field. */

SELECT description FROM testsearch WHERE description MATCH ' "LOVE!   theIR" ';

/* NEAR SEARCH: Find records in the testsearch table where the description column contains the words 'Americans' and 'baseball' with no more than ten words between each other anywhere in the description field. */

SELECT description FROM testsearch WHERE description MATCH 'Americans NEAR football ';

/* NEAR PHRASE SEARCH: Find records in the testsearch table where the description column contains the words 'their' and 'baseballs' with no words between each other. This is the same as using a phrase search. */

SELECT description FROM testsearch WHERE description MATCH 'their NEAR/0 baseball';

/* NEAR N SEARCH: Find records in the testsearch table where the description column contains the words 'love' and 'baseball' with no more than 1 word between each other. This pattern may occur anywhere in the description field with any amount of whitespace and punctuation between the words. */

SELECT description FROM testsearch WHERE description MATCH ' baseball NEAR/1 football ';

/* NOT WORD SEARCH: Find records in the testsearch table where the description column contains the words 'love' and 'baseball' but not 'karate'. NOTE: The minus - operator is the NOT operator. It must be adjacent to the word with no intervening whitespace. You must also include at least one term in the search phrase before the - operator. */

SELECT description FROM testsearch WHERE description MATCH 'love baseball -karate';

/* NOT PHRASE SEARCH: Find records in the testsearch table where the description column contains the word 'baseball' but not the phrase "hate karate" and not the word 'soccer'. NOTE: The minus operator is the NOT operator. It must be adjacent to the word with no intervening whitespace. You must also include a term in the search phrase before the NOT operator. */

SELECT description FROM testsearch WHERE description MATCH 'baseball -"hate karate" -soccer';

/* AND SEARCH: Find records in the testsearch table where the description column contains the word 'baseball' and 'loved'. NOTE: The AND operator is optional. The server automatically adds it between each term in the search expression. The following two queries are functionally identical. */

SELECT description FROM testsearch WHERE description MATCH ' love AND "their baseball" ';

SELECT description FROM testsearch WHERE description MATCH ' love "their baseball" ';

/* SEARCH for reserved words "and", "or", and "near": Find records in the testsearch table where the description column contains the words "baseball" and "and" but not "or" and "near". NOTE: enclose a reserved word in double quotes to treat it as a word instead of an operator. */

SELECT description FROM testsearch WHERE description MATCH ' love "and" -"or" -"near" ';

/* =====================================================================

 * The following queries do NOT find the inserted record 

 * because they do not match the text: 

 * 'Americans   --love--  their __BASEball__ and <FOOTball>!' 

 * =====================================================================

 */

/* VERB STEM SEARCH: This query returns no results because the SIMPLE tokenizer does not match common English conjugations of verbs, such as 'loving', 'love', 'loved', and 'loves'. The PORTER tokenizer does. */

SELECT description FROM testsearch WHERE description MATCH 'loving';

/* NOUN STEM SEARCH: This query returns no results because the SIMPLE tokenizer does not match common English singular and plural forms of nouns, such as  'american' and 'americans'. The PORTER tokenizer does. */

SELECT description FROM testsearch WHERE description MATCH 'american';

/* FIRST WORD SEARCH: This query returns no results because 'love' is not the first word in the description. */

SELECT  description FROM testsearch WHERE description MATCH '^love';

/* STEMMED MULTI-WORD SEARCH: This query returns no results because the SIMPLE tokenizer does not match word stems like the PORTER tokenizer does. */

SELECT description FROM testsearch WHERE description MATCH 'Baseballs american';

/* STEMMED NEAR SEARCH: This query returns no results because the SIMPLE tokenizer does not match word stems like the PORTER tokenizer does. */

SELECT description FROM testsearch WHERE description MATCH 'American NEAR baseballs ';

/* STEMMED PHRASE SEARCH: This query returns no results because the SIMPLE tokenizer does not match word stems like the PORTER tokenizer does. */

SELECT description FROM testsearch WHERE description MATCH 'their NEAR/0 baseballs ';

/* STEMMED NEAR N SEARCH: This query returns no results because the SIMPLE tokenizer does not match word stems like the PORTER tokenizer does. */

SELECT description FROM testsearch WHERE description MATCH 'loves NEAR/1 baseballs ';

/* Remove the testsearch table. */

DROP TABLE testsearch;

 

Supported Features

 

Current Support

The initial release of FairCom Full-Text Search supports the most relevant features typically expected by users of full-text search. Some minor features will be deferred until later phases of product development. This section lists considerations you need to keep in mind when using the current version of FTS.

 

Multiple FTI Supported per Data File

FTS permits multiple Full-Text Indexes per data file.

 

Word Dictionary

The Word Dictionary contains a list of tokens (as generated by the tokenizer) for known words appearing at least once in the indexed documents. Each index uses its own Word Dictionary. Because each tokenizer may encode words in its own way, each Word Dictionary uses a single tokenizer (you cannot change tokenizers without rebuilding the Word Dictionary).

 

Limitations

Several limitations will be addressed in future FTS releases:

 

Limit of 1 Field per FTI

FTS is currently limited to a single field per FTI. This limitation will be released in the future.

OR Operator - The OR operator is not supported in this release. All search terms are considered to be joined by AND operators. This functionality could be created in the application by breaking the search into multiple searches and presenting the union of their results.

Wildcard Search - Wildcard searches are not supported in this release of FTS.

A limited form of wildcard searches, called "term-prefix searches," allows you to search for words that begin with the specified characters.

Parentheses - The use of parentheses to indicate the precedence of parts of a complex search is not presently supported. However, in this release of FTS, all search terms are ANDed together, so precedence should not matter.

 

Full-Text Index stop word list

The Full-Text Search feature supports adding a list of stop words to a Full-Text Index when the word dictionary is created.

To set a stop word list when creating a full text index dictionary, call the ctdbSetFTIOption() function with option set to CTDB_FTI_OPTION_STOP_LIST and pvalue set to a UTF-8 string containing the stop words separated by spaces. Example:

rc = ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_STOP_LIST, "a an the", 0);

 

FairCom DB API FTS API

This section provides reference information about the FairCom DB API API functions that support Full-Text Search.

The following FairCom DB API functions are available for Full-Text Search:

  • ctdbAddFTI - Add a new Full-Text Search Index to a table.
  • ctdbAddFTIField - Add a new field to a Full-Text Search Index.
  • ctdbAddFTIFieldByName - Add a new field to an FTS Index.
  • ctdbAllocFTI - Allocate memory for a new Full-Text Search handle.
  • ctdbCheckFTIBackgroundLoad - Check the status of the background index loader for the FTI.
  • ctdbDelFTI - Delete a table's Full-Text Search configuration.
  • ctdbFreeFTI - Release all resources associated with a Full-Text Search handle.
  • ctdbFTSearchOff - Disable and free an existing Full-Text Search.
  • ctdbFTSearchOn - Set up a new Full-Text Search.
  • ctdbGetFTI - Retrieve the Full-Text Search handle.
  • ctdbGetFTIByName - Retrieve the index handle given the index name.
  • ctdbGetFTIByUID - Retrieve an FTI by its unique ID.
  • ctdbGetFTIField - Retrieve the field handle of the field indicated by FieldNumber.
  • ctdbGetFTIFieldCount - Retrieve the number of segments associated with this index.
  • ctdbGetFTIFieldMode - Return the field mode specified when adding the field to the FTI.
  • ctdbGetFTIHandle - Return the Full-Text Search handle or NULL on failure.
  • ctdbGetFTIName - Retrieve the name of the FTI.
  • ctdbGetFTINbr - Get the number of an FTI in the FTI list.
  • ctdbGetFTINbrByName - Retrieve the index number given the index name.
  • ctdbGetFTIStatus - Retrieve the status of the index handle.
  • ctdbGetTableFTICount - Retrieve the number of FTS indices associated with the table.
  • ctdbIsFTSearchOn - Indicate if a Full-Text Search is active or not.
  • ctdbSetDefaultFTI - Disable and free an existing Full-Text Search.
  • ctdbSetFTICndxExpr - Set the conditional expression for this Full Text Index.
  • ctdbSetFTIOption - Allows several Full-Text Search values to be set depending on the option.
  • ctdbStartFTIBackgroundLoad - Starts a background index loader for the FTI.

 

ctdbAddFTI

Declaration

CTHANDLE ctdbAddFTI(CTHANDLE Handle, pTEXT name)

Description

Add a new Full-Text Search index to a table.

  • Handle is a table handle (CTDBTABLE)
  • name is an Index name

Note: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.

 

Return Values

On success returns the Full-Text Search handle.

On failure returns NULL.

See Errors for a complete listing of valid FairCom DB error values.

See Also

  • FairCom DB API Full-Text Search API

 

ctdbAddFTIField

Declaration

CTDBRET ctdbAddFTIField(CTHANDLE Handle, CTHANDLE FieldHandle, ULONG mode)

Description

Add a new field to a Full-Text Search index.

  • Handle is a Full-Text Search handle (CTDBFTS)
  • FieldHandle is a field handle (CTDBFIELD)
  • mode is a Full-Text search field matching mode:
Mode Code Description
CTDB_FTI_MODE_REG 0x00000001 Regular - Default field treatment.
CTDB_FTI_MODE_UTF8 0x00000002 Field format in UTF8.
CTDB_FTI_MODE_UTF16 0x00000004 Field format in UTF16.

Return Values

Mode Code Description
CTDBRET_NOTFTS 4149 Not a Full-Text Search handle
CTDBRET_FTS_DEFINED 4150 Full-Text Search already defined on table

See Errors for a complete listing of valid FairCom DB error values.

Example

See Full-Text Search Example.

See Also

  • Full-Text Search ICU Tokenizer
  • FairCom DB API Full-Text Search API
  • Full-Text Search Example

 

ctdbAddFTIFieldByName

Declaration

CTDBRET ctdbAddFTIFieldByName(CTHANDLE Handle, NINT Number, pTEXT FieldName, ULONG mode)

Description

Add a new field to an FTS index.

  • Handle is a table handle (CTDBTABLE).
  • Number is a Full-Text Search Index number.
  • FieldName is a Field name.
  • mode is a Full-Text search field matching mode:
Mode Code Description
CTDB_FTI_MODE_REG 0x00000001 Regular - Default field treatment.
CTDB_FTI_MODE_UTF8 0x00000002 Field format in UTF8.
CTDB_FTI_MODE_UTF16 0x00000004 Field format in UTF16.

Return

Error code

Mode Code Description
CTDBRET_NOTFTS 4149 Not a Full-Text Search handle
CTDBRET_FTS_DEFINED 4150 Full-Text Search already defined on table

See Errors for a complete listing of valid FairCom DB error values.

See Also

  • Full-Text Search ICU Tokenizer
  • FairCom DB API Full-Text Search API

 

ctdbAllocFTI

Declaration

pCTDBFTI ctdbAllocFTI(CTHANDLE Handle)

Description

Allocate memory for a new Full-Text Index handle.

  • Handle is a table handle (CTDBTABLE)

Return

The new allocated Full-Text Search handle (pCTDBFTS), or NULL on failure.

See Errors for a complete listing of valid FairCom DB error values.

 

ctdbCheckFTIBackgroundLoad

Check the status of the background index loader for the Full-Text Index (FTI).

Declaration

CTDBRET ctdbLOCAL ctdbCheckFTIBackgroundLoad(pCTDBFTI pFTI, pBGLDINF status)

Parameters

  • pFTI [IN] - Full-Text Index handle.
  • status [OUT] - Status information.

Description

Note: It is not possible to delete an FTI for which a background load has been started and still not finished. Attempting this operation causes the ctdbAlterTable() to fail with error CTDBRET_LOADISACTIVE (4166, new error code).

 

This function was originally named ctdbFTICheckBackgroundLoad, which has been deprecated.

Return

Value Symbolic Constant Explanation
0 CTDBRET_OK Successful operation.
4165 CTDBRET_NOLOAD The background FTI index loader has not been "started" on the specified FTI.

See Errors for a complete listing of valid c-tree Plus error values.

See Also

  • ctdbStartFTIBackgroundLoad()
  • FairCom DB API - FTI Background Initial Load

 

ctdbDelFTI

Declaration

CTDBRET ctdbDelFTI(CTHANDLE Handle, COUNT Number)

Description

Delete a table Full-Text Search configuration.

  • Handle is a table handle (CTDBTABLE).
  • Number is a Full-Text Search configuration number (currently ignored).

Note: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.

 

Return Values

Mode Code Description
CTDBRET_NOTFTS 4149 Not a Full-Text Search handle
CTDBRET_FTS_DEFINED 4150 Full-Text Search already defined on table

See Errors for a complete listing of valid FairCom DB error values.

 

 

ctdbFreeFTI

Declaration

VOID ctdbFreeFTI(CTHANDLE Handle)

Description

Release all resources associated with a Full-Text Search handle.

  • Handle is a Full-Text Search Handle (CTDBFTS)

Return Values

None

See Errors for a complete listing of valid FairCom DB error values.

 

 

ctdbFTSearchOff

Declaration

CTDBRET ctdbFTSearchOff(CTHANDLE Handle)

Description

Disable and free an existing Full-Text Search.

  • Handle is a record handle (CTDBRECORD)

Return Values

Mode Code Description
CTDBRET_NOTFTS 4149 Not a Full-Text Search handle
CTDBRET_FTS_DEFINED 4150 Full-Text Search already defined on table

See Errors for a complete listing of valid FairCom DB error values.

See Also

  • ctdbFTSearchOn
  • ctdbIsFTSearchOn

 

ctdbFTSearchOn

Declaration

CTDBRET ctdbFTSearchOn(CTHANDLE Handle, pTEXT query, NINT mode)

Description

Set up a new Full-Text Search.

  • Handle is a record handle (CTDBRECORD)
  • query is the text to locate
  • mode should be set to 0.

Remember: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.

 

Return Values

Mode Code Description
CTDBRET_NOTFTS 4149 Not a Full-Text Search handle
CTDBRET_FTS_DEFINED 4150 Full-Text Search already defined on table

See Errors for a complete listing of valid FairCom DB error values.

See Also

  • ctdbFTSearchOff
  • ctdbIsFTSearchOn

 

ctdbGetFTI

Declaration

CTHANDLE ctdbDECL ctdbGetFTI(CTHANDLE Handle, COUNT Number)

Description

Retrieve the Full-Text Search handle.

  • Handle is a table handle (CTDBTABLE)
  • Number is a Full-Text Search configuration number (must be set to 0; currently it is ignored)

Return Values

Returns the index handle on success or NULL on failure.

See Errors for a complete listing of valid FairCom DB error values.

 

ctdbGetFTIByName

Declaration

CTHANDLE ctdbDECL ctdbGetFTIByName(CTHANDLE Handle, pTEXT name)

Description

Retrieve the index handle given the index name.

  • Handle is a table handle.
  • Name is an FTS Index name.

Return

Return the index handle on success or NULL on error.

See Errors for a complete listing of valid FairCom DB error values.

 

ctdbGetFTIByUID

Retrieve an FTI by its unique ID (the Full Text Index counterpart of ctdbGetIndexByUID).

Return Values

Value Symbolic Constant Explanation
0 CTDBRET_OK Successful operation.

See Errors for a complete listing of valid c-tree Plus error values.

 

ctdbGetFTIField

Declaration

CTHANDLE ctdbDECL ctdbGetFTIField(CTHANDLE Handle, NINT FieldNumber)

Description

Retrieve the field handle of the field indicated by FieldNumber.

  • Handle is an FTS handle.
  • FieldNumber is a field number in the FTS field sequence.

Return Values

Return return a HANDLE that is the CTDBFIELD handle.

See Errors for a complete listing of valid FairCom DB error values.

 

ctdbGetFTIFieldCount

Declaration

NINT ctdbDECL ctdbGetFTIFieldCount(CTHANDLE Handle)

Description

Retrieve the number of segments associated with this index.

  • Handle is an FTS handle

Return Values

Return the number of indexes or -1 on error.

See Errors for a complete listing of valid FairCom DB error values.

 

ctdbGetFTIFieldMode

Declaration

ULONG ctdbDECL ctdbGetFTIFieldMode(CTHANDLE Handle, NINT FieldNumber)

Description

Return the field mode specified when adding the field to the FTI.

Parameters

  • Handle [IN] - FTS handle
  • FieldNumber [IN] - Field number in the FTI field sequence

Return Values

Return the field mode or 0 on error.

 

ctdbGetFTIHandle

Declaration

CTHANDLE ctdbGetFTIHandle(CTHANDLE Handle)

Description

Return the Full-Text Search handle or NULL on failure.

  • Handle [IN] - FairCom DB API C API handle

Return

Returns the index handle on success. Returns NULL on failure.

See Errors for a complete listing of valid FairCom DB error values.

 

 

ctdbGetFTIName

Declaration

pTEXT ctdbDECL ctdbGetFTIName(CTHANDLE Handle)

Description

Retrieve the name of the Full-Text Index (FTI).

Return Values

Return string pointer to FTI name or NULL on failure.

 

ctdbGetFTINbr

Get the number of an FTI in the FTI list (the Full Text Index counterpart of ctdbGetIndexNbr).

Return Values

Value Symbolic Constant Explanation
0 CTDBRET_OK Successful operation.

See Errors for a complete listing of valid c-tree Plus error values.

 

ctdbGetFTINbrByName

Declaration

COUNT ctdbDECL ctdbGetFTINbrByName(CTHANDLE Handle, cpTEXT name)

Description

Retrieve the index number given the index name.

  • Handle is a table handle
  • Name is a Full-Text Search Index name.

Return Values

Return the index number on success or -1 on error.

See Errors for a complete listing of valid FairCom DB error values.

 

ctdbGetFTIStatus

Declaration

ULONG ctdbDECL ctdbGetFTIStatus(CTHANDLE Handle)

Description

Retrieve the status of the index handle.

  • Handle [IN] - FTS handle.

The status of the index handle is a bit map describing one or more of the following:

  • CTDBFTI_OLD - Original value (no changes)
  • CTDBFTI_NEW - Index added
  • CTDBFTI_DEL - Original Index deleted

Return

Return the index handle status value.

See Errors for a complete listing of valid FairCom DB error values.

 

ctdbGetTableFTICount

Declaration

COUNT ctdbGetTableFTICount(CTHANDLE Handle)

Description

Retrieve the number of FTS indexes associated with the table.

  • Handle is a table handle (CTDBTABLE)

Return Values

Return the number of indexes or -1 on error.

See Errors for a complete listing of valid FairCom DB error values.

 

ctdbIsFTSearchOn

Declaration

CTBOOL ctdbIsFTSearchOn(CTHANDLE Handle)

Description

Indicate if a Full-Text Search is active or not. A Full-Text Search is active after a successful call to ctdbITSearchOn(). A Full-Text Search can be switched off by calling ctdbFTSearchOff().

  • Handle is a record handle (CTDBRECORD)

Return Values

Return YES is record set if active.

Return NO if record set is not active.

See Errors for a complete listing of valid FairCom DB error values.

See Also

  • ctdbFTSearchOff
  • ctdbFTSearchOn

 

ctdbSetDefaultFTI

Declaration

CTDBRET ctdbSetDefaultFTI(CTHANDLE Handle, COUNT Number)

Description

Set the number of a new default FTI.

  • Handle is a record handle (CTDBRECORD)
  • Number is the Full-Text Search index to set as default by number

Return Values

Returns CTDBRET_OK on success.

Mode Code Description
CTDBRET_NOTFTS 4149 Not a Full-Text Search handle
CTDBRET_FTS_DEFINED 4150 Full-Text Search already defined on table

See Errors for a complete listing of valid FairCom DB error values.

 

 

ctdbSetFTICndxExpr

Declaration

CTDBRET ctdbDECL ctdbSetFTICndxExpr(CTHANDLE Handle, pTEXT conditionExpr)

Parameters:

  • Handle [IN] - Full Text Index handle
  • conditionExpr [IN] - Conditional expression

Description

Set the conditional expression for this Full Text Index.

This function is very similar to the ctdbSetIndexCndxExpr function used on regular indexes.

If this function is called on an open table, it verifies that the expression is valid.

Remember: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.

 

Addition to ctdbSetFTIOption

Alternatively, a new option for ctdbSetFTIOption allows you to set a conditional expression for an existing Full-Text Index (FTI):

  • CTDB_FTI_OPTION_CNDXEXPR - Set the Full-Text Index conditional expression. The pvalue parameter contains the conditional expression to apply. The lvalue is ignored.

Return Values

Value Symbolic Constant Explanation
0 CTDBRET_OK Successful operation.

See Errors for a complete listing of valid c-tree Plus error values.

See Also

  • ctdbSetFTIOption

 

ctdbSetFTIOption

Declaration

CTDBRET ctdbDECL ctdbSetFTIOption(CTHANDLE Handle, UCOUNT option, pTEXT pvalue, ULONG lvalue)

Description:

Allows several Full-Text Search values to be set depending on the option.

Parameters:

  • Handle [IN] - FTI handle
  • option [IN] - Option to set
  • pvalue [IN] - If the option takes a string value, set it to this value
  • lvalue [IN] - If the option takes a long value, set it to this value

In function ctdbSetFTIOption, either lvalue or pvalue needs to be set depending on the option.

Update Options

CTDB_FTI_OPTION_UPD

Set lvalue to one of the following to specify when to perform FTS Index updates:

Mode Code Description
CTDB_FTI_OPTION_UPD_ASYNC 0x0000 Perform FTI updates in background (default)
CTDB_FTI_OPTION_UPD_IMMEDIATE 0x0001 Perform FTI updates immediately
CTDB_FTI_OPTION_IGNORE_EXISTING 0x0100

Disable the loading of existing records to create an FTI without loading existing records

This setting is not persisted after creation. It is possible to perform a ctdbAlterTable(...CTDB_ALTER_FTI) indicating to not load existing record on rebuild (i.e., empty the indices it rebuilds) by explicitly calling ctdbSetFTIOption (... ,CTDB_FTI_OPTION_IGNORE_EXISTING) on the proper indices.

Tokenizer Options

CTDB_FTI_OPTION_TOKENSIZE

Set lvalue to the maximum token size.

CTDB_FTI_OPTION_TOKENIZER

Set lvalue to one of the following:

Mode Code Description
CTDB_FTI_TOKENIZER_CUSTOM 0 Use a user-defined custom tokenizer - see Full-Text Search Custom Tokenizer
 
CTDB_FTI_TOKENIZER_SIMPLE 1 Use the standard, simple tokenizer, which uses white space to delimit tokens. Default
CTDB_FTI_TOKENIZER_PORTER 2 Use the Porter Tokenizer, which uses "stemming" to reduce words to a common root so grammatically similar words can be matched.
CTDB_FTI_TOKENIZER_ICU 3 Use the ICU Unicode Tokenizer - see Full-Text Search ICU Tokenizer
 

ICU Tokenizer

CTDB_FTI_OPTION_ICULANG

Set pvalue to a string identifying the language for Unicode collation based on ICU library capabilities to use with the ICU tokenizer. See Full-Text Search ICU Tokenizer.

CTDB_FTI_OPTION_ICUOPTION

Set lvalue to an ICU option using c-tree extended segment modes for ICU-based indexes configuration. See Full-Text Search ICU Tokenizer.

Custom Tokenizer

CTDB_FTI_OPTION_CUSTLIB

Set pvalue to the name of the library to load implementing a custom tokenizer. See Full-Text Search Custom Tokenizer.

CTDB_FTI_OPTION_CUSTPARAM

Set pvalue to a string that will be passed to the custom tokenizer init function with custom details. See Full-Text Search Custom Tokenizer.

Dictionary Domain Options

This option is used internally to define the scope of the word dictionary used for full-text search. It currently has no exposed API uses.

Dictionary Domain Code Description
CTFTI_DOMAIN_TABLE 0x0000 Scope is the table.
CTFTI_DOMAIN_DATABASE 0x0001 Scope is the database the table belongs.
CTFTI_DOMAIN_SERVER 0x0002 Scope is the server.

Returns:

ctdbSetFTIOption() returns CTDBRET_OK on success, or one of the following errors on failure:

  • CTDBRET_NOTFTI if called passing a handle not belonging to an FTI.
  • CTDBRET_INVARG if called passing an invalid argument combination or unknown option.
  • CTDBRET_NOTSUPPORTED if called with the Full-Text index support not enabled.

See Also

  • Full-Text Search ICU Tokenizer
  • Full-Text Search Custom Tokenizer

 

ctdbStartFTIBackgroundLoad

Starts a background index loader for the Full-Text Index (FTI).

Declaration

CTDBRET ctdbLOCAL ctdbStartFTIBackgroundLoad(pCTDBFTI pFTI)

Description

  • pFTI - [IN] Full-Text Index handle.

Options

In addition to the existing FTI options:

CTDB_FTI_OPTION_IGNORE_EXISTING - ignore existing record.

The following FTI option has been added:

CTDB_FTI_OPTION_BACKLOAD_EXISTING - load existing record in background.

The setting is persisted so that, in the case of an Alter Table that requires rebuilding the index, the behavior will be the one specified at creation time.

Note: It is not possible to delete an FTI for which a background load has been started and still not finished. Attempting this operation causes the ctdbAlterTable() to fail with error CTDBRET_LOADISACTIVE (4166, new error code).

 

This function was originally named ctdbFTIStartBackgroundLoad, which has been deprecated.

Return Values

Value Symbolic Constant Explanation
0 CTDBRET_OK Successful operation.

See Errors for a complete listing of valid c-tree Plus error values.

Example

ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_UPD, NULL, CTDB_FTI_OPTION_BACKLOAD_EXISTING) 

See Also

  • ctdbCheckFTIBackgroundLoad()
  • FairCom DB API - FTI Background Initial Load

 

FairCom DB API - FTI Background Initial Load

FairCom Server supports loading a Full-Text Index (FTI) with existing records in the background. To simplify the use of background load thread for FTI at creation time, the FairCom DB API functions listed below have been implemented:

  • ctdbFTICheckBackgroundLoad - Check the status of the background index loader for the Full-Text Index (FTI).
  • ctdbFTIStartBackgroundLoad - Starts a background index loader for the Full-Text Index (FTI).

Note: It is not possible to delete an FTI for which a background load has been started and still not finished. Attempting this operation causes the ctdbAlterTable() to fail with error CTDBRET_LOADISACTIVE (4166 new error code).

 

For advanced users, see:

  • Asynchronous load of Full-Text Index

 

Asynchronous Load of Full-Text Index

For Advanced Users

FairCom Server supports loading a Full-Text Index (FTI) with existing records in the background. This means that an FTI can be quickly created with the data file open in exclusive mode, and then the data file can be closed and reopened in shared mode to permit other connections to access the file while the original connection requests a background thread to load the FTI from the records in the data file.

The status of the background load can be monitored. When the background load is requested, it is assigned a user-defined load identifier. The status of all requested background load operations for a particular data file is stored as a resource in that data file.

The following information is stored for each background load:

/* Background load (of index or record update callback) details: */
typedef struct bgldinf {
        LONG8   queuedtime;     /* time at which the load was queued */
        LONG8   statustime;     /* time at which current status was reported */
        LONG8   offset;         /* current offset in file */
        BGLDSTT status;         /* current load status */
        LONG    errcod;         /* background load error code */
        LONG    curpct;         /* current percentage complete */
        LONG    pad;            /* padding for structure alignment */
        TEXT    cbname[BGLDIDZ];/* name of the callback function */
        TEXT    loadid[BGLDIDZ];/* ID for this background load operation */
} BGLDINF, *pBGLDINF;

Note: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.

 

Requesting a background load of an FTI

To request a background load of an FTI, call the ctRecordUpdateCallbackControl() function, specifying an opcode of RUCBCTLqueuecallback and the name of the FTI's record update callback function, which is $FTS$indexname.

Below is an example showing the background load of the FTI named myFTSidx001 on the file qafulltextsearch.dat, using a load identifier of myftsload. This load identifier can be used to monitor the progress of the background load. Note that in this example, datno is the data file number for the data file, which the application has already opened:

 RUCBCTL rucbctl;
 RUCBQCB rucbqcb;
 memset(&rucbctl,0,sizeof(rucbctl));
 memset(&rucbqcb,0,sizeof(rucbqcb));
 rucbctl.verson = RUCBCTL_VERS_V01;
 rucbctl.opcode = RUCBCTLqueuecallback;
 rucbctl.bufsiz = sizeof(rucbqcb);
 rucbctl.bufptr = (pTEXT) &rucbqcb;
 rucbqcb.verson = RUCBQCB_VERS_V01;
 rucbqcb.datnam = "qafulltextsearch.dat";
 rucbqcb.datno = datno;
 rucbqcb.cbname = "$FTS$myFTSidx001";
 rucbqcb.loadid = "myftsload";
 if ((rc = (COUNT) ctRecordUpdateCallbackControl(&rucbctl))) {
  printf("Error: Failed to queue record update callback invocation: %d (%d)\n",
   rc,sysiocod);
  goto err_ret;
 }
 printf("Successfully queued record update callback invocation.\n");

The dfkutl utility supports an option to request a background load of an FTI. Use the -queuecbkload option, specifying the name of the data file, the callback, and the load ID. Here is an example:

dfkctl -queuecbkload qafulltextsearch.dat $FTS$myFTSidx001 myftsload
Successfully queued callback invocation for file qafulltextsearch.dat.

Monitoring a background load of a full text index

To monitor the status of a background load of a full text index, call the ctGetBackgroundLoadStatus() function, specifying the data file number and the load identifier. Below is an example:

 printf("Waiting for full text index update to complete...\n");
 while (1) {
  pBGLDRES
   pbgldres;
  BGLDRES
   bgldres;
  NINT rc,reslen;
  /* Read status from resource. */
  reslen = sizeof(bgldres);
  pbgldres = &bgldqres;
  if (!(rc = ctGetBackgroundLoadStatus(datno,rucbqcb.loadid,&pbgldres,&reslen))) {
   printf("Queued callback load resource: status=%d offset=" ctLLnd(10) " (%3d%%)\n",
    bgldres.bgldi[0].status,bgldres.bgldi[0].offset,bgldres.bgldi[0].curpct);
   if (pbgldres->bgldi[0].status == BGLD_SUCCEEDED) {
    printf("Full text index background load completed successfully.\n");
    break;
   } else if (pbgldres->bgldi[0].status == BGLD_FAILED) {
    rc = bgldres.bgldi[0].errcod;
    printf("Error: Full text index background load terminated with error %d\n",
     rc);
    break;
   } else {
    ctThrdSleep(200);
   }
  } else {
   printf("Error: Failed to read background load status: %d\n", rc);
   break;
  }
 }

The dfkutl utility supports an option to read the status of a background load of a full text index. Use the -getbgload option, specifying the name of the data file and the load id. Here is an example:

dfkctl -getbgload qafulltextsearch.dat myftsload -h 1 -i 1 1
status           errcod      offset  curpct callback
succeeded             0     1214400   100%  $FTS$myFTSidx001
            loadid: myftsload
        queuedtime: Wed Sep  6 12:18:40 2017
        statustime: Wed Sep  6 12:16:58 2017

If the load id is not specified, the status of all background loads for a data file is displayed. The background load status is stored in a resource in the data file. The status entries remain in the resource even after the background load completes so the status can be examined after load completion. To delete an entry from this resource, use the dfkctl utility's -delbgload option, specifying the name of the data file and optionally the load ID. If no load id is specified, all entries are deleted from the resource. Here is an example:

dfkctl -delbgload qafulltextsearch.dat myftsload
Successfully deleted status of all background loads for file qafulltextsearch.dat

Requesting a background load of a c-tree B+-tree index

c-tree also supports a background load of a c-tree B+-tree index by calling the ctDeferredIndexControl() function with opcode of ctDeferredIndexControl. It is now possible to assign a load ID to the background load of a B+-tree index.

Below is an example. Note the use of DFKQIL structure version 2 instead of version 1. Version 2 of the structure adds a loadid field:

 DFKCTL dfkctl;
 DFKQIL dfkqil;
 LONG idxlst[1];
 memset(&dfkctl,0,sizeof(dfkctl));
 memset(&dfkqil,0,sizeof(dfkqil));
 dfkctl.verson = DFKCTL_VERS_V01;
 dfkctl.opcode = DFKCTLqueueload;
 dfkctl.bufsiz = sizeof(dfkqil);
 dfkctl.bufptr = (pTEXT) &dfkqil;
 dfkqil.verson = DFKQIL_VERS_V02;
 dfkqil.datno = datno;
 dfkqil.numidx = 1;
 idxlst[0] = datno + 1;
 dfkqil.idxlst = idxlst;
 dfkqil.loadid = "myindexload";
 if ((rc = ctDeferredIndexControl(&dfkctl)) != NO_ERROR) {
  printf("Error: Failed to schedule index load: %d\n",
   rc);
  goto err_ret;
 }
 printf("Successfully scheduled index load.\n");

The dfkutl utility supports an option to request a background load of a B+-tree index. Use the -queueidxload option, specifying the name of the data file, the callback, and the load id. Here is an example:

dfkctl -queueidxload qafulltextsearch.dat 1 id=myftsload
Successfully queued callback invocation for file qafulltextsearch.dat.

See Also

  • ctGetBackgroundLoadStatus
  • ctDeleteBackgroundLoadStatus

 

 

ctGetBackgroundLoadStatus

Declaration

extern ctCONV  NINT ctDECL ctGetBackgroundLoadStatus(COUNT datno,pTEXT loadid,ppBGLDRES ppbgldres,pNINT preslen);

Parameters:

  • [IN] datno - c-tree data file number.
  • [IN] loadid - If not NULL, specifies the ID of one background load operation whose information the caller wants to retrieve.
  • [IN,OUT] ppbgldldres - If *ppbgldres is not NULL, *preslen must be set to the size of this buffer. If the buffer is sufficiently large, the function writes the resource data to that address and sets *preslen to the amount of data written.
    If *ppbgldres is NULL, the function allocates a sufficiently-sized buffer to hold the resource data and sets *preslen to the size of the buffer. The caller is responsible for freeing this buffer by calling mbfree(). Note that in this case we allocate the entry with space for an additional BGLDINF structure so that the caller can add its entry to this buffer if desired.
  • [IN,OUT] preslen - Size of the input or allocated resource buffer. See the description of the ppbgldres parameter above for more details.

Description

ctGetBackgroundLoadStatus() reads the resource that contains the status of all background index and record update callback load operations for the specified data file.

Note: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.

 

Return Values

Value Symbolic Constant Explanation
0 CTDBRET_OK Successful operation.

See Errors for a complete listing of valid c-tree Plus error values.

See Also

  • ctDeleteBackgroundLoadStatus
  • Asynchronous load of Full-Text Index

 

ctDeleteBackgroundLoadStatus

Declaration

extern ctCONV  NINT ctDECL ctDeleteBackgroundLoadStatus(COUNT datno,pTEXT loadid);

Parameters:

  • [IN] datno - c-tree data file number.
  • [IN] loadid - If not NULL, specifies the ID of one background load operation whose information the caller wants to delete from the resource. If NULL, the entire resource is deleted.

Description

ctDeleteBackgroundLoadStatus() deletes the resource that contains the status of all background load operations for the specified data file, or deletes the specified entry from this resource.

Note: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.

 

Return Values

Value Symbolic Constant Explanation
0 CTDBRET_OK Successful operation.

See Errors for a complete listing of valid c-tree Plus error values.

See Also

  • ctGetBackgroundLoadStatus
  • Asynchronous load of Full-Text Index

 

FairCom DB API C++, Java, and .NET FTS API

This section provides reference information about the following three APIs, that are all based on the FairCom DB API API:

  1. FairCom DB API C++
  2. FairCom DB API Java (see the JTDB Javadocs)
  3. FairCom DB API .NET

Each of these APIs is very similar in function. The tables below show the function name for each API.

Note: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.

 

The remainder of this chapter provides function details for the FairCom DB API C++ API. The details are the same for FairCom DB API Java and FairCom DB API .NET. Only the API name will vary, as detailed in the tables below:

Class: CTTable

The following methods are available in the CTTable class:

  c-treeDB C++ Methods c-treeDB .NET Methods c-treeDB Java Methods
Add a new index to the table AddFullTextIndex AddFullTextIndex

AddFullTextIndex

(java.lang.String name)

Remove a full text index from a table

DelFullTextIndex

by index number

by index name

DelFullTextIndex

by index number

by index name

DelFullTextIndex

by index number

by index name

Retrieve a Full Text index object

GetFullTextIndex

by index number

by index name

GetFullTextIndex

by index number

by index name

GetFullTextIndex

by index number

by index name

Retrieve the number of full text indexes of this table GetFullTextIndexCount GetFullTextIndexCount GetFullTextIndexCount
Set Full Text index options

SetFullTextIndexOption

by index object

by index number

by index name

SetFullTextIndexOption

by index object

by index number

by index name

SetFullTextIndexOption

by index object

by index number

by index name

Class: CTRecord

The following methods are available in the CTRecord class:

  c-treeDB C++ Methods c-treeDB .NET Methods c-treeDB Java Methods
Set the new record default Full Text index number SetDefaultFullTextIndex SetDefaultFullTextIndex SetDefaultFullTextIndex
Set the full text search criteria FullTextSearchOn FullTextSearchOn FullTextSearchOn
Indicate if Full Text Search is active on the record IsFullTextSearchOn IsFullTextSearchOn IsFullTextSearchOn
Turn off Full Text Search FullTextSearchOff FullTextSearchOff FullTextSearchOff
Get detail about Full Text Search error GetFullTextSearchErrorDetails GetFullTextSearchErrorDetails GetFullTextSearchErrorDetails

Class: CTFullTextIndex

The following is available in the CTFullTextIndex class:

  c-treeDB C++ Methods c-treeDB .NET Methods c-treeDB Java Methods
Constructors

CTFullTextIndex()

()

(CTFullTextIndex)

CTFullTextIndex

()

(CTFullTextIndex)

CTFullTextIndex

()

(CTFullTextIndex)

Add a new field to Full Text Index AddField AddField AddField
Retrieve a Full Text index field GetField GetField GetField
Retrieve the Full Text index field count. GetFieldCount GetFieldCount GetFieldCount
Retrieve a Full Text index field mode GetFieldMode GetFieldMode GetFieldMode
Retrieve the status of the index handle. GetStatus GetStatus GetStatus
Set Full Text index options SetOption SetOption SetOption
Retrieve the Full Text index name GetName GetName GetName
Retrieve the ordinal number of the current Full Text index GetNumber GetNumber GetNumber

Class: CTFullTextIndexDictionary

The following is available in the FairCom DB API CTFullTextIndexDictionary class:

  c-treeDB C++ Methods c-treeDB .NET Methods c-treeDB Java Methods
Constuctors

CTFullTextIndexDictionary

()

(CTDatabase)

(CTFullTextIndexDictionary)

(CTSession)

(CTTable)

CTFullTextIndexDictionary

()

(CTDatabase)

(CTFullTextIndexDictionary)

(CTSession)

(CTTable)

CTFullTextIndexDictionary

()

(CTDatabase)

(CTFullTextIndexDictionary)

(CTSession)

(CTTable)

Set Full Text index Dictionary options SetOption SetOption SetOption

 

Class: CTTable

The following methods are available in the CTTable class:

  • AddFullTextIndex
  • DelFullTextIndex (by index number)
  • DelFullTextIndex (by index name)
  • GetFullTextIndex (by index number)
  • GetFullTextIndex (by index name)
  • AddField (by field object)
  • AddField (by name)
  • AddField (by number)
  • GetFullTextIndexCount
  • SetFullTextIndexOption (by index object)
  • SetFullTextIndexOption (by index number)
  • SetFullTextIndexOption (by index name)

See Also

  • FairCom DB API C++, Java, and .NET FTS API

 

AddField (by field object)

Declaration

void CTTable::AddField(CTFullTextIndex& pIndex, const CTField& pField, ULONG mode)

Parameters:

  • pIndex [IN] - Full-Text Index object.
  • pField [IN] - Field to compose the index.
  • mode [IN] - Full-Text search field matching mode

Description

Add a new segment to an index.

Return Values

Return a segment object.

See Also

  • AddField (by name)
  • AddField (by number)
  • Class: CTTable
  • FairCom DB API C++, Java, and .NET FTS API

 

AddField (by name)

Declaration

void CTTable::AddField(CTFullTextIndex& pIndex, const CTString& FieldName, ULONG mode)

Parameters:

  • pIndex [IN] - Full-Text Index object.
  • FieldName [IN] - Name of field to compose the index.
  • mode [IN] - Full-Text Search field matching mode.

Description

Add a new segment to an index using the field name.

Return Values

Return a segment object.

See Also

  • AddField
  • AddField (by name)
  • Class: CTTable
  • FairCom DB API C++, Java, and .NET FTS API

 

AddField (by number)

Declaration

void CTTable::AddField(CTFullTextIndex& pIndex, NINT FieldNumber, ULONG mode)

Parameters:

  • pIndex [IN] - Full-Text Index object.
  • FieldNumber [IN] - Number of field to compose the index.
  • mode [IN] - Full-Text search field matching mode.

Description

Add a new segment to an index using the field number.

Return Values

Return a segment object.

See Also

  • AddField
  • AddField (by name)
  • Class: CTTable
  • FairCom DB API C++, Java, and .NET FTS API

 

AddFullTextIndex

Declaration

CTFullTextIndex CTTable::AddFullTextIndex(const CTString& name)

Note: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.

 

Parameters:

  • name [IN] - Full-Text Index name.

Description

Add a new Full-Text Index to the table.

Return Values

Return an index object.

Value Symbolic Constant Explanation
0 CTDBRET_OK Successful operation.

See Errors for a complete listing of valid c-tree Plus error values.

See Also

  • Class: CTTable
  • FairCom DB API C++, Java, and .NET FTS API

 

DelFullTextIndex (by index name)

Declaration

void CTTable::DelFullTextIndex(const CTString& IndexName)

Parameters:

  • IndexName [IN]

Description

Delete a Full-Text Index from a table using the index name.

Return Values

Value Symbolic Constant Explanation
0 CTDBRET_OK Successful operation.

See Errors for a complete listing of valid c-tree Plus error values.

See Also

  • DelFullTextIndex (by index number)
  • Class: CTTable
  • FairCom DB API C++, Java, and .NET FTS API

 

DelFullTextIndex (by index number)

Declaration

void CTTable::DelFullTextIndex(NINT IndexNumber)

Parameters:

  • IndexNumber [IN]

Description

Delete a Full-Text index from a table using the index number.

Return Values

Value Symbolic Constant Explanation
0 CTDBRET_OK Successful operation.

See Errors for a complete listing of valid c-tree Plus error values.

See Also

  • DelFullTextIndex (by index name)
  • Class: CTTable
  • FairCom DB API C++, Java, and .NET FTS API

 

GetFullTextIndex (by index name)

Declaration

CTFullTextIndex CTTable::GetFullTextIndex(const CTString& name)

Parameters:

  • name [IN] - Index name

Description

Retrieve the Full-Text Index using its name.

Return Values

Return an index object.

See Also

  • Class: CTTable
  • FairCom DB API C++, Java, and .NET FTS API

 

GetFullTextIndex (by index number)

Declaration

CTFullTextIndex CTTable::GetFullTextIndex(NINT IndexNumber)

Parameters:

  • IndexNumber [IN]

Description

Retrieve a Full-Text Index object using the index number.

Return Values

Return an index object.

See Also

  • GetFullTextIndex (by name)
  • Class: CTTable
  • FairCom DB API C++, Java, and .NET FTS API

 

GetFullTextIndexCount

Declaration

COUNT CTTable::GetFullTextIndexCount() const

Parameters:

None.

Description

Retrieve the number of Full-Text Indexes of this table.

Return Values

Return the number of indexes.

See Also

  • Class: CTTable
  • FairCom DB API C++, Java, and .NET FTS API

 

SetFullTextIndexOption (by index object)

Declaration

void CTTable::SetFullTextIndexOption(CTFullTextIndex& pIndex, UCOUNT option, CTString &pvalue, ULONG lvalue)

Parameters:

  • pIndex [IN] - Full-Text Index object.
  • option [IN] - Option to set.
  • pvalue [IN] - Value to set the option to for option taking string value.
  • lvalue [IN] - Value to set the option to for option taking long value.

Description

Set Full-Text Index options using the Full-Text Index object.

See Also

  • SetFullTextIndexOption (by number)
  • SetFullTextIndexOption (by name)
  • Class: CTTable
  • FairCom DB API C++, Java, and .NET FTS API

 

SetFullTextIndexOption (by name)

Declaration

void CTTable::SetFullTextIndexOption(const CTString& IndexName, UCOUNT option, CTString &pvalue, ULONG lvalue)

Parameters:

  • IndexName [IN] - Full-Text Index name.
  • option [IN] - Option to set.
  • pvalue [IN] - Value to set the option to for option taking string value.
  • lvalue [IN] - Value to set the option to for option taking long value.

Description

Set Full-Text Index options using the Full-Text Index name.

See Also

  • SetFullTextIndexOption
  • SetFullTextIndexOption (by number)
  • Class: CTTable
  • FairCom DB API C++, Java, and .NET FTS API

 

SetFullTextIndexOption (by number)

Declaration

void CTTable::SetFullTextIndexOption(NINT IndexNbr, UCOUNT option, CTString &pvalue, ULONG lvalue)

Parameters:

  • IndexNbr [IN] - Full-Text Index number.
  • option [IN] - Option to set.
  • pvalue [IN] - Value to set the option to for option taking string value.
  • lvalue [IN] - Value to set the option to for option taking long value.

Description

Set Full-Text Index options using the Full-Text Index number.

See Also

  • SetFullTextIndexOption
  • SetFullTextIndexOption (by name)
  • Class: CTTable
  • FairCom DB API C++, Java, and .NET FTS API

 

Class: CTRecord

The following methods are available in the CTRecord class:

  • SetDefaultFullTextIndex
  • FullTextSearchOn
  • IsFullTextSearchOn
  • FullTextSearchOff
  • GetFullTextSearchErrorDetails

See Also

  • FairCom DB API C++, Java, and .NET FTS API

 

FullTextSearchOff

Declaration

void CTRecord::FullTextSearchOff()

Parameters:

None.

Description

Turn off Full-Text Search.

Return Values

None.

See Also

  • Class: CTRecord
  • FairCom DB API C++, Java, and .NET FTS API

 

FullTextSearchOn

Declaration

void CTRecord::FullTextSearchOn(CTString &query)

Parameters:

  • query [IN] - The full text search query

Description

Set the Full-Text Search criteria.

Note: The Full-Text Index must be committed before it can be used. An FTI cannot be created and dropped in the same transaction.

 

Return Values

None.

See Also

  • Class: CTRecord
  • FairCom DB API C++, Java, and .NET FTS API

 

GetFullTextSearchErrorDetails

Declaration

void CTRecord::GetFullTextSearchErrorDetails(pNINT error, pNINT position)

Parameters:

  • error [OUT] - error occurred.
  • position [OUT] - position of the error in the Full Text Search query string.

Returns:

Description

Get detail about Full-Text Search error.

Return Values

None.

See Also

  • Class: CTRecord
  • FairCom DB API C++, Java, and .NET FTS API

 

IsFullTextSearchOn

Declaration

CTBOOL CTRecord::IsFullTextSearchOn()

Parameters:

None.

Description

Indicate if Full-Text Search is active on the record.

Return Values

Return YES if the Full-Text Search is on.

See Also

  • Class: CTRecord
  • FairCom DB API C++, Java, and .NET FTS API

 

SetDefaultFullTextIndex

Declaration

void CTRecord::SetDefaultFullTextIndex(NINT indexno)

Parameters:

  • indexno [IN] - The new record Full Text index number.

Description

Set the new record default Full Text index number.

Return Values

None.

See Also

  • Class: CTRecord
  • FairCom DB API C++, Java, and .NET FTS API

 

Class: CTFullTextIndex::CTFullTextIndex()

The following methods are available in this class:

  • AddField
  • GetField
  • GetFieldCount
  • GetFieldMode
  • CTFullTextIndex::GetStatus
  • CTFullTextIndex::SetOption
  • GetName
  • GetNumber

Constructor:

CTFullTextIndex::CTFullTextIndex(const CTFullTextIndex& pIndex)

Destructor:

CTFullTextIndex::~CTFullTextIndex()

Operator: =

Assign a CTIndex object to another.

CTFullTextIndex& CTFullTextIndex::operator=(const CTFullTextIndex& pIndex)

Parameters:

None.

Returns:

Return the assigned object.

See Also

  • FairCom DB API C++, Java, and .NET FTS API

 

AddField

Declaration

void CTFullTextIndex::AddField(const CTField& pField, ULONG mode)

Parameters :

  • pField [IN] - Field object.
  • mode [IN] - Full-Text Search field matching mode.

Description

Add a new field to a Full-Text Index.

See Also

  • Class: CTFullTextIndex::CTFullTextIndex()
  • FairCom DB API C++, Java, and .NET FTS API

 

GetField

Declaration

CTField CTFullTextIndex::GetField(NINT FieldNumber)

Parameters:

  • FieldNumber [IN]

Description

Retrieve a Full-Text Index field.

Return Values

Field object.

See Also

  • Class: CTFullTextIndex::CTFullTextIndex()
  • FairCom DB API C++, Java, and .NET FTS API

 

GetFieldCount

Declaration

NINT CTFullTextIndex::GetFieldCount() const

Parameters:

None.

Description

Retrieve the Full-Text Index field count.

Return Values

Return the Full-Text Index field count.

See Also

  • Class: CTFullTextIndex::CTFullTextIndex()
  • FairCom DB API C++, Java, and .NET FTS API

 

GetFieldMode

Declaration

ULONG CTFullTextIndex::GetFieldMode(NINT FieldNumber)

Parameters:

  • FieldNumber [IN]

Description

Return the field mode specified when adding the field to the FTI.

Return Values

Return the field mode or 0 on error.

See Also

  • Class: CTFullTextIndex::CTFullTextIndex()
  • FairCom DB API C++, Java, and .NET FTS API

 

GetName

Declaration

CTString CTFullTextIndex::GetName()

Parameters:

None.

Description

Retrieve the Full-Text Index name.

Return Values

Return the Full-Text Index name.

See Also

  • Class: CTFullTextIndex::CTFullTextIndex()
  • FairCom DB API C++, Java, and .NET FTS API

 

GetNumber

Declaration

NINT CTFullTextIndex::GetNumber()

Parameters:

None.

Description

Retrieve the Full-Text Index number.

Return Values

Return the Full-Text Index number.

See Also

  • Class: CTFullTextIndex::CTFullTextIndex()
  • FairCom DB API C++, Java, and .NET FTS API

 

GetStatus

Declaration

ULONG CTFullTextIndex::GetStatus() const

Parameters:

None.

Description

Retrieve the status of the index handle. The status of the index handle is a bit map describing one or more of the following:

  • CTDBFTI_OLD - Original value (no changes)
  • CTDBFTI_NEW - Index added
  • CTDBFTI_DEL - Original Index deleted
  • CTDBFTI_OPTUPD - Optinos have been updated

Return Values

Return the index handle status.

See Also

  • Class: CTFullTextIndex::CTFullTextIndex()
  • FairCom DB API C++, Java, and .NET FTS API

 

SetOption

Declaration

void CTFullTextIndex::SetOption(UCOUNT option, CTString &pvalue, ULONG lvalue)

Parameters:

  • option [IN] - Option to set.
  • pvalue [IN] - Value to set the option to for option taking string value.
  • lvalue [IN] - Value to set the option to for option taking long value.

Description

Set Full-Text Index options.

See Also

  • Class: CTFullTextIndex::CTFullTextIndex()
  • FairCom DB API C++, Java, and .NET FTS API

 

Class: CTFullTextIndexDictionary

FairCom DB API C++ API Full-Text Index methods:

  • CTFullTextIndexDictionary::SetOption

Constructor:

CTFullTextIndexDictionary::CTFullTextIndexDictionary(const CTSession &session)

Constructor:

CTFullTextIndexDictionary::CTFullTextIndexDictionary(const CTDatabase &database)

Constructor:

CTFullTextIndexDictionary::CTFullTextIndexDictionary(const CTTable &table)

Destructor:

CTFullTextIndexDictionary::~CTFullTextIndexDictionary()

See Also

  • FairCom DB API C++, Java, and .NET FTS API

 

CTFullTextIndexDictionary::SetOption

Declaration

void CTFullTextIndexDictionary::SetOption(UCOUNT option, CTString &pvalue, ULONG lvalue)

Parameters:

  • option [IN] - Option to set.
  • pvalue [IN] - Value to set the option to for option taking string value.
  • lvalue [IN] - Value to set the option to for option taking long value.

Description

Set Full-Text Index dictionary options.

Return Values

Value Symbolic Constant Explanation
0 CTDBRET_OK Successful operation.

See Errors for a complete listing of valid c-tree Plus error values.

See Also

  • Class: CTFullTextIndexDictionary
  • FairCom DB API C++, Java, and .NET FTS API

 

ctCopyFile extended to copy Full-Text Indexes

In FairCom DB V11.5 and later, the ctCopyFile and ctfcpCopyFile functions support copying Full-Text Index (FTI) files belonging to the copied table.

  • The original FTI name and the new names need to be specified after all physical index names.
  • The FTIs need to be specified by the name used when creating them (i.e. name with directory, when used, and without extension).

 

Tokenizers

Internally, FairCom Full-Text Search uses a "tokenizer" to divide text into "tokens," which are roughly equivalent to a list of categorized words. A tokenizer follows a set of rules for extracting tokens (usually single words) from a text string or search query. Several algorithms can be used to tokenize text. The simplest uses white space to determine the boundaries between words. More advanced algorithms can be used when necessary.

FairCom Full-Text Search provides built-in support for several tokenizers:

Type Algorithm Recommended Usage
Simple Essentially uses white space and punctuation to delimit tokens. Not case-sensitive. This is the default. Allows for quick and easy searching.
Porter Uses "stemming" to reduce words to a common root to allow grammatically similar words to be matched. For example, "searching" and "searched" have the same stem, "search." The Porter tokenized creates more compact indices by compacting words to their simplest form. This type of stemming can return false positive results.
ICU Allow international support following Unicode rules for handling supported languages. Case-sensitive (depending on configuration). When Unicode support is required, this is the recommended tokenizer.
Custom Allows FairCom customers to develop their own tokenizers for special requirements. You can create a custom tokenizer if you have special requirements. Sample code is provided to get you started.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

See also:

  • Full-Text Search ICU Tokenizer
  • Full-Text Search Custom Tokenizer

 

Full-Text Search ICU Tokenizer

FairCom Full-Text Search capabilities are available on Unicode text by using a ICU based tokenizer. An application can configure the ICU tokenizer as follows:

  1. Call ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_TOKENIZER, NULL, CTDB_FTI_TOKENIZER_ICU)
  2. Call ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_ICULANG, XXXX , 0) where XXXX is a string specifying the locale
  3. Call ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_ICUOPTION, NULL, YYYY) where YYYY is a ctKSEG_COMPU* combination.

Calls to ctdbAddFTIField or ctdbAddFTIFieldByName can specify the mode parameter using the following values:

  • CTDB_FTI_MODE_REG: source string encoding depends on the DODA, CT_*STRING are considered in UTF-8 format, Ct_*UNICODE are considered in UTF-16 format.
  • CTDB_FTI_MODE_UTF-8: source string in UTF-8 encoding
  • CTDB_FTI_MODE_UTF-16: source string in UTF-16 encoding

The CTDB_FTI_OPTION_ICUSTRENGTH option has been renamed to CTDB_FTI_OPTION_ICUOPTION.

 

Full-Text Search Custom Tokenizer

In addition to the default tokenizers provided with FairCom DB FTS, support is available to call a DLL containing a custom tokenizer. FairCom Full-Text Search allows programmers to create their own full-text tokenizer and set it as the tokenizer to be used in a full-text index.

To use a custom tokenizer, the programmer must do the following:

  1. Provide a DLL that the server can load.
  2. When creating the full-text index, use the ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_CUSTLIB...) function to specify the library name (without the .DLL extension on Windows and without the lib on Unix).
  3. Use the ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_CUSTPARAM...) function to pass custom configuration parameters to the tokenizer.

The c-tree source code in the sdk\Xtras\ctree.samples\special\tokenizer directory contains an example of a custom tokenizer, easytok.c, and a stub for the tokenizer to be implemented by programmers in tokenizer.c. Both files have no dependency on any c-tree code and can be simply compiled as a DLL (for instance cl /LD easytok.c on Windows) or shared library on Unix and copied to a place where the server can load them.

See the complete FairCom Full-Text Search documentation for a list the functions that must be implemented.

 

Tokenizer_init

Initialize the tokenizer. This function is called:

  1. at index creation to verify that the shared library/DLL can be properly open, the functions are resolved, and they are callable.
  2. before starting the text to be indexed tokenization
  3. when parsing search query in order to tokenize the request

Parameters:

  • texttype [IN] - the type of the text passed in: CTDB_FTI_MODE_REG, CTDB_FTI_MODE_UTF8, CTDB_FTI_MODE_UTF16
  • text [IN] - text to be tokenized. The memory it points to it is guaranteed to be valid until the Tokenizer_reset or the Tokenizer_end call.
    Notice that the text is passed only at init time so it is the implementer's responsibility to keep track of it and the "current position"
  • textsize [IN] - size (in bytes) of the text passed in
  • maxtokensize [IN] - size (in bytes) of the maximum token length the Index has been set to
  • param [IN] - tokenizer parameter string passed by the application.
  • errcode [OUT] - error code. (Guaranteed to be != NULL)

Returns:

Tokenize context handle that will be passed to the other functions.

NULL in case of error.

Usage:

DLLexport void* Tokenizer_init (unsigned long texttype, char* text, size_t textsize, long maxtokensize, char* param, int* errcode)

 

Tokenizer_reset

Resets the text and its size for an already initialized tokenizer.

This function is used mainly during searches to tokenize the various search items of a search query.

Parameters:

  • handle [IN] - tokenizer context handle that needs the text to be refreshed.
  • text [IN] - text to be tokenized. The memory it points to it is guaranteed to be valid until the next Tokenizer_reset or the Tokenizer_end call.
  • textsize [IN] - size (in bytes) of the text passed in.

Returns:

CTDBRET_OK if successful, or the c-tree error code on failure.

Usage:

DLLexport int Tokenizer_reset (void *handle, char* text, size_t textsize)

 

Tokenizer_next

Determines and returns the next token in the text.

Parameters:

  • handle [IN] - tokenizer context handle
  • size [OUT] - length in bytes of the returned token. (Guaranteed to be != NULL) 
    In case of error, this needs to be set != 0.
    In case of end-of-text (no more tokens) needs to be set to 0.

Returns:

'\0' terminated string containing the next token, which needs to point to memory that needs to stay valid until the next tokenizer function call.

NULL in case of error (size != 0) or end-of-text (size == 0)

Usage:

DLLexport char *Tokenizer_next (void* handle, int *size)

 

Tokenizer_end

Terminates the use of the tokenizer. It is the implementer's responsibility to release any resource it might have allocated.

Parameters:

  • handle [IN] - tokenizer context handle.

Usage:

DLLexport void Tokenizer_end (void* handle)