Full text search in SQL Anywhere is performed using a text index. Each value in a column on which a text index has been built is referred to as a document. When a text index is created, each document is processed by a built-in term breaker to determine the terms (also referred to as tokens) contained in the document, and the positions of the terms in the document. The built-in term breaker also performs term breaking on the documents (text components) of a query string. For example, the query string 'rain or shine' consists of two documents, 'rain' and 'shine', connected by the OR operator. The built-in term breaker algorithm is also used to break the stoplist, and the input of the sa_char_terms system procedure, into terms.
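To make the idea of terms and positions concrete, here is a minimal Python sketch of a generic term breaker. It is an illustration only, not the SQL Anywhere implementation: the function name break_terms is hypothetical, and it assumes that a term is a maximal run of ASCII letters or digits and that positions are counted from 1.

```python
import re

def break_terms(document):
    """Split a document into (term, position) pairs.

    Assumptions for this sketch: a term is a maximal run of ASCII
    letters or digits, and positions start at 1. The built-in term
    breaker may follow different rules.
    """
    matches = re.finditer(r"[A-Za-z0-9]+", document)
    return [(m.group(0), pos) for pos, m in enumerate(matches, start=1)]

print(break_terms("The rain stopped."))
```

Under these assumptions, punctuation and whitespace only separate terms and are never stored themselves.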
Depending on the needs of your application, you may find some behaviors of the built-in term breaker to be undesirable or limiting. For example, the built-in term breaker does not offer language-specific term breaking. Here are some reasons you may want to implement custom term breaking:
No language-specific term breaking: Linguistic rules about what constitutes a term differ from one language to another, so term breaking rules differ as well. The built-in term breaker does not offer language-specific term breaking rules.
Handling of words with apostrophes: The word "they'll" is treated as "they ll" by the GENERIC term breaker. However, you could design a custom term breaker that treats the apostrophe as part of the word.
No support for term replacement: You cannot specify replacements for a term. For example, when indexing the word "they'll", you might want to store it as two terms: they and will. Likewise, you may want to use term replacement to perform a case-insensitive search on a case-sensitive database.
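The apostrophe handling and term replacement described above can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not the external term breaker API: the names custom_break_terms and REPLACEMENTS are hypothetical, the replacement table is made up for the example, and lowercasing stands in for the case-insensitive matching mentioned above.

```python
import re

# Hypothetical replacement table: each key expands to the terms to store.
REPLACEMENTS = {"they'll": ["they", "will"]}

def custom_break_terms(document, replacements=REPLACEMENTS):
    """Return (term, position) pairs using custom rules.

    Assumptions for this sketch: an apostrophe between alphanumeric
    runs is part of the word, every word is lowercased, and positions
    start at 1 and count the stored (replaced) terms.
    """
    terms = []
    for match in re.finditer(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*", document):
        word = match.group(0).lower()  # fold case before lookup and storage
        for term in replacements.get(word, [word]):
            terms.append((term, len(terms) + 1))
    return terms

print(custom_break_terms("They'll sail"))
```

Passing an empty replacement table keeps "they'll" as a single term, which corresponds to the apostrophe-only customization.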
SQL Anywhere also allows you to use external prefilter libraries to perform prefiltering on data before it is indexed. Prefiltering allows you to extract only the textual content you want indexed from a document. For example, suppose you want to create a text index on a column containing XML values. A prefilter allows you to filter out the XML tags so that they are not indexed with the content.
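As a rough illustration of what a prefilter does with an XML value, the following Python sketch keeps only the text content and discards the tags. It is not the SQL Anywhere prefilter interface; the function name prefilter_xml and the sample document are assumptions made for this example, and it relies on Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

def prefilter_xml(xml_value):
    """Return only the text content of an XML value, without the tags.

    Sketch only: assumes the value is well-formed XML and that text
    fragments should be joined with single spaces.
    """
    root = ET.fromstring(xml_value)
    pieces = (text.strip() for text in root.itertext())
    return " ".join(piece for piece in pieces if piece)

sample = "<note><to>Ann</to><body>Bring an umbrella</body></note>"
print(prefilter_xml(sample))
```

The filtered text, rather than the raw XML, is what would then be passed on to term breaking.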
SQL Anywhere provides an API you can use to access custom and third-party prefilter and term breaker libraries when creating and updating full text indexes. This means you can use external libraries to process document formats such as XML, PDF, and Word, removing unwanted terms and content before the remaining text is indexed.
Some sample prefilter and term breaker libraries are included in your Samples directory to help you design your own, or you can use the API to access third-party libraries. If Microsoft Office is installed on the system running the database server, then IFilters for Office documents such as Word and Excel are available. If the server has Acrobat Reader installed, then a PDF IFilter is likely available.
Copyright © 2010, iAnywhere Solutions, Inc. - SQL Anywhere 12.0.0