对使用 NGRAM 术语断开器的文本索引执行非模糊全文搜索。此过程也可用于创建中文、日文或朝鲜文数据的全文搜索。
前提条件
您必须具有 CREATE TEXT CONFIGURATION 和 CREATE TABLE 系统特权。还必须具有 SELECT ANY TABLE 系统特权或对于表 MarketingInformation 的 SELECT 特权。
上下文和注释
在具有多字节字符集的数据库中,某些标点符号和空格字符(如全角逗号和全角空格)可能会处理为字母数字字符。
启动 Interactive SQL。单击 [开始] » [程序] » [SQL Anywhere 16] » [管理工具] » [Interactive SQL]。
在 [连接] 窗口中,填写以下字段:
在 [验证] 下拉列表中,选择 [数据库]。
在 [用户 ID] 字段中键入 DBA。
在 [口令] 字段中键入 sql。
在 [操作] 下拉列表中选择 [使用 ODBC 数据源连接]。
执行以下语句以创建名为 myNcharNGRAMTextConfig 的 NCHAR 文本配置对象:
CREATE TEXT CONFIGURATION myNcharNGRAMTextConfig FROM default_nchar; |
执行以下语句,将 TERM BREAKER 算法更改为 NGRAM,并将 MAXIMUM TERM LENGTH 设置为 2:
ALTER TEXT CONFIGURATION myNcharNGRAMTextConfig TERM BREAKER NGRAM; ALTER TEXT CONFIGURATION myNcharNGRAMTextConfig MAXIMUM TERM LENGTH 2; |
对于中文、日文和朝鲜文数据,N 的建议值为 2 或 3。如果搜索限制为一个或两个字符,则将 N 值设置为 1。将 N 值设置为 1 可能会降低较长查询的执行速度。
启动 Sybase Central。单击 [开始] » [程序] » [SQL Anywhere 16] » [管理工具] » [Sybase Central]。
单击 [连接] » [使用 SQL Anywhere 16 连接]。
在 [连接] 窗口中,填写以下字段:
在 [验证] 下拉列表中,选择 [数据库]。
在 [用户 ID] 字段中键入 DBA。
在 [口令] 字段中键入 sql。
在 [操作] 下拉列表中选择 [使用 ODBC 数据源连接]。
创建 MarketingInformation 表的副本。
展开 [表] 文件夹。
右击 MarketingInformation,然后单击 [复制]。
右击 [表] 文件夹,然后单击 [粘贴]。
在 [名称] 字段中键入 MarketingInformationNgram。
单击 [确定]。
在 Interactive SQL 中,执行以下语句向 MarketingInformationNgram 表中添加数据:
INSERT INTO MarketingInformationNgram SELECT * FROM GROUPO.MarketingInformation; COMMIT; |
执行以下语句可在 MarketingInformationNgram.Description 列上创建使用 myNcharNGRAMTextConfig 文本配置对象的 IMMEDIATE REFRESH 文本索引:
CREATE TEXT INDEX ncharNGRAMTextIndex ON MarketingInformationNgram( Description ) CONFIGURATION myNcharNGRAMTextConfig; |
测试文本索引。
下列语句在 2 元语法词文本索引中搜索包含 sw 的术语。结果按分数降序排序。
SELECT M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, 'sw' ) ct ORDER BY ct.score DESC; |
Description | Score |
---|---|
<html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown
cotton hooded Sweatshirt with taped neck seams. Comes pre-washed for softness and to lessen shrinkage.</span></p></body></html>
|
2.262071918398649 |
<html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Top-notch construction includes
durable topstitched seams for strength with low-bulk, resilient rib-knit collar, cuffs and bottom. An 80% cotton/20% polyester
blend makes it easy to keep them clean.</span></p></body></html>
|
1.5556043490424176 |
以下语句搜索包含 ams 的术语。结果按分数降序排序。
SELECT M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, 'ams' ) ct ORDER BY ct.score DESC; |
对于 2 元语法词文本索引,以上语句在语义上等效于:
SELECT M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, '"am ms"' ) ct ORDER BY ct.score DESC; |
两个语句均会返回以下结果:
Description | Score |
---|---|
<html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically
grown cotton hooded sweatshirt with taped neck seams. Comes pre-washed for softness and to lessen shrinkage.</span></p></body></html>
|
1.6619019465461564 |
<html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Top-notch construction
includes durable topstitched seams for strength with low-bulk, resilient rib-knit collar, cuffs and bottom. An 80% cotton/20% polyester blend makes it easy
to keep them clean.</span></p></body></html>
|
1.5556043490424176 |
以下语句搜索包含后跟任意字母数字字符的 v 的术语。因为 ve 在索引数据中出现频率较高,所以包含 2 元语法词 ve 的行所得到的分数要低于包含 vi 的行。结果按分数降序排序。
SELECT M.ID, M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, 'v*' ) ct ORDER BY ct.score DESC; |
ID | Description | Score |
---|---|---|
901 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>We've improved the design of this perennial favvorite. A sleek and technical shirt built for the trail, track, or sidewalk. UPF rating of 50+.</span></p></body></html>
|
3.3416789108071976 |
907 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A polycarbonate visor with an abrasion-resistant coating on the outside. Great for jogging in the spring, summer, and early fall. The elastic
headband has plenty of stretch to give you a snug yet comfortable fit every time you wear it.</span></p></body></html>
|
2.1123084896159376 |
905 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A lightweight wool
cap with mesh side vents for breathable comfort during aerobic activities. Moisture-absorbing headband liner.</span></p></body></html>
|
1.6750365447462499 |
910 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Shorts</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>These quick-drying cotton
shorts provide all day comfort on or off the trails. Now with a more comfortable and stretchy fabric and an adjustable drawstring waist.</span></p></body></html>
|
0.9244136988525732 |
906 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown
cotton construction. Shields against sun and precipitation. Metallic ions in the fibers inhibit bacterial growth, and help
neutralize odor.</span></p></body></html>
|
0.9134171046194403 |
904 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This fashionable hat
is ideal for glacier travel, sea-kayaking, and hiking. With concealed draw cord for windy days.</span></p></body></html>
|
0.7313071661212746 |
903 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A sporty, casual shirt
made of recycled water bottles. It will serve you equally well on trails or around town. The fabric has a wicking finish to pull perspiration away from your skin.</span></p></body></html>
|
0.6799436746197272 |
以下语句在各行中搜索包含 v 的全部术语。第二个语句之后,变量包含字符串 [av OR ev OR iv OR ov OR rv OR ve OR vi OR vo
]。结果按分数降序排序。如果某一 n 元语法词在所有索引行中出现,则为其指定的分数为零。
如果单个字符出现在空白字符或非字母数字字符之前,那么这是用于定位该字符的唯一方法。
CREATE VARIABLE query NVARCHAR (100); SELECT LIST (term, ' OR ' ) INTO query FROM sa_text_index_vocab_nchar( 'ncharNGRAMTextIndex', 'MarketingInformationNgram', 'dba' ) WHERE term LIKE '%v%'; SELECT M.ID, M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, query ) ct ORDER BY ct.score DESC; |
ID | Description | Score |
---|---|---|
901 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>We've improved the design of this perennial favorite. A sleek and technical shirt built for the trail, track, or sidewalk. UPF rating of 50+.</span></p></body></html>
|
6.654350268810443 |
907 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A polycarbonate visor with an abrasion-resistant coating on the outside. Great for jogging in the spring, summer, and early fall. The elastic
headband has plenty of stretch to give you a snug yet comfortable fit every time you wear it.</span></p></body></html>
|
4.265623837817126 |
903 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A sporty, casual shirt
made of recycled water bottles. It will serve you equally well on trails or around town. The fabric has a wicking finish to pull perspiration away from your skin.</span></p></body></html>
|
2.9386676702799504 |
910 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Shorts</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>These quick-drying cotton
shorts provide all day comfort on or off the trails. Now with a more comfortable and stretchy fabric and an adjustable drawstring waist.</span></p></body></html>
|
2.5481193655722336 |
904 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This fashionable hat
is ideal for glacier travel, sea-kayaking, and hiking. With concealed draw cord for windy days.</span></p></body></html>
|
2.4293498211307214 |
905 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A lightweight wool
cap with mesh side vents for breathable comfort during aerobic activities. Moisture-absorbing headband liner.</span></p></body></html>
|
1.6750365447462499 |
906 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown
cotton construction. Shields against sun and precipitation. Metallic ions in the fibers inhibit bacterial growth, and help
neutralize odor.</span></p></body></html>
|
0.9134171046194403 |
902 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This simple, sleek, and
lightweight technical shirt is designed for high-intensity workouts in hot and humid weather. The recycled polyester fabric
is gentle on the earth and soft against your skin.</span></p></body></html>
|
0 |
908 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically
grown cotton hooded sweatshirt with taped neck seams. Comes pre-washed for softness and to lessen shrinkage.</span></p></body></html>
|
0 |
909 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Top-notch construction
includes durable topstitched seams for strength with low-bulk, resilient rib-knit collar, cuffs and bottom. An 80% cotton/20%
polyester blend makes it easy to keep them clean.</span></p></body></html>
|
0 |
以下语句在 Description 列中搜索包含 ea、ka 和 ki 的行。
SELECT M.ID, M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, 'ea ka ki' ) ct ORDER BY ct.score DESC; |
ID | Description | Score |
---|---|---|
904 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This fashionable hat is ideal for glacier travel, sea-kayaking, and hiking. With concealed draw cord for windy days.</span></p></body></html>
|
3.4151032739119733 |
以下语句在 Description 列中搜索包含 ve 和 vi,但不包含 gg 的行。
SELECT M.ID, M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, 've & vi -gg' ) ct ORDER BY ct.score DESC; |
ID | Description | Score |
---|---|---|
905 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A lightweight wool
cap with mesh side vents for breathable comfort during aerobic activities. Moisture-absorbing headband liner.</span></p></body></html>
|
1.6750365447462499 |
关闭 Interactive SQL 和 Sybase Central。
![]() |
使用DocCommentXchange讨论此页。
|
版权 © 2013, SAP 股份公司或其关联公司. - SAP Sybase SQL Anywhere 16.0 |