可通过以下过程对使用 NGRAM 术语断开器的文本索引执行全文搜索。此过程也可用于创建中文、日文或朝鲜文数据的全文搜索。
在具有多字节字符集的数据库中,某些标点符号和空格字符(如全角逗号和全角空格)可能会处理为字母数字字符。
启动 Interactive SQL 并使用 SQL Anywhere 12 Demo 数据源连接到示例数据库。
执行以下语句以创建名为 myNcharNGRAMTextConfig 的 NCHAR 文本配置对象:
CREATE TEXT CONFIGURATION myNcharNGRAMTextConfig FROM default_nchar; |
执行以下语句,将 TERM BREAKER 算法更改为 NGRAM,并将 MAXIMUM TERM LENGTH (N) 设置为 2。
ALTER TEXT CONFIGURATION myNcharNGRAMTextConfig TERM BREAKER NGRAM; |
ALTER TEXT CONFIGURATION myNcharNGRAMTextConfig MAXIMUM TERM LENGTH 2; |
对于中文、日文和朝鲜文数据,N 的建议值为 2 或 3。如果搜索限制为一个或两个字符,则将 N 值设置为 1。将 N 值设置为 1 可能会降低较长查询的执行速度。
启动 Sybase Central 并使用 SQL Anywhere 12 Demo 数据源连接到示例数据库。
创建 MarketingInformation 表的副本。
展开 [表] 文件夹。
右击 MarketingInformation,然后单击 [复制]。
右击 [表] 文件夹,然后单击 [粘贴]。
在 [名称] 字段中键入 MarketingInformationNgram。
单击 [确定]。
在 Interactive SQL 中,执行以下语句向 MarketingInformationNgram 表中添加数据:
INSERT INTO MarketingInformationNgram SELECT * FROM MarketingInformation; COMMIT; |
执行以下语句可在 MarketingInformationNgram.Description 列上创建使用 myNcharNGRAMTextConfig 文本配置对象的 IMMEDIATE REFRESH 文本索引:
CREATE TEXT INDEX ncharNGRAMTextIndex ON MarketingInformationNgram( Description ) CONFIGURATION myNcharNGRAMTextConfig; |
执行以下语句来测试文本索引。
下列语句在 2 元语法词文本索引中搜索包含 sw 的术语。结果按分数降序排序。
SELECT M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, 'sw' ) ct ORDER BY ct.score DESC; |
Description | Score |
---|---|
<html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown
cotton hooded Sweatshirt with taped neck seams. Comes pre-washed for softness and to lessen shrinkage.</span></p></body></html>
|
2.262071918398649 |
<html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Top-notch construction includes
durable topstitched seams for strength with low-bulk, resilient rib-knit collar, cuffs and bottom. An 80% cotton/20% polyester
blend makes it easy to keep them clean.</span></p></body></html>
|
1.5556043490424176 |
以下语句搜索包含 ams 的术语。结果按分数降序排序。
SELECT M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, 'ams' ) ct ORDER BY ct.score DESC; |
对于 2 元语法词文本索引,以上语句在语义上等效于:
SELECT M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, '"am ms"' ) ct ORDER BY ct.score DESC; |
Description | Score |
---|---|
<html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body
lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown cotton hooded sweatshirt
with taped neck seams. Comes pre-washed for softness and to lessen shrinkage.</span></p></body></html>
|
1.6619019465461564 |
<html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body
lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Top-notch construction includes durable topstitched seams for strength with low-bulk, resilient rib-knit collar, cuffs and bottom. An 80% cotton/20% polyester blend makes it easy
to keep them clean.</span></p></body></html>
|
1.5556043490424176 |
以下语句搜索包含后跟任意字母数字字符的 v 的术语。因为 ve 在索引数据中出现频率较高,所以包含 2 元语法词 ve 的行所得到的分数要低于包含 vi 的行。结果按分数降序排序。
SELECT M.ID, M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, 'v*' ) ct ORDER BY ct.score DESC; |
ID | Description | Score |
---|---|---|
901 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span
style='font-size:10.0pt;font-family:Arial'>We've improved the design of this perennial favvorite. A sleek and technical shirt built for the trail, track, or sidewalk. UPF rating of 50+.</span></p></body></html>
|
3.3416789108071976 |
907 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A polycarbonate visor with an abrasion-resistant coating on the outside. Great for jogging in the spring, summer, and early fall. The elastic
headband has plenty of stretch to give you a snug yet comfortable fit every time you wear it.</span></p></body></html>
|
2.1123084896159376 |
905 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body
lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A lightweight wool cap with mesh side vents for breathable comfort during aerobic activities. Moisture-absorbing headband liner.</span></p></body></html>
|
1.6750365447462499 |
910 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Shorts</title></head><body lang=EN-US><p><span
style='font-size:10.0pt;font-family:Arial'>These quick-drying cotton shorts provide all day comfort on or off the trails. Now with a more comfortable and stretchy fabric and an adjustable drawstring waist.</span></p></body></html>
|
0.9244136988525732 |
906 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown
cotton construction. Shields against sun and precipitation. Metallic ions in the fibers inhibit bacterial growth, and help
neutralize odor.</span></p></body></html>
|
0.9134171046194403 |
904 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body
lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This fashionable hat is ideal for glacier travel, sea-kayaking, and hiking. With concealed draw cord for windy days.</span></p></body></html>
|
0.7313071661212746 |
903 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span
style='font-size:10.0pt;font-family:Arial'>A sporty, casual shirt made of recycled water bottles. It will serve you equally well on trails or around town. The fabric has a wicking finish to pull perspiration away from your skin.</span></p></body></html>
|
0.6799436746197272 |
以下语句在各行中搜索包含 v 的全部术语。第二个语句之后,变量包含字符串 [av OR ev OR iv OR ov OR rv OR ve OR vi OR vo
]。结果按分数降序排序。如果某一 n 元语法词在所有索引行中出现,则为其指定的分数为零。
如果单个字符出现在空白字符或非字母数字字符之前,那么这是用于定位该字符的唯一方法。
CREATE VARIABLE query NVARCHAR (100); SELECT LIST (term, ' OR ' ) INTO query FROM sa_text_index_vocab_nchar( 'ncharNGRAMTextIndex', 'MarketingInformationNgram', 'dba' ) WHERE term LIKE '%v%'; SELECT M.ID, M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, query ) ct ORDER BY ct.score DESC; |
ID | Description | Score |
---|---|---|
901 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>We've improved the design of this perennial favorite. A sleek and technical shirt built for the trail, track, or sidewalk. UPF rating of 50+.</span></p></body></html>
|
6.654350268810443 |
907 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A polycarbonate visor with an abrasion-resistant coating on the outside. Great for jogging in the spring, summer, and early fall. The elastic
headband has plenty of stretch to give you a snug yet comfortable fit every time you wear it.</span></p></body></html>
|
4.265623837817126 |
903 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A
sporty, casual shirt made of recycled water bottles. It will serve you equally well on trails or around town. The fabric has a wicking finish to pull perspiration away from your skin.</span></p></body></html>
|
2.9386676702799504 |
910 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Shorts</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>These
quick-drying cotton shorts provide all day comfort on or off the trails. Now with a more comfortable and stretchy fabric and an adjustable drawstring waist.</span></p></body></html>
|
2.5481193655722336 |
904 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This
fashionable hat is ideal for glacier travel, sea-kayaking, and hiking. With concealed draw cord for windy days.</span></p></body></html>
|
2.4293498211307214 |
905 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A
lightweight wool cap with mesh side vents for breathable comfort during aerobic activities. Moisture-absorbing headband liner.</span></p></body></html>
|
1.6750365447462499 |
906 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown
cotton construction. Shields against sun and precipitation. Metallic ions in the fibers inhibit bacterial growth, and help
neutralize odor.</span></p></body></html>
|
0.9134171046194403 |
902 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This
simple, sleek, and lightweight technical shirt is designed for high-intensity workouts in hot and humid weather. The recycled
polyester fabric is gentle on the earth and soft against your skin.</span></p></body></html>
|
0 |
908 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight
100% organically grown cotton hooded sweatshirt with taped neck seams. Comes pre-washed for softness and to lessen shrinkage.</span></p></body></html>
|
0 |
909 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Top-notch
construction includes durable topstitched seams for strength with low-bulk, resilient rib-knit collar, cuffs and bottom. An
80% cotton/20% polyester blend makes it easy to keep them clean.</span></p></body></html>
|
0 |
以下语句在 Description 列中搜索包含 ea、ka 和 ki 的行。
SELECT M.ID, M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, 'ea ka ki' ) ct ORDER BY ct.score DESC; |
ID | Description | Score |
---|---|---|
904 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This fashionable hat is ideal for glacier travel, sea-kayaking, and hiking. With concealed draw cord for windy days.</span></p></body></html>
|
3.4151032739119733 |
以下语句在 Description 列中搜索包含 ve 和 vi,但不包含 gg 的行。
SELECT M.ID, M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, 've & vi -gg' ) ct ORDER BY ct.score DESC; |
ID | Description | Score |
---|---|---|
905 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body
lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A lightweight wool cap with mesh side vents for breathable comfort during aerobic activities. Moisture-absorbing headband liner.</span></p></body></html>
|
1.6750365447462499 |
从 Interactive SQL 和 Sybase Central 断开连接。
(可选)将示例数据库 (demo.db) 恢复至其初始状态。
![]() |
使用DocCommentXchange讨论此页。
|
版权 © 2012, iAnywhere Solutions, Inc. - SQL Anywhere 12.0.1 |