Click here to view and discuss this page in DocCommentXchange. In the future, you will be sent there automatically.

SAP Sybase SQL Anywhere 16.0 (中文) » SQL Anywhere 服务器 - SQL 用法 » 查询和数据修改 » 全文搜索

 

教程:对 NGRAM 文本索引执行非模糊全文搜索

对使用 NGRAM 术语断开器的文本索引执行非模糊全文搜索。此过程也可用于创建中文、日文或朝鲜文数据的全文搜索。

前提条件

您必须具有 CREATE TEXT CONFIGURATION 和 CREATE TABLE 系统特权。还必须具有 SELECT ANY TABLE 系统特权或对于表 MarketingInformation 的 SELECT 特权。

上下文和注释

在具有多字节字符集的数据库中,某些标点符号和空格字符(如全角逗号和全角空格)可能会处理为字母数字字符。

 任务
  1. 启动 Interactive SQL。单击 [开始] » [程序] » [SQL Anywhere 16] » [管理工具] » [Interactive SQL]。

  2. 在 [连接] 窗口中,填写以下字段:

    1. 在 [验证] 下拉列表中,选择 [数据库]。

    2. 在 [用户 ID] 字段中键入 DBA

    3. 在 [口令] 字段中键入 sql

    4. 在 [操作] 下拉列表中选择 [使用 ODBC 数据源连接]。

  3. 执行以下语句以创建名为 myNcharNGRAMTextConfig 的 NCHAR 文本配置对象:

    CREATE TEXT CONFIGURATION myNcharNGRAMTextConfig FROM default_nchar;
  4. 执行以下语句,将 TERM BREAKER 算法更改为 NGRAM,并将 MAXIMUM TERM LENGTH 设置为 2:

    ALTER TEXT CONFIGURATION myNcharNGRAMTextConfig 
       TERM BREAKER NGRAM;
    ALTER TEXT CONFIGURATION myNcharNGRAMTextConfig 
       MAXIMUM TERM LENGTH 2;

    对于中文、日文和朝鲜文数据,N 的建议值为 2 或 3。如果搜索限制为一个或两个字符,则将 N 值设置为 1。将 N 值设置为 1 可能会降低较长查询的执行速度。

  5. 启动 Sybase Central。单击 [开始] » [程序] » [SQL Anywhere 16] » [管理工具] » [Sybase Central]。

  6. 单击 [连接] » [使用 SQL Anywhere 16 连接]。

  7. 在 [连接] 窗口中,填写以下字段:

    1. 在 [验证] 下拉列表中,选择 [数据库]。

    2. 在 [用户 ID] 字段中键入 DBA

    3. 在 [口令] 字段中键入 sql

    4. 在 [操作] 下拉列表中选择 [使用 ODBC 数据源连接]。

  8. 创建 MarketingInformation 表的副本。

    1. 展开 [] 文件夹。

    2. 右击 MarketingInformation,然后单击 [复制]。

    3. 右击 [] 文件夹,然后单击 [粘贴]。

    4. 在 [名称] 字段中键入 MarketingInformationNgram

    5. 单击 [确定]。

  9. 在 Interactive SQL 中,执行以下语句向 MarketingInformationNgram 表中添加数据:

    INSERT INTO MarketingInformationNgram
       SELECT * FROM GROUPO.MarketingInformation;
    COMMIT;
  10. 执行以下语句可在 MarketingInformationNgram.Description 列上创建使用 myNcharNGRAMTextConfig 文本配置对象的 IMMEDIATE REFRESH 文本索引:

    CREATE TEXT INDEX ncharNGRAMTextIndex 
       ON MarketingInformationNgram( Description ) 
          CONFIGURATION myNcharNGRAMTextConfig;
  11. 测试文本索引。

    1. 下列语句在 2 元语法词文本索引中搜索包含 sw 的术语。结果按分数降序排序。

      SELECT M.Description, ct.*
         FROM MarketingInformationNgram AS M 
      CONTAINS( M.Description, 'sw' ) ct
         ORDER BY ct.score DESC;
      Description Score
      <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown cotton hooded Sweatshirt with taped neck seams. Comes pre-washed for softness and to lessen shrinkage.</span></p></body></html> 2.262071918398649
      <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Top-notch construction includes durable topstitched seams for strength with low-bulk, resilient rib-knit collar, cuffs and bottom. An 80% cotton/20% polyester blend makes it easy to keep them clean.</span></p></body></html> 1.5556043490424176
    2. 以下语句搜索包含 ams 的术语。结果按分数降序排序。

      SELECT M.Description, ct.*
         FROM MarketingInformationNgram AS M 
      CONTAINS( M.Description, 'ams' ) ct
         ORDER BY ct.score DESC;

      对于 2 元语法词文本索引,以上语句在语义上等效于:

      SELECT M.Description, ct.*
         FROM MarketingInformationNgram AS M 
      CONTAINS( M.Description, '"am ms"' ) ct
         ORDER BY ct.score DESC;

      两个语句均会返回以下结果:

      Description Score
      <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown cotton hooded sweatshirt with taped neck seams. Comes pre-washed for softness and to lessen shrinkage.</span></p></body></html> 1.6619019465461564
      <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Top-notch construction includes durable topstitched seams for strength with low-bulk, resilient rib-knit collar, cuffs and bottom. An 80% cotton/20% polyester blend makes it easy to keep them clean.</span></p></body></html> 1.5556043490424176
    3. 以下语句搜索包含后跟任意字母数字字符的 v 的术语。因为 ve 在索引数据中出现频率较高,所以包含 2 元语法词 ve 的行所得到的分数要低于包含 vi 的行。结果按分数降序排序。

      SELECT M.ID, M.Description, ct.*
         FROM MarketingInformationNgram AS M 
      CONTAINS( M.Description, 'v*' ) ct
         ORDER BY ct.score DESC;
      
      ID Description Score
      901 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>We've improved the design of this perennial favvorite. A sleek and technical shirt built for the trail, track, or sidewalk. UPF rating of 50+.</span></p></body></html> 3.3416789108071976
      907 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A polycarbonate visor with an abrasion-resistant coating on the outside. Great for jogging in the spring, summer, and early fall. The elastic headband has plenty of stretch to give you a snug yet comfortable fit every time you wear it.</span></p></body></html> 2.1123084896159376
      905 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A lightweight wool cap with mesh side vents for breathable comfort during aerobic activities. Moisture-absorbing headband liner.</span></p></body></html> 1.6750365447462499
      910 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Shorts</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>These quick-drying cotton shorts provide all day comfort on or off the trails. Now with a more comfortable and stretchy fabric and an adjustable drawstring waist.</span></p></body></html> 0.9244136988525732
      906 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown cotton construction. Shields against sun and precipitation. Metallic ions in the fibers inhibit bacterial growth, and help neutralize odor.</span></p></body></html> 0.9134171046194403
      904 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This fashionable hat is ideal for glacier travel, sea-kayaking, and hiking. With concealed draw cord for windy days.</span></p></body></html> 0.7313071661212746
      903 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A sporty, casual shirt made of recycled water bottles. It will serve you equally well on trails or around town. The fabric has a wicking finish to pull perspiration away from your skin.</span></p></body></html> 0.6799436746197272
    4. 以下语句在各行中搜索包含 v 的全部术语。第二个语句之后,变量包含字符串 [av OR ev OR iv OR ov OR rv OR ve OR vi OR vo]。结果按分数降序排序。如果某一 n 元语法词在所有索引行中出现,则为其指定的分数为零。

      如果单个字符出现在空白字符或非字母数字字符之前,那么这是用于定位该字符的唯一方法。

      CREATE VARIABLE query NVARCHAR (100);
      SELECT LIST (term, ' OR ' )
      INTO query
         FROM sa_text_index_vocab_nchar( 'ncharNGRAMTextIndex', 'MarketingInformationNgram', 'dba' )
         WHERE term LIKE '%v%';
      SELECT M.ID, M.Description, ct.*
         FROM MarketingInformationNgram AS M 
         CONTAINS( M.Description, query ) ct
         ORDER BY ct.score DESC;
      ID Description Score
      901 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>We've improved the design of this perennial favorite. A sleek and technical shirt built for the trail, track, or sidewalk. UPF rating of 50+.</span></p></body></html> 6.654350268810443
      907 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A polycarbonate visor with an abrasion-resistant coating on the outside. Great for jogging in the spring, summer, and early fall. The elastic headband has plenty of stretch to give you a snug yet comfortable fit every time you wear it.</span></p></body></html> 4.265623837817126
      903 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A sporty, casual shirt made of recycled water bottles. It will serve you equally well on trails or around town. The fabric has a wicking finish to pull perspiration away from your skin.</span></p></body></html> 2.9386676702799504
      910 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Shorts</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>These quick-drying cotton shorts provide all day comfort on or off the trails. Now with a more comfortable and stretchy fabric and an adjustable drawstring waist.</span></p></body></html> 2.5481193655722336
      904 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This fashionable hat is ideal for glacier travel, sea-kayaking, and hiking. With concealed draw cord for windy days.</span></p></body></html> 2.4293498211307214
      905 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A lightweight wool cap with mesh side vents for breathable comfort during aerobic activities. Moisture-absorbing headband liner.</span></p></body></html> 1.6750365447462499
      906 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown cotton construction. Shields against sun and precipitation. Metallic ions in the fibers inhibit bacterial growth, and help neutralize odor.</span></p></body></html> 0.9134171046194403
      902 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This simple, sleek, and lightweight technical shirt is designed for high-intensity workouts in hot and humid weather. The recycled polyester fabric is gentle on the earth and soft against your skin.</span></p></body></html> 0
      908 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown cotton hooded sweatshirt with taped neck seams. Comes pre-washed for softness and to lessen shrinkage.</span></p></body></html> 0
      909 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Top-notch construction includes durable topstitched seams for strength with low-bulk, resilient rib-knit collar, cuffs and bottom. An 80% cotton/20% polyester blend makes it easy to keep them clean.</span></p></body></html> 0
    5. 以下语句在 Description 列中搜索包含 eakaki 的行。

      SELECT M.ID, M.Description, ct.*
         FROM MarketingInformationNgram AS M 
      CONTAINS( M.Description, 'ea ka ki' ) ct
        ORDER BY ct.score DESC;
      ID Description Score
      904 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This fashionable hat is ideal for glacier travel, sea-kayaking, and hiking. With concealed draw cord for windy days.</span></p></body></html> 3.4151032739119733
    6. 以下语句在 Description 列中搜索包含 vevi,但不包含 gg 的行。

      SELECT M.ID, M.Description, ct.*
         FROM MarketingInformationNgram AS M 
      CONTAINS( M.Description, 've & vi -gg' ) ct
         ORDER BY ct.score DESC;
      ID Description Score
      905 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A lightweight wool cap with mesh side vents for breathable comfort during aerobic activities. Moisture-absorbing headband liner.</span></p></body></html> 1.6750365447462499
  12. 关闭 Interactive SQL 和 Sybase Central。

结果

已对 NGRAM 文本索引执行全文搜索。

下一个

(可选)将示例数据库 (demo.db) 恢复至其原始状态。请参见重新创建示例数据库 (demo.db)

 另请参见