Click here to view and discuss this page in DocCommentXchange. In the future, you will be sent there automatically.

SQL Anywhere 11.0.1 (日本語) » SQL Anywhere サーバ - SQL の使用法 » データのクエリと変更 » データのクエリ » 全文検索のタイプ

 

チュートリアル:NGRAM テキスト・インデックスへの全文検索の実行

次の手順に従って、NGRAM 単語区切りを使用するテキスト・インデックスに全文検索を実行します。中国語、日本語、韓国語データの全文検索を作成する際もこの手順で実行できます。

マルチバイト文字セットを含むデータベースでは、全角カンマや全角スペースなどの一部の句読表記文字やスペース文字は、英数字として処理される場合があります。

参照:チュートリアル:あいまい全文検索の実行

♦  NGRAM テキスト・インデックスへの全文検索の実行
  1. 次の文を実行して、myNcharNGRAMTextConfig という NCHAR テキスト設定オブジェクトを作成します。

    CREATE TEXT CONFIGURATION myNcharNGRAMTextConfig FROM default_nchar;
  2. 次の文を実行して、TERM BREAKER アルゴリズムを NGRAM に変更し、MAXIMUM TERM LENGTH (N) を 2 に設定します。

    ALTER TEXT CONFIGURATION myNcharNGRAMTextConfig 
       TERM BREAKER NGRAM;
    ALTER TEXT CONFIGURATION myNcharNGRAMTextConfig 
       MAXIMUM TERM LENGTH 2;

    中国語、日本語、韓国語データの場合は、N の値を 2 または 3 に設定することをおすすめします。検索語が 1 文字または 2 文字以内の場合は、N の値を 1 に設定します。クエリが長い場合に N の値を 1 に設定すると、実行に時間がかかる可能性があります。

  3. MarketingInformation テーブルのコピーを作成します。

    1. Sybase Central で [テーブル] フォルダを拡張します。

    2. [MarketingInformation] を右クリックし、[コピー] を選択します。

    3. [テーブル] フォルダを右クリックし、[貼り付け] を選択します。

    4. [名前] フィールドに、MarketingInformationNgram と入力します。[OK] をクリックします。

  4. 次の文を実行して、MarketingInformationNgram テーブルにデータを追加します。

    INSERT INTO MarketingInformationNgram
       SELECT *
          FROM MarketingInformation;
    COMMIT;
    
  5. 次の文を実行して、myNcharNGRAMTextConfig テキスト設定オブジェクトを使用することで、MarketingInformationNgram テーブルの Description カラムに IMMEDIATE REFRESH テキスト・インデックスを作成します。

    CREATE TEXT INDEX ncharNGRAMTextIndex 
       ON MarketingInformationNgram( Description ) 
          CONFIGURATION myNcharNGRAMTextConfig;
    
  6. 次の文を実行して、テキスト・インデックスをテストします。

    1. この文により、2-GRAM のテキスト・インデックスで sw を含む単語が検索されます。結果はスコアの降順でソートされます。

      SELECT M.Description, ct.*
         FROM MarketingInformationNgram AS M 
      CONTAINS( M.Description, 'sw' ) ct
         ORDER BY ct.score DESC;
      
      Description score
      <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>「Sw」eatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown cotton hooded 「Sw」eatshirt with taped neck seams. Comes pre-washed for softness and to lessen shrinkage.</span></p></body></html> 2.262071918398649
      <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>「Sw」eatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Top-notch construction includes durable topstitched seams for strength with low-bulk, resilient rib-knit collar, cuffs and bottom. An 80% cotton/20% polyester blend makes it easy to keep them clean.</span></p></body></html> 1.5556043490424176
    2. 次の文は、ams を含む単語を検索します。結果はスコアの降順でソートされます。

      SELECT M.Description, ct.*
         FROM MarketingInformationNgram AS M 
      CONTAINS( M.Description, 'ams' ) ct
         ORDER BY ct.score DESC;
      

      2-GRAM テキスト・インデックスでは、前述の文はセマンティック上、次の文と同義になります。

      SELECT M.Description, ct.*
         FROM MarketingInformationNgram AS M 
      CONTAINS( M.Description, '"am ms"' ) ct
         ORDER BY ct.score DESC;
      
      Description score
      <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown cotton hooded sweatshirt with taped neck se「ams」. Comes pre-washed for softness and to lessen shrinkage.</span></p></body></html> 1.6619019465461564
      <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Top-notch construction includes durable topstitched se「ams」 for strength with low-bulk, resilient rib-knit collar, cuffs and bottom. An 80% cotton/20% polyester blend makes it easy to keep them clean.</span></p></body></html> 1.5556043490424176
    3. 次の文は、v の後に任意の英数字が続く単語を検索します。インデックス付けされたデータで、ve の出現する頻度の方が高いため、2-gram の ve を含むローのスコアは、vi を含むローよりも低くなります。結果はスコアの降順でソートされます。

      SELECT M.ID, M.Description, ct.*
         FROM MarketingInformationNgram AS M 
      CONTAINS( M.Description, 'v*' ) ct
         ORDER BY ct.score DESC;
      
      ID Description score
      901 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>We'「ve」 impro「ve」d the design of this perennial fav「vo」rite. A sleek and technical shirt built for the trail, track, or sidewalk. UPF rating of 50+.</span></p></body></html> 3.3416789108071976
      907 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>「Vi」sor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A polycarbonate 「vi」sor with an abrasion-resistant coating on the outside. Great for jogging in the spring, summer, and early fall. The elastic headband has plenty of stretch to gi「ve」 you a snug yet comfortable fit e「ve」ry time you wear it.</span></p></body></html> 2.1123084896159376
      905 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A lightweight wool cap with mesh side 「ve」nts for breathable comfort during aerobic acti「vi」ties. Moisture-absorbing headband liner.</span></p></body></html> 1.6750365447462499
      910 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Shorts</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>These quick-drying cotton shorts pro「vi」de all day comfort on or off the trails. Now with a more comfortable and stretchy fabric and an adjustable drawstring waist.</span></p></body></html> 0.9244136988525732
      906 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>「Vi」sor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown cotton construction. Shields against sun and precipitation. Metallic ions in the fibers inhibit bacterial growth, and help neutralize odor.</span></p></body></html> 0.9134171046194403
      904 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This fashionable hat is ideal for glacier tra「ve」l, sea-kayaking, and hiking. With concealed draw cord for windy days.</span></p></body></html> 0.7313071661212746
      903 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A sporty, casual shirt made of recycled water bottles. It will ser「ve」 you equally well on trails or around town. The fabric has a wicking finish to pull perspiration away from your skin.</span></p></body></html> 0.6799436746197272
    4. 次の文は、各ローで v を含む任意の単語を検索します。2 番目の文の後、変数には文字列 av OR ev OR iv OR ov OR rv OR ve OR vi OR vo が含まれます。結果はスコアの降順でソートされます。インデックス付けされたすべてのローに N-gram があると、スコアは 0 になります。

      これは、ホワイトスペースや英数字以外の文字の前にある 1 文字を検索するための唯一の方法です。

      CREATE VARIABLE query NVARCHAR (100);
      SELECT LIST (term, ' OR ' )
      INTO query
         FROM sa_text_index_vocab( 'ncharNGRAMTextIndex', 'MarketingInformationNgram', 'dba' )
         WHERE term LIKE '%v%';
      SELECT M.ID, M.Description, ct.*
         FROM MarketingInformationNgram AS M 
         CONTAINS( M.Description, query ) ct
         ORDER BY ct.score DESC;
      
      ID Description score
      901 <html><head><meta http-equ「iv」=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>We'「ve」 impr「ove」d the design of this perennial f「avo」rite. A sleek and technical shirt built for the trail, track, or sidewalk. UPF rating of 50+.</span></p></body></html> 6.654350268810443
      907 <html><head><meta http-equ「iv」=Content-Type content="text/html; charset=windows-1252"><title>「Vi」sor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A polycarbonate 「vi」sor with an abrasion-resistant coating on the outside. Great for jogging in the spring, summer, and early fall. The elastic headband has plenty of stretch to g「ive」 you a snug yet comfortable fit 「eve」ry time you wear it.</span></p></body></html> 4.265623837817126
      903 <html><head><meta http-equ「iv」=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A sporty, casual shirt made of recycled water bottles. It will se「rve」 you equally well on trails or around town. The fabric has a wicking finish to pull perspiration away from your skin.</span></p></body></html> 2.9386676702799504
      910 <html><head><meta http-equ「iv」=Content-Type content="text/html; charset=windows-1252"><title>Shorts</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>These quick-drying cotton shorts pr「ovi」de all day comfort on or off the trails. Now with a more comfortable and stretchy fabric and an adjustable drawstring waist.</span></p></body></html> 2.5481193655722336
      904 <html><head><meta http-equ「iv」=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This fashionable hat is ideal for glacier tr「ave」l, sea-kayaking, and hiking. With concealed draw cord for windy days.</span></p></body></html> 2.4293498211307214
      905 <html><head><meta http-equ「iv」=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A lightweight wool cap with mesh side 「ve」nts for breathable comfort during aerobic act「ivi」ties. Moisture-absorbing headband liner.</span></p></body></html> 1.6750365447462499
      906 <html><head><meta http-equ「iv」=Content-Type content="text/html; charset=windows-1252"><title>「Vi」sor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown cotton construction. Shields against sun and precipitation. Metallic ions in the fibers inhibit bacterial growth, and help neutralize odor.</span></p></body></html> 0.9134171046194403
      902 <html><head><meta http-equ「iv」=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This simple, sleek, and lightweight technical shirt is designed for high-intensity workouts in hot and humid weather. The recycled polyester fabric is gentle on the earth and soft against your skin.</span></p></body></html> 0
      908 <html><head><meta http-equ「iv」=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown cotton hooded sweatshirt with taped neck seams. Comes pre-washed for softness and to lessen shrinkage.</span></p></body></html> 0
      909 <html><head><meta http-equ「iv」=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Top-notch construction includes durable topstitched seams for strength with low-bulk, resilient rib-knit collar, cuffs and bottom. An 80% cotton/20% polyester blend makes it easy to keep them clean.</span></p></body></html> 0
    5. 次の文は、Description カラムで eakak を含むローを検索します。

      SELECT M.ID, M.Description, ct.*
         FROM MarketingInformationNgram AS M 
      CONTAINS( M.Description, 'ea ka ki' ) ct
        ORDER BY ct.score DESC;
      
      ID Description score
      904 <html><h「ea」d><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></h「ea」d><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This fashionable hat is id「ea」l for glacier travel, s「ea」-「ka」yaking, and hi「ki」ng. With conc「ea」led draw cord for windy days.</span></p></body></html> 3.4151032739119733
    6. 次の文は、Description カラムで vevi を含み、gg は含まないローを検索します。

      SELECT M.ID, M.Description, ct.*
         FROM MarketingInformationNgram AS M 
      CONTAINS( M.Description, 've & vi –gg' ) ct
         ORDER BY ct.score DESC;
      
      ID Description score
      905 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A lightweight wool cap with mesh side 「ve」nts for breathable comfort during aerobic acti「vi」ties. Moisture-absorbing headband liner.</span></p></body></html> 1.6750365447462499