Plain Elasticsearch (ES) search and pinyin search

1: An analyzer is generally composed of three parts:
    zero or more character filters, exactly one tokenizer, and zero or more token filters

2: Components of an analyzer: internally, an analyzer is a pipeline

    Step 1 Character filtering
    Step 2 Tokenization
    Step 3 Token filtering


3: Analyzer pipeline:
    (input)
    -----String----->> (CharacterFilters)
    -----String----->> (Tokenizer)
    -----Tokens----->> (TokenFilters)
    -----Tokens----->>
    (output)
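
The three stages can be sketched in a few lines of Python. This is a toy model and not how Elasticsearch implements analysis: the function names (strip_html, tokenize, lowercase, remove_stop_words, analyze) and the tiny stop-word set are assumptions made purely for illustration.

```python
import re

# Hypothetical stand-ins for the three stages; names and rules are illustrative only.
STOP_WORDS = {"a", "an", "and", "the"}

def strip_html(text):
    """Character filter: string in, string out (each tag becomes a space)."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenize(text):
    """Tokenizer: string in, list of tokens out (runs of word characters)."""
    return re.findall(r"\w+", text)

def lowercase(tokens):
    """Token filter: tokens in, tokens out."""
    return [t.lower() for t in tokens]

def remove_stop_words(tokens):
    """Token filter: drop tokens found in the stop-word set."""
    return [t for t in tokens if t not in STOP_WORDS]

def analyze(text, char_filters, tokenizer, token_filters):
    """Run the stages in pipeline order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

tokens = analyze("<p>The Quick fox</p>", [strip_html], tokenize,
                 [lowercase, remove_stop_words])
print(tokens)  # ['quick', 'fox']
```

The order of the token filters matters in this sketch: lowercasing runs first so that "The" is recognized as the stop word "the".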


=========================Example 1==========================


{
    "index": {
        "analysis": {
            "analyzer": {
                "customHTMLSnowball": {
                    "type": "custom",
                    "char_filter": ["html_strip"],
                    "tokenizer": "standard",
                    "filter": ["lowercase", "stop", "snowball"]
                }
            }
        }
    }
}

The custom analyzer above is named customHTMLSnowball. It does the following:
Removes HTML tags such as <p>, <a>, and <div> (html_strip character filter).
Splits the text into words and drops punctuation (standard tokenizer).
Converts uppercase letters to lowercase (lowercase token filter).
Removes stop words such as "the", "they", "a", "an", and "and" (stop token filter).
Reduces words to their stems (snowball token filter; Snowball is a widely used family of stemming algorithms for English and other languages), for example:
cats -> cat
stemming -> stem
stemmed -> stem
An idealized stemmer would also map catty -> cat and stemmer -> stem, but a real Snowball stemmer does not necessarily reduce every such form that far.
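
One way to check what such an analyzer produces is Elasticsearch's _analyze API, which accepts an ad hoc char_filter, tokenizer, and filter list together with sample text. The sketch below only builds and prints the request body. The helper names are hypothetical, the localhost:9200 address is borrowed from Example 2, and no token output is shown because it depends on the cluster and its version.

```python
import json
import urllib.request

def build_analyze_body(text):
    """Request body mirroring the customHTMLSnowball definition."""
    return {
        "char_filter": ["html_strip"],
        "tokenizer": "standard",
        "filter": ["lowercase", "stop", "snowball"],
        "text": text,
    }

def run_analyze(text, base_url="http://localhost:9200"):
    """POST the body to a running node and return the token strings.

    Not called here, because it needs a live Elasticsearch node.
    """
    request = urllib.request.Request(
        base_url + "/_analyze",
        data=json.dumps(build_analyze_body(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return [t["token"] for t in json.load(response)["tokens"]]

body = build_analyze_body("<p>The cats are stemming</p>")
print(json.dumps(body, indent=2))
```

With a node running, run_analyze("<p>The cats are stemming</p>") would return the tokens the pipeline emits, which is a quick way to confirm the stemming behavior for specific words.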

 


=========================Example 1==========================

=========================Example 2==========================


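# Requires the IK and pinyin analysis plugins (ik_smart tokenizer, pinyin filter type).
# min_gram 1 and max_gram 20 differ by 19, which exceeds the default index.max_ngram_diff of 1
# enforced by Elasticsearch 7 and later, so the limit is raised in the settings below.
curl -XPUT "http://localhost:9200/yyyy" -H 'Content-Type: application/json' -d'
{
  "settings": {
   "index.max_ngram_diff": 19,
   "analysis": {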
     "analyzer": {
       "default": {
         "type": "custom",
         "tokenizer": "ik_smart",
         "char_filter": [
            "html_strip"
          ],
          "filter": [
            "pinyin_filter",
            "lowercase",
            "stop",
            "ngram_1_20"
          ]
       },
       "default_search":{
         "type": "custom",
         "tokenizer": "ik_smart",
         "char_filter": [
            "html_strip"
          ]
      }
     },
     "filter": {
       "ngram_1_20": {
         "type": "ngram",
         "min_gram": 1,
         "max_gram": 20,
         "token_chars": [
           "letter",
           "digit"
          ]
       },
       "pinyin_filter": {
         "type": "pinyin",
         "keep_original": true,
         "keep_joined_full_pinyin": true
       }
     }
   }
  }
}'
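
The split between default (used at index time) and default_search (used at search time) can be illustrated with a toy n-gram function. This is a sketch, not the plugin or Lucene code: the ngrams helper and the sample token liudehua (a joined pinyin form one might expect for 刘德华) are assumptions for illustration.

```python
def ngrams(token, min_gram=1, max_gram=20):
    """All substrings of token whose length is between min_gram and max_gram."""
    grams = []
    for start in range(len(token)):
        longest = min(max_gram, len(token) - start)
        for size in range(min_gram, longest + 1):
            grams.append(token[start:start + size])
    return grams

# Index time: the token is expanded into n-grams, as ngram_1_20 is configured to do.
indexed = set(ngrams("liudehua"))

# Search time: default_search has no ngram filter, so a query term stays whole
# and matches only if it equals one of the indexed grams.
print("liu" in indexed)     # True: a prefix of the token
print("dehua" in indexed)   # True: an inner substring
print("liuhua" in indexed)  # False: not a contiguous substring
```

Applying n-grams only at index time keeps queries precise, at the cost of a larger index: with min_gram set to 1, every single character of every token is indexed as its own term.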

 

=========================Example 2==========================
