Analysis

 

Analysis can be understood as word segmentation: breaking text into individual terms.

Analysis is performed by an analyzer. Analyzers come in two kinds: built-in and custom (user-defined).

 

1.1. Analyzers

1.1.1. Built-in analyzers

Docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-analyzers.html

Standard Analyzer: splits text at word boundaries, drops most punctuation, lowercases terms, and supports stop-word removal (disabled by default).

Simple Analyzer: splits text at any non-letter character and lowercases terms.

Whitespace Analyzer: splits text at whitespace characters and does not lowercase.

Stop Analyzer: like the simple analyzer, but also supports stop-word removal.

Pattern Analyzer: splits text using a regular expression.

Language Analyzers: analyzers for specific languages (for example english).

Fingerprint Analyzer: a specialist analyzer that creates a fingerprint which can be used for duplicate detection.
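
To make the differences concrete, here is a rough pure-Python approximation of how a few of these analyzers treat the same text. It is only a sketch: the function names, the regular expressions, and the small stop-word set are assumptions made for illustration, and Elasticsearch's real tokenizers follow Unicode segmentation rules that this does not reproduce.

```python
import re
import unicodedata

# Assumption: a small illustrative subset of English stop words.
STOP_WORDS = {"a", "an", "and", "is", "the", "this"}


def whitespace_analyze(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def simple_analyze(text):
    """Split at any non-letter character, then lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def stop_analyze(text):
    """Simple analyzer followed by stop-word removal."""
    return [t for t in simple_analyze(text) if t not in STOP_WORDS]


def fingerprint_analyze(text):
    """Lowercase, fold to ASCII, de-duplicate, sort, and join into one token."""
    tokens = re.findall(r"\w+", text.lower())
    folded = [
        unicodedata.normalize("NFKD", t).encode("ascii", "ignore").decode("ascii")
        for t in tokens
    ]
    return " ".join(sorted(set(folded)))


if __name__ == "__main__":
    sample = "The QUICK brown-fox, 2 dogs!"
    print(whitespace_analyze(sample))
    print(simple_analyze(sample))
    print(stop_analyze(sample))
    print(fingerprint_analyze("Yes yes, G\u00f6del said this"))
```

The whitespace version keeps "brown-fox," as a single token, while the simple version splits it and drops the digit; the fingerprint version collapses repeated words into one sorted string, which is what makes it usable for duplicate detection.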

 

1.1.2. Custom analyzers

Not covered for now.

 

1.2. Index-time analysis / search-time analysis

Index-time analysis is easy to understand: when a document is written, its text is split into terms, and those terms form the index.

Each text field can specify its own analyzer;

If none is specified, the default analyzer from the index settings applies, which is essentially the standard analyzer.
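
The lookup order just described can be sketched as a small function. This is a minimal illustration working on plain dicts shaped like a field mapping and index settings; the function name is hypothetical and nothing here calls Elasticsearch. It assumes the index-level default is the one configured under the `analysis.analyzer.default` setting.

```python
def index_analyzer_for(field_mapping, index_settings):
    """Return the name of the analyzer applied to a text field at index time."""
    # 1. An analyzer set on the field itself wins.
    if "analyzer" in field_mapping:
        return field_mapping["analyzer"]
    # 2. Otherwise use the index-level default, if one is configured.
    analyzers = index_settings.get("analysis", {}).get("analyzer", {})
    if "default" in analyzers:
        return "default"
    # 3. Otherwise fall back to the standard analyzer.
    return "standard"


if __name__ == "__main__":
    print(index_analyzer_for({"type": "text", "analyzer": "english"}, {}))
    print(index_analyzer_for({"type": "text"}, {}))
```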

 

Search-time analysis

The search query text is also analyzed, by default with the same analyzer that was used at index time;

A search-time analyzer can be set independently of the index-time analyzer, but this is generally unnecessary.
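
As a sketch of how the search-time choice falls back to the index-time one, the function below works on a plain dict shaped like a field mapping. The function name is hypothetical; `search_analyzer` is the mapping parameter used to set a separate search-time analyzer. Index-level defaults are deliberately left out to keep the example short.

```python
def search_analyzer_for(field_mapping, query_analyzer=None):
    """Return the name of the analyzer applied to query text for one field."""
    # An analyzer named in the query itself takes priority.
    if query_analyzer:
        return query_analyzer
    # Next, a dedicated search-time analyzer on the field.
    if "search_analyzer" in field_mapping:
        return field_mapping["search_analyzer"]
    # Otherwise reuse the index-time analyzer of the field.
    if "analyzer" in field_mapping:
        return field_mapping["analyzer"]
    return "standard"


if __name__ == "__main__":
    same = {"type": "text", "analyzer": "english"}
    split = {"type": "text", "analyzer": "english", "search_analyzer": "standard"}
    print(search_analyzer_for(same))
    print(search_analyzer_for(split))
```

Using the same analyzer on both sides is the usual choice because query terms only match when they are reduced to the same form as the indexed terms.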

 

1.2.1. Analysis example

Taking the built-in english analyzer as an example:

"The QUICK brown foxes jumped over the lazy dog!"

The analyzer first lowercases the text, then removes high-frequency stop words, then reduces each word to its stem. The end result is the sequence:

[ quick, brown, fox, jump, over, lazi, dog ]
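
The three stages can be imitated in a few lines of Python. This is a toy sketch only: the stop-word set is a small assumed subset, and `toy_stem` is a hypothetical stand-in that handles just the suffixes appearing in this sentence, not the stemming algorithm the real english analyzer uses.

```python
import re

# Assumption: a small illustrative subset of English stop words.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}


def toy_stem(word):
    """Strip a few common suffixes; a deliberately crude stand-in for a stemmer."""
    if word.endswith("es") and len(word) > 4:
        word = word[:-2]            # foxes -> fox
    elif word.endswith("ed") and len(word) > 4:
        word = word[:-2]            # jumped -> jump
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]
    if word.endswith("y") and len(word) > 2:
        word = word[:-1] + "i"      # lazy -> lazi
    return word


def toy_english_analyze(text):
    """Lowercase and tokenize, drop stop words, then stem each remaining term."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [toy_stem(t) for t in kept]


if __name__ == "__main__":
    print(toy_english_analyze("The QUICK brown foxes jumped over the lazy dog!"))
```

Note that stems such as lazi are not dictionary words; they only need to be the same at index time and at search time.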

 

 

2. Example

Environment setup:

Create the index test_i.

Create a field msg with the default configuration, i.e. the standard analyzer.

Create a field msg_english that uses the english analyzer.

 

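# Create a test environment

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumption: a local single-node cluster

# Indexing a document creates test_i; msg gets the default dynamic mapping

d = {"msg": "Eating an apple a day keeps doctor away."}

rv = es.index(index="test_i", body=d)

print(rv)

# Add the msg_english field, analyzed with the english analyzer

d = {"properties": {"msg_english": {"type": "text", "analyzer": "english"}}}

rv = es.indices.put_mapping(body=d, index=["test_i"])  # returns {'acknowledged': True} on success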

 

# View the mapping

rv = es.indices.get_mapping(index="test_i")

{
 "test_i": {
  "mappings": {
   "properties": {
    "msg": {
     "type": "text",
     "fields": {
      "keyword": {
       "type": "keyword",
       "ignore_above": 256
      }
     }
    },
    "msg_english": {
     "type": "text",
     "analyzer": "english"
    }
   }
  }
 }
}

 

Index a document:

d = {"msg_english": "Eating an apple a day keeps doctor away."}

rv = es.index(index="test_i", body=d)

 

Query: the query has two parts. First, a match for eat on the msg field returns no hits; then the same match is run on the msg_english field:

# Search API

def search_api_test():
    # Match "eat" against the field analyzed with the english analyzer
    data = {"query": {"match": {"msg_english": "eat"}}}
    rv = es.search(index="test_i", body=data)
    print(rv)

search_api_test()

 

Result:

{
 "took": 2,
 "timed_out": false,
 "_shards": {
  "total": 1,
  "successful": 1,
  "skipped": 0,
  "failed": 0
 },
 "hits": {
  "total": {
   "value": 1,
   "relation": "eq"
  },
  "max_score": 0.2876821,
  "hits": [
   {
    "_index": "test_i",
    "_type": "_doc",
    "_id": "XG7KFG0BpAsDZnvvGLz2",
    "_score": 0.2876821,
    "_source": {
     "msg_english": "Eating an apple a day keeps doctor away."
    }
   }
  ]
 }
}

 

Supplement: an analysis test that shows the difference between the standard and english analyzers directly.

Test code:

# Analysis test

d1 = {"analyzer": "standard", "text": "Eating an apple a day keeps doctor away."}

d2 = {"analyzer": "english", "text": "Eating an apple a day keeps doctor away."}

rv1 = es.indices.analyze(body=d1, format="text")

rv2 = es.indices.analyze(body=d2, format="text")

print([x["token"] for x in rv1["tokens"]])  # tokens from the standard analyzer

print([x["token"] for x in rv2["tokens"]])  # tokens from the english analyzer

Output:

['eating', 'an', 'apple', 'a', 'day', 'keeps', 'doctor', 'away']

['eat', 'appl', 'dai', 'keep', 'doctor', 'awai']
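
These two token lists explain the search results seen earlier. A match query hits when at least one analyzed query term equals an indexed term. The sketch below replays that comparison in plain Python using the token lists printed above; the `matches` helper is hypothetical and ignores scoring entirely. It assumes the query text eat comes out of both analyzers unchanged as the single term eat.

```python
# Indexed terms, copied from the analyze output above.
standard_tokens = ['eating', 'an', 'apple', 'a', 'day', 'keeps', 'doctor', 'away']
english_tokens = ['eat', 'appl', 'dai', 'keep', 'doctor', 'awai']


def matches(query_terms, indexed_terms):
    """True if any analyzed query term is present among the indexed terms."""
    indexed = set(indexed_terms)
    return any(term in indexed for term in query_terms)


# Assumption: both analyzers turn the query text "eat" into the single term "eat".
query_terms = ["eat"]

if __name__ == "__main__":
    print("msg:", matches(query_terms, standard_tokens))
    print("msg_english:", matches(query_terms, english_tokens))
```

The msg field stores eating, which is not equal to eat, so there is no hit; the msg_english field stores the stem eat, so the document is found.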

 

Origin www.cnblogs.com/wodeboke-y/p/11562809.html