Setting up an Elasticsearch environment for Chinese word-segmentation search

Elasticsearch is a powerful search engine and an important part of the ELK stack.

Written notes beat a good memory, so here is how I set up a test environment for Chinese word-segmentation search with ES on Windows. The steps are as follows.

1. Install JDK 1.8 and configure the environment variables.

2. Download Elasticsearch 7.1.1. Versions change quickly: the latest release when I checked was already 7.2.0, but this environment is built on 7.1.1. Download it from https://www.elastic.co/cn/downloads/elasticsearch to get a zip archive. After unzipping it, start ES by running the following command in cmd from the unzipped directory:

.\bin\elasticsearch.bat

On a normal start, the command prompt shows some log output.

Enter http://localhost:9200/ in a browser to test whether the service can be reached. Normally it displays summary information, which means ES has been set up successfully.
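
The same check can be scripted. Below is a minimal sketch, assuming the root URL returns the usual JSON summary with a version.number field. The helper names (es_version, fetch_es_version) and the abbreviated sample body are made up for illustration, and the network call is left commented out so the snippet runs without a live node.

```python
import json
from urllib.request import urlopen

ES_URL = "http://localhost:9200/"  # default address used in this post


def es_version(body: str) -> str:
    """Return the version number from the JSON summary served at the ES root URL."""
    info = json.loads(body)
    return info["version"]["number"]


def fetch_es_version(url: str = ES_URL) -> str:
    """Fetch the summary from a running ES node and return its version."""
    with urlopen(url, timeout=5) as resp:
        return es_version(resp.read().decode("utf-8"))


# Illustrative, abbreviated response body (not captured from a real node).
sample = '{"name": "node-1", "version": {"number": "7.1.1"}, "tagline": "You Know, for Search"}'
print(es_version(sample))

# With ES running locally:
# print(fetch_es_version())
```

Knowing the exact version matters later on, because the word-segmentation plugin has to match it.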

3. Elasticsearch provides a powerful RESTful interface, but it has no UI, so working with it is not very intuitive. elasticsearch-head solves this problem nicely: it is a Node-based tool that connects to the ES service and provides a visual interface. For details, see https://github.com/mobz/elasticsearch-head .

The installation procedure is very simple:

git clone https://github.com/mobz/elasticsearch-head.git
cd elasticsearch-head
npm install
npm run start

After a normal start, the service prints its start-up output in the console.

 

Enter http://localhost:9100/ in a browser to see the corresponding UI.

4. Install the Chinese word-segmentation plugin. For details see https://github.com/medcl/elasticsearch-analysis-ik . Take care to choose the right version, otherwise the installation will fail: for ES 7.1.1, select the matching 7.1.1 release. The installation command is as follows:

.\bin\elasticsearch-plugin.bat install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.1.1/elasticsearch-analysis-ik-7.1.1.zip

Restart ES after the installation so that the plugin is loaded.
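
Since the plugin version has to match the ES version, the download URL can be derived from the version string. This is a small sketch, assuming other releases follow the same URL pattern as the 7.1.1 link above; that is an assumption, so check the project's releases page before relying on it. The function name ik_plugin_url is made up for illustration.

```python
def ik_plugin_url(es_version: str) -> str:
    """Build the IK release URL for an ES version (pattern assumed from the 7.1.1 link)."""
    base = "https://github.com/medcl/elasticsearch-analysis-ik/releases/download"
    return f"{base}/v{es_version}/elasticsearch-analysis-ik-{es_version}.zip"


print(ik_plugin_url("7.1.1"))
```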

5. Test the Chinese word-segmentation search. The following requests can be sent from elasticsearch-head or Postman.

- Create an index
curl -XPUT http://localhost:9200/news

- Add the index mapping. Do this before adding any data, because the analyzer of a field that already exists cannot be changed afterwards.
curl -XPOST http://localhost:9200/news/_mapping -H 'Content-Type:application/json' -d'
{
    "properties": {
        "content": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart"
        }
    }
}'

- Add data to the index
curl -XPOST http://localhost:9200/news/_create/1 -H 'Content-Type:application/json' -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'

Add further test documents in the same way. The search results below include documents with the content 中华民族国歌 and 人民公社.
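
The index, mapping, and document requests can also be prepared from Python with the standard library alone. The sketch below only builds the requests and prints them, so it runs without a live node. The helper es_request is hypothetical, and the document id and content are taken from the search results shown later in this post.

```python
import json
from typing import Optional
from urllib.request import Request

BASE = "http://localhost:9200"  # default address used in this post


def es_request(method: str, path: str, body: Optional[dict] = None) -> Request:
    """Build (but do not send) a JSON request against the local ES node."""
    data = None
    headers = {}
    if body is not None:
        # Keep Chinese text readable in the payload instead of \u escapes.
        data = json.dumps(body, ensure_ascii=False).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(f"{BASE}{path}", data=data, headers=headers, method=method)


mapping = {
    "properties": {
        "content": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart",
        }
    }
}

# Order matters: create the index, set the mapping, then add documents.
steps = [
    es_request("PUT", "/news"),
    es_request("POST", "/news/_mapping", mapping),
    es_request("POST", "/news/_create/6", {"content": "中华民族国歌"}),
]

for req in steps:
    print(req.get_method(), req.full_url)

# To actually send them with ES running:
# from urllib.request import urlopen
# for req in steps:
#     print(urlopen(req).read().decode("utf-8"))
```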

The difference between ik_max_word and ik_smart:

ik_max_word: splits the text at the finest granularity. For example, "中华人民共和国国歌" (the national anthem of the People's Republic of China) is split into "中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌". It exhausts all possible combinations and is suited to Term queries.

ik_smart: splits the text at the coarsest granularity. For example, "中华人民共和国国歌" is split into "中华人民共和国, 国歌". It is suited to Phrase queries.
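
To get a feel for the two granularities, here is a toy sketch. It is not IK's actual algorithm: the fine-grained function simply lists every dictionary word found in the text, and the coarse-grained one uses plain forward maximum matching. The tiny dictionary WORDS is an assumption made up for this example. IK ships a far larger dictionary and also emits some tokens, such as single characters, that are not modeled here.

```python
# Toy dictionary for illustration only.
WORDS = {
    "中华人民共和国", "中华人民", "中华", "华人",
    "人民共和国", "人民", "共和国", "共和", "国歌",
}


def fine_grained(text, words=WORDS):
    """List every dictionary word occurring in text, longest first per start offset."""
    out = []
    for start in range(len(text)):
        for end in range(len(text), start, -1):
            if text[start:end] in words:
                out.append(text[start:end])
    return out


def coarse_grained(text, words=WORDS):
    """Forward maximum matching: greedily take the longest word at each position."""
    out, i = [], 0
    while i < len(text):
        for end in range(len(text), i, -1):
            if text[i:end] in words:
                break
        else:
            end = i + 1  # no dictionary word starts here: emit a single character
        out.append(text[i:end])
        i = end
    return out


text = "中华人民共和国国歌"
print(fine_grained(text))
print(coarse_grained(text))
```

The first call yields many overlapping words, while the second yields only two non-overlapping ones. That is the same contrast as between the two analyzers.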

Test example:

Send a request to http://localhost:9200/_analyze using the ik_max_word analyzer. The results are as follows.

Input

{"text": "中华人民共和国人民大会堂", "analyzer": "ik_max_word"}

Output

{
    "tokens": [
        {
            "token": "中华人民共和国",
            "start_offset": 0,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "中华人民",
            "start_offset": 0,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "中华",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "华人",
            "start_offset": 1,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "人民共和国",
            "start_offset": 2,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "人民",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 5
        },
        {
            "token": "共和国",
            "start_offset": 4,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 6
        },
        {
            "token": "共和",
            "start_offset": 4,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 7
        },
        {
            "token": "国人",
            "start_offset": 6,
            "end_offset": 8,
            "type": "CN_WORD",
            "position": 8
        },
        {
            "token": "人民大会堂",
            "start_offset": 7,
            "end_offset": 12,
            "type": "CN_WORD",
            "position": 9
        },
        {
            "token": "人民大会",
            "start_offset": 7,
            "end_offset": 11,
            "type": "CN_WORD",
            "position": 10
        },
        {
            "token": "人民",
            "start_offset": 7,
            "end_offset": 9,
            "type": "CN_WORD",
            "position": 11
        },
        {
            "token": "大会堂",
            "start_offset": 9,
            "end_offset": 12,
            "type": "CN_WORD",
            "position": 12
        },
        {
            "token": "大会",
            "start_offset": 9,
            "end_offset": 11,
            "type": "CN_WORD",
            "position": 13
        },
        {
            "token": "会堂",
            "start_offset": 10,
            "end_offset": 12,
            "type": "CN_WORD",
            "position": 14
        }
    ]
}

If the input is

{"text": "中华人民共和国人民大会堂", "analyzer": "ik_smart"}

Output

{
    "tokens": [
        {
            "token": "中华人民共和国",
            "start_offset": 0,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "人民大会堂",
            "start_offset": 7,
            "end_offset": 12,
            "type": "CN_WORD",
            "position": 1
        }
    ]
}
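
The _analyze responses are easy to post-process. The sketch below pulls out the token strings and checks that each token's offsets point at the right slice of the input. The function names are made up for illustration, and the sample is the ik_smart response shown above. ES reports offsets in UTF-16 code units, which coincide with Python string indices for these characters.

```python
def tokens(analyze_response):
    """Return the token strings from an _analyze response, in order."""
    return [t["token"] for t in analyze_response["tokens"]]


def offsets_match(text, analyze_response):
    """Check that text[start_offset:end_offset] equals each token."""
    return all(
        text[t["start_offset"]:t["end_offset"]] == t["token"]
        for t in analyze_response["tokens"]
    )


text = "中华人民共和国人民大会堂"
ik_smart_response = {
    "tokens": [
        {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7,
         "type": "CN_WORD", "position": 0},
        {"token": "人民大会堂", "start_offset": 7, "end_offset": 12,
         "type": "CN_WORD", "position": 1},
    ]
}

print(tokens(ik_smart_response))
print(offsets_match(text, ik_smart_response))
```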

Search using the segmented input terms. Request URL: http://localhost:9200/news/_search

Input:

{
    "query" : { "match" : { "content" : "中华人民共和国国歌" }},
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "content" : {}
        }
    }
}
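
A request body of this shape can be generated rather than typed by hand. This is a minimal sketch; the function name highlight_query and its parameters are made up for illustration, and it only builds the JSON body without sending it.

```python
import json


def highlight_query(field, text, pre_tags, post_tags):
    """Build a match query on one field with highlighting for that field."""
    return {
        "query": {"match": {field: text}},
        "highlight": {
            "pre_tags": pre_tags,
            "post_tags": post_tags,
            "fields": {field: {}},
        },
    }


body = highlight_query(
    "content",
    "中华人民共和国国歌",
    ["<tag1>", "<tag2>"],
    ["</tag1>", "</tag2>"],
)
# ensure_ascii=False keeps the Chinese text readable in the printed JSON.
print(json.dumps(body, ensure_ascii=False, indent=4))
```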

Output:

{
    "took": 11,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 1.6810182,
        "hits": [
            {
                "_index": "news",
                "_type": "_doc",
                "_id": "6",
                "_score": 1.6810182,
                "_source": {
                    "content": "中华民族国歌"
                },
                "highlight": {
                    "content": [
                        "<tag1>中华</tag1>民族<tag1>国歌</tag1>"
                    ]
                }
            },
            {
                "_index": "news",
                "_type": "_doc",
                "_id": "5",
                "_score": 0.9426802,
                "_source": {
                    "content": "人民公社"
                },
                "highlight": {
                    "content": [
                        "<tag1>人民</tag1>公社"
                    ]
                }
            }
        ]
    }
}
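
To see at a glance which terms matched, the search response can be reduced to a short summary. The sketch below is only an illustration: the function names are made up, and the sample is an abbreviated copy of the response above containing just the second hit.

```python
import re


def highlighted_terms(fragment, pre="<tag1>", post="</tag1>"):
    """Return the pieces of a highlight fragment wrapped in the given tags."""
    pattern = re.escape(pre) + "(.*?)" + re.escape(post)
    return re.findall(pattern, fragment)


def summarize_hits(search_response, field="content"):
    """Return (id, score, matched terms) for every hit in a _search response."""
    summary = []
    for hit in search_response["hits"]["hits"]:
        terms = []
        for fragment in hit.get("highlight", {}).get(field, []):
            terms.extend(highlighted_terms(fragment))
        summary.append((hit["_id"], hit["_score"], terms))
    return summary


# Abbreviated sample taken from the response above.
sample_response = {
    "hits": {
        "hits": [
            {
                "_index": "news",
                "_id": "5",
                "_score": 0.9426802,
                "_source": {"content": "人民公社"},
                "highlight": {"content": ["<tag1>人民</tag1>公社"]},
            }
        ]
    }
}

print(summarize_hits(sample_response))
```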

Origin www.cnblogs.com/weiweictgu/p/11102772.html