ElasticSearch Study Notes (3): The Kibana Visual Interface and ES Chinese Word Segmentation Configuration

Make it a habit: like first, then read!!!

Preface

In the previous blog we walked through installing ES and the basic CRUD operations, but before explaining those operations I forgot to show you how to install the visual interface, Kibana. So let's go through that here.

There is also the matter of ES's Chinese word segmentation. For details, see my first blog on ES, where I explained in detail the core principle behind ES search: the inverted index. Because of it, we also need to configure Chinese word segmentation for ES.

The Kibana visual interface

If you have used the MySQL database, you have surely used a visualization tool like Navicat. Since MySQL has one, ES naturally has one too: it is called Kibana. Its role is much like Navicat's, helping us observe the information of each node in ES, and the data on those nodes, more intuitively.

After understanding the general role of Kibana, let's take a look at how to install Kibana.

  • Upload and unzip

    cd /opt/es
    tar -zxvf kibana-6.3.1-linux-x86_64.tar.gz
    

    This step can take a while.

    That's not because the archive is large, but because it contains a very large number of files.

[Screenshot: tar extraction output]

  • Configure the ES connection information in kibana.yml

    cd kibana-6.3.1-linux-x86_64/config
    vi kibana.yml
    

[Screenshot: editing kibana.yml]
If you are on a cloud server, fill in your server's public IP as the ES address here, the same public address you filled in when configuring ES earlier.
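As a minimal sketch of what typically gets changed in this file (the address is a placeholder you must replace; elasticsearch.url is the setting name in Kibana 6.x):

    # minimal sketch; replace the IP with your server's address
    server.port: 5601
    server.host: "0.0.0.0"
    elasticsearch.url: "http://<your-server-public-IP>:9200"

Setting server.host to "0.0.0.0" makes the Kibana page reachable from outside the machine, and elasticsearch.url points Kibana at the ES node.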

Save and exit

  • Start Kibana

    cd ../bin
    nohup ./kibana &
    

[Screenshot: nohup Kibana startup output]

With that, our Kibana has been started.
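If you want to confirm from the shell that it really is up (optional; since we started it with nohup, its output goes to nohup.out by default):

    tail nohup.out                 # startup log
    netstat -tlnp | grep 5601      # Kibana should be listening on port 5601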

At this point, let's open the Kibana page from the browser's address bar:

http://<your-server-IP>:5601

and Kibana's page comes up.

[Screenshot: Kibana welcome page]

Seeing a page like this means our Kibana has started successfully and has successfully connected to the ElasticSearch instance on our server.

All of the operations from the previous posts are performed in this interface:
[Screenshot: Kibana Dev Tools console]
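For example, once the Dev Tools console above is open, a couple of quick requests confirm the connection (just a sketch; any of the CRUD requests from the previous post work the same way). The first shows the cluster status at a glance, the second lists existing indices:

    GET _cat/health?v

    GET _cat/indices?v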

ES Chinese word segmentation

Remember, we said before that ES searches internally by means of an inverted index, and that the first step of building an inverted index is to segment (分词) the content stored in the database into words. So we now need to test whether ES's word segmentation works correctly.

Let's first test how ES segments English. From the figure below, we can see that ES handles English segmentation perfectly well.

[Screenshot: _analyze result for English text]
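The request behind that screenshot has roughly this shape (a sketch; the sample sentence is my own). ES's default standard analyzer splits English on whitespace and punctuation and lowercases each token:

    GET _analyze
    {
      "analyzer": "standard",
      "text": "I love ElasticSearch"
    }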
But clearly, the data we will operate on later will be Chinese, so we now need to test whether ES can recognize our Chinese.

[Screenshot: _analyze result for Chinese text with the default analyzer]
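The same kind of request with Chinese text (again a sketch; I am assuming the classic test phrase 中国人) makes the problem obvious, since the standard analyzer returns one token per character: 中 / 国 / 人.

    GET _analyze
    {
      "analyzer": "standard",
      "text": "中国人"
    }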

After running it, we find that ES cannot segment Chinese: it can only treat Chinese as a collection of single characters and has no concept of a word (词语). Since it cannot correctly recognize our words, we now need to install a Chinese word segmentation plugin.

We need to upload our IK tokenizer into the plugins directory of ES. Note that the plugins directory is dedicated to storing plugins, and that each plugin is identified by a single directory directly under plugins: one plugin cannot be unpacked into multiple folders, it must be one single folder containing all of the plugin's configuration files, with no extra levels of nested directories, otherwise ES will not recognize it. The correct layout after decompression is shown below.

[Screenshot: the IK plugin as a single folder under plugins]

This layout is the correct one.
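In shell terms, the installation looks roughly like this (a sketch; the paths and the 6.3.1 version are my assumptions, chosen to match the kibana-6.3.1 archive above, since IK's version must match the ES version):

    # paths and version are assumptions; IK's version must match your ES version
    cd /opt/es/elasticsearch-6.3.1/plugins
    mkdir ik
    unzip /opt/es/elasticsearch-analysis-ik-6.3.1.zip -d ik
    ls ik    # all plugin files sit directly in this one folder, no nesting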

After the decompression is complete, we need to restart ES so that the plugin takes effect.
[Screenshot: ES restart log]
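A minimal restart sketch (assuming ES was started in the background with -d; the process id is whatever ps reports):

    # find and stop the running ES process, then start it again
    ps -ef | grep elasticsearch
    kill <pid>
    cd /opt/es/elasticsearch-6.3.1/bin
    ./elasticsearch -d

If the plugin was laid out correctly, the startup log should contain a line noting that the analysis-ik plugin was loaded.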

After ES restarts, let's see whether our tokenizer works normally. The plugin we installed is the IK tokenizer, which provides two analyzers: one is ik_smart and the other is ik_max_word.

ik_smart is the coarser-grained analyzer: it splits the text into the fewest, longest words.

ik_max_word is the finer-grained, more exhaustive analyzer: it enumerates every word it can find in the text.

Here we can see it through the following example.

This is the segmentation result when the analyzer is set to ik_smart:

[Screenshot: ik_smart segmentation result]
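Reproducing that screenshot in the console looks roughly like this (the test phrase 中国人 is my assumption). With ik_smart, the whole phrase comes back as the single token 中国人:

    GET _analyze
    {
      "analyzer": "ik_smart",
      "text": "中国人"
    }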

This is the segmentation result when the analyzer is set to ik_max_word:

[Screenshot: ik_max_word segmentation result]
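And the same request with the other analyzer; with ik_max_word, the phrase 中国人 comes back as the three tokens discussed below:

    GET _analyze
    {
      "analyzer": "ik_max_word",
      "text": "中国人"
    }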

Comparing the results above, we can see that ik_max_word is indeed the more powerful analyzer. It not only breaks a sentence into multiple consecutive words; for a word like 中国人 ("Chinese person"), it can further break it down into the three words 中国人, 中国, and 国人. This achieves a much better segmentation effect.

Now that we have seen the segmentation itself, let's look at the meaning of each attribute in the segmentation result.

[Screenshot: attributes of each token in the _analyze result]
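For reference, each entry in the tokens array of an _analyze response carries these attributes: token is the term itself, start_offset and end_offset are its character positions in the original text, type is the token type (IK uses values such as CN_WORD), and position is the token's ordinal. A plausible response for the ik_max_word request above:

    {
      "tokens": [
        { "token": "中国人", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 0 },
        { "token": "中国",   "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 1 },
        { "token": "国人",   "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 2 }
      ]
    }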

At this point you may ask: why does segmentation need to produce results like these? This comes back to a concept we mentioned before: relevance scoring (相关性算分). The relevance calculation may need to know where a term appears, how many times it appears in total, and so on; all of this directly affects the relevance score, which is why the segmentation result carries this information.

In this way, our ES Chinese word segmentation has been configured.

Original work is not easy, and neither is all this typing. If you found it helpful, you can follow my official account. A newcomer needs your support!!!


A like looks good on you; a follow looks even better!
