[SpringCloud] Getting Started with Microservice Technology Stack 5 - ElasticSearch

ElasticSearch


Inverted index


Building an inverted index: segment each document's title into words and store each word as a term; each term then maps to the ids of the documents that contain it.

Inverted index retrieval: suppose we search for "Huawei mobile phone" (a query sketch follows this list)

  1. Tokenize the query: "Huawei", "mobile phone"
  2. Look up those two terms in the index and fetch their document id lists
  3. Suppose the id lists are 2,3 and 1,2 respectively; document id=2 appears in both lists, so it has the highest overlap, best matches the search, and is ranked first
  4. The matching documents are collected into the result set
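
In es's query DSL this kind of search is expressed as a match query. As a minimal sketch (the index name items and the field title are hypothetical here), the engine analyzes the query text into terms, looks up each term's document list in the inverted index, and scores documents by how well they match:

GET /items/_search
{
  "query": {
    "match": {
      "title": "华为手机"
    }
  }
}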

Environment configuration

First you need to download the following three things (version 7.8 is chosen here because it still works with older versions of Java; newer ES releases require newer JDKs, which is very inconvenient):

  1. ElasticSearch 7.8.0
  2. Kibana 7.8.0
  3. ik tokenizer 7.8.0

Attention! Since we are setting up the environment on Windows, when downloading the ik tokenizer be sure to download the pre-built package elasticsearch-analysis-ik-7.8.0.zip, not the source package!

All three components must be exactly the same version! There is no such thing as downward or upward compatibility between them.


Installation under Windows is very simple: unzip all three archives into a directory whose path contains no Chinese characters.

First, extract the entire contents of the ik tokenizer archive into a subfolder of the plugins directory in the es7.8 root (the configuration path used later in this article assumes the folder is named analysis-ik).

Then open the JVM configuration file of es7.8 at es7.8/config/jvm.options and adjust the heap size downwards, otherwise es may exhaust memory and crash after you start it:

-Xms1g
-Xmx1g

That's the setup done. Double-click to run the following two bat files (note the order):

  1. <es root directory>/bin/elasticsearch.bat
  2. <kibana root directory>/bin/kibana.bat

es runs on port 9200 by default, and kibana runs on port 5601 by default.
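
To confirm that es started correctly, you can open localhost:9200 in a browser; it should answer with a small JSON summary, roughly of this shape (the values here are illustrative):

{
  "name" : "DESKTOP-XXXX",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "7.8.0"
  },
  "tagline" : "You Know, for Search"
}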


Test ik tokenizer

Open the kibana console at localhost:5601

Click the menu in the upper left corner, scroll to the bottom, and select Dev Tools.
Here you can freely test es requests, such as creating indexes and running queries.

Using the format below, we run the ik smart analyzer over a line of text that mixes Chinese and English:


POST _analyze
{
  "text": "我再也不想学JAVA语言了",
  "analyzer": "ik_smart"
}
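
For comparison, the ik plugin ships two analyzers: ik_smart (coarse-grained, produces the fewest tokens) and ik_max_word (fine-grained, produces every plausible split). Swapping the analyzer lets you see the difference on the same text:

POST _analyze
{
  "text": "我再也不想学JAVA语言了",
  "analyzer": "ik_max_word"
}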

Add extended dictionary

New internet slang cannot always be covered by the ik tokenizer's built-in dictionary, so in special cases we need to add an extended dictionary to help the ik tokenizer correctly recognize new internet words.

First open the ik tokenizer's extension settings file <es root directory>/plugins/analysis-ik/config/IKAnalyzer.cfg.xml
and change it to the following:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extended configuration</comment>
	<!-- Configure your own extension dictionary here -->
	<entry key="ext_dict">ext.dic</entry>
	<!-- Configure your own extension stopword dictionary here -->
	<entry key="ext_stopwords">stopword.dic</entry>

	<!-- Configure a remote extension dictionary here -->
	<!-- <entry key="remote_ext_dict">words_location</entry> -->
	<!-- Configure a remote extension stopword dictionary here -->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

Create a new file named ext.dic in the same directory to hold the extension words, one word per line.
We can add the following extension words:

小黑子
煤油树枝
香精煎鱼
香菜凤仁鸡
梅素汁

Restart es7.8 and go back to the kibana Dev Tools console.

You can see that the ik tokenizer now successfully recognizes these internet slang terms and segments them as whole words!
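
For example, rerunning _analyze on one of the new words should now return it as a single token, where before adding ext.dic it would have been split into smaller pieces:

POST _analyze
{
  "text": "小黑子",
  "analyzer": "ik_smart"
}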



Index operations

To create a simple index, you only need to make small modifications to the following code:

PUT /heima
{
  "mappings": {
    "properties": {
      "info": {                       // mapping for the field "info"
        "type": "text",               // field type "text"
        "analyzer": "ik_smart"        // segment values with the Chinese analyzer "ik_smart"
      },
      "email": {                      // mapping for the field "email"
        "type": "keyword",            // field type "keyword": the value is not segmented
        "index": false                // do not index this field, so it cannot be searched
      },
      "name": {                       // mapping for the field "name"
        "type": "object",             // field type "object": a nested object
        "properties": {               // properties of the nested object
          "firstname": {              // mapping for the nested property "firstname"
            "type": "keyword"         // type "keyword": the value is not segmented
          },
          "lastname": {               // mapping for the nested property "lastname"
            "type": "keyword"         // type "keyword": the value is not segmented
          }
        }
      }
    }
  }
}

The result after execution in dev tools is

{
	"acknowledged": true,
	"shards_acknowledged": true,
	"index": "heima"
}

Indexing and document operations

Once created, the index library and mapping in es cannot be modified, but new fields can be added to them.

The following command adds a new field called age to the index heima

PUT /heima/_mapping
{
  "properties": {
    "age": {
      "type": "keyword"
    }
  }
}
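
You can verify that the new field was added by fetching the mapping back:

GET /heima/_mapping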

Get an index library: GET /<index name>
Delete an index library: DELETE /<index name>
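
This section's title also promises document operations. As a minimal sketch (the document id 1 and the field values are made up for illustration), documents that follow the heima mapping above can be inserted, fetched, and deleted like this:

POST /heima/_doc/1
{
  "info": "Java 讲师",
  "email": "zy@test.com",
  "name": {
    "firstname": "云",
    "lastname": "赵"
  },
  "age": "21"
}

GET /heima/_doc/1

DELETE /heima/_doc/1

Note that age was mapped as keyword, so its value is written as a string; and because email has "index": false, you cannot search on it even though the value is stored.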


Origin blog.csdn.net/delete_you/article/details/133611783