Search engine elasticsearch: installing elasticsearch (including kibana, the IK tokenizer, and deploying an es cluster)

install elasticsearch

1. Deploy single-node es

1.1. Create a network

Kibana makes it easy to write DSL statements, so we also need to install kibana.

Because we will also deploy the kibana container, the es and kibana containers need to be able to reach each other. First create a network for them:

docker network create es-net
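
To confirm the network exists, you can list Docker networks and inspect the new one (optional check):

# List networks; es-net should appear in the output
docker network ls
# Show details of the es-net network
docker network inspect es-net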

1.2. Load image

Here we use the elasticsearch image, version 7.12.1. It is quite large (close to 1 GB), so pulling it yourself is not recommended.

The pre-class materials provide the image as a tar package:


Upload it to the virtual machine, then run the command to load it:

# Load the image
docker load -i es.tar

In the same way, the kibana tar package also needs to be loaded.
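
Assuming the kibana image archive is named kibana.tar (adjust if your file name differs), the load command is the same:

# Load the kibana image from its tar archive
docker load -i kibana.tar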

1.3. Run

Run the docker command to deploy single-node es:

docker run -d \
    --name es \
    -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
    -e "discovery.type=single-node" \
    -v es-data:/usr/share/elasticsearch/data \
    -v es-plugins:/usr/share/elasticsearch/plugins \
    --privileged \
    --network es-net \
    -p 9200:9200 \
    -p 9300:9300 \
    elasticsearch:7.12.1

Command explanation:

  • -e "cluster.name=es-docker-cluster": set the cluster name
  • -e "http.host=0.0.0.0": The listening address, which can be accessed from the external network
  • -e "ES_JAVA_OPTS=-Xms512m -Xmx512m": (future runtime) memory size
  • -e "discovery.type=single-node": non-cluster mode
  • -v es-data:/usr/share/elasticsearch/data: Mount the logical volume, bind the data directory of es
  • -v es-logs:/usr/share/elasticsearch/logs: Mount the logical volume, bind the log directory of es
  • -v es-plugins:/usr/share/elasticsearch/plugins: Mount the logical volume, bind the plug-in directory of es
  • --privileged: grant logical volume access
  • --network es-net: join a network named es-net (kibana will also join, the two can communicate with each other)
  • -p 9200:9200: Port mapping configuration (users of port 9200 access port 9300, the port that will be interconnected between nodes in the future, which is not currently available)

The -v option has the form volume-name:container-directory. If the named volume does not exist yet, Docker creates it for you. You can view a volume's details, including its directory on the host, with docker volume inspect <volume name>:

# List all data volumes
docker volume ls
# Show detailed information about a data volume
docker volume inspect es-data


After the docker run ... command above finishes, you can see the container with docker ps, and you can also reach es from a browser. Open http://192.168.141.100:9200 (replace the IP with your own) to see the elasticsearch response, a JSON document describing the node and its version.
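
You can also check from the command line; replace the IP with your own:

curl http://192.168.141.100:9200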


2. Deploy kibana

Kibana provides a visual interface for elasticsearch, which is convenient for learning.

2.1. Deployment

Run the docker command to deploy kibana:

docker run -d \
--name kibana \
-e ELASTICSEARCH_HOSTS=http://es:9200 \
--network=es-net \
-p 5601:5601  \
kibana:7.12.1

  • --network es-net: join the network named es-net, the same network as elasticsearch
  • -e ELASTICSEARCH_HOSTS=http://es:9200: set the elasticsearch address; since kibana is on the same network as elasticsearch, it can reach it directly by container name
  • -p 5601:5601: port mapping

Kibana is usually slow to start, so you may need to wait a while. You can use the command:

docker logs -f kibana

to follow the running log. Once it reports that Kibana is listening on port 5601, startup has succeeded.


At this point, open http://192.168.141.100:5601 in the browser to see the result.

You should see the Kibana welcome page. Click Explore on my own to enter the home page.

2.2. DevTools

A DevTools interface is provided in kibana.

In this interface you can write DSL statements to operate elasticsearch, with auto-completion for DSL.

The JSON statement on the left is the DSL query; in essence it is a RESTful request sent to es.
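
For example, the same kind of request can be sent to es directly with curl, without kibana; a minimal sketch, assuming es is reachable at 192.168.141.100:9200:

curl -X POST "http://192.168.141.100:9200/_analyze" \
  -H "Content-Type: application/json" \
  -d '{"analyzer": "standard", "text": "hello elasticsearch"}'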

2.3. The word segmentation problem (the default analyzers are not Chinese-friendly)

# Test the analyzer
POST /_analyze
{
  "text": "李白讲的java太棒了",
  "analyzer": "english"
}
{
  "tokens" : [
    {
      "token" : "李",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "白",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "讲",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "的",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "java",
      "start_offset" : 4,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "太",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "<IDEOGRAPHIC>",
      "position" : 5
    },
    {
      "token" : "棒",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "<IDEOGRAPHIC>",
      "position" : 6
    },
    {
      "token" : "了",
      "start_offset" : 10,
      "end_offset" : 11,
      "type" : "<IDEOGRAPHIC>",
      "position" : 7
    }
  ]
}

Changing the analyzer from english to standard (or another built-in analyzer) gives the same result: the Chinese text is still split character by character.
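
For example, running the same text through the built-in standard analyzer:

POST /_analyze
{
  "text": "李白讲的java太棒了",
  "analyzer": "standard"
}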

As you can see, English is tokenized reasonably well ('java' stays as one token), but Chinese is split into individual characters, which is clearly not appropriate: by default es does not understand Chinese word boundaries.

3. Install the IK tokenizer

Git address: https://github.com/medcl/elasticsearch-analysis-ik

As the project page shows, it is an analysis plugin built specifically for Elasticsearch.

3.1. Install ik plugin online (slower)

# Enter the container (it was named es when we created it)
docker exec -it es /bin/bash

# Download and install the plugin online
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.12.1/elasticsearch-analysis-ik-7.12.1.zip

# Exit the container
exit
# Restart the container
docker restart es

3.2. Install ik plugin offline (recommended)

1) View the data volume directory

To install the plugin we need to know where the plugins directory of elasticsearch lives. Since we mounted it as a data volume, check the volume's host directory with the following command:

docker volume inspect es-plugins

Result:

[
    {
        "CreatedAt": "2023-07-15T15:57:30+08:00",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/es-plugins/_data",
        "Name": "es-plugins",
        "Options": null,
        "Scope": "local"
    }
]

This shows that the plugins directory is mounted to /var/lib/docker/volumes/es-plugins/_data on the host.

2) Unzip the tokenizer installation package

Next, unzip the ik tokenizer package from the pre-class materials and rename the folder to ik.


3) Upload to the plug-in data volume of the es container

That is, copy it into /var/lib/docker/volumes/es-plugins/_data:
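
A minimal sketch of the copy, assuming the unzipped ik folder is already on the virtual machine in the current directory:

# Copy the ik plugin folder into the es-plugins data volume
cp -r ik /var/lib/docker/volumes/es-plugins/_data/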


4) Restart the container

# Restart the container
docker restart es
# Check the es logs for the ik plugin
docker logs es | grep analysis-ik

If the log shows that analysis-ik was loaded successfully, the tokenizer is installed.

5) Test:

The IK tokenizer contains two modes:

  • ik_smart: coarsest segmentation (keeps the longest possible words and does not split them further)

  • ik_max_word: finest segmentation (splits out as many words as possible; the same characters may appear in several words)

POST /_analyze
{
  "text": "胡老师讲的java太棒了",
  "analyzer": "ik_max_word"
}

Result:

{
  "tokens" : [
    {
      "token" : "胡",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "老师",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "讲",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "的",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "java",
      "start_offset" : 5,
      "end_offset" : 9,
      "type" : "ENGLISH",
      "position" : 4
    },
    {
      "token" : "太棒了",
      "start_offset" : 9,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "太棒",
      "start_offset" : 9,
      "end_offset" : 11,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "了",
      "start_offset" : 11,
      "end_offset" : 12,
      "type" : "CN_CHAR",
      "position" : 7
    }
  ]
}
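
For comparison, the same text can be analyzed with ik_smart, which keeps only the coarsest split and therefore returns fewer tokens (the exact token list depends on the IK dictionary):

POST /_analyze
{
  "text": "胡老师讲的java太棒了",
  "analyzer": "ik_smart"
}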

3.3. Extended word dictionary

As the Internet develops, new words are coined all the time, and many of them do not exist in the original vocabulary, for example "奥力给", "永远滴神" and so on.


Therefore, the vocabulary also needs to be updated over time, and the IK tokenizer provides a way to extend its vocabulary.

1) Open the IK tokenizer config directory:
/var/lib/docker/volumes/es-plugins/_data/ik/config

2) Edit the IKAnalyzer.cfg.xml configuration file:

The configuration entries already exist by default; just fill in the dictionary file name:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer 扩展配置</comment>
        <!--用户可以在这里配置自己的扩展字典 *** 添加扩展词典-->
        <entry key="ext_dict">ext.dic</entry>
</properties>

3) Create a new ext.dic in the config directory (you can copy an existing file there and modify it).

Simply list one word per line:

全红禅
永远滴神
奥力给

4) Restart elasticsearch

docker restart es

# Check the logs
docker logs -f es

The log will show that the ext.dic configuration file has been loaded successfully. Alternatively, just wait a short while; it normally loads without issues.
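
One quick way to check, assuming the IK log lines mention the dictionary file:

# Filter the es logs for the extension dictionary
docker logs es | grep ext.dic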

5) Test the effect:

POST /_analyze
{
  "text": "全红禅永远滴神,我的神,奥力给",
  "analyzer": "ik_max_word"
}

Note: the dictionary file must be saved with UTF-8 encoding; do not edit it with Windows Notepad.

3.4. Stop word dictionary

In Internet projects, some content is not allowed to be transmitted, such as sensitive words related to religion or politics. Such terms should likewise be ignored when searching and indexing.

The IK tokenizer also provides a stop word feature, which lets us ignore the words in the stop word dictionary when tokenizing and indexing.

1) Add the stop word dictionary entry to the IKAnalyzer.cfg.xml configuration file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer 扩展配置</comment>
        <!--用户可以在这里配置自己的扩展字典-->
        <entry key="ext_dict">ext.dic</entry>
         <!--用户可以在这里配置自己的扩展停止词字典  *** 添加停用词词典-->
        <entry key="ext_stopwords">stopword.dic</entry>
</properties>

Both entries are already present in the default configuration; only the dictionary file names are empty by default.

2) Add stop words to stopword.dic

This file already exists by default; just add the words to it:

的
地
了
哦
啊
嘤

3) Restart elasticsearch

# Restart the services
docker restart es
docker restart kibana

# Check the logs
docker logs -f es

The log should show that the stopword.dic configuration file has been loaded successfully.

4) Test the effect:

POST /_analyze
{
  "text": "全红禅永远滴神,我的神,奥力给",
  "analyzer": "ik_max_word"
}

Result:

{
  "tokens" : [
    {
      "token" : "全红禅",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "永远滴神",
      "start_offset" : 3,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "永远",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "滴",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "神",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "CN_CHAR",
      "position" : 4
    },
    {
      "token" : "我",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "CN_CHAR",
      "position" : 5
    },
    {
      "token" : "神",
      "start_offset" : 10,
      "end_offset" : 11,
      "type" : "CN_CHAR",
      "position" : 6
    },
    {
      "token" : "奥力给",
      "start_offset" : 12,
      "end_offset" : 15,
      "type" : "CN_WORD",
      "position" : 7
    }
  ]
}

全红禅, 永远滴神 and 奥力给 are now recognized as whole words, and the stop words (such as 的 and 了) no longer appear in the token list.

  • Summary: the IK tokenizer offers two modes (ik_smart and ik_max_word); its vocabulary can be extended via ext_dict and unwanted words can be filtered out via ext_stopwords in IKAnalyzer.cfg.xml.

4. Deploy the es cluster

The es cluster can be deployed directly with docker-compose, but your Linux virtual machine needs at least 4 GB of memory (if it has less, increase its allocation first).

4.1. Create es cluster

First write a docker-compose file with the following content:

docker-compose.yml

version: '2.2'
services:
  es01:
    image: elasticsearch:7.12.1
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic
  es02:
    image: elasticsearch:7.12.1
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - data02:/usr/share/elasticsearch/data
    ports:
      - 9201:9200
    networks:
      - elastic
  es03:
    image: elasticsearch:7.12.1
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - data03:/usr/share/elasticsearch/data
    networks:
      - elastic
    ports:
      - 9202:9200
volumes:
  data01:
    driver: local
  data02:
    driver: local
  data03:
    driver: local

networks:
  elastic:
    driver: bridge

As the yml file shows, es01 is exposed on port 9200, es02 on port 9201, and es03 on port 9202.

Upload the file to the Linux machine. Before running es, the kernel parameter vm.max_map_count must be increased; edit /etc/sysctl.conf:

vi /etc/sysctl.conf

Add the following content:

vm.max_map_count=262144


Then execute the command to make the configuration take effect:

sysctl -p


If you restart the virtual machine, remember to start docker first:

systemctl start docker

Run docker-compose to bring up the cluster:

docker-compose up -d
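
You can check that all three containers came up (run from the directory containing docker-compose.yml):

# List the services started by docker-compose
docker-compose ps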


View the logs of each node:

docker logs -f es01
docker logs -f es02
docker logs -f es03

4.2. Cluster status monitoring

Kibana can monitor an es cluster, but newer versions rely on the x-pack feature of es and the configuration is more complicated.

It is recommended to use cerebro to monitor the es cluster status instead; project page: https://github.com/lmenezes/cerebro

The pre-class materials have provided the installation package:

Link: https://pan.baidu.com/s/1zrji4O8niH_UmQNKBhNIPg
Extraction code: hzan


It can be used after decompression, which is very convenient.

The decompressed directory is as follows:


Enter the corresponding bin directory:


Double-click the cerebro.bat file to start the service.


If you start it with a newer JDK such as JDK 17, it may fail with Caused by: java.lang.IllegalStateException: Unable to load cache item. Switch the Java environment variables to JDK 8, i.e. run cerebro with JDK 8.

Visit http://localhost:9000 to enter the management interface:

Enter the address and port of any node of your elasticsearch cluster, for example http://192.168.141.100:9200/, and click connect:


A green status bar indicates that the cluster health is green (healthy).
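
The same information is available from the REST API; replace the IP with your own:

curl "http://192.168.141.100:9200/_cluster/health?pretty"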

4.3. Create index library

1) Use kibana's DevTools to create an index library

Enter the command in DevTools:

The index is stored in shards spread across the nodes, and the shards back each other up via replicas. How many shards and how many replicas? That is configured when the index is created:

PUT /itcast
{
  "settings": {
    "number_of_shards": 3, // number of shards
    "number_of_replicas": 1 // number of replicas
  },
  "mappings": {
    "properties": {
      // mapping definitions ...
    }
  }
}

If kibana has been stopped, you can also use cerebro to create the index.

2) Use cerebro to create an index library

You can also create an index library with cerebro:


Fill in the index library information:


Click the create button in the lower right corner:


4.4. View the sharding effect

Go back to the home page to see how the index is sharded. Note a common misunderstanding: cerebro does not show 3 separate indexes; one index is split into 3 shards for storage, the shards are distributed across the different es instances, and the shards on the three instances back each other up.
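
The shard distribution can also be checked from the command line with the _cat API; replace the IP with your own:

curl "http://192.168.141.100:9200/_cat/shards/itcast?v"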
