Install elasticsearch
1. Deploy a single-node es
1.1. Create a network
Kibana makes it convenient to write DSL statements, so we will install kibana as well.
Because we will also deploy a kibana container, the es and kibana containers need to be able to reach each other. First create a network:
docker network create es-net
1.2. Load image
Here we use the elasticsearch image of version 7.12.1. It is very large, close to 1 GB, so pulling it yourself is not recommended.
The pre-class materials provide a tar package of the image.
Upload it to the virtual machine, then run the command to load it:
# Load the image
docker load -i es.tar
The kibana tar package needs to be loaded the same way.
1.3. Run
Run the docker command to deploy a single-node es:
docker run -d \
--name es \
-e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
-e "discovery.type=single-node" \
-v es-data:/usr/share/elasticsearch/data \
-v es-plugins:/usr/share/elasticsearch/plugins \
--privileged \
--network es-net \
-p 9200:9200 \
-p 9300:9300 \
elasticsearch:7.12.1
Command explanation:
- -e "cluster.name=es-docker-cluster": set the cluster name
- -e "http.host=0.0.0.0": the listening address, so es can be reached from outside
- -e "ES_JAVA_OPTS=-Xms512m -Xmx512m": JVM memory size
- -e "discovery.type=single-node": non-cluster (single-node) mode
- -v es-data:/usr/share/elasticsearch/data: mount a volume bound to the es data directory
- -v es-logs:/usr/share/elasticsearch/logs: mount a volume bound to the es log directory
- -v es-plugins:/usr/share/elasticsearch/plugins: mount a volume bound to the es plugin directory
- --privileged: grant access to the mounted volumes
- --network es-net: join the network named es-net (kibana will also join it, so the two can communicate)
- -p 9200:9200: port mapping for 9200, the HTTP port users access
- -p 9300:9300: port mapping for 9300, the port nodes will use to talk to each other in a cluster (not needed yet)
The -v syntax is local-volume:container-directory. If the local volume does not exist, docker creates it for you. You can view a volume's information, including its local directory, with docker volume inspect <volume name>:
# View all data volumes
docker volume ls
# View the detailed information of a data volume
docker volume inspect html
After the docker run command above finishes, run docker ps to see the corresponding process; the browser can access it too. Enter http://192.168.141.100:9200 in the browser (replace the ip with your own) to see the elasticsearch response:
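The response of that root endpoint is a small JSON document. A minimal sketch of checking it programmatically, using an illustrative sample response (the field values here are assumptions, not output captured from a real node):

```python
import json

# Illustrative sample of what elasticsearch returns on its root endpoint;
# the values are assumptions based on the 7.12.1 image.
sample_response = '''
{
  "name": "es01",
  "cluster_name": "docker-cluster",
  "version": {"number": "7.12.1", "lucene_version": "8.8.0"},
  "tagline": "You Know, for Search"
}
'''

info = json.loads(sample_response)
# The version number tells you which image actually started.
print(info["version"]["number"])
```

In practice you would fetch the same JSON from http://192.168.141.100:9200 and parse it the same way.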
2. Deploy kibana
Kibana can provide us with an elasticsearch visual interface for us to learn.
2.1. Deployment
Run the docker command to deploy kibana
docker run -d \
--name kibana \
-e ELASTICSEARCH_HOSTS=http://es:9200 \
--network=es-net \
-p 5601:5601 \
kibana:7.12.1
- --network=es-net: join the network named es-net, the same network as elasticsearch
- -e ELASTICSEARCH_HOSTS=http://es:9200: set the address of elasticsearch; because kibana is in the same network as elasticsearch, it can reach it directly by container name
- -p 5601:5601: port mapping configuration
Kibana is generally slow to start and needs to wait for a while. You can use the command:
docker logs -f kibana
Check the running log. When you see the following log, it means success:
At this point, enter http://192.168.141.100:5601 in the browser and you can see kibana.
Click "Explore on my own" to continue.
2.2. DevTools
A DevTools interface is provided in kibana:
In this interface, DSL can be written to operate elasticsearch. And there is an automatic completion function for DSL statements.
The json-format statement on the left is the DSL query; in essence it sends a RESTful request to es.
2.3. The word-segmentation problem (Chinese is not handled well)
# Test the analyzer
POST /_analyze
{
"text": "李白讲的java太棒了",
"analyzer": "english"
}
{
"tokens" : [
{
"token" : "李",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<IDEOGRAPHIC>",
"position" : 0
},
{
"token" : "白",
"start_offset" : 1,
"end_offset" : 2,
"type" : "<IDEOGRAPHIC>",
"position" : 1
},
{
"token" : "讲",
"start_offset" : 2,
"end_offset" : 3,
"type" : "<IDEOGRAPHIC>",
"position" : 2
},
{
"token" : "的",
"start_offset" : 3,
"end_offset" : 4,
"type" : "<IDEOGRAPHIC>",
"position" : 3
},
{
"token" : "java",
"start_offset" : 4,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "太",
"start_offset" : 8,
"end_offset" : 9,
"type" : "<IDEOGRAPHIC>",
"position" : 5
},
{
"token" : "棒",
"start_offset" : 9,
"end_offset" : 10,
"type" : "<IDEOGRAPHIC>",
"position" : 6
},
{
"token" : "了",
"start_offset" : 10,
"end_offset" : 11,
"type" : "<IDEOGRAPHIC>",
"position" : 7
}
]
}
Changing the analyzer from 'english' to 'chinese' or to 'standard' gives the same result.
As you can see, English is segmented well ('java' stays one token), but Chinese text is split character by character, which is obviously inappropriate: by default es cannot understand Chinese.
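The character-by-character behaviour can be sketched in a few lines (a simplification of mine, not the actual standard analyzer): ASCII word runs stay whole, while each CJK character becomes its own token.

```python
import re

def standard_like_tokenize(text):
    # Runs of ASCII letters/digits stay whole; each CJK character
    # (U+4E00..U+9FFF) is emitted as its own token, like the default analyzer.
    tokens = []
    for match in re.finditer(r'[a-zA-Z0-9]+|[\u4e00-\u9fff]', text):
        tokens.append(match.group().lower())
    return tokens

print(standard_like_tokenize("李白讲的java太棒了"))
# ['李', '白', '讲', '的', 'java', '太', '棒', '了']
```

This mirrors the _analyze output above: 'java' survives as one token while every Chinese character is isolated.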
3. Install the IK tokenizer
Git address: https://github.com/medcl/elasticsearch-analysis-ik
As the repository shows, it is built specifically for es.
3.1. Install ik plugin online (slower)
# Enter the container (we named it es above)
docker exec -it es /bin/bash
# Download and install online
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.12.1/elasticsearch-analysis-ik-7.12.1.zip
# Exit
exit
# Restart the container
docker restart es
3.2. Install ik plugin offline (recommended)
1) View the data volume directory
To install the plugin, we need to know where the elasticsearch plugins directory is. Since we mounted it as a data volume, we look up the volume's location with the following command:
docker volume inspect es-plugins
Show results:
[
{
"CreatedAt": "2023-07-15T15:57:30+08:00",
"Driver": "local",
"Labels": null,
"Mountpoint": "/var/lib/docker/volumes/es-plugins/_data",
"Name": "es-plugins",
"Options": null,
"Scope": "local"
}
]
It shows that the plugins directory is mounted to the directory /var/lib/docker/volumes/es-plugins/_data.
2) Unzip the tokenizer installation package
Next, decompress the ik tokenizer from the pre-class materials and rename the folder to ik.
3) Upload it to the plugin data volume of the es container
That is, to /var/lib/docker/volumes/es-plugins/_data:
4) Restart the container
# 4. Restart the container
docker restart es
# Check the es log
docker logs es | grep analysis-ik
If the log shows it was loaded successfully, the tokenizer is installed.
5) Test:
The IK tokenizer has two modes:
- ik_smart: coarsest segmentation (group characters into the longest possible words, then stop splitting)
- ik_max_word: finest segmentation (emit every possible dictionary word; a character may appear in more than one word)
POST /_analyze
{
"text": "胡老师讲的java太棒了",
"analyzer": "ik_max_word"
}
result:
{
"tokens" : [
{
"token" : "胡",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "老师",
"start_offset" : 1,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "讲",
"start_offset" : 3,
"end_offset" : 4,
"type" : "CN_CHAR",
"position" : 2
},
{
"token" : "的",
"start_offset" : 4,
"end_offset" : 5,
"type" : "CN_CHAR",
"position" : 3
},
{
"token" : "java",
"start_offset" : 5,
"end_offset" : 9,
"type" : "ENGLISH",
"position" : 4
},
{
"token" : "太棒了",
"start_offset" : 9,
"end_offset" : 12,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "太棒",
"start_offset" : 9,
"end_offset" : 11,
"type" : "CN_WORD",
"position" : 6
},
{
"token" : "了",
"start_offset" : 11,
"end_offset" : 12,
"type" : "CN_CHAR",
"position" : 7
}
]
}
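The result above shows both overlapping words 太棒了 and 太棒. The two modes can be sketched with a toy dictionary-based segmenter (the mini dictionary and the logic here are simplifications of mine, not the plugin's actual algorithm):

```python
# Hypothetical mini dictionary standing in for IK's built-in one.
DICT = {"老师", "太棒了", "太棒"}

def max_word(text):
    """ik_max_word style: emit every dictionary word found anywhere, longest first."""
    tokens = []
    for i in range(len(text)):
        for j in range(len(text), i + 1, -1):  # multi-char candidates, longest first
            if text[i:j] in DICT:
                tokens.append(text[i:j])
    return tokens

def smart(text):
    """ik_smart style: greedy longest match, each character used only once."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i + 1, -1):
            if text[i:j] in DICT:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # no dictionary word here, emit the single char
            i += 1
    return tokens

print(max_word("太棒了"))  # ['太棒了', '太棒'] - overlapping words both emitted
print(smart("太棒了"))     # ['太棒了'] - only the longest match
```

This mirrors why ik_max_word produced both 太棒了 and 太棒 above, while ik_smart would keep only the longer word.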
3.3 Extended word dictionary
As the Internet develops, coining new words has become more and more common, and many new words, such as "奥力给" and "永远滴神", do not exist in the original vocabulary.
Therefore the vocabulary needs constant updating, and the IK tokenizer provides a way to extend it.
1) Open the IK tokenizer config directory:
/var/lib/docker/volumes/es-plugins/_data/ik/config
2) In the IKAnalyzer.cfg.xml configuration file, add the following (the entry is already present by default; just fill in the file name):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- Users can configure their own extension dictionary here *** add the extension dictionary -->
<entry key="ext_dict">ext.dic</entry>
</properties>
3) Create a new ext.dic; you can copy a configuration file in the config directory as a starting point.
Simply list one word per line:
全红禅
永远滴神
奥力给
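The dictionary format really is that simple: one UTF-8 word per line. A minimal sketch of how such a file is written and consumed (the file handling is mine; the plugin does the reading internally):

```python
import os
import tempfile

# Write a dictionary file the way ext.dic is laid out: one word per line,
# UTF-8 encoded (a wrong encoding is exactly what breaks loading).
words = "全红禅\n永远滴神\n奥力给\n"
path = os.path.join(tempfile.mkdtemp(), "ext.dic")
with open(path, "w", encoding="utf-8") as f:
    f.write(words)

# Read it back the way a tokenizer would: strip blanks, build a word set.
with open(path, encoding="utf-8") as f:
    ext_dict = {line.strip() for line in f if line.strip()}

print(sorted(ext_dict))
```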
4) Restart elasticsearch
docker restart es
# Check the log
docker logs -f es
The log will show that the ext.dic configuration file was loaded successfully.
(Or just wait patiently for a while; it normally loads fine.)
5) Test effect:
POST /_analyze
{
"text": "全红禅永远滴神,我的神,奥力给",
"analyzer": "ik_max_word"
}
Note: this file must be saved in UTF-8 encoding. Do not edit it with Windows Notepad.
3.4 Stop word dictionary
In Internet projects, some words are not allowed to be transmitted over the network, such as sensitive words about religion, politics, and so on, and we should ignore such words when searching as well.
The IK tokenizer also provides a powerful stop-word function that lets us ignore the contents of the stop-word list when indexing.
1) Add to the IKAnalyzer.cfg.xml configuration file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- Users can configure their own extension dictionary here -->
<entry key="ext_dict">ext.dic</entry>
<!-- Users can configure their own stop-word dictionary here *** add the stop-word dictionary -->
<entry key="ext_stopwords">stopword.dic</entry>
</properties>
Both entries already exist in the stock config; only the two dictionary file names are empty by default, so just fill them in.
2) Add stop words to stopword.dic
This file already exists by default; just append to it:
的
地
了
哦
啊
嘤
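Conceptually, stop-word filtering just drops any token found in the stop list before indexing. A minimal sketch of the idea (mine, not the plugin's implementation):

```python
# The stop list mirrors the stopword.dic contents above.
STOPWORDS = {"的", "地", "了", "哦", "啊", "嘤"}

def remove_stopwords(tokens):
    """Drop tokens that appear in the stop list, as ext_stopwords configures."""
    return [t for t in tokens if t not in STOPWORDS]

print(remove_stopwords(["java", "太棒", "了", "的"]))  # ['java', '太棒']
```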
3) Restart elasticsearch
# Restart the services
docker restart es
docker restart kibana
# Check the log
docker logs -f es
The log shows that the stopword.dic configuration file was loaded successfully.
4) Test the effect:
POST /_analyze
{
"text": "全红禅永远滴神,我的神,奥力给",
"analyzer": "ik_max_word"
}
Note: this file must be saved in UTF-8 encoding. Do not edit it with Windows Notepad.
{
"tokens" : [
{
"token" : "全红禅",
"start_offset" : 0,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "永远滴神",
"start_offset" : 3,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "永远",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "滴",
"start_offset" : 5,
"end_offset" : 6,
"type" : "CN_CHAR",
"position" : 3
},
{
"token" : "神",
"start_offset" : 6,
"end_offset" : 7,
"type" : "CN_CHAR",
"position" : 4
},
{
"token" : "我",
"start_offset" : 8,
"end_offset" : 9,
"type" : "CN_CHAR",
"position" : 5
},
{
"token" : "神",
"start_offset" : 10,
"end_offset" : 11,
"type" : "CN_CHAR",
"position" : 6
},
{
"token" : "奥力给",
"start_offset" : 12,
"end_offset" : 15,
"type" : "CN_WORD",
"position" : 7
}
]
}
Summary: 全红禅, 永远滴神 and 奥力给 are now recognized as words, and the stop word 的 no longer appears in the tokens.
4. Deploy the es cluster
An es cluster can be deployed directly with docker-compose, but your Linux virtual machine needs at least 4 GB of memory (if it has less, increase the allocation first).
4.1. Create es cluster
First write a docker-compose file with the following content:
docker-compose.yml
version: '2.2'
services:
es01:
image: elasticsearch:7.12.1
container_name: es01
environment:
- node.name=es01
- cluster.name=es-docker-cluster
- discovery.seed_hosts=es02,es03
- cluster.initial_master_nodes=es01,es02,es03
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
volumes:
- data01:/usr/share/elasticsearch/data
ports:
- 9200:9200
networks:
- elastic
es02:
image: elasticsearch:7.12.1
container_name: es02
environment:
- node.name=es02
- cluster.name=es-docker-cluster
- discovery.seed_hosts=es01,es03
- cluster.initial_master_nodes=es01,es02,es03
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
volumes:
- data02:/usr/share/elasticsearch/data
ports:
- 9201:9200
networks:
- elastic
es03:
image: elasticsearch:7.12.1
container_name: es03
environment:
- node.name=es03
- cluster.name=es-docker-cluster
- discovery.seed_hosts=es01,es02
- cluster.initial_master_nodes=es01,es02,es03
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
volumes:
- data03:/usr/share/elasticsearch/data
networks:
- elastic
ports:
- 9202:9200
volumes:
data01:
driver: local
data02:
driver: local
data03:
driver: local
networks:
elastic:
driver: bridge
You can see from the yml file:
es01 port 9200
es02 port 9201
es03 port 9202
Upload the file to Linux. To run es, some Linux system limits must be raised; edit the /etc/sysctl.conf file:
vi /etc/sysctl.conf
Add the following content:
vm.max_map_count=262144
Then execute the command to make the configuration take effect:
sysctl -p
If you restart the virtual machine, remember to start docker first:
systemctl start docker
Run docker-compose to bring up the cluster:
docker-compose up -d
View logs for each node
docker logs -f es01
docker logs -f es02
docker logs -f es03
4.2. Cluster status monitoring
Kibana can monitor es clusters, but newer versions depend on es's x-pack feature, which is fairly complicated to configure.
It is recommended to use cerebro to monitor the status of es cluster, official website: https://github.com/lmenezes/cerebro
The pre-class materials have provided the installation package:
Link: https://pan.baidu.com/s/1zrji4O8niH_UmQNKBhNIPg
Extraction code: hzan
It can be used after decompression, which is very convenient.
The decompressed directory is as follows:
Enter the corresponding bin directory:
Double-click the cerebro.bat file to start the service.
If you start it with a newer JDK such as JDK 17, an error is reported:
Caused by: java.lang.IllegalStateException: Unable to load cache item
Just point the java environment variable back to JDK 8, i.e. use JDK 8.
Visit http://localhost:9000 to enter the management interface:
Enter the address and port of any node of your elasticsearch cluster, for example http://192.168.141.100:9200, and click connect:
A green bar indicates that the cluster is green (healthy).
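That status comes from es's _cluster/health API, which cerebro reads for you. A sketch of parsing an illustrative health response (the values are assumptions for a healthy 3-node cluster, not captured output):

```python
import json

# Illustrative _cluster/health response for a healthy 3-node cluster;
# cerebro's green bar reflects the "status" field.
sample = '''
{
  "cluster_name": "es-docker-cluster",
  "status": "green",
  "number_of_nodes": 3,
  "active_primary_shards": 3,
  "active_shards": 6
}
'''

health = json.loads(sample)
print(health["status"], "with", health["number_of_nodes"], "nodes")
```

"green" means every primary and replica shard is allocated; "yellow" would mean some replicas are unassigned.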
4.3. Create index library
1) Use kibana's DevTools to create an index library
Enter the command in DevTools:
Multiple nodes store an index library in shards, and the shards then back each other up.
How many shards and how many replicas? That is configured when creating the index library:
PUT /itcast
{
"settings": {
"number_of_shards": 3, // number of shards
"number_of_replicas": 1 // number of replicas
},
"mappings": {
"properties": {
// mapping映射定义 ...
}
}
}
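Once the shard count is fixed, each document is routed to a shard by hashing its routing value (the document id by default): shard = hash(_routing) % number_of_shards. A sketch of the idea, using crc32 as a stand-in hash (elasticsearch actually uses murmur3):

```python
import zlib

NUMBER_OF_SHARDS = 3  # matches the index setting above

def route(doc_id):
    # Stand-in for es routing: hash the id and take it modulo the shard count.
    # es itself uses murmur3, not crc32; the principle is the same.
    return zlib.crc32(doc_id.encode("utf-8")) % NUMBER_OF_SHARDS

for doc in ["1", "2", "3"]:
    print("document", doc, "-> shard", route(doc))
```

This is also why number_of_shards cannot be changed after index creation: changing the modulus would send existing documents to the wrong shard.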
Kibana happened to be stopped here; in fact, you can also use cerebro to create an index library.
2) Use cerebro to create an index library
You can also create an index library with cerebro:
Fill in the index library information:
Click the create button in the lower right corner:
4.4. View fragmentation effect
Go back to the home page and you can view the index library's sharding effect, exactly as in the case diagram. Note a common misunderstanding: the picture does not show 3 index libraries; rather, one index library is split into 3 shards for storage. The shards are stored on different es instances, and the shards across the three instances back each other up.
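The "mutual backup" layout can be sketched as a placement rule: with 3 primary shards and 1 replica each spread over es01..es03, a primary and its replica never land on the same node, so losing one node never loses a shard. (This round-robin rule is my simplification of the real allocator, which only guarantees the same-node constraint.)

```python
NODES = ["es01", "es02", "es03"]

def place(num_shards=3):
    """Round-robin primaries; put each replica on the next node over."""
    placement = {}
    for shard in range(num_shards):
        primary = NODES[shard % len(NODES)]
        replica = NODES[(shard + 1) % len(NODES)]  # never the primary's node
        placement[shard] = {"primary": primary, "replica": replica}
    return placement

for shard, nodes in place().items():
    print("shard", shard, nodes)
```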