Introduction
Elasticsearch is a search server based on Lucene. It provides a distributed, multi-tenant full-text search engine behind a RESTful web interface. Elasticsearch is developed in Java and released as open source under the terms of the Apache license. It is a popular enterprise search engine designed for cloud environments: it offers near-real-time search, is stable, reliable, and fast, and is easy to install and use. Wikipedia, Stack Overflow, and GitHub all use it.
This article starts from scratch and explains how to use Elasticsearch to build your own full-text search engine. Every step comes with detailed instructions, so you can learn by following along.
Install Elasticsearch
Download link: https://www.elastic.co/downloads/elasticsearch
cd /usr/local
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.2.tar.gz
tar -zxvf elasticsearch-5.5.2.tar.gz
mv elasticsearch-5.5.2 elasticsearch
su tzs
Switch to the tzs user (Elasticsearch refuses to start as root by default).
sh /usr/local/elasticsearch/bin/elasticsearch -d
The -d flag starts Elasticsearch in the background.
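If you plan to install a different release, the download step can be parameterized by version. The following is a minimal sketch that only prints what it would do; the helper name es_tarball_url is hypothetical, and the URL pattern is assumed to be the same as in the wget command above.

```shell
#!/bin/sh
# Hypothetical helper: build the tarball URL for a given Elasticsearch version.
# The URL pattern is copied from the wget command in this article.
es_tarball_url() {
  printf 'https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-%s.tar.gz\n' "$1"
}

ES_VERSION=5.5.2
echo "Would download: $(es_tarball_url "$ES_VERSION")"
echo "Would extract to: /usr/local/elasticsearch-$ES_VERSION"
```

Replacing the echo lines with wget and tar calls turns the sketch into an actual installer.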
Test whether it started successfully inside the VMware virtual machine:
curl http://localhost:9200/
If a JSON response containing the node name, cluster name, and version information is returned, the installation succeeded.
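To check the response in a script rather than by eye, the version number can be pulled out of the JSON. This is a rough sketch that uses sed instead of a real JSON parser; the helper name es_version_number and the sample response are illustrative assumptions, not output captured from a real node.

```shell
#!/bin/sh
# Hypothetical helper: read the root-endpoint JSON on stdin and print the
# value of the "number" field (the Elasticsearch version).
es_version_number() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Illustrative sample, shortened from the shape of a real response.
sample='{ "name" : "es1", "version" : { "number" : "5.5.2" }, "tagline" : "You Know, for Search" }'
printf '%s\n' "$sample" | es_version_number

# Against a live node:
#   curl -s http://localhost:9200/ | es_version_number
```

An empty result means the response was not the expected JSON, which usually indicates that the node has not finished starting.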
Elasticsearch's default REST API port is 9200. By default it listens only on localhost, not on the machine's other IP addresses, so a service running in the virtual machine cannot be reached from the host and is accessible only locally via http://localhost:9200. To change this, edit the configuration file /usr/local/elasticsearch/config/elasticsearch.yml and add the following two lines:
network.bind_host: 0.0.0.0
network.publish_host: _nonloopback:ipv4
Alternatively, uncomment network.host and http.port and set network.host to the external IP address of the machine. Then restart Elasticsearch.
To shut it down, find the process with the following command, then kill it by its PID:
ps -ef | grep elasticsearch
If the node still cannot be reached from outside, the firewall may be blocking the port. To turn the firewall off:
service iptables stop
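The find-and-kill step can be sketched as a small filter over ps output. The helper name es_pids is hypothetical and the sample process lines are made up for illustration; the filter simply skips the line that belongs to the grep command itself.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of Elasticsearch processes found in
# `ps -ef` output read from stdin, skipping the grep command's own line.
es_pids() {
  awk '/elasticsearch/ && !/grep/ { print $2 }'
}

# Canned sample in `ps -ef` column order (UID PID PPID ...); values are made up.
es_pids <<'EOF'
tzs       2731     1  3 10:02 ?        00:00:41 /usr/bin/java -Des.path.home=/usr/local/elasticsearch org.elasticsearch.bootstrap.Elasticsearch -d
tzs       2900  2650  0 10:30 pts/0    00:00:00 grep elasticsearch
EOF

# Live use (GNU xargs; a plain kill sends SIGTERM so the node can shut down cleanly):
#   ps -ef | es_pids | xargs -r kill
```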
Modify the configuration file:
vim config/elasticsearch.yml
cluster.name: my-app    # cluster name; nodes with the same cluster name form one cluster
node.name: es1    # node name; must match the name configured in hosts earlier
path.data: /data/elasticsearch/data    # data path
path.logs: /data/elasticsearch/logs    # log path
network.host: 0.0.0.0    # allow external access; alternatively, use the machine's own IP address
http.port: 9200    # HTTP access port
discovery.zen.ping.unicast.hosts: ["192.168.153.133", "192.168.153.134", "192.168.153.132"]    # IP address of each node
If the data and log directories do not exist, create them and give ownership to the user that runs Elasticsearch:
mkdir -p /data/elasticsearch/{data,logs}
chown -R tzs /data/elasticsearch/{data,logs}
Also add the following (needed for the head plug-in installed later; not required yet):
http.cors.enabled: true
http.cors.allow-origin: "*"
Finally, the node can be opened from a browser on an external machine, which shows the same JSON response as the curl test.
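When the same settings have to be applied on several nodes, editing the file by hand gets tedious. The following is a minimal sketch of an idempotent "set this key" helper for a flat YAML file; set_yml_key is a hypothetical name, nested YAML is not handled, and the demonstration runs on a scratch file rather than on the real elasticsearch.yml.

```shell
#!/bin/sh
# Hypothetical helper: set "key: value" in a flat YAML file.
# An existing line for the key (even a commented-out default) is replaced
# in place; otherwise the setting is appended at the end.
set_yml_key() {
  yml_file=$1
  yml_tmp=$(mktemp) || return 1
  awk -v k="$2" -v v="$3" '
    {
      line = $0
      sub(/^#[ \t]*/, "", line)   # tolerate defaults such as "#network.host: ..."
      if (!done && index(line, k ":") == 1) { print k ": " v; done = 1; next }
      print
    }
    END { if (!done) print k ": " v }
  ' "$yml_file" > "$yml_tmp" && cat "$yml_tmp" > "$yml_file"
  yml_status=$?
  rm -f "$yml_tmp"
  return "$yml_status"
}

# Demonstration on a scratch file with two commented-out defaults.
demo=$(mktemp)
printf '#network.host: 192.168.0.1\n#http.port: 9200\n' > "$demo"
set_yml_key "$demo" network.host 0.0.0.0
set_yml_key "$demo" http.port 9200
set_yml_key "$demo" cluster.name my-app
cat "$demo"
rm -f "$demo"
```

Writing the result back with cat rather than mv keeps the owner and permissions of the original file, which matters because Elasticsearch runs as a non-root user.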
Install the IK Chinese word segmentation plugin
You can download the source code and compile it with Maven or, to save trouble, download a precompiled release directly:
https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v5.5.2
Be sure to download the release that matches your Elasticsearch version, and save it in the plugins directory.
Unzip it:
unzip elasticsearch-analysis-ik-5.5.2.zip
Create a new ik directory under the plugins directory of ES:
mkdir ik
Copy the unzipped files into the ik directory:
cp -r elasticsearch/* ik
Delete the unzipped directory and the archive:
rm -rf elasticsearch
rm -rf elasticsearch-analysis-ik-5.5.2.zip
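A version mismatch between the plugin and Elasticsearch prevents the node from starting, so it can be worth checking before a restart. The sketch below reads the elasticsearch.version property from the plugin's plugin-descriptor.properties file; plugin_es_version is a hypothetical helper, the scratch descriptor is made up for the demonstration, and the live path in the comment assumes the install locations used in this article.

```shell
#!/bin/sh
# Hypothetical helper: print the elasticsearch.version value from a
# plugin-descriptor.properties file.
plugin_es_version() {
  sed -n 's/^elasticsearch\.version[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Demonstration on a scratch descriptor with illustrative contents.
desc=$(mktemp)
printf 'name=analysis-ik\nversion=5.5.2\nelasticsearch.version=5.5.2\n' > "$desc"
if [ "$(plugin_es_version "$desc")" = "5.5.2" ]; then
  echo "plugin matches Elasticsearch 5.5.2"
else
  echo "version mismatch"
fi
rm -f "$desc"

# Live use (path assumed from the steps above):
#   plugin_es_version /usr/local/elasticsearch/plugins/ik/plugin-descriptor.properties
```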
IK comes with two tokenizers:
ik_max_word: splits the text at the finest granularity, producing as many words as possible, including overlapping ones
ik_smart: performs the coarsest split; characters already assigned to one word are not reused in another
After installing the IK Chinese tokenizer (it is not the only Chinese tokenizer; for others, see my article "Comparison between the Elasticsearch default tokenizer and Chinese tokenizers and how to use them"), the difference between the two can be tested as follows:
ik_max_word (the input text means "Lenovo is the world's largest notebook manufacturer")
curl -XGET 'http://192.168.153.134:9200/_analyze?pretty&analyzer=ik_max_word' -d '联想是全球最大的笔记本厂商'
{
"tokens" : [
{
"token" : "联想",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "是",
"start_offset" : 2,
"end_offset" : 3,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "全球",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "最大",
"start_offset" : 5,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "的",
"start_offset" : 7,
"end_offset" : 8,
"type" : "CN_CHAR",
"position" : 4
},
{
"token" : "笔记本",
"start_offset" : 8,
"end_offset" : 11,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "笔记",
"start_offset" : 8,
"end_offset" : 10,
"type" : "CN_WORD",
"position" : 6
},
{
"token" : "本厂",
"start_offset" : 10,
"end_offset" : 12,
"type" : "CN_WORD",
"position" : 7
},
{
"token" : "厂商",
"start_offset" : 11,
"end_offset" : 13,
"type" : "CN_WORD",
"position" : 8
}
]
}
ik_smart
curl -XGET 'http://localhost:9200/_analyze?pretty&analyzer=ik_smart' -d '联想是全球最大的笔记本厂商'
{
"tokens" : [
{
"token" : "联想",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "是",
"start_offset" : 2,
"end_offset" : 3,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "全球",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "最大",
"start_offset" : 5,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "的",
"start_offset" : 7,
"end_offset" : 8,
"type" : "CN_CHAR",
"position" : 4
},
{
"token" : "笔记本",
"start_offset" : 8,
"end_offset" : 11,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "厂商",
"start_offset" : 11,
"end_offset" : 13,
"type" : "CN_WORD",
"position" : 6
}
]
}
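The two responses are easier to compare when reduced to their token lists. The following sketch extracts one token per line from _analyze output with grep and cut; analyze_tokens is a hypothetical helper, and the sample is a shortened excerpt of the ik_max_word response shown above.

```shell
#!/bin/sh
# Hypothetical helper: print one token per line from _analyze output on stdin.
analyze_tokens() {
  grep -o '"token"[[:space:]]*:[[:space:]]*"[^"]*"' | cut -d '"' -f 4
}

# Shortened excerpt of the ik_max_word response (offsets and types omitted).
printf '%s\n' '{ "tokens" : [ { "token" : "笔记本" }, { "token" : "笔记" }, { "token" : "本厂" }, { "token" : "厂商" } ] }' | analyze_tokens

# Live use, counting the tokens each analyzer produces:
#   curl -s 'http://localhost:9200/_analyze?analyzer=ik_smart' -d '联想是全球最大的笔记本厂商' | analyze_tokens | wc -l
```

For the sentence used here, ik_max_word returns nine tokens and ik_smart returns seven, because ik_smart drops the overlapping words 笔记 and 本厂.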
Install the head plugin
elasticsearch-head is a cluster management tool for Elasticsearch. It is a standalone web application written entirely in HTML5; it can be integrated into ES as a plug-in or, as done here, run as a separate server.
Install git
yum remove git
yum install git
git clone https://github.com/mobz/elasticsearch-head.git
This pulls the head plug-in to the local machine; alternatively, download the zip archive directly from GitHub.
Install nodejs
First download node-v8.4.0-linux-x64.tar.xz from the official Node.js website, then extract it under /opt so that it matches the NODE_HOME set below:
tar -Jxv -f node-v8.4.0-linux-x64.tar.xz
mv node-v8.4.0-linux-x64 node
Environment variable settings:
vim /etc/profile
Add:
export NODE_HOME=/opt/node
export PATH=$PATH:$NODE_HOME/bin
export NODE_PATH=$NODE_HOME/lib/node_modules
Make the configuration take effect (this step is important and easy to miss):
source /etc/profile
Test whether node is globally available:
node -v
Then install the dependencies and start head:
mv elasticsearch-head head
cd head/
npm install -g grunt-cli
npm install
grunt server
Add to the ES configuration file:
http.cors.enabled: true
http.cors.allow-origin: "*"
Open http://192.168.153.133:9100/ in a browser to see the head interface.
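If head loads but cannot connect to the cluster, a missing CORS setting is a common cause. As a quick sanity check, the sketch below tests whether a configuration file enables CORS; cors_enabled is a hypothetical helper and the demonstration uses a scratch file instead of the real elasticsearch.yml.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given YAML file contains an active
# (not commented-out) "http.cors.enabled: true" line.
cors_enabled() {
  grep -Eq '^[[:space:]]*http\.cors\.enabled:[[:space:]]*true[[:space:]]*$' "$1"
}

# Demonstration on a scratch file.
cfg=$(mktemp)
printf 'http.port: 9200\nhttp.cors.enabled: true\nhttp.cors.allow-origin: "*"\n' > "$cfg"
if cors_enabled "$cfg"; then
  echo "CORS is enabled, head on port 9100 can query the node"
else
  echo "CORS is not enabled, head will not be able to connect"
fi
rm -f "$cfg"
```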
Problems encountered
I ran into a number of pitfalls along the way and record them here to avoid falling into them again.
1. ERROR Could not register mbeans java.security.AccessControlException: access denied ("javax.management.MBeanTrustPermission" "register")
Change the owner of the elasticsearch folder to the current user:
sudo chown -R noroot:noroot elasticsearch
This is needed because Elasticsearch must read and write its configuration files, so the config folder needs the right permissions. If the user that runs Elasticsearch (elsearch in this case) has no read and write permission on it, the error is still reported. The solution is to switch to the administrator account and grant the permissions:
sudo -i
chmod -R 775 config
2. [WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.RuntimeException: can not run elasticsearch as root
The reason is that Elasticsearch does not allow startup by the root user by default.
Solution 1: set -Des.insecure.allow.root=true
Modify /usr/local/elasticsearch-2.4.0/bin/elasticsearch and add:
ES_JAVA_OPTS="-Des.insecure.allow.root=true"
Or pass it at startup:
sh /usr/local/elasticsearch-2.4.0/bin/elasticsearch -d -Des.insecure.allow.root=true
Note: running as root in a production environment is a security risk and is not recommended. This switch belongs to the 2.x series shown in the paths above; the 5.x releases no longer accept it, so use Solution 2 there.
Solution 2: add a dedicated user
useradd elastic
chown -R elastic:elastic elasticsearch-2.4.0
su elastic
sh /usr/local/elasticsearch-2.4.0/bin/elasticsearch -d
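A start script can guard against this mistake by checking the user before launching. The following is a small sketch of such a guard; not_root is a hypothetical helper, and the user name elastic in the message is the one created in Solution 2.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for a non-root UID.
# With no argument it checks the UID of the current user.
not_root() {
  [ "${1:-$(id -u)}" -ne 0 ]
}

if not_root; then
  echo "ok: running as UID $(id -u), Elasticsearch may be started"
else
  echo "refusing: switch to a dedicated user first (for example: su elastic)"
fi
```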
3. UnsupportedOperationException: seccomp unavailable: requires kernel 3.5+ with CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER compiled in
This is only a warning; it does not occur on newer Linux kernels (3.5+ with seccomp support compiled in).
4. ERROR: [4] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
Reason: the maximum number of files the user may open is too small.
Solution: switch to the root user and edit the limits.conf configuration file:
vim /etc/security/limits.conf
Add the following content:
* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096
[2]: max number of threads [1024] for user [tzs] is too low, increase to at least [2048]
Reason: the maximum number of threads the user may create is too small.
Solution: switch to the root user, go to the limits.d directory, and modify the 90-nproc.conf configuration file:
vim /etc/security/limits.d/90-nproc.conf
Find the following line:
* soft nproc 1024
and change it to:
* soft nproc 2048
[3]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
Reason: the limit on the number of virtual memory areas (vm.max_map_count) is too small.
As the root user, run:
sysctl -w vm.max_map_count=262144
To make the change permanent, add vm.max_map_count=262144 to /etc/sysctl.conf and then apply it with:
sysctl -p
[4]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
Reason: CentOS 6 does not support seccomp, while ES 5.4.1 sets bootstrap.system_call_filter to true by default and performs the check. The check fails, which prevents ES from starting.
For details, see: https://github.com/elastic/elasticsearch/issues/22899
Solution: in elasticsearch.yml, add the setting bootstrap.system_call_filter and set it to false; place it under the Memory section:
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
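The numeric checks [1] to [3] can be verified before starting the node. The sketch below compares a current value with the minimum from the error messages; check_min is a hypothetical helper, and the demonstration uses the values quoted in the messages above rather than reading the live system.

```shell
#!/bin/sh
# Hypothetical helper: compare a current limit ($2) with the required
# minimum ($3) and report the result. Always returns 0, so it is safe in
# scripts that run under `set -e`.
check_min() {
  if [ "$2" = "unlimited" ] || [ "$2" -ge "$3" ]; then
    echo "$1: $2 ok"
  else
    echo "$1: $2 is too low, need at least $3"
  fi
}

# Values copied from the bootstrap check messages above.
check_min "max file descriptors" 4096 65536
check_min "max number of threads" 1024 2048
check_min "vm.max_map_count" 65530 262144
check_min "vm.max_map_count" 262144 262144

# Live values on Linux (run as the user that starts Elasticsearch):
#   check_min "max file descriptors" "$(ulimit -n)" 65536
#   check_min "vm.max_map_count" "$(cat /proc/sys/vm/max_map_count)" 262144
```

Limits set in limits.conf apply to new login sessions, so run the live checks from a fresh shell after editing the files.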
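5. java.lang.IllegalArgumentException: property [elasticsearch.version] is missing for plugin [head]
This error appears when the head directory is placed under the plugins directory of ES 5.x, which no longer supports site plug-ins. Move head out of the plugins directory, run it as a separate server with grunt server, and add to the ES configuration file: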
http.cors.enabled: true
http.cors.allow-origin: "*"