Elasticsearch Series Articles (2): Getting Started Tutorial for Full-text Search Engine Elasticsearch Cluster Building

Introduction

ElasticSearch is a search server based on Lucene. It provides a distributed, multi-tenant full-text search engine with a RESTful web interface. Elasticsearch is developed in Java and released as open source under the terms of the Apache License. It is a popular enterprise search engine, designed for cloud use, offering real-time search that is stable, reliable, fast, and easy to install and operate. Wikipedia, Stack Overflow, and GitHub all use it.

This article starts from scratch and explains how to use Elasticsearch to build your own full-text search engine. Each step comes with detailed instructions, so you can follow along.

Install ElasticSearch

Download link:  https://www.elastic.co/downloads/elasticsearch

cd /usr/local
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.2.tar.gz
tar -zxvf elasticsearch-5.5.2.tar.gz

su tzs    # switch to the tzs user (Elasticsearch will not start as root by default)

sh /usr/local/elasticsearch/bin/elasticsearch -d    # -d starts it in the background (adjust the path if you kept the elasticsearch-5.5.2 directory name)

Test whether it started successfully inside the virtual machine: curl http://localhost:9200/

If a JSON response containing the node name, cluster name, and version number comes back, Elasticsearch is installed and running.
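For reference, a successful response looks roughly like the following. This is only an illustrative sketch for version 5.5.2: the node name is random until you configure it, and the UUID, build hash, and build date depend on your installation.

{
  "name" : "<random node name>",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "...",
  "version" : {
    "number" : "5.5.2",
    "build_hash" : "...",
    "build_date" : "...",
    "build_snapshot" : false,
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}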

Elasticsearch's RESTful API listens on port 9200 by default, but it binds only to the loopback address. This means the service inside the virtual machine cannot be reached from the host; it can only be accessed locally via http://localhost:9200. If you need to change this, modify the configuration file /usr/local/elasticsearch/config/elasticsearch.yml and add the following two lines:

network.bind_host: 0.0.0.0
network.publish_host: _nonloopback:ipv4

Or uncomment network.host and http.port, and change network.host to the machine's externally reachable IP. Then restart Elasticsearch.

Shutdown method: run ps -ef | grep elasticsearch to find the process, then kill it.
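For example (the <PID> placeholder stands for whatever process ID the grep output shows):

ps -ef | grep elasticsearch    # note the process ID (PID) in the second column
kill <PID>                     # sends SIGTERM so Elasticsearch can shut down cleanly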

If the service still cannot be reached from outside, the firewall may be blocking it (to turn the firewall off: service iptables stop).

Modify the configuration file: vim config/elasticsearch.yml

cluster.name: my-app (the name of the cluster; nodes with the same cluster name join the same cluster)

node.name: es1 (the name of this node, which must match the name configured earlier in the hosts file)

path.data: /data/elasticsearch/data (the data path; the directories are not created automatically, so create them with mkdir -p /data/elasticsearch/{data,logs} and give ownership to the user that runs Elasticsearch: chown tzs /data/elasticsearch/{data,logs} -R)
path.logs: /data/elasticsearch/logs (the log path; same note as above)
network.host: 0.0.0.0 (allow access from any address, or set it to this machine's own IP)
http.port: 9200 (the HTTP access port)
discovery.zen.ping.unicast.hosts: ["192.168.153.133", "192.168.153.134", "192.168.153.132"] (the IP addresses of all cluster nodes)

Also remember to add the following two lines (these are needed later for the head plugin; you can skip them for now):
http.cors.enabled: true
http.cors.allow-origin: "*"
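
Putting it together, the elasticsearch.yml for the first node of this example cluster would look roughly like this (the IPs, node name, and paths are just this tutorial's example values; adjust them to your own environment):

cluster.name: my-app
node.name: es1
path.data: /data/elasticsearch/data
path.logs: /data/elasticsearch/logs
network.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: ["192.168.153.133", "192.168.153.134", "192.168.153.132"]
http.cors.enabled: true
http.cors.allow-origin: "*"

On the other nodes, only node.name changes (and the paths, if they differ per machine); cluster.name must stay the same so the nodes join one cluster.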

Finally, opening http://<server-ip>:9200/ in a browser on the host machine should show the same cluster information as the curl test above.

Install the IK Chinese tokenizer

You can download the source code and compile it yourself with Maven, or, if you want to save the trouble, download the pre-built release directly:

https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v5.5.2

Make sure to download the version that matches your Elasticsearch version; the plugin goes into the plugins directory.

Unzip

unzip elasticsearch-analysis-ik-5.5.2.zip

Create a new ik directory under the Elasticsearch plugins directory

mkdir ik

Copy the unzipped file to the ik directory

cp -r elasticsearch/* ik

Delete the unzipped directory and the downloaded archive

rm -rf elasticsearch
rm -rf elasticsearch-analysis-ik-5.5.2.zip
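
After restarting Elasticsearch, you can check that the plugin was picked up (a quick sanity check, assuming ES is reachable on localhost:9200):

curl http://localhost:9200/_cat/plugins?v    # the analysis-ik plugin should appear in the list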

IK comes with two tokenizers

ik_max_word  : The text will be split at the finest granularity; split out as many words as possible

ik_smart : Performs the coarsest-grained split; text that has already been matched by one word will not be matched again by another

After installing the IK Chinese tokenizer (of course, this is not the only Chinese tokenizer; for others, see my article Comparison between the Elasticsearch default tokenizer and Chinese tokenizers and how to use them), the difference between the two can be tested as follows:

ik_max_word

curl -XGET 'http://192.168.153.134:9200/_analyze?pretty&analyzer=ik_max_word' -d '联想是全球最大的笔记本厂商'

{
  "tokens" : [
    {
      "token" : "联想",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "全球",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "最大",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "的",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "CN_CHAR",
      "position" : 4
    },
    {
      "token" : "笔记本",
      "start_offset" : 8,
      "end_offset" : 11,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "笔记",
      "start_offset" : 8,
      "end_offset" : 10,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "本厂",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "厂商",
      "start_offset" : 11,
      "end_offset" : 13,
      "type" : "CN_WORD",
      "position" : 8
    }
  ]
}

ik_smart

curl -XGET 'http://localhost:9200/_analyze?pretty&analyzer=ik_smart' -d '联想是全球最大的笔记本厂商'

{
  "tokens" : [
    {
      "token" : "联想",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "全球",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "最大",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "的",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "CN_CHAR",
      "position" : 4
    },
    {
      "token" : "笔记本",
      "start_offset" : 8,
      "end_offset" : 11,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "厂商",
      "start_offset" : 11,
      "end_offset" : 13,
      "type" : "CN_WORD",
      "position" : 6
    }
  ]
}
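
Once the tokenizers work, you can use them in an index mapping. Below is a minimal sketch (the index name news, type article, and field content are made-up examples, not from the original article) that indexes with ik_max_word and searches with ik_smart:

curl -XPUT 'http://localhost:9200/news?pretty' -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "article": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        }
      }
    }
  }
}'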

Install the head plugin

elasticsearch-head is an Elasticsearch cluster management tool. It is a standalone web application written entirely in HTML5, and it can also be integrated into Elasticsearch as a plugin.

Install git

yum remove git
yum install git
git clone git://github.com/mobz/elasticsearch-head.git    # pull the head plugin to the local machine, or download the zip archive directly from GitHub

Install nodejs

Go to the official website to download node-v8.4.0-linux-x64.tar.xz first

tar -Jxv -f  node-v8.4.0-linux-x64.tar.xz
mv node-v8.4.0-linux-x64  node

Environment variable settings: 

vim  /etc/profile

Add the following lines:

export NODE_HOME=/opt/node
export PATH=$PATH:$NODE_HOME/bin
export NODE_PATH=$NODE_HOME/lib/node_modules

Make the configuration take effect (this step is important and easy to forget):

source /etc/profile

Test whether it is globally available: 

node -v

Then build and start head:

mv elasticsearch-head head
cd head/
npm install -g grunt-cli
npm install
grunt server

Add the following to the Elasticsearch configuration file (if you did not already do so earlier):

http.cors.enabled: true
http.cors.allow-origin: "*"

Open http://192.168.153.133:9100/ in a browser to see the head UI.

Problems encountered

I stepped into quite a few pits along the way; they are recorded here so I (and you) can avoid them next time.

1、ERROR Could not register mbeans java.security.AccessControlException: access denied (“javax.management.MBeanTrustPermission” “register”)

Change the owner of the elasticsearch folder to the current (non-root) user; noroot below stands for that user:

sudo chown -R noroot:noroot elasticsearch

This happens because Elasticsearch needs to read and write its configuration files. The non-root user created above (elsearch in this example) does not have read and write permission on the config folder, so the error is still reported. The fix is to switch to the administrator account and grant the permissions:

sudo -i

chmod -R 775 config

2、[WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.RuntimeException: can not run elasticsearch as root

The reason is that elasticsearch does not support startup by root user by default.

Solution 1: -Des.insecure.allow.root=true

Modify /usr/local/elasticsearch-2.4.0/bin/elasticsearch and add:

ES_JAVA_OPTS="-Des.insecure.allow.root=true"

Or pass it on the command line: sh /usr/local/elasticsearch-2.4.0/bin/elasticsearch -d -Des.insecure.allow.root=true

Note: running as root in a production environment is a security risk and is not recommended.

Solution 2: Add dedicated users

useradd elastic
chown -R elastic:elastic  elasticsearch-2.4.0
su elastic
sh /usr/local/elasticsearch-2.4.0/bin/elasticsearch -d

3、UnsupportedOperationException: seccomp unavailable: requires kernel 3.5+ with CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER compiled in

This is only a warning; on newer Linux kernel versions the problem does not occur.

4、ERROR: [4] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]

Reason: the "unable to create local file" problem; the maximum number of file descriptors the user may open is too low.

Solution: Switch to the root user, edit the limits.conf configuration file, and add content similar to the following:

vim /etc/security/limits.conf

Add the following content:

* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096
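
The new limits apply to new login sessions; after logging in again as the Elasticsearch user you can double-check them (values correspond to the settings above):

ulimit -n    # maximum open file descriptors; should now print 65536
ulimit -u    # maximum user processes/threads; should now print 2048 or more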

[2]: max number of threads [1024] for user [tzs] is too low, increase to at least [2048]

Reason: the "unable to create local thread" problem; the maximum number of threads the user may create is too low.

Solution: Switch to the root user, enter the limits.d directory, and modify the 90-nproc.conf configuration file.

vim /etc/security/limits.d/90-nproc.conf

Find the following line:

* soft nproc 1024

and change it to:

* soft nproc 2048

[3]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

Reason: the maximum number of virtual memory areas (vm.max_map_count) is too low.

The root user executes the command:

sysctl -w vm.max_map_count=262144

Or edit /etc/sysctl.conf, add the line vm.max_map_count=262144, and then apply it with:

sysctl -p

[4]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk

Reason: CentOS 6 does not support SecComp, while ES 5.4.1's bootstrap.system_call_filter defaults to true, so the check fails and Elasticsearch refuses to start.
For details, see: https://github.com/elastic/elasticsearch/issues/22899

Solution: add bootstrap.system_call_filter to elasticsearch.yml and set it to false (note: place it below the Memory section):
bootstrap.memory_lock: false
bootstrap.system_call_filter: false

5、 java.lang.IllegalArgumentException: property [elasticsearch.version] is missing for plugin [head]

Add the following to the Elasticsearch configuration file:

http.cors.enabled: true
http.cors.allow-origin: "*"

 


Origin blog.csdn.net/weixin_42073629/article/details/114486831