Elasticsearch database

1. What is Elasticsearch?

1.1 Concept and Features

  1. Elasticsearch, like MongoDB, Redis, and Memcached, is a non-relational data store. It is also a near-real-time search platform: there is only a slight delay (about one second by default) between indexing a document and that document becoming searchable. In enterprise applications it is positioned as a scalable, highly available full-text search and real-time data analysis tool exposed through a RESTful API.

  2. Scalable: a cluster has one elected master node and any number of additional nodes, so it is easy to expand. A node whose cluster.name matches and which can reach the existing nodes on the network joins the cluster automatically. Elasticsearch is itself open source software and supports many open source third-party plug-ins.

  3. High availability: data is stored in a distributed fashion across the nodes of a cluster. Indexes support shards and replicas, so even if some nodes go down, data recovery and master re-election happen automatically.

  4. The smallest unit of data storage is a document, which is essentially a JSON object.
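To make the document model concrete, here is a minimal sketch of building a JSON document and storing it over the REST API. The index name `articles`, the field names `title` and `views`, and the helper names are illustrative assumptions, not part of the original setup; the `curl` calls need a reachable node.

```shell
# Build a small JSON document. The fields (title, views) are
# illustrative assumptions chosen for this sketch.
make_doc() {
  printf '{"title":"%s","views":%d}\n' "$1" "$2"
}

# Index the document under ID 1, then read it back.
# $1 = host:port of a running node (assumed reachable).
index_and_get() {
  curl -s -H 'Content-Type: application/json' \
       -XPUT "http://$1/articles/_doc/1" -d "$(make_doc 'Hello ES' 1)"
  curl -s "http://$1/articles/_doc/1?pretty"
}
```

Once a node is running, something like `index_and_get 192.168.5.55:9200` should return the stored document wrapped in metadata such as `_index` and `_id`.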

1.2 Overview of ElasticSearch Applicable Scenarios

  1. Wikipedia (similar to Baidu Encyclopedia): full-text search, highlighting, and search suggestions.

  2. The Guardian (a news website, similar to Sohu News): user behavior logs (clicks, views, favorites, comments) are combined with social network data (opinions about a given article) and analyzed. The results are given to the author of each article so that they know how the public reacted to it (good, bad, popular, trashy, despised, adored).

  3. Stack Overflow (a Q&A forum for programming problems): users submit IT problems and program errors, and others discuss and answer them. Full-text search finds related questions and answers; when a program reports an error, you paste the error message in and search for an existing answer.

  4. GitHub (open source code hosting): searches hundreds of billions of lines of code.

  5. E-commerce websites: product search.

  6. In China: in-site search (e-commerce, recruitment, portals, etc.), IT system search (OA, CRM, ERP, etc.), and data analysis (a popular usage scenario for ES).

2. Install Elasticsearch

2.1 Download the installation package

Official website download address:
https://www.elastic.co/cn/downloads/past-releases

2.2 Environment description

// OS version
[root@localhost ~]# cat /etc/redhat-release 
CentOS Linux release 7.8.2003 (Core)
// Stop the firewall and disable SELinux
[root@localhost ~]# systemctl stop firewalld && systemctl disable firewalld
[root@localhost ~]# sed -i '7s/enforcing/disabled/g' /etc/selinux/config
[root@localhost ~]# setenforce 0
[root@localhost ~]# getenforce
Permissive

2.3 Create es user

[root@localhost ~]# groupadd es
[root@localhost ~]# useradd -g es -s /bin/bash -md /home/es es

2.4 Create es storage location

// Stored under /var/es (adjust to your needs)
[root@localhost ~]# mkdir /var/es && cd /var/es
[root@localhost es]# mkdir data && mkdir log
[root@localhost es]# chown -Rf es:es /var/es/

2.5 Install es

// Create the installation directory; the package is extracted into it below
[root@localhost ~]# mkdir /usr/local/es && cd /usr/local/es
// Upload the installation package
[root@localhost src]# ls
debug  elasticsearch-6.8.20.tar.gz  kernels
// Extract the installation package
[root@localhost src]# tar xf elasticsearch-6.8.20.tar.gz -C /usr/local/es/
[root@localhost src]# cd /usr/local/es/
[root@localhost es]# chown -Rf es:es /usr/local/es/elasticsearch-6.8.20/

2.5 Modify the configuration file

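// Edit the configuration file
[root@localhost es]# vim /usr/local/es/elasticsearch-6.8.20/config/elasticsearch.yml
// Uncomment cluster.name (remove the leading #) and set your own name. (The leading numbers are line numbers in the file, not part of the setting.)
17 cluster.name: my-application
// Uncomment node.name
23 node.name: node-1
// Uncomment path.data and set it as follows
33 path.data: /var/es/data
// Uncomment path.logs and set it as follows
37 path.logs: /var/es/log
// Uncomment network.host and set it to 0.0.0.0 (accept connections from any IP)
55 network.host: 0.0.0.0
// Uncomment http.port
59 http.port: 9200
// Append the following settings at the end of the file
 89 bootstrap.memory_lock: false
 90 bootstrap.system_call_filter: false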

2.6 System Optimization

// File 1 (the leading numbers are line numbers)
[root@localhost es]# vi /etc/security/limits.conf
Append at the end:
62 es soft nofile 65536
63 es hard nofile 65536
64 es soft nproc 4096
65 es hard nproc 4096
// File 2
[root@localhost es]# vim /etc/sysctl.conf 
Append at the end:
11 vm.max_map_count = 655360
[root@localhost es]# sysctl -p
vm.max_map_count = 655360
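Before starting the node it can help to confirm that these limits are actually in effect. The following is a minimal sketch; the helper names are hypothetical, 262144 is the minimum `vm.max_map_count` that Elasticsearch's bootstrap check expects, and 65536 simply mirrors the `nofile` value set above.

```shell
# Succeed when $1 (actual value) is at least $2 (required minimum).
# "unlimited", as ulimit may report, always passes.
at_least() {
  [ "$1" = "unlimited" ] && return 0
  [ "$1" -ge "$2" ]
}

# Report any limit that is lower than expected. Run this as the es
# user (after a fresh login) so that ulimit reflects limits.conf.
check_es_limits() {
  at_least "$(sysctl -n vm.max_map_count)" 262144 \
    || echo "vm.max_map_count is below 262144"
  at_least "$(ulimit -n)" 65536 \
    || echo "open-file limit is below 65536 for this user"
}
```

If `check_es_limits` prints nothing, both values meet the thresholds used in this sketch.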

2.7 Install jdk environment

// Upload the JDK package
[root@localhost src]# ls
debug  elasticsearch-6.8.20.tar.gz  jdk-8u131-linux-x64.tar.gz  kernels
// Extract the package
[root@localhost src]# tar xf jdk-8u131-linux-x64.tar.gz -C /usr/local/
// Add environment variables
[root@localhost src]# vim /etc/profile
Append at the end:
 78 #JAVA_HOME
 79 export JAVA_HOME=/usr/local/java
 80 #JRE_HOME
 81 export JRE_HOME=/usr/local/java/jre
 82 #CLASSPATH
 83 export CLASSPATH=$CLASSPATH:${JAVA_HOME}/lib:${JRE_HOME}/lib
 84 #PATH
 85 export PATH=$PATH:${JAVA_HOME}/bin:${JRE_HOME}/bin

// Rename the JDK directory so that it matches JAVA_HOME
[root@localhost ~]# mv /usr/local/jdk1.8.0_131/ /usr/local/java
[root@localhost ~]# source /etc/profile
[root@localhost ~]# java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

2.8 Switch to the es user and start the database

[root@localhost ~]# su es
[es@localhost root]$ /usr/local/es/elasticsearch-6.8.20/bin/elasticsearch &

2.9 systemctl management
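
One way to put the node under systemctl management is a unit file along the following lines. This is a sketch under assumptions: the paths, user, and limits are taken from sections 2.3 to 2.7, while the unit name `elasticsearch.service`, the `Restart` policy, and the helper name `render_es_unit` are choices made for this example.

```shell
# Print a systemd unit for the tarball install used in this article.
# The unit name and Restart policy are assumptions of this sketch.
render_es_unit() {
cat <<'EOF'
[Unit]
Description=Elasticsearch 6.8.20 (tarball install)
After=network.target

[Service]
Type=simple
User=es
Group=es
Environment=JAVA_HOME=/usr/local/java
ExecStart=/usr/local/es/elasticsearch-6.8.20/bin/elasticsearch
LimitNOFILE=65536
LimitNPROC=4096
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# As root (not executed by this sketch):
#   render_es_unit > /etc/systemd/system/elasticsearch.service
#   systemctl daemon-reload
#   systemctl enable elasticsearch && systemctl start elasticsearch
```

The `LimitNOFILE` and `LimitNPROC` lines repeat the values from limits.conf because services started by systemd do not go through a login session and so do not pick up limits.conf.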

2.10 Access

Use a browser to access ip:9200 (port 9200 on the host where ES is installed). If the node answers with a JSON document containing its name, cluster name, and version, the installation succeeded.

3. Kibana

3.1 Introduction to Kibana

Kibana is an open source analytics and visualization platform designed to work with Elasticsearch. You can use Kibana to search and view the data stored in Elasticsearch. Kibana presents that data as charts, tables, maps, and so on, so that it can be displayed intuitively for advanced data analysis and visualization.
Elasticsearch, Logstash, and Kibana together are what we often call the ELK stack. The combination is a clever design in the big data field and loosely follows the MVC idea of a model (persistence) layer, a view layer, and a control layer. Logstash acts as the control layer and is responsible for collecting and filtering data. Elasticsearch plays the role of the persistence layer and is responsible for storing data. Kibana, the subject of this chapter, plays the role of the view layer: it offers queries and analysis along many dimensions and uses a graphical interface to display the data stored in Elasticsearch.

3.2 Install Kibana

Official website download address:
https://www.elastic.co/cn/downloads/past-releases#kibana

3.3 Upload installation package

// Upload with the rz command or Xftp
[root@localhost src]# ls
debug  elasticsearch-6.8.20.tar.gz  jdk-8u131-linux-x64.tar.gz  kernels  kibana-6.8.20-linux-x86_64.tar.gz

3.4 Extract the archive

[root@localhost src]# tar xf kibana-6.8.20-linux-x86_64.tar.gz -C /usr/local/

3.5 Modify the configuration file

// The leading numbers are line numbers
[root@localhost src]# vim /usr/local/kibana-6.8.20-linux-x86_64/config/kibana.yml
7 server.host: "192.168.5.55"                           // address Kibana listens on (this host)
28 elasticsearch.hosts: ["http://192.168.5.55:9200"]    // Elasticsearch server address

3.6 Start

[root@localhost src]# cd /usr/local/kibana-6.8.20-linux-x86_64/
[root@localhost kibana-6.8.20-linux-x86_64]# ./bin/kibana &

3.7 Browser access

http://192.168.5.55:5601/app/kibana

4. Elasticsearch high availability cluster

4.1 How ES solves high concurrency

ES is a distributed full-text search framework that hides the complexity of distribution from the user: sharding, cluster discovery, shard load balancing, and request routing are all handled internally.

4.2 ES Basic Concept Nouns

Cluster

Represents a cluster. A cluster contains multiple nodes, one of which is the master node, chosen by election. The distinction between master and non-master nodes only matters inside the cluster. One of the concepts of ES is decentralization, which literally means there is no central node. That is the view from outside the cluster: seen from outside, an ES cluster is logically a single whole, and communicating with any one node is equivalent to communicating with the entire cluster.

Shards

Represents index shards. ES can divide a complete index into multiple shards, so that a large index can be split into pieces and distributed across different nodes, which together form a distributed search. The number of primary shards is specified when the index is created and cannot be changed in place afterwards.
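
The fixed shard count follows from how a document is routed to a shard: shard = hash(routing value) % number_of_primary_shards, where the routing value defaults to the document ID. The sketch below illustrates the idea only. It uses `cksum` as a stand-in hash (Elasticsearch itself uses a Murmur3 hash), and `shard_for` is a hypothetical helper name.

```shell
# Pick a shard for a document ID.
# $1 = routing value (document ID), $2 = number of primary shards.
# cksum is only a stand-in for the real hash function.
shard_for() {
  hash=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $(( hash % $2 ))
}

# Example calls (results depend on the stand-in hash):
#   shard_for doc-42 5
#   shard_for doc-42 6
```

Because the divisor is the number of primary shards, changing that number would send existing IDs to different shards, so documents already stored could no longer be found where the formula points.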

Replicas

Represents index replicas. ES can keep multiple replicas of an index. Their first role is to improve fault tolerance: when a node or a shard is damaged or lost, the data can be recovered from a replica. Their second role is to improve query efficiency, because ES automatically load-balances search requests across the copies.

Recovery

Represents data recovery, also called data redistribution. When a node joins or leaves the cluster, ES redistributes the index shards according to the load of the machines; data recovery also takes place when a failed node is restarted.

4.3 Why does ES implement clusters

An index in an ES cluster may consist of multiple shards, and each shard can have multiple replicas. Splitting a single index into multiple shards lets us handle indexes that are too large to run on a single server, where the limiting resource may be memory or storage, or where sheer size causes efficiency problems. Because each shard can have multiple replicas, distributing the replicas across several servers also increases the query load the cluster can handle.

ES cluster core principle analysis:

  1. Each index is divided into multiple shards for storage. By default (in 6.x), an index is created with 5 shards.
    The shards are distributed across different nodes; these shards are the primary shards.
    Note: once the number of primary shards of an index has been defined, it cannot be modified later.

  2. To make the data highly available, each primary shard can have corresponding backup shards, called replica shards, which provide fault tolerance and load balancing of requests.
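
The two points above map onto two index settings, `number_of_shards` and `number_of_replicas`. The following sketch shows how they might be set over the REST API; the helper names and any index name you pass in are assumptions for illustration, and the `curl` calls need a running node.

```shell
# Build the settings body for a new index.
# $1 = number of primary shards, $2 = replicas per primary.
index_settings() {
  printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}\n' "$1" "$2"
}

# Create an index. $1 = host:port, $2 = index name,
# $3 = primary shards, $4 = replicas per primary.
create_index() {
  curl -s -H 'Content-Type: application/json' \
       -XPUT "http://$1/$2" -d "$(index_settings "$3" "$4")"
}

# The replica count, unlike the primary count, can be changed later.
# $1 = host:port, $2 = index name, $3 = new replica count.
set_replicas() {
  curl -s -H 'Content-Type: application/json' \
       -XPUT "http://$1/$2/_settings" \
       -d "{\"index\":{\"number_of_replicas\":$3}}"
}
```

For example, `create_index 192.168.5.55:9200 demo 3 1` would ask for three primaries with one replica each, six shards in total.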

Note: to achieve high availability, each primary shard has its own corresponding replica shard. A replica shard is never stored on the same server as its primary shard (so a single-node ES has no assigned replica shards). A primary shard can, however, share a node with the replica shards of other primaries.
When data is written to a primary shard, it is synchronized to the corresponding replica shards in real time.

When querying, however, both primary and replica shards can serve the request.

A node can store primary shards alongside replicas of other shards:
Node1: P1 + P2 + R3 together make up a complete copy of the data. This is distributed storage.

The core data stored in ES is the index.
When ES runs as a cluster, the index that would otherwise sit on a single server node is split by sharding and stored across multiple physical machines.
Sharding means splitting data across multiple nodes for storage.
ES distinguishes primary shards from replica shards; the replicas exist for fault tolerance.

5. High availability ES cluster deployment

5.1 Environment description

System environment:
[es@localhost root]$ cat /etc/redhat-release 
CentOS Linux release 7.8.2003 (Core)

Hosts:

IP address      Node            Database version       Visualization tool
192.168.5.55    node1 (master)  elasticsearch-6.8.20   kibana-6.8.20
192.168.5.56    node2           elasticsearch-6.8.20
192.168.5.100   node3           elasticsearch-6.8.20

5.2 Installation process

Prepare three hosts with ES already installed on each; see section 2 for the installation process.
Note: stop the firewall on every host.

5.3 Modify the configuration file

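Edit the configuration file on 192.168.5.55
[root@localhost config]# pwd
/usr/local/es/elasticsearch-6.8.20/config
[root@localhost config]# vim elasticsearch.yml
Note: the leading numbers are line numbers in the file

(earlier lines omitted)
17 cluster.name: myes   // cluster name; must be identical on all three nodes
23 node.name: node-1   // name of this node; must be unique within the cluster
33 path.data: /var/es/data // data directory
37 path.logs: /var/es/log  // log directory
55 network.host: 0.0.0.0   // bind address
59 http.port: 9200        // HTTP port
68 discovery.zen.ping.unicast.hosts: ["192.168.5.55", "192.168.5.56","192.168.5.100"]  // IPs of the cluster nodes
72 discovery.zen.minimum_master_nodes: 2   // minimum number of master-eligible nodes; a majority (2 of 3) guards against split brain
89 bootstrap.memory_lock: false     // add these two lines: do not lock memory, and disable the system call filter
90 bootstrap.system_call_filter: false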
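
The usual rule for `discovery.zen.minimum_master_nodes` in 6.x is a majority of the master-eligible nodes: (master_eligible_nodes / 2) + 1, using integer division. A small sketch of that arithmetic, with `quorum` as a hypothetical helper name:

```shell
# Majority of master-eligible nodes: (n / 2) + 1 with integer division.
# $1 = number of master-eligible nodes in the cluster.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# For the three-node cluster in this article:
#   quorum 3
```

With three master-eligible nodes the formula gives 2, so the cluster keeps a master when one node fails, while two halves of a partitioned cluster cannot both elect one.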

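Copy the configuration file to the other two hosts with scp, overwriting theirs
Note: remember to change the node name afterwards; node names must be unique within the cluster
[root@localhost ~]# cd /usr/local/es/elasticsearch-6.8.20/config/
// <user>@<node-ip> is your login on each remaining host (192.168.5.56 and 192.168.5.100)
[root@localhost config]# scp elasticsearch.yml <user>@<node-ip>:/usr/local/es/elasticsearch-6.8.20/config/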
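
Since the copied file still carries the first node's name, each copy needs its `node.name` line rewritten. A minimal sketch, assuming the line is uncommented and starts at the beginning of the line; `set_node_name` is a hypothetical helper name:

```shell
# Rewrite the node.name line of an elasticsearch.yml file in place.
# $1 = path to the config file, $2 = new node name.
# Other lines (cluster.name, paths, discovery hosts) are left untouched.
set_node_name() {
  sed "s/^node\.name:.*/node.name: $2/" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# On the second host, for example:
#   set_node_name /usr/local/es/elasticsearch-6.8.20/config/elasticsearch.yml node-2
```

Because the helper writes a new file as the invoking user, check afterwards that the file is still readable by the es user.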

5.4 Start the es database

Start the three nodes one after another
[root@localhost config]# su es
[es@localhost config]$ /usr/local/es/elasticsearch-6.8.20/bin/elasticsearch &

5.5 Cluster Test

Open the following address in a browser to check the cluster health:
192.168.5.55:9200/_cat/health?v
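
The same check can be scripted. The status column of `_cat/health` is one of green, yellow, or red; the sketch below fetches just that column with the `h=status` parameter and explains it. The helper names are hypothetical, and `cluster_status` needs a reachable node.

```shell
# Explain a cluster status value.
describe_status() {
  case "$1" in
    green)  echo "all primary and replica shards are assigned" ;;
    yellow) echo "all primaries are assigned, some replicas are not" ;;
    red)    echo "at least one primary shard is unassigned" ;;
    *)      echo "unknown status: $1" ;;
  esac
}

# Fetch only the status column. $1 = host:port of any node.
cluster_status() {
  curl -s "http://$1/_cat/health?h=status" | tr -d '[:space:]'
}

# Example (requires a running cluster):
#   describe_status "$(cluster_status 192.168.5.55:9200)"
```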

// View the cluster nodes
[root@localhost config]# curl 192.168.5.55:9200/_cat/nodes
192.168.5.56  14 96 1 0.05 0.15 0.13 mdi - node-2
192.168.5.100 14 90 0 0.00 0.06 0.06 mdi - node-3
192.168.5.55  23 97 2 0.00 0.04 0.05 mdi * node-1
// The node marked with * is the master
// Check that the shards are healthy

[root@localhost ~]# curl 192.168.5.55:9200/_cat/shards
.kibana_task_manager            0 p STARTED     2   6.8kb 192.168.5.55  node-1
.kibana_task_manager            0 r STARTED     2   6.8kb 192.168.5.56  node-2
.kibana_1                       0 p STARTED     4  19.8kb 192.168.5.56  node-2
.kibana_1                       0 r STARTED     4  19.8kb 192.168.5.100 node-3
.monitoring-kibana-6-2023.05.15 0 p STARTED  2194 747.9kb 192.168.5.55  node-1
.monitoring-kibana-6-2023.05.15 0 r STARTED  2194 680.4kb 192.168.5.100 node-3
.monitoring-kibana-6-2023.05.12 0 p STARTED  1180   420kb 192.168.5.56  node-2
.monitoring-kibana-6-2023.05.12 0 r STARTED  1180   420kb 192.168.5.100 node-3
.monitoring-es-6-2023.05.15     0 p STARTED 28065    14mb 192.168.5.55  node-1
.monitoring-es-6-2023.05.15     0 r STARTED 28065    14mb 192.168.5.56  node-2
.monitoring-es-6-2023.05.12     0 p STARTED 11490   5.6mb 192.168.5.55  node-1
.monitoring-es-6-2023.05.12     0 r STARTED 11490   5.6mb 192.168.5.100 node-3

p denotes a primary shard; the number before it is the shard number
r denotes a replica shard

5.6 Verify ES cluster failover

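The node marked with * is the master. Stop the service on 192.168.5.55 to verify that a new master is elected, then start 192.168.5.55 again and check that it rejoins as an ordinary candidate node (3184 below is the PID that ps reported on this host):
[root@localhost ~]# ps -ef | grep /usr/local/es/ela
[root@localhost ~]# kill -9 3184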

[es@localhost config]$ curl 192.168.5.56:9200/_cat/nodes
192.168.5.56  21 96 1 0.03 0.02 0.05 mdi - node-2
192.168.5.100 24 95 1 0.30 0.10 0.07 mdi * node-3
192.168.5.100 has become the master. Now start 192.168.5.55 again and verify that it rejoins as a candidate node
[root@localhost ~]# curl 192.168.5.55:9200/_cat/nodes
192.168.5.100 27 94 2 0.08 0.08 0.07 mdi * node-3
192.168.5.55  26 96 1 0.97 0.25 0.12 mdi - node-1
192.168.5.56  22 96 1 0.03 0.03 0.05 mdi - node-2
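
Rather than reading the listing by eye, the master can be picked out of the `_cat/nodes` output with a small filter. This sketch assumes the default column layout shown above, where the master marker is the second-to-last column and the node name (without spaces) is the last; `current_master` is a hypothetical helper name.

```shell
# Read _cat/nodes output on stdin and print the name of the node
# whose master column holds "*".
current_master() {
  awk '$(NF-1) == "*" { print $NF }'
}

# Example (requires a running cluster):
#   curl -s 192.168.5.56:9200/_cat/nodes | current_master
```

Running it before and after stopping a node gives a quick way to confirm that the election moved the master role.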

6. Configure Kibana to monitor the cluster

6.1 Modify the configuration file

See section 3 for the installation process

Edit the Kibana configuration file
[root@localhost ~]# vim /usr/local/kibana-6.8.20-linux-x86_64/config/kibana.yml
7 server.host: "0.0.0.0"  // address Kibana listens on
// Addresses and ports of the cluster nodes
28 elasticsearch.hosts: ["http://192.168.5.55:9200","http://192.168.5.56:9200","http://192.168.5.100:9200"]
[root@localhost ~]# cd /usr/local/kibana-6.8.20-linux-x86_64/
[root@localhost kibana-6.8.20-linux-x86_64]# ./bin/kibana &

6.2 Access

Kibana now monitors the entire cluster.

6.3 Verify Sharding

When all three nodes are healthy, the cluster shows 16 shards:
Node1: 5
Node2: 6
Node3: 5

Simulated fault:
Now stop node1. The five shards held by node1 should be reallocated to the two remaining nodes, keeping the total at 16 shards.

[root@localhost kibana-6.8.20-linux-x86_64]# netstat -tunlp
[root@localhost kibana-6.8.20-linux-x86_64]# kill -9 2036

Verification succeeded: node1 is down, and the two remaining nodes hold 8 shards each.
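
The per-node counts can also be obtained from the command line by tallying the `_cat/shards` output. A minimal sketch, assuming the default column layout (state in the fourth column, node name last) and node names without spaces; `shards_per_node` is a hypothetical helper name.

```shell
# Read _cat/shards output on stdin and print "node count" pairs,
# counting only shards in the STARTED state.
shards_per_node() {
  awk '$4 == "STARTED" { count[$NF]++ }
       END { for (n in count) print n, count[n] }' | sort
}

# Example (requires a running cluster):
#   curl -s 192.168.5.56:9200/_cat/shards | shards_per_node
```

Unassigned shards have no node column, which is why the filter only counts rows whose state is STARTED.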

Origin blog.csdn.net/qq_49530779/article/details/130620355