Elasticsearch cluster principles, installation, and basic use

Elasticsearch (ES) is an open-source, distributed, RESTful full-text search engine built on Lucene. ES is also a distributed document database: every field is indexed and searchable, and a cluster can scale out to hundreds of servers to store and process petabytes of data. It can store, search, and analyze large amounts of data in a short time, and is typically used as the core engine in scenarios with complex search requirements.
ES is designed from the ground up for high availability and scalability. Scaling can be done vertically, by buying a more powerful server, or horizontally, by adding more servers to the cluster.

Elasticsearch features

  • Horizontal scalability: just add a server, do a little configuration, and start ES, and the new node is incorporated into the cluster.
  • Sharding gives better distribution: a single index is divided into multiple shards, similar to the block mechanism in HDFS; as with MapReduce's divide-and-conquer approach, this improves processing efficiency.
  • High availability: a replication (replica) mechanism allows each shard to have multiple replicas, so when a server goes down the cluster keeps running as usual, and the data lost with that server is restored from the replicas on other available nodes.

Elasticsearch application scenarios

Large-scale distributed log analysis with the ELK stack: Elasticsearch (stores logs) + Logstash (collects logs) + Kibana (displays data)
Search systems for e-commerce platforms, online-disk search engines, and so on.

Elasticsearch storage structure

Elasticsearch is a document-oriented database: a piece of data is a document, and JSON is used as the document serialization format. For example, the following user record:

{
    "user":"zfl",
    "sex":"0",
    "age":"23"
}

Relational database: Database -> Table -> Row -> Column
Elasticsearch:       Index -> Type -> Document -> Field

Install ES in Linux environment

  • Configure the JDK environment variables (in /etc/profile):
export JAVA_HOME=/usr/local/jdk1.8.0_181
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

source /etc/profile

  • Download the Elasticsearch installation package
    Official download page: https://www.elastic.co/downloads/elasticsearch

  • Upload the Elasticsearch installation package

  • Decompress Elasticsearch
    The directory structure is as follows:

    bin: script files, including the ES startup script and the plugin installer
    config: elasticsearch.yml (ES configuration), jvm.options (JVM configuration), log configuration files, etc.
    jdk: the bundled JDK
    lib: class libraries
    logs: log files
    modules: all ES modules, including X-Pack, etc.
    plugins: installed ES plugins (none by default)
    data: where document data is stored; created at the same level as logs by default when ES starts. The location can be configured in elasticsearch.yml
    
  • Modify elasticsearch.yml

  • Start Elasticsearch
    An error is reported: ES is not allowed to run directly as the root user.

    To solve it, create a dedicated user:

# create a group
groupadd lnhg
# create a user and add it to the group
useradd lnhu -g lnhg -p 123456
# grant ownership of the ES installation directory
chown -R lnhu:lnhg <es installation directory>
# switch to the new user
su lnhu

Starting again reports another error:

Edit /etc/sysctl.conf and add the following parameters:

vi /etc/sysctl.conf
vm.max_map_count=655360
sysctl -p

Starting still reports an error and ES crashes:
Edit /etc/security/limits.conf and add the following parameters:

vi /etc/security/limits.conf

* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096


The standalone setup is now complete. Log in again as the new user (or reboot) so that the new limits take effect, then start the service again.

  • Visit Elasticsearch
    First close the firewall: systemctl stop firewalld.service
    Then open http://192.168.15.130:9200
  • Port difference (9200 vs 9300)
    Port 9200: used for communication between ES nodes and external clients
    Port 9300: used for communication between ES nodes

kibana environment installation (windows installation)

  • Download the kibana-6.2.4-windows-x86_64 installation package and unzip it.
    Modify config/kibana.yml, changing the defaults as follows:
    server.port: 5601 (default)
    server.host: the local IP (default is localhost)
    elasticsearch.url: the ES address

  • Start Kibana by double-clicking kibana.bat in the bin directory.

  • Visit
    http://ip address:5601

Create, read, update, and delete with Kibana

  • Create index
    PUT /lnh

  • Query index
    GET /lnh

  • Add a document (PUT /index name/type/id)

 PUT /lnh/user/1
 {
	"user":"lnh",
	"sex":"0",
	"age":"23"
 }


  • Query document
    GET /lnh/user/1

  • Delete the index, then query it again
    DELETE /lnh
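To make the index/type/id addressing above concrete, here is a tiny in-memory sketch in Python that mimics the PUT/GET/DELETE semantics. It is purely illustrative (the ToyStore class is invented for this example), not the real Elasticsearch API:

```python
# Toy in-memory store mimicking Elasticsearch's index/type/id addressing.
# Purely illustrative -- not the real ES storage engine or API.
class ToyStore:
    def __init__(self):
        self.indices = {}

    def put(self, index, doc_type, doc_id, body):
        # PUT /index/type/id -- create or overwrite a document
        self.indices.setdefault(index, {}).setdefault(doc_type, {})[doc_id] = body

    def get(self, index, doc_type, doc_id):
        # GET /index/type/id -- fetch one document, or None if missing
        return self.indices.get(index, {}).get(doc_type, {}).get(doc_id)

    def delete_index(self, index):
        # DELETE /index -- drop the index and everything under it
        self.indices.pop(index, None)

store = ToyStore()
store.put("lnh", "user", "1", {"user": "lnh", "sex": "0", "age": "23"})
print(store.get("lnh", "user", "1"))  # the stored document
store.delete_index("lnh")
print(store.get("lnh", "user", "1"))  # None after the index is deleted
```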

Calling the RESTful API to query documents

  • Query a specific document
    http://192.168.15.130:9200/lnh/user/1
  • Query all documents of a given type

http://192.168.15.130:9200/lnh/user/_search

Advanced Search

  • Query by id
    GET /lnh/user/1
  • Query all documents of the current type
    GET /lnh/user/_search
  • Batch query by multiple ids
    Query the documents whose ids are 1 and 2:

    GET /lnh/user/_mget
    {
        "ids":["1","2"]
    }
  • Complex condition queries
    Query documents with age 23:
    GET /lnh/user/_search?q=age:23
    Query documents with age between 20 and 30:
    GET /lnh/user/_search?q=age:[20 TO 30]
    Note: TO must be uppercase.
    The same range query, returning 1 result starting from offset 0:
    GET /lnh/user/_search?q=age:[20 TO 30]&from=0&size=1
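As a small illustration of how these query-string searches are assembled, here is a hypothetical Python helper (search_url is invented for this example) that builds the `_search?q=` URLs used above, URL-encoding the query:

```python
from urllib.parse import quote

# Hypothetical helper that assembles query-string search URLs like the ones
# above. The host, index, and type names are just the examples from this
# article; this does not send any request.
def search_url(host, index, doc_type, q, frm=None, size=None):
    url = f"http://{host}:9200/{index}/{doc_type}/_search?q={quote(q)}"
    if frm is not None:
        url += f"&from={frm}"      # pagination offset
    if size is not None:
        url += f"&size={size}"     # number of hits to return
    return url

print(search_url("192.168.15.130", "lnh", "user", "age:[20 TO 30]", frm=0, size=1))
```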

DSL language query and filtering

What is DSL language

Elasticsearch supports two ways of querying: the simplified query-string style shown above, and a complete JSON request body, known as the structured query DSL.

Example

  • Exact query by name

 GET /lnh/user/_search
 {
	"query":{
		"term":{
			"user":"zfl"
		}
	}
 }

Note: term performs a complete match: no analyzer is applied, and the whole search term must be contained in the document. It is used for exact searches.

  • Fuzzy query by name

 GET /lnh/user/_search
 {
	"from":0,
	"size":2,
	"query":{
		"match":{
			"user":"partner"
		}
	}
 }

Note: match segments the search keywords first and then matches on the resulting tokens; it is generally used for fuzzy queries.

The difference between term and match

  • A term query does not analyze the search value; it matches exactly.
  • A match query analyzes the search value with the field's analyzer and searches on the resulting tokens.
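The difference can be sketched in a few lines of Python. This is a conceptual toy (plain whitespace splitting stands in for a real analyzer), not Lucene's actual matching logic:

```python
# Conceptual sketch of term vs match semantics -- not the real Lucene logic.
# term: the stored value must equal the query exactly, no analysis.
# match: the query is tokenized first, and any overlapping token matches.
def term_query(field_value, query):
    return field_value == query

def match_query(field_value, query, tokenize=str.split):
    doc_tokens = set(tokenize(field_value))
    return any(tok in doc_tokens for tok in tokenize(query))

doc = "quick brown fox"
print(term_query(doc, "quick"))        # False: not an exact whole-value match
print(match_query(doc, "quick wolf"))  # True: the token "quick" overlaps
```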

Use filter to filter by age

GET /lnh/user/_search
{
	"query":{
		"bool":{
			"must":[
				{
					"match_all":{}
				}
			],
			"filter":{
				"range":{
					"age":{
						"gte":20,
						"lte":30
					}
				}
			}
		}
	},
	"from":0,
	"size":10,
	"_source":["user","age"]
}

Note: _source specifies which fields to return.

Tokenizer

What is a tokenizer

The default standard tokenizer in Elasticsearch is not friendly to Chinese: it splits Chinese text into individual characters. The Chinese IK tokenizer plugin is therefore introduced.

  • Default standard tokenizer

Install ik tokenizer

Download address: https://github.com/medcl/elasticsearch-analysis-ik/releases
Note: the IK tokenizer plugin version must match the ES version.

  • File directory after decompression
  • Create a new ik directory under the plugins directory under the es installation directory
  • Copy the decompressed file to the plugins/ik directory
  • Restart ES
    The plugin provides two analyzers:
    • ik_max_word: splits the text at the finest granularity; for example, "National Anthem of the People's Republic of China" is split into "People's Republic of China", "Chinese people", "China", "Chinese", "People's Republic", "people", "republic", "national anthem", and so on, exhausting all possible combinations.
    • ik_smart: performs the coarsest split; for example, "National Anthem of the People's Republic of China" is kept as "The National Anthem of the People's Republic of China".

Custom extended dictionary

  • Under the plugins/ik/config directory of the ES installation, create a new custom directory and a new_word.dic file inside it
  • Add custom words to the new_word.dic file
  • Modify the config/IKAnalyzer.cfg.xml file, filling in the location of the custom extension dictionary

Document mapping

Introduction

Earlier, Elasticsearch's core concepts were compared with those of a relational database: an index corresponds to a database, a type to a table, and a mapping to the table schema. A mapping in Elasticsearch defines a document: the fields it contains, their types, tokenizers, and other attributes.
Document mapping means specifying the field types and tokenizers for the fields of a document.
View a mapping with GET /lnh2/stu/_mapping

Classification of mapping

  • Dynamic mapping
    In a relational database, you must create the database, then the tables under it, and only then insert data. In Elasticsearch there is no need to define the mapping in advance: when a document is written, the field types are identified automatically. This mechanism is called dynamic mapping.
  • Static mapping
    In Elasticsearch, the mapping can also be defined in advance, including the document's fields and their types. This is called static mapping.
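A toy version of dynamic mapping can make the idea concrete: given a JSON document, infer a field type for each value. The type choices below loosely follow Elasticsearch's defaults (boolean, long, float, text), but the infer_mapping function itself is invented for illustration:

```python
# Toy dynamic mapping: infer a field type from each value in a document,
# loosely following Elasticsearch's defaults. Simplified for illustration
# (real ES also handles dates, nested objects, keyword sub-fields, etc.).
def infer_mapping(doc):
    def infer(value):
        if isinstance(value, bool):   # check bool before int: bool subclasses int
            return "boolean"
        if isinstance(value, int):
            return "long"
        if isinstance(value, float):
            return "float"
        return "text"
    return {field: {"type": infer(value)} for field, value in doc.items()}

print(infer_mapping({"user": "zfl", "age": 23, "active": True}))
# {'user': {'type': 'text'}, 'age': {'type': 'long'}, 'active': {'type': 'boolean'}}
```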

Basic data type

  • String
    The string type includes text and keyword.
    text: used to index long text. The text is analyzed into terms before indexing, so those terms can be searched; text fields cannot be used for sorting or aggregation.
    keyword: not analyzed; can be used for search, filtering, sorting, and aggregation. A keyword field can only be matched by its exact value (no fuzzy search on analyzed terms).

  • Numerical
    long, integer, short, byte, double, float

  • Date type
    date

  • Boolean
    boolean

  • Binary type
    binary

  • Array types
    Array datatype

Example

  • Create an index and specify the field types
	PUT /lnh2
	POST /lnh2/_mapping/stu
	{
		"properties":{
			"age":{
				"type":"integer"
			},
			"sex":{
				"type":"integer"
			},
			"name":{
				"type":"text",
				"analyzer":"ik_smart",
				"search_analyzer":"ik_smart"
			},
			"home":{
				"type":"keyword"
			}
		}
	}

  • Get the mapping information of a given type
  • Get index information
    http://192.168.15.130:9200/lnh2/_settings

Elasticsearch cluster management

  • How does ES handle high concurrency?
    ES is a distributed full-text search framework that hides its complex processing mechanisms, including sharding, cluster discovery, shard load balancing, and request routing.
  • Basic concepts
 - cluster
A cluster consists of many nodes, one of which is the master node, chosen by election; the master/slave distinction only exists inside the cluster. A key idea of ES is decentralization: seen from the outside there is no central node, because the cluster is logically a single whole, and communicating with any one node is equivalent to communicating with the entire cluster.
 - shards
Shards are slices of an index. ES can split a complete index into multiple shards and distribute them over different nodes, enabling distributed search. The number of shards can only be specified before the index is created and cannot be changed afterwards.
 - replicas
Replicas are copies of index shards. They serve two purposes: first, fault tolerance, since a damaged or lost shard on one node can be recovered from a replica; second, query efficiency, since ES automatically load-balances search requests across replicas.
 - recovery
Recovery means data redistribution: when a node joins or leaves, ES rebalances the index shards according to machine load, and a node that went down also recovers its data when it restarts.
  • Core principle analysis
1. Each index is split into multiple shards for storage; by default an index is created with five shards, distributed across different nodes. These are the primary shards. Once defined, the number of primary shards cannot be changed, because at query time the shard holding a document is derived from its id.
Routing algorithm: shard = hash(routing) % number_of_primary_shards.
2. For high availability, each primary shard has its own replica shards. A replica shard must not be stored on the same node as its primary; a primary shard may share a node with replica shards of other primaries.

Example: 3 primaries with 1 replica each gives 3 × 2 = 6 shards in total;
         3 primaries with 2 replicas each gives 3 × 3 = 9 shards in total.
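The routing formula above can be sketched in Python. Real Elasticsearch uses a murmur3 hash of the routing value (the document id by default); here zlib.crc32 stands in as the hash function. This illustrates why the primary shard count cannot change after index creation: the same id would suddenly route to a different shard.

```python
import zlib

# Sketch of the routing formula:
#   shard = hash(routing) % number_of_primary_shards
# zlib.crc32 stands in for ES's actual murmur3 hash.
def route(doc_id, num_primary_shards):
    return zlib.crc32(doc_id.encode()) % num_primary_shards

num_primaries = 5  # the default number of shards per index mentioned above
print(route("1", num_primaries))  # the same id always lands on the same shard

# Replica arithmetic from the text: total shards = primaries * (1 + replicas)
assert 3 * (1 + 1) == 6
assert 3 * (1 + 2) == 9
```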

Es cluster environment construction

  • Prepare a three-server cluster

    Server name   IP address
    node-0        192.168.15.130
    node-1        192.168.15.131
    node-2        192.168.15.132
    
  • Modify the configuration on each server

    vi elasticsearch.yml
    cluster.name: lnh  ### the cluster name must be identical on all three servers
    node.name: node-0  #### unique per node; the other two are node-1 and node-2
    network.host: 192.168.15.130  #### the server's actual IP address
    discovery.zen.ping.unicast.hosts: ["192.168.15.130", "192.168.15.131","192.168.15.132"]  ## the IPs of all cluster nodes
    discovery.zen.minimum_master_nodes: 1
    
  • Start
    Start ES on each server; close the firewall first: systemctl stop firewalld.service

  • Verify
    Visit http://192.168.15.130:9200/_cat/nodes?pretty
    The * marks the current master node.

Note: if the machines are clones, first delete the data directory on each machine. By default the data directory sits in the installation directory; it is created when ES starts, and its location can be configured.
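One caveat on the configuration shown above: discovery.zen.minimum_master_nodes is normally set to a quorum of master-eligible nodes, (N / 2) + 1, to avoid split-brain; for a three-node cluster that quorum is 2, so the value 1 used here is risky for production. The arithmetic:

```python
# Quorum of master-eligible nodes for discovery.zen.minimum_master_nodes:
# (N / 2) + 1, using integer division.
def quorum(master_eligible_nodes):
    return master_eligible_nodes // 2 + 1

print(quorum(3))  # 2 -- the safe value for this three-node cluster
```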

This material was gathered in the Ant Classroom; having reviewed it, I can only say that a good memory is no match for taking notes.

Origin: blog.csdn.net/qq_37640410/article/details/108723379