Why use Elasticsearch

Concept

Elasticsearch, or ES for short, is a distributed full-text search engine.
For example, GitHub's code search uses ES, and Baidu also uses ES.
Because Lucene's API is relatively complex, ES wraps Lucene in a simpler layer and exposes a relatively simple set of APIs.
See the ES official website for details.

Use cases

Mainly massive data retrieval. The keyword is massive, because the advantages of ES only show at large data volumes.

The following are the scenarios I can think of or have encountered before:

1. The classic ELK combination (Elasticsearch + Logstash + Kibana) for log analysis and processing

Elasticsearch: provides distributed storage and full-text search of logs
Logstash: ships the logs
Kibana: front-end display

2. Storage and search of massive data in a project

For example, if you are building a backend service similar to Weibo, ES is worth considering for the search part alone.

3. Pre-caching of relational data in a project

If your project uses a relational database with the following characteristics, you can consider using ES as a cache layer

  • Large amount of data, but the data structure is not too complicated
  • Heavy query load, with diverse and changing query conditions

For example, on an e-commerce website, the order-query interface used by reviewers and customer service staff must respond quickly and be able to search all of the hundreds of millions of historical orders, not necessarily by order number. There may be many query conditions, including fuzzy matching. With only a relational database this would be painful, but with ES it is very simple.
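
To make the "many optional conditions plus fuzzy matching" case concrete, here is a minimal sketch that assembles an ES bool query from whichever conditions the caller supplies. The field names (product_name, status, created_at) and the function name are hypothetical, not from any real schema, and the code only builds the request body; it does not contact a server.

```python
import json


def build_order_query(keyword=None, status=None, created_from=None, created_to=None):
    """Build a bool query body from whichever conditions are present.

    All field names here are hypothetical placeholders for an order index.
    """
    must = []      # scored, full-text conditions
    filters = []   # exact, non-scored conditions
    if keyword:
        # Full-text match that tolerates small typos.
        must.append({"match": {"product_name": {"query": keyword, "fuzziness": "AUTO"}}})
    if status:
        filters.append({"term": {"status": status}})
    if created_from or created_to:
        bounds = {}
        if created_from:
            bounds["gte"] = created_from
        if created_to:
            bounds["lte"] = created_to
        filters.append({"range": {"created_at": bounds}})
    if not must and not filters:
        # No conditions at all: fall back to matching every document.
        return {"query": {"match_all": {}}}
    return {"query": {"bool": {"must": must, "filter": filters}}}


if __name__ == "__main__":
    print(json.dumps(build_order_query(keyword="iphnoe", status="PAID"), indent=2))
```

Each new search condition becomes one more clause in the list, which is why changing query conditions are easier to handle here than with hand-written SQL.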

ES advantages

Supports horizontal scaling
Supports many plug-ins
That said, the core strength in full-text search really belongs to Lucene, so if you want to dig deeper, it is recommended to learn Lucene first, for example how Lucene stores data and indexes.
All in all, if ES is a sports car, then Lucene is its engine and core. ES is what makes that engine usable in practice.

A similar framework: Solr

GitHub used Solr for full-text search before replacing it with ES,
which is often taken as the sign that ES had overtaken Solr.
Since then I have rarely heard of projects still using Solr; most use ES.

Installation

Install a single-instance ES

First of all, ES has had versions 1.x, 2.x, 5.x, 6.x and 7.x.
Use a recent one, and try to avoid anything before 5.x.

https://www.elastic.co/cn/downloads/elasticsearch
Download and install ES from the official website (requires a JDK environment, e.g. 1.8).
After starting it, visit localhost:9200 in a browser (9200 is the default ES HTTP port).
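
Instead of a browser, the same check can be scripted. The sketch below assumes ES is listening on its default HTTP port 9200 on localhost; the function names are my own, and describe_node only reads fields that the root endpoint normally returns (name, cluster_name, version.number).

```python
import json
from urllib.request import urlopen


def fetch_root(base_url="http://localhost:9200"):
    """GET the root endpoint of a running node and return the parsed JSON."""
    with urlopen(base_url, timeout=5) as resp:
        return json.load(resp)


def describe_node(root):
    """Summarize the node name, cluster name and version from the root response."""
    version = root.get("version", {}).get("number", "unknown")
    return f"node={root.get('name')} cluster={root.get('cluster_name')} version={version}"
```

With a node running, print(describe_node(fetch_root())) shows which node answered. If the call raises a connection error, ES has not started or is bound to another port.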

Install a visual page: the Head plugin

Download: https://github.com/mobz/elasticsearch-head
Requires a Node.js environment, e.g. 8.2

npm install
npm run start

There are other visualization plug-ins. Most only need to be downloaded, installed and pointed at the ES URL; please search for them yourself.

Modify the ES configuration to solve the cross-origin problem with head

In the ES config/elasticsearch.yml file,
add the following at the end of the file

http.cors.enabled: true
http.cors.allow-origin: "*"

Start ES and head separately,
then visit the head page at localhost:9100

Install a distributed ES cluster

1 master node + 2 slave nodes

1. Master node configuration
In the master node's config/elasticsearch.yml file, add the
following at the end of the file to indicate that this service is the master node

# Cluster name
cluster.name: zhangsan
# Node name is master
node.name: master
# Mark the current node as master
node.master: true
# Bind the local IP
network.host: 127.0.0.1

Start master

2. Slave node configuration
Copy the installation package to use as a slave.
In the slave's config/elasticsearch.yml file,
add the following at the end of the file to indicate that this service is a slave node

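# Cluster name; this must match the master node's cluster name
cluster.name: zhangsan
# Node name is slave01; additional nodes are slave02, slave03...
node.name: slave01

# Bind the local IP
network.host: 127.0.0.1
# HTTP port for the slave; the ES default is 9200, which we leave to master
http.port: 9201
# Configure this so the slave can find the master
# (on 7.x and later the equivalent setting is discovery.seed_hosts)
discovery.zen.ping.unicast.hosts: ["127.0.0.1"]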

Start slave01, slave02
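
Once all three nodes are up, it is worth confirming that they actually joined the same cluster. The sketch below assumes the master answers HTTP on 127.0.0.1:9200 as configured above and uses the _cluster/health endpoint; the function names and the "green or yellow counts as ready" rule are my own choices for illustration.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url="http://127.0.0.1:9200"):
    """Ask one node for the cluster health document."""
    with urlopen(base_url + "/_cluster/health", timeout=5) as resp:
        return json.load(resp)


def cluster_ready(health, expected_nodes=3):
    """True when every expected node has joined and no primary shard is missing.

    status is "green", "yellow" (replicas unassigned) or "red" (primaries missing).
    """
    joined = health.get("number_of_nodes", 0) >= expected_nodes
    return joined and health.get("status") in ("green", "yellow")
```

If cluster_ready(fetch_health()) stays False, the usual suspects are a mismatched cluster name or a slave that cannot reach the master address in its discovery setting.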

Some basic concepts of ES

Index

A collection of documents with the same attributes,
equivalent to a MySQL database

Type

An index can define one or more types, and a document must belong to a type.
Equivalent to a MySQL table: a database can hold one or more tables, and a row must belong to a table.
Note that from 6.x an index can contain only one type, and types are deprecated in 7.x.

Document

A document is the basic unit of data that can be indexed,
equivalent to a row in MySQL

Shard

Each index has multiple shards, and each shard is a Lucene index.
When creating an index, 5 shards are created by default (this was the default before 7.x; from 7.0 the default is 1).
The number of shards must be specified when creating the index and cannot be modified afterwards.
Equivalent to MySQL data sharding.

Replica

A replica is a copy of a shard and serves as that shard's backup.
When creating an index, one replica is created by default.
The number of replicas can be modified later.
Equivalent to MySQL data backup.
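
The difference between the two settings shows up in the REST calls. The sketch below only builds the HTTP requests (nothing is sent): one creates an index with an explicit shard and replica count, the other changes the replica count of an existing index. The base URL, the index name "orders" and the function names are assumptions for illustration.

```python
import json
from urllib.request import Request

JSON_HEADERS = {"Content-Type": "application/json"}


def create_index_request(base_url, index, shards=5, replicas=1):
    """PUT /<index> with the shard count fixed at creation time."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return Request(f"{base_url}/{index}", data=json.dumps(body).encode("utf-8"),
                   headers=JSON_HEADERS, method="PUT")


def update_replicas_request(base_url, index, replicas):
    """PUT /<index>/_settings to change the replica count of an existing index."""
    body = {"index": {"number_of_replicas": replicas}}
    return Request(f"{base_url}/{index}/_settings", data=json.dumps(body).encode("utf-8"),
                   headers=JSON_HEADERS, method="PUT")


if __name__ == "__main__":
    req = create_index_request("http://127.0.0.1:9200", "orders", shards=3, replicas=1)
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing either request to urllib.request.urlopen would execute it against a running cluster. There is deliberately no matching helper for changing number_of_shards, since that value is fixed once the index exists.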

Integrating with spring-boot

ES provides a set of RESTful APIs.
With spring-boot integration, the code likewise builds the parameters for each API call and then executes it.
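
To show what "build the parameters, then execute" amounts to, here is a language-neutral sketch in Python rather than spring-boot. It is not the spring-boot client; it just builds the raw REST requests such a client would send. The base URL, index name and field names are assumptions, and the URLs use the typeless _doc endpoint of recent ES versions.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default port
JSON_HEADERS = {"Content-Type": "application/json"}


def index_document_request(index, doc_id, document, base_url=BASE_URL):
    """PUT /<index>/_doc/<id> stores (or overwrites) one document."""
    return Request(f"{base_url}/{index}/_doc/{doc_id}",
                   data=json.dumps(document).encode("utf-8"),
                   headers=JSON_HEADERS, method="PUT")


def search_request(index, field, text, base_url=BASE_URL):
    """POST /<index>/_search with a simple full-text match query."""
    body = {"query": {"match": {field: text}}}
    return Request(f"{base_url}/{index}/_search",
                   data=json.dumps(body).encode("utf-8"),
                   headers=JSON_HEADERS, method="POST")


def send(request):
    """Execute a prepared request against a running node and parse the JSON reply."""
    with urlopen(request, timeout=5) as resp:
        return json.load(resp)
```

For example, send(index_document_request("blog", 1, {"title": "Why use Elasticsearch"})) followed by send(search_request("blog", "title", "elasticsearch")) would store and then find a document, provided a node is running.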

There is also the matter of installing analyzers (tokenizers) and the like, which is really Lucene knowledge.

Final words

By following online tutorials you can set up an ES service quickly, and integrating it with a project is not difficult.

The hard part is understanding why ES can do full-text search. Lucene cannot be avoided here, and learning the Lucene index is the top priority.
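
As a first intuition for that index, here is a toy inverted index: a map from each term to the set of documents containing it. This is only a sketch of the idea under heavy simplification (a naive lowercase tokenizer, no scoring, no on-disk format). It is not how Lucene is implemented, and all names are my own.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Very naive analyzer: lowercase, then split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term of the query (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        1: "Elasticsearch wraps Lucene",
        2: "Lucene builds an inverted index",
        3: "Solr also uses Lucene",
    }
    idx = build_inverted_index(docs)
    print(sorted(search(idx, "Lucene index")))  # prints [2]
```

A lookup touches only the sets for the query terms instead of scanning every document, which is the basic reason this structure scales to large data volumes.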

For application work, the point is to know what ES can do, what it is good at, and when to use it.

My summary: full-text search over massive data.

Recommended reading: the book "From Lucene to Elasticsearch: Full-Text Retrieval in Practice"

Origin blog.csdn.net/java_zhangshuai/article/details/105039738