How ElasticSearch works

ElasticSearch, like Solr, is a highly reliable enterprise search engine built on top of Apache Lucene.

Several concepts in ElasticSearch correspond to concepts in relational databases. For example, a database corresponds to an index in ES, and a table corresponds to a type.

The full correspondence is shown in the following table.

A replica in ElasticSearch is a copy of a shard. Replicas have two advantages: (1) some query requests can be offloaded to them, and (2) if a shard in the cluster is lost, its data can be recovered from the replica. For this reason, a replica shard is never placed on the same node as the shard it copies. Each index in ES can be divided into multiple shards, and a shard does not have to have a replica; once a replica is created, the shard that serves as the source of replication is called the primary shard. The number of shards and the number of replicas can be specified when the index is created. The following figure is a schematic diagram of replicas and shards.
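
The placement rule can be illustrated with a small simulation. This is only a sketch: the round-robin strategy, the function name allocate, and the node names are assumptions made for illustration, not the allocation algorithm ES actually uses. The one property it shares with ES is that a primary and its replica never land on the same node.

```python
from collections import defaultdict


def allocate(num_primaries, num_replicas, nodes):
    """Place every copy of a shard on a different node (toy round-robin)."""
    if num_replicas + 1 > len(nodes):
        # Simplification: this sketch refuses to run with too few nodes.
        raise ValueError("need at least num_replicas + 1 nodes")
    layout = defaultdict(list)  # node name -> list of shard labels
    for shard in range(num_primaries):
        # Copies of one shard go to consecutive nodes, so they never collide.
        for copy in range(num_replicas + 1):
            node = nodes[(shard + copy) % len(nodes)]
            label = f"P{shard}" if copy == 0 else f"R{shard}"
            layout[node].append(label)
    return dict(layout)


layout = allocate(num_primaries=3, num_replicas=1, nodes=["node-a", "node-b"])
for node, shards in sorted(layout.items()):
    print(node, shards)
```

With two nodes, each node ends up holding a mix of primaries (P) and replicas (R), and losing either node still leaves one copy of every shard.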

You can change the number of replicas dynamically at any time after the index is created, but you cannot change the number of primary shards afterwards. By default, each index in Elasticsearch is created with 5 primary shards and 1 replica (this was the default before Elasticsearch 7.0, which lowered it to 1 primary shard). With at least two nodes in the cluster, such an index has 5 primary shards and 5 replica shards, for a total of 10 shards per index.
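
As a rough sketch of the arithmetic and of the settings involved: the two dictionaries below are the kind of JSON bodies that would be sent when creating an index (PUT /my_index) and when later updating its settings (PUT /my_index/_settings). The index name my_index and the helper total_shards are assumptions for illustration, and nothing is sent over the network here.

```python
import json


def total_shards(primaries, replicas):
    """Each primary has `replicas` copies, so total = primaries * (1 + replicas)."""
    return primaries * (1 + replicas)


# Body for creating an index: both values can be set at this point.
create_body = {"settings": {"number_of_shards": 5, "number_of_replicas": 1}}

# Body for a later settings update: only the replica count can be changed.
update_body = {"index": {"number_of_replicas": 2}}

print(json.dumps(create_body))
print(json.dumps(update_body))
print(total_shards(5, 1))
print(total_shards(5, 2))
```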

When an ES node starts, it discovers the other nodes in the cluster by broadcasting and establishes connections to them.

In a cluster, one node is elected as the master node, which is responsible for managing the cluster state. The master node is transparent to the user: the user does not need to know which node is the master, and any operation can be sent to any node. When necessary, any node can send subqueries to other nodes in parallel, merge the responses, and return the result to the user, all without involving the master node.

The master node reads the cluster state. While doing so, it checks the shards to determine which ones are primary shards and are available. After this step all primary shards are ready, but the replicas are not yet. The next step is to find the shards that have been replicated and use those copies as replicas. If all goes well, ES has started successfully and all shards and replicas are ready.
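
The two startup phases map onto the health colors that ES reports for a cluster: red while some primary shard is unavailable, yellow once all primaries are ready but some replicas are not, and green when everything is assigned. The function below is a minimal sketch of that rule; the dictionary layout and field names are assumptions for illustration, not an ES data structure.

```python
def cluster_health(shards):
    """Derive a health color from per-shard assignment state.

    shards: list of dicts with 'primary_assigned' (bool),
    'replicas_assigned' (int) and 'replicas_wanted' (int).
    """
    if not all(s["primary_assigned"] for s in shards):
        return "red"  # at least one primary is missing
    if any(s["replicas_assigned"] < s["replicas_wanted"] for s in shards):
        return "yellow"  # primaries ready, replicas still pending
    return "green"


after_step_one = [
    {"primary_assigned": True, "replicas_assigned": 0, "replicas_wanted": 1},
    {"primary_assigned": True, "replicas_assigned": 0, "replicas_wanted": 1},
]
print(cluster_health(after_step_one))
```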

While ES is running, the master node monitors whether all nodes are healthy. By default, a heartbeat is sent every 1s with a timeout of 30s, and the check is retried 3 times; after 3 failed attempts the node is considered to have left the cluster. When that happens, ES treats the node as failed, removes it from the cluster, and rebalances the shards across the remaining nodes.
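
The retry rule can be sketched as follows. The settings dictionary simply restates the defaults quoted above under illustrative key names, and the function assumes that "3 times" means three consecutive failed pings; both are assumptions for this sketch rather than ES internals.

```python
# Defaults as quoted in the text; key names are illustrative only.
FAULT_DETECTION = {"ping_interval_s": 1, "ping_timeout_s": 30, "ping_retries": 3}


def is_node_failed(ping_results, retries=FAULT_DETECTION["ping_retries"]):
    """Return True once `retries` consecutive pings have failed.

    ping_results: iterable of booleans, True meaning the ping was answered.
    """
    consecutive = 0
    for ok in ping_results:
        # A successful ping resets the failure counter.
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= retries:
            return True
    return False


print(is_node_failed([True, False, False, True, False]))
print(is_node_failed([True, False, False, False]))
```

The first call prints False because the failures are interrupted by a successful ping; the second prints True.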

ES queries data through the Query DSL (a JSON-based query language). Inside ES, each query is executed in two steps, scatter and gather. Scatter means sending the query to all relevant shards; gather means collecting the results from all of those shards, which are then merged, sorted, and processed before being returned to the client.
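
A toy version of scatter and gather is sketched below. The query dictionary has the shape of a Query DSL match query, but everything else is an assumption for illustration: the shards are plain in-memory dictionaries, the field name body is hypothetical, and the score is a simple term count rather than Lucene's relevance scoring.

```python
import heapq


def search_shard(shard_docs, term, size):
    """Scatter step: one shard returns its own top `size` hits as (score, doc_id)."""
    hits = [(text.lower().split().count(term), doc_id)
            for doc_id, text in shard_docs.items()]
    hits = [h for h in hits if h[0] > 0]  # drop non-matching documents
    return heapq.nlargest(size, hits)


def scatter_gather(shards, term, size):
    """Gather step: merge the per-shard hits and keep the global top `size`."""
    partial = [search_shard(docs, term, size) for docs in shards]
    merged = [hit for hits in partial for hit in hits]
    return heapq.nlargest(size, merged)


shards = [
    {"d1": "search engine search", "d2": "database"},
    {"d3": "search", "d4": "search search search"},
]
query = {"query": {"match": {"body": "search"}}, "size": 2}
term = query["query"]["match"]["body"]
top = scatter_gather(shards, term, query["size"])
print(top)
```

Each shard only needs to return its own top hits, because a document outside a shard's local top N cannot be in the global top N.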

ElasticSearch has four ways to index data. The easiest is to use the index API to send a document to a specific index, usually with a tool such as curl. The second and third are the bulk API and the UDP bulk API; they differ only in how the connection is made. The fourth is a river plugin: a river runs inside ElasticSearch and imports data from an external data source into ES. Note that a document is indexed on the primary shard first; the replicas are then updated from it.
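
For the bulk API, the request body is newline-delimited JSON: an action line followed by the document source, repeated for each document, with a trailing newline at the end. The sketch below only builds such a payload, which would then be sent with POST to /_bulk; the index name articles and the helper bulk_body are assumptions for illustration, and older ES versions also expected a _type entry in the action line.

```python
import json


def bulk_body(index, docs):
    """Build a bulk payload: one action line, then one source line, per document."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the payload must end with a newline


payload = bulk_body("articles", {
    "1": {"title": "How ElasticSearch works"},
    "2": {"title": "Shards and replicas"},
})
print(payload, end="")
```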
