The principles of Elasticsearch's distributed architecture

Briefly describe the distributed architecture principles of ES.

This is a common interview question.

Part One: basic concepts of ES

Here is a MySQL analogy (just an analogy, but it helps understanding): an ES index corresponds to a MySQL database, a type to a MySQL table, a document to a row, and a field to a column.
When an index is created, the number of primary shards is fixed and cannot be changed afterwards, while the number of replica shards can be changed later (which is what makes horizontal scaling possible).
An index stores its data across multiple primary shards, each primary shard holding only part of the index's data, and each replica shard is a copy of a primary shard. Why have multiple primary and replica shards at all? Mainly to increase the cluster's throughput and to provide high availability.
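As a rough illustration of these settings, here is a minimal sketch using the REST API from Python's requests library; it assumes a local, unauthenticated node at http://localhost:9200, and the index name article_index is made up for the example.

```python
import requests

ES = "http://localhost:9200"   # assumption: local single-node cluster, no auth

# Create an index with 3 primary shards and 1 replica per primary.
# The primary-shard count is fixed once the index exists.
resp = requests.put(
    f"{ES}/article_index",
    json={
        "settings": {
            "number_of_shards": 3,      # primary shards, cannot be changed later
            "number_of_replicas": 1,    # replicas per primary, can be changed later
        }
    },
)
print(resp.json())

# Later, scale the replica count up or down without recreating the index.
resp = requests.put(
    f"{ES}/article_index/_settings",
    json={"index": {"number_of_replicas": 2}},
)
print(resp.json())
```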
Inverted index
In one sentence: an inverted index lets you go from an attribute value to the records that contain it, instead of going from a record to its attribute values.
An analogy with three documents:
1. Java is the best language
2. PHP is the worst language
3. Python is the best language to learn
(just an example, no offense intended)
After tokenization, the inverted index looks roughly like this:

Keyword      Document IDs
java         1
php          2
python       3
is           1, 2, 3
the          1, 2, 3
best         1, 3
worst        2
language     1, 2, 3
learn        3

Searching for a keyword now directly yields the documents that contain it; that is what an inverted index is.
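To make the idea concrete, here is a toy sketch of building such an index in Python; real ES/Lucene indexes involve analyzers, term dictionaries, and compressed postings lists, so this only illustrates the keyword-to-document-IDs mapping.

```python
from collections import defaultdict

# Toy inverted index: map each token to the set of document IDs containing it.
docs = {
    1: "Java is the best language",
    2: "PHP is the worst language",
    3: "Python is the best language to learn",
}

inverted = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.lower().split():   # naive whitespace tokenizer
        inverted[token].add(doc_id)

print(sorted(inverted["language"]))   # [1, 2, 3]
print(sorted(inverted["best"]))       # [1, 3]
```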

Part Two: how ES implements distribution

Put plainly, ES's distribution means starting multiple ES processes on multiple machines; together these processes form an ES cluster.
(Cluster diagram: nodes holding primary and replica shards)
Note the detail in the diagram above: a shard's primary and its replica are never placed on the same node (unless you only have a single machine); this allocation is handled automatically by the cluster, you do not control it.
The nodes in a cluster automatically elect a master node, which is responsible for maintaining index metadata, switching shards between primary and replica roles, and so on. If the master goes down, the cluster elects a new master.
If some node goes down, aren't the primary shards on it lost? The master promotes the corresponding replica shards to primaries so the cluster stays available; once the failed node recovers, its shards rejoin as replicas. That also explains why a primary and its replica cannot live on the same node, right? Likewise, if it is the master that goes down, the newly elected master performs this primary/replica switch.
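If you want to see this allocation for yourself, here is a quick sketch against the REST API (again assuming a local unauthenticated node and the made-up article_index):

```python
import requests

ES = "http://localhost:9200"   # assumption: local cluster, no auth

# Cluster health: green = all primaries and replicas allocated,
# yellow = some replicas unassigned (typical on a single node),
# red = at least one primary shard is missing.
print(requests.get(f"{ES}/_cluster/health").json())

# Per-shard view: shows which node holds each primary (p) and replica (r),
# so you can verify that a primary and its replica never share a node.
print(requests.get(f"{ES}/_cat/shards/article_index?v").text)
```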
Let's also walk through the write and read paths.
Write:
1. The client sends the write request to any node; that node acts as the coordinating node.
2. The coordinating node routes the request by the document id (a hash of the id) and forwards it to the node holding the corresponding primary shard.
3. The primary shard on that node executes the write, then forwards the request to each of its replica shards; once all replicas have succeeded, it reports back to the coordinating node, which responds to the client.
Of course this behaviour can be tuned with parameters, for example acknowledging the client as soon as the primary shard succeeds, but that is rarely necessary because ES is already very fast.
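A minimal write through the REST API might look like the sketch below (ES 7+ style endpoint; host, index, and field names are assumptions for the example):

```python
import requests

ES = "http://localhost:9200"   # assumption: local cluster, no auth

# Index (write) a document with id=1. The coordinating node routes it,
# roughly shard = hash(id) % number_of_primary_shards, to the primary shard,
# which replicates the write to its replicas before acknowledging.
doc = {"title": "inverted index", "body": "how ES finds documents by keyword"}
resp = requests.put(f"{ES}/article_index/_doc/1", json=doc)
print(resp.json()["result"])   # "created" on the first write, "updated" after that
```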
Read (get by document id):
1. The client sends the request to any node; that node acts as the coordinating node.
2. The coordinating node computes the route from the document id and forwards the request to the appropriate node.
3. The request is load-balanced across that shard's copies (primary and replicas, round-robin); the result is returned to the coordinating node and then to the client.
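A corresponding read by id, under the same assumptions:

```python
import requests

ES = "http://localhost:9200"   # assumption: local cluster, no auth

# Fetch by document id: served by one copy (primary or replica) of the
# shard that owns id=1, picked round-robin by the coordinating node.
resp = requests.get(f"{ES}/article_index/_doc/1")
print(resp.json()["_source"])
```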
Read (search):
1. The client sends the search request to any node; that node acts as the coordinating node.
2. The coordinating node forwards the search to every shard of the index, hitting either the primary or a replica copy of each.
3. Each shard runs the search locally and returns the matching document ids to the coordinating node, which merges, sorts, and paginates them.
4. Finally, the coordinating node fetches the actual documents by id from the relevant nodes and returns them to the client.
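And a simple full-text search under the same assumptions, which triggers exactly this fan-out, merge, and fetch sequence:

```python
import requests

ES = "http://localhost:9200"   # assumption: local cluster, no auth

# Search: fanned out to every shard, per-shard hits are merged and sorted
# by the coordinating node, then the matching documents are fetched by id.
query = {"query": {"match": {"body": "inverted index"}}, "from": 0, "size": 10}
resp = requests.post(f"{ES}/article_index/_search", json=query)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_id"], hit["_score"], hit["_source"]["title"])
```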
