How an Elasticsearch cluster works

Routing

When you index a document, it is stored on a single primary shard. How does Elasticsearch know which shard a document belongs to? When you create a new document, how does it know whether to store it on shard 1 or shard 2?
  The process cannot be random, because we need to be able to retrieve the document later.
 
The shard is determined by the following formula:
  shard = hash(routing) % number_of_primary_shards
  The routing value is an arbitrary string. It defaults to the document's _id, but it can also be set to a custom value.
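The formula above can be sketched in a few lines of Python. This is only an illustration: zlib.crc32 is a stand-in for whatever hash function Elasticsearch uses internally, and pick_shard is a hypothetical helper name.

```python
import zlib

def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Map a routing value to a primary shard number."""
    # CRC32 is an illustrative stand-in, not Elasticsearch's own hash.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards

# By default the routing value is the document's _id.
for doc_id in ["1", "2", "abc"]:
    print(doc_id, "->", pick_shard(doc_id, 3))
```

The same routing value always yields the same shard number, which is what makes later retrieval possible.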
 
Why can the number of primary shards only be set when an index is created, and never changed afterwards?
  If the number of primary shards changed later, all previous routing results would become invalid, because the formula would map the same routing value to a different shard, and existing documents could no longer be found.
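To see the effect, the following sketch counts how many documents would map to a different shard if an index went from 3 to 4 primary shards. It assumes CRC32 as a stand-in hash and uses made-up document IDs, so the exact count is not meaningful, only the fact that it is not zero.

```python
import zlib

def pick_shard(routing, number_of_primary_shards):
    # Stand-in hash; Elasticsearch uses its own function.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards

doc_ids = [str(i) for i in range(1000)]  # hypothetical _id values

# Documents stored with 3 shards but looked up as if there were 4.
moved = [d for d in doc_ids if pick_shard(d, 3) != pick_shard(d, 4)]
print(f"{len(moved)} of {len(doc_ids)} documents would be looked up on the wrong shard")
```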
All document APIs (get, index, delete, bulk, update, mget) accept a routing parameter, which customizes the mapping from a document to a shard. A custom routing value can ensure that all related documents, for example all documents belonging to the same user, are stored on the same shard.
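A minimal sketch of why custom routing keeps related documents together, again with CRC32 as a stand-in hash. The document IDs and owner names are invented for the example.

```python
import zlib

def pick_shard(routing, number_of_primary_shards=3):
    # Stand-in hash; Elasticsearch uses its own function.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards

# Hypothetical documents as (_id, owner) pairs.
docs = [("order-1", "user_a"), ("order-2", "user_a"), ("order-3", "user_b")]

for doc_id, owner in docs:
    default_shard = pick_shard(doc_id)  # routing defaults to _id
    custom_shard = pick_shard(owner)    # routing set to the owner
    print(doc_id, owner, "default:", default_shard, "custom:", custom_shard)

# With owner-based routing, all of user_a's documents share one shard.
user_a_shards = {pick_shard(owner) for _, owner in docs if owner == "user_a"}
```

With the default routing, the two user_a orders may or may not land on the same shard. With the owner as the routing value, they always do.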
 
Workflow of a write operation
Every node is able to handle any request. Every node knows where every document in the cluster is located, so it can forward a request to the node that holds it.
Create, index, and delete requests are write operations. They must complete successfully on the primary shard before they can be copied to the associated replica shards.
  1. The client sends a create, index, or delete request to Node 1.
  2. Node 1 uses the document's _id to determine that the document belongs to shard 0. It forwards the request to Node 3, where the primary copy of shard 0 is located.
  3. Node 3 executes the request on the primary shard. If it succeeds, Node 3 forwards the request to the replica shards on Node 1 and Node 2. Once all replica shards report success, Node 3 reports success to the requesting node, which in turn reports success to the client.
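The three steps above can be traced with a toy simulation. Everything here is hypothetical: the node layout mirrors the example in the text, each node's storage is a plain dictionary, and the shard number is passed in directly rather than computed from the _id.

```python
# Layout assumed from the steps above: the primary copy of shard 0 is on
# Node 3, and its replicas are on Node 1 and Node 2.
PRIMARY = {0: "Node 3"}
REPLICAS = {0: ["Node 1", "Node 2"]}
stores = {"Node 1": {}, "Node 2": {}, "Node 3": {}}  # per-node document store

def write(coordinating_node, shard, doc_id, source):
    """Apply a write and return the hops it takes through the cluster."""
    trace = [f"client -> {coordinating_node}"]            # step 1
    primary = PRIMARY[shard]
    trace.append(f"{coordinating_node} -> {primary} (primary)")  # step 2
    stores[primary][doc_id] = source                      # step 3: primary first
    for replica in REPLICAS[shard]:                       # then each replica
        stores[replica][doc_id] = source
        trace.append(f"{primary} -> {replica} (replica)")
    trace.append(f"{primary} -> {coordinating_node} -> client (success)")
    return trace

trace = write("Node 1", shard=0, doc_id="1", source={"title": "hello"})
print("\n".join(trace))
```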
 
replication
  The default value for replication is sync. With this setting, the primary shard waits for successful responses from the replica shards before returning.
  If you set replication to async, success is returned to the client as soon as the request has been executed on the primary shard. The request is still forwarded to the replicas, but you will not know whether the replicas succeeded.
  The async option is not recommended. The default sync replication lets Elasticsearch exert back pressure on whatever system is sending it data. With async replication, it is possible to overload Elasticsearch by sending too many requests without waiting for them to complete on the other shards.
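The difference between the two modes can be sketched with a toy model. This is not how Elasticsearch is implemented: the primary and replicas are plain dictionaries, and the pending queue and drain function are hypothetical names standing in for replica writes that have not been applied yet.

```python
from collections import deque

primary, replicas = {}, [{}, {}]
pending = deque()  # replica writes not yet applied (async mode only)

def index(doc_id, source, replication="sync"):
    primary[doc_id] = source
    if replication == "sync":
        # Wait for every replica before acknowledging.
        for replica in replicas:
            replica[doc_id] = source
        return "acknowledged by primary and replicas"
    # async: acknowledge immediately, replicas catch up later.
    pending.append((doc_id, source))
    return "acknowledged by primary only"

def drain():
    """Apply queued replica writes, as the replicas would when they catch up."""
    while pending:
        doc_id, source = pending.popleft()
        for replica in replicas:
            replica[doc_id] = source

print(index("1", {"n": 1}))
print(index("2", {"n": 2}, replication="async"))
print("backlog:", len(pending))
```

In the async case nothing limits how fast the backlog grows, which is the overload risk described above.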
 
Retrieval Process
A document can be retrieved from the primary shard or from any of its replica shards.
1. The client sends a get request to Node 1.
2. Node 1 uses the document's _id to determine that the document belongs to shard 0. Copies of shard 0 exist on all three nodes. In this case, it forwards the request to Node 2.
3. Node 2 returns the document to Node 1, which returns it to the client.
   For read requests, the requesting node chooses a different shard copy on each request in order to balance the load; it cycles through all copies of the shard in round-robin fashion.
   It is possible that a document has already been indexed on the primary shard but has not yet been copied to the replica shards. In that case a replica may report that the document does not exist, while the primary returns it successfully. Once the index request has returned success to the user, the document is available on the primary shard and on all replica shards.
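Both points can be illustrated with a small sketch. It assumes three hypothetical copies of shard 0 held in dictionaries and uses itertools.cycle for the round-robin selection; the actual selection logic in Elasticsearch may differ.

```python
from itertools import cycle

# Hypothetical copies of shard 0: one primary and two replicas.
copies = {"Node 3 (primary)": {}, "Node 1 (replica)": {}, "Node 2 (replica)": {}}
next_copy = cycle(copies)  # round-robin over the copy names

def get(doc_id):
    name = next(next_copy)
    return name, copies[name].get(doc_id)  # None means "not found"

# The document has reached the primary but not yet the replicas.
copies["Node 3 (primary)"]["1"] = {"title": "hello"}
results = [get("1") for _ in range(3)]

# After replication completes, every copy returns the document.
for store in copies.values():
    store["1"] = {"title": "hello"}
results_after = [get("1") for _ in range(3)]

for name, doc in results + results_after:
    print(name, "->", doc)
```

Before replication completes, only the read served by the primary finds the document. Afterwards, all three reads succeed regardless of which copy serves them.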
 
 


Origin www.cnblogs.com/sx66/p/11886955.html