ElasticSearch (Part 2): How Create, Read, Update, and Delete Work in ElasticSearch

How ElasticSearch Creates an Index / Writes Data

  1. The client sends a request to any node in the ElasticSearch cluster. Every node holds the cluster metadata, which records which shard each piece of data is stored on.
  2. The node that receives the write request uses the routing algorithm to forward the request to the node that holds the corresponding primary shard.
  3. The node that receives the forwarded request writes the document to the in-memory buffer and, at the same time, appends the operation to the translog. Once the request has been processed successfully on the primary shard, it is sent in parallel to the replica shards, and the translog is synchronized to them as well. Only after the translog has been synchronized on the primary shard and all replicas does the client receive an acknowledgement.
  4. Every refresh interval (1 second by default), the in-memory buffer is written out as a new segment file, so a new segment is generated roughly every second. (Before a segment is written there may be a segment-merge step; by default, several similar-sized segment files are merged into one.) The new segment first lands in the filesystem cache, not yet on disk; the in-memory buffer is then cleared, and the data in the filesystem cache becomes searchable.
  5. The translog is fsynced to disk every 5 seconds. When it reaches a size threshold, or every 30 minutes, a flush occurs: a commit is executed, all buffered data is written to segment files, the filesystem cache is forced to disk, and the translog is cleared. This completes the persistence of the data.
  6. If the data in the filesystem cache is lost, it can be recovered from the operations recorded in the translog since the last flush.
  7. To ensure that the translog itself loses no data, each index operation can trigger a translog fsync to disk before the client is sent a 200 status code, guaranteeing the write succeeded. This reduces performance somewhat.
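The write path above can be sketched as a toy model: an in-memory buffer plus a translog, where `refresh()` turns the buffer into a searchable "segment" and `flush()` persists segments and clears the translog. All class and method names here are illustrative, not ElasticSearch's real internals.

```python
class ToyShard:
    """Illustrative model of one shard's write path (not real ES code)."""

    def __init__(self):
        self.buffer = []        # in-memory indexing buffer
        self.translog = []      # append-only operation log
        self.segments = []      # searchable segments (filesystem cache)
        self.on_disk = []       # segments fsynced to disk

    def index(self, doc):
        # Step 3: write to the buffer and the translog at the same time
        self.buffer.append(doc)
        self.translog.append(("index", doc))

    def refresh(self):
        # Step 4: every ~1s the buffer becomes a new segment; the buffer
        # is cleared and the segment's documents become searchable
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Step 5: commit - force segments to disk and clear the translog
        self.refresh()
        self.on_disk.extend(self.segments)
        self.translog.clear()

    def search(self, pred):
        # Only refreshed data is searchable, never the raw buffer
        return [d for seg in self.segments for d in seg if pred(d)]
```

The key behavior the model captures is near-real-time search: a freshly indexed document is invisible until the next refresh, and durable only via the translog until the next flush.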

How ElasticSearch Queries an Index

An ES query runs in two phases. The first is the query phase, a "lightweight" phase in which each shard returns only document IDs and sort values to the coordinating node, which merges and sorts them globally. The second is the fetch phase, in which the coordinating node uses that sorted list to retrieve the full documents from the shards.
Query phase: when we send a request to the ES cluster, some node receives it, and that node becomes the coordinating node. Based on the routing algorithm, the coordinating node forwards the request to all relevant shards. Each shard executes the search independently and builds a priority queue ordered by relevance score, then returns a lightweight result — containing the document IDs and sort information — to the coordinating node. Once the coordinating node has received the results from every shard, it sorts the documents globally, and the query phase is complete.
Fetch phase: after sorting the documents globally, the coordinating node builds a global document list and requests the original documents from the corresponding shards. The shards return the full documents to the coordinating node, which then returns the data to the client.
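The two phases above can be sketched as a scatter-gather merge. The shard contents, IDs, and scores below are made up purely for illustration: each "shard" returns only `(score, doc_id)` pairs in the query phase, the coordinator merges them into a global order, and the fetch phase retrieves the full documents by ID.

```python
import heapq

# Two toy "shards", each mapping doc_id -> full document (made-up data)
shards = [
    {"a": {"text": "es intro", "score": 0.9},
     "b": {"text": "routing", "score": 0.4}},
    {"c": {"text": "translog", "score": 0.7}},
]

def query_phase(shards, size):
    # Each shard builds its own priority queue of lightweight hits
    # (score, doc_id, shard index) - no full documents yet
    per_shard = [
        sorted(((doc["score"], doc_id, i) for doc_id, doc in shard.items()),
               reverse=True)[:size]
        for i, shard in enumerate(shards)
    ]
    # The coordinator merges all shard results into one global order
    return heapq.nlargest(size, (hit for hits in per_shard for hit in hits))

def fetch_phase(shards, ranked):
    # The coordinator asks the owning shard for each full document, in order
    return [shards[shard_idx][doc_id] for _, doc_id, shard_idx in ranked]

ranked = query_phase(shards, size=2)
docs = fetch_phase(shards, ranked)
```

Only the top `size` lightweight hits per shard cross the network in phase one, which is exactly why the query phase is called lightweight: full documents move only for the final ranked list.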

How ElasticSearch Updates and Deletes Documents

In ElasticSearch, documents are immutable once written, so updating and deleting are also implemented as writes — the original segments are never modified.
In ElasticSearch, each segment has a corresponding .del file. When we send a request to delete a document, the document is not actually removed; it is only marked as deleted in the .del file. The document can still be matched by queries, but it is filtered out of the search results. When a segment merge is triggered, documents marked as deleted are not copied into the new segment file, and only then is the document completely gone.
Updating a document works the same way. When a document is first indexed, ES automatically assigns it a version number. When the document is updated, the old version is marked as deleted and the new version is written to a new segment file; once that segment is flushed to disk, the update is complete.
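The mark-then-merge behavior described above can be modeled in a few lines. The segment layout and document IDs below are invented for illustration: a `deleted` set plays the role of the .del file, search filters marked documents out, and a merge drops them permanently.

```python
def search(segments, deleted):
    # Deleted docs still sit in their segments but are filtered from results
    return [doc_id for seg in segments for doc_id in seg if doc_id not in deleted]

def merge(segments, deleted):
    # A merge copies only live docs into the new segment; the ".del" marks
    # for the merged segments can then be discarded
    merged = [doc_id for seg in segments for doc_id in seg if doc_id not in deleted]
    return [merged], set()

segments = [["doc1", "doc2"], ["doc3"]]
deleted = {"doc2"}                      # "delete" = mark, don't remove

assert search(segments, deleted) == ["doc1", "doc3"]
segments, deleted = merge(segments, deleted)
assert segments == [["doc1", "doc3"]]   # doc2 is truly gone after the merge
```

This is also why disk usage does not drop immediately after deletes: the space is reclaimed only when the segments containing the marked documents are merged.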

How ElasticSearch Routing Works

Shards

We all know ElasticSearch is a distributed search engine — but how is it distributed? To answer that we need the concept of a shard. Shards come in two kinds: primary shards and replica shards. A shard is a container for data. When an index is created, its primary shards are created with it (5 by default), and these primary shards are spread evenly across the nodes of the ES cluster. That is the basis of its distribution.

Shard Routing

When a document is inserted, ES decides which primary shard will store it using a routing algorithm: hash(routing) % number_of_primary_shards. Here, routing is the document's _id, and number_of_primary_shards is the index's number of primary shards. For example, suppose an index has three primary shards: P0, P1, and P2. If routing = 1 and hash(1) % 3 = 0, the document is assigned to shard P0. Queries by ID follow the same rule.
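The routing rule from the text can be demonstrated directly. Real ElasticSearch hashes the _id with murmur3; `zlib.crc32` stands in below as a deterministic hash purely for illustration, and the document IDs are made up.

```python
import zlib

def route(doc_id: str, number_of_primary_shards: int = 3) -> int:
    # shard = hash(routing) % number_of_primary_shards
    # (crc32 substitutes for ES's murmur3 in this sketch)
    return zlib.crc32(doc_id.encode()) % number_of_primary_shards

# The same _id always lands on the same primary shard, which is also why
# the primary shard count of an index cannot be changed after creation:
assert route("user-1") == route("user-1")
assert all(0 <= route(f"doc-{i}") < 3 for i in range(100))
```

Because both writes and ID lookups apply the same formula, changing the number of primary shards would send every existing document's ID to a different shard — hence the count is fixed at index creation.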



Origin blog.csdn.net/Suubyy/article/details/85144377