How Elasticsearch works

Interview questions

How does es write data under the hood? How does es query data? What does Lucene do underneath? Do you understand the inverted index?

What the interviewer is thinking

The interviewer asks this to check whether you understand some of es's basic principles, because using es boils down to writing data and searching data. If you don't know what es is actually doing when you issue a write or search request, then you really are...

...treating es as a black box. In that case, what else can you do? The only thing you can do is call the es API to read and write data. And if something goes wrong and you know nothing about the internals, what can anyone expect from you?

Analysis of Interview Questions

es write data process

  • The client selects a node and sends the request to it; that node becomes the coordinating node.
  • The coordinating node routes the document by its id and forwards the request to the node holding the corresponding primary shard.
  • The primary shard on that node processes the request, then synchronizes the data to its replica shards.
  • Once the coordinating node sees that the primary shard and all replica shards are done, it returns the response to the client.
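The routing step above can be sketched in a few lines. Real Elasticsearch computes murmur3 of the `_routing` value (the doc id by default) modulo the number of primary shards; this sketch substitutes CRC32 as a stable, illustrative stand-in for the hash, so the shard numbers are not what a real cluster would pick.

```python
import zlib

def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Pick the primary shard for a document.

    Real es uses murmur3(_routing) % number_of_primary_shards;
    CRC32 stands in here as a stable, illustrative hash.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

# Every node computes the same route, so whichever node acts as the
# coordinating node forwards the request to the same primary shard.
shard = route_to_shard("doc-42", 5)
```

Because the formula depends on the number of primary shards, that number cannot be changed after index creation without re-routing every document; this is why es fixes the primary shard count up front.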

(figure: es write process)

es read data process

A document can be queried by its doc id: es hashes the doc id to work out which shard the document was assigned to, then queries that shard.

  • The client sends a request to any node, which becomes the coordinating node.
  • The coordinating node hashes the doc id to route the request to the corresponding node. A round-robin random polling algorithm is used to pick one copy from the primary shard and all its replicas, so that read requests are load-balanced.
  • The node that receives the request returns the document to the coordinating node.
  • The coordinating node returns the document to the client.
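The round-robin selection in the second step can be sketched like this (a simplification for illustration; the copy names are made up):

```python
import itertools

class ShardGroup:
    """One primary shard plus its replicas; reads rotate across all copies."""

    def __init__(self, copies):
        self.copies = list(copies)           # e.g. primary plus two replicas
        self._cycle = itertools.cycle(self.copies)

    def pick_copy_for_read(self):
        # Round-robin: each read goes to the next copy in turn,
        # spreading read load evenly across primary and replicas.
        return next(self._cycle)

group = ShardGroup(["primary", "replica-1", "replica-2"])
reads = [group.pick_copy_for_read() for _ in range(6)]
```

Six consecutive reads cycle through the three copies twice, which is exactly the load-balancing effect the bullet describes.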

es search data process

The most powerful feature of es is full-text search. For example, suppose you have three pieces of data:

java is really fun
java is so hard to learn
j2ee is awesome

You search with the keyword java, looking for the documents that contain java. es returns: "java is really fun" and "java is so hard to learn".

  • The client sends a request to a coordinating node.
  • The coordinating node forwards the search request to all shards, hitting either the primary shard or a replica shard of each.
  • Query phase: each shard returns its own search results (really just doc ids) to the coordinating node, which merges, sorts, and paginates the data to produce the final ranked result.
  • Fetch phase: the coordinating node then pulls the actual document data from the various nodes according to those doc ids and returns it to the client.
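The two-phase query-then-fetch flow above can be simulated in miniature; the shard contents and the term-count scoring below are invented for illustration (real es scores with BM25):

```python
# Each shard holds full documents but first returns only lightweight hits.
shard_stores = [
    {"d1": "java is really fun", "d3": "j2ee is awesome"},
    {"d2": "java is so hard to learn"},
]

def query_phase(shards, term):
    """Each shard scores its matches and returns (doc_id, score, shard_no) hits."""
    hits = []
    for shard_no, store in enumerate(shards):
        for doc_id, text in store.items():
            if term in text:
                hits.append((doc_id, text.count(term), shard_no))
    # The coordinating node merges and sorts all shard hits by score.
    return sorted(hits, key=lambda h: h[1], reverse=True)

def fetch_phase(shards, ranked_hits, size=10):
    """The coordinating node pulls the actual documents for the top doc ids."""
    return [shards[shard_no][doc_id] for doc_id, _, shard_no in ranked_hits[:size]]

ranked = query_phase(shard_stores, "java")
results = fetch_phase(shard_stores, ranked)
```

Shipping only ids and scores in the query phase keeps the cross-shard merge cheap; the heavier document bodies move only once, for the final page of results.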

Write requests go to the primary shard and are then synchronized to all replica shards; read requests can be served by the primary shard or any replica shard, chosen by the round-robin random polling algorithm.

The underlying principle of writing data

(figure: es write process in detail)

Data is first written to the in-memory buffer; while it sits in the buffer it cannot be searched. At the same time, the data is written to the translog log file.

When the buffer is nearly full, or after a fixed interval, the data in the memory buffer is refreshed into a new segment file. At this point the data does not go straight into a segment file on disk; it first enters the os cache. This process is the refresh.

Every 1 second, es writes the data in the buffer to a new segment file, so a new segment file is generated every second; each segment file holds the data written to the buffer during the preceding second.

If there is no data in the buffer at that moment, the refresh is of course skipped. If there is data, by default a refresh runs once per second and flushes it into a new segment file.

At the operating-system level, disk files have something called the os cache: before data is written to a disk file, it first enters this operating-system-level memory cache. As soon as the refresh operation moves data from the buffer into the os cache, that data can be searched.

This is why es is called near real-time, NRT for short. The default is to refresh every 1 second, so es is near real-time: written data only becomes visible after about 1 second. Through the restful api or java api you can manually trigger a refresh, pushing the buffer's data into the os cache so it becomes searchable immediately. As soon as the data enters the os cache, the buffer is emptied; there is no need to keep the data there, since the operations have already been recorded in the translog.

Repeating the above steps, new data keeps entering the buffer and the translog, and the buffer's contents keep being written out to one segment file after another. Each refresh empties the buffer, but the translog is retained, so it grows larger and larger. When the translog reaches a certain size, the commit operation is triggered.

The first step of the commit is to refresh whatever is in the buffer into the os cache, emptying the buffer. Then a commit point is written to a disk file; the commit point identifies all of the segment files it covers. At the same time, all data currently in the os cache is forced to fsync to disk files. Finally, the existing translog log file is cleared and a new translog is started, which completes the commit.

This commit operation is called flush. By default a flush runs automatically every 30 minutes, but it is also triggered when the translog grows too large. The flush operation corresponds to the entire commit process. We can also trigger a flush manually through the es api, fsyncing the data in the os cache to disk.

What is the translog log file for? Before a commit, the data lives either in the buffer or in the os cache, and both are memory: if the machine dies, everything in memory is lost. So each operation is also recorded in a dedicated log file, the translog. When the machine restarts after a crash, es automatically replays the translog to restore the data to the memory buffer and the os cache.

The translog itself is written to the os cache first and, by default, flushed to disk every 5 seconds. So by default up to 5 seconds of data may live only in memory, in the os cache of the buffer or of the translog file; if the machine dies at that moment, those 5 seconds of data are lost. The upside is better performance: you lose at most 5 seconds of data. You can also configure the translog to fsync directly to disk on every write operation, but performance will be much worse.

At this point, if the interviewer has not yet asked whether es can lose data, you can show off a little. Tell them: es is first of all near real-time, data becomes searchable about 1 second after it is written; and it may lose data, because up to 5 seconds of data can sit in the buffer, the translog's os cache, and the segment files' os cache without being on disk, so a crash at that moment loses 5 seconds of data.

To sum up: data is first written into the memory buffer, then refreshed into the os cache every 1s, where it becomes searchable (hence the 1s delay between writing to es and being able to search). Every 5s the translog is flushed to the disk file (so if the machine dies and all memory data is gone, at most 5s of data is lost). When the translog grows to a certain size, or every 30 minutes by default, the commit operation is triggered, flushing all the buffered data into segment files on disk.
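The whole buffer → os cache → disk pipeline summarized above can be modeled as a toy class. The class and its names are illustrative only; real es delegates all of this to Lucene segments and runs refresh/flush on timers rather than by explicit calls:

```python
class ToyShard:
    """Toy model of one shard's write path: buffer -> refresh -> flush."""

    def __init__(self):
        self.buffer = []     # in-memory buffer: written, not yet searchable
        self.translog = []   # write-ahead log for crash recovery
        self.os_cache = []   # refreshed segments: searchable, not yet durable
        self.disk = []       # fsynced segments: durable

    def write(self, doc):
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        """Runs every ~1s: buffer contents become a searchable segment in os cache."""
        if self.buffer:
            self.os_cache.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        """Commit: fsync os-cache segments to disk and truncate the translog."""
        self.refresh()
        self.disk.extend(self.os_cache)
        self.os_cache.clear()
        self.translog.clear()

    def search(self, term):
        # Only refreshed data is visible, hence the ~1s near-real-time delay.
        segments = self.os_cache + self.disk
        return [doc for seg in segments for doc in seg if term in doc]

shard = ToyShard()
shard.write("java is really fun")
before_refresh = shard.search("java")   # empty: still only in the buffer
shard.refresh()
after_refresh = shard.search("java")    # visible once refreshed to os cache
shard.flush()
```

Note how `search` never looks at `buffer` or `translog`: the translog exists purely for recovery, and buffered data is invisible until a refresh, which is exactly the near-real-time behavior described above.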

After the data is written into the segment file, an inverted index is created at the same time.

The underlying principle of deleting/updating data

For a delete operation, a .del file is generated at commit time, and the doc is marked as deleted in it. At search time, the .del file tells es whether a doc has been deleted.

For an update operation, the original doc is marked as deleted, and then a new piece of data is written.

Each refresh of the buffer produces a segment file, so by default a new segment file appears every second and segment files keep accumulating. Merges are therefore executed periodically. Each merge combines multiple segment files into one, physically deletes the docs marked as deleted, writes the new segment file to disk, writes a commit point identifying all the new segment files, opens them for search, and deletes the old segment files.
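The merge described above can be sketched as: take several segments, drop the docs marked deleted, and emit one new segment. This is a simplification, with segments modeled as plain dicts and the .del tombstones as a set; real merging is handled by Lucene's merge policy:

```python
def merge_segments(segments, deleted_ids):
    """Combine many segments into one, physically dropping deleted docs.

    `segments` is a list of {doc_id: doc} dicts, oldest first;
    `deleted_ids` plays the role of the .del file's tombstones.
    """
    merged = {}
    for segment in segments:
        for doc_id, doc in segment.items():
            if doc_id not in deleted_ids:
                merged[doc_id] = doc   # later segments win for updated docs
    return merged

old_segments = [
    {"d1": "v1 of doc 1", "d2": "doc 2"},
    {"d1": "v2 of doc 1", "d3": "doc 3"},  # d1 was updated: new copy shadows the old
]
new_segment = merge_segments(old_segments, deleted_ids={"d3"})
```

After the merge, the deleted doc is gone for good and the updated doc survives only in its newest version, which is why merges are the point where "marked deleted" becomes "physically deleted".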

Low-level lucene

To put it simply, lucene is a jar package containing the packaged code of the various algorithms for building inverted indexes. When developing in Java, we pull in the lucene jar and build on top of the lucene api.

Through lucene we can index existing data, and lucene organizes the index's data structures on the local disk.

Inverted index

In a search engine, each document has a corresponding document ID, and the document's content is represented as a collection of keywords. For example, after word segmentation, 20 keywords are extracted from document 1, and for each keyword we record how many times it appears in the document and at which positions.

The inverted index, then, is the mapping from keywords to document IDs: each keyword maps to the list of documents in which it appears.

Here's an example.

The following documents are available:

DocId   Doc
1       The father of Google Maps jumps ship to Facebook
2       The father of Google Maps joins Facebook
3       Google Maps founder Russ leaves Google to join Facebook
4       The father of Google Maps jumping ship to Facebook is related to the cancellation of the Wave project
5       Russ, the father of Google Maps, joins social networking site Facebook

After word segmentation of the document, the following inverted index is obtained .

WordId  Word         DocIds
1       Google       1, 2, 3, 4, 5
2       maps         1, 2, 3, 4, 5
3       father       1, 2, 4, 5
4       jump ship    1, 4
5       Facebook     1, 2, 3, 4, 5
6       join         2, 3, 5
7       founder      3
8       Russ         3, 5
9       leave        3
10      related to   4
..      ..           ..

In addition, a practical inverted index can record more information, such as document frequency: how many documents in the collection contain a given word.

With the inverted index, a search engine can easily answer user queries. For example, when a user queries Facebook, the search system looks up the inverted index and reads out the documents containing that word; those documents are the search results returned to the user.
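Building and querying this kind of inverted index fits in a few lines. Whitespace tokenization stands in for real word segmentation here, so English variants of the example documents are used:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of doc ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    # Sorting the terms mirrors the lexicographic ordering noted below.
    return {term: sorted(ids) for term, ids in sorted(index.items())}

docs = {
    1: "the father of google maps jumps to facebook",
    2: "the father of google maps joins facebook",
    3: "google maps founder russ leaves google to join facebook",
}
index = build_inverted_index(docs)
hits = index["facebook"]   # answering a query is one dictionary lookup
```

This is why full-text queries are fast: the expensive work (segmentation, posting-list construction) happens once at index time, and each query reduces to a lookup plus list operations.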

Pay attention to two important details of the inverted index:

  • every term in the inverted index corresponds to one or more documents;
  • the terms in the inverted index are sorted in ascending lexicographical order.

Origin blog.csdn.net/qq_27828675/article/details/115374940