Interview questions
How does es write data? How does es read data? What does Lucene do at the bottom? Do you understand the inverted index?
Interviewer psychoanalysis
The interviewer asks this to see whether you understand some of es's basic internals, because using es really comes down to two things: writing data and searching data. If you don't know what es is actually doing when you issue a write or a search request, then you are treating es as a black box, and the only thing you can do is call the es API to read and write. If something goes wrong and you have no idea why, what can anyone expect from you?
Analysis of Interview Questions
es write data process
- The client sends a request to any node; that node becomes the `coordinating node`.
- The `coordinating node` routes the document and forwards the request to the node holding the corresponding `primary shard`.
- The `primary shard` on that node processes the request, then synchronizes the data to the `replica node`s.
- Once the `primary shard` and all `replica node`s are done, the `coordinating node` returns the result to the client.
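The routing step above can be sketched in a few lines: the shard is derived from a hash of the routing value (the doc id by default) modulo the number of primary shards. The real implementation hashes with murmur3; `zlib.crc32` below is just a stdlib stand-in for illustration.

```python
import zlib

def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Pick the primary shard for a document.

    Elasticsearch hashes the routing value (the doc id by default)
    with murmur3; zlib.crc32 stands in for it here so the sketch
    stays stdlib-only.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

# The route is a pure function of the doc id, so ANY node can compute
# it -- which is why any node can act as the coordinating node.
for doc_id in ["user-1", "user-2", "user-3"]:
    print(doc_id, "-> shard", route_to_shard(doc_id, 5))
```

This also explains why the number of primary shards cannot be changed after index creation: the modulus is baked into every document's location.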
es read data process
You can query by `doc id`: es hashes the `doc id` to determine which shard the document was assigned to at write time, and queries that shard.
- The client sends a request to any node; that node becomes the `coordinate node`.
- The `coordinate node` hashes the `doc id` and forwards the request to the corresponding node. A `round-robin` random polling algorithm then picks one copy among the `primary shard` and all of its replicas, so read requests are load-balanced.
- The node that receives the request returns the `document` to the `coordinate node`.
- The `coordinate node` returns the `document` to the client.
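The round-robin selection among a primary shard and its replicas can be sketched like this (a toy model; the node names are made up):

```python
import itertools

class ShardGroup:
    """One primary shard plus its replicas; reads rotate across all copies."""

    def __init__(self, primary: str, replicas: list):
        # itertools.cycle yields primary, replica, replica, primary, ...
        self.copies = itertools.cycle([primary] + replicas)

    def pick_for_read(self) -> str:
        return next(self.copies)

group = ShardGroup("primary@node1", ["replica@node2", "replica@node3"])
for _ in range(4):
    print(group.pick_for_read())
# prints primary@node1, replica@node2, replica@node3, primary@node1
```

Because every copy holds the same data, spreading reads across them is a pure load-balancing win; writes, by contrast, must always go through the primary.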
es search data process
The most powerful feature of es is full-text search. Say you have three pieces of data:
```
java is really fun
java is so hard to learn
j2ee is awesome
```
If you search by the keyword `java`, es searches for the `document`s containing `java` and returns: "java is really fun" and "java is so hard to learn".
- The client sends a request to a `coordinate node`.
- The coordinating node forwards the search request to every shard's `primary shard` or one of its `replica shard`s (either copy will do).
- Query phase: each shard returns its own matches (in fact, just `doc id`s) to the coordinating node, which merges, sorts, and pages the data to produce the final result.
- Fetch phase: the coordinating node then uses the `doc id`s to pull the actual `document` data from the various nodes and returns it to the client.
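The two phases can be sketched with in-memory stand-ins for the shards (the doc ids, scores, and contents below are made up for illustration):

```python
# Simulated shards: each locally holds (doc_id, score, source) tuples.
shards = [
    [("d1", 0.9, "java is really fun"), ("d3", 0.4, "j2ee is awesome")],
    [("d2", 0.7, "java is so hard to learn")],
]

def query_phase(size: int):
    """Each shard contributes only (doc_id, score); the coordinator
    merges the lightweight hits, sorts by score, and pages to `size`."""
    hits = [(doc_id, score) for shard in shards for (doc_id, score, _) in shard]
    hits.sort(key=lambda h: h[1], reverse=True)
    return [doc_id for doc_id, _ in hits[:size]]

def fetch_phase(doc_ids):
    """The coordinator pulls the full documents only for the winners."""
    by_id = {doc_id: src for shard in shards for (doc_id, _, src) in shard}
    return [by_id[d] for d in doc_ids]

top = query_phase(size=2)      # ['d1', 'd2']
print(fetch_phase(top))
```

Splitting query from fetch is a bandwidth optimization: only the ids that survive the global sort are worth shipping as full documents.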
Write requests go to the primary shard and are then synchronized to all replica shards; read requests can be served by the primary shard or any replica shard, chosen by a round-robin polling algorithm.
The underlying principle of writing data
Data is first written to the in-memory buffer; while it sits in the buffer it cannot be searched. At the same time, the operation is written to the translog log file.
When the buffer is nearly full, or after a fixed interval, the data in the memory buffer is `refresh`ed into a new `segment file`. At this point the data does not go directly into a `segment file` on disk; it enters the `os cache` first. This process is the `refresh`.
Every 1 second, es writes the data in the buffer into a new `segment file`, so a new `segment file` appears every second, holding whatever was written to the buffer during that second. If the buffer happens to be empty, the refresh is of course skipped; if it has data, the refresh runs (once per second by default) and flushes it into a new `segment file`.
Operating systems keep something called the `os cache` for disk files: before data is written to a disk file, it first enters this OS-level memory cache. As soon as the refresh operation moves data from the `buffer` into the `os cache`, that data can be searched.
Why is es called near real-time? `NRT` is short for `near real-time`. The default refresh interval is 1 second, so es is near real-time: written data only becomes visible after about 1 second. Through the `restful api` or the `java api` you can also trigger a refresh manually, pushing the buffer's data into the `os cache` so it becomes searchable immediately. Once the data is in the `os cache`, the buffer is emptied: there is no need to keep it there, because the operations have already been persisted to disk in the translog.
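This visibility rule can be modeled in a toy class: writes are invisible until a refresh moves the buffer into a searchable segment (the translog and disk are deliberately left out of this sketch):

```python
class Index:
    """Toy model of near-real-time visibility in es: writes land in an
    in-memory buffer and only become searchable after refresh() moves
    them into a (simulated) segment sitting in the os cache."""

    def __init__(self):
        self.buffer = []    # freshly written docs: NOT searchable
        self.segments = []  # refreshed segments: searchable

    def write(self, doc: str):
        self.buffer.append(doc)

    def refresh(self):
        if self.buffer:  # an empty buffer produces no new segment
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def search(self, term: str):
        return [d for seg in self.segments for d in seg if term in d]

idx = Index()
idx.write("java is really fun")
print(idx.search("java"))   # [] -- written but not yet refreshed
idx.refresh()               # es does this every 1s by default
print(idx.search("java"))   # ['java is really fun']
```

The gap between the first and second `search` call is exactly the "up to 1 second" delay the NRT label refers to.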
As these steps repeat, new data keeps entering the buffer and the translog, and the buffer keeps being written out as one `segment file` after another. Each `refresh` empties the buffer but keeps the translog. As this goes on, the translog grows larger and larger, and once it reaches a certain size a `commit` operation is triggered.
The first step of the commit is to `refresh` whatever is in the buffer into the `os cache`, emptying the buffer. Then a `commit point` is written to the disk file, identifying all the `segment file`s that this `commit point` covers, and at the same time everything currently in the `os cache` is forcibly `fsync`ed to disk. Finally, the existing translog log file is cleared and a new translog is started; the commit operation is then complete.
This commit operation is called `flush`. By default a `flush` runs automatically every 30 minutes, and it is also triggered when the translog grows too large. The flush operation corresponds to the entire commit process. We can also execute a flush manually through the es api, fsyncing the data in the os cache to disk.
What is the translog for? Before the commit operation, the data lives either in the buffer or in the os cache. Both are memory, so once the machine dies, all of it is lost. That is why the operations behind the data are also written into a dedicated log file, the `translog`: when the machine restarts after going down, es automatically replays the translog and restores the data into the memory buffer and the os cache.
The translog itself is first written to the os cache and, by default, fsynced to disk every 5 seconds. So by default, up to 5 seconds of data may exist only in the os cache of the buffer or of the translog file; if the machine dies at that moment, those 5 seconds of data are lost. The upside is better performance, at the cost of losing at most 5 seconds of data. You can also configure the translog to `fsync` directly to disk on every write, but performance will be much worse.
At this point, if the interviewer has not yet asked whether es can lose data, you can show off a little. Point out that es is, first, near real-time: data becomes searchable about 1 second after writing. And second, es may lose data: up to 5 seconds of data sits only in the buffer, the translog os cache, and the segment file os cache, not on disk, so if the machine goes down at that moment, those 5 seconds of data are lost.
To sum up: data is first written into the memory buffer, and every 1s it is refreshed into the os cache, where it becomes searchable (hence the 1s delay between writing and searchability we mentioned). Every 5s the translog is fsynced to disk (so if the machine dies and all memory data is gone, at most 5s of data is lost). When the translog grows to a certain size, or every 30 minutes by default, the commit operation is triggered, flushing all buffered data into `segment file` disk files.
After the data is written into the segment file, an inverted index is created at the same time.
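The whole buffer → os cache → disk pipeline from the summary can be modeled in a few lines (purely illustrative; there is no real fsync or timing here):

```python
class WritePath:
    """Toy model of the es write path: buffer -> (refresh) segment in
    the os cache -> (flush/commit) disk, with the translog as the
    crash-recovery log."""

    def __init__(self):
        self.buffer = []
        self.os_cache_segments = []
        self.disk_segments = []
        self.translog = []

    def write(self, doc):
        self.buffer.append(doc)
        self.translog.append(("index", doc))  # replayed on crash recovery

    def refresh(self):  # default: every 1s -> doc becomes searchable
        if self.buffer:
            self.os_cache_segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):  # default: every 30min, or when the translog is big
        self.refresh()                                     # drain the buffer
        self.disk_segments.extend(self.os_cache_segments)  # "fsync" segments
        self.os_cache_segments.clear()
        self.translog.clear()                              # fresh translog

wp = WritePath()
wp.write("doc-1")
wp.refresh()
wp.write("doc-2")
wp.flush()
print(len(wp.disk_segments), len(wp.translog))  # 2 0
```

Note how `flush` is the only step that empties the translog: everything before it is recoverable from the log, which is exactly the durability argument in the text above.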
The underlying principle of deleting/updating data
For a delete operation, a `.del` file is generated at commit time, and the doc is marked with a `deleted` status in it. At search time, the `.del` file tells es whether a given doc has been deleted.
For an update operation, the original doc is marked with the `deleted` status, and then a new piece of data is written.
Every refresh of the buffer produces a new `segment file`, so by default that is one `segment file` per second, and `segment file`s keep accumulating. A merge is therefore executed periodically. Each merge combines multiple `segment file`s into one, physically deleting the docs marked `deleted` along the way, then writes the new `segment file` to disk, writes a `commit point` identifying all the new `segment file`s, opens the new `segment file` for search, and deletes the old `segment file`s.
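A toy sketch of this mark-then-merge behavior (the `Segment` class and its `deleted` set are simplifications for illustration, not Lucene's actual on-disk format):

```python
class Segment:
    def __init__(self, docs):
        self.docs = dict(docs)  # doc_id -> source
        self.deleted = set()    # roughly, the per-segment .del bitmap

def merge(segments):
    """Merge segments, physically dropping docs marked deleted, and
    return one new segment (a commit point would then swap it in and
    the old segments would be removed)."""
    live = {}
    for seg in segments:
        for doc_id, src in seg.docs.items():
            if doc_id not in seg.deleted:
                live[doc_id] = src
    return Segment(live.items())

s1 = Segment([("d1", "old"), ("d2", "keep")])
s1.deleted.add("d1")            # delete (or update) only MARKS the doc
s2 = Segment([("d1", "new")])   # the update wrote the new version elsewhere
merged = merge([s1, s2])
print(sorted(merged.docs))      # ['d1', 'd2'] -- 'd1' now holds "new"
```

The key point the sketch shows: deletes are cheap marks at write time, and the actual space reclamation is deferred to the merge.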
Low-level lucene
Simply put, lucene is a jar package containing all kinds of encapsulated algorithm code for building inverted indexes. When developing in Java, you add the lucene jar and build on the lucene api. Through lucene we can index our data, and lucene organizes the index's data structures on the local disk.
Inverted index
In a search engine, each document has a corresponding document ID, and the document's content is represented as a collection of keywords. For example, after word segmentation, document 1 yields 20 keywords, and each keyword records how many times it appears in the document and at which positions. The inverted index is then the mapping from keywords to document IDs: each keyword maps to the list of documents in which it appears.
Here is an example.
The following documents are available:
| DocId | Doc |
|---|---|
| 1 | The father of Google Maps jumps ship to Facebook |
| 2 | The father of Google Maps joins Facebook |
| 3 | Google Maps founder Lars leaves Google to join Facebook |
| 4 | The father of Google Maps jumping ship to Facebook is related to the cancellation of the Wave project |
| 5 | Lars, the father of Google Maps, joins the social networking site Facebook |
After word segmentation of these documents, we obtain the following inverted index.
| WordId | Word | DocIds |
|---|---|---|
| 1 | Google | 1, 2, 3, 4, 5 |
| 2 | Maps | 1, 2, 3, 4, 5 |
| 3 | father | 1, 2, 4, 5 |
| 4 | jumps ship | 1, 4 |
| 5 | Facebook | 1, 2, 3, 4, 5 |
| 6 | joins | 2, 3, 5 |
| 7 | founder | 3 |
| 8 | Lars | 3, 5 |
| 9 | leaves | 3 |
| 10 | with | 4 |
| .. | .. | .. |
In addition, a practical inverted index can record more information, such as document frequency: how many documents in the collection contain a given word.
With the inverted index, a search engine can easily respond to user queries. For example, when a user queries `Facebook`, the search system looks up the inverted index and reads out the documents containing that word; those documents are the search results presented to the user.
Pay attention to two important details of the inverted index:
- All terms in the inverted index correspond to one or more documents;
- The terms in the inverted index are sorted in ascending lexicographical order
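An inverted index like the one in the example can be built with a few lines of Python (whitespace tokenization only; real analyzers also lowercase, strip stop words, record positions and frequencies, and so on — the docs below paraphrase the example table):

```python
from collections import defaultdict

docs = {
    1: "the father of google maps jumps to facebook",
    2: "the father of google maps joins facebook",
    3: "google maps founder lars leaves google to join facebook",
}

def build_inverted_index(docs):
    """Map each term to the sorted list of doc ids containing it --
    a stripped-down version of what lucene builds per segment."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    # Sorted postings lists make merging and intersection efficient.
    return {term: sorted(ids) for term, ids in index.items()}

index = build_inverted_index(docs)
print(index["facebook"])   # [1, 2, 3]
print(index["jumps"])      # [1]
```

Answering a query is then a dictionary lookup plus, for multi-term queries, an intersection or union of the postings lists — which is why the lookup side of a search engine is so fast.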