How Does Elasticsearch Return Queries on Hundreds of Millions of Records in Milliseconds?

Reading this article takes about 6 minutes.

Suppose you run into this question in an interview: how do you improve query performance in ES when the data volume is large (hundreds of millions of documents)?

Frankly, this question tests whether you have actually used ES in production. Why? Because ES performance is not as good as you might think.

Many times, with a large amount of data — especially hundreds of millions of documents — you may be dismayed to find a single search taking 5 to 10 seconds. Painful.

Typically the first search takes 5 to 10 seconds, but subsequent ones are fast, maybe a few hundred milliseconds.

That leaves you baffled: is every user's first access going to be that slow and laggy? So if you have not used ES much, or have only played with your own toy demos, this question can easily stump you — and reveal that you have not really worked with ES at scale.

To be honest, there is no silver bullet for ES performance optimization. What does that mean? Don't expect to casually tweak one parameter and be able to handle every slow-performance scenario.

In some scenarios changing a parameter or adjusting the query syntax will do the trick, but certainly not in all of them.

The Performance Optimization Trump Card: the Filesystem Cache

When you write data to ES, it is ultimately written to disk files. At query time, the operating system automatically caches the data from those disk files in the Filesystem Cache.

The ES search engine relies heavily on the underlying Filesystem Cache. If you give the Filesystem Cache enough memory to hold all of the index segment files, then your searches will basically run from memory, and performance will be very high.

How big can the gap be? In many of our earlier benchmarks and stress tests, if the search has to go to disk, it is definitely in the seconds range — 1 second, 5 seconds, even 10 seconds.

But if it goes through the Filesystem Cache — pure memory — it is generally an order of magnitude faster than disk: basically milliseconds, anywhere from a few milliseconds to a few hundred milliseconds.

Here is a real case. One company's ES cluster had three machines, each with what looked like plenty of memory — 64 GB — for a total of 64 × 3 = 192 GB.

Each machine gave 32 GB to the ES JVM heap, which left only 32 GB per machine for the Filesystem Cache, so the whole cluster had 32 × 3 = 96 GB of memory for the Filesystem Cache.

Meanwhile, the index data files on disk took up a total of 1 TB across the three machines — 1 TB of data in ES, about 300 GB per machine. How do you think performance looked?

With only about 100 GB of Filesystem Cache, just one tenth of the data fits in memory; the rest stays on disk. Run a search and most operations hit the disk, so performance is certainly poor.
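As a quick sanity check, the arithmetic in this case can be sketched like this (all numbers taken from the example above):

```python
# Back-of-the-envelope check: how much of the index can the
# cluster's Filesystem Cache actually hold?
machines = 3
ram_per_machine_gb = 64
jvm_heap_gb = 32        # given to the ES JVM heap
data_total_gb = 1024    # ~1 TB of index data on disk

fs_cache_per_machine_gb = ram_per_machine_gb - jvm_heap_gb
fs_cache_total_gb = fs_cache_per_machine_gb * machines
cached_fraction = fs_cache_total_gb / data_total_gb

print(fs_cache_total_gb)          # 96
print(round(cached_fraction, 2))  # 0.09 -> roughly one tenth in memory
```

Roughly 90% of every search has to touch the disk, which is why queries land in the seconds range.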

Ultimately, if you want good ES performance, the rule of thumb is that your machines' memory should be able to hold at least half of your total data.

Based on our own production experience, the ideal case is to store in ES only the small subset of data you actually want to search on. If the memory left for the Filesystem Cache is 100 GB, then keep your index data within 100 GB.

In that case, almost all of your searches run from memory and performance is very high — generally under 1 second.

For example, say you have rows with 30 fields: id, name, age, and so on. But your searches only ever need to filter on three of them: id, name, and age.

If you naively write the full row, with all 30 fields, into ES, then 90% of that data is never used for search.

It just occupies Filesystem Cache space on the ES machines, and the larger each document is, the less data the Filesystem Cache can hold.

In fact, you only need to write the fields used for retrieval into ES — for example, just the id, name, and age fields.

The remaining fields can live in MySQL or HBase; we generally recommend an ES + HBase architecture.

HBase is designed for online storage of massive data sets: you can write huge volumes of data into it, but you should not run complex searches against it — only simple queries such as lookups by id or by range.

So you search by name and age in ES, which might return 20 doc ids; then you go to HBase with those ids, fetch the full row for each doc, and return the assembled results to the front end.

The data written to ES should ideally be less than or equal to — or only slightly larger than — the memory capacity of the ES Filesystem Cache.

Then retrieving from ES might take 20 ms, and fetching the 20 rows from HBase by the ids ES returned might take another 30 ms.

Compare that to your original approach of stuffing 1 TB of data into ES, where every query took 5 to 10 seconds: now each query might take around 50 ms.
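A minimal sketch of this pattern, with plain Python dicts standing in for the ES index and the HBase table (the field names and helper functions are illustrative, not a real client API):

```python
# Stand-ins for the two stores: "ES" holds only the searchable fields,
# "HBase" (keyed by row id) holds the full record.
es_index = []   # list of {"id", "name", "age"} docs
hbase = {}      # id -> full row with all fields

def write_row(row):
    """Write only the search fields to 'ES' and the full row to 'HBase'."""
    es_index.append({k: row[k] for k in ("id", "name", "age")})
    hbase[row["id"]] = row

def search(name=None, min_age=None):
    """Step 1: filter in 'ES'; step 2: fetch full rows from 'HBase' by id."""
    hits = [d for d in es_index
            if (name is None or d["name"] == name)
            and (min_age is None or d["age"] >= min_age)]
    return [hbase[d["id"]] for d in hits]

write_row({"id": 1, "name": "alice", "age": 30, "address": "...", "bio": "..."})
write_row({"id": 2, "name": "bob", "age": 25, "address": "...", "bio": "..."})
print(search(min_age=28))  # full rows for the matching ids
```

The point of the design: the search index stays small enough to live in the Filesystem Cache, while the bulky fields live in a store that is only ever hit with cheap id lookups.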

Data Warm-Up

Now suppose that even after following the scheme above, the amount of data written to each machine in the ES cluster is still roughly double what its Filesystem Cache can hold.

For example, you write 60 GB of data to a machine whose Filesystem Cache is 30 GB: 30 GB of data still stays on disk.

In that case you can warm the data up. Take Weibo as an example: you can have a background system pull up, ahead of time, the data of big-V accounts and other data that people view a lot.

Every so often, your backend system searches for this hot data itself, flushing it into the Filesystem Cache. Later, when real users request that hot data, it is served straight from memory — fast.

Or take e-commerce: for the products viewed most often — an iPhone 8, say — you can have a background program proactively query them every minute, flushing them into the Filesystem Cache.

For data you expect to be hot and frequently accessed, it is best to build a dedicated cache pre-warming subsystem.

That is, access the hot data ahead of time at regular intervals so that it lands in the Filesystem Cache; the next time someone requests it, performance will be much better.
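A minimal sketch of such a pre-warming loop. Here `run_query` is a stub standing in for the real search request (issuing the real query is what pulls the data into the OS cache); the item keys are made up for illustration:

```python
import time

touched = []

def run_query(item_id):
    """Stub for the real search; issuing it would pull the data into the cache."""
    touched.append(item_id)

HOT_ITEMS = ["iphone-8", "big-v-feed-42"]   # illustrative hot keys

def warm_up(rounds=1, interval_s=0.0):
    """Periodically re-query hot items so they stay in the Filesystem Cache."""
    for _ in range(rounds):
        for item in HOT_ITEMS:
            run_query(item)
        time.sleep(interval_s)

warm_up(rounds=2)
print(touched)  # each hot item queried twice
```

In production this would run as a scheduled job (cron, or a loop in a background service), with the hot-item list driven by access statistics.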

Hot/Cold Separation

ES can do something like MySQL's horizontal splitting: write the large volume of rarely accessed, low-frequency data to one index, and write the frequently accessed hot data to a separate index.

Ideally, cold data goes into one index and hot data into another. That way, once the hot data has been warmed up, it stays in the OS Filesystem Cache as much as possible, and the cold data cannot flush it out.

Suppose you have 6 machines and two indexes — one for cold data, one for hot — each with 3 shards. Put the hot index on 3 of the machines and the cold index on the other 3.

Most of the time you are accessing the hot index. Hot data might be only 10% of the total, so its volume is small and almost all of it stays in the Filesystem Cache, which guarantees very high access performance.

The cold data sits in the other index, on different machines from the hot index, so the two never interfere with each other.

If someone accesses cold data, much of it will be on disk and performance will be poor — but only 10% of requests hit cold data while 90% hit hot data, so it doesn't matter much.
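A trivial sketch of routing writes by heat at index time (the index names and the `is_hot` rule are made up for illustration):

```python
def is_hot(doc):
    """Illustrative heat rule: documents less than a week old count as hot."""
    return doc.get("age_days", 0) <= 7

def target_index(doc):
    """Route the document to the hot or the cold index."""
    return "orders_hot" if is_hot(doc) else "orders_cold"

print(target_index({"id": 1, "age_days": 2}))    # orders_hot
print(target_index({"id": 2, "age_days": 90}))   # orders_cold
```

The heat rule can be anything — recency, access counts, account tier — as long as the split keeps the hot index small enough to stay resident in the Filesystem Cache.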

Document Model Design

With MySQL, we often run complex join queries. How should you handle those in ES? Try not to use complex relational queries inside ES at all — once you do, performance is usually poor.

It is best to perform the association in your Java application first, and write the pre-joined data directly into ES. Then, at search time, you don't need ES's relational search syntax — join searches and the like — to fetch associated data.

Document model design matters a great deal. Avoid designs that force you to perform a complicated mess of operations at search time.

ES only supports so many operations; don't even consider making ES do things it handles badly. If you really need such an operation, try to take care of it at document-model-design time, during the write.

In particular, very complex operations such as join, nested, and parent-child searches should be avoided — their performance is poor.
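A minimal sketch of denormalizing at write time in application code — joining orders with their users before indexing, so that no join is needed at search time (all names are illustrative):

```python
# Source tables, as they might come from MySQL.
users = {1: {"user_id": 1, "user_name": "alice"},
         2: {"user_id": 2, "user_name": "bob"}}

orders = [{"order_id": 100, "user_id": 1, "amount": 30},
          {"order_id": 101, "user_id": 2, "amount": 50}]

def denormalize(order):
    """Join the order with its user in the application, before writing to ES."""
    doc = dict(order)
    doc["user_name"] = users[order["user_id"]]["user_name"]
    return doc

es_docs = [denormalize(o) for o in orders]
print(es_docs[0]["user_name"])  # alice -> searchable without any join
```

The trade-off is the usual one with denormalization: writes get slightly heavier and a user rename means reindexing that user's orders, but every search becomes a flat single-index query.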

Paging Performance Optimization

ES pagination is a notorious pitfall. Why? Here's an example: if you show 10 items per page and you want to fetch page 100, then the first 1,000 items stored on each shard all get pulled onto a coordinating node.

With 5 shards, that's 5,000 items; the coordinating node then merges and processes those 5,000 items to arrive at the final 10 items of page 100.

Because the system is distributed, you cannot fetch the 10 items of page 100 by asking each of the 5 shards for 2 items and merging them into 10 on the coordinating node, right?

Each shard has to return its first 1,000 items; the coordinating node then sorts, filters, and so on according to your query, and finally pages through the merged result to extract page 100.

The deeper you page, the more data each shard returns and the longer the coordinating node takes to process it. Very painful. So when you paginate with ES, you'll find that the deeper you go, the slower it gets.
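The cost of this from/size model can be sketched with simple arithmetic:

```python
def docs_pulled_to_coordinator(page, page_size, shards):
    """Each shard must return its first page*page_size docs to the coordinator."""
    return page * page_size * shards

print(docs_pulled_to_coordinator(page=100, page_size=10, shards=5))   # 5000
print(docs_pulled_to_coordinator(page=1000, page_size=10, shards=5))  # 50000
```

The work grows linearly with page depth, even though the user only ever sees 10 items — that is exactly why deep pages take seconds.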

We ran into this problem ourselves: paginating with ES, the first few pages took tens of milliseconds, but past page 10 or into the tens of pages, it basically took 5 to 10 seconds to fetch a single page of data.

What's the solution? Disallow deep paging (deep paging performs badly by default). Tell the product manager the system doesn't allow paging that deep: by default, the deeper you page, the worse the performance.

For feeds like an app's product recommendations that load page after page as you scroll down, or a Weibo-style timeline you pull down to load more of, you can use the Scroll API — look up the usage details online.

Scroll generates a one-time snapshot of all the data up front; then each subsequent page moves a cursor via scroll_id to fetch the next page, and the next, and so on. Performance is far higher than the pagination described above — basically milliseconds.

The only constraint is that it suits scroll-down, Weibo-style feeds; it cannot jump to an arbitrary page.

That is, you cannot open page 10, then jump to page 120, then go back to page 58 — no random page jumps.

That's why many products now don't let you jump to arbitrary pages: in some apps and sites, all you can do is scroll down, turning pages one by one.

When initializing a Scroll you must specify a parameter telling ES how long to keep the search context alive. You need to make sure users won't keep paging for hours on end, or the scroll may fail due to timeout.
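A minimal simulation of the scroll idea — a one-time snapshot plus a cursor advanced by a scroll id — using an in-memory list (this is the mechanism, not the real ES client API):

```python
snapshots = {}
_next_id = [0]

def open_scroll(data, page_size):
    """Take a one-time snapshot and return a scroll id pointing at its start."""
    _next_id[0] += 1
    sid = f"scroll-{_next_id[0]}"
    snapshots[sid] = {"data": list(data), "pos": 0, "size": page_size}
    return sid

def scroll(sid):
    """Return the next page and advance the cursor; empty list when exhausted."""
    s = snapshots[sid]
    page = s["data"][s["pos"]:s["pos"] + s["size"]]
    s["pos"] += s["size"]
    return page

sid = open_scroll(range(25), page_size=10)
print(scroll(sid))  # first 10 items
print(scroll(sid))  # next 10
print(scroll(sid))  # last 5
```

Because the snapshot is fixed at open time, later writes don't affect the pages you get back — which is also why ES has to hold the search context (and why it can time out).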

Besides the Scroll API, you can also use search_after. The idea of search_after is to use the results of the previous page to help retrieve the next page.

Obviously this doesn't let you jump to an arbitrary page either — you can only advance one page at a time. At initialization, it requires a field with unique values as the sort field.
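A minimal simulation of the search_after idea over an in-memory list sorted on a unique field (again, illustrating the mechanism rather than the real client API):

```python
docs = sorted(
    [{"id": i, "name": f"user{i}"} for i in range(1, 26)],
    key=lambda d: d["id"],   # sort on a field with unique values
)

def search_after(last_id=None, size=10):
    """Return the next page: the first `size` docs with id greater than last_id."""
    if last_id is None:
        return docs[:size]
    return [d for d in docs if d["id"] > last_id][:size]

page1 = search_after()
page2 = search_after(last_id=page1[-1]["id"])
print([d["id"] for d in page2])  # [11, ..., 20]
```

Unlike from/size, each shard only has to return `size` documents past the cursor value rather than everything before it, so the cost no longer grows with page depth.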

Author: Day parade flood programmers

Editor: Tao Jialong, Sun Shujuan

Source: https://zhuanlan.zhihu.com/p/60458049

·END·

The Programmer's Road to Growth

Though the road is long, keep walking and you will arrive.

This article was first published on the WeChat official account of the same name, "The Programmer's Road to Growth".

 


Origin: www.cnblogs.com/gdjk/p/10942262.html