How much data can Elasticsearch handle?

It is not meaningful to ask how much data ES can handle in isolation. In practice, business requirements usually stop you from growing the data further long before any hard limit is reached. The major considerations include the following:

1. Query speed. ES supports many query types: single-term matches, complex histogram aggregations, even text highlighting on the results of a bool query against parent-child documents. The larger the data volume, the longer these queries take. If, on the other hand, you only write data in and fetch it back by ID, data volume barely matters and you can just keep writing.

2. Write speed. The larger the data volume, the more likely write throughput will suffer. A typical business requirement is that one hour's worth of data must be written within one hour; if that cannot be met, splitting into multiple indexes or multiple clusters has to be considered.

3. Update speed. Same as above, except that an update is more expensive than a plain write: it is a get, then a merge, then an overwrite back to ES.

4. Other factors.
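As an illustration of point 1, here is roughly what such a query looks like in Elasticsearch's query DSL, built as a plain Python dict. This is a minimal sketch; the index and field names (`body`, `status`) are hypothetical:

```python
# A bool query with filtering and highlighting, expressed as an
# Elasticsearch query-DSL dict. Scoring and highlighting both get
# slower as the index grows.
query = {
    "query": {
        "bool": {
            "must": [{"match": {"body": "elasticsearch capacity"}}],
            "filter": [{"term": {"status": "published"}}],
        }
    },
    "highlight": {"fields": {"body": {}}},
    "size": 10,
}
```

By contrast, a pure ID lookup (`GET /<index>/_doc/<id>`) bypasses scoring entirely and stays fast regardless of index size, which is why the "just keep writing" case is so different.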
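The write-speed requirement in point 2 comes down to simple arithmetic: the cluster's sustained bulk throughput must meet or exceed the incoming data rate. A minimal sketch of that check, with entirely hypothetical numbers:

```python
def ingest_keeps_up(docs_per_hour, bulk_docs_per_request, requests_per_second):
    """Return True if sustained bulk throughput can absorb one hour
    of incoming data within one hour."""
    capacity_per_hour = bulk_docs_per_request * requests_per_second * 3600
    return capacity_per_hour >= docs_per_hour

# Hypothetical numbers: 500M docs/hour arriving, bulks of 1,000 docs,
# 200 bulk requests per second sustained across the cluster gives
# 720M docs/hour of capacity.
print(ingest_keeps_up(500_000_000, 1_000, 200))  # True
```

If the check fails, the options named above apply: split the write load across sub-indexes or sub-clusters.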
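The get-merge-overwrite cycle in point 3 can be sketched in pure Python, with a plain dict standing in for the index. This is only a model of why an update costs more than a write, not ES's actual implementation:

```python
def update_document(store, doc_id, partial):
    """Simulate a partial update: fetch the current source,
    merge in the new fields, write the result back."""
    current = store[doc_id]          # get
    merged = {**current, **partial}  # merge
    store[doc_id] = merged           # overwrite
    return merged

store = {"1": {"title": "ES capacity", "views": 10}}
update_document(store, "1", {"views": 11})
print(store["1"])  # {'title': 'ES capacity', 'views': 11}
```

Each update therefore pays for a read and a write, which is why update-heavy workloads degrade faster with data volume than append-only ones.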

At present, the ES clusters I have encountered need to support high-concurrency highlighted queries with average latency under 500 ms on 1.5 TB-2 TB of indexes. In our scenario, that data volume is not considered small.

Our project currently runs 32 nodes, with data only at the TB level. Problems occur occasionally, generally abnormal node connections caused by network issues; no other anomalies have been found.

Netflix's publicly released figures from last year put their total at more than 2,000 nodes, spread across multiple clusters of course. I personally rarely hear of companies that really run that many nodes in production. To supplement with public use cases:
