The evolution of ElasticSearch search under heavy traffic

This is the 27th highlights post from Mason (bysocket.com).

ES (ElasticSearch) is a distributed search engine. If "engine" sounds too abstract, think of it as something like MySQL: a data store. It conveniently provides the following features:

  • Near real-time search
  • Full-text search, structured search, statistical analysis

So where does the data stored in ES come from?

The answer is data synchronization. The following approaches are recommended:

  1. Data Transmission, a data service provided by Alibaba Cloud that supports data exchange between data sources such as RDBMS (relational databases), NoSQL, and OLAP. [Alibaba Cloud]
    https://help.aliyun.com/product/26590.html

  2. Youzan's exploration and practice of synchronizing orders at the hundred-million scale [from my buddy's team]
    https://mp.weixin.qq.com/s/33KACMxXkgzZyIL9m6q4YA

Back to the evolution of ES

First, the low-traffic stage

Back then, at a start-up, every sync was a full sync, and it was enough to run it as a scheduled task in the early hours of the morning. Data got into ES either through ES CRUD operations or directly through data synchronization.

A single-node pseudo-cluster was enough to run it. The full-text search approach was:

  • Use "phrase match" and set a minimum matching weight
  • Where do the phrases come from? Segment the input with the IK tokenizer
  • Use Filter to implement filtering
  • Use Pageable to implement paging and sorting
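
The approach above can be sketched as a function that builds a search request body. This is only an illustration, not the code from the blog series: it assumes the phrases have already been produced by the IK tokenizer, it interprets the "minimum matching weight" as the `min_score` setting of the search request, and the function name, field names, and filter values are all hypothetical.

```python
from typing import Any


def build_search_body(
    phrases: list[str],
    field: str,
    filters: dict[str, Any],
    page: int = 0,
    size: int = 10,
    min_score: float = 1.0,
) -> dict[str, Any]:
    """Build a search body: phrase matching + filtering + paging (a sketch)."""
    # One match_phrase clause per phrase produced by the tokenizer.
    should = [{"match_phrase": {field: {"query": p}}} for p in phrases]
    # Term filters narrow the result set without affecting the score.
    filter_clauses = [{"term": {k: v}} for k, v in filters.items()]
    return {
        "query": {
            "bool": {
                "should": should,
                "minimum_should_match": 1,  # at least one phrase must match
                "filter": filter_clauses,
            }
        },
        "min_score": min_score,  # assumed reading of "minimum matching weight"
        "from": page * size,     # zero-based page number, as in Pageable
        "size": size,
        "sort": ["_score"],
    }


if __name__ == "__main__":
    import json

    body = build_search_body(["elastic", "search"], "content", {"status": "published"})
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of a `_search` request with whichever client the project already uses.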

For details, see the ES series on my blog and on GitHub.

Second, traffic slowly grows

The estimated order of magnitude at this stage is millions of records to synchronize and query.

A single-node pseudo-cluster is no longer enough. This volume is handled at the operations level:

  • An ElasticSearch cluster is a group of running ElasticSearch instances (nodes)
  • Scale horizontally by adding more nodes to the cluster

How to scale horizontally

The number of primary shards is fixed when the index is created. Read operations, however, can be handled by either a primary shard or a replica shard, so the more replica shards there are, the higher the read throughput. Naturally, more hardware is needed to support that throughput: adding replicas alone on the same set of nodes does not improve performance, because each shard gets a smaller share of the resources. The number of replica shards can be adjusted dynamically so the cluster scales on demand, for example by raising the number of replicas from the default of 1 to 2:

PUT /blogs/_settings
{
  "number_of_replicas" : 2
}
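
To see why extra replicas only help when hardware is added too, here is a small arithmetic sketch. The figure of 3 primary shards and the node counts are assumptions chosen for illustration; the helper names are hypothetical.

```python
def total_shard_copies(primaries: int, replicas: int) -> int:
    """Each primary shard has `replicas` copies, so count primaries plus copies."""
    return primaries * (1 + replicas)


def shards_per_node(primaries: int, replicas: int, nodes: int) -> float:
    """Average number of shard copies each node has to host."""
    return total_shard_copies(primaries, replicas) / nodes


if __name__ == "__main__":
    # Assumed example: an index with 3 primary shards.
    print(total_shard_copies(3, 1))   # replicas = 1
    print(total_shard_copies(3, 2))   # replicas = 2
    # Same 3 nodes: every node now hosts more shards, so each shard gets less.
    print(shards_per_node(3, 1, 3), shards_per_node(3, 2, 3))
    # Growing to 9 nodes gives every shard copy a node of its own.
    print(shards_per_node(3, 2, 9))
```

With these assumed numbers, going from 1 to 2 replicas raises the total from 6 to 9 shard copies. On the same 3 nodes that means 3 copies per node instead of 2, which is why the extra replicas need extra nodes to pay off.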

Basically, one cluster serves search for every line of business: orders, products, and so on.


Third, order traffic suddenly surges

Suddenly a problem shows up:

  • Slow queries against a large index in cluster A affect the other, smaller indexes in cluster A.

For example, the orders index has now grown large and its queries are slow, which drags down other businesses. That should not happen, so what should be done?

The answer is physical isolation into multiple clusters:

  • Split into multiple clusters: the orders cluster and the products cluster are isolated from each other
  • Multi-machine support

At this point another question usually comes up: how do you upgrade a single line of business?

Take an index named project that stores project-related data. Its data volume keeps growing, to the hundred-million scale and then the trillion scale. Queries and other operations on such a large index will hit a bottleneck. How can it be optimized at this point?

Solution: hot/cold separation and splitting

Splitting a large index is not hard. Much like shard routing, the routing rule can be defined according to the specific business.

Here we can define 1000 indexes, named project_1, project_2, project_3 ...

Then put a simple proxy layer on top of the ES cluster. The core business routing rule inside it can be:

project_id: the project's auto-increment ID
index_id: the ID of the corresponding index

index_id = project_id % 1000
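
The routing rule can be sketched as a small function of the kind a proxy layer might call. This is a minimal illustration with hypothetical names (`route_index`, `INDEX_COUNT`). Note that the modulo yields values from 0 to 999, so the sketch assumes the indexes are named project_0 through project_999; shift the result by one if the naming starts at project_1.

```python
INDEX_COUNT = 1000  # number of sub-indexes the large index is split into


def route_index(project_id: int, index_count: int = INDEX_COUNT, prefix: str = "project") -> str:
    """Map a project's auto-increment ID to the name of the sub-index holding it."""
    if project_id < 0:
        raise ValueError("project_id must be non-negative")
    index_id = project_id % index_count
    return f"{prefix}_{index_id}"


if __name__ == "__main__":
    for pid in (1, 999, 1000, 2345):
        print(pid, "->", route_index(pid))
```

Because the rule is a pure function of project_id, reads and writes for the same project always land on the same sub-index. The trade-off is that changing the index count later means re-routing existing data.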

The overall architecture has three parts:

  • ES proxy layer: maps the logical overall index to the real sub-indexes
  • ES index configuration management: maps each business to its indexes
  • ES cluster

Hot/cold separation is similar: the hottest data, which is still in an intermediate state, gets its own cluster and its own index. Data that has reached its final state is deleted from it periodically. The index then holds little data and supports search queries very well. Why not?
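
The periodic cleanup of final-state data can be sketched as a request against the `_delete_by_query` API. The index name `project_hot`, the `status` field, and the state values are assumptions for illustration; the function only builds the request and leaves sending it to whatever client and scheduler the project uses.

```python
from typing import Any


def build_purge_request(
    hot_index: str,
    final_states: list[str],
    status_field: str = "status",
) -> tuple[str, str, dict[str, Any]]:
    """Build (method, path, body) for deleting final-state documents from the hot index."""
    # Match every document whose status is one of the final states.
    body = {"query": {"terms": {status_field: final_states}}}
    return "POST", f"/{hot_index}/_delete_by_query", body


if __name__ == "__main__":
    method, path, body = build_purge_request("project_hot", ["finished", "cancelled"])
    print(method, path)
    print(body)
```

Run on a schedule, a request like this keeps only intermediate-state documents in the hot index, which is what keeps it small.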

(End)


Origin www.cnblogs.com/Alandre/p/11130898.html