Advertising architecture: optimizing the inverted index

Inverted index architecture

In an advertising system the inverted index plays a vital role: when a request arrives, the right ads must be matched from the inverted index based on targeting information. Our inverted index is built on Elasticsearch (referred to below as ES). We chose it because its community is active and the surrounding components for data collection, visualization, monitoring, and alerting are mature. ES is also written in Java, which makes tuning and custom development relatively easy for us.

Here is the architecture diagram of our inverted index.

The architecture was not designed this way from the start. It evolved into the form shown above through the following iterations.

Index problems and optimizations

Single-node stability

Multi-node deployment

Builder A and builder B are two nodes, one primary and one backup. They compete for a lock (implemented with ZooKeeper) to decide which one is the primary.
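The lock competition can be sketched as follows. This is only an illustration: `LockService` is a hypothetical in-memory stand-in for the ZooKeeper lock, not the ZooKeeper client API, and the `Builder` class is an assumption about how a node might check its role.

```python
import threading


class LockService:
    """Hypothetical in-memory stand-in for a ZooKeeper lock (not the real client API)."""

    def __init__(self):
        self._guard = threading.Lock()
        self.holder = None

    def try_acquire(self, node):
        # Only one node can hold the lock at a time.
        with self._guard:
            if self.holder is None:
                self.holder = node
            return self.holder == node

    def release(self, node):
        # Models the lock disappearing when the holder goes away.
        with self._guard:
            if self.holder == node:
                self.holder = None


class Builder:
    def __init__(self, name, locks):
        self.name = name
        self.locks = locks

    def is_primary(self):
        # Whoever holds the lock is the primary; everyone else is a backup.
        return self.locks.try_acquire(self.name)


locks = LockService()
a, b = Builder("A", locks), Builder("B", locks)
roles = (a.is_primary(), b.is_primary())  # A asks first and wins, B stays backup
locks.release("A")                        # A goes away
takeover = b.is_primary()                 # B now acquires the lock
```

With a real ZooKeeper deployment the release step would normally happen automatically when the primary's session ends, which is what lets the backup take over without manual intervention.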

Multiple nodes introduce data inconsistency

With multiple producers and multiple consumers, message ordering becomes a problem

Make the messages stateless

On every message, query the database for the latest data (orders and creatives are updated infrequently, so this puts little pressure on the database)
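A minimal sketch of why stateless messages sidestep the ordering problem. The `database` and `index` dictionaries and the `ad_id` field are hypothetical stand-ins for the real store, the ES index, and the message format.

```python
import itertools

# Hypothetical stand-ins: `database` is the source of truth, `index` plays the ES index.
database = {"ad-1": {"budget": 300}, "ad-2": {"budget": 50}}


def handle(message, index):
    # The message carries only the ID of what changed, never the new values.
    ad_id = message["ad_id"]
    latest = database.get(ad_id)
    if latest is None:
        index.pop(ad_id, None)       # the record was deleted in the database
    else:
        index[ad_id] = dict(latest)  # always write the current state


def replay(messages):
    index = {}
    for message in messages:
        handle(message, index)
    return index


messages = [{"ad_id": "ad-1"}, {"ad_id": "ad-2"}, {"ad_id": "ad-1"}]
# Whatever order the consumers see the messages in, the index ends up the same.
results = [replay(order) for order in itertools.permutations(messages)]
```

Because each message only says "this ID changed", a late or duplicated message simply rewrites the same current state.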

Exceptions lead to inconsistent data

Use retries (the operations are idempotent) and a scheduled task that reprocesses failures
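One way to picture the retry path, assuming index writes are upserts keyed by ID so that repeating them is harmless. The helper name, the attempt limit, and the `failed` list (which a scheduled task would drain later) are all illustrative assumptions.

```python
def process_with_retry(task, handler, max_attempts=3, failed=None):
    """Try handler(task) up to max_attempts times; park the task in `failed` otherwise."""
    failed = [] if failed is None else failed
    for _ in range(max_attempts):
        try:
            handler(task)
            return True
        except Exception:
            continue
    failed.append(task)  # a scheduled job would pick these up later
    return False


index = {}
calls = {"n": 0}


def flaky_upsert(task):
    # Fails twice, then succeeds, to simulate a transient error.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    index[task["id"]] = task["doc"]  # upsert by ID: repeating it changes nothing


failed = []
ok = process_with_retry({"id": "ad-1", "doc": {"budget": 300}}, flaky_upsert, failed=failed)
```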

A full index update affects online search

Use a standby index

Standby index switch process: update the standby index -> verify the standby index -> switch primary and standby -> update the primary index
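The text does not say how the switch itself is performed. One common option, assumed here, is to have queries go through an index alias and move that alias with the ES `_aliases` API. The index names, the alias name, and the `verify` rule below are hypothetical.

```python
import json


def swap_alias_body(alias, old_index, new_index):
    """Request body for POST /_aliases that moves `alias` in a single call."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def verify(doc_count, expected_count, tolerance=0.05):
    # Hypothetical check: the standby must hold roughly the expected number of docs.
    return abs(doc_count - expected_count) <= tolerance * expected_count


body = swap_alias_body("ads", "ads_a", "ads_b")
print(json.dumps(body, indent=2))
```

Both actions are sent in one request, so searches against the alias never see a moment where it points at no index.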

Index query and rebuild: problems and optimizations

In ES load tests the QPS was low, CPU load was high, young GC (YGC) was frequent, and rebuilding the index was slow

We approached this from two directions: queries and rebuilds

Query

YGC ran once per second with a stop-the-world (STW) pause of about 10 ms, which hurts a low-latency system considerably

Adjust -Xmn from 3g to 7g; after the adjustment YGC ran once every 10 s with an STW pause of about 12 ms

Before the adjustment, YGC was frequent and hurt the low-latency system, so we wanted to lengthen the interval between YGCs and reduce latency jitter. YGC uses a copying algorithm: each collection scans the young generation for live objects and copies them. Scanning costs far less than copying, so YGC time depends mainly on the number of live objects. As long as object lifetimes do not change much, YGC time will not change much either.

After the adjustment, the interval between YGCs grew substantially, while GC time did not grow linearly with the size of the young generation
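To make the effect concrete, the share of time spent in STW pauses can be computed from the figures quoted above. The function name is just for illustration.

```python
def stw_fraction(pause_ms, interval_s):
    """Share of wall-clock time spent in stop-the-world young GC pauses."""
    return pause_ms / (interval_s * 1000.0)


before = stw_fraction(10, 1)   # -Xmn 3g: one 10 ms pause every second
after = stw_fraction(12, 10)   # -Xmn 7g: one 12 ms pause every 10 seconds
print(f"before: {before:.2%}, after: {after:.2%}")  # before: 1.00%, after: 0.12%
```

The individual pause gets slightly longer, but requests run into a pause roughly ten times less often.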

Adjust the number of shards and replicas to reduce thread overhead and IO

ES defaults to 5 shards. By default the index is spread across nodes so that each node holds only part of it, so a single request has to merge data from several nodes, which multiplies the number of IO operations

As shown in the figure, suppose there are three nodes and two primary shards, each with one replica. When a query arrives:

The query process is roughly as follows: node3 receives the request first. It may forward the request to a copy of shard 0 (R0 or P0) on node2 or node1, then gathers the retrieved data back on node3, and finally returns the result. Inside each index the data is stored in multiple segments, and the segments are queried serially

Our scenario has a high request volume and a small index (under 100 MB), so we set the number of primary shards to 1 and the number of replicas to the number of nodes minus 1. This ensures that every node stores the whole index, so a query needs only one IO operation, as shown in the figure.
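The shard layout described above can be expressed as index settings. A small sketch follows; the function name is hypothetical, while `number_of_shards` and `number_of_replicas` are the standard ES setting names.

```python
def full_copy_settings(node_count):
    """Index settings that place a complete copy of the index on every node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return {
        "settings": {
            "number_of_shards": 1,                 # one primary shard holds the whole index
            "number_of_replicas": node_count - 1,  # one copy for each remaining node
        }
    }


settings = full_copy_settings(3)
```

For a three-node cluster this yields one primary and two replicas, so any node can answer a query from its own copy.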

ES (Lucene) reads all segments serially

Index updates increase the number of segments, and ES queries segments serially, so a task scheduled every minute uses _forcemerge to reduce the index to 1 segment
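The scheduled merge boils down to one request per run. The sketch below only builds the request path; the index name `ads` is an assumption, and the scheduler and HTTP client are left out.

```python
from urllib.parse import urlencode


def forcemerge_path(index, max_num_segments=1):
    """Path for POST <index>/_forcemerge, merging down to max_num_segments segments."""
    return f"/{index}/_forcemerge?" + urlencode({"max_num_segments": max_num_segments})


path = forcemerge_path("ads")  # "/ads/_forcemerge?max_num_segments=1"
```

Merging this often is affordable here only because the index is small; on a large index a forced merge is an expensive operation.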

Profiling for hot methods showed that JSON deserialization accounted for 50% of CPU time

Disable _source and keep only the necessary fields as stored fields

Prefer the local node for queries
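A mapping along these lines might look like the sketch below. The field names and types are hypothetical, and the body uses the typeless mapping shape; ES versions that still have mapping types nest `_source` and `properties` under the type name.

```python
def stored_only_mapping(stored, indexed_only):
    """Mapping body with _source disabled and selected fields marked as stored."""
    props = {}
    for name, field_type in stored.items():
        props[name] = {"type": field_type, "store": True}  # retrievable via stored_fields
    for name, field_type in indexed_only.items():
        props[name] = {"type": field_type}                 # searchable only
    return {"mappings": {"_source": {"enabled": False}, "properties": props}}


# Hypothetical fields: the ad ID is returned to the caller, targeting is only searched on.
mapping = stored_only_mapping({"ad_id": "keyword"}, {"target_city": "keyword"})
search_body = {"stored_fields": ["ad_id"], "query": {"term": {"target_city": "beijing"}}}
```

With `_source` disabled, a search has to request the fields it needs through `stored_fields`, and the full document can no longer be fetched back from ES.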

Set preference: _local
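As a small illustration, the preference is passed as a query-string parameter on the search request. The index name is an assumption.

```python
from urllib.parse import urlencode


def local_search_path(index):
    """Search path that asks ES to prefer shard copies on the receiving node."""
    return f"/{index}/_search?" + urlencode({"preference": "_local"})


search_path = local_search_path("ads")  # "/ads/_search?preference=_local"
```

This pairs naturally with keeping a full copy of the index on every node, since the receiving node can then always serve the query itself.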

Rebuild

Before a full rebuild, turn off the replica shards and disable real-time refresh

number_of_replicas: 0, refresh_interval: -1

This reduces the cost of index synchronization during the rebuild
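The settings change can be sketched as two bodies for PUT <index>/_settings, one applied before the rebuild and one after. The restore values are assumptions: replicas go back to the number of nodes minus 1, and `refresh_interval` returns to "1s", the ES default.

```python
def rebuild_settings(node_count, refresh_interval="1s"):
    """Return (before, after) bodies for PUT <index>/_settings around a full rebuild."""
    before = {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}
    after = {
        "index": {
            "number_of_replicas": node_count - 1,  # assumed restore value
            "refresh_interval": refresh_interval,  # assumed restore value
        }
    }
    return before, after


before_rebuild, after_rebuild = rebuild_settings(3)
```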

Rebuild the index in batches

Use the bulk API to rebuild the index in batches, which improves index build performance
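For reference, the bulk API takes newline-delimited JSON: an action line followed by the document, with a trailing newline at the end. The sketch below builds such a body and splits the documents into batches; the index name, document shape, and batch size are illustrative assumptions.

```python
import json


def bulk_body(index, docs):
    """Build an NDJSON body for POST /_bulk from (doc_id, doc) pairs."""
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


docs = [(str(i), {"ad_id": str(i)}) for i in range(5)]
bodies = [bulk_body("ads", chunk) for chunk in batches(docs, 2)]
```

Each body would be sent as one request, so the per-request overhead is paid once per batch rather than once per document.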

Postscript

Some of the approaches we use are neither common in the industry nor the recommended practice, but they fit our business. So be sure to design for your own team's business: there is no best solution, only the one that fits better.