Reading notes - How does Alibaba reach a million TPS at second-level latency? A walkthrough of the search offline big data platform architecture

Original Address https://mp.weixin.qq.com/s?__biz=MzIzOTU0NTQ0MA==&mid=2247488245&idx=1&sn=1c70a32f11da7916cb402933fb65dd9f&chksm=e9292ffade5ea6ec7c6233f09d3786c75d02b91a91328b251d8689e8dd8162d55632a3ea61a1&scene=21#wechat_redirect

 

TPS: system throughput, i.e. the number of requests / transactions processed per second.

  Search offline: the systems that convert and process data from various sources and import it into "online" serving systems, such as the search engine, are collectively called the "offline" system.

 

I honestly did not understand the rest.


★ HBase-based storage architecture

Around 2012, search offline introduced HBase as its data storage engine. HBase has strongly supported the search business throughout its evolution from the Taobao main search to an offline platform, and after several Double 11 events its stability and performance have been clearly proven. Functionally, search offline adopted HBase mainly for the following reasons:

  1. Data can be read in batch or row by row through Scan / Get, and written in batch or row by row through bulkload / put. This matches the full / incremental model of search exactly, so HBase is a natural fit for the search offline business.

  2. The underlying storage is HDFS, and the LSM-Tree architecture ensures data safety. Separating compute from storage lets the cluster scale horizontally, which makes it easy to raise overall throughput. Single-node performance optimizations (Async, BucketCache, Handler layering, Offheap) together with cluster expansion ensured that storage never became the system bottleneck, even when the business grew substantially.

  3. The schema-free design copes well with frequently changing business data, and makes it easy to support the data logic of some special business scenarios.
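To make the first and third points concrete, here is a minimal sketch in Python. It is not the HBase client API: `ToyTable`, its method names, and the `item#...` row keys are all hypothetical, chosen only to mirror the four access paths named above (bulkload / put for writes, Scan / Get for reads) and to show rows carrying different columns without a fixed schema.

```python
from bisect import bisect_left, insort


class ToyTable:
    """In-memory stand-in for a table ordered by row key (not the HBase API)."""

    def __init__(self):
        self._keys = []   # row keys kept sorted so range scans are cheap
        self._rows = {}   # row key -> {column name: value}; no fixed schema

    def put(self, key, columns):
        """Incremental write: merge columns into a single row."""
        if key not in self._rows:
            insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key].update(columns)

    def bulk_load(self, rows):
        """Full write: import many rows at once, sorting the keys only once."""
        for key, columns in rows.items():
            self._rows.setdefault(key, {}).update(columns)
        self._keys = sorted(self._rows)

    def get(self, key):
        """Single-row read; returns None when the row does not exist."""
        row = self._rows.get(key)
        return dict(row) if row is not None else None

    def scan(self, start="", stop=None):
        """Batch read: yield (key, row) in key order for start <= key < stop."""
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and (stop is None or self._keys[i] < stop):
            key = self._keys[i]
            yield key, dict(self._rows[key])
            i += 1


table = ToyTable()
# Full build: import a complete snapshot in one batch.
table.bulk_load({"item#002": {"title": "shoes"}, "item#001": {"title": "phone"}})
# Incremental updates: one row gains a column, and a new row carries a
# column that no other row has (schema-free).
table.put("item#001", {"price": 2999})
table.put("item#003", {"title": "book", "promo_tag": "double11"})

full_dump = list(table.scan())       # batch read, as a full build would do
single_row = table.get("item#001")   # single-row read, as an incremental job would do
```

The same table serves both the full path (batch in, batch out) and the incremental path (row in, row out), which is the property the first point describes. A real deployment would of course go through the HBase client and HDFS rather than a Python dict.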

By introducing HBase as the internal data store of the offline system, we solved the problem of the daily full build putting heavy pressure on the upstream MySQL, and significantly raised the overall throughput of the system. Storing the data in HBase was also the basis for moving the full-build tasks to a streaming pipeline (MR -> Stream), which laid the groundwork for the later incubation and growth of the Blink stream engine in search offline.

Of course, HBase is not without shortcomings. The chronic problems of JVM memory management, avalanches caused by the Handlers on a single node being saturated, and the lack of containerized deployment have all caused considerable trouble. We will soon replace HBase with another storage engine developed inside Alibaba, and expect it to solve some of these problems.


Origin www.cnblogs.com/0710whh/p/11028132.html