Disclaimer: This article is based on a slide deck (PPT) originally shared in the HBase technology community; I have organized and refined it.
Conference slide decks like this one are a good way to learn about technical solutions and about the experience others gained while putting them into practice. I hope it helps.
Background
Kuaishou generates tens of billions of user-feature records every day. Analysts need to work across hundreds of billions of feature records covering 30 to 90 days, select any combination of dimensions (e.g. city = Beijing & sex = male), and analyze user behavior with second-level latency. To meet this demand, Kuaishou independently developed BitBase on top of HBase: a fast computation service that supports bitmap conversion, storage, indexing, and analysis. It has been successfully applied to business scenarios such as retention analysis, user growth, advertising and marketing, and A/B testing.
Business needs and challenges
The requirement Kuaishou ran into in its actual business was this: over logs at the hundred-billion scale, select any dimension, calculate 7- to 90-day user retention, and return the result within seconds.
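To make the requirement concrete, here is a minimal sketch of a retention calculation over per-day sets of users. It is not BitBase's implementation: it assumes each user has already been mapped to a small integer index and uses a plain Python integer as a bit set (bit i set means user i is present). The names `to_bitmap`, `count`, and `retention` are hypothetical.

```python
def to_bitmap(indexes):
    """Build a bit set (stored in a Python int) from integer user indexes."""
    bm = 0
    for i in indexes:
        bm |= 1 << i
    return bm


def count(bm):
    """Number of users in the bit set."""
    return bin(bm).count("1")


def retention(new_users, active_by_day, dimension_filter, days):
    """Share of the filtered day-0 cohort that is active again on each day.

    new_users        -- bit set of users first seen on day 0
    active_by_day    -- dict: day offset -> bit set of users active that day
    dimension_filter -- bit set of users matching the chosen dimensions
    """
    cohort = new_users & dimension_filter
    base = count(cohort)
    result = {}
    for d in days:
        retained = cohort & active_by_day.get(d, 0)
        result[d] = count(retained) / base if base else 0.0
    return result


# Toy data (assumed): users 0..5 are new on day 0, four of them match the filter.
new_users = to_bitmap([0, 1, 2, 3, 4, 5])
beijing_male = to_bitmap([0, 1, 2, 3, 8])
active = {7: to_bitmap([0, 1, 9]), 30: to_bitmap([3])}

rates = retention(new_users, active, beijing_male, days=[7, 30, 90])
```

With this toy data the cohort is users 0 to 3, so `rates` is `{7: 0.5, 30: 0.25, 90: 0.0}`. Each retention figure is one intersection plus one count per day, which is why the cost depends on the size of the bit sets rather than on the number of log lines.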
Technology Selection
To this end, Kuaishou evaluated a variety of technologies, including Hive, ES (Elasticsearch), and ClickHouse.
Technical solutions
The final result was BitBase, a solution based on bitmaps and HBase.
Readers who are not familiar with bitmaps can start here: https://www.jianshu.com/p/bf9dbbc147ed
A bitmap uses a single bit per element to mark a value, and the element itself serves as the key (it determines the position of the bit). Because data is stored in units of bits, a bitmap can save a great deal of storage space.
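A minimal sketch of the idea, assuming elements are non-negative integers below a known upper bound. The `Bitmap` class is hypothetical and uncompressed; it only illustrates one bit per element and says nothing about how BitBase encodes its bitmaps.

```python
class Bitmap:
    """Fixed-size bitmap: bit i is 1 when element i is present."""

    def __init__(self, size):
        self.size = size
        # One bit per possible element, rounded up to whole bytes.
        self.bits = bytearray((size + 7) // 8)

    def add(self, i):
        # Byte i // 8 holds the bit; i % 8 selects the bit inside that byte.
        self.bits[i >> 3] |= 1 << (i & 7)

    def contains(self, i):
        return bool(self.bits[i >> 3] & (1 << (i & 7)))

    def nbytes(self):
        return len(self.bits)


users = Bitmap(1_000_000)
for user_index in (3, 42, 999_999):
    users.add(user_index)
```

Here membership for one million possible users fits in 125,000 bytes, whereas storing each present user as an 8-byte long would cost 8 bytes per user.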
Multidimensional computation then comes down to operations between bitmaps: AND, OR, NOT, XOR, count, and list.
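The six operations can be sketched with Python integers standing in for bitmaps (bit i set means user i has the feature). This is only an illustration under assumed toy data: the helper names are hypothetical, and NOT is taken relative to an assumed universe of 8 users.

```python
def to_bitmap(indexes):
    """Build a bitmap (as a Python int) from integer user indexes."""
    bm = 0
    for i in indexes:
        bm |= 1 << i
    return bm


def count(bm):
    """count: how many users are in the bitmap."""
    return bin(bm).count("1")


def to_list(bm):
    """list: the user indexes in the bitmap, in ascending order."""
    out, i = [], 0
    while bm:
        if bm & 1:
            out.append(i)
        bm >>= 1
        i += 1
    return out


UNIVERSE = 8                      # assumed total number of users
ALL_USERS = (1 << UNIVERSE) - 1   # mask with one bit per user

city_beijing = to_bitmap([0, 1, 2, 5])
sex_male = to_bitmap([1, 2, 3, 6])

both = city_beijing & sex_male           # AND: Beijing and male
either = city_beijing | sex_male         # OR: Beijing or male
not_beijing = ALL_USERS & ~city_beijing  # NOT: everyone outside Beijing
exactly_one = city_beijing ^ sex_male    # XOR: matches exactly one condition
```

For this data, `to_list(both)` is `[1, 2]` and `count(either)` is `6`. A query such as city = Beijing & sex = male is a single AND between two precomputed bitmaps followed by a count.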
BitBase as a whole
Overall structure:
Storage module:
All of the original table information is stored here as bitmaps. Different data is held in different bitmaps, and the number of bits in each bitmap is determined by the size of the table's data.
Calculation module:
The deviceId problem
In practice, each complex deviceId has to be converted into an index value (a long). The conversion needs the following properties: the indexes are continuous, the mapping is consistent, it can be reversed, and it is fast.
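One way to picture a mapping with the first three properties is a dictionary encoder that hands out the next free integer to each new deviceId. The sketch below is a single-process, in-memory illustration only; the `DeviceIdIndex` class is hypothetical, and a production system would need persistence and concurrency control that are not shown here.

```python
class DeviceIdIndex:
    """In-memory deviceId <-> index mapping (illustrative only)."""

    def __init__(self):
        self._to_index = {}    # deviceId -> index
        self._to_device = []   # index -> deviceId

    def encode(self, device_id):
        """Return the index for device_id, allocating the next one if new."""
        idx = self._to_index.get(device_id)
        if idx is None:
            # Continuous: new ids get 0, 1, 2, ... with no gaps.
            idx = len(self._to_device)
            self._to_index[device_id] = idx
            self._to_device.append(device_id)
        return idx

    def decode(self, index):
        """Reverse lookup: recover the deviceId from its index."""
        return self._to_device[index]


mapping = DeviceIdIndex()
a = mapping.encode("ANDROID_5f2c9e")   # made-up ids for illustration
b = mapping.encode("IOS_a81d04")
a_again = mapping.encode("ANDROID_5f2c9e")
```

Here `a` is 0, `b` is 1, and `a_again` is 0 again, so the same deviceId always lands on the same bit. Continuous indexes matter because they keep bitmaps dense: the highest index, and therefore the bitmap length, stays close to the number of distinct devices.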
Technical solution for continuity, consistency, and reversibility
How to achieve fast conversion
Business results
In practice, a 90-day retention query can be returned within 10 seconds.
Service Status:
Future plans
Future plans include:
- Importing offline bitmaps within 5 minutes
- SQL support
- Open source