Application and Practice of HBase in Kuaishou's Hundred-Billion-Scale User Feature Data Analysis

Disclaimer: This article is based on an original PPT talk shared in the HBase technical community, organized and condensed by the author.
Note: talks like this are a good way to learn about technical solutions and how others have applied them in practice. I hope it helps.

Background

Kuaishou generates tens of billions of user feature records every day. Analysts need to take 30 to 90 days of this data (hundreds of billions of feature records), select any combination of dimensions (e.g. city = Beijing & gender = male), and analyze user behavior with second-level latency. To meet this need, Kuaishou built BitBase, an in-house service based on HBase that supports bitmap conversion, storage, indexing, and fast computation. It has been successfully applied to retention analysis, user growth, advertising and marketing, A/B testing, and other business scenarios.

Business needs and challenges

A typical requirement from Kuaishou's actual business: over logs at the hundred-billion-row scale, select any dimensions, compute 7- to 90-day user retention, and return results within seconds.
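For reference, N-day retention is usually defined as the share of users active on a start day who are active again N days later. The minimal sketch below, using plain Python sets, assumes day-indexed activity sets and a cohort already filtered by the selected dimensions; all names and data are illustrative, not from the talk.

```python
from typing import Dict, Set


def retention_rate(active_by_day: Dict[int, Set[int]], cohort: Set[int],
                   day0: int, n: int) -> float:
    """Share of cohort users active on day0 who are active again on day0 + n."""
    base = active_by_day.get(day0, set()) & cohort
    if not base:
        return 0.0  # avoid dividing by zero when nobody qualifies on day0
    returned = base & active_by_day.get(day0 + n, set())
    return len(returned) / len(base)


# Hypothetical data: user IDs active per day, and a cohort such as city = Beijing & gender = male.
active = {0: {1, 2, 3, 4}, 7: {2, 4, 5}}
cohort = {1, 2, 4, 5}
print(retention_rate(active, cohort, day0=0, n=7))  # 2 of the 3 base users returned
```

At the scale described above, the expensive parts are the set intersections and counts, which is exactly what bitmaps make cheap.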


Technology selection

To this end, Kuaishou evaluated a range of technologies, including Hive, ES (Elasticsearch), and ClickHouse.


Technical solutions

The resulting solution, BitBase, is built on bitmaps and HBase.


Readers unfamiliar with bitmaps can start here: https://www.jianshu.com/p/bf9dbbc147ed

A bitmap (bit-map) uses a single bit to mark the value for an element, and the element itself is the key, i.e. the position of that bit. Because data is stored in units of bits, this saves a great deal of storage space.
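To make the idea concrete, here is a minimal Python sketch of an uncompressed bitmap, assuming elements have already been mapped to dense non-negative integer indexes. The class and method names are illustrative.

```python
class Bitmap:
    """Fixed-size bitmap: bit i is 1 when element i is present."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.data = bytearray((size + 7) // 8)  # one bit per possible element

    def _check(self, i: int) -> None:
        if not 0 <= i < self.size:
            raise IndexError(f"element {i} outside 0..{self.size - 1}")

    def set(self, i: int) -> None:
        self._check(i)
        self.data[i >> 3] |= 1 << (i & 7)  # byte i // 8, bit i % 8

    def get(self, i: int) -> bool:
        self._check(i)
        return bool(self.data[i >> 3] & (1 << (i & 7)))


users = Bitmap(10_000_000)  # 10 million possible user indexes
users.set(42)
print(len(users.data))  # 1250000 bytes (1.25 MB)
print(users.get(42), users.get(43))  # True False
```

By comparison, listing all 10 million indexes as 8-byte longs would take 80 MB, while the bitmap costs 1.25 MB no matter how many bits are set.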

Multidimensional computation is then carried out as AND, OR, NOT, XOR, count, and list operations on bitmaps.
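A minimal sketch of these six operations, using Python's arbitrary-precision `int` as an uncompressed bitmap. A production system would more likely use a compressed format such as Roaring bitmaps; the talk does not say which format BitBase uses. Function names and data are illustrative.

```python
from typing import Iterable, List


def from_ids(ids: Iterable[int]) -> int:
    """Build a bitmap (a Python int) with bit i set for every id i."""
    bm = 0
    for i in ids:
        bm |= 1 << i
    return bm


def count(bm: int) -> int:
    """Number of set bits (cardinality)."""
    return bin(bm).count("1")  # int.bit_count() also works on Python 3.10+


def to_list(bm: int) -> List[int]:
    """Expand a bitmap back into the sorted list of ids it contains."""
    return [i for i in range(bm.bit_length()) if bm >> i & 1]


def not_(bm: int, universe_size: int) -> int:
    """NOT is only meaningful relative to a universe of ids 0..universe_size-1."""
    return ~bm & ((1 << universe_size) - 1)


# Hypothetical per-dimension bitmaps over users 0..9.
beijing = from_ids([1, 2, 5, 8])
male = from_ids([2, 3, 5, 9])

print(to_list(beijing & male))  # AND: [2, 5]
print(to_list(beijing | male))  # OR: [1, 2, 3, 5, 8, 9]
print(to_list(beijing ^ male))  # XOR: [1, 3, 8, 9]
print(to_list(not_(male, 10)))  # NOT: [0, 1, 4, 6, 7, 8]
print(count(beijing & male))    # count: 2
```

Retention reduces to the same primitives: with one bitmap per day, N-day retention for a cohort is `count(day0 & dayN & cohort) / count(day0 & cohort)`.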

Overall BitBase architecture

Overall structure:


Storage module:

All of the original table data is stored here as bitmaps: different data are kept in different bitmaps, and the number of bits in each bitmap is determined by the size of the table data.
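The talk gives no schema details. As one plausible illustration, the sketch below splits a large bitmap into fixed-size chunks stored under separate row keys, with a dict standing in for the HBase table. The key layout, `CHUNK_BITS`, the date string, and all function names are hypothetical, not BitBase's documented design.

```python
from typing import Dict, Iterable

CHUNK_BITS = 1 << 20  # hypothetical: each stored value covers 2^20 consecutive ids


def chunk_key(feature: str, date: str, chunk: int) -> bytes:
    """Hypothetical row key: feature|date|zero-padded chunk index."""
    return f"{feature}|{date}|{chunk:06d}".encode()


def write_bitmap(table: Dict[bytes, bytes], feature: str, date: str,
                 ids: Iterable[int]) -> None:
    """Split one feature's bitmap into fixed-size chunks and store each under its own key."""
    chunks: Dict[int, bytearray] = {}
    for i in ids:
        c, off = divmod(i, CHUNK_BITS)
        buf = chunks.get(c)
        if buf is None:
            buf = chunks[c] = bytearray(CHUNK_BITS // 8)
        buf[off >> 3] |= 1 << (off & 7)
    for c, buf in chunks.items():
        table[chunk_key(feature, date, c)] = bytes(buf)


def and_count(table: Dict[bytes, bytes], feat_a: str, feat_b: str,
              date: str, num_chunks: int) -> int:
    """Count ids present in both features, one chunk at a time."""
    total = 0
    for c in range(num_chunks):
        a = table.get(chunk_key(feat_a, date, c))
        b = table.get(chunk_key(feat_b, date, c))
        if a is None or b is None:
            continue  # an absent chunk has no ids set, so the AND is empty
        both = int.from_bytes(a, "little") & int.from_bytes(b, "little")
        total += bin(both).count("1")
    return total


table: Dict[bytes, bytes] = {}  # stands in for an HBase table
write_bitmap(table, "city=beijing", "20191110", [1, 5, CHUNK_BITS + 3])
write_bitmap(table, "gender=male", "20191110", [5, CHUNK_BITS + 3, CHUNK_BITS + 4])
print(len(table))  # 4 chunks stored
print(and_count(table, "city=beijing", "gender=male", "20191110", num_chunks=2))  # 2
```

Against a real HBase cluster, the dict lookups would become Get or prefix Scan calls. Fixed-size chunks keep each cell bounded and let different chunks be processed in parallel.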

Computation module:


The deviceId problem


In practice, the complex deviceId strings have to be converted into index values (long). The conversion must be continuous, consistent, reversible, and fast.
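One common way to get all four properties is dictionary encoding: assign each new deviceId the next consecutive integer and keep a reverse array for decoding. The minimal single-process sketch below uses a hypothetical class name and illustrates the idea only; it is not necessarily how BitBase does it. A distributed version would need a shared, persistent mapping, for example an HBase table plus an atomic counter.

```python
from typing import Dict, List


class DeviceIdDictionary:
    """Hypothetical encoder mapping deviceId strings to dense long indexes and back."""

    def __init__(self) -> None:
        self._index_of: Dict[str, int] = {}
        self._device_at: List[str] = []

    def encode(self, device_id: str) -> int:
        idx = self._index_of.get(device_id)
        if idx is None:
            idx = len(self._device_at)  # next consecutive index keeps ids continuous
            self._index_of[device_id] = idx
            self._device_at.append(device_id)
        return idx  # the same deviceId always maps to the same index (consistent)

    def decode(self, idx: int) -> str:
        return self._device_at[idx]  # reverse lookup (reversible)


d = DeviceIdDictionary()
print([d.encode(x) for x in ["a1f3e9", "77bc02", "a1f3e9"]])  # [0, 1, 0]
print(d.decode(1))  # 77bc02
```

Both directions are constant-time lookups, and because indexes are continuous from 0, the bitmaps built on them stay dense and small.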


Technical solution for continuity, consistency, and reversibility


How to achieve fast conversion


Business results

In practice, 90-day retention queries return within 10 seconds.

Service Status:

Future plans

Future plans include:

  • Offline bitmaps can be imported within 5 minutes
  • SQL support
  • Open source

Source: www.cnblogs.com/importbigdata/p/11845651.html