How does Suning analyze 600 million members rapidly and accurately?


With the rapid growth of Suning's business, it has become increasingly challenging for the big data platform to analyze massive business data, especially in precise deduplication and complex JOIN scenarios such as user profiling, UV, new versus returning buyers, retention, and churned users.






The current overall OLAP architecture of the Suning big data platform uses Druid and ClickHouse for time-series data, PostgreSQL for non-time-series data, HBase with Phoenix for fixed, pre-defined query scenarios, and Elasticsearch for detailed data analysis.


On Druid we use the HyperLogLog approximate deduplication algorithm, which has an excellent space complexity of O(m log log N): the space occupied grows very little with cardinality, but the counts carry a certain error.


On other engines, our usual precise deduplication approach is a count distinct after GROUP BY. GROUP BY introduces a large amount of shuffle, occupies considerable disk and IO, and performs relatively poorly.


The following reveals how Suning integrated RoaringBitmap to build an efficient, precise deduplication architecture.


RoaringBitmap's application practice at Suning


Why choose RoaringBitmap


First, a brief introduction to RoaringBitmap. A 32-bit RoaringBitmap splits each value into a high 16-bit Key and a low 16-bit Value, with Keys and Values matched one-to-one by array subscript.


The Key array is kept sorted and stored in roaring_array_t, which makes binary search convenient. The low 16-bit Values are stored in Containers, of which there are three types.


RoaringBitmap has its own optimization strategy for choosing which Container to create. An Array Container is used by default on creation, and whenever the number of elements is below 4096.


It is a dynamically growing array, well suited to sparse data. When its maximum capacity of 4096 is exceeded, it is automatically converted to a Bitmap Container.


Beyond 4096 elements, an Array Container would keep growing linearly, whereas a Bitmap Container's memory footprint does not grow: it always occupies 8 KB.


The third type is the Run Container, which is only produced when runOptimize() is called: it compares the space a run-length encoding would occupy against the Array Container and Bitmap Container, and converts only if it wins.


The storage occupied by a Run Container depends on how contiguous the data is, with lower and upper bounds of [4 Bytes, 128 KB].
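The container logic above can be sketched in a few lines. This is a minimal, illustrative model, not the real CRoaring layout (the Run Container is omitted): a 32-bit value splits into a 16-bit key (high bits) and a 16-bit low part, and a sorted-array container upgrades to a fixed 8 KB bitmap container once it exceeds 4096 elements.

```python
from bisect import insort

ARRAY_LIMIT = 4096

class Container:
    def __init__(self):
        self.kind = "array"
        self.values = []              # sorted 16-bit values (array container)
        self.bits = None              # 2**16 bits = 8 KB (bitmap container)

    def add(self, low):
        if self.kind == "array":
            if low not in self.values:
                insort(self.values, low)
            if len(self.values) > ARRAY_LIMIT:    # upgrade to bitmap container
                self.kind = "bitmap"
                self.bits = bytearray(2**16 // 8)
                for v in self.values:
                    self.bits[v // 8] |= 1 << (v % 8)
                self.values = []
        else:
            self.bits[low // 8] |= 1 << (low % 8)

class Roaring32:
    def __init__(self):
        self.containers = {}          # key -> Container

    def add(self, x):
        key, low = x >> 16, x & 0xFFFF
        self.containers.setdefault(key, Container()).add(low)

rb = Roaring32()
rb.add(70000)                         # key = 1, low = 4464
print(sorted(rb.containers))          # [1]
```

Note that the 8 KB figure falls out of the arithmetic: a Bitmap Container covers all 2^16 low values at one bit each, i.e. 65536 / 8 = 8192 bytes, regardless of how many of them are set.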


Big data technology has developed rapidly in recent years, and the many open source projects have been a great convenience for big data developers. Among them, we chose RoaringBitmap for its low storage footprint and high computational efficiency.


RoaringBitmap marks presence by bit and compresses its storage. By our estimate, storing Suning's 600 million members in a conventional array takes about 2.2 GB, while RoaringBitmap needs only about 66 MB, which greatly reduces storage space and lowers business cost.
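A back-of-the-envelope check of the figures quoted above, assuming member IDs are 32-bit integers and dense enough that bitmap containers dominate:

```python
members = 600_000_000

array_bytes = members * 4        # a plain int32 array: one 4-byte slot per member
bitmap_bytes = members // 8      # roughly one bit per member in dense bitmap containers

print(f"array : {array_bytes / 2**30:.1f} GiB")    # ~2.2 GiB
print(f"bitmap: {bitmap_bytes / 2**20:.0f} MiB")   # ~72 MiB, same order as the 66 MB quoted
```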


RoaringBitmap computes via bitwise operations (such as AND, OR, ANDNOT), and its computing power is equally impressive.


In a comparative test against count distinct on PostgreSQL+Citus, RoaringBitmap's query time was close to 1/50 of count distinct's.


Native RoaringBitmap stores only integer data, and a 32-bit RoaringBitmap can hold values up to 2147483647.


Member IDs fit in 32 bits, while data such as orders and traffic may require a 64-bit RoaringBitmap. In terms of performance, 32-bit is more efficient than 64-bit under the same conditions.


Suning has a huge volume of business data and runs large numbers of offline and real-time computing tasks every day. Adopting RoaringBitmap not only saves substantial storage cost but also significantly improves computing efficiency.


Application scenarios


①Calculation of member-related indicators


RoaringBitmap has many important applications in the analysis of member-related indicators. For example, new versus returning buyers, retention, repurchase, and activity all require precise deduplicated statistics.


Suning currently has 600 million members. Computing indicators such as new versus returning buyers means comparing today's buyers against all historical buyers; producing that result quickly and accurately was a major challenge before RoaringBitmap was introduced.


②Precision marketing


Pushing discounted products to target user groups to lift sales is by now a common precision-marketing technique among e-commerce companies.


However, building audience segments, gaining insight into customer groups, and placing advertisements accurately within massive user behavior logs are all difficult.


If an offline computing solution cannot guarantee timeliness, target customers may be lost in the interval. In this scenario, computing the target population efficiently and accurately is especially important.


After RoaringBitmap was introduced, comprehensive, in-depth audience profiling over massive data and precise ad placement finally helped the company build a complete closed loop for digital marketing.


RoaringBitmap based on PostgreSQL


For analyzing massive non-time-series data, Suning uses a distributed HTAP architecture based on PostgreSQL+Citus.


We integrated RoaringBitmap into the Citus cluster; concretely, it appears as a bitmap column in a table.

The following briefly introduces how the PostgreSQL+Citus+RoaringBitmap architecture implements the new-versus-returning-buyer scenario.


Data Dictionary


Before RoaringBitmap can be stored and computed on, we must first build a global dictionary table: a mapping from each dimension value to be transformed onto an int or long.


This mapping is stored in the global dictionary table; the choice between 32-bit and 64-bit RoaringBitmap depends on the actual data volume.
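A minimal sketch of such a global dictionary, assuming member IDs are strings: each distinct ID gets a dense int surrogate that a RoaringBitmap can store (in production this mapping lives in a database table, not in memory).

```python
class GlobalDictionary:
    def __init__(self):
        self.ids = {}                 # member_id -> int surrogate

    def encode(self, member_id):
        # Assign the next dense integer on first sight; reuse it afterwards.
        return self.ids.setdefault(member_id, len(self.ids))

d = GlobalDictionary()
codes = [d.encode(m) for m in ["u_1001", "u_2002", "u_1001"]]
print(codes)                          # [0, 1, 0] -- same member, same int
```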


Process Design


The overall design process can be divided into three steps:

  • Model creation

  • Data ingestion

  • Data analysis

Data model creation flowchart


The model creation process is as shown above:


Model creation, data initialization, and querying are implemented as Citus stored procedures; our tests showed the stored-procedure approach performs better than the equivalent plain SQL.


Shard table design: a model's elements consist of dimensions, metrics, and a bitmap. Citus currently supports three table types (local tables, reference tables, and distributed tables), each suited to different scenarios.
Citus supports Hash and Append distribution; taking new versus returning buyers as the example, we hash-shard on the member's member_id.


Shard table design not only solves TB-scale data storage, it also lets each shard compute in parallel before a final merge, improving computational efficiency.


The cube_bitmap table is created from the model. We collect users' query patterns in the background, and from the sampled data we automatically create cost-based Cubes for acceleration.


For data analysis, the data source is chosen by cost calculation from the precomputed result tables, the cube_bitmap table, or the model bitmap table.

Data ingestion flowchart


The data ingestion flow is as follows:


Dictionary synchronization: both full and incremental model ingestion must synchronously update the global dictionary table.


Logic of the model bitmap table (taking members as the example):


Step 1: join the model table to the dictionary table on the configured business primary key.


Step 2: obtain the member bitmap for each incremental model dimension, e.g. via rb_or_agg(rb_build(ARRAY[b.id :: INT])).


Step 3: combine the model bitmap table's data for today (flag=1) with the previous day's accumulated data (flag=2) using rb_or_agg(bitmap), and insert the merged result into the bitmap table as today's flag=2 row.


Step 4: the daily full statistics table has only the flag, statis_date, and bitmap columns; it tracks the bitmaps of today's users and of historical users, with flag=1 holding today's bitmap.


The model bitmap table is joined to the member table, taking rb_or_agg(rb_build(ARRAY[b.id :: INT])) as the bitmap.


Step 5: for the flag=2 row of the daily full statistics table, take today's flag=1 data and yesterday's flag=2 data from the same table and combine them with rb_or_agg(bitmap).
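The daily rollup in the steps above can be sketched with Python sets standing in for bitmaps: set union plays the role of rb_or_agg, flag=1 is today's bitmap, and flag=2 is the running history through that day (dates and members are illustrative).

```python
def daily_rollup(stats, date, todays_buyers, prev_date=None):
    stats[(1, date)] = set(todays_buyers)                    # flag=1: today
    history = stats.get((2, prev_date), set())
    stats[(2, date)] = history | stats[(1, date)]            # flag=2: history OR today

stats = {}
daily_rollup(stats, "0427", {"A", "C"})
daily_rollup(stats, "0428", {"A", "D"}, prev_date="0427")
print(sorted(stats[(2, "0428")]))     # ['A', 'C', 'D']
```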


The cube_bitmap and pre-aggregated result tables are sourced from the data model tables, and serve as acceleration layers on top of them.

Data query flowchart


The data analysis flow is as follows:


A cost analysis over the queried dimensions routes the query to the precomputed result table, the cube_bitmap table, or the model table for data analysis.


bitmap_cur and bitmap_sum are fetched from the model bitmap table or the cube_bitmap table; bitmap_all is fetched from the full bitmap table (flag=2 with a date of the day before the query date).


Subsequent bitmap bitwise operations are then carried out over bitmap_cur, bitmap_sum, and bitmap_all.


Worked example


①Business scenario


The business scenario is as follows:

②Design scheme


Step 1: use each buyer's ID as the dictionary key, map it to a corresponding int or long, and store the mapping in the global dictionary table.


Step 2: count each day's new and returning buyers online and offline, with statistics dimensioned by channel (online/offline) + tag (1 = today, 2 = history) + date.


Each day produces two statistics rows: one bitmap of today's buyers and one bitmap of all historical buyers.


The next day's history is built by rb_or_agg of the previous day's history bitmap and that day's bitmap, yielding a new as-of-today history bitmap (results stored in Bitmap_Table_A).


Step 3: count new and returning buyers by (category + channel) + tag + date. Again there are two rows per day, one for today and one for history. Today's row is a GROUP BY over all categories and channels, with the resulting bitmaps tagged flag=1; the history row (flag=2) is the rb_or_agg of the previous day's history and today's set, yielding a new as-of-today history bitmap (results stored in Bitmap_Table_B).
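The two statistics tables above can be sketched with dicts of Python sets in place of sharded bitmap tables. Keys combine the statistic dimensions with tag (1 = today, 2 = history) and date; the member letters and dates are illustrative.

```python
def record_day(table, dims, date, prev_date, todays_buyers):
    table[dims + (1, date)] = set(todays_buyers)             # tag=1: today's bitmap
    history = table.get(dims + (2, prev_date), set())
    table[dims + (2, date)] = history | set(todays_buyers)   # tag=2: rb_or_agg rollup

bitmap_table_a, bitmap_table_b = {}, {}                      # channel / category+channel
record_day(bitmap_table_a, ("online",), "0427", None, {"A", "C"})
record_day(bitmap_table_a, ("online",), "0428", "0427", {"A", "D"})
record_day(bitmap_table_b, ("AC", "online"), "0428", "0427", {"A", "C"})
print(sorted(bitmap_table_a[("online", 2, "0428")]))         # ['A', 'C', 'D']
```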


③Scenario analysis


Scenario 1: new online buyers on 0428

Counting the new online buyers on 0428 is simply an rb_andnot_cardinality between bitmap {A, D} (today) and bitmap {A, C} (history); the result is {D}, so there is 1 new buyer.


Scenario 2: new online air-conditioner buyers on 0428

Counting the new online air-conditioner buyers on 0428 is an rb_andnot_cardinality between bitmap {A, C} and bitmap {C}; the result is {A}, so there is 1 new buyer.


For the new online refrigerator-and-washer buyers on 0428, rb_andnot_cardinality between bitmap {D} and the empty bitmap gives {D}, a count of 1.


Scenario 3: how many of the new online air-conditioner buyers on 0428 are new online buyers overall


This statistic combines Bitmap_Table_A and Bitmap_Table_B: take bitmap {A} (the new air-conditioner buyers) and bitmap {A, C} (the historical online buyers) and apply rb_andnot_cardinality; the result is the empty set, a count of 0.


For the new refrigerator-and-washer buyers on 0428, rb_andnot_cardinality between bitmap {D} and bitmap {A, C} gives {D}, a count of 1.


The category distribution of 0428's new online buyers is therefore: from Bitmap_Table_B, the 0428 online categories are refrigerator-and-washer {D} and air conditioner {A}; from Bitmap_Table_A, the historical online buyers are {A, C}.


For refrigerator-and-washer, {D} andnot {A, C} via rb_andnot_cardinality leaves {D}, a count of 1.


For air conditioner, {A} andnot {A, C} leaves the empty set, a count of 0.
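The three scenarios can be re-checked with Python sets: difference (-) stands in for rb_andnot, intersection (&) for rb_and; the member sets come from the worked example above.

```python
online_0428, online_hist = {"A", "D"}, {"A", "C"}      # channel-level bitmaps
ac_0428, ac_hist = {"A", "C"}, {"C"}                   # air-conditioner category
laundry_0428, laundry_hist = {"D"}, set()              # fridge-and-washer category

new_online = online_0428 - online_hist                 # scenario 1 -> {'D'}, count 1
new_ac = ac_0428 - ac_hist                             # scenario 2 -> {'A'}, count 1
new_laundry = laundry_0428 - laundry_hist              #            -> {'D'}, count 1

# Scenario 3: new category buyers who are also new to the online channel
print(new_ac - online_hist)        # set()  -> count 0
print(new_laundry - online_hist)   # {'D'}  -> count 1
```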


Shortcomings and challenges


With the RoaringBitmap scheme on PostgreSQL+Citus, bitwise operations between bitmap sets perform excellently, but many business scenarios need bitwise operations over very high-cardinality bitmaps.


Analyzing Citus, we found that CPU utilization stayed low during bitwise operations, so we later optimized on top of Citus.


For example, pushing bitmap computation down to the workers to reduce the load on the coordinator node (CN), and creating cubes to lower cardinality, improved efficiency to a degree; even so, the CPUs under Citus were never fully utilized.


ClickHouse's concurrent MPP+SMP execution model can make full use of a machine's resources, but at the time ClickHouse did not yet provide bitmap-related interfaces, so it could not be applied directly; integrating RoaringBitmap into ClickHouse was the challenge.


Integrating RoaringBitmap with ClickHouse


Among compute engines, ClickHouse is a rising star: a column-oriented database with a natively vectorized execution engine, whose storage uses the LSM-style MergeTree family of engines.


At present, Suning's big data platform has introduced and adapted ClickHouse, and has developed RoaringBitmap interfaces for it to support interactive business queries.


The RoaringBitmap solution on ClickHouse greatly simplifies the computation pipeline; the IO, CPU, memory, and network cost of a query drops markedly and no longer grows with the data volume.


On ClickHouse we developed RoaringBitmap-related interfaces; the supported functions are:

  • bitmapBuild

  • bitmapToArray

  • bitmapMax

  • bitmapMin

  • bitmapAnd

  • bitmapOr

  • bitmapXor

  • bitmapAndnot

  • bitmapCardinality

  • bitmapAndCardinality

  • bitmapOrCardinality

  • bitmapAndnotCardinality 


These support calculations across various scenarios, and the related interfaces are still being improved.


Future outlook


To extend the ClickHouse-based RoaringBitmap solution to more of the company's businesses and scenarios, we continue to optimize and improve it.


We are currently working on the following:

  • ClickHouse does not yet support 64-bit bitmaps. We are experimenting with partitioning by hash value and computing each partition separately; partitions can be stacked horizontally to support long values easily.

  • Building the global dictionary is costly at high cardinality, consuming significant resources and time. In the future, the dictionary table can be reused as much as possible per business scenario; and when cross-segment aggregation is not required, a per-segment dictionary for the column can be substituted.

  • Full-link monitoring can break down the time spent at each stage by query_id, which helps with optimization and problem location.



Origin blog.51cto.com/14410880/2545887