JD Logistics × StarRocks: Building Udata, an integrated service analysis platform

Author: Zhang Dong, data expert at JD Logistics Group

 

JD.com began building its own logistics in 2007. JD Logistics Group was formally established in April 2017 and listed on the Main Board of the Hong Kong Stock Exchange in May 2021. JD Logistics is a leading technology-driven supply chain solution and logistics service provider in China. With the mission of using technology to drive efficient global circulation and sustainable development, it is committed to becoming the world's most trusted supply chain infrastructure service provider.

Built on underlying technologies such as 5G, artificial intelligence, big data, cloud computing and the Internet of Things, JD Logistics has created a comprehensive intelligent logistics system that delivers automated service, digital operations and intelligent decision-making. As of December 31, 2021, JD Logistics operated 43 "Asia No. 1" large-scale smart warehouses across China, and by 2021 it held or had applied for more than 5,500 patents and computer software copyrights.

As JD Logistics pursues digital operations and intelligent decision-making, business demand for real-time operational analysis keeps growing, and problems in the original platform architecture, such as data silos, poor query performance, difficult operations and maintenance, and low development efficiency, have become increasingly prominent. Against this background, JD Logistics built the Udata unified query engine on StarRocks' federated query capability. It resolves many pain points in data services and data analysis, greatly reduces development and O&M costs, eliminates inconsistent query engines and data silos, and means analysis and services are no longer separated.

 

Pain points of the original data applications

Data services and data analysis are the two major directions of data application, and practitioners may run into the following problems:

Data services

  • Siloed, "chimney-style" development: a separate data service is built for every requirement, so services cannot be reused, are hard to bring onto a common platform, and accumulate no shared technology.

  • Difficult service maintenance: once a large number of data services have been developed, maintaining them becomes a serious problem. Especially during the 618 and Double 11 promotions, having one person maintain hundreds of data services without unified monitoring, rate limiting and degradation, or business disaster recovery is painful and creates serious hidden security risks.

  • Heavy business demand: data engineers are often tied up by large volumes of repetitive, tedious data service development and spend much of their time building business data services.

Data analysis

  • Data is hard to find: users struggle to locate the data they want, and even when they find indicators or data with similar names, they cannot use them directly because the indicators are not clearly and uniformly defined.

  • Data is hard to use: because data is scattered across various systems, no single system can satisfy all of a user's data needs. Front-line operators in particular have to export large numbers of Excel sheets from different systems to do analysis, which is time-consuming, labor-intensive, and a data security risk.

  • Slow queries: with traditional OLAP engines, a SQL query often takes several minutes to return results, greatly reducing analysts' efficiency.

  • Inconsistent query engines: the system may be composed of multiple query engines, each with its own DSL, which raises the learning cost for users and makes querying across data sources inconvenient. Heterogeneous query engines also create data islands, where data in different systems cannot be correlated.

  • Real-time data updates: traditional offline T+1 updates can no longer satisfy today's real-time operational demands, which require second-level latency from the system.

Beyond these problems, data services and data analysis systems cannot be unified. Analysis results are often offline and require additional data service development before they can become online services that empower external systems, so it is hard for analysis and services to quickly form a closed loop. Moreover, past data processing chose storage only for the needs of the moment; when demand scenarios later expanded, the original storage engine might no longer fit, so one piece of data had to be kept in different storage engines for different scenarios, creating data consistency risks and wasting cost.

 

Integrated service analysis practice based on StarRocks

To address these pain points, the JD Logistics operational data product team built Udata, an integrated service analysis platform implemented on the StarRocks engine. Udata abstracts the process of producing data indicators and generates data services through low-code configuration, greatly reducing development complexity and allowing non-engineering colleagues to configure and publish data services themselves. Indicator development time has dropped from one or two days to about 30 minutes, freeing up substantial engineering capacity.

The platform's indicator management system and data map make it more intuitive and convenient for users to find and maintain indicators, and also make indicator reuse possible. On the data analysis side, we built the Udata unified query engine on StarRocks' federated query scheme, which solved the problems of inconsistent query engines and data silos.

StarRocks also delivers powerful query performance on both wide tables and multi-table joins, and its real-time ingestion capability and variety of real-time data models support real-time update scenarios well. Udata combines analysis and services so they are no longer two separate processes: as soon as users discover valuable data through analysis, they can generate the corresponding data service, quickly closing the loop between the two. The data flow architecture before the transformation:

  • Real-time data is ingested from JDQ (JD's log message queue, similar to Apache Kafka) and JMQ into Apache Flink for real-time processing, then written to ClickHouse and Elasticsearch, which provide OLAP queries for data services and data analysis.

  • Offline data is processed in the data warehouse by Apache Spark, and the APP-layer data is synchronized to MySQL or ClickHouse for OLAP queries.

In this architecture, data services and data analysis are two separate parts: analysis tools struggle to work across multiple data sources and query languages, while data services are developed in chimney style. The data flow architecture after the transformation:

  • On the real-time link, we introduced StarRocks alongside the existing ClickHouse and Elasticsearch, gaining extremely fast single-table and multi-table query capability.

We then built a unified query engine on StarRocks, extending it with features such as additional data sources and aggregation pushdown to match JD.com's business characteristics. Udata unifies data analysis and data services on top of this engine.

An integrated service analysis system places high demands on the query engine, which must simultaneously deliver extremely fast query performance, support federated queries, and unify real-time and offline storage. Around these three requirements, the following sections cover StarRocks' extremely fast query performance, our extensions to federated query, and our practice in real-time scenarios.

Extreme query performance of StarRocks

Extremely fast single-table queries

StarRocks has invested heavily in query speed. Four points stand out:

1) Vectorized execution: StarRocks implements full vectorization from the storage layer to the query layer, which is the foundation of its speed advantage. Vectorized execution makes full use of the CPU: the fully vectorized engine organizes and processes data in columnar fashion, and StarRocks' data storage, in-memory layout, and SQL operator computation are all columnar. Columnar organization uses the CPU cache more effectively, and columnar computation incurs fewer virtual function calls and branch mispredictions, keeping the CPU instruction pipeline fuller. The fully vectorized engine also exploits the CPU's SIMD instructions through vectorized algorithms, completing more data operations with fewer instructions. On standard benchmark suites, StarRocks' fully vectorized engine improves operator performance by 3-10x overall.

2) Materialized views to accelerate queries: real-world analysis often involves tables with tens of billions of rows, and even with StarRocks' excellent performance, that data volume still slows queries. Adding materialized views on the dimensions users frequently aggregate can speed queries up by more than 10x without changing the query statement: StarRocks' intelligent materialized views automatically match queries to views, with no need to query the views by hand (a sketch follows this list).

3) CBO: the cost-based optimizer adopts the Cascades framework, uses a variety of statistics to improve cost estimation, and supplements the logical transformation (Transformation) and physical implementation (Implementation) rules, enabling it to pick the lowest-cost execution plan from a search space of tens of thousands of plans.

4) Adaptive low-cardinality optimization: StarRocks adaptively builds a global dictionary for low-cardinality string columns based on data distribution and uses Int values for storage and queries, which lowers memory overhead, favors SIMD execution, and speeds up queries. ClickHouse also offers low-cardinality optimization, but it must be declared at table creation time, which is more cumbersome to use.
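
To illustrate point 2, here is a minimal sketch of a synchronous materialized view; the table and column names are hypothetical:

```sql
-- Pre-aggregate a (hypothetical) waybill detail table along the dimensions
-- users group by most often.
CREATE MATERIALIZED VIEW mv_waybill_by_site AS
SELECT site_id, dt, SUM(amount), COUNT(waybill_id)
FROM waybill_detail
GROUP BY site_id, dt;

-- This query is transparently rewritten to read the materialized view;
-- the SQL text itself does not change.
SELECT site_id, dt, SUM(amount)
FROM waybill_detail
GROUP BY site_id, dt;
```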

Extremely fast multi-table joins

In real-time analysis scenarios, fast single-table queries alone are not enough. To speed up queries, the industry habitually flattens multiple tables into one wide table: the real-time processing layer uses Flink to join several tables and write the result into a single wide table. Wide tables are fast but extremely inflexible. When the business side wants to change or add an analysis dimension, the data development cycle is often so long that by the time processing is finished, the best moment for the analysis has passed. A more flexible data model is therefore needed, and moving from the wide-table schema back to a star or snowflake schema is the ideal approach.

In this scenario, the query engine's multi-table join performance becomes the key. ClickHouse, previously used mainly with wide tables, cannot guarantee response times for multi-table joins and even hits OOM with high probability. StarRocks solves this problem very well: large-table join performance is improved by 3-5x or more, making it a powerful tool for star-schema analysis. The CBO is key to this performance, and StarRocks supports Broadcast Join, Shuffle Join, Bucket Shuffle Join, Colocate Join, Replicated Join and other join methods, with the CBO intelligently choosing the join order and join method.
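
For instance, a colocate join can be arranged at table creation time by placing the fact and dimension tables in the same colocation group; the schemas below are illustrative:

```sql
-- Tables in the same colocation group share identical bucketing on the join
-- key, so the join runs locally on each node without a network shuffle.
CREATE TABLE fact_waybill (
    site_id    INT,
    waybill_id BIGINT,
    amount     DECIMAL(18, 2)
) ENGINE = OLAP
DUPLICATE KEY (site_id, waybill_id)
DISTRIBUTED BY HASH(site_id) BUCKETS 16
PROPERTIES ("colocate_with" = "waybill_group");

CREATE TABLE dim_site (
    site_id   INT,
    site_name VARCHAR(64)
) ENGINE = OLAP
DUPLICATE KEY (site_id)
DISTRIBUTED BY HASH(site_id) BUCKETS 16
PROPERTIES ("colocate_with" = "waybill_group");

-- Joins on site_id between these two tables execute as colocate joins.
SELECT d.site_name, SUM(f.amount)
FROM fact_waybill f JOIN dim_site d ON f.site_id = d.site_id
GROUP BY d.site_name;
```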

The JD Logistics team's extensions to StarRocks federated query

For federated queries, StarRocks supports a variety of external tables, including Elasticsearch, MySQL, Hive and data lakes, so it already has a good federation foundation. In real business scenarios, however, some aggregation queries pull raw data out of the external source and aggregate it in StarRocks even though those sources aggregate well themselves, which lengthens query time.

Our idea is to let engines that are good at aggregation do it themselves, by pushing the aggregation down to the external engine. The engines that currently qualify for this optimization are MySQL, Elasticsearch and ClickHouse. To accommodate more data sources, we also added JSF (JD's internal RPC service) and HTTP data sources. The following two parts describe this work:

Aggregation pushdown for MySQL and Elasticsearch

StarRocks' current approach to aggregating external data is to pull in all data remaining after predicate pushdown. Although the predicates filter some of it, pulling the rest into StarRocks and re-aggregating there is a heavy operation, so aggregation latency is unsatisfactory. We chose to push the aggregation down and let the external table's engine aggregate by itself, which saves the data-pulling time; aggregating close to the data is also more efficient.

First, look at how an execution plan is generated:

1. SQL Parse: convert the SQL text into an AST (abstract syntax tree)
2. SQL Analyze: perform syntactic and semantic analysis on the AST
3. SQL Logical Plan: convert the AST into a logical plan
4. SQL Optimize: rewrite and transform the logical plan based on relational algebra, statistics and the cost model, and choose the physical execution plan with the "lowest" cost
5. Generate Plan Fragment: convert the physical plan chosen by the optimizer into Plan Fragments that the BEs can execute directly

We further optimize the physical execution plan produced in step 4: when an Elasticsearch or MySQL aggregation is encountered, the ScanNode + AGGNode plan is rewritten into a QueryNode. A QueryNode is a special ScanNode; the difference from an ordinary ScanNode is that a QueryNode sends the aggregate query directly to the corresponding external engine, instead of scanning the data and aggregating it locally.

For Elasticsearch, the FE generates the Elasticsearch query DSL and pushes it straight down to the BE for execution. On the BE side we implemented two QueryNodes, an Elasticsearch QueryNode and a MySQL QueryNode, and we gated the feature behind an agg_push_down switch, which is off by default.

Consider a Join query that combines aggregation results from two data sources, Elasticsearch and MySQL. The original execution plan scans a large amount of data out of each source and then aggregates it locally; after optimization, the scan-and-aggregate stages are removed and the already-aggregated data is fetched directly.
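
A hedged sketch of such a query, assuming hypothetical external tables es_orders (backed by Elasticsearch) and mysql_receipts (backed by MySQL), and assuming the agg_push_down switch described above is exposed as a session variable:

```sql
-- Enable the team's aggregation pushdown (off by default, per the text).
SET agg_push_down = true;

-- Each subquery's GROUP BY runs inside the external engine itself;
-- StarRocks only joins the two small aggregated result sets.
SELECT e.site_id, e.order_cnt, m.receipt_cnt
FROM (
    SELECT site_id, COUNT(*) AS order_cnt
    FROM es_orders                -- Elasticsearch external table
    GROUP BY site_id
) AS e
JOIN (
    SELECT site_id, COUNT(*) AS receipt_cnt
    FROM mysql_receipts           -- MySQL external table
    GROUP BY site_id
) AS m ON e.site_id = m.site_id;
```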

In the optimized execution plan, each ScanNode + AGGNode pair is replaced by a QueryNode that fetches the aggregated result directly from the external engine.

 

Adding JSF (JD internal RPC service) and HTTP data sources

Data services may need to integrate external data services and reuse previously developed indicators. Our idea is to abstract JSF (JD's internal RPC service) and HTTP services into StarRocks external tables, so users can access a data service through SQL as if they were querying a database. This not only reuses old indicators, but also lets users combine them with other data sources to derive new composite indicators. We added JSF and HTTP ScanNodes on both the FE and BE sides.
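
This extension is internal to JD and not part of open-source StarRocks; purely as a sketch, with a hypothetical engine name and properties, such an external table might look like:

```sql
-- Hypothetical syntax: the JSF engine and every property below are
-- assumptions for illustration, not open-source StarRocks features.
CREATE EXTERNAL TABLE waybill_status_service (
    waybill_id VARCHAR(64),
    status     VARCHAR(32)
) ENGINE = JSF
PROPERTIES (
    "service" = "com.jd.logistics.WaybillQueryService",
    "method"  = "queryStatus",
    "alias"   = "prod"
);

-- The RPC service can then be queried, and joined with other sources, in SQL.
SELECT status, COUNT(*) FROM waybill_status_service GROUP BY status;
```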

Practice in real-time scenarios

Most of JD Logistics' real-time data involves updates: waybill data changes as the business status changes. Here are the three real-time update solutions we run in production:

Real-time update scheme based on Elasticsearch

Principle: 1) fetch the existing document 2) update the old document in memory 3) mark the old document as deleted 4) create a new document

Advantages: 1) supports real-time data updates, including partial updates

Disadvantages: 1) Elasticsearch's aggregation performance is poor, and queries with many aggregation dimensions take very long 2) Elasticsearch's DSL adds development work; it supports simple SQL but cannot cover complex business scenarios 3) cleaning up old data is hard: when compaction physically deletes documents marked as deleted, it triggers heavy I/O, and if write volume is high at that moment, read and write performance suffer severely

Real-time update scheme based on ClickHouse

Principle: 1) implemented with ReplacingMergeTree 2) rows sharing the same primary key are routed to the same data partition on the same node 3) merge on read: multi-version data is merged at query time

Advantages: 1) ClickHouse writes are essentially append-only, so write performance is strong

Disadvantages: 1) because versions must be merged at read time, query and concurrency performance is poor 2) ClickHouse's Join performance is weak, which leads to data island problems
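
For contrast, a minimal ClickHouse sketch of this scheme, with an illustrative schema:

```sql
-- ReplacingMergeTree keeps the row with the highest `ver` per sorting key,
-- but deduplication only happens during background merges or at read time.
CREATE TABLE waybill_status
(
    waybill_id String,
    status     String,
    ver        UInt64
)
ENGINE = ReplacingMergeTree(ver)
ORDER BY waybill_id;

-- Merge on read: FINAL collapses all versions at query time, which is
-- precisely what makes queries and concurrency slow.
SELECT waybill_id, status
FROM waybill_status FINAL
WHERE waybill_id = 'JD0000000001';
```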

Real-time update scheme based on StarRocks primary key model

Principle: when StarRocks receives an update to a row, it locates the record through the primary key index, marks it deleted, and inserts a new record, effectively rewriting Update as Delete + Insert. When it receives a delete, it likewise locates the record through the primary key index and marks it deleted. Predicate pushdown and index use are unaffected at query time, so queries stay efficient, about 5-10x faster than the merge-on-read approach.

Advantages: 1) only one version of each row exists, so query performance is strong and updates are real-time 2) Delete + Insert sacrifices a little write performance, but writes remain very fast overall 3) the MySQL protocol makes it easy to use

Disadvantages: 1) the current version restricts data deletion: rows cannot yet be removed with a Delete statement, a feature planned for an upcoming release

For real-time update scenarios in general, there are the following solutions:

1) Merge on read: StarRocks' Aggregate and Unique models and ClickHouse's ReplacingMergeTree and AggregatingMergeTree all take this approach. Writes are append-style and fast, but multiple versions must be merged at query time, so query performance is poor. It suits real-time analysis with modest query performance requirements.

2) Copy on write: data lake systems such as Apache Hudi and Apache Iceberg offer copy-on-write. When updates arrive, new and old data are merged and rewritten into a new file that replaces the old one. No merge is needed at query time, so query performance is excellent, but the write-and-merge operation is heavy, making this unsuitable for real-time write scenarios.

3) Delete and insert: an upsert scheme that locates the row to be updated through an in-memory primary key index, marks it deleted, and inserts the new row. It trades some write performance for a several-fold query performance gain over merge on read, and concurrency improves too.

Real-time updates have always been a technical difficulty in the OLAP field, and earlier solutions struggled to be fast to write, fast to read, and easy to use all at once. StarRocks' delete-and-insert approach is currently the closest to ideal: it performs excellently on both reads and writes, and its MySQL protocol support makes it simple and friendly. Udata's offline analysis is also done in StarRocks, letting JD Logistics unify real-time and offline analysis.
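
As a minimal sketch of the primary key model described above (the schema is hypothetical):

```sql
-- Primary key table: a later write with the same key is applied as
-- Delete + Insert, so queries always see exactly one version per key.
CREATE TABLE waybill_rt (
    waybill_id  VARCHAR(64) NOT NULL,
    status      VARCHAR(32),
    update_time DATETIME
) ENGINE = OLAP
PRIMARY KEY (waybill_id)
DISTRIBUTED BY HASH(waybill_id) BUCKETS 8;

-- The second insert upserts the row in place; reads need no
-- merge-on-read step.
INSERT INTO waybill_rt VALUES ('JD0000000001', 'CREATED',   '2022-05-01 10:00:00');
INSERT INTO waybill_rt VALUES ('JD0000000001', 'DELIVERED', '2022-05-01 18:00:00');
```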

 

Follow-up directions

Data lake exploration

Unifying batch and stream processing is the general trend, and the data lake, as the storage carrier for that unification, has become the standard; our future direction is bound to be batch-stream unification. A major pain point today is the lack of a query engine that can run extremely fast queries on the data lake, so we will use StarRocks to build extremely fast analysis on the lake, ensuring batch-stream unification does not stop at the computing layer.


Unified real-time data storage

The system currently runs several real-time storage solutions, and the operations cost remains high. We will gradually replace Elasticsearch and ClickHouse with StarRocks to unify storage at the real-time layer. We also look forward to the primary key model supporting deletion via the Delete statement in a later StarRocks release, which would solve our current data cleanup problem.

Support for more data sources

In the future we will support more data sources, such as Redis, Apache HBase and other KV-type NoSQL databases, to enhance StarRocks' query capabilities.

Federated queries between StarRocks clusters

In actual production it is hard to run just one large cluster, especially with heavy real-time writes; splitting into smaller clusters is safer, since one cluster's failure then does not affect other businesses. The problem is that the clusters may become data islands again: even if one StarRocks cluster is wrapped as a MySQL external table of another, tools are still needed to synchronize each cluster's table structures and metadata, which is laborious to manage. The community has discussed how to implement federation between clusters, and interested community members are welcome to join the effort.

Resource isolation

StarRocks introduced resource groups in version 2.2, which effectively isolate large and small query workloads, and future versions will add features such as circuit breaking for large queries and resource isolation between import and query loads. This will make it possible for several smaller businesses to share one cluster under resource isolation.
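
A sketch using the resource group syntax introduced in 2.2; the group name, user and limits are illustrative:

```sql
-- Route one user's queries into a group capped at 8 CPU cores and 40% of
-- query memory, so ad-hoc analysis cannot starve other workloads.
CREATE RESOURCE GROUP rg_adhoc
TO (user = 'analyst_user')
WITH (
    "cpu_core_limit" = "8",
    "mem_limit"      = "40%",
    "type"           = "normal"
);
```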

 

About StarRocks 

Since its founding more than two years ago, StarRocks has focused on building a world-class, new-generation, extremely fast, full-scenario MPP database, helping enterprises establish an "extremely fast and unified" paradigm for data analysis and fully digitalize their operations.

It has helped more than 110 large users, including Tencent, Ctrip, SF Express, Airbnb, Didi, JD.com and ZhongAn Insurance, build new data analysis capabilities, with thousands of StarRocks servers running stably in production.

StarRocks was open-sourced in September 2021 and has earned more than 2,900 stars on GitHub. Its global community has grown rapidly: so far there are more than 100 contributors and over 5,000 community users, and dozens of industry leaders at home and abroad participate in building it.

 

 
