The practice of UData+StarRocks in JD Logistics | JD Logistics technical team

1 Background

Data services and data analysis are the two major directions of data application for data teams. Teams across the industry commonly encounter the following problems:

1.1 Data services

  • Chimney-style development: a separate data service is developed for every requirement. Services cannot be reused, are hard to bring onto a platform, and accumulate no shared technology.
  • Difficult service maintenance: once a large number of data services have been developed, maintaining them becomes a serious problem, especially during the 618 and Double 11 promotions, when one person may maintain hundreds of services with no unified monitoring, rate limiting, or disaster-recovery plans. This is painful and also creates serious security risks.
  • Heavy business demand: data developers are tied up by large volumes of repetitive, tedious data-service development and spend a great deal of time building business data services.

1.2 Data analysis

  • Data is hard to find: users struggle to find what they want. Even when they find indicators or data with similar names, they cannot use them directly because the indicator definitions are unclear and inconsistent.
  • Data is hard to use: because data is scattered across systems, no single system satisfies all data needs. Front-line operators in particular have to export large amounts of Excel from multiple systems for analysis, which is time-consuming, labor-intensive, and a data-security risk.
  • Slow queries: with a traditional OLAP engine, it often takes several minutes for a SQL query to return results, which greatly reduces analysts' efficiency.
  • Query engines are not unified: the overall system may be composed of multiple query engines, each with its own DSL, which raises the learning cost for users and makes queries across multiple data sources very inconvenient. Heterogeneous query engines also create data islands, so data in different systems cannot be correlated.
  • Real-time data updates: traditional offline T+1 updates can no longer meet the demands of today's real-time operations, which require second-level latency.

Beyond these problems, the data service and data analysis systems were not unified. Analysis results often stayed offline and required additional development to become data services, so they could not quickly be turned into online services that empower external systems, making it hard to close the loop between analysis and services. Moreover, in past data processing, storage was chosen only for the needs of the moment; when later scenarios expanded, the original storage engine often no longer fit, so the same data had to be stored in different engines for different scenarios, creating data-consistency hazards and wasted cost.

2 Integrated practice of data service analysis based on StarRocks

To address these pain points, the JD Logistics operations data product team built UData (Universal Data), an integrated service and analysis system implemented on the StarRocks engine. UData abstracts the process of producing data indicators and generates data services from low-code configuration, which greatly reduces development complexity and difficulty: even non-R&D staff can configure and publish their own data services, and indicator development time dropped from one or two days to about 30 minutes, freeing up substantial R&D capacity. The platform's indicator management system and data map let users find and maintain indicators more intuitively and conveniently, and make indicator reuse possible.

For data analysis, we built the UData unified query engine on a StarRocks-based federated query solution, which solves the problems of inconsistent query engines and data islands. StarRocks also delivers powerful query performance, whether on large wide tables or multi-table joins, and its real-time ingestion capabilities and variety of real-time data models support real-time update scenarios well. UData combines analysis and services so that they are no longer two separate processes: once users identify valuable data through analysis, they can immediately generate the corresponding data service, quickly closing the loop between service and analysis.

Data flow architecture diagram:

Architecture before transformation:


Figure 1 Architecture diagram before transformation


Before the transformation, real-time data was consumed from JDQ (JD's log message queue, similar to Kafka) and JMQ into Flink for real-time processing, then written to ClickHouse and Elasticsearch to provide OLAP queries for data services and data analysis. Offline data was processed by Spark at the data warehouse level, and APP-layer data was synchronized to MySQL or ClickHouse for OLAP queries. In this architecture, data services and data analysis were two separate parts: analysis tools struggled to work across multiple data sources and query languages, and data services were also developed chimney-style.

Architecture after transformation:


Figure 2 The transformed architecture


After the transformation, we introduced StarRocks in the storage layer for its extremely fast single-table and multi-table query capabilities, and built a unified query engine on top of it. The unified query engine adds extra data sources, aggregation pushdown, and other functions tailored to JD's business characteristics, and UData unifies data analysis and data services on the basis of this engine.

Building an integrated data service and analysis system places high demands on the query engine: it must simultaneously deliver extremely fast query performance, support federated queries, and unify real-time and offline storage. Based on these three requirements, the following sections discuss why StarRocks queries are so fast, our modifications for federated queries, and our practice in real-time scenarios.

2.1 Reasons for StarRocks’ extremely fast query performance

Extremely fast single-table queries:

StarRocks does a great deal of work to make queries fast. Here we focus on four points:

  1. Vectorized execution: StarRocks implements full vectorization from the storage layer to the query layer, which is the foundation of its speed. The fully vectorized engine organizes and processes data in columns: data storage, in-memory organization, and the computation of SQL operators are all columnar. Columnar organization makes fuller use of the CPU cache, and columnar computation involves fewer virtual function calls and fewer branch judgments, yielding a fuller CPU instruction pipeline. The vectorized engine also uses vectorized algorithms to exploit the CPU's SIMD instructions, so StarRocks completes more data operations with fewer instructions. On standard test sets, the fully vectorized engine improves overall operator performance by 3 to 10 times.
  2. Materialized views accelerate queries: in real analysis scenarios we often analyze large tables with tens of billions of rows. Even with StarRocks' excellent performance, query speed is still affected by the data volume. Building a materialized view over the dimensions users frequently aggregate by can speed queries up by more than 10x without changing the query statement, and StarRocks' intelligent materialized views match requests to views automatically, with no need to query a view by hand.
  3. CBO: the cost-based optimizer uses the Cascades framework, improves cost estimation with a variety of statistics, and adds logical transformation (Transformation) and physical implementation (Implementation) rules, allowing it to select the lowest-cost execution plan from a search space of tens of thousands of plans.
  4. Adaptive low-cardinality optimization: StarRocks can adaptively build a global dictionary for low-cardinality string columns based on the data distribution and use the Int type for storage and queries, which reduces memory overhead, favors SIMD instruction execution, and speeds up queries. ClickHouse has a comparable LowCardinality optimization, but it must be declared at table creation, which makes it more cumbersome to use.
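As a rough illustration of point 1 (plain Python with invented data, not StarRocks code), a columnar engine stores each column as a contiguous array and runs each operator over a whole batch, instead of dispatching per row; the tight per-column loops are what caches and SIMD can accelerate:

```python
# Row-at-a-time layout: one dict per row, per-row dispatch and branching.
rows = [{"region": "north", "amount": 10},
        {"region": "south", "amount": 20},
        {"region": "north", "amount": 30}]

def row_wise_sum(rows, region):
    total = 0
    for r in rows:                   # one "next row" step per record
        if r["region"] == region:
            total += r["amount"]
    return total

# Columnar layout: one list per column; operators work on whole batches.
columns = {"region": ["north", "south", "north"],
           "amount": [10, 20, 30]}

def column_wise_sum(columns, region):
    # The compare runs over a whole column at once -- the kind of tight,
    # branch-light loop that a vectorized engine maps onto SIMD.
    mask = [v == region for v in columns["region"]]
    return sum(a for a, m in zip(columns["amount"], mask) if m)

assert row_wise_sum(rows, "north") == column_wise_sum(columns, "north") == 40
```

Both functions compute the same filtered sum; the difference is only in data layout and loop shape, which is where the CPU-level gains described above come from.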

Extremely fast multi-table joins:

In real-time analysis scenarios, fast single-table queries are not enough. To speed up queries, the industry habitually flattens multiple tables into one large wide table. Wide tables are fast, but extremely inflexible: the real-time processing layer joins multiple tables with Flink and writes the result into one wide table, and when the business side wants to change or add analysis dimensions, the data development cycle is often so long that by the time processing is finished, the best window for analysis has passed. A more flexible data model is therefore needed, and the ideal approach is to return from the wide-table model to a star or snowflake model. In that scenario, the query engine's multi-table join performance becomes the key. ClickHouse is mainly oriented to wide tables; with multi-table joins, its response time cannot be guaranteed and OOM is even quite likely. StarRocks solves this well: large-table join performance improves by 3 to 5x or more, making it a powerful tool for star-model analysis. The CBO is the key to this performance, and StarRocks supports multiple join strategies such as Broadcast Join, Shuffle Join, Bucket Shuffle Join, Colocate Join, and Replicated Join, with the CBO intelligently choosing the join order and join method.
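The broadcast-join idea behind fast star-model queries can be sketched in plain Python (all table and column names here are invented): the small dimension table is built into a hash map and shipped to every node scanning the large fact table, so the join is a single streaming probe.

```python
# Fact table: large, streamed. Dimension table: small, broadcast.
fact_orders = [("w1", "city_a", 100), ("w2", "city_b", 50), ("w3", "city_a", 70)]
dim_city = [("city_a", "North Region"), ("city_b", "South Region")]

def broadcast_hash_join(fact, dim):
    # Build side: hash the small dimension table once.
    dim_index = {key: region for key, region in dim}
    # Probe side: stream the large fact table against the hash map.
    return [(waybill, dim_index[city], amount)
            for waybill, city, amount in fact if city in dim_index]

joined = broadcast_hash_join(fact_orders, dim_city)

# Aggregate per region, as a star-model query would after the join.
totals = {}
for _, region, amount in joined:
    totals[region] = totals.get(region, 0) + amount
```

A real optimizer chooses between this and shuffle/colocate variants based on table sizes and data placement; the sketch only shows why joining at query time keeps dimensions flexible, unlike a pre-built wide table.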

2.2 Transformation of StarRocks federated query

Due to differing requirements, scenarios, and history, truly unified storage at the storage layer is hard to achieve. In past data service development, the non-unified storage layer and differing query syntaxes led to chimney-style development: the indicators we developed were hard to reuse, and the large number of developed indicators was hard to manage. Federated queries solve this well: a unified query engine shields the proprietary DSLs of the different OLAP engines, which greatly improves development efficiency and lowers the learning cost. It also lets one SQL statement combine indicators from different data sources into new indicators, improving indicator reusability. StarRocks' external table extensions provide the foundation for federated queries, but we had some business requirements of our own in the details.

StarRocks supports a variety of external tables for federated queries, such as ES, MySQL, Hive, and data lakes, so it already has a good federated-query foundation. In real business scenarios, however, some aggregation queries must pull data from an external source and then aggregate it inside StarRocks, even though the source engine's own aggregation performance is good, which actually lengthens the query. Our idea was to let the engines that are good at aggregation do it themselves, pushing aggregation operations down to the external engine. The engines currently meeting this optimization condition are MySQL, Elasticsearch, and ClickHouse. To be compatible with more data sources, we also added a JSF (JD's internal RPC service)/HTTP data source. Both parts are briefly introduced below:

1. Aggregation pushdown for MySQL and Elasticsearch

StarRocks' current approach to aggregating over external data sources is to pull the data after predicate pushdown. Although predicate pushdown filters out part of the data, pulling the remainder into StarRocks and then aggregating is a heavy operation, and the resulting aggregation time is far from ideal. Our approach is to push the aggregation down and let the external table's engine aggregate by itself, saving data-transfer time and benefiting from more efficient local aggregation. In some scenarios, aggregation pushdown improves performance by more than 10x.
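To make the idea concrete, here is a hypothetical sketch (invented function and field names, not the actual FE code) of translating a simple "group by one column, sum another" aggregate into a native Elasticsearch aggregation request, so that only small pre-aggregated buckets travel back over the network instead of raw rows:

```python
def build_es_agg_dsl(group_by, agg_func, agg_field):
    # ES does the grouping and summing itself; the engine only receives
    # the final buckets, instead of scanning and aggregating locally.
    return {
        "size": 0,  # return no raw documents, aggregation buckets only
        "aggs": {
            "group_by": {
                "terms": {"field": group_by},
                "aggs": {"metric": {agg_func: {"field": agg_field}}},
            }
        },
    }

# Roughly: SELECT status, SUM(amount) FROM waybills GROUP BY status
dsl = build_es_agg_dsl("status", "sum", "amount")
```

The `"size": 0` setting is what distinguishes pure pushdown from a scan: the external engine returns aggregation results only, which is the behavior the QueryNode described below relies on.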

 


Figure 3 Physical plan optimization diagram


At the physical execution plan layer we optimized further: when we encounter an aggregation over ES, MySQL, or ClickHouse, we rewrite the ScanNode + AggNode plan into a QueryNode. A QueryNode is a special ScanNode; unlike an ordinary ScanNode, it sends the aggregation query directly to the corresponding external engine rather than scanning the data and aggregating locally. For EsQueryNode, the ES query DSL is generated on the FE side and pushed straight down to the BE side for execution, and we implemented two QueryNodes, EsQueryNode and MysqlQueryNode, on the BE side.

2. Add JSF (Jingdong internal RPC service)/HTTP data source

Data services may need to integrate external data services and reuse previously developed indicators. Our idea was to abstract JSF (JD's internal RPC service)/HTTP services into StarRocks external tables, so users can access a data service through SQL just as they would query a database. This allows reusing old indicators and combining them with data from other sources to produce new composite indicators. We added JSF and HTTP ScanNodes on both the FE and BE sides.
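A minimal sketch of the idea, with an invented stand-in for the remote call (the real JSF/HTTP scan nodes are C++/Java inside StarRocks): the scan node invokes the service, then exposes the JSON payload as rows in schema order so it can be joined with other sources in one SQL statement.

```python
import json

def fake_http_service():
    # Stand-in for a real JSF/HTTP data-service call returning JSON.
    return json.dumps([{"waybill": "w1", "status": "delivered"},
                       {"waybill": "w2", "status": "in_transit"}])

class HttpScanNode:
    def __init__(self, fetch, schema):
        self.fetch = fetch      # callable performing the remote request
        self.schema = schema    # ordered column names of the "table"

    def scan(self):
        # Turn each JSON object into a tuple in schema order, like a row
        # handed upward to join/aggregate operators.
        for obj in json.loads(self.fetch()):
            yield tuple(obj[col] for col in self.schema)

node = HttpScanNode(fake_http_service, ["waybill", "status"])
rows = list(node.scan())
```

Once a service looks like a table of rows, the engine above it needs no special casing: the same join and aggregation operators apply to RPC results as to any other external table.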

2.3 Practice in real-time scenarios

The vast majority of JD Logistics' real-time data involves updates: waybill data changes as the business status changes. Below are the three real-time update solutions we run in production:

Solution 1: Real-time update solution based on ES

The principle is as follows:

  1. Fetch the existing document
  2. Update the old document in memory
  3. Mark the old document as deleted
  4. Create a new document

Advantages:

  • Supports real-time data updates, including partial updates

Disadvantages:

  • ES aggregation performance is poor, and queries become very slow as the number of aggregation dimensions grows.
  • ES's DSL increases development work; ES supports simple SQL, but not enough for complex business scenarios.
  • Old data is hard to clean up: when compaction physically deletes marked documents, it triggers heavy IO, and if write volume is high at that moment, read and write performance suffer badly.
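The mark-delete update flow above, and the deferred compaction cost it implies, can be simulated with a toy store (invented structures, not ES internals):

```python
class TinySegmentStore:
    def __init__(self):
        self.docs = []            # entries: [doc_id, body, deleted_flag]

    def index(self, doc_id, body):
        self.docs.append([doc_id, body, False])

    def update(self, doc_id, body):
        for doc in self.docs:
            if doc[0] == doc_id and not doc[2]:
                doc[2] = True     # step 3: mark the old version deleted
        self.docs.append([doc_id, body, False])  # step 4: new document

    def compact(self):
        # Heavy IO in a real system: rewrite segments without tombstones.
        self.docs = [d for d in self.docs if not d[2]]

store = TinySegmentStore()
store.index("w1", {"status": "created"})
store.update("w1", {"status": "delivered"})

# Both versions still occupy storage until compaction runs.
live = [d for d in store.docs if not d[2]]
```

Reads must skip tombstones, and the physical cleanup is batched into compaction, which is exactly why a write burst coinciding with compaction hurts both read and write performance.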

Solution 2: A quasi-real-time solution based on ClickHouse

The principle is as follows:

  1. Implemented with ReplacingMergeTree
  2. Rows with the same primary key are distributed to the same data partition on the same data node
  3. Queries merge on read, combining multiple versions of the data

Advantages:

  • ClickHouse writes are essentially append-only, so write performance is strong

Disadvantages:

  • Because versions are merged at read time, query and concurrency performance are poor.
  • ClickHouse's join performance is poor, which leads to data-island problems
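The merge-on-read behavior described above can be sketched in a few lines (invented data, ReplacingMergeTree-style semantics): every write is an append carrying a version, and the query path must collapse versions per primary key, which is where the read cost comes from.

```python
writes = [  # (primary_key, version, payload) appended in arrival order
    ("w1", 1, "created"),
    ("w2", 1, "created"),
    ("w1", 2, "picked_up"),
    ("w1", 3, "delivered"),
]

def merge_on_read(writes):
    # The merge step is paid at query time, on every query, until a
    # background merge eventually collapses the versions on disk.
    latest = {}
    for key, version, payload in writes:
        if key not in latest or version > latest[key][0]:
            latest[key] = (version, payload)
    return {key: payload for key, (_, payload) in latest.items()}

snapshot = merge_on_read(writes)
```

Appending is cheap, which explains the strong write performance; but each read scans and reconciles all versions, which explains the poor query and concurrency performance.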

Solution 3: Real-time update solution based on StarRocks primary key model

Principle: when StarRocks receives an update to a row, it locates the record through the primary key index, marks it for deletion, and inserts a new record; in effect, an Update is rewritten as Delete + Insert. When StarRocks receives a delete for a row, it locates the record through the primary key index and marks it for deletion. Queries are therefore unaffected in their use of predicate pushdown and indexes, ensuring efficient execution, and query speed is 5 to 10x that of the merge-on-read approach.

Advantages:

  • Only one version of each row exists, so query performance is strong and updates are real-time
  • Delete + Insert loses a little write performance, but writes remain very fast overall
  • MySQL protocol, simple and friendly to use

Disadvantages:

  • The current version has some restrictions on data deletion and cannot delete with a DELETE statement; the community will add this capability in a future version.
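The delete-and-insert principle behind the primary key model can be sketched as follows (invented structures, not StarRocks internals): a primary-key index points at the single live row, an update tombstones the old row and appends a new one, and the read path never merges versions.

```python
class PrimaryKeyTable:
    def __init__(self):
        self.rows = []        # append-only storage: [key, payload, deleted]
        self.pk_index = {}    # primary key -> position of the live row

    def upsert(self, key, payload):
        if key in self.pk_index:
            # Update rewritten as Delete + Insert: tombstone the old row.
            self.rows[self.pk_index[key]][2] = True
        self.rows.append([key, payload, False])
        self.pk_index[key] = len(self.rows) - 1

    def read(self, key):
        # No version merge on the read path: the index names one live row.
        pos = self.pk_index.get(key)
        return self.rows[pos][1] if pos is not None else None

t = PrimaryKeyTable()
t.upsert("w1", "created")
t.upsert("w1", "delivered")
```

The extra index maintenance on the write path is the "slight loss in writing performance" noted above; in exchange, reads see exactly one version per key, which is where the 5-10x query speedup over merge on read comes from.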

Broadly, real-time update scenarios are handled by the following approaches:

  1. Merge on read: StarRocks' Aggregate and Unique models and ClickHouse's ReplacingMergeTree and AggregatingMergeTree all take this approach. Writes are appends and therefore fast, but queries must merge multiple versions of the data and are therefore slow; it suits real-time analysis scenarios that are not demanding about query performance.
  2. Copy on write: data lake systems such as Hudi and Iceberg currently offer copy-on-write. When data is updated, the new and old data are merged and rewritten into a new file that replaces the old one. Queries need no merge, so query performance is very good, but writing and merging are heavy, so this approach does not suit scenarios with strong real-time write requirements.
  3. Delete and insert: an upsert approach that uses an in-memory primary key index to locate the row to update, marks it deleted, and inserts a new row. It sacrifices some write performance, but queries improve severalfold over merge on read, and concurrency improves as well.

Real-time updates have always been a hard problem in the OLAP field, and previous solutions struggle to combine good write performance, good read performance, and ease of use. StarRocks' delete-and-insert approach is currently the closest to ideal: it performs excellently on both reads and writes, and it supports the MySQL protocol, making it simple and friendly to use. Since UData's offline analysis is also done on StarRocks, we have achieved integrated real-time and offline analysis.

3 Follow-up directions

Data lake exploration: batch-stream unification is a major trend, and the data lake has become its standard storage carrier, so our general direction is also batch-stream unification. A major pain point today is the lack of a query engine that can query the data lake extremely fast; we will later use StarRocks to build extremely fast analysis capabilities on the lake, so that batch-stream unification does not stop at the computation stage.
The architecture diagram is as follows:

Figure 4 Later planning architecture diagram

  • Unified real-time data storage: the system still has multiple real-time storage solutions, and operation and maintenance costs remain high. We will gradually replace ES and ClickHouse with StarRocks to unify storage at the real-time layer. We also look forward to the upcoming primary key model support for deleting data with DELETE statements, which will simplify our current data-clearing problem.
  • More data sources: we will support more data sources in the future, such as Redis, HBase, and other KV-style NoSQL databases, to strengthen StarRocks' query capabilities.
  • Federated queries across StarRocks clusters: in production it is hard to run just one large cluster, especially with heavy real-time writes; it is safer to split into smaller clusters so that a problem in one cluster does not affect other businesses. The downside is that the clusters may again become data islands. Even if one StarRocks cluster is disguised as MySQL and mapped as external tables, tools are still needed to synchronize each cluster's table structures and other metadata, which is time-consuming and laborious to manage. We will discuss with the community how to implement federation across clusters.

Author: JD Logistics Zhang Dong He Siyuan

Source: JD Cloud Developer Community, Ziyuanqishuo Tech. Please indicate the source when reprinting.


Origin my.oschina.net/u/4090830/blog/10305521