The technology behind the Alibaba Cloud Shenlong team's first-place TPCx-BB ranking

Introduction: The Shenlong big data acceleration engine, developed in-house by Alibaba Cloud, has taken first place worldwide in the TPCx-BB SF3000 ranking.

1. Background

Recently, TPC Express Big Bench (TPCx-BB) announced its latest world rankings. Alibaba Cloud's self-developed Shenlong big data acceleration engine took first place in the TPCx-BB SF3000 ranking.

TPCx-BB results are ranked on two dimensions: performance and price/performance. In this ranking, Alibaba Cloud reached 2187.42 BBQpm, leading second place by 41.6% on performance, and brought price/performance down to 346.53 USD/BBQpm, leading second place by 40%.
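The two percentages are relative leads on metrics that point in opposite directions: BBQpm is higher-is-better, while USD/BBQpm is lower-is-better. A minimal sketch of the arithmetic follows. The function names are illustrative, and reading the 40% price/performance lead as "40% lower than the runner-up" is an assumption, since the article does not define it. The article does not give the runner-up's scores, so the example calls use placeholder values.

```python
def lead_higher_is_better(winner: float, runner_up: float) -> float:
    """Percent by which the winner exceeds the runner-up (e.g. BBQpm)."""
    return (winner / runner_up - 1.0) * 100.0


def lead_lower_is_better(winner: float, runner_up: float) -> float:
    """Percent by which the winner undercuts the runner-up (e.g. USD/BBQpm)."""
    return (1.0 - winner / runner_up) * 100.0


def price_performance(total_cost_usd: float, bbqpm: float) -> float:
    """Price/performance: total system cost divided by the performance score."""
    return total_cost_usd / bbqpm


# Placeholder numbers, NOT taken from the published ranking.
print(lead_higher_is_better(150.0, 100.0))
print(lead_lower_is_better(60.0, 100.0))
print(price_performance(1000.0, 4.0))
```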

(TPCx-BB SF3000 performance dimension ranking)

(TPCx-BB SF3000 price/performance dimension ranking)

We would like to take this opportunity to share the technology behind this first-place result.

2. Overview of the Shenlong big data acceleration engine MRACC

MRACC (Apsara Compute MapReduce Accelerator), developed in-house by Alibaba Cloud, is the key to this result.

As data processing demand surges, many enterprises use open source components such as Spark and Hadoop, or common distributions such as HDP and CDH, to build their own open source big data clusters. These clusters process data volumes from terabytes to petabytes and range in size from a few nodes to thousands. The MRACC Shenlong big data acceleration engine targets these self-built customer scenarios. Relying on the Shenlong base, it accelerates common components such as Spark, Hadoop, and Alluxio.

By exploiting the characteristics of Alibaba Cloud's Shenlong architecture, MRACC integrates software and hardware optimizations to gain a distinctive performance advantage. As a result, complex SQL query scenarios run 2-3 times faster than on community Spark, and using eRDMA to accelerate Spark improves performance by 30%. With the Shenlong big data acceleration engine, enterprises that run big data clusters on Alibaba Cloud ECS cloud servers obtain higher performance and better price/performance.

Figure 1 MRACC Shenlong big data acceleration engine architecture

3. Introduction to MRACC-Spark

Spark has been under development for ten years since its launch in 2010 and has become the engine of choice for big data batch computing. MRACC is optimized for Spark as the most commonly used big data engine. Big data tasks are IO-heavy, so MRACC combines the architectural advantages of the cloud to accelerate both network and storage across software and hardware. On the software side, this includes SQL engine optimization using techniques such as caching, file pruning, and indexing, along with attempts to offload compression and other operations to heterogeneous devices. On the network side, eRDMA is used for acceleration: data exchange in the shuffle phase runs over the eRDMA network, which reduces latency and greatly improves CPU utilization.

Figure 2 MRACC-Spark architecture

4. Spark SQL engine optimization

Since Spark 2, the Spark SQL, DataFrame, and Dataset interfaces have gradually replaced the basic RDD API as Spark's mainstream programming model. The community has invested heavily in Spark SQL. According to statistics, nearly half of the optimizations in Spark 3.0 are concentrated on Spark SQL. Using Spark SQL instead of Hive for offline tasks has become a mainstream choice for many enterprises.

We have optimized the analyzer, optimizer, planner, and query execution stages of the SQL engine. Spark 3.0 substantially reworked the SQL engine, and its adaptive query execution (AQE) and dynamic partition pruning (DPP) mechanisms have attracted wide attention. However, DPP in open source Spark currently supports only partition pruning; it does not support pruning on non-partition keys or on subqueries. We have extended this area to support dynamic data pruning driven by subqueries, which can greatly reduce the amount of data involved in computation.
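To make the idea of subquery-driven data pruning concrete, here is a minimal sketch in plain Python. It is not MRACC's implementation: the `DataFile` type, the `item_id` and `category` columns, and the use of per-file min/max statistics are all assumptions chosen for illustration. The small dimension-side subquery runs first, and its join keys are then used to skip fact-table files that cannot contain a match, even though the join key is not a partition column.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class DataFile:
    """A fact-table file with min/max statistics for the join key (hypothetical)."""
    path: str
    min_key: int
    max_key: int


def keys_from_subquery(dim_rows: Iterable[dict], category: str) -> Set[int]:
    """Run the small dimension-side subquery first and collect its join keys."""
    return {row["item_id"] for row in dim_rows if row["category"] == category}


def prune_files(files: List[DataFile], join_keys: Set[int]) -> List[DataFile]:
    """Keep only files whose [min_key, max_key] range can contain a join key."""
    return [
        f for f in files
        if any(f.min_key <= k <= f.max_key for k in join_keys)
    ]


dim = [
    {"item_id": 3, "category": "books"},
    {"item_id": 42, "category": "toys"},
    {"item_id": 57, "category": "books"},
]
fact_files = [
    DataFile("part-0", 0, 19),
    DataFile("part-1", 20, 39),
    DataFile("part-2", 40, 59),
]

keys = keys_from_subquery(dim, "books")   # item ids 3 and 57
to_scan = prune_files(fact_files, keys)   # part-1 holds no matching key range
print([f.path for f in to_scan])
```

The saving comes from never reading the skipped files at all, which is why the pruning has to be decided at run time, after the subquery result is known.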

In the physical plan execution phase, we support window top-N sorting, which greatly improves the performance of SQL statements that include LIMIT, as well as advanced features such as Parquet row-group pruning and Bloom filter join. The cost-based optimizer (CBO) of Spark SQL can improve SQL execution efficiency, but when a query joins too many tables, the CBO search overhead surges. We support a genetic-algorithm search to contain this overhead.

In addition, MRACC-Spark supports features such as deduplication pushdown, join foreign key elimination, and integrity constraints, and it integrates with Delta Lake to support insert, delete, and update operations on data.
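The benefit of window top-N sorting can be sketched in a few lines: rather than fully sorting every window partition and then discarding all but the first N rows, each partition keeps a bounded heap of at most N values. The code below is an illustrative sketch with row-number style semantics (ties are not treated specially); the function name and row layout are assumptions, not MRACC's operator.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # (partition key, value to order by)


def window_top_n(rows: Iterable[Row], n: int) -> Dict[str, List[int]]:
    """Top-n values per partition key, keeping at most n values per key."""
    if n <= 0:
        raise ValueError("n must be positive")
    heaps: Dict[str, List[int]] = defaultdict(list)
    for key, value in rows:
        heap = heaps[key]
        if len(heap) < n:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # The new value beats the smallest kept value, so replace it.
            heapq.heapreplace(heap, value)
    # Emit each partition in descending order, like ORDER BY value DESC.
    return {key: sorted(heap, reverse=True) for key, heap in heaps.items()}


rows = [("a", 5), ("b", 1), ("a", 9), ("a", 7), ("b", 4), ("a", 2)]
print(window_top_n(rows, 2))
```

With a bounded heap, the per-partition cost is O(m log N) for m rows instead of O(m log m) for a full sort, and memory stays proportional to N.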
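Of the execution-phase features named in this section, the Bloom filter join is easy to illustrate. The sketch below is a generic, single-process version and makes no claim about how MRACC builds or distributes its filters; the class, its sizing parameters, and the table layout are assumptions. A compact filter is built from the small side's join keys and used to discard most non-matching rows of the large side before the exact join.

```python
import hashlib
from typing import Iterable, List, Tuple


class BloomFilter:
    """A tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: int) -> Iterable[int]:
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: int) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: int) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def bloom_filter_join(
    small: List[Tuple[int, str]], large: List[Tuple[int, int]]
) -> List[Tuple[int, str, int]]:
    """Inner join on the first column, pre-filtering the large side."""
    bloom = BloomFilter()
    lookup = {}
    for key, name in small:
        bloom.add(key)
        lookup[key] = name
    # Pre-filter the large side; in a cluster this would run before the shuffle.
    candidates = [(k, v) for k, v in large if bloom.might_contain(k)]
    # The exact lookup removes any false positives the filter let through.
    return [(k, lookup[k], v) for k, v in candidates if k in lookup]


small_side = [(1, "a"), (2, "b")]
large_side = [(1, 10), (3, 30), (2, 20), (4, 40), (1, 11)]
print(bloom_filter_join(small_side, large_side))
```

Because the filter can report false positives but never false negatives, the final exact check keeps the result correct while the pre-filter only reduces how much data has to move.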

Figure 3 SQL engine optimization of MRACC-Spark

5. Near-Network RDMA Optimization

At the 2021 Hangzhou Yunqi Conference, Alibaba Cloud released the fourth-generation Shenlong architecture, providing the industry's first large-scale elastic RDMA acceleration capability. RDMA is a high-performance network transmission technology that provides direct memory access and bypasses the kernel for data transmission, thereby reducing CPU overhead and providing a low-latency, high-performance network. In distributed computing, the shuffle process is essential and consumes a large share of computing and network resources, which makes it a focus of optimization for big data workloads. Because Spark computes in memory, the shuffle data exchange can be turned into a memory-network-memory path that takes full advantage of RDMA's direct user-space memory access, low latency, and low CPU consumption. This ultimately delivered a 30% performance improvement on end-to-end benchmarks such as TPCx-HS.
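The reason shuffle is so sensitive to network latency is its all-to-all pattern: every map task may produce one block for every reduce task. The toy model below illustrates that pattern only; it does not use RDMA, and the function names and the byte-sum hash are assumptions made to keep the example deterministic. With M map tasks and R reducers, up to M x R blocks cross the network, and the per-block transfer cost is what an RDMA transport aims to reduce.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int]


def map_side_partition(
    records: List[Record], num_reducers: int
) -> Dict[int, List[Record]]:
    """Split one map task's output into at most one block per reducer."""
    blocks: Dict[int, List[Record]] = defaultdict(list)
    for key, value in records:
        reducer = sum(key.encode()) % num_reducers  # deterministic toy hash
        blocks[reducer].append((key, value))
    return blocks


def shuffle(
    map_outputs: List[List[Record]], num_reducers: int
) -> Tuple[Dict[int, List[Record]], int]:
    """Move every block to its reducer and count the block transfers."""
    reducer_inputs: Dict[int, List[Record]] = defaultdict(list)
    transfers = 0
    for records in map_outputs:
        blocks = map_side_partition(records, num_reducers)
        for reducer, block in blocks.items():
            reducer_inputs[reducer].extend(block)  # stands in for a network fetch
            transfers += 1
    return reducer_inputs, transfers


map_outputs = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
inputs, transfers = shuffle(map_outputs, num_reducers=2)
print(dict(inputs), transfers)
```

All records with the same key end up on the same reducer, which is what forces the exchange; the transfer count grows with the product of map and reduce tasks, so shaving latency and CPU cost off each transfer adds up across a job.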

Figure 4 The eRDMA near-network optimization plugin of MRACC-Spark

6. Performance optimization results

On the TPC-DS 10 TB dataset, performance is improved by 2.19 times compared with the latest Spark 3.1 version. On TPCx-BB, the lead over second place is 41.6%.

Figure 5 TPC-DS and TPCx-BB results

7. Outlook

At present, all of these optimizations are packaged and delivered to customers as plug-ins. Customer code needs essentially no modification, so customers can use the optimizations directly.

In the future, we will continue to serve Alibaba Cloud's big data customers with our integrated software-hardware performance optimization capabilities. We will also keep iterating on these capabilities to build an MRACC Shenlong big data acceleration service with higher performance and lower cost, and make it available to a broad range of users.
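As an illustration of what plug-in delivery can look like from the customer's side, the sketch below builds a `spark-submit` command line that enables extensions purely through configuration. `spark.sql.extensions` and `spark.shuffle.manager` are standard Spark configuration keys, but the class names and the jar name are hypothetical placeholders: the article does not name MRACC's plug-in classes or say which keys it uses.

```python
from typing import Dict, List

# Hypothetical class names: the article does not name MRACC's plug-in classes.
HYPOTHETICAL_PLUGIN_CONF: Dict[str, str] = {
    "spark.sql.extensions": "com.example.mracc.SqlExtensions",
    "spark.shuffle.manager": "com.example.mracc.RdmaShuffleManager",
}


def spark_submit_args(app_jar: str, conf: Dict[str, str]) -> List[str]:
    """Build a spark-submit command line that enables plug-ins via --conf only."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app_jar)  # the application jar itself is unchanged
    return args


print(" ".join(spark_submit_args("my-etl-job.jar", HYPOTHETICAL_PLUGIN_CONF)))
```

In this model the application jar is passed through untouched, which matches the article's point that customer code basically does not need to be modified; the plug-in jars would still have to be on the cluster's classpath.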

Appendix: Introduction to TPCx-BB

TPCx-BB is an end-to-end big data benchmark based on retail scenarios, released by the Transaction Processing Performance Council (TPC), an international benchmark standards organization. It supports mainstream distributed big data processing engines and simulates the entire online and offline business process. It contains 30 query statements, involving descriptive procedural queries, data mining, and machine learning algorithms. The TPCx-BB test is characterized by large data volumes, complex features, and complex data sources.

The test results of TPCx-BB can comprehensively and accurately reflect the overall performance of the end-to-end big data system. The test covers structured, semi-structured and unstructured data, and can more comprehensively evaluate the software and hardware performance, cost performance, service, and power consumption of big data systems from the perspective of actual customer scenarios.


This article is original content of Alibaba Cloud and may not be reproduced without permission. 


Origin blog.csdn.net/yunqiinsight/article/details/123704941