Meituan Big Data Query Technology

Series of articles

  1. Real-time storage engine and real-time calculation engine
  2. Meituan Dianping Hadoop/Spark System Practice
  3. Meituan Big Data Query Technology


This article mainly concerns data resources and services, which sit within the data products and data services section.
It has three parts: application scenarios, system architecture, and transformation cases.

1. Application scenarios

Background: we want to understand how a particular business affects the Meituan App as a whole.
"Growth Hacker" describes the pirate model (AARRR), which is essentially a funnel-style breakdown and analysis of traffic conversion. It covers the steps from user acquisition through conversion and activation, along with the corresponding analysis methods.
Methods alone are not enough, though; they need data to support them. How is that data organized? It is divided into five parts.
How do we actually carry out these analyses?

It comes down to a SQL query, read clause by clause.
First look at FROM: it joins the order table, the city table and the city dimension table.
Then look at WHERE: it selects repurchase orders for the bus business between August 18 and August 19.
Finally look at GROUP BY and SUM, and the query is basically clear.
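A query of this shape can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (orders, city, city_dim), their columns, the region grouping and the sample rows are all assumptions, not the query from the original post.

```python
import sqlite3

# All table names, columns and rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, city_id INTEGER, business TEXT,
                     is_repurchase INTEGER, dt TEXT, amount REAL);
CREATE TABLE city (city_id INTEGER, region_id INTEGER);
CREATE TABLE city_dim (region_id INTEGER, region_name TEXT);

INSERT INTO orders VALUES
  (1, 10, 'bus',   1, '08-18', 20.0),
  (2, 10, 'bus',   1, '08-19', 30.0),
  (3, 11, 'bus',   0, '08-18', 15.0),  -- not a repurchase: filtered out
  (4, 11, 'hotel', 1, '08-18', 99.0),  -- other business: filtered out
  (5, 11, 'bus',   1, '08-20', 40.0),  -- outside the date range
  (6, 11, 'bus',   1, '08-19', 12.5);
INSERT INTO city VALUES (10, 100), (11, 200);
INSERT INTO city_dim VALUES (100, 'North'), (200, 'South');
""")

QUERY = """
SELECT d.region_name, SUM(o.amount) AS total_amount
FROM orders AS o                                   -- FROM: join three tables
JOIN city AS c ON o.city_id = c.city_id
JOIN city_dim AS d ON c.region_id = d.region_id
WHERE o.business = 'bus'                           -- WHERE: bus business,
  AND o.is_repurchase = 1                          -- repurchase orders,
  AND o.dt BETWEEN '08-18' AND '08-19'             -- August 18 to August 19
GROUP BY d.region_name                             -- GROUP BY + SUM
ORDER BY d.region_name
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Only orders 1, 2 and 6 pass the WHERE clause, so the result holds one total per region.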
This is the OLAP analysis mentioned earlier. What operations does OLAP analysis offer? Five are covered here.

  1. Drill-down: add a dimension to analyze the problem at a finer granularity. (Picture a cuboid with one layer expanding into three.)
  2. Roll-up (drill-up): remove a dimension to look at the problem from a (relatively) macro perspective. (Picture a cube with three layers compressed into one.)
  3. Slice: on one dimension, look at a single value. (Picture a cube with three layers of which only one is kept.)
  4. Dice: on one dimension, look at only a few values. (Picture a cube with three layers of which only two are kept.)
  5. Pivot (rotation): swap rows and columns.

Figure: some commercial BI systems.
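The five operations above can be sketched on a tiny in-memory fact table. The rows, the dimension names (city, business, day) and the helper functions are hypothetical and exist only to make each operation concrete.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (city, business, day), one measure.
FACTS = [
    {"city": "Beijing",   "business": "bus",   "day": "08-18", "amount": 10},
    {"city": "Beijing",   "business": "hotel", "day": "08-18", "amount": 20},
    {"city": "Shanghai",  "business": "bus",   "day": "08-18", "amount": 5},
    {"city": "Shanghai",  "business": "bus",   "day": "08-19", "amount": 7},
    {"city": "Guangzhou", "business": "hotel", "day": "08-19", "amount": 8},
]

def aggregate(rows, dims):
    """Sum `amount` grouped by the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["amount"]
    return dict(totals)

def select(rows, dim, values):
    """Keep only rows whose `dim` is one of `values`."""
    return [row for row in rows if row[dim] in values]

def pivot(totals):
    """Swap the two axes of a two-dimensional result."""
    return {(b, a): v for (a, b), v in totals.items()}

by_city = aggregate(FACTS, ["city"])                  # starting view
drill_down = aggregate(FACTS, ["city", "business"])   # add a dimension
roll_up = aggregate(FACTS, [])                        # remove dimensions
slice_ = aggregate(select(FACTS, "day", {"08-18"}), ["city"])  # one value
dice = aggregate(select(FACTS, "city", {"Beijing", "Shanghai"}), ["business"])  # a few values
rotated = pivot(drill_down)                           # rows <-> columns
```

Each result is a plain dictionary keyed by the remaining dimension values, which keeps the difference between the operations easy to inspect.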

2. System architecture

2.1. System architecture review: Presto

This section focuses on a widely used, representative database to see how a distributed SQL statement runs.

First, Meituan's approach to database selection. It is mainly divided into three scenarios, one of which is ad hoc query.
Presto, which we introduce next, mainly serves the ad hoc query scenario. Its evolutionary history is worth a look if you are interested.

In a nutshell, the design philosophy is to trade reliability for performance. Spark and MapReduce, discussed earlier in this series, write intermediate data to disk during the shuffle, so that even if a node goes down the job can quickly resume from the previous stage and reuse as much of the earlier data as possible. Presto does not handle this case; it is positioned in contrast to the ultra-large-scale scenarios served by Hive and Spark.

So if a query cannot hold up, it fails fast. Presto can also join data from other types of databases.
From a macro perspective, the overall architecture looks like this: the blue components in the architecture diagram are Presto's own services, and in front of them sits a client-side tool similar to the MySQL command line. The MetaStore basically stores the metadata of the databases and tables on HDFS. (No nesting dolls, please.)

Presto still uses a master-slave architecture: the Coordinator coordinates and the Workers do the work.

In more detail, the client submits a SQL statement. The Coordinator analyzes, plans, splits and schedules it, then hands the resulting tasks to the modules managed by each Worker. After a Worker gets its task, it scans the data from HDFS and computes on it. Once the Workers' results are aggregated, they are returned to the client through a streaming interface.
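The flow just described can be mimicked in a few lines of Python. This is a toy simulation, not Presto code: the thread pool stands in for the Workers, the in-memory SPLITS list stands in for data on HDFS, and the function names are made up for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for data blocks on HDFS: each split is a list of (city, amount) rows.
SPLITS = [
    [("Beijing", 10), ("Shanghai", 5)],
    [("Beijing", 20), ("Shanghai", 7)],
    [("Guangzhou", 8)],
]

def worker_task(split):
    """Worker: scan one split and compute a partial SUM(amount) per city."""
    partial = {}
    for city, amount in split:
        partial[city] = partial.get(city, 0) + amount
    return partial

def coordinator(splits, num_workers=2):
    """Coordinator: schedule one task per split, merge, then stream results."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker_task, splits))
    merged = {}
    for partial in partials:
        for city, amount in partial.items():
            merged[city] = merged.get(city, 0) + amount
    # A generator stands in for the streaming interface back to the client.
    for city in sorted(merged):
        yield city, merged[city]

result = list(coordinator(SPLITS))
print(result)
```

The point of the sketch is the division of labor: planning and merging happen in one place, while the scanning work is spread across workers.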

The important part is the Coordinator's internal processing.
How is the SQL parsed? This draws on compiler theory: lexical analysis and syntax analysis produce a syntax tree.

What does the syntax tree look like? Roughly, its root is the query and its children are the clauses, such as SELECT, FROM and WHERE.
Prefixing a SQL statement with EXPLAIN makes the database tell you how it will execute the statement. From the syntax tree obtained above, a logical plan of the execution process is constructed.
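To make the two parsing phases concrete, here is a minimal sketch of a lexer and a recursive-descent parser for a deliberately tiny SQL subset (SELECT columns FROM table, with an optional WHERE column-operator-number). The grammar, token names and tree layout are assumptions for illustration; Presto's real parser is far richer.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\w+)|(>=|<=|<>|[=<>,]))")
KEYWORDS = {"SELECT", "FROM", "WHERE"}

def tokenize(sql):
    """Lexical analysis: turn the SQL text into (kind, value) tokens."""
    tokens, pos = [], 0
    sql = sql.strip()
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if not match:
            raise SyntaxError(f"unexpected character {sql[pos]!r} at {pos}")
        number, word, symbol = match.groups()
        if number is not None:
            tokens.append(("NUMBER", int(number)))
        elif word is not None and word.upper() in KEYWORDS:
            tokens.append(("KEYWORD", word.upper()))
        elif word is not None:
            tokens.append(("IDENT", word))
        else:
            tokens.append(("SYMBOL", symbol))
        pos = match.end()
    return tokens

def parse(tokens):
    """Syntax analysis: build a syntax tree (nested dicts) from the tokens."""
    pos = 0

    def expect(kind, value=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError(f"expected {value or kind}, got end of input")
        tok_kind, tok_value = tokens[pos]
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"expected {value or kind}, got {tok_value!r}")
        pos += 1
        return tok_value

    expect("KEYWORD", "SELECT")
    columns = [expect("IDENT")]
    while pos < len(tokens) and tokens[pos] == ("SYMBOL", ","):
        pos += 1
        columns.append(expect("IDENT"))
    expect("KEYWORD", "FROM")
    tree = {"type": "Query", "select": columns, "from": expect("IDENT"), "where": None}
    if pos < len(tokens):
        expect("KEYWORD", "WHERE")
        left, op, right = expect("IDENT"), expect("SYMBOL"), expect("NUMBER")
        tree["where"] = {"type": "Comparison", "op": op, "left": left, "right": right}
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return tree

tree = parse(tokenize("SELECT siteId, userid FROM pv WHERE siteId = 3"))
print(tree)
```

The resulting tree is what a planner would walk to build the logical plan.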
Here is Presto's abstraction over access to data sources (the connector). The SQL layer is shielded from how the data is actually read; each data source just needs some configuration to be accessible.
Next comes SQL optimization. In the example, pv is the user page-view table.
Normally such a SQL statement joins pv and user and then selects the rows that satisfy the conditions. Can this be optimized? If so, how?

When there is a lot of data, it is much better to filter down to the rows we need before joining the tables. In Presto, the table scan that feeds the join does not read all the data; it reads only the columns needed for the join, such as siteId and userid, which makes the query more efficient.

How are these optimizations achieved? First a set of rules is defined. The optimizer then runs these rules over the logical plan tree again and again; whenever it finds a structure that matches a rule, it rewrites that part of the tree, making the whole SQL statement more efficient.
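A rule-based rewrite of this kind can be sketched as follows. The plan node classes, the single rule (pushing a filter below a join) and the predicate string are hypothetical simplifications; they are not Presto's optimizer classes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Filter:
    table: str        # the table whose column the predicate refers to
    predicate: str
    child: object

@dataclass(frozen=True)
class Join:
    left: object
    right: object
    on: str

def tables(node):
    """Return the set of tables that a plan subtree reads."""
    if isinstance(node, Scan):
        return {node.table}
    if isinstance(node, Filter):
        return tables(node.child)
    return tables(node.left) | tables(node.right)

def push_filter_below_join(node):
    """Rule: Filter(Join(l, r)) becomes Join(Filter(l), r) if only l is needed."""
    if isinstance(node, Filter) and isinstance(node.child, Join):
        join = node.child
        if node.table in tables(join.left):
            return Join(Filter(node.table, node.predicate, join.left), join.right, join.on)
        if node.table in tables(join.right):
            return Join(join.left, Filter(node.table, node.predicate, join.right), join.on)
    return node

RULES = [push_filter_below_join]

def rewrite(node):
    """Apply every rule once over the tree, bottom-up."""
    if isinstance(node, Filter):
        node = Filter(node.table, node.predicate, rewrite(node.child))
    elif isinstance(node, Join):
        node = Join(rewrite(node.left), rewrite(node.right), node.on)
    for rule in RULES:
        node = rule(node)
    return node

def optimize(plan):
    """Re-run the rules until the plan stops changing (a fixed point)."""
    while True:
        new_plan = rewrite(plan)
        if new_plan == plan:
            return plan
        plan = new_plan

naive = Filter("pv", "pv.siteId = 3",
               Join(Scan("pv"), Scan("user"), "pv.userid = user.userid"))
optimized = optimize(naive)
print(optimized)
```

After optimization the filter sits directly on top of the pv scan, so fewer rows reach the join.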
After optimizing the logical plan, we must consider how to execute it physically.
The plan is split into fragments mainly based on the Optimizer's abstraction. Once the split is done, operations such as shuffle, GROUP BY and hash join are carried out.

The shuffle here is not written to disk.

Question: if the shuffle is not written to disk, what limits does that place on SQL? (Hint: think about JOIN and GROUP BY.)

Answer: when joining or aggregating, we want rows with the same key to be processed on the same node. If the data is skewed, a node that receives too much data will run out of memory.
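The skew problem is easy to reproduce with a toy hash shuffle. The sketch below assumes four nodes, a CRC32-based partition function and a made-up input in which one hot key carries 90% of the rows; none of this is Presto's actual partitioning code.

```python
import zlib

def partition(key, num_nodes):
    """Route a key to a node; crc32 keeps the result stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes

def shuffle(rows, num_nodes):
    """Send each row to a node so that equal keys always meet on one node."""
    nodes = [[] for _ in range(num_nodes)]
    for key, value in rows:
        nodes[partition(key, num_nodes)].append((key, value))
    return nodes

# Hypothetical skewed input: one hot key carries 90% of the rows.
rows = [("hot_city", 1)] * 900 + [(f"city_{i}", 1) for i in range(100)]
nodes = shuffle(rows, num_nodes=4)
load = [len(node) for node in nodes]
print(load)
```

Whichever node receives the hot key must hold at least 900 of the 1000 rows in memory, no matter how many nodes are added, because rows with the same key cannot be split across nodes.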

We then get a distributed execution plan, which is scheduled for execution on the individual nodes. That is the detailed execution logic inside the Coordinator.

Next comes the scheduling of the physical plan.

Presto also applies some optimizations at the physical level, such as codegen, whose main feature is compiling code at run time.
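The idea behind run-time code generation can be shown with a small Python analogy. This is not how Presto implements codegen; the expression tree, the column names (amount, fee) and the use of Python's compile and eval are assumptions chosen to contrast interpreting an expression per row with compiling it once.

```python
# Expression tree for the hypothetical predicate: amount * 2 + fee > 100
EXPR = (">", ("+", ("*", ("col", "amount"), ("const", 2)), ("col", "fee")), ("const", 100))

OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b, ">": lambda a, b: a > b}

def interpret(expr, row):
    """Walk the tree for every row: flexible, but pays dispatch cost per row."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "const":
        return expr[1]
    return OPS[kind](interpret(expr[1], row), interpret(expr[2], row))

def to_source(expr):
    """Turn the tree into Python source text once, before execution."""
    kind = expr[0]
    if kind == "col":
        return f"row[{expr[1]!r}]"
    if kind == "const":
        return repr(expr[1])
    return f"({to_source(expr[1])} {kind} {to_source(expr[2])})"

def codegen(expr):
    """Compile the generated source into a callable at run time."""
    source = f"lambda row: {to_source(expr)}"
    return eval(compile(source, "<generated>", "eval"))

compiled = codegen(EXPR)
rows = [{"amount": 40, "fee": 30}, {"amount": 10, "fee": 5}]
results = [compiled(row) for row in rows]
print(results)
```

The compiled function gives the same answers as the interpreter, but the tree is walked only once, when the source is generated.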
Other optimizations lean toward the storage layer: data indexes and the way the data is organized. Column-based storage formats (such as ORC and Parquet) were derived from the row-based storage structure.
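Why a column layout helps analytical queries can be seen in a rough sketch. The table below and the "fields read" counter are hypothetical; real formats such as ORC and Parquet add encoding, compression and statistics that are not modeled here.

```python
# Hypothetical table in row-oriented form: one dict per row.
ROWS = [
    {"order_id": 1, "city": "Beijing", "amount": 10},
    {"order_id": 2, "city": "Shanghai", "amount": 5},
    {"order_id": 3, "city": "Beijing", "amount": 20},
]

def to_columns(rows):
    """Reorganize a row-oriented table into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}

COLUMNS = to_columns(ROWS)

def sum_amount_row_store(rows):
    """Row layout: each whole row is touched although one field is needed."""
    fields_read, total = 0, 0
    for row in rows:
        fields_read += len(row)
        total += row["amount"]
    return total, fields_read

def sum_amount_column_store(columns):
    """Column layout: only the `amount` column is read."""
    column = columns["amount"]
    return sum(column), len(column)

row_result = sum_amount_row_store(ROWS)
column_result = sum_amount_column_store(COLUMNS)
print(row_result, column_result)
```

Both layouts give the same sum, but the column layout touches a third of the fields in this three-column example.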

2.2. Distributed OLAP system expansion technology

This section introduces the trade-offs some systems make in their architecture design. It covers four systems: Kylin, Druid, ClickHouse and Doris.

2.2.1 Kylin and Cube pre-aggregation

Kylin has open source and commercial versions. Its distinguishing feature is pre-aggregation. What does that mean?

Suppose a fact table has many dimensions and queries often aggregate along them. Kylin computes these aggregates before we query; at query time we simply look the result up in Kylin. It is commonly used for data cube analysis.
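Pre-aggregation can be sketched by computing the totals for every combination of dimensions ahead of time and answering queries by lookup. The fact rows and function names are hypothetical, and this toy ignores everything Kylin does to build and store cubes at scale.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("city", "business", "day")

# Hypothetical fact rows.
FACTS = [
    {"city": "Beijing",   "business": "bus",   "day": "08-18", "amount": 10},
    {"city": "Beijing",   "business": "hotel", "day": "08-18", "amount": 20},
    {"city": "Shanghai",  "business": "bus",   "day": "08-18", "amount": 5},
    {"city": "Shanghai",  "business": "bus",   "day": "08-19", "amount": 7},
    {"city": "Guangzhou", "business": "hotel", "day": "08-19", "amount": 8},
]

def build_cube(rows, dimensions):
    """Pre-aggregate SUM(amount) for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                totals[tuple(row[d] for d in dims)] += row["amount"]
            cube[dims] = dict(totals)
    return cube

def query(cube, dims, key):
    """Answer a GROUP BY lookup from the cube without scanning the facts.

    `dims` must list dimensions in the same order as DIMENSIONS.
    """
    return cube[tuple(dims)][tuple(key)]

CUBE = build_cube(FACTS, DIMENSIONS)
print(query(CUBE, ("city",), ("Beijing",)))
```

Three dimensions already produce 2^3 = 8 pre-aggregated groupings, which shows the trade-off: queries become lookups, but build time and storage grow quickly with the number of dimensions.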

2.2.2 Druid: streaming write isolation and inverted indexes on dimension columns

Druid is also a columnar store, and it pushes the idea furthest with its inverted index: an index that uses bitmaps to record the rows in which each value of a column occurs.

For example, suppose the column advertiser has two distinct values. Scanning the whole column gives one bitmap per value: {"bing.com": [0, 0, 0, 1], "google.com": [1, 1, 1, 0]}. The length of each array is the number of rows in the column. [0, 0, 0, 1] for bing.com means that bing.com appears in the fourth row, because the 1 sits at position 4 of the array (counting from 1). Likewise, [1, 1, 1, 0] for google.com means that google.com appears in rows 1, 2 and 3.

So when a WHERE clause has several conditions, we can fetch the bitmap for each condition and combine them directly with an AND operation, which makes filtering much more efficient.
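The bitmap example can be reproduced in a few lines. The advertiser column matches the example in the text; the second column (gender) and the helper names are assumptions added to show the AND step. Real bitmap indexes use compressed bitmaps rather than Python lists.

```python
def build_bitmaps(column):
    """Scan a column once and build one 0/1 list per distinct value."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps.setdefault(value, [0] * len(column))[row] = 1
    return bitmaps

def bitmap_and(a, b):
    """Combine two conditions: a row matches only if both bits are set."""
    return [x & y for x, y in zip(a, b)]

def matching_rows(bitmap):
    """Return 1-based row numbers, matching the counting used in the text."""
    return [i + 1 for i, bit in enumerate(bitmap) if bit]

advertiser = ["google.com", "google.com", "google.com", "bing.com"]
gender = ["Male", "Female", "Male", "Male"]  # hypothetical second dimension

advertiser_index = build_bitmaps(advertiser)
gender_index = build_bitmaps(gender)

# WHERE advertiser = 'google.com' AND gender = 'Male'
hits = bitmap_and(advertiser_index["google.com"], gender_index["Male"])
print(matching_rows(hits))
```

No row data is touched while evaluating the WHERE clause; only the two bitmaps are combined.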

2.2.3 ClickHouse and SIMD

ClickHouse mainly exploits the hardware and memory to speed up computation at query time, making full use of the CPU.

2.2.4 Doris and our integration plan

Doris is self-contained and has no major external dependencies. It is compatible with the MySQL protocol. (It was previously called Palo, which is OLAP spelled backwards.)
The Frontend can be thought of as a horizontally scalable version of Presto's Coordinator, and the Backend as a horizontally scalable Worker plus some storage.
Doris uses the LSM-Tree model. This addresses the need for fast results on key-value lookups while also raising overall throughput for fast, large-batch writes.
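The LSM-Tree idea can be illustrated with a toy key-value store. The TinyLSM class, its memtable limit and its in-memory "runs" are hypothetical simplifications; this is not Doris's storage engine, and a real LSM-Tree writes its sorted runs to disk.

```python
import bisect

class TinyLSM:
    """Toy LSM-Tree: an in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # oldest first; each run is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        """Writes only touch the memtable; a full memtable is flushed as a run."""
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Freeze the memtable into a new sorted run (sequential write)."""
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Check the memtable first, then the runs from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            # (key,) sorts just before (key, value), so this finds the key.
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all runs into one, keeping the newest value of each key."""
        merged = {}
        for run in self.runs:  # oldest first, so newer values overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []

db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # memtable is full: flushed as the first run
db.put("a", 3)   # newer value for "a" lives in the memtable
db.put("c", 4)   # flushed as the second run
db.compact()
print(db.get("a"), db.get("b"), db.get("c"))
```

Writes are cheap because they are batched and written sequentially, while compaction keeps the number of runs a read has to check small.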

3. Transformation cases

3.1 Presto on Yarn

This ties Presto's elastic scaling and query scheduling to YARN.

3.2 Unified ad hoc query: One SQL

This mainly solves the problem of multiple engines having different SQL dialects.
Figure: the reconstructed architecture.
(There is even a trained decision tree model: it extracts features from a statement, judges which database the statement will run faster on, and the corresponding engine dialect is then generated.)

3.3 Unified OLAP construction


3.4 Database comparison method

To compare databases along multiple dimensions, there are two methods that help you construct the database contents and the SQL statements, and then use what you constructed to test each database's implementation.
Figure: performance comparison after the transformation.

Learning is not easy, so likes and bookmarks are appreciated.

Origin blog.csdn.net/qq_36366757/article/details/109374543