Stuck as soon as you hit a complex analytical query? Meet the MySQL analysis instance

Author: Yue Chang, Senior Product Manager, Alibaba Cloud Database

With the explosive growth of enterprise data, MySQL runs into more and more analytical query problems: results cannot be delivered in a timely manner, and the demand for fine-grained operations cannot be met. How can you connect seamlessly to the business database and run real-time, multi-dimensional analysis and business exploration over trillion-scale data? The MySQL analysis instance offers a complete solution.

The MySQL analysis instance combines the cloud database RDS MySQL with AnalyticDB for MySQL. It is an OLTP + OLAP solution deeply integrated at the product level, which solves customers' complex analytical query problems and lets them quickly build a real-time data warehouse.

With a one-click purchase, authorization is set up automatically and data synchronization is free and automatic. The business database is neither aware of nor affected by the synchronization, and data is synchronized to the analysis database in real time, helping cloud database RDS MySQL customers quickly build a real-time data warehouse platform.

Users do not need to build a data warehouse themselves, do not need to care how data is stored in the database, and do not need to worry about data freshness. They only need to purchase an analysis instance to get an analysis database with about 100 times the analytical performance of MySQL.

 

1. The difference between a MySQL analysis instance and a read-only instance

In terms of usage scenarios, the read-only instance mainly serves online applications, while the analysis instance serves complex report analysis on the RDS MySQL source instance, as shown in the following figure. In terms of product implementation, the read-only instance is an RDS MySQL instance that accepts only read requests, while the analysis instance is an AnalyticDB for MySQL cluster.

 

 

 

2. The analysis instance is actually AnalyticDB

The MySQL analysis instance is actually an AnalyticDB for MySQL cluster (currently the basic version by default) with built-in Data Transmission Service (DTS). After a successful purchase, it automatically synchronizes the full and incremental data of the cloud database RDS MySQL master instance to AnalyticDB for MySQL in real time. Users can log in to the cluster details page of the AnalyticDB for MySQL console to view the data synchronization progress and delay.

 

2.1 Introduction to AnalyticDB

AnalyticDB is the only PB-level real-time data warehouse independently developed by Alibaba that has been proven on ultra-large-scale core businesses. Since first going online in 2012, it has gone through nearly 100 releases and supports many online analytics businesses within the group, such as e-commerce, advertising, Cainiao, entertainment, and Fliggy. Every year, the real-time analytics peaks of these businesses during Double 11 drive the continuous architectural evolution and technological innovation of AnalyticDB.

AnalyticDB was officially offered to external customers on Alibaba Cloud in 2014. It serves traditional large and medium-sized enterprises, government agencies, and numerous Internet companies, covering more than a dozen industries.

In July 2019, the TPC, the world's best-known standards organization for benchmarking data management systems, published its ranking for the analytical performance benchmark in the database field, and AnalyticDB topped the list. It is the world's first cloud database product to pass the TPC's strict audit and certification. AnalyticDB performs online statistics and analysis of data, helping companies mine the value of their data quickly and in real time. Topping the TPC list means that it has become the fastest real-time data warehouse in the world. In complex analysis scenarios, performance improves by 10 times, and multi-dimensional analysis of trillion-scale data takes only milliseconds.

Recently, Alibaba Cloud ran a TPC-H (100 GB) comparative test of common open source data analysis products against its self-developed AnalyticDB. According to the results, AnalyticDB's performance is about 100 times that of open source MySQL and about 6 to 10 times that of Presto, Spark, and Impala; refer to the test results for details.

 

 

 

2.2 AnalyticDB Basic Edition

The AnalyticDB for MySQL product line includes a basic version (single-node) and a cluster version. The basic version serves requests from a single node, and its minimalist architecture greatly lowers the barrier to entry. A storage-compute separation architecture, hybrid row-column storage, lightweight index construction, and a distributed hybrid computing engine ensure strong analytical performance in the basic version. A real-time data warehouse can be built for as little as 860 yuan, with no need to set up a dedicated big data team, saving enterprises millions in costs.

 

2.3 AnalyticDB technical architecture

The following are the architecture diagrams of the basic version and the cluster version. Whichever series is used, AnalyticDB is composed of Coordinators and Workers.

 

The picture above is the basic version architecture diagram; the following is the cluster version architecture diagram.

 

 

2.3.1 Coordinator: front-end control node, responsibilities include:

(1) MySQL protocol layer access, SQL parsing

(2) Authentication and authorization: provides a more complete and fine-grained permission model, whitelists and cluster-level RAM control, and audit and compliance records of all SQL operations.

(3) Cluster management: member management, metadata, data consistency, route synchronization, backup and recovery (data and log management)

(4) Background asynchronous task management

(5) Transaction management

(6) Optimizer, execution plan generation

(7) Computing scheduling, responsible for task scheduling

 

2.3.2 Worker: storage and computing nodes, including

(1) Computing module

A distributed MPP + DAG hybrid computing engine and optimizer provide stronger complex-computation and mixed-workload management capabilities. By taking advantage of flexible resource scheduling on the Alibaba Cloud computing platform, computing resources are scheduled elastically. Compute Worker nodes can be started independently and scaled out within minutes or even seconds in response to business needs, making the most efficient use of resources.

(2) Storage module

The storage module is more lightweight and supports real-time writes and reads at higher throughput. Write performance is about 50% higher than the previous version at the same specification, and written data becomes visible within milliseconds, meeting customers' real-time analysis needs. Storage nodes provide full and incremental backup and recovery. Periodic cloud disk snapshots and logs are synchronized to OSS in real time, providing higher security for user data and helping users recover as much as possible when a database problem occurs.

(3) Worker Group

Worker nodes with storage modules are organized into Worker Groups. The cluster version stores three replicas, which act as a single unit through the Raft distributed consensus protocol, so a group continues to provide service when some of its Worker nodes fail.
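The fault tolerance of a three-replica Worker Group follows from Raft's majority rule. The snippet below is a minimal sketch of that rule only; it is not AnalyticDB code, and the function names are hypothetical.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a Raft majority (quorum)."""
    return replicas // 2 + 1


def group_available(replicas: int, failed: int) -> bool:
    """A Raft group can keep committing writes while a majority is alive."""
    return replicas - failed >= majority(replicas)


if __name__ == "__main__":
    # Three replicas, as in the cluster version described above.
    for failed in range(4):
        print(f"failed={failed} available={group_available(3, failed)}")
```

With three replicas the quorum is two, so a Worker Group tolerates the loss of one Worker but not two.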

AnalyticDB's read and write links are built on the components described above:

(1) Write link: after data is written through the Coordinator, it is routed by the table's partition key to the Worker Group that owns the corresponding table partition (shard). The three Worker replicas in each Worker Group use the Raft protocol to ensure strong consistency, high reliability, high availability, and real-time visibility (linearizability) of the data. For efficiency, AnalyticDB applies many optimizations such as GroupCommit, compression, asynchronous processing, and zero copy, which greatly improve write performance: single-node TPS on TPC-H tables exceeds 150,000 and scales linearly.

(2) Query link: after the user's SQL statement is sent to the Coordinator over the MySQL protocol, the Parser parses it into a logical plan, the Optimizer turns that into a physical execution plan, and the computing scheduling module distributes the physical execution tasks to different Worker nodes. Each task is bound to the storage on its Worker and pulls data from that storage for distributed computing.
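The routing step of the write link in item (1) can be sketched as follows. The article does not say how AnalyticDB maps a partition key to a shard or a shard to a Worker Group, so the CRC32 hash, the modulo mapping, and the counts `NUM_SHARDS` and `NUM_WORKER_GROUPS` are all assumptions chosen for illustration.

```python
import zlib

NUM_SHARDS = 8          # hypothetical number of table partitions (shards)
NUM_WORKER_GROUPS = 2   # hypothetical number of Worker Groups


def shard_for(partition_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a partition-key value to a shard (assumed: CRC32 hash modulo shard count)."""
    return zlib.crc32(partition_key.encode("utf-8")) % num_shards


def worker_group_for(shard: int, num_groups: int = NUM_WORKER_GROUPS) -> int:
    """Map a shard to the Worker Group that owns it (assumed: simple modulo)."""
    return shard % num_groups


def route_rows(rows, key_column):
    """Group incoming rows by owning Worker Group, as a Coordinator might before writing."""
    routed = {}
    for row in rows:
        shard = shard_for(str(row[key_column]))
        group = worker_group_for(shard)
        routed.setdefault(group, []).append((shard, row))
    return routed


if __name__ == "__main__":
    rows = [{"order_id": i, "amount": i * 10} for i in range(6)]
    for group, items in sorted(route_rows(rows, "order_id").items()):
        print(group, [(shard, row["order_id"]) for shard, row in items])
```

Because the mapping depends only on the key, rows with the same partition key always land in the same shard, which is what lets each Worker Group replicate its own shards independently.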

To improve computing performance, AnalyticDB applies a series of optimizations such as pushing computation down to storage, efficient intelligent index filtering, and vectorized + pipelined streaming execution, which is why AnalyticDB's TPC-DS performance is the fastest in the world.
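The idea of pushing computation down to storage can be illustrated with a toy scatter-gather aggregation: each Worker filters and partially aggregates its own rows, and the Coordinator only merges the small partial results. This is a simplified sketch, not AnalyticDB's engine; the function names, the `amount` and `region` columns, and the in-memory lists standing in for Worker storage are all hypothetical.

```python
def worker_task(local_rows, predicate):
    """Runs on one Worker: filter and partially aggregate next to the data."""
    total = 0
    count = 0
    for row in local_rows:
        if predicate(row):
            total += row["amount"]
            count += 1
    return total, count


def coordinator_query(workers, predicate):
    """Runs on the Coordinator: dispatch one task per Worker, then merge partials."""
    partials = [worker_task(rows, predicate) for rows in workers]
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total, count


if __name__ == "__main__":
    # Each inner list stands in for the rows stored on one Worker.
    workers = [
        [{"region": "east", "amount": 10}, {"region": "west", "amount": 5}],
        [{"region": "east", "amount": 7}],
        [{"region": "west", "amount": 3}, {"region": "east", "amount": 1}],
    ]
    total, count = coordinator_query(workers, lambda r: r["region"] == "east")
    print(total, count)
```

Only one `(total, count)` pair per Worker crosses the network, rather than every matching row, which is the main benefit of the push-down.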

 

3. Advantages of the MySQL analysis instance

(1) One-click purchase with automatic authorization

RDS MySQL users only need to purchase a MySQL analysis instance through the console to get data synchronization, without separately purchasing an AnalyticDB for MySQL cluster and a Data Transmission Service (DTS) instance. The system automatically sets up authorization among RDS MySQL, DTS, and AnalyticDB for MySQL, so no repeated authorization is needed.

(2) Free automatic data synchronization

The MySQL analysis instance has built-in Data Transmission Service (DTS), which automatically synchronizes full and incremental data once the instance is created.

(3) Larger scale and higher performance

Complex analysis performance is more than 100 times that of MySQL, and the instance supports real-time writes and reads at higher throughput.

(4) Extreme elasticity

Node groups and disk space can be scaled up or down at any time within seconds. The instance supports storage-intensive and compute-intensive specifications, tiered storage of hot and cold data, and unlimited retention of historical data at low cost (coming soon).

(5) Complete OLTP + OLAP solution

Deep integration at the product level solves customers' complex analytical query problems and lets them quickly build a real-time data warehouse.

 

4. Suitable users

MySQL analysis instances are especially suitable for the following users:

(1) RDS customers who find Hadoop / Spark and similar stacks too complex and want to carry out data-driven transformation quickly;

(2) RDS users whose report database queries are slow;

(3) RDS users who need to quickly build a test environment to evaluate data warehouse options;

(4) RDS users who want to learn about AnalyticDB for MySQL quickly.

This article is original content of Alibaba Cloud and may not be reproduced without permission.
