Open Source Big Data Platform E-MapReduce Serverless StarRocks Product Introduction

Abstract: This article shares the cloud-native product practice of the StarRocks high-speed lakehouse on the cloud, jointly built by Alibaba Cloud and the StarRocks community. It has four parts. The first introduces the fully managed form of StarRocks, an OLAP cloud product that requires no operation and maintenance (O&M); the second introduces StarRocks Manager functions such as instance management, diagnosis and analysis, metadata management, and the security center; the third covers use cases in social networking, online education, e-commerce, and other scenarios; the last covers the short-term and long-term plans for the product:
1. StarRocks product introduction
2. StarRocks function introduction
3. StarRocks scenario case
4. StarRocks future planning

1. StarRocks product introduction

Alibaba Cloud and the StarRocks community have cooperated on a semi-managed product since early 2022. About 200 customers already use the semi-managed StarRocks product. This year, we started building a fully managed product, hoping to further lower the barrier to management and use, and to work with the community to bring the product to more OLAP users.

EMR Serverless StarRocks is the fully managed StarRocks service on Alibaba Cloud. Building on StarRocks' own strengths of extreme speed and a unified engine, it focuses on two goals, lowering the barrier to entry and reducing O&M complexity, while providing customers with more capabilities.

In terms of ease of use, the Serverless form provides a fully managed, O&M-free service, so you do not have to worry about the stability of the StarRocks cluster in daily use, for example downtime. In terms of data management, it provides easy-to-use slow SQL analysis and cluster health diagnosis, convenient import task management, and visual metadata management.

Combined with other Alibaba Cloud products, it integrates cloud-native capabilities. First, it integrates the underlying resources with K8S so that it works out of the box: creating a cluster takes only three to four minutes. It also lets you efficiently scale out and in and upgrade or downgrade configurations later, so resources are delivered quickly. In addition, deep integration with DLF connects it to the entire data lake system on the cloud, and deep integration with Flink VVP further reduces development costs.

The figure above shows the EMR product system. This introduction focuses on the OLAP part. StarRocks is the first fully managed form launched by EMR, and more fully managed forms, such as Serverless Doris and Presto, will follow to help users adopt the big data technology stack with a low barrier to entry.

With StarRocks, we can build a new generation of extremely fast, unified data architecture. In the analysis layer, StarRocks can serve as the single OLAP engine and cover all OLAP scenarios. The technology stack is thereby unified: one technology and one set of O&M practices apply to the various OLAP analysis scenarios.

The StarRocks system architecture is shown in the figure above, and the core of the whole system is FE (Frontend) and BE (Backend).

The fully managed EMR offering is built around K8S deployment, while the semi-managed offering is built around ECS deployment. The semi-managed form mainly provides rapid deployment, along with basic cluster management capabilities such as monitoring and alerting. The fully managed form goes a level higher: the FE and BE services themselves are also managed, so users do not need to care about operating, maintaining, and managing computing resources. Furthermore, the platform's O&M capabilities, including scaling out and in and cluster monitoring and alerting, are expected to be fully managed as well, to help users save more O&M cost. The fully managed form provides, on the one hand, a full range of O&M-free services and, on the other hand, automatic upgrades. It also provides Manager capabilities for better data management, covering import tasks, metadata, permissions, and so on.

2. StarRocks function introduction

Instance management

Instance management quickly addresses cluster deployment and monitoring in the fully managed form, which is the most basic capability, and it makes automatic upgrades easier to achieve. It also provides visual configuration and templates for some monitoring and alerting rules.

Diagnosis and analysis

Slow SQL is a common problem in daily data querying and data applications: the cause has to be analyzed and a corresponding solution found. EMR StarRocks Manager provides visual SQL diagnosis and analysis capabilities that help users quickly find the root cause.
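To make the idea of slow SQL analysis concrete, here is a minimal Python sketch of one common first step: filtering queries by a latency threshold and grouping them by query shape. It does not describe how EMR StarRocks Manager works internally. The record layout `(sql_text, latency_ms)`, the sample data, the 5000 ms threshold, and the helper names `fingerprint` and `slow_query_report` are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical audit records: (sql_text, latency_ms). This layout is an
# assumption for illustration, not the format of any StarRocks log.
QUERIES = [
    ("SELECT * FROM orders WHERE id = 42", 120),
    ("SELECT * FROM orders WHERE id = 77", 5300),
    ("SELECT count(*) FROM events WHERE day = '2023-04-10'", 8100),
    ("SELECT count(*) FROM events WHERE day = '2023-04-11'", 7900),
]


def fingerprint(sql: str) -> str:
    """Replace literals with '?' so queries of the same shape group together."""
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()


def slow_query_report(queries, threshold_ms=5000):
    """Return (fingerprint, count, mean_latency_ms) for queries at or above
    the threshold, sorted by mean latency, slowest first."""
    groups = defaultdict(list)
    for sql, latency_ms in queries:
        if latency_ms >= threshold_ms:
            groups[fingerprint(sql)].append(latency_ms)
    report = [(fp, len(lat), mean(lat)) for fp, lat in groups.items()]
    return sorted(report, key=lambda row: row[2], reverse=True)
```

Calling `slow_query_report(QUERIES)` groups the two `events` queries under one fingerprint and lists them ahead of the single slow `orders` query. Grouping by shape rather than by exact text is a design choice that keeps one recurring slow pattern from being scattered across many near-identical rows.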

Metadata management

At present, metadata management provides only a fairly basic capability: displaying the contents of tables. More and finer-grained functions will be implemented in the future, such as management of import tasks, materialized views, and external tables.

Security center

Currently, the Serverless version provides basic user management and database-level permission control. Because the community is going to restructure the permission model in 3.0, finer-grained permission control is planned for after the release of 3.0.
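The difference between database-level and finer-grained control can be illustrated with a small conceptual model in Python. This is not the StarRocks permission implementation or its API. The `User` class, the `grant` and `is_allowed` helpers, and the privilege names are hypothetical and exist only to show what a database-level check can and cannot express.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical user record holding privileges per database."""
    name: str
    # Maps database name -> set of privileges, e.g. {"sales": {"SELECT"}}
    db_privileges: dict = field(default_factory=dict)


def grant(user: User, database: str, privilege: str) -> None:
    """Grant a privilege on a whole database."""
    user.db_privileges.setdefault(database, set()).add(privilege)


def is_allowed(user: User, database: str, table: str, privilege: str) -> bool:
    """Database-level check: the table name is accepted but ignored, because
    this model cannot tell the tables of one database apart."""
    return privilege in user.db_privileges.get(database, set())


analyst = User("analyst")
grant(analyst, "sales", "SELECT")
```

In this model, `is_allowed(analyst, "sales", "orders", "SELECT")` and `is_allowed(analyst, "sales", "customers", "SELECT")` are both true, since a grant on `sales` covers every table in it. Restricting the analyst to a single table would require a table-level model, which is the kind of finer granularity the text describes as planned.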

Version function description

The following table lists the functional differences of different versions of EMR StarRocks.

The core is basically consistent across versions. For individual functions, such as some data lake query scenarios, Alibaba Cloud's version iterates faster because it adapts to Alibaba Cloud's internal products sooner, but these functions will eventually be contributed to the community. The Flink VVP CTAS scenario is an exception: because it is a specially customized version built with Flink, it cannot be contributed to the community.

In terms of instance O&M management, the fully managed version provides broader visualization and O&M-free capabilities. Some of the Manager capabilities introduced above, such as visual database/table management and slow SQL analysis, are currently available only in the Serverless version.

3. StarRocks scenario case

4. StarRocks future planning

The EMR Serverless version has been in invitational testing since January this year, when it had only some basic capabilities. After the public beta started on April 10, more capabilities were released.

The Q2 plan has two main items. One is the commercial release. The other is further enhancement around DLF lakehouse analysis: because lakehouse analysis has more elastic requirements for computing resources, pay-as-you-go billing and some elasticity capabilities are required. In addition, we will provide instance health checks to help you quickly locate cluster problems. Around the Manager, we will implement management of materialized views. Although materialized views are not widely used at present, they will be used more and more with the release of 3.0 and its storage-compute separation architecture. Data import management, a SQL Editor, and other functions are also planned.

In Q3, after the release of 3.0 with storage-compute separation, we expect the whole big data scenario to be served directly on lake formats such as Iceberg and Hudi, using the capabilities of materialized views and lake formats to quickly build LakeHouse scenarios. The refactored permission model, MaxCompute integration, and other items are also planned.

In Q4, we will improve ease of use and productization for instance backup and recovery and for instance migration, and continue deeper optimization and iteration of existing functions.

The above is the overall plan for this year; adjustments will of course be made based on customer needs in specific scenarios.

This article is the original content of Alibaba Cloud and may not be reproduced without permission.