What is a transient cluster? The concept and applications of Volcano Engine EMR Stateless


Author | Volcano Engine EMR Team    

Hadoop-based EMR systems have gone through several stages of development. They evolved from the 1.0 stage, deployed through CDH in IDC machine rooms, to the 2.0 stage, built on the separation of storage and compute on the public cloud.

On this basis, the EMR team of VeDI, the Volcano Engine digital intelligence platform, has explored the stateless EMR 3.0 stage. At the end of last month, Volcano Engine EMR officially launched its new transient cluster feature. The capability is based on the EMR Stateless concept and provides elastic scaling at the cluster level: the cluster is released when there is no business demand and brought up again when demand returns. This helps enterprises significantly reduce both usage costs and platform operation and maintenance costs.

What is a transient cluster, and what is the Stateless concept? This article introduces EMR Stateless from several angles: basic concepts, architecture, evolution, practical application scenarios, and business value.

67421cc1a92a52e4bd9c241a31fa755d.jpeg

488c6f7b02c213c5deb98c1c4a03fc0e.png

What is Stateless?

Stateless is, in essence, a transient cluster, but not just any transient cluster: it is a lightweight, stateless transient cluster delivered as a product. So what does a stateless transient cluster mean?

First of all, the Stateless cluster is a transient cluster that evolves further from the separation of storage and compute. In an ordinary storage-compute-separated cluster, the stateful components of the Hadoop stack remain bound to the cluster and are not fully split out into independent services. Stateless, on the other hand, turns Hive Metastore and History Server into services, that is, it separates them from the compute cluster.

With Stateless, the master, core, task, and other nodes of the Hadoop system form a stateless, lightweight transient cluster that can be created or released at any time and can exist in multiple copies, which gives the cluster much better scalability. On a cloud native foundation, this makes it possible to grow capacity and optimize cost at the granularity of whole clusters.

1fdd36b97b8f930d3c0196084f9028f4.png

Next, let's compare the Stateful mode and the Stateless mode. What are the typical differences between them?

f1b40d8ad9f946cc92b662e4a5c0ffcd.png

The flowchart on the left is a traditional Stateful mode.

In this mode, submitting a task usually works like this. A long-running cluster must exist first. The task runs on it, writes its data to HDFS or object storage, and the results are available once execution finishes.

From an operations perspective, the work does not end once the task has been submitted. The long-running cluster still has to be operated: monitoring its health to detect failures, collecting logs from its services, and so on, all of which incur operation and maintenance costs. Moreover, once the tasks finish, the cluster sits empty. In terms of total cost, this is a poor choice. That is the typical Stateful model.

In the stateless mode, all this will change.

The first step is now simply to submit the task. After submission, a cluster is created on demand to run it. When the task completes, the cluster is released. Once the user has the computation results, the whole submission process is over.

In this process, Stateless moves stateful functions, such as the log service, out of the cluster. After the cluster is released, users can still use the log service to query the results of any task that ran, at any time, on clusters created from the Stateless cluster template.

Throughout this process, users do not need to operate or maintain the execution cluster. This is the biggest difference between Stateful and Stateless.
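The Stateless flow described above (submit, create on demand, run, release) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `InMemoryControlPlane`, `transient_cluster`, and `submit` are hypothetical names, and the control plane here merely records events in memory rather than calling any real Volcano Engine API.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class InMemoryControlPlane:
    """Hypothetical stand-in for a cluster control API; it only records events."""
    events: List[str] = field(default_factory=list)
    _next_id: int = 0

    def create_cluster(self, template: str) -> str:
        self._next_id += 1
        cluster_id = f"{template}-{self._next_id}"
        self.events.append(f"create {cluster_id}")
        return cluster_id

    def release_cluster(self, cluster_id: str) -> None:
        self.events.append(f"release {cluster_id}")


@contextmanager
def transient_cluster(control: InMemoryControlPlane, template: str) -> Iterator[str]:
    """Create a cluster for the duration of the block and always release it."""
    cluster_id = control.create_cluster(template)
    try:
        yield cluster_id
    finally:
        # Released even if the task raises, so no idle cluster is left behind.
        control.release_cluster(cluster_id)


def submit(control: InMemoryControlPlane, template: str,
           task: Callable[[str], int]) -> int:
    """Submitting a task is the first step; the cluster exists only while it runs."""
    with transient_cluster(control, template) as cluster_id:
        control.events.append(f"run on {cluster_id}")
        return task(cluster_id)


control = InMemoryControlPlane()
result = submit(control, "nightly-etl", lambda cluster_id: 42)
```

The context manager is a deliberate choice here: tying the release to the end of the block mirrors the idea that the cluster's life cycle is bound to the task rather than managed separately by the user.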

8bf2e705dd85f9cd4008a9e6fd804e63.png

After reading this far, you probably have some questions. Here are answers to a few common conceptual ones.

1. What is the difference between Stateless and Serverless?

The difference between Serverless and Stateless is essentially the difference between fully managed and semi-managed. In the semi-managed case, users operate and maintain some cluster resources and cluster configuration themselves. In the fully managed case, users can skip that configuration, but they also lose some customization and cluster flexibility.

Stateless is a cloud native optimization for the semi-managed, on-cluster form, and has no essential connection with the fully managed Serverless form. What the two have in common is efficient use of resources: compute resources exist only while tasks are running.

2. Stateless transient cluster, how do you understand transient?

A deeper version of this question is: at what time granularity can a cluster be called transient?

First, let's compare a transient cluster with an ordinary EMR cluster on the cloud. Ordinary EMR clusters are long-lived: they may stay deployed for a week or two, or even a month or two. With transient clusters, when tasks arrive we create a cluster for them and release it after the tasks have run.

When the cluster is created a second time, the operation is similar to replication: the configuration and specifications are the same as before. This costs the user nothing extra. Users only need to define the cluster once, and Stateless creates the transient cluster on demand. The time granularity of a transient cluster is at the minute level, and users do not need to worry about errors caused by a long interval between runs.
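The "define once, replicate on demand" idea can be pictured as an immutable template that is instantiated each time tasks arrive. The sketch below is an assumption-laden illustration: `ClusterTemplate`, `Cluster`, `instantiate`, and all field names and values (including the instance type string and the version placeholder) are hypothetical and do not reflect the real product's schema.

```python
import itertools
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClusterTemplate:
    """Hypothetical cluster definition; frozen so every instance starts identical."""
    name: str
    emr_version: str
    instance_type: str
    core_nodes: int
    components: Tuple[str, ...]


@dataclass(frozen=True)
class Cluster:
    cluster_id: str
    template: ClusterTemplate


_ids = itertools.count(1)


def instantiate(template: ClusterTemplate) -> Cluster:
    """Create a fresh transient cluster whose configuration is copied from the template."""
    return Cluster(cluster_id=f"{template.name}-{next(_ids)}", template=template)


# All values below are placeholders chosen for illustration only.
template = ClusterTemplate(
    name="daily-report",
    emr_version="3.x",
    instance_type="general-8c32g",
    core_nodes=4,
    components=("spark", "hive"),
)
first = instantiate(template)   # first run
second = instantiate(template)  # later run: new cluster, same configuration
```

Because the template is immutable, the second cluster cannot drift from the first, which is what makes the interval between runs irrelevant.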

3. What business scenarios is Stateless suitable for?

Based on our experience, it suits users who need storage-compute separation, and it fits offline batch workloads best. When the compute volume is large and the load is clearly tidal, the cost savings are substantial.
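To see why tidal workloads benefit, it helps to compare node-hours for a long-running cluster with node-hours for a cluster that exists only while the batch window is active. The numbers below are purely hypothetical inputs chosen for the arithmetic, not measurements from Volcano Engine EMR, and the startup overhead is an assumed parameter.

```python
def long_running_node_hours(nodes: int, days: int = 30) -> float:
    """Node-hours consumed when the cluster stays up around the clock."""
    return nodes * 24 * days


def transient_node_hours(nodes: int, busy_hours_per_day: float, days: int = 30,
                         startup_overhead_hours: float = 0.25) -> float:
    """Node-hours consumed when the cluster exists only during the daily batch window.

    startup_overhead_hours is an assumed allowance for creating the cluster each day.
    """
    return nodes * (busy_hours_per_day + startup_overhead_hours) * days


# Hypothetical workload: 10 nodes, busy 6 hours per night.
always_on = long_running_node_hours(nodes=10)
on_demand = transient_node_hours(nodes=10, busy_hours_per_day=6)
saving_ratio = 1 - on_demand / always_on
```

The shorter the busy window relative to the day, the larger `saving_ratio` becomes, which is why the benefit is tied to how tidal the load is.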

4. Does Stateless require users to change their usage habits?

From the user's side, no process changes are required. Stateless optimizes only the cloud native control plane and the separation of stateful services. The user-facing interfaces, whether the open source web UIs or the external interfaces of the open source engines, and the task submission process are unchanged. They remain fully compatible with open source, so users keep benefiting from the iterations of the open source community releases.

f68b8d1cc46358db6f5fece0bc30bbc2.png

Stateless big data system

With the above, you should have a preliminary understanding of Stateless. Next, I will introduce how the system is built.

de0428edf9fd176f3d0fc332eb3f1a92.png

First of all, in the Stateless architecture, user clusters cover offline analysis (the Hadoop stack), real-time computing (the Flink stack), interactive analysis, NoSQL databases, machine learning, and more. These clusters carry only compute; everything stateful has been stripped out. Stateless separates the History Server and UI-related components into independent services, including Spark History Server, Presto History Server, YARN Timeline Server, and so on. These services exist whether or not a cluster exists.

Secondly, unified scheduling and development packaging are provided through the Open API. EMR Studio is also offered as a service (EMR Studio can be understood as a scheduling engine similar to Oozie, Airflow, or DolphinScheduler). Users can use these services directly on Volcano Engine EMR without provisioning machines to deploy them.

Relying on the rich cloud ecosystem of Volcano Engine, Stateless also connects seamlessly with data development products. In addition, EMR metadata, including Hive Metastore's built-in metadata database and external RDS, is extracted into a unified service. Anyone who has run Hive Metastore has probably been burned by its backing RDS: when RDS is disturbed, Hive Metastore has problems too. Cloud native services can now handle these problems effectively.

The configuration center also keeps a virtual representation of the cluster, such as its configuration and required components. Engine metadata, including permission control and the user system, has likewise been turned into services.

Finally, Stateless solves a problem that is very troublesome for operation and maintenance: logs filling up the local disk. This no longer happens in the stateless system. Logs are stored in TOS, an on-demand object store. Object storage can be treated as effectively unlimited, so there is no need to worry about the disk space the logs occupy. You only need to define a life cycle for them.
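A log life cycle boils down to a retention rule evaluated against each object's age. The sketch below only illustrates those semantics in plain Python; it is not the TOS API, real object stores evaluate such rules on the server side, and the function name, key layout, and seven-day retention are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def expired_log_keys(objects: Dict[str, datetime], retention_days: int,
                     now: datetime) -> List[str]:
    """Return the keys of log objects written before the retention cutoff."""
    cutoff = now - timedelta(days=retention_days)
    return sorted(key for key, written_at in objects.items() if written_at < cutoff)


# Hypothetical log objects, keyed by path, with their write times.
now = datetime(2023, 1, 31, tzinfo=timezone.utc)
objects = {
    "logs/app-1/stdout": datetime(2023, 1, 1, tzinfo=timezone.utc),
    "logs/app-2/stdout": datetime(2023, 1, 30, tzinfo=timezone.utc),
}
expired = expired_log_keys(objects, retention_days=7, now=now)
```

With the retention period defined once, nobody has to watch disk usage on individual nodes.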

789ab5e12f3110d0a82e2333c7f8be98.png

The previous section described the Stateless-based big data system. Now let's walk through a state-flow example to illustrate it.

First of all, as the picture above shows, everything enclosed by the dotted line is what Stateless abstracts out of the physical cluster: a series of services such as the metadata service and the control services, including the Web UI and the Open API. The Open API acts as the trigger for cluster creation and destruction, and the corresponding instructions are handed over to a scheduling platform such as Airflow or DolphinScheduler. When it submits tasks, the scheduling platform drives the cluster life cycle through this interface.

Secondly, at the trigger level, the cluster is controlled mainly through the cloud native Open API. Submitting a task starts a new cluster and restores the cluster's state, that is, the cluster configuration the task expects. This configuration may be a version parameter or an instance model setting. Whatever the configuration, Stateless faithfully restores the cluster to its initial state. Because the cluster is stateless, the physical cluster is released once the job finishes, and its life cycle ends.

The case above covers a single task. In practice, multiple tasks may be submitted to the cluster. In that case, the physical cluster is not released until all tasks have finished. After the cluster is released, if more tasks need to be submitted, you simply start a cluster with the same configuration, run the tasks, and release it when they finish. This is the general operating flow of the Stateless system.
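The multi-task rule (release only after the last task finishes, then recreate on the next submission) is essentially reference counting. Here is a minimal sketch of that idea; `SharedTransientCluster` and its methods are hypothetical names, and the event list stands in for real create and release calls.

```python
from typing import List


class SharedTransientCluster:
    """Hypothetical tracker: the cluster lives while at least one task is active."""

    def __init__(self) -> None:
        self.active_tasks = 0
        self.running = False
        self.events: List[str] = []

    def task_started(self, name: str) -> None:
        if not self.running:
            # First task after an idle period: bring the cluster up on demand.
            self.running = True
            self.events.append("create cluster")
        self.active_tasks += 1
        self.events.append(f"start {name}")

    def task_finished(self, name: str) -> None:
        if self.active_tasks == 0:
            raise RuntimeError("no active task to finish")
        self.active_tasks -= 1
        self.events.append(f"finish {name}")
        if self.active_tasks == 0:
            # Last task done: release the physical cluster.
            self.running = False
            self.events.append("release cluster")


cluster = SharedTransientCluster()
cluster.task_started("a")
cluster.task_started("b")
cluster.task_finished("a")   # "b" is still running, so the cluster stays up
cluster.task_finished("b")   # last task finished, cluster is released
cluster.task_started("c")    # a later submission recreates the cluster
cluster.task_finished("c")
```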

e2819e893625c63372ea495f948f601d.png

Stateless Evolution Process

With the concepts and architecture covered, let's turn to how EMR Stateless evolved. EMR Stateless sounds relatively new, so why did we build it?

8f52bfcc456006916446d93ba15b198c.png

At the beginning of this article, I described the stages that Hadoop-based EMR systems have gone through.

The first is the 1.0 stage, deployed through CDH in IDC machine rooms. Many users still operate on the 1.0 system today. It has its benefits: adding resources and every other operation are fully under the user's control. It also has many problems. Operation and maintenance are very complex, and because storage and compute are integrated, large amounts of compute sit idle and on-demand computing is not possible.

To address these problems, the 2.0 stage emerged, built on the separation of storage and compute on the public cloud. Separating storage from compute lets compute resources be allocated on demand as far as possible. This has its own limitations and prerequisites: compute elasticity operates at the node level. The fundamental problem solved in the 2.0 era is the decoupling of compute from storage.

However, the Volcano Engine EMR team found problems in the 2.0 era that affected both daily operation and maintenance and cluster stability. With these two concerns in mind, we started from elasticity and implemented elastic scaling for the entire cluster. The prerequisite was to decouple the cluster from its metadata. On that basis, we made the cluster stateless, that is, we separated the cluster services from the cluster.

This is where the advantages of Stateless show. First, a stateless cluster is free of operation and maintenance for the user, because it exists only transiently while tasks run. Second, users do not need to worry about cluster services, while the advantages of the semi-managed model are retained. According to our technical research and overall assessment, compared with the offline deployment model in IDC machine rooms of the EMR 1.0 era, EMR Stateless can reduce resource costs by more than 40%.

The above is the evolution of EMR Stateless, along with the Volcano Engine EMR team's thinking behind it.

Next, I will describe in detail how each feature evolved.

2b25e73de0cc5d14d09326259c097d42.png

Storage-compute separation is a product of the EMR 2.0 era, and the stateless design retains those EMR 2.0 features. The most important capability added on top is optimizing and accelerating data loading, which can be thought of as a local cache that makes access faster.

c8c6e24ff5dd07aee579584af73fb8da.png

It also provides hot and cold tiering management: users can directly control and define tiering behavior, which is a prerequisite for cost savings. Once compute resources have been optimized as far as possible, storage resources need to be optimized as well.

The platform automatically diagnoses hot and cold tiering for the user. What makes the diagnosis possible? As the figure above shows, all user reads and writes on the Volcano Engine EMR side go through the Metastore service, so the platform can tell which data is cold and which is hot.

Based on the user's table definitions, automated diagnosis is performed, and the diagnoses and predictions are exposed to operation and maintenance personnel together with platform recommendations. A human then decides where the cold data should end up. Stateless provides several tiers for cold data: standard, infrequent access, archive, cold archive, and finally deletion, so that data with different characteristics can land in the most suitable storage class. This is another way Stateless helps users.
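A tiering recommendation of this kind can be sketched as a mapping from "days since last access" to a storage tier. The tier names follow the list above, but the day thresholds are invented for illustration, and `recommend_tier` is a hypothetical helper, not the platform's actual diagnosis logic. Consistent with the text, the function only recommends; a human still makes the final call.

```python
from enum import Enum
from typing import List, Tuple


class Tier(Enum):
    STANDARD = "standard"
    INFREQUENT_ACCESS = "infrequent_access"
    ARCHIVE = "archive"
    COLD_ARCHIVE = "cold_archive"
    DELETE = "delete"


# Assumed thresholds in days, checked from coldest to warmest.
HYPOTHETICAL_THRESHOLDS: List[Tuple[int, Tier]] = [
    (365, Tier.DELETE),
    (180, Tier.COLD_ARCHIVE),
    (90, Tier.ARCHIVE),
    (30, Tier.INFREQUENT_ACCESS),
]


def recommend_tier(days_since_last_access: int) -> Tier:
    """Recommend a tier for a table based on how long ago it was last read or written."""
    for min_days, tier in HYPOTHETICAL_THRESHOLDS:
        if days_since_last_access >= min_days:
            return tier
    return Tier.STANDARD


# Hypothetical tables with days since their last access recorded via the Metastore.
recommendations = {
    table: recommend_tier(days)
    for table, days in {"orders": 2, "clicks_2021": 200, "tmp_export": 400}.items()
}
```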

fda56cd86c0f2e811360faffef53c7f6.png

Elastic scaling is also a major feature resulting from the evolution of Stateless.

Mainstream elastic scaling falls into two types. The first is time-based: operation and maintenance personnel know when the peaks and valleys occur. The second is based on load metrics, which handles special cases such as a sudden rise or fall in load within a time window.

Volcano Engine EMR supports a setting that mixes time-based and load-based rules. Moreover, this hybrid elastic mode applies not only to the nodes of a Hadoop cluster but also to the entire transient cluster.
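One way to picture a hybrid policy is to compute a time-based baseline and a load-based target, then take the larger of the two within configured bounds. The sketch below is one possible combination rule written under our own assumptions; `TimeRule`, `desired_nodes`, the thresholds, and the step size are hypothetical and are not the product's real configuration. Setting `min_nodes` to 0 stands in for releasing the whole transient cluster.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TimeRule:
    start_hour: int  # inclusive
    end_hour: int    # exclusive
    nodes: int       # nodes wanted during this window


def desired_nodes(hour: int, utilization: float, current_nodes: int,
                  time_rules: Sequence[TimeRule], min_nodes: int, max_nodes: int,
                  scale_out_above: float = 0.8, scale_in_below: float = 0.3,
                  step: int = 2) -> int:
    """Combine a time-based baseline with a load-based adjustment."""
    # Time-based part: known peaks get a guaranteed baseline.
    baseline = min_nodes
    for rule in time_rules:
        if rule.start_hour <= hour < rule.end_hour:
            baseline = max(baseline, rule.nodes)

    # Load-based part: react to unexpected rises or falls in utilization.
    target = current_nodes
    if utilization > scale_out_above:
        target = current_nodes + step
    elif utilization < scale_in_below:
        target = current_nodes - step

    # Take the larger demand, clamped to the allowed range.
    return max(min_nodes, min(max_nodes, max(baseline, target)))


rules = [TimeRule(start_hour=1, end_hour=5, nodes=10)]  # hypothetical nightly peak
in_peak = desired_nodes(2, 0.5, 4, rules, min_nodes=0, max_nodes=20)
burst = desired_nodes(12, 0.9, 4, rules, min_nodes=0, max_nodes=20)
idle = desired_nodes(12, 0.1, 2, rules, min_nodes=0, max_nodes=20)
```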

654c6992a8a166d032482edd6a07db78.png

The Hive Metastore service was mentioned earlier. Whether the workload is offline tasks, real-time tasks, or a mixed deployment of real-time and offline tasks, its metadata can be hosted in Hive Metastore.

5ae4dc261acf1063acdce78266c4c7fd.png

Stateless currently provides a series of Public History Server services. They exist independently of the cluster, and while a cluster is running its jobs report data to them. Users access them directly through domain names, without the complicated step of binding IP addresses.

ce43b02b2c659f3b49e1ab7a1b7b56ec.png

Relying on the cloud product ecosystem, Stateless has also upgraded its log service. It is based on OpenSearch, and the data is ultimately stored on TOS. Users do not need to worry about or handle data loss or data disk issues. This feature is not yet fully mature, however, and will take some time to improve.

aa0481dc5736a0bf6193fedf15b0c546.png

For scheduling, we have integrated DolphinScheduler, Airflow, and other services with scheduling capabilities. Why? Because these components schedule the API calls that create the cluster, which are triggered as tasks in the scheduling system are submitted. Stateless offers all of them as services, so users do not need to deploy them and can use them out of the box.

e7e4033a0d1047f81b183bee61f08e12.png

f58ad6f2c4a421532d6d769d95bd829a.png

Finally, the user service and the authentication service are covered together. Both users and user permissions have been turned into services. The user service integrates LDAP into a unified user management service. The LDAP layer is retained, so users who are accustomed to an LDAP UI can keep using it. They no longer have to import the user system into each cluster one by one, which is the biggest benefit.

Similarly, the Stateless authentication service uses Ranger. Ranger follows the RBAC model, and Stateless adds its own abstraction on top of RBAC so that users can configure a richer set of permissions. These permissions are integrated with the user system: one user system plus one set of permissions can cover all RPC models related to user and role permissions. This is also a very important capability in the evolution of Stateless.
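For readers less familiar with RBAC, the core idea is that users are assigned roles and roles carry permissions, so a single user system plus a single permission set can answer every access check. The sketch below is a generic illustration only; it does not use Ranger's API or policy format, and the users, roles, and resources are made up.

```python
from typing import Dict, Set, Tuple

Permission = Tuple[str, str]  # (resource, action)

# Hypothetical role definitions shared by every cluster.
role_permissions: Dict[str, Set[Permission]] = {
    "analyst": {("db.sales", "select")},
    "engineer": {("db.sales", "select"), ("db.sales", "insert")},
}

# Hypothetical user-to-role assignments from the unified user service.
user_roles: Dict[str, Set[str]] = {
    "alice": {"analyst"},
    "bob": {"engineer"},
}


def is_allowed(user: str, resource: str, action: str) -> bool:
    """A user is allowed if any of their roles grants the (resource, action) pair."""
    return any(
        (resource, action) in role_permissions.get(role, set())
        for role in user_roles.get(user, set())
    )
```

Because the mappings live outside any single cluster, a newly created transient cluster can enforce the same decisions without importing users again.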

33c965c1e8eb365cdcd670ed41b401bc.png

Stateless business value

Finally, I would like to share with you the business value of Stateless.

bd12e9c4b1f0583b4d7372a049df53e3.png

First, let's look at a typical scenario that reflects the business value:

What kind of clusters are stateless clusters? How to optimize costs?

First, the cloud server models selected when creating a stateless cluster for a user may vary. Selection follows a pyramid structure. At the bottom level, the user's compute resources are guaranteed first.

Second, we try to match the compute characteristics of the user's workload. For example, word count and other CPU-intensive computations do not use much memory, so we try to save memory by choosing models with a lower ratio of memory to CPU.

Third, we help users optimize costs. There are two pricing models: on-demand and bidding (spot). In principle, bidding is cheaper than on-demand, and because stateless clusters are short-lived, we try to choose the cheaper option for users. What if a model the user prefers is out of stock? We try to select models close to the user's definition in price and configuration, so that the user's computing tasks can still run.
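The three levels of the pyramid (guarantee resources, fit the workload, then minimize price, with a fallback when the preferred model is out of stock) can be sketched as a filter followed by a cheapest-first choice. Everything here is hypothetical: the `Offer` fields, the instance type names, and the prices are placeholders, and `choose_offer` is our own simplification rather than the platform's selection algorithm.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Offer:
    instance_type: str
    vcpus: int
    memory_gib: int
    billing: str        # "on_demand" or "spot"
    hourly_price: float
    in_stock: bool


def choose_offer(offers: Sequence[Offer], min_vcpus: int,
                 min_memory_gib: int) -> Optional[Offer]:
    """Pick the cheapest in-stock offer that still guarantees the required resources."""
    candidates = [
        offer for offer in offers
        if offer.in_stock
        and offer.vcpus >= min_vcpus
        and offer.memory_gib >= min_memory_gib
    ]
    if not candidates:
        return None
    # Cheapest first; on a price tie, prefer less memory to avoid paying for unused RAM.
    return min(candidates, key=lambda offer: (offer.hourly_price, offer.memory_gib))


# Placeholder catalogue: the preferred spot model is out of stock.
offers = [
    Offer("compute-8c16g", 8, 16, "spot", 0.10, False),
    Offer("compute-8c16g", 8, 16, "on_demand", 0.30, True),
    Offer("general-8c32g", 8, 32, "spot", 0.14, True),
    Offer("general-8c32g", 8, 32, "on_demand", 0.40, True),
]
chosen = choose_offer(offers, min_vcpus=8, min_memory_gib=16)
```

In this example the fallback lands on a similar spot model rather than the on-demand version of the preferred one, because the resource guarantee is still met at a lower price.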

c9387aaf743381d28c0e0e70c1c8e319.png

Finally, to make a brief summary, what are the benefits of Stateless?

First, pay-as-you-go becomes very simple. Clusters are created on demand and destroyed automatically, and users do not need to care about cluster state because a cluster always exists whenever a task does. Second, the product keeps iterating, and users always benefit from the iterations of the open source community releases. We have always embraced open source, and that is an original commitment Volcano Engine EMR will not give up.

Then there are storage-compute separation and elastic scaling. Elastic scaling has advanced features and can operate at the granularity of the whole cluster. Logs are persisted to the cloud and can be viewed at any time, so users do not need to spend much effort operating and maintaining them.

Finally, on operation and maintenance: Stateless extracts the stateful services, so users no longer need to care about cluster services. They only need to care about running, debugging, and diagnosing their computations.


Origin blog.csdn.net/CrisAppleYan/article/details/128631346