Can Kubernetes reduce costs and increase efficiency? EasyMR's exploration and practice of Kubernetes-based deployment

Kubernetes is a cloud-native system for orchestrating containerized applications. Originally created by Google, it is now maintained by the Cloud Native Computing Foundation (CNCF).

Kubernetes is one of the most popular cluster management solutions on the market. It automates the deployment, scaling, and management of containerized applications, coordinates container clusters across multiple hosts, and provides capabilities such as fault tolerance and scalability.

Simply put, if your applications can be containerized (for example, with Docker), Kubernetes is a strong choice for running and managing them. With Kubernetes (k8s), the utilization of on-premises or cloud-hosted infrastructure can improve considerably, because computing resources are shared dynamically among multiple applications.

Kubernetes schedules and automates container-related tasks throughout the application life cycle, including deployment, operation and maintenance, service discovery, storage provisioning, load balancing, autoscaling, self-healing for high availability, and more.

Today, Kubernetes and the broader container ecosystem are maturing into a general-purpose computing platform with the potential to compete with, or even overtake, virtual machines (VMs) as the fundamental building blocks of modern cloud infrastructure and applications. However, Kubernetes itself is a relatively complex platform. Operations engineers and developers cannot master it quickly, which raises the barrier to entry for teams used to traditional operations and development.

EasyMR is a big data computing engine product that provides one-stop visual installation and deployment of components, along with observable operation and maintenance management. Naturally, we have also explored deploying it on Kubernetes in practice.

Exploration of EasyMR based on Kubernetes deployment

The EasyMR we discussed before is based on the host cluster mode. To deploy services, you first connect the hosts and then deploy the corresponding product package services to quickly build the application cluster. However, with the growing popularity of Kubernetes and the cloud-native technology stack (containers, microservices, service mesh, and so on) in recent years, the traditional model urgently needs updating to keep up with this trend. Therefore, we decided to build a new containerized deployment version on top of EasyMR's original model of product package deployment.

As we said before, Kubernetes is complex, and general development and operations staff find it laborious to work with concepts such as controllers (Deployment/DaemonSet/StatefulSet/Job/CronJob) and storage (PVC/PV/StorageClass). Therefore, we still leave the complexity to the platform, and keep the interactions exposed to users easy to understand.

In host cluster mode, the steps to deploy a service are: download the package -> decompress the installation package -> deliver the configuration -> start the service. EasyMR's own easyagent manages the full life cycle of the service. On the Kubernetes architecture, we could also develop a corresponding agent. However, after researching open source projects on the market, we found that KubeVela covers exactly this part of the capabilities we need.
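The four host-mode steps above can be sketched as a small pipeline. This is only an illustration under our own assumptions, not how easyagent is implemented: the function name `deploy_on_host`, the `config.json` file name, and the use of a local copy in place of a real download are all hypothetical.

```python
import json
import shutil
import tarfile
from pathlib import Path


def deploy_on_host(package_src: Path, install_dir: Path, config: dict) -> list[str]:
    """Run the four host-mode steps in order and return the completed step names."""
    done = []
    install_dir.mkdir(parents=True, exist_ok=True)

    # 1. "Download": a local copy stands in for fetching the package over the network.
    local_pkg = install_dir / package_src.name
    shutil.copy(package_src, local_pkg)
    done.append("download")

    # 2. Decompress the installation package into the install directory.
    with tarfile.open(local_pkg) as tar:
        tar.extractall(install_dir)
    done.append("decompress")

    # 3. Configuration delivery: write the config next to the service files.
    (install_dir / "config.json").write_text(json.dumps(config, indent=2))
    done.append("configure")

    # 4. Service startup: a real agent would launch and supervise the process;
    #    this sketch only records that the step was reached.
    done.append("start")
    return done
```

An agent on each host has to own every one of these steps, which is the work a Kubernetes-based version hands over to the cluster.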

KubeVela uses OAM (Open Application Model), which is essentially a high-level abstraction and encapsulation of the complex DevOps process, based on the separation-of-concerns principle of software design. It is an application-centric layer on top of the Kubernetes API, and the model is designed to define a standard for cloud-native applications.

Building on KubeVela, the EasyMR platform only needs to provide a variety of extensible component types that shield upper-level users from the complex underlying Kubernetes implementation. Users who deploy Kubernetes services with EasyMR only need to choose the service type and modify the application configuration. We will cover KubeVela and OAM in more detail in a later article, so we will not go further here.
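To make the idea of "service type plus configuration" concrete, here is a minimal sketch that builds an OAM Application object as a Python dictionary. The `core.oam.dev/v1beta1` Application shape and the built-in `webservice` component type come from KubeVela; the helper name `build_oam_application`, the application name, and the image are placeholders of ours, and this is not necessarily how EasyMR generates its applications.

```python
import json


def build_oam_application(name: str, component_type: str, properties: dict) -> dict:
    """Build a single-component OAM Application manifest as a plain dict."""
    return {
        "apiVersion": "core.oam.dev/v1beta1",
        "kind": "Application",
        "metadata": {"name": name},
        "spec": {
            "components": [
                # The user only picks a component type and fills in its properties;
                # KubeVela renders the underlying Kubernetes resources.
                {"name": name, "type": component_type, "properties": properties}
            ]
        },
    }


# Placeholder values for illustration only.
app = build_oam_application("demo-web", "webservice", {"image": "nginx:1.25"})
print(json.dumps(app, indent=2))
```

The user-facing surface is just the component type and its properties; everything about Deployments, Services, and so on stays behind the component definition.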

For EasyMR, the unit of service deployment has always been the product package, and we have not changed this. The core of the product package is the schema file, so we added some fields to it to meet the requirements of Kubernetes deployments.

[Table: schema fields added for Kubernetes deployment, including workload; image not available]

The workload field in the table above indicates the service type. For example, if the platform has a built-in master-slave MySQL workload, you only need to declare the service type as MySQL and give the image name in the product package. At deployment time, the platform automatically creates the underlying Kubernetes resources for a stateful MySQL application, such as the StatefulSet, the ConfigMap for configuration files, the Service, and the PV/PVC storage. This greatly reduces labor costs and improves delivery efficiency. If more component types are needed later, they can be added gradually as the platform iterates.
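The expansion from a declared workload to Kubernetes resources might look roughly like the sketch below. The schema entry fields (`name`, `image`, `replicas`, `storage`) and the function `expand_mysql_workload` are hypothetical and do not reflect EasyMR's real schema; only the Kubernetes resource shapes (ConfigMap, headless Service, StatefulSet with `volumeClaimTemplates`) are standard.

```python
def expand_mysql_workload(entry: dict) -> list[dict]:
    """Expand a hypothetical schema entry into ConfigMap, Service, and StatefulSet dicts."""
    name = entry["name"]
    labels = {"app": name}

    config_map = {
        "apiVersion": "v1", "kind": "ConfigMap",
        "metadata": {"name": f"{name}-config"},
        "data": {"my.cnf": "[mysqld]\n"},
    }
    # Headless Service gives each StatefulSet pod a stable DNS name.
    service = {
        "apiVersion": "v1", "kind": "Service",
        "metadata": {"name": name},
        "spec": {"clusterIP": "None", "selector": labels, "ports": [{"port": 3306}]},
    }
    stateful_set = {
        "apiVersion": "apps/v1", "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": entry.get("replicas", 2),
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": entry["image"],
                        "volumeMounts": [
                            {"name": "data", "mountPath": "/var/lib/mysql"},
                            {"name": "conf", "mountPath": "/etc/mysql/conf.d"},
                        ],
                    }],
                    "volumes": [{"name": "conf", "configMap": {"name": f"{name}-config"}}],
                },
            },
            # One PVC per pod is created from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": entry.get("storage", "10Gi")}},
                },
            }],
        },
    }
    return [config_map, service, stateful_set]


# Hypothetical product-package entry: service type plus image name.
resources = expand_mysql_workload({"name": "mysql", "workload": "mysql", "image": "mysql:8.0"})
```

The user writes two or three fields; the platform owns the several dozen lines of resource definitions that follow from them.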

The EasyMR cloud deployment architecture is shown in the figure below:

[Figure: EasyMR cloud deployment architecture; image not available]

In the architecture diagram, vela-core is the core deployment component, and config-reloader watches for updates to the ConfigMap used by a Pod and restarts the application Pod when it changes.
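One common way to detect that a ConfigMap has changed is to compare a checksum of its data against the last value seen. The sketch below shows only that comparison; it is an assumption about how such a reloader could work, not a description of config-reloader's actual implementation, and both function names are ours.

```python
import hashlib
import json


def config_checksum(data: dict) -> str:
    """Return a stable SHA-256 checksum of ConfigMap data, independent of key order."""
    canonical = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def needs_restart(last_seen_checksum: str, current_data: dict) -> bool:
    """True when the ConfigMap data differs from what the Pod was started with."""
    return config_checksum(current_data) != last_seen_checksum


# Example: the checksum recorded at Pod start versus the ConfigMap after an edit.
started_with = config_checksum({"my.cnf": "[mysqld]\nmax_connections=100\n"})
changed = needs_restart(started_with, {"my.cnf": "[mysqld]\nmax_connections=200\n"})
```

When `needs_restart` returns True, a reloader would then trigger the restart, for instance by deleting the Pod or updating the workload's Pod template.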

EasyMR’s future exploration based on Kubernetes

EasyMR is an elastic computing engine based on cloud-native technology and open source big data components such as Hadoop, Hive, Spark, Flink, HBase, and Presto. Being able to deploy big data components is only the first milestone. Looking ahead, our longer-term goal is the separation of storage and compute:

● Use Kubernetes instead of YARN as the scheduling component

For distributed stream and batch computing frameworks such as Flink and Spark, the underlying resource management platform has gradually shifted from YARN in the Hadoop ecosystem to the native Kubernetes scheduler and add-on resource schedulers in the Kubernetes ecosystem, such as Volcano and YuniKorn.
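As a concrete illustration of the shift away from YARN, Spark can submit jobs directly to a Kubernetes API server by using a `k8s://` master URL. The sketch below only assembles such a `spark-submit` command line; the helper name, API server address, image, class, and jar path are placeholders, and it does not configure Volcano or YuniKorn.

```python
def spark_submit_on_k8s(
    api_server: str,
    image: str,
    main_class: str,
    app_jar: str,
    namespace: str = "default",
    executors: int = 2,
) -> list[str]:
    """Assemble a spark-submit command that targets Kubernetes instead of YARN."""
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",   # Kubernetes API server replaces the YARN master
        "--deploy-mode", "cluster",          # the driver itself runs in a Pod
        "--class", main_class,
        "--conf", f"spark.kubernetes.namespace={namespace}",
        "--conf", f"spark.kubernetes.container.image={image}",
        "--conf", f"spark.executor.instances={executors}",
        app_jar,
    ]


# Placeholder values for illustration only.
cmd = spark_submit_on_k8s(
    api_server="https://k8s.example.com:6443",
    image="example/spark:latest",
    main_class="com.example.Main",
    app_jar="local:///opt/spark/app.jar",
)
print(" ".join(cmd))
```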

● Use object storage + cache acceleration

As cloud computing technology matures, enterprises have another storage option: object storage. It started with AWS, and other cloud vendors have since moved in the same direction, replacing HDFS with object storage.

However, when object storage is used to support complex systems like Hadoop, several problems arise: file listing performance is weak; object storage has no atomic rename, which affects task stability; and the eventual consistency of object storage data reduces the stability and correctness of the computation. Therefore, we also need a cache acceleration layer such as Alluxio or JuiceFS to improve the performance of object storage.
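The missing atomic rename is easiest to see with a toy model. In the sketch below, a directory "rename" on a flat key space has to be emulated as copy-then-delete per object, so a failure partway through leaves output split between the old and new prefixes. `ToyObjectStore` is an in-memory stand-in written for this article, not any vendor's API.

```python
class ToyObjectStore:
    """In-memory stand-in for an object store with a flat key space."""

    def __init__(self):
        self.objects = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def list(self, prefix: str) -> list[str]:
        return sorted(k for k in self.objects if k.startswith(prefix))

    def rename_prefix(self, src: str, dst: str, fail_after=None) -> int:
        """Emulate a directory rename as per-object copy + delete (not atomic)."""
        moved = 0
        for key in self.list(src):
            if fail_after is not None and moved == fail_after:
                raise RuntimeError("simulated failure mid-rename")
            self.objects[dst + key[len(src):]] = self.objects[key]  # copy
            del self.objects[key]                                   # delete
            moved += 1
        return moved


store = ToyObjectStore()
for i in range(3):
    store.put(f"tmp/job/part-{i}", b"data")

try:
    # A job "commits" by renaming its temporary output, but fails after one object.
    store.rename_prefix("tmp/job/", "out/job/", fail_after=1)
except RuntimeError:
    pass

remaining = store.list("tmp/job/")
committed = store.list("out/job/")
```

After the simulated failure, one part sits under the output prefix and two remain under the temporary one, which is the kind of half-committed state an atomic HDFS rename would not expose.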

"Dtstack Product White Paper": https://www.dtstack.com/resources/1004?src=szsm

"Data Governance Industry Practice White Paper": https://www.dtstack.com/resources/1001?src=szsm

To learn more about Kangaroo Cloud big data products, industry solutions, and customer cases, visit the Kangaroo Cloud official website: https://www.dtstack.com/?src=szkyzg

Readers interested in big data open source projects are also welcome to join the "Kangaroo Cloud Open Source Framework DingTalk Technology Group" to exchange the latest open source technology information. Group number: 30537511. Project address: https://github.com/DTStack

Origin my.oschina.net/u/3869098/blog/10143760