Toward a cloud-native big data platform: how do you migrate a big data platform to K8s?

We are living in an era of data explosion. According to IDC data, more than 97 ZB of data would be created in 2022 alone. For comparison, all printed material produced by humankind up to 2012 amounted to about 200 PB, which is only about 1/500,000 of the data created in the single year 2022. China's data volume is forecast to grow from 23.88 ZB in 2022 to 76.6 ZB in 2027, a compound annual growth rate (CAGR) of 26.3%, the highest in the world.
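The two ratios quoted above can be checked with a few lines of arithmetic. This is only a back-of-envelope sketch: it assumes decimal units (1 ZB = 10^6 PB) and treats 2022 to 2027 as five compounding years, neither of which is stated in the IDC figures themselves.

```python
# Back-of-envelope check of the quoted figures (decimal SI units assumed).
PB_PER_ZB = 10**6  # 1 ZB = 10^21 bytes, 1 PB = 10^15 bytes

data_2022_zb = 97            # data created in 2022, as quoted
printed_until_2012_pb = 200  # all printed material up to 2012, as quoted

# How many times larger is one year of data than all printed material?
ratio = data_2022_zb * PB_PER_ZB / printed_until_2012_pb
print(f"2022 data is about {ratio:,.0f} times all printed material")


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


# China's forecast: 23.88 ZB (2022) to 76.6 ZB (2027), assumed to be 5 years.
china_cagr = cagr(23.88, 76.6, 5)
print(f"Implied CAGR: {china_cagr:.2%}")
```

Under these assumptions the ratio comes out a little under 500,000 and the implied growth rate lands close to the quoted 26.3%, so the figures are internally consistent.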

Data volumes are growing rapidly, and data-driven enterprises face greater challenges

The data explosion presents both opportunities and challenges for enterprises. As data volumes surge, enterprises need to extract more value from their data, and in pursuing that goal they naturally become data-driven.

Data-driven enterprises, however, face six major challenges: most lack a clear data platform strategy; rapid data growth makes storage, analysis, and data innovation too costly; it is hard to find scenarios that maximize the value of data; it is unclear which new technologies or products should support business innovation; in-house skills are insufficient to support innovative data projects; and data governance and security capabilities are lacking.

How can these challenges be turned into opportunities? First, break down data silos and analyze data in an integrated, unified way. Second, pursue data-driven intelligent innovation, using innovative products to reshape the innovation engine. Third, adopt a cloud-native architecture so that enterprises can drive business innovation with data.

  Breaking through the limits of traditional big data architecture with cloud native and K8s

Since Hadoop was open sourced in 2006, the big data ecosystem built around it has been most companies' choice for building a big data platform. As adoption deepened, however, more and more problems emerged: system components are complicated to install and configure, cluster resources are used inefficiently, the operation and maintenance workload is heavy, data applications are slow to develop and iterate, and new development tools are hard to integrate. These problems have become significant obstacles to faster iteration and upgrading in enterprise digital transformation.

Since the evolution of the Hadoop ecosystem alone cannot solve the problems of traditional big data platforms, we should look to the latest technology trend: cloud native, represented by containers and K8s.

Since the release of the Docker container project in 2013 and the K8s project in 2014, cloud-native technology has developed very rapidly. All major public cloud vendors now support K8s, and hundreds of technology companies continue to invest in its development. The CNCF landscape currently includes more than 1,000 cloud-native technology products across more than 10 technical areas, such as databases, messaging and stream processing, scheduling and orchestration, and storage systems.

2021 can be seen as a milestone for cloud-native big data technology. In March 2021, Apache announced that Spark 3.1 officially supported K8s. In May 2021, Confluent, the commercial company behind Apache Kafka, released Confluent on K8s, a production Kafka cluster system for private deployment that runs on K8s. These two events show that making big data platforms cloud native is the general trend, and in line with it, Hadoop workloads are gradually migrating to K8s.
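To make "Spark supports K8s" concrete, the sketch below assembles a `spark-submit` command line that targets a Kubernetes API server instead of YARN. The flags and `spark.kubernetes.*` properties are standard Spark options; the API server address, image name, job name, and application path are placeholders invented for illustration, and the helper `build_spark_submit_args` is hypothetical rather than part of Spark or KDP.

```python
import shlex


def build_spark_submit_args(api_server: str, image: str, app_path: str,
                            executors: int = 2,
                            namespace: str = "default") -> list[str]:
    """Assemble a spark-submit command that runs a job on Kubernetes.

    The k8s:// master URL tells Spark to ask the Kubernetes API server
    to launch the driver and executors as pods.
    """
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",
        "--deploy-mode", "cluster",
        "--name", "example-job",  # placeholder job name
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.namespace={namespace}",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_path,
    ]


# All values below are placeholders, not real endpoints or images.
args = build_spark_submit_args(
    api_server="https://k8s.example.com:6443",
    image="registry.example.com/spark:3.1.1",
    app_path="local:///opt/spark/examples/src/main/python/pi.py",
)
print(shlex.join(args))
```

The printed command could be run from a machine with a Spark distribution and cluster credentials; the `local://` scheme indicates that the application file is already inside the container image.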

  The road to cloud native: migrating the big data platform to K8s

Following this trend, more and more enterprises are moving their business workloads to cloud-native systems. Yet after migrating to Kubernetes-based private or public cloud platforms, many still run a separate traditional big data platform outside the cloud-native system, which adds unnecessary operational complexity and wastes resources.

The Kubernetes big data platform (KDP for short), independently developed by Zhilingyun on a cloud-native architecture, is designed to solve exactly this problem. Today, most domestic enterprises that use K8s apply it only to scheduling general cloud computing workloads; for big data they still manage a second, complex system, the traditional big data platform. Migrating the big data platform to K8s removes that split.

KDP system architecture diagram

KDP uses Kubernetes as its resource scheduling platform to schedule and manage big data components and data applications in a unified way. Building on adapted and integrated open-source big data computing and storage engines, the platform uses a big data integration base developed by Zhilingyun to deploy, release, manage, and operate mainstream big data components in a standard way.

As an analogy, think of Windows Explorer: KDP plays a similar role for big data components. It manages all of them in one place and makes them more convenient to use, which improves system operating efficiency and reduces operation and maintenance costs.

KDP management interface diagram

What will KDP bring to the enterprise?

The efficiency gains KDP brings are tangible. Take a large operator as an example: its data centers hold about 30,000 servers, but these are seriously underutilized, with average utilization of only about 20%-30%. Under the unified resource allocation of the KDP platform, only about 6,000 servers are needed to achieve the same result, which greatly reduces spending on equipment, power, and space and makes the customer more competitive.

Specifically, KDP delivers four things. Standardized configuration management: a unified, Kubernetes-style configuration file format standardizes the configuration of big data components and simplifies their integration with Kubernetes clusters. Efficient resource utilization: cluster resources form a shared pool in which real-time and offline jobs are deployed together, raising cluster resource utilization to 60%, compared with 30% on a traditional big data platform. Elastic scaling: Kubernetes elastic scaling handles performance bottlenecks in computing jobs by dynamically expanding computing and cluster resources. Simplified operation and maintenance: based on the standard Kubernetes Operator pattern, a unified operations interface handles deployment, upgrade, scaling, backup, and other operations for big data components, improving operation and maintenance efficiency.
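The effect of higher utilization on cluster size can be estimated with simple arithmetic. The sketch below is an illustrative model, not KDP's sizing method: it assumes the amount of useful work stays constant, that work scales linearly with utilization, and that no extra headroom is reserved. The function name and the 1,000-node cluster are made up for the example.

```python
def nodes_needed(current_nodes: int, current_util_pct: int,
                 target_util_pct: int) -> int:
    """Estimate nodes required for the same work at a higher utilization.

    Assumes work scales linearly with utilization (an idealization).
    Percentages are integers so the arithmetic stays exact.
    """
    if not (0 < current_util_pct <= 100 and 0 < target_util_pct <= 100):
        raise ValueError("utilization must be in (0, 100]")
    # Total useful work, expressed in node-percent units.
    busy = current_nodes * current_util_pct
    # Ceiling division: a fraction of a node still needs a whole node.
    return -(-busy // target_util_pct)


# Hypothetical 1,000-node cluster moving from 30% to 60% utilization.
print(nodes_needed(1000, 30, 60))
```

In this idealized model, doubling utilization from 30% to 60% halves the number of nodes needed for the same workload. Real savings depend on peak load, headroom policy, and how well jobs can be co-located.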

In the concrete scenarios where big data technology is deployed, the platform can replace a traditional big data platform and help enterprises reduce costs and increase efficiency during digital transformation.

Efficient cluster deployment and operation and maintenance: Some enterprises, as technology providers, need to deploy big data clusters for multiple internal or external organizations. The traditional solution is relatively complex and requires many manual deployment steps, resulting in long cluster deployment cycles, high project implementation costs, complex operation and maintenance processes, and high skill requirements for operations staff. In this scenario, KDP can greatly improve deployment efficiency and reduce the manpower and time spent on project implementation and operations.

Improve IT architecture resource efficiency: Some enterprises run multiple types of data applications, different types of storage engines, and both real-time and batch computing jobs in production. On a traditional big data platform, such an environment is generally deployed on separate virtual machine clusters, which results in low resource utilization. With KDP, enterprises can use platform features such as mixed job scheduling, separation of storage and compute, and fine-grained scheduling to improve overall resource utilization and reduce IT infrastructure investment.

Upgrading of traditional technologies: Because traditional big data platforms are slow to scale and iterate, performance bottlenecks encountered in operation cannot be resolved in a timely manner. At the same time, software package dependencies between big data components are very complicated, which makes components hard to upgrade and makes integrating new components time-consuming and laborious. Technical teams on traditional platforms are exhausted by operation and maintenance pressure and have no energy left for business development and data value discovery. Gradually migrating the traditional platform to a cloud-native big data platform can significantly improve operation and maintenance efficiency, reduce its cost, and free up the team's productivity.

Self-service digital innovation: Some enterprises need multiple big data clusters to serve different business departments, and data scientists in those departments want to try new cloud-native AI and machine learning tools on their own. Traditional big data platforms cannot meet this self-service need. By deploying KDP, enterprises can manage multiple platforms more efficiently, offer self-service release of data analysis and AI development tools, reduce overall resource consumption costs, and accelerate the process of creating value from data.

  Irreplaceable advantages: unified, standardized management of all big data components

First, KDP works out of the box and takes only a few commands to get started. Second, it provides visual management and observability. Third, it innovates in scheduling by moving the big data platform onto K8s.

The biggest advantage of Zhilingyun KDP, and what sets it apart from other products, is that all standardized big data components can run seamlessly on Kubernetes with KDP's support. KDP is also compatible with almost all mainstream Kubernetes distributions in the industry.

Running a big data platform on Kubernetes has four advantages. First, unified management: reusing the Kubernetes infrastructure greatly reduces complexity. Second, mixed resource use: a shared resource pool is used efficiently, and each component and the cluster as a whole can scale elastically with ease. Third, the system can quickly integrate new applications and iterate rapidly. Fourth, system stability improves considerably and operation and maintenance become more efficient. KDP focuses on installing the various big data components and managing resources in a unified way; to return to the Windows Explorer analogy, KDP is the resource manager of the big data platform.

At present, Zhilingyun KDP is suitable for the following types of users:

  1. Users who need to deploy and run big data components and applications on Kubernetes, such as cloud native developers, data engineers, data analysts, etc.;

  2. Users who need to carry out cloud-native transformation and migration of existing big data systems, such as users of traditional Hadoop platforms, users who need to improve system efficiency and reduce operation and maintenance costs, etc.;

  3. Users who need to quickly build an enterprise-level cloud-native big data base platform, such as users of digital innovation and transformation, users who need to support multiple data scenarios and applications, etc.

If you want to use Zhilingyun KDP to deploy and run big data components and applications, you can refer to the following steps:

First, you need to install the Zhilingyun KDP platform on the Kubernetes cluster, which is a containerized cloud-native big data platform that can manage big data components and applications on Kubernetes.

Then, you can select the big data components and applications you need on the Zhilingyun KDP platform, such as Hive, Spark, Flink, etc., and configure related parameters.

Finally, you can start and stop your big data components and applications on the Zhilingyun KDP platform, and view related status and logs. You can also access your data sources and storage through the Zhilingyun KDP platform, and perform data analysis and processing.

Kubernetes standardizes the release and management of business applications. Zhilingyun's ultimate goal is to standardize the release and use of data applications. Starting from a containerized cloud-native big data platform, Zhilingyun is moving toward that goal step by step.

About LinkTimeCloud

Zhilingyun (LinkTimeCloud) is an innovator and leader in cloud-native big data technology in China. It provides enterprise customers with a cloud-native DataOps product series built on its cloud-native big data platform, including a cloud-native data integration development platform and a cloud-native data asset operation platform. Through its products and services, Zhilingyun helps enterprises build data and AI middle platforms, close the loop on business data capabilities, establish a digital operation system, and ultimately complete data-driven digital transformation.

Zhilingyun has served many well-known enterprises at home and abroad in energy, education, healthcare, the Internet of Things, finance, and other fields. It also works closely with many partners in the cloud-native ecosystem, combining their respective strengths to provide enterprise customers with more valuable cloud computing and big data products and technical services.

- FIN -

Origin blog.csdn.net/LinkTime_Cloud/article/details/131199049