Data Stack V6.0 product matrix released; big data foundation EasyMR fully upgraded

On April 20, Kangaroo Cloud held its 2023 Spring Growth Conference under the theme "Digital-Real Integration, Resilient Growth". At the conference, it released the product matrix of Data Stack V6.0, the one-stop big data basic software independently developed by Kangaroo Cloud. The release upgrades the full product line across three major modules (big data basic platform, big data development and governance, and data intelligent analysis and insight) and highlights the enterprise-level data computing and storage platform: the self-developed big data engine EasyMR.

This year's collective study session emphasized: "We must fight hard for the localization of scientific and technological instruments and equipment, operating systems, and basic software, raise the level of localized substitution and the scale of application, and strive to use our country's independent research platforms, instruments, and equipment to solve major fundamental research problems as soon as possible."

Kangaroo Cloud, as a leading digital basic software and application service provider in China, has always adhered to independent innovation and focused on the research and development of big data basic software. It uses advanced technology to empower the digital transformation of customers in more industries and helps them discover and release the potential value of data resources.

New release of the Data Stack V6.0 product matrix

Sishu, head of product and R&D at Kangaroo Cloud, first introduced the progress of the new Data Stack V6.0 product matrix and the direction of the product upgrades. By sorting out, refactoring, and upgrading years of digital practice, Data Stack V6.0 has formed a new product matrix of three layers: big data basic platform, big data development and governance, and data analysis and insight. With new combinations, new capabilities, and new technologies, it injects a stronger "Data Stack" driving force into digital-real integration applications.

In the matrix as a whole, the big data basic platform is the foundation and includes two newly upgraded products: the big data computing engine EasyMR and the integrated lakehouse EasyLake. EasyMR mainly provides one-click installation, deployment, and operation and maintenance of components such as Hadoop, Spark, Flink, HBase, and Trino. EasyLake mainly provides unified metadata management for data lakes, as well as data services and analysis. The big data basic platform is designed to give the digital transformation of various industries a ready-to-use, powerful, and solid foundation.

One layer up, big data development and governance incorporates the DataOps concept and includes five modules: offline development BatchWorks, real-time development StreamWorks, data service DataAPI, data asset DataAssets, and indicator management DataIndex. With independent controllability and secure innovation at its technical core, this layer aggregates, processes, manages, serves, and analyzes data assets across all domains. It provides customers with a safe, stable, and easy-to-use big data platform, accelerates the release of data value, and empowers digital intelligence applications.

The top layer, data intelligent analysis and insight, provides two applications: customer data insight UserInsight and data visualization analysis EasyBI. They help enterprises build a business-value-oriented data analysis and application system to drive business growth.

The following introduces EasyMR, the big data computing engine that is the key upgrade in this Data Stack release, organized from Sishu's speech.

EasyMR: big data computing engine

Rich features of EasyMR

EasyMR includes computing components such as Hadoop, Hive, Spark, Trino, HBase, and Kafka, and is fully compatible with the Apache open source ecosystem. It can enable the LDAP+Kerberos+Ranger authentication and authorization system with one click, supports database/table/row/column-level permission control, and provides enterprise-level security control.

The cluster management of EasyMR includes the following five functions:

· Host management: connects to host types such as x86 servers, ARM servers, and Kubernetes clusters, and supports host operations including batch onboarding, host removal, and host monitoring

· Installation and deployment: includes automatic deployment, manual deployment, patch upgrade/downgrade, component rollback, and other functions

· Cluster operation and maintenance: includes component start and stop, health checks, service log viewing, and dynamic scaling out and in according to the usage of the customer's business side

· Monitoring and alarming: raises alarms automatically when a host behaves abnormally while the business is running

· Basic management: includes user management, operation permission management, audit logs, and other functions

The rich functions of EasyMR can help enterprises use data more comprehensively, intelligently and securely, and accelerate the digital transformation of enterprises.

Core Features of EasyMR

● Xinchuang localization

EasyMR has completed adaptation and mutual certification with mainstream Xinchuang ecosystem vendors. It supports domestic operating systems such as Tongxin UOS, Anolis OS, and Kylin; domestic chips such as Kunpeng 920 and Phytium; and domestic servers and clouds such as Great Wall Sky CF520 and Huawei Public Cloud. It has also been adapted to most domestic databases and domestic middleware.

For more Xinchuang compatibility of EasyMR, please see the picture below:

(Figure: EasyMR Xinchuang compatibility)

● Open source/self-controllable

EasyMR is Kangaroo Cloud's self-developed big data basic platform. Its big data components are 100% based on open source Hadoop, fully compatible with the Apache open source ecosystem, and kept in step with the open source community's iterations to stay technically current. In addition, EasyMR optimizes and enhances features of components such as Spark, Flink, Trino, and Iceberg and contributes them back to the community, helping build the Hadoop ecosystem in an open spirit.

● Operation and maintenance hosting service

EasyMR provides big data cluster monitoring and alarming, security assurance, data quality assurance, and platform operation and maintenance services; regular inspections, in-depth health checks, cost optimization, and advanced tuning services; and implementation services including big data cluster migration, cluster disaster recovery construction, and architecture design and planning. Together these form a full-link, one-stop operation and maintenance hosting service.

● Security

Through the LDAP+Kerberos+Ranger authentication and authorization system, permissions are controlled at the database, table, row, and column levels to achieve enterprise-level security control.
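
To make the four levels of control concrete, here is a minimal sketch in Python. It is not Ranger's API and not EasyMR's implementation; `Policy`, `authorize`, and `read_table` are hypothetical names, and the model is deliberately simplified (one policy per user and table, default deny).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

Row = Dict[str, object]


@dataclass
class Policy:
    """Hypothetical policy: one user's read access to one table."""
    user: str
    database: str
    table: str
    # Column-level control: columns the user may read; None means all columns.
    columns: Optional[Set[str]] = None
    # Row-level control: predicate a row must satisfy; None means all rows.
    row_filter: Optional[Callable[[Row], bool]] = None


def authorize(policies: List[Policy], user: str, database: str, table: str) -> Optional[Policy]:
    """Database/table-level check: return the matching policy, or None (deny)."""
    for policy in policies:
        if (policy.user, policy.database, policy.table) == (user, database, table):
            return policy
    return None


def read_table(policies: List[Policy], user: str, database: str, table: str,
               rows: List[Row]) -> List[Row]:
    """Return only the rows and columns the user is allowed to see."""
    policy = authorize(policies, user, database, table)
    if policy is None:
        raise PermissionError(f"{user} may not read {database}.{table}")
    visible: List[Row] = []
    for row in rows:
        if policy.row_filter is not None and not policy.row_filter(row):
            continue  # hidden by the row-level filter
        if policy.columns is None:
            visible.append(dict(row))
        else:
            # Keep only the columns granted by the policy.
            visible.append({k: v for k, v in row.items() if k in policy.columns})
    return visible
```

In this sketch, a user without any matching policy is rejected outright, while a user with a policy sees only the granted columns of the rows that pass the filter.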

EasyMR localization adaptation: middleware, metadata database

Hive Metastore, a subcomponent of Hive, uses Redis for cache acceleration in the open source solution; in EasyMR, Redis can now be replaced with domestic middleware such as Bioland BCS.

In the open source solution, Hive Metastore stores its own metadata in MySQL or Oracle; EasyMR can now replace these with domestic databases such as TDSQL and OceanBase. On this basis, EasyMR achieves genuine localization and is fully independent and controllable.

The EasyMR team knows that only by making key technologies independent and localized can we truly achieve technological innovation and overcome the "stuck neck" (chokepoint) problem.

Enhancements to Big Data components in EasyMR

Kangaroo Cloud, as a leading digital basic software and application service provider in China, attaches great importance to strengthening the foundational and technical capabilities of its products. Building on open source technology, EasyMR has enhanced the functionality and performance of several core big data components such as Spark, Flink, Trino, and Iceberg. The specific optimizations are shown in the figure below:

(Figure: EasyMR enhancements to Spark, Flink, Trino, and Iceberg)

In 2022 alone, Kangaroo Cloud engineers completed hundreds of commits, contributing to the technical development of the Hadoop ecosystem.

As the saying goes, the fragrance lingers on the hand that gives roses. While giving back to the community, Kangaroo Cloud has achieved full independent control of the core code of the entire Hadoop stack, and 100% independent control over migration to the EasyMR big data platform and maintenance of its big data components.

EasyMR independent research and development capabilities: technology open source

From the release of Data Stack V1.0 in 2016 to the current Data Stack V6.0, Data Stack has gone through seven years of iteration across six major versions. Through this sustained technical exploration, Data Stack has also accumulated some excellent big data components, such as ChunJun, a data synchronization and integration component that unifies stream and batch processing; Taier, a distributed DAG task scheduling component; and ChengYing, a big data platform operation and maintenance component. These components have all been open-sourced on GitHub. Kangaroo Cloud's open source address is below; you are welcome to use them.

GitHub address: https://github.com/DTStack

Technology has no boundaries, and innovation never stops. The Kangaroo Cloud Data Stack technical team has won the title of "Excellent Open Source Technology Team of the Year" for two consecutive years. ChunJun also advanced to the finals of the "2022 China Open Source Innovation Competition" and won the "Excellent Open Source Project/Community" award. This recognition reflects the product technology and independent research and development capabilities of the Data Stack technical team.

CDP/CDH smooth migration to EasyMR solution

National policy now calls for domestic Xinchuang adoption, and with CDH reaching end of service (EoS), users can no longer obtain after-sales support. As a result, demand for localized replacement of big data platform foundations is growing across industries. In response, Data Stack offers a solution for smooth migration from CDP/CDH to EasyMR, which greatly reduces the cost of migration for enterprises.

Production workloads cannot be stopped, so the solution supports dual-track operation: the customer's original CDH cluster and the Xinchuang EasyMR cluster can run simultaneously. The migration process is simple to operate and flexible to configure, and all migration tasks can be completed in four steps.

The first step is to replace the computing platform and development kit.

The second step is data migration, which covers historical data and metadata. During migration, EasyMR supports data verification to ensure that the data stays consistent.

The third step is task migration, which covers collection tasks, data processing tasks, task dependencies, and analysis engines.

The fourth step is business cutover, which covers switching clusters, taking the old servers offline, and bringing the new servers online. This completes the smooth migration from CDP/CDH to EasyMR without the customer's business noticing the change.
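
The data verification mentioned in the second step can be illustrated with a simple consistency check. The sketch below is an assumption about one common approach, not a description of how EasyMR verifies data: it compares the row count and an order-independent checksum of the source and target tables. The function names `table_fingerprint` and `verify_migration` are hypothetical.

```python
import hashlib
from typing import Iterable, Sequence, Tuple


def table_fingerprint(rows: Iterable[Sequence[object]]) -> Tuple[int, int]:
    """Return (row_count, checksum) for a table given as an iterable of rows.

    The checksum is the sum of per-row SHA-256 hashes modulo 2**64, so it does
    not depend on row order but does change when a row is duplicated or altered.
    """
    count = 0
    checksum = 0
    for row in rows:
        # Join column values with a unit separator so ("ab", "c") != ("a", "bc").
        encoded = "\x1f".join(repr(value) for value in row).encode("utf-8")
        digest = hashlib.sha256(encoded).digest()
        checksum = (checksum + int.from_bytes(digest[:8], "big")) % (1 << 64)
        count += 1
    return count, checksum


def verify_migration(source_rows: Iterable[Sequence[object]],
                     target_rows: Iterable[Sequence[object]]) -> bool:
    """True when both sides have the same row count and checksum."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)
```

Because the checksum ignores row order, the two clusters can scan their copies in whatever order is cheapest; a mismatch only says that the tables differ, not which rows differ, so a real tool would then narrow the comparison down by partition or key range.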

EasyMR has accumulated a large number of successful migration cases and rich experience, which can ensure the safety and reliability of the entire migration process.

Hadoop smooth upgrade solution

Big data components release new versions quickly every year, yet many enterprises are still on Hadoop 2.0. Many customers want to upgrade to Hadoop 3.0 for its new features and performance optimizations, but their business cannot be stopped during the upgrade. What can be done?

EasyMR supports dynamic replacement of nodes: first upgrade one node to Hadoop 3.0, confirm that the node works correctly, and then gradually replace the remaining nodes. In this way, Hadoop is upgraded smoothly and without the business noticing.
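
The node-by-node approach described above is a rolling upgrade. The following is a minimal sketch of that control flow only; `rolling_upgrade` and the `upgrade` and `is_healthy` callbacks are hypothetical placeholders, and nothing here reflects EasyMR's actual interfaces.

```python
from typing import Callable, List, Optional, Tuple


def rolling_upgrade(nodes: List[str],
                    upgrade: Callable[[str], None],
                    is_healthy: Callable[[str], bool]) -> Tuple[List[str], Optional[str]]:
    """Upgrade nodes one at a time, verifying each before touching the next.

    Returns (upgraded_nodes, failed_node). failed_node is None when every node
    was upgraded and passed its health check; otherwise the rollout stops at
    the first unhealthy node and the remaining nodes stay on the old version.
    """
    upgraded: List[str] = []
    for node in nodes:
        upgrade(node)             # e.g. replace the Hadoop 2 node with a Hadoop 3 node
        if not is_healthy(node):  # confirm the node works before continuing
            return upgraded, node
        upgraded.append(node)
    return upgraded, None
```

Stopping at the first unhealthy node keeps the problem confined to a single machine, which is what lets the rest of the cluster keep serving the business while the issue is investigated.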

Practical application of EasyMR

Having covered the functions and features of EasyMR, the following introduces two classic practical applications that show how EasyMR helps companies replace imported products with domestic ones more efficiently, smoothly, and safely.

A national joint-stock commercial bank: CDH migrated to EasyMR

A national joint-stock commercial bank needed to solve problems such as the low efficiency of building branch data applications. Data Stack helped the customer migrate smoothly from CDH to EasyMR and built a data application cloud platform featuring "unified scheduling by headquarters + collaborative data sharing among branches".

EasyMR replaces Impala with Trino, which solves the node downtime caused by Impala's excessive memory usage, improves query performance, and enables dynamic resource isolation. It also adopts a "cloud platform" model in which the head office and each branch are separate tenants on the platform. Sharing the underlying storage and computing resources improves the efficiency of data delivery, while isolating data permissions ensures security, so there is no need to worry about issues such as accidental deletion or loss of branch data. EasyMR is compatible with the Xinchuang ecosystem, so the bank migrated smoothly to a localized environment and meets the financial industry's Xinchuang localization requirements.

By building the EasyMR big data basic platform, the national joint-stock commercial bank has improved both cost management and business control.

A cloud service brand in the payment industry: EasyMR + Data Stack SaaS

The financial industry is gradually moving from the digital era to the intelligent era. The customer's requirement was to deliver EasyMR and Data Stack as SaaS, relying on data middle platform products to empower its own customers and accelerate the move toward the intelligent era.

To meet these needs, Data Stack + EasyMR is fully compatible with the cloud platform's network architecture, servers, and unified permission management and control, which ensures that EasyMR runs stably once the customer has activated permissions and resources. Data Stack + EasyMR also supports the full flow of "ordering - automated deployment - one-click scaling out and in".

As one of the important promoters of the localization of big data basic software, Data Stack also has a large number of mature big data solutions, which can better support the construction of intelligent and digital applications in various industries.

First release of the Data Stack V6.0 product white paper

In addition, the "Data Stack Product White Paper" was released at the conference. It offers in-depth analysis from four perspectives (digital technology, product capabilities, application practices, and service support), addresses the weak points of digital transformation in a targeted manner, and interprets solutions in eight areas, including integration and DataOps, with a focus on improving customers' data management and control capabilities. The white paper also summarizes experience in building big data basic software effectively, providing reference and guidance for the digital transformation of enterprises in various industries.

Everyone is welcome to scan the code for free access.

(Figure: QR code for the white paper)

Data Stack has always insisted on independence and controllability, and is committed to building domestically innovated, enterprise-level big data basic software for customers. It helps customers consolidate their data foundation and establish a full life cycle management system from data acquisition and production to data consumption and utilization, so that data is "visible, usable, and manageable" and customers can gain insight into digital opportunities, clarify the direction of transformation, and create new data value.

In the future, Data Stack products will fit real-world scenarios even more closely, solve problems with digital intelligence, and fulfill the mission of "creating value with data".

"Data Stack Product White Paper": https://www.dtstack.com/resources/1004?src=szsm

"Data Governance Industry Practice White Paper" download address: https://www.dtstack.com/resources/1001?src=szsm

To learn more about Kangaroo Cloud big data products, industry solutions, and customer cases, visit the Kangaroo Cloud official website: https://www.dtstack.com/?src=szkyzg

Anyone interested in big data open source projects is also welcome to join the "Kangaroo Cloud Open Source Framework DingTalk Technology Group" to exchange the latest open source technology information. Group number: 30537511; project address: https://github.com/DTStack

Origin my.oschina.net/u/3869098/blog/8695646