Best Practices for Migrating a Self-Built Big Data Platform to Tencent Cloud EMR

Self-built open source big data platforms run into problems as enterprise data grows: slow performance, long expansion cycles, poor platform stability, difficult operation and maintenance, and high investment costs. This article explains how EMR addresses these problems in three parts: an introduction to EMR, a comparison between EMR and self-built Hadoop, and practical cases of migrating self-built platforms to the cloud.

1. Product Introduction

Elastic MapReduce (EMR) is a secure, low-cost, and highly reliable open source big data platform based on cloud-native technology and the broader Hadoop open source ecosystem. It allows existing data clusters to be migrated smoothly to Tencent Cloud EMR.

Tencent Cloud EMR integrates popular community components, including but not limited to Hadoop, Hive, HBase, Spark, Presto, Impala, Flink, Sqoop, Hue, Iceberg, and StarRocks, to cover scenarios such as online big data services, offline and online data warehouses, cloud-native data lake construction, and real-time stream computing.

The core product capabilities of Tencent Cloud EMR are shown in the figure below:

[Figure: core product capabilities of Tencent Cloud EMR]

● EMR integrates 30+ open source big data components and provides Hadoop 2 and Hadoop 3 component sets in multiple versions. You can choose the components that fit your scenario and launch a big data platform on the cloud in minutes with one click.

● The cloud-native big data platform supports a storage-compute separation architecture, which addresses the slow capacity expansion and low resource utilization of traditional self-built platforms. Under this architecture, data can be stored in tiers on COS and computing resources can be scaled on demand, which improves resource utilization and reduces the cost of idle resources.

● You can manage the cluster visually through the EMR console, including starting and stopping services, managing configuration, and distributing scripts. The console also provides 1000+ monitoring metrics covering clusters, nodes, and services, and supports alarms over multiple channels. In addition, EMR provides advanced insight features such as Yarn job query, Impala query analysis, and HDFS file storage analysis, which help improve the efficiency of operating big data clusters and services.

2. Comparative advantages of EMR and self-built Hadoop

Compared with building a big data platform on an open source Hadoop distribution, Tencent Cloud EMR has the following main advantages:

1) Clusters are easy to build, and the operation and maintenance features are rich and easy to use

i. Building a big data platform on an open source Hadoop distribution takes a long time and is technically complex. Supporting tools for development, operation and maintenance, and monitoring are incomplete, and effective technical support is hard to obtain, so support and maintenance require substantial labor costs.

ii. Tencent Cloud EMR can build a cluster quickly with one click, and its release versions have been fully tested and verified for compatibility. The console provides rich operation, maintenance, and monitoring tools out of the box, which greatly reduces usage and operation costs. In addition, Tencent's professional technical support helps customers locate and resolve problems quickly.

2) Computing resources scale flexibly on demand, data can be stored in tiers, and resource utilization is high

i. Self-built big data platforms generally need to estimate server resources in advance and reserve capacity for business peaks, which leads to low resource utilization and makes it hard to respond to tidal changes in computing demand.

ii. Tencent Cloud EMR supports elastic scaling: resources are used on demand, scaling completes in minutes, and computing resources can be scaled automatically according to business load or time period. In addition, your big data services can be deployed on the container service. On the storage side, the storage-compute separation architecture allows data to be stored in tiers, which greatly reduces customers' storage and computing costs.
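
To make the idea of scaling by business load or time period more concrete, here is a minimal Python sketch of how such a decision could be combined. It is not the EMR scaling API: the `TimeRule` and `LoadRule` types, the thresholds, and the `desired_nodes` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class TimeRule:
    """Hypothetical time-based rule: keep at least `nodes` nodes in a window."""
    start: time
    end: time
    nodes: int


@dataclass(frozen=True)
class LoadRule:
    """Hypothetical load-based rule driven by cluster utilization (0.0 to 1.0)."""
    scale_out_above: float
    scale_in_below: float
    step: int


def desired_nodes(now, utilization, current, time_rules, load_rule,
                  min_nodes, max_nodes):
    """Return the node count a scaler might target for the next interval."""
    # Time strategy: an active window raises the floor for the node count.
    floor = min_nodes
    for rule in time_rules:
        if rule.start <= now < rule.end:
            floor = max(floor, rule.nodes)

    # Load strategy: move one step out or in depending on utilization.
    target = current
    if utilization > load_rule.scale_out_above:
        target = current + load_rule.step
    elif utilization < load_rule.scale_in_below:
        target = current - load_rule.step

    # Clamp to the configured range; the time-based floor wins over scale-in.
    return max(floor, min(target, max_nodes))


# Illustrative configuration (assumed values, not recommendations).
nightly_batch = [TimeRule(time(1, 0), time(6, 0), nodes=20)]
load = LoadRule(scale_out_above=0.8, scale_in_below=0.3, step=2)

print(desired_nodes(time(2, 0), 0.5, 10, nightly_batch, load, 4, 30))   # nightly window
print(desired_nodes(time(12, 0), 0.9, 10, nightly_batch, load, 4, 30))  # daytime, busy
```

In this sketch the time rule only sets a lower bound, so a nightly batch window is guaranteed capacity while the load rule still handles unexpected peaks during the day.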

3) Continuous reinforcement and optimization of open source components, with better stability and performance

i. Self-built big data platforms generally use community versions of open source components, so the team has to handle compatibility issues and component defects itself and tune performance on its own. Building and testing with new community versions or a cutting-edge technology stack is costly.

ii. Tencent Cloud EMR draws on large-scale internal production experience. Core components such as Hadoop and HBase use the Tianqiong Oteam version, which is compatible with open source and provides effective stability hardening. Emerging technology stacks also gain useful features: for example, Iceberg supports Z-Order optimization, which can improve performance in suitable query scenarios by more than 10 times. With the agile iteration of cloud products, users can also conveniently build clusters on the latest stable community versions and easily adopt emerging real-time lakehouse technologies such as StarRocks and Iceberg.
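
For readers unfamiliar with Z-Order, the following Python sketch illustrates the general idea behind it: interleaving the bits of two column values into a single sort key (a Morton code), so that rows close in both columns end up close together in storage. This is a generic illustration of the curve, not the implementation used by Iceberg or Tencent Cloud EMR, and the function name `interleave_bits` is an assumption made for this example.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Build a Z-order (Morton) key by interleaving the low `bits` of x and y."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # bits of y go to odd positions
    return z


# Sort a small 4 x 4 grid of (x, y) values by their Z-order key.
rows = [(x, y) for x in range(4) for y in range(4)]
z_sorted = sorted(rows, key=lambda r: interleave_bits(*r))

# The first four rows form one compact 2 x 2 block of the grid.
print(z_sorted[:4])
```

Because neighbouring rows in this order share a small range on both columns, a data file written from a contiguous slice has tight minimum and maximum values for each column, which is what lets a query engine skip files when filtering on either column.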

4) Full-stack security protection makes the cloud environment and data more secure

i. A self-built big data platform has to build its own security capabilities from the underlying layer up to the service layer. This is complex, coverage is often incomplete, supporting audit capabilities are immature, and many hidden risks remain.

ii. Tencent Cloud EMR provides full-stack, easy-to-use security protection across hardware, network, operating system, and big data services. It provides CVM host security protection and anomaly alarms; supports cloud disk encryption and COS object storage encryption; supports VPC network isolation and security group settings at the network level; and supports a Kerberos + LDAP security architecture at the cluster level, where identity authentication secures cluster access. It also provides data permission management based on multiple Ranger policies.
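
As a rough illustration of what policy-based data permission management means in practice, the Python sketch below checks whether a user may read a set of columns from a table. It is a simplified, hypothetical model: the `Policy` type and `can_select` function are invented for this example and do not reflect the Ranger policy format or any Ranger API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical allow policy: a user may read some columns of one table."""
    user: str
    database: str
    table: str
    columns: frozenset  # allowed column names; "*" means every column


def can_select(policies, user, database, table, columns):
    """Return True only if every requested column is allowed for the user."""
    allowed = set()
    for p in policies:
        if (p.user, p.database, p.table) == (user, database, table):
            if "*" in p.columns:
                return True
            allowed |= p.columns
    # Deny by default: an empty request or any column outside the grants fails.
    return bool(columns) and set(columns) <= allowed


# Illustrative policies (assumed names, not taken from the article).
policies = [
    Policy("analyst", "sales", "orders", frozenset({"order_id", "amount"})),
    Policy("admin", "sales", "orders", frozenset({"*"})),
]

print(can_select(policies, "analyst", "sales", "orders", ["order_id", "amount"]))
print(can_select(policies, "analyst", "sales", "orders", ["order_id", "customer_phone"]))
```

The sketch uses a deny-by-default rule, so access is granted only when an explicit policy covers every requested column.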

5) Seamless integration with cloud ecosystem services for quickly building complete supporting capabilities

i. On a self-built big data platform, supporting capabilities such as data development tools have to be built in-house, which is costly and time-consuming.

ii. Tencent Cloud EMR connects seamlessly with the WeData data development platform and BI (business intelligence) analysis products, lowering the barrier for customers to quickly build services such as data integration, data development, and data visualization. EMR also supports peripheral services such as cloud monitoring and cloud auditing, helping customers quickly build an intelligent enterprise ecosystem on the cloud.

3. Best practice cases of migrating self-built platforms to the cloud

Case 1: A leading education customer

【Customer background】

The customer is a leader in online education in China. It is committed to using technology to make education more inclusive, applying cutting-edge technologies such as artificial intelligence and big data to provide students, teachers, and parents with more efficient learning and education solutions, intelligent hardware products, and more.

【Core pain points】

Before adopting Tencent Cloud EMR, the customer mainly used CDH to maintain its own open source big data clusters. As the business grew explosively and data volume soared, the customer tried many technical solutions and kept expanding its offline clusters to meet the timeliness requirements of different business scenarios, but still could not fully meet business needs. Core reports from the self-built, CDH-based Hive system could not be produced on time, which seriously affected data analysis and business decision-making.

【Solution】

To address the customer's core requirements, such as the timeliness of an offline data warehouse over massive data and security after migrating to the cloud, Tencent Cloud Elastic MapReduce proposed two solutions:

Lakehouse solution: Promote the adoption of Iceberg data lake technology. Using the Iceberg capabilities specific to Tencent Cloud EMR, the customer migrated its Hive-based, PB-level report system to an Iceberg data lake. After Z-Order optimization, report computation performance improved significantly, which made core reports more efficient while reducing overall costs.

Unified permission solution: Provide a unified permission scheme for the storage-compute separation architecture, built on the product's unified permission management capability and extended as needed. In this scheme, object storage (COS) is treated as a permission-controlled resource under unified permission management, which solves the problem of inconsistent permission control.

【Migration effect】

Scenario-based query efficiency increased by 10 times: Through Tencent Cloud Elastic MapReduce's Iceberg feature optimization, cache acceleration, storage-compute separation, and intelligent tiering, the performance of scenario-based queries improved by nearly 10 times. In addition, the external Iceberg Metastore function provided by Tencent Cloud EMR reduced the cost of transforming the customer's metadata, enabling Iceberg metadata access with almost no modification.

Fixed computing power reduced by 5,000 cores: Cloud-native capabilities such as elastic scaling of EMR computing nodes and offline scheduling on container resources avoid wasting idle resources and reduce overall costs.

Case 2: An industry-leading tools customer

【Customer background】

The customer is China's leading provider of enterprise cloud business and marketing solutions, and also a leading precision marketing service provider in China. To better serve its own BI, search, marketing, recommendation, and other business scenarios, it needs a stable, high-performance big data solution.

【Core pain points】

As the customer's business developed rapidly and data volume rose sharply, the original big data platform, self-built on Blackstone physical machines and CDH, gradually showed its drawbacks: a long expansion cycle for Blackstone nodes, outdated CDH component versions, and a limited component set that could not cover new business scenarios such as data lakes. At the same time, because the CDH cluster had many components and a long average bug fix cycle, the customer had to invest more manpower in operation and maintenance. Overall, to support rapid business growth, the customer's big data team had to invest heavily in cost and manpower for expanding self-built clusters, supporting new business scenarios, platform stability, and operation and maintenance.

【Solution】

Tencent Cloud EMR provides one-click cluster creation and meets the need for minute-level elastic scale-out and scale-in during peak business hours. The deep integration of cluster computing and storage components meets the customer's multi-tenancy requirements and provides fine-grained authorization at the table and field level.

Automatic scaling can elastically scale computing nodes based on two strategies, time and load, to meet the customer's resource requirements at different times in offline and ad hoc analysis scenarios. Login authentication and index permission management down to the document and field level provide a solid guarantee for secure cluster access. Integration with COS object storage makes backup easy. The multi-availability-zone capability provides disaster recovery for the cluster if power or network failures occur within the same city.

【Migration effect】

After adopting Tencent Cloud EMR, the customer's cluster delivery efficiency increased by 10 times.

Minute-level cluster elasticity helps the customer cope easily with sudden traffic surges during promotional events.

The Tencent Cloud EMR security system provides a stronger guarantee for business security and high availability.

4. Migration plan and purchase discount

Once you decide to migrate to the cloud, your data and analysis tasks will be migrated to Tencent Cloud EMR. We provide a practice guide for migrating IDC self-built platforms to EMR, as well as customized EMR migration plans.

● Migration practice guide:

Access link:

https://drive.weixin.qq.com/s?k=AJEAIQdfAAod5vyDEGAFcADQaEACc#/preview?fileId=i.1970325010981265.1688850523229527_f.6789599412zz1

● Customized migration plan:

Application link:

https://cloud.tencent.com/apply/p/5tjcbikd2f7

Currently, you can enjoy a 55% discount for 3 years when you purchase EMR.

Purchase link: https://buy.cloud.tencent.com/emr

Origin blog.csdn.net/cloudbigdata/article/details/129679915