In-Depth Interpretation: Why Forrester Rated Alibaba Cloud DataWorks Highly for Its Big Data Concepts and Capabilities

Abstract: Forrester released the Now Tech: Cloud Data Warehouse, Q1 2018 report, which comprehensively evaluates the main functions, regional performance, market segments, and typical customers of cloud data warehouse (CDW) products.

1. Introduction

This article is based on an analysis of Now Tech: Cloud Data Warehouse, Q1 2018 (by Noel Yuhanna, published March 13, 2018). Its content represents my personal opinion only.

On March 13, 2018, Forrester released the Now Tech: Cloud Data Warehouse, Q1 2018 report. The report comprehensively evaluated the main functions, regional performance, market segments, and typical customers of cloud data warehouse (CDW) products. AWS, Alibaba Cloud, Google, and Microsoft were placed in the global top tier, and Alibaba Cloud DataWorks + MaxCompute was the only Chinese product selected.

In the report, Forrester highlighted four core competencies of CDW:

· Flexible deployment. A CDW should offer multiple flexible deployment modes. For small customers, it should provide an online multi-tenant model, so that customers can quickly mobilize computing resources and deploy a data warehouse within minutes. For medium and large customers, it should provide a dedicated or on-premises deployment mode that delivers powerful computing performance and absolute security while hiding complex technical details.

· Efficient data transfer to the cloud. For customers who have not yet moved their data warehouses to the cloud, or who use a hybrid online-offline architecture, a CDW should provide a fast, low-cost way to complete data integration.

· Diversified analysis methods. A CDW should provide a variety of technical means to help users obtain the data processing capabilities they need in different business scenarios.

· Security. CDW security should comprehensively cover data encryption, auditing, desensitization, access control, and other aspects.

Why did DataWorks (https://data.aliyun.com/product/ide), the core of Alibaba Cloud's CDW service capabilities, win Forrester's favor? This article offers an interpretation.

2. DataWorks product architecture

Before the interpretation proper, let's look at the role DataWorks plays in the Alibaba Cloud CDW service system and at the product architecture of DataWorks.

Among Alibaba Cloud's many products, DataWorks and MaxCompute together form the core of its CDW service capabilities. MaxCompute, as the storage and computing engine, plays the role of IaaS-layer support, providing users with massive, reliable storage for large data tables and SQL execution capabilities. MaxCompute alone is not enough, however. For big data technology to truly empower customers, a series of CDW services such as data development and data integration is also required, and DataWorks provides a relatively complete solution.

Specifically, DataWorks contains eight main modules:

 

  • Data integration: heterogeneous data integration, bringing massive data from various source systems together on the big data platform
  • Data development: data warehouse design and the ETL development process
  • Monitoring and O&M: operation and maintenance monitoring of online ETL jobs
  • Real-time analytics: explore and analyze data in real time
  • Data asset management: metadata management, data map, data lineage, data asset overview, etc.
  • Data quality: data quality profiling, monitoring, verification, and scoring
  • Data security: data permission management, data classification by security level, desensitization, and data auditing
  • Data services: data sharing and data exchange, data API services

 

3. Flexible deployment

In the report, Forrester explained why multiple deployment forms are necessary and compared several CDWs. DataWorks is one of the few top-tier products that offers multiple deployment methods.

First, as the core of Alibaba Group's data middle-office system, DataWorks has supported Alibaba Group, Ant Financial, Cainiao, and other businesses across the group since 2009. Anyone who uses the data services of Taobao, Tmall, Ant Financial, or similar products may be indirectly using the computing services of DataWorks.

Second, DataWorks is available on the public cloud. To date, it has served more than 4,000 public cloud customers, including important customers such as Sina Weibo, Renrenche, and Tianhong Fund.

Finally, DataWorks also supports proprietary cloud delivery. As an important means of providing big data capabilities, DataWorks is part of Alibaba Cloud's proprietary cloud solutions such as Apsara Enterprise. Since 2015, it has supported heavyweight government and enterprise projects including "City Brain" and "Run at Most Once".

Through these flexible deployment methods, DataWorks can meet a variety of customer needs in different forms. Small users can be served flexibly through the public cloud, while proprietary cloud or hybrid cloud solutions can fully meet the needs of medium and large customers.

4. Efficient data transfer to the cloud

The importance of efficient data integration for moving enterprise data to the cloud is self-evident. In the initial migration stage, enterprises need to move their data assets to the cloud quickly and safely. In the ongoing operation stage, they need to feed data in various forms into the CDW and deliver the data processed in the CDW to each business unit.

The data integration module of DataWorks can read from and write to many types of data sources, including relational databases, NoSQL databases, big data stores, and text storage (FTP). It can also synchronize and integrate heterogeneous data sources under complex network conditions. When scheduling import tasks, DataWorks supports batch, full, and incremental synchronization of offline data, with custom synchronization schedules at minute, hour, day, week, and month granularity.
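To make the difference between full and incremental synchronization concrete, the following minimal sketch uses two in-memory SQLite databases as stand-ins for a source system and a warehouse table. It is not DataWorks code: the `orders` table, the `updated_at` watermark column, and the functions `full_sync` and `incremental_sync` are hypothetical names chosen only for illustration.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a hypothetical 'orders' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
    )
    return conn

def full_sync(source, target):
    """Full synchronization: replace the target with a complete copy of the source."""
    rows = source.execute("SELECT id, amount, updated_at FROM orders").fetchall()
    target.execute("DELETE FROM orders")
    target.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    target.commit()
    return len(rows)

def incremental_sync(source, target, watermark):
    """Incremental synchronization: copy only rows changed after the watermark.

    Returns the number of rows copied and the new watermark.
    """
    rows = source.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # INSERT OR REPLACE acts as an upsert keyed on the primary key.
    target.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    target.commit()
    new_watermark = rows[-1][2] if rows else watermark
    return len(rows), new_watermark

source, target = make_db(), make_db()
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2018-03-01"), (2, 20.0, "2018-03-02")],
)
source.commit()
copied_full = full_sync(source, target)  # the initial load copies everything
watermark = "2018-03-02"

# Later, one source row is updated and one new row is added.
source.execute("UPDATE orders SET amount = 25.0, updated_at = '2018-03-03' WHERE id = 2")
source.execute("INSERT INTO orders VALUES (3, 30.0, '2018-03-04')")
source.commit()
copied_inc, watermark = incremental_sync(source, target, watermark)
final_rows = target.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()
```

The incremental run touches only the two changed rows, which is why it is usually the cheaper choice for recurring schedules once the initial full load is done.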

Data integration in DataWorks also provides data flow management and control. It can govern data flow behavior along several dimensions, such as dirty data, data flow rate, and the number of concurrent threads, helping users reduce costs on several fronts and manage their pipelines leanly.
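The three control dimensions named above can be illustrated with a small, generic sketch. It does not reflect how DataWorks implements them: the names `sync_batch`, `run_job`, `max_dirty`, `concurrency`, and `max_batches_per_sec` are assumptions made for this example, and the dirty-data limit here applies per batch.

```python
import time
from concurrent.futures import ThreadPoolExecutor

class DirtyDataLimitExceeded(Exception):
    """Raised when more records fail parsing than the job tolerates."""

def parse_record(raw):
    """Convert a raw 'id,amount' string to a tuple; raise ValueError if malformed."""
    record_id, amount = raw.split(",")
    return int(record_id), float(amount)

def sync_batch(raw_records, max_dirty):
    """Parse one batch, skipping bad records and counting them as dirty data."""
    clean, dirty = [], 0
    for raw in raw_records:
        try:
            clean.append(parse_record(raw))
        except ValueError:
            dirty += 1
            if dirty > max_dirty:
                raise DirtyDataLimitExceeded(
                    f"{dirty} dirty records, limit {max_dirty}"
                )
    return clean, dirty

def run_job(batches, max_dirty=1, concurrency=2, max_batches_per_sec=50.0):
    """Run batches on a bounded thread pool, throttling how fast they are submitted."""
    min_interval = 1.0 / max_batches_per_sec
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = []
        for batch in batches:
            futures.append(pool.submit(sync_batch, batch, max_dirty))
            time.sleep(min_interval)  # crude rate limit on submissions
        results = [future.result() for future in futures]
    rows = [row for clean, _ in results for row in clean]
    dirty_total = sum(dirty for _, dirty in results)
    return rows, dirty_total

# One malformed record is tolerated; a second one in the same batch would abort the job.
rows, dirty_total = run_job([["1,10.5", "oops", "2,7"], ["3,1.0"]])
```

Failing fast once the dirty-data limit is exceeded avoids spending compute on a load that would be rejected anyway, while the thread-pool size and submission rate cap the pressure placed on the source system.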

5. Diversified analysis methods

DataWorks provides a powerful data development IDE that supports SQL code editing, integrated task editing, and visual editing of business process DAGs. Its multi-user online collaboration and version management for task scripts are well suited to the practical needs of enterprise-level data development. In addition to routine offline processing tasks, DataWorks also provides a lightweight tool, the "Data Analysis Workbench", which makes full use of the computing power of MaxCompute to meet users' needs for ad hoc data analysis.
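A business process DAG boils down to tasks plus dependencies, from which a valid execution order can be derived. The sketch below shows that idea with Python's standard `graphlib` module (Python 3.9+). The task names and the `execution_waves` helper are hypothetical, and this is not how DataWorks schedules jobs internally.

```python
from graphlib import TopologicalSorter

# Hypothetical business process: each task maps to the set of tasks it depends on.
workflow = {
    "sync_orders": set(),
    "sync_users": set(),
    "clean_orders": {"sync_orders"},
    "join_orders_users": {"clean_orders", "sync_users"},
    "daily_report": {"join_orders_users"},
}

def execution_waves(dag):
    """Group tasks into waves; tasks in the same wave have no unmet dependencies."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = execution_waves(workflow)
```

Here the two synchronization tasks form the first wave and could run in parallel, while each later task waits for everything it depends on.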

DataWorks has reportedly also updated its drag-and-drop business process editing recently, to further improve the user experience and make its data development IDE as good as possible.

6. Security

DataWorks treats data security as its top priority, and sensitive data protection must comply with industry regulations and data privacy laws. Its data security module provides comprehensive protection through the following aspects:

· Multi-tenant isolation. DataWorks has its own multi-tenant permission model. Tenants can apply for resource quotas on demand and manage their own resources independently. They also manage their own data, permissions, users, and roles independently, and are isolated from one another to ensure data security.

· Data security level setting. Sensitive data is automatically discovered and classified according to defined sensitive data types, so that it can be located and its distribution across the data platform made clear. Data is usually divided into levels such as top secret, confidential, and normal, and the corresponding security rules are applied to each level.

· Data access auditing. DataWorks has a strict audit process for privileged user access, covering when access occurred, what operations were performed, and the order in which they were executed. Recording and auditing the access of privileged users makes it possible to verify that they perform the correct operations at the correct time and to check for deviant behavior, which in turn protects the data system.

· Data desensitization. DataWorks can focus on the data content itself, identify sensitive information, and dynamically mask it at access time when it cannot be determined which users, access addresses, or even fields represent suspicious or harmful access, thereby protecting data security.
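The interplay between security levels and desensitization can be sketched generically as follows. This is an illustration only, not DataWorks' detection rules or API: the regular expressions, the level names taken from the text above, and the functions `classify`, `mask`, and `read_field` are all assumptions made for this example.

```python
import re

# Hypothetical rules: (sensitive data type, security level, pattern).
RULES = [
    ("id_card", "top secret", re.compile(r"\d{17}[\dXx]")),
    ("phone", "confidential", re.compile(r"\d{11}")),
    ("email", "confidential", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
]

# Levels ordered from least to most sensitive.
LEVELS = ["normal", "confidential", "top secret"]

def classify(value):
    """Return (type, level) for the first rule matching the whole value."""
    for data_type, level, pattern in RULES:
        if pattern.fullmatch(value):
            return data_type, level
    return "none", "normal"

def mask(value, keep_head=3, keep_tail=2):
    """Replace the middle of a value with asterisks, keeping a few edge characters."""
    if len(value) <= keep_head + keep_tail:
        return "*" * len(value)
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + "*" * hidden + value[-keep_tail:]

def read_field(value, user_clearance):
    """Return the raw value only if the reader's clearance covers its level."""
    _, level = classify(value)
    if LEVELS.index(user_clearance) >= LEVELS.index(level):
        return value
    return mask(value)

masked = read_field("13812345678", "normal")          # clearance too low: masked
visible = read_field("13812345678", "confidential")   # clearance sufficient: raw value
```

Masking at read time, rather than rewriting stored data, lets the same table serve readers with different clearances.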

At present, DataWorks has passed Level 3 certification under the Ministry of Public Security's information security classified protection scheme.

7. Summary

As the "Internet +" transformation deepens across industries, enterprises have increasingly strong demands for managing, processing, and using data assets. With cloud computing technology, Internet companies can quickly make their own big data processing capabilities available to the outside world. This is also why, in Forrester's list, the world's four major cloud service companies have overtaken established data warehouse vendors such as Oracle and IBM to become first-tier CDW suppliers.

Thanks to Alibaba's years of experience in putting data to use, DataWorks closely matches enterprise-level needs in deployment mode, data integration, analysis methods, and data security.

It is understood that DataWorks will continue to deliver more advanced data management concepts, including real-time data integration and data asset analysis. It combines cloud computing technology with data warehouse management methodology, iterates continuously, and strives to create "the most suitable platform for building big data warehouses". In my view, this is why DataWorks was included in the Forrester CDW list.
