In-depth interpretation of the thinking and capabilities of Alibaba Cloud DataWorks

This article is an analysis based on Now Tech: Cloud Data Warehouse, Q1 2018 (by Noel Yuhanna, published March 13, 2018). The views expressed here are the author's own.

On March 13, 2018, Forrester released the Now Tech: Cloud Data Warehouse, Q1 2018 report. The report evaluates the main functions, regional presence, market segments, and typical customers of Cloud Data Warehouse (CDW) offerings. In the end, four giants (AWS, Alibaba Cloud, Google, and Microsoft) made the global top tier. Alibaba Cloud DataWorks + MaxCompute was the only Chinese product selected.

In the report, Forrester emphasized the four core capabilities of CDW:

• Flexible deployment. CDW should offer multiple deployment modes. For small customers, CDW should provide an online multi-tenant model, so that customers can quickly provision computing resources and deploy a data warehouse within minutes. For large and medium-sized customers, CDW should provide a dedicated or on-premises deployment model that delivers strong computing performance and a high level of security while hiding complex technical details.

• Efficient data transfer to the cloud. For customers who have not yet moved their data warehouse to the cloud, or who run a hybrid online-offline architecture, CDW should provide a fast, low-cost way to complete data integration.

• Diversified analysis methods. CDW should provide a variety of technical means to help users obtain desired data processing capabilities in various business scenarios.

• Security. CDW security should cover data encryption, auditing, masking (desensitization), and access control.

Why did DataWorks ( https://data.aliyun.com/product/ide ), the core of Alibaba's CDW service capabilities, win Forrester's favor? This article offers an interpretation.

DataWorks product architecture

Before officially starting the interpretation, let's first understand the role of DataWorks in the Alibaba Cloud CDW service system and the product architecture of DataWorks.


Among Alibaba Cloud's many products, DataWorks and MaxCompute together form the core of its CDW service capabilities. MaxCompute, as the storage and compute engine, plays the role of IaaS-layer support, giving users massive, reliable storage for big data tables and the ability to execute SQL. MaxCompute alone is not enough, however. For big data technology to truly benefit customers, a series of CDW services such as data development and data integration is also needed, and DataWorks provides a relatively complete solution for them.

Specifically, it contains 8 main modules:
• Data integration: heterogeneous data integration, which gathers massive amounts of data from various source systems into the big data platform
• Data development: data warehouse design and the ETL development process
• Monitoring and O&M: operation, maintenance, and monitoring of online ETL jobs
• Real-time analysis: real-time exploration and analysis of data
• Data asset management: metadata management, data map, data lineage, data asset overview, etc.
• Data quality: data quality profiling, monitoring, verification, and a scoring system
• Data security: data permission management, data classification, masking (desensitization), and data auditing
• Data services: data sharing and data exchange, data API services


Flexible deployment

In the report, Forrester elaborated on the need for multiple deployment forms and compared several CDW offerings. DataWorks is one of the few top-tier products that provides multiple deployment methods.

First, as the core of Alibaba Group's data platform, DataWorks has supported businesses across the group, including Alibaba Group, Ant Financial, and Cainiao, since 2009. Anyone who uses the data services behind products such as Taobao, Tmall, and Ant Financial may be indirectly using the computing services of DataWorks.

Second, DataWorks is available on the public cloud. To date, it has served more than 4,000 public cloud customers, including major customers such as Sina Weibo, Renrenche, and Tianhong Fund.

Finally, DataWorks can also be delivered as part of a proprietary cloud. As an important way of delivering big data capabilities, DataWorks is included in Apsara Enterprise and other Alibaba Cloud proprietary cloud solutions. Since 2015, it has supported heavyweight government and enterprise projects including "City Brain" and "Run at most once".

With these flexible deployment methods, DataWorks can meet a wide range of customer needs. Small users can be served elastically by the public cloud, while large and medium-sized customers can be served by proprietary cloud or hybrid cloud solutions.

Efficient data transfer to the cloud

The importance of efficient data integration for moving enterprise data to the cloud is self-evident. In the initial migration phase, companies need to move their data assets to the cloud quickly and safely. In the continuous operation phase, companies need to feed various forms of data into the CDW and deliver the results processed in the CDW to various business units.

The data integration module of DataWorks can read from and write to many types of data sources, including relational databases, NoSQL databases, big data stores, and text storage (FTP), and can take a unified inventory of data resources at the source. It can also synchronize and integrate data from heterogeneous data sources under complex network conditions. For scheduling import tasks, DataWorks supports batch, full, and incremental synchronization of offline data, with custom synchronization intervals by minute, hour, day, week, or month.
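To make the difference between full and incremental synchronization concrete, here is a minimal Python sketch. It is not DataWorks code: the `SOURCE_ROWS` table, the `modified` timestamp column, and the watermark approach are all assumptions chosen for illustration, and a real integration job may detect changes differently.

```python
from datetime import datetime

# Hypothetical source table: each row carries a last-modified timestamp.
SOURCE_ROWS = [
    {"id": 1, "name": "alice", "modified": datetime(2018, 3, 10)},
    {"id": 2, "name": "bob", "modified": datetime(2018, 3, 12)},
    {"id": 3, "name": "carol", "modified": datetime(2018, 3, 13)},
]


def full_sync(source):
    """Copy every row, replacing whatever the target held before."""
    return {row["id"]: dict(row) for row in source}


def incremental_sync(source, target, watermark):
    """Upsert only rows modified after the watermark; return the new watermark."""
    new_watermark = watermark
    for row in source:
        if row["modified"] > watermark:
            target[row["id"]] = dict(row)
            new_watermark = max(new_watermark, row["modified"])
    return new_watermark


# Initial load: only the first two rows exist in the source at that point.
target = full_sync(SOURCE_ROWS[:2])
watermark = max(row["modified"] for row in target.values())

# A later scheduled run picks up only the row that changed since the watermark.
watermark = incremental_sync(SOURCE_ROWS, target, watermark)
print(sorted(target), watermark.date())
```

The full load is simple but moves every row each time, while the incremental run only touches rows newer than the stored watermark, which is why incremental jobs suit frequent schedules.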


The data integration module of DataWorks also provides flow control. It can govern a transfer along several dimensions, such as dirty data tolerance, transfer rate, and the number of concurrent threads, which helps users save costs and manage resources more precisely.
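The flow-control idea can be sketched in a few lines of Python. This is an illustrative assumption, not the DataWorks implementation: `run_sync`, `max_dirty`, and `concurrency` are hypothetical names, the "dirty" rule is simply a row that fails to parse, and rate limiting is left out to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor


class DirtyDataLimitExceeded(Exception):
    """Raised when a job sees more bad records than it is allowed to skip."""


def parse_record(raw):
    """Return (record, None) on success or (None, raw) when the row is dirty."""
    try:
        user_id, amount = raw.split(",")
        return {"user_id": int(user_id), "amount": float(amount)}, None
    except ValueError:
        return None, raw


def run_sync(raw_rows, max_dirty=1, concurrency=2):
    """Parse rows with a bounded worker pool and enforce a dirty-record limit."""
    # The pool size caps how many rows are processed at the same time.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(parse_record, raw_rows))
    clean = [rec for rec, bad in results if rec is not None]
    dirty = [bad for rec, bad in results if bad is not None]
    # Fail the whole job once the number of dirty rows exceeds the tolerance.
    if len(dirty) > max_dirty:
        raise DirtyDataLimitExceeded(f"{len(dirty)} dirty rows > limit {max_dirty}")
    return clean, dirty


clean, dirty = run_sync(["1,9.5", "oops", "2,3.0"], max_dirty=1)
print(len(clean), dirty)
```

With `max_dirty=0` the same input would abort the job, which is the trade-off such a threshold expresses: tolerate a little bad data to keep the pipeline moving, or stop early to protect downstream tables.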

Diversified analysis methods

DataWorks provides a powerful data development IDE that covers SQL code editing, integration task editing, and visual editing of business process DAGs. Its multi-user online collaboration and version management for task scripts also fit the practical needs of enterprise-level data development. In addition to regular offline processing tasks, DataWorks provides a lightweight tool, the "Data Analysis Workbench", which draws on MaxCompute's computing power to meet users' needs for ad hoc data analysis.
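A business process DAG is, at its core, a set of tasks plus their dependencies, and a scheduler must run each task only after its upstream tasks finish. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to derive a valid run order. The task names and the `workflow` mapping are hypothetical and are not taken from DataWorks.

```python
from graphlib import TopologicalSorter

# Hypothetical business process: each task maps to the set of tasks it depends on.
workflow = {
    "sync_orders": set(),
    "sync_users": set(),
    "clean_orders": {"sync_orders"},
    "join_orders_users": {"clean_orders", "sync_users"},
    "daily_report": {"join_orders_users"},
}


def execution_order(dag):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

Several orders can be valid for the same DAG (the two sync tasks are independent), so a scheduler is free to run such tasks in parallel.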


DataWorks has reportedly also updated its drag-and-drop business process editing recently, to further improve the user experience, with the aim of building the best data development IDE.

Security

DataWorks regards data security capabilities as its top priority, and sensitive data protection needs to comply with industry regulations and data privacy laws. DataWorks provides a data security module, which provides comprehensive data security protection through the following aspects:

• Multi-tenant isolation. DataWorks has its own multi-tenant permission model. Tenants can apply for resource quotas on demand and independently manage their own resources; tenants can also independently manage their own data, permissions, users, and roles, and isolate them from each other to ensure data security.

• Data security level setting. Security levels are used to discover and locate sensitive data and to make its distribution across the data platform clear. Sensitive data is detected automatically according to the defined sensitive data types, then classified and graded. It is usually divided into levels such as top secret, confidential, and normal, each protected by corresponding security rules.

• Data access audit. DataWorks has a strict review process for privileged user access, covering when access occurs, which operations are performed, and in what order. Recording and auditing the access of privileged users helps ensure that they perform the correct operations at the correct time, reveals any unauthorized behavior, and thereby protects the security of the data system.

• Data masking (desensitization). Even when it is unclear which users, access addresses, or fields are suspicious or harmful, DataWorks can focus on the content of the data itself, identify the sensitive fields, and dynamically mask that information when it is accessed, thereby protecting data security.
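The security-level and masking bullets above can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the regex rules in `RULES`, the mapping of data types to levels, and the `read_row` helper are hypothetical and do not describe how DataWorks actually detects or masks sensitive data.

```python
import re

# Hypothetical detection rules: sensitive data type -> (pattern, security level).
RULES = {
    "email": (re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"), "confidential"),
    "phone": (re.compile(r"^\d{11}$"), "confidential"),
    "id_card": (re.compile(r"^\d{17}[\dXx]$"), "top secret"),
}

# Levels from least to most sensitive, as named in the text.
LEVELS = ["normal", "confidential", "top secret"]


def classify(value):
    """Return (data_type, level) for a value, or (None, 'normal') if no rule matches."""
    for data_type, (pattern, level) in RULES.items():
        if pattern.match(value):
            return data_type, level
    return None, "normal"


def mask(value, keep=2):
    """Keep the first and last `keep` characters and star out the rest."""
    if len(value) <= keep * 2:
        return "*" * len(value)
    return value[:keep] + "*" * (len(value) - keep * 2) + value[-keep:]


def read_row(row, clearance="normal"):
    """Return the row as a reader sees it, masking fields above their clearance."""
    visible = {}
    for field, value in row.items():
        _, level = classify(value)
        allowed = LEVELS.index(level) <= LEVELS.index(clearance)
        visible[field] = value if allowed else mask(value)
    return visible


row = {"name": "alice", "phone": "13800001234"}
print(read_row(row))                     # phone is masked for a normal reader
print(read_row(row, "confidential"))     # phone is visible with higher clearance
```

The point of masking at read time is that the stored data stays intact, and what each reader sees depends on the field's level and the reader's clearance.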

DataWorks has passed the Level 3 certification of the Ministry of Public Security's information security level protection scheme.

Summary

As the "Internet +" transformation deepens across all industries, enterprises have increasingly strong demands for managing, processing, and using their data assets. With cloud computing technology, Internet companies can quickly build up their own big data processing capabilities. This is also why the four major global cloud providers on Forrester's list have overtaken established data warehouse vendors such as Oracle and IBM to become top-tier CDW suppliers.

Drawing on Alibaba's years of experience in putting data to use, DataWorks closely matches enterprise-level requirements in deployment mode, data integration, analysis methods, and data security.

DataWorks is expected to keep delivering more advanced data management capabilities, including real-time data integration and data asset analysis. It combines cloud computing technology with data warehouse management methodology, keeps iterating, and is committed to building "the most suitable platform for large data warehouse construction". I think this is why DataWorks was selected for the Forrester CDW list.


Origin blog.csdn.net/NicolasLearner/article/details/112218613