HUAWEI CLOUD Enterprise Rapid Growth Technology Innovation Forum National Tour: First Stop in Beijing Successfully Concluded

On April 15, the Beijing stop of the "Enterprise Rapid Growth Big Data Technology Innovation Forum", jointly hosted by HUAWEI CLOUD and msup, concluded successfully. More than 100 big data technical directors, technical managers, and R&D engineers from across the country gathered to explore data lake architecture evolution, data governance methodology, and best practices.

 

The forum opened with a speech by You Peng, President of HUAWEI CLOUD Big Data and Artificial Intelligence. He said that to address the challenges enterprises face in implementing AI, HUAWEI CLOUD has continuously strengthened its technology, platform, and application capabilities on the basis of "everything as a service", helping to lower the threshold for AI applications and making AI technology readily available.

 You Peng, President of Huawei Cloud Big Data and Artificial Intelligence Domain

At the same time, Huawei provides a lightweight data governance solution and is developing a lightweight BI product that incorporates the latest NLP technology. Huawei's self-developed DWS, built on ecosystem OLAP components, is offered to customers as a convenient, O&M-free, out-of-the-box service.

As You Peng said: "The HUAWEI CLOUD digital intelligence fusion platform provides enterprises and organizations with full-link data solutions through the integration of DataArts, the data governance production line, and ModelArts, the AI development production line, and accelerates the digital intelligence transformation of enterprises."

Next, Wang Lue, a HUAWEI CLOUD big data technology expert, gave a practical talk titled "Interpretation of the HUAWEI CLOUD Lakehouse-Integrated Modern Data Stack".

Wang Lue, Huawei Cloud Big Data Technical Expert

Huawei itself currently uses a lakehouse-integrated architecture, which delivers second-level financial control and minute-level reporting and stably supports Huawei's ERP business. Many enterprises, however, face a series of challenges in data management, chiefly three: a high technical threshold, high investment costs, and difficult data governance.

In response to these challenges, Wang Lue described three services in detail: DLI (a serverless multi-mode computing service that integrates batch, stream, and interactive processing and is O&M-free and out-of-the-box), DataArts Studio (a one-stop data governance and operations platform with AI-driven intelligent data governance that enables safe and efficient discovery of data value), and CloudTable (a fully managed Doris engine that is real-time, simple, and efficient). Together they form a serverless lakehouse-integrated architecture that is worry-free, lightweight, and able to evolve sustainably.

Finally, using the Cloud Whale Intelligence case, he shared how HUAWEI CLOUD's serverless data lake has been put into practice in enterprises and industries. He hopes these practices can help enterprises better support internal data analysis and data-driven decision-making.

Immediately afterwards, Zhang Yue, an Apache Hudi Committer and Apache Druid Committer, gave a talk titled "Hudi-based Lakehouse Ingestion Performance Optimization Discussion".

Zhang Yue, Apache Hudi Committer and Apache Druid Committer

Zhang Yue elaborated on four aspects including the basic characteristics of Hudi, the process of Hudi Upsert data, the performance optimization of Hudi Ingestion (RFC-53), and the optimization of Hudi Multi-Writer by Early Conflict Detection (RFC-56).

Apache Hudi is an open source solution for data lakes that supports row-level updates on terabyte- or even petabyte-scale data. Zhang Yue said that if records can be guaranteed to be unique at the computing layer, you can set hoodie.combine.before.upsert to false to turn off the Dedupe stage and avoid an unnecessary ReduceByKey global shuffle. You can also choose an index method that suits your business characteristics: BloomIndex, SimpleIndex, BucketIndex, HBase Index, and so on.
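As a rough illustration of the settings mentioned above, the sketch below assembles Hudi write options in Python. The helper name build_upsert_options, the table name, and the field names are hypothetical. The option keys are standard Hudi write configs, and the index type strings (BLOOM, SIMPLE, BUCKET, HBASE) are assumed to correspond to the index methods named in the talk.

```python
from typing import Dict

# Index type strings assumed to map to BloomIndex, SimpleIndex,
# BucketIndex, and HBase Index respectively.
VALID_INDEX_TYPES = {"BLOOM", "SIMPLE", "BUCKET", "HBASE"}


def build_upsert_options(
    table_name: str,
    record_key: str,
    precombine_field: str,
    index_type: str = "BLOOM",
    keys_unique_upstream: bool = False,
) -> Dict[str, str]:
    """Build a dict of Hudi upsert options (hypothetical helper)."""
    if index_type not in VALID_INDEX_TYPES:
        raise ValueError(f"unsupported index type: {index_type}")

    options = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.index.type": index_type,
    }
    # Only skip the dedupe stage when the computing layer already
    # guarantees that each record key appears once per batch.
    if keys_unique_upstream:
        options["hoodie.combine.before.upsert"] = "false"
    return options


# Hypothetical table and field names, for illustration only.
opts = build_upsert_options(
    "orders", "order_id", "updated_at",
    index_type="BUCKET", keys_unique_upstream=True,
)
for key, value in sorted(opts.items()):
    print(f"{key}={value}")
```

With PySpark and the Hudi bundle available, a dict like this would typically be passed to df.write.format("hudi").options(**opts).mode("append").save(path). That step is left out so the sketch runs without a cluster.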

Zhang Yue walked through the write path in detail, from the Create Marker stage to the Merging and Writing Data Files stage, then covered RFC-53, HUDI-5023, and HUDI-3923, and presented the schematic diagram and implementation steps for using Spark Streaming bulk_insert to aggregate data into the Lakehouse, followed by a clustering stage.

In production environments, as data volumes keep growing and timeliness requirements keep rising, writing ever more data into the lake has become a hard requirement. Zhang Yue also explained in detail how Hudi supports Multi-Writer and the problems Multi-Writer currently faces, and attendees said they benefited greatly.

The talk "Huawei Cloud DataArts Studio Helps Enterprises Efficiently Manage and Use Data and Discover the Value of Data", given by Li Pinxin, a product expert for HUAWEI CLOUD DataArts Studio, drew strong interest from the audience.

 Li Pinxin, Product Expert of HUAWEI CLOUD DataArts Studio

Huawei has gone through two stages in data management. In the first, it spent more than 10 years digitizing all business data and forming unified, clean data to support the reports the enterprise produces. The main goals at this stage were to digitize the business and to improve data quality. In the second, with all data in place, Huawei built a unified data foundation. By establishing data connections on this foundation, it ultimately formed a unified enterprise data map that supports rapid data analysis.

To achieve its data governance goals, Huawei provides a variety of computing engines and supporting services. The computing engines include the serverless DLI data lake exploration service, the MRS cloud-native data lake, the DWS cloud data warehouse, CSS cloud search, and GES graph computing.

Built on these computing engines, DataArts Studio provides tools for the whole process of data integration, data development, data governance, and data services. It has the following characteristics:

• Unified data integration for all domains and scenarios, one-stop task configuration, and full-link task monitoring

• Full-scenario data integration, one-click integration of full data, incremental data, and real-time data

• One-stop development + launch + operation and maintenance, unified development environment to improve data development efficiency; batch jobs, real-time jobs, AI job development and debugging, unified scheduling of various jobs

• No Code data analysis and exploration, Auto-ETL greatly reduces the threshold of data analysis; it can also perform baseline operation and maintenance, efficient operation and maintenance around core operation links, and ensure timely completion of key tasks

• A two-tier directory builds an enterprise-level data asset system, and the data map solves the problem of the last mile of data usage

• Global data map, allowing users to better find, understand and use data assets, and efficiently use data

• Data asset lifecycle management, reducing data usage costs

• Unified policy configuration, proactive permission application, and various means to protect data asset security

• Balances security and efficiency, with two approaches to isolating development and production environments: hard isolation and soft isolation

• AI4Data drives the automation and intelligence of the whole process to improve the efficiency of data governance

At the end of the talk, Li Pinxin said that automation has greatly improved the efficiency of Huawei's entire data processing workflow. The HUAWEI CLOUD data governance solution provides not only products but also implementation capabilities: building on Huawei products, it can offer consulting and planning to carry out data governance end to end.

Yan Xiaowei, head of data warehouse technology at Jobbang's Middle-Platform (Zhongtai) Production and Research Center, gave a talk titled "Thinking and Practice of Jobbang Data Governance System".

 Yan Xiaowei, Technical Director of Data Warehouse of Jobbang Zhongtai Production and Research Center

With the rapid development of the business and an increasingly mature data analysis system, data has become an important compass and weather vane in daily business operations. At the same time, problems keep emerging. How to maximize the value of data and use computing and storage resources more sensibly has become a new challenge that everyone faces.

Based on these problems, Yan Xiaowei laid out Jobbang's complete data governance solution and its implementation, covering indicator and data model construction in the data production chain, metadata management and data lineage analysis, data quality and data security, and the construction of supporting platforms. He also explained how the implementation covers the entire data lifecycle so that data is accurate and easy to use and costs are transparent and controllable. Drawing on the strategic shift in Jobbang's business from unchecked growth to refined governance, he summed up four key points for successful data governance:

1. The implementation of data construction standards and norms is the basis of data governance. Formulating a general, comprehensive set of data modeling standards and development specifications, and promoting their adoption in the development and use of business data warehouses, is a prerequisite for subsequent data governance.

2. The accuracy, stability, and timeliness of data are the core of data governance. Data quality management platforms and other data governance tools provide stable and reliable data services for the business.

3. Data ROI evaluation is the main lever of data governance. By connecting the entire data production chain, it becomes possible to evaluate data value objectively, automatically identify redundant data and invalid tasks, and provide a basis for subsequent governance of computing and storage resources, thereby curbing the disorderly growth of data costs.

4. Establishing a data open sharing mechanism is an important means to maximize the value of data. On the premise of ensuring data security and availability, lower the threshold for data acquisition and enhance the ease of use of data so as to give full play to the value of data.
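The idea in point 3, identifying redundant data by following the data production chain, can be sketched as below. This is a toy illustration under stated assumptions and not Jobbang's actual method: the table names, the 90-day read counts, the storage figures, and the rule "a table is useful if it is read, or if it feeds a table that is read" are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class TableStats:
    name: str
    storage_gb: float
    reads_last_90d: int  # hypothetical access metric


def find_redundant(
    tables: List[TableStats], edges: List[Tuple[str, str]]
) -> List[TableStats]:
    """Return tables that are never read and feed no table that is read.

    `edges` holds (upstream, downstream) lineage pairs.
    """
    # Reverse the lineage so we can walk from a table to its upstreams.
    upstreams: Dict[str, Set[str]] = {}
    for src, dst in edges:
        upstreams.setdefault(dst, set()).add(src)

    # Start from tables that are read directly, then mark everything
    # they depend on as useful too.
    useful = {t.name for t in tables if t.reads_last_90d > 0}
    queue = deque(useful)
    while queue:
        current = queue.popleft()
        for parent in upstreams.get(current, ()):
            if parent not in useful:
                useful.add(parent)
                queue.append(parent)

    return [t for t in tables if t.name not in useful]


# Hypothetical lineage: ods_orders -> dwd_orders -> tmp_export -> ads_old_report
tables = [
    TableStats("ods_orders", 500.0, 0),
    TableStats("dwd_orders", 200.0, 12),
    TableStats("tmp_export", 80.0, 0),
    TableStats("ads_old_report", 5.0, 0),
]
edges = [
    ("ods_orders", "dwd_orders"),
    ("dwd_orders", "tmp_export"),
    ("tmp_export", "ads_old_report"),
]

redundant = find_redundant(tables, edges)
wasted_gb = sum(t.storage_gb for t in redundant)
print([t.name for t in redundant], wasted_gb)
```

Here ods_orders is kept even though nobody reads it directly, because it feeds a table that is read. The tasks that produce the flagged tables would be the candidates for review as invalid tasks.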

Finally, Shan Xiaoming, Senior Data Governance Architect of Chinasoft International, gave a talk titled "Chinasoft Data Governance Delivery, Constructing Efficient Data Management and Win-Win Customer Value". The talk covered the enterprise value of data governance in four parts.

 Shan Xiaoming, Senior Architect of Data Governance, Chinasoft International

While delivering data governance projects to enterprises, Chinasoft has found recurring problems in data management organization, data standards, data quality, data security, and lifecycle management. To address them, Chinasoft first agrees with the enterprise on a common vision, namely that data is a core asset of corporate strategy, and then communicates its mission, goals, and solutions to the enterprise.

Chinasoft covers the entire data governance lifecycle, from metadata to data standards, data quality, and data integration, helping enterprises build data assets that enhance their value.

Regarding the data governance process, Shan Xiaoming said that after extensive research, Chinasoft has distilled data governance into nine stages. On this basis, Chinasoft International assigns dedicated industry experts to help companies carry out a thorough inventory of what data they have and what that data can ultimately achieve. He also shared examples of helping movie theater companies with data governance, along with the specific processes followed. In that work, Chinasoft used Huawei's DataArts Studio to help customers complete data governance architecture design, business process and logical relationship analysis, interface sorting, theme design, standard design, and logical layering, and ultimately to automate reconciliation between accounts and third-party channels, revenue accounting, and revenue settlement.

Shan Xiaoming also shared implementation cases and solutions from several data governance projects. Data governance focuses on realizing enterprise value, improving IT productivity and maximizing the value of data.

The forum came to a close amid lively discussion, with attendees wanting more. Big data is promoting the high-quality development of China's economy and profoundly affecting people's lives and progress. HUAWEI CLOUD will continue to adhere to the "everything as a service" strategy, continuously strengthen its technology, platform, and application capabilities, and help lower the threshold for AI applications so that AI technology is readily available. HUAWEI CLOUD will also join hands with more ecosystem partners to build more high-quality solutions and help thousands of industries innovate on the cloud.

Origin blog.csdn.net/msup789/article/details/130300905