A practical case of data governance at an automobile company: closing the loop between data production and consumption | Digital benchmarking

With the rapid growth of its business, the number, complexity, and data volume of a certain automobile manufacturer's business systems are increasing exponentially, placing ever higher demands on the enterprise's IT capabilities and IT architecture. In addition, the company is vigorously developing digital marketing, new energy vehicles, and other businesses, hoping to create a sustainable digital transformation path by continuously optimizing the customer experience.

The existing siloed data systems cannot meet the needs that digital transformation brings: more and faster system and data interaction, agile innovative applications, data sharing, and new business development. Data-driven digitization helps car companies comprehensively understand changes in user needs, and supports the company across marketing, production, and services, further improving operating efficiency.

Three core issues need to be solved in a car company's data transformation: how to collect, consolidate, and operate its own data; how to build a data governance and operations team; and how to demonstrate results quickly in order to build confidence within the company.

The second phase of the car company's data middle platform construction focuses on building a data governance platform. The platform's core concept is that "data comes from the business and is used by the business": a complete closed loop that runs from data production to consumption, with the data generated by consumption flowing back into production.

01 Data governance solution: a closed "production-consumption-production" data loop

1. Consulting services

Based on the car company's organizational structure, institutional system, and data asset inventory, and with reference to international, national, and industry standards, a data specification system was developed around full life cycle management of data assets. Through data governance consulting, a governance system covering standards, organization, specifications, processes, and systems was built for the project: data classification and grading standards were formulated for the marketing, manufacturing, and R&D business lines, along with standards, processes, and management systems for master data, data standards, data models, metadata, data quality, data security, data life cycle, and data architecture, with the ability to extend them to all of the company's business lines.

The first is data governance system planning. The overall plan covers the data management vision, organizational model, management boundaries, and promotion strategy; the design of the data management system covers the data governance foundation, the core areas of data management, and data applications; and the task planning covers identification of data management tasks, analysis of implementation principles, and formulation of the implementation plan.

The second is data governance organizational planning. According to the actual needs of data management work, responsibilities must be divided among the business departments, the technical management department, and the business application departments. For example, each business department should clarify the specific data requirements and rules for its own operations, while the technical department implements them: converting the business requirements into technical language for pre-event control (such as field constraints), in-event control (such as not-null checks), and post-event verification, as well as performing the specific technical operations and preparing regular reports.
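As a rough illustration of the three control stages just described, the sketch below encodes a pre-event field constraint and an in-event not-null check, then summarizes violations for post-event reporting. The rule structure, field names, and record keys are hypothetical, not the platform's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FieldRule:
    """Illustrative rule definition; fields are invented for this sketch."""
    field: str
    max_length: Optional[int] = None  # pre-event constraint on field length
    not_null: bool = False            # in-event control: value must not be empty

def validate_record(record: dict, rules: list) -> list:
    """In-event check: return the list of rule violations for one record."""
    violations = []
    for rule in rules:
        value = record.get(rule.field)
        if rule.not_null and value in (None, ""):
            violations.append(f"{rule.field}: must not be empty")
        if rule.max_length is not None and value and len(str(value)) > rule.max_length:
            violations.append(f"{rule.field}: exceeds {rule.max_length} characters")
    return violations

def post_event_report(records: list, rules: list) -> dict:
    """Post-event verification: count violations per field across a batch."""
    counts = {}
    for record in records:
        for violation in validate_record(record, rules):
            field = violation.split(":")[0]
            counts[field] = counts.get(field, 0) + 1
    return counts
```

The point of the split is that the business department owns the rule definitions (the `FieldRule` list), while the technical department owns the execution and the regular post-event reports.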

2. Platform construction

The Kangaroo Cloud data asset management suite and visual development suite provide capabilities for offline and real-time data development, data modeling, data standards, data quality, data lineage, data security, metadata management, data assets, and data tags. They integrate with the company's existing big data platform, open platform, scheduling platform, and visualization platform to manage data assets, improve data quality, and build a data asset center, a data service center, and an application center that support business innovation.


3. Project implementation

Sort out the data assets of the marketing, manufacturing, and R&D business lines; divide the data domains; build data applications; and cover the full data life cycle. Specific deliverables include data asset maps, data models, data standards, metadata management, data lineage, data classification, and data quality rules and reports.

The first is the data asset portal

Global statistics on enterprise data assets give managers an intuitive view of how data is distributed, how it grows, how it is used, and how good its quality is, including but not limited to:

1) Data indicator statistics: number of data sources, number of tables, storage volume, usage, and quality score.

2) Data trend statistics: data distribution, data growth trends, and data usage popularity.

3) Data usage ranking: data storage ranking; metadata quality: normative trends and normative rankings.
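The portal metrics above amount to simple aggregations over table metadata. The sketch below illustrates this; the record fields (`source`, `storage_gb`, `quality_score`) are assumptions for illustration, not the platform's actual schema.

```python
def asset_overview(tables: list) -> dict:
    """Aggregate portal-style statistics from a list of table-metadata records."""
    sources = {t["source"] for t in tables}
    total_storage_gb = sum(t["storage_gb"] for t in tables)
    avg_quality = sum(t["quality_score"] for t in tables) / len(tables)
    # Storage ranking: largest tables first.
    top_by_storage = sorted(tables, key=lambda t: t["storage_gb"], reverse=True)
    return {
        "source_count": len(sources),
        "table_count": len(tables),
        "total_storage_gb": total_storage_gb,
        "avg_quality_score": round(avg_quality, 1),
        "storage_ranking": [t["name"] for t in top_by_storage],
    }
```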


The second is the data map

The data map is positioned as a visual data asset center. In the data map module, users can view all data tables on the platform and manage data assets in an all-round way.

1) Data search: aggregates all table information on the platform so developers can quickly locate the tables they need; users can filter by category, table name, project, and authorization status, or search directly by table name.

2) Table metadata display: after selecting a table, a user can view its basic information, including table name, physical storage size, life cycle, whether it is partitioned, field names, field types, and partition information, and can also preview the table's data visually.

3) Data category management: as the number of data tables on the platform grows, data categories become increasingly important. The platform provides three-level category management: users can customize the levels and names and assign each data table to a node, so that data developers can quickly locate data by category.

4) Data approval and authorization: provides table-level data permission management. When users need to access tables across projects (read/write), they must first be approved and authorized by the project administrator; only after approval can the table be accessed across projects. Authorizations also carry a validity period, after which they are automatically revoked, improving the security of data access.

5) Life cycle management: provides table life cycle management. Users can specify a life cycle when creating a table; the system periodically checks the data update time of each table/partition and automatically deletes data that has exceeded its life cycle, reducing the storage pressure caused by temporary data.

6) Data lineage analysis: automatically parses synchronization tasks and SQL code to establish table-level and field-level lineage relationships for each data table. Users can see the "past and present" of each indicator directly on the page, which makes it easy to troubleshoot indicator problems, check an indicator's statistical logic, and verify whether its dependency chain is healthy.
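Table-level lineage extraction from SQL, as described above, can be roughly approximated with a few regular expressions. The sketch below is a simplified assumption that handles only `INSERT ... SELECT` statements; a production implementation like the platform's would use a real SQL parser.

```python
import re

def extract_lineage(sql: str) -> list:
    """Return (source_table, target_table) edges parsed from one SQL statement.

    Simplified: only recognizes INSERT INTO / INSERT OVERWRITE TABLE targets
    and tables named after FROM/JOIN keywords.
    """
    sql = sql.lower()
    target = re.search(r"insert\s+(?:into|overwrite\s+table)\s+([\w.]+)", sql)
    if not target:
        return []
    sources = re.findall(r"(?:from|join)\s+([\w.]+)", sql)
    return [(src, target.group(1)) for src in sorted(set(sources))]
```

Running this over every scheduled task's SQL and merging the edges yields the table-level lineage graph that the page renders.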

The third is data quality

As part of data governance, ensuring and improving data quality is an essential function of a big data platform. Data quality management roughly follows a pre-event, in-event, and post-event process: monitoring rules are defined before the event, data is generated and monitored during the event, and data quality is analyzed after the event.


1) Pre-event management: connect the data sources to be governed and, based on an understanding of the business needs and the data, configure monitoring rules for the data that needs to be monitored.

2) In-event management: by configuring a scheduling cycle for the defined monitoring rules, the system automatically executes the rules and verifies data quality.

3) Post-event management: issue timely error alerts for data that fails verification; at the same time, the system automatically generates monitoring reports to help users review and summarize data problems.
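The three-stage flow above can be sketched as rule execution plus a post-event alert pass. The rule names and check functions here are invented for illustration; in the real platform the rules run on a schedule against the warehouse, not over in-memory rows.

```python
def run_quality_checks(rows: list, rules: list) -> list:
    """In-event: execute each configured rule and record pass/fail counts."""
    report = []
    for rule in rules:
        failed = [r for r in rows if not rule["check"](r)]
        report.append({
            "rule": rule["name"],
            "checked": len(rows),
            "failed": len(failed),
            "passed": len(failed) == 0,
        })
    return report

def alerts(report: list) -> list:
    """Post-event: surface only the rules that need an error reminder."""
    return [entry["rule"] for entry in report if not entry["passed"]]
```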

The fourth is data security

1) Data permission control: supports table-level data permission management. When users need to access tables across projects (read/write), they must first be approved and authorized by the project administrator; only after approval can the table be accessed across projects. Authorizations also carry a validity period, after which they are automatically revoked, improving the security of data access. Permission application and approval are also supported for data resource services, ensuring the security of data services.

2) Life cycle management: supports table life cycle management. Users can specify a life cycle when creating a table; the system periodically checks the data update time of each table/partition and automatically deletes data that has exceeded its life cycle, reducing the storage pressure caused by temporary data.

3) Data impact analysis: when a user configures synchronization tasks and performs multiple cleaning and transformation steps through SQL tasks, result data is eventually produced. Across the entire processing chain, the data's lineage is implicit in the synchronization tasks and SQL code; impact analysis traces how each statistical indicator is derived from the raw data.

4) Data desensitization: supports custom desensitization rules that can be applied to different kinds of sensitive data to prevent leakage during data preview. This includes customizing security levels according to national standards and classifying and grading people and tables; supporting custom script functions and regular expressions, with identification rules, identification functions, and desensitization rules associated on demand so that sensitive data is identified automatically and dynamically; and providing built-in regular-expression templates for common sensitive data, covering ID card numbers, bank card numbers, emails, mobile phone numbers, IP addresses, landline numbers, license plate numbers, names, companies, and addresses, as well as user-defined rules.
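Regex-based identification and masking, in the spirit of the built-in templates just mentioned, might look like the following sketch. The patterns are common simplified forms (mainland mobile number, 18-digit ID card, email), not the platform's actual rules.

```python
import re

# Simplified identification templates; real rules would be more thorough.
RULES = {
    "mobile": re.compile(r"\b1[3-9]\d{9}\b"),
    "id_card": re.compile(r"\b\d{17}[\dXx]\b"),
    "email": re.compile(r"\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,}\b"),
}

def mask(text: str) -> str:
    """Keep the first 3 and last 2 characters of each match; star the rest."""
    def _mask(m):
        s = m.group(0)
        return s[:3] + "*" * (len(s) - 5) + s[-2:]
    for pattern in RULES.values():
        text = pattern.sub(_mask, text)
    return text
```

Applying `mask` during data preview is what keeps sensitive values from leaking to users who only have browse permission.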

02 Building the data governance platform and significantly improving data quality

Through the data governance platform project, the car company has completed the construction of its data specifications, standards, quality and service systems, and governance organizational structure, which can basically meet the company's data development requirements for the next 2-3 years. Combining the data middle platform with the data governance solution, phased results have already been achieved:

The first is a powerful data development and governance platform system. The platform construction delivers a basic data processing platform, a data asset management platform, and a data service platform for the car company, enabling a complete standardized process from data collection through data quality management and data asset management to data application, while also connecting to BI and reporting tools and providing standardized API management capabilities for metadata.

The second is quickly locating the root cause of data problems. Many reported data problems are not real data problems, and if users turn to technical staff for every issue they find hard to understand, the technical staff will spend too much time locating problems and the backlog of data issues will keep growing. The project therefore provides users with a self-service troubleshooting function to help them find the cause of a problem themselves, escalating to technical staff only when it cannot be solved. In addition, the intermediate results of the data flow are presented visually, so that when the final report is missing data or contains errors, the faulty link can be located quickly.

The third is guaranteed data quality and higher data value. Reliable data quality not only improves decision-makers' efficiency and outcomes but also reduces risk. When businesses use reliable data, they can answer questions and make decisions faster and more consistently; with high-quality data, less time is spent finding problems and more time is spent using the data to gain insights, make decisions, and serve users.

"DTStack Product White Paper" download address: https://www.dtstack.com/resources/1004?src=szsm

"Data Governance Industry Practice White Paper" download address: https://www.dtstack.com/resources/1001?src=szsm

To learn more about or consult on big data products, industry solutions, and customer cases, visit the Kangaroo Cloud official website: https://www.dtstack.com/?src=szkyzg

Origin my.oschina.net/u/3869098/blog/10319237