Applying metadata lineage analysis in the insurance business

This article discusses in detail two applications of metadata, lineage analysis and impact analysis: that is, how data lineage (the "family tree" of data) delivers value in the insurance industry.

one

Some data lineage scenarios

A. Xiao Li is a DBA in the operations and maintenance department. The marketing department recently planned a large online and offline campaign, and company leadership attached great importance to it. The IT department had to provide solid support and could not afford a failure at a critical moment, so the company ran a unified, comprehensive inspection of all systems. The inspection found that the nightly batch jobs on one core system were consuming up to 80% of its resources; if load rose during the campaign, the risk would be considerable. Investigation showed that several stored procedure packages were slow to execute and resource-hungry. Xiao Li reported this to his manager, and the issue was passed to the system development department. After a long investigation, the developers found that the packages had been written years earlier by an outsourcing team whose members could no longer be reached. They asked around the whole department, and everyone said the packages were no longer of any use. The procedures ranged from a few hundred to tens of thousands of lines, had incomplete comments, and called one another layer upon layer; by the time anyone finished reading them, the campaign would be over. To keep the system stable, the recommendation was to suspend the nightly batch run for the time being. That caused a second problem: the result computed by one stored procedure in these packages was used by an application developed later, so normal business was disrupted. Many customers did not receive text messages, and the result was a production incident.

B. Xiao Zhang from the business department found that new-policy premiums for the month were far below expectations. The marketing campaign had clearly increased the company's business volume, so why had new-policy premiums in one region not risen noticeably? Xiao Zhang called Xiao Huang, a developer on the report project team.

Xiao Zhang: Is there something wrong with the report you produced? The new-policy premium figure is wrong!

Xiao Huang: Probably not. This report was developed long ago and hasn't been changed recently. It runs normally every day.

Xiao Zhang: But this month's data is definitely wrong. The gap is too large. Either your processing logic has a problem or some data was left out of the calculation. Please help me check.

Xiao Huang: The report data is computed from multiple core systems and passes through several intermediate layers: ETL tools, stored procedures, Spark, and report-layer logic. There is no way to check all of that quickly.

Xiao Zhang hung up and could only hope that his intuition was wrong: that the report data was correct, or that the gap was not as large as he imagined.

C. Xiao Chen is a data development engineer in the development center. He suddenly received a request to change a table structure. Every downstream system and application that might use the table was notified (better to kill a thousand by mistake than let one slip through), and the owner of each system or application then checked, modified, tested, and went live at the specified time. The whole process relied on manual effort. After the source table structure was changed, the owner of each system or application got up in the middle of the night to confirm that the data had arrived and that their own part was fine; if there was a problem, it had to be fixed.

For the insurance industry, how can this huge volume of data be straightened out so that it flows smoothly and in good order, like blood through vessels? Data lineage analysis may be a good method.

two

How is lineage analysis implemented?

Data lineage analysis is one of the important applications of metadata management. It sorts out the relationships among systems, tables, views, stored procedures, ETL jobs, program code, fields, and so on, and uses a graph database to present them visually. In short, it shows visually where data comes from and which processes, stages, and calculation logic it has passed through.

From a technical point of view, suppose data T1 is processed by ETL to generate data T2, and T2 and T3 are then merged to generate T4. T1, T2, and T3 then make up the lineage of T4. T1 is upstream of T2, T2 is downstream of T1, and T2 and T3 are both upstream of T4. In terms of granularity, lineage can be analyzed at the entity, business unit, organization, application, system, table, and field levels.
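The T1 to T4 example can be written down as a small directed graph. The sketch below is only an illustration in plain Python (the function names `build_index` and `ancestors` are made up here, and a real deployment would more likely store the edges in a graph database); it shows how upstream and downstream lookups fall out of the edge list.

```python
from collections import defaultdict

# Direct lineage edges (upstream, downstream) taken from the T1..T4 example.
EDGES = [("T1", "T2"), ("T2", "T4"), ("T3", "T4")]


def build_index(edges):
    """Index the edges in both directions for quick neighbor lookups."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return upstream, downstream


def ancestors(node, upstream):
    """Return every dataset that `node` is derived from, directly or not."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in upstream.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


upstream, downstream = build_index(EDGES)
print(sorted(upstream["T4"]))             # ['T2', 'T3']
print(sorted(downstream["T1"]))           # ['T2']
print(sorted(ancestors("T4", upstream)))  # ['T1', 'T2', 'T3']
```

The same structure works at any granularity: the nodes can be systems, tables, or individual fields.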

Now that we know what data lineage is, let's look at how to sort it out. There are two main methods:

(1) Automatic analysis: lineage is derived by analyzing the SQL statements, stored procedures, ETL programs, reports, program code, and so on involved in processing and moving data. A simple example:

Use Kettle to load data from the sources into the information management database --> load internal enterprise data into the ODS layer of the data warehouse through ETL --> finally load it into the model layer (DW)

The table structures of each data source and the processing logic in Kettle, Informatica, and stored procedures are parsed automatically by a program and linked together according to their logical levels.

(2) Manual sorting: technicians work out the lineage by hand. This is inefficient and difficult, and hard to keep up to date once something changes.

A mature enterprise today often has dozens or even hundreds of systems, so sorting out lineage by hand is unrealistic, like the Foolish Old Man trying to move the mountains.

Automatic lineage analysis is therefore particularly important. There are many metadata management tools today, but few that can automatically parse databases, stored procedures, ETL tools, code, and so on.
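To give a rough idea of what automatic analysis involves, the sketch below pulls table-level lineage out of `INSERT INTO ... SELECT` statements with two regular expressions. It is a deliberately simplified illustration, not how any particular tool works: the SQL script and table names are invented, and a production parser would need a full SQL grammar to handle CTEs, comments, string literals, dynamic SQL, and vendor dialects.

```python
import re

# Hypothetical ETL statements; the table names are invented for illustration.
SQL_SCRIPT = """
INSERT INTO ods.policy SELECT * FROM src.core_policy;
INSERT INTO dw.new_policy_premium
SELECT p.region, SUM(f.premium)
FROM ods.policy p
JOIN ods.premium_fact f ON f.policy_id = p.policy_id
GROUP BY p.region;
"""

TARGET_RE = re.compile(r"\bINSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def extract_table_lineage(script):
    """Return sorted (source_table, target_table) pairs found in the script."""
    edges = set()
    # Naive statement split: assumes no semicolons inside string literals.
    for statement in script.split(";"):
        target = TARGET_RE.search(statement)
        if target is None:
            continue  # not an INSERT ... SELECT statement
        for source in SOURCE_RE.findall(statement):
            edges.add((source, target.group(1)))
    return sorted(edges)


for source, target in extract_table_lineage(SQL_SCRIPT):
    print(f"{source} --> {target}")
```

Each extracted pair is one edge of the lineage graph; chaining the edges reproduces the source --> ODS --> DW flow described in the example above.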

The enterprise data intelligent map parses metadata well from almost all databases, big data products, ETL tools, complex SQL, stored procedures, Java code, Python, and more, and automatically links the metadata through graph database technology to form data lineage.

  • Rich metadata interfaces: more than 50, connecting to many data sources, including interfaces for the domestic Huawei and Transwarp (Star Ring) big data platforms;
  • Field-level lineage analysis;
  • Code lineage analysis: it parses not only SQL in databases but also COBOL on mainframes, Java, and Python, providing comprehensive metadata analysis from legacy code to the latest machine learning models;
  • Analysis and presentation of calculation logic, including stored procedures, code, and ETL tools.
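As an illustration of what field-level lineage with calculation logic can look like, the sketch below records, for each derived column, the expression that produces it and the columns it reads, then prints the derivation tree. The column names, expressions, and the `trace_field` helper are all hypothetical and are not taken from the product described above.

```python
# Hypothetical field-level lineage: target column -> (expression, source columns).
# All table and column names are invented for illustration.
FIELD_LINEAGE = {
    "report.new_policy_premium.amount": (
        "SUM(premium) WHERE policy_type = 'NEW'",
        ["dw.policy_fact.premium", "dw.policy_fact.policy_type"],
    ),
    "dw.policy_fact.premium": (
        "base_premium + surcharge",
        ["ods.policy.base_premium", "ods.policy.surcharge"],
    ),
    "dw.policy_fact.policy_type": ("direct copy", ["ods.policy.policy_type"]),
}


def trace_field(column, lineage, depth=0):
    """Return indented lines describing how `column` is derived, top-down.

    Assumes the lineage is acyclic; a cycle would recurse forever.
    """
    if column not in lineage:
        return ["  " * depth + f"{column} (source field)"]
    expression, sources = lineage[column]
    lines = ["  " * depth + f"{column} = {expression}"]
    for source in sources:
        lines.extend(trace_field(source, lineage, depth + 1))
    return lines


print("\n".join(trace_field("report.new_policy_premium.amount", FIELD_LINEAGE)))
```

Keeping the expression next to each edge is what lets a reader check a filter condition or formula at every step instead of reading the underlying code.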

three

The value of lineage analysis

After the first two sections, we have a basic understanding of data lineage and how to sort it out. So what practical benefits does it bring to employees inside the enterprise, both IT and business?

Let's go back to the opening scenarios. With a data lineage map, Xiao Li can quickly see which systems' data, which intermediate processing logic, and which target tables are used by the resource-hungry packages (PKG), and which systems and applications use those tables. He can then quickly determine which business functions the packages affect and avoid a similar production incident. Likewise, Xiao Zhang can use lineage analysis to trace, from the source, which calculation logic produces the new-policy premium figure. He can locate the problem quickly and tell Xiao Huang on the report development team directly which filter condition or calculation step is wrong, so the team can fix it as soon as possible.

Xiao Zhang from the business department and Xiao Chen, the data development engineer, can also use lineage analysis and impact analysis to find the relevant upstream and downstream objects quickly and accurately and notify the affected users in time. These scenarios correspond to three uses of data lineage analysis: anomaly location, lineage tracing, and impact analysis. The applications go well beyond these; data lineage is also widely used in regulatory reporting, quality inspection, and data value assessment.

With the arrival of the DT (data technology) era, enterprises, employees, and devices are constantly producing and consuming data. Faced with massive amounts of data, an enterprise can realize the data's value only if it manages and uses it well. Metadata management is therefore particularly important, and data lineage analysis, as one of its applications, deserves our attention. For data lineage, we need to make sure that the logic of every link is clearly visible and the upstream and downstream relationships are understood, so that data serves us better and creates value.
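Impact analysis amounts to a downstream traversal of the lineage graph. The following sketch assumes a small, invented set of lineage edges loosely modeled on the scenarios in this article (the object names and the `impact_of` function are hypothetical) and uses a breadth-first search to list everything affected by a change.

```python
from collections import deque

# Hypothetical lineage edges: object -> objects that consume its output.
DOWNSTREAM = {
    "core.policy": ["pkg_nightly_batch", "ods.policy"],
    "pkg_nightly_batch": ["core.notify_queue"],
    "core.notify_queue": ["app.sms_service"],
    "ods.policy": ["dw.policy_fact"],
    "dw.policy_fact": ["report.new_policy_premium"],
}


def impact_of(changed, downstream):
    """Breadth-first search: everything reachable downstream of `changed`."""
    affected = []
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected


# Stopping the nightly package would reach the SMS application.
print(impact_of("pkg_nightly_batch", DOWNSTREAM))
# Changing the source table structure touches every object below it.
print(impact_of("core.policy", DOWNSTREAM))
```

Running the same search over reversed edges gives lineage tracing, which is the direction needed to find where a suspicious report figure comes from.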

Origin blog.csdn.net/xljlckjolksl/article/details/132185318