"All About Data Governance" Series II: Master Your Data's "Household Register" and Data Governance Is on Solid Ground!

This article starts from metadata, one of the foundations and core elements of data governance, and explains it in detail from the following perspectives:

  • The concept of metadata

  • Where metadata is distributed and how it is collected

  • Practical application scenarios for metadata

First, what exactly is metadata?

If I tell you that metadata (Meta Data) is data that describes data, readers without a technical background may find the phrase a tongue twister and be left scratching their heads.

Put simply, metadata is the data's "household register".

What does a household register contain? Besides basic descriptive information such as a person's name, age, sex, and ID number, it also records the person's kinship ties with family members, such as father and son or brother and sister. Taken together, all of this information forms a comprehensive description of the individual, and it can also be called that person's metadata.

Similarly, to describe a particular table clearly, we need to know its name, alias, owner, and physical storage location, what columns it has, which columns form the primary key, what indexes it has, how it relates to other tables, and so on. Taken together, this information is the table's metadata. With this analogy, the concept becomes much clearer: metadata is the data's household register.
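To make the analogy concrete, here is a minimal sketch of a table's "household register" as a Python data structure. The class names, the fields, and the sample `orders` table are hypothetical, chosen only to mirror the items listed above; real metadata tools use far richer models.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    name: str
    data_type: str
    is_primary_key: bool = False


@dataclass
class TableMetadata:
    # Hypothetical schema mirroring the items listed in the text
    name: str
    alias: str
    owner: str
    location: str  # physical storage location
    columns: list[ColumnMetadata] = field(default_factory=list)
    indexes: list[str] = field(default_factory=list)
    related_tables: list[str] = field(default_factory=list)

    def primary_keys(self) -> list[str]:
        """Return the names of the columns that make up the primary key."""
        return [c.name for c in self.columns if c.is_primary_key]


orders = TableMetadata(
    name="orders",
    alias="customer orders",
    owner="sales_team",
    location="/warehouse/sales/orders",
    columns=[
        ColumnMetadata("order_id", "BIGINT", is_primary_key=True),
        ColumnMetadata("customer_id", "BIGINT"),
        ColumnMetadata("amount", "DECIMAL(12,2)"),
    ],
    indexes=["idx_orders_customer_id"],
    related_tables=["customers"],
)

print(orders.primary_keys())  # ['order_id']
```

Each object describes the table, not its rows: it is metadata in exactly the sense of the household-register analogy.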

Second, metadata management

It is the core and foundation of data governance

If you were leading troops into battle, what information would you need first? Exactly: a map of the battlefield is indispensable! In data governance, metadata plays the role of that map of your data.

From this data map, we can learn:

  • What data do we have?

  • Where is the data located?

  • What types of data are these?

  • How are the data related to one another?

  • Which data is referenced often, and which data does nobody touch?

    …

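So if we do data governance without this map, we are like the blind men feeling the elephant. Data asset management and knowledge graphs, which we will cover in later articles, are also largely built on metadata. That is why we say: metadata is an organization's data map, and it is the core and foundation of data governance.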

Third, what is a meta model?

A meta model (Meta Model) is data that describes metadata. Its relationship to metadata and to data can be pictured with the figure below.

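We won't discuss the meta model in depth. All we need to know is this: the data structure of metadata itself must also be defined and standardized, and what defines and standardizes metadata is the meta model. The international standard for meta models is CWM (Common Warehouse Metamodel), and a mature metadata management tool should support the CWM standard.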
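To make the three levels tangible, here is a toy sketch in Python. It is not CWM, which is a much richer standard; the `META_MODEL` dictionary and the `validate` function are hypothetical and only show the idea that a meta model fixes what a valid piece of metadata must look like.

```python
# Level 2, meta model: which attributes each kind of metadata must carry.
# Hypothetical and deliberately tiny; this is not the CWM standard.
META_MODEL = {
    "Table": {"name": str, "owner": str, "columns": list},
    "Column": {"name": str, "data_type": str},
}


def validate(kind: str, record: dict) -> list[str]:
    """Check a metadata record against the meta model and list any problems."""
    problems = []
    for attr, expected_type in META_MODEL[kind].items():
        if attr not in record:
            problems.append(f"missing attribute '{attr}'")
        elif not isinstance(record[attr], expected_type):
            problems.append(f"'{attr}' should be of type {expected_type.__name__}")
    return problems


# Level 1, metadata: a description of the orders table.
# (Level 0, the data itself, would be the rows stored in that table.)
orders_metadata = {"name": "orders", "owner": "sales_team", "columns": ["order_id", "amount"]}

print(validate("Table", orders_metadata))  # []
print(validate("Table", {"name": "orders", "owner": 42}))
```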

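The following content gets harder to follow; readers new to the technology, proceed with caution!

If anything is unclear, send a private message and Teacher Jiang will tutor you one-on-one!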

Fourth, where does metadata come from?

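In a big data platform, metadata runs through the entire flow of data and mainly includes data source metadata, data processing metadata, subject-area and thematic database metadata, service layer metadata, application layer metadata, and so on. Taking a data center as an example, the figure below shows where metadata is distributed: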

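The industry usually divides metadata into the following types:

  • Technical metadata: database and table structures, column constraints, data models, ETL programs, SQL programs, etc.

  • Business metadata: business metrics, business codes, business terms, etc.

  • Management metadata: data owners, accountability for data quality, data security levels, etc.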

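Metadata collection is the process of capturing metadata throughout the data lifecycle, organizing it, and writing it to a database. Using techniques such as direct database connections, interfaces, and log files, it collects metadata, automatically or manually, including data dictionaries of structured data, metadata of unstructured data, business metrics, codes, and data processing flows. Once collected, the metadata is organized into a structure that conforms to the CWM model and stored in a relational database.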
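As an illustration of the "direct database connection" technique, here is a minimal sketch using Python's built-in `sqlite3` module, with SQLite standing in for a production source database. The function name `collect_metadata` and the output shape are assumptions; a real collector would map the result onto a CWM-conformant structure and write it to a relational metadata repository, which this sketch skips.

```python
import sqlite3


def collect_metadata(conn: sqlite3.Connection) -> dict[str, list[dict]]:
    """Read the data dictionary of every user table over a direct connection."""
    catalog: dict[str, list[dict]] = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk)
        rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [
            {"column": name, "type": col_type, "not_null": bool(notnull), "pk": pk > 0}
            for _cid, name, col_type, notnull, _default, pk in rows
        ]
    return catalog


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")

for table, columns in collect_metadata(conn).items():
    print(table, [c["column"] for c in columns])
```

Other sources need their own catalog queries (for example `information_schema` views in many relational databases), but the pattern of querying the system catalog rather than the data is the same.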

Fifth, what can we do with metadata?

First, take a look at the overall functional architecture of metadata management. What metadata lets us do is clear at a glance from this figure:

(If this isn't clear to you, let me know in the comments.)

① Metadata browsing

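Metadata is usually organized in a tree structure and can be browsed and searched by type. For example, we can browse table structures, column information, data models, metric information, and so on. With sensible permission assignments, metadata browsing can greatly improve information sharing within an organization.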
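A tree-shaped catalog of this kind can be sketched with nested dictionaries. The hierarchy (database, then table, then column) and every name below are hypothetical, and a real tool would also check the user's permissions before showing a node.

```python
# Hypothetical metadata tree: database -> table -> column -> data type
catalog = {
    "sales_db": {
        "orders": {"order_id": "BIGINT", "amount": "DECIMAL(12,2)"},
        "customers": {"customer_id": "BIGINT", "name": "VARCHAR(64)"},
    },
    "finance_db": {
        "ledger": {"entry_id": "BIGINT", "amount": "DECIMAL(12,2)"},
    },
}


def browse(node: dict, depth: int = 0) -> list[str]:
    """Render the tree as indented lines, like a metadata browser's tree view."""
    lines = []
    for name, child in node.items():
        if isinstance(child, dict):
            lines.append("  " * depth + name)
            lines.extend(browse(child, depth + 1))
        else:
            lines.append("  " * depth + f"{name}: {child}")
    return lines


def search(node: dict, keyword: str, path: tuple[str, ...] = ()) -> list[str]:
    """Return dotted paths of every node whose name contains the keyword."""
    hits = []
    for name, child in node.items():
        current = path + (name,)
        if keyword in name:
            hits.append(".".join(current))
        if isinstance(child, dict):
            hits.extend(search(child, keyword, current))
    return hits


print("\n".join(browse(catalog)))
print(search(catalog, "amount"))  # ['sales_db.orders.amount', 'finance_db.ledger.amount']
```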

② Data lineage and impact analysis

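Data lineage and impact analysis mainly answers the question "how are the data related to one another?" Because of its importance, some vendors pull it out of metadata management as a standalone feature. However, since lineage and impact analysis are derived from metadata, we describe them here as part of metadata management.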

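Lineage analysis captures the lineage relationships of data, recording as historical fact where the data came from, how it was processed, and so on. Taking the lineage of a particular table as an example, lineage analysis displays the following information: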

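Data lineage analysis is of great value to users. For example, when problematic data turns up during analysis, lineage lets us trace it back to its roots and quickly locate the source and processing flow of the bad data, reducing the time and difficulty of the analysis.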

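A typical lineage scenario: a business user finds a data quality problem in the "Monthly Marketing Analysis" report and raises it with the IT department. Using metadata lineage analysis, the engineers find that the report is affected by four different tables in the upstream FDM layer, so they quickly locate the source of the problem and fix it at low cost.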
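Under the hood, lineage is a graph traversal: start at the node in question and walk its edges upstream. Below is a minimal sketch, assuming lineage has already been collected as a mapping from each node to its direct sources; the table names (`dw_*`, `fdm_*`) are invented to echo the scenario, not taken from it.

```python
from collections import deque

# Hypothetical lineage: each node maps to the nodes it is directly derived from
upstream = {
    "monthly_marketing_analysis": ["dw_campaign_summary", "dw_sales_summary"],
    "dw_campaign_summary": ["fdm_campaign", "fdm_channel"],
    "dw_sales_summary": ["fdm_orders", "fdm_customers"],
}


def trace_lineage(node: str, graph: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk upstream; returns every ancestor of `node`."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a node can be reached through several paths
        seen.add(current)
        order.append(current)
        queue.extend(graph.get(current, []))
    return order


ancestors = trace_lineage("monthly_marketing_analysis", upstream)
sources = [n for n in ancestors if n not in upstream]  # nodes with no recorded parents
print(sources)  # ['fdm_campaign', 'fdm_channel', 'fdm_orders', 'fdm_customers']
```

Breadth-first order lists nearer ancestors first, and the `seen` set keeps the walk from repeating work when two paths converge on the same table.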

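Besides lineage analysis, there is also impact analysis, which traces where data flows downstream. When a system is upgraded or modified and metadata such as data structures or ETL programs changes, impact analysis can quickly identify which downstream systems the change will affect, reducing the risk of the upgrade. As this shows, impact analysis is exactly the opposite of lineage analysis: lineage points to the data's upstream sources, while impact analysis points downstream.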

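A typical impact analysis scenario: because of a business system upgrade, an organization changed the length of the TRADE_ACCORD column in the "FINAL_ZENT" table from 8 to 64 and needed to analyze how the upgrade would affect downstream systems. Impact analysis on the "FINAL_ZENT" metadata showed that related tables and ETL programs in the downstream DW layer would be affected. Having pinpointed the impact, the IT department promptly modified the corresponding downstream programs and table structures and prevented problems from occurring. Impact analysis thus helps quickly pin down the effects of metadata changes and nip potential problems in the bud.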
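Impact analysis is the same traversal pointed the other way. This sketch inverts a lineage mapping (node to direct sources) into a downstream mapping and collects everything reachable from the changed table. `FINAL_ZENT` comes from the scenario above; the ETL job and DW table names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical lineage: each node maps to the nodes it is directly derived from
upstream = {
    "etl_load_dw_trade": ["FINAL_ZENT"],
    "dw_trade_detail": ["etl_load_dw_trade"],
    "dw_trade_daily": ["dw_trade_detail"],
    "dw_customer": ["CUSTOMER_INFO"],
}


def invert(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn 'node -> sources' into 'source -> consumers'."""
    downstream: dict[str, list[str]] = defaultdict(list)
    for node, sources in graph.items():
        for source in sources:
            downstream[source].append(node)
    return downstream


def impact_of(changed: str, graph: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk downstream; returns everything affected by `changed`."""
    downstream = invert(graph)
    seen: set[str] = set()
    affected: list[str] = []
    queue = deque(downstream.get(changed, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        affected.append(current)
        queue.extend(downstream.get(current, []))
    return affected


print(impact_of("FINAL_ZENT", upstream))
# ['etl_load_dw_trade', 'dw_trade_detail', 'dw_trade_daily']
```

Keeping a single lineage store and inverting it on demand means upstream and downstream analysis always work from the same graph.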

③ Data hot/cold analysis

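Hot/cold analysis mainly compiles statistics on how data tables are used, such as the relationships between tables and ETL programs, between tables and analytical applications, and between tables and other tables. It rates how hot or cold the data is from the perspectives of access frequency and business need, and presents each table's importance index in charts.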

Hot/cold analysis is of great value to users. A typical scenario: we observe that some data has sat idle for a long time, called by no application and used by no other program. In that case, users can consult the hot/cold report and, combined with manual analysis, move data of different temperatures into tiered storage to make better use of HDFS resources, or assess whether the data has lost its value and can be taken offline to save storage.
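A very simple version of the temperature rating might count accesses over a recent window and bucket tables by threshold. The access log format, the 30-day window, and the thresholds below are all assumptions; as the text notes, such statistics are combined with manual analysis before anything is moved to tiered storage or taken offline.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical access log: (table, access date) pairs gathered from ETL jobs,
# reports, and queries
access_log = [
    ("orders", date(2024, 5, 30)),
    ("orders", date(2024, 5, 29)),
    ("orders", date(2024, 5, 28)),
    ("customers", date(2024, 5, 10)),
    ("legacy_2015_backup", date(2023, 1, 3)),
]
all_tables = ["orders", "customers", "legacy_2015_backup", "tmp_import"]


def classify(log, tables, today: date, window_days: int = 30,
             hot_threshold: int = 3) -> dict[str, str]:
    """Label each table hot, warm, or cold by its access count in the window."""
    since = today - timedelta(days=window_days)
    counts = Counter(table for table, day in log if day >= since)
    labels = {}
    for table in tables:
        n = counts[table]  # Counter returns 0 for tables never accessed
        if n >= hot_threshold:
            labels[table] = "hot"
        elif n > 0:
            labels[table] = "warm"
        else:
            labels[table] = "cold"  # candidate for cheaper storage or offline review
    return labels


print(classify(access_log, all_tables, today=date(2024, 6, 1)))
```

Passing the full table list matters: a table that never appears in the access log, like `tmp_import` here, is exactly the idle data the scenario is looking for.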

④ Data asset map

By processing metadata, we can build a data asset map application. A data asset map generally organizes information at the macro level: from a global perspective, it merges, sorts, and displays information such as data volume, data changes, data storage, and overall data quality, providing a reference for data managers and decision-makers.

⑤ Other metadata management applications

Metadata management also has a number of other important functions, for example: metadata change management, for querying the change history of metadata and comparing versions before and after a change; metadata comparison, for comparing similar pieces of metadata; and metadata statistics, for counting each type of metadata, such as the number of data items of each kind, giving users a convenient summary of their metadata. These are just a few examples.
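Version comparison in change management boils down to diffing two snapshots of the same metadata. This sketch compares a table's column definitions before and after a change. The `{column: type}` snapshot format is an assumption; the TRADE_ACCORD length change is borrowed from the impact-analysis scenario above, while the other columns are hypothetical.

```python
def diff_columns(before: dict[str, str], after: dict[str, str]) -> dict[str, list]:
    """Compare two {column: type} snapshots and report what changed."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    modified = sorted(
        (col, before[col], after[col])
        for col in before.keys() & after.keys()
        if before[col] != after[col]
    )
    return {"added": added, "removed": removed, "modified": modified}


# Hypothetical snapshots of FINAL_ZENT's column definitions
v1 = {"TRADE_ID": "BIGINT", "TRADE_ACCORD": "VARCHAR(8)", "TRADE_DATE": "DATE"}
v2 = {"TRADE_ID": "BIGINT", "TRADE_ACCORD": "VARCHAR(64)", "TRADE_DATE": "DATE",
      "CHANNEL": "VARCHAR(16)"}

print(diff_columns(v1, v2))
# {'added': ['CHANNEL'], 'removed': [], 'modified': [('TRADE_ACCORD', 'VARCHAR(8)', 'VARCHAR(64)')]}
```

Storing each snapshot with a timestamp would turn the same function into a change-history query: diff any two stored versions on request.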

Sixth, summing up

About the author: Jane Jiang Bo has 6+ years of big data governance experience and specializes in providing customers with sound, well-reasoned data governance solutions. He has worked at Longtop, iSoftStone, and Primeton, where he was responsible for pre-sales consulting on data warehouse, BI, big data platform, and data governance projects, with industry experience in government, electric power, manufacturing, and more. He currently does big data platform pre-sales consulting at Shulan Technology.

Origin blog.51cto.com/14463231/2425153