The Road to Big Data: Alibaba Big Data Practice, Reading Notes --- Chapter 12, Metadata

1. Metadata concept

  • Metadata definition

    • By the traditional definition, metadata is data about data. Metadata connects source data, the data warehouse, and data applications, and records the entire process from data generation to consumption;

    • Metadata mainly records the definitions of the models in the data warehouse, the mapping relationships between the levels, the monitored data status of the data warehouse, and the running status of ETL tasks;

    • In a data warehouse system, metadata helps data warehouse administrators and developers easily find the data they care about, guides their data management and development work, and improves their efficiency;

    • Categories:

      • Technical Metadata

        • Technical metadata stores the technical details of the data warehouse system and is used to develop and manage the data warehouse. Common technical metadata at Alibaba includes:

          • Metadata from the distributed computing system, such as information about every job run on MaxCompute. Similar to Hive's job logs, it includes the job type, instance name, inputs and outputs, SQL, run parameters, execution time, and execution information for the finest-grained unit, the FuXi Instance (the smallest unit of MR execution in MaxCompute);

        • Data synchronization, computing task, and task scheduling information from the data development platform. Data synchronization metadata includes the input and output tables and fields, plus the node information of the synchronization task itself; computing task metadata mainly includes the inputs and outputs and the node information of the task itself; task scheduling metadata mainly includes the dependency types and dependency relationships of tasks, plus the run logs of the different types of scheduled tasks;

        • Metadata related to data quality and operations and maintenance (O&M), such as task monitoring, O&M alerts, data quality, and fault information, including task monitoring run logs, alert configurations and run logs, and fault information;
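
As a rough illustration of the technical metadata listed above, the following Python sketch models a job run record and a scheduling node, then uses the recorded inputs to find which jobs read a given table. All class, field, function, and table names are hypothetical and chosen only to mirror the bullet points; this is not MaxCompute's actual schema or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class JobRunRecord:
    """One job execution, with the fields named in the notes (hypothetical names)."""
    job_type: str
    instance_name: str
    input_tables: list[str]
    output_tables: list[str]
    sql: str
    run_parameters: dict[str, str]
    start_time: datetime
    end_time: datetime

    @property
    def execution_time(self) -> timedelta:
        # Derived from the recorded start and end timestamps.
        return self.end_time - self.start_time


@dataclass
class ScheduledTask:
    """A scheduling node with its dependency type and upstream dependencies."""
    node_id: str
    dependency_type: str  # hypothetical label, e.g. "same_cycle"
    upstream_node_ids: list[str] = field(default_factory=list)


def jobs_reading(table: str, runs: list[JobRunRecord]) -> list[str]:
    """Return the instance names of all recorded jobs that read `table`."""
    return [run.instance_name for run in runs if table in run.input_tables]


# Example data; every value here is made up for illustration.
example_runs = [
    JobRunRecord(
        job_type="SQL",
        instance_name="daily_orders_20200412",
        input_tables=["ods_orders"],
        output_tables=["dwd_orders"],
        sql="INSERT OVERWRITE TABLE dwd_orders SELECT * FROM ods_orders",
        run_parameters={"bizdate": "20200412"},
        start_time=datetime(2020, 4, 12, 1, 0),
        end_time=datetime(2020, 4, 12, 1, 5),
    ),
]

print(jobs_reading("ods_orders", example_runs))
print(example_runs[0].execution_time)
```

Keeping inputs and outputs on each run record is what makes this kind of lookup possible: an administrator can answer "which jobs read this table?" from metadata alone, without inspecting the jobs themselves.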

      • Business Metadata

Origin blog.csdn.net/u012965373/article/details/105463849