Understanding master data and reference data in one article

If you are about to start a data governance or data quality project, you have probably come across two terms: master data and reference data. On first hearing, "master data" sounds lofty, and it is hard for outsiders to grasp (even people who work in the data industry often find it difficult). This article answers the following questions:

1. What is master data?
2. What is master data used for?
3. What is reference data?
4. What is reference data used for?
5. What is the relationship between master data and reference data?

Master data definition

According to the definition on Baidu Encyclopedia, master data is data shared between systems, also known as benchmark data. It describes the people, places, and things involved in an organization's business (what we often call "people, goods, and venues"). In an enterprise, data about people (customers, employees, suppliers), places (locations, sales regions), and things (accounts, products, assets) is all master data, because it is used by multiple business processes and IT systems. Master data can also be used to analyze and drive business processes to improve operational efficiency.

It feels like I understand everything, and yet nothing at all.
Read literally, master data is the "main" data: data that plays a key role across the enterprise's business processes and is used frequently. On that reading, any data produced in a core business process is master data, and master data becomes a way to check whether the enterprise is developing healthily. That reading is somewhat off, though. A stricter definition is that master data is the core, non-transactional data used throughout the enterprise. Note the word non-transactional. For example, an ERP system holds transaction data such as an order's date, number, location, amount, product, user, supplier, and store. Within that information, the products, suppliers, users, and locations are master data; in other words, the key entities that take part in core processes are master data. These entities provide the context for business transactions and analysis. Does the concept of master data make a little more sense now?
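To make the ERP example concrete, here is a minimal Python sketch. All class names, fields, and sample values are made up for illustration and do not come from any real ERP system. The entities (product, supplier, customer) are modeled as master data with stable unique IDs, and the order is an event record that only references them.

```python
from dataclasses import dataclass
from datetime import date


# Master data: stable, uniquely identified business entities (hypothetical fields).
@dataclass(frozen=True)
class Product:
    product_id: str
    name: str


@dataclass(frozen=True)
class Supplier:
    supplier_id: str
    name: str


@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str


# Transaction data: an event record that points at master data by ID.
@dataclass(frozen=True)
class Order:
    order_no: str
    order_date: date
    amount: float
    customer_id: str
    product_id: str
    supplier_id: str


# Sample master data, keyed by unique ID.
products = {"P-001": Product("P-001", "Espresso machine")}
suppliers = {"S-010": Supplier("S-010", "Acme Appliances")}
customers = {"C-100": Customer("C-100", "Li Lei")}

order = Order("SO-2021-0001", date(2021, 12, 15), 1299.0, "C-100", "P-001", "S-010")


def describe(o: Order) -> str:
    """Use master data to give a transaction its business context."""
    return (
        f"{o.order_no}: {customers[o.customer_id].name} bought "
        f"{products[o.product_id].name} from {suppliers[o.supplier_id].name}"
    )


summary = describe(order)
print(summary)
```

The order stores only IDs, so renaming a product or supplier changes one master record and leaves every historical order untouched.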

This example may raise a question: what, then, is the order record itself? That brings in another data type: transaction data. When entity data is combined into a record of an event or activity, such as a call record or a sales record, that record is transaction data. Master data appears embedded in transaction data, but compared with transaction data its attributes are relatively stable and its reliability requirements are high, so each master data record needs a unique identifier. Having covered transaction data, let's compare master data with another term: metadata. For the definition and concept of metadata, see the separate long-form (ten-thousand-character) introduction to 25 metadata management solutions. The "White Paper on Master Data Management Practice" issued by the China Academy of Information and Communications Technology (CAICT) describes the difference between metadata and master data in a way I find very vivid: "The concept of master data is selected from metadata and represents the key, common data of enterprise business operations. It is a relatively subjective concept. Master data is not only header information but also includes instance data."

Mentioning metadata may bring the data warehouse to mind. Master data is somewhat similar to a data warehouse, but the two cannot be equated. What they have in common is integration. Because master data is shared across businesses, systems, and departments, the data shared by each business system has to be managed centrally, which reduces data redundancy and inconsistency. A data warehouse also integrates data: everything is put into one "warehouse" for people across the enterprise to query (subject, of course, to data security). Laid out this way, the difference becomes clear. A data warehouse takes in all data and turns nothing away, whereas master data integrates only the core data: only data with high value density is managed centrally.

The two also differ in data flow direction and timeliness. In a data warehouse the flow is generally one-way: data enters the warehouse from the business systems, is processed by ETL, and then leaves the warehouse for decision analysis. Master data comes from the business systems and also flows back to them, so the flow is two-way. A change to data in the warehouse is generally not picked up until T+1 (the next day), whereas a change to master data must reach the business systems in real time. For example, if a customer's address or contact information changes, it must be synchronized to the business systems immediately; otherwise stale data may be used and the service experience suffers.
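The two-way flow can be sketched in a few lines of Python. This is only an in-memory illustration under simple assumptions: `MasterDataHub`, `make_sink`, and the two "business systems" are hypothetical names, and a real MDM platform would use messaging or APIs rather than direct function calls.

```python
from typing import Callable, Dict, List

Listener = Callable[[str, dict], None]


class MasterDataHub:
    """Hypothetical in-memory hub that holds customer master records."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def upsert(self, customer_id: str, attrs: dict) -> None:
        # Inbound flow: merge the change reported by a business system.
        record = {**self._records.get(customer_id, {}), **attrs}
        self._records[customer_id] = record
        # Outbound flow: push the merged record to every subscriber right away.
        for listener in self._listeners:
            listener(customer_id, dict(record))


def make_sink(cache: Dict[str, dict]) -> Listener:
    """Return a listener that stores each pushed record in a system's local copy."""
    def sink(customer_id: str, record: dict) -> None:
        cache[customer_id] = record
    return sink


# Two business systems, each keeping a local copy of customer data.
crm_cache: Dict[str, dict] = {}
shipping_cache: Dict[str, dict] = {}

hub = MasterDataHub()
hub.subscribe(make_sink(crm_cache))
hub.subscribe(make_sink(shipping_cache))

hub.upsert("C-100", {"name": "Li Lei", "address": "1 Old Street"})
hub.upsert("C-100", {"address": "8 New Avenue"})  # the customer moved

print(shipping_cache["C-100"]["address"])
```

Both local copies see the new address as soon as the hub accepts the change, which is the contrast with a T+1 batch load into a warehouse.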

The role and characteristics of master data

The definition at the beginning gives the first characteristic: sharing. Master data is shared across systems and departments, and because it is shared, it can resolve data inconsistencies between systems. For example, a user's address may differ from one system to another (say, after a move). If each system uses its own copy of the address, problems are bound to appear sooner or later; if every system uses the latest address, the problem goes away, and collaboration between processes improves as well. (Strictly speaking, this is where master data management, MDM, comes in, to keep master data standardized and consistent.) Because the data is shared, it is also of high value to the enterprise, and for data of such high value the name "master data" is no exaggeration.

These points boil down to two words: sharing and value. Since this data is shared and valuable, its quality must be guaranteed, and it should not change frequently (which is a bit like slowly changing dimensions). If every system shares this data and its quality cannot be guaranteed, the whole enterprise is put at risk; if it changes often, the maintenance cost and risk for every system rise. A change in one place affects everything.
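The address example can be sketched as follows. The sample records and the "most recently updated copy wins" rule are assumptions made for illustration; real MDM tools support richer rules for choosing the shared value.

```python
from datetime import datetime

# Hypothetical copies of the same customer held by three systems.
system_records = [
    {"system": "crm", "customer_id": "C-100", "address": "1 Old Street",
     "updated_at": datetime(2021, 3, 1)},
    {"system": "billing", "customer_id": "C-100", "address": "8 New Avenue",
     "updated_at": datetime(2021, 11, 20)},
    {"system": "shipping", "customer_id": "C-100", "address": "1 Old Street",
     "updated_at": datetime(2021, 6, 5)},
]


def latest_address(records, customer_id):
    """Pick the most recently updated address for one customer."""
    candidates = [r for r in records if r["customer_id"] == customer_id]
    newest = max(candidates, key=lambda r: r["updated_at"])
    return newest["address"]


def out_of_date_systems(records, customer_id):
    """List the systems whose copy differs from the shared address."""
    shared = latest_address(records, customer_id)
    return sorted(
        r["system"]
        for r in records
        if r["customer_id"] == customer_id and r["address"] != shared
    )


shared_address = latest_address(system_records, "C-100")
stale = out_of_date_systems(system_records, "C-100")
print(shared_address, stale)
```

Once every system reads the shared address instead of its own copy, the list of out-of-date systems is empty by construction.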

Reference data definition

DAMA's Data Management Body of Knowledge (DMBOK) guide defines reference data as "any data that can be used to describe or classify other data, or to link data to information outside the organization". That definition is fairly abstract. Put simply, reference data is dimension-style data, what most people think of as a data dictionary. Its main purpose is to make other data easier to read and interpret. Examples include status codes, gender codes, product dimension tables, and geographic information. Reference data may be generated internally or collected from outside sources, such as international standard codes and industry standards.
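As a small illustration of reference data improving readability, the sketch below decodes raw codes into labels. The order status codes are invented for the example; the country codes follow ISO 3166-1 alpha-2, an external standard of the kind mentioned above.

```python
# Internally defined reference data (hypothetical status codes).
ORDER_STATUS = {"10": "Created", "20": "Paid", "30": "Shipped"}

# Externally sourced reference data (ISO 3166-1 alpha-2 country codes).
COUNTRY = {"CN": "China", "DE": "Germany", "US": "United States of America"}

rows = [
    {"order_no": "SO-1", "status": "20", "country": "DE"},
    {"order_no": "SO-2", "status": "99", "country": "CN"},
]


def decode(row: dict) -> dict:
    """Replace raw codes with readable labels; keep unknown codes visible."""
    return {
        "order_no": row["order_no"],
        "status": ORDER_STATUS.get(row["status"], f"unknown({row['status']})"),
        "country": COUNTRY.get(row["country"], f"unknown({row['country']})"),
    }


decoded = [decode(r) for r in rows]
for d in decoded:
    print(d)
```

A code that is missing from the table stays visible as `unknown(...)` rather than being dropped, which makes gaps in the reference data easy to spot.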

Features

Reference data has the same characteristics as dimension tables: some of it changes slowly and some of it changes quickly, just like slowly changing and rapidly changing dimensions.

Difference between reference data and master data

Master data and reference data are generally treated as two different types of data.
1. By definition, master data represents business objects. It is made up of key business entities and contains the most valuable information shared across the organization. Reference data defines the set of allowed values that other data fields may use, along with text descriptions of those values, which makes it more like a data dictionary.
2. By scope, reference data is a special subset of master data.
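The first point can be sketched in Python: a master data record whose fields are checked against reference data. The code tables, field names, and the `validate` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Reference data: allowed values for a field, each with a text description.
CUSTOMER_TYPES = {"IND": "Individual", "ORG": "Organization"}
COUNTRY_CODES = {"CN": "China", "DE": "Germany"}


# Master data: a business entity whose coded fields point at reference data.
@dataclass
class CustomerRecord:
    customer_id: str
    name: str
    customer_type: str
    country: str


def validate(record: CustomerRecord) -> List[str]:
    """Return one message per field whose value is not in the reference data."""
    errors = []
    if record.customer_type not in CUSTOMER_TYPES:
        errors.append(f"customer_type {record.customer_type!r} is not an allowed value")
    if record.country not in COUNTRY_CODES:
        errors.append(f"country {record.country!r} is not an allowed value")
    return errors


good = CustomerRecord("C-100", "Li Lei", "IND", "CN")
bad = CustomerRecord("C-101", "Han Meimei", "VIP", "CN")

good_errors = validate(good)
bad_errors = validate(bad)
print(good_errors, bad_errors)
```

The master record carries the business entity, while the reference tables only say which codes are legal and what they mean.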
The table below summarizes the difference between master data and reference data:
[Table: master data vs. reference data (image not available)]

References:

  1. "White Paper on Master Data Management Practice 1.0", China Academy of Information and Communications Technology (CAICT)

  2. DAMA, Data Management Body of Knowledge (DMBOK) guide

Origin blog.csdn.net/qq_28680977/article/details/121940112