Tag category system (business-oriented data asset design methodology) - Reading Notes 4

Chapter 4: 4 core principles

1. Development of links between business and data

(1) V1.0: Usable

Platform tools improve the efficiency of data processing and lower the threshold for technical work. This is the level at which technicians are able to use data.

(2) V2.0: faster

At this level, the data center unifies three elements (technology, data, and culture) to improve the enterprise's ability to respond to change. The core of the data center strategy is to let business personnel use repeatable technical methods for rapid trial and error. This is the level at which enterprise personnel use data faster and can afford to experiment.

(3) V3.0: Intelligence

Using data intelligently and effectively has two sides. On the one hand, the data operating system understands meaning more intelligently and operates the entire data link automatically; on the other hand, continuous innovation and in-depth exploration of data applications allow data to produce and reproduce on its own. This is the level at which data comes alive.

The first principle of the tag category system methodology is "tree-structured tag tree". This basic principle can be specifically elaborated through four core secondary theories:

  • Roots, branches, leaves/flowers
  • Energy, nutrients and decay
  • Fractal Structure and Asset Tree Planting Pattern
  • Asset tree usage pattern deduction

2. Roots, branches, leaves/flowers

2.1 The roots of a tree determine what kind of tree it is

Designing a label category system requires starting from the "root directory". The data granularity corresponding to the "root directory" is "object". Objects are divided into entity objects and relationship objects. Therefore, there are two major types of label category trees: entity trees and relationship trees.

2.2 The branches of the tree correspond to label categories

The branches of the tree correspond to categories in the label category system, that is, classifications of labels. Categories form a fractal structure that can be subdivided continuously, and any subtree can be cut out and used as an independent label category tree according to the needs of the scenario. Categories classify labels, not objects.

2.3 The leaves/flowers of the tree correspond to labels

The leaves/flowers of the tree correspond to the various attributes of the object, that is, labels. Tags map to fields in database tables, a granularity that a large number of data application practices have verified as the most appropriate for data assets. Roots, branches, and leaves/flowers connect to form the basic structure of a tag tree.
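
As a minimal sketch of the root / branch / leaf structure described above, the tree can be modeled with three small classes. The class names, the "user" object, and the column names are illustrative assumptions, not part of the book's methodology.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """A leaf/flower: one attribute of the object, mapped to a table field."""
    name: str
    column: str  # hypothetical database column name


@dataclass
class Category:
    """A branch: a classification of tags that can nest to any depth."""
    name: str
    children: list["Category"] = field(default_factory=list)
    tags: list[Tag] = field(default_factory=list)


@dataclass
class TagTree:
    """The root: the object, which is either an entity or a relationship."""
    object_name: str
    object_kind: str  # "entity" or "relationship"
    categories: list[Category] = field(default_factory=list)


def iter_tags(category, path=()):
    """Yield (category path, tag) for every tag at or below a category."""
    here = path + (category.name,)
    for tag in category.tags:
        yield here, tag
    for child in category.children:
        yield from iter_tags(child, here)


# Hypothetical entity tree for a "user" object.
user_tree = TagTree(
    object_name="user",
    object_kind="entity",
    categories=[
        Category("basic", tags=[Tag("gender", "gender")]),
        Category("behavior", children=[
            Category("consumption", tags=[Tag("orders_30d", "orders_30d")]),
        ]),
    ],
)

all_tags = [
    ("/".join(path), tag.name)
    for category in user_tree.categories
    for path, tag in iter_tags(category)
]
print(all_tags)
```

Walking the tree from the root yields every tag together with the category path that leads to it, which is the information a catalog view of the tree would need.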

2.3.1 Leaves and flowers: dynamic tags and static tags

(1) The difference between dynamic and static tags: whether the tag value of an individual object under the tag changes frequently.

(2) Relationship between dynamic and static tags: The value of a static tag may affect the values of dynamic tags. For example, if the gender value is female, it is likely to affect the values of some behavioral tags. Conversely, the value of a static tag can be inferred and calculated from the values of a large number of dynamic tags. For example, the value of the gender tag can be inferred from a large number of consumption, browsing, and collection tag values.
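
One rough way to apply the "changes frequently" criterion is to count how often a tag's value changes across periodic snapshots. The sketch below is only an illustration: the snapshot data, the function names, and the `max_changes` threshold are all assumptions.

```python
def change_count(history):
    """Number of times consecutive snapshot values differ."""
    return sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)


def classify_tag(histories, max_changes=1):
    """Return "static" if no object's value changed more than max_changes times."""
    worst = max(change_count(h) for h in histories)
    return "static" if worst <= max_changes else "dynamic"


# Hypothetical snapshots: one list of values per object, oldest first.
gender_histories = [["F", "F", "F", "F"], ["M", "M", "M", "M"]]
orders_histories = [[0, 2, 5, 1], [3, 3, 4, 0]]

gender_kind = classify_tag(gender_histories)
orders_kind = classify_tag(orders_histories)
print(gender_kind, orders_kind)
```

Where the threshold sits is a judgment call for each business; the point is only that the distinction is about the value history of individual objects, not about the tag definition.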

2.3.2 Leaves and flowers, like genes, shape the population

The leaves/flowers on the category tree are like gene fragments: each one maps to and affects an attribute of the individuals in the population, and the values vary from individual to individual. However, normal individuals in the population all carry the same set of gene types, defined to a unified standard.

2.3.3 The tag category system is essentially a pattern design of object attributes

The design of the label category system for a certain type of object actually completes the schema design of the attributes of that type of object. A well-designed label category system is like a mold, which can quickly depict the image characteristics of specific individuals under this type of object.

2.3.4 The difference between labeling and label design

Labeling is similar to coloring a leaf on a specific instance tree, that is, marking or calculating the label value. Label design is shape design at the template level. The two are not on the same dimension.

2.3.5 Meta tags

The leaves/flowers themselves describe the attributes of the object, and there are also some attributes that describe the attributes of the leaves/flowers. These tags used to describe the tags are called meta tags.
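
A meta tag record can be sketched as a small data class. The fields follow the meta tags this document lists later for the asset list (ID, name, logic, type, value dictionary); the class name, field names, and sample values are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class TagMeta:
    """Meta tags: attributes that describe a tag rather than the object."""
    tag_id: str
    name: str
    logic: str        # how the tag value is produced
    value_type: str   # e.g. "enum", "int"
    value_dictionary: dict = field(default_factory=dict)


def is_valid_value(meta, value):
    """Check a tag value against the value dictionary, if one is defined."""
    return not meta.value_dictionary or value in meta.value_dictionary


# Hypothetical meta tags for a "gender" tag.
gender_meta = TagMeta(
    tag_id="T0001",
    name="gender",
    logic="taken from the registration form",
    value_type="enum",
    value_dictionary={"F": "female", "M": "male"},
)

meta_names = sorted(asdict(gender_meta))
print(meta_names)
print(is_valid_value(gender_meta, "F"), is_valid_value(gender_meta, "X"))
```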

Tag system designers describe the essence of objects in a unified, data-based way. This raises observation from the individual to the group: instead of only summarizing individual phenomena from the past, the design can adapt to future scenarios.

3. Energy, nutrients and decay

3.1 Entity trees are connected through relationship trees

Relationship trees connect different entity trees, but a dynamic tag cannot be made by simply picking a leaf from the relationship tree and pasting it onto the entity tree. The leaf must first go through a change of object perspective and a change of statistical form.

3.2 Backtracking from the entity tree leaves to open the relationship tree forest

The dynamic labels of entity objects are often calculated through statistics or algorithms, by reprocessing a series of detailed behavioral records. Therefore, a "colored" dynamic leaf on an entity tree can often be traced back to a large number of specific, detailed behavior leaves, and can even open up a whole forest of relationship trees.
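
The two transformations, and the backtracking they make possible, can be sketched as follows. The "purchase" relationship records, the field names, and the resulting user tags are hypothetical; a real pipeline would run this kind of aggregation in a warehouse rather than in memory.

```python
from collections import defaultdict

# Hypothetical leaves of a "purchase" relationship tree: one record per event.
purchases = [
    {"record_id": 1, "user_id": "u1", "item_id": "i9", "amount": 30.0},
    {"record_id": 2, "user_id": "u1", "item_id": "i3", "amount": 12.5},
    {"record_id": 3, "user_id": "u2", "item_id": "i9", "amount": 8.0},
]


def to_user_tags(records):
    """Re-key purchase records by user and aggregate them into user tags."""
    tags = defaultdict(
        lambda: {"purchase_count": 0, "purchase_total": 0.0, "source_records": []}
    )
    for rec in records:
        user = tags[rec["user_id"]]        # change of object perspective
        user["purchase_count"] += 1        # change of statistical form
        user["purchase_total"] += rec["amount"]
        user["source_records"].append(rec["record_id"])  # kept for backtracking
    return dict(tags)


user_tags = to_user_tags(purchases)
print(user_tags["u1"])
```

Keeping the source record IDs next to each aggregated value is one simple way to trace a dynamic leaf on the entity tree back to the detailed leaves it was computed from.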

3.3 The relationship tree gives energy to the entity tree

The tags of an entity object grow richer as the number of relationship objects related to it increases. Every time a new action, behavior, or connection is added (that is, a new relationship tree), a new type of leaf is mapped and transformed onto the entity tree.

3.4 Business use is the nutrient supply to the tag tree

If tags are widely used in business, their value is secure and they receive corresponding service guarantees in the data asset system, such as data governance, resource priority, and operational promotion. But if a tag is used only once or twice and then shelved, or has no business use at all, it will wither from lack of nutrition and be removed from the shelves. Removing tags with no reuse value is a step in the tag life cycle that must be planned for. Otherwise, the enterprise easily faces the risk of a data asset explosion: more and more data items, with huge management and operation costs.
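
A delisting check along these lines can be sketched with a simple usage counter. The tag names, the usage counts, and the `min_uses` threshold (chosen so that tags used "once or twice" fall below it) are assumptions for illustration only.

```python
def split_by_usage(usage_counts, min_uses=3):
    """Split tags into those to keep and those to consider delisting."""
    keep = sorted(t for t, n in usage_counts.items() if n >= min_uses)
    delist = sorted(t for t, n in usage_counts.items() if n < min_uses)
    return keep, delist


# Hypothetical number of business uses per tag over some review period.
usage = {
    "gender": 120,
    "orders_30d": 45,
    "campaign_2019_flag": 1,
    "legacy_score": 0,
}

keep, delist = split_by_usage(usage)
print("keep:", keep)
print("delist candidates:", delist)
```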

3.5 Finally tease out a forest rather than a tree

Each entity object and relationship object will form an independent category tree. After sorting out enterprise data assets using the tag category system, a large number of tree structures will generally be sorted out. Once the relationship tree structure is formed, it will be relatively stable and unlikely to change in shape, while the entity tree will undergo corresponding tree shape changes as the relationship tree is added and destroyed.

What enterprises need to focus on maintaining is entity trees that are frequently used and have reuse value. Business relationships are phenomena, and entities are often the essence of business.

4. Fractal structure and asset tree planting model

The tag category tree, like the evolutionary tree of life, is constantly differentiated by the influence of energy and environment, forming rich tag clusters. Tag clusters will undergo survival of the fittest and natural selection. Depending on the purpose and pace of building data assets, there are two models for reference.

4.1 Complete planning, from shallow to deep

If the purpose of building assets is to form a complete plan for data assets that guides data collection, sorting, processing, mining, and other stages of work, and the enterprise is willing to spend a long time implementing that overall plan, then this model can be used:

  • Select the most basic branches in the object's classic category tree, and add labels as needed under the basic branch categories to form a version 1.0 consumer label category system.
  • According to business development needs, expand comprehensively to a medium circle and then a large circle. The category tree grows gradually, with many categories and rich tags.
  • One-sided expansion may also occur, either when the existing basic data or business is relatively simple, or when a certain business develops rapidly and one type of tag grows rapidly with it.

No matter which method is adopted, the work proceeds from the roots to the basic trunk, to the subdivided branches, and then to the leaves, which reflects an overall planning mindset.

  • Advantages: Comprehensive planning that is future-oriented and gives a full picture of the company's layout on the data side.
  • Disadvantages: The construction period is long and results are slow to appear, so it encounters great resistance. It must be a project led from the top to carry the whole process through, from planning to implementation of the complete set of data assets.

4.2 Deep penetration, cutting directly from a part of the tree

If an enterprise builds assets to support business scenarios, especially to enable rapid reuse of tag assets across multiple business scenarios, and needs to show data results quickly, then this model can be selected.

Cut the required branches directly from any part of the object's classic category tree, and attach the root and the leaves. Because the label category system is a fractal structure, the whole and the parts are isomorphic, and any local branch can be cut out as an independent category tree.

  • Advantages: Tags act on the business directly, quickly obtain business nourishment, demonstrate data value, and meet little doubt or resistance.
  • Disadvantages: As businesses and labels keep changing, the entire category structure may undergo major changes or even reconstruction, which has a large impact.
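
Cutting a local branch out as an independent category tree can be sketched on a nested dictionary. The tree layout, the category names, and the `cut_branch` helper are hypothetical; the sketch only illustrates that the cut-out part has the same shape as the whole.

```python
import copy

# Hypothetical "classic" category tree for a user object.
user_tree = {
    "root": "user",
    "categories": {
        "basic": {"tags": ["gender", "age"], "children": {}},
        "behavior": {
            "tags": [],
            "children": {
                "consumption": {"tags": ["orders_30d", "spend_30d"], "children": {}},
                "browsing": {"tags": ["views_7d"], "children": {}},
            },
        },
    },
}


def cut_branch(tree, path):
    """Return a new tree with the same root and only the branch at `path`."""
    level = tree["categories"]
    for name in path[:-1]:
        level = level[name]["children"]
    branch = level[path[-1]]
    # Deep copy so later edits to the cut tree do not touch the original.
    return {"root": tree["root"], "categories": {path[-1]: copy.deepcopy(branch)}}


consumption_tree = cut_branch(user_tree, ["behavior", "consumption"])
print(consumption_tree)
```

The result keeps the root (the object) and the leaves of the chosen branch, so it can be handed to a single business scenario without the rest of the tree.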

In the process of evolution, what matters is not the ultimate evolution of a single line, but the branches that keep differentiating. Enterprises should sort out as much data as possible from multiple business formats and departments across the entire group, continuously carry out energy mapping and genetic crossover, form rich and interesting tag clusters, and organize them in an orderly manner through the tag category system. In this way, data assets can not only meet the needs of various future scenarios, but also have strong self-iteration capabilities and good sustainability.

5. Asset tree usage model deduction

The data asset library formed through the tag category system methodology includes asset lists and asset entities:

  • Asset list: The asset list is similar to an asset catalog. Through the asset catalog/portal/market interface, users can clearly see the label category system of all objects. After selecting a tree, they can see its branch outline: first-level directory, second-level directory, and so on. After selecting a leaf category, they can see a list of all tags under it. Each tag is like a unique leaf, with its own meta tag values such as ID, name, logic, type, and value dictionary.
  • Asset entity: An asset entity is a specific individual instance of the designed tag category tree schema, that is, each individual object. Every asset entity has the labels and label classifications contained in the object's category tree. Asset entities can be thought of simply as specific trees in different colors. At the table storage level, each asset entity maps to a specific data record in the processed tag table. These records share unified, standard column information, but their column values differ.

5.1 Query service

  • Determine what to look for
  • Create query service
  • After the query service is created, an API or interactive interface is generated. Business systems can call the API, and business personnel can use the service through the interface.
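
The three steps above can be sketched as a function that, once the tags to look up are fixed, returns a callable standing in for the generated API. The in-memory tag table, the tag names, and `make_query_service` are illustrative assumptions; a real service would sit in front of a database.

```python
# Hypothetical processed tag table: one record per user, keyed by object ID.
TAG_TABLE = {
    "u1": {"gender": "F", "orders_30d": 4, "city": "Hangzhou"},
    "u2": {"gender": "M", "orders_30d": 0, "city": "Shanghai"},
}


def make_query_service(table, tag_names):
    """Create a query service that returns the chosen tags for one object."""
    def query(object_id):
        row = table.get(object_id)
        if row is None:
            return None  # unknown object
        return {name: row.get(name) for name in tag_names}
    return query


# Step 1: decide what to look for. Step 2: create the service.
query_user = make_query_service(TAG_TABLE, ["gender", "orders_30d"])

# Step 3: a business system calls the generated interface.
result = query_user("u1")
print(result)
```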

5.2 Analysis Services

Analysis services are often used in OLAP analytical data operations in business systems. The process is as follows:

  • Clarify what the object of analysis is.
  • Select the object, select the service type "Analysis" in service management, and enter the service creation process.
  • After the analysis service is created, an API or interactive interface is generated. Business systems can call the API, and business personnel can use the service through the interface.

Data analysis processes the values of objects on a certain attribute label, that is, it transforms the colored leaves along a certain dimension in different ways. A value distribution turns the distribution of colors into quantities on different data axes; averaging blends the quantitative differences between colors into one final, harmonized color.
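
The two operations named above, value distribution and averaging, can be sketched on a small tag table. The rows, tag names, and helper functions are hypothetical and stand in for what an analysis service would compute.

```python
from collections import Counter
from statistics import mean

# Hypothetical tag records: each row is one "colored" user tree.
rows = [
    {"user_id": "u1", "gender": "F", "orders_30d": 4},
    {"user_id": "u2", "gender": "M", "orders_30d": 0},
    {"user_id": "u3", "gender": "F", "orders_30d": 2},
]


def value_distribution(records, tag):
    """Count how many objects take each value of a tag."""
    return dict(Counter(r[tag] for r in records))


def average(records, tag):
    """Average a numeric tag over all objects."""
    return mean(r[tag] for r in records)


gender_distribution = value_distribution(rows, "gender")
avg_orders = average(rows, "orders_30d")
print(gender_distribution, avg_orders)
```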

5.3 Circle selection service

The circle selection service is often used in operations aimed at specific target objects. First, confirm and select the object. After the circle selection service is created, an API or interactive interface is generated. Business systems can call the API, and business personnel can use the service through the interface.
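
Circle selection amounts to filtering objects by conditions on their tag values. The sketch below assumes a small in-memory tag table and expresses each condition as a predicate on one tag; the names and data are illustrative only.

```python
# Hypothetical tag records for the selected object type (users).
rows = [
    {"user_id": "u1", "gender": "F", "orders_30d": 4},
    {"user_id": "u2", "gender": "M", "orders_30d": 0},
    {"user_id": "u3", "gender": "F", "orders_30d": 2},
    {"user_id": "u4", "gender": "F", "orders_30d": 7},
]


def select_objects(records, conditions):
    """Return IDs of objects whose tag values satisfy every condition."""
    return [
        r["user_id"]
        for r in records
        if all(predicate(r[tag]) for tag, predicate in conditions.items())
    ]


# Circle the target group: female users with at least 3 orders in 30 days.
target_group = select_objects(
    rows,
    {"gender": lambda v: v == "F", "orders_30d": lambda v: v >= 3},
)
print(target_group)
```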

Origin blog.csdn.net/baidu_38792549/article/details/125715393