Knowledge Graph: History of Development of Knowledge Representation

Data is a core asset in many industries, and the deep integration of artificial intelligence with data has become a focus for major industry organizations. Among artificial intelligence technologies, knowledge graphs are well suited to expressing business scenarios: they give a diverse and comprehensive picture of a domain that supports analysis and decision-making, and they have become one of the hot spots of technological innovation in recent years.

Concept of knowledge

Knowledge representation is a description of knowledge, that is, a set of conventions for encoding knowledge in a data structure that a computer can accept. It is the basis on which machines achieve intelligence, allowing them to use knowledge as humans do.

Knowledge is characterized by relative correctness, uncertainty, representability, and availability. It can be divided into categories according to different classification standards. By scope of application, it is divided into common-sense knowledge and domain knowledge. By function and form of expression, it is divided into factual knowledge, procedural knowledge, and control knowledge. By certainty, it is divided into deterministic knowledge and uncertain knowledge. By structure and form of expression, it is divided into logical knowledge and image knowledge.

Early knowledge representation methods

1. First-order predicate logic

Predicate logic analyzes atomic propositions further, breaking them down into individual terms, predicates, and quantifiers. It studies the logical relationships among these formal structures, together with the forms and rules of correct reasoning.

First-order logic is the basic part of mathematical logic. It is usually described as comprising classical propositional logic and first-order predicate logic, although first-order predicate logic in fact subsumes propositional logic. First-order logic is called "first-order" because the predicates it contains are first-order. A predicate is a word that expresses a property of an object. Properties come in levels, and in predicate usage this level is called the "order". First-order predicates describe properties of individuals: predicates such as "red" and "greater than" apply only to individuals. Predicates such as "bright" and "transitive", by contrast, describe other predicates such as "red" and "greater than". They are higher-order predicates, which describe properties of properties.
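The difference between first-order and higher-order predicates can be illustrated with a minimal sketch over a small finite domain. The domain and the names `greater_than`, `is_even`, `differs_by_one`, `is_transitive`, `forall`, and `exists` are assumptions made for this example and do not come from any particular logic library.

```python
from itertools import product

# Hypothetical finite domain of individuals, chosen only for illustration.
DOMAIN = [1, 2, 3, 4]

def greater_than(x, y):
    """First-order predicate: a relation between two individuals."""
    return x > y

def is_even(x):
    """First-order predicate: a property of one individual."""
    return x % 2 == 0

def differs_by_one(x, y):
    """Another first-order relation, used below as a non-transitive example."""
    return abs(x - y) == 1

def is_transitive(relation, domain):
    """Higher-order predicate: a property of a relation, not of an individual."""
    return all(
        relation(a, c)
        for a, b, c in product(domain, repeat=3)
        if relation(a, b) and relation(b, c)
    )

def forall(domain, predicate):
    """Universal quantifier restricted to a finite domain."""
    return all(predicate(x) for x in domain)

def exists(domain, predicate):
    """Existential quantifier restricted to a finite domain."""
    return any(predicate(x) for x in domain)

some_even = exists(DOMAIN, is_even)                      # some x is even
all_even = forall(DOMAIN, is_even)                       # every x is even
gt_transitive = is_transitive(greater_than, DOMAIN)      # "greater than" is transitive
dbo_transitive = is_transitive(differs_by_one, DOMAIN)   # "differs by one" is not
print(some_even, all_even, gt_transitive, dbo_transitive)
```

Here `is_even` and `greater_than` take individuals as arguments, while `is_transitive` takes a predicate as its argument, which is what makes it higher-order. Real first-order reasoning is not limited to finite domains; the finite domain only keeps the sketch executable.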

First-order predicate logic has the advantages of being natural, close to natural language, easy to accept, rigorous, and easy to convert into a computer's internal form. Its disadvantages are that it cannot express uncertain knowledge, has difficulty expressing heuristic knowledge and meta-knowledge, and suffers from combinatorial explosion and low efficiency. To overcome these shortcomings, improvements such as Horn logic and description logic have been proposed.

2. Production system

The production system is a broader rule system that is related to, but also different from, predicate logic. Most of the early expert systems were based on production systems, and production representation remains one of the commonly used knowledge representation methods. It is motivated by the large number of causal relationships between pieces of knowledge in the memory model of the human brain, and it expresses them in "IF-THEN" form, that is, as production rules. Rules of this form capture the behavioral characteristics of human problem-solving and solve problems through a repeated recognize-act cycle. A production system consists of three basic parts: a rule base, a global database (working memory), and a control mechanism.
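The three parts of a production system can be sketched as a small forward-chaining loop. This is a minimal illustration, not a description of any particular expert system: the function `run_production_system`, the rule names, the example facts, and the "fire the first matching rule" control strategy are all assumptions made for the example.

```python
def run_production_system(rules, facts, max_cycles=100):
    """Repeat the recognize-act cycle until no rule can add a new fact."""
    working_memory = set(facts)  # the global database
    fired = []                   # names of the rules fired, in order
    for _cycle in range(max_cycles):
        # Recognize: rules whose IF part holds and whose THEN part is new.
        conflict_set = [
            (name, conclusion)
            for name, conditions, conclusion in rules
            if conditions <= working_memory and conclusion not in working_memory
        ]
        if not conflict_set:
            break
        # Control mechanism (an assumption): fire the first rule in rule-base order.
        name, conclusion = conflict_set[0]
        # Act: add the conclusion to working memory.
        working_memory.add(conclusion)
        fired.append(name)
    return working_memory, fired

# Rule base: (name, IF conditions, THEN conclusion). Hypothetical example rules.
RULES = [
    ("R1", frozenset({"has_feathers"}), "is_bird"),
    ("R2", frozenset({"is_bird", "cannot_fly", "swims"}), "is_penguin"),
]

memory, fired = run_production_system(
    RULES, {"has_feathers", "cannot_fly", "swims"}
)
print(fired)           # ['R1', 'R2']
print(sorted(memory))
```

Rule R2 cannot fire until R1 has added `is_bird` to working memory, which shows how conclusions of one cycle become conditions for the next.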

The rules of predicate logic are similar to the basic form of productions; in fact, logical implication is just a special case of a production. Production rule representation has clear advantages: it is natural, easy to manage in modules, effective at representing knowledge, and clear. However, production rules also have shortcomings, such as low efficiency and an inability to express structural knowledge. For this reason, they are often combined with other knowledge representation methods, such as frame representation and semantic network representation.

3. Frame representation

Frame representation was proposed by Minsky in 1975. Its most prominent feature is that it is good at representing structural knowledge: it can express the internal structure of a piece of knowledge and the relationships between pieces of knowledge, and it brings together the relevant properties of an entity or a set of entities.

The frame representation method holds that people's understanding of things in the real world is stored in memory in frame-like structures. When facing something new, a person retrieves a suitable frame from memory and then modifies and supplements its details according to the actual situation, thereby forming an understanding of the thing at hand.

A frame is a data structure that describes a fixed (stereotyped) situation. In general, a frame can be regarded as a network of nodes and relationships. The top levels of the frame are fixed and describe what is always true of the given situation. The lower levels contain many terminals, called slots. Filling the slots with specific values yields a frame that describes a specific thing. Each slot can carry additional specifications, called facets, which indicate, for example, the range of values the slot may take and how its value is to be obtained. A frame can contain all kinds of information: information describing the thing, information about how to use the frame, expectations about what will happen next, and information about what to do if the expected event does not happen. All of this information is held in the slots or facets of the frame.

A specific thing is described by the values filled into the slots, so a frame with different slot values can represent each specific thing within a class of things. Related frames are linked together to form a frame system, and a transition from one frame to another within the frame system can represent a change of state, a step of reasoning, or some other activity. Different frames can share the same slot value, which makes it easier to coordinate information collected from different perspectives.
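Slots, facets, and links between frames can be sketched as follows. This is only one possible encoding under stated assumptions: the class `Frame`, its methods, the facet keys `allowed` and `default`, the single parent link used to share slot values, and the bird example are all hypothetical and are not taken from Minsky's formulation.

```python
class Frame:
    """A named set of slots, optionally linked to a more general frame."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # link to another frame in the frame system
        self.slots = {}       # slot name -> filled value
        self.facets = {}      # slot name -> {"allowed": ..., "default": ...}

    def define_slot(self, slot, allowed=None, default=None):
        """Declare a slot with facets for its value range and default value."""
        self.facets[slot] = {"allowed": allowed, "default": default}

    def _find_facet(self, slot):
        """Look for the slot's facets on this frame, then on linked frames."""
        frame = self
        while frame is not None:
            if slot in frame.facets:
                return frame.facets[slot]
            frame = frame.parent
        raise KeyError(f"slot {slot!r} is not defined for frame {self.name!r}")

    def fill(self, slot, value):
        """Fill a slot, checking the value against the 'allowed' facet."""
        facet = self._find_facet(slot)
        if facet["allowed"] is not None and value not in facet["allowed"]:
            raise ValueError(f"{value!r} is not allowed for slot {slot!r}")
        self.slots[slot] = value

    def get(self, slot):
        """Return the nearest filled value, else the default from the facet."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        return self._find_facet(slot)["default"]

# A general frame and a more specific frame linked to it.
bird = Frame("Bird")
bird.define_slot("covering", default="feathers")
bird.define_slot("can_fly", allowed={True, False}, default=True)

penguin = Frame("Penguin", parent=bird)
penguin.fill("can_fly", False)

print(penguin.get("covering"))  # feathers
print(penguin.get("can_fly"))   # False
print(bird.get("can_fly"))      # True
```

The `Penguin` frame fills only the slot in which it differs and shares everything else with `Bird` through the link, which is one simple way to realize shared slot values in a frame system.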

Frame representation describes knowledge completely and comprehensively, so a frame-based knowledge base tends to be of high quality. Frames also allow numerical calculation, an advantage over other knowledge representation languages. However, frames are expensive to construct and place high quality requirements on the knowledge base, and their form of expression is inflexible, which makes them difficult to link with data sets in other forms.

4. Semantic network

The semantic network is one of the most important methods of knowledge representation, with strong expressive power and flexibility. A semantic network is a directed graph of nodes and labeled edges that describes the relationships between events, concepts, situations, actions, and objects. A labeled directed graph describes relationships between objects very naturally.

Semantic networks are widely used because of this naturalness. A knowledge base represented as a semantic network uses a labeled directed graph to describe possible events. Nodes represent objects, object properties, concepts, events, situations, and actions, and labeled edges describe the relationships between them. The knowledge base is modified by inserting and deleting objects and their relationships. Network representation is best suited to fields where reasoning relies on very complex classifications and to fields where relationships among event conditions, properties, and actions must be expressed.

The basic form of a semantic network is (node 1, arc, node 2). The nodes represent things, concepts, situations, attributes, actions, states, and so on. Each node can have several attributes, which are generally represented by frames or tuples. A node can also be a semantic sub-network, which gives a multi-level nested structure. The arcs represent semantic connections and indicate a particular semantic relationship between the nodes they connect. Both nodes and arcs must be labeled so that different objects, and the different semantic connections between them, can be told apart.
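The (node 1, arc, node 2) form maps directly onto a store of labeled directed edges. The sketch below is a minimal illustration under stated assumptions: the class `SemanticNetwork`, its method names, the arc labels `is_a`, `has_part`, and `color`, and the canary example are hypothetical and chosen only for this example.

```python
from collections import defaultdict

class SemanticNetwork:
    """A labeled directed graph stored as (node 1, arc, node 2) triples."""

    def __init__(self):
        # (source node, arc label) -> set of target nodes
        self.edges = defaultdict(set)

    def add(self, node1, arc, node2):
        """Insert one labeled edge."""
        self.edges[(node1, arc)].add(node2)

    def remove(self, node1, arc, node2):
        """Delete one labeled edge if it exists."""
        self.edges[(node1, arc)].discard(node2)

    def targets(self, node, arc):
        """Nodes reached from `node` by a single edge labeled `arc`."""
        return set(self.edges.get((node, arc), set()))

    def ancestors(self, node, arc="is_a"):
        """All nodes reachable from `node` by following `arc` edges."""
        seen, stack = set(), [node]
        while stack:
            for parent in self.targets(stack.pop(), arc):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def inherited(self, node, arc):
        """Values of `arc` on the node itself or on anything it is_a."""
        result = self.targets(node, arc)
        for ancestor in self.ancestors(node):
            result |= self.targets(ancestor, arc)
        return result

net = SemanticNetwork()
net.add("canary", "is_a", "bird")
net.add("bird", "is_a", "animal")
net.add("bird", "has_part", "wings")
net.add("canary", "color", "yellow")

print(sorted(net.ancestors("canary")))     # ['animal', 'bird']
print(net.inherited("canary", "has_part")) # {'wings'}
```

Inference here is just graph traversal along `is_a` arcs. The traversal rule is a convention of this sketch rather than something the network itself defines, which is why the rigor of conclusions drawn this way depends on how the arcs are interpreted.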

Advantages of Semantic Networks:

1. It is an intuitive representation that expresses the connections between nodes clearly and concisely;

2. It emphasizes the semantic connections between things, reflects the associative process of human thinking, and matches the way people express relationships between things, so natural language is relatively easy to convert into a semantic network;

3. It has a wide range and great power of representation: almost all knowledge that can be expressed by other representation methods can also be represented by a semantic network;

4. It is a structured knowledge representation that displays the attributes of things and various semantic relationships between things.

Disadvantages of Semantic Networks:

1. The inference rules are not clearly defined, so the rigor and validity of conclusions obtained by operating on the network cannot be fully guaranteed;

2. When the number of nodes is large and the network structure is complex, reasoning becomes difficult;

3. It is inconvenient to express judgmental knowledge and deep knowledge.