Knowledge Fusion: Knowledge Reasoning

Introduction to Knowledge Reasoning

1. Classification of knowledge reasoning tasks

Reasoning means deriving new knowledge or conclusions from existing knowledge by various methods, such that the derived knowledge and conclusions are consistent with the semantics. Its specific tasks can be divided into satisfiability, classification, and materialization.

Satisfiability can be checked at the level of an ontology or of a concept. Ontology satisfiability checks whether an ontology can be satisfied, that is, whether the ontology has a model. If the ontology is not satisfiable, it contains an inconsistency. Concept satisfiability checks whether a given concept can be satisfied, that is, whether there is a model in which the interpretation of the concept is not the empty set.

2. Introduction to knowledge reasoning

In the knowledge graph field, the OWL ontology language is the most standardized (formulated by W3C), the most rigorous (built on description logic), and the most expressive (a subset of first-order predicate logic) description language. It is based on RDF syntax, so documents written in it have a structural basis for semantic understanding. It promotes the use of a unified vocabulary, defines a rich semantic vocabulary, and allows logical reasoning.

The logical basis of the OWL language is description logic.

Description logic

Description logic (DL) is a formalism for object-based knowledge representation, also called concept representation language or terminological logic. It is a decidable subset of first-order predicate logic.

A description logic system consists of four basic parts:

1. The most basic elements are concepts, relationships (roles), and individuals.

Concepts are interpreted as subsets of a domain.

Relationships are interpreted as binary relations over the domain, that is, subsets of the Cartesian product of the domain with itself.

Individuals are interpreted as elements of the domain.

2. TBox (terminology set): a collection of axioms about concept terms

The TBox holds generalized (intensional) knowledge: it describes concepts and relationships, and its statements are called axioms. Because of the inclusion relationships between concepts, TBox knowledge forms a lattice-like structure. This structure is determined by the inclusion relationships alone and has nothing to do with the specific implementation. The TBox language has definitions and inclusions: definitions introduce the names of concepts and relationships, such as Mother, Person, and has_child, and inclusions are axioms that declare inclusion relationships, such as Mother ⊑ Person.
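As a minimal sketch of how inclusion axioms induce a hierarchy, the snippet below computes, for each named concept, every named concept that includes it by following the inclusions transitively. The axioms in `TBOX` and the function name `subsumers` are illustrative assumptions, and the sketch covers only inclusions between named concepts, not the complex concepts that a real classifier handles.

```python
from collections import defaultdict

# Told inclusion axioms as (sub-concept, super-concept); illustrative only.
TBOX = [("Mother", "Woman"), ("Woman", "Person"), ("Father", "Person")]

def subsumers(tbox):
    """Return {concept: set of all named concepts that include it}."""
    direct = defaultdict(set)
    concepts = set()
    for sub, sup in tbox:
        direct[sub].add(sup)
        concepts.update((sub, sup))
    closure = {}
    for concept in concepts:
        seen, stack = set(), [concept]
        while stack:  # depth-first walk up the inclusion edges
            for sup in direct[stack.pop()]:
                if sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
        closure[concept] = seen
    return closure

hierarchy = subsumers(TBOX)
print(sorted(hierarchy["Mother"]))  # ['Person', 'Woman']
```

The result depends only on the inclusion axioms themselves, which matches the point that the structure is independent of any particular implementation.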

3. ABox (assertion set): a collection of assertions about individuals

The ABox holds information about specific individuals. It contains extensional knowledge (also known as assertions), which describes specific individuals in the domain of discourse. A description logic knowledge base is K := <T, A>, where T is the TBox and A is the ABox. The ABox language includes concept assertions and relationship assertions. A concept assertion states that an object belongs to a certain concept, such as Mother(Alice) or Person(Bob). A relationship assertion states that two objects satisfy a specific relationship, such as has_child(Alice, Bob).

4. Inference mechanism on TBox and ABox

Description logic semantics: an interpretation I is a model of a knowledge base K if and only if I is a model of every axiom and assertion in K. If a knowledge base K has a model, K is said to be satisfiable. If an assertion σ is satisfied by every model of K, then K is said to logically entail σ, denoted K ⊨ σ. For a concept C, if K has a model I such that C^I ≠ ∅, then C is said to be satisfiable.
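The definition of a model can be made concrete with a small check. The sketch below assumes a tiny knowledge base with one inclusion axiom and three assertions, and represents an interpretation as plain Python sets. All names (`TBOX`, `is_model`, the lowercase individuals) are hypothetical, and each individual is simply interpreted as itself.

```python
# Knowledge base K = <T, A>; all names below are illustrative.
TBOX = [("Mother", "Person")]                        # Mother ⊑ Person
ABOX_CONCEPTS = [("Mother", "alice"), ("Person", "bob")]
ABOX_ROLES = [("has_child", "alice", "bob")]

def is_model(concepts, roles):
    """Check whether the interpretation (concepts, roles) satisfies K."""
    tbox_ok = all(concepts[c] <= concepts[d] for c, d in TBOX)
    concepts_ok = all(x in concepts[c] for c, x in ABOX_CONCEPTS)
    roles_ok = all((x, y) in roles[r] for r, x, y in ABOX_ROLES)
    return tbox_ok and concepts_ok and roles_ok

# One interpretation that satisfies K, and one that violates Mother ⊑ Person.
good = ({"Mother": {"alice"}, "Person": {"alice", "bob"}},
        {"has_child": {("alice", "bob")}})
bad = ({"Mother": {"alice"}, "Person": {"bob"}},
       {"has_child": {("alice", "bob")}})

print(is_model(*good))  # True
print(is_model(*bad))   # False
```

Because `good` is a model, this K is satisfiable, and since Mother is interpreted as a non-empty set in that model, the concept Mother is satisfiable as well.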

Description logic builds complex concepts and relationships from simple ones using the constructors it provides. A description logic contains at least the following constructors: intersection (⊓), union (⊔), negation (¬), existential quantification (∃), and universal quantification (∀). With semantics in place, we can perform inference; the semantics is the standard against which the soundness and completeness of reasoning are ensured.
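To illustrate what the constructors mean semantically, here is a small evaluator that computes the extension of a complex concept under one fixed interpretation. The tuple encoding (`"and"`, `"or"`, `"not"`, `"some"`, `"all"`), the function name `ext`, and the sample domain are assumptions made for this sketch only.

```python
DOMAIN = {"alice", "bob", "carol"}
CONCEPTS = {"Person": {"alice", "bob", "carol"}, "Female": {"alice", "carol"}}
ROLES = {"has_child": {("alice", "bob")}}

def ext(concept):
    """Return the set of domain elements that belong to the concept."""
    if isinstance(concept, str):                  # atomic concept
        return CONCEPTS[concept]
    op = concept[0]
    if op == "and":                               # C ⊓ D
        return ext(concept[1]) & ext(concept[2])
    if op == "or":                                # C ⊔ D
        return ext(concept[1]) | ext(concept[2])
    if op == "not":                               # ¬C
        return DOMAIN - ext(concept[1])
    if op in ("some", "all"):
        role, filler = ROLES[concept[1]], ext(concept[2])
        if op == "some":                          # ∃r.C
            return {x for x in DOMAIN
                    if any((x, y) in role for y in filler)}
        # ∀r.C: every r-successor of x must be in C (vacuously true if none)
        return {x for x in DOMAIN
                if all(y in filler for (s, y) in role if s == x)}
    raise ValueError(f"unknown constructor: {op}")

mother = ("and", "Female", ("some", "has_child", "Person"))
only_daughters = ("all", "has_child", "Female")
print(ext(mother))                     # {'alice'}
print(sorted(ext(only_daughters)))     # ['bob', 'carol']
```

Note that the universal restriction holds for `bob` and `carol` only because they have no children at all in this interpretation.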

Ontology reasoning methods and tools

Common ontology reasoning methods include methods based on Tableaux operations, methods based on logic programming rewriting, methods based on first-order query rewriting, methods based on production rules, etc.

1. Method based on Tableaux operations

Tableaux-based methods are suitable for checking the satisfiability of an ontology and for instance checking. The basic idea is to expand an ABox step by step through a series of rules in order to test satisfiability, or to test whether a given individual is an instance of a given concept. The idea is similar to refutation by resolution in first-order logic.

Each of the main DL operators has a corresponding Tableaux operation rule. Take the first one, the rule for conjunction, as an example: if (C ⊓ D)(x) is in the ABox but C(x) and D(x) are not both in it, then the ABox records only part of what the conjunction implies, so we add C(x) and D(x) to it.

The Tableaux operation is based on the Herbrand model. The Herbrand model can be understood simply as the smallest of all the models that satisfy the knowledge base.
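A minimal sketch of the idea follows, under several assumptions: it uses the textbook expansion rules for ⊓, ⊔, ∃ and ∀ (which may differ from the exact rule set the article has in mind), it ignores the TBox, and it expects the input concept in negation normal form, so `"not"` appears only in front of atomic concepts. The tuple encoding and the names `expand` and `satisfiable` are hypothetical.

```python
def has_clash(concepts):
    """A clash is A(x) together with ¬A(x) for some atomic concept A."""
    return any(isinstance(c, str) and (x, ("not", c)) in concepts
               for x, c in concepts)

def expand(concepts, roles, fresh=1):
    """Return True if the ABox can be completed without a clash."""
    if has_clash(concepts):
        return False
    for x, c in concepts:
        if isinstance(c, str) or c[0] == "not":
            continue                                  # literals: nothing to expand
        op = c[0]
        if op == "and":                               # add both conjuncts
            new = {(x, c[1]), (x, c[2])}
            if not new <= concepts:
                return expand(concepts | new, roles, fresh)
        elif op == "or":                              # try each disjunct
            left, right = (x, c[1]), (x, c[2])
            if left not in concepts and right not in concepts:
                return (expand(concepts | {left}, roles, fresh)
                        or expand(concepts | {right}, roles, fresh))
        elif op == "some":                            # create a witness individual
            succ = [y for (s, r, y) in roles if s == x and r == c[1]]
            if not any((y, c[2]) in concepts for y in succ):
                y = f"x{fresh}"
                return expand(concepts | {(y, c[2])},
                              roles | {(x, c[1], y)}, fresh + 1)
        elif op == "all":                             # propagate to successors
            for (s, r, y) in roles:
                if s == x and r == c[1] and (y, c[2]) not in concepts:
                    return expand(concepts | {(y, c[2])}, roles, fresh)
    return True            # no rule applies and no clash: a model exists

def satisfiable(concept):
    return expand({("x0", concept)}, set())

print(satisfiable(("and", ("some", "r", "A"), ("all", "r", "B"))))           # True
print(satisfiable(("and", ("some", "r", "A"), ("all", "r", ("not", "A")))))  # False
```

Instance checking can be reduced to the same procedure: an individual a is an instance of C exactly when adding ¬C(a) to the ABox makes it unsatisfiable.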

2. Method of rewriting based on logic programming

Ontology reasoning has certain limitations. For example, it only supports reasoning over predefined ontology axioms and cannot support flexible reasoning over custom vocabulary, and users cannot define their own reasoning process. Rule reasoning is therefore introduced: rules can be customized for specific scenarios to implement a user-defined reasoning process.

For this purpose the Datalog language is introduced, which can combine ontology reasoning and rule reasoning. Datalog is a logic language designed for knowledge bases and databases. Its expressive power is comparable to that of OWL, and it supports recursion, which makes it easy to write rules and implement reasoning.

The basic syntax of Datalog includes:

Atom: p(t_1, t_2, ..., t_n), where p is the predicate, n is its arity, and each t_i is a term (variable or constant), such as has_child(X, Y);

Rule: H :- B_1, B_2, ..., B_m, constructed from atoms, where H is the head atom and B_1, ..., B_m are the body atoms. For example: has_child(X, Y) :- has_son(X, Y);

Fact: a rule with no body and no variables, such as has_child(Alice, Bob) :- ;

A Datalog program is a collection of rules.
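The following is a minimal sketch of naive bottom-up evaluation of a Datalog program: rules are applied repeatedly until no new facts appear. It assumes the common convention that variables start with an uppercase letter and constants with a lowercase one (so the constants are written `alice`, `bob` here), and the `ancestor` rules are an added illustration of recursion, not part of the original text.

```python
def is_var(term):
    return term[0].isupper()

def match(atom, fact, binding):
    """Extend binding so that atom equals fact, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    binding = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding

def evaluate(rules, facts):
    """Apply all rules until a fixpoint is reached; return all facts."""
    facts = set(facts)
    while True:
        derived = set()
        for head, body in rules:
            bindings = [{}]
            for atom in body:                 # join body atoms left to right
                extended = []
                for b in bindings:
                    for fact in facts:
                        nb = match(atom, fact, b)
                        if nb is not None:
                            extended.append(nb)
                bindings = extended
            for b in bindings:
                derived.add((head[0],) + tuple(b.get(t, t) for t in head[1:]))
        if derived <= facts:
            return facts
        facts |= derived

FACTS = {("has_son", "alice", "bob"), ("has_child", "bob", "carol")}
RULES = [
    (("has_child", "X", "Y"), [("has_son", "X", "Y")]),
    (("ancestor", "X", "Y"), [("has_child", "X", "Y")]),
    (("ancestor", "X", "Z"), [("has_child", "X", "Y"), ("ancestor", "Y", "Z")]),
]

result = evaluate(RULES, FACTS)
print(("ancestor", "alice", "carol") in result)  # True
```

Naive evaluation recomputes every rule in each round; practical engines typically use semi-naive evaluation, which only joins against facts derived in the previous round.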

3. Method based on first-order query rewriting

Query rewriting makes it possible to combine data sources with different data formats efficiently; at the same time, rewriting relates different query languages to one another.

A first-order query is a query expressed in first-order logical form. Datalog is a database query language that also has first-order logical form, so it can serve as the intermediate language: a SPARQL query is first rewritten into Datalog, and the Datalog is then rewritten into SQL queries.
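As a rough illustration of the second rewriting step only (Datalog to SQL), the sketch below translates a single non-recursive rule over binary predicates into a SQL join and runs it on an in-memory SQLite database. The storage layout is an assumption: each predicate is a table with columns `s` and `o`. The function name `rule_to_sql` and the `grandparent` rule are hypothetical, and predicate names are pasted into the SQL text without escaping, which is acceptable only in a sketch.

```python
import sqlite3

COLUMNS = ("s", "o")  # assumed column names of every binary predicate table

def is_var(term):
    return term[0].isupper()

def rule_to_sql(head, body):
    """Translate one non-recursive rule into (sql_text, parameters)."""
    first_seen, tables, where, params = {}, [], [], []
    for i, (pred, *args) in enumerate(body):
        alias = f"t{i}"
        tables.append(f"{pred} AS {alias}")
        for col, term in zip(COLUMNS, args):
            ref = f"{alias}.{col}"
            if not is_var(term):              # constant: filter by value
                where.append(f"{ref} = ?")
                params.append(term)
            elif term in first_seen:          # repeated variable: join condition
                where.append(f"{ref} = {first_seen[term]}")
            else:
                first_seen[term] = ref
    select = ", ".join(first_seen[v] for v in head[1:])
    sql = f"SELECT DISTINCT {select} FROM {', '.join(tables)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

# grandparent(X, Z) :- has_child(X, Y), has_child(Y, Z)
sql, params = rule_to_sql(("grandparent", "X", "Z"),
                          [("has_child", "X", "Y"), ("has_child", "Y", "Z")])

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE has_child (s TEXT, o TEXT)")
db.executemany("INSERT INTO has_child VALUES (?, ?)",
               [("alice", "bob"), ("bob", "carol")])
rows = db.execute(sql, params).fetchall()
print(sql)
print(rows)  # [('alice', 'carol')]
```

Shared variables between body atoms become join conditions, and constants become filters, which is the core of the correspondence between conjunctive Datalog rules and SQL.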

4. Methods based on production rules

A production system is a forward reasoning system that executes rules according to a certain mechanism in order to achieve certain goals. It is similar to first-order logic, but there are also differences. It is used in automated planning and expert systems.

A production system consists of a fact set (working memory), a production (rule) set, and an inference engine:

Fact set / working memory (WM): the collection that stores all facts currently in the system.

A fact (working memory element, WME) describes either an object or a relationship. An object description has the form (type attr_1: val_1 attr_2: val_2 ... attr_n: val_n), where type, attr_i, and val_i are all atoms (constants), for example (student name: Alice age: 24). A relationship description (reification), for example (basicFact relation: olderThan firstArg: John secondArg: Alice), is abbreviated as (olderThan John Alice).

Production set / production memory (PM): the collection of productions. A production is a statement of the form IF conditions THEN actions, where conditions is a set of conditions, also known as the LHS, and actions is a sequence of actions, also known as the RHS.

The LHS is a set of conditions joined by AND. When all conditions in the LHS are met, the rule is triggered. Each condition has the form (type attr_1: spec_1 attr_2: spec_2 ... attr_n: spec_n), where spec_i is a constraint on the value of attr_i.

The RHS is a sequence of actions that are executed in order. The types of actions include ADD pattern, REMOVE i, and MODIFY i (attr spec).

Inference engine: controls the execution of the system. It performs pattern matching (the condition parts of the rules are matched against the facts in the fact set; a rule whose entire LHS is satisfied is triggered and added to the agenda), conflict resolution (one rule is selected from the triggered rules according to a certain policy), and action execution (the RHS of the selected rule is executed, thereby performing operations on the WM).
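The match, resolve, act cycle can be sketched in a few lines. This is a simplified illustration under stated assumptions: facts use the abbreviated tuple form such as (olderThan John Alice), variables are written with a leading `?`, conflict resolution simply prefers the earliest rule, each activation fires at most once, and matching is done by brute force rather than with RETE. The rule and all function names are hypothetical.

```python
def is_var(term):
    return term.startswith("?")

def match(pattern, fact, binding):
    """Extend binding so that pattern equals fact, or return None."""
    if len(pattern) != len(fact):
        return None
    binding = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if binding.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return binding

def find_activations(rules, wm, fired):
    """Pattern matching: collect every not-yet-fired rule instance."""
    agenda = []
    for index, (name, conditions, actions) in enumerate(rules):
        bindings = [{}]
        for cond in conditions:
            bindings = [nb for b in bindings for fact in wm
                        for nb in [match(cond, fact, b)] if nb is not None]
        for b in bindings:
            key = (name, tuple(sorted(b.items())))
            if key not in fired:
                agenda.append((index, key, actions, b))
    return agenda

def run(rules, facts):
    wm, fired = set(facts), set()
    while True:
        agenda = find_activations(rules, wm, fired)            # 1. match
        if not agenda:
            return wm
        _, key, actions, b = min(agenda, key=lambda a: a[:2])  # 2. resolve
        fired.add(key)
        for op, pattern in actions:                            # 3. act
            fact = tuple(b.get(t, t) for t in pattern)
            if op == "ADD":
                wm.add(fact)
            elif op == "REMOVE":
                wm.discard(fact)

RULES = [("transitivity",
          [("olderThan", "?x", "?y"), ("olderThan", "?y", "?z")],
          [("ADD", ("olderThan", "?x", "?z"))])]
FACTS = {("olderThan", "John", "Alice"), ("olderThan", "Alice", "Bob")}

final_wm = run(RULES, FACTS)
print(("olderThan", "John", "Bob") in final_wm)  # True
```

Re-matching every rule against the whole WM in each cycle is the cost that discrimination-network algorithms are designed to avoid.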

Pattern Matching: RETE Algorithm

Pattern matching uses the condition part of each rule to match the current WM. An efficient pattern matching algorithm is the RETE algorithm, proposed by Charles Forgy (CMU) in 1979. It compiles the LHS of the productions into a discrimination network and is a typical algorithm that trades space for time. The process is shown in the figure below:

Introduction to related tools

Drools

Drools is a business rule management system that provides a rule reasoning engine. Its core algorithm is an improved version of the RETE algorithm. It provides a rule definition language and supports embedding Java code.

Jena

Jena is a Java framework for building Semantic Web applications. It provides interfaces for processing RDF, RDFS, and OWL data, and also provides a rule engine. It provides in-memory storage of triples for querying.

RDF4J

RDF4J is an open source framework for processing RDF data, supporting parsing, storage, reasoning, and querying of semantic data. It can be connected to almost any RDF storage system and can be used to access remote RDF stores.

Origin blog.csdn.net/WhiteCattle_DATA/article/details/133316581