An Introductory Course for AI Product Managers (4): Knowledge Graphs

About the author

@毛毛

Product manager. Beauty and talent combined, with a deep understanding of AI and rich hands-on experience.

1 Why understand the knowledge graph

The core of AI is studying how to make computers complete tasks that previously required human intelligence, and human intelligence is reflected in the ability to perceive, reason about, and make decisions on different things. Building AI products is therefore inseparable from research on perception, on reasoning mechanisms, and on intelligent decision-making. In perceptual intelligence, AI has already made many breakthroughs: with cameras, microphones, and other sensing devices, together with speech-recognition and image-recognition models, machines can recognize and understand what they hear, see, and touch.

The development of perceptual intelligence lets us collect massive amounts of data from different sources and in different storage formats. To turn these data into concrete, scenario-based applications, two approaches are common on the market today. One is statistical analysis, which covers most of the data understanding and analysis done in business, including semantic analysis, sentiment analysis, and the visualization of various metrics. The other is decision-making: using the collected or generated data to drive automated decisions, intelligent recommendation, intelligent question answering, and so on. The core technology underpinning this second class of applications is the knowledge graph.


2 What is a knowledge graph

Before understanding what a knowledge graph is, we should first understand the relationship between data, information, and knowledge.

Data refers to sounds, images, and symbols, usually the most primitive records. Without processing and interpretation, data points are isolated from one another.

Information is data that has been processed so that connections are established or attributes are added.

Information can in turn be processed and stored as data; data is the carrier and manifestation of information.

Knowledge is the sum of understanding or experience gained through practice. It can be written-down knowledge or cognition stored in the brain.

For example:

"38.5" This is a piece of data and does not have any meaning.

"Xiao Ming's body temperature is 38.5 degrees" This is a piece of information, and 38.5 is a key indicator.

"The temperature of a normal human body is 36-37 degrees. When the body temperature exceeds the basal body temperature by 1 degree or more, it is considered to be feverish, and different temperature ranges can be divided into low fever and high fever..." This is a piece of knowledge. Cases and experiments are generally accepted as correct.

"Xiao Ming has a fever because his body temperature is 38.5 degrees." This result is derived from knowledge.

A knowledge graph is a technical means of describing knowledge and modeling associations based on a graph model. Knowledge in the real world, or in our minds, is usually expressed in descriptive words; a knowledge graph abstracts such a description into triples of entities, attributes, and relationships, and presents them in the form of a graph. The figure below is a simple knowledge graph: "Cecilia Cheung", "Nicholas Tse", and "Faye Wong" are the main entities; "date of birth", "gender", and "age" are their attributes; "ex-wife", "current girlfriend", and "rival in love" are the relationships abstracted from the knowledge.
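The graph in the figure can be sketched as a small store of (subject, predicate, object) triples; the names come from the example, while the relation identifiers and the storage scheme are simple illustrative choices:

```python
# Knowledge stored as (subject, predicate, object) triples.
triples = [
    ("Cecilia Cheung", "gender", "female"),
    ("Cecilia Cheung", "ex-wife-of", "Nicholas Tse"),
    ("Faye Wong", "current-girlfriend-of", "Nicholas Tse"),
    ("Cecilia Cheung", "rival-in-love-of", "Faye Wong"),
]

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the given (possibly partial) pattern."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# Who is Nicholas Tse's ex-wife?
print(query(predicate="ex-wife-of", obj="Nicholas Tse"))
```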


Knowledge reasoning process

"Ex-wife" knowledge:

A man and a woman established a legal marriage and later dissolved it by agreement or litigation, terminating the rights and obligations between husband and wife. From the man's perspective, the woman is his ex-wife.

Reasoning process:

Cecilia Cheung and Nicholas Tse had a legal marriage that was later dissolved, and Cecilia Cheung is female; therefore Cecilia Cheung is Nicholas Tse's ex-wife.

In knowledge graph terminology, "Cecilia Cheung", "Nicholas Tse", and "Faye Wong" are called nodes; a node can be an entity or an abstract concept. The bold black lines are called edges, and they represent the relationships between entities or concepts; for example, the relationship between "Cecilia Cheung" and "Nicholas Tse" is "ex-wife". Each circle in the figure is a node, and the lines connecting the circles are edges, so a knowledge graph is composed of nodes and edges. An edge between two nodes can express either a relationship or an attribute: the edge between "Cecilia Cheung" and "Nicholas Tse" represents a relationship, while the edge between "Cecilia Cheung" and "Gender: Female" represents an attribute.
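The reasoning step above can be sketched as a single hand-written rule applied to facts in the graph; the fact and relation names are illustrative assumptions, not a fixed schema:

```python
# Facts extracted into the graph (relation names are illustrative).
facts = {
    ("Cecilia Cheung", "had-legal-marriage-with", "Nicholas Tse"),
    ("Cecilia Cheung", "marriage-dissolved-with", "Nicholas Tse"),
    ("Cecilia Cheung", "gender", "female"),
}

def infer_ex_wife(facts):
    """Rule: if X (female) married Y and the marriage was dissolved, X is Y's ex-wife."""
    derived = set()
    for (x, rel, y) in facts:
        if (rel == "had-legal-marriage-with"
                and (x, "marriage-dissolved-with", y) in facts
                and (x, "gender", "female") in facts):
            derived.add((x, "ex-wife-of", y))
    return derived

print(infer_ex_wife(facts))
```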

What can it be used for?

The earliest application of knowledge graphs was improving search engines. Early search relied on hyperlinks between web pages and on matching search keywords against pages containing those keywords, for exact or fuzzy search. But the ultimate form of the Internet is the interconnection of all things, and the ultimate purpose of search is direct search over all things, so keyword matching alone cannot satisfy increasingly rich search needs. In the traditional search mode, when we search "Who is Nicholas Tse's ex-wife?", the result may be a web page containing the sentence "Nicholas Tse's ex-wife is Cecilia Cheung", and only by opening that page do we learn that Nicholas Tse's ex-wife is Cecilia Cheung.

With the knowledge graph above in place, the same search quickly returns "Cecilia Cheung" along with her personal information.


How is a knowledge graph constructed?

Knowledge graph construction usually falls into two categories: open-domain knowledge graphs and vertical-domain knowledge graphs. The graphs built by search engines such as Google and Baidu belong to the open domain, while graphs built around a specific field and scenario, such as e-commerce, finance, graphic information, or lifestyle and entertainment, are vertical-domain knowledge graphs. The two serve different scenarios, but the underlying logic and construction process are similar.

The construction of a knowledge graph involves knowledge representation, knowledge acquisition, knowledge processing, and knowledge utilization.

Knowledge representation:

Put simply, the designer turns acquired knowledge into expressions suited to different problem types and scenarios, and users can then use these designed expressions directly to represent that kind of knowledge.

For example, as a system designer I might define "V" to mean "or", after which other users can use "V" to represent "or".

Knowledge acquisition:

This refers to people using design, program coding, and human-computer interaction to let machines acquire knowledge. For example, humans build knowledge bases from which expert systems draw their knowledge; most of this knowledge is manually entered into the machine. This process is knowledge acquisition.

* A knowledge base is a collection of interrelated facts and data, often used to support an expert system. It is a collection of rules in a professional field, together with all the relationships and data those rules connect.

* An expert system is one research direction of artificial intelligence. It encodes the knowledge or methods of human experts in a certain field into programs and relies on the knowledge in the knowledge base to make decisions.

Knowledge processing:

This covers knowledge processing, logical judgment, reasoning, and knowledge output.

NLP (natural language processing) is the core of knowledge processing.

Knowledge utilization:

Apply the standardized knowledge structure to specific scenarios to create value.

In terms of construction technology, data and algorithms are the underlying support of the knowledge graph. Construction spans multiple stages: information representation, information extraction, information fusion, information reasoning, and information decision-making.

Information sources:

Knowledge graph data can usually be obtained through multiple channels and sources, including text, structured databases, multimedia data, sensor data, and crowdsourced data.

Information representation:

Use a computer language to describe the knowledge in the human brain or in text, in order to support the subsequent reasoning step.

For text data, NLP techniques are typically used to extract knowledge: entity recognition, entity linking, relationship extraction, event extraction, and so on, with RDF triples serving as the basic data model.

The basic logic includes entities, entity attributes, and relationships between entities.

Information extraction:

Structured data and text are the main data forms in use today. To extract information from structured data, existing D2R (database-to-RDF) tools such as D2R Server are generally used.

Extracting information from text mainly involves entity recognition and relationship extraction. Relationships can generally be extracted with feature-template-based methods (manual labeling) or with machine learning.
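A feature-template method at its simplest is a hand-written pattern that pulls a triple out of free text. The sketch below uses one toy regex template for the "ex-wife" relation; real systems use many templates or learned models, and the relation name is an assumption:

```python
import re

# One hand-written extraction template for sentences like
# "X's ex-wife is Y."
TEMPLATE = re.compile(r"(.+?)'s ex-wife is (.+?)\.")

def extract_ex_wife(sentence):
    """Return an (entity, relation, entity) triple, or None if no match."""
    m = TEMPLATE.search(sentence)
    if m:
        husband, wife = m.group(1), m.group(2)
        return (wife, "ex-wife-of", husband)
    return None

print(extract_ex_wife("Nicholas Tse's ex-wife is Cecilia Cheung."))
```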

Information fusion:

When one's own data sources or knowledge base are not sufficient to solve the actual problem, data is fused in from third-party knowledge bases or from structured data collected through other channels. Fusion mainly covers integration at the model (schema) layer and at the data layer. The core problem is avoiding conflicts between entities and relationships, and avoiding the unnecessary redundancy that arises when different identifiers refer to the same entity.
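Data-layer fusion can be sketched as merging records from two sources under one canonical identifier. Here the identifiers, field names, and the "same-as" mapping (assumed to come from an earlier matching step) are all illustrative:

```python
# Two sources describe the same entity under different identifiers.
source_a = {"Q001": {"name": "Nicholas Tse", "birth_year": 1980}}
source_b = {"P_tse": {"name": "Nicholas Tse", "occupation": "actor"}}

# Alignment result from an earlier entity-matching step (assumed known).
same_as = {"P_tse": "Q001"}

def fuse(source_a, source_b, same_as):
    """Merge source_b into source_a, folding aliased IDs into canonical ones."""
    merged = {k: dict(v) for k, v in source_a.items()}
    for ext_id, record in source_b.items():
        canonical = same_as.get(ext_id, ext_id)
        merged.setdefault(canonical, {}).update(record)
    return merged

print(fuse(source_a, source_b, same_as))
```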

Knowledge graph completion and reasoning:

The core of this step relies on completion algorithms. One approach is completion based on ontology reasoning; another is completion based on the graph structure and relationship paths.

Normally, reasoning and completion work together: reasoning finds the missing pieces, and completion fills them in.

Application and decision:

Semantic retrieval, intelligent question answering, intelligent decision-making system, recommendation system.

The following uses a concrete example to walk through the construction of a knowledge graph.

3 Application example: Construction of e-commerce knowledge graph

In today's e-commerce scenarios, transaction volume is enormous, spanning online and offline transactions as well as all kinds of complex shopping scenarios that combine new retail, multilingual platforms, and online-offline integration. The demand for connected data keeps growing, which makes the e-commerce knowledge graph very important for the industry.

An e-commerce knowledge graph is mainly built around commodities and broken down along the main framework of people, goods, and markets.

When performing knowledge representation in e-commerce, the first step is to confirm how many primary and secondary ontologies are involved. The main source of e-commerce knowledge acquisition is knowledge crowdsourcing; the core covers ontology design, the attributes of the products themselves, consumer needs, and the platform's operation and management mechanisms. Data-collection tools differ across platforms and channels, and the storage forms of the collected data also vary slightly: product selling points, details, pictures, and reviews, plus brand and word-of-mouth signals in public-opinion data, involve large amounts of text and image data. Knowledge representation therefore draws on a range of NLP and CNN techniques, and the named-entity recognition system must be able to recognize entity types at large scale and link the recognized subjects to the knowledge graph. Alibaba's e-commerce cognitive graph, for example, mainly includes:

Commodity domain:

Model, size, dimensions, color, flavor, material...

User domain:

Gender, age, style, brand, purchasing power...

LBS domain: shopping scenes, crowds, general categories...


Next, entities need to be described. Beyond basic attributes and attribute values, this is realized through entity tags. Most entity tags change quickly and are usually obtained through knowledge inference; for example, a commodity tag can be computed from ingredient ratios or national industry standards. For example:

Low sugar:

The sugar content per 100 g or 100 ml of the food must not exceed 5 g;

Sugar-free:

The sugar content per 100 g or 100 ml of the food must not exceed 0.5 g.

Through knowledge reasoning, the data in a product's ingredient list can be turned into "sugar-free" and "low-sugar" knowledge points, converting raw data into knowledge tags. Most information is fragmented after extraction, so it must be fused with the established relationship knowledge base or third-party knowledge bases, along with the technical operations of entity alignment and entity disambiguation.
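The tagging rule above can be sketched directly from the two thresholds quoted in the text (0.5 g and 5 g per 100 g); the "regular" fallback label is an assumption for illustration:

```python
# Derive a knowledge tag from raw ingredient-list data.
def sugar_tag(grams_per_100g: float) -> str:
    """Map sugar content per 100 g/100 ml to a commodity tag."""
    if grams_per_100g <= 0.5:
        return "sugar-free"
    if grams_per_100g <= 5.0:
        return "low sugar"
    return "regular"   # assumed fallback label

print(sugar_tag(0.3), sugar_tag(4.0), sugar_tag(12.0))
```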

Entity alignment:

For example, "Dior" is a brand name and "DIOR" is the same brand's stylized English name. Although the text differs, the computer treats them as two separate entities, so similar content needs to be aligned and unified.
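A minimal alignment sketch maps surface forms that normalize to the same key onto one canonical entity. The normalization rule (casefold plus whitespace stripping) is a deliberately simple assumption; production alignment also uses similarity scores and context:

```python
def normalize(name: str) -> str:
    """Simple surface-form normalization for alignment."""
    return name.strip().casefold()

def align(mentions):
    """Map every mention to one canonical form per normalized key."""
    canonical = {}
    for m in mentions:
        canonical.setdefault(normalize(m), m)  # first form seen wins
    return {m: canonical[normalize(m)] for m in mentions}

print(align(["Dior", "DIOR", "dior ", "Chanel"]))
```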

Entity disambiguation:

For example, "apple" is a kind of fruit, but in some contexts it refers to an Apple mobile phone. In such cases, entity disambiguation must be carried out according to the context.
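One toy way to disambiguate by context is to score each candidate sense by how many of its characteristic keywords appear in the sentence. The sense names and keyword lists below are illustrative assumptions:

```python
# Candidate senses and illustrative context keywords for each.
SENSES = {
    "apple (fruit)": {"eat", "juice", "sweet", "tree", "fresh"},
    "Apple (phone)": {"phone", "ios", "screen", "app", "charger"},
}

def disambiguate(sentence: str) -> str:
    """Pick the sense whose keywords overlap most with the sentence."""
    words = set(sentence.lower().split())
    return max(SENSES, key=lambda sense: len(SENSES[sense] & words))

print(disambiguate("I want to eat a fresh apple"))
print(disambiguate("my apple phone screen cracked"))
```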

After the above operations, entities are extracted. During entity extraction, algorithms compute the similarity between entities, relying mainly on the relationships between ontologies in the ontology library for reasoning and completion; for example, if different people bought the same product, or bought similar products, which node should link them in the knowledge graph. Extraction can be automatic or manual: automatic extraction suits large-scale tasks and has major advantages on multi-source heterogeneous data, but extraction and recognition in complex scenarios still require manual intervention.

After the preliminary knowledge graph is built, the quality of the knowledge base needs to be evaluated. When some relationships cannot be extracted from the knowledge base, knowledge reasoning and knowledge graph completion algorithms are needed to optimize the relationship links. Several technical solutions already exist on the market; interested readers can consult further material to explore.


Origin blog.51cto.com/13526224/2575264