Graphs in Action | JD's Question Answering System Based on Temporal Knowledge Graphs

Reprinted from the official account DataFunSummit





Speaker: Dr. Shang Chao, Researcher at JD Silicon Valley Research Institute

Editor: Zhang Cunwang, Beihang Hangzhou Innovation Research Institute

Production platform: DataFunTalk

Introduction: This article shares recent work in the direction of Temporal Knowledge Graphs: how to build a question answering system over a temporal knowledge graph. It covers the following parts:

  • Background of temporal knowledge graphs

  • Challenges in question answering over temporal knowledge graphs

  • The TSQA method

  • Analysis of results

01

Background of Temporal Knowledge Graphs

First, let me introduce the concept of a temporal knowledge graph. In essence, a temporal knowledge graph extends the traditional static knowledge graph along the time dimension. Most knowledge graph research has focused on static knowledge graphs, which are composed of triple relations, i.e., nodes and edges. Many graphs, however, actually carry dynamic information, such as changes over time on the nodes and on the edges. The main focus of this talk is those temporal changes in the knowledge graph.

The figure above gives an example. If the edges carry time information, the graph can be split along the time dimension into multiple graphs based on the time on each edge, which is equivalent to stringing several static knowledge graphs together in temporal order. Our focus in what follows is research on temporal knowledge graphs.

Compared with a traditional knowledge graph, a Temporal Knowledge Graph carries extra information. Each fact in an ordinary knowledge graph is a triple, whereas each fact in a temporal knowledge graph is a quadruple: in addition to the head entity, tail entity, and relation, it also records the time point or time range over which the relation holds.

For example, the fact that Obama is the president of the United States holds only during 2009-2017, not at all times. The facts in a temporal knowledge graph are therefore time-dependent, and taking time into account is essential. When we ask a question, we need to know the specific time of occurrence to accurately locate the tail entity we want, so how to model time in a temporal knowledge graph is a very hard problem.

Next, I will introduce how representation learning and relation prediction are done on temporal knowledge graphs.

These methods fall into two categories. The first category splits the quadruples of a temporal knowledge graph into time-specific triples, and then applies triple-based modeling methods to the temporal knowledge graph. I will give two examples of this type. The first is a time-transformation-based method: in the HyTE paper, each time is mapped to a hyperplane, so different time points are mapped to different hyperplanes, and modeling the relations on each hyperplane reduces to modeling triple relations. Extending this logic yields the second example: slicing the graph. The original temporal knowledge graph is a very large graph, but at each time point it is actually a subgraph; within a given time period, not all edges hold. By picking out the edges that hold in each time period, we obtain multiple time-varying subgraphs. This is how TeMP splits the graph into multiple subgraphs; each subgraph is then convolved with a GNN, and an RNN models the temporal sequence across the graphs, connecting the GNN outputs. This method considers representation learning not only in the spatial dimension but also in the temporal dimension.

The second category of methods takes a different route: instead of converting quadruples into triples, it models quadruples directly. There are two angles here. The first is relatively simple and follows TransE: in TransE, a head entity reaches its corresponding tail entity after a relation translation. Extending this idea to quadruple relations, we can add a time embedding to the head entity and relation embeddings so that their sum matches the representation of the tail entity; this is the most straightforward way to model quadruple relations. There are also more sophisticated methods based on tensor decomposition in complex space. For example, for triples, ComplEx multiplies three complex vectors and takes the real part as the score; extending this idea to quadruples yields TComplEx. This family of methods is relatively new and works well.
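To make the scoring idea concrete, here is a minimal numpy sketch of a TComplEx-style score: the score of a quadruple is the real part of the element-wise product of complex embeddings for the head, relation, time, and the conjugated tail. The embedding dimension and the random vectors below are illustrative assumptions, not values from the talk.

```python
import numpy as np

def tcomplex_score(head, rel, time, tail):
    """TComplEx-style score of a quadruple (head, rel, tail, time):
    Re(<head, rel * time, conj(tail)>), using element-wise products
    of complex embedding vectors."""
    return float(np.real(np.sum(head * rel * time * np.conj(tail))))

# Toy embeddings just to exercise the function.
rng = np.random.default_rng(0)
d = 8
h = rng.normal(size=d) + 1j * rng.normal(size=d)
ones = np.ones(d, dtype=complex)

# With identity relation and time, a fact whose tail equals its head
# scores sum(|h|^2), which is strictly positive.
print(tcomplex_score(h, ones, ones, h) > 0)  # True
```

Training then pushes true quadruples toward high scores and corrupted ones toward low scores, exactly as ComplEx does for triples but with the extra time factor.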

In summary, representation learning for temporal knowledge graphs proceeds mainly along these two directions: model triples and connect them over time, or model quadruples directly.

02

Challenges in Question Answering over Temporal Knowledge Graphs

Now that we understand what a temporal knowledge graph is and how to learn its embeddings, we need to explore what kinds of problems arise when building a question answering system on top of one.

To explore question answering over temporal knowledge graphs, we first need to look at what temporal, or time-related, questions look like.

These questions fall roughly into two categories. The first category is relatively simple questions, such as Simple Time in the table, which ask for the missing time in a quadruple of the temporal knowledge graph. For the question "In which year did Obama become the president of the United States?", the question answering system needs to find or predict the corresponding time. Of course, the missing element of the quadruple may also be an entity; that is a Simple Entity question, where the object to be predicted is an entity. For example, given the head entity, relation, and time, the system needs to tell us which tail entity corresponds. Both are simple questions because they involve only one-hop relations and can be answered from a single quadruple.

Beyond these, we encounter harder questions, called complex questions here. The table divides complex questions into three types. Before/After questions ask about events occurring before or after a given event, and require the content of the corresponding event, or the information the question asks about, as the answer; this type must consider whether the time shift is forward or backward. First/Last questions are comparative: for example, many people have been president of the United States, so who was the first? The system must find, over a large time span, who was president at the earliest time point. As another example, to answer which team Messi played his first game for, you need to find the earliest time point first, then run inference, and finally locate the answer. Time Join questions involve two facts, such as World War II and the President of the United States, whose time intervals overlap; what we want is the fact at the intersection of the two intervals, and the more accurately we find that intersection, the better we can answer the question. Since answering these questions requires temporal reasoning, complex questions are harder to solve.

Let us use an example to explain how to understand complex questions and find their answers in the knowledge graph. In this example, we ask who the president of the United States was before World War II. Based on this question, we first search the temporal knowledge graph using the two keywords "World War II" and "President of the United States" to reduce the answer search space. A temporal knowledge graph is very large, containing millions or tens of millions of nodes, so there is no need to look for answers among all of them; we only need to look at the nodes relevant to the current question. Starting from the World War II and US President nodes, together with their related nodes, a subgraph is extracted from the temporal knowledge graph for the next step of the search.

In the second step, analyzing the question reveals that it is a Before question involving the logical inference "before World War II". We need to find World War II and its time range (here, 1939-1945), and then recognize that Before corresponds to an operation on the time axis: what we need is the time preceding the 1939-1945 period, roughly 1938. Given the inferred time point, the keywords "President of the United States" turn "Who is the President of the United States" into a factual query on the temporal knowledge graph, which finds that Franklin Roosevelt was president at that time; that is the answer we want.

In this process, we can see that the question answering system must reduce the search space, perform temporal reasoning over multiple steps, and then make predictions based on the inferred time. It is a multi-step reasoning process, and this is the basic idea for solving complex questions in temporal knowledge graph question answering.
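As a toy illustration of this multi-step idea, the sketch below hard-codes a few hypothetical quadruples (the entity names, years, and the `answer_before` helper are all made up for illustration) and answers a Before question by locating the event's time span, stepping back on the time axis, and then issuing a factual lookup:

```python
# Hypothetical toy facts: (head, relation, tail, start_year, end_year).
KG = [
    ("Franklin Roosevelt", "president_of", "USA", 1933, 1945),
    ("Harry Truman", "president_of", "USA", 1945, 1953),
    ("World War II", "occurred_in", "World", 1939, 1945),
]

def answer_before(event, relation, tail):
    """Answer 'who held <relation> to <tail> before <event>?' in three
    steps: find the event's time span, shift to a point just before it,
    then look up the fact that holds at that point."""
    start, _end = next((s, e) for h, r, t, s, e in KG if h == event)
    point = start - 1                    # 'before' = step back on the axis
    for h, r, t, s, e in KG:
        if r == relation and t == tail and s <= point < e:
            return h

print(answer_before("World War II", "president_of", "USA"))
# Franklin Roosevelt
```

TSQA does this reasoning in embedding space rather than with symbolic interval lookups, but the sequence of steps is the same.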

To better handle complex questions in temporal knowledge graph question answering, three sub-problems need to be solved:

  • Understand the input question and infer what temporal semantics are implicit in it.

  • Model time points and the timeline in the temporal knowledge graph.

  • Resolve inconsistent time expressions: time in the temporal knowledge graph is a value or a range, e.g. 1939 is a value, but the time mentioned in a question is often a textual description, such as one containing Before. How to learn the relationship between the two, modeling both time and time constraints, is a difficult and important problem.

Next, I will introduce an ACL 2021 paper on question answering over temporal knowledge graphs. Its logic is relatively simple: map everything in the temporal knowledge graph into a new space, and make predictions in that embedding space. First, the TComplEx method is used to obtain embeddings of the entities, relations, and times in the temporal knowledge graph. Then, for any question entered by the user, regardless of its difficulty, a language model such as BERT produces a representation of the question, which is mapped into the same space. In that space, the similarity between the question representation and the representations in the temporal knowledge graph is computed one by one to predict the time or entity.

Several aspects are not considered by the method just introduced, or by the methods in some other related work:

  • Inability to handle complex questions: these methods have no temporal reasoning process, so they cannot make predictions for phrases like "the time point before World War II", yet temporal reasoning ability is crucial for complex questions.

  • Pre-trained language models are insensitive to time words: it is hard for a pre-trained language model to understand time words. For example, if simple words such as Before/After appear in a sentence, the model struggles to distinguish whether the meaning points forward or backward. In a pre-trained language model, changing a single simple word sometimes barely affects the representation learned for the whole sentence, which makes the model insensitive to time words. Improving sensitivity to these time words is therefore an important issue.

  • Lack of consideration of implicit relations: traditional temporal knowledge graphs actually contain many implicit relations. The graph contains many times, and each, placed into its corresponding fact, forms a quadruple relation that makes the fact hold; but what is the relationship between these quadruples? For example, events that occurred in 1999 and events that occurred in 1998 have a temporal order, yet the traditional temporal knowledge graph does not model this ordering relation.

Based on the above three issues, we propose our TSQA method, which is introduced in detail below.

03

TSQA method

The ultimate goal of the TSQA method is to improve the time sensitivity of the whole question answering system: not only from the perspective of the temporal knowledge graph, but also from the question answering side.

1. How to infer the correct point in time?

Because temporal question answering is itself a reasoning process, the most important step in this temporal reasoning is how to find the time point of interest more accurately.

So here we propose a time-sensitive question answering sub-module. This module has three main goals:

  • Learning representations for complex problems

  • Add time-sensitive reasoning

  • Limit the search space for answers

First of all, the first step is simply to decompose the question and extract its core keywords. The purpose of extracting core keywords is to search the temporal knowledge graph with them: the search yields a subgraph, and finding answers on the subgraph is both more accurate and easier. Decomposing the question means splitting it into entities and a temporal-question expression template. In the template in the example, the two entities President of the United States and Obama are treated as special tokens for representation modeling.

After decomposing the question, the next step is to search the temporal knowledge graph with the extracted entities and pull out the corresponding subgraph. For example, retrieving with the two entities President of the United States and Obama, we take out the other relations and entities connected to them in the graph to form a subgraph for further retrieval. This subgraph is relatively small, so it is easier to find the answer on it.
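A minimal sketch of this subgraph-extraction step, assuming a quadruple list and a hop limit (both hypothetical; the talk does not give the exact retrieval procedure):

```python
from collections import defaultdict

def neighbor_subgraph(quadruples, seeds, hops=1):
    """Keep only the edges within `hops` of the seed entities extracted
    from the question, shrinking the answer search space to a subgraph."""
    adj = defaultdict(list)
    for head, rel, tail, time in quadruples:
        adj[head].append((head, rel, tail, time))
        adj[tail].append((head, rel, tail, time))
    frontier, kept = set(seeds), set()
    for _ in range(hops):
        nxt = set()
        for node in frontier:
            for head, rel, tail, time in adj[node]:
                kept.add((head, rel, tail, time))
                nxt.update({head, tail})
        frontier = nxt
    return kept

# Toy graph: only edges touching the seed entity survive.
quads = [
    ("Obama", "president_of", "USA", 2009),
    ("Obama", "born_in", "Honolulu", 1961),
    ("Messi", "plays_for", "Barcelona", 2004),
]
sub = neighbor_subgraph(quads, {"Obama"}, hops=1)
print(len(sub))  # 2
```

Raising `hops` trades precision for recall, which echoes the trade-off discussed in the Q&A at the end.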

The right side of the figure is the language model part, whose input is the temporal sentence template produced by decomposing the question. The keywords have been removed from the template and replaced with generic placeholders such as subject or object, so the template becomes much more general when fed into the pre-trained language model for training. For example, the same sentence pattern can be used to ask about Obama or about Trump. Learning representations over such sentence templates makes the question representation more general.
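The entity-masking step can be sketched as follows (the placeholder tokens `[subject]`/`[object]` are my own illustrative choice; the talk only says generic vocabulary such as subject or object is used):

```python
def to_template(question, entities):
    """Replace the question's extracted entities with generic
    placeholders so the language model sees a reusable pattern."""
    roles = ["subject", "object"]
    for entity, role in zip(entities, roles):
        question = question.replace(entity, f"[{role}]")
    return question

template = to_template(
    "Who was the President of the United States before Obama?",
    ["President of the United States", "Obama"],
)
print(template)  # Who was the [subject] before [object]?
```

The same template now covers any (subject, object) pair with this sentence pattern, which is what makes the learned question representation general.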

Of the above two operations, the first reduces the search space and the second represents the question better. In our example, the subgraph extracted by reducing the search space is already very small: compared with the original temporal knowledge graph, less than 5% remains.

After narrowing the search space and learning the question representation, the next step is to predict the corresponding time.

Here we follow the practice of TComplEx, which performs tensor decomposition over complex numbers and scores a fact by the complex product of the head entity, relation, time, and tail entity. Similarly, in the Time Estimation module, the head- and tail-entity representations learned from the graph, together with the question representation from the pre-trained language model, are taken as inputs to a neural network that predicts time. Inside the model, a real part and an imaginary part are computed from the three inputs; the two parts are concatenated and passed through a linear layer to obtain a time representation. We then compute the similarity between this predicted time representation and the representations of the time points in the extracted subgraph to find the correct time point in the temporal knowledge graph. This is the time prediction process.
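A rough numpy sketch of the Time Estimation step just described. The exact network is not specified in this summary, so the interaction form, dimensions, and random weights below are illustrative assumptions: real and imaginary parts of a complex interaction are concatenated, passed through one linear map, and the predicted vector is matched to candidate time embeddings by dot-product similarity.

```python
import numpy as np

def estimate_time(head, tail, question, W):
    """Predict a time embedding from head/tail entity embeddings and the
    question embedding: split a complex interaction into real and
    imaginary parts, concatenate, and apply a linear layer W."""
    z = head * question * np.conj(tail)        # complex interaction
    feats = np.concatenate([z.real, z.imag])   # length-2d feature vector
    return W @ feats

def nearest_time(pred, time_embs):
    """Return the candidate time point whose embedding is most similar
    to the predicted time representation."""
    return max(time_embs, key=lambda t: time_embs[t] @ pred)

rng = np.random.default_rng(1)
d = 8
head, tail, q = (rng.normal(size=d) + 1j * rng.normal(size=d) for _ in range(3))
W = rng.normal(size=(d, 2 * d))
pred = estimate_time(head, tail, q, W)

# Toy candidates: one aligned with the prediction, one opposed.
candidates = {1938: pred.copy(), 1946: -pred}
print(nearest_time(pred, candidates))  # 1938
```

In the real model, W is trained and the candidates are the time points of the extracted subgraph rather than toy vectors.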

The above predicts time, but the answer to a question is not always a time; it may also be an entity, so entity prediction is also required. The time predicted in the previous step is used as input to entity prediction, together with the representation of the known entity and the representation of the question, to predict the missing entity. For example, for the question "Who was the President of the United States before Obama?", this amounts to first predicting the time point before Obama, then adding the President-of-the-United-States information to predict that the president before Obama was George W. Bush.

The prediction steps above mirror the logic of how a human understands and answers a temporal question: first infer the exact time point, then use the inferred time point to help predict the final answer entity. It is, in effect, a multi-step reasoning process.

2. How to improve the sensitivity of time words?

The above described how to perform temporal reasoning and predict the answer entity through it. But in this process the model is still insensitive to time words, and we need to address that sensitivity problem.

For example, the questions "What happened before the given event?" and "What happened after the given event?" differ only in the single word before versus after, yet their answers are completely different, since one searches forward and the other backward. Pre-trained language models struggle to learn such subtle changes and do not know what kind of shift a change in a time word implies.

To address this, we construct contrastive question pairs. If a word from our dictionary appears in the question, we replace it with its counterpart: if first appears, replace it with last; if before appears, replace it with after; and conversely, if after appears, replace it with before. After this transformation, the same question is seen from different temporal angles; for example, there will be samples that look forward and samples that look backward.
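The contrastive-pair construction can be sketched with a small swap dictionary (the dictionary below holds the word pairs named in the talk; the whitespace tokenization is a simplification):

```python
SWAPS = {"before": "after", "after": "before",
         "first": "last", "last": "first"}

def contrastive_question(question):
    """Build the contrast question by flipping each time word."""
    return " ".join(SWAPS.get(w.lower(), w) for w in question.split())

print(contrastive_question(
    "Who was the President of the United States before Obama?"))
# Who was the President of the United States after Obama?
```

Each training question thus yields a pair that differs only in temporal direction, which is exactly the signal the Dual model below exploits.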

With the contrastive samples, we build a Dual model consisting of two parts: one performs temporal reasoning and answer search on the original question, and the other takes the contrast question as input. The two parts share the same structure; only the inputs differ, which leads to different loss values, so two losses are defined here. The first is a temporal-order loss. For example, for "Who was the President of the United States before Obama?", the predicted answer time is before Obama became president, while for the contrast question "Who was the President of the United States after Obama?", the predicted answer time is after Obama's presidency ended, so the latter's predicted time point must come after the former's. The two are thus temporally ordered: we define the Before->After transformation as label 0 and After->Before as label 1. The transformations between the other word pairs in the contrastive dictionary are defined similarly, e.g. First->Last is 0 and Last->First is 1.
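A sketch of such a temporal-order loss, assuming each branch's predicted time has been projected to a scalar position on the timeline (the scalar projection and the sigmoid/BCE form are my assumptions; the talk only states that a 0/1 label encodes the expected order):

```python
import math

def time_order_loss(t_orig, t_contrast, label):
    """BCE-style ordering loss over two predicted scalar time positions.
    label = 0: the original question's time should come first
    (Before -> After); label = 1: it should come last (After -> Before)."""
    p = 1.0 / (1.0 + math.exp(-(t_orig - t_contrast)))  # P(orig is later)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

# The correct ordering (Before-question time precedes After-question
# time) is penalized less than the reversed one.
print(time_order_loss(0.2, 0.8, 0) < time_order_loss(0.8, 0.2, 0))  # True
```

Minimizing this loss pushes the two branches' time predictions into the order the label demands, which is how the swap of a single time word becomes a training signal.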

The other loss is defined on the non-intersection of entities. In general, the answer entities of the original question and the contrast question differ: in the earlier example, the president before Obama and the president after Obama cannot be the same person. Therefore, an entity labeled 1 when predicting the original question's answer is labeled 0 when predicting the contrast question's answer, and the entities predicted for the two questions have no intersection.

Through these two losses, we hope the model captures the impact of changing time words during learning, which improves its sensitivity to time words and lets it find answers more effectively and accurately.

3. How to model the timeline?

Next, we describe how to model the timeline. The two issues just introduced were how to perform temporal reasoning in the question answering part and how to improve the model's sensitivity to time words. One problem remains: the order of events is ignored when learning the temporal knowledge graph's representations, and there is no module that models this order.

For example, the two facts "Obama is the President of the United States" and "Biden is the President of the United States" become two quadruple facts once time is attached to the relation. But each time is local to its own fact, and the order between the two facts is not expressed. To solve this, we made a simple attempt: add a Time position embedding on top of the regular embedding, borrowing the concept of position embedding from transformers. If the time points are laid out on a timeline in order, each has a position before or after the others, and a Time position embedding can be added; the formula is exactly the same as in the transformer.
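Since the formula is the transformer's sinusoidal one, a minimal sketch looks like this, indexed by a time point's position on the ordered timeline (the dimension value is illustrative):

```python
import numpy as np

def time_position_embedding(pos, d):
    """Sinusoidal position embedding, the transformer formula applied to
    a time point's rank on the timeline:
    emb[2i] = sin(pos / 10000^(2i/d)), emb[2i+1] = cos(pos / 10000^(2i/d))."""
    i = np.arange(d // 2)
    angles = pos / (10000.0 ** (2 * i / d))
    emb = np.empty(d)
    emb[0::2] = np.sin(angles)
    emb[1::2] = np.cos(angles)
    return emb

# Distinct, order-aware vectors that can be added to the regular time
# embeddings of consecutive timeline positions.
e0, e1 = time_position_embedding(0, 8), time_position_embedding(1, 8)
```

Because nearby positions get similar but distinct embeddings, the representation now carries where each time point sits on the axis, not just which fact it belongs to.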

In addition, we add a BCE loss function, treating the order of predicted time points as a binary classification problem. For example, for 1995 and 2020, a binary classifier must distinguish which comes first. Adding this time-order loss during learning ensures the model better captures temporal relations.


04

Analysis of results

Next, we introduce the model's performance.

1. Dataset introduction

The dataset used in the experiments comes from an ACL paper of the previous year. From the question categories, we can see there are simple questions and complex questions; what we want to solve is how to reason better on complex questions. From the answer categories, questions divide into entity questions and time questions.

2. Experimental results

We compare against four models. EmbedKGQA is a traditional embedding-based method, with T-EaE-add and T-EaE-replace as improved variants; CronKGQA is the model from the ACL 2021 paper introduced earlier. The table shows that TSQA greatly improves the overall results, in particular reducing errors on complex questions by 32%, which matches our goal of better reasoning on complex questions and indicates that our temporal reasoning is very effective.

We also analyzed the results in more depth to verify which specific categories of complex questions the model improves most, and by how much. The table lists each sub-type of question. The relative improvements on the three types (Before/After, First/Last, Time Join) are 75%, 94%, and 56%, respectively. The improvement is largest on First/Last questions and smallest on Time Join questions, because Time Join questions are relatively easy and previous methods already did fairly well on them. We can also see that Before/After questions are the hardest: to predict them well, one needs to know what operation to perform on the time axis and to define what kind of time is being sought, which remains a difficult problem.

3. Ablation Experiment

We also ran ablation experiments to better understand which parts of the model's design matter most for the final results, and thus which modules are most important. The "-" in the table indicates removing that part from TSQA. NG and TE are the two modules with the greatest impact. NG is the Neighbor Graph selection: only about 5% of the original graph is kept as the search subgraph, and answers are inferred within it, so this part strongly affects the overall results; the more accurately the subgraph is found, the larger the gain. TE is the Time Estimation temporal-reasoning part: predicting time better and modeling the multi-step reasoning process greatly improves the final results. Overall performance rises from 0.661 to 0.757, a very large improvement, and on complex questions from 0.412 to 0.583, also a very clear gain.

05

Summary

The goal of our whole model is to improve the time sensitivity of question answering systems based on temporal knowledge graphs. This sensitivity covers not only performing temporal reasoning in the question answering part and increasing sensitivity to time words, but also modeling time better when learning the temporal knowledge graph's representations. Starting from these three problems, we designed corresponding modules to improve overall time sensitivity.

That is the core content of our paper. Thank you very much for the opportunity to share it with you!

06

Q&A

Q1: How to judge whether the found subgraph is suitable? Or how to choose the appropriate subgraph?

A: How to select a suitable subgraph is indeed a very important step, but the method we use here is not complicated. For a question such as "Obama is the president of the United States", the core words "Obama" and "President of the United States" are obvious. After extracting them, we build a subgraph around these two core words, which may be 1-hop, 2-hop, or multi-hop; the number of hops depends on the question. In this example, not many hops of information are needed to answer; finding the right time suffices. More complex questions may require extracting a larger subgraph and reasoning over it to find the answer more accurately. From another perspective, this is a balance: it depends on whether you need higher recall or higher precision, so it is a trade-off.

Q2: In the reasoning process mentioned, is it better to reason via graph path search, or to add machine learning methods? That is, what are the advantages or differences between the rule-based method and the deep-learning-based method?

A: This involves two aspects. On one hand, so-called logical reasoning, or the recently popular symbolic approach, achieves inference by adding explicit logical information; on the other hand, one can directly build a neural network to learn. Which is better? There is no clear answer, but we feel there is no usable path for temporal reasoning here, because what we care about is a time prediction, and on a path in the graph, however you walk, you cannot obtain a change in time: you only move from one point to another and then on to other points. Only by jumping along the time axis can you reach other points through changes in time, so path selection on the time axis matters a great deal. But in a traditional temporal knowledge graph, we have no way to model changes along the time axis, so path selection remains difficult.

Q3: Could you summarize the main contributions of the paper? Can these contributions be extended to other time-sensitive problems?

A: The core point of our paper is improving time sensitivity, which contains three main contributions. The first is that when a question involves time, the temporal reasoning process inside the question can be modeled. The second is to increase sensitivity to time words while modeling that temporal reasoning, so the model understands the transformations of time words. And because all of the question answering builds on the embeddings learned beforehand, our third contribution is modeling time better in that earlier representation learning, i.e. modeling the time axis itself.

That's all for today's sharing, thank you all.



01 / Speaker


Dr. Shang Chao

Researcher at JD Silicon Valley Research Institute 


Shang Chao is currently a research scientist at JD Silicon Valley Research Institute. He received his Ph.D. from the Department of Computer Science, University of Connecticut. His research focuses on graph neural networks and natural language processing. Recently, he has worked on representation learning for knowledge graphs, the design of question answering systems, and applications of graph neural networks to time series data and biochemistry.


02 / About DataFun

DataFun: focused on sharing and exchange around big data and artificial intelligence applications. Launched in 2017, it has held more than 100 offline and 100 online salons, forums, and summits in Beijing, Shanghai, Shenzhen, Hangzhou, and other cities, inviting more than 2,000 experts and scholars to share. Its official account, DataFunTalk, has produced 700+ original articles with millions of reads and 140,000+ followers.


OpenKG

OpenKG (Chinese Open Knowledge Graph) aims to promote the openness, interconnection, and crowdsourcing of knowledge graph data centered on Chinese, and to promote the open-source release of knowledge graph algorithms, tools, and platforms.




Origin blog.csdn.net/TgqDT3gGaMdkHasLZv/article/details/126899720#comments_26619967