ACL 2023 Outstanding Paper | Probing Language Models' Memory and Understanding of Ontology Knowledge

©Author | Wu Weiqi

Affiliation | ShanghaiTech University

Source | PaperWeekly

The emergence of large language models has greatly advanced the field of natural language processing, but these models also have limitations: they may produce content that sounds plausible but is actually wrong or fabricated, a phenomenon known as hallucination. Hallucinations call into question the reliability of large language models in critical tasks and real-world applications.

Model hallucinations may stem from the model lacking, or misinterpreting, relevant knowledge. When humans think about and remember things, ontological knowledge plays an important role in the thinking process. Ontological knowledge covers classes, properties, and the relationships among them; it helps us understand the world, organize and classify information, and derive new knowledge. For language models, we can design probing tasks to examine the knowledge implicitly stored in the model and its learning biases.


Paper title:

Do PLMs Know and Understand Ontological Knowledge?

Paper link:

https://www.aclanthology.org/2023.acl-long.173.pdf

Code link:

https://github.com/vickywu1022/OntoProbe-PLMs


Background

To explore the kinds of knowledge that large models acquire during pre-training, researchers test them with probing tasks. The models' performance on these tasks reveals their learning biases, errors, and limitations in different aspects, and points to ways of improving their performance and reliability. However, existing knowledge probes mainly study models' memorization of factual knowledge, that is, knowledge describing concrete facts, attributes, and relations. For example, we know that in "Journey to the West", "Sun Wukong beats the White Bone Demon three times"; this is a specific piece of factual knowledge.

Compared with factual knowledge, ontological knowledge focuses on classes and properties and the relationships between them. It can describe hierarchical relations between concepts, constraints on properties, and other associations, providing a structured way of understanding world knowledge. The figure below shows an ontology knowledge graph: beyond the factual knowledge that "Sun Wukong beats the White Bone Demon three times", it captures further connections between concepts, including instance type (type), subclass (subclass), subproperty (subproperty), property domain (domain), and property range (range).

[Figure: ontology knowledge graph for the "Journey to the West" example]
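For concreteness, the five relations above can be written as RDFS-style triples. The sketch below is purely illustrative; the class and property names are assumptions for this running example and are not taken from the paper's dataset.

```python
# Ontology relations written as (subject, relation, object) triples.
# Class and property names are illustrative assumptions, not the paper's data.
ontology_triples = [
    ("Sun Wukong",         "rdf:type",           "FictionalCharacter"),  # instance type
    ("FictionalCharacter", "rdfs:subClassOf",    "Character"),           # subclass
    ("defeats",            "rdfs:subPropertyOf", "interactsWith"),       # subproperty
    ("defeats",            "rdfs:domain",        "Character"),           # property domain
    ("defeats",            "rdfs:range",         "Character"),           # property range
]
```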

Ontological knowledge can help models better understand real-world objects and their relationships, and it plays a vital role in many NLP tasks such as question answering. Exploring whether pre-trained language models can memorize and understand ontological knowledge therefore broadens the community's understanding of these models' cognitive abilities, which is especially significant in this era of rapidly developing large models.


Probing Method

We study the encoder-based pre-trained language models BERT and RoBERTa, as well as the decoder-based large model ChatGPT. For the encoder models, we use a prompt-based probing method to test whether the model can predict the correct answer for a masked token from the surrounding context; for the decoder model, we transform the fill-in-the-blank prompts into multiple-choice questions and test whether the model can pick the correct option.
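As a rough sketch of how a cloze probe can be recast as a multiple-choice question for a decoder-only model (the function and prompt wording below are our own assumptions, not the paper's exact template):

```python
def to_multiple_choice(cloze_prompt: str, candidates: list[str]) -> str:
    """Turn a fill-in-the-blank probe into a multiple-choice question
    for a decoder-only model such as ChatGPT (wording is hypothetical)."""
    options = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(candidates))
    return (
        f"{cloze_prompt.replace('[MASK]', '___')}\n"
        "Which option best fills in the blank?\n"
        f"{options}\n"
        "Answer with a single letter."
    )

print(to_multiple_choice("Sun Wukong is a [MASK].", ["monkey", "city", "number"]))
```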

2.1 Memory task

We design five memory subtasks, each probing the pre-trained language model's memorization of one ontological relation:

1. The type of a given instance;

2. The superclass of a given class;

3. The superproperty of a given property;

4. The domain constraint of a given property;

5. The range constraint of a given property.


For the BERT-style models, we probe with both manual prompts and trainable soft prompts, designing a prompt for each ontological relation as shown below. The model ranks candidate words by their predicted log probabilities.

[Table: prompt templates for each ontological relation]
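A minimal sketch of this candidate ranking with Hugging Face Transformers (the prompt and candidates are illustrative; the paper's actual pipeline, e.g. soft prompts and multi-token candidates, is more involved):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def rank_candidates(prompt: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Rank single-token candidates for the [MASK] slot by log probability."""
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_idx = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_idx]
    log_probs = torch.log_softmax(logits, dim=-1)
    scored = []
    for cand in candidates:
        ids = tokenizer(cand, add_special_tokens=False)["input_ids"]
        scored.append((cand, log_probs[ids[0]].item()))  # single-token candidates only in this sketch
    return sorted(scored, key=lambda x: x[1], reverse=True)

print(rank_candidates("Sun Wukong is an instance of [MASK].", ["monkey", "city", "number"]))
```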

2.2 Reasoning tasks

We construct reasoning tasks according to the entailment rules specified in the RDF Schema (RDFS) standard; each reasoning subtask probes the pre-trained language model's ability to reason with one syllogistic rule. For each premise, we distinguish whether it is explicitly included in the model input, and use the memory-task probing results to further distinguish whether it is memorized by the model, so as to examine how the form in which a premise is available affects the model's inference.
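For example, a subclass-based syllogism with both premises given explicitly might be rendered roughly as follows (the prompt wording is an assumption, not the paper's template); when a premise is given implicitly, it is dropped from the input and must come from the model's own memory:

```python
# Both premises explicit in the input; the model must complete the conclusion.
reasoning_prompt = (
    "Sun Wukong is a monkey. "            # premise 1: instance type
    "Monkey is a subclass of animal. "    # premise 2: subclass relation
    "Therefore, Sun Wukong is a [MASK]."  # conclusion; gold answer: "animal"
)
```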


To prevent the model from reaching the correct conclusion by memorizing the hypothesis rather than by reasoning, we replace the specific instances, classes, and properties in the hypothesis prompts with coined pseudowords. For the encoder models, we create these pseudowords by adding new word embeddings that carry no particular semantics.
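A minimal sketch of adding such semantics-free pseudowords to an encoder model (the token strings and the random initialization scheme are assumptions for illustration; the paper's construction may differ):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Register the coined words as whole tokens so they do not split into
# meaningful sub-words.
coined = ["blicket", "wug"]  # hypothetical pseudowords
tokenizer.add_tokens(coined)
model.resize_token_embeddings(len(tokenizer))

# Give each pseudoword an embedding with no particular semantics, e.g. a
# random vector on the same scale as the existing embeddings.
with torch.no_grad():
    emb = model.get_input_embeddings().weight
    scale = emb[: -len(coined)].std()
    for tok in coined:
        emb[tokenizer.convert_tokens_to_ids(tok)] = torch.randn(emb.size(1)) * scale
```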


Experimental Results and Findings

3.1 Memory task

Analysis of the experimental results shows that BERT and RoBERTa can memorize certain ontological knowledge, but far from perfectly.

BERT and RoBERTa beat a strong frequency baseline on the memory tasks. This shows that during pre-training the language models learn not only factual knowledge about entities but also the more abstract ontological relations behind the facts, which is crucial for organizing their understanding of the world. However, their accuracy on the five subtasks still leaves considerable room for improvement, indicating the limits of their memorization of ontological knowledge.


Compared with the BERT models, ChatGPT achieves significantly higher accuracy on the memory tasks.

Since multiple-choice questions are not directly comparable in difficulty to fill-in-the-blank prompts, we also feed the multiple-choice prompts to BERT-base-uncased and compare it with ChatGPT. As the table below shows, ChatGPT is significantly more accurate than BERT-base-uncased on most ontology-related memory tasks, demonstrating a stronger ability to memorize ontological knowledge.

[Table: multiple-choice accuracy of ChatGPT vs. BERT-base-uncased on the memory tasks]

3.2 Reasoning tasks

Analysis of the experimental results shows that BERT and RoBERTa have a limited understanding of ontological knowledge.

The figure below shows the reasoning performance averaged over all inference rules and over the BERT and RoBERTa models. When a premise is explicitly given in the input text, the model significantly improves the rank of the correct answer. However, because the explicit premise contains the very answer to be predicted, this raises the suspicion that the improvement comes not from logical reasoning but from the model's tendency to predict words (and related vocabulary) that appear in its input.

When a premise is given implicitly, the MRR is higher than when it is not given at all. This suggests that, to some extent, pre-trained language models can use the ontological knowledge they have encoded to select the appropriate inference rule. However, no combination of premises yields near-perfect reasoning performance (MRR close to 1), indicating that pre-trained language models are still limited in their understanding of ontological knowledge.

[Figure: reasoning performance (MRR) of BERT and RoBERTa averaged over all inference rules, under different premise conditions]
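For reference, the mean reciprocal rank (MRR) used here averages the reciprocal rank of the gold answer over all probe instances; a quick sketch:

```python
def mean_reciprocal_rank(gold_ranks: list[int]) -> float:
    """MRR over the ranks of the gold answers (1 = ranked first).
    An MRR of 1.0 means the correct answer is always ranked on top."""
    return sum(1.0 / r for r in gold_ranks) / len(gold_ranks)

# e.g. gold answers ranked 1st, 3rd and 2nd on three probe instances
print(mean_reciprocal_rank([1, 3, 2]))  # ≈ 0.611
```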

ChatGPT shows a stronger ability to reason over and understand ontological knowledge.

ChatGPT achieves high accuracy on the various reasoning subtasks when the premises are included in its input or memory. Moreover, its explicit reasoning ability is also better than that of BERT-base-uncased (97.1% vs. 88.2%).


Summary

In this work, we systematically investigate whether pre-trained language models effectively encode ontological knowledge during pre-training and whether they genuinely understand its semantics. We find that language models do memorize some ontological knowledge and can, to a certain degree, follow ontological reasoning rules to reason over this implicit knowledge. However, both their memorization and their reasoning are limited. At the same time, ChatGPT's superior performance on both tasks shows that models' memorization and understanding of ontological knowledge can still be further improved.

