Building a knowledge-graph question answering system for Jingdong Mall in Python: introduction to the system logic

Here is a simple diagram:

As the diagram shows, the main process is the content of the box. Before it can run, some preparation is needed, which is the content of the dotted box.

  • Question preprocessing

What question preprocessing has to do depends on what we want to get out of the question. First, we want the object the question asks about, that is, its subject: here, the name of a computer. This involves named entity recognition from natural language processing. Second, we want to know what the user is asking, that is, the user's intent. Understanding the intent requires a text representation. The most basic text representation is the one-hot form; the experiment uses the tf-idf tool in sklearn.
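To make the text representation step concrete, here is a minimal pure-Python sketch of tf-idf. It is not the sklearn tool used in the experiment; it only illustrates the same idea, using the smoothed idf formula documented for sklearn's TfidfVectorizer. Splitting questions into single characters, and the helper names char_tokens, fit_idf and tfidf_vector, are assumptions made for this illustration.

```python
import math
from collections import Counter


def char_tokens(text):
    """Split a question into single characters, ignoring whitespace."""
    return [ch for ch in text if not ch.isspace()]


def fit_idf(documents):
    """Compute a smoothed inverse document frequency for every token."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each token once per document.
        doc_freq.update(set(char_tokens(doc)))
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1, so no token gets weight 0.
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1
            for tok, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Return an L2-normalised tf-idf vector as a {token: weight} dict."""
    counts = Counter(tok for tok in char_tokens(text) if tok in idf)
    weights = {tok: n * idf[tok] for tok, n in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {tok: w / norm for tok, w in weights.items()} if norm else {}


# Three sample questions; "nm" stands in for the computer name.
questions = ["nm的评论是多少", "nm好评率", "nm运行速度"]
idf = fit_idf(questions)
vec = tfidf_vector("nm好评率", idf)
```

Characters that occur in every question, such as the letters of the nm label, get the lowest idf, so the vector is dominated by the characters that actually distinguish one kind of question from another.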

  • Extract key information

For the named entity recognition step of key information extraction, the experiment actually uses part-of-speech tagging, because a part-of-speech tagging tool can tag computer names, as follows:

nm 评论 (nm reviews)

Once we tag the parts of speech in the question, the word tagged nm is the computer name.
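The lookup of the nm-tagged word can be sketched as follows. The article does not name the tagging tool, so tag_question below is a hypothetical stand-in: a toy dictionary-based tagger that marks known computer names as nm and everything else as x. Only extract_computer_name corresponds to the step described above.

```python
def tag_question(question, computer_names):
    """Toy tagger (an assumption): known computer names get 'nm', the rest 'x'."""
    # Try longer names first so that the longest match wins; skip empty names.
    names = sorted((n for n in computer_names if n), key=len, reverse=True)
    tagged = []
    i = 0
    while i < len(question):
        for name in names:
            if question.startswith(name, i):
                tagged.append((name, "nm"))
                i += len(name)
                break
        else:
            # No known name starts here: tag the single character as 'x'.
            tagged.append((question[i], "x"))
            i += 1
    return tagged


def extract_computer_name(tagged):
    """Return the first word tagged 'nm', or None if there is none."""
    for word, tag in tagged:
        if tag == "nm":
            return word
    return None


names = ["联想小新Pro14", "联想小新"]
tagged = tag_question("联想小新Pro14好评率", names)
computer = extract_computer_name(tagged)
```

A real part-of-speech tagger would also segment the rest of the question into words; the toy version tags leftover text character by character, which is enough to show how the nm word is picked out.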

  • Training Question Classifier

We want to find out what the user is asking: is it about the reviews left after buying a computer, or about the configuration of a particular computer? Notice that, in thinking about this, we are already classifying the question in our heads. In fact, many problems in natural language processing can be abstracted as classification problems, and we have now done the same with the user's intent.

That raises three questions: what are the main aspects users ask about, how many categories do they fall into, and how do we identify which category a given question belongs to? The most direct answer to the first two is to list every aspect users might ask about and group them into categories. For the third, we use a classifier, but training a classifier requires data. The solution is to take each category in turn, think about how users would phrase questions in that category, and write those phrasings down. This gives us the training data for that category.
 

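nm的评论是多少 (how many reviews does nm have)
nm得了多少分 (what score did nm get)
nm的评论有多少 (how many reviews does nm have)
nm的评分 (nm's rating)
nm的评论数是 (nm's review count is)
nm电脑评论数是多少 (what is the review count of the nm computer)
nm评论 (nm reviews)
nm的评论数是多少 (what is nm's review count)
nm这台电脑的评论数是多少 (what is the review count of this nm computer)
nm的好评 (nm's positive reviews)
nm好评率 (nm positive review rate)
nm好评 (nm positive reviews)
nm差评 (nm negative reviews)
nm速度 (nm speed)
nm运行速度 (nm running speed)
nm反应 (nm responsiveness)
nm外观 (nm appearance)
nm画质 (nm picture quality)
nm清晰 (nm clarity)
nm舒适 (nm comfort)
nm好评度 (nm positive review rating)
nm的好评度是多少 (what is nm's positive review rating)
nm买的人多吗 (do many people buy nm)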
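As a rough illustration of training a classifier on hand-written questions like these, here is a small pure-Python sketch. The article does not say which classifier it uses; a multinomial naive Bayes over single characters is an assumption chosen for brevity. The class-0 questions come from the list above, while the class-1 (price) questions are hypothetical examples written for this sketch.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesQuestionClassifier:
    """Multinomial naive Bayes over characters with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()             # questions seen per label
        self.token_counts = defaultdict(Counter)  # character counts per label
        self.vocab = set()

    def fit(self, questions, labels):
        for question, label in zip(questions, labels):
            self.class_counts[label] += 1
            self.token_counts[label].update(question)
            self.vocab.update(question)
        return self

    def predict(self, question):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for tok in question:
                if tok in self.vocab:  # ignore characters never seen in training
                    score += math.log((self.token_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_questions = [
    "nm的评论是多少", "nm的评论数是多少", "nm好评率", "nm差评",  # 0: reviews
    "nm的价格是多少", "nm多少钱", "nm价格", "nm卖多少钱",        # 1: price (hypothetical)
]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]

clf = NaiveBayesQuestionClassifier().fit(train_questions, train_labels)
review_label = clf.predict("nm好评")
price_label = clf.predict("nm多少钱")
```

The labels are integers so that they can double as indexes into a numbered list of question templates.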
  • Question template

In the previous step, training the question classifier, we sorted user questions into many categories. In this module we abstract each category into a template. For example, the many ways a user might ask about the configuration of a particular computer can all be abstracted into:

nm 电脑配置 (nm computer configuration)

nm and nnt are just placeholder labels and can be renamed at will.

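0:nm 评论 (reviews)
1:nm 价格 (price)
2:nm 类型 (type)
3:nm 简介 (overview)
4:nm 配置 (configuration)
5:nnt 介绍 (introduction)
6:nnt 电脑重量 (computer weight)
7:nnt 电脑评论数 大于 x (computer review count greater than x)
8:nnt 电脑好评度 大于 x (computer positive review rating greater than x)
9:nnt 颜色 (color)
10:nnt 处理器 (processor)
11:nnt 显卡 (graphics card)
12:nnt 包邮 (free shipping)
13:nnt 有货 (in stock)
14:nnt 快递 (delivery)
15:nnt 屏幕刷新率 (screen refresh rate)
16:nnt 固态 (solid-state drive)
17:nnt 产地 (place of origin)
18:nnt 厚度 (thickness)
19:nnt 屏幕色域 (screen color gamut)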
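Templates in this "index:label keywords" form are easy to load and fill in. The sketch below shows one possible way to do it; the function names parse_templates and fill_template are made up for this illustration, and the value given to the nnt slot is arbitrary, since the text does not spell out what nnt stands for.

```python
# A few of the templates listed above, in their raw "index:body" form.
TEMPLATE_LINES = [
    "0:nm 评论",
    "1:nm 价格",
    "4:nm 配置",
    "7:nnt 电脑评论数 大于 x",
]


def parse_templates(lines):
    """Map each template index to its whitespace-separated tokens."""
    templates = {}
    for line in lines:
        index, _, body = line.partition(":")
        templates[int(index)] = body.split()
    return templates


def fill_template(template, slots):
    """Replace placeholder tokens (nm, nnt, x) with concrete values."""
    return " ".join(slots.get(tok, tok) for tok in template)


templates = parse_templates(TEMPLATE_LINES)
new_question = fill_template(templates[0], {"nm": "联想小新Pro14"})
threshold_question = fill_template(templates[7], {"nnt": "游戏本", "x": "1000"})
```

Because the placeholders are ordinary tokens, renaming nm or nnt only requires changing the keys of the slots dictionary.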
  • Query the answer

Once we know what the user wants to ask, we can run a query that matches the request and return the answer. This part is mainly about how to operate the graph database.
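The text does not name the graph database or its schema, so the sketch below uses a plain in-memory dictionary of (subject, relation) pairs as a stand-in for it. The triples, their placeholder values, and the function names build_index and answer are all hypothetical; a real system would send an equivalent lookup to the graph database in its own query language.

```python
# Hypothetical triples (computer, relation, value); the values are placeholders.
TRIPLES = [
    ("联想小新Pro14", "评论", "placeholder review summary"),
    ("联想小新Pro14", "价格", "placeholder price"),
]


def build_index(triples):
    """Group values by (subject, relation) so that lookups are direct."""
    index = {}
    for subject, relation, value in triples:
        index.setdefault((subject, relation), []).append(value)
    return index


def answer(new_question, index):
    """Answer a filled-in 'name relation' question such as '联想小新Pro14 评论'."""
    # Split on the last space so that names containing spaces stay intact.
    parts = new_question.rsplit(maxsplit=1)
    if len(parts) != 2:
        return []
    subject, relation = parts
    return index.get((subject, relation), [])


index = build_index(TRIPLES)
result = answer("联想小新Pro14 评论", index)
```

An unknown computer or relation simply yields an empty list, which the caller can turn into a "no answer found" reply.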

Now that you know all the main modules, let me string them together. First, take the user's question and extract its key information, mainly the computer name. Then classify the question to see what it is asking, which predicts the question template. Once we have the template, replace the abstract label in it with the specific information, such as the computer name, to get a new question. The following is a practical example:

result=que.question_process("联想小新Pro14好评率")

Key information extracted: 联想小新Pro14 (Lenovo Xiaoxin Pro14)
Question classification gives the question template: 0:nm 评论 (reviews)
Substituting the name gives a new question:

Lenovo Xiaoxin Pro14 positive rate

Then query based on this new question and get the answer back.


Origin blog.csdn.net/weixin_45897172/article/details/130996857