CSDN Q&A module title recommendation task (1): building the basic framework



Team Blog: CSDN AI Group


1 Problem Definition

1.1 Background

In CSDN's Q&A module, many question titles written by beginners carry little useful information, for example:

Save the children!
Big brother save me!!! Help me finish this topic, please God!!!

[Figure: example question; image not preserved]
For the example in the figure, a better title would be "How to add scroll bars to mobile pages?"

Titles like these carry very little useful information, so a reader cannot quickly tell what the question is about. This lowers the efficiency of answerers and hurts the user experience. Such data also degrades downstream NLP tasks such as question classification and question recommendation.

Therefore, to improve title quality, once a user has entered a title and a question description, the system should recommend a more accurate title based on the description and other available information, and prompt the user to revise the title.

1.2 Input

The available input information is as follows:

"id" : 998678,
"title" : "Help, very simple C# problem",
"body": "The teacher asked to build a grade management system, and asked to implement classified login, but if the user of this program is a student, even if the comboBox does not select a student, he can log in smoothly, and the admin does not have this problem. Please tell me why?\r\n\t\r\n\t\r\n\tprivate void button1_Click(object sender, EventArgs e)\r\n {\r\n string sUser = txtUser.Text.ToString();\r\n string sPassword = txtPassword.Text.ToString();\r\n\r\n if (sUser == \"admin\" && sPassword == \"1234\" && comboBoxLeixing.Text == \"admin\")\r\n {\r\n Menuadmin main = new Menuadmin();\r\n main.Show();\r\n this.Hide();\r\n }\r\n\r\n if (sUser == \"Xu Guangrui\" || sUser == \"Cao Guang\" || sUser == \"Cao Ziyue\" || sUser == \"Chen Sijia\" || sUser == \"Chen Xu\" || sUser == \"Huang Wenguang\" ||\r\n sUser == \"Lei Zhangshu\" || sUser == \"Liu Qingqing\" || sUser == \"Qi Shimeng\" || sUser == \"Shen Bin\" || sUser == \"Shuaixing\" || sUser == \"Sun Quanwei\" ||\r\n sUser == \"Wang Heng\" || sUser == \"Wang Rui\" || sUser == \"Xiang Meng\" || sUser == \"Zhang Guoliang\" || sUser == \"Zhang Zongyou\" || sUser == \"Zhang Shumin\"\r\n && sPassword == \"1234\" && comboBoxLeixing.Text == \"Student\")\r\n {\r\n Menustudent main = new Menustudent();\r\n main.Show();\r\n this.Hide();\r\n }\r\n\r\n if (sUser == \"Liu Zhaoliang\" || sUser == \"Long Long\" || sUser == \"Feng Wei\" || sUser == \"Liu Shanyong\" ||\r\n sUser == \"Yin Forest\" || sUser == \"Cheng Leli\" || sUser == \"Liu Yan\" || sUser == \"Zhao Junwei\"\r\n && sPassword == \"1234\" && comboBoxLeixing.Text== \"Teacher\") \r\n {\r\n Menuteacher main = new Menuteacher();\r\n main.Show();\r\n this.Hide();\r\n }\r\n \r\n else\r\n label3.Text = \"Username or password is wrong, please re-enter!\";\r\n }",
"tag_id" : 95,
"tag_name" : "c language"

The input consists of the five fields above, where title is the title to be improved. Currently only two fields, title and body, are used as input.

1.3 Output

An improved question title.

2 Solution

This article frames the problem as a text summarization task in NLP. The implementation steps are as follows:

2.1 Data preprocessing

At present, preprocessing consists of the following operations:

  1. Remove irrelevant information. For example: code snippets, URLs, irrelevant characters, etc.;
  2. Split paragraphs into sentences. Split based on delimiters, such as newline, period, question mark, exclamation point, etc.
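
The two preprocessing steps above can be sketched roughly as follows. This is a minimal illustration and not the article's actual implementation: the regular expressions, the "looks like code" heuristic, and the function names clean, split_sentences, and preprocess are all assumptions made for this example.

```python
import re

# Hypothetical patterns; the article does not give its exact rules.
URL = re.compile(r"https?://\S+")
# Heuristic: a line is treated as code if it ends in ';', '{' or '}',
# or contains a typical operator such as '==', '&&' or '||'.
CODE_LINE = re.compile(r"[;{}]\s*$|==|&&|\|\|")
# Sentence-ending punctuation: Chinese and Western marks, plus a period
# that is followed by whitespace or the end of the text.
SENTENCE_END = re.compile(r"([。？！?!]+|\.(?=\s|$))")


def clean(text):
    """Step 1: drop URLs, code-like lines, and redundant whitespace."""
    text = URL.sub(" ", text)
    kept = [line for line in text.splitlines() if not CODE_LINE.search(line)]
    return re.sub(r"[ \t]+", " ", "\n".join(kept))


def split_sentences(text):
    """Step 2: split on newlines and on sentence-ending punctuation."""
    # Put a newline after each sentence end, then split on newlines.
    marked = SENTENCE_END.sub(r"\1\n", text)
    # Keep only pieces that contain at least one word character.
    return [s.strip() for s in marked.splitlines() if re.search(r"\w", s)]


def preprocess(body):
    return split_sentences(clean(body))
```

A real system would likely need a more robust code detector, since code pasted into a question body rarely follows one pattern.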

2.2 Model

2.2.1 Coarse sorting

The current solution uses TextRank, a classic extractive model, to rank all input sentences and then selects the top N sentences for recommendation.
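
As an illustration of how TextRank ranks sentences, here is a small pure-Python sketch. It is not the code used in this article: the tokenizer (lowercased word tokens), the parameter defaults, and the function names are assumptions, and Chinese text would need a proper word segmenter in place of tokenize. The similarity measure is the word-overlap formula commonly associated with TextRank.

```python
import math
import re


def tokenize(sentence):
    # Assumption: lowercased word tokens; swap in a word segmenter for Chinese.
    return set(re.findall(r"\w+", sentence.lower()))


def similarity(a, b):
    # Word overlap normalized by the log of the sentence lengths.
    # Sentences with fewer than two tokens get 0 to avoid dividing by log(1) = 0.
    if len(a) < 2 or len(b) < 2:
        return 0.0
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank(sentences, top_n=3, damping=0.85, iterations=50):
    """Return the top_n sentences by TextRank score, best first."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Weighted, undirected sentence graph stored as an adjacency matrix.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    # Fixed number of power-iteration steps instead of a convergence test.
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:top_n]]
```

Sentences that share vocabulary with many other sentences score higher, which is why an isolated sentence such as a greeting tends to fall to the bottom.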

2.2.2 Fine sorting

Because the goal is to recommend a question title, interrogative sentences should be given priority.

A dictionary-based approach is used to identify all interrogative sentences in the input. These sentences are then moved to the top of the coarse sorting results.
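
A minimal sketch of this re-ranking step is shown below. The question-word dictionary is hypothetical, since the article does not list the one it uses, and the names is_question and promote_questions are made up for this example.

```python
import re

# Hypothetical dictionary of question markers (not the article's actual list).
CHINESE_MARKERS = ("？", "怎么", "如何", "为什么", "什么")
# English markers are matched as whole words so that "show" does not match "how".
ENGLISH_MARKERS = re.compile(r"\b(how|why|what)\b", re.IGNORECASE)


def is_question(sentence):
    if "?" in sentence or ENGLISH_MARKERS.search(sentence):
        return True
    return any(marker in sentence for marker in CHINESE_MARKERS)


def promote_questions(ranked_sentences):
    """Move interrogative sentences to the front of the coarse ranking."""
    # sorted() is stable, so the coarse order is kept within each group.
    return sorted(ranked_sentences, key=lambda s: not is_question(s))
```

Using a stable sort means the coarse ranking still decides the order among the questions themselves and among the remaining sentences.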

2.3 Experimental results and error data analysis

The preliminary analysis results are shown in the figure below:
[Figure: preliminary analysis results; image not preserved]
As can be seen from the figure above, the main problems are:

  1. Sample problems: the body of some questions contains only pictures, code snippets, and the like, with no useful Chinese text.
  2. Overly long titles: the current preprocessing is too simple, so some sentences are still too long after splitting. Because the model is an extractive summarization algorithm, it never modifies the input sentences, so some recommended titles are too long, whereas question titles are generally concise.

3 Next steps

  1. Classify the samples. For samples that contain only pictures or code snippets, the information in them must be identified and assessed before a title is recommended.
  2. Refine the titles, for example by using question templates or generative text summarization.

P.S.

This series of articles will continue to be updated. The current approach is still very simple, and the results are not yet satisfactory. I hope that colleagues, teachers, and experts in NLP and related fields can offer suggestions. Thank you!


Origin blog.csdn.net/u010280923/article/details/117200163