Building an Automatic Question Answering System Based on Machine Learning (Repost)

Original source: https://blog.csdn.net/sparkexpert/article/details/52447553 

Automatic question answering is a very active area of natural language processing. It combines knowledge representation, information retrieval, natural language processing, and related technologies. A question answering system lets users state their information needs as natural-language questions rather than keyword combinations; the system analyzes the question and automatically finds an accurate answer in the available data sources. By scope, automatic question answering is divided into open-domain and restricted-domain question answering. Open-domain means that users can ask anything and the system searches massive data for the answer; restricted-domain means that the system declares in advance that it only answers questions in a particular domain and cannot answer questions outside it.

To test whether this approach is feasible, I recently ran an experiment using a question-and-answer corpus from Baidu Zhidao (Baidu Knows).

 

  Specific steps:

(1) Data preprocessing: Convert the raw Baidu Zhidao data into a standardized format and import it into a database. This simplifies later processing and yields the raw data set from which the training data is built.

(2) Build a classifier: Train a text classification model on the given data. When a user asks a test question, the classifier assigns it a category label, which narrows down the knowledge area in which to look for the answer.

(3) Similar question retrieval: Compute the text similarity between the test question and the other questions of the same category in the training corpus, and collect the most similar ones into a set of similar questions.

(4) Answer extraction: Rank all the answers attached to the set of similar questions and return the best one to the user.
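
Steps (3) and (4) can be sketched in Java roughly as follows. This is only a minimal illustration under stated assumptions, not the original implementation: the class `SimilarQuestionRetrieval`, the `QaPair` type, whitespace tokenization, and cosine similarity over term counts are all choices made for this example, since the post does not say which similarity measure or ranking rule was used. The "best answer" here is simply the answer attached to the most similar question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch: every name here is hypothetical, not the original code.
class SimilarQuestionRetrieval {

    /** One corpus entry: category label, question text, and its answer. */
    static final class QaPair {
        final String category;
        final String question;
        final String answer;

        QaPair(String category, String question, String answer) {
            this.category = category;
            this.question = question;
            this.answer = answer;
        }
    }

    /** Counts whitespace-separated tokens (Chinese text would need a word segmenter first). */
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Cosine similarity of two term-count vectors; 0 if either vector is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Step 3: the topK questions of the given category most similar to the test question. */
    static List<QaPair> retrieveSimilar(String testQuestion, String category,
                                        List<QaPair> corpus, int topK) {
        Map<String, Integer> query = termCounts(testQuestion);
        List<QaPair> candidates = new ArrayList<>();
        for (QaPair pair : corpus) {
            if (pair.category.equals(category)) {
                candidates.add(pair);
            }
        }
        // Sort by similarity to the test question, most similar first.
        candidates.sort(Comparator.comparingDouble(
                (QaPair pair) -> cosine(query, termCounts(pair.question))).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(topK, candidates.size())));
    }

    /** Step 4 (simplified): the answer of the most similar question, if any candidate exists. */
    static Optional<String> bestAnswer(String testQuestion, String category, List<QaPair> corpus) {
        List<QaPair> similar = retrieveSimilar(testQuestion, category, corpus, 1);
        return similar.isEmpty() ? Optional.empty() : Optional.of(similar.get(0).answer);
    }

    public static void main(String[] args) {
        List<QaPair> corpus = Arrays.asList(
                new QaPair("network", "how do i reset my router password", "Hold the reset button for ten seconds."),
                new QaPair("network", "why is my wifi slow", "Move closer to the router."),
                new QaPair("cooking", "how long to bake bread", "About forty minutes."));
        System.out.println(
                bestAnswer("how to reset router password", "network", corpus).orElse("no answer found"));
    }
}
```

Restricting the candidates to the predicted category before computing similarity is what keeps the search small; a real system would likely replace the raw term counts with TF-IDF weights or another measure from an existing text-similarity library.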

 

The core technology here is the construction of the classifier. Since no deep learning method was used, only an SVM classifier was tested, and it turned out to be feasible. As for computing the similarity between questions, many ready-made tools are available.
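
To make the classifier step concrete, here is a small sketch of a linear SVM for question categories. It is an assumption-laden illustration rather than the code used in the experiment: the class `SvmQuestionClassifier`, the bag-of-words features, the one-vs-rest scheme, the sub-gradient training loop on the hinge loss, and the hyperparameter values are all hypothetical choices, and a real system would more likely rely on an existing SVM library.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: a one-vs-rest linear SVM trained by sub-gradient
// descent on the hinge loss. All names and hyperparameters are assumptions.
class SvmQuestionClassifier {

    /** Per-category weight vectors (token -> weight) and biases. */
    private final Map<String, Map<String, Double>> weights = new HashMap<>();
    private final Map<String, Double> biases = new HashMap<>();

    /** Bag-of-words counts over whitespace tokens (Chinese text would need segmentation first). */
    static Map<String, Double> features(String text) {
        Map<String, Double> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1.0, Double::sum);
            }
        }
        return counts;
    }

    /** Decision value w.x + b of the binary classifier for one category. */
    private double score(String category, Map<String, Double> x) {
        Map<String, Double> w = weights.get(category);
        double s = biases.get(category);
        for (Map.Entry<String, Double> e : x.entrySet()) {
            s += w.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return s;
    }

    /** Trains one binary SVM per category (that category = +1, all others = -1). */
    void train(List<String> questions, List<String> categories,
               int epochs, double learningRate, double lambda) {
        for (String c : new LinkedHashSet<>(categories)) {
            weights.put(c, new HashMap<>());
            biases.put(c, 0.0);
        }
        double shrink = 1.0 - learningRate * lambda; // L2 regularization factor
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int i = 0; i < questions.size(); i++) {
                Map<String, Double> x = features(questions.get(i));
                for (String c : weights.keySet()) {
                    double y = c.equals(categories.get(i)) ? 1.0 : -1.0;
                    double margin = y * score(c, x); // margin before this update
                    Map<String, Double> w = weights.get(c);
                    w.replaceAll((token, value) -> value * shrink);
                    if (margin < 1.0) { // hinge loss is active: move toward the example
                        for (Map.Entry<String, Double> e : x.entrySet()) {
                            w.merge(e.getKey(), learningRate * y * e.getValue(), Double::sum);
                        }
                        biases.merge(c, learningRate * y, Double::sum);
                    }
                }
            }
        }
    }

    /** Returns the category whose binary classifier gives the highest decision value. */
    String predict(String question) {
        Map<String, Double> x = features(question);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String c : weights.keySet()) {
            double s = score(c, x);
            if (s > bestScore) {
                bestScore = s;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        SvmQuestionClassifier classifier = new SvmQuestionClassifier();
        classifier.train(
                Arrays.asList("router wifi password reset", "wifi signal slow router",
                        "bake bread oven flour", "oven temperature cake flour"),
                Arrays.asList("network", "network", "cooking", "cooking"),
                20, 0.1, 0.01);
        System.out.println(classifier.predict("reset my router"));
    }
}
```

The predicted label is what would be passed on to the similar-question retrieval step, so that similarity is only computed against questions of the same category.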

 

The system was implemented in Java; the test results are as follows:



 

 
