[Python natural language processing + Tkinter GUI] Building an intelligent medical customer-service Q&A robot (with source code, dataset, and a detailed walkthrough)

If you need the source code and dataset, please like, follow, bookmark, and leave a private message in the comment area.

1. Introduction to Q&A Smart Customer Service

QA is short for Question-and-Answer. A QA system retrieves answers to the questions users ask and responds in natural language the user can understand.

By application domain, question answering systems fall into restricted-domain and open-domain systems.

By the document base, knowledge base, and technology used to generate answers, they can be divided into natural-language database QA systems, conversational QA systems, reading-comprehension systems, QA systems based on frequently-asked-question (FAQ) sets, knowledge-base QA systems, and so on.

Functional Architecture of Smart Q&A Customer Service

A typical question answering system covers question input, question understanding, information retrieval, information extraction, answer ranking, answer generation, and result output. First the user asks a question; the system retrieves relevant information from the knowledge base, extracts candidate-answer feature vectors from it according to specific rules, and finally filters and ranks the candidates before outputting the best answer to the user.
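The stages above can be sketched end to end. The following toy pipeline is illustrative only (word-overlap retrieval over hypothetical knowledge-base entries), not the project's actual code:

```python
def answer_question(question, knowledge_base):
    """Toy QA pipeline: understand -> retrieve -> rank -> output."""
    tokens = question.strip().lower().split()              # question understanding (toy tokenization)
    candidates = [entry for entry in knowledge_base        # information retrieval
                  if any(tok in entry["question"].lower() for tok in tokens)]
    ranked = sorted(candidates, reverse=True,              # answer ranking by word overlap
                    key=lambda e: sum(tok in e["question"].lower() for tok in tokens))
    return ranked[0]["answer"] if ranked else "Sorry, I don't know."   # result output

kb = [{"question": "What causes a fever?", "answer": "Usually infection or inflammation."},
      {"question": "How to treat a cold?", "answer": "Rest, fluids, and time."}]
print(answer_question("How to treat cold", kb))
```

A real system replaces each stage with something stronger (TF-IDF retrieval, a classifier, knowledge-graph reasoning), but the data flow is the same.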

The intelligent question answering customer service framework comprises the following five steps:

1. Question processing: identify the information contained in the question, judge its topic and topic category (for example, whether it is a general question or belongs to a specific topic), and then extract the key information related to the topic, such as person, location, and time.

2. Question mapping: resolve ambiguity in the question the user asks, solving the mapping problem through string-similarity matching, synonym tables, and the like, splitting or merging the question as needed.

3. Query construction: transform the processed question into a query language the computer can understand, then query the knowledge graph or database to retrieve candidate answers.

4. Knowledge reasoning: reason over the question's attributes. If an attribute is already defined in the knowledge graph or database, the answer can be looked up and returned directly; if it is undefined, an answer must be generated by machine reasoning.

5. Disambiguation and ranking: given one or more candidate answers returned by the knowledge-graph query, disambiguate and prioritize them in combination with the question's attributes, and output the best answer.
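Step 2, question mapping, relies on string-similarity matching. Below is a minimal sketch using the standard library's `difflib` (the project itself imports fuzzywuzzy's `fuzz` for the same purpose; the canonical questions here are made up):

```python
import difflib

# Canonical questions that user input would be mapped onto
# (illustrative examples, not from the project's dataset).
canonical = ["what are the symptoms of diabetes",
             "how is high blood pressure treated"]

def similarity(a, b):
    # Ratio in [0, 1]; 1.0 means the strings are identical.
    return difflib.SequenceMatcher(None, a, b).ratio()

user_question = "diabetes symptoms"
best = max(canonical, key=lambda q: similarity(user_question, q))
print(best)
```

With fuzzywuzzy the call would be `fuzz.ratio(a, b)` (or `fuzz.token_set_ratio` to ignore word order), returning a 0–100 score instead of a 0–1 ratio.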

2. Smart Medical Customer-Service Q&A in Practice

A custom intelligent customer-service program generally needs to select a corpus, remove noise, train and predict with an algorithm, and finally provide a human-machine question-and-answer interface. Based on a medical corpus obtained from the Internet and the basic principle of cosine similarity, the following intelligent medical customer-service application was designed and developed.
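The cosine-similarity principle mentioned above can be demonstrated with scikit-learn, which the project already depends on. The corpus questions below are illustrative, not from the actual dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny stand-in corpus: each entry plays the role of one question in the CSV.
corpus_questions = ["what causes a headache",
                    "how do I lower a fever",
                    "what are flu symptoms"]

vectorizer = TfidfVectorizer()
corpus_tfidf = vectorizer.fit_transform(corpus_questions)   # one TF-IDF row per question

# Vectorize the user's question in the same space and compare by cosine similarity.
query_tfidf = vectorizer.transform(["how can I lower my fever"])
scores = cosine_similarity(query_tfidf, corpus_tfidf)[0]
best_idx = scores.argmax()
print(corpus_questions[best_idx])
```

The question with the highest cosine score is taken as the best match, and its stored answer is returned to the user.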

The project structure is as follows:

The results are shown below.

Below are some of the cases defined in the CSV file:

Predefined welcome sentences:

 

 

Run the chatrobot file; the window below pops up. Enter a question, then click Submit Consultation:

 

For questions not in the corpus, the robot infers an answer automatically (often less accurately):

 

 

3. Code

Part of the code is shown below. The complete code and dataset are available as described at the top of this post.

# -*- coding:utf-8 -*-
import csv
import math
import pickle

import jieba
from scipy import sparse
from scipy.sparse import lil_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# (The GUI part of the project additionally uses tkinter and fuzzywuzzy;
# those imports are not needed in this training script.)


filename = 'label.csv'

def tokenization(filename):
    """Read the CSV corpus and segment each question into words with jieba."""
    corpus = []
    label = []
    question = []
    answer = []
    with open(filename, 'r', encoding="utf-8") as f:
        data_corpus = csv.reader(f)
        next(data_corpus)                 # skip the header row
        for words in data_corpus:         # each row: label, question, answer
            # Join the segmented words with spaces so that CountVectorizer
            # can tokenize the Chinese text later.
            corpus.append(' '.join(jieba.cut(words[1])))
            question.append(words[1])
            label.append(words[0])
            answer.append(words[2])
    
    with open('corpus.h5','wb') as f:
        pickle.dump(corpus,f)
    with open('label.h5','wb') as f:
        pickle.dump(label,f)
    with open('question.h5', 'wb') as f:
        pickle.dump(question, f)
    with open('answer.h5', 'wb') as f:
        pickle.dump(answer, f)

    return corpus,label,question,answer



def train_model():
    """Train a naive-Bayes classifier on TF-IDF features of the corpus."""
    with open('corpus.h5','rb') as f_corpus:
        corpus = pickle.load(f_corpus)

    with open('label.h5','rb') as f_label:
        label = pickle.load(f_label)

    vectorizer = CountVectorizer(min_df=1)
    transformer = TfidfTransformer()
    words_frequency = vectorizer.fit_transform(corpus)   # raw term counts
    tfidf = transformer.fit_transform(words_frequency)   # TF-IDF weights
    # Save what is needed to vectorize new user questions at answer time.
    saved = tfidf_calculate(vectorizer.vocabulary_,
                            sparse.csc_matrix(words_frequency), len(corpus))
    model = MultinomialNB()
    model.fit(tfidf, label)


    with open('model.h5','wb') as f_model:
        pickle.dump(model,f_model)

    with open('idf.h5','wb') as f_idf:
        pickle.dump(saved,f_idf)

    return model,tfidf,label
    
    
    
    
class tfidf_calculate(object):
    """Compute a TF-IDF vector for a new query against the saved vocabulary."""
    def __init__(self, feature_index, frequency, docs):
        self.feature_index = feature_index   # word -> column index mapping
        self.frequency = frequency           # sparse term-frequency matrix
        self.docs = docs                     # number of documents in the corpus
        self.len = len(feature_index)

    def key_count(self, input_words):
        """Count how often each segmented word occurs in the query."""
        count = {}
        for key in jieba.cut(input_words):
            count[key] = count.get(key, 0) + 1
        return count

    def getTfidf(self, input_words):
        """Return the query's TF-IDF vector as a 1 x vocabulary sparse row."""
        count = self.key_count(input_words)
        result = lil_matrix((1, self.len))
        frequency = sparse.csc_matrix(self.frequency)
        for x in count:
            word = self.feature_index.get(x)
            if word is not None and word >= 0:
                word_frequency = frequency.getcol(word)
                # Total occurrences of the word in the corpus, used in place
                # of document frequency in the smoothed IDF below.
                feature_docs = word_frequency.sum()
                # Smoothed TF-IDF: tf * (log((N + 1) / (df + 1)) + 1)
                tfidf = count[x] * (math.log((self.docs + 1) / (feature_docs + 1)) + 1)
                result[0, word] = tfidf
        return result

if __name__=="__main__":
    tokenization(filename)
    train_model()
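For context, the answering side (in the GUI file, which is not shown here) presumably vectorizes the user's question, predicts its category with the trained MultinomialNB, and then matches within that category. Below is a self-contained sketch of that flow, with made-up data and a TfidfVectorizer standing in for the CountVectorizer + TfidfTransformer pair used above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up miniature corpus standing in for label.csv.
questions = ["what causes a fever", "how to treat a cold", "what is hypertension"]
labels    = ["symptom", "treatment", "definition"]
answers   = {"what causes a fever": "Usually infection.",
             "how to treat a cold": "Rest and fluids.",
             "what is hypertension": "Chronically high blood pressure."}

vec = TfidfVectorizer()                # ~ CountVectorizer + TfidfTransformer
X = vec.fit_transform(questions)
clf = MultinomialNB().fit(X, labels)   # same classifier the project trains

def reply(user_q):
    # 1) predict the question's category
    label = clf.predict(vec.transform([user_q]))[0]
    # 2) pick the best-matching question within that category by word overlap
    in_cat = [q for q, l in zip(questions, labels) if l == label]
    best = max(in_cat, key=lambda q: len(set(q.split()) & set(user_q.split())))
    return answers[best]

print(reply("how do I treat a cold"))
```

The real project uses its pickled `model.h5` and `idf.h5` files for step 1 and fuzzy string matching for step 2; the structure of the lookup is the same.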


Origin blog.csdn.net/jiebaoshayebuhui/article/details/128199100