If you need the source code and data sets, please like, follow, and leave a private message in the comments.
1. Introduction to Q&A Smart Customer Service
QA is short for Question-and-Answer: a QA system retrieves answers to the questions users raise and replies in natural language that users can understand.
From the perspective of application domain, question answering systems can be divided into limited domain question answering systems and open domain question answering systems.
According to the document base, knowledge base, and technology used to generate answers, they can be divided into natural language database question answering systems, conversational question answering systems, reading comprehension systems, question answering systems based on common question sets (FAQs), and knowledge-base-based question answering systems.
Functional Architecture of Smart Q&A Customer Service
A typical question answering system includes question input, question understanding, information retrieval, information extraction, answer ranking, answer generation, and result output. First, the user asks a question; the system then retrieves relevant information from the knowledge base, extracts candidate answer feature vectors from it according to specific rules, and finally filters and ranks the candidates before returning the best result to the user.
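The pipeline above can be sketched end to end with a toy example. The knowledge base, the word-overlap matching, and all function names here are illustrative assumptions, not the article's implementation:

```python
# Toy sketch of the QA pipeline: understand -> retrieve -> rank -> answer.
# KNOWLEDGE_BASE and the word-overlap scoring are made-up placeholders.

KNOWLEDGE_BASE = {
    "business hours": "We are open 9:00-18:00, Monday to Friday.",
    "return policy": "Items can be returned within 7 days of purchase.",
}

def understand(question):
    # Question understanding: normalize the input (toy version).
    return question.lower().strip("?!. ")

def retrieve(query):
    # Information retrieval: collect entries sharing words with the query.
    words = set(query.split())
    return [(k, v) for k, v in KNOWLEDGE_BASE.items() if words & set(k.split())]

def rank(candidates, query):
    # Answer ranking: order candidates by word overlap with the query.
    words = set(query.split())
    return sorted(candidates, key=lambda kv: len(words & set(kv[0].split())), reverse=True)

def answer(question):
    # Result output: best-ranked answer, or a fallback if nothing matched.
    query = understand(question)
    ranked = rank(retrieve(query), query)
    return ranked[0][1] if ranked else "Sorry, I don't know the answer."

print(answer("What is your return policy?"))
```

A real system would replace each placeholder step with the components described below (jieba segmentation, TF-IDF retrieval, a trained classifier), but the control flow is the same.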
Intelligent question answering customer service framework
1: Question processing The question processing step identifies the information contained in the question, determines its topic and topic category (for example, whether it belongs to a general category or a specific one), and then extracts key information related to the topic, such as person, location, and time information.
2: Question mapping Question mapping disambiguates the question the user asked. Mapping is resolved through string similarity matching, synonym tables, and similar techniques, with split and merge operations performed as needed.
3: Query construction Query construction transforms the processed question into a query language the computer can understand, then queries the knowledge graph or database to retrieve candidate answers.
4: Knowledge reasoning Reasoning proceeds from the attributes of the question. If the question's attribute corresponds to information already defined in the knowledge graph or database, the answer can be looked up and returned directly; if the attribute is undefined, an answer must be inferred by machine learning algorithms.
5: Disambiguation and ranking Given the one or more candidate answers returned from the knowledge graph, this step combines the question attributes to disambiguate and prioritize them, and outputs the best answer.
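The string similarity matching used in step 2 can be sketched with the standard library's `difflib` (the project's full code also imports `fuzzywuzzy` for the same purpose). The FAQ entries, the `best_match` helper, and the 0.5 threshold are made-up examples:

```python
import difflib

# Canonical FAQ questions a user input should be mapped onto (made-up examples).
faq = [
    "How do I reset my password?",
    "What are your opening hours?",
]

def best_match(user_question, candidates, threshold=0.5):
    # Score each canonical question against the user input;
    # SequenceMatcher.ratio() returns a similarity in [0, 1].
    scored = [(difflib.SequenceMatcher(None, user_question.lower(), q.lower()).ratio(), q)
              for q in candidates]
    score, match = max(scored)
    # Below the threshold, treat the question as unmapped.
    return match if score >= threshold else None

print(best_match("how to reset password", faq))
```

Synonym tables would be applied before scoring, rewriting variant words to a canonical form so that surface differences do not drag the similarity down.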
2. Smart medical customer service Q&A practice
A customized intelligent customer service program generally needs to select a corpus, remove noise from it, train and predict with an algorithm, and finally provide a human-machine question-and-answer dialogue interface. Based on a medical corpus obtained from the Internet and on the basic principle of cosine similarity, the following intelligent medical customer service application was designed and developed.
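The cosine-similarity principle just mentioned measures the angle between two bag-of-words vectors. A minimal self-contained illustration (the example sentences are invented; the project itself works on segmented Chinese text):

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    # Represent each text as a bag-of-words vector and compute
    # cos(theta) = (A . B) / (|A| * |B|).
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

print(cosine_similarity("flu symptoms fever cough", "fever cough flu"))
```

A score near 1 means the user's question is nearly identical in wording to a stored question; the application answers with the stored reply of the closest match.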
The project structure is as follows
Results display
Below are some cases defined in the csv file
Pre-defined welcome sentences
Run the chatrobot file; the following window pops up. Enter a question, then click Submit Consultation.
Automatically infer answers to questions not in the corpus (often less accurate)
3. Code
Part of the code is shown below. For the full code and data sets, please like, follow, and leave a private message in the comments.
# -*- coding:utf-8 -*-
import csv
import math
import pickle

import jieba
from scipy import sparse
from scipy.sparse import lil_matrix
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

filename = 'label.csv'


def tokenization(filename):
    """Read the (label, question, answer) CSV and pickle the segmented corpus."""
    corpus = []
    label = []
    question = []
    answer = []
    with open(filename, 'r', encoding="utf-8") as f:
        data_corpus = csv.reader(f)
        next(data_corpus)  # skip the header row
        for words in data_corpus:
            # Segment the question with jieba, joining tokens with spaces
            # so that CountVectorizer can tokenize them later.
            tmp = ' '.join(jieba.cut(words[1]))
            corpus.append(tmp)
            question.append(words[1])
            label.append(words[0])
            answer.append(words[2])
    with open('corpus.h5', 'wb') as f:
        pickle.dump(corpus, f)
    with open('label.h5', 'wb') as f:
        pickle.dump(label, f)
    with open('question.h5', 'wb') as f:
        pickle.dump(question, f)
    with open('answer.h5', 'wb') as f:
        pickle.dump(answer, f)
    return corpus, label, question, answer


class tfidf_calculate(object):
    """Vectorizes a new question using the training vocabulary and term counts."""

    def __init__(self, feature_index, frequency, docs):
        self.feature_index = feature_index  # word -> column index from CountVectorizer
        self.frequency = frequency          # sparse term-frequency matrix of the corpus
        self.docs = docs                    # number of training documents
        self.len = len(feature_index)

    def key_count(self, input_words):
        # Count each jieba token in the input question.
        count = {}
        for key in jieba.cut(input_words):
            count[key] = count.get(key, 0) + 1
        return count

    def getTfidf(self, input_words):
        count = self.key_count(input_words)
        result = lil_matrix((1, self.len))
        frequency = sparse.csc_matrix(self.frequency)
        for x in count:
            word = self.feature_index.get(x)
            if word is not None and word >= 0:
                # Total occurrences of this term in the corpus, used as its
                # document frequency in the smoothed IDF formula below.
                feature_docs = frequency.getcol(word).sum()
                tfidf = count[x] * (math.log((self.docs + 1) / (feature_docs + 1)) + 1)
                result[0, word] = tfidf
        return result


def train_model():
    """Fit a TF-IDF weighted naive Bayes classifier over the pickled corpus."""
    with open('corpus.h5', 'rb') as f_corpus:
        corpus = pickle.load(f_corpus)
    with open('label.h5', 'rb') as f_label:
        label = pickle.load(f_label, encoding='bytes')
    vectorizer = CountVectorizer(min_df=1)
    transformer = TfidfTransformer()
    words_frequency = vectorizer.fit_transform(corpus)
    tfidf = transformer.fit_transform(words_frequency)
    # Save the vocabulary and term counts so new questions can be
    # vectorized consistently at answer time.
    saved = tfidf_calculate(vectorizer.vocabulary_,
                            sparse.csc_matrix(words_frequency),
                            len(corpus))
    model = MultinomialNB()
    model.fit(tfidf, label)
    with open('model.h5', 'wb') as f_model:
        pickle.dump(model, f_model)
    with open('idf.h5', 'wb') as f_idf:
        pickle.dump(saved, f_idf)
    return model, tfidf, label


if __name__ == "__main__":
    tokenization(filename)
    train_model()
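The script above only trains and saves the model; at answer time, the chatrobot file (not shown here) would load the pickles and classify the user's question. A minimal sketch of that answering step, with the loaded objects passed in as parameters; the function name `reply` and the fallback message are assumptions, while `getTfidf` and the pickled file layout follow the training script:

```python
def reply(user_question, model, tfidf_helper, labels, answers):
    # Vectorize the question with the saved vocabulary/IDF weights,
    # predict its label, and return the stored answer for that label.
    vector = tfidf_helper.getTfidf(user_question)
    predicted = model.predict(vector)[0]
    for lab, ans in zip(labels, answers):
        if lab == predicted:
            return ans
    return "Sorry, I don't have an answer for that yet."
```

In the real application, `model` and `tfidf_helper` would come from `pickle.load` on `model.h5` and `idf.h5`, and `labels`/`answers` from `label.h5`/`answer.h5`.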