PaddlePaddle dialogue model toolkit (2): the Auto Dialogue Evaluation module (ADE)

Overview: Human-machine dialogue is an important challenge in artificial intelligence and has attracted wide attention from academia and industry in recent years. To help developers build dialogue systems faster and more effectively, PaddlePaddle has open-sourced a dialogue model toolkit in its natural language processing model library (PaddleNLP), with a built-in Dialogue General Understanding model (DGU) and an Auto Dialogue Evaluation module (ADE). The previous article introduced the Dialogue General Understanding model (DGU). This article introduces the Auto Dialogue Evaluation module (ADE).

Automatic dialogue evaluation

As dialogue systems develop and mature, how to assess the quality of a dialogue system's replies has become a new research direction.

Automatic dialogue evaluation helps companies and individuals quickly assess the quality of a dialogue system's replies and reduces the labor cost of manual evaluation, so it has significant commercial value.

For example, in customer service, automatic dialogue evaluation can be used to assess service quality and to detect irrelevant answers, helping e-commerce managers understand the performance of their customer service staff and supporting management decisions.

In human-machine dialogue, it can also be used to assess the quality of a bot's replies, serving as an auxiliary criterion for judging a dialogue system and as a reference metric for improving it.

PaddlePaddle ADE module description

2.1 Model introduction

The PaddlePaddle Auto Dialogue Evaluation module (ADE) is mainly used to assess the reply quality of open-domain dialogue systems.

Its input is a text pair (context, reply), and its output is a reply quality score.

Because the matching task (predicting whether a reply matches its context) is naturally related to the automatic evaluation task, the ADE module pretrains on the matching task and then fine-tunes the model with a small amount of labeled data.

The ADE module can therefore be used with no labeled data or with only a small amount of labeled data:

  • With no labeled data, a matching model trained with negative sampling serves as the evaluation tool and can rank the reply quality of multiple dialogue systems.
  • With a small amount of labeled data (human scores for a specific dialogue system or scenario), fine-tuning on top of the matching model can significantly improve evaluation for that dialogue system or scenario.
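The unlabeled setting relies on negative sampling to train the matching model. The sketch below shows one way such training pairs could be built. It assumes negatives are drawn uniformly at random from the replies of other dialogues; the function name `build_matching_pairs` and the toy dialogues are illustrative, not the module's actual data pipeline.

```python
import random


def build_matching_pairs(dialogues, num_negatives=1, seed=0):
    """Turn (context, response) pairs into labeled matching examples.

    Each true pair gets label 1. For each true pair, up to `num_negatives`
    responses taken from other dialogues are paired with the same context
    and labeled 0. (Uniform sampling is an assumption of this sketch.)
    """
    rng = random.Random(seed)
    examples = []
    for i, (context, response) in enumerate(dialogues):
        examples.append((context, response, 1))
        # Candidate negatives: responses that belong to other dialogues.
        candidates = [
            other
            for j, (_, other) in enumerate(dialogues)
            if j != i and other != response
        ]
        count = min(num_negatives, len(candidates))
        for negative in rng.sample(candidates, count):
            examples.append((context, negative, 0))
    return examples


# Hypothetical toy data, for illustration only.
dialogues = [
    ("how is the weather today", "it is sunny and warm"),
    ("what did you have for lunch", "a bowl of noodles"),
    ("which team won last night", "the home team won"),
]
pairs = build_matching_pairs(dialogues, num_negatives=1)
for context, response, label in pairs:
    print(label, "|", context, "->", response)
```

A model trained to separate the label-1 pairs from the label-0 pairs can then score any (context, reply) pair, which is what makes it usable as an evaluation tool without human labels.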

The ADE module provides two models:

  • Matching model: takes a context and a response as input, uses an LSTM to learn representations of the two sentences, computes their bilinear tensor product as the logits, and uses sigmoid_cross_entropy_with_logits as the loss. The output is used to assess how well the response matches the context.
  • Fine-tuning model: built on the matching model, with the sigmoid_cross_entropy_with_logits loss replaced by a squared-error loss for training.
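To make the scoring step concrete, here is a minimal pure-Python sketch of a bilinear score between a context vector and a response vector. It assumes a single output unit, so the logit is c^T W r + b. In the module the two vectors come from the LSTM encoders; here they are hand-written toy values, and the function names are illustrative.

```python
import math


def bilinear_logit(context_vec, response_vec, weight, bias=0.0):
    """Compute logit = c^T W r + b for one (context, response) pair."""
    # W r: project the response vector through the bilinear weight matrix.
    projected = [
        sum(w * r for w, r in zip(row, response_vec)) for row in weight
    ]
    # c^T (W r): dot product with the context vector, plus the bias.
    return sum(c * p for c, p in zip(context_vec, projected)) + bias


def sigmoid(x):
    """Map a logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Toy sentence representations (stand-ins for LSTM outputs).
context_vec = [1.0, 0.0]
response_vec = [0.5, 2.0]
# With an identity weight matrix the bilinear form reduces to a dot product.
weight = [[1.0, 0.0], [0.0, 1.0]]

logit = bilinear_logit(context_vec, response_vec, weight)
score = sigmoid(logit)
print("logit:", logit, "score:", round(score, 4))
```

The learned matrix W lets the model weigh interactions between every pair of context and response dimensions, which a plain dot product cannot do.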

2.2 Evaluation results

We use four different dialogue systems (seq2seq_naive / seq2seq_att / keywords / human) as examples and evaluate them with the automatic dialogue evaluation tool.

1. With no labeled data, the pretrained evaluation tool is used directly. The Spearman correlation coefficients between the automatic scores and the human scores on the four dialogue systems are as follows:

[Images: evaluation result tables]

Getting started with PaddlePaddle ADE

The code below walks you step by step through using the PaddlePaddle Auto Dialogue Evaluation module (ADE).

3.1. Installation Instructions

Environment dependencies:
  • Python >= 2.7
  • cuda >= 9.0
  • cudnn >= 7.0
  • pandas >= 0.20.1
  • PaddlePaddle >= 1.6.0

Clone the project:


git clone https://github.com/PaddlePaddle/models.git
cd models/PaddleNLP/dialogue_model_toolkit/auto_dialogue_evaluation

3.2. Task List

Training in this module consists of two stages:

1) Stage one: train a matching model as an evaluation tool, which can be used to rank the replies of the dialogue systems under evaluation (matching task);

Model structure: the inputs are a context and a response. Each input is embedded, an LSTM learns a higher-level representation of each, the bilinear tensor product of the context and response representations gives the logits, and the sigmoid_cross_entropy_with_logits loss is computed from the logits and the label;

2) Stage two: fine-tune the stage-one matching model with a small amount of labeled data for a dialogue system, which improves evaluation quality (four fine-tuning tasks: human, keywords, seq2seq_att, seq2seq_naive);

Model structure: the fine-tuning stage uses the same representation learning and logits computation as stage one; the difference is that the fine-tuning stage computes the square_error_cost loss;
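The two stages differ only in the loss applied to the logits. The sketch below contrasts the two losses in plain Python for a single example. It assumes the squared error is taken between the model's raw output and the target score, and it uses the standard numerically stable form of sigmoid cross-entropy; the function names are illustrative and this is not the module's own code.

```python
import math


def sigmoid_cross_entropy_with_logits(logit, label):
    """Binary cross-entropy on a raw logit, in a numerically stable form.

    Equivalent to -label*log(sigmoid(x)) - (1-label)*log(1-sigmoid(x)),
    rewritten as max(x, 0) - x*label + log(1 + exp(-|x|)).
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def square_error(prediction, target):
    """Squared error between a prediction and a target score."""
    return (prediction - target) ** 2


logit = 0.0

# Stage one (matching): the label is 1 for a true pair, 0 for a negative.
cls_loss = sigmoid_cross_entropy_with_logits(logit, 1.0)

# Stage two (fine-tuning): the target is a human-assigned quality score.
l2_loss = square_error(logit, 1.0)

print("matching loss:", cls_loss)
print("fine-tuning loss:", l2_loss)
```

Cross-entropy suits the binary match/no-match labels of stage one, while squared error suits stage two, where the target is a graded score rather than a class.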

The second-stage fine-tuning covers the following four dialogue systems:

  • human: a human dialogue system;
  • keywords: a seq2seq dialogue system with keywords;
  • seq2seq_att: a seq2seq dialogue system with attention;
  • seq2seq_naive: a naive seq2seq dialogue system;

3.3 Data Preparation

Download the datasets and the related models:


cd ade && bash prepare_data_and_model.sh

Data path: data/input/data/
Model path: data/saved_models/trained_models/

3.4. Model Configuration

Config file path: data/config/ade.yaml

3.5. Single-machine training

1. Training the first-stage matching model:

Method 1 (recommended): train directly with the script provided in the module:

bash run.sh matching train

Method 2: run the training code yourself:


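export FLAGS_sync_nccl_allreduce=0
export FLAGS_eager_delete_tensor_gb=1  # enable GPU memory optimization

export CUDA_VISIBLE_DEVICES=0  # single-GPU training
#export CUDA_VISIBLE_DEVICES=0,1,2,3  # multi-GPU training

#export CUDA_VISIBLE_DEVICES=  # CPU training
#export CPU_NUM=1  # number of CPUs to use when training on CPU

# Use the GPU only if CUDA_VISIBLE_DEVICES is non-empty.
if [ -z "$CUDA_VISIBLE_DEVICES" ]
then
    use_cuda=false
else
    use_cuda=true
fi

pretrain_model_path="data/saved_models/matching_pretrained"

# Remove a stray regular file that occupies the model path.
if [ -f "${pretrain_model_path}" ]
then
    rm "${pretrain_model_path}"
fi

# Create the model directory if it does not exist yet.
if [ ! -d "${pretrain_model_path}" ]
then
    mkdir "${pretrain_model_path}"
fi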

2. Training the second-stage fine-tuning model:

Method 1 (recommended): train directly with the script provided in the module:

bash run.sh task_name task_type

task_name and task_type are task-specific parameters; see the GitHub project linked at the end of this article for details.

Method 2: run the training code yourself:

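export FLAGS_sync_nccl_allreduce=0
export FLAGS_eager_delete_tensor_gb=1  # enable GPU memory optimization

export CUDA_VISIBLE_DEVICES=0  # single-GPU training
#export CUDA_VISIBLE_DEVICES=0,1,2,3  # multi-GPU training

#export CUDA_VISIBLE_DEVICES=  # CPU training
#export CPU_NUM=1  # number of CPUs to use when training on CPU

# Use the GPU only if CUDA_VISIBLE_DEVICES is non-empty.
if [ -z "$CUDA_VISIBLE_DEVICES" ]
then
    use_cuda=false
else
    use_cuda=true
fi

save_model_path="data/saved_models/human_finetuned"

# Remove a stray regular file that occupies the model path.
if [ -f "${save_model_path}" ]
then
    rm "${save_model_path}"
fi

# Create the model directory if it does not exist yet.
if [ ! -d "${save_model_path}" ]
then
    mkdir "${save_model_path}"
fi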

3.6. Model prediction

1. Prediction with the first-stage matching model:

Method 1 (recommended): predict directly with the script provided in the module:

bash run.sh matching predict

Method 2: run the prediction code yourself:


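export FLAGS_sync_nccl_allreduce=0
export FLAGS_eager_delete_tensor_gb=1  # enable GPU memory optimization

export CUDA_VISIBLE_DEVICES=0  # single-GPU prediction
#export CUDA_VISIBLE_DEVICES=  # CPU prediction
#export CPU_NUM=1  # number of CPUs to use when running on CPU

# Use the GPU only if CUDA_VISIBLE_DEVICES is non-empty.
if [ -z "$CUDA_VISIBLE_DEVICES" ]
then
    use_cuda=false
else
    use_cuda=true
fi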

2. Prediction with the second-stage fine-tuning model:

Method 1 (recommended): predict directly with the script provided in the module:


bash run.sh task_name task_type

task_name and task_type are task-specific parameters; see the GitHub project linked at the end of this article for details.

Method 2: run the prediction code yourself:

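export FLAGS_sync_nccl_allreduce=0
export FLAGS_eager_delete_tensor_gb=1  # enable GPU memory optimization

export CUDA_VISIBLE_DEVICES=0  # single-GPU prediction
#export CUDA_VISIBLE_DEVICES=  # CPU prediction
#export CPU_NUM=1  # number of CPUs to use when running on CPU

# Use the GPU only if CUDA_VISIBLE_DEVICES is non-empty.
if [ -z "$CUDA_VISIBLE_DEVICES" ]
then
    use_cuda=false
else
    use_cuda=true
fi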

3.7. Model evaluation

The module has 5 tasks; the evaluation metrics supported by each task are as follows:

Stage one:
matching: uses four ranking metrics, R1@2, R1@10, R2@10, and R5@10, to assess the model;
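A minimal sketch of how a recall-style ranking metric of this kind can be computed. It assumes the common reading of Rk@n: each context comes with n candidate replies, exactly one of which is correct, and the metric is the fraction of contexts whose correct reply is ranked in the top k by model score. The function names and the toy scores are illustrative.

```python
def hit_at_k(scores, positive_index, k):
    """Return 1.0 if the correct candidate is among the k highest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return 1.0 if positive_index in ranked[:k] else 0.0


def recall_at_k(groups, k):
    """Average hit rate over groups of (candidate_scores, positive_index)."""
    hits = [hit_at_k(scores, positive, k) for scores, positive in groups]
    return sum(hits) / len(hits)


# Two hypothetical contexts with 10 candidate replies each.
groups = [
    # Correct reply (index 0) has the highest score.
    ([0.9, 0.2, 0.4, 0.1, 0.3, 0.5, 0.6, 0.05, 0.15, 0.25], 0),
    # Correct reply (index 2) has the second-highest score.
    ([0.3, 0.8, 0.7, 0.1, 0.2, 0.4, 0.05, 0.15, 0.25, 0.35], 2),
]

r1_at_10 = recall_at_k(groups, k=1)
r2_at_10 = recall_at_k(groups, k=2)
r5_at_10 = recall_at_k(groups, k=5)
print("R1@10:", r1_at_10, "R2@10:", r2_at_10, "R5@10:", r5_at_10)
```

Larger k makes the metric more forgiving, so R1@10 is the strictest of the three 10-candidate metrics and R5@10 the most lenient.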

Stage two:

  • human: uses the Spearman correlation coefficient to measure the relationship between the evaluation model's scores for the system and the actual scores for the dialogue system;
  • keywords: uses the Spearman correlation coefficient to measure the relationship between the evaluation model's scores for the system and the actual scores for the dialogue system;
  • seq2seq_att: uses the Spearman correlation coefficient to measure the relationship between the evaluation model's scores for the system and the actual scores for the dialogue system;
  • seq2seq_naive: uses the Spearman correlation coefficient to measure the relationship between the evaluation model's scores for the system and the actual scores for the dialogue system;
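For reference, the Spearman correlation coefficient is the Pearson correlation of the rank-transformed scores. The sketch below is a small pure-Python version with average ranks for ties. It is an illustration rather than the module's own evaluation code, and the score lists are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over the run of values tied with order[start].
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical model scores and human scores for five replies.
model_scores = [0.91, 0.15, 0.48, 0.70, 0.33]
human_scores = [3, 0, 2, 1, 1]

rho = spearman(model_scores, human_scores)
print("Spearman correlation:", round(rho, 4))
```

Because only ranks matter, the metric rewards a model that orders replies the way human annotators do, even if its raw scores are on a different scale.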

1. Evaluating the first-stage matching model:

Method 1 (recommended): evaluate directly with the script provided in the module:

bash run.sh matching evaluate

Method 2: run the evaluation code yourself:

export CUDA_VISIBLE_DEVICES=  # evaluate on CPU by default
export CPU_NUM=1  # number of CPUs to use when running on CPU

python -u main.py \
     --do_eval=true \
     --use_cuda=false \
     --evaluation_file="data/input/data/unlabel_data/test.ids" \
     --output_prediction_file="data/output/pretrain_matching_predict" \
     --loss_type="CLS"

2. Evaluating the second-stage fine-tuning model:

Method 1 (recommended): evaluate directly with the script provided in the module:

bash run.sh task_name task_type

task_name and task_type are task-specific parameters; see the GitHub project linked at the end of this article for details.

Method 2: run the evaluation code yourself:


export CUDA_VISIBLE_DEVICES=  # evaluate on CPU by default
export CPU_NUM=1  # number of CPUs to use when running on CPU

python -u main.py \
     --do_eval=true \
     --use_cuda=false \
     --evaluation_file="data/input/data/label_data/human/test.ids" \
     --output_prediction_file="data/output/finetuning_human_predict" \
     --loss_type="L2"

3.8. Model inference

1. Saving the inference model for the first-stage matching model:

Method 1 (recommended): save the inference model directly with the script provided in the module:

bash run.sh matching inference

Method 2: run the code that saves the inference model yourself:


export CUDA_VISIBLE_DEVICES=0  # single-GPU inference
#export CUDA_VISIBLE_DEVICES=  # CPU inference
#export CPU_NUM=1  # number of CPUs to use when running on CPU

# Use the GPU only if CUDA_VISIBLE_DEVICES is non-empty.
if [ -z "$CUDA_VISIBLE_DEVICES" ]
then
    use_cuda=false
else
    use_cuda=true
fi

python -u main.py \
     --do_save_inference_model=true \
     --use_cuda=${use_cuda} \
     --init_from_params="data/saved_models/trained_models/matching_pretrained/params" \
     --inference_model_dir="data/inference_models/matching_inference_model"

2. Saving the inference model for the second-stage fine-tuning model:

Method 1 (recommended): save the inference model directly with the script provided in the module:


bash run.sh task_name task_type

task_name and task_type are task-specific parameters; see the GitHub project linked at the end of this article for details.

Method 2: run the code that saves the inference model yourself:


export CUDA_VISIBLE_DEVICES=0  # single-GPU inference
#export CUDA_VISIBLE_DEVICES=  # CPU inference
#export CPU_NUM=1  # number of CPUs to use when running on CPU

# Use the GPU only if CUDA_VISIBLE_DEVICES is non-empty.
if [ -z "$CUDA_VISIBLE_DEVICES" ]
then
    use_cuda=false
else
    use_cuda=true
fi

python -u main.py \
     --do_save_inference_model=true \
     --use_cuda=${use_cuda} \
     --init_from_params="data/saved_models/trained_models/human_finetuned/params" \
     --inference_model_dir="data/inference_models/human_inference_model"

3.9. Service Deployment

The module provides five trained inference_model models for download.

That concludes today's introduction to the PaddlePaddle Auto Dialogue Evaluation module (ADE). Go ahead and try it yourself!

To connect with more deep learning developers, join the official PaddlePaddle QQ group: 796771754.

To learn more about PaddlePaddle, see the following links.

Official website address:
https://www.paddlepaddle.org.cn/

Project Address:
https://github.com/PaddlePaddle/models/tree/release/1.6/PaddleNLP/PaddleDialogue


Origin blog.csdn.net/PaddleLover/article/details/103928023