1. Download
GLUE tasks are divided into main tasks (CoLA, MNLI, MRPC, QNLI, QQP, RTE, SST-2, STS-B, WNLI) and an additional task (Diagnostic, hereinafter referred to as AX). Among them, only STS-B is a regression task; the rest are classification tasks. Note also that the validation and test sets of the MNLI dataset each include two parts, matched (abbreviated MNLI-m) and mismatched (abbreviated MNLI-mm), which can be treated as two separate tasks. Among the classification tasks, MNLI-m, MNLI-mm and AX are three-class tasks, and the rest are binary classification tasks.
All main tasks can be downloaded from https://gluebenchmark.com/tasks . The MRPC task downloads as an msi file that needs to be installed, and it contains only a training set and a test set; a validation set has to be split off the training set. The test set of the AX task has two download links: the unlabeled test set can be downloaded from https://gluebenchmark.com/tasks , and the labeled test set from https://www.dropbox.com/s/ju7d95ifb072q9f/diagnostic-full.tsv?dl=1 . The training and test sets for the AX task come from the MultiNLI 1.0 dataset, available at https://cims.nyu.edu/~sbowman/multinli/multinli_1.0.zip .
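Since the MRPC download contains only a training set and a test set, a validation set has to be carved out of the training data. A minimal sketch using scikit-learn's `train_test_split` (the toy examples and the 10% split ratio are my own choices, not prescribed by GLUE):

```python
from sklearn.model_selection import train_test_split

# Toy stand-in for the MRPC training examples; in practice these would be
# the rows read from the downloaded training file.
examples = [f"sentence pair {i}" for i in range(100)]

# Hold out 10% of the training data as a validation set.
train_examples, dev_examples = train_test_split(
    examples, test_size=0.1, random_state=42
)

print(len(train_examples), len(dev_examples))  # 90 10
```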
Different tasks have different evaluation metrics, as shown below.
For CoLA and AX, the evaluation metric is MCC (Matthews correlation coefficient), which you can compute during training with:
from sklearn.metrics import matthews_corrcoef
For SST-2, MNLI-m, MNLI-mm, QNLI, RTE and WNLI, the evaluation metric is accuracy (ACC):
from sklearn.metrics import accuracy_score
For MRPC and QQP, the evaluation metrics are macro-F1 (because micro-F1 equals ACC) and ACC:
from sklearn.metrics import accuracy_score, f1_score
For STS-B, the evaluation metric is the Pearson correlation coefficient:
from scipy.stats import pearsonr
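The four imports above can be wrapped into a single helper that picks the right metric per task. A minimal sketch, where the task-to-metric mapping follows the list above and the function name `glue_metric` is my own:

```python
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef
from scipy.stats import pearsonr

def glue_metric(task, y_true, y_pred):
    """Return the GLUE evaluation metric(s) for a task as a dict."""
    if task in ("CoLA", "AX"):
        return {"mcc": matthews_corrcoef(y_true, y_pred)}
    if task in ("MRPC", "QQP"):
        return {"acc": accuracy_score(y_true, y_pred),
                "f1": f1_score(y_true, y_pred, average="macro")}
    if task == "STS-B":
        return {"pearson": pearsonr(y_true, y_pred)[0]}
    # SST-2, MNLI-m, MNLI-mm, QNLI, RTE, WNLI all use plain accuracy.
    return {"acc": accuracy_score(y_true, y_pred)}

print(glue_metric("RTE", [0, 1, 1, 0], [0, 1, 0, 0]))  # {'acc': 0.75}
```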
2. Submit
Some information must be filled in when submitting. The corresponding explanations can be found in the FAQ at https://gluebenchmark.com/faq , or by hovering the mouse over each input box on the submission page.
What needs attention is the format of the submitted data. For the original English description, see the first question of the FAQ at https://gluebenchmark.com/faq .
- The submitted files must sit in a folder, and that folder is compressed into a zip file. Say the predictions are placed in a folder named glue_results; then glue_results.zip is the file to submit. NOTE: do not nest folders within the folder.
- The files in the folder must be tsv files whose names correspond to the tasks, that is: CoLA.tsv, MNLI-m.tsv, MNLI-mm.tsv, MRPC.tsv, QNLI.tsv, QQP.tsv, RTE.tsv, SST-2.tsv, STS-B.tsv, WNLI.tsv, AX.tsv.
- Each tsv file has only two columns: the first is the IDs and the second is the labels. A header row is required, and the columns are \t-delimited. Note that for the five tasks AX, MNLI-m, MNLI-mm, QNLI and RTE, the labels column must contain the label names. For example, for the MNLI datasets the labels should be ["contradiction", "entailment", "neutral"], so the model outputs must be mapped to these names.
- All 11 task files must be included in the zip, otherwise the submission cannot be completed.
- If public was not selected and you want to see the specific score of each task, find the successfully submitted name on the profile page, then click the name to view the details.
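Putting the format rules together, here is a hedged sketch of writing one prediction file and zipping the folder. The folder name follows the example above; the header names, IDs, and predictions are made up for illustration, and a real submission needs all 11 files:

```python
import csv
import os
import zipfile

os.makedirs("glue_results", exist_ok=True)

# Hypothetical MNLI-m predictions: model class indices mapped to label names,
# as required for AX, MNLI-m, MNLI-mm, QNLI and RTE.
id2label = {0: "contradiction", 1: "entailment", 2: "neutral"}
predictions = [1, 0, 2]  # made-up model outputs

with open("glue_results/MNLI-m.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["index", "prediction"])  # header row is required
    for idx, pred in enumerate(predictions):
        writer.writerow([idx, id2label[pred]])

# Compress into glue_results.zip; files go at the top level of the archive
# so there are no nested folders.
with zipfile.ZipFile("glue_results.zip", "w") as zf:
    zf.write("glue_results/MNLI-m.tsv", arcname="MNLI-m.tsv")
```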