GLUE Data Download and Submission

1. Download

GLUE tasks are divided into main tasks (CoLA, MNLI, MRPC, QNLI, QQP, RTE, SST-2, STS-B, WNLI) and one additional task (Diagnostic, hereinafter referred to as AX). Among them, only STS-B is a regression task; the rest are classification tasks. Note that the validation and test sets of the MNLI dataset each include two parts, matched (abbreviated MNLI-m) and mismatched (abbreviated MNLI-mm), which can be treated as two separate tasks. Among the classification tasks, MNLI-m, MNLI-mm, and AX are three-class tasks, and the rest are binary classification tasks.
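To make the task types concrete, here is a small summary as a Python dict (a convenience mapping for this post, not part of any GLUE library):

# Task type and number of classes for each GLUE task
# (the regression task has no class count).
GLUE_TASKS = {
    "CoLA":    ("classification", 2),
    "MNLI-m":  ("classification", 3),
    "MNLI-mm": ("classification", 3),
    "MRPC":    ("classification", 2),
    "QNLI":    ("classification", 2),
    "QQP":     ("classification", 2),
    "RTE":     ("classification", 2),
    "SST-2":   ("classification", 2),
    "STS-B":   ("regression", None),
    "WNLI":    ("classification", 2),
    "AX":      ("classification", 3),
}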

All the main tasks can be downloaded from https://gluebenchmark.com/tasks. The MRPC task is downloaded as an msi file and needs to be installed; it contains only a training set and a test set, so the training set needs to be split to obtain a validation set. The test set of the AX task has two download links: the plain unlabeled test set can be downloaded at https://gluebenchmark.com/tasks, while the labeled test set is at https://www.dropbox.com/s/ju7d95ifb072q9f/diagnostic-full.tsv?dl=1. For training, the AX task uses the MultiNLI 1.0 dataset; the link is https://cims.nyu.edu/~sbowman/multinli/multinli_1.0.zip.
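For the two direct links above, a minimal download sketch using only the Python standard library (the local file names are my own choice):

import urllib.request

# Labeled AX (diagnostic) test set
urllib.request.urlretrieve(
    "https://www.dropbox.com/s/ju7d95ifb072q9f/diagnostic-full.tsv?dl=1",
    "diagnostic-full.tsv",
)

# MultiNLI 1.0 (training data for AX)
urllib.request.urlretrieve(
    "https://cims.nyu.edu/~sbowman/multinli/multinli_1.0.zip",
    "multinli_1.0.zip",
)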

Different tasks have different evaluation metrics, as shown in the figure below.

  • For CoLA and AX, the evaluation metric is MCC (Matthews correlation coefficient); during training you can call:
from sklearn.metrics import matthews_corrcoef
  • For SST-2, MNLI-m, MNLI-mm, QNLI, RTE, and WNLI, the evaluation metric is ACC (accuracy), which can be computed with:
from sklearn.metrics import accuracy_score
  • For MRPC and QQP, the evaluation metrics are macro-F1 (because micro-F1 = ACC) and ACC, which can be computed with:
from sklearn.metrics import accuracy_score, f1_score
  • For STS-B, the evaluation metric is the Pearson correlation coefficient, which can be computed with:
from scipy.stats import pearsonr
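
As a quick usage sketch of the four calls above, on dummy predictions made up for illustration:

from sklearn.metrics import matthews_corrcoef, accuracy_score, f1_score
from scipy.stats import pearsonr

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

print(matthews_corrcoef(y_true, y_pred))          # MCC for CoLA / AX
print(accuracy_score(y_true, y_pred))             # ACC for SST-2, MNLI, QNLI, RTE, WNLI
print(f1_score(y_true, y_pred, average="macro"))  # macro-F1 for MRPC / QQP

s_true = [2.5, 0.0, 4.1, 3.3]
s_pred = [2.3, 0.4, 3.9, 3.0]
print(pearsonr(s_true, s_pred)[0])                # Pearson r for STS-B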

[Figure: GLUE tasks]

2. Submit

You need to fill in some information when submitting. The corresponding explanations can be found in the FAQ at https://gluebenchmark.com/faq, or you can see them by hovering the mouse over each input box on the submission page.

What needs attention is the format of the submitted data. If you want to read the original English version, check the first question of the FAQ at https://gluebenchmark.com/faq directly.

  1. The submitted data must be placed in a single folder, and this folder is compressed into a zip file. Say the predictions are placed in a folder named glue_results; then the file to submit is glue_results.zip. NOTE: do not nest folders within folders.
  2. The files in the folder must be tsv files whose names correspond to the tasks, that is: AX.tsv, CoLA.tsv, MNLI-m.tsv, MNLI-mm.tsv, MRPC.tsv, QNLI.tsv, QQP.tsv, RTE.tsv, SST-2.tsv, STS-B.tsv, WNLI.tsv.
  3. Each tsv file has only two columns: the first column is IDs and the second column is labels; a header row is required, and the columns are delimited by \t. Note that for the five tasks AX, MNLI-m, MNLI-mm, QNLI, and RTE, the labels column must contain the label names rather than integers. For example, for the MNLI datasets the labels should come from ["contradiction", "entailment", "neutral"], so the model outputs must be mapped to these names (see the sketch after this list).
  4. The zip must contain the files for all 11 tasks, otherwise the submission cannot be completed.
  5. If you did not select public and want to see the specific score of each task, find the name of the successful submission on the profile page, then click the name to view the details.
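
Putting rules 1-4 together, here is a minimal sketch that writes one prediction file and builds the zip. The index/prediction header names follow the GLUE sample submission (double-check them against the FAQ), and the prediction values are made up for illustration:

import os
import zipfile

# Map integer model outputs to GLUE label names for MNLI-style tasks
# (required for AX, MNLI-m, MNLI-mm, QNLI, RTE; see rule 3).
MNLI_ID2LABEL = {0: "contradiction", 1: "entailment", 2: "neutral"}

def write_tsv(path, predictions):
    """Write a two-column submission file: index and prediction."""
    with open(path, "w") as f:
        f.write("index\tprediction\n")  # header row is required
        for i, p in enumerate(predictions):
            f.write(f"{i}\t{p}\n")

os.makedirs("glue_results", exist_ok=True)

# Example: MNLI-m predictions come out of the model as integers ...
mnli_m_preds = [0, 2, 1, 1]                        # dummy outputs
labels = [MNLI_ID2LABEL[p] for p in mnli_m_preds]  # ... and must be mapped
write_tsv("glue_results/MNLI-m.tsv", labels)

# (Write the other ten task files the same way, then zip.)
# The arcname keeps everything in one top-level folder,
# with no folders nested inside it.
with zipfile.ZipFile("glue_results.zip", "w") as zf:
    for name in os.listdir("glue_results"):
        zf.write(os.path.join("glue_results", name),
                 arcname=os.path.join("glue_results", name))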

Origin: blog.csdn.net/qq_35357274/article/details/125541230