Using the MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 model for zero-shot classification

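Concepts

1. Zero-shot classification: classifying text into categories without any labeled training examples for those categories.

2. NLI (Natural Language Inference): the task of deciding whether a hypothesis sentence follows from a premise sentence.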

3. XNLI (Cross-Lingual Natural Language Inference): an evaluation dataset that covers 15 languages. It draws 750 examples from each of 10 text domains, for a total of 7,500 manually labeled English examples. Translating these into the other languages yields 112,500 annotated pairs. Each example consists of two sentences, a premise and a hypothesis, and the relationship between them falls into one of three categories: entailment, contradiction, or neutral.
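To make the link between NLI and zero-shot classification concrete, here is a minimal sketch in plain Python. It assumes the common recipe: each candidate label is turned into a hypothesis sentence, the NLI model scores it against the text (the premise), and the entailment scores are normalized into label probabilities. The template wording and the function names are illustrative assumptions, and the logits would normally come from the model rather than being typed in by hand.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def build_hypotheses(labels, template="This text is about {}."):
    """Turn each candidate label into a hypothesis sentence (template is an assumption)."""
    return [template.format(label) for label in labels]


def single_label_scores(entailment_logits):
    """Exactly one label applies: softmax the entailment logits across all labels."""
    return softmax(entailment_logits)


def multi_label_scores(entail_contra_logits):
    """Labels are independent: for each label, softmax entailment against contradiction."""
    return [softmax([entail, contra])[0] for entail, contra in entail_contra_logits]
```

In the single-label case the scores sum to 1 across labels, while in the multi-label case each label receives its own probability between 0 and 1.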

Model

1. Manual download: download MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 to a local folder from https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7/tree/main

2. Git download:

git lfs install
git clone https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7
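If Git LFS is not active when the repository is cloned, the large weight files arrive as small text pointers instead of the real data, and loading the model fails later. The following is a minimal sketch of a sanity check for the cloned folder. The helper name check_model_dir and the list of weight file names are assumptions for illustration; check the repository's file listing for the actual names.

```python
from pathlib import Path

# First line of every Git LFS pointer file
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

# Assumed weight file names; the repository may ship one or both
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def check_model_dir(model_dir):
    """Return a list of problems found in a locally downloaded model folder."""
    root = Path(model_dir)
    problems = []

    if not (root / "config.json").is_file():
        problems.append("config.json is missing")

    weights = [root / name for name in WEIGHT_FILES if (root / name).is_file()]
    if not weights:
        problems.append("no weight file found")

    for path in weights:
        # A pointer file starts with the LFS spec line instead of binary weights
        with path.open("rb") as handle:
            head = handle.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            problems.append(f"{path.name} is a Git LFS pointer, not the real weights")

    return problems
```

An empty list means the folder looks usable. If a pointer is reported, running git lfs pull inside the cloned folder should fetch the real files.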

Code:

Save the following as m.py:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline

# Path to the local folder that holds the downloaded model files
model_name = "mDeBERTa-v3-base-xnli-multilingual-nli-2mil7"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# The NLI model labels a (premise, hypothesis) pair as entailment, neutral, or contradiction
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

premise = 'The camera quality of this phone is amazing.'
for aspect in ['camera', 'phone']:
    # The review is the premise; the candidate aspect is passed as the hypothesis
    print(aspect, classifier(premise, text_pair=aspect))

Output:

[ipa@comm-agi-p]$ python m.py
camera [{'label': 'entailment', 'score': 0.9938687682151794}]
phone [{'label': 'entailment', 'score': 0.9425390362739563}]
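The script above passes a bare word as the hypothesis. The transformers library also provides a "zero-shot-classification" pipeline that wraps each candidate label in a hypothesis sentence and returns the labels with their scores. Below is a minimal sketch, assuming the model sits in the same local folder as before. The template wording, the extra "battery" label, the 0.5 threshold, and the helper names are illustrative assumptions, not part of the original script.

```python
def load_zero_shot(model_dir="mDeBERTa-v3-base-xnli-multilingual-nli-2mil7"):
    """Build a zero-shot pipeline from the local model folder."""
    # Imported here so the helper below can be used without loading transformers
    from transformers import pipeline
    return pipeline("zero-shot-classification", model=model_dir)


def labels_above(result, threshold=0.5):
    """Keep the (label, score) pairs whose score reaches the threshold."""
    return [
        (label, score)
        for label, score in zip(result["labels"], result["scores"])
        if score >= threshold
    ]


def main():
    classifier = load_zero_shot()
    result = classifier(
        "The camera quality of this phone is amazing.",
        candidate_labels=["camera", "phone", "battery"],
        hypothesis_template="This text is about {}.",  # assumed wording
        multi_label=True,  # score each label independently
    )
    print(labels_above(result))


# main()  # uncomment to run; requires the local model folder
```

With multi_label=True each label is scored on its own, which suits aspect detection where a review can mention several aspects at once. Setting it to False makes the scores sum to 1 across the labels.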

Origin blog.csdn.net/duzm200542901104/article/details/132676457