Concepts
1. Zero-shot classification: classifying text without any labeled training examples for the target classes.
2. NLI (Natural Language Inference): the task of deciding whether a hypothesis sentence follows from a premise sentence.
3. XNLI (Cross-Lingual Natural Language Inference): a dataset covering 15 languages. It spans 10 genres with 750 examples each, for a total of 7,500 human-annotated English development and test examples. Translating these into the other 14 languages gives 7,500 × 15 = 112,500 annotated pairs. Each example consists of two sentences, a premise and a hypothesis, and their relationship falls into one of three classes: entailment, contradiction, or neutral.
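To make the link between NLI and zero-shot classification concrete, here is a minimal sketch of the usual reduction: each candidate label is turned into a hypothesis, and the label whose hypothesis is most strongly entailed by the text wins. The names zero_shot_classify and toy_entailment_score and the template string are hypothetical, introduced only for illustration. The toy scorer is a word-overlap stand-in, not a real NLI model.

```python
from typing import Callable, Dict, List


def zero_shot_classify(
    text: str,
    labels: List[str],
    entailment_score: Callable[[str, str], float],
    template: str = "This text is about {}.",
) -> Dict[str, float]:
    """Score each candidate label by how strongly `text` entails its hypothesis."""
    scores = {}
    for label in labels:
        # The text is the premise; the filled-in template is the hypothesis.
        hypothesis = template.format(label)
        scores[label] = entailment_score(text, hypothesis)
    return scores


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model: share of hypothesis words found in the premise."""
    premise_words = set(premise.lower().rstrip(".").split())
    hypothesis_words = hypothesis.lower().rstrip(".").split()
    hits = sum(word in premise_words for word in hypothesis_words)
    return hits / len(hypothesis_words)


scores = zero_shot_classify(
    "The camera quality of this phone is amazing.",
    ["camera", "battery"],
    toy_entailment_score,
)
best_label = max(scores, key=scores.get)
print(best_label, scores)
```

In practice, the toy scorer would be replaced by the entailment probability returned by an NLI model such as the one used below.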
Model
1. Download MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 manually to a local directory from https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7/tree/main
2. Alternatively, clone it with Git:
git lfs install
git clone https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7
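As a possible third option, the same files can likely be fetched from Python with the huggingface_hub package. This is a sketch under the assumption that huggingface_hub is installed; the helper download_model and the constants MODEL_REPO and LOCAL_DIR are hypothetical names chosen for this example.

```python
MODEL_REPO = "MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7"
# Same directory name that `git clone` would create in the current directory.
LOCAL_DIR = "mDeBERTa-v3-base-xnli-multilingual-nli-2mil7"


def download_model(repo_id: str = MODEL_REPO, local_dir: str = LOCAL_DIR) -> str:
    """Download all files of a Hub repository into local_dir and return the path."""
    # Imported lazily so this function can be defined even if the package is missing.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling download_model() once should leave the model in a directory with the same name as the Git clone, so the script below can load it unchanged.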
Code:
Save the following as m.py:
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Path to the local copy of the model (the directory created by the download step)
model_name = "mDeBERTa-v3-base-xnli-multilingual-nli-2mil7"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# The review sentence is the premise; each aspect is passed as the hypothesis via text_pair
for aspect in ['camera', 'phone']:
    print(aspect, classifier('The camera quality of this phone is amazing.', text_pair=aspect))
Output:
[ipa@comm-agi-p]$ python m.py
camera [{'label': 'entailment', 'score': 0.9938687682151794}]
phone [{'label': 'entailment', 'score': 0.9425390362739563}]
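The script above scores one aspect at a time and prints only the top NLI label. As an alternative sketch, the transformers "zero-shot-classification" pipeline accepts a list of candidate labels and returns one score per label. The helper names load_zero_shot_classifier and rank_aspects are hypothetical. This pipeline wraps each label in its own hypothesis template, so the scores should not be expected to match the output above.

```python
from typing import Any, Callable, Dict, List, Tuple


def load_zero_shot_classifier(model_dir: str):
    """Build a zero-shot pipeline from a local model directory."""
    # Imported lazily so the helpers can be defined without loading transformers.
    from transformers import pipeline

    return pipeline("zero-shot-classification", model=model_dir)


def rank_aspects(
    classifier: Callable[..., Dict[str, Any]],
    text: str,
    aspects: List[str],
) -> List[Tuple[str, float]]:
    """Return (aspect, score) pairs in the order the classifier reports them."""
    # multi_label=True scores every aspect independently instead of
    # normalizing the scores across aspects.
    result = classifier(text, candidate_labels=aspects, multi_label=True)
    return list(zip(result["labels"], result["scores"]))
```

A possible usage: classifier = load_zero_shot_classifier("mDeBERTa-v3-base-xnli-multilingual-nli-2mil7"), followed by rank_aspects(classifier, 'The camera quality of this phone is amazing.', ['camera', 'phone', 'battery']).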