NLP Road to Frozen Hands (4) - the use of the pipeline function


✅ Study notes of an NLP research beginner



Previous article link: NLP Road to Frozen Hands (3) - Evaluation and the use of metric functions (Metric, taking BLEU and GLUE as examples)


1. The required environment

● Python 3.7+ and PyTorch 1.10+ are required.

● The library used in this article is Hugging Face Transformers; official documentation: https://huggingface.co/docs/transformers/index [a very good open-source project that integrates a great deal of work around the Transformer framework, currently 72.3k ⭐️ on GitHub]

● To install the Hugging Face Transformers library, just run pip install transformers in the terminal [this is the pip installation method]; if you are using conda, run conda install -c huggingface transformers instead.
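
● As a quick sanity check after installation (a minimal sketch, not from the original article), you can print the installed versions:

import torch
import transformers

# Print the versions of the two required libraries and whether a GPU is visible
print("transformers version:", transformers.__version__)
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())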



2. Introduction to pipeline

● Hugging Face provides pipeline, a very lightweight and simple tool with which we can solve a number of straightforward NLP tasks. pipeline offers simple task-specific APIs for named entity recognition, masked language modeling, sentiment analysis, feature extraction, question answering, and more.

● By learning and using pipeline, we can get a more intuitive, hands-on feel for what working on NLP tasks is like.
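
● As a minimal sketch (the device argument is illustrative and not part of the original article), a pipeline can also be given an explicit checkpoint and device instead of relying on the task default:

from transformers import pipeline

# model= selects a specific checkpoint; device=-1 runs on CPU, device=0 on the first GPU
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english",
                      device=-1)
print(classifier("Pipelines make quick experiments easy."))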



3. The use of pipeline

3.1 Sentiment Classification

● If we do not specify a model, it will automatically download the model distilbert-base-uncased-finetuned-sst-2-english to the ~/.cache/torch folder.

● If the download speed is too slow, you can configure the Tsinghua mirror first and then run again: pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

● Task: binary sentiment classification of a given text.

from transformers import pipeline

my_classifier = pipeline("sentiment-analysis")
result = my_classifier("This restaurant is good")
print(result)
# A Chinese sentence ("I don't think this restaurant's food is good"); the default model is English-only
result = my_classifier("我觉得这家餐馆不好吃")
print(result)

● The running results are as follows: the pipeline first downloads the sentiment analysis model, then runs the analysis and prints a list of dicts, each containing a 'label' (POSITIVE/NEGATIVE) and a 'score'.

[Screenshot: the model download progress and the printed sentiment-analysis results]


3.2 Cloze

● If we do not specify a model, it will automatically download the model distilroberta-base to the ~/.cache/torch folder.

● Task: the model fills in the <mask> blank; the score is the probability assigned to the filled-in word.

from transformers import pipeline
from pprint import pprint
my_unmasker = pipeline("fill-mask")
sentence = 'HuggingFace is creating a <mask> that the community uses to solve NLP tasks.'
result = my_unmasker(sentence)
pprint(result)

Output:
[{'sequence': 'HuggingFace is creating a tool that the community uses to solve NLP tasks.',
  'score': 0.17927534878253937,
  'token': 3944,
  'token_str': ' tool'},
 {'sequence': 'HuggingFace is creating a framework that the community uses to solve NLP tasks.',
  'score': 0.11349416524171829,
  'token': 7208,
  'token_str': ' framework'},
 {'sequence': 'HuggingFace is creating a library that the community uses to solve NLP tasks.',
  'score': 0.05243571847677231,
  'token': 5560,
  'token_str': ' library'},
 {'sequence': 'HuggingFace is creating a database that the community uses to solve NLP tasks.',
  'score': 0.034935351461172104,
  'token': 8503,
  'token_str': ' database'},
 {'sequence': 'HuggingFace is creating a prototype that the community uses to solve NLP tasks.',
  'score': 0.028602460399270058,
  'token': 17715,
  'token_str': ' prototype'}]
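
● As a side note (not from the original article), the fill-mask pipeline also accepts a top_k argument if you only want the best few candidates; a minimal sketch:

from transformers import pipeline
from pprint import pprint

my_unmasker = pipeline("fill-mask")
sentence = 'HuggingFace is creating a <mask> that the community uses to solve NLP tasks.'
# top_k controls how many candidate fillers are returned (the default is 5)
pprint(my_unmasker(sentence, top_k=2))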

3.3 Text generation

● If we do not specify a model, it will automatically download the model gpt2 to the ~/.cache/torch folder.

● Task: given a starting sentence/paragraph, the model generates the text that follows; the total length is controlled by max_length.

from transformers import pipeline

text_generator = pipeline("text-generation")
result = text_generator("As far as I am concerned, I will",
               max_length=50,
               do_sample=False)
print(result)

Output:
[{'generated_text': 'As far as I am concerned, I will be the first to admit that I am not a fan of the idea of a "free market." I think that the idea of a free market is a bit of a stretch. I think that the idea'}]
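
● The example above uses greedy decoding (do_sample=False), so it always produces the same continuation. As a minimal sketch (the sampling parameters here are illustrative, not from the original article), you can also sample to get more varied text:

from transformers import pipeline

text_generator = pipeline("text-generation")
# do_sample=True draws tokens stochastically, so the output changes between runs;
# num_return_sequences asks for several continuations at once
results = text_generator("As far as I am concerned, I will",
                         max_length=50,
                         do_sample=True,
                         temperature=0.9,
                         num_return_sequences=2)
for r in results:
    print(r["generated_text"])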

3.4 Named Entity Recognition

● If we do not specify a model, it will automatically download the model dbmdz/bert-large-cased-finetuned-conll03-english to the ~/.cache/torch folder.

● Task: given a passage, the model tags the entities in it, such as names of people, places, cities, and companies.

from transformers import pipeline

ner_pipe = pipeline("ner")
sequence = """Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO,
therefore very close to the Manhattan Bridge which is visible from the window."""
for entity in ner_pipe(sequence):
    print(entity)

Output:
{'entity': 'I-ORG', 'score': 0.9995786, 'index': 1, 'word': 'Hu', 'start': 0, 'end': 2}
{'entity': 'I-ORG', 'score': 0.9909764, 'index': 2, 'word': '##gging', 'start': 2, 'end': 7}
{'entity': 'I-ORG', 'score': 0.9982225, 'index': 3, 'word': 'Face', 'start': 8, 'end': 12}
{'entity': 'I-ORG', 'score': 0.999488, 'index': 4, 'word': 'Inc', 'start': 13, 'end': 16}
{'entity': 'I-LOC', 'score': 0.9994345, 'index': 11, 'word': 'New', 'start': 40, 'end': 43}
{'entity': 'I-LOC', 'score': 0.9993196, 'index': 12, 'word': 'York', 'start': 44, 'end': 48}
{'entity': 'I-LOC', 'score': 0.9993794, 'index': 13, 'word': 'City', 'start': 49, 'end': 53}
{'entity': 'I-LOC', 'score': 0.98625815, 'index': 19, 'word': 'D', 'start': 79, 'end': 80}
{'entity': 'I-LOC', 'score': 0.9514269, 'index': 20, 'word': '##UM', 'start': 80, 'end': 82}
{'entity': 'I-LOC', 'score': 0.9336589, 'index': 21, 'word': '##BO', 'start': 82, 'end': 84}
{'entity': 'I-LOC', 'score': 0.97616535, 'index': 28, 'word': 'Manhattan', 'start': 114, 'end': 123}
{'entity': 'I-LOC', 'score': 0.9914629, 'index': 29, 'word': 'Bridge', 'start': 124, 'end': 130}
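
● In the output above, words are split into sub-word pieces (e.g. 'Hu', '##gging', 'Face'). As a minimal sketch (not from the original article; aggregation_strategy requires a reasonably recent transformers version), the pipeline can merge the pieces of each entity for you:

from transformers import pipeline

# aggregation_strategy="simple" groups sub-word tokens into whole entity spans
ner_grouped = pipeline("ner", aggregation_strategy="simple")
sequence = ("Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, "
            "therefore very close to the Manhattan Bridge which is visible from the window.")
for entity in ner_grouped(sequence):
    print(entity)  # dicts with 'entity_group', 'score', 'word', 'start', 'end'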

3.5 Summary generation

● If we do not specify a model, it will automatically download the model sshleifer/distilbart-cnn-12-6 to the ~/.cache/torch folder.

● Task: generate a short summary of a given article.

from transformers import pipeline

summarizer = pipeline("summarization")
ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18.
"""

result = summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)
print(result)

Output:
[{'summary_text': ' Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002 . At one time, she was married to eight men at once, prosecutors say .'}]

3.6 Text translation

● If we do not specify a model, it will automatically download the model t5-base to the ~/.cache/torch folder.

● Task: translate English text into German (the translation_en_to_de task).

from transformers import pipeline

# Translation (English to German)
translator = pipeline("translation_en_to_de")
sentence = "Hugging Face is a technology company based in New York and Paris"
result = translator(sentence, max_length=40)
print(result)

Output:
[{'translation_text': 'Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.'}]
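
● As a side note (not from the original article), other language pairs are exposed as separate tasks; English-to-French, for example, is also served by the default t5-base checkpoint. A minimal sketch:

from transformers import pipeline

# English-to-French; t5-base was also trained on en->fr and en->ro translation
translator_fr = pipeline("translation_en_to_fr")
print(translator_fr("Hugging Face is a technology company based in New York and Paris",
                    max_length=40))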

3.7 Reading comprehension

● If we do not specify a model, it will automatically download the model distilbert-base-cased-distilled-squad to the ~/.cache/torch folder.

● This code may not run successfully; it may be a bug of the original model. If it fails, you can try specifying the model explicitly (see the sketch after the output below).

● Task: given a passage of text and a question about it, the model returns the corresponding answer.

from transformers import pipeline

question_answerer = pipeline("question-answering")
# Prefixing the string with r makes it a raw string: escape sequences are not interpreted, so the string is used exactly as written
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a 
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune 
a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
"""
result = question_answerer(question="What is extractive question answering?",
                           context=context)
print(result)
result = question_answerer(
    question="What is a good example of a question answering dataset?",
    context=context)
print(result)

Output:
{'score': 0.6177279353141785, 'start': 34, 'end': 95, 'answer': 'the task of extracting an answer from a text given a question'}
{'score': 0.5152313113212585, 'start': 148, 'end': 161, 'answer': 'SQuAD dataset'}
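
● If the default model fails to load (see the note above), one workaround, shown here as a minimal sketch assuming the distilbert SQuAD checkpoint is reachable, is to pin the model explicitly:

from transformers import pipeline

# Pin a specific extractive-QA checkpoint instead of relying on the task default
question_answerer = pipeline("question-answering",
                             model="distilbert-base-cased-distilled-squad")
context = "Extractive Question Answering is the task of extracting an answer from a text given a question."
result = question_answerer(question="What is extractive question answering?", context=context)
print(result)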


4. Summary

● This section is not the main focus, but it lets us experience the charm of NLP firsthand and intuitively; there will be more in-depth exploration later, let's keep working at it together~!


5. Supplementary Notes

Previous article link: NLP Road to Frozen Hands (3) - Evaluation and the use of metric functions (Metric, taking BLEU and GLUE as examples)

● For each task you specify, pipeline downloads the currently highest-ranked and most popular model from the Hugging Face Hub, so the default models listed above may change over time.

● If anything here is wrong, or if you have any questions, please feel free to discuss in the comments.

● Reference video: HuggingFace concise tutorial: a practical BERT Chinese-model example, NLP pre-trained models, and a quick start to the Transformers and datasets libraries.


Origin: blog.csdn.net/Wang_Dou_Dou_/article/details/127532166