Introductory Guide to Deep Learning in 2023 (7) - HuggingFace Transformers Library

In this section, we will learn how to use pre-trained models through Hugging Face's Transformers library. The library is developed very actively; for example, the classes supporting the LLaMA large model were released just the day before this article was written.
Because the library is so new and its supporting packages are constantly being revised, anyone entering this field should be prepared to spend time working around immature features, and in particular to spend extra time debugging.

In addition, I would like to remind everyone that working with large models is not ordinary programming: model scale and chain-of-thought still matter a great deal.

Pipeline programming

The pipeline is the task-oriented programming interface of the transformers library. For example, our most common task is text generation.

We only need to specify the "text-generation" task and choose a model, and that's it. For example, here we choose gpt2 for text generation:

text_generator = pipeline("text-generation", model="gpt2")

Now let's write the full version: add the import and set the terminator (pad token). Apart from that, it is basically two statements: one to run the pipeline and one to print the result.

from transformers import pipeline

text_generator = pipeline("text-generation", model="gpt2", max_new_tokens=250)

text_generator.model.config.pad_token_id = text_generator.model.config.eos_token_id

text = text_generator("I have a dream ")[0]["generated_text"]

print(text)

Here is the result of one of my runs:

I have a dream "

The young man's lips parted under a wave of laughter. "My dream!"

Bagel said that "My dream!"

The young man jumped back the moment he got off the train. "Good, good!"

On the other hand, the boy had gotten off. "My dream!"

There he was again in that black and white moment that his soul couldn't shake.

In this youth, the only thing that could stop him from reaching his dream was this.

"Dad, we're here now!"

Bagel didn't know how to react, at his level of maturity, he had to show up before the others to ask him something, if that wasn't his right, then his first duty had always been to save Gung-hye's life. But even so, he didn't understand why Bamboo was being so careful and so slow to respond to him. It turned out that she hadn't sent him one word to the authorities, she had simply told them not to respond.

Of course they wouldn't listen to the question, it was even worse after realizing it, Bamboo had to understand when his next
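
Incidentally, the same text_generator can return several different continuations in one call by turning on sampling. Here is a minimal sketch; the parameter values are just for illustration:

results = text_generator("I have a dream ", do_sample=True, num_return_sequences=3)
for r in results:
    print(r["generated_text"])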

GPT-2 is OpenAI's second-generation GPT model. After the run, you will find more than 500 MB of data under the .cache\huggingface\hub\models--gpt2 directory in your home directory; that is the size of the gpt2 model.

If you feel that gpt2's output is not good enough, we can switch to the larger gpt2-large model:

text_generator = pipeline("text-generation", model="gpt2-large", max_new_tokens=250)

text_generator.model.config.pad_token_id = text_generator.model.config.eos_token_id

text = text_generator("I have a dream ")[0]["generated_text"]

print(text)

The .cache\huggingface\hub\models--gpt2-large directory is more than 3 GB.

If you are still not satisfied, you can use gpt2-xl, whose model files are over 6 GB.

If your C drive has limited space, you can point the cache to the D drive or another drive by setting the TRANSFORMERS_CACHE environment variable.
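
For example, here is a minimal sketch of redirecting the cache from Python before transformers is imported (the D:/hf_cache path is only an illustration; the variable can also be set system-wide in Windows):

import os
os.environ["TRANSFORMERS_CACHE"] = "D:/hf_cache"  # set this before importing transformers

from transformers import pipeline  # models downloaded from now on land in D:/hf_cache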

In addition to text generation, the pipeline supports many other tasks on text, speech, images, and so on.
Although it is not recommended, when no model is specified the library will actually pick a default model for us.

For example, we write a pipeline for sentiment analysis:

from transformers import pipeline

pipe = pipeline("text-classification")
result = pipe("这个游戏不错")
print(result)

The system found the distilbert-base-uncased-finetuned-sst-2-english model for us by default.
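
Note that this default model is an English model fine-tuned on SST-2, while the input above ("这个游戏不错" means roughly "this game is pretty good") is Chinese, so an English sentence is a safer test. A small sketch with the model named explicitly (the example sentence is my own):

from transformers import pipeline

pipe = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
print(pipe("This game is pretty good"))  # something like [{'label': 'POSITIVE', 'score': ...}]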

Similarly, we can also build a conversational pipeline. The only difference is that we wrap the input in a Conversation object, and the results are read back from that same Conversation object.
For example, using Facebook's blenderbot model:

from transformers import pipeline, Conversation

pipe = pipeline('conversational', model='facebook/blenderbot-1B-distill')

conversation_1 = Conversation("What's your favorite movie?")  # create a Conversation object
pipe([conversation_1])                                         # pass in a list of conversations to get the model's reply
print(conversation_1.generated_responses)                      # print the model's replies so far
conversation_1.add_user_input("Avatar")                        # add the user's next input
pipe([conversation_1])                                         # run the pipeline again on the updated conversation
print(conversation_1.generated_responses)                      # print the model's replies again

Using tokenizers and models

In addition to the pipeline, there is a more traditional way of working: using the tokenizer and the model directly.

Raw text strings, especially in languages such as Chinese and Japanese that do not use the Latin or Cyrillic alphabets, cannot be fed to a language model directly. So we first use a tokenizer to encode the string into token IDs, and after inference we use the tokenizer again to decode the output back into text.
Generally speaking, we do not need to specify a concrete tokenizer class; AutoTokenizer is enough:

tokenizer = AutoTokenizer.from_pretrained("gpt2")
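
Before the full example, here is a quick sketch of what the tokenizer actually does; the exact IDs it prints depend on the GPT-2 vocabulary:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
ids = tokenizer.encode("I have a dream")         # a list of integer token IDs
print(ids)
print(tokenizer.convert_ids_to_tokens(ids))      # the BPE pieces behind those IDs
print(tokenizer.decode(ids))                     # back to "I have a dream"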

Now let's look at a complete generation example:

import torch
from transformers import GPT2LMHeadModel, AutoTokenizer

# Load the pretrained model and its matching tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Use the tokenizer to convert the text into tokens
input_tokens = tokenizer.encode("I have a dream ", return_tensors="pt")

model.config.pad_token_id = model.config.eos_token_id

# Generate text with the model
output = model.generate(input_tokens, max_length=250,
                        num_return_sequences=1, no_repeat_ngram_size=2)

# Decode the generated tokens back into text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

print(generated_text)
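
The generate call above decodes greedily (plus an n-gram repetition constraint), so it returns the same continuation on every run. To get varied output you can switch on sampling; here is a sketch with purely illustrative parameter values:

output = model.generate(input_tokens, max_length=250, do_sample=True,
                        top_k=50, top_p=0.95, temperature=0.8,
                        pad_token_id=model.config.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))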

We can also go one level more abstract and use the generic causal-language-model class AutoModelForCausalLM:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the pretrained model and its matching tokenizer, caching them under a custom directory
tokenizer = AutoTokenizer.from_pretrained("gpt2", cache_dir='e:/xulun/models/')
tokenizer.pad_token_id = tokenizer.eos_token_id
model = AutoModelForCausalLM.from_pretrained("gpt2", cache_dir='e:/xulun/models/')

# Use the tokenizer to convert the text into tokens
input_tokens = tokenizer.encode("I have a dream ", return_tensors="pt")

# Generate text with the model
output = model.generate(input_tokens, max_length=250,
                        num_return_sequences=1, no_repeat_ngram_size=2)

# Decode the generated tokens back into text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

print(generated_text)
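
If you have a GPU, the same code runs much faster once the model and the input tensors are moved onto it. A minimal sketch of the change:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
input_tokens = input_tokens.to(device)
output = model.generate(input_tokens, max_length=250,
                        num_return_sequences=1, no_repeat_ngram_size=2)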

With this abstraction layer in place, we can apply the same recipe to other large models.
However, the LLaMA model is not yet fully supported; for example, LlamaTokenizerFast is still in a testing phase. I will come back and update this article as the library evolves.

from transformers import LlamaTokenizerFast

tokenizer = LlamaTokenizerFast.from_pretrained("hf-internal-testing/llama-tokenizer")
print(tokenizer.encode("Hello this is a test"))

Larger models that perform other tasks

With the framework above in place, what remains is knowing which models are available, so let's introduce a few pre-trained models.

The first, of course, is the GPT family we have already met several times: we just used gpt2 above, and we introduced the GPT-3 API in the OpenAI API part of the second article.

The second model worth mentioning is Google's T5. Its core idea is transfer learning, which allows it to unify a wide variety of text tasks under one text-to-text model; the T5 paper reports its results on each subtask.

In addition, T5 was trained on pods of 1024 TPU v3 accelerators.

We use the large T5 1.1 model to try to write a summary:

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/t5-v1_1-large")
model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-large", max_length=250)

str1 = """
Summarize:
We have explored chain-of-thought prompting as a simple and broadly applicable method for enhancing
reasoning in language models. Through experiments on arithmetic, symbolic, and commonsense
reasoning, we find that chain-of-thought reasoning is an emergent property of model scale that allows
sufficiently large language models to perform reasoning tasks that otherwise have flat scaling curves.
Broadening the range of reasoning tasks that language models can perform will hopefully inspire
further work on language-based approaches to reasoning.
"""

input_ids = tokenizer(str1, return_tensors="pt").input_ids
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
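
generate accepts the same decoding controls we used earlier, so if the summary comes out too short or too repetitive you can pass beam-search parameters explicitly. A sketch with illustrative values:

outputs = model.generate(input_ids, num_beams=4, max_length=80,
                         early_stopping=True, no_repeat_ngram_size=3)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))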

GPT comes from OpenAI, and BERT comes from Google. The Facebook team tried to combine the strengths of the two and released the BART model.

The pre-training process of BART consists of two steps: (1) corrupting the text with an arbitrary denoising function, such as randomly shuffling sentences or replacing text fragments with masked symbols; (2) learning a model to reconstruct the original text. BART uses a standard Transformer-based neural machine translation architecture, which can be seen as a generalization of BERT (due to the bidirectional encoder), GPT (due to the left-to-right decoder), and other more recent pre-training schemes.

Let's take an example of using bart-large-cnn to write a summary:

from transformers import AutoTokenizer, BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")

ARTICLE_TO_SUMMARIZE = (
    """
    We have explored chain-of-thought prompting as a simple and broadly applicable method for enhancing
reasoning in language models. Through experiments on arithmetic, symbolic, and commonsense
reasoning, we find that chain-of-thought reasoning is an emergent property of model scale that allows
sufficiently large language models to perform reasoning tasks that otherwise have flat scaling curves.
Broadening the range of reasoning tasks that language models can perform will hopefully inspire
further work on language-based approaches to reasoning.
    """
)
inputs = tokenizer([ARTICLE_TO_SUMMARIZE],
                   max_length=1024, return_tensors="pt")

# Generate Summary
summary_ids = model.generate(
    inputs["input_ids"], num_beams=2, min_length=0, max_length=100)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True,
      clean_up_tokenization_spaces=False)[0])

The generated result is as follows:

We find that chain-of-thought reasoning is an emergent property of model scale that allows large language models to perform reasoning tasks. Broadening the range of reasoning tasks that language models can perform will hopefully inspire further work.
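
The same model can also be driven through the task-oriented pipeline from the beginning of this article, which hides the tokenizer handling entirely. A sketch of the equivalent call (the length limits are just illustrative):

from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
print(summarizer(ARTICLE_TO_SUMMARIZE, max_length=100, min_length=10, do_sample=False)[0]["summary_text"])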

Summary

Now that you have learned the basic programming framework, you can go on and experiment with all kinds of models.

Origin blog.csdn.net/lusing/article/details/130191334