Easily playing with the open-source large language model BLOOM (2)

Foreword

Continuing from the previous article, we have already used BLOOM's text-generation capability through simple examples. But what strategies does text generation actually use, and how do we steer BLOOM toward the output we want? This article starts from decoding strategies and introduces the two most basic ones: greedy search and beam search.

Greedy search

The greedy search strategy is very simple: since words are generated one at a time, at each step pick the word with the highest predicted probability and append it to the sequence. Here I borrow a figure from the Hugging Face blog cited below:
[Figure: word-probability tree for greedy search, from the Hugging Face how-to-generate post]
The input is "The". Greedy search first picks "nice", the word with the highest probability (0.5); under that branch it then picks "woman" with the highest probability (0.4), giving a joint probability for the whole path of 0.5 × 0.4 = 0.2.

An example using Hugging Face's model.generate is as follows:

from transformers import AutoModelForCausalLM, AutoTokenizer
import time

a1 = time.time()
checkpoint = "bigscience/bloom-1b1"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer.encode("我和我的猫都很想你", return_tensors="pt")  # prompt
outputs = model.generate(inputs, min_length=150, max_length=200)
print(tokenizer.decode(outputs[0]))  # decode the generated ids with the tokenizer
a2 = time.time()
print(f'time cost is {a2 - a1} s')

Note that only the two parameters min_length and max_length are set in this example. When no parameters affecting the decoding strategy are set, generate defaults to greedy search.
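To make the strategy concrete, here is a minimal sketch of what greedy decoding does under the hood, reusing the model, tokenizer, and inputs from the example above. This is only an illustration of the idea, not how generate is implemented internally:

import torch

generated = inputs
with torch.no_grad():
    for _ in range(50):  # generate at most 50 new tokens
        logits = model(generated).logits  # shape (1, seq_len, vocab_size)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
        generated = torch.cat([generated, next_token], dim=-1)
        if next_token.item() == tokenizer.eos_token_id:
            break  # stop at end-of-sequence

print(tokenizer.decode(generated[0]))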

The output of the generate call above is as follows:

[Screenshot: the generated text]
The output starts out normal, then degenerates into meaningless repetition. This is the most obvious drawback of this strategy.

Beam search

Beam search, as the name implies, keeps several candidate sequences (beams) and scores each one by the joint probability of the whole sequence. Assume num_beams=2 here, i.e., two beams:

[Figure: word-probability tree for beam search with num_beams=2, from the Hugging Face how-to-generate post]

The solid path has the highest joint probability, 0.4 × 0.9 = 0.36, which is higher than the 0.2 of the dotted path that greedy search produced above; this is exactly the benefit of beam search: a high-probability word that appears later, such as "has" with probability 0.9 here, can still be taken into account. Of course, beam search is essentially just a pruned breadth-first search. It is heuristic and cannot guarantee the global optimum, so it inherits the problems that come with that.
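As a toy check of the arithmetic, the two paths from the figure can be scored by multiplying their step probabilities (numbers taken from the figure):

greedy_path = [("nice", 0.5), ("woman", 0.4)]  # greedy: best single word at each step
beam_path = [("dog", 0.4), ("has", 0.9)]       # an alternative that beam search keeps alive

def joint_prob(path):
    p = 1.0
    for _, prob in path:
        p *= prob
    return p

print(joint_prob(greedy_path))  # 0.2
print(joint_prob(beam_path))    # 0.36 -> higher overall despite a worse first step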

An example using Hugging Face's model.generate is as follows:

outputs = model.generate(inputs, min_length=150, max_length=200,
                         num_beams=5, early_stopping=True)

The only difference in usage from greedy search is setting num_beams and the early-stopping mechanism. When early_stopping=True, the search stops as soon as num_beams complete candidates are found. When it is False, a heuristic is applied and the search stops when it becomes very unlikely that better candidates remain. When set to "never", the search stops only once it is certain that no better candidate can be found.

The generated effect is as follows:
[Screenshot: the generated text with num_beams=5]

The result is very poor: not only does the output still contain repetition, but generation is about 30 s slower than before. Increasing num_beams may improve the output, but the time cost grows along with it. For example, here is the result with num_beams=10:

[Screenshot: the generated text with num_beams=10]
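If you want to measure this trade-off yourself, a simple timing loop like the following works (a sketch; absolute times depend entirely on your hardware):

import time

# Sketch: how generation time grows with the number of beams.
for n in (1, 5, 10):
    t0 = time.time()
    model.generate(inputs, min_length=150, max_length=200, num_beams=n)
    print(f'num_beams={n}: {time.time() - t0:.1f} s')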

Constrained beam search

Here I introduce only two techniques: restricting repeated n-grams, and forcing words we want into the output. For more techniques, please refer to the Hugging Face documentation.

Restricting repeated n-grams:

outputs = model.generate(inputs, min_length=150, max_length=200, num_beams=5,
                         no_repeat_ngram_size=2, early_stopping=True)

Setting no_repeat_ngram_size=2 forbids any 2-gram (any pair of consecutive tokens) from appearing more than once in the generated text.
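To make the constraint concrete, here is a hypothetical little helper (not part of transformers) that checks whether a token list violates the rule:

def has_repeated_ngram(tokens, n=2):
    # True if any n-gram occurs more than once: exactly what
    # no_repeat_ngram_size=n forbids in the generated text.
    seen = set()
    for i in range(len(tokens) - n + 1):
        ngram = tuple(tokens[i:i + n])
        if ngram in seen:
            return True
        seen.add(ngram)
    return False

print(has_repeated_ngram(["我", "很", "想", "你", "我", "很"]))  # True: ("我", "很") repeats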

With no_repeat_ngram_size=2 set, the generated result is as follows:

我和我的猫都很想你,想你的时候,我就会想起你。 
猫咪,我爱你。 我爱你,我的小猫。 
你是我生命中不可缺少的一部分,你是我生命中的阳光,你的出现,让我的生活多了一抹亮丽的色彩,使我的生命更加丰富多彩。 
你的到来,给了我无尽的快乐和幸福,让我感到无比的幸福和快乐。 
在你的陪伴下,我不再孤单,不再寂寞,我在你的怀抱中,感受到了家的温暖,感受到家的亲切,感到家的温馨。 
我的生命因你而更加精彩,因为有了你的存在,使我的人生更加充实,更加有意义。 
因为有你在我身边,我才感到生活的意义,才感到生命的价值。因为有你的相伴,我对生活充满了信心,对未来充满了希望。</s>
time cost is 97.57852363586426 s

Forcing desired words:

from transformers import AutoModelForCausalLM, AutoTokenizer
import time

a1 = time.time()
checkpoint = "bigscience/bloom-1b1"
force_words = ["回家"]  # the word we force into the output ("go home")

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)
force_words_ids = tokenizer(force_words, add_special_tokens=False).input_ids  # convert to token ids
inputs = tokenizer.encode("我和我的猫都很想你", return_tensors="pt")
outputs = model.generate(inputs, force_words_ids=force_words_ids, min_length=150, max_length=200,
                         num_beams=5, no_repeat_ngram_size=2, early_stopping=True)
print(tokenizer.decode(outputs[0]))
a2 = time.time()
print(f'time cost is {a2 - a1} s')

Here we force the generated story to contain the word 回家 ("go home"), passing the constraint via force_words_ids. Note that force_words_ids is used together with beam search, i.e., num_beams > 1.

The generated effect is as follows:

我和我的猫都很想你。 猫咪:你什么时候回来? 
狗狗:明天。 我:,好。 明天,明天,我一定回来。 
狗:,再见。 小狗:再见,小猫。 你什么时候回家? 小猫:今天。 今天,今天,我今天一定回家。 
天黑了,狗和猫都睡着了。 一天过去了,猫和狗都睡得香甜。 
第二天一大早,天刚蒙蒙亮,就听见狗的叫声,原来是猫回来了。
狗对猫说:"猫,你今天怎么这么晚才回家呢?"
猫对狗说:"我昨天晚上没睡好觉,所以今天才迟到。" 
狗说:"那你为什么不早一点回来呢?""我明天还要上班呢!"
猫回答道。 "上班?"
狗又问道,"你明天要上班吗?" "是的,我要上班。""那为什么还迟迟不回来?" "我怕
time cost is 115.72432279586792 s

It can be seen that the story is indeed generated according to our requirement: the forced word 回家 appears in the output.
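A quick programmatic check (a trivial sketch) that the constraint was honored:

text = tokenizer.decode(outputs[0])
print("回家" in text)  # True: the forced word is present in the decoded output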

Postscript

More decoding strategies will be introduced in the next installment. This update took so long because I caught influenza A last week, which is more unpleasant than COVID-19. I recommend taking oseltamivir as soon as possible if you catch it; if, like me, you have an underlying condition (pharyngitis), you can take amoxicillin at the same time as an antibacterial.

References

https://huggingface.co/blog/how-to-generate
https://huggingface.co/docs/transformers/generation_strategies
https://huggingface.co/blog/constrained-beam-search

Original article: blog.csdn.net/weixin_43945848/article/details/129553267