Decoding Strategy (Search)


Beam Search

Beam search is an improvement on the greedy strategy. The idea is simple: widen the search slightly. At each time step, instead of keeping only the single highest-scoring output, it keeps the num_beams highest-scoring candidate sequences. When num_beams=1, beam search degenerates into greedy search.
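A minimal Python sketch of the procedure above follows. It assumes a hypothetical toy "model" given as a lookup table of next-token probabilities; `TABLE`, `toy_model` and `beam_search` are illustrative names, and the numbers are made up. Scores are summed log-probabilities, a common way to avoid numerical underflow when multiplying many small probabilities.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy "model": maps a prefix (tuple of tokens) to a distribution
# over the next token. A real decoder would compute this with a neural network.
TABLE: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"A": 0.5, "B": 0.4, "C": 0.1},
    ("A",): {"A": 0.4, "B": 0.3, "C": 0.3},
    ("B",): {"A": 0.05, "B": 0.9, "C": 0.05},
    ("C",): {"A": 0.4, "B": 0.3, "C": 0.3},
}


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    return TABLE[prefix]


def beam_search(steps: int, num_beams: int) -> Tuple[List[str], float]:
    """Keep the num_beams highest-scoring partial sequences at every step."""
    # Each beam is (sequence, cumulative log-probability).
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            # Extend every surviving beam by every possible next token.
            for token, p in toy_model(seq).items():
                if p > 0:
                    candidates.append((seq + (token,), score + math.log(p)))
        # Keep only the num_beams best candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    best_seq, best_score = beams[0]
    return list(best_seq), math.exp(best_score)


print(beam_search(steps=2, num_beams=1))  # behaves like greedy search
print(beam_search(steps=2, num_beams=2))
```

With `num_beams=1` the sketch only ever keeps the top token and returns ['A', 'A'] with probability about 0.2. With `num_beams=2` it also keeps the "B" prefix alive and finds ['B', 'B'] with probability about 0.36, a sequence that greedy decoding misses.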

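Advantages: beam search sits between greedy search and exhaustive search and lets you trade speed for quality. A beam size of 1 is greedy search, and the wider the beam, the closer it gets to exhaustive search (strictly, with vocabulary size N and sentence length T, only a beam that keeps every partial sequence, i.e. N^(T-1) of them, is fully exhaustive). Disadvantages: it still does not guarantee the globally optimal sequence, and its cost grows linearly with the beam size.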

Exhaustive Search

Exhaustive (brute-force) search aims for the globally optimal solution: it computes the probability of every possible output sequence and then picks the one with the highest probability. The search space is enormous. With a vocabulary of size N and a sentence of T words, the overall time complexity is O(N^T). Since N is typically tens of thousands to hundreds of thousands and T can be hundreds of words, this is far too slow in practice.
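The brute-force idea can be sketched in Python as below, again assuming a hypothetical lookup-table model with made-up probabilities (`TABLE`, `sequence_probability` and `exhaustive_search` are illustrative names). `itertools.product` enumerates all N^T sequences, which is exactly why the cost explodes as T grows.

```python
import itertools
from typing import Dict, List, Tuple

# Hypothetical toy model: next-token probabilities for each prefix (made-up numbers).
TABLE: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"A": 0.5, "B": 0.4, "C": 0.1},
    ("A",): {"A": 0.4, "B": 0.3, "C": 0.3},
    ("B",): {"A": 0.05, "B": 0.9, "C": 0.05},
    ("C",): {"A": 0.4, "B": 0.3, "C": 0.3},
}
VOCAB = ["A", "B", "C"]


def sequence_probability(seq: Tuple[str, ...]) -> float:
    """Chain rule: multiply P(token_t | tokens before t) over all positions."""
    prob = 1.0
    for t, token in enumerate(seq):
        prob *= TABLE[seq[:t]].get(token, 0.0)
    return prob


def exhaustive_search(vocab: List[str], steps: int) -> Tuple[List[str], float, int]:
    best_seq: List[str] = []
    best_prob = -1.0
    count = 0
    # Enumerate all len(vocab) ** steps sequences: O(N^T).
    for seq in itertools.product(vocab, repeat=steps):
        count += 1
        prob = sequence_probability(seq)
        if prob > best_prob:
            best_seq, best_prob = list(seq), prob
    return best_seq, best_prob, count


best_seq, best_prob, count = exhaustive_search(VOCAB, steps=2)
print(best_seq, best_prob, count)
```

Here it scores all 3^2 = 9 sequences and returns ['B', 'B'] with probability about 0.36; already at a vocabulary of a few thousand words and a modest T this loop becomes hopeless.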

Advantages: finds the globally optimal solution. Disadvantages: very slow to compute.

Greedy Search

Greedy search is simple: at each step it outputs the word with the highest probability, and these words together form the output sentence. Its results are generally poor, because it only considers the best choice at each individual step, and that path can be far from the globally optimal sequence. With vocabulary size N and sentence length T, the overall time complexity is O(N·T).
For example, in the figure below, the token with the highest conditional probability is chosen at each time step, producing the sequence [A,B,C].
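Greedy decoding can be sketched in Python as follows. The probabilities in the hypothetical `TABLE` are invented so that the greedy path comes out as [A, B, C], mirroring the example; they are not the figure's actual values. Only the prefixes greedy actually visits need entries here, and each step scans the N candidate tokens once via `max`, giving O(N·T) overall.

```python
from typing import Dict, List, Tuple

# Hypothetical next-token probabilities (made up for illustration).
TABLE: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"A": 0.6, "B": 0.3, "C": 0.1},
    ("A",): {"A": 0.1, "B": 0.7, "C": 0.2},
    ("A", "B"): {"A": 0.15, "B": 0.05, "C": 0.8},
}


def greedy_search(steps: int) -> Tuple[List[str], float]:
    seq: List[str] = []
    prob = 1.0
    for _ in range(steps):
        dist = TABLE[tuple(seq)]
        # Pick the single most probable next token; all others are discarded.
        token = max(dist, key=dist.get)
        seq.append(token)
        prob *= dist[token]
    return seq, prob


seq, prob = greedy_search(steps=3)
print(seq, prob)
```

This returns ['A', 'B', 'C'] with sequence probability 0.6 × 0.7 × 0.8 ≈ 0.336.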


Clearly, this shrinks the original exponentially large solution space down to one whose size grows only linearly with the sequence length. Because most candidate sequences are discarded, this short-sighted strategy cannot guarantee that the final sequence has the highest overall probability.
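To put the gap between exponential and linear in numbers, here is a quick back-of-the-envelope calculation in Python. N = 50,000 and T = 100 are assumed values picked from the ranges mentioned earlier, the beam width K = 5 is likewise an arbitrary example, and the beam figure is an upper bound on the number of token scores examined.

```python
N = 50_000  # assumed vocabulary size
T = 100     # assumed sequence length
K = 5       # assumed beam width

exhaustive = N ** T  # complete sequences exhaustive search must score
greedy = N * T       # token scores greedy search examines (N per step)
beam = K * N * T     # upper bound on token scores beam search examines

print(f"exhaustive: a {len(str(exhaustive))}-digit number of sequences")
print(f"greedy: {greedy:,} token scores")
print(f"beam (K={K}): at most {beam:,} token scores")
```

Exhaustive search faces a 470-digit number of sequences, while greedy search looks at 5,000,000 token scores and beam search at most 25,000,000.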

Summary

These are my working notes; I hope they help!
If you spot any shortcomings, please let me know!


Origin: blog.csdn.net/black_lightning/article/details/120931343