LLaMA: Open and Efficient Foundation Language Models

Background

LLaMA is a series of foundation language models, ranging from 7B to 65B parameters, trained on publicly available data with comparatively modest computing resources. The models show that a relatively small parameter count can match or exceed much larger industry models: LLaMA-13B, for example, outperforms GPT-3 (175B) on most benchmarks.
The main contribution is improving the training speed and efficiency of LLMs while substantially improving quality at smaller model capacities.
At the same time, because the models are smaller, inference is considerably faster and cheaper.

Data

The pre-training data is a mixture of publicly available datasets, which makes the data pipeline relatively transparent.
[Figure: composition of the pre-training datasets and their sampling proportions]
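As a rough illustration, the sketch below samples training documents according to mixture weights like those reported for LLaMA's corpus (CommonCrawl, C4, GitHub, Wikipedia, Books, ArXiv, StackExchange). The sampler itself is a hypothetical stand-in, not the paper's actual data pipeline.

```python
import random

# Approximate sampling proportions of the public pre-training corpora,
# as reported in the LLaMA paper (illustrative; not the full pipeline).
DATA_MIXTURE = {
    "CommonCrawl":   0.670,
    "C4":            0.150,
    "GitHub":        0.045,
    "Wikipedia":     0.045,
    "Books":         0.045,
    "ArXiv":         0.025,
    "StackExchange": 0.020,
}

def sample_source(rng: random.Random) -> str:
    """Pick which corpus the next training document is drawn from."""
    sources, weights = zip(*DATA_MIXTURE.items())
    return rng.choices(sources, weights=weights, k=1)[0]

rng = random.Random(0)
print([sample_source(rng) for _ in range(5)])
```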

Model structure

The model is still based on the classic Transformer architecture, but with several optimizations. For example, instead of normalizing the output of each sub-layer, normalization is applied to the input of each sub-layer (pre-normalization, using RMSNorm), the ReLU activation is replaced with SwiGLU, and rotary positional embeddings (RoPE) are used instead of absolute positional embeddings.
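A minimal PyTorch sketch of these changes is shown below: a pre-normalized Transformer block with RMSNorm and a SwiGLU feed-forward network. The dimensions, the use of torch.nn.MultiheadAttention, and the 4×d hidden size are illustrative simplifications; rotary embeddings and the causal mask are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square norm: rescale by the RMS of the activations (no mean centering)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: a SiLU-gated linear unit instead of a plain ReLU MLP."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden, bias=False)  # value projection
        self.w2 = nn.Linear(hidden, dim, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

class PreNormBlock(nn.Module):
    """Transformer block with pre-normalization: normalize the *input* of each
    sub-layer, then add the residual (instead of normalizing the output)."""
    def __init__(self, dim: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn_norm = RMSNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, bias=False, batch_first=True)
        self.ffn_norm = RMSNorm(dim)
        self.ffn = SwiGLU(dim, hidden=4 * dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.attn_norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.ffn(self.ffn_norm(x))
        return x

block = PreNormBlock()
out = block(torch.randn(2, 16, 512))  # (batch, seq_len, dim)
print(out.shape)
```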

Optimizer

Training uses the AdamW optimizer (β1 = 0.9, β2 = 0.95) with a cosine learning-rate schedule that decays to 10% of the peak learning rate, weight decay of 0.1, gradient clipping at 1.0, and 2,000 warmup steps; the learning rate and batch size vary with model size.
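A minimal sketch of such a training setup in PyTorch is given below. The hyperparameter values mirror those listed above, while the model and step counts are toy placeholders.

```python
import math
import torch

# Toy stand-ins: a single linear layer and small step counts, just to show the
# training-setup pieces (AdamW, warmup + cosine decay, gradient clipping).
model = torch.nn.Linear(512, 512)
max_lr, total_steps, warmup_steps = 3e-4, 1_000, 200

optimizer = torch.optim.AdamW(
    model.parameters(), lr=max_lr, betas=(0.9, 0.95), weight_decay=0.1
)

def lr_lambda(step: int) -> float:
    """Linear warmup, then cosine decay down to 10% of the peak learning rate."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.1 + 0.9 * 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(total_steps):
    loss = model(torch.randn(8, 512)).pow(2).mean()  # dummy objective
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip gradients at 1.0
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()

print(f"final lr: {scheduler.get_last_lr()[0]:.2e}")
```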

Training Acceleration Optimization

Following the idea of "Self-attention Does Not Need O(n²) Memory", LLaMA uses a memory-efficient implementation of causal multi-head attention (available in the xformers library) that avoids materializing the full attention matrix. This reduces the asymptotic memory cost of self-attention from O(n²) to O(log n), which greatly lowers the model's memory footprint and effectively improves its ability to handle long sequences.
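The sketch below illustrates the core trick: the keys and values are processed in chunks with a numerically stable "online" softmax, so the full n×n score matrix is never stored. It is a simplified single-head, non-causal version for illustration, not the paper's exact algorithm (which also chunks over queries).

```python
import torch

def chunked_attention(q, k, v, chunk_size=128):
    """
    Memory-efficient attention in the spirit of "Self-attention Does Not Need
    O(n^2) Memory": keys/values are processed chunk by chunk with a running
    (numerically stable) softmax, so only an (n, chunk) block of scores exists
    at any time.  q, k, v: (n, d) tensors.  Returns: (n, d).
    """
    n, d = q.shape
    scale = d ** -0.5
    acc = torch.zeros(n, d)                       # running weighted sum of values
    denom = torch.zeros(n, 1)                     # running softmax normalizer
    running_max = torch.full((n, 1), float("-inf"))

    for start in range((0), n, chunk_size):
        k_chunk = k[start:start + chunk_size]     # (c, d)
        v_chunk = v[start:start + chunk_size]     # (c, d)
        scores = (q @ k_chunk.T) * scale          # (n, c): one chunk wide only
        chunk_max = scores.max(dim=-1, keepdim=True).values
        new_max = torch.maximum(running_max, chunk_max)
        # Rescale previously accumulated statistics to the new running maximum.
        correction = torch.exp(running_max - new_max)
        acc = acc * correction
        denom = denom * correction
        # Accumulate this chunk's contribution.
        weights = torch.exp(scores - new_max)     # (n, c)
        acc = acc + weights @ v_chunk
        denom = denom + weights.sum(dim=-1, keepdim=True)
        running_max = new_max
    return acc / denom

# Sanity check against the naive O(n^2)-memory implementation.
q, k, v = torch.randn(3, 256, 64).unbind(0)
naive = torch.softmax((q @ k.T) * 64 ** -0.5, dim=-1) @ v
print(torch.allclose(chunked_attention(q, k, v), naive, atol=1e-4))
```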
