Introduction
This tutorial uses the C language to deploy and run the llama2 model, which can perform inference efficiently on a CPU. The main steps are:
1 Configuring the runtime environment (C and Python)
2 Converting the original llama2 model to binary format
3 Running llama2 inference in C
Environment installation and configuration
Project download:
git clone https://github.com/karpathy/llama2.c.git
Operating system: Ubuntu (I tried it under Windows, where compilation fails with errors)
Software environment:
gcc and make (no need to install them if you already have them)
python (I used 3.9; other versions should work as well)
gcc installation: apt install build-essential
make installation: apt-get install make
After installing Python, install the dependency packages: pip install -r requirements.txt
Python is mainly used to convert the original llama2 model to the .bin binary format.
GitHub project introduction
Using the code in this repo, you can train the Llama 2 LLM architecture from scratch in PyTorch, export the weights to a binary file, and load it into a simple ~500-line C file (run.c) that performs model inference. Alternatively, you can load, fine-tune, and run inference with Meta's Llama 2 (though this part is still being actively refined). As such, this repository is a "full-stack" training + inference solution for the Llama 2 LLM, with an emphasis on minimalism and simplicity. You might think that you need an LLM with many billions of parameters to perform any meaningful […]