Approaching GPT-4, AI programming is about to be revolutionized! Code Llama, the most powerful code model Meta has ever open-sourced

Source | Xinzhiyuan (ID: AI-era)

Meta, already on a tear with the open-source Llama, has made another big move today!

Code Llama, a version dedicated to programming, is now officially open source and free for both research and commercial use.

Code Llama is fine-tuned from the Llama 2 base model and comes in three variants: a base version, a Python version, and an instruction-following version.

Each variant comes in three parameter sizes: 7B, 13B, and 34B. Notably, the 7B models can run on a single GPU.

In Meta's evaluation, Code Llama's performance is on par with GPT-3.5, and the 34B model comes close to GPT-4 on the HumanEval benchmark.

Did you notice one model in particular: Unnatural Code Llama?

At first glance, OpenAI scientist Andrej Karpathy thought it all looked very nice!

But this mysteriously named Unnatural Code Llama, with its vague description, undisclosed details, and benchmark scores that crush every other model, is awfully tantalizing!

After the release of Code Llama, Yann LeCun also enthusiastically liked and retweeted his team's research results.

Nvidia scientist Jim Fan remarked that Llama 2 had almost reached GPT-3.5's level but fell far behind on coding, which was disappointing. Now, Code Llama has finally closed that gap with GPT-3.5!

Coding is arguably the most important LLM task: it is the cornerstone of powerful reasoning engines and of AI agents like Voyager.

The arrival of Code Llama marks a major leap in AI programming: anyone can now use the model for complex and precise development tasks.

In addition, it is worth mentioning that Perplexity's chat tool now offers Code Llama.

Come and give it a try:

https://labs.perplexity.ai/?utm_content=first_codellama&s=u&utm_source=twitter&utm_campaign=labs

How was Code Llama trained?

Meta claims that Code Llama is the most advanced publicly available LLM for coding tasks. It can make developers' workflows faster and more efficient and lower the barrier to learning programming.

Code Llama can serve as a productivity and educational tool, helping programmers write software that is more robust and conforms to coding standards.

Meta believes that open source strategies can promote innovation in the field of AI and are the best way to develop safe and responsible AI tools.

Code Llama is therefore released under exactly the same community license agreement as Llama 2, free for both academic and commercial use.

Code Llama is a version of Llama 2 with enhanced coding capabilities.

Paper address: https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/

Meta built Code Llama by drawing more code data out of the dataset used to train Llama 2 and training on it for longer.

It can generate code, and natural language about code, from either code or natural-language prompts (such as "Write me a function that outputs the Fibonacci sequence").

It can also be used for code completion and debugging, and supports today's most popular programming languages, including Python, C++, Java, PHP, TypeScript (JavaScript), C#, and Bash.
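As a quick illustration, here is a minimal sketch of prompting the model through the Hugging Face transformers library. The checkpoint ID codellama/CodeLlama-7b-hf is an assumption; point the calls at whatever copy of the weights you have.

# Minimal completion sketch via Hugging Face transformers; the
# "codellama/CodeLlama-7b-hf" checkpoint ID is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

# Base models simply continue the prompt, so a signature plus docstring
# is a natural cue for code generation.
prompt = 'def fibonacci(n):\n    """Return the first n Fibonacci numbers."""\n'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))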

Meta has released Code Llama in three sizes: 7B, 13B, and 34B parameters.

Each model was trained using 500B tokens of code and code-related data.

The 7B and 13B base and Instruct models have additionally been trained with fill-in-the-middle (FIM) capability, allowing them to insert code into existing code and handle code-completion tasks out of the box.
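For intuition, the paper describes FIM prompts assembled from sentinel tokens; here is a rough sketch of building one. The token spellings follow the paper, but treat the exact spacing as an assumption.

# Sketch of a fill-in-the-middle (FIM) prompt using the sentinel tokens
# described in the Code Llama paper; exact spacing is an assumption.
prefix = 'def remove_non_ascii(s: str) -> str:\n    """'
suffix = '\n    return result\n'
# The model generates the missing middle part after the <MID> sentinel.
fim_prompt = f"<PRE> {prefix} <SUF>{suffix} <MID>"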

These three models can meet different service scenarios and latency requirements.

The 7B model can be run on a single GPU.

The 34B model produces the best results and offers the best coding assistance, but the smaller 7B and 13B models are faster and better suited to low-latency tasks such as real-time code completion.

Code Llama models stably support contexts of up to 100,000 tokens. All models are trained on sequences of 16K tokens and show improvements on inputs of up to 100,000 tokens.

Beyond generating longer programs, longer context also unlocks new uses for code LLMs.

It lets users feed more of their codebase into the model, making the generated results more relevant to the existing code.

It also helps in scenarios such as debugging large codebases, where keeping track of all the code related to a specific problem is a headache for developers: when debugging a large body of code, they can feed the entire relevant stretch of code straight into the model.

In addition, Meta has further fine-tuned two sub-versions of Code Llama: Code Llama - Python and Code Llama - Instruct.

Code Llama - Python is the result of further fine-tuning Code Llama on 100B tokens of Python code.

Since Python is the most commonly used language for code-generation tasks, and Python and PyTorch play a pivotal role in the AI community, a model trained specifically to support Python better greatly enhances the model's practical value.

Code Llama - Instruct is Code Llama fine-tuned and aligned to follow instructions.

Meta fed the model natural-language instructions together with the desired outputs, a process that makes it better at understanding what humans expect from their prompts.

Meta recommends using Code Llama-Instruct for code generation tasks because Code Llama-Instruct has been fine-tuned to generate more useful and safer natural language responses.

Meta does not recommend using Code Llama or Code Llama - Python directly to perform general natural language tasks, as neither model is designed to follow natural language instructions.

Moreover, Code Llama is intended only for code-related tasks and is not suitable as a foundation model for other tasks.

A new SOTA, crushing open-source code-specific models

How does Code Llama perform?

Meta adopts two popular coding benchmarks: HumanEval and Mostly Basic Python Programming (MBPP).

HumanEval tests the model's ability to complete code based on docstrings, while MBPP tests the model's ability to write code based on descriptions.
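For intuition about what these benchmarks measure: a completion "passes" if the assembled program survives the benchmark's unit tests. Below is a toy sketch of that check — a simplification, not the official harness, which sandboxes execution and enforces timeouts.

# Toy sketch of HumanEval/MBPP-style functional scoring: a completion
# passes if prompt + completion runs the hidden tests without error.
# This is a simplification, not the official evaluation harness.
def passes(prompt: str, completion: str, test_code: str) -> bool:
    program = prompt + completion + "\n" + test_code
    try:
        exec(program, {})
        return True
    except Exception:
        return False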

The results show that Code Llama outperforms open-source code-specific LLMs and outperforms Llama 2.

Code Llama 34B scored 53.7% on HumanEval and 56.2% on MBPP, almost tied with ChatGPT.

Of course, like any large model, Code Llama also carries unknown risks.

In order to build AI models responsibly, Meta took a number of measures, including red team testing, before releasing Code Llama.

Researchers conducted a quantitative assessment of the risk of Code Llama generating malicious code.

Researchers crafted prompts with a clear intent to elicit malicious code, then scored Code Llama's responses to these prompts against those of ChatGPT (GPT-3.5 Turbo).

It turned out that Code Llama gave safer responses.

Open-sourcing the code

Today, Meta also released the Code Llama source code so that the entire community can evaluate its capabilities, identify issues, and fix vulnerabilities.

Model download

To download the model weights and tokenizers, visit the Meta AI website and accept the license.

Once the request is approved, you will receive a URL by email. Then run the download.sh script, passing the provided URL when prompted to start the download. Make sure to copy the URL text itself; do not use the "Copy link address" option when right-clicking the URL.

If the copied URL text starts with: https://download.llamameta.net, the copy is correct. If the copied URL text starts with: https://l.facebook.com, then the copy is wrong.

Prerequisite: Make sure you have wget and md5sum installed. Then run the script: bash download.sh.

Keep in mind that the link expires after 24 hours and a certain number of downloads. If you start seeing errors such as 403: Forbidden, you can always re-request the link.

Setup

In a conda environment with PyTorch/CUDA available, clone the repo and run in the top directory:

 
 
pip install -e .

Inference

Different models require different model-parallel (MP) values: 1 for the 7B models, 2 for 13B, and 4 for 34B.

All models support sequence lengths of up to 100,000 tokens, but Meta pre-allocates the cache according to the max_seq_len and max_batch_size values, so set these based on your hardware and use case.
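Concretely, the repo's example scripts build the generator roughly as follows. The argument names follow the example scripts; the values here are illustrative.

# Sketch: Llama.build pre-allocates the KV cache from max_seq_len and
# max_batch_size, so size them to your GPU memory; values are illustrative.
from llama import Llama

generator = Llama.build(
    ckpt_dir="CodeLlama-7b/",
    tokenizer_path="CodeLlama-7b/tokenizer.model",
    max_seq_len=512,
    max_batch_size=4,
)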

Pretrained code models

The Code Llama and Code Llama - Python models are not instruction fine-tuned; when prompting them, the expected answer is a natural continuation of the prompt.

See example_completion.py for some examples. To illustrate, the command below runs it with the CodeLlama-7b model (nproc_per_node needs to be set to the MP value):

 
 
torchrun --nproc_per_node 1 example_completion.py \
    --ckpt_dir CodeLlama-7b/ \
    --tokenizer_path CodeLlama-7b/tokenizer.model \
    --max_seq_len 128 --max_batch_size 4

The pretrained code models are: the Code Llama models CodeLlama-7b, CodeLlama-13b, and CodeLlama-34b, and the Code Llama - Python models CodeLlama-7b-Python, CodeLlama-13b-Python, and CodeLlama-34b-Python.
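At the Python level, the same completion behavior looks roughly like the sketch below, reusing the generator built in the setup sketch above (API per example_completion.py; prompt and sampling values are illustrative).

# Sketch: base models return a natural continuation of each prompt.
results = generator.text_completion(
    ["import argparse\n\ndef main(string: str):"],
    max_gen_len=64,
    temperature=0.2,
    top_p=0.9,
)
print(results[0]["generation"])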

Code infilling

The 7B and 13B Code Llama and Code Llama - Instruct models can fill in code based on the surrounding context.

See example_infilling.py for some examples. The CodeLlama-7b model can be run for infilling with the following command (nproc_per_node needs to be set to the MP value):

torchrun --nproc_per_node 1 example_infilling.py \
    --ckpt_dir CodeLlama-7b/ \
    --tokenizer_path CodeLlama-7b/tokenizer.model \
    --max_seq_len 192 --max_batch_size 4

The pretrained infilling models are: the Code Llama models CodeLlama-7b and CodeLlama-13b, and the Code Llama - Instruct models CodeLlama-7b-Instruct and CodeLlama-13b-Instruct.

Instruction fine-tuned models

The Code Llama - Instruct models are fine-tuned to follow instructions.

To obtain the expected behavior and performance, the specific format defined in chat_completion needs to be followed, including the INST and <<SYS>> tags, the BOS and EOS tokens, and the spaces and line breaks in between (it is recommended to call strip() on inputs to avoid double spaces).
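As a rough illustration of that format (chat_completion assembles it for you; the exact spacing below is an assumption, which is also why the strip() advice matters):

# Sketch of the Llama-2-style chat format Code Llama - Instruct expects;
# [INST] / <<SYS>> markers per the repo, exact spacing is an assumption.
system = "Provide answers in Python."
user = "Write a function that outputs the Fibonacci sequence."
prompt = f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user.strip()} [/INST]"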

You can also deploy additional classifiers to filter out inputs and outputs deemed unsafe. See the llama-recipes repository for examples of how to add safety checkers to the inputs and outputs of your inference code.

Example using CodeLlama-7b-Instruct:

torchrun --nproc_per_node 1 example_instructions.py \
    --ckpt_dir CodeLlama-7b-Instruct/ \
    --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 4

The instruction fine-tuned models are: the Code Llama - Instruct models CodeLlama-7b-Instruct, CodeLlama-13b-Instruct, and CodeLlama-34b-Instruct.
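At the Python level, that corresponds roughly to the following sketch (API per example_instructions.py; dialog content is illustrative, and the generator is assumed to have been built from an Instruct checkpoint):

# Sketch: chat_completion takes a batch of dialogs (lists of role/content
# messages) and returns one generation per dialog. Assumes `generator`
# was built from a Code Llama - Instruct checkpoint.
dialogs = [[
    {"role": "system", "content": "Provide answers in Python."},
    {"role": "user", "content": "Write a function that outputs the Fibonacci sequence."},
]]
results = generator.chat_completion(
    dialogs, max_gen_len=256, temperature=0.2, top_p=0.95
)
print(results[0]["generation"]["content"])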

Responsible use

Meta's research paper reveals more details about the development of Code Llama, as well as specific methods for benchmarking.

The paper also details the model's current limitations, known challenges encountered, the measures Meta has taken to address them, and challenges that lie ahead.

Meta has also updated its Responsible Use Guidelines, which include guidance on how to responsibly develop downstream models, including:

  1. Define content strategy and mitigation measures

  2. Prepare data

  3. Fine-tune the model

  4. Evaluate and improve performance

  5. Address risks at the input and output levels

  6. Build transparency and reporting into user interactions

Developers should evaluate their models using code-specific evaluation benchmarks and perform security studies on code-specific use cases, targeting issues such as generating malware, computer viruses, or malicious code.

Coding the future of generative AI

Code Llama is designed to assist software engineers in various fields in their daily work and can play an important role in research, industry, open source projects, non-profit organizations and enterprises.

But the areas where the base and Instruct models can play a role extend far beyond these.

Meta hopes that Code Llama will inspire the public to further develop Llama 2 and become a new creative tool for research and commercial product creation.

Netizens in action

As soon as Code Llama was released, some people couldn't wait to get it running.

One user ran Code Llama-34B on four RTX 3090 graphics cards, at 49 ms per token.

Here is some inference data for Code Llama models of different parameter sizes on an M2 Ultra, using the latest llama.cpp.

Beyond code completion and code generation, it can also help you find bugs or do pair programming.

References:

https://ai.meta.com/blog/code-llama-large-language-model-coding/

Origin blog.csdn.net/lqfarmer/article/details/132642306