CodeGeeX2 is the second-generation model of the multilingual code generation model CodeGeeX (KDD'23). Unlike the first generation of CodeGeeX (which was trained entirely on the domestic Huawei Ascend chip platform), CodeGeeX2 is based on the ChatGLM2 architecture with additional code pre-training. Thanks to the better performance of ChatGLM2, CodeGeeX2 achieves improvements on multiple metrics (+107% over CodeGeeX; with only 6 billion parameters, it exceeds the 15-billion-parameter StarCoder-15B by nearly 10%), and adds more features:
- More powerful coding capabilities: Based on the ChatGLM2-6B base language model, CodeGeeX2-6B has been further pre-trained on 600B tokens of code data. Compared with the first-generation model, its coding capability is comprehensively improved, with significant gains on all six programming languages in the HumanEval-X evaluation set (Python +57%, C++ +71%, Java +54%, JavaScript +83%, Go +56%, Rust +321%). It reaches a Pass@1 (one-time pass rate) of 35.9% on Python, exceeding the larger StarCoder-15B.
- Better model features: Inheriting the features of ChatGLM2-6B, CodeGeeX2-6B better supports both Chinese and English input, supports a maximum sequence length of 8192, and its inference speed is greatly improved over the first-generation CodeGeeX-13B. After quantization it needs only 6GB of GPU memory to run, supporting lightweight local deployment.
- More comprehensive AI programming assistant: The backend of the CodeGeeX plug-ins (VS Code, JetBrains) has been upgraded to support more than 100 programming languages, adding practical features such as contextual completion and cross-file completion. Combined with the interactive Ask CodeGeeX AI programming assistant, it supports Chinese and English dialogue to solve various programming problems, including but not limited to code explanation, code translation, code error correction, and document generation, helping programmers develop more efficiently.
Quick Start
Use transformers
Quickly call CodeGeeX2-6B:
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True, device='cuda')
model = model.eval()

# remember adding a language tag for better performance
prompt = "# language: Python\n# write a bubble sort function\n"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_length=256, top_k=1)
response = tokenizer.decode(outputs[0])

>>> print(response)
# language: Python
# write a bubble sort function


def bubble_sort(list):
    for i in range(len(list) - 1):
        for j in range(len(list) - 1):
            if list[j] > list[j + 1]:
                list[j], list[j + 1] = list[j + 1], list[j]
    return list


print(bubble_sort([5, 2, 1, 8, 4]))
```
Start the Gradio demo:
```shell
python ./demo/run_demo.py

usage: run_demo.py [-h] [--model-path MODEL_PATH] [--example-path EXAMPLE_PATH] [--quantize QUANTIZE]
                   [--chatglm-cpp] [--fastllm] [--n-gpus N_GPUS] [--gpu GPU] [--cpu] [--auth] [--username yourname]
                   [--password yourpassword]
                   [--port PORT] [--listen ADDRESS]

# To enable authentication, first add --auth, then define --username and --password, e.g.:
python run_demo.py --auth --username user --password password  # To listen on all addresses, specify --listen 0.0.0.0
```
Quantized inference acceleration with ChatGLM.cpp is supported:
```shell
python ./demo/run_demo.py --quantize 4 --chatglm-cpp
```
Start the FastAPI demo:
```shell
python ./demo/fastapicpu.py

usage: fastapicpu.py [-h] [--model-path MODEL_PATH] [--listen ADDRESS] [--port PORT] [--workers NUM] [--cpu] [--half] [--quantize QUANTIZE] [--chatglm-cpp]

# --cpu enables CPU inference, --half enables .half()
```
Quantized inference acceleration with ChatGLM.cpp is also supported here; simply add the `--quantize 4 --chatglm-cpp` parameters.
API usage examples
```shell
curl -X POST "http://127.0.0.1:7860" \
    -H 'Content-Type: application/json' \
    -d '{"lang": "Python", "prompt": "# Write a quick sort function"}'
```
❗️Please note:

- CodeGeeX2-6B is a base code generation model and does not have chat capabilities. Please go to the plug-ins to experience the more comprehensive Ask CodeGeeX chat feature.

- When using the completion function of CodeGeeX2-6B, the input prompt needs to follow a specific format for best results: for example, add a programming language tag at the beginning (`# language: Python`, see the complete language list) and write the prompt in the form of comments. The handling in `run_demo.py` can be used as a reference; a small prompt-building sketch also follows this list.

- If the graphics card does not support the `bfloat16` format, incorrect content will be output, and the model needs to be converted to `float16` format:

  ```python
  model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True).half().cuda()
  ```

- If you need to load the model on multiple graphics cards, replace

  ```python
  tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
  model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True, device='cuda')
  model = model.eval()
  ```

  with

  ```python
  def get_model():
      tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
      from gpus import load_model_on_gpus  # The gpus.py file is in the demo folder
      model = load_model_on_gpus("THUDM/codegeex2-6b", num_gpus=2)
      model = model.eval()
      return tokenizer, model

  tokenizer, model = get_model()
  ```
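As mentioned in the notes above, a minimal sketch of building a prompt in the expected format; the `build_prompt` helper is illustrative, not part of the repository:

```python
# Illustrative helper (not from the repository): a language tag on the
# first line, followed by the task written as a comment, as described above.
def build_prompt(task: str, language: str = "Python") -> str:
    return f"# language: {language}\n# {task}\n"

prompt = build_prompt("write a bubble sort function")
# -> "# language: Python\n# write a bubble sort function\n"
```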
Code capability evaluation
As a multilingual code generation base model, CodeGeeX2 greatly improves coding capability over the previous generation. Below are the evaluation results on the HumanEval, HumanEval-X, and DS1000 benchmarks (the definition of the evaluation metric Pass@k is consistent with the paper):
HumanEval (Pass@1,10,100)
Model | Pass@1 | Pass@10 | Pass@100 | Introduction | Company |
---|---|---|---|---|---|
CodeGen-16B-multi | 19.2 | 34.6 | 55.2 | Open source | Salesforce |
CodeGeeX-13B | 22.9 | 39.6 | 60.9 | Open source | Tsinghua University |
Codex-12B | 28.8 | 46.8 | 72.3 | Descendant of GPT-3, not open source | OpenAI |
CodeT5Plus-16B-mono | 30.9 | 51.6 | 76.7 | Open source | Salesforce |
Code-Cushman-001 | 33.5 | 54.3 | 77.4 | | |
LLaMA-65B | 23.7 | - | 79.3 | | |
LLaMA2-70B | 29.9 | - | - | | |
CodeGen2.5-7B-mono | 33.4 | 58.4 | 82.7 | Open source | Salesforce |
StarCoder-15B | 33.2 | 61.0 | 84.7 | Open source | BigCode |
CodeGeeX2-6B | 35.9 | 62.6 | 88.3 | Open source | Tsinghua University |
Pass@1 uses `n=20, t=0.2, top_p=0.95`; Pass@10 and Pass@100 use `n=200, t=0.8, top_p=0.95`.
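For reference, the unbiased Pass@k estimator from the Codex paper, which the results above follow, can be computed like this (a standard formula, not code from this repository):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k from the Codex paper: 1 - C(n-c, k) / C(n, k),
    where n is the number of samples and c the number that pass."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# e.g. 20 samples of which 7 pass gives Pass@1 = 0.35
print(pass_at_k(n=20, c=7, k=1))
```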
HumanEval-X (Pass@1)
Model | Python | C++ | Java | JavaScript | Go | Rust | Overall |
---|---|---|---|---|---|---|---|
CodeGen-16B-multi | 19.2 | 18.1 | 15.0 | 18.4 | 13.0 | 1.8 | 14.2 |
CodeGeeX-13B | 22.9 | 17.1 | 20.0 | 17.6 | 14.4 | 4.3 | 16.0 |
Replit-code-v1-3B | 22.0 | 20.1 | 20.1 | 20.1 | 12.2 | 8.6 | 17.2 |
CodeGen2.5-7B-multi | 30.6 | 24.3 | 29.0 | 27.5 | 18.9 | 20.1 | 25.1 |
StarCoder-15B | 35.5 | 28.2 | 31.5 | 33.2 | 21.3 | 17.8 | 27.9 |
CodeGeeX2-6B | 35.9 | 29.3 | 30.8 | 32.2 | 22.5 | 18.1 | 28.1 |
Pass@1 uses `n=20, t=0.2, top_p=0.95`.
The above results can be reproduced with the `scripts/run_humanevalx.sh` script. For environment configuration and instructions, see the evaluation environment.
DS1000 (Pass@1)
Model | Matplotlib | Numpy | Pandas | Pytorch | SciPy | Scikit-learn | TensorFlow | Overall |
---|---|---|---|---|---|---|---|---|
# Samples | 155 | 220 | 291 | 68 | 106 | 115 | 45 | 1000 |
CodeGen-16B-Mono | 31.7 | 10.9 | 3.4 | 7.0 | 9.0 | 10.8 | 15.2 | 11.7 |
code-cushman-001 | 40.7 | 21.8 | 7.9 | 12.4 | 11.3 | 18.0 | 12.2 | 18.1 |
Codex-001 | 41.8 | 26.6 | 9.4 | 9.7 | 15.0 | 18.5 | 17.2 | 20.2 |
CodeGeeX2-6B | 40.5 | 25.5 | 14.5 | 17.3 | 19.3 | 24.0 | 23.0 | 23.1 |
StarCoder-15B | 51.7 | 29.7 | 11.4 | 21.4 | 20.2 | 29.5 | 24.5 | 26.0 |
Codex-002 | 57.0 | 43.1 | 26.5 | 41.8 | 31.8 | 44.8 | 39.3 | 39.2 |
Pass@1 uses `n=40, t=0.2, top_p=0.5`.
The above results can be reproduced using the DS1000 evaluation code .
Quantized inference performance
CodeGeeX2 is more deployment-friendly than the previous generation. Thanks to Multi-Query Attention and Flash Attention, inference is faster, and after quantization the model needs only 6GB of GPU memory to run.
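A sketch of loading the model quantized, assuming CodeGeeX2-6B exposes the same `quantize()` helper as its ChatGLM2 base via its remote code (an assumption based on the shared architecture, not documented here):

```python
from transformers import AutoModel

# Assumption: the model's remote code provides ChatGLM2's .quantize() helper;
# quantize(4) should give roughly the INT4 footprint shown in the table below.
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True).quantize(4).cuda()
model = model.eval()
```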
Quantization
Model | FP16/BF16 | INT8 | INT4 |
---|---|---|---|
CodeGeeX-13B | 26.9 GB | 14.7 GB | - |
CodeGeeX2-6B | 13.1 GB | 8.2 GB | 5.5 GB |
Tested on PyTorch 2.0; efficient attention is implemented with `torch.nn.functional.scaled_dot_product_attention`.
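A minimal sketch of the PyTorch 2.0 operator mentioned above (tensor shapes are arbitrary illustration values, not the model's actual dimensions):

```python
import torch
import torch.nn.functional as F

# Arbitrary illustration shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

# PyTorch 2.0 dispatches to an efficient kernel (e.g. FlashAttention-based)
# when one is available for the given device and dtype
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```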
Inference
Model | Inference speed (characters/second) |
---|---|
CodeGeeX-13B | 32 |
CodeGeeX2-6B | 94 |
Tested with `batch_size=1, max_length=2048`, all models using the acceleration framework, on a `GeForce RTX-3090`.
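For orientation, a characters-per-second figure like the above can be measured along these lines — a sketch of one plausible procedure, not the benchmark script behind the table; it reuses `tokenizer` and `model` from the quick start:

```python
import time

# Hypothetical measurement sketch: generate once and divide the number of
# decoded characters by wall-clock time. Assumes tokenizer/model from above.
prompt = "# language: Python\n# write a bubble sort function\n"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)

start = time.time()
outputs = model.generate(inputs, max_length=2048, top_k=1)
elapsed = time.time() - start

text = tokenizer.decode(outputs[0])
print(f"{len(text) / elapsed:.1f} characters/second")
```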