CodeGeeX2 is here: the CodeGeeX programming assistant gets a full upgrade!

The second-generation CodeGeeX code generation model, CodeGeeX2-6B, has just been released and is now fully available in the CodeGeeX programming assistant plug-in. The new model is built on the ChatGLM2-6B architecture with additional code pre-training, making it more accurate, faster, and more capable. Here is what the new model brings to the CodeGeeX programming assistant:

1. Faster, more accurate code generation and smarter Q&A

Compared with the first-generation model, CodeGeeX2-6B greatly improves both the accuracy and the speed of code generation. The first-generation CodeGeeX model could only generate code left to right, continuing from the text before the cursor, whereas CodeGeeX2-6B can fill in the blank using the code both before and after the cursor. This means completions can take the full context around the cursor into account and fit it more accurately. The plug-in's question-and-answer feature, "Ask CodeGeeX", has also been upgraded on top of the new model. It previously answered questions with the ChatGLM model; it now uses a dialogue model fine-tuned from CodeGeeX2-6B, which gives more professional and intelligent answers to programming questions than before.
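To make the difference concrete, here is a minimal Python sketch of how a completion request could be assembled for each kind of model. It assumes a prefix-suffix-middle layout, a common way of prompting fill-in-the-middle models; the sentinel strings `<fim_prefix>`, `<fim_suffix>` and `<fim_middle>` and both function names are hypothetical placeholders, not CodeGeeX2's documented prompt format.

```python
# Hypothetical sentinel strings; CodeGeeX2's real special tokens, if any,
# are not given in this article.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def left_to_right_prompt(document: str, cursor: int) -> str:
    """First-generation style: the model only sees text before the cursor."""
    return document[:cursor]


def fill_in_middle_prompt(document: str, cursor: int) -> str:
    """Fill-in-the-middle style: the model sees both sides of the cursor.

    The text is arranged prefix, suffix, then the middle marker, after which
    the model is expected to generate the missing code.
    """
    if not 0 <= cursor <= len(document):
        raise ValueError("cursor must lie within the document")
    prefix, suffix = document[:cursor], document[cursor:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


source = "def add(a, b):\n    \n\nprint(add(1, 2))\n"
cursor = source.index("    ") + 4  # cursor inside the empty function body

print(repr(left_to_right_prompt(source, cursor)))
print(repr(fill_in_middle_prompt(source, cursor)))
```

The prefix-only prompt never sees `print(add(1, 2))`, while the fill-in-the-middle prompt does, which is what allows a model to produce a body consistent with the code that follows the cursor.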

2. Support for more programming languages

The new version of CodeGeeX supports more than 100 programming languages. Beyond mainstream languages such as Python, Java, JavaScript, and Go, code generation for languages such as Kotlin and Rust has improved substantially. The new model is also stronger with development frameworks commonly used by front-end developers, such as Vue. It performs especially well at generating SQL queries from natural language: in "Ask CodeGeeX", given a database table structure and a query requirement, it can automatically generate the corresponding SQL statement.
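A schema-plus-request workflow like the one described above can be sketched in Python as two steps: combine the table definition and the requirement into one prompt, then sanity-check whatever query comes back against that schema. This is an illustration under assumptions: the `employees` table, the prompt wording, and both helper names are hypothetical, and the check uses SQLite's `EXPLAIN`, which compiles a statement without running it.

```python
import sqlite3

# Hypothetical example schema; any CREATE TABLE statements would do.
SCHEMA = """
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department TEXT NOT NULL,
    salary REAL NOT NULL
);
"""


def build_sql_prompt(schema: str, request: str) -> str:
    """Combine the table structure and the natural-language requirement.

    The wording is an assumption, not the plug-in's actual template.
    """
    return (
        "Given the following SQLite table definitions:\n"
        f"{schema.strip()}\n\n"
        f"Write one SQL query that answers: {request}\n"
    )


def is_valid_for_schema(schema: str, sql: str) -> bool:
    """Check that a generated query compiles against the schema.

    EXPLAIN makes SQLite parse and plan the statement without executing it,
    so syntax errors and unknown tables or columns raise sqlite3.Error.
    It does not prove the query answers the question correctly.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        conn.execute("EXPLAIN " + sql)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


prompt = build_sql_prompt(SCHEMA, "average salary per department")
candidate = "SELECT department, AVG(salary) FROM employees GROUP BY department"
print(is_valid_for_schema(SCHEMA, candidate))                      # True
print(is_valid_for_schema(SCHEMA, "SELECT bonus FROM employees"))  # False
```

Validating against an empty in-memory copy of the schema is cheap and catches hallucinated column names, but the query's meaning still needs a human or a test dataset to confirm.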

3. From 8K to 32K: longer context support

Building on the ChatGLM2-6B base model, CodeGeeX2-6B now supports a context length of 32K. With this, the contents of other files in the current project can also be brought in as context, helping the model better understand the current development task when generating code. More features built on the 32K context length will be launched in the future, so stay tuned.
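One way to picture how other project files might share a 32K window is a greedy packer: keep the current file, then prepend related files while they still fit. The Python sketch below is an assumption-laden illustration, not the plug-in's actual strategy: it estimates tokens as roughly four characters each (a real system would use the model's tokenizer), assumes the other files are already sorted by relevance, and the `# file:` header and all names are hypothetical.

```python
from dataclasses import dataclass

CONTEXT_BUDGET = 32_000  # tokens; mirrors the 32K context length in the text


@dataclass
class SourceFile:
    path: str
    text: str


def approx_tokens(text: str) -> int:
    """Rough token count, assuming about 4 characters per token.

    A real implementation would count with the model's own tokenizer.
    """
    return (len(text) + 3) // 4


def file_block(f: SourceFile) -> str:
    """Hypothetical header format marking where each extra file starts."""
    return f"# file: {f.path}\n{f.text}\n"


def pack_context(current: SourceFile, others: list[SourceFile],
                 budget: int = CONTEXT_BUDGET) -> str:
    """Greedily prepend other files, keeping the current file last.

    `others` is assumed to be sorted by relevance, most relevant first.
    """
    used = approx_tokens(current.text)
    if used >= budget:
        # No room for anything else: keep the tail of the current file.
        return current.text[-budget * 4:]
    chosen = []
    for f in others:
        cost = approx_tokens(file_block(f))
        if used + cost > budget:
            continue  # too big; a smaller, less relevant file may still fit
        chosen.append(f)
        used += cost
    return "".join(file_block(f) for f in chosen) + current.text


current = SourceFile("main.py", "import utils\nprint(utils.greet('dev'))\n")
others = [
    SourceFile("utils.py", "def greet(name):\n    return f'hi {name}'\n"),
    SourceFile("data.csv", "x" * 200_000),  # far larger than the budget
]
prompt = pack_context(current, others)
print("# file: utils.py" in prompt, "# file: data.csv" in prompt)  # True False
```

Placing the current file last keeps the code nearest the cursor at the end of the prompt, where a left-to-right model continues from; skipping rather than stopping at an oversized file lets smaller helpers still fit.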

Appendix: Model introduction and evaluation

CodeGeeX2 is the second generation of the multilingual code generation model CodeGeeX. Unlike the first-generation model, CodeGeeX2 is built on the ChatGLM2 architecture with additional code pre-training. Thanks to the stronger performance of ChatGLM2, CodeGeeX2-6B improves substantially on many metrics. CodeGeeX2-6B handles both Chinese and English input and supports a maximum sequence length of 8192. Its inference speed is much higher than that of the first-generation CodeGeeX-13B. After quantization it needs only 6 GB of GPU memory to run, enabling lightweight local deployment.
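The 6 GB figure is easier to interpret with a back-of-the-envelope calculation of how much memory the weights alone occupy at different precisions. The Python sketch below assumes a nominal 6 billion parameters (taken from the "6B" in the name; the exact count may differ) and ignores activations, the KV cache, and runtime overhead, all of which add to the total.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Memory needed just to hold the weights, in GiB.

    Activations, the KV cache for long contexts, and framework overhead
    come on top of this, so real usage is higher.
    """
    return num_params * bits_per_param / 8 / 2**30


NUM_PARAMS = 6e9  # nominal size implied by the "6B" in the model name

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: {weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
# FP16: 11.2 GiB
# INT8: 5.6 GiB
# INT4: 2.8 GiB
```

FP16 weights alone would not fit in 6 GB, while 8-bit and 4-bit weights come in at or well below that figure; the article does not say which quantization level the 6 GB figure refers to.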

In the HumanEval evaluation, CodeGeeX2-6B fully surpasses both the larger StarCoder model and OpenAI's Code-Cushman-001 model (a model used by GitHub Copilot).
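HumanEval, like the HumanEval-X dataset discussed below, scores models with the pass@k metric: a problem counts as solved if at least one of k sampled completions passes its unit tests. Below is a minimal Python sketch of the unbiased estimator defined in the paper that introduced HumanEval (Chen et al., 2021). The article does not state how many samples or which decoding settings were used for CodeGeeX2's numbers, so the inputs here are purely illustrative.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem.

    n: completions sampled, c: completions that passed the tests,
    k: attempts allowed. Computes 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains a passing sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Illustrative per-problem counts, not real evaluation data.
print(benchmark_pass_at_k([(10, 3), (10, 0), (5, 5)], k=1))
```

For k = 1 the formula reduces to c / n, the fraction of samples that pass, which is why Pass@1 is often read as the probability that a single generated completion is correct.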

In multilingual evaluation, CodeGeeX2 performs well on the HumanEval-X dataset. Compared with the first generation, its average Pass@1 across languages has increased by 107%. Rust shows the largest gain, at 321%, and C++ and JavaScript have each improved by more than 70%.
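Relative gains of this size are easier to read as multipliers of the old score. The short Python sketch below converts the percentages quoted above into ratios; the helper names are illustrative, and no underlying Pass@1 values are assumed.

```python
def ratio_from_increase(percent: float) -> float:
    """Turn 'increased by X%' into a multiplier of the old score."""
    return 1 + percent / 100


def percent_increase(old: float, new: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    if old <= 0:
        raise ValueError("old score must be positive")
    return (new - old) / old * 100


for label, pct in [("average", 107), ("Rust", 321), ("C++/JavaScript (at least)", 70)]:
    print(f"{label}: {ratio_from_increase(pct):.2f}x the first-generation Pass@1")
# average: 2.07x the first-generation Pass@1
# Rust: 4.21x the first-generation Pass@1
# C++/JavaScript (at least): 1.70x the first-generation Pass@1
```

A 321% increase therefore means Rust's Pass@1 is more than four times its previous value; large relative gains like this typically start from a comparatively low baseline.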

Since its launch in September 2022, CodeGeeX has helped developers program more efficiently, with notable results. To date, the CodeGeeX plug-in has been downloaded more than 130,000 times, and it generates nearly 10 million lines of code every day. The upgraded plug-in remains free for individual users.

Origin blog.csdn.net/mp817/article/details/132045753