If AI programming is to go further, what key problems must large models solve?

Introduction: The origins of intelligent programming can be traced back to the code-assist features of integrated development environments (IDEs). The intelligent assistance in traditional IDEs, however, is relatively simple: based on what the developer types and on the existing project code, the IDE predicts in real time the class names, method names, and code fragments to be completed and offers a list of suggestions.

What is called intelligent programming today, shaped by deep learning technology, is quite different from what came before; the difference is essential, not incremental.

Ma Yuchi, HUAWEI CLOUD Dev AI Lab Leader and an expert in intelligent R&D algorithm technology, believes: "2021, when GitHub Copilot came out, was the first year of truly intelligent programming. Code completion before that was more like an input method's word-association feature."

Ma Yuchi, HUAWEI CLOUD Dev AI Lab Leader and intelligent R&D algorithm technical expert

Developed by GitHub and OpenAI, GitHub Copilot is built on the Codex model, which was trained on a massive corpus of code, including public repositories on GitHub. What makes GitHub Copilot special is that it is not just a code completion tool: it understands more context than most code assistants. Whether the context is documentation, comments, function names, or the code itself, GitHub Copilot synthesizes matching code from what the developer provides, suggesting entire lines of code or complete functions right in the editor.
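A hypothetical illustration of this kind of context-driven completion: the developer supplies only a signature and docstring, and a Copilot-style assistant can propose a body from that context. The example body below is written by hand to show the idea; it is not actual Copilot output.

```python
# Hypothetical illustration: the developer writes only the signature and the
# docstring; a code assistant proposes the body from that context.
# The body shown here is hand-written, not real Copilot output.

def parse_csv_line(line: str, delimiter: str = ",") -> list[str]:
    """Split a CSV line into fields, stripping surrounding whitespace."""
    # --- the kind of body an assistant might suggest ---
    return [field.strip() for field in line.split(delimiter)]


if __name__ == "__main__":
    print(parse_csv_line(" a , b ,c"))  # ['a', 'b', 'c']
```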

The rate at which Copilot has evolved is astounding. When Copilot was first released in June 2021, its accuracy was 28.8%, while the best the rest of the industry could manage at the time was 11%. By March of this year, Copilot X, now connected to GPT-4, had reached an accuracy of 67%.

"An AI programming assistant that is not based on a large model has very poor programming ability. In our opinion, it does not even belong to the category of intelligence." Ma Yuchi said. The birth of Copilot inspired Ma Yuchi and his team a lot. So at the end of 2021, Huawei combined the research and development tool CodeArts with the Pangu large model to develop an intelligent programming assistant CodeArts Snap.

CodeArts Snap is reported to have been trained on 76 billion lines of curated code, 85 million open-source repositories, and more than 13 million technical documents. It has three core capabilities: intelligent generation, intelligent Q&A, and intelligent collaboration. With a single click it can add comments and generate test cases automatically, and it can deploy intelligently from a single instruction.
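As a concrete picture of the one-click test generation mentioned above, here is a hand-written sketch of what an assistant might produce for a small function. It illustrates the idea only; it is not actual CodeArts Snap output.

```python
# Hypothetical illustration of one-click test generation: given a small
# function, an assistant could emit a pytest-style case like the one below.
# Hand-written to show the idea, not real CodeArts Snap output.

def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(value, high))


# --- the kind of test a code assistant might generate ---
def test_clamp():
    assert clamp(5, 0, 10) == 5      # value already in range
    assert clamp(-3, 0, 10) == 0     # below range -> low bound
    assert clamp(42, 0, 10) == 10    # above range -> high bound
```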

CodeArts Snap has already accumulated many users and a great deal of feedback. Ma Yuchi said that CodeArts Snap will continue to evolve and keep improving its intelligent programming capabilities. At the current stage, viewed from both model optimization and engineering optimization, code generation based on large models still faces eight key technical challenges, and these are also the directions in which CodeArts Snap will evolve.

The first is Chinese-friendly code generation. The pre-training corpora of many large models are mainly English; Chinese accounts for only 3% to 5%. When conversational interaction is used in the IDE, performance in Chinese falls far short of English. With a limited corpus, how to strengthen the model's understanding of Chinese semantics without hurting its overall performance, so that Chinese and English descriptions yield equally capable code generation, is a major concern at present.
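A small hand-written illustration of this parity goal: the same intent, stated once as a Chinese comment and once as an English comment, should lead the model to the same implementation. Both bodies below are written by hand for illustration only.

```python
# Illustration of Chinese/English parity: the same intent expressed in either
# language should yield equivalent code. Both bodies are hand-written examples.

# 计算列表中所有偶数的和
def sum_even_zh(nums: list[int]) -> int:
    return sum(n for n in nums if n % 2 == 0)

# Compute the sum of all even numbers in the list
def sum_even_en(nums: list[int]) -> int:
    return sum(n for n in nums if n % 2 == 0)

print(sum_even_zh([1, 2, 3, 4]), sum_even_en([1, 2, 3, 4]))  # 6 6
```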

The second is prompt optimization and better interactive input. A characteristic of large models is that the more precise the description and the better the prompt is written, the better the generated content. How to judge the completeness and reasonableness of the task description a user enters, even when the user's intent is not yet clear, and to clarify that intent through interaction, is key to improving the accuracy of code generation.
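One way to picture this is a pre-check that looks for obviously missing information in the task description and asks a follow-up question before any code is generated. The sketch below is a minimal, hand-rolled heuristic; the required aspects and questions are assumptions for illustration, not a real CodeArts Snap interface.

```python
# Minimal sketch of intent clarification: check the task description for
# obviously missing aspects and ask a follow-up question instead of
# generating code blindly. Aspects and questions are illustrative assumptions.

REQUIRED_HINTS = {
    "language": ("python", "java", "c++", "go", "javascript"),
    "input": ("input", "argument", "parameter", "list", "string", "file"),
    "output": ("return", "output", "print", "write"),
}

CLARIFYING_QUESTIONS = {
    "language": "Which programming language should the code be written in?",
    "input": "What inputs does the function take?",
    "output": "What should the function return or produce?",
}


def missing_details(task_description: str) -> list[str]:
    """Return the aspects the description never mentions."""
    text = task_description.lower()
    return [aspect for aspect, keywords in REQUIRED_HINTS.items()
            if not any(k in text for k in keywords)]


def next_prompt(task_description: str) -> str:
    """Either ask a clarifying question or pass the enriched prompt onward."""
    gaps = missing_details(task_description)
    if gaps:
        return CLARIFYING_QUESTIONS[gaps[0]]          # interact to clarify intent
    return f"Generate code for the task: {task_description}"


print(next_prompt("Sort a list"))  # asks which language to use
print(next_prompt("In Python, sort a list of strings and return the result"))
```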

The third is exploration of ensemble learning. Today's large models often have tens of billions, hundreds of billions, or even trillions of parameters, and running inference on models of that scale is very expensive. Can pre-trained models be combined so that models with far fewer parameters achieve the inference quality of a much larger model, while keeping accuracy acceptable, and so improve inference efficiency?
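A minimal sketch of the ensemble idea: several smaller models each propose a completion and a cheap voting step picks one, approximating the behavior of a single larger model. The "models" below are stand-in functions; a real system would call actual checkpoints and use a learned ranker or test-based scoring.

```python
# Minimal ensemble sketch: majority vote over candidates from several small
# models. The model functions are stand-ins for real checkpoints.

from collections import Counter


def model_a(prompt: str) -> str:
    return "return sorted(xs)"

def model_b(prompt: str) -> str:
    return "return sorted(xs)"

def model_c(prompt: str) -> str:
    return "xs.sort()\nreturn xs"


def ensemble_generate(prompt: str) -> str:
    """Pick the most common candidate among the small models' outputs."""
    candidates = [m(prompt) for m in (model_a, model_b, model_c)]
    best, _count = Counter(candidates).most_common(1)[0]
    return best


print(ensemble_generate("sort a list xs"))  # 'return sorted(xs)'
```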

The fourth is experience evaluation and optimization. Building evaluation indicators and methods that are objective and close to real projects will better support the healthy development of the industry.
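The article does not specify which indicators CodeArts Snap uses; one common objective metric in code-generation research is pass@k from the Codex/HumanEval line of work: generate n samples per problem, count the c samples that pass the tests, and estimate the probability that at least one of k samples would pass.

```python
# pass@k, the unbiased estimator used in the Codex/HumanEval evaluations:
# pass@k = 1 - C(n - c, k) / C(n, k) for one problem, averaged over problems.

from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem with n samples, c of which are correct."""
    if n - c < k:          # fewer than k incorrect samples -> success guaranteed
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 20 samples per problem, averaged over a small (made-up) benchmark.
results = [(20, 7), (20, 0), (20, 15)]     # (n, c) per problem
for k in (1, 5, 10):
    score = sum(pass_at_k(n, c, k) for n, c in results) / len(results)
    print(f"pass@{k} = {score:.3f}")
```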

The fifth is online learning for the model. User feedback is very helpful for improving the model's capabilities. How to fine-tune the online large model from users' explicit and implicit feedback while protecting user privacy, and how to update the online model in real time, are questions the industry still needs to settle.
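A minimal sketch, under assumed field names, of turning such feedback into fine-tuning data: explicit signals (accept or dismiss) and implicit signals (the user's edit of a suggestion) are logged, anonymized upstream, and converted into prompt/completion pairs for a later fine-tuning pass.

```python
# Sketch of converting user feedback into fine-tuning examples.
# Field names and the selection rule are illustrative assumptions.

import json
from dataclasses import dataclass


@dataclass
class FeedbackEvent:
    prompt: str          # context the model saw (anonymized upstream)
    suggestion: str      # what the model generated
    accepted: bool       # explicit signal: suggestion kept or dismissed
    final_code: str      # implicit signal: what the user actually committed


def to_training_example(event: FeedbackEvent) -> dict | None:
    """Prefer the user's final code as the target; skip rejected, unedited cases."""
    if not event.accepted and event.final_code == event.suggestion:
        return None                      # rejected and never fixed: no signal
    target = event.final_code if event.final_code.strip() else event.suggestion
    return {"prompt": event.prompt, "completion": target}


events = [
    FeedbackEvent("def add(a, b):", "return a + b", True, "return a + b"),
    FeedbackEvent("def div(a, b):", "return a / b", False, "return a / b if b else 0"),
]
dataset = [ex for ex in map(to_training_example, events) if ex]
print(json.dumps(dataset, indent=2))
```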

The sixth is low-cost SFT (supervised fine-tuning). How to build training and validation data sets for all kinds of R&D scenarios quickly and at low cost, and how to automate model training, verification, and deployment, is also critical.
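One low-cost route, sketched below under simple assumptions, is to mine instruction/response pairs from existing code (docstring as the instruction, function body as the response) and keep only pairs that still parse, so that data construction and a basic verification step are both automatic. A real pipeline would also deduplicate, filter licenses, and actually execute the code.

```python
# Sketch: mine (docstring -> body) pairs from source code as SFT examples.
# Only functions with docstrings are used; parsing itself acts as a cheap check.

import ast
import textwrap


def mine_sft_pairs(source: str) -> list[dict]:
    """Extract instruction/response pairs from functions that have docstrings."""
    pairs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and ast.get_docstring(node):
            body_src = ast.unparse(ast.Module(body=node.body[1:], type_ignores=[]))
            pairs.append({
                "instruction": f"Write a Python function `{node.name}`: "
                               f"{ast.get_docstring(node)}",
                "response": body_src,
            })
    return pairs


sample = textwrap.dedent('''
    def area(r):
        """Return the area of a circle with radius r."""
        import math
        return math.pi * r ** 2
''')
for pair in mine_sft_pairs(sample):
    print(pair)
```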

The seventh is post-processing. Post-processing is critical to how well the model performs in real application scenarios: using the project context to detect and repair compilation and runtime errors in the generated code, and using unit tests to repair logical errors in the generated program. Fixing these small problems and mistakes one by one improves the quality of code generation as a whole. In the future, more large models will generate code and tests together, checking each other in a closed loop to raise overall code quality; in that setting, post-processing is very helpful to the overall capability of the large model.
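A minimal sketch of such a post-processing loop: first check that the generated code compiles, then run it against a unit test, and hand any failure back for repair. The repair step here is a placeholder; in practice it would be another model call that receives the error text.

```python
# Sketch of a post-processing loop: compile check, then run the code with its
# unit test in a subprocess, feeding failures to a (placeholder) repair step.

import subprocess
import sys
import tempfile


def repair(code: str, error: str) -> str:
    """Placeholder: a real system would ask the model to fix `code` given `error`."""
    print("would repair with error:", error.splitlines()[-1] if error else "")
    return code


def check_generated(code: str, test_code: str, max_rounds: int = 2) -> bool:
    for _ in range(max_rounds):
        # 1. Syntax / compile check against the project context
        try:
            compile(code, "<generated>", "exec")
        except SyntaxError as err:
            code = repair(code, f"SyntaxError: {err}")
            continue
        # 2. Run the generated code together with its unit test
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code + "\n\n" + test_code)
            path = f.name
        result = subprocess.run([sys.executable, path], capture_output=True, text=True)
        if result.returncode == 0:
            return True                          # compiles and tests pass
        code = repair(code, result.stderr)       # feed logical errors back
    return False


generated = "def add(a, b):\n    return a + b"
test = "assert add(2, 3) == 5\nprint('tests passed')"
print(check_generated(generated, test))
```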

The eighth is model lightweighting. Its importance is beyond question: it bears on cost, efficiency, performance, and user experience. How to use lightweight models so that inference can run on device-side computing power, while keeping the drop in accuracy within acceptable limits, also deserves attention.
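As one example of a lightweighting technique, the sketch below applies post-training dynamic quantization in PyTorch, which stores Linear weights in int8 to shrink the model and speed up CPU inference at a small accuracy cost. The tiny model is a stand-in; the article does not say which technique CodeArts Snap actually uses.

```python
# Sketch of post-training dynamic quantization with PyTorch. The tiny
# Sequential model stands in for a much larger code model.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(512, 2048),
    nn.ReLU(),
    nn.Linear(2048, 512),
)

# Quantize Linear layers to int8 for CPU inference
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def size_mb(m: nn.Module) -> float:
    return sum(p.numel() * p.element_size() for p in m.parameters()) / 1e6

x = torch.randn(1, 512)
print("fp32 params (MB):", round(size_mb(model), 2))
print("outputs close:", torch.allclose(model(x), quantized(x), atol=0.1))
```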

 


Origin blog.csdn.net/llawliet0001/article/details/132127846