Coding assistants based on large models are gradually being accepted by the market and industry


ID: Institute of Computer Vision


Computer Vision Research Institute column


On June 6, at the Wenxin Large Model Technology Exchange Conference in Chengdu, Baidu Smart Cloud launched its code assistant "Comate" and opened it for invitation-only testing. Drawing on the understanding and reasoning capabilities of the Wenxin large model, Comate offers fast code completion, code recommendation from natural language, and automatic bug detection, with the aim of improving developer productivity across the board. Going forward, developers will be able to use Comate in mainstream development software through plug-ins and other forms. With many code assistants already on the market, can Baidu's stand out?

01 Background

As early as June 2021, to prepare for future large-model training tasks, Baidu Smart Cloud began planning a new high-performance GPU cluster. Together with NVIDIA, it completed an InfiniBand (IB) network architecture design that can accommodate 10,000 cards or more. Every GPU card in every node of the cluster is connected through the IB network. Construction of the cluster was completed in April 2022, providing EFLOPS-level computing power in a single cluster.

In March 2023, Wenxin Yiyan was born on this high-performance cluster, and it has continued to gain new capabilities through iteration. The cluster is still being expanded. Dr. Lai Junjie, General Manager of Solutions and Engineering at NVIDIA China, said: "GPU clusters interconnected by a high-speed IB network are key infrastructure in the era of large models. The largest high-performance GPU/IB cluster in the domestic cloud computing market, jointly built by NVIDIA and Baidu Smart Cloud, will accelerate Baidu's breakthroughs in the field of large models."


  • Covering the whole life cycle of large models: more comprehensive

Provides end-to-end services for data labeling, model training and evaluation, inference serving, and application integration

  • Significantly improved training and inference performance: more efficient

Training performance ranks among the world leaders on the MLPerf list; distributed parallel training of 100-billion-parameter models is accelerated, and compute utilization is greatly improved

  • Rapid application orchestration and plug-in integration: more open

Comes preset with Baidu's Wenxin large models and third-party large models, and supports flexible orchestration of plug-ins and applications, helping large models get deployed in multiple scenarios

  • Built-in sensitive word filtering: more secure

Complete authentication and flow-control security mechanisms, built-in sensitive word filtering, and a double safeguard of machine review and human review

Built-in Wenxin large model base

  • Technology leadership

    Knowledge-enhanced large model, unified paradigm supports multiple types of downstream tasks

    Advanced parallel strategy supports large model training, compression and deployment

    Controllable and reliable language understanding and generation capabilities

  • Full scene coverage

    Support dialogue interaction, free question and answer, copywriting and other capabilities

    Covering energy, finance, aerospace, industry, media and other fields

  • Low threshold and easy to use

    One line of code to call the service

    One-click automatic model fine-tuning

    A small amount of data is enough to deploy AI applications across multiple scenarios

  • Practical and deployable

    Provides enterprise-level one-stop customer service

    Connects the four-layer architecture of chip + platform + model + application

    Works with multiple partners to deliver end-to-end application deployment

02 Large model code assistant

With growing demand for digital transformation, AI is being applied more and more widely in enterprises. The high barrier to AI development, complex and varied application scenarios, and the dependence on scenario-specific labeled data have become obstacles to deploying AI at scale, while the emergence of pre-trained large models has brought new opportunities and hope.

Large models are an important lever for governments and enterprises in promoting the development of the artificial intelligence industry. They have shown significant advantages and great potential in generalization, versatility, and transferability across AI tasks such as recognition, understanding, decision-making, and generation. A code assistant that can easily and accurately help programmers with repetitive, simple, and trivial tasks is no longer a fantasy.

More and more developers now rely on tools of this kind. The current mainstream AI code assistants include GitHub Copilot X, Codeium, Tabnine, Replit Ghostwriter, and Amazon CodeWhisperer.

  • GitHub Copilot X


Copilot X is an upgrade to the Copilot released in 2021. It is connected to GPT-4 and adds features such as chat and voice. With Copilot X, you only need to speak and it writes the code for you, along with the test cases. It can also explain code snippets you do not understand and help you debug directly, making it a thoughtful little assistant for programmers.


Following the release of OpenAI's GPT-4 model, GitHub announced GitHub Copilot X, whose AI model uses GPT-4. GitHub Copilot X aims to improve the developer experience: it provides chat and voice interfaces, supports pull requests, answers questions about documentation, and offers a more personalized developer experience through GPT-4. It can explain what a piece of code does, attempt to fix bugs when they come up, and even generate unit tests.

  • Replit Ghostwriter


Replit Ghostwriter is an AI-based code assistance tool that helps developers quickly write, generate, transform, and explain code, and it can also search for and import open source code from within the editor. Replit itself is an online integrated development environment (IDE) that supports multiple programming languages, such as Python, JavaScript, and Ruby, and lets developers create, run, and share code in the browser. Replit also provides multi-person collaboration, version control, and cloud deployment, so developers can easily build and release applications. Replit AI Ghostwriter is a new feature of Replit that leverages OpenAI's GPT-4 model to provide developers with an AI-powered coding assistance tool.

Now Baidu Smart Cloud has built a new generation of coding assistance tool on the Wenxin large model: the code assistant Comate!


During development, Comate predicts code by reading the context: it combines comments, the surrounding code, and declared function names. Developers can review the suggestions and manually edit the suggested code, and repetitive code is filled in automatically.

It works by learning from top open source code in GitHub repositories worldwide, collecting data and trying to find the most relevant code, and it continuously trains on the feedback data returned by users to improve recommendation accuracy. Its core capabilities are single-line recommendation, multi-line recommendation, and natural-language-to-code conversion.
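To make the idea of "reading the context" more concrete, here is a minimal sketch of how the signals mentioned above (comments, the declared function name, and the surrounding code) might be gathered from the text before the cursor. It is only an illustration and not Comate's actual implementation: the helper `build_completion_context`, its `max_lines` parameter, and the layout of the returned dictionary are all hypothetical, and the sketch only handles Python-style `#` comments and `def` declarations.

```python
import re

# Hypothetical helper for illustration only; this is NOT Comate's real API.
DEF_PATTERN = re.compile(r"^\s*def\s+([A-Za-z_]\w*)\s*\(")


def build_completion_context(source_prefix: str, max_lines: int = 20) -> dict:
    """Collect comments, the latest declared function name, and nearby code
    from the text that precedes the cursor."""
    # Keep only the most recent lines, as a stand-in for a context window.
    lines = source_prefix.splitlines()[-max_lines:]

    # Comment lines carry the developer's intent in natural language.
    comments = [
        ln.strip().lstrip("#").strip()
        for ln in lines
        if ln.strip().startswith("#")
    ]

    # Walk backwards to find the function the developer is currently writing.
    function_name = None
    for ln in reversed(lines):
        match = DEF_PATTERN.match(ln)
        if match:
            function_name = match.group(1)
            break

    return {
        "comments": comments,
        "function_name": function_name,
        "code": "\n".join(lines),
    }


prefix = '''# Return the average of a list of numbers
def average(values):
    total = sum(values)
'''
context = build_completion_context(prefix)
print(context["function_name"])
print(context["comments"])
```

A real assistant would pass a context like this to the model, which then proposes the next line or block; how Comate actually does this is not described in the article.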

[Animation: single-line recommendation]

[Animation: multi-line recommendation]

[Animation: natural-language-to-code conversion]

In extensive internal testing, developers adopted 30%-50% of the code Comate suggested, and the adopted code accounted for more than 10% of officially added new code. It is being used in the development of more and more products. Comate supports mainstream IDEs and currently covers 30+ languages, in particular mainstream ones such as C/C++, Python, Java, Go, PHP, and JavaScript.
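The two percentages above measure different things: one is relative to what the assistant suggested, the other is relative to all new code. A small sketch makes the distinction explicit. The function `adoption_metrics` and the line counts are made up for illustration, and counting in lines is an assumption, since the article does not say how the figures were measured.

```python
from typing import Tuple


def adoption_metrics(
    suggested_lines: int, accepted_lines: int, new_lines_total: int
) -> Tuple[float, float]:
    """Return (acceptance rate, share of new code) as fractions.

    acceptance rate   = accepted suggestions / all suggestions
    share of new code = accepted suggestions / all newly added code
    """
    if suggested_lines <= 0 or new_lines_total <= 0:
        raise ValueError("line counts must be positive")
    acceptance_rate = accepted_lines / suggested_lines
    share_of_new_code = accepted_lines / new_lines_total
    return acceptance_rate, share_of_new_code


# Made-up counts, chosen only to fall inside the ranges quoted in the article.
rate, share = adoption_metrics(
    suggested_lines=1000, accepted_lines=400, new_lines_total=3200
)
print(f"acceptance rate: {rate:.0%}, share of new code: {share:.1%}")
```

With these illustrative numbers, 400 accepted lines out of 1000 suggested is a 40% acceptance rate, while the same 400 lines are 12.5% of the 3200 new lines written in total.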

© THE END 

For reprints, please contact this account for authorization


ABOUT

Institute of Computer Vision

The Institute of Computer Vision works mainly in deep learning, focusing on research directions such as object detection, object tracking, and image segmentation. It regularly shares the algorithm frameworks of the latest papers, and the platform emphasizes both "research" and "practice". Later, we will share hands-on walkthroughs for each field, so that everyone can move beyond theory into real-world scenarios and build the habits of programming and thinking for themselves!

 

 

 


Origin blog.csdn.net/gzq0723/article/details/131160177