A developer's guide to growth in the era of large models | New Programmer


[Editor's Note] The launch of the GPT series has shaken the world and every industry, and developers feel it most deeply. Led by ChatGPT and GitHub Copilot, AI programming assistants keep emerging one after another. The programming paradigm is undergoing unprecedented change: from assembly, to high-level languages such as Java, to today's prompt engineering characterized by natural language. The threshold for programming has been lowered further, leaving many developers wondering how programming will evolve. In this era of large models, where should developers go? For this reason, New Programmer 007: Developers in the Big Model Era specially invited senior programmer Phodal to write this article, hoping it will be helpful to every developer going forward.

Note: "New Programmer 007" focuses on the growth of developers, including Turing Award winner Joseph Sifakis and former OpenAI scientist Joel Lehman and others are far-sighted, and have growth paths, engineering practices, and pit experience that are crucial to developers. Everyone is welcome to click to subscribe for an annual pass.

Author | Huang Fengda (Phodal)

Editor | Tang Xiaoyin

Produced by | New Programmer Editorial Department

In the past year, the emergence of ChatGPT has brought a series of new changes to the entire software development industry. Individuals, teams, and company CXOs alike are paying attention to the efficiency gains brought by generative AI.

  • In product development, generative AI (AIGC) has begun to affect every stage of the product life cycle. It can be used to generate candidate product designs, optimize existing designs, improve testing efficiency, and raise product quality.

  • In software development, its applications are wide and varied: from project initiation and planning, to system design, coding, testing, and maintenance. It helps developers generate context-aware code drafts and even test code. It also helps optimize the integration and migration of legacy frameworks, providing natural-language translation capabilities that transform legacy code into modern code.

For developers, large generative language models pose challenges but also provide valuable opportunities.


Figure 1 Corporate AIGC investment strategy

Following our earlier analysis of the corporate AIGC investment strategy/difficulty curve (shown in Figure 1), we again divide the journey into three stages:

  • L1: Learn to get along with LLMs and improve personal or team efficiency.

  • L2: Develop an LLM-first application architecture and explore implementation at organizational scale.

  • L3: Fine-tune and train large language models, binding deeply to specific scenarios.

Admittedly, some companies may have made organizational adjustments (commonly known as layoffs) in anticipation of the performance gains promised by generative AI. But we can also see that AIGC has brought many new opportunities: it can not only improve development efficiency but also create new fields and solutions, bringing epoch-making changes to the software development industry.

Simply put, if we developers want to stay competitive, we must master LLM capabilities to improve productivity, and we can also join the ranks of those building LLM applications.

L1: Learn to get along with LLMs to improve personal or team efficiency

Over the past year, we have seen many people run into problems while using LLMs. But it is equally obvious that LLMs can improve our efficiency in many areas, especially on tedious matters such as documents, test cases, and code.

What are LLMs good at, and what are they not?

First, we need to build an understanding of LLMs. Different models have different strengths, owing to factors such as training data and parameter count. For example, in some WeChat bots we use Wenxin Yiyan (ERNIE Bot) to query real-time information, then combine it with domestic and foreign open source and closed source models (such as ChatGPT) for refinement. And when writing English documents or emails, we give priority to foreign models.

Second, we need to know what LLMs are not good at. An LLM is a language model that is good at generating text; in essence it is a probability model, so it needs other tools to handle what it is not good at (such as mathematical calculation). We should not expect an LLM to do the arithmetic for us; rather, we should expect it to generate the formulas, code, and so on for the calculation based on our context.
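To make this concrete, here is a small Kotlin sketch of that division of labor: ask the LLM for calculation code, then run the code locally instead of trusting the model's arithmetic. The function below stands in for code an LLM might plausibly return for such a request; its name and shape are illustrative assumptions:

// A plausible snippet an LLM might return for the request
// "write Kotlin to compute the mean and p95 of a list of latencies".
// We run this locally rather than asking the model to do the math itself.
fun latencyStats(latencies: List<Double>): Pair<Double, Double> {
    require(latencies.isNotEmpty()) { "need at least one sample" }
    val sorted = latencies.sorted()
    val mean = latencies.average()
    // index of the 95th percentile, clamped to the last element
    val p95Index = (sorted.size * 0.95).toInt().coerceAtMost(sorted.size - 1)
    return mean to sorted[p95Index]
}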

Learn to communicate with LLMs to improve personal efficiency

We communicate with an LLM mainly through prompts, and constructing a prompt is a process of continuous iteration; you must keep experimenting to find the pattern that suits you best. For example, the pattern I am used to is:

  • Role and task. Tell the LLM what role to play and what it needs to do.

  • Background. Provide the necessary context to improve the chance of a well-matched answer.

  • Requirements. Put constraints on the output, such as the format and content to return.

  • Lead-in words (optional). Start the answer for the LLM so that it better follows our intent.

The following is part of a prompt generated during the author's tests of the open source IDE plugin AutoDev:

Write unit test for following code. You are working on a project that uses Spring MVC,Spring WebFlux,JDBC to build
RESTful APIs.

You MUST use should_xx style for test method name.
When testing controller, you MUST use MockMvc and test API only.

// class BlogController {
// blogService
// + public BlogController(BlogService blogService)
// + @PostMapping("/blog")     public BlogPost createBlog(CreateBlogDto blogDto)
// + @GetMapping("/blog")     public List<BlogPost> getBlog()
// }

// selected code information
Start with `import` syntax here:

The final import statement is determined by whether the user selects a class or a method; if it is a method, the ending becomes: "Start with @Test syntax here:". Since most open source code models are trained mainly on English, and the code used for training is itself "in English", English prompts yield better results.

Refine context to control cost, and make full use of various tools

After chatting with a great many people, the author found their biggest pain point with AIGC tools: the time spent writing the prompt often exceeds the time it would take to complete the task itself.

Therefore, to some extent, the context we provide does not have to be precise, but it must be concise, to save ourselves time. In terms of time cost, then, we should consider adopting tools, or building our own, to improve this process.

For developers, popular tools currently on the market include GitHub Copilot, ChatGPT, Midjourney, and other content generation tools. The reason GitHub Copilot generates good results is that it analyzes similar context from the current code file and the editing history, then hands it to the LLM; the whole process is fully automated, so it saves a lot of time. When using tools like Midjourney, we likewise build an automatic prompting step: from a one-sentence requirement, generate a prompt that fits our own habits, and use that to generate the image.
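Based purely on the description above, a Copilot-style context assembly step might look roughly like the following Kotlin sketch. This is a hypothetical simplification, not GitHub's actual implementation:

// Score recently opened files by token overlap with the current file,
// then prepend the best matches to the prompt (hypothetical sketch).
fun buildPrompt(currentFile: String, recentFiles: List<String>): String {
    fun tokens(text: String): Set<String> =
        text.split(Regex("\\W+")).filter { it.isNotEmpty() }.toSet()

    val currentTokens = tokens(currentFile)
    val context = recentFiles
        .sortedByDescending { tokens(it).intersect(currentTokens).size }
        .take(2)
        .joinToString("\n\n") { it.take(200) }
    return "// similar context:\n$context\n\n$currentFile"
}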

Considering that each tool may cost 10 to 30 US dollars per month, we still need to study carefully which combination is the most suitable.

It is worth mentioning that although an LLM can improve your output, your workload may also rise. The tool costs you incur should therefore be covered by your organization, since your output per unit of time has increased.

Personal development: Improve the abilities that AI is not good at

As AIGC costs fall further, some departments may be downsized because of generative AI. This is not because AIGC can replace humans, but because people expect a 20% to 30% performance improvement, and after piloting it, some teams found exactly that.

Assuming AIGC can improve a team's effectiveness by 20%, management will consider cutting headcount by 20%. More interestingly, once the team shrinks by 20%, communication costs fall and efficiency improves further, so the team's overall effectiveness increases by more than 20%.

In the short term, developers who have mastered AIGC will not be eliminated by this trend. In the long term, the trend will intensify in a development industry that is already quite involuted. More than ten years ago, knowing some Java was enough; today the bar is knowing not only Java design patterns but also various Java algorithms. Therefore, from the perspective of personal development, we should deliberately improve the abilities that AI is not good at.

In terms of capability, AI is not good at solving problems with complex contexts, such as architecture design and software modeling. On another level, because AI acts as a knowledge base, it can solve basic problems in software development for us (such as the syntax of a given language), making it easier to pick up new languages. This further encourages developers to become generalists and multi-language developers.

Rather than only improving those capabilities, in the short term we should join the ranks of those building on large models. This is a brand-new field that does not require knowledge of traditional AI algorithms; only engineering and application knowledge is required.

L2: Develop an LLM-first application architecture and explore implementation at organizational scale

Similar to the mobile wave a decade ago, generative AI lacks a large pool of talent. Such talent is hard to recruit directly from the market and must be converted from existing technical talent. So, since we may not be able to beat AI, let's join the wave.


The AI wave is sweeping in (Source: generated by the editor using AIGC)

PoC experiments: Integrate LLMs into existing products and explore an LLM-first architecture

In the past year, a large number of developers have joined the ranks of LLM application development: from all kinds of online chatbots, to integrations with internal IT systems, to applications in the business itself. The technology involved is not complicated; it requires only basic prompting skills, plus design around the user experience and so on. Only by actually using an LLM in a project will we discover its advantages and disadvantages.

Considering the capability gaps between models, we recommend starting with one of the better large language models, to learn what the best models can offer. From the author's perspective, the model I use most is ChatGPT 3.5: first, domestic models are expected to reach its level by the end of 2023 or in 2024; second, its cost is relatively low, so it can be applied at scale.


Figure 2 LLM-first architecture

Based on our internal and external experience, the following are the four principles of the LLM-first architecture (shown in Figure 2):

  • User-intent-driven design. Design new human-computer interaction experiences and build domain-specific AI personas to better understand user intent. Simply put, find ways of interacting that are better suited to understanding human intent.

  • Context awareness. Build an application architecture suited to collecting business context, to generate more accurate prompts, and explore engineering methods with fast response times. That is, prompt engineering built around high-quality context.

  • Atomic capability mapping. Analyze the atomic capabilities the LLM is good at, combine them with the capabilities the application lacks, and map between the two. Let each AI do what it is good at, for example by making good use of its reasoning ability.

  • Language APIs. Explore and find a suitable new generation of APIs so that the LLM can understand, schedule, and orchestrate service capabilities. For example, natural language serves as the human-machine API, while a DSL serves as the API between the AI and the machine.

For example, when we built an application that generates SQL, charts, and UIs from text, our first design was: the user inputs a sentence, the LLM analyzes the user's intent against the context we give it, and then generates the corresponding DSL. The following is a distilled version of this type of tool's chain of thought:

Thought: Does the input contain the user story and layout information? If it is clear, end the inquiry; if not, keep asking.
Action: either "CONTINUE" or "FINISH"
Question: the follow-up question to ask
Final output: the complete question
The Thought-Action-Question loop may repeat several times until the final output is produced

Subsequently, once we have the user's intent, we generate the corresponding DSL from it, and the DSL in turn generates the corresponding SQL, charts, UI, and so on. Here is a simple DSL:

pageName: Blog detail page
usedComponents: Grid, Avatar, Date, Typography, CardMedia, Button,
------------------------------------------------------
| NavComponent(12x)                                              |
------------------------------------------------------
| Text(6x, "Title")              | Empty(6x)                     |
------------------------------------------------------
| Avatar(3x, "Avatar") | Date(3x, "Publish date") | Empty(6x)    |
------------------------------------------------------
| CardMedia(8x)                   | Empty(4x)                   |
------------------------------------------------------
| Typography(12x, "Content")                                         |
------------------------------------------------------
| FooterComponent(12x)                                              |
------------------------------------------------------

Then, the final application code is generated from the associated component sample code and the business context information.

LLM as Co-pilot: Build and apply Copilot-type applications according to your own habits

I believe most readers already have some experience with GitHub Copilot. We call this type of AI tool, built around an individual role, a Copilot-type application. So, at this stage, we call it LLM as Co-pilot:

That is, the professional division of labor in software engineering is unchanged, but each profession's technology is enhanced. An AI-based R&D tool platform assists engineers in completing tasks, affecting individual work.

It is mainly used to solve "I am too lazy to do" and "I do it repeatedly" things. Within Thoughtworks, different roles have also built their own Copilot-type applications, such as:

  • BoBa AI for product managers. Used for industry research, designing engagement plans, preparing for workshops, and more.

  • BA Copilot for business analysts. Helps business analysts turn a one-sentence requirement into acceptance criteria, test cases, and more.

  • AutoDev for developers. Used for code completion and generation, test generation, code explanation, code translation, documentation generation, and more.

  • SpeedTest for testers. Used for generating test cases, UI test code, API test code, and more.

  • ……

When building this type of tool, we need a deep understanding of how each role works and where their pain points are. For example, developers in an IDE do more than write code, so a built-in IDE AI plugin like JetBrains AI Assistant will:

  • When the user renames something, let AI suggest possible class names, method names, and variable names.

  • When the user makes a mistake, let AI suggest possible fixes.

  • When the user writes a commit message, let AI generate candidate messages.

  • ……

From a personal perspective, these tools are built for generic scenarios, so we can call them general-purpose AI tools. Within an enterprise, or if we have the ability ourselves, we can build AI tools that suit us better. While developing AutoDev, the author kept thinking about this problem: how to build an AI tool that fits me best. Hence a large number of customizable specifications, IDE smart actions, documentation generation, and other features were added to improve its efficiency. For example, you can add custom smart actions to the codebase, such as selecting code and having the tool generate tests directly:

---
interaction: AppendCursorStream
---
You are a senior software development engineer who is good at developing software in the TDD style. Based on the user's requirements, help the user write test code.

${frameworkContext}

The code related to the current class is as follows:

${beforeCursor}

The user's requirement is: ${selection}

Please start your code block with @Test:

Once the common features are complete, such a tool should provide customization capabilities like this to improve personal efficiency.

LLM as Co-Integrator: Explore building knowledge-based teams and accelerate integration between teams

Beyond the single-point tools above, how to achieve full collaboration across roles and teams is a question we need to keep exploring. In the field of software development, we call this stage LLM as Co-Integrator, defined as:

Synergy across R&D responsibilities and roles. An AI-based R&D tool platform solves communication and efficiency problems between different roles, affecting how roles interact.

It is mainly used to solve the problem of aligning information across teams. It is similar in non-software-development areas such as internal IT systems: how to use AI to help align information between teams. For example, we can help a team schedule automatically by reading personal calendars, combined with IM (instant messaging) to help settle on meeting times.

In these scenarios, many companies start by building internal knowledge Q&A systems and explore this type of tool to probe the possibilities of AIGC further. Some leading companies build internal knowledge platforms directly: employees simply upload their own documents, code, and so on, and can then ask questions against them as context. Some mature MLOps/LLMOps platforms can expose APIs on top of the knowledge platform so that it can be plugged directly into applications.

The core of these LLM-based Q&A tools is the RAG (Retrieval-Augmented Generation) pattern, which is also the core capability developers need to master when building AIGC applications. Here is a simple RAG example (based on RAGScript):

@file:DependsOn("cc.unitmesh:rag-script:0.4.1")

import cc.unitmesh.rag.*

rag {
    // use OpenAI as the LLM engine
    llm = LlmConnector(LlmType.OpenAI)
    // use SentenceTransformers as the embedding engine
    embedding = EmbeddingEngine(EngineType.SentenceTransformers)
    // use Memory as the retriever
    store = Store(StoreType.Memory)

    indexing {
        // read the document from a file
        val document = document("filename.txt")
        // split the document into chunks
        val chunks = document.split()
        // build the index
        store.indexing(chunks)
    }

    querying {
        // query
        store.findRelevant("workflow dsl design ").also {
            println(it)
        }
    }
}

A typical LLM + RAG application has two stages:

  • Indexing stage. Split documents, code, and so on into chunks, then build the index.

  • Querying stage. Find the chunks relevant to the user's query in the index, then generate the answer based on those chunks.

In both stages, different scenarios call for combining different RAG patterns to improve retrieval quality and therefore answer quality. For example, in a code-indexing scenario, the indexing stage may split code in different ways depending on code-splitting rules, while the querying stage may combine several patterns to improve the answer. Here are some common RAG patterns used in the coding world:

  • Query2Doc. Expand the original query into rewritten queries highly relevant to the user's need, then search with the rewrites and the user's original terms together (see the sketch after this list).

  • HyDE (Hypothetical Document Embeddings). Generate hypothetical documents or code and convert them into vectors, helping the retrieval system find relevant information without relevance labels.

  • LostInTheMiddle. Performance is generally highest when relevant information appears at the beginning or end of the input context, and drops significantly when the model must access relevant information in the middle of a long context, so the retrieved chunks should be ordered accordingly.
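To make the first pattern concrete, here is a minimal Kotlin sketch of Query2Doc-style query expansion. The Llm interface, its complete method, and the prompt wording are all assumptions for illustration, not part of RAGScript or any other framework:

// A hypothetical LLM interface, for illustration only.
interface Llm {
    fun complete(prompt: String): String
}

// Expand the user's query into several rewrites, then search with all of them.
fun expandQuery(llm: Llm, userQuery: String): List<String> {
    val prompt = """
        Rewrite the following search query into three alternative queries that
        are highly relevant to the user's intent, one per line:
        $userQuery
    """.trimIndent()
    val rewrites = llm.complete(prompt)
        .lines()
        .map { it.trim() }
        .filter { it.isNotEmpty() }
    return listOf(userQuery) + rewrites
}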

Due to space limitations, we will not continue the discussion here.

Design and develop an internal framework to apply LLMs at scale

As we build more and more large model applications within an organization, we discover that how to apply generative AI at scale becomes the next challenge. Different organizations approach this differently. Some build a large-model platform with a series of general capabilities on top, to help teams build applications quickly; however, that approach is limited to large IT organizations. In small organizations, what we need is a more lightweight framework, one that integrates with the various internal infrastructures and provides common capabilities.

Although a fair number of developers have used LangChain to develop AIGC applications, it is a large framework and is written in Python, so for enterprises on the JVM stack it is not easy to integrate directly with business applications. Therefore, we either build an API service layer around LangChain, or consider providing an SDK in a JVM language.


LLM SDK's location in the LLM reference architecture

Here, the Spring Framework's experimental project Spring AI provides a very good example. It borrows a series of ideas from LangChain and LlamaIndex and builds its own modular architecture, with modules imported on demand via Gradle or Maven. But for a specific enterprise, constrained by its scenarios and infrastructure, it may not integrate well into existing applications.

Therefore, in our software development scenario we built our own LLM SDK: Chocolate Factory. Besides wrapping basic LLM capabilities, it adds features specific to R&D scenarios, such as syntax-analysis-based code splitting, Git commit message parsing, Git commit history analysis, and more.
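As a rough illustration of the syntax-based splitting idea, here is a simplified Kotlin sketch that chunks a source file at function boundaries rather than fixed-size character windows. Chocolate Factory's real implementation uses proper syntax analysis; the regex below is only a stand-in:

// Chunk a source file at function boundaries instead of fixed-size windows
// (illustrative regex version; a real splitter would use a parser).
fun splitByFunctions(source: String): List<String> {
    val functionStart = Regex("""(?m)^\s*(fun|public|private|protected)\s""")
    val starts = functionStart.findAll(source).map { it.range.first }.toList()
    if (starts.isEmpty()) return listOf(source)
    return starts.mapIndexed { i, start ->
        val end = if (i + 1 < starts.size) starts[i + 1] else source.length
        source.substring(start, end).trim()
    }
}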

L3: Fine-tune and train large language models, binding deeply to specific scenarios

In the past few months, a series of open source models have emerged, joining the game and changing its rules. For most organizations, we do not need to build models ourselves; we can use open source models directly, so most developers do not need to master model training or fine-tuning. Of course, the author's own ability in this area is limited, but it is a direction worth developers' consideration.

Build an LLMOps platform to accelerate the implementation of large model applications

The LLMOps platform is a platform for managing the application life cycle of large language models (LLMs). It includes a set of tools and best practices for developing, deploying, maintaining, and optimizing LLMs. Its purpose is to let developers use LLMs efficiently, scalably, and securely to build and run real-world applications such as chatbots, writing assistants, and programming assistants. (Source: Bing Chat)

Building an LLMOps platform is somewhat harder than building large-model applications, because it must consider more issues: from model deployment and fine-tuning, to model monitoring, management, and collaboration, to the experience design for developers as its users.

In essence, what LLMOps does is accelerate the implementation of LLM applications, so it needs to address questions such as:

  • Quick PoC (proof of concept). How can the platform support quickly building a PoC to verify the feasibility of a business requirement?

  • Multi-model routing. How to unify large-model APIs of the same type, such as ChatGLM, Baichuan, and LLaMA 2, so that developers can quickly access and test them (see the sketch after this list)?

  • Compliance management. How to avoid data security issues caused by data leaving through the platform, and how to ensure data is auditable?

  • Quick application access. For example, once internal documents and materials are uploaded, external APIs can be exposed so that developers can integrate quickly.
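As a rough illustration of the multi-model routing point above, here is a minimal Kotlin sketch: one unified chat interface in front of several back-end models. The interface and class shapes are assumptions for illustration, not any vendor's actual SDK:

// One unified chat interface in front of several back-end models.
interface ChatModel {
    fun chat(prompt: String): String
}

enum class Provider { CHATGLM, BAICHUAN, LLAMA2 }

class ModelRouter(private val clients: Map<Provider, ChatModel>) {
    fun chat(provider: Provider, prompt: String): String {
        val client = clients[provider]
            ?: error("no client registered for $provider")
        return client.chat(prompt)
    }
}

A platform can then register one client per provider and let applications switch models through a single entry point.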

On this basis, an LLMOps platform, as a platform of the AI 2.0 era, also needs to provide model training, fine-tuning, and optimization capabilities, offering low-threshold model optimization for different business scenarios.

Therefore, for developers, joining the development of such a platform can also help them grow quickly.

Model fine-tuning to adapt to specific task scenarios

For most developers, the author included, joining the ranks of those training models from scratch is nearly impossible: training a good model requires a great deal of data, computing resources, and time. Therefore, we can only work on fine-tuning models to improve their performance.

As for fine-tuning, a consumer GPU (such as an RTX 3090) is often enough. Since I have been using a MacBook Pro, I use cloud GPUs when fine-tuning. What fine-tuning does is adapt a pre-trained language model to a specific task or domain. For example, one capability in the API provided by ChatGPT is Function Calling: simply put, based on user input, it outputs the calling parameters for our preset functions, i.e., it identifies intent and formats the output. We could therefore collect 10,000+ relevant samples and fine-tune a model in the tens-of-billions-of-parameters class so that it achieves the same function.
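As an illustration, a single training sample for such fine-tuning might pair a user utterance with the expected function call. The Kotlin shape below is hypothetical and does not follow any vendor's actual training format:

// A hypothetical shape for one function-calling fine-tuning sample.
data class FunctionCallSample(
    val userInput: String,              // the natural-language request
    val functionName: String,           // the preset function to invoke
    val arguments: Map<String, String>  // extracted, formatted parameters
)

val sample = FunctionCallSample(
    userInput = "What will the weather be in Shanghai tomorrow?",
    functionName = "get_weather",
    arguments = mapOf("city" to "Shanghai", "date" to "tomorrow")
)

Ten thousand or more such samples, covering varied phrasings and functions, would then form the fine-tuning dataset.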

Therefore, the essence of model fine-tuning is to quickly build a model for a specific task and scenario with the help of high-quality data. Once fine-tuned, however, a model can become much worse at other tasks, so the platform should combine dynamic LoRA loading capabilities (similar to what Stable Diffusion offers) to provide greater flexibility.

Developers therefore need to master a series of skills, including data collection and cleaning, model fine-tuning, and model evaluation and deployment. In addition, it is important to understand how to use dynamic loading and other model-enhancement techniques to meet different needs.

Self-developed enterprise and industry large models

Basic open source models are trained on public corpora and lack industry-specific corpora. For example, in the programming field, consider the composition of StarCoder's training corpus (mainly drawn from GitHub):

(Figure: composition of StarCoder's training corpus)

Some large companies in the communications industry have a large amount of C-language code, along with proprietary communication protocols that are not public. Such companies can therefore build better models from open source corpora plus internal corpora, and tools based on those models can achieve better code acceptance rates.

Therefore, for large enterprises, the general-purpose models do not meet their needs. But to develop your own model, in addition to a large public corpus you also need a large internal corpus. Since the author has limited experience in this area, I will not discuss it further here.

Summary

In the era of large models, developers face enormous opportunities and challenges. Generative AI (AIGC) is changing every aspect of the software development industry: from product development to coding, from testing to maintenance, and even to the arrangement and coordination of work tasks (LLM as Co-Facilitator).

As developers, we can gradually master these capabilities:

1. First, learn to get along with large language models (LLMs) to improve personal or team efficiency. This includes understanding LLM capabilities, learning to communicate with an LLM effectively through prompts, and making full use of various tools.

2. Focus on developing an LLM-first application architecture and explore implementation at organizational scale. This includes running prototype experiments, integrating large models into existing products, building Copilot-type applications, and designing and developing internal frameworks to scale application development.

3. Go deep into fine-tuning and training large language models, binding them to specific scenarios. Not every developer needs to master model training and fine-tuning, but it is a direction well worth considering.

In fact, looking at the broader picture, there is plenty of exploration abroad in combining large models with biology and medicine, which also deserves our attention. Back home, a hundred flowers are blooming in China. If we don't join the ranks of AI development now, then when?

About the Author:

Huang Fengda (Phodal), open source lead at Thoughtworks China, technical expert, and CSDN blog expert, is a geek and creator. He is the author of books such as "Front-End Architecture: From Entry to Micro Frontends", "Design the Internet of Things by Yourself", and "Full-Stack Application Development: Lean Practice". He focuses mainly on AI plus engineering effectiveness, as well as architecture design, IDEs, and compilers.
