"A person's ability is limited, but a team's strength is unlimited", this sentence is vividly reflected in the software development in the real world. For complex tasks, people use teamwork strategies to solve them. But does the same pattern apply in the AI world?
In April this year, Professor Li Ge's team at Peking University proposed a new self-collaboration model. It allows multiple large models to play different roles and form a software development team without human participation. Through cooperation and interaction among the large models, the entire software development process can be completed autonomously, including some complex code generation tasks.
Paper link: https://arxiv.org/pdf/2304.07590.pdf
Paper publication date: 2023/4/15
Although large language models (hereafter "large models") have demonstrated amazing capabilities in code generation, they still struggle with complex tasks. In real software development, people usually solve complex tasks through collaborative teamwork, which keeps development complexity under control and improves software quality.
Inspired by this, the researchers propose a self-collaboration framework for code generation with large models. Specifically, through role instructions, 1) multiple large language models play different "expert" roles, each responsible for a specific subtask within a complex task; and 2) the modes of cooperation and interaction are specified, so that the different roles form a virtual team that helps each other complete the work, ultimately accomplishing code generation tasks without human intervention.
To organize and manage this virtual team effectively, the researchers integrated the waterfall model from software development methodology into the framework, forming a basic team of three ChatGPT roles (analyst, programmer, and tester) that carry out the analysis, coding, and testing phases of the software development process.
The experimental results show that, compared with using a large model directly for code generation, self-collaboration code generation improves performance substantially, even enabling GPT-3.5 to surpass GPT-4. Furthermore, the researchers show that self-collaboration allows large models to handle more complex real-world code projects that are often intractable for direct code generation.
Figure 1: The Self-collaboration framework for code generation and its examples.
This research opens a new path for software development with AI language models, integrating AI closely with every stage of the software development process; this not only improves development efficiency but also helps ensure software quality. By exploiting the potential of LLMs such as ChatGPT, stronger support for inter-model cooperation and interaction can be provided, helping virtual teams succeed at complex software development tasks. The self-collaboration framework offers a new, more efficient approach to automatic code generation, driving innovation and progress in the field of software development. Moreover, this work can serve as a basis for future research on self-collaboration methods in various fields and for building more advanced, specialized virtual teams for more complex tasks.
The following sections introduce the Self-collaboration framework in detail, along with an example of building a virtual team on top of it according to software development methodology.
Self-collaboration framework
Given a requirement x, self-collaboration with large models generates an output y; the task can thus be written as T: x → y. The Self-collaboration framework consists of two parts: division of labor and cooperation.
In the division-of-labor part, the researchers use prior knowledge to decompose a complex task into a series of stages and construct several different roles, each based on a large model plus a role instruction. Each stage is handled by one or more roles.
Large models are notoriously sensitive to context, because they are trained to predict subsequent words from preceding ones. A widely used way to control their generation is therefore through instructions or prompts. The researchers assign identities and responsibilities to large models using a specific type of instruction, called a role instruction: the large model is asked to play a specific role closely tied to its responsibilities and is told, in detail, the tasks that the role should perform.
The advantage of role instructions is that they only need to be provided once, at the beginning of the interaction. Subsequent interactions convey only intent, not a combination of instruction and intent. Role instructions thus improve the overall efficiency and clarity of later communication and cooperation.
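As a concrete illustration, a role instruction can be realized as a one-time system message in a chat-style API. The snippet below is a minimal sketch under that assumption; the message schema, helper names, and the analyst wording are illustrative, not taken from the paper.

```python
# Sketch: a role instruction is pinned once as the first (system) message,
# so every later turn carries only the intent.

def make_session(role_instruction):
    """Start a conversation whose first message fixes the model's role."""
    return [{"role": "system", "content": role_instruction}]

def send(session, intent):
    """Later turns convey only intent, not instruction + intent."""
    session.append({"role": "user", "content": intent})
    return session

analyst = make_session(
    "You are an analyst on a software team. Decompose the requirement "
    "into subtasks and outline a high-level plan; do not write code."
)
send(analyst, "Requirement: remove duplicate integers from a list.")
```

In a real system each `session` would be passed to a chat model on every turn; the point is simply that the role instruction appears once while all subsequent messages are pure intent.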
In the cooperation part, the researchers focus on facilitating effective interaction among the large models that assume different roles within the framework. Each large model contributes to the overall task by performing its assigned duties under its role instruction. As the stages progress, the large models communicate their outputs to one another, exchanging information and jointly producing the output y.
The output format of a large model can be controlled effectively via its role instruction; combined with fundamental properties of language models, this is enough to establish communication between large models.
The cooperation part can be formalized as:

y_i = ρ_i(x, y_{p(i)})

where y_i is the output of stage i, y_{p(i)} denotes the outputs of its prerequisite stages, and ρ_i denotes the corresponding role. Note that the self-collaboration framework can be parallelized when the relationship between stages is not linear. The computation of a stage is viewed as collaboration, in which role ρ_i generates y_i in cooperation with the roles of each preceding stage. The output y is iteratively updated as the stages progress:

y = f(y, y_i)

where f is an update function. To facilitate effective collaboration, the researchers established a shared blackboard from which each role obtains the information it needs to complete its respective task. Algorithm 1 presents the complete algorithm of the self-collaboration framework.
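The stage loop described above can be sketched in code. The following is one minimal reading of the framework (not the paper's Algorithm 1 verbatim): placeholder lambdas stand in for role-instructed large models, and "keep the coding stage's output" is an assumed choice of update function f.

```python
# Sketch of the self-collaboration loop: stages run in order, each role reads
# what it needs from a shared blackboard and publishes its own output.

def run_self_collaboration(requirement, stages):
    """stages: ordered (name, role_fn) pairs; role_fn(requirement, blackboard) -> output."""
    blackboard = {}          # shared store every role can read from
    y = None                 # overall output, updated as stages progress
    for name, role_fn in stages:
        out = role_fn(requirement, blackboard)   # role acts on prior outputs
        blackboard[name] = out                   # publish for later stages
        if name == "coding":                     # assumed update rule f: keep the code
            y = out
    return y

# Toy roles standing in for large-model calls:
stages = [
    ("analysis", lambda x, bb: f"plan for: {x}"),
    ("coding",   lambda x, bb: f"code implementing {bb['analysis']}"),
    ("testing",  lambda x, bb: "all tests passed"),
]
result = run_self_collaboration("sort a list", stages)
# → "code implementing plan for: sort a list"
```

Because later stages read only the blackboard, stages whose prerequisites do not overlap could run in parallel, matching the note above about non-linear stage relationships.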
Instantiation
The researchers introduced the classic waterfall model from software engineering methodology into the self-collaboration framework to make team-based code generation more efficient. Specifically, they designed a simplified waterfall model consisting of three stages (analysis, coding, and testing) as an instance of self-collaboration code generation. The workflow of this instance follows the waterfall model from one stage to the next, returning to the previous stage for refinement when problems are found. Accordingly, the researchers established a basic team of an analyst, a programmer, and a tester, responsible for the analysis, coding, and testing phases, as shown in Figure 1 (right). The three roles are assigned the following tasks:
Analyst: The analyst's goal is to develop a high-level plan and to guide the programmer, rather than delving into implementation details. Given a requirement x, the analyst decomposes x into several easy-to-solve subtasks that the programmer can implement directly, and develops a plan outlining the major implementation steps.
Programmer: As the core role of the team, the programmer receives plans from the analyst and test reports from the tester throughout the development process. The researchers therefore assign two main responsibilities to the programmer through the role description: 1. write code that meets the specified requirements and follows the plan provided by the analyst; 2. fix or refine the code in light of the test reports provided by the tester. The details of the programmer role instruction are shown in Figure 2.
Tester: The tester takes the code written by the programmer and produces a test report covering aspects such as functionality, readability, and maintainability. Rather than generating test cases and then testing the code manually by executing them, the researchers have the model simulate the testing process and generate the test report directly, which eases interaction and avoids extra work.
The researchers developed role instructions for this instance to set up the three roles; an example of a programmer role instruction is shown in Figure 2. Here, the role instruction includes not only the role description (the role and its responsibilities) but also the team description and the user requirement, which together initialize the ChatGPT agent and thereby set ChatGPT's behavior. The instance updates the output only when the current stage is coding, and development ends when the tester confirms that the requirements are met.
Experimental results
The researchers compared self-collaboration code generation with various state-of-the-art (SOTA) approaches, and the experimental results show that the self-collaboration framework significantly improves the performance of the underlying large model. Notably, even with a simple three-person team (analyst, programmer, and tester), self-collaboration code generation based on ChatGPT (GPT-3.5) achieved the best performance on all four code generation benchmarks, even surpassing GPT-4. Given the gap between the underlying large models themselves, applying the self-collaboration framework to a more powerful model such as GPT-4 should yield even better results.
The researchers further studied code generation from natural language descriptions alone, a setting closer to actual software development. Under this setting, they compared the performance of each ChatGPT role combination within the elementary team instantiated by the self-collaboration framework, as shown in Table 2. The results show that, compared with using the programmer role alone, two-person and three-person teams perform significantly better.
The researchers also investigated the self-collaboration capability of large models at different model sizes, and evaluated the effectiveness of self-collaboration on complex tasks, especially those that are challenging for direct code generation; for such tasks, self-collaboration is employed as the solution strategy. As shown in Figure 6, as model scale grows, the coding ability of large models generally increases, while self-collaboration ability begins to emerge at around 7B parameters and then continues to improve. The results suggest that self-collaboration can help stimulate the latent intelligence of large models.
Furthermore, the researchers demonstrate an example of self-collaboration code generation, as shown in Figure 4. In the test report, the tester points out that the implemented code may remove duplicate elements from the list, which could cause some edge-case tests to fail, and recommends removing the line "lst = list(set(lst))". The programmer then removes that line based on the feedback in the test report. In the final interaction, the tester confirms that the modified code passes all tests and meets the requirements, and the code generation process ends.
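As a simplified, hypothetical illustration of the kind of bug the tester flagged (not the paper's actual task): if a function must keep every element, routing the list through a set silently drops duplicates.

```python
# Hypothetical example: sorting a list while keeping every element.
# Converting through a set discards duplicates, so edge cases fail.

def sort_list_buggy(lst):
    lst = list(set(lst))     # the kind of line a tester would ask to remove
    return sorted(lst)

def sort_list_fixed(lst):
    return sorted(lst)

print(sort_list_buggy([3, 1, 3, 2]))  # → [1, 2, 3]      (a 3 is lost)
print(sort_list_fixed([3, 1, 3, 2]))  # → [1, 2, 3, 3]   (all elements kept)
```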
The researchers also applied the self-collaboration framework to two more complex real-world coding projects, game development and web page development, as shown in Figures 5 and 9. Self-collaboration produced complete game logic and a satisfactory game interface. For the weather-forecast web page, it also correctly called the external weather API and implemented all the required functions. Direct code generation, by contrast, failed to cover all the required functions and contained bugs.
In conclusion, the self-collaboration framework delivers significant performance gains on code generation tasks, and multi-role teams deal with problems and challenges more effectively than single-role teams. The approach opens a new research direction in natural language processing and code generation that merits further exploration and optimization. Future work may explore more roles and more powerful models, and apply the self-collaboration framework to other natural language processing tasks.
Conclusion
In this work, the researchers propose a self-collaboration framework that aims to enhance the problem-solving ability of large models through collaboration and interaction. Specifically, they explore the potential of ChatGPT for team-based code generation and collaboration in the software development process, assembling an elementary team of three distinct ChatGPT roles to address the code generation task comprehensively. To evaluate the effectiveness and generality of the self-collaboration framework, they conducted extensive experiments on various code generation benchmarks, and the results provide substantial evidence for its effectiveness and generalizability. The researchers believe that enabling models to form their own teams and cooperate on complex tasks is a key step toward AGI. In the future, this technology will also be applied directly to the products of aiXcoder (an intelligent software development system based on large code models).