Peking University used ChatGPT to set up a development team: the large model played multiple roles and collaborated to complete software development tasks

55caa05c55018f67ae1917e357db706f.gif

"A person's ability is limited, but a team's strength is unlimited", this sentence is vividly reflected in the software development in the real world. For complex tasks, people use teamwork strategies to solve them. But does the same pattern apply in the AI ​​world?

In April this year, the team of Professor Li Ge of Peking University proposed a new self-collaboration (self-collaboration) model . It allows multiple large models to play different roles and form a software development team without human participation. Through the cooperation and interaction between large models, the entire software development process can be completed autonomously, even including some complex code generation tasks.

969cba78ef26a46003a120912ae63802.png

Paper link: https://arxiv.org/pdf/2304.07590.pdf

Paper publication time: 2023/4/15

Although large language models (referred to as: large models) have demonstrated amazing capabilities in code generation, there are still challenges in handling complex tasks. In the real software development process, people usually solve complex tasks through the strategy of collaborative teamwork, which can significantly control the complexity of development and improve the quality of software.

Inspired by this, the researchers propose a self-collaboration framework for code generation using large models. Specifically, through role instructions, 1) multiple large-scale language models play different "expert" roles, each model is responsible for handling a specific subtask in a complex task; 2) specify the way of cooperation and interaction, so that different roles form A virtual team that helps each other get the job done, eventually working together on code generation tasks without human intervention.

In order to effectively organize and manage this virtual team, the researchers cleverly integrated the waterfall model in the software development methodology into the framework, and formed a basic team consisting of three ChatGPT roles (namely analyst, programmer and tester) , implement the analysis, coding, and testing phases of the software development process.

The experimental results show that compared with directly using large model code generation, the performance of self-collaboration code generation is greatly improved, and even makes GPT-3.5 surpass GPT-4. Furthermore, the researchers show that self-collaboration enables large models to efficiently handle more complex real-world code projects that are often intractable by direct code generation.

1d1ade5051de0b1534210dcca19e4e56.png

Figure 1: The Self-collaboration framework for code generation and its examples.

This research has created a new path of software development using artificial intelligence language models, which closely integrates artificial intelligence with all stages of the software development process, which not only improves development efficiency, but also ensures software quality. By exploiting the potential of LLMs such as ChatGPT, stronger support for inter-model cooperation and interaction can be provided, thereby facilitating the success of virtual teams in handling complex software development tasks. This self-collaboration framework provides a new and more efficient approach to automatic code generation, helping to drive innovation and advancement in the field of software development. Furthermore, this work can serve as a basis for future research on self-cooperative methods in various fields and the development of more advanced and specialized virtual teams for more complex tasks.

The following is a detailed introduction to the Self-collaboration framework and the example of establishing a virtual team according to the software development methodology based on the framework.

be3475cc61fc530ce664c10e99591c54.png

Self-collaboration framework

Given a requirement x, perform Self-collaboration with a large model to generate an output y. The task is defined as 7a7e611b2d387932727c59517ad54283.png. The Self-collaboration framework consists of two parts: division of labor and cooperation.

In the division of labor part, researchers use prior knowledge to decompose complex tasks into a series of stages  3b43689519275470fe1fa75d70ce39da.pngand construct some different roles 5f4610a04ec1b40aa486ee9048072840.png, which are based on large models and role instructions. Each stage  is responsible 74544d912f85d62304dae57663ba8f11.pngfor one or more roles  699280185673841f10fb8a47a1e851f6.png.

Large models are notoriously sensitive to context because they are trained to predict subsequent words based on previous words. Therefore, a widely used way is to control the generation of large models through instructions or hints. Researchers assign identities and responsibilities to large models using specific types of directives, known as role directives . Specifically, the researchers asked the large model to play a specific role closely related to its responsibilities and convey the detailed tasks that this role should perform.

The advantage of using character commands is that they only need to be provided once at the beginning of the interaction. In subsequent interactions, only intent is conveyed, not a combination of instruction and intent. Therefore, role instructions improve the overall efficiency and clarity of subsequent communication and cooperation.

In the collaboration part, the researchers focus on facilitating efficient interactions between large models that assume different roles within a self-collaboration framework. Each megamodel contributes to the overall task by performing its assigned duties under the direction of its assigned role directives. As the stages progress, large models communicate their outputs with other large models, exchanging information and outputting y.

The output format of the large model can be effectively controlled by using the role instruction. Combined with fundamental aspects of language models, this can initially establish communication between large models.

The cooperation part can be formalized as:

a2510d1ac3c7b80311cf9a4245de4d54.png

Among them  e2bfcfd2aafb7b000523dfa3473efc32.pngis c821fa223bfdf28b0cc283250902ac71.pngthe output of the stage, 9838805bc8c7b59d21b2ff7362b59c62.pngindicating  18bd111501f313e40658aa852f4cedc8.pngthe output of the prerequisite stage, 9027dc474a2f2f6371d721f1ac2ce0f6.pngindicating  24ffd05b26dcf0879c4f7ecac6354f81.pngthe corresponding role. Note that  2d27548cb293523dd3613929ff45ff28.pngself-collaboration frameworks can be parallelized if the relationship between stages is not linear. Computation  db14eddaca3c34c8935bf564c2db75b2.pngis viewed as collaboration, where characters  08745d41b291fa547031531bec304cd0.png are generated in cooperation with characters from each preceding stage  28e29562599dfa13a5d8e579a7797cef.png. The output y  2f1e3eb88266b9114b0d45e17acf1f1c.pngis iteratively updated as the stages progress:

597d24600571f5b79d1882c4541c963f.png

where f is an update function. To facilitate effective collaboration, the researchers established a shared blackboard from which each role obtains the information it needs to complete its respective tasks  11dd5ea2c89f2abc9eebf9a02e16f8b2.png. Algorithm 1 presents the complete algorithm of the self-collaboration framework.

622c9fbf4da3405a55a64a09d393ee92.png

79c2f1534220ecafec7ad07ea55415c4.png

instantiate

The researchers introduced the classic waterfall model in software engineering methodology into the self-collaboration framework to make the team collaboration of code generation more efficient. Specifically, the researchers design a simplified waterfall model consisting of three stages of analysis, coding, and testing as an instance of self-cooperative code generation. The workflow of this example follows the waterfall model from one stage to the next, and if problems are found, it goes back to the previous stage for refinement. Therefore, the researchers established a basic team, including analysts, coders, and testers, responsible for the analysis, coding, and testing phases, as shown in Figure 1 (right). These three distinct roles are assigned the following tasks:

Analyst: The analyst's goal is to develop a high-level plan and focus on guiding programmers to write programs, rather than delving into implementation details. Given a requirement x, the analyst decomposes x into several easy-to-solvable subtasks for programmers to implement directly, and develops a plan outlining the major steps of implementation.

Programmer: As the core role of the team, programmers will receive plans from analysts or test reports from testers throughout the development process. Therefore, the researcher assigns two main responsibilities to the programmer through the role description: 1. Write code that meets the specified requirements and follows the plan provided by the analyst. 2. Fix or refine the code, taking into account the test report feedback from the testers. The details of encoder role instructions are shown in Figure 2.

Tester: Testers take code written by programmers and record test reports covering aspects such as functionality, readability, and maintainability. Researchers advocate that models simulate the testing process and generate test reports, rather than generating test cases and then manually testing code by executing them, so as to facilitate interaction and avoid extra work.

The researchers developed role directives for this instance to play the three roles. An example of an encoder role directive is shown in Figure 2. In this case, the role directive includes not only the role description (the role and its responsibilities), but also the team description and user requirements, which together will initialize the ChatGPT agent and thus set the behavior of ChatGPT. The instance only 3d70dd01a684af1260c19437730ce4f7.pngupdates the output when the stage is coding 237c617c9ef1c92cefa8e35c1f5ef691.png, and this development process ends when the testers confirm 8bcb14dd002ab886ff34d2b8f14ca53b.pngthat the requirements are met.

2162549f9000c2ee7ced8303df607c31.png

5b2a67008c101a1676128f6c63e0137c.png

Experimental results

7e365858f3d8a6f08f66fab097cbd4f0.png

The researchers compared self-collaboration code generation with various state-of-the-art (SOTA) methods, and experimental results show that the self-collaboration framework significantly improves the performance of the underlying large model. It is worth noting that even with a simple three-person team (including analysts, programmers, and testers), self-collaboration code generation based on ChatGPT (GPT-3.5) achieved the best performance among the four code generation benchmarks. Best performance, even surpassing GPT-4. Given the gap in the underlying large model itself, applying the self-collaboration framework to a more powerful model, such as GPT-4, will yield better results.

7974beb2cec583896896afe09941f2f6.png

The researchers further studied code generation using only natural language descriptions, a setting that is closer to actual software development. Under this setting, the researchers compared the performance of each ChatGPT role in primary teams instantiated by the self-collaboration framework, as shown in Table 2. The experimental results show that compared with only using the programmer role, the performance of the two-person or three-person team is significantly improved.

016d4abdb3f6b8690c46e2c1baab9900.png

The researchers also investigated the self-cooperation capabilities of large models at different model sizes. The researchers evaluated the effectiveness of self-collaboration methods on complex tasks, especially those that are challenging for direct code generation. For such tasks, researchers employ self-collaboration strategies as solutions. As shown in Figure 6, with the expansion of the model scale, the coding ability of the large model usually shows an increasing trend, and the self-cooperation ability starts to appear at around 7B parameters, and then continues to improve. Experimental results show that self-cooperation can help stimulate the latent intelligence of large models.

7ca80efd56b492b9f0af79540cfa5942.png

Furthermore, the researchers demonstrate an example of self-collaboration code generation, as shown in Figure 4. In this test report, the tester pointed out that the implemented code may cause duplicate elements to be removed from the list, which may cause some edge test cases to fail. Therefore, it is recommended to remove the line "lst = list (set (lst))" from the implemented code. The programmer then removed the line "lst = list (set (lst))" based on feedback in the test report. In the last interaction, the tester confirms that the modified code has passed all tests and meets the requirements, and the code generation process ends.

7f2f7a95b6bdb314094e901260811c4a.png

ab2a0025366757f1d7232d95a196610c.png

The researchers also applied the self-collaboration framework to two examples of more complex real-world coding projects, game development and web page production, as shown in Figures 5 and 9. Self-collaboration can produce complete game logic and a satisfactory game interface. For the development of weather forecast web pages, it can also correctly call the external weather interface to realize all functions. However, direct code generation does not cover all required functions and has bugs, which is not effective.

In conclusion, the self-collaboration framework exhibits significant performance gains in code generation tasks, and multi-role teams are able to deal with various problems and challenges more effectively than single-role teams. This approach provides a new research direction in the field of natural language processing and code generation, which is worthy of further exploration and optimization. Future work may include the exploration of more roles and more powerful models, and the application of the self-collaboration framework to other natural language processing tasks.

ae043b06d4ecf9173ff5481cab13e81b.png

in conclusion

In this work, the researchers propose a self-collaboration framework, which aims to enhance the problem-solving ability of large models through collaborative and interactive methods. Specifically, the researchers explore the potential of ChatGPT in facilitating team-based code generation and collaboration in the software development process. To this end, the researchers assembled an elementary team of three different ChatGPT roles with the aim of comprehensively addressing the code generation task. To evaluate the effectiveness and generalization performance of our self-collaboration framework, we conduct extensive experiments on various code generation benchmarks. Experimental results provide substantial evidence supporting the effectiveness and generalizability of the self-collaboration framework. The researchers believe that enabling models to form their own teams and cooperate to complete complex tasks is a key step towards AGI. In the future, this research technology will also be directly applied to the products of aiXcoder (an intelligent software development system based on large code models).

7679c8c2bb544d23ee8de2b17dd2ea3e.gif

Guess you like

Origin blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/131989685