Another scary capability unlocked! Carnegie Mellon releases a super-intelligent agent, and the research community is blown away


Text | Xiaoxi

What does an ordinary PhD student's day look like? Searching for information online? Reading the literature? Writing a couple of lines of code against the API or documentation of some mature tool, then sending it off to the lab equipment to run high-precision experiments? Think carefully about the workflow of us so-called "researchers", and we are terrified to find that most of what we do, GPT-4 already seems able to do!


The prediction that "machines will replace workers" has finally arrived in scientific research. Recently, scholars from the Department of Chemical Engineering at Carnegie Mellon University built an autonomous research agent based on large language models (LLMs). It has no name yet, so let's call it Little A. It realizes a complete pipeline from independent design and planning all the way to the execution of complex scientific experiments.

Talking about "integration" and "automation" in the abstract is a bit dry, so what can Little A actually do? Look at the figure below. Suppose I want a system that can carry out a Suzuki coupling and a Sonogashira coupling; the reaction schemes are shown in module A of the figure. The authors gave Little A a set of solutions, reagents, and the necessary operating equipment, but did not tell it which solvent and reagent to choose to complete each reaction. And, rather insidiously, the authors asked Little A to use the Heater Module, an intelligent heating module that was actually released after GPT-4's training data cutoff.

[Figure: modules A-E, showing the reaction schemes, the available resources, and Little A's search, planning, and execution steps]

Let's see what Little A did. Notice that the input we give it is really just a simple prompt like "perform the XX reaction". Little A therefore first searches the Internet for information about the required reactions, their stoichiometry, conditions, and so on; this process is recorded in module D of the figure above. Through search, Little A selected the correct reagents: among all the aryl halides, it chose bromobenzene for the Suzuki coupling and iodobenzene for the Sonogashira coupling. Notably, these choices come from search, so they can change from run to run, which actually gives Little A the ability to try different experimental schemes repeatedly and accumulate more and more meaningful information.

After selecting the reagents and catalysts, Little A used Python to calculate the volumes, amounts, and other quantities required for all the reactants, completing the experimental plan. This plan is passed, as executable code, to the operating equipment attached to Little A, as shown in module E of the figure above, so that the equipment can automatically run the experiment Little A designed.
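To make this "calculate in Python" step concrete, here is a minimal sketch of the kind of stoichiometry arithmetic the planner can emit. All quantities, concentrations, and equivalents below are illustrative placeholders, not values from the paper:

```python
# A sketch of planner-style stoichiometry arithmetic; all numbers are
# illustrative, not the paper's actual experimental values.

MW_BROMOBENZENE = 157.01          # g/mol
MW_PHENYLBORONIC_ACID = 121.93    # g/mol

def mass_needed(mmol: float, mw: float) -> float:
    """Mass (mg) of a reagent that delivers `mmol` millimoles."""
    return mmol * mw

def volume_needed(mmol: float, stock_molarity: float) -> float:
    """Volume (mL) of a stock solution that delivers `mmol` millimoles."""
    return mmol / stock_molarity  # a 0.5 M stock holds 0.5 mmol per mL

scale = 0.10                      # mmol of aryl halide, the reaction scale
print(f"bromobenzene: {mass_needed(scale, MW_BROMOBENZENE):.1f} mg")
print(f"boronic acid (1.2 eq): {volume_needed(scale * 1.2, 0.5):.2f} mL of 0.5 M stock")
```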

And most interestingly, exactly where the authors dug a pit for it, Little A did stumble: it got the name of the Heater Module wrong, so the name did not match the one in the downstream device documentation. But once Little A noticed the error, it consulted the documentation and corrected itself in time, and then successfully ran the code it had written!

This eerily "sensible" workflow, plus the almost magical self-correction, is honestly far more than just "integration and automation". Recall the complaints on Zhihu about the "bio/chem/env/materials" majors; many of them boil down to the lab grind of running furnaces, raising mice, and packing chromatography columns. If Little A can really be popularized, that would be wonderful: to a certain extent, it achieves the goal of liberating (replacing) researchers from this trivial work.

So what gives Little A the ability to pull off this whole sequence of operations? Let's dig into this paper, which deserves to be called a vanguard of the "coming era".

Paper title:
Emergent Autonomous Scientific Research Capabilities of Large Language Models

Paper link:
https://arxiv.org/abs/2304.05332

System architecture

First, let's take a detailed look at Little A's overall structure. The agent system consists mainly of four components: the "Web searcher", the "Docs searcher", the "Code execution" component, and the "Automation" module. If these four modules are Little A's limbs, then the "Planner" is its brain: it accepts the task-description prompt sent by a human and coordinates the four components as needed. In the paper, the planner is essentially GPT-4 itself, and the planner's abilities to coordinate, reason, judge, and decide all derive from GPT-4's own potential. Compared with the "brain", executing the other actions is much simpler. In any given state, Little A actually has only four action options (a minimal sketch of this dispatch loop appears after the figure below):

  • Search the Internet, issuing queries through Google

  • Consult the hardware documentation

  • Perform calculations in Python

  • Run the final experiment

[Figure: the agent's architecture, with the Planner coordinating the four components]
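The paper does not publish the planner's source code, but the loop described above maps naturally onto a dispatch over these four actions. Here is a minimal sketch, assuming a `chat(prompt)` helper that calls a GPT-4-style API and a `tools` dict of stub functions; the "ACTION: argument" reply convention is an assumption for illustration, not the paper's actual protocol:

```python
# Minimal sketch of the planner's dispatch loop. `chat`, the tool functions,
# and the "ACTION: argument" reply convention are assumptions, not the
# paper's actual code.

def planner_loop(task: str, chat, tools: dict, max_steps: int = 20) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        reply = chat("\n".join(history))           # GPT-4-style planner call
        action, _, arg = reply.partition(":")      # e.g. "GOOGLE: Suzuki conditions"
        action = action.strip().upper()
        if action == "FINISH":
            return arg.strip()
        handler = tools.get(action)                # GOOGLE / DOCUMENTATION / PYTHON / EXPERIMENT
        observation = handler(arg.strip()) if handler else f"unknown action: {action}"
        history += [reply, f"Observation: {observation}"]
    return "step budget exhausted"
```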

Among them, the "Web searcher" component receives queries from the planner, transforms them into appropriate web-search actions, and executes them with the Google Search API; the results come back to the "Web searcher" as a list of web pages. The component can also extract text from pages via the BROWSE action and compile an answer for the planner. It is worth noting that the "Web searcher" itself runs on GPT-3.5 rather than GPT-4, because it is faster and well suited to the needs of a retrieval task.
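The behavior of this component can be approximated in a few lines. The sketch below assumes a `google_search(query)` wrapper returning result URLs and a `chat35(prompt)` helper for GPT-3.5; both names are placeholders, not the paper's API:

```python
import requests

def browse(url: str, max_chars: int = 4000) -> str:
    """Fetch a page and return its (truncated) text: the BROWSE action."""
    return requests.get(url, timeout=10).text[:max_chars]

def web_searcher(query: str, google_search, chat35) -> str:
    """Search, read the top hits, and compile an answer for the planner."""
    excerpts = [browse(url) for url in google_search(query)[:3]]
    prompt = (f"Question: {query}\n\n" + "\n---\n".join(excerpts)
              + "\n\nAnswer concisely for a chemistry planner:")
    return chat35(prompt)  # GPT-3.5: faster, and sufficient for retrieval
```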

The "Docs searcher" component, meanwhile, retrieves information from hardware documentation (e.g., for a robotic liquid handler, a GC-MS, or a cloud lab) by using queries and a document index to find the most relevant pages or sections. The best matches are then aggregated into a comprehensive, accurate final answer. Since Little A's final output is a piece of executable code for the operating equipment, the focus of the "Docs searcher" is to supply the specific function parameters and syntax of the hardware API.

The "Code execution" component does not use a large language model at all: it simply executes code inside an isolated Docker container, protecting the host machine from any unintended actions of the planner. All code output is passed back to the planner, so that its predictions can be corrected if the software makes a mistake.
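Isolating the planner's code this way is easy to reproduce. A minimal sketch using the Docker CLI via `subprocess`, assuming a local `python:3.11-slim` image; the real system's sandbox details are not specified in this article:

```python
import pathlib
import subprocess
import tempfile

def run_sandboxed(code: str, timeout: int = 60) -> str:
    """Run planner-generated code in a throwaway Docker container and hand
    stdout/stderr back so the planner can correct itself on failure."""
    with tempfile.TemporaryDirectory() as tmp:
        pathlib.Path(tmp, "job.py").write_text(code)
        proc = subprocess.run(
            ["docker", "run", "--rm", "--network=none",   # isolate from host
             "-v", f"{tmp}:/work:ro", "python:3.11-slim",
             "python", "/work/job.py"],
            capture_output=True, text=True, timeout=timeout)
    return proc.stdout + proc.stderr
```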

Last but not least, the "Automation" module generates code to run on the corresponding hardware, or provides a synthesis procedure for manual experiments.

Detailed study

Having built Little A's underlying architecture, the authors then set it on many small tasks to verify each component's capability. For example, they gave Little A a prompt as simple as "synthesize ibuprofen". As shown in panel A of the figure below, via the "Web searcher" component Little A accurately identified the first step of the ibuprofen synthesis, the Friedel-Crafts acylation of isobutylbenzene with acetic anhydride under aluminum chloride catalysis, and successfully output the execution program to carry out this reaction.

[Figure: the ibuprofen synthesis example, panels A-D]

And what does Little A do when a raw material necessary for the reaction is missing from the available stock? As shown in panel D above, it reports a "missing required materials" message. And when the given reaction conditions are prone to instability, Little A will also suggest reselecting the catalyst or base.

At the same time, if Little A is given better "equipment", for example by connecting its search engine to chemical reaction databases such as Reaxys or SciFinder, the system's performance can improve significantly.

Also, drilling down into the "Docs searcher": it is, in essence, a document-retrieval model. It embeds the API documentation, computes the similarity of those embeddings against the embedding of the query, and finally locates the best-matching passage in the document, as shown in panel A of the figure below.

[Figure: documentation retrieval, panels A and B]
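In code, this retrieval step is just cosine similarity between a query embedding and the embeddings of documentation sections. A minimal sketch with NumPy, assuming some `embed(text)` function (e.g., an embedding API) is available; none of this is the paper's actual implementation:

```python
import numpy as np

def best_sections(query: str, sections: list[str], embed, k: int = 3) -> list[str]:
    """Return the k documentation sections most similar to the query."""
    q = np.asarray(embed(query), dtype=float)
    scores = []
    for text in sections:
        v = np.asarray(embed(text), dtype=float)   # in practice: pre-computed once
        scores.append(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
    top = np.argsort(scores)[::-1][:k]             # highest cosine similarity first
    return [sections[i] for i in top]
```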

But what if the API documentation we need to search has never been seen by GPT-4? As shown in panel B above, the authors submit the API documentation together with the query to Little A, and the experimental results show that Little A has the uncanny ability to map the given query to the corresponding passage of that documentation.

To some extent, this ability of Little A solves a big "expert knowledge" problem. Across the extensive API documentation of many devices and technical stacks, the highly technical language style means that interpreting the documents usually requires expert knowledge. Little A's ability to locate the relevant passages and then generate the corresponding standard code or function calls greatly lowers the "entry barrier" for potential users of these specialized technologies.

Finally, for the "Automation" module, the authors designed a simple "coloring" test for Little A. All it has to do is control the instrument and use the available solutions to color a microplate according to an instruction. Such instructions are often quite loose, e.g. "color each line with a color of your choice", and Little A must understand them and "translate" them into an executable Python program that drives the instrument. The experiments show that Little A did this job excellently.

[Figure: the microplate "coloring" test]
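As an illustration of that "translation" step, here is the kind of script the agent might emit for "color each line with a color of your choice". The `plate.dispense(...)` API is entirely hypothetical, invented for this sketch; real liquid handlers (e.g., an Opentrons OT-2) expose different call signatures:

```python
# Hypothetical liquid-handler API: `plate.dispense(...)` is invented for
# illustration and does not match any real instrument's SDK.
ROWS, COLS = "ABCDEFGH", range(1, 13)              # a standard 96-well plate
DYES = ["red", "blue", "yellow", "green"]          # the available solutions

def color_each_row(plate, volume_ul: int = 50) -> None:
    """'Color each line with a color of your choice': cycle through the dyes."""
    for i, row in enumerate(ROWS):
        dye = DYES[i % len(DYES)]                  # pick a color per row
        for col in COLS:
            plate.dispense(well=f"{row}{col}", reagent=dye, volume=volume_ul)
```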

In more complex settings, for example when not one but several instruments must be operated, Little A can still accurately identify what the instruction is asking for and control multiple instruments in concert to complete the experiment.

Across all these tests, Little A showed abilities well beyond expectations, and the authors conclude that it already has quite good reasoning ability. For example, in the Suzuki-reaction experiment, when Little A asked the system to run code that imported the SymPy package, it found that the package was not installed; after receiving the error feedback, Little A quickly adjusted the code. This capacity for self-adjustment suggests considerable promise in its reasoning ability.
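The SymPy incident reduces to a simple feedback loop: run the code, capture the traceback, hand it back to the model, retry. A minimal sketch, where `run` could be the sandboxed runner sketched earlier and `chat` is again an assumed GPT-4-style helper:

```python
def run_with_retries(code: str, chat, run, attempts: int = 3) -> str:
    """Run planner code; on failure, ask the model to repair it from the traceback."""
    output = ""
    for _ in range(attempts):
        output = run(code)                         # e.g. run_sandboxed from above
        if "Traceback" not in output:
            return output                          # success: hand result to planner
        code = chat(f"This code failed:\n{code}\n\nError output:\n{output}\n\n"
                    "Return a corrected version (e.g. avoid or install the "
                    "missing package). Reply with code only.")
    return output
```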

Safety: the top priority

Amid this series of gratifying experiments, Little A has indeed arrived, but its outstanding ability inevitably raises researchers' concerns, not over their own jobs, but over the "safety" of Little A, and indeed of any large language model used for scientific research. If Little A's ability to automatically design and even execute experiments were abused, say, for drug synthesis, the entry barrier to the crime of drug manufacturing would be drastically lowered. That alone could have an extremely serious negative impact on society, to say nothing of Little A's potential implications for chemical and biological weapons.

Therefore, the authors of the paper make a solemn statement on Little A's safety, and safety is the top priority: "We strongly state that guardrails must be put in place to deal with the potential negative effects of such large language models. We call on the AI community to put the safety of these powerful models on its agenda. We call on OpenAI, Microsoft, Google, Meta, DeepMind, Anthropic, and all other major participants in the large-model effort to do their utmost to secure their large language models. We call on the physical sciences community to engage with the developers of large language models and help them build guardrails for model safety."


Second, the authors probed Little A's current "safety" status by asking it to synthesize 11 dangerous compounds (including drugs such as marijuana and heroin). Disturbingly, although Little A refused 7 of the synthesis requests, it still provided synthesis schemes for the other 4. Among the refused compounds, the refusal usually happened at the "web search" stage: during the search, Little A discovered that the substance was a "controlled substance" and therefore declined. But this refusal pattern hides a great risk: as long as the substance's name is replaced, it may well still be possible to get Little A to generate a potential drug. Likewise, Little A's refusals can only cover substances with known or banned names, and unknown compounds could easily slip past its review, causing serious harm to society.

[Figure: synthesis requests for controlled substances and Little A's responses]
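The weakness described above, refusal keyed to substance names discovered during search, is easy to demonstrate in miniature. A toy sketch, where the blocklist and the check are purely illustrative and not the paper's actual mechanism:

```python
CONTROLLED = {"heroin", "thc", "methamphetamine"}  # toy blocklist, for illustration

def passes_naive_screen(request: str) -> bool:
    """Accept unless a known controlled-substance name appears verbatim."""
    return not any(name in request.lower() for name in CONTROLLED)

print(passes_naive_screen("synthesize heroin"))            # False: refused
print(passes_naive_screen("synthesize diacetylmorphine"))  # True: the same drug, renamed, slips through
```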

The emergence of Little A therefore poses new challenges to laboratories capable of running scientific experiments: how should they safely "screen", "monitor", and "control" experiments, admit "trustworthy" experimenters, and turn away potential "abusers" and other bad actors? Beyond model iteration itself, this becomes another important factor affecting model safety.

Summary and reflection

I think the arrival of Little A may be just the beginning of large language models changing our lives. New technologies often force us to stretch our imagination. Now that the power of language models, with GPT-4 as their representative, is suddenly laid out before us, how to better channel the potential of large language models into every industry is a meaningful question, and a challenge, for those of us living through this wave of enthusiasm.

Beyond the opportunities visible from the "crest of the wave", and even if it is far too early to speak of an intelligence crisis, let alone the singularity, we also face the possible dangers that begin with Little A, and with the Little B, Little C, and Little D that will follow. Reflecting on our own human intelligence through the development of AI, and constantly asking questions like "what about us cannot be replaced", may help us better understand and appreciate the full meaning of the motto inscribed at the Temple of Delphi: "Know thyself".

Finally, in this magical age, perhaps one sentence does hold true: "Only imagination cannot be replaced"!


Author: Xiaoxi (Cute House)

Learn NLP while learning linguistics~

