Science | University of Washington Baker team proposes new AI paradigm to design new protein complexes

The structure and biological function of a protein are determined by its amino acid sequence. The goal of artificial protein design is to create novel amino acid sequences that fold into specific structures and carry out specific functions. This is no simple problem, as it requires understanding how proteins fold in cells, a process that remains largely unknown to biophysicists. In recent years, advances in artificial intelligence and deep learning have allowed computational biologists to use neural networks to establish quantitative relationships between protein sequences and structures. As a result, artificial protein design has also made great progress.

Professor David Baker's team at the University of Washington recently published a research paper titled "Top-down design of protein architectures with reinforcement learning" in the journal Science. The study proposes a new "top-down" paradigm for protein design, develops protein design software based on reinforcement learning, and demonstrates its ability to create functional higher-order protein complexes. This breakthrough is expected to usher in a new era of protein design, with positive impacts on cancer treatment, regenerative medicine, powerful vaccines and biodegradable everyday products.

Protein design principles and new paradigms

The work published by the Baker laboratory builds mainly on two AI-based tools: ProteinMPNN, a protein design tool previously developed by the Baker laboratory, and AlphaFold2, a protein structure prediction tool developed by the Google DeepMind team two years ago.

While the field (including Baker's group) has had many successes in designing individual protein folding units, this paper by Baker and colleagues addresses the challenge of designing protein-protein complexes made of many symmetric chains. Many viruses rely on this kind of natural symmetry to form their protein shells, called capsids, which perform specific functions.

Most previous studies designed protein complexes by first designing the individual component chains and then assembling them into symmetric complexes. One problem with this so-called bottom-up design paradigm is that the monomer design process cannot take the symmetry of the final complex into account, which can lead to imperfect shape matching between the designed monomers.
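To make the shape-matching problem concrete, here is a minimal Python sketch. It is an illustration only, not code from the paper. It assumes a monomer can be reduced to a handful of 3D points, builds a cyclic (C_n) assembly by rotating those points about the z-axis, and counts point pairs from neighboring copies that fall within a clash cutoff. The function names, the toy coordinates and the 3.0 cutoff are all hypothetical.

```python
import math

def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def cyclic_copies(monomer, n):
    """Return n copies of the monomer related by C_n symmetry about z."""
    return [[rotate_z(p, 2 * math.pi * k / n) for p in monomer]
            for k in range(n)]

def count_clashes(copy_a, copy_b, cutoff=3.0):
    """Count point pairs between two copies closer than `cutoff`.

    The cutoff is a hypothetical value chosen for this toy example.
    """
    return sum(1 for p in copy_a for q in copy_b if math.dist(p, q) < cutoff)

# Toy monomer designed "in isolation": three points spread along y,
# sitting 10 units away from the symmetry axis.
monomer = [(10.0, -3.0, 0.0), (10.0, 0.0, 0.0), (10.0, 3.0, 0.0)]

for n in (4, 12):
    copies = cyclic_copies(monomer, n)
    clashes = count_clashes(copies[0], copies[1])
    print(f"C{n}: clashes between neighboring copies = {clashes}")
```

In this toy setup the same monomer fits a low-order ring but overlaps its neighbor in a high-order ring, which is the kind of mismatch a bottom-up workflow only discovers after the monomer has already been designed.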

The main technical innovation of this work from Baker's lab is the simultaneous design of the components and the global symmetry of the complex through a process called Monte Carlo Tree Search (MCTS). This is what the article calls the "top-down" design paradigm, which should help improve the efficiency and quality of designed complexes and may lead to tighter packing of the designed units.
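For readers unfamiliar with MCTS, the following is a minimal, generic sketch of the algorithm on a toy problem. It is not the Baker lab's implementation. It assumes a "design" is a short sequence of building blocks and that a scoring function can rate any complete design. `FRAGMENTS`, `TARGET` and `score` are hypothetical stand-ins for real structural fragments and a real geometric score.

```python
import math
import random

FRAGMENTS = ("helix", "loop", "strand")        # hypothetical building blocks
TARGET = ("helix", "loop", "helix", "loop")    # toy stand-in for a design goal

def score(design):
    """Toy reward: fraction of positions matching TARGET (not a real energy)."""
    return sum(a == b for a, b in zip(design, TARGET)) / len(TARGET)

class Node:
    def __init__(self, design, parent=None):
        self.design = design      # tuple of fragments chosen so far
        self.parent = parent
        self.children = {}        # fragment -> child Node
        self.visits = 0
        self.total = 0.0          # sum of rollout rewards seen below this node

    def is_terminal(self):
        return len(self.design) == len(TARGET)

    def untried(self):
        return [f for f in FRAGMENTS if f not in self.children]

def ucb1(child, parent_visits, c=1.4):
    """Mean reward plus an exploration bonus for rarely visited children."""
    mean = child.total / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts(iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node(())
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while every child has already been tried.
        while not node.is_terminal() and not node.untried():
            node = max(node.children.values(),
                       key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one unexplored child.
        if not node.is_terminal():
            frag = rng.choice(node.untried())
            child = Node(node.design + (frag,), parent=node)
            node.children[frag] = child
            node = child
        # 3. Simulation: complete the design with random fragments.
        design = node.design
        while len(design) < len(TARGET):
            design += (rng.choice(FRAGMENTS),)
        reward = score(design)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    # Read out the most-visited path as the proposed design.
    node, design = root, ()
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
        design = node.design
    return design

best = mcts()
print(best, score(best))
```

The four numbered steps are the standard MCTS loop. In a top-down setting, the score would presumably reward how well the whole assembly matches the desired global shape, so each choice of building block is judged by its effect on the final complex rather than in isolation.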

The top-down protein design paradigm proposed by the Baker laboratory 

Designing protein complexes with specific high symmetry

In recent years, artificial intelligence (AI) has shone in many fields and penetrated deeply into our daily lives. From AlphaGo in Go to AlphaFold for protein structure prediction, and from AI painting to the popular ChatGPT, artificial intelligence, as an emerging disruptive technology, is gradually releasing the huge energy accumulated by technological revolution and industrial transformation, and will profoundly change human life and ways of thinking.

AlphaGo's ability to defeat top human professional Go players relies on a machine learning approach called reinforcement learning, in which a computer program learns to make better decisions by repeatedly trying actions and receiving feedback on the results.

Returning to protein design: if a protein is compared to a Go board, then protein domains are like the stones placed on it. From this point of view, artificial intelligence software based on reinforcement learning can also be applied to the de novo design of proteins. Through extensive training, a powerful new protein design software is finally obtained.

Protein nanoparticles with natural symmetry designed by a top-down design paradigm.

To create AI software that can be used for protein design, Baker's team fed the sequence and structure information of millions of simple proteins into the computer. The AI software then made tens of thousands of attempts, improving on feedback each time, in order to reach a predetermined goal: designing new proteins from scratch. In this process, the computer stretches or bends proteins in specific ways until it learns how to fold them into the desired shape.

The research team designed hundreds of proteins with this reinforcement learning software and carried out gene cloning, protein expression and structure determination in the laboratory. To assess the accuracy of the software, they determined the actual structures of these AI-designed proteins using techniques such as electron microscopy and found them highly consistent with the structures predicted by the software.

The research team focused on designing new nanoscale structures composed of many protein molecules, which required that the proteins they designed have chemical interfaces that allow the self-assembly of the nanostructures. So the research team looked at the nanostructure of the AI-designed protein and found that every atom in it was in a predetermined position. In other words, this kind of reinforcement learning software has the ability to design with atomic precision, with the deviation between the expected and actually realized nanostructures being smaller on average than the width of a single atom.
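A deviation between a designed and an experimentally determined structure is commonly summarized as a root-mean-square deviation (RMSD) over paired atoms. The sketch below shows that calculation in Python. It assumes the two structures are already superimposed and that atoms are listed in the same order, and the coordinates are made up for illustration. This article does not state which metric or alignment procedure the paper used.

```python
import math

def rmsd(designed, observed):
    """Root-mean-square deviation between two lists of paired 3D points.

    Assumes the structures are already superimposed and atoms are paired
    in the same order (a simplification; real comparisons align first).
    """
    if not designed or len(designed) != len(observed):
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(designed, observed))
    return math.sqrt(squared / len(designed))

# Made-up coordinates (in angstroms) for three paired atoms.
designed = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
observed = [(0.0, 0.3, 0.0), (1.5, 0.0, 0.4), (3.0, 0.0, 0.0)]

print(f"RMSD = {rmsd(designed, observed):.3f} angstroms")
```

An RMSD well below the size of an atom, as in this toy example, is what a statement like "deviation smaller than the width of a single atom" would look like numerically.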

In addition, using primary cell models of blood vessel cells, the research team showed that this reinforcement learning software can also optimize protein scaffold structures. For example, packing cell receptors more densely onto a more compact scaffold could make the scaffold more effective at promoting blood vessel stability.

Cryo-electron microscopy imaging results show that the experimental structure of the computer-designed protein complex is highly consistent with the originally designed structure. 

Professor David Baker, corresponding author of the Science paper, said the research shows that reinforcement learning can do more than master board games. When trained to solve long-standing puzzles in protein science, it also excels at creating useful protein molecules. If this approach is applied to the right research questions, it could accelerate progress in a variety of scientific fields.

Summary and comments

Overall, the main innovation of this work is a new paradigm for protein complex design: the monomer structure and the higher-order symmetry between monomers are considered simultaneously. Apart from this, the other tools used in this work (ProteinMPNN and AlphaFold2) have been published previously. The reinforcement learning concept used also comes from the previously proposed Monte Carlo tree search (MCTS) algorithm, which is considered, in essence, a type of reinforcement learning. Nonetheless, applying this design paradigm to high-quality protein complexes, such as artificial viral capsids, has many important biomedical applications. In addition to designing signaling and vaccine proteins, this work could also be used to aid gene therapy.

The goal of gene therapy is to modify a patient's genes to treat or cure a disease, and a key step is to safely deliver engineered genetic cargo to target cells. Most gene therapy methods use adeno-associated viruses (AAV) as gene vectors. However, AAV is a naturally occurring virus that humans are frequently exposed to, so many patients carry antibodies against it. The methods reported in this work from the Baker lab could also be used to design new AAV-like cages, which would be very helpful for safer gene delivery and could greatly improve the efficiency of gene therapy.

References

I. Lutz et al. Top-down design of protein architectures with reinforcement learning. Science, Apr 20, 2023. Vol 380, Issue 6642, pp. 266-273. https://www.science.org/doi/10.1126/science.adf6591

