PSP - Amber Relax anomaly and MSA search anomaly in AlphaFold2

Welcome to my CSDN: https://spike.blog.csdn.net/
This article address: https://spike.blog.csdn.net/article/details/131533695

Amber

Amber minimization is a method for optimizing the three-dimensional conformation of a protein. Based on a molecular force field, it iteratively adjusts the atomic coordinates until the potential energy of the protein reaches a minimum. The minimization is divided into two steps:

  • The first step is Steepest Descent, which moves the atoms in the direction opposite to the potential energy gradient until a local minimum is reached or the maximum number of steps is exceeded;
  • The second step is Conjugate Gradient, which uses the gradient information of the last two iterations to compute a better search direction, so that it converges to the nearby local minimum in fewer steps. It does not escape local minima, and neither step guarantees the global minimum.

Amber minimization can effectively remove unreasonable bond lengths, bond angles and steric clashes (atoms placed too close together) in protein structures, improving the stability and reliability of the structures.
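
The two-stage scheme described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the minimizer used by Amber or AlphaFold2: the two-variable harmonic `energy` stands in for a real force field, the line search assumes a roughly quadratic energy, and all function names (`energy`, `gradient`, `step_along`, `minimize`) are hypothetical.

```python
import math


def energy(x):
    """Toy potential energy (assumption): a harmonic well with its minimum at (1, -2)."""
    return 0.5 * (x[0] - 1.0) ** 2 + 5.0 * (x[1] + 2.0) ** 2


def gradient(x):
    """Analytic gradient of the toy energy."""
    return [x[0] - 1.0, 10.0 * (x[1] + 2.0)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def step_along(grad, x, g, d, eps=1e-6):
    """Move along direction d with a 1D Newton step; curvature is estimated by a finite difference."""
    length = math.sqrt(dot(d, d))
    if length == 0.0:
        return x
    h = eps / length
    shifted = [xi + h * di for xi, di in zip(x, d)]
    curvature = (dot(grad(shifted), d) - dot(g, d)) / h
    # Fall back to a small fixed step if the energy is not convex along d.
    alpha = -dot(g, d) / curvature if curvature > 0.0 else 1e-3
    return [xi + alpha * di for xi, di in zip(x, d)]


def minimize(f, grad, x0, sd_steps=5, cg_steps=50, tol=1e-8):
    """Steepest descent first, then Fletcher-Reeves conjugate gradient."""
    x = list(x0)
    # Stage 1: steepest descent, always moving against the gradient.
    for _ in range(sd_steps):
        g = grad(x)
        if math.sqrt(dot(g, g)) < tol:
            return x
        x = step_along(grad, x, g, [-gi for gi in g])
    # Stage 2: conjugate gradient, mixing the new gradient with the previous direction.
    g = grad(x)
    d = [-gi for gi in g]
    for _ in range(cg_steps):
        if math.sqrt(dot(g, g)) < tol:
            break
        x = step_along(grad, x, g, d)
        g_new = grad(x)
        beta = dot(g_new, g_new) / dot(g, g)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        if dot(g_new, d) >= 0.0:  # not a descent direction: restart from the gradient
            d = [-gn for gn in g_new]
        g = g_new
    return x


print(minimize(energy, gradient, [4.0, 3.0]))  # close to [1.0, -2.0], the minimum of the toy energy
```

On a real force field the energy surface has many minima, so the result depends on the starting structure; the sketch only shows why the conjugate gradient stage needs fewer steps than steepest descent alone.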

1. Amber Relax anomaly

Test case: T1104-D1_A117.fasta. Configuration: 10 CPUs, 40Gi memory:

/nfs_baoding/chenlong/workspace_v2/af2-by-chenlong/mydata/casp15-fasta-idr2-56/T1104-D1_A117.fasta
# fasta
>A
XXEDSEVEAVAKGLEEMYANGVTEDNFKNYVKNNFAQQEISSVEEELNVNISDSCVANKIKDEFFAMISISAIVKAAQKKAWKELAVTVLRFAKANGLKTNAIIVAGQLALWAVQCG

Anomaly: the relax step makes no progress. Test command, with the GPU enabled and the number of models set to 1 (-l 1):

bash run_alphafold.sh -o mydata/results_v2/casp15-fasta-56-af2-idr2-outputs/ -f mydata/casp15-fasta-idr2-56/T1104-D1_A117.fasta -g true -r true -e true -l 1

log:

command args: --fasta_paths=mydata/casp15-fasta-idr2-56/T1104-D1_A117.fasta --output_dir=mydata/results_v2/casp15-fasta-56-af2-idr2-outputs/ --max_template_date=2022-04-01 --db_preset=full_dbs --model_preset=monomer --benchmark=false --use_precomputed_msas=true --num_multimer_predictions_per_model=1 --use_gpu_relax=true --logtostderr --use_saved_msa=false --run_only_msa=false --use_no_template=false

output:

error: 'Amber minimization can only be performed on proteins with well-defined residues. This protein contains at least one residue with no atoms.' for T1104-D1_A117

Reason: The protein sequence contains the unknown amino acid X, so Amber minimization cannot be applied to the predicted structure. It is recommended to replace X with A or G.
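
A minimal sketch of that replacement, assuming a plain FASTA file whose header lines start with ">". The function name `replace_unknown_residues` is hypothetical and not part of AlphaFold2; note that the substituted residues are placeholders, so the predicted structure at those positions should not be interpreted.

```python
def replace_unknown_residues(fasta_text, replacement="G"):
    """Return FASTA text with every unknown residue X in sequence lines replaced."""
    if replacement not in ("A", "G"):
        raise ValueError("replacement should be 'A' or 'G'")
    lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Header line: keep as-is, even if it contains the letter X.
            lines.append(line)
        else:
            lines.append(line.replace("X", replacement).replace("x", replacement))
    return "\n".join(lines) + "\n"


fasta = ">A\nXXEDSEVEAVAKGLEEMYANG\n"
print(replace_unknown_residues(fasta, "G"))
```

Writing the result to a new FASTA file, rather than overwriting the original, keeps the unmodified target sequence available for comparison.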

2. MSA search anomaly

Running AlphaFold2 for protein structure prediction consumes a lot of memory, so it is recommended to run the MSA search first and then perform structure prediction. This improves the overall inference speed while avoiding bfd_uniref_hits.a3m search exceptions. That is, add the -h true parameter:

nohup bash run_alphafold.sh -o mydata/results_v2/casp15-fasta-56-af2-idr1-outputs/ -f mydata/casp15-fasta-idr1-56/ -h true > casp15.idr1-msa-only.out &
nohup bash run_alphafold.sh -o mydata/results_v2/casp15-fasta-56-af2-idr2-outputs/ -f mydata/casp15-fasta-idr2-56/ -h true > casp15.idr2-msa-only.out &

Bug1: Encountered the bug "UserWarning: Flag xxx has a non-None default value"

That is:

UserWarning: Flag --use_gpu_relax has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!

Reason: The default value of the use_gpu_relax flag needs to be set to None; it cannot be set to True or False.

Bug2: Encountered the bug "MSA 1 must contain at least one sequence."

Reason: An exception occurred during the MSA search for T1106s2-D1_A111_to_A107, and the interruption left an empty bfd_uniref_hits.a3m file in:

casp15-fasta-56-af2-idr1-outputs/T1106s2-D1_A111_to_A107/msas  # MSA folder

Solution: modify the file alphafold/data/pipeline.py:

...
bfd_out_path = os.path.join(msa_output_dir, 'bfd_uniref_hits.a3m')

# BugFix: MSA 1 must contain at least one sequence.
if os.path.isfile(bfd_out_path) and os.path.getsize(bfd_out_path) == 0:  # check whether the file is empty
  logging.info('[CL] File bfd_uniref_hits.a3m is empty, remove it to avoid bug.')
  os.remove(bfd_out_path)

hhblits_bfd_uniref_result = run_msa_tool(
    msa_runner=self.hhblits_bfd_uniref_runner,
    input_fasta_path=input_fasta_path,
    msa_out_path=bfd_out_path,
    msa_format='a3m',
    use_precomputed_msas=self.use_precomputed_msas)
...
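
Before rerunning, it can also help to check whether other targets were left with empty MSA files. The following is a minimal standalone sketch, not part of AlphaFold2: the function name `find_empty_msa_files` is hypothetical, and the default suffixes assume that MSA results are stored as .a3m or .sto files under each target's msas folder.

```python
import os


def find_empty_msa_files(output_dir, suffixes=(".a3m", ".sto")):
    """Return the sorted paths of zero-byte MSA files found under output_dir."""
    empty = []
    for root, _dirs, files in os.walk(output_dir):
        for name in files:
            path = os.path.join(root, name)
            # A zero-byte file usually means the search was interrupted.
            if name.endswith(suffixes) and os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)


if __name__ == "__main__":
    # The output directory below is the one used in this article.
    for path in find_empty_msa_files("mydata/results_v2/casp15-fasta-56-af2-idr1-outputs/"):
        print("empty MSA file:", path)
```

The sketch only reports the files; whether to delete them or rerun the search for those targets is left to the user.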

Then run inference again:

nohup bash run_alphafold.sh -o mydata/results_v2/casp15-fasta-56-af2-idr1-outputs/ -f mydata/casp15-fasta-idr1-56/ > casp15.idr1.out &

It is also recommended to search the MSA on a large-memory machine first and then infer the structure, which resolves the problem.
