Multiple Sequence Alignment & Tree Building

Links
http://journals.sagepub.com/doi/10.3181/0903-MR-94 (Minireview coronavirus)
http://www.biotrainee.com/thread-2253-1-1.html (phylogenetic tree related)
https://blog.csdn.net/Cccrush/article/details/90695891 (details several evolutionary tree construction methods and principles)

Target species and sequence

Species: the seven coronaviruses capable of infecting humans
Source sequences: the published NCBI reference sequences; only six of the seven species are used here.

Related Seq list

Principles and methods of multiple sequence alignment

Multiple sequence alignment algorithm

Related tools

  1. ClustalX / ClustalW (the former has a graphical interface; the latter is the command-line version)
  2. T-Coffee
  3. MultAlin
  4. MAFFT
  5. MEGA X (commonly used)

Several tree-building methods

  1. Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
  2. Minimum Evolution (ME)
  3. Least Squares (LS)
  4. Neighbor-Joining (NJ)

All four of the above are distance-based methods: they build the tree from the evolutionary distances computed between species.
There is also a second class of tree-building methods, character-based methods. They are skipped here for now and left for a later post.
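
To make the distance-method idea concrete, here is a minimal UPGMA sketch in Python. It is an illustration only, not what any of the tools below run internally: the function name `upgma`, the internal `a+b` cluster labels, and the three-taxon toy distances are all made up for the example. At each step it merges the two closest clusters and recomputes distances to the merged cluster as a size-weighted average.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with UPGMA and return a Newick string.

    names: list of taxon labels
    dist:  dict mapping frozenset({a, b}) -> distance between a and b
    """
    # Each active cluster maps label -> (newick text, number of leaves, node height).
    clusters = {n: (n, 1, 0.0) for n in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the closest pair of active clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        ta, na, ha = clusters.pop(a)
        tb, nb, hb = clusters.pop(b)
        # The new node sits at half the distance between the two clusters.
        height = d[frozenset((a, b))] / 2.0
        text = f"({ta}:{height - ha:.3f},{tb}:{height - hb:.3f})"
        key = a + "+" + b  # hypothetical label for the merged cluster
        # Distance from the merged cluster to every other cluster:
        # average of the two old distances, weighted by cluster size.
        for c in clusters:
            d[frozenset((key, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        clusters[key] = (text, na + nb, height)
    text, _, _ = next(iter(clusters.values()))
    return text + ";"


# Toy distances (invented for illustration, not taken from the alignments here).
toy = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
}
print(upgma(["A", "B", "C"], toy))
```

UPGMA assumes a constant rate of evolution along all branches, so it always produces a rooted tree with equal root-to-tip distances; NJ drops that assumption.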

Hands-on practice

MUSCLE & ClustalW

Several of the tools above are available on the EBI website (many other tools can also perform multiple sequence alignment). Here we use MUSCLE, ClustalW, and MAFFT to obtain the final trees directly.

Links:
https://www.ebi.ac.uk/Tools/msa/muscle/
https://www.ebi.ac.uk/Tools/msa/clustalo/

Visualization of results

MUSCLE: Accurate MSA tool, especially good with proteins. Suitable for medium alignments.
ClustalW (the EBI link above points to Clustal Omega, which is the tool this description refers to): New MSA tool that uses seeded guide trees and HMM profile-profile techniques to generate alignments. Suitable for medium-large alignments.
MAFFT: MSA tool that uses Fast Fourier Transforms. Suitable for medium-large alignments.
The latter two give similar results, which may be closer to the true relationships.

Newick text

# Muscle
(
(
KP198610:0.22253,
(
NC_002645.1:0.16531,
MK334047.1:0.15607)
:0.08099)
:0.01856,
NC_019843.3:0.22538,
(
NC_045512.2:0.09935,
NC_004718.3:0.10340)
:0.11661);

# ClustalW
(
(
NC_019843.3:0.23351,
(
NC_045512.2:0.09863,
NC_004718.3:0.10330)
:0.12454)
:0.02357,
KP198610:0.23317,
(
MK334047.1:0.16005,
NC_002645.1:0.16886)
:0.09141);

# MAFFT
(
KP198610:0.23000,
(
MK334047.1:0.15815,
NC_002645.1:0.16536)
:0.08813,
(
NC_019843.3:0.22966,
(
NC_045512.2:0.09772,
NC_004718.3:0.10361)
:0.12929)
:0.03177);
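
The Newick strings above are easier to compare programmatically than by eye. The following is a rough sketch of a parser for the subset of Newick used here (unquoted names, optional branch lengths, no comments), plus a helper that lists the non-trivial splits of an unrooted tree. The function names `parse_newick`, `leaves`, and `splits` are chosen for this example and do not come from any library; for real work a maintained parser is a better choice.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, length, children) tuples."""
    s = "".join(text.split()).rstrip(";")  # drop whitespace and the final ';'
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(s) and s[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while pos < len(s) and s[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name = s[start:pos]  # empty for unnamed internal nodes
        length = None
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return (name, length, children)

    tree = node()
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def leaves(tree):
    """Leaf names below a node, in left-to-right order."""
    name, _, children = tree
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaves(child)]


def splits(tree):
    """Non-trivial bipartitions of the leaf set, ignoring rooting and branch lengths."""
    all_leaves = frozenset(leaves(tree))
    ref = min(all_leaves)  # fixed reference leaf; each split is stored as the side without it
    found = set()

    def walk(n):
        below = frozenset(leaves(n))
        side = all_leaves - below if ref in below else below
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        for child in n[2]:
            walk(child)

    walk(tree)
    return found


muscle = "((KP198610:0.22253,(NC_002645.1:0.16531,MK334047.1:0.15607):0.08099):0.01856,NC_019843.3:0.22538,(NC_045512.2:0.09935,NC_004718.3:0.10340):0.11661);"
mafft = "(KP198610:0.23000,(MK334047.1:0.15815,NC_002645.1:0.16536):0.08813,(NC_019843.3:0.22966,(NC_045512.2:0.09772,NC_004718.3:0.10361):0.12929):0.03177);"

print(sorted(leaves(parse_newick(muscle))))
print(splits(parse_newick(muscle)) == splits(parse_newick(mafft)))
```

Because the comparison treats the trees as unrooted, the position of the trifurcation in each string does not matter. On the MUSCLE and MAFFT strings the two split sets come out equal, which suggests the visible differences between those trees lie in branch lengths and in how the tree is drawn rather than in the grouping of sequences.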

Local MEGA X

Tree-building process

graph TB;
Align --> Input_integrated_fasta;
Input_integrated_fasta --> Align_by_ClustalW;
Align_by_ClustalW --takes_long_time--> Phylogenetic_analysis_in_Data_option;
Phylogenetic_analysis_in_Data_option --> Compute_pairwise_distance_in_Distance_option;
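
The last step of this process computes a pairwise distance for every pair of aligned sequences. As a rough illustration of what such a distance is, the sketch below computes the p-distance (the proportion of differing sites) in Python. This is an assumption made for the example: MEGA X offers several substitution models, and the post does not say which one was used. The function names and the three short toy sequences are invented.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites, skipping columns where either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue  # pairwise deletion of gapped columns
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(aligned):
    """aligned: dict name -> aligned sequence. Returns frozenset({a, b}) -> p-distance."""
    return {
        frozenset((a, b)): p_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }


# Toy alignment (invented for illustration).
toy_alignment = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "AC-TTCGA"}
matrix = distance_matrix(toy_alignment)
for pair, value in matrix.items():
    print(sorted(pair), round(value, 4))
```

A dictionary keyed by `frozenset` pairs keeps the matrix symmetric by construction, and it is the input shape a distance-based tree builder such as UPGMA or NJ needs.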

Distance matrix and resulting tree

Distance matrix
Predicted result

Manual tree-building results

About the Newick format

Origin www.cnblogs.com/zhengjm/p/12640256.html