Links
http://journals.sagepub.com/doi/10.3181/0903-MR-94 (minireview on coronaviruses)
http://www.biotrainee.com/thread-2253-1-1.html (on phylogenetic trees)
https://blog.csdn.net/Cccrush/article/details/90695891 (explains several phylogenetic tree construction methods and their principles)
Target species and sequences
Species: the seven coronaviruses known to infect humans
Sequence source: published NCBI reference sequences; only six of the species are used here.
Related sequence list
Principles and methods of multiple sequence alignment
Related tools
- ClustalX / ClustalW (the former has a graphical interface, the latter is a command-line tool)
- T-Coffee
- MultAlin
- MAFFT
- MEGA X (commonly used)
Several tree-building methods
- Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
- Minimum evolution (ME)
- Least squares (LS)
- Neighbor-joining (NJ)
All four methods above are distance methods: they compute the evolutionary distance between each pair of species and build the tree from those distances.
There is also a second class of tree-building approaches, the character-based methods. I am skipping them for now and will come back to them later (still digging).
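As a rough illustration of how a distance method turns pairwise distances into a tree, here is a minimal UPGMA sketch. The function name `upgma`, the taxon labels A to D, and the toy distances are all made up for this example; they are not taken from the coronavirus data, and real tools handle ties and large matrices with more care.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA tree and return it as a Newick string.

    names: list of taxon labels
    dist:  dict mapping (label_a, label_b) pairs to a distance
    """
    # Store distances under unordered keys so lookups work in either order.
    d = {frozenset(pair): value for pair, value in dist.items()}
    # Each cluster keeps: Newick text, number of leaves, height above the leaves.
    clusters = {name: (name, 1, 0.0) for name in names}

    while len(clusters) > 1:
        # 1. Pick the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2
        text_a, size_a, h_a = clusters.pop(a)
        text_b, size_b, h_b = clusters.pop(b)

        # 2. Join them; each branch reaches from the child up to the new node.
        merged = f"({text_a}:{height - h_a:.4f},{text_b}:{height - h_b:.4f})"
        label = f"{a}+{b}"

        # 3. Distance to every other cluster: average over all leaf pairs,
        #    which equals a size-weighted mean of the two old distances.
        for other in clusters:
            d[frozenset((label, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[label] = (merged, size_a + size_b, height)

    newick = next(iter(clusters.values()))[0]
    return newick + ";"


# Hypothetical toy distances, for illustration only.
toy = {
    ("A", "B"): 2.0, ("A", "C"): 4.0, ("A", "D"): 6.0,
    ("B", "C"): 4.0, ("B", "D"): 6.0, ("C", "D"): 6.0,
}
print(upgma(["A", "B", "C", "D"], toy))
```

UPGMA assumes a constant rate of evolution along all branches, which is one reason neighbor-joining is often preferred when that assumption is doubtful.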
Hands-on practice
MUSCLE & ClustalW
Several of the tools listed above are available on the EBI website (many other tools can also perform multiple sequence alignment). Here we run MUSCLE, ClustalW and MAFFT and take the tree each one produces directly as the final result.
Links:
https://www.ebi.ac.uk/Tools/msa/muscle/
https://www.ebi.ac.uk/Tools/msa/clustalo/
Visualization of results
MUSCLE: Accurate MSA tool, especially good with proteins. Suitable for medium alignments.
ClustalW: New MSA tool that uses seeded guide trees and HMM profile-profile techniques to generate alignments. Suitable for medium-large alignments. (This is the EBI description of Clustal Omega, the tool behind the clustalo link above.)
MAFFT: MSA tool that uses Fast Fourier Transforms. Suitable for medium-large alignments.
The last two give similar results, which are probably closer to the true relationships.
Newick text
# Muscle
(
(
KP198610:0.22253,
(
NC_002645.1:0.16531,
MK334047.1:0.15607)
:0.08099)
:0.01856,
NC_019843.3:0.22538,
(
NC_045512.2:0.09935,
NC_004718.3:0.10340)
:0.11661);
# ClustalW
(
(
NC_019843.3:0.23351,
(
NC_045512.2:0.09863,
NC_004718.3:0.10330)
:0.12454)
:0.02357,
KP198610:0.23317,
(
MK334047.1:0.16005,
NC_002645.1:0.16886)
:0.09141);
# MAFFT
(
KP198610:0.23000,
(
MK334047.1:0.15815,
NC_002645.1:0.16536)
:0.08813,
(
NC_019843.3:0.22966,
(
NC_045512.2:0.09772,
NC_004718.3:0.10361)
:0.12929)
:0.03177);
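To check how much the three trees above really differ, one option is to compare their splits (bipartitions of the taxa) while ignoring branch lengths and drawing order. The sketch below is a small hand-written parser, not a library call; the helper names `clades` and `splits` are made up here, and it assumes Newick text without internal node labels, as in the output above.

```python
import re

MUSCLE = ("((KP198610:0.22253,(NC_002645.1:0.16531,MK334047.1:0.15607):0.08099)"
          ":0.01856,NC_019843.3:0.22538,(NC_045512.2:0.09935,NC_004718.3:0.10340)"
          ":0.11661);")
CLUSTALW = ("((NC_019843.3:0.23351,(NC_045512.2:0.09863,NC_004718.3:0.10330)"
            ":0.12454):0.02357,KP198610:0.23317,(MK334047.1:0.16005,"
            "NC_002645.1:0.16886):0.09141);")
MAFFT = ("(KP198610:0.23000,(MK334047.1:0.15815,NC_002645.1:0.16536):0.08813,"
         "(NC_019843.3:0.22966,(NC_045512.2:0.09772,NC_004718.3:0.10361)"
         ":0.12929):0.03177);")


def clades(newick):
    """Return (all leaf names, set of leaf sets under each internal node)."""
    text = re.sub(r"\s+", "", newick)
    text = re.sub(r":[0-9.eE+-]+", "", text)  # drop branch lengths
    stack, found, leaves, name = [], set(), set(), ""
    for ch in text:
        if ch == "(":
            stack.append(set())
        elif ch in ",);":
            if name:  # a leaf name just ended
                leaves.add(name)
                if stack:
                    stack[-1].add(name)
                name = ""
            if ch == ")":  # close the current internal node
                done = frozenset(stack.pop())
                found.add(done)
                if stack:
                    stack[-1].update(done)
        else:
            name += ch
    return frozenset(leaves), found


def splits(newick):
    """Non-trivial splits of the unrooted tree, each stored as the side
    that does not contain a fixed reference taxon."""
    leaves, found = clades(newick)
    ref = min(leaves)
    result = set()
    for clade in found:
        side = clade if ref not in clade else leaves - clade
        if 1 < len(side) < len(leaves) - 1:
            result.add(frozenset(side))
    return result


for label, tree in [("Muscle", MUSCLE), ("ClustalW", CLUSTALW), ("MAFFT", MAFFT)]:
    print(label, sorted(sorted(s) for s in splits(tree)))
print("same unrooted topology:", splits(MUSCLE) == splits(CLUSTALW) == splits(MAFFT))
```

Under this comparison the three trees share the same set of splits, which suggests that the visible differences come mainly from branch lengths and from how each tool orders and draws the tree.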
Local MEGA X
Build process
Distance matrix and tree building
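A distance matrix is computed from the aligned sequences before any distance-based tree is built. The sketch below shows two of the simplest measures, the p-distance and the Jukes-Cantor correction $d = -\tfrac{3}{4}\ln\left(1 - \tfrac{4}{3}p\right)$. The function names and the toy alignment are assumptions made for illustration; this is not necessarily the substitution model selected in MEGA X for the trees in this note.

```python
import math
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; only defined for p < 0.75."""
    if p >= 0.75:
        return float("inf")
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(alignment):
    """Map each pair of sequence names to its Jukes-Cantor distance."""
    return {
        (a, b): jukes_cantor(p_distance(alignment[a], alignment[b]))
        for a, b in combinations(alignment, 2)
    }


# Hypothetical aligned sequences of equal length, for illustration only.
toy_alignment = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "AC-TACCA",
}
for pair, dist in distance_matrix(toy_alignment).items():
    print(pair, round(dist, 4))
```

The correction matters more as sequences diverge: for small p the two measures are nearly equal, while for larger p the Jukes-Cantor distance grows faster because it accounts for multiple substitutions at the same site.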