Introduction to sequence alignment 2

In the previous installment, we introduced some basic concepts in sequence alignment: match, mismatch, substitution, gap, insertion, deletion, and global and local alignment. In this installment, we describe these concepts in more detail.

  • Match: the bases or amino acids at corresponding positions in the two sequences are the same, so both rows of the alignment show the same letter. Matches are often marked with "|" between the two rows. For example, in DNA sequences, A aligned with A, C with C, G with G, and T with T are all matches.
  • Mismatch: the bases or amino acids at corresponding positions in the two sequences are different, so the two rows of the alignment show different letters at that position.
  • Gap: a position where one sequence has no base or amino acid corresponding to a residue in the other sequence. When aligning sequences, a dash ("-") or another symbol is inserted into a sequence to represent a gap, so that the remaining positions line up.

 

  • Substitution: one base or amino acid has been replaced by another. In the alignment, a substitution appears as a mismatch at the corresponding position.
  • Insertion: one or more bases or amino acids have been added to a sequence. The added residues have no counterpart in the other sequence, so in the alignment they sit opposite gap symbols in that other sequence.
  • Deletion: one or more bases or amino acids have been removed from a sequence. In the alignment, a deletion appears as gap symbols in the sequence that lost the residues, opposite the residues retained in the other sequence.
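The definitions above can be made concrete by labeling each column of an already-gapped alignment. The following is a minimal sketch, not a standard tool: the function name `classify_columns`, the "-" gap symbol, and the two short rows are made up for illustration, and "insertion" / "deletion" are reported relative to the first row, which is only one possible convention.

```python
def classify_columns(row_a, row_b, gap="-"):
    """Label each column of a gapped pairwise alignment.

    Assumption: row_a is treated as the reference, so a gap in row_a
    means row_b carries an extra residue (an insertion), and a gap in
    row_b means a residue of row_a is missing (a deletion).
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")

    labels = []
    for a, b in zip(row_a, row_b):
        if a == gap and b == gap:
            raise ValueError("a column cannot be a gap in both rows")
        if a == gap:
            labels.append("insertion")
        elif b == gap:
            labels.append("deletion")
        elif a == b:
            labels.append("match")
        else:
            # A mismatch column is how a substitution shows up.
            labels.append("mismatch")
    return labels


# Hypothetical toy rows, not taken from the article's figures.
for column in zip("ATGA-GC", "ATCAAG-", classify_columns("ATGA-GC", "ATCAAG-")):
    print(*column)
```

Swapping the two rows turns every "insertion" into a "deletion" and vice versa, which is why the two are often grouped together as indels.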

 

  • Local alignment: finds the best-matching regions within the two sequences and aligns only those regions. Local alignment is often used for sequences with low overall similarity.
  • Global alignment: aligns the two sequences over their entire length to find the best overall match. Global alignment is often used for sequences with high similarity.

 

(Above the dotted line is a local alignment: the lower sequence is only partially aligned with the upper one. Local alignment reduces the number of gaps needed when aligning dissimilar sequences, and can therefore reveal short conserved regions of similarity that are usually not detected in a global alignment. Below the dotted line is a global alignment: the two sequences are aligned from beginning to end, that is, the alignment starts at the first base of one sequence and stops at the last base of the other.)
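The difference between the two modes can be sketched with a small dynamic-programming scorer. The article does not name an algorithm or a scoring scheme, so the following is an assumption throughout: it uses the textbook Needleman-Wunsch (global) and Smith-Waterman (local) recurrences with a linear gap penalty, and the scores (match = 1, mismatch = -1, gap = -1), the function name `alignment_score`, and the test sequences are all chosen purely for illustration.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Best alignment score of a and b under a simple linear gap penalty.

    local=False: global alignment (Needleman-Wunsch style); both
                 sequences must be aligned end to end.
    local=True:  local alignment (Smith-Waterman style); the score of
                 the best-matching pair of subsequences.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global mode: leading gaps are penalized.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if local:
                # Local mode: a region may start anywhere, so never go below 0.
                cell = max(cell, 0)
            score[i][j] = cell
            best = max(best, cell)

    return best if local else score[-1][-1]


# Hypothetical sequences sharing only the short region "ACGT".
s1, s2 = "GGGACGTCCC", "TTACGTAA"
print("global:", alignment_score(s1, s2))
print("local: ", alignment_score(s1, s2, local=True))
```

With these toy inputs the local score comes entirely from the shared "ACGT" region, while the global score is pulled down by the mismatches and gaps needed to cover the flanking bases. This is the effect the figure caption describes.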

Next, let us look at two sequences, seq1: ATGAAGCGTGC and seq2: ATGAAGAGTGCA. seq1 has length 11 and seq2 has length 12. We can align the two sequences as shown below, but in this alignment only 5 bases match (those at positions 1, 2, 3, 4, and 10). Could a different arrangement increase the number of matching bases?

 

Here we can introduce gaps into the two sequences, as shown below:

 

seq1 has gaps inserted at positions 10 and 12, and seq2 has a gap inserted at position 5. The two sequences now have the same length, and the number of matching bases has increased from 5 to 9.
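The effect of inserting a gap can be checked by counting matching columns before and after. The sketch below deliberately uses a shorter, made-up pair of sequences rather than seq1 and seq2 from the figures, and the helper name `count_matches` is hypothetical.

```python
def count_matches(row_a, row_b, gap="-"):
    """Count columns where both rows carry the same residue.

    Columns containing a gap never count as matches. If the rows differ
    in length, only the overlapping columns are compared.
    """
    return sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)


# Hypothetical toy sequences (not the article's seq1/seq2).
long_seq = "ATGAAGC"
short_seq = "ATGAGC"

# Position-by-position comparison without gaps.
print(count_matches(long_seq, short_seq))   # 4 matches

# The same pair after inserting one gap into the shorter sequence.
print(count_matches(long_seq, "ATGA-GC"))   # 6 matches
```

One gap shifts the tail of the shorter sequence back into register, so the bases after the gap line up again.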

The way to insert gaps is not unique: inserting them at different positions can give the same number of matches. The image below shows another way:

 

Comparing the two pictures above carefully, we find that seq2's gap is at position 4 in the second method but at position 5 in the first. The image below illustrates the difference between the two methods in more detail; the end result is the same.
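Why positions 4 and 5 can be interchangeable is easiest to see by trying every gap position. The following sketch again uses a made-up toy pair rather than the sequences in the figures, and the helper names `insert_gap` and `count_matches` are hypothetical. Positions are 1-based, as in the text.

```python
def count_matches(row_a, row_b, gap="-"):
    """Count columns where both rows carry the same residue."""
    return sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)


def insert_gap(seq, position, gap="-"):
    """Return seq with one gap symbol placed at the given 1-based position."""
    return seq[:position - 1] + gap + seq[position - 1:]


# Hypothetical toy sequences; the longer one has "AA" at positions 4 and 5.
long_seq = "ATGAAGC"
short_seq = "ATGAGC"

for position in range(1, len(short_seq) + 2):
    gapped = insert_gap(short_seq, position)
    print(position, gapped, count_matches(long_seq, gapped))
```

In this toy case the gap at position 4 and the gap at position 5 score the same, because the longer sequence has two identical bases in a row there: the single A in the shorter sequence can pair with either of them.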

 

Origin blog.csdn.net/m0_56572447/article/details/130465995