第五章模体和循环

　　本章将继续Python语言的基础知识。到本章结尾，你将了解如何：

- 搜索DNA或蛋白质中的模体
- 用键盘与用户交互
- 将数据写入文件
- 使用循环
- 使用字符串内置函数find
- 根据条件测试的结果执行不同的操作
- 通过操作字符串和列表来详细检查序列数据

　　在本章中，你将学习编写一个在序列数据中查找模体的程序，并且提供编写生物信息学程序所需的技能。

1. 流程控制

　　流程控制是执行程序语句的顺序，一般程序按顺序从顶部第一个语句到底部最后一个语句执行，有两种方法可以改变程序执行顺序：条件语句和循环。条件语句只有条件成功时才执行一组语句，否则只是跳过一组语句。循环重复一组语句直到关联的测试失败。

　　1.1 条件语句

　　让我们回想open语句，如果你尝试打开的文件不存在，则会收到错误消息。因此，在尝试打开文件之前，你可以先判断文件是否存在。实际上，这些判断是计算机语言最强大的功能之一。if、if-else是python中存在的判断语句。

　　这些结构的特征是计算True/False值，如果条件为True，则执行其下面的语句；为假，则跳过语句（反之亦然）。

　　然而，什么是真？不同的编程语言可能会稍微不同。

　　本节演示了一些Python条件的示例。每个例子中的真假条件是两个数字之间的相等，数字的相等性由两个等号“==”表示，因为单个等号=已用于赋值给变量。以下示例演示条件测试是否计算为True/False，通常这些简单地测试没有太多用处，大多数条件测试是判断变量的值或函数调用返回的结果——你事先不知道的事情。

　　条件测试为True：

if 1 == 1：
　　print（"1 equals 1\n\n"）

　　输出结果：

1 equals 1

　　测试条件是“1 == 1”，条件计算值为True，则执行与if语句关联的语句，打印出消息。

　　你也可以这样判断：

if 1:
    print("1 evaluates to true\n\n")

　　输出结果：

1 evaluates to true

　　条件测试为False：

if 1 == 0 :
    print("1 equals 0\n\n")

　　没有输出结果。测试条件是“1 == 0”，条件计算结果为False，因此不执行与if关联的语句，也不打印任何消息。

　　你也可以这样判断：

if 0:
    print("0 evaluates to true\n\n")

　　也没有输出结果，因为0的计算结果为False，因此不执行与if关联的语句，也不打印任何消息。

　　现在我们再看看计算结果为True的if-else语句：

if 1 == 1:
    print("1 equals 1\n\n")
else:
    print("1 does not equal 1\n\n")

　　输出结果为：

1 equals 1

　　if-else在测试条件为True时执行一项操作，如果为False则执行另一项操作。如下是计算结果为False的if-else语句：

if 1 == 0:
    print("1 equals 0\n\n")
else:
    print("1 does not equal 0\n\n")

　　输出结果为：

1 does not equal 0

　　1.1.1 条件判断和缩进

　　关于条件判断有两点需要额外说明。首先，有几个运算符可以在条件判断部分使用，除了前面示例中的“==”之外，还有不等于“!=”、大于“>”、小于“<”等等。其次，请注意条件后面的语句是缩进的，称之为块，Python中用缩进来决定逻辑行的缩进层次，从而用来决定语句的分组。正确的缩进对Python程序运行至关重要。

　　再看例子5-1中if-elif-else语句，条件判断首先是if，然后是elif，依次评估，一旦结果为True，就会执行其块，并忽略其余条件。如果没有条件的计算结果为True，则执行else块。

例子5-1 if-elif-else

#!/usr/bin/env python
# if-elif-else

word = 'MNIDDKL'

# if-elif-else conditionals
if word == 'QSTVSGE':
    print("QSTVSGE\n")
elif word == 'MRQQDMISHDEL':
    print("MRQQDMISHDEL\n")
elif word == 'MNIDDKL':
    print("MNIDDKL--the magic word!\n")
else:
    print('Is \"%s\" a peptide? This program is not sure.\n' % word)

exit()

　　注意else块print语句中的\"，它允许你打印双引号"。反斜杠字符告诉Python将"视为符号本身而不是将其解释为字符串结尾的标记。

　　例子5-1输出结果：

MNIDDKL--the magic word!

　　1.2 循环

　　循环允许你重复执行块内的语句块，在Python中有几种循环方式：while循环、for循环等等。例子5-2（来自第四章）显示了while循环以及从文件中读取蛋白质序列数据时如何使用它。

例子5-2 从文件中读取蛋白质序列数据， 4

#!/usr/bin/env python
import os
# Reading protein sequence data from a file, take 4

# The filename of the file containing the protein sequence data
proteinfilename = 'NM_021964fragment.pep'

# First we have to "open" the file, and in case the
# open fails, print an error message and exit the program.
if os.path.exists(proteinfilename):
    PROTEINFILE = open(proteinfilename)
else:
    print("Could not open file %s!\n" % proteinfilename)
    exit()
    

# Read the protein sequence data from the file in a "while" loop,
# printing each line as it is read.
protein = PROTEINFILE .readline()
while protein:
    print("  ######  Here is the next line of the file:\n")
    print(protein)
    protein = PROTEINFILE .readline()


# Close the file.
PROTEINFILE.close()

exit()

　　例子5-2输出如下：

  ######  Here is the next line of the file:
MNIDDKLEGLFLKCGGIDEMQSSRTMVVMGGVSGQSTVSGELQD
  ######  Here is the next line of the file:
SVLQDRSMPHQEILAADEVLQESEMRQQDMISHDELMVHEETVKNDEEQMETHERLPQ
  ######  Here is the next line of the file:
GLQYALNVPISVKQEITFTDVSEQLMRDKKQIR

　　首先，程序读取文件的第一行赋值给变量protein，然后在while循环中，循环每次给变量protein赋值文件下一行的内容，并判断读取的行是否为空来判断条件是否为False，是否退出循环，跳过两个print语句块。条件为True，新行储存在变量protein中，并执行带有两个print语句块。

　　1.2.1 open函数和os模块

　　open函数调用时系统调用，因为要打开文件，Python必须从操作系统中请求该文件。操作系统可以是Unix/Linux、Microsoft Windows、Apple Macintosh等等，文件由操作系统管理，只能由它访问。检查系统调用的成功或失败是一个好习惯，特别是在打开文件时。如果系统调用失败，并且没有检查它，程序将继续读取或写入无法打开的文件。你应始终检查故障，并在无法打开文件时立即通知用户或退出程序。

　　os模块是Python内置的一个模块，提供了一个统一的操作系统接口函数, 这些接口函数通常是平台指定的，os 模块能在不同操作系统平台（如 nt 或 posix）中的特定函数间自动切换，从而能实现跨平台操作。在例子5-2中，调用os模块下面的os.path.exists函数用来判断系统中是否存在文件，如果存在则调用open函数打开文件；不存在，程序输出错误信息，然后退出。

　　总而言之，Python中学习条件和循环不难，而且条件和循环是编程语言最强大的功能之一。条件允许你为程序制定多个备选方案，并根据获得的输入类型做出决策。循环利用计算机的速度，通过几行代码就可以处理大量输入或不断迭代和优化计算。

2. 搜索模体

　　在生物信息学中最常见的事情之一是寻找特别感兴趣的模体、DNA或蛋白质短片段，它们可能是DNA的调节原件或短链蛋白质，在许多物种中都是保守的。（PROSITE网站提供了有关蛋白质模体的广泛信息。）

　　在生物序列中寻找的模体通常不是一个特定序列，它们可能具有多种变体，例如，存在碱基或氨基酸无关紧要的位置，可能有不同的长度。这种情况可以使用正则表达式，在第九章和本专辑其它地方你将看到更多例子。

　　Python有一套便于查找字符串的功能，例子5-3介绍了这种字符串搜索功能，类似的程序一直在生物学研究中使用。它执行以下操作：

- 从文件中读取蛋白质序列数据
- 将所有序列数据放入一个字符串中以便于搜索
- 查找用户输入的模体

例子5-3 寻找模体

#!/usr/bin/env python
import os
# Searching for motifs

# Ask the user for the filename of the file containing
# the protein sequence data, and collect it from the keyboard
print "Please type the filename of the protein sequence data: ";

proteinfilename = input()

# open the file, or exit
if os.path.exists(proteinfilename):
　　PROTEINFILE = open(proteinfilename)
else:
　　print("Could not open file %s!\n" % proteinfilename)
　　exit()

# Read the protein sequence data from the file, and store it
# into the array variable proteins
proteins = PROTEINFILE.readlines()

# Close the file - we've read all the data into @protein now.
PROTEINFILE.close()

# Put the protein sequence data into a single string, as it's easier
# to search for a motif in a string than in an array of
# lines (what if the motif occurs over a line break?)
protein = ''.join(proteins)

# Remove whitespace
protein = protein.replace('\n', '')

# In a loop, ask the user for a motif, search for the motif,
# and report if it was found.
# Exit if no motif is entered.


while True:
    print("Enter a motif to search for: ")
    
    motif = input()
    
    # exit on an empty user input
    if not motif:
        break

    # Look for the motif

    if protein.find(motif) != -1:
        print("I found it!\n\n")
    else:
        print("I couldn\'t find it.\n\n")

# exit the program
exit()

　　例子5-3输出结果：

Please type the filename of the protein sequence data: NM_021964fragment.pep
Enter a motif to search for: SVLQ
I found it!

Enter a motif to search for: jkl
I couldn't find it.

Enter a motif to search for: QDSV
I found it!

Enter a motif to search for: HERLPQGLQ
I found it!

Enter a motif to search for: 
I couldn't find it.

　　从输出中可以看出，该程序找到用户输入的模体序列。美中不足的是如果这个程序不仅报告它找到模体序列，而且在什么位置，那就太好了。你将在第九章中看到如何实现这一点，该章中练习要求你修改此程序以便报告模体的位置。

　　以下内容将检查讨论例子5-3中的内容;

- 从键盘获取用户输入
- 将文件的行连接成单个字符串
- find函数和strip函数
- not

　　2.1 从键盘获取用户输入

　　Python使用内置函数来获取用户在键盘上键入的输入。在例子5-3中，一个名为input的函数接受用户输入数据，返回为 string 类型。当用户键入文件名并通过Enter键发送输入时，文件名会保存到变量proteinfilename。

　　2.2 使用join函数将列表转成字符串

　　通常蛋白质序列数据是分成80个残基的短片段，因为当数据打印在纸上或显示在屏幕上时，需要将其分解为合适的行个数。但是，如果你正在搜索的模体序列在文件中由换行符分分割了，程序就找不到这个模体序列。在例子5-3中，搜索的一些模体序列是由换行符分开的，在Python中，你可以使用join函数将列表中所有数据组合成一个储存在新变量protein中的单个字符串。

protein = ''.join(proteins)

　　你可以指定列表元素连接时使用的字符串，本示例中，指定要放置在输入文件行之间为空字符串，空字符串用一对单引号''（或者双引号“”）表示。

　　回想例子4-2中，我们介绍了两个DNA片段的连接方法，与join函数的使用非常相似。例子4-2中的以下语句，它是连接两个字符串的方法之一：

DNA3 = DNA1 + DNA2

　　完全相同的另一种连接方法是使用join函数：

DNA3 = ''.join([DNA1, DNA2])

　　上面的方法中，指定了一个字符串元素列表，而不是一个列表的名称：

[DNA1, DNA2]

　　2.3 python中实现do-until类似循环

　　例子5-3中，使用while和if判断实现了类似do-until功能的循环。它首先执行一个块，然后进行if判断，如果测试成功，就用break跳出循环。例子5-3首先打印用户提示信息，获取用户输入，调用find函数搜索模体并报告结果（是否为-1，-1意味着没有找到）。在执行查找操作之前，用if语句判断用户是否输入的是空行，空行意味着用户用户没有更多的模体序列要查找，退出循环。

　　2.4 字符串函数find和索引

　　Python可以轻松操作各种字符串，例如DNA和蛋白质序列数据。字符串内含函数find用来搜索子字符串（模体序列）是否出现在字符串（蛋白质序列）中，如果找到则返回子字符串的起始索引，否则返回-1。

3. 统计核苷酸个数

　　关于给定的DNA序列，你可能想知道是编码还是非编码？是否包含调节因子？与其它DNA序列是否有关？DNA序列中四种核苷酸的个数？事实上，在某些物种中，编码区具有特定的核苷酸偏差，因此最后一个问题对于寻找基因非常重要。此外，不同物种具有不同的核算比例，因此计算核苷酸比例是十分有用的。

　　在下面的两个程序，例子5-4和5-6中，它们计算了DNA中每种核苷酸的含量。使用Python部分新功能：

　　将字符串转换成列表

　　迭代列表

　　要获得给定DNA中每种核苷酸的个数，你必须遍历每个碱基，看看是什么碱基，然后统计每个核苷酸的个数，这里使用for循环来统计。

　　首先，来看看伪代码，之后再写跟详细地伪代码来编写Python程序。

　　以下伪代码描述了所需的内容：

for each base in the DNA
    if base is A
        count_of_A = count_of_A + 1
    if base is C
        count_of_C = count_of_C + 1
    if base is G
        count_of_G = count_of_G + 1
    if base is T
        count_of_T = count_of_T + 1
done

print count_of_A, count_of_C, count_of_G, count_of_T

　　如你所见，思路非常简单，现在让我们来看看如何用Python编写该程序。

4. 将字符串转换成列表

　　首先将字符串转换成列表，指将DNA字符串中的每个碱基是分开的，并且每个字母在列表中称为单独的元素。然后你可以逐个查看列表元素（每个元素都是一个单字母），遍历列表统计每个核苷酸的数量。这与2.2节中的join函数将字符串列表中的元素连接成一个字符串的功能相反。

　　下面详细版的伪代码中添加了从文件中获取DNA并操作该文件数据的指令。首先，连接从原始文件中读取的行列表得到字符串序列，替换字符串序列中的换行符和空格，然后将字符串序列转成单个字母的列表。

read in the DNA from a file

join the lines of the file into a single string $DNA

# make an array out of the bases of $DNA
@DNA = explode $DNA

# initialize the counts
count_of_A = 0
count_of_C = 0
count_of_G = 0
count_of_T = 0

for each base in @DNA

    if base is A
        count_of_A = count_of_A + 1
    if base is C
        count_of_C = count_of_C + 1
    if base is G
        count_of_G = count_of_G + 1
    if base is T
        count_of_T = count_of_T + 1
done

print count_of_A, count_of_C, count_of_G, count_of_T

　　上述伪代码详细地介绍了一种方法，通过将DNA序列转换成单个字母元素的裂变来查看每个碱基，还将每个核苷酸的计数初始化为0，例子5-4是实际可行的程序。

例子5-4 计算核苷酸的比例

#!/usr/bin/env python
import os
# Determining frequency of nucleotides

# Get the name of the file with the DNA sequence data
print("Please type the filename of the DNA sequence data: ")

dna_filename = input()


# open the file, or exit
if os.path.exists(dna_filename):
　　DNAFILE = open(dna_filename)
else:
　　print("Could not open file %s!\n" % dna_filename)
　　exit()


# Read the DNA sequence data from the file, and store it
# into the array variable DNAs
DNAs = DNAFILE.readlines()

# Close the file
DNAFILE.close()

# From the lines of the DNA file,
# put the DNA sequence data into a single string.
DNA = ''.join(DNAs)

# Remove whitespace
DNA = DNA.replace('\n', '')

# Now explode the DNA into an array where each letter of the
# original string is now an element in the array.
# This will make it easy to look at each position.
# Notice that we're reusing the variable DNA for this purpose.
DNA = list(DNA)

# Initialize the counts.
# Notice that we can use scalar variables to hold numbers.
count_of_A = 0
count_of_C = 0
count_of_G = 0
count_of_T = 0
errors     = 0

# In a loop, look at each base in turn, determine which of the
# four types of nucleotides it is, and increment the
# appropriate count.
for base in DNA:

    if base == 'A':
        ++count_of_A
    elif base == 'C':
        ++count_of_C
    elif base == 'G':
        ++count_of_G
    elif base == 'T':
        ++count_of_T
    else:
        print("!!!!!!!! Error - I don\'t recognize this base: %s\n" % base)
        ++errors

# print the results
print("A = %s\n" % count_of_A)
print("C = %s\n" % count_of_C)
print("G = %s\n" % count_of_G)
print("T = %s\n" % count_of_T)
print("errors = %s\n" % errors)

# exit the program
exit()

　　为了演示例子5-4，我创建了以下DNA文件并命名为small.dna：

AAAAAAAAAAAAAAGGGGGGGTTTTCCCCCCCC
CCCCCGTCGTAGTAAAGTATGCAGTAGCVG
CCCCCCCCCCGGGGGGGGAAAAAAAAAAAAAAATTTTTTAT
AAACG

　　注意文件中有一个字母V，如下是例子5-4输出：

Please type the filename of the DNA sequence data: small.dna
!!!!!!!! Error - I don't recognize this base: V

A = 40
C = 27
G = 24
T = 17

　　在这个程序中，我们使用了list这个新的函数，用法如下:

DNA = list(DNA)

　　这个函数将DNA字符串分割成了单个字母组成的列表。在交互式环境输入help（list）或使用文档查看list函数使用方法，list函数接受一个可迭代的对象，将其转换为列表。

　　在Python中，字符串就是一个可迭代对象，并且可迭代对象是可以使用for循环。因此，上述程序可去掉“DNA = list(DNA)”，写成如下方式：

#!/usr/bin/env python
import os
# Determining frequency of nucleotides

# Get the name of the file with the DNA sequence data
print("Please type the filename of the DNA sequence data: ")

dna_filename = input()


# open the file, or exit
if os.path.exists(dna_filename):
　　DNAFILE = open(dna_filename)
else:
　　print("Could not open file %s!\n" % dna_filename)
　　exit()


# Read the DNA sequence data from the file, and store it
# into the array variable DNAs
DNAs = DNAFILE.readlines()

# Close the file
DNAFILE.close()

# From the lines of the DNA file,
# put the DNA sequence data into a single string.
DNA = ''.join(DNAs)

# Remove whitespace
DNA = DNA.replace('\n', '')


# Initialize the counts.
# Notice that we can use scalar variables to hold numbers.
count_of_A = 0
count_of_C = 0
count_of_G = 0
count_of_T = 0
errors     = 0

# In a loop, look at each base in turn, determine which of the
# four types of nucleotides it is, and increment the
# appropriate count.
for base in DNA:

    if base == 'A':
        ++count_of_A
    elif base == 'C':
        ++count_of_C
    elif base == 'G':
        ++count_of_G
    elif base == 'T':
        ++count_of_T
    else:
        print("!!!!!!!! Error - I don\'t recognize this base: %s\n" % base)
        ++errors

# print the results
print("A = %s\n" % count_of_A)
print("C = %s\n" % count_of_C)
print("G = %s\n" % count_of_G)
print("T = %s\n" % count_of_T)
print("errors = %s\n" % errors)

# exit the program
exit()

　　接着，有五个变量初始化为0，python中变量没有类型的分别，如果没有初始化变量，使用该变量时程序会报错终止。

5. 使用索引

　　如下是使用索引方式的伪代码：

read in the DNA from a file

join the lines of the file into a single string of $DNA

# initialize the counts
count_of_A = 0
count_of_C = 0
count_of_G = 0
count_of_T = 0

for each base at each position in $DNA

    if base is A
        count_of_A = count_of_A + 1
    if base is C
        count_of_C = count_of_C + 1
    if base is G
        count_of_G = count_of_G + 1
    if base is T
        count_of_T = count_of_T + 1
done

print count_of_A, count_of_C, count_of_G, count_of_T

例子5-5 计算核苷酸评率 2

#!/usr/bin/env python
import os
# Determining frequency of nucleotides

# Get the name of the file with the DNA sequence data
print("Please type the filename of the DNA sequence data: ")

dna_filename = input()


# open the file, or exit
if os.path.exists(dna_filename):
　　DNAFILE = open(dna_filename)
else:
　　print("Could not open file %s!\n" % dna_filename)
　　exit()


# Read the DNA sequence data from the file, and store it
# into the array variable DNAs
DNAs = DNAFILE.readlines()

# Close the file
DNAFILE.close()

# From the lines of the DNA file,
# put the DNA sequence data into a single string.
DNA = ''.join(DNAs)

# Remove whitespace
DNA = DNA.replace('\n', '')


# Initialize the counts.
# Notice that we can use scalar variables to hold numbers.
count_of_A = 0
count_of_C = 0
count_of_G = 0
count_of_T = 0
errors     = 0

# In a loop, look at each base in turn, determine which of the
# four types of nucleotides it is, and increment the
# appropriate count.
for position in range(len(DNAs)):
    base = DNAs[position]

    if base == 'A':
        ++count_of_A
    elif base == 'C':
        ++count_of_C
    elif base == 'G':
        ++count_of_G
    elif base == 'T':
        ++count_of_T
    else:
        print("!!!!!!!! Error - I don\'t recognize this base: %s\n" % base)
        ++errors

# print the results
print("A = %s\n" % count_of_A)
print("C = %s\n" % count_of_C)
print("G = %s\n" % count_of_G)
print("T = %s\n" % count_of_T)
print("errors = %s\n" % errors)

# exit the program
exit()

　　例子5-5输出如下：

Please type the filename of the DNA sequence data: small.dna
!!!!!!!! Error - I don't recognize this vase: V
A = 40
C = 27
G = 24
T = 17
errors = 1

　　上述for循环等价下面的while循环。

#!/usr/bin/env python
import os
# Determining frequency of nucleotides

# Get the name of the file with the DNA sequence data
print("Please type the filename of the DNA sequence data: ")

dna_filename = input()


# open the file, or exit
if os.path.exists(dna_filename):
　　DNAFILE = open(dna_filename)
else:
　　print("Could not open file %s!\n" % dna_filename)
　　exit()


# Read the DNA sequence data from the file, and store it
# into the array variable DNAs
DNAs = DNAFILE.readlines()

# Close the file
DNAFILE.close()

# From the lines of the DNA file,
# put the DNA sequence data into a single string.
DNA = ''.join(DNAs)

# Remove whitespace
DNA = DNA.replace('\n', '')


# Initialize the counts.
# Notice that we can use scalar variables to hold numbers.
count_of_A = 0
count_of_C = 0
count_of_G = 0
count_of_T = 0
errors     = 0

# In a loop, look at each base in turn, determine which of the
# four types of nucleotides it is, and increment the
# appropriate count.
position = 0

while position < len(DNAs):
    base = DNAs[position]

    if base == 'A':
        ++count_of_A
    elif base == 'C':
        ++count_of_C
    elif base == 'G':
        ++count_of_G
    elif base == 'T':
        ++count_of_T
    else:
        print("!!!!!!!! Error - I don\'t recognize this base: %s\n" % base)
        ++errors

    ++position

# print the results
print("A = %s\n" % count_of_A)
print("C = %s\n" % count_of_C)
print("G = %s\n" % count_of_G)
print("T = %s\n" % count_of_T)
print("errors = %s\n" % errors)

# exit the program
exit()

　　比较这两个循环，不难发现for循环只是while循环的简写，循环判断条件是position是否小于字符串DNA的长度，这里使用了字符串索引方式。默认情况下，Python假定字符串从0索引开始，其最后一个字符编号为字符串长度减一。

　　在上述程序中，我们还使用了python内置函数range，这是一个生成器函数（生成迭代器或列表），例如，range（5）等效生成[0, 1, 2, 3, 4]五个元素的列表。

6. 输出到文件

　　例子5-6介绍了另一种计算DNA字符串中核苷酸的方法，它使用了python字符串内置函数count。关于Python学习，你会体会到可能有一个相对简洁的方法来实现一个功能。

例子5-6 计算核苷酸的比例 3

#!/usr/bin/env python
import os
# Determining frequency of nucleotides

# Get the name of the file with the DNA sequence data
print("Please type the filename of the DNA sequence data: ")

dna_filename = input()


# open the file, or exit
if os.path.exists(dna_filename):
　　DNAFILE = open(dna_filename)
else:
　　print("Could not open file %s!\n" % dna_filename)
　　exit()


# Read the DNA sequence data from the file, and store it
# into the array variable DNAs
DNAs = DNAFILE.readlines()

# Close the file
DNAFILE.close()

# From the lines of the DNA file,
# put the DNA sequence data into a single string.
DNA = ''.join(DNAs)

# Remove whitespace
DNA = DNA.replace('\n', '')


# In a loop, look at each base in turn, determine which of the
# four types of nucleotides it is, and increment the
# appropriate count.
count_of_A  = DNA.count('A') + DNA.count('a')
count_of_C  = DNA.count('C') + DNA.count('c')
count_of_G  = DNA.count('G') + DNA.count('g')
count_of_T  = DNA.count('T') + DNA.count('t')
errors     = len(DNA) - count_of_A  -count_of_C - count_of_G - count_of_T

# print the results
print("A=%d C=%d G=%d T=%d errors=%d\n" % (count_of_A, count_of_C, count_of_G, count_of_T, errors))

# Also write the results to a file called "countbase"
outputfile = "countbase"

COUNTBASE  = open(outputfile, 'w')

COUNTBASE.write("A=%d C=%d G=%d T=%d errors=%d\n" % (count_of_A, count_of_C, count_of_G, count_of_T, errors))

COUNTBASE.close()

# exit the program
exit()

　　例子5-6输出如下：

Please type the filename of the DNA sequence data: small.dna
A=40 C=27 G=24 T=17 errors=1

　　例子5-7输出文件countbase内容如下：

A=40 C=27 G=24 T=17 errors=1

7.练习

5.1 编写一个无限循环的程序，循环每次判断条件为真。

5.2 用户输入两（短）DNA串，使用“+”运算符将第二个字符串连接到第一个字符串末尾。将连接的字符串打印，然后在连接的位置开始打印第二个字符串。例如，输入“AAAA”和“TTTT”，则打印：

AAAATTTT

TTTT

5.3 编写一个程序，打印从1到100的所有数字。

5.4 编写一个程序来获取DNA链的反向互补链。

5.5 编写一个程序来报告蛋白质序列中疏水性氨基酸的百分比。（（要查找哪些氨基酸是疏水性的，请参阅有关蛋白质，分子生物学或细胞生物学的任何介绍性文章。）

5.6 编写一个程序，检查作为参数输入的两个字符串是否彼此反向互补。

5.7 编写一个程序来报告DNA序列GC的比例。

5.8 编写一个程序，可以替换DNA中指定位置的碱基。

参考资料

Begining Perl for Bioinformatics