Chapter 5: Motifs and Loops

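  This chapter continues with the basics of the Python language. By the end of the chapter, you will know how to:

    • Search for motifs in DNA or protein
    • Interact with the user at the keyboard
    • Write data to files
    • Use loops
    • Use the built-in string method find
    • Perform different actions depending on the result of a conditional test
    • Examine sequence data in detail by manipulating strings and lists

  In this chapter you will learn to write a program that looks for motifs in sequence data, and you will pick up skills you need for writing bioinformatics programs.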

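1. Flow Control

   Flow control is the order in which the statements of a program are executed. A program normally runs from the first statement at the top to the last statement at the bottom. There are two ways to change that order: conditional statements and loops. A conditional statement executes a group of statements only if its test succeeds, and otherwise skips them. A loop repeats a group of statements until its associated test fails.

  1.1 Conditional Statements

  Recall the open statement: if the file you try to open does not exist, you get an error message. You can therefore test whether the file exists before trying to open it. Tests of this kind are among the most powerful features of any computer language. Python's conditional statements are if, if-else, and if-elif-else.

  These constructs evaluate a test to a True/False value. If the test is True, the statements beneath it are executed; if it is False, they are skipped.

  But what counts as true? The answer differs slightly between programming languages. In Python, zero, the empty string, empty containers such as [], and None are false, and nearly everything else is true.

  This section shows some examples of Python conditionals. In each one the test is the equality of two numbers. Equality is written with two equals signs, ==, because the single equals sign = is already used for assigning a value to a variable. The examples only show whether a test evaluates to True or False; tests on fixed numbers like these are of little practical use. Most real conditional tests check the value of a variable or the result returned by a function call, that is, something you do not know in advance.

  A conditional test that evaluates to True: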

if 1 == 1:
  print("1 equals 1\n\n")

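   Output:

1 equals 1

  The test is 1 == 1. It evaluates to True, so the statement associated with the if is executed and the message is printed.

  You can also write the test like this: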

if 1:
    print("1 evaluates to true\n\n")

   Output:

1 evaluates to true

   A conditional test that evaluates to False:

if 1 == 0 :
    print("1 equals 0\n\n")

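   There is no output. The test is 1 == 0, which evaluates to False, so the statement associated with the if is not executed and no message is printed.

  You can also write the test like this: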

if 0:
    print("0 evaluates to true\n\n")

  Again there is no output: 0 evaluates to False, so the statement associated with the if is not executed and no message is printed.
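
  The rule for what counts as true can be checked directly. The following is a small sketch, not part of the original examples; the two lists of sample values are chosen here purely for illustration.

```python
# Values that Python treats as false in a conditional test
falsy_values = [0, 0.0, "", [], {}, None, False]
# Values that Python treats as true (note that the string "0" is not empty)
truthy_values = [1, -1, "0", "ACGT", [0], True]

for value in falsy_values:
    if value:
        print("%r evaluates to true" % (value,))
    else:
        print("%r evaluates to false" % (value,))

for value in truthy_values:
    if value:
        print("%r evaluates to true" % (value,))

# Summaries that can be checked programmatically
all_falsy = not any(bool(v) for v in falsy_values)
all_truthy = all(bool(v) for v in truthy_values)
```

  The string "0" is worth a second look: it is a non-empty string, so it is true even though the number 0 is false.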

  Now let's look at an if-else statement whose test evaluates to True:

if 1 == 1:
    print("1 equals 1\n\n")
else:
    print("1 does not equal 1\n\n")

   Output:

1 equals 1

   An if-else does one thing when the test is True and another when it is False. Here is an if-else whose test evaluates to False:

if 1 == 0:
    print("1 equals 0\n\n")
else:
    print("1 does not equal 0\n\n")

   Output:

1 does not equal 0

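   1.1.1 Conditional Tests and Indentation

  Two more points about conditional tests. First, several operators can be used in the test: besides the == of the earlier examples there are not equal (!=), greater than (>), less than (<), and others. Second, notice that the statements following the condition are indented. Such an indented group is called a block. Python uses the indentation level of each logical line to decide which statements belong together, so correct indentation is essential for a Python program to run properly.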
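
  As an illustration of both points, here is a short sketch that applies several comparison operators to the length of a made-up DNA string and then uses indentation to mark a block. The sequence and the thresholds are assumptions chosen for the example.

```python
dna = "ACGTACGT"          # made-up sequence for the demonstration
length = len(dna)

# Each comparison evaluates to True or False
results = {
    "length == 8": length == 8,
    "length != 8": length != 8,
    "length > 5": length > 5,
    "length < 5": length < 5,
    "length >= 8": length >= 8,
    "length <= 7": length <= 7,
}
for test, outcome in results.items():
    print(test, "->", outcome)

if length > 5:
    # Both indented lines belong to the if block
    print("The sequence is longer than 5 bases.")
    print("This line is part of the same block.")
print("This line is not indented, so it runs whatever the test result.")
```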

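  Now look at the if-elif-else statement in Example 5-1. The tests are evaluated in order, first the if and then each elif. As soon as one evaluates to True its block is executed and the remaining tests are ignored. If none evaluates to True, the else block is executed.

Example 5-1 if-elif-else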
#!/usr/bin/env python
# if-elif-else

word = 'MNIDDKL'

# if-elif-else conditionals
if word == 'QSTVSGE':
    print("QSTVSGE\n")
elif word == 'MRQQDMISHDEL':
    print("MRQQDMISHDEL\n")
elif word == 'MNIDDKL':
    print("MNIDDKL--the magic word!\n")
else:
    print('Is \"%s\" a peptide? This program is not sure.\n' % word)

exit()

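   Note the \" in the print statement of the else block, which lets you print a double quote ". The backslash tells Python to treat the " as the character itself rather than as the marker for the end of a string. Because this particular string is delimited by single quotes, the backslash is optional here; it is required only inside a string delimited by double quotes.

  Output of Example 5-1: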

MNIDDKL--the magic word!

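  1.2 Loops

  A loop lets you execute a block of statements repeatedly. Python has several kinds of loop, including the while loop and the for loop. Example 5-2 (from Chapter 4) shows a while loop and how it is used to read protein sequence data from a file.

Example 5-2 Reading protein sequence data from a file, take 4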
#!/usr/bin/env python
import os
# Reading protein sequence data from a file, take 4

# The filename of the file containing the protein sequence data
proteinfilename = 'NM_021964fragment.pep'

# First we have to "open" the file, and in case the
# open fails, print an error message and exit the program.
if os.path.exists(proteinfilename):
    PROTEINFILE = open(proteinfilename)
else:
    print("Could not open file %s!\n" % proteinfilename)
    exit()
    

# Read the protein sequence data from the file in a "while" loop,
# printing each line as it is read.
protein = PROTEINFILE.readline()
while protein:
    print("  ######  Here is the next line of the file:\n")
    print(protein)
    protein = PROTEINFILE.readline()


# Close the file.
PROTEINFILE.close()

exit()

   Output of Example 5-2:

  ######  Here is the next line of the file:
MNIDDKLEGLFLKCGGIDEMQSSRTMVVMGGVSGQSTVSGELQD
  ######  Here is the next line of the file:
SVLQDRSMPHQEILAADEVLQESEMRQQDMISHDELMVHEETVKNDEEQMETHERLPQ
  ######  Here is the next line of the file:
GLQYALNVPISVKQEITFTDVSEQLMRDKKQIR

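   The program first reads the first line of the file into the variable protein. The while loop then tests protein. At the end of the file readline returns an empty string, and an empty string is False, so the loop ends and the block with the two print statements is skipped. As long as the line is not empty the test is True, the block with the two print statements is executed, and the next line of the file is read into protein at the bottom of the block.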
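
  The same reading pattern can also be written with a for loop over the file object, which is a common Python idiom. The sketch below compares the two forms; it uses io.StringIO as a stand-in for an open file so that it runs without a file on disk, and the sample text and function names are assumptions made for this illustration.

```python
import io

# Stand-in for the contents of a sequence file (made-up fragment)
SAMPLE = "MNIDDKLEGLFLKCGG\nSVLQDRSMPHQEILAA\nGLQYALNVPISVKQEI\n"

def read_with_while(handle):
    """Collect lines with readline() and a while loop, as in Example 5-2."""
    lines = []
    line = handle.readline()
    while line:                 # '' at end of file is false and ends the loop
        lines.append(line)
        line = handle.readline()
    return lines

def read_with_for(handle):
    """Collect lines by iterating over the file object directly."""
    lines = []
    for line in handle:         # each pass yields the next line, newline included
        lines.append(line)
    return lines

while_lines = read_with_while(io.StringIO(SAMPLE))
for_lines = read_with_for(io.StringIO(SAMPLE))
print(while_lines == for_lines)
```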

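   1.2.1 The open Function and the os Module

  The open function is a system call: to open a file, Python has to ask the operating system for it. The operating system may be Unix/Linux, Microsoft Windows, Apple Macintosh, and so on; files are managed by the operating system and can be accessed only through it. Checking whether a system call succeeded is a good habit, especially when opening files. In Python a failed open raises an exception, which stops the program with a traceback unless you handle it. You should always check for failure, and when a file cannot be opened either tell the user immediately or exit the program.

  The os module is part of Python's standard library and provides a uniform interface to operating system functions. Those functions are usually platform specific, and the os module switches automatically between the implementations for the different platforms (such as nt or posix), which makes cross-platform code possible. Example 5-2 calls os.path.exists to test whether the file exists. If it does, the program calls open to open it; if not, the program prints an error message and exits.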
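
  An alternative to testing with os.path.exists, not used in this chapter's examples, is to attempt the open and handle the exception if it fails. The sketch below shows one way this might look; the helper name read_lines and the filename are hypothetical, and the filename is assumed not to exist.

```python
def read_lines(filename):
    """Return the lines of filename, or None if the file cannot be opened."""
    try:
        # The with block closes the file automatically when it ends
        with open(filename) as handle:
            return handle.readlines()
    except OSError as error:
        # FileNotFoundError and PermissionError are both kinds of OSError
        print("Could not open file %s: %s" % (filename, error))
        return None

# Hypothetical filename, assumed to be missing from the current directory
missing = read_lines("no_such_file_for_this_demo.pep")
```

  This form also covers files that exist but cannot be read, a case that a test with os.path.exists alone does not catch.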

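  In short, conditionals and loops are not hard to learn in Python, and they are among the most powerful features of a programming language. Conditionals let you lay out several alternatives for a program and decide between them according to the input it receives. Loops harness the speed of the computer: a few lines of code can process large amounts of input, or iterate and refine a calculation over and over.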

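2. Searching for Motifs

  One of the most common tasks in bioinformatics is looking for motifs, short segments of DNA or protein that are of particular interest. They may be regulatory elements of DNA or short stretches of protein that are conserved across many species. (The PROSITE website has extensive information about protein motifs.)

  The motifs you look for in biological sequences are usually not one specific sequence. They may have several variants, for example positions where the base or amino acid does not matter, and they may vary in length. Such motifs can be described with regular expressions; you will see more examples in Chapter 9 and elsewhere in this series.

  Python has a convenient set of features for searching strings. Example 5-3 introduces this string searching ability; similar programs are used all the time in biological research. It does the following:

    • Reads protein sequence data from a file
    • Puts all the sequence data into one string for easy searching
    • Looks for motifs the user types in

Example 5-3 Searching for motifs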
#!/usr/bin/env python
import os
# Searching for motifs

# Ask the user for the filename of the file containing
# the protein sequence data, and collect it from the keyboard
print("Please type the filename of the protein sequence data: ")

proteinfilename = input()

# open the file, or exit
if os.path.exists(proteinfilename):
    PROTEINFILE = open(proteinfilename)
else:
    print("Could not open file %s!\n" % proteinfilename)
    exit()

# Read the protein sequence data from the file, and store it
# into the list variable proteins
proteins = PROTEINFILE.readlines()

# Close the file - we've read all the data into proteins now.
PROTEINFILE.close()

# Put the protein sequence data into a single string, as it's easier
# to search for a motif in a string than in a list of
# lines (what if the motif occurs over a line break?)
protein = ''.join(proteins)

# Remove newlines
protein = protein.replace('\n', '')

# In a loop, ask the user for a motif, search for the motif,
# and report if it was found.
# Exit if no motif is entered.
while True:
    print("Enter a motif to search for: ")
    motif = input()

    # exit on an empty user input
    if not motif:
        break

    # Look for the motif
    if protein.find(motif) != -1:
        print("I found it!\n\n")
    else:
        print("I couldn't find it.\n\n")

# exit the program
exit()

   Output of Example 5-3:

Please type the filename of the protein sequence data: NM_021964fragment.pep
Enter a motif to search for: SVLQ
I found it!

Enter a motif to search for: jkl
I couldn't find it.

Enter a motif to search for: QDSV
I found it!

Enter a motif to search for: HERLPQGLQ
I found it!

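Enter a motif to search for:

   As the output shows, the program finds the motifs the user types in, and an empty line ends the program. One shortcoming is that it only reports that a motif was found; it would be nicer if it also reported where. You will see how to do that in Chapter 9, where an exercise asks you to modify this program so that it reports the position of the motif.

  The following sections discuss the new material in Example 5-3:

    • Getting user input from the keyboard
    • Joining the lines of a file into a single string
    • The string methods find and replace
    • not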

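  2.1 Getting User Input from the Keyboard

   Python has a built-in function for collecting what the user types at the keyboard. In Example 5-3 the function input reads the user's input and returns it as a string, without the trailing newline. When the user types a filename and sends it by pressing Enter, the filename is stored in the variable proteinfilename.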

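  2.2 Turning a List into a String with join

  Protein sequence data is usually broken into short lines of about 80 residues, because the data has to be split into lines of a suitable length when it is printed on paper or shown on a screen. If the motif you are searching for is split by a newline in the file, however, the program will not find it. Some of the motifs searched for in Example 5-3 are in fact split by line breaks. In Python you can use the join method to combine all the items of a list into a single string, stored here in the new variable protein: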

protein = ''.join(proteins)

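   You specify the string that is placed between the list items when they are joined. In this example the empty string is placed between the lines of the input file; the empty string is written as a pair of single quotes '' (or double quotes "").

  Recall that Example 4-2 introduced a way of concatenating two DNA fragments that is very similar to using join. The following statement from Example 4-2 is one way to concatenate two strings: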

DNA3 = DNA1 + DNA2

   Another way to get exactly the same result is to use join:

DNA3 = ''.join([DNA1, DNA2])

   Here a list of string items is written out directly, rather than the name of a list:

[DNA1, DNA2]
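
  To see why joining matters for motif searching, consider the following sketch. The two lines are a made-up fragment, and the variable names are chosen for this illustration only.

```python
# Two lines as readlines() would return them, newline characters included
lines = ["MNIDDKLEGLQD\n", "SVLQDRSMPHQE\n"]

joined = "".join(lines)               # the newlines are still inside the string
protein = joined.replace("\n", "")    # now one unbroken sequence

# The motif QDSV straddles the line break, so it only appears after cleaning
print("QDSV" in joined)     # False, a newline sits between QD and SV
print("QDSV" in protein)    # True

# join and + give the same result for two strings
DNA1 = "ACGGGAGG"
DNA2 = "ATTAGGTC"
same_result = (DNA1 + DNA2) == "".join([DNA1, DNA2])

# Any string can serve as the separator
with_separator = "-".join([DNA1, DNA2])
print(with_separator)
```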

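  2.3 A do-until Style Loop in Python

  Example 5-3 uses while together with an if test to build a loop that behaves like do-until. The block is entered first and the test is made inside it with if; when the test succeeds, break leaves the loop. On each pass Example 5-3 prints a prompt, gets the user's input, calls find to search for the motif, and reports the result (find returns -1 when the motif is not found). Before the search, an if statement tests whether the user entered an empty line. An empty line means the user has no more motifs to look for, so the loop exits.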
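
  The loop structure can be tried without typing at the keyboard. In the sketch below a list of prepared answers stands in for input(); the function name search_motifs and the sample data are assumptions made for this illustration.

```python
def search_motifs(protein, answers):
    """Handle answers in order until an empty one is met; return the reports."""
    reports = []
    pending = iter(answers)
    while True:
        # next() stands in for input(); it yields "" when the answers run out
        motif = next(pending, "")
        if not motif:               # the "until" test, placed inside the loop body
            break
        found = protein.find(motif) != -1
        reports.append((motif, found))
    return reports

# The empty string ends the loop, so the last entry is never searched for
reports = search_motifs("MNIDDKLEGLQDSVLQ", ["SVLQ", "jkl", "", "never reached"])
for motif, found in reports:
    print(motif, "found" if found else "not found")
```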

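  2.4 The String Method find and Indexes

  Python makes it easy to manipulate all kinds of strings, such as DNA and protein sequence data. The built-in string method find tests whether a substring (the motif) occurs in a string (the protein sequence). If it does, find returns the index at which the substring starts; otherwise it returns -1.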
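
  A brief sketch of the values find returns, using a made-up protein fragment. It also shows why the result should be compared with -1 rather than used directly as a test: a match at the very start of the string has index 0, which is false.

```python
protein = "MNIDDKLEGLFLKCGGIDEMQSSRTM"   # made-up fragment

first = protein.find("IDD")              # index of the first match, counting from 0
second = protein.find("ID", first + 1)   # optional second argument: where to start
missing = protein.find("jkl")            # -1 signals "not found"
at_start = protein.find("MN")            # 0 is a valid position, not a failure

print(first, second, missing, at_start)

# Correct test for "found"
if protein.find("MN") != -1:
    print("MN found")
```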

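3. Counting Nucleotides

  Given a DNA sequence, you may want to know: Is it coding or noncoding? Does it contain regulatory elements? Is it related to other DNA sequences? How many of each of the four nucleotides does it contain? In some species the coding regions have a characteristic nucleotide bias, so the last question can be important for finding genes. Different species also have different nucleotide ratios, so counting nucleotides is useful in its own right.

  The next two programs, Example 5-4 and Example 5-6, count each kind of nucleotide in a DNA sequence. They use some new parts of Python:

    • Converting a string to a list
    • Iterating over a list

  To get the count of each nucleotide in a given DNA sequence, you have to look at every base, see what it is, and keep a running count for each nucleotide. A for loop is used here.

  Let's start with pseudocode, and then refine it into more detailed pseudocode before writing the Python program.

  The following pseudocode describes what is needed: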

for each base in the DNA
    if base is A
        count_of_A = count_of_A + 1
    if base is C
        count_of_C = count_of_C + 1
    if base is G
        count_of_G = count_of_G + 1
    if base is T
        count_of_T = count_of_T + 1
done

print count_of_A, count_of_C, count_of_G, count_of_T

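   As you can see, the idea is simple. Now let's see how to write the program in Python.

4. Converting a String to a List

  First the string is converted to a list: the bases of the DNA string are separated so that each letter becomes its own item in the list. You can then look at the items one at a time (each is a single letter) and count each nucleotide as you go through the list. This is the reverse of the join method in section 2.2, which joins the items of a list of strings into one string.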
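
  The round trip between string and list can be shown in a few lines. This is a minimal sketch with a made-up four-base sequence.

```python
DNA = "ACGT"

bases = list(DNA)          # one list item per letter
print(bases)

rebuilt = "".join(bases)   # join reverses the conversion
print(rebuilt == DNA)

# Once the bases are in a list, each one can be examined in turn
for base in bases:
    print("Found base", base)
```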

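  The more detailed pseudocode below adds instructions for getting the DNA from a file and manipulating that data. First the list of lines read from the file is joined into one string, then the newlines are removed from that string, and finally the string is converted to a list of single letters.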

read in the DNA from a file

join the lines of the file into a single string DNA

# make a list out of the bases of DNA
DNA_list = list of the single letters in DNA

# initialize the counts
count_of_A = 0
count_of_C = 0
count_of_G = 0
count_of_T = 0

for each base in DNA_list

    if base is A
        count_of_A = count_of_A + 1
    if base is C
        count_of_C = count_of_C + 1
    if base is G
        count_of_G = count_of_G + 1
    if base is T
        count_of_T = count_of_T + 1
done

print count_of_A, count_of_C, count_of_G, count_of_T

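   This pseudocode spells out one approach: convert the DNA sequence to a list of single-letter items so that each base can be examined, and initialize the count of each nucleotide to 0. Example 5-4 is the working program.

Example 5-4 Determining frequency of nucleotides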
#!/usr/bin/env python
import os
# Determining frequency of nucleotides

# Get the name of the file with the DNA sequence data
print("Please type the filename of the DNA sequence data: ")

dna_filename = input()

# open the file, or exit
if os.path.exists(dna_filename):
    DNAFILE = open(dna_filename)
else:
    print("Could not open file %s!\n" % dna_filename)
    exit()

# Read the DNA sequence data from the file, and store it
# into the list variable DNAs
DNAs = DNAFILE.readlines()

# Close the file
DNAFILE.close()

# From the lines of the DNA file,
# put the DNA sequence data into a single string.
DNA = ''.join(DNAs)

# Remove newlines
DNA = DNA.replace('\n', '')

# Now split the DNA into a list where each letter of the
# original string is now an item in the list.
# This will make it easy to look at each position.
# Notice that we're reusing the variable DNA for this purpose.
DNA = list(DNA)

# Initialize the counts.
count_of_A = 0
count_of_C = 0
count_of_G = 0
count_of_T = 0
errors = 0

# In a loop, look at each base in turn, determine which of the
# four types of nucleotides it is, and increment the
# appropriate count. Python has no ++ operator, so use += 1.
for base in DNA:
    if base == 'A':
        count_of_A += 1
    elif base == 'C':
        count_of_C += 1
    elif base == 'G':
        count_of_G += 1
    elif base == 'T':
        count_of_T += 1
    else:
        print("!!!!!!!! Error - I don't recognize this base: %s\n" % base)
        errors += 1

# print the results
print("A = %s\n" % count_of_A)
print("C = %s\n" % count_of_C)
print("G = %s\n" % count_of_G)
print("T = %s\n" % count_of_T)
print("errors = %s\n" % errors)

# exit the program
exit()

   To demonstrate Example 5-4, I created the following DNA file and named it small.dna:

AAAAAAAAAAAAAAGGGGGGGTTTTCCCCCCCC
CCCCCGTCGTAGTAAAGTATGCAGTAGCVG
CCCCCCCCCCGGGGGGGGAAAAAAAAAAAAAAATTTTTTAT
AAACG

   Note that the file contains a letter V. Here is the output of Example 5-4:

Please type the filename of the DNA sequence data: small.dna
!!!!!!!! Error - I don't recognize this base: V

A = 40
C = 27
G = 24
T = 17
errors = 1

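   This program uses a new function, list, as follows:

DNA = list(DNA)

   This call splits the DNA string into a list of single letters. Type help(list) at the interactive prompt, or consult the documentation, to see how list is used: it accepts an iterable object and converts it to a list.

  In Python a string is itself an iterable object, and an iterable object can be used directly in a for loop. The program above can therefore drop the line DNA = list(DNA) and be written like this: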

#!/usr/bin/env python
import os
# Determining frequency of nucleotides

# Get the name of the file with the DNA sequence data
print("Please type the filename of the DNA sequence data: ")

dna_filename = input()


# open the file, or exit
if os.path.exists(dna_filename):
  DNAFILE = open(dna_filename)
else:
  print("Could not open file %s!\n" % dna_filename)
  exit()


# Read the DNA sequence data from the file, and store it
# into the array variable DNAs
DNAs = DNAFILE.readlines()

# Close the file
DNAFILE.close()

# From the lines of the DNA file,
# put the DNA sequence data into a single string.
DNA = ''.join(DNAs)

# Remove whitespace
DNA = DNA.replace('\n', '')


# Initialize the counts.
# Notice that we can use scalar variables to hold numbers.
count_of_A = 0
count_of_C = 0
count_of_G = 0
count_of_T = 0
errors     = 0

# In a loop, look at each base in turn, determine which of the
# four types of nucleotides it is, and increment the
# appropriate count.
for base in DNA:

    # Python has no ++ operator; ++x leaves x unchanged, so use += 1
    if base == 'A':
        count_of_A += 1
    elif base == 'C':
        count_of_C += 1
    elif base == 'G':
        count_of_G += 1
    elif base == 'T':
        count_of_T += 1
    else:
        print("!!!!!!!! Error - I don\'t recognize this base: %s\n" % base)
        errors += 1

# print the results
print("A = %s\n" % count_of_A)
print("C = %s\n" % count_of_C)
print("G = %s\n" % count_of_G)
print("T = %s\n" % count_of_T)
print("errors = %s\n" % errors)

# exit the program
exit()

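   Five variables are initialized to 0 before the loop. Python variables are not declared with a type, but a variable must be assigned a value before it is used: using a variable that has never been assigned makes the program stop with a NameError.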
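
  The effect of a missing initialization can be seen in a small sketch. The variable count_of_N is hypothetical and is deliberately left unassigned; the try/except is there only so that the demonstration can report the error instead of stopping.

```python
count_of_A = 0            # initialized, so it can be incremented
count_of_A += 1

try:
    count_of_N += 1       # never initialized, so Python raises NameError
except NameError as error:
    message = str(error)
    print("NameError:", message)
```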

5. Using Indexes

  Here is the pseudocode for the version that uses indexes:

read in the DNA from a file

join the lines of the file into a single string DNA

# initialize the counts
count_of_A = 0
count_of_C = 0
count_of_G = 0
count_of_T = 0

for each base at each position in DNA

    if base is A
        count_of_A = count_of_A + 1
    if base is C
        count_of_C = count_of_C + 1
    if base is G
        count_of_G = count_of_G + 1
    if base is T
        count_of_T = count_of_T + 1
done

print count_of_A, count_of_C, count_of_G, count_of_T

Example 5-5 Determining frequency of nucleotides, take 2
#!/usr/bin/env python
import os
# Determining frequency of nucleotides

# Get the name of the file with the DNA sequence data
print("Please type the filename of the DNA sequence data: ")

dna_filename = input()


# open the file, or exit
if os.path.exists(dna_filename):
  DNAFILE = open(dna_filename)
else:
  print("Could not open file %s!\n" % dna_filename)
  exit()


# Read the DNA sequence data from the file, and store it
# into the array variable DNAs
DNAs = DNAFILE.readlines()

# Close the file
DNAFILE.close()

# From the lines of the DNA file,
# put the DNA sequence data into a single string.
DNA = ''.join(DNAs)

# Remove whitespace
DNA = DNA.replace('\n', '')


# Initialize the counts.
# Notice that we can use scalar variables to hold numbers.
count_of_A = 0
count_of_C = 0
count_of_G = 0
count_of_T = 0
errors     = 0

# In a loop, look at each base in turn, determine which of the
# four types of nucleotides it is, and increment the
# appropriate count.
# Index into the joined string DNA, not the list of lines DNAs.
for position in range(len(DNA)):
    base = DNA[position]

    if base == 'A':
        count_of_A += 1
    elif base == 'C':
        count_of_C += 1
    elif base == 'G':
        count_of_G += 1
    elif base == 'T':
        count_of_T += 1
    else:
        print("!!!!!!!! Error - I don\'t recognize this base: %s\n" % base)
        errors += 1

# print the results
print("A = %s\n" % count_of_A)
print("C = %s\n" % count_of_C)
print("G = %s\n" % count_of_G)
print("T = %s\n" % count_of_T)
print("errors = %s\n" % errors)

# exit the program
exit()

   Output of Example 5-5:

Please type the filename of the DNA sequence data: small.dna
!!!!!!!! Error - I don't recognize this base: V
A = 40
C = 27
G = 24
T = 17
errors = 1

   The for loop above is equivalent to the following while loop.

#!/usr/bin/env python
import os
# Determining frequency of nucleotides

# Get the name of the file with the DNA sequence data
print("Please type the filename of the DNA sequence data: ")

dna_filename = input()


# open the file, or exit
if os.path.exists(dna_filename):
  DNAFILE = open(dna_filename)
else:
  print("Could not open file %s!\n" % dna_filename)
  exit()


# Read the DNA sequence data from the file, and store it
# into the array variable DNAs
DNAs = DNAFILE.readlines()

# Close the file
DNAFILE.close()

# From the lines of the DNA file,
# put the DNA sequence data into a single string.
DNA = ''.join(DNAs)

# Remove whitespace
DNA = DNA.replace('\n', '')


# Initialize the counts.
# Notice that we can use scalar variables to hold numbers.
count_of_A = 0
count_of_C = 0
count_of_G = 0
count_of_T = 0
errors     = 0

# In a loop, look at each base in turn, determine which of the
# four types of nucleotides it is, and increment the
# appropriate count.
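position = 0

# Index into the joined string DNA, not the list of lines DNAs.
while position < len(DNA):
    base = DNA[position]

    if base == 'A':
        count_of_A += 1
    elif base == 'C':
        count_of_C += 1
    elif base == 'G':
        count_of_G += 1
    elif base == 'T':
        count_of_T += 1
    else:
        print("!!!!!!!! Error - I don\'t recognize this base: %s\n" % base)
        errors += 1

    # Move on to the next position; without this the loop never ends
    position += 1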

# print the results
print("A = %s\n" % count_of_A)
print("C = %s\n" % count_of_C)
print("G = %s\n" % count_of_G)
print("T = %s\n" % count_of_T)
print("errors = %s\n" % errors)

# exit the program
exit()

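   Comparing the two loops, you can see that the for loop is simply a shorthand for the while loop. The loop test is whether position is less than the length of the string DNA, and the base at each position is fetched by string indexing. Python string indexes start at 0, so the last character of a string has an index equal to the length of the string minus one.

  The programs above also use the Python built-in range. In Python 3, range(5) returns a range object, an immutable sequence that produces the integers 0, 1, 2, 3, 4 on demand; list(range(5)) gives the list [0, 1, 2, 3, 4].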
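
  The following sketch puts range and string indexing side by side on a made-up five-base sequence. It also shows enumerate, a built-in not used in this chapter's examples, which yields each position together with its base.

```python
DNA = "ACGTT"   # made-up sequence

positions = list(range(len(DNA)))    # every valid index of the string
first_base = DNA[0]                  # indexes start at 0
last_base = DNA[len(DNA) - 1]        # the last index is the length minus one

# Pair each position with the base found there
pairs = [(position, DNA[position]) for position in range(len(DNA))]

# enumerate produces the same pairs without explicit indexing
same_pairs = list(enumerate(DNA))

print(positions)
print(first_base, last_base)
print(pairs == same_pairs)
```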

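6. Writing Output to a File

  Example 5-6 shows another way to count the nucleotides in a DNA string, using the built-in string method count, and it also writes the result to a file. As you learn Python you will find that there is often a more concise way to get something done.

Example 5-6 Determining frequency of nucleotides, take 3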
#!/usr/bin/env python
import os
# Determining frequency of nucleotides

# Get the name of the file with the DNA sequence data
print("Please type the filename of the DNA sequence data: ")

dna_filename = input()


# open the file, or exit
if os.path.exists(dna_filename):
  DNAFILE = open(dna_filename)
else:
  print("Could not open file %s!\n" % dna_filename)
  exit()


# Read the DNA sequence data from the file, and store it
# into the array variable DNAs
DNAs = DNAFILE.readlines()

# Close the file
DNAFILE.close()

# From the lines of the DNA file,
# put the DNA sequence data into a single string.
DNA = ''.join(DNAs)

# Remove whitespace
DNA = DNA.replace('\n', '')


# Count each of the four nucleotides with the string method count,
# adding the uppercase and lowercase counts together.
# Any other character is counted as an error.
count_of_A = DNA.count('A') + DNA.count('a')
count_of_C = DNA.count('C') + DNA.count('c')
count_of_G = DNA.count('G') + DNA.count('g')
count_of_T = DNA.count('T') + DNA.count('t')
errors     = len(DNA) - count_of_A - count_of_C - count_of_G - count_of_T

# print the results
print("A=%d C=%d G=%d T=%d errors=%d\n" % (count_of_A, count_of_C, count_of_G, count_of_T, errors))

# Also write the results to a file called "countbase"
outputfile = "countbase"

COUNTBASE  = open(outputfile, 'w')

COUNTBASE.write("A=%d C=%d G=%d T=%d errors=%d\n" % (count_of_A, count_of_C, count_of_G, count_of_T, errors))

COUNTBASE.close()
# exit the program
exit()

   Output of Example 5-6:

Please type the filename of the DNA sequence data: small.dna
A=40 C=27 G=24 T=17 errors=1

   The output file countbase written by Example 5-6 contains:

A=40 C=27 G=24 T=17 errors=1
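
  A variation on the counting and writing steps is sketched below, under a few stated assumptions: the sequence is a made-up string with mixed case and one invalid letter, upper() is used to fold the case before counting, and the output goes to a temporary directory so the sketch leaves no file behind. The with statement, not used in this chapter's examples, closes the file automatically.

```python
import os
import tempfile

DNA = "AAaCcGgTtV"   # made-up sequence: mixed case plus one invalid letter

# Fold to uppercase once instead of counting each case separately
upper = DNA.upper()
counts = {base: upper.count(base) for base in "ACGT"}
errors = len(upper) - sum(counts.values())

report = "A=%d C=%d G=%d T=%d errors=%d\n" % (
    counts["A"], counts["C"], counts["G"], counts["T"], errors)
print(report, end="")

# Write the report to a file, then read it back to confirm the contents
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "countbase")
    with open(path, "w") as handle:
        handle.write(report)
    with open(path) as handle:
        written = handle.read()
```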

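7. Exercises

5.1 Write a program with an infinite loop, that is, a loop whose conditional test is true on every pass.

5.2 Ask the user for two (short) DNA strings. Use the + operator to append the second string to the end of the first. Print the concatenated string, and then print the second string lined up beneath the position where the two were joined. For example, for the input AAAA and TTTT, print:

AAAATTTT

    TTTT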

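5.3 Write a program that prints all the numbers from 1 to 100.

5.4 Write a program that computes the reverse complement of a strand of DNA.

5.5 Write a program that reports the percentage of hydrophobic amino acids in a protein sequence. (To find out which amino acids are hydrophobic, consult any introductory text on proteins, molecular biology, or cell biology.)

5.6 Write a program that checks whether two strings given as arguments are reverse complements of each other.

5.7 Write a program that reports the GC content of a DNA sequence.

5.8 Write a program that can replace the base at a specified position in a DNA sequence.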

References

Beginning Perl for Bioinformatics


Reposted from www.cnblogs.com/yahengwang/p/9436014.html