Genome Annotation: Software Usage

1. RepeatMasker

1.1. Input

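The input must be sequence data in FASTA format; other formats such as GenBank or Staden are not accepted. RepeatMasker can process a single batch file (one file containing many sequences), and it can also process many files in one run (each file containing one sequence).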
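If the sequences are currently spread over many single-sequence files, they can be combined into one batch file before running RepeatMasker. Below is a minimal sketch in plain Python; the helper names `read_fasta` and `merge_to_batch` and the output name `batch.fa` are made up for illustration and are not part of RepeatMasker.

```python
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file."""
    header = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            elif header is None:
                raise ValueError(f"{path}: sequence data before the first '>' header")
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_to_batch(paths: Iterable[Path], batch_path: Path, width: int = 60) -> int:
    """Write every record found in `paths` to one batch FASTA file.

    Returns the number of records written. `width` is the line length
    used when wrapping the sequence.
    """
    count = 0
    with open(batch_path, "w") as out:
        for path in paths:
            for header, seq in read_fasta(path):
                out.write(f">{header}\n")
                for i in range(0, len(seq), width):
                    out.write(seq[i:i + width] + "\n")
                count += 1
    return count
```

For example, `merge_to_batch(sorted(Path(".").glob("*.fasta")), Path("batch.fa"))` would collect every .fasta file in the current directory into batch.fa. The output name deliberately does not end in .fasta, so it is not picked up again by the same glob.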

RepeatMasker *.fasta

This command masks every file in the current directory whose name ends in .fasta and writes a separate report for each file. If you have many small sequences, running RepeatMasker on one batch file is considerably faster than running it on many single-sequence files, and the summary file is more informative as well. Analysis of single files (when each is larger than 2 kb) can nevertheless be slightly more accurate, because the GC level of each sequence is calculated and used to choose appropriate parameters.
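To get a feel for what "GC level" means here, the GC fraction of a sequence can be computed by hand. This is only an illustrative sketch: it is not how RepeatMasker computes or uses GC content internally, and the names `gc_fraction`, `long_enough_for_own_gc` and `MIN_LENGTH_BP` are assumptions introduced for the example.

```python
MIN_LENGTH_BP = 2000  # the "larger than 2 kb" threshold mentioned in the text


def gc_fraction(sequence: str) -> float:
    """Return the fraction of G and C among the unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored; an input with no
    unambiguous bases gives 0.0.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def long_enough_for_own_gc(sequence: str, min_length: int = MIN_LENGTH_BP) -> bool:
    """True when the sequence is longer than the 2 kb threshold from the text."""
    return len(sequence) > min_length
```

Calling `gc_fraction` on each record of a batch file shows how much the GC level varies between sequences, which is the situation where running the larger sequences separately might pay off.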

1.2. Output

RepeatMasker returns three files:

.masked file: contains the query sequence(s) with all identified repeats and low-complexity sequences masked, i.e. the masked genome.

.out file: lists and annotates the masked sequences. The masked sequences are printed in the same order as they appear in the submitted file, whereas in the annotation table the sequences are presented alphabetically.

.tbl file: a summary of the repeat content of the analyzed sequence.
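The .out annotation table is whitespace-separated text, so it is easy to post-process. The sketch below assumes the column layout described in the RepeatMasker documentation (score, three percentage columns, query name, query begin and end, bases left, strand, repeat name, repeat class/family, ...) and only reads the first eleven columns. `RepeatHit`, `parse_out` and `masked_bases_by_class` are hypothetical names chosen for this example.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class RepeatHit(NamedTuple):
    score: int         # Smith-Waterman score (column 1)
    query: str         # name of the query sequence (column 5)
    begin: int         # start position in the query (column 6)
    end: int           # end position in the query (column 7)
    strand: str        # "+" or "-" (the file uses "C" for complement)
    repeat: str        # name of the matching repeat (column 10)
    repeat_class: str  # repeat class/family (column 11)


def parse_out(lines: Iterable[str]) -> Iterator[RepeatHit]:
    """Yield one RepeatHit per annotation line, skipping headers and blank lines."""
    for line in lines:
        fields = line.split()
        # Data lines start with an integer score; header lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        yield RepeatHit(
            score=int(fields[0]),
            query=fields[4],
            begin=int(fields[5]),
            end=int(fields[6]),
            strand="-" if fields[8] == "C" else "+",
            repeat=fields[9],
            repeat_class=fields[10],
        )


def masked_bases_by_class(hits: Iterable[RepeatHit]) -> Dict[str, int]:
    """Sum the annotated length per repeat class.

    Overlapping annotations are counted more than once, so this is
    only a rough figure, not a replacement for the .tbl summary.
    """
    totals: Dict[str, int] = {}
    for hit in hits:
        length = hit.end - hit.begin + 1
        totals[hit.repeat_class] = totals.get(hit.repeat_class, 0) + length
    return totals
```

Typical usage would be to open the .out file and pass the file handle straight to `parse_out`, for instance `with open("genome.fasta.out") as fh: hits = list(parse_out(fh))`, where the file name is just a placeholder.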

Reposted from www.cnblogs.com/djx571/p/12340799.html