Loading GFF3 Data into MySQL for GBrowse 2.0

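Overview

In real projects, GFF3 files can be very large. Once a GFF3 file holds more than about 1000 records (features), it is worth storing the data in a database instead of a flat file. GBrowse supports a number of database back ends for this, such as MySQL and SQLite.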
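
To judge whether a file is past that threshold, its feature lines can be counted. The sketch below is an illustration only and not part of GBrowse or BioPerl: the names gff3_feature_counts and needs_database are hypothetical, and it assumes a standard GFF3 layout (nine tab-separated columns per feature, lines starting with # as comments or directives, and an optional ##FASTA directive that ends the feature section).

```python
from collections import Counter


def gff3_feature_counts(path):
    """Return a Counter mapping feature type (GFF3 column 3) to its count.

    Hypothetical helper for illustration; not part of GBrowse or BioPerl.
    """
    counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.rstrip("\r\n")
            if line.startswith("##FASTA"):
                break  # embedded sequence follows; no more feature lines
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines, comments and directives
            columns = line.split("\t")
            if len(columns) == 9:  # a GFF3 feature line has nine columns
                counts[columns[2]] += 1
    return counts


def needs_database(path, threshold=1000):
    """Apply the rule of thumb from the text: more than `threshold` features."""
    return sum(gff3_feature_counts(path).values()) > threshold
```

Calling needs_database("Malus_Gene_Zh.gff3") would apply the rule of thumb to the file used in this post. The per-type counts are also handy when choosing the feature values for track stanzas in the data source configuration.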

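Converting the data for database storage
The data and account details I used (replace all of them with your own):
  • Malus_Gene_Zh.gff3: apple (Malus) gene data
  • MySQL user name root, password root (a local test machine, so no strong password was used; use a secure password in practice), database name Malus_zh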

1. Create the database

    mysql -uroot -proot          # user name and password
    create database Malus_zh;    # database name (run inside the mysql client)

2. Load the GFF3 data into MySQL

    /usr/bin/bp_seqfeature_load -a DBI::mysql -d Malus_zh -u root -p root --create Malus_Gene_Zh.gff3

Output like the following shows that the data is being loaded and that the load succeeded:

    loading Malus_Gene_Zh.gff3...
    Building object tree...70.36s.19s
    load time: 766.59s
    Building summary statistics for coverage graphs...
    402000 features processed
    coverage graph build time: 295.58s
    total load time: 1062.17s

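Note: the bp_seqfeature_load.pl script creates the database tables and stores the GFF3 data in MySQL (the script is located at /usr/bin/bp_seqfeature_load.pl). Watch out for permission problems; running the load as root generally avoids them.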

Script documentation

    Usage: /usr/bin/bp_seqfeature_load.pl [options] gff_file1 gff_file2...
    Options:
      -d --dsn                  The database name (dbi:mysql:test)
      -s --seqfeature           The type of SeqFeature to create (Bio::DB::SeqFeature)
      -a --adaptor              The storage adaptor to use (DBI::mysql)
      -v --verbose              Turn on verbose progress reporting
         --noverbose            Turn off verbose progress reporting
      -f --fast                 Activate fast loading (only some adaptors)
      -T --temporary-directory  Specify temporary directory for fast loading (/tmp)
      -c --create               Create the database and reinitialize it (will erase contents)
      -u --user                 User to connect to database as
      -p --password             Password to use to connect to database
      -S --subfeatures          Turn on indexing of subfeatures (default)
         --nosubfeatures        Turn off indexing of subfeatures
      -i --ignore-seqregion     If true, then ignore ##sequence-region directives in the
                                GFF3 file (default, create a feature for each region)
      -z --zip                  If true, database tables will be compressed to save space
You can also read the man page:

    man bp_seqfeature_load

Usage example

    # Re-create the database and refresh its tables
    bp_seqfeature_load.pl -a DBI::mysql -d <db> -u <user> -p <passwd> --create *.gff3 *.fasta
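
When several files have to be loaded repeatedly, the command line above can be assembled in a small script. The following is a minimal sketch under stated assumptions, not an official wrapper: build_load_command and load_gff3 are hypothetical names, only the flags listed in the usage text above are used, and the script name is a parameter because this post refers to it both as bp_seqfeature_load and bp_seqfeature_load.pl.

```python
import subprocess


def build_load_command(gff_files, database, user, password,
                       create=False, script="bp_seqfeature_load.pl"):
    """Assemble the loader's argument list (hypothetical helper)."""
    command = [script, "-a", "DBI::mysql", "-d", database,
               "-u", user, "-p", password]
    if create:
        # --create reinitializes the database and erases existing contents
        command.append("--create")
    command.extend(gff_files)
    return command


def load_gff3(gff_files, database, user, password, create=False,
              script="bp_seqfeature_load.pl"):
    """Run the loader; raises CalledProcessError on a non-zero exit."""
    command = build_load_command(gff_files, database, user, password,
                                 create=create, script=script)
    subprocess.run(command, check=True)
```

Because the arguments are passed as a list, no shell is involved and wildcards such as *.gff3 are not expanded; expand them first, for example with glob.glob("*.gff3").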

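3. Set MySQL access privileges

Replace genomegff3, me, apache and password with your own database name, user names and root password (note that there is no space between -p and the password):

    mysql -uroot -ppassword -e 'grant all privileges on genomegff3.* to me@localhost'
    mysql -uroot -ppassword -e 'grant select on genomegff3.* to apache@localhost'

Or, more simply, run the following statement directly in the mysql client:

    grant all privileges on `Malus_zh`.* to 'www-data'@localhost identified by 'root';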

The final result is shown below: [screenshot]
 
 
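Configuring GBrowse

1. Edit the GBrowse global configuration file (typically /etc/gbrowse/GBrowse.conf) and add the corresponding entries. Note: the stanza we need to add is at the very end.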
    
    
    # This is the global configuration for gbrowse
    # It contains setting common to all data sources as well
    # as the various constants formerly scattered amongst scripts and libraries
    [GENERAL]
    config_base = /etc/gbrowse # overridden by environment variable GBROWSE_CONF
    htdocs_base = /usr/share/gbrowse/htdocs
    url_base = /gbrowse2
    tmp_base = /var/cache/gbrowse
    persistent_base = /var/lib/gbrowse/databases
    userdata_base = /var/lib/gbrowse/databases/userdata
    db_base = /var/lib/gbrowse/databases/databases
    # These paths are relative to the url base
    buttons = images/buttons
    balloons = images/balloons
    openid = images/openid
    gbrowse_help = .
    js = js
    # These paths are relative to the config base
    plugin_path = plugins
    language_path = languages
    templates_path = templates
    moby_path = MobyServices
    # session settings
    session lock type = default
    # If no session driver is set, then GBrowse will pick one for you.
    # It will use db_file for the driver and storable for the serializer
    # if available; otherwise falling back to the file driver and default serializer.
    # Override driver guessing by setting these options
    # The safest, but slowest session driver...
    #session driver = driver:file;serializer:default
    #session args = Directory /var/lib/gbrowse2/sessions
    # to use the berkeley DB driver comment out the previous
    # line and uncomment these two
    #session driver = driver:db_file;serializer:default
    #session args = FileName /var/lib/gbrowse2/sessions.db
    # DBI backend to use for uploaded userdata.
    # The SQLite option is the easiest to use and the best tested.
    # if this option is commented out, then GBrowse will
    # try 'DBI::SQLite', 'berkeleydb', 'DBI::mysql' and finally the 'memory'
    # backend.
    # NOTICE the double semicolon! This is a DBI Perl module, NOT a DBI connection string.
    # For the DBI::mysql adaptor to work, you must give the web user
    # permission to create databases named userdata_% using the following
    # mysql command:
    # mysql> grant all privileges on `userdata\_%`.* to 'www-data'@localhost identified by 'foobar';
    # Note the backquotes around the database name, and do be sure to replace "foobar" with
    # a more secure password!
    # for SQLite
    #upload_db_adaptor = DBI::SQLite
    # for Berkeleydb
    #upload_db_adaptor = berkeleydb
    # for mysql
    #upload_db_adaptor = DBI::mysql
    #upload_db_host = localhost
    #upload_db_user = www-data
    #upload_db_pass = foobar
    # Debug settings
    debug = 0
    debug_external = 0
    debug_plugins = 0
    # Performance settings
    renderfarm = 1
    slave_timeout = 45
    global_timeout = 60
    search_timeout = 15
    max_render_processes = 4 # try double number of CPU/cores
    # Clean up settings (used by the gbrowse_clean script)
    expire session = 1M # expire unused sessions after a month
    expire cache = 2h # expire cached data if unmodified for >2 hours
    expire uploads = 6w # expire uploaded data if unused for >6 weeks
    # Appearance settings
    truecolor = 1 # better appearance at the expense of larger image files
    # The #include line following this one defines a transparent theme.
    # Replace "transparent_colors" with "solid_gray_colors"
    # or "warm_colors" for different themes.
    #include "themes/warm_colors"
    # #include "themes/transparent_colors"
    # #include "themes/solid_gray_colors"
    balloon tips = 1
    titles are balloons = 1
    plugins = FastaDumper RestrictionAnnotator SequenceDumper TrackDumper
    overview grid = 0
    region grid = 0
    detail grid = 1
    image widths = 450 640 800 1024
    default width = 800
    pad_left = 60
    pad_right = 30
    too many landmarks = 100
    track listing style = categories # either "categories" or "facets"
    # Loads more details image data than can fit on the screen. This lets the user drag and drop the details
    # tracks, without loading more data from the server. A value of 1 is default (no drag and drop). A value
    # of 3 loads one full width on each side.
    details multiplier = 3
    # where to link to when user clicks in detailed view
    link = AUTO
    # HTML to insert inside the <head></head> section
    head =
    # At the top of the HTML...
    header =
    # At the footer
    footer = <hr />
        <p style="font-size:small">The Generic Genome Browser. For questions about the data
        at this site, please contact its webmaster. For support of the
        browser software <i>only</i>, send email to
        <a href="mailto:gmod-gbrowse@lists.sourceforge.net">gmod-gbrowse@lists.sourceforge.net</a>
        or visit the <a href="http://www.gmod.org">GMOD Project</a> web pages.
        </p>
    # Various places where you can insert your own HTML -- see configuration docs
    html1 =
    html2 =
    html3 =
    html4 =
    html5 =
    html6 =
    # Limits on genomic regions (can be overridden in datasource config files)
    region segment = 200000
    max segment = 5000000
    default segment = 5000
    zoom levels = 100 200 1000 2000 5000 10000 20000 50000 100000 200000 500000 1000000
    region sizes = 1000 5000 10000 20000
    default region = 5000
    fine zoom = 10%
    # keyword search maxima
    max keyword results = 1000
    ###### Authorization ######
    # uncomment this to use the PAM authentication plugin
    # authentication plugin = PamAuthenticate
    ####### User Account Registration Database ######
    # If no authentication plugin is defined, and
    # "user_accounts" is true, then GBrowse
    # will attempt to use its internal user accounts database
    # to authenticate and/or register users.
    user_accounts = 0
    user_accounts_registration = 0
    user_accounts_openid = 0
    # Path to the database -- you will need to create this database and grant all
    # privileges on it to the indicated user.
    user_account_db = DBI:SQLite:/var/lib/gbrowse2/databases/users.sqlite
    # For SQLite
    # user_account_db = DBI:SQLite:/var/lib/gbrowse2/databases/users.sqlite
    # For MySQL
    #user_account_db = DBI:mysql:gbrowse_login;user=root;password=root
    # The number of public files to display
    public_files = 10
    # What email gateway to use for outgoing registration confirmation messages.
    # The full format is
    # <smtp.server.com>:<port>:<encryption>:<username>:<password>
    # Only the first field, the server name, is required.
    # The port is assumed to be 25 unless ssl encryption is specified, in
    # which case it defaults to 465.
    # protocol is either "plain" or "ssl", "plain" assumed.
    # username and password may be required by the gateway for authentication
    #
    # here are some common options
    # smtp_gateway = localhost # localhost has properly configured outgoing gateway
    # smtp_gateway = smtp.oicr.on.ca # indicated machine will forward email for you
    # smtp_gateway = smtp.gmail.com:465:ssl:joe.user:secret # use gmail with account "joe.user" and password "secret"
    # smtp_gateway = none # disable outgoing email
    smtp_gateway = none # disable outgoing email
    # These values are used in the login confirmation message sent during
    # user registration. You may customize
    application_name = GBrowse
    application_name_long = The Generic Genome Browser
    email_address = noreply@gmod.org
    # name of the "superuser" who can add public tracks
    admin_account = admin
    admin_dbs = /var/lib/gbrowse2/databases/admin_uploads
    ######## DEFAULT DATASOURCE #########
    default source = yeast
    ###############################################################################################
    # Global settings for plugins (used for the PamAuthenticate plugin only at this point)
    ###############################################################################################
    [PamAuthenticate:plugin]
    login hint = your UNIX account
    login help = <span style="font-size:9pt">Please see your system administrator for help<br>if you have lost your password.</span>
    pam service name = gbrowse
    ###############################################################################################
    #
    # DATASOURCE DEFINITIONS
    # One stanza for each configured data source
    #
    ###############################################################################################
    [yeast]
    description = Yeast chromosomes 1+2 (basic)
    path = yeast_simple.conf
    [yeast_advanced]
    description = Yeast chromosomes 1+2 (advanced)
    path = yeast_chr1+2.conf
    [yeast_renderfarm]
    description = Renderfarm demo (gbrowse_slave must be running!)
    path = yeast_renderfarm.conf
    [pop_demo]
    description = Population Display Demo
    path = pop_demo.conf
    [volvox]
    description = Tutorial database
    path = volvox.conf
    # Added: the Malus gene data source
    [Malus_zh]
    description = Malus_zh database
    path = Malus_zh.conf
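
Registering a data source amounts to appending a three-line stanza to the end of the global configuration. As a rough sketch (add_datasource is a hypothetical helper, not a GBrowse tool, and it treats the file as plain text instead of parsing it), the edit could be scripted like this:

```python
def add_datasource(conf_text, name, description, path):
    """Append a data source stanza unless a [name] stanza already exists.

    Hypothetical helper: works on the text of the global config file and
    returns the (possibly unchanged) new text.
    """
    header = f"[{name}]"
    for line in conf_text.splitlines():
        # Ignore anything after '#', so "[name] # comment" still matches
        if line.split("#", 1)[0].strip() == header:
            return conf_text  # already registered; leave the file alone
    stanza = f"{header}\ndescription = {description}\npath = {path}\n"
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + "\n" + stanza
```

For the setup in this post the call would be add_datasource(text, "Malus_zh", "Malus_zh database", "Malus_zh.conf"), where text is the content of the global configuration file; the result then has to be written back to that file. Calling it a second time leaves the text unchanged.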
2. Create the data source configuration file

    vim /etc/gbrowse/Malus_zh.conf

Data source settings
    
    
    [GENERAL]
    db_adaptor = Bio::DB::SeqFeature::Store   # database adaptor
    db_args    = -adaptor DBI::mysql          # MySQL adaptor
                 -dsn dbi:mysql:database=Malus_zh;host=localhost   # connection info; must name the database loaded above
                 -user root                   # MySQL user name
                 -pass root                   # MySQL password
    # just the basic track dumper plugin
    plugins = FastaDumper RestrictionAnnotator SequenceDumper TrackDumper Submitter S
    autocomplete = 1
    # list of tracks to turn on by default
    default tracks = Contigs Scaffolds        # tracks shown by default
    # size of the region
    #region segment = 120000
    # examples to show in the introduction
    examples = chr1:1..120000
               chr2:1..120000
               chr3:1..120000
    # feature to show on startup
    initial landmark = chr1:1..120000         # region shown when the user first opens the page
    ########################
    # Default glyph settings
    ########################
    # default settings applied to every track
    [TRACK DEFAULTS]
    glyph = generic
    height = 10
    bgcolor = lightgrey
    fgcolor = black
    font2color = blue
    label density = 25
    bump density = 100
    # where to link to when user clicks in detailed view
    link = AUTO
    ################## TRACK CONFIGURATION ####################
    # the remainder of the sections configure individual tracks
    ###########################################################
    # user-defined track; here it shows features of type contig
    [Contigs]
    feature = contig
    glyph = generic
    bgcolor = green
    height = 10
    key = Contigs
    # here it shows features of type ultracontig
    [Scaffolds]
    feature = ultracontig
    glyph = generic
    bgcolor = blue
    height = 10
    key = Scaffolds
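
The examples and initial landmark settings above all use the form reference:start..end. The sketch below shows how such a string breaks down into its parts. It is an illustration only: parse_landmark is a hypothetical name, it handles just this one form, and it does not reproduce GBrowse's own landmark parsing.

```python
import re

# reference name, a colon, then start..end as decimal integers
_LANDMARK = re.compile(r"^(?P<ref>[^:\s]+):(?P<start>\d+)\.\.(?P<end>\d+)$")


def parse_landmark(text):
    """Split 'chr1:1..120000' into ('chr1', 1, 120000).

    Hypothetical helper; raises ValueError for anything not in the
    ref:start..end form or with start greater than end.
    """
    match = _LANDMARK.match(text.strip())
    if match is None:
        raise ValueError(f"not in ref:start..end form: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start is greater than end: {text!r}")
    return match["ref"], start, end
```

A check like this can catch typos in the configuration before GBrowse is restarted, for example a single dot between the coordinates.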
3. Test access

    Open http://localhost/cgi-bin/gbrowse/gbrowse/Malus_zh/ in a browser.

A page like the following indicates that the installation succeeded: [screenshot]
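
The same check can be run from a script. This is a minimal sketch under stated assumptions: gbrowse_url and is_reachable are hypothetical names, the default CGI path is simply the one from the URL above and may differ on other installations, and an HTTP 200 response only shows that the page is served, not that the tracks render correctly.

```python
import urllib.error
import urllib.request


def gbrowse_url(source, host="localhost", cgi_path="/cgi-bin/gbrowse/gbrowse"):
    """Build the browser URL for a data source (path taken from this post)."""
    return f"http://{host}{cgi_path}/{source}/"


def is_reachable(url, timeout=10):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, or an HTTP error status
```

For example, is_reachable(gbrowse_url("Malus_zh")) would test the data source configured here.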
 

Reposted from blog.csdn.net/zp_00000/article/details/65630451