RNA-seq analysis: a log of errors and their solutions

My platform is 64-bit Linux, so all commands below were run on Linux.

1. Decompressing a file.gz file on Linux


【Reference links】
[1]https://blog.csdn.net/z69183787/article/details/81739901
[2]https://www.cnblogs.com/wangshouchang/p/7748527.html

【Solution】

gzip -d gencode.v29.annotation.gff3.gz
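The same decompression can also be done from Python 3 with the standard-library gzip module. This is a minimal sketch; the function name gunzip and its keep parameter are my own illustrative choices. Note that `gzip -d` replaces the .gz file with the decompressed one, while this sketch keeps the original unless keep=False.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, keep=True):
    """Decompress e.g. 'gencode.v29.annotation.gff3.gz' next to itself (hypothetical helper)."""
    src_path = Path(src)
    if src_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src!r}")
    dst_path = src_path.with_suffix("")  # strip only the trailing '.gz'
    # Stream in chunks so large annotation files are never fully loaded into memory.
    with gzip.open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    if not keep:
        src_path.unlink()  # mimic `gzip -d`, which removes the .gz file
    return dst_path
```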

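2. Downloading the GFF file
The GENCODE database provides GFF3 annotation files for the human genome in which chromosome names start with "chr":
https://www.gencodegenes.org/human/release_29.html
For this experiment I downloaded the Comprehensive gene annotation file (Regions: CHR), which covers the human reference chromosomes.
I needed it mainly to fix htseq-count reporting a count of 0 for every gene, as in this output: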

ENSG00000000003    0
ENSG00000000005    0
ENSG00000000419    0
ENSG00000000457    0
ENSG00000000460    0
ENSG00000000938    0
ENSG00000000971    0
ENSG00000001036    0
ENSG00000001084    0
ENSG00000001167    0
ENSG00000001460    0
ENSG00000001461    0
ENSG00000001497    0
ENSG00000001561    0
ENSG00000001617    0
ENSG00000001626    0
ENSG00000001629    0
ENSG00000001630    0
ENSG00000001631    0
ENSG00000002016    0


【Related question】https://www.bioinfo.info/?/question/462
Someone else has also summarized four ways to download GFF files:
https://blog.csdn.net/u011262253/article/details/89363809
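A frequent reason for all-zero counts is that the chromosome names in the alignment and in the annotation differ (for example chr1 in the BAM header but 1 in the GFF). Assuming that is the cause, the Python sketch below compares the two sets of names. The function names are hypothetical; the BAM header text is assumed to come from `samtools view -H`.

```python
def gff_seqids(gff_path):
    """Distinct sequence names (column 1) used in a GFF3 file."""
    seqids = set()
    with open(gff_path) as fh:
        for line in fh:
            if line.startswith("##FASTA"):
                break  # embedded sequences follow; no more feature lines
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments and blank lines
            seqids.add(line.split("\t", 1)[0])
    return seqids


def header_seqnames(header_text):
    """Reference names from the @SQ lines of a SAM header (e.g. `samtools view -H`)."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def naming_report(gff_names, bam_names):
    """Which sequence names occur in both inputs, and which only in one."""
    return {
        "shared": sorted(gff_names & bam_names),
        "only_gff": sorted(gff_names - bam_names),
        "only_bam": sorted(bam_names - gff_names),
    }
```

For instance, save the output of `samtools view -H 42_align_sorted.bam` to a text file and pass its contents as header_text. An empty "shared" list means htseq-count has no way to match any read to a gene.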

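3. Interpreting the htseq-count output file
The output has two columns: the first is the gene ID (e.g. ENSMUSG00000000001.4) and the second is the number of reads counted for that gene.
Summary counters appear at the end of the file:
__no_feature 42987809          # reads that could not be assigned to any feature
__ambiguous 183025             # reads overlapping more than one feature, so no single feature could be chosen
__too_low_aQual 0              # reads whose alignment quality is below the threshold set with -a
__not_aligned 0                # reads present in the SAM file but not aligned
__alignment_not_unique 0       # reads aligned to more than one position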
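To see at a glance how many reads were actually assigned to genes, the output can be split into gene rows and the "__" summary rows. This is a small sketch with hypothetical function names; it uses float division so it also runs under Python 2.7.

```python
def parse_htseq_counts(lines):
    """Split htseq-count output into per-gene counts and '__' summary counters."""
    genes, summary = {}, {}
    for line in lines:
        fields = line.split()  # htseq-count writes "ID<TAB>count"
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        name, count = fields[0], int(fields[1])
        # Special counters start with two underscores, e.g. __no_feature
        (summary if name.startswith("__") else genes)[name] = count
    return genes, summary


def assigned_fraction(genes, summary):
    """Share of counted reads (or pairs) that were assigned to a gene."""
    assigned = sum(genes.values())
    total = assigned + sum(summary.values())
    return float(assigned) / total if total else 0.0
```

If every gene count is 0 and nearly all reads land in __no_feature, the annotation and the alignments probably do not share chromosome names (see section 2).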

In the next step the read counts are analysed and combined further.
For details, see: https://www.jianshu.com/p/d8d5e0b2e33b

4. Creating a new file on Linux

touch new_file_name.sh
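When the file is created from a Python script instead, pathlib gives the same behaviour as touch: it creates an empty file, or just updates the modification time if the file exists. The optional executable flag (my own addition, hypothetical) adds execute permission for everyone, like `chmod a+x`, which a .sh file needs before it can be run as ./new_file_name.sh.

```python
import stat
from pathlib import Path


def touch_script(path, executable=False):
    """Create an empty file like `touch`; update its mtime if it already exists."""
    p = Path(path)
    p.touch(exist_ok=True)
    if executable:
        # add execute permission for user, group and others (like `chmod a+x`)
        p.chmod(p.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return p
```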

5. Installing HTSeq
【Following the official installation guide】
https://htseq.readthedocs.io/en/release_0.11.1/install.html#installation-on-linux
My platform is 64-bit Ubuntu with Python 2.7.1, so I installed with:

sudo apt-get install build-essential python2.7-dev python-numpy python-matplotlib python-pysam python-htseq


The installation succeeded!
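A quick way to confirm which of the relevant modules the Python interpreter can actually import is sketched below. The helper name is hypothetical, and "unknown" is reported when a module has no __version__ attribute. It is written to run on both Python 2.7 and Python 3.

```python
import importlib


def check_modules(names):
    """Map each module name to its version, 'unknown', or None if not importable."""
    result = {}
    for name in names:
        try:
            mod = importlib.import_module(name)
        except ImportError:
            result[name] = None
            continue
        result[name] = getattr(mod, "__version__", "unknown")
    return result


if __name__ == "__main__":
    for name, version in sorted(check_modules(["HTSeq", "numpy", "pysam", "setuptools"]).items()):
        print("{0}: {1}".format(name, version if version else "NOT importable"))
```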


Before that, I had followed other people's notes and tried many approaches, none of which worked:
https://www.cnblogs.com/triple-y/p/9338890.html
http://blog.sina.com.cn/s/blog_68ddca510102wts6.html
Along the way I got errors such as:

symlinking folders for python2
Could not import 'setuptools', falling back to 'distutils'.
Traceback (most recent call last):
  File "setup.py", line 200, in <module>
    **kwargs
  File "/usr/lib/python2.7/distutils/core.py", line 111, in setup
    _setup_distribution = dist = klass(attrs)
  File "/usr/lib/python2.7/distutils/dist.py", line 259, in __init__
    getattr(self.metadata, "set_" + key)(val)
  File "/usr/lib/python2.7/distutils/dist.py", line 1220, in set_requires
    distutils.versionpredicate.VersionPredicate(v)
  File "/usr/lib/python2.7/distutils/versionpredicate.py", line 113, in __init__
    raise ValueError("expected parenthesized list: %r" % paren)
ValueError: expected parenthesized list: '>=0.9.0'

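I also tried installing with pip on Windows (reportedly HTSeq cannot be used on Windows). As the log shows, pip could not even reach pypi.org and timed out: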

C:\Users\Administrator>pip install HTSeq
Collecting HTSeq
  Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x000002074F161A20>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/htseq/
  Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x000002074F161908>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/htseq/
  Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x000002074F161EF0>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/htseq/
  Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x000002074F161518>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/htseq/
  Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x000002074F161550>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/htseq/
  Could not find a version that satisfies the requirement HTSeq (from versions: )
No matching distribution found for HTSeq
You are using pip version 18.0, however version 19.1.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

6. "file may be truncated" error when processing a BAM file

【Error message】

Error occured when reading beginning of SAM/BAM file.

no BGZF EOF marker; file may be truncated

【Checking whether the BAM file is complete】

samtools view 42_align_sorted.bam|tail


Reference: https://www.jianshu.com/p/c6dd7edd6e80
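The "no BGZF EOF marker" message means the file does not end with the fixed 28-byte empty BGZF block that the SAM/BAM specification defines as the end-of-file marker. The sketch below checks for that block directly (newer samtools versions also offer `samtools quickcheck` for this). The function name is hypothetical, and a present marker does not prove the rest of the file is intact; it only rules out the most common kind of truncation.

```python
import os

# The 28-byte empty BGZF block that marks the end of a BAM file.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff0600 42430200 1b000300 00000000 00000000"
)


def has_bgzf_eof(path):
    """True if the file ends with the BGZF EOF block; False suggests truncation."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(-len(BGZF_EOF), os.SEEK_END)  # jump to the last 28 bytes
        return fh.read() == BGZF_EOF
```

For example, has_bgzf_eof("42_align_sorted.bam") returning False is consistent with the error above.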

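【Likely causes】

(1) The command generating the file was interrupted before it finished.

(2) The file was not completely transferred. (In my case, copying it with a USB flash drive did not finish.)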
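To catch an incomplete copy, compare the original and the copy, much like running md5sum on both sides. In this sketch (hypothetical helper names) the cheap size comparison runs first, and MD5 digests, read in 1 MiB chunks, are compared only when the sizes match.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks to keep memory use low."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_file_content(src, copy):
    """False if the copy differs from the source (e.g. a truncated transfer)."""
    if os.path.getsize(src) != os.path.getsize(copy):
        return False  # different sizes: definitely not a complete copy
    return file_md5(src) == file_md5(copy)
```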


Reposted from blog.csdn.net/weixin_40640700/article/details/91549488