Install and configure conda
Download the installer script from the Tsinghua mirror and install it
# Download the installer script from the Tsinghua mirror
wget -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Or download the latest Miniconda3 installer from the official site, which is slower
wget -c https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
After downloading, run the script file directly:
bash Miniconda3-latest-Linux-x86_64.sh
Enter yes when prompted and wait for the installation to complete. Once the installation has finished, conda cannot be used immediately; you first need to source your bashrc.
# Reload bashrc so the conda setup takes effect
source ~/.bashrc
NOTE⚠️:
- conda writes an initialization script into bashrc, so every new login (for example over ssh) automatically enters the conda base environment. If you do not need this, turn it off with the following configuration command:
conda config --set auto_activate_base false
- In addition, if you use a shell such as zsh and the initialization script is not written to zshrc automatically, you can add it to that file manually.
- If the conda command is still not found, you can set the PATH environment variable manually (adjust the path to your own installation directory):
export PATH="/home/super/miniconda3/bin:$PATH"
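The manual PATH fix above can be made safe to re-run, so that sourcing the rc file several times does not keep growing PATH. This is only a minimal sketch: it assumes Miniconda was installed to the installer's default location, $HOME/miniconda3, and both the CONDA_DIR variable and the prepend_path helper are names made up for this illustration.

```shell
# Assumption: Miniconda lives in $HOME/miniconda3 (the installer default).
# Set CONDA_DIR beforehand if you chose a different prefix.
CONDA_DIR="${CONDA_DIR:-$HOME/miniconda3}"

# prepend_path DIR: put DIR at the front of PATH unless it is already listed.
# (prepend_path is a hypothetical helper, not part of conda.)
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

prepend_path "$CONDA_DIR/bin"
export PATH
```

Placed in ~/.bashrc or ~/.zshrc, this has the same effect as the export line above without duplicating the entry on every reload.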
Set mirror source
# The following four lines add the Tsinghua University mirror channels, including bioconda; recommended for users in mainland China
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda/
## Official default channels
conda config --add channels r
conda config --add channels conda-forge
conda config --add channels bioconda
After you set the mirror channels or disable automatic activation of the base environment, the configuration is written automatically to the .condarc file, as follows:
$ cat .condarc
auto_activate_base: false
channels:
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda/
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
- defaults
conda environment creation
Create a Python 3 environment:
conda create -y -n rna_seq python=3
# -y confirm automatically
# -n name of the new environment
# python=3 use Python 3 in the new environment
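When the create command is part of a setup script, it can be handy to skip it if the environment already exists. The following is a sketch under stated assumptions rather than a conda feature: ensure_env is a hypothetical helper name, and it relies on `conda env list` printing one environment per line with the name in the first column and header lines starting with "#".

```shell
# ensure_env NAME [PACKAGE_SPEC...]: create the environment only if it is missing.
# (ensure_env is a hypothetical helper, not a conda subcommand.)
ensure_env() {
    env_name=$1
    shift
    # Take the first column of every non-header line and look for an exact match.
    if conda env list | awk '$1 !~ /^#/ {print $1}' | grep -qx -- "$env_name"; then
        echo "environment '$env_name' already exists"
    else
        conda create -y -n "$env_name" "$@"
    fi
}

# Usage (needs conda on PATH):
# ensure_env rna_seq python=3
```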
Activate and exit environments
conda activate <conda_name> # activate an environment
conda deactivate # deactivate the current environment (takes no environment name)
Installing software with conda
Install software with the following commands inside the activated environment
conda install -y sra-tools # install sra-tools; several packages can be installed at once, separated by spaces
conda install -y sra-tools fastqc trim-galore hisat2 subread multiqc samtools salmon fastp
conda installs software in a different location from ordinary software. Use which <softname> to check the location of software installed by conda.
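To check several tools in one go, the same lookup can be wrapped in a small loop. This is a minimal sketch; check_tools is a hypothetical helper name. Keep in mind that executable names can differ from package names (for example, the trim-galore package installs an executable called trim_galore).

```shell
# check_tools NAME...: print where each tool resolves, or report it as missing.
# Returns non-zero if at least one tool is not on PATH.
# (check_tools is a hypothetical helper, not part of conda.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if path=$(command -v "$tool"); then
            echo "$tool -> $path"
        else
            echo "$tool -> NOT FOUND" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage inside the activated environment:
# check_tools fastqc hisat2 samtools salmon fastp
```

Tools installed by conda should resolve to a path inside the environment's own bin directory.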
Quality Assessment@fastQC
fastq format
FastQ format description: https://mp.weixin.qq.com/s/8g-oUjiEhV4cGMJNuhmISQ
FastQ format wiki: https://en.wikipedia.org/wiki/FASTQ_format
FastQ format literature: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2847217/
Concept
FastQ is a common sequence format that stores biological sequences together with their quality scores. Each base and each quality value is encoded as a single ASCII character. The format was originally developed at the Wellcome Trust Sanger Institute to bundle a FASTA sequence with its quality data, and it has since become the de facto standard for high-throughput sequencing results.
Format description
Each sequence in a FASTQ file usually has four lines:
- 1. The first line: must start with "@", followed by a unique sequence identifier and then an optional description. The identifier and the description are separated by a space;
- 2. The second line: the sequence characters (for nucleic acids [AGCTN]+, for proteins the amino acid characters);
- 3. The third line: must start with "+", optionally followed by the identifier and description. If there is content after "+", it must be identical to the content after "@" in the first line;
- 4. The fourth line: the base quality characters. Each character corresponds to the quality of the base or amino acid at the same position in the second line. Each character can be converted into a base quality score according to fixed rules, and this score reflects the error rate of the base call. The number of characters in this line must equal the number of characters in the second line.
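The four rules above can be checked mechanically, and the quality characters can be decoded into scores. The following is a sketch under two assumptions that the text does not state: records are not line-wrapped (exactly four lines each), and qualities use the Phred+33 encoding, where the score is Q = ASCII code - 33 and the estimated error probability is 10^(-Q/10). Both function names are hypothetical.

```shell
# fastq_check FILE: verify that every record has an "@" header, a "+" separator,
# and equally long sequence and quality lines.
# Assumption: exactly four lines per record (no line wrapping).
fastq_check() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { printf "line %d: header must start with @\n", NR; bad = 1 }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { printf "line %d: separator must start with +\n", NR; bad = 1 }
        NR % 4 == 0 && length($0) != seqlen { printf "line %d: quality length differs from sequence length\n", NR; bad = 1 }
        END {
            if (NR % 4 != 0) { print "file is not a whole number of 4-line records"; bad = 1 }
            exit bad
        }
    ' "$1"
}

# phred33_scores QUALITY_STRING: print the score of each quality character.
# Assumption: Phred+33 encoding, i.e. Q = ASCII code - 33.
phred33_scores() {
    printf '%s\n' "$1" | awk '
        BEGIN { for (i = 33; i <= 126; i++) code[sprintf("%c", i)] = i }
        {
            out = ""
            for (i = 1; i <= length($0); i++) {
                q = code[substr($0, i, 1)] - 33
                out = out (i > 1 ? " " : "") q
            }
            print out
        }
    '
}

# Usage:
# fastq_check sample.fq && echo "format looks fine"
# phred33_scores 'I5!'    # prints: 40 20 0
```

Under this encoding, a score of 40 corresponds to an error probability of 0.0001 and a score of 20 to 0.01.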
FastQC software
FastQC quality assessment software official website: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Attention ⚠️
- fastqc can perform quality assessment on *.bam, *.sam, *.fq, and *.fq.gz files.
- fastqc can run multi-threaded via the -t option. The threads work on multiple input files at the same time, one file per thread, so multi-threading appears to bring no benefit for a single file.
- There seems to be no difference between running fastqc on the BAM files and on the filtered files.
- Batch processing in bash is relatively simple, but zsh behaves differently: the file list has to be passed through command substitution, for example `echo $list`.
Commonly used parameters:
# Commonly used parameters
fastqc -o <out.dir> -t <thread_num> -f <input_format> <input_file_1> <input_file_2> ...
# -o set the output directory
# -t set the number of threads
# -f set the input file format
Batch processing
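# in bash
mkdir -p ./fastqc_raw # fastqc does not create the output directory itself
a=`ls *.fq`
fastqc -o ./fastqc_raw -t 10 $a
# in zsh (unquoted variables are not split into words, so expand the list via command substitution)
b=`ls -C *.fq`
fastqc -o ./fastqc_raw -t 10 `echo $b`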
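The bash and zsh variants differ only because zsh does not split an unquoted variable into words by default. One way to sidestep this, sketched here as an alternative to the approach above, is to let the shell expand the glob directly into the argument list. run_fastqc_batch is a hypothetical helper name; it also caps the thread count at the number of files, since each thread handles one file.

```shell
# run_fastqc_batch OUTDIR THREADS FILE...: run fastqc once over all given files.
# (run_fastqc_batch is a hypothetical helper, not part of FastQC.)
run_fastqc_batch() {
    outdir=$1
    threads=$2
    shift 2
    if [ "$#" -eq 0 ]; then
        echo "run_fastqc_batch: no input files given" >&2
        return 1
    fi
    # fastqc parallelises per file, so more threads than files brings nothing.
    if [ "$threads" -gt "$#" ]; then
        threads=$#
    fi
    mkdir -p "$outdir"    # fastqc does not create the output directory itself
    fastqc -o "$outdir" -t "$threads" "$@"
}

# Usage, identical in bash and zsh because the shell expands the glob itself:
# run_fastqc_batch ./fastqc_raw 10 *.fq
```

Because the file names never pass through an unquoted variable, names containing spaces are handled correctly as well.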