Linux file manipulation commands commonly used in bioinformatics

Linux is the most commonly used operating system for bioinformatics data analysis. Many open-source bioinformatics tools are command-line programs that are available only for Linux. Graphical user interfaces (GUIs) exist for some bioinformatics analyses, but they are generally no easier to use than the Linux command line. Learning Linux commands is therefore an essential skill for bioinformatics data analysis. This article introduces the basic Linux commands needed for daily bioinformatics work.

When we open a Linux terminal, we usually see a shell prompt (the command-line interface) ending in a '$' sign. The shell is a program that takes commands typed by the user, passes them to the operating system for execution, and prints the output on the screen.

View files in the current directory

You can use the ls (list) command to list files in the current path. If you provide no arguments to ls, it lists only the names of files and directories.

ls

file1.txt  file2.txt

# With the -l option, ls prints a long listing that includes permissions, owner, size, and modification date and time
ls -l

total 0
-rw-rw-r-- 1 ubuntu ubuntu 0 Jun  4 11:10 file1.txt
-rw-rw-r-- 1 ubuntu ubuntu 0 Jun  4 11:10 file2.txt

# With -lrt, ls prints the same long listing sorted by modification time
# in ascending order (the most recently modified file is listed last)
ls -lrt

total 0
-rw-rw-r-- 1 ubuntu ubuntu 0 Jun  4 11:10 file1.txt
-rw-rw-r-- 1 ubuntu ubuntu 0 Jun  4 11:10 file2.txt

Check the current path

You can use the pwd command to obtain the absolute path of the current working directory, which is also a command commonly used in the bioinformatics data analysis process.

pwd

/home/ubuntu/test

# The variable $PWD holds the path of the current working directory and can be printed with echo
echo "$PWD"

/home/ubuntu/test

Switch directory

Use the cd (change directory) command to move to another directory. It is one of the most frequently used commands in bioinformatics analysis, because it is how you navigate the directory structure. The argument can be the name of a directory in the current path, or an absolute or relative path to a directory elsewhere. If you type cd without an argument, you are taken to your home directory.

# Switch back to the home directory (/home/ubuntu)
cd

# Switch to the parent directory
cd ..

# Switch to the root directory
cd /

Create and delete directories

The mkdir and rmdir (or rm) commands can be used to create and delete directories respectively.

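# Create a directory named temp
mkdir temp

# Create a directory inside another directory (-p also creates parent if it does not exist)
mkdir -p parent/temp

# Delete an empty directory
rmdir temp

# Delete a directory and all files inside it (rmdir has no -rf option; use rm)
rm -rf temp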

*Note: Use the rm -r or rm -rf commands with caution as they will recursively delete all files and subdirectories. Once folders and files are deleted, the deleted data cannot be recovered.
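
The difference between rmdir and rm -r can be seen in a small sketch. The directory names below (rm_demo, project, results) are made up for illustration and are created inside the current directory.

```shell
# Build a small, hypothetical directory tree to experiment on
mkdir -p rm_demo/project/results
touch rm_demo/project/results/out.txt

# rmdir only removes empty directories, so this attempt fails
rmdir rm_demo/project || echo "rmdir failed: directory is not empty"

# rm -r removes the directory together with everything inside it
rm -r rm_demo/project
```

Running rm -ri instead of rm -r asks for confirmation before each deletion, which can be a safer habit when working near raw data.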

Create and edit files

Use vim, touch, cat and echo commands to create files.

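# Create (or open) a file in the vim editor
vim file.txt

# Create an empty file with touch
touch file.txt

# Create a file with cat: type the content, then press Ctrl+D to finish
cat > file.txt

# Write a line of text to a file with echo
echo "This is a test file" > file.txt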

View file contents

In Linux, you can use several commands or text editors to read complete or partial files. The most popular Linux commands for reading files include less, more, cat, head, and tail.

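# less and more are the preferred way to page through large files
less file.txt
more file.txt

# cat prints the whole file to the terminal at once
cat file.txt

# head shows the first ten lines by default
head file.txt

# tail shows the last ten lines by default
tail file.txt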
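
Both head and tail accept -n to change how many lines are shown. The following is a minimal sketch; the file view_demo/numbers.txt is a made-up example created with seq.

```shell
# Create a hypothetical 20-line file containing the numbers 1 to 20
mkdir -p view_demo
seq 1 20 > view_demo/numbers.txt

# Show only the first 3 lines
head -n 3 view_demo/numbers.txt

# Show only the last 2 lines
tail -n 2 view_demo/numbers.txt

# Count the lines in the file
wc -l view_demo/numbers.txt
```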

Merge files

Use the cat command to merge and append two or more files. Here, the redirection operators > and >> are used to merge and append files respectively.

# Merge the contents of file1.txt and file2.txt into a new file
cat file1.txt file2.txt > merged_file.txt

# Append the contents of file3.txt to merged_file.txt
cat file3.txt >> merged_file.txt
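
To see that > overwrites while >> appends, here is a small self-contained sketch. The directory merge_demo and the file names in it are invented for the example.

```shell
# Create three hypothetical one-line input files
mkdir -p merge_demo
printf 'sample1\n' > merge_demo/a.txt
printf 'sample2\n' > merge_demo/b.txt
printf 'sample3\n' > merge_demo/c.txt

# '>' creates merged.txt, or overwrites it if it already exists
cat merge_demo/a.txt merge_demo/b.txt > merge_demo/merged.txt

# '>>' adds to the end of merged.txt and keeps the existing lines
cat merge_demo/c.txt >> merge_demo/merged.txt

# The merged file now holds the three lines in order
cat merge_demo/merged.txt
```

Using > where >> was intended silently replaces the earlier content, so it is worth double-checking the operator before running a merge on real data.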

Copy, rename and delete files

The cp command copies one file to another, the mv command renames a file, and rm deletes a file.

# Create a copy of file1.txt named file2.txt
cp file1.txt file2.txt

# Rename file1.txt to file2.txt
mv file1.txt file2.txt

# Delete file1.txt
rm file1.txt
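
The same commands also work on directories and across directories. This is a rough sketch using made-up names (copy_demo, raw, archive, backup); it assumes copy_demo does not already exist in the current directory.

```shell
# Set up a hypothetical directory with one data file
mkdir -p copy_demo/raw copy_demo/archive
echo "data" > copy_demo/raw/sample.txt

# cp needs -r to copy a directory and its contents
cp -r copy_demo/raw copy_demo/backup

# When the destination is a directory, mv moves the file into it
# instead of renaming it
mv copy_demo/raw/sample.txt copy_demo/archive/
```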

Compress and extract files

Most bioinformatics tools and data files are distributed in compressed form, as .tar.gz archives or in other formats such as .zip, .gz, and .bz2, so compressing and decompressing files is a routine part of bioinformatics analysis.

The tar command bundles files and directories into archives (.tar), optionally compressed with gzip or bzip2 (.tar.gz, .tar.bz2), and extracts files from such archives.

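# Extract files from a .tar archive
tar -xvf archive.tar

# List the files in a .tar archive (without extracting)
tar -tf archive.tar

# Extract files from a .tar.gz archive (GNU tar detects the compression automatically)
tar -xvf archive.tar.gz

# Extract files from a .tar.bz2 archive
tar -xvf archive.tar.bz2

# Create tar archives (uncompressed, gzip-compressed, bzip2-compressed)
tar -cvf archive.tar dir
tar -czvf archive.tar.gz dir
tar -cjvf archive.tar.bz2 dir

# Decompress a .gz file
gunzip file.gz

# Create a .gz file (replaces file.fastq with file.fastq.gz)
gzip file.fastq

# Decompress a .bz2 file
bunzip2 file.bz2

# Create a .bz2 file (replaces file.fastq with file.fastq.bz2)
bzip2 file.fastq

# Extract a .zip archive
unzip archive.zip

# Create a .zip archive from a file
zip archive.zip file.txt

# Create a .zip archive from a directory
zip -r archive.zip dir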
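
Large sequence files are often kept compressed and read without ever writing the decompressed copy to disk. The sketch below illustrates this with gzip -c and gunzip -c; the FASTQ record and the gzip_demo directory are invented for the example.

```shell
# Write a tiny, made-up FASTQ file (one read = 4 lines)
mkdir -p gzip_demo
printf '@read1\nACGT\n+\nIIII\n' > gzip_demo/reads.fastq

# -c sends the compressed data to STDOUT, so the original file is kept
gzip -c gzip_demo/reads.fastq > gzip_demo/reads.fastq.gz

# gunzip -c prints the decompressed content without creating a file
gunzip -c gzip_demo/reads.fastq.gz | head -n 2

# Count reads directly from the compressed file: total lines divided by 4
lines=$(gunzip -c gzip_demo/reads.fastq.gz | wc -l)
echo "reads: $((lines / 4))"
```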

Standard output (STDOUT) and standard error (STDERR) (>, 2>, &>)

Standard output (STDOUT) and standard error (STDERR) are the two output streams of a Linux command. By default both are printed on the screen, carrying normal output and error messages respectively. The redirection operators send them to files instead: > (or 1>) redirects standard output, 2> redirects standard error, and &> redirects both to the same file.

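# Print standard output on the screen
cat file.txt

# Use the redirection operator (> or 1>) to write standard output to a file
cat file.txt > file2.txt

# Use > and 2> to write standard output and standard error to separate files (file3.txt does not exist)
cat file2.txt file3.txt > stdout.txt 2> stderr.txt

# Use &> to write standard output and standard error to the same file
cat file2.txt file3.txt &> out.txt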
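
The following sketch makes the two streams visible. The report function is a hypothetical helper, defined only for this example, that writes one line to each stream; redirect_demo is a made-up directory name.

```shell
# Hypothetical helper that writes one line to each output stream
report() {
    echo "result line"         # goes to STDOUT
    echo "warning line" >&2    # goes to STDERR
}

mkdir -p redirect_demo

# Send each stream to its own file
report > redirect_demo/stdout.txt 2> redirect_demo/stderr.txt

# '> file 2>&1' sends both streams to one file; it behaves like '&> file'
# in bash and also works in shells that lack '&>'
report > redirect_demo/both.txt 2>&1

# Discard error messages and keep only the normal output on screen
report 2> /dev/null
```

The 2>&1 form is worth knowing because &> is a bash extension and may not be available in every shell.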

Find the path of an executable file

In Linux, the which command prints the absolute path of an executable file (an installed command or tool). For example, to find the absolute path of the fastqc command, run which fastqc. The which command can also be used to check whether a specific tool is installed and available on the system path.
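
A minimal sketch of both uses follows. It uses ls for the lookup because ls exists on virtually every Linux system. The fastqc check assumes nothing about your machine and simply reports whether the tool was found.

```shell
# Look up the absolute path of an installed command
ls_path=$(which ls)
echo "ls is located at: $ls_path"

# which returns a non-zero exit status when the command is not on PATH,
# so it can be used in a test; the tool name is only an example
tool="fastqc"
if which "$tool" > /dev/null 2>&1; then
    echo "$tool is installed at: $(which "$tool")"
else
    echo "$tool was not found on PATH"
fi
```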


Origin blog.csdn.net/m0_56572447/article/details/131148134