[Study Notes] Bioinformatics of Shandong University - Perl Language Basics + Advanced

8.3 Getting Started with Perl Language Basics

See the video for details: Introduction to Perl Language Basics-01 P137

● Perl (Practical Extraction and Report Language): free to download from the official website (the site may fail to load on a poor network connection), and it supports multiple operating systems.

Interpreted language: scripts are run directly by the perl interpreter.
Sequence processing: especially well suited to string (sequence) processing.
File processing: convenient and concise file reading and writing.
Pattern matching: full support for regular expressions.

8.3.1 The first Perl program

Create a text file named test.pl.
On the first line, write the Perl execution environment (the installation path of the interpreter):
Linux: #!/usr/bin/perl
Windows: #!C:/Strawberry/perl/bin/perl.exe
On the second line, write a simple statement, such as: print "Hello world!";
Save test.pl and open the cmd command window.
Change to the directory where test.pl is located, for example: cd Desktop
Run the command perl test.pl to execute your first program.

8.3.2 Basic rules of Perl

Each statement must end with a semicolon ";".
A comment starts with "#" and is only valid for a single line.
Defining variables is very simple and flexible:
◆ Variables do not need to be declared in advance; they can be defined as they are used;
◆ Variable types do not need to be declared; Perl determines the type automatically after assignment;
◆ Variable names are case-sensitive, start with a letter (or an underscore), and cannot contain special characters;
◆ Scalar variable names are prefixed with "$", and values are assigned with "=": $scalar = expression

● Using the different quotation marks:

Single quotes '': everything inside the quotes is taken literally.
Double quotes "": variables and escape sequences inside the quotes are interpreted.
Backticks ``: the content inside the quotes is executed as a system command.
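
The difference between single and double quotes can be sketched as follows. This is only an illustration: the variable name $gene and its value are made up, and backticks are left out because the command to run depends on the operating system.

```perl
use strict;
use warnings;

my $gene = "TP53";   # hypothetical example value

# Single quotes: nothing is interpreted, so $gene and \n stay exactly as typed.
my $single = 'Gene: $gene\n';

# Double quotes: $gene is replaced by its value and \n becomes a newline.
my $double = "Gene: $gene\n";

print $single, "\n";   # prints the literal text Gene: $gene\n
print $double;         # prints Gene: TP53 followed by a newline
```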

● Using the different operators:

Comparison operators: the comparison operators for strings are different from those for numbers!
Numbers are compared with ==, !=, <, >, <=, >=; strings are compared with eq, ne, lt, gt, le, ge.
In string comparison:
◆ Digits are smaller than letters!
◆ Uppercase letters are smaller than lowercase letters!
◆ From small to large: digits [0-9], uppercase letters [A-Z], lowercase letters [a-z]
Logical operators:
You can write the words directly: or, and, not
or use the symbols instead: ||, &&, !
Special logical operator: xor (exclusive or)
△ a xor b = (a' and b) or (a and b'), where a' = not a
If a and b have different truth values, the xor result is 1 (true). If a and b have the same truth value, the xor result is 0 (false).
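
A short sketch of these rules, assuming the numeric operator > and its string counterpart gt (lt is the string form of <). The values are arbitrary and chosen only to show where the two kinds of comparison disagree.

```perl
use strict;
use warnings;

# Numeric comparison: 10 is greater than 9.
my $numeric = (10 > 9) ? "yes" : "no";

# String comparison works character by character: "1" sorts before "9",
# so the string "10" is NOT greater than the string "9".
my $stringy = ("10" gt "9") ? "yes" : "no";

# Digits < uppercase letters < lowercase letters in string order.
my $order_ok = ("9" lt "A" and "Z" lt "a") ? "yes" : "no";

# xor is true only when exactly one side is true.
my $differ = (1 xor 0) ? "true" : "false";
my $same   = (1 xor 1) ? "true" : "false";

print "10 > 9 as numbers:  $numeric\n";    # yes
print "10 gt 9 as strings: $stringy\n";    # no
print "digits < upper < lower: $order_ok\n";   # yes
print "1 xor 0: $differ, 1 xor 1: $same\n";    # true, false
```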

String operators:
.  : concatenates two strings
x  : repeats a string a given number of times
.= : concatenates, then assigns the result back to the variable on the left

$string1 = "potato";
$string2 = "head";
$string3 = $string1 . $string2;	# potatohead
$string4 = $string1 x 2;		# potatopotato
$string1 .= $string2; 			# $string1 = potatohead

8.3.3 String Common Functions

Perl has many predefined functions, called in the form: function_name(argument 1, argument 2, ... argument n)
Note: the commas between the arguments are required, while the parentheses around the arguments are optional.

length function:
Syntax: length($string);
Description: Returns the length of the string $string, i.e. the number of characters it contains.

$string="Perl5";
$size=length($string); # now $size=5

# Compute the length of an amino acid sequence
$seq="GHMGSSVLEELVQLVKDKNIDISIKYDPRKDSEVEANRVITDDIELLKKILAYFLPEDAILKGGHYDNQLQNGIKRVKEFLESSPNTQWELRAFMAVMHFSLTADRIDDDILKVIVDSMNHHGDARSKLREELAELTAE";
$len=length($seq);
print "The protein contains $len amino acids.";

substr function:
Syntax: substr($string, offset, length);
Description: Extracts a substring from the string $string.
★ Note: the first character is at position 0, not 1.
offset is the position of the starting character. If offset is negative, the start is counted from the right-hand end of the string.
length is the number of characters to extract. If length is omitted, everything from the start position to the last character of the string is returned.

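$s1=substr("perl5",2,2);   	# $s1="rl": 2 characters starting at offset 2 (after the second character)
$s2=substr("perl5",1);   	# $s2="erl5": everything after the first character
$s3=substr("perl5",-5,3);    # $s3="per": 3 characters starting at the fifth character from the end

# Extract the functional region of a protein sequence; the region runs from amino acid 12 to amino acid 23.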
$seq="GHMGSSVLEELVQLVKDKNIDISIKYDPRKDSEVEANRVITDDIELLKKILAYFLPEDALKGGHYDNQLQNGIKRVKEFLESSPNTQWELRAFMAVMHFSLTADRIDDDILKVIVDSMNHHGDARSKLREELAELTAE";
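$begin=11;               # amino acid 12 is at offset 11 (counting starts at 0)
$end=22;                 # amino acid 23 is at offset 22
$length=$end-$begin+1;   # 12 amino acids in total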
$region=substr($seq, $begin, $length);
print "The functional region is $region.";

index function:
Syntax: index($string, $substring, position)
Description: Returns the position of $substring in the string $string (only the first position found is returned), or -1 if it is not found. $substring is the string to search for;
position is the position at which the search starts; if omitted, the search starts from the beginning.

$s1=index("pell5","p");		# $s1=0
$s2=index("pell5","l",2);	# $s2=2; only the first position found is returned
$s3=index("pell5", "perl");	# $s3=-1

# Find the position of a functional region within a protein sequence.
$seq="GHMGSSVLEELVQLVKDKNIDISIKYDPRKDSEVEANRVITDDELLKKILAYFLPEDAILKGGHYDNQLQNGIKRVKEFLESSPNTQWELRAFMAVMHFSLTADRIDDDILKVIVDSMNHHGDARSKLREELAELTAE";
$region="KEFLESSPNT";
$begin=index($seq, $region)+1;  # start position; note the +1 after taking the index
$end=$begin+length($region)-1;  # end position; note the -1
$begin_aa=substr($region,0,1);  # first amino acid
$end_aa=substr($region,-1,1); # last amino acid
print "The functional region is from $begin_aa$begin to $end_aa$end.";
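
Because index only reports the first hit, the optional position argument can be used to keep searching from just after the previous hit. The sketch below is one way to collect every occurrence; the DNA string and motif are made up, and push (described in 8.3.4) is used to store the results.

```perl
use strict;
use warnings;

my $seq   = "ATGCATGCATG";   # made-up DNA string for illustration
my $motif = "ATG";

my @positions;               # 1-based start position of every match
my $from = 0;                # offset at which the next search starts

while ((my $hit = index($seq, $motif, $from)) != -1) {
    push(@positions, $hit + 1);   # convert the 0-based index to 1-based
    $from = $hit + 1;             # resume one character after this hit
}

print "Found $motif at: @positions\n";   # Found ATG at: 1 5 9
```

Restarting at $hit + 1 rather than after the whole motif means overlapping matches would also be reported.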

8.3.4 Array common functions

An array is a list of elements enclosed in parentheses ().
- Elements can be any value, or empty, and are separated by commas
- The number of elements can be increased or decreased at any time
- Arrays are marked with the @ symbol, for example: @arr=("A","T","C","G")
- A single element of an array behaves like a scalar variable and is written as: $arr[index_number]
```
print $arr[0];  # A
print @arr[1,3]; # TG (several elements at once form an array slice, written with @)
print @arr; # ATCG
```
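
The choice between $ and @ in front of the brackets depends on how many elements are requested. A small sketch, reusing the same four bases (the variable names $first, @pair and $last are only for illustration):

```perl
use strict;
use warnings;

my @arr = ("A", "T", "C", "G");

my $first = $arr[0];       # one element is a scalar, so the sigil is $
my @pair  = @arr[1, 3];    # several elements form a slice, so the sigil is @
my $last  = $arr[-1];      # negative indexes count from the end

print "$first\n";          # A
print "@pair\n";           # T G (inside double quotes, elements are joined by a space)
print "$last\n";           # G
```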

scalar function: number of elements
Syntax: scalar(@array)
Description: Returns the number of elements in the array @array.

@arr=("A","B","C"); # define an array
$num=scalar(@arr); # now $num=3

# Before batch-processing a group of UniProt IDs, count how many IDs there are.
@ids=("Q6GV17", "Q9BXR5", "B3Y669", "C0LSK8");
$count=scalar(@ids);
print "$count ids : @ids";

reverse function: reverse order
Syntax: reverse(@array)
Description: Returns the elements of the array @array in reverse order, from last to first.

sort function: ascending order
Syntax: sort [{$a<=>$b}] (@array)    (the part in square brackets is optional)
Description: Sorts the elements of the array @array in ascending ASCII (string) order. With {$a<=>$b} added, the elements are sorted by numeric value instead.

@arr1=(21,1,2,12);
@arr2=reverse(@arr1); # @arr2=(12,2,1,21): elements in reverse order
@arr3=sort(@arr1); # @arr3=(1,12,2,21): sorted as strings
@arr4=sort {$a<=>$b} (@arr1); # @arr4=(1,2,12,21): sorted as numbers

# Sort a set of computed values in ascending / descending order.
@value=(10,34,24,16,4,7);
@value1=sort {$a<=>$b} (@value);  # ascending
@value2=reverse(@value1); # descending
print "ascending order : @value1\n";  # \n is a newline
print "descending order : @value2";
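
As an alternative to sorting and then calling reverse, swapping $a and $b inside the block sorts in descending order directly. A minimal sketch comparing the three orderings on the same numbers (the array names are only for illustration):

```perl
use strict;
use warnings;

my @nums = (21, 1, 2, 12);

my @as_strings = sort @nums;                 # compared as text
my @ascending  = sort { $a <=> $b } @nums;   # compared as numbers, small to large
my @descending = sort { $b <=> $a } @nums;   # $a and $b swapped: large to small

print "string order : @as_strings\n";   # 1 12 2 21
print "ascending    : @ascending\n";    # 1 2 12 21
print "descending   : @descending\n";   # 21 12 2 1
```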

pop function: remove the last element
Syntax: pop(@array)
Description: Removes the last element of the array @array and returns the removed element.
```
@arr=("A","B","C");
$rm=pop(@arr); #@arr=("A", "B"), $rm="C" 
```

push function: add at the end
Syntax: push(@array, $newelement/@newarray)
Description: Adds a new element, or the elements of a new array, to the end of the array @array.

@arr=("A","B","C") ;
push(@arr,"D") ;  # @arr=("A","B","C","D")

# Sort from largest to smallest, remove the smallest value, and append the number of elements at the end.
@value=(10,34,24,16,4,7);
@value1=sort {$a<=>$b} (@value);
@value=reverse(@value1);
pop(@value);
push(@value, scalar(@value)); 
print "new value : @value";  # new value : 34 24 16 10 7 5

shift function: remove the first element
Syntax: shift(@array)
Description: Similar to pop, but removes the first element of the array @array and returns the removed element.
```
@arr=("A","B","C");
$rm=shift(@arr); #@arr=("B", "C"), $rm="A" 
```

unshift function: add at the front
Syntax: unshift(@array, $newelement/@newarray)
Description: Similar to push, but adds a new element, or the elements of a new array, to the front of the array @array.

@arr=("A","B");
unshift(@arr,"X") ;  # @arr=("X","A","B")

# Sort from largest to smallest, remove the largest value, and add the number of elements at the front.
@value=(10,34,24,16,4,7);
@value1=sort {$a<=>$b} (@value);
@value=reverse(@value1);
shift(@value);
unshift(@value, scalar(@value)); 
print "new value : @value";  # new value : 5 24 16 10 7 4

join function: concatenation
Syntax: join($string, @array)
Description: Joins the elements of the array @array into a single string, with the specified string $string placed between them, and returns the resulting string.
```
@arr=("A" , "B" , "C") ;
$get=join(":", @arr);  #$get="A:B:C"
```

split function: splitting
Syntax: split(/pattern/, $string)
Description: Splits the string $string at each occurrence of pattern (the delimiter) and puts the resulting pieces into an array.

$seq="15:32:54";
@arr=split(/:/, $seq) ; #@arr=("15","32","54")

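# Get the bare sequence from a FASTA-format record.
$fasta=">sp|P33316|DUT_HUMAN
MTPLCPRPALCYHFLTSLLRSAMQNARGARQRA
EAAVLSGPGPPLGRAAQHGIPRPLSSAGRLSQG
CRGASTVGAAGWKGELPKAGGSPAPGPETP";
@line=split(/\n/, $fasta); # split at newlines into one string per line and store them in the array @line
shift(@line); # remove the first string in the array, i.e. the header line starting with ">"
$seq=join("", @line); # join the remaining strings to get the bare sequence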
print $seq;
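
The same split function can also take the header line apart. The sketch below assumes the header has the three "|"-separated fields shown above; the variable names $accession and $entry are made up for illustration. Because "|" has a special meaning inside a pattern, it is written as \| here.

```perl
use strict;
use warnings;

my $header = ">sp|P33316|DUT_HUMAN";

# "|" means "or" inside a pattern, so it is escaped as \| to match a literal bar.
my @fields = split(/\|/, $header);

my $accession = $fields[1];   # second field
my $entry     = $fields[2];   # third field

print "$accession $entry\n";  # P33316 DUT_HUMAN
```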

8.4 Perl Language Basics: Advanced

8.4.1 if conditional statement

● Syntax 1:
if (condition) { statements; } else { statements; }

The "condition" of the if statement is placed in parentheses (), and the "condition" inside the parentheses is a logical expression.
If the logical expression evaluates to true: the statements in the curly braces {} after if are executed.
If the logical expression evaluates to false: the statements in the curly braces {} after else are executed.
Note: if and else do not have to appear as a pair. An if may appear without an else, but an else may not appear without an if!
```
$a=10;
$b=15;
if ($a > $b) {
    print "a is bigger!";
}
else {
    print "b is bigger!";
}
```

● Syntax 2:
if (expression 1) { statement1; } elsif (expression 2) { statement2; } else { statement3; }

If logical expression 1 evaluates to true: statement1 is executed.

If logical expression 1 evaluates to false: the logical expression 2 after elsif is evaluated next.
If logical expression 2 evaluates to true: statement2 is executed.
If logical expression 2 evaluates to false: statement3 is executed.

if ($a == 5) {
    print '$a is 5';
}
elsif ($a == 10) {
    print '$a is 10';
}
elsif ($a == 15) {
    print '$a is 15';
}
else {
    print '$a is not (5 or 10 or 15)';
}
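
When the condition compares strings rather than numbers, the string operator eq takes the place of ==. A minimal sketch of an if / elsif / else chain on a single DNA base (the variable names $base and $class are made up for illustration):

```perl
use strict;
use warnings;

my $base = "G";   # example input
my $class;

if ($base eq "A" or $base eq "G") {
    $class = "purine";
}
elsif ($base eq "C" or $base eq "T") {
    $class = "pyrimidine";
}
else {
    $class = "unknown";
}

print "$base is a $class\n";   # G is a purine
```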

8.4.2 for loop statement

● Syntax 1: for
for (init; condition; increment) { statements; }

init is executed first, and only once. This step declares and initializes the loop control variable, for example: $i=1;
Next, condition is evaluated. If it is true, the loop body is executed; if it is false, the loop body is not executed and the loop ends. For example: $i<10;
After the loop body has run, control jumps back to the increment statement, which updates the loop control variable, for example: $i++

The condition is then evaluated again, and the process repeats until the condition becomes false and the loop terminates.

# 9 x 9 multiplication table
for ($i=1; $i<10; $i++) {
    for ($j=1; $j<10; $j++) {
        $x=$i*$j;
        print "$i x $j = $x\t";
    }
    print "\n";
}
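
A for loop combines naturally with substr and length from 8.3.3 to walk through a sequence one character at a time. The sketch below counts G and C bases; the DNA string is made up for illustration.

```perl
use strict;
use warnings;

my $dna = "ATGCGCTA";   # made-up sequence for illustration
my $gc  = 0;

for (my $i = 0; $i < length($dna); $i++) {
    my $base = substr($dna, $i, 1);       # the single character at offset $i
    if ($base eq "G" or $base eq "C") {
        $gc++;
    }
}

print "GC count: $gc of ", length($dna), "\n";   # GC count: 4 of 8
```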

● Syntax 2: foreach

foreach (@array) {
    statement1;
    ...
}

The loop body is executed once for each element of the array @array (the array is traversed).
The variable $_ holds the element being processed in the current iteration.

Once every element of the array has been visited in turn, the loop ends.

# Odd or even?
@a=(1, 12, 33, 25, 98, 34, 55, 76, 18, 10);

foreach (@a) {
    if ($_ % 2 == 0) {  # % gives the remainder
        print "$_ is even. \n";
    }
    else {
        print "$_ is odd. \n";
    }
}
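
Instead of relying on $_, foreach also accepts a named loop variable, which can be easier to read when the body grows. A small sketch that sums a list (the numbers and names are made up for illustration):

```perl
use strict;
use warnings;

my @lengths = (120, 85, 240);   # made-up sequence lengths
my $total   = 0;

foreach my $len (@lengths) {    # $len takes the place of $_
    $total += $len;
}

print "total = $total\n";       # total = 445
```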

8.4.3 File reading and writing

Reading a file:
open(FH, "read.txt"); opens the file read.txt in the current directory. FH is the filehandle that identifies this file for reading and writing; you can choose its name yourself.
@get = <FH>; stores the contents of the file in the array @get, split into elements at each newline "\n", i.e. one element per line.
close FH; closes the filehandle FH once reading is finished.
```
open(FH, "read.txt");
@get = <FH>;
close FH;
```
Writing a file:
open(FH, ">write.txt"); opens (or creates) the file write.txt in the current directory for writing. ">" means write: the original content of the file is overwritten.
print FH "Hello world!\n"; "print FH" writes the content of the double quotes to the file represented by the filehandle FH instead of to the screen.
close FH; closes the filehandle FH once writing is finished.
```
open(FH, ">write.txt");
print FH "Hello world!\n";
close FH;
```
Appending to a file:
open(FH, ">>write.txt"); opens (or creates) the file write.txt in the current directory for writing. ">>" means append: the original content of the file is not overwritten, and new content is added at the end.
print FH "Hello world!\n"; writes to the file in the same way as before.
close FH; closes the filehandle FH once writing is finished.
```
open(FH, ">>write.txt");
print FH "Hello world!\n";
close FH;
```
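
The snippets above do not check whether open succeeded. A more defensive variant, sketched below, uses the three-argument form of open with a lexical filehandle and stops with an error message if the file cannot be opened. The temporary directory from the core File::Temp module and the variable names are assumptions made so that the example can run anywhere without touching existing files.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);   # scratch directory, removed when the script exits
my $file = "$dir/write.txt";

# ">" creates the file, or overwrites it if it already exists.
open(my $out, ">", $file) or die "Cannot write $file: $!";
print $out "first line\n";
close($out);

# ">>" appends, so the first line is kept.
open($out, ">>", $file) or die "Cannot append to $file: $!";
print $out "second line\n";
close($out);

# "<" reads; in list context <$in> returns one element per line.
open(my $in, "<", $file) or die "Cannot read $file: $!";
my @lines = <$in>;
close($in);

print "read ", scalar(@lines), " lines\n";   # read 2 lines
```

$! holds the system's reason for the failure, such as a missing file or a permission problem.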
Screen input:
print "What's your name?\n"; prints a question on the screen.
$name = <STDIN>; STDIN means that the value of $name is assigned from screen (keyboard) input. The cursor waits there for the user to type, and the input ends when Enter is pressed.
chomp($name); the chomp function removes the newline (from pressing Enter) at the end of the input.
print "Hello $name!"; prints the value obtained from the input.
```
print "What's your name?\n";
$name = <STDIN>;
chomp($name);
print "Hello $name!";
```
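
What chomp does can be checked without typing anything by starting from a string that already ends in a newline, which is what <STDIN> would return. A minimal sketch (the name and the helper variables $before and $after are made up for illustration):

```perl
use strict;
use warnings;

my $name   = "Alice\n";        # what <STDIN> would return if the user typed Alice
my $before = length($name);    # five letters plus the newline
chomp($name);                  # removes the trailing newline in place
my $after  = length($name);    # the newline is gone

print "Hello $name! (length went from $before to $after)\n";
# Hello Alice! (length went from 6 to 5)
```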

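8.4.4 Fetching a web page

Basic statements:

# Load the LWP::Simple module in order to use its get function
use LWP::Simple;  
# Assign the URL of a FASTA sequence to the variable $url
$url='http://www.uniprot.org/uniprot/Q6GV17.fasta';  
# get fetches the entire content of the page at $url (the page source)
$content = get $url; 
# If the URL cannot be opened, die ends the program and prints a message on the screen.
die "Couldn't get $url" unless defined $content;
# Print the fetched page content.
print $content;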

8.4.5 Application example: batch downloading and saving sequences

use LWP::Simple;  
open (FH, ">seq.fasta");
@ids= ("Q6GV17" , "Q9BXR5" , "B3Y669" , "C0LSK8");
foreach (@ids) {
    $url= "http://www.uniprot.org/uniprot/$_.fasta";
    $content = get $url;
    die "Couldn't get $url" unless defined $content;
    print FH $content;
}
close FH;
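
Before running the real download, it can help to check which addresses the loop will request. The sketch below is a dry run that only builds and prints the URLs, so it needs neither LWP::Simple nor a network connection; it uses three of the IDs from the example, and the array name @urls is made up for illustration.

```perl
use strict;
use warnings;

my @ids = ("Q6GV17", "Q9BXR5", "B3Y669");
my @urls;

foreach (@ids) {
    # $_ is the current ID; it is interpolated into the double-quoted URL.
    push(@urls, "http://www.uniprot.org/uniprot/$_.fasta");
}

print join("\n", @urls), "\n";
```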