8.3 Getting Started with Perl Language Basics
- See the video for details: Introduction to Perl Language Basics-01 P137
● Perl (Practical Extraction and Report Language): a free download from the official website (the site may fail to open on a poor network connection); it supports multiple operating systems
- Interpreted language: scripts are run directly by the perl interpreter
- Sequence processing: especially well suited to string (sequence) processing
- File processing: convenient and concise file reading and writing
- Pattern matching: full support for regular expressions
8.3.1 The first Perl program
- Create a text file named test.pl
- On the first line, write the Perl execution environment (the installation path of the interpreter)
Linux: #!/usr/bin/perl
Windows: #!C:/Strawberry/perl/bin/perl.exe
- On the second line, write a simple statement, such as:
print "Hello world!";
- Save test.pl and open the cmd command window.
- Change to the directory where test.pl is located, for example:
cd Desktop
- Run the command
perl test.pl
to execute your first program.
8.3.2 Basic rules of Perl
- Each statement must end with a semicolon ";".
- A comment starts with "#" and is valid only to the end of that line.
- Defining variables is extremely simple and flexible:
◆ Variables do not need to be declared in advance; they can be defined as they are used;
◆ Variable types do not need to be declared; Perl decides how to treat the value (number or string) after assignment;
◆ Variable names are case-sensitive, start with a letter (or underscore), and cannot contain special characters;
◆ Put "$" in front of the variable name and use "=" to assign a value:
$scalar = expression
● Quotation marks:
- Single quotes '' treat everything inside them literally.
- Double quotes "" interpolate variables and interpret escape sequences (such as \n) inside them.
- Backticks `` execute the content inside them as a system command.
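The difference between single and double quotes can be sketched as follows. This is a minimal illustration, not part of the original course material; the variable names $name, $single and $double are chosen here for the example. Backticks are left out because the command they would run depends on the operating system.

```perl
use strict;
use warnings;

my $name = "Perl";

# Single quotes: the text is taken literally, so $name and \n are NOT expanded.
my $single = 'Hello $name\n';

# Double quotes: $name is interpolated and \n becomes a real newline.
my $double = "Hello $name\n";

print $single, "\n";   # prints: Hello $name\n
print $double;         # prints: Hello Perl
```

A practical rule of thumb: use single quotes for fixed text and double quotes whenever a variable or an escape such as \n or \t must be expanded.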
● Operators:
- Comparison operators: the comparison operators for strings are different from the comparison operators for numbers!
◆ In string comparison, digits are smaller than letters!
◆ Uppercase letters are smaller than lowercase letters!
◆ From smallest to largest: digits [0-9], uppercase letters [A-Z], lowercase letters [a-z]
- Logical operators:
You can write the words directly: or, and, not
or use the symbols instead: ||, &&, !
Special logical operator: xor (exclusive OR, XOR)
△ a xor b = (a' and b) or (a and b')
(a' = not a)
If the values of a and b differ, the XOR result is 1 (true). If the values of a and b are the same, the XOR result is 0 (false).
- String operators:
. concatenates strings
x repeats a string
.= concatenates and assigns the result back to the string
$string1 = "potato";
$string2 = "head";
$string3 = $string1 . $string2;   # potatohead
$string4 = $string1 x 2;          # potatopotato
$string1 .= $string2;             # $string1 = potatohead
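The notes above say that strings and numbers use different comparison operators but do not list them. In Perl, numbers are compared with ==, !=, <, >, <=, >= and strings with eq, ne, lt, gt, le, ge. The sketch below (variable names are made up for the illustration) shows how the same values compare differently, and checks the character ordering and the xor rule described above.

```perl
use strict;
use warnings;

# Same values, different operators, different answers.
my $num_cmp = (10 < 9)      ? "true" : "false";   # numeric: 10 is not less than 9
my $str_cmp = ("10" lt "9") ? "true" : "false";   # string: "1" sorts before "9"

# Character order in string comparison: digits < uppercase < lowercase.
my $digit_before_upper = ("9" lt "A") ? "true" : "false";
my $upper_before_lower = ("Z" lt "a") ? "true" : "false";

# xor is true only when exactly one of the two operands is true.
my $xor_diff = (1 xor 0) ? 1 : 0;
my $xor_same = (1 xor 1) ? 1 : 0;

print "10 < 9      : $num_cmp\n";             # false
print "'10' lt '9' : $str_cmp\n";             # true
print "'9' lt 'A'  : $digit_before_upper\n";  # true
print "'Z' lt 'a'  : $upper_before_lower\n";  # true
print "1 xor 0     : $xor_diff\n";            # 1
print "1 xor 1     : $xor_same\n";            # 0
```

Using a numeric operator on text (or a string operator on numbers) is a common source of silent bugs, so it helps to pick the operator family by the kind of data being compared.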
8.3.3 Common string functions
- Perl has many predefined functions, called as:
function_name(argument 1, argument 2, ... argument n)
Note: the commas between the arguments are required, and the parentheses surrounding the arguments are optional.
- length
Function: Length
Syntax: length($string);
Description: Returns the length (number of characters) of the string $string.
$string="Perl5";
$size=length($string);   # now $size=5
# Compute the length of an amino acid sequence
$seq="GHMGSSVLEELVQLVKDKNIDISIKYDPRKDSEVEANRVITDDIELLKKILAYFLPEDAILKGGHYDNQLQNGIKRVKEFLESSPNTQWELRAFMAVMHFSLTADRIDDDILKVIVDSMNHHGDARSKLREELAELTAE";
$len=length($seq);
print "The protein contains $len amino acids.";
- substr
Function: Substring
Syntax: substr($string, offset, length);
Description: Extracts a substring of the string $string.
★ Note: positions are counted from 0, not 1.
offset is the position of the starting character; if offset is negative, the starting position is counted from the right end of the string.
length is the length of the substring to extract. If length is omitted, the substring runs from the starting position to the last character of the string.
$s1=substr("perl5",2,2);    # $s1="rl": take 2 characters starting after the 2nd character
$s2=substr("perl5",1);      # $s2="erl5": take the rest of the string after the 1st character
$s3=substr("perl5",-5,3);   # $s3="per": take 3 characters starting from the 5th character from the end
# Extract the functional region of a protein sequence; the region runs from amino acid 12 to amino acid 23.
$seq="GHMGSSVLEELVQLVKDKNIDISIKYDPRKDSEVEANRVITDDIELLKKILAYFLPEDALKGGHYDNQLQNGIKRVKEFLESSPNTQWELRAFMAVMHFSLTADRIDDDILKVIVDSMNHHGDARSKLREELAELTAE";
$begin=11;                  # amino acid 12 is at offset 11
$end=22;                    # amino acid 23 is at offset 22
$length=$end-$begin+1;
$region=substr($seq, $begin, $length);
print "The functional region is $region.";
- index
Function: Find position
Syntax: index($string, $substring, position)
Description: Returns the position of $substring in the string $string (only the first position found), and returns -1 if it is not found.
$substring is the character or string to search for;
position is the position from which to start searching; if omitted, the search starts from the beginning.
$s1=index("pell5","p");      # $s1=0
$s2=index("pell5","l",2);    # $s2=2, only the first position found is returned
$s3=index("pell5","perl");   # $s3=-1
# Find the position of a functional region in a protein sequence.
$seq="GHMGSSVLEELVQLVKDKNIDISIKYDPRKDSEVEANRVITDDELLKKILAYFLPEDAILKGGHYDNQLQNGIKRVKEFLESSPNTQWELRAFMAVMHFSLTADRIDDDILKVIVDSMNHHGDARSKLREELAELTAE";
$region="KEFLESSPNT";
$begin=index($seq, $region)+1;     # start position; note the +1 after index (1-based numbering)
$end=$begin+length($region)-1;     # end position; note the -1
$begin_aa=substr($region,0,1);     # first amino acid
$end_aa=substr($region,-1,1);      # last amino acid
print "The functional region is from $begin_aa$begin to $end_aa$end.";
8.3.4 Common array functions
- An array is a set of elements enclosed in parentheses ().
- Elements can be any value or empty, separated by commas
- The number of elements can be increased or decreased at any time
- Arrays are denoted by the @ symbol, for example: @arr=("A","T","C","G")
- A single element of an array behaves like a scalar variable and is written as: $arr[index_number]
print $arr[0];     #A
print @arr[1,3];   #TG (several elements at once form a slice, written with @)
print @arr;        #ATCG
- scalar
Function: Number of elements
Syntax: scalar(@array)
Description: Returns the number of elements in the array @array.
@arr=("A","B","C");    # define an array
$num=scalar(@arr);     # now $num=3
# Before batch-processing a group of UniProt IDs, count how many IDs there are.
@ids=("Q6GV17", "Q9BXR5", "B3Y669", "COLSK8");
$count=scalar(@ids);
print "$count ids : @ids";
- reverse
Function: Reverse order
Syntax: reverse(@array)
Description: Returns the elements of the array @array in reverse order, from last to first.
- sort
Function: Ascending sort
Syntax: sort [{$a<=>$b}] (@array)   (the part in square brackets is optional)
Description: Sorts the elements of the array @array in ascending order of their ASCII codes. With {$a<=>$b} added, the elements are sorted by numerical value.
@arr1=(21,1,2,12);
@arr2=reverse(@arr1);           # @arr2=(12,2,1,21): elements in reverse order
@arr3=sort(@arr1);              # @arr3=(1,12,2,21): sorted as strings
@arr4=sort{ $a<=>$b}(@arr1);    # @arr4=(1,2,12,21): sorted as numbers
# Arrange a set of computed values in ascending/descending order.
@value=(10,34,24,16,4,7);
@value1=sort{ $a<=>$b}(@value);   # ascending
@value2=reverse(@value1);         # descending
print "ascending order : @value1\n";   # \n is a newline
print "descending order : @value2";
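As a complement to the sort and reverse examples above, the following sketch compares the three orderings on the same data. Swapping $a and $b in the comparison block is a standard Perl idiom for a descending sort and is offered here as an alternative to sort followed by reverse; the array names are chosen for this illustration.

```perl
use strict;
use warnings;

my @values = (10, 34, 24, 16, 4, 7);

my @as_strings = sort @values;                 # default: string (ASCII) order
my @ascending  = sort { $a <=> $b } @values;   # numeric, smallest first
my @descending = sort { $b <=> $a } @values;   # numeric, largest first (operands swapped)

print "string order : @as_strings\n";   # 10 16 24 34 4 7
print "ascending    : @ascending\n";    # 4 7 10 16 24 34
print "descending   : @descending\n";   # 34 24 16 10 7 4
```

The string ordering places 4 and 7 after 34 because only the first character is compared first, which is why numeric data should be sorted with <=>.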
- pop
Function: Delete the last element
Syntax: pop(@array)
Description: Deletes the last element of the array @array and returns the deleted element.
@arr=("A","B","C");
$rm=pop(@arr);   # @arr=("A","B"), $rm="C"
- push
Function: Add at the end
Syntax: push(@array, $newelement/@newarray)
Description: Adds a new element or a new array at the end of the array @array.
@arr=("A","B","C");
push(@arr,"D");   # @arr=("A","B","C","D")
# Sort from largest to smallest, remove the smallest value, and append the number of elements at the end.
@value=(10,34,24,16,4,7);
@value1=sort{ $a<=>$b}(@value);
@value=reverse(@value1);
pop(@value);
push(@value, scalar(@value));
print "new value : @value";
- shift
Function: Delete the first element
Syntax: shift(@array)
Description: Similar to pop; deletes the first element of the array @array and returns the deleted element.
@arr=("A","B","C");
$rm=shift(@arr);   # @arr=("B","C"), $rm="A"
- unshift
Function: Add at the front
Syntax: unshift(@array, $newelement/@newarray)
Description: Similar to push; adds a new element or a new array at the front of the array @array.
@arr=("A","B");
unshift(@arr,"X");   # @arr=("X","A","B")
# Sort from largest to smallest, remove the largest value, and add the number of elements at the front.
@value=(10,34,24,16,4,7);
@value1=sort{ $a<=>$b}(@value);
@value=reverse(@value1);
shift(@value);
unshift(@value, scalar(@value));
print "new value : @value";
- join
Function: Concatenate
Syntax: join($string, @array)
Description: Joins the elements of the array @array into one string, using the specified separator $string, and returns that string as the result.
@arr=("A","B","C");
$get=join(":", @arr);   # $get="A:B:C"
- split
Function: Split
Syntax: split(/pattern/, $string)
Description: Splits the string $string at each occurrence of pattern (the delimiter) and puts the resulting pieces into an array.
$seq="15:32:54";
@arr=split(/:/, $seq);   # @arr=("15","32","54")
# Get the pure sequence part of a FASTA-format sequence.
$fasta=">sp|P33316|DUT_HUMAN
MTPLCPRPALCYHFLTSLLRSAMQNARGARQRA
EAAVLSGPGPPLGRAAQHGIPRPLSSAGRLSQG
CRGASTVGAAGWKGELPKAGGSPAPGPETP";
@line=split(/\n/, $fasta);   # split at newlines into one string per line and store them in the array @line
shift(@line);                # remove the first string, i.e. the header line that starts with ">"
$seq=join("", @line);        # join the remaining strings, i.e. the pure sequence
print $seq;
8.4 Advanced Perl Language Basics
8.4.1 if conditional statement
● Syntax one: if ... else
- The "condition" of the if statement is placed in parentheses (), and the condition is a logical expression.
If the logical expression evaluates to true: execute the statements in the curly braces {} following if.
If the logical expression evaluates to false: execute the statements in the curly braces {} following else.
Note: if and else do not have to appear in pairs; there can be an if without an else, but not an else without an if!
$a=10;
$b=15;
if ($a > $b) {
    print "a is bigger!";
} else {
    print "b is bigger!";
}
● Syntax two: if ... elsif ... else
- If logical expression 1 evaluates to true: execute statement1.
- If logical expression 1 evaluates to false: go on to evaluate logical expression 2 following elsif.
If logical expression 2 evaluates to true: execute statement2.
If logical expression 2 evaluates to false: execute statement3 (or go on to the next elsif, if there is one).
if ($a == 5) {
    print '$a is 5';
} elsif ($a == 10) {
    print '$a is 10';
} elsif ($a == 15) {
    print '$a is 15';
} else {
    print '$a is not (5 or 10 or 15)';
}
8.4.2 for loop statement
● Syntax one: for
for (init; condition; increment) {
    statement1;
    ……
}
- init is executed first, and only once. This step initializes the loop control variable, for example: $i=1;
- Next, the condition is evaluated. If it is true, the loop body is executed. If it is false, the loop body is skipped and the loop ends. For example: $i<10;
- After the loop body has run, control jumps back to the increment statement, which updates the loop control variable, for example: $i++
- The condition is evaluated again, and the process repeats until the condition becomes false and the loop terminates.
# 9 x 9 multiplication table
for ($i=1; $i<10; $i++) {
    for ($j=1; $j<10; $j++) {
        $x=$i*$j;
        print "$i x $j = $x\t";
    }
    print "\n";
}
● Syntax two: foreach
foreach (@array) {
    statement1;
    ……
}
- For each element in the array @array, the loop body is executed (traversing the array).
- The variable $_ holds the element being processed in the current iteration.
- After every element in the array has been visited in turn, the loop ends.
# Odd or even?
@a=(1, 12, 33, 25, 98, 34, 55, 76, 18, 10);
foreach (@a) {
    if ($_ % 2 == 0) {   # % gives the remainder
        print "$_ is even. \n";
    } else {
        print "$_ is odd. \n";
    }
}
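The foreach loop above relies on the default variable $_. Perl also allows the loop variable to be named explicitly, which can be easier to read in longer loops. The sketch below is a variation on the odd/even example under that assumption; the names $n, @even and @odd are introduced here for illustration, and push (from section 8.3.4) is used to collect the results instead of printing each one.

```perl
use strict;
use warnings;

my @numbers = (1, 12, 33, 25, 98, 34, 55, 76, 18, 10);
my (@even, @odd);

# $n takes the place of $_ : it holds the current element in each iteration.
foreach my $n (@numbers) {
    if ($n % 2 == 0) {       # remainder 0 means the number is even
        push(@even, $n);
    } else {
        push(@odd, $n);
    }
}

print "even: @even\n";   # even: 12 98 34 76 18 10
print "odd : @odd\n";    # odd : 1 33 25 55
```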
8.4.3 File reading and writing
- Reading a file:
open(FH, "read.txt");
Opens the file read.txt in the current directory. FH is the filehandle that identifies this file for reading and writing; you can choose the name yourself.
@get = <FH>;
Stores the file content in the array @get, splitting the elements at the newline "\n", that is, one element per line.
close FH;
After reading, closes the filehandle FH.
Complete code:
open(FH, "read.txt");
@get = <FH>;
close FH;
- Writing a file:
open(FH, ">write.txt");
Opens or creates the file write.txt in the current directory for writing. ">" means write, and the original content of the file will be overwritten.
print FH "Hello world!\n";
"print FH" writes the content of the double quotes to the file represented by the filehandle FH instead of to the screen.
close FH;
After writing, closes the filehandle FH.
Complete code:
open(FH, ">write.txt");
print FH "Hello world!\n";
close FH;
- Appending to a file:
open(FH, ">>write.txt");
Opens or creates the file write.txt in the current directory for writing. ">>" means append, and the original content of the file will not be overwritten.
print FH "Hello world!\n";
Writes to the file in the same way as before.
close FH;
After writing, closes the filehandle FH.
Complete code:
open(FH, ">>write.txt");
print FH "Hello world!\n";
close FH;
- Screen input:
print "What's your name?\n";
Prints a sentence on the screen.
$name = <STDIN>;
<STDIN> means that the value of $name is assigned from keyboard input. The cursor waits there for the user to type, and the user presses Enter when finished.
chomp($name);
The chomp function removes the trailing newline (from pressing Enter) at the end of the input.
print "Hello $name!";
Prints the value obtained from the input.
Complete code:
print "What's your name?\n";
$name = <STDIN>;
chomp($name);
print "Hello $name!";
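The file reading, writing and appending steps above use a bareword filehandle (FH) and do not check whether open succeeded. A commonly recommended variant, sketched below, uses the three-argument form of open with a lexical filehandle and stops with an error message if the file cannot be opened. This is an alternative style, not the form used in the course material; the scratch file comes from the core module File::Temp so that the example does not overwrite any real file, and the variable names are chosen for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file; it is deleted automatically when the program exits.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# ">" overwrites: afterwards the file holds only this line.
open(my $out, '>', $path) or die "Cannot write $path: $!";
print $out "Hello world!\n";
close $out;

# ">>" appends: the existing line is kept.
open(my $app, '>>', $path) or die "Cannot append to $path: $!";
print $app "Hello again!\n";
close $app;

# "<" reads: in list context <$in> returns one element per line.
open(my $in, '<', $path) or die "Cannot read $path: $!";
my @lines = <$in>;
close $in;

chomp(@lines);   # remove the trailing newline from every element
print scalar(@lines), " lines: ", join(" | ", @lines), "\n";
# prints: 2 lines: Hello world! | Hello again!
```

Checking the result of open matters in practice: without "or die", a missing or unreadable file silently yields an empty array.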
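8.4.4 Fetching a web page
- Basic statements:
# Load the LWP::Simple module in order to use the get function
use LWP::Simple;
# Assign the URL of a FASTA sequence to the variable $url
$url='http://www.uniprot.org/uniprot/Q6GV17.fasta';
# The get function fetches the entire content of the page at $url (the page source)
$content = get $url;
# If the URL cannot be opened, the die function ends the program and prints a message on the screen.
die "Couldn't get $url" unless defined $content;
# Print the fetched page content.
print $content;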
8.4.5 Application example: Batch download and save sequences
- Batch download and save sequences
use LWP::Simple;
open (FH, ">seq.fasta");
@ids= ("Q6GV17" , "Q9BXR5" , "B3Y669" , "COLSK8");
foreach (@ids) {
    $url= "http://www.uniprot.org/uniprot/$_.fasta";
    $content = get $url;
    die "Couldn't get $url" unless defined $content;
    print FH $content;
}
close FH;