Shell programming: regular expressions, extended regular expressions, and text processors

Regular expression concepts

  • A regular expression (often shortened to regex or regexp) uses a single string to describe and match a whole set of strings that follow a given syntactic rule. It is a method of matching strings: with a few special symbols, it can quickly find, delete, or replace specific strings.
  • Regular expressions are generally used in scripts and text editors. Many text processors and programming languages support them, such as the text processors grep, egrep, sed, and awk on Linux and the widely used Python language. Their matching ability makes it possible to process large amounts of text quickly and efficiently.
  • Regular expression syntax is divided into basic regular expressions and extended regular expressions, which differ in strictness and features. Basic regular expressions are the most fundamental part of the commonly used syntax. Among the common file processing tools on Linux, grep and sed support basic regular expressions, while egrep and awk support extended regular expressions.

One, grep

  • Common options:
-n show line numbers
-i ignore case
-v invert the match (show lines that do not match)

1.1 Find specific characters

grep -n 'the' test.txt
## to invert the selection and find lines that do not contain "the"
grep -vn 'the' test.txt
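The options above can be tried on a small file. This is only a sketch: the three sample lines are made up for illustration, and the -c flag (count matching lines) is not in the option list above.

```shell
# Hypothetical sample data standing in for test.txt.
sample=$(mktemp)
printf '%s\n' 'the quick fox' 'The lazy dog' 'a short note' > "$sample"

with_the=$(grep -n 'the' "$sample")       # numbered lines containing "the"
without_the=$(grep -vn 'the' "$sample")   # numbered lines that do not contain "the"
any_case=$(grep -ci 'the' "$sample")      # -c counts matching lines; -i ignores case

printf '%s\n' "$with_the" '---' "$without_the" '---' "$any_case"
rm -f "$sample"
```

Note that "The lazy dog" only matches once -i is added, because grep is case-sensitive by default.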

1.2 Find a character set

  • Use [] to match a set of characters
grep -n 'sh[io]rt' test.txt    ## no matter how many characters [] contains, it matches exactly one of them; this finds shirt or short
  • Find two consecutive "o"s
grep -n 'oo' test.txt  ## lines containing "oo"
  • Find "oo" that is not preceded by w
grep -n '[^w]oo' test.txt  ## '[^w]oo' matches "oo" preceded by any character other than w; note that '^[w]' would instead mean "begins with w"
  • Filter lines beginning with a letter
grep -n '^[a-zA-Z]' test.txt ## lines that begin with a letter
  • Filter lines containing a digit
grep -n '[0-9]'  test.txt  ## lines that contain a digit
  • Find blank lines
[root@localhost ~]# grep -n '^$' test.txt
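A quick way to check the bracket expressions above is to run them against a few known lines. The sample content here is hypothetical, chosen so that each pattern matches a different line.

```shell
# Hypothetical sample data; line 5 is intentionally empty.
sample=$(mktemp)
printf '%s\n' 'a short shirt' 'wood' 'food' '9 lives' '' 'shart' > "$sample"

set_count=$(grep -c 'sh[io]rt' "$sample")      # "shart" is not in the set, so only line 1 counts
not_w=$(grep -n '[^w]oo' "$sample")            # "wood" is rejected, "food" is accepted
letter_start=$(grep -c '^[a-zA-Z]' "$sample")  # lines whose first character is a letter
has_digit=$(grep -n '[0-9]' "$sample")         # lines containing a digit anywhere
blank=$(grep -n '^$' "$sample")                # empty lines

printf '%s\n' "$set_count" "$not_w" "$letter_start" "$has_digit" "$blank"
rm -f "$sample"
```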

1.3 Find the "^" at the beginning of the line and the "$" at the end of the line

  • Find lines beginning with "the"
[root@localhost ~]# grep -n '^the' test.txt
  • Find lines ending with a period (the dot must be escaped with "\", otherwise it matches any character)
[root@localhost ~]# grep -n '\.$' test.txt

1.4 Find any character "." and repeated character "*"

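  • Find lines where w is followed by any characters
grep -n 'w.*' test.txt   ## w followed by zero or more of any character
  • Find lines with "oo" followed by zero or more "o"
grep -n 'ooo*' test.txt   ## * applies only to the single character before it (the third o), which may appear zero or more times; the first two o are literal, so this matches two or more o
  • Filter lines containing "*"
grep -n '*' test.txt     ## with no preceding character to repeat, * is treated as an ordinary character and matched literally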
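The behaviour of "." and "*" described above can be verified with a short experiment. The sample lines are invented for this sketch.

```shell
# Hypothetical sample data.
sample=$(mktemp)
printf '%s\n' 'wd' 'wood' 'woood' 'a*b' 'hello' > "$sample"

after_w=$(grep -c 'w.*' "$sample")      # any line containing w (".*" may match nothing)
two_plus=$(grep -c 'ooo*' "$sample")    # two literal o plus zero or more extra o
zero_plus=$(grep -c 'wo*d' "$sample")   # "wd" matches too, because o may appear zero times
literal=$(grep -n '*' "$sample")        # a leading * in a basic regex is an ordinary character

printf '%s\n' "$after_w" "$two_plus" "$zero_plus" "$literal"
rm -f "$sample"
```

The single o in "hello" is not enough for 'ooo*', which is why that pattern counts only two lines.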

1.5 Braces {} limit the number of repetitions

  • In basic regular expressions the braces must be escaped with "\"
  • Filter lines where two consecutive "o"s appear
[root@promote ~]# grep -n 'o\{2\}' test.txt
  • Match w followed by 2 to 5 o characters and then d
grep -n 'wo\{2,5\}d' test.txt   ## matching is greedy: it first tries the maximum of 5 o, then checks what follows
  • Match w followed by 2 or more o characters and then d
grep -n 'wo\{2,\}d' test.txt
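To see where the repetition limits cut off, the interval patterns can be run against words with a known number of o characters. The words below are hypothetical test data containing 1, 2, 5, and 6 o respectively.

```shell
# Hypothetical sample data: 1, 2, 5 and 6 o characters between w and d.
sample=$(mktemp)
printf '%s\n' 'wod' 'wood' 'woooood' 'wooooood' > "$sample"

two=$(grep -c 'o\{2\}' "$sample")           # unanchored, so longer runs of o also match
range=$(grep -n 'wo\{2,5\}d' "$sample")     # 6 o is too many: d must follow within 5
at_least=$(grep -c 'wo\{2,\}d' "$sample")   # no upper bound

printf '%s\n' "$two" "$range" "$at_least"
rm -f "$sample"
```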

1.6 Summary of metacharacters

Character Usage
^ Matches the beginning of the line. As the first character inside a bracket expression it negates the character set instead. To match the "^" character itself, use "\^"
$ Matches the end of the line
. Matches any single character
\ Escapes the metacharacter that follows, turning it into an ordinary character
* Matches the preceding character zero or more times
[] Matches any one of the characters listed inside the brackets
[^] Negated character set. Matches any one character that is not listed. For example, "[^bc]" can match any letter in "plain"
[n1-n2] Character range. Matches any character in the specified range. For example, "[a-z]" can match any lowercase letter from "a" to "z". Note: the hyphen (-) indicates a range only when it is inside the bracket expression and between two characters; at the beginning of the bracket expression it stands for the hyphen itself
{n} n is a non-negative integer; matches exactly n times. For example, "o{2}" cannot match the "o" in "Bob", but it can match the "oo" in "food". In basic regular expressions (grep, sed) the braces are written "\{n\}"
{n,} n is a non-negative integer; matches at least n times. For example, "o{2,}" cannot match the "o" in "Bob", but it can match all the o in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*"
{n,m} Both m and n are non-negative integers with n<=m; matches at least n times and at most m times

Two, extended regular expression-egrep

  • "?" and "+" are only used in extended expressions

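2.1 Filter on multiple conditions at once

  • | means "or"; it is not used between conditions that must both hold ("and"), which can be combined in other ways, for example by piping one grep into another
egrep -v '^$|^#' httpd.conf     ## show only lines that are neither blank nor start with #; | means "or"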

2.2 Match the preceding element one or more times

egrep -n 'wo+d' httpd.conf    ## "+" means the preceding element appears one or more times: w, then at least one o, then d
egrep -n 'A(xyz)+C' test.txt          ## ()+ repeats a group: A, then xyz one or more times, then C

2.3 Match the preceding element zero or one time

egrep -n 'wo?d' httpd.conf  ## "?" means the preceding element appears zero or one time: w, then an optional o, then d

2.4 Filter multiple strings

egrep -n 'if|is|on' test.txt    ## | finds lines containing if, is, or on

2.5 metacharacters

Metacharacter Function
+ Function: repeats the preceding character one or more times
? Function: matches the preceding character zero or one time
| Function: alternation ("or"), finds any one of several strings
()+ Function: repeats a group one or more times. Example: "egrep -n 'A(xyz)+C' test.txt" finds strings that begin with "A", end with "C", and have one or more "xyz" in between.
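The extended metacharacters can be checked side by side on one small file. This sketch uses grep -E, which accepts the same extended syntax as egrep; the sample lines are hypothetical.

```shell
# Hypothetical sample data; the last line is intentionally empty.
sample=$(mktemp)
printf '%s\n' 'wd' 'wod' 'wood' 'AxyzC' 'AxyzxyzC' 'AC' '# comment' '' > "$sample"

one_or_more=$(grep -Ec 'wo+d' "$sample")     # wod, wood (not wd)
zero_or_one=$(grep -Ec 'wo?d' "$sample")     # wd, wod (not wood)
group=$(grep -Ec 'A(xyz)+C' "$sample")       # AxyzC, AxyzxyzC (not AC)
either=$(grep -Ec 'wd|AC' "$sample")         # alternation: either string
kept=$(grep -Evc '^$|^#' "$sample")          # lines that are neither blank nor comments

printf '%s\n' "$one_or_more" "$zero_or_one" "$group" "$either" "$kept"
rm -f "$sample"
```

Note that the whole alternation sits inside one pair of quotes. Written as '^$'|'^#', the shell would treat | as a pipe between two commands.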

Three, text processor

3.1 sed tool

  • sed (Stream EDitor) is a powerful yet simple text parsing and transformation tool. It reads text, edits it (delete, replace, add, move, and so on) according to specified conditions, and finally outputs either all lines or only the processed lines. sed can carry out quite complex text processing without interaction and is widely used in shell scripts for automated tasks.
  • The workflow consists of three steps: read, execute, and display.
    Read: sed reads one line from the input stream (file, pipe, or standard input) and stores it in a temporary buffer called the pattern space.
    Execute: by default, the sed commands are executed in order on the pattern space. Unless a line address is specified, the commands are applied to every line in turn.
    Display: the modified content is sent to the output stream, and the pattern space is then cleared.
    These steps repeat until all of the input has been processed.
    Because the work is done in the pattern space, the input file itself does not change unless the output is redirected to a file (or the -i option is used).
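The read, execute, display cycle can be observed directly. In this sketch (the three-line sample file is made up), every line is printed automatically after the commands run, -n turns that automatic printing off, and the input file stays untouched.

```shell
# Hypothetical sample data.
sample=$(mktemp)
printf '%s\n' 'one' 'two' 'three' > "$sample"

default_out=$(sed 's/two/TWO/' "$sample")   # each line is printed after the edit
quiet_out=$(sed -n '2p' "$sample")          # -n suppresses auto-print; p prints line 2 only
double_out=$(sed '2p' "$sample")            # without -n, line 2 appears twice
after=$(cat "$sample")                      # the file on disk is unchanged

printf '%s\n' "$default_out" '---' "$quiet_out" '---' "$double_out" '---' "$after"
rm -f "$sample"
```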
  • sed basic format
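sed [options] 'operation' arguments
sed [options] -f scriptfile arguments      ## scriptfile is a file of sed commands
  • Common options
-e or --expression=: process the input with the specified command or script
-f or --file=: process the input with the specified script file
-h or --help: show help
-n, --quiet or --silent: suppress automatic printing, so that only explicitly processed results are shown
-i: edit the text file in place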

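  • "Operation" specifies the action to perform on the file, that is, the sed command. It usually takes the form "[n1[,n2]] action arguments", where n1 and n2 are optional line numbers that select the lines to operate on. For example, to operate on lines 5 through 20, write "5,20 action".
a: append, adds a line with the specified content below the current line.
c: change, replaces the selected lines with the specified content.
d: delete, removes the selected lines.
i: insert, adds a line with the specified content above the selected line.
p: print. With an address it prints the selected lines; without one it prints everything. It is usually used together with the "-n" option. (Showing non-printing characters in escaped form is done by the separate l command.)
s: substitute, replaces the specified string.
y: character translation.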
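A minimal sketch of how addresses combine with a few of these operations, using only d, p, s, and y and a made-up four-line file. The a, c, and i commands are left out here because their one-line form differs between sed implementations.

```shell
# Hypothetical sample data.
sample=$(mktemp)
printf '%s\n' 'alpha' 'beta' 'gamma' 'delta' > "$sample"

deleted=$(sed '2,3d' "$sample")          # line range n1,n2 with d
printed=$(sed -n '2,3p' "$sample")       # same range with p, auto-print off
by_pattern=$(sed -n '/^g/p' "$sample")   # a regex can be used as the address
first_only=$(sed '1s/a/A/g' "$sample")   # substitution limited to line 1
translated=$(sed 'y/abc/ABC/' "$sample") # y maps characters one to one on every line

printf '%s\n' "$deleted" '---' "$printed" '---' "$by_pattern" '---' "$first_only" '---' "$translated"
rm -f "$sample"
```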

3.2 Migration


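H: copy to the clipboard (appends the current line to the hold space);
g, G: overwrite / append the selected line with the data in the clipboard;
w: save to a file;
r: read the specified file;
a: append the specified content.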
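  • Examples
sed '/the/{H;d};$G' test.txt   ## find lines containing "the"; H: copy to the clipboard; d: delete the original line; $G: append the clipboard after the last line
sed '1,5{H;d};17G' test.txt  ## move lines 1 to 5 to after line 17
sed '/the/w abc.txt' test.txt  ## save the lines of test.txt that contain "the" to abc.txt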
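sed '/the/{H;d};$G' test.txt	## move the lines containing "the" to the end of the file; {;} groups multiple operations
sed '1,5{H;d};17G' test.txt	## move lines 1~5 to after line 17
sed '/the/w out.file' test.txt	## save the lines containing "the" to the file out.file
sed '/the/r /etc/hostname' test.txt	## add the contents of /etc/hostname after every line containing "the"
sed '3aNew' test.txt	## insert a new line with the content New after line 3
sed '/the/aNew' test.txt	## insert a new line with the content New after every line containing "the"
sed '3aNew1\nNew2' test.txt	## insert multiple lines after line 3; the \n in the middle is a line break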
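The migration with H, d, and G can be traced on a small file. This is a sketch with invented sample lines; it writes the group as {H;d;} with a semicolon before the closing brace, which GNU sed does not require but some other implementations do. Because H appends a newline plus the line to an initially empty clipboard, the moved block starts with an empty line.

```shell
# Hypothetical sample data and output file.
sample=$(mktemp)
saved_file=$(mktemp)
printf '%s\n' 'the first' 'plain' 'then' 'last' > "$sample"

# Lines matching "the" are held and deleted, then appended after the last line.
moved=$(sed '/the/{H;d;};$G' "$sample")

# w writes the matching lines to another file; -n keeps stdout empty.
sed -n "/the/w $saved_file" "$sample"
saved=$(cat "$saved_file")

printf '%s\n' "$moved" '---' "$saved"
rm -f "$sample" "$saved_file"
```

One caveat: if the last line itself matches, d ends the cycle before $G runs, so nothing is appended.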

3.3 View the delete effect

  • The nl command numbers the lines of a file; piping its output into sed makes the effect of each command easier to see.
nl   test.txt  ## text content with line numbers
nl  test.txt |sed '3d'      ## delete line 3
nl  test.txt |sed '3,5d'  ## delete lines 3 to 5
nl  test.txt |sed '/the/d'    ## delete lines containing "the"

3.4 Replace

  • Replacement operations in sed use the s (string substitution), c (whole line/block replacement), and y (character translation) commands.
sed 's/the/THE/' test.txt    ## replace the first "the" on each line with THE
sed 's/the/THE/2' test.txt  ## replace the second "the" on each line with THE
sed 's/the/THE/g' test.txt    ## replace every "the" with THE
sed 's/o//g' test.txt  ## delete every o in the text
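The difference between the plain, numbered, and g forms of s is easiest to see on a single line. The input sentence is made up for this sketch; note that "other" also contains "the", because s matches substrings rather than whole words.

```shell
# Hypothetical input line with four occurrences of "the" (one inside "other").
line='the cat and the other the'

first=$(printf '%s\n' "$line" | sed 's/the/THE/')     # first occurrence only
second=$(printf '%s\n' "$line" | sed 's/the/THE/2')   # second occurrence only
all=$(printf '%s\n' "$line" | sed 's/the/THE/g')      # every occurrence
no_o=$(printf '%s\n' "$line" | sed 's/o//g')          # empty replacement deletes

printf '%s\n' "$first" "$second" "$all" "$no_o"
```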

Four, awk tools

  • In Linux/UNIX systems, awk is a powerful editing tool. It reads the input text line by line, searches it according to the specified pattern, and formats or filters the content that meets the conditions. It can carry out quite complex text operations without interaction.
  • The result of awk can be displayed with the print function. In awk commands, the logical operators "&&", "||", and "!" mean "and", "or", and "not"; simple arithmetic is also available, where +, -, *, /, %, and ^ stand for addition, subtraction, multiplication, division, remainder, and power respectively.

4.1 Basic format

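awk options 'pattern or condition {action}' file1 file2 … ## filter and output the content in the files that meets the condition
awk   -f   scriptfile file1 file2 …	## read the actions from a script file, then filter and output the content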

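4.2 awk built-in variables

FS: the field separator for each line of text; the default is a space or tab.
NF: the number of fields in the line being processed.
NR: the line number (ordinal) of the line being processed.
$0: the entire content of the line being processed.
$n: the n-th field (n-th column) of the line being processed.
FILENAME: the name of the file being processed.
RS: the record separator; the default is \n, so each line is one record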
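A short sketch of these variables in action. The colon-separated records are hypothetical, loosely shaped like /etc/passwd entries but not taken from any real system.

```shell
# Hypothetical colon-separated sample data.
sample=$(mktemp)
printf '%s\n' 'root:x:0' 'daemon:x:1' 'bin:x:2' > "$sample"

fields=$(awk -F: '{ print NR, NF, $1 }' "$sample")    # line number, field count, first field
second=$(awk -F: 'NR == 2 { print $0 }' "$sample")    # $0 is the whole line
total=$(awk 'END { print NR }' "$sample")             # NR after the last line is the line count
records=$(printf 'a,b,c' | awk 'BEGIN { RS = "," } END { print NR }')  # RS changes what a record is

printf '%s\n' "$fields" "$second" "$total" "$records"
rm -f "$sample"
```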

4.3 awk usage example

  • Output text by line
awk '{print}' test.txt	## print everything, same as cat test.txt
awk '{print $0}' test.txt	## print everything, same as cat test.txt
awk 'NR==1,NR==3{print}' test.txt	## print lines 1~3
awk '(NR>=1)&&(NR<=3){print}' test.txt	## print lines 1~3
awk 'NR==1||NR==3{print}' test.txt	## print line 1 and line 3
awk '(NR%2)==1{print}' test.txt	## print all odd-numbered lines
awk '(NR%2)==0{print}' test.txt	## print all even-numbered lines
  • Output text by field
awk '{print $3}' test.txt	## print the 3rd field of each line (fields separated by spaces or tabs)
awk '{print $1,$3}' test.txt	## print the 1st and 3rd fields of each line
awk -F ":" '$2==""{print}' /etc/shadow ## print the shadow records of users with an empty password
awk 'BEGIN {FS=":"}; $2==""{print}' /etc/shadow ## print the shadow records of users with an empty password
awk -F ":" '$7~"/bash"{print $1}' /etc/passwd ## split on colons and print the 1st field of lines whose 7th field contains /bash; ~ means "matches" (contains)
  • Invoke shell commands through pipes and double quotes

awk -F: '/bash$/{print | "wc -l"}' /etc/passwd ## call wc -l to count the users whose shell is bash; equivalent to grep -c "bash$" /etc/passwd
awk 'BEGIN {while ("w" | getline) n++ ; {print n-2}}' ## call the w command and count the logged-in users (the two header lines are subtracted)
awk 'BEGIN { "hostname" | getline ; print $0}' ## call hostname and print the current host name
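The same two pipe directions can be tried with commands whose output is predictable. In this sketch, echo and sort stand in for w, hostname, and wc -l so the result does not depend on the machine; the variable names cmd, line, and word are arbitrary.

```shell
# Read from a command: each getline call fetches one line of its output.
count=$(awk 'BEGIN {
    cmd = "echo one; echo two; echo three"
    while ((cmd | getline line) > 0) n++   # loop ends when getline returns 0 (end of output)
    close(cmd)
    print n
}')

# Read a single line from a command into a variable.
greeting=$(awk 'BEGIN { "echo hello" | getline word; print word }')

# Write to a command: print sends its output into the pipe.
sorted=$(printf '%s\n' 'b' 'a' | awk '{ print | "sort" }')

printf '%s\n' "$count" "$greeting" "$sorted"
```

Comparing the getline result with "> 0" avoids an endless loop if the command cannot be run, since getline returns -1 on error.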

Five, sort tool

  • sort is a tool that sorts the contents of a file line by line, and it can sort according to different data types.

5.1 Format and common options

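Common sort syntax
sort [options] arguments

- Options
-f: ignore case;
-b: ignore leading blanks on each line;
-M: sort by month;
-n: sort numerically;
-r: reverse the sort order;
-u: equivalent to uniq, identical lines are shown only once;
-t: specify the field separator; by default fields are separated by blanks (spaces or tabs);
-o <output file>: save the sorted result to the specified file;
-k: specify the field to sort on.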

5.2 Example

[root@localhost ~]# sort /etc/passwd   ## sort the accounts in /etc/passwd.
[root@localhost ~]# sort -t ':' -rk 3 /etc/passwd ## sort /etc/passwd in reverse order on the third column (compared as text; add -n to compare as numbers).
[root@localhost ~]# sort -t ':' -k 3 /etc/passwd -o user.txt  ## sort /etc/passwd on the third column and save the output to user.txt.
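Sorting on a numeric column shows why -n matters. This sketch uses three invented records and sets LC_ALL=C for the text comparison so the byte-wise order is predictable.

```shell
# Hypothetical colon-separated sample data and output file.
sample=$(mktemp)
out=$(mktemp)
printf '%s\n' 'carol:x:10' 'alice:x:9' 'bob:x:100' > "$sample"

as_text=$(LC_ALL=C sort -t ':' -k 3 "$sample")   # text order: "10" < "100" < "9"
as_number=$(sort -t ':' -n -k 3 "$sample")       # numeric order: 9 < 10 < 100
reversed=$(sort -t ':' -rn -k 3 "$sample")       # numeric, largest first

sort -t ':' -n -k 3 -o "$out" "$sample"          # -o writes the result to a file
saved=$(cat "$out")

printf '%s\n' "$as_text" '---' "$as_number" '---' "$reversed"
rm -f "$sample" "$out"
```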

Six, uniq tools

  • The uniq tool is usually used together with the sort command on Linux to report or filter out duplicate lines in a file.

6.1 Format and common options

- Syntax
uniq [options] arguments

- Common options
-c: count occurrences;
-d: show only duplicated lines;
-u: show only lines that appear once

6.2 Example

[root@localhost ~]# uniq -c testfile  ## collapse adjacent duplicate lines in testfile and show at the start of each line how many times it occurred.
[root@localhost ~]# uniq -d testfile  ## find the duplicated lines in testfile.
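uniq only compares neighbouring lines, which is why it is normally fed from sort. A minimal sketch with made-up data where the duplicates are not adjacent; the awk step only normalizes the padding that uniq -c puts in front of the counts.

```shell
# Hypothetical sample data with non-adjacent duplicates.
sample=$(mktemp)
printf '%s\n' 'b' 'a' 'b' 'a' 'c' > "$sample"

unsorted=$(uniq "$sample" | grep -c '')                           # nothing collapses: still 5 lines
counts=$(sort "$sample" | uniq -c | awk '{ print $1, $2 }')       # count per distinct line
dups=$(sort "$sample" | uniq -d)                                  # lines that occur more than once
singles=$(sort "$sample" | uniq -u)                               # lines that occur exactly once

printf '%s\n' "$unsorted" '---' "$counts" '---' "$dups" '---' "$singles"
rm -f "$sample"
```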

Seven, tr tool

  • The tr command is often used to replace, squeeze, and delete characters from standard input. It translates one set of characters into another and is handy for writing compact one-line commands.

7.1 Command format and common options

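- Command format
tr [options] [arguments]

- Common options
-c: complement, operate on all characters that are not in the first character set;
-d: delete all characters that are in the first character set;
-s: squeeze each run of a repeated character into a single character;
-t: first truncate the first character set to the length of the second character set.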

7.2 Example

[root@localhost ~]# echo "KGC" | tr 'A-Z' 'a-z'   ## convert the input from uppercase to lowercase.
[root@localhost ~]# echo "thissss is	a text linnnnnnne." | tr -s 'sn'  ## squeeze repeated characters in the input.
[root@localhost ~]# echo 'hello world' | tr -d 'od'      ## delete certain characters from the string.
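For reference, a sketch that captures what these commands produce, plus one combination of -c and -d that is not among the examples above: deleting everything except digits. The input 'abc123' is made up.

```shell
lower=$(echo "KGC" | tr 'A-Z' 'a-z')                              # translate uppercase to lowercase
squeezed=$(echo "thissss is a text linnnnnnne." | tr -s 'sn')     # runs of s or n become one character
deleted=$(echo 'hello world' | tr -d 'od')                        # every o and d is removed
digits=$(echo 'abc123' | tr -cd '0-9')                            # -c inverts the set: delete all non-digits

printf '%s\n' "$lower" "$squeezed" "$deleted" "$digits"
```

tr works on individual characters, so 'od' means "o or d", not the string "od".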

Origin blog.csdn.net/weixin_47219725/article/details/107606649