Regular expressions, text processing tools and scripting basics
1. Regular expressions
There are two types of regular expressions:
- Basic regular expression: BRE
- Extended regular expression: ERE
1.1 Basic regular expression metacharacters
1.1.1 Character matching
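. Matches any single character, including a multibyte character such as a Chinese character
[] Matches any single character in the specified set, for example: [grain] [0-9] [a-z]
[^] Matches any single character outside the specified set, for example: [^grain]
[:alnum:] Letters and digits
[:alpha:] Any letter, upper or lower case, i.e. a-z, A-Z
[:lower:] Lowercase letters, equivalent to a-z
[:upper:] Uppercase letters
[:blank:] Blank characters (space and Tab)
[:space:] Horizontal and vertical whitespace characters
[:digit:] Decimal digits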
……
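As a minimal sketch of the bracket expressions and POSIX classes above, the following runs grep on a few made-up lines (the sample text and variable names are assumptions for illustration). Note that a class such as [:digit:] only works inside a bracket expression, so it is written [[:digit:]].

```shell
# Sample lines invented for this illustration.
sample='abc
a1c
a c
A-c'

# "a", any single character, then "c": every line except "A-c" (uppercase A).
dot_count=$(printf '%s\n' "$sample" | grep -c 'a.c')

# Lines containing at least one digit; the class sits inside [ ].
digit_lines=$(printf '%s\n' "$sample" | grep '[[:digit:]]')

# Lines containing a character that is neither a letter nor a digit.
other_count=$(printf '%s\n' "$sample" | grep -c '[^[:alnum:]]')

echo "$dot_count $digit_lines $other_count"
```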
1.1.2 Number of matches
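Placed after a character to specify how many times that preceding character should appear
* Matches the preceding character any number of times, including 0
.* Any characters of any length
\? Matches the preceding character 0 or 1 times
\+ Matches the preceding character at least once
\{n\} Matches the preceding character exactly n times
\{m,n\} Matches the preceding character at least m and at most n times
\{,n\} Matches the preceding character at most n times, <=n
\{n,\} Matches the preceding character at least n times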
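The repetition operators above can be tried on a small made-up word list. This is a sketch assuming GNU grep, since \? and \+ are GNU extensions in basic regular expressions; the sample words and the one_line helper are hypothetical.

```shell
# Sample words invented for this illustration.
sample='ggle
gogle
google
gooogle
goooogle'

# Hypothetical helper: join output lines with single spaces.
one_line() { paste -s -d ' ' -; }

# o* allows zero or more "o": all five lines match.
star=$(printf '%s\n' "$sample" | grep -c 'go*gle')

# \? allows zero or one "o".
optional=$(printf '%s\n' "$sample" | grep -c 'go\?gle')

# \+ requires at least one "o".
plus=$(printf '%s\n' "$sample" | grep -c 'go\+gle')

# \{2,3\} requires two or three "o".
range=$(printf '%s\n' "$sample" | grep 'go\{2,3\}gle' | one_line)

echo "$star $optional $plus [$range]"
```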
1.1.3 Position anchoring
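^ Start-of-line anchor, used at the far left of the pattern
$ End-of-line anchor, used at the far right of the pattern
^PATTERN$ The pattern must match the whole line
^$ Empty line
^[[:space:]]*$ Blank line (empty or containing only whitespace)
\< Start-of-word anchor, used on the left of a word pattern
\> End-of-word anchor, used on the right of a word pattern
\<PATTERN\> Matches a whole word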
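A short sketch of the anchors above, assuming GNU grep for the \< and \> word anchors. The sample input is invented and includes one empty line and one line holding only a space and a Tab.

```shell
# Sample input invented for this illustration.
sample() { printf 'root\n\nrooter\n \t\nthe root user\nchroot\n'; }

# ^root$ matches only a line that is exactly "root".
whole_line=$(sample | grep -c '^root$')

# ^$ matches empty lines only.
empty=$(sample | grep -c '^$')

# ^[[:space:]]*$ also matches lines that hold only whitespace.
blank=$(sample | grep -c '^[[:space:]]*$')

# \<root\> matches "root" as a whole word, so "rooter" and "chroot" are skipped.
word=$(sample | grep -c '\<root\>')

echo "$whole_line $empty $blank $word"
```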
1.1.4 Grouping and alternation
Grouping: \(\) binds multiple characters together so that they are treated as a unit, for example: \(root\)\+
Back reference: the text matched by the pattern inside a group is recorded by the regular expression engine in internal variables named \1, \2, \3, ... \1 is the text matched by the pattern between the first left parenthesis, counting from the left, and its matching right parenthesis
Or: \|
Group numbering example:
\(string1\(string2\)\)
\1: string1\(string2\)
\2: string2
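The following sketch illustrates back references and nested group numbering with grep and sed. The sample sentences are invented, and \| is assumed to be available as the GNU alternation extension in basic regular expressions.

```shell
# Sample lines invented for this illustration.
sample='He loves his lover
He likes his lover
She likes her liker'

# \(l..e\) captures four characters such as "love" or "like";
# \1 requires the same captured text to appear again later in the line.
same=$(printf '%s\n' "$sample" | grep -c '\(l..e\).*\1')

# Groups are numbered by their opening parenthesis, left to right:
# \1 is the outer group, \2 the inner one.
nested=$(printf 'string1string2\n' | sed 's/\(string1\(string2\)\)/<\1><\2>/')

# \| matches either alternative.
alt=$(printf '%s\n' "$sample" | grep -c 'loves\|likes')

echo "$same $nested $alt"
```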
1.2 Extended regular expressions
1.2.1 Character matching metacharacters
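. Matches any single character, including a multibyte character such as a Chinese character
[] Matches any single character in the specified set, for example: [grain] [0-9] [a-z]
[^] Matches any single character outside the specified set, for example: [^grain]
[:alnum:] Letters and digits
[:alpha:] Any letter, upper or lower case, i.e. a-z, A-Z
[:lower:] Lowercase letters, equivalent to a-z
[:upper:] Uppercase letters
[:blank:] Blank characters (space and Tab)
[:space:] Horizontal and vertical whitespace characters
[:digit:] Decimal digits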
……
1.2.2 Times matching
* Matches the preceding character any number of times
? 0 or 1 times
+ 1 or more times
{n} Exactly n times
{m,n} At least m and at most n times
1.2.3 Position anchoring
^ Start of line
$ End of line
\<,\b Start of word
\>,\b End of word
1.2.4 Grouping and alternation
() Grouping
Back references: \1,\2,...
| Or
a|b #a or b
C|cat #C or cat
(C|c)at #Cat or cat
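To compare the two syntaxes, here is a sketch that runs the alternation examples above with grep -E and then repeats one of them as a basic regular expression. The sample words are invented, and the BRE form assumes GNU grep.

```shell
# Sample words invented for this illustration.
sample='Cat
cat
C
at
concat'

# C|cat: a line containing either "C" or "cat".
alt=$(printf '%s\n' "$sample" | grep -Ec 'C|cat')

# (C|c)at: a line containing "Cat" or "cat".
grp=$(printf '%s\n' "$sample" | grep -Ec '(C|c)at')

# The same group written as a BRE needs backslashes.
bre=$(printf '%s\n' "$sample" | grep -c '\(C\|c\)at')

# {n} applies to the whole group: exactly two repetitions of "ab".
twice=$(printf 'abab\nab\nababab\n' | grep -Ec '^(ab){2}$')

echo "$alt $grp $bre $twice"
```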
2. Grep for text processing
Function: a text search tool that checks the target text line by line against a user-specified pattern and prints the lines that match
Pattern: a filter condition written with regular expression metacharacters and literal characters
Format:
grep [OPTIONS] PATTERN [FILE...]
Common options:
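--color=auto Highlight the matched text in color
-m # Stop after # matching lines
-v Show the lines that do not match the pattern
-i Ignore case
-n Show the line numbers of matching lines
-c Print the number of matching lines
-o Show only the matched part of each line
-q Quiet mode; print nothing (the exit status still reports whether a match was found)
-A # Also show # lines after each match
-B # Also show # lines before each match
-C # Also show # lines before and after each match
-e Specify a pattern; multiple -e patterns are combined with logical OR, e.g. grep -e 'cat' -e 'dog' file
-w Match whole words only
-E Use ERE, equivalent to egrep
-f file Read the patterns from a file
-r Recurse into directories without following symbolic links
-R Recurse into directories and follow symbolic links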
[root@CentOS8 ~]#df | grep '/dev/sd'
/dev/sda2 104806400 2311224 102495176 3% /
/dev/sda5 52403200 402140 52001060 1% /data
/dev/sda1 1038336 172128 866208 17% /boot
[root@CentOS8 ~]#df|grep '^/dev/sd'|tr -s ' ' %|cut -d% -f5|sort -n|tail -1
17
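Several of the options listed above can be compared side by side on a fixed input. This is a sketch only; the sample lines and variable names are invented for illustration.

```shell
# Sample input invented for this illustration.
sample() { printf 'cat\nDog\ncatalog\nbird\ndog cat\n'; }

count=$(sample | grep -c 'cat')                 # lines containing "cat"
nocase=$(sample | grep -ic 'dog')               # -i also matches "Dog"
inverted=$(sample | grep -vc 'cat')             # lines without "cat"
words=$(sample | grep -wc 'cat')                # whole word, skips "catalog"
either=$(sample | grep -c -e 'bird' -e 'Dog')   # "bird" OR "Dog"
numbered=$(sample | grep -n 'bird')             # line number prefix

# -q prints nothing; only the exit status is used.
if sample | grep -q 'bird'; then found=yes; else found=no; fi

echo "$count $nocase $inverted $words $either $numbered $found"
```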
3. Sed for text processing
format:
sed [option]... 'script;script;...' inputfile...
Common options:
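-n Do not automatically print the pattern space to the screen
-e Multiple edits (several scripts in one command)
-f file Read the editing script from the specified file
-r,-E Use extended regular expressions
-i Edit the file in place; with a suffix (for example -i.bak), back up the original first
Script format:
'[address]command'
Address format:
1. No address: process every line of the input
2. Single address:
#: the line with that number, $: the last line
/pattern/: every line matched by this pattern
3. Address range:
#,# From line # to line #; 3,6 means lines 3 through 6
#,+# From line # through the following # lines; 3,+4 means lines 3 through 7
4. Step: ~
1~2 Odd-numbered lines
2~2 Even-numbered lines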
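The address forms above can be checked against a fixed input by combining -n with the p (print) command. This is a sketch assuming GNU sed, because the #,+# and first~step forms are GNU extensions; the sample lines and the one_line helper are hypothetical.

```shell
# Six sample lines invented for this illustration.
sample() { printf 'line1\nline2\nline3\nline4\nline5\nline6\n'; }

# Hypothetical helper: join output lines with single spaces.
one_line() { paste -s -d ' ' -; }

single=$(sample | sed -n '2p')                  # line 2 only
last=$(sample | sed -n '$p')                    # last line
by_pattern=$(sample | sed -n '/line4/p')        # lines matching a pattern
range=$(sample | sed -n '3,5p' | one_line)      # lines 3 through 5
offset=$(sample | sed -n '2,+2p' | one_line)    # line 2 and the next 2 lines
odd=$(sample | sed -n '1~2p' | one_line)        # every second line from line 1

echo "$single | $last | $by_pattern | $range | $offset | $odd"
```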
command:
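p Print the current pattern space; without -n this is in addition to the default output
Ip Print with case-insensitive pattern matching, as in /pattern/Ip
d Delete the pattern space for matching lines and immediately start the next cycle
a [\\]text Append text after the specified line; \n can be used to append multiple lines
i [\\]text Insert text before the line
c [\\]text Replace the line with one or more lines of text
w /path/file Save the matching lines to the specified file
r /path/file Read the contents of the specified file and place them after the matching lines
= Print the line number of the line in the pattern space
! Negate the address, so the command applies to the lines that do not match
s/pattern/string/flags Find and replace; other delimiters may be used, for example s@@@ or s###
Substitution flags:
g Replace every match on the line
p Print the lines where a substitution was made
w /path/file Save the lines where a substitution was made to the file
I,i Ignore case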
[root@CentOS8 ~]#sed -n '1,4p' /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
[root@CentOS8 ~]#
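Beyond printing a range, the editing commands and substitution flags described in this section can be sketched on made-up input. GNU sed is assumed for the I flag; the sample lines and the one_line helper are hypothetical.

```shell
# Sample input invented for this illustration.
sample() { printf 'root:x:0:0\nbin:x:1:1\nRoot:x:2:2\n'; }

# Hypothetical helper: join output lines with single spaces.
one_line() { paste -s -d ' ' -; }

# d deletes matching lines; the remaining lines are printed automatically.
deleted=$(sample | sed '/^bin/d' | one_line)

# ! negates the address: print every line except line 1.
negated=$(sample | sed -n '1!p' | one_line)

# Without g only the first match on a line is replaced; with g, all of them.
first=$(printf 'a:b:c\n' | sed 's/:/-/')
global=$(printf 'a:b:c\n' | sed 's/:/-/g')

# A different delimiter avoids escaping the slashes in a path.
path=$(printf '/bin/bash\n' | sed 's@/bin/bash@/bin/sh@')

# I ignores case and p prints only the lines that were changed.
nocase=$(sample | sed -n 's/^root/admin/Ip' | one_line)

# -E enables ERE, so groups are written without backslashes.
swapped=$(printf 'key=value\n' | sed -E 's/(.*)=(.*)/\2=\1/')

echo "$deleted | $negated | $first | $global | $path | $nocase | $swapped"
```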
4. Awk for text processing
format:
awk [OPTIONS] 'program' var=value file...
awk [OPTIONS] -f programfile var=value file...
Description:
program is usually enclosed in single quotes and can be composed of three parts
- BEGIN statement block
- Pattern-matching statement blocks (the main body)
- END block
Common options:
- -F "separator" sets the input field separator; the default is any run of whitespace
- -v var=value variable assignment
Program format:
pattern{action;...}
pattern: determines when the action is triggered, for example BEGIN, END, or a regular expression
action: processes the data and is written inside {}; common actions are print and printf
Action print
format:
print item1,item2,...
Description:
- Items are separated by commas
- An output item can be a string or a numeric value
- If item is omitted, it is equivalent to print $0
- String literals must be quoted with "", while variables and numbers need no quotes
[root@CentOS8 ~]#awk -F: -v OFS=':' '/root/{print $1,$3,$7}' /etc/passwd
root:0:/bin/bash
operator:11:/sbin/nologin
[root@CentOS8 ~]#
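The three parts of an awk program (BEGIN, pattern blocks, END) can be seen together in one small sketch. The passwd-style sample lines, the passwd_sample function and the " -> " output separator are assumptions chosen for illustration.

```shell
# Sample data invented for this illustration, in /etc/passwd format.
passwd_sample() {
    printf 'root:x:0:0:root:/root:/bin/bash\n'
    printf 'bin:x:1:1:bin:/bin:/sbin/nologin\n'
    printf 'lee:x:1000:1000::/home/lee:/bin/bash\n'
}

report=$(passwd_sample | awk -F: -v OFS=' -> ' '
    BEGIN   { print "user", "shell" }    # runs once before any input is read
    /bash$/ { print $1, $7; n++ }        # runs for each line ending in "bash"
    END     { print "total", n }         # runs once after all input is read
')

printf '%s\n' "$report"
```

The comma in each print statement is what inserts OFS between the items; writing print $1 $7 instead would concatenate the two fields with nothing between them.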
5. Examples of text processing
- Count the users in /etc/passwd whose default shell is not /sbin/nologin, and list those users
[root@CentOS8 ~]#echo `grep -v '/sbin/nologin' /etc/passwd |wc -l` && grep -v '/sbin/nologin' /etc/passwd | cut -d: -f1
12
root
sync
shutdown
halt
lee
num
number
mageia
slackware
user1
user2
user3
[root@CentOS8 ~]#awk '!/\/sbin\/nologin/{print $NF}' /etc/passwd | wc -l
12
[root@CentOS8 ~]#awk -F: '!/\/sbin\/nologin/{print $1}' /etc/passwd
root
sync
shutdown
halt
lee
num
number
mageia
slackware
user1
user2
user3
[root@CentOS8 ~]#echo `awk '!/\/sbin\/nologin/{print $NF}' /etc/passwd | wc -l` && awk -F: '!/\/sbin\/nologin/{print $1}' /etc/passwd
12
root
sync
shutdown
halt
lee
num
number
mageia
slackware
user1
user2
user3
[root@CentOS8 ~]#
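As an alternative sketch, the count and the list can be produced in a single awk pass by collecting names and printing them in END. It compares the last field exactly instead of searching the whole line, which is a design choice of this sketch rather than of the commands above; the sample data and the passwd_sample function are invented.

```shell
# Sample data invented for this illustration, in /etc/passwd format.
passwd_sample() {
    printf 'root:x:0:0:root:/root:/bin/bash\n'
    printf 'bin:x:1:1:bin:/bin:/sbin/nologin\n'
    printf 'sync:x:5:0:sync:/sbin:/bin/sync\n'
    printf 'lee:x:1000:1000::/home/lee:/bin/bash\n'
}

result=$(passwd_sample | awk -F: '
    $NF != "/sbin/nologin" { names = names $1 "\n"; count++ }
    END { print count + 0; printf "%s", names }   # count first, then the names
')

printf '%s\n' "$result"
```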
- Find the user name, UID and shell of the user with the largest UID
[root@CentOS8 ~]#awk -F: '{print $1,$3,$7}' /etc/passwd | sort -nr -k2|head -n1
nobody 65534 /sbin/nologin
[root@CentOS8 ~]#getent passwd | sort -t: -k3 -n|tail -n1|cut -d: -f1,3,7
nobody:65534:/sbin/nologin
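The same result can also be sketched without sort by letting awk track the maximum as it reads. The sample data and the passwd_sample function are invented; adding 0 to the field forces a numeric comparison.

```shell
# Sample data invented for this illustration, in /etc/passwd format.
passwd_sample() {
    printf 'root:x:0:0:root:/root:/bin/bash\n'
    printf 'nobody:x:65534:65534:Kernel Overflow User:/:/sbin/nologin\n'
    printf 'lee:x:1000:1000::/home/lee:/bin/bash\n'
}

top=$(passwd_sample | awk -F: '
    NR == 1 || $3 + 0 > max { max = $3 + 0; line = $1 " " $3 " " $7 }
    END { print line }
')

echo "$top"
```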
- Count the number of connections of each remote host IP currently connected to this machine, and sort them from largest to smallest
[root@CentOS8 ~]#ss -nt | tail -n +2 | tr -s ' ' : | cut -d: -f6|sort|uniq -c|sort -nr
[root@CentOS8 ~]#ss -nt | grep "^ESTAB" | tr -s ' ' : | cut -d: -f6|sort|uniq -c|sort -nr
[root@CentOS8 ~]#ss -nt | awk -F" +|:" '/^ESTAB/{print $(NF-2)}' |sort|uniq -c | sort -nr
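Since the commands above depend on live connections, the counting idiom sort | uniq -c | sort -nr is easier to follow on fixed input. This sketch uses invented text shaped like ss -nt output and assumes IPv4 peers with the peer address in the fifth whitespace-separated column; ss_sample is a hypothetical stand-in for the real command.

```shell
# Invented sample shaped like "ss -nt" output.
ss_sample() {
    printf 'State  Recv-Q Send-Q Local Address:Port Peer Address:Port\n'
    printf 'ESTAB  0      0      10.0.0.8:22        10.0.0.1:51234\n'
    printf 'ESTAB  0      0      10.0.0.8:22        10.0.0.7:40022\n'
    printf 'ESTAB  0      36     10.0.0.8:22        10.0.0.1:51300\n'
}

counts=$(ss_sample |
    awk '/^ESTAB/ { split($5, peer, ":"); print peer[1] }' |   # peer IP only
    sort | uniq -c | sort -nr |                                # count, largest first
    awk '{ print $1, $2 }')                                    # trim uniq padding

printf '%s\n' "$counts"
```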
6. Text editing with vim
6.1 Three modes of vim and conversion
Three common modes:
- Command or normal mode: the default mode, in which you can move the cursor and cut and paste text
- Insert or edit mode: used to modify text
- Extended command or command-line mode: save, exit, etc.
Command mode --> Insert mode
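i Insert at the cursor position
I Insert at the beginning of the current line
a Append after the cursor position
A Append at the end of the current line
o Open a new line below the current line
O Open a new line above the current line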
- Insert mode --- Esc ---> command mode
- Command mode ---: ---> Extended command mode
- Extended command mode --- Esc or Enter ---> command mode
6.2 Common Commands in Extended Command Mode
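w Write (save) the file to disk
wq Write and quit
q! Quit without saving
wq! Force write and quit
r file Read the contents of file into the current file
w file Write the current contents to another file
!command Run a shell command
r!command Read in the output of a command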
6.3 Address delimitation
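/part1/,/part2/ #From the first line matching part1 to the first line matching part2
/pattern/ #Search downward from the current line to the first line matching pattern
% #The whole file, equivalent to 1,$
$ #The last line
.,$-1 #From the current line to the second-to-last line
# #Line number #
#,# #From the starting line on the left to the ending line on the right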
6.4 Address delimitation followed by an editing command
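d #Delete
y #Copy (yank)
w file #Save the lines in the range to the specified file
r file #Insert the entire contents of the specified file at the given position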
6.5 Find and Replace
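Format:
s/text to find/replacement text/flags
Flags:
i #Ignore case
g #Replace globally; by default only the first occurrence on each line is replaced
gc #Replace globally, asking for confirmation before each replacement
Text to find: basic regular expressions may be used
Replacement text: patterns cannot be used, but back references such as \1, \2, ... can, and "&" refers to the entire text matched by the search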
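Because vim is interactive, the sketch below uses sed from the shell to try out the replacement syntax; for these simple cases & and \1 behave the same way in both tools. The sample text is invented, and the vim commands in the comments are the assumed equivalents.

```shell
# Sample text invented for this illustration.
text='user1 user2'

# & stands for the whole matched text.        vim: :s/user[0-9]/<&>/g
wrapped=$(printf '%s\n' "$text" | sed 's/user[0-9]/<&>/g')

# \1 and \2 refer back to the groups.         vim: :s/\(user\)\([0-9]\)/\2\1/g
swapped=$(printf '%s\n' "$text" | sed 's/\(user\)\([0-9]\)/\2\1/g')

# Without g only the first match is replaced. vim: :s/user/USER/
first=$(printf '%s\n' "$text" | sed 's/user/USER/')

echo "$wrapped | $swapped | $first"
```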
6.6 Customize vim work features
Configuration file
/etc/vimrc #Global
~/.vimrc #Per-user
Common features:
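Line numbers
Show: set number, abbreviated set nu
Hide: set nonumber, abbreviated set nonu
Preserve formatting when pasting
Enable: set paste
Disable: set nopaste
Tab width
Set: set tabstop=# displays each Tab as # columns wide (set expandtab inserts spaces instead of a Tab)
Abbreviation: set ts=#
Highlight the line containing the cursor
Enable: set cursorline, abbreviated set cul
Disable: set nocursorline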
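A few of these settings could be collected into a configuration file as sketched below. The file is written to a temporary path (an assumption of this sketch) so that an existing ~/.vimrc is not overwritten, and the tab width of 4 is an arbitrary example value.

```shell
# Write sample settings to a temporary file rather than ~/.vimrc.
vimrc_sample=$(mktemp)

cat > "$vimrc_sample" <<'EOF'
" Sample settings taken from the list above
set number
set cursorline
set tabstop=4
EOF

cat "$vimrc_sample"
```

The file can be tried without touching the personal configuration by starting vim with -u and the path stored in vimrc_sample; once the settings look right they can be copied into ~/.vimrc.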
6.7 Command Mode
Jump between characters :
h: left l: right j: down k: up
Jump between words :
w: The beginning of the next word
e: The end of the current or next word
b: The beginning of the current or previous word
Jump within the current page :
H: top of page M: middle line of page L: bottom of page
zt: Move the current line where the cursor is to the top of the screen
zz: Move the current line where the cursor is to the middle of the screen
zb: Move the current line where the cursor is to the bottom of the screen
Jump to the beginning or end of the line :
^ The first non-blank character
0 Start of line
$ End of line
Move between lines :
#G or :# Jump to line #
G Last line
1G, gg First line
Move between sentences :
) Next sentence ( Previous sentence
Screen flip operation :
Ctrl+f Scroll one screen to the end of the file Ctrl+b Scroll one screen to the beginning of the file
Ctrl+d half-screen to the end of the file Ctrl+u half-screen to the beginning of the file
Delete command :
d Delete, combined with cursor jump to achieve range deletion
d$ delete to the end of the line
d^ Delete back to the first non-blank character of the line
dd Cut the line where the cursor is
#dd delete multiple lines
D: Delete from the current cursor position to the end of the line, equivalent to d$
Copy command :
y$ Copy to the end of the line
y0 Copy to the beginning of the line
y^ Copy back to the first non-blank character of the line
yy Copy line
#yy copy multiple lines
Y: Copy the entire line
Find:
/pattern: Search from the current cursor to the end of the file
?pattern: search from the current cursor to the beginning of the file
n: Repeat the search in the same direction
N: Repeat the search in the opposite direction
Undo changes:
u Undo recent changes
#u Undo the last # changes
U Undo all changes made to the current line since the cursor moved onto it
. Repeat the previous operation
#. Repeat the previous operation # times
7. Basics of shell scripting
Shell script: A text file containing some commands or statements and symbols in a certain format
Format requirement: the first line is the shebang, which names the interpreter that runs the script
#!/bin/bash
#!/usr/bin/python
#!/usr/bin/perl
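A minimal sketch of the shebang in use: the script below is created in a temporary file (an assumption of this sketch), made executable, and then run directly so that the kernel picks the interpreter from the first line. It assumes /bin/bash exists and that the temporary directory allows execution.

```shell
# Create a small script in a temporary file.
script=$(mktemp)

cat > "$script" <<'EOF'
#!/bin/bash
# The first line tells the kernel to run this file with /bin/bash.
echo "hello from a script"
EOF

chmod +x "$script"      # the file must be executable to be run directly
output=$("$script")     # executed via its shebang, not via "bash file"
rm -f "$script"

echo "$output"
```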
7.1 Script example
- Write the script disk.sh to display the maximum value of space utilization in the current hard disk partition
[root@CentOS8 script]#bash disk.sh
17
[root@CentOS8 script]#cat disk.sh
#!/bin/bash
df -h| awk -F" +|%" '/\/dev\/sd/{print $5}'|sort -nr|head -n1
- Write the script system_info.sh to display the current host system information, including: host name, IPv4 address, operating system version, kernel version, CPU model, memory size, hard disk size
[root@CentOS8 script]#cat system_info.sh
#!/bin/bash
RED="\E[1;31m"
GREEN="echo -e \E[1;32m"
END="\E[0m"
$GREEN-------------------Host systeminfo--------------------$END
echo -e "HOSTNAME: $RED`hostname`$END"
echo -e "IPADDR: $RED`ifconfig ens33|grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' |head -n1`$END"
echo -e "OSVERSION: $RED`cat /etc/redhat-release`$END"
echo -e "KERNEL: $RED`uname -r`$END"
echo -e "CPU: $RED`lscpu|grep 'Model name'|tr -s ' '|cut -d: -f2`$END"
echo -e "MEMORY: $RED`free -h|grep Mem|tr -s ' ' :|cut -d: -f2`$END"
echo -e "DISK: $RED`lsblk|awk '/^sd/{print $1,$4}'`$END"
$GREEN------------------------------------------------------$END
[root@CentOS8 script]#
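The colored label lines in system_info.sh could also be produced by a small helper function, sketched here with printf and $(...) command substitution in place of echo -e and backticks. The function name show and the value demo-host are hypothetical and not part of the original script.

```shell
# Hypothetical helper: print "LABEL: value" with the value in bold red.
show() {
    printf '%s: \033[1;31m%s\033[0m\n' "$1" "$2"
}

# $(...) nests and quotes more predictably than backticks.
show "KERNEL" "$(uname -r)"

# Fixed value used here only so the result is predictable.
line=$(show "HOSTNAME" "demo-host")
printf '%s\n' "$line"
```

Keeping the escape sequences inside one function means the color codes are written once, and each report line reduces to a label and a command.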