Regular expressions and text processing tools: hands-on scripting

Regular expressions, text processing tools and scripting basics

1. Regular expressions

There are two types of regular expressions:

  • Basic regular expressions: BRE
  • Extended regular expressions: ERE

1.1 Basic regular expression metacharacters

1.1.1 Character matching

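.               Matches any single character (which may be a Chinese character)
[]              Matches any single character in the given set or range, e.g. [grain]    [0-9]   [a-z]
[^]             Matches any single character outside the given set or range, e.g. [^grain]
[:alnum:]       Letters and digits
[:alpha:]       Any uppercase or lowercase English letter, i.e. a-z, A-Z
[:lower:]       Lowercase letters, equivalent to a-z
[:upper:]       Uppercase letters
[:blank:]       Blank characters (space and tab)
[:space:]       Horizontal and vertical whitespace characters
[:digit:]       Decimal digits
The [:name:] classes are only valid inside a bracket expression, e.g. [[:digit:]]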
……
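
The character-matching metacharacters above can be tried with grep -o, which prints only the matched text. This is a minimal sketch, assuming GNU grep and paste; the sample line and the variable names are made up for illustration.

```shell
# Made-up sample line used only for this illustration.
line='user42 logged in at 10:05 from Host-A'

# [[:digit:]]\+ : each run of decimal digits (the class sits inside [ ]).
digits=$(printf '%s\n' "$line" | grep -o '[[:digit:]]\+' | paste -sd' ' -)

# [[:upper:]] : every uppercase letter, one match per letter.
uppers=$(printf '%s\n' "$line" | grep -o '[[:upper:]]' | paste -sd' ' -)

# [^...] : any character that is neither alphanumeric nor whitespace.
symbols=$(printf '%s\n' "$line" | grep -o '[^[:alnum:][:space:]]' | paste -sd' ' -)

# . : any single character, here the two characters between H and t.
dot=$(printf '%s\n' "$line" | grep -o 'H..t')

echo "digits=$digits uppers=$uppers symbols=$symbols dot=$dot"
```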

1.1.2 Number of matches

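Placed after a character to specify how many times that preceding character must appear

*               Matches the preceding character any number of times, including 0
.*              Any character, any number of times
\?              Matches the preceding character 0 or 1 times
\+              Matches the preceding character at least once
\{n\}           Matches the preceding character exactly n times
\{m,n\}         Matches the preceding character at least m and at most n times
\{,n\}          Matches the preceding character at most n times (<= n)
\{n,\}          Matches the preceding character at least n times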
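
To see how the BRE quantifiers differ, the sketch below runs several of them against the same small word list. It assumes GNU grep (\? and \+ are GNU extensions to BRE); the word list and variable names are invented for the example.

```shell
# Made-up word list: "g", a varying number of "o", then "gle".
sample='ggle
gogle
google
gooogle
goooogle'

# * : zero or more "o", so every line matches.
any_count=$(printf '%s\n' "$sample" | grep -c '^go*gle$')

# \? : zero or one "o".
zero_or_one_count=$(printf '%s\n' "$sample" | grep -c '^go\?gle$')

# \+ : at least one "o".
one_or_more_count=$(printf '%s\n' "$sample" | grep -c '^go\+gle$')

# \{2,3\} : between two and three "o".
two_or_three=$(printf '%s\n' "$sample" | grep '^go\{2,3\}gle$')

# \{3,\} : three or more "o".
three_plus_count=$(printf '%s\n' "$sample" | grep -c '^go\{3,\}gle$')

echo "$any_count $zero_or_one_count $one_or_more_count $three_plus_count"
```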

1.1.3 Position anchoring

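^               Start-of-line anchor; placed at the far left of the pattern
$               End-of-line anchor; placed at the far right of the pattern
^PATTERN$       The pattern must match the whole line
^$              Empty line
^[[:space:]]*$  Blank line (empty or whitespace only)
\<              Start-of-word anchor; placed to the left of a word pattern
\>              End-of-word anchor; placed to the right of a word pattern
\<PATTERN\>     Matches a whole word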
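
The anchors can be compared side by side on a few lines of test input. The following is a small sketch assuming GNU grep (for \< and \>); the sample() helper and its contents are hypothetical.

```shell
# Hypothetical helper that prints five test lines:
# "root user", an empty line, a whitespace-only line, "chroot jail", "rooted".
sample() {
    printf 'root user\n\n   \nchroot jail\nrooted\n'
}

# ^$ matches only the truly empty line.
empty_count=$(sample | grep -c '^$')

# ^[[:space:]]*$ also matches the line that contains only spaces.
blank_count=$(sample | grep -c '^[[:space:]]*$')

# ^root matches lines that begin with "root" ("root user" and "rooted").
starts_count=$(sample | grep -c '^root')

# \<root\> matches "root" only as a whole word, so "chroot" and "rooted" are skipped.
whole_word=$(sample | grep '\<root\>')

echo "$empty_count $blank_count $starts_count [$whole_word]"
```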

1.1.4 Grouping and alternation

Grouping: \( \) binds several characters together so that they are treated as a unit, for example: \(root\)\+

Back reference: the text matched by the pattern inside a group is recorded by the regular expression engine in internal variables named \1, \2, \3, ... Here \1 is the text matched by the pattern between the first left parenthesis, counting from the left, and its matching right parenthesis

Or: \|

\(string1\(string2\)\)
\1: string1\(string2\)
\2: string2
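
Back references are easier to follow with a concrete input. The sketch below, assuming GNU grep and sed, first finds a doubled word with \1 and then shows how nested groups are numbered by their left parentheses; the input strings are made up.

```shell
# \1 must repeat exactly what the first group matched, so this finds
# a word that is immediately followed by itself.
repeated=$(printf 'this is is a test\nall good here\n' \
    | grep -o '\<\([[:alpha:]]\+\) \1\>')

# Nested groups: \1 is the outer group (opened by the first left
# parenthesis), \2 is the inner group.
nested=$(printf 'string1string2\n' \
    | sed 's/\(string1\(string2\)\)/[\1][\2]/')

echo "$repeated"
echo "$nested"
```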

1.2 Extended regular expressions

1.2.1 Character matching metacharacters

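.               Matches any single character (which may be a Chinese character)
[]              Matches any single character in the given set or range, e.g. [grain]    [0-9]   [a-z]
[^]             Matches any single character outside the given set or range, e.g. [^grain]
[:alnum:]       Letters and digits
[:alpha:]       Any uppercase or lowercase English letter, i.e. a-z, A-Z
[:lower:]       Lowercase letters, equivalent to a-z
[:upper:]       Uppercase letters
[:blank:]       Blank characters (space and tab)
[:space:]       Horizontal and vertical whitespace characters
[:digit:]       Decimal digits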
……

1.2.2 Number of matches

*               Matches the preceding character any number of times
?               0 or 1 times
+               1 or more times
{n}             Exactly n times
{m,n}           At least m and at most n times

1.2.3 Position anchoring

^               Start of line
$               End of line
\<,\b           Start of word
\>,\b           End of word

1.2.4 Grouping and alternation

()              Grouping
Back references: \1,\2,...
|               Or
a|b             #a or b
C|cat           #C or cat
(C|c)at         #Cat or cat
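
The difference between C|cat and (C|c)at shows up clearly when both are matched against whole lines. This is a sketch assuming GNU grep; the -x option (match the entire line) is used here for clarity and is not in the option list of this article, and the word list is invented.

```shell
# Made-up word list, one word per line.
words='Cat
cat
C
at
concat'

# Without grouping, | splits the whole pattern: "C" or "cat".
alt=$(printf '%s\n' "$words" | grep -Ex 'C|cat' | paste -sd' ' -)

# With grouping, | only chooses the first letter: "Cat" or "cat".
grouped=$(printf '%s\n' "$words" | grep -Ex '(C|c)at' | paste -sd' ' -)

# Without -x the pattern may match anywhere in the line, so "Cat"
# (contains C) and "concat" (contains cat) are counted as well.
loose_count=$(printf '%s\n' "$words" | grep -Ec 'C|cat')

echo "alt=[$alt] grouped=[$grouped] loose=$loose_count"
```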

2. Grep for text processing

Function: a text search tool that checks the target text line by line against a user-specified pattern and prints the lines that match

Pattern: a filter condition written with regular expression metacharacters and literal characters

format:

grep [OPTIONS] PATTERN [FILE...]

Common options:

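--color=auto            Highlight the matched text in color
-m #                    Stop after # matching lines
-v                      Show the lines that do not match the pattern
-i                      Ignore case
-n                      Show the line number of each matching line
-c                      Print the count of matching lines
-o                      Show only the matched part of each line
-q                      Quiet mode; print nothing
-A #                    Also show # lines after each match
-B #                    Also show # lines before each match
-C #                    Also show # lines before and after each match
-e                      Specify a pattern; multiple -e patterns are ORed, e.g. grep -e 'cat' -e 'dog' file
-w                      Match whole words only
-E                      Use ERE; equivalent to egrep
-f file                 Read the patterns from a file
-r                      Recurse into directories, without following symbolic links
-R                      Recurse into directories and follow symbolic links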
[root@CentOS8 ~]#df | grep '/dev/sd'
/dev/sda2      104806400 2311224 102495176   3% /
/dev/sda5       52403200  402140  52001060   1% /data
/dev/sda1        1038336  172128    866208  17% /boot
[root@CentOS8 ~]#df|grep '^/dev/sd'|tr -s ' ' %|cut -d% -f5|sort -n|tail -1
17
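
Several of the options listed above can be exercised on a throwaway file. The sketch below assumes GNU grep and mktemp; the file contents and variable names are invented for the illustration.

```shell
# Create a temporary four-line file (made-up content).
tmp=$(mktemp)
printf 'cat\nDog\ndog cat\nbird\n' > "$tmp"

count_i=$(grep -ci 'dog' "$tmp")                 # -c count, -i ignore case
numbered=$(grep -n 'cat' "$tmp")                 # -n prefix with line numbers
either_count=$(grep -c -e 'cat' -e 'bird' "$tmp") # -e patterns are ORed
inverted=$(grep -v 'cat' "$tmp" | paste -sd' ' -) # -v lines that do not match
first_only=$(grep -m 1 'cat' "$tmp")             # -m stop after one match
only_match=$(grep -o 'd.g' "$tmp")               # -o print just the match

# -q prints nothing; only the exit status says whether a match exists.
if grep -q 'bird' "$tmp"; then found=yes; else found=no; fi

rm -f "$tmp"
echo "$count_i $either_count [$inverted] $first_only $only_match $found"
```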

3. Sed for text processing

format:

sed [option]... 'script;script;...' inputfile...

Common options:

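-n          Do not print the pattern space automatically
-e          Apply multiple editing scripts
-f file     Read the editing script from the given file
-r,-E       Use extended regular expressions
-i          Edit the file in place; with a suffix, back up the original first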

script format:

'[address]command'

Address format:

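1.  No address:     process the whole text
2.  Single address:
    #: the given line, $: the last line
    /pattern/:  every line matched by this pattern
3.  Address range:
    #,#         from line # to line #; 3,6 means lines 3 through 6
    #,+#        from line # through the following # lines; 3,+4 means lines 3 through 7
4.  Step: ~
        1~2 odd lines
        2~2 even lines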
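
Each address form can be checked quickly against the numbers 1 to 10, where the line content equals the line number. This sketch assumes GNU sed (the #,+# and first~step forms are GNU extensions) plus seq and paste; the variable names are illustrative.

```shell
# Range: lines 3 through 6.
range=$(seq 1 10 | sed -n '3,6p' | paste -sd' ' -)

# Offset: line 3 and the 4 lines after it (lines 3 through 7).
offset=$(seq 1 10 | sed -n '3,+4p' | paste -sd' ' -)

# Step: start at line 1 and take every 2nd line (odd lines).
odd=$(seq 1 10 | sed -n '1~2p' | paste -sd' ' -)

# Single address: $ is the last line.
last=$(seq 1 10 | sed -n '$p')

# Pattern to end: from the first line matching /^8$/ through the last line.
from_pattern=$(seq 1 10 | sed -n '/^8$/,$p' | paste -sd' ' -)

echo "[$range] [$offset] [$odd] [$last] [$from_pattern]"
```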

command:

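p               Print the current pattern space, in addition to the default output
Ip              With an address pattern (/pattern/Ip), match case-insensitively and print
d               Delete the pattern space and immediately start the next cycle
a [\\]text      Append text after the addressed line; \n can be used to append multiple lines
i [\\]text      Insert text before the line
c [\\]text      Replace the line with one or more lines of text
w /path/file    Save the matched lines to the given file
r /path/file    Read the given file and append its text after the matched line
=               Print the line number of the line in the pattern space
!               Negate the address: apply the command to the lines that do not match
s/pattern/string/flags        Search and replace; other delimiters also work, e.g. s@@@, s###
Substitution flags:
g       Replace every occurrence in the line
p       Print the lines where a substitution succeeded
w /path/file    Save the lines where a substitution succeeded to a file
I,i     Ignore case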
[root@CentOS8 ~]#sed -n '1,4p' /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
[root@CentOS8 ~]#
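
The editing commands and substitution flags can be tried without touching real files. The following sketch assumes GNU sed (the one-line a text form and -E are used); the passwd-style line is copied from the output above and the variable names are made up.

```shell
line='root:x:0:0:root:/root:/bin/bash'

# Without g only the first occurrence on the line is replaced.
first=$(printf '%s\n' "$line" | sed 's/root/ROOT/')

# With g every occurrence on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/root/ROOT/g')

# An alternative delimiter avoids escaping the slashes in paths.
shell=$(printf '%s\n' "$line" | sed 's@/bin/bash@/bin/sh@')

# -E with back references: keep the first and the last field.
summary=$(printf '%s\n' "$line" | sed -E 's/^([^:]+):.*:([^:]+)$/\1 uses \2/')

# d deletes line 2; ! negates the address; a appends after line 1.
deleted=$(printf 'a\nb\nc\n' | sed '2d' | paste -sd' ' -)
negated=$(printf 'a\nb\nc\n' | sed -n '2!p' | paste -sd' ' -)
appended=$(printf 'a\nb\n' | sed '1a text' | paste -sd' ' -)

printf '%s\n' "$first" "$all" "$shell" "$summary" "$deleted" "$negated" "$appended"
```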

4. Awk for text processing

format:

awk [OPTIONS] 'program' var=value file...
awk [OPTIONS] -f programfile var=value file...

Description:

program is usually enclosed in single quotes and can consist of three parts:

  • BEGIN block
  • Pattern-action blocks (the main body, run for each input line)
  • END block

Common options:

  • -F "separator": the field separator used when reading input; by default fields are split on runs of whitespace
  • -v var=value: assign a variable

Program format:

pattern{action;...}

pattern: decides when the action is triggered, for example BEGIN, END, or a regular expression

action: processes the data and is written inside {}; common actions: print, printf

Action print

format:

print item1,item2,...

Description:

  • Items are separated by commas
  • An output item can be a string or a number
  • If item is omitted, the statement is equivalent to print $0
  • Literal strings must be quoted with ""; variables and numbers need no quotes
[root@CentOS8 ~]#awk -F: -v OFS=':' '/root/{print $1,$3,$7}' /etc/passwd
root:0:/bin/bash
operator:11:/sbin/nologin
[root@CentOS8 ~]#
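
The three parts of an awk program can be seen together in one run. This is a minimal sketch using inline sample data instead of the real /etc/passwd; the sample lines and the counter n are assumptions made for the illustration.

```shell
# Made-up passwd-style sample data.
sample_passwd='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
lee:x:1000:1000::/home/lee:/bin/bash'

report=$(printf '%s\n' "$sample_passwd" | awk -F: -v OFS=':' '
    BEGIN  { print "name", "uid", "shell" }   # runs once, before any input
    /root/ { print $1, $3, $7; n++ }          # runs for each line matching /root/
    END    { print "matched", n }             # runs once, after all input
')

echo "$report"
```

The operator line is included because /root/ matches anywhere in the line, here in the home directory field.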

5. Examples of text processing

  • Count the users in /etc/passwd whose default shell is not /sbin/nologin, and list those users
[root@CentOS8 ~]#echo `grep -v '/sbin/nologin' /etc/passwd |wc -l` && grep -v '/sbin/nologin' /etc/passwd | cut -d: -f1
12
root
sync
shutdown
halt
lee
num
number
mageia
slackware
user1
user2
user3
[root@CentOS8 ~]#awk '!/\/sbin\/nologin/{print $NF}' /etc/passwd | wc -l
12
[root@CentOS8 ~]#awk -F: '!/\/sbin\/nologin/{print $1}' /etc/passwd
root
sync
shutdown
halt
lee
num
number
mageia
slackware
user1
user2
user3
[root@CentOS8 ~]#echo `awk '!/\/sbin\/nologin/{print $NF}' /etc/passwd | wc -l` && awk -F: '!/\/sbin\/nologin/{print $1}' /etc/passwd
12
root
sync
shutdown
halt
lee
num
number
mageia
slackware
user1
user2
user3
[root@CentOS8 ~]#
  • Find the user name, UID and shell of the user with the largest UID
[root@CentOS8 ~]#awk -F: '{print $1,$3,$7}' /etc/passwd | sort -nr -k2|head -n1
nobody 65534 /sbin/nologin
[root@CentOS8 ~]#getent passwd | sort -t: -k3 -n|tail -n1|cut -d: -f1,3,7
nobody:65534:/sbin/nologin
  • Count the number of connections of each remote host IP currently connected to this machine, and sort them from largest to smallest
[root@CentOS8 ~]#ss -nt | tail -n +2 | tr -s ' ' : | cut -d: -f6|sort|uniq -c|sort -nr
[root@CentOS8 ~]#ss -nt | grep "^ESTAB" | tr -s ' ' : | cut -d: -f6|sort|uniq -c|sort -nr
[root@CentOS8 ~]#ss -nt | awk -F" +|:" '/^ESTAB/{print $(NF-2)}' |sort|uniq -c | sort -nr
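
The counting pipeline can be tested offline by feeding it captured text instead of live ss output. The sketch below is a variant, not the commands above: it assumes IPv4 peers in the fifth whitespace-separated column, and the count_peers and sample_ss helpers and all addresses are hypothetical.

```shell
# Hypothetical helper: read ss -nt style text on stdin and print
# "count ip" pairs, most connections first (IPv4 addresses assumed).
count_peers() {
    awk '/^ESTAB/ { split($5, peer, ":"); print peer[1] }' \
        | sort | uniq -c | sort -nr \
        | awk '{ print $1, $2 }'    # drop the padding that uniq -c adds
}

# Made-up sample resembling ss -nt output.
sample_ss() {
    cat <<'EOF'
State  Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB  0      0      10.0.0.8:22         10.0.0.1:51234
ESTAB  0      0      10.0.0.8:22         10.0.0.1:51240
ESTAB  0      0      10.0.0.8:22         10.0.0.7:40022
EOF
}

peers=$(sample_ss | count_peers)
echo "$peers"
```

On a live system the same function could be used as ss -nt | count_peers.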

6. Text editing with vim

6.1 Three modes of vim and conversion

Three common modes:

  • Command (normal) mode: the default mode; you can move the cursor and cut and paste text
  • Insert (edit) mode: used to modify text
  • Extended command (command-line) mode: save, quit, and so on

Command mode --> Insert mode

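i   Insert at the cursor position
I   Insert at the beginning of the current line
a   Append after the cursor position
A   Append at the end of the current line
o   Open a new line below the current line
O   Open a new line above the current line
  • Insert mode --- Esc ---> Command mode
  • Command mode --- : ---> Extended command mode
  • Extended command mode --- Esc or Enter ---> Command mode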

6.2 Common Commands in Extended Command Mode

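w           Write the buffer to disk
wq          Write and quit
q!          Quit without saving
wq!         Force write and quit
r file      Read the contents of file into the current file
w file      Write the current contents to another file
!command    Run a command
r!command   Read in the output of a command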

6.3 Address delimitation

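/part1/,/part2/     #From the first line matched by part1 through the first line matched by part2
/pattern/           #Search downward from the current line to the first line matching pattern
%           #The whole file, equivalent to 1,$
$           #The last line
.,$-1       #From the current line to the second-to-last line
#           #Line number #
#,#         #From the start line on the left to the end line on the right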

6.4 Address delimitation followed by an editing command

d       #Delete
y       #Copy (yank)
w file  #Save the lines in the range to the given file
r file  #Insert the entire contents of the given file at the given position

6.5 Find and Replace

format

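s/search pattern/replacement/flags
Flags:
    i       #Ignore case
    g       #Global replacement; by default only the first occurrence on each line is replaced
    gc      #Global replacement, asking for confirmation before each one
Search pattern: basic regular expressions can be used
Replacement: patterns cannot be used, but back references such as \1, \2, ... are allowed, and "&" refers to the entire text matched by the search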

6.6 Customize vim work features

Configuration file

/etc/vimrc      #Global
~/.vimrc        #Per-user

Common features:

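Line numbers
    Show: set number, short form set nu
    Hide: set nonumber, short form set nonu
Paste while preserving formatting
    Enable: set paste
    Disable: set nopaste
Tab width in spaces
    Set: set tabstop=# displays each Tab as # columns wide (add set expandtab to insert spaces instead of a Tab)
    Short form: set ts=#
Cursor line highlight
    Enable: set cursorline, short form set cul
    Disable: set nocursorline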

6.7 Command Mode

Jump between characters:

h: left   l: right   j: down   k: up

Jump between words:

w: the beginning of the next word

e: the end of the current or next word

b: the beginning of the current or previous word

Jump within the current page:

H: top of page   M: middle line of page   L: bottom of page

zt: move the line the cursor is on to the top of the screen

zz: move the line the cursor is on to the middle of the screen

zb: move the line the cursor is on to the bottom of the screen

Jump to the beginning or end of the line:

^: the first non-blank character

0: start of line

$: end of line

Move between lines:

#G or :#  jump to line #

G: last line

1G or gg: first line

Move between sentences:

): next sentence   (: previous sentence

Scroll the screen:

Ctrl+f: scroll one screen toward the end of the file   Ctrl+b: scroll one screen toward the beginning of the file

Ctrl+d: scroll half a screen toward the end of the file   Ctrl+u: scroll half a screen toward the beginning of the file

Delete commands:

d: delete; combine with a cursor motion to delete a range

d$: delete to the end of the line

d^: delete back to the first non-blank character of the line

dd: cut the line the cursor is on

#dd: delete # lines

D: delete from the cursor position to the end of the line, equivalent to d$

Copy commands:

y$: copy to the end of the line

y0: copy to the beginning of the line

y^: copy back to the first non-blank character of the line

yy: copy the current line

#yy: copy # lines

Y: copy the entire line

Find:

/pattern: search from the cursor toward the end of the file

?pattern: search from the cursor toward the beginning of the file

n: repeat the search in the same direction as the search command

N: repeat the search in the opposite direction

Undo changes:

u: undo the most recent change

#u: undo the last # changes

U: undo all changes made to the current line since the cursor moved onto it

.: repeat the previous operation

#.: repeat the previous operation # times

7. Basics of shell scripting

Shell script: a text file containing commands, statements and symbols arranged in a defined format

Format requirement: the first line uses the shebang mechanism to name the interpreter

#!/bin/bash
#!/usr/bin/python
#!/usr/bin/perl
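
The effect of the shebang line can be seen by running the same file in two ways. This is a sketch under stated assumptions: the script content is a made-up demo, mktemp is available, and the temporary directory allows direct execution of files.

```shell
# Write a tiny demo script to a temporary file (hypothetical content).
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
# Greet the first argument, or "world" when none is given.
echo "hello, ${1:-world}"
EOF

first_line=$(head -n 1 "$script")

# Run directly: the kernel reads the shebang and starts /bin/bash.
# This needs execute permission on the file.
chmod +x "$script"
out_direct=$("$script" shell)

# Run through an explicit interpreter: the shebang is just a comment here,
# and no execute permission is needed.
out_bash=$(bash "$script")

rm -f "$script"
echo "$first_line | $out_direct | $out_bash"
```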

7.1 Script example

  • Write the script disk.sh to display the maximum value of space utilization in the current hard disk partition
[root@CentOS8 script]#bash disk.sh
17
[root@CentOS8 script]#cat disk.sh
#!/bin/bash
df -h| awk -F" +|%" '/\/dev\/sd/{print $5}'|sort -nr|head -n1
  • Write the script system_info.sh to display the current host system information, including: host name, IPv4 address, operating system version, kernel version, CPU model, memory size, hard disk size
[root@CentOS8 script]#cat system_info.sh 
#!/bin/bash
RED="\E[1;31m"
GREEN="echo -e \E[1;32m"
END="\E[0m"
$GREEN-------------------Host systeminfo--------------------$END
echo -e "HOSTNAME:  $RED`hostname`$END"
echo -e "IPADDR:    $RED`ifconfig ens33|grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' |head -n1`$END"
echo -e "OSVERSION: $RED`cat /etc/redhat-release`$END"
echo -e "KERNEL:    $RED`uname -r`$END"
echo -e "CPU:       $RED`lscpu|grep 'Model name'|tr -s ' '|cut -d: -f2`$END"
echo -e "MEMORY:    $RED`free -h|grep Mem|tr -s ' ' :|cut -d: -f2`$END"
echo -e "DISK:      $RED`lsblk|awk '/^sd/{print $1,$4}'`$END"
$GREEN------------------------------------------------------$END
[root@CentOS8 script]#
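
As a variation on disk.sh, the maximum can be computed entirely inside awk and wrapped in a function that reads df-style text from stdin, which makes it testable without real disks. The max_usage name is hypothetical, and the sample rows are copied from the df output shown earlier in this article.

```shell
# Hypothetical helper: print the highest Use% among /dev/sd* rows on stdin.
max_usage() {
    awk '$1 ~ /^\/dev\/sd/ {
             sub(/%/, "", $5)                 # strip the percent sign
             if ($5 + 0 > max) max = $5 + 0   # keep the largest value seen
         }
         END { print max + 0 }'               # prints 0 when no row matched
}

# Sample input taken from the df output earlier in this article.
sample_df='Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda2      104806400 2311224 102495176   3% /
/dev/sda5       52403200  402140  52001060   1% /data
/dev/sda1        1038336  172128    866208  17% /boot'

result=$(printf '%s\n' "$sample_df" | max_usage)
echo "$result"
```

On a real host the function would be fed from df directly, for example df | max_usage.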

Origin blog.51cto.com/13618052/2644383