The Four Musketeers of File Processing

Table of contents

Preface

1. Regular expressions

2. grep

3. find

4. sed

5. awk



Preface

The "Four Musketeers" of file processing are four command-line tools used constantly on Linux and Unix systems: grep, find, sed, and awk. Together they cover searching for files and searching, editing, and analyzing text.

1. awk is a text-processing tool that handles text files according to user-specified rules. It splits each line into fields on a separator and can operate on whole lines or on individual fields (columns). It also supports conditional statements, loops, and functions, which makes it suitable for complex text processing and data analysis.

2. find is a tool for searching files. It can find files that meet specified conditions in a specified folder and its subfolders. Users can search for files based on file name, date, size and other criteria. It also supports advanced searches using regular expressions.

3. grep is a tool used to search text. It can find lines that satisfy a specified pattern in a text file. Users can search text based on keywords, regular expressions and other conditions. It also supports features such as recursive search, ignoring case, and showing context.

4. sed is a stream editor used to edit and convert text. It can perform operations such as replacing, deleting, inserting and printing text according to specified rules. It also supports functions such as regular expressions, conditional statements, and loops for flexible text processing.

These four tools are often used in the command line. They can be used individually or combined to achieve more complex operations. They provide powerful and flexible capabilities in text processing, data analysis, and file searching.
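As a rough illustration of combining the four tools, the sketch below counts error lines across log files. The file names and log contents are hypothetical sample data created in a temporary directory, not part of the original article.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: two small log files in a temporary directory.
workdir=$(mktemp -d)
printf 'INFO start\nERROR disk full\n' > "$workdir/a.log"
printf 'ERROR net down\nINFO done\n' > "$workdir/b.log"

# find selects the files, grep keeps the matching lines (-h hides file names),
# sed rewrites the matched text, and awk counts the lines that came through.
error_count=$(find "$workdir" -type f -name '*.log' -exec grep -h 'ERROR' {} + \
  | sed 's/ERROR/error:/' \
  | awk 'END {print NR}')
echo "$error_count"

rm -rf "$workdir"
```

Each stage reads the previous stage's standard output, so any one of them can be swapped out or removed without touching the others.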


1. Regular expressions

Before starting, prepare a practice file named test.txt:

## Create the file
vim test.txt
## Insert the following lines
shirt
short
good
food
wood
wooooooood
gooood
adcxyzxyzxyz
abcABC
best
besssst
ofion
ofson
ofison
AxyzxyzC
test
tast
hoo
boo
joo

a) Find specific characters: cat test.txt | grep -n 'text to find', where -n prints the line number of each matching line

For example, to search for hoo: cat test.txt | grep -n 'hoo'
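The sample file can also be created without an editor. The sketch below writes the same twenty lines with a here-document and then runs the hoo search; placing the file in a temporary directory is an assumption made here so the example does not touch existing files.

```shell
#!/usr/bin/env bash
# Write the practice file into a temporary directory (assumed location).
workdir=$(mktemp -d)
cat > "$workdir/test.txt" <<'EOF'
shirt
short
good
food
wood
wooooooood
gooood
adcxyzxyzxyz
abcABC
best
besssst
ofion
ofson
ofison
AxyzxyzC
test
tast
hoo
boo
joo
EOF

# -n prefixes each matching line with its line number.
match=$(grep -n 'hoo' "$workdir/test.txt")
echo "$match"   # prints 18:hoo

rm -rf "$workdir"
```

grep can read the file directly, so the leading cat in the article's commands is optional.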

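 b) Use [] to match any one character from a set
    cat test.txt | grep -n 'w[io]' matches lines containing w followed by i or o

    cat test.txt | grep -n '[^w]' matches lines containing at least one character other than w. Inside brackets, ^ negates the set, so this matches nearly every line; to exclude lines that start with w, use '^[^w]' or grep -v '^w'

    cat test.txt | grep -n '^[w]' matches lines that start with w

    cat test.txt | grep -n '[ah]oo' matches lines containing a or h followed by oo (here, hoo)

    cat test.txt | grep -n '[ac]' matches lines containing a or c; a range such as '[a-d]' matches any one character from a through d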
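A small sketch of how bracket sets behave, using a reduced five-word sample (an assumption chosen to keep the output short) rather than the full test.txt:

```shell
#!/usr/bin/env bash
sample=$(mktemp)
printf '%s\n' wood wide good hoo boo > "$sample"

with_w=$(grep 'w[io]' "$sample")        # w followed by i or o
not_w_start=$(grep '^[^w]' "$sample")   # first character is anything but w
a_or_h_oo=$(grep '[ah]oo' "$sample")    # a or h, followed by oo

echo "$with_w"
echo "$not_w_start"
echo "$a_or_h_oo"

rm -f "$sample"
```

The position of ^ decides its meaning: outside the brackets it anchors to the start of the line, inside the brackets (as the first character) it negates the set.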

 c) Match the start of a line with "^" and the end of a line with "$"
    cat test.txt | grep -n '^[a]' matches lines that start with a
    cat test.txt | grep 'C$' matches lines that end with C
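The anchors can be tried on a four-line sample (an assumed subset of test.txt):

```shell
#!/usr/bin/env bash
sample=$(mktemp)
printf '%s\n' abcABC AxyzxyzC best tast > "$sample"

starts_with_a=$(grep -n '^a' "$sample")   # only line 1 begins with a
ends_with_c=$(grep 'C$' "$sample")        # lines whose last character is C

echo "$starts_with_a"
echo "$ends_with_c"

rm -f "$sample"
```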

 d) Match any single character with "." and repeat the preceding character zero or more times with "*"
    cat test.txt | grep -n 'b.o' matches three characters that start with b and end with o
    cat test.txt | grep -n 'oooo*' matches three or more consecutive o characters (ooo followed by zero or more o)
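A short sketch of "." and "*" on a four-word sample (assumed data, chosen so each pattern selects a different set of lines):

```shell
#!/usr/bin/env bash
sample=$(mktemp)
printf '%s\n' boo bo wood gooood > "$sample"

b_any_o=$(grep 'b.o' "$sample")        # b, exactly one character, then o
two_plus=$(grep 'ooo*' "$sample")      # oo followed by zero or more o
three_plus=$(grep 'oooo*' "$sample")   # ooo followed by zero or more o

echo "$b_any_o"
echo "$two_plus"
echo "$three_plus"

rm -f "$sample"
```

Because "*" also allows zero repetitions, a pattern such as 'o*' on its own matches every line; writing the minimum number of o characters before the starred one avoids that.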

e) To match a repetition count with "{}" in a basic regular expression, escape the braces as "\{\}"

1. `cat test.txt | grep -n 'o\{2\}'`
   cat writes the contents of test.txt to standard output, and the pipe (|) passes it to grep. The `-n` option prints the line number of each match, and the regular expression `'o\{2\}'` matches lines containing two consecutive letters o.

2. `cat test.txt | grep -n 'wo\{2,5\}d'`
   Same as above, except that `'wo\{2,5\}d'` matches lines containing w, then two to five consecutive letters o, then d.

3. `cat test.txt | grep -n 'wo\{2,\}d'`
   Here `'wo\{2,\}d'` matches lines containing w, then two or more consecutive letters o, then d.
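The three interval forms can be compared side by side. The sample words below are assumptions: one o, two o, three o, and one word with more than five o.

```shell
#!/usr/bin/env bash
sample=$(mktemp)
printf '%s\n' wod wood woood woooooood > "$sample"

exactly_two=$(grep 'o\{2\}' "$sample")       # any line with oo somewhere
two_to_five=$(grep 'wo\{2,5\}d' "$sample")   # w, 2 to 5 o, then d
two_or_more=$(grep 'wo\{2,\}d' "$sample")    # w, at least 2 o, then d

echo "$exactly_two"
echo "$two_to_five"
echo "$two_or_more"

rm -f "$sample"
```

Note that 'o\{2\}' is not anchored, so a longer run of o still contains two consecutive o and matches; the surrounding w and d in the other two patterns are what enforce the upper bound.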

f) +, one or more of the preceding character (extended regular expression)
    cat test.txt | grep -nE 'wo+d' or cat test.txt | egrep -n 'wo+d'

g) ?, zero or one of the preceding character
    cat test.txt | egrep -n 'g?od'
    ? is a special character in extended regular expressions: the preceding character (here the letter g) may appear 0 times or 1 time. The pattern 'g?od' therefore matches "od" or "god", so every line containing "od" matches, whether or not a g precedes it.
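A sketch of "+" and "?" with grep -E on an assumed six-word sample. The -x option, which requires the whole line to match, is added here to make the effect of "?" visible.

```shell
#!/usr/bin/env bash
sample=$(mktemp)
printf '%s\n' wd wod wood od god good > "$sample"

one_or_more=$(grep -E 'wo+d' "$sample")        # at least one o between w and d
contains_od=$(grep -cE 'g?od' "$sample")       # count of lines containing od
whole_line=$(grep -xE 'g?od' "$sample")        # whole line is od or god

echo "$one_or_more"
echo "$contains_od"
echo "$whole_line"

rm -f "$sample"
```

Without -x the optional g makes no difference to which lines are selected, since any line containing od already matches.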

 h) |, match any one of several alternatives
    cat test.txt | egrep -n 'of|is|on' matches lines containing of, is, or on

 i) (), group part of a pattern
    cat test.txt | egrep -n 't(a|e)st' matches tast or test

 j) ()+, one or more repetitions of a group
    cat test.txt | egrep -n 'A(xyz)+C' matches A, then xyz one or more times, then C
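Alternation and grouping can be checked on a small sample. The words tost and AC are assumptions added here as deliberate non-matches.

```shell
#!/usr/bin/env bash
sample=$(mktemp)
printf '%s\n' ofion ofson test tast tost AxyzxyzC AC > "$sample"

any_of=$(grep -E 'of|is|on' "$sample")      # of, is, or on anywhere in the line
a_or_e=$(grep -E 't(a|e)st' "$sample")      # tast or test, but not tost
repeated=$(grep -E 'A(xyz)+C' "$sample")    # xyz must appear at least once

echo "$any_of"
echo "$a_or_e"
echo "$repeated"

rm -f "$sample"
```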

Commonly used regular expressions:

These patterns use Perl-style shorthand such as \d (digit), \w (letter, digit, or underscore), and \S (non-whitespace). With GNU grep, run them with grep -P; for grep -E, replace \d with [0-9].

    Numbers
        "^[0-9]*[1-9][0-9]*$" //Positive integer
        "^((-\d+)|(0+))$" //Non-positive integer (negative integer or 0)
        "^-[0-9]*[1-9][0-9]*$" //Negative integer
        "^-?\d+$" //Integer
        "^\d+(\.\d+)?$" //Non-negative floating-point number (positive floating-point number or 0)
        "^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$" //Positive floating-point number
        "^((-\d+(\.\d+)?)|(0+(\.0+)?))$" //Non-positive floating-point number (negative floating-point number or 0)
        "^(-?\d+)(\.\d+)?$" //Floating-point number
    Strings
        "^[A-Z]+$" //String of uppercase English letters only
        "^[a-z]+$" //String of lowercase English letters only
        "^[A-Za-z0-9]+$" //String of digits and English letters
        "^\w+$" //String of digits, English letters, or underscores
    Email
        "^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$" //Email address
        "^([\w.-]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$" //Email address, also accepting a bracketed IP address as the domain
    URL
        "^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$" //URL
    IP
        "^(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])$" //IPv4 address
    Tel
        "^((\+?[0-9]{2,4}\-[0-9]{3,4}\-)|([0-9]{3,4}\-))?([0-9]{7,8})(\-[0-9]+)?$" //Phone number
    Date validation
        "^(\d{2}|\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$" //Year-month-day, yyyy-MM-dd or yy-MM-dd format
        "^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$" //Year-month-day, yyyy-MM-dd format
        "^(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/(\d{2}|\d{4})$" //Month/day/year
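One way to put patterns like these to work is a pair of small validation helpers. This is a minimal sketch: the function names is_integer and is_date are hypothetical, and the patterns are written with [0-9] instead of \d so that they run under grep -E.

```shell
#!/usr/bin/env bash
# Succeeds when the argument is an optionally signed integer.
is_integer() {
  printf '%s\n' "$1" | grep -Eq '^-?[0-9]+$'
}

# Succeeds when the argument has the form yyyy-MM-dd with a plausible
# month (01-12) and day (01-31). It does not check days per month.
is_date() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$'
}

for value in 42 -7 4.2 2023-10-20 2023-13-01; do
  if is_integer "$value"; then
    kind=integer
  elif is_date "$value"; then
    kind=date
  else
    kind=other
  fi
  echo "$value -> $kind"
done
```

The -q option suppresses output so that only grep's exit status is used, which is what lets the functions act as conditions in if statements.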

2. grep

1. grep filters the contents of files:
    -r recursively searches every file under the specified directory
    -l prints only the names of files that contain a match, not the matching lines

[root@bogon opt]# grep -lr "good" .
./test.txt
#### This searches the current directory for files that contain "good"

2. Example: grep -rl bash /etc
    lists the names of all files under /etc that contain bash

3. egrep (equivalent to grep -E) supports extended regular expressions
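The -r and -l options can be tried safely on a throwaway directory tree. The directory layout and file contents below are assumptions made for the example.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
mkdir "$workdir/sub"
echo 'good morning' > "$workdir/a.txt"
echo 'nothing here' > "$workdir/b.txt"
echo 'all good'     > "$workdir/sub/c.txt"

# -r descends into sub/, -l prints file names instead of matching lines.
# sort only makes the order of the result predictable.
matches=$(cd "$workdir" && grep -rl 'good' . | sort)
echo "$matches"

rm -rf "$workdir"
```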

3. find

1. find ./ -type f -perm 644 finds regular files with permission 644 under the current directory

[root@localhost opt]# find ./ -type f -perm 644
./test.txt
./1.sh
[root@localhost opt]# ll
total 8
-rw-r--r--  1 root root 767 Aug 21 22:00 1.sh
drwxr-xr-x. 2 root root   6 Oct 31  2018 rh
-rw-r--r--  1 root root  12 Aug 21 21:59 test.txt
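The permission search can be reproduced in a temporary directory. The file names public.txt and private.txt are hypothetical.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
touch "$workdir/public.txt" "$workdir/private.txt"
chmod 644 "$workdir/public.txt"
chmod 600 "$workdir/private.txt"

# -perm 644 matches only files whose mode is exactly rw-r--r--.
mode_644=$(find "$workdir" -type f -perm 644)
echo "$mode_644"

rm -rf "$workdir"
```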

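2. Search by timestamp
    -atime
    -mtime
    -ctime

Searching by timestamp means finding files based on their access time, modification time, or status change time. On Linux, the find command supports all three.

The three options are:

1. -atime: search by access time, the time the file was last read. The option takes a number of days written as +n, -n, or n: more than n days ago, less than n days ago, or exactly n days ago.

Example: find files accessed within the past 30 days

find /path/to/search -type f -atime -30


2. -mtime: search by modification time, the time the file's contents were last changed. It takes the same +n, -n, or n argument.

Example: find files modified within the past 7 days

find /path/to/search -type f -mtime -7


3. -ctime: search by status change time, the time the file's metadata (such as permissions or owner) last changed. Modifying the contents also updates this time. It takes the same argument.

Example: find files whose status changed within the past 24 hours

find /path/to/search -type f -ctime -1


In these examples, `/path/to/search` is the directory to search, and `-type f` restricts the results to regular files, excluding directories.

With these options, files can be selected precisely by access time, modification time, or status change time.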
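A sketch of the -n and +n forms of -mtime. It assumes GNU touch, whose -d option accepts a relative date such as '10 days ago'; the file names are hypothetical.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
touch "$workdir/new.txt"                      # modified just now
touch -d '10 days ago' "$workdir/old.txt"     # GNU touch: backdate the mtime

recent=$(find "$workdir" -type f -mtime -7)   # modified less than 7 days ago
older=$(find "$workdir" -type f -mtime +7)    # modified more than 7 days ago

echo "$recent"
echo "$older"

rm -rf "$workdir"
```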

3. Run a command on the results with -exec or xargs

Find every file in the mail spool directory and delete it:
## -exec
    find /var/spool/mail -type f -exec rm -rf {} \;
## xargs
	find /var/spool/mail -type f | xargs rm -rf
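The two approaches can be compared without touching the real mail spool. The sketch below works in a temporary directory with hypothetical file names, one of which contains a space; it assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do).

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
touch "$workdir/a.tmp" "$workdir/b c.tmp" "$workdir/keep.txt"

# -exec ... {} + hands many file names to a single rm invocation.
find "$workdir" -type f -name 'a*.tmp' -exec rm -f {} +

# -print0 and xargs -0 separate names with a NUL byte, so the space in
# "b c.tmp" is not mistaken for a boundary between two file names.
find "$workdir" -type f -name '*.tmp' -print0 | xargs -0 rm -f

remaining=$(find "$workdir" -type f | wc -l | tr -d ' ')
echo "$remaining"

rm -rf "$workdir"
```

A plain find ... | xargs rm splits its input on whitespace, which is why the NUL-separated form is the safer choice when file names are not under your control.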

4. sed

Syntax: sed [options] 'operation' file
        sed [options] -f scriptfile file

Options
    -e: run the given command or script
    -f: read commands from the given script file
    -h: show help
    -n: suppress automatic printing, so only explicitly printed lines (for example with p) are shown
    -i: edit the file in place
    -r: use extended regular expressions
Operations
    a: append, add the given text on a new line below the current line
    c: change, replace the selected lines with the given text
    d: delete the selected lines
    i: insert, add the given text on a new line above the current line
    p: print
    s: substitute, replace the matched text
    y: transliterate, map characters one-to-one

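1. Print lines that match a condition:
sed -n 'p' test.txt 	# equivalent to cat
sed -n '3p' test.txt	# print line 3
sed -n '3,6p' test.txt	# print lines 3 through 6
sed -n 'p;n' test.txt	# print odd-numbered lines
sed -n 'n;p' test.txt	# print even-numbered lines
sed -n '1,6{p;n}' test.txt	# print the odd-numbered lines between lines 1 and 6
sed -n '5,${p;n}' test.txt	# print every other line starting at line 5
sed -n '/the/p' test.txt	# print lines containing the
sed -n '5,/the/p' test.txt 	# print from line 5 through the next line containing the
sed -n '/the/,10p' test.txt 	# print from the first line containing the through line 10
sed -n '/the/=' test.txt	# print the line numbers of lines containing the
2. Delete lines that match a condition
	nl test.txt | sed '3d'	# delete line 3
	nl test.txt | sed '3,5d'	# delete lines 3 through 5
	nl test.txt | sed '/the/d'	# delete lines containing the
3. Replace text that matches a condition
	nl test.txt | sed 's/the/TTTTTT/'	# replace the first the on every line
	nl test.txt | sed '4s/the/TTTTTT/'	# replace the first the on line 4 only
	nl test.txt | sed 's/l/L/2'		# replace the second l on each line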
To apply any of these changes directly to the source file, pass the file to sed and add the -i option.
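Since test.txt contains no line with "the", the sketch below uses a four-line sample of its own (an assumption) so that the print, delete, and substitute commands each have something to act on. The in-place edit uses -i.bak, which keeps a backup copy and is accepted by both GNU and BSD sed.

```shell
#!/usr/bin/env bash
sample=$(mktemp)
printf '%s\n' 'the cat saw the dog' 'a bird' 'hello' 'the end' > "$sample"

line3=$(sed -n '3p' "$sample")                    # print line 3
odd_lines=$(sed -n 'p;n' "$sample")               # lines 1 and 3
the_numbers=$(sed -n '/the/=' "$sample")          # line numbers containing the
without_the=$(sed '/the/d' "$sample")             # drop lines containing the
first_replaced=$(sed -n '1s/the/THE/p' "$sample") # only the first the changes
second_l=$(sed -n '3s/l/L/2p' "$sample")          # second l on line 3

echo "$line3"
echo "$odd_lines"
echo "$the_numbers"
echo "$without_the"
echo "$first_replaced"
echo "$second_l"

# -i.bak rewrites the file itself and saves the original as "$sample.bak".
sed -i.bak '4s/the/THE/' "$sample"
edited=$(sed -n '4p' "$sample")
echo "$edited"

rm -f "$sample" "$sample.bak"
```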

5. awk

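Syntax
    awk options 'pattern or condition {action}' file1 file2 ...
    awk -f scriptfile file1 file2 ...

Options
	-F
		sets the field separator for each line
	the default separator is whitespace (spaces and tabs)
Built-in variables
	FS: field separator for each line
	NF: number of fields in the current line
	NR: line number of the current line
	$0: entire contents of the current line
	$n: the nth field of the current line
	FILENAME: name of the file being processed
	RS: record separator, \n by default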
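Examples:
	a) Output by line
		awk '{print}' test.txt 	 	# equivalent to cat
		awk 'NR>=1&&NR<=3{print}' test.txt 	# print lines 1 through 3
		awk 'NR==1,NR==3{print}' test.txt # print lines 1 through 3
		awk 'NR%2==0{print}' test.txt 	# print even-numbered lines
	b) Output by field
		Fields are separated by whitespace by default.
		ifconfig ens33 |awk '/netmask/{print $2}' # extract the IP address
		cat /etc/shadow | awk -F : '$2=="!!"{print $1}' # print users who cannot log in to the system
	c) Call a shell command
		awk -F : '/bash$/{print | "wc -l"}' /etc/passwd 	# count the users who can log in to the system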
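The same ideas can be tried without reading system files. The sketch below uses a three-line sample in /etc/passwd format; the accounts in it are made up for the example.

```shell
#!/usr/bin/env bash
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
EOF

first_two=$(awk 'NR>=1 && NR<=2 {print}' "$sample")      # lines 1 and 2
field_count=$(awk -F : 'NR==1 {print NF}' "$sample")     # fields on line 1
bash_users=$(awk -F : '/bash$/ {print $1}' "$sample")    # names with a bash shell
# Pipe awk's output into a shell command; tr strips the padding that
# some wc implementations add around the number.
login_count=$(awk -F : '/bash$/ {print | "wc -l"}' "$sample" | tr -d ' ')

echo "$first_two"
echo "$field_count"
echo "$bash_users"
echo "$login_count"

rm -f "$sample"
```

Counting inside awk with '/bash$/ {n++} END {print n+0}' gives the same number without starting a second process.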

Origin blog.csdn.net/2302_78534730/article/details/132414584