Learning regular expressions on Linux (grep, sed, awk)

Reposted from: https://www.cnblogs.com/menglingqian/p/6783527.html
I. Regular expressions

A regular expression (RE) is a text pattern made up of ordinary characters (such as the letters a to z) and special characters (called metacharacters). The pattern describes one or more strings to match when searching a body of text: it serves as a template against which the searched text is compared. Put simply, regular expressions are a way of processing strings line by line; with the help of a few special symbols, they let the user search for, delete, or replace specific strings with ease. vim, grep, find, awk, sed, and other commands support regular expressions.

Common regular expressions:

1. . matches any single character. For example, /l..e/ matches lines containing an l, followed by any two characters, followed by an e.
   ? makes the preceding character optional (zero or one occurrence) in extended syntax. For example, 'gr?p' matches lines containing gp or grp.

2. ^ anchors the match at the beginning of the line. For example, ^love matches all lines beginning with love.

3. $ anchors the match at the end of the line. For example, love$ matches all lines ending with love, so '^$' matches empty lines.

4. [...] matches one of the characters inside the brackets.
   [abc] matches a single character: a, b, or c
   [123] matches a single character: 1, 2, or 3
   [a-z] matches one lowercase letter from a to z
   [a-zA-Z] matches any one letter
   [0-9a-zA-Z] matches any one letter or digit
   Note the words "single" and "one" above: however complex the contents of [] are, the result is always exactly one character.
   A ^ placed first inside [] negates the set, matching any character except those listed. For example, to find oo that is not preceded by g, use '[^g]oo' as the search string. A caret at the start of [] means negation; anywhere else inside [] it is an ordinary character.
   For example, [^ab^c] matches any single character that is not a, b, c, or ^.

5. * modifies the preceding character: it matches zero or more occurrences of it. For example, 'a*grep' matches lines in which zero or more a's are followed by grep. '.*' stands for any string.

6. \? modifies the preceding character: it matches zero or one occurrence. a\? matches zero or one a.

7. \+ modifies the preceding character: it matches one or more occurrences. a\+ matches one or more a's.

8. \{n,m\} modifies the preceding character: it matches n to m occurrences (n and m are integers, n < m). a\{3,5\} matches 3 to 5 consecutive a's. There are two other forms:
   \{n\} exactly n consecutive occurrences of the preceding character
   \{n,\} at least n consecutive occurrences of the preceding character

9. \ escapes the single special character that follows it, turning it into an ordinary character. For example, ^\.[0-9][0-9] matches lines that start with a period followed by two digits.

Examples:
   a* matches any number of consecutive a's (including zero)
   a\? matches zero or one a
   a\+ matches one or more a's
   a\{3,5\} matches 3 to 5 consecutive a's
   ^[A-Za-z]*[^,][A-Za-z]*$ matches a line made of zero or more letters, one character that is not a comma, then zero or more letters
   \.* matches zero or more consecutive periods; \. denotes an ordinary period

10. | means "or" (extended syntax, for example with grep -E): a|b|c matches a or b or c. For example, grep|sed matches grep or sed.

11. () groups part of the pattern into a single unit (extended syntax). For example, to search for glad or good, write 'g(la|oo)d'.

Comprehensive example 1 (the numbers are line numbers, not part of the text; line 4 is indented):

    1  Christian Scott lives here and will put on a Christmas party.
    2  There are around 30 to 35 people invited.
    3  They are:
    4   Tom
    5  Dan
    6  Rhonda Savage
    7  Nicky and Kimerly.
    8  Steve, Suzanne, Ginger and Larry.

^[A-Z]..$ searches for lines that begin with an uppercase letter from A to Z, followed by any two characters, followed by the end of the line. It finds line 5.
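The first search above can be tried directly. This is a minimal sketch: the sample file is rebuilt in a temporary file with a single leading space on the Tom line (an assumption about the original indentation), and the variable names are illustrative.

```shell
# Rebuild the sample text; line 4 carries a leading space (assumed).
sample=$(mktemp)
printf '%s\n' \
  'Christian Scott lives here and will put on a Christmas party.' \
  'There are around 30 to 35 people invited.' \
  'They are:' \
  ' Tom' \
  'Dan' \
  'Rhonda Savage' \
  'Nicky and Kimerly.' \
  'Steve, Suzanne, Ginger and Larry.' > "$sample"

# An uppercase letter plus any two characters, and nothing else on the line.
# -n prefixes each matching line with its line number.
three_chars=$(grep -n '^[A-Z]..$' "$sample")
echo "$three_chars"

# Grouping and alternation use extended syntax, hence grep -E.
glad_good=$(printf '%s\n' 'glad' 'good' 'gold' | grep -E 'g(la|oo)d')
echo "$glad_good"

# Interval: three to five consecutive a's.
runs=$(printf '%s\n' 'aa' 'aaaa' | grep 'a\{3,5\}')
echo "$runs"

rm -f "$sample"
```

Only Dan is exactly three characters long and starts with an uppercase letter, so the first search reports line 5.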
^[A-Z][a-z]*3[0-5] searches for lines that begin with an uppercase letter, followed by zero or more lowercase letters, then the digit 3, then a digit from 0 to 5. No line matches. (Changed to ^[A-Z][a-z]*.*3[0-5], it finds line 2.)

^ *[A-Z][a-z][a-z]$ searches for lines that begin with zero or more spaces, followed by an uppercase letter, two lowercase letters, and the end of the line. It finds line 4, Tom (the whole line matches), and line 5. Note the space in front of the *.

^[A-Za-z]*[^,][A-Za-z]*$ searches for lines that begin with zero or more uppercase or lowercase letters, followed by one character that is not a comma, then zero or more letters, then the end of the line. The source reports that it finds lines 4 and 5; with the sample as printed here, line 6 (Rhonda Savage) fits the pattern as well.

Comprehensive example 2:

    # ls -l /bin | grep '^...s'
The command above finds SUID files.

    # ls -lR /usr | grep '^...s..s'
The command above finds files that are both SUID and SGID.

II. Using the grep command

grep (global search regular expression (RE) and print out the line) is a powerful text search tool: it searches text using regular expressions and prints the matching lines.

Parameters:

1. -A NUM, --after-context=NUM
   Besides each matching line, also list the NUM lines after it. For example (search file for lines containing panda, and show 1 line after each match):
    $ grep -A 1 panda file

2. -B NUM, --before-context=NUM
   The opposite of -A NUM: besides each matching line, also show the NUM lines before it. For example (search file for lines containing panda, and show 1 line before each match):
    $ grep -B 1 panda file

3. -C [NUM], -NUM, --context[=NUM]
   Besides each matching line, also list NUM lines above and below it; the default is 2 (older grep versions; current GNU grep requires NUM to be given). For example (list the lines of file containing panda plus the lines above and below each; change NUM to change the amount of context):
    $ grep -C [NUM] panda file
4. -c, --count
   Do not display the matching lines, only the total number of lines that match. Combined with -v, --invert-match, it displays the total number of lines that do not match.

5. -i, --ignore-case
   Ignore differences in case.

6. -n, --line-number
   Print the line number in front of each matching line.

7. -v, --invert-match
   Invert the search: show only the lines that do not match.

8. Exact match
   Searching for 48 also returns lines containing strings such as 484 and 483, when only lines containing exactly 48 are wanted. An effective way to get an exact match with grep is to append \> (end of word) to the search string. To extract exactly 48:
    # grep '48\>' filename
   To anchor the start of the word as well, use '\<48\>'.

9. -s
   Do not display error messages about nonexistent or unreadable files. For example, running grep "root" /etc/password prints an error message on the screen because the file password does not exist; with the -s switch, the error message is suppressed.

Using grep well really comes down to writing regular expressions, so not every grep feature is illustrated here; the few examples below show how the regular expressions are written.

    $ ls -l | grep '^d'
Filters the output of ls -l through a pipe, showing only the lines that begin with d.

    $ grep 'test' d*
Shows the lines containing test in all files whose names begin with d.

    $ grep 'test' aa bb cc
Shows the lines containing test in the files aa, bb, and cc.

    $ grep '[a-z]\{5,\}' aa
Shows all lines of aa containing a string of at least 5 consecutive lowercase characters.

    $ grep 't[a|e]st' filename
Shows the lines containing test or tast. Inside brackets | is an ordinary character, so this also matches t|st; 't[ae]st' is the more precise form.

    $ grep '\.$' filename
Shows all lines that end with a period.

III. Using the sed command

sed is a stream editor that processes content one line at a time. During processing, the line currently being handled is stored in a temporary buffer called the "pattern space"; the sed commands are then applied to the contents of the buffer, and when processing is complete the contents of the buffer are sent to the screen.
sed then moves on to the next line, and this repeats until the end of the file. The contents of the file are not changed unless you use redirection to store the output.

sed basic commands:

1. Substitution: the s command

1.1 Basic usage

For example:
    sed 's/day/night/' <old >new
This replaces the first occurrence of day on each line of the file old with night and writes the output to the file new.
    s          the substitute command
    /../../    the delimiters
    day        the search string
    night      the replacement string

The delimiter "/" can in fact be replaced by another symbol. For example,
    sed 's/\/usr\/local\/bin/\/common\/bin/' <old >new
is equivalent to
    sed 's_/usr/local/bin_/common/bin_' <old >new

1.2 & stands for the matched string

Sometimes you want to add characters around the matched string. For example,
    sed 's/abc/(abc)/' <old >new
puts parentheses around every abc that is found. It can also be written as
    sed 's/abc/(&)/' <old >new
A more complex example:
    sed 's/[a-z]*/(&)/' <old >new

By default sed replaces only the first occurrence of the search string on each line; /g replaces all occurrences.

    $ sed 's/test/mytest/g' example
Replaces every test on each line with mytest. Without the g flag, only the first test on each line is replaced.

    $ sed 's/^192.168.0.1/&localhost/' example
The & symbol stands for the part of the line that was matched. Every line beginning with 192.168.0.1 has localhost appended to the match, giving 192.168.0.1localhost. (The unescaped dots match any character; write 192\.168\.0\.1 to match literal periods only.)

    $ sed 's#10#100#g' example
Whatever character follows the s command is taken as the delimiter, so here "#" is the delimiter in place of the default "/". Every 10 is replaced with 100.
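The substitutions described above can be checked on short inputs. This is a minimal sketch: the input strings and variable names are made up for illustration.

```shell
# Without g, only the first match on the line is replaced.
first_only=$(printf 'day after day\n' | sed 's/day/night/')
echo "$first_only"

# With g, every match on the line is replaced.
all_matches=$(printf 'day after day\n' | sed 's/day/night/g')
echo "$all_matches"

# An alternative delimiter avoids escaping the slashes in a path.
path_swap=$(printf '/usr/local/bin/tool\n' | sed 's_/usr/local/bin_/common/bin_')
echo "$path_swap"

# & in the replacement stands for the matched text.
wrapped=$(printf 'abc def\n' | sed 's/abc/(&)/')
echo "$wrapped"

# "#" as the delimiter, combined with g.
hundreds=$(printf '10 apples, 10 pears\n' | sed 's#10#100#g')
echo "$hundreds"
```

Piping the input through printf keeps the examples independent of any file; with real files the same commands take a file name or a redirection as shown in the text.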
If you need to make several changes to the same file or line, you can use the "-e" option, once per command.

Getting the IP address of the eth0 network card:
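The original command is not preserved in this copy. What follows is only a sketch of how -e could be applied to this case, assuming the `inet addr:` layout printed by older ifconfig versions. The sample line and its addresses are made up, and newer tools print a different layout, so the patterns would need adapting.

```shell
# Hypothetical line in the style of older `ifconfig eth0` output.
line='          inet addr:192.168.1.100  Bcast:192.168.1.255  Mask:255.255.255.0'

# Two edits in one sed call, one per -e:
#   1. strip everything up to and including "addr:"
#   2. strip everything from "Bcast" to the end of the line
ip=$(printf '%s\n' "$line" | sed -e 's/^.*addr://' -e 's/ *Bcast.*$//')
echo "$ip"
```

On a machine that prints this format, the same two edits could follow `ifconfig eth0 | grep 'inet addr'`.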

2. Deleting lines: the d command

Delete all lines containing "how" from a file

Display the contents of /etc/passwd with line numbers, deleting lines 2 to 5

Note: on Linux the nl command numbers the lines of a file; it outputs the contents of the file with line numbers added automatically.

To delete only the second line, use nl /etc/passwd | sed '2d'; to delete from the third line to the last line, use nl /etc/passwd | sed '3,$d'.
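The deletions described above can be sketched on a small temporary file instead of /etc/passwd. The file contents and variable names are made up for illustration.

```shell
demo=$(mktemp)
printf '%s\n' 'how are you' 'fine, thanks' 'show me' \
  'line four' 'line five' 'line six' > "$demo"

# Delete every line containing "how" (this also removes "show me").
no_how=$(sed '/how/d' "$demo")
echo "$no_how"

# Number the lines with nl, then delete lines 2 through 5.
kept=$(nl "$demo" | sed '2,5d')
echo "$kept"

# Delete from line 3 to the last line; $ addresses the last line.
first_two=$(nl "$demo" | sed '3,$d')
echo "$first_two"

rm -f "$demo"
```

Note that /how/ is a substring match, which is why "show me" is deleted too.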

3. Adding lines: the a command (append after the specified line) or the i command (insert before the specified line)

The command can be followed by a string, which appears on a new line

Add a new line reading "XXXXX" after the second line of /etc/passwd

Add a new line reading "XXXXX" before the second line of /etc/passwd

To add several lines, end each line with a backslash \ to continue onto the next new line
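The a and i commands can be sketched as follows. The sketch uses the portable form, in which the command is followed by a backslash and the text starts on the next line; the three-line input is made up for illustration.

```shell
demo=$(mktemp)
printf '%s\n' 'first' 'second' 'third' > "$demo"

# a\ appends the text after line 2.
appended=$(sed '2a\
XXXXX' "$demo")
echo "$appended"

# i\ inserts the text before line 2.
inserted=$(sed '2i\
XXXXX' "$demo")
echo "$inserted"

# Several lines: every added line except the last ends with a backslash.
multi=$(sed '2a\
XXXXX\
YYYYY' "$demo")
echo "$multi"

rm -f "$demo"
```

GNU sed also accepts the text on the same line as the command (for example '2a XXXXX'), but that one-line form is an extension.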

4. Replacing lines: the c command

c can be followed by a string, which replaces the lines from n1 to n2
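A minimal sketch of the c command with a line range. The input and the replacement text are made up for illustration.

```shell
# Replace lines 2 through 3 with a single line of text.
replaced=$(printf '%s\n' 'one' 'two' 'three' 'four' | sed '2,3c\
No 2-3 lines')
echo "$replaced"
```

The whole range is replaced by one copy of the text, not one copy per line.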

5. Printing: the p command

sed '/north/p' datafile
By default sed outputs every line, so the lines containing north are printed twice.

sed -n '/north/p' datafile
-n suppresses the default output, so only the lines containing north are printed.

nl /etc/passwd | sed -n '5,7p'
Lists only lines 5 to 7 of /etc/passwd.
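The effect of -n on the p command can be seen by counting output lines. This is a sketch on a made-up seven-line datafile; the region names are illustrative only.

```shell
datafile=$(mktemp)
printf '%s\n' 'northwest NW' 'western WE' 'north NO' 'southern SO' \
  'eastern EA' 'central CT' 'northeast NE' > "$datafile"

# Without -n: all 7 lines are output, and the 3 matching lines appear twice.
twice=$(sed '/north/p' "$datafile")
printf '%s\n' "$twice" | grep -c ''

# With -n: only the lines selected by p are output.
only=$(sed -n '/north/p' "$datafile")
echo "$only"

# A line range works the same way.
range=$(sed -n '5,7p' "$datafile")
echo "$range"

rm -f "$datafile"
```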

 

Note: the -i option makes sed modify the contents of the file directly
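A cautious way to try -i is to attach a backup suffix, so the original content survives. This sketch assumes GNU or BSD sed (the -i option is not part of POSIX); the file name and contents are made up.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'day one' 'day two' > "$workdir/notes.txt"

# Edit the file in place; the original is kept as notes.txt.bak.
sed -i.bak 's/day/night/' "$workdir/notes.txt"

edited=$(cat "$workdir/notes.txt")
backup=$(cat "$workdir/notes.txt.bak")
echo "$edited"
echo "$backup"

rm -rf "$workdir"
```

Writing the suffix directly after -i, with no space, is the form both implementations accept.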

6. Further notes:

sed can be invoked in three ways:

- Type the commands on the command line

- Put the sed commands in a script file, then call sed with that file

- Put the sed commands in a script file and make the script executable.

A. Using sed on the command line, the format is:

sed [options] 'sed-commands' input-file

Remember that when sed is used on the command line, the actual commands should be enclosed in single quotes. sed also allows double quotes.

 

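B. Using a sed script file, the format is:

sed [options] -f sed-script-file input-file

C. To use a sed script file whose first line names the sed command interpreter, the format is:

sed-script-file [options] input-file

Whether the shell command line or a script file is used, if no input file is specified, sed takes its input from standard input, usually the keyboard or redirected output.

The sed options are as follows:

-f, --file=script-file    gives the name of the sed script file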
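Form B can be sketched as follows. The script name edit.sed and the file contents are hypothetical; the script holds one sed command per line.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'one day' 'day by day' > "$workdir/old"

# The script file: replace every day with night, then delete line 1.
printf '%s\n' 's/day/night/g' '1d' > "$workdir/edit.sed"

# Run the script against an input file.
from_file=$(sed -f "$workdir/edit.sed" "$workdir/old")
echo "$from_file"

# With no input file, sed reads standard input.
from_stdin=$(printf '%s\n' 'one day' 'day by day' | sed -f "$workdir/edit.sed")
echo "$from_stdin"

rm -rf "$workdir"
```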

 

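V. The awk command:

awk is also a data-processing tool! Whereas sed usually works on an entire line at a time, awk tends to split a line into several fields and process those.

The most basic function of the awk language is to break down and extract information from files or strings according to specified rules; it can also output data according to specified rules.

There are three ways to invoke awk

1. Command-line mode

awk  [-F field-separator]  'commands'  input-files

Here [-F field-separator] is optional, because awk uses spaces or tabs as the default field separator; to read text whose fields are separated by spaces, there is no need to give this option. To read a file such as passwd, whose fields are separated by colons, the -F option must be given, for example: awk -F: 'commands' input-file.

commands are the actual awk commands, and input-files are the files to process.

In the output, the fields are split on : and the first field of each line is taken.

input-files can be a list of more than one file; awk processes each file in the list in order.

In awk, each item on a line of the file that is delimited by the field separator is called a field. Normally, when no -F field separator is given, the default field separator is a space or a tab.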
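A minimal sketch of the command-line form, run on a two-line sample in passwd format rather than the real /etc/passwd. The sample entries and variable names are made up for illustration.

```shell
passwd_sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' > "$passwd_sample"

# -F: splits each line on colons; $1 is the first field.
names=$(awk -F: '{ print $1 }' "$passwd_sample")
echo "$names"

# Without -F, runs of spaces and tabs separate the fields.
second=$(printf 'alpha beta\tgamma\n' | awk '{ print $2 }')
echo "$second"

rm -f "$passwd_sample"
```

On a real system the first command would be written as awk -F: '{ print $1 }' /etc/passwd.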

 

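2. Shell-script mode

Put all the awk commands into a file and make the awk program executable, with the awk command interpreter named on the first line of the script, so that the script can be invoked by typing its name.

This amounts to replacing the first line of a shell script, #!/bin/sh, with #!/bin/awk -f (the -f is needed so that awk reads its program from the script file; the path to awk varies between systems).

3. Put all the awk commands into a separate file, then invoke:

awk   -f   awk-script-file         input-files

Here the -f option loads the awk script from awk-script-file, and input-files is the same as above.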
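The third form can be sketched as follows. The script name fields.awk and the sample passwd entry are hypothetical.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' > "$workdir/passwd"

# The awk program lives in its own file.
printf '%s\n' '{ print $1, $7 }' > "$workdir/fields.awk"

# -f loads the program; -F: still sets the field separator.
out=$(awk -F: -f "$workdir/fields.awk" "$workdir/passwd")
echo "$out"

rm -rf "$workdir"
```

The comma in print $1, $7 separates the two output fields with a space by default.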

 

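awk patterns and actions

Every awk statement consists of a pattern and an action (awk_pattern { actions }).
An awk script may contain many statements.

The pattern part determines when the action statements are triggered. The action is the operation performed on the data. If the pattern part is omitted, the action is always executed; that is, the corresponding actions run for every input record without any matching being done.

A pattern can be any conditional statement, regular expression, and so on. awk_pattern can be one of the following types:

1) A regular expression used as awk_pattern: /regexp/

For example: awk '/^[a-z]/' input_file

2) A Boolean expression used as awk_pattern; when the expression is true, the corresponding actions are executed.

① Variables (such as the field variables $1, $2, and so on) and /regexp/ can be used in the expression

② Operators in Boolean expressions:

Relational operators: < > <= >= == !=
Matching operators: value ~ /regexp/ is true if value matches /regexp/
value !~ /regexp/ is true if value does not match /regexp/
For example: awk '$2 > 10 {print "ok"}' input_file
      awk '$3 ~ /^d/ {print "ok"}' input_file

③ && (and) and || (or) can join two /regexp/ or Boolean expressions to form a compound expression. ! (not) can be placed in front of a Boolean expression or a /regexp/.

For example: awk '($1 < 10 ) && ($2 > 10) {print "ok"}' input_file
      awk '/^d/ || /x$/ {print "ok"}' input_file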
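The pattern types above can be tried on a small three-column input. This is a sketch: the data and variable names are made up, and the regular expressions are adjusted to fit that data.

```shell
data=$(mktemp)
printf '%s\n' '5 20 delta' '12 8 echo' '3 15 dix' > "$data"

# Relational pattern: print field 1 where field 2 exceeds 10.
big_second=$(awk '$2 > 10 { print $1 }' "$data")
echo "$big_second"

# Matching operator: field 3 starts with d.
d_third=$(awk '$3 ~ /^d/ { print $3 }' "$data")
echo "$d_third"

# Compound pattern with &&.
both=$(awk '($1 < 10) && ($2 > 10) { print "ok" }' "$data")
echo "$both"

# Two regular expressions joined with ||; with no action given,
# the matching records themselves are printed.
either=$(awk '/^1/ || /x$/' "$data")
echo "$either"

rm -f "$data"
```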

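Patterns include two special ones, BEGIN and END. Use the BEGIN statement to set up counters and print headers. The BEGIN statement runs before any text-scanning action; after it, the text-scanning actions are executed against the input text. The END statement is used to print totals and a closing status once awk has finished scanning the text.

The actual actions are given inside braces { }. Actions are mostly used for printing, but there is also longer code such as if statements, loops, and loop-exit constructs. If no action is given, awk prints every record that was selected.

When awk runs, the fields it scans are labelled $1, $2 ... $n. This is called field identification. Using these field identifiers makes it easier to process the fields further.

Use $1, $3 to refer to the first and third fields; note that a comma separates the fields here. To print all the fields of a record that has 5 fields, there is no need to write $1, $2, $3, $4, $5; use $0, which means all fields.

To print one field or all fields, use the print command. This is an awk action.

How awk runs:

①  If a BEGIN block exists, awk executes the actions it specifies.

②   awk reads one line from the input file, called an input record. (If the input file is omitted, it reads from standard input.)

③   awk splits the record it has read into fields, putting the first field into the variable $1, the second into $2, and so on. $0 stands for the whole record.

④   The current input record is compared in turn against the awk_pattern of each awk_cmd to see whether it matches. If it matches, the corresponding actions are executed; if not, they are skipped. This continues until every awk_cmd has been compared.

⑤   Once an input record has been compared against all the awk_cmd, awk reads the next line of input and repeats steps ③ and ④. This continues until awk reaches the end of the file.

⑥   After awk has read all the input lines, if END exists, the corresponding actions are executed.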
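The run sequence above can be sketched with one BEGIN block, one main rule, and one END block. The names and scores are made up for illustration.

```shell
summary=$(printf '%s\n' 'alice 3' 'bob 5' 'carol 4' |
  awk 'BEGIN { print "name score" }           # runs once, before any input
       { total += $2; print $1, $2 }          # runs for every record
       END { print "records:", NR, "total:", total }')  # runs once, after the last record
echo "$summary"
```

NR is the built-in count of records read so far, so in the END block it holds the total number of input lines.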

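Examples:

Example 1: display the user names and login shells in the /etc/passwd file

Display the accounts in /etc/passwd and their shells, with the account and shell separated by a tab

Display the user names and login shells in the /etc/passwd file, with the account and shell separated by a comma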
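The original commands for this example are not preserved in this copy. A minimal sketch of how the tab-separated and comma-separated variants might be written, run on a made-up two-line sample in passwd format:

```shell
passwd_sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' > "$passwd_sample"

# Field 1 is the account and field 7 the login shell; join them with a tab.
tabbed=$(awk -F: '{ print $1 "\t" $7 }' "$passwd_sample")
echo "$tabbed"

# The same two fields joined with a comma.
comma=$(awk -F: '{ print $1 "," $7 }' "$passwd_sample")
echo "$comma"

rm -f "$passwd_sample"
```

Against the real file, the sample path would be replaced with /etc/passwd.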

 

 

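Notes:

1. awk is followed by a pair of single quotes enclosing braces {} that set the processing to apply to the data

2. The awk workflow is as follows: first BEGIN is executed, then the file is read. awk reads in one record, delimited by the \n newline character, then splits the record into fields using the specified field separator and fills in the fields. $0 stands for all the fields, $1 for the first field, $n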

Origin www.cnblogs.com/zzzao/p/11493002.html