The Linux Three Musketeers: an introduction to using grep, sed, and awk

grep, sed, and awk are often called the Three Musketeers of Linux. Skillful use of these three tools can noticeably improve the efficiency of operations and maintenance work. All three are built on regular expressions, and Linux supports two flavors of them: "standard" (basic) regular expressions and "extended" regular expressions. This article covers regular expressions first and then explains how to use each of the three tools.

First, standard (basic) regular expressions

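Regular expression: REGular EXPression, REGEXP
Metacharacters:
.: matches any single character
[]: matches any single character within the specified set
[^]: matches any single character outside the specified set
 Character classes: [:digit:], [:lower:], [:upper:], [:punct:], [:space:], [:alpha:], [:alnum:]
    Note: a character class must itself be enclosed in [ ], for example [[:digit:]]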

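Repetition (greedy):
*: matches the preceding character any number of times, including zero
 Sample strings: a, b, ab, aab, acb, adb, amnb
 Patterns to try: a*b, a\?b, a.*b

 .*: any characters, of any length
\?: matches the preceding character 0 or 1 times
\+: matches the preceding character at least once
\{m,n\}: matches the preceding character at least m and at most n times
 \{1,\}: at least once
 \{0,3\}: at most 3 times
    Note: a lower bound of 0 must be written out explicitly.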
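
The repetition operators can be tried directly against the sample strings listed above. The following is a minimal sketch, assuming GNU grep (the \? operator is a GNU extension in basic syntax); the helper name match is invented for this illustration.

```shell
# Sample strings from the list above, one per line.
samples=$(printf '%s\n' a b ab aab acb adb amnb)

# match PATTERN: print the samples that the pattern matches in full (-x),
# joined on one line. "match" is a helper name made up for this sketch.
match() {
  printf '%s\n' "$samples" | grep -x "$1" | paste -sd ' ' -
}

star=$(match 'a*b')     # zero or more "a", then "b"
opt=$(match 'a\?b')     # an optional "a", then "b"
any=$(match 'a.*b')     # "a", then anything of any length, then "b"

printf '%-5s -> %s\n' 'a*b' "$star" 'a\?b' "$opt" 'a.*b' "$any"
```

The -x option forces the pattern to cover the whole line, which makes the difference between the three patterns visible; without it, every sample containing a "b" would match a*b.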

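Position anchors:
^: anchors the start of the line; whatever follows it must appear at the beginning of the line
$: anchors the end of the line; whatever precedes it must appear at the end of the line
^$: an empty line

\< or \b: anchors the start of a word; whatever follows it must appear at the beginning of a word
\> or \b: anchors the end of a word; whatever precedes it must appear at the end of a word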

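Grouping:
\(\)
 \(ab\)*
 Back-references
 \1: refers to everything enclosed by the first left parenthesis and its matching right parenthesis
 \2: likewise for the second left parenthesis
 \3: likewise for the third left parenthesis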
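
A short sketch of the anchors and back-references described above, assuming GNU grep (the word anchors \< and \> are GNU extensions). The input strings are made up for the illustration.

```shell
# Word anchors: count the lines that contain "cat" as a whole word.
whole_word=$(printf '%s\n' 'cat' 'concatenate' 'a cat sat' | grep -c '\<cat\>')

# Line anchors: count the empty lines with ^$.
empty=$(printf '%s\n' 'one' '' 'two' '' | grep -c '^$')

# Back-reference: \(..\) captures two characters and \1 requires the
# same two characters to appear again.
doubled=$(printf '%s\n' 'abab' 'abba' 'xyxy' | grep '^\(..\)\1$' | paste -sd ' ' -)

echo "whole-word matches: $whole_word"
echo "empty lines: $empty"
echo "doubled pairs: $doubled"
```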

As you can see, standard regular expressions require many symbols to be escaped, which is inconvenient in day-to-day work. Extended regular expressions were introduced to address this.

Second, extended regular expressions

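1. Character matching:
.
[abc]: any one of the characters a, b, c
[^abc]: any one character other than a, b, c

2. Repetition (no escaping needed):
*: matches the preceding character any number of times
?: matches the preceding character 0 or 1 times
+: matches the preceding character at least once
{m,n}: matches the preceding character at least m and at most n times

3. Position anchors:
^
$
\<
\>

4. Grouping (no escaping needed):
(): grouping
\1, \2, \3, ...

5. Alternation
|: or
C|cat: C or cat (the alternation covers the whole expression on each side, not just the adjacent character)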

As you can see, extended regular expressions let you omit many of the escape symbols, which greatly improves readability, especially when writing sed statements. Prefer extended regular expressions where possible.
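
To make the difference concrete, here is the same pattern written in both syntaxes, followed by the two alternation forms. This is a sketch assuming GNU grep; the input line and word list are invented for the example.

```shell
line='backup finished 2019-10-31 ok'

# Standard syntax: grouping and {m,n} need backslashes.
bre=$(printf '%s\n' "$line" | grep -o '\([0-9]\{2,4\}-\)\{2\}[0-9]\{2\}')
# Extended syntax (-E): the same pattern without the escapes.
ere=$(printf '%s\n' "$line" | grep -oE '([0-9]{2,4}-){2}[0-9]{2}')

# Alternation: C|cat means "C" or "cat"; (C|c)at means "Cat" or "cat".
words=$(printf '%s\n' Cat cat C dog)
whole=$(printf '%s\n' "$words" | grep -xE 'C|cat' | paste -sd ' ' -)
grouped=$(printf '%s\n' "$words" | grep -xE '(C|c)at' | paste -sd ' ' -)

echo "standard: $bre"
echo "extended: $ere"
echo "C|cat:    $whole"
echo "(C|c)at:  $grouped"
```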

Third, the grep command family

3.1 The grep commands

The grep family consists of three commands, grep, egrep, and fgrep, each suited to a different scenario:
Command   Description
grep      The original grep command; uses "standard regular expressions" as the matching criteria.
egrep     Extended grep, equivalent to grep -E; uses "extended regular expressions" as the matching criteria.
fgrep     A simplified grep (equivalent to grep -F) that does not support regular expressions, but searches quickly and uses few system resources.
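
The three behaviors can be compared on one small input. The sketch below uses grep -E and grep -F, the option forms of egrep and fgrep, and assumes GNU grep; the sample lines are invented.

```shell
lines=$(printf '%s\n' 'a.c' 'abc' 'a+c' 'aac')

# Standard syntax: "." matches any character, "+" is an ordinary character.
basic_dot=$(printf '%s\n' "$lines" | grep 'a.c' | paste -sd ' ' -)
basic_plus=$(printf '%s\n' "$lines" | grep 'a+c' | paste -sd ' ' -)
# Extended syntax: "+" means "one or more of the preceding character".
extended=$(printf '%s\n' "$lines" | grep -E 'a+c' | paste -sd ' ' -)
# Fixed strings: no metacharacters at all, "." is just a dot.
fixed=$(printf '%s\n' "$lines" | grep -F 'a.c' | paste -sd ' ' -)

echo "grep    'a.c': $basic_dot"
echo "grep    'a+c': $basic_plus"
echo "grep -E 'a+c': $extended"
echo "grep -F 'a.c': $fixed"
```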

3.2 Usage

Syntax
grep [OPTIONS] PATTERN [FILE...]
Options
-i: ignore case
--color: highlight the matched string
-v: show the lines that do not match the pattern
-o: show only the part of each line that matches the pattern
-E: use extended regular expressions
PATTERN
The string to match; it may be an ordinary string or a regular expression (standard or extended).
FILE
The file or files to search.
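
A brief sketch of the options above, run against a few invented log lines rather than a real file (grep reads standard input when no FILE is given). The -o option is assumed to be available, as it is in GNU grep.

```shell
log=$(printf '%s\n' 'ERROR disk full' 'info started' 'Error timeout' 'warning low memory')

# -i: ignore case; -c counts the matching lines instead of printing them.
error_count=$(printf '%s\n' "$log" | grep -ci 'error')
# -v: keep only the lines that do NOT match.
others=$(printf '%s\n' "$log" | grep -vi 'error' | paste -sd ',' -)
# -o with -E: print only the matched text, using extended alternation.
levels=$(printf '%s\n' "$log" | grep -oiE 'error|warning' | paste -sd ' ' -)

echo "error lines: $error_count"
echo "other lines: $others"
echo "levels seen: $levels"
```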

Fourth, the sed command

4.1 Overview

The name sed stands for Stream EDitor.
sed is a stream editor that processes its input line by line.

4.2 Basic syntax

sed [OPTION]... 'script' [INPUT-FILE]...
Options
-n: do not automatically print the pattern space to stdout
-e: add a script expression; may be given several times to apply multiple edits
-f: read the editing commands from the given script file
-r: use extended regular expressions
-i: edit the source file in place

Script
An address followed by an editing command (similar to vim commands)
1) No address: the command applies to every line
2) Single address:
  #: a line number; edits that specific line
  /pattern/: every line that matches the pattern
3) Address range:
  #,#
  #,+#
  #,/pattern/
  /pattern1/,/pattern2/
4) Step address:
  1~2: start at line 1 and step by 2 (all odd lines)
  2~2: all even lines
5) Editing commands:
  d: delete the entire line; d is placed last, after the address
  p: print the contents of the pattern space; also placed last
  a: append text after the matching line; \n can be used to append several lines. Placed after the address
  i: insert text before the matching line. Example: sed '3i Hello' xxx
  c: replace the line with the given text. Example: sed '3c text' xxx replaces the third line with "text"; sed -i '/XYZ/c HelloWorld' num.txt
  w: write the lines matched in the pattern space to the given file. Example: sed -n '/^[^#]/w /tmp/demo' /etc/fstab saves the lines of /etc/fstab that do not begin with # to /tmp/demo.
  r: read the contents of the given file and append them after the matching line; useful for merging files.
  !: negate the condition. Usage: address!command
  s///: find and replace.
Substitution flags: g (replace every occurrence on the line), p (print the line when a substitution succeeds)
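
The address forms and editing commands above can be sketched on the numbers 1 to 6. This assumes GNU sed, since the step address (2~2), the #,+# range, and the one-line forms of i, a, and c are GNU extensions; the helper name sed_nums is made up for the example.

```shell
nums=$(seq 1 6)

# sed_nums ARGS...: run sed with the given arguments on the numbers 1..6
# and join the output on one line (hypothetical helper for this sketch).
sed_nums() {
  printf '%s\n' "$nums" | sed "$@" | paste -sd ' ' -
}

single=$(sed_nums -n '3p')                      # single address
range=$(sed_nums -n '2,+1p')                    # line 2 plus the next 1 line
even=$(sed_nums -n '2~2p')                      # step address
kept=$(sed_nums '2,5!d')                        # ! negates: delete all but 2-5
changed=$(sed_nums '3c three')                  # c replaces the whole line
framed=$(sed_nums -e '1i start' -e '$a end')    # i inserts before, a appends after
replaced=$(printf 'a-b-c\n' | sed 's/-/+/g')    # g replaces every occurrence

printf '%s\n' "$single" "$range" "$even" "$kept" "$changed" "$framed" "$replaced"
```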

Substitution example: get the parent directory of an input path
echo "/var/log/messages" | sed -r 's@[^/]+/?$@@'

4.3 Advanced sed usage

  1. Pattern space and hold space

Matching takes place in the pattern space. When a line does not match, it is printed to stdout unchanged by default; when a line matches, the editing command runs on it and the result is printed to stdout.
The hold space can be thought of as a temporary storage area, used only to support auxiliary operations.

  2. Commands
    h: copy the pattern space into the hold space, overwriting it;
    H: append the pattern space to the hold space;
    g: copy the hold space into the pattern space, overwriting it;
    G: append the hold space to the pattern space;
    x: exchange the contents of the pattern space and the hold space;
    n: read the next input line into the pattern space, overwriting it;
    N: append the next input line to the pattern space;
    d: delete the pattern space;
    D: delete the pattern space up to and including its first newline (the first line of a multi-line pattern space);
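3. Examples
sed -n 'n;p' FILE: print the even-numbered lines;
sed '1!G;h;$!d' FILE: print the file in reverse line order;
sed '$!d' FILE: print only the last line;
sed '$!N;$!D' FILE: print the last two lines of the file;
sed '/^$/d;G' FILE: delete all existing blank lines, then add a blank line after every remaining line;
sed 'n;d' FILE: print the odd-numbered lines;
sed 'G' FILE: add a blank line after every line;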
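
Several of these one-liners can be checked on the numbers 1 to 5. The sketch assumes GNU sed: with an odd number of input lines, the 'n;d' form relies on GNU sed printing the pattern space when n finds no further input.

```shell
input=$(seq 1 5)

# Each command reads the same five lines; paste joins the result on one line.
even=$(printf '%s\n' "$input" | sed -n 'n;p' | paste -sd ' ' -)
odd=$(printf '%s\n' "$input" | sed 'n;d' | paste -sd ' ' -)
# 1!G appends the hold space (all earlier lines, newest first) to each line,
# h saves the result, and $!d suppresses output until the last line.
reversed=$(printf '%s\n' "$input" | sed '1!G;h;$!d' | paste -sd ' ' -)
# $!N pulls in the next line and $!D drops the oldest one, so a two-line
# window slides to the end of the input.
last_two=$(printf '%s\n' "$input" | sed '$!N;$!D' | paste -sd ' ' -)

echo "even:     $even"
echo "odd:      $odd"
echo "reversed: $reversed"
echo "last two: $last_two"
```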
  • Example: extract a string
#!/bin/bash
info="hellozimskyshenzhen"
echo "$info" | sed 's/hello\(\w\+\)shenzhen/\1/g'

Remarks:

  • sed does not support \d; use [0-9] to match digits. It does support \w.
  • In sed's standard syntax, ( ) and + must be escaped to act as metacharacters, and the word anchors < and > must be escaped as well (\< and \>).
  • Example: check whether a string has a given format
#!/bin/bash
# Check whether the input is an integer
if [ -n "$(echo "$1" | sed -n '/^[0-9]\+$/p')" ] ; then
  echo 'yes'
else
  echo 'no'
fi
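
The same check can be wrapped in a function so it is reusable and easy to try on several inputs. This is only a sketch: the function name is_integer and the test values are invented, and \+ assumes GNU sed.

```shell
# is_integer VALUE: succeed when VALUE consists only of digits.
# sed -n prints the line only if it matches, so a non-empty result
# means the whole value was digits.
is_integer() {
  [ -n "$(printf '%s\n' "$1" | sed -n '/^[0-9]\+$/p')" ]
}

for value in 42 4x2 ''; do
  if is_integer "$value"; then
    echo "'$value': yes"
  else
    echo "'$value': no"
  fi
done
```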

Fifth, the awk command

5.1 awk overview

The name awk is formed from the initials of its three authors (Aho, Weinberger, and Kernighan). awk is a report generator: a tool for producing formatted text output.

5.2 Basic usage

1. Syntax
gawk [options] 'program' FILE ...
where program has the form: PATTERN {ACTION STATEMENTS}
The {} block holds the action statements; the most commonly used are print and printf.

2. How awk reads a file
awk reads the file line by line and splits each line into fields according to the input separator (available as the built-in variables $1, $2, ...), then processes them with the ACTION STATEMENTS. $0 represents the entire line.

3. Options
-F: specifies the input field separator;
-v var=value: defines a custom variable;
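
Field splitting and the two options can be seen on a single record. A minimal sketch: the record is a typical passwd-style line written out by hand, and the variable name label is invented.

```shell
record='root:x:0:0:root:/root:/bin/bash'

# -F: splits the line on colons; $1 is the first field, $NF the last,
# NF the number of fields, and $0 the whole line.
first=$(printf '%s\n' "$record" | awk -F: '{print $1}')
last=$(printf '%s\n' "$record" | awk -F: '{print $NF}')
count=$(printf '%s\n' "$record" | awk -F: '{print NF}')
whole=$(printf '%s\n' "$record" | awk -F: '{print $0}')

# -v passes a custom variable into the program.
tagged=$(printf '%s\n' "$record" | awk -F: -v label='shell' '{print label "=" $NF}')

echo "$first $last $count"
echo "$tagged"
```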

4. PATTERN (selects which lines to process)
 empty: every line of the file is processed
 /pattern/: only lines matching the regular expression are processed
 !/pattern/: the negation of the above
 relational expression: evaluates to true or false; the line is processed only when the result is true. A non-zero number or a non-empty string is true, anything else is false.
 Line range: giving line numbers directly (as in 1,2{...}) is not supported; compare against NR instead. See the examples.
 BEGIN/END: BEGIN{} runs only once, before any input is processed, for example to print a header. END{} runs once after all input has been processed, for example to print a summary.

Examples:
awk -F: '$NF=="/bin/bash" {print $1, $NF}' /etc/passwd
awk -F: '$NF!~/bash$/{print $1,$NF}' passwd
awk -F: '$3<1000 {print $1, $3}' /etc/passwd
awk -F: '(NR>=2&&NR<=10){print $1}' /etc/passwd    # line range
awk -F: '{printf "%-15s %10s\n", $1, $2}' /etc/passwd
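
Because /etc/passwd differs from machine to machine, the pattern types are easier to verify on a fixed sample. The four records below are invented stand-ins for passwd entries; the helper name pick is also made up.

```shell
accounts=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/bash' \
  'bob:x:1001:1001:Bob:/home/bob:/bin/zsh')

# pick PROGRAM: run an awk program on the sample and join the output.
pick() {
  printf '%s\n' "$accounts" | awk -F: "$1" | paste -sd ' ' -
}

bash_users=$(pick '$NF ~ /bash$/ {print $1}')       # regular expression on a field
other_users=$(pick '$NF !~ /bash$/ {print $1}')     # negated match
system_users=$(pick '$3 < 1000 {print $1}')         # relational expression
rows=$(pick 'NR >= 2 && NR <= 3 {print $1}')        # line range through NR
summary=$(pick 'BEGIN {print "users:"} {n++} END {print n}')

printf '%s\n' "$bash_users" "$other_users" "$system_users" "$rows" "$summary"
```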

5. Variables

  • Built-in variables (referenced by name, without $; a leading $ turns the value into a field reference)
    FS: input Field Separator; defaults to whitespace. Can be set with -v.
    OFS: Output Field Separator. Can be set with -v.
    RS: input Record Separator (newline by default)
    ORS: Output Record Separator (newline by default)
    NF: Number of Fields in the current line; $NF is the last field.
    NR: Number of Records (lines) read so far; print NR prints the current line number
    FNR: record number within the current file; counted separately for each file when several are given
    FILENAME: name of the current file
    ARGC: number of command-line arguments
    ARGV: array holding each command-line argument
    Example: awk 'BEGIN {print ARGV[0]}' /etc/fstab /etc/issue
    Here ARGV[0] is always the awk command itself, ARGV[1] is /etc/fstab, and ARGV[2] is /etc/issue
    Example: awk -v FS=':' -v OFS=':' '{print $1}' /etc/passwd sets the colon as the input and output separator. The FS part is the same as awk -F: ...

  • Custom variables
    Method 1: -v var=value (variable names are case sensitive)
    Method 2: define the variable inside the program

    Examples: awk -v test='hello' 'BEGIN {print test}'
    awk 'BEGIN {test="hello"; print test}'
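
A small sketch of the built-in and custom variables in action. The input lines and the file names first.txt and second.txt are invented; the files do not need to exist, because a program made up only of a BEGIN block never opens its input.

```shell
data=$(printf '%s\n' 'a:b:c' 'd:e')

# NR is the line number, NF the field count, $NF the last field;
# OFS is placed between the items of print.
numbered=$(printf '%s\n' "$data" | awk -F: -v OFS='-' '{print NR, NF, $NF}' | paste -sd ' ' -)

# ARGC counts the command name plus the arguments after the program text.
args=$(awk 'BEGIN {print ARGC, ARGV[1], ARGV[2]}' first.txt second.txt)

# Two ways to define a custom variable.
from_option=$(awk -v test='hello' 'BEGIN {print test}')
from_program=$(awk 'BEGIN {test = "hello"; print test}')

printf '%s\n' "$numbered" "$args" "$from_option" "$from_program"
```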

6. Commonly used ACTION statements

  • print
    Output format: print item1, item2 ...
    Note: items are separated by commas; an item can be a string, a built-in variable, or an awk expression; if no item is given, the whole line ($0) is printed;

  • printf
    Formatted output: printf FORMAT, item1, item2 ...
    Note: FORMAT is required; a newline must be written explicitly (\n); each item needs a corresponding format specifier in FORMAT;
  • Expressions
  • Control statements: if, while, and so on
    if(condition){statement}
    if(condition){statement} else {statements}
    while(condition) {statements}
    do {statements} while(condition)
    for(expr1;expr2;expr3) {statements}
    break
    continue
    delete array[index]
    delete array    (deletes the whole array)
    exit    (exits the program)
  • Compound statements: groups of statements
  • Input statements
  • Output statements
    Format specifiers: %
     %c: display the character for an ASCII code
     %d: display a decimal integer
     %e: display a number in scientific notation
     %f: display a floating-point number
     %g: display a number in scientific or floating-point notation, whichever is shorter
     %s: display a string
     %u: display an unsigned integer
     %%: display a literal %
    Modifiers: #
     #[.#]: the first number sets the display width and the second the precision after the decimal point (for floating-point values); output is right-aligned by default (%15s), - left-aligns it (%-15s), and + shows the sign of a number;
    Operators:
     arithmetic operators: + - * /; +x converts a string to a number; -x negates a value;
     string operator: concatenation (written by juxtaposition, with no operator symbol)
     assignment operators: =, +=, -=, *=, /=, ++, --
     comparison operators: >, >=, <, <=, ==, !=
    Pattern matching operators:
     ~: true if the string on the left matches the pattern on the right
     !~: true if the string on the left does not match the pattern on the right
    Logical operators:
     &&: and
     ||: or
     !: not
    Function call:
     function_name(arg1, arg2, ...)
    Conditional expression:
     selector ? true_exp : false_exp (the ternary operator)
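
The following sketch exercises several of the items above in one BEGIN block: printf width and precision modifiers, the ternary operator, string concatenation with numeric conversion, a for loop, and the match operators. All values are invented for the illustration.

```shell
report=$(awk 'BEGIN {
  # %-5s left-aligns in 5 columns, %5s right-aligns, %05.1f pads with
  # zeros to width 5 with one decimal place, %% prints a literal percent.
  printf "%-5s|%5s|%05.1f|%d%%\n", "ab", "cd", 3.14159, 42

  # Conditional (ternary) expression.
  x = 7
  print (x % 2 == 0 ? "even" : "odd")

  # Concatenation has no operator; unary + converts the string to a number.
  s = "3" "4"
  print s, +s + 1

  # for loop with a compound assignment.
  for (i = 1; i <= 4; i++) sum += i
  print sum

  # ~ and !~ combined with a logical operator.
  if ("daemon" ~ /^dae/ && "root" !~ /^dae/) print "match"
}')

printf '%s\n' "$report"
```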

  • Worked example
# Generally, output that does not depend on the input lines goes in the BEGIN and END blocks
awk -v begin="hello" -v end="ok" -F: 'BEGIN{print begin}; {print $1, $NF}; END{print end}' /etc/passwd

5.3 Advanced awk usage and examples

Common awk built-in variables


$1: the first column

$NF: the last column

NR: the line number

Common condition types

1) /specified content/

This form matches every line that contains the specified content. It takes no $# column prefix; plain text rather than an elaborate regular expression is recommended here, although there are exceptions.


awk -F: '/nologin/{print $0}' /etc/passwd # matches lines containing the keyword nologin

seq 100 | awk '/1/{print $1}'

2) $# == "specified content"

This form selects the lines whose column # is exactly equal to the specified content. Use == here; a single = assigns to the column instead of comparing it.


awk -F: '$1=="bin"{print $0}' /etc/passwd

3) $# ~ /specified content/

This form performs a fuzzy (regular expression) match of the specified content against the given column and selects the matching lines.


awk -F: '$1~/dae/{print $1}' /etc/passwd  # positive match

awk -F: '$1!~/dae/{print $1}' /etc/passwd # negated match
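
The difference between matching the whole line, comparing a column exactly, and matching a column with a regular expression is easiest to see side by side. The three records below are invented, and the helper name pick is made up for this sketch.

```shell
names=$(printf '%s\n' 'bin:x:2' 'sbin:x:3' 'daemon:x:1')

# pick PROGRAM: run an awk program on the sample and join the output.
pick() {
  printf '%s\n' "$names" | awk -F: "$1" | paste -sd ' ' -
}

anywhere=$(pick '/x:3/ {print $0}')          # 1) content anywhere on the line
exact=$(pick '$1 == "bin" {print $1}')       # 2) column equals the content exactly
fuzzy=$(pick '$1 ~ /bin/ {print $1}')        # 3) column contains a match
negated=$(pick '$1 !~ /bin/ {print $1}')     #    and the negated form

printf '%s\n' "$anywhere" "$exact" "$fuzzy" "$negated"
```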

4) Comparing values

Use >, <, >=, <=, ==, and != to compare the value of the specified column.


awk -F: '$3>=10{print $1}' /etc/passwd

5) Logical conditions

Use && and || to combine conditions.


awk -F: '$3>=5 && $3<=10{print $1}' /etc/passwd

6) if conditions


awk -F: '{if ($NF~/nologin$/){i++}else{j++}}; END{print i, j}' /etc/passwd

# Note that the if-else statement goes inside {}
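
A sketch of the comparison, logical, and if-else forms on a fixed sample, so the result does not depend on the local /etc/passwd. The four records are invented. Adding 0 to each counter in the END block is a small precaution of this sketch: a counter that was never incremented would otherwise print as an empty string.

```shell
accounts=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/bash' \
  'bob:x:1001:1001:Bob:/home/bob:/bin/zsh')

# Numeric comparison combined with &&.
in_range=$(printf '%s\n' "$accounts" | awk -F: '$3 >= 1 && $3 <= 1000 {print $1}' | paste -sd ' ' -)

# if-else inside the action block: count nologin accounts and the rest.
counts=$(printf '%s\n' "$accounts" | awk -F: '
  { if ($NF ~ /nologin$/) { i++ } else { j++ } }
  END { print i + 0, j + 0 }')

echo "uid 1..1000: $in_range"
echo "nologin / other: $counts"
```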

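7) Using dictionaries (associative arrays)

awk supports associative arrays, which are useful for collecting statistics.


awk '{ip[$1]++}; END{for (i in ip) {print i, ip[i]}}' access.log

# Explanation: the IP in the first column becomes the dictionary key, and its value is incremented each time the same IP appears, giving a count for every IP.

# The for loop visits each key in the dictionary, and the print block prints the key with its count. Mind how the braces separate the blocks.

# QQ number  level  duration

# Sum the duration per account for rows whose level is in the range 30<=x<=90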

#1234 12 23

#1234 10 122

#1233 92 4212

#1233 42 4252

#1239 87 2313

#1233 56 1121

#1231 19 45

#1235 45 679

cat data | awk '$2>=30&&$2<=90{dic[$1]+=$3}; END{for (i in dic) {print i, dic[i]}}'
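
The command above can be checked against the sample rows listed in the comments. In this sketch the rows are fed in from a shell variable instead of a file named data, and the output is piped through sort because the order of a for (i in dic) loop is not defined.

```shell
data='1234 12 23
1234 10 122
1233 92 4212
1233 42 4252
1239 87 2313
1233 56 1121
1231 19 45
1235 45 679'

# Keep the rows whose level (column 2) is between 30 and 90, then add up
# the duration (column 3) per account (column 1).
totals=$(printf '%s\n' "$data" |
  awk '$2 >= 30 && $2 <= 90 { dic[$1] += $3 }
       END { for (i in dic) print i, dic[i] }' |
  sort)

printf '%s\n' "$totals"
```

Account 1233 has three rows, but only the two with levels 42 and 56 fall in the range, so its total is 4252 + 1121.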

Origin www.linuxidc.com/Linux/2019-10/160947.htm