Commonly used text processing commands

On Linux, many text tools use regular expressions, and regular expressions can greatly simplify system administration. There are already plenty of good tutorials online, so the basics are not covered here; I learned from the Runoob regular-expression tutorial, and after an afternoon and a few rounds of practice the fundamentals stuck. Apart from positive and negative lookahead, which are relatively complex, the rest is simple, and the exact syntax can always be looked up when needed, so there is no need to memorize everything. This article focuses mainly on how awk and the other text-processing tools use regular expressions.

1. awk

The awk program must be enclosed in single quotes.

Basic syntax

awk -F '<input delimiter>' 'BEGIN {initialization} /filter condition/ {processing for each line} END {final wrap-up}'

There can be multiple processing blocks in the middle; each input line is run through every filter condition once, while the BEGIN and END blocks execute only once.

Filter records

  1. awk '$3==0 && $6=="LISTEN" ' netstat.txt
  2. awk '$3==0 && $6=="LISTEN" || NR==1 ' netstat.txt
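The two filters above can be tried on a tiny file shaped like netstat output (the data below is made up for illustration):

```shell
# Hypothetical sample shaped like `netstat -an` output
printf '%s\n' \
  'Proto Recv-Q Send-Q Local-Address Foreign-Address State' \
  'tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN' \
  'tcp 0 52 10.0.0.1:22 10.0.0.2:53726 ESTABLISHED' > netstat.txt

# Keep sockets with an empty send queue that are in LISTEN state
listen=$(awk '$3==0 && $6=="LISTEN"' netstat.txt | wc -l)

# Same filter, but also keep the header line via NR==1
withhdr=$(awk '$3==0 && $6=="LISTEN" || NR==1' netstat.txt | wc -l)

echo "$listen $withhdr"   # 1 2
```

Note that && binds tighter than ||, so the second filter reads as "(queue empty and LISTEN) or header line".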

Specify the delimiter

  1. awk -F: '{print $1,$3,$6}' /etc/passwd is equivalent to awk 'BEGIN{FS=":"} {print $1,$3,$6}' /etc/passwd
  2. awk -F '[;:]' specifies multiple delimiters
  3. awk -F: '{print $1,$3,$6}' OFS="\t" /etc/passwd specifies the output delimiter

Note that in print $1,$3,$6 each , is replaced with the output separator; print $1$3$6 prints the fields with no separator between them.
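A quick sketch of the difference, on one made-up colon-separated line:

```shell
# One sample line with three colon-separated fields (hypothetical data)
line='root:x:0'

# Commas in print become the output separator (a single space by default)
spaced=$(echo "$line" | awk -F: '{print $1,$2,$3}')

# Setting OFS changes what the commas turn into
dashed=$(echo "$line" | awk -F: 'BEGIN{OFS="-"} {print $1,$2,$3}')

# Without commas the fields are concatenated with no separator at all
glued=$(echo "$line" | awk -F: '{print $1$2$3}')

echo "$spaced"; echo "$dashed"; echo "$glued"
# root x 0
# root-x-0
# rootx0
```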

Special keyword:

  1. NR: number of the line (record) currently being processed
  2. NF: number of fields in the current line
  3. FNR: line number within the current file (when processing multiple files, NR keeps accumulating across files, while FNR restarts from 1 for each new file)
  4. FILENAME: name of the file being processed
  5. $0: the entire current line
  6. FS: input field separator; the default splits on runs of spaces or tabs
  7. RS: input record separator, newline by default
  8. OFS: output field separator, a single space by default
  9. ORS: output record separator, newline by default
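The NR/FNR distinction is easiest to see with two files (both made up here):

```shell
# Two small files to show NR accumulating across files while FNR restarts
printf 'a\nb\n' > f1.txt
printf 'c\n' > f2.txt

# Print file name, global line number, per-file line number, field count
out=$(awk '{print FILENAME, NR, FNR, NF}' f1.txt f2.txt | tail -1)
echo "$out"   # f2.txt 3 1 1
```

The last line shows NR has reached 3 while FNR has reset to 1 for the second file.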

Regular expressions

  1. Ordinary match: awk '/hello/ {print}' test.sh
  2. Inverted match: awk '!/hello/ {print}' test.sh
  3. Match both: awk '/hello/ && /world/ {print}' test.sh
  4. Match either: awk '/hello/ || /world/ {print}' test.sh, which can also be written as awk '/hello|world/ {print}' test.sh
  5. Match against a specific column: awk '$5 ~ /hello/ {print}' test.sh
  6. Inverted match against a specific column: awk '$5 !~ /hello/ {print}' test.sh
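These patterns can be exercised on a small made-up file (reusing the test.sh name from the examples):

```shell
# Sample lines (made up for illustration)
printf '%s\n' 'hello world' 'hello there' 'goodbye world' > test.sh

both=$(awk '/hello/ && /world/ {print}' test.sh)       # needs both words
either=$(awk '/hello|world/ {print}' test.sh | wc -l)  # either word
col=$(awk '$1 ~ /hello/ {print $2}' test.sh | head -1) # regex on column 1 only

echo "$both / $either / $col"   # hello world / 3 / world
```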

Output to different files

  1. $ awk 'NR!=1{if($6 ~ /TIME|ESTABLISHED/) print > "1.txt"; else if($6 ~ /LISTEN/) print > "2.txt"; else print > "3.txt" }' netstat.txt
  2. awk 'NR!=1{print > $6}' netstat.txt

In fact this is just the > redirection; the first example combines it with an if statement.

  1. Statistics: awk 'NR!=1{a[$6]++;} END {for (i in a) print i ", " a[i];}' netstat.txt counts how many times each value appears in column 6
  2. Select a range of lines by start and end patterns: awk '/test1/,/test2/ {print}' test.txt prints from the first line matching test1 through the next line matching test2
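Both techniques sketched on made-up data:

```shell
# Made-up data: count column 6 values, skipping the header
printf '%s\n' 'H H H H H State' 'x x x x x LISTEN' 'x x x x x LISTEN' 'x x x x x TIME' > netstat.txt
top=$(awk 'NR!=1{a[$6]++;} END {for (i in a) print i ", " a[i];}' netstat.txt | sort | tail -1)

# Range pattern: print from the line matching test1 through the line matching test2
printf '%s\n' 'one' 'test1' 'middle' 'test2' 'five' > test.txt
range=$(awk '/test1/,/test2/ {print}' test.txt | wc -l)

echo "$top"    # TIME, 1
echo "$range"  # 3 (test1, middle, test2)
```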

Interacting with shell variables

$ x=5
 
$ y=10
$ export y
 
$ echo $x $y
5 10
$ awk -v val=$x '{print $1, $2, $3, $4+val, $5+ENVIRON["y"]}' OFS="\t" score.txt
Marry   2143    78      89      87
Jack    2321    66      83      55
Tom     2122    48      82      81
Mike    2537    87      102     105
Bob     2415    40      62      72
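The session above uses the two ways to get shell values into awk; a minimal self-contained sketch:

```shell
# -v copies a shell variable into an awk variable;
# only exported variables are visible through ENVIRON
x=5
y=10
export y

out=$(echo '1 2' | awk -v val=$x '{print $1+val, $2+ENVIRON["y"]}')
echo "$out"   # 6 12
```

Note that x, which is not exported, would not be reachable through ENVIRON, which is why -v is used for it.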

2. grep

Parameter list:

  1. -w: match whole words only
  2. -s: suppress error messages about nonexistent or unreadable files
  3. -l: list only the names of files that contain a match
  4. -L: list only the names of files that contain no match
  5. -A: show trailing context, e.g. -A 1 prints 1 line after each matching line
  6. -B: show leading context, e.g. -B 1 prints 1 line before each matching line
  7. -<number>: show context on both sides, e.g. -1 prints 1 line before and after each match
  8. -n: print line numbers
  9. -c: print only the count of matching lines
  10. -v: invert the match
  11. -o: print only the matched part of each line
  12. -E: use EREs (extended regular expressions)
  13. -P: use PCREs (Perl-compatible regular expressions)

grep is mostly about regular expressions, and it is worth noting that there are three flavors: BREs, EREs, and PCREs. The first two do not support non-greedy matching. grep defaults to BREs, so characters such as ?, +, |, {, }, (, ) must be escaped with \ to get their special meaning, and escapes such as \s, \S, \d, \D, \n are not supported.
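A small sketch of the BRE/ERE difference, assuming GNU grep (which accepts \? in BREs as an extension):

```shell
printf 'color\ncolour\n' > words.txt   # made-up sample

# BRE (default): ? must be escaped to mean "optional"
bre=$(grep 'colou\?r' words.txt | wc -l)

# ERE (-E): ? is special without escaping
ere=$(grep -E 'colou?r' words.txt | wc -l)

echo "$bre $ere"   # 2 2  (-P would additionally allow \d, \s and non-greedy .*?)
```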

3. sed

sed is frequently used when writing scripts or automating tasks.

Basic syntax: sed [-nefi] '[operation]' (-n suppresses automatic printing, -e gives the script on the command line, -f reads it from a file, -i edits the file in place)

Operation: n1,n2 action (apply the action to lines n1 through n2; the address may also be a /regex/)

action:

  1. d: delete the line
  2. s: substitute within the line, replacing the matched string; e.g. replacing hello with hi (with the g flag) turns "hello world hello" into "hi world hi"
  3. a and i: a appends a line after the matching line, i inserts a line before it
  4. c: change, replacing the entire matching line

Examples:

  1. sed -e 's/hello/hi/g': replace text; the -e may be omitted
  2. sed -e '1,2s/hello/hi/g' -e '3,4s/world/man/g' is equivalent to sed -e '1,2s/hello/hi/g;3,4s/world/man/g'
  3. sed 's/hello \(world\)/\1 hi/g': grouping; \n refers back to the nth \(...\) group
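The three examples, run on a made-up file:

```shell
printf 'hello world\nhello there\n' > greet.txt   # sample file (made up)

# Substitution: replace every hello with hi
sub=$(sed 's/hello/hi/g' greet.txt | head -1)

# Address range: apply the substitution to line 1 only; line 2 is untouched
second=$(sed '1s/hello/hi/' greet.txt | sed -n '2p')

# Back-reference: \1 recalls the first \(...\) group
grp=$(echo 'hello world' | sed 's/hello \(world\)/\1 hi/')

echo "$sub"; echo "$second"; echo "$grp"
# hi world
# hello there
# world hi
```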

4. sort and uniq

sort parameters

  1. -r: reverse order (the default is ascending)
  2. -u: remove duplicates
  3. -o: write the result to a file; note that sort test.txt > test.txt does not work, because > truncates the file before sort reads it, so use sort -o test.txt test.txt instead
  4. -n: sort numerically (the default compares characters, so 10 sorts before 2)
  5. -t: specify the delimiter
  6. -k: specify which column to sort by
  7. -b: ignore leading blanks at the start of each line

Examples:

  1. sort -t $'\t' -k 1 -u res.txt > res2.txt: use tab as the delimiter, sort by the first column, and remove duplicates
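The -n and -o behaviors sketched on made-up numbers:

```shell
printf '10\n2\n1\n' > nums.txt   # sample numbers (made up)

bychar=$(sort nums.txt | head -1)     # character order: "1" < "10" < "2"
bynum=$(sort -n nums.txt | tail -1)   # numeric order puts 10 last

# -o safely writes back to the input file; plain > would truncate it first
sort -n -o nums.txt nums.txt

echo "$bychar $bynum"   # 1 10
```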

uniq parameters

Note that uniq requires its input to be sorted, so it is usually placed after sort in a pipeline.

  1. -c: prefix each line with its number of occurrences
  2. -d: show only the lines that are repeated
  3. -u: show only the lines that occur exactly once

I used to wonder about the difference between sort|uniq and sort -u, since the two do the same thing. sort -u was added later, so many people still write sort|uniq, but sort -u is now the recommended form because it avoids the extra process and the inter-process communication of the pipe.
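A quick check that the two forms really agree, on made-up data:

```shell
printf 'b\na\nb\na\n' > dup.txt   # made-up sample with duplicates

piped=$(sort dup.txt | uniq)   # two processes plus a pipe
single=$(sort -u dup.txt)      # one process, same output

[ "$piped" = "$single" ] && echo same   # same
```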

5. Hands-on practice

Given a file with the following contents, extract the domain from each URL, count the occurrences, and sort by count:

http://www.baidu.com/index.html
http://www.baidu.com/1.html
http://post.baidu.com/index.html
http://mp3.baidu.com/index.html
http://www.baidu.com/3.html
http://post.baidu.com/2.html

The following result should be obtained:

3 www.baidu.com
2 post.baidu.com
1 mp3.baidu.com

Solution 1: grep -Po '(?<=//)(.*?)(?=/)' test.txt | sort | uniq -c | sort -nr

This uses: 1. Perl-compatible regexes (-P), which support non-greedy matching; 2. lookbehind and lookahead assertions ((?<=...) asserts what precedes the match, (?=...) what follows it); 3. the -o flag to output only the matched text.

Solution 2: awk -F/ '{print $3}' test.txt | sort | uniq -c | sort -nr

Splitting on / makes the third field exactly the domain, which can then be printed directly.

Solution 3: sed 's/http:\/\/\([^/]*\).*/\1/' test.txt | sort | uniq -c | sort -nr

In basic regular expressions the parentheses must be escaped; with -r (extended regular expressions) they need no escaping.

Solution 4: sed -e 's/http:\/\///' -e 's/\/.*//' test.txt | sort | uniq -c | sort -rn

Two substitutions are applied in order: the first strips the http:// prefix, the second strips everything from the first remaining / onward.
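The four pipelines can be checked against each other by rebuilding the sample file (assuming GNU grep for the -P solution):

```shell
# Rebuild the sample file from the problem statement
printf '%s\n' \
  'http://www.baidu.com/index.html' 'http://www.baidu.com/1.html' \
  'http://post.baidu.com/index.html' 'http://mp3.baidu.com/index.html' \
  'http://www.baidu.com/3.html' 'http://post.baidu.com/2.html' > test.txt

s1=$(grep -Po '(?<=//)(.*?)(?=/)' test.txt | sort | uniq -c | sort -nr)
s2=$(awk -F/ '{print $3}' test.txt | sort | uniq -c | sort -nr)
s3=$(sed 's/http:\/\/\([^/]*\).*/\1/' test.txt | sort | uniq -c | sort -nr)
s4=$(sed -e 's/http:\/\///' -e 's/\/.*//' test.txt | sort | uniq -c | sort -rn)

top=$(echo "$s2" | head -1 | awk '{print $1, $2}')
echo "$top"   # 3 www.baidu.com
```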

awk examples

Note that awk does not support true multidimensional arrays; the comma-subscript form is a flexible workaround in which the indices are joined into a single string key. For ordinary use this is fine, but it cannot represent a nested map of maps. In the file below, columns 1-6 are deal, day, od, sum, up, and lj respectively; the goal is to accumulate sum, up, and lj for each (deal, od) pair and print the totals. Conceptually the result would be a nested structure keyed by deal and od, which awk cannot express directly:

{
    deal: {
        od: {day, sum, up, lj}
    }
}
awk 'BEGIN{OFS="\t"}{result[$1,$3,"sum"]+=$4;result[$1,$3,"up"]+=$5;result[$1,$3,"lj"]+=$6;result[$1,$3,"day"]=$2}\
END{for ( i in result)   {split(i, a, SUBSEP); print result[i] ,a[1], a[2], a[3] }}'  *
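A reduced, runnable sketch of the same SUBSEP technique, accumulating only the sum column over made-up input:

```shell
# Made-up input where columns are: deal day od sum up lj
printf '%s\n' \
  'd1 20191201 o1 10 1 2' \
  'd1 20191202 o1 5 1 1' \
  'd2 20191201 o2 7 2 3' > stats.txt

# The comma subscript is really one string key joined with SUBSEP,
# which split() can take apart again in the END block
out=$(awk '{sum[$1,$3]+=$4} END{for (k in sum){split(k,a,SUBSEP); print a[1], a[2], sum[k]}}' stats.txt | sort)
echo "$out"
# d1 o1 15
# d2 o2 7
```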



Origin www.cnblogs.com/chenfangzhi/p/11997028.html