Linux (II): advanced text processing

One, cut (the cut command extracts columns of text from a file or a text stream)

      1, cut syntax

          cut -d 'delimiter' -f fields      ## split each line on a specific delimiter character
          cut -c character-range            ## for neatly aligned, fixed-width text

         Options and parameters:
             -d : followed by the delimiter character; used together with -f
             -f : splits each line into fields on the -d delimiter and selects the fields whose numbers follow -f
             -c : extracts a fixed range of positions, counted in characters

     2, example: the PATH variable

           echo $PATH
           /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin

           Extract the fifth path from PATH:
           echo $PATH | cut -d ':' -f 5
           /usr/sbin

           Extract the third and fifth paths:
           echo $PATH | cut -d ':' -f 3,5
           /sbin:/usr/sbin

           Extract the third path through the last:
           echo $PATH | cut -d ':' -f 3-
           /sbin:/bin:/usr/sbin:/usr/bin:/root/bin

           Extract the first through third paths, plus the fifth:
           echo $PATH | cut -d ':' -f 1-3,5
           /usr/local/sbin:/usr/local/bin:/sbin:/usr/sbin
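The field-list forms above can be tried on a fixed string instead of the real PATH, so the result does not depend on the machine. A minimal sketch; the variable name sample_path is made up for illustration:

```shell
# sample_path is a hypothetical stand-in for $PATH so the output is predictable
sample_path='/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin'

# A single field
echo "$sample_path" | cut -d ':' -f 5        # /usr/sbin

# A list of fields: the selected fields are joined with the same delimiter
echo "$sample_path" | cut -d ':' -f 3,5      # /sbin:/usr/sbin

# An open-ended range: field 3 through the last field
echo "$sample_path" | cut -d ':' -f 3-       # /sbin:/bin:/usr/sbin:/usr/bin:/root/bin

# Ranges and single fields can be mixed
echo "$sample_path" | cut -d ':' -f 1-3,5    # /usr/local/sbin:/usr/local/bin:/sbin:/usr/sbin
```

cut always prints fields in their original order, so -f 5,3 gives the same result as -f 3,5.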

     3, example: a data file

           Prepare some space-separated data in sutdent.txt beforehand:
              huangbo 18 jiangxi
              xuzheng 22 hunan
              wangbaoqiang 44 Liujiayao

           Get the age in the middle column: cut -f 2 -d ' ' sutdent.txt

             18
             22
             44

           Get the second through third characters of each line: cut -c 2-3 sutdent.txt

              ua
              uz
              an
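To reproduce the file example without touching real data, the sample rows can be written to a temporary file first. This is only a sketch: the variable student_file and the mktemp path are assumptions, not part of the original example.

```shell
# Recreate the sample rows in a temporary file (student_file is a made-up name)
student_file=$(mktemp)
printf '%s\n' 'huangbo 18 jiangxi' 'xuzheng 22 hunan' 'wangbaoqiang 44 Liujiayao' > "$student_file"

# Field 2, with a single space as the delimiter: prints 18, 22, 44
cut -d ' ' -f 2 "$student_file"

# Characters 2 through 3 of every line: prints ua, uz, an
cut -c 2-3 "$student_file"

# cut treats every single space as a separator, so a run of spaces produces
# empty fields. Squeezing the run with tr first is one common workaround.
echo 'huangbo    18   jiangxi' | tr -s ' ' | cut -d ' ' -f 2    # 18
```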

Two, grep

      1, basic use

           Find lines containing hadoop: grep hadoop /etc/passwd

           grep huangbo ./*.txt ## search all .txt files in the current directory for lines containing the string huangbo

                      ./mazhonghua.txt:my name is huangbo is is huangbo
                      ./sutdent.txt:huangbo 18 jiangxi

        2, combine with cut: keep only the seventh field

               grep hadoop /etc/passwd | cut -d: -f 7
               /bin/bash

        3, find lines that do not contain hadoop
              grep -v hadoop /etc/passwd
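The three basic uses can be tried against a small passwd-style file instead of the real /etc/passwd. A sketch under that assumption; the hadoop account line and the variable passwd_sample are invented for the example:

```shell
# passwd_sample is a made-up temporary file with three passwd-style lines
passwd_sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin' \
  'hadoop:x:500:500::/home/hadoop:/bin/bash' > "$passwd_sample"

# Lines that contain hadoop
grep hadoop "$passwd_sample"

# Pipe the matching line into cut and keep field 7, the login shell
grep hadoop "$passwd_sample" | cut -d ':' -f 7    # /bin/bash

# -v inverts the match: lines that do NOT contain hadoop (root and lp)
grep -v hadoop "$passwd_sample"
```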

        4, regular expression: lines containing oo
              grep '.*oo.*' /etc/passwd
                         root:x:0:0:root:/root:/bin/bash
                         lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
                         mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
                         uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin

         5, regular expression (a dot matches any single character): grep 'h.*p' /etc/passwd

         6, lines beginning with hadoop: grep '^hadoop' /etc/passwd

         7, lines ending with hadoop: grep 'hadoop$' /etc/passwd

                       Simple regular expression rules:
                         .      : any single character
                         a*     : any number of a (zero or more a)
                         a?     : zero or one a
                         a+     : one or more a
                         .*     : any number of any characters
                         \      : escape character, e.g. \. matches a literal dot
                         o\{2\} : o repeated exactly twice
                       (with plain grep, a? and a+ need grep -E, or \? and \+ in GNU grep)
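Each rule in the list above can be checked on a handful of words. The word list below is invented for the purpose; the sketch assumes a grep that supports -E for the ? and + operators:

```shell
# words is a made-up list: one word per line
words=$(printf '%s\n' gd god good a.b axb)

# * : zero or more of the preceding character
printf '%s\n' "$words" | grep 'go*d'        # gd, god, good

# + : one or more (extended syntax, so -E is used)
printf '%s\n' "$words" | grep -E 'go+d'     # god, good

# ? : zero or one (extended syntax)
printf '%s\n' "$words" | grep -E 'go?d'     # gd, god

# \{2\} : exactly two repetitions
printf '%s\n' "$words" | grep 'o\{2\}'      # good

# . matches any single character, \. only a literal dot
printf '%s\n' "$words" | grep 'a.b'         # a.b, axb
printf '%s\n' "$words" | grep 'a\.b'        # a.b
```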

          8, find lines that do not begin with #: grep -v '^#' a.txt | grep -v '^$' ('^$' matches blank lines)

                     $ hua
                     liu
                     de

               grep -v '^#' huangbo.txt

                    $ hua
                    liu

                    de

           9, lines beginning with h or r: grep '^[hr]' /etc/passwd

               lines not beginning with h or r: grep '^[^hr]' /etc/passwd
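A small experiment shows how the bracket forms behave, including one detail that is easy to miss: '^[^hr]' needs at least one character on the line, so it is not quite the same as inverting '^[hr]' with -v. The list names is made up for this sketch:

```shell
# names is a made-up list that includes a comment line and an empty line
names=$(printf '%s\n' hadoop root bin '#comment' '' halt)

# Lines whose first character is h or r
printf '%s\n' "$names" | grep '^[hr]'       # hadoop, root, halt

# Lines whose first character exists and is neither h nor r
printf '%s\n' "$names" | grep '^[^hr]'      # bin, #comment

# Inverting the first pattern also keeps the empty line (3 lines in total)
printf '%s\n' "$names" | grep -vc '^[hr]'   # 3

# Drop comment lines, then drop blank lines
printf '%s\n' "$names" | grep -v '^#' | grep -v '^$'    # hadoop, root, bin, halt
```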

Three, sed command      

      1, Delete: d command
            sed '2d' huangbo.txt ----- delete the second line of huangbo.txt.
            sed '2,$d' huangbo.txt ----- delete from the second line to the last line of huangbo.txt.
            sed '$d' huangbo.txt ----- delete the last line of huangbo.txt.
            sed '/test/d' huangbo.txt ----- delete every line of huangbo.txt that contains test.
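The four delete forms can be compared side by side on a few lines of piped input. Note that sed prints the result to standard output; without an in-place option the input itself is not changed. The variable lines is made up for this sketch:

```shell
# lines is a made-up four-line input
lines=$(printf '%s\n' one 'two test' three four)

# Delete line 2
printf '%s\n' "$lines" | sed '2d'        # one, three, four

# Delete from line 2 to the last line ($ means the last line)
printf '%s\n' "$lines" | sed '2,$d'      # one

# Delete only the last line
printf '%s\n' "$lines" | sed '$d'        # one, two test, three

# Delete every line that matches the pattern test
printf '%s\n' "$lines" | sed '/test/d'   # one, three, four
```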

       2, Replace: s command
            sed 's/test/mytest/g' huangbo.txt ## replace every test on each line with mytest. Without the g flag, only the first match on each line is replaced.
            sed -n 's/^test/mytest/p' huangbo.txt ## the -n option combined with the p flag prints only the lines where a substitution happened. That is, if a test at the beginning of a line is replaced with mytest, that line is printed.
            sed 's/^192.168.0.1/&localhost/' huangbo.txt
            sed -n 's/4444444/&test/gp' huangbo.txt ## & stands for the matched string, so text can be appended to it. Every line that begins with 192.168.0.1 gets localhost appended after the match, giving 192.168.0.1localhost.

            sed -n 's/\(love\)able/\1rs/p' huangbo.txt
            sed -n 's/\(wang\)www/\1test/p' huangbo.txt ## the text matched inside \( \) is saved as \1. In the first command love is saved as \1, so every loveable becomes lovers, and the changed lines are printed.

            sed 's#10#100#g' huangbo.txt ## whatever character follows s is taken as the delimiter, so here "#" is the delimiter instead of the default "/". Every 10 is replaced with 100.
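The substitution variants are easier to tell apart on one-line inputs. The strings below are invented for this sketch, and the dots in the IP address are escaped here because an unescaped dot would match any character:

```shell
# Without g only the first match on the line is replaced
echo 'test one test two' | sed 's/test/mytest/'      # mytest one test two
echo 'test one test two' | sed 's/test/mytest/g'     # mytest one mytest two

# -n plus the p flag: print only lines where a substitution happened
printf '%s\n' 'test a' 'b test' | sed -n 's/^test/mytest/p'    # mytest a

# & in the replacement stands for the whole matched text
echo '192.168.0.1 host' | sed 's/^192\.168\.0\.1/&localhost/'  # 192.168.0.1localhost host

# \( \) captures a group, \1 refers back to it
echo 'loveable' | sed 's/\(love\)able/\1rs/'         # lovers

# Any character after s can serve as the delimiter
echo '10 and 10' | sed 's#10#100#g'                  # 100 and 100
```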

            Selecting a range of lines: comma
            sed -n '/test/,/check/p' huangbo.txt ## print every line in the range that starts at a line matching test and ends at a line matching check.
            sed -n 's#4444444#BBBBBBB#gp' huangbo.txt

            sed -n '5,/^test/p' huangbo.txt ## print from line 5 through the first line that begins with test.

            sed '/test/,/check/s/$/sed test/' huangbo.txt ## for each line between a line matching test and a line matching check, append the string "sed test" to the end of the line.
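A range address can be checked on a five-line input where only the middle three lines fall inside the range. The input doc and the marker text [in range] are made up for this sketch:

```shell
# doc is a made-up input: the range runs from "test start" to "check end"
doc=$(printf '%s\n' intro 'test start' middle 'check end' outro)

# Print only the lines inside the pattern range
printf '%s\n' "$doc" | sed -n '/test/,/check/p'     # test start, middle, check end

# A range can start at a line number and end at a pattern
printf '%s\n' "$doc" | sed -n '2,/^check/p'         # test start, middle, check end

# Apply a substitution only inside the range: mark the end of each line
printf '%s\n' "$doc" | sed '/test/,/check/s/$/ [in range]/'
```

In the last command the lines intro and outro pass through unchanged, while the three lines in between get the marker appended.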
            Multiple edits: -e option
            sed -e '1,5d' -e 's/test/check/' huangbo.txt ## the -e option allows several commands in one invocation. Here the first command deletes lines 1 to 5 and the second replaces test with check. The order of the commands affects the result: if both are substitutions, the first substitution changes the text that the second one sees.

            sed --expression='s/test/check/' --expression='/love/d' huangbo.txt ## --expression is the long form of -e in GNU sed; each one passes an expression to sed.
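The remark about command order is easiest to see with two substitutions that feed into each other. A minimal sketch on a one-word input:

```shell
# The first substitution produces check, which the second one then rewrites
echo 'test' | sed -e 's/test/check/' -e 's/check/done/'    # done

# In the reverse order the first command finds nothing to replace
echo 'test' | sed -e 's/check/done/' -e 's/test/check/'    # check

# Commands can also be separated with a semicolon inside one script
echo 'test' | sed 's/test/check/; s/check/done/'           # done
```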
      

       3, read from a file: r command
           sed '/test/r file' huangbo.txt ----- the contents of file are read in and displayed below each line that matches test. If several lines match, the contents of file are displayed below every matching line.

       4, write to a file: w command
           sed -n '/test/w file' huangbo.txt ----- every line of huangbo.txt that contains test is written to file.
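Both commands can be tried in a scratch directory so that no existing file is overwritten. The directory and the file names input.txt, extra.txt and matches.txt are assumptions made for this sketch:

```shell
# work_dir is a throwaway directory created by mktemp
work_dir=$(mktemp -d)
printf '%s\n' 'first' 'test line' 'last' > "$work_dir/input.txt"
printf '%s\n' '(inserted from extra.txt)' > "$work_dir/extra.txt"

# r: the contents of extra.txt appear after each matching line
# (double quotes so the shell expands $work_dir inside the sed script)
sed "/test/r $work_dir/extra.txt" "$work_dir/input.txt"
# first
# test line
# (inserted from extra.txt)
# last

# w: matching lines are written to matches.txt; -n suppresses the normal output
sed -n "/test/w $work_dir/matches.txt" "$work_dir/input.txt"
cat "$work_dir/matches.txt"    # test line
```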

       5, append: a command
          sed '/^test/a\\--->this is a example' huangbo.txt ## '--->this is a example' is appended on a new line after each line that begins with test. sed requires the a command to be followed by a backslash.
       6, insert: i command
          sed '/test/i\\some thing new -------------------------' huangbo.txt ## if test is matched, the text after the backslash is inserted before the matching line.
       7, next: n command
          sed '/test/{n; s/aa/bb/;}' huangbo.txt ## if test is matched, move to the line after the matching line, replace aa with bb on that line, print it, and continue.

       8, quit: q command
          sed '10q' huangbo.txt ----- quit sed after printing line 10.
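The a, i, n and q commands can be compared on a three-line input. This sketch uses the multi-line form of a and i (command, backslash, newline, text), which is assumed here because it is the form POSIX describes; the input sample is made up:

```shell
# sample is a made-up input: a matching line followed by two aa lines
sample=$(printf '%s\n' 'test' 'aa' 'aa')

# a: append text on a new line after each matching line
printf '%s\n' "$sample" | sed '/test/a\
---> appended after the match'

# i: insert text on a new line before each matching line
printf '%s\n' "$sample" | sed '/test/i\
inserted before the match'

# n: print the matching line, move on to the next line and edit that one
printf '%s\n' "$sample" | sed '/test/{n; s/aa/bb/;}'    # test, bb, aa

# q: quit after line 2, so only the first two lines are printed
printf '%s\n' "$sample" | sed '2q'                       # test, aa
```

Only the line directly after the match is changed by n; the second aa is left alone.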
Four, AWK

     awk is a powerful text analysis tool. Compared with grep for searching and sed for editing, awk is particularly strong at analyzing data and generating reports. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default delimiter, and then processes the fields in various ways.
    1, suppose last -n 5 produces the output below
       

root pts/0 192.168.123.1 Wed Dec 28 01:55 still logged in
reboot system boot 2.6.32-573.el6.x Tue Dec 27 04:25 - 03:11 (22:46)
root pts/1 192.168.123.1 Tue Dec 27 02:00 - 02:00 (00:00)
root pts/1 192.168.123.1 Tue Dec 27 01:59 - 02:00 (00:00)
root pts/0 192.168.123.1 Tue Dec 27 01:59 - down (00:16)

    2, show only the five most recently logged-in accounts: last -n 5 | awk '{print $1}'

root
reboot
root
root
root

The awk workflow is as follows: read one record, delimited by the newline '\n', split the record into fields on the field separator, and fill in the field variables: $0 is the whole record, $1 is the first field, and $n is the n-th field. The default field separator is a space or a tab, so here $1 is the login user, $3 is the user's IP address, and so on.
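The field variables can be inspected on two lines taken from the last output shown earlier. The variable last_sample is made up for this sketch. It also shows a caveat: on the reboot line the third field is not an IP address, because that line has a different layout.

```shell
# last_sample holds two of the sample lines from the last output
last_sample=$(printf '%s\n' \
  'root pts/0 192.168.123.1 Wed Dec 28 01:55 still logged in' \
  'reboot system boot 2.6.32-573.el6.x Tue Dec 27 04:25 - 03:11 (22:46)')

# $1 is the first whitespace-separated field
printf '%s\n' "$last_sample" | awk '{print $1}'        # root, reboot

# $3 is the IP on the root line, but the word boot on the reboot line
printf '%s\n' "$last_sample" | awk '{print $1, $3}'    # root 192.168.123.1 / reboot boot

# NF is the number of fields in the current record
printf '%s\n' "$last_sample" | awk '{print NF}'        # 10, 11

# Runs of spaces count as one separator with the default field splitting
echo 'a     b' | awk '{print $2}'                      # b
```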
    3, show only the accounts in /etc/passwd: cat /etc/passwd | awk -F ':' '{print $1}'

root
bin
daemon
adm
lp

This is an example of awk + action: the action {print $1} is executed for every line. -F sets the field separator to ':'.
    4, show the accounts in /etc/passwd with their shells, separated by a tab: cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'

root /bin/bash
bin /sbin/nologin
daemon /sbin/nologin
adm /sbin/nologin
lp /sbin/nologin    

   5, show the accounts in /etc/passwd with their shells, separated by a comma, with a header line name,shell at the top and "blue,/bin/nosh" added as the last line.

cat /etc/passwd |awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
cat /etc/passwd | awk -F ':' 'BEGIN {print "name \t shell"} {print$1"\t"$7} END {print "blue,/bin/bash"}'
    Result:

name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
....
blue,/bin/nosh        

        With BEGIN and END the awk workflow is as follows: first run the BEGIN block, then read the file. Read one record, delimited by the newline '\n', split it into fields on the field separator, and fill in the field variables ($0 is the whole record, $1 is the first field, $n is the n-th field), then run the action of each pattern that matches. Then read the second record, and so on until all records have been read. Finally run the END block.
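The BEGIN, per-record and END phases can be observed on two passwd-style lines fed through a pipe. The input passwd_lines is made up for this sketch, and the counter variable count is an illustration that is not in the original commands:

```shell
# passwd_lines is a made-up two-line stand-in for /etc/passwd
passwd_lines=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/bin/sh')

# BEGIN runs once before any input, the middle block once per record,
# END once after the last record
printf '%s\n' "$passwd_lines" |
  awk -F ':' 'BEGIN {print "name,shell"} {print $1 "," $7} END {print "blue,/bin/nosh"}'
# name,shell
# root,/bin/bash
# daemon,/bin/sh
# blue,/bin/nosh

# A variable updated per record is still available in END
printf '%s\n' "$passwd_lines" | awk -F ':' '{count++} END {print count " accounts"}'    # 2 accounts
```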

    6, search /etc/passwd for all lines containing the keyword root: awk -F: '/root/' /etc/passwd

         root:x:0:0:root:/root:/bin/bash
This is an example of using a pattern: the action runs only on lines that match the pattern (here root). No action is specified, so the default applies, which prints each matching line.
    7, the search supports regular expressions, for example lines beginning with root: awk -F: '/^root/' /etc/passwd
Search /etc/passwd for all lines containing the keyword root and show the corresponding shell:
awk -F ':' '/root/ {print $7}' /etc/passwd
/bin/bash
Here the action {print $7} is specified.
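The difference between matching anywhere on the line, matching at the start, and matching one field can be seen on three passwd-style lines. The input is made up for this sketch; the field comparisons with == and ~ go slightly beyond the commands above and are included only for contrast:

```shell
# passwd_lines is a made-up input; the operator line contains /root as its home
passwd_lines=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'operator:x:11:0:operator:/root:/sbin/nologin' \
  'lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin')

# Pattern only: prints every line that contains root anywhere (root and operator)
printf '%s\n' "$passwd_lines" | awk -F ':' '/root/'

# Anchored pattern plus action: shell of the lines that begin with root
printf '%s\n' "$passwd_lines" | awk -F ':' '/^root/ {print $7}'            # /bin/bash

# Comparing one field: exact match on the account name
printf '%s\n' "$passwd_lines" | awk -F ':' '$1 == "root" {print $7}'       # /bin/bash

# Regular expression against one field: accounts whose shell ends in nologin
printf '%s\n' "$passwd_lines" | awk -F ':' '$7 ~ /nologin$/ {print $1}'    # operator, lp
```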

     8, print statistics for /etc/passwd: the file name, the line number of each line, the number of columns in each line, and the full line content:

awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:"$0}' /etc/passwd

      Result:

filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh
filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh
filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh    

   Using printf instead of print makes the code more compact and easier to read:
awk -F ':' '{printf("filename:%s,linenumber:%s,columns:%s,linecontent:%s\n", FILENAME, NR, NF, $0)}' /etc/passwd
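The built-in variables can be tried on a two-line copy of passwd-style data in a temporary file, so the numbers are predictable. The file passwd_copy and the output wording are assumptions made for this sketch:

```shell
# passwd_copy is a made-up temporary file with two passwd-style lines
passwd_copy=$(mktemp)
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' 'bin:x:2:2:bin:/bin:/bin/sh' > "$passwd_copy"

# NR is the current line number, NF the number of fields on that line
awk -F ':' '{printf("line %d: %d fields, account %s\n", NR, NF, $1)}' "$passwd_copy"
# line 1: 7 fields, account root
# line 2: 7 fields, account bin

# $NF is the last field, whatever the field count is
awk -F ':' '{print $NF}' "$passwd_copy"          # /bin/bash, /bin/sh

# FILENAME holds the name of the file being read (here the mktemp path)
awk 'NR == 1 {print FILENAME}' "$passwd_copy"
```

Unlike print, printf does not add a newline by itself, which is why the format string ends in \n.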

Origin www.cnblogs.com/dll102/p/12015737.html