Linux text processing tools

First, commonly used simple tools

  cat [OPTION]... [FILE]... 

  •   -E: display $ at the end of each line
  •   -n: number every output line
  •   -A: show all control characters (tabs as ^I, line ends as $)
  •   -s: squeeze consecutive blank lines into a single blank line
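As a quick illustration of the options above, here is a minimal sketch that assumes GNU cat. The temporary file and its contents are made up for this example.

```shell
# Create a throwaway sample file; the path and contents are hypothetical.
tmp=$(mktemp -d)
printf 'first\tline\n\n\n\nlast line\n' > "$tmp/sample.txt"

# -n numbers every output line, -s squeezes the three blank lines into one.
cat -n -s "$tmp/sample.txt"

# -A makes tabs (^I) and line ends ($) visible.
cat -A "$tmp/sample.txt"

# Keep two results in variables so they can be inspected afterwards.
squeezed_lines=$(cat -s "$tmp/sample.txt" | wc -l)
visible=$(cat -A "$tmp/sample.txt" | head -n 1)

rm -rf "$tmp"
```

After squeezing, the five-line file is reduced to three lines, and the first line printed by -A shows the tab and the line terminator explicitly.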
  more: view a file one page at a time
  •   -d: display a prompt explaining how to continue or quit
  less: view a file or STDIN one page at a time
  Useful commands while viewing:
  •   /text: search forward for text
  •   n / N: jump to the next / previous match
  less is the pager that the man command uses
  cut: extract columns of text: cut [OPTION]... [FILE]...
  •   -f: select fields by number, e.g. -f 1,3 or -f 1-3
  •   -c: cut by character position
  •   -d: specify the field delimiter; the default is tab
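The cut options can be tried on a couple of colon-delimited records. This is only a sketch: the records below are hypothetical passwd-style lines, not read from a real system file.

```shell
# Hypothetical passwd-style records, colon-delimited.
records='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# -d sets the delimiter, -f picks fields 1 and 7 (user name and shell).
name_and_shell=$(printf '%s\n' "$records" | cut -d: -f1,7)
printf '%s\n' "$name_and_shell"

# -c selects by character position: the first four characters of each line.
first_chars=$(printf '%s\n' "$records" | cut -c1-4)
printf '%s\n' "$first_chars"
```

Note that cut keeps the original delimiter between the selected fields in its output.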
  wc: text statistics
  •   -l: count only the number of lines
  •   -w: count only the total number of words
  •   -c: count only the total number of bytes
  •   -m: count only the total number of characters
  •   -L: show the length of the longest line in the file
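A small sketch of the wc counters follows. It assumes GNU wc (the -L option is a GNU extension), and the two-line sample text is invented for illustration.

```shell
# Two hypothetical lines of ASCII text.
text='hello world
a much longer second line'

lines=$(printf '%s\n' "$text" | wc -l)    # number of lines
words=$(printf '%s\n' "$text" | wc -w)    # number of words
bytes=$(printf '%s\n' "$text" | wc -c)    # number of bytes, newlines included
longest=$(printf '%s\n' "$text" | wc -L)  # length of the longest line

echo "lines=$lines words=$words bytes=$bytes longest=$longest"
```

For plain ASCII input like this, -c and -m give the same number; they differ only when the text contains multibyte characters.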
  sort: sort lines of text
  •   -r: reverse the output order
  •   -R: sort in random order
  •   -n: sort numerically
  •   -f: ignore character case (fold)
  •   -u: delete duplicate lines from the output
  •   -t c: use c as the field delimiter
  •   -k X: sort on column X, with columns split by the -t delimiter; can be given multiple times
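The delimiter and key options are easiest to see on a small table. The name:score records below are hypothetical, and the sketch assumes a sort that behaves like GNU sort.

```shell
# Hypothetical name:score records, one of them duplicated.
scores='carol:72
alice:100
bob:9
alice:100'

# -t: uses ':' as the field delimiter, -k2 sorts on the second field,
# -n compares numerically, -r puts the largest value first.
by_score=$(printf '%s\n' "$scores" | sort -t: -k2 -n -r)
printf '%s\n' "$by_score"

# -u sorts and drops the duplicated line.
unique=$(printf '%s\n' "$scores" | sort -u)
printf '%s\n' "$unique"
```

Without -n the scores would be compared as strings, so 9 would sort after 100.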
  uniq: report or filter repeated lines
  •   -c: prefix each line with the number of times it occurs
  •   -d: show only duplicated lines
  •   -u: show only lines that are not repeated
  •   uniq only compares adjacent lines, so it is often combined with sort: sort userlist.txt | uniq -c
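The sort and uniq combination can be sketched as follows. The user list here is inline, hypothetical data standing in for a file such as userlist.txt.

```shell
# Hypothetical list of user names with repeats.
users='bob
alice
bob
carol
bob
alice'

# Sorting first makes equal lines adjacent, which uniq requires.
counts=$(printf '%s\n' "$users" | sort | uniq -c)
dups=$(printf '%s\n' "$users" | sort | uniq -d)
singles=$(printf '%s\n' "$users" | sort | uniq -u)

# A common follow-up: order the counts so the most frequent name comes first.
printf '%s\n' "$counts" | sort -rn
```

Running uniq on the unsorted list would leave every line in place, because no two equal lines are next to each other.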
  grep: text filtering
  Purpose: a text search tool that checks the target text line by line against a user-specified pattern and prints the lines that match
  grep [OPTIONS] PATTERN [FILE...]
  •   -m #: stop after # matching lines
  •   -v: display the lines that do not match the pattern
  •   -i: ignore case
  •   -c: print the number of matching lines
  •   -o: display only the matched part of each line
  •   -q: quiet mode; output nothing and report the result through the exit status
  •   -A #: after; also print the # lines after each match
  •   -B #: before; also print the # lines before each match
  •   -C #: context; also print the # lines before and after each match
  •   -e: give several patterns, combined with logical OR: grep -e 'cat' -e 'dog' file
  •   -w: match whole words
  •   -E: use ERE
  •   -F: equivalent to fgrep; the pattern is a fixed string, not a regular expression
  •   -f file: read the patterns from file, one per line
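Several of the grep options above can be combined on a small input. This is a sketch with made-up sample lines; it assumes a grep compatible with GNU grep.

```shell
# Hypothetical sample lines.
log='cat sat on the mat
Dog barked
concatenate files
the dog slept
bird'

# -e twice means "cat OR dog"; -i ignores case.
either=$(printf '%s\n' "$log" | grep -i -e 'cat' -e 'dog')

# -w matches "cat" only as a whole word, so "concatenate" is skipped.
whole=$(printf '%s\n' "$log" | grep -w 'cat')

# -v inverts the match, -c counts the resulting lines.
not_dog=$(printf '%s\n' "$log" | grep -c -v -i 'dog')

# -o prints only the matched text, one match per line.
only=$(printf '%s\n' "$log" | grep -o -i 'dog')

# -q prints nothing; the exit status says whether a match was found.
if printf '%s\n' "$log" | grep -q 'bird'; then found=yes; else found=no; fi

printf '%s\n' "$either"
```

The -q form is the usual choice inside scripts, where only the yes/no answer matters.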
Second, regular expressions
  REGEXP: a regular expression is a pattern written with special characters and literal text characters; some characters (metacharacters) do not stand for their literal meaning but act as control or wildcard functions
  Supported by programs such as grep, sed, awk, vim, less, nginx, and varnish
  Divided into two categories: basic regular expressions (BRE) and extended regular expressions (ERE)
  Basic regular expression metacharacters
  •   . matches any single character
  •   [] matches any single character within the specified set or range, for example: [wang] [0-9] [a-z]
  •   [^] matches any single character outside the specified set or range
  •   [:alnum:] letters and digits
  •   [:alpha:] any letter, uppercase or lowercase: a-z, A-Z
  •   [:lower:] lowercase letters; [:upper:] uppercase letters
  •   [:blank:] blank characters (space and tab)
  •   [:space:] horizontal and vertical whitespace (covers more than [:blank:])
  •   [:cntrl:] non-printable control characters (backspace, delete, bell, ...)
  •   [:digit:] decimal digits; [:xdigit:] hexadecimal digits
  •   [:graph:] printable characters other than space
  •   [:print:] printable characters
  •   [:punct:] punctuation
  •   The character classes above are used inside a bracket expression, e.g. [[:digit:]]
  •   * matches the preceding character any number of times, including zero; greedy mode: it matches as much as possible
  •   .* matches any characters of any length
  •   \? matches the preceding character 0 or 1 times
  •   \+ matches the preceding character at least once
  •   \{n\} matches the preceding character exactly n times
  •   \{m,n\} matches the preceding character at least m times and at most n times
  •   \{,n\} matches the preceding character at most n times
  •   \{n,\} matches the preceding character at least n times
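The repetition operators above can be checked with grep in its default BRE mode. This sketch assumes GNU grep, where \? and \+ are available in basic regular expressions; the word list is invented for the example.

```shell
# Hypothetical word list.
words='color
colour
colouur
clr
x42
x7'

# \? : the preceding "u" may appear zero or one time.
optional_u=$(printf '%s\n' "$words" | grep 'colou\?r')

# \+ : the preceding "u" must appear at least once.
one_or_more=$(printf '%s\n' "$words" | grep 'colou\+r')

# \{2\} with a character class: "x" followed by exactly two digits.
two_digits=$(printf '%s\n' "$words" | grep 'x[[:digit:]]\{2\}')

# .* : any characters, of any length, between "c" and "r".
anything=$(printf '%s\n' "$words" | grep 'c.*r')

printf '%s\n' "$optional_u"
```

The character class is written [[:digit:]]: the outer brackets form the bracket expression and the inner [:digit:] names the class.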

Position anchors: pin the pattern to a position in the line or word

  •   ^ anchors the start of the line; placed at the far left of the pattern
  •   $ anchors the end of the line; placed at the far right of the pattern
  •   ^PATTERN$ matches the pattern against the whole line
  •   ^$ matches an empty line
  •   ^[[:space:]]*$ matches a blank line (empty or whitespace only)
  •   \< or \b anchors the start of a word; placed at the left of the word pattern
  •   \> or \b anchors the end of a word; placed at the right of the word pattern
  •   \<PATTERN\> matches a whole word
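To see how the anchors narrow a match, the sketch below counts matching lines for the same word under different anchors. It assumes GNU grep (for \< and \>), and the sample lines, produced by a hypothetical helper function named sample, are made up.

```shell
# Hypothetical input: six lines, including one empty line and one
# line that contains only a space, a tab, and a space.
sample() {
  printf 'root\nroot:x:0:0\nchroot jail\n\n \t \nthe root user\n'
}

starts=$(sample | grep -c '^root')           # lines that begin with "root"
whole_line=$(sample | grep -c '^root$')      # lines that are exactly "root"
empty=$(sample | grep -c '^$')               # truly empty lines
blank=$(sample | grep -c '^[[:space:]]*$')   # empty or whitespace-only lines
word=$(sample | grep -c '\<root\>')          # "root" as a whole word

echo "starts=$starts whole_line=$whole_line empty=$empty blank=$blank word=$word"
```

The word anchors exclude "chroot", because there "root" is preceded by another word character.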
Grouping and references
  Grouping: \(\) binds one or more characters together so they are treated as a unit, for example: \(root\)\+
  The content matched by a parenthesized group is recorded in internal variables of the regular expression engine, named \1, \2, \3, ...
  \1 is the text matched by the pattern between the first left parenthesis, counting from the left, and its matching right parenthesis
  Example: \(string1\(string2\)\)
  \1: string1\(string2\)
  \2: string2
  Back references: refer to the characters matched by an earlier parenthesized group, not to the pattern itself
  Or: \|
  Example: a\|b matches a or b
  C\|cat matches C or cat
  \(C\|c\)at matches Cat or cat
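Grouping, back references, and alternation can be tried together with grep and sed. This is a sketch under the assumption of GNU grep and GNU sed (for \+ and \| in BRE); the sample lines are invented.

```shell
# Hypothetical sample lines.
lines='abab
abba
Cat
cat
bat
hello hello world'

# \(ab\)\+ : the group "ab" repeated one or more times, filling the whole line.
grouped=$(printf '%s\n' "$lines" | grep '^\(ab\)\+$')

# \1 refers to the text the first group matched, so this finds a word
# that is immediately repeated after a space.
repeated=$(printf '%s\n' "$lines" | grep '\([[:alpha:]]\+\) \1')

# \(C\|c\)at : alternation inside a group, matching "Cat" or "cat".
either_case=$(printf '%s\n' "$lines" | grep '^\(C\|c\)at$')

# Nested groups are numbered by their opening parenthesis, left to right.
nested=$(printf 'string1string2\n' | sed 's/\(string1\(string2\)\)/[\1][\2]/')

printf '%s\n' "$grouped" "$repeated" "$either_case" "$nested"
```

The sed line shows the numbering rule from the example above: \1 holds the text matched by the outer group and \2 the text matched by the inner one.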


Origin www.cnblogs.com/kading/p/10923025.html