Most commonly used shell text-processing tools under Linux

http://www.cnblogs.com/me115/p/3427319.html

This article describes the shell tools most commonly used for text processing under Linux:
find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, awk;
the examples and options shown are the most common and most useful ones.
My principle for shell scripting is to write one-liners and try not to exceed two lines;
for more complex tasks, consider Python instead.

find file search

  • Find .txt and .pdf files

      find . \( -name "*.txt" -o -name "*.pdf" \) -print
  • Find .txt and .pdf files with a regular expression

      find . -regex ".*\.\(txt\|pdf\)$"

    -iregex: case-insensitive regex matching

  • Negated matching
    Find all files that are not .txt:

       find . ! -name "*.txt" -print
  • Specify the search depth
    Print the files in the current directory only (depth 1):

      find . -maxdepth 1 -type f  
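As a quick sanity check, the depth-limited search above can be tried in a scratch directory (the file names here are made up for the demo):

```shell
# Create a scratch directory with two top-level files and one nested file
d=$(mktemp -d)
touch "$d/a.txt" "$d/b.pdf"
mkdir "$d/sub" && touch "$d/sub/c.txt"

# Only a.txt and b.pdf are listed; sub/c.txt is below the depth limit
find "$d" -maxdepth 1 -type f

rm -rf "$d"
```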

Custom Search

  • Search by type:

      find . -type d -print  // list only directories

    -type f: regular file / -type l: symbolic link

  • Search by time:
    -atime access time (in days; the minute-granularity variant is -amin, and similarly below)
    -mtime modification time (content was modified)
    -ctime change time (metadata or permissions changed)
    All files accessed within the last 7 days:

      find . -atime -7 -type f -print
  • Search by size:
    units: b (blocks), c (bytes), w (words), k, M, G
    Find files larger than 2k:

      find . -type f -size +2k

    Search by permissions:

      find . -type f -perm 644 -print // find all files with permission 644

    Search by owner:

      find . -type f -user weber -print // find files owned by user weber
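A minimal sketch of the size filter, assuming the k suffix of GNU/BSD find (1024-byte units); the file names are invented for the demo:

```shell
# A 3 KiB file and an empty file; -size +2k should match only the former
d=$(mktemp -d)
dd if=/dev/zero of="$d/big" bs=1024 count=3 2>/dev/null
touch "$d/small"

find "$d" -type f -size +2k   # prints only .../big

rm -rf "$d"
```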

Follow-up actions on find results

  • Delete:
    Delete all .swp files in the current directory:

      find . -type f -name "*.swp" -delete
  • Execute an action (-exec)

      find . -type f -user root -exec chown weber {} \; // change the owner of everything under the current directory to weber

    Note: {} is a special string; for each matched file, {} is replaced with that file name;
    e.g., copy all found files to another directory:

      find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;
  • Combine multiple commands
    Tip: to run multiple follow-up commands, put them in a script and let -exec call that script:

      -exec ./commands.sh {} \;

-print delimiter

By default '\n' is used as the delimiter between file names;
-print0 uses '\0' instead, which makes it possible to handle file names that contain spaces;
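The point of the -print0/-0 pairing can be seen with a file name that contains a space (a made-up name, for illustration):

```shell
d=$(mktemp -d)
touch "$d/a file.txt"

# With -print0 and xargs -0 the name stays intact: one argument, one line
find "$d" -name "*.txt" -print0 | xargs -0 -n1 echo | wc -l   # 1

# With plain -print, the space splits the name into two arguments: two lines
find "$d" -name "*.txt" -print | xargs -n1 echo | wc -l       # 2

rm -rf "$d"
```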

grep: text search

grep match_pattern file // print matching lines by default

  • Common options
    -o print only the matched text  VS  -v print only the lines that do not match
    -c count the number of lines containing the text

      grep -c "text" filename

    -n print line numbers of matches
    -i ignore case when searching
    -l print only the names of matching files

  • Recursive text search through a directory tree (the programmer's favorite way to search code):

      grep "class" . -R -n
  • Match multiple patterns
      grep -e "class" -e "vitural" file
  • Output \0 as the terminator after each file name (-Z, pairs with xargs -0):
      grep "test" file* -lZ| xargs -0 rm
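The difference between -c (matching lines) and -o (individual matches) is easy to miss; a small demo with inline sample data:

```shell
tmp=$(mktemp)
printf 'foo bar\nfoo foo\nbar\n' > "$tmp"

grep -c "foo" "$tmp"           # 2 lines contain foo
grep -o "foo" "$tmp" | wc -l   # 3 occurrences in total

rm -f "$tmp"
```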

xargs: command-line argument conversion

xargs converts its standard input into command-line arguments of a given command; it combines well with many commands, such as grep and find.

  • Convert multi-line output into a single line
    cat file.txt | xargs
    '\n', the delimiter between lines, is replaced by a space

  • Convert a single line into multiple lines
    cat single.txt | xargs -n 3
    -n: number of arguments per output line

xargs parameter notes

-d defines the input delimiter (the default delimiter between multiple lines is '\n')
-n specifies how many arguments are passed per command invocation (one output line each)
-I {} specifies a replacement string; xargs expands {} to each input item, useful when the command to be executed takes multiple arguments
eg:

cat file.txt | xargs -I {} ./command.sh -p {} -1

-0: use '\0' as the input delimiter
e.g., count the lines of source code:

find source_dir/ -type f -name "*.cpp" -print0 |xargs -0 wc -l
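A concrete run of the -n option described above:

```shell
# Six arguments, three per invocation: echo is run twice
echo "1 2 3 4 5 6" | xargs -n 3
# 1 2 3
# 4 5 6
```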

sort: sorting

Options:
-n sort numerically  VS  -d sort lexicographically
-r reverse the sort order
-k N sort by the N-th column
eg:

sort -nrk 1 data.txt
sort -bd data // ignore leading whitespace such as spaces

uniq: eliminate duplicate lines

  • Eliminating duplicate rows
      sort unsort.txt | uniq 
  • Count how many times each line appears in the file
      sort unsort.txt | uniq -c
  • Find duplicated lines
      sort unsort.txt | uniq -d
    You can specify which part of each line is compared: -s skips the first N characters, -w compares at most N characters
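The sort | uniq combinations above, run on a small inline sample:

```shell
printf 'b\na\nb\n' | sort | uniq -c   # counts: 1 a, 2 b
printf 'b\na\nb\n' | sort | uniq -d   # duplicated lines only: b
```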

tr: character conversion

  • Common usage

      echo 12345 | tr '0-9' '9876543210' // simple encryption/decryption: replace each digit with its counterpart
      cat text| tr '\t' ' '  // convert tabs to spaces
  • Delete characters with tr

      cat file | tr -d '0-9' // delete all digits

    -c takes the complement of the set

      cat file | tr -d -c '0-9\n'  // delete everything that is not a digit or newline, i.e. keep only the digits
  • Squeeze characters with tr
    tr -s squeezes runs of repeated characters; most commonly used to squeeze extra spaces:

      cat file | tr -s ' '
  • Character classes
    tr provides various character classes:
    alnum: alphanumeric characters
    alpha: letters
    digit: digits
    space: whitespace characters
    lower: lowercase letters
    upper: uppercase letters
    cntrl: control (non-printing) characters
    print: printable characters
    Usage: tr '[:class:]' '[:class:]'

      eg: tr '[:lower:]' '[:upper:]'
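Squeezing and character classes combine naturally, e.g. normalizing spacing and case in one pipeline:

```shell
# Squeeze repeated spaces, then lowercase everything
echo "HELLO   world" | tr -s ' ' | tr '[:upper:]' '[:lower:]'
# hello world
```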

cut: split text by column

  • Extract columns 2 and 4 of a file:
      cut -f2,4 filename
  • Print all columns of a file except column 3:
      cut -f3 --complement filename
  • -d specifies the delimiter:
      cut -f2 -d";" filename
  • cut ranges
    N-   from the N-th field to the end
    -M   from the first to the M-th field
    N-M  from the N-th to the M-th field
  • cut units
    -b in bytes
    -c in characters
    -f in fields (uses the delimiter)
  • eg:
      cut -c1-5 file // print characters 1 through 5
      cut -c-2 file  // print the first 2 characters
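A quick demonstration of field selection with a custom delimiter and an open-ended range:

```shell
echo "a:b:c:d" | cut -d: -f2,4   # b:d
echo "a:b:c:d" | cut -d: -f2-    # b:c:d
```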

paste: concatenate text by column

Splices the columns of two files together:

cat file1
1
2

cat file2
colin
book

paste file1 file2
1 colin
2 book

The default delimiter is a tab; -d specifies a different delimiter:
paste -d, file1 file2
1,colin
2,book

wc: line, word and character counting

wc -l file // count lines
wc -w file // count words
wc -c file // count characters
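For example:

```shell
printf 'one two\nthree\n' | wc -l   # 2 lines
printf 'one two\nthree\n' | wc -w   # 3 words
```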

sed: text replacement

  • Replace the first occurrence
      sed 's/text/replace_text/' file   // replace the first match of text on each line
  • Global Replace

       sed 's/text/replace_text/g' file

    By default, sed prints the replaced content to standard output; to replace in the original file directly, use -i:

      sed -i 's/text/replace_text/g' file
  • Remove blank lines:

      sed '/^$/d' file
  • Referencing the matched string
    The matched string can be referenced with &:

    echo this is an example | sed 's/\w\+/[&]/g'
    $>[this] [is] [an] [example]
  • Substring match tags
    The content of the first \( \) group can be referenced with \1:

      sed 's/hello\([0-9]\)/\1/'
  • Evaluation with double quotes
    sed expressions are usually single-quoted; double quotes can also be used, in which case the shell evaluates the expression before sed sees it:

      sed 's/$var/HELLO/'  // single quotes: $var is matched literally

    With double quotes, shell variables can be used in the sed pattern and replacement string;

    e.g.:
    p=pattern
    r=replaced
    echo "line con a pattern" | sed "s/$p/$r/g"
    $>line con a replaced
  • Other examples
    Insert a character into a string: convert the content of each line (e.g. PEKSHA) to PEK/SHA

      sed 's/^.\{3\}/&\//g' file
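A small check of the group-reference syntax from above (the sample strings are invented for the demo):

```shell
# \1 refers to the digit captured by \( \); the literal "hello" prefix is dropped
echo "hello5 world" | sed 's/hello\([0-9]\)/digit=\1/'
# digit=5 world
```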

awk: data-stream processing

  • awk script structure
    awk 'BEGIN {statements} statements2 END {statements}'

  • How it works
    1. execute the BEGIN block of statements;
    2. read one line from stdin or a file and execute statements2; repeat this until all input has been read;
    3. execute the END block of statements;

  • When print is used without arguments, it prints the current line;

      echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print } END{ print "End" }' 
  • When print arguments are separated by commas, the output is separated by spaces;

    echo | awk ' {var1 = "v1" ; var2 = "V2"; var3="v3"; \
    print var1, var2 , var3; }'
    $>v1 V2 v3
  • Concatenation: adjacent strings are joined directly, so literal "-" strings can act as separators;
    echo | awk ' {var1 = "v1" ; var2 = "V2"; var3="v3"; \
    print var1"-"var2"-"var3; }'
    $>v1-V2-v3

Special variables: NR, NF, $0, $1, $2

NR: the record number, i.e. the current line number during execution;
NF: the number of fields, i.e. the total field count of the current line;
$0: the text of the current line;
$1: the first field;
$2: the second field;

echo -e "line1 f2 f3\n line2 \n line 3" | awk '{print NR":"$0"-"$1"-"$2}'
  • Print the second and third field of each line:
      awk '{print $2, $3}' file
  • Count the number of lines in a file:

      awk ' END {print NR}' file
  • Sum the first field of every line:

      echo -e "1\n 2\n 3\n 4\n" | awk 'BEGIN{sum = 0 ;
      print "begin";} {sum += $1;} END {print "=="; print sum }'
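A compact version of the accumulation pattern:

```shell
# Sum column 1 across all input lines
printf '1\n2\n3\n4\n' | awk '{sum += $1} END {print sum}'   # 10
```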

Passing external variables

var=1000
echo | awk '{print vara}' vara=$var  # input from stdin
awk '{print vara}' vara=$var file    # input from a file
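awk also accepts -v for the same purpose, which makes the variable available already in the BEGIN block:

```shell
# -v assigns the awk variable before any input is read
var=1000
echo | awk -v vara="$var" 'BEGIN{print "got", vara} {print vara}'
# got 1000
# 1000
```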

Filtering lines with awk patterns

awk 'NR < 5'                    # lines with line number less than 5
awk 'NR==1,NR==4{print}' file   # print lines 1 through 4
awk '/linux/'                   # lines containing the text linux (regular expressions can be used; very powerful)
awk '!/linux/'                  # lines not containing the text linux

Set the field delimiter

Use -F to set the delimiter (space by default)
awk -F: '{print $NF}' /etc/passwd
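For example, printing the last field of a colon-separated record:

```shell
# $NF is the last field; with -F: the fields are a, b, c
echo "a:b:c" | awk -F: '{print $NF}'   # c
```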

Read command output

Using getline, the output of an external command can be read into a variable, here cmdout;

echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout }' 

Using loops in awk

for(i=0;i<10;i++){print $i;}
for(i in array){print array[i];}

e.g.:
print lines in reverse order (implementing the tac command):

seq 9| \
awk '{lifo[NR] = $0; lno=NR} \
END{ for(;lno>0;lno--){print lifo[lno];}
} '

Implementing head and tail with awk

  • head:

      awk 'NR<=10{print}' filename
  • tail:

      awk '{buffer[NR%10] = $0;} END{for(i=NR+1;i<=NR+10;i++){ \
      print buffer[i%10]} } ' filename  # assumes at least 10 lines of input

Print the specified column

  • with awk:
      ls -lrt | awk '{print $6}'
  • with cut (squeeze spaces first, since cut's default field delimiter is a tab):
      ls -lrt | tr -s ' ' | cut -d' ' -f6

Print the specified text area

  • by line number
      seq 100| awk 'NR==4,NR==6{print}'
  • by text pattern
    print the text between start_pattern and end_pattern;
      awk '/start_pattern/, /end_pattern/' filename
    eg:
    seq 100 | awk '/13/,/15/'
    cat /etc/passwd| awk '/mai.*mail/,/news.*news/'

Commonly used awk built-in functions

index(string, search_string): returns the position at which search_string appears in string
sub(regex, replacement_str, string): replaces the first match of regex in string with replacement_str;
match(regex, string): checks whether the regular expression matches string;
length(string): returns the length of string

echo | awk '{"grep root /etc/passwd" | getline cmdout; print length(cmdout) }' 
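index and length in action on a sample string:

```shell
# "world" starts at position 7 (awk positions are 1-based); the line is 11 characters
echo "hello world" | awk '{print index($0, "world"), length($0)}'
# 7 11
```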

printf formats output like C's printf;
eg:

seq 10 | awk '{printf "->%4s\n", $1}'

Iterating over the lines, words and characters of a file

1. Iterate over each line of a file

  • while loop:

    while read line;
    do
    echo $line;
    done < file.txt
    As a subshell:
    cat file.txt | (while read line;do echo $line;done)
  • with awk:
    cat file.txt| awk '{print}'

2. Iterate over each word in a line

for word in $line;
do 
echo $word;
done

3. Iterate over each character in a word

${string:start_pos:num_of_chars}: extract a substring from string (bash text slicing)
${#word}: returns the length of the variable word

for((i=0;i<${#word};i++))
do
echo ${word:i:1};
done


Origin blog.csdn.net/boazheng/article/details/89376028