http://www.cnblogs.com/me115/p/3427319.html
This article covers the text-processing tools most commonly used in the Linux shell:
find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, awk;
for each, the most common and most useful options are shown with examples.
My principle for shell one-liners is to keep them short, preferably no more than two lines;
for more complex tasks, Python is usually the better choice.
find file search
-
Find .txt and .pdf files
find . \( -name "*.txt" -o -name "*.pdf" \) -print
-
Find .txt and .pdf files with a regular expression
find . -regex ".*\(\.txt\|\.pdf\)$"
-iregex: case-insensitive regular
-
Negating a match
Find all files that are not .txt:
find . ! -name "*.txt" -print
-
Specify the search depth
Print the files in the current directory only (depth 1):
find . -maxdepth 1 -type f
Custom Search
-
Search by type:
find . -type d -print // list directories only
-type f regular files / -type l symbolic links
-
Search by Time:
-atime access time (in days; the minute-based variant is -amin, and similarly below)
-mtime modification time (content changed)
-ctime change time (metadata or permissions changed)
All files accessed within the last 7 days:
find . -atime -7 -type f -print
-
Search by Size:
Units: b (512-byte blocks), c (bytes), w (words), k, M, G
Find files larger than 2k:
find . -type f -size +2k
Search by permission:
find . -type f -perm 644 -print // find all files with permission 644
Search by owner:
find . -type f -user weber -print // find the files owned by user weber
Follow-up actions on matched files
-
Delete:
Delete all .swp files in the current directory:
find . -type f -name "*.swp" -delete
-
Execute an action (the powerful -exec)
find . -type f -user root -exec chown weber {} \; // change the owner of all files under the current directory to weber
Note: {} is a special placeholder; for each matched file, {} is replaced with that file name;
eg: copy all matched files to another directory:
find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;
-
Combine multiple commands
tip: to run several commands on each match, write them into a script, then invoke the script from -exec:
-exec ./commands.sh {} \;
-print delimiters
-print uses '\n' as the delimiter between file names;
-print0 uses '\0' instead, which makes it possible to handle file names that contain spaces;
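A minimal sketch of why -print0 matters (the scratch directory and file names are made up for the demo; any name containing a space would do):

```shell
# Create a scratch directory holding a file name with a space in it.
dir=$(mktemp -d)
touch "$dir/a file.txt" "$dir/plain.txt"

# find emits NUL-terminated names, xargs -0 splits on NUL,
# so "a file.txt" is passed to ls as a single argument.
count=$(find "$dir" -name "*.txt" -print0 | xargs -0 ls -1 | wc -l)

rm -rf "$dir"
```

With the default '\n' delimiter, "a file.txt" would be split into two arguments and ls would fail on both.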
grep text search
grep match_pattern file // by default, prints the matching lines
-
Common parameters
-o print only the matched text VS -v print only the non-matching lines
-c count the lines that contain the text:
grep -c "text" filename
-n print the line numbers of matches
-i ignore case when searching
-l print only the names of matching files
-
Recursive text search through a directory tree (every programmer's favorite code search):
grep "class" . -R -n
- Match multiple patterns
grep -e "class" -e "virtual" file
- Make grep terminate each file name with '\0' (-Z, used together with -l):
grep "test" file* -lZ | xargs -0 rm
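A quick sketch of the common flags on made-up sample data:

```shell
# Sample input (contents are illustrative).
printf 'alpha\nbeta\nalphabet\n' > /tmp/grep_demo.txt

matches=$(grep -c "alpha" /tmp/grep_demo.txt)        # count lines containing the pattern
inverted=$(grep -vc "alpha" /tmp/grep_demo.txt)      # count lines NOT containing it
pieces=$(grep -o "alpha" /tmp/grep_demo.txt | wc -l) # count matched substrings only

rm /tmp/grep_demo.txt
```

"alpha" and "alphabet" both match, so -c reports 2 while -v leaves only "beta".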
xargs: turning input into command-line arguments
xargs converts its input data into command-line arguments of the given command; it therefore combines well with many commands, e.g. grep and find;
-
Convert multi-line output into a single line
cat file.txt | xargs
'\n' is the delimiter between the lines of text
-
Convert a single line into multi-line output
cat single.txt | xargs -n 3
-n: number of arguments per line
xargs parameter notes
-d define the input delimiter (by default, whitespace; the delimiter between lines is '\n')
-n number of arguments passed per command invocation
-I {} define a replacement string; {} is expanded to the input item when the command runs, which is useful when the command takes several parameters
eg:
cat file.txt | xargs -I {} ./command.sh -p {} -1
-0: use '\0' as the input delimiter
eg: count the lines of source code:
find source_dir/ -type f -name "*.cpp" -print0 |xargs -0 wc -l
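The -I substitution can be sketched without any real script (the input lines are inlined with printf):

```shell
# Each input line replaces {} in a separate invocation of echo.
first=$(printf 'a\nb\n' | xargs -I {} echo "item: {}" | head -n 1)
lines=$(printf 'a\nb\n' | xargs -I {} echo "item: {}" | wc -l)
```

Two input lines produce two invocations, and {} may sit anywhere among the other arguments.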
sort: sorting
Options:
-n sort numerically VS -d sort lexicographically
-r reverse the sort order
-k N sort by the N-th column
eg:
sort -nrk 1 data.txt
sort -bd data // ignore leading blanks such as spaces
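A runnable sketch of -nrk (the data file is made up; field 1 is numeric):

```shell
printf '3 c\n10 a\n2 b\n' > /tmp/sort_demo.txt

# -n numeric, -r reverse, -k 1 sort on the first column:
# numerically 10 > 3 > 2, even though "10" < "2" lexicographically.
top=$(sort -nrk 1 /tmp/sort_demo.txt | head -n 1)

rm /tmp/sort_demo.txt
```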
uniq: eliminate duplicate lines
- Eliminating duplicate rows
sort unsort.txt | uniq
- Count how many times each line appears in the file
sort unsort.txt | uniq -c
- Find the duplicated lines
sort unsort.txt | uniq -d
The part of each line that is compared can be restricted: -s N skips the first N characters, -w N compares at most N characters
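A small demo of the sort-then-uniq pattern (sample lines are made up):

```shell
printf 'b\na\nb\na\nc\n' > /tmp/uniq_demo.txt

# uniq only collapses ADJACENT duplicates, hence the sort first.
distinct=$(sort /tmp/uniq_demo.txt | uniq | wc -l)  # unique lines: a, b, c
dups=$(sort /tmp/uniq_demo.txt | uniq -d)           # lines that repeat: a, b

rm /tmp/uniq_demo.txt
```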
Conversion with tr
-
Common usage
echo 12345 | tr '0-9' '9876543210' // a toy substitution cipher: replace each digit with its counterpart
cat text | tr '\t' ' ' // convert tabs to spaces
-
tr: delete characters
cat file | tr -d '0-9' // delete all digits
-c: take the complement of the character set
cat file | tr -cd '0-9\n' // delete everything except digits and newlines, i.e. extract the digits
-
tr: squeeze characters
tr -s squeezes each run of a repeated character down to one; it is most commonly used to squeeze extra spaces:
cat file | tr -s ' '
-
Character classes
tr supports a number of character classes:
alnum: letters and digits
alpha: letters
digit: digits
space: whitespace
lower: lowercase letters
upper: uppercase letters
cntrl: control (non-printing) characters
print: printable characters
Usage: tr '[:class:]' '[:class:]'
eg: tr '[:lower:]' '[:upper:]'
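Squeezing and character classes compose naturally in a pipeline; a sketch on inline sample text:

```shell
# First squeeze repeated spaces, then map lowercase to uppercase.
out=$(echo "hello   tr   world" | tr -s ' ' | tr '[:lower:]' '[:upper:]')
```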
cut: split text by column
- Extract the 2nd and 4th columns of a file:
cut -f2,4 filename
- Print all columns of the file except the third:
cut -f3 --complement filename
- -d specifies the delimiter:
cut -f2 -d";" filename
- cut ranges
N- from the N-th field to the end
-M from the first field to the M-th
N-M from the N-th field to the M-th
- cut units
-b in bytes
-c in characters
-f in fields (using the delimiter)
- eg:
cut -c1-5 file // print characters 1 through 5
cut -c-2 file // print the first 2 characters
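A sketch of field and character extraction (the passwd-style line is made up):

```shell
# A /etc/passwd-style record, colon-delimited.
line='weber:x:1000:1000:Weber:/home/weber:/bin/bash'

user=$(echo "$line" | cut -d: -f1)   # first field
shell=$(echo "$line" | cut -d: -f7)  # seventh field
first4=$(echo "$line" | cut -c-4)    # first 4 characters
```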
paste: splice text by column
Splice two files together column by column;
cat file1
1
2
cat file2
colin
book
paste file1 file2
1 colin
2 book
The default delimiter is a tab; -d specifies another delimiter:
paste -d"," file1 file2
1,colin
2,book
wc: line, word and character counting
wc -l file // count lines
wc -w file // count words
wc -c file // count characters
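A tiny check of the three counters (file contents are made up; reading from stdin with < keeps the file name out of wc's output):

```shell
printf 'one two\nthree\n' > /tmp/wc_demo.txt

lines=$(wc -l < /tmp/wc_demo.txt)  # 2 lines
words=$(wc -w < /tmp/wc_demo.txt)  # 3 words

rm /tmp/wc_demo.txt
```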
sed: text substitution
- Replace the first match on each line
sed 's/text/replace_text/' file
-
Global replacement
sed 's/text/replace_text/g' file
By default sed prints the replaced content to stdout; to modify the file in place, use -i:
sed -i 's/text/replace_text/g' file
-
Remove blank lines:
sed '/^$/d' file
-
Referencing the matched string
The string just matched can be referenced with the marker &:
echo this is en example | sed 's/\w\+/[&]/g'
$> [this] [is] [en] [example]
-
Substring match tags
The content of the first \( \) group is labeled and can be referenced as \1:
sed 's/hello\([0-9]\)/\1/'
-
Double-quote evaluation
sed expressions are usually written in single quotes; double quotes may also be used, in which case the shell evaluates the expression first:
sed 's/$var/HELLO/' // single quotes: $var is taken literally
With double quotes, shell variables can appear in sed's pattern and replacement string;
eg:
p=patten r=replaced
echo "line con a patten" | sed "s/$p/$r/g"
$> line con a replaced
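The quoting difference in one runnable sketch (variable names and sample text are made up):

```shell
p=cat r=dog

# Double quotes: the shell expands $p and $r before sed runs.
out=$(echo "the cat sat" | sed "s/$p/$r/g")

# Single quotes: sed receives the literal pattern $p, which matches nothing.
single=$(echo "the cat sat" | sed 's/$p/dog/g')
```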
-
Other examples
Insert a character into a string: convert each line of text (PEKSHA) into PEK/SHA
sed 's/^.\{3\}/&\//g' file
awk data stream processing tool
-
awk script structure
awk 'BEGIN {statements} {statements2} END {statements}'
-
How it works
1. execute the statement block after BEGIN;
2. read one line from the file or stdin, then execute statements2; repeat until all input has been read;
3. execute the statement block after END;
print: print the current line
-
When print is used without arguments, it prints the current line;
echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print } END{ print "End" }'
-
When print arguments are separated by commas, the output is delimited by spaces;
echo | awk '{var1 = "v1"; var2 = "V2"; var3 = "v3"; print var1, var2, var3;}' $> v1 V2 v3
- Concatenate explicitly: adjacent strings are joined, so here "-" serves as the glue between fields;
echo | awk '{var1 = "v1"; var2 = "V2"; var3 = "v3"; print var1"-"var2"-"var3;}' $> v1-V2-v3
Special variables: NR, NF, $0, $1, $2
NR: the record number, which corresponds to the current line number during execution;
NF: the field count, which corresponds to the number of fields in the current line;
$0: the text of the current line;
$1: the first field;
$2: the second field;
echo -e "line1 f2 f3\n line2 \n line 3" | awk '{print NR":"$0"-"$1"-"$2}'
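$NF (the last field) is a handy derived form; a sketch on inline input:

```shell
# For each line print its number, its field count, and its last field.
out=$(printf 'a b\nc d e\n' | awk '{print NR, NF, $NF}')
```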
- Print the second and third field of each line:
awk '{print $2, $3}' file
-
Count the number of lines in a file:
awk ' END {print NR}' file
-
Sum up the first field of every line:
echo -e "1\n 2\n 3\n 4\n" | awk 'BEGIN{num = 0 ; print "begin";} {sum += $1;} END {print "=="; print sum }'
Passing in external variables
var=1000
echo | awk '{print vara}' vara=$var # input from stdin
awk '{print vara}' vara=$var file # input from a file
Filtering lines with awk patterns
awk 'NR < 5' # lines whose line number is less than 5
awk 'NR==1,NR==4 {print}' file # print lines 1 through 4
awk '/linux/' # lines containing the text linux (a regular expression can be used, which is very powerful)
awk '!/linux/' # lines not containing the text linux
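The range and regex filters above, checked on seq output:

```shell
range=$(seq 10 | awk 'NR==3,NR==5' | wc -l) # lines 3..5 of 10 -> 3 lines
ones=$(seq 12 | awk '/1/' | wc -l)          # 1, 10, 11, 12 contain "1" -> 4 lines
```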
Set delimiter
Use -F to set the field delimiter (whitespace by default)
awk -F: '{print $NF}' /etc/passwd
Read command output
Using getline, read the output of an external command into a variable (here cmdout);
echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout }'
Using a loop in awk
for(i=0;i<10;i++){print $i;}
for(i in array){print array[i];}
eg:
print lines in reverse order (implementing the tac command):
seq 9 | \
awk '{lifo[NR] = $0} \
END{ for(lno=NR; lno>0; lno--){print lifo[lno];}
} '
Implementing head and tail with awk
-
head:
awk 'NR<=10{print}' filename
-
tail:
awk '{buffer[NR%10] = $0;} END{for(i=NR%10+1; i<=NR%10+10; i++){print buffer[i%10]}}' filename // assumes at least 10 input lines
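The ring buffer can be sanity-checked on seq: the last 10 of 25 lines should be 16 through 25.

```shell
# b[] is a ring buffer of the last 10 lines; after the input ends,
# walk it starting just past the most recent slot (NR%10).
first_of_tail=$(seq 25 | awk '{b[NR%10]=$0} \
  END{for(i=NR%10+1;i<=NR%10+10;i++) print b[i%10]}' | head -n 1)
```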
Print the specified column
- with awk:
ls -lrt | awk '{print $6}'
- with cut (cut splits on a single tab by default, so squeeze the spaces first):
ls -lrt | tr -s ' ' | cut -d' ' -f6
Print the specified text area
- By line number
seq 100| awk 'NR==4,NR==6{print}'
- By text markers
print the text between start_pattern and end_pattern;
eg: awk '/start_pattern/, /end_pattern/' filename
seq 100 | awk '/13/,/15/'
cat /etc/passwd | awk '/mai.*mail/,/news.*news/'
Commonly used awk built-in functions
index(string, search_string): returns the position at which search_string appears in string
sub(regex, replacement_str, string): replaces the first match of regex in string with replacement_str;
match(regex, string): checks whether the regular expression matches string;
length(string): returns the length of string
echo | awk '{"grep root /etc/passwd" | getline cmdout; print length(cmdout) }'
printf: formatted output, similar to C's printf
eg:
seq 10 | awk '{printf "->%4s\n", $1}'
Iterating over the lines, words and characters of a file
1. iterate over each line of a file
-
with a while loop
while read line; do echo $line; done < file.txt
or in a subshell: cat file.txt | (while read line; do echo $line; done)
-
with awk:
cat file.txt | awk '{print}'
2. iterate over each word in a line
for word in $line;
do
echo $word;
done
3. iterate over each character of a word
${string:start_pos:num_of_chars}: extract a substring from a string (bash text slicing)
${#word}: returns the length of the variable word
for((i=0;i<${#word};i++))
do
echo ${word:i:1};
done
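Putting the slicing loop to work: a sketch that reverses a word character by character (bash-specific; the sample word is made up):

```shell
word=hello
rev=""
# Walk the word one character at a time with ${word:i:1},
# prepending each character to build the reversed string.
for ((i=0; i<${#word}; i++)); do
  rev="${word:$i:1}$rev"
done
```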