Group summation
awk '{s[$1] += $2} END {for (i in s) print i, s[i]}' file1 > file2
The first column is the group key: rows that share the same first-column value have their second-column values accumulated, and each key is printed together with its sum.
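As a quick check on hypothetical sample data (fruit names and counts):

```shell
# Hypothetical sample data: key in column 1, value in column 2
printf 'apple 3\nbanana 5\napple 2\nbanana 1\n' > file1

# Accumulate column 2 per distinct column-1 key
awk '{s[$1] += $2} END {for (i in s) print i, s[i]}' file1 | sort
# -> apple 5
#    banana 6
```

Because for (i in s) visits keys in an unspecified order, pipe through sort when a deterministic output order matters.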
awk '{s[$1" "$2] += $3} END {for (i in s) print i, s[i]}' file1 > file2
Here the group key is the pair formed by the first and second columns: rows sharing the same pair have their third-column values accumulated, and each pair is printed with its sum.
awk '{s[$1] += $2; a[$1] += $3} END {for (i in s) print i, s[i], a[i]}' haha.txt
Group by the first column and print, for each key, the sum of the second column and the sum of the third column.
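A worked check on hypothetical three-column data:

```shell
# Hypothetical sample data: key, quantity, amount
printf 'a 1 10\nb 2 20\na 3 30\n' > haha.txt

# Two running sums keyed by column 1, printed side by side
awk '{s[$1] += $2; a[$1] += $3} END {for (i in s) print i, s[i], a[i]}' haha.txt | sort
# -> a 4 40
#    b 2 20
```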
Matching
1. Matching intersection items
awk 'NR==FNR{a[$1]=1} NR>FNR && a[$1]>0 {print $0}' file1 file2 > file3
Here file1 has one field (a QQ number) and file2 has two (QQ number and point value). If a first-column value of file2 also appears in the first column of file1, the whole line of file2 is output.
Note: if the data volume exceeds 4 GB or the line count reaches 100 million, it is advisable to split the work first; otherwise even a machine with 32 GB of memory can be exhausted, since the lookup array built from file1 is held entirely in memory.
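A small check on hypothetical files (three QQ numbers in file1, three scored lines in file2):

```shell
# Hypothetical data: file1 holds QQ numbers, file2 holds QQ number and point value
printf '100\n200\n300\n' > file1
printf '100 5\n200 8\n400 2\n' > file2

# While reading file1 (NR==FNR), mark its keys; while reading file2, print matches
awk 'NR==FNR{a[$1]=1} NR>FNR && a[$1]>0 {print $0}' file1 file2
# -> 100 5
#    200 8
```

The matching lines keep the order they had in file2.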
awk 'NR==FNR{a[$1" "$2]=1} NR>FNR && a[$1" "$2]>0 {print $0}' file1 file2 > file3
If the first and second columns of a file2 line match a (column 1, column 2) pair that occurs in file1, the whole line of file2 is output.
2. Matching non-intersection items
awk 'NR==FNR{a[$1]=1} NR>FNR && a[$1]<1 {print $0}' file1 file2 > file3
Compare the first columns of the two files and output the lines of file2 whose first-column value does not appear in file1.
An alternative using sort and uniq (jiaoji.txt holds the intersection):
cat file1 file2 | sort | uniq -d > jiaoji.txt
cat file2 jiaoji.txt | sort | uniq -u > file3
Note: this variant compares whole lines rather than just the first column, and assumes neither file contains internal duplicates.
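A check of the awk variant on hypothetical single-column files:

```shell
# Hypothetical single-column files
printf '1\n2\n3\n' > file1
printf '2\n3\n4\n5\n' > file2

# Print file2 lines whose key was never seen in file1
awk 'NR==FNR{a[$1]=1} NR>FNR && a[$1]<1 {print $0}' file1 file2
# -> 4
#    5
```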
Maximum and minimum values
1. For two-column files
awk '!($1 in max) || $2 > max[$1] {max[$1] = $2} END {for (i in max) print i, max[i]}' file
The first column is the group key; for each key the maximum of the second column is printed. Testing whether the key is already in the array avoids treating an unset entry as 0, so negative values are handled correctly.
awk '!($1 in min) || $2 < min[$1] {min[$1] = $2} END {for (i in min) print i, min[i]}' file
The first column is the group key; for each key the minimum of the second column is printed, with the first row of each group initializing the minimum, so no sentinel starting value is needed.
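A quick check of the grouped maximum on hypothetical data, using the variant that tests for key existence so negative values are handled:

```shell
# Hypothetical grouped data; note the negative value in group x
printf 'x 5\nx -2\ny 7\ny 9\n' > file

# First row of each group initializes the maximum; later rows update it
awk '!($1 in max) || $2 > max[$1] {max[$1] = $2} END {for (i in max) print i, max[i]}' file | sort
# -> x 5
#    y 9
```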
2. For single-column files
awk 'NR == 1 {max = $1} $1 > max {max = $1} END {print "Max =", max}' file2
awk 'NR == 1 {min = $1} $1 < min {min = $1} END {print "Min =", min}' file2
Initializing from the first line avoids having to guess sentinel starting values, and works for negative numbers as well.
Sum, Average, Standard Deviation
Sum
cat data | awk '{sum += $1} END {print "Sum =", sum}'
Average
cat data | awk '{sum += $1} END {print "Average =", sum/NR}'
Standard deviation (the mean must be computed first and passed in as ave)
ave=$(awk '{sum += $1} END {print sum/NR}' "$FILE")
cat "$FILE" | awk -v ave="$ave" '{sum += ($1-ave)^2} END {print sqrt(sum/(NR-1))}'
This is the sample standard deviation, dividing by NR-1.
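Putting the three together on hypothetical data (1 through 4: mean 2.5, sample standard deviation sqrt(5/3)):

```shell
# Hypothetical data file
printf '1\n2\n3\n4\n' > data

awk '{sum += $1} END {print "Sum =", sum}' data        # -> Sum = 10

# Compute the mean first, then pass it into the deviation pass with -v
ave=$(awk '{sum += $1} END {print sum/NR}' data)
awk -v ave="$ave" '{sum += ($1-ave)^2} END {print sqrt(sum/(NR-1))}' data
# -> 1.29099
```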
Consolidate rows and columns
1. Columns to rows
If the first column is the same, collect all the second and third columns into one row:
awk '{qq[$1] = qq[$1] " " $2 " " $3} END {for (i in qq) print i, qq[i]}' file
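A small check on hypothetical data with two keys:

```shell
# Hypothetical data: columns 2 and 3 are gathered per column-1 key
printf 'k1 a b\nk1 c d\nk2 e f\n' > file
awk '{qq[$1] = qq[$1] " " $2 " " $3} END {for (i in qq) print i, qq[i]}' file | sort
```

Note that each output line carries an extra space between the key and the first collected value, because the first concatenation starts from an empty string.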
2. Merging files
Two files, each with two columns: merge them into a three-column file on matching first-column values, writing 0 for any key that is missing from one of the files.
awk 'FNR==NR{a[$1]=$2} FNR!=NR{b[$1]=$2} END {for (i in a) print i, a[i], (i in b ? b[i] : 0); for (i in b) if (!(i in a)) print i, 0, b[i]}' file1 file2 > file3
Note: the FNR==NR test assumes file1 is not empty.
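A worked check on hypothetical files, using a variant that stores both files and fills the missing side with 0 in the END block:

```shell
# Hypothetical two-column files; key 1 is only in file1, key 3 only in file2
printf '1 10\n2 20\n' > file1
printf '2 200\n3 300\n' > file2

awk 'FNR==NR{a[$1]=$2} FNR!=NR{b[$1]=$2}
     END {for (i in a) print i, a[i], (i in b ? b[i] : 0)
          for (i in b) if (!(i in a)) print i, 0, b[i]}' file1 file2 | sort
# -> 1 10 0
#    2 20 200
#    3 0 300
```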
3. Two files, each with three columns: merge them into a four-column file on matching (column 1, column 2) pairs, writing 0 for any pair that is missing from one of the files.
awk 'FNR==NR{a[$1" "$2]=$3} FNR!=NR{b[$1" "$2]=$3} END {for (i in a) print i, a[i], (i in b ? b[i] : 0); for (i in b) if (!(i in a)) print i, 0, b[i]}' file1 file2 > file3
4. Columns to rows, starting a new output row at each blank line
awk 'BEGIN {RS=""} {print $1, $2, $3}' file1
With RS set to the empty string, awk reads paragraph-wise: records are separated by blank lines, and newlines inside a record act as additional field separators.
5. Filtering a column of numbers by range
cat canshu | while read a b
do
    awk '{if ($2 > '"$a"' && $2 <= '"$b"') print $1}' result.txt > "$a"_"$b"_result.log
done
Note: to interpolate a shell variable into an awk program quoted with single quotes, write '"$a"': close the single quote, insert the variable inside double quotes, then reopen the single quote.
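A worked run on hypothetical inputs (bounds are treated as an open lower and closed upper interval, matching the $2 > a && $2 <= b test):

```shell
# Hypothetical inputs: canshu holds lower and upper bounds; result.txt holds name, score
printf '0 10\n10 20\n' > canshu
printf 'a 5\nb 15\nc 10\n' > result.txt

cat canshu | while read a b
do
    awk '{if ($2 > '"$a"' && $2 <= '"$b"') print $1}' result.txt > "$a"_"$b"_result.log
done

cat 0_10_result.log
# -> a
#    c
```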
Set operations
1. Intersection
cat fileA fileB | sort | uniq -d > result.log
(These line-based set operations assume neither file contains duplicate lines.)
2. Difference
cat fileA fileB | sort | uniq -d > jiaoji.txt
cat fileA jiaoji.txt | sort | uniq -u > result.log
3. Union with duplicates removed
cat fileA fileB | sort -u > result.log
4. Union with duplicates kept
cat fileA fileB | sort > result.log
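The set operations above, checked on hypothetical single-column files:

```shell
# Hypothetical single-column files
printf '1\n2\n3\n' > fileA
printf '2\n3\n4\n' > fileB

cat fileA fileB | sort | uniq -d   # intersection: lines 2 and 3
cat fileA fileB | sort -u          # union, deduplicated: lines 1 2 3 4
```

uniq -d keeps only lines that appear more than once in the sorted stream, which is exactly the lines present in both files when each file is internally duplicate-free.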