awk: grouped sums and grouped statistics

Group summation

 


awk '{s[$1] += $2}END{ for(i in s){  print i, s[i] } }' file1 > file2

Using the first column as the key, accumulate the second-column values of all rows that share the same first column, then print each key with its sum.
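For example, with three made-up rows:

printf 'a 1\na 2\nb 5\n' | awk '{s[$1] += $2}END{ for(i in s){  print i, s[i] } }'

prints (the iteration order of for-in is unspecified):

a 3
b 5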


awk '{s[$1" "$2] += $3}END{ for(i in s){  print i, s[i] } }'  file1 > file2

Using the first and second columns together as the key, accumulate the third-column values of all rows that share the same key, then print each key with its sum.


awk '{s[$1] += $2; a[$1] += $3 }END{ for(i in s){  print i,s[i],a[i] } }'  haha.txt

Group rows by the first column and print, for each group, the sums of the second and third columns.


 

Matching

1. Matching the intersection


awk 'NR==FNR{a[$1]=1}NR>FNR&&a[$1]>0{print $0}' file1 file2 > file3

If a first-column value appears in both file1 and file2, output the whole matching line from file2 (here file1 holds one QQ number per line, and file2 holds a QQ number and its point value).

Note: if the data volume exceeds 4 GB or reaches 100 million lines, it is recommended to split file2 first; otherwise even a machine with 32 GB of memory can be exhausted.
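A quick illustration with two hypothetical files:

printf '1001\n1002\n' > file1
printf '1001 50\n1003 70\n' > file2
awk 'NR==FNR{a[$1]=1}NR>FNR&&a[$1]>0{print $0}' file1 file2

prints only the file2 line whose key also occurs in file1:

1001 50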


awk 'NR==FNR{a[$1" "$2]=1}NR>FNR&&a[$1" "$2]>0{print $0}' file1 file2 > file3

If the first two columns of a line match between file1 and file2, output the whole matching line from file2.


2. Matching the non-intersection


awk 'NR==FNR{a[$1]=1}NR>FNR&&a[$1]<1{print $0}' file1 file2 > file3

Compare the first columns of the two files and output the lines of file2 whose first-column value does not appear in file1.
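Note that the test a[$1]<1 silently creates an empty array entry for every file2 key it probes, which wastes memory on large inputs; a sketch of the same filter using the in operator avoids that:

awk 'NR==FNR{a[$1]=1}NR>FNR && !($1 in a){print $0}' file1 file2 > file3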


The second method:


cat file1 file2 | sort | uniq -d > jiaoji.txt   # jiaoji.txt = lines common to both files
cat file2 jiaoji.txt | sort | uniq -u > file3   # file2 minus the intersection
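Because uniq -d and uniq -u count repeated lines, this trick assumes neither file contains internal duplicates; run each file through sort -u first if that is not guaranteed. With bash process substitution, comm does the same job in one step (a sketch):

comm -13 <(sort -u file1) <(sort -u file2) > file3   # lines only in file2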

 

Take the maximum and minimum values

1. For two-column files


awk '{max[$1]=max[$1]>$2?max[$1]:$2}END{for(i in max)print i,max[i]}'  file

Group by the first column and print each key with the maximum of its second-column values.


awk '{if(!min[$1])min[$1]=20121231235959; min[$1]=min[$1]<$2?min[$1]:$2}END{for(i in min)print i,min[i]}' file

Group by the first column and print each key with the minimum of its second-column values.
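The constant 20121231235959 is just a sentinel assumed to be larger than any real value (it looks like a YYYYMMDDHHMMSS timestamp). A sketch that initializes each group from its first value needs no magic number; the same idea applies to the grouped maximum above, whose uninitialized max[$1] behaves like a 0 sentinel and mishandles all-negative data:

awk '!($1 in min) || $2<min[$1]{min[$1]=$2}END{for(i in min)print i,min[i]}' file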

2. For single-column files


awk 'BEGIN{max=0} {if($1>max) max=$1} END{print "Max=", max}' file2

awk 'BEGIN{min=1999999} {if($1<min) min=$1} END{print "Min=", min}' file2
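Both sentinels (0 and 1999999) fail if the data falls outside the assumed range, e.g. all-negative values for the maximum. A sketch that seeds both extremes from the first record avoids this:

awk 'NR==1{max=$1;min=$1} {if($1>max)max=$1; if($1<min)min=$1} END{print "Max=", max, "Min=", min}' file2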

 

Sum, Average, Standard Deviation

Sum


cat data|awk '{sum+=$1} END {print "Sum = ", sum}'

Average


cat data|awk '{sum+=$1} END {print "Average = ", sum/NR}'

Standard deviation


cat $FILE | awk -v ave=$ave '{sum+=($1-ave)^2}END{print sqrt(sum/(NR-1))}'
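This assumes the shell variable $ave already holds the mean from a previous pass over $FILE. A single-pass sketch that keeps a running sum and sum of squares computes the same sample standard deviation on its own (mathematically equivalent, though slightly less robust numerically for very large values):

awk '{s+=$1; ss+=$1*$1} END{m=s/NR; print sqrt((ss-NR*m*m)/(NR-1))}' data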

 

Merging rows and columns

1. Turning columns into rows

If the first column is the same, gather the second and third columns of all matching rows onto one row.


 awk '{qq[$1]=qq[$1](" "$2" "$3)}END{for(i in qq)print i,qq[i]}'
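For example, with hypothetical input:

printf 'x 1 2\nx 3 4\ny 5 6\n' | awk '{qq[$1]=qq[$1](" "$2" "$3)}END{for(i in qq)print i,qq[i]}'

prints (note the doubled space after the key, because qq[i] itself begins with a space):

x  1 2 3 4
y  5 6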


 

2. Merge files

Given two files of two columns each, merge them on the first column into a three-column file; when a key from file2 is missing from file1, fill the missing value with 0.


awk 'FNR==NR{a[$1]=$2}FNR<NR{if($1 in a)a[$1]=a[$1]" "$2;else a[$1]="0 "$2}END{for(i in a)print i,a[i]}' file1 file2 > file3

Note: FNR==NR is true only while reading the first file, so file1 must be passed first. Also, keys that appear only in file1 are printed without a trailing 0 for file2's missing value.
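A worked example with hypothetical data:

printf '1 10\n2 20\n' > file1
printf '1 100\n3 300\n' > file2
awk 'FNR==NR{a[$1]=$2}FNR<NR{if($1 in a)a[$1]=a[$1]" "$2;else a[$1]="0 "$2}END{for(i in a)print i,a[i]}' file1 file2

prints (order may vary; key 2 illustrates the missing-fill caveat from the note):

1 10 100
2 20
3 0 300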

3. Given two files of three columns each, merge them on the first two columns into a four-column file; when a key from the second file is missing from the first, fill the missing value with 0.


awk 'FNR==NR{a[$1" "$2]=$3}FNR<NR{if(($1" "$2) in a)a[$1" "$2]=a[$1" "$2]" "$3;else a[$1" "$2]="0 "$3}END{for(i in a)print i,a[i]}' file1 file2 > file3

4. Put each block's columns onto one row, starting a new record at each blank line


awk 'BEGIN {RS=""} {print $1,$2,$3}' file1
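For example, with two blank-line-separated blocks:

printf 'a\nb\nc\n\nd\ne\nf\n' | awk 'BEGIN {RS=""} {print $1,$2,$3}'

prints:

a b c
d e f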

5. Filtering a column by numeric ranges


# canshu supplies one "lower upper" bound pair per line
cat canshu | while read a b
do
    awk '{ if ($2>'"$a"' && $2<='"$b"') print $1 }' result.txt > "$a"_"$b"_result.log
done

Note: to splice a shell variable into an awk program, close the single quote, insert the variable in double quotes, and reopen the single quote, as in '"$a"'.
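An equivalent sketch that passes the shell variables with awk -v instead of quote-splicing, which is easier to read:

cat canshu | while read a b
do
    awk -v a="$a" -v b="$b" '$2 > a && $2 <= b {print $1}' result.txt > "$a"_"$b"_result.log
done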

 

Set operations


1. Intersection


cat fileA fileB | sort | uniq -d > result.log
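uniq -d reports lines that occur more than once in the combined stream, so this is only a true intersection if each file is internally duplicate-free. With bash process substitution, a sketch that deduplicates first:

comm -12 <(sort -u fileA) <(sort -u fileB) > result.log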

2. Set difference


cat fileA fileB | sort | uniq -d > jiaoji.txt       # jiaoji.txt = the intersection
cat fileA jiaoji.txt | sort | uniq -u > result.log  # fileA minus the intersection

3. Union, duplicates removed


cat fileA fileB | sort -u > result.log

4. Union, duplicates kept


cat fileA fileB | sort > result.log
