Good Programmer Big Data Learning Path: AWK Explained

  This installment of the Good Programmer big data learning path explains awk. awk is a powerful text analysis tool. Compared with grep (searching) and sed (editing), awk is particularly strong at data analysis and report generation. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default delimiter, and then processes the resulting fields.

  awk scans files or strings according to specified rules and extracts information from them; the extracted information can then be used for other text operations. Complete awk scripts are typically used to format the information in text files.

  awk normally processes a file one line at a time. It receives each line of the file and then runs the corresponding commands to process the text.

awk operation

There are three ways to call awk

1. Command line

  awk [-F  field-separator]  'commands'  input-file(s)

Here, commands is the actual awk program and [-F field-separator] is optional. input-file(s) is the file or files to be processed. In awk, each piece of a line delimited by the field separator is called a field. If no separator is specified with -F, the default field separator is whitespace.
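
The effect of -F can be sketched as follows. This is a minimal illustration that assumes a made-up line in /etc/passwd style instead of a real file; the line content and the variable names are not from the original.

```shell
# A made-up record in /etc/passwd style; its fields are separated by colons.
line='alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# With -F ':' the colon is the field separator, so $1 is the user name.
with_sep=$(printf '%s\n' "$line" | awk -F ':' '{print $1}')

# Without -F, awk splits on whitespace; this line contains none,
# so $1 is the whole line.
without_sep=$(printf '%s\n' "$line" | awk '{print $1}')

echo "$with_sep"
echo "$without_sep"
```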

2. Shell script style

  Put all the awk commands into a file, make the file executable, and name the awk command interpreter on the first line of the script. The script can then be invoked by typing its name.

  This is the counterpart of the first line of a shell script: #!/bin/sh

  which is replaced with: #!/bin/awk -f (the -f is required so that awk reads the script file as its program; the path to awk may differ on your system)

3. Put all the awk commands into a separate file and then call: awk -f awk-script-file input-file(s)

  Here, the -f option loads the awk script from awk-script-file, and input-file(s) is the same as above.
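
As a rough sketch of the third method, the snippet below writes a tiny awk script to a file and runs it with -f. The file names names.awk and employee.txt, the temporary directory, and the two input rows are illustrative assumptions.

```shell
# Work in a temporary directory so nothing else is touched.
workdir=$(mktemp -d)

# Hypothetical input file with two whitespace-separated rows.
cat > "$workdir/employee.txt" <<'EOF'
100  Thomas  Manager Sales 5000
200  Jason   Developer  Technology  5500
EOF

# Hypothetical script file holding the awk commands
# (no surrounding shell quotes are needed inside the file).
cat > "$workdir/names.awk" <<'EOF'
# Print the second field of every line.
{ print $2 }
EOF

# -f loads the program from the script file instead of the command line.
names=$(awk -f "$workdir/names.awk" "$workdir/employee.txt")
echo "$names"

rm -r "$workdir"
```

Keeping the program in its own file avoids shell quoting problems once the awk commands grow beyond one line.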

 

awk syntax

1. awk command format

(1) awk [-F field-separator] 'commands' input-file(s)

(2) awk -f awk-script-file input-file(s)

 

Sample file:

cat employee.txt

100  Thomas  Manager Sales 5000

200  Jason   Developer  Technology  5500

300  Sanjay  Sysadmin   Technology  7000

400  Nisha   Manager    Marketing   9500

500  Randy   DBA        Technology  60002

 

2. awk operations

 

1. Print each line of the file:

  awk '{print $0}' ./employee.txt

  

2. Print the first field of /etc/passwd

   awk  -F ":" '{print $1}' /etc/passwd

 

3. Print the entire contents of the file

  awk '{print $0}' employee.txt

 

4. Extract the first column of the file

  awk '{print $1}' employee.txt  

  or

  awk -F ' ' '{print $1}' employee.txt

  

  

5. List all user names and their login shells (the login shell is the seventh field of /etc/passwd)

  awk -F ':' '{print $1,$7}' /etc/passwd

  

  When the separator consists of several characters, for example:

  a , b , c , d

  a1 , b1 , c1 , d1

  awk -F ' , ' '{print $1,$2}' filename
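
A small sketch of the multi-character separator, feeding the two sample rows above through a pipe instead of a file (the variable names are illustrative):

```shell
# The two sample rows, with " , " (space, comma, space) between values.
data='a , b , c , d
a1 , b1 , c1 , d1'

# A separator longer than one character is treated by awk as a regular
# expression; here it matches the literal three-character sequence.
first_two=$(printf '%s\n' "$data" | awk -F ' , ' '{print $1,$2}')
echo "$first_two"
```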

 

 

6. Print the line whose user name is root

  awk -F ':' '$1=="root" {print $0}' /etc/passwd

  or

  awk -F ':' '$1=="keke" {print $1}' /etc/passwd

  

  Note: $1=="root" and $1=="keke" are both conditions; the action runs only for lines that satisfy them.
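
The same kind of condition can be tried on the employee sample data. The sketch below inlines four of the sample rows so that it runs on its own; the numeric comparison is an additional illustration, not something the original examples use.

```shell
# Four rows of the employee sample data, inlined for a self-contained run.
employees='100  Thomas  Manager Sales 5000
200  Jason   Developer  Technology  5500
300  Sanjay  Sysadmin   Technology  7000
400  Nisha   Manager    Marketing   9500'

# String condition: only lines whose third field is "Manager" run the action.
managers=$(printf '%s\n' "$employees" | awk '$3=="Manager" {print $2}')

# Numeric condition: only lines whose fifth field is greater than 6000.
well_paid=$(printf '%s\n' "$employees" | awk '$5 > 6000 {print $2}')

echo "$managers"
echo "$well_paid"
```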

 

 

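awk works as follows: it reads one record delimited by the '\n' newline character, splits the record into fields according to the specified field separator, and populates the fields. $0 represents the whole record (all fields), $1 the first field, and $n the nth field. The default field separator is a space or a tab.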
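
The field variables can be observed directly. The following is a minimal sketch on a made-up record that mixes a tab and several spaces; NF, awk's built-in count of fields in the current record, is not mentioned in the text above and is used here only to show how many fields were found.

```shell
# One made-up record: fields separated by a tab and by a run of spaces.
record=$(printf 'alpha\tbeta   gamma')

whole=$(printf '%s\n' "$record" | awk '{print $0}')   # the entire record
first=$(printf '%s\n' "$record" | awk '{print $1}')   # first field
third=$(printf '%s\n' "$record" | awk '{print $3}')   # third field
count=$(printf '%s\n' "$record" | awk '{print NF}')   # number of fields

echo "$first $third $count"
```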

 

 

7. Add a header to the output

awk -F ":" 'BEGIN {print "name\tshell\n--------------------------------"}

  {print $1"\t"$7}' /etc/passwd

  

8. Add a header and a footer to the output

awk -F : 'BEGIN {print "name\tshell\n--------------------------------"} {print $1"\t"$7}

  END {print "end-of-report"}' /etc/passwd


awk -F ":" 'BEGIN {print"--BEGIN--"}

           $1=="root" { print $1}

   END{print"----END------"}' /etc/passwd

   

awk -F ":" 'BEGIN {print"--BEGIN--"} {if( $1=="root") print $1}

  END{print"----END------"}' /etc/passwd

 

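awk works as follows: it first executes the BEGIN block and then reads the file. It reads one record delimited by the \n newline character, splits the record into fields according to the specified field separator, and populates the fields: $0 represents the whole record (all fields), $1 the first field, and $n the nth field. It then executes the action that corresponds to the matching pattern. Next it reads the second record, and so on until all records have been read; finally it executes the END block.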
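
The order BEGIN, per-record actions, END can be sketched on four inlined rows of the employee sample data. The counter n and the printed labels are illustrative choices.

```shell
# Four rows of the employee sample data, inlined for a self-contained run.
employees='100  Thomas  Manager Sales 5000
200  Jason   Developer  Technology  5500
300  Sanjay  Sysadmin   Technology  7000
400  Nisha   Manager    Marketing   9500'

report=$(printf '%s\n' "$employees" | awk '
  BEGIN { print "start" }                 # runs once, before any record is read
  $4 == "Technology" { n++; print $2 }    # runs for each matching record
  END { print "matched: " n }             # runs once, after the last record
')
echo "$report"
```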

 

Differences between awk and MapReduce

 

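1. awk is mainly used to work on files on a single machine.

2. MapReduce can work on a distributed file system and can process large volumes of data. Its drawback is that programming is more complex than with awk, although with the framework's support, writing a MapReduce program only requires implementing the business logic.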



Origin blog.51cto.com/14479068/2432976