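awk 4.0+ man page: annotated translation, first edition

First edition of an annotated translation of the man page for awk 4.0 and later, as shipped on CentOS 7: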

Reference: http://www.cnblogs.com/wutao666/p/9732976.html
Reference: https://www.gnu.org/software/gawk/manual/gawk.html#Array-Sorting


GAWK(1)							 Utility Commands						  GAWK(1)



NAME
       gawk - pattern scanning and processing language

SYNOPSIS
       gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
       gawk [ POSIX or GNU style options ] [ -- ] program-text file ...

       pgawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
       pgawk [ POSIX or GNU style options ] [ -- ] program-text file ...

       dgawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...

DESCRIPTION
       Gawk  is	 the GNU Project's implementation of the AWK programming language.  It conforms to the definition of the language
       in the POSIX 1003.1 Standard.  This version in turn is based on the description in The AWK Programming Language,	 by  Aho,
       Kernighan, and Weinberger.  Gawk provides the additional features found in the current version of UNIX awk and a number of
       GNU-specific extensions.
	   
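       In short: gawk is the GNU Project's implementation of the AWK language and follows the POSIX 1003.1 definition, which is
       itself based on the book The AWK Programming Language by Aho, Kernighan, and Weinberger (the authors of the book, not the
       implementers of gawk). On top of that, gawk offers the extra features of current UNIX awk plus a number of GNU-specific
       extensions.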
	   

       The command line consists of options to gawk itself, the AWK program text (if not supplied via the -f or --file	options),
       and values to be made available in the ARGC and ARGV pre-defined AWK variables.
	   
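       The command line is made up of gawk's own options, the AWK program text (unless it is supplied with -f or --file), and the
       remaining values, which are exposed to the program through the predefined (built-in) variables ARGC and ARGV.
       PS: Put simply, ARGC holds the number of command-line arguments and ARGV is an array holding their values, with ARGV[0]
       being the command name itself, much like argc and argv in C.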
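
A quick way to see ARGC and ARGV at work is a BEGIN-only program. The sketch below assumes a POSIX shell and any awk; the file names first.txt and second.txt are hypothetical and are never opened, because a program with only a BEGIN rule reads no input.

```shell
# Print the argument count and the two operands. ARGV[0] (the command name) is left out
# because its exact value depends on how awk was invoked.
args_out=$(awk 'BEGIN { printf "%d %s %s\n", ARGC, ARGV[1], ARGV[2] }' first.txt second.txt)
echo "$args_out"
```

ARGC is 3 here because the command name in ARGV[0] is counted along with the two operands.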

       Pgawk  is  the profiling version of gawk.  It is identical in every way to gawk, except that programs run more slowly, and
       it automatically produces an execution profile in the file awkprof.out when done.  See the --profile option, below.
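       Pgawk is the profiling version of gawk (not an imitation of it). It behaves exactly like gawk, except that programs run
       more slowly and, on completion, an execution profile is written automatically to the file awkprof.out. See the --profile
       option below.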
	   
	   
       Dgawk is an awk debugger. Instead of running the program directly, it loads the AWK  source  code  and  then  prompts  for
       debugging  commands.   Unlike  gawk  and	 pgawk, dgawk only processes AWK program source provided with the -f option.  The
       debugger is documented in GAWK: Effective AWK Programming.
	   
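       Dgawk is the awk debugger. Rather than running the program directly, it loads the AWK source code and then prompts for
       debugging commands. Unlike gawk and pgawk, dgawk only handles AWK program source supplied with the -f option. The debugger
       is documented in the book GAWK: Effective AWK Programming.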

OPTION FORMAT
       Gawk options may be either traditional POSIX-style one letter options, or GNU-style long	 options.   POSIX  options  start
       with  a	single “-”, while long options start with “--”.	 Long options are provided for both GNU-specific features and for
       POSIX-mandated features.
	   
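       Gawk options are either POSIX-style single-letter (short) options or GNU-style long options. Short options start with a
       single "-", long options with "--". Long options exist both for GNU-specific features and for POSIX-mandated features.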

       Gawk-specific options are typically used in long-option form.  Arguments to long options are either joined with the
       option  by an = sign, with no intervening spaces, or they may be provided in the next command line argument.  Long options
       may be abbreviated, as long as the abbreviation remains unique.

       Additionally, each long option has a corresponding short option, so that the  option's  functionality  may  be  used  from
       within #!  executable scripts.
	   
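       In addition, every long option has a corresponding short option, so that its functionality can be used from within
       executable scripts that begin with "#!".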
	   

OPTIONS
       Gawk  accepts  the  following options.  Standard options are listed first, followed by options for gawk extensions, listed
       alphabetically by short option.
       Gawk accepts the options below: standard options first, then the gawk extension options in alphabetical order of their
       short form.

       -f program-file
       --file program-file
	      Read the AWK program source from the file program-file, instead of from the first command line argument.	 Multiple
	      -f (or --file) options may be used.
       
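              PS: What is casually called an "awk command" is, in the manual's terms, the AWK program, since it can be quite
              complex.
              Read the AWK program from the file program-file instead of taking it from the first command-line argument.
              Several -f (or --file) options may be given to load several files.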
		  
       -F fs
       --field-separator fs
	      Use fs for the input field separator (the value of the FS predefined variable).
              Use fs as the input field separator, that is, as the value of the predefined (built-in) variable FS.
		  

       -v var=val
       --assign var=val
	      Assign  the value val to the variable var, before execution of the program begins.  Such variable values are 
		  available to the BEGIN block of an AWK program.
              Assign the value val to the variable var before the program starts running, so that the value is already
              visible inside the BEGIN block.
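
As a small illustration of the point that -v values are already set when BEGIN runs, here is a sketch assuming a POSIX shell and any awk; the variable name greeting is made up for the example.

```shell
# -v assigns greeting before the program starts, so BEGIN can print it.
v_out=$(awk -v greeting=hello 'BEGIN { print greeting }')
echo "$v_out"
```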
		  
		  
       -b
       --characters-as-bytes
	      Treat all input data as single-byte characters. In other words, don't pay any attention to the  locale  information
	      when attempting to process strings as multibyte characters.  The --posix option overrides this one.
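              Treat all input data as single-byte characters; in other words, gawk ignores the locale information and makes no
              attempt to process strings as multibyte characters. The --posix option overrides this one.
              PS: I found this option hard to grasp; it is rarely needed.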
	    	  
		  
       -c
       --traditional
	      Run  in  compatibility mode.  In compatibility mode, gawk behaves identically to UNIX awk; none of the GNU-specific
	      extensions are recognized.  See GNU EXTENSIONS, below, for more information.
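              Run in compatibility mode, in which gawk behaves exactly like traditional UNIX awk and none of the GNU-specific
              extensions are recognized. The GNU extensions are described further below.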
		  
		  
       -C
       --copyright
	      Print the short version of the GNU copyright information message on the standard output and exit successfully.
              Print the short form of the GNU copyright notice on standard output, then exit successfully.
		  
		  
       -d[file]
       --dump-variables[=file]
	      Print a sorted list of global variables, their types and final values to file.  If no file is provided, gawk uses a
	      file named awkvars.out in the current directory.
	      Having  a	 list  of  all the global variables is a good way to look for typographical errors in your programs.  You
	      would also use this option if you have a large program with a lot of functions, and you want to be sure  that  your
	      functions	 don't	inadvertently use global variables that you meant to be local.	(This is a particularly easy 
		  mistake to make with simple variable names like i, j, and so on.)
		  
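              With this option, gawk writes a sorted list of the global variables, their types, and their final values to the
              given file, or to awkvars.out in the current directory if no file is given.

              [Why this option is useful]:
              Listing every global variable is a good way to spot typographical errors in a program. It also helps in a large
              program with many functions, where you want to be sure that no function accidentally uses a global variable
              that was meant to be local (an easy mistake with short names such as i and j).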
		  
		  

       -e program-text
       --source program-text
	      Use program-text as AWK program source code.  This option allows the easy intermixing of	library	 functions  (used
	      via  the	-f and --file options) with source code entered on the command line.  It is intended primarily for medium
	      to large AWK programs used in shell scripts.
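              Use program-text as AWK program source code. (Unlike "source code" in most programming languages, this simply
              means AWK patterns, actions, or both.) The option lets source code kept in files (loaded with -f or --file) be
              mixed freely with source code typed on the command line. It is meant mainly for medium to large AWK programs
              used in shell scripts, where the AWK part and the shell part can be kept separate and combined as needed.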
	   	  

		  
       -E file
       --exec file
	      Similar to -f, however, this option is the last one processed.  This should be used with #! scripts,
		  particularly  for  CGI  applications, to avoid passing in options or source code (!) on the command line from a URL.  
		  This option disables command-line variable assignments.

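              Like -f, except that this is the last option processed. It should be used in "#!" scripts, particularly for CGI
              (Common Gateway Interface) applications, so that options or even source code (!) cannot be passed in on the
              command line from a URL. This option also disables command-line variable assignments.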
		  
		  
       -g
       --gen-pot
	      Scan and parse the AWK program, and generate a GNU .pot (Portable Object Template) format file on	 standard  output
	      with  entries for all localizable strings in the program.	 The program itself is not executed.  See the GNU gettext
	      distribution for more information on .pot files.

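              Scan and parse the AWK program and write a GNU .pot (Portable Object Template) file to standard output, with an
              entry for every localizable string in the program. The program itself is not run. The GNU gettext distribution
              has more information on .pot files.
              PS: I did not manage to get this working in a local test and do not know why.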
		  
		  
		  
       -h
       --help Print a relatively short summary of the available options on the standard output.	 (Per the GNU  Coding  Standards,
	      these options cause an immediate, successful exit.)
	  
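              awk -h help  or  awk --help
              Print a short summary of the available options on standard output. (Per the GNU Coding Standards, this option
              causes an immediate, successful exit.)
              PS: On my system a bare awk -h asked for an argument (any argument after it will do); without one the exit
              status is unsuccessful, although the help text is still printed.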
		  

       -L [value]
       --lint[=value]
	      Provide  warnings about constructs that are dubious or non-portable to other AWK implementations.	 With an optional
	      argument of fatal, lint warnings become fatal errors.  This may be drastic, but its use  will  certainly	encourage
	      the development of cleaner AWK programs.	With an optional argument of invalid, only warnings about things that are
	      actually invalid are issued. (This is not fully implemented yet.)

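              Warn about constructs that are dubious or not portable to other AWK implementations. With the argument fatal,
              lint warnings become fatal errors; with the argument invalid, only warnings about things that are actually
              invalid are issued.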
		  
       -n
       --non-decimal-data
	      Recognize octal and hexadecimal values in input data.  Use this option with great caution!
              Octal and hexadecimal values in the input data are recognized as numbers. Use this option with great care.
		  
		  
       -N
       --use-lc-numeric
	      This forces gawk to use the locale's decimal point character when parsing input data.  Although the POSIX	 standard
	      requires	this  behavior, and gawk does so when --posix is in effect, the default is to follow traditional behavior
	      and use a period as the decimal point, even in locales where the period is not the decimal point	character.   This
	      option overrides the default behavior, without the full draconian strictness of the --posix option.
       
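              Force gawk to use the locale's decimal point character when parsing input data. POSIX requires this behavior
              and gawk follows it when --posix is in effect, but by default gawk keeps the traditional behavior and uses a
              period as the decimal point, even in locales where the decimal point is some other character.

              This option changes only that default, without the far stricter rules that --posix would bring with it.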
		  
       -O
       --optimize
	      Enable  optimizations  upon  the internal representation of the program.	Currently, this includes just simple 
		  constant-folding. The gawk maintainer hopes to add additional optimizations over time.
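              Enable optimizations on the program's internal representation. At present this means only simple constant
              folding; the gawk maintainer hopes to add further optimizations in future releases.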
        		
		  

       -p[prof_file]
       --profile[=prof_file]
	      Send profiling data to prof_file.	 The default is awkprof.out.  When run with gawk, the profile is just  a  “pretty
	      printed”	version	 of the program.  When run with pgawk, the profile contains execution counts of each statement in
	      the program in the left margin and function call counts for each user-defined function.

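              Send profiling data to prof_file; the default file is awkprof.out.
              Under gawk, the profile is merely a "pretty printed" (neatly reformatted) listing of the program. Under pgawk,
              the profile also shows, in the left margin, how many times each statement was executed, plus the call count of
              each user-defined function. (PS: This too feels like a feature mainly for people developing awk programs.)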
		  
		  
       -P
       --posix
	      This turns on compatibility mode, with the following additional restrictions:

	      · \x escape sequences are not recognized.

	      · Only space and tab act as field separators when FS is set to a single space, newline does not.

	      · You cannot continue lines after ?  and :.

	      · The synonym func for the keyword function is not recognized.

	      · The operators ** and **= cannot be used in place of ^ and ^=.

		  
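              This option turns on compatibility mode, with these additional restrictions:
                (1) \x escape sequences are not recognized;
                (2) when FS is set to a single space, only space and tab act as field separators, newline does not;
                (3) a line cannot be continued after ? or :;
                (4) func is not recognized as a synonym for the keyword function;
                (5) the operators ** and **= cannot be used in place of ^ and ^=.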
		  
       -r
       --re-interval
	      Enable the use of interval expressions in regular expression matching (see Regular Expressions,  below).	 Interval
	      expressions  were	 not traditionally available in the AWK language.  The POSIX standard added them, to make awk and
	      egrep consistent with each other.	 They are enabled by default, but this option remains for use with --traditional.
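              Enable interval expressions in regular expression matching (regular expressions are covered below). PS:
              interval expressions are very powerful, and sed supports them too.
              Interval expressions were not available in traditional AWK. POSIX added them so that awk and egrep would be
              consistent with each other. They are enabled by default; this option remains so that they can be switched on
              together with --traditional.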
		  
		  
       -R
       --command file
	      Dgawk only.  Read stored debugger commands from file.
              This option applies to dgawk only: stored debugger commands are read from file.
		  

       -S
       --sandbox
	      Runs gawk in sandbox mode, disabling the system() function, input redirection with getline, output redirection with
	      print  and  printf,  and loading dynamic extensions.  Command execution (through pipelines) is also disabled.  This
	      effectively blocks a script from accessing local resources (except for the files specified on the command line).
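              Run gawk in sandbox mode: the system() function, input redirection with getline, output redirection with print
              and printf, and the loading of dynamic extensions are all disabled, as is command execution through pipelines.
              A script is thereby effectively prevented from touching local resources other than the files named on the
              command line.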
		  
		  
       -t
       --lint-old
	      Provide warnings about constructs that are not portable to the original version of Unix awk.
              That is, warn about anything that would not work with the original version of UNIX awk.
		  
		  
       -V
       --version
	      Print version information for this particular copy of gawk on the standard output.  This is useful mainly for know‐
	      ing  if the current copy of gawk on your system is up to date with respect to whatever the Free Software Foundation
	      is distributing.	This is also useful when reporting bugs.  (Per the GNU Coding Standards, these options	cause  an
	      immediate, successful exit.)
		  
              Print gawk's version information on standard output.

       --     Signal the end of options. This is useful to allow further arguments to the AWK program itself to start with a “-”.
	      This provides consistency with the argument parsing convention used by most other POSIX programs.
 
		  
       In compatibility mode, any other options are flagged as invalid, but are otherwise ignored.  In normal operation, as  long
       as  program  text  has  been  supplied, unknown options are passed on to the AWK program in the ARGV array for processing.
       This is particularly useful for running AWK programs via the “#!” executable interpreter mechanism.
	   
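       "--" marks the end of the options; nothing after it is treated as an option, which lets arguments meant for the AWK
       program itself begin with a "-". This matches the argument-parsing convention of most other POSIX programs.

       In compatibility mode any other option is flagged as invalid but otherwise ignored. In normal operation, as long as
       program text has been supplied, unknown options are passed on to the AWK program in the ARGV array for it to process,
       which is especially handy when AWK programs are run through the "#!" interpreter mechanism.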
	   
	   
AWK PROGRAM EXECUTION  (PS: this section is important)

       An AWK program consists of a sequence of pattern-action statements and optional function definitions.
	   
       That is, a program is a series of pattern-action rules plus optional function definitions, in these forms:

	      @include "filename"
	      pattern   { action statements }
	      function name(parameter list) { statements }

       Gawk first reads the program source from the program-file(s) if specified, from arguments to --source, or from  the  first
       non-option  argument  on	 the  command  line.  The -f and --source options may be used multiple times on the command line.
       Gawk reads the program text as if all the program-files and command line source	texts  had  been  concatenated	together.
       This  is	 useful for building libraries of AWK functions, without having to include them in each new AWK program that uses
       them.  It also provides the ability to mix library functions with command line programs.
        
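       Gawk first reads the program source: from the program-file(s) if any are given, from the arguments to --source
       (described above), or from the first non-option argument on the command line. Both -f and --source may be used several
       times, and gawk treats all of the files and command-line source texts as if they had been concatenated into one program.
       This is handy for building libraries of AWK functions: a function with a specific job is defined once, stored in a file,
       and pulled in with an option by any program that needs it, instead of being copied into each new AWK program. Library
       functions can also be mixed with programs given on the command line.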
		
	   
	   
       In addition, lines beginning with @include may be used to include other source files into your program, making library use
       even easier.
       In addition, a line beginning with @include pulls another source file into your program, which makes libraries even
       easier to use.
	   

       The  environment	 variable  AWKPATH specifies a search path to use when finding source files named with the -f option.  If
       this variable does not exist, the default path is ".:/usr/local/share/awk".  (The actual	 directory  may	 vary,	depending
       upon how gawk was built and installed.)	If a file name given to the -f option contains a “/” character, no path search is
       performed.
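       The environment variable AWKPATH gives the search path for source files named with the -f option. If AWKPATH is not set,
       the default path is ".:/usr/local/share/awk", although the actual directory may differ depending on how gawk was built
       and installed. If the file name given to -f contains a "/" character, no path search is done.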
	   
	   

       Gawk executes AWK programs in the following order.  First, all variable assignments specified via the -v option	are  per‐
       formed.	 Next,	gawk  compiles the program into an internal form.  Then, gawk executes the code in the BEGIN block(s) (if
       any), and then proceeds to read each file named in the ARGV array (up to ARGV[ARGC]).  If there are no files named on  the
       command line, gawk reads the standard input.
	   
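       Gawk runs an AWK program in the following order.
       First, every variable assignment given with -v is performed. Next, gawk compiles the program into an internal form.
       Then gawk runs the code in the BEGIN block(s), if there are any and wherever in the program they appear, and goes on to
       read each file named in the ARGV array (ARGV[1] through ARGV[ARGC-1]).
       If no files are named on the command line, gawk reads standard input.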
	   
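       PS: ARGV and ARGC are predefined variables, usually called built-in variables, that exist to help awk do its work. ARGC
       holds the number of command-line arguments and ARGV is an array of their values (a convention inherited from C).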
	   
       If  a  filename on the command line has the form var=val it is treated as a variable assignment.	 The variable var will be
       assigned the value val.	(This happens after any BEGIN block(s) have been run.)	Command line variable assignment is  most
       useful  for dynamically assigning values to the variables AWK uses to control how input is broken into fields and records.
       It is also useful for controlling state if multiple passes are needed over a single data file.
	   
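       If a file name on the command line has the form var=val, it is treated as a variable assignment: the variable var is
       given the value val. Unlike a -v assignment, this happens only after any BEGIN block(s) have run. Assignments of this
       kind are most useful for dynamically setting the variables that control how input is split into fields and records. They
       are also useful for controlling state when a single data file has to be processed in several passes.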
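
The timing difference between a var=val operand and BEGIN can be checked directly. The sketch below assumes a POSIX shell and any awk; the variable name v is made up, and the record comes from standard input because no real file operand is given.

```shell
# The operand v=1 is processed only when awk reaches it in ARGV, which is after BEGIN has
# finished, so BEGIN still sees an empty v while the main rule sees 1.
assign_out=$(echo line | awk 'BEGIN { print "begin:" v } { print "main:" v }' v=1)
echo "$assign_out"
```

Replacing the operand with -v v=1 would make both lines show the value.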
	   
	   

       If the value of a particular element of ARGV is empty (""), gawk skips over it.
       If an element of ARGV is the empty string (""), gawk skips it.

       For each input file, if a BEGINFILE rule exists, gawk executes the associated code before processing the contents  of  the
       file. Similarly, gawk executes the code associated with ENDFILE after processing the file.
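       For each input file, if a BEGINFILE rule exists, gawk runs its code before processing the file's contents; likewise, the
       code of an ENDFILE rule runs after the file has been processed.
       PS: BEGINFILE/ENDFILE differ from BEGIN/END in scope. BEGINFILE and ENDFILE are per file: when N files are processed,
       their code runs N times. BEGIN and END are per program run: BEGIN runs once before any input is read, and END runs once
       after all input has been processed.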
	   
	   

       For  each  record in the input, gawk tests to see if it matches any pattern in the AWK program.	For each pattern that the
       record matches, the associated action is executed.  The patterns are tested in the order they occur in the program.
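       Each input record is tested against every pattern in the AWK program, and for each pattern that matches, the associated
       action is run. Patterns are tested in the order in which they appear in the program.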
	   
       Finally, after all the input is exhausted, gawk executes the code in the END block(s) (if any).
       Finally, once all input has been consumed, gawk runs the code in the END block(s), if any.

   Command Line Directories
       According to POSIX, files named on the awk command line must be text files.  The behavior is  ``undefined''  if	they  are
       not.  Most versions of awk treat a directory on the command line as a fatal error.
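       According to POSIX, the files named on the awk command line must be text files; if they are not, the behavior is
       undefined. Most versions of awk treat a directory on the command line as a fatal error (not a syntax error).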
	   

       Starting	 with  version	4.0  of	 gawk,	a directory on the command line produces a warning, but is otherwise skipped.  If
       either of the --posix or --traditional options is given, then gawk reverts to treating directories on the command line  as
       a fatal error.
       Since gawk 4.0, a directory on the command line only produces a warning and is then skipped. With --posix or
       --traditional, gawk goes back to treating a directory on the command line as a fatal error.
	   

VARIABLES, RECORDS AND FIELDS  (PS: key material)
       AWK variables are dynamic; they come into existence when they are first used.  Their values are either floating-point num‐
       bers or strings, or both, depending upon how they are used.  AWK also has one dimensional  arrays;  arrays  with	 multiple
       dimensions  may	be simulated.  Several pre-defined variables are set as a program runs; these are described as needed and
       summarized below.
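       AWK variables are dynamic: they come into existence the first time they are used, with no type declaration (AWK is
       weakly typed). Depending on how a variable is used, its value is a floating-point number, a string, or both.
       AWK also has one-dimensional arrays; arrays with more dimensions can be simulated on top of them. Several predefined
       (built-in) variables are set as the program runs; they are described where needed and summarized below.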
	   

   Records
       Normally, records are separated by newline characters.  You can control how records are separated by assigning  values  to
       the  built-in  variable	RS.  If RS is any single character, that character separates records.  Otherwise, RS is a regular
       expression.  Text in the input that matches this regular expression separates the record.  However, in compatibility mode,
       only  the  first	 character  of	its  string  value is used for separating records.  If RS is set to the null string, then
       records are separated by blank lines.  When RS is set to the null string, the newline character always  acts  as	 a  field
       separator, in addition to whatever value FS may have.
	   
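       By default, records are separated by newline characters, so each line of text is one record.
       The built-in variable RS controls how input records are separated.

       If RS is a single character, that character separates records. Otherwise RS is treated as a regular expression, and any
       input text matching it separates records.

       In compatibility mode, however, only the first character of the string value of RS is used as the record separator.
       If RS is set to the null string (RS = ""), records are separated by blank lines.
       In that case the newline character (\n) always acts as a field separator as well, whatever value FS has.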
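
Setting RS to the null string ("paragraph mode") can be tried on a few lines of input. This sketch assumes a POSIX shell and any awk; the sample text is arbitrary.

```shell
# Two paragraphs separated by a blank line. With RS = "" each paragraph is one record,
# and the newlines inside a paragraph act as field separators.
rs_out=$(printf 'a\nb\n\nc\n' | awk 'BEGIN { RS = "" } { print NR ": " NF " fields" }')
echo "$rs_out"
```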
	   

   Fields
       As  each input record is read, gawk splits the record into fields, using the value of the FS variable as the field separa‐
       tor.  If FS is a single character, fields are separated by that character.  If FS is the null string, then each individual
       character  becomes a separate field.  Otherwise, FS is expected to be a full regular expression.	 In the special case that
       FS is a single space, fields are separated by runs of spaces and/or tabs and/or newlines.  (But see the section POSIX COM‐
       PATIBILITY,  below).   NOTE:  The  value	 of IGNORECASE (see below) also affects how fields are split when FS is a regular
       expression, and how records are separated when RS is a regular expression.
	   
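       As each input record is read, gawk splits it into fields (columns), using the value of FS as the field separator.
       If FS is a single character, the record is split into fields at each occurrence of that character.

       If FS is the null string, every individual character becomes a separate field. Otherwise FS is expected to be a full
       regular expression.

       In the special case that FS is a single space (which is also the default), fields are separated by runs of spaces,
       tabs, and/or newlines. (POSIX compatibility mode changes this behavior, as described below.)

       Note: the value of the built-in variable IGNORECASE (described below) also affects how fields are split when FS is a
       regular expression, and how records are separated when RS is a regular expression.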
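
The single-character and regular-expression forms of FS can be compared side by side. The sketch assumes a POSIX shell and any awk; both sample inputs are made up.

```shell
# Single-character separator: split on ":".
fs_char=$(echo 'root:x:0' | awk -F: '{ print $1, $3 }')
# Regular-expression separator: any run of digits separates fields.
fs_regex=$(echo 'a1b22c' | awk -F'[0-9]+' '{ print $1, $2, $3 }')
echo "$fs_char"
echo "$fs_regex"
```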
	   
	   
	   
       If the FIELDWIDTHS variable is set to a space separated list of numbers, each field is expected to have fixed  width,  and
       gawk  splits  up	 the record using the specified widths.	 The value of FS is ignored.  Assigning a new value to FS or 
	   FPAT overrides the use of FIELDWIDTHS.
	   
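       If the built-in variable FIELDWIDTHS is set to a space-separated list of numbers, each field is expected to have a fixed
       width, and gawk splits the record at those widths.

       The value of FS is then ignored. Assigning a new value to FS or to FPAT, however, overrides FIELDWIDTHS again.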
	   
	   
	   
       Similarly, if the FPAT variable is set to a string representing a regular expression, each field is made up of  text  that
       matches that regular expression. In this case, the regular expression describes the fields themselves, instead of the text
       that separates the fields.  Assigning a new value to FS or FIELDWIDTHS overrides the use of FPAT.
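       Similarly, if the built-in variable FPAT is set to a string holding a regular expression, each field consists of the
       text that matches that expression. The regular expression thus describes the fields themselves rather than the text
       between them. Assigning a new value to FS or FIELDWIDTHS overrides FPAT.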
	   
	   
       Each field in the input record may be referenced by its position, $1, $2, and so on.  $0 is the whole record.  Fields need
       not be referenced by constants:
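       Each field of the input record can be referenced by its position: $1, $2, and so on; $0 is the whole record. The field
       number does not have to be a constant: any expression, such as a variable, may follow the $, as in this example: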

	      n = 5
	      print $n

       prints the fifth field in the input record.
       Here n holds 5, so $n is evaluated as $5, the fifth field of the input record.

       The variable NF is set to the total number of fields in the input record.
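       The built-in variable NF holds the total number of fields in the input record.
       PS: A handy trick: $NF refers to the last field of the record and $(NF-1) to the second-to-last (the parentheses are
           required), and so on. A for loop can be used to walk through all the fields.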
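
The field references described above, including the $NF trick and a loop over the fields, can be sketched like this. It assumes a POSIX shell and any awk; the four-field sample record is arbitrary.

```shell
# n is an ordinary variable used as a field number; $(NF-1) and $NF count from the end.
nf_out=$(echo 'a b c d' | awk '{ n = 2; print NF, $n, $(NF-1), $NF }')
# Walk through the fields from last to first with a for loop.
rev_out=$(echo 'a b c d' | awk '{ for (i = NF; i >= 1; i--) printf "%s%s", $i, (i > 1 ? " " : "\n") }')
echo "$nf_out"
echo "$rev_out"
```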

	   
#### The next passage is important and a classic, so it is annotated sentence by sentence below.
References:
http://bbs.chinaunix.net/thread-2319120-1-1.html
https://blog.csdn.net/anljf/article/details/6433498  
	   
       References  to  non-existent fields (i.e. fields after $NF) produce the null-string.  
       (Put simply, if a record splits into only 3 fields, $4, $5, and so on are empty strings.)

       However, assigning to a non-existent field (e.g., $(NF+2) = 5) increases the value of NF,
       (with 3 fields, $(NF+2) = 5 assigns to $5, and NF becomes 5)

       creates any intervening fields with the null string as their value,
       (in that case $4 is created as an empty string)

       and causes the value of $0 to be recomputed, with the fields being separated by the value of OFS.
       ($0 is rebuilt by joining $1 through $5 with OFS)

       References to negative numbered fields cause a fatal error.
       (a negative field number is invalid; the program stops with a fatal error, which is not the same as a syntax error)

       Decrementing NF causes the values of fields past the new value to be lost,
       (after NF = 2, the fields $3 and beyond are gone)

       and the value of $0 to be recomputed, with the fields being separated by the value of OFS.
       ($0 is rebuilt from the remaining fields, again joined with OFS)

       Assigning a value to an existing field causes the whole record to be rebuilt when $0 is referenced.
       (the rebuild is deferred until $0 is next referenced)

       Similarly, assigning a value to $0 causes the record to be resplit, creating new values for the fields.
       (the new $0 is split again, so $1, $2, ... and NF all take new values)
	   
	   
	   
	   
   Built-in Variables
       Gawk's built-in variables are:

       ARGC	   The number of command line arguments (does not include options to gawk, or the program source).
                   PS: The count includes the command name in ARGV[0], but not gawk's own options or the program text.
		

       ARGIND	   The index in ARGV of the current file being processed.
                   That is, ARGV[ARGIND] is the name of the file currently being processed.

       ARGV	   Array of command line arguments.  The array is indexed from 0 to ARGC - 1.  Dynamically changing the	 contents
		   of ARGV can control the files used for data.
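                   The array of command-line argument values, indexed from 0 to ARGC - 1. Changing the contents of ARGV while
                   the program runs controls which files are read as data.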
			

       BINMODE	   On  non-POSIX systems, specifies use of “binary” mode for all file I/O.  Numeric values of 1, 2, or 3, specify
		   that input files, output files, or all files, respectively, should use binary I/O.  String values of	 "r",  or
		   "w"	specify that input files, or output files, respectively, should use binary I/O.	 String values of "rw" or
		   "wr" specify that all files should use binary I/O.  Any other string value is treated as "rw", but generates a
		   warning message.
		   
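                   On non-POSIX systems, this selects "binary" mode for all file I/O.

                   The numeric values 1, 2, and 3 select binary I/O for input files, output files, and all files,
                   respectively; the string values "r" and "w" select it for input files and output files, and "rw" or "wr"
                   for all files. Any other string value is treated as "rw" (all files) but produces a warning message.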
		   

       CONVFMT	   The conversion format for numbers, "%.6g", by default.
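                   The format used when a number is converted to a string (as opposed to OFMT, which is used when numbers
                   are printed).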

       ENVIRON	   An array containing the values of the current environment.  The array is indexed by the environment variables,
		   each element being the value of that variable (e.g., ENVIRON["HOME"] might be  /home/arnold).   Changing  this
		   array does not affect the environment seen by programs which gawk spawns via redirection or the system() func‐
		   tion.
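                   An array holding the current environment. It is indexed by environment variable name, and each element is
                   the value of that variable; for example, ENVIRON["HOME"] is the value of HOME in the current environment.
                   Changing the array does not affect the environment seen by programs that gawk starts through redirection
                   or system().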
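
Reading the environment through ENVIRON can be sketched as follows, assuming a POSIX shell and any awk. DEMO_VAR is a hypothetical variable set only for this one command.

```shell
# Set DEMO_VAR in the environment of this awk process only, then read it back via ENVIRON.
env_out=$(DEMO_VAR=xyz awk 'BEGIN { print ENVIRON["DEMO_VAR"] }')
echo "$env_out"
```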
		   
		   
       ERRNO	   If a system error occurs either doing a redirection for getline, during  a  read  for  getline,  or	during	a
		   close(),  then  ERRNO will contain a string describing the error.  The value is subject to translation in non-
		   English locales. 
		   
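                   If a system error occurs during a redirection for getline, during a read for getline, or during a close(),
                   the built-in variable ERRNO holds a string describing the error. In non-English locales the text may be
                   translated.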
		   
	   

       FIELDWIDTHS     A whitespace separated list of field widths.	 When set, gawk parses the input  into	fields	of  fixed  width,
		   instead of using the value of the FS variable as the field separator.  See Fields, above.
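                   A whitespace-separated list of field widths. When it is set, gawk parses each input record into fields of
                   those fixed widths instead of using the value of FS as the field separator. (See the Fields section above.)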
		   

       FILENAME	   The	name  of the current input file.  If no files are specified on the command line, the value of FILENAME is
		   “-”.	 However, FILENAME is undefined inside the BEGIN block (unless set by getline).
                   The name of the current input file. If no file is named on the command line, FILENAME is "-". Inside the
                   BEGIN block, however, FILENAME is undefined (unless it has been set by getline).
		   

       FNR	   The input record number in the current input file.
                   The record number within the current input file; it starts again from 1 for each new file.
	   

       FPAT	   A regular expression describing the contents of the fields in a record.  When set, gawk parses the input  into
		   fields,  where  the	fields match the regular expression, instead of using the value of the FS variable as the
		   field separator.  See Fields, above.
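                   A regular expression describing the contents of the fields in a record. When it is set, gawk forms the
                   fields from the text that matches the expression instead of splitting on the value of FS. (See the Fields
                   section above.)

                   Reference: https://www.cnblogs.com/yangfengtao/archive/2013/06/07/3124100.html
                   PS: Suppose the input contains this line:
                   Robbins,Arnold,"1234 A Pretty Street, NE",MyTown,MyState,12345-6789,USA
                   The goal is to extract "1234 A Pretty Street, NE" from the comma-separated line, but that field itself
                   contains a comma, so splitting on "," breaks it apart. FPAT solves this by describing a field as either a
                   string with no "," in it or a string enclosed in a pair of double quotes, which gives the following
                   regular expression: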
		   FPAT = "([^,]+)|(\"[^\"]+\")"
		   echo 'Robbins,Arnold,"1234 A Pretty Street, NE",MyTown,MyState,12345-6789,USA'|awk 'BEGIN{FPAT = "([^,]+)|(\"[^\"]+\")"}{print $3}'
		   
		   
		   

       FS	   The input field separator, a space by default.  See Fields, above.
                   The input field separator, a single space by default. (See the Fields section above.)
	               
	   

       IGNORECASE  Controls the case-sensitivity of all regular expression and string operations.  If IGNORECASE has  a	 non-zero
		   value,  then	 string comparisons and pattern matching in rules, field splitting with FS and FPAT, record sepa‐
		   rating with RS, regular expression matching with ~ and !~, and the gensub(), gsub(),	 index(),  match(),  pat‐
		   split(),  split(),  and  sub()  built-in  functions	all ignore case when doing regular expression operations.
		   
		   NOTE: Array subscripting is not affected.  However, the asort() and asorti() functions are affected.
		   Thus, if IGNORECASE is not equal to zero, /aB/ matches all of the strings "ab", "aB", "Ab", and "AB".  As with
		   all	AWK  variables,	 the initial value of IGNORECASE is zero, so all regular expression and string operations
		   are normally case-sensitive.

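                   IGNORECASE controls the case-sensitivity of all regular expression and string operations.
                   If IGNORECASE has a non-zero value, case is ignored in string comparisons and pattern matching in rules,
                   in field splitting with FS and FPAT, in record separation with RS, in regular expression matching with
                   ~ and !~, and in the built-in functions gensub(), gsub(), index(), match(), patsplit(), split(), and sub().

                   Note: array subscripts are not affected by IGNORECASE, but the asort() and asorti() functions are. (PS:
                   awk's associative arrays are not traversed in the order one might expect, so asort() and asorti() are
                   sometimes needed to sort them.)
                   So if IGNORECASE is non-zero, /aB/ matches all of "ab", "aB", "Ab", and "AB".
                   As with all AWK variables, the initial value of IGNORECASE is zero, so by default all regular expression
                   and string operations are case-sensitive.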
		   
		   
		   
       LINT	   Provides dynamic control of the --lint option from within an AWK program.  When true, gawk prints  lint  warnings.
		    When	false,	it  does not.  When assigned the string value "fatal", lint warnings become fatal errors,
		   exactly like --lint=fatal.  Any other true value just prints warnings.
		  
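                   Gives dynamic control of the --lint option from within an AWK program. When LINT is true, gawk prints lint
                   warnings; when it is false, it does not. When LINT is assigned the string "fatal", lint warnings become
                   fatal errors, exactly as with --lint=fatal; any other true value just prints warnings. (PS: to understand
                   LINT, first understand the --lint command-line option; it is seldom used.)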

		  
	  	   

       NF	   The number of fields in the current input record.
                   That is, how many fields the record currently being processed has been split into.

       NR	   The total number of input records seen so far.
                   The total number of input records read so far, counted across all input files.
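
The difference between FNR and NR only shows when more than one file is read. The sketch below assumes a POSIX shell, any awk, and mktemp; it reads the same temporary two-line file twice to stand in for two input files.

```shell
tmpfile=$(mktemp)
printf 'x\ny\n' > "$tmpfile"
# FNR restarts at 1 for each file operand, while NR keeps counting across operands.
nr_out=$(awk '{ print FNR, NR }' "$tmpfile" "$tmpfile")
echo "$nr_out"
rm "$tmpfile"
```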

       OFMT	   The output format for numbers, "%.6g", by default.
                   The format used when numbers are printed; the default is "%.6g".
	   
	   

       OFS	   The output field separator, a space by default.
                   The separator that print places between its comma-separated arguments; a single space by default.

       ORS	   The output record separator, by default a newline.
                   The string that print appends after each record; a newline by default.
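
The three output-related variables OFMT, OFS, and ORS can be tried together. This is a sketch assuming a POSIX shell and any awk; the separator strings and the number are arbitrary.

```shell
# $1 = $1 forces $0 to be rebuilt with OFS; ORS is appended after the printed record.
sep_out=$(echo 'a b' | awk 'BEGIN { OFS = "-"; ORS = "!\n" } { $1 = $1; print }')
# OFMT controls how a non-integer number is formatted when it is printed.
ofmt_out=$(awk 'BEGIN { OFMT = "%.2f"; print 3.14159 }')
echo "$sep_out"
echo "$ofmt_out"
```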

       PROCINFO	   The elements of this array provide access to information about the running  AWK  program.   On  some	 systems,
		   there may be elements in the array, "group1" through "groupn" for some n, which is the number of supplementary
		   groups that the process has.	 Use the in operator to test for these	elements.   The	 following  elements  are
		   guaranteed to be available:
		   
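                   PROCINFO is an associative array whose elements give access to information about the running AWK program.
                   On some systems it also contains elements "group1" through "groupn", where n is the number of
                   supplementary groups the process has. Use the in operator to test whether such an element exists. The
                   following elements are always available: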
		   

		   PROCINFO["egid"]    the value of the getegid(2) system call.

		   PROCINFO["strftime"]
				       The default time format string for strftime().

		   PROCINFO["euid"]    the value of the geteuid(2) system call.

		   PROCINFO["FS"]      "FS" if field splitting with FS is in effect, "FPAT" if field splitting with FPAT is in
				       effect, or "FIELDWIDTHS" if field splitting with FIELDWIDTHS is in effect.

		   PROCINFO["gid"]     the value of the getgid(2) system call.

		   PROCINFO["pgrpid"]  the process group ID of the current process.
				       (This is the ID of the process group, not of a user group.)

		   PROCINFO["pid"]     the process ID of the current process.

		   PROCINFO["ppid"]    the parent process ID of the current process.

		   PROCINFO["uid"]     the value of the getuid(2) system call.

		   PROCINFO["sorted_in"]
				       If this element exists in PROCINFO, then its value controls the order in which array  elements
				       are  traversed  in for loops.  Supported values are "@ind_str_asc", "@ind_num_asc",
				       "@val_type_asc",	  "@val_str_asc",   "@val_num_asc",   "@ind_str_desc",	 "@ind_num_desc",
				       "@val_type_desc",  "@val_str_desc",  "@val_num_desc", and "@unsorted".  The value can also
				       be the name of any comparison function defined as follows:
		  If this element exists in PROCINFO, its value controls the order in which array elements are visited by
		  for (index in array) loops. The value may be one of the predefined names listed above, or the name of a
		  user-defined comparison function of the following form:
					   
			  function cmp_func(i1, v1, i2, v2)

		   where i1 and i2 are the indices, and v1 and v2 are the corresponding values of the  two  elements  being  compared.
		   It should return a number less than, equal to, or greater than 0, depending on how the elements of the
		   array are to be ordered.
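		   Here i1 and i2 are the indices (subscripts, not exponents) and v1 and v2 the values of the two elements
		   being compared. The function returns a number less than, equal to, or greater than 0 according to how
		   the two elements should be ordered.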

		   PROCINFO["version"]
			  the version of gawk.
			  (The version string of the running gawk.)
			  
			  
			  

       RS	   The input record separator, by default a newline.
	          (The input record separator; the default is a newline.)
			  

       RT	   The record terminator.  Gawk sets RT to the input text that matched the character or regular expression specified by RS.
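		   RT holds the record terminator: after each record is read, gawk sets RT to the input text that actually
		   matched RS. (When RS is a regular expression, RT can therefore differ from record to record.)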

       RSTART	   The	index  of  the	first  character matched by match(); 0 if no match.  (This implies that character indices
		   start at one.)
		   The index of the first character matched by match(), or 0 if there was no match.
		   (This implies that character indices start at 1, not at 0.)

       RLENGTH	   The length of the string matched by match(); -1 if no match.
					(The length of the string matched by match(), or -1 if there was no match.)

       SUBSEP	   The character used to separate multiple subscripts in array elements, by default "\034".
					(The character that separates multiple subscripts of an array element; the default is "\034".)
					

       TEXTDOMAIN  The text domain of the AWK program; used to find the localized translations for the program's strings.
					(The text domain of the AWK program, used to look up localized translations of the program's strings.)
	   
	   
	   
	   
   Arrays   #数组
       Arrays are subscripted with an expression between square brackets ([ and ]).  If the  expression	 is  an	 expression  list
       (expr,  expr  ...)   then  the  array  subscript is a string consisting of the concatenation of the (string) value of each
       expression, separated by the value of the SUBSEP variable.  This facility is used to simulate multiply dimensioned arrays.
       For example:
	   
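	   Array elements are referenced with a subscript expression in square brackets ([ and ]). If the subscript is an
	   expression list (expr, expr ...), the actual subscript is a single string made by concatenating the string values
	   of the expressions, separated by the value of SUBSEP. This is how AWK simulates multidimensional arrays. For example: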
	   
	   

	      i = "A"; j = "B"; k = "C"
	      x[i, j, k] = "hello, world\n"

       assigns	the  string  "hello,  world\n"	to  the element of the array x which is indexed by the string "A\034B\034C".  All
       arrays in AWK are associative, i.e. indexed by string values.
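       This assigns "hello, world\n" to the element of x whose index is the string "A\034B\034C" (not "A,B,C").
       All arrays in AWK are associative, that is, indexed by string values.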
	   

       The special operator in may be used to test if an array has an index consisting of a particular value:

	      if (val in array)
		   print array[val]
		   
	  The "val in array" test above checks whether val exists as an index (subscript) of the array; it does not look at the stored values.

       If the array has multiple subscripts, use (i, j) in array.
	   For an array with multiple subscripts, write the membership test as "(i, j) in array".

       The in construct may also be used in a for loop to iterate over all the elements of an array.
	   The in construct can also be used in a for loop to iterate over all the indices of an array.

       An element may be deleted from an array using the delete statement.  The delete statement may also be used to  delete  the
       entire contents of an array, just by specifying the array name without a subscript.
	   delete array[index] removes a single element (the index must be given). delete array, with the array name alone and
	   no subscript, removes every element of the array.

       gawk  supports  true multidimensional arrays. It does not require that such arrays be ``rectangular'' as in C or C++.  For
       example:
	      a[1] = 5
	      a[2][1] = 6
	      a[2][2] = 7
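	   gawk supports true multidimensional arrays (arrays of arrays). Unlike in C or C++, they need not be "rectangular":
	   in the example above, a[1] is a scalar while a[2] is itself an array.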
		  
		  

   Variable Typing And Conversion  #变量类型转换
       Variables and fields may be (floating point) numbers, or strings, or both.  How the value of  a	variable  is  interpreted
       depends upon its context.  If used in a numeric expression, it will be treated as a number; if used as a string it will be
       treated as a string.

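	   Variables and fields may hold (floating-point) numbers, strings, or both. How a value is interpreted depends on the
	   context in which it is used, not on its content: in a numeric expression it is treated as a number, and where a
	   string is expected it is treated as a string.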
	   
       To force a variable to be treated as a number, add 0 to it; to force it to be treated as a string, concatenate it with the
       null string.
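	   To force a value to be treated as a number, add 0 to it (x + 0); to force it to be treated as a string, concatenate
	   it with the null string (x ""). Neither form assigns anything to the variable.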

       When  a	string must be converted to a number, the conversion is accomplished using strtod(3).  A number is converted to a
       string by using the value of CONVFMT as a format string for sprintf(3), with the numeric value  of  the	variable  as  the
       argument.   However,  even though all numbers in AWK are floating-point, integral values are always converted as integers.
       Thus, given
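	   When a string must be converted to a number, the conversion is done with strtod(3), a C library function. A number is
	   converted to a string by passing it to sprintf(3) with the value of CONVFMT as the format. However, although all
	   numbers in AWK are floating-point, integral values are always converted as integers. For example: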

	      CONVFMT = "%2.2f"
	      a = 12
	      b = a ""

       the variable b has a string value of "12" and not "12.00".
	   Here b is produced by concatenating a (the integer 12) with the null string, so its string value is "12", not "12.00".

       NOTE: When operating in POSIX mode (such as with the --posix command line option), beware that locale settings may  interfere
       with  the way decimal numbers are treated: the decimal separator of the numbers you are feeding to gawk must conform
       to what your locale would expect, be it a comma (,) or a period (.).
	   
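	   Note: in POSIX mode (for example, with the --posix command-line option), locale settings can change how decimal numbers
	   are read. The decimal separator in the numbers fed to gawk must be the one the locale expects, which is a comma (,)
	   in some locales and a period (.) in others.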

       Gawk performs comparisons as follows: If two variables are numeric, they	 are  compared	numerically.   If  one	value  is
       numeric	and  the other has a string value that is a “numeric string,” then comparisons are also done numerically.  Otherwise, 
	   the numeric value is converted to a string and a string comparison is  performed.	 Two  strings  are  compared,  of
       course, as strings.
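	   Gawk compares values as follows. If both values are numeric, they are compared numerically. If one is numeric and the
	   other is a "numeric string", the comparison is also numeric. Otherwise the numeric value is converted to a string and
	   a string comparison is performed. Two strings are compared as strings.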
	   

       Note  that  string  constants,  such  as	 "57",	are not numeric strings, they are string constants.  The idea of “numeric
       string” only applies to fields, getline input, FILENAME, ARGV elements, ENVIRON elements and the elements of an array
       created  by	 split()  or  patsplit()  that are numeric strings.  The basic idea is that user input, and only user input, that
       looks numeric, should be treated that way.
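	   Note that a string constant such as "57" is not a numeric string; it is just a string constant. The notion of "numeric
	   string" applies only to fields, getline input, FILENAME, ARGV elements, ENVIRON elements, and elements of arrays created
	   by split() or patsplit() that look numeric. In other words, only user input that looks numeric is treated as a number.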
	   

       Uninitialized variables have the numeric value 0 and the string value "" (the null, or empty, string).
	   An uninitialized variable has both the numeric value 0 and the string value "" (the empty string).

   Octal and Hexadecimal Constants  #八进制和十六进制常量
       You may use C-style octal and hexadecimal constants in your AWK program source code.  For example, the octal value 011  is
       equal to decimal 9, and the hexadecimal value 0x11 is equal to decimal 17.
	   (C-style octal and hexadecimal constants may be written in AWK program source: octal 011 is decimal 9, and hexadecimal 0x11 is decimal 17.)
	   

   String Constants   #字符串常量
       String  constants  in AWK are sequences of characters enclosed between double quotes (like "value").  Within strings, 
       certain escape sequences are recognized, as in C.  These are:
	   String constants in AWK are enclosed in double quotes, for example "value".
	   Inside a string, certain escape sequences are recognized, as in C:

       \\   A literal backslash.

       \a   The “alert” character; usually the ASCII BEL character.

       \b   backspace.

       \f   form-feed.

       \n   newline.

       \r   carriage return.

       \t   horizontal tab.

       \v   vertical tab.

       \xhex digits
	    The character represented by the string of hexadecimal digits following the \x.  As in ANSI C, all following
	    hexadecimal digits are considered part of the escape sequence.  (This feature should tell us something about
	    language design by committee.)  E.g., "\x1B" is the ASCII ESC (escape) character.
	    (Inside a string the escape is written \x followed by hex digits, not 0x; the 0x prefix is for hexadecimal
	    numeric constants in program source.)

       \ddd The character represented by the 1-, 2-, or 3-digit sequence of octal digits.  E.g., "\033" is the ASCII ESC (escape)
	    character.
	    (The escape is a backslash followed by up to three octal digits; a leading 0 is not required.)

       \c   The literal character c.

       The escape sequences may also be used inside constant regular expressions (e.g., /[ \t\f\n\r\v]/ matches whitespace
       characters).
	   

       In compatibility mode, the characters represented by octal and hexadecimal escape sequences  are	 treated  literally  when
       used in regular expression constants.  Thus, /a\52b/ is equivalent to /a\*b/.
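	   (In compatibility mode, a character written as an octal or hexadecimal escape inside a regular expression constant is
	   taken literally. \52 is the character *, so /a\52b/ is equivalent to /a\*b/.)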
	   

PATTERNS AND ACTIONS   #模式和(处理)动作
       AWK  is	a line-oriented language.  The pattern comes first, and then the action.  Action statements are enclosed in { and
       }.  Either the pattern may be missing, or the action may be missing, but, of course, not both.  If the pattern is missing,
       the action is executed for every single record of input.	 A missing action is equivalent to
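	   AWK is a line-oriented language: each rule is a pattern followed by an action, and the action statements are enclosed
	   in { and }. Either the pattern or the action may be omitted, but not both. With no pattern, the action runs for
	   every input record. With no action, the default action is: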
	   
	      { print }

       which prints the entire record.
	   (That is, { print } prints the whole of the current record, $0.)

       Comments begin with the # character, and continue until the end of the line.  Blank lines may be used to separate
       statements.  Normally, a statement ends with a newline, however, this is not the case for lines ending in a comma,
       {, ?, :, &&, or ||.  Lines ending in do or else also have their statements automatically continued on the following
       line.  In other cases, a line can be continued by ending it with a “\”, in which case the newline is ignored.

	   (The trailing backslash is commonly called the line-continuation character.)

       Multiple statements may be put on one line by separating them with a “;”.  This applies to both the statements within  the
       action part of a pattern-action pair (the usual case), and to the pattern-action statements themselves.
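	   Several statements can share a line if they are separated by semicolons. This works both for the statements inside one
	   action (the usual case) and for whole pattern-action rules placed on the same line.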
	   

   Patterns   #模式
       AWK patterns may be one of the following:
	   (A pattern takes one of the following forms.)

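	      BEGIN
		  (runs once, before any input is read)

	      END
		  (runs once, after all input is exhausted)

	      BEGINFILE
		  (like BEGIN, but runs before the first record of each input file, so n times for n files)

	      ENDFILE
		  (like END, but runs after the last record of each input file)

	      /regular expression/
		  (a regular expression constant, delimited by slashes)

	      relational expression
		  (a relational expression)

	      pattern && pattern
		  (logical AND of two patterns)

	      pattern || pattern
		  (logical OR of two patterns)

	      pattern ? pattern : pattern
		  (conditional pattern)

	      (pattern)
		  (grouping with parentheses)

	      ! pattern
		  (negation)

	      pattern1, pattern2
		  (range pattern: from a record matching pattern1 through a record matching pattern2, inclusive)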
		  

       BEGIN  and  END	are  two special kinds of patterns which are not tested against the input.  The action parts of all BEGIN
       patterns are merged as if all the statements had been written in a single BEGIN block.  They are executed  before  any  of
       the  input  is  read.   Similarly, all the END blocks are merged, and executed when all the input is exhausted (or when an
       exit statement is executed).  BEGIN and END patterns cannot be combined with other patterns in pattern expressions.  BEGIN
       and END patterns cannot have missing action parts.
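	   BEGIN and END are special patterns that are not tested against the input. The actions of all BEGIN rules are merged as
	   if written in a single BEGIN block, and they run before any input is read. Likewise, all END actions are merged and
	   run once the input is exhausted (or when an exit statement is executed). BEGIN and END cannot be combined with other
	   patterns in a pattern expression, and their action parts cannot be omitted.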
	   

       BEGINFILE  and  ENDFILE	are additional special patterns whose bodies are executed before reading the first record of each
       command line input file and after reading the last record of each file.	Inside the BEGINFILE rule,  the	 value	of  ERRNO
       will be the empty string if the file could be opened successfully.  Otherwise, there is some problem with the file and the
       code should use nextfile to skip it. If that is not done, gawk produces its usual fatal error for  files	 that  cannot  be
       opened.
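	   BEGINFILE and ENDFILE are two further special patterns. The BEGINFILE action runs before the first record of each
	   command-line input file is read, and the ENDFILE action runs after the last record of each file, so with n input
	   files each runs n times. Inside BEGINFILE, ERRNO is the empty string if the file was opened successfully; otherwise
	   the code should call nextfile to skip the file, or gawk stops with its usual fatal error for a file that cannot be opened.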
	   
       For  /regular  expression/  patterns,  the associated statement is executed for each input record that matches the regular
       expression.  Regular expressions are the same as those in egrep(1), and are summarized below.
	   For a /regular expression/ pattern, the associated statements run for each input record that matches the regular
	   expression. The syntax is the same as in egrep(1) and is summarized below.
	   

       A relational expression may use any of the operators defined below in  the  section  on	actions.   These  generally  test
       whether certain fields match certain regular expressions.
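	   A relational expression may use any of the operators defined in the section on actions below. Such expressions usually
	   test whether certain fields match certain regular expressions.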
	   

       The  &&,	 ||, and !  operators are logical AND, logical OR, and logical NOT, respectively, as in C.  They do short-circuit
       evaluation, also as in C, and are used for combining more primitive pattern expressions.	 As in most languages,	
	   parentheses may be used to change the order of evaluation.
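	   The &&, ||, and ! operators are logical AND, logical OR, and logical NOT, as in C.
	   They short-circuit, also as in C, and are used to combine simpler pattern expressions.
	   As in most languages, parentheses can be used to change the order of evaluation.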
	  
	   
       The  ?:	operator  is  like the same operator in C.  If the first pattern is true then the pattern used for testing is the
       second pattern, otherwise it is the third.  Only one of the second and third patterns is evaluated.
	   
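	   The ?: operator works like the C conditional (ternary) operator:
	   pattern1 ? pattern2 : pattern3
	   If pattern1 is true, pattern2 is the pattern tested; otherwise pattern3 is. Only one of pattern2 and pattern3 is evaluated.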
	   

       The pattern1, pattern2 form of an expression is called a range pattern.	It matches all	input  records	starting  with	a
       record  that  matches  pattern1, and continuing until a record that matches pattern2, inclusive.	 It does not combine with
       any other sort of pattern expression.
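	   pattern1, pattern2 is a range pattern. It matches every record from one that matches pattern1 through the next one
	   that matches pattern2, inclusive. (The two patterns need not be regular expressions.)
	   It cannot be combined with any other kind of pattern expression.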
	   

   Regular Expressions  #正则表达式
       Regular expressions are the extended kind found in egrep.  They are composed of characters as follows:
	   (These are extended regular expressions, as used by egrep. They are built from the following.)
	   
	  
       c	  matches the non-metacharacter c.

       \c	  matches the literal character c.

       .	  matches any character including newline.

       ^	  matches the beginning of a string.

       $	  matches the end of a string.

       [abc...]	  character list, matches any of the characters abc....

       [^abc...]  negated character list, matches any character except abc....

       r1|r2	  alternation: matches either r1 or r2.

       r1r2	  concatenation: matches r1, and then r2.

       r+	  matches one or more r's.

       r*	  matches zero or more r's.

       r?	  matches zero or one r's.

       (r)	  grouping: matches r.

       r{n}
       r{n,}
       r{n,m}	  One or two numbers inside braces denote an interval expression.  If there is one number in the braces, the
		  preceding  regular expression r is repeated n times.  If there are two numbers separated by a comma, r is repeated
		  n to m times.	 If there is one number followed by a comma, then r is repeated at least n times.
		  In short: r{n} matches exactly n repetitions of r, r{n,m} matches n to m repetitions, and r{n,} matches at
		  least n repetitions.

		  Note: on releases older than CentOS 7, interval expressions are only recognized when -r (--re-interval)
		  or --posix is given. On CentOS 7 they are enabled by default unless --traditional is given.
		  

       \y	  matches the empty string at either the beginning or the end of a word.

       \B	  matches the empty string within a word.

       \<	  matches the empty string at the beginning of a word.

       \>	  matches the empty string at the end of a word.

       \s	  matches any whitespace character.

       \S	  matches any nonwhitespace character.

       \w	  matches any word-constituent character (letter, digit, or underscore).

       \W	  matches any character that is not word-constituent.

       \`	  matches the empty string at the beginning of a buffer (string).

       \'	  matches the empty string at the end of a buffer.

       The escape sequences that are valid in string constants (see below) are also valid in regular expressions.
	   
	  (The escape sequences described under String Constants can be used in regular expressions as well.)

       Character classes are a feature introduced in the POSIX standard.  A character class is a special notation for  describing
       lists  of  characters  that have a specific attribute, but where the actual characters themselves can vary from country to
       country and/or from character set to character set.  For example, the notion of what is an alphabetic character differs in
       the USA and in France.
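	   Character classes were introduced by the POSIX standard. A character class is a special notation for a list of
	   characters that share an attribute, where the actual characters can vary by country and by character set.
	   For example, what counts as an alphabetic character differs between the USA and France.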
	   
       A  character  class is only valid in a regular expression inside the brackets of a character list.  Character classes consist
       of [:, a keyword denoting the class, and :].  The character classes defined by the POSIX standard are:
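		A character class is valid only inside the brackets of a character list in a regular expression. It is written as
		[:, a keyword naming the class, and :]. The character classes defined by POSIX are: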
	   
	   
       [:alnum:]  Alphanumeric characters.

       [:alpha:]  Alphabetic characters.

       [:blank:]  Space or tab characters.

       [:cntrl:]  Control characters.

       [:digit:]  Numeric characters.

       [:graph:]  Characters that are both printable and visible.  (A space is printable, but not visible, while an a is both.)

       [:lower:]  Lowercase alphabetic characters.

       [:print:]  Printable characters (characters that are not control characters.)

       [:punct:]  Punctuation characters (characters that are not letter, digits, control characters, or space characters).

       [:space:]  Space characters (such as space, tab, and formfeed, to name a few).

       [:upper:]  Uppercase alphabetic characters.

       [:xdigit:] Characters that are hexadecimal digits.

       For example, before the POSIX standard, to match alphanumeric characters, you would have had to write  /[A-Za-z0-9]/.   If
       your  character	set  had other alphabetic characters in it, this would not match them, and if your character set collated
       differently from ASCII, this might not even match the ASCII alphanumeric characters.  With the  POSIX  character	 classes,
       you  can write /[[:alnum:]]/, and this matches the alphabetic and numeric characters in your character set, no matter what
       it is.
	   
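	   For example, before POSIX, matching alphanumeric characters meant writing /[A-Za-z0-9]/. If the character set contained
	   other alphabetic characters, this would not match them, and if the character set collated differently from ASCII, it
	   might not even match the ASCII alphanumeric characters. With POSIX character classes you can write /[[:alnum:]]/,
	   which matches the alphabetic and numeric characters of whatever character set is in use. (The difference between
	   /[A-Za-z0-9]/ and /[[:alnum:]]/ comes down to the locale's character set and collation.)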

       Two additional special sequences can appear in character lists.  These apply to non-ASCII character sets, which can have
       single symbols (called collating elements) that are represented with more than one character, as well as several
       characters that are equivalent for collating, or sorting, purposes.  (E.g., in French, a plain “e” and a grave-accented
       “è” are equivalent.)
	   
	   
       Collating Symbols   #校对符号
	      A collating symbol is a multi-character collating element enclosed in [. and .].  For example, if ch is a
	      collating element, then [[.ch.]] is a regular expression that matches this collating element, while [ch] is a
	      regular expression that matches either c or h.

       Equivalence Classes  #等价的类,相同的类
	      An equivalence class is a locale-specific name for a list of characters that are equivalent.  The name is enclosed
	      in [= and =].  For example, the name e might be used to represent all of “e,” “é,” and “è.”  In this case, [[=e=]]
	      is a regular expression that matches any of e, é, or è.

       These features are very valuable in non-English speaking locales.  The library functions that gawk uses for regular
       expression matching currently only recognize POSIX character classes; they do not recognize collating symbols or
       equivalence classes.
	   

       The  \y, \B, \<, \>, \s, \S, \w, \W, \`, and \' operators are specific to gawk; they are extensions based on facilities in
       the GNU regular expression libraries.
	   (The operators \y, \B, \<, \>, \s, \S, \w, \W, \`, and \' exist only in gawk; they are extensions built on the GNU regular expression library.)
	   
	   

       The various command line options control how gawk interprets characters in regular expressions.
       (The following command-line options control how gawk interprets characters in regular expressions.)
	   
	   
       No options
	      In the default case, gawk provide all the facilities of POSIX regular expressions and the	 GNU  regular  expression
	      operators described above.
	   (With no option given, gawk supports both the POSIX regular expressions and the GNU regular expression operators described above.)

       --posix
	      Only POSIX regular expressions are supported, the GNU operators are not special.	(E.g., \w matches a literal w).
		  (Only POSIX regular expressions are supported; the GNU operators lose their special meaning. For example, \w matches a literal w.)
		  

       --traditional
	      Traditional  Unix awk regular expressions are matched.  The GNU operators are not special, and interval expressions
	      are not available.  Characters described by octal and hexadecimal escape sequences are treated literally,	 even  if
	      they represent regular expression metacharacters.
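	      (Only traditional Unix awk regular expressions are supported. The GNU operators are not special and interval
		  expressions such as {n}, {n,m}, and {n,} are not available. Characters written as octal or hexadecimal escapes
		  are taken literally, even if they stand for regular expression metacharacters.)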
		  
		  
       --re-interval
	      Allow interval expressions in regular expressions, even if --traditional has been provided.
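		  Enables interval expressions in regular expressions. (This is sometimes described as the option that turns on
		  extended regular expression support in awk.) According to the text above, --re-interval takes effect even when
		  --traditional is also given, so interval expressions should still be recognized.
		  Note: in my tests on CentOS 6 and CentOS 7, interval expressions were not recognized when --re-interval and
		  --traditional were combined. Either my environment or my understanding is at fault, so it is best to avoid --traditional.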
		  

   Actions  #动作
       Action  statements  are	enclosed in braces, { and }.  Action statements consist of the usual assignment, conditional, and
       looping statements found in most languages.  The operators, control statements, and input/output statements available  are
       patterned after those in C.
	   
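	   Action statements are enclosed in braces. They consist of the usual assignment, conditional, and looping statements
	   found in most languages. The available operators, control statements, and input/output statements are modeled on those of C.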
	   

   Operators   #操作(符)
       The operators in AWK, in order of decreasing precedence, are

       (...)	   Grouping

       $	   Field reference.

       ++ --	   Increment and decrement, both prefix and postfix.

       ^	   Exponentiation (** may also be used, and **= for the assignment operator).

       + - !	   Unary plus, unary minus, and logical negation.

       * / %	   Multiplication, division, and modulus.

       + -	   Addition and subtraction.

       space	   String concatenation.

       |   |&	   Piped I/O for getline, print, and printf.

       < > <= >= != ==
		   The regular relational operators: less than, greater than, less than or equal to, greater than or
		   equal to, not equal to, and equal to.

       ~ !~	   Regular expression match, negated match.  NOTE: Do not use a constant regular expression (/foo/) on the
		   left-hand side of a ~ or !~.  Only use one on the right-hand side.  The expression /foo/ ~ exp has the same
		   meaning as (($0 ~ /foo/) ~ exp).  This is usually not what was intended.

       in	   Array membership.

       &&	   Logical AND.

       ||	   Logical OR.

       ?:	   The C conditional expression.  This has the form expr1 ? expr2 : expr3.  If expr1 is true, the value of the
		   expression is expr2, otherwise it is expr3.  Only one of expr2 and expr3 is evaluated.

       = += -= *= /= %= ^=
		   Assignment.  Both absolute assignment (var = value) and operator-assignment (the other forms) are supported.
		   (For example, x += y means x = x + y; -=, *=, /=, %= for remainder, and ^= for exponentiation work the same way.)
		   

   Control Statements  #控制语句
       The control statements are as follows:
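
	      if (condition) statement [ else statement ]
		  (conditional, with an optional else branch)

	      while (condition) statement
		  (loop that tests the condition first)

	      do statement while (condition)
		  (loop that runs the body at least once)

	      for (expr1; expr2; expr3) statement
		  (C-style for loop)

	      for (var in array) statement
		  (loop over the indices of an array)

	      break
		  (leave the innermost loop or switch)

	      continue
		  (start the next iteration of the innermost loop)

	      delete array[index]
		  (delete one array element)

	      delete array
		  (delete all elements of an array)

	      exit [ expression ]
		  (stop the program, with an optional exit status)

	      { statements }
		  (a block of statements)

	      switch (expression) {
	      case value|regex : statement
	      ...
	      [ default: statement ]
	      }
		  (multi-way branch)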
		  

   I/O Statements   #I/O语句
   
       The input/output statements are as follows:
	   (The statements are listed below.)
	   

       close(file [, how])   Close file, pipe or co-process.  The optional how should only be used when closing one end of a
			     two-way pipe to a co-process.  It must be a string value, either "to" or "from".
			     (A co-process is a separate process that gawk both writes to and reads from over a two-way pipe.)

       getline		     Set $0 from next input record; set NF, NR, FNR.

       getline <file	     Set $0 from next record of file; set NF.

       getline var	     Set var from next input record; set NR, FNR.

       getline var <file     Set var from next record of file.

       command | getline [var]
			     Run command piping the output either into $0 or var, as above.

       command |& getline [var]
			     Run command as a co-process piping the output either into $0 or var, as above.  Co-processes are a
			     gawk extension.  (command can also be a socket.  See the subsection Special File Names, below.)

			     (Note: in gawk, |& opens a two-way pipe to a co-process. It is unrelated to the |& of bash 4.0 and
			     later, where cmd1 |& cmd2 pipes both the standard output and the standard error of cmd1 into cmd2.)
				 

       next		     Stop processing the current input record.  The next input record is read and processing starts over
			     with the first pattern in the AWK program.  If the end of the input data is reached, the END
			     block(s), if any, are executed.
			     (If there is no END block, the program simply exits.)

       nextfile		     Stop processing the current input file.  The next input record read comes from the next input file.
			     FILENAME and ARGIND are updated, FNR is reset to 1, and processing starts over with the first
			     pattern in the AWK program.  If the end of the input data is reached, the END block(s), if any, are
			     executed.
				 
       print		     Print the current record.  The output record is terminated with the value of the ORS variable.
			     (ORS is a newline by default.)

       print expr-list	     Print expressions.  Each expression is separated by the value of the OFS variable.  The output
			     record is terminated with the value of the ORS variable.
			     (OFS is a single space by default.)

       print expr-list >file Print expressions on file.  Each expression is separated by the value of the OFS variable.  The
			     output record is terminated with the value of the ORS variable.
			     (> truncates the file when it is first opened; later output in the same run is appended.)

       printf fmt, expr-list Format and print.  See The printf Statement, below.

       printf fmt, expr-list >file
			     Format and print on file.

       system(cmd-line)	     Execute the command cmd-line, and return the exit status.  (This may not be available on non-POSIX
			     systems.)

       fflush([file])	     Flush any buffers associated with the open output file or pipe file.  If file is missing or if it is
			     the null string, then flush all open output files and pipes.

       Additional output redirections are allowed for print and printf.

       print ... >> file
	      Appends output to the file.

       print ... | command
	      Writes on a pipe.

       print ... |& command
	      Sends data to a co-process or socket.  (See also the subsection Special File Names, below.)
		  (Note: this does not merge standard output and standard error as |& does in bash. It opens a two-way
		  connection, so the program can later read the command's output back with command |& getline.)
		  
		  
       The  getline  command  returns 1 on success, 0 on end of file, and -1 on an error.  Upon an error, ERRNO contains a string
       describing the problem.
	   In short: 1 means a record was read, 0 means end of file, and -1 means an error, in which case ERRNO holds a description.
	   
       NOTE: Failure in opening a two-way socket will result in a non-fatal error being returned  to  the  calling  function.  If
       using  a pipe, co-process, or socket to getline, or from print or printf within a loop, you must use close() to create new
       instances of the command or socket.  AWK does not automatically close pipes, sockets, or	 co-processes  when  they  return
       EOF.
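	   In other words: failing to open a two-way socket is reported to the caller as a non-fatal error. When a pipe,
	   co-process, or socket is used with getline, or with print or printf, inside a loop, close() must be called explicitly
	   before a new instance of the command or socket can be created, because AWK does not close them on its own at EOF.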
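
The need for close() inside a loop can be seen in a sketch like this one (the command string and variable names are illustrative):

```shell
out=$(awk 'BEGIN {
    for (i = 1; i <= 2; i++) {
        cmd = "echo run"
        while ((cmd | getline line) > 0)
            print i, line
        close(cmd)    # without this, pass 2 would hit EOF on the finished command
    }
}')
echo "$out"
```

With the close(cmd) call removed, only the first pass prints anything, because the second cmd | getline reuses the already exhausted pipe.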
	   

   The printf Statement
       The  AWK versions of the printf statement and sprintf() function (see below) accept the following conversion specification
       formats:
        Each conversion below applies to both the printf statement and the sprintf() function.
	   
	   
       %c      A single character.  If the argument used for %c is numeric, it is treated as a character and printed.  Otherwise,
	       the argument is assumed to be a string, and only the first character of that string is printed.
		   Note: a numeric argument is printed as the character with that code (65 prints A); for a string argument,
		   only its first character is printed.
		   
       %d, %i  A decimal number (the integer part).
			   Note: only the integer part is printed; the fractional part is truncated, not rounded.

       %e, %E  A floating point number of the form [-]d.dddddde[+-]dd.	The %E format uses E instead of e.
				Scientific notation: the output looks like [-]d.dddddde[+-]dd with %e, or [-]d.ddddddE[+-]dd with %E.
				
				[root@node1 ~]# echo 12222222222|awk '{printf"%e\n",$0}'
                1.222222e+10
                [root@node1 ~]# echo 12222222222|awk '{printf"%E\n",$0}'
                1.222222E+10
                [root@node1 ~]# echo -12222222222|awk '{printf"%E\n",$0}'
                -1.222222E+10
                [root@node1 ~]# echo -122228029380598209322222|awk '{printf"%E\n",$0}'
                -1.222280E+23
                [root@node1 ~]# echo -0.11174447138|awk '{printf"%E\n",$0}'
                -1.117445E-01
				

       %f, %F  A  floating  point  number of the form [-]ddd.dddddd.  If the system library supports it, %F is available as well.
	       This is like %f, but uses capital letters for special “not a number” and “infinity” values. If %F  is  not  avail‐
	       able, gawk uses %f.
		   Prints a floating-point number in fixed notation.

       %g, %G  Use  %e	or  %f	conversion,  whichever	is  shorter, with nonsignificant zeros suppressed.  The %G format uses %E
	       instead of %e.
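		   Note: %g and %G pick whichever of the %e and %f forms is shorter and drop nonsignificant zeros, that is, trailing
		   zeros in the fractional part. In scientific notation, %G writes E where %g writes e.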
		   
		   

       %o      An unsigned octal number (also an integer).

       %u      An unsigned decimal number (again, an integer).

       %s      A character string.

       %x, %X  An unsigned hexadecimal number (an integer).  The %X format uses ABCDEF instead of abcdef.

       %%      A single % character; no argument is converted.
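
The conversions listed above can be compared side by side with a short sketch (the sample values are arbitrary):

```shell
out=$(awk 'BEGIN {
    printf "%c %c\n", 65, "hello"     # character code vs. first character of a string
    printf "%d %i\n", 3.99, -3.99     # truncated toward zero, not rounded
    printf "%o %x %X\n", 8, 255, 255  # octal and hexadecimal
    printf "%e\n", 12345.678          # scientific notation
    printf "%s 100%%\n", "done"       # %% prints a literal percent sign
}')
echo "$out"
```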

       Optional, additional parameters may lie between the % and the control letter:
	   These optional parameters go between the % and the control letter. The supported ones are:

       count$ Use the count'th argument at this point in the formatting.  This is called a positional specifier and  is	 intended
	      primarily	 for  use  in translated versions of format strings, not in the original text of an AWK program.  It is a
	      gawk extension.
		  
	  The description above is terse; the following explanation is quoted from the GNU gawk manual:

      N$      An integer constant followed by a ‘$’ is a positional specifier. Normally, 
			  format specifications are applied to arguments in the order given in the format string. 
			  With a positional specifier, the format specification is applied to a specific argument, 
			  instead of what would be the next argument in the list. 
			  Positional specifiers begin counting with one. Thus:
			  
			  

			  printf "%s %s\n", "don't", "panic"
			  printf "%2$s %1$s\n", "panic", "don't"

			  prints the famous friendly message twice.
			  (In the second call, %2$s selects the second argument, "don't", and %1$s the first, "panic".)
			  At first glance, this feature doesn’t seem to be of much use. 
			  It is in fact a gawk extension, intended for use in translating messages at runtime. 
			  See Printf Ordering, which describes how and why to use positional specifiers. 
			  For now, we ignore them.

       -      The expression should be left-justified within its field.
			  This is the left-justification flag; padding is added on the right.

       space  For numeric conversions, prefix positive values with a space, and negative values with a minus sign.
				That is, positive numbers get a leading space and negative numbers a leading minus sign.
				
       +      The plus sign, used before the width modifier (see below), says to always supply a sign  for  numeric  conversions,
	      even if the data to be formatted is positive.  The + overrides the space modifier.
			Note: with +, numeric conversions such as %d, %e, %f and %g always show a sign, + or -.
		  
		  
       #      Use  an  “alternate  form”  for  certain control letters.	 For %o, supply a leading zero.	 For %x, and %X, supply a
	      leading 0x or 0X for a nonzero result.  For %e, %E, %f and %F, the result always contains a decimal point.  For %g,
	      and %G, trailing zeros are not removed from the result.

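		  In practice the # is written between the % and the control letter: %o becomes %#o, %x or %X becomes %#x or %#X,
		  and the same works for %e, %E, %f, %F, %g and %G.
		  %#o: the octal result gets a leading 0; plain %o prints no prefix.
		  %#x or %#X: the hexadecimal result gets a leading 0x or 0X; plain %x or %X prints no prefix.
		  %#e, %#E, %#f and %#F: the result always contains a decimal point.
		  %#g and %#G: trailing zeros are kept; without #, they are removed.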
		  
		  
       0      A	 leading  0  (zero)  acts  as a flag, that indicates output should be padded with zeroes instead of spaces.  This
	      applies only to the numeric output formats.  This flag only has an effect when the field width is	 wider	than  the
	      value to be printed.

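		  By default, when the field width is larger than the value, the value is padded with spaces. Writing %015d
		  instead of %15d makes the padding consist of zeros.
		  The leading 0 acts as a flag. It applies only to numeric conversions, and only when the field width is
		  larger than the value being printed.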
		  
	
	
       width  The  field should be padded to this width.  The field is normally padded with spaces.  If the 0 flag has been used,
	      it is padded with zeroes.
		Note: width is a minimum; a value longer than width is printed in full.
		  
		  
       .prec  A number that specifies the precision to use when printing.  For the %e, %E, %f and %F, formats, this specifies the
	      number  of digits you want printed to the right of the decimal point.  For the %g, and %G formats, it specifies the
	      maximum number of significant digits.  For the %d, %i, %o, %u, %x, and %X formats, it specifies the minimum  number
	      of digits to print.  For %s, it specifies the maximum number of characters from the string that should be printed.
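			Note: the meaning of the precision depends on the conversion: digits after the decimal point, significant
			digits, minimum digits, or maximum characters, as listed above.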
		  
		  
       The  dynamic  width and prec capabilities of the ANSI C printf() routines are supported.	 A * in place of either the width
       or prec specifications causes their values to be taken from the argument list to printf or sprintf().  To use a positional
       specifier  with	a  dynamic  width  or  precision,  supply  the	count$	after  the  * in the format string.  For example,
       "%3$*2$.*1$s".
	   
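	   The width and precision parameters can be summarized as:
	   num1[.num2]: either number may be omitted. num1 is the minimum field width; num2 is the precision, which for
	   %e and %f is the number of digits after the decimal point.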
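
The flags, width and precision described in this subsection can be checked with a sketch like this (values are arbitrary; the brackets only make the padding visible):

```shell
out=$(awk 'BEGIN {
    printf "[%5d] [%-5d] [%05d]\n", 42, 42, 42        # width, left-justify, zero padding
    printf "[%+d] [% d]\n", 42, 42                     # forced sign, space for positives
    printf "[%#o] [%#x]\n", 8, 255                     # alternate forms with 0 and 0x prefixes
    printf "[%.2f] [%8.3f] [%.3s]\n", 3.14159, 3.14159, "abcdef"
    printf "[%*d] [%.*f]\n", 6, 42, 1, 3.14159         # width and precision taken from arguments
}')
echo "$out"
```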

   Special File Names
       When doing I/O redirection from either print or printf into a file, or via getline from a file,	gawk  recognizes  certain
       special	filenames internally.  These filenames allow access to open file descriptors inherited from gawk's parent process
       (usually the shell).  These file names may also be used on the command line to name data files.	The filenames are:
	   
	   

       /dev/stdin  The standard input.

       /dev/stdout The standard output.

       /dev/stderr The standard error output.

       /dev/fd/n   The file associated with the open file descriptor n.

       These are particularly useful for error messages.  For example:

	      print "You blew it!" > "/dev/stderr"

       whereas you would otherwise have to use

	      print "You blew it!" | "cat 1>&2"
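
A sketch that separates the two streams to show where each message goes. The message texts and shell variable names are illustrative:

```shell
prog='BEGIN { print "result"; print "warning: check input" > "/dev/stderr" }'

# Keep only standard output.
stdout_part=$(awk "$prog" 2>/dev/null)

# Keep only standard error (stderr goes to the capture, stdout is discarded).
stderr_part=$(awk "$prog" 2>&1 >/dev/null)

echo "stdout: $stdout_part"
echo "stderr: $stderr_part"
```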

       The following special filenames may be used with the |& co-process operator for creating TCP/IP network connections:

       /inet/tcp/lport/rhost/rport
       /inet4/tcp/lport/rhost/rport
       /inet6/tcp/lport/rhost/rport
	      Files for a TCP/IP connection on local port lport to remote host rhost on remote port rport.  Use a port	of  0  to
	      have  the	 system	 pick  a  port.	  Use /inet4 to force an IPv4 connection, and /inet6 to force an IPv6 connection.
	      Plain /inet uses the system default (most likely IPv4).

       /inet/udp/lport/rhost/rport
       /inet4/udp/lport/rhost/rport
       /inet6/udp/lport/rhost/rport
	      Similar, but use UDP/IP instead of TCP/IP.

   Numeric Functions
       AWK has the following built-in arithmetic functions:

       atan2(y, x)   Return the arctangent of y/x in radians.

       cos(expr)     Return the cosine of expr, which is in radians.
					Note: it is the argument expr that is in radians, not the result.

       exp(expr)     The exponential function (e raised to the power expr).

       int(expr)     Truncate to integer.

       log(expr)     The natural logarithm function (base e).

       rand()	     Return a random number N, between 0 and 1, such that 0 ≤ N < 1.
					Note: unless srand() is called, the generator starts from the same seed on every run, so a
					program produces the same sequence each time it is executed.

       sin(expr)     Return the sine of expr, which is in radians.
					Note: again, the argument is in radians, not the result.

       sqrt(expr)    The square root function.

       srand([expr]) Use expr as the new seed for the random number generator.	If no expr is provided, use the time of day.  The
		     return value is the previous seed for the random number generator.
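
A short sketch of the arithmetic functions, including the fact that reseeding with the same value reproduces the same random number (the seed 42 and the variable names are arbitrary):

```shell
out=$(awk 'BEGIN {
    print int(3.9), int(-3.9)          # truncation toward zero
    print sqrt(16), exp(0), log(1)
    printf "%.4f\n", atan2(0, -1)      # pi, in radians
    srand(42); a = rand()
    srand(42); b = rand()              # same seed, same first value
    same = (a == b)
    in_range = (a >= 0 && a < 1)
    print same, in_range
}')
echo "$out"
```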

   String Functions
       Gawk has the following built-in string functions:

       asort(s [, d [, how] ]) Return  the  number of elements in the source array s.  Sort the contents of s using gawk's normal
			       rules for comparing values, and replace the indices of the sorted values s with	sequential 
			       integers  starting  with  1.	 If the optional destination array d is specified, then first duplicate s
			       into d, and then sort d, leaving the indices of the source array s unchanged. The optional  string
			       how  controls  the direction and the comparison mode.  Valid values for how are any of the strings
			       valid for PROCINFO["sorted_in"].	 It can also be the name of a user-defined comparison function as
			       described in PROCINFO["sorted_in"].
				   
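				   Returns the number of elements in the source array s. The contents of s are sorted with gawk's normal
				   comparison rules, and the indices of the sorted values are replaced by the sequential integers 1, 2, 3, ...
				   
				   If the optional destination array d is given, s is first copied into d and d is sorted, so the indices
				   of s stay unchanged. (Passing d is advisable whenever the original array is still needed.)
				   
				   The optional string how controls the direction and the mode of the comparison. It may be any string that is
				   valid for PROCINFO["sorted_in"], described earlier, or the name of a user-defined comparison function.
				   how is usually omitted; it is a feature of newer gawk versions.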
		
					
				   

       asorti(s [, d [, how] ])
			       Return the number of elements in the source array s.  The behavior is the same as that of asort(),
			       except that the array indices are used for sorting, not the array values.  When done, the array is
			       indexed	numerically,  and  the values are those of the original indices.  The original values are
			       lost; thus provide a second array if you wish to	 preserve  the	original.   The	 purpose  of  the
			       optional string how is the same as described in asort() above.
				   
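				   Returns the number of elements in the source array s. It behaves like asort(), except that it sorts the
				   array indices rather than the array values. Afterwards the array is indexed 1, 2, 3, ... and its values
				   are the original indices. The original values are lost, so pass a second array d to keep the original.
				   When d is given, s is copied first and all further work is done on d.
				   
				   The how argument has the same meaning as in asort().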
				   
		
      Further examples of these two functions can be found at:
	  http://blog.chinaunix.net/uid-21374062-id-3189744.html
		

       gensub(r, s, h [, t])   Search  the  target  string t for matches of the regular expression r.  If h is a string beginning
			       with g or G, then replace all matches of r with s.  Otherwise, h	 is  a	number	indicating  which
			       match  of r to replace.	If t is not supplied, use $0 instead.  Within the replacement text s, the
			       sequence \n, where n is a digit from 1 to 9, may be used to indicate just the  text  that  matched
			       the n'th parenthesized subexpression.  The sequence \0 represents the entire matched text, as does
			       the character &.	 Unlike sub() and gsub(), the modified string is returned as the  result  of  the
			       function, and the original target string is not changed.
				   
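				   Searches the string t (or $0 when t is omitted) for matches of the regular expression r. If h is a string
				   beginning with g or G, every match of r in t is replaced with s. If h is a number, only the h-th match is
				   replaced. h should be positive; a value of 0 or less is treated as 1, and gawk warns when h is 0.
				   
				   Inside the replacement text s, the sequence \n, where n is a single digit from 1 to 9, refers to the text
				   matched by the n-th parenthesized group of r. For example, \1 refers to the first group; because the
				   backslash is special inside a string constant, it has to be escaped and written as "\\1".
				   \0 stands for the entire matched text, just like the character &. See the example below.
				   Example:
				   echo "hello,world\!hello,awk\!hello Linux\!"|awk 'BEGIN{r="[a-zA-Z]+(,)[a-zA-Z]+"}{print gensub(r,"\\1uj","g")}'
				   
				   Unlike sub() and gsub(), gensub() returns the modified string and leaves the original target string t unchanged.
				   
				   Summary: for the substrings of t that match r, a value of h that starts with "g" or "G" replaces all of them
				   with s, and a number n replaces only the n-th match; when t is omitted, $0 is used.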
				   

       gsub(r, s [, t])	       For each substring matching the regular expression r in the string t, substitute the string s, and
			       return the number of substitutions.  If t is not supplied, use $0.  An & in the	replacement  text
			       is  replaced  with  the text that was actually matched.	Use \& to get a literal &.  (This must be
			       typed as "\\&"; see GAWK: Effective AWK Programming for a fuller discussion of the rules	 for  &'s
			       and backslashes in the replacement text of sub(), gsub(), and gensub().)
				   Searches t for every substring that matches the pattern r and replaces each one with s.
				   The return value is the number of substitutions made. If t is omitted, $0 is used.
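
The role of & in the replacement text, and the "\\&" spelling for a literal ampersand, can be seen in this sketch (the sample text is arbitrary):

```shell
out=$(echo "cat hat" | awk '{
    n = gsub(/[ch]at/, "<&>")     # & stands for the text that matched
    print n, $0
    gsub(/</, "\\&")              # "\\&" produces a literal ampersand
    print $0
}')
echo "$out"
```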
				   

       index(s, t)	       Return  the  index  of the string t in the string s, or 0 if t is not present.  (This implies that
			       character indices start at one.)
							Returns the position of the first occurrence of t in s, or 0 when t does not occur (positions start at 1).
				   

       length([s])	       Return the length of the string s, or the length of $0 if s is not supplied.   As  a  non-standard
			       extension, with an array argument, length() returns the number of elements in the array.
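				   Returns the length of the string s, or of $0 when s is omitted. As a non-standard extension, when the
				   argument is an array, length() returns the number of elements in the array.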
				   

       match(s, r [, a])       Return  the position in s where the regular expression r occurs, or 0 if r is not present, and set
			       the values of RSTART and RLENGTH.  Note that the argument order is the same as for the ~ operator:
			       str  ~ re.  If array a is provided, a is cleared and then elements 1 through n are filled with the
			       portions of s that match the corresponding parenthesized subexpression in r.  The 0'th element  of
			       a contains the portion of s matched by the entire regular expression r.	Subscripts a[n, "start"],
			       and a[n, "length"] provide the starting index in the  string  and  length  respectively,	 of  each
			       matching substring.
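				   Returns the position in s where the regular expression r first matches (0 when there is no match) and sets
				   the built-in variables RSTART and RLENGTH.
				   Note: the argument order is the same as for the ~ operator: str ~ re
				   If array a is given, it is cleared first. When r contains parenthesized groups, a[1] through a[n] receive
				   the text matched by each group; with a single group that matches, a[1] holds the text of that group.
				   a[0] holds the text matched by the entire regular expression r.
				   a[n,"start"] and a[n,"length"] give the starting position within s and the length of each matched substring.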
				   
				   Note: the array form is the harder part to follow; it exists to expose the groups of the regular expression.
				   Reference:
				   https://www.cnblogs.com/timeisbiggestboss/p/7242351.html
				   Suppose the file contains:
                   this is wang,not wan
                   that is chen,not che
                   this is chen,and wang,not wan che

				   awk '{match($0,/^.+is([^,]+).+not(.+)/,a);print a[0],a[1],a[2]}' file
				   $0 is the s argument: the current input record, by default one line at a time.
				   /^.+is([^,]+).+not(.+)/ is the regular expression; it has two groups, ([^,]+) and (.+).
				   a is the third argument, the array.
				   
				   The whole expression matches every line, and so do both groups.
				   Line 1: $0 is "this is wang,not wan"; the whole match is "this is wang,not wan", group 1 matches " wang" and group 2 matches " wan".
				   So a[0]="this is wang,not wan", a[1]=" wang", a[2]=" wan".
				   
				   Line 2: $0 is "that is chen,not che"; the whole match is "that is chen,not che", group 1 matches " chen" and group 2 matches " che".
				   So a[0]="that is chen,not che", a[1]=" chen", a[2]=" che".
				   
				   Line 3: $0 is "this is chen,and wang,not wan che"; the whole match is the entire line, group 1 matches " chen" and group 2 matches " wan che".
				   So a[0]="this is chen,and wang,not wan che", a[1]=" chen", a[2]=" wan che".
				   
				   
					
				   

       patsplit(s, a [, r [, seps] ])
			       Split the string s into the array a and the separators array seps on the regular expression r, and
			       return the number of fields.  Element values are the portions of s that matched r.  The	value  of
			       seps[i] is the separator that appeared in front of a[i+1].  If r is omitted, FPAT is used instead.
			       The arrays a and seps are cleared first.	 Splitting behaves identically to  field  splitting  with
			       FPAT, described above.
				   
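				   Splits the string s using the regular expression r. Unlike split(), r describes the pieces to keep rather
				   than the text between them: every part of s that matches r is stored in array a, starting at index 1, and
				   the text between the matches goes into seps. The return value is the number of elements stored in a.
				   seps[i] is the text that appears in front of a[i+1], so seps[0] holds whatever precedes the first match.
				   If r is omitted (seps must then be omitted too), the built-in variable FPAT is used. The splitting behaves
				   like field splitting with FPAT, described earlier.
				   
				   Example:
				      echo 11-22+33*44|awk '{patsplit($0,a,"[-+*]",b)}'
					  Here r matches the operator characters, so a[1] is "-", a[2] is "+" and a[3] is "*". The text in front
					  of each match lands in b: b[0] is 11, b[1] is 22 and b[2] is 33. The trailing 44 is recorded in b[3].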
				   
				   
				   

       split(s, a [, r [, seps] ])
			       Split the string s into the array a and the separators array seps on the regular expression r, and
			       return the number of fields.  
				   
				   If r is omitted, FS is used instead.  The arrays	a  and	seps  are cleared first.	 
				   
				   seps[i]  is the field separator matched by r between a[i] and a[i+1].
					
				   If r is a single space, then leading whitespace in s goes into the extra array element seps[0] and trailing
			       whitespace  goes	 into the extra array element seps[n], where n is the return value of split(s, a, r, seps).
			
				   Splitting behaves identically to field splitting, described above.
				   
				   
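				   Splits the string s on the regular expression r and stores the pieces in array a. The return value is the
				   number of fields produced. If r is omitted, the value of FS is used. Both a and seps are cleared first.
				   The seps array records the separators that r matched; it can only be given when r is given.
				   seps[i] is the separator found between a[i] and a[i+1].
				   
				   When r is a single space, leading whitespace in s is stored in seps[0] and trailing whitespace in seps[n],
				   where n is the return value, that is, the number of fields.
				   The splitting behaves like ordinary field splitting.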
				   
				   
				   
				   

       sprintf(fmt, expr-list) Prints expr-list according to fmt, and returns the resulting string.
							   Note: despite the wording, nothing is printed; the formatted text is returned as a string.
		

       strtonum(str)	       Examine str, and return its numeric value.  If str begins with a	 leading  0,  strtonum()  assumes
			       that  str  is an octal number.  If str begins with a leading 0x or 0X, strtonum() assumes that str
			       is a hexadecimal number.	 Otherwise, decimal is assumed.
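				   Examines str and returns its numeric value. A leading 0 makes strtonum() read str as octal, and a leading
				   0x or 0X makes it read str as hexadecimal. A string that does not begin with a number yields 0. In all
				   other cases str is read as decimal.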
				   
				   

       sub(r, s [, t])	       Just like gsub(), but replace only the first matching substring.
								Searches t for the pattern r and replaces only its first occurrence with s.
								If t is omitted, $0 is searched.

       substr(s, i [, n])      Return the at most n-character substring of s starting at i.  If n is omitted, use the rest of s.
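							   Returns the substring of s that starts at position i and is at most n characters long. If n is
								omitted, the result runs from position i to the end of s. Positions start at 1.
								If i is 0 or negative, the result starts at the first character. When n is also given, gawk counts
								the positions before 1 toward n, so substr("hello", 0, 2) yields "h".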
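
index(), length() and substr() are often combined to cut a string at a delimiter. A minimal sketch with an arbitrary sample string:

```shell
out=$(awk 'BEGIN {
    s = "user@example.com"
    at = index(s, "@")                 # position of the first "@"
    print at, length(s)
    print substr(s, 1, at - 1)         # text before "@"
    print substr(s, at + 1)            # everything after "@"
    print index(s, "#")                # 0 because "#" does not occur
}')
echo "$out"
```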
								
	   

       tolower(str)	       Return a copy of the string str, with all the uppercase characters in str translated to their
			       corresponding lowercase counterparts.  Non-alphabetic characters are left unchanged.
				   Note: str itself is not modified; the converted copy is the return value.
				   

       toupper(str)	       Return a copy of the string str, with all the lowercase characters in str translated to their cor‐
			       responding uppercase counterparts.  Non-alphabetic characters are left unchanged.
				   Note: str itself is not modified; the converted copy is the return value.
				   

       Gawk is multibyte aware.	 This means that index(), length(), substr() and match() all work in  terms  of	 characters,  not
       bytes.
	   In other words, these functions count characters, so a multibyte character counts as one.
	   

   Time Functions
       Since  one  of the primary uses of AWK programs is processing log files that contain time stamp information, gawk provides
       the following functions for obtaining time stamps and formatting them.

       mktime(datespec)
		 Turn datespec into a time stamp of the same form as returned by systime(), and return the result.  The	 datespec
		 is  a	string of the form YYYY MM DD HH MM SS[ DST].  The contents of the string are six or seven numbers repre‐
		 senting respectively the full year including century, the month from 1 to 12, the day of the month from 1 to 31,
		 the  hour  of	the  day from 0 to 23, the minute from 0 to 59, the second from 0 to 60, and an optional daylight
		 saving flag.  The values of these numbers need not be within the ranges specified; for example, an  hour  of  -1
		 means	1  hour before midnight.  The origin-zero Gregorian calendar is assumed, with year 0 preceding year 1 and
		 year -1 preceding year 0.  The time is assumed to be in the local timezone.  If the daylight saving flag is pos‐
		 itive,	 the time is assumed to be daylight saving time; if zero, the time is assumed to be standard time; and if
		 negative (the default), mktime() attempts to determine whether daylight saving time is in effect for the  speci‐
		 fied  time.   If  datespec  does  not contain enough elements or if the resulting time is out of range, mktime()
		 returns -1.
		In short: converts a date specification string into a time stamp.
		 
		 
       strftime([format [, timestamp[, utc-flag]]])
		 Format timestamp according to the specification in format.  If utc-flag is present and is non-zero or	non-null,
		 the  result  is  in  UTC,  otherwise  the  result is in local time.  The timestamp should be of the same form as
		 returned by systime().	 If timestamp is missing, the current time of day is  used.   If  format  is  missing,	a
		 default  format equivalent to the output of date(1) is used.  The default format is available in PROCINFO["strf‐
		 time"].  See the specification for the strftime() function in ANSI C for the format conversions that are guaran‐
		 teed to be available.
		 In short: converts a time stamp into a formatted date string.
		 

       systime() Return	 the  current time of day as the number of seconds since the Epoch (1970-01-01 00:00:00 UTC on POSIX sys‐
		 tems).
		 
		 Note: the value is returned, not printed. It equals the output of date +%s.
		 
		 

   Bit Manipulations Functions
       Gawk supplies the following bit manipulation functions.	They work by converting double-precision floating point values to
       uintmax_t integers, doing the operation, and then converting the result back to floating point.	The functions are:

       and(v1, v2)	   Return the bitwise AND of the values provided by v1 and v2.
						The result is an ordinary number, so it prints in decimal.

       compl(val)	   Return the bitwise complement of val.
						Every bit of val is inverted (a bitwise NOT, not a two's-complement negation).

       lshift(val, count)  Return the value of val, shifted left by count bits.

       or(v1, v2)	   Return the bitwise OR of the values provided by v1 and v2.

       rshift(val, count)  Return the value of val, shifted right by count bits.

       xor(v1, v2)	   Return the bitwise XOR of the values provided by v1 and v2.
	   

   Type Function
       The following function is for use with multidimensional arrays.

       isarray(x)
	      Return true if x is an array, false otherwise.
	      That is, isarray(x) yields 1 when x is an array and 0 otherwise.
		  

   Internationalization Functions
       The  following  functions may be used from within your AWK program for translating strings at run-time.	For full details,
       see GAWK: Effective AWK Programming.

       bindtextdomain(directory [, domain])
	      Specify the directory where gawk looks for the .mo files, in case they will not or cannot be placed in the  ``stan‐
	      dard'' locations (e.g., during testing).	It returns the directory where domain is ``bound.''
	      The default domain is the value of TEXTDOMAIN.  If directory is the null string (""), then bindtextdomain() returns
	      the current binding for the given domain.

       dcgettext(string [, domain [, category]])
	      Return the translation of string in text domain domain for locale category category.  The default value for  domain
	      is the current value of TEXTDOMAIN.  The default value for category is "LC_MESSAGES".
	      If  you  supply  a value for category, it must be a string equal to one of the known locale categories described in
	      GAWK: Effective AWK Programming.	You must also supply a text domain.  Use TEXTDOMAIN if you want to use	the  cur‐
	      rent domain.

       dcngettext(string1 , string2 , number [, domain [, category]])
	      Return  the  plural form used for number of the translation of string1 and string2 in text domain domain for locale
	      category category.  The default value for domain is the current value of TEXTDOMAIN.  The default value  for  cate‐
	      gory is "LC_MESSAGES".
	      If  you  supply  a value for category, it must be a string equal to one of the known locale categories described in
	      GAWK: Effective AWK Programming.	You must also supply a text domain.  Use TEXTDOMAIN if you want to use	the  cur‐
	      rent domain.

USER-DEFINED FUNCTIONS
       Functions in AWK are defined as follows:

	      function name(parameter list) { statements }
		  That is: the keyword function, the function name, a parenthesized parameter list, and the body in braces.
		  

       Functions are executed when they are called from within expressions in either patterns or actions.  Actual parameters supplied
        in the function call are used to instantiate the formal parameters declared in the function.  Arrays are  passed  by
       reference, other variables are passed by value.
	   
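	   In other words, a function runs whenever an expression in a pattern or an action calls it. The actual parameters of
	   the call give the declared formal parameters their values. Arrays are passed by reference; everything else is passed by value.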

       Since functions were not originally part of the AWK language, the provision for local variables is rather clumsy: They are
       declared as extra parameters in the parameter list.  The convention is to separate local variables from real parameters by
       extra spaces in the parameter list.  For example:
	   
	   That is, a local variable is simply an extra parameter that callers never pass, and by convention extra spaces in
       the parameter list separate the local variables from the real parameters.

	      function	f(p, q,	    a, b)   # a and b are local
	      {
		   ...
	      }

	      /abc/	{ ... ; f(1, 2) ; ... }
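
A runnable sketch of these rules. The function name fill and the variable names are made up for illustration:

```shell
out=$(awk '
function fill(arr, n,    i) {      # i is a local variable; arr is passed by reference
    for (i = 1; i <= n; i++)
        arr[i] = i * i
    n = 0                           # n is passed by value, so the caller is unaffected
}
BEGIN {
    i = 99; count = 3
    fill(squares, count)
    print squares[1], squares[2], squares[3], count, i
}')
echo "$out"
```

The global i keeps its value 99 because the i inside fill() is a separate local variable.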

       The  left  parenthesis  in  a  function	call is required to immediately follow the function name, without any intervening
       whitespace.  This avoids a syntactic ambiguity with the concatenation operator.	This restriction does not  apply  to  the
       built-in functions listed above.
       That is, write name(args) and never name (args).
	   The rule removes an ambiguity with the concatenation operator. It does not apply to the built-in functions.
	   
	   
       Functions  may  call  each other and may be recursive.  Function parameters used as local variables are initialized to the
       null string and the number zero upon function invocation.
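	   Note: a parameter used as a local variable starts out uninitialized on every call, so it acts as the null string in
	   string context and as 0 in numeric context.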
	   

       Use return expr to return a value from a function.  The return value is undefined if no value is provided, or if the function
       returns by “falling off” the end.
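	   Note: the return value is undefined in two cases: when return is used without an expression, and when the function
	   ends by reaching the end of its body without executing return.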
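
Recursion, return, and the initial value of local variables can be seen in this sketch (the function names fact and probe are illustrative):

```shell
out=$(awk '
function fact(n) {                 # recursive factorial
    return (n <= 1) ? 1 : n * fact(n - 1)
}
function probe(    unset) {        # unset is local and never assigned
    return (unset == 0 && unset == "")
}
BEGIN { print fact(5), probe() }')
echo "$out"
```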
	   

       As  a  gawk  extension, functions may be called indirectly. To do this, assign the name of the function to be called, as a
       string, to a variable.  Then use the variable as if it were the name of a function, prefixed with an @ sign, like so:
	      function	myfunc()
	      {
		   print "myfunc called"
		   ...
	      }

	      {	   ...
		   the_func = "myfunc"
		   @the_func()	  # call through the_func to myfunc
		   ...
	      }
	  Note: the variable holds the function name as a string, and the @ sign goes in front of the variable at the point of
	  the call.
		  

       If --lint has been provided, gawk warns about calls to undefined functions at parse time, instead of at run time.  Calling
       an undefined function at run time is a fatal error.
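	   Note: with --lint the warning about an undefined function appears when the program is parsed. Without it, the problem
	   shows up only when the call is executed, and at that point it is a fatal error, not a syntax error.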

       The word func may be used in place of function.
	   That is, a definition may start with func instead of function.
	   

DYNAMICALLY LOADING NEW FUNCTIONS
       You  can dynamically add new built-in functions to the running gawk interpreter.	 The full details are beyond the scope of
       this manual page; see GAWK: Effective AWK Programming for the details.
	   Note: GAWK: Effective AWK Programming is the title of the gawk manual; the details are documented there.

       extension(object, function)
	       Dynamically link the shared object file named by object, and invoke function in that object, to	perform	 initialization.
	       These should both be provided as strings.  Return the value returned by function.
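			Note: object names the shared object file and function names the initialization routine inside it. Both
			are passed as strings, and the value returned is whatever that routine returns.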
		   
		   
       Using this feature at the C level is not pretty, but it is unlikely to go away. Additional mechanisms may be added at some
       point.
	   Note: the C-level interface is awkward to use, but it is expected to stay, and further mechanisms may be added later.

SIGNALS
       pgawk accepts two signals.  SIGUSR1 causes it to dump a profile and function call stack to  the	profile	 file,	which  is
       either  awkprof.out, or whatever file was named with the --profile option.  It then continues to run.  SIGHUP causes pgawk
       to dump the profile and function call stack and then exit.

INTERNATIONALIZATION
       String constants are sequences of characters enclosed in double quotes.	In non-English speaking environments, it is  pos‐
       sible  to  mark strings in the AWK program as requiring translation to the local natural language. Such strings are marked
       in the AWK program with a leading underscore (“_”).  For example,

	      gawk 'BEGIN { print "hello, world" }'

       always prints hello, world.  But,

	      gawk 'BEGIN { print _"hello, world" }'

       might print bonjour, monde in France.

       There are several steps involved in producing and running a localizable AWK program.

       1.  Add a BEGIN action to assign a value to the TEXTDOMAIN variable to set the text domain to a name associated with  your
	   program:

	   BEGIN { TEXTDOMAIN = "myprog" }

       This  allows  gawk  to  find  the  .mo  file associated with your program.  Without this step, gawk uses the messages text
       domain, which likely does not contain translations for your program.

       2.  Mark all strings that should be translated with leading underscores.

       3.  If necessary, use the dcgettext() and/or bindtextdomain() functions in your program, as appropriate.

       4.  Run gawk --gen-pot -f myprog.awk > myprog.pot to generate a .pot file for your program.

       5.  Provide appropriate translations, and build and install the corresponding .mo files.

       The internationalization features are described in full detail in GAWK: Effective AWK Programming.

POSIX COMPATIBILITY
       A primary goal for gawk is compatibility with the POSIX standard, as well as with the latest version of UNIX awk.  To this
       end,  gawk  incorporates	 the following user visible features which are not described in the AWK book, but are part of the
       Bell Laboratories version of awk, and are in the POSIX standard.

       The book indicates that command line variable assignment happens when awk would otherwise open the  argument  as	 a  file,
       which  is after the BEGIN block is executed.  However, in earlier implementations, when such an assignment appeared before
       any file names, the assignment would happen before the BEGIN block was run.  Applications came to  depend  on  this  “fea‐
       ture.”	When  awk  was changed to match its documentation, the -v option for assigning variables before program execution
       was added to accommodate applications that depended upon the old behavior.  (This feature was agreed upon by both the Bell
       Laboratories and the GNU developers.)
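
       The timing difference between -v and an ordinary command line assignment can be seen in a small sketch.  The variable
       name x, the shell variables early and late, and the use of /dev/null as an empty input file are choices made for this
       illustration only.

```shell
# -v assigns the variable before the BEGIN block runs.
early=$(awk -v x=1 'BEGIN { print "x=" x }')

# An operand assignment is processed when awk reaches it in the argument
# list, which is after BEGIN; /dev/null is just an empty input file.
late=$(awk 'BEGIN { printf "BEGIN x=[%s] ", x } END { print "END x=[" x "]" }' x=1 /dev/null)

echo "$early"
echo "$late"
```

       In the second command x is still empty inside BEGIN and only has its value by the time END runs.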

       When  processing	 arguments,  gawk uses the special option “--” to signal the end of arguments.	In compatibility mode, it
       warns about but otherwise ignores undefined options.  In normal operation, such arguments are passed on to the AWK program
       for it to process.

       The  AWK	 book  does  not  define the return value of srand().  The POSIX standard has it return the seed it was using, to
       allow keeping track of random number sequences.	Therefore srand() in gawk also returns its current seed.
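
       A short sketch of the srand() return value described above; the seed values 42 and 7 and the shell variable prev_seed
       are arbitrary choices for the example.

```shell
# The first call sets the seed to 42; the second call installs a new seed
# and returns the seed that was in use before it, which is 42.
prev_seed=$(awk 'BEGIN { srand(42); print srand(7) }')
echo "$prev_seed"
```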

       Other new features are: the use of multiple -f options (from MKS awk); the ENVIRON array; the \a and \v escape sequences
       (done originally in gawk and fed back into the Bell Laboratories version); the tolower() and toupper() built-in functions
       (from the Bell Laboratories version); and the ANSI C conversion specifications in printf (done first in the Bell
       Laboratories version).

HISTORICAL FEATURES
       There  is  one  feature of historical AWK implementations that gawk supports: It is possible to call the length() built-in
       function not only with no argument, but even without parentheses!  Thus,

	      a = length     # Holy Algol 60, Batman!

       is the same as either of

	      a = length()
	      a = length($0)

       Using this feature is poor practice, and gawk issues a warning about its use if --lint is specified on the command line.
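
       The equivalence of the three forms can be checked on a single input line.  The sample record hello and the shell
       variable lengths are made up for this sketch.

```shell
# All three spellings report the length of the current record ($0).
lengths=$(printf 'hello\n' | awk '{ a = length; b = length(); c = length($0); print a, b, c }')
echo "$lengths"
```

       For new programs, length($0) is the clearest of the three spellings.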

GNU EXTENSIONS
       Gawk has a number of extensions to POSIX awk.  They are described in this section.  The extensions are enabled by default;
       all of them can be disabled by invoking gawk with the --traditional or --posix options.

       The following features of gawk are not available in POSIX awk.

       · Path search for files named via the -f option, using the AWKPATH environment variable.

       · File inclusion (gawk's @include mechanism).

       · The \x escape sequence.  (Disabled with --posix.)

       · The ability to continue lines after ?	and :.	(Disabled with --posix.)

       · Octal and hexadecimal constants in AWK programs.

       · The ARGIND, BINMODE, ERRNO, LINT, RT and TEXTDOMAIN variables.

       · The IGNORECASE variable and its side-effects.

       · The FIELDWIDTHS variable and fixed-width field splitting.

       · The FPAT variable and field splitting based on field values.

       · The PROCINFO array.

       · The use of RS as a regular expression.

       · The special file names available for I/O redirection.

       · The |& operator for creating co-processes.

       · The BEGINFILE and ENDFILE special patterns.

       · The ability to split out individual characters using the null string as the value of FS, and as the  third  argument  to
	 split().

       · An optional fourth argument to split() to receive the separator texts.

       · The optional second argument to the close() function.

       · The optional third argument to the match() function.

       · The ability to use positional specifiers with printf and sprintf().

       · The ability to pass an array to length().

       · The use of delete array to delete the entire contents of an array.

       · The use of nextfile to abandon processing of the current input file.

       · The  and(), asort(), asorti(), bindtextdomain(), compl(), dcgettext(), dcngettext(), gensub(), lshift(), mktime(), or(),
	 patsplit(), rshift(), strftime(), strtonum(), systime() and xor() functions.

       · Localizable strings.

       · Adding new built-in functions dynamically with the extension() function.
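
       Two items from the list above, the null string as the third argument to split() and delete array, can be sketched
       together.  The sample string, the array name chars and the shell variable ext_demo are illustrative only.  The command is
       written as awk on the assumption that awk resolves to gawk, as on CentOS 7; some other awk implementations happen to
       accept these two constructs as well.

```shell
ext_demo=$(awk 'BEGIN {
    n = split("gawk", chars, "")   # null-string separator: one element per character
    printf "%d %s%s ", n, chars[1], chars[4]
    delete chars                   # remove every element of the array at once
    left = 0
    for (i in chars) left++        # count whatever is still in the array
    print left
}')
echo "$ext_demo"
```

       The output shows the element count, the first and last characters, and then the number of elements left after the
       delete.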

       The AWK book does not define the return value of the close() function.  Gawk's close() returns the value	 from  fclose(3),
       or  pclose(3),  when  closing  an output file or pipe, respectively.  It returns the process's exit status when closing an
       input pipe.  The return value is -1 if the named file, pipe or co-process was not opened with a redirection.

       When gawk is invoked with the --traditional option, if the fs argument to the -F option is “t”, then FS is set to the  tab
       character.   Note  that	typing	gawk -F\t ...  simply causes the shell to quote the “t,” and does not pass “\t” to the -F
       option.	Since this is a rather ugly special case, it is not the default behavior.  This behavior also does not	occur  if
       --posix	has  been specified.  To really get a tab character as the field separator, it is best to use single quotes: gawk
       -F'\t' ....
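
       A minimal sketch of the quoted form recommended above.  The two-field sample line and the shell variable second are made
       up for the example.

```shell
# The single quotes keep the backslash away from the shell, so awk itself
# receives \t and interprets it as a tab character.
second=$(printf 'one\ttwo\n' | awk -F'\t' '{ print $2 }')
echo "$second"
```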

ENVIRONMENT VARIABLES
       The AWKPATH environment variable can be used to provide a list of directories that gawk searches when  looking  for  files
       named via the -f and --file options.

       For   socket   communication,   two  special  environment  variables  can  be  used  to	control	 the  number  of  retries
       (GAWK_SOCK_RETRIES), and the interval between retries (GAWK_MSEC_SLEEP).	 The interval is in milliseconds. On systems that
       do not support usleep(3), the value is rounded up to an integral number of seconds.

       If  POSIXLY_CORRECT  exists  in the environment, then gawk behaves exactly as if --posix had been specified on the command
       line.  If --lint has been specified, gawk issues a warning message to this effect.
  
EXIT STATUS
       If the exit statement is used with a value, then gawk exits with the numeric value given to it.

       Otherwise, if there were no problems during execution, gawk exits with the value of the C constant EXIT_SUCCESS.	 This  is
       usually zero.

       If an error occurs, gawk exits with the value of the C constant EXIT_FAILURE.  This is usually one.

       If  gawk	 exits	because	 of  a	fatal  error,  the  exit  status is 2.	On non-POSIX systems, this value may be mapped to
       EXIT_FAILURE.
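
       The first two cases can be observed from the shell.  This sketch covers only a normal run and an explicit exit value; the
       fatal error case is not exercised, and the variable names status_ok and status_exit are illustrative.

```shell
# A program that finishes normally exits with EXIT_SUCCESS (usually 0).
status_ok=0
awk 'BEGIN { }' || status_ok=$?

# exit with a value makes that value the exit status of the process.
status_exit=0
awk 'BEGIN { exit 3 }' || status_exit=$?

echo "$status_ok $status_exit"
```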

VERSION INFORMATION
       This man page documents gawk, version 4.0.

AUTHORS
       The original version of UNIX awk was designed and implemented by Alfred Aho, Peter Weinberger, and Brian Kernighan of Bell
       Laboratories.  Brian Kernighan continues to maintain and enhance it.

       Paul  Rubin  and	 Jay Fenlason, of the Free Software Foundation, wrote gawk, to be compatible with the original version of
       awk distributed in Seventh Edition UNIX.	 John Woods contributed a number of bug fixes.	David Trueman, with contributions
       from Arnold Robbins, made gawk compatible with the new version of UNIX awk.  Arnold Robbins is the current maintainer.

       The  initial DOS port was done by Conrad Kwok and Scott Garfinkle.  Scott Deifik maintains the port to MS-DOS using DJGPP.
       Eli Zaretskii maintains the port to MS-Windows using MinGW.  Pat Rankin did the port to VMS, and Michal Jaegermann did the
       port  to	 the  Atari  ST.  The port to OS/2 was done by Kai Uwe Rommel, with contributions and help from Darrel Hankerson.
       Andreas Buening now maintains the OS/2 port.  The late Fred Fish supplied support for the Amiga, and Martin Brown provided
       the BeOS port.  Stephen Davies provided the original Tandem port, and Matthew Woehlke provided changes for Tandem's POSIX-
       compliant systems.  Dave Pitts provided the port to z/OS.

       See the README file in the gawk distribution for up-to-date information about maintainers and which  ports  are	currently
       supported.

BUG REPORTS
       If  you find a bug in gawk, please send electronic mail to [email protected].  Please include your operating system and its
       revision, the version of gawk (from gawk --version), which C compiler you used to compile it, and a test program and  data
       that are as small as possible for reproducing the problem.

       Before  sending	a  bug	report,	 please do the following things.  First, verify that you have the latest version of gawk.
       Many bugs (usually subtle ones) are fixed at each release, and if yours is out of date, the problem may already have  been
       solved.  Second, please see if setting the environment variable LC_ALL to C (LC_ALL=C) causes things to behave as you expect.
       If so, it's a locale issue, and may or may not really be a bug.	Finally, please read this man page and the reference man‐
       ual carefully to be sure that what you think is a bug really is, instead of just a quirk in the language.

       Whatever	 you  do, do NOT post a bug report in comp.lang.awk.  While the gawk developers occasionally read this newsgroup,
       posting bug reports there is an unreliable way to report bugs.  Instead, please use the electronic  mail	 addresses  given
       above.

       If  you're  using a GNU/Linux or BSD-based system, you may wish to submit a bug report to the vendor of your distribution.
       That's fine, but please send a copy to the official email address as well, since there's no guarantee that the bug  report
       will be forwarded to the gawk maintainer.

BUGS
       The  -F option is not necessary given the command line variable assignment feature; it remains only for backwards compati‐
       bility.

       Syntactically invalid single-character programs tend to overflow the parse stack, generating a rather unhelpful message.
       Such programs are surprisingly difficult to diagnose in the completely general case, and the effort to do so really is not
       worth it.

SEE ALSO
       egrep(1), getpid(2), getppid(2), getpgrp(2), getuid(2), geteuid(2), getgid(2), getegid(2), getgroups(2), usleep(3)

       The AWK Programming Language, Alfred V. Aho,  Brian  W.	Kernighan,  Peter  J.  Weinberger,  Addison-Wesley,  1988.   ISBN
       0-201-07981-X.

       GAWK:  Effective	 AWK  Programming,  Edition  4.0,  shipped with the gawk source.  The current version of this document is
       available online at http://www.gnu.org/software/gawk/manual.
 
EXAMPLES
       Print and sort the login names of all users:

	    BEGIN     { FS = ":" }
		 { print $1 | "sort" }

       Count lines in a file:

		 { nlines++ }
	    END	 { print nlines }

       Precede each line by its number in the file:

	    { print FNR, $0 }

       Concatenate and line number (a variation on a theme):

	    { print NR, $0 }
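
       The difference between FNR and NR in the two programs above only shows up with more than one input file.  A sketch, using
       two throwaway files in a temporary directory (the file names, their contents and the shell variable names are made up
       for the illustration):

```shell
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\n'    > "$tmpdir/two.txt"

# FNR restarts at 1 for each input file; NR keeps counting across files.
fnr_out=$(awk '{ print FNR, $0 }' "$tmpdir/one.txt" "$tmpdir/two.txt")
nr_out=$(awk '{ print NR, $0 }' "$tmpdir/one.txt" "$tmpdir/two.txt")

rm -rf "$tmpdir"
echo "$fnr_out"
echo "$nr_out"
```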

       Run an external command for particular lines of data:

	    tail -f access_log |
	    awk '/myhome.html/ { system("nmap " $1 ">> logdir/myhome.html") }'

ACKNOWLEDGEMENTS
       Brian Kernighan of Bell Laboratories provided valuable assistance during testing and debugging.	We thank him.

COPYING PERMISSIONS
       Copyright © 1989, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,  2007,  2009,  2010,
       2011 Free Software Foundation, Inc.

       Permission  is  granted	to make and distribute verbatim copies of this manual page provided the copyright notice and this
       permission notice are preserved on all copies.

       Permission is granted to copy and distribute modified versions of this manual page under the conditions for verbatim copy‐
       ing,  provided  that  the entire resulting derived work is distributed under the terms of a permission notice identical to
       this one.

       Permission is granted to copy and distribute translations of this manual page into another language, under the above  con‐
       ditions	for  modified versions, except that this permission notice may be stated in a translation approved by the Founda‐
       tion.



Free Software Foundation				   Dec 07 2012							  GAWK(1)

Reposted from blog.csdn.net/u012271055/article/details/84669343