一、背景

构建一个简单计算器，识别输入的计算表达式并计算结果。通过计算器程序来说明 lex & yacc 的开发过程和 Lex 的结构规范。

二、环境

➢ 系统： CentOS 7.5
➢ 编译器： gcc - 4.8.5
➢ lex： flex 2.5.37
➢ Yacc： bison (GNU Bison) 3.0.4

安装 flex 和 bison 。
yum install flex bison

如果在编译链接过程中出现以下错误：
/usr/bin/ld: cannot find -lfl
请重新安装flex：
yum remove flex
yum install flex

三、简单计算器实现

计算器实现整数的 +、-、*、/、% 五种简单运算。

词法分析程序 cal.l

%{
#include "cal.tab.h"
extern int yylval;
%}

%%
[0-9]+  { yylval = atoi(yytext); return NUMBER; }
[ \t]   ;       /* ignore white space */
\n  return 0;   /* logical EOF */
.   return yytext[0];
%%

代码中定义了四条规则，前面的部分就是模式，处于一行的开始位置，后面部分是动作，也就是，输入中匹配到了这个模式的时候，对应进行什么动作（就像机器人接受到了什么样的指令，然后会执行相应的动作一样）

第一个模式，匹配连续一个或者多个数字，匹配到之后就返回标签NUMBER。
第二个模式，匹配空格，没有任何操作，忽略所有空格。
第三个模式，匹配一个换行符，匹配到之后结束匹配。
第四个模式，匹配出了\n之外的字符，返回该字符。

总体来说，匹配到连续数字，则返回NUMBER；忽略空格；换行结束；匹配到任何其他字符返回字符。

语法分析程序 cal.y

%{
#include <stdio.h>
%}
%token NAME NUMBER
%%
statement:  NAME '=' expression
    |   expression      { printf("= %d\n", $1); }
    ;

expression: expression '+' NUMBER   { $$ = $1 + $3; }
    |   expression '-' NUMBER   { $$ = $1 - $3; }
    |   expression '*' NUMBER   { $$ = $1 * $3; }
    |   expression '/' NUMBER   { $$ = $1 / $3; }
    |   expression '%' NUMBER   { $$ = $1 % $3; }
    |   NUMBER          { $$ = $1; }
    ;
%%
int main()
{
    yyparse();
    return 0;
}

int yyerror(char *s)
{
    printf("%s/n",s);
    return 0;
}

编译运行过程

[appusr@postgre cal]$ ls -l
total 8
-rw-rw-r--. 1 appusr appusr 200 Aug 16 11:27 cal.l
-rw-rw-r--. 1 appusr appusr 533 Aug 16 14:11 cal.y

/* 1.  编译lex文件，生成lex.yy.c文件 */
[appusr@postgre cal]$ flex cal.l
[appusr@postgre cal]$ ls -l
total 52
-rw-rw-r--. 1 appusr appusr   200 Aug 16 11:27 cal.l
-rw-rw-r--. 1 appusr appusr   533 Aug 16 14:11 cal.y
-rw-rw-r--. 1 appusr appusr 44068 Aug 16 15:58 lex.yy.c

/* 2. 编译yacc文件，生成cal.tab.h 与cal.tab.c文件 */
[appusr@postgre cal]$ bison -d cal.y
[appusr@postgre cal]$ ls -l
total 100
-rw-rw-r--. 1 appusr appusr   200 Aug 16 11:27 cal.l
-rw-rw-r--. 1 appusr appusr 44255 Aug 16 15:58 cal.tab.c
-rw-rw-r--. 1 appusr appusr  2063 Aug 16 15:58 cal.tab.h
-rw-rw-r--. 1 appusr appusr   533 Aug 16 14:11 cal.y
-rw-rw-r--. 1 appusr appusr 44068 Aug 16 15:58 lex.yy.c

/* 3. 链接生成的.c 文件，并生成相应的可执行文件 cal */
[appusr@postgre cal]$ gcc -o cal cal.tab.c lex.yy.c -lfl
[appusr@postgre cal]$ ls -l
total 128
-rwxrwxr-x. 1 appusr appusr 28632 Aug 16 15:58 cal
-rw-rw-r--. 1 appusr appusr   200 Aug 16 11:27 cal.l
-rw-rw-r--. 1 appusr appusr 44255 Aug 16 15:58 cal.tab.c
-rw-rw-r--. 1 appusr appusr  2063 Aug 16 15:58 cal.tab.h
-rw-rw-r--. 1 appusr appusr   533 Aug 16 14:11 cal.y
-rw-rw-r--. 1 appusr appusr 44068 Aug 16 15:58 lex.yy.c

/* 4. 运行可执行文件cal，计算简单表达式  */
[appusr@postgre cal]$ ./cal
22*33
= 726
[appusr@postgre cal]$

四、Lex 的结构规范

lex是用来生成词法分析器的
lex源文件扩展名.l
分为三个段：定义段、规则段、用户子程序段

/* 定义段 */
%{
...
%}
...

%%
    /* 规则段 */
...

%%
    /* 用户子程序段 */
...

三个段用 %% 进行分隔:

1. 定义段

定义段包括文字块、定义、内部表声明、起始条件和转换。
C语言的注释、头文件包含等一般就放在%{%}之间，这一部分的内容会被直接复制到输出文件的开头部分。
例如：

%{
#include "cal.tab.h"
extern int yylval;
%}

2. 规则段

规则段为一系列匹配模式和动作，模式一般使用正则表达式书写，动作部分为C代码：

模式1 {动作1 （C代码）}

例如：
[0-9]+  { yylval = atoi(yytext); return NUMBER; }

在输入和模式1匹配的时候，执行动作部分的代码。
C代码被逐字拷贝到生成的C文件中。

当lex扫描程序运行时，它把输入与规则段的模式进行匹配。
➢ 每次发现一个匹配（被匹配的输入称为标记（token））时就执行与那种模式相关的C代码。
➢ 如果模式后面跟着 | 符号，则该模式将使用与文件中下一个模式相同的C代码。
➢ 当输入字符不匹配模式时，词法分析程序的动作就好像它匹配上了代码ECHO的模式，ECHO将标记的拷贝写到输出。

3. 用户子程序段

这里为C代码，会被原样复制到c文件中，一般这里定义一些辅助函数等，如动作代码中使用到的辅助函数。
如果重新定义input()、unput()、output()、或者yywrap()，新的版本或者支持子程序，都可以放在这里。

词法分析器所做的，就是在输入中寻找字符的模式（pattern）。
在词法分析器中，我们要给定我们需要识别的模式，因此需要使用一种方式来描述模式，这就是常用的正则表达式。

五、总结

本文通过亲自动手构建一个计算器实现来示范 lex & yacc 的开发过程和 Lex 的结构规范，加深 lex & yacc 的编程理解。

Lex & Yacc 学习笔记（2）- 简单计算器