编译器之语法分析器(syntax analyzer)

语法分析器根据语法将标记流(来自词法分析器)转换为语法树。

目标

从“词法分析器” 任务中获取输出,并根据以下语法将其转换为抽象语法树(AST)。输出应为展平格式。

程序应从文件和/或stdin读取输入,并将输出写入文件和/或stdout。如果使用的语言具有解析器模块/库/类,则提供两种版本的解决方案将是很好的选择:一个不带解析器模块,另一个带解析器模块。

语法

    stmt_list           =   {stmt} ;
 
    stmt                =   ';'
                          | Identifier '=' expr ';'
                          | 'while' paren_expr stmt
                          | 'if' paren_expr stmt ['else' stmt]
                          | 'print' '(' prt_list ')' ';'
                          | 'putc' paren_expr ';'
                          | '{' stmt_list '}'
                          ;
 
    paren_expr          =   '(' expr ')' ;
 
    prt_list            =   (string | expr) {',' (String | expr)} ;
 
    expr                =   and_expr            {'||' and_expr} ;
    and_expr            =   equality_expr       {'&&' equality_expr} ;
    equality_expr       =   relational_expr     [('==' | '!=') relational_expr] ;
    relational_expr     =   addition_expr       [('<' | '<=' | '>' | '>=') addition_expr] ;
    addition_expr       =   multiplication_expr {('+' | '-') multiplication_expr} ;
    multiplication_expr =   primary             {('*' | '/' | '%') primary } ;
    primary             =   Identifier
                          | Integer
                          | '(' expr ')'
                          | ('+' | '-' | '!') primary

得到的AST应该表示为二叉树。
示例
给定一个简单程序(如下),该程序存储在一个名为while.t的文件中,使用一种词法分析器创建标记列表。

lex < while.t > while.lex

运行一种语法分析器

parse < while.lex > while.ast

while.t

count = 1;
 while (count < 10) {
     print("count is: ", count, "\n");
     count = count + 1;
 }

while.lex

    1      1 Identifier      count
    1      7 Op_assign
    1      9 Integer             1
    1     10 Semicolon
    2      1 Keyword_while
    2      7 LeftParen
    2      8 Identifier      count
    2     14 Op_less
    2     16 Integer            10
    2     18 RightParen
    2     20 LeftBrace
    3      5 Keyword_print
    3     10 LeftParen
    3     11 String          "count is: "
    3     23 Comma
    3     25 Identifier      count
    3     30 Comma
    3     32 String          "\n"
    3     36 RightParen
    3     37 Semicolon
    4      5 Identifier      count
    4     11 Op_assign
    4     13 Identifier      count
    4     19 Op_add
    4     21 Integer             1
    4     22 Semicolon
    5      1 RightBrace
    6      1 End_of_input

while.ast

Sequence
Sequence
;
Assign
Identifier    count
Integer       1
While
Less
Identifier    count
Integer       10
Sequence
Sequence
;
Sequence
Sequence
Sequence
;
Prts
String        "count is: "
;
Prti
Identifier    count
;
Prts
String        "\n"
;
Assign
Identifier    count
Add
Identifier    count
Integer       1

详细说明
节点类型名称列表

Identifier String Integer Sequence If Prtc Prts Prti While Assign Negate Not Multiply Divide Mod
Add Subtract Less LessEqual Greater GreaterEqual Equal NotEqual And Or

在下文中,Null/Empty节点代表";"。

非根(内部)节点
对于操作符,应该创建以下节点:

Multiply Divide Mod Add Subtract Less LessEqual Greater GreaterEqual Equal NotEqual And Or

对于上面的每个节点,左子节点和右子节点是各自操作的操作数。
采用伪S-Expression格式:

(Operator expression expression)

Negate, Not
对于这些节点类型,左节点是操作数,右节点为空。

(Operator expression ;)

Sequence - 子节点是statement或Sequence。
If - 左节点是expression,右节点是if节点, 同时它的左节点是if-true语块右节点是if-false (else) 语块。

(If expression (If statement else-statement))

如果没有else, 这棵树变成:

(If expression (If statement ;))

Prtc

(Prtc (expression) ;)

Prts

(Prts (String "the string") ;)

Prti

(Prti (Integer 12345) ;)

While - 左节点是expression,右节点是statement.

(While expression statement)

Assign - 左节点是赋值的左边,右节点是赋值的右边。

(Assign Identifier expression)

终端(叶子)节点:

Identifier: (Identifier ident_name)
Integer:    (Integer 12345)
String:     (String "Hello World!")
";":        Empty node

举例
实现以下程序:

   a=11;

用二叉树来实现以下AST:
每个非叶子节点下有两个’|'行。第一个表示左子节点,第二个表示右子节点

   (1) Sequence
   (2)     |-- ;
   (3)     |-- Assign
   (4)         |-- Identifier: a
   (5)         |-- Integer: 11

扁平化形式:

   (1) Sequence
   (2) ;
   (3) Assign
   (4) Identifier    a
   (5) Integer       11

实现以下程序:

   a=11;
   b=22;
   c=33;

生成AST:

   ( 1) Sequence
   ( 2)     |-- Sequence
   ( 3)     |   |-- Sequence
   ( 4)     |   |   |-- ;
   ( 5)     |   |   |-- Assign
   ( 6)     |   |       |-- Identifier: a
   ( 7)     |   |       |-- Integer: 11
   ( 8)     |   |-- Assign
   ( 9)     |       |-- Identifier: b
   (10)     |       |-- Integer: 22
   (11)     |-- Assign
   (12)         |-- Identifier: c
   (13)         |-- Integer: 33

扁平化形式:

   ( 1) Sequence
   ( 2) Sequence
   ( 3) Sequence
   ( 4) ;
   ( 5) Assign
   ( 6) Identifier    a
   ( 7) Integer       11
   ( 8) Assign
   ( 9) Identifier    b
   (10) Integer       22
   (11) Assign
   (12) Identifier    c
   (13) Integer       33

解析器的伪代码
使用优先级上升进行表达式解析,并使用递归下降进行语句解析

def expr(p)
    if tok is "("
        x = paren_expr()
    elif tok in ["-", "+", "!"]
        gettok()
        y = expr(precedence of operator)
        if operator was "+"
            x = y
        else
            x = make_node(operator, y)
    elif tok is an Identifier
        x = make_leaf(Identifier, variable name)
        gettok()
    elif tok is an Integer constant
        x = make_leaf(Integer, integer value)
        gettok()
    else
        error()
 
    while tok is a binary operator and precedence of tok >= p
        save_tok = tok
        gettok()
        q = precedence of save_tok
        if save_tok is not right associative
            q += 1
        x = make_node(Operator save_tok represents, x, expr(q))
 
    return x
 
def paren_expr()
    expect("(")
    x = expr(0)
    expect(")")
    return x
 
def stmt()
    t = NULL
    if accept("if")
        e = paren_expr()
        s = stmt()
        t = make_node(If, e, make_node(If, s, accept("else") ? stmt() : NULL))
    elif accept("putc")
        t = make_node(Prtc, paren_expr())
        expect(";")
    elif accept("print")
        expect("(")
        repeat
            if tok is a string
                e = make_node(Prts, make_leaf(String, the string))
                gettok()
            else
                e = make_node(Prti, expr(0))
 
            t = make_node(Sequence, t, e)
        until not accept(",")
        expect(")")
        expect(";")
    elif tok is ";"
        gettok()
    elif tok is an Identifier
        v = make_leaf(Identifier, variable name)
        gettok()
        expect("=")
        t = make_node(Assign, v, expr(0))
        expect(";")
    elif accept("while")
        e = paren_expr()
        t = make_node(While, e, stmt()
    elif accept("{")
        while tok not equal "}" and tok not equal end-of-file
            t = make_node(Sequence, t, stmt())
        expect("}")
    elif tok is end-of-file
        pass
    else
        error()
    return t
 
def parse()
    t = NULL
    gettok()
    repeat
        t = make_node(Sequence, t, stmt())
    until tok is end-of-file
    return t

一旦构建了AST,就应该以扁平格式输出它。这可以像下面这样

def prt_ast(t)
    if t == NULL
        print(";\n")
    else
        print(t.node_type)
        if t.node_type in [Identifier, Integer, String]     # leaf node
            print the value of the Ident, Integer or String, "\n"
        else
            print("\n")
            prt_ast(t.left)
            prt_ast(t.right)

如果正确地构建了AST,那么将它用以下操作加载到后续程序

def load_ast()
    line = readline()
    # Each line has at least one token
    line_list = tokenize the line, respecting double quotes
 
    text = line_list[0] # first token is always the node type
 
    if text == ";"   # a terminal node
        return NULL
 
    node_type = text # could convert to internal form if desired
 
    # A line with two tokens is a leaf node
    # Leaf nodes are: Identifier, Integer, String
    # The 2nd token is the value
    if len(line_list) > 1
        return make_leaf(node_type, line_list[1])
 
    left = load_ast()
    right = load_ast()
    return make_node(node_type, left, right)

最后,还可以通过对AST解释器解决方案之一运行AST来测试它。
假设它测试程序一个名为prime.t的文件中
lex <prime.t | parse
输入到词法分析器的代码:

/*
 Simple prime number generator
 */
count = 1;
n = 1;
limit = 100;
while (n < limit) {
    k=3;
    p=1;
    n=n+2;
    while ((k*k<=n) && (p)) {
        p=n/k*k!=n;
        k=k+2;
    }
    if (p) {
        print(n, " is prime\n");
        count = count + 1;
    }
}
print("Total primes found: ", count, "\n");

从词法分析器输出,并输入到语法分析器:

    4      1 Identifier      count
    4      7 Op_assign
    4      9 Integer             1
    4     10 Semicolon
    5      1 Identifier      n
    5      3 Op_assign
    5      5 Integer             1
    5      6 Semicolon
    6      1 Identifier      limit
    6      7 Op_assign
    6      9 Integer           100
    6     12 Semicolon
    7      1 Keyword_while
    7      7 LeftParen
    7      8 Identifier      n
    7     10 Op_less
    7     12 Identifier      limit
    7     17 RightParen
    7     19 LeftBrace
    8      5 Identifier      k
    8      6 Op_assign
    8      7 Integer             3
    8      8 Semicolon
    9      5 Identifier      p
    9      6 Op_assign
    9      7 Integer             1
    9      8 Semicolon
   10      5 Identifier      n
   10      6 Op_assign
   10      7 Identifier      n
   10      8 Op_add
   10      9 Integer             2
   10     10 Semicolon
   11      5 Keyword_while
   11     11 LeftParen
   11     12 LeftParen
   11     13 Identifier      k
   11     14 Op_multiply
   11     15 Identifier      k
   11     16 Op_lessequal
   11     18 Identifier      n
   11     19 RightParen
   11     21 Op_and
   11     24 LeftParen
   11     25 Identifier      p
   11     26 RightParen
   11     27 RightParen
   11     29 LeftBrace
   12      9 Identifier      p
   12     10 Op_assign
   12     11 Identifier      n
   12     12 Op_divide
   12     13 Identifier      k
   12     14 Op_multiply
   12     15 Identifier      k
   12     16 Op_notequal
   12     18 Identifier      n
   12     19 Semicolon
   13      9 Identifier      k
   13     10 Op_assign
   13     11 Identifier      k
   13     12 Op_add
   13     13 Integer             2
   13     14 Semicolon
   14      5 RightBrace
   15      5 Keyword_if
   15      8 LeftParen
   15      9 Identifier      p
   15     10 RightParen
   15     12 LeftBrace
   16      9 Keyword_print
   16     14 LeftParen
   16     15 Identifier      n
   16     16 Comma
   16     18 String          " is prime\n"
   16     31 RightParen
   16     32 Semicolon
   17      9 Identifier      count
   17     15 Op_assign
   17     17 Identifier      count
   17     23 Op_add
   17     25 Integer             1
   17     26 Semicolon
   18      5 RightBrace
   19      1 RightBrace
   20      1 Keyword_print
   20      6 LeftParen
   20      7 String          "Total primes found: "
   20     29 Comma
   20     31 Identifier      count
   20     36 Comma
   20     38 String          "\n"
   20     42 RightParen
   20     43 Semicolon
   21      1 End_of_input

语法分析器的输出:

Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier    count
Integer       1
Assign
Identifier    n
Integer       1
Assign
Identifier    limit
Integer       100
While
Less
Identifier    n
Identifier    limit
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier    k
Integer       3
Assign
Identifier    p
Integer       1
Assign
Identifier    n
Add
Identifier    n
Integer       2
While
And
LessEqual
Multiply
Identifier    k
Identifier    k
Identifier    n
Identifier    p
Sequence
Sequence
;
Assign
Identifier    p
NotEqual
Multiply
Divide
Identifier    n
Identifier    k
Identifier    k
Identifier    n
Assign
Identifier    k
Add
Identifier    k
Integer       2
If
Identifier    p
If
Sequence
Sequence
;
Sequence
Sequence
;
Prti
Identifier    n
;
Prts
String        " is prime\n"
;
Assign
Identifier    count
Add
Identifier    count
Integer       1
;
Sequence
Sequence
Sequence
;
Prts
String        "Total primes found: "
;
Prti
Identifier    count
;
Prts
String        "\n"
;

代码实现

C实现
Go实现
Java实现
JavaScript实现
Python实现

发布了69 篇原创文章 · 获赞 189 · 访问量 9万+

猜你喜欢

转载自blog.csdn.net/qq_36721220/article/details/103599712