Compiler Principles Experiment (3): Comprehensive Design of Lexical and Syntax Analysis

Copyright statement: This is an original article; the copyright belongs to Geekerstar.

Link to this article: http://www.geekerstar.com/technology/105.html

Unless an article is specially marked, reprinting is welcome, but please credit the source in the format shown above. Thank you for your cooperation.

1 Overview

Implement a lexical analyzer and a syntax analyzer in a high-level language (such as C/C++ or Java).

2 Experimental objectives

  1. Understand and master the principles and methods of lexical and syntax analysis.

  2. Be able to implement lexical and syntax analysis programs in a chosen language.

  3. Gain a complete and clear understanding of the basic concepts, principles, and methods of compilation, and be able to apply them correctly and fluently.

3 Experiment description

3.1 Experimental requirements

Implement lexical and syntax analysis, with the lexical analysis program acting as a subroutine of the syntax analysis program.

Specific requirements are as follows:

  • The lexical analysis program is a subroutine of the syntax analysis program.

  • Input data: program segment.

  • Output result: grammatical analysis result, including error list.

3.2 Overview of this experiment

This experiment implements a syntax analyzer in C++. It uses recursive descent to check the syntax and analyze
the structure of the word sequence supplied by the lexical analysis program. The PL/0 language is simple,
clearly structured, and readable, yet it has the essential parts of a general high-level programming language,
so a PL/0 compiler fully illustrates the basic methods and techniques used to build high-level language
compilers. For that reason, PL/0 is the language analyzed in this experiment.

4 Technical analysis

4.1 Lexical and syntax analysis

This experiment draws mainly on basic data structures, C++ programming, and compilation principles.
The basic concepts of compilation are approached through two parts, lexical analysis and syntax analysis:

  • Lexical analysis:

The lexical analyzer identifies each token in the source program according to the lexical rules; each token
represents one class of word. Tokens in source programs commonly fall into several categories: keywords,
identifiers, literals, and special symbols. The input of the lexical analyzer is the source program, and its
output is the recognized token stream. In other words, its task is to turn the character stream of the source
file into a stream of tokens. Lexical analysis is the first stage of the compilation process. It scans the
character string of the source program from left to right, recognizes the words in it one by one, and converts
each word into an internal code that is output as a symbol string for the parser.

In a nutshell, the lexical analyzer generally completes the following tasks during its work:

(1) Identify each word symbol in the source program and convert it into an internal code form;

(2) Delete useless blank characters, carriage return characters and other insubstantial characters;

(3) Delete comments;

(4) Perform a lexical check and report the errors found.

In addition, depending on how the compiler's workflow is organized, some compilers also enter the recognized
identifiers into the symbol table during lexical analysis.
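The four tasks listed above can be sketched in a few lines of C++. This is a minimal illustration rather than the program from this experiment: `Token` and `tokenize` are hypothetical names, only identifiers and numbers are recognized (operators are reported as errors to keep the sketch short), and the `{ ... }` comment syntax is an assumption made purely to show task (3). The codes 10 and 11 follow the code listing in section 8.

```cpp
#include <cctype>
#include <string>
#include <vector>

// One recognized token: an internal code plus the matched text.
struct Token {
    int code;          // 10 = identifier, 11 = number, -1 = lexical error
    std::string text;  // the lexeme as it appeared in the source
};

// Hypothetical helper: scans `src`, returns its tokens, appends messages to `errors`.
// Assumption: comments are written as { ... }.
std::vector<Token> tokenize(const std::string& src, std::vector<std::string>& errors) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }          // task (2): drop blanks and newlines
        if (c == '{') {                                   // task (3): drop comments
            std::size_t close = src.find('}', i);
            if (close == std::string::npos) {
                errors.push_back("unterminated comment");
                break;
            }
            i = close + 1;
            continue;
        }
        std::size_t start = i;
        if (std::isalpha(c)) {                            // task (1): identifier
            while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back({10, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {                     // task (1): number
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back({11, src.substr(start, i - start)});
        } else {                                          // task (4): report anything else
            errors.push_back(std::string("unexpected character '") + src[i] + "'");
            out.push_back({-1, std::string(1, src[i])});
            ++i;
        }
    }
    return out;
}
```

Calling `tokenize("abc { note } 42", errors)` under these assumptions yields two tokens, an identifier and a number, and leaves `errors` empty.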

  • Syntax analysis (parsing):

Syntax analysis is the core of the compilation process. Its task is to analyze the grammatical structure of the
source code according to the rules of the grammar and to check the syntax along the way. If there are no syntax
errors, it produces the correct grammatical structure in preparation for semantic analysis and code generation.

There are many syntax analysis methods, which fall roughly into two categories: top-down and bottom-up.
Top-down methods include LL(1) analysis and recursive descent analysis. Bottom-up methods include simple
precedence analysis, operator precedence analysis, and LR(k) analysis. The following mainly introduces the
bottom-up LR(k) analysis method.

Bottom-up analysis is also called shift-reduce analysis. The idea is to scan the input symbol string from left
to right, pushing the input symbols one by one onto a LIFO stack and analyzing while pushing. As soon as the
symbols on top of the stack form a handle of the sentential form (a handle corresponds to the right-hand side
of a production), the handle is replaced by the nonterminal on the left-hand side of that production; this
step is called a reduction. The process repeats until only the start symbol of the grammar remains on the
stack, which means the analysis succeeded and the input string is a sentence of the grammar. Otherwise the
analysis fails: the input string is not a sentence of the grammar and must contain a syntax error.
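The shift and reduce steps described above can be illustrated on a deliberately tiny grammar. The sketch below is an assumption-laden toy, not an LR(k) parser: the grammar `E -> E + n | n` and the function name `shiftReduceAccepts` are made up for illustration, and handles are detected by hard-coded pattern checks on the top of the stack instead of a parsing table.

```cpp
#include <string>

// Grammar (assumed for illustration):  E -> E + n | n
// Returns true when `input` (made of 'n' and '+') is a sentence of the grammar.
bool shiftReduceAccepts(const std::string& input) {
    std::string stack;                        // LIFO stack of grammar symbols
    for (char c : input) {
        stack.push_back(c);                   // shift: push the next input symbol
        // reduce: if a handle sits on top of the stack, replace it with E.
        // One reduction per shift is enough for this grammar.
        std::size_t n = stack.size();
        if (n >= 3 && stack.compare(n - 3, 3, "E+n") == 0) {
            stack.replace(n - 3, 3, "E");     // E -> E + n
        } else if (stack.back() == 'n') {
            stack.back() = 'E';               // E -> n
        }
    }
    return stack == "E";                      // success: only the start symbol remains
}
```

For the input `n+n` the stack goes through `n`, `E`, `E+`, `E+n`, `E`, so the string is accepted; for `n+` the stack ends as `E+` and the string is rejected.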

Based on the compilation concepts above, combined with C++ programming and some data structures, a lexical analyzer and parser are written.

5 Design and implementation

5.1 Design ideas

Write a recursive descent parser that checks the syntax and analyzes the structure of the word sequence supplied
by the lexical analysis program. The parser is written in C++ and analyzes the PL/0 language. The core idea is
to start from the start symbol and expand according to the grammar, analyzing step by step until the analysis
is complete. If a mismatch occurs along the way, it is a syntax error and the analysis stops. A practical
parser, of course, needs an error recovery mechanism so that it can go on to find further errors, that is,
report several syntax errors in one run. Parsing also depends on lexical analysis: the lexical analysis must
run first, and the parser works on its results.
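The error recovery idea mentioned above is often implemented as "panic mode": after an error, the parser discards words until it reaches a synchronizing word and then resumes. The sketch below is one possible illustration and is not part of this experiment's program. The function name `checkStatements`, the simplified statement form `name := operand`, and the choice of `;` and `end` as synchronizing words are all assumptions.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Checks the words between "begin" and "end" (hypothetical, simplified input).
// Assumed statement form: <name> := <name-or-number>
// Returns one message per faulty statement instead of stopping at the first.
std::vector<std::string> checkStatements(const std::vector<std::string>& tok) {
    std::vector<std::string> errors;
    auto isName = [](const std::string& s) {
        return !s.empty() && std::isalpha(static_cast<unsigned char>(s[0]));
    };
    auto isOperand = [](const std::string& s) {
        return !s.empty() && std::isalnum(static_cast<unsigned char>(s[0]));
    };
    std::size_t i = 0;
    while (i < tok.size() && tok[i] != "end") {
        bool ok = i + 2 < tok.size() && isName(tok[i]) &&
                  tok[i + 1] == ":=" && isOperand(tok[i + 2]);
        if (ok) i += 3;                       // statement matched, move past it
        if (!ok || (i < tok.size() && tok[i] != ";" && tok[i] != "end")) {
            errors.push_back("bad statement near word " + std::to_string(i));
            // panic mode: discard words up to the next synchronizing word
            while (i < tok.size() && tok[i] != ";" && tok[i] != "end") ++i;
        }
        if (i < tok.size() && tok[i] == ";") ++i;   // consume the separator
    }
    return errors;
}
```

With the words `a := 1 ; b = 2 ; c := ; d := 4 end`, this sketch reports two errors (the second and third statements) and still checks the fourth statement, which is the behavior the paragraph above describes.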

The extended BNF is expressed as follows:

<program> ::= begin <statement string> end

<statement string> ::= <statement> { ; <statement> }

<statement> ::= <assignment statement>

<assignment statement> ::= ID := <expression>

<expression> ::= <item> { + <item> | - <item> }

<item> ::= <factor> { * <factor> | / <factor> }

<factor> ::= ID | NUM | ( <expression> )
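In recursive descent, each rule of the grammar above becomes one function, and each `{ ... }` repetition becomes a loop. The following sketch shows this for the last three rules only. It is a simplified illustration with hypothetical names (`ExprParser`, `isExpression`) and it assumes that ID is a single letter, NUM is a single digit, and the input contains no spaces; the full program in section 8 works on word codes instead.

```cpp
#include <cctype>
#include <string>

// Recognizer for the <expression>, <item> and <factor> rules.
// Assumptions for brevity: ID is one letter, NUM is one digit, no spaces.
struct ExprParser {
    std::string s;         // text being analyzed
    std::size_t pos;       // index of the next unread character
    bool ok;               // false once a mismatch has been found

    char peek() const { return pos < s.size() ? s[pos] : '\0'; }

    void factor() {        // <factor> ::= ID | NUM | ( <expression> )
        if (std::isalnum(static_cast<unsigned char>(peek()))) {
            ++pos;
        } else if (peek() == '(') {
            ++pos;
            expression();
            if (peek() == ')') ++pos; else ok = false;
        } else {
            ok = false;
        }
    }
    void item() {          // <item> ::= <factor> { * <factor> | / <factor> }
        factor();
        while (ok && (peek() == '*' || peek() == '/')) { ++pos; factor(); }
    }
    void expression() {    // <expression> ::= <item> { + <item> | - <item> }
        item();
        while (ok && (peek() == '+' || peek() == '-')) { ++pos; item(); }
    }
};

// True when the whole of `text` is one <expression>.
bool isExpression(const std::string& text) {
    ExprParser p;
    p.s = text;
    p.pos = 0;
    p.ok = true;
    p.expression();
    return p.ok && p.pos == text.size();
}
```

Under these assumptions, `isExpression("(a+2)*3")` is true, while `isExpression("a+")` and `isExpression("(a+b")` are false.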

Enter the word string, ending with "#". If it is a grammatically correct sentence, the program prints
"Syntax analysis succeeded!"; otherwise it prints "Syntax analysis error (error reason)".

For example:

Input: begin a:=4; b:=2*3; c:=a+b end #

Output: Syntax analysis succeeded!

Input: x:=a+b*c end #

Output: Missing begin!

Word symbol    Species code
begin          1
if             2
then           3
while          4
do             5
end            6
identifier     10
number         11
+              13
-              14
*              15
/              16
:              17
:=             18
<              20
<>             21
<=             22
>              23
>=             24
=              25
;              26
(              27
)              28
#              0
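For the fixed words in this table (keywords, operators, and delimiters), the mapping from word symbol to species code can be held in a lookup table. The sketch below is one way to do that and is not how the program in section 8 is written; `speciesCode` is a hypothetical name. Identifiers and numbers are left out because they are classified by their shape, not by lookup.

```cpp
#include <string>
#include <unordered_map>

// Returns the species code of a keyword, operator, or delimiter,
// or -1 when the word is not one of the fixed symbols in the table.
int speciesCode(const std::string& word) {
    static const std::unordered_map<std::string, int> table = {
        {"begin", 1}, {"if", 2}, {"then", 3}, {"while", 4}, {"do", 5}, {"end", 6},
        {"+", 13}, {"-", 14}, {"*", 15}, {"/", 16}, {":", 17}, {":=", 18},
        {"<", 20}, {"<>", 21}, {"<=", 22}, {">", 23}, {">=", 24}, {"=", 25},
        {";", 26}, {"(", 27}, {")", 28}, {"#", 0},
    };
    auto it = table.find(word);
    return it == table.end() ? -1 : it->second;
}
```

A scanner could first collect a candidate word, call `speciesCode` on it, and fall back to the identifier code when the result is -1 and the word starts with a letter.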

5.2 Implementation method

This experiment is coded in C++. The main functions are:

void cifa() // lexical analysis

void fun_yufa() // checks whether there is a syntax error

void fun_op() // handles the operators * and /

void exp() // handles the operators + and -

void fun_yuju() // checks whether a statement is wrong (:=)

void fun_end() // checks whether the program has ended

void yufa() // syntax analysis by recursive descent

Among them, cifa() performs lexical analysis; yufa() and the functions it calls invoke cifa() to fetch each word and perform syntax analysis on the results.

Figure 1 Part of the lexical analysis code

Figure 2 Part of the syntax analysis function

5.3 Test case

Project/software:      Lexical and syntax analyzer
Program version:       V1.0
Function module name:  Lexical analysis module
Editor:                XX
Use case number:       T1.0
Preparation time:      207.11.20
Features:              Analysis and checking of the lexical and grammar definitions
Testing purpose:       Determine whether the lexical and grammatical structure is correct
Test data:             1: begin a:=3;b:=2*4;c:=a+b;end #
                       2: a:=3;b:=2*4;c:=a+b; end #
                       3: begin a:=3;b:=2*4;c:=a+b; #

Test case  Operation description     Code                            Expected result    Actual result      Test status
1          Enter the first snippet   begin a:=3;b:=2*4;c:=a+b;end #  Syntax is correct  Syntax is correct  Good
2          Enter the second snippet  a:=3;b:=2*4;c:=a+b; end #       Missing begin      Missing begin      Good
3          Enter the third snippet   begin a:=3;b:=2*4;c:=a+b; #     Missing end        Missing end        Good

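5.4 Experimental results and analysis

Enter a piece of PL/0 code, for example: begin a:=3;b:=2*4;c:=a+b; end #

The syntax of this code is correct, so after lexical and syntax analysis the output should be: Syntax analysis succeeded! As shown in the figure below:

Figure 3 Experimental result 1

If we leave out begin, the syntax analyzer should detect it and print: Missing begin! As shown in the figure below:

Figure 4 Experimental result 2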

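6 Summary

Through this experiment I gained a deeper understanding of compilation principles. I learned that the job of
lexical analysis is to organize the input into individual words, and I learned how to design, write, and debug
a lexical analysis program. The grammar rules are clearly defined, and the analysis program performs syntax
analysis correctly. For the syntax errors it meets, it does simple error handling and gives simple error
messages, so that the syntax analysis can run to completion. The practice also strengthened my understanding
and application of the basic concepts and laid a foundation for later study. The program still has a few
shortcomings: the syntax analysis is incomplete in places, and the error reporting needs improvement. I hope
to resolve these problems step by step with further study.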

8 Code listing

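#include <cstdio>
#include <cstring>
using namespace std;

char str[1000];            // input text read from the keyboard
char bzf[8];               // current word, compared against the keyword table
char ch;                   // current character
const char *keyword[6]={"begin","if","then","while","do","end"};
int num,p,m,n,sum;         // num: code of the current word, p: read position in str,
                           // m,n: loop indexes, sum: value of the current number
int x;                     // error flag: set to 1 once a syntax error is reported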


void cifa()   // lexical analysis: reads the next word and stores its code in num
{
    sum=0;
    for(m=0;m<8;m++)
        bzf[m]='\0';
    m=0;
    ch=str[p++];
    while(ch==' '||ch=='\t'||ch=='\n'||ch=='\r')       // skip whitespace
        ch=str[p++];
    if(((ch<='z')&&(ch>='a'))||((ch<='Z')&&(ch>='A')))  // identifier or keyword
    {
        while(((ch<='z')&&(ch>='a'))||((ch<='Z')&&(ch>='A'))||((ch>='0')&&(ch<='9')))
        {
            if(m<7)              // keep room for '\0'; longer names are truncated
                bzf[m++]=ch;
            ch=str[p++];
        }
        p--;
        num=10;
        bzf[m]='\0';
        for(n=0;n<6;n++)
            if(strcmp(bzf,keyword[n])==0)   // keywords have codes 1 to 6
            {
                num=n+1;
                break;
            }
    }
    else if((ch>='0')&&(ch<='9'))  // number
    {
        while((ch>='0')&&(ch<='9'))
        {
            sum=sum*10+ch-'0';
            ch=str[p++];
        }
        p--;
        num=11;
    }
    else
    switch(ch)     // operators and delimiters
    {
        case '<':
            m=0;
            ch=str[p++];
            if(ch=='>')
            {
                num=21;
            }
            else if(ch=='=')
            {
                num=22;
            }
            else
            {
                num=20;
                p--;
            }
        break;

        case '>':
            m=0;
            ch=str[p++];
            if(ch=='=')
            {
                num=24;
            }
            else
            {
                num=23;
                p--;
            }
        break;

        case ':':
            m=0;
            ch=str[p++];
            if(ch=='=')
            {
                num=18;
            }
            else
            {
                num=17;
                p--;
            }
            break;

        case '+':
            num=13;
        break;

        case '-':
            num=14;
        break;

        case '*':
            num=15;
        break;

        case '/':
            num=16;
        break;

        case '(':
            num=27;
        break;

        case ')':
            num=28;
        break;

        case '=':
            num=25;
        break;

        case ';':
            num=26;
        break;

        case '#':
            num=0;
        break;

        default:
            num=-1;
        break;
    }
}
void exp();
void fun_yufa()   // checks a factor for syntax errors
{
    if((num==10)||(num==11))  // identifier or number
    {
        cifa();
    }
    else if(num==27)          // '('
    {
        cifa();
        exp();

        if(num==28)           // ')'
        {
            cifa();          /* read the next word */
        }
        else
        {
            printf("Missing ')'\n");
            x=1;
        }
    }
    else
    {
        printf("Syntax error\n");
        x=1;
    }
    return;
}
void fun_op()  // handles the operators '*' and '/'
{
    fun_yufa();
    while((num==15)||(num==16))   // '*' or '/'
    {
        cifa();             /* read the next word */
        fun_yufa();
    }
    return;
}

void exp()   // handles the operators '+' and '-'
{
    fun_op();
    while((num==13)||(num==14))   // '+' or '-'
    {
        cifa();               /* read the next word */
        fun_op();
    }

    return;
}


void fun_yuju()  // checks an assignment statement for errors
{
    if(num==10)              // identifier
    {
        cifa();        /* read the next word */
        if(num==18)          // ':='
        {
            cifa();      /* read the next word */
            exp();
        }
        else
        {
            printf("':=' error\n");
            x=1;
        }
    }
    else
    {
        printf("Syntax error!\n");
        x=1;
    }

    return;
}

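void fun_end()  // parses statements until the end of the program is reached
{
    fun_yuju();         /* parse one statement */

    while(num==26)      // ';'
    {
        cifa();          /* read the next word */
        if(num!=6)       // a ';' directly before "end" is allowed
            fun_yuju();          /* parse one statement */
    }

    return;
}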

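void yufa()  // syntax analysis by recursive descent
{
    if(num==1)               // "begin"
    {
        cifa();
        fun_end();
        if(num==6)           // "end"
        {
            cifa();
            if((num==0)&&(x==0))   // '#' reached with no error reported
                printf("Syntax analysis succeeded!\n");
        }
        else
        {
            if(x!=1) printf("Missing end!\n");
            x=1;
        }
    }
    else
    {
        printf("Missing begin!\n");
        x=1;
    }

    return;
}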

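int main()
{
    p=x=0;
    printf("Enter a program segment ending with #:\n");
    do
    {
        if(scanf("%c",&ch)!=1)   // stop at end of input
            break;
        str[p++]=ch;
    }while(ch!='#'&&p<999);      // leave room for the terminating '\0'
    str[p]='\0';
    p=0;
    cifa();
    yufa();
    return 0;
}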
