
Mission details

The tasks of this level: deepen your understanding of how a lexical analyzer works; strengthen your mastery of lexical analysis methods; and use a programming language to implement a simple lexical analysis program and analyze it.

Related information

To complete this level, you need to master the design and implementation of a lexical analysis program.

Basic knowledge of lexical analysis

A lexical analyzer (lexer for short) scans the source program character by character from left to right and identifies word symbols (tokens) one by one according to the lexical rules of the language.

Therefore, a lexical analyzer should have the following functions:

Scan the stream of characters that make up the source program from left to right
Recognize words that have lexical meaning
Return word records or lexical error messages
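
The left-to-right scan described above can be sketched roughly as follows. This is only an illustration, not the lab's reference code: the function name `scanWords` and its grouping rule (maximal runs of letters and digits, everything else as one-character words) are assumptions made for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split source text into raw word symbols, scanning left to right.
// Letters and digits are grouped into maximal runs; any other non-blank
// character becomes a one-character word. Illustrative only.
std::vector<std::string> scanWords(const std::string& src) {
    std::vector<std::string> words;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }   // blanks only separate words
        std::size_t start = i;
        if (std::isalnum(c)) {
            while (i < src.size() &&
                   std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
        } else {
            ++i;                                  // single-character symbol
        }
        words.push_back(src.substr(start, i - start));
    }
    return words;
}
```

Calling `scanWords("int year;")` yields the three words `int`, `year`, and `;`. Deciding which category each word belongs to is a separate step.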

As the above shows, a key step in lexical analysis is identifying the category of each word symbol. To simplify syntax analysis, word symbols are usually divided into five categories.

Identifiers
name the variables, arrays, functions, procedures, labels, and so on that appear in a program. An identifier is usually an alphanumeric string beginning with a letter, such as length or nextch.
Basic words
are also called keywords or reserved words, such as if, while, for, do, and goto. They have the form of identifiers, but they are defined by the language rather than by the user, and their meaning is fixed by convention. Most languages, such as Pascal and C, stipulate that they cannot be used as identifiers or as prefixes of identifiers; that is, users cannot use them for their own names, which is why they are called reserved words. Some languages, such as Fortran, do allow basic words to be used as identifiers or prefixes of identifiers.
Constants
include constants of various types, such as integer, real, character, and boolean. For example, 5, 3.1415926, 'a', and TRUE are all constants.
Operators
include arithmetic operators +, -, ×, ÷; relational operators <, <=, >, >=, ==, !=; and logical operators &&, ||, and !.
Delimiters
include single-character delimiters such as , and ; double-character delimiters such as /*, */, and //; and blank characters.
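
The five categories above can be sketched as an enumeration with a small classifier. This is a minimal illustration under stated assumptions: the names `Category` and `classifyWord`, the short word lists, and the numeric codes 1 to 5 are all hypothetical, chosen only to mirror the order used by the lab answer later in this post. Character and boolean constants are deliberately left out to keep the sketch short.

```cpp
#include <cctype>
#include <string>

// Hypothetical category codes; the numbering is an assumption for illustration.
enum class Category { Reserved = 1, Identifier = 2, Constant = 3,
                      Operator = 4, Delimiter = 5, Unknown = -1 };

// Classify one already-isolated word symbol (no surrounding blanks).
Category classifyWord(const std::string& w) {
    static const char* reserved[]   = { "if", "while", "for", "do", "goto" };
    static const char* operators[]  = { "+", "-", "*", "/", "<", "<=", ">", ">=",
                                        "==", "!=", "&&", "||", "!" };
    static const char* delimiters[] = { ",", ";", "/*", "*/", "//" };
    if (w.empty()) return Category::Unknown;
    for (const char* r : reserved)   if (w == r) return Category::Reserved;
    for (const char* o : operators)  if (w == o) return Category::Operator;
    for (const char* d : delimiters) if (w == d) return Category::Delimiter;
    unsigned char first = static_cast<unsigned char>(w[0]);
    if (std::isalpha(first)) {
        // Identifier: a letter followed by letters or digits only.
        for (unsigned char c : w) if (!std::isalnum(c)) return Category::Unknown;
        return Category::Identifier;
    }
    if (std::isdigit(first)) {
        // Numeric constant: digits with at most one decimal point.
        int dots = 0;
        for (unsigned char c : w) {
            if (c == '.') { if (++dots > 1) return Category::Unknown; }
            else if (!std::isdigit(c)) return Category::Unknown;
        }
        return Category::Constant;
    }
    return Category::Unknown;
}
```

Reserved words are checked before the identifier rule because they have the same form as identifiers; reversing the order would classify `while` as an ordinary name.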

After lexical analysis, the recognized words should be stored in an intermediate representation that later stages of compilation can reference easily. A word is usually represented as a pair (two-tuple):
(word category, word attribute)
The first element identifies the category the word belongs to and is represented by an integer code. The second element identifies which word symbol within that category is meant, that is, the value of the word symbol.
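
As a hedged sketch of this pair representation: the struct name `Token`, its field names, and the helper `formatToken` are hypothetical and introduced only for illustration. The `(code,value)` output shape follows what the lab answer later in this post prints.

```cpp
#include <sstream>
#include <string>

// A word symbol as a (category, attribute) pair.
// The field names are illustrative assumptions.
struct Token {
    int category;        // integer category code, e.g. 1 = reserved word, 2 = identifier
    std::string value;   // the word symbol's own text (its attribute)
};

// Render a token as "(category,value)".
std::string formatToken(const Token& t) {
    std::ostringstream out;
    out << "(" << t.category << "," << t.value << ")";
    return out.str();
}
```

For example, `formatToken(Token{2, "year"})` produces `(2,year)`.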

Experimental procedure

Based on the functions a lexical analyzer should have, our program must meet the following requirements:

The word formation rules are clearly defined;
The analysis program correctly identifies the word symbols in the source program;
The recognized words are stored in the symbol table in the form <category code, value>, and the symbol table is properly designed and maintained;
Lexical errors in the source program receive simple error handling and a simple error prompt, so that lexical analysis of the entire source program can still run to completion.
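
The last two requirements could be approached along these lines. This is a sketch under assumptions, not the lab's required design: the class `SymbolTable`, its method `intern`, the helper `lexicalError`, and the exact wording of the message are all hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical symbol table: stores each distinct <category code, value> pair once.
class SymbolTable {
public:
    // Insert the pair if it is absent; return its index in the table either way.
    int intern(int category, const std::string& value) {
        for (std::size_t i = 0; i < entries_.size(); ++i)
            if (entries_[i].first == category && entries_[i].second == value)
                return static_cast<int>(i);
        entries_.emplace_back(category, value);
        return static_cast<int>(entries_.size()) - 1;
    }
    std::size_t size() const { return entries_.size(); }
private:
    std::vector<std::pair<int, std::string>> entries_;
};

// Build a simple error prompt. The caller reports it and keeps scanning,
// so one bad character does not stop the analysis of the rest of the input.
std::string lexicalError(int row, char bad) {
    return "Error in row " + std::to_string(row) +
           ": unexpected character '" + std::string(1, bad) + "'";
}
```

A linear search keeps the sketch short; a real table would more likely use a hash map keyed on the pair.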

Programming requirements

Following the prompts, add the code that recognizes identifiers, numeric characters, and other symbols in the editor on the right, then click Evaluate to run the program. The system compares the results automatically.

Test introduction

The platform tests the code you write:

Test input:

    using namespace std;
    int main()
    {
        int year;
        cout << "hello" << endl;
        return 0;
    }
    #

Start your quest and good luck!

Lab answer

#include<stdio.h>
#include<string.h>
#include<iostream>
using namespace std;
char prog[1024], token[20];   // prog buffers the entire input up to '#'; token holds the current word
char ch;
int syn, p, m = 0, n, row, sum = 0;
const char* rwtab[8] = { "if","int","for","while","do","return","break","continue" };
const char* rwtab1[8] = { "main","a","b","c","d","e","f","g" };

void scaner()
{
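	/*
		The scanner has three main parts: identifiers, numbers, and symbols,
		handled by the if, else if, and else branches below.
	*/
	memset(token, 0, sizeof(token));   // clear the whole token buffer
	ch = prog[p++];
	while (ch == ' ' || ch == '\t' || ch == '\r')   // skip blanks, but not '\n' (it drives row counting)
	{
		ch = prog[p++];
	}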
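	if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')) {
		// Identifier or reserved word: a letter followed by letters or digits.
		m = 0;
		while ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') || (ch >= '0' && ch <= '9')) {
			if (m < (int)sizeof(token) - 1) token[m++] = ch;   // truncate overlong words instead of overflowing
			ch = prog[p++];
		}
		p--;               // give back the character that ended the word
		token[m] = '\0';
		syn = 0;
		for (n = 0; n < 8; n++) {
			if (strcmp(token, rwtab1[n]) == 0)
			{
				syn = 2;   // predeclared identifier
				break;
			}
			else if (strcmp(token, rwtab[n]) == 0) {
				syn = 1;   // reserved word
				break;
			}
		}
		if (!syn) syn = 2;   // any other word is an ordinary identifier

	}
	else if (ch >= '0' && ch <= '9') {
		// Unsigned integer constant: accumulate all consecutive digits.
		syn = 3;
		sum = 0;
		while (ch >= '0' && ch <= '9') {
			sum = sum * 10 + (ch - '0');
			ch = prog[p++];
		}
		p--;               // give back the character that ended the number
	}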
	else switch (ch)   // other characters
	{
		case'<':m = 0; token[m++] = ch;
				ch = prog[p++];
				if (ch == '>')
				{
					syn = 4;
					token[m++] = ch;
				}
				else if (ch == '=')
				{
					syn = 4;
					token[m++] = ch;
				}
					else
					{
						syn = 4;
						p--;
					}
					break;
			case'>':m = 0; token[m++] = ch;
					ch = prog[p++];
					if (ch == '=')
					{
						syn = 4;
						token[m++] = ch;
					}
					else
					{
						syn = 4;
						p--;
					}
					break;
			case':':m = 0; token[m++] = ch;
					ch = prog[p++];
					if (ch == '=')
					{
						syn = 4;
						token[m++] = ch;
					}
					else
					{
						syn = 4;
						p--;
					}
					break;
			case'*':syn = 4; token[0] = ch; break;
			case'/':syn = 4; token[0] = ch; break;
			case'+':syn = 4; token[0] = ch; break;
			case'-':syn = 4; token[0] = ch; break;
			case'=':syn = 4; token[0] = ch; break;
			case';':syn = 5; token[0] = ch; break;
			case',':syn = 5; token[0] = ch; break;
			case'(':syn = 5; token[0] = ch; break;
			case')':syn = 5; token[0] = ch; break;
			case'{':syn = 5; token[0] = ch; break;
			case'}':syn = 5; token[0] = ch; break;
			case'#':syn = 0; token[0] = ch; break;
			case'\n':syn = -2; break;
			default: syn = -1; break;
	}

}
	



int main()
{
	p = 0;
	row = 1;
	cout << "Please input string:" << endl;
	// Read the source text up to and including the terminating '#'.
	do
	{
		if (!cin.get(ch)) ch = '#';   // treat end of input as the terminator
		prog[p++] = ch;
	} while (ch != '#' && p < (int)sizeof(prog) - 1);
	if (ch != '#') prog[p++] = '#';   // buffer full: force termination
	p = 0;
	do {
		scaner();
		switch (syn)
		{
			case 0: break;
			case 3: cout << "(" << syn << "," << sum << ")" << endl; break;
			case -1: cout << "Error in row " << row << "!" << endl; break;
			case -2: row++; break;   // newline: advance the row counter
			default: cout << "(" << syn << "," << token << ")" << endl; break;
		}
	} while (syn != 0);
	return 0;
}

Epilogue

"If you are undecided, you can ask the spring breeze, and if the spring breeze does not speak, you will follow your heart" means: if you are hesitant about something, ask the spring breeze what to do, and if it gives no answer, follow your own heart. The sentence comes from "Jianlai", written by the Internet writer "Fenghuo Opera Princes". The original text is: "If you are undecided, you can ask the spring breeze. Follow your heart".


Experiment 2 "Lexical Analysis Program Design and Implementation" (C language version)