Implementing a simple algebraic expression language compiler in Java (3): lexical analysis

In the previous article, we wrote the compiler's predefined system classes: reserved words, system symbols, and error messages. This article covers the compiler's lexical analysis stage.


We first create a class named WordAnalysis and give it a public static method, wordAnalysis, which serves as the external interface for lexical analysis. The method accepts one string parameter, a single statement that has already been split from the source. It returns a list of strings: the same statement after lexical analysis, segmented word by word, with each string in the list being one word. In this project, a word may be a variable name, a number, an operator, the assignment symbol, or a reserved word.


Because Java's char type can be compared directly against ASCII codes, we can write helper methods that determine whether a character is a letter, a digit, a whitespace character, a decimal point, or a system symbol:

	/*
	 * Determine if it is a letter
	 * @param ch The character to be judged
	 * @return true means letter, false means not letter
	 */
	private static boolean isLetter(char ch) {
		if ((ch >= 65 && ch <= 90) || (ch >= 97 && ch <= 122)) {
			return true;
		}
		return false;
	}

	/*
	 * Determine if it is a number
	 * @param ch The character to be judged
	 * @return true means it is a number, false means it is not a number
	 */
	private static boolean isDigit(char ch) {
		if (ch >= 48 && ch <= 57) {
			return true;
		}
		return false;
	}

	/*
	 * Determine whether it is a space or a newline
	 * @param ch The character to be judged
	 * @return true means whitespace, false means not
	 */
	private static boolean isSpace(char ch) {
		if (ch == 32 || ch == 10) {
			return true;
		}
		return false;
	}

	/*
	 * Determine whether it is a decimal point
	 * @param ch The character to be judged
	 * @return true means decimal point, false means not
	 */
	private static boolean isPoint(char ch) {
		if (ch == 46) {
			return true;
		}
		return false;
	}

	/*
	 * Determine whether it is a system symbol
	 * @param ch The character to be judged
	 * @return true means it is a system symbol, false means it is not
	 */
	private static boolean isSymbol(char ch) {
		for (char symbol : Symbol.symbols) {
			if (symbol == ch) {
				return true;
			}
		}
		return false;
	}
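
The numeric bounds in these helpers are ASCII codes. As a small sketch (the class name CharCodes is hypothetical and not part of the project), the same range checks can be written with character literals, which makes the ranges easier to read:

```java
// Hypothetical helper class, for illustration only; not part of the original project.
class CharCodes {
    // Same range check as isLetter: 'A'..'Z' is 65..90 and 'a'..'z' is 97..122.
    static boolean isLetter(char ch) {
        return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
    }

    // Same range check as isDigit: '0' is 48 and '9' is 57.
    static boolean isDigit(char ch) {
        return ch >= '0' && ch <= '9';
    }

    // Same check as isSpace: 32 is the space character and 10 is the line feed '\n'.
    static boolean isSpace(char ch) {
        return ch == ' ' || ch == '\n';
    }

    public static void main(String[] args) {
        System.out.println((int) 'A' + " " + (int) 'Z'); // 65 90
        System.out.println((int) 'a' + " " + (int) 'z'); // 97 122
        System.out.println((int) '0' + " " + (int) '9'); // 48 57
        System.out.println((int) '.');                   // 46
    }
}
```

Both spellings compile to the same comparisons, so the choice is purely about readability.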


Next comes the core of lexical analysis. I chose to scan the input string character by character; each scanned character may be a letter, a digit, a symbol, or whitespace. This raises a question: when we scan a character, how do we know what was scanned before it? If we scan a letter and the previous character was also a letter, the two characters belong to the same variable name or reserved word. But if the previous character was a digit, the outcome depends on how the word began: if it began with a letter, the characters still form a variable name, whereas if it began with that digit, the compiler should report an error, because the project stipulates that variable names can only start with a letter.

Based on this analysis, I use two static StringBuffer variables, variableRegister and digitRegister, and a static boolean variable, anySpace, to hold the scanner's temporary state. We will call them the variable register, the digital register, and the blank register. Let us take the statement re= nu*2 as an example.



Initially, the variable register and the digital register are empty, and the blank register is false.

1. Scan the first character, r, which is a letter. Both the variable register and the digital register are empty, so r is stored in the variable register.

2. Scan the character e, which is a letter. The variable register is not empty and the digital register is empty, so e is appended to the variable register.

3. Scan the character =, which is a system symbol. The digital register is empty and the variable register holds re, so re is added to the result queue and the variable register is cleared. The symbol = is then added to the queue as a word of its own.

4. Scan a blank character. Both registers are empty at this point, so no lexical error can arise; simply set the blank register to true.

5. Scan the letter n. Both registers are empty, so n is stored in the variable register and the blank register is reset to false.

6. Scan the character u, which is a letter. The variable register is not empty and the digital register is empty, so u is appended to the variable register.

7. Scan the character *, which is a system symbol. The digital register is empty and the variable register holds nu, so nu is added to the result queue and the variable register is cleared. The symbol * is then added to the queue as a word of its own.

8. Scan the character 2, which is a digit. Both registers are empty, so 2 is stored in the digital register.

9. The end of the string is reached. The variable register is empty and the digital register holds 2, so 2 is added to the result queue and the digital register is cleared.

10. Return the result queue, which now holds re, =, nu, *, 2.
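
The steps above can be sketched as a small standalone program. This is a simplified illustration under stated assumptions, not the project's implementation: the class name RegisterTrace and the symbol set "=+-*/()" are hypothetical (the real code reads its symbols from Symbol.symbols), decimal points are not handled, and errors are reported with IllegalArgumentException instead of MyException.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the walk-through; names here are hypothetical.
class RegisterTrace {
    // Assumed symbol set; the real project takes it from Symbol.symbols.
    static final String SYMBOLS = "=+-*/()";

    static List<String> split(String statement) {
        StringBuilder variableRegister = new StringBuilder();
        StringBuilder digitRegister = new StringBuilder();
        boolean anySpace = false; // the "blank register"
        List<String> result = new ArrayList<>();

        for (char ch : statement.toCharArray()) {
            boolean letter = (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
            boolean digit = ch >= '0' && ch <= '9';
            boolean pending = variableRegister.length() > 0 || digitRegister.length() > 0;

            if (ch == ' ' || ch == '\n') {
                anySpace = true;                      // step 4
            } else if (letter || digit) {
                if (pending && anySpace) {
                    throw new IllegalArgumentException("whitespace inside a word");
                }
                if (letter && digitRegister.length() > 0) {
                    throw new IllegalArgumentException("letter after number");
                }
                if (letter || variableRegister.length() > 0) {
                    variableRegister.append(ch);      // steps 1, 2, 5, 6
                } else {
                    digitRegister.append(ch);         // step 8
                }
                anySpace = false;
            } else if (SYMBOLS.indexOf(ch) >= 0) {
                flush(variableRegister, result);      // steps 3 and 7
                flush(digitRegister, result);
                result.add(String.valueOf(ch));
                anySpace = false;
            } else {
                throw new IllegalArgumentException("unknown character: " + ch);
            }
        }
        flush(variableRegister, result);
        flush(digitRegister, result);                 // step 9
        return result;                                // step 10
    }

    // Moves a non-empty register into the result list and clears it.
    static void flush(StringBuilder register, List<String> result) {
        if (register.length() > 0) {
            result.add(register.toString());
            register.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(split("re= nu*2")); // prints [re, =, nu, *, 2]
    }
}
```

Unlike the article's class, this sketch keeps the registers as local variables, so no state survives from one call to the next.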


The above walks through the lexical analysis of the statement re= nu*2 and how it is segmented into words. In practice, various lexical errors can occur, such as invalid variable names, numbers ending with a decimal point, and misplaced whitespace. The wordAnalysis method handles all of these cases; readers may want to study it closely.
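
To make the word rules behind these errors concrete, here is a minimal sketch that checks a finished word against them. It uses regular expressions, which is a different technique from the register-based scanning in this article, and the class name WordRules and its method are hypothetical. It also does not distinguish reserved words from variable names.

```java
import java.util.regex.Pattern;

// Hypothetical checker, for illustration only; the article's code detects these
// errors while scanning rather than with regular expressions.
class WordRules {
    // A variable name starts with a letter, followed by letters or digits.
    static final Pattern VARIABLE = Pattern.compile("[A-Za-z][A-Za-z0-9]*");
    // A number is one or more digits, optionally followed by a point and more digits.
    static final Pattern NUMBER = Pattern.compile("[0-9]+(\\.[0-9]+)?");

    static String classify(String word) {
        if (VARIABLE.matcher(word).matches()) {
            return "variable";
        }
        if (NUMBER.matcher(word).matches()) {
            return "number";
        }
        return "error";
    }

    public static void main(String[] args) {
        System.out.println(classify("nu"));    // variable
        System.out.println(classify("3.14"));  // number
        System.out.println(classify("2."));    // error: number ends with a decimal point
        System.out.println(classify("2x"));    // error: name starts with a digit
        System.out.println(classify("1.5.2")); // error: two decimal points
    }
}
```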


The complete code of the WordAnalysis class is as follows:

package com.liu.analysis;


import java.util.ArrayList;
import java.util.List;

import com.liu.system.Error;
import com.liu.system.MyException;
import com.liu.system.Symbol;

/*
 * Lexical analysis class
 * Created on 2017.3.8
 * @author lyq
 * */
public class WordAnalysis {
	
	/* Variable name registration field, you can temporarily register a variable name */
	private static StringBuffer variableRegister = new StringBuffer();

	/* Number register field, you can temporarily register a number */
	private static StringBuffer digitRegister = new StringBuffer();

	/* Whether to store whitespace characters */
	private static boolean anySpace = false;

	/*
	 * Determine if it is a letter
	 * @param ch The character to be judged
	 * @return true means letter, false means not letter
	 */
	private static boolean isLetter(char ch) {
		if ((ch >= 65 && ch <= 90) || (ch >= 97 && ch <= 122)) {
			return true;
		}
		return false;
	}

	/*
	 * Determine if it is a number
	 * @param ch The character to be judged
	 * @return true means it is a number, false means it is not a number
	 */
	private static boolean isDigit(char ch) {
		if (ch >= 48 && ch <= 57) {
			return true;
		}
		return false;
	}

	/*
	 * Determine whether it is a space or a newline
	 * @param ch The character to be judged
	 * @return true means whitespace, false means not
	 */
	private static boolean isSpace(char ch) {
		if (ch == 32 || ch == 10) {
			return true;
		}
		return false;
	}

	/*
	 * Determine whether it is a decimal point
	 * @param ch The character to be judged
	 * @return true means decimal point, false means not
	 */
	private static boolean isPoint(char ch) {
		if (ch == 46) {
			return true;
		}
		return false;
	}

	/*
	 * Determine whether it is a system symbol
	 * @param ch The character to be judged
	 * @return true means it is a system symbol, false means it is not
	 */
	private static boolean isSymbol(char ch) {
		for (char symbol : Symbol.symbols) {
			if (symbol == ch) {
				return true;
			}
		}
		return false;
	}

	
	/*
	 * Perform lexical analysis on an input string
	 * @param str String to analyze
	 * @return the list of words produced by lexical analysis
	 * @exception MyException thrown when the statement contains a lexical error
	 */
	public static List<String> wordAnalysis(String str) throws MyException {
		// discard any blank flag left over from a previous call
		anySpace = false;
		// used to store the analysis results
		List<String> result = new ArrayList<String>();
		
		for (int i = 0; i < str.length(); i++) {
			char ch = str.charAt(i);
			// is a letter
			if (isLetter(ch)) {
				if (!variableRegister.toString().equals("")) {
					// letter-whitespace-letter
					if (anySpace) {
						variableRegister.setLength(0);
						digitRegister.setLength(0);
						throw new MyException(Error.LETTER_SPACE_LETTER);
					}
					// letter-letter
					else {
						variableRegister.append(ch);
						continue;
					}
				}
				if (!digitRegister.toString().equals("")) {
					// number-space-letter
					if (anySpace) {
						variableRegister.setLength(0);
						digitRegister.setLength(0);
						throw new MyException(Error.NUMBER_SPACE_LETTER);
					}
					// number-letter
					else {
						variableRegister.setLength(0);
						digitRegister.setLength(0);
						throw new MyException(Error.LETTER_AFTER_NUMBER);
					}
				}
				variableRegister.append(ch);
				anySpace = false;
				continue;
			}
			// is a number
			if (isDigit(ch)) {
				if (!variableRegister.toString().equals("")) {
					// letter-whitespace-number
					if (anySpace) {
						variableRegister.setLength(0);
						digitRegister.setLength(0);
						throw new MyException(Error.LETTER_SPACE_NUMBER);
					}
					// letter-number
					else {
						variableRegister.append(ch);
						continue;
					}
				}
				if (!digitRegister.toString().equals("")) {
					// number-whitespace-number
					if (anySpace) {
						variableRegister.setLength(0);
						digitRegister.setLength(0);
						throw new MyException(Error.NUMBER_SPACE_NUMBER);
					}
					// number-number
					else {
						digitRegister.append(ch);
						continue;
					}
				}
				digitRegister.append(ch);
				anySpace = false;
				continue;
			}
			// is a whitespace character: record that whitespace was seen, then continue the loop
			if (isSpace(ch)) {
				anySpace = true;
				continue;
			}
			// is the decimal point
			if(isPoint(ch)){
				if(anySpace){
					variableRegister.setLength(0);
					digitRegister.setLength(0);
					throw new MyException(Error.POINT_AFTER_SPACE);
				}
				else{
					if(!variableRegister.toString().equals("")){
						variableRegister.setLength(0);
						digitRegister.setLength(0);
						throw new MyException(Error.POINT_AFTER_LETTER);
					}
					if(!digitRegister.toString().equals("")){
						if(digitRegister.toString().contains(String.valueOf(Symbol.point))){
							variableRegister.setLength(0);
							digitRegister.setLength(0);
							throw new MyException(Error.POINT_AFTER_POINT);
						}
						digitRegister.append(ch);
						continue;
					}
				}
			}
			// is the system symbol
			if(isSymbol(ch)){
				anySpace = false;
				// a variable name is pending in the variable register
				if(!variableRegister.toString().equals("")){
					result.add(variableRegister.toString());
					// clear the variable register
					variableRegister.setLength(0);
				}
				// a number is pending in the digital register
				if(!digitRegister.toString().equals("")){
					if(digitRegister.toString().endsWith(String.valueOf(Symbol.point))){
						variableRegister.setLength(0);
						digitRegister.setLength(0);
						throw new MyException(Error.NUMBER_END_POINT);
					}
					result.add(digitRegister.toString());
					// clear the digital register
					digitRegister.setLength(0);
				}
				result.add(String.valueOf(ch));
				continue;
			}
			variableRegister.setLength(0);
			digitRegister.setLength(0);
			throw new MyException(Error.CONTAIN_UNKNOWN_CAHR);
		}
		// end of input: a variable name is pending in the variable register
		if(!variableRegister.toString().equals("")){
			result.add(variableRegister.toString());
			// clear the variable register
			variableRegister.setLength(0);
		}
		// end of input: a number is pending in the digital register
		if(!digitRegister.toString().equals("")){
			if(digitRegister.toString().endsWith(String.valueOf(Symbol.point))){
				variableRegister.setLength(0);
				digitRegister.setLength(0);
				throw new MyException(Error.NUMBER_END_POINT);		
			}
			result.add(digitRegister.toString());
			digitRegister.setLength(0);
		}
		return result;
	}
}

That completes the lexical analysis stage. The next article covers how expressions are evaluated.


