Write your own compiler: implementing the command-line module

In the previous chapters we worked through the algorithms behind lexical analysis: parsing regular-expression strings, building NFA states, converting the NFA to a DFA, and finally minimizing the state machine. Next we turn to the engineering side of the lexical analysis module, that is, we combine all of these algorithms into a usable program. The following chapters therefore focus on engineering implementation rather than on compiler-theory algorithms.

Why does a column that emphasizes compiler principles and algorithms spend so much effort on engineering implementation? There is a saying in English, "you don't know it if you can't build it": if you can't build something, you haven't mastered it. This is a pain point of traditional education. In computer science courses such as compiler principles or operating systems, you learn a pile of terminology and algorithm descriptions, but does finishing the course and passing the exam mean you have mastered the material? If you study operating systems but can't build a system that runs, or study compiler principles but can't build a compiler that compiles code, then you have no real grasp of what you learned; your understanding is vague and partial.

To truly master the material, we have to build a concrete thing that works. In the process we will find that many algorithms and concepts we thought we understood have not really been mastered at all. Starting with this section, we add more complex functionality to GoLex. When the GoLex tool is complete, it will work as follows.

When GoLex runs, it takes two input files, input.lex and lex.par. We are already familiar with input.lex; lex.par is a C-language template file, and we will spend a good deal of effort analyzing and implementing its contents in later chapters. GoLex reads these two files and generates two files, lex.yy.c and lex.yy.h, which contain the code of a lexical analyzer for the given language. Suppose we want a program that recognizes the tokens of SQL. We put the regular expressions for SQL keywords, variable names, and so on into input.lex, run GoLex to generate the two C source files lex.yy.c and lex.yy.h, and compile them with gcc. The resulting executable, a.out, performs lexical analysis on SQL source files. In other words, GoLex is a program that generates the source code of another executable program, much like taking a second derivative in calculus.
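The idea of a program whose output is the source code of another program can be sketched in a few lines of Go. This is only an illustration: renderKeywordTable and the C array name keywords are hypothetical and are not part of GoLex or of the files it generates.

```go
package main

import (
	"fmt"
	"strings"
)

// renderKeywordTable is a hypothetical helper (not part of GoLex). It turns a
// list of keywords into the text of a C array definition.
func renderKeywordTable(keywords []string) string {
	var sb strings.Builder
	sb.WriteString("static const char *keywords[] = {\n")
	for _, kw := range keywords {
		// %q adds the double quotes; for plain ASCII words the result is
		// also a valid C string literal.
		fmt.Fprintf(&sb, "\t%q,\n", kw)
	}
	sb.WriteString("};\n")
	return sb.String()
}

func main() {
	// The Go program only prints C source text. Compiling that text with
	// gcc is a separate, later step.
	fmt.Print(renderKeywordTable([]string{"select", "from", "where"}))
}
```

GoLex follows the same pattern on a larger scale: the Go code never runs the lexer itself, it only writes the C text that will later be compiled into one.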

Enough talk; let's get to the code. First create a folder named cmd in the project directory, then create a file named cmd.go in it. The implementation is as follows:

package command_line

import (
	"fmt"
	"time"
)

type CommandLine struct {
}

func NewCommandLine() *CommandLine {
	return &CommandLine{}
}

func (c *CommandLine) Signon() {
	// Get the current time
	date := time.Now()
	// Put your own name here
	name := "yichen"
	fmt.Printf("GoLex 1.0 [%s] . (c) %s, All rights reserved\n", date.Format("01-02-2006"), name)
}

Running the code above prints a line of "copyright" information. It makes us feel as if we have built something impressive and gives us a sense of achievement. Next we provide a function called PrintHeader, which writes a C-language comment describing the uncompressed DFA. First, we move the code that was originally in the main function into the constructor of the CommandLine object. The relevant code is as follows:

package command_line

import (
	"fmt"
	"nfa"
	"time"
)

type CommandLine struct {
	lexerReader  *nfa.LexReader
	parser       *nfa.RegParser
	nfaConverter *nfa.NfaDfaConverter
}

func NewCommandLine() *CommandLine {
	// Errors returned by the constructors are ignored here for brevity
	lexReader, _ := nfa.NewLexReader("input.lex", "output.py")
	lexReader.Head()
	parser, _ := nfa.NewRegParser(lexReader)
	start := parser.Parse()
	nfaConverter := nfa.NewNfaDfaConverter()
	nfaConverter.MakeDTran(start)
	nfaConverter.PrintDfaTransition()

	return &CommandLine{
		lexerReader:  lexReader,
		parser:       parser,
		nfaConverter: nfaConverter,
	}
}

func (c *CommandLine) PrintHeader() {
	// Write the C-language comment that describes the uncompressed DFA states
	c.nfaConverter.PrintUnCompressedDFA()
	// Write the C-language jump table
	c.nfaConverter.PrintDriver()
}

func (c *CommandLine) Signon() {
	// Get the current time
	date := time.Now()
	// Put your own name here
	name := "yichen"
	fmt.Printf("GoLex 1.0 [%s] . (c) %s, All rights reserved\n", date.Format("01-02-2006"), name)
}


Then we open the file nfa_to_dfa and add the two functions called above as methods of NfaDfaConverter. The implementation is as follows:

func (n *NfaDfaConverter) PrintUnCompressedDFA() {
	// Wrap the description in #ifdef __NEVER__ ... #endif so that it pairs
	// with the #endif written at the end of this function
	fmt.Fprint(n.fp, "#ifdef __NEVER__\n")
	fmt.Fprint(n.fp, "/*------------------------------------------------\n")
	fmt.Fprint(n.fp, "DFA (start state is 0) is :\n *\n")
	nrows := n.nstates
	charsPrinted := 0
	for i := 0; i < nrows; i++ {
		dstate := n.dstates[i]
		if !dstate.isAccepted {
			fmt.Fprintf(n.fp, "* State %d [nonaccepting]", dstate.state)
		} else {
			// Print the line number and the action text of the accepted rule
			fmt.Fprintf(n.fp, "* State %d [accepting, line %d, <%s>]\n", i, dstate.LineNo, dstate.acceptString)
			if dstate.anchor != NONE {
				start := ""
				end := ""
				if (dstate.anchor & START) != NONE {
					start = "start"
				}
				if (dstate.anchor & END) != NONE {
					end = "end"
				}
				fmt.Fprintf(n.fp, " Anchor: %s %s", start, end)
			}
		}
		lastTransition := -1
		for j := 0; j < MAX_CHARS; j++ {
			if n.dtrans[i][j] != F {
				// Start a new "goto" line whenever the target state changes
				if n.dtrans[i][j] != lastTransition {
					fmt.Fprintf(n.fp, "\n * goto %d on ", n.dtrans[i][j])
					charsPrinted = 0
				}
				fmt.Fprintf(n.fp, "%s", n.BinToAscii(j))
				charsPrinted += len(n.BinToAscii(j))
				if charsPrinted > 56 {
					// Wrap long lines; the continuation is indented by 16 spaces
					fmt.Fprintf(n.fp, "\n *                ")
					charsPrinted = 0
				}
				lastTransition = n.dtrans[i][j]
			}
		}
		fmt.Fprintf(n.fp, "\n")
	}
	fmt.Fprintf(n.fp, "*/ \n\n")
	fmt.Fprintf(n.fp, "#endif\n")
}
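
The inner loop of PrintUnCompressedDFA groups characters by target state: it starts a new "goto" line each time the target changes and otherwise keeps appending characters. The sketch below isolates that grouping for a single table row. It makes several simplifying assumptions: the constant failed stands in for the F marker with an assumed value of -1, characters are written as raw bytes instead of going through BinToAscii, and the wrap after 56 characters is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// failed stands in for the F marker (no transition). The value -1 is an
// assumption made for this sketch.
const failed = -1

// describeRow renders one row of a transition table. A new "goto" line
// starts whenever the target state differs from the previous one.
func describeRow(row []int) string {
	var sb strings.Builder
	last := failed
	for ch, target := range row {
		if target == failed {
			continue // no transition on this character
		}
		if target != last {
			fmt.Fprintf(&sb, "\n * goto %d on ", target)
		}
		sb.WriteByte(byte(ch)) // assumes ch is a printable ASCII code
		last = target
	}
	return sb.String()
}

func main() {
	// Build the row for state 0 of the float-number DFA:
	// '.' leads to state 1, every digit leads to state 2.
	row := make([]int, 128)
	for i := range row {
		row[i] = failed
	}
	row['.'] = 1
	for c := '0'; c <= '9'; c++ {
		row[c] = 2
	}
	fmt.Println(describeRow(row))
}
```

Running it prints one "goto 1 on ." line and one "goto 2 on 0123456789" line, the same shape as the entry for state 0 in the generated comment.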

func (n *NfaDfaConverter) PrintDriver() {
	text := "Emit the DFA-based jump table. First we generate a Yyaccept array. If Yyaccept[i] is 0," +
		"\n\tnode i is not an accepting state. If it is nonzero, node i is accepting and the value means:" +
		"\n\t1: the regular expression is anchored at the start, i.e. it begins with ^;" +
		"\n\t2: it is anchored at the end, i.e. it ends with $;" +
		"\n\t3: it is anchored at both the start and the end;" +
		"\n\t4: it is not anchored at either end"
	comments := make([]string, 0)
	comments = append(comments, text)
	n.comment(comments)
	// YYPRIVATE and YY_TTYPE are macros in the C code; their definitions are provided in later code.
	// YYPRIVATE corresponds to static, YY_TTYPE to unsigned char
	fmt.Fprintf(n.fp, "YYPRIVATE YY_TTYPE Yyaccept[]=\n")
	fmt.Fprintf(n.fp, "{\n")
	for i := 0; i < n.nstates; i++ {
		if !n.dstates[i].isAccepted {
			// If node i is not an accepting state, Yyaccept[i] = 0
			fmt.Fprintf(n.fp, "\t0  ")
		} else {
			// 4 means accepting without any anchor
			anchor := 4
			if n.dstates[i].anchor != NONE {
				anchor = int(n.dstates[i].anchor)
			}
			fmt.Fprintf(n.fp, "\t%-3d", anchor)
		}

		// Every entry except the last one is followed by a comma
		if i == n.nstates-1 {
			fmt.Fprint(n.fp, "   ")
		} else {
			fmt.Fprint(n.fp, ",  ")
		}
		fmt.Fprintf(n.fp, "/*State %-3d*/\n", i)
	}
	fmt.Fprintf(n.fp, "};\n\n")
	// The remaining part is easier to implement once the function DoFile exists
	// TODO
}
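
The mapping from a DFA state to its Yyaccept entry can be isolated into a small function. The sketch below assumes the anchor flags have the values NONE = 0, START = 1 and END = 2, which agrees with the meanings 1, 2 and 3 described in the comment that PrintDriver emits. The actual constants are defined elsewhere in the nfa package and are not shown in this section; acceptValue is a hypothetical name.

```go
package main

import "fmt"

// Anchor mirrors the anchor flags used by the DFA states. The concrete
// values are assumptions for this sketch.
type Anchor int

const (
	NONE  Anchor = 0
	START Anchor = 1 // the pattern begins with ^
	END   Anchor = 2 // the pattern ends with $
)

// acceptValue returns the number PrintDriver would write into Yyaccept
// for a state with the given properties.
func acceptValue(isAccepted bool, anchor Anchor) int {
	if !isAccepted {
		return 0 // not an accepting state
	}
	if anchor == NONE {
		return 4 // accepting, no anchor
	}
	return int(anchor) // 1 = start, 2 = end, 3 = both
}

func main() {
	fmt.Println(acceptValue(false, NONE))     // 0
	fmt.Println(acceptValue(true, NONE))      // 4
	fmt.Println(acceptValue(true, START))     // 1
	fmt.Println(acceptValue(true, END))       // 2
	fmt.Println(acceptValue(true, START|END)) // 3
}
```

Using 4 rather than 0 for "accepting without an anchor" keeps 0 free to mean "not accepting", so the generated C code can test Yyaccept[i] for zero directly.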

Note that we have only implemented part of PrintDriver. The rest depends on the C-language code template that we implement in the following chapters; only then can the TODO above be completed. Even so, with the code above in place we can already generate a lex.yy.c file. Enter the following code in main.go:

package main

import (
	"command_line"
)

func main() {
	cmd := command_line.NewCommandLine()
	cmd.PrintHeader()
}

After completing the above code and executing it, we will get a lex.yy.c file with the following content:

#ifdef __NEVER__
/*------------------------------------------------
DFA (start state is 0) is :
 *
* State 0 [nonaccepting]
 * goto 1 on .
 * goto 2 on 0123456789
* State 1 [nonaccepting]
 * goto 3 on 0123456789
* State 2 [nonaccepting]
 * goto 4 on .
 * goto 5 on 0123456789
* State 3 [accepting, line 6, <  {printf("%s is a float number", yytext); return FCON;}>]

* State 4 [accepting, line 6, <  {printf("%s is a float number", yytext); return FCON;}>]

 * goto 6 on 0123456789
* State 5 [nonaccepting]
 * goto 1 on .
 * goto 5 on 0123456789
* State 6 [accepting, line 6, <  {printf("%s is a float number", yytext); return FCON;}>]

 * goto 7 on 0123456789
* State 7 [accepting, line 6, <  {printf("%s is a float number", yytext); return FCON;}>]

 * goto 7 on 0123456789
*/ 

#endif

/*--------------------------------------
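 * Emit the DFA-based jump table. First we generate a Yyaccept array. If Yyaccept[i] is 0,
	node i is not an accepting state. If it is nonzero, node i is accepting and the value means:
	1: the regular expression is anchored at the start, i.e. it begins with ^;
	2: it is anchored at the end, i.e. it ends with $;
	3: it is anchored at both the start and the end;
	4: it is not anchored at either end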
 */

YYPRIVATE YY_TTYPE Yyaccept[]=
{
	0  ,  /*State 0  */
	0  ,  /*State 1  */
	0  ,  /*State 2  */
	4  ,  /*State 3  */
	4  ,  /*State 4  */
	0  ,  /*State 5  */
	4  ,  /*State 6  */
	4     /*State 7  */
};
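
To see that the commented transitions and the Yyaccept array together describe a working recognizer, the listing above can be transcribed by hand into a small Go simulation. This is only a sketch for checking the table: the function names next and accepts are hypothetical, and the code tests whether a whole string ends in an accepting state, which is simpler than the longest-match scanning a real lexer performs.

```go
package main

import "fmt"

// next encodes the "goto" lines of the generated comment.
// It returns -1 when the state has no transition on c.
func next(state int, c byte) int {
	digit := c >= '0' && c <= '9'
	switch state {
	case 0:
		if c == '.' {
			return 1
		}
		if digit {
			return 2
		}
	case 1:
		if digit {
			return 3
		}
	case 2:
		if c == '.' {
			return 4
		}
		if digit {
			return 5
		}
	case 4:
		if digit {
			return 6
		}
	case 5:
		if c == '.' {
			return 1
		}
		if digit {
			return 5
		}
	case 6, 7:
		if digit {
			return 7
		}
	}
	return -1 // state 3 has no outgoing transitions at all
}

// yyaccept mirrors the generated Yyaccept array: nonzero means accepting.
var yyaccept = []int{0, 0, 0, 4, 4, 0, 4, 4}

// accepts reports whether the whole input ends in an accepting state.
func accepts(input string) bool {
	state := 0
	for i := 0; i < len(input); i++ {
		state = next(state, input[i])
		if state < 0 {
			return false
		}
	}
	return yyaccept[state] != 0
}

func main() {
	for _, s := range []string{"3.14", ".5", "1.", "12", "abc"} {
		fmt.Printf("%q accepted: %v\n", s, accepts(s))
	}
}
```

With this table, "3.14", ".5" and "1." end in accepting states, while "12" stops in the non-accepting state 5 and "abc" has no transition out of state 0.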

As you can see, in the generated C file we first write the contents of the jump table as a comment and then emit an array of accepting states. If node i is an accepting state, Yyaccept[i] is nonzero; otherwise it is 0. In the next section we will study the C-language template code in depth and then complete the TODO part of this section's code. For a more detailed debugging demonstration video, search for "coding Disney" on Bilibili.

Origin blog.csdn.net/tyler_download/article/details/133347873