Writing an interpreter by hand, with Ice as the example (1): clarifying the goal

Code repository

# HelloWorld.ice
print("hello, world")

Preface (rambling)

It has been almost half a year since I started studying compiler principles, but I can never keep reading the Dragon Book for long (I work in fits and starts, and every time I read for less than half an hour I get tired and put the book down again (laughs)). So far I have barely finished the first six chapters, and I have done nothing else with them in that half year. In fact, I have been at university for almost two years and still have not accomplished anything or really learned the material, which I am ashamed of.

Summer vacation is coming and the semester is almost over. With my sophomore year drawing to a close, I want to try writing the toy I had been meaning to write. This series is a record of that process, and it can be regarded as a small exercise in, and summary of, the first six chapters of the Dragon Book (once I have read the later chapters I may not be able to write anything at all).

Before writing this interpreter, I actually tried to piece together a compiler with lex + yacc + LLVM by following a tutorial, but LLVM may be too difficult for me (smile). The broken code is on the lyli branch.

This series of tutorials (if it counts as a tutorial) mainly implements the front end, and it has plenty of bugs. Parsing has long been studied thoroughly, so this tutorial has basically no value of its own. Perhaps its only advantage is that I and the readers willing to read these articles are probably at the same beginner (or not-yet-beginner) level.

This tutorial is divided into four chapters:

  1. Clarify the goal & design the language
  2. Implement a lexical analyzer
  3. Implement a parser
  4. Implement basic data types

I hope that after reading this series, readers can complete an interpreted language that supports the following features:

  1. Integer, float, and string types
  2. Common binary operators
  3. Variable definitions
  4. Function definition and call
  5. Basic control flow statements
  6. Lambda expressions

Intended readers

  1. Those interested in compiler principles who have not yet formally started learning them
  2. Those who want to finish a toy interpreter but don't know where to start

Main text

Before we actually start writing anything by hand, we must first decide what kind of thing we are going to build (isn't that obvious?). If you don't design things at the beginning, then adding new operations (new features) later will inevitably lead to all sorts of annoying problems during refactoring. (Even if you follow this tutorial, refactoring will still bring all kinds of problems, because that is simply what refactoring does, so don't mind it.) Still, designing before you officially write code is always something that should be done.

What to interpret

From the interpreter's point of view, what we interpret is strings. After verifying that a string conforms to the rules, we interpret it and then execute it correctly according to its semantics. These rules are the grammar of our Ice.


From the lexical analyzer's point of view, what we interpret is also a string. We only need the input string to satisfy the rules we specify for lexemes, and then we return tokens to the parser based on that input string.
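To make the lexer's job concrete, here is a minimal sketch in Python. It is not Ice's actual lexer: the token type names, the regular expressions, and the tokenize() function are all assumptions made for illustration, chosen so that the Ice snippets in this article can be split into tokens.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    type: str   # lexeme type, e.g. "INT" or "IDENT" (names are assumptions)
    value: str  # the lexeme text itself


# Hypothetical lexeme rules. Order matters: FLOAT is tried before INT,
# and "<=" before "<", so the longer lexeme wins.
TOKEN_SPEC = [
    ("FLOAT",   r"\d+\.\d+"),
    ("INT",     r"\d+"),
    ("STRING",  r'"[^"\n]*"'),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"<=|>=|[-+*/=<>(){},:@]"),
    ("SKIP",    r"[ \t]+"),
    ("COMMENT", r"#.*"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(line: str) -> List[Token]:
    """Split one input line into tokens, dropping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(line):
        match = MASTER.match(line, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {line[pos]!r} at column {pos}")
        if match.lastgroup not in ("SKIP", "COMMENT"):
            tokens.append(Token(match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, tokenize('@a: 1') would yield the tokens @, a, : and 1, which is the kind of sequence the parser then consumes.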


From the syntax analyzer's point of view, what we interpret is the sequence of tokens. We use predictive parsing to select the correct production based on the token sequence, and return an abstract syntax tree (AST).
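As a rough illustration of predictive parsing, the Python sketch below parses arithmetic expressions by looking at the next token to decide which production to apply. The grammar (expr, term, factor), the Parser class, and the tuple-shaped AST are assumptions for this example only; Ice's real grammar and Node classes are not shown here.

```python
from typing import List, Optional


class Parser:
    """Predictive (recursive-descent) parser for a tiny assumed grammar:

        expr   -> term   (("+" | "-") term)*
        term   -> factor (("*" | "/") factor)*
        factor -> INT | "(" expr ")"
    """

    def __init__(self, tokens: List[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        # The lookahead token that drives every production choice.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> Optional[str]:
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())   # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(tokens: List[str]):
    """Parse a whole token list and return the AST as nested tuples."""
    parser = Parser(tokens)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

Given the tokens of (100 + 20) * 6 / 3, parse() returns ("/", ("*", ("+", 100, 20), 6), 3): the tree shape already encodes precedence and associativity, so a later evaluation step only has to walk it.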

Input form

Only interactive input (i.e. line-by-line input) is considered.
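A line-by-line input loop could look roughly like the following Python sketch. The function name run_interactive, the "exit" command, and the error message format are hypothetical; the evaluate argument stands in for whatever the finished interpreter does with one line of input.

```python
def run_interactive(evaluate, lines, write=print):
    """Feed input to `evaluate` one line at a time and report each result."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                  # ignore empty lines
        if line == "exit":            # hypothetical quit command
            break
        try:
            write(str(evaluate(line)))
        except Exception as err:      # keep the session alive after an error
            write(f"error: {err}")
```

Passing sys.stdin as lines turns this into an interactive session, while passing a list of strings makes the same loop easy to test.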

How to interpret

This project mainly consists of the following classes:

  • Token: an instantiated Token object holds a lexeme's type and the lexeme's value
  • LexicalAnalyzer: scans the input string and returns a sequence of tokens
  • Node: instances of Node and its derived classes hold the information a node in the AST should have
  • SyntaxAnalyzer: performs predictive parsing on the token sequence and returns the AST (essentially an instance of Node or one of its derived classes)
  • IceObject: holds its own type information and implements the related operations
  • Env: the symbol table, which stores the objects that exist at Ice runtime
  • Interpreter: exposes only the run() interface for the main function to call, hiding the internal logic
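To show how Node and Env might cooperate, here is a small Python sketch. Only the class names Node and Env come from the list above. The method names (put, get, evaluate), the derived node classes, and the chain of enclosing scopes are assumptions, and plain Python numbers stand in for IceObject.

```python
import operator
from typing import Any, Dict, Optional


class Env:
    """Symbol table; a lookup falls back to the enclosing scope (an assumption)."""

    def __init__(self, parent: Optional["Env"] = None):
        self.table: Dict[str, Any] = {}
        self.parent = parent

    def put(self, name: str, obj: Any) -> None:
        self.table[name] = obj

    def get(self, name: str) -> Any:
        env: Optional[Env] = self
        while env is not None:
            if name in env.table:
                return env.table[name]
            env = env.parent
        raise NameError(f"undefined name {name!r}")


class Node:
    """Base class of all AST nodes."""

    def evaluate(self, env: Env) -> Any:
        raise NotImplementedError


class NumberNode(Node):
    def __init__(self, value):
        self.value = value

    def evaluate(self, env):
        return self.value


class NameNode(Node):
    def __init__(self, name):
        self.name = name

    def evaluate(self, env):
        return env.get(self.name)


class DefineNode(Node):
    """Models a definition such as `@a: 1`."""

    def __init__(self, name, expr):
        self.name, self.expr = name, expr

    def evaluate(self, env):
        value = self.expr.evaluate(env)
        env.put(self.name, value)
        return value


class BinaryNode(Node):
    OPS = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.truediv}

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def evaluate(self, env):
        return self.OPS[self.op](self.left.evaluate(env), self.right.evaluate(env))
```

With these pieces, evaluating DefineNode("a", NumberNode(1)) and then BinaryNode("+", NameNode("a"), NumberNode(2)) in the same Env mirrors typing `@a: 1` followed by `a + 2`.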

Well, that is basically the structure. Now let's start thinking about what kind of syntax Ice has.

Ice syntax

Integer, float, and string types

1
1.0
"hello, world"

Common binary operators

1 + 1
(100 + 20) * 6 / 3
10 = 10
5 <= 3

Variable definitions

@a: 1

Function definition and call

@add(a, b): a + b

@mul(a, b)
{
    return a * b
}

mul(mul(2, 3), add(2, 3))

Basic control flow statements

@fib(n)
{
    if (n = 0) + (n = 1)
    {
        return 1
    }
    else
    {
        return fib(n-1) + fib(n-2)
    }
}

fib(10) # 89

@a: 3
while a
{
    print(a)
    @a: a - 1
}

@a: 0
do {
    @a: a + 1
    if a = 3
    {
        break
    }
    print(a)
} while a < 5

for 1 to 5
{
    @a: a + 1
    if a = 3
    {
        continue
    }
    print(a)
}

Lambda expressions

@add(a, b): a + b
@mul: @(a, b){
    return a * b
}
@(a, b){ return a / b }(9, 3)

@quadraticSum: @(a, b){
    @sqrt: @(n){ return n * n } # despite its name, sqrt squares n here
    return @(a, b){ return a + b }(sqrt(a), sqrt(b))
}

That's basically it. If you are going to keep reading, the next chapter starts writing Ice's lexer by hand.

