Looking at the composition of Go code from the perspective of lexical analysis

In my earlier Go series notes I finished setting up the development environment. The original plan was to move on to the grammar, but after that there was no progress, mainly because work kept me busy and distracted, so I set it aside.

Recently I decided to pick it up again.

The first step, of course, is to understand Go's basic grammar. I originally planned to write about basic coding knowledge, but just chatting about keywords, identifiers, literals and operators is a bit boring.

Then it occurred to me that I had never studied lexical analysis carefully, so why not approach the topic from that angle and classify each token by taking the code apart step by step.

Overview

We know that the source code of a compiled language (such as Go) must be compiled and linked into a program before a computer can execute it, and the first step of this process is lexical analysis.

What is lexical analysis?

It is the process of turning source code into a sequence of predefined tokens. For ease of understanding, we will introduce it in two phases.

In the first phase, the source string is scanned and, according to predefined token-matching rules, cut into the smallest character strings that carry grammatical meaning, called lexemes; each lexeme is then classified as some kind of token. At this stage some characters may be filtered out, for example whitespace and comments.

In the second phase, an evaluator takes the scanned lexemes, determines their literal values, and produces the final tokens.

Is it a bit hard to follow?

If you have never come across this topic before, it may not feel intuitive. It looks complicated, but it is actually very simple.

A simple example

Look at the following piece of code, the classic hello world:

package main

import "fmt"

func main() {
    fmt.Println("Hello World")
}

Using this example, we can take apart the whole process of lexical analysis step by step.

What is a lexeme

Rather than talk theory, let's look directly at the result.

Running the example code through the first phase of lexical analysis gives the following:

package
main
\n
import
"fmt"
\n
func
main
(
)
{
\n
fmt
.
Println
(
"Hello World"
)
\n
}

Each of these separate character sequences is a lexeme.

How lexemes are cut depends on the grammar rules of the language. Here, besides the visible characters, line breaks also carry grammatical meaning: unlike C/C++, where statements must be separated by semicolons, Go statements can also be separated by line breaks.

Splitting source code into lexemes follows rules that are specific to each language. Despite the differences, the rules are similar and boil down to two: characters with no grammatical meaning (spaces, tabs, etc.) act as separators, and every lexeme can itself act as a separator (for example, `main()` splits into `main`, `(` and `)` without any spaces).

What is a token

A token, also called a lexical unit or symbol, consists of two parts: a name and a literal value. Each lexeme maps to a token in a fixed way, and not every token has a literal value.

Converting the hello world source into tokens gives the following table.

lexeme          name        literal
package         PACKAGE     "package"
main            IDENT       "main"
\n              SEMICOLON   "\n"
import          IMPORT      "import"
"fmt"           STRING      "\"fmt\""
\n              SEMICOLON   "\n"
func            FUNC        "func"
main            IDENT       "main"
(               LPAREN      ""
)               RPAREN      ""
{               LBRACE      ""
fmt             IDENT       "fmt"
.               PERIOD      ""
Println         IDENT       "Println"
(               LPAREN      ""
"Hello World"   STRING      "\"Hello World\""
)               RPAREN      ""
\n              SEMICOLON   "\n"
}               RBRACE      ""
\n              SEMICOLON   "\n"

The table is a bit long because nothing is omitted. The first column is the original lexeme, the second is the name of the corresponding token, and the last is the token's literal value.

The table shows that some tokens have no value, such as parentheses, braces and the period: their names already express their content.

Token classification

Tokens can generally be divided into four categories: keywords, identifiers, literals and operators. This classification is clearly reflected in the Go source code.

If you look at the source file src/go/token/token.go, you will find the following methods on the Token type.

// whether tok is a literal (this range also includes IDENT)
func (tok Token) IsLiteral() bool { return literal_beg < tok && tok < literal_end }
// whether tok is an operator (this range also includes delimiters)
func (tok Token) IsOperator() bool { return operator_beg < tok && tok < operator_end }
// whether tok is a keyword
func (tok Token) IsKeyword() bool { return keyword_beg < tok && tok < keyword_end }

The code is very simple: it determines a Token's type by checking whether the value falls within a given range. The three methods above report whether a Token is a literal, an operator or a keyword.

Hmm? What about identifiers?

There is a check for them too, but it is not a method on Token; it is a standalone function:

func IsIdentifier(name string) bool {
	for i, c := range name {
		if !unicode.IsLetter(c) && c != '_' && (i == 0 || !unicode.IsDigit(c)) {
			return false
		}
	}
	return name != "" && !IsKeyword(name)
}

We often say that names of variables, constants, functions and methods cannot be keywords, must consist of letters, digits or underscores, and cannot start with a digit. Reading this function makes those rules concrete.
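Since Go 1.13, go/token exports both IsIdentifier and a string-based IsKeyword function, so the naming rules can be checked directly. The kindOf helper below is only for illustration.

```go
package main

import (
	"fmt"
	"go/token"
)

// kindOf reports whether name is a keyword, a valid identifier, or neither
// (hypothetical helper built on token.IsKeyword and token.IsIdentifier).
func kindOf(name string) string {
	switch {
	case token.IsKeyword(name):
		return "keyword"
	case token.IsIdentifier(name):
		return "identifier"
	default:
		return "invalid"
	}
}

func main() {
	for _, name := range []string{"main", "_tmp", "x1", "1x", "func", ""} {
		fmt.Printf("%-6q %s\n", name, kindOf(name))
	}
}
```

Because IsIdentifier uses unicode.IsLetter, non-ASCII letters such as `é` are accepted as well.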

At this point I have more or less covered the main idea. Still, let me pick one category and walk through it briefly.

Keywords

Take keywords as an example. Which keywords does Go have?

Let's keep reading the source. First, look once more at the code that decides whether a token is a keyword:

func (tok Token) IsKeyword() bool {
	return keyword_beg < tok && tok < keyword_end
}

Any Token greater than keyword_beg and less than keyword_end is a keyword, which is easy to understand. So which keywords lie between keyword_beg and keyword_end? The code is as follows:

const (
	...
	keyword_beg
	// Keywords
	BREAK
	CASE
	CHAN
	CONST
	CONTINUE

	...

	SELECT
	STRUCT
	SWITCH
	TYPE
	VAR
	keyword_end
	...
)
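The keyword_beg/keyword_end constants are unexported sentinels that bracket a run of iota values, which is what makes the one-line range check possible. The pattern can be reproduced in a few lines; the Kind type and its constants below are a hypothetical, cut-down imitation, not the real go/token definitions.

```go
package main

import "fmt"

// Kind is a hypothetical token type mimicking go/token's layout:
// unexported sentinels mark where each category starts and ends.
type Kind int

const (
	ILLEGAL Kind = iota

	literalBeg
	IDENT
	INT
	literalEnd

	keywordBeg
	BREAK
	FUNC
	VAR
	keywordEnd
)

// The sentinels themselves fall outside the strict range checks.
func (k Kind) IsLiteral() bool { return literalBeg < k && k < literalEnd }
func (k Kind) IsKeyword() bool { return keywordBeg < k && k < keywordEnd }

func main() {
	fmt.Println(FUNC.IsKeyword(), IDENT.IsKeyword(), INT.IsLiteral(), keywordBeg.IsKeyword())
	// prints: true false true false
}
```

Adding a keyword only means inserting a constant between the two sentinels; the check itself never changes.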

Sorting them out gives 25 keywords in total:

break       case        chan    const       continue
default     defer       else    fallthrough for
func        go          goto    if          import
interface   map         package range       return
select      struct      switch  type        var

There really are very few keywords. As you can see...

Huh?!

Did you guess that I was going to say Go is concise, which is why it has so few keywords? Look at Java: by the commonly cited count it has 53 keywords, two of which (goto and const) are reserved words. Then look at Go: it doesn't even have reserved words. That's how confident it is.

Since you guessed it, I won't say it.

Other categories

I won't go through operators and literals one by one; the idea is the same.

Go has 47 operators (the count includes delimiters such as parentheses, brackets, commas and semicolons), covering assignment, bitwise, arithmetic, comparison and other operators. Believe me, I counted that number in the source code; I didn't find it in any reference material. [A smiley should go here.]
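One consequence of having multi-character operators is that the scanner must take the longest match: `&^=` has to come out as a single token, not as `&`, `^` and `=`. A small sketch with go/scanner shows this; scanKinds is a hypothetical helper, and it drops semicolons (including the one the scanner inserts at EOF) for readability.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// scanKinds returns the token names of src, skipping all semicolons,
// including those the scanner inserts at line ends or EOF.
func scanKinds(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("expr.go", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0)
	var kinds []string
	for {
		_, tok, _ := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.SEMICOLON {
			continue
		}
		kinds = append(kinds, tok.String())
	}
	return kinds
}

func main() {
	fmt.Println(scanKinds("x &^= y << 2"))
	// prints: [IDENT &^= IDENT << INT]
}
```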

What about literals?

There are five kinds: INT (integer), FLOAT (floating point), IMAG (imaginary), CHAR (character) and STRING (string).
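The same scanner reports which of the five literal kinds each lexeme belongs to. This sketch (scanLiterals and the literal type are illustrative names) scans one literal of each kind and keeps only the tokens for which IsLiteral is true, which drops the semicolon inserted at EOF.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// literal records a literal token's kind and source text (hypothetical type).
type literal struct {
	Kind string
	Text string
}

// scanLiterals keeps only literal tokens (IDENT would also count, since
// IsLiteral includes it, but the input below has no identifiers).
func scanLiterals(src string) []literal {
	fset := token.NewFileSet()
	file := fset.AddFile("lits.go", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0)
	var out []literal
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok.IsLiteral() {
			out = append(out, literal{tok.String(), lit})
		}
	}
	return out
}

func main() {
	for _, l := range scanLiterals(`42 3.14 2i 'a' "go"`) {
		fmt.Printf("%-6s %s\n", l.Kind, l.Text)
	}
}
```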

Summary

That's the end of the article. For all the rambling up front, it really only shows where to find the keywords, identifiers, operators and literals used in Go's syntax; it does not explain how they are ultimately used.

Was it purely for fun? Of course not (I think). Because... laying this groundwork now should spare me some embarrassment later.


Source: juejin.im/post/5dbebb3d6fb9a0204e659077