Kotlin Secrets You Don’t Know (1)

This article covers the first secret in the Kotlin series: how are Kotlin keywords (final/if/for) and operators (+/-/?:) recognized? Three or four follow-up articles will introduce the other secrets one by one. The code and other materials mentioned in this article have been open-sourced in the Android knowledge system & Android-Body.

People communicate mainly through language. Can programs communicate too? And if so, through what?

The answer is yes: just as communication between people depends mainly on language, communication with and between programs also depends on language. Human languages are roughly divided into Chinese, English, Japanese, and so on; their purpose is to bridge different cultures so people can communicate. Programming languages include C, Python, Java, Kotlin, and so on; their purpose is to let people with different programming styles communicate with the machine, and in the end they are all converted into computer instructions that the machine can recognize and execute. Human languages are diverse and fluid, mainly because people differ from one another: each person has their own way of thinking, and the same sentence may be understood in different ways. A program, by contrast, is fixed and rigorous and cannot tolerate the slightest error or ambiguity. This is also why programs are so prone to bugs. For example:

Xiaomei: Let's go on a date tonight

  • Person: Why is she asking me? Should I wear a suit? Get a haircut?

  • Program: Get lost

The program is this cold because it is rigorous. You have to tell it the date of the date, the location, what you will do, how many people are coming, what to bring, how long it will last, whether money is needed, and whether you will come home tonight, and so on. We know that Java and Kotlin both run on the JVM, and both are converted into bytecode that the machine can recognize. Each has its own specifics and style, but both roughly conform to the standard definition of what a language is made of: lexicon, grammar, and semantics. Kotlin is now promoted by Google, and many official libraries and Gradle libraries have been rewritten in Kotlin, which shows its importance. Today we will analyze how Kotlin's main building blocks are transformed toward machine code. As the saying goes, most of us know that something is so without knowing why it is so. Let's work out the why, using Kotlin.

How are Kotlin keywords (final/if/for) and operators (+/-/?:) recognized?

In fact, this question can be restated as: how is each input character recognized as part of a word? In short: lexical analysis.

Lexical analysis: the lexical analysis stage is the first stage of the compilation process and the foundation of compilation. Its task is to read the source program character by character from left to right, that is, to scan the character stream that constitutes the source program, and then recognize words (also called word symbols, or tokens) according to word-formation rules. — Wikipedia
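As a concrete illustration, here is a hand-rolled sketch of that left-to-right scan (a toy, not Kotlin's actual lexer), grouping characters into words:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of lexical analysis: scan characters left to right
// and group them into words (tokens). Illustrative only.
public class TinyScanner {
    public static List<String> scan(String source) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                     // skip whitespace
            } else if (Character.isLetter(c)) {
                int start = i;                           // identifier or keyword
                while (i < source.length() && Character.isLetterOrDigit(source.charAt(i))) i++;
                words.add(source.substring(start, i));
            } else if (Character.isDigit(c)) {
                int start = i;                           // integer literal
                while (i < source.length() && Character.isDigit(source.charAt(i))) i++;
                words.add(source.substring(start, i));
            } else {
                words.add(String.valueOf(c));            // single-char operator
                i++;
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(scan("val x = 1 + 2")); // prints [val, x, =, 1, +, 2]
    }
}
```

Whether "val" is then treated as a keyword or an identifier is decided by the later steps described below.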

In KtTokens you can see our common keywords and operators, as well as identifiers, access modifiers, and so on. What is produced here is the token stream. KtTokens enumerates all of Kotlin's lexical units one by one and groups them, and lexical analysis then works in terms of these units. Note that they are not simply defined as strings; they are created through different subtypes of KtToken, all of which inherit from IElementType.

public class KtToken extends IElementType {
    public KtToken(@NotNull @NonNls String debugName) {
        super(debugName, KotlinLanguage.INSTANCE);
    }
}

IElementType is the type of a syntax tree (AST) node. What is a syntax tree? I'll introduce it later. The interesting part is that super receives two parameters: the first, debugName, is the keyword, operator, or identifier being defined, and the second is the KotlinLanguage instance.

public class KotlinLanguage extends Language {
    @NotNull
    public static final KotlinLanguage INSTANCE = new KotlinLanguage();
    public static final String NAME = "Kotlin";

    private KotlinLanguage() {
        super("kotlin");
    }
}

public class KotlinFileType extends LanguageFileType {
    public static final String EXTENSION = "kt";
    public static final KotlinFileType INSTANCE = new KotlinFileType();

    private final NotNullLazyValue<Icon> myIcon = new NotNullLazyValue<Icon>() {
        @NotNull
        @Override
        protected Icon compute() {
            return KotlinIconProviderService.getInstance().getFileIcon();
        }
    };

    @Override
    @NotNull
    public String getName() {
        return KotlinLanguage.NAME;
    }

    @Override
    public Icon getIcon() {
        return myIcon.getValue();
    }
}

KotlinLanguage extends Language and, as the name implies, declares the Kotlin language. KotlinFileType defines Kotlin files as ending with .kt and provides Kotlin's icon. Both live under the psi/idea directory. We can open our minds a bit: can we define our own language with whatever file extension we like, such as .wm? The answer is yes; those who are interested can try it.
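To show the shape of that idea, here is a self-contained sketch of the same pattern for a hypothetical ".wm" language. The Language and LanguageFileType base classes below are simplified stand-ins written for this example; a real plugin would extend the IntelliJ platform's own classes instead, exactly as KotlinLanguage and KotlinFileType do above.

```java
// Simplified stand-ins for the platform's Language / LanguageFileType,
// defined locally so this sketch compiles on its own.
abstract class Language {
    private final String id;
    protected Language(String id) { this.id = id; }
    public String getID() { return id; }
}

abstract class LanguageFileType {
    public abstract String getName();
    public abstract String getDefaultExtension();
}

// A hypothetical ".wm" language, following the same pattern as KotlinLanguage.
class WmLanguage extends Language {
    public static final WmLanguage INSTANCE = new WmLanguage();
    public static final String NAME = "Wm";
    private WmLanguage() { super("wm"); }
}

// The file type, following the same pattern as KotlinFileType.
class WmFileType extends LanguageFileType {
    public static final String EXTENSION = "wm";
    public static final WmFileType INSTANCE = new WmFileType();
    @Override public String getName() { return WmLanguage.NAME; }
    @Override public String getDefaultExtension() { return EXTENSION; }
}

public class WmLanguageDemo {
    public static void main(String[] args) {
        System.out.println(WmFileType.INSTANCE.getName()
                + " -> ." + WmFileType.INSTANCE.getDefaultExtension());
    }
}
```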

Lexical analyzer

In KtTokens we see many familiar keywords and operators, such as null, false, &&, and ?:. After we type one, it is just a sequence of characters. So is a given combination of characters one of our keywords, or a meaningless group of characters? This is where a lexical analyzer comes in.

Lexical analyzer: lexical analysis is the process of converting a sequence of characters into a sequence of words (tokens). The program or function that performs lexical analysis is called a lexical analyzer (lexer for short), also called a scanner. — Encyclopedia

Kotlin uses the open-source lexer generator JFlex (https://github.com/jflex-de/jflex/). First, a file ending in ".flex" is defined to hold the lexical rules. A .flex file has three sections:

1. User code section: everything in this section is copied verbatim before the class declaration of the generated lexer class. package and import statements are common here; as shown in the figure, Kotlin's spec adds import java.util.*;, import org.jetbrains.kotlin.lexer.KtTokens;, and so on.
2. Options and declarations section: used to customize the lexical analyzer, including the class name, parent class, access modifiers, and so on. Each option starts with %, for example: %class _JetLexer
3. Lexical rules section: a set of regular expressions and actions, i.e. the code to be executed when a regular expression matches successfully.
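A minimal illustrative .flex specification (a made-up example for this article, not Kotlin's actual spec file) shows how the three sections fit together, separated by %% markers:

```
/* 1) user code: copied verbatim before the generated class */
package demo.lexer;
import java.util.*;

%%

/* 2) options and declarations, each starting with % */
%class _DemoLexer
%unicode
%type String

%%

/* 3) lexical rules: regex on the left, action code on the right */
"if"            { return "IF_KEYWORD"; }
"final"         { return "FINAL_KEYWORD"; }
[0-9]+          { return "INTEGER_LITERAL"; }
[ \t\r\n]+      { /* skip whitespace */ }
```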

Once the specification is defined, JFlex reads it and generates the lexer class _JetLexer. If there are two regular expressions, "no" and "note", the scanner will match "note" because it is the longest match. If two regular expressions match input of the same length, the scanner uses the expression listed first in the specification. If no regular expression matches at all, the lexical analyzer terminates the analysis of the input stream and reports an error.

public class KotlinLexer extends FlexAdapter {
    public KotlinLexer() {
        super(new _JetLexer((Reader) null));
    }
}

public class FlexAdapter extends LexerBase {
    // ... fields and other methods omitted ...

    public void start(@NotNull CharSequence buffer, int startOffset,
                      int endOffset, int initialState) {
        if (buffer == null) {
            $$$reportNull$$$0(1);
        }

        this.myText = buffer;
        this.myTokenStart = this.myTokenEnd = startOffset;
        this.myBufferEnd = endOffset;
        this.myFlex.reset(this.myText, startOffset, endOffset, initialState);
        this.myTokenType = null;
    }
}

The generated _JetLexer is ultimately wrapped by KotlinLexer, which inherits (via FlexAdapter) from LexerBase. An important method in Lexer is start(buffer, startOffset, endOffset, initialState), whose parameters are the input character sequence, the offset where scanning begins, the offset where it ends, and the initial lexer state. _JetLexer is where the individual characters are processed; its main method is advance(), which matches the input characters against the defined keywords and operators and then outputs the corresponding tokens.
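To make the start()/advance() shape concrete, here is a toy lexer (an illustrative sketch, not the generated _JetLexer) that keeps offsets into a buffer and applies the longest-match rule described earlier, so the input "note" matches the token note rather than no:

```java
// Toy lexer mimicking the start()/advance() shape described above.
// Illustrative only -- the real _JetLexer is generated by JFlex.
public class ToyLexer {
    private static final String[] KEYWORDS = {"no", "note", "if", "final"};
    private CharSequence buffer;
    private int tokenStart, tokenEnd, bufferEnd;

    public void start(CharSequence buffer, int startOffset, int endOffset) {
        this.buffer = buffer;
        this.tokenStart = this.tokenEnd = startOffset;
        this.bufferEnd = endOffset;
    }

    // Returns the next token text (longest match wins), or null at end of input.
    public String advance() {
        tokenStart = tokenEnd;
        while (tokenStart < bufferEnd && buffer.charAt(tokenStart) == ' ') tokenStart++;
        if (tokenStart >= bufferEnd) return null;
        int best = -1;
        for (String kw : KEYWORDS) {                 // try every keyword...
            int end = tokenStart + kw.length();
            if (end <= bufferEnd && kw.contentEquals(buffer.subSequence(tokenStart, end))) {
                best = Math.max(best, end);          // ...keep the longest match
            }
        }
        tokenEnd = (best > tokenStart) ? best : tokenStart + 1; // fall back to one char
        return buffer.subSequence(tokenStart, tokenEnd).toString();
    }

    public static void main(String[] args) {
        ToyLexer lexer = new ToyLexer();
        lexer.start("note if", 0, 7);
        System.out.println(lexer.advance()); // prints note ("note" beats "no")
        System.out.println(lexer.advance()); // prints if
    }
}
```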

Summary

Knowing the surface, we should also know what lies beneath. Combining the knowledge above, we can now say how Kotlin keywords (final/if/for) and operators (+/-/?:) are recognized. The process breaks down into roughly four steps: input source, scan, analyze, and output, as follows:

1. We type if, final, and other keywords in the IDE (in short: input source)
2. The start method in Lexer receives the string we typed (in short: scan)
3. _JetLexer's advance method matches the input against the regular expressions (in short: analysis)
4. After a regular expression matches, the corresponding keyword or operator defined in KtTokens is output (in short: output)


Origin blog.csdn.net/weixin_55596273/article/details/115288238