Abstract Syntax Trees (AST): What You Must Know | JD Logistics Technical Team

1 Introduction to AST

Open the package.json of a front-end project and you will find that tools occupy every corner of daily development: JavaScript transpilation, CSS preprocessing, code minification, ESLint, Prettier, and so on. Most of these modules never ship to production, yet they are indispensable during development.

Have you ever wondered how these tools actually work? The answer is the abstract syntax tree (AST), which is the cornerstone of all of them.

At the core of many tools and libraries, such as Babel, webpack, Vue CLI, and ESLint, is the abstract syntax tree: they inspect, analyze, and transform code by operating on it. ASTs are used widely on the front end. For example, in Vue.js, the first step in compiling a template into a render function is to parse the template string into an AST.

A common definition of the AST:

An abstract syntax tree (AST) is an abstract representation of the syntactic structure of source code. It represents the grammatical structure of a programming language as a tree, and each node in the tree represents a construct in the source code.

Much of JavaScript's syntax is designed to give developers a pleasant programming experience rather than to be easy for programs to understand. Source code is therefore converted into an AST, a form far better suited to program analysis. Browser JavaScript engines likewise parse source code into an AST before further analysis and compilation. Understanding ASTs is a great help in understanding front-end frameworks and tools in depth.

So how is an AST generated, and why do we need one?

Anyone who has studied compiler principles knows that a computer needs a fairly "long" analysis process to understand a string of source code:

  1. Lexical Analysis
  2. Syntax Analysis
  3. Code Generation

You can try this yourself with an online AST converter: click a word in the source code and the corresponding abstract syntax tree node on the right is selected, as shown in the figure below:

Once code has been converted into an AST, it looks roughly like the figure below:

In order to make it easier for everyone to understand the abstract syntax tree, let's take a look at a specific example.

var tree = 'this is tree'

This JavaScript source code is transformed into the following abstract syntax tree:


{
  "type": "Program",
  "start": 0,
  "end": 25,
  "body": [
    {
      "type": "VariableDeclaration",
      "start": 0,
      "end": 25,
      "declarations": [
        {
          "type": "VariableDeclarator",
          "start": 4,
          "end": 25,
          "id": {
            "type": "Identifier",
            "start": 4,
            "end": 8,
            "name": "tree"
          },
          "init": {
            "type": "Literal",
            "start": 11,
            "end": 25,
            "value": "this is tree",
            "raw": "'this is tree'"
          }
        }
      ],
      "kind": "var"
    }
  ],
  "sourceType": "module"
}

As you can see, a statement is composed of several lexical units (tokens). Tokens are like the 26 letters of the alphabet: letters combine into hundreds of thousands of words, and different combinations of words make up articles with entirely different content.

The string-valued type field indicates the kind of node, such as "BlockStatement", "Identifier", or "BinaryExpression". Each node type defines its own set of properties that describe it, and tools analyze and operate on code through these nodes.
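To make the type field concrete, here is a minimal sketch that walks the AST shown above depth-first and prints each node's type next to the slice of source text covered by its start and end offsets. The AST is copied in by hand as a plain object, and collectNodes is a hypothetical helper written for this illustration, not part of any library.

```javascript
// Source code and its AST, copied by hand from the output above
const source = "var tree = 'this is tree'";
const ast = {
  type: "Program",
  start: 0,
  end: 25,
  body: [
    {
      type: "VariableDeclaration",
      start: 0,
      end: 25,
      declarations: [
        {
          type: "VariableDeclarator",
          start: 4,
          end: 25,
          id: { type: "Identifier", start: 4, end: 8, name: "tree" },
          init: { type: "Literal", start: 11, end: 25, value: "this is tree", raw: "'this is tree'" },
        },
      ],
      kind: "var",
    },
  ],
  sourceType: "module",
};

// Collect every object with a string `type` field, visiting parents before children
function collectNodes(value, out = []) {
  if (Array.isArray(value)) {
    for (const item of value) collectNodes(item, out);
  } else if (value !== null && typeof value === "object") {
    if (typeof value.type === "string") out.push(value);
    for (const child of Object.values(value)) collectNodes(child, out);
  }
  return out;
}

for (const node of collectNodes(ast)) {
  // start/end are character offsets into the source string
  console.log(node.type.padEnd(20), JSON.stringify(source.slice(node.start, node.end)));
}
```

Every node carries enough position information to map it back to the exact characters it came from, which is what editors rely on to highlight the node under the cursor.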

2 How AST is generated

By now you should have a feel for what an abstract syntax tree looks like. So how is an AST generated?

Take the above var tree = 'this is tree' as an example:

Lexical analysis

The lexical analysis stage scans the input source code string and produces a series of lexical units (tokens), such as keywords, identifiers, numbers, punctuation marks, and operators. Tokens are independent of one another: at this stage we do not yet care how they combine into statements.

From the tokens you can already make out the basic structure of the source code before conversion.
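A toy tokenizer makes this stage concrete. The sketch below is an illustration under heavy assumptions, not how production parsers such as esprima or Acorn are written: it recognizes only identifiers, the keywords var/let/const, integers, single-quoted strings without escape sequences, and a handful of single-character punctuators. The token type names (Keyword, Identifier, Punctuator, String, Numeric) are chosen for illustration.

```javascript
const KEYWORDS = new Set(["var", "let", "const"]);
const PUNCTUATORS = "=;+-*/(){},";

// Split a source string into a flat list of { type, value } tokens
function tokenize(input) {
  const tokens = [];
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    if (/\s/.test(ch)) {
      i++; // whitespace only separates tokens; it is not kept
    } else if (/[A-Za-z_$]/.test(ch)) {
      // identifier or keyword: letters, digits, _ and $
      let j = i + 1;
      while (j < input.length && /[\w$]/.test(input[j])) j++;
      const word = input.slice(i, j);
      tokens.push({ type: KEYWORDS.has(word) ? "Keyword" : "Identifier", value: word });
      i = j;
    } else if (/[0-9]/.test(ch)) {
      // integer literal (no decimals or exponents in this sketch)
      let j = i + 1;
      while (j < input.length && /[0-9]/.test(input[j])) j++;
      tokens.push({ type: "Numeric", value: input.slice(i, j) });
      i = j;
    } else if (ch === "'") {
      // single-quoted string, kept with its quotes as the raw token text
      const end = input.indexOf("'", i + 1);
      if (end === -1) throw new SyntaxError(`Unterminated string at ${i}`);
      tokens.push({ type: "String", value: input.slice(i, end + 1) });
      i = end + 1;
    } else if (PUNCTUATORS.includes(ch)) {
      tokens.push({ type: "Punctuator", value: ch });
      i++;
    } else {
      throw new SyntaxError(`Unexpected character '${ch}' at ${i}`);
    }
  }
  return tokens;
}

console.log(tokenize("var tree = 'this is tree'"));
```

Note that the token list is flat: nothing in it says that 'this is tree' is the initial value of tree. Establishing that relationship is the job of the next stage.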

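Syntax analysis

The syntax analysis stage combines the tokens into an AST according to the grammar of the language, working out how they relate to one another. To use an analogy with learning English: this is where the teacher explains the role and meaning of each word in the context of the whole sentence.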
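Continuing the toy example, the sketch below shows a hand-written, recursive-descent style parser (parseVariableDeclaration is a hypothetical name) that turns the tokens for var tree = 'this is tree' into a tree shaped like the ESTree output earlier, minus the start/end positions. It assumes exactly one declaration of the form kind name = 'string' with an optional semicolon; real parsers cover the full JavaScript grammar.

```javascript
// Tokens for: var tree = 'this is tree' (as a lexer would produce them)
const tokens = [
  { type: "Keyword", value: "var" },
  { type: "Identifier", value: "tree" },
  { type: "Punctuator", value: "=" },
  { type: "String", value: "'this is tree'" },
];

// Parse exactly one declaration of the form: kind name = 'string' [;]
function parseVariableDeclaration(tokens) {
  let pos = 0;

  // Consume the next token if it has the expected type (and value, if given)
  function expect(type, value) {
    const token = tokens[pos];
    if (!token || token.type !== type || (value !== undefined && token.value !== value)) {
      throw new SyntaxError(`Expected ${value ?? type} at token ${pos}`);
    }
    pos++;
    return token;
  }

  const kind = expect("Keyword").value;
  const id = { type: "Identifier", name: expect("Identifier").value };
  expect("Punctuator", "=");
  const raw = expect("String").value;
  const init = { type: "Literal", value: raw.slice(1, -1), raw };
  if (tokens[pos] && tokens[pos].value === ";") pos++; // optional semicolon
  if (pos !== tokens.length) throw new SyntaxError(`Unexpected token at ${pos}`);

  // Nest the pieces the same way as in the ESTree output shown earlier
  return {
    type: "Program",
    body: [
      {
        type: "VariableDeclaration",
        declarations: [{ type: "VariableDeclarator", id, init }],
        kind,
      },
    ],
  };
}

console.log(JSON.stringify(parseVariableDeclaration(tokens), null, 2));
```

The parser rejects token sequences that do not fit the grammar (for example, a missing =), which is exactly where syntax errors come from.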

Code generation

Finally, there is the code generation stage. It is very flexible and can consist of multiple steps: we can traverse the initial AST, modify its structure, and then generate the corresponding code string from the modified structure.

In the same analogy: once we understand the grammatical structure of each sentence and know how to write a grammatically correct English sentence, we can use that structure to translate an English sentence into a Chinese one.
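Going in the other direction, a code generator walks the tree and prints each node as text. The sketch below (generate is a hypothetical function) supports only the node types from the var tree = 'this is tree' example, and it modifies the AST before printing to mirror the "traverse, modify, generate" flow just described.

```javascript
// Hand-written AST for: var tree = 'this is tree'
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "tree" },
          init: { type: "Literal", value: "this is tree", raw: "'this is tree'" },
        },
      ],
    },
  ],
};

// Turn each supported node type back into source text, recursing into children
function generate(node) {
  switch (node.type) {
    case "Program":
      return node.body.map(generate).join("\n");
    case "VariableDeclaration":
      return `${node.kind} ${node.declarations.map(generate).join(", ")};`;
    case "VariableDeclarator":
      return node.init ? `${generate(node.id)} = ${generate(node.init)}` : generate(node.id);
    case "Identifier":
      return node.name;
    case "Literal":
      // Prefer the original spelling; otherwise fall back to JSON formatting
      return node.raw ?? JSON.stringify(node.value);
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// Modify the tree first (rename the variable), then print it
ast.body[0].declarations[0].id.name = "myTree";
console.log(generate(ast)); // var myTree = 'this is tree';
```

Formatting details such as whitespace and semicolons are decided by the generator, not by the original source, which is why tools like Prettier can reformat code by reprinting its AST.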

3 Basic structure of AST

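Regardless of the specific compiler or programming language, everything in the "AST world" is a node (Node), and nodes of different types nest inside one another to form a complete tree. The fragment below (with parts elided as ...) comes from the AST of a function foo(x) containing an if (x > 10) statement: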

{
  "program": {
    "type": "Program",
    "sourceType": "module",
    "body": [
      {
        "type": "FunctionDeclaration",
        "id": {
          "type": "Identifier",
          "name": "foo"
        },
        "params": [
          {
            "type": "Identifier",
            "name": "x"
          }
        ],
        "body": {
          "type": "BlockStatement",
          "body": [
            {
              "type": "IfStatement",
              "test": {
                "type": "BinaryExpression",
                "left": {
                  "type": "Identifier",
                  "name": "x"
                },
                "operator": ">",
                "right": {
                  "type": "NumericLiteral",
                  "value": 10
                }
              }
            }
          ]
        },
        ...
      },
      ...
    ]
  }
}

The structure of an AST differs between language compilers, between compilation tools, and even between versions of the same language. For JavaScript, the common specification that parsers follow is ESTree, which defines the basic AST node structure; different compilation tools build on it with their own extensions. Babel, for example, uses NumericLiteral (as in the fragment above) where ESTree uses a generic Literal node.

4 Application Scenarios and Usage of AST

Now that you understand the concept and structure of an AST, you may well ask: where are ASTs used, and how do we use them?

Code syntax checking, code style checking, code formatting, syntax highlighting, error hints, code auto-completion, etc.

  • For example, JSLint and JSHint check code for errors and style problems in order to catch potential bugs.
  • IDE error hints, formatting, highlighting, auto-completion, etc.

Code obfuscation and compression (minification).

  • UglifyJS2 etc.

Code optimization and transformation: changing the structure of code into a desired form.

  • Bundlers such as webpack and Rollup.
  • Conversion between CommonJS, AMD, CMD, UMD and other code specifications.
  • Converting CoffeeScript, TypeScript, JSX, etc. into plain JavaScript.
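As a small illustration of the code-checking use case, the sketch below reports var declarations, similar in spirit to ESLint's no-var rule (ESLint's real implementation is different and far more complete). The AST is hand-written in ESTree style with only the fields the check needs, and walk and findVarDeclarations are hypothetical helpers.

```javascript
// Hand-written ESTree-style AST (only the fields used here) for:
//   var a = 1;
//   let b = 2;
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      loc: { start: { line: 1, column: 0 } },
      declarations: [
        { type: "VariableDeclarator", id: { type: "Identifier", name: "a" }, init: { type: "Literal", value: 1 } },
      ],
    },
    {
      type: "VariableDeclaration",
      kind: "let",
      loc: { start: { line: 2, column: 0 } },
      declarations: [
        { type: "VariableDeclarator", id: { type: "Identifier", name: "b" }, init: { type: "Literal", value: 2 } },
      ],
    },
  ],
};

// Visit every node depth-first and pass it to `check`
function walk(value, check) {
  if (Array.isArray(value)) {
    value.forEach((item) => walk(item, check));
  } else if (value !== null && typeof value === "object") {
    if (typeof value.type === "string") check(value);
    Object.values(value).forEach((child) => walk(child, check));
  }
}

// Report every `var` declaration together with its line number
function findVarDeclarations(ast) {
  const problems = [];
  walk(ast, (node) => {
    if (node.type === "VariableDeclaration" && node.kind === "var") {
      problems.push(`line ${node.loc.start.line}: unexpected var, use let or const instead`);
    }
  });
  return problems;
}

console.log(findVarDeclarations(ast));
```

Because the check looks at node types rather than raw text, it is not fooled by the word var appearing inside a string or a comment.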

As for how to use an AST, the process can be summed up in the following steps:

  1. Parsing: the parser runs lexical analysis and syntax analysis on the source code to produce an AST.
  2. Read/traverse (Traverse): walk the AST depth-first, visiting each node (Node) in the tree.
  3. Modification/transform (Transform): modify node information during traversal to produce a new AST.
  4. Output (Printing): once the initial AST has been transformed, output the new AST directly or generate new code from it, depending on the scenario.

When working with ASTs we usually focus on steps 2 and 3: the general-purpose capabilities that tools such as Babel and ESLint expose are precisely visiting and modifying the initial AST.

Both steps are built on a design pattern called the visitor pattern. A visitor object defines a method for each node type of interest, so that different kinds of nodes can be handled differently. Writing a Babel plugin, for example, essentially means constructing a visitor whose methods process each node to produce the desired result.

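// Assumes Babel 7 packages: @babel/parser builds the AST, @babel/traverse walks it
const { parse } = require('@babel/parser');
const traverse = require('@babel/traverse').default;

const ast = parse("import fs from 'fs';\nfunction foo() { fs.readFileSync('a.txt'); }", {
    sourceType: 'module'
});

const visitor = {
    // Called for every function call, e.g. fs.readFileSync(...)
    CallExpression(path) {
        console.log('call at line', path.node.loc.start.line);
    },
    // Called for every function declaration
    FunctionDeclaration(path) {
        console.log('function', path.node.id.name);
    },
    // Called for every import statement
    ImportDeclaration(path) {
        console.log('import from', path.node.source.value);
    }
    // ...methods for other node types
};

traverse(ast, visitor);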
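To show roughly what a traverse(ast, visitor) function does internally, here is a minimal dependency-free sketch under simplifying assumptions: traverse here is a hypothetical helper, it passes the raw node and its parent to each visitor method instead of a Babel NodePath, and it supports only "enter"-style callbacks keyed by node type. The AST is hand-written in ESTree style.

```javascript
// Depth-first traversal that dispatches each node to the visitor method
// named after its `type`, e.g. visitor.Identifier(node, parent)
function traverse(node, visitor, parent = null) {
  if (Array.isArray(node)) {
    node.forEach((child) => traverse(child, visitor, parent));
    return;
  }
  // Skip primitives and plain objects that are not AST nodes
  if (node === null || typeof node !== "object" || typeof node.type !== "string") return;

  const method = visitor[node.type];
  if (typeof method === "function") method(node, parent);

  for (const key of Object.keys(node)) {
    const child = node[key];
    if (child !== null && typeof child === "object") traverse(child, visitor, node);
  }
}

// Hand-written ESTree-style AST for: foo(x); bar(1);
const ast = {
  type: "Program",
  body: [
    {
      type: "ExpressionStatement",
      expression: {
        type: "CallExpression",
        callee: { type: "Identifier", name: "foo" },
        arguments: [{ type: "Identifier", name: "x" }],
      },
    },
    {
      type: "ExpressionStatement",
      expression: {
        type: "CallExpression",
        callee: { type: "Identifier", name: "bar" },
        arguments: [{ type: "Literal", value: 1, raw: "1" }],
      },
    },
  ],
};

const calls = [];
const identifiers = [];
traverse(ast, {
  CallExpression(node) { calls.push(node.callee.name); },
  Identifier(node) { identifiers.push(node.name); },
});
console.log(calls);       // [ 'foo', 'bar' ]
console.log(identifiers); // [ 'foo', 'x', 'bar' ]
```

The traversal logic is written once and the per-node-type behavior lives entirely in the visitor object, which is what lets many independent plugins share a single tree walk.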

5 AST conversion process

Using babel-core (Babel's core library, which implements the core transformation engine), babel-types (which provides node type checks, AST node builders, and more), and the AST, we can convert

let sum = (a, b) => a + b

changed to:

let sum = function(a, b) {
  return a + b
}

The implementation code is as follows:

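// babel-core: Babel's core library, which implements the transformation engine
// (Babel 6 package; in Babel 7+ the equivalents are @babel/core and @babel/types)
let babel = require('babel-core');
// babel-types: node type checks, AST node builders, etc.
let types = require('babel-types');

let code = `let sum = (a, b) => a + b`;
// Goal:
// let sum = function(a, b) {
//   return a + b
// }

// This visitor handles nodes of one specific type: arrow functions
let visitor = {
  ArrowFunctionExpression(path) {
    let node = path.node;
    let params = node.params;
    // A concise body (an expression) must be wrapped in `return` inside a block;
    // a body that is already a block statement can be reused as-is
    let block = types.isBlockStatement(node.body)
      ? node.body
      : types.blockStatement([types.returnStatement(node.body)]);
    // functionExpression(id, params, body, generator, async)
    let func = types.functionExpression(null, params, block, false, false);
    // Note: this simple version ignores the lexical `this`/`arguments` binding of arrow functions
    path.replaceWith(func);
  }
};

let arrowPlugin = { visitor };
// Babel first parses the code into an AST, then traverses it with the plugin's visitor
let result = babel.transform(code, {
  plugins: [
    arrowPlugin
  ]
});
console.log(result.code);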

In summary: tokenization (word segmentation) splits the whole code string into an array of the smallest syntactic units; parsing builds an AST from them; a transformer turns it into a new AST; and a generator traverses the new AST to produce the final result.

AST's three tricks:

  • Generate an AST with esprima
  • Traverse and update the AST with estraverse
  • Regenerate source code from the AST with escodegen

Let's work through a simple example:
1. Create a test project directory.
2. Install the esprima, estraverse, and escodegen npm modules in the test project:

npm i esprima estraverse escodegen --save

3. Create a test.js file in the directory and add the following code:

const esprima = require('esprima');
let code = 'const a = 1';
const ast = esprima.parseScript(code);
console.log(ast);

You will see the output:

Script {
  type: 'Program',
  body:
   [ VariableDeclaration {
       type: 'VariableDeclaration',
       declarations: [Array],
       kind: 'const' } ],
  sourceType: 'script' }

4. Append the following code to test.js; it changes the kind of every variable declaration to var:

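const estraverse = require('estraverse');

// Visit every node; only variable declarations have a `kind` to change
estraverse.traverse(ast, {
    enter: function (node) {
        if (node.type === 'VariableDeclaration') {
            node.kind = 'var';
        }
    }
});

console.log(ast);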

5. Finally, in the test file, add the following code:

const escodegen = require("escodegen");
const transformCode = escodegen.generate(ast)

console.log(transformCode);

Output result:

var a = 1;

With these three tricks, we transformed const a = 1 into var a = 1.

6 Practical applications

Using an AST, we can implement a Babel plugin that precomputes simple constant expressions. The implementation code is as follows:

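// Plugin that precomputes simple constant expressions
let code = `const result = 1000 * 60 * 60`;
let babel = require('babel-core');
let types = require('babel-types');

// Supported operators; a lookup table avoids executing strings with eval()
const operations = {
  '+': (a, b) => a + b,
  '-': (a, b) => a - b,
  '*': (a, b) => a * b,
  '/': (a, b) => a / b
};

let visitor = {
  BinaryExpression(path) {
    let node = path.node;
    let operation = operations[node.operator];
    // Only fold when both operands are numeric literals and the operator is supported
    if (operation && types.isNumericLiteral(node.left) && types.isNumericLiteral(node.right)) {
      let result = operation(node.left.value, node.right.value);
      path.replaceWith(types.numericLiteral(result));
      // If the parent is also a binary expression, it may now be foldable too
      let parentPath = path.parentPath;
      if (parentPath && parentPath.isBinaryExpression()) {
        visitor.BinaryExpression(parentPath);
      }
    }
  }
};

let cal = babel.transform(code, {
  plugins: [
    { visitor }
  ]
});
console.log(cal.code);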
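The plugin above re-invokes the visitor on the parent by hand. A different design choice, shown here without Babel, is to fold bottom-up (post-order): children are folded before their parent, so a nested expression such as (1000 * 60) * 60 collapses in a single pass. This is a minimal sketch over a hand-written AST using Babel-style node names; fold is a hypothetical helper that only looks inside BinaryExpression nodes and supports + - * /.

```javascript
// Supported operators (assumption: only + - * / on numbers)
const OPERATIONS = {
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
  "*": (a, b) => a * b,
  "/": (a, b) => a / b,
};

// Fold constant binary expressions bottom-up: children are folded first,
// so by the time a node is checked, its operands are literals if they can be
function fold(node) {
  if (node.type !== "BinaryExpression") return node;
  const left = fold(node.left);
  const right = fold(node.right);
  const operation = OPERATIONS[node.operator];
  if (operation && left.type === "NumericLiteral" && right.type === "NumericLiteral") {
    return { type: "NumericLiteral", value: operation(left.value, right.value) };
  }
  // Not foldable: keep the node but with any folded children
  return { ...node, left, right };
}

// Hand-written AST for: 1000 * 60 * 60, which parses as (1000 * 60) * 60
const expression = {
  type: "BinaryExpression",
  operator: "*",
  left: {
    type: "BinaryExpression",
    operator: "*",
    left: { type: "NumericLiteral", value: 1000 },
    right: { type: "NumericLiteral", value: 60 },
  },
  right: { type: "NumericLiteral", value: 60 },
};

console.log(fold(expression)); // { type: 'NumericLiteral', value: 3600000 }
```

If one operand is not a constant, as in x * (1000 * 60 * 60), the constant subtree is still folded while x is left untouched.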

Author: JD Logistics Li Qiong

Source: JD Cloud Developer Community Ziyuanqishuo Tech

Origin my.oschina.net/u/4090830/blog/10089879