Abstract syntax trees: from Babel's compilation principles to the Vue source code

What is Babel

  • Babel is a JavaScript compiler. As ECMAScript has evolved, Babel has acted as a translator between source code and the runtime environment, letting us write code in newer JavaScript syntax even when the target environment does not support it yet.
  • Babel is a toolchain.
  • What can Babel do for developers?
    1. Syntax transformation;
    2. Source code transformation;
    3. Adding features that are missing in the target environment through polyfills (by introducing a third-party polyfill module);
    4. Static analysis of source code. Common practical uses include automatic internationalization, automatic documentation generation, automatic insertion of tracking (analytics) code, JavaScript interpreters, and so on.

The compilation process of Babel

Parse

  • Execution starts by calling the parser.
  • All plugins are traversed to look for a parserOverride, which lets a plugin supply its own parser.
  • The parser converts the JavaScript code into an abstract syntax tree (AST for short).

Transform

  • pre (initializes the working object before the AST is traversed)
  • visitor (once the working object is initialized, traverse walks the AST nodes and the plugins' visitor methods are executed during the traversal)
  • post (cleans up the working object after the traversal has finished)
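The pre, visitor, post order above can be sketched on a plain-object AST. This is only an illustration: runPlugin, the node shapes, and the way state is passed are assumptions made for this example and are not Babel's actual signatures.

```javascript
// Hypothetical helper: runs one plugin-like object over a plain-object AST.
function runPlugin(ast, plugin) {
  const state = {};
  // pre: set up the working object before the traversal
  if (plugin.pre) plugin.pre(state);
  // visitor: walk every node and call the handler registered for its type
  (function walk(node) {
    const visit = plugin.visitor[node.type];
    if (visit) visit(node, state);
    for (const child of node.children || []) walk(child);
  })(ast);
  // post: finish up after the traversal
  if (plugin.post) plugin.post(state);
  return state;
}

// A tiny hand-written tree (assumed shape, not a real Babel AST)
const ast = {
  type: 'Program',
  children: [
    { type: 'Identifier', name: 'a' },
    { type: 'NumericLiteral', value: 1 },
    { type: 'Identifier', name: 'b' },
  ],
};

const log = [];
const countIdentifiers = {
  pre(state) { state.count = 0; log.push('pre'); },
  visitor: {
    Identifier(node, state) { state.count++; log.push('visit ' + node.name); },
  },
  post(state) { log.push('post, count = ' + state.count); },
};

runPlugin(ast, countIdentifiers);
console.log(log.join('\n'));
```

The log shows pre running once, the Identifier handler running for each matching node, and post running once at the end.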

Generate

  • After the AST transformation is complete, generate turns the AST back into code.
  • The source map for the target code is produced together with the generated code.
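To make the three stages concrete, here is a toy pipeline that does not use Babel at all. It handles only identifiers, integers, and the ** operator, and rewrites a ** b into Math.pow(a, b). The function names parse, transform, and generate and the node types are assumptions chosen for this sketch, and source maps are left out.

```javascript
// Parse: "a ** b ** c" becomes nested Power nodes (right-associative)
function parse(code) {
  const tokens = code.match(/\*\*|[A-Za-z_]\w*|\d+/g) || [];
  let pos = 0;
  function parseOperand() {
    const token = tokens[pos++];
    if (token === undefined || token === '**') {
      throw new SyntaxError('Operand expected');
    }
    return /^\d+$/.test(token)
      ? { type: 'NumericLiteral', value: Number(token) }
      : { type: 'Identifier', name: token };
  }
  function parsePower() {
    const left = parseOperand();
    if (tokens[pos] === '**') {
      pos++;
      return { type: 'Power', left, right: parsePower() };
    }
    return left;
  }
  const ast = parsePower();
  if (pos !== tokens.length) throw new SyntaxError('Unexpected token ' + tokens[pos]);
  return ast;
}

// Transform: rewrite every Power node into a call to Math.pow
function transform(node) {
  if (node.type === 'Power') {
    return {
      type: 'CallExpression',
      callee: 'Math.pow',
      args: [transform(node.left), transform(node.right)],
    };
  }
  return node;
}

// Generate: print the AST back as source text
function generate(node) {
  switch (node.type) {
    case 'NumericLiteral': return String(node.value);
    case 'Identifier': return node.name;
    case 'Power': return generate(node.left) + ' ** ' + generate(node.right);
    case 'CallExpression':
      return node.callee + '(' + node.args.map(generate).join(', ') + ')';
    default: throw new Error('Unknown node type ' + node.type);
  }
}

console.log(generate(transform(parse('a ** 2')))); // Math.pow(a, 2)
```

Keeping the three functions separate mirrors the point of the pipeline: the transform step only ever sees tree nodes, never raw text.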

Abstract syntax trees in practice: the Vue source code

In Vue we normally write template syntax. Compiling it directly into plain HTML is very difficult, so the abstract syntax tree is used as an intermediate step: the template syntax is first turned into an AST, and the AST is then turned into plain HTML. The AST serves as the transition in between, which makes the compilation work much easier.
[Figure: an example of Vue template syntax]

[Figure: the same content as plain HTML]

[Figure: the abstract syntax tree that the template is parsed into]


Internally, Vue treats every template as a string. Seen this way, the parsed AST is like a JavaScript-object version of the HTML. Abstract syntax trees are widely used in template compilation: wherever a template is compiled, an AST is almost certainly involved.
What is the relationship between abstract syntax trees and virtual nodes?
Template syntax => abstract syntax tree => render function (h function) => virtual node => UI
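The middle of that chain can be illustrated with a very small sketch. The h function and renderFromAst below are hypothetical stand-ins written for this example, not Vue's own implementation, and the AST uses the node shape of the hand-written parser later in this article ({ tag, attrs, children } for elements and { text, type: '3' } for text). A real compiler generates render function code from the AST; here the AST is walked directly to keep the example short.

```javascript
// Hypothetical h function: builds a plain virtual-node object
function h(tag, data, children) {
  return { tag, data, children };
}

// Hypothetical render step: walk the AST and call h for every element
function renderFromAst(node) {
  // Text nodes become plain strings
  if (node.type === '3') return node.text;
  const data = {};
  for (const attr of node.attrs || []) data[attr.name] = attr.value;
  return h(node.tag, data, (node.children || []).map(child => renderFromAst(child)));
}

const listAst = {
  tag: 'ul',
  attrs: [{ name: 'class', value: 'list' }],
  children: [
    { tag: 'li', attrs: [], children: [{ text: 'A', type: '3' }] },
  ],
};

const vnode = renderFromAst(listAst);
console.log(JSON.stringify(vnode));
// {"tag":"ul","data":{"class":"list"},"children":[{"tag":"li","data":{},"children":["A"]}]}
```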

Algorithm prerequisites

The pointer idea

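[Example 1] Find the character with the longest run of consecutive repeats in the string 'aaaaaaaaaaabbbccccccddd'.
Solution:
var str = 'aaaaaaaaaaabbbccccccddd';
// Pointers
var i = 0;
var j = 1;
// Largest repeat count found so far
var maxRepeatCount = 0;
// Character with the largest repeat count
var maxRepeatChar = '';
// Keep searching while i is still within range
while (i <= str.length - 1) {
  // Check whether the characters at i and j differ
  if (str[i] !== str[j]) {
    console.log('Between ' + i + ' and ' + j + ', the letter ' + str[i] + ' repeats ' + (j - i) + ' times');
    // Compare with the largest repeat count so far
    if (j - i > maxRepeatCount) {
      // The current run (j - i) exceeds the maximum, so it becomes the new maximum
      maxRepeatCount = j - i;
      // Store the character that i points to as maxRepeatChar
      maxRepeatChar = str[i];
    }
    // Let pointer i catch up with pointer j
    i = j;
  }
  // Whether or not they differ, j always moves forward
  j++;
}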

Approach:
The first idea that comes to mind is to extract every substring and compare lengths, but that takes many more loop iterations and wastes work, because many substrings that are not runs of a single character get examined as well. The pointer method avoids this.
A pointer here is just a subscript, that is, an index into the string. It is not a pointer as in C: a C pointer can manipulate memory, while a pointer in JavaScript is simply a position.
i: 0
j: 1
If the characters pointed to by i and j are the same, i stays where it is and j moves forward.
If they differ, all the characters from i up to (but not including) j are the same: let i catch up with j, then move j forward.
Output:

Between 0 and 11, the letter a repeats 11 times
Between 11 and 14, the letter b repeats 3 times
Between 14 and 20, the letter c repeats 6 times
Between 20 and 23, the letter d repeats 3 times
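The same two-pointer walk can be wrapped in a reusable function. This is a minimal sketch; the name longestRun and the shape of the returned object are assumptions made for the example.

```javascript
// Returns the character with the longest consecutive run and the run length.
function longestRun(str) {
  let i = 0; // start of the current run
  let j = 1; // scanning pointer
  let maxCount = 0;
  let maxChar = '';
  while (i < str.length) {
    // str[j] is undefined past the end, which also closes the last run
    if (str[i] !== str[j]) {
      if (j - i > maxCount) {
        maxCount = j - i;
        maxChar = str[i];
      }
      i = j; // i catches up with j
    }
    j++; // j always moves forward
  }
  return { char: maxChar, count: maxCount };
}

console.log(longestRun('aabbbbc')); // { char: 'b', count: 4 }
```

Each character is visited once by j, so the work grows linearly with the length of the string.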

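The recursive idea

[Example 1] The Fibonacci sequence: compute the term at index n.
Solution:
function fib(n) {
  // Terms 0 and 1 are both 1; every later term is the sum of the previous two
  return n === 0 || n === 1 ? 1 : fib(n - 1) + fib(n - 2);
}
This is easy enough, but it raises a question: does the code repeat a lot of computation, and how can the repetition be avoided?
The answer is caching: use the hasOwnProperty method to check whether a value has already been computed.
This reduces the total number of recursive calls. Whenever the cache is hit, the value is read straight from the cache and no further recursion is triggered.
Solution:
var cache = {};
function fib(n) {
  // Cache hit: return the stored value without recursing
  if (cache.hasOwnProperty(n)) {
    return cache[n];
  }
  var v = n === 0 || n === 1 ? 1 : fib(n - 1) + fib(n - 2);
  // Store the result so that later calls can reuse it
  cache[n] = v;
  return v;
}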
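To see how much the cache saves, both versions can be instrumented with a call counter. The names fibPlain, fibCached, and the counters are introduced only for this comparison.

```javascript
// Plain recursion with a call counter
let plainCalls = 0;
function fibPlain(n) {
  plainCalls++;
  return n === 0 || n === 1 ? 1 : fibPlain(n - 1) + fibPlain(n - 2);
}

// Cached recursion with a call counter
let cachedCalls = 0;
const fibCache = {};
function fibCached(n) {
  cachedCalls++;
  if (Object.prototype.hasOwnProperty.call(fibCache, n)) {
    return fibCache[n];
  }
  const v = n === 0 || n === 1 ? 1 : fibCached(n - 1) + fibCached(n - 2);
  fibCache[n] = v;
  return v;
}

const plainResult = fibPlain(20);
const cachedResult = fibCached(20);
console.log(plainResult, plainCalls);   // 10946 21891
console.log(cachedResult, cachedCalls); // 10946 39
```

Both versions return the same value, but the cached one needs only a handful of calls because every index is computed once.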
[Example 2] Convert the nested array [1, 2, [3, [4, 5], 6], 7, [8], 9] into the following object
{
  children: [
    { value: 1 },
    { value: 2 },
    {
      children: [
        { value: 3 },
        {
          children: [
            { value: 4 },
            { value: 5 }
          ]
        },
        { value: 6 }
      ]
    },
    { value: 7 },
    {
      children: [
        { value: 8 }
      ]
    },
    { value: 9 }
  ]
}

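Solution:
var arr = [1, 2, [3, [4, 5], 6], 7, [8], 9];
function convert(arr) {
  var result = [];
  // Loop over the items and recurse only when an item is itself an array
  for (let i = 0; i < arr.length; i++) {
    if (typeof arr[i] === 'number') {
      result.push({
        value: arr[i]
      });
    } else if (Array.isArray(arr[i])) {
      result.push({
        children: convert(arr[i])
      });
    }
  }
  // This version returns the children array; wrap it as { children: convert(arr) } to get the target object
  return result;
}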

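// An alternative, more concise version that maps over the array instead of looping by hand (use one version or the other).
// It recurses once for every item, numbers included, so it makes more recursive calls than the version above, not fewer.
function convert(item) {
  var result = [];
  if (typeof item === 'number') {
    // A number becomes a leaf
    return {
      value: item
    };
  } else if (Array.isArray(item)) {
    // An array becomes a node whose children are converted one by one
    return {
      children: item.map(_item => convert(_item))
    };
  }
  // Fallback for any other kind of input
  return result;
}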
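The two variants can be compared side by side. In this sketch they are renamed convertWithLoop and convertWithMap (names chosen here so that both can live in one file), and each counts its own calls.

```javascript
const nested = [1, 2, [3, [4, 5], 6], 7, [8], 9];

// Loop-based variant: recurses only into nested arrays
let loopCalls = 0;
function convertWithLoop(arr) {
  loopCalls++;
  const result = [];
  for (const item of arr) {
    if (typeof item === 'number') {
      result.push({ value: item });
    } else if (Array.isArray(item)) {
      result.push({ children: convertWithLoop(item) });
    }
  }
  return result;
}

// Map-based variant: recurses into every item, numbers included
let mapCalls = 0;
function convertWithMap(item) {
  mapCalls++;
  if (typeof item === 'number') return { value: item };
  return { children: item.map(child => convertWithMap(child)) };
}

const fromLoop = { children: convertWithLoop(nested) };
const fromMap = convertWithMap(nested);
console.log(JSON.stringify(fromLoop) === JSON.stringify(fromMap)); // true
console.log(loopCalls, mapCalls); // 4 13
```

Both produce the same tree. The loop version is called once per array (4 arrays), while the map version is called once per array and once per number (4 + 9).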

The stack structure

We all know that an array is a linear structure in which data can be inserted and deleted at any position. Sometimes, to implement a particular feature, this freedom has to be restricted. Stacks and queues are the common restricted linear structures.
A stack is a restricted linear list that is last in, first out (LIFO). The restriction is that insertion and deletion are allowed at only one end of the list, called the top of the stack; the other end is called the bottom. LIFO means that the element that entered last is the first to leave. Inserting a new element is called pushing: the new element is placed on top of the current top element and becomes the new top. Deleting an element is called popping: the top element is removed and the element next to it becomes the new top.
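In JavaScript a plain array already behaves as a stack as long as only push and pop are used. A minimal illustration (the variable names are arbitrary):

```javascript
const stack = [];

// Push three elements; 'c' ends up on top
stack.push('a');
stack.push('b');
stack.push('c');

// Peek at the top without removing it
const top = stack[stack.length - 1];

// Pop twice: elements leave in the reverse of the order they entered
const popped = [stack.pop(), stack.pop()];

console.log(top);    // c
console.log(popped); // [ 'c', 'b' ]
console.log(stack);  // [ 'a' ]
```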

[Example 1] The smartRepeat problem: smart string repetition
Turn '3[abc]' into 'abcabcabc', '3[2[a]2[b]]' into 'aabbaabbaabb', and '2[1[a]3[b]2[3[c]4[d]]]' into 'abbbcccddddcccddddabbbcccddddcccdddd'.
function smartRepeat(templateStr) {
  // Pointer (current index)
  let index = 0
  // Stack 1 holds the numbers
  let stack1 = []
  // Stack 2 holds the strings to be repeated
  let stack2 = []
  let tailStr = templateStr
  // Why loop only up to length - 1? Because the final "]" is deliberately ignored
  while (index < templateStr.length - 1) {
    // The part of the string that has not been processed yet
    tailStr = templateStr.substring(index)
    if (/^\d+\[/.test(tailStr)) {
      // Match the number in front of "["
      let word = tailStr.match(/^(\d+)\[/)[1]
      // Convert it to a number
      let num = Number(word)
      // Push onto both stacks
      stack1.push(num)
      stack2.push('')
      // Skip all the digits plus the "[" so that multi-digit numbers are consumed in one step
      index += word.length + 1
    } else if (/^\w+\]/.test(tailStr)) {
      // Match the string to be repeated in front of "]"
      let word = tailStr.match(/^(\w+)\]/)[1]
      // Replace the string on top of stack 2
      stack2[stack2.length - 1] = word
      // Move the pointer forward by the length of word so the same characters are not processed again
      index += word.length
    } else if (tailStr[0] === ']') {
      // A "]" means it is time to pop: pop both stacks, repeat the string popped from stack 2
      // as many times as the number popped from stack 1, and append it to the new top of stack 2
      let times = stack1.pop()
      let word = stack2.pop()
      stack2[stack2.length - 1] += word.repeat(times)
      index++
    } else {
      index++
    }
  }
  // After the loop, stack1 and stack2 must each hold exactly one item; otherwise the input is malformed
  if (stack1.length !== 1 || stack2.length !== 1) {
    throw new Error('The input string is malformed, please check it')
  } else {
    return stack2[0].repeat(stack1[0])
  }
}

Approach:

Walk through the string from left to right.

  • Create two stacks.
  • If the remaining string starts with a number followed by [, push the number onto stack 1 and push an empty string onto stack 2.
  • If the remaining string starts with letters followed by ], replace the top item of stack 2 with those letters.
  • If the current character is ], pop the number from stack 1 and the string from stack 2, repeat the popped string that number of times, and append the result to the new top of stack 2.
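The steps above can also be followed one character at a time, without regular expressions. The variant below is a sketch under its own assumptions rather than the article's function: it is named smartRepeatSketch, it starts stack 2 with one base entry so the final ] needs no special handling, and it appends letters to the top of stack 2 instead of replacing it.

```javascript
function smartRepeatSketch(template) {
  const counts = [];  // stack 1: repeat counts
  const parts = ['']; // stack 2: strings under construction, with a base entry
  let digits = '';    // digits collected for the number in front of "["

  for (const ch of template) {
    if (ch >= '0' && ch <= '9') {
      digits += ch;
    } else if (ch === '[') {
      // A number has just ended: push it and open a new empty string
      counts.push(Number(digits));
      digits = '';
      parts.push('');
    } else if (ch === ']') {
      if (counts.length === 0) throw new Error('Unbalanced "]"');
      // Pop both stacks and append the repeated string to the new top
      const times = counts.pop();
      const piece = parts.pop();
      parts[parts.length - 1] += piece.repeat(times);
    } else {
      // Any other character is part of the string on top of stack 2
      parts[parts.length - 1] += ch;
    }
  }
  if (counts.length !== 0) throw new Error('Unbalanced "["');
  return parts[0];
}

console.log(smartRepeatSketch('3[abc]'));      // abcabcabc
console.log(smartRepeatSketch('3[2[a]2[b]]')); // aabbaabbaabb
```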


The preparation is done. Now let us put it into practice and convert a template string into an AST tree structure.

A handwritten implementation of the AST

We need to parse the template string into an AST.
Approach:

  1. Parse the attributes of each HTML tag. The stack idea is very useful when parsing a template string, because it makes nested HTML quick to handle.
  2. Convert the template string into an AST tree structure. The algorithm used is the stack idea introduced earlier.

Parsing the attributes of an HTML tag

export default function (attrsString) {
  let result = []
  if (!attrsString) {
    return result
  } else {
    // Example input: 'class="box" title="Title" data-type="3"'
    let isMatchQuot = false // Whether we are currently inside quotation marks
    // Two pointers record the span walked since the last separator
    let i = 0
    let j = 0

    while (j < attrsString.length) {
      // The character that pointer j currently points to
      const char = attrsString.charAt(j)
      if (char === '"') {
        // A " was matched, so toggle the inside-quotes flag
        isMatchQuot = !isMatchQuot
      } else if (!isMatchQuot && char === ' ') {
        // Outside quotation marks and the current character is a space:
        // try to take the target string between pointers i and j
        const target = attrsString.substring(i, j).trim()
        if (target) {
          result.push(target)
        }

        // Let pointer i catch up with pointer j
        i = j
      }
      j++
    }
    // The loop is over, and the last attribute is still left
    result.push(attrsString.substring(i).trim())

    // filter removes empty strings
    return result.filter(item => item).map(item => {
      const res = item.match(/^(.+)="(.+)"$/)
      // Assumption: an item that is not of the form name="value" (for example a boolean attribute) is kept with an empty value
      if (!res) {
        return { name: item, value: '' }
      }
      return {
        name: res[1],
        value: res[2]
      }
    })
  }
}

Idea:
Take the attribute string, use the two pointers to cut out each attribute in turn (trimming the whitespace around it), and return every attribute as an object with a name and a value.
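For comparison, the same result can be reached with a global regular expression and String.prototype.matchAll instead of two pointers. This is a different technique from the one described above, shown only as a sketch; the name parseAttributesSketch is an assumption, and only attributes written as name="value" are recognized.

```javascript
function parseAttributesSketch(attrsString) {
  if (!attrsString) return [];
  // A name without spaces, "=" or quotes, followed by a double-quoted value
  const pattern = /([^\s="]+)\s*=\s*"([^"]*)"/g;
  const attrs = [];
  for (const match of attrsString.matchAll(pattern)) {
    attrs.push({ name: match[1], value: match[2] });
  }
  return attrs;
}

console.log(parseAttributesSketch(' class="box" title="My title" data-type="3"'));
// [
//   { name: 'class', value: 'box' },
//   { name: 'title', value: 'My title' },
//   { name: 'data-type', value: '3' }
// ]
```

Because the value pattern stops at the closing quotation mark, spaces inside a quoted value do not split the attribute, which is the same problem the isMatchQuot flag solves in the pointer version.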

Convert template string to AST tree structure

import parseAttribute from './parseAttribute'

export default function parse(templateString) {
  let index = 0
  // The part of the string that has not been processed yet
  let tailStr = templateString
  // Matches an opening HTML tag
  const startTagRegExp = /^\<([a-z]+[1-6]?)(\s[^\<]+)?\>/
  // Matches a closing HTML tag
  const endTagRegExp = /^\<\/([a-z]+[1-6]?)\>/
  // Captures the text in front of a closing tag
  const wordRegExp = /^([^\<]+)\<\/[a-z]+[1-6]?\>/

  // Prepare two stacks
  let stack1 = [] // Names of the opening tags matched so far
  let stack2 = [] // Node objects that are still open
  let result = null

  while (index < templateString.length - 1) {
    tailStr = templateString.substring(index)

    if (startTagRegExp.test(tailStr)) {
      // An opening tag was matched
      const res = tailStr.match(startTagRegExp)
      const startTag = res[1]
      const attrsString = res[2]
      // Push the tag name onto stack 1
      stack1.push(startTag)
      // Push the node object onto stack 2
      stack2.push({ tag: startTag, children: [], attrs: parseAttribute(attrsString) })
      // Length of attrsString (0 when the tag has no attributes)
      const attrsStringLength = attrsString ? attrsString.length : 0
      // Move the pointer by the tag length + 2 + attrsString.length
      // Why +2? Because < and > take up two characters
      index += startTag.length + 2 + attrsStringLength
    } else if (endTagRegExp.test(tailStr)) {
      // A closing tag was matched
      const endTag = tailStr.match(endTagRegExp)[1]
      // Both stacks are popped
      const pop_tag = stack1.pop()
      const pop_obj = stack2.pop()
      // The tag popped from stack 1 must be the same as endTag
      if (endTag === pop_tag) {
        if (stack2.length > 0) {
          stack2[stack2.length - 1].children.push(pop_obj)
        } else if (stack2.length === 0) {
          // A closing tag was matched and stack2 is now empty, so the traversal is complete
          // and the result is the item that was just popped from stack2
          result = pop_obj
        }
      } else {
        throw new Error(`Tag <${pop_tag}> is not closed`)
      }
      // Move the pointer by the tag length + 3. Why +3? Because </ and > take up three characters
      index += endTag.length + 3
    } else if (wordRegExp.test(tailStr)) {
      // Check whether the current content is text, and make sure it is not all whitespace
      const word = tailStr.match(wordRegExp)[1]
      if (!/^\s+$/.test(word)) {
        stack2[stack2.length - 1].children.push({
          text: word,
          type: '3'
        })
      }
      index += word.length
    } else {
      index++
    }
  }

  return result
}

Idea:
Move a pointer along the string and decide whether the remaining string starts with an opening tag, a closing tag, or text. Two stacks store the tag names and the node containers. When an opening tag is met, the tag name is pushed onto stack 1 and a newly created container is pushed onto stack 2. Text is added to the children of the container on top of stack 2. When a closing tag is met, the container on top of stack 2 is popped and appended to the children of the container that is now on top.
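The same stack idea can be condensed into a shorter sketch. It differs from the parser above in several ways, all of them assumptions made for brevity: it is named parseTagsSketch, it tokenizes with one global regular expression, it ignores attributes, it uses a single stack with a synthetic root, and it attaches a node to its parent when the tag opens rather than when it closes.

```javascript
function parseTagsSketch(html) {
  // A token is a closing tag, an opening tag, or a run of text
  const tokenPattern = /<\/([a-z][a-z0-9]*)>|<([a-z][a-z0-9]*)[^<>]*>|[^<]+/g;
  const root = { tag: 'root', children: [] };
  const stack = [root];

  for (const match of html.matchAll(tokenPattern)) {
    const [token, closeName, openName] = match;
    const top = stack[stack.length - 1];
    if (openName) {
      // Opening tag: create the node, attach it to the current top, and push it
      const node = { tag: openName, children: [] };
      top.children.push(node);
      stack.push(node);
    } else if (closeName) {
      // Closing tag: it must match the node on top of the stack
      if (stack.length === 1 || top.tag !== closeName) {
        throw new Error('Unexpected closing tag </' + closeName + '>');
      }
      stack.pop();
    } else if (token.trim()) {
      // Non-blank text goes into the children of the current top
      top.children.push({ text: token, type: '3' });
    }
  }
  if (stack.length !== 1) {
    throw new Error('Tag <' + stack[stack.length - 1].tag + '> is not closed');
  }
  return root.children;
}

const tree = parseTagsSketch('<div><p>hi</p><ul><li>A</li></ul></div>');
console.log(JSON.stringify(tree));
```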

import parse from './parse'

const templateString = `
<div>
    <h3 class="box" title="Title" data-id="1">Sample text</h3>
    <ul>
        <li>A</li>
        <li>B</li>
        <li>C</li>
    </ul>
</div>
`
console.log('Input template string', templateString);
const ast = parse(templateString)
console.log('Generated AST\n', ast)
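Going the other way, from the AST back to HTML, is a short recursive walk. The sketch below assumes the node shape produced by the parser above ({ tag, attrs, children } for elements and { text, type: '3' } for text). The name astToHtml and the hand-written sample tree are made up for this example; the tree is not actual output of parse.

```javascript
function astToHtml(node) {
  // Text nodes are emitted as they are
  if (node.type === '3') return node.text;
  // Attributes are printed as name="value" pairs
  const attrs = (node.attrs || [])
    .map(attr => ' ' + attr.name + '="' + attr.value + '"')
    .join('');
  // Children are generated recursively and concatenated
  const children = (node.children || []).map(child => astToHtml(child)).join('');
  return '<' + node.tag + attrs + '>' + children + '</' + node.tag + '>';
}

// Hand-written sample in the same shape as the parser's output
const sampleAst = {
  tag: 'div',
  attrs: [],
  children: [
    {
      tag: 'h3',
      attrs: [{ name: 'class', value: 'box' }],
      children: [{ text: 'Hello', type: '3' }],
    },
    {
      tag: 'ul',
      attrs: [],
      children: [{ tag: 'li', attrs: [], children: [{ text: 'A', type: '3' }] }],
    },
  ],
};

console.log(astToHtml(sampleAst));
// <div><h3 class="box">Hello</h3><ul><li>A</li></ul></div>
```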

Summary

This article covered the core of how an AST is built in Vue. The highlight is the problem-solving approach based on stacks and two pointers. I hope that after reading it you have a fresh understanding of the stack and pointer algorithms.

Origin blog.csdn.net/gaojinbo0531/article/details/129355082