Vue source code analysis: AST abstract syntax tree

1. Algorithm background:

1. Pointer thinking

        Question: Find the character with the longest consecutive run in a string.

PS: If several characters tie for the longest run, only the first one needs to be found. For example, if a and b both have the longest run and a is found first, the answer is a.

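// Pointer idea: improves traversal efficiency, each character is visited only once

var str = 'abcdaabbddccccwssdwdaddwwjjjjjjjjxxxxdd'

// i marks the start of the current run, j scans ahead
var i = 0
var j = 0

var maxRepeatChar = ''

var maxRepeatCount = 0

console.log(str)
while(i <= str.length - 1){
    // When j moves past the end, str[j] is undefined, which closes the final run
    if (str[i] != str[j]){
        // console.log('Longest so far: ' + str[i] + ' repeated ' + (j-i) + ' times')
        if(j-i > maxRepeatCount){
            maxRepeatCount = j-i;
            maxRepeatChar = str[i]
        }
        i = j
    }
    j++
}
console.log("The character with the longest consecutive run is",maxRepeatChar,'and its count is',maxRepeatCount)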

Summary:

        This question tests the pointer idea. A brute-force search with nested loops also works, but it revisits the same characters many times and wastes computation. With the two pointers i and j, the string is traversed once with no repeated work. (Note that "pointer" here means the pointer idea. In practice i and j are array subscripts, not memory addresses that are manipulated as in C.)
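The same scan can be wrapped in a reusable function. The following is a minimal sketch, not the author's code: the name longestRun and the returned { char, count } shape are assumptions made for illustration.

```javascript
// Hypothetical helper: the two-index scan from above, wrapped in a function.
function longestRun(str) {
  let start = 0;      // first index of the current run (plays the role of i)
  let maxChar = '';
  let maxCount = 0;
  // scan (the role of j) goes one step past the end so the final run is closed too
  for (let scan = 0; scan <= str.length; scan++) {
    if (str[scan] !== str[start]) {
      // Strict > keeps the first run when two runs tie
      if (scan - start > maxCount) {
        maxCount = scan - start;
        maxChar = str[start];
      }
      start = scan;
    }
  }
  return { char: maxChar, count: maxCount };
}

console.log(longestRun('aabbbcc')); // { char: 'b', count: 3 }
console.log(longestRun('aabb'));    // { char: 'a', count: 2 } (a tie, first run wins)
```

Each character is compared once, so the work grows linearly with the length of the string.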

2. Fibonacci sequence:

        Question: Find the nth term of the Fibonacci sequence. The sequence itself needs little introduction. It is a classic application of recursion, but the recursive solution has a problem: repeated calculation. For example, solving fib(9) requires fib(8) and fib(7), and solving fib(8) requires fib(7) and fib(6), so fib(7) is calculated more than once.

// Classic recursion: the Fibonacci sequence (here fib(0) and fib(1) are both 1)

function fib(n){
    return n==0 || n==1 ? 1: fib(n-1) + fib(n-2)
}

for (let i =0;i<=9;i++){
    console.log(fib(i))
}

2.1 Fibonacci + cache idea

        Background: The implementation in section 2 is correct, but it performs a large number of repeated calculations, which wastes computing power. The idea of caching solves this: before calculating fib(n), check the cache. If the value has already been calculated, return it directly without calculating again, which saves a great deal of work.

// Classic recursion: the Fibonacci sequence, without the repeated calculation

// Create the cache object
var cache = {}

function fib(n){
    // Check whether the cache already holds this value; if so, use it directly
    if(cache.hasOwnProperty(n)){
        // Returning here means the logic below is skipped
        return cache[n]
    }
    // The cache has no value for n yet
    var v = n==0 || n==1 ? 1: fib(n-1) + fib(n-2)
    // So write it into the cache
    cache[n] = v
    return v
}

for (let i =0;i<=9;i++){
    console.log(fib(i))
    console.log(cache)
}
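To make the saving concrete, the sketch below counts how many times each version is called while computing fib(9). The names fibNaive, fibCached, fibCache and the two counters are introduced here for illustration only; the logic mirrors the two versions above.

```javascript
// Count the calls made by the plain recursive version.
let naiveCalls = 0;
function fibNaive(n) {
  naiveCalls++;
  return n === 0 || n === 1 ? 1 : fibNaive(n - 1) + fibNaive(n - 2);
}

// Count the calls made by the cached version (starting from an empty cache).
let cachedCalls = 0;
const fibCache = {};
function fibCached(n) {
  cachedCalls++;
  if (Object.prototype.hasOwnProperty.call(fibCache, n)) return fibCache[n];
  const v = n === 0 || n === 1 ? 1 : fibCached(n - 1) + fibCached(n - 2);
  fibCache[n] = v;
  return v;
}

console.log(fibNaive(9), naiveCalls);   // 55 109
console.log(fibCached(9), cachedCalls); // 55 17
```

The plain version needs 109 calls for fib(9), while the cached version needs 17, and the gap widens quickly as n grows.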

3. Recursion example 2 (whenever the same rule repeats at every level, think of recursion)

        Question: Convert the nested array [1, 2, [3, [4, 5], 6], 7, [8], 9] into a nested object structure in which every number n becomes { value: n } and every array becomes { children: [...] }.

// Test array
var arr = [1,2,[3,[4,5],6],7,[8],9]

// Conversion function
function convert(arr) {
    // Prepare the result array
    var result = []
    // Traverse every item of the arr that was passed in
    for (let i=0;i<arr.length;i++){
        // If the item is a number, put it in directly
        if(typeof arr[i] == 'number'){
            result.push({
                value: arr[i]
            })
        } else if (Array.isArray(arr[i])){
            // If the item is an array, recurse
            result.push({
                children:convert(arr[i])
            })
        }
    }
    return result
}

Writing method two:

        This version uses map. Why think of mapping? The test array arr really has only 6 top-level items: 1, 2, [3, [4, 5], 6], 7, [8], 9. Each item maps to exactly one output node. A number maps to a value node, and an array such as [3, [4, 5], 6] maps to a children node whose content is the mapping of its own items.

        PS: Method one makes far fewer recursive calls than method two, because method one recurses only on arrays, while method two calls convert on every item: item.map(_item => convert(_item)) recurses into each item during traversal.

// Writing method two
// The parameter is no longer called arr: item may be an array or a number
function convert(item){
    if(typeof(item) == 'number'){
        return {
            value: item
        }
    } else if(Array.isArray(item)) {
        // If the parameter passed in is an array
        return {
            children:item.map(_item => convert(_item))
        }
    }
}
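The difference in the number of calls can be measured directly. In the sketch below the two versions are renamed convertOne and convertTwo and given call counters; those names and counters are additions for illustration, and the conversion logic is otherwise the same as above.

```javascript
const arr = [1, 2, [3, [4, 5], 6], 7, [8], 9];

// Writing method one: recurses only when an item is an array.
let callsOne = 0;
function convertOne(list) {
  callsOne++;
  const result = [];
  for (const item of list) {
    if (typeof item === 'number') result.push({ value: item });
    else if (Array.isArray(item)) result.push({ children: convertOne(item) });
  }
  return result;
}

// Writing method two: called once for every item, number or array.
let callsTwo = 0;
function convertTwo(item) {
  callsTwo++;
  if (typeof item === 'number') return { value: item };
  if (Array.isArray(item)) return { children: item.map(_item => convertTwo(_item)) };
}

const one = convertOne(arr);
const two = convertTwo(arr);
console.log(callsOne, callsTwo); // 4 13
// Method one returns the array of nodes; method two wraps it in { children }.
console.log(JSON.stringify(two.children) === JSON.stringify(one)); // true
```

Method one is called once per array (4 arrays including the outer one), while method two is called once for the outer array plus once per item at every level (1 + 6 + 3 + 2 + 1 = 13).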

4. Stack

        A stack is a linear list with restricted operations: insertion and deletion can only be performed at one end of the list. This end is called the top of the stack, and the other end is called the bottom.

        Inserting a new element into a stack is called pushing; deleting an element from a stack is called popping.

        Last-in-first-out (LIFO): among the elements in the stack, the first one pushed must be the last one popped, and the last one pushed must be the first one popped.

        In JavaScript, a stack can be simulated with an array, as long as you restrict yourself to push() and pop() and do not use unshift() and shift(). That is, the end of the array is the top of the stack.

        Of course, you can also encapsulate a stack more cleanly, for example with an object-oriented approach.
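As one possible encapsulation, here is a minimal sketch of a stack class built on an array. The class name Stack and its methods (peek, size, isEmpty) are illustrative choices, not part of the original article.

```javascript
// Hypothetical Stack class: only the top of the stack is reachable from outside.
class Stack {
  #items = [];                 // private array, its end is the top of the stack

  push(item) {                 // put an element on the top
    this.#items.push(item);
  }

  pop() {                      // remove and return the top element
    return this.#items.pop();
  }

  peek() {                     // read the top element without removing it
    return this.#items[this.#items.length - 1];
  }

  get size() {
    return this.#items.length;
  }

  isEmpty() {
    return this.#items.length === 0;
  }
}

const stack = new Stack();
stack.push('div');
stack.push('ul');
stack.push('li');
console.log(stack.pop());  // li  (last in, first out)
console.log(stack.peek()); // ul
console.log(stack.size);   // 2
```

Hiding the array behind a private field means callers cannot reach for shift() or unshift() by accident.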

4.1 Stack question

        Problem-solving ideas:

① The stack data structure is often used in lexical analysis.

② A big pitfall for beginners: stack questions look very similar to recursion questions.

        Try writing a "smart repeat" function, smartRepeat, that does the following:

        Change 3[abc] to abcabcabc.
        Change 3[2[a]2[b]] to aabbaabbaabb.
        Change 2[1[a]3[b]2[3[c]4[d]]] to abbbcccddddcccddddabbbcccddddcccdddd.

        Illegal input strings do not need to be considered, for example:

        2[a3[b]] is wrong; a 1 should be added, that is, 2[1[a]3[b]].

        [abc] is wrong; a 1 should be added, that is, 1[abc].

Prerequisite knowledge:

 

 Code part:

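function smartRepeat(templateStr){
    // Pointer
    var index = 0
    // Two stacks
    // Stack 1 stores the repeat counts
    var stack1 = []
    // Stack 2 stores the temporary strings
    var stack2 = []
    // Remaining part
    var rest = templateStr

    // The loop stops before the last character (the final ]), which is handled by the return statement below
    while(index < templateStr.length - 1){
        // Remaining part
        rest = templateStr.substring(index)
        // console.log(templateStr[index],rest)

        // Check whether the remaining part starts with digits followed by [
        if (/^\d+\[/.test(rest)){
            // console.log("starts with a number")
            // Get the number
            let times = Number(rest.match(/^(\d+)\[/)[1])
            // Push the number onto stack 1, and push an empty string onto stack 2
            stack1.push(times)
            stack2.push('')

             // Move the pointer forward by the number of digits in times, plus 1 for the left bracket
             index += times.toString().length + 1
        }else if(/^\w+\]/.test(rest)){
            // If the remaining part starts with letters followed by ], replace the top of stack 2 with those letters
            // [1] is needed because match returns an array, and the captured group is at index 1
            let word = rest.match(/^(\w+)\]/)[1]
            stack2[stack2.length - 1] = word

            // Move the pointer forward by the length of word
            index += word.length
        } else if(rest[0] ==']'){
            // If the character is ]: ① pop stack1, ② pop stack2, ③ repeat the popped string
            // the popped number of times and append it to the new top of stack2
            let times = stack1.pop()
            let word = stack2.pop()

            // repeat is an ES6 method
            stack2[stack2.length - 1] += word.repeat(times)


            index++
        }
        console.log(index,stack1,stack2)
    }

    // After the loop, stack1 and stack2 each hold exactly one item; return the string repeated that many times.
    // If the result or the number of items is wrong, the input is probably malformed, for example an unclosed bracket
    return stack2[0].repeat(stack1[0])
}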

console.log(smartRepeat('3[2[a]2[b]]'))
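The special handling of the final ] can be avoided by seeding the string stack with an empty string, the same trick parse.js uses below with its default children item. The following is a sketch of that variation, not the author's implementation: the name smartRepeatWithSentinel is hypothetical, and letters are appended one character at a time instead of being matched with a regular expression. It assumes well-formed input, as the question does.

```javascript
// Hypothetical variation: a sentinel '' at the bottom of the string stack
// lets every ] be processed the same way, including the last one.
function smartRepeatWithSentinel(templateStr) {
  const counts = [];   // repeat counts (the role of stack1)
  const parts = [''];  // partial strings (the role of stack2), seeded with a sentinel
  let index = 0;

  while (index < templateStr.length) {
    const rest = templateStr.substring(index);
    const open = rest.match(/^(\d+)\[/);
    if (open) {
      // A count followed by [: open a new level
      counts.push(Number(open[1]));
      parts.push('');
      index += open[0].length;
    } else if (rest[0] === ']') {
      // Close the current level and fold it into the level below
      const times = counts.pop();
      const piece = parts.pop();
      parts[parts.length - 1] += piece.repeat(times);
      index++;
    } else {
      // Any other character is literal text for the current level
      parts[parts.length - 1] += rest[0];
      index++;
    }
  }
  return parts[0];
}

console.log(smartRepeatWithSentinel('3[abc]'));      // abcabcabc
console.log(smartRepeatWithSentinel('3[2[a]2[b]]')); // aabbaabbaabb
```

With the sentinel in place the loop can run to the end of the string, and the result is simply whatever has accumulated in the bottom item.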

2. AST abstract syntax tree implementation:

1.index.js

import parse from "./parse"

var templateString = `
        <div>
            <h3 class="23 aa cc" id="66" data-n="8">你好</h3>
            <ul>
                <li>A</li>
                <li>B</li>
                <li>C</li>
            </ul>
        </div>
`
const ast = parse(templateString)

console.log(ast)

2.parse.js

import parseAttrsString from "./parseAttrsString"
// parse function: the main function
export default function (templateString){
    // Pointer
    var index = 0
    // Remaining part
    var rest = ''
    // Start tag
    // var statRegExp = /^\<([a-z]+[1-6]?)\>/
    // The rule has to be extended once tags can carry attrs
    var statRegExp = /^\<([a-z]+[1-6]?)(\s[^\<]+)?\>/
    // End tag
    var endRegExp = /^\<\/([a-z]+[1-6]?)\>/
    // Text in front of an end tag
    var wordRegExp = /^([^\<]+)\<\/[a-z]+[1-6]?\>/
    // Prepare two stacks
    var stack1 = []
    // Seed stack2 with a default item that has children, so there is no need to check whether the stack is empty
    var stack2 = [{'children':[]}]

    while(index < templateString.length -1) {
        rest = templateString.substring(index)
        // console.log(rest)
        // Check whether the remaining part begins with a start tag
        if (statRegExp.test(rest)){
                let tag = rest.match(statRegExp)[1]
                let attrsString = rest.match(statRegExp)[2]
                console.log("Start tag detected",tag)
                // Push the start tag onto stack1
                stack1.push(tag)
                // Push a node with an empty children array onto stack2
                stack2.push({'tag':tag,'children':[],'attrs':parseAttrsString(attrsString)})
                // console.log(stack1,stack2,'after push')
                // Total length of the attrs string
                const attrsStringLength = attrsString != null ? attrsString.length : 0

                // +2 for the < and > characters
                index += tag.length + 2 +attrsStringLength
        }else if(endRegExp.test(rest)){
            // Check whether the remaining part begins with an end tag
            let tag = rest.match(endRegExp)[1]
            console.log("End tag detected",tag)
            let pop_tag = stack1.pop()
            // The end tag must match the tag that was pushed most recently
            if(tag == pop_tag){  
                let pop_arr = stack2.pop()
                if(stack2.length >0){
                    stack2[stack2.length - 1].children.push(pop_arr)
                }
            }else {
                // pop_tag is the tag that was opened but not closed correctly
                throw new Error(pop_tag+' tag is not closed!')
            }
            // Move the pointer by the tag length plus 3, because </ and > take up 3 characters
            index += tag.length +3
            // console.log(stack1,stack2,'after pop')
        }else if(wordRegExp.test(rest)){
            let word = rest.match(wordRegExp)[1]
            // Check whether word is only whitespace
            if (!/^\s+$/.test(word)){
                console.log('Text detected',word)
                stack2[stack2.length - 1].children.push({'text':word,'type':3})
            }
            index += word.length
        }
        else{
            index ++
        }
        
    }
    console.log(stack2)
    // At this point stack2 holds only the default item placed at the start; return the first of its children
    return stack2[0].children[0]
}

3.parseAttrsString.js

export default function(attrsString){
    if (attrsString == undefined) return []

    // Whether the scan is currently inside double quotes
    var isYinhao = false
    // Break point: where the current attribute starts
    var point = 0
    // Result array
    var result = []


    // Traverse attrsString; a simple split on spaces would break values that contain spaces
    for (let i=0;i<attrsString.length;i++) {
        let char = attrsString[i]
        if (char == '"'){
            isYinhao = !isYinhao
        }else if (char == ' ' && !isYinhao){
            // A space outside quotes ends the current attribute
            if(!/^\s*$/.test(attrsString.substring(point,i))){
                result.push(attrsString.substring(point,i).trim())
                point = i
            }  
        }
    }
    // After the loop, the last attribute k="v" is still left over
    result.push(attrsString.substring(point).trim())


    // map is used because it guarantees the number of items neither grows nor shrinks
    // It turns ['k="v"','k="v"','k="v"'] into [{name:k,value:v},{name:k,value:v},{name:k,value:v}]
    result = result.map(item =>{
        // Split on the equals sign
        const o = item.match(/^(.+)="(.+)"$/)
        return{
            name:o[1],
            value:o[2]
        }
    })
    
    // Return the parsed attributes so that parse.js can store them in attrs
    return result
}
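For comparison, the same name/value pairs can be extracted with a single global regular expression. This is an alternative sketch, not the approach taken in parseAttrsString.js: the function name parseAttrsWithRegex is hypothetical, and like the original it only handles attributes written as name="value" with double quotes.

```javascript
// Hypothetical alternative: pull out name="value" pairs with String.prototype.matchAll.
function parseAttrsWithRegex(attrsString) {
  if (attrsString == null) return [];
  // name: any run without whitespace or '='; value: everything up to the closing quote
  const pattern = /([^\s=]+)="([^"]*)"/g;
  return Array.from(attrsString.matchAll(pattern), m => ({ name: m[1], value: m[2] }));
}

console.log(parseAttrsWithRegex(' class="23 aa cc" id="66" data-n="8"'));
// [
//   { name: 'class', value: '23 aa cc' },
//   { name: 'id', value: '66' },
//   { name: 'data-n', value: '8' }
// ]
```

Because the quoted value is matched as a unit, the spaces inside "23 aa cc" do not split the attribute, which is the same problem the isYinhao flag solves in the character-by-character version.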

Implementation ideas and methods:

        The implementation combines the pointer idea with the stack data structure, which is often used in lexical analysis. The pointer makes traversal cheap: the parser never has to start again from the beginning. Once a part has been processed, the pointer simply moves forward, that is, index += the number of characters just consumed, and the part that still needs to be processed is held in the rest variable.

        The string being processed is a DOM structure string, and two stacks, simply named stack1 and stack2, are used for storage. Why two stacks? A DOM structure string has end tags as well as start tags, with complex nesting, and this fits a stack very well: the innermost open tag is always the next one to be closed. For example, if the innermost open tag is <li>, the first end tag encountered must be </li>, because these HTML tags appear in pairs. The main job of stack1 is to make the tags enter and leave the stack in the expected order, while stack2 stores the data that is operated on when they enter and leave. For example, the outermost <div> has properties such as tag and children, so when <div> is pushed onto stack1, an object with those properties is pushed onto stack2. When </div> is reached, the matching data is then easy to find. This idea is roughly the same as the smartRepeat question in section 4.1; the difference is that regular expressions are used to pick out the parts of the string that match each case.
For example, the end tag pattern is as follows:

var endRegExp = /^\<\/([a-z]+[1-6]?)\>/
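To see what the three patterns in parse.js capture, they can be tried on short strings. In this sketch the unnecessary backslashes before < and > are dropped, which does not change what the patterns match; the variable names and sample strings are local to the example.

```javascript
const startRegExp = /^<([a-z]+[1-6]?)(\s[^<]+)?>/;  // same pattern as statRegExp in parse.js
const endRegExp = /^<\/([a-z]+[1-6]?)>/;
const wordRegExp = /^([^<]+)<\/[a-z]+[1-6]?>/;

// Start tag: group 1 is the tag name, group 2 the raw attribute string (with its leading space)
const open = '<h3 class="aa" id="66">Hi</h3>'.match(startRegExp);
console.log(open[1]); // h3
console.log(open[2]); //  class="aa" id="66"

// End tag: group 1 is the tag name
const close = '</h3></div>'.match(endRegExp);
console.log(close[1]); // h3

// Text: group 1 is everything in front of the next end tag
const text = 'Hi</h3>'.match(wordRegExp);
console.log(text[1]); // Hi

// An end tag is not mistaken for a start tag
console.log(startRegExp.test('</h3>')); // false
```

All three patterns are anchored with ^, so each one only inspects the beginning of rest, which is what lets the parser decide what sits at the current pointer position.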

When a start tag is encountered, the tag name is pushed onto stack1 and an object carrying the tag's attributes is pushed onto stack2. When an end tag is encountered, stack1 is popped; at the same time the current node is popped from stack2 and appended to the children of the item below it, which is its parent. By the time the outermost </div> is reached, all inner elements have already been folded into their parents, so what is finally stored in stack2 is the data assembled layer by layer from the innermost level outward, which is the data structure we ultimately need.
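For the template in index.js, the resulting tree should have roughly the shape below. This is a sketch obtained by tracing the code by hand, and it assumes that parseAttrsString returns its array of { name, value } objects. The helper collectText is hypothetical and only serves to show how such a tree can be walked.

```javascript
// Expected shape of the AST for the index.js template (hand-traced, see assumptions above).
const expectedAst = {
  tag: 'div',
  children: [
    {
      tag: 'h3',
      children: [{ text: '你好', type: 3 }],
      attrs: [
        { name: 'class', value: '23 aa cc' },
        { name: 'id', value: '66' },
        { name: 'data-n', value: '8' },
      ],
    },
    {
      tag: 'ul',
      children: [
        { tag: 'li', children: [{ text: 'A', type: 3 }], attrs: [] },
        { tag: 'li', children: [{ text: 'B', type: 3 }], attrs: [] },
        { tag: 'li', children: [{ text: 'C', type: 3 }], attrs: [] },
      ],
      attrs: [],
    },
  ],
  attrs: [],
};

// Hypothetical helper: walk the tree depth-first and gather the text nodes (type 3).
function collectText(node) {
  if (node.type === 3) return [node.text];
  return node.children.flatMap(child => collectText(child));
}

console.log(collectText(expectedAst)); // [ '你好', 'A', 'B', 'C' ]
```

Whitespace between tags does not appear in the tree because parse.js skips text that consists only of whitespace.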

 

Reference video:

[尚硅谷 (Atguigu)] Vue source code analysis: AST abstract syntax tree

Some picture sources:

[尚硅谷 (Atguigu)] Vue source code analysis: AST abstract syntax tree

Origin blog.csdn.net/weixin_54515240/article/details/130261374