带你玩转babel工具链（二）@babel/traverse

一、前言

本篇源码，大多基于Babel 插件通关秘籍推荐直接去看~ 本文为总结文章。光神yyds~

在上一章

带你玩转工具链（一）@babel/parser

我们学习了@babel/parser的基本用法，了解了parser是通过配置支持不同语法。

本文将继续学习另一个babel非常核心的库@babel/traverse。目前大多数的babel插件都是基于这个库，做了很多代码转换，所以本文将通过手写一个@babel/traverse来了解其原理。

下面我们先看一个简单的示例，了解下如何使用

二、例子

import * as parser from "@babel/parser";
import traverse from "@babel/traverse";

const code = `function square(n) {
  return n * n;
}`;

const ast = parser.parse(code);

traverse(ast, {
  enter(path) {
    if (path.isIdentifier({ name: "n" })) {
      path.node.name = "x";
    }
  },
});

通过parser解析除了ast, 然后调用traverse对ast做了进一步的转换

let n = 1
// 转换后
let x = 1

以上就是一个简单的例子，下面了解下traverse的写法

三、traverse的写法

traverse支持三种写法

第一种：

traverse(ast, {
  enter(path) {
  },
  exit(path) {
  }
});

上面这种方式比较常用，那enter, exit在遍历ast节点的时候是如何执行的呢?

可以看下伪代码

function traverse(path) {
    // 递归子孙节点之前执行
    enter(path)
    
    traverse(child)
    
    // 子孙节点遍历完后，回溯时执行
    exit(path)
}

第二种

traverse(ast, {
    FunctionDeclaration(path)  {
    },
    xxx(path) {
    },
    // xxx
});

可以直接通过节点的类型操作AST节点。

大概的实现如下

function traverse(path) {
    // 获取当前节点类型
    const type = path.node.type
    // 传入的配置，并执行对应节点的回调
    visitor[type](path)
    
    traverse(child)
}

第三种

traverse(ast, {
    'FunctionDeclaration|VariableDeclaration'(path)  {
    },
    xxx(path) {
    },
    // xxx
});

这种写法比较特殊，用|分割不同的节点类型，可以同时命中多个，原理同上。

四、path

从上面的用法可以发现，有很多地方都会用到path，那path上到底有哪些属性和方法呢？

path的属性

path中的属性并不是很多，但是都很关键，可以看到有如下属性：

path {
    // 属性：
    node // 当前 AST 节点
    parent // 父 AST 节点
    parentPath // 父 AST 节点的 path
    scope // 作用域
    hub // 可以通过 path.hub.file 拿到最外层 File 对象， path.hub.getScope 拿到最外层作用域，path.hub.getCode 拿到源码字符串
    container // 当前 AST 节点所在的父节点属性的属性值
    key // 当前 AST 节点所在父节点属性的属性名或所在数组的下标
    listKey // 当前 AST 节点所在父节点属性的属性值为数组时 listkey 为该属性名，否则为 undefined
}

我们常用的主要有node, parentPath, parent, 那scope、hub又是什么呢？

path的特殊属性：scope

举个例子，有如下代码，声明了两个变量

// 变量v1
let v1 = 1
function fn() {
  // 变量v2
  let v2 = 2
  console.log(v2)
}

fn()

如果我想要在遍历到v2变量的时候访问父作用域下的v1变量，就可以通过path.scope.parent.bindings.v1获取到。

那么scope有哪些属性呢？

scope.bindings 当前作用域内声明的所有变量
scope.block 生成作用域的 block，例如FunctionDeclaration, Program等AST节点
scope.path 生成作用域的节点对应的 path，例如FunctionDeclaration, Program等AST节点的path
scope.references 所有 binding 的引用对应的 path，详见下文
scope.dump() 打印作用域链的所有 binding 到控制台
scope.parentBlock() 父级作用域的 block
getAllBindings() 从当前作用域到根作用域的所有 binding 的合并
getBinding(name) 查找某个 binding，从当前作用域一直查找到根作用域
getOwnBinding(name) 从当前作用域查找 binding
parentHasBinding(name, noGlobals) 查找某个 binding，从父作用域查到根作用域，不包括当前作用域。可以通过 noGlobals 参数指定是否算上全局变量（比如console，不需要声明就可用），默认是 false
removeBinding(name) 删除某个 binding
hasBinding(name, noGlobals) 从当前作用域查找 binding，可以指定是否算上全局变量，默认是 false
moveBindingTo(name, scope) 把当前作用域中的某个 binding 移动到其他作用域
generateUid(name) 生成作用域内唯一的名字，根据 name 添加下划线，比如 name 为 a，会尝试生成 _a，如果被占用就会生成 __a，直到生成没有被使用的名字。

上面的方法和属性有很多，就不一个个说明了，我们能通过scope很方便的操作作用域中的某个变量。完全不需要我们手动去获取对应的AST了！

path的特殊属性：hub

hub上面挂载了很多方法，其实都是file对象上的方法，推荐大家直接通过path.hub.file直接使用

hub {
  file: File,
  getCode: () => file.code,
  getScope: () => file.scope,
  addHelper: file.addHelper.bind(this), // 为代码添加运行时的helper
  buildError: file.buildCodeFrameError.bind(this)
}

其实hub的属性，主要依赖了file类的实例对象, 我们定位到File的源码位置 node_modules\@babel\core\lib\transformation\file\file.js

class File {
  constructor(options, {
    code,
    ast,
    inputMap
  }) {
    this.hub = {
      file: this,
      getCode: () => this.code,
      getScope: () => this.scope,
      addHelper: this.addHelper.bind(this),
      buildError: this.buildCodeFrameError.bind(this)
    };
    this.code = code;
    this.ast = ast;
    this.scope = this.path.scope;
    // 省略..
  }
  set(key, val) {}
  
  get(key) {}
  
  has(key) {}
  
  getModuleName() {}

  addHelper(name) {}
  
  // 省略..
}

File对象是@babel/core中声明的类，其中有几个重要的属性和方法

code：源代码
ast: ast对象
scope：根作用域
path: path对象
set, get: 可以在file上设置和获取自定义的属性值。
getModuleName：获取模块的路径
addHelper：通过AST操作，注入运行时代码

关于addHelper是干嘛的，后续文章将逐步展开，大家敬请期待。

path的方法

inList() 判断节点是否在数组中，如果 container 为数组，也就是有 listkey 的时候，返回 true
get(key) 获取某个属性的 path
set(key, node) 设置某个属性的值
getSibling(key) 获取某个下标的兄弟节点
getNextSibling() 获取下一个兄弟节点
getPrevSibling() 获取上一个兄弟节点
getAllPrevSiblings() 获取之前的所有兄弟节点
getAllNextSiblings() 获取之后的所有兄弟节点
find(callback) 从当前节点到根节点来查找节点（包括当前节点），调用 callback（传入 path）来决定是否终止查找
findParent(callback) 从当前节点到根节点来查找节点（不包括当前节点），调用 callback（传入 path）来决定是否终止查找
isXxx(opts) 判断当前节点是否是某个类型，可以传入属性和属性值进一步判断，比如path.isIdentifier({name: 'a'})
assertXxx(opts) 同 isXxx，但是不返回布尔值，而是抛出异常
insertBefore(nodes) 在之前插入节点，可以是单个节点或者节点数组
insertAfter(nodes) 在之后插入节点，可以是单个节点或者节点数组
replaceWith(replacement) 用某个节点替换当前节点
replaceWithMultiple(nodes) 用多个节点替换当前节点
replaceWithSourceString(replacement) 解析源码成 AST，然后替换当前节点
remove() 删除当前节点
traverse(visitor, state) 遍历当前节点的子节点，传入 visitor 和 state（state 是不同节点间传递数据的方式）
skip() 跳过当前节点的子节点的遍历
stop() 结束所有遍历

具体用法，大家可以逐个尝试，就不一一展开了

五、实现@babel/traverse

通过上面的学习，相信大家对@babel/traverse有个大概的认识，下面我们来一块手写@babel/traverse来加深对源码的理解。

第一步：声明节点可遍历的属性

下面的代码中，定义了各种可遍历的属性，例如FunctionDeclaration函数声明，包含函数体、参数、函数名，因为这些属性都可能对应一个或多个节点。我们需要标识出它的那些属性可以遍历，可以很方便的递归子节点。

下面图片中body包含了一个或多个AST Node, 所以需要标记这个属性是可以遍历的。

const astDefinationsMap = new Map();

astDefinationsMap.set('Program', {
    visitor: ['body']
});
astDefinationsMap.set('VariableDeclaration', {
    visitor: ['declarations']
});
astDefinationsMap.set('VariableDeclarator', {
    visitor: ['id', 'init']
});
astDefinationsMap.set('Identifier', {});
astDefinationsMap.set('NumericLiteral', {});
astDefinationsMap.set('FunctionDeclaration', {
    visitor: ['id', 'params', 'body']
});
astDefinationsMap.set('BlockStatement', {
    visitor: ['body']
});
astDefinationsMap.set('ReturnStatement', {
    visitor: ['argument']
});
astDefinationsMap.set('BinaryExpression', {
    visitor: ['left', 'right']
});
astDefinationsMap.set('ExpressionStatement', {
    visitor: ['expression']
});
astDefinationsMap.set('CallExpression', {
    visitor: ['callee', 'arguments']
});

第二步：遍历节点

下面我们直接看下ast是如何遍历的：

function traverse(node, visitors, parent, parentPath, key, listKey) {
  const defination = visitorKeys.get(node.type);
  // 获取访问器
  let visitorFuncs = visitors[node.type] || {};
  // 统一处理成enter
  if(typeof visitorFuncs === 'function') {
      visitorFuncs = {
          enter: visitorFuncs
      }
  }
  
  // 创建NodePath
  const path = new NodePath(node, parent, parentPath, key, listKey);
  
  // 调用enter
  visitorFuncs.enter && visitorFuncs.enter(path);
  
  // 判断是否跳过，path.skip()会标记该属性
  if(node.__shouldSkip) {
     delete node.__shouldSkip;
     return;
  }

  if (defination.visitor) {
      defination.visitor.forEach(key => {
          const prop = node[key];
          if (Array.isArray(prop)) {
              prop.forEach((childNode, index) => {
                  // 传入 path, key, index
                  traverse(childNode, visitors, node, path, key, index);
              })
          } else {
              traverse(prop, visitors, node, path, key);
          }
      })
  }
  
  // 执行退出exit
  visitorFuncs.exit && visitorFuncs.exit(path);
}

以上代码可以解释成这几步

统一visitor为enter方法
创建NodePath
执行enter方法
判断是否跳过递归
递归子ast节点'
执行exit方法，退出

第三步：定义NodePath类

这里的NodePath类，其实就是我们在遍历AST时参数接收的path,在遍历到新的节点时都会创建一个NodePath对象。由于path能通过parentPath获取父节点，也可以通过find查找父节点，等等各种属性和方法。所以我们需要实现一个类。具备以下功能：

能建立父子ast节点的联系
能删除当前node
能支持查找父节点
可跳过递归
...

下面就是代码的具体实现：

class NodePath {
  constructor(node, parent, parentPath, key, listKey) {
    this.node = node; // 记录当前ast的node
    this.parent = parent; // 记录父节点的node
    this.parentPath = parentPath; // 记录当前path的父path
    this.key = key;// 当前节点所在的Key，因为可能在父node节点的某个属性里，例如FunctionDeclaration的body
    this.listKey = listKey; // 当前节点所在的索引，因为可能在父node节点的某个属性里，例如FunctionDeclaration的body，对应在body的第几个
  }
  
  // 标记当前节点需要跳过
  skip() {
     this.node.__shouldSkip = true;
  }
  
  // 获取父节点对应的属性，并删除当前的节点
  remove () {
      if (this.listKey != undefined) {
        this.parent[this.key].splice(this.listKey, 1);
      } else {
        this.parent[this.key] = null;
      }
  }
  
  // 查找父path, 从父path开始查
  findParent(callback) {
      let curPath = this.parentPath;
      while (curPath && !callback(curPath)) {
        curPath = curPath.parentPath;
      }
      return curPath;
  }
  // 查找path， 从当前节点开始查
  find(callback) {
      let curPath = this;
      while (curPath && !callback(curPath)) {
          curPath = curPath.parentPath;
      }
      return curPath;
  }
}

可以注意到，构造函数中初始化了key和listKey，这两个有啥用呢？

注意图片中标识的地方，如果我们想要删除VaribleDeclaration这个节点，是不可能删除自己的，需要找到父节点上的body, 还要知道在body中第几个, 那么key就对应了body, listKey对应了索引。

第三步：Bingding类

在前面我们提到过path.scope这个属性对象，我们知道它可以通过path.scope.binding获取到作用域下的变量，那traverse是怎么存储这些变量呢？其实它为每一个变量声明都创建了一个Bingding对象，并存储在path.scope.bindings。

我们先来实现一下：

class Binding {
  constructor(id, path, scope, kind) {
    this.id = id;
    this.path = path;
    this.referenced = false;
    this.referencePaths = [];
  }
}

这里有两个变量需要注意:

referenced: 变量是否使用（引用）
referencePaths: 引用这个变量的对应path数组，因为可能有多个地方都用到了这个变量

第四步：Scope类

上面我们简单实现了Binding类，那自然也要实现Scope了。

我们可以先看下代码：

class Scope {
  constructor(parentScope, path) {
    this.parent = parentScope; // 父作用域
    this.bindings = {}; // 绑定的变量
    this.path = path; // 作用域对应节点的path
    
    path.traverse({
      VariableDeclarator: (childPath) => {
        // 变量声明，创建一个binding
        this.registerBinding(childPath.node.id.name, childPath);
      },
      FunctionDeclaration: (childPath) => {
        // 跳过子节点遍历
        childPath.skip();
        // 函数声明，创建一个binding
        this.registerBinding(childPath.node.id.name, childPath);
      },
      Identifier: childPath =>  {
        // 如果这个标识符 没有被定义过，就将其记录到referencePaths
        if (!childPath.findParent(p => p.isVariableDeclarator() || p.isFunctionDeclaration())) {
          const id = childPath.node.name;
          const binding = this.getBinding(id);
          if (binding) {
            binding.referenced = true;
            binding.referencePaths.push(childPath);
          }
        }
      }
    });
  }
 
  registerBinding(id, path) {
    this.bindings[id] = new Binding(id, path);
  }
 
  get scope() {
      // 防止重复创建scope
      if (this.__scope) {
        return this.__scope;
      }
      // 判断是否是block节点
      const isBlock = this.isBlock();
      // 这里获取父作用域的scope 又会触发父节点的get scope， 会递归向上查找
      const parentScope = this.parentPath && this.parentPath.scope;
      // 如果当前节点是有作用域的，就创建一个Scope
      return this.__scope = isBlock ? new Scope(parentScope, this) : parentScope;
  }
 
  isBlock() {
    return types.visitorKeys.get(this.node.type).isBlock;
  }
  
  getOwnBinding(id) {
    return this.bindings[id];
  }

  getBinding(id) {
    // 获取当前作用域是否有该变量声明
    let res = this.getOwnBinding(id);
    // 没有就继续向父作用域查找
    if (res === undefined && this.parent) {
        res = this.parent.getBinding(id);
    }
    return res;
  }

  hasBinding(id) {
    return !!this.getBinding(id);
  }
}

其中有几个重要的方法，解释一下

constructor：

遍历节点，出现函数声明和变量声明创建Bindings对象
遍历节点，出现标识符，并且不是变量声明，说明是变量的引用, 收集到referencePaths

get scope：

当访问path.scope时，会触发get scope, 如果当前节点不存在scope, 就向父节点一层层查找scope，直到有节点是block节点，就创建对应的Scope对象

总结

在一开始学习了@babel/tarverse的基本用法，以及各种写法

后来学习了path的各种属性和方法，比较常用的有path.scope, 可以用来获取作用域内的变量。path.hub可以获取File对象，并且对文件做一些操作。

最后我们一起实现了一个简单的@babel/traverse，了解了traverse是如何遍历AST节点的。并且通过Scope、Binding实现了path.scope的功能。

参考

Babel 插件通关秘籍