Building tools with LibTooling and LibASTMatchers
This tutorial shows how to build a useful source-to-source translation tool based on Clang's LibTooling.
Step 0: Obtaining Clang
Because Clang is part of the LLVM project, you need to download the LLVM source code first. Clang and LLVM live in the same git repository, but in different directories. See the Getting Started Guide for more information.
cd ~/clang-llvm
git clone https://github.com/llvm/llvm-project.git
Next, you need to obtain the CMake build system and the Ninja build tool.
cd ~/clang-llvm
git clone https://github.com/martine/ninja.git
cd ninja
git checkout release
./bootstrap.py
sudo cp ninja /usr/bin/
cd ~/clang-llvm
git clone git://cmake.org/stage/cmake.git
cd cmake
git checkout next
./bootstrap
make
sudo make install
Okay. Now build Clang!
cd ~/clang-llvm
mkdir build && cd build
cmake -G Ninja ../llvm -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" -DLLVM_BUILD_TESTS=ON
# The -DLLVM_BUILD_TESTS=ON flag above enables tests; they are off by default.
ninja
ninja check # Test LLVM only.
ninja clang-test # Test Clang only.
ninja install
That's it. All of the tests should pass.
Finally, set Clang as its own compiler.
cd ~/clang-llvm/build
ccmake ../llvm
The second command opens a GUI for configuring Clang. You need to set the CMAKE_CXX_COMPILER entry. Press 't' to turn on advanced mode. Scroll down to CMAKE_CXX_COMPILER and set it to /usr/bin/clang++, or to wherever you installed it. Press 'c' to configure, then 'g' to generate CMake's files.
Finally, run ninja one last time, and you are done.
Step 1: Create a ClangTool
The simplest useful ClangTool to create is a syntax checker, although one already exists as clang-check.
First, create a new directory for the tool and tell CMake that it exists. Since this will not be a core clang tool, it lives in the clang-tools-extra repository.
cd ~/clang-llvm
mkdir clang-tools-extra/loop-convert
echo 'add_subdirectory(loop-convert)' >> clang-tools-extra/CMakeLists.txt
vim clang-tools-extra/loop-convert/CMakeLists.txt
CMakeLists.txt should contain the following:
set(LLVM_LINK_COMPONENTS support)

add_clang_executable(loop-convert
  LoopConvert.cpp
  )
target_link_libraries(loop-convert
  PRIVATE
  clangAST
  clangASTMatchers
  clangBasic
  clangFrontend
  clangSerialization
  clangTooling
  )
Once that is done, Ninja can compile the tool. Now give it something to compile! Put the following into clang-tools-extra/loop-convert/LoopConvert.cpp. See the LibTooling documentation for an explanation of the different components.
// Declares clang::SyntaxOnlyAction.
#include "clang/Frontend/FrontendActions.h"
#include "clang/Tooling/CommonOptionsParser.h"
#include "clang/Tooling/Tooling.h"
// Declares llvm::cl::extrahelp.
#include "llvm/Support/CommandLine.h"

using namespace clang::tooling;
using namespace llvm;

// Apply a custom category to all command-line options so that they are the
// only ones displayed.
static llvm::cl::OptionCategory MyToolCategory("my-tool options");

// CommonOptionsParser declares HelpMessage with a description of the common
// command-line options related to the compilation database and input files.
// It is useful to have this help message in all tools.
static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);

// A help message for this specific tool can be added afterwards.
static cl::extrahelp MoreHelp("\nMore help text...\n");

int main(int argc, const char **argv) {
  auto ExpectedParser = CommonOptionsParser::create(argc, argv, MyToolCategory);
  if (!ExpectedParser) {
    // Fail gracefully for unsupported options.
    llvm::errs() << ExpectedParser.takeError();
    return 1;
  }
  CommonOptionsParser& OptionsParser = ExpectedParser.get();
  ClangTool Tool(OptionsParser.getCompilations(),
                 OptionsParser.getSourcePathList());
  return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>().get());
}
That's it! You can compile the new tool by running ninja from the build directory.
cd ~/clang-llvm/build
ninja
You should now be able to run the syntax checker, which is located in ~/clang-llvm/build/bin, on any source file. Try it!
echo "int main() { return 0; }" > test.cpp
bin/loop-convert test.cpp --
Note the two dashes after the source file is specified. Additional options for the compiler are passed after the dashes rather than loaded from a compilation database; no options are needed right now.
Intermezzo: Learn AST matcher basics
Clang recently introduced the ASTMatcher library, which provides a simple, powerful, and concise way to describe specific patterns in the AST. Implemented as a DSL powered by macros and templates (see ASTMatchers.h), matchers offer the feel of the algebraic data types common in functional programming languages.
For example, suppose you only want to examine binary operators. There is a matcher for exactly that, called binaryOperator:
binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))))
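It matches addition expressions whose left-hand side is exactly the literal 0. It will not match other forms of 0, such as '\0' or NULL, but it will match macros that expand to 0. The matcher also will not match calls to an overloaded operator '+', because there is a separate operatorCallExpr matcher (named cxxOperatorCallExpr in current Clang releases) to handle overloaded operators.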
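To make these rules concrete, here is a small hypothetical input file. The macro ZERO, the type Meters, and all function names are invented for illustration. The comments state whether the matcher above would be expected to match each expression, following the rules just described.

```cpp
#define ZERO 0  // hypothetical macro that expands to the literal 0

// Hypothetical type with an overloaded operator+.
struct Meters {
  int value;
};
Meters operator+(int lhs, Meters rhs) { return Meters{lhs + rhs.value}; }

// Expected match: the LHS is exactly the integer literal 0.
int literalZero(int x) { return 0 + x; }

// Expected match: the macro expands to the integer literal 0.
int macroZero(int x) { return ZERO + x; }

// No match expected: the LHS is a character literal, not an integer literal.
int charZero(int x) { return '\0' + x; }

// No match expected: the literal 0 is on the right-hand side.
int zeroOnRight(int x) { return x + 0; }

// No match expected: this is a call to an overloaded operator+.
Meters overloaded(Meters m) { return 0 + m; }
```

All five functions return their argument unchanged, so the file is only useful as matcher input, not as an example of meaningful arithmetic.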
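There are AST matchers to match all the different nodes of the AST, narrowing matchers to match only AST nodes that fulfill specific criteria, and traversal matchers to get from one kind of AST node to another. For a complete list of AST matchers, see the AST Matcher Reference.
All matchers that are nouns describe entities in the AST and can be bound, so that they can be referred to whenever a match is found. To do this, call the bind method on these matchers, for example: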
varDecl(hasType(isInteger())).bind("intvar")
Step 2: Using AST matchers
Okay, on to using matchers for real. First, define a matcher that captures all for statements that define a new variable initialized to zero. Start by matching all for loops:
forStmt()
Next, specify that a single variable is declared in the first part of the loop, so the matcher can be extended to
forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl()))))
Finally, add the condition that the variable is initialized to zero.
forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
  hasInitializer(integerLiteral(equals(0))))))))
It is fairly easy to read and understand the matcher definition ("match loops whose init part declares a single variable initialized to the integer literal 0"), but deciding that every part is necessary is harder. Note that this matcher will not match loops whose variables are initialized to '\0', 0.0, NULL, or any form of zero other than the integer 0.
The last step is to give the matcher a name and bind the ForStmt, because you will want to do something with it:
StatementMatcher LoopMatcher =
  forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
    hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");
After you have defined your matchers, you need a little more scaffolding to run them. Matchers are paired with a MatchCallback and registered with a MatchFinder object, which is then run from a ClangTool.
Add the following to LoopConvert.cpp:
#include "clang/ASTMatchers/ASTMatchers.h"
#include "clang/ASTMatchers/ASTMatchFinder.h"

using namespace clang;
using namespace clang::ast_matchers;

// Matches for loops whose init part declares a single variable initialized
// to the integer literal 0, and binds the loop as "forLoop".
StatementMatcher LoopMatcher =
  forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
    hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");

// Callback that dumps the AST of every matched for loop.
class LoopPrinter : public MatchFinder::MatchCallback {
public:
  void run(const MatchFinder::MatchResult &Result) override {
    if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop"))
      FS->dump();
  }
};
And change main() to:
int main(int argc, const char **argv) {
  auto ExpectedParser = CommonOptionsParser::create(argc, argv, MyToolCategory);
  if (!ExpectedParser) {
    // Fail gracefully for unsupported options.
    llvm::errs() << ExpectedParser.takeError();
    return 1;
  }
  CommonOptionsParser& OptionsParser = ExpectedParser.get();
  ClangTool Tool(OptionsParser.getCompilations(),
                 OptionsParser.getSourcePathList());

  // Register the matcher together with the callback that handles its matches.
  LoopPrinter Printer;
  MatchFinder Finder;
  Finder.addMatcher(LoopMatcher, &Printer);

  return Tool.run(newFrontendActionFactory(&Finder).get());
}
Now you should be able to recompile and run the code to discover for loops. Create a new file with a few examples, and test the new handiwork:
cd ~/clang-llvm/llvm/llvm_build/
ninja loop-convert
vim ~/test-files/simple-loops.cc
bin/loop-convert ~/test-files/simple-loops.cc
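The tutorial does not show the contents of simple-loops.cc. The file below is a hypothetical example, and all function names in it are made up. The comments indicate which loops the Step 2 LoopMatcher would be expected to match, based on the matcher definition above.

```cpp
// Hypothetical contents for ~/test-files/simple-loops.cc.
const int N = 4;

// Expected match: the init part declares a single variable initialized to
// the integer literal 0.
int sumFromZero(const int (&arr)[N]) {
  int sum = 0;
  for (int i = 0; i < N; ++i)
    sum += arr[i];
  return sum;
}

// No match expected: the variable is initialized to 1, not 0.
int sumFromOne(const int (&arr)[N]) {
  int sum = 0;
  for (int i = 1; i < N; ++i)
    sum += arr[i];
  return sum;
}

// No match expected: the init part assigns to an existing variable instead
// of declaring one, so it is not a declaration statement.
int sumExistingIndex(const int (&arr)[N]) {
  int sum = 0;
  int i;
  for (i = 0; i < N; ++i)
    sum += arr[i];
  return sum;
}

// No match expected: two variables are declared, so hasSingleDecl fails.
int sumTwoIndices(const int (&arr)[N]) {
  int sum = 0;
  for (int i = 0, j = N; i < j; ++i)
    sum += arr[i];
  return sum;
}
```

With a file like this, only the loop in sumFromZero should be dumped by the tool at this stage.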
Step 3.5: More complex matchers
The simple matcher can discover for loops, but many more loops still need to be filtered out. A good part of the remaining work can be done with some cleverly chosen matchers, but first decide exactly which properties to allow.
How can we characterize for loops over arrays that are eligible for conversion to range-based syntax? Range-based loops over an array of size N:
1. start at index 0
2. iterate consecutively
3. end at index N-1
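As a concrete picture of the goal, the hypothetical functions below show an index-based loop that satisfies all three properties, next to the range-based loop it could be rewritten as. The names are illustrative only, and the tool built in this tutorial does not perform the rewrite at this stage.

```cpp
const int N = 5;

// Index-based loop: starts at index 0, iterates consecutively, and ends at
// index N-1.
int sumIndexed(const int (&arr)[N]) {
  int sum = 0;
  for (int i = 0; i < N; ++i)
    sum += arr[i];
  return sum;
}

// Equivalent range-based loop over the same array.
int sumRangeBased(const int (&arr)[N]) {
  int sum = 0;
  for (int elem : arr)
    sum += elem;
  return sum;
}
```

Both functions visit the same elements in the same order, which is what makes the conversion safe for loops of this shape.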
Property (1) is already checked, so all that is left is to add a check on the loop condition to make sure the index variable is compared against N, and another check to make sure the increment step just increments the same variable. The matcher for (2) is simple: require a pre-increment or post-increment of the same variable declared in the init section.
Unfortunately, this matcher cannot be written. Matchers contain no logic to compare two arbitrary AST nodes and determine whether they are equal, so the best option is to match more than should be allowed and leave the extra comparisons to the callback.
In any case, you can start building the sub-matcher. Requiring the increment step to be a unary increment looks like this:
hasIncrement(unaryOperator(hasOperatorName("++")))
Specifying what is incremented introduces another quirk of Clang's AST: because uses of variables are expressions that refer to variable declarations, they are represented as DeclRefExpr ("declaration reference expression") nodes.
To find a unaryOperator that refers to a specific declaration, simply add a second condition to it:
hasIncrement(unaryOperator(
  hasOperatorName("++"),
  hasUnaryOperand(declRefExpr())))
Additionally, you can restrict the matcher to match only when the incremented variable is an integer:
hasIncrement(unaryOperator(
  hasOperatorName("++"),
  hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger())))))))
The last step is to attach an identifier to this variable so that it can be retrieved in the callback:
hasIncrement(unaryOperator(
  hasOperatorName("++"),
  hasUnaryOperand(declRefExpr(to(
    varDecl(hasType(isInteger())).bind("incrementVariable"))))))
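To see what this increment sub-matcher accepts, consider the hypothetical loops below. The function names are invented, and the comments give the expected result for the hasIncrement(...) sub-matcher alone, based on the conditions it contains.

```cpp
// Expected match: prefix ++ applied directly to an int variable.
int countPrefix(int n) {
  int count = 0;
  for (int i = 0; i < n; ++i)
    ++count;
  return count;
}

// Expected match: postfix ++ is also a unary operator named "++".
int countPostfix(int n) {
  int count = 0;
  for (int i = 0; i < n; i++)
    ++count;
  return count;
}

// No match expected: += is a compound assignment, not a unary operator.
int countCompound(int n) {
  int count = 0;
  for (int i = 0; i < n; i += 1)
    ++count;
  return count;
}

// No match expected: the incremented variable is a pointer, so the
// hasType(isInteger()) condition fails.
int countPointer(const int *begin, const int *end) {
  int count = 0;
  for (const int *p = begin; p < end; ++p)
    ++count;
  return count;
}
```

All four loops run the same number of iterations for equivalent inputs; they differ only in how the increment step is written, which is exactly what the sub-matcher inspects.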
Add this code to the definition of LoopMatcher and make sure that the program, equipped with the new matcher, only prints loops that declare a single variable initialized to zero and whose increment step consists of a unary increment of some variable.
Now it is just a matter of adding a matcher that checks whether the condition part of the for loop compares a variable against the size of the array. There is just one problem: without looking at the body of the loop, you do not know which array is being iterated over! Again, the matcher can only approximate the desired result, with the details filled in by the callback. So, start with:
hasCondition(binaryOperator(hasOperatorName("<")))
It makes sense to ensure that the left-hand side is a reference to a variable and that the right-hand side has integer type.
hasCondition(binaryOperator(
  hasOperatorName("<"),
  hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))),
  hasRHS(expr(hasType(isInteger())))))
Why? Because it doesn't work. Of the three loops provided in test-files/simple.cpp, none has a matching condition. A quick look at the AST dump of the first for loop, produced by the previous iteration of loop-convert, shows the answer:
(ForStmt 0x173b240
  (DeclStmt 0x173afc8
    0x173af50 "int i =
      (IntegerLiteral 0x173afa8 'int' 0)")
  <<>>
  (BinaryOperator 0x173b060 '_Bool' '<'
    (ImplicitCastExpr 0x173b030 'int'
      (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int'))
    (ImplicitCastExpr 0x173b048 'int'
      (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int')))
  (UnaryOperator 0x173b0b0 'int' lvalue prefix '++'
    (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int'))
  (CompoundStatement ...
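We already know that the declaration and the increment both match, otherwise this loop would not have been dumped. The culprit is the implicit cast applied to the first operand (i.e. the LHS) of the less-than operator: an L-value to R-value conversion applied to the expression referencing i. Fortunately, the matcher library offers a solution to this problem in the form of ignoringParenImpCasts, which tells the matcher to ignore implicit casts and parentheses before continuing to match.
Adjusting the condition operator restores the desired match.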
hasCondition(binaryOperator(
  hasOperatorName("<"),
  hasLHS(ignoringParenImpCasts(declRefExpr(
    to(varDecl(hasType(isInteger())))))),
  hasRHS(expr(hasType(isInteger())))))
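The effect of ignoring implicit casts and parentheses can be pictured with a toy expression tree. The sketch below is not Clang's implementation: the Kind enum, the Expr struct, and the function names are all hypothetical, and they only model the idea of stepping past wrapper nodes before testing the node underneath.

```cpp
// Toy model only: these types are hypothetical and are not Clang's AST classes.
enum class Kind { DeclRef, ImplicitCast, Paren, IntegerLiteral };

struct Expr {
  Kind kind;
  const Expr *sub;  // wrapped sub-expression, or nullptr for leaf nodes
};

// Step past any implicit-cast or parenthesis wrappers around E.
const Expr *skipParenImpCasts(const Expr *E) {
  while (E && (E->kind == Kind::ImplicitCast || E->kind == Kind::Paren))
    E = E->sub;
  return E;
}

// Strict check: the node itself must be a variable reference.
bool isDeclRef(const Expr *E) { return E && E->kind == Kind::DeclRef; }

// Relaxed check: wrappers are skipped before the node is tested.
bool isDeclRefIgnoringWrappers(const Expr *E) {
  return isDeclRef(skipParenImpCasts(E));
}
```

In this model, a DeclRef wrapped in an ImplicitCast fails the strict check but passes the relaxed one, which mirrors why the condition matcher only succeeded after the wrapper was ignored.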
After adding binds to the expressions we want to capture and extracting the identifier strings into variables, step 2 of the array conversion is complete.
Step 4: Retrieving matched nodes
So far, the matcher callback is not very interesting: it just dumps the loop's AST. At some point, changes will need to be made to the input source code. Next, we use the nodes bound in the previous step.
The MatchFinder::MatchCallback::run() callback takes a MatchFinder::MatchResult& as its parameter. Of most interest are its Context and Nodes members. Clang uses the ASTContext class to represent contextual information about the AST, but the most functionally important detail is that several operations require an ASTContext* parameter. More immediately useful is the set of matched nodes and how to retrieve them.
Because three variables are bound (identified by ConditionVarName, InitVarName, and IncrementVarName, which appear as the strings "condVarName", "initVarName", and "incVarName" in the matcher code), the matched nodes can be obtained with the getNodeAs() member function.
In LoopConvert.cpp, add
#include "clang/AST/ASTContext.h"
Change LoopMatcher to:
StatementMatcher LoopMatcher =
  forStmt(hasLoopInit(declStmt(
            hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0)))).bind("initVarName")))),
          hasIncrement(unaryOperator(
            hasOperatorName("++"),
            hasUnaryOperand(declRefExpr(
              to(varDecl(hasType(isInteger())).bind("incVarName")))))),
          hasCondition(binaryOperator(
            hasOperatorName("<"),
            hasLHS(ignoringParenImpCasts(declRefExpr(
              to(varDecl(hasType(isInteger())).bind("condVarName"))))),
            hasRHS(expr(hasType(isInteger())))))).bind("forLoop");
And change LoopPrinter::run to:
void LoopPrinter::run(const MatchFinder::MatchResult &Result) {
  ASTContext *Context = Result.Context;
  const ForStmt *FS = Result.Nodes.getNodeAs<ForStmt>("forLoop");
  // We do not want to convert header files!
  if (!FS || !Context->getSourceManager().isWrittenInMainFile(FS->getForLoc()))
    return;
  const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName");
  const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName");
  const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName");

  // All three bound declarations must refer to the same variable.
  if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar))
    return;
  llvm::outs() << "Potential array-based loop discovered.\n";
}
Clang associates a VarDecl with each variable to represent the variable's declaration. Since the "canonical" form of each declaration is unique by address, it is only necessary to make sure that neither ValueDecl (the base class of VarDecl) is NULL and to compare the canonical declarations.
static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) {
  return First && Second &&
         First->getCanonicalDecl() == Second->getCanonicalDecl();
}
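The reason for comparing canonical declarations rather than the declarations themselves is that one variable can be declared more than once, for example with `extern int counter;` followed later by `int counter = 0;`. The toy model below sketches that idea. ToyDecl and sameEntity are hypothetical names and not part of Clang; the model only assumes that every redeclaration points back to one canonical declaration.

```cpp
// Toy model only: a hypothetical stand-in for a declaration node. It is not
// Clang's Decl class.
struct ToyDecl {
  const char *name;
  const ToyDecl *canonical;  // first declaration of the same entity, or nullptr

  // A declaration with no recorded canonical form is its own canonical form.
  const ToyDecl *getCanonical() const { return canonical ? canonical : this; }
};

// Same idea as areSameVariable: reject null pointers, then compare the
// canonical declarations by address.
bool sameEntity(const ToyDecl *First, const ToyDecl *Second) {
  return First && Second && First->getCanonical() == Second->getCanonical();
}
```

In this model, a redeclaration compares equal to the first declaration, while an unrelated declaration that merely shares the name does not.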
If execution reaches the end of LoopPrinter::run(), we know that the loop shell looks like
for (int i= 0; i < expr(); ++i) { ... }
For now, just print a message stating that a loop was found.
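Because the matcher and the callback only inspect the loop header, loops with the same shell can still differ in whether they actually iterate over an array. The hypothetical functions below illustrate this; the names are invented, and the comments describe the expected behaviour of the checks made so far.

```cpp
const int N = 3;

// Expected to be reported: the header has the required shell, and the body
// really does walk the array element by element.
int sumArray(const int (&arr)[N]) {
  int sum = 0;
  for (int i = 0; i < N; ++i)
    sum += arr[i];
  return sum;
}

// Also expected to be reported: the header is identical, even though the
// body never indexes an array.
int sumIndices() {
  int sum = 0;
  for (int i = 0; i < N; ++i)
    sum += i;
  return sum;
}

// Expected to be rejected by the callback: the condition tests j, which is
// not the variable that is declared and incremented in the header.
int countMismatched(int j) {
  int count = 0;
  for (int i = 0; j < N; ++i) {
    ++j;
    ++count;
  }
  return count;
}
```

The second function is the reason the message only says that a loop was found: deciding whether a conversion is valid requires looking at the loop body as well.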
As a side note, testing whether two expressions are the same is not that simple, although Clang has already done the hard work by providing a way to canonicalize expressions:
static bool areSameExpr(ASTContext *Context, const Expr *First, const Expr *Second) {
  if (!First || !Second)
    return false;
  llvm::FoldingSetNodeID FirstID, SecondID;
  First->Profile(FirstID, *Context, true);
  Second->Profile(SecondID, *Context, true);
  return FirstID == SecondID;
}
This code relies on the comparison between two llvm::FoldingSetNodeID objects. As the documentation for Stmt::Profile() indicates, the Profile() member function builds a description of a node in the AST based on its properties, along with those of its children. FoldingSetNodeID then serves as a hash that can be used to compare expressions. areSameExpr will be needed later. Before running the new code on the additional loops added to test-files/simple.cpp, try to figure out which ones will be considered potentially convertible.
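The idea of describing a node by its properties and those of its children can be pictured with a toy model. The sketch below does not use Clang or llvm::FoldingSetNodeID: Node, profile, and sameShape are hypothetical names, and a plain std::vector<int> stands in for the node description.

```cpp
#include <vector>

// Toy model only: a hypothetical node type, not Clang's Stmt or Expr.
struct Node {
  int kind;                        // e.g. 1 = variable reference, 2 = addition
  int value;                       // a property of the node, such as a variable id
  std::vector<const Node *> kids;  // child nodes
};

// Append a description of N and, recursively, of its children to ID.
void profile(const Node &N, std::vector<int> &ID) {
  ID.push_back(N.kind);
  ID.push_back(N.value);
  ID.push_back(static_cast<int>(N.kids.size()));
  for (const Node *Kid : N.kids)
    profile(*Kid, ID);
}

// Two trees are treated as the same expression when their descriptions match.
bool sameShape(const Node &A, const Node &B) {
  std::vector<int> IDA, IDB;
  profile(A, IDA);
  profile(B, IDB);
  return IDA == IDB;
}
```

Two separately built trees with the same structure and properties compare equal here even though their nodes live at different addresses, which is the property areSameExpr relies on.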