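LLVM is a free-software project: a compiler infrastructure written in C++. It grew out of research begun in 2000 at the University of Illinois at Urbana-Champaign (UIUC) by Vikram Adve and his first doctoral student, Chris Lattner, who set out to create dynamic compilation techniques for all static and dynamic languages.
The name LLVM was originally an acronym for Low Level Virtual Machine, but the scope of the project has long since outgrown that meaning. Today LLVM is, in the project's own words, a "collection of modular and reusable compiler and toolchain technologies" used to build everything from compiler front ends to back ends.
In 2005, Apple Computer hired Chris Lattner and his team to work on Apple's development systems. LLVM is now part of the development tools for Mac OS X and iOS, and the core of the Xcode development environment is built on LLVM. In recognition of LLVM's contribution to industry, the ACM gave the 2012 ACM Software System Award to Adve, Lattner, and Evan Cheng.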
----------------------------------------------------------------------------------------------------------------------------------------
I. Def-Use
Frequently, we might have an instance of the Value class and want to determine which Users use that Value. The list of all Users of a particular Value is called a def-use chain.
For example, let's say we have a Function* named F that points to a particular function foo. Finding all of the instructions that use foo is as simple as iterating over the def-use chain of F:
Function *F = ...;
for (User *U : F->users()) {
  // Keep only those users that are instructions.
  if (Instruction *Inst = dyn_cast<Instruction>(U)) {
    errs() << "F is used in instruction:\n";
    errs() << *Inst << "\n";
  }
}
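The pass we write below finds every instruction that uses a given def (also called a value). As an example, the figure below shows the IR of the test program from the earlier article in this series, "A Detailed Tutorial on Writing Passes in LLVM (2)". In function add, a value named add is defined and then used in the very next statement; in main, a value named cmp is defined and likewise used in the statement that follows. This is only a simple sample program, of course; in real programs a def can be used many times.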
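To make the idea of a def-use chain concrete outside of LLVM, here is a minimal stand-alone sketch. The names ToyValue, addOperand, and usersOf are hypothetical and are not part of the LLVM API; the sketch only assumes that each value keeps a list of the values that read it, which is the relationship that users() exposes in LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for llvm::Value / llvm::User. Hypothetical, not the LLVM API.
struct ToyValue {
  std::string name;                  // printable name, e.g. "%add"
  std::vector<ToyValue *> operands;  // values this one reads (use-def)
  std::vector<ToyValue *> users;     // values that read this one (def-use)
};

// Record that `user` reads `def`, keeping both directions in sync.
void addOperand(ToyValue &user, ToyValue &def) {
  user.operands.push_back(&def);
  def.users.push_back(&user);
}

// Return the names of all users of `def`, i.e. its def-use chain.
std::vector<std::string> usersOf(const ToyValue &def) {
  std::vector<std::string> names;
  for (const ToyValue *user : def.users) {
    assert(user != nullptr && "a user list never holds null entries");
    names.push_back(user->name);
  }
  return names;
}
```

If a value add is registered as an operand of both a ret and a store, usersOf(add) returns the names of those two users in the order they were registered, which mirrors the point that one def may be used many times.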
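As before, assume LLVM is installed in ... .../llvm. First create a subdirectory under ... .../llvm/lib/Transforms, for example one named DefUse, and create three files in it: CMakeLists.txt, DefUse.exports, and DefUse.cpp. LLVM ships a ready-made sample pass in the Hello directory under Transforms, so you can copy CMakeLists.txt, Hello.exports, and Hello.cpp from there into DefUse and rename them accordingly. Then add add_subdirectory(DefUse) to the end of the CMakeLists.txt in the parent directory (Transforms), and edit the CMakeLists.txt in DefUse so that it reads as follows: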
# If we don't need RTTI or EH, there's no reason to export anything
# from the hello plugin.
if( NOT LLVM_REQUIRES_RTTI )
if( NOT LLVM_REQUIRES_EH )
set(LLVM_EXPORTED_SYMBOL_FILE ${CMAKE_CURRENT_SOURCE_DIR}/DefUse.exports)
endif()
endif()
if(WIN32 OR CYGWIN)
set(LLVM_LINK_COMPONENTS Core Support)
endif()
add_llvm_loadable_module( LLVMDefUse
DefUse.cpp
DEPENDS
intrinsics_gen
PLUGIN_TOOL
opt
)
Next comes the pass itself. For this example, all we need to do is edit DefUse.cpp, whose contents are as follows:
#include "llvm/IR/Function.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Pass.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

namespace {
struct DefUse : public FunctionPass {
  static char ID; // Pass identification, replacement for typeid
  DefUse() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    errs() << "Function name: " << F.getName() << '\n';
    for (BasicBlock &BB : F) {
      for (Instruction &I : BB) {
        // Only add and icmp instructions are treated as defs of interest.
        if (I.getOpcode() != Instruction::Add &&
            I.getOpcode() != Instruction::ICmp)
          continue;
        // Walk the def-use chain: report every user that is an instruction.
        for (User *U : I.users()) {
          if (Instruction *UserInst = dyn_cast<Instruction>(U)) {
            outs() << "OpCode " << I.getOpcodeName() << " used in :: ";
            outs() << *UserInst << "\n";
          }
        }
      }
    }
    return false; // Analysis only; the IR is left unchanged.
  }
};
} // end anonymous namespace

char DefUse::ID = 0;
static RegisterPass<DefUse> X("DefUse", "This is def-use Pass");
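The getOpcodeName() function used above is implemented in the LLVM sources as shown below; the switch also serves, in effect, as a list of the opcodes available in that LLVM version.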
const char *Instruction::getOpcodeName(unsigned OpCode) {
switch (OpCode) {
// Terminators
case Ret: return "ret";
case Br: return "br";
case Switch: return "switch";
case IndirectBr: return "indirectbr";
case Invoke: return "invoke";
case Resume: return "resume";
case Unreachable: return "unreachable";
case CleanupRet: return "cleanupret";
case CatchRet: return "catchret";
case CatchPad: return "catchpad";
case CatchSwitch: return "catchswitch";
// Standard binary operators...
case Add: return "add";
case FAdd: return "fadd";
case Sub: return "sub";
case FSub: return "fsub";
case Mul: return "mul";
case FMul: return "fmul";
case UDiv: return "udiv";
case SDiv: return "sdiv";
case FDiv: return "fdiv";
case URem: return "urem";
case SRem: return "srem";
case FRem: return "frem";
// Logical operators...
case And: return "and";
case Or : return "or";
case Xor: return "xor";
// Memory instructions...
case Alloca: return "alloca";
case Load: return "load";
case Store: return "store";
case AtomicCmpXchg: return "cmpxchg";
case AtomicRMW: return "atomicrmw";
case Fence: return "fence";
case GetElementPtr: return "getelementptr";
// Convert instructions...
case Trunc: return "trunc";
case ZExt: return "zext";
case SExt: return "sext";
case FPTrunc: return "fptrunc";
case FPExt: return "fpext";
case FPToUI: return "fptoui";
case FPToSI: return "fptosi";
case UIToFP: return "uitofp";
case SIToFP: return "sitofp";
case IntToPtr: return "inttoptr";
case PtrToInt: return "ptrtoint";
case BitCast: return "bitcast";
case AddrSpaceCast: return "addrspacecast";
// Other instructions...
case ICmp: return "icmp";
case FCmp: return "fcmp";
case PHI: return "phi";
case Select: return "select";
case Call: return "call";
case Shl: return "shl";
case LShr: return "lshr";
case AShr: return "ashr";
case VAArg: return "va_arg";
case ExtractElement: return "extractelement";
case InsertElement: return "insertelement";
case ShuffleVector: return "shufflevector";
case ExtractValue: return "extractvalue";
case InsertValue: return "insertvalue";
case LandingPad: return "landingpad";
case CleanupPad: return "cleanuppad";
default: return "<Invalid operator> ";
}
}
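Next, rebuild LLVM. Change into ... .../llvm/build and simply run make; the whole build takes roughly a few minutes. To try out the pass, create a test file wherever you like (for example, on the Desktop), such as the test.c file from the previous article, "A Detailed Tutorial on Writing Passes in LLVM (2)". After compiling it to a .ll file with Clang, run the pass with the following command (the -load and -DefUse options belong to the legacy pass manager interface used throughout this series):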
opt -load /Users/fzuo/llvm/build/lib/LLVMDefUse.dylib -DefUse test.ll
WARNING: You're attempting to print out a bitcode file.
This is inadvisable as it may cause display problems. If
you REALLY want to taste LLVM bitcode first-hand, you
can force output with the `-f' option.
Function name: add
OpCode add used in :: ret i32 %add
Function name: main
OpCode icmp used in :: br i1 %cmp, label %if.then, label %if.else
As the output shows, our pass correctly locates the instructions in which each def is used.
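The pass keeps only those users for which dyn_cast<Instruction> succeeds. LLVM implements this kind of checked downcast with a kind tag and a static classof function rather than C++ RTTI. The following stand-alone sketch illustrates that idea only; ToyUser, ToyInstruction, ToyConstant, and toyDynCast are hypothetical names, not LLVM classes, and the real implementation is considerably more general.

```cpp
#include <cassert>

// Hypothetical mini-hierarchy; the names are illustrative, not LLVM's.
struct ToyUser {
  enum Kind { K_Instruction, K_Constant };
  explicit ToyUser(Kind k) : kind(k) {}
  Kind getKind() const { return kind; }

private:
  Kind kind;  // tag recording the dynamic type of the object
};

struct ToyInstruction : ToyUser {
  ToyInstruction() : ToyUser(K_Instruction) {}
  // classof answers: is this ToyUser really a ToyInstruction?
  static bool classof(const ToyUser *u) {
    return u->getKind() == K_Instruction;
  }
};

struct ToyConstant : ToyUser {
  ToyConstant() : ToyUser(K_Constant) {}
  static bool classof(const ToyUser *u) {
    return u->getKind() == K_Constant;
  }
};

// Checked downcast: yields nullptr when the kind tag does not match.
template <typename To>
To *toyDynCast(ToyUser *u) {
  assert(u != nullptr && "toyDynCast requires a non-null pointer");
  return To::classof(u) ? static_cast<To *>(u) : nullptr;
}
```

Because a failed cast yields a null pointer, the cast can sit directly in an if condition, which is the pattern the pass uses to skip any user that is not an instruction.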
Alternatively, it's common to have an instance of the User class and need to know what Values it uses. The list of all Values used by a User is known as a use-def chain. Instances of class Instruction are common Users, so we might want to iterate over all of the values that a particular instruction uses (that is, the operands of the particular Instruction):
Instruction *pi = ...;
for (Use &U : pi->operands()) {
Value *v = U.get();
// ...
}
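The goal now is to print all of the operands used by a given Instruction. First create a subdirectory under ... .../llvm/lib/Transforms, for example one named UseDef, and create three files in it: CMakeLists.txt, UseDef.exports, and UseDef.cpp. Then add add_subdirectory(UseDef) to the end of the CMakeLists.txt in the parent directory (Transforms), and edit the CMakeLists.txt in ... .../llvm/lib/Transforms/UseDef so that it reads as follows: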
# If we don't need RTTI or EH, there's no reason to export anything
# from the hello plugin.
if( NOT LLVM_REQUIRES_RTTI )
if( NOT LLVM_REQUIRES_EH )
set(LLVM_EXPORTED_SYMBOL_FILE ${CMAKE_CURRENT_SOURCE_DIR}/UseDef.exports)
endif()
endif()
if(WIN32 OR CYGWIN)
set(LLVM_LINK_COMPONENTS Core Support)
endif()
add_llvm_loadable_module( LLVMUseDef
UseDef.cpp
DEPENDS
intrinsics_gen
PLUGIN_TOOL
opt
)
Next comes the pass itself. For this example, all we need to do is edit UseDef.cpp, whose contents are as follows:
#include "llvm/IR/Function.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Pass.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

namespace {
struct UseDef : public FunctionPass {
  static char ID; // Pass identification, replacement for typeid
  UseDef() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    errs() << "Function name: " << F.getName() << '\n';
    for (BasicBlock &BB : F) {
      for (Instruction &I : BB) {
        // Only add instructions are inspected here.
        if (I.getOpcode() != Instruction::Add)
          continue;
        // Walk the use-def chain: print each operand of the add.
        for (Use &U : I.operands()) {
          Value *V = U.get();
          outs() << *V << "\n";
        }
      }
    }
    return false; // Analysis only; the IR is left unchanged.
  }
};
} // end anonymous namespace

char UseDef::ID = 0;
static RegisterPass<UseDef> X("UseDef", "This is use-def Pass");
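Next, rebuild LLVM: change into ... .../llvm/build and run make. Then try out the pass, again using the test.c file from the earlier article, "A Detailed Tutorial on Writing Passes in LLVM (2)". After compiling it to a .ll file with Clang, run the pass with the following command:
opt -load /Users/fzuo/llvm/build/lib/LLVMUseDef.dylib -UseDef test.ll
You should then see the following output: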
WARNING: You're attempting to print out a bitcode file.
This is inadvisable as it may cause display problems. If
you REALLY want to taste LLVM bitcode first-hand, you
can force output with the `-f' option.
Function name: add
%0 = load i32, i32* %a.addr, align 4
%1 = load i32, i32* %b.addr, align 4
Function name: main
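Finally, compare this with the IR file obtained earlier. As the figure below shows, the operands %0 and %1 used by the instruction that defines add are themselves defined by the two statements marked with red arrows, which matches the output of the pass above. The pass therefore does what we set out to do.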
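The operand walk performed by the pass can be imitated without LLVM. The sketch below is a toy model under stated assumptions: ToyOpcode, ToyInst, and operandsOf are hypothetical names, and the instruction strings are abbreviated stand-ins for the IR lines shown above rather than real LLVM output.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature of an instruction; not an LLVM class.
enum class ToyOpcode { Load, Add, ICmp, Ret };

struct ToyInst {
  ToyOpcode opcode;
  std::string text;                       // how the instruction prints
  std::vector<const ToyInst *> operands;  // use-def chain: what it reads
};

// For every instruction in `block` with the wanted opcode, collect the
// printed form of each of its operands, in operand order.
std::vector<std::string> operandsOf(const std::vector<const ToyInst *> &block,
                                    ToyOpcode wanted) {
  std::vector<std::string> out;
  for (const ToyInst *inst : block) {
    assert(inst != nullptr && "block entries must be valid");
    if (inst->opcode != wanted)
      continue;
    for (const ToyInst *op : inst->operands)
      out.push_back(op->text);
  }
  return out;
}
```

With two loads feeding an add, asking for the operands of ToyOpcode::Add returns the two load lines in order, while asking for ToyOpcode::ICmp returns nothing, just as the pass prints nothing for main.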
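[Articles in this series]
- A Detailed Tutorial on Writing Passes in LLVM (1): A Hello World Pass
- A Detailed Tutorial on Writing Passes in LLVM (2): Iterating over the Basic Blocks in a Function
- A Detailed Tutorial on Writing Passes in LLVM (3): Counting the OpCodes in a Program
- A Detailed Tutorial on Writing Passes in LLVM (4): def-use and use-def
- A Detailed Tutorial on Writing Backend Passes in LLVM (1)
- A Detailed Tutorial on Writing Backend Passes in LLVM (2)
[References]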
- http://llvm.org/docs/ProgrammersManual.html#iterating-over-def-use-use-def-chains
- http://llvm.org/doxygen/Instruction_8cpp_source.html
- https://www.youtube.com/watch?v=KpISGH92wIs
- https://www.youtube.com/watch?v=XTqokII5pVw