First, the analysis of a simple C program using SVF
We use an example to look at individual components of SVF: (. 1) Memory the Model comprising PAG and the Constraint Graph ; (2) the Pointer the Analysis comprises a non-flow and flow sensitivity analysis sensitivity analysis; (. 3) the Value-Flow Construction value CONSTRUCTED.
1.C code is as follows
void swap(char **p, char **q){
char* t = *p;
*p = *q;
*q = t;
}
int main(){
char a1, b1;
char *a = &a1;
char *b = &b1;
swap(&a,&b);
}
2. Compile get LLVM IR and optimization options with mem2Reg
$ clang -c -Xclang -disable-O0-optnone -emit-llvm swap.c -o swap.bc
$ opt -mem2reg swap.bc -o swap.opt
$ llvm-dis swap.opt
define void @swap(i8** %p, i8** %q) #0 {
entry:
%0 = load i8** %p, align 8
%1 = load i8** %q, align 8
store i8* %1, i8** %p, align 8
store i8* %0, i8** %q, align 8
ret void
}
define i32 @main() #0 {
entry:
%a1 = alloca i8, align 1
%b1 = alloca i8, align 1
%a = alloca i8*, align 8
%b = alloca i8*, align 8
store i8* %a1, i8** %a, align 8
store i8* %b1, i8** %b, align 8
call void @swap(i8** %a, i8** %b)
ret i32 0
}
3. Generate Call Graph
$ wpa -ander -dump-callgraph swap.opt
4.PAG
$ wpa -nander -dump-pag swap.opt
Shown above, a PAG of the abstract node represents a pointer or a memory object, a circle represents a pointer ( ValPN ), octagonal memory objects represent abstract ( ObjPN ). The edge between two nodes represent a constraint: green edge showing memory allocation ( AddrPE ), represents blue edge Store ( StorePE ), represents a red edge Load ( LOADPE ), passing a parameter indicating the edge points ( CallPE ).
5.ConstraintGraph
$ wpa -nander -dump-pag -dump-consG swap.opt
( The Constraint Graph ) when the flow Andersen sensitive analysis used in the analysis based on the pointer included, use the following procedure to resolve constraint rules.
Analysis, a new copy of the sides (solid black arrow) continually added to the constraint map, until it reaches a fixed point, a final constraint is as follows:
6.Andersen的指向结果
$ wpa -nander -print-pts swap.opt
NodeID 4 PointsTo: { 5 }
NodeID 7 PointsTo: { 22 }
NodeID 8 PointsTo: { 24 }
NodeID 9 PointsTo: { 18 20 }
NodeID 10 PointsTo: { 18 20 }
NodeID 14 PointsTo: { 15 }
NodeID 17 PointsTo: { 18 }
NodeID 19 PointsTo: { 20 }
NodeID 21 PointsTo: { 22 }
NodeID 23 PointsTo: { 24 }
7.Andersen的分析数据
$ wpa -ander -stat swap.opt
8.流敏感的指向结果
$ wpa -fspta -print-pts swap.opt
NodeID 4 PointsTo: { 5 }
NodeID 7 PointsTo: { 22 }
NodeID 8 PointsTo: { 24 }
NodeID 9 PointsTo: { 18 }
NodeID 10 PointsTo: { 20 }
NodeID 14 PointsTo: { 15 }
NodeID 17 PointsTo: { 18 }
NodeID 19 PointsTo: { 20 }
NodeID 21 PointsTo: { 22 }
NodeID 23 PointsTo: { 24 }
9.流敏感的分析数据
$ wpa -fspta -stat swap.opt
10.内存SSA
程序的(memory SSA)表示是通过添加MU和CHI函数到程序LLVM IR来构建的。
$ wpa -ander -svfg -dump-mssa swap.opt
==========FUNCTION: swap==========
2V_1 = ENCHI(MR_2V_0) pts{22 }
4V_1 = ENCHI(MR_4V_0) pts{24 }
entry
LDMU(MR_2V_1) pts{22 }
%0 = load i8** %p, align 8
LDMU(MR_4V_1) pts{24 }
%1 = load i8** %q, align 8
store i8* %1, i8** %p, align 8
2V_2 = STCHI(MR_2V_1) pts{22 }
store i8* %0, i8** %q, align 8
4V_2 = STCHI(MR_4V_1) pts{24 }
ret void
RETMU(MR_2V_2) pts{22 }
RETMU(MR_4V_2) pts{24 }
==========FUNCTION: main==========
2V_1 = ENCHI(MR_2V_0) pts{22 }
4V_1 = ENCHI(MR_4V_0) pts{24 }
entry
%a1 = alloca i8, align 1
%b1 = alloca i8, align 1
%a = alloca i8*, align 8
%b = alloca i8*, align 8
store i8* %a1, i8** %a, align 8
2V_2 = STCHI(MR_2V_1) pts{22 }
store i8* %b1, i8** %b, align 8
4V_2 = STCHI(MR_4V_1) pts{24 }
CALMU(MR_2V_2) pts{22 }
CALMU(MR_4V_2) pts{24 }
call void @swap(i8** %a, i8** %b) CallSite: call void @swap(i8** %a, i8** %b)
2V_3 = CALCHI(MR_2V_2) pts{22 }
4V_3 = CALCHI(MR_4V_2) pts{24 }
ret i32 0
RETMU(MR_2V_3) pts{22 }
RETMU(MR_4V_3) pts{24 }
11.值流图
程序的过程间稀疏值流图(SVFG)是一个导向图,包含了顶层指针和地址引用对象的def-use链。
一个SVFG node可以是一个语句(PAGEdge)、一个参数或者一个内存区域(抽象对象集合)。
绿色矩形表示address PAG node(AddrPE),红色矩形表示load PAG edge (LoadPE),蓝色矩形表示store PAG edge (StorePE)。黄色矩形表示一个参数(如SVFG NodeID14,是一个实际参数,和PAG NodeID 21相关)或者表示函数entry|exit处/callsite处/load|store处的一个内存区域,例如,SVFG NodeID 27表示内存对象PAG NodeID 24在被调用函数swap中间接他那个个callsite ID 1读取。
$ wpa -ander -svfg -dump-svfg swap.opt
SVFG可以通过消除不必要的node如ActualParm、AcutalIn和FormalRet,这样就更紧凑,使用(SVFG Optimizer)来FormalOut。
二、以lib方式使用SVF
测试环境:Ubuntu 18.04。
外部工具示例
假设(wpa.cpp
)是个使用SVF库的外部工程(注意:wap.cpp实际上是SVF的嵌入工具,我们只是用它作为例子,表示你也可以把SVF整合为你自己的工具链的一部分),构建好SVF后需要添加以下环境变量。
#建议写到/etc/profile中
export SVF_HOME=your_svf_root_folder
export SVF_HEADER=$SVF_HOME/include
export SVF_LIB=$SVF_HOME/Release-build/lib
export LD_LIBRARY_PATH=$SVF_LIB
export LLVM_DIR=your_path_to_llvm-7.0.0.obj
1.使用llvm-config来构建tool
首先需安装llvm-config,版本和编译SVF时的一样。
$ clang++ -fno-rtti -I $SVF_HEADER wpa.cpp -o wpa.out $SVF_LIB/libLLVMSvf.a $SVF_LIB/CUDD/libLLVMCudd.a `llvm-config --cxxflags --ldflags --libs --system-libs`
解释参数:
- -fno-rtti:SVF自身是用这个选项构建
- -I $SVF_HEADER:包含SVF的头
-
$SVF_LIB/libLLVMSvf.a $SVF_LIB/CUDD/libLLVMCudd.a
:用libLLVMSvf.a和libLLVMCudd.a依赖链接tool。 - llvm-config --cxxflags --ldflags --libs --system-libs:包含LLVM目录,使用LLVM CPP flags,并链接所有可能的LLVM库。
注意:保持参数顺序,否则会造成链接错误。
编译后得到可执行文件wpa.out,一个使用SVF作为库的外部工具。如果你想与SVF共享库Svf.so链接到一起,你可以使用以下命令:
$ clang++ -fno-rtti -I $SVF_HEADER wpa.cpp -o wpa.out $SVF_LIB/Svf.so $SVF_LIB/CUDD/libLLVMCudd.a `llvm-config --cxxflags --ldflags --libs --system-libs`
2.使用CMake来构建tool(示例见 here)
CMake构建spa.cpp很容易。
$ tree SVF-example
SVF-example
├── CMakeLists.txt
└── tools
├── CMakeLists.txt
└── wpa.cpp # external tool
模板是 CMakeList.txt, tools/CMakeLists.txt, wpa.cpp。和building SVF with CMake一样,你需要export LLVM_DIR环境变量到包含LLVMConfig.cmake或LLVM-Config.cmake的目录;如果你用apt-get安装的话,这个目录通常是/usr/lib/llvm-7/lib,若用CMake从源码构建LLVM并使用默认安装目录,这个目录通常是/usr/local/lib/cmake/llvm。这个目录和使用SVF时导出的一样。
三、基于非流敏感和非域敏感的指针分析示例
我们需要a graph、a solver和分析规则集合来实现指针分析,首先看看一个简单的非流敏感和非域敏感的基于包含的指针分析实现。你可能需要参考Andersen.cpp来看看具体的实现。
1.从PAG构建约束图:
PointerAnalysis::initialize(module);
consCG = new ConstraintGraph(pag);
2.写一个约束求解器(对于更复杂的求解器,如wave propagation,请参考AndersenWave.cpp 和 AndersenWaveDiff.cpp):
while (!isWorklistEmpty()) {
NodeID nodeId = popFromWorklist();
/// process address PAG Edge
for (ConstraintNode::const_iterator it = node->outgoingAddrsBegin(), eit =
node->outgoingAddrsEnd(); it != eit; ++it) {
processAddr(*it);
}
/// process Load/Store PAG Edge
for (PointsTo::iterator piter = getPts(nodeId).begin(), epiter =
getPts(nodeId).end(); piter != epiter; ++piter) {
NodeID ptd = *piter;
for (ConstraintNode::const_iterator it = node->outgoingLoadsBegin(),
eit = node->outgoingLoadsEnd(); it != eit; ++it) {
processLoad(ptd, *it);
}
for (ConstraintNode::const_iterator it = node->incomingStoresBegin(),
eit = node->incomingStoresEnd(); it != eit; ++it) {
processStore(ptd, *it);
}
}
// process copy
for (ConstraintNode::const_iterator it = node->directOutEdgeBegin(), eit =
node->directOutEdgeEnd(); it != eit; ++it) {
processCopy(nodeId, *it);
}
}
3.处理不同约束的规则:
/*!
* Process address edges
* src --addr --> dst
* pts(dst) |= {src}
*/
void Andersen::processAddr(const AddrCGEdge* addr) {
if(addPts(addr->getDstID(),addr->getSrcID()))
pushIntoWorklist(dst);
}
/*!
* Process load edges
* src --load--> dst,
* node \in pts(src) ==> node--copy-->dst
*/
void Andersen::processLoad(NodeID node, const ConstraintEdge* load) {
return addCopyEdge(node, load->getDstID());
}
/*!
* Process store edges
* src --store--> dst,
* node \in pts(dst) ==> src--copy-->node
*/
void Andersen::processStore(NodeID node, const ConstraintEdge* store) {
return addCopyEdge(store->getSrcID(), node);
}
/*!
* Process copy edges
* node --copy--> dst,
* union pts(dst) with pts(node)
*/
void Andersen::processCopy(NodeID node, const ConstraintEdge* edge) {
PointsTo& srcPts = getPts(node);
bool changed = unionPts(edge->getDstID(),srcPts);
if (changed)
pushIntoWorklist(dst);
}
四、source-sink分析示例
很容易基于过程间稀疏值流图来写source-sink分析,可以参考 LeakCheck.cpp 和 ProgSlice.cpp的具体实现。
In order to calculate the flow Boolean guards, we use CUDD-2.5.0 Package (Binary Decision the Diagrams -BDDs) encoded path condition.
1. First, using the Andersen pointer analysis Construction SVFG:
PointerAnalysis* ander = AndersenWaveDiff::createAndersenWaveDiff(module);
svfg = new SVFGOPT(ptaCallGraph);
svfgbuilder.build(svfg,ander);
2. Then, the selection candidate set of source and sink SVFGNodes:
Simple code to iterate from a SVFGNode on SVFG
for(SVFGNode::const_iterator it = node->OutEdgeBegin(), eit = node->OutEdgeEnd(); it!=eit; ++it) {
}
3. Finally, the method using AllPathReachableSolve (from ProgSlice class) to the full path reachability analysis, is calculated by the iterative method of the following three values calling Guards stream, until it reaches a fixed point.
- ComputeInterCallVFGGuard
- ComputeInterRetVFGGuard
- ComputeIntraVFGGuard
reference:
https://github.com/svf-tools/SVF/wiki/Analyze-a-Simple-C-Program
https://github.com/SVF-tools/SVF/wiki/Using-SVF-as-a-lib-in-your-own-tool
https://github.com/SVF-tools/SVF/wiki/Write-a-flow--and-field---insensitive-pointer-analysis
https://github.com/SVF-tools/SVF/wiki/Write-a-source-sink-analyzer