[Personal use] Static analysis notes

Preface

These notes record the problems, small assignments, and small experiments I ran into while learning static analysis. If you want to watch the course itself, here is my viewing link. I recommend bookmarking it; both teachers are very good! I am a fan of this pairing on Bilibili (Station B).

concept

What is static analysis? Why do we need it?

Static analysis means determining, before a program is run, how it will behave and whether it satisfies certain requirements.
Over time, programming languages themselves have not become much more complex, but projects keep growing in scale and complexity, and their reliability and security must be ensured. Under these conditions, finding problems by debugging and running the project is too costly, whereas static analysis can detect problems or vulnerabilities in advance, without running the program.

Rice's theorem

Rice's theorem states that any non-trivial property of recursively enumerable languages is undecidable.
For the difference between recursively enumerable languages and other classes of languages, see: Understanding Links
Non-trivial property: a property that only some recursively enumerable languages satisfy.
Trivial property: a property that all recursively enumerable languages satisfy (or that none do).

complete&&sound&&truth

complete: under-approximation (lower approximation)
sound: over-approximation (upper approximation)
Size of the reported sets: sound > truth > complete. Soundness is more important because a sound result covers more bugs: it has false positives, but it contains every real bug, so it avoids false negatives. In practice a compromise is used, because it is impossible for a static analysis to be sound and complete at the same time, that is, to report exactly truth.

3-Address Code (3AC)

Three-address code (3AC) is an intermediate representation used for static analysis.
Each instruction has at most one operator on its right-hand side and contains at most three addresses.
Compare it with the abstract syntax tree (AST):
AST features:

  1. Grammatical structure close to human cognition
  2. Language dependent
  3. Good for quick type checking
  4. Missing control flow information

3AC features:

  1. Close to machine code
  2. Language independent, not dependent on a certain language
  3. Contains control flow information
  4. Compact and uniform

Control Flow Graph (CFG)

A control flow graph is an abstract representation of a procedure or program. It is an abstract data structure maintained internally by the compiler, and it represents all the paths that may be traversed during the execution of a program. It shows, in the form of a graph, the possible flow of execution among all basic blocks of a procedure, and can also reflect how the procedure executes at run time.
In a control flow graph, each node represents a basic block, that is, a straight-line sequence of code without any jumps or jump targets in the middle: a jump target starts a block, and a jump ends a block. Directed edges represent jumps in the control flow. In most presentations there are two specially designated blocks: the entry block, through which control enters the flow graph, and the exit block, through which all control flow leaves.

Basic block division rules
Entry rules
  1. The first instruction of the program is an entry
  2. The target instruction of a jump is an entry
  3. The instruction immediately following a jump is an entry
Building blocks

A basic block consists of one entry and all subsequent instructions up to the next entry.

Edges
  1. The end of block A has a goto that points to the beginning of block B
  2. Block B immediately follows block A and block A does not end with an unconditional jump. A conditional jump naturally has two exits.
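
The entry rules and the block-building rule can be tried out on a toy instruction list. The sketch below is only an illustration: the Instr record, the sample program, and the convention that jumpTarget is -1 for non-jumps are assumptions made here, not part of the course framework. It needs Java 16 or later because it uses records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class BasicBlockSketch {
    /** Hypothetical instruction: jumpTarget is -1 when the instruction is not a jump. */
    record Instr(String text, int jumpTarget) {
        boolean isJump() { return jumpTarget >= 0; }
    }

    /** Returns the sorted indices of all entries found by the three entry rules. */
    static List<Integer> entries(List<Instr> code) {
        TreeSet<Integer> entries = new TreeSet<>();
        if (!code.isEmpty()) {
            entries.add(0);                          // rule 1: first instruction
        }
        for (int i = 0; i < code.size(); i++) {
            Instr instr = code.get(i);
            if (instr.isJump()) {
                entries.add(instr.jumpTarget());     // rule 2: target of a jump
                if (i + 1 < code.size()) {
                    entries.add(i + 1);              // rule 3: instruction after a jump
                }
            }
        }
        return new ArrayList<>(entries);
    }

    /** Cuts the code into blocks: each block runs from one entry up to the next entry. */
    static List<List<Instr>> blocks(List<Instr> code) {
        List<Integer> starts = entries(code);
        List<List<Instr>> blocks = new ArrayList<>();
        for (int k = 0; k < starts.size(); k++) {
            int end = (k + 1 < starts.size()) ? starts.get(k + 1) : code.size();
            blocks.add(code.subList(starts.get(k), end));
        }
        return blocks;
    }

    /** A small loop written as three-address code; the comments give the indices. */
    static List<Instr> sample() {
        return List.of(
                new Instr("i = 0", -1),              // 0
                new Instr("if i >= 10 goto 5", 5),   // 1
                new Instr("x = x + i", -1),          // 2
                new Instr("i = i + 1", -1),          // 3
                new Instr("goto 1", 1),              // 4
                new Instr("return x", -1));          // 5
    }
}
```

For the sample program the entries are the instructions 0, 1, 2, and 5, which gives four blocks: {0}, {1}, {2, 3, 4}, and {5}.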

The basic knowledge above will not be updated for the time being, because I found that someone else has already done a great job of it, so I don't want to write it (I am just lazy).
For A1-A4, after finishing the homework I followed this expert's ideas and added my own annotations and explanations. If you cannot follow the algorithms, please attend the lectures first and then come back.

A1

reference

courseware

coding tasks

1. Implement live variable analysis: LiveVariableAnalysis.java

  • SetFact newBoundaryFact(CFG)
  • SetFact newInitialFact()
  • void meetInto(SetFact,SetFact)
  • boolean transferNode(Stmt,SetFact,SetFact)

2. Implement the iterative solver: Solver.java, IterativeSolver.java

  • Solver.initializeBackward(CFG,DataflowResult)
  • IterativeSolver.doSolveBackward(CFG,DataflowResult)

principle

LiveVariableAnalysis.java

Insert image description here
Looking directly at the picture, we immediately find two simple functions, newBoundaryFact and newInitialFact, which just return a newly initialized empty set, a SetFact<Var> object.

newBoundaryFact / newInitialFact
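public SetFact<Var> newBoundaryFact(CFG<Stmt> cfg) {
    // TODO - finish me
    // Boundary fact (IN[exit]) of live variable analysis: the empty set.
    return new SetFact<Var>();
}

public SetFact<Var> newInitialFact() {
    // TODO - finish me
    // Initial fact of every other node: also the empty set.
    return new SetFact<Var>();
}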

Next, meetInto performs the union operation on OUT. Note that target must come first (target.union(fact)), otherwise the result is wrong.

meetInto
public void meetInto(SetFact<Var> fact, SetFact<Var> target) {
    // TODO - finish me
    // The meet of live variable analysis is set union; target is updated in place.
    target.union(fact);
}

Finally, transferNode determines whether the live variables of the current node change; it returns true if they change and false otherwise.
Live variables are computed as follows (IN = use ∪ (OUT − def)). Starting from OUT, the variable defined on the left side of the statement is removed, because its old value is overwritten here and is therefore not live before the statement. Then the variables used on the right side are added to the live set. The order matters: in a statement such as a = a - 1, a is both defined and used, and because the use reads the old value, a must end up in the live set.

transferNode
public boolean transferNode(Stmt stmt, SetFact<Var> in, SetFact<Var> out) {
    // TODO - finish me
    // Returns true if IN of this node changes, false otherwise.
    // stmt.getDef() returns an Optional<LValue>: the left-hand side, if there is one.
    // stmt.getUses() returns a List<RValue>: the values used by the statement.
    Optional<LValue> defV = stmt.getDef();
    List<RValue> useV = stmt.getUses();
    // Work on a copy of OUT so that OUT itself is not modified.
    SetFact<Var> tmpsetfact = out.copy();

    if (defV.isPresent()) {
        LValue defV_val = defV.get();
        if (defV_val instanceof Var) {
            // Kill the defined variable: its old value is overwritten here.
            // Example: before "a = 2;" the old value of a is not live.
            // In "a = a - 1;" a is also used, so the loop below adds it back.
            tmpsetfact.remove((Var) defV_val);
        }
    }
    for (RValue rval : useV) {
        if (rval instanceof Var) {
            // Every used variable is live before the statement.
            tmpsetfact.add((Var) rval);
        }
    }
    if (!in.equals(tmpsetfact)) {
        // IN changed: store the new fact and report the change.
        in.set(tmpsetfact);
        return true;
    }
    return false;
}
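
The transfer function can also be checked in isolation on plain sets. The following is a minimal sketch that does not use the assignment framework: variables are plain strings, and the class and method names are made up for this illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class LiveTransferSketch {
    /**
     * Computes IN = use ∪ (OUT − def) for a single statement.
     * def may be null when the statement defines no variable.
     */
    static Set<String> transfer(String def, Set<String> uses, Set<String> out) {
        Set<String> in = new LinkedHashSet<>(out);
        if (def != null) {
            in.remove(def);   // kill the defined variable first
        }
        in.addAll(uses);      // then add every used variable
        return in;
    }
}
```

For `a = a - 1` with OUT = {b}, the call `transfer("a", Set.of("a"), Set.of("b"))` yields the set containing a and b. For `x = 1` with OUT = {x, y}, x is killed and only y remains.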
Solver.java

Live variable analysis is performed backward, which makes the analysis more efficient.
Insert image description here
First obtain the exit node, and then update the Map<Node, Fact> entries for IN and OUT. For the API, see DataflowResult's setIn/OutFact and DataflowAnalysis's newInitialFact (initializes an empty Fact) / newBoundaryFact (returns the corresponding Fact).

initializeBackward
protected void initializeBackward(CFG<Node> cfg, DataflowResult<Node, Fact> result) {
    // TODO - finish me
    // Get the exit node.
    Node exit_node = cfg.getExit();
    // Backward analysis: the boundary fact goes to IN[exit].
    // setInFact(node, fact) stores the pair in the result's IN map.
    result.setInFact(exit_node, analysis.newBoundaryFact(cfg));
    for (Node node : cfg) {
        if (!cfg.isExit(node)) {
            // Every other node starts with the initial fact for both IN and OUT.
            result.setInFact(node, analysis.newInitialFact());
            result.setOutFact(node, analysis.newInitialFact());
        }
    }
}
IterativeSolver.java

Again, follow the picture. As long as the live variable facts of some node change (this is decided by the transferNode implemented above, so nothing new has to be written here), keep updating until nothing changes. Later assignments use a worklist to make this more efficient.
Insert image description here

doSolveBackward
protected void doSolveBackward(CFG<Node> cfg, DataflowResult<Node, Fact> result) {
    // TODO - finish me
    boolean change = true;
    while (change) {
        change = false;
        for (Node node : cfg) {
            if (!cfg.isExit(node)) {
                // OUT[node] = meet of IN[succ] over all successors of node.
                for (Node suc_node : cfg.getSuccsOf(node)) {
                    analysis.meetInto(result.getInFact(suc_node), result.getOutFact(node));
                }
                // Use the non-short-circuit |= so that transferNode runs for every node
                // in each pass, even after an earlier node has already changed.
                change |= analysis.transferNode(node, result.getInFact(node), result.getOutFact(node));
            }
        }
    }
}

In fact, much of what is used here relies on the methods implemented earlier; because the DataflowAnalysis class sits in between, there is no need to rewrite them. What we mainly need to implement is the logical framework.

A2

reference

courseware

coding tasks

1. Implement constant propagation analysis: ConstantPropagation.java

  • CPFact newBoundaryFact(CFG)
  • CPFact newInitialFact()
  • void meetInto(CPFact,CPFact)
  • boolean transferNode(Stmt,CPFact,CPFact)

2. Implement the worklist solver: Solver.java, WorkListSolver.java

  • Solver.initializeForward(CFG,DataflowResult)
  • WorkListSolver.doSolveForward(CFG,DataflowResult)

principle

ConstantPropagation.java

Insert image description here
As the picture shows, OUT is initialized to an empty object, so newInitialFact is the same as last time.

newInitialFact
public CPFact newInitialFact() {
    // TODO - finish me
    return new CPFact();
}

But the assignment requirements say: "In newBoundaryFact(), you must carefully handle the parameters of the method being analyzed. Specifically, you must initialize their values to NAC." Why is this?
My personal understanding of why NAC is used rather than UNDEF: while the program runs, the value of a parameter is unknown. It may be a constant or a non-constant, but it cannot be undefined, because a method's parameters (if it has any) are always given values when the method is called. The analysis therefore treats every parameter as NAC by default; if the method has no parameters, an empty CPFact is returned.

newBoundaryFact
public CPFact newBoundaryFact(CFG<Stmt> cfg) {
    // TODO - finish me
    // A CPFact (constant propagation fact) is a map {variable -> value},
    // where each value is NAC, UNDEF, or a constant. It starts out empty.
    CPFact fact = new CPFact();
    // getIR(): get the IR (intermediate representation) from the control flow graph
    // getParams(): get the parameter variables from the IR (the this variable is excluded)
    List<Var> vars = cfg.getIR().getParams();
    for (Var var : vars) {
        // Only variables of type BYTE, SHORT, INT, CHAR, or BOOLEAN are tracked.
        if (canHoldInt(var)) {
            // Parameter values are unknown at method entry, so map each one to NAC.
            fact.update(var, Value.getNAC());
        }
    }
    return fact;
}

Here meetValue is the meet operator, not union! The specific rules are as follows:
Insert image description here
These rules are written with the symbols defined in the lecture; for the concrete API, see Value.java.

meetValue
public Value meetValue(Value v1, Value v2) {
    // TODO - finish me
    /* Meet operator of constant propagation.
       Notation: UNDEF = undefined, NAC = not a constant, v = any value, c = a constant.
       1. NAC ^ v = NAC
       2. UNDEF ^ v = v
       3. c ^ c = c
       4. c1 ^ c2 = NAC (c1 != c2) */
    if (v1.isNAC() || v2.isNAC()) {
        return Value.getNAC(); // NAC ^ v = NAC
    }
    if (v1.isUndef()) {
        return v2; // UNDEF ^ v = v
    }
    if (v2.isUndef()) {
        return v1; // v ^ UNDEF = v
    }
    if (v1.getConstant() == v2.getConstant()) {
        return Value.makeConstant(v1.getConstant()); // c ^ c = c
    }
    return Value.getNAC(); // c1 ^ c2 = NAC
}
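
To experiment with the meet rules outside the framework, the lattice can be modelled with a small stand-in type. This is only a sketch under assumptions: Kind, Val, and meet are hypothetical names invented here and are not the Value API of the assignment. It needs Java 16 or later because it uses records.

```java
final class MeetSketch {
    enum Kind { UNDEF, CONST, NAC }

    /** Hypothetical stand-in for the framework's Value: a kind plus an int payload. */
    record Val(Kind kind, int constant) {
        static final Val UNDEF = new Val(Kind.UNDEF, 0);
        static final Val NAC = new Val(Kind.NAC, 0);

        static Val of(int c) {
            return new Val(Kind.CONST, c);
        }
    }

    /** Meet of two lattice values, following the four rules of constant propagation. */
    static Val meet(Val a, Val b) {
        if (a.kind() == Kind.NAC || b.kind() == Kind.NAC) {
            return Val.NAC;                                  // NAC ^ v = NAC
        }
        if (a.kind() == Kind.UNDEF) {
            return b;                                        // UNDEF ^ v = v
        }
        if (b.kind() == Kind.UNDEF) {
            return a;                                        // v ^ UNDEF = v
        }
        return a.constant() == b.constant() ? a : Val.NAC;   // c ^ c = c, c1 ^ c2 = NAC
    }
}
```

With this model, meeting 1 with 1 stays the constant 1, meeting 1 with 2 gives NAC, and meeting UNDEF with 3 gives the constant 3, which is exactly why meet is not the same thing as set union.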
meetInto

meetInto applies meetValue to each key and updates the corresponding key-value pair in target:

public void meetInto(CPFact fact, CPFact target) {
    // TODO - finish me
    // CPFact extends MapFact; keySet() returns its set of keys.
    for (Var var : fact.keySet()) {
        // get(var) returns the value mapped to var, or UNDEF if the key is absent.
        target.update(var, meetValue(fact.get(var), target.get(var)));
    }
}

Insert image description here
transferNode mainly determines whether OUT has changed. evaluate computes the Value of an expression; the specific operators are defined in ir.exp.*

transferNode / evaluate
public boolean transferNode(Stmt stmt, CPFact in, CPFact out) {
    // TODO - finish me
    if (stmt instanceof DefinitionStmt<?, ?>) {
        // DefinitionStmt declares getLValue() and getRValue() for its two sides.
        LValue lv = ((DefinitionStmt<?, ?>) stmt).getLValue();
        RValue rv = ((DefinitionStmt<?, ?>) stmt).getRValue();
        if (lv instanceof Var && canHoldInt((Var) lv)) {
            // Start from a copy of IN, overwrite the defined variable with the
            // evaluated right-hand side, then copy the result into OUT.
            CPFact tf = in.copy();
            tf.update((Var) lv, evaluate(rv, in));
            return out.copyFrom(tf);
        }
    }
    // Any other statement: OUT = IN.
    return out.copyFrom(in);
}
/**
 * Evaluates the {@link Value} of given expression.
 *
 * @param exp the expression to be evaluated
 * @param in  IN fact of the statement
 * @return the resulting {@link Value}
 */
public static Value evaluate(Exp exp, CPFact in) {
    // TODO - finish me
    // An integer literal evaluates to its own constant value.
    if (exp instanceof IntLiteral) {
        return Value.makeConstant(((IntLiteral) exp).getValue());
    }
    // A variable evaluates to whatever IN records for it.
    if (exp instanceof Var) {
        return in.get((Var) exp);
    }
    // Default for every expression that is not handled below: NAC.
    Value result = Value.getNAC();
    // Binary expression: A op B
    if (exp instanceof BinaryExp) {
        Var op1 = ((BinaryExp) exp).getOperand1(), op2 = ((BinaryExp) exp).getOperand2();
        Value op1_val = in.get(op1), op2_val = in.get(op2);
        BinaryExp.Op op = ((BinaryExp) exp).getOperator();

        if (op1_val.isConstant() && op2_val.isConstant()) {
            int c1 = op1_val.getConstant(), c2 = op2_val.getConstant();
            if (exp instanceof ArithmeticExp) {
                // Arithmetic operators: + - * / %
                if (op == ArithmeticExp.Op.ADD) {
                    result = Value.makeConstant(c1 + c2);
                } else if (op == ArithmeticExp.Op.DIV) {
                    // Division by zero yields UNDEF.
                    result = (c2 == 0) ? Value.getUndef() : Value.makeConstant(c1 / c2);
                } else if (op == ArithmeticExp.Op.MUL) {
                    result = Value.makeConstant(c1 * c2);
                } else if (op == ArithmeticExp.Op.SUB) {
                    result = Value.makeConstant(c1 - c2);
                } else if (op == ArithmeticExp.Op.REM) {
                    // Remainder by zero yields UNDEF.
                    result = (c2 == 0) ? Value.getUndef() : Value.makeConstant(c1 % c2);
                }
            } else if (exp instanceof BitwiseExp) {
                // Bitwise operators: & | ^
                if (op == BitwiseExp.Op.AND) {
                    result = Value.makeConstant(c1 & c2);
                } else if (op == BitwiseExp.Op.OR) {
                    result = Value.makeConstant(c1 | c2);
                } else if (op == BitwiseExp.Op.XOR) {
                    result = Value.makeConstant(c1 ^ c2);
                }
            } else if (exp instanceof ConditionExp) {
                // Comparison operators: the result is encoded as 1 (true) or 0 (false).
                if (op == ConditionExp.Op.EQ) {
                    result = Value.makeConstant((c1 == c2) ? 1 : 0);
                } else if (op == ConditionExp.Op.GE) {
                    result = Value.makeConstant((c1 >= c2) ? 1 : 0);
                } else if (op == ConditionExp.Op.GT) {
                    result = Value.makeConstant((c1 > c2) ? 1 : 0);
                } else if (op == ConditionExp.Op.LE) {
                    result = Value.makeConstant((c1 <= c2) ? 1 : 0);
                } else if (op == ConditionExp.Op.LT) {
                    result = Value.makeConstant((c1 < c2) ? 1 : 0);
                } else if (op == ConditionExp.Op.NE) {
                    result = Value.makeConstant((c1 != c2) ? 1 : 0);
                }
            } else if (exp instanceof ShiftExp) {
                // Shift operators: << >> >>>
                if (op == ShiftExp.Op.SHL) {
                    result = Value.makeConstant(c1 << c2);
                } else if (op == ShiftExp.Op.SHR) {
                    result = Value.makeConstant(c1 >> c2);
                } else if (op == ShiftExp.Op.USHR) {
                    result = Value.makeConstant(c1 >>> c2);
                }
            } else {
                result = Value.getUndef();
            }
        } else if (op1_val.isNAC() || op2_val.isNAC()) {
            // At least one operand is NAC.
            if (exp instanceof ArithmeticExp && (op == ArithmeticExp.Op.DIV || op == ArithmeticExp.Op.REM)) {
                // NAC / 0 and NAC % 0 are still UNDEF.
                if (op2_val.isConstant() && op2_val.getConstant() == 0) {
                    result = Value.getUndef();
                } else {
                    result = Value.getNAC();
                }
            } else {
                result = Value.getNAC();
            }
        } else {
            // Neither operand is NAC and at least one is UNDEF.
            result = Value.getUndef();
        }
    }
    return result;
}
Solver.java

Initialization for the forward analysis: initialize IN and OUT of every node; the entry node and the other nodes are initialized separately.

initializeForward
protected void initializeForward(CFG<Node> cfg, DataflowResult<Node, Fact> result) {
    // TODO - finish me
    // Get the entry node.
    Node entry_node = cfg.getEntry();
    // The entry node gets the boundary fact for both IN and OUT.
    result.setOutFact(entry_node, analysis.newBoundaryFact(cfg));
    result.setInFact(entry_node, analysis.newBoundaryFact(cfg));
    for (Node node : cfg) {
        if (!cfg.isEntry(node)) {
            // Every other node starts with the initial fact.
            result.setOutFact(node, analysis.newInitialFact());
            result.setInFact(node, analysis.newInitialFact());
        }
    }
}
WorkListSolver.java

Insert image description here
Follow the picture: create a work queue and add the initialized nodes to it. While the worklist is not empty, take a node out and compute its IN and OUT. IN is obtained by taking OUT[p] of every predecessor p of the current node and executing meetInto(OUT[p], IN) (iteratively). transferNode then determines whether OUT changes; if it does, the successors of the current node are added to the queue. For the API, see DataflowAnalysis.java.

doSolveForward
protected void doSolveForward(CFG<Node> cfg, DataflowResult<Node, Fact> result) {
    // TODO - finish me
    // The worklist is a queue that initially holds every node.
    Queue<Node> work_list = new ArrayDeque<>();
    for (Node node : cfg) {
        work_list.add(node);
    }
    while (!work_list.isEmpty()) {
        // Take the node at the head of the queue.
        Node node = work_list.poll();
        Fact out = result.getOutFact(node);
        Fact in = result.getInFact(node);
        // IN[node] = meet of OUT[pred] over all predecessors.
        for (Node pred_node : cfg.getPredsOf(node)) {
            analysis.meetInto(result.getOutFact(pred_node), in);
        }
        // If OUT changed, the successors must be processed again.
        if (analysis.transferNode(node, in, out)) {
            for (Node successor : cfg.getSuccsOf(node)) {
                if (!work_list.contains(successor)) {
                    work_list.add(successor);
                }
            }
        }
    }
}

A3

First of all, make sure that your A1-A2 code has been submitted and passes, because the grader will use those two parts of your code for evaluation.

reference

courseware

coding tasks

  • Implement a dead code detector: DeadCodeDetection.java
  • Hidden task: complete some other code. ConstantPropagation.java and LiveVariableAnalysis.java can be copied and pasted directly; in Solver.java and WorkListSolver.java you need to implement initializeBackward / doSolveBackward yourself

principle

Solver.java

Insert image description here
We need to add initializeBackward, which initializes the backward analysis. From the picture we can see that IN[exit] is first initialized to an empty object, and then, for each block B traversed, IN[B] is initialized to an empty object, from which OUT[B] is later obtained.

initializeForward / initializeBackward
protected void initializeForward(CFG<Node> cfg, DataflowResult<Node, Fact> result) {
    // TODO - finish me
    Node entry_node = cfg.getEntry();
    // IN[entry] and OUT[entry] get the boundary fact.
    result.setOutFact(entry_node, analysis.newBoundaryFact(cfg));
    result.setInFact(entry_node, analysis.newBoundaryFact(cfg));
    for (Node node : cfg) {
        if (!cfg.isEntry(node)) {
            // IN[B] and OUT[B] get the initial fact.
            result.setOutFact(node, analysis.newInitialFact());
            result.setInFact(node, analysis.newInitialFact());
        }
    }
}

protected void initializeBackward(CFG<Node> cfg, DataflowResult<Node, Fact> result) {
    // TODO - finish me
    // newBoundaryFact(cfg) returns the fact for the boundary condition.
    // IN[exit] and OUT[exit] get the boundary fact. OUT[exit] is set as well
    // because the worklist solver reads OUT of every node, including exit.
    Node exit_node = cfg.getExit();
    result.setInFact(exit_node, analysis.newBoundaryFact(cfg));
    result.setOutFact(exit_node, analysis.newBoundaryFact(cfg));
    for (Node node : cfg) {
        if (cfg.isExit(node)) {
            continue;
        }
        // IN[B] and OUT[B] of every other node get the initial fact.
        // (For live variable analysis both facts are the empty set.)
        result.setInFact(node, analysis.newInitialFact());
        result.setOutFact(node, analysis.newInitialFact());
    }
}
WorkListSolver.java

We already implemented the forward part of WorkListSolver.java earlier, so for the backward version we only need to swap the roles of predecessors and successors.

doSolveForward / doSolveBackward
protected void doSolveForward(CFG<Node> cfg, DataflowResult<Node, Fact> result) {
    // TODO - finish me
    // The worklist is a queue that initially holds every node.
    Queue<Node> work_list = new ArrayDeque<>();
    for (Node node : cfg) {
        work_list.add(node);
    }
    while (!work_list.isEmpty()) {
        // Take the node at the head of the queue.
        Node node = work_list.poll();
        Fact out = result.getOutFact(node);
        Fact in = result.getInFact(node);
        // IN[node] = meet of OUT[pred] over all predecessors.
        for (Node pred_node : cfg.getPredsOf(node)) {
            analysis.meetInto(result.getOutFact(pred_node), in);
        }
        // If OUT changed, add the successors to the worklist.
        if (analysis.transferNode(node, in, out)) {
            for (Node successor : cfg.getSuccsOf(node)) {
                if (!work_list.contains(successor)) {
                    work_list.add(successor);
                }
            }
        }
    }
}


protected void doSolveBackward(CFG<Node> cfg, DataflowResult<Node, Fact> result) {
    // TODO - finish me
    Queue<Node> work_list = new ArrayDeque<>();
    for (Node node : cfg) {
        work_list.add(node);
    }
    while (!work_list.isEmpty()) {
        Node node = work_list.poll();
        Fact in = result.getInFact(node);
        Fact out = result.getOutFact(node);
        // OUT[node] = meet of IN[succ] over all successors.
        for (Node succ_node : cfg.getSuccsOf(node)) {
            analysis.meetInto(result.getInFact(succ_node), out);
        }
        // If IN changed, add the predecessors to the worklist.
        if (analysis.transferNode(node, in, out)) {
            for (Node pred : cfg.getPredsOf(node)) {
                if (!work_list.contains(pred)) {
                    work_list.add(pred);
                }
            }
        }
    }
}
DeadCodeDetection.java

1. Dead code in the case of assignments:
If the variable on the left side is not in the live variable set, the assignment must be eliminated; otherwise it is kept.
2. Dead code in the case of the four branch edge kinds:
IF_TRUE / IF_FALSE
Insert image description here
SWITCH_CASE / SWITCH_DEFAULT
Insert image description here
Here, we directly use the results of the previous live variable analysis, which gives the sets of live variables, liveVars, and then filter the statements by kind: assignment, IF, and SWITCH. For IF and SWITCH we need to evaluate the Value of the condition: true / false for IF, and NAC / Constant in general.

analyze
public Set<Stmt> analyze(IR ir) {
    // obtain CFG
    CFG<Stmt> cfg = ir.getResult(CFGBuilder.ID);
    // obtain result of constant propagation
    DataflowResult<Stmt, CPFact> constants =
            ir.getResult(ConstantPropagation.ID);
    // obtain result of live variable analysis
    DataflowResult<Stmt, SetFact<Var>> liveVars =
            ir.getResult(LiveVariableAnalysis.ID);
    // keep statements (dead code) sorted in the resulting set
    Set<Stmt> deadCode = new TreeSet<>(Comparator.comparing(Stmt::getIndex));
    // TODO - finish me
    // Two sets are used: statements that never reach liveCode end up in deadCode.
    Set<Stmt> liveCode = new TreeSet<>(Comparator.comparing(Stmt::getIndex));
    Queue<Stmt> stmtQueue = new LinkedList<>();
    // Your task is to recognize dead code in ir and add it to deadCode
    // Edge kinds of interest: IF_TRUE, IF_FALSE, SWITCH_CASE, and SWITCH_DEFAULT
    stmtQueue.add(cfg.getEntry());
    while (!stmtQueue.isEmpty()) {
        Stmt stmt = stmtQueue.poll();
        if (cfg.isExit(stmt)) {
            liveCode.add(stmt);
            continue;
        }
        if (deadCode.contains(stmt) || liveCode.contains(stmt)) {
            continue; // already visited
        }
        if (stmt instanceof AssignStmt<?, ?> assignStmt) {
            // Assignment statement.
            if (hasNoSideEffect(assignStmt.getRValue())
                    && assignStmt.getLValue() instanceof Var lVar
                    && !liveVars.getOutFact(assignStmt).contains(lVar)) {
                // Dead assignment: the right side has no side effect and the
                // assigned variable is not live after the statement.
                deadCode.add(stmt);
            } else {
                liveCode.add(stmt);
            }
            // Control flow continues to the successors in either case.
            stmtQueue.addAll(cfg.getSuccsOf(stmt));
        } else if (stmt instanceof If) {
            // IF statement.
            liveCode.add(stmt);
            // getCondition() returns the "A op B" in if (A op B);
            // it is evaluated against the IN fact of constant propagation.
            Value condition_value = ConstantPropagation.evaluate(((If) stmt).getCondition(),
                    constants.getInFact(stmt));
            if (condition_value.isConstant()) {
                // Constant condition: follow only the edge that is actually taken.
                Edge.Kind taken = condition_value.getConstant() == 0
                        ? Edge.Kind.IF_FALSE : Edge.Kind.IF_TRUE;
                for (Edge<Stmt> e : cfg.getOutEdgesOf(stmt)) {
                    if (e.getKind() == taken) {
                        // Edge: source -> target; continue at the target.
                        stmtQueue.add(e.getTarget());
                        break;
                    }
                }
            } else {
                // Unknown condition: both branches are reachable.
                stmtQueue.addAll(cfg.getSuccsOf(stmt));
            }
        } else if (stmt instanceof SwitchStmt) {
            // SWITCH statement.
            liveCode.add(stmt);

            Value condition_value = ConstantPropagation.evaluate(((SwitchStmt) stmt).getVar(),
                    constants.getInFact(stmt));
            if (condition_value.isConstant()) {
                if (((SwitchStmt) stmt).getCaseValues().contains(condition_value.getConstant())) {
                    // A case matches the constant: follow only that case edge.
                    for (Edge<Stmt> e : cfg.getOutEdgesOf(stmt)) {
                        if (e.isSwitchCase() && e.getCaseValue() == condition_value.getConstant()) {
                            stmtQueue.add(e.getTarget());
                        }
                    }
                } else {
                    // No case matches: only the default target is reachable.
                    stmtQueue.add(((SwitchStmt) stmt).getDefaultTarget());
                }
            } else {
                stmtQueue.addAll(cfg.getSuccsOf(stmt));
            }
        } else {
            liveCode.add(stmt);
            stmtQueue.addAll(cfg.getSuccsOf(stmt));
        }
    }
    // Every statement that never reached liveCode is dead code.
    for (Stmt s : cfg.getIR().getStmts()) {
        if (!liveCode.contains(s)) {
            deadCode.add(s);
        }
    }
    // The entry and exit nodes are never reported.
    deadCode.remove(cfg.getEntry());
    deadCode.remove(cfg.getExit());
    return deadCode;
}

A4

reference

courseware

coding tasks

1. Implement Class Hierarchy Analysis (CHA): CHABuilder.java

  • JMethod dispatch(JClass,Subsignature)
  • Set<JMethod> resolve(Invoke)
  • CallGraph<Invoke, JMethod> buildCallGraph(JMethod)

2. Implement interprocedural constant propagation: InterConstantPropagation.java

  • boolean transferCallNode(Stmt,CPFact,CPFact)
  • boolean transferNonCallNode(Stmt,CPFact,CPFact)
  • CPFact transferNormalEdge(NormalEdge,CPFact)
  • CPFact transferCallToReturnEdge(CallToReturnEdge,CPFact)
  • CPFact transferCallEdge(LocalEdge,CPFact)
  • CPFact transferReturnEdge(LocalEdge,CPFact)

3. Implement the interprocedural worklist solver: InterSolver.java

  • void initialize()
  • void doSolve()

principle

Many API types are involved here; please look through the API yourself first to avoid confusion.

CHABuilder.java

Let's start with something simple. The principle of dispatch is as follows:
Insert image description here
Follow the picture: we start from the target class c and look for the target method m in it. If c contains a non-abstract method m' with the same name and the same descriptor as m (the implementation may differ), m' is returned directly. Otherwise, if c has a super class c' (c is a subclass of c'), the same search is applied to the super class (parent class) c'.
In one sentence: it is a recursive brute-force search for a method m' with the same name and the same descriptor as m, starting from the class itself and moving up through its parent classes. (Is it still very abstract? Anyway, if you don't know it, there's nothing you can do about it.)

dispatch
private JMethod dispatch(JClass jclass, Subsignature subsignature) {
    // TODO - finish me
    if (jclass != null) {
        // Look for a method with this subsignature declared in jclass itself.
        JMethod jm = jclass.getDeclaredMethod(subsignature);
        if (jm != null && !jm.isAbstract()) {
            // Found a non-abstract method with a matching subsignature.
            return jm;
        }
        // Otherwise continue the search in the super class.
        return dispatch(jclass.getSuperClass(), subsignature);
    }
    return null;
}
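
The same upward search can be tried on a toy class hierarchy. The sketch below is an illustration under assumptions: the Cls type and the use of plain strings as subsignatures are made up here and do not correspond to JClass, JMethod, or Subsignature in the assignment.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DispatchSketch {
    /**
     * Hypothetical class model: a name, an optional super class, and the
     * subsignatures of the non-abstract methods that the class itself declares.
     */
    static final class Cls {
        final String name;
        final Cls superClass;
        final Set<String> concreteMethods = new HashSet<>();

        Cls(String name, Cls superClass, String... methods) {
            this.name = name;
            this.superClass = superClass;
            concreteMethods.addAll(List.of(methods));
        }
    }

    /**
     * Walks up from c and returns "Owner.subsig" for the first class that
     * declares a concrete method with the given subsignature, or null.
     */
    static String dispatch(Cls c, String subsig) {
        for (Cls cur = c; cur != null; cur = cur.superClass) {
            if (cur.concreteMethods.contains(subsig)) {
                return cur.name + "." + subsig;
            }
        }
        return null;
    }
}
```

With A declaring foo() and bar(), B extending A and overriding foo(), and C extending B without declaring anything, dispatching foo() on C stops at B, dispatching bar() on C climbs to A, and dispatching an unknown method returns null. The loop is simply the iterative form of the recursion shown above.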

The next step is resolve, which discusses case by case how dispatch is used in different situations.
Insert image description here
Follow the picture: first set up a result set T and get the target method m from the call site cs, then start the case analysis.
1. When cs is a static call, m is taken directly as the result; m is a static method.
Insert image description here

2. When cs is a special call (private instance methods, constructors, and super class instance methods), first obtain the class c^m in which the target method is declared and then apply dispatch to it, because the class in which the target method is actually located is not necessarily the original class.
Insert image description here
3. When cs is a virtual call, classes related by derivation and inheritance are involved. The receiver class is the declared type at the call site, and all of its subclasses (including itself) are traversed, applying dispatch to each of them. Since dispatch itself walks up through the parent classes, classes both above and below in the hierarchy are covered.

resolve
private Set<JMethod> resolve(Invoke callSite) {
    Set<JMethod> set = new HashSet<>();
    if (callSite.isStatic()) {
        // Static call: take the method straight from the declaring class.
        // callSite.getMethodRef().getDeclaringClass() -> JClass
        // callSite.getMethodRef().getSubsignature()   -> Subsignature
        // JClass.getDeclaredMethod(Subsignature)      -> JMethod
        JMethod m = callSite.getMethodRef().getDeclaringClass().getDeclaredMethod(
                callSite.getMethodRef().getSubsignature());
        if (m != null) { // never put null into the result set
            set.add(m);
        }
    }
    else if (callSite.isSpecial()) {
        // Special call: apply dispatch(class, subsignature) to the declaring class.
        JClass cm = callSite.getMethodRef().getDeclaringClass();
        JMethod m = dispatch(cm, callSite.getMethodRef().getSubsignature());
        if (m != null) {
            set.add(m);
        }
    }
    else if (callSite.isVirtual() || callSite.isInterface()) {
        // Virtual or interface call: traverse the declared receiver class and
        // everything that inherits from it, applying dispatch to each class.
        JClass receiver = callSite.getMethodRef().getDeclaringClass();
        Queue<JClass> classQueue = new ArrayDeque<>();
        classQueue.add(receiver);
        while (!classQueue.isEmpty()) {
            JClass jClass = classQueue.poll();
            JMethod m = dispatch(jClass, callSite.getMethodRef().getSubsignature());
            if (m != null) {
                set.add(m);
            }
            // Ask the class hierarchy for the direct implementors, subclasses
            // and subinterfaces of jClass and queue them for the same treatment.
            classQueue.addAll(hierarchy.getDirectImplementorsOf(jClass));
            classQueue.addAll(hierarchy.getDirectSubclassesOf(jClass));
            classQueue.addAll(hierarchy.getDirectSubinterfacesOf(jClass));
        }
    }
    return set;
}
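
The virtual-call case is the one that is easiest to get wrong, so here is a minimal sketch of it on a toy hierarchy. It assumes a single method `foo()` and plain class names as strings; `ChaResolveSketch`, `addClass`, and `example` are hypothetical names, not part of Tai-e.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy CHA resolution for one virtual method foo(); all names are hypothetical.
class ChaResolveSketch {
    private final Map<String, String> superOf = new HashMap<>();      // class -> superclass
    private final Map<String, List<String>> subsOf = new HashMap<>(); // class -> direct subclasses
    private final Set<String> declaresFoo = new HashSet<>();          // classes with a concrete foo()

    ChaResolveSketch addClass(String name, String parent, boolean declaresConcreteFoo) {
        superOf.put(name, parent);
        subsOf.computeIfAbsent(name, k -> new ArrayList<>());
        if (parent != null) {
            subsOf.computeIfAbsent(parent, k -> new ArrayList<>()).add(name);
        }
        if (declaresConcreteFoo) {
            declaresFoo.add(name);
        }
        return this;
    }

    // Walk up from cls until a class with a concrete foo() is found.
    String dispatch(String cls) {
        for (String cur = cls; cur != null; cur = superOf.get(cur)) {
            if (declaresFoo.contains(cur)) {
                return cur + ".foo()";
            }
        }
        return null;
    }

    // Walk down from the declared type and dispatch on every class visited.
    Set<String> resolveVirtual(String declaredType) {
        Set<String> targets = new TreeSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(declaredType);
        while (!queue.isEmpty()) {
            String cls = queue.poll();
            if (!seen.add(cls)) {
                continue; // already handled
            }
            String m = dispatch(cls);
            if (m != null) {
                targets.add(m);
            }
            queue.addAll(subsOf.getOrDefault(cls, List.of()));
        }
        return targets;
    }

    // A declares foo(); B extends A without foo(); C extends B and D extends A, both with foo().
    static ChaResolveSketch example() {
        return new ChaResolveSketch()
                .addClass("A", null, true)
                .addClass("B", "A", false)
                .addClass("C", "B", true)
                .addClass("D", "A", true);
    }

    public static void main(String[] args) {
        System.out.println(example().resolveVirtual("B")); // [A.foo(), C.foo()]
        System.out.println(example().resolveVirtual("A")); // [A.foo(), C.foo(), D.foo()]
    }
}
```

For a receiver declared as B, the result contains A.foo() because dispatch on B walks up to A, and C.foo() because C is a subclass of B. D.foo() is excluded, since D is not below B.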
buildCallGraph

Walking through the algorithm: resolve is the key component, so it must be written before this method. The rest is bookkeeping for reachable methods. A method is marked reachable the first time it is taken from the worklist, which prevents it from being processed twice; to check whether a method is already reachable, get the reachable methods with reachableMethods() and compare. Then, for every call site cs in the newly reachable method m, apply resolve(cs). For each m' in the result set, add the edge cs -> m' to the call graph and put m' on the worklist.

private CallGraph<Invoke, JMethod> buildCallGraph(JMethod entry) {
    DefaultCallGraph callGraph = new DefaultCallGraph();
    // Add the entry JMethod to the entryMethods set.
    callGraph.addEntryMethod(entry);
    // Worklist of JMethod objects. For reference, a JMethod is built from:
    // JMethod(JClass declaringClass, String name, Set<Modifier> modifiers,
    //         List<Type> paramTypes, Type returnType, List<ClassType> exceptions,
    //         AnnotationHolder annotationHolder,
    //         @Nullable List<AnnotationHolder> paramAnnotations,
    //         Object methodSource)
    Queue<JMethod> methodQueue = new ArrayDeque<>();
    methodQueue.add(entry);
    while (!methodQueue.isEmpty()) {
        // Take the method at the head of the queue.
        JMethod jm = methodQueue.poll();
        // reachableMethods() returns a Stream of the reachable methods.
        // Stream.noneMatch(predicate) is true when NO element satisfies the
        // predicate, i.e. here: jm is not yet marked reachable.
        if (callGraph.reachableMethods().noneMatch(m -> m.equals(jm))) {
            // Mark jm reachable. On first insertion the call graph reads the
            // method's IR and records its call sites, in order:
            //   callSiteToContainer.put(invoke, method);
            //   callSitesIn.put(method, invoke);
            callGraph.addReachableMethod(jm);
            // Visit every call site cs inside jm.
            callGraph.callSitesIn(jm).forEach(
                    cs -> {
                        Set<JMethod> T = resolve(cs); // apply resolve to each call site
                        for (JMethod mm : T) {
                            // addEdge(Edge<Invoke, JMethod> edge) stores the edge in two maps:
                            //   callSiteToEdges.put(edge.getCallSite(), edge)
                            //   calleeToEdges.put(edge.getCallee(), edge)
                            // CallGraphs.getCallKind(cs) returns the CallKind of the invoke.
                            // Edge(CallKind kind, CallSite callSite, Method callee) takes three
                            // arguments, but only two of them are generic: Edge<CallSite, Method>.
                            callGraph.addEdge(new Edge<>(CallGraphs.getCallKind(cs), cs, mm));
                            methodQueue.add(mm);
                        }
                    }
            );
        }
    }
    return callGraph;
}
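
Stripped of the Tai-e types, the loop above is a plain worklist reachability computation. The sketch below assumes the call targets are already resolved and stored in a map; `CallGraphSketch`, `reachable`, and `example` are hypothetical names used only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy worklist reachability over already-resolved call targets; names are hypothetical.
class CallGraphSketch {
    // calls maps a method name to the methods its call sites resolve to.
    static Set<String> reachable(String entry, Map<String, List<String>> calls) {
        Set<String> reachable = new LinkedHashSet<>();
        Deque<String> workList = new ArrayDeque<>();
        workList.add(entry);
        while (!workList.isEmpty()) {
            String m = workList.poll();
            if (!reachable.add(m)) {
                continue; // add() returns false if m was already reachable
            }
            for (String callee : calls.getOrDefault(m, List.of())) {
                workList.add(callee); // corresponds to adding the edge cs -> m'
            }
        }
        return reachable;
    }

    static Map<String, List<String>> example() {
        return Map.of(
                "main", List.of("a", "b"),
                "a", List.of("b"),
                "b", List.of("main"), // recursion back to main terminates
                "dead", List.of("a")); // never called from main
    }

    public static void main(String[] args) {
        System.out.println(reachable("main", example())); // [main, a, b]
    }
}
```

The sketch relies on the boolean returned by `Set.add` to detect a first visit, which avoids scanning all reachable methods on every iteration. The method `dead` never appears in the result because nothing reachable from `main` calls it.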
InterConstantPropagation.java

There is no algorithm chart for this part, but we can lightly adapt the earlier one by adding the interprocedural pieces.

1. Node transfer

  • call nodes -> transferCallNode decides whether the IN/OUT of a call node has changed; if it has, OUT is overwritten
  • other nodes -> transferNonCallNode decides whether the IN/OUT of a non-call node has changed

2. Edge transfer

  • normal edges -> transferNormalEdge returns the CPFact unprocessed (identity)
  • call-to-return edges -> transferCallToReturnEdge takes the OUT fact of the call site and the call-to-return edge object, and removes the variable on the left-hand side of the call site from the CPFact<Var, Value>
  • call edges -> transferCallEdge takes the OUT fact of the call site and the call edge object, and passes the values of the actual arguments (args) to the formal parameters (params) in a CPFact<Var, Value>
  • return edges -> transferReturnEdge takes the OUT fact of the returning method and the return edge object, and passes the actual return value to the variable that receives it, again as a CPFact<Var, Value>

In the code, "passing a value" simply means updating the CPFact<Var, Value> of that node: the Value mapped to a given Var is changed.

transferCallNode
protected boolean transferCallNode(Stmt stmt, CPFact in, CPFact out) {
    // Check whether the data flow at the call node has changed.
    if (!out.equals(in)) {
        out.copyFrom(in); // changed: overwrite OUT with IN
        return true;
    }
    return false;
}
transferNonCallNode
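protected boolean transferNonCallNode(Stmt stmt, CPFact in, CPFact out) {
    // The framework's transferNode() decides whether a node is a call node and
    // calls transferCallNode or transferNonCallNode accordingly. Non-call nodes
    // reuse the intraprocedural constant propagation transfer function.
    return cp.transferNode(stmt, in, out);
}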
transferNormalEdge
protected CPFact transferNormalEdge(NormalEdge<Stmt> edge, CPFact out) {
    return out.copy(); // identity: return the CPFact without further processing
}
transferCallToReturnEdge

Removing the variable lv that appears on the left-hand side of the call means removing the pair <lv, Value> from the CPFact.

protected CPFact transferCallToReturnEdge(CallToReturnEdge<Stmt> edge, CPFact out) {
    // Call-to-return edge: kill the value of the left-hand-side variable
    // in the CPFact (a map from Var to Value).
    // The source of this edge is the call statement.
    Stmt stmt = edge.getSource();
    CPFact result = out.copy();
    // Only statements that define a value have a left-hand side to extract.
    if (stmt instanceof DefinitionStmt<?,?>) {
        // Cast, then extract the variable on the left-hand side.
        LValue lv = ((DefinitionStmt<?,?>) stmt).getLValue();
        if (lv instanceof Var) {
            result.remove((Var) lv);
        }
    }
    return result;
}
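
As a minimal sketch of the kill step, assume a fact is just a map from variable names to integer constants. `CallToReturnSketch` and `transferCallToReturn` are hypothetical names; Tai-e's `CPFact` is not used here.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy call-to-return transfer on a Map-based fact; names are hypothetical.
class CallToReturnSketch {
    // Returns a copy of the call site's OUT fact without the variable that
    // receives the call result. The input map is left untouched.
    static Map<String, Integer> transferCallToReturn(Map<String, Integer> callSiteOut, String lhs) {
        Map<String, Integer> result = new TreeMap<>(callSiteOut);
        if (lhs != null) { // a call without a left-hand side kills nothing
            result.remove(lhs);
        }
        return result;
    }

    public static void main(String[] args) {
        // Fact after "r = sum(x, y)" when r still holds a stale value 7.
        Map<String, Integer> out = new TreeMap<>(Map.of("x", 1, "y", 2, "r", 7));
        System.out.println(transferCallToReturn(out, "r")); // {x=1, y=2}
        System.out.println(out);                            // {r=7, x=1, y=2}
    }
}
```

The stale value of r must not flow along this edge, because the value of r after the call is supplied by the return edge instead.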
transferCallEdge

Passes the arguments to the parameters.
e.g. int sum(int a, int b, int c) { return a + b + c; }
The formal parameters are a, b, c. At the call site sum(1, 2, 3) the actual arguments are 1, 2, 3, so we pass 1, 2, 3 to a, b, c and record the pairs <a, 1>, <b, 2>, <c, 3>.

protected CPFact transferCallEdge(CallEdge<Stmt> edge, CPFact callSiteOut) {
    // Call edge: pass the arguments to the parameters.
    // Read the argument values from the OUT fact of the call site and return
    // a new fact that maps each parameter to the value of its argument.
    Stmt stmt = edge.getSource();
    CPFact result = newInitialFact(); // empty CPFact: map from Var to Value
    // Only an Invoke statement carries arguments.
    if (stmt instanceof Invoke) {
        // Actual arguments at the call site.
        List<Var> args = ((Invoke) stmt).getInvokeExp().getArgs();
        // The method being called.
        JMethod callee = edge.getCallee();
        // Formal parameters of the callee.
        List<Var> calleeParamList = callee.getIR().getParams();
        for (int i = 0; i < args.size(); i += 1) {
            // Map the i-th parameter (Var) to the value of the i-th argument,
            // looked up in the call site's OUT fact.
            result.update(calleeParamList.get(i), callSiteOut.get(args.get(i)));
        }
    }
    return result;
}
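
The same mapping can be sketched without Tai-e, assuming facts are maps from variable names to values written as strings. `CallEdgeSketch` and the temporaries t1, t2, t3 are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy call-edge transfer: parameter i receives the value of argument i.
class CallEdgeSketch {
    static Map<String, String> transferCallEdge(List<String> params, List<String> args,
                                                Map<String, String> callSiteOut) {
        Map<String, String> calleeIn = new LinkedHashMap<>(); // starts empty
        for (int i = 0; i < params.size() && i < args.size(); i++) {
            // Arguments with no entry in the fact are treated as UNDEF here.
            calleeIn.put(params.get(i), callSiteOut.getOrDefault(args.get(i), "UNDEF"));
        }
        return calleeIn;
    }

    public static void main(String[] args) {
        // sum(t1, t2, t3) where the caller knows t1=1, t2=2, t3=3 and an unrelated w=9.
        Map<String, String> out = Map.of("t1", "1", "t2", "2", "t3", "3", "w", "9");
        System.out.println(transferCallEdge(
                List.of("a", "b", "c"), List.of("t1", "t2", "t3"), out)); // {a=1, b=2, c=3}
    }
}
```

Only the parameters appear in the result: the caller's other variables, such as w, are not visible inside the callee.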
transferReturnEdge

Passes the return value to the variable that receives it at the call site (the left-hand side of the call). Note that a method may have several return variables; in that case their values must be merged with meet.

protected CPFact transferReturnEdge(ReturnEdge<Stmt> edge, CPFact returnOut) {
    // Return edge: pass the callee's return value to the variable on the
    // left-hand side of the call site.
    Stmt stmt = edge.getCallSite(); // the call site statement
    CPFact result = newInitialFact(); // map from Var to Value
    // Read the return value(s) from the OUT fact of the callee's exit node,
    // then return a fact that maps the left-hand-side variable to that value.
    Value val = Value.getUndef();
    // There can be more than one return variable, e.g. one per branch of an
    // if-else, so visit all of them.
    for (Var var : edge.getReturnVars()) {
        // Merge with the meet rule of constant propagation:
        // NAC meet c = NAC, UNDEF meet c = c, and so on.
        val = cp.meetValue(val, returnOut.get(var));
    }
    // Update only if the statement is an Invoke and has a left-hand side;
    // invoke.getLValue() returns the variable that receives the result.
    if (stmt instanceof Invoke && ((Invoke) stmt).getLValue() != null) {
        result.update(((Invoke) stmt).getLValue(), val);
    }
    return result;
}
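
To make the merge of several return variables concrete, here is a minimal sketch of the constant propagation meet, assuming values are encoded as the strings "UNDEF", "NAC", or a decimal constant. `MeetSketch`, `meet`, and `meetReturns` are hypothetical names and not Tai-e's `Value` API.

```java
import java.util.List;

// Toy constant-propagation lattice; values are "UNDEF", "NAC", or a constant like "3".
class MeetSketch {
    static final String UNDEF = "UNDEF";
    static final String NAC = "NAC";

    static String meet(String a, String b) {
        if (a.equals(NAC) || b.equals(NAC)) {
            return NAC;                // NAC meet anything = NAC
        }
        if (a.equals(UNDEF)) {
            return b;                  // UNDEF meet v = v
        }
        if (b.equals(UNDEF)) {
            return a;
        }
        return a.equals(b) ? a : NAC;  // c meet c = c, c1 meet c2 = NAC
    }

    // Fold meet over all return values, starting from UNDEF.
    static String meetReturns(List<String> returnValues) {
        String v = UNDEF;
        for (String r : returnValues) {
            v = meet(v, r);
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println(meetReturns(List.of("3", "3"))); // 3
        System.out.println(meetReturns(List.of("3", "4"))); // NAC
        System.out.println(meetReturns(List.of()));         // UNDEF
    }
}
```

Starting from UNDEF works because UNDEF is the identity of meet: a method with a single return variable simply yields that variable's value.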
InterSolver.java

The assignment stresses this point: "During initialization, the interprocedural solver needs to initialize every IN/OUT fact in the program, that is, all nodes of the ICFG. But you only need to set the boundary fact for the entry node of the ICFG's entry method (such as the main method). This means the entry nodes of other methods get the same initial fact as non-entry nodes."

initialize
private void initialize() {
    // Every node of the ICFG needs an initial IN/OUT fact, but only the entry
    // node of each ICFG entry method (e.g. main) gets the boundary fact.
    // entryMethods() returns a Stream of the entry methods.
    icfg.entryMethods().forEach(entryMethod -> {
        Node entryNode = icfg.getEntryOf(entryMethod);
        // newBoundaryFact(Node boundary) returns the boundary fact for that node.
        result.setInFact(entryNode, analysis.newBoundaryFact(entryNode));
        result.setOutFact(entryNode, analysis.newBoundaryFact(entryNode));
    });
    // All other nodes, including the entry nodes of non-entry methods,
    // start from the ordinary initial fact rather than the boundary fact.
    // noneMatch(predicate) is true when no element satisfies the predicate.
    icfg.forEach(node -> {
        if (icfg.entryMethods().noneMatch(
                entryMethod -> node.equals(icfg.getEntryOf(entryMethod)))) {
            result.setInFact(node, analysis.newInitialFact());
            result.setOutFact(node, analysis.newInitialFact());
        }
    });
}
doSolve

doSolve follows the earlier worklist algorithm. The difference is in how IN is computed: for each incoming edge of a node, the interprocedural solver must apply the edge transfer function (transferEdge) to that edge and the OUT fact of its source before meeting the result into IN. This is the part that needs to be modified.

private void doSolve() {
    workList = new ArrayDeque<>();
    // Seed the worklist with every node of the ICFG.
    for (Node node : icfg) {
        workList.add(node);
    }
    while (!workList.isEmpty()) {
        Node node = workList.poll();
        // Current IN/OUT facts of the node just taken from the worklist.
        Fact in = result.getInFact(node);
        Fact out = result.getOutFact(node);
        // Visit every incoming ICFG edge of this node.
        for (ICFGEdge<Node> e : icfg.getInEdgesOf(node)) {
            // This is the difference from the intraprocedural solver:
            // transferEdge(edge, out) applies the edge transfer function to the
            // OUT fact of the edge's source, and meetInto(fact, target) merges
            // that result into this node's IN fact.
            analysis.meetInto(analysis.transferEdge(e, result.getOutFact(e.getSource())), in);
        }
        // transferNode returns true if OUT changed. In that case the
        // successors must be processed again, so put them on the worklist.
        if (analysis.transferNode(node, in, out)) {
            for (Node suc : icfg.getSuccsOf(node)) {
                if (!workList.contains(suc)) {
                    workList.add(suc);
                }
            }
        }
    }
}
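
The effect of applying an edge transfer before the meet can be seen in a minimal sketch. It assumes a much simpler analysis than constant propagation: facts are sets of names, meet is union, and a node transfer is OUT = IN ∪ gen. `InterWorklistSketch`, `Edge`, `solve`, and `demo` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

// Toy worklist solver with per-edge transfer functions; all names are hypothetical.
class InterWorklistSketch {
    static class Edge {
        final int source;
        final int target;
        final UnaryOperator<Set<String>> transfer; // must not modify its input

        Edge(int source, int target, UnaryOperator<Set<String>> transfer) {
            this.source = source;
            this.target = target;
            this.transfer = transfer;
        }
    }

    // Forward analysis: fact = set of names, meet = union, OUT = IN ∪ gen[node].
    static List<Set<String>> solve(int n, List<Edge> edges, List<Set<String>> gen) {
        List<Set<String>> in = new ArrayList<>();
        List<Set<String>> out = new ArrayList<>();
        Deque<Integer> workList = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            in.add(new TreeSet<>());
            out.add(new TreeSet<>());
            workList.add(i);
        }
        while (!workList.isEmpty()) {
            int node = workList.poll();
            for (Edge e : edges) {
                if (e.target == node) {
                    // Edge transfer first, then meet into IN.
                    in.get(node).addAll(e.transfer.apply(out.get(e.source)));
                }
            }
            Set<String> newOut = new TreeSet<>(in.get(node));
            newOut.addAll(gen.get(node));
            if (!newOut.equals(out.get(node))) {
                out.set(node, newOut);
                for (Edge e : edges) {
                    if (e.source == node && !workList.contains(e.target)) {
                        workList.add(e.target);
                    }
                }
            }
        }
        return out;
    }

    // 0 -> 1 is an identity edge; 1 -> 2 kills "x", like a call-to-return edge.
    static List<Set<String>> demo() {
        UnaryOperator<Set<String>> identity = s -> s;
        UnaryOperator<Set<String>> killX = s -> {
            Set<String> copy = new TreeSet<>(s);
            copy.remove("x");
            return copy;
        };
        List<Edge> edges = List.of(new Edge(0, 1, identity), new Edge(1, 2, killX));
        List<Set<String>> gen = List.of(Set.of("x"), Set.of("y"), Set.of());
        return solve(3, edges, gen);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [[x], [x, y], [y]]
    }
}
```

Node 2 never sees x even though node 1 produces it, because the fact is filtered on the edge rather than inside either node.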

END

I will continue to write in the future, but it has not been updated here yet, so I will give it a try first.
It’s time to have a rest.
While doing it, I discovered the magical dependency chain relationship!
I gradually understand everything, it’s okay, let’s eat some plums first

A3 :- A2,A1
A4
A5
A6 :- A5
A7 :- A2, A4, A6
A8 :- A6

To summarize: you have to write A1 and A2 before you can write A3; A1, A2, A4 and A5 are basically independent; A7 can only be written once A2, A4 and A6 are done; and A8 can only be written once A6 is done. So when you assign homework to students, you only need to assign it this way: ask everyone to complete A3, A6, A7, A8, so that they have to write the prerequisite tasks as well.
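
The claim can be checked mechanically with a minimal sketch that takes the dependency list above and computes everything a set of assigned tasks forces. `AssignmentDepsSketch`, `deps`, and `closure` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy check of the assignment dependency chain; names are hypothetical.
class AssignmentDepsSketch {
    // The dependency list from the notes: task -> prerequisites.
    static Map<String, List<String>> deps() {
        Map<String, List<String>> d = new HashMap<>();
        d.put("A3", List.of("A2", "A1"));
        d.put("A6", List.of("A5"));
        d.put("A7", List.of("A2", "A4", "A6"));
        d.put("A8", List.of("A6"));
        return d;
    }

    // Every task that must be written in order to finish the assigned ones.
    static Set<String> closure(Collection<String> assigned, Map<String, List<String>> deps) {
        Set<String> done = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>(assigned);
        while (!pending.isEmpty()) {
            String task = pending.pop();
            if (done.add(task)) { // first time this task is seen
                pending.addAll(deps.getOrDefault(task, List.of()));
            }
        }
        return done;
    }

    public static void main(String[] args) {
        System.out.println(closure(List.of("A3", "A6", "A7", "A8"), deps()));
        // [A1, A2, A3, A4, A5, A6, A7, A8]
    }
}
```

Under the same dependency list, assigning only A3, A7 and A8 would force the same eight tasks, since A6 is already a prerequisite of both A7 and A8.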

Origin blog.csdn.net/daxuanzi515/article/details/133947529