Slice-Level Vulnerability Detection and Interpretation Method Based on Graph Neural Network

Source: Journal of Software

Authors: Hu Yutao, Wang Suyuan, Wu Yueming, Zou Deqing, Li Wenke, Jin Hai

Summary

With the increasing complexity of software, the demand for vulnerability detection research keeps growing: discovering and repairing software vulnerabilities quickly can minimize the losses they cause. Deep learning-based vulnerability detection is an emerging approach that automatically learns hidden vulnerability patterns from vulnerable code, saving considerable manpower. It is, however, not yet mature. Function-level detection methods operate at a coarse granularity and thus achieve limited accuracy. Slice-level detection methods can effectively reduce sample noise, but still suffer from two problems: on the one hand, most existing methods experiment on artificially constructed vulnerability datasets, so their ability to detect vulnerabilities in real environments remains doubtful; on the other hand, related work focuses only on detecting whether a slice sample is vulnerable and neglects the interpretability of the detection results. To address these problems, a slice-level vulnerability detection and interpretation method based on graph neural networks is proposed. The method first normalizes the C/C++ source code and extracts slices to reduce the interference of redundant information in samples; it then embeds the slices with a graph neural network model to obtain vector representations that preserve the structural information and vulnerability features of the source code; next, the slice vectors are fed into the vulnerability detection model for training and prediction; finally, the trained detection model and the vulnerable slice to be explained are fed into the vulnerability interpreter to obtain the specific vulnerable code lines. Experimental results show that, for vulnerability detection, the method reaches an F1 score of 75.1% on real vulnerability data, 41.2%−110.4% higher than the comparison methods; for vulnerability interpretation, when limited to the top 10% of key nodes, its accuracy reaches 73.6%, 8.9% and 24.9% higher than the two comparison interpreters, while its time overhead is 42.5% and 15.4% lower. Finally, the method correctly detects and interprets 59 real vulnerabilities, demonstrating its usefulness in real-world vulnerability discovery.

Key words

Vulnerability detection; deep learning; graph neural network; artificial intelligence; explainability

Cyberspace security has become a key component and an important guarantee of national security. In recent years, cyberspace security incidents such as hacker extortion, botnet attacks, and user information leakage have occurred frequently, posing serious threats to national security and to users' persons and property. Vulnerabilities are the root cause of such incidents, and software systems, as an important part of cyberspace, carry vulnerabilities that pose serious security threats. Vulnerability detection over source code is one of the main ways to discover software vulnerabilities [1], and it has a natural advantage: rich semantic information can be obtained from source code, which makes it convenient to represent the vulnerability characteristics of the software. With the rise of machine learning, and especially deep learning, recent research has begun to use deep learning models to detect source code vulnerabilities [2−4]: because such models do not rely on expert experience and reduce the subjectivity of manual intervention, they can achieve lower false positive and false negative rates.

At this stage, deep learning-based vulnerability detection methods can be divided into function level and slice level according to sample granularity. Specifically,

  • Function-level vulnerability detection operates on entire functions. Its advantage is that a function covers relatively comprehensive vulnerability information, but it also introduces a large number of noise statements unrelated to the vulnerability, resulting in poor detection accuracy;

  • Slice-level vulnerability detection refines the sample granularity, removing noise statements irrelevant to the vulnerability so that the model can learn more precise vulnerability features. However, existing slice-level detection work mainly relies on artificially constructed vulnerability datasets, such as the software assurance reference dataset (SARD) [5], to demonstrate the effectiveness of its methods; consequently, its vulnerability detection capability in real environments remains questionable.

In addition, related work focuses only on detecting whether a slice sample is vulnerable and neglects the interpretability of the detection results. Because deep learning models involve a large number of complex feature computations, the learned vulnerability features are difficult for security analysts to understand, which makes the detection results unconvincing. The interpretability of a model is not only a key factor in whether users can trust the detection model; it also helps users analyze the causes of vulnerabilities and repair them quickly.

To address these problems, this paper proposes a slice-level vulnerability detection and interpretation method based on graph neural networks. The method first normalizes the C/C++ source code and extracts slices to reduce the interference of redundant information in samples; it then uses a graph neural network model to embed the slices into vector representations that preserve the structural information and vulnerability features of the source code; next, the slice vectors are fed into the vulnerability detection model for training and prediction; finally, the trained detection model and the vulnerable slice to be explained are fed into the vulnerability interpreter to obtain the specific vulnerable code lines.

To verify the effectiveness of the proposed method for real vulnerability detection and interpretation, all experiments in this paper use real vulnerability datasets. For vulnerability detection, this paper selects three rule-based vulnerability detection tools (Checkmarx [6], FlawFinder [7], and RATS [8]) and four deep learning-based vulnerability detection methods (TokenCNN [9], StatementLSTM [10], SySeVR [11], and Devign [12]) as comparison tools. The experimental results show that the F1 score of the proposed method reaches 75.1%, 41.2%−110.4% higher than the other vulnerability detection methods. For vulnerability interpretation, this paper selects two advanced graph neural network interpreters (GNNExplainer [13] and PGExplainer [14]) as comparison interpreters. The experimental results show that the interpretation accuracy of the proposed method reaches 73.6% when limited to the top 10% of key nodes, 8.9% and 24.9% higher than the comparison methods, while its time overhead is 42.5% and 15.4% lower. Finally, to test the practicability of the method in real-world vulnerability discovery, this paper selects four real open-source software projects for testing, and successfully detects and explains 59 real vulnerabilities in them. The experimental results show that the proposed method can effectively detect real software vulnerabilities and provide reliable interpretation of the detection results.

The main contributions of this paper are as follows:

  • (1) A slice-level vulnerability detection method based on graph neural networks is proposed. Extracting code slices effectively reduces code redundancy and the interference of irrelevant statements; at the same time, the graph neural network better preserves the code structure and the syntactic and semantic information of vulnerabilities, so the method achieves better detection results on complex real vulnerability data;

  • (2) A graph neural network interpreter is adopted and improved to make a general-purpose interpreter better suited to the vulnerability explanation task. This paper improves GNNExplainer so that it outputs vulnerable code lines quickly and accurately, which enhances the credibility of GNN-based deep learning vulnerability detection methods and helps researchers analyze the causes of vulnerabilities and fix them;

  • (3) The proposed method scans 3 312 657 slices from 4 open-source software projects, and the detection and interpretation results are verified by manual analysis and matching against known vulnerability patterns. The experimental results show that it correctly detects and interprets 59 real vulnerabilities, confirming the practicability of the method in real-world vulnerability discovery.

Section 1 of this paper introduces related work on deep learning-based vulnerability detection and explanation. Section 2 introduces the background knowledge required for this paper, including the concepts of static program analysis and graph neural networks. Section 3 presents the concrete design of the proposed slice-level vulnerability detection and interpretation method based on graph neural networks. Section 4 verifies the effectiveness of the method through comparative experiments on vulnerability detection, vulnerability interpretation, and practical application. Finally, the paper is summarized and future work is discussed.

1 Related work

This section introduces the related work involved in this paper, covering two aspects: deep learning-based vulnerability detection and model interpretability.

1.1   Related work on deep learning-based vulnerability detection

In recent years, deep learning models have been applied to vulnerability detection thanks to their powerful modeling and feature learning capabilities. Deep learning-based vulnerability detection methods can be classified from two perspectives: sample granularity and model type.

Vulnerability detection methods based on deep learning can be divided into function level and slice level according to the granularity of samples to be processed.

  • At the function level: Lin et al. [15] were the first to use a deep learning model to detect vulnerabilities at the function level, targeting cross-project software; Feng et al. [16] designed a tree-based vulnerability detector that first extracts the abstract syntax tree of a function via static analysis, then applies a preorder traversal to convert the tree into a sequence, and finally feeds the sequence into a bidirectional gated recurrent unit (BGRU) model for training and testing; Wang et al. [17] proposed the FUNDED vulnerability detection model, which extracts the abstract syntax tree and the program control dependency graph of the code and feeds them into a graph neural network to perform vulnerability detection; Wu et al. [18] converted the function under test into an image and fed it into a convolutional neural network for vulnerability detection;

  • To reduce the impact of sample noise on detection results, researchers refined the sample granularity and proposed finer-grained slice-level vulnerability detection methods. For example, VulDeePecker, proposed by Li et al. [19], first extracts program slices around sensitive APIs in the program under test, and then trains a vulnerability detector with a bidirectional long short-term memory (BiLSTM) network; Zou et al. [20] introduced the concept of code attention for the first time and considered program control dependencies when building program slices, so as to include more comprehensive vulnerability semantics and syntax and to handle multiple types of vulnerability detection tasks; SySeVR, proposed by Li et al. [11], further improved the vulnerability slicing method by adding array use, pointer use, and integer use as slicing criteria. This greatly increases the number of vulnerability slices, which benefits the training of deep learning models.

By model type, deep learning-based vulnerability detection methods can be divided into text-based and graph-based models.

  • Text-based vulnerability detection methods mainly use convolutional neural networks and recurrent neural networks. The idea is to treat code as ordinary text, represent it as vectors, and feed the vectors into a deep learning model for detection. For example, TokenCNN, proposed by Russell et al. [9], first extracts code token sequences, then performs representation learning with a convolutional neural network, and feeds the result into fully connected layers for classification to obtain the detection result; VulSniper, proposed by Duan et al. [21], extracts the code property graph, converts it into a feature tensor, and feeds it into a BiLSTM model to perform vulnerability detection. However, code differs from ordinary text in that it carries richer structural and semantic information, so treating code as text inevitably loses some code information;

  • Code representation graphs can fully express the structure and semantics of code [22], so researchers proposed feeding them into graph neural network models for vulnerability detection. For example, Zhou et al. [12] used a general-purpose graph neural network for vulnerability detection; the graph convolution module in their model can effectively learn function-level vulnerability features from the graph representation of a function. Cheng et al. [23] proposed DeepWukong, which applies a graph neural network (GNN) to subgraphs of the code's program dependency graph to detect inter-procedural vulnerabilities in C/C++. However, inter-procedural vulnerability detection is difficult to apply to real scenarios: the complexity of call relationships between functions in real software makes the length of call chains hard to control, which renders the graphs too complex for effective detection.

The advantage of function-level vulnerability detection is that a function covers relatively complete vulnerability features, but it also introduces more noise statements that have nothing to do with the vulnerability. Especially in real-world detection scenarios, functions contain many lines of code, and excessive sample noise interferes with the deep learning model, preventing it from learning precise vulnerability features and degrading the accuracy of the detection results. Therefore, to ensure the effectiveness of the system on real vulnerability detection tasks, this paper builds the detection model at slice granularity. In addition, according to the findings of Chakraborty et al. [24], graphs express the structural and semantic information of code better than text, so graph-based detection methods have a clear advantage in the vulnerability detection field. This paper therefore uses code slices (i.e., program dependency graph subgraphs) to train a graph neural network model to achieve accurate detection of real vulnerabilities.

1.2   Related work on deep learning-based vulnerability interpretation

At present, interpretability work on graph neural networks can be divided into four types: perturbation-based, feature-based, model-based, and decomposition-based explanation methods. Among them, model-based methods are designed to explain the model itself, while the other types provide explanations of the detection results for individual instances. In the vulnerability explanation task, researchers are more interested in understanding the vulnerability principle behind each detected vulnerability than in understanding the operating mechanism of the deep learning model itself. Therefore, instance-oriented explanation methods are better suited to explaining vulnerability detection results.

Model-based explanation methods focus on explaining the model as a whole, that is, providing an explanation scheme for the deep learning model itself, so as to offer high-level insight into and a general understanding of how the model works and to help researchers analyze the model in a targeted way. For example, Yuan et al. [25] propose training a graph generator whose generated graph patterns explain a deep graph model, i.e., the graph generator is trained with policy gradients according to feedback from the trained graph model. Decomposition-based explanation methods measure the importance of input features by decomposing the original model prediction into several terms, which are regarded as the importance scores of the corresponding input features. For example, the GNN-LRP interpreter proposed by Schnake et al. [26] studies the importance of different relative walks (recording the paths of message passing between layers) in graph neural networks, and explains predictions by assigning scores to different walks and collecting the scores at their corresponding nodes; however, its computational complexity is too high, which limits its practical use. Feature-based explanation methods rely on gradients or hidden feature maps to represent the importance of different input features (the larger the gradient or feature value, the higher the importance) and are widely used in image and text tasks. For example, the Grad-CAM interpreter proposed by Selvaraju et al. [27] is an explanation model for image classifiers that uses gradients to represent the importance of input features; it can be extended to graph models to measure the importance of different nodes, the key idea being to combine hidden feature maps and gradients to indicate node importance. However, such methods place special requirements on the model structure and are therefore not universal. In contrast, perturbation-based explanation methods are widely used to explain graph neural network models because they are not restricted by the network structure of the model. For example, the GNNExplainer interpreter proposed by Ying et al. [13] can generate explanations for any graph neural network from the perspective of graph structure and nodes, exploring the subgraph structure most relevant to the prediction so as to explain it; PGExplainer, proposed by Luo et al. [14], is similar in its main idea to GNNExplainer, but it requires parameter training of the interpreter before explaining the graph neural network model, so that it can explain multiple instances simultaneously from the global perspective of the model and output the subgraph structures that play a key role in the model's predictions; the SubgraphX model proposed by Yuan et al. [28] aims to explain the role of important subgraphs of a graph neural network model in the prediction process: it combines the pre-trained model to be explained and the graph instance data with Monte Carlo tree search to output the important subgraphs of a graph instance.

At present, deep learning-based vulnerability interpretation is still at an early exploratory stage, and related work is scarce. The IVDetect tool proposed by Li et al. [29] uses the graph neural network interpreter GNNExplainer to explain function-level vulnerability detection results. Compared with the interpretation work of IVDetect, this paper uses and improves GNNExplainer to explain slice-level detection results: on the one hand, slices contain less noise, which yields better interpretation results; on the other hand, the objects being explained are slices (program dependency graph subgraphs), which contain fewer nodes than complete program dependency graphs and thus improve interpretation efficiency.

2 Basic knowledge

This section introduces the background knowledge involved in this paper, mainly the related concepts of static program analysis and graph neural networks.

2.1   Concepts related to program static analysis

Static program analysis is mainly based on code representation graphs such as the abstract syntax tree, the control flow graph, and the program dependency graph. The related concepts are introduced below; a small sketch of a toy program dependency graph follows the list.

  • Abstract syntax tree (AST): an AST is an abstract representation of source code as a tree. Starting from the root node, it decomposes the code into code blocks, statements, expressions, and so on. The AST mainly describes the grammatical structure of the code and is the basis for forming the other code representation graphs;

  • Control flow graph (CFG): a CFG is a directed graph whose nodes represent function statements and whose edges represent the execution order between adjacent statements, describing all possible paths a program may take during execution;

  • Data dependence: let A and B be two code statements; if a variable computed in statement A is used in statement B, then statement B is data-dependent on statement A;

  • Control dependence: let A and B be two code statements; if the execution of statement B depends on the result of executing statement A, then statement B is control-dependent on statement A;

  • Program dependence graph (PDG): a PDG is a labeled directed multigraph built on AST nodes. Its nodes represent code statements, and its edges represent data or control dependencies between adjacent nodes;

  • Code property graph (CPG): a CPG integrates the AST, CFG, and PDG into one data structure that contains all syntactic and semantic features of the source code;

  • Code slice: a code slice consists of statements that have data or control dependencies on each other [19].
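To make the dependence and slicing concepts above concrete, here is a minimal sketch (Python with networkx; the statements and line numbers are invented for illustration) that builds a toy PDG whose edges are labeled with their dependency type:

```python
# A toy PDG: nodes are statements, edges carry a dependency 'kind'.
import networkx as nx

pdg = nx.MultiDiGraph()
pdg.add_node(1, code="char *buf = malloc(10);", line=1)
pdg.add_node(2, code="if (buf != NULL)", line=2)
pdg.add_node(3, code="strcpy(buf, src);", line=3)
pdg.add_edge(1, 2, kind="data", var="buf")   # buf defined in 1, used in 2
pdg.add_edge(1, 3, kind="data", var="buf")   # buf defined in 1, used in 3
pdg.add_edge(2, 3, kind="control")           # 3 runs only if 2 evaluates true

# A code slice over 'buf' keeps exactly the statements connected by
# these data/control dependencies, here all three nodes.
```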

2.2    Graph neural network

Traditional convolutional and recurrent neural networks have achieved great success in extracting features from Euclidean data (such as natural language), but they perform poorly on non-Euclidean graph data (such as social networks). Compared with traditional deep learning models, graph neural networks have a clear advantage in handling graph data in non-Euclidean spaces, and are therefore widely used in recommender systems, social network analysis, and traffic prediction. A graph neural network works as follows: first, construct the messages to be passed between adjacent nodes; then, collect the messages from neighboring nodes; finally, update the node representations. Message passing takes each node's information together with that of its neighbors, an aggregation function aggregates all of this information, and an update function updates each node's state, thereby capturing the interdependencies between nodes and better learning the features of graph-structured data.
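As a concrete illustration of this construct–aggregate–update cycle, the following minimal PyTorch sketch runs one round of message passing on a toy graph; it is a generic toy, not the specific model used later in this paper:

```python
# One round of GNN message passing: construct, aggregate, update.
import torch

def message_passing_round(h, edges, W_msg, W_upd):
    """h: [num_nodes, d] node states; edges: (src, dst) pairs."""
    agg = torch.zeros_like(h)
    for src, dst in edges:              # construct and collect messages
        agg[dst] += h[src] @ W_msg      # message = linear map of neighbor state
    return torch.tanh(h @ W_upd + agg)  # update every node's representation

d = 16
h = torch.randn(4, d)                         # 4 nodes
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
W_msg, W_upd = torch.randn(d, d), torch.randn(d, d)
h_new = message_passing_round(h, edges, W_msg, W_upd)
```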

The graph neural network model can be divided into the following three types according to the specific task level to be solved.

  • Graph-level tasks: the goal of a graph-level task is to predict a property of the entire graph. For example, the vulnerability detection in this paper is a graph-level task whose purpose is to determine whether the input code representation graph contains a vulnerability;

  • Node-level tasks: the goal of a node-level task is to predict the class of each node in the graph, for example, determining which community a node belongs to in a social graph;

  • Edge-level tasks: the goal of an edge-level task is to predict the attributes of each edge. For example, in the semantic segmentation task of object detection, it is necessary not only to identify the class of each object but also to predict the relationships between objects.

At present, the graph neural network is mainly divided into the following three types according to the model type [30].

  • Graph convolutional network (GCN): GCNs extend the convolution operation to graph data. The core idea is to learn a function mapping through which a node aggregates its own features and those of its neighbors to generate a new node representation. Convolution operations divide into spectral-based and spatial-based methods. Spectral-based methods must load the whole graph into memory during training, so they perform poorly on complex graphs; in spatial-based graph convolution, the central node, the receptive field, and the aggregation function are not uniquely determined, leading to mutual constraints and dependencies and unsatisfactory practical results;

  • Graph attention network (GAT): GATs introduce an attention mechanism into GNNs, using attention during aggregation to focus on the neighbor information that matters for the task and assigning weights to nodes, so that the GNN attends to task-relevant nodes and edges and improves model performance. However, computing attention weights between neighbors increases time overhead and memory consumption, and the actual effect of the attention mechanism depends on the network initialization;

  • Graph recurrent network (GRN): GRNs convert graphs into sequences and use recurrent neural networks such as long short-term memory (LSTM) or gated recurrent units (GRU) as the training architecture. Compared with other GNN models, GRNs use a gating mechanism during message passing, which improves long-range information propagation across the graph and the representational power of the model. The gated graph neural network (GGNN) model is a network structure of the GRN type.

3 Slice-Level Vulnerability Detection and Interpretation Method Based on Graph Neural Network

To address the problems that existing vulnerability detection models perform poorly on real vulnerability datasets and cannot explain their detection results, this paper proposes and implements a slice-level vulnerability detection and interpretation method based on graph neural networks. Figure 1 shows the overall framework of the method, which consists of three modules: data preprocessing, vulnerability detection, and vulnerability interpretation. The input of the system is the source code of the software under test, and the output is whether the target program slice contains a vulnerability, together with the specific vulnerable statements.

Figure 1 The overall framework of slice-level vulnerability detection and interpretation method based on graph neural network

3.1   Data preprocessing

3.1.1 Code normalization

Program source code contains rich semantic information, which is convenient for expressing vulnerability characteristics, so this system performs vulnerability detection on source code. At the same time, information unrelated to code semantics, such as comments and complex variable and function names, interferes with the training and prediction of the deep learning model. To reduce this interference, the system first normalizes the source code. Since real code is complex and hard to display, the source code shown in Figure 2 comes from the artificial dataset SARD.

Figure 2 Sample data preprocessing

As shown in the code normalization part of Figure 2, code normalization consists of the following three steps (a minimal sketch in code follows the list).

  • Step 1: Remove the comment information in the code (such as /**/);

  • Step 2: Map the user-defined variable name to a unified variable name (such as VAR1);

  • Step 3: Map the user-defined function name to a unified function name (such as FUNC1).
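A minimal sketch of the three steps, assuming regex-based matching over already-identified user names (a production version would rely on a real lexer such as Joern's parser):

```python
import re

def normalize(code, user_vars, user_funcs):
    # Step 1: remove /* ... */ and // ... comments.
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.S)
    code = re.sub(r"//[^\n]*", "", code)
    # Step 2: map user-defined variable names to VAR1, VAR2, ...
    for i, v in enumerate(user_vars, 1):
        code = re.sub(rf"\b{re.escape(v)}\b", f"VAR{i}", code)
    # Step 3: map user-defined function names to FUNC1, FUNC2, ...
    for i, f in enumerate(user_funcs, 1):
        code = re.sub(rf"\b{re.escape(f)}\b", f"FUNC{i}", code)
    return code

print(normalize("int cnt = 0; /* counter */ update(cnt);",
                user_vars=["cnt"], user_funcs=["update"]))
```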

3.1.2 Program dependency graph extraction

A code representation graph is an effective code representation that intuitively reflects the semantic, syntactic, and structural information of code. Among such graphs, the program dependency graph is a data structure in which abstract syntax tree nodes are connected by data and control dependencies. Unlike the abstract syntax tree or the control flow graph, each of which captures only a single kind of code information (syntax or control flow), and unlike the code property graph, which contains so much unsimplified code information that it hurts detection efficiency, the program dependency graph is an efficient and comprehensive graph representation of code and can effectively represent various types of vulnerabilities [24]. This paper therefore chooses the program dependency graph as the source code representation.

This paper uses the open-source static code analysis tool Joern [31] to extract program dependency graphs from source code. This tool is chosen because it can perform code analysis and generate program dependency graphs without precompilation, which means it can handle code under test at any granularity and greatly reduces the user's cost. As shown in the program dependency graph extraction part of Figure 2, a black box indicates the node corresponding to a code line, a blue edge indicates a control dependency, a red edge indicates a data dependency, and the variable beside a red edge is the name of the data-dependent variable. To present the subsequent steps more concisely, this paper replaces the code in each program dependency graph node with its number, as shown in the right half of the program dependency graph extraction in Figure 2.

3.1.3 Code slice extraction

On the one hand, function-level samples contain many noise statements irrelevant to vulnerabilities, which interferes with the model's learning of vulnerability features and degrades detection; on the other hand, the code of real software is more complex, so its program dependency graphs are too large, incurring heavy time and memory overhead during training and interpretation. To avoid these problems, the detection and interpretation in this paper operate at the slice level, removing information irrelevant to the vulnerability, thereby improving detection and interpretation quality while reducing overhead. Specifically, this paper adopts the notion of vulnerability slicing proposed by Li et al. [19] as the guiding principle of slice generation. Their work concludes that software vulnerabilities are mainly introduced at the locations of pointers, arrays, expression operations, and sensitive API functions, which they call vulnerability concerns. For example, common vulnerability types such as array out-of-bounds, integer overflow, null pointer dereference, and API misuse are all caused by these four types of vulnerability concerns. The set of statements that have data or control dependencies on a vulnerability concern constitutes a program slice that may contain a vulnerability; all other statements are regarded as vulnerability-irrelevant statements that would interfere with model training. Unlike the textual slices constructed by Li et al. [19], this paper extracts program dependency graph subgraphs as slices: after one of the four types of vulnerability concerns is selected as the criterion of the program slice, the nodes and edges that have data or control dependencies on it are retained to generate a program dependency graph subgraph. Specifically, code slice generation is divided into the following three steps (a code sketch follows step (3)).

(1) Vulnerability concern selection. By traversing the program dependency graph nodes, code elements matching the four types of vulnerability concerns are selected, and each matching node is recorded as a slicing criterion. Specifically: array elements are identified by matching the "[" character in identifier declaration nodes; pointer elements are identified by matching the "*" character in identifier declaration nodes; expression operation nodes are matched by regular expression rules; and sensitive API nodes are matched against the sensitive API list provided by Li et al. [19]. As shown in the program dependency graph extraction part of Figure 2, node 1 contains the pointer element VAR1, nodes 2 and 5 contain the array elements VAR2 and VAR3 respectively, and node 6 contains the sensitive API element strncat; these nodes are therefore selected as slicing criteria;

(2) Program slice generation. Starting from the slicing criterion, forward and backward slicing are performed to generate the program slice. Specifically, on the program dependency graph, starting from the node containing the vulnerability concern, control and data dependency edges are traced forward and backward, and the nodes and edges involved are recorded until no new nodes or edges appear. The program dependency graph subgraph obtained in this way is a program slice. Because it contains only nodes and edges that have dependencies on the vulnerability concern, it retains the structural information of the source code while excluding vulnerability-irrelevant information from the graph. The code slice extraction part of Figure 2 shows some of the slices extracted from the program dependency graph of the source code; slice 1−slice 4 are the slices obtained with VAR1, VAR2, VAR3, and strncat as slicing criteria;

(3) Program slice labeling. This paper labels slices using the vulnerability patch information: a slice containing a line deleted by the vulnerability patch is labeled vulnerable; otherwise, it is considered non-vulnerable. According to the provided vulnerability information, the deletion in the vulnerability patch is line 11 of the source code, which corresponds to node 6 of the program dependency graph. By this labeling rule, since slice 1−slice 4 in Figure 2 all contain node 6, they are all labeled as vulnerable slices.
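The following minimal sketch implements steps (2) and (3) over a networkx PDG like the toy graph in Section 2.1; the helper names and the patch format (a set of deleted line numbers) are illustrative assumptions:

```python
import networkx as nx

def extract_slice(pdg, criterion):
    """Step (2), bidirectional slice: keep the criterion node plus every
    node reachable from it backward (ancestors) or forward (descendants)
    along data/control dependency edges."""
    nodes = {criterion} | nx.ancestors(pdg, criterion) | nx.descendants(pdg, criterion)
    return pdg.subgraph(nodes).copy()

def label_slice(slice_graph, patch_deleted_lines):
    """Step (3): a slice is vulnerable iff it contains a deleted patch line."""
    lines = {slice_graph.nodes[n].get("line") for n in slice_graph.nodes}
    return int(bool(lines & set(patch_deleted_lines)))
```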

3.2    Vulnerability detection

This paper uses a graph neural network model for vulnerability detection: the model automatically learns the vulnerability features of the input graph and classifies the feature vectors to detect whether the code contains a vulnerability. It consists of a graph feature extraction module, a graph neural network module, and a vulnerability classification module, as shown in Figure 3.

Figure 3 Vulnerability detection model

3.2.1 Graph feature extraction module

Since the code slices extracted in this paper are files in an abstract graph format (program dependency graph subgraphs), they cannot be fed directly into the graph neural network vulnerability detection model; the key features of each graph must first be extracted to obtain graph feature vectors. A program dependency graph subgraph has features in two dimensions: code features within nodes (hereafter, node features) and graph structure features.

For node features, this paper embeds the code inside each node. Specifically, the code in each node is treated as a sentence; the sentence is decomposed into a token list, which is then embedded into a fixed-length vector. This paper uses the word2vec model [32] for node embedding: it adopts the idea of distributed representations, mapping tokens to integers and then converting them into fixed-length vectors. Compared with traditional embedding tools, this method is faster to train and more versatile, so it is widely used in text mining. Concretely, this paper trains on the token lists of all slices to produce a pre-trained word2vec model, and then uses it to embed all nodes: a preprocessed slice is fed into the pre-trained word2vec model, which outputs its feature matrix Mi of shape m×n, where m is the number of nodes in the slice and n is the dimension of the embedding vector. Considering detection quality and performance, this paper sets n to 100. As shown in the input graph of Figure 3, the graph has 8 nodes, so the dimension of its node feature matrix Mi is 8×100.

For graph structure features, this paper embeds the edge relations of the graph. Each edge can be regarded as a triple (start node, end node, edge type), where the start and end nodes are obtained directly from the program dependency graph, and the edge types are data dependency and control dependency. As shown in the input graph of Figure 3, the graph has 10 edges, including 3 data dependency edges and 7 control dependency edges (red edges indicate data dependencies, blue edges indicate control dependencies, and purple edges indicate edges that are both), and the output matrix AS is the graph structure feature matrix. A sketch of both feature extraction steps follows.
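A minimal sketch of the two feature extraction steps, assuming gensim's word2vec and naive whitespace tokenization; the paper does not spell out how a statement's token vectors are pooled into one node vector, so simple averaging is used here as one plausible choice:

```python
import numpy as np
from gensim.models import Word2Vec

corpus = [["VAR1", "=", "malloc", "(", "10", ")"],      # token lists of all
          ["strncat", "(", "VAR2", ",", "VAR3", ")"]]   # slice statements
w2v = Word2Vec(sentences=corpus, vector_size=100, min_count=1)  # n = 100

def node_vector(tokens):
    """Pool one statement's token embeddings into a fixed 100-d vector."""
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(100)

M_i = np.stack([node_vector(stmt) for stmt in corpus])  # shape (m, 100)
# Graph structure: each edge is a triple (start node, end node, edge type).
edges = [(0, 1, "data"), (0, 1, "control")]
```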

3.2.2 Graph neural network module

Since this paper transforms source code into graph-structured data with data and control dependencies, and graph neural networks can further aggregate, propagate, and update this information to better capture the structural and semantic information of the graph, this paper uses a graph neural network to further embed the graph samples. Many graph neural network models based on aggregating neighborhood information have been proposed in deep learning [33−35]. Among them, GGNN strengthens the long-term memory of the network, goes deeper than traditional graph neural network models [35], and is better suited to data that carries both semantics and graph structure. This paper therefore chooses GGNN as the final graph neural network model.

The principle of GGNN is as follows: aggregate the information of a node and its neighbors, feed the aggregated information together with the current node state into a GRU to obtain the node state at the next time step, and repeat this process; after several time-step iterations, the final features of all nodes are obtained. As shown in the graph neural network module of Figure 3, after the graph feature $g_i(M_i, A_S)$ is input, GGNN embeds each node and its neighborhood to convert it into an m×n′ slice feature matrix $M'_i$, where n′ is the final slice feature dimension, set to 200 in this paper; the feature matrix $M'_i$ in the figure therefore has dimension 8×200.

Specifically, for each node $v_u$ in the graph, the node vector is initialized as $h_{v_u}^{(1)} = [x_{v_u}^{\top}, \mathbf{0}]^{\top}$, i.e., the feature vector of node $v_u$ is copied and padded with zeros. Let T be the total number of time steps of neighborhood aggregation. To propagate information through the entire graph, at each time step t ≤ T all nodes communicate along the edges they depend on:

$$a_{v_u}^{(t)} = A_{v_u}^{\top}\left[h_{v_1}^{(t-1)\top},\ldots,h_{v_m}^{(t-1)\top}\right]^{\top} W_u + b \tag{1}$$

where $W_u$ denotes the trainable parameters, b is the bias, and $A_{v_u}$ is the row of the adjacency matrix $A_S$ corresponding to node $v_u$; $a_{v_u}^{(t)}$ is the result of the interaction between the current node and its adjacent nodes through their edges. Finally, the aggregation function AGG aggregates the information of node $v_u$ and merges it with the state of the previous time step to obtain the node's new state:

$$h_{v_u}^{(t)} = \mathrm{GRU}\left(h_{v_u}^{(t-1)},\, \mathrm{AGG}\left(a_{v_u}^{(t)}\right)\right) \tag{2}$$
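A minimal PyTorch sketch of this propagation scheme, pairing a linear message map for formula (1) with a GRU update for formula (2); the exact aggregation AGG and parameter layout of the paper's GGNN may differ:

```python
import torch
import torch.nn as nn

class MiniGGNN(nn.Module):
    def __init__(self, dim, steps=5):
        super().__init__()
        self.msg = nn.Linear(dim, dim)   # plays the role of W_u and b in (1)
        self.gru = nn.GRUCell(dim, dim)  # gated update of formula (2)
        self.steps = steps               # T time steps

    def forward(self, h, adj):
        """h: [m, dim] node states; adj: [m, m] adjacency matrix from AS."""
        for _ in range(self.steps):
            a = adj @ self.msg(h)        # neighbors interact along edges
            h = self.gru(a, h)           # merge with the previous time step
        return h                         # final node features

h = torch.randn(8, 200)                  # 8 nodes padded to n' = 200 dims
adj = (torch.rand(8, 8) > 0.7).float()   # toy adjacency
out = MiniGGNN(200)(h, adj)              # shape (8, 200)
```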

3.2.3 Vulnerability classification module

The core idea of the vulnerability classification module is to select a feature set related to vulnerability features to complete the graph-level classification of vulnerable samples. Previous work [36] proposed adding a sort pooling layer (SortPooling) after the graph convolution layers to sort their output features, so that a traditional neural network can then be trained on them to extract useful features from the slice embedding. Accordingly, this paper learns node features through the GGNN layer, followed by one-dimensional convolution and fully connected layers that learn features related to the graph classification task for more effective classification. Specifically, this paper defines the classification pooling layer τ(M) as follows:

$$\tau(M) = \mathrm{MaxPool}\left(\mathrm{Relu}\left(\mathrm{BN}\left(\mathrm{Conv}(M)\right)\right)\right) \tag{3}$$

where Conv denotes the convolutional layer, BN the BatchNorm layer, Relu the activation function, MaxPool the max pooling layer, and M a feature matrix.

As shown in the vulnerability classification module of Figure 3, this paper concatenates the slice node feature matrix $M_i$ and the corresponding slice feature matrix $M'_i$ into a new matrix $[M_i, M'_i]$, and applies the classification pooling operation τ to $[M_i, M'_i]$ and $M'_i$ respectively to obtain outputs $Y_1$ and $Y_2$; then $Y_1$ and $Y_2$ are fed into fully connected layers with output dimension 2; finally, the two outputs are multiplied, averaged, and passed through a Sigmoid to obtain the prediction, i.e., the classification prediction layer shown in Figure 3:

$$P = \mathrm{Sigmoid}\left(\mathrm{Avg}\left(\mathrm{Liner}(Y_1) \odot \mathrm{Liner}(Y_2)\right)\right) \tag{4}$$

where:

  • Avg means average operation;

  •  Liner represents a fully connected layer;

  • P is the output binary classification result, which has two dimensions: the first dimension is the probability that the sample is non-vulnerable, and the second is the probability that it is vulnerable.

Finally, the model takes the larger of the two probabilities as the final classification output. Table 1 lists the parameter settings used to train the model. The model uses the cross-entropy loss CrossEntropyLoss to penalize misclassifications, and the Adam [37] optimizer with a learning rate of 0.0001 and a weight decay of 0.001 to train the parameters Wu and b of formula (1) in Section 3.2.2. The trained model is then used to judge whether a new code slice contains a vulnerability. A sketch of the classification head and training setup follows Table 1.

Table 1 Model training related parameter settings
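A minimal PyTorch sketch of the classification head of formulas (3) and (4) together with the loss and optimizer settings of Table 1; the channel sizes and kernel size are illustrative assumptions, and raw logits are handed to CrossEntropyLoss (which normalizes them itself) instead of applying the Sigmoid of formula (4) explicitly:

```python
import torch
import torch.nn as nn

class Tau(nn.Module):
    """tau(M) = MaxPool(Relu(BN(Conv(M)))), pooling over the node axis."""
    def __init__(self, in_ch, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm1d(out_ch),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1))

    def forward(self, M):                # M: [batch, in_ch, m]
        return self.net(M).squeeze(-1)   # -> [batch, out_ch]

tau1, tau2 = Tau(in_ch=300), Tau(in_ch=200)   # [Mi, Mi'] has 100+200 channels
liner1, liner2 = nn.Linear(64, 2), nn.Linear(64, 2)

def predict(Mi_cat, Mi_prime):
    Y1, Y2 = tau1(Mi_cat), tau2(Mi_prime)
    return liner1(Y1) * liner2(Y2)       # multiply the two branch outputs

loss_fn = nn.CrossEntropyLoss()          # corrects misclassifications
params = [p for mod in (tau1, tau2, liner1, liner2) for p in mod.parameters()]
opt = torch.optim.Adam(params, lr=1e-4, weight_decay=1e-3)  # Table 1 settings
```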

3.3    Vulnerability explanation

In the vulnerability explanation part, this paper uses and improves the GNNExplainer graph neural network interpreter [13] to provide fine-grained explanations for the detection results output by the vulnerability detection model, namely the specific vulnerable code lines, helping researchers analyze the causes of vulnerabilities and fix them.

GNNExplainer is a perturbation-based explanation method designed for GNN models; it does not depend on a specific graph neural network model and is highly versatile. Its basic idea is to solve an optimization task that maximizes mutual information: it generates an edge mask for the input and trains it iteratively, finally outputting the key information of the graph as the explanation according to the changes in the prediction. During mask training, each iteration yields a new graph containing candidate key information; the new graph is fed into the trained GNN model, the importance of the edge mask is evaluated according to the detection result, and training continues to reduce the loss until the key information with the smallest loss is found. Finally, the interpreter obtains the important edges through the edge mask and extracts the important subgraph from them as the model's explanation of the instance.

Specifically, the goal of GNNExplainer is to learn an edge mask $E_M$ from the original graph $G_W(M_i, A_S)$ and thereby obtain a key subgraph $G_S$ as the explanation result. The starting point is to maximize the mutual information MI between the subgraph $G_S$ and the original graph $G_W$, i.e., to ensure that the subgraph covers as much of the important information of the original graph as possible:

$$\max_{G_S} \; MI(Y, G_S) = H(Y) - H(Y \mid G = G_S) \tag{5}$$

where Y denotes the prediction of the vulnerability detection model. Since the entropy H(Y) is constant for a trained model, the objective is equivalent to minimizing the conditional entropy H(Y | G = GS). By the definition of conditional entropy:

$$H(Y \mid G = G_S) = -\,\mathbb{E}_{Y \mid G_S}\left[\log P_{\Phi}(Y \mid G = G_S)\right] \tag{6}$$

That is, formula (6) measures how much uncertainty about the prediction Y remains when the explanation result is GS. However, this objective cannot be optimized directly.

GNNExplainer treats $G_S \sim \mathcal{G}$ as a random graph variable, so the objective becomes:

$$\min_{\mathcal{G}} \; \mathbb{E}_{G_S \sim \mathcal{G}}\left[H(Y \mid G = G_S)\right] \tag{7}$$

GNNExplainer further applies Jensen's inequality and a convexity assumption to convert the objective into:

$$\min_{\mathcal{G}} \; H\left(Y \mid G = \mathbb{E}_{\mathcal{G}}[G_S]\right) \tag{8}$$

where $\mathbb{E}_{\mathcal{G}}[G_S]$ can be replaced by an edge mask. The actual training target of the interpreter is therefore the edge mask; after training, the key subgraph $G_S$ of the program dependency graph is output as the explanation result. As the description above shows, GNNExplainer ranks important edges through the edge mask and then generates an important subgraph. In the vulnerability detection task, however, researchers want to know which lines of code (i.e., which graph nodes) are vulnerable rather than which data or control dependency edges of the program dependency graph are, so the extra operation of generating an important subgraph is unnecessary. To address this, this paper designs and adds a node sorting algorithm to GNNExplainer so that it ranks nodes by importance and outputs the key nodes of the graph. This makes the interpreter better suited to vulnerability detection tasks, significantly improves interpretation accuracy, and reduces the time overhead of the original method (the computations for subgraph generation and irrelevant masks are removed). The specific procedure is: first, sort the edge masks by importance; second, accumulate each edge's mask value onto the weights of the edge's endpoint nodes; finally, sort the nodes by weight to obtain a node sequence ordered by importance. Formally, for node i in the graph, let $M_N(i)$ denote its weight (initialized to 0 for all nodes), let $EI_i$ denote the numbers of the edges containing node i, and let $M_E(EI_i)$ denote the mask values of the edges numbered $EI_i$. The weight update of node i can then be expressed as

$$M_N(i) \leftarrow M_N(i) + M_E(EI_i) \tag{9}$$
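A minimal sketch of this node sorting step: fold each edge-mask weight onto the edge's endpoint nodes per formula (9), then rank the nodes by accumulated weight:

```python
from collections import defaultdict

def rank_nodes(edges, edge_mask):
    """edges: list of (src, dst) pairs; edge_mask: one importance score per edge."""
    weight = defaultdict(float)                  # M_N(i), all initialized to 0
    for (src, dst), m in zip(edges, edge_mask):
        weight[src] += m                         # accumulate M_E over the edges
        weight[dst] += m                         # incident to each node
    return sorted(weight, key=weight.get, reverse=True)

edges = [(0, 1), (1, 2), (0, 2)]
edge_mask = [0.9, 0.2, 0.7]
print(rank_nodes(edges, edge_mask))  # -> [0, 1, 2], most important node first
```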

The vulnerability interpretation model of this paper is shown in Figure 4. The input of the interpreter is a vulnerable slice detected by the vulnerability detector. The interpreter generates new graphs by perturbing the edge mask of the input slice and observes the detection results of the vulnerability detection model on the new graphs to obtain importance scores for the edge mask; finally, the edge mask scores are mapped to node importance scores, and the nodes are output sorted by score. Nodes with higher scores are considered more likely to be vulnerable code lines, i.e., they constitute the interpretation of the vulnerability detection result.

Figure 4 Vulnerability Explanation Model

4 Experimental analysis

This section describes the experiments: first, the experimental setup, including the dataset, the implementation, and the evaluation metrics; second, three research questions introduced to verify the effectiveness of the method; finally, the analysis of the experimental results.

4.1   Experimental preparation

(1) Experimental data set

This paper conducts all experiments on a subset of the widely used vulnerability dataset Big-Vul [38]. On the one hand, Big-Vul is a high-quality real vulnerability dataset covering all CVE (common vulnerabilities and exposures [39]) entries of 348 real open-source projects, with 11 834 vulnerable functions and 253 096 non-vulnerable functions in total; on the other hand, it provides the complete code before and after patching and the exact patch locations, which facilitates slice labeling. This paper therefore chooses Big-Vul as the experimental dataset.

It should be noted that the Big-Vul dataset crawls all functions in the source files that contain a given vulnerability patch: functions modified by the patch are labeled vulnerable in their pre-patch form, while patched and unmodified functions are labeled non-vulnerable. In the actual preprocessing, slice labeling depends on the modified-line information in the vulnerability patch, so some data cannot be labeled and must be filtered out. In the end, this paper uses 4 030 vulnerable functions and 4 716 non-vulnerable functions from the Big-Vul dataset, from which 16 260 vulnerable slices and 16 301 non-vulnerable slices are obtained.

(2) Specific implementation

The experimental machine has 128 GB of memory, a 16-core Intel Xeon processor, and an NVIDIA Quadro GTX 5000 series graphics card with 16 GB of memory. Each stage of the system is implemented with tools such as Joern [31], word2vec [32], and PyTorch [40]. For vulnerability detection, this paper randomly divides the dataset into training, validation, and test sets in an 8:1:1 ratio by slice count (a sketch of the split follows). For vulnerability explanation, since explanations are only needed for vulnerable slices, 1 399 correctly predicted vulnerable slices are randomly selected from the test set and fed into the interpreter.
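A minimal sketch of the 8:1:1 split, assuming the slices are indexable as a PyTorch-style dataset; the seed is an illustrative choice for reproducibility:

```python
import torch
from torch.utils.data import random_split

slices = list(range(32561))            # stand-in for the 32 561 labeled slices
n = len(slices)
n_train, n_val = int(0.8 * n), int(0.1 * n)
train, val, test = random_split(
    slices, [n_train, n_val, n - n_train - n_val],
    generator=torch.Generator().manual_seed(0))
print(len(train), len(val), len(test))  # 26048 3256 3257
```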

(3) Evaluation indicators

To evaluate the effectiveness of the vulnerability detection model, this paper uses the following common metrics [11, 17] (a small computation sketch follows the list).

  • True positive (TP): the number of samples correctly predicted as vulnerable;

  • True negative (TN): the number of samples correctly predicted as non-vulnerable;

  • False positive (FP): the number of samples incorrectly predicted as vulnerable;

  • False negative (FN): the number of samples incorrectly predicted as non-vulnerable;

  • Accuracy: accuracy=(TP+TN)/(TP+TN+FP+FN);

  • Recall: recall=TP/(TP+FN);

  • Precision: precision=TP/(TP+FP);

  • F1 score: F1=2×precision×recall/(precision+recall).
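A minimal sketch computing these metrics from raw counts (the counts here are made up for illustration):

```python
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f1

print(metrics(tp=75, tn=80, fp=25, fn=20))
```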

To evaluate the effectiveness of the explanation model, this paper adopts the accuracy metric of IVDetect [29]: if the output nodes contain a line modified by the vulnerability patch, the explanation is considered correct; otherwise, it is considered incorrect. The interpretation accuracy is the ratio of correctly explained samples to the total number of samples.

4.2   Experimental results and analysis

The experiments designed in this paper are designed to answer the following three research questions.

RQ1: How well does our method perform in source code vulnerability detection?

RQ2: How effective is the method in this paper for explaining the results of vulnerability detection?

RQ3: Can the method in this paper detect and explain real software vulnerabilities?

4.2.1 RQ1: How well does our method perform in source code vulnerability detection?

To answer this research question, this paper compares the proposed method with several advanced vulnerability detection tools: three rule-based vulnerability detection tools (Checkmarx [6], FlawFinder [7], and RATS [8]) and four deep learning-based vulnerability detection methods (TokenCNN [9], StatementLSTM [10], SySeVR [11], and Devign [12]). The experimental results are shown in Table 2.

Table 2 Comparison with other vulnerability detection tools

Rule-based methods detect vulnerabilities by matching expert-defined vulnerability patterns against the source code. As the precision, recall, and F1 scores in Table 2 show, the detection results of the commercial static vulnerability detector (Checkmarx) and the static analysis systems (FlawFinder and RATS) are unsatisfactory. For example, the recall of Checkmarx is only 31.9%, meaning it successfully detects only 31.9% of real-world vulnerabilities. These tools rely on rules and patterns formulated by human experts and detect vulnerabilities by pattern matching over the lexically analyzed target code; however, expert-defined vulnerability patterns can hardly cover the complexity of real environments, leading to poor detection of real vulnerabilities. Unlike rule-based methods, the method in this paper uses a neural network model to automatically mine and learn the vulnerability patterns in the training samples, saving labor costs while achieving lower false negative and false positive rates.

Token-based methods treat an entire source code fragment as a piece of natural language text and split the code into token sequences, as in text segmentation, for model training. TokenCNN is a representative work of this type: it first converts the source code into a token sequence through lexical analysis, then embeds the tokens into vectors, and finally feeds the vector representation into a convolutional neural network for vulnerability prediction. Because TokenCNN treats source code as plain text and trains and predicts directly in the manner of text processing, it ignores the semantic and structural information of the source code and thus loses much of the information in the code, which leads to poor detection results.

Statement-based methods are similar to token-based methods, but they embed each line of code directly instead of first segmenting into tokens and then embedding. For example, StatementLSTM treats each line of code as a natural language sentence, embeds it into a fixed-length vector, and feeds it into an LSTM model to train the vulnerability detector. Compared with token-based methods, it avoids the semantic loss caused by tokenization, so its detection results are relatively better. However, statement-based methods still handle code from a textual perspective, so the loss of code syntax and semantics is not fundamentally resolved.

Function-level methods treat the entire function as the object of model training. For example, Devign first uses complex program analysis to extract the code property graph of a complete function and then applies a general-purpose graph neural network for detection. Representing a complete function by its code property graph lets the training samples contain comprehensive semantic and syntactic information about the code. However, the code property graph of a function contains, on the one hand, a large number of vulnerability-irrelevant nodes and edges that interfere with model training; on the other hand, it also embeds the abstract syntax tree, the program control flow graph, the program dependency graph, and other data structures, and this complex graph structure makes it difficult for the model to learn precise vulnerability features, resulting in poor performance.

The slice-based method first extracts the key semantics of a program through program slicing and then trains and tests on the statements or node sets related to those semantics. SySeVR is one of the most representative slice-level methods: it slices the target program to generate code slices, which are then embedded as vector representations. Unlike methods at other levels, the slice-level method effectively purifies the samples in advance, that is, it removes noise statements from the samples. However, SySeVR treats a code slice as a piece of natural-language text, so the dependencies between lines of code are not captured in its slices. In contrast, the slices generated by our method are subgraphs of the program dependence graph, which preserve the vulnerability-related inter-statement semantics and code structure intact, thereby achieving better detection results.
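
The slicing step can be pictured as reachability over the program dependence graph (PDG): starting from a slicing criterion, keep the statements the criterion depends on and the statements that depend on it. The networkx sketch below, with a toy PDG, is our illustration of this idea rather than the paper's actual implementation.

```python
import networkx as nx

def extract_slice(pdg: nx.DiGraph, criterion: str) -> nx.DiGraph:
    """Backward + forward slice: nodes are statements, edges are dependences."""
    backward = nx.ancestors(pdg, criterion)   # statements the criterion depends on
    forward = nx.descendants(pdg, criterion)  # statements the criterion influences
    return pdg.subgraph(backward | forward | {criterion}).copy()

pdg = nx.DiGraph()
pdg.add_edges_from([
    ("n = read_len();", "memcpy(dst, src, n);"),  # data dependence
    ("if (ok)",         "memcpy(dst, src, n);"),  # control dependence
    ("log(msg);",       "return 0;"),             # unrelated statements
])
s = extract_slice(pdg, "memcpy(dst, src, n);")
print(sorted(s.nodes))  # the unrelated log/return statements are sliced away
```

Because the slice is itself a PDG subgraph, the dependence edges survive into the embedding stage, which is precisely the structural information that SySeVR's text-style slices discard.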

In summary, the method in this paper trains a graph neural network vulnerability detector on source code slices (program dependence graph subgraphs). The experimental results show that it achieves an F1 score of 75.1% on real vulnerability detection, 41.2%−110.4% higher than the other vulnerability detection tools, which is a clear advantage.

4.2.2 RQ2: How effective is the method in this paper for explaining the results of vulnerability detection?

To answer this research question, this paper compares the explanation performance of the proposed method with that of two advanced general-purpose graph neural network explainers (GNNExplainer [13] and PGExplainer [14]). Table 3 compares the explanation results of the improved GNNExplainer with those of the original GNNExplainer and PGExplainer; in the table, GE denotes the original GNNExplainer, PE denotes the original PGExplainer, and "our method" refers to the GNNExplainer as improved in this paper. The first row of Table 3 gives the proportion of nodes of the graph under explanation that the explainer outputs (2%−20%), for which the accuracy of the explanation results is reported.

Table 3 Comparison of explanation accuracy with other graph neural network explainers (%)

The experimental results in Table 3 show that, compared with the original GNNExplainer and PGExplainer, the improved GNNExplainer in this paper raises the accuracy of real vulnerability explanation by 3.8%−18.5% and 13.4%−29.0%, respectively; the accuracy reaches 73.6% for the top 10% of nodes and 91.1% for the top 20% of nodes. The reasons are as follows:

  • The original GNNExplainer constructs key subgraphs from the perspective of key edges and outputs them as explanation results. However, as a general-purpose graph neural network explainer, it is not designed for vulnerability samples, which causes a certain loss of vulnerability information. Our method instead maps the importance scores of the edge mask directly onto the nodes, reducing the intermediate steps (see the sketch after this list); its explanation results are therefore better than the original GNNExplainer's and closer to what researchers actually need when analyzing the cause of a vulnerability, namely the suspicious lines of code;

  • Unlike GNNExplainer, which explains each sample directly, PGExplainer must train its explainer on a large number of samples before explaining the graph neural network model, so that multiple instances can be explained simultaneously from the model's global perspective; its explanation accuracy therefore depends on how well the explainer is trained. The experimental data in this paper are real vulnerability data of high complexity, and PGExplainer struggles to train an explainer on so many complex instances, so its explanation results are poor. In contrast, GNNExplainer trains and explains each instance separately, so the loss on a single instance is smaller and the explanation accuracy is higher.
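
The sketch below illustrates the improvement described in the first bullet above, assuming an edge mask has already been learned in the GNNExplainer fashion: the mask weights are accumulated onto the endpoint nodes, and the top 10% of nodes become the explanation. The function names and tensor layout are our own illustrative choices.

```python
import torch

def node_scores_from_edge_mask(edge_index: torch.Tensor,
                               edge_mask: torch.Tensor,
                               num_nodes: int) -> torch.Tensor:
    """Map learned edge importances onto nodes by summing over incident edges."""
    scores = torch.zeros(num_nodes)
    scores.index_add_(0, edge_index[0], edge_mask)  # source endpoints
    scores.index_add_(0, edge_index[1], edge_mask)  # target endpoints
    return scores

def top_k_percent(scores: torch.Tensor, pct: float = 0.10) -> torch.Tensor:
    """Indices of the highest-scoring nodes; these map back to code lines."""
    k = max(1, int(len(scores) * pct))
    return torch.topk(scores, k).indices

# Toy example: 4 nodes, 3 edges with learned mask weights.
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
edge_mask = torch.tensor([0.9, 0.1, 0.8])
scores = node_scores_from_edge_mask(edge_index, edge_mask, num_nodes=4)
print(top_k_percent(scores, 0.25))  # the single most suspicious node
```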

The experimental results show that as the number of output nodes increases, the explanation statements cover the vulnerable lines with higher accuracy. At the same time, however, more output nodes mean that researchers must analyze more potentially vulnerable lines, which imposes a heavy manual burden. The results in Table 3 show that once the explainer outputs more than 10% of the nodes, the growth in explanation accuracy begins to slow. Therefore, to avoid producing too many irrelevant explanation statements, this paper finally takes the top 10% of nodes as the explainer's output.

This paper also measures the time overhead of the proposed method and the comparison explainers; the results are shown in Table 4.

Table 4 Comparison of time overhead with other graph neural network explainers

The experiment randomly selects 1,399 samples to be explained, with the number of training rounds set to the default of 100. Compared with the original GNNExplainer, our method removes the computation the explainer performs to generate subgraphs and irrelevant masks, so its time overhead is significantly lower than the original method's. Because PGExplainer must train its explainer in advance and the trained explainer then explains each instance in turn, its time overhead is dominated by explainer training. Our method trains and explains each instance simultaneously, so its total time overhead is significantly lower than PGExplainer's and its explanation efficiency is higher.

In summary, by improving GNNExplainer, this paper can locate vulnerability statements accurately and quickly. On the one hand, this helps improve the credibility of the vulnerability detection model; on the other hand, it helps researchers analyze and repair the vulnerability. Compared with existing general-purpose graph neural network explainers, the method in this paper has clear advantages in both accuracy and time overhead.

4.2.3 RQ3: Can the method in this paper detect and explain real software vulnerabilities?

To answer this research question, this paper performs vulnerability detection and explanation on popular open source software to test the method's practicality for real-world vulnerability discovery. Specifically, the experiment takes the open source C/C++ code of four target programs as detection targets; the software under test comprises libv, Xen, OpenSSL, and httpd.

Table 5 gives a detailed description of the target software: there are 286,039 target functions in total, from which 3,312,657 slices are extracted. The experiment proceeds as follows. First, the preprocessed slices are fed into the trained vulnerability detection model for detection; then, each slice predicted as vulnerable is fed into the explainer to obtain the vulnerable lines corresponding to its top 10% key nodes. Since the method in this paper is a static detection method, it cannot directly provide the runtime information showing how a vulnerability is triggered, which must be submitted as evidence when applying for a new vulnerability number (CVE-ID). The experiment therefore relies on manual analysis to verify whether the detection and explanation results on real software are correct. Specifically, the manual analysis uses two verification methods: first, comparing against the vulnerability patterns of existing vulnerabilities in the National Vulnerability Database (NVD) [41] to determine whether the method detects real vulnerabilities and to verify the accuracy of the explanation results; second, verifying by manually analyzing the cause of each vulnerability.

Table 5 Details of the open source software selected in this paper

This section describes in detail a vulnerability instance in a USB emulation program in Xen. Figure 5 shows the complete process for slice API#2, from slice generation to vulnerability explanation. Since the actual slice graph would take up considerable space, the slice content is shown here as the code lines corresponding to its nodes.

Figure 5 Example of detecting and explaining a real vulnerability in Xen

Analysis shows that this function contains an integer overflow vulnerability that allows local guest users to cause a denial of service through remote NDIS control message packets. Specifically, line 7 declares two integers, bufoffs and buflen, which receive user-supplied values on lines 11 and 12; the branch condition on line 13 is bufoffs + buflen > length. When an attacker deliberately sets bufoffs and buflen to maximum values, the addition overflows, so the branch condition is not satisfied and the function does not return as it should. When the function then makes the calls on lines 16−18, the QEMU process crashes and host memory information is leaked. Therefore, the code lines that have data and control dependencies on bufoffs and buflen, including lines 7, 11−13, 16, and 17, are regarded as vulnerability-related lines. As shown in Figure 5, only the explanation results of the API#2 slice contain the correct vulnerable lines (including lines 17, 13, and 16). In fact, the explanation results of 2−3 slices of the original function cover all the positions where the vulnerability is introduced and triggered; due to space limitations, all the slices and explanation results for this vulnerability will be published in this paper's code repository [42].
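
The wraparound at the heart of this vulnerability can be reproduced with a few lines of Python simulating 32-bit unsigned addition; the concrete values below are hypothetical, chosen only to show how the check on line 13 is bypassed.

```python
# In C, unsigned 32-bit addition silently wraps modulo 2^32.
MOD = 1 << 32
length = 0x100
bufoffs, buflen = 0xFFFFFFF0, 0x20         # attacker-controlled values
wrapped = (bufoffs + buflen) % MOD         # 0x10: the sum wraps past 2^32
print(wrapped > length)                    # False -> the sanity check passes anyway
# A safe check avoids the addition entirely, e.g.:
print(bufoffs > length or buflen > length - bufoffs)  # True -> input rejected
```

With the check bypassed, the out-of-range offsets reach the calls on lines 16−18 and crash the QEMU process.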

According to the verification method of matching known vulnerability patterns, the method in this paper detects at least 54 new vulnerabilities in the samples under test, corresponding to 36 known vulnerability patterns in NVD. Columns 1 and 2 of Table 6 give the details of each detected vulnerability, including the software name and version and the file path of the vulnerable code. Columns 3−5 give the known NVD vulnerability whose pattern the detected vulnerability matches, including the matching CVE number, the name of the software that reported it, its release date, and its patching status. Among the detected vulnerabilities, 21 have been silently patched or removed in subsequent software versions, while 33 remain unpatched. To protect software security, the details of the unpatched vulnerabilities have been obfuscated, and vulnerability reports with patch suggestions have been submitted to their developers. According to the verification method of manually analyzing vulnerability causes, the method in this paper detects at least 5 new vulnerabilities. Table 7 lists the obfuscated information of these new vulnerabilities; we have submitted them to NVD for review and notified the vendors to fix them promptly. Once the NVD review is complete and the vulnerabilities are fixed, we will publish their details in this paper's code repository.

Table 6 List of vulnerabilities verified by pattern matching with known vulnerabilities

Table 7 List of vulnerabilities verified by vulnerability cause analysis

To sum up, the method in this paper successfully detected and explained 59 new vulnerabilities in 4 open source products: 54 matched existing vulnerability patterns, and 5 were confirmed by manually analyzing their causes. The experimental results show that the method can effectively detect real software vulnerabilities and provide reliable explanations of the detection results.

5 Summary and Outlook

To address the problems that existing vulnerability detection models perform poorly on real vulnerabilities and cannot point to the specific vulnerable statements, this paper proposes a slice-level vulnerability detection and explanation method based on graph neural networks. The method first extracts code slices around vulnerability concerns, then uses a graph neural network and an improved graph neural network explainer to achieve efficient and accurate vulnerability detection and explanation on code slices. On the real vulnerability dataset, the method reaches an F1 score of 75.1%, an improvement of 41.2%−110.4% over existing vulnerability detection models; the experimental results show that it can effectively detect real vulnerabilities. In explaining the detection results, the improved graph neural network explainer exceeds the two general-purpose graph neural network explainers by 8.9% and 24.9% in explanation accuracy, respectively, and shortens the time overhead by 42.5% and 15.4%, providing credible explanations of the detection results for the vulnerability detection model. Finally, the method scans tens of millions of lines of code in 4 open source software projects and successfully detects and explains 59 real vulnerabilities (54 matching known vulnerability patterns and 5 confirmed by manual cause analysis), proving its effectiveness in practical applications.

However, the current work still has certain limitations.

  • In terms of vulnerability detection, not all vulnerabilities are introduced through the four types of vulnerability concerns considered in this paper; samples without any vulnerability concern yield no slice and therefore cannot undergo vulnerability detection. In the future, we will supplement and improve the types of vulnerability concerns, or combine function-level and slice-level vulnerability detection, so that more samples can be successfully detected;

  • Secondly, in the vulnerability explanation part, this paper improves the existing general-purpose graph neural network explainer GNNExplainer into a vulnerability explainer, but its treatment of the characteristics of vulnerability samples is still not detailed enough. In the future, we will analyze the characteristics of vulnerability samples in more depth, design a perturbation method tailored to vulnerability samples based on those characteristics, and construct an explainer accordingly;

  • Finally, when detecting and explaining vulnerabilities in existing real-world products, this paper confirms the accuracy of the detection results by manually comparing each detected vulnerability with known vulnerability patterns, a process that took one month to complete. Vulnerabilities detected by our method that match no known pattern are regarded as possible zero-day vulnerabilities; for these, directed fuzzing can generate PoCs to verify whether they really exist, after which the PoC and vulnerability-trigger information can be submitted to apply for a vulnerability number. However, fuzzing at the current stage would consume enormous computing resources to verify vulnerabilities at this scale. We therefore plan to design a lightweight directed fuzzer to support PoC generation and other verification work for the zero-day vulnerabilities detected in this paper.

In future work, we will continue to improve the detection and explanation performance of the vulnerability detection model and better assist experts in vulnerability mining and analysis.
