C/C++ static code security check tool

 

A static code security check tool is software that helps programmers automatically detect security flaws in a source program. It finds potential security holes by analyzing the program's source code line by line. This paper examines the security problems that commonly arise in C/C++ programming, analyzes their root causes, and gives specific, feasible methods for analyzing and detecting them. Finally, by weighing the advantages and disadvantages of static code security check tools, it offers some suggestions for improving the effectiveness of security checks.

 

Software vulnerabilities arise not only because programmers lack awareness of how to write high-quality, secure programs, but also because the insecurity of the programming language itself makes it easy to write code with security problems inadvertently. Among programming languages, C/C++ is widely regarded as the one most likely to cause security problems. Hackers often exploit the resulting vulnerabilities to bypass security policies and carry out network attacks. Given this situation, using a static code security check tool to check the source program before it runs is a very effective method. It deals with the problem itself rather than its symptoms, so it is sometimes more effective than dynamic monitoring.

1 C/C++ language static code security check tool

Static code security check tools work much like static testing in software testing. The difference is that software testing aims to find bugs in general, while a static code security check aims at security problems: it looks for vulnerabilities in the software that hackers can easily exploit. Its basic working principle is to read the source code line by line from beginning to end, locate suspicious constructs, analyze each one in progressively greater depth until a definite result is obtained, handle it according to that result and the security policy, and finally report the outcome.

2 C/C++ language static code security check principle analysis

The working process of a static code security check is as follows. First, the tool reads the list of unsafe functions and then performs lexical analysis on the source program to be scanned. Functions that appear on the list are found and handled accordingly; functions that need to be parsed undergo further syntax analysis to determine whether they cause security problems, and are handled accordingly. This process repeats until all source programs have been analyzed, and the results are then reported.
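The first two steps (loading a list of unsafe functions and scanning the source lexically) can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the list contents and the names UNSAFE_FUNCTIONS, is_unsafe, and count_unsafe_calls are assumptions, and the scanner does not skip comments or string literals as a real lexer would.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, deliberately short list; a real tool would load it from a file. */
static const char *const UNSAFE_FUNCTIONS[] = {
    "strcpy", "strcat", "sprintf", "gets", "scanf"
};

/* Returns 1 if the identifier start[0..len) is on the unsafe list. */
static int is_unsafe(const char *start, size_t len)
{
    size_t count = sizeof UNSAFE_FUNCTIONS / sizeof UNSAFE_FUNCTIONS[0];
    for (size_t i = 0; i < count; i++) {
        if (strlen(UNSAFE_FUNCTIONS[i]) == len &&
            strncmp(UNSAFE_FUNCTIONS[i], start, len) == 0)
            return 1;
    }
    return 0;
}

/* Lexical pass: split the source into identifiers and count those that are
   on the unsafe list and are followed by '(' (that is, look like a call). */
int count_unsafe_calls(const char *source)
{
    assert(source != NULL);
    int hits = 0;
    const char *p = source;
    while (*p != '\0') {
        if (isalpha((unsigned char)*p) || *p == '_') {
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            size_t len = (size_t)(p - start);
            while (*p == ' ' || *p == '\t')
                p++;
            if (*p == '(' && is_unsafe(start, len))
                hits++;
        } else {
            p++;
        }
    }
    return hits;
}
```

Matching whole identifiers rather than substrings keeps a name such as my_strcpy from being reported; each hit would then be passed on to the deeper syntax analysis described above.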

Specifically, there are the following analysis and processing methods for different types of security problems.

3.1 Solutions to the buffer overflow problem

Buffer overflow is one of the most common security problems in software today. In practice, once the buffer overflow problems are found, most of the security problems have been found. The fundamental cause of a buffer overflow is that buffer boundaries are not checked, so an overflow occurs when the length of the source data exceeds the length of the buffer. To analyze statically whether the source code contains such problems, the buffer length must be calculated first.

For different types of buffers, there are four ways to calculate the buffer length:

(1) String constant: for example "satecode scan". The buffer length is the number of characters plus 1 for the terminating null character (14 in this example). A string constant appears in one of two ways: used directly as a function argument, or in a variable definition or assignment statement. Either case can be resolved by the syntax analysis regression method;

(2) Static buffer: for example buf[1024] or buf[MAX_len]. In the first form, the size of the buffer is read directly from the constant 1024. In the second form, the size can generally be determined by looking up the macro or constant definition;

(3) Dynamic buffer: a dynamic buffer can be allocated with new, or with alloc or malloc. With new, the length is the number of elements multiplied by the size of the base type being allocated. With alloc or malloc, the buffer size can be calculated directly from the argument expression;

(4) Pointer reference: a pointer or an array subscript refers to part of an existing buffer. In this case, first find the size of the base buffer using the methods above, and then calculate the offset by evaluating the expression.
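The four calculations can be written out as a small sketch. The function names and the macro MAX_LEN (standing in for MAX_len in the text) are hypothetical; a real analyzer would derive these values from the parsed source rather than compute them at run time.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEN 1024   /* hypothetical macro, standing in for MAX_len */

/* (1) String constant: number of characters + 1 for the terminating '\0'. */
size_t string_constant_length(void)
{
    return sizeof "satecode scan";   /* 13 characters + 1 */
}

/* (2) Static buffer: the size comes from the literal or the macro. */
size_t static_buffer_length(void)
{
    char buf[MAX_LEN];
    return sizeof buf;
}

/* (3) Dynamic buffer: element count times the size of the base type. */
size_t dynamic_buffer_length(size_t count, size_t element_size)
{
    return count * element_size;
}

/* (4) Pointer reference: what is left of the base buffer after the offset. */
size_t remaining_length(size_t base_length, size_t offset)
{
    assert(offset <= base_length);
    return base_length - offset;
}
```

For instance, a pointer that refers to buf + 24 inside buf[1024] leaves remaining_length(1024, 24) bytes available to a copy that starts at that pointer.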

In addition, the data prefill method can also detect buffer overflows. For example, for strcpy(buf1, buf2), add the following statement before the call: memset(buf1, 'A', sizeof(buf2)). If buf2 is larger than buf1, the buffer overflow occurs during the debug phase.

The idea of this method is that, for functions that may cause a buffer overflow, the destination buffer is pre-filled during the debugging phase with as many bytes as the source buffer holds, so that any overflow shows up during debugging instead of carrying an unsafe factor into the production run.

Specifically, the functions that may cause buffer overflow in C/C++ fall into the following categories, and each category is analyzed and handled separately.

3.1.1 String copy functions with two arguments

Functions in this class include strcpy, _mbscpy, strcat, wcscat, and so on. Each takes two parameters and copies a string from one to the other. When the destination buffer is shorter than the source buffer, a buffer overflow occurs. Such functions are handled by checking the buffer lengths through data flow tracing.

For example, consider the following program fragment:

(1) void transdata(char *str)
(2) { char buffer[24];
(3)   strcpy(buffer, str);          /* copies the contents of buf[256] into buffer[24] */
(4) }
(5) char buf[256]; int i;
(6) for (i = 0; i < 255; i++)       /* write 255 'M' characters into buf[256] */
(7)     buf[i] = 'M';
(8) buf[255] = 0;
(9) transdata(buf);

In this program, the overflow occurs when transdata(buf) is called. To detect the error, the tool first encounters strcpy(buffer, str) and checks the destination parameter buffer, finds its earlier definition (line 2), and determines that its length is 24 bytes. It then checks the source parameter str, finds that it is obtained through the data flow line 1 (char *str) → line 9 (buf) → line 5 (char buf[256]), and determines that its length is 256 bytes. At this point it has been preliminarily established that an overflow may occur, and the location of the possible buffer overflow (line 3) can already be reported. To locate the problem more precisely, however, the parser must go on to check all paths starting from line 5, where the array buf[256] is defined. This confirms that an overflow does occur when transdata(buf) is called, and the tool finally reports the path that caused the overflow (lines 3, 2, 5, 6, 7, 8).
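Once data flow tracing has produced a length for each parameter, the final comparison is simple. The sketch below assumes a hypothetical record type, struct buffer_info, that the analyzer fills in while tracing; neither the type nor the function name comes from the original text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record the analyzer keeps for each buffer it has traced. */
struct buffer_info {
    const char *name;     /* identifier in the source program */
    size_t      length;   /* length in bytes found by data flow tracing */
    int         line;     /* line where the buffer is defined */
};

/* A two-argument copy may overflow when the source can hold more bytes
   than the destination. Returns 1 for "report", 0 for "looks safe". */
int copy_may_overflow(const struct buffer_info *dst,
                      const struct buffer_info *src)
{
    assert(dst != NULL && src != NULL);
    return src->length > dst->length;
}
```

For the fragment above, the destination record would be {"buffer", 24, 2} and the source record {"buf", 256, 5}, so the call at line 3 is reported.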

3.1.2 String functions with three arguments

Such functions include memcpy, strncpy, _mbsncpy, strncat, wcsncat, and so on. Each takes three parameters, as in memcpy(buf, "M", count): when the number of bytes specified by count is greater than the buffer length of buf, an overflow occurs. These functions are also handled by data flow tracing. In this example, the check compares the value of count against the size of buf.
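The same comparison can be expressed at run time as a guard around the copy. This is only an illustration of the rule "count must not exceed the destination size"; checked_memcpy is a hypothetical helper, not a standard library function and not part of the tool described here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies count bytes only if they fit in dst.
   Returns 0 on success and -1 if the copy would overflow dst. */
int checked_memcpy(void *dst, size_t dst_size, const void *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    if (count > dst_size)
        return -1;
    memcpy(dst, src, count);
    return 0;
}
```

A static checker performs the identical test, except that dst_size and count are values recovered by data flow tracing instead of arguments passed by the caller.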

3.1.3 String handling functions with format control

These functions fall into two cases. The first includes printf and fprintf. Such a function cannot determine where its data arguments end, so problems generally occur when the number of arguments supplied does not match the format string. This type of problem is detected by checking whether the format string matches the arguments.

For example, consider the following program:

(1) int data = 1234567890;
(2) printf("data=%d%n\n", data, &data); /* display the value of data, then write the number of characters displayed so far into data */
(3) printf("data=%d\n", data);

The normal output of this program is:

data=1234567890
data=15

The second value is 15 because %n counts the 5 characters of "data=" plus the 10 digits printed before it.

If the second line is instead written as printf("%d%n\n", data), no pointer is supplied for %n, and the length of the displayed content is written to whatever address printf reads in place of the missing argument [2]. Normally this address cannot be accessed. But if the value is carefully crafted, it can be used for a buffer overflow attack. During analysis, when a printf call is encountered, the lexical analyzer first counts the conversion specifications between the two double quotes (each "%" that is not part of "%%"), and then checks whether the number of arguments matches that count. This is enough to find such problems.
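The counting step can be sketched in a few lines. The function names are hypothetical, and the sketch makes one simplifying assumption: it ignores "*" widths and precisions, which consume an extra argument in a real printf call.

```c
#include <assert.h>
#include <stddef.h>

/* Counts conversion specifications in a printf-style format string.
   "%%" prints a literal percent sign and consumes no argument.
   Simplification: "*" widths, which consume an extra argument, are ignored. */
int count_conversions(const char *format)
{
    assert(format != NULL);
    int count = 0;
    for (const char *p = format; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%') {
            p++;                 /* skip the second '%' of "%%" */
        } else if (p[1] != '\0') {
            count++;             /* a '%' that starts a real conversion */
        }
    }
    return count;
}

/* Returns 1 when the number of data arguments matches the format string. */
int arguments_match(const char *format, int argument_count)
{
    return count_conversions(format) == argument_count;
}
```

With this check, the correct call on line 2 (two conversions, two arguments) passes, while printf("%d%n\n", data) is flagged because two conversions meet only one argument.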

The second case includes sprintf and swprintf, which write formatted output to a string. When the string buffer is shorter than the output produced by the format string, a buffer overflow occurs. Such problems are detected by computing the length of the formatted output and comparing it with the actual buffer length.
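As a run-time analogue of this length comparison, C99 allows snprintf to be called with a null buffer and size 0 to obtain the number of characters the output would need. The sketch below uses that to decide whether a given buffer is large enough; the function name and the fixed format "data=%d" are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if formatting value with "data=%d" would not fit into a buffer
   of buffer_size bytes (including the terminating '\0'), 0 otherwise. */
int format_would_overflow(size_t buffer_size, int value)
{
    /* With a null buffer and size 0, snprintf only reports the length. */
    int needed = snprintf(NULL, 0, "data=%d", value);
    assert(needed >= 0);
    return (size_t)needed + 1 > buffer_size;
}
```

A static tool cannot always know the value being formatted, so it would typically use the worst-case width of each conversion instead of a concrete value.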

3.1.4 Functions that read a string into a buffer

One class of such functions includes scanf, fscanf, sscanf, and so on. A buffer overflow occurs when the specified buffer is shorter than the string actually read. The analysis traces where the buffer parameter is defined in the program, checks the buffer length, and prompts the user to use a format string that limits the number of input characters. Consider the following program fragment:

char buffer[20]; scanf("%s", buffer);

During the check, the tool first determines the size of the array buffer and then finds that %s has no width limit, which indicates that an overflow may occur. It then prompts the user [3] to use scanf("%19s", buffer) instead. The field width must leave one byte for the terminating null character, so a 20-byte buffer allows at most 19 input characters.
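The effect of the width limit can be seen in a small sketch. It uses sscanf on an in-memory string instead of scanf on standard input so that it is easy to try out; read_word is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one whitespace-delimited word from input into a 20-byte buffer.
   The width 19 leaves room for the terminating '\0'.
   Returns the number of characters stored, or -1 if no word was read. */
int read_word(const char *input, char buffer[20])
{
    assert(input != NULL && buffer != NULL);
    if (sscanf(input, "%19s", buffer) != 1)
        return -1;
    return (int)strlen(buffer);
}
```

However long the input word is, at most 19 characters plus the terminator are written, so the 20-byte buffer cannot be overrun.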

Another class of functions includes fgets, fgetc, gets, and getc. An overflow occurs if the parameter that limits the amount of data read exceeds the length of the destination buffer. These functions are handled by using data flow tracing to check the two values. Take fgets(char *s, int n, FILE *stream), which reads characters from the input stream stream and stores them in the string s. Here the tool analyzes the definitions of s and n in the program and checks whether the length of s is less than the value of n. Note that gets takes no length parameter at all, so it is strongly recommended not to use it and to use fgets instead.

3.2 Solutions to the memory leak problem

A memory leak occurs when memory is dynamically allocated but never released, so that the allocated memory can no longer be used. The usual case is a leak of heap memory, but leaks of system resources also count, such as kernel-mode HANDLE, GDI Object, SOCKET, Interface, and so on [4].

Because leaks occur while the program is running, they are not easy to detect. A static security check can use control flow tracing to find memory leaks by analyzing all possible paths. The approach applies to paired functions such as new/delete, alloc/free, malloc/free, and GlobalAlloc/GlobalFree. Depending on how the leak arises, the analysis distinguishes the following situations.

3.2.1 Memory leaks caused by forgetting to release memory

After memory is dynamically allocated, delete or free is never called to release it. Detecting this kind of leak only requires analyzing all paths for memory that is allocated with new or malloc but never released with delete or free.

3.2.2 Memory leaks caused by calling delete or free incorrectly

Such problems are more common and their consequences are more serious. They are handled by using syntax analysis to perform path analysis.

For example, consider the following program:

void function(int size)
{
    char* p = new char[size];
    if (size >= 512) {
        printf("Error!");
        return;            // returns without releasing p
    }
    // using the string pointed to by p
    delete[] p;
}

Obviously, the function may return before reaching the delete at its end, which causes a memory leak.

To check for this type of problem, first analyze all paths using the method in 3.2.1, and then check whether any path ends the program without calling delete or free to release the memory.
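For comparison, here is a version in which every exit path releases the buffer. It is a sketch written in C with malloc/free instead of the new/delete of the example above, and the function name process and its return codes are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 0 on success and -1 on error; every exit path releases p. */
int process(size_t size)
{
    char *p = malloc(size);
    if (p == NULL)
        return -1;          /* nothing was allocated, nothing to free */
    if (size >= 512) {
        free(p);            /* the early return must release the buffer too */
        return -1;
    }
    memset(p, 0, size);     /* stand-in for "using the string pointed to by p" */
    free(p);
    return 0;
}
```

A path analysis of this function finds a matching free on each path that follows a successful malloc, so nothing is reported.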

3.2.3 Implicit memory leaks

This kind of problem is rather special: the program keeps allocating memory while it runs and does not release it until the very end. Strictly speaking, there is no memory leak here, because the program eventually frees all the memory it allocated. For a server program, however, memory that is not released in time may eventually exhaust all the memory of the system. To check for such problems, start from the places where memory is freed and check whether freeing happens only when the destructor is called. If so, reanalyze the program to see whether there is a path that allocates memory again without calling the destructor, which would indicate an implicit memory leak. Note that such problems generally arise only under abnormal conditions while the program itself is otherwise normal, so they are hard to find statically, and this method can only analyze known special cases.

3.3 Solutions to the null pointer reference problem

The functions involved include open and fopen. A null pointer is a pointer that does not point to any valid storage. If the result of opening a file is not checked, a failed open produces a null pointer that the program goes on to use, and hackers can exploit this.

For example, consider the following program:

FILE *in = NULL, *out = NULL;
out = fopen("\\Test\\testnum1.txt", "r");
in = fopen("\\Test\\testnum2.txt", "w");
/* neither result is checked before use */
while (!feof(out))
{ fputc(fgetc(out), in); }

fopen returns a null pointer when it fails to open a file. In addition, if the file's attributes are not checked after it is opened, they can easily be tampered with [5]. Such problems are analyzed by using syntax analysis to check whether the result of opening the file is strictly checked.
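A checked version of the same copy loop might look like the sketch below. The function name copy_file and its return codes are assumptions; the point is only that each fopen result is tested before the stream is used.

```c
#include <assert.h>
#include <stdio.h>

/* Copies src_path to dst_path.
   Returns 0 on success and -1 if either file cannot be opened. */
int copy_file(const char *src_path, const char *dst_path)
{
    assert(src_path != NULL && dst_path != NULL);
    FILE *in = fopen(src_path, "r");
    if (in == NULL)
        return -1;                  /* never use a stream that failed to open */
    FILE *out = fopen(dst_path, "w");
    if (out == NULL) {
        fclose(in);                 /* release the stream already opened */
        return -1;
    }
    int c;
    while ((c = fgetc(in)) != EOF)  /* test the result of fgetc, not feof */
        fputc(c, out);
    fclose(in);
    fclose(out);
    return 0;
}
```

A static check of this function finds a null test between each fopen call and the first use of its result, which is the pattern the analysis looks for.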

3.4 Solutions to the random number problem

C/C++ programs often need random numbers, but the rand function provided by the system generates only pseudo-random numbers. Because of its internal implementation, the sequence of values generated from a given seed is repeatable, so hackers can guess the resulting random numbers. The analysis handles such functions by suggesting that they be replaced with other, more robust sources of randomness. Functions that generate pseudo-random numbers include rand, drand48, erand48, jrand48, lrand48, mrand48, random, and so on.
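The repeatability is easy to demonstrate. In this sketch, sequence_for_seed is a hypothetical helper that records the first n values rand produces after seeding; calling it twice with the same seed yields the same values.

```c
#include <assert.h>
#include <stdlib.h>

/* Fills out[0..n) with the rand() sequence produced for the given seed. */
void sequence_for_seed(unsigned int seed, int *out, int n)
{
    assert(out != NULL && n >= 0);
    srand(seed);
    for (int i = 0; i < n; i++)
        out[i] = rand();
}
```

An attacker who learns or guesses the seed (for example, a seed derived from the current time) can reproduce every value the program will draw, which is why the check recommends a stronger source for security-sensitive values.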

 

4 Conclusion

A static code security check tool for C/C++ can find potential security holes in a source program before the program runs, which greatly reduces the probability of security holes and is of great significance for improving program security. But code inspection is time-consuming, and static code security checking requires knowledge and experience. For more complex problems, static code security check tools may fail to detect anything. Therefore, on the one hand, programmers are strongly advised to keep high-quality programming in mind at all times and to design actively against errors. On the other hand, for more important systems, a combination of security checks is recommended: for example, static checking supplemented by dynamic resource monitoring, vulnerability scanning, intrusion detection, and other methods to ensure system security.
