A Survey of Static Analysis Tools for C/C++ Code

Excerpted from: https://www.jianshu.com/p/92886d979401

Overview

Static analysis means analyzing and evaluating a program without executing its code. It is an important part of ensuring software quality and security. Using techniques such as lexical analysis, semantic analysis, control-flow analysis, and data-flow analysis, it examines the code line by line to expose problems, nipping in the bud many tricky issues that would otherwise surface only at run time.

Typical problems

Static analysis can identify many kinds of vulnerabilities and defects, ranging from warning-level issues such as "unused variable" to error-level bugs of all kinds. Below are some common and more serious problems that it can detect.

■ Buffer overflow

A buffer overflow occurs when more data is written into a buffer than it can hold, so that the excess overwrites valid data in adjacent regions, much as pouring too much water into a container makes it spill to where it should not go, with unpredictable consequences. Statistics from practice show that buffer overflows are among the most prevalent software vulnerabilities, and the problem is worse in languages such as C/C++ that provide no built-in bounds checking for memory accesses. Buffer overflows typically occur in the following situations:

  • String copying, when the destination buffer is shorter than the source string (relevant functions include strcpy, _mbscpy, strcat, wcscat, memcpy, strncpy, _mbsncpy, strncat, wcsncat, etc.).
// s is copied without a length check; if it holds 10 or more characters, buf overflows.
void func(char* s)
{
    char buf[10];
    strcpy(buf, s);
}
  • Format string handling, when the arguments do not match the format string (relevant functions include printf, fprintf, sprintf, swprintf, etc.).
// %n writes the number of characters printed so far to the given address.
int len = 0;
printf("This is a test string.%n", &len);
// Wrong: len is passed by value, so printf treats its value (0) as the address to write to.
int len = 0;
printf("This is a test string.%n", len);
  • String input, when the buffer is smaller than the string being read (relevant functions include scanf, fscanf, sscanf, gets, getc, fgets, fgetc, etc.).
// The length of the user's input is not limited; if it holds 10 or more characters, buf overflows.
char buf[10];
scanf("%s", buf);
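For contrast with the unsafe calls listed above, here is a minimal sketch of bounded alternatives. The helper names copy_bounded and read_line_bounded are hypothetical and chosen only for illustration; the sketch assumes that truncating over-long input is acceptable to the caller.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, truncating if necessary.
 * Returns 1 if the whole string fit, 0 if it was truncated or on error. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        return 0;
    /* snprintf writes at most dst_size bytes, terminator included. */
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}

/* Hypothetical helper: read one line from in into buf, storing at most
 * buf_size - 1 characters, and strip the trailing newline if present.
 * Returns 1 on success, 0 on end of file or error. */
int read_line_bounded(FILE *in, char *buf, size_t buf_size)
{
    if (in == NULL || buf == NULL || buf_size == 0)
        return 0;
    if (fgets(buf, (int)buf_size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

With a 10-byte buffer, copy_bounded stores at most 9 characters plus the terminator and reports truncation through its return value, so the caller can decide whether a shortened string is acceptable.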

■ Memory leak

A memory leak usually refers to a heap memory leak (system resources can leak as well): the application fails to release memory it has allocated, so that memory cannot be reclaimed and is wasted. In severe cases, excessive leakage can crash the system. C/C++ has no automatic memory reclamation, so the programmer must make sure every allocation is matched by a release (new/delete, alloc/free, malloc/free, and GlobalAlloc/GlobalFree are each used in pairs).

Memory leaks typically occur in the following situations:

  • Forgetting to call the corresponding release function after allocating memory.
  • Execution ends early because of some condition, so the code that releases the memory is never reached.
  • Poor program design: memory keeps being allocated and is released only at the very end. Taken as a whole this is not a leak, but resource exhaustion can build up along the way, which is tantamount to a memory leak.
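The second case above (an early exit that skips the release) can be sketched as follows. The function name process_input and the 64-byte buffer size are assumptions made for this illustration; the point is only that every exit taken after the allocation passes through the same free call.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: returns 0 on success, -1 on failure.
 * Every path that runs after the allocation releases buf. */
int process_input(const char *input)
{
    int rc = -1;
    char *buf = malloc(64);
    if (buf == NULL)
        return -1;          /* nothing was allocated, nothing to free */

    if (input == NULL || strlen(input) >= 64)
        goto cleanup;       /* a bare "return -1;" here would leak buf */

    strcpy(buf, input);     /* safe: the length was checked above */
    rc = 0;

cleanup:
    free(buf);
    return rc;
}
```

Routing all exits through one cleanup label is a common C idiom for this situation; it keeps the allocation and the release visibly paired even as more error checks are added.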

■ Wild pointer

A pointer becomes a wild (dangling) pointer when it has not been initialized, or when the memory it points to has already been freed. The address it holds is invalid, and operating on that memory leads to unpredictable consequences.

// The null check looks rigorous, but it is ineffective: free() does not set p to NULL.
char *p = (char*)malloc(10);
free(p);
if (p != NULL) {
    strcpy(p, "danger");
}
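One common mitigation for the pattern above is to reset the pointer to NULL at the moment it is freed, so that a later null check means something. The sketch below uses two hypothetical helpers, free_and_clear and write_if_valid; note that it protects only the pointer variable passed in, not any other copies of the same address.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: free *pp and reset it to NULL. */
void free_and_clear(char **pp)
{
    if (pp != NULL) {
        free(*pp);      /* free(NULL) is a harmless no-op */
        *pp = NULL;
    }
}

/* Hypothetical helper: write text through p only if p is non-NULL and
 * the text fits in size bytes. Returns 1 if written, 0 otherwise. */
int write_if_valid(char *p, size_t size, const char *text)
{
    if (p == NULL || text == NULL || strlen(text) >= size)
        return 0;
    strcpy(p, text);
    return 1;
}
```

After free_and_clear(&p), the check p != NULL fails as intended and the write is skipped instead of touching freed memory.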

Tool survey

Driven by project needs, we surveyed more than 20 mainstream static analysis tools for C/C++ code along three dimensions: supported languages, supported platforms, and licensing.

| Tool | Languages | Platforms | License |
| --- | --- | --- | --- |
| AdLint | C | Windows, Linux, Mac OS, FreeBSD | Open source |
| Astrée | C | Windows, Linux | Commercial |
| Bauhaus Toolkit | C, C++, Java, C#, Ada | Windows, Linux, Solaris | Commercial |
| BLAST | C | Linux | Open source |
| Cppcheck | C, C++ | Windows, Linux | Open source |
| Coccinelle | C | Linux | Open source |
| Coverity | C, C++, C#, Java, JS, PHP, Python, Objective-C, Ruby, Swift, Fortran, VB | Windows, Linux, Mac OS, FreeBSD, Solaris | Commercial |
| CppDepend | C, C++ | Windows, Linux | Commercial |
| ECLAIR | C, C++ | Windows, Linux, Mac OS | Commercial |
| Flawfinder | C, C++ | Python | Open source |
| Fluctuat | C, Ada | Windows, Linux, Mac OS, FreeBSD | Commercial |
| Frama-C | C | Windows, Linux, Mac OS, FreeBSD | Open source / commercial |
| CodeSonar | C, C++, Java, binary code | Windows, Linux, Mac OS, FreeBSD | Commercial |
| Klocwork | C, C++, Java, C# | Windows, Linux, Solaris | Commercial |
| LDRA Testbed | C, C++, Java, Ada | Windows, Linux, Mac OS | Commercial |
| Parasoft C/C++test | C, C++ | Windows, Linux, Solaris | Commercial |
| PC-Lint | C, C++ | Windows | Commercial |
| Polyspace | C, C++, Ada | Windows, Linux, Mac OS | Commercial |
| PRQA QA·Static Analyzers | C, C++, Java | Windows, Linux | Commercial |
| SLAM | C | Windows | Free |
| Sparse | C | Linux, Mac OS, BSD | Open source |
| Splint | C | Linux, FreeBSD, Solaris | Open source |
| TscanCode | C, C++, C#, Lua | Windows, Linux, Mac OS | Open source |

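Based on the following criteria, the three most applicable tools (Cppcheck, Flawfinder, and TscanCode) were selected for detailed study:

  • Language: supports analysis of C/C++ code
  • Platform: runs on Windows and/or Linux
  • License: free

For a hands-on comparison, a ready-made set of C/C++ coding-problem samples, 94 CPP files in total, was taken from TscanCode's GitHub repository to examine how well the three tools detect them.

Platform: Windows
Language under test: C/C++
Test set: TscanCode/samples/cpp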

■ Cppcheck

The problems Cppcheck can detect include:

  • Dead pointers
  • Division by zero
  • Integer overflows
  • Invalid bit shift operands
  • Invalid conversions
  • Invalid usage of STL
  • Memory management
  • Null pointer dereferences
  • Out of bounds checking
  • Uninitialized variables
  • Writing const data

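It classifies problems into the following 6 categories:

  • error: bugs.
  • warning: suggestions about defensive programming.
  • style: code cleanup issues (unused functions, redundant code, etc.).
  • portability: 64/32-bit portability, compiler portability, etc.
  • performance: suggestions for making code more efficient, with no guarantee of a noticeable effect.
  • information: messages related to conditional compilation.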

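Installation is very simple: download the latest installer from the official website (cppcheck-1.83-x86-Setup.msi at the time of writing) and click "Next" through the wizard.

Cppcheck has a GUI. Choosing "Files" or "Directory" under the "Analyze" menu runs static analysis on the selected source code.

The results show a thorough analysis of the 94 samples, although the code preview at the bottom does not seem to handle Chinese comments well.

Besides the GUI, Cppcheck can be integrated with a variety of IDEs (such as VS, Eclipse, and QtCreator) and version control systems (such as Tortoise SVN and Git).

Each analysis can be configured, even with custom rules, and saved or reloaded as a project file.

The analysis report can be saved as formatted plain text or XML, and the XML can be turned into HTML with the help of Python pygments.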

■ TscanCode

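TscanCode is an open-source project from Tencent and the only Chinese-developed tool in this survey. It was originally built on top of Cppcheck, was later reimplemented, and added support for C# and Lua.

The problems TscanCode can detect include:

  • Null pointer checks: suspicious null pointers, dereference after a null check (which can lead to a crash), and so on; 3 subid check types in total
  • Out-of-bounds data access: Sprintf_S overflow; 1 subid check type
  • Memory leaks: mismatched allocation and release; 1 subid check type
  • Logic errors: duplicated code branches, comparing a bool with an INT, expressions that are always true or false, and so on; 18 check types in total
  • Suspicious code: a suspicious = inside an if condition, a free variable returning a local variable, and so on; 15 check types in total
  • Arithmetic errors: testing whether an unsigned number is less than 0, applying ++ to a bool, and so on; 11 check types in total

It classifies problems into 5 levels: fatal, serious, warning, information, and style.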

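Installation is just as easy: download the installer (TscanCodeV2.14.24.windows.exe at the time of writing) and click "Next" through the wizard.

It also has a user-friendly GUI, with a somewhat more modern UI design. Click "Scan Folder" or "Scan File", select a path, and then click "Start Scan".

In the scan results, Chinese comments are, as expected, displayed correctly.

TscanCode's messages are essentially copied straight from Cppcheck, but it reports noticeably fewer of them. Take mismatchsize.cpp as an example: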

void Demo()
{
    // The allocated memory size does not match
    int i = malloc(3);
}

Cppcheck reported 4 messages for mismatchsize.cpp, while TscanCode gave only the last two of them.
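For reference, the mismatch flagged in the sample above is presumably that 3 bytes is not a whole number of int elements. A minimal sketch of the conventional fix follows, assuming the intent was to allocate an array of int; the helper name alloc_ints is hypothetical and not part of the test set.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for count ints.
 * Returns NULL if count is 0, if the byte size would overflow,
 * or if malloc fails. The caller releases the result with free(). */
int *alloc_ints(size_t count)
{
    if (count == 0 || count > SIZE_MAX / sizeof(int))
        return NULL;
    /* Size is derived from the pointed-to type, so it always matches. */
    int *values = malloc(count * sizeof *values);
    return values;
}
```

Writing the size as count * sizeof *values keeps the allocation tied to the element type, which is the property the "mismatched size" check looks for.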

■ Flawfinder

Flawfinder is developed single-handedly by computer security expert David A. Wheeler. Because it is built on Python, it is naturally cross-platform.

Installation:

pip install flawfinder

Usage:

cd *python_path*/Scripts
python flawfinder *directory_with_source_code*

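In practice, Flawfinder handles Chinese comments even worse: running it directly on the TscanCode test set raises an encoding error, even though these CPP files are already in UTF-8, the format the Flawfinder documentation recommends: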

UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 92: illegal multibyte sequence

It ran normally only after the test set was batch-converted to ANSI encoding.

Of the 94 samples, only 11 problems were detected.

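David A. Wheeler himself states explicitly on the official website that Flawfinder is a relatively simple static analysis tool: it performs no data-flow or control-flow analysis and does not even know the types of function parameters.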
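To make concrete what analysis without data flow or control flow can look like, here is a toy name-based scanner. It is not Flawfinder's actual implementation; the function name count_risky_calls and the short list of risky names are assumptions chosen for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative list only; a real tool ships a much larger database. */
static const char *const RISKY[] = { "strcpy", "strcat", "sprintf", "gets", "scanf" };

static int is_ident_char(int c)
{
    return isalnum(c) || c == '_';
}

/* Count identifiers in source that exactly match a risky name and are
 * followed by '('. Purely lexical: comments and string literals are
 * scanned too, and buffer sizes are never taken into account. */
int count_risky_calls(const char *source)
{
    int hits = 0;
    const char *p = source;
    while (*p != '\0') {
        if (!is_ident_char((unsigned char)*p)) {
            p++;
            continue;
        }
        const char *start = p;
        while (is_ident_char((unsigned char)*p))
            p++;
        size_t len = (size_t)(p - start);
        const char *q = p;
        while (*q == ' ' || *q == '\t')
            q++;
        if (*q != '(')
            continue;
        for (size_t i = 0; i < sizeof RISKY / sizeof RISKY[0]; i++) {
            if (strlen(RISKY[i]) == len && strncmp(start, RISKY[i], len) == 0) {
                hits++;
                break;
            }
        }
    }
    return hits;
}
```

A scanner like this flags every strcpy call it sees, whether or not the destination is large enough, and it cannot follow a pointer from malloc to free. Those are exactly the kinds of defects that need the data-flow and control-flow analysis mentioned above.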

Flawfinder can save its results in three formats: formatted plain text, HTML, and CSV.

Comparison of the 3 tools

  • Detection capability: Cppcheck > TscanCode > Flawfinder
  • Friendliness: TscanCode > Cppcheck > Flawfinder
  • Ease of use: TscanCode > Cppcheck > Flawfinder


April 10 to 16, 2018, Wuxi

Origin www.cnblogs.com/LiuYanYGZ/p/11729938.html