Why does compilation need to be involved in the code inspection process?

This article is shared from the Huawei Cloud Community post "Why does compilation need to be involved in the code inspection process?", author: gentle_zhou.

As software security gets more attention, protecting source code during the coding stage is raised more and more often by R&D, testing, and operations teams and by individual developers across industries. Among the available approaches, static application security testing (SAST) tools stand out.

A SAST code inspection service checks source code for quality (including style), security, and conformance to coding standards, and reports defects and risks in the code. As people use these tools in more depth, many get confused: wasn't the tool supposed to check only the source code? Why is compilation involved? And why does compilation succeed in my local environment but fail once the project is in the cloud environment?

This article tries to answer these questions one by one, so that readers can understand the process and the principles behind it.

1. Isn't the tool supposed to check only the source code? Why is compilation involved?

Generally speaking, yes. SAST is a static application security testing technique that is usually performed before the code is compiled; in other words, a SAST tool does not need to execute the code. It targets the source code itself and can analyze its syntax, structure, logic, and so on.

However, this does not mean that SAST tools have nothing to do with compilation. When necessary, a SAST tool also uses the project's build tool to compile the code and then analyzes the resulting build artifacts, which gives it a deeper understanding of the code's semantics and logic.

2. What is the general process of compilation?

Before discussing the compilation process, let us first go over a few terms.

AST (Abstract Syntax Tree) is a tree-shaped data structure that represents the structure of program code and reflects its syntax and logic. ASTs are used for syntax checking, code style checking, code formatting, syntax highlighting, error reporting, auto-completion, and so on.
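
As a rough illustration (not tied to any particular SAST product), Python's standard `ast` module can parse a snippet and expose its tree. The sample source and the helper name `call_names` are made up for this sketch; it lists every function the snippet calls, which is the kind of question a style or security rule asks of an AST.

```python
import ast

# Hypothetical sample input; any valid Python source would do.
SOURCE = """
import os

def run(cmd):
    os.system(cmd)
    print("done")
"""

def call_names(source: str) -> list[str]:
    """Return the dotted names of all functions called in `source`."""
    tree = ast.parse(source)
    names = []
    # ast.walk visits every node; the traversal order is unspecified.
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            # ast.unparse (Python 3.9+) turns the callee expression back into text.
            names.append(ast.unparse(node.func))
    return names

if __name__ == "__main__":
    print(sorted(call_names(SOURCE)))
```

A rule such as "flag calls to `os.system`" then becomes a simple membership test on the returned names, with no need to run the code.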

IR (Intermediate Representation) is a data structure that represents the semantics of program code. Code written in different programming languages can be converted into a common form, which makes analysis and optimization easier.
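
One common textbook form of IR is three-address code, where each instruction has at most one operator. The sketch below is only an assumption about what such a lowering could look like, not how any specific tool does it: the function name `lower_to_tac`, the temporaries `t1`, `t2`, and the `result` variable are all invented for the example, and only `+ - * /` on names and constants are handled.

```python
import ast
import itertools

def lower_to_tac(expr_source: str) -> list[str]:
    """Lower one arithmetic expression into three-address instructions."""
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
    temps = itertools.count(1)  # generates t1, t2, ...
    code: list[str] = []

    def visit(node: ast.expr) -> str:
        # Leaves are returned as operands without emitting code.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        # A binary operation becomes one instruction writing a fresh temporary.
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            left = visit(node.left)
            right = visit(node.right)
            target = f"t{next(temps)}"
            code.append(f"{target} = {left} {ops[type(node.op)]} {right}")
            return target
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    result = visit(ast.parse(expr_source, mode="eval").body)
    code.append(f"result = {result}")
    return code

if __name__ == "__main__":
    for line in lower_to_tac("a + b * 2"):
        print(line)
```

Because every instruction now has the same flat shape, later analyses do not need to know which source language or which nested syntax produced it.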

CFG (Control Flow Graph) is a graph data structure that represents the execution flow of program code. The code is divided into basic blocks, and edges represent the jumps between basic blocks.
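
To make the idea of basic blocks and edges concrete, here is a minimal sketch over a toy instruction format. The tuple encoding (`"op"`, `"label"`, `"branch"`, `"jump"`, `"ret"`) and the name `build_cfg` are assumptions made for this example; real compilers and SAST engines use much richer representations.

```python
# Toy IR for a counting loop; the instruction format is hypothetical.
PROGRAM = [
    ("op", "i = 0"),
    ("label", "loop"),
    ("branch", "i >= 10", "end"),   # conditional jump, otherwise fall through
    ("op", "i = i + 1"),
    ("jump", "loop"),               # unconditional jump
    ("label", "end"),
    ("ret",),
]

def build_cfg(instrs):
    """Split instructions into basic blocks and return (blocks, edges)."""
    if not instrs:
        return [], []
    # A block starts at the first instruction, at every label,
    # and right after every jump, branch, or return.
    leaders = {0}
    for i, ins in enumerate(instrs):
        if ins[0] == "label":
            leaders.add(i)
        if ins[0] in ("jump", "branch", "ret") and i + 1 < len(instrs):
            leaders.add(i + 1)
    starts = sorted(leaders)
    blocks = [instrs[s:e] for s, e in zip(starts, starts[1:] + [len(instrs)])]

    # Map each label name to the index of the block it opens.
    label_block = {b[0][1]: n for n, b in enumerate(blocks) if b[0][0] == "label"}

    edges = set()
    for n, block in enumerate(blocks):
        last = block[-1]
        if last[0] in ("jump", "branch"):
            edges.add((n, label_block[last[-1]]))   # edge to the jump target
        if last[0] not in ("jump", "ret") and n + 1 < len(blocks):
            edges.add((n, n + 1))                   # fall-through edge
    return blocks, sorted(edges)

if __name__ == "__main__":
    blocks, edges = build_cfg(PROGRAM)
    print(len(blocks), edges)
```

In this example the edge from block 2 back to block 1 is the loop's back edge, the kind of structure a path-sensitive check has to follow.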

These three structures let a tool better understand and process the semantics and logic of the code during inspection, which improves the accuracy of the analysis.

Back to the point: what steps does a SAST code inspection tool go through when it compiles? Generally speaking, a complete compilation process performs lexical, syntax, and semantic analysis on the source code, generates an AST, converts it to IR, builds a CFG, performs data flow analysis and optimization, and finally generates target code.

So SAST code inspection is not completely independent of compilation. To some extent it relies on the build tool to support in-depth analysis.
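
The first and last stages of that pipeline can be observed with Python's own toolchain, used here purely as a stand-in for whatever compiler a given project needs. The sketch assumes a one-line sample program and a made-up helper `run_stages`; it covers lexing, parsing into an AST, and code generation, and leaves out the IR, CFG, and data flow stages.

```python
import ast
import io
import tokenize

# Hypothetical one-line program used as input to every stage.
SOURCE = "total = price * 2 + 1\n"

def run_stages(source: str, env: dict) -> dict:
    """Run lexing, parsing, and code generation on `source`."""
    # Lexical analysis: characters -> tokens (layout tokens are skipped).
    wanted = (tokenize.NAME, tokenize.OP, tokenize.NUMBER)
    tokens = [
        tok.string
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type in wanted
    ]
    # Syntax analysis: source text -> AST.
    tree = ast.parse(source)
    # Code generation: AST -> CPython code object (bytecode).
    code = compile(tree, filename="<demo>", mode="exec")
    # Executing the result is only done here to show the output is real code.
    namespace = dict(env)
    exec(code, namespace)
    return {
        "tokens": tokens,
        "ast_root": type(tree.body[0]).__name__,
        "total": namespace["total"],
    }

if __name__ == "__main__":
    print(run_stages(SOURCE, {"price": 10}))
```

If any stage fails, for example because `ast.parse` raises `SyntaxError`, nothing downstream can run, which is why a tool that analyzes build output reports a compilation failure instead of scan results.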

3. Why does compilation succeed in my local environment but fail in the cloud environment?

By now, most readers will understand why a SAST tool performs compilation, but scans can still fail in practice. The most typical case is the one in the heading: why does my local compilation succeed, while the check in the cloud environment reports that compilation failed?

Specifically, the causes are roughly as follows:

  • Most commonly, the project references private dependencies or configuration that exist only in the local environment. In the cloud environment, the SAST compilation step cannot find these dependencies or configuration, so the compilation fails.
  • The user's project is not a compiled project at all, or it is a compiled project but its build is not configured correctly. A frequent example: a user has just taken over a project and been told it is a compiled project, but the project does not actually contain its core build file, such as the pom.xml of a Maven project.
  • The compilation parameters for the check are not selected correctly in the cloud SAST tool. For example, the project is a Maven project, but the user mistakes it for a Gradle project and selects Gradle as the build tool in the cloud. Or, for a C# project built with MSBuild, the wrong .NET Framework version is selected (3.5 instead of 4.8).
  • The project code contains syntax or type errors (misspellings, missing semicolons, type mismatches, and so on). In the local environment the IDE corrects them automatically or the local compiler does not check for them, while the cloud SAST tool uses a stricter or newer compiler, so the compilation fails.
  • The project uses particular language features or syntactic sugar, such as lambda expressions or list comprehensions. The local compiler supports them, but the cloud SAST tool uses a compiler or a language version that does not, so the compilation fails.
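
Two of the causes above (a missing core build file, and choosing the wrong build tool) can be caught before uploading with a small pre-flight check. The sketch below is an assumption about how such a check might look, not a feature of any SAST service: `BUILD_MARKERS` and `detect_build_tools` are invented names, and the marker files are the conventional ones for Maven, Gradle, and MSBuild.

```python
import tempfile
from pathlib import Path

# Files that conventionally sit at the root of a project for each build tool.
BUILD_MARKERS = {
    "maven": ["pom.xml"],
    "gradle": ["build.gradle", "build.gradle.kts"],
    "msbuild": ["*.sln", "*.csproj"],
}

def detect_build_tools(project_dir: str) -> list[str]:
    """Return the build tools whose marker files exist in `project_dir`."""
    root = Path(project_dir)
    found = []
    for tool, patterns in BUILD_MARKERS.items():
        # A tool counts as present if any of its patterns matches a file.
        if any(next(root.glob(p), None) is not None for p in patterns):
            found.append(tool)
    return found

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(detect_build_tools(tmp))        # no build file yet
        (Path(tmp) / "pom.xml").write_text("<project/>")
        print(detect_build_tools(tmp))        # Maven marker now present
```

An empty result suggests the project has no recognizable build configuration at its root, and a result that differs from the build tool selected in the cloud settings points at the third cause in the list.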

Of course, different SAST tools use different scanning methods and techniques, so they compile in different ways and depend on the build environment to varying degrees.

References

1. https://en.wikipedia.org/wiki/Abstract_syntax_tree

2. https://www.twilio.com/blog/abstract-syntax-trees

3. https://www.cs.princeton.edu/courses/archive/spr03/cs320/notes/IR-trans1.pdf

4. https://gcc.gnu.org/onlinedocs/gccint/Control-Flow.html

5. https://www.csl.cornell.edu/~zhiruz/5997/pdf/lecture04.pdf

Origin my.oschina.net/u/4526289/blog/10114610