A brief analysis of "code visualization" | JD Cloud technical team

1. What is code visualization?

Code visualization is the process of creating graphical representations of source code to help understand and analyze it.

My own understanding: code visualization uses graphical means (architecture diagrams, dependency graphs, distributed traces, class diagrams, flame graphs, call graphs, etc.) to make certain characteristics of the code observable. It helps developers understand and analyze a project, and it can serve as the basis for building automation tools.

2. Why is code visualization needed?

Scenario 1: Difficulty understanding code logic

The project has a large codebase and requirements iterate quickly, so any documentation that is written soon goes out of date. Newcomers find it extremely hard to get started, veterans struggle to keep a complete picture of the project's overall business logic, and the logic often has to be traced again from scratch.



Scenario 2: The impact of changes is hard to assess

A requirement called for changing the logic of page A. Because the back-end code shared a lot of common logic and the call chains were very deep, it was only discovered after release that page B's logic had been affected too, causing a production incident.



Scenario 3: Project refactoring lacks a starting point

Old projects that have gone through long iterations and several team changes end up with very tangled internal logic that no one fully understands. Yet new business requirements keep arriving, and modifying the original project grows ever more expensive. Refactoring is urgently needed to improve R&D efficiency.



Other scenarios: automated regression cases often fail to cover new logic; production problems are hard to troubleshoot, and the faulty code is hard to locate quickly...

3. How can code visualization be achieved?

Call Graph is a graphical representation of the relationship between different function calls in a program. It shows how functions in a program interact, allowing developers to understand the program's flow and identify potential performance issues.

The following describes how to generate a call graph, one form of code visualization. The approaches fall into two groups: static analysis and dynamic analysis.

3.1 Static program analysis

1) Generating from source code

Before explaining how to generate a call graph from source code, let us first review some relevant compiler theory.

 

The compiler front end deals mainly with the source language and consists of:

Lexical analysis: also called scanning. It reads the characters of the source program from left to right, identifies each word, determines the word's type, and converts it into a uniform internal representation: the token. It is analogous to combining English letters into words.
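To make the scanning step concrete, here is a minimal lexer sketch. The token set (integer literals, identifiers and a few single-character operators) and the `TinyLexer` name are assumptions chosen for illustration; real lexers also handle keywords, string literals, comments and multi-character operators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TinyLexer {
    // Alternatives are tried left to right: integer literal, identifier, single-character operator.
    private static final Pattern TOKEN =
            Pattern.compile("(?<NUM>\\d+)|(?<ID>[A-Za-z_]\\w*)|(?<OP>[-+*/=();])");

    /** Scans src left to right and returns each token as "TYPE:text". */
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            if (Character.isWhitespace(src.charAt(pos))) { // whitespace separates tokens but is not one
                pos++;
                continue;
            }
            m.region(pos, src.length());
            if (!m.lookingAt()) { // no token type matches here: a lexical error
                throw new IllegalArgumentException("unexpected character '" + src.charAt(pos) + "' at " + pos);
            }
            String type = m.group("NUM") != null ? "NUM" : m.group("ID") != null ? "ID" : "OP";
            tokens.add(type + ":" + m.group());
            pos = m.end();
        }
        return tokens;
    }
}
```

For example, `tokenize("total = price * 2;")` yields `ID:total`, `OP:=`, `ID:price`, `OP:*`, `NUM:2`, `OP:;`.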



Syntax analysis: also called parsing. The parser recognizes phrases in the token sequence produced by the lexical analyzer, builds a syntax tree from them, and determines whether the source program is structurally correct. It is analogous to combining English words into sentences.
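The sketch below is a hypothetical recursive-descent parser for arithmetic expressions. It consumes a list of token texts and, as a simplification, represents the syntax tree as a prefix string such as `(+ a (* b 2))` rather than as node objects. The grammar in the comment and the `TinyParser` name are assumptions; a structurally invalid input such as `( a` is rejected, which is the structural-correctness check described above.

```java
import java.util.List;

class TinyParser {
    // Assumed grammar:
    //   expr   := term (('+' | '-') term)*
    //   term   := factor (('*' | '/') factor)*
    //   factor := NUMBER | IDENT | '(' expr ')'
    private final List<String> tokens;
    private int pos;

    TinyParser(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Parses the whole token list and returns the syntax tree in prefix form. */
    static String parse(List<String> tokens) {
        TinyParser p = new TinyParser(tokens);
        String tree = p.expr();
        if (p.pos != tokens.size()) {
            throw new IllegalArgumentException("unexpected token: '" + tokens.get(p.pos) + "'");
        }
        return tree;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : ""; // "" marks end of input
    }

    private String expr() {
        String left = term();
        while (peek().equals("+") || peek().equals("-")) {
            String op = tokens.get(pos++);
            left = "(" + op + " " + left + " " + term() + ")"; // left-associative
        }
        return left;
    }

    private String term() {
        String left = factor();
        while (peek().equals("*") || peek().equals("/")) {
            String op = tokens.get(pos++);
            left = "(" + op + " " + left + " " + factor() + ")";
        }
        return left;
    }

    private String factor() {
        String t = peek();
        if (t.equals("(")) {
            pos++;
            String inner = expr();
            if (!peek().equals(")")) {
                throw new IllegalArgumentException("expected ')' but found '" + peek() + "'");
            }
            pos++;
            return inner;
        }
        if (t.matches("\\d+|[A-Za-z_]\\w*")) {
            pos++;
            return t;
        }
        throw new IllegalArgumentException("unexpected token: '" + t + "'");
    }
}
```

Operator precedence falls out of the grammar: `*` binds tighter than `+` because `term` sits below `expr`.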



Semantic analysis: uses the information in the syntax tree and symbol table to check whether the source program is consistent with the semantics defined by the language, for example through type checking and context-sensitive analysis. It is analogous to checking whether an English sentence makes sense: "Dog is cat" is grammatically fine but semantically wrong. Semantic analysis also collects attribute information about identifiers and stores it in the syntax tree or symbol table for use during intermediate code generation.
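A minimal semantic check, assuming a toy language with only `int` and `String` variables: a symbol table records each declaration, and an assignment such as `count = name` is rejected even though it parses fine, the code analogue of "Dog is cat". `TinyTypeChecker` and its error messages are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class TinyTypeChecker {
    // Symbol table: variable name -> declared type
    private final Map<String, String> symbols = new HashMap<>();

    void declare(String name, String type) {
        if (symbols.putIfAbsent(name, type) != null) {
            throw new IllegalStateException("duplicate declaration: " + name);
        }
    }

    /** Checks "target = source"; source is a variable, an int literal or a string literal. */
    Optional<String> checkAssign(String target, String source) {
        String targetType = symbols.get(target);
        if (targetType == null) {
            return Optional.of("undeclared variable: " + target);
        }
        String sourceType = typeOf(source);
        if (sourceType == null) {
            return Optional.of("undeclared variable: " + source);
        }
        if (!targetType.equals(sourceType)) {
            return Optional.of("type mismatch: cannot assign " + sourceType + " to " + targetType + " " + target);
        }
        return Optional.empty(); // semantically valid
    }

    private String typeOf(String expr) {
        if (expr.matches("\\d+")) {
            return "int";
        }
        if (expr.length() >= 2 && expr.startsWith("\"") && expr.endsWith("\"")) {
            return "String";
        }
        return symbols.get(expr); // null if the name was never declared
    }
}
```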

Intermediate code: an intermediate representation (IR) that keeps enough information to derive all facts about the program. Because different front ends can target the same IR, the optimizer logic and the compiler back ends can be reused, which makes each stage more independent and easier to extend. Structurally, IRs are graph IRs, linear IRs, or hybrids of the two.
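To illustrate the graph IR versus linear IR distinction, this hypothetical sketch lowers an expression tree (a graph IR) into three-address code (a linear IR), introducing a fresh temporary for each operator. The node layout and the `t1`, `t2` temporary naming are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class ThreeAddressCode {
    /** Minimal expression tree node: a leaf (name or number) or a binary operator with two children. */
    static final class Node {
        final String value;
        final Node left;
        final Node right;

        Node(String value) {
            this(value, null, null);
        }

        Node(String value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    private final List<String> code = new ArrayList<>();
    private int temps;

    /** Emits instructions for the subtree (children first) and returns the operand holding its value. */
    private String emit(Node n) {
        if (n.left == null) {
            return n.value; // leaf: already a usable operand
        }
        String a = emit(n.left);
        String b = emit(n.right);
        String t = "t" + (++temps);
        code.add(t + " = " + a + " " + n.value + " " + b);
        return t;
    }

    static List<String> lower(Node root) {
        ThreeAddressCode gen = new ThreeAddressCode();
        gen.emit(root);
        return gen.code;
    }
}
```

For `a + b * 2` the result is `t1 = b * 2` followed by `t2 = a + t1`.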

The compiler back end deals mainly with the target language and includes the code optimizer and the target code generator. This part has little to do with generating a call graph (CG), so I won't go into its principles here; interested readers can look into LLVM and GraalVM.



With this compiler background in place, let's look at how a CG is produced from source code:



As you can see, the analysis essentially reproduces the compiler front end, and the AST, CFG (control flow graph) and CG are all graph IRs. Ready-made source code analysis tools include ANTLR, JavaParser and Soot. The following uses JavaParser as an example to briefly describe the generation process:

Step 1: Import the source code and dependency packages of the project to be analyzed, and parse them with the tool.
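JavaParser resolves symbols through a type solver that has to be pointed at the project's sources and dependency jars. As a dependency-free stand-in (a swapped-in technique, not JavaParser), the sketch below uses the JDK's built-in Compiler Tree API (`javax.tools` plus `com.sun.source.util.JavacTask`) to parse and attribute sources with the dependency jars on the classpath. It returns the error messages, which makes the accuracy point visible: a type from a jar that was not supplied cannot be resolved. `ProjectLoader` and its methods are hypothetical names, and the sketch needs a full JDK rather than a JRE.

```java
import com.sun.source.util.JavacTask;

import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class ProjectLoader {
    /** Wraps source text as an in-memory file so the sketch needs nothing on disk. */
    static JavaFileObject source(String className, String code) {
        URI uri = URI.create("string:///" + className.replace('.', '/') + ".java");
        return new SimpleJavaFileObject(uri, JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    /** Parses and attributes the sources with the dependency jars on the classpath; returns error messages. */
    static List<String> analyze(List<JavaFileObject> sources, List<String> dependencyJars) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null on a JRE without javac
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        List<String> options = new ArrayList<>();
        if (!dependencyJars.isEmpty()) {
            options.add("-classpath");
            options.add(String.join(File.pathSeparator, dependencyJars));
        }
        JavacTask task = (JavacTask) compiler.getTask(null, null, diagnostics, options, null, sources);
        try {
            task.parse();   // build syntax trees
            task.analyze(); // resolve symbols and types; no class files are written
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> errors = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                errors.add(d.getMessage(null));
            }
        }
        return errors;
    }
}
```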



Step 2: Use the visitor pattern to collect every method and the methods it calls.
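As a dependency-free stand-in for JavaParser's visitor (JavaParser's own base class for this is `VoidVisitorAdapter`), the sketch below uses the JDK's Compiler Tree API: after attribution, a `TreePathScanner` records each declared method in `visitMethod` and resolves each call to its declaring method in `visitMethodInvocation`, producing caller-to-callee edges keyed as `package.Class.method`. `CallEdgeCollector` is a hypothetical name, and overloads are merged because parameter types are left out of the key.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePath;
import com.sun.source.util.TreePathScanner;
import com.sun.source.util.Trees;

import javax.lang.model.element.Element;
import javax.lang.model.element.ExecutableElement;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class CallEdgeCollector {
    /** Wraps source text as an in-memory file. */
    static JavaFileObject source(String className, String code) {
        URI uri = URI.create("string:///" + className.replace('.', '/') + ".java");
        return new SimpleJavaFileObject(uri, JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    /** "pkg.Class.method" for a method or constructor element (overloads share one key). */
    private static String key(Element method) {
        return ((TypeElement) method.getEnclosingElement()).getQualifiedName() + "." + method.getSimpleName();
    }

    /** Returns caller -> callees for every method declared in the given sources. */
    static Map<String, Set<String>> collect(List<JavaFileObject> files) {
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, List.of(), null, files);
        Iterable<? extends CompilationUnitTree> units;
        try {
            units = task.parse();
            task.analyze(); // attribution links every call site to the method it invokes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Trees trees = Trees.instance(task);
        Map<String, Set<String>> edges = new TreeMap<>();

        // The visitor's extra argument carries the key of the enclosing method (null outside methods).
        TreePathScanner<Void, String> visitor = new TreePathScanner<Void, String>() {
            @Override
            public Void visitMethod(MethodTree method, String enclosing) {
                Element declared = trees.getElement(getCurrentPath());
                if (declared == null) {
                    return super.visitMethod(method, enclosing);
                }
                String caller = key(declared);
                edges.computeIfAbsent(caller, k -> new TreeSet<>());
                return super.visitMethod(method, caller);
            }

            @Override
            public Void visitMethodInvocation(MethodInvocationTree call, String caller) {
                // Resolve the callee through the method-select part, e.g. "c" or "new B().b".
                Element callee = trees.getElement(new TreePath(getCurrentPath(), call.getMethodSelect()));
                if (caller != null && callee instanceof ExecutableElement) {
                    edges.get(caller).add(key(callee));
                }
                return super.visitMethodInvocation(call, caller);
            }
        };
        for (CompilationUnitTree unit : units) {
            visitor.scan(new TreePath(unit), null);
        }
        return edges;
    }
}
```

Implicit default constructors may also appear, as `<init>` entries calling `java.lang.Object.<init>`. Calls through an interface resolve to the interface method rather than to a concrete implementation, which is one reason source-based call graphs need tuning.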





Step 3: Choose an entry method and build the CG from the methods and their calling relationships.
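Given caller-to-callee edges like those collected in Step 2, choosing an entry method and keeping only what is reachable from it is a graph traversal. This sketch does a breadth-first walk and emits the result as Graphviz DOT so it can be rendered as a diagram; `CallGraphBuilder` and the `"Class.method"` string keys are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CallGraphBuilder {
    /** Breadth-first walk from entry; returns every reachable edge as "caller -> callee". */
    static List<String> build(Map<String, List<String>> calls, String entry) {
        List<String> edges = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(entry);
        visited.add(entry);
        while (!queue.isEmpty()) {
            String caller = queue.poll();
            for (String callee : calls.getOrDefault(caller, List.of())) {
                edges.add(caller + " -> " + callee);
                if (visited.add(callee)) { // enqueue each method once, so recursion cannot loop forever
                    queue.add(callee);
                }
            }
        }
        return edges;
    }

    /** Renders edges as a Graphviz DOT digraph. */
    static String toDot(List<String> edges) {
        StringBuilder sb = new StringBuilder("digraph callgraph {\n");
        for (String edge : edges) {
            String[] parts = edge.split(" -> ");
            sb.append("  \"").append(parts[0]).append("\" -> \"").append(parts[1]).append("\";\n");
        }
        return sb.append("}\n").toString();
    }
}
```

Methods not reachable from the entry are dropped. The DOT text can be rendered with Graphviz, for example `dot -Tsvg`.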

Advantages: language-independent and highly extensible. Disadvantages: accuracy is poor and needs tuning; analysis is slow; tools for languages other than Java are hard to master.

2) Generating from bytecode

Customizing the analysis for a specific language's features gets results faster. Java bytecode can be regarded as a linear IR, and the analysis process is similar. Java also has many bytecode manipulation tools (ASM, Javassist, BCEL, etc.), which make bytecode analysis easy.

The basic idea is to read the class and method signatures from the .class files, then find the invoke instructions in each method's bytecode to obtain the signatures of the called methods. These two pieces of information are enough to build the CG. Because bytecode contains the complete signature of every method it references, there is no need to load dependency jars as source code analysis does, so the analysis is much faster.
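The "complete signature" lives in the class file's constant pool: each invoke instruction refers to a method reference made of an owner class, a method name and a method descriptor such as `(Ljava/lang/String;[I)V`. The hypothetical decoder below turns such a descriptor into readable Java types, following the descriptor grammar in the JVM specification (§4.3.3). It sketches one piece of the pipeline and is not part of BCEL's API.

```java
import java.util.ArrayList;
import java.util.List;

class DescriptorDecoder {
    private final String d;
    private int pos;

    private DescriptorDecoder(String descriptor) {
        this.d = descriptor;
    }

    /** Decodes e.g. "(Ljava/lang/String;[I)V" into "void (java.lang.String, int[])". */
    static String decode(String descriptor) {
        if (descriptor.isEmpty() || descriptor.charAt(0) != '(') {
            throw new IllegalArgumentException("not a method descriptor: " + descriptor);
        }
        DescriptorDecoder p = new DescriptorDecoder(descriptor);
        p.pos = 1; // skip '('
        List<String> params = new ArrayList<>();
        while (p.d.charAt(p.pos) != ')') {
            params.add(p.nextType());
        }
        p.pos++; // skip ')'
        String returnType = p.nextType();
        return returnType + " (" + String.join(", ", params) + ")";
    }

    /** Reads one field type (or V) starting at pos and advances past it. */
    private String nextType() {
        char c = d.charAt(pos++);
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            case '[': return nextType() + "[]"; // array: component type follows
            case 'L': { // object type: internal name up to ';'
                int end = d.indexOf(';', pos);
                String name = d.substring(pos, end).replace('/', '.');
                pos = end + 1;
                return name;
            }
            default:
                throw new IllegalArgumentException("bad descriptor character '" + c + "' in " + d);
        }
    }
}
```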



The following uses BCEL as an example to briefly describe the generation process:

Step 1: Parse the target project; the packaged jar can be used directly.
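Reading a packaged jar needs no extra library. This sketch lists the class names inside a jar with `java.util.jar.JarFile`, skipping `module-info.class` and anything under `META-INF/` (such as multi-release variants); `JarClassLister` is a hypothetical name. With BCEL, each `.class` entry path could then be handed to `new ClassParser(jarPath, entryName).parse()` to obtain a `JavaClass`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class JarClassLister {
    /** Lists the fully qualified names of the classes packaged in the jar, in entry order. */
    static List<String> classNames(String jarPath) {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String entry = entries.nextElement().getName(); // e.g. "com/acme/Foo.class"
                if (entry.endsWith(".class")
                        && !entry.startsWith("META-INF/")
                        && !entry.endsWith("module-info.class")) {
                    String binaryName = entry.substring(0, entry.length() - ".class".length());
                    names.add(binaryName.replace('/', '.'));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```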



Step 2: Use the visitor pattern to collect every method and the methods it calls.





Step 3: Choose an entry method and build the CG from the methods and their calling relationships.

Advantages: high analysis accuracy; fast analysis. Disadvantages: language-specific; poor extensibility.

PS: I recommend an IntelliJ IDEA plugin called Call Graph, which is built on IDEA's PSI capabilities. Its analysis is quite accurate as long as the project's codebase is not too large.



3.2 Dynamic program analysis

Also known as runtime program analysis, dynamic analysis is usually implemented with an agent. I won't explain it here; I'll write a separate article on the principle when I get the chance. Interested readers can try AppMap.
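An agent typically uses `java.lang.instrument` to rewrite classes as they load, inserting a probe at each method entry. The sketch below skips the bytecode rewriting and calls such a probe by hand, to show what the injected code could record: `StackWalker` identifies the current method and its caller, and each pair becomes a call edge observed at runtime. `RuntimeCallRecorder` and `TracedService` are hypothetical, and this is not AppMap's implementation.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.stream.Collectors;

class RuntimeCallRecorder {
    // Observed edges "caller -> callee"; a concurrent sorted set tolerates multi-threaded apps
    static final Set<String> EDGES = new ConcurrentSkipListSet<>();

    /** Probe for a method's entry: records (caller of that method) -> (that method). */
    static void enter() {
        // Frame 0 is enter() itself, frame 1 the instrumented method, frame 2 its caller.
        List<StackWalker.StackFrame> frames = StackWalker.getInstance()
                .walk(s -> s.skip(1).limit(2).collect(Collectors.toList()));
        if (frames.size() == 2) {
            EDGES.add(name(frames.get(1)) + " -> " + name(frames.get(0)));
        }
    }

    private static String name(StackWalker.StackFrame f) {
        return f.getClassName() + "." + f.getMethodName();
    }
}

/** Stand-in application code; with a real agent these enter() calls would be injected, not hand-written. */
class TracedService {
    static void handleRequest() {
        RuntimeCallRecorder.enter();
        loadOrder();
    }

    static void loadOrder() {
        RuntimeCallRecorder.enter();
    }
}
```

Unlike the static approaches, this records only calls that actually execute, including ones dispatched through interfaces, but it misses code paths that were never exercised.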



4. What are the application scenarios?

Scenario 1: Change risk identification

Background: identify the risks introduced by infrastructure changes, changes external to the system, and changes within the system.
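Scenario 2 in section 2 (a change to shared back-end logic breaking page B) is a reverse-reachability question on the call graph: which methods can transitively reach a changed method? A minimal sketch, assuming the call graph is already available as a caller-to-callees map; `ImpactAnalyzer` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ImpactAnalyzer {
    /** Returns every method that can transitively reach one of the changed methods. */
    static Set<String> impacted(Map<String, List<String>> calls, Set<String> changed) {
        // Invert caller -> callees into callee -> callers so the walk can go upward.
        Map<String, List<String>> callers = new HashMap<>();
        calls.forEach((from, callees) -> {
            for (String callee : callees) {
                callers.computeIfAbsent(callee, k -> new ArrayList<>()).add(from);
            }
        });
        Set<String> affected = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>(changed);
        while (!work.isEmpty()) {
            String method = work.pop();
            for (String caller : callers.getOrDefault(method, List.of())) {
                if (affected.add(caller)) { // first visit: keep walking up from it
                    work.push(caller);
                }
            }
        }
        return affected;
    }
}
```

Mapping the affected methods back to entry points (pages, APIs, scheduled jobs) gives the list of features to re-check before release.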



Scenario 2: Accurate testing

Background: accurate testing is a set of practices that use technical means to collect, store, compute, summarize and visualize the data generated during testing, ultimately helping the team make software testing more efficient and improve the overall quality of the project. For a detailed explanation, see chapters 2 and 3 of Accurate Testing.
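One common building block here is test selection: given per-test coverage (for example, the methods each test case executed, as recorded by dynamic analysis), run only the tests that touch changed methods. The sketch assumes such a coverage map already exists; `TestSelector` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class TestSelector {
    /** Keeps the tests whose covered methods include at least one changed method, sorted by name. */
    static List<String> select(Map<String, Set<String>> coverage, Set<String> changed) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(coverage).entrySet()) {
            if (!Collections.disjoint(e.getValue(), changed)) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }
}
```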





Scenario 3: Architecture Guard

Background: we face many challenges in architectural governance:

1) Design and implementation do not match. The designed software architecture often differs greatly from the architecture actually implemented, and the gap is usually discovered only after the code goes live, or even some time later;

2) Specifications are missing or ignored. As senior developers, we have drawn up a series of specifications, but few team members are willing to follow them;

3) The codebase is huge, so problems are hard to spot. In a system made up of a dozen or several dozen microservices, it is often hard to quickly uncover the intricate relationships among them;

4) Errors can occur at every level of the architectural model, such as API coupling between services, coupling between code modules, and database coupling;

5) Architects and developers lack experience: they sense that something is wrong, but cannot say what the problem is or how to improve it.

Therefore, we need a platform or tool to help us solve these problems.

Case: ArchGuard

ArchGuard provides visual analysis based on the C4 model (context, containers, components and code), along with a set of architecture health metrics.
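Part of architecture guarding can be automated by checking the call graph against rules. The sketch below assumes a layered design expressed as forbidden package prefixes (for example, `web.` must not call `dao.` directly) and reports each violating call. It is a hypothetical illustration, not ArchGuard's rule engine or API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LayerRuleChecker {
    /**
     * Reports calls that break a "caller layer must not call callee layer" rule.
     * forbidden maps a caller package prefix to a callee package prefix (one target per source, for brevity).
     */
    static List<String> violations(Map<String, List<String>> calls, Map<String, String> forbidden) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, List<String>> edge : new TreeMap<>(calls).entrySet()) {
            String caller = edge.getKey();
            for (String callee : edge.getValue()) {
                for (Map.Entry<String, String> rule : forbidden.entrySet()) {
                    if (caller.startsWith(rule.getKey()) && callee.startsWith(rule.getValue())) {
                        found.add(caller + " -> " + callee);
                    }
                }
            }
        }
        return found;
    }
}
```

Running such a check in CI turns a written specification into one that is enforced on every change.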





5. Further reading

(Disclaimer: some images are sourced from the Internet and will be removed on request in case of infringement.)

Author: Xie Xiao, JD Technology

Source: JD Cloud Developer Community. Please credit the source when reprinting.


Reprinted from: my.oschina.net/u/4090830/blog/10120313