Java Interview Key Points, Lecture 06: Commonly Used Tools

This lecture introduces commonly used tools and covers three topics:

The functions and applicable scenarios of JVM related tools;
Common Git commands and workflows;
Commonly used analysis tools in Linux systems.

Summary of commonly used tools

A summary of commonly used tools is shown in the figure below.

Note: Listed here are some relatively independent tools or commands, excluding services like ZK, Redis, and frameworks like Spring. These tools are the most commonly used of their respective types, and this diagram is not intended to be comprehensive.

Team collaboration tools

As shown in the picture above, first look at the team collaboration tools on the left.

Ant+Ivy, Maven, and Gradle are all tools for building projects and managing dependencies. Ant describes the build in scripts that define and execute different build targets, and on its own it handles dependencies by referencing jar files directly. Ant is rarely used today; Maven is the more mainstream project management tool.
Maven describes a project through a POM file, provides executable goals by convention, and manages dependencies through coordinates: groupId, artifactId, and version. Maven can automatically download dependencies from remote or local repositories.
Gradle is an automated build tool that combines the advantages of Ant and Maven. Its build scripts use a DSL (domain-specific language) based on Groovy. It combines the power and flexibility of Ant with the life cycle management of Maven, and it can download dependencies automatically. More and more teams are using Gradle to manage projects.
Git and SVN are both version control tools. The main difference is that SVN is centralized while Git is distributed, and Git's branch management is more flexible. SVN is now used relatively rarely; Git is the version control tool used by most Internet companies. Git usage and Git workflows are explained in detail later.

Quality assurance tools

Quality assurance tools include CheckStyle, FindBugs, SonarQube, etc.

Among them, CheckStyle and FindBugs are static code analysis tools that can be integrated into the IDE to check code locally.
SonarQube is a code quality management platform that can incorporate the checks of the two tools above and is better suited to assuring the quality of a project as a whole.

Stress testing tools

Stress testing tools include LoadRunner, JMeter, AB, and JMH.

LoadRunner and JMeter are relatively professional testing tools that provide detailed reports and data analysis, and are more suitable for QA personnel.
AB (ApacheBench, the ab command) is a simple and convenient stress testing tool provided by Apache. It is well suited to developers who want to run simple concurrent stress tests against HTTP interfaces.
JMH is mainly used for benchmarking on the JVM and focuses on performance at the method level. If you want to compare the throughput of two different implementations of a method, you can use JMH. Candidates applying for Java R&D positions should focus on JMH in the testing tool section.
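JMH itself is an external dependency, so the sketch below is not JMH. It is a hand-rolled comparison, using only the JDK, of the question JMH answers: which of two implementations of the same method has higher throughput. The class name NaiveThroughput and the two string-building methods are made up for illustration. A loop like this lacks JMH's safeguards (forked JVMs, controlled warm-up, protection against dead-code elimination), so its numbers are only a rough indication.

```java
import java.util.function.IntFunction;

// Hand-rolled throughput comparison (NOT JMH; illustration only).
class NaiveThroughput {

    // Results are written here so the JIT cannot discard the measured calls.
    static volatile long sink;

    // Implementation A: repeated String concatenation.
    static String concatWithPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // Implementation B: a single StringBuilder.
    static String concatWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    // Runs a warm-up phase, then returns measured operations per second.
    static double opsPerSecond(IntFunction<String> impl, int n, int warmup, int iterations) {
        long local = 0;
        for (int i = 0; i < warmup; i++) {
            local += impl.apply(n).length();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            local += impl.apply(n).length();
        }
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        sink = local;
        return iterations * 1e9 / elapsedNanos;
    }

    public static void main(String[] args) {
        int n = 200;
        System.out.printf("plus:    %.0f ops/s%n",
                opsPerSecond(NaiveThroughput::concatWithPlus, n, 5_000, 20_000));
        System.out.printf("builder: %.0f ops/s%n",
                opsPerSecond(NaiveThroughput::concatWithBuilder, n, 5_000, 20_000));
    }
}
```

With JMH, each implementation would instead become an annotated benchmark method and the framework would take care of warm-up, forking, and reporting.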

Containers and proxy tools

For containers and proxies, the mainstream Java web container is Tomcat and the mainstream proxy is Nginx. It is also worth following some trends: with the popularity of microservices, API gateways such as Envoy, OpenResty, Kong, and Zuul are becoming more and more common. With the spread of the DevOps concept, CI/CD (continuous integration and continuous deployment) is becoming more and more important. Here you should know the commonly used Jenkins and GitLab CI.

Jenkins is a long-established continuous integration tool. It supports building, testing, and deploying different types of projects, and it offers a rich set of plug-ins and an easy-to-use management interface.
GitLab CI is the continuous integration suite provided by GitLab. It is tightly integrated with GitLab and is simpler to use, which makes it a better fit for projects with relatively simple CI processes.
Travis and CircleCI are both continuous integration services commonly used in open source projects. If you are developing open source projects, it is worth learning how to use them.

Document management tools

As shown on the right side of the picture above, JVM tools and Linux system analysis tools will be explained in detail later and will be skipped here. Let’s look at document management tools.

JavaDoc generates documentation from the documentation comments written on Java classes and methods.
Swagger is a standardized and complete framework for generating and describing RESTful APIs. Swagger supports multiple languages and provides a visual Swagger UI. In Java, Swagger uses annotations to describe interfaces, parameters, return values, and so on, which makes it well suited to managing RESTful interfaces, especially for cross-language web services.

Network tools

Services generally interact through the network, so engineers need to master common network tools when debugging and troubleshooting problems. Here are some commonly used network tools.

Postman is an HTTP debugging tool that started as a Chrome plug-in and is now also available as a standalone application. It acts as a client and can simulate HTTP requests initiated by users. It is an efficient interface testing tool and is well suited to joint debugging and testing of HTTP interfaces.
Wireshark is a powerful network packet analysis tool that supports a wide range of protocols. It can capture packets directly, or it can analyze capture files produced by tcpdump. For example, you can analyze the send and receive times of HTTP packets, how connections are established and closed, the size and ordering of request packets, the TCP window size, and so on.
Fiddler captures only HTTP traffic. It can modify requests or simulate slow networks, and it is a powerful tool for web front-end and mobile debugging. Charles offers functions similar to Fiddler, supports macOS, and is well suited to capturing traffic from mobile devices.

Detailed explanation of JVM tools

JMC

First, the JVM-related tools. The first one is JMC, Java Mission Control, a graphical JVM monitoring and analysis tool first bundled with the JDK in JDK 1.7. As shown in the figure below, JMC includes three parts: the JVM browser, the JMX console, and JFR, the flight recorder.

The JVM browser lists the JVMs of running Java programs. Each JVM instance is called a JVM connection. The JVM browser uses JDP, the Java Discovery Protocol, to connect to JVMs running locally and remotely.

JMX (Java Management Extensions) is a specification for managing and monitoring the JVM. By managing MBeans, JMX can collect JVM information in real time, such as class instance information, heap usage, CPU load, and thread information, as well as any other runtime attributes exposed through MBeans.
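As a minimal sketch of the kind of data JMX exposes, the platform MXBeans in java.lang.management can be read from inside the JVM itself. The class name JvmSnapshot and the map keys are made up for illustration; a JMX console such as the one in JMC reads the same beans over a JMX connection.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Reads a few runtime figures from the platform MXBeans of the current JVM.
class JvmSnapshot {

    static Map<String, Long> collect() {
        Map<String, Long> snapshot = new LinkedHashMap<>();

        // Heap usage as reported by the memory MXBean.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        snapshot.put("heap.used.bytes", heap.getUsed());
        snapshot.put("heap.committed.bytes", heap.getCommitted());

        // Live thread count and currently loaded class count.
        snapshot.put("threads.live",
                (long) ManagementFactory.getThreadMXBean().getThreadCount());
        snapshot.put("classes.loaded",
                (long) ManagementFactory.getClassLoadingMXBean().getLoadedClassCount());

        // Sum collection counts and times over all collectors.
        // A collector reports -1 when a value is undefined, so clamp at 0.
        long gcCount = 0;
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcCount += Math.max(0, gc.getCollectionCount());
            gcMillis += Math.max(0, gc.getCollectionTime());
        }
        snapshot.put("gc.count", gcCount);
        snapshot.put("gc.time.millis", gcMillis);
        return snapshot;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```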

JFR lets you look deep inside the JVM at its runtime state. It is a very powerful profiling tool suited to program tuning and troubleshooting. JFR collects the events the JVM generates while running, and you can gather very comprehensive data by specifying which event types to collect and how often. The focus here is on what information can be analyzed with JFR.

As shown in the figure below, JFR can collect and analyze five major categories of information.

Memory information: the phases of GC and the time each takes, GC pause times, and configuration such as GC generation sizes. You can also view object allocation, including allocation inside TLABs (thread-local allocation buffers), and object statistics.
Code information: hot classes, hot methods, hot call trees, runtime exceptions, compilation status including OSR (on-stack replacement), and class loading and unloading.
Thread information: hot threads, thread contention, thread wait times, and lock-related information.
IO information: disk IO during the collection period (file reads and writes) and network IO.
System information: operating system information, process-related information, environment variables, and so on.

To summarize: both JMX and JFR can obtain information about the running JVM. JMX is mainly used to monitor and manage the JVM, and it supports custom management capabilities through additional MBeans. JFR is mainly used to collect JVM runtime information over a period of time and analyze the running status.

BTrace

What should you do when, while analyzing an online problem, you find that the logs are incomplete and the problem cannot be located? Adding logs and redeploying is not a good idea, especially since you may need to add logs repeatedly before the problem is located. Or the problem may be hard to reproduce, leaving you no chance to add logs and continue the analysis. This is where BTrace comes in. BTrace is a real-time JVM monitoring tool that Java engineers regard as a go-to tool for performance tuning and online problem diagnosis.

BTrace uses dynamic bytecode modification to trace Java programs at runtime by injecting tracing code into them. In other words, you can monitor a running system without restarting the JVM and obtain runtime data such as method parameters, return values, global variables, and stack information.

What BTrace can do:

Locate and intercept methods to obtain their input parameters, return values, execution time, and other information
View the creation of objects of certain types
Gather memory usage statistics and view object sizes
View the execution of synchronized blocks
View thrown exceptions and the parameters that caused them
Run detection tasks on a schedule
View class loading information
Perform deadlock detection
Print thread stack information
Monitor file or network reads and writes
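A real BTrace script needs the BTrace agent and its annotations, so it is not shown here. As a rough analogy using only the JDK, the sketch below intercepts calls through a java.lang.reflect.Proxy and records the same kind of data the first item describes: arguments, return value, and elapsed time. This is a swapped-in technique, not how BTrace works internally (BTrace rewrites bytecode in an already running JVM, while a proxy must be wired in by the program itself). TracingProxyDemo and Greeter are hypothetical names.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Method interception with a JDK dynamic proxy (an analogy, not BTrace).
class TracingProxyDemo {

    // Hypothetical service interface used only for this illustration.
    interface Greeter {
        String greet(String name);
    }

    // Collected trace lines: method, arguments, result, elapsed time.
    static final List<String> LOG = new ArrayList<>();

    // Wraps target so that every call through iface is recorded in LOG.
    @SuppressWarnings("unchecked")
    static <T> T traced(T target, Class<T> iface) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] { iface },
                (proxy, method, args) -> {
                    long start = System.nanoTime();
                    try {
                        Object result = method.invoke(target, args);
                        long micros = (System.nanoTime() - start) / 1_000;
                        LOG.add(method.getName() + Arrays.toString(args)
                                + " -> " + result + " in " + micros + "us");
                        return result;
                    } catch (InvocationTargetException e) {
                        // Record the exception thrown by the target, then rethrow it.
                        LOG.add(method.getName() + Arrays.toString(args)
                                + " threw " + e.getCause());
                        throw e.getCause();
                    }
                });
    }

    public static void main(String[] args) {
        Greeter greeter = traced(name -> "hello " + name, Greeter.class);
        greeter.greet("java");
        LOG.forEach(System.out::println);
    }
}
```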

As shown above, BTrace is very powerful and can observe almost anything. Because BTrace injects logic directly into a running JVM, it places some restrictions on scripts to keep this safe.

So, what BTrace can't do:

New objects cannot be created
Exceptions cannot be thrown or caught
Loops such as for and while cannot be used
Fields and methods in a BTrace script must be declared static
synchronized blocks and synchronized methods cannot be used
Arbitrary instance methods or static methods cannot be called; only the methods provided by the BTraceUtils class may be used

As you can see, the conditions for using BTrace are quite strict. Three points need to be noted:

Improper use of BTrace may crash the JVM;
Modifications made by BTrace stay in effect until the JVM is restarted;
BTrace's safety restrictions can be lifted by setting JVM parameters.

Other JVM tools

jps is used to view Java process information, including process id, main class name, main class full path, etc.
jmap can view the statistical information of objects in the JVM, including memory usage, number of instances, object types, etc. jmap can dump the heap and analyze it with the memory analysis tool MAT.
jstat monitors the resources and performance of the JVM in real time. The statistical items mainly include: class loading status, memory capacity and usage, GC times and time, etc.
jstack can view JVM thread stack information, including: thread name, sequence number, priority prio, thread status, lock status, etc.
jinfo can view all parameters of the running JVM and set some parameters.
jcmd is a tool available since JDK 1.7 that sends diagnostic commands to the JVM. It is very powerful and largely covers the functions of jmap, jstack, and jstat. It is worth focusing on this tool.
Others include jconsole, JProfiler, and jvisualvm. Their functions largely overlap with JMC, so using JMC directly is recommended.

Here are several practical application scenarios.

When you troubleshoot an online problem and find that the GC log lacks detailed GC information, you can turn on the JVM flag PrintGCDetails through jinfo so that it takes effect dynamically.
When you assess the risk of a memory leak, you can periodically collect heap object statistics through jmap or jcmd to find suspicious objects that keep growing.
When requests to a service all become slow at a certain moment, you can use jstat to observe GC activity and see whether GC pauses are too long.
When a service in the JVM hangs or stops processing, you can inspect the thread stacks through jstack to see whether multiple threads are in the BLOCKED state, which may indicate a deadlock.
When a service goes online and its performance falls short of expectations, you can use JMC to analyze JVM runtime information and see which hot methods can be optimized and which thread contention can be avoided.
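The deadlock check in the jstack scenario above can also be done programmatically. The following is a minimal sketch, assuming a HotSpot JVM: it deliberately deadlocks two daemon threads and then asks ThreadMXBean, the same thread information that jstack prints, which threads are deadlocked. DeadlockProbe and the thread names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Creates a deliberate deadlock and detects it through ThreadMXBean.
class DeadlockProbe {

    // Starts two daemon threads that take the same two locks in opposite order.
    static void startDeadlockedPair() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startWorker("worker-1", lockA, lockB, bothHoldFirstLock);
        startWorker("worker-2", lockB, lockA, bothHoldFirstLock);
    }

    private static void startWorker(String name, Object first, Object second,
                                    CountDownLatch bothHoldFirstLock) {
        Thread worker = new Thread(() -> {
            synchronized (first) {
                bothHoldFirstLock.countDown();
                try {
                    // Wait until the other worker also holds its first lock.
                    bothHoldFirstLock.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // Never reached: the other worker holds this lock.
                }
            }
        }, name);
        worker.setDaemon(true); // daemon threads do not keep the JVM alive
        worker.start();
    }

    // Polls for deadlocked threads until some are found or the timeout expires.
    static ThreadInfo[] findDeadlocks(long timeoutMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = threads.findDeadlockedThreads();
            if (ids != null) {
                return threads.getThreadInfo(ids, true, true);
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return new ThreadInfo[0];
    }

    public static void main(String[] args) {
        startDeadlockedPair();
        for (ThreadInfo info : findDeadlocks(5_000)) {
            System.out.println(info.getThreadName() + " is " + info.getThreadState()
                    + ", waiting for a lock held by " + info.getLockOwnerName());
        }
    }
}
```

In a jstack dump the same situation shows up as two BLOCKED threads, each waiting to lock a monitor that the other one holds.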

Detailed explanation of Git

Git common commands

The difference between Git and SVN has been briefly introduced in the previous summary of knowledge points. Let’s take a look at Git’s common commands and corresponding usage scenarios.

Git manages versions in a distributed manner, so there are four areas where data is kept, shown in light green in the figure below: the local workspace, the local staging area (Stage), the local repository, and the remote repository.

During development, you first bring code from the remote into the workspace. There are several ways to do this: clone, pull, or fetch followed by checkout, as shown by the arrows pointing left in the figure. When submitting code, you first add it to the staging area with the add command, then commit it to the local repository, and finally push it to the remote repository, as shown by the arrows pointing right in the figure.

Pay a little attention to the difference between fetch and pull.

fetch synchronizes from the remote repository to the local repository but does not merge anything into the workspace.
pull is equivalent to running fetch followed by merge: it first synchronizes to the local repository and then merges into the workspace.

Git's command line prompt is very friendly, and the instructions for common Git operations are very complete. Other commands will not be introduced.

Git common workflow

When a team develops collaboratively with Git, multi-person collaboration and multi-branch development are very common. To manage the code well, the team needs to agree on a process. This is what we call a workflow, also known as a branch management strategy. Common Git-based workflows include the Git-Flow workflow, the GitHub workflow, and the GitLab workflow, as shown in the figure below.

Git-Flow workflow
1. As shown on the left side of the figure above, Git-Flow has 5 kinds of branches divided by function, represented by different colors in the figure. master and develop are long-lived branches. The code on the master branch is in released state; the develop branch represents the latest development progress.
2. When a feature needs to be developed, a feature branch is created from develop. After development is completed and verified, it is merged back into develop. When the code on develop is stable and ready for release, a release branch is created from develop. If verification finds a problem, it is fixed on the release branch. Once the fix is verified, the version is officially released and the release branch is merged into both master and develop. There is also a hotfix branch for emergency fixes of online bugs. A hotfix branch is created directly from master; after the change is verified, it is merged back into master and synchronized to develop.
3. The Git-Flow process is very complete, but for many developers and teams it is somewhat complicated, and it has no graphical interface.
GitHub workflow
1. Now let's look at a simpler workflow: the GitHub workflow, shown in the middle of the figure above.
2. The GitHub workflow has only one long-lived branch, master, and the code on master is always releasable. For new feature development, you create a new branch from master. When development is complete and the branch needs to be merged, you open a PR (pull request) against master. Once review or verification passes, the code is merged into master. In the GitHub workflow, a hotfix follows exactly the same process as a feature branch.
GitLab workflow
1. The GitLab workflow is shown on the right of the figure above. The first two workflows each have pros and cons: Git-Flow is somewhat complex, and the single master branch of the GitHub workflow is sometimes not enough. The GitLab workflow combines the advantages of both, supporting Git-Flow's multi-branch strategy as well as mechanisms from the GitHub workflow such as Merge Requests and issue tracking. The GitLab workflow uses a pre-production branch for pre-release management and a production branch for releases. My team currently uses the GitLab workflow.

Linux tools

Let’s take a look at the commonly used analysis tools under Linux systems. The first is the stat series listed in the table below.

vmstat reports on processes, memory paging, virtual memory, context switches, wait queues, and so on, and reflects the load on the system. It is generally used to check the number of waiting processes, memory paging activity, and whether context switching is too frequent.
The iostat tool monitors the system's disk activity and can also display CPU usage. It is generally used to troubleshoot problems related to file reads and writes. For example, when file writes take a long time, you can check whether await and util are too high. iotop is a top-like tool for checking disk I/O usage. If you want to know which process is generating a lot of IO, use iotop.
ifstat is a simple real-time network traffic monitoring tool that shows the system's inbound and outbound network usage. iftop monitors real-time traffic on a network card, performs reverse lookups on IP addresses, displays port information, and so on. With iftop it is easy to find which IP is consuming network traffic.
netstat is a tool for monitoring the system's network status. It can show network connection status, which ports are being listened on, the processes associated with connections, and other information, and it can display statistics for the IP, TCP, UDP, and ICMP protocols. It is a very commonly used network tool.
dstat is an all-in-one real-time system statistics tool that reports CPU usage, memory usage, network conditions, system load, process information, disk information, and so on. It can replace tools such as vmstat, iostat, netstat, and ifstat.

Let’s take a look at the tools shown below.

strace is a tool for diagnosing and debugging system calls when a program is running. It can dynamically track the running of a program and clearly see the system call process generated when a program is running, its parameters, return values and execution time.
When the JVM executes native methods, you can easily use strace to debug. For example, when executing system reads and writes, if the thread is stuck for a long time, you can use strace to view the parameters and time consumption of the system call.
GDB is a powerful command line debugging tool that allows the program to run in a controlled environment, stops the debugged program at a specified breakpoint, and can also dynamically change the execution environment of the program. When the JVM crashes for unknown reasons, you can use GDB to analyze the coredump file generated during the crash to analyze and locate the problem.
lsof is a tool that lists open files on the current system. Everything in Linux is a file, including devices, links, etc., which are managed in the form of files. Therefore, viewing the file list through the lsof tool is very helpful for system monitoring and troubleshooting.
tcpdump is a powerful network packet capture tool that is very useful when analyzing calls between services. Data packets transmitted in the network can be captured for analysis. tcpdump provides flexible capture strategies, supports filtering for network layers, protocols, hosts, networks or ports, and provides logical statements such as and, or, and not to remove unwanted information.
Traceroute is a network route analysis tool. It sends probe packets with increasing TTL values and uses the ICMP Time Exceeded replies from each hop to discover the routers between the local machine and the target machine. Traceroute is very helpful for troubleshooting network problems between services, especially those that cross the public network.

Inspection points and bonus points

Inspection point

The above covers the key knowledge points about commonly used tools. Next, let's summarize what is examined from the interviewer's perspective:

Understand which types of problems the common JVM analysis tools are used for. For example, for thread deadlocks you can use the thread analysis tool jstack; for memory overflows you can use jmap to find the object types that occupy the most heap; and when you need to analyze program performance you can use the flight recorder in JMC.
Master the commonly used version control tool Git, including its common commands and common problems, and understand Git workflows. For example, know the difference between git merge and git rebase: merge combines the changes of two branches, creating a merge commit when needed, while rebase replays commits onto a new base and therefore rewrites commit history. Know which workflow your team uses for collaborative development and what its advantages and disadvantages are.
Mastering common tools on Linux also demonstrates practical ability. Understand which kinds of tools to use for which problems. For example, if disk writes often take a long time, you can use iostat to analyze disk IO. If that does not pin down the problem, you can use strace to analyze the system calls made for file writes. Or, if CPU load is high and you want to find which thread is responsible, you can combine top with jstack.
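One practical detail when combining top with jstack: top -H shows thread ids in decimal, while jstack shows the same native thread id in hexadecimal as nid=0x.... The sketch below converts the id and finds the matching thread header in a jstack dump. HotThreadLookup is a hypothetical helper, and the two-line dump in main is made-up sample text in the JDK 8 jstack format, not real output.

```java
import java.util.Optional;

// Maps a decimal thread id from `top -H` to the matching thread in jstack output.
class HotThreadLookup {

    // jstack prints the native thread id in hex, prefixed with "nid=0x".
    static String toNid(long decimalTid) {
        return "nid=0x" + Long.toHexString(decimalTid);
    }

    // Returns the header line of the thread whose nid matches, if any.
    static Optional<String> findThreadHeader(String jstackOutput, long decimalTid) {
        String needle = toNid(decimalTid);
        for (String line : jstackOutput.split("\\R")) {
            // Compare whole tokens so that nid=0x1b does not match nid=0x1b03.
            for (String token : line.trim().split("\\s+")) {
                if (token.equals(needle)) {
                    return Optional.of(line);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample; in practice this text comes from `jstack <pid>`.
        String sampleDump =
                "\"main\" #1 prio=5 os_prio=0 tid=0x00007f3c1800a800 nid=0x1b03 runnable\n"
              + "\"worker-7\" #23 prio=5 os_prio=0 tid=0x00007f3c18123000 nid=0x1b2a runnable\n";
        long hotTid = 6954; // decimal thread id as shown by `top -H -p <pid>`
        System.out.println(toNid(hotTid));
        System.out.println(findThreadHeader(sampleDump, hotTid).orElse("not found"));
    }
}
```

The stack trace printed under the matching header in the real dump then shows what the busy thread is executing.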

The examination point in this class is mainly based on the breadth of knowledge. For different types of tools, you need to know the applicable situations, and focus on practical application experience. You may be asked about some principles in this part of the interview, but generally the specific implementation of the tool will not be asked in depth.

Bonus points

For the commonly used tools section, the interviewer may not directly ask you "Can you use such and such a tool?" So for the knowledge in this lesson, you need to take the initiative to get extra points. For example, when the interviewer asks about project experience, bring out tools you know about to reflect your breadth of knowledge and practical experience.

For example, when the interviewer asks what online problems you have encountered, you can say that you once saw slow requests on a single machine. Sampling with JMC's flight recorder showed fierce thread contention when writing logs, with many threads spending a lot of time waiting for the write lock. Further investigation with iostat showed that the util percentage was very high, and the root cause turned out to be a faulty disk. The solution: on the one hand, replace the disk; on the other hand, switch log files with heavy write contention to asynchronous logging. An answer like this highlights both your command of common tools and your practical problem-solving ability.

In addition, I provide two ideas:

When introducing a project you developed, you can mention that you used JMC to profile its performance before going online, and that you found and fixed certain problems.
When introducing a project's design, you can mention that you ran JMH tests on two candidate solutions to verify how each implementation performed. Both approaches let you take the initiative and demonstrate your understanding and mastery of commonly used tools.

Summary of real questions

Finally, some real questions are listed for reference exercises, as follows.

I have basically introduced these questions before, so I won’t answer them again. It is recommended to learn the use of JMC, BTrace, tcpdump, strace and other tools after class.

The next lesson will explain commonly used frameworks such as Spring and Netty.