Hands-on experience with Pinpoint APM monitoring

When testing an application system or running it in production, you have probably run into scenarios like these:

  1. "A service has gone down, and I can't work out the cause from the error message. I'll wait for an expert to come and rescue the situation."
  2. "A request responds very slowly. To find out where it is stuck, I have to dig down layer by layer."
  3. "I only know how to monitor the server's resource metrics, and when a problem hits I still have nothing to go on."
  4. "The system keeps acting up from time to time. Can it still be saved?..."

This is especially true in today's popular distributed service systems: when a performance problem appears and you are facing N hardware servers and on the order of 10N services (containers), knowing where to start is a problem in itself.

So let me introduce Pinpoint, an APM monitoring tool for distributed services.

[Intended readers]: developers, testers, operations and maintenance staff, etc.

APM stands for Application Performance Management. The key point is that it is all about performance!

Many articles already compare the pros and cons of the various APM tools, such as the popular SkyWalking and Zipkin, so I will not repeat that here. If you want to know more, you can look it up on Baidu yourself.

Back to our topic: what can Pinpoint do for us?
First, the overall picture:
[Figure: Pinpoint overview of the system's components and the requests between them]

From this figure we can learn the following:
1. The overall architectural components of the system. For example, the figure shows the requests between the application service and MySQL, Redis, and third-party services;
2. The response speed of the system and of each node, along with the number of successful and failed requests, etc.

Pinpoint's default tracing granularity is quite fine. You can see in detail how much time a single request spends on each node along the entire service link. For example, here you can clearly see which node and method consume the most time.
[Figure: time spent by a single request on each node and method of the service link]
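
To make the idea of "which node and method consume the most time" concrete, here is a minimal Python sketch. It does not use Pinpoint's API or data model: the `Span` structure, the node names, and the timings are all hypothetical, and it assumes the child calls of a span run one after another rather than in parallel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One timed call in a request trace (hypothetical structure, not Pinpoint's)."""
    span_id: str
    parent_id: Optional[str]
    node: str
    method: str
    elapsed_ms: int  # total time, including time spent in child calls


def self_times(spans):
    """Map span_id -> time spent in the span itself, excluding its direct children."""
    child_total = {}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] = child_total.get(s.parent_id, 0) + s.elapsed_ms
    return {s.span_id: s.elapsed_ms - child_total.get(s.span_id, 0) for s in spans}


def slowest_span(spans):
    """Return the span with the largest self time, i.e. the likely bottleneck."""
    times = self_times(spans)
    return max(spans, key=lambda s: times[s.span_id])


# Made-up trace of one request passing through several nodes.
trace = [
    Span("1", None, "gateway", "handleOrder", 950),
    Span("2", "1", "order-service", "createOrder", 900),
    Span("3", "2", "redis", "GET", 5),
    Span("4", "2", "mysql", "SELECT", 780),
    Span("5", "2", "payment-api", "POST /pay", 60),
]

worst = slowest_span(trace)
print(worst.node, worst.method)
```

Subtracting the children's time matters here: the outer spans have the largest total elapsed time, but most of it is spent waiting on the calls beneath them.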

Here is another picture. Can you spot this system's current performance bottleneck from it?
[Figure: another request trace showing where the time is spent]

Besides tracing the time spent along service links, Pinpoint can also monitor the usage of the JVM, thread pools, database connection pools, handles, and so on. Here are a few large screenshots.
[Figures: Pinpoint charts of JVM, thread pool, and database connection pool usage]
Here you can see at a glance the maximum number of connections in the database connection pool and the current number of connections, which is a good reference for judging whether the pool is large enough.
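
The judgment described above can be sketched as a simple check. This is only an illustration under stated assumptions: the sample values, the pool size of 20, and the 80% threshold are made up, and the numbers would in practice be read off the monitoring chart rather than fetched through any Pinpoint API.

```python
def pool_utilization(active, max_total):
    """Fraction of the connection pool currently in use."""
    if max_total <= 0:
        raise ValueError("max_total must be positive")
    return active / max_total


def pool_near_limit(samples, max_total, threshold=0.8):
    """True if any sampled active-connection count reaches threshold * max_total."""
    return any(pool_utilization(a, max_total) >= threshold for a in samples)


# Hypothetical active-connection counts sampled over time, pool capped at 20.
samples = [4, 7, 12, 18, 9]
print(pool_near_limit(samples, max_total=20))
```

If the active count regularly approaches the maximum, the pool is probably too small for the load, or connections are being held for too long.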

Beyond the conventional uses above, Pinpoint can also be used to trace errors. For example, in the first picture above, a request that fails with an exception is marked in red, and you can click on it to see the node and method where the failure occurred, along with the specific error message.

Similarly, because Pinpoint records request link information in detail and displays the specific SQL statements executed during the request, it is very helpful for troubleshooting and even for monitoring the execution speed of SQL statements. A pleasant surprise, isn't it?
[Figure: request trace showing the SQL statements executed]

To sum up, Pinpoint brings us the following benefits:

  1. A grasp of the system's overall response speed, so you know how the system is running;
  2. A grasp of each node's response speed, such as third-party service interfaces, Redis, MySQL, etc.;
  3. The time spent along the service link for a single request, which helps locate performance bottlenecks;
  4. The request information along the service link for a single request, which helps with troubleshooting;
  5. Monitoring of the JVM, thread pool, and database connection pool usage of each service. Imagine a distributed service system with dozens or even hundreds of service nodes: how else would you monitor the JVM of every node?

Generally speaking, as APM monitoring tools such as Pinpoint gradually mature, introducing them can be a real help in daily development, testing, and operations and maintenance work, especially in distributed service systems. Without such tools, you will inevitably panic when problems hit.

As for whether to deploy this kind of APM monitoring tool in a production environment, here are a few points for reference:

  1. APM monitoring must not affect whether the business system works. In other words, even if APM monitoring fails, the business system should keep running as usual. Don't plant a time bomb in the whole system just by introducing APM monitoring, or the loss will outweigh the gain. On this point Pinpoint seems fine: in my test, the business system kept running well even when the Pinpoint server was down;
  2. APM monitoring consumes server resources. The finer the monitoring granularity, the more it consumes. Other articles have compared the performance overhead of various APM monitoring tools, and Pinpoint's overhead is relatively high because its monitoring granularity is relatively fine. I also ran a test: with Pinpoint monitoring turned on, performance did drop by about 8%. Looked at another way, though, in the production environments of many systems the CPU and other resources often run below 30% utilization. In that case, even an overhead of more than 10% is still within an acceptable range;
  3. Open source APM monitoring tools also raise security concerns. Tools such as Pinpoint and SkyWalking currently have no access control, so take care that they are not flooded with junk data and that detailed information is not accessed by users with bad intentions. Of course, there are solutions to this kind of security problem. For example, you can restrict access to authorized IPs at the router or firewall, or put a web server in front that performs authentication before forwarding requests, etc.
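
The "authorized IP" idea in the last point can be sketched in a few lines of Python using the standard `ipaddress` module. In practice this check would usually live in the firewall or in the web server sitting in front of the monitoring UI, not in application code; the network ranges and the `is_allowed` helper below are hypothetical and serve only to show the logic.

```python
import ipaddress

# Hypothetical networks allowed to reach the monitoring UI.
ALLOWED_NETWORKS = [
    ipaddress.ip_network(n) for n in ("10.0.0.0/8", "192.168.1.0/24")
]


def is_allowed(client_ip):
    """Return True if client_ip falls inside one of the allowed networks."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Reject anything that is not a valid IP address.
        return False
    return any(addr in net for net in ALLOWED_NETWORKS)


for ip in ("10.1.2.3", "192.168.2.1", "not-an-ip"):
    print(ip, is_allowed(ip))
```

Rejecting by default, including malformed input, keeps the check on the safe side if the list of networks is incomplete.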

If you have other views or opinions, please leave a comment to discuss.


Origin blog.51cto.com/14437683/2546396