Collecting data to visualize application call chains with eBPF (implemented on top of libebpfflow)

Daily pain points in operations and maintenance

  • Event assurance: before every major event, the call chain has to be mapped out again along the key business paths, and this work has been repeated for years;
  • Troubleshooting: the application has no visual call chain, so every incident has operations and development staff drawing one by hand on the spot, and this work has been repeated for years;
  • Correlation analysis: during drills, migrations, failures, disasters and so on, nobody knows which upstream and downstream services call which, so the information has to be collected manually offline, and this work has been repeated for years;
  • Operations control: which calls are allowed and which are not must be controlled by the first and second lines of defense for operations and security, and every audit requires manual statistics, and this work has been repeated for years;

Collecting data to visualize application call chains with eBPF

Here, a call chain is a way to show the network topology of an application. Unlike the distributed-application call chains produced by SkyWalking and Zipkin, it relies only on correlation analysis of the network-layer 5-tuple and shows the application's calling logic over TCP and UDP.

Production scenario

Production runs in two environments: on the cloud (private cloud) and off the cloud (x86 physical machines and virtual machines). Here we assume that the company's internal operations team has a robust, well-integrated CMDB that records all system names, project names, server addresses, middleware ports, container-related information (ns, name, id), and so on.

Request data

The following information about the applications on the current server node is obtained with libebpfflow:

[pid:17677][etype:0][enp0s3][task:calico-node][path:/usr/bin/calico-node][192.168.3.182:49410 <-> 10.96.0.1:443][container_id: fb8625900e0f18b2][name: k8s_calico-node_calico-node-flq9x_calico-system_9ee2b75c-ac51-4b45-b450-f3d515cde210_5]
[pid:20645][etype:0][lo][task:redis-server][path:/opt/repoll/redis/src/redis-server][192.168.3.182:40944 <-> 192.168.3.182:11770]

Once parsed, the output contains the following fields:

  • Non-cloud environment

  • Cloud environment: Docker and k8s differ in that the Docker environment has no ns and pod fields.
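As a rough illustration of the parsing step, here is a minimal Python sketch that splits the two sample lines above into fields. It is not part of libebpfflow: the function names and the `iface` key are made up for this example, the bare token (`enp0s3`, `lo`) is assumed to be the network interface, and the `name` field is assumed to follow the `k8s_<container>_<pod>_<namespace>_<uid>_<attempt>` pattern visible in the first sample.

```python
import re

# Every field in the sample output is wrapped in square brackets.
TOKEN_RE = re.compile(r"\[([^\]]*)\]")

SAMPLE_K8S = (
    "[pid:17677][etype:0][enp0s3][task:calico-node][path:/usr/bin/calico-node]"
    "[192.168.3.182:49410 <-> 10.96.0.1:443][container_id: fb8625900e0f18b2]"
    "[name: k8s_calico-node_calico-node-flq9x_calico-system_"
    "9ee2b75c-ac51-4b45-b450-f3d515cde210_5]"
)
SAMPLE_HOST = (
    "[pid:20645][etype:0][lo][task:redis-server]"
    "[path:/opt/repoll/redis/src/redis-server]"
    "[192.168.3.182:40944 <-> 192.168.3.182:11770]"
)


def split_endpoint(text):
    """Split 'ip:port' on the last colon and return (ip, port as int)."""
    host, port = text.strip().rsplit(":", 1)
    return host, int(port)


def parse_flow_line(line):
    """Turn one bracketed output line into a dict of fields."""
    record = {}
    for token in TOKEN_RE.findall(line):
        if " <-> " in token:
            # The address pair: left and right endpoints of the connection.
            left, right = token.split(" <-> ", 1)
            record["src_ip"], record["src_port"] = split_endpoint(left)
            record["dst_ip"], record["dst_port"] = split_endpoint(right)
        elif ":" in token:
            # key:value fields such as pid, etype, task, path; values stay strings.
            key, value = token.split(":", 1)
            record[key.strip()] = value.strip()
        else:
            # Assumption: the only bare token is the network interface name.
            record["iface"] = token.strip()

    # Assumption: k8s container names have six underscore-separated parts.
    parts = record.get("name", "").split("_")
    if len(parts) == 6 and parts[0] == "k8s":
        record["container"], record["pod"], record["ns"] = parts[1:4]
    return record


k8s_record = parse_flow_line(SAMPLE_K8S)
host_record = parse_flow_line(SAMPLE_HOST)
```

A record parsed from a plain host process simply lacks the `container_id`, `name`, `pod` and `ns` keys, which mirrors the difference between the environments described above.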

How to draw the network topology from request data combined with the production scenario

Take the picture below as an example. Looking up the request data collected on machine A in the CMDB shows which project of which system the source address belongs to, and which project of which system on machine B it calls (the project or application is identified from information such as the process name, executable file name, container name, pod name and ns name). The same applies to machines C, D, E and so on.
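The lookup described above could be sketched in Python as follows. Everything here is hypothetical: the two in-memory dictionaries stand in for a real CMDB query, the system and project names are invented placeholders, and the left-hand endpoint of each flow is assumed to be the caller, which the sample output does not state.

```python
from collections import Counter

# Hypothetical CMDB extracts. A real CMDB would be queried over its own API.
# (ip, port) -> (system, project), for known middleware ports.
CMDB_BY_ENDPOINT = {
    ("192.168.3.182", 11770): ("system-a", "redis-project"),
}
# ip -> (system, project), used when no endpoint entry matches.
CMDB_BY_HOST = {
    "192.168.3.182": ("system-a", "node-project"),
    "10.96.0.1": ("system-b", "api-project"),
}


def resolve_owner(ip, port):
    """Map an endpoint to (system, project), falling back to the host entry."""
    return (
        CMDB_BY_ENDPOINT.get((ip, port))
        or CMDB_BY_HOST.get(ip)
        or ("unknown", ip)
    )


def build_edges(flows):
    """Count calls between owners; each key is a (caller, callee) pair."""
    edges = Counter()
    for flow in flows:
        # Assumption: the source side of the flow is the caller.
        caller = resolve_owner(flow["src_ip"], flow["src_port"])
        callee = resolve_owner(flow["dst_ip"], flow["dst_port"])
        edges[(caller, callee)] += 1
    return edges


flows = [
    {"src_ip": "192.168.3.182", "src_port": 40944,
     "dst_ip": "192.168.3.182", "dst_port": 11770},
    {"src_ip": "192.168.3.182", "src_port": 49410,
     "dst_ip": "10.96.0.1", "dst_port": 443},
    {"src_ip": "192.168.3.182", "src_port": 49412,
     "dst_ip": "10.96.0.1", "dst_port": 443},
]
edges = build_edges(flows)
```

Resolving by endpoint first and by host second is one possible design: listening middleware ports are stable and identify a project precisely, while the caller's ephemeral port rarely matches anything and falls back to the host-level entry.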

Final vision

In the final vision, the system draws a network topology map of the applications from the network request data.
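One simple way to get from call data to a drawing is to emit Graphviz DOT text and let an external tool lay it out. The sketch below assumes the calls have already been aggregated into a mapping from (caller, callee) name pairs to call counts; the function name and the sample names are made up for illustration, and the article does not say which rendering tool the author has in mind.

```python
def quote(name):
    """Wrap a node name in double quotes, escaping embedded quotes for DOT."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(edges):
    """Render {(caller, callee): count} as a Graphviz directed graph."""
    lines = ["digraph call_topology {"]
    for (caller, callee), count in sorted(edges.items()):
        # One arrow per caller/callee pair, labelled with the call count.
        lines.append(f'  {quote(caller)} -> {quote(callee)} [label="{count}"];')
    lines.append("}")
    return "\n".join(lines)


sample_edges = {
    ("system-a/node-project", "system-b/api-project"): 2,
    ("system-a/node-project", "system-a/redis-project"): 1,
}
dot_text = to_dot(sample_edges)
print(dot_text)
```

The resulting text can be saved to a file and rendered with the Graphviz `dot` command, for example `dot -Tsvg topology.dot -o topology.svg`.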

Closing remarks

The author has not deployed the approach above in a production environment; it is only an idea. The core requirement is a real-time, accurate, comprehensive and complete network call topology for the entire production network, covering both the application side and the infrastructure.

