Hongke Sharing | How Flow-Based Traffic Classification Works | Network Traffic Monitoring

Many ntop products, such as ntopng, nProbe, and PF_RING FT, are based on network flows. However, not all users know in detail what a network flow is and how it works in practice. This blog post explains both.

What is a network flow?

A network flow is a set of packets with common properties. Flows are usually identified by a 5-tuple key, which means that all packets of a given flow have the same source and destination IP address, source and destination port, and transport protocol (e.g. TCP). In practice, the flow key also includes at least the VLAN ID, and possibly other attributes such as the tunnel ID of encapsulated traffic. Flows are a way of clustering packets, and thus classifying traffic, by means of a common key, similar to what you see on your computer when you run a command like netstat -na. Each flow has counters that track its packets and bytes, plus various other attributes such as flow timers (time of the first and last flow packet), statistics (retransmissions, out-of-order packets, etc.) and security properties (such as the flow risk).
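As a minimal sketch of the idea, a flow key can be modelled as an immutable, hashable tuple, with the counters kept in a separate record. The names FlowKey and FlowStats and their fields are hypothetical and chosen for illustration; they are not taken from any ntop product.

```python
from dataclasses import dataclass
from typing import NamedTuple


class FlowKey(NamedTuple):
    """5-tuple plus VLAN ID; hashable, so it can be used as a dict key."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str      # transport protocol, e.g. "TCP" or "UDP"
    vlan_id: int = 0   # 0 is used here to mean "no VLAN" (an assumption)


@dataclass
class FlowStats:
    """A small, illustrative subset of the per-flow counters and timers."""
    packets: int = 0
    bytes: int = 0
    first_seen: float = 0.0
    last_seen: float = 0.0


# Two packets of the same SSH connection, same direction: identical keys.
k1 = FlowKey("1.2.3.4", "5.6.7.8", 40000, 22, "TCP")
k2 = FlowKey("1.2.3.4", "5.6.7.8", 40000, 22, "TCP")
# Same addresses and ports on another VLAN: a different flow.
k3 = FlowKey("1.2.3.4", "5.6.7.8", 40000, 22, "TCP", vlan_id=10)
```

Because equal keys hash to the same value, packets that share all key fields end up in the same flow, while changing any single field (here the VLAN ID) produces a distinct flow.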

How are flows stored in memory?

Network flows are kept in a data structure called a flow cache (usually implemented as a hash table), which is continuously fed with incoming packets. The flow cache keeps active flows in memory (i.e. flows for which packets are still being received). Below you can see how ntopng displays the live flow cache and the 5-tuple key of each flow.

When does a network flow start?

A network flow begins as soon as its first packet is observed. On startup, the flow cache is empty, and it fills up as packets are received. Each incoming packet is decoded and a flow key is computed. The flow cache is then searched for that key: if it is not found, a new entry is added to the flow cache; otherwise the existing entry with that key is updated, i.e. its packet/byte counters and timers are updated. So, essentially, a flow starts when its first packet is observed.
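The lookup-or-insert step described above can be sketched with a Python dict standing in for the hash table. This is an illustration under stated assumptions, not ntopng or nProbe code: the names Flow, reverse and process_packet are hypothetical, and checking the reversed key so that both directions of a conversation share one entry is an assumption suggested by the bidirectional flow notation used later in this post.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# (src_ip, dst_ip, src_port, dst_port, protocol, vlan_id)
FlowKey = Tuple[str, str, int, int, str, int]


@dataclass
class Flow:
    first_seen: float
    last_seen: float
    packets: int = 0
    bytes: int = 0


def reverse(key: FlowKey) -> FlowKey:
    """Return the key of the same conversation seen in the other direction."""
    src_ip, dst_ip, src_port, dst_port, proto, vlan = key
    return (dst_ip, src_ip, dst_port, src_port, proto, vlan)


def process_packet(cache: Dict[FlowKey, Flow], key: FlowKey,
                   length: int, now: float) -> FlowKey:
    """Account one decoded packet and return the key of the flow it updated."""
    if key in cache:
        stored = key
    elif reverse(key) in cache:
        stored = reverse(key)        # reply packet of an existing flow
    else:
        # First packet of a new flow: its direction defines the stored key.
        cache[key] = Flow(first_seen=now, last_seen=now)
        stored = key
    flow = cache[stored]
    flow.packets += 1
    flow.bytes += length
    flow.last_seen = now
    return stored


cache: Dict[FlowKey, Flow] = {}
c2s = ("1.2.3.4", "5.6.7.8", 40000, 22, "TCP", 0)
process_packet(cache, c2s, 100, now=1.0)           # creates the flow
process_packet(cache, reverse(c2s), 60, now=1.5)   # updates the same flow
```

Note that the stored key is whatever direction the first observed packet had, which is exactly why the direction can later turn out to be "wrong".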

When does a network flow end?

Each flow has two aging timers: an idle timer (which tracks how much time has passed since the last flow packet was received) and a duration timer (which tracks how long the flow has lasted). A flow ends when one of these two timers expires, i.e. when the flow has been idle for too long (no packets have been received for a given period) or when it has been in the flow cache for too long. In nProbe and PF_RING FT, when a flow expires it is removed from the flow cache and sent to the collector. In ntopng, by contrast, flows are removed from the flow cache only when idle; long-lasting flows are not removed. The reason is that traffic probes such as nProbe need to periodically report information about the monitored traffic to a collector (e.g. ntopng), so long-lasting flows are "cut" and sent to the collector. ntopng has no collector to notify, so a flow stays in memory for as long as necessary, as configured in the preferences.
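The two aging timers can be sketched as follows. The function names and the timeout values are hypothetical and purely illustrative, not the defaults of any ntop tool. Passing max_duration=None models the idle-only behavior described for ntopng, while passing a value models a probe that also cuts long-lasting flows.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Flow:
    first_seen: float   # timestamp of the first flow packet
    last_seen: float    # timestamp of the last flow packet


def is_expired(flow: Flow, now: float, idle_timeout: float,
               max_duration: Optional[float] = None) -> bool:
    """True if the idle timer or (when enabled) the duration timer has expired."""
    if now - flow.last_seen >= idle_timeout:
        return True
    return max_duration is not None and now - flow.first_seen >= max_duration


def purge(cache: Dict[str, Flow], now: float, idle_timeout: float,
          max_duration: Optional[float] = None) -> List[Tuple[str, Flow]]:
    """Remove expired flows from the cache and return them.

    A probe would export the returned flows to its collector.
    """
    done = [(key, flow) for key, flow in cache.items()
            if is_expired(flow, now, idle_timeout, max_duration)]
    for key, _ in done:
        del cache[key]
    return done


def sample_cache() -> Dict[str, Flow]:
    return {
        "long_lived": Flow(first_seen=0.0, last_seen=100.0),  # active, but old
        "idle": Flow(first_seen=0.0, last_seen=10.0),         # silent for 90 s
        "fresh": Flow(first_seen=95.0, last_seen=100.0),      # new and active
    }


# Probe-like policy: both timers enabled.
probe_cache = sample_cache()
exported = purge(probe_cache, now=100.0, idle_timeout=30.0, max_duration=60.0)

# Idle-only policy: long-lived flows stay in the cache.
idle_only_cache = sample_cache()
removed = purge(idle_only_cache, now=100.0, idle_timeout=30.0)
```

With both timers enabled, the long-lived and the idle flows are removed; with the idle timer only, just the idle flow is removed.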

Flow Keys and Direction

Since a flow is created when its first packet is received, we might expect the flow client to be the real network client. For example, for an SSH connection from a client on host 1.2.3.4 to host 5.6.7.8, the flow would be 1.2.3.4:X <-> 5.6.7.8:22 (assuming SSH is running on port 22). Looks right? Yet sometimes you will see such a flow reported as 5.6.7.8:22 <-> 1.2.3.4:X in the flow cache. Why? There are several possible reasons:

  • The application (e.g. ntopng) was started after the flow began, so the first packet observed by ntopng was 5.6.7.8:22 -> 1.2.3.4:X rather than 1.2.3.4:X -> 5.6.7.8:22.
  • The flow was stored in the cache with the correct key, but no packets were exchanged for a while (say 2 minutes), so the application declared the flow expired and removed it from the flow cache. If a new packet is then observed, it might travel in the opposite direction (e.g. 5.6.7.8:22 -> 1.2.3.4:X), since it could be a keep-alive packet sent by the server. In this case, the flow is placed in the cache in the opposite (and therefore wrong) direction.

Flow timeouts can be configured in both ntopng (via the preferences) and nProbe (using the -t and -d command line options), which mitigates these problems, though it does not completely solve them. Adjusting the timeouts alone is not enough, especially for UDP flows because, unlike TCP, there are no flags that can be used to guess the real flow direction. Therefore, ntopng implements some heuristics to swap flow directions, but these heuristics must not be too aggressive, otherwise invalid information could be reported.
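To make the idea of a direction heuristic concrete, here is a deliberately conservative sketch. It is not ntopng's actual heuristic, which this post does not describe; the function names, the port threshold of 1024 and the rules themselves are assumptions made for illustration. It trusts the TCP handshake flags when the first observed packet carries SYN, and otherwise swaps only when the source port looks like a server port and the destination port does not.

```python
from typing import Tuple

# (src_ip, dst_ip, src_port, dst_port, protocol)
FlowKey = Tuple[str, str, int, int, str]

TCP_SYN = 0x02
TCP_ACK = 0x10


def should_swap(key: FlowKey, first_pkt_tcp_flags: int = 0) -> bool:
    """Guess whether the first observed packet travelled server -> client."""
    _, _, src_port, dst_port, proto = key
    if proto == "TCP" and first_pkt_tcp_flags & TCP_SYN:
        # SYN alone comes from the client; SYN+ACK comes from the server.
        return bool(first_pkt_tcp_flags & TCP_ACK)
    # No handshake information (UDP, or a mid-stream TCP packet): swap only
    # when the source port is well-known and the destination port is not.
    return src_port < 1024 <= dst_port


def orient(key: FlowKey, first_pkt_tcp_flags: int = 0) -> FlowKey:
    """Return the key with client as source, swapping it if needed."""
    if should_swap(key, first_pkt_tcp_flags):
        src_ip, dst_ip, src_port, dst_port, proto = key
        return (dst_ip, src_ip, dst_port, src_port, proto)
    return key


# A mid-stream packet from the SSH server is re-oriented toward the client.
fixed = orient(("5.6.7.8", "1.2.3.4", 22, 40000, "TCP"))
# Two high UDP ports give no hint, so the key is left untouched.
unchanged = orient(("10.0.0.1", "10.0.0.2", 5000, 6000, "UDP"))
```

When neither rule applies, the key is left as observed, which reflects the point above: a heuristic that swaps too eagerly risks reporting invalid information.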

We hope this post sheds light on how flow-based network traffic analysis works, and on why some "unexpected" behavior is sometimes observed: not because of bugs, but because of the nature of these measurements.

Origin blog.csdn.net/HongkeTraining/article/details/130249747