Traceroute Principles and Application Challenges

1 Introduction to Traceroute

Traceroute is, after ping, one of the most widely used network diagnostic tools because it is simple and broadly applicable. Its uses range from basic network troubleshooting to large-scale scans that reveal the underlying network topology. However, traceroute was not designed with modern network technologies in mind, so it runs into many problems in today's networks. These problems usually show up as strange or incorrect probe results, which considerably weakens traceroute as a diagnostic and analysis tool, especially in large networks. The following sections introduce how traceroute works and the problems it faces in current networks.

2 Traceroute principle

Classic traceroute works by sending ICMP echo requests (so-called probes), each with a deliberately limited TTL. The TTL usually starts at 1 and is incremented with each probe. Every router on the path decrements the TTL by one when forwarding the packet and, if the TTL reaches zero, discards the packet and sends back an ICMP Time Exceeded error. Traceroute therefore receives an error message from each hop between itself and the destination, and the source address of that message is the IP address of the hop. Traceroute can also calculate the RTT for each hop by subtracting the time the probe was sent from the time the error was received. Finally, when a probe reaches the destination, traceroute receives an ICMP echo reply and stops. The basic traceroute message flow is shown in the diagram below:
[Figure: basic traceroute message flow]
Modern traceroute variants also support UDP and TCP probes as well as IPv6. The only difference from ICMP is the packet returned by the destination: usually an ICMP Port Unreachable error for UDP and a TCP RST for TCP. Typical exceptions are a probe blocked by a firewall or a destination port that is in use, in which case no error is returned.
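
The probing loop described in this section can be sketched as a small simulation. This is not a real network tool: the function name `simulate_traceroute` and the list-of-names path model are assumptions made for illustration, and RTT measurement is left out.

```python
from typing import List

def simulate_traceroute(path: List[str], max_ttl: int = 30) -> List[str]:
    """Return the address that answers each probe, one entry per TTL.

    `path` is a non-empty list of routers in forwarding order; its last
    entry is the destination.
    """
    last = len(path) - 1
    hops = []
    for ttl in range(1, max_ttl + 1):
        remaining = ttl
        for index, node in enumerate(path):
            if index == last:
                hops.append(node)   # destination: answers with an echo reply
                break
            remaining -= 1          # router decrements the TTL when forwarding
            if remaining == 0:
                hops.append(node)   # TTL expired: router sends Time Exceeded
                break
        if index == last:
            break                   # destination reached, stop probing
    return hops
```

Calling `simulate_traceroute(["r1", "r2", "dst"])` yields one entry per hop, ending with the destination. A smaller `max_ttl` cuts the trace short before the destination is reached.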

3 Current network environment

Today's Internet is many times larger than the early network. Many technologies have therefore emerged to increase network capacity and transmission speed, of which load balancing and MPLS (Multiprotocol Label Switching) are two representative examples. Details are given below.

3.1 Load balancing

Load balancing typically distributes packets across several different links or paths. Load balancing mechanisms generally fall into three categories, explained below.

  • Per-flow load balancing. Per-flow load balancing attempts to distribute packets according to so-called flows. A flow is usually identified by the 5-tuple of the corresponding packet, namely source and destination IP address, protocol, and source and destination port. This is done so that packets belonging to the same connection are delivered to their destination in order as far as possible.
  • Per-packet load balancing. Per-packet load balancing distributes each packet individually across available links. Typically, packets are distributed randomly or in a round-robin fashion. It has the advantage of requiring less work inside the router, but on the other hand it usually introduces huge jitter to the connection, especially if the different routes are not of equal length. Per-packet load balancing usually causes the most problems with traceroute because of its random nature.
  • Per-destination load balancing. Per-destination load balancing distributes packets based on their destination. It's basically the same as classic routing and usually has little to no impact on the network. Traceroute is usually completely immune to per-destination load balancing.
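
The three categories above differ only in what the router looks at when it picks a link. The following is a minimal sketch of that choice. The link names, the function names, and the use of SHA-256 are assumptions for illustration; real routers use much cheaper, vendor-specific hash functions.

```python
import hashlib
from itertools import count

LINKS = ["link-A", "link-B"]  # hypothetical parallel links

def _hash_to_link(key: str, links):
    """Map an arbitrary key onto one of the links in a stable way."""
    digest = hashlib.sha256(key.encode()).digest()
    return links[int.from_bytes(digest[:4], "big") % len(links)]

def per_flow(src, dst, proto, sport, dport, links=LINKS):
    """Per-flow: hash the whole 5-tuple, so one flow always uses one link."""
    return _hash_to_link(f"{src}|{dst}|{proto}|{sport}|{dport}", links)

def per_destination(dst, links=LINKS):
    """Per-destination: only the destination address decides the link."""
    return _hash_to_link(dst, links)

def make_per_packet(links=LINKS):
    """Per-packet: round-robin over the links, ignoring the packet contents."""
    counter = count()
    def pick(*_packet):
        return links[next(counter) % len(links)]
    return pick
```

With this model, repeated calls to `per_flow` with the same 5-tuple always return the same link, while the function returned by `make_per_packet` alternates between the links from one packet to the next.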

3.2 MPLS (Multiprotocol Label Switching)

Multiprotocol Label Switching, or MPLS, described in RFC 3031, is used to forward data packets efficiently in large networks such as the Internet. In plain IP routing, each router makes its own routing decision based on the information in the IP header. Because IP address space is scattered widely across the Internet, routers usually need very large routing tables. In addition, only a few fields of the IP header (namely the source and destination addresses and the TTL) are actually used for routing, so processing the full header at every hop adds unnecessary overhead. MPLS instead adds its own header, which encapsulates the original packet. Only the first router has to examine the IP header; it assigns the packet to a forwarding equivalence class, or FEC, and records it as a label in the new header. A FEC groups the destinations that are treated identically for routing decisions. Since most destinations can be combined into large blocks, the corresponding tables can be very small. Subsequent routers make their forwarding decisions based on the short, easily processed label in the MPLS header. Because the TTL value can be copied back and forth between the IP and MPLS headers, MPLS routers can also honor the TTL set in the original packet. Additionally, RFC 4950 defines ICMP extensions that let MPLS routers include label information in the ICMP errors they generate. Therefore, MPLS routers may provide basic support for traceroute.
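
The split between the ingress router, which classifies a packet once, and the transit routers, which only look at the label, can be sketched as follows. The tables, label values, router names, and function names are all hypothetical and chosen for illustration only.

```python
import ipaddress

# Hypothetical ingress table: FEC (a destination prefix) -> label to push.
FEC_TABLE = {
    ipaddress.ip_network("198.51.100.0/24"): 100,
    ipaddress.ip_network("203.0.113.0/24"): 200,
}

# Hypothetical transit table: incoming label -> (outgoing label, next hop).
LABEL_TABLE = {100: (110, "lsr-2"), 200: (210, "lsr-3")}

def ingress_label(dst: str) -> int:
    """Classify a destination into a FEC by longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in FEC_TABLE if addr in net]
    if not matches:
        raise LookupError(f"no FEC for {dst}")
    best = max(matches, key=lambda net: net.prefixlen)
    return FEC_TABLE[best]

def transit_forward(label: int, ttl: int):
    """Swap the label with one exact-match lookup and decrement the TTL."""
    out_label, next_hop = LABEL_TABLE[label]
    return out_label, next_hop, ttl - 1
```

Only `ingress_label` touches IP addresses. `transit_forward` never sees the IP header, which is also why a transit router on its own has little information about where an ICMP error should go.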

4 Challenges of Traceroute

The following sections describe the most common traceroute anomalies and their specific behavior. For each anomaly, a sample message flow is shown along with the corresponding output.

4.1 Anonymous Routing

This is the most basic anomaly, where one or more hops are missing from the traceroute output. It typically occurs when a router is firewalled or otherwise configured not to generate ICMP Time Exceeded errors. The following figure shows a sample message flow for this anomaly and the corresponding output. The three highlighted asterisks, which indicate that no response was received for the corresponding probes, are the key indicator of this anomaly.
[Figure: sample message flow and output with a missing hop]

4.2 Destination No Response

Another anomaly is a destination that is absent from the traceroute results. In this case, traceroute keeps probing until it reaches the maximum probe TTL or is stopped by another constraint, for example a limit on the number of consecutive unsuccessful attempts. A peculiar side effect of this anomaly is that an arbitrary number of hops may be missing from the end of the output. A common cause is a destination protected by a firewall. The image below shows an example of this anomaly. The output again contains the typical three asterisks for unsuccessful probes and continues until the maximum TTL is reached.
[Figure: sample message flow and output with a missing destination]
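
The two anomalies above can be told apart by where the asterisks appear in the output. The following is a minimal sketch, assuming the output has already been parsed into a list with `None` for every unanswered hop; the function name `classify_gaps` is made up for this example.

```python
from typing import List, Optional, Tuple

def classify_gaps(hops: List[Optional[str]]) -> Tuple[List[int], bool]:
    """Return (TTLs of anonymous hops, whether the destination looks missing).

    `hops[i]` is the address that answered the probe with TTL i + 1,
    or None where traceroute printed asterisks.
    """
    # Index of the last hop that answered, or -1 if nothing answered at all.
    last_answer = max(
        (i for i, hop in enumerate(hops) if hop is not None), default=-1
    )
    # Gaps followed by a later answer are anonymous routers.
    anonymous = [i + 1 for i in range(last_answer) if hops[i] is None]
    # Unanswered probes at the very end suggest a silent destination.
    destination_missing = last_answer < len(hops) - 1
    return anonymous, destination_missing
```

Trailing asterisks cannot reveal how many routers are hidden before the silent destination, which is the side effect mentioned above.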

4.3 Wrong RTT value

There are usually two causes for this anomaly: asymmetric packet paths or MPLS routing. When the paths to and from a hop are asymmetric, i.e. packets are routed over different paths in each direction, the round-trip time may not reflect the time the probe actually needs to reach that hop. The resulting round-trip times are therefore misleading: the forward path may be shorter or longer than the round-trip time suggests. The figure below shows such a scenario, with the corresponding times highlighted in the output. If the return path switches from a longer path to a shorter one, the RTT measured by traceroute can even decrease, i.e. the output shows a negative RTT increase for the last link.
[Figure: asymmetric paths and the resulting round-trip times]
Similar results occur on MPLS links, where the response packet must first travel to the end of the MPLS path before it returns to the sender of the probe. Since pure MPLS routers only know the next hop for a label, they cannot send ICMP errors straight back to the sender. Instead, they must forward the error along the path the original packet was taking. As a result, every response travels to the last MPLS hop first, so the round-trip time shown for each hop inside the MPLS path roughly reflects the round-trip time to the last MPLS router. The image below shows an example of this anomaly. Its characteristic hallmark is nearly equal round-trip times for several consecutive hops in the traceroute output, as highlighted below.
[Figure: MPLS path with nearly equal round-trip times]
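
Both symptoms described above, a negative RTT increase and a run of nearly equal RTTs, can be searched for in a list of per-hop round-trip times. The sketch below is only a heuristic; the function names and the default thresholds `tolerance_ms` and `min_len` are assumptions.

```python
def rtt_increases(rtts_ms):
    """Differences between consecutive per-hop RTTs; negative ones are suspicious."""
    return [round(b - a, 3) for a, b in zip(rtts_ms, rtts_ms[1:])]

def plateaus(rtts_ms, tolerance_ms=1.0, min_len=3):
    """Find runs of at least `min_len` hops whose RTTs stay within
    `tolerance_ms` of the first hop of the run (a possible MPLS tunnel).

    Returns a list of (first_ttl, last_ttl) pairs, 1-based and inclusive.
    """
    runs, start = [], 0
    for i in range(1, len(rtts_ms) + 1):
        run_ends = (
            i == len(rtts_ms)
            or abs(rtts_ms[i] - rtts_ms[start]) > tolerance_ms
        )
        if run_ends:
            if i - start >= min_len:
                runs.append((start + 1, i))
            start = i
    return runs
```

A flagged hop is only a hint: queueing delay and slow ICMP generation on a router can produce similar patterns without any asymmetric path or MPLS tunnel.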

4.4 Missing links

This anomaly means that links present in the actual topology are missing from the traceroute output. The usual cause is load balancing, in this case when all probes happen to be routed over one path. The image below shows an example of this anomaly. Another link should appear at the two highlighted lines in the output.
[Figure: load-balanced topology with a link missing from the output]

4.5 Incorrect links

In this case, the traceroute output shows a link between two hops that does not exist in the actual topology. It usually occurs on load-balanced links, when some probes are routed over one path and others over another. An example is shown in the figure below; the false link appears at the two highlighted lines in the output. This anomaly is a serious problem in modern networks, especially since it is not obvious to users who do not know the actual topology. It can therefore lead people to wrong conclusions about the network or the problem they are trying to solve.
[Figure: load-balanced topology with a false link in the output]
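
How a false link arises can be reproduced with a toy model of two equal-length paths behind a load balancer. The router names, the `choices` list (which path each probe takes), and both function names are assumptions made for this sketch.

```python
def inferred_links(paths, choices):
    """Links traceroute would report when the probe with TTL t follows
    paths[choices[t - 1]].

    Each path lists the routers after the source and ends at the
    destination; all paths are assumed to have the same length.
    """
    hops = [paths[choice][index] for index, choice in enumerate(choices)]
    return list(zip(hops, hops[1:]))

def real_links(paths):
    """All links that actually exist on any of the paths."""
    return {link for path in paths for link in zip(path, path[1:])}
```

For `paths = [["A", "C", "E"], ["B", "D", "E"]]` and `choices = [0, 1, 0]`, the inferred links are A-D and D-E, although A and D are not connected. With `choices = [0, 0, 0]` every probe takes the first path, so the links through B and D never appear in the output.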

4.6 Routing Loops

This is one of the more complex anomalies, where some hops are lost while others show up multiple times, i.e. packets appear to travel in loops or cycles. The most common cause is load balancing over paths of unequal length. Another cause can be an MPLS link where the address of the last MPLS router is used for ICMP errors, for example because intermediate routers lack IP addresses. A rare cause is a packet with a TTL of zero being forwarded to the next hop, for example by a faulty router. Loops typically only occur on load-balanced links where the difference in path length is greater than 1. The figure below shows a sample message flow. The two highlighted rows show hops that were probed twice. However, the same output may also be legitimate if there is an actual forwarding loop or cycle.
[Figure: sample message flow and output with an apparent loop]
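
The effect of unequal path lengths can be reproduced with a toy model in which each probe is assigned to one of two paths. The topology, the `choices` list, and the function names are assumptions for illustration.

```python
def observed_hops(paths, choices):
    """Address seen for each TTL when the probe with TTL t follows
    paths[choices[t - 1]].

    Each path lists the routers after the source and ends at the
    destination. A probe whose TTL exceeds the path length is answered
    by the destination.
    """
    hops = []
    for ttl, choice in enumerate(choices, start=1):
        path = paths[choice]
        hops.append(path[min(ttl, len(path)) - 1])
    return hops

def repeated_hops(hops):
    """Addresses that appear more than once, in order of their first repeat."""
    seen, repeats = set(), []
    for hop in hops:
        if hop in seen and hop not in repeats:
            repeats.append(hop)
        seen.add(hop)
    return repeats
```

With `paths = [["A", "B", "E", "F"], ["A", "C", "D", "G", "E", "F"]]` and `choices = [0, 0, 0, 1, 1, 1]`, router E is seen at TTL 3 on the short path and again at TTL 5 on the long path, while C and D never appear.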

4.7 Mesh path

Mesh paths are among the most complex anomalies, showing some extra links while others are missing. This anomaly only occurs when multiple probes are sent per hop. It is usually caused by load balancing, when some probes are forwarded on one path and some on others. The resulting traceroute output can be thoroughly confusing. The real topology and the traceroute result are shown in the figures below.
[Figure: real topology]
[Figure: topology inferred from the traceroute output]
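
One way to see where the extra links come from is to assume that a reader connects every address seen at one TTL with every address seen at the next TTL. That reading is an assumption of this sketch, as are the function name and the example topology.

```python
from itertools import product

def apparent_links(responders_per_ttl):
    """Links suggested by the output when several probes are sent per TTL.

    `responders_per_ttl[i]` is the set of addresses that answered probes
    with TTL i + 1. Every address at one TTL is paired with every address
    at the following TTL.
    """
    links = set()
    for here, following in zip(responders_per_ttl, responders_per_ttl[1:]):
        links |= set(product(here, following))
    return links
```

If the real paths are A-C-E and B-D-E, the responder sets `[{"A", "B"}, {"C", "D"}, {"E"}]` suggest six links, two of which (A-D and B-C) do not exist.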

5 Possible solutions

Below are several solutions for the anomalies presented above. In most cases, however, these solutions can only limit the impact of the anomalies.

5.1 Paris Traceroute

Paris traceroute was developed to correct most of the deficiencies found in classic traceroute, especially with regard to load balancing networks. The distinguishing feature of Paris traceroute is that it attempts to actively influence routing decisions in per-flow load-balanced links. It does this by carefully setting header fields in the probe packets it sends, which are taken into account by per-flow load balancing.

  • Related paper: Multipath tracing with Paris traceroute
  • Related tools: https://paris-traceroute.net/download/
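
The idea of keeping the flow identifier constant can be illustrated with a toy per-flow load balancer. Classic UDP traceroute changes the destination port with every probe, which changes the 5-tuple, whereas a Paris-style prober keeps the 5-tuple fixed and tells its probes apart through other header fields such as the UDP checksum. The addresses, ports, path names, and the CRC32 hash below are assumptions for illustration and do not reflect the actual Paris traceroute code.

```python
import zlib

PATHS = ["upper", "lower"]  # hypothetical parallel paths behind one balancer

def pick_path(src, dst, proto, sport, dport):
    """Per-flow load balancer model: hash the 5-tuple onto one path."""
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    return PATHS[zlib.crc32(key) % len(PATHS)]

def classic_probes(n, base_port=33434):
    """Classic style: the destination port changes with every probe."""
    return [
        ("192.0.2.1", "198.51.100.9", "udp", 50000, base_port + i)
        for i in range(n)
    ]

def paris_probes(n, dport=33434):
    """Paris style: every probe carries the same 5-tuple."""
    return [("192.0.2.1", "198.51.100.9", "udp", 50000, dport) for _ in range(n)]
```

Mapping `pick_path` over `paris_probes(8)` always yields a single path, whereas the probes from `classic_probes(8)` all have different flow identifiers and may therefore be spread over both paths.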

5.2 UDP and TCP probe packets

Modern variants of traceroute also support sending UDP or TCP probes instead of ICMP echo requests, as mentioned earlier. Since many routers and firewalls block ICMP echo requests, most modern traceroute implementations actually use UDP by default. Another advantage of UDP probes is that sending them does not require root privileges on Linux systems. TCP probing is usually reserved for specific circumstances, typically to bypass very restrictive firewalls or to traverse NAT gateways. The main argument against TCP is that a probe attempts to open a connection, which introduces state into the network. Also, more applications listen on TCP ports than on UDP ports, so a probe is more likely to hit an open port. In fact, most implementations use TCP port 80 by default to make it easier to traverse firewalls, and an additional TCP RST packet is then required to clear the pending connection. All in all, anonymous routing or missing destination anomalies may be mitigated by using UDP or TCP instead of ICMP echo requests.
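
Sending a TTL-limited UDP probe without special privileges can be sketched with the standard socket API. The function names are made up for this example, and port 33434 is simply the traditional traceroute base port. Receiving the resulting ICMP errors is not shown; it typically needs a raw socket or a platform-specific mechanism.

```python
import socket

def make_udp_probe_socket(ttl: int) -> socket.socket:
    """Create an ordinary UDP socket whose outgoing packets carry `ttl`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    return sock

def send_probe(dest: str, ttl: int, port: int = 33434) -> None:
    """Send one empty UDP datagram with the given TTL towards `dest`."""
    with make_udp_probe_socket(ttl) as sock:
        sock.sendto(b"", (dest, port))
```

A full prober would call `send_probe` with increasing TTL values and match each returning ICMP error to the probe that triggered it.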

5.3 AS number lookup

AS numbers can be looked up in databases such as the RIPE database to identify the network operator behind an IP address and to detect network boundaries. This information can then be used to contact an administrator in the event of a network failure. There is also a more recent algorithm that combines BGP information with data from multiple databases to produce more accurate results.
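
Detecting network boundaries from AS numbers can be sketched with a local lookup table. The table below is entirely hypothetical: it uses documentation prefixes and private-use AS numbers, whereas real data would come from a registry database or a BGP feed.

```python
import ipaddress

# Hypothetical prefix-to-AS table (documentation prefixes, private-use ASNs).
PREFIX_TO_AS = {
    ipaddress.ip_network("192.0.2.0/24"): 64512,
    ipaddress.ip_network("198.51.100.0/24"): 64513,
    ipaddress.ip_network("203.0.113.0/24"): 64514,
}

def lookup_as(addr: str):
    """Return the AS number of the most specific matching prefix, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in PREFIX_TO_AS if ip in net]
    if not matches:
        return None
    return PREFIX_TO_AS[max(matches, key=lambda net: net.prefixlen)]

def as_boundaries(hops):
    """Return (ttl, previous_as, new_as) wherever consecutive hops
    belong to different known ASes. `ttl` is the TTL of the later hop."""
    asns = [lookup_as(hop) for hop in hops]
    return [
        (index + 2, a, b)
        for index, (a, b) in enumerate(zip(asns, asns[1:]))
        if a is not None and b is not None and a != b
    ]
```

Each reported boundary marks the first hop inside a different operator's network, which is a natural place to look when a path degrades between two providers.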

5.4 MPLS label decoding

The MPLS label stack, which identifies the FEC, is returned in an extended ICMP error packet as defined in RFC 4950. Decoding it makes MPLS-related problems easier to diagnose and allows a more accurate interpretation of traceroute output. This is especially useful when MPLS-related anomalies are suspected, such as incorrect round-trip times or loops caused by MPLS routing.
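
Decoding one label stack entry is a matter of splitting a 32-bit word into a 20-bit label, a 3-bit traffic class, a 1-bit bottom-of-stack flag, and an 8-bit TTL. The sketch below decodes a single entry only; locating the entry inside the ICMP extension structure is not shown, and the names `LabelStackEntry` and `decode_label_stack_entry` are made up for this example.

```python
import struct
from typing import NamedTuple

class LabelStackEntry(NamedTuple):
    label: int
    traffic_class: int
    bottom_of_stack: bool
    ttl: int

def decode_label_stack_entry(data: bytes) -> LabelStackEntry:
    """Decode the first 4 bytes of `data` as one MPLS label stack entry."""
    (word,) = struct.unpack("!I", data[:4])     # network byte order
    return LabelStackEntry(
        label=word >> 12,                       # top 20 bits
        traffic_class=(word >> 9) & 0x7,        # next 3 bits
        bottom_of_stack=bool((word >> 8) & 0x1),  # 1 bit
        ttl=word & 0xFF,                        # low 8 bits
    )
```

For the bytes `00 01 01 01` this yields label 16, traffic class 0, the bottom-of-stack flag set, and a TTL of 1.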

5.5 Reverse traceroute

Reverse traceroute is used to trace the path a packet takes from a remote source to the local host. Examining the return path makes it possible to diagnose additional network problems and simplifies the interpretation of existing output. This is especially important for anomalies related to asymmetric paths. There is a proposal to do this without interaction from the target system. However, it relies on the IP Record Route (RR) option, which records the addresses of the routers a packet passes through, and uses it to record the return path of each packet. The RR option was invented as an alternative to traceroute, but since it requires cooperation from individual routers, it was generally unsupported and abandoned. It can also only record 8 hops across both directions, which is why the method requires multiple hosts, so-called "vantage points", between the target and the local host. This approach is therefore not feasible for users who lack the necessary resources. Finally, the source IP address of the probes must be forged to redirect the response to such a host, and forged source addresses are blocked by most routers.

6 References

Jobst M E. Traceroute anomalies[J]. Network, 2012, 9.

Origin blog.csdn.net/airenKKK/article/details/130249992