On-chip network (2) topology

Preface

  The topology of an on-chip network determines the physical layout of, and the connections between, the nodes and channels in the network. Topology has a major impact on the overall cost-performance of the network. It determines the number of hops (routers) a message passes through, as well as the physical length of the interconnect wires traversed on each hop, and therefore has a significant impact on network latency. Because messages consume energy when passing through routers and links, the effect of topology on hop count is directly reflected in the power consumption of the network. Additionally, the topology determines the total number of available paths between nodes, which affects the network's ability to spread traffic and meet bandwidth demands. The implementation complexity of a topology depends on two factors: the number of links at each node (i.e., the node degree) and the ease of laying out the topology on a chip (i.e., the wire lengths and the number of metal routing layers required).

  A bus is one of the simplest topologies that uses a shared channel to connect a group of components. All components on the bus can observe every message on the bus. Therefore, the bus is an efficient way of broadcasting. However, the scalability of the bus is very limited as the continued addition of components will cause the shared channel to become saturated.

  In this chapter, we focus on switched topologies. In a switched topology, a set of components is connected through routers and links. We first describe several metrics that help build intuition when comparing topologies. Next, we describe several topologies commonly used in on-chip networks and compare them using these metrics.

1. Indicators

  When designing an on-chip network, the topology needs to be considered first. Therefore, there is a need for methods that can quickly compare different topologies before the other parts of the on-chip network, such as routing, flow control, and microarchitecture, are determined. The figure below shows three topologies commonly used in on-chip networks (ring, mesh and torus structures). In the ring structure, all nodes form a one-dimensional ring. In the two-dimensional mesh structure (2D mesh), the nodes form a regular two-dimensional grid, and each node is connected to its neighboring nodes. The torus structure is a further refinement of the mesh structure: it connects the corresponding nodes on opposite edges of the network so that each dimension of the mesh forms a loop.

1.1 Indicators not related to network traffic

1. Degree

  The degree of a topology is the number of links each node has. Each node in the ring structure has two links, so its degree is 2; each node in the torus structure has links to 4 adjacent nodes, so its degree is 4. In the mesh structure, the degree of a node depends on its position. Degree is an effective measure of network overhead: the greater the degree of a node, the more ports its router requires, and the implementation complexity, area and power consumption of the router increase accordingly. The number of ports on each router is called the router radix.
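The degree figures above can be checked with a small sketch. It assumes the 9-node examples (a 9-node ring and 3×3 mesh and torus) that the other numbers in this section imply; the helper names `ring_links`, `grid_links` and `degrees` are made up for illustration.

```python
from collections import Counter

def ring_links(n):
    """Undirected links of an n-node ring (hypothetical helper, n >= 3)."""
    return {frozenset((i, (i + 1) % n)) for i in range(n)}

def grid_links(k, wrap):
    """Undirected links of a k x k mesh (wrap=False) or torus (wrap=True), k >= 3."""
    links = set()
    for x in range(k):
        for y in range(k):
            for dx, dy in ((1, 0), (0, 1)):
                nx, ny = x + dx, y + dy
                if wrap:
                    nx, ny = nx % k, ny % k
                if nx < k and ny < k:
                    links.add(frozenset(((x, y), (nx, ny))))
    return links

def degrees(links):
    """Map each node to the number of links attached to it."""
    count = Counter()
    for link in links:
        for node in link:
            count[node] += 1
    return dict(count)

ring_deg = degrees(ring_links(9))
mesh_deg = degrees(grid_links(3, wrap=False))
torus_deg = degrees(grid_links(3, wrap=True))
print(set(ring_deg.values()), sorted(set(mesh_deg.values())), set(torus_deg.values()))
```

In the mesh, corner nodes have degree 2, edge nodes 3, and the center node 4, which is why the text says the mesh degree varies with position.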

2. Bisection bandwidth

  Bisection bandwidth is the communication bandwidth between the two halves of a network after it has been divided into two equal parts. For example, in the figure above, 2 links cross the bisection line of the ring structure, 3 links cross the bisection line of the mesh structure, and 6 links cross the bisection line of the torus structure. This metric effectively reflects the worst-case performance of a particular network, because bisection bandwidth limits the total amount of data that can be moved from one side of the system to the other. It also indicates the number of global wires needed to implement an on-chip network, so it can be used to measure network overhead as well. Bisection bandwidth is a less useful metric for on-chip networks than for off-chip networks, because global wiring resources on a chip are far more plentiful than the pin bandwidth available for chip-to-chip communication.
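As a rough check of the link counts, the sketch below counts the links that cross a given cut. The cuts used here are assumptions about where the figure draws its bisection lines: the first column versus the rest for the 3×3 mesh and torus, and four consecutive nodes versus the other five for the 9-node ring. All function names are hypothetical.

```python
def ring_links(n):
    """Links of an n-node ring as (a, b) pairs (n >= 3)."""
    return [(i, (i + 1) % n) for i in range(n)]

def grid_links(k, wrap):
    """Links of a k x k mesh (wrap=False) or torus (wrap=True), k >= 3."""
    links = []
    for x in range(k):
        for y in range(k):
            if wrap or x + 1 < k:
                links.append(((x, y), ((x + 1) % k, y)))
            if wrap or y + 1 < k:
                links.append(((x, y), (x, (y + 1) % k)))
    return links

def links_across(links, side_a):
    """Count links with exactly one endpoint inside side_a."""
    return sum(1 for a, b in links if (a in side_a) != (b in side_a))

first_column = {(0, y) for y in range(3)}   # assumed cut for the 3x3 grids
ring_cut = links_across(ring_links(9), {0, 1, 2, 3})
mesh_cut = links_across(grid_links(3, wrap=False), first_column)
torus_cut = links_across(grid_links(3, wrap=True), first_column)
print(ring_cut, mesh_cut, torus_cut)
```

The torus count is twice the mesh count because every row contributes one ordinary link and one wraparound link across the cut.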

3. Network diameter

  The network diameter refers to the maximum value in the set of shortest paths between any two nodes in the topology (that is, the maximum number of hops in the shortest path for all source-destination node pairs in the network). For example, the network diameter of ring and mesh structures is 4, and the network diameter of torus structure is 2. In the absence of link contention, network diameter can be used as a measure of the maximum latency in a topology.
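The diameter values quoted above can be reproduced with a breadth-first search from every node. This is a minimal sketch that assumes the 9-node ring and 3×3 mesh and torus examples; the helper names are illustrative, not taken from any library.

```python
from collections import deque

def ring_adjacency(n):
    """Adjacency lists of an n-node bidirectional ring."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def grid_adjacency(k, wrap):
    """Adjacency lists of a k x k mesh (wrap=False) or torus (wrap=True)."""
    adj = {}
    for x in range(k):
        for y in range(k):
            nbrs = []
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if wrap:
                    nbrs.append((nx % k, ny % k))
                elif 0 <= nx < k and 0 <= ny < k:
                    nbrs.append((nx, ny))
            adj[(x, y)] = nbrs
    return adj

def hop_counts(adj, src):
    """Shortest-path hop count from src to every reachable node (BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def diameter(adj):
    """Largest shortest-path hop count over all source-destination pairs."""
    return max(max(hop_counts(adj, src).values()) for src in adj)

print(diameter(ring_adjacency(9)),
      diameter(grid_adjacency(3, wrap=False)),
      diameter(grid_adjacency(3, wrap=True)))
```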

1.2 Indicators related to network traffic

1. Hop count

  Routing from one node to the next node is called a hop. The hop count is the total number of hops required for a message to be routed from the source node to the destination node, or equivalently the total number of links the message passes through. It is a simple and effective indicator of network latency, because even when there is no contention in the network, every node and link still introduces transmission delay. The maximum hop count in a network equals its diameter. In addition to the maximum hop count, the average hop count is also a good indicator of network latency; it is the average hop count of the paths between all possible source-destination node pairs in the network.

  When the three topologies (ring, mesh and torus) have the same number of nodes and the network traffic is uniform random (each node is equally likely to send a message to any node), the ring structure requires more hops than the mesh and torus structures. For example, assuming that all links are bidirectional and shortest-path routing is used, the maximum hop count for both the ring and mesh structures is 4, while the maximum hop count for the torus structure is 2. The torus structure also has the smallest average hop count, at 1¹/₃; the average hop count of the mesh structure is slightly larger, at 1⁷/₉; and the average hop count of the ring structure is the largest of the three, at 2²/₉.
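The three averages can be reproduced exactly with rational arithmetic. The sketch below assumes nine nodes per topology and averages over all ordered source-destination pairs, including a node sending to itself, which is the convention that yields the fractions quoted above. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def ring_dist(a, b, n):
    """Shortest hop count between positions a and b on an n-node ring."""
    d = abs(a - b)
    return min(d, n - d)

def average_hops(nodes, dist):
    """Mean shortest-path hop count over all ordered pairs, self-pairs included."""
    total = sum(dist(s, d) for s in nodes for d in nodes)
    return Fraction(total, len(nodes) ** 2)

k = 3
grid = list(product(range(k), repeat=2))   # the nine (x, y) positions

ring_avg = average_hops(list(range(9)), lambda a, b: ring_dist(a, b, 9))
mesh_avg = average_hops(grid, lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1]))
torus_avg = average_hops(
    grid, lambda a, b: ring_dist(a[0], b[0], k) + ring_dist(a[1], b[1], k))

print(ring_avg, mesh_avg, torus_avg)   # 20/9 16/9 4/3
```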

2. Maximum channel load

  This metric can be used to estimate the maximum bandwidth a network can support, that is, the maximum number of bits per second that each node can inject before the network saturates. The maximum channel load and the maximum injection bandwidth are related as follows.

Maximum injection bandwidth = 1/maximum channel load

  Intuitively, to estimate the maximum bandwidth a network can support, one first needs to determine which link in the network is the most congested under a specific traffic pattern. This link is the bottleneck channel, and its load is the maximum channel load. As shown in the figure below, the bottleneck channel is the channel connecting the two ring structures.

  Assume the bottleneck channel is a bidirectional link. Under ideal routing with uniform random traffic, half of the traffic injected by each node stays within its own ring, while the other half crosses the bottleneck channel to reach a node in the other ring. For example, each packet sent by node A has a 1/8 probability of being destined for each of nodes B, C, D, E, F, G, H or A itself. When the destination is node A, B, C, or D, the packet does not cross the bottleneck channel; when the destination is node E, F, G, or H, it must cross it. So 1/2 of the bandwidth injected by node A goes through the bottleneck channel, and the same holds for every other node. Four nodes on each side each send 1/2 of their traffic across, so the load on the bottleneck channel is 4 × 1/2 = 2. Correspondingly, the maximum injection bandwidth per node is 1/2: when the injection bandwidth of each node reaches 1/2, the network saturates. Adding more nodes to the two rings further increases the channel load and thereby reduces network bandwidth.
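The arithmetic in this example can be written out as a short sketch. It assumes two rings of equal size joined by one bidirectional link and uniform random traffic in which a node may also address itself; `bottleneck_load` is a made-up helper name.

```python
from fractions import Fraction

def bottleneck_load(nodes_per_ring):
    """Load on one direction of the link joining two equal rings,
    in units of the per-node injection rate, under uniform random traffic."""
    total_nodes = 2 * nodes_per_ring
    # Fraction of each node's traffic whose destination is in the other ring.
    crossing_fraction = Fraction(nodes_per_ring, total_nodes)
    # Every node on one side sends that fraction across the link.
    return nodes_per_ring * crossing_fraction

load = bottleneck_load(4)      # nodes A-D on one side, E-H on the other
max_injection = 1 / load       # maximum injection bandwidth = 1 / max channel load
print(load, max_injection)     # 2 1/2
```

With six nodes per ring the same function gives a load of 3, which illustrates the last point of the paragraph: larger rings raise the channel load and lower the sustainable injection rate.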

3. Path diversity

  Given a source node and a destination node, if this node pair has multiple shortest paths in one topology and only one shortest path in another, the former topology is said to have greater path diversity. Path diversity gives routing algorithms more flexibility in balancing load, which increases network throughput by reducing channel load. Path diversity also gives packets the potential to route around faults in the network. The ring structure in the figure does not provide path diversity, because there is only one shortest path between any pair of nodes.

  As shown in (a) above, if a packet travels clockwise between nodes A and B, it takes 4 hops; if it travels counterclockwise, it takes 5 hops. More paths can be provided only at the cost of a longer distance traveled. If the number of nodes in a ring is even, two nodes on opposite sides of the ring have two shortest paths between them, so their path diversity is 2. The mesh and torus structures in Figures (b) and (c) offer more shortest paths between source-destination node pairs. In the mesh structure of Figure (b), there are 6 different shortest paths between nodes A and B, each 4 hops long.
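The path counts above follow from simple counting, sketched below. The positions of A and B are assumptions: on the mesh they are taken to be 2 hops apart in each dimension (for example opposite corners of a 3×3 mesh), and on the ring they are taken to be 4 positions apart. The function names are illustrative.

```python
from math import comb

def mesh_shortest_paths(src, dst):
    """Number of distinct shortest paths between two nodes of a 2D mesh.

    A shortest path makes dx moves in x and dy moves in y in some order,
    so the count is the binomial coefficient C(dx + dy, dx)."""
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    return comb(dx + dy, dx)

def ring_shortest_paths(a, b, n):
    """Number of shortest paths between two nodes of an n-node ring."""
    d = abs(a - b) % n
    # Two equally short directions exist only for nodes exactly half-way apart.
    return 2 if d != 0 and 2 * d == n else 1

print(mesh_shortest_paths((0, 0), (2, 2)))   # 6
print(ring_shortest_paths(0, 4, 9))          # 1 (4 hops one way, 5 the other)
print(ring_shortest_paths(0, 4, 8))          # 2 (even ring, opposite nodes)
```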

2. Direct connection topology: ring, mesh and torus

  A direct topology is a network topology in which each terminal node (such as a processor core or cache in a chip multiprocessor) is equipped with a router; every router is both a source and sink of traffic and a switch for traffic passing through. So far, most on-chip network designs have used direct networks, because co-locating routers with terminal nodes is usually a good fit for area-constrained environments such as chips.

  A direct topology can be described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4×4 mesh or torus structure can be described as a 4-ary 2-cube: it has 2 dimensions, 4 nodes in each dimension, and 4×4=16 nodes in total. An 8×8 mesh or torus structure has 64 nodes and can be described as an 8-ary 2-cube. Similarly, a 4×4×4 mesh or torus structure has 64 nodes and can be described as a 4-ary 3-cube. This notation assumes the same number of nodes in each dimension, so the total number of nodes in the network is kⁿ. In practice, the vast majority of on-chip networks use a 2D mesh topology because it maps easily onto a planar substrate, whereas more complex topologies require more metal wiring layers. This differs from off-chip networks, whose connections are not confined to a plane and can be three-dimensional. In each dimension, k nodes are connected to their nearest neighbors through channels. Since the ring structure can be described as a k-ary 1-cube, it can also be classified as a torus.
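The k-ary n-cube notation maps directly onto coordinate tuples. The sketch below is an illustration only: it represents a node as a tuple of n coordinates in the range 0 to k−1 and lists its nearest neighbors with wraparound (the torus case). `node_count` and `torus_neighbors` are hypothetical names.

```python
def node_count(k, n):
    """Total number of nodes in a k-ary n-cube."""
    return k ** n

def torus_neighbors(node, k):
    """Nearest neighbors of a node in a k-ary n-cube with wraparound links.

    node is a tuple with one coordinate per dimension."""
    neighbors = []
    for dim in range(len(node)):
        for step in (-1, 1):
            nxt = list(node)
            nxt[dim] = (nxt[dim] + step) % k
            neighbors.append(tuple(nxt))
    return neighbors

print(node_count(4, 2), node_count(8, 2), node_count(4, 3))   # 16 64 64
print(torus_neighbors((0, 0), 4))   # [(3, 0), (1, 0), (0, 3), (0, 1)]
print(torus_neighbors((0,), 8))     # a ring is a k-ary 1-cube: [(7,), (1,)]
```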

  In a torus structure, all nodes have the same degree; in a mesh structure, nodes at the edge of the network have a smaller degree than nodes in the center. The torus structure is edge-symmetric, while the mesh structure is not. This property helps the torus structure balance traffic across its channels. Because the mesh structure lacks edge symmetry, channels in the center of the network carry a higher load than channels at the edge.

  In a torus network, each node requires two channels in each dimension, for a total of 2n channels per node. Therefore, the degree of a 2D torus network is 4 and the degree of a 3D torus network is 6. A mesh network has the same degree as a torus network, although some ports at the edge of the network go unused. The average minimum hop count of a torus network is the mean of the shortest-path hop counts between all possible pairs of nodes in the network: H_avg = nk/4 for even k, and H_avg = n(k/4 − 1/(4k)) for odd k.

  Removing the wraparound links turns a torus network into a mesh network. The average minimum hop count of the mesh network is slightly larger than that of the torus network: H_avg = n(k/3 − 1/(3k)), which is approximately nk/3 for large k.
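As a sanity check, the average minimum hop count of small k-ary n-cubes can be computed by brute force and compared with closed-form expressions. The closed forms used below are assumptions stated for this sketch: nk/4 (even k) or n(k/4 − 1/(4k)) (odd k) for the torus, and n(k/3 − 1/(3k)) for the mesh, with the average taken over all ordered node pairs including self-pairs. For k = 3 and n = 2 they reproduce the 1¹/₃ and 1⁷/₉ figures from section 1.2. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def brute_force_avg(k, n, wrap):
    """Average shortest-path hop count over all ordered node pairs."""
    nodes = list(product(range(k), repeat=n))

    def dist(a, b):
        total = 0
        for x, y in zip(a, b):
            d = abs(x - y)
            total += min(d, k - d) if wrap else d
        return total

    return Fraction(sum(dist(a, b) for a in nodes for b in nodes),
                    len(nodes) ** 2)

def torus_avg(k, n):
    """Assumed closed form for the torus."""
    per_dim = Fraction(k, 4) if k % 2 == 0 else Fraction(k * k - 1, 4 * k)
    return n * per_dim

def mesh_avg(k, n):
    """Assumed closed form for the mesh: n * (k/3 - 1/(3k))."""
    return n * Fraction(k * k - 1, 3 * k)

for k, n in ((3, 2), (4, 2), (5, 2), (4, 3)):
    print(k, n, brute_force_avg(k, n, True), brute_force_avg(k, n, False))
```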

  For a torus network with even k under uniform random traffic, the maximum channel load across the bisection is k/8, which limits the maximum injection bandwidth of the network to 8/k flits/node/cycle. For a mesh network, the maximum channel load increases to k/4, so the maximum injection bandwidth is reduced to 4/k flits/node/cycle.
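These two limits are easy to tabulate. The sketch below simply encodes the k/8 and k/4 channel loads quoted above and inverts them; it restricts k to even values because that is the case the text states, and the function names are invented for illustration.

```python
from fractions import Fraction

def max_channel_load(k, topology):
    """Maximum channel load under uniform random traffic (even k only)."""
    if k % 2 != 0:
        raise ValueError("the quoted loads are stated for even k")
    if topology == "torus":
        return Fraction(k, 8)
    if topology == "mesh":
        return Fraction(k, 4)
    raise ValueError(f"unknown topology: {topology}")

def max_injection_bandwidth(k, topology):
    """Maximum injection bandwidth in flits/node/cycle = 1 / max channel load."""
    return 1 / max_channel_load(k, topology)

for k in (4, 8, 16):
    print(k, max_injection_bandwidth(k, "torus"), max_injection_bandwidth(k, "mesh"))
```

For k = 8, for example, this gives 1 flit/node/cycle for the torus and 1/2 flit/node/cycle for the mesh, so the wraparound links double the sustainable injection rate.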

  Compared with ring networks, both mesh and torus networks have path diversity. As the network dimension increases, its path diversity also increases.

3. Indirect topology: crossbar switch, butterfly network, Clos network and fat tree network

  An indirect topology is a network topology in which all terminal nodes are connected through one or more intermediate switching nodes. Unlike a direct topology, where each node is both a terminal node and a switching node, in an indirect topology the terminal nodes and switching nodes are separate. Only a terminal node can be a source of traffic (sending new traffic) or a destination (consuming incoming traffic without forwarding it); an intermediate switching node only forwards traffic from an input to the specified output.

3.1 Crossbar switch

  The simplest indirect topology is a crossbar. A crossbar connects n inputs and m outputs through n×m simple switching nodes. The n inputs and m outputs are fully connected: each input is connected to all m outputs. This structure is non-blocking because an input can always be connected to any output that is not already in use.

3.2 Butterfly network

  The butterfly network is a typical example of an indirect topology and can be described as a k-ary n-fly. The network has kⁿ terminal nodes, such as processor cores or memories, and n stages of switching nodes, with each stage containing kⁿ⁻¹ k×k switching nodes. In other words, k is the in-degree/out-degree of each switching node, and n is the number of stages. The figure below shows a 2-ary 3-fly butterfly network. The round nodes are terminal nodes, and the square nodes are switching nodes.
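The size of a butterfly network follows directly from k and n. As a small illustration (the function name and dictionary keys are made up for this sketch), a k-ary n-fly has kⁿ terminal nodes and n stages of kⁿ⁻¹ switching nodes each:

```python
def butterfly_shape(k, n):
    """Component counts of a k-ary n-fly butterfly network."""
    switches_per_stage = k ** (n - 1)
    return {
        "terminals": k ** n,
        "stages": n,
        "switches_per_stage": switches_per_stage,
        "total_switches": n * switches_per_stage,
        "switch_ports_in_out": (k, k),   # each switching node is k x k
    }

shape = butterfly_shape(2, 3)   # the 2-ary 3-fly example
print(shape)
```

For the 2-ary 3-fly this gives 8 terminal nodes and 3 stages of four 2×2 switching nodes, 12 switching nodes in total.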

The flattened butterfly network is derived from the butterfly network as follows:

3.3 Fat tree network

  A fat tree is logically a binary tree structure, as shown in the figure below. A fat tree network can be obtained by folding a Clos network.

3.4 Irregular topology

  Multiprocessor system-on-chip (MPSoC) designs can combine a variety of heterogeneous IP modules. For such designs, a customized topology can reduce the number of switching nodes and links to match the specific requirements, which can significantly reduce power consumption and area.

        Customized topologies place certain restrictions on network connectivity.

Note: ARM's CMN-700 uses a mesh network.

Origin blog.csdn.net/m0_52840978/article/details/133001377