PCIe Primer: PCIe Error Detection Mechanisms

Reprinted from: http://blog.chinaaet.com/justlxy/p/5100057784

PCIe bus error detection covers errors on the link (Link) and errors in packets during transmission, as shown in the figure below. Errors in the user's application-layer design are not link transmission errors and should not be handled by the PCIe error detection and handling mechanism. They can instead be reported and handled through a Device Specific Interrupt or another suitable method.

image.png

Errors in packets during transmission are detected mainly by CRC. PCIe defines two CRCs: LCRC and ECRC. The LCRC (Link CRC) is generated and checked by the Data Link Layer, and detects whether a TLP was corrupted on its way from the Data Link Layer at one end of a link to the Data Link Layer at the other end. The ECRC (End-to-end CRC) is generated and checked by the Transaction Layer, and it is optional.
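The generate-and-check pattern described above can be sketched roughly as follows. This is only an illustration: it uses Python's `zlib.crc32` as a stand-in 32-bit CRC and does not reproduce the bit ordering or field coverage that PCIe specifies for the LCRC or ECRC. The function names and the sample bytes are made up for the example.

```python
import zlib

def append_crc(tlp: bytes) -> bytes:
    """Sender side: append a 32-bit CRC computed over the TLP bytes."""
    crc = zlib.crc32(tlp) & 0xFFFFFFFF
    return tlp + crc.to_bytes(4, "little")

def check_crc(frame: bytes) -> bool:
    """Receiver side: recompute the CRC and compare it with the received one."""
    tlp, received = frame[:-4], int.from_bytes(frame[-4:], "little")
    return (zlib.crc32(tlp) & 0xFFFFFFFF) == received

tlp = b"example TLP bytes"            # placeholder bytes, not a real TLP
frame = append_crc(tlp)
# Flip one bit to imitate corruption on the wire.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]

print(check_crc(frame))      # True
print(check_crc(corrupted))  # False
```

A 32-bit CRC always catches a single flipped bit, which is why the second check fails.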

Some may question whether ECRC is needed at all: the LCRC already provides a CRC check on every TLP, so an extra layer of checking might seem redundant. In brief, in a simple PCIe system with no Switch, ECRC really is unnecessary. ECRC mainly addresses errors that can occur inside a Switch while it forwards a TLP. In other words, if the user's design contains no Switch (the Root and the Endpoint are connected directly), ECRC can be left unused.

As shown in the figure below, assume that a TLP from the Endpoint arrives intact at the Switch's Downstream Port, which acts as the Ingress Port here. The Data Link Layer of that port checks the LCRC and finds no error. The Switch then strips the LCRC, assigns a new Sequence Number, recalculates the LCRC, and sends the TLP out of its Upstream Port, which acts as the Egress Port. The TLP is clearly unprotected during this step: if the data is corrupted inside the Switch, the new LCRC is calculated over data that is already damaged, and the LCRC alone cannot reveal the error.

Note: For details on the Sequence Number, see the earlier article on Ack/Nak.

image.png
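The unprotected window inside the Switch can be imitated with a small simulation. It is a simplified sketch under several assumptions: `zlib.crc32` stands in for both the LCRC and the ECRC, the ECRC here covers the whole payload (the real ECRC treats certain header bits specially), and every function name and the frame layout are hypothetical.

```python
import zlib

def crc32(data: bytes) -> bytes:
    return (zlib.crc32(data) & 0xFFFFFFFF).to_bytes(4, "little")

def endpoint_send(payload: bytes, seq: int) -> bytes:
    """Transaction Layer appends the ECRC; Data Link Layer adds Sequence Number and LCRC."""
    tlp = payload + crc32(payload)            # ECRC travels end to end
    framed = seq.to_bytes(2, "big") + tlp
    return framed + crc32(framed)             # LCRC protects one link only

def link_receive(frame: bytes) -> bytes:
    """Data Link Layer at a receiving port: check the LCRC, strip it and the Sequence Number."""
    framed, lcrc = frame[:-4], frame[-4:]
    if crc32(framed) != lcrc:
        raise ValueError("LCRC check failed")
    return framed[2:]                         # TLP including its ECRC

def switch_forward(frame: bytes, new_seq: int, corrupt: bool) -> bytes:
    """Switch: check ingress LCRC, optionally corrupt the TLP internally, then re-frame it."""
    tlp = bytearray(link_receive(frame))
    if corrupt:
        tlp[0] ^= 0x80                        # bit flip while the TLP is unprotected
    framed = new_seq.to_bytes(2, "big") + bytes(tlp)
    return framed + crc32(framed)             # fresh LCRC over already-damaged data

def ecrc_ok(tlp: bytes) -> bool:
    return crc32(tlp[:-4]) == tlp[-4:]

frame = endpoint_send(b"memory write data", seq=7)
damaged = switch_forward(frame, new_seq=42, corrupt=True)
tlp_at_root = link_receive(damaged)           # LCRC check passes, no exception
print(ecrc_ok(tlp_at_root))                   # False: only the ECRC exposes the error
```

The egress LCRC is valid because it was computed after the damage occurred, so only the end-to-end check fails.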

Note that ECRC is part of AER (Advanced Error Reporting), so a PCIe device must support AER in order to use ECRC.
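As a rough illustration of this dependency, the ECRC capability and enable bits live in the AER Advanced Error Capabilities and Control register (offset 0x18 within the AER extended capability). The bit positions below follow the PCIe Base Specification and the Linux `pci_regs.h` definitions as best recalled, and should be checked against the specification. The helper names and the sample register values are hypothetical, and actually reading or writing configuration space is outside this sketch.

```python
# Bit positions in the AER "Advanced Error Capabilities and Control" register.
FIRST_ERROR_POINTER_MASK = 0x1F
ECRC_GEN_CAPABLE = 1 << 5
ECRC_GEN_ENABLE = 1 << 6
ECRC_CHECK_CAPABLE = 1 << 7
ECRC_CHECK_ENABLE = 1 << 8

def decode_ecrc_bits(reg: int) -> dict:
    """Break a raw register value into its ECRC-related fields."""
    return {
        "first_error_pointer": reg & FIRST_ERROR_POINTER_MASK,
        "gen_capable": bool(reg & ECRC_GEN_CAPABLE),
        "gen_enable": bool(reg & ECRC_GEN_ENABLE),
        "check_capable": bool(reg & ECRC_CHECK_CAPABLE),
        "check_enable": bool(reg & ECRC_CHECK_ENABLE),
    }

def enable_ecrc(reg: int) -> int:
    """Return the register value with ECRC enabled wherever the device is capable."""
    if reg & ECRC_GEN_CAPABLE:
        reg |= ECRC_GEN_ENABLE
    if reg & ECRC_CHECK_CAPABLE:
        reg |= ECRC_CHECK_ENABLE
    return reg

sample = 0xA0                      # invented value: generation and check capable, nothing enabled
print(hex(enable_ecrc(sample)))    # 0x1e0
```

ECRC is only useful when the generating and the checking device both enable it, so in practice the enable bits are set on both ends of the path.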

Classified by the layer (Layer) in which they occur, errors can be divided into Physical Layer errors, Data Link Layer errors and Transaction Layer errors.

Physical Layer Errors include:

· 8b/10b encoding/decoding errors

· Framing errors (optional with 8b/10b encoding, mandatory with 128b/130b)

· Elastic Buffer errors (optional)

· Loss of Symbol Lock or loss of Lane Deskew (optional)

Data Link Layer Errors include:

· LCRC check failure

· Sequence Number errors

· 16-bit CRC check failure in a DLLP

· Link Layer Protocol Errors

Transaction Layer Errors include:

· ECRC check failure (optional)

· Malformed TLP (that is, a TLP whose format is invalid)

· Flow Control Protocol Violation

· Unsupported Request

· Data Corruption (also known as Poisoned Packet)

· Completer Abort (optional)

· Receiver Overflow (optional)

· Completion Timeout

· Unexpected Completion (that is, a Completion that does not match any Request that was issued)
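The three lists above can be kept as a small lookup table, for example when tagging logged errors by layer. This is just one possible arrangement: the labels are shorthand for the list entries, and `ERROR_LAYER`, `OPTIONAL` and `errors_in` are names invented for this sketch.

```python
from enum import Enum

class Layer(Enum):
    PHYSICAL = "Physical Layer"
    DATA_LINK = "Data Link Layer"
    TRANSACTION = "Transaction Layer"

ERROR_LAYER = {
    "8b/10b encoding/decoding error": Layer.PHYSICAL,
    "Framing error": Layer.PHYSICAL,  # optional with 8b/10b, mandatory with 128b/130b
    "Elastic Buffer error": Layer.PHYSICAL,
    "Loss of Symbol Lock or Lane Deskew": Layer.PHYSICAL,
    "LCRC check failure": Layer.DATA_LINK,
    "Sequence Number error": Layer.DATA_LINK,
    "DLLP 16-bit CRC check failure": Layer.DATA_LINK,
    "Link Layer Protocol Error": Layer.DATA_LINK,
    "ECRC check failure": Layer.TRANSACTION,
    "Malformed TLP": Layer.TRANSACTION,
    "Flow Control Protocol Violation": Layer.TRANSACTION,
    "Unsupported Request": Layer.TRANSACTION,
    "Data Corruption (Poisoned Packet)": Layer.TRANSACTION,
    "Completer Abort": Layer.TRANSACTION,
    "Receiver Overflow": Layer.TRANSACTION,
    "Completion Timeout": Layer.TRANSACTION,
    "Unexpected Completion": Layer.TRANSACTION,
}

# Checks that the lists above mark as unconditionally optional.
OPTIONAL = {
    "Elastic Buffer error",
    "Loss of Symbol Lock or Lane Deskew",
    "ECRC check failure",
    "Completer Abort",
    "Receiver Overflow",
}

def errors_in(layer: Layer) -> list:
    """Return the error names that belong to one layer, in list order."""
    return [name for name, owner in ERROR_LAYER.items() if owner is layer]

print(errors_in(Layer.DATA_LINK))
```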

When the Physical Layer at the receiving end detects an error in a TLP, passing that TLP further up would only cause the Data Link Layer and the Transaction Layer to detect errors as well, and too many reported errors make error analysis and handling difficult. There is therefore no need to pass the TLP upward. It is simply discarded and the error is reported.
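The discard-at-the-first-failing-layer behaviour can be sketched as a bottom-up chain of checks. The three check functions are toy stand-ins with invented rules (a start marker, an end marker, a minimum length). They only show the control flow and are not real PCIe checks.

```python
from typing import Callable, List, Optional, Tuple

Check = Callable[[bytes], Optional[str]]

def receive(frame: bytes, layers: List[Tuple[str, Check]]) -> Tuple[bool, List[str]]:
    """Run the layer checks bottom-up; drop the frame at the first layer that finds an error."""
    for name, check in layers:
        error = check(frame)
        if error is not None:
            return False, [f"{name}: {error}"]   # discard and report once
    return True, []

# Hypothetical stand-in checks, lowest layer first.
def phy_check(frame: bytes) -> Optional[str]:
    return None if frame.startswith(b"S") else "framing error"

def dll_check(frame: bytes) -> Optional[str]:
    return None if frame.endswith(b"C") else "LCRC check failure"

def tl_check(frame: bytes) -> Optional[str]:
    return None if len(frame) >= 3 else "Malformed TLP"

LAYERS = [
    ("Physical Layer", phy_check),
    ("Data Link Layer", dll_check),
    ("Transaction Layer", tl_check),
]

# b"X" would fail all three checks, but only the lowest layer reports.
print(receive(b"X", LAYERS))   # (False, ['Physical Layer: framing error'])
```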

Even so, many of the errors reported on a PCIe bus can stem from the same root cause. Errors therefore need to be prioritized, so that the error closest to the source (the lowest-level error) has the higher priority and is handled first. PCIe bus errors are prioritized as follows (highest to lowest priority):

· Uncorrectable Internal Error

· Receiver Overflow

· Flow Control Protocol Error

· ECRC check failure

· Malformed TLP

· AtomicOp Egress Blocked

· TLP Prefix Blocked

· Access Control Services (ACS) Violation

· MC (Multicast) Blocked TLP

· Unsupported Request (UR), Completer Abort (CA) or Unexpected Completion

· Receipt of a corrupted data packet (Poisoned Packet)
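Given this ordering, choosing which error to report when several are detected for the same TLP comes down to picking the entry that appears earliest in the list. A minimal sketch, with labels that are shorthand for the entries above and a function name invented for the example:

```python
# Highest priority first, in the order given in the list above.
PRIORITY = [
    "Uncorrectable Internal Error",
    "Receiver Overflow",
    "Flow Control Protocol Error",
    "ECRC Check Failed",
    "Malformed TLP",
    "AtomicOp Egress Blocked",
    "TLP Prefix Blocked",
    "ACS Violation",
    "MC Blocked TLP",
    "UR, CA or Unexpected Completion",
    "Poisoned TLP Received",
]
RANK = {name: index for index, name in enumerate(PRIORITY)}

def error_to_report(detected: set) -> str:
    """Return the highest-priority error among those detected for one TLP."""
    return min(detected, key=RANK.__getitem__)

detected = {"Poisoned TLP Received", "Malformed TLP", "ACS Violation"}
print(error_to_report(detected))   # Malformed TLP
```

Reporting only the highest-ranked error keeps one underlying fault from showing up as several unrelated ones.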


Origin blog.csdn.net/kunkliu/article/details/94717017