NVMe Disk Violent Hot Swap Learning Record

1. SFF-8639
SFF-8639 is also called U.2, and its physical structure is compatible with SAS, SATA, SATA Express, and NVMe. For the detailed pin definitions of SFF-8639, refer to the PCI Express SFF-8639 Module Specification.
The more important sideband signals are PWRDIS, IfDet#, PERST#, and DualPortEn.
PWRDIS: When this signal is asserted, the power supply of the SFF-8639 device circuitry is disabled.
IfDet#: A long-pin signal and the presence detect signal of SFF-8639. It is pulled low when an NVMe disk is inserted and pulled high when the disk is removed.
PERST#: The PCIe reset signal. The PCIe spec specifies the timing relationship between power, reference clock, and PERST#.
DualPortEn: For dual-port disks (generally used in storage systems) this signal must be asserted. For single-port disks the signal is de-asserted.
2. Connection diagram
[Figure: connection diagram]

The PCA9555 emulated by the CPLD can be replaced by a real PCA9555 chip. Emulating the PCA9555 in a CPLD is purely a cost decision.
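As a rough illustration of how the host side might consume such an expander, the sketch below decodes the two PCA9555 input port bytes into a list of occupied slots. The command bytes 0x00 and 0x01 for Input Port 0 and Input Port 1 come from the PCA9555 register map; the bit-to-slot assignment (bit n carries IfDet# of slot n) and the function name are assumptions made for this example only, since the actual wiring depends on the board.

```python
from typing import List

# PCA9555 command bytes for the two read-only input port registers.
INPUT_PORT_0 = 0x00
INPUT_PORT_1 = 0x01


def present_slots(input_port_0: int, input_port_1: int) -> List[int]:
    """Return the slot numbers whose IfDet# bit reads low (disk present).

    Assumption for this sketch: bit n of the combined 16-bit word carries
    IfDet# of slot n. Real boards may wire the pins differently.
    """
    word = ((input_port_1 & 0xFF) << 8) | (input_port_0 & 0xFF)
    # IfDet# is active low: 0 means a disk is inserted.
    return [slot for slot in range(16) if not (word >> slot) & 1]
```

For example, reading 0xFE from Input Port 0 and 0xFF from Input Port 1 would mean that only slot 0 is occupied under this assumed mapping.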
3. Hot swap of NVMe disks
The presence signals defined by the SFF-8639 spec conflict with the PCIe definition. SFF-8639 stipulates that the two out-of-band signals PRSNT# and IfDet# are both first-to-mate and last-to-break, which means the PRSNT# and IfDet# pins specified by SFF-8639 are longer than the PCIe in-band signal pins. The PCIe specification stipulates just the opposite.
The PCI Express SFF-8639 Module Specification states:
The SFF-8639 interface includes the PRSNT# and IfDet# signals, as an out of band presence detect mechanism, to detect the presence of the SFF-8639 module. Since PRSNT# and IfDet# are not in the last-to-mate and first-to-break group, another vendor-specific mechanism is required to provide warning of module removal.
For historical reasons (SAS disks have no attention button), customers are used to swapping disks by simply pulling them out, so NVMe disks need to support violent hot swap rather than notified hot swap.
Violent hot swap (surprise hot plug) means that the PCIe device is unplugged from the system without first notifying the driver to stop IO, which leaves IO unfinished.
Supporting violent hot swap requires that the switch or RP the SSD is connected to can help handle these unfinished requests, or that the host has a similar mechanism, by returning completions for outstanding non-posted requests (in the PCIe protocol, every non-posted request requires a completion). This prevents the host from waiting until the uncompleted request times out, which may crash the system. It relies on an important capability newly added in the PCIe 3.1 protocol: DPC.
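The idea of returning completions on behalf of a removed device can be pictured with a toy model. The sketch below is not how any real switch or root port is implemented; the class and method names are hypothetical, and only the completion status encodings (SC = 000b, UR = 001b, CA = 100b) are taken from the PCIe protocol.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class CplStatus(IntEnum):
    SC = 0b000  # Successful Completion
    UR = 0b001  # Unsupported Request
    CA = 0b100  # Completer Abort


@dataclass
class ToyDownstreamPort:
    """Hypothetical model of a port that tracks non-posted requests by tag."""
    outstanding: dict = field(default_factory=dict)

    def send_non_posted(self, tag, request):
        # Every non-posted request must eventually receive a completion.
        self.outstanding[tag] = request

    def device_completes(self, tag):
        # Normal case: the device answers and the request is retired.
        self.outstanding.pop(tag)
        return (tag, CplStatus.SC)

    def surprise_remove(self, status=CplStatus.UR):
        # The device is gone: synthesize an error completion for every
        # request still pending so the requester does not wait for a timeout.
        synthesized = [(tag, status) for tag in sorted(self.outstanding)]
        self.outstanding.clear()
        return synthesized
```

If two reads are in flight and the disk is pulled after only one has completed, `surprise_remove()` returns an error completion for the remaining tag, which is the behavior the host needs in order to avoid the timeout.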
4. What is DPC?
DPC stands for Downstream Port Containment. It is a capability newly added in the PCIe 3.1 protocol for root ports and switch downstream ports. When the downstream port itself detects an error (which error severity triggers DPC is configurable), or a device below the downstream port reports an error message (which message severity triggers DPC is also configurable), DPC can shut down the corresponding port (forcing the port's LTSSM into the Disabled state) and block PCIe traffic below the port to prevent the error from propagating.
For non-posted requests that have not completed, the root port or switch downstream port returns a completion according to the DPC settings, with completion status UR or CA.
The DPC mechanism gives the system an opportunity to recover from errors, so it can be used to meet the requirement for violent hot swap of NVMe disks.
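The configurable parts mentioned above live in the DPC Control register of the DPC extended capability. The following is a minimal sketch of decoding three of its fields, assuming the layout from the PCIe Base Specification (bits 1:0 DPC Trigger Enable, bit 2 DPC Completion Control, bit 3 DPC Interrupt Enable); the function name and the returned dictionary keys are made up for illustration.

```python
# DPC Trigger Enable encodings (DPC Control register, bits 1:0).
DPC_TRIGGER_MODES = {
    0b00: "disabled",
    0b01: "ERR_FATAL",
    0b10: "ERR_NONFATAL or ERR_FATAL",
    0b11: "reserved",
}


def decode_dpc_control(value: int) -> dict:
    """Decode selected fields of a 16-bit DPC Control register value."""
    return {
        # Which error severity triggers containment.
        "trigger": DPC_TRIGGER_MODES[value & 0x3],
        # Status used for completions synthesized during containment:
        # bit 2 = 0 selects Completer Abort, 1 selects Unsupported Request.
        "completion_status": "UR" if value & 0x4 else "CA",
        # Whether a DPC trigger raises an interrupt.
        "interrupt_enable": bool(value & 0x8),
    }
```

A value of 0x000D, for instance, decodes to a trigger on ERR_FATAL, UR completions, and the DPC interrupt enabled.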
5. Preparations for implementing NVMe violent hot swap
(1) BIOS
a. Reserve resources for each port
Because PCIe enumeration is depth first, if no disk is installed when the system boots, the BIOS will not reserve resources (bus numbers, memory, prefetchable memory) for the RP or switch downstream ports during enumeration. When an NVMe disk is inserted later, the port lacks resources and the disk cannot be enumerated unless software rebalances the resources and re-enumerates, which affects IO on other disks.
If the whole system is to be sold to customers, the BIOS must reserve resources for ports that support hot plug, sized for the largest requirement of any Endpoint the system supports. If you are only verifying chip functionality, you can boot the system with an NVMe disk or interface card installed so that resources are allocated to the hot-plug ports during enumeration.
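The effect of reservation can be shown with a toy allocator. This is only a sketch of the reasoning, not BIOS code: the port names, base address, and sizes are invented, and the one fact borrowed from the bridge architecture is that bridge memory windows have 1 MiB granularity.

```python
MIB = 1024 * 1024


def allocate_windows(ports, base, reserve_bytes=0):
    """Assign a memory window to each downstream port, in port order.

    ports: list of (name, required_bytes); required_bytes == 0 means the
    port is empty at boot. Empty ports receive reserve_bytes instead.
    Returns {name: (window_start, window_size)}.
    """
    windows = {}
    for name, required in ports:
        size = required if required else reserve_bytes
        # Bridge memory windows have 1 MiB granularity, so round up.
        size = -(-size // MIB) * MIB
        windows[name] = (base, size)
        base += size
    return windows


# Hypothetical topology: port1 has no disk installed at boot.
ports = [("port0", 16 * 1024), ("port1", 0), ("port2", 16 * 1024)]
no_reserve = allocate_windows(ports, base=0x90000000)
with_reserve = allocate_windows(ports, base=0x90000000, reserve_bytes=MIB)
```

Without reservation, port1 ends up with a zero-size window squeezed between its neighbours, so a disk inserted later has nowhere to map its BARs unless port2 is moved. With reservation, port1 already owns a window and port2 simply starts 1 MiB higher.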
b. PCI Express configuration space settings.
Because some registers in PCIe configuration space are HwInit, the driver cannot modify them. To support violent hot swap of NVMe disks, the BIOS or firmware must initialize some HwInit registers.
If Slot Implemented in the PCI Express Capabilities register is 0, the Slot Capabilities register is not implemented. Therefore Slot Implemented must be 1.
Surprise Down Error Reporting Capable in the Link Capabilities register must be 1; otherwise a surprise down error cannot trigger DPC (if the port connected to the NVMe disk does not support surprise down, you do not need to modify this bit).
Data Link Layer Link Active Reporting Capable is best implemented, so that software can help determine whether the NVMe disk has been unplugged from the in-band link up and link down state.
Power Controller Present in the Slot Capabilities register is best implemented so that software can control the power of the slot.
Hot-Plug Surprise must be 0 (if the port connected to the NVMe disk does not support surprise down errors, you can ignore this bit). Hot-Plug Surprise is the traditional surprise down handling mechanism that predates DPC, and in effect it hides the error. On some platforms an error message reaching the RC crashes the system, so the Hot-Plug Surprise bit was added for hot-plug support: when a surprise down error occurs, it is not reported to the host, which prevents the crash. This bit only suppresses the reporting of surprise down errors. It cannot deal with the CPU timeout caused by IO left incomplete by a violent removal, and it interferes with DPC; it is a historical legacy.
The PCIe spec also discusses this point in detail.
Hot-Plug Capable must be 1; otherwise the hot-plug interrupts are not supported.
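The register requirements in this section can be collected into a small checker. The sketch below assumes the bit positions from the PCIe Base Specification (Slot Implemented is bit 8 of PCI Express Capabilities; Surprise Down Error Reporting Capable and Data Link Layer Link Active Reporting Capable are bits 19 and 20 of Link Capabilities; Power Controller Present, Hot-Plug Surprise, and Hot-Plug Capable are bits 1, 5, and 6 of Slot Capabilities). The function name and the split into errors and warnings are choices made for this example.

```python
PCIE_CAP_SLOT_IMPLEMENTED = 1 << 8   # PCI Express Capabilities register
LNKCAP_SURPRISE_DOWN = 1 << 19       # Surprise Down Error Reporting Capable
LNKCAP_DLL_ACTIVE = 1 << 20          # Data Link Layer Link Active Reporting Capable
SLTCAP_POWER_CONTROLLER = 1 << 1     # Power Controller Present
SLTCAP_HOTPLUG_SURPRISE = 1 << 5     # Hot-Plug Surprise
SLTCAP_HOTPLUG_CAPABLE = 1 << 6      # Hot-Plug Capable


def check_port(pcie_cap: int, link_cap: int, slot_cap: int):
    """Return (errors, warnings) for a port meant to use DPC-based hot swap."""
    errors, warnings = [], []
    if not pcie_cap & PCIE_CAP_SLOT_IMPLEMENTED:
        errors.append("Slot Implemented must be 1")
    if not link_cap & LNKCAP_SURPRISE_DOWN:
        errors.append("Surprise Down Error Reporting Capable must be 1")
    elif slot_cap & SLTCAP_HOTPLUG_SURPRISE:
        # Only relevant when surprise down errors are reported at all.
        errors.append("Hot-Plug Surprise must be 0")
    if not slot_cap & SLTCAP_HOTPLUG_CAPABLE:
        errors.append("Hot-Plug Capable must be 1")
    if not link_cap & LNKCAP_DLL_ACTIVE:
        warnings.append("Data Link Layer Link Active Reporting is recommended")
    if not slot_cap & SLTCAP_POWER_CONTROLLER:
        warnings.append("Power Controller Present is recommended")
    return errors, warnings
```

Feeding it the values the BIOS programmed for each hot-plug port gives a quick list of what still needs attention before testing surprise removal.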
(2) Software
When the slot has the corresponding capabilities, software needs to enable the corresponding bits in the Slot Control register and set DPC Control according to system requirements.
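One way to picture "enable the bits that match the capabilities" is to derive a Slot Control value from Slot Capabilities. This is a sketch under assumptions: the bit positions follow the PCIe Base Specification, while the policy (which enables to set for which capability) and the function name are illustrative and not taken from any particular driver.

```python
# Slot Capabilities bits
SLTCAP_ATTN_BUTTON = 1 << 0        # Attention Button Present
SLTCAP_POWER_CONTROLLER = 1 << 1   # Power Controller Present
SLTCAP_MRL_SENSOR = 1 << 2         # MRL Sensor Present
SLTCAP_HOTPLUG_CAPABLE = 1 << 6    # Hot-Plug Capable
SLTCAP_NO_CMD_COMPLETE = 1 << 18   # No Command Completed Support

# Slot Control bits
SLTCTL_ABPE = 1 << 0     # Attention Button Pressed Enable
SLTCTL_PFDE = 1 << 1     # Power Fault Detected Enable
SLTCTL_MRLSCE = 1 << 2   # MRL Sensor Changed Enable
SLTCTL_PDCE = 1 << 3     # Presence Detect Changed Enable
SLTCTL_CCIE = 1 << 4     # Command Completed Interrupt Enable
SLTCTL_HPIE = 1 << 5     # Hot-Plug Interrupt Enable
SLTCTL_DLLSCE = 1 << 12  # Data Link Layer State Changed Enable


def slot_control_enables(slot_cap: int, dll_active_reporting: bool) -> int:
    """Build the Slot Control enable bits that match the slot's capabilities."""
    if not slot_cap & SLTCAP_HOTPLUG_CAPABLE:
        return 0  # No hot-plug support, nothing to enable.
    ctl = SLTCTL_PDCE | SLTCTL_HPIE
    if slot_cap & SLTCAP_ATTN_BUTTON:
        ctl |= SLTCTL_ABPE
    if slot_cap & SLTCAP_POWER_CONTROLLER:
        ctl |= SLTCTL_PFDE
    if slot_cap & SLTCAP_MRL_SENSOR:
        ctl |= SLTCTL_MRLSCE
    if not slot_cap & SLTCAP_NO_CMD_COMPLETE:
        ctl |= SLTCTL_CCIE
    if dll_active_reporting:
        ctl |= SLTCTL_DLLSCE
    return ctl
```

For a U.2 slot without an attention button, the result typically reduces to the presence detect, link state, and hot-plug interrupt enables, which matches the surprise flows described in the next section.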

6. Software flow for NVMe violent hot swap
[Figure: surprise add software flow]
As the flow shows, the so-called surprise add is not much different from a normal notified hot add; the button press interrupt is simply replaced by the presence detect changed interrupt.
[Figure: surprise remove software flow]
Because the surprise remove flow has no button press to notify the NVMe driver to stop IO, there will be uncompleted IOs. The DPC feature is used to handle these uncompleted IOs.
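The first decision in both flows is to work out from the Slot Status register whether the interrupt means a disk arrived or left. The sketch below assumes the bit positions from the PCIe Base Specification (Presence Detect Changed is bit 3, Presence Detect State is bit 6, Data Link Layer State Changed is bit 8); the function name and the returned strings are hypothetical, and a real handler would go on to clear the status bits and, on removal, deal with DPC.

```python
SLTSTA_PDC = 1 << 3    # Presence Detect Changed
SLTSTA_PDS = 1 << 6    # Presence Detect State (1 = device present)
SLTSTA_DLLSC = 1 << 8  # Data Link Layer State Changed


def classify_event(slot_status: int) -> str:
    """Classify a hot-plug interrupt from a 16-bit Slot Status value."""
    if not slot_status & (SLTSTA_PDC | SLTSTA_DLLSC):
        return "none"  # Neither presence nor link state changed.
    # A change was flagged: the current presence state tells the direction.
    return "surprise add" if slot_status & SLTSTA_PDS else "surprise remove"
```

With this mapping, a status of 0x0048 is treated as a surprise add and 0x0008 as a surprise remove.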
————————————————
Copyright statement: This is an original article by the CSDN blogger "linjiasen", released under the CC 4.0 BY-SA license. When reprinting, please include the original source link and this statement.
Original link: https://blog.csdn.net/linjiasen/article/details/104361350

Origin blog.csdn.net/qq_33632004/article/details/105972147