PCIe link training

We had the chance to bring up a PCIe link with a Rockchip board as the endpoint (EP) and an x86 machine as the root complex (RC), and to debug the connection between the two.

Environment configuration

As shown in the figure, the Rockchip board is intended to act as the EP and the x86 board as the RC, with the two connected to each other.

Link configuration process

Main configuration on the Rockchip side

On the Rockchip chip, EP mode configuration is split into two parts, the controller and the PHY, and both must be configured for EP mode.

Because of a bug in the software, some of the PHY clocks were running in RC mode: the clock was sourced from the chip itself rather than from the peer. As a result, the actual setup had two RCs connected to each other.

Symptoms on the Rockchip EP side

After resetting the Rockchip board, the registers show that the PCIe link has been established. The relevant registers are:

The value is 0x4407: the physical layer and data link layer bits are both 1, meaning both layers are linked up, and the LTSSM state is 0x11. According to the Rockchip manual, this means the link has entered the connected state.

Symptoms on the x86 RC side

Relevant registers on the x86 side

During the reset of the Rockchip board, or for about 2 minutes after it, this register can be seen changing as follows:

50: 00 00 11 38 00 b2 5c 00 00 00 48 01 08 00 00 00 (the 8 means link training is in progress; within the 3, bit 29 means link active)

50: 00 00 11 30 00 b2 5c 00 00 00 48 01 08 00 00 00 (link training is 0 and link up is 1, indicating that the link has entered an Lx state)

50: 00 00 11 38 00 b2 5c 00 00 00 48 01 08 00 00 00 (the link soon enters the link training state again)

50: 00 00 11 30 00 b2 5c 00 00 00 48 01 08 00 00 00

50: 00 00 11 38 00 b2 5c 00 00 00 48 01 08 00 00 00

50: 00 00 11 38 00 b2 5c 00 00 00 48 01 08 00 00 00

50: 00 00 11 38 00 b2 5c 00 00 00 48 01 08 00 00 00

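50: 00 00 11 10 00 b2 5c 00 00 00 48 01 08 00 00 00 (link training continued until this point; here it has stopped, which indicates that the peer has finished coming up and there is no longer a partner to train with)

From these register changes we can see that, from the x86 side, the PCIe link is only briefly up (when the byte at offset 0x53 is 0x30). Most of the time it sits at 0x38, meaning the link is continuously in training.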
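To make the bit reading above less error-prone, here is a minimal Python sketch that decodes one row of the dump. It assumes that the 16-bit Link Status register sits at offset 0x52 on this bridge (inferred from the 0x53 byte and the bit 29 note above) and uses the standard PCIe Link Status bit positions. The names `parse_dump_line` and `decode_link_status` are made up for illustration.

```python
LNKSTA_OFFSET = 0x52  # assumption: Link Status location on this particular bridge


def parse_dump_line(line):
    """Split a row such as '50: 00 00 11 38 ...' into (offset, data bytes).

    Anything after the first 16 byte tokens (e.g. a trailing note) is ignored.
    """
    offset_text, _, rest = line.partition(":")
    data = bytes(int(token, 16) for token in rest.split()[:16])
    return int(offset_text, 16), data


def decode_link_status(line):
    """Decode the little-endian Link Status word contained in one dump row."""
    offset, data = parse_dump_line(line)
    index = LNKSTA_OFFSET - offset
    value = data[index] | (data[index + 1] << 8)
    return {
        "raw": value,
        "speed": value & 0xF,                        # 1 = 2.5 GT/s (Gen1)
        "width": (value >> 4) & 0x3F,                # negotiated lane count
        "link_training": bool(value & (1 << 11)),
        "slot_clock_config": bool(value & (1 << 12)),
        "dl_link_active": bool(value & (1 << 13)),
    }


if __name__ == "__main__":
    row = "50: 00 00 11 38 00 b2 5c 00 00 00 48 01 08 00 00 00"
    print(decode_link_status(row))
```

Feeding it the 0x38, 0x30 and 0x10 rows separates the three situations seen above: active while training, active and stable, and neither active nor training.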

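Relationship between LTSSM state changes and the corresponding register bits

According to the PCIe specification, link up and link training can both be 1 in only two LTSSM states: Configuration and Recovery.

From the text marked 65 above and the state transition diagram below, link up is 1 in the Configuration state only when that state is entered from Recovery.

So we asked ourselves what was causing the link to move from the Lx state into Recovery.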
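The mapping between the two status bits and the LTSSM can be sketched as a small classifier over the byte at offset 0x53. This is only an illustration: the category names are invented here, the bit positions assume the Link Status layout used earlier in this post (bit 11 training, bit 13 link active), and the sample list is simply the sequence of 0x53 values from the dump above.

```python
from collections import Counter

# Bit positions inside the byte at 0x53 (upper byte of Link Status).
LINK_TRAINING = 1 << 3   # Link Status bit 11
LINK_ACTIVE = 1 << 5     # Link Status bit 13


def classify(high_byte):
    """Map the two status bits to a coarse, illustrative link category."""
    active = bool(high_byte & LINK_ACTIVE)
    training = bool(high_byte & LINK_TRAINING)
    if active and training:
        return "configuration_or_recovery"   # link up, but retraining
    if active:
        return "link_up"                     # stable Lx state
    if training:
        return "training_without_link"
    return "no_link"


# Byte at 0x53 from each of the eight dump rows, in order.
samples = [0x38, 0x30, 0x38, 0x30, 0x38, 0x38, 0x38, 0x10]
summary = Counter(classify(s) for s in samples)

if __name__ == "__main__":
    for category, count in summary.items():
        print(category, count)
```

Counting the categories makes the imbalance explicit: most samples fall in the retraining category and only a few show a stable link.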

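Entering the Recovery state

Following this finding, we continuously monitored the link state on the Rockchip side and found that its LTSSM was likewise bouncing between several Recovery substates and L0.

The main purpose of Recovery is to change the link speed or width.

However, we had already configured the link as x1 Gen1.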

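NSR (No Soft Reset) function of the x86 bridge chip

PCI Power Management Control

During the experiments we suspected that the problem was related to the power state of the PCIe port, so we changed its configuration:

setpci -s 0:1c.0 a4.b=0x8

Note: if 0 is written instead, the device attached below the port is reset along with it.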
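For reference, the byte written by the setpci command can be decoded as follows. This is a sketch under the assumption that offset 0xa4 on this bridge is the PCI Power Management Control/Status register, as the heading suggests; the field positions (power state in bits 1:0, No Soft Reset in bit 3) follow the PCI power management register layout. The function name is hypothetical.

```python
POWER_STATES = ("D0", "D1", "D2", "D3hot")


def decode_pmcsr_low_byte(value):
    """Decode the low byte of the power management control/status register."""
    return {
        "power_state": POWER_STATES[value & 0x3],   # bits 1:0
        "no_soft_reset": bool(value & 0x08),        # bit 3 (NSR)
    }


if __name__ == "__main__":
    print(decode_pmcsr_low_byte(0x08))  # the value written with setpci above
    print(decode_pmcsr_low_byte(0x00))  # the value the note above warns about
```

Both values select D0; they differ only in the NSR bit, which matches the warning above about writing 0.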

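Rescan function

1) With the configuration from section 2, while the link is unstable, lspci -vt shows the following for the corresponding bridge port:

+-1c.0-[01]----

The bridge's PCIe port is visible, but the device behind it is not.

2) When the link state from section 2 reads 0x30 or 0x38, we run the following command:

/sys/bus/pci/devices/0000:00:1c.0# echo 1 >rescan

Checking lspci -vt again now shows:

+-1c.0-[01]----00.0 Fuzhou Rockchip Electronics Co., Ltd Neural Network Processor Card

3) The device information is now visible, so the device's configuration space registers can be read: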

lspci -xxx -s 1:0.0

10: 00 00 00 3c 00 00 00 00 00 00 00 00 00 00 00 00

20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

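4) Although the device is now visible, the value of the BAR0 register at 0x10 is not an address in the x86 address space but the original address inside the Rockchip chip. This shows that the rescan did not assign an address to the device, which is understandable, since its upstream port has no address range to hand out either.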
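The BAR0 value can be read out of the dump row at 0x10 with a short sketch like the one below. It assumes the row format printed by lspci -xxx above and uses the standard BAR encoding (bit 0 selects I/O or memory space, bits 2:1 the memory type, bit 3 prefetchable). The function name `decode_bar0` is made up for this example.

```python
def decode_bar0(dump_line):
    """Decode BAR0 from the config-space dump row that starts at offset 0x10."""
    offset_text, _, rest = dump_line.partition(":")
    if int(offset_text, 16) != 0x10:
        raise ValueError("BAR0 lives in the row starting at offset 0x10")
    b = [int(token, 16) for token in rest.split()[:4]]
    raw = b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24)  # little-endian dword
    if raw & 0x1:
        # I/O space BAR: the low two bits are flags.
        return {"raw": raw, "space": "io", "base": raw & 0xFFFFFFFC}
    return {
        "raw": raw,
        "space": "memory",
        "is_64bit": ((raw >> 1) & 0x3) == 0x2,
        "prefetchable": bool(raw & 0x8),
        "base": raw & 0xFFFFFFF0,  # the low four bits are flags
    }


if __name__ == "__main__":
    bar0 = decode_bar0("10: 00 00 00 3c 00 00 00 00 00 00 00 00 00 00 00 00")
    print(hex(bar0["base"]), bar0["space"])
```

For the row shown above this yields a 32-bit, non-prefetchable memory BAR with base 0x3c000000, the value the text identifies as an address from the Rockchip side rather than one assigned by the x86 host.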

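Summary

We spent a long time struggling with the LTSSM state machine without result. In the end, reading through the code showed that both sides being configured as RC caused the behavior described above.

Why link training in this configuration goes on for a few minutes, rather than for tens of minutes, is left for further investigation.

Does Intel provide a register that directly reports LTSSM state changes? Fortunately, this bridge chip at least supports querying the link state through the link active bit of the register.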

Origin blog.csdn.net/proware/article/details/128943672