PCIe related

[Reprint] An Old Boy Reads PCIe, Part 5: TLP Structure

Source: http://www.ssdfans.com/?p=3683

 

Whether it is a Request TLP or the Completion TLP returned in response, every TLP has the same overall structure:

Figure 5.1

A TLP consists of three main parts: Header, Data and CRC. A TLP is created in the transaction layer of the sender and ends its journey in the transaction layer of the receiver.

Every TLP has a Header. An animal cannot live without a head, and likewise a TLP may have no hands or feet, but it cannot be without a head. The transaction layer generates the TLP Header from the request passed down by the layer above. The Header contains information about the sender, the target address (to whom the TLP is sent), the TLP type (such as the Memory Read and Memory Write mentioned earlier), the data length (if there is data), and so on.

The Data Payload field carries the payload data. It is optional, because not every TLP carries data. A Memory Read TLP, for example, is only a request, and the target device returns the data in a Completion TLP. Which TLPs carry data and which do not is sorted out later. As mentioned earlier, a TLP carries at most 4KB of payload, so data longer than 4KB must be split across several TLPs.
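The "split into several TLPs" rule can be made concrete with a small calculation. Below is a minimal C sketch of a hypothetical helper that is not from the original article. It assumes that the real per-TLP limit is the Max_Payload_Size negotiated for the link, not the 4KB protocol maximum. It also applies the PCIe rule that a memory request must not cross a 4KB address boundary. With the 128-byte MaxPayload that lspci reports later in this article, a 4KB write already takes 32 TLPs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of Memory Write TLPs needed to move `len`
 * bytes starting at `addr`, when each TLP carries at most `mps` bytes
 * (Max_Payload_Size, a power of two from 128 to 4096) and no TLP may
 * cross a 4KB address boundary. */
static size_t count_write_tlps(uint64_t addr, size_t len, size_t mps)
{
    size_t n = 0;

    assert(mps >= 128 && mps <= 4096 && (mps & (mps - 1)) == 0);
    while (len > 0) {
        size_t to_4k = 4096 - (size_t)(addr & 0xFFF); /* bytes before next 4KB boundary */
        size_t chunk = len < mps ? len : mps;          /* limited by payload size */
        if (chunk > to_4k)
            chunk = to_4k;                             /* limited by 4KB boundary */
        addr += chunk;
        len -= chunk;
        n++;
    }
    return n;
}
```

An unaligned start costs an extra TLP. For example, 4KB starting at offset 0x800 needs two TLPs even with a 4KB payload limit.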

ECRC (End-to-End CRC) field: the sender computes a CRC over the Header and the Data (if any). The receiver recomputes the CRC over the Header and Data of the TLP it received and compares the result with the received ECRC. If the two match, no error occurred during transmission. Otherwise, the TLP was corrupted. ECRC is also optional, so a TLP may be sent without it.

Figure 5.2

There is not much to say about the Data and CRC fields. The Header is where the substance is, so let's take a closer look at it.

The size of a Header can be 3DW or 4DW. Take the 4DW Header as an example, the TLP Header looks like this:

Figure 5.3

The red area is common to all TLP Headers, and every Header has these fields. The other fields depend on the specific TLP type.

A brief explanation of each field:

Fmt: Format, indicating whether the TLP has data, and whether the Header is 3DW or 4DW;

Type: TLP type, mentioned in the previous section, Memory Read, Memory Write, Configuration Read, Configuration Write, Message and Completion, etc.;

R: Reserved, which is 0;

TC: Traffic Class. TLPs are divided into classes, and higher-priority TLPs are served first. The field is 3 bits wide, which gives 8 classes (0-7). TC defaults to 0, and a larger number means a higher priority;

Attr: Attribute. It has three bits in total, split between two places in the Header (one bit in byte 1 and two bits in byte 2); not covered here;

TH: TLP Processing Hints; not covered here;

TD: TLP Digest. As mentioned above, ECRC is optional. If this bit is set, the TLP contains an ECRC and the receiver should check it;

EP: Poisoned. The TLP's data is poisoned, so stay away from it, haha;

AT: Address Type; not covered here;

Length: the length of the payload data in DW. The field is 10 bits wide, so it can express at most 1024 DW (encoded as 0), and the maximum data length of a TLP is therefore 4KB. The length is always a whole number of DWs. When the data is not a multiple of a DW (4 bytes), the following two fields mark which bytes are valid:

Last DW BE and 1st DW BE.
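Length and the two byte-enable fields can be computed from a byte address and a byte count. The sketch below is illustrative only. The struct and function names are hypothetical, and it covers requests of 1 to 1024 DW.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for one memory request. */
struct dw_range {
    uint16_t length_field; /* 10-bit Length value (1024 DW is encoded as 0) */
    uint8_t first_be;      /* 1st DW BE: valid bytes in the first DW */
    uint8_t last_be;       /* Last DW BE: valid bytes in the last DW */
};

/* Computes Length, 1st DW BE and Last DW BE for `len` bytes at `addr`. */
static struct dw_range compute_dw_range(uint64_t addr, uint32_t len)
{
    struct dw_range r;
    uint64_t end = addr + len - 1; /* address of the last valid byte */
    uint32_t dws = (uint32_t)((end >> 2) - (addr >> 2) + 1);

    assert(len >= 1 && dws <= 1024);
    r.length_field = (uint16_t)(dws & 0x3FF);          /* 1024 -> 0 */
    r.first_be = (uint8_t)((0xF << (addr & 3)) & 0xF); /* skip bytes before addr */
    r.last_be = (uint8_t)(0xF >> (3 - (end & 3)));     /* stop after the last byte */
    if (dws == 1) { /* a single DW uses only 1st DW BE; Last DW BE must be 0 */
        r.first_be &= r.last_be;
        r.last_be = 0;
    }
    return r;
}
```

For example, 2 bytes at address 0x1001 fit in one DW with 1st DW BE = 0x6 (bytes 1 and 2), and 8 bytes at 0x1002 span 3 DW with 1st DW BE = 0xC and Last DW BE = 0x3.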

For the Header, it is enough for now to know what it contains. There is no need to memorize every field.

Here I will focus on Fmt and Type. Let's look at how Fmt and Type are encoded for different TLPs (a reduced table covering the TLP types used by native PCIe devices).

Table 5.1

As the table shows, Configuration and Completion TLPs (the TLPs whose names start with C) always have a 3DW Header, and Message TLPs always have a 4DW Header. For Memory TLPs, the Header size depends on the address. A target address below 4GB uses a 3DW Header, and a target address above 4GB uses a 4DW Header.
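Table 5.1 survives only as a placeholder, so the same information is given here as a small C decoder. It is based on my reading of the PCIe Base Specification: bit 0 of Fmt selects a 4DW header, and bit 1 of Fmt means "with data". Only the TLP types discussed in this article are named, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* What byte 0 of a TLP header (Fmt in bits 7:5, Type in bits 4:0) says. */
struct tlp_kind {
    int header_dw;    /* 3 or 4 */
    int has_data;     /* 1 if a Data Payload follows the header */
    const char *name; /* spec mnemonic, or "other" */
};

static struct tlp_kind decode_fmt_type(uint8_t byte0)
{
    uint8_t fmt = (uint8_t)(byte0 >> 5);
    uint8_t type = (uint8_t)(byte0 & 0x1F);
    struct tlp_kind k;

    k.header_dw = (fmt & 0x1) ? 4 : 3; /* Fmt bit 0: 4DW header */
    k.has_data = (fmt & 0x2) ? 1 : 0;  /* Fmt bit 1: with data */
    if (type == 0x00)
        k.name = k.has_data ? "MWr" : "MRd";
    else if (type == 0x02)
        k.name = k.has_data ? "IOWr" : "IORd";
    else if (type == 0x04)
        k.name = k.has_data ? "CfgWr0" : "CfgRd0";
    else if (type == 0x05)
        k.name = k.has_data ? "CfgWr1" : "CfgRd1";
    else if (type == 0x0A)
        k.name = k.has_data ? "CplD" : "Cpl";
    else if ((type & 0x18) == 0x10) /* 10rrr: Message, rrr = routing */
        k.name = k.has_data ? "MsgD" : "Msg";
    else
        k.name = "other";
    return k;
}
```

For example, byte 0 = 0x60 decodes as a 4DW Memory Write with data, and 0x30 decodes as a 4DW Message without data.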

The parts common to all TLP Headers were introduced above. The Headers of specific TLP types are introduced below.

Memory TLP

Two important things have not been mentioned yet: the source and the target of the TLP, that is, where the TLP comes from and where it is going. Both are recorded in the Header. Different TLP types use different addressing modes, so these two fields have to be examined per TLP type.

Figure 5.4

A PCIe device first has the device space it exposes to the Host mapped into the Host's memory address space. To access a region of the device, the Host puts the mapped address of that region (its address in the Host memory address space) into the TLP Header. If the mapped address is below 4GB, the Memory Read/Write TLP uses a 3DW Header. If it is above 4GB, the TLP uses a 4DW Header. The reason is that an address below 4GB fits in 32 bits, i.e. 1DW, placed in Byte 8-11, whereas an address above 4GB needs 2DW, placed in Byte 8-15.
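The 3DW/4DW choice becomes visible when a Memory Read header is built byte by byte. This is a hypothetical sketch with the following assumptions:
- TC, Attr, TD, EP and the other optional fields stay at 0.
- The read is DW-aligned and at least 2 DW long, so both byte enables can be 0xF.
- Header bytes are stored in the order they go out on the link, byte 0 first, so each DW is big-endian.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stores a 32-bit value most significant byte first, as header DWs are sent. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical helper: fills a Memory Read header, returns its size in DW. */
static int build_mrd_header(uint8_t hdr[16], uint64_t addr, uint16_t len_dw,
                            uint16_t requester_id, uint8_t tag)
{
    int four_dw = addr > 0xFFFFFFFFull;  /* address above 4GB needs 4DW */
    uint16_t len_field = len_dw & 0x3FF; /* 1024 DW is encoded as 0 */

    assert(len_dw >= 2 && len_dw <= 1024 && (addr & 3) == 0);
    memset(hdr, 0, 16);
    hdr[0] = (uint8_t)((four_dw ? 0x1 : 0x0) << 5); /* Fmt 000/001, Type 00000 = MRd */
    hdr[2] = (uint8_t)(len_field >> 8);             /* Length[9:8] */
    hdr[3] = (uint8_t)(len_field & 0xFF);           /* Length[7:0] */
    hdr[4] = (uint8_t)(requester_id >> 8);          /* Requester ID: bus */
    hdr[5] = (uint8_t)(requester_id & 0xFF);        /* Requester ID: device/function */
    hdr[6] = tag;
    hdr[7] = 0xFF;                                  /* Last DW BE = 0xF, 1st DW BE = 0xF */
    if (four_dw) {
        put_be32(hdr + 8, (uint32_t)(addr >> 32));  /* Byte 8-11: address bits 63:32 */
        put_be32(hdr + 12, (uint32_t)addr);         /* Byte 12-15: address bits 31:2 */
    } else {
        put_be32(hdr + 8, (uint32_t)addr);          /* Byte 8-11: address bits 31:2 */
    }
    return four_dw ? 4 : 3;
}
```

A read of 0xe0000000 therefore gets a 3DW header. A read of the 36-bit address 0xc20000000 gets a 4DW header with 0x0000000c in Byte 8-11 and 0x20000000 in Byte 12-15.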

When a TLP passes through a Switch, the Switch forwards it to the target device according to the address information. This works only because the spaces of different Endpoints are mapped to different, non-overlapping locations in the Host memory address space.

TLP routing will be covered later.

The target of a Memory TLP is identified by the memory address, and the source is identified by the "Requester ID". Every component in a PCIe system has a unique ID, determined by its bus (Bus), device (Device) and function (Function) numbers. This will be discussed in detail later. For now, it is enough to know that every PCIe component has a unique ID, whether it is an RC, a Switch or an Endpoint.
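The Requester ID is simply the three numbers packed into 16 bits: 8 bits of bus, 5 bits of device and 3 bits of function. Below is a minimal sketch with hypothetical helper names. It also formats the ID the way lspci prints it, e.g. 00:14.0.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Packs bus/device/function into a 16-bit ID: Bus[15:8], Device[7:3], Function[2:0]. */
static uint16_t bdf_pack(unsigned bus, unsigned dev, unsigned fn)
{
    assert(bus < 256 && dev < 32 && fn < 8);
    return (uint16_t)((bus << 8) | (dev << 3) | fn);
}

/* Splits a 16-bit ID back into its three parts. */
static void bdf_unpack(uint16_t id, unsigned *bus, unsigned *dev, unsigned *fn)
{
    *bus = id >> 8;
    *dev = (id >> 3) & 0x1F;
    *fn = id & 0x7;
}

/* Formats an ID the way lspci prints it, e.g. "00:14.0". */
static void bdf_format(uint16_t id, char *out, size_t out_len)
{
    unsigned bus, dev, fn;

    bdf_unpack(id, &bus, &dev, &fn);
    snprintf(out, out_len, "%02x:%02x.%x", bus, dev, fn);
}
```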

Configuration TLP

Endpoints and Switches (more precisely, the bridges inside a Switch) use different configuration header formats, called Type 0 and Type 1 respectively. The configuration space can be thought of as a standard space of an Endpoint or Switch, and it is also mapped into the Host's address space during initialization. Unlike the device's other spaces, this space is standardized. Every device must provide it regardless of manufacturer, and the specification defines what goes where. The Host accesses this space according to the specification. Since every device ID is unique and the layout of the configuration space is fixed, the Host only needs the ID of the target device (plus the register offset) to access its configuration space. No memory address is needed.

The following is the TLP Header (Type 0) to access the configuration space of Endpoint:

Figure 5.5

Bus Number + Device Number + Function Number uniquely identify the target device, and Ext Register Number + Register Number together give the offset into the configuration space. With the device identified and the offset specified, the exact location to be accessed in the configuration space is known.
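The two register fields are simply a split of the byte offset:
- Ext Register Number selects a 256-byte block.
- Register Number selects a DW within that block.
- The lowest two offset bits are not sent; the byte enables cover them instead.

This is an illustrative sketch with hypothetical helper names. For example, offset 0x404 becomes Ext Register Number 4 and Register Number 1.

```c
#include <assert.h>
#include <stdint.h>

/* Splits a configuration-space byte offset (0x000-0xFFF) into the
 * Configuration Request fields: Ext Register Number (4 bits) and
 * Register Number (6 bits, a DW index). Offset bits 1:0 are not sent. */
static void cfg_offset_split(uint16_t offset, uint8_t *ext_reg, uint8_t *reg)
{
    assert(offset < 4096);
    *ext_reg = (uint8_t)((offset >> 8) & 0xF);
    *reg = (uint8_t)((offset >> 2) & 0x3F);
}

/* Reassembles the DW-aligned byte offset from the two fields. */
static uint16_t cfg_offset_join(uint8_t ext_reg, uint8_t reg)
{
    return (uint16_t)(((ext_reg & 0xF) << 8) | ((reg & 0x3F) << 2));
}
```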

Message TLP

Message TLPs carry interrupts, errors, power management and other information, replacing the sideband signals used in the PCI era. The Header of a Message TLP is always 4DW.

Figure 5.6

The Message Code field specifies the type of the Message, as follows:

Figure 5.7

The meaning of the last two DWs of the Header depends on the Message Code, so they are not expanded here.
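Figure 5.7 is missing, so some well-known Message Codes are listed here as a C lookup for reference. The codes come from the PCIe Base Specification, the list is deliberately incomplete, and the function name is hypothetical. The Message Code sits in byte 7 of the Message TLP header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Name of a Message Code (header byte 7); only common codes are listed. */
static const char *msg_code_name(uint8_t code)
{
    switch (code) {
    case 0x00: return "Unlock";
    case 0x14: return "PM_Active_State_Nak";
    case 0x18: return "PM_PME";
    case 0x19: return "PME_Turn_Off";
    case 0x1B: return "PME_TO_Ack";
    case 0x20: return "Assert_INTA";
    case 0x21: return "Assert_INTB";
    case 0x22: return "Assert_INTC";
    case 0x23: return "Assert_INTD";
    case 0x24: return "Deassert_INTA";
    case 0x25: return "Deassert_INTB";
    case 0x26: return "Deassert_INTC";
    case 0x27: return "Deassert_INTD";
    case 0x30: return "ERR_COR";      /* correctable error */
    case 0x31: return "ERR_NONFATAL"; /* uncorrectable, non-fatal error */
    case 0x33: return "ERR_FATAL";    /* uncorrectable, fatal error */
    case 0x50: return "Set_Slot_Power_Limit";
    case 0x7E: return "Vendor_Defined Type 0";
    case 0x7F: return "Vendor_Defined Type 1";
    default:   return "unknown";
    }
}
```

The INTx codes are how legacy interrupts travel as in-band messages, which replace the PCI-era sideband interrupt wires mentioned above.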

Completion TLP

Only a non-posted request TLP gets a Completion TLP: every result has a cause. As shown earlier, a request TLP carries the Requester ID and Tag, which tell the receiver who the initiator is. Addressing the reply is therefore simple for the Completer, because it just copies the Requester ID and Tag of the request into the Completion. The Header of a Completion TLP is as follows:

Figure 5.8

A Completion TLP can return the data that the requester asked for, e.g. in response to a Memory Read or Configuration Read. It can also return the status of the transaction. For that purpose, the Completion Header contains a Completion Status field:

Figure 5.9
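The sketch below uses hypothetical helper names and shows both halves of the completion story. First, it matches an incoming Completion to an outstanding request by Requester ID and Tag (Header bytes 8-10). Second, it decodes the 3-bit Completion Status (bits 7:5 of Header byte 6). Only the status values defined by the specification are named; anything else is reported as reserved.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if a Completion header (at least 12 bytes) answers the request
 * identified by requester_id and tag: Requester ID is bytes 8-9, Tag byte 10. */
static int cpl_matches(const uint8_t *cpl_hdr, uint16_t requester_id, uint8_t tag)
{
    uint16_t rid = (uint16_t)((cpl_hdr[8] << 8) | cpl_hdr[9]);
    return rid == requester_id && cpl_hdr[10] == tag;
}

/* Completion Status is bits 7:5 of header byte 6. */
static const char *cpl_status_name(const uint8_t *cpl_hdr)
{
    switch (cpl_hdr[6] >> 5) {
    case 0x0: return "SC";  /* Successful Completion */
    case 0x1: return "UR";  /* Unsupported Request */
    case 0x2: return "CRS"; /* Configuration Request Retry Status */
    case 0x4: return "CA";  /* Completer Abort */
    default:  return "reserved";
    }
}
```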


09/21/2017 Fall

 

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

First of all, "PCIe switch" and "PCIe bridge" are general names for PCIe-related chips.
A PCIe switch is mainly used to interconnect PCIe devices. Both the switch chip and the devices attached to it communicate using PCIe.
A PCIe bridge connects PCIe devices with devices on other buses (for example PCI or USB). The bridge chip handles communication between devices on the PCIe bus and devices on the other bus.

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2 Reading the PCIe device configuration space

The command for reading the configuration space of PCIe devices is lspci.

NAME

lspci – list all PCI devices

   
 

SYNOPSIS

lspci [options]

   
 

For the full list of options, see man lspci. Only the common ones are introduced here.

By default, the command lists all PCI/PCIe devices in the system.

[root@localhost ~]# lspci

00:00.0 Host bridge: Intel Corporation 5500 I/O Hub to ESI Port (rev 13)

00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 13)

00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 13)

00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 13)

00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 13)

00:10.0 PIC: Intel Corporation 5520/5500/X58 Physical and Link Layer Registers Port 0 (rev 13)

00:10.1 PIC: Intel Corporation 5520/5500/X58 Routing and Protocol Layer Registers Port 0 (rev 13)

00:11.0 PIC: Intel Corporation 5520/5500 Physical and Link Layer Registers Port 1 (rev 13)

00:11.1 PIC: Intel Corporation 5520/5500 Routing & Protocol Layer Register Port 1 (rev 13)

00:13.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub I/OxAPIC Interrupt Controller (rev 13)

00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 13)

… …

01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)

01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)

04:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)

05:00.0 VGA compatible controller: XGI Technology Inc. (eXtreme Graphics Innovation) Z9s/Z9m (XG21 core)

[root@localhost ~]#

   
 

Common parameters:

-v Display detailed information about the device.

-vv Display more detailed information about the device.

-vvv Display all parseable information about the device.

-x Display the first 64 bytes of the configuration space in hexadecimal (the first 128 bytes for a CardBus bridge).

-xxx Display the entire PCI configuration space (256 bytes) in hexadecimal.

-xxxx Display the entire PCIe extended configuration space (4096 bytes) in hexadecimal.

-s [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] Display only the specified device.
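On Linux, the same bytes that lspci shows are exposed through sysfs as /sys/bus/pci/devices/<domain:bus:dev.fn>/config. The sketch below is an illustration under that assumption and does not reproduce how lspci itself is implemented. It reads the file and prints it in lspci's "offset: bytes" layout; the helper names are hypothetical. Without root, the kernel returns only the first 64 bytes (128 for a CardBus bridge), which is also why the larger -xxx and -xxxx dumps need root.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads up to `size` bytes of configuration space from a sysfs config file,
 * e.g. "/sys/bus/pci/devices/0000:00:14.0/config".
 * Returns the number of bytes read, or -1 on error. */
static ssize_t read_config(const char *path, unsigned char *buf, size_t size)
{
    int fd = open(path, O_RDONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = pread(fd, buf, size, 0);
    close(fd);
    return n;
}

/* Prints the bytes in lspci's "offset: xx xx ..." layout, 16 per line. */
static void dump_hex(const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i % 16 == 0)
            printf("%02zx:", i);
        printf(" %02x", buf[i]);
        if (i % 16 == 15 || i + 1 == n)
            putchar('\n');
    }
}
```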

Example:

[root@localhost ~]# lspci -vvvxxxx -s 00:14.0

00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 13) (prog-if 00 [8259])

    Subsystem: Unknown device 00e5:0008

    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-

    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

    Capabilities: [40] Express Unknown type IRQ 0

        Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-

        Device: Latency L0s <64ns, L1 <1us

        Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-

        Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-

        Device: MaxPayload 128 bytes, MaxReadReq 128 bytes

        Link: Supported Speed unknown, Width x0, ASPM L0s, Port 0

        Link: Latency L0s unlimited, L1 unlimited

        Link: ASPM Disabled CommClk- ExtSynch-

        Link: Speed unknown, Width x0

00: 86 80 2e 34 00 00 10 00 13 00 00 08 10 00 80 00

10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

20: 00 00 00 00 00 00 00 00 00 00 00 00 e5 00 08 00

… …

fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
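The first bytes of the dump above line up with the decoded text. A small parser (hypothetical struct and function names) makes the mapping explicit:
- 86 80 is vendor 0x8086 (Intel, per pci.ids), and 2e 34 is device 0x342e.
- 0x13 is "rev 13".
- Class 08h, subclass 00h with prog-if 00h is what lspci shows as "PIC ... (prog-if 00 [8259])".
- Status bit 4 is "Cap+".
- Bytes 0x2C-0x2F give "Subsystem: Unknown device 00e5:0008".

All multi-byte fields are little-endian.

```c
#include <assert.h>
#include <stdint.h>

/* Selected fields of the standard configuration header. */
struct cfg_header {
    uint16_t vendor_id, device_id, status;
    uint8_t revision, prog_if, subclass, base_class, header_type;
    uint16_t subsys_vendor, subsys_id; /* Type 0 (Endpoint) header only */
};

/* Configuration space is little-endian: low byte first. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Parses the first 64 bytes of configuration space. */
static struct cfg_header parse_cfg_header(const uint8_t *cfg)
{
    struct cfg_header h;

    h.vendor_id = le16(cfg + 0x00);
    h.device_id = le16(cfg + 0x02);
    h.status = le16(cfg + 0x06);  /* bit 4: Capabilities List ("Cap+") */
    h.revision = cfg[0x08];
    h.prog_if = cfg[0x09];
    h.subclass = cfg[0x0A];
    h.base_class = cfg[0x0B];
    h.header_type = cfg[0x0E];    /* bit 7: multi-function device */
    h.subsys_vendor = le16(cfg + 0x2C);
    h.subsys_id = le16(cfg + 0x2E);
    return h;
}
```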

   
 

When an old Linux distribution runs on a new platform, lspci shows many devices as "Unknown". Names such as "Host bridge: Intel Corporation 5500 I/O Hub to ESI Port (rev 13)" are looked up in the file /usr/share/hwdata/pci.ids. The configuration space itself holds only numeric IDs and no strings like "Intel". When Unknown devices appear, update the pci.ids file.

The download address of pci.ids file is:

http://pciids.sourceforge.net/

After downloading, directly overwrite the /usr/share/hwdata/pci.ids file.

3 Modifying the PCIe device configuration space

The command for modifying the PCIe configuration space is setpci.

NAME

setpci – configure PCI devices

  

SYNOPSIS

setpci [options] devices

  

For the setpci command, the main parameters are as follows:

-s [[[[<domain>]:]<bus>]:][<slot>][.[<func>]]

   
 

That is, we first specify the device and then modify its configuration space. The common command formats are:

setpci -s BUSID:DEVID.FUNCID REGISTEROFFSET.B=NEWVALUE

setpci -s BUSID:DEVID.FUNCID REGISTEROFFSET.W=NEWVALUE

setpci -s BUSID:DEVID.FUNCID REGISTEROFFSET.L=NEWVALUE

   
 

For example:

setpci -s 0:14.0 60.B=6

This command writes the byte value 6 to offset 0x60 of the configuration space of device 0:14.0. The .B, .W and .L suffixes select an 8-, 16- or 32-bit write. To check whether the change took effect, read it back with lspci. For device 0:14.0, the command is lspci -s 0:14.0 -xxx.
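A rough C equivalent of setpci -s 0:14.0 60.B=6 can go through the sysfs config file /sys/bus/pci/devices/0000:00:14.0/config, opened read-write as root. This is a sketch with hypothetical helper names, and setpci itself supports several access methods. The width argument mirrors the .B/.W/.L suffixes, and values are stored little-endian like the rest of configuration space.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes `width` bytes (1, 2 or 4, like setpci's .B, .W, .L) of `value` at
 * `offset` in an open config file; config space is little-endian.
 * Returns 0 on success, -1 on error. */
static int cfg_write(int fd, unsigned offset, unsigned width, uint32_t value)
{
    uint8_t b[4];

    assert(width == 1 || width == 2 || width == 4);
    for (unsigned i = 0; i < width; i++)
        b[i] = (uint8_t)(value >> (8 * i));
    return pwrite(fd, b, width, (off_t)offset) == (ssize_t)width ? 0 : -1;
}

/* Reads `width` bytes at `offset` back into *value. Returns 0 or -1. */
static int cfg_read(int fd, unsigned offset, unsigned width, uint32_t *value)
{
    uint8_t b[4];

    assert(width == 1 || width == 2 || width == 4);
    if (pread(fd, b, width, (off_t)offset) != (ssize_t)width)
        return -1;
    *value = 0;
    for (unsigned i = 0; i < width; i++)
        *value |= (uint32_t)b[i] << (8 * i);
    return 0;
}
```

Usage: open the config file with O_RDWR, call cfg_write(fd, 0x60, 1, 6), then verify with cfg_read(fd, 0x60, 1, &v) or with lspci.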

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

PCIe debugging log

Problems encountered

  • The PCIe link is unstable
  • Configuration space reads and writes work normally, but reads and writes to the memory-mapped space are abnormal

Background

My understanding of PCIe used to be purely conceptual. I only knew it was a high-speed communication protocol, used mainly for high-speed buses within and between boards. Recently, the company happened to be bringing up a PCIe BSP on a PowerPC platform, and that work required some PCIe hardware and software knowledge. Below, I deepen my understanding of the PCIe protocol by working through the actual problems. The scope is limited to engineering practice, i.e. debugging and use in real products.

Troubleshooting process

The PCIe physical-layer link is unstable

After u-boot starts, it prints the PCIe link status:
pcie link status under u-boot
The log shows that the system has brought up 3 PCIe controllers, numbered 0.0.0, 2.0.0 and 4.0.0.
The three numbers are Bus Number.Device Number.Function Number.
One EP device is attached below each of the three Host Bridges, numbered 1.0.0, 3.0.0 and 5.0.0.
Seeing these PCIe bridges and EPs indicates that configuration space reads work and that the PCIe link is up at the physical layer.
You can use pci header + PCIe number to view the configuration information of a Bridge or EP.
pci config space
If the configuration information read back is unstable, the PCIe link is unstable and the hardware needs further investigation. A software-assisted way to check the link is to read the link status register of the PCIe Host Bridge. The PCIe specification defines the LTSSM (Link Training and Status State Machine) and its states, and the controller reports the current state in a register.
On the P3041, this LTSSM status register is in the PCIe extended configuration space at offset 0x404.
Use the command pci display.b 0.0.0 404 to view the LTSSM state.
Normal output:
LTSSM
After initialization, PCIe should reach the L0 state. For abnormal states, see the PCIe link abnormal log.
The physical-layer link is unstable. Suspected causes:
- high-speed serial signal quality
- SerDes power supply
- clock

Measurements essentially ruled out the SerDes power supply.
Oscilloscope measurements essentially ruled out signal integrity problems on the high-speed lines.
For the clock, only its frequency and amplitude had been checked at first, and clock quality issues such as jitter had not really been analyzed. The schematic showed that the Host and EP ends used different PCIe clock ICs, i.e. separate (non-common) reference clocks. Without studying the clock requirements in the PCIe specification in detail, we fed the Host and EP from the same clock source (with a fly-wire, of course), and the physical-layer link problem was solved.
Later, when the project was less pressing, I went back and read the clock requirements in the PCIe specification carefully. The specification does not require the Host and EP to share a clock source. It specifies the accuracy of the reference clock:
Each 100 MHz Refclk must be accurate to ±300 ppm, so with separate clocks the two ends may differ by up to 600 ppm, which the receiver is designed to absorb.
With a common clock, both ends spread or drift together, so their frequencies stay matched.
Next, I looked at the PCIe clock IC chosen in the schematic. On the EP side, it multiplies a 25 MHz input up to 100 MHz and has SSC (Spread Spectrum Clocking) enabled in its configuration. PCIe supports SSC to reduce board-level EMI: SSC deliberately modulates the clock frequency, spreading it down by up to 0.5% (5000 ppm). That is far beyond the ±300 ppm allowed for a clock that is not shared with the other end.
Conclusion: the design used separate reference clocks, and the SSC clock on the EP side deviated from the Host clock far more than the specification allows for that architecture, so an unstable link is to be expected. This also suggested a second fix: turning off SSC on the EP-side clock. With SSC disabled, the link became stable.
In short, the PCIe clock chip supported SSC and had it enabled, and the two ends did not share a clock, so the link was unstable. With SSC off, the clock stays within ±300 ppm and the link is stable.
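The clock tolerances are easier to compare as arithmetic. This is a minimal sketch that assumes a 100 MHz nominal Refclk and a ±300 ppm tolerance per clock; the helper names are hypothetical. Two independent clocks at opposite ends of their tolerance, 100.03 MHz and 99.97 MHz, differ by 600 ppm.

```c
#include <assert.h>

/* Offset of f_hz from nominal_hz in parts per million. */
static double ppm_offset(double f_hz, double nominal_hz)
{
    return (f_hz - nominal_hz) / nominal_hz * 1e6;
}

/* Frequency mismatch between the two ends of a link, in ppm, when each
 * end runs from its own reference clock. */
static double mismatch_ppm(double f_a_hz, double f_b_hz, double nominal_hz)
{
    double d = ppm_offset(f_a_hz, nominal_hz) - ppm_offset(f_b_hz, nominal_hz);
    return d < 0 ? -d : d;
}

/* True if a single clock is within tol_ppm of the nominal frequency. */
static int within_tolerance(double f_hz, double nominal_hz, double tol_ppm)
{
    double p = ppm_offset(f_hz, nominal_hz);
    return p >= -tol_ppm && p <= tol_ppm;
}
```

The same functions apply to any measured frequency. For example, feeding in a clock at 99.5 MHz shows an offset of -5000 ppm.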

Solution:
- Fly-wire both ends to the same clock source.
- Turn off the SSC function of the clock IC on the EP side.

Abnormal reads and writes to the PCIe memory-mapped (memap) space

Problem:

  • PCIe configuration space reads and writes work, but the memory-mapped space cannot be read or written correctly

Root cause:
An address is 36 bits wide, but the software stored it in a 32-bit variable. As a result, the software accessed the wrong location when reading and writing the PCIe memory-mapped space.
Solution: fix the software bug.
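The bug is easy to reproduce in miniature with the 36-bit CPU physical address 0xc20000000 from the example later in this section. Storing it in a 32-bit variable silently drops the top 4 bits and leaves 0x20000000, which is a completely different location. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* What the buggy code effectively did: keep the address in 32 bits. */
static uint32_t store_addr_buggy(uint64_t phys)
{
    return (uint32_t)phys; /* silently drops bits 63:32 */
}

/* The fix: keep physical addresses in a 64-bit type end to end. */
static uint64_t store_addr_fixed(uint64_t phys)
{
    return phys;
}

/* A cheap guard worth adding wherever an address crosses an API boundary. */
static int fits_in_32_bits(uint64_t addr)
{
    return (addr >> 32) == 0;
}
```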

Locating this problem mainly requires knowing the different PCIe-related addresses and how they relate to each other:

  • the BAR in PCIe configuration space
  • the address the CPU uses to reach the PCIe device through its high-speed peripheral (I/O) interface
  • the address the application layer uses to access the PCIe memory-mapped space

    See the figure:
    PCIe addresses

    The figure contains several key addresses. The Virtual Address is the easiest to understand: anyone with a little Linux development experience knows that Linux applications can only access virtual addresses. Driver development, however, often needs to access a specific physical address. For this, Linux provides the mmap system call, which maps a physical address to a virtual address. In addition, the PCIe device uses the PCIe BAR address for reads and writes. The BAR address is the address the PCIe controller uses to access the EP device's memory-mapped space, i.e. the address used by the PCIe protocol.

    Take the memory-mapped space of a PCIe EP as an example of how the addresses are converted. The OS allocates the PCIe memory-mapped window, which usually has a fixed address and size. After the OS boots, the PCIe application maps the physical address of the window to a virtual address with mmap and then reads and writes the EP's memory-mapped space through that virtual address. When the CPU issues a write to the EP's memory-mapped space, the PCIe controller converts the address into the one required by the PCIe protocol according to the configured BAR address. BARs are set during PCIe enumeration, normally to reasonable address values, and the EP maps its memory-mapped space to the configured address.

Here is a concrete example:
The PCIe physical address allocated by the CPU is 0xc20000000, and the BAR is set to 0xe0000000. The virtual address actually used by the application depends on the OS. Different processes that mmap the same physical address may get different virtual addresses.
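The relationship in this example can be sketched as a linear outbound window. CPU physical addresses inside the window are forwarded to the PCIe bus with a fixed offset. This assumes a simple base-plus-offset translation of the kind a PCIe controller's outbound address translation windows perform. The struct, the function and the 256MB window size are hypothetical. With the numbers above, CPU address 0xc20001000 becomes PCIe address 0xe0001000.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outbound window: CPU physical addresses
 * [cpu_base, cpu_base + size) appear on PCIe as [pci_base, pci_base + size). */
struct outbound_window {
    uint64_t cpu_base;
    uint64_t pci_base;
    uint64_t size;
};

/* Translates a CPU physical address to a PCIe bus address.
 * Returns 1 on success, 0 if the address is outside the window. */
static int cpu_to_pci(const struct outbound_window *w, uint64_t cpu, uint64_t *pci)
{
    if (cpu < w->cpu_base || cpu - w->cpu_base >= w->size)
        return 0;
    *pci = w->pci_base + (cpu - w->cpu_base);
    return 1;
}
```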

Summary

Application-layer debugging methods, configuration space setup, understanding the memory-mapped space
PCIe controller configuration and use
The actual debugging of PCIe devices shows that device bring-up and use mainly involve two aspects: the configuration space and the MMIO space. The configuration space is mainly used to configure PCIe devices. Its first 64 bytes (0x00-0x3F) hold the standard header implemented by both PCI and PCIe devices. The PCI and PCIe configuration spaces differ as follows:
PCI/PCI-X and PCIe devices also implement the configuration space 0x40-0xFF. This range mainly holds Capability structures, such as those for the MSI or MSI-X interrupt mechanisms. Every PCIe device that can generate interrupt requests must support the MSI or MSI-X Capability structure.
PCIe devices additionally support the extended configuration space 0x100-0xFFF, so the configuration space of a PCIe device is at most 4KB. The extended space holds Capability structures specific to PCIe, and PCI devices cannot use it.
On x86 processors, the CONFIG_ADDRESS and CONFIG_DATA registers access offsets 0x00-0xFF, while the ECAM method accesses the whole 0x000-0xFFF range. On PowerPC processors, the CFG_ADDR and CFG_DATA registers can access 0x000-0xFFF.
MMIO (Memory-Mapped I/O) is part of the PCI specification: I/O devices are placed in the memory address space instead of the I/O address space. I/O, the channel between the CPU and peripherals, comes in two main kinds: Port I/O and MMIO. With MMIO, the processor accesses devices exactly as it accesses memory. The frame buffer of an AGP/PCIe graphics card, the BIOS and PCI devices can therefore all be accessed with the same instructions used to read and write memory, which simplifies programming and interface design.
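The two x86 configuration mechanisms mentioned above differ mainly in how the address is formed. The sketch below shows both calculations, using hypothetical helper names:
- The CONFIG_ADDRESS value written to I/O port 0xCF8: bit 31 enables the access, followed by bus, device, function and a DW-aligned register offset up to 0xFF.
- The ECAM address, where each function gets its own 4KB window.

The ECAM base is platform-specific; on x86, the ACPI MCFG table reports it. The value 0xE0000000 in the example is only a placeholder.

```c
#include <assert.h>
#include <stdint.h>

/* Value for CONFIG_ADDRESS (I/O port 0xCF8); data then moves through
 * CONFIG_DATA (port 0xCFC). Only offsets 0x00-0xFF are reachable. */
static uint32_t cf8_address(unsigned bus, unsigned dev, unsigned fn, unsigned reg)
{
    return 0x80000000u               /* bit 31: enable */
           | ((bus & 0xFFu) << 16)
           | ((dev & 0x1Fu) << 11)
           | ((fn & 0x7u) << 8)
           | (reg & 0xFCu);          /* DW-aligned offset */
}

/* ECAM address: each function has a 4KB window, so 0x000-0xFFF is reachable. */
static uint64_t ecam_address(uint64_t ecam_base, unsigned bus, unsigned dev,
                             unsigned fn, unsigned reg)
{
    return ecam_base
           + ((uint64_t)(bus & 0xFFu) << 20)
           + ((uint64_t)(dev & 0x1Fu) << 15)
           + ((uint64_t)(fn & 0x7u) << 12)
           + (reg & 0xFFFu);
}
```

For device 00:14.0, offset 0x60, the CONFIG_ADDRESS value is 0x8000A060. With the placeholder base, the ECAM address is 0xE00A0060.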
When porting software that uses a PCIe interface at the lowest level, consider the following:

  1. Determine the width of the address bus: 32-bit, 64-bit or something else.
  2. Determine the endianness of the CPU and of PCIe (PCIe is little-endian).
  3. Check the PCIe Max Payload Size.
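Endianness is a typical porting trap on this project. The P3041 runs big-endian, while PCIe configuration space is little-endian. The layout of MMIO registers is device-specific but usually little-endian as well. Below is a minimal sketch with hypothetical helper names. Decoding byte by byte gives the correct value on either CPU, while a plain 32-bit load is correct only on a little-endian CPU. In Linux kernel drivers, readl()/ioread32() already perform the little-endian conversion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Endian-neutral decode of a little-endian 32-bit value. */
static uint32_t le32_decode(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* A plain native load: right on little-endian CPUs, byte-swapped on big-endian ones. */
static uint32_t native32_load(const uint8_t *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return v;
}

/* Reverses the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0xFF00u) | ((v << 8) & 0xFF0000u) | (v << 24);
}
```

For example, the first four configuration bytes 86 80 2e 34 decode to 0x342e8086 (device 0x342e, vendor 0x8086). A naive load on a big-endian CPU would yield 0x86802e34 instead.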

Appendix


$$$develop from single control storeage based on P3041
$$$Hit any key to stop autoboot:  0 
=> pci display.b 0.0.0 400
00000400: 00 00 00 00 33 00 00 00 e2 04 00 00 00 00 00 00
00000410: 01 00 00 00 19 00 00 00 00 00 00 00 40 40 96 96
00000420: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000430: 00 00 00 00 00 00 00 00 5c 41 c2 00 00 00 00 00
=> pci display.b 0.0.0 400
00000400: 00 00 00 00 02 00 00 00 e2 04 00 00 00 00 00 00
00000410: 01 00 00 00 19 00 00 00 00 00 00 00 40 40 96 96
00000420: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000430: 00 00 00 00 00 00 00 00 5c 41 c2 00 00 00 00 00

Writing PCIe drivers on Linux

https://blog.csdn.net/qq_21792169/article/details/88708428?utm_medium=distribute.pc_aggpage_search_result.none-task-blog-2~all~first_rank_v2~rank_v28-5-88708428.nonecase&utm_term=bar%20linux%20%E6%9F%A5%E7%9C%8Bpcie&spm=1000.2123.3001.4430

The previous article analyzed the PCIe system as a whole, including how the PCIe tree is scanned. This article explains how to write a driver once a PCIe device has been found. The driver can be written in the application layer (user space) or in the kernel.

A driver in the application layer mainly uses the pcilib library and the /dev/mem interface. The code is analyzed below. The initialization code looks up the device by the vendor ID and device ID of the PCIe device and returns an access handle, and the pci_dev pointer points to the device we want to access.

Figure 1

Clean up PCIe device function: pci_cleanup(myaccess);

The read and write configuration space functions are as follows:

Figure 2
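The pcilib code in the figures is not reproduced here. As a dependency-free alternative, the same "find the device by vendor and device ID" step can be done through Linux sysfs. This is a swapped-in technique, not the author's pcilib code. Each device directory under /sys/bus/pci/devices contains vendor and device files holding hex strings such as 0x8086. The helper names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Reads a sysfs ID file such as ".../vendor", which contains e.g. "0x8086\n". */
static int read_hex_file(const char *path, unsigned *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = fscanf(f, "%x", out) == 1; /* %x accepts an optional 0x prefix */
    fclose(f);
    return ok ? 0 : -1;
}

/* Finds the first device under `root` (normally "/sys/bus/pci/devices")
 * with the given vendor/device ID and copies its name, e.g. "0000:00:14.0".
 * Returns 0 on success, -1 if not found or on error. */
static int find_pci_device(const char *root, unsigned vendor, unsigned device,
                           char *name, size_t name_len)
{
    DIR *d = opendir(root);
    struct dirent *e;
    char path[512];
    unsigned v, id;
    int rc = -1;

    if (!d)
        return -1;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;
        snprintf(path, sizeof path, "%s/%s/vendor", root, e->d_name);
        if (read_hex_file(path, &v) != 0 || v != vendor)
            continue;
        snprintf(path, sizeof path, "%s/%s/device", root, e->d_name);
        if (read_hex_file(path, &id) != 0 || id != device)
            continue;
        snprintf(name, name_len, "%s", e->d_name);
        rc = 0;
        break;
    }
    closedir(d);
    return rc;
}
```

The resulting name can then be used to build paths such as .../<name>/config or .../<name>/resource0.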

 

At this point, the program can access the 4KB PCIe configuration space. Writing the configuration space requires root. Next comes the BAR space, which involves two steps: getting its size and mapping it. To get the size of a BAR, first write all 1s to the BAR0 register, then read it back and see which bits changed.

Get the bar space size function:

Figure 3
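BAR sizing works because the low address bits of a BAR are read-only zeros. After all 1s are written, the lowest bit that reads back as 1 gives the size. This sketch handles memory BARs only, because I/O BARs use bits 1:0 as flags and need separate handling. The helper names are hypothetical. In practice, save the original BAR value and restore it afterwards, and disable memory decoding in the Command register while sizing.

```c
#include <assert.h>
#include <stdint.h>

/* Size of a 32-bit memory BAR from the value read back after writing
 * 0xFFFFFFFF. Bits 3:0 are flags (memory/I/O, type, prefetchable) and are
 * masked off; the remaining writable bits encode the size.
 * Returns 0 if the BAR is not implemented. */
static uint32_t mem_bar32_size(uint32_t readback)
{
    uint32_t addr_bits = readback & ~0xFu;

    assert((readback & 0x1u) == 0); /* bit 0 set would mean an I/O BAR */
    return addr_bits ? ~addr_bits + 1 : 0;
}

/* Same for a 64-bit memory BAR, which spans BARn (low) and BARn+1 (high). */
static uint64_t mem_bar64_size(uint32_t lo_readback, uint32_t hi_readback)
{
    uint64_t addr_bits = ((uint64_t)hi_readback << 32) | (lo_readback & ~0xFu);

    return addr_bits ? ~addr_bits + 1 : 0;
}
```

For example, a read-back value of 0xFFF00000 means the BAR decodes 1MB.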

When mapping the BAR space, note that the address the program obtains is already in the CPU's physical address domain, not a PCIe bus address. The /dev/mem interface is an image of the whole physical address space. Through it, mmap maps a given physical address into the process's virtual address space, so the process can access the actual physical address. The mapping function is as follows. The phy_addr parameter is the physical address to map, and size is the size of the region to map.

Figure 4

 

When the program exits, the mapped memory should be released and the /dev/mem device node should be closed.

munmap (vir_addr, size);

close(pcieVirAddr_fd);
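The mapping and cleanup steps can be combined into one minimal sketch of mapping a physical range through /dev/mem; the struct and function names are hypothetical. mmap requires a page-aligned offset, so the code maps from the start of the containing page and then adjusts the pointer. Defining _FILE_OFFSET_BITS as 64 keeps off_t 64-bit on 32-bit systems. Without it, a 36-bit physical address such as 0xc20000000 would be truncated, which is the same class of bug as in the debugging notes above. The code requires root, and the kernel may restrict /dev/mem access (CONFIG_STRICT_DEVMEM).

```c
#define _FILE_OFFSET_BITS 64 /* 64-bit off_t, so 36-bit addresses survive */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

struct phys_map {
    int fd;             /* /dev/mem descriptor */
    void *base;         /* start of the page-aligned mapping */
    size_t len;         /* length passed to mmap */
    volatile void *ptr; /* pointer corresponding to phy_addr */
};

/* Rounds addr down to a multiple of page (a power of two). */
static uint64_t page_align_down(uint64_t addr, uint64_t page)
{
    return addr & ~(page - 1);
}

/* Maps `size` bytes of physical memory at `phy_addr`. Returns 0 on success. */
static int map_phys(uint64_t phy_addr, size_t size, struct phys_map *m)
{
    uint64_t page = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t start = page_align_down(phy_addr, page);
    size_t delta = (size_t)(phy_addr - start);

    m->fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (m->fd < 0)
        return -1;
    m->len = size + delta;
    m->base = mmap(NULL, m->len, PROT_READ | PROT_WRITE, MAP_SHARED, m->fd, (off_t)start);
    if (m->base == MAP_FAILED) {
        close(m->fd);
        return -1;
    }
    m->ptr = (volatile uint8_t *)m->base + delta;
    return 0;
}

/* Releases the mapping, then closes /dev/mem. */
static void unmap_phys(struct phys_map *m)
{
    munmap(m->base, m->len);
    close(m->fd);
}
```

After a successful map_phys, registers can be accessed through (volatile uint32_t *)m.ptr. Because the pointer is volatile, the compiler cannot cache or drop the accesses.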

 

A kernel-space driver follows the same ideas as the application-layer one; only the function interfaces and types differ. Even so, I recommend writing the PCIe driver in the kernel.

Only the core code of the kernel driver is listed here. Download the source code for the full code and the Makefile.

Figure 5 Get the device pointer

Figure 6 Map BAR space

 

If it is only for the convenience of debugging, you can use the system's lspci command to access the configuration space.

 

Source code download address: https://download.csdn.net/download/qq_21792169/11044860 (password: linux)

 


Origin blog.csdn.net/u014426028/article/details/109544492