A detailed explanation of the SATA protocol stack: from serial signals to hard disk reads and writes

Table of contents

1. SATA interface and cable

2. SATA protocol stack overview

2.1 Physical layer overview

2.2 Overview of link layer and transport layer

2.3 Command Layer Overview

3. Detailed explanation of link layer and transport layer

3.1 Link initialization

3.2 List of SATA primitives

3.3 Byte alignment mechanism: ALIGN primitive

3.4 8b10b encoding

3.5 FIS packet structure

3.6 Calculation of CRC

3.7 FIS sending process

3.8 FIS Scrambling/Descrambling

3.9 Repeated scrambling of primitives

3.10 Flow Control

3.11 Summary of link layer and transport layer

4. Command layer: DMA read and write

References


SATA is the most widely used interface protocol for hard drives. This article gives a brief introduction to the SATA protocol stack, aiming to explain the purpose of its various mechanisms along with some of the details. Reading it before tackling the several-hundred-page SATA specification should give you an overall picture first.

I have open-sourced a SATA Gen2 host (HBA) core that runs on Xilinx FPGAs with GTH transceivers. The project includes an example for the official NetFPGA-SUME development board that can read and write a hard disk:

GitHub: open-source SATA HBA that can run on Xilinx FPGAs with GTH: github.com/WangXuan95/FPGA-SATA-HBA

1. SATA interface and cable

The SATA interface is shown in Figure 1. The SATA host bus adapter (HBA) is the hard disk read/write controller; in a computer it is usually implemented by the motherboard chipset (in my open-source project, the HBA is implemented in the FPGA). The SATA device is a hard disk (a mechanical hard disk or a solid-state drive). The two are connected by two differential pairs: on the (SATA_A+, SATA_A-) pair the HBA transmits and the device receives (the TX channel for the HBA, the RX channel for the device); on the (SATA_B+, SATA_B-) pair the device transmits and the HBA receives (the RX channel for the HBA, the TX channel for the device). Both channels run at the same line rate, which depends on the SATA generation:

  • SATA Gen1: 1.5 Gbps
  • SATA Gen2: 3 Gbps
  • SATA Gen3: 6 Gbps

Figure 1: SATA interface (two pairs of differential pairs)

Figure 2 (top) is a photo of the SATA interface of a solid-state drive. The narrower 7-pin port on the left is the signal interface, which contains the two differential pairs (SATA_A, SATA_B). The wider 15-pin port on the right supplies power to the drive. The pin definitions are shown in Figure 2 (bottom).

Figure 2: SATA hard disk interface physical map (top); interface pin definition (bottom)

Figure 3 shows the SATA cables. On the left is the power cable, which converts the 4-pin power connector provided by a desktop power supply to the 15-pin SATA power connector. On the right is the signal cable: one end plugs into the 7-pin port on the hard disk, and the other end plugs into the HBA.

Figure 3: SATA power cable (left); SATA signal cable (right)

2. SATA protocol stack overview

As shown in Figure 4, the SATA protocol stack consists of, from downstream to upstream: the physical layer (PHY), the link layer, the transport layer, and the command layer.

Figure 4: SATA protocol stack

2.1 Physical layer overview

On its downstream side, the physical layer connects to the SATA device through two serial differential pairs; on its upstream side, it exchanges parallel signals with the link layer. The main tasks of the physical layer are:

  • Clock recovery : Unlike common low-speed interfaces (such as the parallel ATA port of older hard disks, at tens of MHz) and medium-speed interfaces (such as DDR3 and MIPI LVDS, at hundreds of MHz), at multi-Gbps rates like SATA's it is difficult to keep separate signal lines aligned with each other. SATA therefore does not transmit the clock on a separate signal line; instead, 8b10b encoding embeds the clock and the data in the same differential pair. The RX channel of the physical layer has to use a phase-locked loop (PLL, an analog circuit) to recover the clock from the serial signal, and the RX data can only be sampled correctly with this recovered clock.
  • Serial-to-parallel conversion : The RX channel uses the recovered clock to convert the RX data into parallel signals in units of 10 bits; the TX channel converts the 10-bit-unit parallel signals provided by the link layer into a serial signal and sends it out. For example, for 3 Gbps SATA Gen2, the parallel signal can be 20 bits wide at 150 MHz, or 40 bits wide at 75 MHz.
  • Byte alignment : Serial-to-parallel conversion raises the question of where the boundaries of the 10-bit units lie in the serial bit stream. SATA uses a special ALIGN primitive to mark this boundary. Under 8b10b encoding, the ALIGN primitive produces a unique 10-bit pattern that the physical layer is responsible for recognizing; whenever it sees this pattern, the receiver knows it is on a 10-bit boundary.
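The relationship between line rate and parallel clock in the serial-to-parallel bullet is simple division, and can be sketched as follows. This is only an illustrative calculation: the function names are made up for this example, and the rates are the nominal values listed in section 1.

```python
# Nominal SATA line rates in bits per second (from the list in section 1).
LINE_RATE_BPS = {
    "Gen1": 1_500_000_000,
    "Gen2": 3_000_000_000,
    "Gen3": 6_000_000_000,
}


def parallel_clock_hz(line_rate_bps: int, width_bits: int) -> int:
    """Clock frequency of the parallel interface for a given width in encoded bits."""
    if width_bits % 10 != 0:
        raise ValueError("width must be a whole number of 10-bit units")
    return line_rate_bps // width_bits


def payload_bytes_per_second(line_rate_bps: int) -> int:
    """Under 8b10b encoding every data byte occupies 10 bits on the wire."""
    return line_rate_bps // 10


if __name__ == "__main__":
    for gen, rate in LINE_RATE_BPS.items():
        print(gen,
              parallel_clock_hz(rate, 20), "Hz @ 20 bit,",
              parallel_clock_hz(rate, 40), "Hz @ 40 bit,",
              payload_bytes_per_second(rate), "bytes/s before protocol overhead")
```

For Gen2 this gives the 150 MHz / 75 MHz figures quoted above; the byte rate is an upper bound, since primitives and CRC also consume bandwidth.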

2.2 Overview of link layer and transport layer

This article treats the link layer and the transport layer together, because the two are tightly coupled and, in my opinion, easier to understand as a whole. Together they implement: 8b10b encoding and decoding, primitive generation and detection (including recognition of FIS packet boundaries), scrambling/descrambling, CRC generation and checking, and flow control. The transport layer exchanges data with the upstream command layer in packets called Frame Information Structures (FIS). The functions of the link and transport layers are as follows:

  • 8b10b encoding and decoding : The parallel data from the RX channel of the physical layer is 8b10b-encoded in 10-bit units, and the link layer decodes each unit into 8 bits (1 byte); the TX channel encodes each 8-bit byte into 10 bits. 8b10b encoding therefore spends one-fifth of the line rate on redundancy. This redundancy exists to distribute 0s and 1s as evenly as possible, so that the receiver can recover the clock from the signal.
  • Primitive insertion and detection : SATA defines a number of primitives, each 4 bytes long. Primitives do not carry data; they are used for communication control. The primitives covered in this article are ALIGN, CONT, SYNC, R_RDY, R_IP, R_OK, R_ERR, X_RDY, SOF, EOF, WTRM, HOLD, and HOLDA. Each primitive has its own function: for example, ALIGN is used for byte alignment, X_RDY tells the other side that the sender wants to send a FIS, SOF marks the start of a FIS, and EOF marks the end of a FIS (SOF and EOF allow the RX channel to find the boundaries of a FIS packet). The TX channel has to insert primitives correctly; the RX channel has to detect them and drive its state machine according to the function of each primitive.
  • FIS scrambling/descrambling : The scrambler and the descrambler each generate a pseudo-random sequence, which is the same fixed sequence after every reset. In the TX channel, the FIS data to be sent is bitwise XORed with the sequence generated by the scrambler (scrambling); in the RX channel, the received FIS data is bitwise XORed with the sequence generated by the descrambler (descrambling). Because XORing twice with the same value leaves the data unchanged, descrambling restores the data exactly. The purpose of scrambling is to make the data on the SATA cable look more random, so that the electromagnetic radiation is closer to white noise (rather than concentrated at certain frequencies), which reduces electromagnetic interference (EMI).
  • Repeated scrambling of primitives : FIS scrambling only reduces EMI while a FIS is being transmitted. When SATA transmits long runs of a repeated primitive, a similar mechanism is needed to reduce EMI: repeated-primitive scrambling, which uses the CONT primitive.
  • CRC generation and verification : The TX channel calculates a CRC over the FIS data and appends it to the end of the FIS; the RX channel calculates a CRC over the received FIS data and compares it with the received CRC (CRC verification). A mismatch means that a bit error occurred during FIS transmission; the FIS must be discarded and the error reported upstream.
  • Flow control : The read/write rate of the storage medium often does not match the rate of the SATA interface, so the transport layer defines a flow control mechanism based on the HOLD and HOLDA primitives. There are two kinds of flow control:
    • Sender flow control : When the sender is not yet ready to send FIS data (for example, because reading from the disk is slower than the SATA interface), it can insert HOLD primitives to fill the gap, which allows it to send data intermittently.
    • Receiver flow control : When the receiver temporarily cannot accept FIS data (for example, because writing to the disk is slower than the SATA interface), it sends HOLD primitives to the sender, in effect saying "don't send so fast, I can't take any more". The sender then pauses the data and sends HOLDA primitives to the receiver to fill the gap.

With so many functions listed above, their logical relationship may be unclear, so Figure 5 shows an example structure of the link layer and transport layer.

Figure 5: Example implementation of link layer and transport layer

2.3 Command Layer Overview

Command layer : Accepts read and write commands from upstream, generates and parses command FISes, and thereby implements hard disk reads and writes. SATA supports the ATA and ATAPI command sets, and each command set includes several ways of reading and writing the disk, such as PIO mode and DMA mode. A complete command layer therefore has to implement a state machine covering many complicated commands, but its purpose is simple: all of these methods exist to read and write the disk. This article only briefly introduces the DMA method, that is, how to send and receive FISes in DMA mode in order to read and write the hard disk.

By now you should have a rough understanding of the SATA protocol stack. The following sections explain the details one by one, from the link layer upward. I am not familiar with the details of the physical layer, so this article does not cover it.

3. Detailed explanation of link layer and transport layer

3.1 Link initialization

After power-on, the HBA and the device must perform link initialization. Before initialization, FIS data cannot be transmitted normally between them, so SATA uses out-of-band (OOB) signals to detect the presence of the other side and carry out link initialization. They are called "out-of-band" signals because they involve driving both lines of the differential pair to the same common voltage, which corresponds to neither logic 0 nor logic 1.

By definition, when the two lines of the differential pair are at different levels (logic 0 or logic 1) the state is called SIGNAL, and when they are at the same level it is called NOSIGNAL. SATA specifies two OOB signals:

  • COMINIT : 6 consecutive SIGNAL bursts of 106 ns each, with 320 ns of NOSIGNAL between adjacent bursts.
  • COMWAKE : 6 consecutive SIGNAL bursts of 106 ns each, with 106 ns of NOSIGNAL between adjacent bursts.
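The two OOB patterns differ only in the length of the NOSIGNAL gap, which the following sketch makes explicit. It uses the nominal durations given above; the function names and the `tolerance_ns` value are assumptions made for illustration and are not taken from the SATA specification.

```python
BURSTS = 6          # number of SIGNAL bursts per OOB signal
BURST_NS = 106      # nominal SIGNAL burst duration
GAP_NS = {"COMINIT": 320, "COMWAKE": 106}  # nominal NOSIGNAL gap between bursts


def oob_timeline(name):
    """Return [(state, duration_ns), ...] describing one OOB signal."""
    gap = GAP_NS[name]
    timeline = []
    for n in range(BURSTS):
        timeline.append(("SIGNAL", BURST_NS))
        if n < BURSTS - 1:          # gaps sit between adjacent bursts only
            timeline.append(("NOSIGNAL", gap))
    return timeline


def classify_gap(gap_ns, tolerance_ns=50):
    """Guess the OOB type from a measured gap; the tolerance is illustrative."""
    for name, nominal in GAP_NS.items():
        if abs(gap_ns - nominal) <= tolerance_ns:
            return name
    return None


if __name__ == "__main__":
    for name in GAP_NS:
        total = sum(duration for _, duration in oob_timeline(name))
        print(name, "lasts", total, "ns")
```

A receiver built this way only needs to measure the gaps: a gap near 320 ns indicates COMINIT and a gap near 106 ns indicates COMWAKE.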

Figure 6 is the timing diagram of link initialization. First, the HBA sends COMINIT and waits for the device to reply with COMINIT; if no COMINIT is received, the HBA can keep sending COMINIT until one is. The HBA then sends COMWAKE to the device and waits for the device to reply with COMWAKE. After that, the HBA continuously sends a special data pattern, DIAL-TONE (an alternating pattern of 1s and 0s), and waits for the device to send ALIGN primitives. Once the device sends ALIGN primitives, the HBA sends ALIGN primitives back to the device, and link initialization is complete. After initialization, the HBA and the device both send continuous SYNC primitives to each other, indicating that they are idle and ready to send and receive FISes.

Figure 6: Timing diagram for link initialization. Extracted from reference [3]
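The HBA side of this sequence can be sketched as a small state machine. The state names are hypothetical, and the last transition (leaving the ALIGN state once a SYNC is seen from the device) is an assumption of this sketch; the text above only says that initialization completes after the HBA sends ALIGN.

```python
# (current state, event received from the device) -> next state
# In each SEND_x state the HBA keeps transmitting x until the event arrives.
TRANSITIONS = {
    ("SEND_COMINIT", "COMINIT"): "SEND_COMWAKE",
    ("SEND_COMWAKE", "COMWAKE"): "SEND_DIAL_TONE",
    ("SEND_DIAL_TONE", "ALIGN"): "SEND_ALIGN",
    ("SEND_ALIGN", "SYNC"): "LINK_READY",   # assumed exit condition
}


def hba_link_init(events):
    """Replay received events and return the list of states the HBA visits."""
    state = "SEND_COMINIT"
    trace = [state]
    for event in events:
        next_state = TRANSITIONS.get((state, event))
        if next_state is not None:   # anything else: stay and keep sending
            state = next_state
            trace.append(state)
    return trace


if __name__ == "__main__":
    # None models "nothing received yet", so the HBA repeats COMINIT.
    print(hba_link_init([None, None, "COMINIT", "COMWAKE", "ALIGN", "SYNC"]))
```

Events that do not match the current state are simply ignored, which models the HBA repeating its current transmission until the device answers.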

3.2 List of SATA primitives

After the link is initialized, SATA is always transmitting either primitives or data (this can be called in-band signaling). Every SATA primitive is 4 bytes (1 dword) long before 8b10b encoding. Whenever a primitive is not being transmitted, data is being transmitted (not necessarily FIS data; it can also be garbage outside a FIS), and data is also transmitted in units of 1 dword.

Dword stands for "double word", which means 4 bytes.

Table 1 lists the primitives and data involved in this article (several less common primitives are omitted). Both the byte form and the dword form are the values before 8b10b encoding.

Note that all dwords in SATA are little-endian: written in byte form, the low byte comes first and the high byte last, and at the physical layer the low byte of a dword is transmitted first and the high byte last. For example, for the ALIGN primitive, byte BC is transmitted first and byte 7B is transmitted last.
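The little-endian byte order can be checked with Python's standard `struct` module. The helper names below are made up for this example.

```python
import struct


def dword_to_wire_bytes(dword: int) -> bytes:
    """Bytes of a dword in the order they are transmitted (low byte first)."""
    return struct.pack("<I", dword)


def wire_bytes_to_dword(data: bytes) -> int:
    """Rebuild a dword from 4 bytes in transmission order."""
    return struct.unpack("<I", data)[0]


if __name__ == "__main__":
    align = 0x7B4A4ABC
    print(dword_to_wire_bytes(align).hex(" ").upper())   # byte BC goes out first
    print(hex(wire_bytes_to_dword(bytes.fromhex("7C95B5B5"))))
```

The `"<I"` format means "little-endian unsigned 32-bit integer", which matches the dword convention described above.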

Table 1 : SATA primitive definition

name      | byte form (hex) | dword form (hex) | first byte is K? | function/meaning
ALIGN     | BC 4A 4A 7B     | 7B4A4ABC         | Yes (K28.5)      | Byte alignment
CONT      | 7C AA 99 99     | 9999AA7C         | Yes (K28.3)      | Repeated-primitive scrambling/descrambling
SYNC      | 7C 95 B5 B5     | B5B5957C         | Yes (K28.3)      | Idle (not transmitting a FIS)
R_RDY     | 7C 95 4A 4A     | 4A4A957C         | Yes (K28.3)      | Ready to receive a FIS
R_IP      | 7C B5 55 55     | 5555B57C         | Yes (K28.3)      | Receiving a FIS
R_OK      | 7C B5 35 35     | 3535B57C         | Yes (K28.3)      | FIS received successfully
R_ERR     | 7C B5 56 56     | 5656B57C         | Yes (K28.3)      | Error receiving a FIS
X_RDY     | 7C B5 57 57     | 5757B57C         | Yes (K28.3)      | Ready to send a FIS
SOF       | 7C B5 37 37     | 3737B57C         | Yes (K28.3)      | Start of the FIS being sent
EOF       | 7C B5 D5 D5     | D5D5B57C         | Yes (K28.3)      | End of the FIS being sent
WTRM      | 7C B5 58 58     | 5858B57C         | Yes (K28.3)      | FIS sent; waiting for the reception result
HOLD      | 7C AA D5 D5     | D5D5AA7C         | Yes (K28.3)      | Flow control
HOLDA     | 7C AA 95 95     | 9595AA7C         | Yes (K28.3)      | Flow control
DIAL-TONE | 4A 4A 4A 4A     | 4A4A4A4A         | No               | Used during link initialization
DATA (ordinary data) | XX XX XX XX | XXXXXXXX  | No               | FIS data or garbage data

Note that the meaning of XXXXXXXX in DATA (ordinary data) in Table 1 is that ordinary data is a dword that can take any value.

After reading Table 1, you may wonder how primitives are distinguished from data: for example, if an ordinary data dword happens to be 0xB5B5957C, the same value as the SYNC primitive, how does the receiver know it is data rather than SYNC? The answer is an extra piece of information: whether the first byte (lowest byte) of the dword is encoded as K. If it is K, the dword is a primitive; otherwise it is ordinary data. Whether a byte is K is effectively 1 extra bit of information, which is carried by the 8b10b encoding (see the section on 8b10b encoding below).

Note that the first byte of DIAL-TONE, which is used during link initialization, is not K. This means that DIAL-TONE is not a primitive and cannot be distinguished from the ordinary data dword 0x4A4A4A4A. In fact, DIAL-TONE only exists as a concept during link initialization: every 0x4A4A4A4A seen before link initialization completes is treated as DIAL-TONE, and every 0x4A4A4A4A seen afterward is treated as ordinary data.
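A minimal sketch of this classification rule follows. It assumes the 8b10b decoder delivers each dword together with a flag saying whether its first byte was a K-byte; the function name and the `link_initialized` parameter are hypothetical.

```python
# Dword values from Table 1 (before 8b10b encoding).
PRIMITIVES = {
    0x7B4A4ABC: "ALIGN", 0x9999AA7C: "CONT",  0xB5B5957C: "SYNC",
    0x4A4A957C: "R_RDY", 0x5555B57C: "R_IP",  0x3535B57C: "R_OK",
    0x5656B57C: "R_ERR", 0x5757B57C: "X_RDY", 0x3737B57C: "SOF",
    0xD5D5B57C: "EOF",   0x5858B57C: "WTRM",  0xD5D5AA7C: "HOLD",
    0x9595AA7C: "HOLDA",
}
DIAL_TONE = 0x4A4A4A4A


def classify(dword, first_byte_is_k, link_initialized=True):
    """Name a received dword using the K flag of its first (lowest) byte."""
    if first_byte_is_k:
        return PRIMITIVES.get(dword, "UNKNOWN_PRIMITIVE")
    if not link_initialized and dword == DIAL_TONE:
        return "DIAL-TONE"
    return "DATA"


if __name__ == "__main__":
    print(classify(0xB5B5957C, first_byte_is_k=True))    # the SYNC primitive
    print(classify(0xB5B5957C, first_byte_is_k=False))   # same value, but data
```

The same 32-bit value is reported as SYNC or as DATA depending only on the K flag, which is exactly the ambiguity the extra bit resolves.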

3.3 Byte alignment mechanism: ALIGN primitive

At the bottom of SATA is a serial signal, which raises the question of where the byte boundaries lie in the serial bit stream. SATA uses a special ALIGN primitive to mark them. ALIGN is the most special of the primitives: it is the only one whose first byte is K28.5, while the first byte of every other primitive is K28.3 (for the meaning of K28.3 and K28.5, see the section on 8b10b encoding below). Under 8b10b encoding, K28.5 produces a unique 10-bit pattern that the physical layer is responsible for recognizing; whenever it sees this pattern, the receiver knows it is on a 10-bit boundary.

Because the clock frequency has a finite accuracy, the receiver can keep track of the 10-bit boundary for some time after seeing an ALIGN, but after long enough the boundary can still be lost. SATA therefore requires both sides to send ALIGN primitives periodically, which is known as the ALIGN insertion mechanism:

  • The SATA specification requires at least 2 consecutive ALIGN primitives for every 256 dwords sent. Inserting them more often is allowed, for example 2 consecutive ALIGN primitives every 128 dwords.
  • Once the link is initialized, the ALIGN insertion mechanism is always active, regardless of whether primitives or data are being sent.
  • On the sending side, ALIGN insertion is done by the link layer.
  • On the receiving side, using ALIGN to find the 10-bit boundary is done by the physical layer. The link layer also receives the ALIGN primitives and simply ignores them.

Note: ALIGN insertion must not replace FIS data or the SOF and EOF primitives that are due to be sent. For example (where "DATA" is a FIS data dword):

// Example of ALIGN insertion
Before ALIGN insertion : X_RDY X_RDY SOF DATA DATA DATA DATA DATA EOF WTRM WTRM WTRM ...
After ALIGN insertion  : X_RDY X_RDY SOF DATA DATA DATA DATA DATA ALIGN ALIGN EOF WTRM WTRM WTRM ...

Here, 2 consecutive ALIGN primitives happen to be due just when EOF is about to be sent. The ALIGN primitives cannot replace EOF; EOF can only be delayed, otherwise the receiver would not be able to find the end of the FIS.
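The insertion rule (delay the pending dword, never replace it) can be sketched as follows. This is one possible scheduling under the stated rule, not the only valid one: it places 2 ALIGN primitives at the start of every `period`-dword window, and dwords are represented by symbolic names as in the example above.

```python
ALIGN = "ALIGN"


def insert_align(stream, period=256):
    """Insert 2 consecutive ALIGNs at the start of every `period`-dword window.

    Pending dwords are delayed, never dropped or replaced. `period` must be
    greater than 2.
    """
    out = []
    sent = 0                      # dwords put on the wire so far
    for dword in stream:
        if sent % period == 0:    # a new window starts: ALIGNs go first
            out += [ALIGN, ALIGN]
            sent += 2
        out.append(dword)
        sent += 1
    return out


if __name__ == "__main__":
    before = "X_RDY X_RDY SOF DATA DATA DATA EOF WTRM WTRM WTRM".split()
    # A tiny period is used only to make the effect visible in a short stream.
    print(" ".join(insert_align(before, period=8)))
```

With `period=8` the second ALIGN pair lands exactly where EOF was due, and EOF is sent right after it, as in the example above.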

3.4 8b10b encoding

8b10b encoding means that the sender encodes each byte (8 bits) of data into a 10-bit code (called a DC-balanced code) for transmission; 8b10b decoding means that the receiver decodes each received 10-bit code back into the original byte. The numbers of logic 0s and logic 1s in a 10-bit code are roughly balanced; there are only 3 cases:

  • 5 logic 1s, 5 logic 0s
  • 4 logic 1s, 6 logic 0s
  • 6 logic 1s, 4 logic 0s

The reason why the DC balance code is transmitted is to enable the phase-locked loop (PLL) of the physical layer at the receiving end to recover the clock from the data without losing lock.

Besides carrying 1 byte, an 8b10b code also carries 1 extra control bit, which indicates whether the byte is a K-byte or a D-byte. SATA uses this bit to identify primitives: the first byte of a primitive is a K-byte and its remaining bytes are D-bytes, while all bytes of data (non-primitives) are D-bytes.

In the context of 8b10b encoding, it is customary to write a raw byte (8 bits) as Kxx.y or Dxx.y, where xx is the decimal value of the lower 5 bits of the byte and y is the decimal value of the upper 3 bits. The lower 5 bits and upper 3 bits are separated because they are encoded separately. An example is shown in Figure 7.

Figure 7: Customary representation of 1byte raw data under 8b10b encoding
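The Kxx.y / Dxx.y naming is just a bit-field split, which this small sketch illustrates (the function name is made up for this example):

```python
def symbol_name(byte: int, is_k: bool = False) -> str:
    """Return the customary 8b10b name (Dxx.y or Kxx.y) of a raw byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in 0..255")
    xx = byte & 0x1F          # lower 5 bits
    y = (byte >> 5) & 0x07    # upper 3 bits
    return f"{'K' if is_k else 'D'}{xx}.{y}"


if __name__ == "__main__":
    print(symbol_name(0xBC, is_k=True))   # first byte of ALIGN
    print(symbol_name(0x7C, is_k=True))   # first byte of the other primitives
    print(symbol_name(0x4A))              # the byte repeated in DIAL-TONE
```

Applied to the first bytes in Table 1, 0xBC with the K flag comes out as K28.5 and 0x7C as K28.3.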

The D-byte used by SATA can be arbitrary, but only two K-bytes are used, namely K28.5 and K28.3. Among them, K28.5 is the first byte of the ALIGN primitive, and K28.3 is the first byte of other primitives.

This article only provides an overview of 8b10b encoding, and does not explain the subsequent details of the encoding algorithm. If you are interested, please refer to Appendix A: 8b/10b Encoding Tutorial in reference [1].

3.5 FIS packet structure

Apart from ALIGN and CONT, the primitives are used to control the sending and receiving of FISes, so we first need to understand the structure of a FIS packet.

Table 2 shows the FIS packet structure. The FIS-type field and the CRC field are each fixed at 1 dword. The Payload field is a variable-length data field of 0 to 2048 dwords. The command layer does not need to send or process the CRC: the transport layer automatically appends the CRC to each TX FIS, and for each RX FIS it checks and removes the CRC and reports to the command layer whether a CRC error occurred.

Table 2 : FIS packet structure

field              | FIS-type                     | Payload                      | CRC
length (byte)      | 4                            | 0~8192                       | 4
length (dword)     | 1                            | 0~2048                       | 1
sending behavior   | Sent by the command layer    | Sent by the command layer    | Not sent by the command layer; inserted by the transport layer
receiving behavior | Visible to the command layer | Visible to the command layer | Not visible to the command layer; the transport layer checks and removes it and reports CRC errors

In what follows, the term FIS length refers to the total length of the FIS-type and Payload fields (minimum 1 dword, maximum 2049 dwords), excluding the CRC.

A FIS is made up of whole dwords (its length in bytes is always a multiple of 4), and it is customary to write a FIS as a sequence of dwords. For example, a FIS with FIS length = 5 looks like this:

// FIS example in hexadecimal: the first dword is FIS-type, followed by the Payload; the CRC is not included
00258027 E0023456 00000012 00000004 00000000

The first dword 00258027 is FIS-type, and the following 4 dwords are the Payload.

Note that FIS data is also little-endian. For example, for the second dword E0023456, the byte transmitted first at the bottom layer is 56, and the byte transmitted last is E0.

3.6 Calculation of CRC

All dwords of FIS-type field and Payload field will participate in CRC calculation in turn. The calculation method is expressed in the pseudocode of Verilog language style as follows:

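// Verilog-style pseudocode
// clk, fis_idle, fis_data_valid and fis_done are placeholder signals for the conditions described in their comments
wire        clk;
wire        fis_idle;         // FIS transfer has not started yet
wire        fis_data_valid;   // a FIS is being transferred and fis_data holds one dword
wire        fis_done;         // the current FIS transfer has ended
wire [31:0] fis_data;     // one FIS data dword (FIS-type or Payload)
reg  [31:0] crc;          // current CRC register
reg  [31:0] crc_next;     // next CRC value; not a real register, only a temporary variable inside the always block
reg         x32;          // not a real register, only a temporary variable inside the always block
integer     i;            // not a real register, only a temporary variable inside the always block
always @(posedge clk)
    if( fis_idle ) begin
        crc <= 32'h52325032;       // reset the CRC to its initial value
    end else if( fis_data_valid ) begin
        crc_next = crc;
        for(i=31; i>=0; i=i-1) begin   // process the dword bit by bit, MSB first
            x32 = crc_next[31] ^ fis_data[i];
            // the constant below is the polynomial 32'h04C11DB7, applied only when x32 is 1
            crc_next = (crc_next<<1) ^ {5'h0, x32, 2'h0, x32, x32, 5'h0, x32, 3'h0, x32, x32, x32, 1'b0, x32, x32, 1'b0, x32, x32, 1'b0, x32, x32, x32};
        end
        crc <= crc_next;         // update the crc register with the computed crc_next
    end else if( fis_done ) begin
        // append crc to the end of the FIS
    end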

On the receiving side, use the same algorithm to calculate the CRC and compare it with the CRC sent by the sending side. If there is a match, there is no error. If there is no match, there is an error. The error needs to be reported to the command layer.
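As a cross-check, the same bit-serial calculation can be written in Python, together with the append step on the sending side and the compare step on the receiving side. This is a sketch that mirrors the pseudocode above; the function names are hypothetical, and the sample FIS is the one from section 3.5.

```python
CRC_INIT = 0x52325032   # initial value used in the pseudocode above
CRC_POLY = 0x04C11DB7   # value of the concatenation constant when x32 is 1


def crc_update(crc: int, dword: int) -> int:
    """Feed one FIS dword into the CRC, bit by bit, MSB first."""
    for i in range(31, -1, -1):
        x32 = ((crc >> 31) ^ (dword >> i)) & 1
        crc = (crc << 1) & 0xFFFFFFFF
        if x32:
            crc ^= CRC_POLY
    return crc


def fis_crc(dwords) -> int:
    """CRC over the FIS-type and Payload dwords, in order."""
    crc = CRC_INIT
    for dword in dwords:
        crc = crc_update(crc, dword)
    return crc


def append_crc(dwords):
    """Sender side: FIS dwords followed by the CRC dword."""
    return list(dwords) + [fis_crc(dwords)]


def check_crc(frame) -> bool:
    """Receiver side: recompute the CRC and compare it with the last dword."""
    if not frame:
        return False
    *body, received_crc = frame
    return fis_crc(body) == received_crc


if __name__ == "__main__":
    fis = [0x00258027, 0xE0023456, 0x00000012, 0x00000004, 0x00000000]
    frame = append_crc(fis)
    print("CRC check passed:", check_crc(frame))
    frame[1] ^= 0x00000100          # flip one bit to simulate a bit error
    print("CRC check after corruption:", check_crc(frame))
```

Flipping any single bit of the frame makes `check_crc` fail, which is the condition the transport layer reports to the command layer.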

3.7 FIS sending process

Apart from ALIGN and CONT, almost all the primitives in Table 1 are used to control the sending and receiving of FISes. An example follows (the periodic insertion of ALIGN primitives and the repeated-primitive scrambling mechanism discussed below are both omitted), where "DATA" represents one FIS data dword.

// Example of the FIS sending process (ignoring ALIGN insertion and repeated-primitive scrambling)
Sender transmits   : SYNC SYNC X_RDY X_RDY X_RDY X_RDY X_RDY  SOF  DATA  DATA  DATA DATA EOF  WTRM WTRM WTRM WTRM SYNC SYNC SYNC SYNC SYNC
Receiver transmits : SYNC SYNC SYNC  SYNC  SYNC  R_RDY R_RDY R_RDY R_RDY R_RDY R_IP R_IP R_IP R_IP R_IP R_OK R_OK R_OK R_OK R_OK SYNC SYNC

The above process is interpreted as follows:

  • When both sides are idle, they continuously send SYNC primitives. This is called the idle state (IDLE).
  • When the sender wants to send a FIS, it starts sending X_RDY primitives continuously.
  • After some delay (caused by signal processing in the link and physical layers), the receiver receives X_RDY. If it is ready to receive a FIS, it starts sending R_RDY primitives continuously.
  • After receiving R_RDY, the sender sends one SOF primitive, then the dwords of the FIS packet one by one (FIS-type, Payload, CRC). The last dword (the CRC) must be followed by one EOF primitive, after which the sender sends WTRM primitives continuously.
  • After receiving SOF, the receiver starts sending R_IP primitives continuously, indicating that reception is in progress. Once the FIS has been completely received, it performs the CRC check. After receiving WTRM, it continuously sends R_OK if the CRC check passed, or R_ERR otherwise.
  • After the sender receives R_OK or R_ERR, it starts sending SYNC continuously.
  • After the receiver receives SYNC, it also starts sending SYNC continuously. Both sides are back in the idle state, and the FIS transfer is finished.
  • If the sender received R_ERR, it reports the error to the upper layer, which decides whether to retransmit the FIS.

This process shows that while one channel carries a FIS, the other channel carries the primitives R_RDY, R_IP, R_OK, and R_ERR, which control the other side's FIS transfer. FISes therefore cannot be sent in both directions at the same time. In other words, SATA is physically full-duplex but logically half-duplex.

Also, since both the HBA and the device may initiate a FIS transfer, it is possible for both to send X_RDY at the same time. SATA specifies that in this case the HBA always yields to the device: as soon as the HBA detects X_RDY from the device, it must abandon its own transfer and send R_RDY instead, ready to receive the FIS from the device.
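The receiver's half of the handshake described in this section can be sketched as a four-state machine. This is a simplified model with assumptions stated up front: dwords are symbolic names, link latency is zero (the response is produced in the same step as the received dword), ALIGN and CONT handling is left out, and the state names and the `crc_ok` parameter are hypothetical.

```python
def receiver_responses(sender_stream, crc_ok=True):
    """Return what the receiver transmits for each dword it receives."""
    state = "IDLE"
    responses = []
    for dword in sender_stream:
        # State transitions triggered by the sender's primitives.
        if state == "IDLE" and dword == "X_RDY":
            state = "READY"
        elif state == "READY" and dword == "SOF":
            state = "RECEIVING"
        elif state == "RECEIVING" and dword == "WTRM":
            state = "DONE"
        elif state == "DONE" and dword == "SYNC":
            state = "IDLE"
        # Primitive transmitted in each state.
        if state == "IDLE":
            responses.append("SYNC")
        elif state == "READY":
            responses.append("R_RDY")
        elif state == "RECEIVING":
            responses.append("R_IP")
        else:
            responses.append("R_OK" if crc_ok else "R_ERR")
    return responses


if __name__ == "__main__":
    tx = "SYNC X_RDY X_RDY SOF DATA DATA EOF WTRM WTRM SYNC".split()
    print(" ".join(receiver_responses(tx)))
    print(" ".join(receiver_responses(tx, crc_ok=False)))
```

In a real link the two streams are offset by the processing delay mentioned above, which is why the example sequence shows the receiver's primitives lagging a few dwords behind the sender's.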

3.8 FIS Scrambling/Descrambling

The purpose of scrambling is to make the 0-1 sequence transmitted on the SATA cable more chaotic, so that the electromagnetic radiation is closer to white noise (rather than concentrated in a certain frequency), thereby reducing electromagnetic interference (EMI).

The FIS scrambler generates a pseudo-random sequence one dword at a time; each dword is bitwise XORed with one dword of FIS data to produce the scrambled FIS. The FIS-type, Payload, and CRC fields of the FIS must all be scrambled. Primitives are never FIS-scrambled.

The process of FIS scrambling is expressed in the pseudo code of Verilog language style as follows:

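// Verilog-style pseudocode
// clk, fis_idle and fis_data_valid are placeholder signals for the conditions described in their comments
wire        clk;
wire        fis_idle;           // FIS transfer has not started yet
wire        fis_data_valid;     // a FIS is being transferred and fis_data holds one dword
wire [31:0] fis_data;        // one FIS data dword before scrambling (FIS-type, Payload or CRC)
reg  [31:0] fis_data_scram;  // one FIS data dword after scrambling
reg  [15:0] scram;           // current scrambler state register
reg  [15:0] scram_next;      // next scrambler state; not a real register, only a temporary variable inside the always block
reg  [31:0] scram_rand;      // pseudo-random dword from the scrambler; not a real register, only a temporary variable
reg         x16;             // not a real register, only a temporary variable inside the always block
integer     i;               // not a real register, only a temporary variable inside the always block
always @(posedge clk)
    if( fis_idle ) begin
        scram <= 16'hFFFF;       // reset the scrambler state to its initial value
    end else if( fis_data_valid ) begin
        scram_next = scram;
        for(i=0; i<32; i=i+1) begin   // generate 32 pseudo-random bits, LSB first
            x16 = scram_next[0];
            scram_next = (scram_next>>1) ^ {x16, 3'h0, x16, 8'h0, x16, 1'b0, x16};
            scram_rand[i] = x16;
        end
        scram <= scram_next;         // update the scram register with the computed scram_next
        fis_data_scram <= fis_data ^ scram_rand;   // bitwise XOR: scramble the FIS data dword
    end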

FIS needs to be descrambled at the receiving end. Descrambling is a symmetrical operation of scrambling. It only needs to use the same algorithm to generate a pseudo-random dword sequence, and perform bitwise XOR operation on the received FIS. Because the data remains unchanged after two XORs, the data before scrambling can be recovered after descrambling.

Note that both the sender and the receiver should reset the scrambler/descrambler when FIS is not transmitting, that is, reset the scram register in the above pseudocode to 0xFFFF. After reset, the generated pseudo-random dword sequence is fixed, and the first 6 dwords are listed here as follows:

// Pseudo-random dword sequence generated by the scrambler/descrambler after reset (only the first 6 are shown)
0xC2D2768D , 0x1F26B368 , 0xA508436C , 0x3452D354 , 0x8A559502 , 0xBB1ABE1B , ......
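The pseudocode above translates directly into Python, which makes it easy to compare the generated sequence with the values listed here. The function names are made up for this sketch; 0x8805 is the value of the concatenation `{x16, 3'h0, x16, 8'h0, x16, 1'b0, x16}` when x16 is 1.

```python
SCRAM_INIT = 0xFFFF   # scrambler state after reset
SCRAM_TAPS = 0x8805   # feedback mask: bits 15, 11, 2 and 0


def scrambler_dwords(count: int, state: int = SCRAM_INIT):
    """Return the first `count` pseudo-random dwords after reset."""
    out = []
    for _ in range(count):
        rand = 0
        for i in range(32):
            x16 = state & 1
            state >>= 1
            if x16:
                state ^= SCRAM_TAPS
            rand |= x16 << i       # output bits fill the dword LSB first
        out.append(rand)
    return out


def scramble(dwords):
    """Scramble (or descramble) a whole FIS, starting from a reset scrambler."""
    return [d ^ r for d, r in zip(dwords, scrambler_dwords(len(dwords)))]


if __name__ == "__main__":
    print(", ".join(f"0x{d:08X}" for d in scrambler_dwords(6)))
    fis = [0x00258027, 0xE0023456, 0x00000012, 0x00000004, 0x00000000]
    print(scramble(scramble(fis)) == fis)   # XORing twice restores the data
```

Because scrambling and descrambling are the same XOR against the same sequence, a single function serves both directions, provided both sides reset their generators before each FIS.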

3.9 Repeated scrambling of primitives

This section will talk about the role of the CONT primitive.

The FIS scrambling mechanism in the previous section only solves the EMI problem while a FIS is being transmitted. Much of the time, however, SATA is not sending a FIS but a repeated primitive (for example, SYNC is sent repeatedly when idle). Such repeating patterns also concentrate the spectrum of the electromagnetic radiation in certain frequency bands and cause EMI problems. SATA therefore introduces the CONT primitive and a scrambling mechanism for repeated primitives.

In the repeated scrambling mechanism, SATA divides primitives into 4 categories:

  • Non-repeatable primitives : SOF, EOF, CONT
  • Repeatable primitives (last repetition need not be retained) : SYNC, R_RDY, R_IP, R_OK, R_ERR, X_RDY, WTRM
  • Repeatable primitives (the last repetition must be kept) : HOLD, HOLDA
  • Primitives that do not affect repeated scrambling at all : ALIGN

For repeatable primitives, if the same primitive is repeated three or more times in a row, SATA requires the third repetition to be replaced with CONT and every repetition from the fourth onward to be replaced with garbage data. The garbage data is a pseudo-random dword sequence generated by the same algorithm as the FIS scrambler in the previous section, but the two pseudo-random generators must work independently and must not affect each other. In addition, the receiver identifies garbage data directly and discards it, so it does not care about the values. SATA therefore does not require the sender to reset the garbage-data generator; it may never be reset at all.

An example is as follows (where "DATA" represents a FIS data dword, and "GARB" represents a garbage data dword):

// Example of repeated scrambling (ignoring ALIGN insertion)
Before repeated scrambling : SYNC SYNC SYNC SYNC X_RDY X_RDY X_RDY SOF DATA  DATA  DATA DATA EOF WTRM WTRM WTRM WTRM WTRM SYNC SYNC SYNC SYNC SYNC
After repeated scrambling  : SYNC SYNC CONT GARB X_RDY X_RDY CONT  SOF DATA  DATA  DATA DATA EOF WTRM WTRM CONT GARB GARB SYNC SYNC CONT GARB GARB

Now we can see that, after link initialization, every dword actually transmitted on SATA is one of three kinds: a primitive, FIS data, or garbage data following a CONT primitive.

There are two special repeatable primitives, HOLD and HOLDA, whose last repetition must be preserved and cannot be replaced with CONT or garbage data. This is because HOLD and HOLDA are inserted in the middle of FIS data (see the flow control section below). To let the receiver tell garbage data apart from FIS data, the last repetition of HOLD or HOLDA must be sent as the primitive itself rather than replaced with CONT or garbage data.

An example of repeated scrambling for HOLD and HOLDA follows. In the incorrect version, "GARB" is immediately followed by "DATA"; since the receiver cannot tell from a dword's value whether it is FIS data or garbage data, this causes confusion. The correct version uses the HOLD and HOLDA primitives to separate the garbage data from the FIS data, so there is no confusion.

// Special handling of HOLD and HOLDA during repeated scrambling (ALIGN insertion ignored)
Before repeated scrambling          : SOF DATA DATA HOLD HOLD HOLD HOLD HOLD HOLD DATA DATA DATA HOLDA HOLDA HOLDA HOLDA DATA EOF WTRM WTRM WTRM WTRM SYNC
After repeated scrambling (WRONG)   : SOF DATA DATA HOLD HOLD CONT GARB GARB GARB DATA DATA DATA HOLDA HOLDA CONT  GARB  DATA EOF WTRM WTRM CONT GARB SYNC
After repeated scrambling (correct) : SOF DATA DATA HOLD HOLD CONT GARB GARB HOLD DATA DATA DATA HOLDA HOLDA CONT  HOLDA DATA EOF WTRM WTRM CONT GARB SYNC

In addition, ALIGN is special with respect to repeated scrambling: inserting ALIGN does not interrupt the repeated scrambling mechanism. In the example below, ALIGN is periodically inserted into the sequence before repeated scrambling. The repeated scrambling mechanism simply ignores the ALIGN primitives, and identical primitives appearing before and after ALIGN are still counted as one run of repetitions.

// Example: ALIGN does not interrupt repeated scrambling
Before repeated scrambling : EOF WTRM WTRM WTRM ALIGN ALIGN WTRM WTRM WTRM WTRM SYNC SYNC SYNC SYNC SYNC ALIGN ALIGN SYNC
After repeated scrambling  : EOF WTRM WTRM CONT ALIGN ALIGN GARB GARB GARB GARB SYNC SYNC CONT GARB GARB ALIGN ALIGN GARB
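
The sender-side rules illustrated by the three examples above can be sketched in a few lines of Python. This is only a symbolic model under stated assumptions: dwords are represented by their names as strings, "GARB" stands in for a pseudo-random dword, and the function name `repeat_scramble` is made up for this illustration. It is not taken from the SATA specification or from any real HBA implementation.

```python
# Symbolic sketch of strict sender-side repeated scrambling.
# Tokens are strings; "GARB" stands in for a pseudo-random garbage dword.
REPEATABLE = {"SYNC", "R_RDY", "R_IP", "R_OK", "R_ERR",
              "X_RDY", "WTRM", "HOLD", "HOLDA"}
KEEP_LAST = {"HOLD", "HOLDA"}   # last repetition must be sent unchanged

def repeat_scramble(seq):
    out = list(seq)
    # ALIGN is transparent: runs are measured over the non-ALIGN tokens only.
    idx = [i for i, tok in enumerate(seq) if tok != "ALIGN"]
    k = 0
    while k < len(idx):
        tok = seq[idx[k]]
        end = k + 1
        if tok in REPEATABLE:
            while end < len(idx) and seq[idx[end]] == tok:
                end += 1
            run = idx[k:end]
            for n, i in enumerate(run):
                is_last = (n == len(run) - 1)
                if n < 2 or (is_last and tok in KEEP_LAST):
                    continue                      # sent as the primitive itself
                out[i] = "CONT" if n == 2 else "GARB"
        k = end
    return out

before = "DATA HOLD HOLD HOLD HOLD HOLD HOLD DATA EOF WTRM WTRM WTRM WTRM SYNC"
print(" ".join(repeat_scramble(before.split())))
# DATA HOLD HOLD CONT GARB GARB HOLD DATA EOF WTRM WTRM CONT GARB SYNC
```

Grouping the tokens into runs first keeps the HOLD/HOLDA exception simple, because "the last repetition" is only known once the end of the run is known. A hardware sender would instead need one dword of lookahead.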

Finally, it should be mentioned that the repeated scrambling rules are actually fairly loose on the sending side. The sender may apply them loosely, which simplifies some logic:

  • CONT does not have to be inserted at the 3rd repetition. It may be inserted at the 4th, 5th, or a later repetition, followed by garbage data, but never earlier than the 3rd.
  • The sender may let an inserted ALIGN interrupt the current repeated scrambling process.
  • The sender may treat all other repeatable primitives the same way as HOLD and HOLDA, that is, leave the last repetition unreplaced. This way the two cases do not need to be distinguished.
  • The sender may even skip repeated scrambling entirely and send all primitives unchanged. This does not affect any functionality on the receiving end, as long as you don't care about EMI.

However, the receiver must be able to correctly handle all cases that meet the specification.
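
To make the receiver's job concrete, here is a small Python sketch of how a receiver can undo repeated scrambling. It is a symbolic model, not an implementation from the specification: dwords are strings, any token that is not a known primitive name is treated as a plain data dword (which is all a real receiver can tell), and `restore_repeats` and the `PRIMITIVES` set are names invented for this example. The set only lists the primitives discussed in this article.

```python
# Symbolic sketch of receiver-side handling of CONT and garbage data.
PRIMITIVES = {"ALIGN", "CONT", "SOF", "EOF", "SYNC", "R_RDY", "R_IP",
              "R_OK", "R_ERR", "X_RDY", "WTRM", "HOLD", "HOLDA"}

def restore_repeats(rx):
    out = []
    last_prim = None     # most recent primitive other than ALIGN/CONT
    in_cont = False      # True while we are inside a CONT run
    for tok in rx:
        if tok == "ALIGN":
            continue                     # dropped; does not end a CONT run
        if tok == "CONT":
            in_cont = True
            if last_prim is not None:
                out.append(last_prim)    # CONT counts as one more repetition
        elif tok in PRIMITIVES:
            in_cont = False              # any other primitive ends the run
            last_prim = tok
            out.append(tok)
        elif in_cont:
            out.append(last_prim)        # data dword after CONT is garbage
        else:
            out.append("DATA")           # ordinary FIS data dword
    return out

good = "DATA HOLD HOLD CONT GARB GARB HOLD DATA DATA EOF".split()
bad = "DATA HOLD HOLD CONT GARB GARB GARB DATA DATA EOF".split()
print(restore_repeats(good).count("DATA"))   # 3: all FIS data recovered
print(restore_repeats(bad).count("DATA"))    # 1: two data dwords were lost
```

Because the function only reacts to what it sees, it also accepts the relaxed sender behaviors listed above: a late CONT, an unreplaced last repetition, or no repeated scrambling at all. The second call shows why the last HOLD must be sent unchanged: without it, the FIS data that follows is mistaken for garbage.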

The repeated scrambling mechanism is summarized as the following principles:

  • Only repeatable primitives  (SYNC, R_RDY, R_IP, R_OK, R_ERR, X_RDY, WTRM, HOLD, HOLDA) will participate in the repeat scrambling mechanism.
  • CONT replaces the third repetition or a later one, never the first or second.
  • Garbage data must follow CONT continuously. Garbage data can only be interrupted by the ALIGN primitive, and must remain continuous in all other cases.
  • The last repetition of HOLD and HOLDA must be sent unchanged and cannot be replaced by CONT or garbage data.
  • The repeated scrambling of primitives does not affect the FIS data scrambling, and the two work independently.

3.10 Flow Control

This section describes the role of the HOLD and HOLDA primitives.

Because the read/write rate of the hard disk medium does not match the rate of the SATA interface, the transport layer specifies a flow control mechanism based on the HOLD and HOLDA primitives. There are two types of flow control: sender flow control and receiver flow control.

Sender flow control: When the sender is not yet ready to send FIS data (for example, the hard disk is read more slowly than the SATA interface can transmit), the sender can insert HOLD primitives to fill the gap, which lets it send data intermittently. The flow control logic of the sender is as follows:

  • When the sender sends FIS data, if the next dword data is not ready yet, it will send HOLD until the data is ready.
  • If the receiving party detects HOLD, it will send HOLDA to tell the other party "I know you are not ready". On the contrary, if it detects FIS data, it will send R_IP normally.

Examples are as follows:

// Sender flow control example (ALIGN insertion and repeated scrambling of primitives ignored)
Sender transmits   : X_RDY  SOF  DATA  DATA DATA HOLD DATA DATA  DATA HOLD HOLD HOLD  HOLD  DATA  EOF   WTRM WTRM
Receiver transmits : R_RDY R_RDY R_RDY R_IP R_IP R_IP R_IP HOLDA R_IP R_IP R_IP HOLDA HOLDA HOLDA HOLDA R_IP R_OK

Receiver flow control: When the receiver temporarily cannot accept FIS data (for example, the hard disk is written more slowly than the SATA interface can transmit), the receiver can send the HOLD primitive to the sender, telling it "don't send so fast, I can't accept any more". The sender then suspends sending data and sends the HOLDA primitive to the receiver to fill the gap.

Note: There is a round-trip delay between the moment the receiver sends HOLD and the moment HOLDA arrives from the sender. The receiver therefore needs a receive buffer and must send HOLD when the buffer is nearly full rather than completely full, so that the buffer can still absorb the data transmitted during this delay without overflowing. The SATA spec defines "nearly full" as only 20 dwords (80 bytes) of free space remaining, and accordingly requires that this delay be no longer than the transmission time of 20 dwords.

The logic of receiver flow control is as follows:

  • When receiving FIS data, if the receiver finds that only 20 dwords of space are left in the receive buffer, it stops sending R_IP and sends HOLD continuously. Once enough data has been taken out of the receive buffer and the free space is sufficient again, it resumes sending R_IP.
  • When the sender is sending FIS data and receives HOLD, it suspends sending data and sends HOLDA to fill the gap. It resumes sending data once it receives R_IP again.

Examples are as follows:

// Receiver flow control example (ALIGN insertion and repeated scrambling of primitives ignored)
Sender transmits   :  SOF  DATA  DATA  DATA DATA DATA DATA DATA HOLDA HOLDA HOLDA HOLDA HOLDA DATA DATA EOF  WTRM WTRM WTRM
Receiver transmits : R_RDY R_RDY R_RDY R_IP R_IP HOLD HOLD HOLD HOLD  HOLD  HOLD  R_IP  R_IP  R_IP R_IP R_IP R_IP R_IP R_OK
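
The reasoning behind the 20-dword margin can be checked with a toy simulation. The Python sketch below is a simplified model under several assumptions that are not from the SATA spec: the link carries one dword per cycle, both directions have the same one-way delay, the consumer removes one dword from the buffer every `drain_period` cycles, and the buffer occupancy is allowed to grow past `capacity` so that an overflow shows up as a peak larger than the capacity. All function and parameter names are made up for this example.

```python
from collections import deque

def simulate_receiver_flow_control(total_dwords, capacity, one_way_delay,
                                   drain_period, threshold=20):
    """Return the peak receive-buffer occupancy (in dwords) for one FIS."""
    # Two delay lines model the two directions of the link.
    to_sender = deque(["R_IP"] * one_way_delay)
    to_receiver = deque(["HOLDA"] * one_way_delay)
    fill = peak = sent = received = cycle = 0
    while received < total_dwords:
        # Receiver: request a pause once free space is down to the threshold.
        status = "HOLD" if capacity - fill <= threshold else "R_IP"
        to_sender.append(status)
        seen = to_sender.popleft()          # what the sender sees this cycle
        # Sender: answer HOLD with HOLDA, otherwise send the next data dword.
        if seen == "R_IP" and sent < total_dwords:
            tx = "DATA"
            sent += 1
        else:
            tx = "HOLDA"
        to_receiver.append(tx)
        if to_receiver.popleft() == "DATA":  # dword arriving at the receiver
            fill += 1
            received += 1
            peak = max(peak, fill)
        # Slow consumer: take one dword out every drain_period cycles.
        if cycle % drain_period == 0 and fill > 0:
            fill -= 1
        cycle += 1
    return peak

# Round trip of 20 dwords (2 x 10): the 64-dword buffer never overflows.
print(simulate_receiver_flow_control(200, 64, 10, 8) <= 64)   # True
# Round trip of 30 dwords (2 x 15): the peak exceeds the buffer size.
print(simulate_receiver_flow_control(200, 64, 15, 8) > 64)    # True
```

In this model, every data dword that arrives was sent in response to an R_IP issued one round trip earlier, when more than 20 dwords were still free. With a round trip of at most 20 dwords, at most 20 further dwords can arrive before the pause takes effect, so the buffer cannot overflow.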

3.11 Summary of link layer and transport layer

We have seen that the link layer and transport layer contain many complex mechanisms, but their purpose is not complicated: to achieve reliable FIS packet transmission over two high-speed serial differential pairs. To that end they handle clock recovery, byte alignment, EMI reduction, bit error checking, and flow control when rates do not match. The reader is advised to review Figure 5 to understand the relationship between the various mechanisms.

4. Command layer: DMA read and write

Everything explained above belongs to the link layer and transport layer. This section briefly explains the DMA read and write commands of the command layer, and shows how FIS packets are used to read and write the hard disk.

As mentioned earlier, the first dword of a FIS is the FIS-type field, which determines the FIS type, as shown in Table 3.

Table 3: FIS types

FIS-type field             FIS type                  FIS length in dwords
(hex, X = don't care)                                (FIS-type + Payload)
XXXXXX27                   HBA to device register    5
XXXXXX34                   device to HBA register    5
XXXXXXA1                   set device bits           2
XXXXXX5F                   PIO setup                 5
XXXXXX39                   DMA activate              1
XXXXXX41                   First Party DMA Setup     7
XXXXXX46                   data                      1~2049
XXXXXX58                   BIST activate             3

To perform simple DMA read and write, in fact, only four types of FIS are needed: XXXXXX27, XXXXXX34, XXXXXX39, and XXXXXX46.
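
As a small illustration of how Table 3 is used, the following Python sketch classifies a FIS by the low byte of its first dword, since the upper three bytes are "don't care" for this purpose. The dictionary simply restates the table; the names `FIS_TYPES` and `fis_type_name` are invented for this example.

```python
# Map the low byte of the first dword to the FIS type names of Table 3.
FIS_TYPES = {
    0x27: "HBA to device register",
    0x34: "device to HBA register",
    0xA1: "set device bits",
    0x5F: "PIO setup",
    0x39: "DMA activate",
    0x41: "First Party DMA Setup",
    0x46: "data",
    0x58: "BIST activate",
}

def fis_type_name(first_dword):
    # Only bits [7:0] select the type; the other 24 bits are ignored here.
    return FIS_TYPES.get(first_dword & 0xFF, "unknown")

print(fis_type_name(0x00258027))   # HBA to device register
print(fis_type_name(0x00000046))   # data
```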

After the link is initialized and before any read or write, the HBA needs to send an identify request FIS to the device. This FIS is of the HBA-to-device Register type and contains 5 dwords:

// FIS that issues the identify request, HBA->device. The first dword is the FIS-type, followed by the Payload. CRC is not included here.
00EC8027 00000000 00000000 00000000 00000000

The device responds with two FISs:

  • The first one is FIS of type PIO setup.
  • The second one is a FIS of data type. Its Payload is fixed at 128 dwords and contains various information about the hard disk (see [1] for details; it is not explained further here).

Then read and write requests can be issued. DMA reads and writes work in units of sectors. Each sector is 512 bytes (128 dwords), and each request specifies the number of consecutive sectors to read or write. Sectors are addressed with a 48-bit LBA (logical block address); for example, LBA=0x000001234567 refers to sector 0x1234567.

Figure 8: DMA read timing diagram

The timing diagram of a DMA sector read is shown in Figure 8. First, the HBA sends a 5-dword read request FIS of the HBA-to-device Register type, in the format below. XXXXXX is LBA[23:0], YYYYYY is LBA[47:24], and ZZ is the number of sectors to read; one or more sectors can be read at a time.

// FIS that issues a DMA read request, HBA->device. The first dword is the FIS-type, followed by the Payload. CRC is not included here.
00258027 E0XXXXXX 00YYYYYY 000000ZZ 00000000

For example, if we want to read LBA=0x0000A1234567 (that is, the 0xA1234567th sector), and read 4 sectors continuously, then the HBA should send the command FIS:

// Example FIS that issues a DMA read request, HBA->device. The first dword is the FIS-type, followed by the Payload. CRC is not included here.
00258027 E0234567 000000A1 00000004 00000000
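
The field packing shown above can be written as a short Python helper. This sketch follows the 5-dword layout used in this article (command byte in bits [23:16] of the first dword, 0xE0 in the top byte of the second dword, an 8-bit sector count ZZ). The function name and the range checks are assumptions made for the example, and the same helper covers the write request described later by passing command byte 0x35 instead of 0x25.

```python
def build_dma_command_fis(command, lba, sector_count):
    """Build the 5-dword HBA-to-device Register FIS (CRC not included)."""
    if not 0 <= lba < (1 << 48):
        raise ValueError("LBA must fit in 48 bits")
    if not 1 <= sector_count <= 0xFF:
        raise ValueError("this sketch only covers the 8-bit ZZ field")
    return [
        (command << 16) | 0x8027,          # 00CC8027: command byte + FIS-type
        0xE0000000 | (lba & 0xFFFFFF),     # E0XXXXXX: LBA[23:0]
        (lba >> 24) & 0xFFFFFF,            # 00YYYYYY: LBA[47:24]
        sector_count,                      # 000000ZZ: number of sectors
        0x00000000,
    ]

fis = build_dma_command_fis(0x25, 0x0000A1234567, 4)
print(" ".join(f"{dword:08X}" for dword in fis))
# 00258027 E0234567 000000A1 00000004 00000000
```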

Then the hard disk sends the read data (FIS of data type). Because the Payload of a FIS is at most 2048 dwords, if the number of sectors to be read is ≤16 (≤2048 dwords), the hard disk responds with only 1 data FIS. Otherwise it responds with multiple data FISs. The format is:

// DMA read data sent by the hard disk, device->HBA. The first dword is the FIS-type, followed by the Payload (the data that was read). CRC is not included here.
00000046 XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX ...

After all the data is sent, the device also sends a 5-dword FIS of the device-to-HBA Register type to the HBA to report its status.

Figure 9: DMA write timing diagram

The timing diagram of a DMA sector write is shown in Figure 9. First, the HBA sends a 5-dword write request FIS of the HBA-to-device Register type, in the format below. XXXXXX is LBA[23:0], YYYYYY is LBA[47:24], and ZZ is the number of sectors to write; one or more sectors can be written at a time.

// FIS that issues a DMA write request, HBA->device. The first dword is the FIS-type, followed by the Payload. CRC is not included here.
00358027 E0XXXXXX 00YYYYYY 000000ZZ 00000000

For example, if we want to write to LBA=0x000000000001 and only write 1 sector, then HBA should send the command FIS:

// Example FIS that issues a DMA write request, HBA->device. The first dword is the FIS-type, followed by the Payload. CRC is not included here.
00358027 E0000001 00000000 00000001 00000000

Then the hard disk responds with a FIS of DMA activate type. This FIS is only 1 dword (Payload length = 0) and tells the HBA that it may now send the write data. The format is as follows.

// FIS that tells the HBA it may send data, device->HBA. The first dword is the FIS-type and there is no Payload. CRC is not included here.
00000039

The HBA then sends the write data (FIS of data type). Because the Payload of a FIS is at most 2048 dwords, if the number of sectors to be written is ≤16 (≤2048 dwords), the HBA sends 1 data FIS. Otherwise, it sends multiple data FISs. The format is:

// DMA write data sent by the HBA, HBA->device. The first dword is the FIS-type, followed by the Payload (the data to be written). CRC is not included here.
00000046 XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX ...
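
The arithmetic behind "≤16 sectors fit in one data FIS" is 16 × 128 dwords = 2048 dwords. The Python sketch below computes the Payload length of each data FIS for a given sector count. It assumes that every FIS except the last is filled to the 2048-dword maximum, which is only one possible split and is not required by anything stated in this article; the function name is hypothetical.

```python
DWORDS_PER_SECTOR = 128      # 512 bytes per sector
MAX_PAYLOAD_DWORDS = 2048    # maximum Payload of one data FIS

def data_fis_payload_lengths(sector_count):
    """Payload length (in dwords) of each data FIS, filling each one greedily."""
    remaining = sector_count * DWORDS_PER_SECTOR
    lengths = []
    while remaining > 0:
        chunk = min(remaining, MAX_PAYLOAD_DWORDS)
        lengths.append(chunk)
        remaining -= chunk
    return lengths

print(data_fis_payload_lengths(16))   # [2048]
print(data_fis_payload_lengths(17))   # [2048, 128]
```

Each data FIS on the wire is one dword longer than its Payload, because the 00000046 FIS-type dword comes first.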

After all the data is sent, the device also sends a 5-dword FIS of the device-to-HBA Register type to the HBA to report its status.

We now have a basic understanding of how the hard disk is read and written. To learn more about the command layer protocol, please read reference [1].

References

[1] SATA Storage Technology: https://www.mindshare.com/Books/Titles/SATA_Storage_Technology

[2] Serial ATA: High Speed Serialized AT Attachment: https://www.seagate.com/support/disc/manuals/sata/sata_im.pdf

[3] Nikola Zlatanov: Design of an Open-Source SATA Core: https://www.researchgate.net/publication/295010956_Design_of_an_Open-Source_SATA_Core

[4] Louis Woods et al.: Groundhog - A Serial ATA Host Bus Adapter (HBA) for FPGAs: https://ieeexplore.ieee.org/abstract/document/6239818/

[5] Open source SATA Gen2 host (HBA): https://github.com/WangXuan95/FPGA-SATA-HBA

Origin blog.csdn.net/cy413026/article/details/131904690