Understanding SerDes Part 2

2.3 Receiver Equalizer (Rx Equalizer)

2.3.1 Linear Equalizer

The receiver equalizer has the same goal as the transmit equalizer. Low-speed (<5 Gbps) SerDes usually equalize in the continuous-time domain with a linear equalizer, typically implemented as a peaking amplifier whose gain for high-frequency components is greater than its gain for low-frequency components. Figure 2.8 shows the frequency-domain characteristics of such a linear equalizer. Vendors usually package the equalization characteristics into several levels, such as High/Med/Low, which can be selected dynamically to suit different channel characteristics.


Figure 2.8 Frequency Response of a Peaking-Amplifier-Based Rx Equalizer

2.3.2 DFE Equalizer (Decision Feedback Equalizer)

For high-speed (>5 Gbps) SerDes, the signal jitter (such as ISI-related deterministic jitter) may approach or exceed one unit interval (UI), so a linear equalizer alone is no longer sufficient. A linear equalizer amplifies noise together with the signal, so it improves neither the SNR nor the BER. High-speed SerDes therefore add a nonlinear equalizer called the DFE (Decision Feedback Equalizer). The DFE adjusts the decision threshold for the current bit based on the data received over the past several UIs (the history bits). Because it works from bits that have already been decided rather than from the noisy analog input, the DFE corrects the signal without amplifying the noise, which effectively improves the SNR.

Figure 2.9 shows a typical fifth-order DFE. A comparator (slicer) decides whether each received bit is a 0 or a 1. The decided data stream then passes through a filter that predicts the inter-symbol interference (ISI), and the predicted ISI is subtracted from the incoming signal to yield a clean signal. To keep the DFE circuit operating within its linear range, the serial signal first passes through a VGA, which automatically controls the amplitude of the signal entering the DFE.

 


To understand how a DFE works, first look at the impulse response of a 10 Gbps backplane. The backplane model is a measured model supplied with MATLAB and has typical characteristics.

In Figure 2.10, each horizontal grid division represents one UI. A pulse one UI wide (0.1 ns = 1/10 GHz) spreads into several adjacent UIs, both before and after it, as it passes through the backplane, and so interferes with the data in those UIs. Interference that falls after the sampling point is called post-cursor interference; interference that falls before the sampling point is called pre-cursor interference. The first DFE coefficient h1 (0.175 in this example) corrects the first post-cursor, and the second coefficient h2 (0.075 in this example) corrects the second post-cursor. The higher the order of the DFE, the more post-cursors it can correct.

 


Consider transmitting the bit pattern 11011 over this backplane. Because of the post-cursor and pre-cursor leakage, the '0' cannot be recognized without equalization (see Figure 2.11). With a second-order DFE, the h2 contribution of the first '1' bit and the h1 contribution of the second '1' bit are subtracted from the amplitude at the '0' bit, giving 0.35 - 0.075 - 0.175 = 0.1, which is low enough to be identified as 0.
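The subtraction in this example can be reproduced in a few lines of Python. This is only a sketch of the arithmetic: the function name `dfe_correct` and the list-based bit history are illustrative assumptions, and the raw amplitude 0.35 is the value quoted in the text for the '0' bit.

```python
def dfe_correct(raw_sample, history_bits, taps):
    """Subtract the post-cursor ISI predicted from previously decided bits.

    history_bits[0] is the bit one UI earlier, history_bits[1] two UIs
    earlier, and so on; taps[0] is h1, taps[1] is h2.
    """
    isi = sum(h * bit for h, bit in zip(taps, history_bits))
    return raw_sample - isi


# Pattern 11011: the '0' bit is preceded by two '1' bits.
corrected = dfe_correct(0.35, history_bits=[1, 1], taps=[0.175, 0.075])
print(round(corrected, 3))  # 0.1
```

If both history bits were 0, nothing would be subtracted and the sample would pass through unchanged, which is why the correction depends on the decided data and not only on the channel.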

As this example shows, the DFE computes the post-cursor interference caused by the history bits and subtracts it from the current bit to obtain a clean signal. Since the DFE can only correct post-cursor ISI, it is usually preceded by a linear equalizer (LE). As long as the DFE coefficients are close to the pulse response of the channel, ideal results can be obtained. However, the channel is a time-varying medium: slow changes in factors such as temperature and voltage alter its characteristics. The DFE coefficients therefore need an adaptive algorithm that automatically acquires and tracks the channel. The theory of DFE coefficient adaptation is heavily academic, and each vendor keeps its own algorithm confidential. For NRZ signaling, the typical criterion is a sign-error-driven algorithm. The sign-error is the error between the amplitude of the equalized signal and its expected value. The algorithm takes minimizing the mean square of the sign-error as its optimization goal and optimizes h1, h2, h3, ... in turn.

Because the sign-error and the sampling position are coupled and influence each other, the DFE coefficients can also be adapted using both sign-error and eye width as criteria. For this reason, a SerDes with a DFE usually has a built-in eye-diagram test circuit, as shown in Figure 2.9. The circuit shifts the signal amplitude in the vertical direction and the sampling position in the horizontal direction, measures the bit error rate (BER) at each offset, and so builds an "eye diagram" that maps each offset position to its BER (see Figure 2.12).
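Since vendor adaptation algorithms are not public, the following is only a textbook sign-sign LMS sketch of sign-error-driven adaptation, not any vendor's method. It assumes a noiseless channel with a normalized main cursor of 1.0 (a hypothetical value), the post-cursors h1 = 0.175 and h2 = 0.075 from the example above, NRZ symbols coded as +1/-1, and a known target amplitude. The names `adapt_dfe_taps`, `mu`, and `n_symbols` are illustrative.

```python
import random


def sign(x):
    """Return +1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def adapt_dfe_taps(channel, n_symbols=20000, mu=0.001, seed=0):
    """Adapt DFE taps with sign-sign LMS on a noiseless model channel.

    channel[0] is the main cursor; channel[1:] are the post-cursors.
    """
    rng = random.Random(seed)
    main, post = channel[0], channel[1:]
    taps = [0.0] * len(post)
    sent = [1] * len(post)      # transmitted history, newest first
    decided = [1] * len(post)   # decided history, newest first
    for _ in range(n_symbols):
        d = rng.choice((-1, 1))
        # Received sample: main cursor plus post-cursor ISI of earlier symbols.
        y = main * d + sum(h * s for h, s in zip(post, sent))
        # Equalized sample: subtract the ISI predicted from decided bits.
        z = y - sum(w * b for w, b in zip(taps, decided))
        bit = 1 if z >= 0 else -1
        # Error between the equalized amplitude and its expected value.
        err = z - main * bit
        # Only the signs of the error and of the history bits are used.
        taps = [w + mu * sign(err) * b for w, b in zip(taps, decided)]
        sent = [d] + sent[:-1]
        decided = [bit] + decided[:-1]
    return taps


taps = adapt_dfe_taps([1.0, 0.175, 0.075])
print([round(w, 3) for w in taps])
```

Under these assumptions the taps should settle close to h1 and h2, dithering by roughly a few step sizes `mu`. A smaller `mu` reduces the dither but slows tracking of channel changes.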

Figure 2.12 SerDes Embedded Eye-Diagram Test Function

 

2.4 Clock Data Recovery (CDR)

The goal of the CDR is to find the best sampling instant, which requires the data to contain plenty of transitions. One CDR specification is its tolerance of the longest run of consecutive 0s or 1s (Max Run Length, also called Consecutive Identical Digits). If the data has no transitions for a long time, the CDR cannot be trained accurately and its sampling instant drifts, so it may sample more 1s or 0s than were actually sent. When transitions resume, it may also sample incorrectly. For example, some CDRs are implemented with a PLL, and if the data stops toggling for a long time the PLL output frequency drifts. In practice, the data transmitted over a SerDes is either scrambled or encoded to keep the Max Run Length within a bounded range.

- 8B/10B encoding ensures that the Max Run Length does not exceed 5 UI.

- 64B/66B encoding ensures that the Max Run Length does not exceed 66 UI.

- SONET/SDH scrambling ensures that the Max Run Length does not exceed 80 UI (BER < 10^-12).
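As a small illustration of what Max Run Length measures, the helper below counts the longest run of identical symbols in a bit sequence. The function name `max_run_length` and the string representation of the bits are assumptions made for this sketch.

```python
from itertools import groupby


def max_run_length(bits):
    """Return the longest run of identical consecutive symbols in `bits`."""
    return max((len(list(run)) for _, run in groupby(bits)), default=0)


print(max_run_length("1100000111"))  # 5
```

A captured bit stream could be checked this way against the run-length limit that a given CDR is specified to tolerate.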

In point-to-point connections, most SerDes protocols use continuous mode, in which the data stream on the line is continuous and uninterrupted. Point-to-multipoint connections, such as PON, often use burst mode. Burst mode places strict requirements on the SerDes lock time.

Continuous-mode protocols such as SONET/SDH require the CDR to tolerate long runs of consecutive 0s, and they also place strict requirements on the jitter transfer performance of the CDR (because of loop timing).

If the receiver (Rx) and transmitter (Tx) run asynchronously, or in spread-spectrum clocking (SSC) applications, the CDR needs a wider phase tracking range to track the frequency difference between Rx and Tx.

Depending on the requirements of the application, a CDR can be implemented with many different architectures. FPGA SerDes commonly use digital-PLL-based CDRs and phase-interpolator-based CDRs. Both use a digital filter in the loop, which saves area compared with an analog charge pump plus an analog filter.


Figure 2.13 shows a CDR based on a phase interpolator. The phase detector array compares the phase of the incoming serial data against M clocks with equally spaced phases, over a span of several UIs, to obtain phase error signals across those UIs. The phase error signal has a very high rate and a very wide bus width, so a decimator reduces its rate and smooths it before passing it to the digital filter. The performance of the digital filter determines the bandwidth, stability, and response speed of the loop. The error signal smoothed by the digital filter drives the phase interpolator (phase rotator), which corrects the clock phase. Once the loop is locked, the phase error is theoretically zero, and the clock offset by 90 degrees is used as the recovered clock to sample the serial input.




Figure 2.14 shows a DPLL-based CDR, which is divided into two loops. The data phase tracking loop is similar to the CDR in Figure 2.13: the phase detector array compares the phase of the incoming serial data against M clocks with equally spaced phases (possibly over a span of several UIs) to obtain a phase error signal, which is sent to a digital filter. The performance of the digital filter determines the bandwidth, stability, and response speed of the loop. The error signal smoothed by the digital filter drives the VCO to correct the clock phase. Once the loop is locked, the phase error is theoretically zero, and the clock offset by 90 degrees is used as the recovered clock to sample the serial input.

The DPLL-based CDR has an additional frequency tracking loop, which shortens the CDR lock time and relaxes the design constraints on the loop filter. The CDR switches to the data phase tracking loop only after the frequency tracking loop has locked, and it switches back to the frequency tracking loop automatically when the phase tracking loop loses lock. Because N times the reference clock frequency is nearly equal to the line rate, the steady-state VCO control voltages of the two loops are nearly equal, so the frequency tracking loop reduces the acquisition time of the phase tracking loop. Once the phase tracking loop is locked, the frequency tracking loop no longer affects it. The receive side of the SerDes therefore does not place demanding requirements on the jitter of the reference clock.

In a phase-interpolator-based CDR, the reference clock can come from a PLL shared by the transmitter and receiver or from an independent PLL per channel. In this architecture, jitter on the reference clock directly affects the jitter of the recovered clock and the received bit error rate.

Phase detector (PD)

The phase detector measures the phase error, which it reports as UP or DN signals. In a linear phase detector the duration of UP/DN is proportional to the phase error, whereas a bang-bang phase detector only indicates whether the clock is early or late. Figure 2.15 shows an example of a bang-bang phase detector; for simplicity, the example uses only four recovered clock phases.
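The early/late decision of a bang-bang phase detector can be sketched as below. This follows the textbook Alexander-style scheme with one edge sample between two data samples. It is an assumption made for illustration and not necessarily the four-phase circuit of Figure 2.15. The function name and the +1/-1 return convention are hypothetical, and how they map onto UP and DN depends on the design.

```python
def bang_bang_pd(prev_bit, edge_sample, curr_bit):
    """Alexander-style early/late decision from three consecutive samples.

    prev_bit and curr_bit are sampled at the centers of two adjacent UIs;
    edge_sample is taken at the nominal boundary between them.
    Returns +1 (clock late), -1 (clock early), or 0 (no transition,
    so no phase information).
    """
    if prev_bit == curr_bit:
        return 0
    # If the edge sample already shows the new bit, the data transition
    # happened before the edge clock, so the clock is late.
    return 1 if edge_sample == curr_bit else -1


print(bang_bang_pd(0, 0, 1), bang_bang_pd(0, 1, 1), bang_bang_pd(1, 1, 1))  # -1 1 0
```

The zero output for identical bits is the reason long runs without transitions starve the loop of phase information.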


Decimators and filters

The decimator lets the filter run at a lower frequency. The decimation step size and the smoothing method both affect the performance of the loop. The digital filter consists of a proportional branch and an integral branch, which track the phase error and the frequency error respectively. The processing delay of the digital filter must also be kept small: if it is too large, the loop cannot track rapid changes in phase and frequency, which results in bit errors.
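A minimal sketch of this decimator and proportional-integral filter is shown below. It assumes the decimator simply sums each block of phase detector outputs (one of several possible smoothing methods), and the gains `kp` and `ki` are arbitrary illustrative values, not figures from any real SerDes.

```python
def decimate(errors, factor):
    """Sum each block of `factor` phase errors into one lower-rate sample."""
    return [sum(errors[i:i + factor])
            for i in range(0, len(errors) - factor + 1, factor)]


class PIFilter:
    """Digital loop filter with a proportional and an integral branch."""

    def __init__(self, kp, ki):
        self.kp = kp          # proportional gain: reacts to phase error
        self.ki = ki          # integral gain: accumulates frequency error
        self.integral = 0.0

    def update(self, err):
        self.integral += self.ki * err
        return self.kp * err + self.integral


slow_errors = decimate([1, 1, -1, 1, -1, -1, 0, -1], factor=4)
loop_filter = PIFilter(kp=0.5, ki=0.25)
print(slow_errors, [loop_filter.update(e) for e in slow_errors])
# [2, -3] [1.5, -1.75]
```

A larger decimation factor lowers the filter clock rate further but adds latency, which is the processing delay the text warns about.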

CDR structures are not limited to the two above; there are many other variants, but all of them are essentially phase-locked loops. Analyzing the tracking performance, stability, and bandwidth/gain of the loop is a heavily academic topic, usually handled with a small-signal linear model, and many books and papers explain how to quantify loop performance. Some characteristics of the CDR loop are summarized below:

- Loop bandwidth

1. Phase jitter at frequencies below the loop bandwidth is transferred through the CDR to the recovered clock. In other words, jitter below the loop bandwidth is tracked by the CDR and does not cause bit errors. High-frequency jitter components may cause bit errors, depending on their amplitude.

2. The larger the loop bandwidth, the shorter the lock time and the larger the jitter of the recovered clock. Conversely, the smaller the loop bandwidth, the longer the lock time and the smaller the jitter of the recovered clock. For a CDR, a larger loop bandwidth is desirable because it gives greater jitter tolerance, but in loop-timing applications such as SONET/SDH the jitter of the recovered clock is limited, so the bandwidth cannot be too large.

3. The switching frequency of a switching power supply is generally below the loop bandwidth and can be tracked by the CDR. However, noise that the switching supply couples into the VCO (Digital to Multi-Phase Converter) cannot be tracked by the loop, and a low-cost ring VCO is especially sensitive to power supply noise. In addition, the harmonics of the switching supply may exceed the loop bandwidth.

Some protocols, such as SDH/SONET, specify gain masks for the CDR. Complying with these protocols requires calculating the input and output jitter budgets.
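The dependence on loop bandwidth can be illustrated with a first-order loop model. This is a deliberate simplification (real CDR loops are higher order and may show peaking), and the function names and the 1 MHz bandwidth are assumptions chosen for the sketch. With closed-loop response H(s) = 1/(1 + s/ωc), the jitter tracked by the recovered clock has magnitude |H|, and the residual phase error between data and clock has magnitude |1 - H|.

```python
import math


def jitter_transfer(f_jitter, f_bw):
    """Fraction of input jitter that the recovered clock follows."""
    return 1.0 / math.sqrt(1.0 + (f_jitter / f_bw) ** 2)


def jitter_residual(f_jitter, f_bw):
    """Fraction of input jitter left as data-to-clock phase error."""
    ratio = f_jitter / f_bw
    return ratio / math.sqrt(1.0 + ratio ** 2)


f_bw = 1e6  # assumed loop bandwidth in Hz
for f in (1e4, 1e6, 1e8):
    print(f, round(jitter_transfer(f, f_bw), 3), round(jitter_residual(f, f_bw), 3))
```

In this model, jitter well below the bandwidth is almost fully tracked and leaves little residual error, while jitter well above it is barely tracked and appears almost entirely as sampling error.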

 

2.5 Common Phase-Locked Loop (PLL)

A SerDes needs an internal clock running at the data baud rate, or at half the data baud rate when operating in DDR mode. The off-chip reference clock supplied to the SerDes is much lower in frequency than the data baud rate, so a PLL multiplies it up to generate the internal high-frequency clock. FPGA SerDes PLLs generally offer 8x, 10x, 16x, 20x, and 40x modes to support the commonly used SerDes interface protocols. For example, when PCI Express runs at 5 Gbps, a 125 MHz off-chip reference clock is needed in 40x mode and a 250 MHz reference clock in 20x mode.
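The PCI Express figures follow from dividing the line rate by the PLL multiplier. The helper below is a sketch of that arithmetic only; the name `reference_clock_mhz` is hypothetical, and it assumes the multiplier is defined relative to the full line rate, as in the example.

```python
def reference_clock_mhz(line_rate_mbps, multiplier):
    """Reference clock (MHz) needed for a line rate (Mbps) and PLL multiplier."""
    return line_rate_mbps / multiplier


for mult in (40, 20):
    print(mult, reference_clock_mhz(5000, mult))
# 40 125.0
# 20 250.0
```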

Figure 2.17 shows a third-order PLL circuit. The phase detector compares the phase of the input signal with the phase of the VCO feedback signal. The charge pump converts the phase error into a voltage or current signal, which the loop filter smooths to produce the control voltage that corrects the phase of the VCO, so that the phase error eventually tends to zero.

 

Figure 2.17 A Third-Order Type II PLL

The operation of a PLL is divided into a locking process and a tracking process. During locking, the loop is modeled by a nonlinear differential equation, from which the capture time, capture bandwidth, and similar figures can be evaluated. After locking, within the small-signal range, the PLL is modeled by a linear equation with constant coefficients, and its bandwidth, gain, stability, and other properties can be studied in the Laplace transform domain. Figure 2.18 shows the small-signal mathematical model.

 


The order of a PLL is named after the number of poles of its transfer function (the roots of the denominator). The VCO integrates phase (K_vco/s), so a loop without a filter is called a first-order loop, and a loop with a first-order filter is called a second-order loop. First-order and second-order loops are unconditionally stable systems. Higher-order loops have more poles and zeros, which allows the bandwidth, gain, stability, capture range, capture time, and so on to be adjusted independently.

The frequency-domain transfer function of a PLL is determined mainly by the loop filter F(s) evaluated at s = jω. Figure 2.19 shows a typical PLL frequency-domain transfer curve. It has two important features: loop bandwidth and jitter peaking. Excessive peaking amplifies jitter. A large damping factor limits the peaking but increases the lock time of the loop and affects the roll-off rate and natural frequency.
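The link between damping factor and jitter peaking can be sketched with the textbook second-order Type II closed-loop response H(s) = (2ζω_n·s + ω_n²) / (s² + 2ζω_n·s + ω_n²). This model is an assumption made for illustration: it is simpler than the third-order loop of Figure 2.17, and the natural frequency of 1 MHz and the function names are arbitrary.

```python
import math


def closed_loop_gain(f, f_n, zeta):
    """|H(j*2*pi*f)| for a second-order Type II PLL model."""
    s = 1j * 2 * math.pi * f
    w_n = 2 * math.pi * f_n
    num = 2 * zeta * w_n * s + w_n ** 2
    den = s ** 2 + 2 * zeta * w_n * s + w_n ** 2
    return abs(num / den)


def jitter_peaking_db(f_n, zeta, points=2000):
    """Peak closed-loop gain in dB over a log sweep around f_n."""
    freqs = [f_n * 10 ** (-2 + 4 * i / (points - 1)) for i in range(points)]
    peak = max(closed_loop_gain(f, f_n, zeta) for f in freqs)
    return 20 * math.log10(peak)


for zeta in (0.5, 1.0, 2.0):
    print(zeta, round(jitter_peaking_db(1e6, zeta), 2))
```

In this model the peaking falls as the damping factor rises, while the gain at low frequencies stays close to 1 (0 dB), so reference jitter inside the loop bandwidth passes through to the output.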

 

 

- When the loop is locked, the fixed phase difference is:

          θe = Δω / K_dc

Here K_dc is the DC open-loop gain of the loop, and Δω is the difference between the VCO center frequency and the controlled frequency. For a charge pump plus passive filter structure, the PLL phase error is zero.

- When the loop is locked, only a fixed phase difference remains, and the frequencies of the two signals at the phase detector inputs are equal:

          fr/M = fo/N

- For noise at the input, the loop acts as a low-pass filter that suppresses noise or interference above the loop cutoff frequency. For a SerDes PLL, a smaller bandwidth is desirable in order to suppress interference and noise on the reference clock.

 

For VCO noise, the loop acts as a high-pass filter: only VCO noise below the loop cutoff frequency is suppressed, so excessive high-frequency VCO noise degrades the clock jitter. For cost reasons, low-speed (<5 Gbps) SerDes use a ring VCO, which is noisy and sensitive to the power supply. High-speed SerDes use a low-noise LC VCO.


(To be continued)
