Introduction to Decision Feedback Equalizer

Reference links: https://www.163.com/dy/article/GK6BBSEB0531PW97.html
https://zhuanlan.zhihu.com/p/477141677

DFE stands for Decision Feedback Equalizer, an equalizer commonly used in communication systems. At the receiving end it combines filtering, equalization and symbol decision. A DFE suppresses the inter-symbol interference caused by multipath or lossy channels by feeding previous decisions back onto the received signal. Specifically, it uses the symbols already decided to estimate the interference they impose on the next input sample and removes that estimate before the next decision, which improves the quality and stability of the received signal. It operates in the following steps:

  1. Signal acquisition: The received signal is picked up by the antenna and converted into a digital signal by the ADC.
  2. Feedforward equalizer: The input signal passes through the feedforward equalizer, which reduces the influence of multipath interference.
  3. Decision device (slicer): The equalized sample enters the slicer, which decides which symbol was sent.
  4. Feedback equalizer: The past decisions pass through the feedback filter, and the result is subtracted from the incoming signal to cancel the interference those symbols cause. Because the decisions are only accurate to a certain extent, a wrong decision also feeds back a wrong correction.
  5. Signal output: The equalized signal is delivered through the output port.

DFE equalizers are often used in high-speed data transmission systems to eliminate multipath interference and improve signal reliability and stability.

Decision Feedback Equalizer (DFE) is an equalization method commonly used in the RX part of a SerDes, where it can effectively improve reception performance.

DFE is typically used in SerDes receivers (RX) to eliminate the inter-symbol interference (ISI) caused by lossy channels. The DFE includes a finite impulse response (FIR) filter, an adder, and a slicer that makes the symbol decision. Figure 1 shows an example of a three-tap DFE. The advantage of DFE is that it can eliminate ISI without amplifying the noise. The disadvantage is that it cannot correct pre-cursor ISI.
First, the ISI contributed by the previous symbols is calculated using the FIR filter. This interference is then subtracted from the input signal. Finally, the slicer makes a symbol decision and feeds the result back to the FIR filter. Tap values are chosen to minimize ISI. In this example, the tap values are -0.223, -0.073, and -0.033. Figure 2 shows the signal sampling point after the interference is removed.
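The FIR, adder and slicer loop described above can be sketched in a few lines of Python. This is only an illustration: the pulse response `PULSE` is an assumption chosen so that its post-cursors mirror the quoted tap values, and the names `channel` and `dfe` are hypothetical, not taken from any simulator.

```python
import random

# Assumed sampled pulse response: main cursor 1.0 followed by three
# post-cursors whose magnitudes mirror the tap values quoted in the text.
PULSE = [1.0, 0.223, 0.073, 0.033]
TAPS = [-0.223, -0.073, -0.033]   # tap values from the text (added to the input)


def channel(symbols, pulse=PULSE):
    """Superpose shifted pulse responses: rx[n] = sum_k pulse[k] * symbols[n-k]."""
    rx = []
    for n in range(len(symbols)):
        rx.append(sum(pulse[k] * symbols[n - k]
                      for k in range(len(pulse)) if n - k >= 0))
    return rx


def dfe(rx, taps=TAPS):
    """Three-tap DFE: FIR on past decisions, adder, then slicer."""
    history = [0.0] * len(taps)          # most recent decision first
    equalized, decisions = [], []
    for sample in rx:
        correction = sum(t * d for t, d in zip(taps, history))
        y = sample + correction          # taps are negative, so this subtracts ISI
        d = 1.0 if y >= 0 else -1.0      # slicer decision
        equalized.append(y)
        decisions.append(d)
        history = [d] + history[:-1]     # feed the decision back into the FIR
    return equalized, decisions


random.seed(0)
tx = [random.choice([-1.0, 1.0]) for _ in range(200)]
rx = channel(tx)
eq, decided = dfe(rx)
worst_before = min(abs(v) for v in rx)
worst_after = min(abs(v) for v in eq)
print(f"worst-case sample magnitude: before={worst_before:.3f}, after={worst_after:.3f}")
```

Under these assumptions the samples in front of the slicer return to the ideal levels, because every post-cursor has a matching tap. A real channel also has pre-cursor ISI and noise, which this sketch leaves out.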

In the channel simulation shown in Figure 3, Optimized Initial Tap Calculation is enabled for the DFE in the Rx_Diff component, as shown in Figure 4. When the simulation ends, the optimal tap values are written to a text file. The left panel of Figure 5 shows the eye diagram with DFE disabled, and the right panel of Figure 5 shows the eye diagram with DFE enabled. We can see that DFE improves the eye diagram.
The function of DFE is essentially to reduce inter-symbol interference by shortening the tail of each symbol, so that the response to each single bit is more concentrated and the signal quality at the receiving end improves.

Previously, we mentioned in the lecture that the time-domain response of the channel has a tail. The tail affects the following symbols, which is what is usually called inter-symbol interference (ISI). Of course, a non-ideal channel also exhibits crosstalk, reflection and similar phenomena. For channels with relatively high insertion loss, the effect of the linear equalizer CTLE has an upper limit, and further equalization mechanisms are needed.


Figure 1

As shown in Figure 1, the most direct idea in the time domain is this: if the influence of the current symbol on the following symbols can be subtracted one by one, based on the decision made for the current symbol, then the ISI caused by the current symbol can be minimized or even eliminated. This is also the most intuitive description of the role of DFE.

Based on this idea, the questions become how to implement it, how to adapt to the channel, how to improve the robustness of the system, and how to make the algorithm efficient. This series attempts to answer these questions.


Figure 2

In Figure 2, the single-bit pulse response contains three post-cursors. The waveform of the data 1011 after passing through the channel is the superposition of the correspondingly shifted pulses. The second bit, a 0, is prone to misjudgment. If the first symbol is judged accurately, and that result is delayed, fed back to the input signal, and an appropriate amount is subtracted, then the second symbol 0 becomes easier to judge accurately. This is also the literal meaning of Decision-Feedback Equalizer.
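The 1011 example can be reproduced numerically. The sketch below assumes a pulse response with a main cursor and three post-cursors (the values in `PULSE` are illustrative, not read from Figure 2), unipolar 0/1 signaling, and a decision threshold of 0.5.

```python
# Assumed single-bit pulse response sampled once per UI:
# main cursor followed by three post-cursors (illustrative values).
PULSE = [1.0, 0.45, 0.2, 0.1]
BITS = [1, 0, 1, 1]


def superpose(bits, pulse):
    """Sum the shifted pulse responses of every transmitted bit."""
    out = [0.0] * len(bits)
    for i, b in enumerate(bits):
        for k, p in enumerate(pulse):
            if i + k < len(bits):
                out[i + k] += b * p
    return out


def feedback_correct(samples, pulse, threshold=0.5):
    """Subtract the post-cursor ISI of already decided bits, then decide."""
    decisions, corrected = [], []
    for n, s in enumerate(samples):
        isi = sum(pulse[k] * decisions[n - k]
                  for k in range(1, len(pulse)) if n - k >= 0)
        y = s - isi
        corrected.append(y)
        decisions.append(1 if y > threshold else 0)
    return corrected, decisions


samples = superpose(BITS, PULSE)
corrected, decisions = feedback_correct(samples, PULSE)
print("received :", [round(v, 3) for v in samples])
print("corrected:", [round(v, 3) for v in corrected])
```

With these assumed values the second sample sits just below the threshold before correction, and returns to the ideal level 0 once the contribution of the first bit is subtracted.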

In the implementation, the continuous input data Din is sampled by the sampler (Sampler), which is controlled by the clock Clk, to obtain the sampled data Dout. After a delay of Td, Dout is scaled by the weights h1 and h2 and added to the input data. An appropriate delay Td and appropriate weights h1 and h2 ensure that the inter-symbol interference of the input data is reduced or completely eliminated, as shown in Figure 3.


Figure 3

For the structure of Figure 3, the following two issues need to be considered:

First, if Dout is simply a sample of the input data Din, the quantization accuracy becomes a question, for example how many bits are used to represent it. The quantized signal is what determines the size and sign of the feedback applied through the weights h1 and h2. For NRZ-encoded signals, a 1-bit quantizer can be used to convert the samples into logic 0 and 1. For PAM4-encoded signals, a multi-bit quantizer (for example an ADC) is usually used, which also makes subsequent data processing easier.


Figure 4

Second, the delay time Td. As shown in Figure 4, for the full-rate RX structure, if the rising edge is used to sample data, the influence of the previous UI must be reflected in the next sampled data within one sampling clock cycle, that is, within 1 UI. This can be understood simply as Td < 1 UI. If the condition is not met, the cancellation does not take effect in time.

Note that for NRZ the sampler quantizes the signal into 1-bit logic, which is a non-linear conversion.

In an actual circuit, the sampler is also called a Slicer, and the combination of Slicer and delay unit can be represented by a DFF. As shown in Figure 4, the timing-critical path is the yellow feedback path in the figure. It consists of the DFF setup time Tsetup, the DFF clock-to-output delay Tck-q, and the adder response time Tsum, and the sum of the three must be less than 1 UI.

Tsetup+Tck-q+Tsum<1UI
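As a quick sanity check of this constraint, the following sketch computes the remaining margin on the feedback path. The function name and the delay values are hypothetical and only illustrate the arithmetic.

```python
def feedback_margin_ps(data_rate_gbps, t_setup_ps, t_ckq_ps, t_sum_ps):
    """Return 1 UI minus (Tsetup + Tck-q + Tsum), in picoseconds.

    A negative result means the first-tap feedback cannot settle in time.
    """
    ui_ps = 1000.0 / data_rate_gbps          # one unit interval for NRZ
    return ui_ps - (t_setup_ps + t_ckq_ps + t_sum_ps)


# Assumed, illustrative delays: Tsetup = 15 ps, Tck-q = 40 ps, Tsum = 30 ps.
for rate in (10, 16):
    margin = feedback_margin_ps(rate, 15, 40, 30)
    print(f"{rate} Gb/s: margin = {margin:.1f} ps")
```

With these assumed delays the loop closes at 10 Gb/s but not at 16 Gb/s, which illustrates why the feedback path becomes harder to close as the data rate rises.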


Figure 5

Figure 5 is the most basic DFE structure. Its clock Clk runs at a frequency equal to the data rate (NRZ encoding), so it is called the Full-Rate Direct Feedback DFE architecture.

Figure 6 shows the equalization effect of the full-rate DFE on the signal. After Din is compensated by 1 Tap, the output waveform Dout is obtained. It can be seen that for high-frequency data patterns the amplitude available for the decision increases significantly, so the probability of deciding the data correctly is higher.


Figure 6

Part2 DFE architecture introduction

As data rates increase, the delay Tck-q of the sampling DFF has not shrunk at the same pace, and the timing of the DFE feedback critical path has become increasingly difficult to meet. The structure therefore needs to be improved and optimized.

Many improved and optimized variants of the DFE in Figure 5 have been developed for higher data rates. They can be classified along two dimensions.

The first is the sampling working rate, which is divided into full-rate (Full-Rate), half-rate (Half-Rate), 1/4 rate (Quarter-Rate) and so on.

The second is the timing-critical path, which separates direct feedback (Direct Feedback) structures from loop-unrolled (Loop Unrolled) structures. The loop-unrolled structure is also called the speculative (Speculation) or pre-judgment structure. Each speculative tap doubles the number of slicers. It is common to unroll the loop on the critical path h1 to guarantee timing. Some literature also applies speculation to the first two Taps, at the cost of a relatively large number of Slicers.

Figure 7 is a half-rate implementation corresponding to the structure of Figure 5. The figure shows the timing parameters on the h1 and h2 paths as well as the timing constraints that must be met. Note that the sampling clock frequency of the half-rate structure is half the data rate, that is, the clock runs at the Nyquist frequency. The Odd and Even paths sample alternately using the complementary clocks clk and clkb.


Figure 7

Here we need to weigh the gains and losses of using half rate. With half rate, the area and power consumption of the DFE itself increase, so what is the advantage? From a system perspective, power consumption is still reduced, mainly because the power of the clock generation circuit (PLL) and the clock path drops significantly. A lower clock frequency also makes the circuit design somewhat easier.

Figure 8 shows the structure of the full-rate speculative (pre-judgment) DFE. Tap1 uses the speculative structure and Tap2 uses direct feedback. h1 is both added to and subtracted from the input in advance, and the decision of the previous symbol then selects which of the two results is used for the current symbol. As the figure shows, there are three adders (Summer). In an actual circuit, the h1 summer and the DFF are usually merged into a Slicer with a built-in threshold.
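The equivalence between direct feedback and the speculative structure for Tap1 can be checked with a small behavioral sketch. It assumes bipolar symbols (+1/-1), a single tap `H1`, and an initial previous decision of +1. None of these values come from Figure 8, and the function names are hypothetical.

```python
import random

H1 = 0.3  # assumed first post-cursor weight (illustrative)


def slicer(v):
    return 1.0 if v >= 0 else -1.0


def direct_feedback(rx, h1=H1, first_prev=1.0):
    """Subtract h1 * previous decision, then slice (feedback inside the loop)."""
    prev, out = first_prev, []
    for x in rx:
        prev = slicer(x - h1 * prev)
        out.append(prev)
    return out


def loop_unrolled(rx, h1=H1, first_prev=1.0):
    """Slice both hypotheses in advance, then select with the previous decision."""
    prev, out = first_prev, []
    for x in rx:
        if_prev_plus = slicer(x - h1)    # result if the previous symbol was +1
        if_prev_minus = slicer(x + h1)   # result if the previous symbol was -1
        prev = if_prev_plus if prev > 0 else if_prev_minus   # 2:1 mux
        out.append(prev)
    return out


random.seed(1)
tx = [random.choice([-1.0, 1.0]) for _ in range(500)]
rx = [tx[n] + (H1 * tx[n - 1] if n > 0 else 0.0) + random.gauss(0, 0.1)
      for n in range(len(tx))]
print("same decisions:", direct_feedback(rx) == loop_unrolled(rx))
```

Both versions produce the same decisions. The difference lies only in the timing: in the unrolled version the adder is outside the feedback loop, and only the mux select remains on the critical path.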


Figure 8

Figures 9 and 10 show the half-rate speculative DFE structure. By adding and subtracting h1 in advance, time is saved on the Tap1 path and the timing margin increases.


Figure 9


Figure 10

Figure 11 gives a typical example that illustrates the timing margin of tap1 and tap2 for different structures, as well as the advantages of the improved half-rate structure. The time allotted to h2 can be reduced and the time allotted to h1 extended accordingly, up to 2 UI.


Figure 11

In addition, both Tap1 and Tap2 can be added to and subtracted from the signal in advance, which gives a DFE with a 2-tap speculative structure, as shown in Figure 12. When this structure runs at half rate, a total of 8 Slicers are required for data sampling. The Tap1 and Tap2 coefficients C1 and C2 are superimposed on the input signal in advance for every sign combination, and after sampling, the two preceding symbols drive a 4-to-1 Mux that selects the result.
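The slicer count follows from simple counting: for NRZ, every speculative tap doubles the number of hypotheses, and every interleaved path needs its own set. The helper below is a hypothetical illustration of that arithmetic and only counts data slicers.

```python
def data_slicer_count(speculative_taps, interleave_ways):
    """Data slicers for an NRZ speculative DFE.

    speculative_taps: number of taps resolved by pre-computation
    interleave_ways:  1 for full rate, 2 for half rate, 4 for quarter rate
    """
    return interleave_ways * 2 ** speculative_taps


print(data_slicer_count(1, 1))   # full rate, Tap1 speculative
print(data_slicer_count(2, 2))   # half rate, Tap1 and Tap2 speculative
```

The second call corresponds to the half-rate 2-tap speculative case described above.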


Figure 12

In actual implementation, there are some structures that will compress the time and further improve the speed, such as merging Summer and Slicer, etc., to reduce the delay Tsum of Summer.

For half-rate predictive DFE, if there is only one Tap, you can delay the odd and even data by multiple beats, and then make Mux selection.

In a SerDes, the DFE usually includes additional samplers that support DFE adaptation, CDR locking, LEQ adaptation, channel gain adjustment, digital eye diagram measurement, Figure of Merit evaluation and other functions, as shown in Figure 13.


Figure 13

Part3 key functional modules

In the first two parts, it is mentioned that the most commonly used sub-modules in the DFE structure include the sampler Slicer, the adder Summer, the selector Mux, the delay unit DFF&Latch, etc. We will briefly introduce them below.

  • Slicer

The sampler Slicer is mainly used to quantize the input signal, which is equivalent to a 1-bit ADC. The quantized logic can be viewed as recovered data. Its structure is a conventional sense amplifier (Sense Amplifier, SA), or dynamic comparator (Dynamic Comparator).

Figure 14 shows some common Slicer implementation forms. Among them, (a) is the basic type, (b) is a double-tail current (Double-Tail) dynamic comparator, and (c) is an improved double-tail current dynamic comparator. The focus of the design is to improve response speed, improve input sensitivity, reduce power consumption, reduce kickback noise, reduce input offset, etc.


Figure 14

As the data rate increases, how to reduce the output response time of the dynamic comparator may be the highest priority consideration. Only by ensuring that the data is correct does everything make sense. The output response time is closely related to the process, input differential signal amplitude, common mode level, and power supply voltage. Among them, the smaller device size provided by advanced process technology provides the most direct help for high-speed interface design.

It should be noted that in the structural circuit in Figure 14, within the high and low levels of a CLK cycle, the dynamic comparator needs to complete the functions of precharge (Precharge) and amplification (Amplification). Because half of the clock cycle is in the reset state, the sampling quantization needs to be completed in the remaining half of the clock cycle. This explains why the half-rate structure leaves more output response time for the dynamic comparator than the full-rate structure. In other words, there is a greater probability of distinguishing smaller input signals and obtaining correct data.
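The time budget described above can be put into numbers. The sketch assumes a 50% duty-cycle clock, so that half of each clock period is spent in reset. The function name is hypothetical.

```python
def evaluation_window_ps(data_rate_gbps, interleave_ways=1):
    """Time available to the dynamic comparator for amplification.

    interleave_ways: 1 for full rate, 2 for half rate.
    Assumes a 50% duty cycle, so half of the clock period is reset time.
    """
    ui_ps = 1000.0 / data_rate_gbps
    clock_period_ps = interleave_ways * ui_ps
    return clock_period_ps / 2.0


print(evaluation_window_ps(10, 1))   # full rate at 10 Gb/s
print(evaluation_window_ps(10, 2))   # half rate at 10 Gb/s
```

At the same data rate, the half-rate comparator has twice the amplification time of the full-rate one.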

The output of the dynamic comparator can only maintain the output data for less than half a sampling period. Usually we need another holding circuit to "connect" the output of the dynamic comparator to maintain the sampling result for a whole sampling period.

This is why a DFF was used in Figure 5 to represent the function of sampling and holding data. In practice, the commonly used term Slicer can refer to the dynamic comparator together with the delay unit that takes over its output.

  • DFF & Latch

The delay unit required in the Slicer mentioned earlier can be a Latch or a DFF. Of course, DFF can be viewed as a cascade of two Latch structures.

When it comes to Latch, we first think of SR latch. It can be composed of NAND2 or NOR2. The basic SR Latch does not require a clock and only has different output states based on changes in the input signal. As shown in Figure 15, the inputs of the Hold state of the two structures are different and can be used in combination with the structure selection of the dynamic comparator.


Figure 15

A selector-based latch (Multiplexer-Based Latch) is composed of a transmission gate selector and an inverter and is controlled by a clock. As shown in Figure 16, (a) and (b) are high-level latches and low-level latches respectively, and (c) shows the implementation details of the selector and inverter.


Figure 16

By cascading the high and low level latches in Figure 16, a master-slave edge-triggered flip-flop can be obtained. The trigger mode can be rising edge or falling edge. Its timing and structure are shown in (a) and (b) in Figure 17.


Figure 17

In addition to the above-mentioned CMOS level latches, half-level CML latches are also commonly seen in the literature.

  • Summer

The purpose of the adder is to complete the addition and subtraction operations of the input Din and Tap quantities hn.

Figure 18 shows the CML adder with resistive load. The tap values are summed as currents and converted into a voltage by the resistor, which the Slicer then samples and compares. With a resistive load, output swing, power consumption and settling time all need to be considered. The settling time is determined by the time constant tau of the output node, for example, 3 tau settles to about 95% accuracy. It mainly affects the timing parameter Tsum.
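The 3 tau figure follows from single-pole exponential settling, 1 - exp(-t/tau). A minimal sketch, assuming the output node behaves as a first-order RC network (function names are hypothetical):

```python
import math


def settled_fraction(n_tau):
    """Fraction of the final value reached after n_tau time constants."""
    return 1.0 - math.exp(-n_tau)


def settling_time(tau, accuracy):
    """Time needed to settle to the given fraction of the final value."""
    return -tau * math.log(1.0 - accuracy)


for n in (1, 2, 3, 4):
    print(f"{n} tau -> {settled_fraction(n) * 100:.1f}% settled")
```

Solving the same expression for time, as `settling_time` does, gives the Tsum contribution for a target accuracy once tau is known.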


Figure 18

Figure 19 is a current integrating (Current Integrating) adder, which uses the clock to alternately reset the load capacitance CL and integrate onto it. The output is a return-to-zero (RZ) signal. This structure is more energy-efficient than the resistive-load adder, but the literature notes that integration itself causes a loss of about 3.9 dB.
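One common way to arrive at a figure near 3.9 dB is to average a sinusoid at the Nyquist frequency over a 1 UI integration window centered on its peak. The average is 2/pi of the peak value. The sketch below checks this numerically. The centered window and the Nyquist-frequency input are assumptions, and the literature referred to above may derive the number differently.

```python
import math


def integration_loss_db(steps=100_000):
    """Average of cos(pi * t / T) over t in [-T/2, T/2] with T = 1 UI, in dB."""
    total = 0.0
    for i in range(steps):
        t = -0.5 + (i + 0.5) / steps      # midpoint rule over one UI
        total += math.cos(math.pi * t)    # Nyquist-frequency tone, peak at t = 0
    average = total / steps
    return 20.0 * math.log10(average)


print(f"numerical: {integration_loss_db():.2f} dB")
print(f"2/pi     : {20.0 * math.log10(2.0 / math.pi):.2f} dB")
```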


Figure 19

Figure 20 is a Charge Steering adder. Razavi describes this structure as a new adder implementation with zero static power consumption and significantly reduced dynamic power consumption. In the reset phase, nodes X and Y are reset to the power supply, so the differential output Vx-Vy can be regarded as an RZ signal.


Figure 20

It should be noted that, to achieve a certain accuracy in DFE compensation, the tap quantities hn are usually implemented with a current DAC (IDAC), which is also faster. The accuracy and range differ from tap to tap.

  • Mux

The function of Mux itself is relatively simple. Its purpose is to select multiple paths (more often 2 path data) for output. It can be implemented directly using a transmission gate structure, or using an inverter with 3-state control. I will not give an example here.

  • Merged Structure

In addition to the above basic structures, hybrid structures are also mentioned in some literature, which can further reduce the time of the tap feedback path.

The thresholded Slicer mentioned for Figure 8 is such a hybrid structure: it merges the Summer for the tap1 weight h1 with the Sampler. The structure is shown in Figure 21. Tap compensation of the input signal is achieved either through a tap voltage generated by a DAC or by directly using logic to control the input pair transistors M11 and M21. The structure directly controlled by logic is especially suitable for a one-tap speculative DFE in low-power scenarios, where some speed and accuracy are sacrificed for lower power consumption.


Figure 21

The Slicer concept mentioned above can actually be regarded as a hybrid structure that combines the Sampler and the delay unit Latch.

Decision feedback equalizer DFE (Middle Part)

4. Transient waveforms and eye diagrams

5. The impact of DFE on edges

6. Introduction to adaptive algorithms

Part4 Transient Waveforms and Eye Diagrams

The different structures mentioned in the Decision Feedback Equalizer DFE (Part 1) chapter will have different transient waveforms. The eye diagram before sampling intuitively reflects the amplitude Margin of the sampler and the margin of the CDR sampling moment.

For comparison, an ideal Verilog-A model is used here, and the excitation conditions are kept as consistent as possible. The model data rate is 10 GT/s and the input signal is 800 mVppd. The channel insertion loss is about 10 dB at 5 GHz. There is no CTLE, only DFE compensation with 2 Taps. The data eye diagram after the channel and before sampling is shown in Figure 1.


Figure 1

The transient correlation waveforms and steady-state eye diagrams of the full-rate direct feedback structure are shown in Figures 2 and 3 respectively.


Figure 2


Figure 3

The convergence process is shown in Figure 4. It can be clearly observed that the DFE compensation causes the sampling clock to move earlier.


Figure 4

The transient correlation waveforms and steady-state eye diagrams of the half-rate direct feedback structure are shown in Figures 5 and 6 respectively. Figure 5 shows the adder input and adder parity output waveforms, which are alternately sampled through the interleave method.


Figure 5


Figure 6

Figure 7 is the eye diagram of the odd-even path convergence process.


Figure 7

Figure 8 is the counterpart of Figure 7: an eye diagram of the convergence process for the improved half-rate direct feedback structure. Can you see the difference?


Figure 8

For the speculative (pre-judgment) full-rate and half-rate structures, the transient waveforms and eye diagrams are similar to those of the two direct feedback structures above. The difference is that the tap1 component does not appear at the summer node: the tap1 amount has to be added to or subtracted from the data afterwards, selected by the decided data.

Part5 DFE’s impact on edges

A traditional 2x oversampling CDR anchors the data sampling position by locating the edges. The jitter and distribution of the edge position therefore directly affect the data sampling margin, especially when the insertion loss is relatively large and the eye margin itself is small.

The preceding series of animations clearly shows the influence of the DFE on the phase of the sampling clock. As the compensation of each tap gradually increases, the phase of the sampling clock gradually moves earlier under the control of the CDR. This change of the CDR lock position reflects the influence of the DFE on the edges.

Figure 9 illustrates how adding and subtracting the DFE Tap1 amount affects the edge. The actual change of the edge depends closely on the data pattern, the tap order, and the response time. For a 1-to-0 transition with the pattern 110, the Tap1 amount causes the edge to advance. With the pattern 010, the Tap1 amount changes sign, and its effect on the edge depends on the tap response time.


Figure 9

In fact, for multi-tap DFE compensation, the edge contains the influence of historical multi-bit data. Comprehensive impact assessment needs to be implemented in combination with specific circuits.

The structure whose edges are affected most visibly is the speculative structure, because the Tap1 amount is added or subtracted after the summer. If the edge samples are not processed accordingly, the locked data sampling position ends up later, as shown in Figure 10.


Figure 10

So how to solve this problem? I’ll leave it to you to think about.

Figure 11 is the result after edge compensation processing. It can be seen that the equivalent eye diagram convergence result is better.


Figure 11

In addition, when considering the impact on the clock phase of the 2x oversampling CDR, the order of data and edge samples also has to be taken into account. It is closely tied to the specific circuit implementation, and which order is better needs to be evaluated carefully.

Here is a summary of the DFE convergence values of several structures under ideal conditions, for reference only.


Figure 12

Finally, I will send you the approximate convergence process of the cover image synthesized from the calculation results of the Matlab tool SerDes Toolbox.


Figure 13

Part6 Introduction to Adaptive Algorithms

Adaptive filter theory is by now very mature and is widely used in communications and consumer electronics, for example for channel equalization, beamforming and noise cancellation, as shown in Figure 14.


Figure 14

The adaptive filter is shown in Figure 15. Simply put, the filter is adjusted so that its output signal approaches or equals the target signal. It is mainly concerned with the processing of discrete signals by (non)linear discrete-time systems. A performance function is used to evaluate how well different filter coefficients perform, for example the mean-squared error (MSE) of the error signal. The filter with minimum mean-squared error is also called the Wiener filter. The various adaptive algorithms are designed to reach the Wiener solution of the filter.


Figure 15

In the field of communications, adaptive filters are mainly used for channel equalization (Channel Equalization), as shown in Figure 16. H(z) in the figure is the channel transfer function. The noise v(n) can be regarded as the combination of non-ideal factors such as crosstalk and noise. The equalizer W(z) is an adaptive filter that eliminates inter-symbol interference and reduces the impact of crosstalk and noise. The equalizer output y(n) must be good enough for the quantized output data to stay within the specified bit error rate.


Figure 16

Figure 17 is the mathematical derivation of the Wiener filter [1]. The filter coefficients satisfy the Wiener-Hopf equation. The Wiener solution is expressed in terms of the autocorrelation matrix R of the input data and the cross-correlation vector p between the input data and the target data. The Wiener solution can be obtained in a single step, but it involves relatively complex matrix operations.
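The one-step solution of the Wiener-Hopf equation R w = p can be illustrated with a small system-identification example. In this sketch R and p are estimated from samples, the "unknown" response `H_TRUE` is an assumption, and the helper names are hypothetical. It uses plain Python so that the matrix operations stay visible.

```python
import random

H_TRUE = [1.0, 0.3, 0.1]   # assumed response to be recovered (illustrative)


def estimate_r_p(x, d, n_taps):
    """Sample estimates of R = E[x x^T] and p = E[x d]."""
    r = [[0.0] * n_taps for _ in range(n_taps)]
    p = [0.0] * n_taps
    count = 0
    for n in range(n_taps - 1, len(x)):
        vec = [x[n - k] for k in range(n_taps)]
        for i in range(n_taps):
            p[i] += vec[i] * d[n]
            for j in range(n_taps):
                r[i][j] += vec[i] * vec[j]
        count += 1
    return [[v / count for v in row] for row in r], [v / count for v in p]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a w = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= f * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w


random.seed(4)
x = [random.choice([-1.0, 1.0]) for _ in range(5000)]
d = [sum(H_TRUE[k] * x[n - k] for k in range(len(H_TRUE)) if n - k >= 0)
     for n in range(len(x))]
R, p = estimate_r_p(x, d, len(H_TRUE))
w_wiener = solve(R, p)
print([round(v, 4) for v in w_wiener])
```

Because the target here is a noise-free filtered copy of the input, the solution reproduces `H_TRUE`. With noise, the result is the minimum-MSE compromise.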


Figure 17

Another method uses an iterative search that converges step by step from an arbitrary initial tap weight vector w_i to the optimal weight vector w_o. Here we mention the gradient-based method of steepest descent. Figure 18 [2] gives an animation that illustrates gradient descent more vividly. The performance error surface has a lowest point (a general surface may also contain several local minima), and this lowest point is the position of the Wiener solution.


Figure 18

The schematic diagram and derivation of gradient-based steepest descent are shown in Figure 19. The method moves towards the Wiener solution along the negative gradient, which is the direction of fastest decrease. The iteration step gain μ and the eigenvalues λi of the autocorrelation matrix R must satisfy a certain relationship for the convergence process to be stable.
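A small numerical sketch makes the stability condition concrete. It uses the update w(n+1) = w(n) + μ (p - R w(n)), for which the iteration is stable when 0 < μ < 2/λmax. The figure may define μ with an extra factor of 2, so the bound here applies to this form of the update only. R and p are assumed values.

```python
R = [[1.0, 0.5], [0.5, 1.0]]   # assumed autocorrelation matrix, eigenvalues 0.5 and 1.5
P = [0.7, 0.5]                 # assumed cross-correlation vector
W_OPT = [0.6, 0.2]             # satisfies R w = p for the values above
LAMBDA_MAX = 1.5


def steepest_descent(mu, steps=200):
    """Iterate w <- w + mu * (p - R w) starting from the zero vector."""
    w = [0.0, 0.0]
    for _ in range(steps):
        direction = [P[i] - sum(R[i][j] * w[j] for j in range(2)) for i in range(2)]
        w = [w[i] + mu * direction[i] for i in range(2)]
    return w


stable = steepest_descent(mu=0.5)     # 0.5 < 2 / 1.5, converges
unstable = steepest_descent(mu=1.5)   # 1.5 > 2 / 1.5, diverges
print("mu=0.5:", [round(v, 6) for v in stable])
print("mu=1.5: largest |w| =", max(abs(v) for v in unstable))
```

Each eigen-direction of R decays by the factor (1 - μ λi) per step, which is why the largest eigenvalue sets the upper limit on μ.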


Figure 19

The gradient-based steepest descent method in Figure 19 still requires the autocorrelation matrix R and the cross-correlation vector p, so the calculation is still relatively complicated. The gradient used there is the gradient of the expected squared error. If the gradient of the expected squared error is replaced with the gradient of the instantaneous squared error, the iteration simplifies into the LMS algorithm, as shown in Figure 20.


Figure 20

The LMS iteration in Figure 20 uses the instantaneous error e(n) and the vector of historical data x(n). It still involves floating-point multiplications and additions, so the hardware cost is relatively high. This leads to the simplified versions shown in Figure 21, which keep only the sign of the error, of the historical data, or of both. They reduce the computational complexity to varying degrees. The price of the simplification is a longer convergence time and possibly a larger residual mean-squared error (MSE).
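The LMS update and its sign-based simplifications differ only in which quantities are replaced by their sign. The following sketch compares them on a toy identification problem. The response `H_TRUE`, the step sizes, and the function names are assumptions for illustration.

```python
import random

H_TRUE = [0.8, -0.3]   # assumed response to identify (illustrative)


def sign(v):
    return 1.0 if v >= 0 else -1.0


def update(w, x_vec, e, mu, variant):
    """One weight update: w <- w + mu * f(e) * g(x)."""
    if variant == "lms":
        step = [e * x for x in x_vec]
    elif variant == "sign-error":
        step = [sign(e) * x for x in x_vec]
    elif variant == "sign-data":
        step = [e * sign(x) for x in x_vec]
    elif variant == "sign-sign":
        step = [sign(e) * sign(x) for x in x_vec]
    else:
        raise ValueError(variant)
    return [wi + mu * si for wi, si in zip(w, step)]


def identify(variant, mu, n_samples=5000, seed=2):
    rng = random.Random(seed)
    x = [rng.choice([-1.0, 1.0]) for _ in range(n_samples)]
    w = [0.0, 0.0]
    for n in range(1, n_samples):
        x_vec = [x[n], x[n - 1]]
        d = sum(h * xi for h, xi in zip(H_TRUE, x_vec))      # target signal
        e = d - sum(wi * xi for wi, xi in zip(w, x_vec))     # instantaneous error
        w = update(w, x_vec, e, mu, variant)
    return w


w_lms = identify("lms", mu=0.05)
w_ss = identify("sign-sign", mu=0.002)
print("LMS      :", [round(v, 4) for v in w_lms])
print("sign-sign:", [round(v, 4) for v in w_ss])
```

The sign-sign variant needs only an up/down counter per tap, but it keeps dithering around the solution by an amount on the order of the step size, which is the residual error mentioned above.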


Figure 21

A Decision Feedback Equalizer (DFE), as its name suggests, uses historical decision values to form a feedback FIR filter that compensates the input signal, as shown in Figure 22. Note that the tap feedback enters the adder with a negative sign in the figure, that is, it is subtracted. The error signal is also defined with the opposite sign relative to the basic filter definition above. The two sign inversions cancel, so the sign of the weight update expression is the same as in the previous result.


Figure 22

In Figure 22, when the sampler quantization accuracy is 1 bit, the result is the sign-sign LMS of Figure 21. In an actual circuit, the hardware cost can be reduced further through time-division multiplexing and data pattern selection.
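A behavioral sketch of sign-sign LMS adaptation for a two-tap DFE is given below. It assumes a pulse response `PULSE` with a main cursor of 1.0, a known target level of 1.0 for the error, and the update h_k <- h_k + μ · sign(e) · d(n-k) with the feedback subtracted at the adder. These are illustrative assumptions and not the exact arrangement of Figure 22.

```python
import random

PULSE = [1.0, 0.25, 0.1]   # assumed: main cursor and two post-cursors
MU = 0.001                 # assumed update step


def sign(v):
    return 1.0 if v >= 0 else -1.0


def adapt_dfe(n_symbols=4000, seed=3):
    rng = random.Random(seed)
    tx = [rng.choice([-1.0, 1.0]) for _ in range(n_symbols)]
    taps = [0.0, 0.0]          # h1, h2
    past = [0.0, 0.0]          # d(n-1), d(n-2)
    for n in range(n_symbols):
        rx = sum(PULSE[k] * tx[n - k] for k in range(len(PULSE)) if n - k >= 0)
        y = rx - sum(h * d for h, d in zip(taps, past))   # feedback is subtracted
        d = sign(y)                                       # data slicer
        e = y - d                                         # error against target level 1.0
        taps = [h + MU * sign(e) * dp for h, dp in zip(taps, past)]
        past = [d, past[0]]
    return taps


h1, h2 = adapt_dfe()
print(f"h1 = {h1:.3f}, h2 = {h2:.3f}")
```

Under these assumptions the taps settle close to the two post-cursor values and then dither by roughly one step, so only the signs of the error and of the past decisions are needed for each update.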

When the filter converges, the tap coefficients tend to stabilize. The transfer function of the filter output value and input data and quantization error is shown in Figure 23.


Figure 23

Figure 24 shows the transfer functions and pole-zero distribution of a simple first-order DFE. The signal transfer function is high-pass: it attenuates low frequencies and boosts high frequencies, which compensates for the smearing (tail) of the input data. The frequency response of a higher-order DFE is not necessarily high-pass across the entire frequency range. The error transfer function has a gain below unity overall.
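The high-pass behavior can be checked with a linearized model. The sketch assumes the slicer is modeled as unity gain plus an additive quantization error q, so that y = x - h1·z⁻¹·(y + q). This gives a signal transfer function 1/(1 + h1·z⁻¹) and an error transfer function -h1·z⁻¹/(1 + h1·z⁻¹). This model and the value h1 = 0.3 are assumptions and may differ from the exact expressions in Figures 23 and 24.

```python
import cmath
import math


def dfe_gains_db(h1, f_norm):
    """Signal and error gain in dB at frequency f_norm (in units of the symbol rate)."""
    z_inv = cmath.exp(-2j * math.pi * f_norm)
    denom = 1 + h1 * z_inv
    signal = 1 / denom
    error = -h1 * z_inv / denom
    return 20 * math.log10(abs(signal)), 20 * math.log10(abs(error))


for f in (0.0, 0.25, 0.5):               # DC, half Nyquist, Nyquist
    sig_db, err_db = dfe_gains_db(0.3, f)
    print(f"f = {f:.2f} fs: signal {sig_db:+.2f} dB, error {err_db:+.2f} dB")
```

With a positive h1, the signal gain is below 0 dB at DC and above 0 dB at the Nyquist frequency, while the error gain stays below 0 dB at both.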


Figure 24

The characteristics of the DFE can be summarized as follows:

1. It attenuates low frequencies and compensates high frequencies, but does not amplify noise and crosstalk.

2. The sign quantization makes it nonlinear.

3. DFE cannot eliminate pre-cursors, only post-cursors.

4. DFE is subject to error propagation.

5. DFE increases the complexity of CDR phase detection while eliminating ISI.

Decision feedback equalizer DFE (Part 2)

7.VerilogA model

8. Explanation of model results

9. DFE under PAM4

7.VerilogA model

Verilog-A is a hardware description language (HDL) for describing the behavior of analog systems (Analog Systems). It is standardized by Accellera as part of Verilog-AMS. By modeling sub-modules, a system-level model can be built and verified by simulation in the early stages of a project, which supports feasibility analysis during the research phase. A top-down (TOP-DOWN) design process helps define the key sub-module parameters and specification requirements clearly from a system perspective.

As shown in Figure 1, Verilog-A is a pure analog subset (Analog-Only Subset) of Verilog-AMS. Combined with Verilog or VHDL, it allows system verification of mixed digital-analog designs. Mainstream simulators such as Spectre, SPICE and ADS support the Verilog-A language.


Figure 1

Compared with Verilog/VHDL, Verilog-A is more familiar and more approachable for analog designers. Common modules such as op amps, comparators, AD/DA converters and PLLs can easily be replaced by ideal models, which simplifies and speeds up simulation and design verification.

In the previous article, we performed pure Verilog-A modeling on the SerDes Rx part, and focused on the analysis of the small system DFE.

In fact, CTLE, DFE, CDR, DESER, etc. can be modeled and implemented, and can also form TX and PLL, and ultimately form a large system with hierarchical relationships.

EDA tools usually provide Verilog-A help documentation. Detailed syntax explanations and practical examples help novices get started quickly. Figure 2 briefly lists common built-in functions in Verilog-A.


Figure 2

For specific modeling needs, reference documents and Internet resources can be consulted. With more practice, you will soon be able to model the basic functions of a module. More complex functions require a deep understanding of the non-ideal characteristics of the circuit and customized modeling of higher-order behavior. The textbook in Figure 3 provides guidance on AMS system design and simulation for those who are interested.


Figure 3

8.Explanation of model results

Here we focus on introducing DFE-related models, involving Verilog-A modeling of the simulation part and Verilog or Verilog-A implementation of the algorithm.

Figure 4 lists the DFE simulation module, including adders, dynamic comparators, multipliers, delay units, multiplexers, data alignment (Align) and DFE algorithm VA models.


Figure 4

The implementation of the algorithm can be implemented in Verilog or VHDL in digital HDL language, and finally verified using hybrid simulation. Of course, in the initial model stage, the algorithm model implemented using VerilogA can use floating-point data to implement an algorithm demo that is closer to the basic principles. The scale is also smaller and the verification speed will be faster.

Through the basic model unit in Figure 4, you can build the different architectures mentioned in the previous DFE series articles, and compare and evaluate the architecture differences and performance. Figure 5 shows one of the DFE evaluation subsystems.


Figure 5

The DFE sub-modules in Figures 4 and 5 include only basic functions. More complex behavior can also be modeled, for example the adder settling time, the tap resolution, the input offset, noise and delay of the dynamic comparator, and the update rate and accuracy of the convergence algorithm.

Model complexity and simulation time are positively correlated, so the more detailed the model, the slower the simulation. The model accuracy should therefore be chosen according to the purpose of the modeling and verification.

Finally, note that the receiver includes LEQ equalization and its algorithm, the CDR loop, and DFE compensation and its algorithm. Multiple loops are coupled with multiple algorithms and influence one another. Every evaluation of a function point in a small subsystem must ultimately be followed by a performance regression on the overall system to confirm that the small-system evaluation is accurate.

9.DFE under PAM4

As SerDes line rates keep doubling, the signal bandwidth increases and the channel insertion loss becomes higher and higher, which places greater demands on equalization capability.

At present, moving to a higher-order modulation, from NRZ (PAM2) to PAM4, appears to be the more effective solution: the data rate doubles while the Nyquist frequency stays the same. The changes in PAM4 coding relative to NRZ are shown in Figure 6.


Figure 6
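The relationship between data rate and Nyquist frequency can be checked with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, the 56 Gb/s and 112 Gb/s rates are example values, and the Gray-coded level mapping is one common convention assumed here rather than taken from Figure 6.

```python
import math

def nyquist_ghz(bit_rate_gbps, levels):
    """Nyquist frequency in GHz: half the symbol rate, log2(levels) bits per symbol."""
    baud = bit_rate_gbps / math.log2(levels)
    return baud / 2.0

# Assumed Gray-coded mapping from bit pairs to normalized PAM4 levels.
PAM4_GRAY = {(0, 0): -1.0, (0, 1): -1 / 3, (1, 1): 1 / 3, (1, 0): 1.0}

def pam4_encode(bits):
    """Map an even-length bit sequence to PAM4 symbols, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [PAM4_GRAY[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

print(nyquist_ghz(56, 2))    # NRZ at 56 Gb/s   -> 28.0
print(nyquist_ghz(112, 4))   # PAM4 at 112 Gb/s -> 28.0
print(len(pam4_encode([0, 0, 0, 1, 1, 1, 1, 0])))  # 8 bits -> 4 symbols
```

Both cases give the same 28 GHz Nyquist frequency, which is the point of the move to PAM4.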

Relative to NRZ, the amplitude of each PAM4 eye drops to 1/3, so the SNR drops by about 9.5 dB, and the absolute eye width also decreases. These place higher demands on the receiver implementation, but in general it is worthwhile to sacrifice signal-to-noise ratio in exchange for keeping the same bandwidth.


Figure 7
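The 9.5 dB figure follows directly from the 1/3 amplitude ratio. The short sketch below computes it, assuming the same peak-to-peak swing and the same noise for NRZ and PAM4; the function name is hypothetical.

```python
import math

def pam_eye_penalty_db(levels):
    """Per-eye amplitude loss in dB versus NRZ at the same peak-to-peak swing.

    A PAM-N signal splits the swing into (levels - 1) stacked eyes.
    """
    return 20.0 * math.log10(levels - 1)

print(round(pam_eye_penalty_db(4), 2))  # 9.54
```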

In terms of mainstream protocol standards, Figure 8 shows the different interconnect scenarios defined by the Optical Internetworking Forum (OIF) for 56/112G. The channel reach (SR, MR, LR) and the signal coding are defined separately for each scenario, so the receiver architecture can be tailored to each scenario.


Figure 8

Current PAM4 receiver architectures for low-insertion-loss SR scenarios are mainly upgrades of the traditional NRZ structure, as shown in Figure 9 [1] and Figure 10 [2]. New functions include multi-level quantization, PAM4 encoding and decoding, and changes to the CDR phase-detection patterns.


Figure 9

XSR/VSR scenarios with low insertion loss can maintain a relatively low bit error rate and achieve high-performance data transmission even without an error correction mechanism.


Figure 10

For the high-insertion-loss MR and LR channels, Figure 11 shows a SerDes RX built around ADC sampling [3]. The receiver AFE provides CTLE+VGA equalization. After sampling and quantization by a TI-ADC with 6 to 8 bits of resolution, a DSP performs combined FFE+DFE equalization in the digital domain.

This ADC+DSP implementation has great potential because of the powerful digital equalization that follows the ADC, and it is increasingly used in ultra-high-speed medium- and long-reach applications.


Figure 11

Figure 12 is the DSP block diagram given in [4], including a multi-tap FFE and a 1-tap DFE. Implementing the equalization algorithm in the digital domain has many benefits: it is relatively insensitive to process variation, takes full advantage of advanced process nodes, and can meet high-speed, low-power, and small-area goals. A digital implementation is also more flexible and easier to migrate as the process evolves. On the other hand, it places higher demands on algorithm tuning and robustness.


Figure 12
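To make the ADC+DSP idea concrete, here is a minimal Python sketch of a 6-bit quantizer followed by a short FFE and a 1-tap DFE around a PAM4 slicer. It is not the design from [3] or [4]: the channel pulse, the ADC full scale, the two FFE coefficients, and the fixed (non-adaptive) DFE tap are all assumed values chosen so the arithmetic is easy to follow.

```python
import random

LEVELS = (-1.0, -1 / 3, 1 / 3, 1.0)  # normalized PAM4 levels

def adc(v, bits=6, full_scale=1.6):
    """Uniform quantizer with 2**bits codes, clipping near +/- full_scale."""
    lsb = 2 * full_scale / 2 ** bits
    half = 2 ** (bits - 1)
    code = max(-half, min(half - 1, round(v / lsb)))
    return code * lsb

def slice_pam4(v, main):
    """Return the PAM4 level whose scaled value main * level is nearest to v."""
    return min(LEVELS, key=lambda level: abs(v - main * level))

def equalize(samples, ffe, dfe_tap, main, delay):
    """FIR FFE on the ADC samples, then a 1-tap DFE around the PAM4 slicer."""
    decisions, prev = [], 0.0
    for n in range(len(samples)):
        ffe_out = sum(w * samples[n - k] for k, w in enumerate(ffe) if n >= k)
        # No symbol has reached the main cursor before `delay`, so output 0.
        d = slice_pam4(ffe_out - dfe_tap * prev, main) if n >= delay else 0.0
        decisions.append(d)
        prev = d
    return decisions

# Hypothetical channel: pre-cursor 0.1, main cursor 1.0, post-cursor 0.35.
h = [0.1, 1.0, 0.35]
rng = random.Random(1)
tx = [rng.choice(LEVELS) for _ in range(2000)]
rx = [adc(sum(hj * tx[n - j] for j, hj in enumerate(h) if n >= j))
      for n in range(len(tx))]

# FFE [-0.1, 1.0] cancels the pre-cursor: the combined pulse becomes
# [-0.01, 0, 0.965, 0.35], so main = 0.965 at delay 2 and the DFE tap is 0.35.
rx_dec = equalize(rx, ffe=[-0.1, 1.0], dfe_tap=0.35, main=0.965, delay=2)
errors = sum(a != b for a, b in zip(rx_dec[2:], tx))
print("symbol errors:", errors)
```

In this noiseless example the residual ISI plus quantization error stays far below half the PAM4 level spacing, so no symbol errors are expected. The FFE handles the pre-cursor, which a DFE cannot reach, while the DFE removes the post-cursor without amplifying noise.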

Finally, here is a simple demo that uses Excel to simulate the DFE algorithm in the discrete-time domain, as shown in Figures 13 to 15. It adopts a full-rate direct-feedback structure, both Vref and the taps are adaptive, and tap convergence again uses the SS-LMS algorithm introduced earlier.


Figure 13

As shown in Figure 13, the main cursor H0 of the impulse response is 0.6, and the post-cursors H1-H3 are 0.2, 0.15, and 0.05, respectively. Vref and the taps use different update step sizes, and all three DFE tap compensations are turned on.


Figure 14

Figure 14 shows the convergence of Vref and the three taps. Once convergence is complete, the post-cursors H1-H3 are almost perfectly cancelled.
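The same experiment can be approximated outside Excel. The sketch below is a Python reconstruction, not the author's spreadsheet: it assumes random NRZ data with ±1 symbols, no noise, and illustrative step sizes, and uses H0 = 0.6 with H1-H3 = 0.2, 0.15, 0.05 from the text. Vref and the three taps are updated with sign-sign LMS.

```python
import random

def sign(x):
    return 1.0 if x >= 0 else -1.0

def run_dfe(h, n_sym=20000, mu_tap=0.001, mu_vref=0.002, seed=0):
    """Full-rate direct-feedback DFE with SS-LMS adaptation of Vref and taps.

    h[0] is the main cursor; h[1:] are the post-cursors.
    The step sizes are assumed values, chosen only for illustration.
    """
    rng = random.Random(seed)
    n_taps = len(h) - 1
    taps = [0.0] * n_taps
    vref = 0.0
    tx_hist = [1.0] * len(h)     # transmitted symbols, newest first
    dec_hist = [1.0] * n_taps    # past decisions, newest first
    for _ in range(n_sym):
        tx_hist = [rng.choice((-1.0, 1.0))] + tx_hist[:-1]
        y = sum(hk * dk for hk, dk in zip(h, tx_hist))          # channel output
        z = y - sum(w * d for w, d in zip(taps, dec_hist))      # DFE summer
        d = sign(z)                                             # data decision
        se = sign(z - vref * d)                                 # error sign
        # SS-LMS: each tap moves by a fixed step along sign(e) * past decision.
        taps = [w + mu_tap * se * dp for w, dp in zip(taps, dec_hist)]
        vref += mu_vref * se * d
        dec_hist = [d] + dec_hist[:-1]
    return vref, taps

vref, taps = run_dfe([0.6, 0.2, 0.15, 0.05])
print("Vref:", round(vref, 2), "taps:", [round(w, 2) for w in taps])
```

After convergence Vref should settle close to H0 and the taps close to H1-H3, dithering by a few step sizes around those values, which is the usual behavior of sign-sign updates.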

Figure 15 is a scatter plot of the sampled points during convergence. The eye height increases from 0.4 Vppd to 1.2 Vppd.


Figure 15
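The two eye heights can be reproduced with a worst-case (peak distortion) calculation. The sketch below assumes NRZ symbols of ±1, no noise, and perfect cancellation of every enabled tap; the function name is hypothetical.

```python
def eye_height_vppd(h0, post_cursors, cancelled=0):
    """Worst-case NRZ eye height, peak-to-peak differential, for +/-1 symbols.

    The first `cancelled` post-cursors are assumed to be removed perfectly
    by the DFE; the rest close the eye by their absolute values.
    """
    residual = sum(abs(h) for h in post_cursors[cancelled:])
    return 2.0 * (h0 - residual)

post = [0.2, 0.15, 0.05]
print(round(eye_height_vppd(0.6, post), 3))               # 0.4 (DFE off)
print(round(eye_height_vppd(0.6, post, cancelled=3), 3))  # 1.2 (3 taps on)
```

Without the DFE, the post-cursors subtract 0.4 from the 0.6 main cursor in the worst case, leaving 2 x 0.2 = 0.4 Vppd. With all three taps converged, the full 2 x 0.6 = 1.2 Vppd is recovered.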

That's all for this series on DFE. I hope you have gained something from it.


Origin blog.csdn.net/han_better/article/details/130743753