CDC design and verification issues

CDC design
Clock Domain Crossing (CDC) design means designing logic whose signals cross between clock domains. A metastable state is one in which a flip-flop output cannot settle to a definite state (0 or 1) within a bounded period of time. In a multi-clock design, metastability cannot be avoided, but its adverse effects can be mitigated.
Figure 1 shows an example of a failed attempt to synchronize data from one clock domain into another, which drives the output metastable. The cause is that the data, launched by the sending clock, changes too close to the sampling clock edge, so it is not held stable long enough around the sampling instant.
Metastability cannot be ignored. As shown in Figure 2, because the output is indeterminate, sampling a metastable output in the receive clock domain can propagate invalid signal values through the rest of the design. Since the CDC signal may fluctuate for some time, different pieces of input logic in the receiving clock domain may interpret the fluctuating level as different values, propagating erroneous signals into the receiving clock domain.
Every flip-flop used in a design has specified setup and hold times, that is, the window before and after the rising clock edge during which the input data must remain unchanged. If the data changes inside this window, because it changes too close to the edge of a clock it is not synchronized to, the output may become metastable.
Synchronizer
Two scenarios need to be considered when synchronizing data:
1. It is permitted to miss some samples when crossing clock domains;
in this scenario the design does not need to sample every value, but the values that are sampled must be accurate. An example is the set of Gray code counters used in a standard asynchronous FIFO design. The counters do not need to capture every legal value from the other clock domain, but the values they do sample must be accurate enough to recognize the full and empty conditions.
2. Every signal that crosses clock domains must be sampled;
in this scenario every CDC signal value must be captured accurately.

Both scenarios share the requirement that the signal be synchronized from one clock domain into another, so synchronization is discussed next.
Two flip-flop synchronizer
The first flip-flop of the two flip-flop synchronizer samples the asynchronous input signal into the new clock domain and then waits a full clock cycle so that any metastability on the first-stage output can decay. The same clock then samples the first-stage signal into the second flip-flop, so that the second-stage output is a stable, valid signal that is ready for distribution in the new clock domain.
In theory, the first-stage signal may still be metastable when it is clocked into the second flip-flop, which would make the second-stage output metastable as well. The likelihood of this is characterized by the MTBF. The MTBF of a synchronizer is a function of several variables, including the frequency of the clock that generates the input signal and the frequency of the clock that drives the synchronizing flip-flops.
For most synchronization designs, a two flip-flop synchronizer is sufficient to remove all likely metastability.
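The structure of the two flip-flop synchronizer can be sketched as a small behavioral model. This is only an illustration: the class name `TwoFlopSynchronizer` is hypothetical, and the model shows the two-edge latency of the structure but does not model metastability itself.

```python
class TwoFlopSynchronizer:
    """Behavioral model of two back-to-back flip-flops in the receive clock domain."""

    def __init__(self):
        self.q1 = 0  # first stage: the flop that may go metastable in real hardware
        self.q2 = 0  # second stage: the output used by receive-domain logic

    def clock(self, d):
        """Apply one rising edge of the receive clock with asynchronous input d."""
        # Both flops update on the same edge, so q2 takes the OLD value of q1.
        self.q2, self.q1 = self.q1, d
        return self.q2


sync = TwoFlopSynchronizer()
outputs = [sync.clock(d) for d in [0, 1, 1, 1, 0, 0, 0]]
print(outputs)  # [0, 0, 1, 1, 1, 0, 0]
```

The input change only appears at the output on the second receive clock edge after it was first sampled, which is the full cycle of settling time the first stage is given.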

MTBF: mean time before failure
In most designs, the mean time before failure (MTBF) must be calculated for any signal that crosses a CDC boundary. Failure here means that a signal passed to the synchronizer goes metastable in the first-stage flip-flop and is still metastable one cycle later when it is sampled into the second-stage flip-flop.
Because the signal has not settled to a known value after one clock cycle, it may still be metastable when it is sampled and passed into the receiving clock domain, causing potential failures in downstream logic. When calculating MTBF, larger values are better than smaller ones. A larger MTBF means a longer interval between potential failures, while a smaller MTBF means metastability may occur more often and cause the design to fail.

Related papers give an expression for the MTBF of a synchronizer circuit. The expression shows that the two most important factors directly affecting MTBF are the sampling clock frequency (how fast signals are sampled into the receiving clock domain) and the data change frequency (how fast the data crossing the CDC boundary changes). That is, failures occur more frequently (shorter MTBF) in higher-speed designs or when the sampled data changes more often.
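To make the dependence on the two frequencies concrete, here is a small sketch. It assumes a commonly quoted textbook form, MTBF = e^(t_r / tau) / (T_w * f_clk * f_data), which may differ in detail from the expression the paper uses. The function name, parameter names, and all device constants below are illustrative assumptions, not values from any datasheet.

```python
import math


def synchronizer_mtbf(f_clk_hz, f_data_hz, t_resolve_s, tau_s, t_window_s):
    """MTBF in seconds under the assumed model exp(t_r / tau) / (T_w * f_clk * f_data)."""
    return math.exp(t_resolve_s / tau_s) / (t_window_s * f_clk_hz * f_data_hz)


# Illustrative numbers only (assumed, not measured).
F_CLK = 200e6      # sampling clock frequency
F_DATA = 20e6      # data change frequency
TAU = 50e-12       # flip-flop resolution time constant
T_WINDOW = 100e-12 # metastability window
T_RESOLVE = 4e-9   # settling time available before the next stage samples

two_flop = synchronizer_mtbf(F_CLK, F_DATA, T_RESOLVE, TAU, T_WINDOW)
faster_clock = synchronizer_mtbf(2 * F_CLK, F_DATA, T_RESOLVE, TAU, T_WINDOW)
# Rough approximation: a third flop adds about one more clock period of settling time.
three_flop = synchronizer_mtbf(F_CLK, F_DATA, T_RESOLVE + 1 / F_CLK, TAU, T_WINDOW)

print(f"two flops:          {two_flop:.3e} s")
print(f"doubled clock rate: {faster_clock:.3e} s")
print(f"three flops:        {three_flop:.3e} s")
```

Under this model, doubling either frequency halves the MTBF when the settling time is held fixed, while extra settling time raises it exponentially, which is the motivation for adding a third synchronizer stage in very fast designs.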

Three flip-flop synchronizer
For some very high-speed designs, the MTBF of a two flip-flop synchronizer is too short, so a third flip-flop is added to raise the MTBF to an acceptable value.
Synchronizing signals from the transmit clock domain
Is it a good idea to register signals in the transmit clock domain before passing them to the receive clock domain? Behind this question lies the assumption that, because CDC signals will be synchronized in the receive clock domain, they do not need to be registered in the transmit clock domain. This assumption is flawed, and registering signals in the transmit clock domain should generally be required. Consider the example in Figure 6, where the signal in the transmit clock domain is not registered before being passed to the receive clock domain.
As this example shows, the combinational logic output from the transmit clock domain can experience combinational settling at the CDC boundary. This settling effectively increases the data change frequency, potentially creating small bursts of oscillating data, and increases the number of edges that can be sampled while the data is changing. The probability of sampling changing data, and hence of going metastable, increases accordingly.

Synchronizing the signal into the receive clock domain
The aclk domain signal is registered in the adat flip-flop before being passed to the bclk domain. The adat flip-flop filters out the combinational settling on its input (a) and passes a clean signal to the bclk domain.

Synchronizing fast signals into slow clock domains
For clock-domain crossings, both the width of the transmitted signal and the synchronization technique must be considered. One problem with synchronizers is that a signal from the transmit clock domain may change value twice before it is sampled, or may change too close to the sampling edge of the slower clock domain. Whenever a signal is sent from one clock domain to another, it must be determined whether a missed signal matters to the design.
When missed samples are not allowed, there are two general approaches:
(1) An open-loop solution, which ensures that the signal is captured without an acknowledgement (ACK);
(2) A closed-loop solution, which requires the receiver to feed back an ACK for the signal that crossed the CDC boundary.
Both solutions are discussed below.

Requirements for reliable signal transmission between clock domains
When the fast clock domain runs at 1.5 times the frequency of the slow clock domain or more, synchronizing a slow control signal into the fast clock domain is usually not a problem, because the fast clock will sample the slow CDC signal one or more times. Sampling a slow signal into a fast clock domain causes fewer potential problems than sampling a fast signal into a slow clock domain. Designers can take advantage of this and use a simple two flip-flop synchronizer to pass single-bit CDC signals between the two clock domains.

The "three edge" requirement
Related papers point out that when a CDC signal passes between two clock domains through a two flip-flop synchronizer, the signal must be wider than 1-1/2 times the receive clock period. This can be described as "the input data value must remain stable for three destination clock edges."

For exceptionally slow source and destination clocks (long clock periods), the requirement can probably be relaxed safely to 1-1/4 times the receive clock period or less, but the "three edge" guideline is the safest starting condition for a design, and it is easier to prove with SystemVerilog assertions than a fractional CDC signal width measured dynamically during simulation.

The "three edge" requirement applies to both open-loop and closed-loop solutions, but a closed-loop implementation automatically ensures that every CDC signal is seen for at least three edges.
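The "three edge" rule can be checked numerically. The sketch below is an assumption-laden illustration rather than a substitute for assertions in the RTL: time is in integer units, the receive clock has an edge (rising or falling) every `half_period` units, an edge that coincides with a signal transition does not count as stable, and the function names are hypothetical.

```python
def stable_edges(start, width, half_period, phase=0):
    """Count receive-clock edges (rising and falling) strictly inside (start, start + width).

    Edges occur at phase + k * half_period for every integer k.
    """
    end = start + width
    first_k = (start - phase) // half_period + 1      # first edge strictly after start
    last_k = -((phase - end) // half_period) - 1      # last edge strictly before end
    return max(0, last_k - first_k + 1)


def meets_three_edge(width, half_period):
    """True if the signal is stable across at least three edges for every integer clock phase."""
    return all(stable_edges(0, width, half_period, p) >= 3 for p in range(half_period))


# Receive clock period of 10 time units, so 1.5 periods is 15 units.
print(meets_three_edge(15, 5))  # False: exactly 1.5 periods can straddle only two edges
print(meets_three_edge(16, 5))  # True: anything wider than 1.5 periods covers three edges
```

In this model a width of exactly 1.5 periods fails in the worst-case alignment, which is why the rule is stated as "wider than" 1.5 periods.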

Passing a fast CDC pulse
Consider a transmit clock domain that runs faster than the receive clock domain. If the CDC signal is pulsed for only one fast clock cycle, it can go high and low again between two rising edges of the slower clock and never be captured in the slower clock domain, as shown in Figure 8.
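A minimal sketch of this failure, assuming ideal sampling (no setup or hold effects), integer time units, and a hypothetical helper named `sampled_values`:

```python
def sampled_values(pulse_start, pulse_width, rx_period, rx_phase, n_edges):
    """Value captured on each rising receive-clock edge for a single active-high pulse."""
    samples = []
    for k in range(n_edges):
        t = rx_phase + k * rx_period                  # time of the k-th rising edge
        inside = pulse_start <= t < pulse_start + pulse_width
        samples.append(1 if inside else 0)
    return samples


# Transmit clock period 4, receive clock period 10: a one-cycle pulse is 4 units wide.
missed = sampled_values(pulse_start=12, pulse_width=4, rx_period=10, rx_phase=0, n_edges=4)
caught = sampled_values(pulse_start=8, pulse_width=4, rx_period=10, rx_phase=0, n_edges=4)
print(missed)  # [0, 0, 0, 0]
print(caught)  # [0, 1, 0, 0]
```

Whether the pulse is seen depends only on where it happens to fall relative to the receive clock edges, so the design cannot rely on it being captured.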

Sampling a long but not long enough CDC pulse
When the sending clock domain sends a pulse that is only slightly wider than the receive clock period, the signal will be sampled and passed in most cases. However, the pulse may change too close to two consecutive rising edges of the receive clock, violating the setup time at the first edge and the hold time at the second, so that the expected pulse never forms in the receive clock domain. This possible failure is shown in Figure 9.

Open-loop solution: sampling the signal with a synchronizer
One possible solution is to stretch the CDC pulse so that it is wider than the sampling clock period, as shown in Figure 10. The minimum pulse width is 1.5 times the receive clock period, on the assumption that the CDC signal will then be sampled at least once, and possibly twice, by the receive clock. Open-loop sampling can be used when the relative clock frequencies are fixed and have been analyzed correctly.
Advantages:
It is fast, because there is no need to wait for an ACK.
Disadvantages:
If the design requirements change, engineers may fail to re-analyze the original open-loop solution. This risk can be reduced by adding SystemVerilog assertions to the model that detect input pulses failing the "three edge" requirement.
Closed-loop solution: sampling the signal with a synchronizer
The closed-loop strategy is to send an enable control signal, synchronize it into the new clock domain, and then pass the synchronized signal back to the sending clock domain through another synchronizer as an acknowledgement (ACK).
Advantages: synchronizing the feedback signal into the sending clock domain safely confirms that the control signal has been recognized and sampled by the new clock domain.
Disadvantages: synchronizing the control signal in both directions before it is allowed to change can add considerable delay.
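The round trip can be sketched with a small two-clock simulation. Several things here are assumptions made for illustration: the function name `run_handshake`, the integer time base, the clock periods, and the choice of a four-phase protocol in which the sender also waits for the ACK to fall again before finishing. Metastability is not modeled.

```python
def run_handshake(a_period=3, b_period=10, b_offset=1, max_time=1000):
    """Simulate one request/acknowledge transfer; returns event times in integer time units."""
    req = 0                 # control signal, registered in the sending (aclk) domain
    b1 = b2 = 0             # two flip-flop synchronizer in the receiving (bclk) domain
    a1 = a2 = 0             # two flip-flop synchronizer for the ACK, back in the aclk domain
    state = "IDLE"
    events = {"b_edges_while_req_high": 0}
    for t in range(max_time):
        a_edge = t % a_period == 0
        b_edge = t % b_period == b_offset
        # Registers capture the values that were present just before the edge.
        req_old, b2_old, ack_seen = req, b2, a2
        if b_edge:
            b2, b1 = b1, req_old
            events["b_edges_while_req_high"] += req_old
        if a_edge:
            a2, a1 = a1, b2_old                # the synchronized request is fed back as ACK
            if state == "IDLE":
                req, state = 1, "WAIT_ACK"
                events["req_raised"] = t
            elif state == "WAIT_ACK" and ack_seen:
                req, state = 0, "WAIT_RELEASE"
                events["req_dropped"] = t
            elif state == "WAIT_RELEASE" and not ack_seen:
                events["done"] = t
                return events
    raise RuntimeError("handshake did not complete within max_time")


result = run_handshake()
print(result)
```

The request is held until the sender has seen its own signal come back, so the receive clock always gets at least two rising edges with the request high, regardless of the clock ratio. The cost is the latency of two synchronizers in each direction, which is the delay noted above.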

Transmitting multiple signals across clock domains
When multiple signals are passed between clock domains, simple synchronizers cannot guarantee that the data arrives safely. A common mistake in multi-clock designs is to pass several CDC bits that belong to the same transaction from one clock domain to another while overlooking how those bits are sampled together. Synchronized multi-bit signals inevitably carry some skew, and even when trace lengths are well matched the bits may be sampled on different clock edges. A multi-bit CDC strategy is needed to avoid sampling skewed multi-bit values.
Multi-bit CDC strategies
Strategies for avoiding skewed sampling of multi-bit CDC signals fall into three main categories:
(1) Multi-bit signal consolidation.
Where possible, consolidate multiple CDC bits into a 1-bit CDC signal.
(2) Multi-cycle path (MCP) formulation.
Use a synchronized load signal to pass multiple CDC bits safely.
(3) Gray code.
Pass multiple CDC bits using Gray codes.
The rest of this section details each of these strategies.
(1) Multi-bit signal consolidation
Where possible, consolidate multiple CDC signals into a single bit before the crossing. In the figure below, each CDC bit passes through its own synchronizer. When the order and alignment of the control signals matter, care must be taken that they arrive correctly in the other clock domain.
In the example of Figure 12, the register in the receiving clock domain needs both the load and enable signals from the transmitting clock domain in order to load data. If load and enable are driven on the same transmit clock edge, a small skew between them can cause them to be synchronized into two different clock cycles of the receiving domain. Because load and enable are then misaligned, the data is not loaded into the register.
To solve this problem, the two control signals are consolidated. As shown in Figure 13, a single load_enable signal drives both the load and enable inputs in the receiving clock domain, which removes any skew between the two control signals.
Two phase-shifted sequencing control signals
As shown in Figure 14, two control signals, b_ld1 and b_ld2, are driven in sequence from the sending clock domain into the receiving clock domain to control the enable inputs of pipelined data registers. The problem is that, in the first clock domain, the b_ld1 control signal may be deasserted slightly before the b_ld2 signal is asserted. A rising edge of the receive clock can fall in the small gap between the two pulses, producing a one-cycle gap in the enable control chain in the receive clock domain. As a result, the a3 register does not capture the a2 data in time.
The solution is to consolidate the control signals and add a flip-flop, as shown in Figure 15 below.
Multiple CDC signals
Figure 16 shows two encoded control signals being passed between two clock domains. If the two encoded signals are slightly skewed when sampled, an incorrect decoded output may be generated for one clock cycle in the receive clock domain.
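The effect of skew on an encoded value can be illustrated with a tiny model. The helper `bus_value_at` and the skew figures are hypothetical: each bit of the bus is assumed to switch from its old to its new value at `change_time` plus that bit's own skew.

```python
def bus_value_at(t, old, new, change_time, skew_per_bit):
    """Value seen on a multi-bit bus at time t when each bit switches at change_time + its skew."""
    value = 0
    for bit, skew in enumerate(skew_per_bit):
        source = new if t >= change_time + skew else old
        value |= source & (1 << bit)
    return value


SKEW = [0, 2]  # bit 1 arrives two time units later than bit 0

# Two bits change together (0b00 -> 0b11): a sample inside the skew window sees 0b01,
# a value that was never sent.
print(bin(bus_value_at(101, 0b00, 0b11, 100, SKEW)))  # 0b1

# Only one bit changes (0b01 -> 0b11, as in a Gray code step): every sample is
# either the old value or the new value.
print(bin(bus_value_at(101, 0b01, 0b11, 100, SKEW)))  # 0b1 (still the old value)
print(bin(bus_value_at(102, 0b01, 0b11, 100, SKEW)))  # 0b11
```

This is why the strategies in this section either reduce the crossing to one changing bit or hold the multi-bit value stable until a separately synchronized control signal says it may be sampled.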
(2) Multi-cycle path (MCP) formulation
MCP formulations and FIFO techniques can solve the problem of passing multiple CDC signals. Two MCP formulations are described below:
(a) Closed loop: MCP formulation with feedback;
(b) Closed loop: MCP formulation with acknowledge feedback.
Two FIFO strategies also provide closed-loop solutions to this problem:
(c) Asynchronous FIFO implementation;
(d) 2-deep FIFO implementation.
Advantages:
The sending clock domain does not need to calculate an appropriate pulse width to send between clock domains.
The sending clock domain only needs to toggle the enable signal into the receiving clock domain to indicate that data has been passed and is ready to be loaded. The enable signal does not need to return to its initial logic level.

With this strategy, the multiple CDC signals are passed without synchronization while a synchronized enable signal is passed to the receive clock domain at the same time. The receive clock domain is not allowed to sample the multi-bit CDC signals until the enable signal has passed through its synchronizer and reached the receiving register.

It is called a multi-cycle path formulation because the unsynchronized data word is passed directly to the receive clock domain and held stable for multiple receive clock cycles, giving the enable signal time to be synchronized and recognized in the receive clock domain before the unsynchronized data word is allowed to change.
Because the unsynchronized data has been stable for several clock cycles by the time it is sampled, there is no danger of the sampled value going metastable.
MCP formulation using a synchronized enable pulse
A common way to pass a synchronized enable signal between clock domains is to toggle an enable signal that feeds a synchronized pulse generator, indicating that the unsynchronized multi-cycle data word can be captured on the next receive clock edge.
The key feature of this synchronized enable pulse generator is that the polarity of the input signal does not matter. In Figure 18, the d input toggles high in cycle 1, and by cycle 4 the high level has propagated through the three synchronizing flip-flops. In cycle 3, the q2 and q3 flip-flop outputs have different polarities, so a synchronized enable pulse forms at the output of the XOR gate in that cycle. Similarly, the d input toggles low in cycle 7, and by cycle 10 the low level has propagated through the three synchronizing flip-flops. In cycle 9, the q2 and q3 outputs again have different polarities, so another enable pulse forms at the output of the XOR gate.
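The cycle-by-cycle behavior described for Figure 18 can be reproduced with a short behavioral model. The class name `SyncPulseGenerator` is hypothetical, and the model assumes three flip-flops in series with the pulse taken as the XOR of q2 and q3, as the text describes.

```python
class SyncPulseGenerator:
    """Three flip-flops in series; the enable pulse is q2 XOR q3."""

    def __init__(self):
        self.q1 = self.q2 = self.q3 = 0

    def clock(self, d):
        """Apply one rising clock edge; return (pulse, q) as seen after the edge."""
        # All three flops update together, each taking the previous stage's old value.
        self.q3, self.q2, self.q1 = self.q2, self.q1, d
        return self.q2 ^ self.q3, self.q3


gen = SyncPulseGenerator()
pulse_cycles = []
for cycle in range(0, 11):
    d = 1 if 1 <= cycle <= 6 else 0      # d toggles high in cycle 1 and low in cycle 7
    pulse, q = gen.clock(d)              # state after the edge that ends this cycle
    if pulse:
        pulse_cycles.append(cycle + 1)   # so the pulse is visible during the next cycle
print(pulse_cycles)  # [3, 9]
```

Each toggle of d, in either direction, yields exactly one pulse that is one clock cycle wide, so the sender never has to return the enable to its original level.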
The synchronized enable pulse generator described above is easier to work with when represented by an equivalent symbol, shown in Figure 19.
In addition to generating a pulse for either polarity of toggle on the d input, the synchronized enable pulse generator has a q output that follows the d input with a delay of three clock cycles. The q output is often used as a feedback signal, passed as an acknowledgement (ACK) through another synchronized enable pulse generator in the sending clock domain.
Figure 20 shows a typical send-receive toggle-pulse generation design.
Using this technique requires the receive clock domain to have logic that captures the data when the pulse is detected, because the pulse is valid for only one receive clock cycle per multi-cycle data word.
Closed loop: MCP formulation with feedback
An important technique when using an MCP formulation is to pass the enable signal back to the transmit clock domain as an acknowledge signal, as shown in Figure 21.
In the example of Figure 21, the acknowledge feedback signal b_ack generates an acknowledge pulse aack, which is an input to a one-bit state machine (with states READY and BUSY). The READY state indicates that the input data adatain in the sending clock domain may change again. That is, once the ready signal is high, the sender is free to send new data and its accompanying control signal.

This is an automatic feedback path: it assumes that the receiving clock domain is always ready for the next data word synchronized by the MCP formulation.
Closed loop: MCP formulation with acknowledge feedback
The closed-loop strategy with acknowledge feedback means that both the sender and the receiver have feedback, as shown in Figure 22 below. The enable signal is fed back to the sender as an acknowledge signal only after the bload pulse in the receive clock domain confirms that the data has been received.
In the example of Figure 22, the receive clock domain has a WAIT-READY FSM that sends a valid signal (bvalid) to the receiving logic when data is valid at the input of the data register. The data is not actually loaded until the receiving logic acknowledges that it should be loaded by asserting the bload signal. No feedback is sent to the sending clock domain until the data has been loaded, after which the b_ack signal is sent back as in the MCP formulation with automatic feedback.
This feedback path requires the receive clock domain to act before the data is captured and the feedback is sent.

Other techniques for multi-bit CDC
Two interesting FIFO implementations can be used to pass multi-bit CDC signals safely:
(1) Asynchronous FIFO implementation
(2) 2-deep FIFO implementation
Using an asynchronous FIFO to pass multi-bit CDC signals
An asynchronous FIFO is a good way to pass multi-bit data or control signals across clock domains. It is a shared memory or register buffer into which data is written from the transmit clock domain and from which data is read in the receive clock domain.
A standard asynchronous FIFO allows data to be written when it is not full and allows the receiver to read data when it is not empty. The key problem is therefore detecting the full and empty conditions, which is readily done with Gray code counters. For the reasons, see the blog post: Transmitting Data in Cross-Time Domain.
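The property that makes Gray code counters suitable here is that consecutive values differ in exactly one bit, so a pointer sampled mid-change is always either the old or the new value. A minimal sketch of the standard binary/Gray conversions (the function names are illustrative):

```python
def bin_to_gray(n):
    """Convert a non-negative binary integer to its reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_bin(g):
    """Convert a reflected Gray code back to binary by XOR-ing all right shifts."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


WIDTH = 4
codes = [bin_to_gray(i) for i in range(1 << WIDTH)]
print([format(c, f"0{WIDTH}b") for c in codes[:4]])  # ['0000', '0001', '0011', '0010']

# Every step, including the wrap from the last code back to the first, flips one bit.
steps = [codes[i] ^ codes[(i + 1) % len(codes)] for i in range(len(codes))]
print(all(bin(s).count("1") == 1 for s in steps))    # True
```

The wrap-around property holds because the counter length is a power of two, which is the usual case for FIFO pointers.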
A clock-domain crossing design using a 1-deep/2-register FIFO synchronizer is shown below.
Since this FIFO is built from only two registers or a 2-deep dual-port RAM, the Gray code counters used to detect full and empty are simple toggle flip-flops, which are really just 1-bit binary counters.
After reset, both pointers are cleared and the FIFO is empty, and hence not full. The inverted full condition indicates that the FIFO is ready to receive a data or control word (wrdy is high).
After a data or control word is put into the FIFO (using wput), wptr toggles and the FIFO becomes full. In other words, the wrdy signal goes low, which disables toggling of wptr and therefore prevents another word from being put into the 2-register FIFO until the receive clock domain logic removes the first word.
What is particularly interesting about this design is that wptr now points to the second location in the 2-register FIFO, so when the FIFO becomes ready again (wrdy is high), wptr already points to the next location to be written.
The same concept is replicated at the receiving end of the FIFO. When a data or control word is written into the FIFO, the FIFO becomes not empty. The inverted empty condition indicates that the FIFO holds a data or control word that is ready to be received (rrdy is high).
By using two registers to store the multi-bit CDC values, one clock cycle is removed from the send MCP formulation and another from the acknowledge feedback path.
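The pointer behavior of the 2-register FIFO can be sketched as follows. This is a simplified, single-threaded model: the class and method names are hypothetical, and the synchronizers that carry each 1-bit pointer into the other clock domain are left out, so each side sees the other pointer immediately.

```python
class TwoRegisterFifo:
    """1-deep/2-register FIFO with 1-bit toggle pointers (pointer synchronizers omitted)."""

    def __init__(self):
        self.mem = [None, None]
        self.wptr = 0  # write pointer: a single toggle flip-flop
        self.rptr = 0  # read pointer: a single toggle flip-flop

    @property
    def wrdy(self):
        """Ready to accept a word: pointers equal means the FIFO is not full."""
        return self.wptr == self.rptr

    @property
    def rrdy(self):
        """A word is ready to be received: pointers differ means the FIFO is not empty."""
        return self.wptr != self.rptr

    def put(self, word):
        if not self.wrdy:
            return False          # wrdy low disables the pointer toggle
        self.mem[self.wptr] = word
        self.wptr ^= 1            # now points at the other register
        return True

    def get(self):
        if not self.rrdy:
            return None
        word = self.mem[self.rptr]
        self.rptr ^= 1
        return word


fifo = TwoRegisterFifo()
print(fifo.wrdy, fifo.rrdy)   # True False (empty after reset)
fifo.put("A")
print(fifo.wrdy, fifo.rrdy)   # False True (full after one word)
print(fifo.get(), fifo.wptr)  # A 1 (wptr already points at the second register)
```

In real hardware the synchronized pointers arrive a couple of clock cycles late, which delays wrdy and rrdy but does not change the pointer logic shown here.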

Clock-oriented design partitioning
Design partitioning means allowing only one clock per module and verifying that each module meets its timing requirements, which makes static timing analysis much easier for each clock domain in the design.
Recommendation 1: Create synchronizer modules to pass signals from one clock domain to another, and allow only one clock per synchronizer module.
Reason: any signal passing from one clock domain to another should be assumed to have setup and hold time problems. Isolating the CDC boundary logic significantly reduces the design and verification effort for multi-clock designs.
In most cases, the synchronizer modules will be the only modules in the design with deliberate setup and hold time violations. It is well known that timing violations occur when signals pass between asynchronous clock domains, which is why synchronizers are added to the design.
The example in Figure 30 is a design with three clock domains, labeled aclk, bclk, and cclk. All aclk design modules are grouped into one aclk logic block, and likewise for bclk and cclk. Any signal from an asynchronous clock domain must pass through a synchronizer module before it is allowed to drive an input of another logic block.

Conclusion
Clock-domain crossings are error-prone and can cause design failures, so designs should follow sound CDC strategies.

Recommendations for single-bit CDC signals
* Register the output of the sending clock domain to remove glitches that combinational logic may cause;
* Synchronize the signal into the receiving clock domain. An MCP formulation can be used when necessary;

Recommendations for multi-bit CDC signals
* Consolidate: where possible, combine multiple signals into 1 bit and synchronize it into the receiving clock domain;
* Use an MCP formulation to pass multi-bit values across clock domains;
* Use FIFOs to pass data or control signals;
* Use Gray code counters;

Origin blog.csdn.net/lilliana/article/details/106642667