Processing across clock domains

Clock domain crossing (CDC) is a problem encountered frequently in FPGA design, and handling data that moves between clock domains is something every FPGA beginner has to learn. It is also a common interview question.
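Pulse signal: a signal that changes in step with the clock, typically asserted for only one clock cycle.
Level signal: a signal that holds its value for many clock cycles, so its transitions are not tied to individual clock edges.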

1. Single-bit cross-clock-domain transmission

There are mainly two cases.

The first: signal from B to A (slow to fast)

A pulse signal pulse_b in clock domain B looks like a very wide "level" signal from the point of view of clock domain A, because it stays stable for several clk_a cycles, so clk_a is certain to sample it. In practice, the sampled signal must be registered for two beats (two flip-flop stages). The first stage samples the asynchronous input; because the input can change inside the setup/hold window of that flip-flop, its output may become metastable. The second stage registers the signal once more, which gives the first stage a full clock period to settle and reduces the impact of metastability. Two stages are generally the minimum requirement. For high-frequency designs, more register stages should be added to further reduce the risk of system instability. In other words, a chain of flip-flops is used to sample signals from an asynchronous clock domain, and the more stages there are, the less likely it is that a metastable value reaches the downstream logic.
 
It should be emphasized that pulse_b must be a registered signal in the clk_b domain. If pulse_b is a combinational logic signal in the clk_b domain, first capture it with a D flip-flop (DFF) clocked by clk_b, and then pass it to clk_a through the two-stage DFF chain. The reason is that a combinational signal in the clk_b domain can glitch. Inside the clk_b domain, meeting setup/hold timing ensures that the glitch is never sampled by clk_b. However, because the phase relationship between the two clocks is unknown, clk_a is quite likely to sample the glitch. The general code design is as follows:
reg pules_a_r1, pules_a_r2, pules_a_r3; // synchronizer and edge-detect registers

always @ (posedge clk_a or negedge rst_n)
begin
    if (rst_n == 1'b0)
        begin
            pules_a_r1 <= 1'b0;
            pules_a_r2 <= 1'b0;
            pules_a_r3 <= 1'b0;
        end
    else
        begin // register for three beats: two for synchronization, one for edge detection
            pules_a_r1 <= pulse_b;
            pules_a_r2 <= pules_a_r1;
            pules_a_r3 <= pules_a_r2;
        end
end
assign pulse_a_pos = pules_a_r2 & (~pules_a_r3); // rising edge detection
assign pulse_a_neg = pules_a_r3 & (~pules_a_r2); // falling edge detection
assign pulse_a = pules_a_r2;                     // synchronized level
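
The two points made above (register a combinational signal in clk_b before it crosses, and add stages for high-frequency designs) can be combined in one small module. The following is only a sketch under assumptions: the module name sync_slow_to_fast, the port names comb_b and sync_a, and the STAGES parameter are hypothetical and do not come from the original design.

```verilog
// Hypothetical sketch: source-domain register followed by an N-stage synchronizer.
module sync_slow_to_fast #(
    parameter STAGES = 2            // assumed default; must be 2 or more
) (
    input  clk_b,                   // source (slow) clock
    input  clk_a,                   // destination (fast) clock
    input  rst_n,                   // active-low asynchronous reset
    input  comb_b,                  // combinational signal in the clk_b domain
    output sync_a                   // synchronized level in the clk_a domain
);
    // Capture the combinational signal in its own domain first,
    // so that glitches never reach the clk_a flip-flops.
    reg pulse_b;
    always @(posedge clk_b or negedge rst_n)
        if (!rst_n) pulse_b <= 1'b0;
        else        pulse_b <= comb_b;

    // Shift-register synchronizer: bit 0 samples the asynchronous input,
    // the remaining bits give a metastable value time to settle.
    reg [STAGES-1:0] sync_ff;
    always @(posedge clk_a or negedge rst_n)
        if (!rst_n) sync_ff <= {STAGES{1'b0}};
        else        sync_ff <= {sync_ff[STAGES-2:0], pulse_b};

    assign sync_a = sync_ff[STAGES-1];
endmodule
```

With STAGES left at 2 this behaves like the first two registers of the listing above; setting it to 3 or more only lengthens the chain and adds latency.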

Many readers will ask: why two beats? Would one beat or three beats also work?

Let's briefly go over the principle behind the two-stage register. Roughly speaking, the probability that metastability passes through two stages is the square of the probability for a single stage. Two stages cannot completely eliminate the metastability hazard, but they greatly reduce the probability that it occurs and so improve reliability. In general, with one stage the probability of failure is still very high, while a third stage brings little further improvement.
If this is still not entirely clear, consider the timing in the following scenario:
data belongs to clock domain 1 and needs to be transferred to clock domain 2 (clk) for processing. Register 1 and register 2 are both clocked by clk. Suppose a rising edge of clk coincides with a transition of data (a rising edge from 0 to 1; a real transition cannot be instantaneous, so there is a short transition time). Should register 1 capture a 0 or a 1? The answer is indeterminate, so the value of Q1 cannot be predicted. However, by the next rising edge of clk, Q1 has almost always settled and meets the setup and hold time requirements of the second-stage register, so the probability of metastability at its output is greatly reduced.
If a third-stage register is added, the second stage has already greatly reduced the probability of metastability, so the third stage is, to a large extent, just a delay of the second stage's output. In most designs it therefore adds little benefit.
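
The effect of the extra stage is often described with the commonly quoted synchronizer reliability model below. It is a textbook approximation rather than something stated in this article, and its constants are device-specific values that would have to come from the FPGA vendor's characterization data.

```latex
\[
\mathrm{MTBF} = \frac{e^{\,t_r/\tau}}{T_0 \, f_{clk} \, f_{data}}
\]
```

Here t_r is the time available for a metastable output to resolve, tau and T_0 are constants of the flip-flop technology, f_clk is the sampling clock frequency, and f_data is the rate at which the asynchronous input toggles. Adding the second stage increases t_r by roughly one clk period, and because t_r sits in the exponent, the mean time between failures grows very quickly. A higher f_clk shortens t_r, which is consistent with the advice that high-frequency designs may need more stages.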

The second: signal from A to B (fast to slow)

If a single-bit signal goes from clock domain A to clock domain B, there are two situations: transferring a pulse signal pulse_a or transferring a level signal level_a. In general, only a level signal level_a that is wide enough to be sampled by clk_b can guarantee correct operation of the system. So how should a pulse signal pulse_a be handled? The answer is a handshake between the clock domains, in which pulse_a is replaced by a stretched signal.
The principle is to first stretch the pulse in the clk_a domain into a level signal signal_a and then pass it to clk_b. Once it is confirmed that clk_b has "seen" the synchronized signal, signal_a is cleared. The general framework of the code is as follows:
module Sync_Pulse ( clk_a, clk_b, rst_n, pulse_a_in, pulse_b_out, b_out );
/****************************************************/
input clk_a; input clk_b; input rst_n; input pulse_a_in; output pulse_b_out; output b_out;
/****************************************************/
reg signal_a; reg signal_b; reg signal_b_r1; reg signal_b_r2; reg signal_b_a1; reg signal_b_a2;
/****************************************************/
// In the clock domain clk_a, generate the stretched signal signal_a
always @ (posedge clk_a or negedge rst_n)
begin
    if (rst_n == 1'b0)
        signal_a <= 1'b0;
    else if (pulse_a_in)  // input pulse_a_in detected high: pull signal_a high
        signal_a <= 1'b1;
    else if (signal_b_a2) // feedback signal_b_a2 detected high: pull signal_a low
        signal_a <= 1'b0;
end
// In the clock domain clk_b, sample signal_a to generate signal_b
always @ (posedge clk_b or negedge rst_n)
begin
    if (rst_n == 1'b0)
        signal_b <= 1'b0;
    else
        signal_b <= signal_a;
end
// Multi-stage flip-flop processing
always @ (posedge clk_b or negedge rst_n)
begin
    if (rst_n == 1'b0)
    begin
        signal_b_r1 <= 1'b0;
        signal_b_r2 <= 1'b0;
    end
    else begin
        // Register signal_b for two more beats
        signal_b_r1 <= signal_b;
        signal_b_r2 <= signal_b_r1;
    end
end
// In the clock domain clk_a, sample signal_b_r1 as feedback to pull down the stretched signal signal_a
always @ (posedge clk_a or negedge rst_n)
begin
    if (rst_n == 1'b0)
    begin
        signal_b_a1 <= 1'b0;
        signal_b_a2 <= 1'b0;
    end
    else begin
        // Register signal_b_r1 for two beats, because this is also a clock domain crossing
        signal_b_a1 <= signal_b_r1;
        signal_b_a2 <= signal_b_a1;
    end
end
assign pulse_b_out = signal_b_r1 & (~signal_b_r2); // one-clk_b-cycle pulse on the rising edge
assign b_out = signal_b_r1;                        // synchronized level
endmodule
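
To see the handshake at work, a small simulation testbench can drive a few pulses into Sync_Pulse and count how many arrive in the clk_b domain. This is a minimal sketch under assumptions: the testbench name, the clock periods (10 ns and 37 ns), and the 500 ns spacing between pulses are arbitrary illustrative choices. Note that an RTL simulation does not model metastability, so this only checks the handshake logic.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench for Sync_Pulse; all timing values are assumed.
module tb_sync_pulse;
    reg  clk_a      = 1'b0;
    reg  clk_b      = 1'b0;
    reg  rst_n      = 1'b0;
    reg  pulse_a_in = 1'b0;
    wire pulse_b_out;
    wire b_out;

    integer sent = 0;   // pulses driven in the clk_a domain
    integer seen = 0;   // pulses observed in the clk_b domain

    always #5    clk_a = ~clk_a;   // fast clock, 10 ns period
    always #18.5 clk_b = ~clk_b;   // slow clock, 37 ns period

    Sync_Pulse dut (
        .clk_a       (clk_a),
        .clk_b       (clk_b),
        .rst_n       (rst_n),
        .pulse_a_in  (pulse_a_in),
        .pulse_b_out (pulse_b_out),
        .b_out       (b_out)
    );

    // Count every one-cycle pulse that appears in the clk_b domain.
    always @(posedge clk_b)
        if (pulse_b_out) seen = seen + 1;

    // Drive a pulse that is exactly one clk_a cycle wide.
    task send_pulse;
        begin
            @(posedge clk_a);
            pulse_a_in <= 1'b1;
            @(posedge clk_a);
            pulse_a_in <= 1'b0;
            sent = sent + 1;
        end
    endtask

    initial begin
        #50 rst_n = 1'b1;          // release reset
        repeat (5) begin
            send_pulse;
            #500;                  // wait for the handshake to complete
        end
        if (seen == sent)
            $display("PASS: %0d pulses sent, %0d seen", sent, seen);
        else
            $display("FAIL: %0d pulses sent, %0d seen", sent, seen);
        $finish;
    end
endmodule
```

The spacing between pulses matters: a new pulse_a_in that arrives before the feedback signal_b_a2 has returned low can be lost or cut short, so the sender has to wait for the full round trip before issuing the next pulse.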

In summary, there are five simple principles to keep in mind when designing:

1. The transition edge of the global clock is the most reliable.
2. An input from an asynchronous clock domain needs to be registered once for synchronization and once more to reduce the impact of metastability.
3. An input from the same clock domain whose transition edge is not used does not need to be registered.
4. An input from the same clock domain whose transition edge is used needs to be registered once.
5. An input from a different clock domain whose transition edge is used needs three flip-flops: the first two are for synchronization, and the outputs of the second and third flip-flops are combined through a logic gate to detect the transition edge.
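
Principle 4 can be illustrated with a very small module. This is only a sketch: the module and port names are hypothetical, and sig_in is assumed to be already synchronous to clk (for an asynchronous input, principle 5 applies instead).

```verilog
// Hypothetical sketch of principle 4: edge detection inside one clock domain.
module edge_detect_same_domain (
    input  clk,
    input  rst_n,      // active-low asynchronous reset
    input  sig_in,     // assumed to be synchronous to clk already
    output sig_pos,    // high for one cycle on a rising edge of sig_in
    output sig_neg     // high for one cycle on a falling edge of sig_in
);
    reg sig_r1;        // sig_in delayed by one clock cycle

    always @(posedge clk or negedge rst_n)
        if (!rst_n) sig_r1 <= 1'b0;
        else        sig_r1 <= sig_in;

    // Compare the current value with the value one cycle earlier.
    assign sig_pos =  sig_in & ~sig_r1;
    assign sig_neg = ~sig_in &  sig_r1;
endmodule
```

Only one register is needed here because no synchronization is required; the single flip-flop exists purely to remember the previous value.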

2. Multi-bit cross-clock-domain transmission

To pass multi-bit data across clock domains, an asynchronous dual-port RAM is generally used. Suppose we have a signal acquisition platform in which the ADC chip provides a 60 MHz source-synchronous clock, the ADC output data changes on the rising edge of that 60 MHz clock, and the FPGA needs to process the (multi-bit) ADC data with a 100 MHz clock.
In a scenario like this, an asynchronous dual-port RAM can handle the clock domain crossing: first write the ADC output data into the RAM using the 60 MHz clock provided by the ADC chip, and then read it out of the RAM using the 100 MHz clock.
Wherever an asynchronous dual-port RAM can be used for clock domain crossing, an asynchronous FIFO can also be used to achieve the same purpose.
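
As an illustration of the RAM-based approach, the following is a minimal sketch of a dual-clock RAM written so that synthesis tools can usually infer a block RAM from it. The module name, port names, and the default widths (12-bit data, 1024 entries) are assumptions chosen for the ADC example, not values from the original design; in a real project a vendor RAM or FIFO IP core is often used instead.

```verilog
// Hypothetical sketch: simple dual-port RAM with independent write and read clocks.
module dual_clock_ram #(
    parameter DATA_W = 12,          // assumed ADC sample width
    parameter ADDR_W = 10           // assumed depth of 2**10 entries
) (
    // Write port, clocked by the 60 MHz ADC clock in the example
    input                   wr_clk,
    input                   wr_en,
    input  [ADDR_W-1:0]     wr_addr,
    input  [DATA_W-1:0]     wr_data,
    // Read port, clocked by the 100 MHz processing clock in the example
    input                   rd_clk,
    input  [ADDR_W-1:0]     rd_addr,
    output reg [DATA_W-1:0] rd_data
);
    reg [DATA_W-1:0] mem [0:(1<<ADDR_W)-1];

    // Each sample is written entirely within the write clock domain.
    always @(posedge wr_clk)
        if (wr_en) mem[wr_addr] <= wr_data;

    // Each sample is read entirely within the read clock domain.
    always @(posedge rd_clk)
        rd_data <= mem[rd_addr];
endmodule
```

The RAM itself only moves the data. The read side still has to know which addresses have already been written, and passing that information between the two clocks is itself a clock domain crossing; an asynchronous FIFO handles this bookkeeping internally.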
 
