Analysis of Cross-Clock Domain Processing (3) (Clock Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog)

Preface

         Reference text: "Clock Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog" by Clifford E. Cummings.

        The paper mainly describes how to design logic that crosses clock domains. In the original post, black text is the content of the paper and light-blue text is my own commentary. If you need the original paper in English, leave a comment with your email address.

        Series of articles:
        Analysis of Cross-Clock Domain Processing (1) (Clock Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog)

        Analysis of Cross-Clock Domain Processing (2) (Clock Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog)

        Analysis of Cross-Clock Domain Processing (3) (Clock Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog)


6.0 Naming Conventions and Design Partitions

        Naming conventions help ensure good team communication and make it easy to use scripting languages to collect and group all signals associated with a specific clock in a design. Good design partitioning can significantly reduce the effort needed to synthesize and verify the timing of multi-clock designs. This section discusses recommended naming conventions and design partitioning.

        There are two ways to address potential CDC issues:

                (1) Verify that the design complies with the correct CDC rules,

                (2) Avoid the problem in the first place.

        Both methods are valuable and should be used to ensure error-free designs.

        The first method, verifying the CDC design rules, typically requires special tools that check the design for possible CDC violations. When I wrote my first paper on multi-clock design in 2001, I didn't know of any tools on the market that could check CDC rules. Today, many companies offer such tools (see [11] for a list of companies and tools in the CDC verification space). The second method, avoiding the problem, can be achieved by following the coding guidelines outlined below.

6.1 Clock and Signal Naming Conventions

        Various design teams have used a number of useful clock and signal naming conventions.  

        Guideline: Use a clock-based naming convention to identify the clock source of each signal in your design.  

        Reason: The naming convention helps all team members identify the clock domain of each signal in the design, and also makes it easier to group signals for timing analysis using regular-expression "wildcards" in synthesis scripts.  

        A well-established naming convention requires the use of leading prefix characters to identify various asynchronous clock domains. Examples include: uClk for microprocessor clock, vClk for video clock, and dClk for display clock.  

        Each signal is then synchronized to one of the clock domains in the design, and each signal name carries a prefix character that identifies the clock domain used to generate the signal. For example, any signal generated by uClk has a u prefix in its name, such as uaddr, udata, uwrite, etc. Any signal generated by vClk has a v prefix in its name, e.g., vdata, vhsync, vframe, etc. The same convention applies to signals generated by any other clock in the design. 

        Using this technique, any engineer on a design team can easily identify the clock domain source of any signal in the design and use those signals directly or pass them through with proper synchronization to use them in a new clock domain.  
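        As a rough illustration of the convention, here is a minimal sketch of a block that lives entirely in the uClk domain. The module name ureg_file, the port widths, and the umem array are assumptions invented for this example; they do not come from the paper. The point is only that the u prefix on every port makes the clock domain visible from the port list alone.

```systemverilog
// Sketch only: a hypothetical register file in the uClk domain.
// Every signal generated by (or sampled on) uClk carries the u prefix.
module ureg_file (
  output logic [7:0] urdata,   // read data, registered on uClk
  input  logic [7:0] udata,    // write data from the uClk domain
  input  logic [3:0] uaddr,    // address from the uClk domain
  input  logic       uwrite,   // write strobe from the uClk domain
  input  logic       uClk,
  input  logic       urst_n);

  logic [7:0] umem [16];       // storage written only on uClk

  // Write port: no reset on the storage array.
  always_ff @(posedge uClk)
    if (uwrite) umem[uaddr] <= udata;

  // Registered read port.
  always_ff @(posedge uClk or negedge urst_n)
    if (!urst_n) urdata <= '0;
    else         urdata <= umem[uaddr];
endmodule
```

        A signal such as vhsync appearing in this port list would stand out immediately as a crossing that needs a synchronizer.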

        The exact naming convention is not important, but it is critical that every engineer on the project agrees to adhere to the naming convention chosen by the team. Naming conventions will significantly increase the productivity of design teams.

        Almost every company has its own convention, and the more rigorous companies publish it as a formal document that all designers are required to follow.  

6.1.1 Multi-Clock/Multi-Source Modules Without Naming Conventions

        If your team doesn't use any specific clock-oriented signal naming convention, and if modules are allowed to have multiple clock inputs, there is always a danger that CDC analysis tools may be set up incorrectly, and bad CDC design practices can easily be missed. 

        Even if your team has access to a good CDC analysis tool, I strongly recommend taking a few simple steps that make potential CDC design issues easier to identify and debug.

6.2 Timing Verification for Each Clock Domain

        To verify the timing of any design, you must verify that each clock domain in the design meets the timing requirements. Although tools have improved over the past decade to help automate the analysis and verification of signals in different clock domains, it is still good practice to use good partitioning and naming conventions for multi-clock designs.  

        By partitioning the design to allow only one clock per block, static timing analysis becomes very easy for each domain in the design.  

6.3 Clock-Oriented Design Partitioning

        Some of the simplest and most effective design partitioning is done at clock boundaries.  

        Guideline: Allow only one clock per module [9].  

        Reason: Static timing analysis and creation of synthesis scripts are easier to do on a single clock block or group of single clock blocks.  

        Exception: A top-level block that connects signals from all different clock domains together will naturally have all clocks as inputs to that block. Minimize your multi-clock verification efforts and only allow top-level blocks to have multiple clock inputs.  

        Guideline: Partition the design into single-clock blocks.  

        Reason: Fully synchronous sub-blocks are easy to verify with static timing analysis (STA) tools, and partitioning the design into single-clock sub-blocks turns one large, complex timing analysis task into several fully synchronous single-clock analyses.  

        Guideline: Create synchronizer blocks to pass signals from one clock domain to another, and allow only one clock per synchronizer block.  

        Reason: Assume that any signal passed from one clock domain to another will eventually suffer from setup and hold time issues. Isolating CDC boundary logic can significantly reduce the design and verification effort for multi-clock designs.  

        In most cases, the synchronizer block will be the only block in the design that will encounter intentional setup and hold violations. Timing violations occur when passing signals between asynchronous clock domains, which is the whole reason that synchronizers must be added to the design.

        Make the design as modular and fine-grained as possible, let each module use only one clock, and then aggregate the sub-modules through top-level modules (there can be several levels). This confines CDC problems to the top-level modules and effectively reduces the probability of errors.

        Consider an example design with three clock domains, labeled aClk, bClk, and cClk, as shown in Figure 30. In this design, all aClk design blocks are combined into one aClk logic block. All bClk design blocks are grouped into a bClk logic block, and again we create a cClk logic block. Any signals originating from an asynchronous clock domain pass through the synchronizer block before being allowed to drive the input of another logic block. 
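        The partitioning described above can be sketched as follows. This is a simplified illustration with only two of the three domains, not the design from Figure 30: the module names (alogic, sync_a2b, blogic, top), the placeholder behaviour inside the logic blocks, and the reset signals are all assumptions made for the example. Each lower-level module has exactly one clock, and only the top level sees both clocks.

```systemverilog
// Hypothetical aClk logic block: toggles aflag once every 8 aClk cycles.
module alogic (
  output logic aflag,
  input  logic aClk, arst_n);
  logic [2:0] acount;
  always_ff @(posedge aClk or negedge arst_n)
    if (!arst_n) begin
      acount <= '0;
      aflag  <= 1'b0;
    end
    else begin
      acount <= acount + 1'b1;
      if (acount == 3'd7) aflag <= ~aflag;   // registered in the source domain
    end
endmodule

// Synchronizer block: the only place where an a* signal is sampled by bClk.
// It has a single clock input (bClk).
module sync_a2b (
  output logic bflag,
  input  logic aflag,
  input  logic bClk, brst_n);
  logic bflag_meta;                          // first stage, may go metastable
  always_ff @(posedge bClk or negedge brst_n)
    if (!brst_n) {bflag, bflag_meta} <= '0;
    else         {bflag, bflag_meta} <= {bflag_meta, aflag};
endmodule

// Hypothetical bClk logic block: counts toggles of the synchronized flag.
// Assumes bClk is fast enough to see every toggle of aflag.
module blogic (
  output logic [7:0] bcount,
  input  logic bflag, bClk, brst_n);
  logic bflag_d;
  always_ff @(posedge bClk or negedge brst_n)
    if (!brst_n) begin
      bflag_d <= 1'b0;
      bcount  <= '0;
    end
    else begin
      bflag_d <= bflag;
      if (bflag ^ bflag_d) bcount <= bcount + 1'b1;
    end
endmodule

// Top level: the only module with more than one clock input.
module top (
  output logic [7:0] bcount,
  input  logic aClk, arst_n, bClk, brst_n);
  logic aflag, bflag;
  alogic   alogic_i (.aflag, .aClk, .arst_n);
  sync_a2b sync_i   (.bflag, .aflag, .bClk, .brst_n);
  blogic   blogic_i (.bcount, .bflag, .bClk, .brst_n);
endmodule
```

        With this structure, alogic and blogic can each be timed as an ordinary synchronous design, and the only asynchronous input in the whole hierarchy is the aflag pin of sync_a2b.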

6.3.1 Timing Analysis of Clock Partition Module

        Using a clock-oriented design partitioning strategy, all inputs and outputs of each design block are fully synchronized to a single clock. This is the easiest type of design to verify using a static timing analysis (STA) tool because there are no false paths in the design.  

        Group together all design blocks clocked within each clock domain. A group should be formed for each clock domain in the design. These groups will undergo timing verification as if each group were a separate, fully synchronized design. For each clock domain we have a design block where we can easily perform worst case (max time/setup time check) timing analysis and best case (min time/hold time check) timing analysis.  

        Also using this clock-oriented partitioning strategy, each CDC boundary is isolated using synchronizer modules. Each synchronizer block consists only of synchronizer cells provided by the ASIC or FPGA vendor (preferred), or is constructed using flip-flops connected in pairs to form a synchronizer equivalent.  

        If the ASIC or FPGA vendor provides synchronizer cells and you instantiate them in the design, there is no need to verify the setup and hold times inside these blocks, because the vendor should have created a cell placement that does not violate the setup or hold time between the flip-flop stages.

        If the synchronizer is synthesized from RTL code, it is most important to perform a best-case timing analysis to ensure that the flip-flops are not placed so close together that the output of the first stage changes too quickly to meet the hold time requirement at the input of the second stage. A colleague recently pointed out that worst-case timing analysis should also be performed, in case the placement tool happens to place the two synchronizer flip-flops very far apart on the ASIC or FPGA. I agree with this updated suggestion.  

        Because the synchronizers are partitioned into separate blocks, gate-level simulations can be more easily configured to ignore setup and hold time violations on the first stage of each synchronizer.  

        Static timing analysis of an RTL synchronizer requires a simple set_false_path command to remove the synchronizer input from STA. We already know that there are timing problems at the input of the synchronizer, which is why a synchronizer is used.  

        Static timing analysis becomes very easy to perform when the design and synchronizer blocks are partitioned to allow only one clock per block. The synthesis script commands used to handle multiple clock domains then become a matter of grouping, identifying false paths, and performing min-max timing analysis.

6.4 Partitioning with MCP Formulations

        Partitioning the design into separate design blocks and synchronizer blocks at clock boundaries works well in most cases, but if multiple signals must be passed between clock domains using a multi-cycle path (MCP) formulation, some of the signals passed into a design block will come from a different clock domain, as shown in Figure 31. 

        Design blocks with asynchronous inputs can still be easily timed if the signals in the design use a clock-based naming convention. Simply exclude the asynchronous input from the analysis before performing STA on the design block in question.  

        Typically, the "set_false_path" command is only required for synchronizer inputs and MCP formulation datapath inputs. If you use a clock prefix naming scheme, you can easily identify all asynchronous inputs using wildcards. In Figure 31, to exclude the adata bus from STA within the bClk block, first execute the following command:

                set_false_path -from { a* } 

        This command should be sufficient to eliminate all asynchronous inputs from bClk STA. 

7.0 Multiple Clock Gate-Level Simulation Issues

        Digital simulation models typically drive X when a synchronizer flip-flop detects a setup or hold violation on a CDC signal. This usually causes gate-level simulation to fail. What techniques exist to solve this problem?  

        As described in Section 6.3.1, signals that cross clock boundaries through a synchronizer will encounter setup and hold violations. This is why synchronizers are added to the design: to filter out the metastability effects of signals that change too close to the rising edge of the receiving clock.  

7.1 Synchronizer gate-level CDC simulation problem 

        When simulating a multi-clock design at the gate level, the ASIC library models of the flip-flops include setup and hold checks that match the timing specifications of the actual flip-flops. ASIC libraries typically model a flip-flop so that it drives X (unknown) on its output when a timing violation occurs. When simulating a gate-level synchronizer, setup and hold violations cause the ASIC library to issue setup and hold error messages, and the offending signals are often driven to X. These X values propagate to the rest of the design, causing problems when trying to verify the functionality of the entire gate-level design, as shown in Figure 32. 

7.2 Strategies to Eliminate X-Propagation from Gate-Level Simulations

        Over the past 10 years, many colleagues have shared with me a number of strategies to address the problems associated with unwanted propagation of X every time a signal violates the setup or hold time of the first stage of the synchronizer. 

        Since X-propagation occurs when setup or hold times are violated, almost all solutions to this problem involve changing the setup and hold times to 0 so that the setup or hold times are not violated, and therefore no X-propagation occurs.  

        Some methods are bad, some are good. Here are some strategies to consider for solving the X-propagation problem.  

7.2.1 Simulator command to turn off timing check

        Most SystemVerilog simulators have a command option to ignore all timing checks, but this also ignores the timing checks required by the rest of the design.  

7.2.2 Change Flip-Flop Setup and Hold Times to 0

        The setup and hold times can be changed to zero for any ASIC library flip-flop used in a synchronizer, but this sets the setup and hold checks to zero for every instance of that flip-flop type, including flip-flops in the rest of the design whose timing you still want to check.  

7.2.3 Copying and Modifying Flip-Flop Models

        You can copy the flip-flops from the ASIC library into a new SystemVerilog library under different names, set all setup and hold times in the copies to zero, and then modify the gate-level netlist so that every first-stage synchronizer flip-flop is replaced by the modified version without timing checks. This is a tedious and error-prone process that must be repeated every time a new netlist is generated. It may require a makefile and a script so that the modification is applied automatically whenever a new netlist is generated.  

7.2.4 Synopsys set_annotated_check command

        A useful approach to this problem proposed by Bhatnagar [5] is to use a Synopsys command to modify the SDF back-annotated setup and hold times on the first-stage flip-flop cells in the design. Bhatnagar noted that SDF files are instance-based, which makes it easier to target the setup and hold times of the offending cells. Bhatnagar pointed out:

                Rather than manually removing the setup and hold constructs from the SDF file, a better approach is to zero out the setup and hold times in the SDF file, replacing the existing setup and hold time numbers with zeros only for the violating flops.

        Bhatnagar further noted that setting the setup and hold times to zero means that there will be no timing violations and therefore no propagation of unknowns to the rest of the design. The following dc_shell-t command given by Bhatnagar is used to make the setup and hold times zero:

                set_annotated_check 0 -setup -hold -from REG1/CLK -to REG1/D

        Using a creative naming convention for the first-stage flip-flops of the synchronizers might allow wildcard expressions to back-annotate zero setup and hold times in the SDF for all first-stage flip-flops at once. This technique works if the design is done using the Synopsys Design Compiler tool, but what about non-Synopsys flows?  

7.3 Other strategies to eliminate X-propagation

        All strategies described in Sections 7.2 to 7.2.4 were shared in my first multi-clock design paper, presented in 2001. After that presentation, many engineers came forward to share other techniques for removing X-propagation from gate-level simulation. Engineers from at least three companies described techniques very similar to those in the ### section (kudos to the engineers who attended SNUG-2001 in San Jose).  

        Since then, other engineers from many companies have shared other techniques. These techniques will be covered in this section, and I am very grateful to all the engineers who continue to share interesting techniques with me every year. Salute to all! 

7.3.1 Using multiple SDF files

        Remember that the key to eliminating unwanted X-propagation is to force the setup and hold times of the synchronizer inputs to 0, thereby eliminating all possible setup and hold time conflicts on the synchronizer inputs.  

        Many engineers told me that they actually generate two SDF files. The first SDF file has all the actual delays for the entire design, including precise setup and hold times. The engineer then generates a second SDF file that contains only the first-stage synchronizer flip-flops. In this file, the setup and hold times are set to 0. Some engineers build this file manually, while others use a script to generate it.  

        The engineer then reads the first SDF file using the $sdf_annotate command, followed by the second SDF file, which overrides the setup and hold times on the data input of each first-stage synchronizer flip-flop. When two SDF files are read, the last file read takes precedence for each instance. All timing is therefore accurately annotated first, and then the timing checks for the first-stage synchronizer flip-flops are modified.  

        This is a clever approach that can be used with any toolflow that produces SDF files. 
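        A minimal testbench sketch of the two-file technique is shown below. The file names chip_full.sdf and chip_sync_zero.sdf, the module name chip, and the instance path in the comment are assumptions for illustration only; in a real flow, chip would be the gate-level netlist and both SDF files would come from your tools or scripts.

```systemverilog
`timescale 1ns/1ps

// Placeholder standing in for the gate-level netlist (hypothetical).
module chip (output logic q, input logic d, clk);
  always_ff @(posedge clk) q <= d;
endmodule

module tb;
  logic d = 0, clk = 0;
  wire  q;

  chip dut (.q, .d, .clk);

  always #5 clk = ~clk;

  initial begin
    // 1) Annotate the real delays and timing checks for the whole design.
    $sdf_annotate("chip_full.sdf", dut);
    // 2) Annotate the second file last, so that its zero setup and hold
    //    values replace the real ones on the first-stage flip-flops only.
    //    An entry in that file might look like this (hypothetical names):
    //      (CELL (CELLTYPE "DFF") (INSTANCE sync_i.ff1)
    //        (TIMINGCHECK (SETUPHOLD D (posedge CLK) (0) (0))))
    $sdf_annotate("chip_sync_zero.sdf", dut);
  end

  initial #100 $finish;
endmodule
```

        Because both calls sit in the same initial block, they execute in the order written, which is what gives the second file the last word.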

7.3.2 Vendor-Supplied Synchronizer Cells with SDF Support

        Other engineers have told me about another good way to solve the X-propagation problem, using synchronizer cells and SDF generation tools supplied by the vendor, but the approach requires either (a) control over the cell library, or (b) a good working relationship with your ASIC vendor. 

        This technique requires the creation of a separate synchronizer cell with a proper placement relationship between the two flip-flop stages. For this method to work, the vendor must provide:

                (1) The actual synchronizer cells, which will be placed in the design.

                (2) A SystemVerilog model of the synchronizer cell for simulation.

                (3) SDF generation tools that produce SDF files with zero setup and zero hold times for the synchronizer cells.  

        If the vendor can provide this cell and these capabilities, you simply generate an SDF file with the appropriate timing checks on the synchronizer cells.  

        Any ASIC or FPGA vendor that offers this capability is doing its customer base a huge favor. I have heard of some ASIC vendors offering this feature. I don't know of any FPGA vendors that do. Recognizing that most modern designs are multi-clock designs, I strongly urge all ASIC and FPGA vendors to provide proper simulation and SDF tool support for synchronizer cells.

7.3.3 Vendors with built-in synchronizer support

        If anyone knows of a vendor who provides this support, please let me know who the vendor is with the appropriate contact information, and I will update this whitepaper regularly to recognize vendors who provide us (the design community) with this capability.  

        List of suppliers: (As of this article, no suppliers are listed) 

7.4 Multiple SDF files for gate-level CDC simulation

        After my first multi-clock presentation in 2001, engineers from at least three different companies shared the following excellent technique for solving X-propagation in gate-level simulation.  

        The technique involves writing out a complete SDF timing file and then, manually or with a script, generating a second SDF file for the first-stage flip-flops of all synchronizer modules. The second SDF file sets all setup and hold times to 0. Both SDF files are then applied to the design using the $sdf_annotate command. The first SDF file annotates all actual timing for the entire design, and the second SDF file is read afterwards to override the setup and hold times of the first-stage synchronizer flip-flops.  

        The advantage of this technique is that it works with any tool flow and any design, not just Synopsys ASIC designs. This is a highly recommended technique.

7.5 Force Synchronizer Notifier Input to Fixed Value

        The built-in Verilog and SystemVerilog setup and hold timing checks ($setup, $hold, and $setuphold) have an optional notifier argument. The notifier toggles between 0 and 1 whenever a timing violation is detected. 

        Most ASIC and FPGA flip-flop models are built from Verilog user-defined primitives (UDPs), and the notifier signal is usually listed as one of the inputs to the UDP table. Whenever the notifier input toggles (caused by a timing violation), the flip-flop output becomes unknown, and this unknown value is visible on the output of the gate-level flip-flop model. The notifiers on the first-stage flip-flop models can be forced to a fixed logic level to prevent them from toggling during simulation and driving the flip-flop output to unknown.  

        A clever technique used by at least one company is to force the timing-violation notifiers of the first-stage synchronizer flip-flops to a fixed logic level so that they can never toggle and drive X into the flip-flop model. 
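        The following sketch shows the idea on a made-up library model. The primitive udp_dff, the cell lib_dff, the 0.5 ns limits, and the hierarchical path tb.sync.ff1.notifier are all assumptions for this example; a real library will use its own cell names, notifier names, and UDP tables, so the path to force has to be looked up in your own netlist. Whether forcing the notifier fully suppresses X also depends on how the library model is written.

```systemverilog
`timescale 1ns/1ps

// Hypothetical library-style flip-flop primitive: any change on the
// notifier input drives the output to X.
primitive udp_dff (q, d, clk, notifier);
  output q;
  reg    q;
  input  d, clk, notifier;
  table
  // d  clk  notifier : q : q+
     0   r      ?     : ? : 0;  // capture 0 on a rising clock edge
     1   r      ?     : ? : 1;  // capture 1 on a rising clock edge
     ?   f      ?     : ? : -;  // hold on a falling clock edge
     *   ?      ?     : ? : -;  // hold when only the data input changes
     ?   ?      *     : ? : x;  // notifier toggled: corrupt the output
  endtable
endprimitive

// Hypothetical library cell: UDP plus a setup/hold check with a notifier.
module lib_dff (output q, input d, clk);
  reg notifier;                         // toggled by $setuphold on a violation
  udp_dff ff (q, d, clk, notifier);
  specify
    $setuphold(posedge clk, d, 0.5, 0.5, notifier);
  endspecify
endmodule

// Gate-level two-stage synchronizer in the bClk domain.
module sync2_gates (output bq2, input adat, bClk);
  wire bq1;
  lib_dff ff1 (.q(bq1), .d(adat), .clk(bClk));  // violations expected here
  lib_dff ff2 (.q(bq2), .d(bq1),  .clk(bClk));
endmodule

module tb;
  reg  adat = 0, bClk = 0;
  wire bq2;

  sync2_gates sync (.bq2, .adat, .bClk);

  always #5 bClk = ~bClk;               // rising edges at 5, 15, 25, 35 ns ...

  // Hold the first-stage notifier at a fixed level for the whole run.
  initial force tb.sync.ff1.notifier = 1'b0;

  initial begin
    #14.8 adat = 1;   // changes 0.2 ns before the rising edge at 15 ns
    #20   adat = 0;   // changes 0.2 ns before the rising edge at 35 ns
    #30   $finish;
  end
endmodule
```

        Only ff1 has its notifier forced, so the second stage and every other flip-flop in the design keep their normal timing checks.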

7.6 ASIC and FPGA Library Cell Synchronizers

        CDC designs can be more easily accomplished if ASIC and FPGA providers can provide fully characterized synchronizer cells that can be instantiated into the design.

        Premium ASIC vendors offer:

                (1) Characterized synchronizer cells.

                (2) A Verilog model for simulating the synchronizer cell.

                (3) An SDF generator that annotates zero setup and hold times on the synchronizer cells, so that no X is generated when a signal crossing a CDC boundary violates a setup or hold time.  

        As far as I know, no FPGA provider offers this capability, but forward-thinking FPGA providers should offer such cells to their advanced multi-clock design customers. 

7.7 Simulation Model with Random Delay Insertion

        Several colleagues have proposed an interesting model that synthesizes to a correct synchronizer but simulates with random cycle delays.

        The block diagram of the model is shown in Figure 33, and the SystemVerilog code to support the model is shown in Example 6. 

      

        As can be seen from the block diagram, this model is intended to generate synthesizable synchronizer models, or to be used as simulation models with optional delays.  

        The IEEE Std 1364.1-2002 Verilog RTL Synthesis Standard [6] requires a compliant synthesis tool to set the SYNTHESIS macro before reading any Verilog models. Although most synthesis tools largely ignore many of the requirements of the IEEE Verilog synthesis standard, most implement this nice synthesis macro requirement.  

        Tools that set the SYNTHESIS macro before reading the sync2 SystemVerilog code will select the code that infers a two-flip-flop synchronizer.  

        Simulators, which do not set the SYNTHESIS macro, will read the sync2 model, skip the code for the synthesizable model, and simulate the model in the `else section of the code. The model is parameterized, so it can be used with the default SIZE of 1 for a simple 1-bit CDC signal, or instantiated with a larger SIZE so that the synchronizer captures and synchronizes a multi-bit bus, such as a Gray code counter value. 

        The simulation portion of the model declares a SIZE-bit variable named DLY. By default, DLY is initialized to 0, so the entire sync2 model simulates with the default two flip-flop delays. The DLY variable can also be set hierarchically from the testbench to reproducible random patterns of 1s and 0s, which causes some bits of the bus to pass through three flip-flop stages while other bits pass through only two. This models the behavior of a set of synchronizers in which some bits are captured on an earlier clock edge than others, and lets simulation show how the design behaves with small skew in the multi-bit datapath. 
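        Since Example 6 is not reproduced in this post, here is a sketch of what such a model could look like, reconstructed from the description above. It is not a verbatim copy of the paper's code: the internal names q1, q1a, q1b, and y1, the reset style, and the small testbench are my assumptions. Only sync2, SIZE, DLY, and the SYNTHESIS macro come from the text.

```systemverilog
module sync2 #(parameter int SIZE = 1) (
  output logic [SIZE-1:0] q,
  input  logic [SIZE-1:0] d,
  input  logic            clk, rst_n);

`ifdef SYNTHESIS
  // Synthesis view: a plain two-flip-flop synchronizer per bit.
  logic [SIZE-1:0] q1;
  always_ff @(posedge clk or negedge rst_n)
    if (!rst_n) {q, q1} <= '0;
    else        {q, q1} <= {q1, d};
`else
  // Simulation view: each bit takes 2 stages (DLY bit = 0)
  // or 3 stages (DLY bit = 1).
  logic [SIZE-1:0] q1a, q1b, y1;
  logic [SIZE-1:0] DLY = '0;             // default: two-stage behaviour

  assign y1 = (~DLY & q1a) | (DLY & q1b); // per-bit 2:1 mux

  always_ff @(posedge clk or negedge rst_n)
    if (!rst_n) {q1b, q1a} <= '0;
    else        {q1b, q1a} <= {q1a, d};

  always_ff @(posedge clk or negedge rst_n)
    if (!rst_n) q <= '0;
    else        q <= y1;
`endif
endmodule

// Hypothetical testbench: drives a Gray-coded pattern and sets DLY
// hierarchically to a new random value before each clock edge.
module tb;
  logic       clk = 0, rst_n = 0;
  logic [3:0] bin = '0, d = '0;
  logic [3:0] q;

  sync2 #(.SIZE(4)) dut (.q, .d, .clk, .rst_n);

  always #5 clk = ~clk;

  initial begin
    #12 rst_n = 1;
    repeat (16) begin
      @(negedge clk);
      bin     = bin + 1'b1;
      d       = (bin >> 1) ^ bin;        // one bit changes per step
      dut.DLY = $urandom_range(15);      // random extra delay per bit
    end
    $finish;
  end
endmodule
```

        Because $urandom is seeded by the simulator, rerunning with the same seed reproduces the same sequence of delays, which keeps failures debuggable.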

8.0 Summary and Conclusion

        Clock Domain Crossing (CDC) errors can cause serious design failures. These costly failures can be avoided by following a few key guidelines and using well-established verification techniques.  

8.1 Recommended 1-bit CDC techniques

        When passing a bit between clock domains:

  • Register the signal in the transmit clock domain to remove combinatorial settling.
  • Synchronize the signal to the receive clock domain. A multi-cycle path (MCP) formulation may be required.  

8.2 Recommended Multi-bit CDC Techniques

        Use one of the following strategies when passing multiple control or data signals between clock domains:

  • Combining - First attempt to combine multi-bit signals into a 1-bit representation in the transmit clock domain before synchronizing the signal to the receive domain.
  • Pass multiple signals across clock domains using a multi-cycle path (MCP) formulation.
  • Use FIFOs to pass multi-bit data or control buses.
  • Use a Gray code counter.  
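
        For the last item in the list, a Gray code counter can be sketched as below. This follows a common binary-plus-conversion style; the module name graycntr and its port names are assumptions for illustration, and the paper's own counter (covered earlier in this series) may differ in detail. Only one bit of gray changes per increment, which is what makes the value safe to pass through a multi-bit synchronizer.

```systemverilog
// Sketch of a Gray code counter with registered binary and Gray outputs.
module graycntr #(parameter int SIZE = 4) (
  output logic [SIZE-1:0] gray,
  input  logic            clk, inc, rst_n);

  logic [SIZE-1:0] bin, bnext, gnext;

  assign bnext = bin + inc;              // next binary count
  assign gnext = (bnext >> 1) ^ bnext;   // binary-to-Gray conversion

  always_ff @(posedge clk or negedge rst_n)
    if (!rst_n) {bin, gray} <= '0;
    else        {bin, gray} <= {bnext, gnext};
endmodule
```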

8.3 Recommended Naming Conventions and Design Partitions

        Use a clock-based naming convention. Divide design sub-blocks into fully synchronized 1-clock designs whenever possible. 

8.4 Recommended solutions for multi-clock gate-level CDC simulation

        There are several useful solutions to the CDC X-propagation problem during gate-level simulation:

  • Use Synopsys switches to generate 0-setup and 0-hold times for the first-stage flip-flop of each synchronizer. Applies to Synopsys tools only.
  • Use multiple SDF files - a good technique, described in Sections 7.3.1 and 7.4.
  • Vendor provides synchronizer cells and appropriate SDF tools - if your ASIC or FPGA vendor provides the models and tools, this is a good solution (this is rare; ask your ASIC and FPGA vendors to support this feature).
  • Simulate synchronization problems using a creative SystemVerilog model.  

        The techniques described in this article are intended to support robust development and verification of multi-clock designs.  


Origin blog.csdn.net/wuzhikaidetb/article/details/123521798