SV: Yikes! Why is My SystemVerilog Still So Slooooow

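Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Article link: https://blog.csdn.net/m0_38037810/article/details/102715222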
 

0. Introduction

This post summarizes Cummings' DVCon 2019 paper, "Yikes! Why is My SystemVerilog Still So Slooooow", which mainly discusses the relationship between SystemVerilog simulation speed and coding style.

1. SystemVerilog Semantics

1.1 The logic type has two semantics

Introduced in SystemVerilog, the logic type can have either wire or variable storage, and the simulator determines the storage type from context if it is not explicitly declared. This matters to simulation because wires can be collapsed into the same object for higher simulation speed, whereas variables cannot. The logic type defaults to variable storage in all cases except for the inputs or inouts of a design unit.

On an input port, wire is implicit, so the declaration is not needed.
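As a minimal sketch of this rule (the module and signal names are hypothetical and not taken from the paper), the port list below shows where logic resolves to a net and where it resolves to a variable:

```systemverilog
// Hypothetical module illustrating how the storage kind of logic is inferred.
module logic_kinds (
    input  logic            clk,     // same as "input wire logic clk": a net by default
    input  logic      [7:0] din,     // also a net, so no explicit wire is needed
    output logic      [7:0] dout_v,  // output with a data type: defaults to a variable
    output wire logic [7:0] dout_w   // explicitly declared as a net
);
    // A variable may be assigned procedurally.
    always_ff @(posedge clk) dout_v <= din;

    // A net is driven by a continuous assignment (or a port connection).
    assign dout_w = din;
endmodule
```

Whether a given simulator actually collapses the nets is tool-dependent; the sketch only shows which declarations leave that option open.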

1.2 Vector operations are faster than bit operations

Another important semantic is that the simulator will typically operate faster on a full vector than individual bits.
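A small illustration of the difference, assuming a hypothetical 32-bit masking operation (none of these names come from the paper): both blocks compute the same result, but the second expresses it as one operation on the full vector.

```systemverilog
// Hypothetical example: two ways to AND a 32-bit bus with a mask.
module vec_vs_bit (
    input  logic [31:0] a, mask,
    output logic [31:0] y_bits, y_vec
);
    // Bit by bit: 32 separate select-and-assign operations per evaluation.
    always_comb
        for (int i = 0; i < 32; i++)
            y_bits[i] = a[i] & mask[i];

    // Whole vector: a single operation on the full word.
    always_comb y_vec = a & mask;
endmodule
```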

1.3 Pass by reference (ref) and pass by value

Just keep in mind that all parameters in an argument list that follow the ref construct will pass by reference unless you explicitly use input, output or inout.
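The following sketch shows this "sticky" direction rule. The package, function, and variable names are all hypothetical and only serve the illustration:

```systemverilog
// Hypothetical package showing how the ref direction carries over to later arguments.
package ref_demo_pkg;
    // ref arguments require an automatic subroutine.
    // 'scale' has no direction of its own, so it inherits ref from 'data'.
    function automatic void scale_sticky(ref int data[], int scale);
        foreach (data[i]) data[i] *= scale;
    endfunction

    // An explicit 'input' ends the inheritance: 'scale' is passed by value.
    function automatic void scale_explicit(ref int data[], input int scale);
        foreach (data[i]) data[i] *= scale;
    endfunction
endpackage

module ref_demo;
    import ref_demo_pkg::*;
    int arr[] = '{1, 2, 3};
    int k = 2;
    initial begin
        scale_sticky(arr, k);    // k must be a variable because it is passed by ref
        scale_explicit(arr, 2);  // a literal is fine for an input argument
        $display("%p", arr);
    end
endmodule
```

Passing a literal as the second argument of scale_sticky would be rejected, because a ref argument has to be bound to a variable of the matching type.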

1.4 Static arrays are faster than dynamic arrays

The semantics of dynamic data structures (QDAs) are also sources of common performance issues that are generally true of SystemVerilog and most languages that have these types. An easy one to recognize is the use of static arrays instead of dynamic arrays wherever possible (dynamic arrays carry a memory footprint and garbage collection time).

Dynamic arrays are best for look-up and random insertion/deletion operations, and queues are best for front or back operations with automatic resizing.
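For reference, here is a short sketch of the three declaration styles discussed above. The sizes and names are hypothetical:

```systemverilog
// Hypothetical module contrasting fixed-size, dynamic, and queue storage.
module array_kinds;
    localparam int DEPTH = 16;  // assumed size, known at compile time

    int fixed_buf [DEPTH];      // static (fixed-size) array: no run-time allocation
    int dyn_buf   [];           // dynamic array: must be allocated with new[]
    int fifo      [$];          // queue: resizes automatically

    initial begin
        foreach (fixed_buf[i]) fixed_buf[i] = i;

        dyn_buf = new[DEPTH];   // run-time allocation
        foreach (dyn_buf[i]) dyn_buf[i] = i;

        fifo.push_back(1);      // insertion at the back
        fifo.push_back(2);
        void'(fifo.pop_front()); // removal from the front

        $display("fixed=%0d dyn=%0d queue=%0d",
                 $size(fixed_buf), dyn_buf.size(), fifo.size());
    end
endmodule
```

When the size is known up front, as with DEPTH here, the fixed-size form avoids the allocation step entirely.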

2. Memory and Garbage Collection – Neither are Free

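2.1 Creating and freeing objects costs time

  1. Use ref to pass dynamic objects or deep-copied objects.

  2. Decide, based on what the design needs, whether to create a single object once or to create a new object on every loop iteration.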

// call new every loop
task run();
    forever begin
        md=new();
        @(negedge vif.clk);
        ……
    end
endtask
// call new only once, before the loop
task run();
    md=new();
    forever begin
        @(negedge vif.clk);
        ……
    end
endtask

2.2 Reducing the simulation cost of classes

  1. If all you need is a container, use a struct instead of a class.

    Wherever possible struct[s] should be used instead, either inside the class or instead of the class. For example, if the main purpose of the class is to be a container of heterogeneous data types, then a struct is a better choice.

    This is because a separate class will require heap management and potentially engage garbage collection, but the simple struct will not.

  2. Put interface-heavy functionality in the interface.

    Putting interface-heavy functionality into the interface rather than in classes is also more simulation efficient, with the added benefit of being more reusable.
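To illustrate the struct-versus-class point in item 1, here is a minimal sketch with a hypothetical packet container (the type and field names are invented for this example):

```systemverilog
// Hypothetical container types: the same fields as a class and as a struct.
package container_pkg;
    // Class version: every instance is a heap object created with new().
    class pkt_c;
        bit [7:0]  id;
        bit [31:0] addr;
        bit [31:0] data;
    endclass

    // Struct version: a plain value, no constructor call needed.
    typedef struct {
        bit [7:0]  id;
        bit [31:0] addr;
        bit [31:0] data;
    } pkt_s;
endpackage

module container_demo;
    import container_pkg::*;
    pkt_c c;
    pkt_s s;
    initial begin
        c = new();  // heap allocation, later subject to garbage collection
        c.id = 8'h01; c.addr = 32'h10; c.data = 32'hCAFE;

        s = '{id: 8'h01, addr: 32'h10, data: 32'hCAFE};  // no allocation

        $display("class id=%0h struct id=%0h", c.id, s.id);
    end
endmodule
```

The struct loses inheritance, randomization hooks, and methods, so this swap only fits when the type really is just a data container.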

3. Leave Sleeping Processes to Lie

A very common process in SystemVerilog is the always block with a single sensitive signal, such as the clock. This static process is highly optimized in all simulators, but side-effects from dynamic tasks or functions such as DPI (or any external) functions, virtual class tasks/functions, and virtual interface tasks/functions may disable the optimization.

By moving the DPI call inside the conditional, the simulator might optimize the process wake up to posedge clk and txactive reducing the number of times the process executes.

import "DPI-C" function void dpi_tic(logic active, int count);
module BENM9A (input logic txactive, clk);
    int counter; // default value is 0
    initial $display("%m");
    always_ff @(posedge clk) begin
        // DPI call runs on every clock edge, even when txactive is low
        dpi_tic(txactive, counter);
        if (txactive)
            counter <= counter+1;
    end
endmodule

import "DPI-C" function void dpi_tic(logic active, int count);
module BENM9B (input logic txactive, clk);
    int counter; // default value is 0
    initial $display("%m");
    always_ff @(posedge clk)
        if (txactive) begin
            //move DPI code into condition if it is conditional
            dpi_tic(txactive, counter);
            counter <= counter+1;
        end
endmodule

4. UVM Best Practices

4.1 Make string processing conditional

Process strings only when the result is actually needed.

The cost of unconditional array string processing, even when the processed string was not printed, was huge: it ran 3,000-10,000 times slower than conditional string processing.

// Unconditional string processing
function void get_data();
    string memlayout;
    // Format the memory layout into a string
    memlayout = " {\n";
    foreach(mem[i])
        memlayout = $sformatf("%s mem[%0d]:%8h",memlayout, i, mem[i]);
    memlayout = {memlayout, " }\n"};
    `uvm_info("MEMDATA", memlayout, UVM_HIGH)
endfunction

// Conditional string processing
function void get_data();
    string memlayout;
    `ifdef FAST
    // Only do expensive string processing for >= UVM_HIGH verbosity
    if(uvm_report_enabled(UVM_HIGH, UVM_INFO, "MEMDATA")) begin
    `endif
    // Format the memory layout into a string
    memlayout = " {\n";
    foreach(mem[i])
        memlayout = $sformatf("%s mem[%0d]:%8h",memlayout, i, mem[i]);
    memlayout = {memlayout, " }\n"};
    `ifdef FAST
    end
    `endif
    `uvm_info("MEMDATA", memlayout, UVM_HIGH)
endfunction

4.2 Reduce TLM analysis port activity

Turning off unused analysis port path sampling and broadcasting can significantly improve simulation performance.

// Unconditionally broadcast UVM analysis port transactions
task run_phase(uvm_phase phase);
    forever collect();
endtask

// Conditionally broadcast UVM analysis port transactions
task run_phase(uvm_phase phase);
    if(ap.size()) forever collect();
endtask

task collect();
    trans1 tr = trans1::type_id::create("tr");
    get_txn_from_interface(tr);
    ap.write(tr);
endtask

5. Verification Best Practices

Performance improvements related to randomization, assertions, and coverage collection.

5.1 Shrink the randomization search space

The loop sets up a constraint on each array element based on its neighbor, resulting in a list of 16-256 (randomized) integers with 32-bit variables that have to be solved simultaneously. Modifying the code to use post_randomize() and an array sort() method can improve runtime performance up to 1000x.

// Large search space: every ordering constraint must be solved at once.
class txn15;
    rand int addr;
    rand logic [15:0] payload[$];
    rand bit [2:0] del;
    constraint size_ct { payload.size() inside { [16:256]}; }
    constraint sort_ct {
        foreach (payload[i]) {
            // i must be greater than 0
            if(i) payload[i] >= payload[i-1];
        }
    }
endclass

// Sorting in post_randomize() greatly reduces randomization time.
class txn15;
    rand int addr;
    rand logic [15:0] payload[$];
    rand bit [2:0] del;
    constraint size_ct { payload.size() inside { [16:256]}; }
    function void post_randomize();
        payload.sort();
    endfunction
endclass 
           

5.2 Assertions

Using single-cycle assertions wherever possible and using single-clock assertions (even if that means splitting one assertion into two separate assertions) both result in improved performance. While local variables may be needed to manipulate data inside sequences and properties, they add overhead during simulation.
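One possible way to turn a two-cycle check into a single-cycle one is sketched below. This is an assumption about how the advice might be applied, not code from the paper, and the module and signal names (req, ack, req_q) are hypothetical. For signals that are synchronous to clk, the two assertions check roughly the same behavior.

```systemverilog
// Hypothetical checker: the same request/acknowledge rule in two forms.
module sva_demo (input logic clk, rst_n, req, ack);
    // Multi-cycle form: each req starts an evaluation attempt spanning two clocks.
    a_multi: assert property (
        @(posedge clk) disable iff (!rst_n) req |=> ack);

    // Single-cycle form: remember req in a flop, then check within one clock.
    logic req_q;
    always_ff @(posedge clk or negedge rst_n)
        if (!rst_n) req_q <= 1'b0;
        else        req_q <= req;

    a_single: assert property (
        @(posedge clk) disable iff (!rst_n) req_q |-> ack);
endmodule
```

The single-cycle form trades a small amount of auxiliary logic for an assertion that never has an attempt in flight across clock edges.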

5.3 Coverage

Fewer coverage events will deliver faster simulation.

Coverage sampling events can be further reduced by having covergroup[s] share common expressions.

A third method to reduce sampling events is to merge sampling processes that use the same event.

`ifdef MERGED
// Sampling merged to a single event
always @(posedge valid iff collect_cov) begin
    c1.sample();
    c2.sample();
    c3.sample();
end
`else
always @(posedge valid iff collect_cov)
    c1.sample();
always @(posedge valid iff collect_cov)
    c2.sample();
always @(posedge valid iff collect_cov)
    c3.sample();
`endif
