Solutions to Thousand Questions of Digital IC Written Test--Short Answer Questions (6)

Preface

This collection summarizes the kinds of problems that may come up during autumn recruitment. Working through the questions is not the goal in itself; the goal is to discover your own gaps along the way and to consolidate your fundamentals.

All answers and explanations are written by the author and are inevitably subjective. If you find an error, please point it out in the comment area. The material is compiled from digital-IC public accounts such as "Digital IC Workers", real questions from websites such as Niuke.com, and write-ups of real online written tests and interviews.

        Continuously updated (2023.9.25). The article contains 270 single-choice questions, 106 multiple-choice questions, 16 fill-in-the-blank questions, 17 true-false questions, 72 short-answer questions, 3 logical reasoning questions, and 8 C/Python scripting questions.
All code provided by the author in this article is written to be self-contained and can be copied directly into a tool to compile, run, and produce results.

        There are many questions, and even with previous analysis and the powerful ChatGPT, mistakes are inevitable. If you find any mistakes, please feel free to discuss them in the comment area.

The solutions to the Thousand Questions of the Digital IC Written Test have grown past 150,000 words, and the web editor lags badly at that size, so the series is split into multiple parts to make it easier to maintain. The links are as follows:

Solutions to Thousand Questions of Digital IC Written Test--Single-Choice Questions (1)
Solutions to Thousand Questions of Digital IC Written Test--Single-Choice Questions (2)
Solutions to Thousand Questions of Digital IC Written Test--Multiple-Choice Questions (3)
Solutions to Thousand Questions of Digital IC Written Test--Fill-in-the-Blank Questions (4)
Solutions to Thousand Questions of Digital IC Written Test--True-False Questions (5)
Solutions to Thousand Questions of Digital IC Written Test--Short Answer Questions (6)
Solutions to Thousand Questions of Digital IC Written Test--Logical Reasoning (7)
Solutions to Thousand Questions of the Digital


short answer questions

1. Can setup time and hold time be negative at the same time? Why?

Answer: Either setup time or hold time can be negative, but not both at the same time.

Start from the definitions. Setup time is how long the register's input data must stay stable before the rising edge of the clock. Hold time is how long the data must stay stable after the rising edge of the clock.

Setup time + hold time therefore defines the window around the clock edge in which the register samples its input and the data must not change. If both were negative, then setup time + hold time < 0, meaning the sampling window would have negative width and the register could never sample the data reliably. So the answer is no.


2. Write the truth table for the following circuit diagram

Answer: This tests the input/output behaviour of the AND gate and the XOR gate. (Tip: this is a 1-bit half adder; S is the sum and C is the carry. A full adder would also have a carry input.)

AND gate: both inputs 1 -> output 1, otherwise -> output 0.

XOR gate: inputs differ -> output 1, inputs equal -> output 0.

    A  B | C  S
    0  0 | 0  0
    0  1 | 0  1
    1  0 | 0  1
    1  1 | 1  0
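
The table can be reproduced mechanically from the two gate definitions. The following is a minimal Python sketch, assuming the circuit computes C = A AND B and S = A XOR B from the same two inputs; the function name `half_adder` is illustrative.

```python
def half_adder(a, b):
    """Return (carry, sum) for two 1-bit inputs."""
    carry = a & b   # AND gate
    total = a ^ b   # XOR gate
    return carry, total

# Enumerate all four input combinations in the order A, B.
rows = [(a, b) + half_adder(a, b) for a in (0, 1) for b in (0, 1)]
for a, b, c, s in rows:
    print(f"A={a} B={b} -> C={c} S={s}")
```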


3. The three core indicators of chip design are PPA. Please explain what these three letters stand for and explain your understanding of PPA.

Answer: In chip design, PPA stands for Performance, Power, and Area. These are the three key indicators and an important criterion for judging the quality of a chip design.

  1. Performance: the running speed and processing capability of the chip under a specific workload, usually measured by clock frequency, throughput, latency, and similar metrics.

  2. Power: the energy consumed by the chip while it operates, usually split into static and dynamic power. Static power is consumed whenever the chip is powered, even when no signals are switching (mainly leakage); dynamic power is consumed when signals toggle as the chip performs operations.

  3. Area: the physical size of the chip, usually expressed as die area or transistor count. The smaller the die, the more dies fit on one wafer, which reduces manufacturing cost.

In practice, the three trade off against each other, so designers need to decide the priority and weight of each according to the application scenario and requirements.


4. List the general stages of classic processor CPU execution and the general behavior of each stage based on your own understanding.

1. Instruction Fetch (IF): Fetch the next instruction from memory at the address held in the program counter, and store it in the instruction register.

2. Instruction Decode (ID): Decode the instruction in the instruction register into its opcode and operands, determine the instruction type, and read the source registers.

3. Execute (EX): Perform the operation the instruction calls for, such as an arithmetic or logical operation, or compute the effective address for a load, store, or branch.

4. Memory Access (MEM): If the instruction is a load or store, read data from or write data to memory at the address computed in the execute stage.

5. Write Back (WB): Write the execution result back to the register file.

In order to improve the execution efficiency of the CPU, modern processors also use technologies such as pipelines, superscalars, and dynamic execution to process different stages of instruction execution in parallel, thereby achieving more efficient instruction execution.


5. In chip design, in some cases, different modules may be started and stopped at different times. Please describe your understanding of this scenario.

  1. Limit peak power: If all modules start at the same time when the chip powers up, the instantaneous current and power may exceed what the power supply can deliver. Staggering the start of different modules avoids this peak, and modules that are stopped while idle no longer consume power, which lowers the power consumption of the whole chip.

  2. Shorten the effective startup time: With staggered startup, the modules the system needs first can be brought up first, so the system can respond without waiting for every module to finish starting.

  3. Easier debugging and testing: Being able to start and stop modules independently lets designers test and debug each module's function in isolation, which improves development efficiency and test quality.


6. What is the difference between byte and bit [7:0]?

Answer: It is easy to fall into the trap of answering "1 byte = 8 bits", but the question is really about the difference between the byte and bit data types in SystemVerilog. Both are 2-state and 8 bits wide here; byte is a signed integer type (range -128 to 127), while bit [7:0] is an unsigned vector (range 0 to 255). For the full list of SV data types, see the table in the Green Book.
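
To make the signed/unsigned distinction concrete, here is a small Python model of how the same 8-bit pattern reads under each type. It is only an illustration of two's-complement interpretation, not SystemVerilog code; the helper names `as_bit8` and `as_byte` are made up for this sketch.

```python
def as_bit8(pattern):
    """Interpret the low 8 bits as unsigned, like bit [7:0]."""
    return pattern & 0xFF

def as_byte(pattern):
    """Interpret the low 8 bits as two's-complement signed, like byte."""
    value = pattern & 0xFF
    return value - 256 if value & 0x80 else value

for pattern in (0x00, 0x7F, 0x80, 0xFF):
    print(f"0x{pattern:02X}: bit [7:0] = {as_bit8(pattern)}, byte = {as_byte(pattern)}")
```

The two interpretations agree for patterns 0x00 to 0x7F and differ once the top bit is set, which is why comparisons and arithmetic on the two types can give different results.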


7. What Is The Use Of $cast?

Answer: $cast performs a dynamic (run-time checked) conversion. With classes, it is needed for a downcast, that is, assigning a parent-class handle to a subclass handle. A direct assignment in that direction is a compilation error, so $cast() is the only way to do it. Typically the parent-class handle actually points to a subclass object and you want to access the subclass's members, so you use $cast to obtain a subclass handle. If the parent-class handle points to a parent-class object, the conversion fails: called as a function, $cast returns 0.

For details, please refer to the author's other articles: Verification Basics - Type Conversion, Virtual Methods, Callback Functions, Object Copy_Display Conversion Static Conversion_The Blog of the Pickled Fish Without Scallions-CSDN Blog


8. What is The Difference Between Mailbox And Queue?

Answer: This tests SV basics.

1. A mailbox must be instantiated through the new() function, while a queue only needs to be declared.

2. The mailbox access methods put() and get() are blocking: get() waits while the mailbox is empty, and put() waits while a bounded mailbox is full, so a call may not return immediately. The corresponding queue methods push_back() and pop_front() are non-blocking and always return immediately.

3. When passed as an input argument, a mailbox is passed by handle, while a queue is copied.


9. How do you call a task defined in the parent class from a derived class?

Answer: This tests SV basics. A subclass can access the member variables and methods of the parent class through super.xxx.


10. How do you start a sequence? Describe any one method.

The start() method is the most basic way to start a sequence in UVM. start() is a task defined in the uvm_sequence_base class, and its first argument is the sequencer the sequence runs on.


    my_sequence seq;
    seq = my_sequence::type_id::create("seq");
    seq.start(m_my_sequencer);  // handle to the sequencer (not the driver) the sequence runs on

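11. Without using the randomize method or rand, generate an array of unique values.

Use time as the source of randomness: take the simulation time as a seed and feed it to a simple algorithm (for example, a linear congruential generator) to produce pseudo-random numbers. With modulus 31 and a step that is coprime to 31, the first 31 values never repeat, so the array elements are unique.


module testbench;
  timeunit 1ns;
  timeprecision 1ns;

  int unsigned values[10];
  int unsigned seed;

  initial begin
    #7;                 // let simulation time advance so the seed is non-zero
    seed = $time;       // simulation time as the seed
    $display("Seed: %0d", seed);

    foreach (values[i]) begin
      // Linear congruential step: x(i+1) = (x(i) + 7) mod 31.
      // 7 is coprime to 31, so up to 31 consecutive values are distinct.
      values[i] = (seed + 7 * i) % 31;
      $display("Unique value between 0 and 30: %0d", values[i]);
    end
  end
endmodule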

12. Please list the differences between SRAM and DRAM?

SRAM (static random access memory) and DRAM (dynamic random access memory) are two common types of computer memory. They have the following differences:

  1. Storage principle: SRAM stores each bit in a latch built from cross-coupled inverters, so it needs no refresh and is fast; it is generally used for caches. DRAM stores each bit as charge on a capacitor, which leaks and must be refreshed periodically, and it is slower.

  2. Storage density: Each SRAM cell needs more transistors (typically six) than a DRAM cell (one transistor and one capacitor), so SRAM has lower storage density than DRAM.

  3. Power consumption: Since SRAM does not need to be refreshed, it consumes less power than DRAM when idle.

  4. Cost: Because of the differences in manufacturing process and storage density, the cost per bit of SRAM is higher than that of DRAM.

  5. Capacity: Because of its higher storage density, DRAM is better suited than SRAM to storing large amounts of data.


13. Please list the Memory hierarchy in a general system. And explain why the system needs to hierarchize memory.

The Memory hierarchy in a general system is as follows:

  1. Register

  2. Cache (SRAM)

  3. Main Memory (DRAM)

  4. Auxiliary Storage

The faster the memory, the higher the cost. Conversely, the cheaper the memory, the slower it will be. Using memory hierarchy can improve performance by using different types of memory at different levels while controlling costs. For example, storing the most frequently used data in registers and storing less frequently used data in auxiliary memory allows quick access to frequently used data and reduces costs.


14. In a CPU system, there are two Masters accessing a Slave through a 2x1 AXI bus. Briefly describe how to construct a verification scenario for verification and ensure the completeness of the verification.

To construct the verification scenarios and ensure completeness, follow these steps:

  1. Design the test plan: Based on the requirements and specification documents, determine the functional, performance, and interface requirements that need to be verified.

  2. Build the simulation environment: Based on the requirements and specification documents, build a simulation environment that includes the CPU system, the 2 Masters, the AXI bus, and the Slave.

  3. Write test cases: Following the test plan, write test cases that cover normal, boundary, and abnormal situations, such as read and write operations and concurrent accesses from both Masters.

  4. Run simulation: Run the test cases in the simulation environment and observe the results, including response time, data correctness, and error handling, to verify that the functional and performance requirements are met.

  5. Analyze simulation results: Analyze the coverage achieved by the test cases and the validity of the results, find and fix potential problems, and close coverage holes to ensure the completeness of verification.

  6. Regression testing: After each fix, rerun the regression to confirm that previously passing test cases still pass, ensuring the stability and reliability of the system.

  7. Submit the verification report: Finally, write a verification report that summarizes the process and results and records all test cases and their outcomes, including passed and failed cases, uncovered parts, and areas that need optimization.

Following these steps ensures the completeness of the verification scenarios, so that the design meets the system specification while its correctness, reliability, and performance are confirmed.


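15. Simple implementation of x[7:0]*480 using Verilog HDL

Simplify first: x[7:0] * 480 = x[7:0] * (512 - 32) = (x << 9) - (x << 5). The product needs 17 bits, so out must be declared at least 17 bits wide for the shifted values not to be truncated.

assign out = (x<<9) - (x<<5);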
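
The shift-and-subtract identity is easy to check exhaustively, since x has only 256 possible values. A minimal Python sketch follows; the function name `times_480` is illustrative, and Python integers are unbounded, so the check says nothing about truncation in hardware beyond the bit count it reports.

```python
def times_480(x):
    """Multiply by 480 using only shifts and a subtraction: 480 = 512 - 32."""
    return (x << 9) - (x << 5)

# Exhaustive check over every 8-bit input.
all_match = all(times_480(x) == x * 480 for x in range(256))

# Width needed to hold the largest product, 255 * 480.
max_product = times_480(255)
bits_needed = max_product.bit_length()

print(all_match, max_product, bits_needed)
```

The largest product needs 17 bits, which is the width the output port would need to avoid overflow.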


16. What are the common causes of setup violation and hold violation in timing constraints?

This can be explained from the formulas for setup slack and hold slack, where clk_path1_delay is the clock delay to the launching register and clk_path2_delay is the clock delay to the capturing register.

setup_slack = (clk_period + clk_path2_delay - dff_setup - clk_uncertainty) - (clk_path1_delay + ck_to_q + logic_delay)

Common causes of a setup violation: 1. The clock frequency is too high; 2. The combinational logic delay on the data path is too large; 3. Signals crossing between asynchronous clock domains are sampled without synchronization; 4. The clock jitter (uncertainty) is large; 5. The clock arrives at the capturing register too early relative to the launching register.

hold_slack = (clk_path1_delay + ck_to_q + logic_delay) - (clk_path2_delay + dff_hold + clk_uncertainty)

Common causes of a hold violation: 1. The combinational logic on the data path is too short; 2. The clock jitter (uncertainty) is large; 3. The clock arrives at the capturing register too late relative to the launching register.
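
The slack relations can be written as two small functions, using the usual convention slack = required time - arrival time for setup and arrival time - required time for hold. This is a sketch under the assumption that clk_path1_delay is the launch clock path and clk_path2_delay is the capture clock path; the numeric values in the example are invented for illustration.

```python
def setup_slack(clk_period, clk_path1_delay, clk_path2_delay,
                ck_to_q, logic_delay, dff_setup, clk_uncertainty=0.0):
    """Setup slack: data must arrive before the next capture edge minus setup."""
    arrival = clk_path1_delay + ck_to_q + logic_delay
    required = clk_period + clk_path2_delay - dff_setup - clk_uncertainty
    return required - arrival

def hold_slack(clk_path1_delay, clk_path2_delay,
               ck_to_q, logic_delay, dff_hold, clk_uncertainty=0.0):
    """Hold slack: data must not change until hold time after the same capture edge."""
    arrival = clk_path1_delay + ck_to_q + logic_delay
    required = clk_path2_delay + dff_hold + clk_uncertainty
    return arrival - required

# Invented example values, all in ns.
s = setup_slack(clk_period=10.0, clk_path1_delay=1.0, clk_path2_delay=1.2,
                ck_to_q=0.5, logic_delay=8.0, dff_setup=0.3, clk_uncertainty=0.2)
h = hold_slack(clk_path1_delay=1.0, clk_path2_delay=1.2,
               ck_to_q=0.5, logic_delay=0.1, dff_hold=0.2, clk_uncertainty=0.1)
print(round(s, 3), round(h, 3))
```

A negative return value from either function corresponds to a violation. Increasing logic_delay lowers setup slack and raises hold slack, which is why the two kinds of violation have opposite fixes.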


17. The FPGA generates two output pulses, and the delay between the two pulses is required to be 0.5ns. Please describe your implementation plan.

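Use the programmable I/O delay primitives inside the FPGA (IODELAY components): route one of the pulses through a delay element and set its tap value so that the added delay is 0.5 ns.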

Reference blog: Delay between generating pulses_FPGA generates ns-level pulses_Silly Boy: CPU's Blog-CSDN Blog


18. Please draw the circuit that the following statements synthesize to (Hesai Technology, 2022 early-batch recruitment)


reg out, in1, in2;
always @(posedge clk) begin
    out <= in1 & in2;
    out <= in1 ^ in2;
    out <= in1 | in2;
end

When several non-blocking assignments in the same block target the same register, only the last one takes effect. So only out <= in1 | in2 is synthesized: an OR gate whose output drives the D input of a flip-flop clocked by clk.

Reference: Verilog design and logic synthesis example analysis (with code) - Zhihu (zhihu.com)


19. Please draw the circuit corresponding to the synthesis of the following statements


wire wire1, sel1, sel2, a,b,c;
assign wire1 = sel1==1 ? a : sel2 ? b: c;

Two ternary operators are used, and the logic is actually:


if(sel1==1)
    wire1 = a;
else if(sel2==1)
    wire1 = b;
else 
    wire1 = c;

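This synthesizes to combinational logic: two cascaded 2-to-1 muxes. The first selects between b and c under sel2, and the second selects between a and that result under sel1, so sel1 has priority.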


20. Please use as few 2-to-1 MUX as possible to implement a two-input XOR gate.

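Two 2-to-1 muxes are enough. The first mux acts as an inverter: B drives its select input, and its data inputs are tied to 1 (selected when B=0) and 0 (selected when B=1), so its output is ~B. The second mux uses A as its select and outputs B when A=0 and ~B when A=1, which is A XOR B.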
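
One way to wire the two muxes can be checked against the XOR truth table with a short Python model. The helper `mux2(sel, d0, d1)` is an assumed behavioural model of a 2-to-1 mux (output d0 when sel is 0, d1 when sel is 1), not a library primitive.

```python
def mux2(sel, d0, d1):
    """Behavioural 2-to-1 mux: returns d0 when sel == 0, d1 when sel == 1."""
    return d1 if sel else d0

def xor_from_muxes(a, b):
    not_b = mux2(b, 1, 0)        # first mux used as an inverter: outputs ~b
    return mux2(a, b, not_b)     # second mux: b when a == 0, ~b when a == 1

table = {(a, b): xor_from_muxes(a, b) for a in (0, 1) for b in (0, 1)}
print(table)
```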


21. Please explain LUT, CLB, BRAM, ISERDES, GTP, and DSP respectively.

  • LUT (Look-Up Table): the basic logic unit in an FPGA. It is a small programmable memory: the inputs form an address, and the bit stored at that address is the output, so an n-input LUT can implement any Boolean function of n inputs. The number and size of the LUTs largely determine the logic capacity and resource usage of an FPGA.

  • CLB (Configurable Logic Block): a configurable logic block in an FPGA, consisting of multiple LUTs, registers, and other programmable logic. The CLB is one of the core components of an FPGA and is used to implement logic operations, state machines, and other circuit functions.

  • BRAM (Block RAM): a dedicated on-chip memory block in an FPGA. BRAM can be configured with different widths and depths and supports single-port and dual-port reads and writes. It is widely used for FIFOs, data buffers for DMA controllers, image processing, FFT, and similar applications.

  • ISERDES (Input Serializer/Deserializer): a dedicated serial-to-parallel converter in the FPGA's I/O logic. It converts a high-speed serial input stream into a wider parallel word at a lower clock rate inside the FPGA, and is typically used for source-synchronous interfaces such as DDR memory and LVDS links. Protocols such as PCI Express and SATA use the gigabit transceivers instead.

  • GTP (Gigabit Transceiver): a high-speed serial transceiver in Xilinx FPGAs, used for multi-gigabit serial protocols such as PCI Express and SATA. GTP is the lower-rate member of the transceiver family, supporting line rates of several Gbps; 10 Gbps-class links such as 10G Ethernet need the faster GTX/GTH transceivers.

  • DSP (DSP slice): a hardened arithmetic block in an FPGA, built around a hardware multiplier with an adder/accumulator and pipeline registers. It is used to implement digital signal processing algorithms such as filters, FFT, convolution, and multiply-accumulate, and is widely used in audio, video, radar, and communications designs.


22. Please explain how to use the lookup table method to divide 8 bits by 8 bits, and explain the resource consumption.

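Computing x/y is the same as computing x * (1/y).

Use a lookup table for 1/y, with y as the RAM address. The values of 1/y are computed offline (for example in MATLAB), stored as 8-bit binary fractions (integer part 0), and loaded into the RAM. This takes 2^8 = 256 entries * 8 bits = 256 bytes of RAM. The quotient is then the table output multiplied by x, which costs one 8-bit x 8-bit multiplier. Note that y = 0 has no reciprocal and 1/1 = 1 does not fit in a pure fraction, so those two addresses need special handling.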
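
The reciprocal-table approach can be modelled in a few lines of Python to see how accurate an 8-bit fraction is. This is a sketch under stated assumptions: the table stores floor(256 / y), y = 1 is bypassed, y = 0 is rejected, and the names `recip_table` and `divide` are illustrative rather than taken from the original.

```python
FRAC_BITS = 8  # each table entry is an 8-bit binary fraction: value / 2**8

# One entry per possible 8-bit divisor. Entries 0 and 1 are placeholders,
# because 1/0 is undefined and 1/1 does not fit in a pure fraction.
recip_table = [0, 255] + [(1 << FRAC_BITS) // y for y in range(2, 256)]

def divide(x, y):
    """Approximate x // y as (x * (1/y)) using the reciprocal table."""
    if y == 0:
        raise ZeroDivisionError("divisor is zero")
    if y == 1:
        return x                                # bypass the table
    return (x * recip_table[y]) >> FRAC_BITS    # multiply, then drop the fraction bits

# Largest difference from exact integer division over all valid inputs.
worst_error = max((x // y) - divide(x, y)
                  for x in range(256) for y in range(2, 256))
print(len(recip_table), worst_error)
```

Because each stored reciprocal is rounded down, the result is never too large, and under these assumptions it is at most one below the exact integer quotient. A wider table entry would reduce the error at the cost of more RAM bits and a wider multiplier.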


23. Use Verilog to implement the addition of two 8-bit two's-complement numbers.

Reference blog: Complementary addition operation_overflow judgment - Verilog implementation - CodeLeading.com (codeleading.com)


module top_module (
    input [7:0] a,
    input [7:0] b,
    output [7:0] s,
    output overflow
); 
    assign s=a+b;
    // Signed overflow: both operands have the same sign, but the sign of the sum differs
    assign overflow=((~a[7])&(~b[7])&s[7])|(a[7]&b[7]&(~s[7]));
endmodule
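
The overflow expression can be checked exhaustively against the definition of signed overflow (the true sum falls outside -128 to 127). The following Python sketch mirrors the bit-level logic of the module; `add8` and `to_signed` are illustrative helper names.

```python
def to_signed(v):
    """Interpret an 8-bit pattern as a two's-complement value."""
    return v - 256 if v & 0x80 else v

def add8(a, b):
    """Return (sum, overflow) for two 8-bit patterns, using the sign-bit rule."""
    s = (a + b) & 0xFF
    a7, b7, s7 = a >> 7, b >> 7, s >> 7
    overflow = ((1 - a7) & (1 - b7) & s7) | (a7 & b7 & (1 - s7))
    return s, overflow

def overflows(a, b):
    """Reference definition: the exact signed sum does not fit in 8 bits."""
    return not (-128 <= to_signed(a) + to_signed(b) <= 127)

rule_is_correct = all(bool(add8(a, b)[1]) == overflows(a, b)
                      for a in range(256) for b in range(256))
print(rule_is_correct)
```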

24. Briefly describe the methods and functions of pipeline design.

Method: Split a long combinational path into shorter segments and insert registers between them, which reduces the combinational delay between any two registers.

Function: It allows a higher clock frequency and improves data throughput, at the cost of extra registers and a few cycles of added latency.


25. How do you build a divide-by-two circuit from a D flip-flop and AND/OR/NOT gates?

Connect the register's output Q to its own D input through a NOT gate. Q then toggles on every rising clock edge, so one full period of Q takes two clock periods and Q runs at half the clock frequency.
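
The toggling behaviour can be seen in a cycle-by-cycle model. This is a minimal Python sketch that assumes Q starts at 0 and only models the value of Q after each rising clock edge; `simulate_divider` is an illustrative name.

```python
def simulate_divider(num_clock_edges, q_initial=0):
    """Return the value of Q after each rising clock edge when D = NOT Q."""
    q = q_initial
    trace = []
    for _ in range(num_clock_edges):
        d = 1 - q      # NOT gate feeding the D input
        q = d          # flip-flop samples D on the rising edge
        trace.append(q)
    return trace

trace = simulate_divider(6)
print(trace)
```

Q repeats every two clock edges, so its frequency is half that of the clock.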


26. Please list as many test points as you can think of for functional verification based on the design description below.

An asynchronous FIFO: rdata and wdata are both 8 bits wide and the FIFO depth is 16. When the rst_n input is low, the FIFO is reset. When wr is sampled high on the rising edge of wclk, data is written into the FIFO. When rd is sampled high on the rising edge of rclk, the FIFO outputs data. In addition, when the FIFO is empty the empty output is high, and when the FIFO is full the full output is high.

1. Basic testing of writing and reading data, including edge cases and regular cases.
2. Confirm that the data read out matches the data written in, in the same order.
3. Perform a write when the FIFO is full to confirm that the full signal is correct.
4. Perform a read when the FIFO is empty to confirm that the empty signal is correct.
5. Confirm that the reset signal correctly clears the FIFO and sets the empty and full signals to their reset values.
6. Verify that wr and rd are sampled correctly on the write and read clocks (wclk and rclk), and check that data is read from the FIFO correctly.
7. Test that the FIFO depth is correct and that the FIFO handles every fill level normally.
8. Test different write rates and read rates to ensure the FIFO is stable and correct in each case.
9. Check that asynchronous reading and writing of the FIFO works normally.
10. Verify that the FIFO correctly detects and handles errors such as overflow and underflow.

27. Use D flip-flops to build a modulo-4 counter.


module counter4(
    input             clk    ,
    input             reset  ,
    output reg [1:0]  cnt
);

always@(posedge clk) begin
  if (reset) begin
    cnt <= 2'b00; // initialize to 0
  end
  else begin
    case(cnt)
      2'b00: cnt <= 2'b01;
      2'b01: cnt <= 2'b10;
      2'b10: cnt <= 2'b11;
      2'b11: cnt <= 2'b00;
    endcase
  end
end

endmodule

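28. Design a synchronous FIFO with the same read and write clock. On the write side, 10 data items are written in every 100 clock cycles, at unspecified times within the window. On the read side, 1 item is read every 10 cycles. Calculate the minimum FIFO depth.

Answer: 18. Consider back-to-back transmission: the worst case is 10 writes at the end of one 100-cycle window immediately followed by 10 writes at the start of the next, giving a burst of 20 writes in 20 cycles. In those 20 cycles the read side removes 20/10 = 2 items, so the FIFO must hold 20 - 2 = 18.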

Reference blog post: FIFO depth calculation for FPGA written test interview questions [ByteDance] [DJI] [Simple calculation formula]
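
The back-to-back reasoning reduces to one line of arithmetic: depth = burst length minus the items drained during the burst. The sketch below assumes the worst case is two adjacent 10-item bursts written one item per cycle; the function and parameter names are illustrative.

```python
def min_fifo_depth(burst_len, write_cycles_per_item, read_cycles_per_item):
    """Items that pile up during the longest back-to-back write burst."""
    burst_cycles = burst_len * write_cycles_per_item
    items_read = burst_cycles // read_cycles_per_item   # drained while the burst is written
    return burst_len - items_read

# Two 10-item bursts back to back, one write per cycle, one read per 10 cycles.
depth = min_fifo_depth(burst_len=2 * 10, write_cycles_per_item=1, read_cycles_per_item=10)
print(depth)
```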


29. Can the clock gating circuit be synthesized when the following code is synthesized? If so, draw a schematic diagram of the clock gating. If not, please modify it so that the signal out can be synthesized into the clock gating circuit.


always @(posedge clk or negedge rst_n) begin
    if(rst_n==1'b0)
        out <= 64'b0;
    else if (en)
        out <= data;
    else
        out <=64'b0;
end

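No clock-gating cell can be inferred: when en is low, out is cleared to 0 instead of holding its value, so the register still has to be clocked every cycle. If out is allowed to hold its value while the enable is low, the clock can be gated, for example as follows:


module clkGating(
    input clk,
    input rst_n,
    input out_en,
    input [63:0] data,

    output reg [63:0] out
);

reg en_latch;
wire clk_en;

// Latch the enable while clk is low, so it cannot change while clk is high
// and cause a glitch on the gated clock.
always @(*) begin
    if (!clk)
        en_latch = out_en;
end

assign clk_en = clk & en_latch;

always @(posedge clk_en or negedge rst_n) begin
    if(rst_n==1'b0)
        out <= 64'b0;
    else
        out <= data;
end

endmodule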


30. Use Verilog to implement a modulo-100 counter with an enable and an asynchronous clear; the module is declared as module count (out, count_en, clr, clk);


module count (out, count_en, clr, clk);

  output reg [6:0] out;
  input count_en, clr, clk;

  always @(posedge clk or negedge clr)begin
    if (!clr) begin
      out <= 7'd0;
    end
    else if (count_en) begin
           if (out == 7'd99) begin
             out <= 7'd0;
           end
           else begin
             out <= out + 1;
           end
    end
  end

endmodule

31. What do dynamic power consumption and static power consumption refer to respectively? What methods can be used to reduce them?

Dynamic power consumption: P_dynamic = k*C*V*V*f + m*V*I_sc. It is related to the supply voltage, load capacitance, operating clock frequency, signal toggle rate, and short-circuit current.

From the voltage perspective:

(1) Reduce the operating voltage; (2) Multiple voltage domains; (3) Dynamic voltage scaling (DVS), where the processor uses different voltages in different working modes; (4) Power gating, which shuts off the supply to idle blocks;

From the load capacitance perspective (process related):

(1) Scale down the process to reduce device capacitance; (2) In multi-chip systems, consider multi-chip packaging to reduce inter-chip interface capacitance; (3) Careful placement and routing;

From the clock frequency perspective:

(1) Reduce the operating frequency; (2) Multiple clock domains; (3) Clock gating;

From the toggle rate perspective:

(1) Use encodings with few bit flips per state change, such as Gray code; (2) When data is not being operated on, hold the last value instead of forcing it to 0 or 1; (3) Use enable and chip-select signals to reduce unnecessary switching;

Static power consumption: P_static = V*I_leak. It is related to the supply voltage and the leakage current, and the leakage current depends on the process;

From the voltage perspective (V):

(1) Reduce the operating voltage; (2) Multiple voltage domains; (3) Dynamic voltage scaling (DVS), where the processor uses different voltages in different working modes; (4) Power gating, which shuts off the supply to idle blocks;

From the leakage current perspective (I_leak):

(1) Use HVT (high threshold voltage) transistors, which have low leakage; (2) Multi-threshold design, using low-leakage cells wherever timing allows;


32. s(t) is an FSK-modulated signal, s(t)=x(n)sin(w1t)+x'(n)sin(w2t), {w1>w2}, x(n)={1,011,01}. The passband of the band-pass filter is w1±a, 0<a<(w1-w2)/2. The figure below shows the block diagram of envelope-detection demodulation. Please draw the waveforms at points b, c, and d; the waveform at point a is given as follows.

In signal processing, some common basic concepts are as follows:

  1. Band-pass filtering: passes the components of the signal within a specific frequency range and blocks or attenuates the components at other frequencies. Band-pass filters are often used to remove noise or unwanted frequency components and obtain a cleaner signal.

  2. Full-wave rectification: flips the negative half-cycles of the signal into positive half-cycles, so the output is the absolute value of the input. It is usually implemented with diodes or a rectifier circuit.

  3. Low-pass filtering: filters out the high-frequency components of the signal and keeps only the low-frequency components. Low-pass filters are often used to remove high-frequency noise, smooth a signal, or reduce its bandwidth; after rectification, a low-pass filter extracts the envelope.

  4. Sampling and decision: samples the signal at the symbol instants and compares each sample against a threshold to decide which symbol was sent. It is used in demodulation in digital communications, making the recovered data more resistant to interference and reducing the bit error rate.


33.

1) Please explain what is input delay and what is output delay? It can be explained by drawing and other methods.

Input delay: the delay from the launching clock edge at the external device until the data arrives at the input port of the design.

Output delay: the delay from the output port of the design to the external capturing register, including that register's setup requirement, which must fit before the next capturing clock edge.

2) As shown in the circuit diagram below, it is known that the delay of clock-2-Q is 1.5ns, the clock period T=12ns, the skew between F0 and F1 is 1ns, the setup time is 1ns, and the hold time is 0.5ns. In order to ensure that setup time violation and hold time violation will not occur in the circuit, find tc0_max, tc0_min, tc1_max, tc1_min

Skew = 1ns is taken here to mean that the clock reaches F1 1ns earlier than it reaches F0.

In that case, the skew of the F0.clk->F1.D path is -1ns, and the skew of the F1.clk->F0.D path is 1ns.

F0.clk->F1.D path:

Setup: tcq + tc0 + tsetup < tclk + tskew, 1.5 + tc0 + 1 < 12 + (-1), tc0 < 8.5ns
Hold: tcq + tc0 > thold + tskew, 1.5 + tc0 > 0.5 + (-1), tc0 > -2ns (always satisfied, since a delay cannot be negative).

F1.clk->F0.D path:

Setup: tcq + tc1 + tsetup < tclk + tskew, 1.5 + tc1 + 1 < 12 + (1), tc1 < 10.5ns
Hold: tcq + tc1 > thold + tskew, 1.5 + tc1 > 0.5 + (1), tc1 > 0ns.

So tc0_max = 8.5ns, tc0_min = -2ns (effectively 0), tc1_max = 10.5ns, and tc1_min = 0ns.
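
The four bounds follow from the same two inequalities with the sign of the skew swapped, so they can be recomputed directly from the numbers given in the question. The sketch below rearranges the setup and hold relations for the combinational delay; `comb_delay_bounds` is an illustrative name and all values are in ns.

```python
T_CQ, T_CLK, T_SETUP, T_HOLD = 1.5, 12.0, 1.0, 0.5

def comb_delay_bounds(skew):
    """Return (min_delay, max_delay) for the combinational logic on one path.

    Setup: T_CQ + tc + T_SETUP < T_CLK + skew  ->  tc < T_CLK + skew - T_CQ - T_SETUP
    Hold:  T_CQ + tc > T_HOLD + skew           ->  tc > T_HOLD + skew - T_CQ
    """
    max_delay = T_CLK + skew - T_CQ - T_SETUP
    min_delay = T_HOLD + skew - T_CQ
    return min_delay, max_delay

tc0_min, tc0_max = comb_delay_bounds(skew=-1.0)   # F0.clk -> F1.D, capture clock is early
tc1_min, tc1_max = comb_delay_bounds(skew=+1.0)   # F1.clk -> F0.D, capture clock is late
print(tc0_min, tc0_max, tc1_min, tc1_max)
```

A negative lower bound simply means the hold check passes for any physically possible delay on that path.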

3) What is OCV, why is it used, and indicate which paths are affected by OCV in the circuit schematic shown?

On-Chip Variation (OCV) refers to differences in delay between nominally identical cells and wires at different locations on the same die, caused by local variation in the manufacturing process, voltage, temperature, and aging. Because of it, the actual circuit timing may differ from what the designer predicted at a single corner, so timing analysis applies OCV derating to add margin.

OCV affects the delay of every cell and wire, so the launch clock path, the data path, and the capture clock path are all affected.

Reference blog:

Introducing the chip OCV_Weijiang’s blog on the road to chip backend-CSDN blog

STA - PVT, RC, OCV_zhuangdk's blog-CSDN blog


34. The following program finds the narcissistic numbers between 100 and 999 (a narcissistic number is a three-digit number equal to the sum of the cubes of its digits, for example 153=1^3+5^3+3^3). Please fill in the code at each 【?】.


#include<iostream.h>
int fun(int n){
    int i,j,k,m;
    m=n;
    【1】
    for(i=1;i<4;i++){
      【2】
       m=(m-j)/10;
       k=k+j*j*j;
    }
    if(k==n)
        【3】
    else return(0);
}
void main(){
    int i;
    for(i=100;i<1000;i++){
        if(【4】==1)
            cout<<i<<"is ok!"<<endl;
    }
}

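Completed code (the header and main() are updated to standard C++ so that it compiles with current compilers):


#include <iostream>
using namespace std;

int fun(int n){
    int i,j,k,m;
    m=n;
    k=0;                      // k accumulates the sum of cubes, so it must start at 0
    for(i=1;i<4;i++){
       j=m%10;                // take the lowest digit of m
       m=(m-j)/10;
       k=k+j*j*j;
    }
    if(k==n)
        return(1);            // n is a narcissistic number
    else return(0);
}
int main(){
    int i;
    for(i=100;i<1000;i++){
        if(fun(i)==1)         // print the number if it is narcissistic
            cout<<i<<" is ok!"<<endl;
    }
    return 0;
}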

35. The description of the question is as follows, with a total of 5 questions:

Q1: If a setup violation is reported for the path from FF_A to FF_B, is it a violation of the setup of FF_A or the setup of FF_B? (+2) If a hold violation is reported for the path from FF_A to FF_B, is it violating the hold of FF_A or the hold of FF_B? (+2)

A violation is always reported against the capturing register: the data arrives at the next register too late (setup violation) or too early (hold violation). a) The setup of the downstream register FF_B is violated. b) The hold of the downstream register FF_B is violated.

Q2 Please write down the conditional formula that needs to satisfy the period for path setup check from FF_A to FF_B? (+4)

CLK_D_A + A_CK2Q + Comb_D + B_setup < T + CLK_D_B
setup_slack = T + CLK_D_B - CLK_D_A - A_CK2Q - Comb_D - B_setup > 0

Q3. Please write down the conditional formula that needs to be met for path hold check from FF_A to FF_B? (+4)

CLK_D_A + A_CK2Q + Comb_D > CLK_D_B + B_hold
hold_slack = CLK_D_A + A_CK2Q + Comb_D - CLK_D_B - B_hold > 0

Q4. If the combinational logic delay Comb_D turns out to be larger than expected, please list the possible reasons. (+4)

There are many possible reasons: ① the logic contains many cascaded operations (for example multiplications); ② the operand bit width is very large, so the synthesized gates have large fan-in and fan-out; ③ the logic contains redundant operations, such as unnecessary calculations; ④ too many input variables make the synthesized logic deep and complex.

Q5. If Comb consists of only two levels of buffers and Comb_D is still larger than expected, please list the possible reasons. (+4)

① The buffers are overloaded, that is, they drive a large fanout or a large load capacitance (for example long wires), which increases their propagation delay.
② The buffer cells themselves are slow, for example cells with low drive strength.

36. A CPU with a main frequency of 400MHz executes a standard test program. The instruction type, number of executions and average number of clock cycles in the program are as follows:

Instruction type    Number of instructions executed    Average number of clock cycles
Integer             45000                              1
Data transfer       75000                              2
Float               8000                               10

1. Find the effective CPI (Cycle Per Instruction), MIPS (Million Instruction Per Second) and program execution time of the computer.

First, calculate the effective CPI:
CPI = (integer instruction count × integer cycles + data transfer instruction count × data transfer cycles + floating-point instruction count × floating-point cycles) ÷ total instruction count
Total instructions = 45000 + 75000 + 8000 = 128000
Total clock cycles = 45000 × 1 + 75000 × 2 + 8000 × 10 = 275000
CPI = 275000 ÷ 128000 ≈ 2.15
Secondly, calculate MIPS:
MIPS = main frequency (in MHz) ÷ CPI
MIPS = 400 ÷ 2.148 ≈ 186.2
Finally, calculate the program execution time:
Program execution time = total number of instructions × CPI ÷ main frequency = total clock cycles ÷ main frequency
Program execution time = 275000 ÷ (400 × 10^6) = 6.875 × 10^-4 s ≈ 0.69 ms
Therefore, the effective CPI of this computer is about 2.15, the MIPS is about 186.2, and the program execution time is about 0.69 ms.

2. If the floating-point unit in the CPU is accelerated by 10 times, what is the overall performance improvement ratio of the CPU?

The overall CPU performance improvement ratio can be calculated using Amdahl's Law.
Let P be the fraction of the original execution time spent on floating-point instructions and S the overall CPU performance improvement ratio. Then:
S = 1 ÷ (1 - P + P ÷ 10)
Note that P is a fraction of execution time (clock cycles), not of instruction count:
P = (8000 × 10) ÷ (45000 × 1 + 75000 × 2 + 8000 × 10) = 80000 ÷ 275000 ≈ 0.291
Substituting this into the formula gives:
S = 1 ÷ (1 - 0.291 + 0.291 ÷ 10) ≈ 1.35
Cross-check: the total cycle count falls from 275000 to 45000 + 150000 + 8000 = 203000, and 275000 ÷ 203000 ≈ 1.35.
Therefore, if the floating-point unit in the above CPU is accelerated by 10 times, the overall performance improvement ratio of the CPU is approximately 1.35 times.
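
The arithmetic for both parts of this question can be checked with a short script computed directly from the table in the question. This is a sketch: the variable names are made up for illustration, and "accelerated by 10 times" is assumed to mean that the average floating-point cost drops from 10 cycles to 1 cycle.

```python
FREQ_HZ = 400e6  # 400 MHz main frequency

# (instruction count, average clock cycles) taken from the table in the question
mix = {
    "integer": (45000, 1),
    "data_transfer": (75000, 2),
    "float": (8000, 10),
}

def total_instructions(m):
    return sum(count for count, _ in m.values())

def total_cycles(m):
    return sum(count * cycles for count, cycles in m.values())

instrs = total_instructions(mix)   # 128000
cycles = total_cycles(mix)         # 275000
cpi = cycles / instrs              # effective CPI
mips = FREQ_HZ / (cpi * 1e6)       # million instructions per second
exec_time_s = cycles / FREQ_HZ     # execution time in seconds

# Part 2, direct method (assumption: float cost goes from 10 cycles to 1).
faster = {**mix, "float": (8000, 1)}
speedup_direct = cycles / total_cycles(faster)

# Part 2, Amdahl's Law with P taken as a fraction of execution time (cycles).
p = (8000 * 10) / cycles
speedup_amdahl = 1 / (1 - p + p / 10)

print(f"CPI = {cpi:.3f}, MIPS = {mips:.1f}, time = {exec_time_s * 1e3:.4f} ms")
print(f"speedup = {speedup_direct:.3f} (direct), {speedup_amdahl:.3f} (Amdahl)")
```

Both speedup calculations agree, which confirms that Amdahl's Law must be applied to the share of execution time, not to the share of instruction count.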

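37. Simplify the logical expression: Y=A'BC+ABC'+ABC+AB'C+AB'C' (' denotes NOT)

Simplify using a Karnaugh map: the four terms ABC', ABC, AB'C and AB'C' cover every combination of B and C with A = 1, so they reduce to A; the remaining term A'BC combines with ABC to give BC. Therefore Y=A+BC.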
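
The simplification can be verified by exhaustively comparing the two expressions over all eight input combinations. A minimal sketch (the function names are chosen here for illustration):

```python
from itertools import product

def y_original(a, b, c):
    """Y = A'BC + ABC' + ABC + AB'C + AB'C' with 0/1 inputs."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return (na & b & c) | (a & b & nc) | (a & b & c) | (a & nb & c) | (a & nb & nc)

def y_simplified(a, b, c):
    """Y = A + BC"""
    return a | (b & c)

equivalent = all(
    y_original(a, b, c) == y_simplified(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)
print(equivalent)  # True
```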


38. A certain IP supports three op operations: WRITE/READ/NOP. The IP is in the read state about 40% of the time, in the write state about 40% of the time, and in the NOP state about 20% of the time. Please write the constraint (SV code).

Answer: This is an easy verification question. It tests generating a random variable and applying a weighted distribution constraint to it, so that the three op codes are generated and sent to the slave in a 4:4:2 ratio.


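rand int OP_code;                     // must be rand: dist cannot be applied to randc variables
constraint OP_dist{
    OP_code dist {0:=40,1:=40,2:=20}; // encoding assumed here: 0 = WRITE, 1 = READ, 2 = NOP
}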

Refer to the author's other blog post: Verification Basics - Random Constraints and Random Control (CSDN blog)


39. Please provide Linux shell command(s) to find all files which contain the string "Montage" or "montage" in /home/user.

Answer: grep -rl 'Montage\|montage' /home/user/

- The `grep` command searches files for a specified pattern.
- The `-r` option searches recursively through the specified directory and its subdirectories.
- The `-l` option (lowercase letter L) prints only the names of the files that contain the pattern; without it, every matching line is printed together with its file name.
- `'Montage\|montage'` is the pattern to search for: `\|` is alternation in GNU grep's basic regular expressions, so either spelling matches. The shorter pattern `'[Mm]ontage'` is equivalent.
- The `-i` option (ignore case) is deliberately not used: it would also match spellings such as "MONTAGE", which the question does not ask for.
- `/home/user/` is the directory to search.


40. What are non-blocking assignment (b <= a) and blocking assignment (b = a)?

A blocking assignment uses the equals sign ("="), for example `b = a;`, which assigns the value of a to b. The right-hand side is evaluated and the left-hand side is updated immediately, before the next statement in the same procedural block executes; the statement "blocks" the ones that follow it. Statements therefore take effect one after another in source order, and a later statement sees the new value.

A non-blocking assignment uses "<=", for example `b <= a;`. The right-hand side is evaluated when the statement executes, but the update of the left-hand side is deferred to the end of the current simulation time step, so execution continues with the next statement without waiting. All non-blocking assignments triggered by the same clock edge sample the old values first and then update together, so the order of assignments to different variables does not matter. As a rule, use blocking assignments for combinational logic and non-blocking assignments for sequential (clocked) logic.
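
The difference is easiest to see on a two-variable swap. The sketch below is only a Python model of the two update rules, not simulator code; the function names are made up for illustration.

```python
def blocking_swap(a, b):
    """Models `a = b; b = a;`: each statement completes before the next starts."""
    a = b          # a takes b's value immediately
    b = a          # reads the NEW a, so b keeps its old value
    return a, b

def nonblocking_swap(a, b):
    """Models `a <= b; b <= a;`: sample every right-hand side, then update."""
    next_a = b     # right-hand sides are evaluated with the old values ...
    next_b = a
    a, b = next_a, next_b   # ... and the updates take effect together afterwards
    return a, b

print(blocking_swap(0, 1))      # (1, 1): the old value of a is lost
print(nonblocking_swap(0, 1))   # (1, 0): the two values are swapped
```

This is why two registers that exchange values on a clock edge must be written with non-blocking assignments.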


41. Please use Verilog to write an 8-bit D flip-flop with an active-low asynchronous reset.


module ff_8bits(
    input         clk    ,
    input         rstn   ,
    input  [7:0]  D      , 
    output [7:0]  Q
);
reg [7:0] Q_reg;
always @(posedge clk or negedge rstn)begin
    if(!rstn)begin
        Q_reg <= 8'd0;
    end
    else begin
        Q_reg <= D;
    end
end
assign Q = Q_reg;
endmodule

42. What's the difference between "task" and "function" in Verilog?

task:

A task can call other tasks and functions;
a task can execute in non-zero simulation time;
a task can contain delay, event, or timing control statements;
a task can have zero or more input, output, and inout arguments;
a task does not return a value, but it can pass multiple values back through output or inout arguments;
a task call is a separate statement;
a task call can only appear in a procedural block;
task execution can be interrupted by the disable statement.

function:

A function can call other functions, but not tasks;
a function always executes in zero simulation time;
a function must not contain any delay, event, or timing control statements;
a function has at least one input argument and can have several;
a function returns exactly one value and cannot have output or inout arguments;
a function call cannot stand alone as a statement; it can only appear as an operand in an expression;
a function call can appear in a procedural block or in a continuous assignment;
function execution cannot be interrupted by the disable statement.

43. Please use two methods to generate a 100MHz clk signal in a testbench; you can use Verilog or SystemVerilog.


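`timescale 1ns/1ps            // #5 is 5 ns, so the period is 10 ns (100 MHz)
//method 1 (use one method or the other, not both)
reg clk = 1'b0;               // clk needs a known start value, since ~x stays x
always #5 clk = ~clk;
//method 2
initial begin
    clk = 1'b0;
    forever #5 clk = ~clk;
end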

44. What are setup time and hold time in a synchronous circuit, and how can a setup violation be resolved?

Setup time is the time for which the data must remain stable before the register's sampling clock edge.

Hold time is the time for which the data must remain stable after the register's sampling clock edge.

When a setup violation occurs, it can be solved in the following ways:

a). Reduce the clock frequency;

b). Insert pipeline stages on the critical path to shorten the combinational delay between registers;

c). Use a cleaner and more stable clock (less jitter and skew);

d). Use FFs with faster sampling speed (shorter ck_to_q time).


45. Please use the following CMOS to form an inverter, NAND gate, or NOR gate.

In the figure, the transistor on the left is an NMOS and the one on the right is a PMOS. The task is to build an inverter, a NAND gate, and a NOR gate from NMOS and PMOS transistors.

Inverter: one PMOS as the pull-up, one NMOS as the pull-down.

NAND gate: two PMOS in parallel as the pull-up network, two NMOS in series as the pull-down network.

NOR gate: two PMOS in series as the pull-up network, two NMOS in parallel as the pull-down network.


46. What do the printA and printB calls in the following code print?


`timescale  1ns/10ps

class BasePacket;
    int A=1;
    int B=2;
    
    function void printA;
        $display ("BasePocket::A is %0d", A);
    endfunction
    
    virtual function void printB; 
        $display ("BasePacket::B is %0d", B);
    endfunction 
endclass

class My_Packet extends BasePacket;
    int A=3;
    int B=4;
    
    function void printA;
        $display("My_Pocket::A is %0d", A);
    endfunction 
    
    virtual function void printB;
        $display("My_Pocket::B is %0d", B);
    endfunction 
endclass 

module tb_sv();
BasePacket P1;
My_Packet P2;

initial begin
    P1=new();
    P2=new();
    P1.printB;
    P1=P2;
    P1.printA;
    P1.printB;
    P2.printA;
    P2.printB;    
end
endmodule

Answer: This is a verification question. It tests inheritance, virtual methods, and how member variables and methods with the same name in a parent class and a subclass are resolved.

Two classes are defined: the parent class BasePacket and the subclass My_Packet, which extends BasePacket. Both classes have two variables, A and B, and two print methods, printA and printB, of which printB is a virtual method. The testbench then declares a handle of each class and creates an object for each with new().

This question involves several concepts; let me explain them one by one.

Virtual function:

When handles of different types point to the same object, the methods or variables accessed through them can differ. When a parent-class handle points to a subclass object and both classes define a non-virtual method with the same name, the parent-class handle can only reach the method of the parent class, not the method of the subclass. Since the handle actually points to a subclass object, we usually want the subclass method to be called. One way to achieve this is to use $cast to convert the parent-class handle into a subclass handle, but doing that at every call makes the code redundant and error-prone.
Virtual methods solve this through dynamic binding: regardless of the handle type, as long as the handle points to a subclass object, calling a method that the subclass overrides invokes the subclass version. A method is made virtual with the virtual keyword, and it only needs to be declared once, in the base class. Note that virtual dispatch applies only to methods with the same name, not to member variables; there is no such thing as a virtual member variable.

Step analysis:

There are five print calls in the question, and the answer depends on the result of each one:

The first print: a parent-class handle pointing to a parent-class object calls the parent-class virtual method, so the output B is 2. There is no ambiguity here.

The second print: the parent-class handle now points to the subclass object and calls printA. Since printA is not a virtual method, the parent-class handle can only reach the parent-class method. The printA that runs is the parent's, so the output A is 1.

The third print: the parent-class handle points to the subclass object and calls printB. Since printB is a virtual method, the version in the subclass is called because the object is a subclass object, so the output B is 4.

The fourth print: a subclass may define member variables and methods with the same names as those of its parent class (for methods, the formal parameters and return type should also match), and a reference is resolved according to the handle type. P2 is a subclass handle, so printA is the subclass method, it runs in the scope of the subclass, and the output A is 3.

The fifth print: as with the fourth print, the output B is 4.

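The VCS simulation results are consistent with the analysis. Following the $display strings in the code (including their "Pocket" spelling), the five calls print, in order:

BasePacket::B is 2
BasePocket::A is 1
My_Pocket::B is 4
My_Pocket::A is 3
My_Pocket::B is 4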


47. A good verification process can ensure the quality and efficiency of verification to a certain extent, assuming you want to verify a DUT. What process will you follow for verification? Please list each step and give detailed instructions.

The following is a basic DUT verification process, including common steps and recommendations:

1. Confirm verification requirements and methodology: Determine the verification requirements of the DUT, including required inputs, outputs, timing, etc., and decide on the verification methodology to use (such as UVM, OVM, etc.).
2. Write a test plan: Write a test plan, including test goals, test scenarios, test cases, test environment, etc.
3. Write test cases: Write test cases according to the test plan. Each test case should contain input, output, expected results and actual result comparison.
4. Build a verification environment: Build a verification environment for the DUT, including establishing signal interfaces, creating simulation models, setting simulation clocks, generating simulation data, etc.
5. Run simulation: Use the simulator to run the tests and perform functional verification, timing verification, boundary condition verification, etc. on the DUT.
6. Analyze the simulation results: Analyze the simulation results, check whether the actual results are consistent with the expected results, identify fault points and debug.
7. Confirm verification coverage: By evaluating the coverage of test cases, confirm that the functionality of the DUT has been fully verified.
8. Optimize verification: Based on the coverage evaluation results, adjust and optimize the test plan, test cases, verification environment, etc. to improve verification efficiency and coverage.
9. Verification passed: When the verification results of the DUT meet the design specifications and verification requirements, confirm that the verification has passed and prepare for subsequent processes, such as integration testing, verification reports, etc.

In short, a good DUT verification process should be standardized, repeatable, scalable, and efficient, fully cover the functions and timing of the DUT, and ensure the accuracy and reliability of the verification results.


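48. How to use a 2-to-1 MUX and an INV to implement XOR.

Connect A to the select input of the MUX, B to the data input that is selected when A = 0, and B' (B through the INV) to the data input that is selected when A = 1. Then out = A'B+AB', which is the XOR logic.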
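
One wiring that gives this result uses A as the select, B as the data input chosen when A = 0, and the inverted B as the data input chosen when A = 1. The sketch below checks that wiring against XOR for all inputs; the helper names are made up for illustration.

```python
def inv(x):
    """Inverter on a 0/1 value."""
    return 1 - x

def mux2(sel, d0, d1):
    """2-to-1 MUX: returns d0 when sel = 0 and d1 when sel = 1."""
    return d1 if sel else d0

def xor_from_mux(a, b):
    # select = A, d0 = B, d1 = B'  ->  out = A'B + AB'
    return mux2(a, b, inv(b))

matches_xor = all(
    xor_from_mux(a, b) == (a ^ b) for a in (0, 1) for b in (0, 1)
)
print(matches_xor)  # True
```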


49. What are recovery and removal times? Please describe both concepts.

In a synchronous circuit, the input data needs to meet the setup time and hold time relative to the clock for normal data transmission, to prevent metastability;

Similarly, for a register with asynchronous set or reset, the set and reset signals need to meet the recovery time and removal time relative to the clock so that the release of set or reset takes effect reliably.

Recovery time: when the reset is released, the reset signal must return to its inactive level a certain time before the active clock edge arrives, so that the register reliably leaves the reset state on that edge. This minimum time is the recovery time (the counterpart of setup time).

Removal time: when the reset is released, the reset signal must stay at its active level for a certain time after the active clock edge arrives. This minimum time is the removal time (the counterpart of hold time).

Comment: by definition, recovery time and removal time both concern the release of set or reset, that is, the moment the system or device starts to work. They do not concern the assertion of reset: once reset is asserted the register is forced into its reset state and the system or device stops working, so metastability at that moment does not matter.


50. The clock cycle is T, the clock-to-register-output delay is Tco, and the setup and hold times of a register are Tsetup and Thold. What is the constraint on Tdelay? Please describe the setup and hold time requirements for the logic delay Tdelay (ignoring clock delay).

STA analysis.

Setup analysis: Tco+Tdelay+Tsetup<T

Hold analysis: Tco+Tdelay>Thold

Combined: Thold-Tco < Tdelay < T-Tco-Tsetup


51. What's the difference between a LATCH and a DFF? Please describe the concepts and the differences.

1. A latch is level-sensitive: while the enable signal is active the latch is transparent and the output follows the input, and when the enable signal is inactive the latch holds its output. A DFF is edge-triggered: it samples its input only on the clock edge.

2. A latch is prone to glitches, because a glitch on the input passes straight to the output while the latch is transparent; a DFF is not, because it only samples at the clock edge.

3. Built from gates, a latch uses fewer gates than a DFF, which is where a latch is superior to a DFF. In an ASIC, using latches therefore gives a higher integration level than using DFFs. The opposite is true in an FPGA, which has DFF cells but no standard latch cell, so a latch takes multiple LEs to implement.

4. Latches make static timing analysis extremely complex.


52. What's the difference between a synchronous and an asynchronous circuit?

The difference between synchronous circuits and asynchronous circuits is whether the storage elements are triggered by a common driving clock. From a behavioral perspective, the question is whether all parts of the circuit process data synchronously on the same clock edge.


53. What is IR-drop, and in which areas is an IR-drop problem likely to occur?

IR-drop is the voltage drop caused by current flowing through resistance (V = I × R). When a large current flows through a wire or a power network, the resistance of that path lowers the voltage seen at the load. This phenomenon is called IR-drop.

IR-drop problems mainly occur in the power and ground networks of a chip, especially in large digital blocks such as processors and memories, where many transistors switch at once and draw large currents through resistive wiring. In addition, in high-speed designs the inductance and capacitance of the power and ground networks cause dynamic voltage drops that make IR-drop problems worse.


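54. How do you synchronize an asynchronous input?

Single bit: from a slow to a fast clock domain, use a two-stage flip-flop synchronizer to reduce the effect of possible metastability; from a fast to a slow clock domain, stretch the pulse (pulse broadening) so that the slow clock can sample it, then synchronize.

Multi-bit: asynchronous FIFO, DMUX synchronizer, or dual-port RAM; alternatively, hold the data in a register and transfer it with a handshake protocol, synchronizing the enable signal and sampling the data once the enable is valid.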


55. There is an X present in my gate-level simulation due to a timing violation. How do you identify the source of it and the type of violation?

1. The simulation pattern itself: for example, the program reads an uninitialized (never written) storage area and uses the resulting X data (shown in red in the waveform), which makes the X spread;

2. The simulation environment or platform: a top-level pin of the model or of the entire chip is not driven and stays in the high-impedance Z state. When it enters the digital logic, it becomes X and propagates;

3. Timing: a DFF, clock-gating cell, etc. whose setup/hold is not met, whose rst recovery/removal is not met, or which is the first stage of a synchronizer fed by asynchronous logic, generates X that then propagates;

4. A DFF without a reset pin starts up as X and propagates it.

To locate the source, trace the X backwards in the waveform to the first register whose output becomes X, then check the simulation log for the timing-check message (setup, hold, recovery, or removal) reported for that instance at that time; the message names the type of violation.

The X produced by a timing violation models metastability, and it propagates: metastability means that a flip-flop cannot reach a well-defined state within a specified period of time. When a flip-flop enters a metastable state, it is impossible to predict either its output level or when the output will settle at a correct level. During this settling period, the flip-flop outputs some intermediate level, or may oscillate, and this invalid output level can cascade down the flip-flops on the signal path.


56. Please describe the ECO flow (including pre-mask ECO and post-mask ECO).

An ECO (Engineering Change Order) is the process used to correct problems that are found after the chip design has been completed. It is mainly divided into two stages: pre-mask ECO and post-mask ECO.

  1. Pre-mask ECO: a modification made after the design is completed but before the masks are made and the chip enters the manufacturing process. Its main purpose is to fix problems in the design, such as incorrect circuit logic, excessive power consumption, or unmet timing. During a pre-mask ECO, the design team usually uses RTL simulation to verify the effect of the ECO and runs netlist-level simulation to ensure that the fix does not introduce new problems.

  2. Post-mask ECO: required when problems are still found after the masks have been made or the chip has been produced. Because the base layers are already fixed, a post-mask ECO normally solves the problem at the physical level, for example by changing metal connections (typically rewiring to spare cells placed in advance), so that only some of the masks have to be remade.

Generally speaking, the ECO flow is a repair process performed after the chip design is completed. Pre-mask ECO is a modification made before chip manufacturing, while post-mask ECO is a modification made after the masks or the chip have been produced. Together, these two processes ensure the correctness and integrity of the chip design.


57. What are various techniques to resolve routing congestion?

Routing congestion means that a region of the chip needs more routing tracks than are available, so some nets cannot be routed successfully or have to detour. Here are several ways to solve the routing congestion problem:

1. Increase routing resources: add metal layers or enlarge the die so that more routing tracks are available. This approach makes the chip larger and more expensive.

2. Lower the local cell density: spread out the cells in congested regions, for example with a lower placement utilization, cell padding, or partial placement blockages, so that fewer nets compete for the same tracks. This approach costs area.

3. Rework the floorplan and placement: rearrange the macros and blocks to reduce the demand on routing resources, for example by leaving wider channels between macros and moving them away from congested areas.

4. Use congestion-aware placement and routing: enable the congestion-driven options of the place-and-route tool, so that congestion is estimated during placement and global routing and nets are steered around hot spots. The underlying routers build on classic algorithms such as the Lee (maze routing) algorithm.

5. Restructure the logic: congestion often comes from logic with a high pin density, such as large multiplexers or cells with many inputs. Re-synthesizing such logic, or restricting complex cells in the congested region, reduces the number of nets that have to be routed there.


58. Please describe the RTL with INV, AND, OR and DFF: use AND, OR, NOT gates and registers to draw the circuit described by the code.


always@(posedge clk or negedge rst_n)
begin
    if(!rst_n) begin
        cnt<= 2'd0;
    end
    else if(cnt_en) begin
        if(ina)
            cnt <= cnt+2'd1;
    end
    else begin
        cnt <=2'd0;
    end
end

This is a register with asynchronous reset. The code involves addition and selection. First draw the gate-level circuit of a full adder. The full adder involves XOR, so draw an XOR circuit from AND, OR, and NOT gates as well. Finally, connect the adder and the selection logic according to the if/else structure of the code. (Tip: when I drew the picture, I forgot that a mux is not allowed; a 2-to-1 mux can be built from AND, OR, and NOT gates.)
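
One possible gate-level mapping of the next-state logic can be checked against the RTL behaviour by brute force. The sketch below is an assumption-laden illustration, not the author's drawing: the helper names are hypothetical, only INV/AND/OR primitives are used, and the asynchronous reset is left to the DFF's reset pin.

```python
def inv(x):
    return 1 - x

def and_(*xs):
    return int(all(xs))

def or_(*xs):
    return int(any(xs))

def xor_gate(x, y):
    # XOR from AND/OR/INV: x'y + xy'
    return or_(and_(inv(x), y), and_(x, inv(y)))

def mux2(sel, d0, d1):
    # 2-to-1 mux from AND/OR/INV: sel'.d0 + sel.d1
    return or_(and_(inv(sel), d0), and_(sel, d1))

def next_cnt_gates(cnt1, cnt0, cnt_en, ina):
    """D inputs of the two counter flip-flops (reset released)."""
    inc0 = inv(cnt0)              # bit 0 of cnt + 1
    inc1 = xor_gate(cnt1, cnt0)   # bit 1 of cnt + 1 (carry in from bit 0)
    d0 = and_(cnt_en, mux2(ina, cnt0, inc0))   # cnt_en = 0 forces 0
    d1 = and_(cnt_en, mux2(ina, cnt1, inc1))
    return d1, d0

def next_cnt_rtl(cnt, cnt_en, ina):
    """Behaviour of the RTL with rst_n = 1."""
    if cnt_en:
        return (cnt + 1) % 4 if ina else cnt
    return 0

def as_bits(value):
    return (value >> 1) & 1, value & 1

gates_match_rtl = all(
    next_cnt_gates(*as_bits(cnt), en, ina) == as_bits(next_cnt_rtl(cnt, en, ina))
    for cnt in range(4) for en in (0, 1) for ina in (0, 1)
)
print(gates_match_rtl)  # True
```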

59. What are the different sources of power consumption? Please describe different techniques used to reduce power consumption.

Power consumption is divided into dynamic power consumption and static power consumption.

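The dynamic (switching) power consumption formula is: P_switch = 1/2 × C × V^2 × f, where C is the switched capacitance, V is the supply voltage, and f is the switching frequency.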
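
The formula shows why voltage is the strongest lever: power falls with the square of V but only linearly with f. A minimal sketch, using a hypothetical operating point (the values of C, V and F below are illustrative, not from the source):

```python
def switching_power(c_farads, v_volts, f_hz):
    """P_switch = 1/2 * C * V^2 * f, the formula quoted above."""
    return 0.5 * c_farads * v_volts ** 2 * f_hz

# Hypothetical operating point, for illustration only
C, V, F = 1e-9, 1.0, 1e9

base = switching_power(C, V, F)
half_v = switching_power(C, V / 2, F)   # halve the supply voltage
half_f = switching_power(C, V, F / 2)   # halve the frequency

print(base / half_v)  # about 4: power scales with V^2
print(base / half_f)  # about 2: power scales linearly with f
```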

Reduce dynamic power consumption:

(1) Clock gating
Using gated clocks to reduce the activity factor is a very effective way to reduce power consumption.
(2) Reduce glitches
Glitches increase the activity factor and can push the activity factor of a gate above 1.
(3) Reduce load capacitance
Capacitance comes from the wires and transistors in the circuit. Shorter wires and a good floorplan and layout reduce the wire capacitance. Fewer logic stages and smaller transistors reduce the switched device capacitance.
(4) Voltage domains
Dynamic power consumption is proportional to the square of the voltage, so reducing the power supply voltage reduces power consumption significantly. The chip can be divided into multiple voltage domains, each of which is optimized for the needs of a specific circuit. For example, a high power supply voltage is used for the memory to ensure the stability of the storage cells, a medium voltage is used for the processor, and a low voltage is used for the IO peripheral circuits that run at a lower speed. Signals that cross voltage domains are handled with level shifters.
(5) Dynamic voltage scaling (DVS)
A CPU handles different tasks with different performance requirements. For tasks with low performance requirements, the clock frequency can be reduced to the minimum value sufficient to complete the task within the scheduled time, and the voltage can then be reduced to the minimum value required to operate at this frequency, which saves a lot of energy.
(6) Reduce frequency
Dynamic power consumption is proportional to frequency. The chip should run only as fast as required and no faster. As noted above, a lower frequency also allows a lower power supply voltage, which greatly reduces power consumption.
(7) Resonant circuits
A resonant circuit reduces switching power consumption by transferring energy back and forth between energy storage components such as capacitors and inductors instead of dissipating it.

Reduce static power consumption:

(1) Power gating
The easiest way to reduce static current is to turn off the power supply of a sleeping module. This technique is called power gating.
(2) Multiple threshold voltages and gate oxide thicknesses
Selectively applying multiple threshold voltages maintains performance on critical paths with low-Vt transistors while reducing leakage on other paths with high-Vt transistors.
Most logic transistors in nanometer processes use thin gate oxide, while IO transistors use a much thicker gate oxide so that they can withstand larger voltages.
(3) Variable threshold voltage
The threshold voltage can be modulated through the body effect. Applying a reverse body bias in sleep mode reduces leakage; applying a forward body bias in active mode improves performance.
(4) Input vector control
The stack effect and input ordering change the sub-threshold leakage and gate leakage, so the leakage of a logic module depends on the inputs of its gates. Input vector control applies a set of input patterns that minimizes module leakage when the module is placed in sleep mode. These input vectors can be applied via set/reset inputs on the registers or via scan chains.

Reference blog: Static power consumption and dynamic power consumption (yuzhong_Muyang's blog, CSDN)


60. Implement the RTL logic below with DFF and NOR/NAND/INV cells:


always@(posedge clk or negedge rst_n) begin
    if(!rst_n)
        C<=1'b0;
    else if (B)
        C<=~A;
    else;
end    

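Ignoring the reset signal, the D input of the register (the next value of C) can be written as D = B'C+BA'. In NAND/INV form this is D = NAND(NAND(B', C), NAND(B, A')), where A' and B' come from inverters. The circuit diagram follows directly, with rst_n connected to the asynchronous reset pin of the DFF.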
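
The expression B'C + BA' can be built from two inverters and three NAND gates by De Morgan's law. The sketch below checks that NAND/INV form against the RTL behaviour for every input combination; the helper names are made up for illustration and the reset is assumed to be released.

```python
def inv(x):
    return 1 - x

def nand(x, y):
    return 1 - (x & y)

def d_input(a, b, c):
    """D = B'C + BA' built from INV and NAND cells only."""
    return nand(nand(inv(b), c), nand(b, inv(a)))

def rtl_next_c(a, b, c):
    """RTL behaviour with rst_n = 1: if (B) C <= ~A; otherwise C holds."""
    return inv(a) if b else c

nand_form_matches = all(
    d_input(a, b, c) == rtl_next_c(a, b, c)
    for a in (0, 1) for b in (0, 1) for c in (0, 1)
)
print(nand_form_matches)  # True
```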


61.What is metastability? How to prevent the propagation of metastability or reduce the probability of metastability?

Metastability refers to the inability of a flip-flop to reach a well-defined state within a specified period of time. When a flip-flop enters a metastable state, it is impossible to predict either its output level or when the output will settle at a correct level. During this settling period, the flip-flop outputs some intermediate level, or may oscillate, and this invalid output level can cascade down the flip-flops on the signal path.

Metastable state has the following characteristics:

1) Metastable state violates the timing and cannot become stable state within the specified time.

2) The metastable output is uncertain, but will be passed to the subsequent stage flip-flop. This will cause errors in the subsequent circuit. So metastability is very harmful.

3) The metastable state will eventually stabilize, but it will take a longer time.

Tsu is the setup time: the time for which the data must be stable before the rising edge of the flip-flop's clock signal arrives. If the setup time is not met, the data cannot be reliably captured by the flip-flop at the rising clock edge. Tsu is the minimum such time.

Th is the hold time: the time for which the data must be stable after the rising edge of the flip-flop's clock signal arrives. If the hold time is not met, the data cannot be reliably captured by the flip-flop. Th is the minimum such time.

Tco is the clock-to-output time: the time the flip-flop needs to produce a stable output after the rising edge of the clk clock arrives.

Tmet: the additional time beyond Tco that a metastable output needs to return to a stable state is called the settling (resolution) time; after this time, the metastable state has become a stable state.

If the Tsu and Th of the flip-flop are not satisfied during data transmission, the data will be unstable, resulting in metastability.

How to avoid metastability:

Asynchronous signals and cross-clock domain signals are high-risk areas for metastability.

For asynchronous signals, multi-stage synchronizers are generally used to reduce the probability that metastability propagates.

For cross-clock domain signals, there are usually 4 methods of cross-clock domain processing:

(1) Two-stage flip-flop synchronizer (double flopping): for single-bit signals, suitable for passing data from a slow clock domain to a fast clock domain;

(2) Asynchronous FIFO: for multi-bit data crossing clock domains;

(3) Gray code conversion: for multi-bit values that change one step at a time, such as counters and FIFO pointers, so that only one bit changes per step;

(4) Add handshake signals.
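
Gray code conversion is useful for clock domain crossing because consecutive values differ in exactly one bit, so a sample taken in the middle of a change is either the old or the new value. A minimal sketch of the standard binary/Gray conversions (the function names are chosen here for illustration):

```python
def bin_to_gray(n):
    """Binary to Gray code: each bit is XORed with the next higher bit."""
    return n ^ (n >> 1)

def gray_to_bin(g):
    """Gray code back to binary by XORing in all right-shifted copies."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

codes = [bin_to_gray(i) for i in range(8)]
print(codes)  # [0, 1, 3, 2, 6, 7, 5, 4]

# Consecutive codes (including the wrap-around 7 -> 0) differ in one bit.
one_bit_steps = all(
    bin(codes[i] ^ codes[(i + 1) % 8]).count("1") == 1 for i in range(8)
)
print(one_bit_steps)  # True
```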


62. Translate the following paragraphs and draw the start and stop timing of the I2C interface based on the description:

The I2C bus employs two signals, SDA (data) and SCL (clock), to communicate between integrated circuits in a system. The bus transfers data serially, one bit at a time. The 8-bit address and data bytes are transferred with the most-significant bit (MSB) first. In addition, each byte transferred on the bus is acknowledged by the receiving device with an acknowledge bit. Each transfer operation begins with the master device driving a start condition on the bus and ends with the master device driving a stop condition on the bus. The bus uses transitions on the data pin (SDA) while the clock is at logic high to indicate start and stop conditions. A high-to-low transition on the SDA signal indicates a start, and a low-to-high transition indicates a stop. Normal data-bit transitions must occur within the low time of the clock period.

The I2C bus uses two signals, SDA (data) and SCL (clock), to communicate between integrated circuits in the system. This bus transfers data serially, bit by bit. The 8-bit address and data bytes are transmitted in MSB first order. Additionally, each byte transmitted on the bus is acknowledged by the receiving device with an acknowledgment bit. Each transfer operation is initiated by the master device sending a start condition on the bus and ends with the master device sending a stop condition on the bus. The bus uses transitions on the data pin (SDA) when the clock is at logic high to indicate start and stop conditions. A high-to-low transition on the SDA signal indicates a start, while a low-to-high transition indicates a stop. Normal data bit transitions must occur during the low period of the clock cycle.
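The start and stop rule in the paragraph above can be checked with a small behavioral model. This is only a sketch in Python: the function name `find_start_stop` and the sampled waveforms are made up for illustration, and the model assumes SCL and SDA are sampled at the same instants.

```python
def find_start_stop(scl, sda):
    """Return (sample_index, "START"/"STOP") events from sampled SCL/SDA levels."""
    events = []
    for i in range(1, len(scl)):
        # Start/stop are SDA transitions while SCL stays at logic high.
        if scl[i - 1] == 1 and scl[i] == 1:
            if sda[i - 1] == 1 and sda[i] == 0:
                events.append((i, "START"))   # high-to-low on SDA
            elif sda[i - 1] == 0 and sda[i] == 1:
                events.append((i, "STOP"))    # low-to-high on SDA
    return events


# Hypothetical waveform: SDA changes at samples 3 and 6 happen while SCL is low,
# so they are ordinary data-bit transitions and are not reported.
scl = [1, 1, 0, 0, 1, 0, 0, 1, 1]
sda = [1, 0, 0, 1, 1, 1, 0, 0, 1]
print(find_start_stop(scl, sda))
```

When drawing the timing diagram, the same rule applies: SDA falls while SCL is high for start, and SDA rises while SCL is high for stop.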


63. The following schematic shows datapath operators going into a register. From a power perspective, figure out the inefficient part and draw a new schematic with your fix.

For a low-power design, add a gated clock:


64. Suppose there is a log file

The file's content is like:

<MESSAGE_LEVEL>_<MESSAGE_TYPE>_<MESSAGE_CONTENT>

<MESSAGE_LEVEL> should be "ERROR" or "WARNING" or "INFO"

<MESSAGE_TYPE> should be "TYPE" plus an integer number.

Please write a function named printErrors to parse the log, filter out the required information and print some messages, given a string logPath representing the log file path.

The requirements are:

a. The output messages should be ERROR level and their MESSAGE_CONTENT should contain “NVIDIA_SOC”

b. Sort the output by MESSAGE_TYPE number

c. Use any script language you like.

###

Example 1:

log file:

ERROR_TYPE1_NVIDIA

INFO_TYPE1_NVIDIA SOC

ERROR_ TYPE4_THIS_IS_NVIDIA_SOC

WARNING_TYPE2_SOC

ERROR_TYPE1_SOC

ERROR_TYPE1_NVIDIA_SOC_TEAM

ERROR_TYPE4_NVIDIA_SOC

ERROR_TYPE12_NVIDIA_SOC

Example 2:

log file:

ERROR_TYPE1_NVIDIA

ERROR_TYPE12_NVIDIA_SOC

ERROR_TYPE1_Nvidia_soc

INFO_TYPE2_NVIDIA_SOC TEAM

ERROR TYPE12 NVIDIA SOC

###

Please provide your answer in the following editor
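The original post leaves this question without an answer. One possible solution in Python is sketched below. It assumes a strict reading of the format: fields are joined by underscores with no spaces, matching is case-sensitive, and lines that do not fit are ignored. The helper `select_errors` is a name introduced here so the filtering can be tried without a file.

```python
import re

# LEVEL, then "TYPE" plus an integer, then the content, joined by underscores.
LINE_RE = re.compile(r"^(ERROR|WARNING|INFO)_TYPE(\d+)_(.+)$")


def select_errors(lines):
    """Return ERROR lines whose content contains NVIDIA_SOC, sorted by TYPE number."""
    hits = []
    for raw in lines:
        line = raw.strip()
        match = LINE_RE.match(line)
        if match is None:
            continue  # malformed or empty line
        level, type_no, content = match.group(1), int(match.group(2)), match.group(3)
        if level == "ERROR" and "NVIDIA_SOC" in content:
            hits.append((type_no, line))
    hits.sort(key=lambda item: item[0])  # numeric sort, so TYPE12 comes after TYPE4
    return [line for _, line in hits]


def printErrors(logPath):
    """Read the log file at logPath and print the selected lines."""
    with open(logPath, encoding="utf-8") as log_file:
        for line in select_errors(log_file):
            print(line)
```

Under these assumptions, Example 1 prints the TYPE1, TYPE4 and TYPE12 lines that contain NVIDIA_SOC, and the line with a space after `ERROR_` is rejected. If the space is only a typo in the question, relax the pattern accordingly.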


65. Gate-level logic netlist_a is optimized to netlist_b in the back-end flow, and they are checked by a formal check tool to prove whether they are functionally equivalent. Please answer the questions below.

1. What is the concept of combinational and sequential logic? Please classify the A/B/C/D cells in the netlist_a schematic below: which belong to combinational logic and which belong to sequential logic?

①:

The combinational logic output depends only on the input and not on the previous state of the circuit.

Sequential logic outputs depend not only on the inputs but also on the previous state of the circuit.

②:

A is sequential logic; B, C, and D are combinational logic.

2. If the value vector 110 is applied to the D pins of the left three flops, what is the D pin value of reg_d in netlist_a/netlist_b after 1 cycle?

netlist_a: output 1

netlist_b: output 0

3. Please estimate whether the above netlists are functionally equivalent according to the netlist schematics, and explain why.

Not equivalent: for the same input vector 110, the two netlists drive different values onto the D pin of reg_d (1 versus 0).


66. A 3-stage pipeline circuit is shown below.

The clock period is 0.9

The clock uncertainty is 0.1

The CP->Q cell delay of F1/F2/F3 is 0.15

The library setup time required for F1/F2/F3 is 0.1

1. Please calculate the setup slack between F1 and F2

Answer: The setup timing check is:

clk_latency + clk_pathF1_delay + ck_to_q + logic_delay < clk_period + clk_latency + clk_pathF2_delay - dff_set_up - clk_uncertainty

1.3+0.6+0.15+0.8 < 0.9+1.3+0.5-0.1-0.1

2.85 < 2.5 does not hold, so there is a setup violation.

Setup slack = 2.5 - 2.85 = -0.35 ns

2. Please describe what clock skew is and suggest how to fix the setup violation between F1 and F2 with clock skew

Clock skew is the difference between the arrival times of the same clock at different registers. Here it is:

clk_pathF2_delay - clk_pathF1_delay

In the previous question, clock skew = 0.5 - 0.6 = -0.1 ns.

From the clock-skew perspective, the setup violation can be fixed by adding buffers to the clock path of the capturing register F2, which increases clk_pathF2_delay.

3. Please recalculate the slack between F2 and F3 after the setup violation between F1 and F2 is fixed to 0

To bring the F1-to-F2 slack to 0, clk_pathF2_delay must grow by 0.35 ns, from 0.5 ns to 0.85 ns. The setup slack between F2 and F3 then follows from the same formula as in question 1:

clk_latency + clk_pathF2_delay + ck_to_q + logic_delay < clk_period + clk_latency + clk_pathF3_delay - dff_set_up - clk_uncertainty

1.3+0.85+0.15+0.1 < 0.9+1.3+0.5-0.1-0.1

2.4 < 2.5

Setup slack = 2.5 - 2.4 = 0.1 ns
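The three calculations in this question use the same formula, so they can be cross-checked with a few lines of Python. This is only a sketch: the function name `setup_slack` is made up here, and the delay values are the ones used in the arithmetic above (they come from the figure, which is not reproduced).

```python
def setup_slack(clk_period, clk_latency, launch_clk_delay, capture_clk_delay,
                ck_to_q, logic_delay, setup, uncertainty):
    """Setup slack = data required time - data arrival time."""
    arrival = clk_latency + launch_clk_delay + ck_to_q + logic_delay
    required = clk_period + clk_latency + capture_clk_delay - setup - uncertainty
    return required - arrival


common = dict(clk_period=0.9, clk_latency=1.3, ck_to_q=0.15, setup=0.1, uncertainty=0.1)

# Question 1: F1 -> F2 with the original clock path delays (0.6 and 0.5).
f1_f2 = setup_slack(launch_clk_delay=0.6, capture_clk_delay=0.5, logic_delay=0.8, **common)

# Question 3: F2's clock path is delayed by the missing 0.35 ns, so it becomes 0.85.
f1_f2_fixed = setup_slack(launch_clk_delay=0.6, capture_clk_delay=0.85, logic_delay=0.8, **common)
f2_f3 = setup_slack(launch_clk_delay=0.85, capture_clk_delay=0.5, logic_delay=0.1, **common)

print(round(f1_f2, 2), round(f1_f2_fixed, 2), round(f2_f3, 2))
```

The script makes the trade-off visible: the delay added to F2's clock helps F1 to F2 but is taken from the F2 to F3 budget.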


67. Please use NAND2 gates to create a new logic signal as below (use as few gates as possible): New_logic = ECO_SELECT ? Original_logic & mask : Original_logic;

Answer: The expression above can be converted into sum-of-products form.

To keep it short, rename the signals: A = ECO_SELECT, B = Original_logic, C = mask. Then:

Y=A?B&C:B=ABC + A'B=B(AC+A')=B(A'+C)=B(AC')'

Feed C into a NAND gate wired as an inverter to get C'. NAND C' with A to get (AC')'. ANDing B with (AC')' takes two more NAND gates (a NAND followed by a NAND wired as an inverter), giving B(AC')'. Four NAND gates are needed in total.
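The four-gate structure can be verified exhaustively against the original ternary expression. A minimal Python sketch, with A, B, C standing for ECO_SELECT, Original_logic and mask, and with function names chosen here for illustration:

```python
from itertools import product


def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 1 - (a & b)


def new_logic_nand(a, b, c):
    """B(AC')' built from exactly four NAND2 gates."""
    not_c = nand(c, c)      # gate 1: inverter, C'
    x = nand(a, not_c)      # gate 2: (AC')'
    y_n = nand(b, x)        # gate 3: (B(AC')')'
    return nand(y_n, y_n)   # gate 4: inverter, B(AC')'


def matches_spec():
    """Compare against A ? B&C : B for all 8 input combinations."""
    return all(
        new_logic_nand(a, b, c) == ((b & c) if a else b)
        for a, b, c in product((0, 1), repeat=3)
    )


print(matches_spec())
```

This confirms functional correctness of the four-gate version; it does not by itself prove that four is the minimum.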


68. Design a sequence (10100) detector. The logic has a single-bit input and a single-bit output. When the input bits match the sequence 10100, output a one-cycle pulse of 1'b1; otherwise the output stays at 1'b0. (No need to write RTL code, just provide a schematic diagram or a state machine flow chart.)

Draw a state transition diagram for the sequence detector: the S5 state outputs 1, and all other states output 0.
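Since the figure is not reproduced, here is one way the state diagram could look, written as a transition table in Python. This is a sketch under assumptions: a Moore machine, overlapping detection, and state Sk meaning "the last k bits received match the first k bits of 10100". The original figure may differ in these details.

```python
# TRANSITIONS[state][input_bit] -> next state
TRANSITIONS = {
    "S0": {0: "S0", 1: "S1"},  # nothing matched yet
    "S1": {0: "S2", 1: "S1"},  # seen 1
    "S2": {0: "S0", 1: "S3"},  # seen 10
    "S3": {0: "S4", 1: "S1"},  # seen 101
    "S4": {0: "S5", 1: "S3"},  # seen 1010; a 1 leaves 101 as the usable suffix
    "S5": {0: "S0", 1: "S1"},  # seen 10100: output 1 in this state
}


def detect(bits):
    """Return the output for each input bit, taken from the state reached after that bit."""
    state = "S0"
    outputs = []
    for bit in bits:
        state = TRANSITIONS[state][bit]
        outputs.append(1 if state == "S5" else 0)
    return outputs


print(detect([1, 0, 1, 0, 1, 0, 0, 1]))
```

The S4 row is the one that is easy to get wrong: falling back to S0 or S1 on a 1 would miss a sequence such as 1010100.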


69. As shown in the assertion below, which clock in the waveform shown in the picture can be judged as success?

Answer:

The disable iff statement can be used to temporarily disable an assertion under a certain condition.

The EN signal is active high. On a rising edge of A, (~B) must have held in the previous cycle. After a delay of 0 cycles, (B&&~C) must hold; after 1 more cycle, (B&&C) must stay true until D becomes 1; then, 1 cycle later, B must be true.

To sum up, the assertion is judged as a success at the 15th clock.

Reference blog: systemVerilog Assertion (SVA) assertion syntax_OnePlusZero's blog-CSDN blog


70. There is a circuit module M whose function is as shown below.

Assume that there are only the following devices:

Please use the above devices to build a circuit that implements this function.

The numbers cycle through 2, 3, 7, 5, 4, 0. In binary: 010, 011, 111, 101, 100, 000. Adjacent values differ in exactly one bit, which suggests Gray code. It resembles Gray code but is not the standard sequence: in standard Gray code the value after 010 is 110, whereas here it is 011, so extra decision logic is needed. In this implementation, when the counter goes from 2 to 3 the output is shifted right by one bit; when the counter reaches 7 it returns to 0 in the next cycle, and counting then starts again from 2.

This question was not answered well. Building a circuit for such an irregular sequence is time-consuming, and the intent of the question is unclear. The question gives the delay of each gate but not the clock period, so STA cannot be performed.

Or if anyone finds a pattern, please add it in the comment area and I will revise it.
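One pattern that does reproduce the sequence, offered here as an observation and not as the original answer: the six values are the states of a 3-bit Johnson (twisted-ring) counter with the bit order permuted, that is, bit1 takes the inverse of bit2, bit0 takes bit1, and bit2 takes bit0. The Python model below checks this. Whether it can be built from the devices listed in the question depends on the figure, which is not reproduced.

```python
def next_state(state):
    """One clock step of the permuted Johnson counter on a 3-bit state."""
    b2, b1, b0 = (state >> 2) & 1, (state >> 1) & 1, state & 1
    n1 = 1 - b2   # bit1 <= ~bit2 (the inverted feedback)
    n0 = b1       # bit0 <= bit1
    n2 = b0       # bit2 <= bit0
    return (n2 << 2) | (n1 << 1) | n0


def run(start, steps):
    """Return the list of states visited, starting from `start`."""
    sequence = [start]
    for _ in range(steps):
        sequence.append(next_state(sequence[-1]))
    return sequence


print(run(2, 6))
```

In this model the two unused values, 1 and 6, only map to each other, so the flip-flops would need a reset into one of the six valid states.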


71. Please answer the following questions:

(1) Please briefly explain the concepts of clock skew and jitter

Clock skew is the difference between the times at which the same clock edge arrives at the source register and at the destination register through the clock network.

Clock jitter is the variation of a clock edge from its ideal position in time. It is caused by instability in the clock source, such as the crystal oscillator, and cannot be eliminated completely.

(2) Please describe the difference between design for test (DFT: design for test) and verification.

Design for testability refers to strategies and tools that consider testing during the design phase to ensure that the chip can be effectively tested and diagnosed. The goal of design for testability is to design a chip that is easy to test and diagnose so that faults can be quickly and accurately detected and repaired during production and use. Design for testability includes designing wiring, designing scan chains, selecting test points, etc.

Verification verifies that the design meets specifications and requirements. The purpose of verification is to ensure that the design meets its intended functionality and performance and works properly under a variety of circumstances. The verification process is typically performed using simulation, verification tools, and hardware testing.

In short, DFT is designed to make chips easy to test and diagnose, while verification is designed to verify that the design complies with specifications and requirements.


72. The circuit in the figure below is in the same clock domain, and its function is to transmit the result of DATA0 plus DATA1 to the REG input terminal when SEL0 and SEL1 are 0 and 1 respectively.

Now you need to reduce the power consumption of this circuit. Please use common logic units, modify them without changing the function of the above circuit, and draw the modified circuit diagram.

A 2-to-1 MUX takes 4 NAND gates to implement:

An AND gate takes two NAND gates, and a NOT gate takes one NAND gate.

The optimized circuit achieves the same function. Compared with two MUXes (eight NAND gates), only five NAND gates are used here. The author has not analyzed this in depth, and there may be room for further optimization. Discussion in the comment area is welcome.


73. Application of Gray code in asynchronous circuits:

The conversion formula between 4-bit wide binary code and Gray code is as follows:

reg [3:0] g; // Gray code

reg [3:0] b; // Binary code

g[0] = b[0]^b[1];    b[3] = g[3];

g[1] = b[1]^b[2];    b[2] = g[3]^g[2];

g[2] = b[2]^b[3];    b[1] = g[3]^g[2]^g[1];

g[3] = b[3];         b[0] = g[3]^g[2]^g[1]^g[0];

There are two asynchronous clocks, clk0 and clk1. Generate a 4-bit counter in the clk0 domain and transfer the count to the clk1 domain (Fclk1 > 2*Fclk0). Please implement this design in Verilog:

If the counter step is 2, that is, it cycles in the order 0->2->4->6->0, what problems will there be in the above design?

To be completed later.
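As a starting point for the second part of the question, the conversion formulas can be modeled in Python to see what a step of 2 does to the Gray-coded value. This is a behavioral sketch, not the requested Verilog; the function names are made up here.

```python
def bin2gray(b):
    """g[i] = b[i] ^ b[i+1], with g[3] = b[3]."""
    return b ^ (b >> 1)


def gray2bin(g):
    """b[i] = g[3] ^ ... ^ g[i]."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def bits_changed(counts):
    """Gray-code bits that toggle between consecutive counts, including the wrap-around."""
    pairs = zip(counts, counts[1:] + counts[:1])
    return [bin(bin2gray(x) ^ bin2gray(y)).count("1") for x, y in pairs]


print(bits_changed(list(range(16))))   # step 1
print(bits_changed([0, 2, 4, 6]))      # step 2, the sequence from the question
```

With a step of 1, exactly one Gray bit changes per count, so the clk1 domain samples either the old or the new value. With a step of 2, two Gray bits change on every count, so the single-bit-change property is lost and clk1 may capture a value that the counter never held.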


74. A certain circuit has the following Waveform:

Among them, clk, din[2:0] are inputs; out1, out2 are outputs. The value of din[2:0] is random.

(1) Please use 1 DFF, several AND, OR, and NOT gates to implement the above functions, and draw the circuit diagram

Answer:

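(1) din is the input. out1 is pulled high when din = 2, 3, or 7, and out2 is an edge-detection circuit on out1.

din = 2, 3, 7 in binary is 010, 011, 111.

Therefore out1=din[0]'din[1]din[2]'+din[0]din[1]din[2]'+din[0]din[1]din[2], which simplifies to din[1](din[2]'+din[0]).

Register out1 for one clock cycle (one beat) and XOR the registered value with out1 to get out2. Note that the XOR pulses on both rising and falling edges of out1; if the waveform shows pulses on rising edges only, use out1 AND NOT (registered out1) instead.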
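The out1 expression and the one-beat XOR can be checked with a small Python model before writing the RTL. This is a sketch: the function names are made up, and the flip-flop is assumed to start at 0, which the question does not state.

```python
def out1_sop(din):
    """Sum of the three minterms for din = 2, 3, 7."""
    d0, d1, d2 = din & 1, (din >> 1) & 1, (din >> 2) & 1
    return ((1 - d0) & d1 & (1 - d2)) | (d0 & d1 & (1 - d2)) | (d0 & d1 & d2)


def out1_min(din):
    """Simplified form din[1] & (~din[2] | din[0])."""
    d0, d1, d2 = din & 1, (din >> 1) & 1, (din >> 2) & 1
    return d1 & ((1 - d2) | d0)


def simulate(din_per_cycle):
    """Return (out1, out2) per clock cycle; out2 = out1 XOR out1 delayed by one cycle."""
    out1_beat = 0  # assumed reset value of the DFF
    trace = []
    for din in din_per_cycle:
        out1 = out1_sop(din)
        trace.append((out1, out1 ^ out1_beat))
        out1_beat = out1  # captured at the clock edge
    return trace


print(simulate([0, 2, 3, 1, 7]))
```

In the trace, out2 goes high both when out1 rises and when it falls, which is the behavior to compare against the waveform.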

(2) Use Verilog language to describe the above circuit.


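module out_1_2(
  input       clk   ,
  input [2:0] din   ,
  output      out1  ,
  output      out2
);
// out1 delayed by one clock cycle
reg out1_beat;
always @(posedge clk) begin
  out1_beat <= out1;
end
// out1 is high when din is 2 (010), 3 (011) or 7 (111); the minterms are ORed with |, not +
assign out1 = (~din[0] & din[1] & ~din[2]) | (din[0] & din[1] & ~din[2]) | (din[0] & din[1] & din[2]);
// XOR of the current and delayed out1: pulses for one cycle whenever out1 changes
assign out2 = out1_beat ^ out1;

endmodule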

75. Static timing analysis

For the following circuit diagram:

Explanation of Timing parameters in the figure:

Thold: hold time (minimum)
Tsu: setup time (minimum)
Tcq: clock-to-Q delay
The input delay of input IN is always 0.5 ns.

1. Find the maximum frequency that this circuit can achieve.

The maximum frequency is determined by the setup paths.

Setup from FF1 to FF2:

clk_latency + clk_pathF1_delay + ck_to_q + logic_delay < clk_period + clk_latency + clk_pathF2_delay - dff_set_up - clk_uncertainty

1+2+2+2+2 < clk_period+1+1-3, that is, 9 < clk_period - 1, so the minimum clk_period is 10 ns and the maximum frequency is 100 MHz.

Setup from FF2 to FF1:

Data arrives at the D pin of FF1 over two paths: Path1 runs directly from IN to the D pin, and Path2 runs from the Q pin of FF2 to the D pin.

Path1: 0.5 + 2 = 2.5 ns (input delay + Tandgate), a combinational path

Path2: 1 + 1 + 2 + 2 = 6 ns (two buffers + Tcq + Tandgate), a register-to-register path

The longest delay is Path2.

clk_latency + clk_pathF2_delay + ck_to_q + logic_delay < clk_period + clk_latency + clk_pathF1_delay - dff_set_up - clk_uncertainty

6 < clk_period+1-3, that is, 6 < clk_period - 2, so the minimum clk_period is 8 ns and the maximum frequency is 125 MHz.

Both paths must meet timing, so the lower frequency applies: the maximum frequency of the circuit is 100 MHz.
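The same arithmetic in Python, as a quick sanity check. The function name `min_clock_period` is made up here, and the numbers are the ones used in the two inequalities above (the figure itself is not reproduced).

```python
def min_clock_period(data_arrival, capture_clk_delay, setup):
    """Smallest period satisfying data_arrival <= period + capture_clk_delay - setup."""
    return data_arrival - capture_clk_delay + setup


min_periods_ns = {
    "FF1->FF2": min_clock_period(1 + 2 + 2 + 2 + 2, 1 + 1, 3),
    "FF2->FF1": min_clock_period(1 + 1 + 2 + 2, 1, 3),
}

# The slowest path sets the clock period for the whole circuit.
worst_period_ns = max(min_periods_ns.values())
fmax_mhz = 1000.0 / worst_period_ns

print(min_periods_ns, fmax_mhz)
```

Taking the maximum of the per-path periods is the same as taking the minimum of the per-path frequencies.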

2. The clock frequency is 50 MHz. Does this circuit have timing violations? If so, write down the calculation process and give modification suggestions.

The clock frequency is 50 MHz (a 20 ns period). From the analysis in question 1 there is no setup violation at this frequency, so only hold needs to be checked.

Hold from FF1 to FF2:

1+2+2+2+2 > 1+1+2, that is, 9 > 4, so hold is met.

Hold at FF1 (the shortest arriving path is Path1 from IN):

0.5+2 > 1+2, that is, 2.5 > 3, which does not hold: there is a hold violation of 0.5 ns.

Suggested fixes: ① Increase the input delay of IN to at least 1 ns. ② Replace the AND gate in front of FF1 with one that has a larger delay.
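The hold checks can be written the same way. A sketch in Python with a made-up function name, using the numbers from the inequalities above; the last line tries the first suggested fix with an input delay of 1 ns.

```python
def hold_slack(data_arrival, capture_clk_delay, hold):
    """Hold slack = earliest data arrival - (capture clock arrival + hold time)."""
    return data_arrival - (capture_clk_delay + hold)


ff1_to_ff2 = hold_slack(1 + 2 + 2 + 2 + 2, 1 + 1, 2)   # register-to-register path
in_to_ff1 = hold_slack(0.5 + 2, 1, 2)                   # shortest path into FF1, from IN
in_to_ff1_fixed = hold_slack(1.0 + 2, 1, 2)             # input delay raised to 1 ns

print(ff1_to_ff2, in_to_ff1, in_to_ff1_fixed)
```

A negative value is a violation, and a value of 0 just meets the requirement, which is why 1 ns is the smallest input delay that works.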


76. Briefly describe the process of chip design and manufacturing

  1. Demand analysis and planning: Based on product demand and market research, determine the chip's function, performance, power consumption, size, cost and other requirements, and formulate project plans and schedules.

  2. Front-end design: including circuit design, functional verification, logic synthesis, timing analysis, etc. Designers use EDA (Electronic Design Automation) tools to design each functional module of the chip, simulate and verify it, and generate schematic diagrams and circuit netlists. Logic synthesis and timing analysis are then performed to generate a synthesized netlist.

  3. Physical design: including floorplanning, placement, routing and physical verification. Designers place the synthesized netlist, determine the location and size of each module, and then route the connections between the modules in the chip. Finally, physical verification is performed to confirm the correctness and manufacturability of the circuit.

  4. Back-end design: including design verification, testing, repair and GDSII generation, etc. EDA tools are used to verify and test the placed and routed design, identify errors and defects, and repair them. A GDSII file of the chip is then generated, which is a data format that describes the physical structure of the chip.

  5. Manufacturing: including mask making, wafer manufacturing, packaging and testing, etc. A photomask is created from the GDSII file, and then photolithography is used to fabricate the chip on the wafer. After manufacturing is completed, the chip is packaged and tested to ensure that its performance and quality meet the requirements.


77. How to use UVM verification method to build a verification platform, just briefly describe the idea (including the functions of each part)

UVM (Universal Verification Methodology) is a standard method for building verification environments, which can help verification engineers improve verification efficiency and maintainability. The following is the general idea for building a UVM verification platform:

  1. Write DUT model: First, you need to prepare the DUT model to be verified. You can use hardware description languages such as Verilog and VHDL to write the model, or you can use other high-level languages such as SystemVerilog and C++ to build the model.

  2. Write test cases: According to the requirements of the DUT model, write test cases to verify the correctness and completeness of the model. Test cases generally include steps such as generating input data, feeding it into the DUT model, collecting output data, and checking whether the output data meets expectations.

  3. Write UVM testbench: Use the UVM framework to build the testbench, which controls the test cases, generates stimulus, collects simulation results, etc. A testbench usually includes the following components:

  • Agent: handles the sending and receiving of input/output signals and data. It typically contains a sequencer, a driver, and a monitor. The scoreboard, which compares the collected outputs with the expected results, normally sits in the environment and not inside the agent.

  • Sequence: generates the stimulus transactions for the test cases.

  • Environment: organizes the hierarchy of the components in the testbench and provides the connections and communication between them.

  • Configuration: specifies the parameters and properties of the testbench.

  4. Run simulation: Run the simulation, pass the test cases to the DUT model, collect the simulation results, and compare them with the expected results. If there is a mismatch, it needs to be debugged and fixed.

Generally speaking, the construction of UVM verification platform includes the following steps: prepare DUT model, write test cases, build UVM testbench, run simulation, collect simulation results and debug. It should be noted that the specific implementation of the UVM verification platform may vary from project to project.


78. What are the main sources of power consumption in CMOS? (5 points)

  1. Static power consumption: the power consumed when the circuit is not switching, also known as leakage power. It is caused by the leakage current of the transistors and is drawn whenever the supply is on, regardless of activity.

  2. Dynamic power consumption: the power consumed when signals toggle, also called switching power. It is caused by charging and discharging the load capacitance each time a transistor switches.

  3. Short-circuit power consumption: during an input transition there is a short interval in which the PMOS and NMOS transistors of a gate conduct at the same time, so a current flows directly from the supply to ground.

In CMOS circuits, each transistor needs a finite time to switch between on and off. While the input is between the two threshold voltages, both the pull-up and the pull-down network are partly on. The resulting current does not charge the load, so its power is wasted. Slower input edges lengthen this interval and increase the loss. Short-circuit power is often counted together with switching power as part of dynamic power.

  4. Interconnect power consumption: the wires between circuits in the chip also cause power consumption. Wire capacitance adds to the load that must be charged and discharged, so this is a component of switching power and not a separate mechanism.

  5. Switching speed: dynamic power is proportional to how often nodes toggle, so power rises with clock frequency and switching activity.
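The dependence of switching power on capacitance, supply voltage and frequency is usually summarized as P = alpha * C * Vdd^2 * f. A small Python sketch of that textbook formula follows; the numbers are hypothetical and chosen only to show the scaling.

```python
def switching_power(alpha, c_load, vdd, freq):
    """Textbook switching power in watts: activity factor * load capacitance * Vdd^2 * frequency."""
    return alpha * c_load * vdd ** 2 * freq


# Hypothetical values: 10% activity, 1 nF total switched capacitance, 1.0 V, 1 GHz.
p_nominal = switching_power(alpha=0.1, c_load=1e-9, vdd=1.0, freq=1e9)

# Halving the supply voltage cuts switching power to a quarter.
p_half_vdd = switching_power(alpha=0.1, c_load=1e-9, vdd=0.5, freq=1e9)

print(p_nominal, p_half_vdd / p_nominal)
```

The quadratic term is why lowering the supply voltage is the most effective lever on dynamic power, while clock gating works on the activity factor.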


79. Please convert the serial processing described in the C code below into parallel processing completed in a single step, and describe it in synthesizable Verilog.


#include <stdio.h>

/* Bit-serial update, MSB first, with polynomial 0x31 */
unsigned char cal_table_high_first(unsigned char value)
{
    unsigned char i;
    unsigned char checksum = value;
    for (i = 8; i > 0; --i)
    {
        if (checksum & 0x80)    /* MSB is 1: shift left, then XOR with 0x31 */
            checksum = (checksum << 1) ^ 0x31;
        else                    /* MSB is 0: shift left only */
            checksum = (checksum << 1);
    }
    return checksum;
}

int main(void)
{
    /* My first C program */
    printf("%x", cal_table_high_first(60));
    return 0;
}
// Output: b8

Answer: 0x80 is 8'b1000_0000 in binary. The & in the if condition is a bitwise AND, so the result is nonzero only when the corresponding bit is 1. The if condition is therefore true whenever bit 7 of checksum (the MSB, counting from 0) is 1. 0x31 is 8'b0011_0001 in binary.
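The original answer stops here. One way to continue, sketched in Python and not taken from the original post: the loop only uses shifts and XORs, so every output bit is an XOR of some input bits. Running the serial routine on each one-hot input reveals which input bits feed which output bit, and those XOR lists map directly onto Verilog assign statements. The names `din` and `dout` are hypothetical.

```python
def crc8_serial(value, poly=0x31):
    """Python port of cal_table_high_first: 8 MSB-first shift/XOR steps."""
    checksum = value & 0xFF
    for _ in range(8):
        if checksum & 0x80:
            checksum = ((checksum << 1) ^ poly) & 0xFF
        else:
            checksum = (checksum << 1) & 0xFF
    return checksum


def parallel_taps():
    """taps[i] lists the input bits whose XOR gives output bit i."""
    one_hot_results = [crc8_serial(1 << j) for j in range(8)]
    return {i: [j for j in range(8) if (one_hot_results[j] >> i) & 1] for i in range(8)}


def crc8_parallel(value, taps):
    """Evaluate all output bits at once from the XOR lists."""
    out = 0
    for i, inputs in taps.items():
        bit = 0
        for j in inputs:
            bit ^= (value >> j) & 1
        out |= bit << i
    return out


def to_verilog(taps):
    """Render the XOR lists as Verilog continuous assignments."""
    lines = []
    for i, inputs in taps.items():
        rhs = " ^ ".join("din[%d]" % j for j in inputs) or "1'b0"
        lines.append("assign dout[%d] = %s;" % (i, rhs))
    return lines


taps = parallel_taps()
print("\n".join(to_verilog(taps)))
print(hex(crc8_parallel(60, taps)))
```

The printed assign statements are purely combinational, so they compute the result in a single step, and the final line can be compared with the b8 printed by the C program.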


Origin blog.csdn.net/qq_57502075/article/details/133261993