Low-Power Design: Clock Gating

Background

The clock tree can account for 40% or more of a chip's total power consumption. This is intuitive: clock nets have the highest switching activity in the system, the tree contains many clock buffers, and those buffers usually have high drive strength to minimize clock delay. The most direct way to reduce clock-network power is therefore to turn the clock off wherever it is not needed. This technique is the familiar clock gating. How would we design a circuit that gates a clock? The simplest approach is to AND the enable signal with CLK, so that the clock is blocked whenever the enable is low. The circuit diagram is as follows:

ANDing the enable signal EN directly with the clock CLK does turn the clock off when EN is 0, but it introduces a serious problem: glitches.

As shown in the figure above, EN is not synchronized to the clock and may change at any time, so the purely combinational output GCLK may glitch, and glitches on a clock signal are very dangerous.
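In RTL terms, the naive gate described above is a single AND. The following is a minimal sketch for illustration only; the module and port names (`clk_gate_and`, `en`, `gclk`) are assumptions, not taken from the original figure.

```verilog
// Naive clock gate: a bare AND of the clock and the enable.
// Names are illustrative and do not come from the original diagram.
module clk_gate_and (
    input  wire clk,   // free-running clock
    input  wire en,    // enable, may change at any time
    output wire gclk   // gated clock
);
    // Purely combinational: if en changes while clk is high, the change
    // appears on gclk immediately, cutting a clock pulse short or
    // creating a narrow extra pulse.
    assign gclk = clk & en;
endmodule
```

If `en` falls or rises in the middle of the high phase of `clk`, `gclk` gets a pulse narrower than the normal clock high time, which is exactly the glitch the text warns about.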

The natural fix is to register EN with CLK using a flip-flop, so that the enable seen by the gate changes only at a clock edge. A second option is to use a latch: EN captured by a latch also changes only in step with CLK.

Latch-based gating

Let's start with the second method, clock gating with a latch. The circuit is as follows:

The waveform is as follows:

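GCLK can be high only while CLK is high, and the latch removes the glitches caused by EN. A D latch is level-sensitive: while its gate input is active, D passes through to Q; otherwise Q holds its previous value. In a clock-gating circuit the latch is normally made transparent while CLK is low (for example by clocking it with the inverted clock), so EN can change Q only during the low phase, and Q is held steady for the whole high phase, when the AND gate is passing the clock.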
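A behavioral sketch of the latch-based gate may help here. It assumes the usual arrangement in which the latch is transparent while the clock is low; the module, signal, and testbench names (`clk_gate_latch`, `en_latched`, `tb_clk_gate_latch`) and the timing values are illustrative assumptions, not taken from the original figure.

```verilog
`timescale 1ns/1ps

// Latch-based clock gate (behavioral sketch, names are illustrative).
module clk_gate_latch (
    input  wire clk,   // free-running clock
    input  wire en,    // enable, may change at any time
    output wire gclk   // gated clock
);
    reg en_latched;

    // Level-sensitive latch: follows en while clk is low,
    // holds its value while clk is high.
    always @(clk or en) begin
        if (!clk)
            en_latched <= en;
    end

    // en_latched is stable during the high phase, so gclk only ever
    // carries whole clock pulses.
    assign gclk = clk & en_latched;
endmodule

// Small testbench that changes en in the middle of the clock high phase.
module tb_clk_gate_latch;
    reg  clk = 1'b0;
    reg  en;
    wire gclk;

    clk_gate_latch dut (.clk(clk), .en(en), .gclk(gclk));

    always #5 clk = ~clk;   // 10 ns period, high during 5..10, 15..20, ...

    initial begin
        $monitor("%0t clk=%b en=%b gclk=%b", $time, clk, en, gclk);
        #1  en = 1'b0;  // t=1:  clk is low, latch captures 0
        #6  en = 1'b1;  // t=7:  clk is high, latch holds until clk falls
        #20 en = 1'b0;  // t=27: clk is high, the current pulse completes
        #20 $finish;
    end
endmodule
```

In this sketch each change of `en` is deliberately placed in the high phase of `clk`. The latch defers it to the next low phase, so `gclk` is never cut off mid-pulse.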

Although this removes the glitches caused by EN, the circuit still has two weaknesses:

1. If the latch and the AND gate are placed far apart, the clock may reach them at noticeably different times, and with enough skew glitches can still appear.
2. If EN changes too close to the latch's clock edge, the latch's setup time may be violated, and the latch output can become metastable.

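In the upper-right diagram above, the clock at point B arrives later than the clock at point A, and skew > latch delay, so a glitch is produced. To eliminate it, the clock skew must be controlled so that skew < latch delay (that is, the latch's clk-to-Q delay). In the lower-right diagram above, the clock at point B arrives earlier than the clock at point A, and |skew| > ENsetup - (D->Q), which also produces a glitch. To eliminate it, the clock skew must be controlled so that |skew| < ENsetup - (D->Q).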

Register-based gating

There is another solution for clock gating: register the EN signal with a flip-flop and then AND it with CLK to obtain GCLK. The circuit diagram is as follows:

The timing diagram is as follows:

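Because the DFF output is delayed by one cycle, a glitch can appear only if the rising edge of CLKB leads CLKA by a large margin, close to half a cycle, and this rarely happens in practice. If CLKB instead arrives later than CLKA, no glitch is produced. Of course, if the setup time of the first D flip-flop is not met, metastability is still possible.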

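Question: latch-based clock gating is the most widely used structure in SoC design. Why? The reason is that a real SoC uses a large number of clock-gating cells, so the clock gate is usually built as a standard cell supplied by the process vendor. The wire-delay problem of the latch structure then disappears, because inside a single cell the wire delay is controlled and fixed. The latch's setup time can also always be met by choosing a suitable latch and adding delay. Once the vendor provides the clock gate as a prebuilt standard cell, both problems are solved.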

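The register structure could achieve the same effect, so why not use it? The answer is area. A DFF is built from two D latches, so building the clock-gating cell from a single D latch saves the area of one latch. When a large number of clock gates are inserted into an SoC, the saving is considerable.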

Code (register-based gating)

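module clk_gating(
    input             clk      ,  // free-running clock
    input             rst_n    ,  // asynchronous reset, active low
    input             out_en   ,  // clock enable request
    input      [63:0] data     ,

    output reg [63:0] out         // same width as data
);

reg  en1;     // registered enable
wire clk_en;  // gated clock

// Register the enable on the falling edge so that en1 changes only while
// clk is low. If en1 changed while clk was high, clk_en would carry a
// short pulse.
always @(negedge clk or negedge rst_n) begin
    if (!rst_n) begin
        en1 <= 1'b0;
    end
    else begin
        en1 <= out_en;
    end
end

// Gate the clock with the registered enable.
assign clk_en = clk & en1;

// Data register clocked by the gated clock.
always @(posedge clk_en or negedge rst_n) begin
    if (!rst_n)
        out <= 64'b0;
    else
        out <= data;
end
endmodule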

The synthesized circuit is shown below and matches what we expected.


Origin blog.csdn.net/qq_57502075/article/details/129278380