Arbiter design (2) RR round-robin scheduling

As mentioned in the previous article (Arbiter design (1), Fixed priority arbiter), one problem with fixed-priority arbitration is fairness. Recall the classroom example from that article: the students raise their hands and the teacher calls on one of them. If the teacher always calls on the student with the smallest student number, the students with large numbers will feel this is unfair, because they are rarely called. That may be acceptable if the students are simply answering questions. But if each answer earns one point and the final score depends on the number of questions answered, this method is clearly unfair to the students with large numbers. Fairness is therefore something we must consider when designing an arbiter.

        Round Robin is an arbitration algorithm that takes fairness into account. The basic idea is that once a requestor obtains a grant, its priority becomes the lowest in the next arbitration. In other words, the priority of each requestor is not fixed: a requestor that holds the highest priority drops to the lowest right after it is granted, and the priorities of the other requestors shift up accordingly. When several requestors are active, the grant is therefore handed to each of them in turn. Even if a requestor that previously had high priority raises a new request, it must wait until the other pending requestors have been granted before its turn comes again.

        Let's take 4 requestors as an example. In the table below, the Req[3:0] column is the actual request, where 1 means the request is asserted. The RR Priority column is the current priority of req[3] down to req[0], where 0 is the highest priority and 3 is the lowest. The RR Grant[3:0] column is the grant produced from the current round-robin priority and the request. The Fixed Grant column is the grant a fixed-priority arbiter would give, with the priority fixed at 3210, that is, with req[0] always highest.

          Req[3:0]   RR Priority   RR Grant[3:0]   Fixed Grant
Cycle 0   0101       3210          0001            0001
Cycle 1   0101       2103          0100            0001
Cycle 2   0011       0321          0001            0001
Cycle 3   0010       2103          0010            0010
Cycle 4   1000       1032          1000            1000

        In the first cycle (Cycle 0), the initial state, we assume that req[0] has the highest priority, req[1] the second highest, and req[3] the lowest. When req[2] and req[0] are both 1, req[0] wins because it has higher priority than req[2], so grant = 0001.

        In the second cycle (Cycle 1), req[2] is still 1 because it was not granted in the previous cycle, and req[0] has raised a new request. Here we can see the difference between round robin and fixed priority. A fixed-priority arbiter still grants req[0], giving 0001. Under round robin, however, req[0] was granted in the previous cycle, so its priority drops to the lowest, 3. req[1], which was the second highest, naturally moves up to the highest priority. The grant therefore cannot go to req[0]; it goes to req[2], giving 0100.

        Similarly, in the third cycle (Cycle 2), req[2] has the lowest priority because it was granted in the previous cycle, and req[3] has the highest. You can work through the remaining cycles yourself.

        In other words, because the granted requestor drops to the lowest priority in the next cycle, the other requestors are granted in turn, and no requestor is granted continuously while others have pending requests. This is why round robin is also described as scheduling by polling each requestor in turn.
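        The walkthrough above can be replayed with a small behavioral model. The Python sketch below is not RTL and is not part of the original design; the function name simulate_round_robin and the variable start are illustrative assumptions. It rotates the priority after every grant and reproduces the RR Priority and RR Grant[3:0] columns of the table.

```python
def simulate_round_robin(req_strings):
    """Replay request strings such as "0101" (written MSB first, like Req[3:0]).

    Returns one (rr_priority, rr_grant) pair of strings per cycle.
    """
    n = len(req_strings[0])
    start = 0  # requestor that currently holds the highest priority
    rows = []
    for s in req_strings:
        req = [int(c) for c in reversed(s)]  # req[i] belongs to requestor i
        # rank[i] is 0 for the highest priority and n-1 for the lowest.
        rank = [(i - start) % n for i in range(n)]
        priority = "".join(str(rank[i]) for i in reversed(range(n)))
        # Scan upward from the highest-priority requestor, wrapping around.
        scan = [(start + k) % n for k in range(n)]
        winner = next((i for i in scan if req[i]), None)
        grant = ["0"] * n
        if winner is not None:
            grant[n - 1 - winner] = "1"
            start = (winner + 1) % n  # the winner becomes lowest next cycle
        rows.append((priority, "".join(grant)))
    return rows


if __name__ == "__main__":
    table_reqs = ["0101", "0101", "0011", "0010", "1000"]
    for cycle, (priority, grant) in enumerate(simulate_round_robin(table_reqs)):
        print(f"Cycle {cycle}: priority={priority} grant={grant}")
```

        Feeding it the five Req[3:0] values from the table gives the same priority strings and grants, cycle by cycle.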

        Now let's look at the RTL implementation of round robin. This time Lao Li will skip the special cases and go straight to several parameterized implementations.

        Start with the first idea: the priority itself changes. In the fixed-priority designs discussed earlier, we always assumed that the priority decreases from LSB to MSB. Can we first design a fixed-priority arbiter whose priority is an input? See the RTL below.

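module arbiter_base #(parameter NUM_REQ = 4)
   (
     input  [NUM_REQ-1:0]    req,
     input  [NUM_REQ-1:0]    base,   // one-hot: the set bit has the highest priority
     output [NUM_REQ-1:0]    gnt
   );

    // Duplicate req so that the search starting at base can wrap past the MSB.
    wire [2*NUM_REQ-1:0] double_req = {req, req};
    // The borrow of the subtraction isolates the first set bit at or above base.
    wire [2*NUM_REQ-1:0] double_gnt = double_req & ~(double_req - base);
    // Fold the upper half back onto the lower half.
    assign gnt = double_gnt[NUM_REQ-1:0] | double_gnt[2*NUM_REQ-1:NUM_REQ];

endmodule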

        In this module, base is a one-hot signal. The bit that is 1 marks the position with the highest priority. Priority then decreases toward the higher bits (to the left), wraps around from the MSB to bit 0, and keeps decreasing until the bit just to the right of the set bit, which has the lowest priority. Taking 4 bits as an example, if base = 4'b0100, the priority order is bit[2] > bit[3] > bit[0] > bit[1].

        The idea of this design is very close to the one-line design at the end of Lao Li's previous article. The expression double_req & ~(double_req - base) uses the borrow of the subtraction to find the first 1 at or above the position of base. Because base may be larger than req, req alone may be too small to subtract from, so it is extended to {req, req} before subtracting. With base = 4'b0001 this reduces to the last algorithm in the previous article; in that case there is no risk of req being too small to subtract from, so no extension is needed.
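        To see the borrow trick at work without a simulator, here is a minimal Python sketch of the same expression, checked against a plain wrap-around scan. The function names arbiter_base and reference_grant are chosen for illustration, and the explicit masking only mimics the fixed width of the Verilog wires; base is assumed to be one-hot.

```python
def arbiter_base(req, base, num_req=4):
    """Python model of double_req & ~(double_req - base); base must be one-hot."""
    full = (1 << num_req) - 1
    double_req = (req << num_req) | req  # {req, req}
    # Mask to 2*num_req bits to mimic the fixed-width Verilog wire.
    double_mask = (1 << (2 * num_req)) - 1
    double_gnt = double_req & ~(double_req - base) & double_mask
    # OR the lower half with the upper half.
    return (double_gnt & full) | (double_gnt >> num_req)


def reference_grant(req, base, num_req=4):
    """Straightforward scan: start at the base position and wrap around."""
    start = base.bit_length() - 1
    for k in range(num_req):
        i = (start + k) % num_req
        if (req >> i) & 1:
            return 1 << i
    return 0


if __name__ == "__main__":
    # base = 4'b0100 means bit[2] > bit[3] > bit[0] > bit[1].
    print(format(arbiter_base(0b0101, 0b0100), "04b"))
    print(format(arbiter_base(0b0011, 0b0100), "04b"))
```

        With base = 4'b0100, a request of 0101 is granted at bit[2], while a request of 0011 wraps around and is granted at bit[0].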

        Now that we have a fixed-priority arbiter whose priority is set by an input (this sentence is a bit of a mouthful, so read it carefully), the rest is simple: after each grant, just adjust the priority. The beauty of this design is that base is a one-hot signal whose set bit has the highest priority and, as we said before, the grant is also one-hot. After a grant, the granted bit should have the lowest priority and the bit one position above it the highest. So all we need is a history register that records the last grant rotated left by one bit, and that value becomes the base of the next cycle. For example, if the grant in the last cycle was 4'b0010, bit[2] must become the highest priority, so base is simply the grant rotated left by one bit, 4'b0100. The RTL code is as follows.

module round_robin_arbiter #(parameter NUM_REQ = 4)
(
  input                      clk,
  input                      rstn,
  input [NUM_REQ-1:0]        req,
  output [NUM_REQ-1:0]       gnt
);

logic [NUM_REQ-1:0]          hist_q;

// hist_q holds the base (highest-priority position) for the current cycle.
always_ff @(posedge clk) begin
  if (!rstn)
    hist_q <= {{(NUM_REQ-1){1'b0}}, 1'b1};          // after reset, bit 0 has the highest priority
  else if (|req)
    hist_q <= {gnt[NUM_REQ-2:0], gnt[NUM_REQ-1]};   // rotate the grant left by one bit
end

arbiter_base #(
  .NUM_REQ(NUM_REQ)
) arbiter (
  .req      (req),
  .gnt      (gnt),
  .base     (hist_q)
);

endmodule

        Note that, unlike the Fixed Priority Arbiter, the round-robin arbiter is no longer a purely combinational circuit. It has a clock and a reset, because a register is needed to record the state of the previous grant.
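        The register behavior can be checked with a cycle-by-cycle software model. The Python sketch below is only an illustration under stated assumptions: the class name RoundRobinArbiterModel and its methods are invented here, hist plays the role of hist_q, and clock() stands for one rising clock edge with reset already released.

```python
class RoundRobinArbiterModel:
    """Behavioral model of the rotate-the-grant round-robin arbiter."""

    def __init__(self, num_req=4):
        self.num_req = num_req
        self.full = (1 << num_req) - 1
        self.hist = 1  # reset value of hist_q: bit 0 has the highest priority

    def grant(self, req):
        """Combinational part: the arbiter_base expression with base = hist."""
        n = self.num_req
        double_req = (req << n) | req
        double_mask = (1 << (2 * n)) - 1
        double_gnt = double_req & ~(double_req - self.hist) & double_mask
        return (double_gnt & self.full) | (double_gnt >> n)

    def clock(self, req):
        """One clock cycle: return this cycle's grant, then update hist."""
        gnt = self.grant(req)
        if req:
            # Rotate the one-hot grant left by one bit.
            rotated = (gnt << 1) | (gnt >> (self.num_req - 1))
            self.hist = rotated & self.full
        return gnt


if __name__ == "__main__":
    arb = RoundRobinArbiterModel()
    for req in (0b0101, 0b0101, 0b0011, 0b0010, 0b1000):
        print(format(req, "04b"), "->", format(arb.clock(req), "04b"))
```

        Driving the model with the requests from the table yields the RR Grant column, and holding all four requests high makes the grant walk through bit[0], bit[1], bit[2], bit[3] and back to bit[0].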

        The advantages of this round-robin arbiter are that the idea is simple and clear and the code is very short. Once you understand the Fixed Priority Arbiter, this design is easy to follow. Its shortcoming is that area and timing are not well optimized. Compared with the design introduced next, timing and area are worse when the number of requests is large (for example 64), so Lao Li has not actually seen this design used at work; the following design is used more often.

        The previous idea changes the priority while the request stays the same. The other idea keeps the priority fixed and works on the request instead: once a request has been granted, we mask it off before it enters the fixed-priority arbiter, so that only the requestors that have not yet been granted take part in the arbitration. Each time a requestor is granted, it is masked off. After the remaining requests have been served in turn, we release the mask and start over. This is the idea of implementing round robin with a mask.

        This idea again uses the Fixed Priority Arbiter from the previous lecture. How do we generate the mask signal? Look back at the following RTL.

  

module prior_arb #(
   parameter REQ_WIDTH = 16
   )(
   input  [REQ_WIDTH-1:0]    req,
   output [REQ_WIDTH-1:0]    gnt
   );

   // pre_req[i] is 1 when any request below bit i is asserted.
   logic [REQ_WIDTH-1:0]   pre_req;

   assign pre_req[0] = 1'b0;
   assign pre_req[REQ_WIDTH-1:1] = req[REQ_WIDTH-2:0] | pre_req[REQ_WIDTH-2:0];
   // Grant the lowest asserted request: bit 0 has the highest priority.
   assign gnt = req & ~pre_req;

endmodule

        What does pre_req mean here? If bit i is the first 1 in req, then every bit of pre_req from bit i+1 upward is 1, and bits 0 through i are all 0. This is exactly the mask we are looking for! AND the request with the pre_req of the previous cycle and we get a new, masked request. In it, the previously granted bit and all bits below it are masked off, while the higher bits, which have lower priority, are allowed through. If any of those bits requested earlier without being granted, it is now their turn. After each new grant the number of 0 bits in the mask grows, so more bits are masked off. Once all the lower-priority requests have been granted, req AND mask becomes all 0, which means one round of polling is finished and we start again.
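        A few lines of Python make the shape of pre_req concrete. This is a behavioral sketch rather than RTL; the function names simply mirror the Verilog signal and module names, and the width argument is an assumption standing in for REQ_WIDTH.

```python
def pre_req(req, width):
    """Return the mask: bit i is 1 when any bit of req below i is set."""
    mask = 0
    seen_lower = 0
    for i in range(width):
        if seen_lower:
            mask |= 1 << i
        seen_lower |= (req >> i) & 1
    return mask


def prior_arb(req, width):
    """Fixed-priority grant, gnt = req & ~pre_req (bit 0 has highest priority)."""
    return req & ~pre_req(req, width)


if __name__ == "__main__":
    req = 0b0110
    mask = pre_req(req, 4)      # bits above the granted bit[1]
    print(format(prior_arb(req, 4), "04b"), format(mask, "04b"))
    masked = req & mask         # request seen by the arbiter in the next cycle
    print(format(prior_arb(masked, 4), "04b"), format(pre_req(masked, 4), "04b"))
```

        For req = 0110 the first grant goes to bit[1] and the mask becomes 1100. ANDing the same request with that mask leaves 0100, so bit[2] is granted next and the mask shrinks to 1000; one more AND gives all 0, which ends the round.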

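        In hardware we need two fixed-priority arbiters in parallel. The input of one is masked_request, that is, request AND mask; the input of the other is the original request. We then select the grant from the outputs of the two arbiters, as shown below.

        [Figure: two parallel fixed-priority arbiters. The upper one takes the masked request and produces Mask Grant, the lower one takes the original request and produces Unmasked Grant, and a mux selects the final grant.]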

        When masked_request is not all 0, that is, when some request has not been masked off, we select the Mask Grant from the upper arbiter; otherwise we select the Unmasked Grant from the lower arbiter.

        Moreover, when masked_request is all 0, the Mask Grant is also all 0, so in that case the Mask Grant and the Unmasked Grant can simply be ORed together. The mux at the end of the figure can therefore be implemented with a simple AND gate and OR gate.
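        The claim that the mux collapses into an AND gate and an OR gate can be checked exhaustively for a small width. The Python sketch below is an assumption-laden illustration: fixed_grant uses the shortcut req & -req to pick the lowest set bit (not the ripple logic of the RTL), and the function names are made up for this comparison.

```python
def fixed_grant(req):
    """Lowest set bit wins (bit 0 has the highest priority); 0 stays 0."""
    return req & -req


def grant_mux(req, mask):
    """Select between the two arbiters with an explicit mux."""
    req_masked = req & mask
    return fixed_grant(req_masked) if req_masked else fixed_grant(req)


def grant_and_or(req, mask, n):
    """Same selection written with an AND gate and an OR gate."""
    req_masked = req & mask
    grant_masked = fixed_grant(req_masked)
    grant_unmasked = fixed_grant(req)
    # Replicate the 1-bit no_req_masked flag across all n bits.
    no_req_masked = (1 << n) - 1 if req_masked == 0 else 0
    return (no_req_masked & grant_unmasked) | grant_masked


if __name__ == "__main__":
    n = 4
    same = all(
        grant_mux(req, mask) == grant_and_or(req, mask, n)
        for req in range(1 << n)
        for mask in range(1 << n)
    )
    print("mux and AND-OR forms agree:", same)
```

        The two forms agree because grant_masked is all 0 exactly when the unmasked grant is selected, so the OR never merges two different grants.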

        The following is the code for this design. It is again parameterized and works for any number of requests.

module round_robin_arbiter #(
 parameter N = 16
)(

input         clk,
input         rst,
input [N-1:0] req,
output[N-1:0] grant

);


logic [N-1:0] req_masked;
logic [N-1:0] mask_higher_pri_reqs;
logic [N-1:0] grant_masked;
logic [N-1:0] unmask_higher_pri_reqs;
logic [N-1:0] grant_unmasked;
logic no_req_masked;
logic [N-1:0] pointer_reg;


// Simple priority arbitration for masked portion

assign req_masked = req & pointer_reg;
assign mask_higher_pri_reqs[N-1:1] = mask_higher_pri_reqs[N-2: 0] | req_masked[N-2:0];
assign mask_higher_pri_reqs[0] = 1'b0;
assign grant_masked[N-1:0] = req_masked[N-1:0] & ~mask_higher_pri_reqs[N-1:0];


// Simple priority arbitration for unmasked portion
assign unmask_higher_pri_reqs[N-1:1] = unmask_higher_pri_reqs[N-2:0] | req[N-2:0];
assign unmask_higher_pri_reqs[0] = 1'b0;
assign grant_unmasked[N-1:0] = req[N-1:0] & ~unmask_higher_pri_reqs[N-1:0];


// Use grant_masked if there is any there, otherwise use grant_unmasked. 
assign no_req_masked = ~(|req_masked);
assign grant = ({N{no_req_masked}} & grant_unmasked) | grant_masked;

// Pointer update
always @ (posedge clk) begin
  if (rst) begin
    pointer_reg <= {N{1'b1}};
  end else begin
    if (|req_masked) begin // Which arbiter was used?
      pointer_reg <= mask_higher_pri_reqs;
    end else begin
      if (|req) begin // Only update if there's a req 
        pointer_reg <= unmask_higher_pri_reqs;
      end else begin
        pointer_reg <= pointer_reg ;
      end
    end
  end
end

endmodule

        A few more words of explanation. When no_req_masked is asserted, pointer_reg is not reset to 1111 or 1110; it is updated according to the request at that moment. For example, if the request is 0010, the new mask becomes 1100, so that bit[0] and bit[1] are masked again.
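        The pointer update is easier to follow with a trace. The Python sketch below models the mask-based arbiter one clock at a time; it is a behavioral approximation under assumptions, with the class name MaskedRoundRobinModel invented here, pointer standing for pointer_reg, and reset taken as already released.

```python
class MaskedRoundRobinModel:
    """Behavioral model of the mask-based round-robin arbiter."""

    def __init__(self, n=4):
        self.n = n
        self.full = (1 << n) - 1
        self.pointer = self.full  # reset value of pointer_reg: nothing masked

    def _higher_pri(self, req):
        """Bits strictly above the lowest set bit of req (0 when req is 0)."""
        if req == 0:
            return 0
        lowest = req & -req
        return self.full & ~((lowest << 1) - 1)

    def clock(self, req):
        """One clock cycle: return the grant, then update the pointer."""
        req_masked = req & self.pointer
        mask_higher = self._higher_pri(req_masked)
        unmask_higher = self._higher_pri(req)
        grant_masked = req_masked & ~mask_higher
        grant_unmasked = req & ~unmask_higher
        grant = grant_masked if req_masked else grant_unmasked
        if req_masked:        # the masked arbiter was used
            self.pointer = mask_higher
        elif req:             # wrapped around: rebuild the mask from req
            self.pointer = unmask_higher
        return grant


if __name__ == "__main__":
    arb = MaskedRoundRobinModel()
    for req in (0b0101, 0b0101, 0b0011, 0b0010, 0b1000):
        gnt = arb.clock(req)
        print(format(req, "04b"), format(gnt, "04b"), format(arb.pointer, "04b"))
```

        With the requests from the earlier table, the grants match the RR Grant column, which suggests the two designs behave the same on that sequence. In the third cycle no masked request remains, so the pointer is rebuilt from the raw request instead of being reset to all 1s.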

        This design uses two N-bit arbiters computing in parallel. Compared with a single fixed-priority arbiter, the critical path only adds the mask step and the final mux level, so timing is very good. It also saves area compared with the previous method, because the 2N-bit adder used there for the subtraction is gone.

        There are other ways to implement round robin. One is to rotate the request to change the effective priority and then rotate the grant back according to the history. Another is to place N fixed-priority arbiters in parallel, one for each priority order, and then select one of the N grants. These designs each make their own trade-offs in area and timing, and they are not easy to write in parameterized form, so Lao Li will not go into details here. Interested readers can follow the original link to a paper with a detailed explanation. Lao Li's second design also comes from that paper, which includes area and timing comparisons of the different designs.

For more information, please see IC Gas Station:

Arbiter Design (2) -- Round Robin Arbiter

The above is shared from Lao Li's experience in Silicon Valley. In the next article, we will try other ideas for implementing RR scheduling.

Origin blog.csdn.net/m0_52840978/article/details/123783465