P4: Single-Cycle CPU (Verilog Implementation)

PS: I am not completely dead yet. I failed for the first time in P3, but I am still hanging on. The main reason this post was delayed is that the first version of my code was overly complicated. It worked, but to make changes during the class session easier I rewrote it from scratch (you cannot do without the courage to start all over again).

Here I will roughly describe the build process, focusing on how implementing P4 differs from P3, the pitfalls I ran into, the flaws in my first version, and what I improved when writing the code a second time. I hope this serves as a reminder to myself and also gives you some ideas for optimizing your own design.

If you want to learn the theory of the single-cycle CPU, please turn to the textbook and courseware. My theoretical knowledge is shallow, and the theory cannot be explained in a single article, so this post focuses on implementation.

Points to note when building:

The key to the basic functionality is filling in two tables; the implementation complexity depends on how you translate the P3 circuit. The two tables are the datapath table and the instruction-to-control-signal truth table. In P3 we filled in these tables at least twice, so I will not explain how to fill them in.

One thing differs slightly from P3. In P3 you could first fill in all the information for the R-type instructions, wire up the datapath, and then add the I-type and J-type instructions on top to get a basic framework. For P4, I recommend finishing the tables completely before writing any code, for the following reasons. Code does not describe a circuit as intuitively as wires drawn directly, so if you want to change an existing datapath you have to rely on your own care and on sensible naming to know which wires to change. If you write the code only after the tables are complete, the source of every input and output port and the set of control signals are already fully determined. This saves time otherwise spent on modifications and, in practice, reduces the number of wires defined in mips.v (the top level). (I do not recommend trying to implement all the instructions by gradual modification; if you have that much time, you are better off looking at the pipeline.)

The P3 schematic is a very good reference while writing the code, but, as I said before, the complexity of the code depends on how you translate it! Here is the P3 circuit again:

[Image: P3 circuit schematic]

At the time I thought the circuit I had drawn was not too complex and that the changes required in class would not be large, so I used the blockhead-style translation method.

Blockhead-style translation method: see the wires in this circuit? There are fifty or sixty of them. Just build all the modules, give every wire a name, and connect them up accordingly. That led to an accident scene like this:

[Image: part of the wire definitions in mips.v]

The picture shows part of the wire definitions in mips.v. The full set of definitions fills nearly two screens; you can imagine the rest.

I named each wire in the form "source element to destination element", but the first time around I was filling in the tables and connecting wires at the same time, so adding an instruction meant renaming many wires. After change upon change, the wire definitions grew to more than sixty lines, and when a bug appeared after I had finished, I despaired: how could this be changed? In fact it could still be changed. Adding breakpoints sensibly, viewing intermediate variables, and testing instructions one by one solves the problem, and in the end it miraculously passed. But I figured I would not dare to modify this "monster" in class. This design was undoubtedly a failure. First of all, my circuit diagram was bad, for example here:

[Image: the part of the P3 circuit that selects the next PC]

How many possibilities are there for NPC? With the currently supported instructions there are four: pc + 4, pc + 4 + (offset << 2), {(pc + 4)[31:28], instruction bits 25-0, 2'b00}, and the value of a register (for jr). I actually used three selectors to choose among these four (the reasoning was utilitarian: when actually wiring a circuit this is particularly easy to change, which is convenient in class). This adds a lot of wire definitions and also wastes resources. An obvious improvement is a single 4-to-1 multiplexer with one 2-bit control signal, which saves wires and also reduces the number of control signals.
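As a minimal sketch of that 4-to-1 selection: the module name npc_mux, its port names, and the encoding of the 2-bit NPCsel signal are assumptions made for illustration and are not taken from my actual design. It also assumes the controller has already folded the branch condition into NPCsel.

```verilog
// Hypothetical next-PC selector: one 4-to-1 choice driven by a 2-bit control.
module npc_mux(
    input  [31:0] pc,         // address of the current instruction
    input  [31:0] instr,      // current instruction word
    input  [31:0] ext_imm,    // sign-extended 16-bit offset
    input  [31:0] reg_value,  // value read from the register named by rs (jr)
    input  [1:0]  NPCsel,     // assumed encoding: 0 sequential, 1 branch, 2 j/jal, 3 jr
    output reg [31:0] npc
    );
    wire [31:0] pc4 = pc + 32'd4;

    always @(*) begin
        case (NPCsel)
            2'b00:   npc = pc4;
            2'b01:   npc = pc4 + (ext_imm << 2);
            2'b10:   npc = {pc4[31:28], instr[25:0], 2'b00};
            2'b11:   npc = reg_value;
            default: npc = pc4;   // only reached if NPCsel contains x or z
        endcase
    end
endmodule
```

With this shape the top level needs one output wire for the next PC and one control signal, instead of the intermediate wires between three chained 2-to-1 selectors.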

Secondly, as already mentioned, I filled in the tables while writing the code, so there were many changes. For code that is unintuitive and full of wires, changes are catastrophic. I sometimes added comments, but they did not make the real meaning of the wires clear. And when a multiplexer is instantiated several times with poorly chosen names, you cannot tell which instance is which.

After finding the causes of the failure, I used the following method for the second translation:

Translation method that merges the 2-to-1 multiplexers and wraps the multiplexers inside the main modules: first, a piece of example code:

`timescale 1ns / 1ps
module ALU(
    input  [31:0] A,
    input  [31:0] B_from_grf,
    input  [31:0] B_from_ext,
    input  [1:0]  ALUsrc,   // widened to two bits to leave room for expansion
    input  [3:0]  ALUOP,
    output reg [31:0] result,
    output reg        zero
    );
    // ALUOP: 0000 addition, 0001 subtraction, 0010 or, 0011 comparison
    wire [31:0] B;
    // For simplicity, every candidate for B is put into an array so that
    // the select signal can index it directly.
    wire [31:0] maybe_b [3:0];
    assign maybe_b[0] = B_from_grf;
    assign maybe_b[1] = B_from_ext;
    assign maybe_b[2] = 32'b0;   // unused, tied off so it never floats
    assign maybe_b[3] = 32'b0;   // unused, tied off so it never floats

    assign B = maybe_b[ALUsrc];

    always @(*) begin
        // Combinational block: use blocking assignments, and give both
        // outputs a default value so that no latch is inferred.
        result = 32'b0;
        zero   = 1'b0;
        case (ALUOP)
            4'b0000: result = A + B;
            4'b0001: result = A - B;
            4'b0010: result = A | B;
            4'b0011: zero   = (A == B);
            default: ;               // keep the defaults
        endcase
    end
endmodule
Part of the ALU code

Unlike a conventional ALU, my ALU takes each possible source of input B as a separate input port, and also takes ALUsrc as an input. Inside the ALU, an array holds all the values that might feed port B, ALUsrc selects one of them, and the selected value B is used as the operand. The multiplexer is effectively wrapped inside the ALU, which reduces the number of wires and assigns at the top level, and you no longer have to rack your brain to work out what each MUX defined at the top level is for.

In fact, if each MUX is named after what it selects and where (for example, naming this one ALUB_select) rather than after the widths and number of its inputs and outputs (for example, mux32_to_32 for 32-bit inputs and a 32-bit output), keeping the MUX at the top level is the better choice. It keeps the ALU's function pure: the ALU only computes and does not mix in selection, and when an instruction is added only the MUX changes, not the number of ALU input ports. Because I have been slightly unwell recently, I did not put effort into this "sensibly named MUX" translation method. In other words, the paragraph above describes the method I actually used last; being ill, I did not want to change it and just wanted to sleep.
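For comparison, here is a rough sketch of what the purpose-named MUX alternative could look like. The name ALUB_select comes from the paragraph above; the port names and the meaning of each ALUsrc value are assumptions for illustration only.

```verilog
// Hypothetical top-level selector for ALU operand B, named after its purpose.
module ALUB_select(
    input  [31:0] from_grf,  // second register operand
    input  [31:0] from_ext,  // extended immediate
    input  [1:0]  ALUsrc,    // assumed: 0 register, 1 immediate, others reserved
    output reg [31:0] B
    );
    always @(*) begin
        case (ALUsrc)
            2'b00:   B = from_grf;
            2'b01:   B = from_ext;
            default: B = 32'b0;   // reserved encodings produce a defined value
        endcase
    end
endmodule
```

In this variant the ALU would keep a single 32-bit B port, and adding a new operand source would touch only this module and the controller.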

Error-prone points:

1. Non-blocking assignments and $display output do not match

The grader judges whether we are right by checking whether what we $display matches what it expects. If a $display comes right after a non-blocking assignment, the value printed is the unmodified one, because non-blocking assignments are only applied together after the procedural block has been evaluated.
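A small simulation-only sketch of this effect. The module and signal names are made up for the demonstration; printing the value being written (the right-hand side) is one common way around the problem, and it is shown here only as an option.

```verilog
`timescale 1ns / 1ps
// Demo: $display placed after a non-blocking assignment still sees the old value.
module nba_display_demo;
    reg         clk  = 1'b0;
    reg  [31:0] data = 32'd0;         // stands in for a register or memory word
    wire [31:0] next_data = 32'd5;    // value about to be written

    always @(posedge clk) begin
        data <= next_data;
        $display("old value: %0d", data);       // prints 0, the update has not happened yet
        $display("new value: %0d", next_data);  // prints 5, the value being written
    end

    initial begin
        #5 clk = 1'b1;   // single rising edge
        #5 $finish;
    end
endmodule
```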

2. Bit width

0x00003000 is 32 bits wide, so in binary it is written 32'b00000000000000000011000000000000. It is easy to slip when writing it in hexadecimal: the correct form is 32'h00003000 (base letter h, not b). The 32 is the width in bits, not the number of digits in the chosen base!
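A short simulation-only sketch of the point about widths. The module and parameter names are hypothetical; the last line deliberately uses a wrong width to show what goes wrong, and most simulators will warn about it.

```verilog
// Sized literals: the number before the quote is the width in bits.
module literal_width_demo;
    localparam [31:0] ADDR_HEX = 32'h0000_3000;
    localparam [31:0] ADDR_BIN = 32'b0000_0000_0000_0000_0011_0000_0000_0000;

    initial begin
        // Both spellings denote the same 32-bit value.
        if (ADDR_HEX == ADDR_BIN)
            $display("same value: %h", ADDR_HEX);
        // A width of 8 means 8 bits, so only the low 8 bits of 3000 (hex) survive.
        $display("8-bit literal: %h", 8'h3000);
        $finish;
    end
endmodule
```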

3. jr jump

jr does not only jump to the address stored in register 31; it can jump to the address stored in any register! This is the accident my roommate ran into in the P4 class session: the weak tests did not catch it, and it only surfaced in class! Be sure to check that your jr is written correctly!
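A tiny sketch of the fix, assuming the usual MIPS encoding in which jr names its source register in the rs field (instruction bits 25:21). The module and port names are hypothetical.

```verilog
// Hypothetical decode fragment: jr reads whichever register rs names.
module jr_source(
    input  [31:0] instr,
    output [4:0]  grf_read_addr1
    );
    // Use the rs field rather than a hard-coded 5'd31, so that a jump
    // through any register works, not only a jump through $ra.
    assign grf_read_addr1 = instr[25:21];
endmodule
```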

4. $display again

You need to output the full 32-bit memory address, not the 10-bit [11:2] address. Please check your own output carefully! Compare it with the output given in the tutorial and you will spot the difference.
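A rough sketch of a data memory that indexes storage with addr[11:2] but prints the whole address. The module name, ports, memory depth, and especially the message format are assumptions; use whatever format your tutorial specifies.

```verilog
// Hypothetical data memory with 1024 words; the point is the print statement.
module dm_sketch(
    input         clk,
    input         we,
    input  [31:0] addr,   // full byte address coming from the ALU
    input  [31:0] din,
    output [31:0] dout
    );
    reg [31:0] mem [0:1023];

    assign dout = mem[addr[11:2]];   // the 10-bit word index is for storage only

    always @(posedge clk) begin
        if (we) begin
            mem[addr[11:2]] <= din;
            // Print the 32-bit address, not addr[11:2], and print din rather
            // than mem[...], which has not been updated yet at this point.
            $display("*%h <= %h", addr, din);
        end
    end
endmodule
```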

 

 

About debugging:

I recommend re-watching the Verilog part of the video tutorial and learning how to add breakpoints and how to add intermediate variables as signals. This helps a lot in tracing a bug to its source. For example, I used this method to track down a jump bug step by step. First I added a breakpoint in the ALU and watched the intermediate variables: the zero flag did not have the expected value, and I found that the A and B entering the ALU calculation were not what I wanted. Then I added a breakpoint in the GRF and discovered that I had connected one wire incorrectly, so the value was never passed on to the ALU. This is only a simple example; you can use the same method to resolve other similar bugs.

In addition, designing the test programs sensibly is also very important. First of all, do not test several instructions at once. Test one instruction at a time: for example, fully test ori first, then use ori to test addition and subtraction, then use addition and subtraction to test the jumps and the other instructions, and finally write a comprehensive test program that tests all of them again. Do not test everything together!! To tell you the truth, I even had addition wrong. I started with a comprehensive test program, noticed that my beq was not taken, and kept staring at beq. Only after a long time did I discover that the problem was in my addition, which caused the values not to satisfy the branch condition. I hope we can all learn from this.

 

About naming conventions:

See the discussion post "[Practical tips] (pretending to be) sharing personal experience on Verilog naming". I feel the naming conventions given by that student cover quite a lot. The post is a bit long, but as the saying goes, a good memory is no match for written notes. If I reposted it here there would certainly be copyright issues, so please go and read it there directly. For now I have saved a PDF copy for myself; once the final exams are over I will ask the author, and after getting consent I will repost it here (classics last forever, hee hee).

 

Origin www.cnblogs.com/BUAA-Wander/p/11873946.html