1. Hierarchy of digital circuit system design
Serial Adder:
A four-bit serial adder consists of four full adders. Each full adder is a sub-module of the serial adder and is itself composed of basic logic gates, the so-called leaf modules. In this design, leaf modules (basic logic gates) are used to build sub-modules (full adders), and the sub-modules are then used to build the required circuit (the serial adder).
The Bottom-Up design method has no fixed rules to follow. It relies mainly on the designer's practical experience and design skill, and arrives at a complete digital system by step-by-step trial. The performance of the system can only be analyzed and tested after the whole system has been constructed. This method is often used in schematic-based design. Compared with other methods, it takes less time to realize each sub-module circuit.
Use the Top-Down design method to design a typical CPU:
Vector dot product multiplier:
Use the modular hierarchical design method to design a 4-dimensional vector dot product multiplier, where $a = (a_1, a_2, a_3, a_4)$ and $b = (b_1, b_2, b_3, b_4)$. Dot product multiplication rule: $a \cdot b = a_1 b_1 + a_2 b_2 + a_3 b_3 + a_4 b_4$.
Verilog code:
module vector(a1,a2,a3,a4,b1,b2,b3,b4,out);
input [3:0] a1,a2,a3,a4,b1,b2,b3,b4;
output [9:0] out;
wire [7:0] out1,out2,out3,out4;
wire [8:0] out5, out6;
wire [9:0] out;
mul_addtree U1(.mul_a(a1), .mul_b(b1), .mul_out(out1));
mul_addtree U2(.mul_a(a2), .mul_b(b2), .mul_out(out2));
mul_addtree U3(.mul_a(a3), .mul_b(b3), .mul_out(out3));
mul_addtree U4(.mul_a(a4), .mul_b(b4), .mul_out(out4));
add #(8) U5(.a(out1), .b(out2), .out(out5));
add #(8) U6(.a(out3), .b(out4), .out(out6));
add #(9) U7(.a(out5), .b(out6), .out(out));
endmodule
//adder
module add(a,b,out);
parameter size=8;
input [size-1:0] a,b;
output [size:0] out;
assign out=a+b;
endmodule
//Multiplier
module mul_addtree(mul_a,mul_b,mul_out);
input [3:0] mul_a,mul_b;
output [7:0] mul_out;
wire [7:0] mul_out;
wire [7:0] stored0,stored1,stored2,stored3;
wire [7:0] add01,add23;
assign stored3=mul_b[3]?{1'b0,mul_a,3'b0}:8'b0;
assign stored2=mul_b[2]?{2'b0,mul_a,2'b0}:8'b0;
assign stored1=mul_b[1]?{3'b0,mul_a,1'b0}:8'b0;
assign stored0=mul_b[0]?{4'b0,mul_a}:8'b0;
assign add01=stored1+stored0;
assign add23=stored3+stored2;
assign mul_out=add01+add23;
endmodule
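The article gives no testbench for the `vector` module, so the following is a minimal self-checking sketch. The module name `vector_check_tb`, the task name `check` and the two stimulus vectors are assumptions chosen for illustration. It compares the output against the dot product computed directly in the testbench.

```verilog
// Hypothetical self-checking testbench; none of these names come from the article.
module vector_check_tb;
  reg  [3:0] a1, a2, a3, a4, b1, b2, b3, b4;
  wire [9:0] out;
  reg  [9:0] expected;

  // Positional connection follows the port order of module vector.
  vector dut(a1, a2, a3, a4, b1, b2, b3, b4, out);

  task check;
    begin
      #10;  // let the combinational result settle
      expected = a1*b1 + a2*b2 + a3*b3 + a4*b4;
      if (out !== expected)
        $display("MISMATCH: got %0d, expected %0d", out, expected);
      else
        $display("ok: %0d", out);
    end
  endtask

  initial begin
    a1 = 1;  a2 = 2;  a3 = 3;  a4 = 4;
    b1 = 5;  b2 = 6;  b3 = 7;  b4 = 8;   // 5 + 12 + 21 + 32 = 70
    check;
    a1 = 15; a2 = 15; a3 = 15; a4 = 15;
    b1 = 15; b2 = 15; b3 = 15; b4 = 15;  // 4 * 225 = 900, the largest possible result
    check;
    $finish;
  end
endmodule
```

The largest result, 900, fits in the 10-bit output, which is why the final adder is instantiated with a 9-bit input width.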
2. Typical circuit design
2.1 Adder tree multiplier
The adder tree multiplier is based on "shift and add", with the additions arranged as an adder tree. The multiplicand is multiplied by each bit of the multiplier, each partial product is shifted left by the weight of that bit, and the partial products are then added to obtain the final result.
Example: The figure below shows the structure of a 4-bit multiplier. Use Verilog to design a 4-bit adder tree multiplier.
module mul_addtree(mul_a,mul_b,mul_out);
input [3:0] mul_a,mul_b;
output [7:0] mul_out;
wire [7:0] mul_out;
wire [7:0] stored0,stored1,stored2,stored3;
wire [7:0] add01,add23;
assign stored3 = mul_b[3]?{1'b0,mul_a,3'b0}:8'b0;
assign stored2 = mul_b[2]?{2'b0,mul_a,2'b0}:8'b0;
assign stored1 = mul_b[1]?{3'b0,mul_a,1'b0}:8'b0;
assign stored0 = mul_b[0]?{4'b0,mul_a}:8'b0;
assign add01=stored1+stored0;
assign add23=stored3+stored2;
assign mul_out=add01 + add23;
endmodule
module mult_addtree_tb;
reg [3:0] mult_a;
reg [3:0] mult_b;
wire [7:0] mult_out;
mul_addtree U1(.mul_a(mult_a), .mul_b(mult_b), .mul_out(mult_out));
initial
begin
mult_a = 0;
mult_b = 0;
repeat(9)
begin
#20 mult_a = mult_a + 1;
mult_b = mult_b + 1;
end
end
endmodule
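The testbench above only produces waveforms for inspection. As a complement, the sketch below checks all 256 input combinations against the built-in `*` operator. The module name `mul_addtree_check_tb` and the error counter are assumptions, not part of the original design.

```verilog
// Hypothetical exhaustive check of mul_addtree against the * operator.
module mul_addtree_check_tb;
  reg  [3:0] a, b;
  wire [7:0] p;
  integer i, j, errors;

  mul_addtree dut(.mul_a(a), .mul_b(b), .mul_out(p));

  initial begin
    errors = 0;
    for (i = 0; i < 16; i = i + 1)
      for (j = 0; j < 16; j = j + 1) begin
        a = i;
        b = j;
        #1;  // allow the combinational logic to settle
        if (p !== i * j) begin
          errors = errors + 1;
          $display("MISMATCH: %0d * %0d gave %0d", i, j, p);
        end
      end
    $display("%0d mismatches in 256 cases", errors);
    $finish;
  end
endmodule
```

Because the multiplier is purely combinational, a 1-unit delay per case is enough. The same loop could be reused for any 4-bit multiplier with these port names.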
Pipeline structure:
Example: The figure below shows the structure of a 4-bit multiplier. Use Verilog to design a 4-bit adder tree multiplier with a two-stage pipeline.
The structure of the two-stage pipelined 4-bit adder tree multiplier is shown in the figure. By inserting groups of D flip-flops between the first and second stages and between the second and third stages of the adder tree, a two-stage pipeline is obtained.
module mul_addtree_2_stage(clk,clr,mul_a,mul_b,mul_out);
input clk,clr;
input [3:0] mul_a,mul_b;
output [7:0] mul_out;
reg [7:0] add_tmp_1,add_tmp_2,mul_out;
wire [7:0] stored0,stored1,stored2,stored3;
assign stored3 = mul_b[3]?{1'b0,mul_a,3'b0}:8'b0;
assign stored2 = mul_b[2]?{2'b0,mul_a,2'b0}:8'b0;
assign stored1 = mul_b[1]?{3'b0,mul_a,1'b0}:8'b0;
assign stored0 = mul_b[0]?{4'b0,mul_a}:8'b0;
always@(posedge clk or negedge clr)
begin
if(!clr)
begin
add_tmp_1 <= 8'b0000_0000;
add_tmp_2 <= 8'b0000_0000;
mul_out <= 8'b0000_0000;
end
else
begin
add_tmp_1 <= stored3 + stored2;
add_tmp_2 <= stored1 +stored0;
mul_out <= add_tmp_1 + add_tmp_2;
end
end
endmodule
module mult_addtree_2_stage_tb;
reg clk,clr;
reg [3:0] mult_a,mult_b;
wire [7:0] mult_out;
mul_addtree_2_stage U1(.mul_a(mult_a), .mul_b(mult_b), .mul_out(mult_out), .clk(clk), .clr(clr));
initial
begin
clk = 0;clr = 0;mult_a = 1;mult_b = 1;
#5 clr = 1;
end
always #10 clk = ~clk;
initial
begin
repeat(5)
begin
#20 mult_a = mult_a + 1;
mult_b = mult_b + 1;
end
end
endmodule
2.2 Wallace tree multiplier
The operating principle of the Wallace tree multiplier is shown in the figure below, where FA is a full adder and HA is a half adder. The basic idea is to start adding in the columns where the partial-product bits are densest and to apply full adders and half adders repeatedly until the whole "tree" is covered. A full adder takes 3 inputs and produces 2 outputs, so it is also called a 3-2 compressor. The full adders keep reducing the depth of the tree until it reaches 2, and the final stage is a simple two-input adder.
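To make "densest" concrete, the sketch below counts the partial-product bits in each column of a 4 × 4 multiplication, where bit $x_i y_j$ has weight $2^{i+j}$. The last row is an illustrative statement of the goal of the reduction (at most two bits left per column), not a trace of any particular wiring.

```latex
\begin{array}{l|ccccccc}
\text{column weight}        & 2^6   & 2^5   & 2^4   & 2^3   & 2^2   & 2^1   & 2^0 \\ \hline
\text{partial-product bits} & 1     & 2     & 3     & 4     & 3     & 2     & 1   \\
\text{after compression}    & \le 2 & \le 2 & \le 2 & \le 2 & \le 2 & \le 2 & 1
\end{array}
```

The two rows that remain after compression are what the final two-input adder sums.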
module wallace(x,y,out);
parameter size=4;
input [size-1:0] x,y;
output [2*size-1:0] out;
wire [size*size-1:0] a;
wire [1:0] b0,b1,c0,c1,c2,c3;
wire [5:0] add_a,add_b;
wire [6:0] add_out;
wire [2*size-1 :0] out;
assign a={x[3],x[3],x[2],x[2],x[1],x[3],x[1],x[0],x[3],x[2],x[1],x[0],x[2],x[1],x[0],x[0]}
&
{y[3],y[2],y[3],y[2],y[3],y[1],y[2],y[3],y[0],y[1],y[1],y[2],y[0],y[0],y[1],y[0]};
hadd U1(.x(a[8]), .y(a[9]), .out(b0));
hadd U2(.x(a[11]), .y(a[12]), .out(b1));
hadd U3(.x(a[4]), .y(a[5]), .out(c0));
fadd U4(.x(a[6]), .y(a[7]), .z(b0[0]), .out(c1));
fadd U5(.x(b1[0]), .y(a[10]), .z(b0[1]), .out(c2)); //all three inputs have weight 2^4
fadd U6(.x(a[13]), .y(a[14]), .z(b1[1]), .out(c3)); //all three inputs have weight 2^5
assign add_a = {c3[1],c2[1],c1[1],c0[1],a[3],a[1]};
assign add_b ={ a[15],c3[0],c2[0],c1[0],c0[0],a[2]};
assign add_out = add_a + add_b;
assign out={add_out,a[0]};
endmodule
module fadd(x, y, z, out);
output [1:0] out;
input x,y,z;
assign out=x+y+z;
endmodule
module hadd(x, y, out);
output [1:0] out;
input x,y;
assign out=x+y;
endmodule
module wallace_tb;
reg [3:0] x, y;
wire [7:0] out;
wallace m(.x(x), .y(y), .out(out));
initial
begin
x=3; y=4;
#20 x=2; y=3;
#20 x=6; y=8;
end
endmodule
2.3 Complex multiplier
The circuit structure of the complex multiplier is shown in the figure below. Multiply the real part of x by the real part of y and subtract the product of the imaginary part of x and the imaginary part of y to obtain the real part of the result. Multiply the real part of x by the imaginary part of y and add the product of the imaginary part of x and the real part of y to obtain the imaginary part of the result.
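Written as a formula, and assuming the naming $x = a + bi$, $y = c + di$ (an assumption read off the port order of the `complex` module that follows), the rule and one worked case using the first stimulus of the testbench are:

```latex
\begin{aligned}
x \cdot y &= (a + bi)(c + di) = (ac - bd) + (ad + bc)\,i \\
a = 2,\ b = 2,\ c = 5,\ d = 4:\quad
\text{real} &= 2 \cdot 5 - 2 \cdot 4 = 2, \qquad
\text{imag} = 2 \cdot 4 + 2 \cdot 5 = 18
\end{aligned}
```

Each of the four products is a 4 × 4 multiplication, so the design needs four multipliers, one subtractor and one adder.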
module complex(a,b,c,d,out_real,out_im);
input [3:0]a,b,c,d;
output [8:0] out_real,out_im;
wire [7:0] sub1,sub2,add1,add2;
wallace U1(.x(a), .y(c), .out(sub1));
wallace U2(.x(b), .y(d), .out(sub2));
wallace U3(.x(a), .y(d), .out(add1));
wallace U4(.x(b), .y(c), .out(add2));
assign out_real=sub1 - sub2;
assign out_im = add1 + add2;
endmodule
module complex_tb;
reg [3:0] a,b,c,d;
wire [8:0] out_real;
wire [8:0] out_im;
complex U1(.a(a), .b(b), .c(c), .d(d), .out_real(out_real), .out_im(out_im));
initial
begin
a=2;b=2;c=5;d=4;
#10
a=4;b=3;c=2;d=1;
#10
a=3;b=2;c=3;d=4;
end
endmodule
2.4 FIR filter design
A finite impulse response (FIR) filter is a commonly used digital filter that forms its output as a weighted sum of its input samples. Its system function is:
$$H(z) = \sum_{k=0}^{M} b_k z^{-k}$$
where $z^{-1}$ means a delay of one clock cycle and $z^{-2}$ means a delay of two clock cycles.
The FIR filter for the input sequence X[n] can be represented by the structural diagram shown in the figure below, where X[n] is the input data stream. The input and output connections of the stages are called taps, and the coefficients ($b_0, b_1, \ldots, b_M$) are called tap coefficients. An FIR filter of order M has M+1 taps.
At each clock edge n (the time index), the samples held in the shift register are multiplied by the tap coefficients and summed to form the output Y[n].
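In the time domain this weighted sum can be written out explicitly. The second form below assumes an order of 8 with symmetric coefficients ($b_k = b_{8-k}$). This is an assumption inferred from the way the code that follows adds pairs of samples before multiplying, and it is not stated in the text.

```latex
\begin{aligned}
Y[n] &= \sum_{k=0}^{M} b_k\, X[n-k] \\
M = 8,\ b_k = b_{8-k}:\quad
Y[n] &= b_0\,(X[n] + X[n-8]) + b_1\,(X[n-1] + X[n-7]) \\
     &\quad + b_2\,(X[n-2] + X[n-6]) + b_3\,(X[n-3] + X[n-5]) + b_4\,X[n-4]
\end{aligned}
```

Folding the symmetric taps in this way needs five multipliers instead of nine.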
The code is shown below:
module FIR(Data_out,Data_in,clock,reset); //Module FIR
output [9:0] Data_out;
input [3:0] Data_in;
input clock,reset;
wire [9:0] Data_out;
wire [3:0] samples_0,samples_1,samples_2,samples_3,samples_4,
samples_5,samples_6,samples_7,samples_8;
shift_register U1(.Data_in(Data_in), .clock(clock), .reset(reset),
.samples_0(samples_0), .samples_1(samples_1),
.samples_2(samples_2), .samples_3(samples_3),
.samples_4(samples_4), .samples_5(samples_5),
.samples_6(samples_6), .samples_7(samples_7),
.samples_8(samples_8));
caculator U2(.samples_0(samples_0), .samples_1(samples_1),
.samples_2(samples_2), .samples_3(samples_3),
.samples_4(samples_4), .samples_5(samples_5),
.samples_6(samples_6), .samples_7(samples_7),
.samples_8(samples_8), .Data_out(Data_out));
endmodule
//Module shift_register
module shift_register(Data_in,clock,reset,samples_0,samples_1,samples_2,samples_3,
samples_4,samples_5,samples_6,samples_7,samples_8);
input [3:0] Data_in;
input clock,reset;
output [3:0] samples_0,samples_1,samples_2,samples_3,samples_4,
samples_5,samples_6,samples_7,samples_8;
reg [3:0] samples_0,samples_1,samples_2,samples_3,samples_4,
samples_5,samples_6,samples_7,samples_8;
always@(posedge clock or posedge reset) //active-high asynchronous reset
begin
if(reset)
begin
samples_0 <= 4'b0;
samples_1 <= 4'b0;
samples_2 <= 4'b0;
samples_3 <= 4'b0;
samples_4 <= 4'b0;
samples_5 <= 4'b0;
samples_6 <= 4'b0;
samples_7 <= 4'b0;
samples_8 <= 4'b0;
end
else
begin
samples_0 <= Data_in;
samples_1 <= samples_0;
samples_2 <= samples_1;
samples_3 <= samples_2;
samples_4 <= samples_3;
samples_5 <= samples_4;
samples_6 <= samples_5;
samples_7 <= samples_6;
samples_8 <= samples_7;
end
end
endmodule
//Module caculator
module caculator(samples_0,samples_1,samples_2,samples_3,samples_4,
samples_5,samples_6,samples_7,samples_8,Data_out);
input [3:0] samples_0,samples_1,samples_2,samples_3,samples_4,samples_5,samples_6,samples_7,samples_8;
output [9:0] Data_out;
wire [9:0] Data_out;
wire [3:0] out_tmp_1,out_tmp_2,out_tmp_3,out_tmp_4,out_tmp_5;
wire [7:0] out1,out2,out3,out4,out5;
parameter b0=4'b0010;
parameter b1=4'b0011;
parameter b2=4'b0110;
parameter b3=4'b1010;
parameter b4=4'b1100;
mul_addtree U1(.mul_a(b0),.mul_b(out_tmp_1),.mul_out(out1));
mul_addtree U2(.mul_a(b1),.mul_b(out_tmp_2),.mul_out(out2));
mul_addtree U3(.mul_a(b2),.mul_b(out_tmp_3),.mul_out(out3));
mul_addtree U4(.mul_a(b3),.mul_b(out_tmp_4),.mul_out(out4));
mul_addtree U5(.mul_a(b4),.mul_b(samples_4),.mul_out(out5));
//each pair sum is truncated to 4 bits to fit the 4-bit multiplier input
assign out_tmp_1 = samples_0 + samples_8;
assign out_tmp_2 = samples_1 + samples_7;
assign out_tmp_3 = samples_2 + samples_6;
assign out_tmp_4 = samples_3 + samples_5;
assign Data_out = out1 + out2 + out3 + out4 + out5;
endmodule
//Module FIR_tb
module FIR_tb;
reg clock,reset;
reg [3:0] Data_in;
wire [9:0] Data_out;
FIR U1(.Data_out(Data_out), .Data_in(Data_in), .clock(clock), .reset(reset));
initial
begin
Data_in = 0; clock = 0; reset = 1;
#10 reset = 0;
end
always
begin
#5 clock <= ~clock;
#5 Data_in <= Data_in+1;
end
endmodule
2.5 Design of on-chip memory
(1) The Verilog description of RAM
RAM is random access memory: the contents of any storage unit can be read or written as needed. This kind of memory loses all of its data when power is removed, so it is generally used to hold programs and data that are needed only for a short time. Its internal structure is as follows:
Example: Use Verilog to design a single-port RAM with a depth of 8 and a bit width of 8. A single-port RAM has only one address bus, so read and write operations are performed separately.
module ram_single(clk, addm, cs_n, we_n, din, dout);
input clk; //clock signal
input [2:0] addm; //address signal
input cs_n; //chip select signal
input we_n; //write enable signal
input [7:0] din; //input data
output [7:0] dout; //output data
reg [7:0] dout;
reg [7:0] ram1 [7:0]; //8 x 8-bit memory
always@(posedge clk)
begin
if(cs_n)
dout <= 8'bzzzz_zzzz;
else
if(we_n) //read data
dout <= ram1[addm];
else //write data
ram1[addm] <= din;
end
endmodule
module ram_single_tb;
reg clk, we_n, cs_n;
reg [2:0] addm;
reg [7:0] din;
wire [7:0] dout;
ram_single U1(.clk(clk),.addm(addm),.cs_n(cs_n),.we_n(we_n),.din(din),.dout(dout));
initial
begin
clk=0; addm=0; cs_n=1; we_n=0; din=0;
#5 cs_n=0;
#315 we_n=1;
end
always #10 clk=~clk;
initial
begin
repeat(7)
begin
#40 addm=addm+1;
din=din+1;
end
#40 repeat(7)
#40 addm=addm-1;
end
endmodule
Example: Use Verilog to design a dual-port RAM with a depth of 8 and a bit width of 8. Dual-port RAM has two sets of address buses, one for reading data and the other for writing data. Both can be operated independently.
module ram_dual(q, addr_in, addr_out, d, we, rd, clk1, clk2);
output [7:0] q; //output data
input [7:0] d; //input data
input [2:0] addr_in; //write data address signal
input [2:0] addr_out; //output data address signal
input we; //write data control signal
input rd; //read data control signal
input clk1; //write data clock
input clk2; //read data clock
reg[7:0] q;
reg[7:0] mem[7:0]; //8 x 8-bit memory
always@(posedge clk1)
begin
if(we)
mem[addr_in] <= d;
end
always@(posedge clk2)
begin
if(rd)
q <= mem[addr_out];
end
endmodule
module ram_dual_tb;
reg clk1, clk2, we, rd;
reg [2:0] addr_in;
reg [2:0] addr_out;
reg [7:0] d;
wire [7:0] q;
ram_dual U1(.q(q),.addr_in(addr_in),.addr_out(addr_out),.d(d),.we(we),.rd(rd),.clk1(clk1),.clk2(clk2));
initial
begin
clk1=0; clk2=0; we=1; rd=0; addr_in=0; addr_out=0; d=0;
#320 we=0;
rd=1;
end
always
begin
#10 clk1 = ~clk1;
clk2 = ~clk2;
end
initial
begin
repeat(7)
begin
#40 addr_in=addr_in+1;
d=d+1;
end
#40
repeat(7) #40 addr_out=addr_out+1;
end
endmodule
(2) The Verilog description of ROM
ROM is read-only memory, that is, memory from which data stored in advance can only be read. The stored data cannot be changed: the memory can be read but not written. Because the data in a ROM is not lost when power is removed, ROM is usually used in electronic or computer systems whose data does not need to change frequently.
module rom(dout, clk, addm, cs_n);
input clk, cs_n;
input [2:0] addm;
output [7:0] dout;
reg [7:0] dout;
reg [7:0] rom[7:0];
initial
begin
rom[0]=8'b0000_0000;
rom[1]=8'b0000_0001;
rom[2]=8'b0000_0010;
rom[3]=8'b0000_0011;
rom[4]=8'b0000_0100;
rom[5]=8'b0000_0101;
rom[6]=8'b0000_0110;
rom[7]=8'b0000_0111;
end
always@(posedge clk)
begin
if(cs_n)
dout<=8'bzzzz_zzzz;
else
dout<=rom[addm];
end
endmodule
module rom_tb;
reg clk, cs_n;
reg [2:0] addm;
wire [7:0] dout;
rom U1(.dout(dout),.clk(clk),.addm(addm),.cs_n(cs_n));
initial
begin
clk=0; addm=0; cs_n=0;
end
always #10 clk=~clk;
initial
begin
repeat(7)
#20 addm=addm+1;
end
endmodule
2.6 FIFO design
FIFO (First In First Out) is a first-in first-out data buffer, usually used to buffer data in interface circuits. Unlike ordinary memory, it has no external read and write address lines, and separate clocks can be used for writing and reading. A FIFO can only be written sequentially and read sequentially: its addresses are produced by internal read and write pointers that increment by 1 automatically, so a specific address cannot be read or written through address lines as in ordinary memory.
The FIFO consists of a memory block and a controller that manages the flow of data into and out of the FIFO, giving access to only one register at a time rather than to the whole register array. The FIFO has two address pointers, one pointing to the next free storage unit to be written and one pointing to the next unread storage unit. Data is read and written one item at a time.
The reading and writing process is shown in the figure:
When the stack is empty (Figure A), the read pointer and the write pointer both point to the first storage unit. After one item is written (Figure B), the write pointer points to the next storage unit. After seven write operations (Figure C), the write pointer points to the last storage unit. After eight consecutive write operations, the write pointer returns to the first unit and the stack status is shown as full (Figure D). Reading is similar: each time an item is read, the read pointer moves to the next storage unit, until all the data has been read. At that point the read pointer has returned to the first unit and the stack status is shown as empty.
A FIFO generally consists of two parts, as shown below: an address control part and a RAM part that stores the data. The address control part generates RAM addresses according to the read and write commands. The RAM stores the stack data, writing and reading it at the addresses generated by the control part. The RAM used here is the dual-port RAM described above.
Example: Use Verilog HDL to design a FIFO with a depth of 8 and a bit width of 8
//Top-level module:
module FIFO_buffer(clk,rst,write_to_stack,read_from_stack,Data_in,Data_out);
input clk,rst;
input write_to_stack,read_from_stack;
input [7:0] Data_in;
output [7:0] Data_out;
wire [7:0] Data_out;
wire stack_full, stack_empty;
wire [2:0] addr_in, addr_out;
FIFO_control U1(.stack_full(stack_full),.stack_empty(stack_empty),.write_to_stack(write_to_stack),
.write_ptr(addr_in),.read_ptr(addr_out),.read_from_stack(read_from_stack),
.clk(clk),.rst(rst));
ram_dual U2(.q(Data_out),.addr_in(addr_in),.addr_out(addr_out),.d(Data_in),
.we(write_to_stack),.rd(read_from_stack),.clk1(clk),.clk2(clk));
endmodule
//Control module:
module FIFO_control(write_ptr, read_ptr, stack_full, stack_empty, write_to_stack,read_from_stack, clk, rst);
parameter stack_width=8;
parameter stack_height=8;
parameter stack_ptr_width=3;
output stack_full; //stack full flag
output stack_empty; //stack empty flag
output [stack_ptr_width-1:0] read_ptr; //read data address
output [stack_ptr_width-1:0] write_ptr; //write data address
input write_to_stack; //write data to stack
input read_from_stack; //read data from stack
input clk;
input rst;
reg [stack_ptr_width-1:0] read_ptr;
reg [stack_ptr_width-1:0] write_ptr;
reg [stack_ptr_width:0] ptr_gap;
reg [stack_width-1:0] Data_out;
reg [stack_width-1:0] stack[stack_height-1:0];
//stack status signal
assign stack_full=(ptr_gap==stack_height);
assign stack_empty=(ptr_gap==0);
always@(posedge clk or posedge rst)
begin
if(rst)
begin
Data_out<=0;
read_ptr<=0;
write_ptr<=0;
ptr_gap<=0;
end
else if(write_to_stack && (!stack_full) && (!read_from_stack))
begin
write_ptr<=write_ptr+1;
ptr_gap<=ptr_gap+1;
end
else if(!write_to_stack && (!stack_empty) && (read_from_stack))
begin
read_ptr<=read_ptr+1;
ptr_gap<=ptr_gap-1;
end
else if(write_to_stack && stack_empty && read_from_stack)
begin
write_ptr<=write_ptr+1;
ptr_gap<=ptr_gap+1;
end
else if(write_to_stack && stack_full && read_from_stack)
begin
read_ptr<=read_ptr+1;
ptr_gap<=ptr_gap-1;
end
else if(write_to_stack && read_from_stack&& (!stack_full)&&(!stack_empty))
begin
read_ptr<=read_ptr+1;
write_ptr<=write_ptr+1;
end
end
endmodule
module FIFO_tb;
reg clk, rst;
reg [7:0] Data_in;
reg write_to_stack, read_from_stack;
wire [7:0] Data_out;
FIFO_buffer U1(.clk(clk),.rst(rst),.write_to_stack(write_to_stack),
.read_from_stack(read_from_stack),.Data_in(Data_in),.Data_out(Data_out));
initial
begin
clk=0; rst=1; Data_in=0; write_to_stack=1; read_from_stack=0;
#5 rst=0;
#155 write_to_stack=0;
read_from_stack=1;
end
always #10 clk = ~clk;
initial
begin
repeat(7)
#20 Data_in =Data_in+1;
end
endmodule
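As a compact illustration of the pointer bookkeeping described in this section, the following single-module sketch keeps a word count alongside the two pointers. It is not the article's implementation: the module name `fifo_sketch`, its port names, and the decision to ignore writes when full and reads when empty are all assumptions made for illustration.

```verilog
// Hypothetical single-clock FIFO sketch; names and behaviour are illustrative only.
module fifo_sketch(clk, rst, wr, rd, din, dout, full, empty);
  parameter WIDTH = 8, DEPTH = 8, PTR = 3;   // DEPTH must equal 2**PTR
  input              clk, rst, wr, rd;
  input  [WIDTH-1:0] din;
  output [WIDTH-1:0] dout;
  output             full, empty;

  reg [WIDTH-1:0] mem [DEPTH-1:0];
  reg [WIDTH-1:0] dout;
  reg [PTR-1:0]   wr_ptr, rd_ptr;   // wrap from DEPTH-1 back to 0 on overflow
  reg [PTR:0]     count;            // number of stored words, 0..DEPTH

  assign full  = (count == DEPTH);
  assign empty = (count == 0);

  wire do_wr = wr && !full;    // a write while full is ignored
  wire do_rd = rd && !empty;   // a read while empty is ignored

  always @(posedge clk or posedge rst)
    if (rst) begin
      wr_ptr <= 0;
      rd_ptr <= 0;
      count  <= 0;
      dout   <= 0;
    end else begin
      if (do_wr) begin
        mem[wr_ptr] <= din;
        wr_ptr <= wr_ptr + 1;
      end
      if (do_rd) begin
        dout <= mem[rd_ptr];
        rd_ptr <= rd_ptr + 1;
      end
      // +1 on a write, -1 on a read, unchanged when both or neither happen
      count <= count + do_wr - do_rd;
    end
endmodule
```

Deriving `full` and `empty` from a count plays the same role as `ptr_gap` in the control module above. Gating the write and read with the flags keeps a full FIFO from being overwritten.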
2.7 Keyboard Scanner and Encoder
Keypad scanners and encoders are used for manual data entry in digital systems that have a keypad: they detect whether a key is pressed and generate a scan code that uniquely identifies the key.
Example: Use Verilog to design the keyboard scanner and encoder for a hexadecimal keyboard circuit. The state transition diagram of the control signals is shown in the figure below:
For details, see: Verilog implementation of the hexadecimal keyboard scanner
Once the active row line and column line have been found, their intersection is the position of the key. The corresponding code is output according to that position. The keyboard code table is shown below.
Key | Row[3:0] | Col[3:0] | Code |
---|---|---|---|
0 | 0001 | 0001 | 0000 |
1 | 0001 | 0010 | 0001 |
2 | 0001 | 0100 | 0010 |
3 | 0001 | 1000 | 0011 |
4 | 0010 | 0001 | 0100 |
5 | 0010 | 0010 | 0101 |
6 | 0010 | 0100 | 0110 |
7 | 0010 | 1000 | 0111 |
8 | 0100 | 0001 | 1000 |
9 | 0100 | 0010 | 1001 |
A | 0100 | 0100 | 1010 |
B | 0100 | 1000 | 1011 |
C | 1000 | 0001 | 1100 |
D | 1000 | 0010 | 1101 |
E | 1000 | 0100 | 1110 |
F | 1000 | 1000 | 1111 |
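The table follows a simple pattern: the code equals four times the row index plus the column index. The sketch below expresses that pattern directly. The module name `keypad_code_sketch` and the function `onehot_index` are hypothetical helpers, not part of the design presented in this section, which uses an explicit `case` over all 16 combinations.

```verilog
// Hypothetical helper showing the pattern in the code table.
module keypad_code_sketch(row, col, code);
  input  [3:0] row, col;   // one-hot row and column lines
  output [3:0] code;

  // Convert a one-hot 4-bit value to its 2-bit index (0 for any other value).
  function [1:0] onehot_index;
    input [3:0] onehot;
    begin
      case (onehot)
        4'b0001: onehot_index = 2'd0;
        4'b0010: onehot_index = 2'd1;
        4'b0100: onehot_index = 2'd2;
        4'b1000: onehot_index = 2'd3;
        default: onehot_index = 2'd0;
      endcase
    end
  endfunction

  // code = 4 * row index + column index,
  // e.g. key 9: row 0100 (index 2), col 0010 (index 1) gives 1001
  assign code = {onehot_index(row), onehot_index(col)};
endmodule
```

The explicit `case` used in the article's encoder makes invalid combinations easy to handle with a single `default` branch, at the cost of more lines.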
To bring the test closer to the real physical environment, the test platform must include a signal generator that simulates the state of the keys, a module Row_Signal that determines the row line corresponding to the pressed key, and the module under test, Hex_Keypad_Grayhill_072. The signal generator can be embedded in the test platform; by repeatedly assigning values to the key signal it simulates different key presses. The Row_Signal module checks that a key press is valid and determines the row in which the key lies. The Synchronizer module decides whether a key is pressed by taking the OR of the row lines. When the output of this module changes, the module under test determines the position of the key and outputs the corresponding code.
Its Verilog HDL program code is:
//Top-level module:
module keypad(clock,reset,row,code,vaild,col);
input clock,reset;
input [3:0] row;
output [3:0] code;
output vaild;
output [3:0] col;
wire s_row;
hex_keypad_grayhill U1(.code(code),.col(col),.valid(vaild),
.row(row),.s_row(s_row),.clock(clock),.reset(reset));
synchronizer U2(.s_row(s_row),.row(row),.clock(clock),.reset(reset));
endmodule
//Encoder module:
module hex_keypad_grayhill(code,col,valid,row,s_row,clock,reset);
output [3:0] code;
output valid;
output [3:0] col;
input [3:0] row;
input s_row;
input clock,reset;
reg [3:0] col;
reg [3:0] code;
reg [5:0] state,next_state;
parameter s_0=6'b000001,s_1=6'b000010,s_2=6'b000100;
parameter s_3=6'b001000,s_4=6'b010000,s_5=6'b100000;
assign valid=((state==s_1)|(state==s_2)|(state==s_3)|(state==s_4))&&row;
always@(row or col)
case({row,col})
8'b0001_0001: code=0;
8'b0001_0010: code=1;
8'b0001_0100: code=2;
8'b0001_1000: code=3;
8'b0010_0001: code=4;
8'b0010_0010: code=5;
8'b0010_0100: code=6;
8'b0010_1000: code=7;
8'b0100_0001: code=8;
8'b0100_0010: code=9;
8'b0100_0100: code=10;
8'b0100_1000: code=11;
8'b1000_0001: code=12;
8'b1000_0010: code=13;
8'b1000_0100: code=14;
8'b1000_1000: code=15;
default: code=0;
endcase
always@(state or s_row or row) //next-state logic
begin
col=0; next_state=state;
case(state)
s_0:
begin
col=15;
if(s_row) next_state=s_1;
end
s_1:
begin
col=1;
if(row) next_state=s_5;
else next_state=s_2;
end
s_2:
begin
col=2;
if(row) next_state=s_5;
else next_state=s_3;
end
s_3:
begin
col=4;
if(row) next_state=s_5;
else next_state=s_4;
end
s_4:
begin
col=8;
if(row) next_state=s_5;
else next_state=s_0;
end
s_5:
begin
col=15;
if(!row) next_state=s_0;
end
endcase
end
always@(posedge clock or posedge reset)
if(reset)
state<=s_0;
else
state<=next_state;
endmodule
module synchronizer(s_row,row,clock,reset);
output s_row;
input [3:0] row;
input clock,reset;
reg a_row,s_row;
always@(negedge clock or posedge reset)
begin
if(reset)
begin
a_row<=0;
s_row<=0;
end
else
begin
a_row<=(row[0]||row[1]||row[2]||row[3]);
s_row<=a_row;
end
end
endmodule
//Simulated keypad signal generator
module row_signal(row,key,col);
output [3:0] row;
input [15:0] key;
input[3:0] col;
reg[3:0] row;
always@(key or col)
begin
row[0]=key[0]&&col[0]||key[1]&&col[1]||key[2]&&col[2]||key[3]&&col[3];
row[1]=key[4]&&col[0]||key[5]&&col[1]||key[6]&&col[2]||key[7]&&col[3];
row[2]=key[8]&&col[0]||key[9]&&col[1]||key[10]&&col[2]||key[11]&&col[3];
row[3]=key[12]&&col[0]||key[13]&&col[1]||key[14]&&col[2]||key[15]&&col[3];
end
endmodule
//Testbench
module hex_keypad_grayhill_tb;
wire [3:0] code;
wire vaild;
wire [3:0] col;
wire [3:0] row;
reg clock;
reg reset;
reg [15:0] key;
integer j,k;
reg [39:0] pressed;
parameter [39:0] key_0="key_0";
parameter [39:0] key_1="key_1";
parameter [39:0] key_2="key_2";
parameter [39:0] key_3="key_3";
parameter [39:0] key_4="key_4";
parameter [39:0] key_5="key_5";
parameter [39:0] key_6="key_6";
parameter [39:0] key_7="key_7";
parameter [39:0] key_8="key_8";
parameter [39:0] key_9="key_9";
parameter [39:0] key_A="key_A";
parameter [39:0] key_B="key_B";
parameter [39:0] key_C="key_C";
parameter [39:0] key_D="key_D";
parameter [39:0] key_E="key_E";
parameter [39:0] key_F="key_F";
parameter [39:0] None="None";
keypad U1(.clock(clock),.reset(reset),.row(row),
.code(code),.vaild(vaild),.col(col)); //top module
row_signal U2(.row(row),.key(key),.col(col)); // simulated key signal generation
always@(key)
begin
case(key)
16'h0000: pressed=None;
16'h0001: pressed=key_0;
16'h0002: pressed=key_1;
16'h0004: pressed=key_2;
16'h0008: pressed=key_3;
16'h0010: pressed=key_4;
16'h0020: pressed=key_5;
16'h0040: pressed=key_6;
16'h0080: pressed=key_7;
16'h0100: pressed=key_8;
16'h0200: pressed=key_9;
16'h0400: pressed=key_A;
16'h0800: pressed=key_B;
16'h1000: pressed=key_C;
16'h2000: pressed=key_D;
16'h4000: pressed=key_E;
16'h8000: pressed=key_F;
default: pressed=None;
endcase
end
initial #2000 $stop;
initial
begin
clock=0;
forever #5 clock=~clock;
end
initial
begin
reset=1;
#10 reset=0;
end
initial
begin
for(k=0;k<=1;k=k+1)
begin
key=0;
#20 for(j=0;j<=15;j=j+1)
begin
#20 key[j]=1;
#60 key=0;
end
end
end
endmodule
2.8 Verilog design of log function
The log function is a typical unary function (a function of a single operand), as are the exponential and trigonometric functions. There are two simple approaches to hardware acceleration of such functions: a lookup table, or approximation by a polynomial obtained from a Taylor series expansion. The two approaches differ greatly in design method and precision. The lookup-table method is implemented with a memory and is simple to design, but higher accuracy requires a deeper memory, which occupies a large area in an integrated circuit. It is therefore usually used for approximate calculations that do not require high precision. The Taylor series method is implemented with multipliers and adders, and its accuracy can be improved by increasing the number of terms in the expansion.
Example: Use Verilog HDL to design a log function based on a lookup table. The input signal is 4 bits wide and the output signal is 8 bits wide.
The input has one integer bit and three fractional bits (resolution $2^{-3}$), and the output has two integer bits and six fractional bits (resolution $2^{-6}$). Its Verilog program code is:
module log_lookup(x,clk,out);
input [3:0] x;
input clk;
output [7:0] out;
reg [7:0] out;
always@(posedge clk)
begin
case(x)
4'b1000:out<=8'b00000000;
4'b1001:out<=8'b00000111;
4'b1010:out<=8'b00001110;
4'b1011:out<=8'b00010101;
4'b1100:out<=8'b00011001;
4'b1101:out<=8'b00100000;
4'b1110:out<=8'b00100100;
4'b1111:out<=8'b00101000;
default:out<=8'bz;
endcase
end
endmodule
module log_lookup_tb;
reg clk;
reg [3:0]x;
wire [7:0] out;
initial
begin
x=4'b1000;
clk=1'b0;
repeat(7)
#10 x=x+1;
end
always #5 clk=~clk;
log_lookup U1(.x(x),.clk(clk),.out(out));
endmodule
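As a check on the fixed-point format used by the lookup table above, one entry can be worked by hand. The sketch assumes the table stores the natural logarithm with the fractional part truncated. That is an inference from the stored value for this entry, not something stated in the text.

```latex
\begin{aligned}
x &= 1010_2 \;\rightarrow\; 1.010_2 = 1 + 2^{-2} = 1.25 \\
\ln 1.25 &\approx 0.2231 \\
0.2231 \times 2^{6} &\approx 14.28 \;\rightarrow\; 14 = 00001110_2
\end{aligned}
```

The result agrees with the table entry for input `4'b1010`. With only six fractional bits, each entry can represent the true value to within about $2^{-6} \approx 0.016$.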
Example: Use Verilog to design a log function based on a Taylor series expansion. The input signal is 4 bits wide and the output signal is 8 bits wide.
Definition of the Taylor series: if the function f(x) has derivatives up to order (n+1) in a neighborhood of $x_0$, then the n-th order Taylor formula of f(x) in this neighborhood is:
A Taylor series approximates a complicated function by a sum of polynomial terms, which simplifies its hardware implementation. The Taylor expansion of $\log_a x$ at $x_0 = b$ is:
The error range is:
At $x_0 = 1$, it is:
The error range is:
The circuit structure diagram is as follows:
The log function above is expanded at X = 1, and X is required to lie in the range 1 < X < 2. The 4-bit binary input X has one integer bit and three fractional bits (resolution $2^{-3}$), and the 8-bit binary output has two integer bits and six fractional bits (resolution $2^{-6}$). The multipliers and subtractors used in the design are those given above.
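For reference, the standard Taylor formula with Lagrange remainder, and its application to $\log_a x$ at $x_0 = 1$, are sketched below. Truncating after the quadratic term is an assumption, chosen because the `log` module that follows computes one linear and one squared term. The constants used in that module are not derived here.

```latex
\begin{aligned}
f(x) &= \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}\,(x - x_0)^k + R_n(x), \qquad
R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}\,(x - x_0)^{n+1} \\
\log_a x &\approx \frac{x - 1}{\ln a} - \frac{(x - 1)^2}{2 \ln a}
\qquad (x_0 = 1,\ n = 2) \\
R_2(x) &= \frac{(x - 1)^3}{3\,\xi^3 \ln a}, \qquad 1 < \xi < x
\end{aligned}
```

Since $\xi > 1$ on the interval $1 < x < 2$, the truncation error is bounded in magnitude by $(x-1)^3 / (3\,|\ln a|)$, so it grows quickly as x approaches 2.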
module log(x,out);
input[3:0] x;
output[7:0] out;
wire [3:0] out1;
wire [7:0] out2,out3, out5, out;
wire [3:0] out4;
assign out4={out3[7:4]};
assign out1=x-4'b1000;
wallace U1(.x(out1),.y(4'b0111),.out(out2));
wallace U2(.x(out1),.y(out1),.out(out3));
wallace U3(.x(out4),.y(4'b011),.out(out5));
assign out=out2-out5;
endmodule
module log_tb;
reg [3:0] x=4'b1000;
wire [7:0] out;
log U1(.x(x),.out(out));
always
#10 x=x+1;
always@(x)
begin
if(x==4'b0000)
$stop;
end
endmodule
2.9 Verilog Implementation of CORDIC Algorithm
The CORDIC (Coordinate Rotation Digital Computer) algorithm uses only shifts, additions and subtractions to compute common function values such as sin, cos, sinh and cosh recursively. It was first used in navigation systems, where it allowed vector rotation and orientation calculations to be carried out without trigonometric lookup tables, multiplication, square roots or inverse trigonometric functions. In 1971 J. Walther developed it into a unified algorithm that can compute a variety of transcendental functions: a parameter m was introduced so that the three iteration modes realized by CORDIC (trigonometric, hyperbolic and linear) share the same expression. This forms the mathematical basis of the CORDIC algorithm in use today. The basic idea of the algorithm is to approach the required rotation angle through a series of fixed angles related to the base of operation. It can be described by the following equation.
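Rotating the vector $(x, y)$ by an angle $\theta$ gives
$$x' = x\cos\theta - y\sin\theta, \qquad y' = y\cos\theta + x\sin\theta .$$
Factoring out $\cos\theta$, we obtain
$$x' = \cos\theta\,(x - y\tan\theta), \qquad y' = \cos\theta\,(y + x\tan\theta) .$$
Here the angles of all iterations are restricted so that $\tan\theta_n = S_n 2^{-n}$ with $S_n = \pm 1$, so the multiplication by the tangent becomes a shift and one iteration becomes:
$$x_{n+1} = \cos\theta_n\,(x_n - S_n 2^{-n} y_n), \qquad y_{n+1} = \cos\theta_n\,(y_n + S_n 2^{-n} x_n) .$$
In the above formula $\cos\theta_n = 1/\sqrt{1 + 2^{-2n}}$. As the number of iterations increases, the product of these factors converges to a constant:
$$k = \prod_{n=0}^{\infty} \frac{1}{\sqrt{1 + 2^{-2n}}} \approx 0.6073 .$$
k is a constant gain, which can be ignored for the time being. The above formula then becomes:
$$x_{n+1} = x_n - S_n 2^{-n} y_n, \qquad y_{n+1} = y_n + S_n 2^{-n} x_n .$$
If Z is used to represent the partial sum of the phase accumulation, then
$$z_{n+1} = z_n - S_n \arctan(2^{-n}) .$$
If Z is to be rotated to 0, then the sign $S_n$ is determined by $z_n$, as follows:
$$S_n = \begin{cases} +1, & z_n \ge 0 \\ -1, & z_n < 0 \end{cases}$$
The final result after rotation is:
$$x_n = \frac{1}{k}\,(x_0\cos z_0 - y_0\sin z_0), \qquad y_n = \frac{1}{k}\,(y_0\cos z_0 + x_0\sin z_0), \qquad z_n = 0 .$$
For the special initial values $x_0 = k$, $y_0 = 0$, the result obtained is:
$$x_n = \cos z_0, \qquad y_n = \sin z_0 .$$
This working mode is called the rotation mode. Through the rotation mode, the sin and cos values of an angle can be obtained.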
Iterative structure
Describing the CORDIC equations directly in the hardware description language gives the iterative CORDIC structure, shown in the figure below.
Pipeline structure
Although the pipeline structure occupies more resources than the iterative structure, it greatly improves data throughput. The pipeline structure unrolls the iterative structure so that each of the n processing units performs the same iterative operation in parallel on a different sample. Its structure is shown in the figure below.
Example: Use Verilog HDL to design a CORDIC processor with a 7-stage pipeline structure that computes sine and cosine. The CORDIC algorithm starts from initial X and Y values, and the input variable Z is the angle. In each stage, X and Y are first passed through shifters with a fixed shift amount and the results are fed to adder/subtractors; whether each unit adds or subtracts is decided by the output of the angle accumulator. This completes one iteration, and the result is passed to the next stage as its input, so the iterations are carried out one after another. When the required number of iterations has been reached (7 in this example), the output is the desired result. The whole CORDIC processor is therefore an array of interconnected adder/subtractors.
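The initial X value preloaded in the code below can be related to the CORDIC gain with a short calculation. The sketch assumes that X is stored with seven fractional bits (a scale factor of $2^7$) and that the constant is truncated. Both are inferences from the value `8'h4D` in the listing, not statements from the text.

```latex
\begin{aligned}
k &= \prod_{n=0}^{\infty} \frac{1}{\sqrt{1 + 2^{-2n}}} \approx 0.6073 \\
k \cdot 2^{7} &\approx 77.7 \;\rightarrow\; 77 = \mathrm{4D}_{16}
\end{aligned}
```

Starting from $x_0 = k$ and $y_0 = 0$ cancels the accumulated gain of the iterations, so the final X and Y outputs approximate the cosine and sine of the input angle without a separate scaling multiplier.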
module sincos(clk,rst_n,ena,phase_in,sin_out,cos_out,eps);
parameter DATA_WIDTH=8;
parameter PIPELINE=8;
input clk;
input rst_n;
input ena;
input [DATA_WIDTH-1:0] phase_in;
output [DATA_WIDTH-1:0] sin_out;
output [DATA_WIDTH-1:0] cos_out;
output [DATA_WIDTH-1:0] eps;
reg [DATA_WIDTH-1:0] sin_out;
reg [DATA_WIDTH-1:0] cos_out;
reg [DATA_WIDTH-1:0] eps;
reg [DATA_WIDTH-1:0] phase_in_reg;
reg [DATA_WIDTH-1:0] x0,y0,z0;
wire [DATA_WIDTH-1:0] x1,y1,z1;
wire [DATA_WIDTH-1:0] x2,y2,z2;
wire [DATA_WIDTH-1:0] x3,y3,z3;
wire [DATA_WIDTH-1:0] x4,y4,z4;
wire [DATA_WIDTH-1:0] x5,y5,z5;
wire [DATA_WIDTH-1:0] x6,y6,z6;
wire [DATA_WIDTH-1:0] x7,y7,z7;
reg [1:0] quadrant[PIPELINE:0];
integer i;
// Fold the input phase into the first quadrant; phase_in[7:6] is the quadrant
always@(posedge clk or negedge rst_n)
begin
  if(!rst_n)
    phase_in_reg<=8'b0000_0000;
  else
    if(ena)
    begin
      case(phase_in[7:6])
        2'b00:phase_in_reg<=phase_in;
        2'b01:phase_in_reg<=phase_in-8'h40;
        2'b10:phase_in_reg<=phase_in-8'h80;
        2'b11:phase_in_reg<=phase_in-8'hc0;
      endcase
    end
end
// Load the initial X, Y and Z values
always@(posedge clk or negedge rst_n)
begin
  if(!rst_n)
  begin
    x0<=8'b00000000;
    y0<=8'b00000000;
    z0<=8'b00000000;
  end
  else
    if(ena)
    begin
      x0<=8'h4D;
      y0<=8'h00;
      z0<=phase_in_reg;
    end
end
// Pipeline stages: parameters are (DATA_WIDTH, shift, rotation angle constant)
Iteration #(8,0,8'h20)u1(.clk(clk),.rst_n(rst_n),.ena(ena),.x0(x0),.y0(y0),.z0(z0),.x1(x1),.y1(y1),.z1(z1));
Iteration #(8,1,8'h12)u2(.clk(clk),.rst_n(rst_n),.ena(ena),.x0(x1),.y0(y1),.z0(z1),.x1(x2),.y1(y2),.z1(z2));
Iteration #(8,2,8'h09)u3(.clk(clk),.rst_n(rst_n),.ena(ena),.x0(x2),.y0(y2),.z0(z2),.x1(x3),.y1(y3),.z1(z3));
Iteration #(8,3,8'h04)u4(.clk(clk),.rst_n(rst_n),.ena(ena),.x0(x3),.y0(y3),.z0(z3),.x1(x4),.y1(y4),.z1(z4));
Iteration #(8,4,8'h02)u5(.clk(clk),.rst_n(rst_n),.ena(ena),.x0(x4),.y0(y4),.z0(z4),.x1(x5),.y1(y5),.z1(z5));
Iteration #(8,5,8'h01)u6(.clk(clk),.rst_n(rst_n),.ena(ena),.x0(x5),.y0(y5),.z0(z5),.x1(x6),.y1(y6),.z1(z6));
Iteration #(8,6,8'h00)u7(.clk(clk),.rst_n(rst_n),.ena(ena),.x0(x6),.y0(y6),.z0(z6),.x1(x7),.y1(y7),.z1(z7));
// Delay the quadrant bits alongside the data pipeline
always@(posedge clk or negedge rst_n)
begin
  if(!rst_n)
    for(i=0;i<=PIPELINE;i=i+1)
      quadrant[i]<=2'b00;
  else
    if(ena)
    begin
      // i<PIPELINE keeps quadrant[i+1] inside the array bounds
      for(i=0;i<PIPELINE;i=i+1)
        quadrant[i+1]<=quadrant[i];
      quadrant[0]<=phase_in[7:6];
    end
end
// Map the first-quadrant result back to the original quadrant
// (x6, y6 and z6 are time-aligned with quadrant[7])
always@(posedge clk or negedge rst_n)
begin
  if(!rst_n)
  begin
    sin_out<=8'b00000000;
    cos_out<=8'b00000000;
    eps<=8'b00000000;
  end
  else
    if(ena)
      case(quadrant[7])
        2'b00:
        begin
          sin_out<=y6;
          cos_out<=x6;
          eps<=z6;
        end
        2'b01:
        begin
          sin_out<=x6;
          cos_out<=~(y6)+1'b1;
          eps<=z6;
        end
        2'b10:
        begin
          sin_out<=~(y6)+1'b1;
          cos_out<=~(x6)+1'b1;
          eps<=z6;
        end
        2'b11:
        begin
          sin_out<=~(x6)+1'b1;
          cos_out<=y6;
          eps<=z6;
        end
      endcase
end
endmodule
// Iteration module:
module Iteration(clk,rst_n,ena,x0,y0,z0,x1,y1,z1);
parameter DATA_WIDTH=8;
parameter shift=0;
parameter constant=8'h20;
input clk,rst_n,ena;
input [DATA_WIDTH-1:0] x0,y0,z0;
output [DATA_WIDTH-1:0] x1,y1,z1;
reg [DATA_WIDTH-1:0] x1,y1,z1;
always@(posedge clk or negedge rst_n)
begin
  if(!rst_n)
  begin
    x1<=8'b00000000;
    y1<=8'b00000000;
    z1<=8'b00000000;
  end
  else
    if(ena)
      // The concatenations below are arithmetic right shifts by "shift" bits
      if(z0[7]==1'b0)
      begin
        // Remaining angle is non-negative: rotate in the positive direction
        x1<=x0-{{shift{y0[DATA_WIDTH-1]}},y0[DATA_WIDTH-1:shift]};
        y1<=y0+{{shift{x0[DATA_WIDTH-1]}},x0[DATA_WIDTH-1:shift]};
        z1<=z0-constant;
      end
      else
      begin
        // Remaining angle is negative: rotate in the negative direction
        x1<=x0+{{shift{y0[DATA_WIDTH-1]}},y0[DATA_WIDTH-1:shift]};
        y1<=y0-{{shift{x0[DATA_WIDTH-1]}},x0[DATA_WIDTH-1:shift]};
        z1<=z0+constant;
      end
end
endmodule
module sincos_tb;
reg clk,rst_n,ena;
reg [7:0] phase_in;
wire [7:0] sin_out,cos_out,eps;
sincos U1(.clk(clk),.rst_n(rst_n),.ena(ena),.phase_in(phase_in),
.sin_out(sin_out),.cos_out(cos_out),.eps(eps));
initial
begin
clk=0;rst_n=0;ena=1;
phase_in=8'b00000000;
#3 rst_n=1;
end
always #5 clk=~clk;
always #10
phase_in=phase_in+1;
endmodule
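The quadrant handling in the sincos module can be checked with a short software model. The sketch below assumes the 8-bit phase convention used in the code (256 counts for one full turn, the top two bits selecting the quadrant). The function name `sincos_from_phase` is chosen here, and `math.cos`/`math.sin` only stand in for the first-quadrant outputs x and y of the CORDIC core.

```python
import math


def sincos_from_phase(phase):
    """Return (sin, cos) for an 8-bit phase where 256 counts are one full turn."""
    quadrant = (phase >> 6) & 0x3      # top two bits, like phase_in[7:6]
    residual = phase & 0x3F            # angle folded into the first quadrant
    theta = residual * 2.0 * math.pi / 256.0
    # Stand-in for the CORDIC core: x = cos(theta), y = sin(theta).
    x, y = math.cos(theta), math.sin(theta)
    # Same mapping as the case statement on the delayed quadrant bits.
    if quadrant == 0:
        return y, x
    if quadrant == 1:
        return x, -y
    if quadrant == 2:
        return -y, -x
    return -x, y


if __name__ == "__main__":
    for phase in (0, 32, 64, 128, 192):
        print(phase, sincos_from_phase(phase))
```

Comparing the returned values with the sine and cosine of the full angle confirms that swapping and negating x and y per quadrant is sufficient, so the core itself only ever has to handle angles between 0 and 90 degrees.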
3. Bus controller design
3.1 UART interface controller
The serial port is also called a UART (Universal Asynchronous Receiver/Transmitter). In practical applications, usually only the two pins TXD and RXD are used, while the other pins are left unused. The timing of the UART interface is shown in the figure:
The structure of a simple UART is shown in the figure below:
Sending module: the sending module sends data in serial form and adds a start bit and a stop bit to each group of serial data. When the byte_ready signal is active, the data is loaded into the shift register together with the start bit (low level) and the stop bit (high level). When the byte_ready signal is inactive, the shift register shifts and sends the data out in serial form.
module UART_transmitter(clk,reset,byte_ready,data,TXD);
input clk,reset;
input byte_ready;
input [7:0] data;
output TXD;
reg [9:0] shift_reg;
// Bit 0 of the shift register drives the serial line
assign TXD=shift_reg[0];
always@(posedge clk or negedge reset)
begin
  if(!reset)
    shift_reg<=10'b1111111111;          // idle line is high
  else
    if(byte_ready)
      shift_reg<={1'b1,data,1'b0};      // {stop bit, data, start bit}
    else
      shift_reg<={1'b1,shift_reg[9:1]}; // shift right, filling with 1s
end
endmodule
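The framing done by the sending module can be reproduced with a few lines of software, which is handy for working out the expected waveform by hand. This is a minimal sketch. The function name `uart_tx_bits` is chosen here, and it models one byte with byte_ready active for exactly one clock.

```python
def uart_tx_bits(data):
    """Return the 10 line levels for one byte: start, 8 data bits LSB first, stop."""
    # Same layout as the 10-bit shift register: {stop = 1, data[7:0], start = 0}.
    shift_reg = (1 << 9) | ((data & 0xFF) << 1)
    bits = []
    for _ in range(10):
        bits.append(shift_reg & 1)                # TXD is bit 0 of the register
        shift_reg = (1 << 9) | (shift_reg >> 1)   # shift right, fill with 1s
    return bits


if __name__ == "__main__":
    print(uart_tx_bits(0xAA))
```

Because the register shifts right and bit 0 drives TXD, the data bits appear on the line least significant bit first, between the low start bit and the high stop bit.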
Receiving module: The function of the receiving module is to receive the serial data output by the sending module, and send the data into the memory in parallel. When the receiving module detects the start bit (low level), it starts to receive data, and the input serial data is stored in the shift register, and the data is output in parallel when the reception is completed.
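module UART_receiver(clk,reset,RXD,data_out);
parameter idle=2'b00;
parameter receiving=2'b01;
input clk,reset;
input RXD;
output [7:0] data_out;
reg shift;       // shift RXD into shift_reg on the next clock edge
reg inc_count;   // increment the bit counter
reg load;        // copy shift_reg to data_out and clear the counter
reg [7:0] data_out;
reg [7:0] shift_reg;
reg [3:0] count;
reg [1:0] state,next_state;
// Next-state and control logic (combinational); registers are only
// written in the clocked block below to avoid multiple drivers
always@(state or RXD or count)
begin
  shift=0;
  inc_count=0;
  load=0;
  next_state=state;
  case(state)
    idle:
      if(!RXD)                 // start bit detected
        next_state=receiving;
    receiving:
      begin
        if(count==8)
        begin
          load=1;
          next_state=idle;
        end
        else
        begin
          inc_count=1;
          shift=1;
        end
      end
    default:next_state=idle;
  endcase
end
// State register, shift register and bit counter (sequential)
always@(posedge clk or negedge reset)
begin
  if(!reset)
  begin
    data_out<=8'b0;
    shift_reg<=8'b0;
    count<=0;
    state<=idle;
  end
  else
  begin
    state<=next_state;
    if(shift)
      // The transmitter sends the LSB first, so shift in from the MSB side
      shift_reg<={RXD,shift_reg[7:1]};
    if(inc_count)
      count<=count+1;
    if(load)
    begin
      data_out<=shift_reg;
      count<=0;
    end
  end
end
endmodule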
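The receiving side can be modelled in the same way. The sketch below assumes one sample per bit and an idle-high line, as in the simplified design above. The function name `uart_rx_byte` is chosen here for illustration.

```python
def uart_rx_byte(levels):
    """Decode the first frame in a sequence of line levels (one sample per bit).

    Returns the received byte, or None if no complete frame is present.
    """
    levels = list(levels)
    if 0 not in levels:
        return None                       # line stayed idle (high)
    start = levels.index(0)               # first low level is the start bit
    data_bits = levels[start + 1:start + 9]
    if len(data_bits) < 8:
        return None                       # frame was cut short
    # The first data bit after the start bit is the least significant bit.
    return sum(bit << position for position, bit in enumerate(data_bits))


if __name__ == "__main__":
    frame = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1]   # idle, start, 0xAA, stop
    print(hex(uart_rx_byte(frame)))
```

The bit order matters: if the receiver assembled the bits most significant bit first, the byte would come out bit-reversed with respect to what the sending module loaded.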
module UART_tb;
reg clk,reset;
reg [7:0] data;
reg byte_ready;
wire [7:0] data_out;
wire serial_data;
initial
begin
  clk=0;
  reset=0;
  byte_ready=0;
  data=8'b10101010;
  #40 byte_ready=1;
  #50 reset=1;
  #170 byte_ready=0;
end
always #80 clk=~clk;
// The transmitter output is looped back to the receiver input
UART_transmitter U1(.clk(clk),.reset(reset),.byte_ready(byte_ready),
  .data(data),.TXD(serial_data));
UART_receiver U2(.clk(clk),.reset(reset),.RXD(serial_data),
  .data_out(data_out));
endmodule
3.2 SPI interface controller
The Serial Peripheral Interface (SPI) is a synchronous serial peripheral interface that lets microcontrollers exchange data with each other or with various peripherals in a serial manner. An SPI bus can be shared, which makes it convenient to build a system with multiple SPI devices, and it offers a high transfer rate, programmability, few connecting lines, and good scalability. It is an excellent example of a synchronous sequential circuit. The SPI bus usually has 4 lines: the serial clock line (SCLK), the master input/slave output data line (MISO), the master output/slave input data line (MOSI), and the active-low slave select line (SS_N).
SPI devices can be divided into two categories: master devices and slave devices. The master device provides the SPI clock signal and the chip select signal, and a slave device is any integrated circuit that receives the SPI signals. While the SPI is working, the data in the shift register is output bit by bit on the output pin (MOSI), and at the same time data is received bit by bit on the input pin (MISO). Both sending and receiving are controlled by the SPI master clock signal (SCLK), which ensures synchronization. Therefore, there can be only one master device, but there can be multiple slave devices, and one or more slave devices can be selected at the same time through the chip select signal (SS_N).
Its typical structure is shown in the figure below:
The typical sequence diagram of SPI bus is shown in the figure below:
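The simultaneous shift-out and shift-in described above can be sketched as two shift registers that trade one bit per clock. This is only an illustration: the function name `spi_exchange` is chosen here, and MSB-first transfer is an assumption, since the bit order is a matter of device configuration.

```python
def spi_exchange(master_byte, slave_byte, bits=8):
    """Clock `bits` SCLK cycles and return (master register, slave register)."""
    mask = (1 << bits) - 1
    master, slave = master_byte & mask, slave_byte & mask
    for _ in range(bits):
        mosi = (master >> (bits - 1)) & 1   # master drives its MSB on MOSI
        miso = (slave >> (bits - 1)) & 1    # slave drives its MSB on MISO
        # On the same clock each side shifts left and takes in the other's bit.
        master = ((master << 1) | miso) & mask
        slave = ((slave << 1) | mosi) & mask
    return master, slave


if __name__ == "__main__":
    print([hex(v) for v in spi_exchange(0xA5, 0x3C)])
```

After eight clocks the two registers have swapped contents, which is why a single SCLK can serve both the sending and the receiving direction.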
Example: use Verilog to design a simplified SPI receiver that receives 8-bit data. The block diagram of the SPI receiver is shown in the figure:
module SPI(sdout,MISO,sclk,srst,sen,ss_n);
output [7:0] sdout;
output ss_n;
input MISO,sclk,srst,sen;
reg [2:0] counter;
reg [7:0] shift_regist;
reg ss_n;
// Bit counter: ss_n goes high once eight bits have been counted
always @(posedge sclk)
  if (!srst)
  begin
    counter<=3'b000;
    ss_n<=1'b0;   // added so that ss_n is defined after reset
  end
  else if(sen)
  begin
    if (counter==3'b111)
    begin
      counter<=3'b000;
      ss_n<=1'b1;
    end
    else
    begin
      counter<=counter+1;
      ss_n<=1'b0;
    end
  end
  else
    counter<=counter;
// Shift MISO in, most significant bit first
always@(posedge sclk)
  if(sen)
    shift_regist<={shift_regist[6:0],MISO};
  else
    shift_regist<=shift_regist;
// Present the received byte only while ss_n is high
assign sdout=ss_n?shift_regist:8'b00000000;
endmodule
In the code, sclk is the interface clock. srst is the clear signal, active low. sen is the interface enable signal, active high. ss_n is the chip select signal, which selects the slave device and is active high in this design. When the circuit is powered on, the clear signal is first asserted to initialize the circuit. When the sen enable signal is active, data transmission starts. Since the transmitted data is 8 bits wide, the sen enable signal should be held for at least 8 clock cycles. When all 8 bits have been input, the chip select signal ss_n becomes active, selects the slave device, and the data is output as a whole. The chip select signal ss_n is generated by a 3-bit counter: when the counter reaches state 111, ss_n=1, and in all other states ss_n=0.
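The interplay of the counter, ss_n and sdout can be traced with a small cycle-by-cycle model. This is a sketch that assumes sen stays high and that the counter and ss_n start from 0. The function name `spi_receive` is chosen here for illustration.

```python
def spi_receive(miso_bits):
    """Return a list of (ss_n, sdout) pairs, one per rising sclk edge, with sen = 1."""
    counter, shift_reg, ss_n = 0, 0, 0
    trace = []
    for miso in miso_bits:
        # Counter and ss_n update, based on the counter value before the edge.
        if counter == 7:
            counter, ss_n = 0, 1
        else:
            counter, ss_n = counter + 1, 0
        # The shift register takes in MISO on the same edge, MSB first.
        shift_reg = ((shift_reg << 1) | miso) & 0xFF
        # sdout shows the byte only while ss_n is high.
        trace.append((ss_n, shift_reg if ss_n else 0))
    return trace


if __name__ == "__main__":
    for step in spi_receive([1, 0, 1, 0, 1, 0, 1, 0]):
        print(step)
```

For the first seven edges the output stays at zero. On the eighth edge ss_n goes high and the complete byte appears on sdout.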
module SPI_tb;
reg MISO,sclk,sen,srst;
wire [7:0] sdout;
wire ss_n;
SPI U1(.sdout(sdout),.MISO(MISO),.sclk(sclk),.srst(srst),.sen(sen),.ss_n(ss_n));
initial
begin
MISO=0;sclk=0;srst=0;sen=0;
#10 srst=1;
#10 sen=1;
#80 sen=0;
#10 sen=1;
#80 sen=0;
end
initial
begin
#30 MISO=1;
#10 MISO=0;
#10 MISO=1;
#10 MISO=0;
#10 MISO=1;
#10 MISO=0;
#10 MISO=1;
#20 MISO=1;
#10 MISO=0;
#10 MISO=1;
#10 MISO=0;
#10 MISO=1;
#10 MISO=0;
#10 MISO=1;
#10 MISO=0;
end
always #5 sclk<=~sclk;
endmodule