Article directory

1. Hierarchy of digital circuit system design
2. Typical circuit design
3. Bus controller design
- 3.1 UART interface controller
- 3.2 SPI interface controller
Source: Teacher Cai Jueping's Verilog course

1. Hierarchy of digital circuit system design

Serial adder:

A four-bit serial (ripple-carry) adder consists of four full adders. The full adder is a sub-module of the serial adder and is itself composed of basic logic gates; these gates are the so-called leaf modules. In this design, leaf modules (basic logic gates) are used to build sub-modules (full adders), and the sub-modules are then used to build the required circuit (the serial adder).
The Bottom-Up design method has no fixed rules to follow. It relies mainly on the designer's practical experience and design skill, and arrives at a complete digital system by step-by-step trial. The performance of the system can only be analyzed and tested after the whole system has been constructed. This method is often used in schematic-based design. Compared with other methods, it takes less time to realize each sub-module circuit.
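The bottom-up hierarchy described above can be sketched in Verilog as follows. This is a minimal illustration, not code from the course: the module names `full_adder_gates` and `adder4_ripple` are assumed, and the leaf level is represented by Verilog's built-in gate primitives.

```verilog
// Leaf level: basic gates (Verilog gate primitives).
// Sub-module: a one-bit full adder built only from those gates.
module full_adder_gates(a, b, cin, sum, cout);
  input  a, b, cin;
  output sum, cout;
  wire   axb, g, p;
  xor G1(axb, a, b);      // a ^ b
  xor G2(sum, axb, cin);  // sum = a ^ b ^ cin
  and G3(g, a, b);        // carry generated by a and b
  and G4(p, axb, cin);    // carry propagated from cin
  or  G5(cout, g, p);
endmodule

// Top level: four full adders chained through their carries.
module adder4_ripple(a, b, cin, sum, cout);
  input  [3:0] a, b;
  input        cin;
  output [3:0] sum;
  output       cout;
  wire   [2:0] c;         // internal carries
  full_adder_gates FA0(a[0], b[0], cin,  sum[0], c[0]);
  full_adder_gates FA1(a[1], b[1], c[0], sum[1], c[1]);
  full_adder_gates FA2(a[2], b[2], c[1], sum[2], c[2]);
  full_adder_gates FA3(a[3], b[3], c[2], sum[3], cout);
endmodule
```

Each level only instantiates the level directly beneath it, which is the defining property of the hierarchy: gates, then full adders, then the four-bit adder.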
Designing a typical CPU with the Top-Down method:

Vector dot product multiplier:
Use the modular hierarchical design method to design a 4-dimensional vector dot product multiplier, where $a = (a_1, a_2, a_3, a_4)$ and $b = (b_1, b_2, b_3, b_4)$. The dot product rule is:
$$a \cdot b = a_1 b_1 + a_2 b_2 + a_3 b_3 + a_4 b_4$$
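The signal widths used in the listing that follows (8-bit products, 9-bit pairwise sums, a 10-bit result) can be checked with a short bit-growth calculation. The bounds below assume that every vector element is an unsigned 4-bit value, as the `input [3:0]` declarations suggest.

```latex
\begin{aligned}
a_i b_i &\le 15 \times 15 = 225 < 2^{8} && \text{(8-bit product)}\\
a_1 b_1 + a_2 b_2 &\le 2 \times 225 = 450 < 2^{9} && \text{(9-bit pairwise sum)}\\
a \cdot b &\le 4 \times 225 = 900 < 2^{10} && \text{(10-bit result)}
\end{aligned}
```

Each adder level therefore needs only one extra bit, which is what the `add` module provides through its `size`-bit inputs and `size+1`-bit output.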
Verilog code:

module vector(a1,a2,a3,a4,b1,b2,b3,b4,out);
 input [3:0] a1,a2,a3,a4,b1,b2,b3,b4;
 output [9:0] out;
 wire [7:0] out1,out2,out3,out4;   // 4-bit x 4-bit products
 wire [8:0] out5,out6;             // pairwise sums
 wire [9:0] out;
 // port names must match the mul_addtree definition below
 mul_addtree U1(.mul_a(a1), .mul_b(b1), .mul_out(out1));
 mul_addtree U2(.mul_a(a2), .mul_b(b2), .mul_out(out2));
 mul_addtree U3(.mul_a(a3), .mul_b(b3), .mul_out(out3));
 mul_addtree U4(.mul_a(a4), .mul_b(b4), .mul_out(out4));
 add #(8) U5(.a(out1), .b(out2), .out(out5));
 add #(8) U6(.a(out3), .b(out4), .out(out6));
 add #(9) U7(.a(out5), .b(out6), .out(out));
endmodule

// Adder
module add(a,b,out);
 parameter size=8;
 input [size-1:0] a,b;
 output [size:0] out;
 assign out=a+b;
endmodule

// Multiplier
module mul_addtree(mul_a,mul_b,mul_out);
 input [3:0] mul_a,mul_b;
 output [7:0] mul_out;
 wire [7:0] mul_out;
 wire [7:0] stored0,stored1,stored2,stored3;
 wire [7:0] add01,add23;
 // one shifted copy of mul_a for each set bit of mul_b
 assign stored3=mul_b[3]?{1'b0,mul_a,3'b0}:8'b0;
 assign stored2=mul_b[2]?{2'b0,mul_a,2'b0}:8'b0;
 assign stored1=mul_b[1]?{3'b0,mul_a,1'b0}:8'b0;
 assign stored0=mul_b[0]?{4'b0,mul_a}:8'b0;
 assign add01=stored1+stored0;
 assign add23=stored3+stored2;
 assign mul_out=add01+add23;
endmodule

2. Typical circuit design

2.1 Adder tree multiplier

The design idea of the adder tree multiplier is "shift and add", with the additions arranged as an adder tree. The multiplication proceeds as follows: the multiplicand is multiplied by each bit of the multiplier and by the weight of that bit, and the resulting partial products are added to obtain the final product.
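As a worked example of shift and add, take the multiplicand $1101_2 = 13$ and the multiplier $1011_2 = 11$ (values chosen here only for illustration). Each set bit of the multiplier contributes one shifted copy of the multiplicand, and the tree adds the copies in pairs:

```latex
\begin{aligned}
1101_2 \times 1011_2
 &= 1101_2 \cdot 2^{0} + 1101_2 \cdot 2^{1} + 0 \cdot 2^{2} + 1101_2 \cdot 2^{3} \\
 &= (13 + 26) + (0 + 104) \\
 &= 39 + 104 = 143 = 10001111_2
\end{aligned}
```

The two bracketed sums form the first level of the adder tree, and the final addition forms the second level.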
Example: The figure below shows a 4-bit multiplier structure; use Verilog to design a 4-bit adder-tree multiplier.

module mul_addtree(mul_a,mul_b,mul_out);
 input [3:0] mul_a,mul_b;
 output [7:0] mul_out;
 wire [7:0] mul_out;
 wire [7:0] stored0,stored1,stored2,stored3;
 wire [7:0] add01,add23;
 assign stored3 = mul_b[3]?{1'b0,mul_a,3'b0}:8'b0;
 assign stored2 = mul_b[2]?{2'b0,mul_a,2'b0}:8'b0;
 assign stored1 = mul_b[1]?{3'b0,mul_a,1'b0}:8'b0;
 assign stored0 = mul_b[0]?{4'b0,mul_a}:8'b0;
 assign add01=stored1+stored0;
 assign add23=stored3+stored2;
 assign mul_out=add01 + add23;
endmodule

module mult_addtree_tb;
 reg [3:0] mult_a;
 reg [3:0] mult_b;
 wire [7:0] mult_out;
 mul_addtree U1(.mul_a(mult_a), .mul_b(mult_b), .mul_out(mult_out));
 initial
 	begin
 		mult_a = 0;
 		mult_b = 0;
 		repeat(9)
 			begin
 				#20 mult_a = mult_a + 1;
 				mult_b = mult_b + 1;
 			end
 	end
 endmodule


Pipeline structure:
Example: The figure below shows a 4-bit multiplier structure; use Verilog to design a two-stage pipelined adder-tree 4-bit multiplier.

The structure of the two-stage pipelined adder-tree 4-bit multiplier is shown in the figure. The two-stage pipeline is realized by inserting a group of D flip-flops after each level of the adder tree. In the code below, add_tmp_1 and add_tmp_2 register the first-level sums and mul_out registers the second-level sum, so a product appears two clock edges after its operands are applied.

module mul_addtree_2_stage(clk,clr,mul_a,mul_b,mul_out); 
 input clk,clr;
 input [3:0] mul_a,mul_b; 
 output [7:0] mul_out;
 reg [7:0] add_tmp_1,add_tmp_2,mul_out;
 wire [7:0] stored0,stored1,stored2,stored3;
assign stored3 = mul_b[3]?{1'b0,mul_a,3'b0}:8'b0; 
assign stored2 = mul_b[2]?{2'b0,mul_a,2'b0}:8'b0;
assign stored1 = mul_b[1]?{3'b0,mul_a,1'b0}:8'b0; 
assign stored0 = mul_b[0]?{4'b0,mul_a}:8'b0; 
always@(posedge clk or negedge clr) 
 begin
	if(!clr) 
		begin 
			add_tmp_1 <= 8'b0000_0000; 
			add_tmp_2 <= 8'b0000_0000; 
			mul_out <= 8'b0000_0000;
		end
	else 
		begin 
			add_tmp_1 <= stored3 + stored2;
			add_tmp_2 <= stored1 +stored0; 	
			mul_out <= add_tmp_1 + add_tmp_2;
		end
 end
endmodule

module mult_addtree_2_stage_tb;
 reg clk,clr;
 reg [3:0] mult_a,mult_b;
 wire [7:0] mult_out;
 mul_addtree_2_stage U1(.mul_a(mult_a), .mul_b(mult_b), .mul_out(mult_out), .clk(clk), .clr(clr));
 initial
 	begin
 		clk = 0;clr = 0;mult_a = 1;mult_b = 1;
 		#5 clr = 1;
 	end
 always #10 clk = ~clk;
 initial
 	begin
 		repeat(5)
 			begin
 				#20 mult_a = mult_a + 1;
 				mult_b = mult_b + 1;
 			end
 	end
 endmodule


2.2 Wallace tree multiplier

The operating principle of the Wallace tree multiplier is shown in the figure below, where FA is a full adder and HA is a half adder. The basic idea is to start adding where the partial-product bits are densest, and to apply full adders and half adders repeatedly until the whole "tree" is covered. A full adder is a 3-input, 2-output device, so it is also called a 3-2 compressor. The full adders keep reducing the depth of the tree until only two rows remain, and the final stage is a simple two-input adder that adds these two rows.
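The 3-2 compression step can be sketched on whole operands as a carry-save adder. This is an illustrative module (the name `csa` and parameter `W` are assumptions, not part of the course code): it reduces three operands to two without any carry propagating between bit positions, and a final two-input adder would then add `sum` and the shifted `carry`.

```verilog
// Carry-save adder: W independent full adders (3-2 compressors).
// Invariant: a + b + c == sum + (carry << 1).
module csa(a, b, c, sum, carry);
  parameter W = 4;
  input  [W-1:0] a, b, c;
  output [W-1:0] sum, carry;
  assign sum   = a ^ b ^ c;                     // per-bit sum, no carry ripple
  assign carry = (a & b) | (a & c) | (b & c);   // per-bit carry, weight 2
endmodule
```

Because every bit position is handled independently, the delay of one compression level is a single full-adder delay regardless of the operand width, which is why a Wallace tree postpones the only carry-propagating addition to the last stage.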

module wallace(x,y,out);
	parameter size=4;
	input [size-1:0] x,y;
	output [2*size-1:0] out;
	wire [size*size-1:0] a; 
	wire [1:0] b0,b1,c0,c1,c2,c3; 
	wire [5:0] add_a,add_b; 
	wire [6:0] add_out;
	wire [2*size-1 :0] out;
	assign a={x[3],x[3],x[2],x[2],x[1],x[3],x[1],x[0],x[3],x[2],x[1],x[0],x[2],x[1],x[0],x[0]}
			& 
			 {y[3],y[2],y[3],y[2],y[3],y[1],y[2],y[3],y[0],y[1],y[1],y[2],y[0],y[0],y[1],y[0]};
			 
  hadd U1(.x(a[8]), .y(a[9]), .out(b0));
  hadd U2(.x(a[11]), .y(a[12]), .out(b1));
  hadd U3(.x(a[4]), .y(a[5]), .out(c0));
 
  fadd U4(.x(a[6]), .y(a[7]), .z(b0[0]), .out(c1));
  // U5 adds the three weight-2^4 terms; U6 adds the three weight-2^5 terms
  fadd U5(.x(b1[0]), .y(a[10]), .z(b0[1]), .out(c2));
  fadd U6(.x(a[13]), .y(a[14]), .z(b1[1]), .out(c3));
 
 assign add_a = {c3[1],c2[1],c1[1],c0[1],a[3],a[1]};
 assign add_b = {a[15],c3[0],c2[0],c1[0],c0[0],a[2]};
 assign add_out = add_a + add_b;
 assign out={add_out,a[0]};

endmodule

module fadd(x, y, z, out); 
	output [1:0] out;
	input x,y,z;
	assign out=x+y+z; 
endmodule

module hadd(x, y, out);
	output [1:0] out;
	input x,y;
	assign out=x+y;
endmodule

module wallace_tb; 
 reg [3:0] x, y;
 wire [7:0] out;
 wallace m(.x(x), .y(y), .out(out)); 
initial
	begin
		x=3; y=4;
		#20 x=2; y=3;
		#20 x=6; y=8; 
	end
endmodule


2.3 Complex multiplier

For two complex numbers $x = a + bi$ and $y = c + di$, the product is:
$$x \cdot y = (ac - bd) + (ad + bc)\,i$$
The circuit structure of a complex multiplier is shown in the figure below. The real part of the result is the product of the real parts of x and y minus the product of their imaginary parts. The imaginary part of the result is the sum of two products: the real part of x times the imaginary part of y, and the imaginary part of x times the real part of y.
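The rule can be checked by hand with the three stimulus sets used in the `complex_tb` testbench of this section, where $(a, b)$ are the real and imaginary parts of x and $(c, d)$ those of y. The fourth line is an extra case added here (not taken from the testbench) to show what happens when the real part is negative.

```latex
\begin{aligned}
(2 + 2i)(5 + 4i) &= (10 - 8) + (8 + 10)\,i = 2 + 18i \\
(4 + 3i)(2 + 1i) &= (8 - 3) + (4 + 6)\,i = 5 + 10i \\
(3 + 2i)(3 + 4i) &= (9 - 8) + (12 + 6)\,i = 1 + 18i \\
(1 + 3i)(1 + 3i) &= (1 - 9) + (3 + 3)\,i = -8 + 6i
\end{aligned}
```

In the last case the 9-bit `out_real` would hold $-8 \bmod 2^{9} = 504$, that is `1_1111_1000`, which reads as $-8$ when interpreted as a two's-complement number.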


module complex(a,b,c,d,out_real,out_im);
 input [3:0]a,b,c,d;
 output [8:0] out_real,out_im; 
 wire [7:0] sub1,sub2,add1,add2; 
 wallace U1(.x(a), .y(c), .out(sub1)); 
 wallace U2(.x(b), .y(d), .out(sub2)); 
 wallace U3(.x(a), .y(d), .out(add1)); 
 wallace U4(.x(b), .y(c), .out(add2));
 assign out_real=sub1 - sub2;   // 9-bit result; wraps to two's complement when sub1 < sub2
 assign out_im = add1 + add2;
endmodule

 module complex_tb;
 	reg [3:0] a,b,c,d;
 	wire [8:0] out_real;
 	wire [8:0] out_im;
 	complex U1(.a(a), .b(b), .c(c), .d(d), .out_real(out_real), .out_im(out_im));
 	initial
 		begin
 			a=2;b=2;c=5;d=4;
 			#10
 			a=4;b=3;c=2;d=1;
 			#10
 			a=3;b=2;c=3;d=4;
 		end
 endmodule


2.4 FIR filter design

A finite impulse response (FIR) filter is a commonly used digital filter whose output is a weighted sum of its input samples. Its system function is:
$$H(z) = \sum_{k=0}^{M} b_k z^{-k}$$
where $z^{-1}$ denotes a delay of one clock cycle and $z^{-2}$ a delay of two clock cycles.
The FIR filter for an input sequence X[n] can be represented by the structure shown in the figure below, where X[n] is the input data stream. The input and output connections of each stage are called taps, and the coefficients $(b_0, b_1, \ldots, b_M)$ are called tap coefficients. An FIR filter of order M has M+1 taps.
At each clock edge n (the time index), the samples held in the shift register are multiplied by their tap coefficients and summed to form the output Y[n].
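The listing in this section stores nine samples but declares only five coefficients (b0 to b4). This suggests a symmetric coefficient set, $b_k = b_{8-k}$; that symmetry is inferred from the code rather than stated in the text. Under that assumption, and writing X[n] for the newest sample in the shift register, the weighted sum folds as follows:

```latex
\begin{aligned}
Y[n] &= \sum_{k=0}^{8} b_k\, X[n-k], \qquad b_k = b_{8-k} \\
     &= b_0\,(X[n] + X[n-8]) + b_1\,(X[n-1] + X[n-7]) + b_2\,(X[n-2] + X[n-6]) \\
     &\quad + b_3\,(X[n-3] + X[n-5]) + b_4\, X[n-4]
\end{aligned}
```

Adding the paired samples before multiplying reduces the number of multipliers from nine to five, which matches the five `mul_addtree` instances in the `caculator` module.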
The code is as follows:

module FIR(Data_out, Data_in, clock, reset); // FIR module
 output [9:0] Data_out;
 input [3:0] Data_in;
 input clock,reset;
 wire [9:0] Data_out;
 wire [3:0] samples_0,samples_1,samples_2,samples_3,samples_4,
            samples_5,samples_6,samples_7,samples_8;
 shift_register U1(.Data_in(Data_in), .clock(clock), .reset(reset),
				   .samples_0(samples_0), .samples_1(samples_1),
				   .samples_2(samples_2), .samples_3(samples_3),
				   .samples_4(samples_4), .samples_5(samples_5),
				   .samples_6(samples_6), .samples_7(samples_7),
				   .samples_8(samples_8));
 caculator U2(.samples_0(samples_0), .samples_1(samples_1),
			  .samples_2(samples_2), .samples_3(samples_3),
			  .samples_4(samples_4), .samples_5(samples_5),
			  .samples_6(samples_6), .samples_7(samples_7),
			  .samples_8(samples_8), .Data_out(Data_out));
endmodule

// shift_register module
module shift_register(Data_in,clock,reset,samples_0,samples_1,samples_2,samples_3,
					  samples_4,samples_5,samples_6,samples_7,samples_8);
 input [3:0] Data_in;
 input clock,reset;
 output [3:0] samples_0,samples_1,samples_2,samples_3,samples_4,
 			  samples_5,samples_6,samples_7,samples_8;
 reg [3:0] samples_0,samples_1,samples_2,samples_3,samples_4,
 		   samples_5,samples_6,samples_7,samples_8;
always @(posedge clock or posedge reset)   // reset is active-high, as driven by FIR_tb
	begin
		if(reset)
			begin
				samples_0 <= 4'b0;
				samples_1 <= 4'b0;
				samples_2 <= 4'b0;
				samples_3 <= 4'b0;
				samples_4 <= 4'b0;
				samples_5 <= 4'b0;
				samples_6 <= 4'b0;
				samples_7 <= 4'b0;
				samples_8 <= 4'b0;
			end
		else
			begin
				samples_0 <= Data_in;
				samples_1 <= samples_0; 
				samples_2 <= samples_1; 
				samples_3 <= samples_2; 
				samples_4 <= samples_3; 
				samples_5 <= samples_4; 
				samples_6 <= samples_5; 
				samples_7 <= samples_6; 
				samples_8 <= samples_7; 
			end
	end
endmodule

// caculator module
module caculator(samples_0,samples_1,samples_2,samples_3,samples_4,
				 samples_5,samples_6,samples_7,samples_8,Data_out);
 input [3:0] samples_0,samples_1,samples_2,samples_3,samples_4,samples_5,samples_6,samples_7,samples_8;
 output [9:0] Data_out;
 wire [9:0] Data_out;
 wire [3:0] out_tmp_1,out_tmp_2,out_tmp_3,out_tmp_4,out_tmp_5;
 wire [7:0] out1,out2,out3,out4,out5;
 parameter b0=4'b0010;
 parameter b1=4'b0011;
 parameter b2=4'b0110;
 parameter b3=4'b1010;
 parameter b4=4'b1100;
 mul_addtree U1(.mul_a(b0),.mul_b(out_tmp_1),.mul_out(out1));
 mul_addtree U2(.mul_a(b1),.mul_b(out_tmp_2),.mul_out(out2));
 mul_addtree U3(.mul_a(b2),.mul_b(out_tmp_3),.mul_out(out3));
 mul_addtree U4(.mul_a(b3),.mul_b(out_tmp_4),.mul_out(out4));
 mul_addtree U5(.mul_a(b4),.mul_b(samples_4),.mul_out(out5));
 // symmetric taps share one coefficient; each pre-add is truncated to 4 bits
 assign out_tmp_1 = samples_0 + samples_8;
 assign out_tmp_2 = samples_1 + samples_7;
 assign out_tmp_3 = samples_2 + samples_6;
 assign out_tmp_4 = samples_3 + samples_5;
 assign Data_out = out1 + out2 + out3 + out4 + out5;
endmodule
// FIR_tb module
module FIR_tb;
 reg clock,reset;
 reg [3:0] Data_in;
 wire [9:0] Data_out;
 FIR U1(.Data_out(Data_out), .Data_in(Data_in), .clock(clock), .reset(reset));
 initial
	begin
		Data_in = 0; clock = 0; reset = 1;
		#10 reset = 0; 
	end
	always
		begin
			#5 clock <= ~clock;
			#5 Data_in <= Data_in+1;
		end
endmodule


2.5 Design of on-chip memory

(1) Verilog description of RAM
RAM is random access memory: the contents of any storage cell can be read or written as needed. This kind of memory loses all its data when power is removed, and is generally used to hold programs and data that are needed only for a short time. Its internal structure is as follows:
Example: use Verilog to design a single-port RAM with a depth of 8 and a width of 8 bits. A single-port RAM has only one address bus, so read and write operations cannot take place at the same time.

module ram_single(clk, addm, cs_n, we_n, din, dout);
input clk;	 //clock signal
input [2:0] addm;	 //address signal
input cs_n;	 //chip select signal (active low)
input we_n; 	//write enable signal (active low)
input [7:0] din; 	//input data
output [7:0] dout; 	//output data
reg [7:0] dout;
reg [7:0] raml [7:0]; 	//8 x 8-bit memory array
always @(posedge clk)
	begin
		if(cs_n)
			dout <= 8'bzzzz_zzzz;
		else 
			if(we_n) 	//read data 
				dout <= raml[addm];
			else	 //write data
				raml[addm] <= din;
	end
endmodule

module ram_single_tb;
 reg clk, we_n, cs_n;
 reg [2:0] addm;
 reg [7:0] din;
 wire [7:0] dout;
 ram_single U1(.clk(clk),.addm(addm),.cs_n(cs_n),.we_n(we_n),.din(din),.dout(dout));
 initial 
 	begin
 		clk=0; addm=0; cs_n=1; we_n=0; din=0;
 		#5 cs_n=0;
 		#315 we_n=1;
	end
always #10 clk=~clk;
initial
	begin
		repeat(7) 
			begin
				#40 addm=addm+1;
				din=din+1;
			end
			#40 repeat(7)
			#40 addm=addm-1;
	end
endmodule

Example: use Verilog to design a dual-port RAM with a depth of 8 and a width of 8 bits. A dual-port RAM has two address buses, one for reading data and one for writing data, and the two can be operated independently.

module ram_dual(q, addr_in, addr_out, d, we, rd, clk1, clk2);
 output [7:0] q;	//output data
 input [7:0] d;	//input data
 input [2:0] addr_in; 	//write data address signal
 input [2:0] addr_out; 	//output data address signal
 input we;	//write data control signal
 input rd;	//read data control signal
 input clk1; 	//write data clock
 input clk2; 	//read data clock
 reg[7:0] q;
 reg[7:0] mem[7:0];	 //8 x 8-bit memory array
 always@(posedge clk1)
	begin
		if(we)
			mem[addr_in] <= d;
	end
 always@(posedge clk2)
	begin
		if(rd)
			q <= mem[addr_out];
	end
endmodule

module ram_dual_tb;
 reg clk1, clk2, we, rd;
 reg [2:0] addr_in;
 reg [2:0] addr_out;
 reg [7:0] d;
 wire [7:0] q;
 ram_dual U1(.q(q),.addr_in(addr_in),.addr_out(addr_out),.d(d),.we(we),.rd(rd),.clk1(clk1),.clk2(clk2));
 initial
 	begin
 		clk1=0; clk2=0; we=1; rd=0; addr_in=0; addr_out=0; d=0;
 		#320 we=0;
 		rd=1;
	end
 always
 	begin
 		#10 clk1 = ~clk1;
 		clk2 = ~clk2;
 	end
initial
	begin
		repeat(7)
			begin
				#40 addr_in=addr_in+1;
				d=d+1;
			end
			#40
			repeat(7) #40 addr_out=addr_out+1;
	end
endmodule

(2) Verilog description of ROM
ROM is read-only memory: it can only read out data that was stored in advance. The stored data cannot be changed in normal operation, so the memory can be read but not written. Because ROM keeps its data when power is removed, it is usually used in electronic or computer systems whose data does not need to change frequently.

module rom(dout, clk, addm, cs_n);
 input clk, cs_n;
 input [2:0] addm;
 output [7:0] dout;
 reg [7:0] dout;
 reg [7:0] rom[7:0];
 initial
 	begin
		rom[0]=8'b0000_0000;
		rom[1]=8'b0000_0001;
		rom[2]=8'b0000_0010;
		rom[3]=8'b0000_0011;
		rom[4]=8'b0000_0100;
		rom[5]=8'b0000_0101;
		rom[6]=8'b0000_0110;
		rom[7]=8'b0000_0111;
	end
 always@(posedge clk)
 	begin
		if(cs_n) 
			dout<=8'bzzzz_zzzz;
		else
			dout<=rom[addm];
	end
endmodule

module rom_tb;
 reg clk, cs_n;
 reg [2:0] addm;
 wire [7:0] dout;
 rom U1(.dout(dout),.clk(clk),.addm(addm),.cs_n(cs_n));
 initial 
 	begin
		clk=0; addm=0; cs_n=0;
	end
 always #10 clk=~clk;
 initial 
 	begin
		repeat(7)
		#20 addm=addm+1;
	end
endmodule


2.6 FIFO design

FIFO (First In First Out) is a first-in, first-out data buffer, usually used for data buffering in interface circuits. Unlike ordinary memory, it has no external read and write address lines, and separate clocks can be used for the write and read operations. A FIFO can only be written sequentially and read sequentially: its addresses are generated by internal read and write pointers that increment by 1 automatically, so it cannot be read or written at an arbitrary address the way ordinary memory can.
The FIFO consists of a memory block and a controller that manages the flow of data into and out of the FIFO, giving access to only one register at a time rather than to the whole register array. The FIFO has two address pointers, one pointing to the next free storage cell for writing and one pointing to the next unread storage cell for reading. Data is read and written one word at a time.
The reading and writing process is shown in the figure:
When the stack is empty (Figure A), the read pointer and the write pointer both point to the first storage cell. After one word is written (Figure B), the write pointer points to the next storage cell. After seven writes (Figure C), the write pointer points to the last storage cell. After eight consecutive writes, the write pointer returns to the first cell and the stack status shows full (Figure D). Reading works in the same way: each read moves the read pointer to the next storage cell until all the data has been read, at which point the read pointer is back at the first cell and the stack status shows empty.

A FIFO generally consists of two parts: the address control part and the RAM that stores the data, as shown below. The address control part generates RAM addresses from the read and write commands. The RAM holds the stack data, which is stored and read according to the address signals generated by the control part. The RAM used here is the dual-port RAM described earlier.
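As the description above notes, the read and write pointers coincide both when the FIFO is empty and when it is full, so the control part needs extra state to tell the two cases apart. The sketch below shows one common alternative technique, which is not the one used in this article's listing: each pointer carries one extra wrap bit. The module and signal names (`fifo_flags_sketch`, `wr`, `rd`, `wr_addr`, `rd_addr`) are assumed for illustration.

```verilog
// Full/empty detection for a depth-8 FIFO using one extra pointer bit.
// The MSB of each pointer toggles every time that pointer wraps around.
module fifo_flags_sketch(clk, rst, wr, rd, wr_addr, rd_addr, full, empty);
  input        clk, rst, wr, rd;
  output [2:0] wr_addr, rd_addr;   // addresses for an 8-word dual-port RAM
  output       full, empty;
  reg    [3:0] wr_ptr, rd_ptr;     // {wrap bit, 3-bit address}

  assign wr_addr = wr_ptr[2:0];
  assign rd_addr = rd_ptr[2:0];
  // Same address and same wrap bit: nothing stored.
  assign empty   = (wr_ptr == rd_ptr);
  // Same address but different wrap bit: writer is one full lap ahead.
  assign full    = (wr_ptr[3] != rd_ptr[3]) && (wr_ptr[2:0] == rd_ptr[2:0]);

  always @(posedge clk or posedge rst)
    if (rst) begin
      wr_ptr <= 4'd0;
      rd_ptr <= 4'd0;
    end else begin
      if (wr && !full)  wr_ptr <= wr_ptr + 4'd1;  // accept a write
      if (rd && !empty) rd_ptr <= rd_ptr + 4'd1;  // accept a read
    end
endmodule
```

Compared with a separate occupancy counter, this variant needs no up/down counter; the occupancy is implied by the difference between the two pointers.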
Example: use Verilog HDL to design a FIFO with a depth of 8 and a width of 8 bits.

// Top-level module:
module FIFO_buffer(clk,rst,write_to_stack,read_from_stack,Data_in,Data_out);
 input clk,rst;
 input write_to_stack,read_from_stack;
 input [7:0] Data_in;
 output [7:0] Data_out;
 wire [7:0] Data_out;
 wire stack_full, stack_empty;
 wire [2:0] addr_in, addr_out;
 FIFO_control U1(.stack_full(stack_full),.stack_empty(stack_empty),.write_to_stack(write_to_stack),
 				 .write_ptr(addr_in),.read_ptr(addr_out),.read_from_stack(read_from_stack),
 				 .clk(clk),.rst(rst));
 ram_dual U2(.q(Data_out),.addr_in(addr_in),.addr_out(addr_out),.d(Data_in),
  			 .we(write_to_stack),.rd(read_from_stack),.clk1(clk),.clk2(clk));
endmodule

// Control module:
module FIFO_control(write_ptr, read_ptr, stack_full, stack_empty, write_to_stack,read_from_stack, clk, rst);
 parameter stack_width=8;
 parameter stack_height=8;
 parameter stack_ptr_width=3;

 output stack_full;		//stack full flag
 output stack_empty;		//stack empty flag
 output [stack_ptr_width-1:0] read_ptr;  //read data address
 output [stack_ptr_width-1:0] write_ptr;  //write data address

 input write_to_stack;   //write data to stack
 input read_from_stack;  //read data from stack

 input clk;
 input rst;

reg [stack_ptr_width-1:0] read_ptr;
reg [stack_ptr_width-1:0] write_ptr;
reg [stack_ptr_width:0] ptr_gap;
reg [stack_width-1:0] Data_out;
reg [stack_width-1:0] stack[stack_height-1:0];

 //stack status signal
 assign stack_full=(ptr_gap==stack_height);
 assign stack_empty=(ptr_gap==0);
 
 always@(posedge clk or posedge rst)
 	begin
		if(rst)
			begin
				Data_out<=0;
				read_ptr<=0;
				write_ptr<=0;
				ptr_gap<=0;
			end
		else if(write_to_stack && (!stack_full) && (!read_from_stack))
			begin
				write_ptr<=write_ptr+1;
				ptr_gap<=ptr_gap+1;
			end
		else if(!write_to_stack && (!stack_empty) && (read_from_stack))
			begin
				read_ptr<=read_ptr+1;
				ptr_gap<=ptr_gap-1;
			end
		else if(write_to_stack && stack_empty && read_from_stack)
			begin
				write_ptr<=write_ptr+1;
				ptr_gap<=ptr_gap+1;
			end
		else if(write_to_stack && stack_full && read_from_stack)
			begin
				read_ptr<=read_ptr+1;
				ptr_gap<=ptr_gap-1;
			end
		else if(write_to_stack && read_from_stack&& (!stack_full)&&(!stack_empty))
			begin
				read_ptr<=read_ptr+1;
				write_ptr<=write_ptr+1;
			end
	end
endmodule

module FIFO_tb;
 reg clk, rst;
 reg [7:0] Data_in;
 reg write_to_stack, read_from_stack;
 wire [7:0] Data_out;
 FIFO_buffer U1(.clk(clk),.rst(rst),.write_to_stack(write_to_stack),
			    .read_from_stack(read_from_stack),.Data_in(Data_in),.Data_out(Data_out));
initial
	begin
		clk=0; rst=1; Data_in=0; write_to_stack=1; read_from_stack=0;
		#5 rst=0;
		#155 write_to_stack=0;
		read_from_stack=1;
	end
always #10 clk = ~clk;
	initial
		begin
			repeat(7)
			#20 Data_in =Data_in+1;
		end
endmodule


2.7 Keyboard Scanner and Encoder

Keypad scanners and encoders are used for manual data entry in digital systems with a keyboard. They detect whether a key is pressed and generate a scan code that uniquely identifies the key.
Example: use Verilog to design the keyboard scanner and encoder for a hexadecimal keypad circuit.
The state transition diagram of the control signals is shown in the figure below:

For details, see: Verilog implementation of the hexadecimal keyboard scanner

Once the active row and column lines have been determined, their intersection gives the position of the pressed key, and the corresponding code is output according to that position. The keyboard code table is shown below.

Key	Row[3:0]	Col[3:0]	Code
0	0001	0001	0000
1	0001	0010	0001
2	0001	0100	0010
3	0001	1000	0011
4	0010	0001	0100
5	0010	0010	0101
6	0010	0100	0110
7	0010	1000	0111
8	0100	0001	1000
9	0100	0010	1001
A	0100	0100	1010
B	0100	1000	1011
C	1000	0001	1100
D	1000	0010	1101
E	1000	0100	1110
F	1000	1000	1111
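The table has a regular structure: the upper two bits of the code are the index of the active row line and the lower two bits are the index of the active column line. The sketch below expresses that observation directly. It is an illustration only (the module name `keypad_code_sketch` and the function `onehot_index` are assumed) and is not the encoder used in the listing that follows, which spells out all sixteen cases.

```verilog
// Sketch: the code table equals {row index, column index}.
module keypad_code_sketch(row, col, code);
  input  [3:0] row, col;   // one-hot row and column lines
  output [3:0] code;

  // One-hot to 2-bit index (0001->0, 0010->1, 0100->2, 1000->3).
  function [1:0] onehot_index;
    input [3:0] v;
    case (v)
      4'b0001: onehot_index = 2'd0;
      4'b0010: onehot_index = 2'd1;
      4'b0100: onehot_index = 2'd2;
      4'b1000: onehot_index = 2'd3;
      default: onehot_index = 2'd0;  // not one-hot: treated as index 0 here
    endcase
  endfunction

  assign code = {onehot_index(row), onehot_index(col)};
endmodule
```

For example, key 5 has Row = 0010 (index 1) and Col = 0010 (index 1), which gives the code 0101.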

To make the test closer to the real physical environment, the test platform must include a signal generator that simulates the key states, a module Row_Signal that determines the row line corresponding to a key, and the module under test, Hex_Keypad_Grayhill_072. The signal generator can be embedded in the test platform; by repeatedly assigning values to the key signal it simulates different key presses. The Row_Signal module checks that a key press is valid and determines the row in which the key lies. The Synchronizer module determines whether a key is pressed by ORing the row lines. When the output of this module changes, the module under test determines the position of the key and outputs the corresponding code.

Its Verilog HDL program code is:

// Top-level module:
module keypad(clock,reset,row,code,vaild,col);
 input clock,reset;
 input [3:0] row;
 output [3:0] code;
 output vaild;
 output [3:0] col;
 wire s_row;
 hex_keypad_grayhill U1(.code(code),.col(col),.valid(vaild),
 						.row(row),.s_row(s_row),.clock(clock),.reset(reset));
 synchronizer U2(.s_row(s_row),.row(row),.clock(clock),.reset(reset));
endmodule

// Encoder module:
module hex_keypad_grayhill(code,col,valid,row,s_row,clock,reset);
 output [3:0] code;
 output valid;
 output [3:0] col;
 input [3:0] row;
 input s_row;
 input clock,reset;
 reg [3:0] col;
 reg [3:0] code;
 reg [5:0] state,next_state;
 parameter s_0=6'b000001,s_1=6'b000010,s_2=6'b000100;
 parameter s_3=6'b001000,s_4=6'b010000,s_5=6'b100000;
 assign valid=((state==s_1)|(state==s_2)|(state==s_3)|(state==s_4))&&row;
 always@(row or col)
 	case({row,col})
 		8'b0001_0001: code=0;
 		8'b0001_0010: code=1;
 		8'b0001_0100: code=2;
 		8'b0001_1000: code=3;
		8'b0010_0001: code=4;
		8'b0010_0010: code=5;
		8'b0010_0100: code=6;
		8'b0010_1000: code=7;
		8'b0100_0001: code=8;
		8'b0100_0010: code=9;
		8'b0100_0100: code=10;
		8'b0100_1000: code=11;
		8'b1000_0001: code=12;
		8'b1000_0010: code=13;
		8'b1000_0100: code=14;
		8'b1000_1000: code=15;
		default: code=0;
	endcase
 always@(state or s_row or row) 	//next-state logic
begin
	col=0; next_state=state;
	case(state)
		s_0:
			begin
				col=15;
				if(s_row) next_state=s_1;
			end
		s_1:
			begin
				col=1;
					if(row) next_state=s_5;
					else next_state=s_2;
			end
		s_2:
			begin
				col=2;
				if(row) next_state=s_5;
				else	next_state=s_3;
			end
		s_3:
			begin
				col=4;
				if(row) next_state=s_5;
				else next_state=s_4;
			end
		s_4:
			begin
				col=8;
				if(row) next_state=s_5;
				else next_state=s_0;
			end
		s_5: 
			begin
				col=15;
				if(!row) next_state=s_0;
			end
	endcase
end
always@(posedge clock or posedge reset)
	if(reset)
		state<=s_0;
	else
		state<=next_state;
endmodule

module synchronizer(s_row,row,clock,reset);
 output s_row;
 input [3:0] row;
 input clock,reset;
 reg a_row,s_row;
always@(negedge clock or posedge reset)
	begin 
		if(reset)
			begin
				a_row<=0;
				s_row<=0;
			end
		else
			begin
				a_row<=(row[0]||row[1]||row[2]||row[3]);
				s_row<=a_row;
			end
	end
endmodule

// Simulated keypad signal generator
module row_signal(row,key,col);
 output [3:0] row;
 input [15:0] key;
 input[3:0] col;
 reg[3:0] row;
 always@(key or col)
 	begin
		row[0]=key[0]&&col[0]||key[1]&&col[1]||key[2]&&col[2]||key[3]&&col[3];
		row[1]=key[4]&&col[0]||key[5]&&col[1]||key[6]&&col[2]||key[7]&&col[3];
		row[2]=key[8]&&col[0]||key[9]&&col[1]||key[10]&&col[2]||key[11]&&col[3];
		row[3]=key[12]&&col[0]||key[13]&&col[1]||key[14]&&col[2]||key[15]&&col[3];
	end
endmodule


//Testbench
module hex_keypad_grayhill_tb;
 wire [3:0] code;
 wire valid;
 wire [3:0] col;
 wire [3:0] row;
 reg clock;
 reg reset;
 reg [15:0] key;
 integer j,k;
 reg [39:0] pressed;
 parameter [39:0] key_0="key_0";
 parameter [39:0] key_1="key_1";
 parameter [39:0] key_2="key_2";
 parameter [39:0] key_3="key_3";
 parameter [39:0] key_4="key_4";
 parameter [39:0] key_5="key_5";
 parameter [39:0] key_6="key_6";
 parameter [39:0] key_7="key_7";
 parameter [39:0] key_8="key_8";
 parameter [39:0] key_9="key_9";
 parameter [39:0] key_A="key_A";
 parameter [39:0] key_B="key_B";
 parameter [39:0] key_C="key_C";
 parameter [39:0] key_D="key_D";
 parameter [39:0] key_E="key_E";
 parameter [39:0] key_F="key_F";
 parameter [39:0] None="None";
 keypad U1(.clock(clock),.reset(reset),.row(row),
 		  .code(code),.vaild(valid),.col(col));		//top module
 row_signal U2(.row(row),.key(key),.col(col)); 	// simulated key signal generation
 
 always@(key)
 	begin
 		case(key)
 			16'h0000: pressed=None;
 			16'h0001: pressed=key_0;
 			16'h0002: pressed=key_1;
 			16'h0004: pressed=key_2;
 			16'h0008: pressed=key_3;
 			16'h0010: pressed=key_4;
 			16'h0020: pressed=key_5;
 			16'h0040: pressed=key_6;
 			16'h0080: pressed=key_7;
			16'h0100: pressed=key_8;
			16'h0200: pressed=key_9;
			16'h0400: pressed=key_A;
			16'h0800: pressed=key_B;
			16'h1000: pressed=key_C;
			16'h2000: pressed=key_D;
			16'h4000: pressed=key_E;
			16'h8000: pressed=key_F;
			default: pressed=None;
		endcase
	end
initial #2000 $stop;
initial 
	begin 
		clock=0;
		forever #5 clock=~clock;
	end
initial 
	begin 
		reset=1;
		#10 reset=0;
	end
initial 
	begin 
		for(k=0;k<=1;k=k+1)
			begin 
				key=0;
				#20 for(j=0;j<16;j=j+1)
							begin
								#20 key[j]=1;
								#60 key=0;
							end
			end
	end
endmodule

2.8 Verilog design of log function

The log function is a typical unary (single-operand) function; other examples are exponential and trigonometric functions. There are two simple approaches to designing a hardware accelerator for such functions: a lookup table, or approximation by a polynomial obtained from a Taylor series expansion. The two approaches differ considerably in design method and precision. The lookup-table approach is built around a memory and is simple to design, but higher accuracy requires a deeper memory, which occupies a large area in an integrated circuit; it is therefore usually used for approximate calculations that do not need high precision. The Taylor-series approach is built from multipliers and adders, and its accuracy can be improved by increasing the number of terms in the expansion.
Example: use Verilog HDL to design a log function based on a lookup table, with a 4-bit input signal and an 8-bit output signal. The input data has one integer bit and three fractional bits, giving a resolution of $2^{-3}$, and the output has two integer bits and six fractional bits, giving a resolution of $2^{-6}$. Its Verilog code is:

module log_lookup(x,clk,out);
 input [3:0] x;
 input clk;
 output [7:0] out;
 reg [7:0] out;
 always@(posedge clk)
 	begin
		case(x)
			4'b1000:out<=8'b00000000;
			4'b1001:out<=8'b00000111;
			4'b1010:out<=8'b00001110;
			4'b1011:out<=8'b00010101;
			4'b1100:out<=8'b00011001;
			4'b1101:out<=8'b00100000;
			4'b1110:out<=8'b00100100;
			4'b1111:out<=8'b00101000;
			default:out<=8'bz;
		endcase
	end
endmodule

module log_lookup_tb;
 reg clk;
 reg [3:0]x;
 wire [7:0] out;
 initial
	begin
		x=4'b1000;
		clk=1'b0;
		repeat(7)
		#10 x=x+1;
	end
	always #5 clk=~clk;
  log_lookup U1(.x(x),.clk(clk),.out(out));
 endmodule
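The entries of the lookup table in `log_lookup` can be related to the fixed-point formats by hand. The two rows below assume that the table stores the natural logarithm; that base is inferred from the stored values and is not stated in the text. The input is read as a number with three fractional bits and the output is scaled by $2^{6}$.

```latex
\begin{aligned}
x = 1.001_2 = 1.125 &:\quad \ln 1.125 \approx 0.1178,\quad 0.1178 \times 2^{6} \approx 7.5
  \;\Rightarrow\; 00000111_2 = 7/64 \approx 0.1094 \\
x = 1.010_2 = 1.25 &:\quad \ln 1.25 \approx 0.2231,\quad 0.2231 \times 2^{6} \approx 14.3
  \;\Rightarrow\; 00001110_2 = 14/64 \approx 0.2188
\end{aligned}
```

The stored value can differ from the true value by up to about one output step of $2^{-6}$; extending the table to finer input steps would require a wider address and therefore a deeper memory.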

Example: use Verilog to design a log function based on a Taylor series expansion, with a 4-bit input signal and an 8-bit output signal.
Definition of the Taylor series: if the function f(x) has derivatives up to order (n+1) in a neighborhood of $x_0$, then the n-th order Taylor formula of f(x) in this neighborhood is:
$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k + R_n(x), \qquad R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)^{n+1}$$
where $\xi$ lies between $x_0$ and $x$.
A Taylor series approximates a complex function by a sum of polynomial terms, which simplifies its hardware implementation. The Taylor expansion of $\log_a x$ at $x_0 = b$ is:
$$\log_a x = \log_a b + \frac{1}{\ln a}\sum_{k=1}^{n} \frac{(-1)^{k-1}}{k\,b^{k}}(x - b)^k + R_n(x)$$
The error range is:
$$|R_n(x)| = \frac{|x - b|^{n+1}}{(n+1)\,\xi^{n+1}\,|\ln a|}$$
At $x_0 = 1$, it is:
$$\log_a x = \frac{1}{\ln a}\sum_{k=1}^{n} \frac{(-1)^{k-1}}{k}(x - 1)^k + R_n(x)$$
The error range is:
$$|R_n(x)| = \frac{|x - 1|^{n+1}}{(n+1)\,\xi^{n+1}\,|\ln a|}$$
The circuit structure diagram is as follows:
The log function above is expanded at X = 1, and X must lie in the range 1 < X < 2. The 4-bit binary input X has one integer bit and three fractional bits (resolution $2^{-3}$); the 8-bit binary output has two integer bits and six fractional bits (resolution $2^{-6}$). The multipliers and subtractors used in the design are those given above.
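As a worked instance, the block below takes the natural logarithm expanded at 1 and truncated after the quadratic term. The choice of base e and of order 2 is an assumption made for illustration, suggested by the shape of the `log` module that follows, which subtracts a scaled square of (x - 1) from a scaled (x - 1). The scaling constants in that module are the author's fixed-point choices and are not derived here.

```latex
\begin{aligned}
\ln x &\approx (x - 1) - \tfrac{1}{2}(x - 1)^2, \qquad
|R_2(x)| = \frac{(x - 1)^3}{3\,\xi^{3}} < \tfrac{1}{3}(x - 1)^3 \quad (1 < \xi < x < 2) \\
x = 1.5 &:\quad 0.5 - \tfrac{1}{2}(0.25) = 0.375, \qquad \ln 1.5 \approx 0.4055 \\
\text{error} &\approx 0.0305 < \tfrac{1}{3}(0.5)^3 \approx 0.0417
\end{aligned}
```

The bound grows with the cube of (x - 1), so the approximation is tight near x = 1 and loosest towards x = 2; adding the cubic term would be the next step if more accuracy were needed.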

module log(x,out);
 input[3:0] x;
 output[7:0] out;
 wire [3:0] out1;
 wire [7:0] out2,out3, out5, out;
 wire [3:0] out4;
 assign out4={out3[7:4]};
 assign out1=x-4'b1000;
 wallace U1(.x(out1),.y(4'b0111),.out(out2));
 wallace U2(.x(out1),.y(out1),.out(out3));
 wallace U3(.x(out4),.y(4'b011),.out(out5)); 
 assign out=out2-out5;
endmodule

module log_tb;
 reg [3:0] x=4'b1000;
  wire [7:0] out;
  log U1(.x(x),.out(out));
  always
 	 #10 x=x+1;
  always@(x)
  	begin
  		if(x==4'b0000)
  			$stop;
	end
endmodule


2.9 Verilog Implementation of CORDIC Algorithm

The CORDIC (Coordinate Rotation Digital Computer) algorithm recursively computes common functions such as sin, cos, sinh, and cosh using only shift and add/subtract operations. It was first used in navigation systems, where it allowed vector rotation and vectoring operations to be carried out without trigonometric table lookups, multiplications, square roots, or inverse trigonometric functions. In 1971 J. Walther proposed a unified algorithm that can compute a variety of transcendental functions: by introducing a parameter m, the three iteration modes realized by CORDIC (trigonometric, hyperbolic, and linear) are brought under a single expression. This forms the mathematical basis of the CORDIC algorithm in use today. The basic idea of the algorithm is to approach the required rotation angle through a series of fixed angles related to the base of the arithmetic. A rotation by an angle θ is described by the following equations:
$$x' = x\cos\theta - y\sin\theta, \qquad y' = y\cos\theta + x\sin\theta$$
Factoring out cos θ gives:
$$x' = \cos\theta\,(x - y\tan\theta), \qquad y' = \cos\theta\,(y + x\tan\theta)$$
Here the angle of every iteration is restricted so that tan θ_n = ±2^{-n}, which turns the multiplication by tan θ_n into a shift. The equations then become:
$$x_{n+1} = K_n\,(x_n - S_n 2^{-n} y_n), \qquad y_{n+1} = K_n\,(y_n + S_n 2^{-n} x_n)$$
where S_n = ±1 and K_n = cos(arctan 2^{-n}) = 1/\sqrt{1+2^{-2n}}. As the number of iterations increases, the product of the K_n converges to a constant:
$$K = \prod_{n} \frac{1}{\sqrt{1+2^{-2n}}} \approx 0.6073$$
K is a constant gain, which can be ignored for the time being. The iteration then becomes:
$$x_{n+1} = x_n - S_n 2^{-n} y_n, \qquad y_{n+1} = y_n + S_n 2^{-n} x_n$$
If Z is used to represent the partial sum of the accumulated phase, then:
$$Z_{n+1} = Z_n - S_n \arctan(2^{-n})$$
If Z is to be rotated to 0, the sign S_n is determined by Z_n as follows:
$$S_n = \begin{cases} +1, & Z_n \ge 0 \\ -1, & Z_n < 0 \end{cases}$$
The final result after N rotations is:
$$x_N = A_N\,(x_0\cos Z_0 - y_0\sin Z_0), \qquad y_N = A_N\,(y_0\cos Z_0 + x_0\sin Z_0), \qquad Z_N \approx 0, \qquad A_N = \prod_{n=0}^{N-1}\sqrt{1+2^{-2n}}$$
For the special set of initial values:
$$x_0 = \frac{1}{A_N}, \qquad y_0 = 0$$
the result obtained is:
$$x_N = \cos Z_0, \qquad y_N = \sin Z_0$$
This working mode is called the rotation mode. Through the rotation mode, the sin and cos values of an angle can be obtained.
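The rotation-mode recurrence can be checked numerically before committing to fixed-point hardware. The following is a minimal behavioral sketch, not the course's implementation: the module name cordic_rotation_demo, the 16-iteration count, the 30-degree test angle, and the use of real variables are all assumptions for illustration, and $atan requires a simulator that supports the IEEE 1364-2005 math system functions.

```verilog
// Behavioral model of CORDIC rotation mode (simulation only, not synthesizable).
// All names and constants here are illustrative assumptions.
module cordic_rotation_demo;
  real x, y, z, x_next, pow2;
  integer n;
  initial begin
    x    = 0.6072529350;              // x0 = 1/A_N: pre-compensates the CORDIC gain
    y    = 0.0;                       // y0 = 0
    z    = 3.14159265358979 / 6.0;    // Z0 = 30 degrees, in radians
    pow2 = 1.0;                       // holds 2^-n for the current iteration
    for (n = 0; n < 16; n = n + 1) begin
      if (z >= 0.0) begin             // S_n = +1: rotate counterclockwise
        x_next = x - y * pow2;
        y      = y + x * pow2;        // still uses the old x
        z      = z - $atan(pow2);
      end else begin                  // S_n = -1: rotate clockwise
        x_next = x + y * pow2;
        y      = y - x * pow2;        // still uses the old x
        z      = z + $atan(pow2);
      end
      x    = x_next;
      pow2 = pow2 / 2.0;              // next shift amount
    end
    $display("cos = %f, sin = %f, residual angle = %f", x, y, z);
  end
endmodule
```

For the 30-degree input used here, the printed values should be close to cos 30° ≈ 0.866 and sin 30° = 0.5, with a residual angle near zero. In hardware, the multiplications by 2^-n become wired shifts and the $atan values become per-stage constants.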

Iterative structure
Mapping the CORDIC equations directly onto hardware gives the iterative CORDIC structure, shown in the figure below.
[Figure: iterative CORDIC structure]
Pipeline structure
Although the pipeline structure occupies more resources than the iterative structure, it greatly improves data throughput. The pipeline structure unrolls the iterative structure so that each of the n processing units performs one fixed iteration and all units work in parallel. Its structure is shown in the figure below.
[Figure: pipelined CORDIC structure]
Example: use Verilog HDL to design a CORDIC unit based on a 7-stage pipeline structure to compute sine and cosine. The CORDIC algorithm starts from initial X and Y values, and the input variable Z is the angle. In each stage, X and Y are first passed through shifters with a fixed shift amount, and the results are fed to adder/subtractors whose operation (add or subtract) is selected by the output of the angle accumulator. This completes one iteration. The result is passed to the next stage as its input, and the iterations are carried out in sequence. When the required number of iterations has been reached (7 in this example), the output is the desired result. The entire CORDIC processor is therefore an array of interconnected adder/subtractors.
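The module below also handles angles outside the first quadrant by folding them back. As a sketch of the reasoning, assume the 8-bit phase word spans one full turn, so that 8'h40 corresponds to 90° (this is how the subtraction constants in the code read, but it is not stated in the text). The input angle is split as θ = φ + q·90° with q = phase_in[7:6], CORDIC is run on φ only, and the outputs are swapped and negated according to these identities:

```latex
\begin{array}{c|c|c}
q & \sin\theta & \cos\theta \\ \hline
0 & \sin\varphi & \cos\varphi \\
1 & \cos\varphi & -\sin\varphi \\
2 & -\sin\varphi & -\cos\varphi \\
3 & -\cos\varphi & \sin\varphi
\end{array}
\qquad \theta = \varphi + q \cdot 90^\circ,\quad 0 \le \varphi < 90^\circ
```

Under the same assumption, the first stage constant is arctan(2^0) = 45° = 256/8 = 32 = 8'h20. The initial value x0 = 8'h4D = 77 is close to 0.6073 × 2^7 ≈ 77.7, which suggests that amplitudes are scaled by 2^7 and that x0 carries the gain compensation 1/A_N.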

module sincos(clk,rst_n,ena,phase_in,sin_out,cos_out,eps);
 parameter DATA_WIDTH=8;
 parameter PIPELINE=8;
 input clk;
 input rst_n;
 input ena;
 input [DATA_WIDTH-1:0] phase_in;
 output [DATA_WIDTH-1:0] sin_out;
 output [DATA_WIDTH-1:0] cos_out;
 output [DATA_WIDTH-1:0] eps;
 reg [DATA_WIDTH-1:0] sin_out;
 reg [DATA_WIDTH-1:0] cos_out;
 reg [DATA_WIDTH-1:0] eps;
 reg [DATA_WIDTH-1:0] phase_in_reg;
 reg [DATA_WIDTH-1:0] x0,y0,z0;
 wire [DATA_WIDTH-1:0] x1,y1,z1;
 wire [DATA_WIDTH-1:0] x2,y2,z2;
 wire [DATA_WIDTH-1:0] x3,y3,z3;
 wire [DATA_WIDTH-1:0] x4,y4,z4;
 wire [DATA_WIDTH-1:0] x5,y5,z5;
 wire [DATA_WIDTH-1:0] x6,y6,z6;
 wire [DATA_WIDTH-1:0] x7,y7,z7;

 reg [1:0] quadrant[PIPELINE:0];
 integer i;
 // Fold the input phase into the first quadrant; phase_in[7:6] is the quadrant.
 always@(posedge clk or negedge rst_n)
 	begin
		if(!rst_n)
			phase_in_reg<=8'b0000_0000;
		else
			if(ena)
				begin
					case(phase_in[7:6])
						2'b00:phase_in_reg<=phase_in;
						2'b01:phase_in_reg<=phase_in-8'h40;
						2'b10:phase_in_reg<=phase_in-8'h80;
						2'b11:phase_in_reg<=phase_in-8'hc0;
					endcase
				end
	end
 // Load the initial vector (x0,y0) and the folded angle z0.
always@(posedge clk or negedge rst_n)
	begin
		if(!rst_n)
			begin
				x0<=8'b00000000;
				y0<=8'b00000000;
				z0<=8'b00000000;
			end
		else
			if(ena)
				begin
					x0<=8'h4D;
					y0<=8'h00;
					z0<=phase_in_reg;
				end
	end
 // Pipeline stages; parameters are (DATA_WIDTH, shift, constant).
Iteration #(8,0,8'h20)u1(.clk(clk),.rst_n(rst_n),.ena(ena),.x0(x0),.y0(y0),.z0(z0),.x1(x1),.y1(y1),.z1(z1));
Iteration #(8,1,8'h12)u2(.clk(clk),.rst_n(rst_n),.ena(ena),.x0(x1),.y0(y1),.z0(z1),.x1(x2),.y1(y2),.z1(z2));
Iteration #(8,2,8'h09)u3(.clk(clk),.rst_n(rst_n),.ena(ena),.x0(x2),.y0(y2),.z0(z2),.x1(x3),.y1(y3),.z1(z3));
Iteration #(8,3,8'h04)u4(.clk(clk),.rst_n(rst_n),.ena(ena),.x0(x3),.y0(y3),.z0(z3),.x1(x4),.y1(y4),.z1(z4));
Iteration #(8,4,8'h02)u5(.clk(clk),.rst_n(rst_n),.ena(ena),.x0(x4),.y0(y4),.z0(z4),.x1(x5),.y1(y5),.z1(z5));
Iteration #(8,5,8'h01)u6(.clk(clk),.rst_n(rst_n),.ena(ena),.x0(x5),.y0(y5),.z0(z5),.x1(x6),.y1(y6),.z1(z6));
Iteration #(8,6,8'h00)u7(.clk(clk),.rst_n(rst_n),.ena(ena),.x0(x6),.y0(y6),.z0(z6),.x1(x7),.y1(y7),.z1(z7));
 // Delay line for the quadrant bits; quadrant[7] lines up with x6/y6/z6.
always@(posedge clk or negedge rst_n)
	begin
		if(!rst_n)
			for(i=0;i<=PIPELINE;i=i+1)
				quadrant[i]<=2'b00;
		else
			if(ena)
				begin
					// i<PIPELINE keeps quadrant[i+1] inside the array bounds
					for(i=0;i<PIPELINE;i=i+1)
						quadrant[i+1]<=quadrant[i];
					quadrant[0]<=phase_in[7:6];
				end
	end
 // Undo the quadrant folding by swapping and negating (two's complement).
always@(posedge clk or negedge rst_n)
	begin
		if(!rst_n)
			begin
				sin_out<=8'b00000000;
				cos_out<=8'b00000000;
				eps<=8'b00000000;
			end
		else
			if(ena)
				case(quadrant[7])
					2'b00:
						begin
							sin_out<=y6;
							cos_out<=x6;
							eps<=z6;
						end
					2'b01:
						begin
							sin_out<=x6;
							cos_out<=~(y6)+1'b1;
							eps<=z6;
						end
					2'b10:
						begin
							sin_out<=~(y6)+1'b1;
							cos_out<=~(x6)+1'b1;
							eps<=z6;
						end
					2'b11:
						begin
							sin_out<=~(x6)+1'b1;
							cos_out<=y6;
							eps<=z6;
						end
				endcase
	end
endmodule

// Iteration module: one CORDIC stage
module Iteration(clk,rst_n,ena,x0,y0,z0,x1,y1,z1);
 parameter DATA_WIDTH=8;
 parameter shift=0;
 parameter constant=8'h20;
 input clk,rst_n,ena;
 input [DATA_WIDTH-1:0] x0,y0,z0;
 output [DATA_WIDTH-1:0] x1,y1,z1;
 reg [DATA_WIDTH-1:0] x1,y1,z1;
 // {{shift{msb}},v[DATA_WIDTH-1:shift]} is a sign-extended right shift by "shift".
 // For shift=0 the replication is empty, which Verilog-2005 allows inside a concatenation.
always@(posedge clk or negedge rst_n)
	begin
		if(!rst_n)
			begin
				x1<=8'b00000000;
				y1<=8'b00000000;
				z1<=8'b00000000;
			end
		else
			if(ena)
				if(z0[7]==1'b0)
					begin
						// residual angle is non-negative: rotate counterclockwise
						x1<=x0-{{shift{y0[DATA_WIDTH-1]}},y0[DATA_WIDTH-1:shift]};
						y1<=y0+{{shift{x0[DATA_WIDTH-1]}},x0[DATA_WIDTH-1:shift]};
						z1<=z0-constant;
					end
				else
					begin
						// residual angle is negative: rotate clockwise
						x1<=x0+{{shift{y0[DATA_WIDTH-1]}},y0[DATA_WIDTH-1:shift]};
						y1<=y0-{{shift{x0[DATA_WIDTH-1]}},x0[DATA_WIDTH-1:shift]};
						z1<=z0+constant;
					end
	end
endmodule

module sincos_tb;
	reg clk,rst_n,ena;
	reg [7:0] phase_in;
	wire [7:0] sin_out,cos_out,eps;
	sincos U1(.clk(clk),.rst_n(rst_n),.ena(ena),.phase_in(phase_in),
			  .sin_out(sin_out),.cos_out(cos_out),.eps(eps));
	initial
		begin
			clk=0;rst_n=0;ena=1;
			phase_in=8'b00000000;
			#3 rst_n=1;
		end
	always #5 clk=~clk;
	always #10
	phase_in=phase_in+1;
endmodule


3. Bus controller design

3.1 UART interface controller

The serial port is also called a UART (Universal Asynchronous Receiver/Transmitter). In practical applications, usually only the two pins TXD and RXD are used, and the other pins are left unused. The timing of the UART interface is shown in the figure:
[Figure: UART interface timing]
The structure of a simple UART is shown in the figure below:

Sending module: the sending module sends data in serial form and adds a start bit and a stop bit to each group of serial data. When the byte_ready signal is active, the data is loaded into the shift register together with the start bit (low level) and the stop bit (high level). When the byte_ready signal is inactive, the shift register shifts and sends the data out serially.
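To make the bit order on the line concrete, here is a small simulation-only sketch that builds one 10-bit frame the way the sending module describes (stop bit, eight data bits, start bit) and prints the bits in the order a right-shifting register would send them. The module name uart_frame_demo and the test byte are assumptions chosen for illustration.

```verilog
// Prints the order in which one UART frame leaves a right-shifting register.
// Names and the test byte are illustrative assumptions.
module uart_frame_demo;
  reg [7:0] data;
  reg [9:0] frame;
  integer k;
  initial begin
    data  = 8'b10101010;
    frame = {1'b1, data, 1'b0};   // {stop bit, data[7:0], start bit}
    // frame[0] is sent first, so the line carries:
    // start bit, data[0] ... data[7] (LSB first), stop bit
    for (k = 0; k < 10; k = k + 1)
      $display("bit %0d on the line: %b", k, frame[k]);
  end
endmodule
```

Because the least significant data bit goes out first, a receiver has to place the first received bit at the LSB end of its result to recover the original byte.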

module UART_transmitter(clk,reset,byte_ready,data,TXD);
 input clk,reset;
 input byte_ready;
 input [7:0] data;
 output TXD;
 reg [9:0] shift_reg;
 // The line idles high; bit 0 of the shift register drives TXD.
assign TXD=shift_reg[0];
always@(posedge clk or negedge reset)
	begin
		if(!reset)
			shift_reg<=10'b1111111111;
		else 
			if(byte_ready)
				// load {stop bit, data, start bit}
				shift_reg<={1'b1,data,1'b0};
			else
				// shift right, filling with the idle level
				shift_reg<={1'b1,shift_reg[9:1]};
	end
endmodule

Receiving module: The function of the receiving module is to receive the serial data output by the sending module, and send the data into the memory in parallel. When the receiving module detects the start bit (low level), it starts to receive data, and the input serial data is stored in the shift register, and the data is output in parallel when the reception is completed.

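module UART_receiver(clk,reset,RXD,data_out);
 parameter idle=2'b00;
 parameter receiving=2'b01;
 input clk,reset;
 input RXD;
 output [7:0] data_out;
 reg shift;      // shift RXD into shift_reg on the next clock edge
 reg inc_count;  // advance the bit counter
 reg load;       // frame complete: copy shift_reg to data_out
 reg [7:0] data_out;
 reg [7:0] shift_reg;
 reg [3:0] count;
 reg [1:0] state,next_state;
 // Next-state and control logic (combinational). Registers are only
 // written in the clocked block below, so each has a single driver.
 always@(state or RXD or count)
 	begin
		shift=0;
		inc_count=0;
		load=0;
		next_state=state;
		case(state)
			idle:
				if(!RXD)
					next_state=receiving;
			receiving:
				begin
					if(count==8)
						begin
							load=1;
							next_state=idle;
						end
					else
						begin
							inc_count=1;
							shift=1;
						end
				end
			default:next_state=idle;
		endcase
	end
 // State, counter and data registers
 always@(posedge clk or negedge reset)
	begin
		if(!reset)
			begin
				data_out<=8'b0;
				shift_reg<=8'b0;
				count<=0;
				state<=idle;
			end
		else 
			begin
				state<=next_state;
				// the transmitter sends the LSB first, so shift right
				if(shift)
					shift_reg<={RXD,shift_reg[7:1]};
				if(load)
					begin
						data_out<=shift_reg;
						count<=0;
					end
				else if(inc_count)
					count<=count+1;
			end
	end
endmodule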

module UART_tb;
 reg clk,reset;
 reg [7:0] data;
 reg byte_ready;
 wire [7:0] data_out;
 wire serial_data;
 initial
 	begin
 		clk=0;
		reset=0;
		byte_ready=0;
		data=8'b10101010;
		#40 byte_ready=1;
		#50 reset=1;
		#170 byte_ready=0;
	end
 always #80 clk=~clk;
 UART_transmitter U1(.clk(clk),.reset(reset),.byte_ready(byte_ready),
					.data(data),.TXD(serial_data));
 UART_receiver U2(.clk(clk),.reset(reset),.RXD(serial_data),
					.data_out(data_out));
endmodule


3.2 SPI interface controller

The Serial Peripheral Interface (SPI) is a synchronous serial peripheral interface that lets microcontrollers exchange data serially with each other or with various peripherals. An SPI bus can be shared, which makes it convenient to build a system with multiple SPI devices, and it offers a high transfer rate, programmability, few connecting lines, and good scalability. The SPI bus usually has 4 lines: the serial clock line (SCLK), the master input/slave output data line (MISO), the master output/slave input data line (MOSI), and the active-low slave select line (SS_N). Devices in an SPI system fall into two categories: master devices and slave devices. The master device provides the SPI clock signal and the chip select signal, and a slave device is any integrated circuit that receives the SPI signals. When the SPI is working, the data in the shift register is output bit by bit from the output pin (MOSI) while data is received bit by bit from the input pin (MISO). Both sending and receiving are controlled by the SPI master clock signal (SCLK), which ensures synchronization. Therefore, there can be only one master device, but there can be multiple slave devices, and one or more slave devices can be selected at the same time through the chip select signal (SS_N).
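The simultaneous shift-out and shift-in described above can be pictured as two shift registers connected in a ring. The following simulation-only sketch illustrates that idea; the module name spi_exchange_demo, the register names, and the test values are assumptions, the slave select line is omitted, and both registers shift on the same clock edge for brevity, whereas real SPI devices usually drive and sample on opposite edges.

```verilog
// Two 8-bit shift registers in a ring exchange their contents in 8 clocks.
// Names and test values are illustrative assumptions.
module spi_exchange_demo;
  reg sclk = 0;
  reg [7:0] master_sr = 8'hA5;      // byte the master wants to send
  reg [7:0] slave_sr  = 8'h3C;      // byte the slave wants to send
  wire mosi = master_sr[7];         // master shifts out its MSB
  wire miso = slave_sr[7];          // slave shifts out its MSB
  integer k;
  always @(posedge sclk) begin
    master_sr <= {master_sr[6:0], miso};   // master takes in the slave's bit
    slave_sr  <= {slave_sr[6:0], mosi};    // slave takes in the master's bit
  end
  initial begin
    for (k = 0; k < 8; k = k + 1) begin
      #5 sclk = 1;
      #5 sclk = 0;
    end
    $display("master=%h slave=%h", master_sr, slave_sr);   // master=3c slave=a5
  end
endmodule
```

After eight clock pulses each side holds the byte the other side started with, which is why one SCLK signal is enough to keep sending and receiving in step.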
The typical structure of an SPI system is shown in the figure below:
[Figure: typical SPI structure]
The typical timing diagram of the SPI bus is shown in the figure below:

Example: use Verilog to design a simplified SPI receiver that receives 8-bit data. The block diagram of the SPI receiver is shown in the figure:

module SPI(sdout,MISO,sclk,srst,sen,ss_n);
 output [7:0] sdout;
 output ss_n;
 input MISO,sclk,srst,sen;
 reg [2:0] counter;
 reg [7:0] shift_regist;
 reg ss_n;
 // Bit counter: ss_n is set on the 8th enabled clock (counter==111).
 always @(posedge sclk)
 	if (!srst)
		begin
			counter<=3'b000;
			ss_n<=1'b0;	// also clear ss_n so sdout is defined after reset
		end
	else if(sen)
		begin
			if (counter==3'b111)
				begin
					counter<=3'b000;
					ss_n<=1'b1;
				end
			else
				begin
					counter<=counter+1;
					ss_n<=1'b0;
				end
		end
	else
		counter<=counter;
 // Shift register: MISO enters at the LSB while sen is high.
 always@(posedge sclk)
 	if(sen)
		shift_regist<={shift_regist[6:0],MISO};
	else
		shift_regist<=shift_regist;
	assign sdout=ss_n?shift_regist:8'b00000000;
endmodule

In the code, sclk is the interface clock. srst is the clear signal, active low. sen is the interface enable signal, active high. ss_n is the chip select signal, which selects the slave device and is active high in this design. When the circuit is powered on, the clear signal is asserted first to initialize the circuit. When the sen enable signal is asserted, data transfer starts. Since the data is 8 bits wide, sen should be held for at least 8 clock cycles. Once all 8 bits have been shifted in, the chip select signal ss_n becomes active, selects the slave device, and the data is output as a whole. ss_n is generated by a 3-bit counter: when the counter reaches state 111, ss_n = 1, and in all other states ss_n = 0.

module SPI_tb;
 reg MISO,sclk,sen,srst;
 wire [7:0] sdout;
 wire ss_n;
 SPI U1(.sdout(sdout),.MISO(MISO),.sclk(sclk),.srst(srst),.sen(sen),.ss_n(ss_n));
 initial
	begin
		MISO=0;sclk=0;srst=0;sen=0;
		#10 srst=1;
		#10 sen=1;
		#80 sen=0;
		#10 sen=1;
		#80 sen=0;
	end
initial
	begin
		#30 MISO=1;
		#10 MISO=0;
		#10 MISO=1;
		#10 MISO=0;
		#10 MISO=1;
		#10 MISO=0;
		#10 MISO=1;
		#20 MISO=1;
		#10 MISO=0;
		#10 MISO=1;
		#10 MISO=0;
		#10 MISO=1;
		#10 MISO=0;
		#10 MISO=1;
		#10 MISO=0;
	end
always #5 sclk<=~sclk;
endmodule

Source: Teacher Cai Jueping's Verilog course
