FPGA Zero to Hero Vol 3

In the previous blog post I wrote some very basic code to toggle a pin on an Altera-based FPGA board. I will continue my journey with the board, and this time we will try to get a UART working on it. Altera provides an RS232 IP core and an example for Quartus II, which has probably not changed much for Quartus Prime Lite, but we are going to go the hard way and implement a bare-metal UART (a small test, not production ready) for educational purposes only. We will also connect it to a Raspberry Pi UART port to send and receive data between the two boards.

To create a UART TX and RX we need to understand how UART works. A Universal Asynchronous Receiver/Transmitter (UART) is a popular hardware communication protocol used in serial communication. It enables asynchronous transmission of data between devices without the need for a shared clock. Not sharing a clock is part of what makes it popular: handling a clock line, as in protocols like the Serial Peripheral Interface (SPI, more about it in future blogs), is not easy from a hardware design perspective. SPI also typically needs four or more wires, while UART needs only two signal wires.

Here are the basic features of the UART protocol: it has a UART transmitter and a UART receiver, and each frame consists of a start bit, data bits, an optional parity bit, and a stop bit.

[UART Transmitter] --(TX: Start Bit, Data Bits, Parity, Stop Bit)--> [UART Receiver]
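As a rough illustration of the frame layout, here is a small Python sketch that lists the line levels for one byte. The function name `uart_frame` is made up for this example, and it assumes the same format the Verilog in this post uses: one start bit, eight data bits sent least significant bit first, no parity bit, and one stop bit.

```python
def uart_frame(byte, data_bits=8):
    """Return the line levels for one frame without parity (8N1 by default)."""
    if not 0 <= byte < (1 << data_bits):
        raise ValueError("byte does not fit in data_bits")
    start = [0]                                         # start bit pulls the idle-high line low
    data = [(byte >> i) & 1 for i in range(data_bits)]  # least significant bit first
    stop = [1]                                          # stop bit returns the line to idle
    return start + data + stop


if __name__ == "__main__":
    print(uart_frame(0x55))
```

Each list entry stays on the wire for one bit period, which is what the baud counter in the transmitter enforces.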

UART Transmitter Code

`timescale 1ns / 1ps

module uart_tx #(
    parameter CLOCK_FREQ = 50000000,
    parameter BAUD_RATE  = 115200
)(
    input clk,
    input rst,
    input [7:0] tx_data,  
    input tx_start,      
    output reg tx,       
    output reg tx_busy 
);

    localparam integer BAUD_COUNTER_MAX = CLOCK_FREQ / BAUD_RATE;
    localparam [2:0] IDLE  = 3'b000,
                     START = 3'b001,
                     DATA  = 3'b010,
                     STOP  = 3'b011;

    reg [2:0] state;
    reg [15:0] baud_cnt;
    reg [3:0] bit_index;
    reg [7:0] tx_shift;

    always @(posedge clk or posedge rst) begin
        if (rst) begin
            state     <= IDLE;
            tx        <= 1'b1;
            baud_cnt  <= 0;
            bit_index <= 0;
            tx_busy   <= 0;
        end else begin
            case (state)
                IDLE: begin
                    tx <= 1'b1;
                    baud_cnt <= 0;
                    bit_index <= 0;
                    tx_busy <= 0;
                    if (tx_start) begin
                        state    <= START;
                        tx_busy  <= 1;
                        tx_shift <= tx_data;
                    end
                end
                START: begin
                    tx <= 1'b0;
                    if (baud_cnt < BAUD_COUNTER_MAX-1)
                        baud_cnt <= baud_cnt + 1;
                    else begin
                        baud_cnt <= 0;
                        state    <= DATA;
                    end
                end
                DATA: begin
                    tx <= tx_shift[0];
                    if (baud_cnt < BAUD_COUNTER_MAX-1)
                        baud_cnt <= baud_cnt + 1;
                    else begin
                        baud_cnt  <= 0;
                        tx_shift  <= tx_shift >> 1;
                        bit_index <= bit_index + 1;
                        if (bit_index == 7)
                            state <= STOP;
                    end
                end
                STOP: begin
                    tx <= 1'b1;
                    if (baud_cnt < BAUD_COUNTER_MAX-1)
                        baud_cnt <= baud_cnt + 1;
                    else begin
                        baud_cnt <= 0;
                        state    <= IDLE;
                        tx_busy  <= 0;
                    end
                end
                default: state <= IDLE;
            endcase
        end
    end

endmodule

Let's break down the code and try to understand it piece by piece.

The first part is quite easy to understand. It is just a module template with two parameters, CLOCK_FREQ of 50000000 (50 MHz) and BAUD_RATE of 115200 bps, and the following ports: clk is the system clock, and rst asynchronously resets the transmitter. tx_data carries the 8-bit data to be sent serially. tx_start triggers the start of the transmission. tx is the serial output; in UART, the idle state of the line is high. tx_busy signals that the transmitter is active.

module uart_tx #(
    parameter CLOCK_FREQ = 50000000,
    parameter BAUD_RATE  = 115200
)(
    input clk,
    input rst,
    input [7:0] tx_data,  
    input tx_start,      
    output reg tx,       
    output reg tx_busy 
);
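The two parameters only matter through their ratio, which becomes BAUD_COUNTER_MAX. The following Python sketch (the helper name `baud_divider` is hypothetical) mirrors the integer division done by the localparam and reports how far the resulting baud rate is from the requested one.

```python
def baud_divider(clock_freq, baud_rate):
    """Mirror BAUD_COUNTER_MAX = CLOCK_FREQ / BAUD_RATE using integer division."""
    divider = clock_freq // baud_rate            # clock cycles per bit
    actual_baud = clock_freq / divider           # baud rate really produced
    error_pct = (actual_baud - baud_rate) / baud_rate * 100.0
    return divider, actual_baud, error_pct


if __name__ == "__main__":
    for baud in (115200, 9600):
        divider, actual, err = baud_divider(50_000_000, baud)
        print(f"{baud}: divider={divider}, actual={actual:.2f} baud, error={err:+.4f}%")
```

With a 50 MHz clock the divider is 434 for 115200 baud and 5208 for 9600 baud, so both values fit in the 16-bit baud_cnt register.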

The state machine is the part that needs a more detailed explanation.

State machine Transmitter

As I explained in my previous blog post, Verilog is a hardware description language, so it is easy to build a sequential circuit with it. The design uses memory elements, driven by a clock, to pass the output of one combinational circuit into the next combinational block. Flip-flops and shift registers are examples of memory elements that might be used.

Since states are sequential, they must be implemented with memory. In Verilog this can be done using an always @(posedge clk) begin ... end block.

Let's start with the IDLE state. The UART line tx stays high, all counters are reset, and tx_busy is set low. When tx_start is asserted, the module loads the 8-bit tx_data into the shift register and transitions to the START state. We also set tx_busy high to indicate an ongoing transmission.

    IDLE: begin
        tx <= 1'b1;         
        baud_cnt <= 0;
        bit_index <= 0;
        tx_busy <= 0;
        if (tx_start) begin 
            state <= START;
            tx_busy <= 1;
            tx_shift <= tx_data;
        end
    end

In the START state the start bit is driven low on the tx line. baud_cnt counts clock cycles until one complete bit period has elapsed. After one bit period, the state moves to DATA.

    START: begin
        tx <= 1'b0;
        if (baud_cnt < BAUD_COUNTER_MAX-1)
            baud_cnt <= baud_cnt + 1;
        else begin
            baud_cnt <= 0;
            state <= DATA;
        end
    end

In the DATA state the least significant bit of tx_shift is sent out first. The bit is held for the time counted by the baud counter. After the bit period, the data is shifted right so that the next bit becomes the LSB, and bit_index is incremented. Once 8 bits have been transmitted (bit_index is 7 when the last bit period ends), the state machine advances to the STOP state.

    DATA: begin
        tx <= tx_shift[0];
        if (baud_cnt < BAUD_COUNTER_MAX-1)
            baud_cnt <= baud_cnt + 1;
        else begin
            baud_cnt  <= 0;
            tx_shift  <= tx_shift >> 1;  
            bit_index <= bit_index + 1;
            if (bit_index == 7)
                state <= STOP;
        end
    end

In the STOP state the stop bit, which is logic high, is transmitted. The baud counter again ensures that the stop bit is held for one full bit period. Once it completes, the state machine returns to the IDLE state and tx_busy is cleared.

                STOP: begin
                    tx <= 1'b1;
                    if (baud_cnt < BAUD_COUNTER_MAX-1)
                        baud_cnt <= baud_cnt + 1;
                    else begin
                        baud_cnt <= 0;
                        state    <= IDLE;
                        tx_busy  <= 0;
                    end
                end
                default: state <= IDLE;
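To check the walk-through of the four states without a simulator, the state machine can be modelled clock by clock in Python. This is only a sketch under a few assumptions: the name `simulate_tx` is made up, tx_start is high for the first clock only, tx_busy is left out, and each loop iteration computes the next values from the current ones to imitate non-blocking assignments.

```python
IDLE, START, DATA, STOP = range(4)


def simulate_tx(byte, clocks_per_bit):
    """Model the transmitter clock by clock; return the tx level after each clock edge."""
    state, tx, baud_cnt, bit_index, shift = IDLE, 1, 0, 0, 0
    tx_start = True                       # asserted for the first clock only
    trace = []
    while True:
        # Next-state values start as copies of the current ones
        n_state, n_tx, n_cnt, n_idx, n_shift = state, tx, baud_cnt, bit_index, shift
        if state == IDLE:
            n_tx, n_cnt, n_idx = 1, 0, 0
            if tx_start:
                n_state, n_shift = START, byte
        elif state == START:
            n_tx = 0                      # start bit
            if baud_cnt < clocks_per_bit - 1:
                n_cnt = baud_cnt + 1
            else:
                n_cnt, n_state = 0, DATA
        elif state == DATA:
            n_tx = shift & 1              # current least significant bit
            if baud_cnt < clocks_per_bit - 1:
                n_cnt = baud_cnt + 1
            else:
                n_cnt = 0
                n_shift = shift >> 1
                n_idx = bit_index + 1
                if bit_index == 7:
                    n_state = STOP
        else:                             # STOP
            n_tx = 1                      # stop bit
            if baud_cnt < clocks_per_bit - 1:
                n_cnt = baud_cnt + 1
            else:
                n_cnt, n_state = 0, IDLE
        state, tx, baud_cnt, bit_index, shift = n_state, n_tx, n_cnt, n_idx, n_shift
        tx_start = False
        trace.append(tx)
        if state == IDLE:                 # frame finished
            return trace


if __name__ == "__main__":
    print(simulate_tx(0xA5, 4)[1::4])     # one sample per bit period
```

A small clocks_per_bit such as 4 keeps the trace short; the first entry is the idle level, and every following group of clocks_per_bit entries is one bit of the frame.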

To test the code we need to write a testbench. I have written a small testbench with the following test cases:

`timescale 1ns / 1ps

module uart_tx_tb;
    reg clk;
    reg rst;
    reg [7:0] tx_data;
    reg tx_start;
    
    // Outputs
    wire tx;
    wire tx_busy;
    
    // Clock
    parameter CLOCK_FREQ = 50000000;
    parameter BAUD_RATE  = 9600;
    
    uart_tx #(
        .CLOCK_FREQ(CLOCK_FREQ),
        .BAUD_RATE(BAUD_RATE)
    ) uut (
        .clk(clk),
        .rst(rst),
        .tx_data(tx_data),
        .tx_start(tx_start),
        .tx(tx),
        .tx_busy(tx_busy)
    );
    
    // We toggle clock every 10ns
    initial begin
        clk = 0;
        forever #10 clk = ~clk;
    end
 
    initial begin
        $dumpfile("uart_tx.vcd");
        $dumpvars(0, uart_tx_tb);
    end

    initial begin
        // Hold reset high across a couple of rising clock edges before releasing it
        rst      = 1;
        tx_data  = 8'b0;
        tx_start = 0;
        #45;
        rst = 0;
        #50;
       
        // Test Case 1: Transmit 0x55
        tx_data = 8'h55;
        tx_start = 1;
        #20;
        tx_start = 0;
        wait (tx_busy == 0);
        
        #100;
        // Test Case 2: Transmit 0xA5
        tx_data = 8'hA5;
        tx_start = 1;
        #20;
        tx_start = 0;
        wait (tx_busy == 0);

        #100;
        // Test Case 3: Transmit 0x03
        tx_data = 8'h03;
        tx_start = 1;
        #20;
        tx_start = 0;

        
        wait (tx_busy == 0);
        
        #100;
        $finish;
    end
endmodule

I then compile the uart_tx.v and uart_tx_tb.v files with iverilog, as shown in the previous blog post, and check the timing diagram using GTKWave, with the following results:

UART TX timing
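When reading the waveform it can help to know where the edges on tx should fall. The sketch below is an aid for that and not part of the testbench; the name `expected_edges` is hypothetical, and it assumes the testbench settings of a 50 MHz clock and 9600 baud, with time zero placed at the falling edge of the start bit.

```python
def expected_edges(byte, clock_freq=50_000_000, baud_rate=9600):
    """List (time_ns, new_level) for each level change within one 8N1 frame."""
    clock_period_ns = 1e9 / clock_freq
    # Same integer divider as BAUD_COUNTER_MAX in uart_tx
    bit_time_ns = (clock_freq // baud_rate) * clock_period_ns
    levels = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    edges, previous = [], 1               # the line idles high
    for n, level in enumerate(levels):
        if level != previous:
            edges.append((n * bit_time_ns, level))
            previous = level
    return edges


if __name__ == "__main__":
    for time_ns, level in expected_edges(0x03):
        print(f"{time_ns / 1000:.2f} us -> {level}")
```

One bit lasts 5208 clock cycles of 20 ns, so a full frame of ten bits takes a little over one millisecond of simulated time.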

UART Receiver Code

Similarly, we can create the receiver block. The state machine diagram for the UART RX looks like this:

State machine Receiver

As you can see, the receiver block is a bit different from the transmitter block. There are multiple reasons for this. Firstly, I am using a 50 MHz clock, and if you look at the line BAUD_COUNTER_MAX = CLOCK_FREQ / BAUD_RATE, our CLOCK_FREQ is not exactly divisible by our BAUD_RATE of 9600. This can cause sampling errors in the receiver, so the receiver oversamples the line and reads each bit near its middle. Secondly, I made it more robust with double synchronization using rx_sync_0 and rx_sync_1.

module uart_rx #(
    parameter CLOCK_FREQ = 50000000,
    parameter BAUD_RATE  = 9600,
    parameter OVERSAMPLE = 16
)(
    input clk,
    input rst,
    input rx,
    output reg [7:0] rx_data,
    output reg rx_done
);
    localparam integer OVERSAMPLE_MAX = CLOCK_FREQ / (BAUD_RATE * OVERSAMPLE);

    localparam [1:0] IDLE  = 2'b00,
                     START = 2'b01,
                     DATA  = 2'b10,
                     STOP  = 2'b11;

    reg rx_sync_0, rx_sync_1;
    always @(posedge clk) begin
        rx_sync_0 <= rx;
        rx_sync_1 <= rx_sync_0;
    end

    //Registers
    reg [1:0] state;
    reg [15:0] oversample_cnt;
    reg [3:0] sample_cnt;
    reg [3:0] bit_index;
    reg [7:0] rx_shift;

    always @(posedge clk or posedge rst) begin
        if (rst) begin
            state          <= IDLE;
            oversample_cnt <= 0;
            sample_cnt     <= 0;
            bit_index      <= 0;
            rx_shift       <= 0;
            rx_data        <= 0;
            rx_done        <= 0;
        end else begin
            case (state)
                IDLE: begin
                    rx_done <= 0;
                    if (!rx_sync_1) begin
                        state          <= START;
                        oversample_cnt <= 0;
                        sample_cnt     <= 0;
                    end
                end
                START: begin
                    if (oversample_cnt < OVERSAMPLE_MAX - 1) begin
                        oversample_cnt <= oversample_cnt + 1;
                    end else begin
                        oversample_cnt <= 0;
                        sample_cnt     <= sample_cnt + 1;
                        if (sample_cnt == (OVERSAMPLE/2)-1) begin
                            if (!rx_sync_1) begin
                                state     <= DATA;
                                bit_index <= 0;
                            end else begin
                                state <= IDLE;
                            end
                            sample_cnt <= 0;
                        end
                    end
                end
                DATA: begin
                    if (oversample_cnt < OVERSAMPLE_MAX - 1)
                        oversample_cnt <= oversample_cnt + 1;
                    else begin
                        oversample_cnt <= 0;
                        sample_cnt     <= sample_cnt + 1;
                        if (sample_cnt == OVERSAMPLE - 1) begin
                            sample_cnt <= 0;
                            rx_shift[bit_index] <= rx_sync_1;
                            if (bit_index == 7)
                                state <= STOP;
                            else
                                bit_index <= bit_index + 1;
                        end
                    end
                end
                STOP: begin
                    if (oversample_cnt < OVERSAMPLE_MAX - 1)
                        oversample_cnt <= oversample_cnt + 1;
                    else begin
                        oversample_cnt <= 0;
                        sample_cnt     <= sample_cnt + 1;
                        if (sample_cnt == OVERSAMPLE - 1) begin
                            state   <= IDLE;
                            rx_data <= rx_shift;
                            rx_done <= 1;
                        end
                    end
                end
                default: state <= IDLE;
            endcase
        end
    end

endmodule

Here is a detailed explanation of the code:

OVERSAMPLE_MAX is computed as OVERSAMPLE_MAX = CLOCK_FREQ / (BAUD_RATE * OVERSAMPLE). For us it is OVERSAMPLE_MAX = 50 MHz / (9600 * 16), which integer division truncates to 325 clock cycles per oversample tick.

Before processing the asynchronous rx signal, the code synchronizes it to clk using a two-stage synchronizer:

reg rx_sync_0, rx_sync_1;
always @(posedge clk) begin
    rx_sync_0 <= rx;
    rx_sync_1 <= rx_sync_0;
end

oversample_cnt counts clock cycles up to OVERSAMPLE_MAX - 1; this counter defines the duration of one oversample period. sample_cnt counts the number of oversample ticks within a bit period and runs from 0 to OVERSAMPLE - 1. bit_index keeps track of which data bit, from 0 to 7, is currently being received.
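Because both counters work with truncated integers, the receiver's bit period is slightly shorter than the ideal one. The Python sketch below puts numbers on that; the name `oversample_timing` is made up, and it assumes the stop bit is sampled 9.5 bit periods after the start edge (half a bit for the start check plus nine full bit periods), which is how the state machine in this post behaves.

```python
def oversample_timing(clock_freq=50_000_000, baud_rate=9600, oversample=16):
    """Mirror OVERSAMPLE_MAX and estimate the sampling drift at the stop bit."""
    oversample_max = clock_freq // (baud_rate * oversample)  # clocks per oversample tick
    clocks_per_bit = oversample_max * oversample             # the receiver's bit period
    ideal_clocks_per_bit = clock_freq / baud_rate
    # Accumulated difference by the time the stop bit is sampled
    drift_clocks = 9.5 * (ideal_clocks_per_bit - clocks_per_bit)
    drift_fraction_of_bit = drift_clocks / ideal_clocks_per_bit
    return oversample_max, clocks_per_bit, drift_fraction_of_bit


if __name__ == "__main__":
    tick, bit, drift = oversample_timing()
    print(f"clocks per tick={tick}, clocks per bit={bit}, drift={drift:.2%} of a bit")
```

For the values used here the tick is 325 clocks and the bit period 5200 clocks, so the sample point moves by less than two percent of a bit over a whole frame, which is well within the margin that mid-bit sampling leaves.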

The IDLE state is where the receiver waits for a transmission to begin. I will skip the explanation for this block.

START: begin
    if (oversample_cnt < OVERSAMPLE_MAX - 1) begin
        oversample_cnt <= oversample_cnt + 1;
    end else begin
        oversample_cnt <= 0;
        sample_cnt     <= sample_cnt + 1;
        // When reaching the middle sample of the start bit
        if (sample_cnt == (OVERSAMPLE/2)-1) begin
            // Confirm that the start bit is still low
            if (!rx_sync_1) begin
                state     <= DATA;
                bit_index <= 0;
            end else begin
                state <= IDLE; // False start bit detected
            end
            sample_cnt <= 0;
        end
    end
end

The START state is used to validate the reception of the start bit. oversample_cnt is incremented each clock cycle until it completes one oversample period. Once an oversample period is completed, sample_cnt is incremented. When sample_cnt reaches the midpoint of the oversampling window ((OVERSAMPLE/2)-1), the code samples the rx_sync_1 signal. If the signal is still low, it confirms a valid start bit and the state machine moves to the DATA state; otherwise it is considered a false start and the state machine moves back to the IDLE state.

DATA: begin
    if (oversample_cnt < OVERSAMPLE_MAX - 1)
        oversample_cnt <= oversample_cnt + 1;
    else begin
        oversample_cnt <= 0;
        sample_cnt     <= sample_cnt + 1;
        // At the end of the oversample period, consider taking a sample.
        if (sample_cnt == OVERSAMPLE - 1) begin
            sample_cnt <= 0;
            // Optionally, you could implement majority voting over several samples.
            rx_shift[bit_index] <= rx_sync_1;
            if (bit_index == 7)
                state <= STOP;
            else
                bit_index <= bit_index + 1;
        end
    end
end

Similar to the START state, oversample_cnt counts clock cycles for each oversample period. Once an oversample period completes, sample_cnt is incremented. When sample_cnt reaches OVERSAMPLE - 1, a full bit period has elapsed since the previous sample point. Because the START state ended in the middle of the start bit, this lands near the middle of the data bit. At that moment, the value of rx_sync_1 is stored in the rx_shift register at the position specified by bit_index. If bit_index is 7, the state machine transitions to the STOP state; otherwise bit_index is incremented.
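The sampling scheme of the START and DATA states can be tried out on a list of line samples in Python. This is a simplified sketch, not a model of the clocked logic: the names `decode_frame` and `oversampled` are hypothetical, one list entry stands for one oversample tick, and entry 0 is assumed to be the first tick on which the line was seen low.

```python
def decode_frame(samples, oversample=16):
    """Decode one 8N1 frame from per-tick line levels.

    Returns the received byte, or None when the start bit turns out to be a glitch.
    """
    mid = oversample // 2
    if samples[mid] != 0:                 # START: line must still be low at mid-bit
        return None
    value = 0
    for bit_index in range(8):            # DATA: one sample per full bit period
        level = samples[mid + oversample * (bit_index + 1)]
        value |= level << bit_index       # like rx_shift[bit_index] <= rx_sync_1
    return value


def oversampled(byte, oversample=16):
    """Helper for the example: expand a frame into per-tick samples."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    return [b for b in bits for _ in range(oversample)]


if __name__ == "__main__":
    print(hex(decode_frame(oversampled(0xA5))))
    print(decode_frame([0] * 3 + [1] * 157))   # short low pulse, rejected
```

Sampling in the middle of each bit is what gives the receiver its tolerance: the edges can move by almost half a bit period in either direction before a wrong level is read.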

I will leave the STOP state, compilation, the testbench, and simulation as homework for the readers of this blog.

Implementing in FPGA

Now it's time to put this in an FPGA and test it in the real world. I will continue using an Altera (Intel) FPGA for now.

We will continue from where we left off in my previous blog post. In that post we created a basic PLL block that was connected to the input clock reference, with a GPIO pin at the output of the PLL block.

I will remove the output pin, as we don't need it anymore, and use the PLL output as the clock input for the uart_rx and uart_tx modules. To create a block diagram symbol from Verilog code we can use a few simple steps. First go to File --> New... --> Design Files --> Verilog HDL File to create a new Verilog module. Then save the above code in the uart_rx.v and uart_tx.v files. Then start compilation using the button or the Ctrl + L key combination. After the compilation is finished, go to Files, right-click the uart_tx.v file, and choose Create Symbol Files for Current File:

Create Symbol from Verilog

Do the same for uart_rx.v, and the symbols are now available in the main block diagram of the project.

MegaWizard Intel Fpga

For now, to test the receiver and transmitter blocks, I will connect them internally, or short-circuit them in electrical terms, so the data coming from the receiver is transmitted by the transmitter.

Complete block diagram

As you can see, I have skipped some steps, like connecting the GPIO pins to the block, as I have already explained them in my previous blog post.

Real World Test

Now, to test it, I connected the board to the Raspberry Pi UART TX pin (see the diagram below) and wrote a simple Python script to send a counter value over the UART TX, which is connected to the UART RX of the FPGA board at PIN L6.

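import serial
import RPi.GPIO as GPIO
import time

# Physical pin 12 (BOARD numbering) is GPIO 18; it drives the FPGA reset input
reset = 12
GPIO.setmode(GPIO.BOARD)
GPIO.setup(reset, GPIO.OUT)

# Pulse reset high for one second, then release it
GPIO.output(reset, GPIO.HIGH)
time.sleep(1)
GPIO.output(reset, GPIO.LOW)

# 9600 baud, 8 data bits, 1 stop bit, matching the uart_rx parameters
port = serial.Serial("/dev/serial0", baudrate=9600, timeout=3.0, stopbits=serial.STOPBITS_ONE, bytesize=serial.EIGHTBITS)

# Send a one-byte counter forever, wrapping from 253 back to 1
counter = 1
while True:
    send_data = counter.to_bytes(1, 'big')
    port.write(send_data)
    counter += 1
    if counter == 254:
        counter = 1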

I have also connected GPIO 18 of the Raspberry Pi to PIN N3 as the reset pin for both the uart_rx and uart_tx blocks. This is important, as you need to reset and clear all registers.

Connections

Now I receive the counter values from the Raspberry Pi UART TX and pass them to a logic analyser using the UART TX pin of the FPGA.

I get the following data:

Logic Data 1 Logic Data 2
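If no logic analyser is at hand, the echo could also be checked from the Raspberry Pi itself. The sketch below assumes something this post does not do, namely that the FPGA UART TX pin is wired back to the Raspberry Pi UART RX pin at a safe voltage level. The names `check_echo` and `FakeLoopbackPort` are made up; the fake port only stands in for a pyserial port object so that the example runs without hardware.

```python
def check_echo(port, values):
    """Write each value as one byte and compare it with the byte read back.

    `port` is any object with pyserial-style write(bytes) and read(size) methods.
    Returns the list of (sent, received) pairs that did not match.
    """
    mismatches = []
    for value in values:
        port.write(bytes([value]))
        reply = port.read(1)              # empty bytes object on timeout
        received = reply[0] if reply else None
        if received != value:
            mismatches.append((value, received))
    return mismatches


class FakeLoopbackPort:
    """Stand-in for the serial port: returns whatever was written."""

    def __init__(self):
        self._buffer = bytearray()

    def write(self, data):
        self._buffer.extend(data)
        return len(data)

    def read(self, size=1):
        chunk = bytes(self._buffer[:size])
        del self._buffer[:size]
        return chunk


if __name__ == "__main__":
    print(check_echo(FakeLoopbackPort(), range(1, 254)))
```

On real hardware the fake port would be replaced by the serial.Serial object from the script above, and an empty result list would mean every byte came back unchanged.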

In future blogs I will continue our journey and explore the world of FPGAs. We will also try other vendors and other modern FPGAs where we don't need this interface, as the processor is already inside the FPGA as a separate core.