Embedded System Boot-Loader in Detail

What to do after power up (bootload phase)


1. The first line of program

After receiving the bare PCB, the hardware engineer first tests whether the main circuits are connected and whether any solder joints are cold, open, or short-circuited, and then solders on the modules one by one. Once the system is powered on, verify that the supply voltages of the CPU and of each component are normal, that the oscillator circuit feeding the CPU starts oscillating properly, and that the external memory can be read and written. After downloading the program to the board with a JTAG tool, perform the following checks before actually debugging the system:

  • Use the debugging tool to set a breakpoint on the first line of the program and confirm that execution stops there;
  • Check whether the CPU's program counter (PC) is correct;
  • Check whether the contents of the CPU's internal RAM match the executable file that was downloaded;
  • The first instruction of the program sets the CPU status register; observe whether the register changes as expected;
  • Continue single-stepping to confirm that the PC register changes accordingly and that the result of each instruction is correct.

Passing these checks proves only that the power supply circuit and the CPU on the board are working. Next, verify the CPU together with its peripheral devices, and confirm that the board is correct and stable before moving on to the next test.


2. Basic hardware test

Because the Boot-Loader's job is to set up a runnable environment for other programs, the following must be verified:

  • CPU register (status register, general-purpose registers, memory-mapped registers) operation test;
// Set the SP (stack pointer) register
//
asm("xld.w  %r15, 0x2000");
asm("ld.w %sp, %r15");

// Set the CPU status register
//
asm("xld.w  %r15, 0x200010");
asm("ld.w   %psr, %r15");

// Set bit 1 of the register at 0x300023 to 1
*(volatile unsigned char *)0x300023 |= 0x2;
  • Is the stack pointer set correctly? Do function calls run correctly?
  • Is the interrupt vector table set up correctly? Do the interrupt service routines run normally?
  • Memory initialization and operation test, to ensure that all memories can be read and written normally;
  • Load the data section into RAM, set the initial value of the bss section, and copy any program code that must run from RAM into RAM. Ensure that global variables have the correct initial values when the main program starts executing.

Proceed to the next step only after all of the tests above pass.


(1) Confirm whether the function call can run normally

Setting up the stack correctly is a prerequisite for function calls to succeed. In embedded development the system must manage the stack itself; if the stack is managed improperly, a function call may fail outright, or the system may crash once calls nest a few levels deep. This is because C uses the stack to do the following:

  • Store the function return address;
  • Pass parameters during a function call (when there are many parameters);
  • Store the local variables of a function;
  • Store the current CPU state and the return address while an interrupt service routine executes (when an interrupt occurs).

Configuring the stack pointer (SP) is very important but easily overlooked. The main reason is that when programming on Windows or Linux, the linker automatically adds startup code to the program when the executable file is generated, and that code includes the stack configuration. In an embedded system without an operating system, however, the stack pointer must be set up before any function is called.

When a function such as fun(a,b) is called in C, the compiled machine code performs the following actions:

  • Execute push instructions to store the parameters a and b on the stack, decreasing the stack pointer SP by the size of each value pushed (on a stack that grows downward);
  • Store the current value of the program counter PC (that is, the return address: the address of the instruction following the call instruction) on the stack;
  • Execute the call instruction, which sets PC to the address of the function fun(), so that the next instruction executed is the first instruction of the function;
  • While fun executes, the addresses of parameters a and b can be calculated from the current SP value;
  • If the function has local variables, they are also stored on the stack in turn. For this reason, avoid defining very large local variables in embedded development; otherwise there is a risk of stack overflow;
  • When the function finishes, the CPU executes the ret instruction, which takes the return address from the top of the stack and assigns it to the PC register. Execution then continues with the instruction that follows the call, which completes the function call.

If the SP register is not set to a correct address, or the storage area reserved as stack space is not large enough, function calls are likely to go wrong. The following figure shows an example in which the stack overflows and corrupts the program's data segment:

Local variables that are too large cause a stack overflow

To avoid this situation, the top (highest address) of a block of RAM is generally chosen as the initial value of the SP register. The appropriate stack size depends on the specific software and hardware environment and on the project requirements. A common approach is to start with a generous size, for example about 2 KB to 4 KB, and then have a tester exercise every feature of the system while recording the minimum value that SP reaches after each function call. The difference between the top-of-stack address and that minimum SP value is the minimum stack space required; in practice, allocate a little more than that.
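The measurement described above can also be automated. The following is a minimal sketch of "stack painting", which is a different technique from watching SP in a debugger: the stack region is filled with a marker value before the system runs, and afterwards the bytes that no longer hold the marker are counted. The function names, the marker value 0xA5, and the assumption that the stack grows downward from the top of the region are all illustrative and do not come from the original article.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5u  /* arbitrary marker value (assumption) */

/* Fill the whole stack region with the marker before the system starts. */
void stack_paint(volatile uint8_t *base, size_t size)
{
    size_t i;
    for (i = 0; i < size; i++)
        base[i] = STACK_FILL;
}

/* Assuming the stack grows downward from base + size, count the marker
 * bytes still intact at the low end and return the number of bytes used. */
size_t stack_used(const volatile uint8_t *base, size_t size)
{
    size_t untouched = 0;
    while (untouched < size && base[untouched] == STACK_FILL)
        untouched++;
    return size - untouched;
}
```

The result is a lower bound: if the deepest stacked bytes happen to equal the marker value, the usage is slightly undercounted, which is one more reason to keep some margin.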


(2) Confirm whether the interrupt system can operate normally

The engineer responsible for writing the driver fills in the interrupt vector table with the addresses of the interrupt service routines, and must make sure that the interrupt system is working when the driver executes. In general, this involves the following work:

  • Define the interrupt vector table as an array, with a detailed comment on the interrupt source that each entry represents;
  • If there is an external interrupt controller, its driver must be completed before testing of the interrupt system can start;
  • Set the CPU's interrupt vector table address register (some CPUs have no such register and instead use a fixed address for the interrupt vector table);
  • Set the CPU's interrupt control registers (priority, interrupt enable bits, etc.);
  • Confirm that the corresponding ISR executes after the interrupt is triggered;
  • Provide an ISR example so that ISR writers do not need to know the details of the interrupt system.
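// ISR template
//
void isr_template(void)
{
    // Save all general-purpose registers on the stack
    //
    asm("pushn %r15"); /* push r0 - r15 onto the stack */

    // Save the ALR and AHR registers on the stack via r1 and r0.
    // You do not need to know what ALR and AHR are; different CPUs have different registers that must be saved.
    //
    asm("ld.w   %r1, %alr");
    asm("ld.w   %r0, %ahr");
    asm("pushn  %r1");

    // Call the C function your_ISR; the real work of the ISR goes in that function
    //
    asm("xcall your_ISR");

    // Restore from the stack the ALR and AHR values saved on entry
    //
    asm("popn   %r1");
    asm("ld.w   %alr, %r1");
    asm("ld.w   %ahr, %r0");

    // Restore r0 - r15 from the stack
    //
    asm("popn   %r15");

    // Execute the return-from-interrupt instruction to resume the interrupted program
    //
    asm("reti");
}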

The steps above that are most prone to error are:

  • The interrupt priority register is not set correctly;
  • The mapping between the entries in the interrupt vector table and the interrupt sources is wrong;
  • The address of the interrupt vector table is set incorrectly. Many CPUs require the table to be placed at an even address, at a multiple of 4, or even at a multiple of 128 KB.
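The alignment requirement in the last item is easy to check in code before the table address is written to the CPU. The helper below is a small sketch; the name is_aligned is hypothetical, and it assumes the required alignment is a power of two, which holds for the values mentioned above (2, 4, 128 KB).

```c
#include <stdint.h>

/* Returns 1 if addr is a multiple of alignment, 0 otherwise.
 * alignment must be a power of two (assumption of this sketch). */
int is_aligned(uintptr_t addr, uintptr_t alignment)
{
    return (addr & (alignment - 1u)) == 0;
}
```

For a CPU that requires 128 KB alignment, a check such as is_aligned(table_address, 0x20000) could guard the register write, where table_address stands for whatever address the linker assigned to the table.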

How can you tell whether an ISR has executed correctly? A common method is to pick a simple interrupt source (for example, the divide-by-zero error interrupt), set a breakpoint in its ISR, and then single-step to see whether the ISR runs to completion and returns to the point of interruption (the statement following the instruction that divided by zero).

(3) Memory test

Common memory problems are:

  • Hardware: data lines or address lines are wired incorrectly;

  • Software: SRAM, NOR Flash, and ROM need no additional circuitry and can be used directly, but SDRAM requires an SDRAM controller. The program must configure the SDRAM controller first (SDRAM size, speed, etc.);

  • Timing of the external memory: if the timing is set too fast the system becomes unstable, and if it is set too slow system performance suffers. The CPU's timing setting table generally explains how to set it;

  • Before moving on to the next stage of work, every byte of the memory must be tested to ensure that it reads and writes (if it is writable) correctly. The method is to write 0x00, 0xFF, 0x55, and 0xAA to each byte in turn, which ensures that every bit is written with both 0 and 1.

    int SRAM_testing(void)
    {
      unsigned long j;
      int i, counter = 0;
      // The RAM under test starts at 0x2000000; the loops below cover 8 MB.
      volatile unsigned char *pointer = (volatile unsigned char *)0x2000000;
      const unsigned char data[4] = {0x00, 0xFF, 0x55, 0xAA};

      for(i=0; i<4; i++)
      {
          // Write the current pattern to every byte
          for(j=0; j<(8UL*1024*1024); j++)
              pointer[j] = data[i];
          // Read every byte back and count the ones that do not match
          for(j=0; j<(8UL*1024*1024); j++)
              if(pointer[j] != data[i])
                  counter++;
      }
      return counter; // number of byte comparisons that failed
    }
  • For read-only ROM, how do you verify that the data burned into the memory matches the original image file? A checksum is generally used: compute the checksum of the original image file and of the contents burned into the ROM, and check that the two are equal.

    /***************************************************************
    Function Name: calculate_ROM_checksum
    Function Purpose: compute the checksum of the 8 MB memory that starts at 0x2000000
    ****************************************************************/
    unsigned long calculate_ROM_checksum(void)
    {
      unsigned long i;
      unsigned long checksum = 0;
      const volatile unsigned char *pointer = (const volatile unsigned char *)0x2000000;
      for(i=0; i<(8UL*1024*1024); i++)
          checksum += pointer[i];
      return checksum;
    }
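The byte-pattern test above does not directly target the hardware problem listed first, wrongly connected data or address lines. A commonly used complement is a "walking ones" data-bus test plus a power-of-two-offset address-bus test. The sketch below is an illustration under assumptions, not code from the original article: the function names are hypothetical, the memory is accessed byte-wide, and the region passed to test_address_bus must be at least one byte long.

```c
#include <stddef.h>
#include <stdint.h>

/* Walking-ones test on a single location. Each data line is driven high
 * on its own. Returns 0 on success, otherwise the first failing pattern. */
uint8_t test_data_bus(volatile uint8_t *addr)
{
    uint8_t pattern;
    for (pattern = 1; pattern != 0; pattern <<= 1) {
        *addr = pattern;
        if (*addr != pattern)
            return pattern;
    }
    return 0;
}

/* Address-bus test: offset 0 and every power-of-two offset hold a known
 * pattern; writing a different value to one of them must not disturb the
 * others. Returns 1 if no stuck or shorted address line is detected. */
int test_address_bus(volatile uint8_t *base, size_t size)
{
    const uint8_t pattern = 0xAA, antipattern = 0x55;
    size_t probe, other;

    base[0] = pattern;
    for (probe = 1; probe < size; probe <<= 1)
        base[probe] = pattern;

    probe = 0;                       /* offset 0 first, then 1, 2, 4, ... */
    for (;;) {
        base[probe] = antipattern;
        if (probe != 0 && base[0] != pattern)
            return 0;
        for (other = 1; other < size; other <<= 1)
            if (other != probe && base[other] != pattern)
                return 0;
        base[probe] = pattern;       /* restore before the next probe */
        probe = (probe == 0) ? 1 : probe << 1;
        if (probe >= size)
            break;
    }
    return 1;
}
```

Both tests touch only a handful of locations, so they are cheap to run before the full byte-by-byte test; they destroy the contents of the locations they probe.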

(4) CPU initialization

During the Boot-Loader phase, the following CPU-related settings should be made:

  • Set the stack pointer register SP;
  • Set the status register and disable interrupts;
  • Set interrupt vector table pointer;
  • Set the CPU operating state (clock timing);
  • Set the memory controller (if SDRAM-type memory is used);
  • Set the timing with which the CPU accesses each memory;
  • Set the pin functions of the CPU;
  • Initialize peripheral devices (LCD controller, USB controller, SD card interface, etc.).

3. Loading program segments and data initialization

(1) Load the data section

Global variables with initial values must be stored in the executable file and burned into ROM. But because their values change during execution, they cannot live in ROM at run time and must be assigned addresses in RAM at link time. This "stored in ROM, run in RAM" property is exactly why the data section has to be copied, and the copy must be done before any code uses global variables.

Memory usage during execution

In the figure above, the contents of the data section are stored after the rodata section in the executable file, but for execution the data section must be copied to the area after the bss section in RAM. The linker script is as follows:

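.data __END_bss : AT(__END_rodata)
{
    __START_data = .;
    *(.data);
    __END_data = .;

    /* Define the symbol "__START_data_LMA", usable from the program: the load address (LMA) at which the data section is stored */
    __START_data_LMA = LOADADDR(.data);

    /* Define the symbol "__SIZE_DATA", usable from the program: the size of the data section */
    __SIZE_DATA = __END_data - __START_data;
}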

The copy routine is as follows:

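/**************************************************
Function Name: copy_data_section()
Function Purpose: copy the data section of the executable file into RAM
***************************************************/
// Linker-script symbols have no storage of their own: the address of each
// symbol is the value that the linker script assigned to it.
extern unsigned long __START_data[];
extern unsigned long __START_data_LMA[];
extern char __SIZE_DATA[];

void copy_data_section(void)
{
    unsigned long i;
    unsigned long *dest = __START_data;
    const unsigned long *src = __START_data_LMA;
    unsigned long size = (unsigned long)__SIZE_DATA;
    // Assumes the size of the data section is a multiple of 4 bytes
    for(i=0; i<(size/4); i++)
        dest[i] = src[i];
}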

(2) Set the bss segment

Setting up the bss section is relatively simple. Its members are global variables with no initial value, so they need no storage space in the executable file at all. At run time, simply fill the section's execution-time region (VMA) with zeros.

/*******************************************
Define the bss section; its start address (VMA) is 0
******************************************/
.bss 0x0 :
{
    __START_bss = .;
    *(.bss);
    __END_bss = .;

    /* Define a symbol usable from the program: __SIZE_BSS */
    __SIZE_BSS = __END_bss - __START_bss;
}

The code to set the bss segment to 0 is as follows:

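/**************************************************
Function Name: clear_bss_section()
Function Purpose: clear the bss section to zero
***************************************************/
// The address of each linker-script symbol is the value assigned in the script.
extern unsigned long __START_bss[];
extern char __SIZE_BSS[];

void clear_bss_section(void)
{
    unsigned long i;
    unsigned long *dest = __START_bss;
    unsigned long size = (unsigned long)__SIZE_BSS;
    // Assumes the size of the bss section is a multiple of 4 bytes
    for(i=0; i<(size/4); i++)
        dest[i] = 0;
}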
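Both routines above assume that the section size is a multiple of 4 bytes. When that cannot be guaranteed by the linker script, a byte-wise variant removes the assumption at the cost of speed. The following is a minimal sketch with hypothetical helper names (copy_section, zero_section); it uses plain loops rather than memcpy and memset so that it does not depend on the C library being usable this early in boot.

```c
#include <stddef.h>

/* Copy size bytes from the load address (LMA) to the run address (VMA). */
void copy_section(void *vma, const void *lma, size_t size)
{
    unsigned char *dest = (unsigned char *)vma;
    const unsigned char *src = (const unsigned char *)lma;
    while (size--)
        *dest++ = *src++;
}

/* Fill size bytes starting at the run address (VMA) with zeros. */
void zero_section(void *vma, size_t size)
{
    unsigned char *dest = (unsigned char *)vma;
    while (size--)
        *dest++ = 0;
}
```

On the target, the arguments would come from the linker-script symbols for the data and bss sections; a typical order is to copy the data section, clear the bss section, and only then call code that relies on global variables.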

Note: During the boot phase, the data section and the bss section must be set up first; otherwise global variables will hold incorrect values during execution. In other words, the boot-load program must not use global variables before the data and bss sections have been set up. If it has to use them, do not give them initial values in their definitions; assign the values explicitly in the program instead. For example:

<img src="https://cdn.jsdelivr.net/gh/Leon1023/leon_pics/img/20201125224022.png" alt="You must be careful when using global variables in the Boot-Loader program" style="zoom:80%;" />
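In code, the rule about explicit assignment might look like the sketch below. It is only an illustration: the variable boot_retry_limit and both functions are hypothetical names that do not appear in the original article.

```c
/* Defined WITHOUT an initializer: the value does not depend on the data
 * section having been copied from ROM to RAM. */
static unsigned long boot_retry_limit;

/* Called by the boot code before the variable is first used. */
void boot_globals_init(void)
{
    boot_retry_limit = 3;   /* explicit run-time assignment */
}

unsigned long get_boot_retry_limit(void)
{
    return boot_retry_limit;
}
```

A variable defined this way normally lives in the bss section, so a value assigned before the bss section is cleared is wiped by the clearing step and has to be assigned again afterwards if it is still needed.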

(3) Load the text segment

When a system program or an application module needs to execute faster, it can be copied into system memory and executed there. System memory is usually limited, however, so not every module can be loaded at the same time. The usual solution is to link such modules to the same address and write a function that loads each one when it is needed.
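A loader of this kind can be sketched as follows. This is a minimal illustration under assumptions, not the article's implementation: the struct overlay type, the overlay_load function, and the idea of tracking the currently loaded module are all hypothetical, and in a real system the load addresses and sizes would come from linker-script symbols.

```c
#include <stddef.h>

/* Describes one module that is linked to run in the shared RAM region. */
struct overlay {
    const unsigned char *load_addr;  /* where the module is stored (ROM/flash) */
    size_t size;                     /* module size in bytes */
};

/* The module currently occupying the shared region (none at start). */
static const struct overlay *current_overlay;

/* Copies the module into the shared region if it is not already there.
 * Returns 1 if a copy was made, 0 if the module was already loaded,
 * and -1 if the module does not fit in the region. */
int overlay_load(const struct overlay *ov, unsigned char *region, size_t region_size)
{
    size_t i;

    if (ov->size > region_size)
        return -1;
    if (ov == current_overlay)
        return 0;
    for (i = 0; i < ov->size; i++)
        region[i] = ov->load_addr[i];
    current_overlay = ov;
    return 1;
}
```

On a CPU with an instruction cache, the cache would also have to be invalidated after the copy and before jumping into the freshly loaded code.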

In descending order of performance, the memory types are: CPU registers, CPU cache, CPU internal RAM, external SRAM, NOR Flash, SDRAM, Mask ROM, NAND Flash.

NAND Flash: Low price and large capacity. Think of it as a hard-disk-like device: it cannot be addressed directly, and programs cannot execute directly from it;

NOR Flash: High price and small capacity, but fast to read. Think of it as a rewritable ROM; programs can run directly from it;

Mask ROM: High cost and limited capacity, but programs can run directly from it;

SDRAM: Cost-effective and generally used as the external memory of the system; programs can run directly from it;

SRAM: Expensive and small in capacity. It is generally used as the built-in memory of the system, and programs can run directly from it.

(4) Several system memory architectures

  • Architecture of booting from NAND Flash:

  • The startup process is:

    • After power-on, a program built into the CPU reads the Boot-Loader from a specific address in NAND Flash (usually the first block) into the CPU's internal memory;
    • The CPU transfers control to the Boot-Loader in internal memory;
    • The Boot-Loader initializes SDRAM and then loads the main program from NAND Flash into SDRAM;
    • The Boot-Loader transfers control to the main program.
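The third step of this process can be sketched in C as shown below. Everything here is an assumption made for illustration: struct boot_ops, its two hooks, and load_main_program are hypothetical names, and the fake_* functions are host-side stand-ins so the sketch can be exercised off-target. A real NAND driver must also deal with bad blocks and ECC, which are left out.

```c
#include <stddef.h>

/* Board-specific hooks; both names are hypothetical. */
struct boot_ops {
    void (*sdram_init)(void);
    int  (*nand_read)(size_t offset, unsigned char *dest, size_t len); /* 0 = OK */
};

/* Initialize SDRAM, then copy the main program from NAND Flash into it.
 * Returns 0 on success; the caller then jumps to the loaded program. */
int load_main_program(const struct boot_ops *ops, size_t nand_offset,
                      unsigned char *sdram_dest, size_t len)
{
    ops->sdram_init();   /* SDRAM cannot be used before this point */
    return ops->nand_read(nand_offset, sdram_dest, len);
}

/* Host-side stand-ins that simulate the hardware. */
static const unsigned char fake_nand[8] = {0, 0, 0, 0, 0x11, 0x22, 0x33, 0x44};
static int sdram_ready;

static void fake_sdram_init(void)
{
    sdram_ready = 1;
}

static int fake_nand_read(size_t offset, unsigned char *dest, size_t len)
{
    size_t i;
    if (!sdram_ready || offset > sizeof fake_nand || len > sizeof fake_nand - offset)
        return -1;
    for (i = 0; i < len; i++)
        dest[i] = fake_nand[offset + i];
    return 0;
}
```

On the target, the last step would be a jump to the entry point of the loaded image, typically through a function pointer set to the SDRAM load address.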

Origin blog.51cto.com/14592069/2556462