AB32VG1 overclocking, compiler optimization settings

1. Cause

The low-level routines of the Helix decoding library were previously rewritten in pure C, so that the library can run on any processor. For details, see the earlier post "Helix MP3 decoding library is free from the shackles of assembly instructions and runs on any processor" on Fairchild_1947's CSDN blog.

However, the theoretical calculation in that post shows that decoding a 44.1 kHz, 320 kbps audio file requires about 154.05 DMIPS, while the rated operating frequency of the AB32VG1 is only 120 MHz, so in practice it is clearly not fast enough. Compiler optimization and overclocking are therefore needed for this function to work properly on the AB32VG1 MCU.
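The gap can be made concrete with a rough calculation. The sketch below takes the 154.05 DMIPS requirement from the text and a per-MHz throughput figure as input. The 1.0 DMIPS/MHz used in print_clock_budget is purely an assumed placeholder, not a measured value for the AB32VG1, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* DMIPS needed for 44.1 kHz / 320 kbps MP3 decoding, taken from the text. */
#define REQUIRED_DMIPS 154.05

/* Clock in MHz needed to deliver required_dmips on a core that achieves
 * dmips_per_mhz. The per-MHz figure is an input, not a known chip value. */
double required_clock_mhz(double required_dmips, double dmips_per_mhz)
{
    assert(dmips_per_mhz > 0.0);
    return required_dmips / dmips_per_mhz;
}

/* Print the budget for an ASSUMED 1.0 DMIPS/MHz core. */
void print_clock_budget(void)
{
    double mhz = required_clock_mhz(REQUIRED_DMIPS, 1.0);
    printf("needs about %.2f MHz, rated clock is 120 MHz\n", mhz);
}
```

With any per-MHz figure below roughly 1.28, the required clock lands above the rated 120 MHz, which is why both optimization and overclocking are considered.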

2. Compiler optimization

Compiler optimization is a very safe way to improve the running speed of the program, and its settings are as follows:

Step 1: Right click on the project and select Properties

Step 2: Choose C/C++ build

Step 3: Open Settings

Step 4: Click on Optimization

Step 5: The default is no optimization. Select the desired level from the drop-down list. Here the highest level, level 3, is selected.

3. Compiler optimization effect

For reasons I do not understand, the size of the binary file stays the same after changing the optimization level of the RISC-V compiler. The download tool for the AB32VG1 compares the new image with the previously downloaded one and only updates the parts that differ; after adjusting the optimization level, it simply reports that there is nothing to update.

It may be that I do not understand the RISC-V toolchain and the setting is wrong. Corrections are welcome.
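One way to check whether the optimization setting actually reaches the compiler is to test a GCC predefined macro. This is a minimal sketch that assumes the project is built with a GCC-based RISC-V toolchain; the function names are hypothetical and not part of the RT-Thread BSP.

```c
#include <stdio.h>

/* Returns 1 when this file was compiled with optimization enabled
 * (GCC defines __OPTIMIZE__ at -O1 and above), otherwise 0. */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Report the result; on the board, rt_kprintf could be used instead. */
void report_optimization(void)
{
    printf("optimization: %s\n", built_with_optimization() ? "on" : "off");
}
```

If this reports "off" after level 3 has been selected in the IDE, the flag is probably not being passed to the compiler, which would also explain an unchanged binary.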

4. Overclocking

I thought this idea was crazy, but when I checked the README.md that ships with the RT-Thread project, I found that overclocking is already mentioned there.

It turns out the vendor got there first: the official claim is that the chip can be overclocked to 192 MHz. This piqued my interest, and I wanted to find the highest frequency at which the AB32VG1 still runs normally. After all, I once overclocked an STM32F429, rated for 180 MHz, to 480 MHz, the same frequency as the H7 ^*_^*

Now for the overclocking itself. RT-Thread puts the clock setup code in board.c, so open that file and find the code that sets the frequency.

4.1 Adjustment within the rated operating frequency

This method adjusts the frequency within the rated range, that is, up to 120 MHz. What is the point? For some reason a newly generated project defaults to 48 MHz, so when the AB32VG1 seems too slow, it may simply be because it is running at only 48 MHz. The code is as follows:

void rt_hw_systick_init(void)
{
    CLKCON2 &= 0x00ffffff;
    CLKCON2 |= (25 << 24);                                  // configure x26m_div_clk = 1M (used by timer, ir, fmam ...)
    CLKCON0 &= ~(7 << 23);
    CLKCON0 |= BIT(24);                                     //tmr_inc select x26m_div_clk = 1M

    set_sysclk(SYSCLK_48M);    // change this line to adjust the frequency within the rated range

    /* Setting software interrupt */
    set_cpu_irq_comm(cpu_irq_comm);
    rt_hw_interrupt_install(IRQ_SW_VECTOR, rt_soft_isr, RT_NULL, "sw_irq");

    timer0_init();
    hal_set_tick_hook(timer0_cfg);
    hal_set_ticks(get_sysclk_nhz() / RT_TICK_PER_SECOND);

    PICCON |= 0x10002;
}

Change the argument of set_sysclk to adjust the frequency within the rated range. Note that the argument must be one of the values of the enumeration, such as SYSCLK_120M.

4.2 Overclocking above the rated operating frequency

Above the rated operating frequency there is no ready-made, simple way to overclock; the PLL setup code has to be read and modified by hand. Very little documentation is available for the AB32VG1, and nothing describes the structure of its internal clock tree in detail, so the only option is to explore by following the example code. The PLL setup function is in system_ab32vg1.c:

void set_sysclk(uint32_t sys_clk)
{
    uint32_t uart_baud, spll_div = 0, spi_baud = 0, spi1baud;
    uint8_t cnt_1us, clk_sel;

    clk_sel = get_clksel_val(sys_clk);
    if(sys.clk_sel == clk_sel) {
        return;
    }
//    if (sys_clk > SYSCLK_48M) {
//        PWRCON0 = (PWRCON0 & ~0xf) | (sys_trim.vddcore + 1);            // raise VDDCORE by one step
//    }
//    vddcore_other_offset();

//    printf("%s: %d, %d\n", __func__, sys_clk, clk_sel);
    switch (sys_clk) {
    case SYSCLK_12M:
        spll_div = 19;                   //pll0 240M
        cnt_1us = 1;
        spi_baud = 0;
        spi1baud = 0;
        break;

    case SYSCLK_24M:
        spll_div = 9;                   //pll0 240M
        cnt_1us = 2;
        spi_baud = 0;
        spi1baud = 1;
        break;

    case SYSCLK_30M:
        spll_div = 7;                   //pll0 240M
        cnt_1us = 3;
        spi_baud = 1;                   //Baud Rate =Fsys clock / (SPI_BAUD+1)
        spi1baud = 1;
        break;

    case SYSCLK_48M:
        spll_div = 4;                   //pll0 240M
        cnt_1us = 4;
        spi_baud = 1;                   //Baud Rate =Fsys clock / (SPI_BAUD+1)
        spi1baud = 3;
        break;

    case SYSCLK_60M:
        spll_div = 3;                   //pll0 240M
        cnt_1us = 5;
        spi_baud = 2;                   //Baud Rate =Fsys clock / (SPI_BAUD+1)
        spi1baud = 3;
        break;

    case SYSCLK_80M:
        spll_div = 2;                   //pll0 240M
        cnt_1us = 7;
        spi_baud = 3;                   //Baud Rate =Fsys clock / (SPI_BAUD+1)
        spi1baud = 4;
        break;

    case SYSCLK_120M:
        spll_div = 0;                   //pll0 240M
        cnt_1us = 10;
        spi_baud = 4;                   //Baud Rate =Fsys clock / (SPI_BAUD+1)     //spiclk 120/5 = 24M
        spi1baud = 9;
        break;

    case SYSCLK_26M:
        spll_div = 0;
        cnt_1us = 3;
        spi_baud = 1;
        spi1baud = 1;
        break;

    case SYSCLK_13M:
        spll_div = 1;
        cnt_1us = 1;
        spi_baud = 0;
        spi1baud = 0;
        break;

    case SYSCLK_2M:
        spll_div = 1;
        cnt_1us = 1;
        spi_baud = 0;
        spi1baud = 0;
        break;

    default:
        return;
    }

    // first check whether PLL0 is already enabled
    if(clk_sel <= PLL0DIV_120M) {
        if (!(PLL0CON & BIT(12))) {
            PLL0CON &= ~(BIT(3) | BIT(4) | BIT(5));
            PLL0CON |= BIT(3);                     //Select PLL/VCO frequency band (vcos = 0x01 when PLL is above 206M, otherwise 0)
            PLL0CON |= BIT(12);                    //enable pll0 ldo
            delay_us(100);                         //delay 100us
            PLL0DIV = 240 * 65536 / 26;            //pll0: 240M, XOSC: 26M
            PLL0CON |= BIT(20);                    //update pll0div to pll0_clk
            PLL0CON |= BIT(6);                     //enable analog pll0
            PLL0CON |= BIT(18);                    //pll0 sdm enable
            delay_us(1000);                        //wait pll0 stable
        }
    }

    sys.cnt_1us = cnt_1us;
    sys.sys_clk = sys_clk;
    sys.clk_sel = clk_sel;
    uart_baud =  (((get_sysclk_nhz() + (sys.uart0baud / 2)) / sys.uart0baud) - 1);

    set_sysclk_do(sys_clk, clk_sel,spll_div, spi_baud, spi1baud);
    set_peripherals_clkdiv();
    update_sd0baud();       // update SD0BAUD
    update_uart0baud_in_sysclk(uart_baud);
}

This function uses a switch to choose parameters for each frequency option and then sets up the phase-locked loop. Of the parameters, spi_baud and spi1baud only affect the SPI peripherals and play no part in overclocking. Only spll_div and cnt_1us need to be adjusted. cnt_1us sets the system time base; if it is not updated after overclocking, the delay and timing functions will run for a different length of time. It can be left unchanged when absolute timing does not matter. spll_div is the divider that follows the frequency multiplier.

These parameters take effect in the set_sysclk_do call at the end. That function is as follows:
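For choosing a cnt_1us value at a non-standard frequency, one option is to extrapolate from the existing table. The sketch below uses a rounding rule that happens to reproduce the PLL0-derived entries in the switch (12M through 120M). It is only an observation fitted to the listing, not something taken from vendor documentation, and cnt_1us_guess is a hypothetical helper.

```c
#include <stdint.h>

/* Guess at cnt_1us for a PLL0-derived system clock given in MHz.
 * ASSUMPTION: the table entries follow round(sysclk_mhz / 12); this fits
 * 12M->1, 24M->2, 30M->3, 48M->4, 60M->5, 80M->7 and 120M->10 in the
 * listing, but the XOSC-derived entries (26M, 13M, 2M) do not follow it. */
uint8_t cnt_1us_guess(uint32_t sysclk_mhz)
{
    return (uint8_t)((sysclk_mhz + 6u) / 12u);
}
```

Under this rule a 230 MHz clock would get cnt_1us = 19; whether that gives accurate microsecond delays would need to be measured on hardware.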

static void set_sysclk_do(uint32_t sys_clk, uint32_t clk_sel, uint32_t spll_div, uint32_t spi_baud, uint32_t spi1baud)
{
    uint32_t cpu_ie;
    cpu_ie = PICCON & BIT(0);
    PICCONCLR = BIT(0);                             // disable interrupts while switching the system clock
    set_peripherals_clkdiv_safety();

    CLKCON0 &= ~(BIT(2) | BIT(3));                  //sysclk sel rc2m
    CLKCON2 &= ~(0x1f << 8);                        //reset spll div

    if(clk_sel <= PLL0DIV_120M) {
        // sys_clk is derived from PLL0 through the divider
        CLKCON0 &= ~(BIT(4) | BIT(5) | BIT(6));     //sys_pll select pll0out
        if (PLL0DIV != (240 * 65536 / 26)) {
            PLL0DIV = 230 * 65536 / 26;             //pll: 240M, XOSC: 26M
            PLL0CON &= ~(BIT(3) | BIT(4) | BIT(5));
            PLL0CON |= BIT(3);                      //Select PLL/VCO frequency band (vcos = 0x01 when PLL is above 206M, otherwise 0)
            PLL0CON |= BIT(20);                     //update pll0div to pll0_clk
            CLKCON3 &= ~(7 << 16);
            CLKCON3 |= (4 << 16);                   //USB CLK 48M
        }
    } else if (clk_sel <= OSCDIV_26M) {
        // sys_clk is derived from the XOSC 26M clock divider; PLL0 is turned off when USB is not used
//        if (!is_usb_support()) {
//            PLL0CON &= ~BIT(18);
//            PLL0CON &= ~(BIT(12) | BIT(6));         //close pll0
//        }

        CLKCON0 &= ~(BIT(4) | BIT(5) | BIT(6));
        CLKCON0 |= BIT(6);                          //spll select xosc26m_clk
    }

    CLKCON2 |= (spll_div << 8);
    CLKCON0 |= BIT(3);                          //sysclk sel spll
    SPI0BAUD = spi_baud;
    if (CLKGAT1 & BIT(12)) {
        SPI1BAUD = spi1baud;
    }
//    if (spiflash_speed_up_en()) {
//        set_flash_safety(sys_clk);
//    }
    PICCON |= cpu_ie;
}

What actually does the work are the writes to registers such as PLL0CON, PLL0DIV, CLKCON0 and CLKCON2. Since the AB32VG1 documentation does not describe the clock system, the PLL block diagram of the STM32 is used here as an analogy.

In the STM32, the PLL can take its input from the external crystal oscillator or the internal RC oscillator. The input is first pre-divided ("/M"), then multiplied ("*N"), and finally divided again ("/P") before being fed to the core and peripherals. After that, the core and the various peripherals have their own dividers, such as those on the APB and AHB buses.
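The divide, multiply, divide chain described above can be written as a one-line formula. The sketch below is only an illustration of the STM32-style analogy, not AB32VG1 code; pll_out_hz is a hypothetical helper, and the 8 MHz crystal with M = 8, N = 360, P = 2 is an assumed example configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Output of an STM32-style main PLL: the input is divided by M,
 * multiplied by N, then divided by P. */
uint32_t pll_out_hz(uint32_t f_in_hz, uint32_t m, uint32_t n, uint32_t p)
{
    assert(m != 0 && p != 0);
    return f_in_hz / m * n / p;
}
```

For the assumed example, pll_out_hz(8000000, 8, 360, 2) works out to 180 MHz: 8 MHz / 8 = 1 MHz, times 360 = 360 MHz, divided by 2.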

The registers in set_sysclk_do can be understood by analogy. PLL0CON corresponds roughly to the "*N" stage of the Main PLL and PLL0DIV to the "/M" stage, although the value written to PLL0DIV (target frequency * 65536 / crystal frequency) grows with the target frequency, so in practice it sets the multiplication ratio. CLKCON0, CLKCON1 and so on appear to play the role of the dividers on the APB and AHB buses. CLKCON2 holds the processor divider, because the value written to it comes from spll_div, which is the division factor for the processor clock.

Looking at the code, PLL0CON may be the multiplier setting, but it is hard to modify and not intuitive, whereas the line "PLL0DIV = 230 * 65536 / 26; //pll: 240M, XOSC: 26M" is clear at a glance and even has a comment, so that is where to start. (The 230 in the listing is the value after my modification; the original value is 240, as the comment says.)

By default the value is 240 * 65536 / 26, which means the PLL outputs 240 MHz from a 26 MHz crystal. Setting the processor divider to 1 (that is, divide by 2) then gives the rated clock of 120 MHz.
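The arithmetic behind these settings can be sketched as follows. The PLL0DIV expression is copied from the listing. The divide-by-(spll_div + 1) rule is an assumption inferred from the switch table (for example spll_div = 4 for 48 MHz from a 240 MHz PLL), and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define XOSC_MHZ 26u   /* crystal frequency used in the listing */

/* Value written to PLL0DIV for a target PLL0 output in MHz, following the
 * listing: target * 65536 / 26, the ratio to the crystal scaled by 65536. */
uint32_t pll0div_for_mhz(uint32_t target_mhz)
{
    return target_mhz * 65536u / XOSC_MHZ;
}

/* System clock in MHz for a given PLL0 output and spll_div.
 * ASSUMPTION: the divider divides by (spll_div + 1). */
uint32_t sysclk_mhz(uint32_t pll_mhz, uint32_t spll_div)
{
    assert(spll_div <= 0x1fu);   /* the field is masked with 0x1f in CLKCON2 */
    return pll_mhz / (spll_div + 1u);
}
```

Under these assumptions, a 240 MHz PLL with spll_div = 1 gives 120 MHz, and a 230 MHz PLL with spll_div = 0 gives 230 MHz.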

For overclocking, I set the processor divider to 0 (no division), which is why the SYSCLK_120M case in the listing has spll_div = 0, and then adjusted PLL0DIV step by step. This approach keeps the peripherals other than the processor within their rated frequencies, which reduces the chance of the overclock failing.

In my tests the AB32VG1 could be overclocked to 230 MHz, higher than the official 192 MHz figure, and it ran stably.

With that, the overclock to 230 MHz is done.

5. Follow-up work

Because the AB32VG1 project runs the RT-Thread operating system by default rather than bare metal, the system timer has to be reconfigured after overclocking so that the system time slice does not become too short. The system timer of the AB32VG1 is timer0, and it is set up in rt_hw_systick_init() in board.c. As shown earlier, this function calls hal_set_ticks(get_sysclk_nhz() / RT_TICK_PER_SECOND); to set the timer's count register, which determines the interrupt frequency and therefore the length of the time slice. Tracing back, the function that supplies the value is:

uint32_t get_sysclk_nhz(void)
{
    return sysclk_index[sys.sys_clk] * 1000000;
}

This function takes the value at the corresponding position in the sysclk_index array and multiplies it by 1,000,000. The index, sys.sys_clk, is the enumeration value SYSCLK_120M passed to set_sysclk(SYSCLK_120M) in rt_hw_systick_init() in board.c, so the entry at that offset in sysclk_index can be changed to the actual operating frequency. The array is defined as follows:

const uint8_t sysclk_index[] = {
    2,
    12,
    13,
    24,
    26,
    30,
    48,
    60,
    80,
    120,
};

Here, change the last entry of the array from 120 to 230.
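The effect of this change on the tick length can be checked with a small stand-alone calculation. This is a sketch under two assumptions: that timer0 counts at the system clock rate, and that the tick rate is 1000 per second (the real value is set by RT_TICK_PER_SECOND in the project configuration). The names DEMO_TICK_PER_SECOND, sysclk_index_patched and the helper functions are hypothetical stand-ins, not BSP code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TICK_PER_SECOND 1000u   /* ASSUMED tick rate */

/* Stand-in for sysclk_index with the last entry changed from 120 to 230. */
static const uint8_t sysclk_index_patched[] = {
    2, 12, 13, 24, 26, 30, 48, 60, 80, 230,
};

/* Same computation as get_sysclk_nhz(), using the patched table. */
uint32_t sysclk_hz_from_index(unsigned idx)
{
    assert(idx < sizeof(sysclk_index_patched) / sizeof(sysclk_index_patched[0]));
    return (uint32_t)sysclk_index_patched[idx] * 1000000u;
}

/* Timer reload value computed from the clock the table reports. */
uint32_t tick_reload(uint32_t table_hz)
{
    return table_hz / DEMO_TICK_PER_SECOND;
}

/* Actual tick length in microseconds when the reload comes from table_hz
 * but the timer really counts at actual_hz. */
uint32_t tick_period_us(uint32_t table_hz, uint32_t actual_hz)
{
    assert(actual_hz != 0);
    return (uint32_t)((uint64_t)tick_reload(table_hz) * 1000000u / actual_hz);
}
```

If the table still said 120 while the core ran at 230 MHz, each tick would last about 521 us instead of 1000 us. Note also that the table elements are uint8_t, so an entry cannot exceed 255.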

6. Test results

230 MHz is already a very high frequency; the STM32F7 series only runs at 216 MHz. I expected MP3 decoding to be effortless, but when I tried it again there was still a little stutter, while the same algorithm runs very smoothly on an STM32F429 at 180 MHz. The lack of effective optimization from the RISC-V compiler is probably a major reason for the stutter. I look forward to improvements in the RISC-V compiler, and I believe performance will improve considerably once optimization works!

Origin blog.csdn.net/Fairchild_1947/article/details/123159343