Microsecond delay solution on RTOS

Reprinted from | MultiMCU EDU

An RTOS system tick typically runs at 1 kHz, although 100 Hz and 10 kHz are also used.

At 1 kHz, the shortest delay the system can provide is 1 ms. Some real-time control tasks need delays at the microsecond (us) level. What can we do?
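The relationship between tick rate and delay resolution can be written down as simple arithmetic. The helper names below are made up for illustration and are not part of any RTOS API; this is only a sketch of the calculation.

```c
#include <stdint.h>

/* Length of one RTOS tick in microseconds for a given tick rate in Hz. */
uint32_t tick_period_us(uint32_t tick_hz)
{
    return 1000000u / tick_hz;
}

/* Number of whole ticks needed to cover delay_us, rounded up.
 * Any delay shorter than one tick still costs a full tick. */
uint32_t ticks_for_delay(uint32_t delay_us, uint32_t tick_hz)
{
    uint32_t period = tick_period_us(tick_hz);
    return (delay_us + period - 1u) / period;
}
```

With a 1 kHz tick, a 50 us request still rounds up to one full tick of 1000 us, which is why tick-based delays cannot deliver microsecond resolution.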

There are two ways to implement microsecond delay:

1. Increase the system clock as much as possible
2. Use the MCU's on-chip high-precision timer

1. Increase the system clock as much as possible

We say "as much as possible" because raising the system clock (that is, the RTOS tick rate) has a cost: the faster the tick, the more often threads are scheduled per unit time, so the time spent on scheduling grows significantly and eats into the time left for the threads themselves. The thread functions are what do the real work. If the CPU could talk, overly frequent scheduling would make it extremely unhappy. Threads are the actual jobs the CPU has to do; no sooner has it been switched to one job than it is dragged off to another before the first is finished. The CPU would say: "Are you crazy? It isn't even the code asking me to do things. You keep dragging me here and there. Can't you let me finish one job before moving on?!"
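To get a feel for how quickly the scheduling cost grows, here is a rough estimate. The per-tick cost is an assumed input, not a measured ThreadX figure, and the function name is hypothetical.

```c
#include <stdint.h>

/* Share of CPU time consumed by tick handling, in tenths of a percent.
 * tick_hz : tick rate in Hz
 * cost_ns : ASSUMED cost of one tick interrupt plus scheduling pass, in ns */
uint32_t tick_overhead_permille(uint32_t tick_hz, uint32_t cost_ns)
{
    /* ns spent per second = tick_hz * cost_ns; one second is 1e9 ns,
     * so permille = tick_hz * cost_ns * 1000 / 1e9 = tick_hz * cost_ns / 1e6 */
    return (uint32_t)(((uint64_t)tick_hz * cost_ns) / 1000000u);
}
```

If each tick cost 2 us, a 1 kHz tick would take 0.2% of the CPU, a 10 kHz tick 2%, and a 100 kHz tick 20%. A 1 MHz tick, which is what 1 us resolution would require, could not be serviced at all under that assumption.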

2. Use the MCU's on-chip high-precision timer

MCUs generally have on-chip high-precision timer peripherals that can be configured to 1 us resolution. If a timer does the job, why not simply use one, and why write an article about it? Because it is not as simple as turning on the timer. What an RTOS needs is a blocking delay: when a task enters the delay, it must give up the CPU and enter the blocked state. Starting a timer and then busy-waiting on it while holding the CPU is rogue behavior on an RTOS. Only by yielding the CPU and sleeping can multi-thread scheduling work well.

Because a us-level delay is short, it is unlikely that another thread will start a delay while one is already in progress. With several threads delaying, however, re-entrancy can still occur. For example, one thread starts a 500 us delay, and 100 us later another thread starts a 200 us delay. Not only does re-entrancy occur here, but also "time coverage": the 200 us delay overwrites the remaining 400 us of the first thread's delay. A hardware high-precision timer alone cannot handle these situations.
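The "time coverage" problem can be reproduced with a tiny model of a timer that remembers only one expiry time. The type and function names are invented for this illustration; no real hardware or RTOS is involved.

```c
#include <stdint.h>

/* Model of one hardware timer that holds a single expiry time. */
typedef struct {
    uint32_t expires_at_us;  /* absolute time of the next interrupt */
    int      owner;          /* id of the thread that armed the timer last */
} single_timer_t;

/* Arm the timer. Any delay already in progress is silently replaced. */
void single_timer_arm(single_timer_t *t, int thread_id,
                      uint32_t now_us, uint32_t delay_us)
{
    t->expires_at_us = now_us + delay_us;
    t->owner = thread_id;
}
```

Arming it for thread 0 at t = 0 with 500 us and then for thread 1 at t = 100 with 200 us leaves the timer set to fire at 300 us for thread 1. Nothing remembers that thread 0 was supposed to wake at 500 us.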

Analysis of multi-thread delay conditions

Let’s first look at a multi-thread delay condition diagram, as shown in the figure:

In order to facilitate reading and further design implementation, some comments have been added to the above figure to describe the multi-threaded working conditions in more detail, as shown in the figure:

To make the explanation concrete, Microsoft Azure RTOS ThreadX is used as the basis for this design. The aim is to present a general method, so it does not matter which RTOS you choose, as long as it is multi-threaded: RT-Thread, FreeRTOS, and so on.

A, B, C, and High-precision Timer in the figure are four threads. The High-precision Timer thread has the highest priority, but it does not run periodically; it is triggered passively. Let's look at why it has the highest priority and how it is triggered passively.

We know that when a thread waits for a semaphore with TX_WAIT_FOREVER and the semaphore's value is 0, the thread is suspended on that semaphore. We use this feature to implement the "passive triggering" of the thread, namely:

1. Create the semaphore with an initial value of 0.

2. Release the semaphore once in the interrupt (that is, add 1 to the semaphore value).

In this way, the thread suspended on the semaphore is woken as soon as the interrupt occurs, which completes the passive triggering. Once the thread becomes ready, it preempts whatever is running because it has the highest priority. After the High-precision Timer thread has been woken by the semaphore, it immediately resumes the thread whose delay has expired, which completes that thread's us delay.

Look again at the three threads A, B, and C in the figure above. Each line has two circles on it. From top to bottom, the first circle on each line is where the thread suspends itself to start the delay, and the second circle is where the High-precision Timer thread resumes it after the time is up so that it continues executing.

At this point, how to read the figure has been explained. To turn it into code, there is one more relationship to consider, between the hardware timer and the High-precision Timer thread. The label to the left of High-precision Timer in the figure says: because the hardware timer generates an interrupt, the High-precision Timer thread resumes the thread whose delay has expired. The principle was covered above under "passive triggering". Strictly speaking, the figure should have one more column on the far right for the hardware timer to make the principle clearer. It was left out because re-entrancy also has to be shown and one figure cannot hold everything: with too little the picture is incomplete, with too much it becomes confusing.
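The passive-trigger mechanism can be sketched as a small single-threaded model of a counting semaphore. This is not ThreadX code: model_sem_t, model_sem_get() and model_sem_put() are hypothetical names that only mimic the state changes described above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t count;    /* tokens available */
    bool     waiting;  /* true while the timer thread is suspended on the semaphore */
} model_sem_t;

/* Thread side: take a token if there is one, otherwise suspend.
 * Returns true if the caller may keep running. */
bool model_sem_get(model_sem_t *s)
{
    if (s->count > 0u) {
        s->count--;
        return true;
    }
    s->waiting = true;   /* count is 0: the thread blocks here */
    return false;
}

/* Interrupt side: hand the token straight to a suspended waiter,
 * or bank it if nobody is waiting. Returns true if a waiter was woken. */
bool model_sem_put(model_sem_t *s)
{
    if (s->waiting) {
        s->waiting = false;
        return true;
    }
    s->count++;
    return false;
}
```

Starting from a count of 0, the first model_sem_get() suspends the timer thread, and the model_sem_put() issued from the interrupt wakes it without leaving a token behind, so the next model_sem_get() blocks again until the next interrupt.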

Code

To implement the blocking delay designed above, the code is divided into four parts:

1. Configure a us-level timer;

2. Provide a us delay function interface;

3. Provide a High-precision Timer thread;

4. Provide an ordinary thread that calls the us delay in a loop, for testing.

The following uses STM32 as an example to describe the code one by one.

us-level timer configuration

1. Timer initialization

It is most convenient to use the function generated by CubeMX here directly, without changing a single line:

/**
  * @brief TIM9 Initialization Function
  * @param None
  * @retval None
  */
static void MX_TIM9_Init(void)
{

  /* USER CODE BEGIN TIM9_Init 0 */

  /* USER CODE END TIM9_Init 0 */

  TIM_ClockConfigTypeDef sClockSourceConfig = {0};

  /* USER CODE BEGIN TIM9_Init 1 */

  /* USER CODE END TIM9_Init 1 */
  htim9.Instance = TIM9;
  htim9.Init.Prescaler = 215;
  htim9.Init.CounterMode = TIM_COUNTERMODE_UP;
  htim9.Init.Period = 65535;
  htim9.Init.ClockDivision = TIM_CLOCKDIVISION_DIV1;
  htim9.Init.AutoReloadPreload = TIM_AUTORELOAD_PRELOAD_DISABLE;
  if (HAL_TIM_Base_Init(&htim9) != HAL_OK)
  {
    Error_Handler();
  }
  sClockSourceConfig.ClockSource = TIM_CLOCKSOURCE_INTERNAL;
  if (HAL_TIM_ConfigClockSource(&htim9, &sClockSourceConfig) != HAL_OK)
  {
    Error_Handler();
  }
  /* USER CODE BEGIN TIM9_Init 2 */

  /* USER CODE END TIM9_Init 2 */

}
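The Prescaler value of 215 is what gives the counter a 1 us tick. The sketch below shows the arithmetic, assuming the TIM9 input clock is 216 MHz; the article does not state the clock, so check your own clock tree. prescaler_for() is a hypothetical helper, not a HAL function.

```c
#include <stdint.h>

/* Prescaler register value that makes the counter tick at tick_hz.
 * The timer divides its input clock by (prescaler + 1), hence the "- 1". */
uint32_t prescaler_for(uint32_t timer_clk_hz, uint32_t tick_hz)
{
    return timer_clk_hz / tick_hz - 1u;
}
```

With a 216 MHz input clock and a 1 MHz target, the result is 215. With Period set to 65535, one run of the 16-bit counter then covers at most about 65.5 ms.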

Since we want to use the timer's interrupt, the NVIC must also be set up. CubeMX generates this part of the code in a different file. For convenience, it is combined here with the initialization function above:

void bsp_InitHardTimer(void)
{
    __HAL_RCC_TIM9_CLK_ENABLE();
    HAL_NVIC_SetPriority(TIM1_BRK_TIM9_IRQn, 0, 0);
    HAL_NVIC_EnableIRQ(TIM1_BRK_TIM9_IRQn);
    MX_TIM9_Init();
}

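Note that calling the initialization function is all that is needed here. Do not start the timer. By design, the timer is started only when a thread that needs a delay calls the delay function.

2. Function that starts the timer

/* Arm TIM9 so that it overflows after roughly n us. n must not exceed Period (65535). */
void bsp_DelayUS(uint32_t n)
{
    /* Compensate for the ~30 us it takes to stop and start the timer. */
    n = (n<=30) ? n : (n-30);
    /* Stop first: a previous delay may still be running. */
    HAL_TIM_Base_Stop_IT(&htim9);
    /* Preload the counter so that n ticks remain until overflow. */
    htim9.Instance->CNT = htim9.Init.Period - n;
    HAL_TIM_Base_Start_IT(&htim9);
}

Note the "stop first, then start" order. As mentioned above, in the "time coverage" case a new delay arrives while the timer is still running, so the running timer must be stopped before it is reloaded.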
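The counter preload used in bsp_DelayUS() can be checked on a host with a small stand-alone function. The constants mirror the values in the article; the clamp is an extra guard added for this sketch, and preload_for_delay() is a hypothetical name.

```c
#include <stdint.h>

#define TIMER_PERIOD      65535u  /* htim9.Init.Period in the article */
#define START_OVERHEAD_US 30u     /* start/stop cost quoted in the article */

/* Counter value to load so that the timer overflows after about delay_us. */
uint32_t preload_for_delay(uint32_t delay_us)
{
    /* Same compensation as bsp_DelayUS(): subtract only when it cannot underflow. */
    uint32_t n = (delay_us <= START_OVERHEAD_US) ? delay_us
                                                 : delay_us - START_OVERHEAD_US;

    /* Guard added for this sketch: a 16-bit counter cannot hold more than Period ticks. */
    if (n > TIMER_PERIOD) {
        n = TIMER_PERIOD;
    }
    return TIMER_PERIOD - n;
}
```

Note the discontinuity this compensation creates: a 30 us request loads 30 ticks, while a 31 us request loads only 1 tick. Requests close to the overhead value are therefore the least accurate.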

3. Timer interrupt function

/**
  * @brief This function handles TIM1 break interrupt and TIM9 global interrupt.
  */
void TIM1_BRK_TIM9_IRQHandler(void)
{
  /* USER CODE BEGIN TIM1_BRK_TIM9_IRQn 0 */

  /* USER CODE END TIM1_BRK_TIM9_IRQn 0 */
  HAL_TIM_IRQHandler(&htim9);
  /* USER CODE BEGIN TIM1_BRK_TIM9_IRQn 1 */
  /* Wake the High-precision Timer thread waiting on the semaphore. */
  tx_semaphore_put(&tx_semaphore_delay_us);
  /* One-shot behavior: keep the timer stopped until the next delay request. */
  HAL_TIM_Base_Stop_IT(&htim9);
  /* USER CODE END TIM1_BRK_TIM9_IRQn 1 */
}

tx_semaphore_put(), the Microsoft Azure RTOS ThreadX API for releasing a semaphore, is called here. The semaphore is created during initialization (the creation code is omitted).

us delay function interface

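TX_THREAD       *thread_delay_us;   /* thread currently waiting on the us timer */

/* timer_ticks is the delay in us (one timer tick is 1 us). */
UINT  tx_thread_sleep_us(ULONG timer_ticks)
{
    TX_THREAD_GET_CURRENT(thread_delay_us)   /* remember the calling thread */
    bsp_DelayUS(timer_ticks);                /* arm the hardware timer */
    tx_thread_suspend(thread_delay_us);      /* block until the timer thread resumes it */
    return TX_SUCCESS;
}

A global variable thread_delay_us is defined here. TX_THREAD_GET_CURRENT() stores the thread that called the us delay in it, and once the timer has been started the thread suspends itself with tx_thread_suspend().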

High-precision Timer thread

extern TX_THREAD*      thread_delay_us;

UINT status;   /* result of the last tx_thread_resume() call, kept for debugging */
void threadx_task_delay_us_run(ULONG thread_input)
{
    (void)thread_input;

    while(1){
        /* Block here until the timer interrupt releases the semaphore. */
        tx_semaphore_get(&tx_semaphore_delay_us, TX_WAIT_FOREVER);
        if(thread_delay_us){
            /* Resume the thread whose delay has just expired. */
            status = tx_thread_resume(thread_delay_us);
        }
    }
}

The thread creation code is also omitted here and only the thread body is given. Together with the semaphore tx_semaphore_delay_us, it implements the passive triggering of the thread and resumes the thread recorded in thread_delay_us.
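The listing above tracks one waiting thread in a single global. One possible way to keep several delays alive at once is a small table of absolute deadlines, with the hardware timer always armed for the earliest one. This is a host-side sketch of that idea, not the article's implementation: every name is hypothetical, thread ids stand in for TX_THREAD pointers, and counter wrap-around and locking are ignored.

```c
#include <stdint.h>

#define MAX_DELAYS  4
#define NO_DEADLINE UINT32_MAX

typedef struct {
    int      in_use;
    int      thread_id;     /* stands in for a TX_THREAD pointer */
    uint32_t deadline_us;   /* absolute expiry time */
} delay_slot_t;

static delay_slot_t slots[MAX_DELAYS];

/* Record a new delay. Returns 0 on success, -1 if the table is full. */
int delay_add(int thread_id, uint32_t now_us, uint32_t delay_us)
{
    for (int i = 0; i < MAX_DELAYS; i++) {
        if (!slots[i].in_use) {
            slots[i].in_use = 1;
            slots[i].thread_id = thread_id;
            slots[i].deadline_us = now_us + delay_us;
            return 0;
        }
    }
    return -1;
}

/* Earliest pending deadline: the time the hardware timer should be armed for. */
uint32_t earliest_deadline(void)
{
    uint32_t best = NO_DEADLINE;
    for (int i = 0; i < MAX_DELAYS; i++) {
        if (slots[i].in_use && slots[i].deadline_us < best) {
            best = slots[i].deadline_us;
        }
    }
    return best;
}

/* Called by the timer thread: collect every thread whose deadline has passed.
 * Writes up to max_out ids into out_ids and returns how many were written. */
int delay_expire(uint32_t now_us, int *out_ids, int max_out)
{
    int n = 0;
    for (int i = 0; i < MAX_DELAYS && n < max_out; i++) {
        if (slots[i].in_use && slots[i].deadline_us <= now_us) {
            out_ids[n++] = slots[i].thread_id;
            slots[i].in_use = 0;
        }
    }
    return n;
}
```

In the 500 us / 200 us example from earlier, the table holds deadlines 500 and 300. The timer is armed for 300, thread 1 is released there, and the timer is then re-armed for the remaining deadline at 500 so that thread 0 is not lost.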

Testing with an ordinary thread that calls the us delay

#include "pthread.h"

VOID    *pthread_test_entry(VOID *pthread1_input)
{
    (void)pthread1_input;
    while(1)
    {
        //print_task_information();
        uint64_t now = get_timestamp_us();
        tx_thread_sleep_us(100);
        /* Cast so that the argument matches %llu on every toolchain. */
        printf("delay_us: %llu\r\n", (unsigned long long)(get_timestamp_us() - now));
    }
}

This thread is created with the POSIX interface API. If you are interested in POSIX, see the article "Posix Interface of Azure RTOS ThreadX".

Time granularity testing

ThreadX is said to achieve sub-microsecond context switching on a 200 MHz MCU, yet the time granularity measured in this test is fairly stable at 150 us. This does not mean that ThreadX performs poorly; rather, starting and stopping the timer on the STM32F7 takes about 30 us. So do not start and stop the timer when the required delay is shorter than 30 us. This design, however, has to cope with possible re-entrancy, so the timer must be stopped and restarted.

How do we know that each start and stop takes 30 us? The measurement is shown below:
