The previous section, a detailed explanation of the memory barrier instructions DMB, DSB and ISB, introduced the functions of the three instructions and gave some examples. When a memory barrier instruction is needed depends on the processor architecture (such as Cortex-M and Cortex-A) and on the system implementation of the processor (the same architecture can be implemented differently; for example, both ST's STM32 family and NXP offer microcontrollers based on Cortex-M4).

This section continues to explore memory barriers in depth through 20 examples, looking at each from the following two aspects:
(1) Processor architecture requirements: the specifications and requirements defined by the hardware architecture. The architecture describes the processor's instruction set, registers, interrupt control, memory access, pipeline structure and other hardware features. These specifications are usually set by the processor designer or by the organization that defines the architecture (ARM, for example). Architectural requirements are generic and apply to all processors based on the architecture.

(2) System implementation requirements: the specific way a particular processor implementation realizes the architectural specification. Each manufacturer can design and produce its own processor, but the implementation must follow the architecture specification. Implementation requirements may vary by processor model, version, and manufacturer.

1 Access to ordinary data in memory

In this case there is no need to use a memory barrier between each memory access:

Processor architecture: The processor may reorder data transfers as long as this does not affect the behavior of the program
System implementation: In Cortex-M processors, data transfers are performed in program order

2 Access to devices (peripherals)

During peripheral programming or peripheral access, there is no need to use memory barrier instructions between each step:

Processor architecture: Access to the same device must be performed in program order
System implementation: Cortex-M processor does not reorder data transfers

If the programming sequence involves many different devices:

Processor architecture: Memory barriers are required when different devices are accessed and the order of programming between the two devices may affect the results. This is because the bus structure may have different bus branches leading to each device, and the different bus branches may have different delays.
System implementation: Cortex-M processors do not reorder data transfers, so no memory barriers are required when accessing different devices

3 Bit-band access

Bit-band access is a special feature of the Cortex-M3 and Cortex-M4 processors. It makes two regions of the memory map bit-addressable:

Processor architecture: The bit-band feature is not part of the ARMv7-M or ARMv6-M architecture, so there is no architecturally defined requirement to use memory barriers with bit-band accesses
System implementation: Cortex-M3 and Cortex-M4 processors handle accesses to the bit-band regions and the bit-band alias regions in program order, so there is no need to use memory barriers

ARM Cortex-M0 and Cortex-M0+ processors do not have the bit-band feature. It can be added to them with a bus wrapper; in that case, the bus wrapper must maintain the correct memory ordering.
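The mapping from a bit in a bit-band region to its word in the alias region is plain address arithmetic. The sketch below assumes the standard Cortex-M3/M4 memory map (bit-band regions starting at 0x20000000 and 0x40000000, alias regions at 0x22000000 and 0x42000000); the function name bitband_alias is made up for illustration and is not part of CMSIS.

```c
#include <stdint.h>

/* Illustrative helper (hypothetical name): returns the alias-region word
 * address for bit `bit` of the byte at `addr`, or 0 if `addr` lies outside
 * the two 1 MB bit-band regions or `bit` is not in the range 0..7. */
uint32_t bitband_alias(uint32_t addr, uint32_t bit)
{
    uint32_t region = addr & 0xF0000000u;   /* 0x20000000 (SRAM) or 0x40000000 (peripheral) */
    uint32_t offset = addr & 0x0FFFFFFFu;   /* byte offset inside the region */

    if ((region != 0x20000000u && region != 0x40000000u) ||
        offset >= 0x00100000u || bit > 7u) {
        return 0u;
    }
    /* Every bit gets its own 32-bit alias word: 32 alias bytes per data byte. */
    return region + 0x02000000u + offset * 32u + bit * 4u;
}
```

On a Cortex-M3 or Cortex-M4 target, storing 1 or 0 through a volatile uint32_t pointer to the returned address sets or clears the corresponding bit. As described above, those processors handle such accesses in program order.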

4 SCS peripheral access

SCS peripheral accesses, such as NVIC and debug accesses, generally do not require memory barrier instructions: there is no need to insert a barrier between SCS accesses, nor between SCS accesses and Device memory accesses.

Processor architecture: The memory region containing the SCS is Strongly-ordered by default, so accesses to it inherently have DMB behavior (as shown in the figure below)
System implementation: There is no need to insert memory barrier instructions between SCS accesses, nor between SCS accesses and Device memory accesses

Processor architecture requirements

If the effect of a write to an SCS register must be visible immediately, a DSB is needed
No memory barrier instruction is needed between two adjacent SCS accesses
If the next instruction must execute only after the previous write has taken effect, a DSB instruction must be inserted between them. Examples are as follows:

SCB->SCR |= SCB_SCR_SLEEPDEEP_Msk; /* Enable deepsleep */
__DSB();                           /* Ensure effect of last store takes effect */
__WFI();                           /* Enter sleep mode */
---------------------
void Device_IRQHandler(void) {
	software_flag = 1;                        /* Update software variable used in thread */
	SCB->SCR &= ~SCB_SCR_SLEEPONEXIT_Msk;     /* Disable sleeponexit */
	__DSB();                                  /* Ensure effect of last store takes effect */
	return; 
}

Note that, at the architecture level, the DMB behavior of SCS accesses does not guarantee ordering with respect to Normal memory accesses. If the program depends on the ordering between SCS accesses and Normal memory accesses, a memory barrier instruction such as DMB or DSB is required. Below is an example:

STR R0, [R1] ; Access to a Normal Memory location
DMB          ; Add DMB ensures ordering for ALL memory types
STR R3, [R2] ; Access to a SCS location
DMB          ; Add DMB ensures ordering for ALL memory types
STR R0, [R1] ; Access to a Normal Memory location

The DMB is not required if [R1] points to Device memory or Strongly-ordered memory.

System Implementation Requirements
In existing Cortex-M processors, omitting the DMB or DSB instruction does not cause an error, because SCS accesses in these processors already have DSB behavior:

In Cortex-M0 and Cortex-M0+ processors, the effect of an SCS access takes place immediately after the access completes, so a DSB after the SCS access is not strictly required.
In Cortex-M3 and Cortex-M4 processors, the effect also takes place immediately after the SCS access completes. A DSB after an SCS access is usually not strictly required, except in the special case of a SLEEPONEXIT update:
- If the exception handler disables the SLEEPONEXIT feature in the SCS before the exception return, a DSB instruction is required after the SCR write and before the exception return. Refer to the earlier Device_IRQHandler example.

The behavior of SCS accesses at the system implementation level is as follows:
[Figure]
The figure shows that every access to the SCS (including the NVIC) carries an implicit DSB, as if a data synchronization barrier were added automatically to each Device or Strongly-ordered access. So in the earlier example under "Processor architecture requirements", the DSB can be removed:

SCB->SCR |= SCB_SCR_SLEEPDEEP_Msk; /* Enable deepsleep */
__WFI();                           /* Enter sleep mode */

Note:

Existing Cortex-M processors do not reorder any data transfers, so DMB instructions are not needed
For Cortex-M3 and Cortex-M4 processors, if the instruction after the SCS load/store is a NOP or a conditional instruction whose condition fails, that instruction can execute in parallel with the SCS load/store

5 Enable interrupts through NVIC

Normally, NVIC operations do not require the use of memory barrier instructions. The code is as follows:

device_config();                         // Setup peripheral
NVIC_ClearPendingIRQ(device_IRQn);       // clear pending status
NVIC_SetPriority(device_IRQn, priority); // set priority level
NVIC_EnableIRQ(device_IRQn);             // Enable interrupt

An interrupt can enter the pending state when its event occurs, regardless of whether the interrupt is enabled.

As mentioned earlier, from an architectural point of view, every access to the SCS (the NVIC is part of the SCS) is a Strongly-ordered access, which behaves as if a DMB were inserted between two adjacent operations.
[Figure]
In the system implementation of Cortex-M processors, every access to the SCS (the NVIC is part of the SCS) behaves as if a DSB were inserted between two adjacent operations.

Because Cortex-M processors are pipelined, if the interrupt is already pending, the processor can execute up to two additional instructions after the interrupt is enabled in the NVIC before it enters the interrupt service routine. As shown in the figure below:
[Figure]
Processor architecture requirements
Different applications have different requirements:

In normal NVIC operation, there is no need to use memory barriers
No memory barrier is needed between NVIC operations and peripheral operations
If a pending interrupt must be taken immediately after it is enabled in the NVIC, add a DSB followed by an ISB

If the instructions after the enable depend on the result of the pending interrupt's handler, memory barrier instructions should be added. An example is as follows:

LDR R0, =0xE000E100         ; NVIC_SETENA address
MOVS R1, #0x1
STR R1, [R0]                ; Enable IRQ #0
DSB                         ; Ensure write is completed
; (architecturally required, but not strictly
; required for existing Cortex-M processors)
ISB                         ; Ensure IRQ #0 is executed
CMP R8, #1                  ; Value of R8 dependent on the execution
; result of IRQ #0 handler

If the memory barrier instructions above are omitted, the CMP can be executed before the interrupt is taken, as shown in the following figure:
[Figure]
System implementation requirements
Different applications have different requirements:

In normal NVIC operation, there is no need to use memory barriers
If a pending interrupt must be taken immediately after it is enabled in the NVIC, an ISB instruction needs to be added

Note: Since accesses to the NVIC (SCS) inherently have DSB behavior, enabled and pending interrupts are still recognized immediately even if the DSB instruction is omitted.

6 Disable interrupts through NVIC

Because Cortex-M processors are pipelined, a write to the NVIC that disables an interrupt can take place while the processor is already entering the interrupt sequence (the series of operations the processor performs when an interrupt event occurs). It is therefore possible for the interrupt handler to execute immediately after the interrupt has been disabled in the NVIC.
[Figure]
Processor Architecture Requirements
Whether memory barriers are needed depends on the application requirements:

General NVIC programming does not require memory barriers when an IRQ is disabled
There is also no need to use a memory barrier between NVIC programming and peripheral programming
If the interrupt must not be taken after it has been disabled in the NVIC, add a DSB instruction followed by an ISB

The following is an example of switching the interrupt handler (modifying the vector table):

#define MEMORY_PTR(addr) (*((volatile unsigned long *)(addr)))

NVIC_DisableIRQ(device_IRQn);
__DSB();
__ISB();
// Change vector to a different one
MEMORY_PTR(SCB->VTOR+0x40+(device_IRQn<<2))=(unsigned long) device_Handler;

System implementation requirements
Whether memory barriers are needed depends on the application requirements:

There is no need to use a memory barrier when disabling an IRQ in normal NVIC programming
There is also no need to use a memory barrier between NVIC programming and peripheral programming
If the interrupt must not be taken after it has been disabled in the NVIC, add an ISB instruction
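The address written in the vector-switching example of this section follows from the layout of the vector table: the first 16 words hold the initial stack pointer and the system exception vectors, so the entry for IRQ #n sits at offset 0x40 + 4n from the table base. A minimal host-side sketch of that arithmetic follows; the helper name irq_vector_address is hypothetical.

```c
#include <stdint.h>

/* Illustrative helper (hypothetical name): address of the vector table
 * entry for external interrupt `irqn` (0 means IRQ #0), given the table
 * base address read from VTOR. 0x40 skips the 16 words used by the
 * initial stack pointer and the system exception vectors. */
uint32_t irq_vector_address(uint32_t vtor, uint32_t irqn)
{
    return vtor + 0x40u + (irqn << 2);
}
```

For a table relocated to 0x20000000, the entry for IRQ #5 is therefore at 0x20000054. This is the location the example overwrites after the interrupt has been disabled and the barriers have executed.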

7 Enable interrupts using CPS and MSR instructions

In a normal application, there is no need to add any barrier instructions after enabling interrupts using the CPS instruction:

__enable_irq();        /* Actually executes CPSIE I to clear PRIMASK */

If an interrupt is already pending, the processor takes the interrupt after CPSIE I executes. However, additional instructions may be executed before the processor enters the exception handler:

For Cortex-M3 or Cortex-M4, the processor can execute up to two additional instructions before entering the interrupt service routine
For Cortex-M0, the processor can execute up to one additional instruction before entering the interrupt service routine

As shown in the figure below:
[Figure]
Processor architecture requirements

If pending interrupts must be recognized before subsequent operations are performed, an ISB should be used after the CPSIE I instruction, as shown in the following figure:
To allow a pending interrupt to be taken between two critical sections, an ISB instruction can be used. The code looks like this:

__enable_irq();  // CPSIE I : Enable interrupt
__ISB();         // Allow pended interrupts to be recognized
__disable_irq(); // CPSID I : Disable interrupt

[Figure]
Another typical example is:

The timing diagram is as follows:

When the MSR instruction is used to enable interrupts, the requirements are the same as above.

System Implementation Requirements
In Cortex-M processors:

If pending interrupts must be recognized before subsequent operations are performed, an ISB should be used after the CPSIE I instruction. This is the same as the processor architecture requirement
There is one exception: when CPSIE is followed directly by CPSID, Cortex-M processors do not need an ISB between CPSIE and CPSID. The code is as follows:

The timing diagram is as follows:
[Figure]

Under the system implementation requirements, there is no need to add a memory barrier instruction between __enable_irq() and __disable_irq(). Under the processor architecture requirements, however, an ISB instruction is required if interrupts need to be recognized between the CPSIE and CPSID instructions.

When the MSR instruction is used to enable interrupts, the requirements are the same as above.

In summary, the architecture requires an ISB whenever a pending interrupt must be recognized, because an architecturally compliant processor may need this extra synchronization after interrupts are enabled or disabled; using the ISB guarantees correct behavior on any such processor. In current Cortex-M implementations the hardware already synchronizes interrupt enabling and disabling, which is why the code above needs no barrier between __enable_irq() and __disable_irq() on those processors.

8 Use CPS and MSR instructions to disable interrupts

The CPSID instruction is self-synchronizing within the instruction stream, so no memory barrier instruction is needed after CPSID.
Processor architecture requirements
No need to use memory barriers.

System Implementation Requirements
No need to use memory barriers.

When the MSR instruction is used to disable interrupts, the requirements are the same as above.

9 Disable peripheral interrupts

When an interrupt is disabled at a peripheral, the change can take additional time to become effective because of the many possible sources of latency in the system. The following figure shows several of them:
[Figure]
Even after the peripheral's interrupt has been disabled, an interrupt request from that peripheral may still be received for a short period of time.

Processor architecture requirements:
None. Everything is determined by the system implementation requirements below.

System Implementation Requirements
The latency depends on the device. In most cases, if the delay in the IRQ synchronizer is small, the following steps can be used to disable the interrupt:

(1) Disable the peripheral interrupt by writing to the peripheral's control register
(2) Read the peripheral's control register back to ensure it has been updated
(3) Disable the IRQ in the NVIC
(4) Clear the IRQ pending status in the NVIC
(5) Read the IRQ pending status. If it is set, clear the IRQ status in the peripheral, and then clear the IRQ pending status in the NVIC again. Repeat this step until the NVIC IRQ pending status remains clear.

This sequence works on most simple microcontroller devices. However, because of the various latency factors within a system, it is recommended to contact the chip vendor or manufacturer for device-specific details.

10 Change the priority of interrupts

The priority of an interrupt is determined by the Priority Level registers of the NVIC in the SCS. On Cortex-M3 or Cortex-M4 processors, priority levels can be changed dynamically. ARMv6-M processors such as Cortex-M0 or Cortex-M0+, however, do not support dynamically changing the priority of enabled interrupts or exceptions; the priority should be set before the interrupt is enabled.

Processor Architecture Requirements
Since the SCS is Strongly-ordered memory, configuring the NVIC does not require memory barriers. However, if the interrupt is already enabled and must be taken at the new priority level after the change, DSB and ISB instructions should be inserted after the priority update.
[Figure]

Note: On ARMv6-M processors, the priority level of an interrupt should only be changed while the interrupt is disabled; otherwise the results are unpredictable

If the next instruction is CPSIE or MSR, the architecture requires a DSB to be inserted, followed by an ISB (the ISB is needed only if a pending interrupt must be recognized immediately; otherwise it can be omitted). This sequence ensures that the interrupt state switches correctly and that the following instructions execute in order.

System Implementation Requirements
In Cortex-M processors, an access to the interrupt priority registers inherently has DSB behavior because the SCS is Strongly-ordered memory. In Cortex-M3 or Cortex-M4 processors:

If the change in priority must be recognized immediately, an ISB instruction is needed
If the change in priority does not need to be recognized immediately before subsequent operations, there is no need to insert a memory barrier instruction
If the next operation is an SVC exception, there is no need to insert a memory barrier instruction

For Cortex-M3 or Cortex-M4 processors, if a change in priority level can cause a new interrupt nesting (the new priority is higher than that of the currently executing interrupt) and that interrupt must be taken immediately, an ISB instruction needs to be inserted. Otherwise, because of the pipeline, up to two more instructions may be executed first.

11 Vector table configuration-VTOR

In Cortex-M3 and Cortex-M4 processors, the location of the vector table is determined by the VTOR (Vector Table Offset Register) in the SCS.

Processor Architecture Requirements
Architecturally, if an exception can be taken immediately after VTOR is changed and must use the new vector table setting, a DSB instruction should be used after the VTOR update.
[Figure]
System Implementation Requirements
In the Cortex-M3, Cortex-M4 and Cortex-M0+ processors, accesses to the SCS inherently have DSB behavior, so there is no need to insert a DSB instruction.

The Cortex-M0 processor does not have VTOR

12 Vector table entry configuration

This refers to updating individual entries in the vector table.

Processor architectural requirements
If the vector table is located in RAM (such as SRAM or SDRAM), whether through VTOR relocation or through a device-specific memory remapping mechanism, then architecturally a memory barrier instruction is needed after a vector table entry is updated if the exception is to be enabled immediately. As shown in the figure below:
[Figure]
If the next instruction is to access RAM, another DMB instruction is needed:

In other words, memory barriers need to be considered whenever the vector table is stored in Normal memory rather than Strongly-ordered memory.
System Implementation Requirements
In Cortex-M processors, omitting the DSB or DMB instruction when modifying vector entries does not cause any problems, because the exception entry sequence does not start until the last memory access has completed.
[Figure]

13 Memory mapping changes

Many microcontrollers include a device-specific memory remapping feature that allows the memory map to be changed at runtime by programming a configuration register, which should be located in Device memory. Whether memory barrier instructions are required when changing the memory map configuration depends on the following factors:

Whether the affected memory space contains program code, that is, instructions
The device-specific data path between the processor and the memory configuration register, such as a write buffer sitting between the CPU and the register
Processor Architecture Requirements
The requirements discussed here assume that:
There are no device-specific write buffers affecting the memory remapping control register, other than the processor's internal write buffers
There is no additional hardware latency in the memory map switching

Architecturally, memory barrier instructions should be inserted before and after a memory map change:

... ; application code before switching
DSB ; Ensure all memory accesses are completed
STR <remap>, [<remap_reg>] ; Write to memory map control register
DSB ; Ensure the write is completed
ISB ; Flush instruction buffer (optional, only required if
    ; the memory map change affects program memory)
... ; application code after switching

[Figure]

If the affected memory does not hold any program code, a DSB instruction is still needed after the memory map change, but the ISB instruction can be omitted.
[Figure]
System Implementation Requirements
The same assumptions apply here:

There are no device-specific write buffers affecting the memory remapping control register, other than the processor's internal write buffers
There is no additional hardware latency in the memory map switching

In Cortex-M processors:

No DSB or DMB instruction is required before a memory map change, because these processors do not allow two write operations to overlap
After remapping, a DSB followed by an ISB is required to ensure that program code is fetched using the latest memory map

This case study relies on the two assumptions above. If they do not hold, for example because the data path between the processor and the memory control register contains additional system-level write buffers, then the memory barrier instructions cannot guarantee that the transfer has completed. In this case:

A read from the previously written area can be performed to ensure that the write buffer is drained. If multiple writes have been issued to different parts of the system, multiple reads may be required to ensure that all write buffers are drained.
Alternatively, the microcontroller or SoC may have a status register that indicates whether any transfers are in progress and when the memory remapping is complete. This allows program code to account for additional hardware delays in the memory remapping logic if needed.
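The read-back approach from the first option can be sketched as a small helper. This is only an illustration under the assumption that the register lies in Device or Strongly-ordered memory, so that the read travels the same bus path as the write; the name write_then_read_back is hypothetical.

```c
#include <stdint.h>

/* Illustrative helper (hypothetical name): write a register and then read
 * it back. On typical bus designs the read cannot return until the earlier
 * write to the same location has left any system-level write buffer, so the
 * returned value doubles as confirmation that the write reached the device. */
uint32_t write_then_read_back(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;   /* this store may sit in a system-level write buffer */
    return *reg;    /* dummy read: pushes the buffered write through */
}
```

Whether a single read is enough depends on the bus fabric of the specific device, which is why the status-register alternative or vendor guidance may still be needed.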

For device-specific details, consult the chip manufacturer.

14 Enter sleep mode

In Cortex-M processors, sleep mode can be entered using the WFI and WFE instructions.

Processor Architectural Requirements
Architecturally, a DSB instruction should be used before executing the WFI or WFE instruction:
[Figure]

System Implementation Requirements
For simple designs that do not include a system-level write buffer, there is no need to use a memory barrier instruction before entering sleep mode on Cortex-M3 (r2p0 or later), Cortex-M4, Cortex-M0+ and Cortex-M0 processors; the processor handles this itself. The situation is more complicated if the bus system contains a system-level write buffer that is external to the processor. In this case, a DSB instruction alone may not be enough, because the system control logic may turn off the clock before the buffered write has completed.
[Figure]
Whether turning off the clock at that point causes an error depends on the system-level design, the sleep operation used, and the peripherals accessed before entering sleep mode. It is recommended to contact the chip supplier or manufacturer for device details. The problem can usually be solved by adding a dummy read through the write buffer, which ensures that the buffer is drained. The figure below shows a possible workaround:
[Figure]

15 Self-reset

Cortex-M processors have a self-reset feature. A system reset can be triggered through the SYSRESETREQ bit in the AIRCR register. In the CMSIS library, the C function NVIC_SystemReset(void) provides this functionality.

Processor architecture requirements
Before the self-reset, a DSB instruction is needed to ensure that all outstanding transfers have completed. Interrupts can also be disabled with CPSID I; this is optional and prevents an enabled interrupt from being taken while the reset is in progress.
[Figure]
System Implementation Requirements
On Cortex-M processors, the DSB instruction is optional if CPSID is not used. Because accesses to the SCS already have DSB behavior, the reset cannot begin until the write operation is complete. As shown in the figure below:
[Figure]
If the CPSID instruction is used, a DSB should be inserted before it to ensure that CPSID is not executed until the previous transfers have completed. This way, if a previous transfer causes an imprecise bus fault, the fault is taken before interrupts are disabled.

ARMv6-M has no bus fault exception, so this does not apply to Cortex-M0 processors

[Figure]
If the system has a write buffer at the bus level, a dummy read through the write buffer can be performed to ensure that the system-level write buffer has been drained before CPSID is executed and the self-reset is performed. As shown below:

If you are using CMSIS 2.0 or higher, the NVIC_SystemReset(void) function already contains the DSB instruction.

16 CONTROL register

The CONTROL register is one of the special registers implemented in Cortex-M processors. It can be accessed through the MSR and MRS instructions.

Processor Architectural Requirements
Architecturally, an ISB instruction should be used after modifying the CONTROL register. Below is code that switches from privileged execution to unprivileged execution.

This operation is not supported in Cortex-M0

MOVS R0, #0x1
MSR CONTROL, R0 ; Switch to non-privileged state
ISB             ; Instruction Synchronization Barrier
...

The ISB ensures that subsequent instructions are fetched at the correct privilege level, as shown in the figure below:
[Figure]
The CONTROL register can also be used to select which stack pointer is used in Thread mode.

System Implementation Requirements
In Cortex-M processors, omitting the ISB instruction after writing to the CONTROL register does not cause a program error, unless the write changes the privilege level and the following instructions have already been prefetched at the previous privilege level. The ISB instruction is required if and only if subsequent instructions must be fetched at the correct privilege level.

17 MPU programming

MPU is an optional feature of Cortex-M0+, Cortex-M3 and Cortex-M4 processors.

Processor architecture requirements

The MPU configuration registers are located in the SCS, so there is no need to insert memory barrier instructions between the steps of MPU programming
Strongly-ordered accesses are not ordered with respect to Normal memory accesses. Architecturally, a DMB is needed before the MPU programming sequence and a DSB after it, to ensure that all settings are visible on all buses
If a change in MPU settings affects program memory, an ISB instruction should also be added to ensure that instructions are re-fetched with the updated MPU settings

If the MPU is programmed inside an exception handler, no ISB instruction is needed, because exception entry and exit already have ISB behavior. For example, in an application running an RTOS, the MPU settings for a user thread can be updated during a context switch inside the PendSV exception handler. The exception exit sequence between PendSV and the user thread ensures that the MPU settings take effect. This applies both to the architectural behavior and to the system implementation of current Cortex-M processors.

System implementation requirements

On Cortex-M0+, Cortex-M3 and Cortex-M4 processors, omitting the DMB instruction before the MPU programming code does not cause problems
Omitting the DSB instruction after the MPU programming code does not cause problems either
If a change to the MPU settings only affects memory that holds data and not memory that holds the program, no ISB instruction is required on Cortex-M processors. An ISB instruction is required if subsequent instructions need to be fetched using the new MPU settings.

18 Multi-master systems

If the code must run correctly on multiple systems, that is, if portability across architectures matters, then memory barrier instructions are necessary.

Processor Architecture Requirements
A DMB or DSB instruction is needed when handling shared data whose ordering in memory must be preserved. For example, a DMB instruction is needed before starting a DMA operation.

A DSB can also be used instead of the DMB

[Figure]
Without the DMB, the two stores may architecturally be reordered or overlapped, so the DMA could start before the data update is complete. Another multi-master example is passing information between two processors through shared memory. When data is passed to a program running on a different processor, the data is typically written to shared memory first and then a software flag is set in the shared memory. In this case, a DMB or DSB instruction should be used to ensure correct ordering between the two memory accesses:
[Figure]
The interaction between two processors is not limited to shared memory. Another possible interaction is event communication (such as a message queue). In this case, a DSB instruction may be needed to ensure that the correct order between memory transfers and events is preserved.
[Figure]

System Implementation Requirements
When the DMB or DSB instructions are removed from the first two figures in the processor architecture requirements above, no error occurs, because Cortex-M processors do not reorder memory transfers and do not allow two write transfers to overlap.

In Cortex-M3 and Cortex-M4 processors, the last figure in the processor architecture requirements does require the DSB instruction. In the Cortex-M0 processor, however, omitting the DMB or DSB instruction does not cause an error in any of these three examples, because the Cortex-M0 processor has no write buffer.
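The "write the data, then set a flag" pattern between two processors can also be expressed portably with C11 atomics, which lets the compiler choose the barrier instruction for the target. The sketch below is a one-shot mailbox under that assumption; the type and function names (mailbox_t, mailbox_send, mailbox_receive) are invented for illustration, and on Cortex-M targets compilers typically lower these fences to a DMB instruction.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-shot mailbox placed in memory shared by two processors. */
typedef struct {
    uint32_t    payload;   /* the data being passed */
    atomic_bool ready;     /* software flag, set after the data is written */
} mailbox_t;

void mailbox_send(mailbox_t *m, uint32_t value)
{
    m->payload = value;
    /* Release fence: the payload store is ordered before the flag store,
     * playing the role of the DMB between the two writes. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&m->ready, true, memory_order_relaxed);
}

bool mailbox_receive(mailbox_t *m, uint32_t *out)
{
    if (!atomic_load_explicit(&m->ready, memory_order_relaxed)) {
        return false;      /* nothing has been posted yet */
    }
    /* Acquire fence: the payload load is ordered after the flag load. */
    atomic_thread_fence(memory_order_acquire);
    *out = m->payload;
    return true;
}
```

This covers the processor-to-processor case only. For the DMA case, the start command is a write to a Device register, so an explicit DMB (for example the CMSIS __DMB() intrinsic) before that write remains the direct way to express the requirement.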

19 Semaphores and mutexes (single-core and multi-core)

Semaphores and mutex operations are essential in many operating systems. They can be used in either a single-processor environment or a multi-processor environment.

In a multiprocessor environment, semaphore operations require software variables to be placed in shared memory among multiple processors. To ensure correct operation, memory barrier instructions should be used. If a cache is present in a multiprocessor system, you must ensure that the correct cache configuration is used so that the data in shared memory is consistent across all processors.

Processor architecture requirements
The DMB instruction should be used in semaphore and mutex operations. The following example shows a simple code to acquire the lock. After acquiring the lock, you need to use the DMB instruction:

/* Note: __LDREXW and __STREXW are CMSIS functions, provided by the CMSIS device header */
#include <stdint.h>

void get_lock(volatile uint32_t *Lock_Variable)
{
	uint32_t status = 0;
	do {
		while (__LDREXW(Lock_Variable) != 0); // Wait until Lock_Variable is free
		status = __STREXW(1, Lock_Variable);  // Try to set Lock_Variable (returns 0 on success)
	} while (status != 0);                    // Retry until the lock is acquired
	__DMB();                                  // Keep later accesses after the lock acquisition
	return;
}

Likewise, the code that releases the lock should have a memory barrier at the beginning:

void free_lock(volatile uint32_t *Lock_Variable)
{
	__DMB();            // Ensure memory operations completed before
	*Lock_Variable = 0; // releasing lock
	return;
}

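The barrier ensures that all memory accesses made while holding the lock are observed before the lock is seen as released. Without it, another thread or processor could acquire the lock and access the shared resource while earlier writes to that resource are still pending.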
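
For readers without a Cortex-M toolchain, the same barrier placement can be sketched with C11 atomics. This is a host-portable analogue and not the CMSIS code above: `spinlock_t`, `spin_acquire`, `spin_release` and `counter_increment` are hypothetical names, and the acquire and release fences play the roles of the DMB after taking the lock and the DMB before releasing it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical lock type for illustration. */
typedef struct {
    atomic_flag held;
} spinlock_t;

void spin_acquire(spinlock_t *l)
{
    /* Spin until the flag was previously clear, i.e. we took the lock. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_relaxed)) {
        /* busy-wait */
    }
    /* Barrier after acquiring: later accesses stay after the lock. */
    atomic_thread_fence(memory_order_acquire);
}

void spin_release(spinlock_t *l)
{
    /* Barrier before releasing: earlier accesses stay before the unlock. */
    atomic_thread_fence(memory_order_release);
    atomic_flag_clear_explicit(&l->held, memory_order_relaxed);
}

/* Shared resource protected by the lock. */
static uint32_t shared_counter;

uint32_t counter_increment(spinlock_t *l)
{
    assert(l != 0);
    spin_acquire(l);
    uint32_t value = ++shared_counter;  /* critical section */
    spin_release(l);
    return value;
}
```

The critical section is bracketed by the two fences, so its accesses cannot drift outside the region in which the lock is held.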

System implementation requirements

On microcontroller devices using Cortex-M3 and Cortex-M4 processors, omitting the DMB instructions in semaphore and mutex operations does not cause an error. However, it can go wrong in the following cases:

Processor has cache
Software is used on multi-core systems.

ARM recommends using DMB instructions in semaphore and mutex operations in operating system design.

Cortex-M0 and Cortex-M0+ processors do not have exclusive access instructions (LDREX/STREX).

20 Self-modifying code

Program code is usually static and is not modified at run time. However, a program is allowed to modify its own code at run time, and the modified code then executes and changes the program's behavior. Typical uses are anti-cheating (raising the complexity and security of the program) and encryption and decryption (improving data security).

If your program contains self-modifying code, you need memory barriers if the modified code is to be executed shortly after the modification. Since program code can be prefetched, a DSB instruction should be executed, followed by an ISB, to ensure the pipeline is flushed.

Processor architecture requirements
The architecture requires a DSB instruction immediately followed by an ISB instruction after modifying program memory.

STR <new_instr>, [<inst_address1>]
DSB               ; Ensure store is completed before flushing pipeline
ISB               ; Flush pipeline
B <inst_address1> ; Execute updated program

The following figure shows the memory barrier instructions required to meet processor architectural and implementation requirements for self-modifying code.
[Figure: memory barrier instructions for self-modifying code]
If a cache exists in the system, cache maintenance operations should also be performed to ensure that the instruction cache is updated.
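
The store, DSB, ISB sequence can be wrapped in a small C helper, sketched below. `patch_instruction`, `barrier_dsb` and `barrier_isb` are hypothetical names; in a CMSIS project the barriers would be `__DSB()` and `__ISB()`. The sketch assumes a 16-bit Thumb instruction located in writable memory and does not perform any cache maintenance. On a host build the barriers fall back to C11 fences, so the function only demonstrates the ordering of the steps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical wrapper: a real DSB on Cortex-M builds, a C11 fence elsewhere. */
static inline void barrier_dsb(void)
{
#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
    __asm volatile ("dsb sy" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Hypothetical wrapper: a real ISB on Cortex-M builds, a compiler-level
   fence elsewhere (a host has no pipeline flush to model here). */
static inline void barrier_isb(void)
{
#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
    __asm volatile ("isb sy" ::: "memory");
#else
    atomic_signal_fence(memory_order_seq_cst);
#endif
}

/* Overwrite one 16-bit instruction and make it safe to execute afterwards.
   Assumption: inst_address points to writable, executable memory. */
void patch_instruction(volatile uint16_t *inst_address, uint16_t new_instr)
{
    assert(inst_address != 0);
    *inst_address = new_instr;  /* store the new instruction */
    barrier_dsb();              /* ensure the store is completed */
    barrier_isb();              /* flush the pipeline so the update is fetched */
}
```

After the helper returns, a branch to the patched address corresponds to the final `B <inst_address1>` step of the assembly sequence.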

System implementation requirements
In general, after modifying program memory, a DSB should be used first, followed by an ISB. The DSB can be omitted if there is no write buffer or cache in the processor or system, as in a Cortex-M0 based microcontroller.

Cortex-M3 and Cortex-M4 processors can prefetch up to six instructions. If an application executes an instruction shortly after modifying it in program memory, the old, already prefetched instruction may be executed instead. If the instruction is not executed for some time after the modification, the program may work correctly, but this is not guaranteed.

Some Cortex-M3 and Cortex-M4 designs may have implementation-specific program cache to speed up program memory accesses. Additional steps may be required after modifying program code to ensure that the program cache is cleared.

Summary

In the ARM architecture, memory barriers are a mechanism used to ensure that programs execute in the expected order in a multi-core or multi-thread environment. ARM defines three memory barrier instructions: DMB (Data Memory Barrier), DSB (Data Synchronization Barrier) and ISB (Instruction Synchronization Barrier).

DMB instructions are used to ensure the order of memory accesses. In a multi-core or multi-master system, memory accesses that become visible out of order can cause data consistency problems. A DMB ensures that all explicit memory accesses before it in program order are observed before any explicit memory access after it, which prevents reads and writes from being seen out of order. It does not affect instructions that do not access memory.

DSB instructions are used to ensure completion of memory accesses and data synchronization. No instruction after the DSB executes until all explicit memory accesses before the DSB have completed. This is a stronger guarantee than DMB, because it holds back every following instruction and not only the following memory accesses.

ISB instructions are used to ensure synchronization of instructions. An ISB flushes the processor pipeline, so that all instructions after the ISB are fetched again from cache or memory once the ISB has completed. This ensures that the effects of context-changing operations performed before the ISB, such as updates to system control registers, are visible to the instructions that follow it. An ISB does not itself clean or invalidate caches.

To sum up, ARM's memory barrier mechanism ensures correct ordering in a multi-core or multi-thread environment through the DMB, DSB and ISB instructions. These instructions provide ordering and synchronization guarantees for memory accesses and instruction fetches, supporting program correctness and reliability.

