The previous section, a detailed explanation of the memory barrier instructions DMB, DSB and ISB, introduced the functions of the three instructions and gave some examples. When a memory barrier instruction is needed depends on both the processor architecture (such as Cortex-M or Cortex-A) and the system implementation of the processor (the same architecture can be implemented differently; for example, STM32 and NXP devices both include microcontrollers based on Cortex-M4).
This section will continue to understand memory barriers in depth through 20 examples, mainly from the following two aspects:
(1) Processor architecture requirements : refers to the specifications and requirements defined in the hardware architecture. It describes the processor's instruction set, registers, interrupt control, memory access, pipeline structure and other hardware features. These specifications are usually determined by processor designers or architecture definition organizations (such as ARM, x86, etc.). The architectural requirements are generic and apply to all processors based on this architecture.
(2) System implementation requirements : refers to the specific methods to implement these specifications according to architectural requirements in specific processor implementations. Each processor manufacturer can design and produce its own processor according to the architecture specification, but their implementation must follow the architecture specification. Implementation requirements may vary by processor model, version, and manufacturer.
Article directory
- 1 Access to ordinary data in memory
- 2 Access between devices (peripherals)
- 3 Bit-band access
- 4 SCS peripheral access
- 5 Enable interrupts through NVIC
- 6 Turn off interrupts through NVIC
- 7 Enable interrupts using CPS and MSR instructions
- 8 Use CPS and MSR instructions to turn off interrupts
- 9 Disable peripheral interrupts
- 10 Change the priority of interrupts
- 11 Vector table configuration-VTOR
- 12 Vector table entry configuration
- 13 Memory mapping changes
- 14 Enter sleep mode
- 15 Self-reset
- 16 CONTROL register
- 17 MPU programming
- 18 Multi-master system
- 19 Semaphores and mutexes (single-core and multi-core)
- 20 Self-modifying code
- Summary
1 Access to ordinary data in memory
In this case there is no need to use a memory barrier between each memory access:
- Processor architecture: The processor can reorder data transfers as long as it does not affect the execution of the program
- System implementation: In the Cortex-M processor, data transmission is performed in the programmed sequence
2 Access between devices (peripherals)
During peripheral programming or peripheral access, there is no need to use memory barrier instructions between each step:
- Processor architecture: Access to the same device must be performed in program order
- System implementation: Cortex-M processor does not reorder data transfers
If the programming sequence involves many different devices:
- Processor architecture: Memory barriers are required when different devices are accessed and the order of programming between the two devices may affect the results. This is because the bus structure may have different bus branches leading to each device, and the different bus branches may have different delays.
- System implementation: Cortex-M processors do not reorder data transfers, so no memory barriers are required when accessing different devices
3 Bit-band access
Bitband access on Cortex-M3 and Cortex-M4 processors is a special feature. It makes two parts of the memory map bit-addressable:
- Processor architecture: The bitbanding feature is not part of the ARMv7 or ARMv6 architecture, so there is no architecture-defined requirement for using memory barriers for bitbanded accesses
- System Implementation: Cortex-M3 and Cortex-M4 processors handle bitband accesses, bitband regions, and bitband alias regions, in programming order. There is no need to use memory barriers
ARM Cortex-M0 and Cortex-M0+ processors do not have the bit-banding feature. It can be added to Cortex-M0 and Cortex-M0+ processors using a bus wrapper. In this case, the bus wrapper must maintain the correct memory order.
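As a rough illustration of what "bit-addressable" means, the sketch below computes the alias word address for one bit in a bit-band region, using the mapping documented for Cortex-M3/M4 (alias address = alias base + byte offset × 32 + bit number × 4). The function name `bitband_alias` and the macro names are made up for this example, and the code only performs address arithmetic; it does not access hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-band regions and their alias regions on Cortex-M3/M4.
   The macro names are illustrative, not taken from a vendor header. */
#define SRAM_BB_BASE     0x20000000u  /* 1 MB SRAM bit-band region       */
#define SRAM_BB_ALIAS    0x22000000u  /* 32 MB alias of the SRAM region  */
#define PERIPH_BB_BASE   0x40000000u  /* 1 MB peripheral bit-band region */
#define PERIPH_BB_ALIAS  0x42000000u  /* 32 MB alias of that region      */
#define BB_REGION_SIZE   0x00100000u

/* Return the alias word address that maps to bit `bit` (0..7)
   of the byte at `byte_addr`. Hypothetical helper. */
uint32_t bitband_alias(uint32_t byte_addr, uint32_t bit)
{
    uint32_t region_base;
    uint32_t alias_base;

    assert(bit < 8u);
    if (byte_addr >= SRAM_BB_BASE &&
        byte_addr - SRAM_BB_BASE < BB_REGION_SIZE) {
        region_base = SRAM_BB_BASE;
        alias_base  = SRAM_BB_ALIAS;
    } else {
        assert(byte_addr >= PERIPH_BB_BASE &&
               byte_addr - PERIPH_BB_BASE < BB_REGION_SIZE);
        region_base = PERIPH_BB_BASE;
        alias_base  = PERIPH_BB_ALIAS;
    }
    /* Each bit of the region occupies one 32-bit word in the alias. */
    return alias_base + (byte_addr - region_base) * 32u + bit * 4u;
}
```

For example, bit 7 of the byte at 0x200FFFFF maps to the alias word at 0x23FFFFFC. A store to that word sets or clears the single bit, and the processor handles such accesses in program order as described above.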
4 SCS peripheral access
SCS peripheral accesses, such as NVIC and debug accesses, generally do not require the use of memory barrier instructions: there is no need to insert memory barrier instructions between each SCS access, nor between SCS accesses and device memory accesses.
- Processor architecture: The memory region where the SCS is located is Strongly-ordered by default, so SCS accesses have DMB-like ordering of their own (as shown in the figure below)
- System implementation: There is no need to insert memory barrier instructions between each SCS access, nor between SCS accesses and device memory accesses.
Processor architecture requirements
- If you need to see the effect of a write to an SCS register immediately, a DSB is needed
- There is no need to add memory barrier instructions between two adjacent SCS accesses
- If the next instruction must be executed only after the previous SCS write has taken effect, a DSB instruction is needed. Examples are as follows:
SCB->SCR |= SCB_SCR_SLEEPDEEP_Msk; /* Enable deepsleep */
__DSB(); /* Ensure effect of last store takes effect */
__WFI(); /* Enter sleep mode */
---------------------
void Device_IRQHandler(void) {
    software_flag = 1; /* Update software variable used in thread */
    SCB->SCR &= ~SCB_SCR_SLEEPONEXIT_Msk; /* Disable sleeponexit */
    __DSB(); /* Ensure effect of last store takes effect */
    return;
}
Note that, at the architecture level, memory ordering is not guaranteed between accesses to Normal memory and accesses to the SCS. If the operation of the program depends on the ordering between SCS accesses and Normal memory accesses, a memory barrier instruction such as DMB or DSB is required. Below is an example:
STR R0, [R1] ; Access to a Normal Memory location
DMB ; Add DMB ensures ordering for ALL memory types
STR R3, [R2] ; Access to a SCS location
DMB ; Add DMB ensures ordering for ALL memory types
STR R0, [R1] ; Access to a Normal Memory location
- The DMB is not required if [R1] points to a Device or Strongly-ordered memory region
System Implementation Requirements
In existing Cortex-M processors, omitting the DMB or DSB instruction will not cause an error because SCS accesses in these processors already have DSB behavior:
- In Cortex-M0 and M0+ processors, this behavior takes effect immediately after the access is completed. A DSB is not strictly required after an SCS access.
- In Cortex-M3 and M4 processors, the effect of the memory barrier takes effect immediately after accessing the SCS. For accesses to the SCS, a DSB instruction is usually not strictly required, except in the special case of SLEEPONEXIT updates.
  - If the exception handler disables the SLEEPONEXIT feature in the SCS just before the exception return, a DSB instruction is required after the SCR write and before the exception return. Refer to the previous Device_IRQHandler example.
Take a look at the behavior of SCS accesses at the system implementation level:
The figure makes it clear that every access to the SCS (including the NVIC) has an implicit DSB effect, as if a data synchronization barrier were automatically added to the Device/Strongly-ordered access. So for the previous example under the processor architecture requirements, the DSB can be removed:
SCB->SCR |= SCB_SCR_SLEEPDEEP_Msk; /* Enable deepsleep */
__WFI(); /* Enter sleep mode */
Notice:
- Existing Cortex-M processors do not reorder any data transfers, so DMB instructions do not need to be used
- For Cortex-M3 and Cortex-M4 processors, if the instruction after the SCS load/store is a NOP instruction or a conditional instruction whose condition failed, that instruction can be executed in parallel with the SCS load/store instruction.
5 Enable interrupts through NVIC
Normally, NVIC operations do not require the use of memory barrier instructions. The code is as follows:
device_config(); // Setup peripheral
NVIC_ClearPendingIRQ(device_IRQn); // clear pending status
NVIC_SetPriority(device_IRQn, priority); // set priority level
NVIC_EnableIRQ(device_IRQn); // Enable interrupt
- When an interrupt event occurs, it can enter the pending state first, regardless of whether the interrupt is enabled or not.
As mentioned earlier, from an architectural point of view, SCS accesses (the NVIC belongs to the SCS) are Device or Strongly-ordered accesses, so two adjacent accesses are ordered as if a DMB were inserted between them.
In the system implementation of Cortex-M processors, SCS accesses (the NVIC belongs to the SCS) behave as if a DSB were inserted between two adjacent accesses.
For the Cortex-M processor, due to its pipeline nature, if the interrupt is already pending, the processor can execute up to two additional instructions after enabling the interrupt in the NVIC before executing the interrupt service routine. As shown in the figure below:
Processor architecture requirements.
Different applications have different requirements:
- In normal NVIC operation, there is no need to use memory barriers
- There is no need to use memory barriers between NVIC operations and peripheral operations
- If a pending interrupt needs to be responded to immediately after it is enabled in the NVIC, add a DSB followed by an ISB
If the instructions after the interrupt is enabled depend on the result of the pending interrupt, memory barrier instructions should be added. An example is as follows:
LDR R0, =0xE000E100 ; NVIC_SETENA address
MOVS R1, #0x1
STR R1, [R0] ; Enable IRQ #0
DSB ; Ensure write is completed
; (architecturally required, but not strictly
; required for existing Cortex-M processors)
ISB ; Ensure IRQ #0 is executed
CMP R8, #1 ; Value of R8 dependent on the execution
; result of IRQ #0 handler
If the above memory barrier instructions are omitted, CMP may be executed before the interrupt is taken, as shown in the following figure:
System implementation requirements
Different applications have different requirements:
- In normal NVIC operation, there is no need to use memory barriers
- If a pending interrupt needs to be responded to immediately after it is enabled in the NVIC, an ISB instruction needs to be added
Note: Since an access to the NVIC (SCS) inherently acts as a DSB, enabled and pending interrupts are still recognized immediately when the DSB instruction is omitted.
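The assembly example earlier in this section writes 0x1 to 0xE000E100 to enable IRQ #0. For other interrupt numbers, the register address and bit mask can be derived as in the sketch below: each 32-bit set-enable register covers 32 interrupts. The helper names `nvic_iser_address` and `nvic_iser_mask` are hypothetical, and the limit of 240 reflects the maximum number of external interrupts on Cortex-M3/M4; a particular device usually implements fewer.

```c
#include <assert.h>
#include <stdint.h>

#define NVIC_ISER_BASE 0xE000E100u  /* set-enable register for IRQ 0..31 */

/* Address of the set-enable register that contains `irqn`. */
uint32_t nvic_iser_address(uint32_t irqn)
{
    assert(irqn < 240u);
    return NVIC_ISER_BASE + 4u * (irqn >> 5);   /* one word per 32 IRQs */
}

/* Bit mask to write to that register to enable `irqn`. */
uint32_t nvic_iser_mask(uint32_t irqn)
{
    assert(irqn < 240u);
    return (uint32_t)1 << (irqn & 31u);
}
```

On a real target the CMSIS function NVIC_EnableIRQ() performs this write; the barrier requirements described above apply after that store in the same way.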
6 Turn off interrupts through NVIC
Due to the pipelined nature of Cortex-M processors, an interrupt entry sequence (the series of operations the processor performs when an interrupt event occurs) may already be underway at the moment the NVIC write that disables the interrupt takes place. It is therefore possible for the interrupt handler to be executed immediately after the interrupt has been disabled in the NVIC.
Processor Architecture Requirements
Depending on the application requirements, memory barriers need to be used:
- General NVIC programming does not require the use of memory barriers when IRQ is disabled
- There is also no need to use a memory barrier between NVIC programming and peripheral programming
- If you need to ensure that the interrupt is not triggered after it has been disabled in the NVIC, add a DSB instruction followed by an ISB
The following is an example of switching interrupt handling functions (modifying vector tables):
#define MEMORY_PTR(addr) (*((volatile unsigned long *)(addr)))
NVIC_DisableIRQ(device_IRQn);
__DSB();
__ISB();
// Change vector to a different one
MEMORY_PTR(SCB->VTOR+0x40+(device_IRQn<<2))=(unsigned long) device_Handler;
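The address arithmetic in the last line of the example can be sketched as a small helper. The offset 0x40 skips the first 16 words of the vector table (the initial stack pointer value followed by the system exception vectors), and each vector occupies 4 bytes. The helper name `irq_vector_address` is hypothetical and the code only computes the address; writing the new handler to it is still subject to the barrier requirements in this section.

```c
#include <assert.h>
#include <stdint.h>

#define SYSTEM_VECTOR_WORDS 16u  /* initial SP + system exceptions */
#define VECTOR_SIZE_BYTES    4u

/* Address of the vector table entry for external interrupt `irqn`,
   given the current value of VTOR. */
uint32_t irq_vector_address(uint32_t vtor, uint32_t irqn)
{
    assert(irqn < 240u);  /* Cortex-M3/M4 support at most 240 IRQs */
    return vtor
         + SYSTEM_VECTOR_WORDS * VECTOR_SIZE_BYTES   /* 0x40 */
         + irqn * VECTOR_SIZE_BYTES;                 /* irqn << 2 */
}
```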
System implementation requirements
According to different application requirements, memory barriers need to be used:
- There is no need to use a memory barrier when disabling IRQ in normal NVIC programming
- There is also no need to use a memory barrier between NVIC programming and peripheral programming
- If you need to ensure that the interrupt is not triggered after the NVIC disables it, add an ISB instruction
7 Enable interrupts using CPS and MSR instructions
In a normal application, there is no need to add any barrier instructions after enabling interrupts using the CPS instruction:
__enable_irq(); /* Actually executes CPSIE I to clear PRIMASK */
If an interrupt is already pending, the processor will handle it after CPSIE I. However, before the processor enters the exception handler, additional instructions may be executed:
- For Cortex-M3 or Cortex-M4, the processor can execute up to two additional instructions before entering the interrupt service routine
- For Cortex-M0, the processor can execute up to one additional instruction before entering the interrupt service routine
As shown in the figure below:
Processor architecture requirements
- If it is necessary to ensure that pending interrupts are recognized before subsequent operations are performed, an ISB should be used after CPSIE I, as shown in the following figure:
- If you want to allow a pending interrupt to be taken between two critical sections, you can use an ISB instruction to achieve this. The code looks like this:
__enable_irq(); // CPSIE I : Enable interrupt
__ISB(); // Allow pended interrupts to be recognized
__disable_irq(); // CPSID I : Disable interrupt
Another typical example is:
The timing diagram is as follows:
- When using MSR instructions to enable interrupts, the requirements are the same as above
System Implementation Requirements
In Cortex-M processors:
- If it is necessary to ensure that pending interrupts are recognized before subsequent operations are performed, an ISB should be used after CPSIE I. This is the same as the processor architecture requirement
- There is one exception: when CPSIE is immediately followed by CPSID, Cortex-M processors do not need an ISB inserted between CPSIE and CPSID. The code is as follows:
The timing diagram is as follows:
Under the system implementation requirements, there is no need to add a memory barrier instruction between __enable_irq() and __disable_irq(). However, under the processor architecture requirements, if interrupts need to be recognized between the CPSIE and CPSID instructions, an ISB instruction is required.
- When using MSR instructions to enable interrupts, the requirements are the same as above
Under the processor architecture requirements, an ISB must be added wherever interrupts need to be reliably recognized, because the architecture allows implementations in which enabling and disabling interrupts needs additional synchronization. Using ISB is therefore the way to get correct behavior on any implementation of the architecture. Under the system implementation requirements, existing Cortex-M processors already synchronize interrupt enabling and disabling in hardware, so the code above needs no memory barrier instruction between __enable_irq() and __disable_irq().
8 Use CPS and MSR instructions to turn off interrupts
The CPSID instruction is self-synchronizing within the instruction stream, so there is no need to insert a memory barrier instruction after CPSID.
Processor architectural requirements
No need to use memory barriers.
System Implementation Requirements
No need to use memory barriers.
- When using MSR instructions to turn off interrupts, the requirements are the same as above
9 Disable peripheral interrupts
When an interrupt is disabled on a peripheral, additional time may be required due to the many possible sources of latency in the system. The following figure shows several different sources of latency:
Even after the peripheral is disabled, an interrupt request from the disabled peripheral may be received for a short period of time.
Processor architecture requirements:
No requirements, everything is determined by the following system implementation requirements.
System Implementation Requirements
Latency depends on the device. For most cases, if the delay in the IRQ synchronizer is small, the following steps can be used to disable interrupts:
- Disable the peripheral interrupt by writing to its CONTROL register
- Read the peripheral's control register to ensure it has been updated
- Disable the IRQ in the NVIC
- Clear the IRQ pending status in the NVIC
- Read the IRQ pending status. If IRQ pending is set, clear the IRQ pending status in the peripheral, and then clear the IRQ pending status in the NVIC again. This step must be repeated until the NVIC IRQ pending status remains clear.
This sequence of steps works on most simple microcontroller devices and can successfully disable interrupts. However, due to various latency factors that can occur within the system, it is recommended to contact the chip vendor or manufacturer for support.
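The steps above can be sketched in C as follows. To keep the example compilable on a host, the peripheral registers and the NVIC are replaced by simple simulated variables; every name here (`periph_ctrl`, `periph_flag`, `sim_nvic_clear_pending`, `disable_device_irq`) is an assumption made for illustration. On a real target the NVIC steps would use the CMSIS functions NVIC_DisableIRQ(), NVIC_ClearPendingIRQ() and NVIC_GetPendingIRQ(), and the peripheral accesses would use the register definitions from the device header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated hardware state (hypothetical stand-ins for real registers). */
static volatile uint32_t periph_ctrl = 0x1u;  /* bit 0: IRQ enable in peripheral */
static volatile uint32_t periph_flag = 0x1u;  /* bit 0: request latched in peripheral */
static volatile bool nvic_enabled = true;
static volatile bool nvic_pending = true;

/* Simulates clearing the pending bit of a level-sensitive interrupt:
   the bit is set again for as long as the peripheral still requests. */
static void sim_nvic_clear_pending(void)
{
    nvic_pending = (periph_flag & 0x1u) != 0u;
}

void disable_device_irq(void)
{
    periph_ctrl &= ~0x1u;        /* 1. disable the interrupt in the peripheral */
    (void)periph_ctrl;           /* 2. read back so the write has reached it   */
    nvic_enabled = false;        /* 3. disable the IRQ in the NVIC             */
    sim_nvic_clear_pending();    /* 4. clear the pending status in the NVIC    */
    while (nvic_pending) {       /* 5. repeat until pending stays clear        */
        periph_flag = 0u;        /*    clear the request in the peripheral     */
        sim_nvic_clear_pending();/*    then clear the NVIC pending bit again   */
    }
}
```

The loop in step 5 is what absorbs a request that was already in flight when the peripheral was disabled. How many iterations are needed on real hardware depends on the device-specific latencies described above.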
10 Change the priority of interrupts
The priority level of an interrupt is determined by the Priority Level registers of the NVIC in the SCS. For Cortex-M3 or Cortex-M4 processors, priority levels can be changed dynamically. However, for ARMv6-M processors, such as Cortex-M0 or Cortex-M0+, dynamically changing the priority of enabled interrupts or exceptions is not supported. Priority should be set before interrupts are enabled.
Processor Architecture Requirements
Since the SCS is Strongly-ordered memory, NVIC configuration does not require memory barriers. However, after changing the priority of an interrupt, if the interrupt is enabled and is required to execute at the new priority level, a DSB and an ISB instruction should be inserted after the change.
- Note: On ARMv6-M processors, the priority level of an interrupt should only be changed while the interrupt is disabled, otherwise the results are unpredictable
If the next instruction is CPSIE or MSR, the processor architecture requires a DSB to be inserted after the priority change, followed by an ISB if a pending interrupt must be recognized immediately (otherwise the ISB can be omitted). This sequence ensures that the interrupt state is switched correctly and that instructions execute in the intended order.
System Implementation Requirements
In Cortex-M processors, an access to the interrupt priority registers itself acts as a DSB because the SCS is Strongly-ordered memory. In Cortex-M3 or Cortex-M4 processors:
- If a change in priority needs to be recognized immediately, an ISB instruction is needed
- If a change in priority does not need to be recognized immediately before subsequent operations, there is no need to insert a memory barrier instruction
- If the next operation is an SVC exception, there is no need to insert a memory barrier instruction.
For Cortex-M3 or Cortex-M4 processors, if a change in priority level may result in a new interrupt nesting (the interrupt now has a higher priority than the currently executing one) and that interrupt must be taken immediately, an ISB instruction needs to be inserted. Otherwise, because of the pipeline, up to two more instructions may be executed first.
11 Vector table configuration-VTOR
In Cortex-M3 and Cortex-M4 processors, the location of the vector table is determined by VTOR (Vector Table Offset Register) in the SCS.
Processor Architecture Requirements
Architecturally speaking, if an exception may be taken immediately after VTOR is changed and it must use the latest vector table setting, a DSB instruction should be used after the VTOR update.
System Implementation Requirements
In Cortex-M3, Cortex-M4 and Cortex-M0+ processors, an access to the SCS itself acts as a DSB, so there is no need to insert a DSB instruction.
- The Cortex-M0 processor does not have VTOR
12 Vector table entry configuration
This refers to updating individual entries in the vector table.
Processor architectural requirements
If the vector table is located in RAM (such as SRAM/SDRAM), whether through VTOR relocation or through a device-specific memory remapping mechanism, then architecturally a memory barrier instruction is needed after updating a vector table entry if the exception is to be enabled immediately. As shown in the figure below:
If the next instruction is an access to RAM, a DMB instruction is also needed:
That is to say, if the vector table is stored in Normal memory rather than Strongly-ordered memory, memory barrier operations need to be considered.
System Implementation Requirements
In Cortex-M processors, omitting the DSB or DMB instruction when modifying vector entries does not cause any problems, because the exception entry sequence does not start until the last memory access is completed.
13 Memory mapping changes
Many microcontrollers include a device-specific memory remapping feature that allows the memory map to be changed at runtime by programming a configuration register, which should be located in Device memory. Whether memory barrier instructions are required during changes to the memory map configuration depends on the following factors:
- Whether the affected memory space covers the program code, that is, whether it includes instructions.
- The device-specific data path between the processor and the memory configuration registers, such as a write buffer sitting between the CPU and the registers.
Processor Architecture Requirements
The requirements discussed here apply when:
- There are no device-specific write buffers that affect the memory remapping control register, other than any internal write buffers within the processor
- No additional hardware latency in memory map switching
Architecturally speaking, memory barrier instructions should be inserted before and after making memory map changes:
... ; application code before switching
DSB ; Ensure all memory accesses are completed
STR <remap>, [<remap_reg>] ; Write to memory map control register
DSB ; Ensure the write is completed
ISB ; Flush instruction buffer (optional, only required if
; the memory map change affects program memory)
... ; application code after switching
If the affected memory is not used for any program code, a DSB instruction still needs to be inserted after the memory map change, but the ISB instruction can be omitted.
System Implementation Requirements
The requirements discussed here also apply:
- There are no device-specific write buffers that affect the memory remapping control register, other than any internal write buffers within the processor
- No additional hardware latency in memory map switching
In Cortex-M processor:
- No DSB or DMB instructions are required before making a memory map change because these processors do not allow two sequences of write operations to overlap
- After remapping, a sequence of DSB and then ISB is required to ensure that the program code is fetched using the latest memory map
The discussion above rests on the two assumptions listed. If these assumptions are invalid, for example, if the data path between the processor and the memory control registers contains additional system-level write buffers, then the memory barrier instructions cannot guarantee that the transfer is completed. In this case:
- Read operations can be performed from previously accessed areas to ensure that the write buffer is cleared. If multiple write transfers have been issued to various parts of the system, multiple read operations may be required to ensure that all write buffers are flushed.
- Alternatively, the microcontroller or SoC may have a status register that indicates if there are any transfers in progress and notifies when memory remapping is complete. This allows program code to account for additional hardware delays in memory remapping logic if needed.
For device-specific details, consult the chip manufacturer.
14 Enter sleep mode
In Cortex-M processors, sleep mode can be entered using the WFI and WFE instructions.
Processor Architectural Requirements
Architecturally, a DSB instruction should be used before executing the WFI or WFE instruction:
System Implementation Requirements
For simple designs that do not include a system-level write buffer, there is no need to use a memory barrier instruction before entering sleep mode on Cortex-M3 (r2p0 or later), Cortex-M4, Cortex-M0+ and Cortex-M0 processors. This is handled by the processor itself. The situation is more complicated if the internal bus contains a system-level write buffer that is external to the processor. In this case, just using the DSB instruction may not be enough because the system control logic may turn off the clock before the buffered write is completed.
Whether turning off the clock causes an error depends on the system-level design, the sleep operation used, and the peripherals being accessed before entering sleep mode. It is recommended to contact the chip supplier or manufacturer for device details. This problem can usually be solved by performing a dummy read through the write buffer to ensure that the write buffer is drained. The image below shows a possible workaround:
15 Self-reset
The Cortex-M processor has a self-reset function. A system reset can be triggered via the SYSRESETREQ bit in the AIRCR register. In the CMSIS library, the C function NVIC_SystemReset(void) provides this functionality.
Processor architecture requirements
Before the self-reset, a DSB instruction needs to be used to ensure that all outstanding transfers have been completed. CPSID I can also be used to turn off interrupts; this is optional and prevents an enabled interrupt request from being serviced while the self-reset is in progress.
System Implementation Requirements
On Cortex-M processors, the DSB instruction is optional if CPSID is not used. Because an access to the SCS already acts as a DSB, the self-reset cannot begin until the write operation is complete. As shown in the figure below:
If the CPSID instruction is used, a DSB should be inserted before it to ensure that CPSID is not executed until the previous transfers have completed. This way, if a previous transfer causes an imprecise bus fault, the fault is taken before interrupts are disabled.
- ARMv6-M has no bus fault exception, so this does not apply to Cortex-M0 processors
If the system has a write buffer at the bus level, a dummy read through the write buffer can be performed to ensure that the system-level write buffer has been drained before executing the CPSID instruction and performing the self-reset. As shown below:
If you are using CMSIS 2.0 or higher, the NVIC_SystemReset(void) function already contains the DSB instruction.
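For orientation, the value written to AIRCR to request a reset can be sketched as below. A write to AIRCR is ignored unless bits [31:16] carry the key 0x05FA; SYSRESETREQ is bit 2; and on ARMv7-M parts the PRIGROUP field in bits [10:8] is normally preserved. The helper name `aircr_reset_value` and the macro names are hypothetical, and the function only builds the value. On a target, the store to AIRCR would be preceded by a DSB as described in this section.

```c
#include <stdint.h>

#define AIRCR_VECTKEY        0x05FAu          /* write key, bits [31:16]   */
#define AIRCR_VECTKEY_SHIFT  16u
#define AIRCR_PRIGROUP_MASK  0x00000700u      /* PRIGROUP, bits [10:8]     */
#define AIRCR_SYSRESETREQ    0x00000004u      /* SYSRESETREQ, bit 2        */

/* Build the word to store to AIRCR, keeping the current PRIGROUP
   setting taken from a previous read of the register. */
uint32_t aircr_reset_value(uint32_t current_aircr)
{
    return ((uint32_t)AIRCR_VECTKEY << AIRCR_VECTKEY_SHIFT)
         | (current_aircr & AIRCR_PRIGROUP_MASK)
         | AIRCR_SYSRESETREQ;
}
```

A read of AIRCR returns 0xFA05 in the key field, which is why the key is rebuilt rather than copied from the value that was read.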
16 CONTROL register
The CONTROL register is one of the special registers implemented in Cortex-M processors. It can be accessed through the MSR and MRS instructions.
Processor Architectural Requirements
Architecturally, an ISB instruction should be used after modifying the CONTROL register. Below is code that switches from privileged execution to unprivileged execution.
- This operation is not supported in Cortex-M0
MOVS R0, #0x1
MSR CONTROL, R0 ; Switch to non-privileged state
ISB ; Instruction Synchronization Barrier
...
The ISB makes sure that subsequent instructions are fetched at the correct privilege level, as shown in the image below:
The CONTROL register can also be used to select which stack pointer is used in Thread mode.
System Implementation Requirements
In Cortex-M processors, omitting the ISB instruction after writing to the CONTROL register will not cause a program error unless the privilege level is changed and the following instructions have already been prefetched at the previous privilege level. An ISB is required if and only if the subsequent instructions must be fetched at the correct privilege level.
17 MPU programming
MPU is an optional feature of Cortex-M0+, Cortex-M3 and Cortex-M4 processors.
Processor architecture requirements
- The MPU configuration registers are located in the SCS, so there is no need to insert memory barrier instructions between each step of MPU programming
- Strongly-ordered memory does not enforce ordering relative to Normal memory accesses. Architecturally, a DMB is needed before the MPU programming sequence, and a DSB after the MPU programming is complete, to ensure that all settings have taken effect
- If a change in MPU settings affects program memory, an ISB instruction should also be added to ensure instructions are re-fetched with the updated MPU settings
If the MPU programming is performed in an exception handler, no ISB instruction is needed because exception entry and exit already have ISB semantics. For example, in an application running an RTOS, the MPU settings for user threads can be updated during a context switch inside the PendSV exception handler. The exception exit sequence from PendSV to the user thread ensures that the MPU settings take effect. This applies to both the architectural behavior and the system implementation of current Cortex-M processors.
System implementation requirements
- On Cortex-M0+, Cortex-M3 and Cortex-M4 processors, omitting the DMB instruction before the MPU programming code does not cause problems
- Omitting the DSB instruction after the MPU programming code does not cause problems
- If a change to the MPU settings only affects memory holding data and not memory holding the program, no ISB instruction is required on Cortex-M processors. An ISB is required if subsequent instructions need to be fetched using the new MPU settings
18 Multi-master system
If you want your code to run normally on multiple systems, that is, consider the portability of the code under different architectures, then using memory barrier instructions is necessary.
Processor Architecture Requirements
The DMB or DSB instruction is needed when dealing with shared data whose ordering in memory must be preserved. For example, a DMB instruction needs to be used before initiating a DMA operation:
- You can also use DSB instead of DMB
Without the DMB, architecturally the two stores may be reordered or overlapped, so the DMA could start before the data update is complete. Another multi-master example is passing information through shared memory between two processors. When passing data to another program running on a different processor, the data is typically written to shared memory and then a software flag is set in the shared memory. In this case, the DMB or DSB instruction should be used to ensure correct memory ordering between the two memory accesses:
The interaction between the two processors is not limited to shared memory. Another possible interaction is event communication (such as a message queue). In this case, a DSB instruction may be needed to ensure that the correct order between memory transfers and events is preserved.
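The data-then-flag pattern described above can be sketched as follows. This is a minimal illustration, not code from the original article: the type `mailbox_t`, the functions `mailbox_send` and `mailbox_receive`, and the macro `BARRIER_DMB` are all hypothetical names. On a Cortex-M target `BARRIER_DMB()` would be replaced by the CMSIS intrinsic `__DMB()`; here a C11 fence stands in so that the sketch also compiles on a host.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Host stand-in (assumption): on Cortex-M use the CMSIS __DMB() instead. */
#define BARRIER_DMB() atomic_thread_fence(memory_order_seq_cst)

/* A one-slot mailbox placed in memory shared by two bus masters. */
typedef struct {
    volatile uint32_t data;   /* payload                       */
    volatile uint32_t ready;  /* software flag: 1 = data valid */
} mailbox_t;

/* Producer side: write the payload, then publish it with the flag. */
void mailbox_send(mailbox_t *mb, uint32_t value)
{
    mb->data = value;
    BARRIER_DMB();            /* payload must be visible before the flag */
    mb->ready = 1u;
}

/* Consumer side: returns 1 and stores the payload in *out if the flag
   is set, otherwise returns 0. */
int mailbox_receive(mailbox_t *mb, uint32_t *out)
{
    if (mb->ready == 0u) {
        return 0;
    }
    BARRIER_DMB();            /* flag must be read before the payload */
    *out = mb->data;
    mb->ready = 0u;           /* hand the slot back to the producer */
    return 1;
}
```

The barrier on the producer side is the one the text refers to. The barrier on the consumer side is the matching half of the pattern on architectures that may also reorder loads.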
System Implementation Requirements
When the DMB or DSB instructions are removed from the first two examples under the processor architecture requirements above, no error will occur because Cortex-M processors do not reorder memory transfers and do not allow two write transfers to overlap.
In Cortex-M3 and Cortex-M4 processors, the last example under the processor architecture requirements still requires the DSB instruction. But in the Cortex-M0 processor, omitting the DMB or DSB instruction will not cause any error in these three examples because there is no write buffer in the Cortex-M0 processor.
19 Semaphores and mutexes (single-core and multi-core)
Semaphores and mutex operations are essential in many operating systems. They can be used in either a single-processor environment or a multi-processor environment.
In a multiprocessor environment, semaphore operations require software variables to be placed in shared memory among multiple processors. To ensure correct operation, memory barrier instructions should be used. If a cache is present in a multiprocessor system, you must ensure that the correct cache configuration is used so that the data in shared memory is consistent across all processors.
Processor architecture requirements
The DMB instruction should be used in semaphore and mutex operations. The following example shows a simple code to acquire the lock. After acquiring the lock, you need to use the DMB instruction:
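/* Note: __LDREXW and __STREXW are CMSIS functions */
void get_lock(volatile int *Lock_Variable)
{
    int status = 0;
    do {
        // Wait until Lock_Variable is free (it is already a pointer, so pass it directly)
        while (__LDREXW((volatile uint32_t *)Lock_Variable) != 0);
        // Try to set Lock_Variable
        status = __STREXW(1, (volatile uint32_t *)Lock_Variable);
    } while (status != 0); // Retry until the lock is taken successfully
    __DMB(); // Do not start any other memory access until the barrier completes
    return;
}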
Likewise, the code that releases the lock should have a memory barrier at the beginning:
void free_lock(volatile int *Lock_Variable)
{
    __DMB(); // Ensure earlier memory operations complete before releasing the lock
    *Lock_Variable = 0; // Release the lock
    return;
}
This ensures that all memory accesses to the shared resource have completed before the lock is seen as released. Otherwise another thread or processor could acquire the lock "prematurely" and access the shared resource while it is still being updated.
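For readers who want to try the acquire/release pattern of get_lock and free_lock on a host machine, the following is a minimal C11 sketch. It is an analogue, not the CMSIS code: the names `lock_t`, `try_lock`, `get_lock_c11` and `free_lock_c11` are hypothetical, a compare-and-swap stands in for the LDREX/STREX pair, and `atomic_thread_fence` stands in for `__DMB()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef atomic_int lock_t;   /* 0 = free, 1 = taken (hypothetical type name) */

/* Single attempt to take the lock; returns true on success. */
static bool try_lock(lock_t *lock)
{
    assert(lock != NULL);
    int expected = 0;
    /* Compare-and-swap stands in for the LDREX/STREX pair. */
    if (!atomic_compare_exchange_strong_explicit(lock, &expected, 1,
            memory_order_relaxed, memory_order_relaxed)) {
        return false;
    }
    /* Barrier after acquiring: accesses to the protected resource
       must not start before the lock is held. */
    atomic_thread_fence(memory_order_acquire);
    return true;
}

/* Spin until the lock is taken. */
static void get_lock_c11(lock_t *lock)
{
    while (!try_lock(lock)) {
        /* busy-wait */
    }
}

/* Release the lock. */
static void free_lock_c11(lock_t *lock)
{
    assert(lock != NULL);
    /* Barrier before releasing: accesses to the protected resource
       must be complete before the lock appears free. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(lock, 0, memory_order_relaxed);
}
```

The fences are placed where the DMB instructions sit in the CMSIS version: immediately after the lock is obtained and immediately before it is cleared.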
System implementation requirements
On microcontroller devices using Cortex-M3 and Cortex-M4 processors, omitting the DMB instruction in semaphore and mutex operations does not cause an error. But it might go wrong in the following cases:
- The processor has a cache
- The software is used on a multi-core system
ARM recommends using the DMB instruction in semaphore and mutex operations in operating system design.
- Cortex-M0 and Cortex-M0+ processors do not have exclusive access instructions (LDREX/STREX)
20 Self-modifying code
Program code is usually static and is not modified at runtime. However, a program is allowed to modify its own code while it is running, and the modified code is then executed, which changes the behavior of the program. Self-modifying code is generally used for anti-cheating (increasing the complexity and security of the program) and for encryption and decryption (improving data security).
If your program contains self-modifying code, you need to use memory barriers if the modified program code is to be executed shortly after the modification. Since program code can be prefetched, a DSB instruction should be executed, followed by an ISB instruction, to ensure the pipeline is flushed.
Processor architecture requirements
The architectural requirement is to use a DSB instruction immediately followed by an ISB instruction after modifying program memory.
STR <new_instr>, [<inst_address1>]
DSB ; Ensure store is completed before flushing pipeline
ISB ; Flush pipeline
B <inst_address1> ; Execute updated program
The following figure shows the memory barrier instructions required to meet processor architectural and implementation requirements for self-modifying code.
If a cache exists in the system, cache flush operations should be made to ensure that the instruction cache is updated.
System Implementation Requirements
In general, after modifying program memory, a DSB needs to be used first and then an ISB. The DSB can be omitted if there is no write buffer or cache in the processor or system, such as on a Cortex-M0 based microcontroller.
Cortex-M3 and Cortex-M4 processors can prefetch up to six instructions. If an application executes an instruction shortly after modifying it in program memory, the old (already prefetched) instruction may be executed instead of the new one. If the instruction is not executed for a period of time after the modification, the program may work correctly, but this is not guaranteed.
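The prefetch effect can be illustrated with a toy model in C. This is a deliberately simplified sketch and not a description of the real Cortex-M pipeline: the `toy_core_t` structure and the `toy_*` functions are hypothetical, the queue depth of six is taken from the statement above, and the write buffer (the reason for the DSB) is not modeled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROG_WORDS   8
#define PREFETCH_MAX 6   /* up to six prefetched instructions, as stated above */

/* Toy core: program memory plus a queue of already-fetched instructions. */
typedef struct {
    uint32_t mem[PROG_WORDS];       /* "program memory"                 */
    uint32_t queue[PREFETCH_MAX];   /* prefetched copies of mem[]       */
    size_t   queued;                /* number of valid queue entries    */
    size_t   pc;                    /* index of next instruction to run */
} toy_core_t;

/* Fill the prefetch queue starting at the current pc. */
static void toy_prefetch(toy_core_t *c)
{
    c->queued = 0;
    for (size_t i = c->pc; i < PROG_WORDS && c->queued < PREFETCH_MAX; i++) {
        c->queue[c->queued++] = c->mem[i];
    }
}

/* Models the ISB effect: discard everything fetched so far. */
static void toy_isb(toy_core_t *c)
{
    c->queued = 0;
}

/* Return the instruction word that would execute at pc. */
static uint32_t toy_next_instruction(toy_core_t *c)
{
    assert(c->pc < PROG_WORDS);
    if (c->queued == 0) {
        toy_prefetch(c);            /* refetch from memory */
    }
    return c->queue[0];
}
```

In this model, if `mem[0]` is overwritten after `toy_prefetch` has run, `toy_next_instruction` still returns the old word; calling `toy_isb` first forces a refetch and returns the new word.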
Some Cortex-M3 and Cortex-M4 designs may have implementation-specific program cache to speed up program memory accesses. Additional steps may be required after modifying program code to ensure that the program cache is cleared.
Summary
In the ARM architecture, memory barriers are a mechanism used to ensure that memory accesses and instructions take effect in the expected order, especially in a multi-core or multi-thread environment. ARM defines three memory barrier instructions: DMB (Data Memory Barrier), DSB (Data Synchronization Barrier) and ISB (Instruction Synchronization Barrier).
The DMB instruction ensures the order of memory accesses. In a multi-core or multi-master system, write buffers and caches can make memory accesses visible to other observers in a different order from program order. DMB ensures that all explicit memory accesses before it are observed before any explicit memory accesses after it, which avoids out-of-order reading and writing of shared data.
The DSB instruction ensures the completion of memory accesses. It ensures that all explicit memory accesses before the DSB have completed before any instruction after the DSB is executed. It is therefore a stronger barrier than DMB.
The ISB instruction ensures instruction synchronization. It flushes the processor pipeline, so that all instructions after the ISB are fetched again from cache or memory once the ISB has completed. This ensures that the effects of context-changing operations executed before the ISB (for example, changes to MPU settings or system control registers) are visible to the instructions that follow it. ISB does not itself clean or invalidate caches; that requires separate cache maintenance operations.
To sum up, ARM's memory barrier mechanism ensures the correct ordering of program execution in a multi-core or multi-thread environment through the DMB, DSB and ISB instructions. These instructions provide synchronization and ordering guarantees for memory accesses and instructions, ensuring program correctness and reliability.