C language pointers and memory copying: practical experience

1. Background

Memory copying is very common. For example, in the "Operating System" part of the postgraduate entrance examination (408), data received from a peripheral device into the system buffer must then be copied to the user area. That copy to the user area is a memory copy.

A memory copy is usually done by calling memcpy or by writing an assignment loop by hand. memcpy is available directly from the C library, which makes it efficient to program with. But is it really the fastest at run time?

2. Analysis

In its simplest form, memcpy is implemented by assigning through two pointers and advancing them in a loop. The code is as follows:

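void *memcpy(void *dest, const void *src, size_t count)
{
	unsigned char *d = dest;        /* byte-wise view of the destination */
	const unsigned char *s = src;   /* byte-wise view of the source */

	if (NULL == dest || NULL == src)
		return NULL;

	while (count--)
		*d++ = *s++;                /* copy one byte per iteration */

	return dest;                    /* return the original destination pointer */
}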

It is not difficult to see that this memcpy copies byte by byte, one byte per iteration. On 16-bit and 32-bit processors this does not make full use of the processor. Leaving the byte alignment problem aside, 16-bit and 32-bit processors can move up to 2 and 4 bytes of data at a time. So if we keep the loop + pointer assignment approach but cast the pointers to 16-bit or 32-bit pointers, can the copy be made faster?
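The alignment assumption mentioned above can be checked at run time before choosing the wider copy. The following is only a minimal sketch: is_aligned and can_copy_as_words are hypothetical helper names that do not appear in the project, and the check relies on the usual (implementation-defined) conversion of a pointer to uintptr_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when p lies on a multiple of `align` bytes.
 * `align` must be a power of two. */
bool is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1u)) == 0);
    return ((uintptr_t)p & (align - 1u)) == 0;
}

/* Hypothetical helper: true when both ends of a copy can be accessed
 * as 32-bit words without an unaligned access. */
bool can_copy_as_words(const void *dst, const void *src)
{
    return is_aligned(dst, sizeof(uint32_t)) &&
           is_aligned(src, sizeof(uint32_t));
}
```

If the check fails, falling back to the byte loop or to memcpy is the safe choice.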

3. Experiment

The example here comes from an STM32 player I am currently working on, which copies data from the decoding area to the buffer.

void Fill_SAI_Buff0(void) // fill the buffer (32-bit loop version)
{
	if(buff0_attribute==NULL){
		return;
	}
	for(uint16_t count=0; count<SAI_TX_BUFF_HALF/4; count++)
	{
		((uint32_t *)SAI_BUFF)[count] = ((uint32_t *)buff0_attribute)[count];
	}
}

void Fill_SAI_Buff0(void) // fill the buffer (memcpy version)
{
	if(buff0_attribute==NULL){
		return;
	}
	memcpy(SAI_BUFF, buff0_attribute, SAI_TX_BUFF_HALF);
}

Both versions copy from the decoding area to the buffer: the first casts the pointers to 32-bit pointers and assigns in a loop, and the second calls memcpy. Here buff0_attribute is the decoding area and SAI_BUFF is the buffer. The total number of bytes copied is SAI_TX_BUFF_HALF=4096.

After measurement, the memcpy version takes about 40 us, and the version that casts to 32-bit pointers takes about 10 us. The latter is significantly faster than the former. The measurement method is described in my previous article, "KEIL5 debugging timing, measuring program running time, suitable for STM32\MK60\IM6U and other microcontrollers based on Cortex-M architecture processors" (Fairchild_1947's blog on CSDN).

4. Points to note

4.1 Pay attention to changes in the number of copies

Copying through a 32-bit pointer is indeed faster, but changing the width of a single copy also changes how many copies the loop performs, so the upper limit of the loop must be adjusted to match. In the example above, casting the 8-bit array pointer to a 32-bit pointer reduces the number of copies to one quarter, which is why the loop runs to SAI_TX_BUFF_HALF/4.

4.2 Pay attention to the total length of the copy

In the example above, the 8-bit array pointer is cast to a 32-bit pointer. This only works if the number of bytes to copy is an exact multiple of 4. If it is not, rounding the loop count down loses the trailing bytes, and rounding it up accesses memory out of bounds.
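When the length cannot be guaranteed to be a multiple of 4, one common way around the problem is to copy whole words first and then finish the remaining 0 to 3 bytes one at a time. The sketch below is illustrative only: copy_words_then_tail is a hypothetical name, both pointers are assumed to be 4-byte aligned and non-overlapping, and the buffers are assumed to be storage that may legally be accessed as uint32_t (compilers that enforce strict aliasing may otherwise need the buffers declared as uint32_t arrays).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy nbytes from src to dst, 32 bits at a time
 * where possible, then byte by byte for the leftover tail.
 * Assumes dst and src are 4-byte aligned and do not overlap. */
void copy_words_then_tail(void *dst, const void *src, size_t nbytes)
{
    assert(((uintptr_t)dst % sizeof(uint32_t)) == 0);
    assert(((uintptr_t)src % sizeof(uint32_t)) == 0);

    uint32_t *d32 = dst;
    const uint32_t *s32 = src;
    size_t nwords = nbytes / sizeof(uint32_t);  /* number of whole words */
    size_t tail = nbytes % sizeof(uint32_t);    /* 0..3 leftover bytes */

    for (size_t i = 0; i < nwords; i++)
        d32[i] = s32[i];                        /* 4 bytes per iteration */

    uint8_t *d8 = (uint8_t *)(d32 + nwords);
    const uint8_t *s8 = (const uint8_t *)(s32 + nwords);
    for (size_t i = 0; i < tail; i++)
        d8[i] = s8[i];                          /* finish the remainder */
}
```

With nbytes = 7, for instance, the first loop copies one word and the second loop copies three bytes, so nothing past the seventh byte is touched.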

4.3 Pay attention to the details when casting

This is a rather silly mistake I made myself, and I would like to take this opportunity to share it with you.

void Fill_SAI_Buff1(void) // fill the buffer
{
	if(buff1_attribute==NULL){
		return;
	}	
	for(uint16_t count=0; count<SAI_TX_BUFF_HALF/4; count++)
	{
		((uint32_t *)SAI_BUFF)[count+SAI_TX_BUFF_HALF/4] = ((uint32_t *)buff1_attribute)[count];
	}
	SCB_CleanDCache_by_Addr((uint32_t *)(SAI_BUFF+SAI_TX_BUFF_HALF), SAI_TX_BUFF_HALF); // required when the Cortex-M7 D-cache is in write-back mode
}

When the second decoding area is copied to the buffer, the buffer addresses are contiguous, so the copy simply starts at the corresponding offset. But now a cast is involved. When actually writing the code, I confused "(uint32_t *)(SAI_BUFF+SAI_TX_BUFF_HALF)" with "(uint32_t *)(SAI_BUFF)+SAI_TX_BUFF_HALF". The former adds the offset to SAI_BUFF first and then casts the result, while the latter casts SAI_BUFF first and then adds the offset SAI_TX_BUFF_HALF. SAI_BUFF is an array of 8-bit elements, and the offset SAI_TX_BUFF_HALF is also counted in bytes, so "(uint32_t *)(SAI_BUFF+SAI_TX_BUFF_HALF)" is correct. In "(uint32_t *)(SAI_BUFF)+SAI_TX_BUFF_HALF" the offset is counted in units of 32 bits (4 bytes), so the pointer lands four times too far and the second way of writing is wrong.
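The difference between the two expressions can be seen by measuring how far each resulting pointer is from the start of the buffer. This is a small sketch for illustration: HALF is a hypothetical stand-in for SAI_TX_BUFF_HALF, and the two function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HALF 16u  /* hypothetical stand-in for SAI_TX_BUFF_HALF, in bytes */

/* Offset added BEFORE the cast: the addition is done on a uint8_t *,
 * so the pointer moves by HALF bytes. Returns the distance in bytes. */
size_t offset_before_cast(uint8_t *base)
{
    assert(((uintptr_t)base % sizeof(uint32_t)) == 0);
    uint32_t *p = (uint32_t *)(base + HALF);
    return (size_t)((uint8_t *)p - base);
}

/* Offset added AFTER the cast: the addition is done on a uint32_t *,
 * so the pointer moves by HALF * sizeof(uint32_t) bytes. */
size_t offset_after_cast(uint8_t *base)
{
    assert(((uintptr_t)base % sizeof(uint32_t)) == 0);
    uint32_t *p = (uint32_t *)(base) + HALF;
    return (size_t)((uint8_t *)p - base);
}
```

Given a buffer of at least HALF * 4 bytes, the first function returns HALF and the second returns HALF * sizeof(uint32_t). In the player this would put the second pointer well past the intended half of the buffer.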


Origin blog.csdn.net/Fairchild_1947/article/details/122313424