Speech enhancement and speech recognition
1. Align memory to make efficient use of cache lines and minimize the number of memory fetches.
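```
#include <stdint.h>
#include <stdlib.h>
/* Assumed definition (not shown in the original): keeps the malloc() result,
 * which free() needs, next to the aligned address handed to callers. */
struct AlignedPtr {
  void* raw_pointer_;
  void* aligned_pointer_;
};
/* |alignment| is the byte alignment and MUST be a power of two. */
struct AlignedPtr* AllocAlignedPointer(int alignment, int bytes) {
  struct AlignedPtr* aligned_ptr;
  uintptr_t raw_address;  /* not unsigned long: a pointer may not fit in a long (e.g. 64-bit Windows) */
  aligned_ptr = (struct AlignedPtr*) malloc(sizeof(*aligned_ptr));
  if (aligned_ptr == NULL) return NULL;
  /* Over-allocate so that an aligned address always fits inside the block. */
  aligned_ptr->raw_pointer_ = malloc(bytes + alignment);
  if (aligned_ptr->raw_pointer_ == NULL) { free(aligned_ptr); return NULL; }
  raw_address = (uintptr_t) aligned_ptr->raw_pointer_;
  /* Round up to the next multiple of |alignment|. */
  raw_address = (raw_address + alignment - 1) & ~((uintptr_t) alignment - 1);
  aligned_ptr->aligned_pointer_ = (void*) raw_address;
  return aligned_ptr;
}
void FreeAlignedPointer(struct AlignedPtr* pointer) {
  if (pointer == NULL) return;
  free(pointer->raw_pointer_);
  free(pointer);
}
```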
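For example, allocating 32-byte-aligned FFT input and output buffers (`in`, `out` and `fft_size` come from the surrounding FFT code):
```
struct AlignedPtr* in_aligned;
struct AlignedPtr* out_aligned;
in_aligned = AllocAlignedPointer(32, sizeof(*in) * fft_size);
/* fft_size + 2: presumably room for the fft_size / 2 + 1 complex bins of a real FFT. */
out_aligned = AllocAlignedPointer(32, sizeof(*out) * (fft_size + 2));
/* ... read and write through in_aligned->aligned_pointer_ and out_aligned->aligned_pointer_ ... */
FreeAlignedPointer(in_aligned);
FreeAlignedPointer(out_aligned);
```
A second common approach hides the original malloc() pointer in the word just before the aligned block, so callers handle a single pointer and release it with aligned_free():
```
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* |alignment| MUST be a power of two and at least sizeof(void*),
 * so that the hidden pointer stored before the block is itself aligned. */
void* aligned_malloc(size_t required_bytes, size_t alignment)
{
    void* p1;   /* original block from malloc() */
    void** p2;  /* aligned block returned to the caller */
    /* Worst-case padding plus room for one hidden pointer. */
    size_t offset = alignment - 1 + sizeof(void*);
    if ((p1 = malloc(required_bytes + offset)) == NULL)
    {
        return NULL;
    }
    p2 = (void**)(((uintptr_t)p1 + offset) & ~((uintptr_t)alignment - 1));
    p2[-1] = p1;  /* stash the original pointer just below the aligned block */
    return p2;
}

void aligned_free(void* p)
{
    if (p != NULL)
    {
        free(((void**)p)[-1]);
    }
}

int main(int argc, char* argv[])
{
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s <alignment>\n", argv[0]);
        return 1;
    }
    /* endptr is not needed, so pass NULL instead of an uninitialized char**. */
    int* p = aligned_malloc(100, (size_t)strtol(argv[1], NULL, 10));
    printf("%s: %p\n", argv[1], (void*)p);
    aligned_free(p);
    return 0;
}
```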
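Both allocators rely on the same round-up trick, `(addr + alignment - 1) & ~(alignment - 1)`: for a power-of-two alignment, adding alignment - 1 and then clearing the low log2(alignment) bits gives the smallest multiple of the alignment that is not below the address. A minimal sketch of that arithmetic, plus an alignment check; the helper names `round_up_pow2` and `is_aligned` are hypothetical, not from the original:

```c
#include <stdint.h>

/* Round addr up to the next multiple of align; align MUST be a power of two. */
static uintptr_t round_up_pow2(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Nonzero if p is a multiple of align (a power of two). */
static int is_aligned(const void* p, uintptr_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

Worked example with a 32-byte alignment: 0x1001 + 31 = 0x1020, and clearing the low five bits leaves 0x1020; an address that is already aligned, such as 0x1000, is returned unchanged.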
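2. Write-back vs. write-through
Write-through: every write goes to both the cache and system memory.
Write-back: writes go only to the cache; a modified (dirty) cache line is written to system memory only when it is evicted (replaced).
On multi-core systems where cores share data through their caches, such as multi-core DSP architectures, be aware that write-back caching can cause cache-inconsistency problems: a modified line may still sit in one core's cache while another core reads the stale copy from system memory.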
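To make the difference between the two policies concrete, here is a toy model of a single cache line in front of one memory word. It is a hypothetical illustration, not how real hardware is organized (real caches track many lines, tags and coherence state); `ToyCache`, `store_write_through`, `store_write_back` and `evict` are made-up names:

```c
/* Hypothetical toy model: one cache line in front of one memory word. */
typedef struct {
    int cached;      /* value held in the cache line */
    int dirty;       /* 1 if the line is newer than memory (write-back only) */
    int memory;      /* value held in system memory */
    int mem_writes;  /* number of writes that reached system memory */
} ToyCache;

/* Write-through: every store updates both the cache and memory. */
void store_write_through(ToyCache* c, int v)
{
    c->cached = v;
    c->memory = v;
    c->mem_writes++;
}

/* Write-back: a store updates only the cache and marks the line dirty. */
void store_write_back(ToyCache* c, int v)
{
    c->cached = v;
    c->dirty = 1;
}

/* Eviction (or an explicit cache flush) writes a dirty line back to memory. */
void evict(ToyCache* c)
{
    if (c->dirty)
    {
        c->memory = c->cached;
        c->mem_writes++;
        c->dirty = 0;
    }
}
```

Storing 1, 2, 3, 4 under write-through costs four memory writes. Under write-back it costs none until evict(), which then performs a single write; until that point memory still holds the old value, which is exactly what another core reading memory directly would see.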
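3. The `__restrict__` keyword
The `__restrict__` keyword (the GCC/Clang spelling of C99's `restrict`) helps the compiler optimize. By qualifying a pointer with it, the programmer promises that, within its scope, the object it points to is accessed only through that pointer. The compiler may then assume the buffers do not overlap and can keep values in registers, reorder loads and stores, or vectorize loops without runtime overlap checks. If the buffers do overlap anyway, the behavior is undefined.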
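A minimal sketch using standard C99 `restrict` (which `__restrict__` spells in GCC/Clang, including in C++). The function `accumulate_energy` is a hypothetical example, not from the original:

```c
#include <stddef.h>

/* Hypothetical helper: adds the energy of frame x[0..n) to *acc.
 * Without restrict, the compiler must assume that acc might point into x,
 * so it has to store *acc back to memory on every iteration.
 * With restrict it may keep the running sum in a register. */
void accumulate_energy(float* restrict acc, const float* restrict x, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        *acc += x[i] * x[i];
    }
}
```

Calling it with `acc` pointing into `x` would break the promise and is undefined behavior; the qualifier is a contract the caller must honor, not something the compiler checks.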
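4. Using inline
Consider inlining functions of fewer than about ten lines. Inlining improves performance mainly when the function contains few loops, ideally none: the body is then cheap enough that the call overhead (pushing and popping registers and the return address on the stack) is a significant fraction of its cost, and inlining removes it. This overhead is common on ARM, whereas some architectures (certain DSPs) provide register windows ("stack windows") that avoid most of the stack traffic. Inlining larger function bodies, by contrast, can noticeably increase overall code size, which puts more pressure on the instruction cache and can end up lowering performance.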
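A minimal sketch of a good inlining candidate: a few lines, no loop, called once per sample from a hot loop. `saturate_int16` and `mix_frames` are hypothetical names chosen for illustration. Note that `inline` is only a hint; the compiler may still decline to inline, or inline functions not marked `inline`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp a 32-bit intermediate to the 16-bit PCM range.
 * The body is about as cheap as a call itself, so inlining removes
 * most of its cost. */
static inline int16_t saturate_int16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Hypothetical caller: mixes two PCM frames with saturation. */
void mix_frames(int16_t* out, const int16_t* a, const int16_t* b, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        out[i] = saturate_int16((int32_t)a[i] + b[i]);
    }
}
```

The loop stays in the caller and only the tiny clamp is inlined, matching the guideline above: inline small, loop-free helpers and leave large bodies as ordinary functions.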