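Techniques for Writing Efficient Code

Speech enhancement and speech recognition

1. Align memory to use cache lines efficiently and keep the number of memory fetches as low as possible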

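```
#include <stdint.h>
#include <stdlib.h>

/* Definition inferred from how the fields are used below: the raw pointer
 * from malloc is kept so that the block can be freed later. */
struct AlignedPtr {
  void* raw_pointer_;
  void* aligned_pointer_;
};

/* |alignment| is the byte alignment and MUST be a power of two. */
struct AlignedPtr* AllocAlignedPointer(int alignment, int bytes) {
  struct AlignedPtr* aligned_ptr;
  uintptr_t raw_address;

  aligned_ptr = (struct AlignedPtr*) malloc(sizeof(*aligned_ptr));
  if (aligned_ptr == NULL) {
    return NULL;
  }
  /* Over-allocate so that an aligned address always fits in the block. */
  aligned_ptr->raw_pointer_ = malloc((size_t) bytes + (size_t) alignment);
  if (aligned_ptr->raw_pointer_ == NULL) {
    free(aligned_ptr);
    return NULL;
  }
  /* Round the raw address up to the next multiple of |alignment|. */
  raw_address = (uintptr_t) aligned_ptr->raw_pointer_;
  raw_address = (raw_address + alignment - 1) & ~((uintptr_t) alignment - 1);
  aligned_ptr->aligned_pointer_ = (void*) raw_address;

  return aligned_ptr;
}

void FreeAlignedPointer(struct AlignedPtr* pointer) {
  if (pointer == NULL) {
    return;
  }
  free(pointer->raw_pointer_);
  free(pointer);
}
```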

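For example, to allocate 32-byte-aligned input and output buffers:

```
struct AlignedPtr* in_aligned;
struct AlignedPtr* out_aligned;

/* |in|, |out| and |fft_size| are assumed to be declared by the caller. */
in_aligned = AllocAlignedPointer(32, sizeof(*in) * fft_size);
out_aligned = AllocAlignedPointer(32, sizeof(*out) * (fft_size + 2));
```

A second, self-contained variant stores the original pointer in the slot just before the aligned block, so the caller only has to keep one pointer:

```
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* |alignment| must be a power of two. */
void* aligned_malloc(size_t required_bytes, size_t alignment)
{
    void* p1;  /* original block */
    void** p2; /* aligned block */
    /* Room for the worst-case shift plus one saved pointer. */
    size_t offset = alignment - 1 + sizeof(void*);
    if ((p1 = malloc(required_bytes + offset)) == NULL)
    {
        return NULL;
    }
    p2 = (void**)(((uintptr_t)p1 + offset) & ~(uintptr_t)(alignment - 1));
    p2[-1] = p1; /* remembered so that aligned_free can find the block */
    return p2;
}

void aligned_free(void* p)
{
    if (p != NULL)
    {
        free(((void**)p)[-1]);
    }
}

int main(int argc, char* argv[])
{
    char* endptr;
    long alignment;
    int* p;

    if (argc < 2)
    {
        fprintf(stderr, "usage: %s <alignment>\n", argv[0]);
        return EXIT_FAILURE;
    }
    alignment = strtol(argv[1], &endptr, 10);
    /* Reject non-numeric input and values that are not a power of two. */
    if (*endptr != '\0' || alignment <= 0 || (alignment & (alignment - 1)) != 0)
    {
        fprintf(stderr, "alignment must be a power of two\n");
        return EXIT_FAILURE;
    }
    p = aligned_malloc(100, (size_t)alignment);
    printf("%s: %p\n", argv[1], (void*)p);
    aligned_free(p);
    return EXIT_SUCCESS;
}
```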
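Where a C11 standard library is available, similar alignment can be obtained without a hand-written allocator. The following is only a sketch of that swapped-in alternative, using the standard `aligned_alloc`; the helper names `AllocAlignedFloats` and `IsAligned` are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns nonzero if |p| is aligned to |alignment|
 * bytes. |alignment| MUST be a power of two. */
int IsAligned(const void* p, size_t alignment) {
  return ((uintptr_t) p & (alignment - 1)) == 0;
}

/* Hypothetical helper: allocates |count| floats aligned to |alignment|
 * bytes, or returns NULL on failure. |alignment| MUST be a power of two.
 * The result is released with plain free(). */
float* AllocAlignedFloats(size_t count, size_t alignment) {
  size_t bytes;

  assert(count > 0);
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

  /* aligned_alloc expects the size to be a multiple of the alignment,
   * so round the byte count up to the next multiple. */
  bytes = count * sizeof(float);
  bytes = (bytes + alignment - 1) & ~(alignment - 1);

  return (float*) aligned_alloc(alignment, bytes);
}
```

Unlike the two allocators above, no bookkeeping pointer is needed here, because the library tracks the original block itself.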

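2. Write-back vs. write-through

Write-through: every write updates both the cache and system memory.

Write-back: a write updates only the cache. The data reaches system memory only when the cache line is replaced.

On multi-core systems that share data through caches, as on DSP-style architectures, be aware that write-back can cause cache-coherence problems: a modified line that is still sitting in one core's cache has not reached system memory yet, so another core can read stale data.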
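The difference between the two policies can be illustrated with a toy model of a single cache line. This is only a sketch: the struct and function names are hypothetical, and real hardware does all of this transparently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one cache line and the memory word behind it.
 * All names are hypothetical. */
struct ToyCacheLine {
  int value;         /* data currently held in the cache line */
  bool dirty;        /* true if the cache is newer than memory */
  int memory_value;  /* what system memory currently holds */
  int memory_writes; /* number of writes that reached memory */
};

/* Write-through: every store updates the cache and system memory. */
void StoreWriteThrough(struct ToyCacheLine* line, int value) {
  assert(line != NULL);
  line->value = value;
  line->memory_value = value;
  line->memory_writes++;
}

/* Write-back: a store updates only the cache and marks the line dirty. */
void StoreWriteBack(struct ToyCacheLine* line, int value) {
  assert(line != NULL);
  line->value = value;
  line->dirty = true;
}

/* Replacement of the line: a dirty line is written to memory once. */
void Evict(struct ToyCacheLine* line) {
  assert(line != NULL);
  if (line->dirty) {
    line->memory_value = line->value;
    line->memory_writes++;
    line->dirty = false;
  }
}
```

After ten stores to the same line, the write-through model has performed ten memory writes, while the write-back model has performed none until `Evict` is called. In between, `memory_value` is stale, which is what another core reading system memory would observe.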

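3. The __restrict__ keyword

The __restrict__ keyword helps the compiler optimize code. Marking a pointer as restrict tells the compiler that the buffer it points to does not overlap with buffers accessed through other pointers, so the compiler does not have to assume aliasing and can reorder or vectorize the loads and stores.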
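A minimal sketch of how the keyword is typically applied, assuming the GCC/Clang spelling `__restrict__` (standard C99 spells it `restrict`). The function `ScaleAdd` is a hypothetical example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: out[i] = a[i] + gain * b[i].
 * __restrict__ is a promise by the caller that |out|, |a| and |b| never
 * overlap. Without it, the compiler must assume that a store to out[i]
 * could change a later a[j] or b[j], which limits reordering and
 * vectorization of the loop. */
void ScaleAdd(float* __restrict__ out,
              const float* __restrict__ a,
              const float* __restrict__ b,
              float gain,
              size_t n) {
  size_t i;

  assert(out != NULL && a != NULL && b != NULL);
  for (i = 0; i < n; i++) {
    out[i] = a[i] + gain * b[i];
  }
}
```

The promise is not checked: calling such a function with overlapping buffers is undefined behavior, so the keyword belongs only where the caller can guarantee that the buffers are distinct.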


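4. Using inline

Consider inlining functions that are shorter than about ten lines. Inlining improves performance only when the function body contains few loops, ideally none. In that case it removes the stack push and pop overhead of a function call, which is common on ARM. Some architectures (DSPs) provide stack window registers to avoid this stack overhead. Inlining a large function body, on the other hand, can increase the overall code size, which may end up reducing performance instead.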
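As a sketch of the kind of function that suits inlining, consider a short, loop-free helper that is called from a hot loop. The names `SaturatingAdd16` and `MixSamples` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds two 16-bit samples and clamps the result to
 * the int16_t range. It is short and has no loop, so inlining it removes
 * the call overhead without adding much code. */
static inline int16_t SaturatingAdd16(int16_t a, int16_t b) {
  int32_t sum = (int32_t) a + (int32_t) b;
  if (sum > INT16_MAX) {
    return INT16_MAX;
  }
  if (sum < INT16_MIN) {
    return INT16_MIN;
  }
  return (int16_t) sum;
}

/* Hypothetical caller: the loop lives here, not in the inlined helper. */
void MixSamples(int16_t* out, const int16_t* x, const int16_t* y, size_t n) {
  size_t i;

  assert(out != NULL && x != NULL && y != NULL);
  for (i = 0; i < n; i++) {
    out[i] = SaturatingAdd16(x[i], y[i]);
  }
}
```

Note that `inline` is only a hint. The compiler may still decide not to inline the function, or may inline functions that are not marked.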


Reposted from blog.csdn.net/shichaog/article/details/80925812