Table of Contents

1 Abbreviations
2 overall architecture
3 pipelined CNN PE architecture
4 triple ping-pong memories
5 nearest-neighbor searching (NNS) processing-in-memory (PIM)

Source: A 9.02mW CNN-Stereo-Based Real-Time 3D Hand-Gesture Recognition Processor for Smart Mobile Devices

1 Abbreviations

HGR: hand-gesture recognition
HMD: head-mounted display
ToF: time-of-flight
PE: processing element
NNS: nearest-neighbor searching
PIM: processing-in-memory
CSE: CNN-stereo engine
ICP-PSO: iterative-closest-point/particle-swarm optimization
IPE: ICP-PSO engine
FWD: forwarding
BWD: backwarding

2 overall architecture

In this paper, we describe an accurate, low power (<10mW), and real-time 3D HGR processor for smart mobile devices.

Features:

a pipelined CNN processing element with a shift MAC operation
triple ping-pong buffers with workload balancing
nearest-neighbor searching (NNS) processing-in-memory (PIM) for high energy efficiency

CNN-stereo engine (CSE)

two line-streaming CNN cores
4 locally distributed memories
1 matching core

the CNN core

1 pipelined CNN PE
a local DMA
a forwarding/backwarding unit

ICP-PSO engine (IPE)

a NNS unit with 16-way parallel NNS PIMs
a hand-tracking unit


3 pipelined CNN PE architecture

The shift MAC operation with a 3×3 filter consists of three stages:

shifting feature maps and filters
element-wise multiplication
partial-sum accumulation

The line-streaming CNN operation is accelerated by the 7-stage pipelined CNN PE, which processes 48 MACs per cycle with 96% core utilization.
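The three stages listed above can be illustrated in software. The following is a minimal sketch, assuming zero padding at the borders and a plain nested-list feature map. The function names (`shift`, `shift_mac_conv3x3`, `direct_conv3x3`) are hypothetical, and the sketch does not model the 7-stage pipeline or the 48-MAC datapath of the actual PE.

```python
def shift(fmap, dy, dx):
    """Return a copy where out[y][x] = fmap[y+dy][x+dx], zero outside the map."""
    h, w = len(fmap), len(fmap[0])
    return [[fmap[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
             for x in range(w)] for y in range(h)]


def shift_mac_conv3x3(fmap, kernel):
    """3x3 'same' correlation built from shift, multiply, and accumulate steps."""
    h, w = len(fmap), len(fmap[0])
    acc = [[0] * w for _ in range(h)]                 # partial sums
    for ky in range(3):
        for kx in range(3):
            shifted = shift(fmap, ky - 1, kx - 1)     # stage 1: shift
            weight = kernel[ky][kx]
            for y in range(h):
                for x in range(w):
                    prod = shifted[y][x] * weight     # stage 2: element-wise multiply
                    acc[y][x] += prod                 # stage 3: partial-sum accumulation
    return acc


def direct_conv3x3(fmap, kernel):
    """Reference: direct sliding-window correlation with zero padding."""
    h, w = len(fmap), len(fmap[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for ky in range(3):
                for kx in range(3):
                    yy, xx = y + ky - 1, x + kx - 1
                    if 0 <= yy < h and 0 <= xx < w:
                        out[y][x] += fmap[yy][xx] * kernel[ky][kx]
    return out
```

Both functions return the same result for any rectangular input, which shows that nine shifted copies of the feature map, each scaled by one filter weight and summed, are equivalent to the usual sliding-window form.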

4 triple ping-pong memories

The hardware uses triple ping-pong memories to store feature maps. The three memories are accessed simultaneously: one feeds the pipeline inputs, one receives the pipeline outputs, and one serves the external interface.
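The idea of three memories serving three roles at once can be sketched as a rotating role assignment. This is only an illustration under an assumed rotation order (external interface, then pipeline input, then pipeline output). The notes do not state the actual switching order, and the names `ROLES`, `role_of`, and `schedule` are hypothetical.

```python
# Assumed life cycle of one bank: filled/drained externally, then read by the
# pipeline, then written by the pipeline, then back to the external interface.
ROLES = ("external_io", "pipeline_input", "pipeline_output")


def role_of(bank, phase):
    """Role played by memory bank `bank` (0, 1, or 2) during `phase`."""
    return ROLES[(bank + phase) % 3]


def schedule(num_phases):
    """For each phase, the roles of banks 0, 1, and 2 as a tuple."""
    return [tuple(role_of(bank, phase) for bank in range(3))
            for phase in range(num_phases)]


if __name__ == "__main__":
    for phase, roles in enumerate(schedule(3)):
        print(phase, roles)
```

In every phase each role is held by exactly one bank, so the pipeline read, the pipeline write, and the external transfer never contend for the same memory.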

Why 3?
Instead of storing entire feature maps on chip, line-streaming processing keeps only 3 to 5 lines of feature maps, which reduces the data that must be transferred to and from off-chip memory by 90.1%.
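Line streaming can be sketched with a rolling buffer that holds only three input lines for a 3×3 filter. This is a minimal software analogy, assuming rows arrive one at a time and no padding is applied. The name `stream_conv3x3` is hypothetical, and the sketch does not reproduce the 90.1% figure, which comes from the paper.

```python
from collections import deque


def stream_conv3x3(rows, kernel):
    """Yield 'valid' 3x3 output rows while holding only 3 input lines."""
    window = deque(maxlen=3)            # stands in for the on-chip line buffer
    for row in rows:                    # rows arrive one at a time
        window.append(row)              # the oldest line is evicted automatically
        if len(window) == 3:
            width = len(row)
            yield [
                sum(window[ky][x + kx] * kernel[ky][kx]
                    for ky in range(3) for kx in range(3))
                for x in range(width - 2)
            ]


if __name__ == "__main__":
    fmap = [[r * 4 + c for c in range(4)] for r in range(5)]
    ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    for out_row in stream_conv3x3(iter(fmap), ones):
        print(out_row)
```

Because `rows` can be any iterator, the full feature map never needs to exist in memory at once. The buffer size depends on the filter height, not on the image height.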

How is the workload balanced?
The FWD/BWD units keep CNN core workloads identical throughout CNN processing and exchange feature-map boundary data with one another when local feature maps are fetched.
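The boundary exchange can be illustrated with two cores that each own half of the rows of a feature map. This is a minimal sketch, assuming a 3×3 filter, zero padding, and a row-wise split. The function names are hypothetical, and which transfer corresponds to the FWD unit and which to the BWD unit is not specified here.

```python
def conv3x3_rows(padded_rows, kernel):
    """n+2 input rows -> n output rows; zero padding is applied horizontally."""
    width = len(padded_rows[0])
    out = []
    for y in range(1, len(padded_rows) - 1):
        row = []
        for x in range(width):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    xx = x + kx - 1
                    if 0 <= xx < width:
                        acc += padded_rows[y + ky - 1][xx] * kernel[ky][kx]
            row.append(acc)
        out.append(row)
    return out


def two_core_conv3x3(fmap, kernel):
    """Split rows across two cores; each core receives the neighbor's edge row."""
    half = len(fmap) // 2
    top, bottom = fmap[:half], fmap[half:]
    zero = [0] * len(fmap[0])
    top_in = [zero] + top + [bottom[0]]        # boundary row received from core 2
    bottom_in = [top[-1]] + bottom + [zero]    # boundary row received from core 1
    return conv3x3_rows(top_in, kernel) + conv3x3_rows(bottom_in, kernel)
```

With an even number of rows, both cores produce the same number of output rows, and the only data crossing between them is one boundary row in each direction.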

5 nearest-neighbor searching (NNS) processing-in-memory (PIM)
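The architecture overview lists an NNS unit with 16-way parallel NNS PIMs. As a purely functional illustration, the sketch below splits a point set into 16 interleaved banks, searches each bank by brute force, and reduces the per-bank winners to one result. The interleaved partitioning, the squared Euclidean distance, and the names `nearest` and `nearest_parallel` are assumptions. The in-memory circuit itself is not modeled.

```python
def nearest(query, points):
    """Return (squared_distance, index) of the closest point, or None if empty."""
    best = None
    for i, p in enumerate(points):
        d = sum((a - b) ** 2 for a, b in zip(query, p))
        if best is None or d < best[0]:
            best = (d, i)
    return best


def nearest_parallel(query, points, ways=16):
    """Search `ways` interleaved banks independently, then reduce the winners."""
    candidates = []
    for bank in range(ways):
        idxs = range(bank, len(points), ways)          # indices owned by this bank
        local = nearest(query, [points[i] for i in idxs])
        if local is not None:
            candidates.append((local[0], idxs[local[1]]))  # map to global index
    return min(candidates) if candidates else None
```

Each bank's search is independent of the others, so in hardware the 16 searches could run concurrently and only the final reduction needs to combine them.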

Literature notes (3) (ISSCC 2018, 13.4)