FINN: A Framework for Fast, Scalable Binarized Neural Network Inference
FINN: a framework for building high-performance, scalable binarized neural network inference engines

Basic Information

Publication date: December 2016
Main authors: Yaman Umuroglu; Nicholas J. Fraser; Giulio Gambardella
Institutions: Xilinx Research Labs; Norwegian University of Science and Technology; University of Sydney

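Main Content

Motivation

Deep neural networks running on CPUs or GPUs are too computationally expensive: models routinely carry hundreds of megabytes of parameters and require several or even tens of GFLOPs of computation. Some studies have shown that trained neural network models contain redundancy, one form of which is precision redundancy, and this has led to low-precision networks and even binarized networks.
FPGAs are particularly well suited to computing on and storing binary data (low-precision arithmetic and small memory footprint), and can reach performance on the order of TOPS.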

Technical Background

CNN
BNN
Four architectures for hardware implementation of NNs:
  a single processing engine, usually in the form of a systolic array
  a streaming architecture, consisting of one processing engine per network layer
  a vector processor with instructions specific to accelerating the primitive operations of convolutions
  a neurosynaptic processor

Target Applications

Embedded systems that require real-time processing

Main Contributions

Quantification of peak performance for BNNs on FPGAs using a roofline model.
A set of novel optimizations for mapping BNNs onto FPGA more efficiently.
A BNN architecture and accelerator construction tool, permitting customization of throughput.
A range of prototypes that demonstrate the potential of BNNs on an off-the-shelf FPGA platform.

Core Design

A framework for mapping BNNs to a flexible heterogeneous streaming architecture
Main elements of the design:

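Architecture design: how the BNN is mapped onto the FPGA
BNN-specific operator optimizations:
  popcount for accumulation: additions are implemented by counting set bits
  batchnorm-activation as threshold: batch normalization and activation are implemented as a threshold comparison
  Boolean OR for max-pooling: max-pooling is implemented with a Boolean OR
Design flow
Hardware library implementation:
  Matrix-Vector-Threshold Unit (MVTU): implements the matrix-vector product
  sliding window unit for convolution: arranges the input feature map data into groups for the convolution operation
  pooling unit: OR logic plus a streaming buffer
Folding: network folding; the main target of folding is the MVTU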
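The three operator optimizations listed above can be illustrated in software. The following is a minimal Python sketch, not the hardware implementation from the paper: the function names, the bitmask encoding (bit set means +1, bit clear means -1), and the assumption that the batchnorm scale `gamma` is positive are all choices made here for illustration.

```python
import math

def to_bits(values):
    """Pack a list of +1/-1 values into an int bitmask (bit i set means +1)."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
    return bits

def xnor_popcount_dot(w_bits, x_bits, n):
    """Dot product of two length-n {-1,+1} vectors stored as bitmasks.

    XNOR marks the positions where both vectors agree; each agreement
    contributes +1 and each disagreement -1, so dot = 2 * matches - n.
    """
    mask = (1 << n) - 1
    agree = ~(w_bits ^ x_bits) & mask
    matches = bin(agree).count("1")  # popcount
    return 2 * matches - n

def batchnorm_threshold(mean, var, gamma, beta, eps=1e-5):
    """Threshold t such that BN(a) >= 0 exactly when a >= t.

    Assumes gamma > 0; a negative gamma would flip the comparison.
    """
    return mean - beta * math.sqrt(var + eps) / gamma

def threshold_activation(acc, threshold):
    """Binary activation: 1 if the accumulator reaches the threshold, else 0."""
    return 1 if acc >= threshold else 0

def or_maxpool(window_bits):
    """Max-pooling over {0,1} activations reduces to a Boolean OR."""
    return int(any(window_bits))
```

For example, `xnor_popcount_dot(to_bits([1, -1, 1, 1]), to_bits([1, 1, -1, 1]), 4)` gives the same result as the ordinary dot product of the two vectors, and `or_maxpool` matches `max` on any window of 0/1 values.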

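Performance and Accuracy Evaluation of BNNs on FPGAs

For performance, the authors present a comparison case: using a roofline model of the FPGA, they evaluate a binarized network and an 8-bit network, both based on the AlexNet architecture. In this case both networks need the same number of operations to classify one image, 1.4 billion operations (written as 1.4 GOPS), but their parameter sizes differ: the binarized network has only 7.4 MB of parameters, whereas the 8-bit network needs 50 MB. Under the FPGA roofline model the binarized network has a clear advantage. First, the peak performance of binary operations, 66 TOPS, is 16 times that of 8-bit arithmetic. Second, the binarized network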
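The roofline arithmetic behind this comparison can be reproduced in a few lines. The sketch below uses only the figures quoted in these notes (1.4 billion operations per image, 7.4 MB versus 50 MB of parameters, 66 TOPS binary peak at 16 times the 8-bit peak). It assumes 1 MB = 10^6 bytes and that parameters are fetched from off-chip memory once per image. The bandwidth value is a hypothetical placeholder, not a number from the paper.

```python
def operational_intensity(ops_per_image, param_bytes):
    """Operations performed per byte of parameters fetched."""
    return ops_per_image / param_bytes

def attainable_ops_per_s(peak_ops_per_s, intensity, bandwidth_bytes_per_s):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_ops_per_s, intensity * bandwidth_bytes_per_s)

OPS_PER_IMAGE = 1.4e9        # same for both networks
BNN_PARAM_BYTES = 7.4e6      # binarized AlexNet-style network
INT8_PARAM_BYTES = 50e6      # 8-bit AlexNet-style network
BNN_PEAK = 66e12             # binary ops per second
INT8_PEAK = BNN_PEAK / 16    # 8-bit peak is 1/16 of the binary peak

bnn_intensity = operational_intensity(OPS_PER_IMAGE, BNN_PARAM_BYTES)    # about 189 ops/byte
int8_intensity = operational_intensity(OPS_PER_IMAGE, INT8_PARAM_BYTES)  # 28 ops/byte

# Hypothetical off-chip bandwidth, chosen only to exercise the formula.
ASSUMED_BANDWIDTH = 10e9     # bytes per second

bnn_bound = attainable_ops_per_s(BNN_PEAK, bnn_intensity, ASSUMED_BANDWIDTH)
int8_bound = attainable_ops_per_s(INT8_PEAK, int8_intensity, ASSUMED_BANDWIDTH)

print(f"BNN:   {bnn_intensity:.0f} ops/byte, bound {bnn_bound / 1e12:.2f} TOPS")
print(f"8-bit: {int8_intensity:.0f} ops/byte, bound {int8_bound / 1e12:.2f} TOPS")
```

Because the binarized network moves far fewer parameter bytes for the same number of operations, its operational intensity is roughly 6.8 times higher, so at a fixed memory bandwidth it sits further to the right on the roofline plot.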
