CS231n Study Notes -- 15. Efficient Methods and Hardware for Deep Learning

Agenda

Hardware 101: the Family

Hardware 101: Number Representation

1. Algorithms for Efficient Inference

1.1 Pruning Neural Networks

Iteratively Retrain to Recover Accuracy

Pruning RNN and LSTM

After pruning and retraining, accuracy even improves slightly:
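
Below is a minimal NumPy sketch of magnitude-based pruning with iterative retraining; `retrain` is a hypothetical stand-in for a few epochs of fine-tuning that must keep the pruned positions at zero (e.g. by masking the gradients).

```python
import numpy as np

def prune_by_magnitude(W, sparsity):
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    threshold = np.quantile(np.abs(W), sparsity)   # magnitude cutoff
    mask = np.abs(W) > threshold                   # keep only the large weights
    return W * mask, mask

def iterative_pruning(W, retrain, schedule=(0.5, 0.7, 0.9)):
    """Prune -> retrain repeatedly, increasing sparsity each round to recover accuracy."""
    for sparsity in schedule:
        W, mask = prune_by_magnitude(W, sparsity)
        W = retrain(W, mask)                       # hypothetical fine-tuning step
    return W, mask
```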

Pruning Changes Weight Distribution

1.2 Weight Sharing

Trained Quantization
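
A rough sketch of weight sharing via k-means, in the spirit of Deep Compression's trained quantization (a tiny hand-rolled 1-D k-means, not the paper's exact procedure): weights in the same cluster share one centroid value, so only a small codebook plus per-weight indices need to be stored.

```python
import numpy as np

def kmeans_share_weights(W, bits=4, iters=20):
    """Cluster weights into 2**bits centroids; each weight stores only its cluster index."""
    flat = W.ravel()
    k = 2 ** bits
    centroids = np.linspace(flat.min(), flat.max(), k)   # linear initialization over the range
    for _ in range(iters):
        idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for c in range(k):                                # recenter each centroid on its members
            members = flat[idx == c]
            if members.size:
                centroids[c] = members.mean()
    W_shared = centroids[idx].reshape(W.shape)            # codebook lookup reconstructs the layer
    return W_shared, idx.reshape(W.shape), centroids
```

During fine-tuning, the gradients of all weights in the same cluster would be summed to update the shared centroid.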

How Many Bits do We Need?

Pruning + Trained Quantization Work Together

Huffman Coding
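
A toy illustration of the final Huffman stage: compute code lengths from the frequencies of the quantized weight indices, so common indices get short codes (standard heapq-based Huffman, shown only to make the idea concrete).

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return a {symbol: code_length} dict from a list of quantized weight indices."""
    freq = Counter(symbols)
    # Heap items: (frequency, tie-breaker, list of (symbol, depth)).
    heap = [(f, i, [(s, 0)]) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        merged = [(s, d + 1) for s, d in a + b]   # every symbol in the merge gets one bit deeper
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return dict(heap[0][2])

# Frequent indices get short codes, rare ones get long codes.
print(huffman_code_lengths([0, 0, 0, 0, 1, 1, 2, 3]))
```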

Summary of Deep Compression

Results: Compression Ratio

SqueezeNet
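
A quick parameter count showing why the Fire module (a 1x1 squeeze layer followed by parallel 1x1 and 3x3 expand layers) is so compact compared with a plain 3x3 convolution; the channel numbers below are just an illustrative configuration.

```python
def fire_module_params(c_in, s1x1, e1x1, e3x3):
    """Parameter count of a SqueezeNet Fire module (biases ignored)."""
    squeeze = c_in * s1x1 * 1 * 1                          # 1x1 squeeze convolutions
    expand = s1x1 * e1x1 * 1 * 1 + s1x1 * e3x3 * 3 * 3     # 1x1 + 3x3 expand convolutions
    return squeeze + expand

def conv3x3_params(c_in, c_out):
    """A plain 3x3 convolution with the same input/output widths, for comparison."""
    return c_in * c_out * 3 * 3

print(fire_module_params(c_in=96, s1x1=16, e1x1=64, e3x3=64))   # 11,776 parameters
print(conv3x3_params(c_in=96, c_out=128))                       # 110,592 parameters
```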

Compressing SqueezeNet

1.3 Quantization

Quantizing the Weight and Activation

**Quantization Result**: 8 bits are chosen.
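
A minimal sketch of symmetric linear quantization to a fixed bit width (8 bits, per the result above); the per-tensor max-abs scale used here is the simplest choice, not necessarily the one from the lecture.

```python
import numpy as np

def quantize_symmetric(x, bits=8):
    """Quantize a float tensor to signed integers with a single per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 127 for 8 bits
    scale = np.abs(x).max() / qmax             # map the largest magnitude to qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_symmetric(x, bits=8)
print(np.abs(x - dequantize(q, s)).max())      # small quantization error
```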

1.4 Low Rank Approximation

Low Rank Approximation for Conv: similar in spirit to the Inception module

Low Rank Approximation for FC: matrix factorization
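
A sketch of the FC factorization idea: approximate the dense M x N weight matrix by two thin factors via truncated SVD, trading a small approximation error for far fewer multiply-accumulates.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate W (M x N) as U_r (M x r) @ V_r (r x N) using truncated SVD."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]               # fold singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

W = np.random.randn(1024, 1024)
U_r, V_r = low_rank_factorize(W, rank=128)
# Original layer: 1024*1024 ~ 1.05M MACs per input vector;
# factorized: 1024*128 + 128*1024 ~ 0.26M MACs (4x fewer).
x = np.random.randn(1024)
y_approx = U_r @ (V_r @ x)
```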

1.5 Binary / Ternary Net

Trained Ternary Quantization
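
A sketch of the forward quantization step: weights are mapped to {-Wn, 0, +Wp}. Here the threshold is a fixed fraction of the max magnitude and the scales are set to per-side means; in trained ternary quantization Wp and Wn are learned scale factors.

```python
import numpy as np

def ternarize(W, threshold_ratio=0.05):
    """Map full-precision weights to three values: -Wn, 0, +Wp."""
    t = threshold_ratio * np.abs(W).max()      # small weights snap to zero
    pos, neg = W > t, W < -t
    Wp = W[pos].mean() if pos.any() else 0.0   # positive scale (learned in TTQ)
    Wn = -W[neg].mean() if neg.any() else 0.0  # negative scale (learned in TTQ)
    W_t = np.zeros_like(W, dtype=float)
    W_t[pos], W_t[neg] = Wp, -Wn
    return W_t
```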

Weight Evolution during Training

Error Rate on ImageNet

1.6 Winograd Transformation

3x3 DIRECT Convolutions

Direct convolution: we need 9xCx4 = 36xC FMAs for 4 outputs

3x3 WINOGRAD Convolutions

Transform Data to Reduce Math Intensity

Direct convolution: we need 9xCx4 = 36xC FMAs for 4 outputs
Winograd convolution: we need 16xC FMAs for 4 outputs: 2.25x fewer FMAs
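
A NumPy sketch of F(2x2, 3x3) Winograd convolution behind the 16-vs-36 FMA count above, using the standard Lavin-Gray transform matrices: the element-wise product in the transformed domain is only 4x4 = 16 multiplies per 2x2 output tile.

```python
import numpy as np

# Transform matrices for Winograd F(2x2, 3x3) (Lavin & Gray).
B_T = np.array([[1, 0, -1, 0],
                [0, 1,  1, 0],
                [0, -1, 1, 0],
                [0, 1,  0, -1]], dtype=np.float64)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float64)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float64)

def winograd_2x2_3x3(d, g):
    """Compute a 2x2 output tile of a 3x3 convolution on a 4x4 input tile."""
    U = G @ g @ G.T            # transform the filter (can be precomputed once)
    V = B_T @ d @ B_T.T        # transform the input tile
    M = U * V                  # 16 element-wise multiplies instead of 36
    return A_T @ M @ A_T.T     # transform back to the 2x2 output

d, g = np.random.randn(4, 4), np.random.randn(3, 3)
# Reference: direct 3x3 convolution (CNN-style correlation) over the 4 output positions.
ref = np.array([[(d[i:i+3, j:j+3] * g).sum() for j in range(2)] for i in range(2)])
print(np.allclose(winograd_2x2_3x3(d, g), ref))   # True
```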

2. Hardware for Efficient Inference

Hardware for Efficient Inference: a common goal is to minimize memory access.

Google TPU

Roofline Model: Identify Performance Bottleneck
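
A back-of-the-envelope roofline helper with made-up peak-compute and bandwidth numbers: attainable throughput is the smaller of the compute roof and arithmetic intensity times memory bandwidth, which is why low-intensity M x V workloads end up memory-bound.

```python
def roofline(flops, bytes_moved, peak_flops, peak_bw):
    """Return (arithmetic intensity, attainable FLOP/s) for one kernel."""
    intensity = flops / bytes_moved                     # FLOPs per byte of memory traffic
    attainable = min(peak_flops, intensity * peak_bw)   # compute roof vs. memory roof
    return intensity, attainable

# Hypothetical accelerator: 90 TFLOP/s peak, 30 GB/s off-chip bandwidth.
ai, perf = roofline(flops=2 * 4096 * 4096,        # one 4096x4096 matrix-vector multiply
                    bytes_moved=4096 * 4096,      # 8-bit weights dominate the traffic
                    peak_flops=90e12, peak_bw=30e9)
print(f"intensity={ai:.1f} FLOPs/byte, attainable={perf/1e9:.1f} GFLOP/s")  # memory-bound
```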

Log Rooflines for CPU, GPU, TPU

EIE: the First DNN Accelerator for Sparse, Compressed Model
Zeros are neither stored nor computed.
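
A sketch of the idea (not EIE's actual microarchitecture): keep only non-zero weights in a compressed row format and also skip any work for zero activations.

```python
import numpy as np

def to_csr(W):
    """Keep only non-zero weights: per-row column indices and values."""
    return [(np.nonzero(W[r])[0], W[r, np.nonzero(W[r])[0]]) for r in range(W.shape[0])]

def sparse_mxv(csr_rows, a):
    """y = W @ a, skipping zero weights and zero activations."""
    nz_act = set(np.nonzero(a)[0])                 # only non-zero activations do any work
    y = np.zeros(len(csr_rows))
    for r, (cols, vals) in enumerate(csr_rows):
        for c, w in zip(cols, vals):
            if c in nz_act:
                y[r] += w * a[c]
    return y

W = np.random.randn(8, 8) * (np.random.rand(8, 8) > 0.9)   # ~90% sparse weights
a = np.maximum(np.random.randn(8), 0)                      # ReLU activations (many zeros)
print(np.allclose(sparse_mxv(to_csr(W), a), W @ a))        # True
```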

EIE Architecture

Micro Architecture for each PE

Comparison: Throughput

Comparison: Energy Efficiency

3. Algorithms for Efficient Training

3.1 Parallelization

Data Parallel – Run multiple inputs in parallel

Parameter Update

Parameter updates are shared across all workers.
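
A toy synchronous data-parallel step: each worker computes gradients on its own shard of the batch, the gradients are averaged (standing in for a parameter server or all-reduce), and one shared update is applied.

```python
import numpy as np

def data_parallel_step(w, batches, grad_fn, lr=0.1):
    """One synchronous update: average per-worker gradients, then update shared weights."""
    grads = [grad_fn(w, b) for b in batches]       # each worker sees a different shard
    g_avg = np.mean(grads, axis=0)                 # parameter-server / all-reduce average
    return w - lr * g_avg                          # every worker gets the same new weights

# Toy example: least-squares fit with the data split across 4 "workers".
def grad_fn(w, batch):
    x, y = batch
    return 2 * x.T @ (x @ w - y) / len(y)

x, y = np.random.randn(64, 3), np.random.randn(64)
batches = [(x[i::4], y[i::4]) for i in range(4)]   # 4 disjoint mini-batch shards
w = np.zeros(3)
for _ in range(100):
    w = data_parallel_step(w, batches, grad_fn)
```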

Model-Parallel Convolution – by output region (x,y)

Model Parallel Fully-Connected Layer (M x V)
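
A toy model-parallel M x V: the weight matrix is partitioned by output rows so each device stores and multiplies only its own block (devices are simulated here by array splits).

```python
import numpy as np

def model_parallel_mxv(W, x, n_devices=4):
    """Split an M x V multiply by output rows: each device owns a block of W's rows."""
    row_blocks = np.array_split(W, n_devices, axis=0)   # weights are partitioned, not replicated
    partial = [block @ x for block in row_blocks]       # each device computes its output slice
    return np.concatenate(partial)                      # gather the slices

W, x = np.random.randn(1000, 512), np.random.randn(512)
print(np.allclose(model_parallel_mxv(W, x), W @ x))     # True
```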

Summary of Parallelism

3.2 Mixed Precision with FP16 and FP32

Mixed Precision Training
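
A minimal sketch of the recipe, assuming a one-layer least-squares model: compute forward/backward in FP16, scale the loss so small gradients survive FP16, then unscale and apply the update to an FP32 master copy of the weights.

```python
import numpy as np

def mixed_precision_step(w_master, x, y, lr=0.01, loss_scale=32.0):
    """One SGD step: FP16 forward/backward, FP32 master weights, static loss scaling."""
    w16, x16 = w_master.astype(np.float16), x.astype(np.float16)
    pred = x16 @ w16                                              # FP16 forward pass
    err = (pred - y.astype(np.float16)) * np.float16(loss_scale)  # scale so tiny grads survive FP16
    grad16 = 2 * (x16.T @ err) / np.float16(len(y))               # FP16 backward pass
    grad32 = grad16.astype(np.float32) / loss_scale               # unscale in FP32
    return w_master - lr * grad32                                 # update the FP32 master copy

x = np.random.randn(32, 4).astype(np.float32)
y = np.random.randn(32).astype(np.float32)
w = np.zeros(4, dtype=np.float32)
for _ in range(100):
    w = mixed_precision_step(w, x, y)
```

Real recipes use a larger (often dynamically adjusted) loss scale; 32 is deliberately small here so the toy FP16 arithmetic cannot overflow.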

Comparison of results:

3.3 Model Distillation

The student model has a much smaller model size.

Softened outputs reveal the dark knowledge

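A sketch of the distillation loss on softened outputs: teacher and student logits go through a temperature-T softmax, and the student matches the teacher's softened distribution in addition to the usual hard-label loss (the weights and T below are illustrative).

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T                                   # temperature > 1 softens the distribution
    z = z - z.max(axis=-1, keepdims=True)       # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Weighted sum of soft-target cross-entropy (dark knowledge) and hard-label loss."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft = -(p_teacher * np.log(p_student + 1e-12)).sum(axis=-1).mean()
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * (T ** 2) * soft + (1 - alpha) * hard   # T^2 keeps gradient scales comparable

student = np.random.randn(8, 10)
teacher = np.random.randn(8, 10) * 3
labels = np.random.randint(0, 10, size=8)
print(distillation_loss(student, teacher, labels))
```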

3.4 DSD: Dense-Sparse-Dense Training

DSD produces the same model architecture but finds a better optimization solution: it arrives at a better local minimum and achieves higher prediction accuracy across a wide range of deep neural networks (CNNs / RNNs / LSTMs).
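
A schematic of the schedule, with a hypothetical caller-supplied `train(W, mask, steps)` that zeroes gradients wherever the mask is False.

```python
import numpy as np

def dsd_train(W, train, sparsity=0.5, steps=100):
    """Dense -> Sparse -> Dense training schedule (DSD)."""
    # Dense phase: ordinary training.
    W = train(W, mask=None, steps=steps)
    # Sparse phase: prune small weights and keep them at zero while retraining.
    t = np.quantile(np.abs(W), sparsity)
    mask = np.abs(W) > t
    W = train(W * mask, mask=mask, steps=steps)
    # Dense phase: free the pruned weights (restart them from zero) and train again.
    W = train(W, mask=None, steps=steps)
    return W
```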

DSD: Intuition

DSD is General Purpose: Vision, Speech, Natural Language

DSD on Caption Generation

4. Hardware for Efficient Training

GPU / TPU

Google Cloud TPU

Future

Outlook: the Focus for Computation

Reposted from blog.csdn.net/u012554092/article/details/78484699