PyTorch basic exercises

Introduction

From expert systems to machine learning
From traditional machine learning to deep learning
What can and cannot be done in deep learning

Overview of Deep Learning

Shallow Neural Networks: Biological Neurons to Single Layer Perceptrons, Multilayer Perceptrons, Backpropagation and Vanishing Gradients
Neural Networks to Deep Learning: Layer-wise Pre-training, Autoencoders and Restricted Boltzmann Machines

PyTorch installation and using Colaboratory

1. What is PyTorch? (PyTorch is a Python library that mainly provides two high-level features)

GPU-accelerated tensor computation
Deep neural networks built on a reverse-mode automatic differentiation system
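
The two features can be illustrated with a minimal sketch. The tensor values and the function y = x^2 + 2x are arbitrary choices made for illustration, not taken from the course material.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Feature 1: tensor computation on the chosen device.
a = torch.ones(2, 3, device=device)
b = torch.full((2, 3), 2.0, device=device)
total = (a * b).sum()  # six elements, each equal to 1 * 2

# Feature 2: reverse-mode automatic differentiation.
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x  # y = x^2 + 2x
y.backward()        # fills x.grad with dy/dx = 2x + 2

print(total.item())   # 12.0
print(x.grad.item())  # 8.0
```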

2. Install PyTorch and create a Colaboratory file

Install Google Chrome, then install the Ghelper extension: http://googlehelper.net/
Create a folder on Google Drive and connect the Colaboratory app to it. Colaboratory is designed to help disseminate machine learning education and research results; it is a Jupyter notebook environment in which Keras, TensorFlow, PyTorch, OpenCV, and other frameworks can easily be used for deep learning application development. Then create a Colaboratory file and enable the free GPU.
Test that the code runs successfully.

PyTorch basic operations

1. Define data
Data is generally defined with torch.Tensor. "Tensor" is a general term for numbers in various forms: scalars, vectors, matrices, and higher-dimensional arrays.

import torch
x = torch.tensor([1.0, 2.0, 3.0])
# The argument can be a single number, a 1-D array (vector), a 2-D array (matrix), or an array of any dimension (tensor)

Tensor supports many data types, including torch.float32, torch.float64, torch.float16, torch.uint8, torch.int8, torch.int16, torch.int32, and torch.int64. There are many ways to create a Tensor, including ones, zeros, eye, arange, linspace, rand, normal, uniform_, and randperm.
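
A short sketch of several of these creation functions follows. The shapes and ranges are arbitrary values chosen for illustration.

```python
import torch

zeros = torch.zeros(2, 3)                      # 2x3 matrix of zeros, float32 by default
ones = torch.ones(2, 3, dtype=torch.float64)   # explicit dtype
identity = torch.eye(3)                        # 3x3 identity matrix
steps = torch.arange(0, 10, 2)                 # integers from 0 up to (not including) 10, step 2
grid = torch.linspace(0, 1, steps=5)           # 5 evenly spaced points from 0 to 1
uniform = torch.rand(2, 2)                     # uniform samples in [0, 1)
gaussian = torch.normal(mean=0.0, std=1.0, size=(2, 2))  # Gaussian samples
perm = torch.randperm(5)                       # random permutation of 0..4

print(zeros.dtype, ones.dtype, steps.dtype)
print(steps.tolist())
print(sorted(perm.tolist()))
```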
2. Define operations
Every operation applied to a Tensor is a Function. Ultimately, tensors are used for computation, and those computations fall into the following groups:

Basic operations: addition, subtraction, multiplication, division, exponentiation, and remainder
Boolean operations: greater than, less than, maximum, and minimum
Linear algebra operations: matrix multiplication, norm, and determinant

Basic operations include abs/sqrt/div/exp/fmod/pow, some trigonometric functions such as cos/sin/asin/atan2/cosh, and the rounding functions ceil/round/floor/trunc.
Boolean operations include gt/lt/ge/le/eq/ne, topk, sort, and max/min.
Linear algebra operations include trace, diag, mm/bmm, t, dot/cross, inverse, and svd.
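
The three groups of operations can be tried on two small matrices. The values of a and b below are made up for illustration.

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([[4.0, 3.0], [2.0, 1.0]])

# Basic element-wise operations
squared = torch.pow(a, 2)      # each element squared
remainder = torch.fmod(a, 2)   # remainder of each element divided by 2

# Boolean and ordering operations
greater = torch.gt(a, b)                                 # element-wise a > b
top_values, top_indices = torch.topk(a.flatten(), k=2)   # two largest entries

# Linear algebra operations
product = torch.mm(a, b)       # matrix product
transposed = a.t()             # transpose
diagonal_sum = torch.trace(a)  # sum of the diagonal: 1 + 4

print(squared, remainder, greater, top_values, product, transposed, diagonal_sum, sep="\n")
```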


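Personal thoughts and interpretation: I had used NumPy for data processing before, and PyTorch feels quite similar. Through this study I learned that PyTorch can serve as a replacement for NumPy that takes advantage of the power of GPUs. PyTorch is also a deep learning research platform that offers great flexibility and speed. I had not encountered any of this before, and PyTorch strikes me as very powerful.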

Spiral Data Classification

!wget https://raw.githubusercontent.com/Atcold/pytorch-Deep-Learning/master/res/plot_lib.py
# Import the basic libraries, then initialize the key parameters

import random
import torch
from torch import nn, optim
import math
from IPython import display
from plot_lib import plot_data, plot_model, set_default

# Colab supports GPUs, so torch will run on the GPU when one is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('device: ', device)

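# Initialize the random seed. Neural network parameters are initialized randomly,
# and different initial parameters often lead to different results. When we get a good result, we usually want it to be reproducible,
# so in PyTorch we set the random seed to achieve this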
seed = 12345
random.seed(seed)
torch.manual_seed(seed)

N = 1000  # number of samples per class
D = 2  # feature dimension of each sample
C = 3  # number of classes
H = 100  # number of hidden units in the neural network
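
The notes do not show how the spiral samples are generated. The following is a minimal sketch of one way to build such a dataset from the parameters above; the noise level 0.2 and the angle ranges are assumptions made for illustration.

```python
import math
import torch

torch.manual_seed(12345)
N, D, C = 1000, 2, 3  # samples per class, feature dimension, number of classes

X = torch.zeros(N * C, D)
y = torch.zeros(N * C, dtype=torch.long)
for c in range(C):
    # Distance from the origin grows from 0 to 1 along each spiral arm.
    radius = torch.linspace(0, 1, N)
    # The angle sweeps along the arm; Gaussian noise (std 0.2, an assumed value) blurs it.
    angle = torch.linspace(2 * math.pi * c / C, 2 * math.pi * (c + 2) / C, N) + 0.2 * torch.randn(N)
    X[N * c:N * (c + 1)] = radius.unsqueeze(1) * torch.stack((torch.sin(angle), torch.cos(angle)), dim=1)
    y[N * c:N * (c + 1)] = c

print(X.shape, y.shape)
```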


1. Classification with a linear model

From the print(model) output, you can see that the model has two layers:

The input of the first layer is 2 (because each sample has 2 features), and the output is 100;
The input of the second layer is 100 (the output of the previous layer), and the output is 3 (the number of classes)

The linear model only reaches an accuracy of about 50%. For such a complex data distribution, it is difficult for a linear model to classify accurately.
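
A model with the two layers described above can be sketched as follows. Using nn.Sequential is an assumption here, since the original model code is not shown in the notes.

```python
import torch
from torch import nn

D, H, C = 2, 100, 3  # feature dimension, hidden units, number of classes

# Two linear layers with no activation in between, so the whole model is still linear.
model = nn.Sequential(
    nn.Linear(D, H),
    nn.Linear(H, C),
)
print(model)

x = torch.randn(5, D)  # a batch of 5 random samples, for illustration only
scores = model(x)      # one score per class for every sample
print(scores.shape)
```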

2. Classification with a two-layer neural network
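
A two-layer network differs from the linear model by the ReLU between its layers. The sketch below is only an outline of the training loop: the random stand-in data, the learning rate, the weight decay, and the number of epochs are all assumptions, and the stand-in data should be replaced by the spiral samples.

```python
import torch
from torch import nn, optim

torch.manual_seed(12345)
D, H, C = 2, 100, 3

# Stand-in data (hypothetical): replace with the spiral features X and integer labels y.
X = torch.randn(300, D)
y = torch.randint(0, C, (300,))

model = nn.Sequential(
    nn.Linear(D, H),
    nn.ReLU(),        # the nonlinearity that the linear model lacks
    nn.Linear(H, C),
)
criterion = nn.CrossEntropyLoss()  # applies log-softmax to the scores internally
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)  # assumed values

for epoch in range(100):
    scores = model(X)            # forward pass
    loss = criterion(scores, y)  # classification loss
    optimizer.zero_grad()        # clear gradients from the previous step
    loss.backward()              # backpropagation
    optimizer.step()             # parameter update

accuracy = (scores.argmax(dim=1) == y).float().mean().item()
print(f"loss: {loss.item():.3f}, accuracy: {accuracy:.3f}")
```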

Summary of questions

1. What are the characteristics of AlexNet? Why can it achieve better performance than LeNet?

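Characteristics of AlexNet
1. AlexNet is deeper, with 8 layers in total and many parameters;
2. It combines Conv layers (feature layers) with FC layers (output layers);
3. The FC layers hold the overwhelming majority of the model parameters: 96% of the parameters are concentrated in the 3 fully connected layers, while the convolutional layers account for only 4%;
4. Convolutional layers alternate with pooling layers;
5. When the resolution (H×W) decreases, the number of channels must increase to preserve the number of effective features;
6. It uses convolution kernels of several sizes.

Why AlexNet achieves better performance than LeNet
1. AlexNet is essentially a larger and deeper LeNet, with about 10× the number of parameters and 260× the computational cost;
2. AlexNet uses the ReLU function, whereas the traditional LeNet uses the sigmoid function. ReLU is computationally simpler than sigmoid (no exponentiation is needed), and it makes the model easier to train under different parameter initialization methods.
3. AlexNet introduced dropout, which controls the model complexity of the fully connected layers and is commonly used against overfitting. Its core idea is to randomly drop some neurons in certain layers, which reduces the dependence of a layer on specific outputs of the previous layer and acts as regularization.
4. It uses max pooling, which avoids the blurring effect of average pooling and improves the richness of the features.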
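
The 96% figure can be checked with a quick parameter count. The layer sizes below are an assumption: they follow the commonly used single-GPU variant of AlexNet (5 convolutional layers, 3 fully connected layers, 1000 output classes), and other variants give slightly different numbers.

```python
# (in_channels, out_channels, kernel_size) for the five convolutional layers (assumed sizes)
conv_layers = [(3, 64, 11), (64, 192, 5), (192, 384, 3), (384, 256, 3), (256, 256, 3)]
# (in_features, out_features) for the three fully connected layers (assumed sizes)
fc_layers = [(256 * 6 * 6, 4096), (4096, 4096), (4096, 1000)]

# Each layer has one weight per connection plus one bias per output.
conv_params = sum(c_in * c_out * k * k + c_out for c_in, c_out, k in conv_layers)
fc_params = sum(n_in * n_out + n_out for n_in, n_out in fc_layers)
total_params = conv_params + fc_params

print(conv_params, fc_params, total_params)
print(f"FC share of parameters: {fc_params / total_params:.1%}")
```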

2. What is the role of the activation function?

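An activation function is a function applied at each neuron of an artificial neural network; it maps the neuron's input to its output.
Its main role is to apply a nonlinear transformation to the data, which addresses the limited expressive and classification power of linear models.
Activation functions are divided into linear activation functions (a linear equation controls the mapping from input to output) and nonlinear activation functions (a nonlinear equation controls the mapping from input to output).
Commonly used activation functions include the identity f(x) = x, Sigmoid, Tanh, ReLU, LReLU, PReLU, and Swish.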
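
The need for a nonlinearity can be seen in a small experiment: two stacked linear layers without an activation are equivalent to a single linear map, while inserting ReLU generally breaks that equivalence. The layer sizes and the random input below are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
first, second = nn.Linear(2, 4), nn.Linear(4, 3)
x = torch.randn(5, 2)

# Without an activation, the two layers collapse into one linear map x W^T + b.
W = second.weight @ first.weight              # combined weight, shape (3, 2)
b = second.weight @ first.bias + second.bias  # combined bias, shape (3,)
stacked = second(first(x))
collapsed = x @ W.t() + b
print(torch.allclose(stacked, collapsed, atol=1e-5))

# With ReLU in between, the output is in general no longer given by that single linear map.
nonlinear = second(torch.relu(first(x)))
print(torch.allclose(nonlinear, collapsed, atol=1e-5))
```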

3. What is the vanishing gradient phenomenon?

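Vanishing gradients refers to the situation where, as the number of layers in a neural network grows, the weights of the layers closer to the input cannot be corrected effectively (their derivatives tend to 0), so the resulting network performs poorly.
For example, the Sigmoid function flattens out as its output approaches 0 or 1, which means its gradient approaches 0. When a network with Sigmoid activations is trained with backpropagation, neurons whose outputs are close to 0 or 1 (saturated neurons) have gradients close to 0, so their weights are barely updated, and the weights of the neurons connected to them are also updated very slowly. This is the vanishing gradient phenomenon.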
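
The effect can be quantified with the derivative of the Sigmoid function, s(z)(1 - s(z)), which is at most 0.25. The sketch below is a simplification: it ignores the weights and only multiplies the best-case Sigmoid derivative across layers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: s(z) * (1 - s(z))
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))   # 0.25, the largest value the derivative can take
print(sigmoid_grad(10.0))  # a saturated neuron: the derivative is close to 0

# Backpropagation multiplies one such factor per layer, so even in the best case
# the gradient signal shrinks geometrically with depth.
for depth in (1, 5, 10, 20):
    print(depth, 0.25 ** depth)
```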

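4. Should a neural network be wider or deeper?

Regarding the depth and width of neural networks, I previously only knew about depth. We usually say that the deeper a neural network is, the stronger its learning ability, but a network that is too deep can overfit, and for shallow networks overfitting can be even more pronounced.
From the material I read, current research suggests that model performance is more sensitive to depth, while width mainly affects how easy the network is to optimize. Depth reflects the representational power of the function, and width relates to the difficulty of optimization (finding the global optimum). Widening a network makes it easier to train than deepening it, but increasing depth improves performance more than increasing width.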

5. Why use Softmax?

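The Softmax function is used in multi-class classification. It maps the outputs of multiple neurons into the interval (0, 1) so that they sum to 1.
In a multi-class problem, the label y in a sample (x, y) follows a multinomial distribution. Using Softmax to convert the output wx + b into a probability p gives better interpretability and makes later thresholding convenient.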
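
A minimal sketch of this mapping follows. The logits are made-up example values standing in for wx + b.

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.1])  # example values of wx + b for three classes
probs = torch.softmax(logits, dim=0)    # exp(logit) / sum of exp(logits)

print(probs)                  # every entry lies in (0, 1)
print(probs.sum().item())     # the entries sum to 1 (up to rounding)
print(probs.argmax().item())  # index of the most likely class
```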

6. Which is more effective, SGD or Adam?

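Adam and SGDM (SGD with momentum) are two deep learning optimizers, each with its own advantage in efficiency or accuracy.
Adam optimizes faster in the early phase, while SGDM reaches higher accuracy in the later phase. Adam is faster, but SGDM can achieve better final results.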
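
Both optimizers are available in torch.optim and can be swapped with one line. The sketch below trains the same tiny regression model with each; the toy data, learning rates, momentum, and step count are assumptions, and a toy problem like this does not settle which optimizer works better on real networks.

```python
import torch
from torch import nn, optim

def train(make_optimizer, steps=200):
    # Same seed for both runs, so the data and the initial weights are identical.
    torch.manual_seed(0)
    model = nn.Linear(2, 1)
    X = torch.randn(64, 2)
    y = X @ torch.tensor([[2.0], [-3.0]]) + 0.5  # toy linear target
    optimizer = make_optimizer(model.parameters())
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(X), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item()

sgdm_loss = train(lambda params: optim.SGD(params, lr=0.01, momentum=0.9))
adam_loss = train(lambda params: optim.Adam(params, lr=0.01))
print(f"SGDM: {sgdm_loss:.6f}  Adam: {adam_loss:.6f}")
```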

Week 1 Deep Learning Fundamentals