Deep learning basic training process

Foreword

This article is just a record of my study notes. Some of the pictures in it come from the Internet; if there is any infringement, please contact me to delete them.

1. Basics of deep learning

1.1 Deep Learning

Deep learning is a branch of machine learning. Its goal is to find a good set of parameters θ so that the model parameterized by θ learns the mapping f_θ: x → y well from the training set (x, y ∈ D_train), and the trained f_θ(x), x ∈ D_test, can then be used to predict new samples. A neural network is a research branch of machine learning that specifically refers to models that use many neurons to parameterize the mapping function f_θ.

  • Classification

    • unsupervised learning

    • supervised learning

      • classification

      • regression

1.2 Neural network training mainly includes two processes

  • Forward propagation: compute the loss

    • Forward propagation runs from the input layer to the output layer: starting at the input layer (Layer 1), the data passes through the layers one by one; each layer computes its result and passes it through an activation function (commonly ReLU) to produce that layer's output, until the final output is obtained. What flows through the network is the data.
    • In practice this means building a network by stacking different network layers (e.g., fully connected layers), which establishes a function mapping from the input layer to the output layer.
  • Backpropagation: update the parameters

    • Forward propagation produces the predicted value ŷ, and the loss L(ŷ, y) is computed from the difference between ŷ and the true value y. Backpropagation works backward from the loss L(ŷ, y), computing the partial derivatives (gradients) of a, z, w, and b in each layer, and updating the weights layer by layer starting from the last layer. The core is the chain rule, which propagates the gradient of the loss L to every parameter of every layer. What flows through the network is the gradient.
    • The back propagation (BP) algorithm updates the network parameters and optimizes the model, e.g., with SGD (a runnable sketch of the update rule follows the formula below):

θ = θ − η · dJ(θ)/dθ, where η is the learning rate and J(θ) is the loss
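A minimal runnable sketch of this update rule in Python (the toy one-parameter loss J(θ) = (θ − 3)² is my own illustration, not from the notes):

```python
# Gradient descent on J(theta) = (theta - 3)^2, whose minimum is theta = 3.
def dJ(theta):
    return 2.0 * (theta - 3.0)   # analytic derivative dJ/dtheta

theta = 0.0   # initial parameter
eta = 0.1     # learning rate (a hyperparameter)
for _ in range(100):
    theta = theta - eta * dJ(theta)   # theta <- theta - eta * dJ/dtheta

print(theta)  # converges toward 3.0, the minimum of J
```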

  • flow chart

1.3 Network layer classification

  • fully connected layer

    • FC
  • activation layer

    • ReLU
  • convolutional layer

    • CNN
  • BN layer (batch normalization)

1.4 Fully connected layer

  • Every output node is connected to every input node (a code sketch follows at the end of this section)

  • Problems

    • Too many parameters, resulting in slow computation
    • For image data, spatial information is easily lost
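A minimal sketch of a fully connected layer, assuming TensorFlow/Keras and illustrative sizes (784 inputs, as for a flattened 28×28 image, and 256 outputs); it shows how the all-to-all connectivity drives the parameter count:

```python
import tensorflow as tf

# One fully connected (Dense) layer: every output node is connected to
# every input node, so the weight matrix alone holds 784 * 256 parameters.
layer = tf.keras.layers.Dense(256, activation="relu")

x = tf.random.normal([1, 784])   # one flattened 28x28 image (illustrative)
y = layer(x)

print(y.shape)             # (1, 256)
print(layer.kernel.shape)  # (784, 256) weights, plus a bias vector of 256
```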

1.5 Convolutional Neural Networks

  • convolutional layer

    • local correlation

      • each output depends only on the surrounding pixels
    • weight sharing

      • one convolution kernel extracts one kind of feature
  • Convolution calculation

    • Related hyperparameters

      • stride s
      • padding p
      • convolution kernel size k
    • Output size calculation (a worked example follows this list)

      • h_new = (h + 2*p_h − k)/s + 1
      • w_new = (w + 2*p_w − k)/s + 1
  • pooling layer

  • Downsamples the image, reducing the number of parameters, and provides translation invariance
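To make the output-size formula above concrete, here is a small worked example in Python (the 32×32 input, 3×3 kernel, and stride/padding values are my own illustrative assumptions):

```python
# Apply the formula: new_size = (size + 2*p - k) // s + 1
def conv_output_size(h, w, k, s, p):
    h_new = (h + 2 * p - k) // s + 1
    w_new = (w + 2 * p - k) // s + 1
    return h_new, w_new

print(conv_output_size(32, 32, k=3, s=1, p=1))  # (32, 32): size preserved
print(conv_output_size(32, 32, k=3, s=2, p=1))  # (16, 16): downsampled by 2
```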

1.6 Image processing directions

  • image recognition
  • object detection
  • semantic segmentation
  • instance segmentation

1.7 What is a deep learning framework

  • TensorFlow

    • Built on Python, a general-purpose programming language that is easy to develop in and has many modules

    • Training with the TensorFlow framework

      • A scientific computing library, packaged for Python

      • Why use a framework

        • It completes backpropagation for you via automatic differentiation

        • It provides basic API interfaces

          • convolutional layers, fully connected layers, pooling layers, etc.
          • optimizers
    • Environment

      • CPU

      • GPU

        • parallel computing, matrix operations

2. Training

2.1 Input data (you need to preprocess it yourself)

  • data normalization

    • Preprocessing limits the data to a certain range, eliminating the adverse effects of outlier samples
  • category encoding

    • one-hot (a combined preprocessing sketch follows this list)
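A combined sketch of both preprocessing steps, assuming TensorFlow and MNIST-style data (28×28 grayscale images with pixel values 0-255 and 10 classes):

```python
import tensorflow as tf

# Load MNIST-style data: images are uint8 in [0, 255], labels are 0-9.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()

x_train = x_train.astype("float32") / 255.0  # normalize pixels to [0, 1]
y_train = tf.one_hot(y_train, depth=10)      # one-hot encode the 10 classes

print(x_train.min(), x_train.max())  # 0.0 1.0
print(y_train.shape)                 # (60000, 10)
```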

2.2 Forward propagation through the network layers (use the API to build the network)

  • Data flows through the layers, and each layer's output is computed in turn
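A minimal sketch of building such a layer stack, assuming the TensorFlow/Keras API (the layer sizes are illustrative, not from the notes):

```python
import tensorflow as tf

# Stack layers to build the input -> output function mapping.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # input layer
    tf.keras.layers.Dense(128, activation="relu"),    # hidden FC layer
    tf.keras.layers.Dense(10, activation="softmax"),  # output layer (10 classes)
])

x = tf.random.normal([1, 28, 28])
print(model(x).shape)  # (1, 10): data flowed layer by layer to the output
```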

2.3 Calculate the loss (provided by the API; you select the type)

  • MSE
  • MAE
  • cross-entropy loss
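A sketch comparing these loss types on a tiny illustrative batch, assuming TensorFlow/Keras:

```python
import tensorflow as tf

y_true = tf.constant([[0.0, 1.0, 0.0]])  # one-hot true label
y_pred = tf.constant([[0.1, 0.8, 0.1]])  # predicted probabilities

mse = tf.keras.losses.MeanSquaredError()
mae = tf.keras.losses.MeanAbsoluteError()
cce = tf.keras.losses.CategoricalCrossentropy()

print(float(mse(y_true, y_pred)))  # mean squared error
print(float(mae(y_true, y_pred)))  # mean absolute error
print(float(cce(y_true, y_pred)))  # cross-entropy loss
```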

2.4 Backward gradient update through the network layers (completed by the framework)

  • BP algorithm
  • The gradient flows between layers: the last layer's gradient is computed first, then passed backward layer by layer
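A sketch of the framework doing this automatically, assuming TensorFlow: GradientTape records the forward pass and then applies the chain rule backward (the linear model and values are illustrative):

```python
import tensorflow as tf

w = tf.Variable(2.0)
b = tf.Variable(1.0)
x, y = tf.constant(3.0), tf.constant(10.0)

with tf.GradientTape() as tape:
    y_pred = w * x + b        # forward pass
    loss = (y_pred - y) ** 2  # squared-error loss

# Gradients flow backward from the loss to w and b via the chain rule.
dw, db = tape.gradient(loss, [w, b])
print(float(dw), float(db))  # dL/dw = 2*(wx+b-y)*x = -18, dL/db = -6
```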

2.5 Update the parameters (provided by the API; you must select an optimizer)

  • Model parameters

    • The parameters learned by the network, i.e., the parameters to be optimized
  • Hyperparameters

    • Parameters set in advance (e.g., the learning rate); they are not learned during training
  • Common optimization methods

    • SGD
    • Adam
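A sketch of selecting an optimizer and applying one update step, assuming TensorFlow/Keras (the learning rates and toy loss are illustrative):

```python
import tensorflow as tf

sgd = tf.keras.optimizers.SGD(learning_rate=0.01)
adam = tf.keras.optimizers.Adam(learning_rate=0.001)

w = tf.Variable(2.0)
with tf.GradientTape() as tape:
    loss = (w - 5.0) ** 2
grad = tape.gradient(loss, w)

sgd.apply_gradients([(grad, w)])  # w <- w - lr * grad
print(float(w))                   # 2.0 - 0.01 * 2*(2-5) = 2.06
```

In a typical Keras workflow you would instead pass the optimizer to model.compile(...) and let model.fit(...) run this update loop for you.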

3. Prediction

  • input data

  • forward propagation through the network layers

  • output the result
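A sketch of the prediction flow, assuming TensorFlow/Keras (the toy model here is untrained; in practice you would reuse the trained model):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

x_test = tf.random.normal([5, 28, 28])  # 5 new samples (illustrative)
probs = model.predict(x_test)           # forward propagation only
print(probs.argmax(axis=1))             # predicted class for each sample
```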

Origin: blog.csdn.net/qq_45723275/article/details/129137744