Introduction to Deep Learning and Neural Networks (1)

1. Deep Learning and Neural Networks

1.1 Introduction to Deep Learning

Target:

  1. Know what deep learning is
  2. Know the difference between deep learning and machine learning
  3. Be able to name the main application scenarios of deep learning
  4. Know the common deep learning frameworks

1.1.1 The concept of deep learning

Deep learning is a branch of machine learning. It is a family of algorithms that uses artificial neural networks to learn feature representations from data.

1.1.2 The difference between machine learning and deep learning

  1. Difference 1 (feature extraction):
    Machine learning requires a manual feature extraction step; deep learning has no complicated manual feature extraction step, because features can be extracted automatically by a deep neural network.
  2. Difference 2 (data volume and compute):
    Deep learning requires large training data sets, and with more data it achieves better results; training deep neural networks also requires a lot of computing power, because they have many more parameters.

Machine learning: less data, not particularly good results
Deep learning: more data, better results

1.1.3 Application Scenarios of Deep Learning

1.1.3.1 Image recognition

  1. Object recognition
  2. Scene recognition
  3. Face detection and tracking
  4. Face ID authentication

1.1.3.2 Natural Language Processing Technology

  1. Machine translation
  2. Text recognition
  3. Chat and dialogue systems

1.1.3.3 Speech Technology

  1. Speech Recognition

1.1.4 Common Deep Learning Frameworks

At present, many deep learning frameworks are common in industry (TensorFlow, Caffe2, Keras, Theano, PyTorch, etc.).
Among them, TensorFlow and Keras are produced by Google and have many users, but their APIs are relatively verbose and less Pythonic, so they are generally harder for beginners to get started with.
Therefore, PyTorch will be used more frequently here. Its API is Pythonic, its tensor operations resemble NumPy, and it builds its computation graph dynamically, which makes code debugging easier.

1.2 Introduction to Neural Networks

Target

  1. Know the concept of a neural network
  2. Know what a neuron is
  3. Know what a single-layer neural network is
  4. Know what a perceptron is
  5. Know what a multilayer neural network is
  6. Know what an activation function is and what it does
  7. Understand the basic idea behind neural networks

1.2.1 The concept of artificial neural network

An artificial neural network, often simply called a neural network (NN), is a mathematical model that imitates the structure and function of a biological neural network (the brain), and is used to estimate or approximate functions.
Like other machine learning methods, neural networks have been used to solve a variety of problems, such as machine vision and speech recognition, that are difficult to solve by traditional rule-based programming.

1.2.2 The concept of neurons

In a biological neural network, each neuron is connected to other neurons. When it "fires", it sends chemicals to the connected neurons, thereby changing the potential in those neurons; once a neuron's potential exceeds a "threshold", it is activated, that is, "excited", and sends chemicals on to other neurons.
In 1943, McCulloch and Pitts abstracted this behavior into the M-P neuron model; connecting many such neurons in a hierarchical structure yields a neural network.

These basic units, connected to each other, form a neural network.

A simple neuron is shown below:
[Figure: a simple neuron]
where:

  1. a1, a2, ..., an are the components of the input
  2. w1, w2, ..., wn are the weight parameters corresponding to each input component
  3. b is the bias
  4. f is the activation function; common activation functions are tanh, sigmoid, and ReLU
  5. t is the output of the neuron,
    expressed by the formula:
    t = f(w^T · A + b)
    That is, a neuron takes the inner product of the input vector and the weight vector, adds the bias, and passes the result through a nonlinear transfer function to obtain a scalar output.
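As a minimal sketch of this computation in plain Python (the inputs, weights, and bias below are made up purely for illustration):

```python
import math

def neuron(a, w, b):
    """Single neuron: t = f(w^T . a + b), here with a sigmoid activation f."""
    z = sum(w_i * a_i for w_i, a_i in zip(w, a)) + b  # inner product plus bias
    return 1.0 / (1.0 + math.exp(-z))                 # sigmoid maps z into (0, 1)

# illustrative values: z = 0.5*1 + (-0.25)*2 + 0.1 = 0.1, so t = sigmoid(0.1)
t = neuron(a=[1.0, 2.0], w=[0.5, -0.25], b=0.1)
```

Whatever the input, the output of a single neuron is always one scalar in the range of its activation function.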

1.2.3 Single-layer neural network (uncommon)

This is the most basic form of neural network. It is composed of a finite number of neurons, and the input vector of every neuron is the same vector. Since each neuron produces a scalar result, the output of a single-layer network is a vector whose dimension equals the number of neurons.
The schematic diagram is as follows:
[Figure: a single-layer neural network]
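A sketch of such a layer in plain Python (the weights are illustrative): every neuron sees the same input vector, and the layer's output is one scalar per neuron.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def single_layer(a, W, b):
    """One row of W and one entry of b per neuron; all neurons share the input a."""
    return [sigmoid(sum(w_i * a_i for w_i, a_i in zip(row, a)) + b_j)
            for row, b_j in zip(W, b)]

out = single_layer(a=[1.0, 2.0],
                   W=[[0.5, -0.25],   # neuron 1
                      [0.1,  0.2]],   # neuron 2
                   b=[0.1, 0.0])
# the output dimension equals the number of neurons (2 here)
```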

1.2.4 Perceptron (common two-layer neural network)

The perceptron is composed of two layers of neurons. The input layer receives the external input signal and passes it to the output layer (output: +1 for a positive example, -1 for a negative example). The output layer plays the role of an M-P neuron.
[Figure: a perceptron]
Perceptron: a perceptron divides an n-dimensional vector space into two parts with a hyperplane. Given an input vector, the perceptron determines which side of the hyperplane the vector lies on, i.e., whether the input is a positive or a negative example. In 2-dimensional space, this corresponds to a straight line dividing the plane into two parts.

In short, the perceptron is a simple binary classification model: given a threshold, it determines which class a data point belongs to.
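A minimal sketch of this decision rule (the weights and the dividing line below are made up for illustration):

```python
def perceptron(x, w, b):
    """Return +1 if x lies on the positive side of the hyperplane w.x + b = 0, else -1."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return 1 if z >= 0 else -1

# In 2-D the hyperplane is a straight line; here: x1 + x2 - 1 = 0
w, b = [1.0, 1.0], -1.0
above = perceptron([2.0, 2.0], w, b)   # point above the line -> +1
below = perceptron([0.0, 0.0], w, b)   # point below the line -> -1
```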

1.2.5 Multilayer neural network

A multilayer neural network is obtained by stacking single-layer neural networks, which is where the concept of a "layer" comes from. A common multilayer neural network has the following structure:

  • Input layer: many neurons receive a large amount of input information. The incoming information is called the input vector.
  • Output layer: information is transmitted, analyzed, and weighed through the neuron connections to form the output result. The outgoing information is called the output vector.
  • Hidden layers: the layers of neurons and connections between the input layer and the output layer. There can be one or more hidden layers. The number of nodes (neurons) in the hidden layers is variable, but the more there are, the more pronounced the nonlinearity of the neural network and the more significant its robustness.
    The schematic diagram is as follows:
    [Figure: a multilayer neural network]
    Concept: fully connected layer
    Fully connected layer: every neuron in the current layer is connected to every neuron in the previous layer; such a layer is called a fully connected layer.
    [Figure: a fully connected layer]
    (Without the activation function, a fully connected layer is just a linear transformation: scaling (×W) and translation (+b).)
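A fully connected layer without an activation is exactly this linear map, y = W·x + b. A sketch in plain Python (the sizes and values are illustrative):

```python
def fully_connected(x, W, b):
    """Every output is connected to every input: y = W . x + b."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

x = [1.0, 2.0, 3.0]            # 3 inputs
W = [[1.0, 0.0, -1.0],         # 2 x 3 weight matrix -> 2 outputs
     [0.5, 0.5,  0.5]]
b = [0.0, 1.0]
y = fully_connected(x, W, b)   # scaling by W, then translation by b
```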

1.2.6 Activation function

Activation functions were mentioned above in the discussion of neurons; here is a brief introduction.
Suppose we have a set of data points, triangles and quadrilaterals, that we need to divide into two classes, as shown in the figure below.
[Figure: two classes of points (left) and the sigmoid function (right)]
The right side of the figure shows the sigmoid function, which is applied to the result of the perceptron.
Given appropriate parameters w and b, a suitable curve can be obtained that completes the nonlinear segmentation of the original problem. Therefore, one very important function of the activation function is to increase the nonlinear segmentation ability of the model.
Common activation functions:
[Figure: common activation functions]

From the picture we can see:

  • sigmoid only outputs positive numbers (in (0, 1)), and its rate of change is largest near 0

  • tanh differs from sigmoid in that its output can be negative (in (-1, 1))

  • ReLU outputs max(0, x), so every negative input is mapped to zero; if negative values carry information you need to keep, ReLU may not be suitable. For image inputs, ReLU is very commonly used, because pixel values lie in [0, 255].

Besides increasing the nonlinear segmentation ability of the model, as mentioned above, activation functions also:

  • improve model robustness

  • alleviate the vanishing gradient problem

  • accelerate model convergence, etc.
    (a general understanding of these points is sufficient)

Summary notes:

a. Linearity
  i. A system (a function f, a model, f(x) = y) is linear if it satisfies two conditions:
  ii. additivity, f(x1 + x2) = f(x1) + f(x2), and homogeneity, f(k·x1) = k·f(x1).
b. Functions of the activation function: increase the nonlinear segmentation ability of the model; improve the robustness of the model; alleviate the vanishing gradient problem; accelerate the convergence of the model.
c. Common activation functions:
  i. sigmoid: range (0, 1)
  ii. tanh: range (-1, 1)
  iii. ReLU: max(0, x), used more for images
  iv. ELU: a(e^x - 1) for x < 0 and x otherwise, used more for text
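The four functions above can be written down directly; a small sketch in plain Python that makes the ranges listed above concrete:

```python
import math

def sigmoid(x):       # output in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):          # output in (-1, 1)
    return math.tanh(x)

def relu(x):          # max(0, x); negative inputs become 0
    return max(0.0, x)

def elu(x, a=1.0):    # x for x >= 0, a * (e^x - 1) for x < 0
    return x if x >= 0 else a * (math.exp(x) - 1.0)
```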

1.2.7 The Power of Neural Networks

A boy wanted to find a girlfriend, so he built a "girlfriend decision machine". As he grew older, his decision machine kept changing.
At the age of 14:
[Figure: the decision machine at age 14]
He finally found a girlfriend at 15, but after a while he found she had various habits he could not bear, and they eventually broke up. While single, he realized that finding a girlfriend is complicated and that more conditions were needed to make the choice, so at the age of 25 he revised the decision machine again:
[Figure: the decision machine at age 25]
The decision machine above is actually a neural network: it accepts the basic inputs, passes them through the linear and nonlinear transformations of the hidden layers, and finally produces an output.
Through this example, I hope everyone can understand the core idea of deep learning:
feed in the most primitive, basic data, let the model perform feature engineering and learn higher-level features, and then determine suitable parameters from the incoming data so that the model fits the data better.
This process can be likened to blind men touching an elephant: many people touch it together, each result is multiplied by a suitable weight and adjusted appropriately so that the combination gets closer to the target value. The whole process only needs the basic data as input, and the program automatically finds the appropriate parameters.


Origin blog.csdn.net/weixin_45529272/article/details/127889165