(Study notes) Lesson 1 of Andrew Ng's deep learning course: Neural Networks and Deep Learning (to be continued)

Video link: https://www.bilibili.com/video/BV164411m79z?p=8&spm_id_from=pageDriver

Week 1: An overview of deep learning

1. What is a neural network

​ In the housing price prediction problem, we fit a curve (as in the figure) to the training data as closely as possible, then use that curve to predict a house's price from its size. This curve, a function mapping size to price, can be seen as a very simple neural network.


​ Size is the input, price is the output, and the small circle in the middle is a single neuron. The neuron's task is to take the size as input, perform a linear computation, take the larger of that value and 0, and output the predicted price. More complex neural networks are formed by stacking such single neurons.
(The function represented by the curve above starts at 0 and then turns into a straight line; such a function is called a ReLU function, short for Rectified Linear Unit.)
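As a quick illustration, here is a minimal NumPy sketch of the ReLU function just described (the names are illustrative, not from the course):

```python
import numpy as np

def relu(z):
    """ReLU: returns 0 for negative inputs, the input itself otherwise."""
    return np.maximum(0, z)

# Negative values are clipped to 0; non-negative values pass through unchanged.
print(relu(np.array([-2.0, 0.0, 3.5])))
```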
Each circle in the figure may represent a ReLU function or some other non-linear function. For example, family size can be estimated from the house's area and number of bedrooms, walkability from the zip code, and the quality of nearby schools also from the zip code. House prices in fact depend heavily on what buyers care about: in this example, family size, walkability, and school quality all help us predict the price. This is a neural network built from multiple neurons.

​ Given these input features, the job of the neural network is to predict the corresponding house price. The circles in the figure are also called the hidden units of the neural network; they compute on the input data and ultimately produce the predicted price y.


2. Using neural networks for supervised learning

In supervised learning, you give the machine a training set with labels; a label tells the machine which class each input belongs to. By training on this set, the machine ultimately obtains a function that can be used for prediction. Unsupervised learning, by contrast, has no such labels. The following lists applications of neural networks in supervised learning.

​ From left to right in the figure are the standard neural network, the convolutional neural network (CNN), and the recurrent neural network (RNN). CNNs are mainly used for image processing, while RNNs are mainly used for one-dimensional sequence data.

​ Machine learning is applied to both structured data and unstructured data. Structured data is database-style data: in house price prediction, for example, you may have a database column that tells you the size of the house, the number of bedrooms, and so on. This is structured data, where each feature has a clear definition. The opposite is unstructured data, such as audio and images. Compared with structured data, unstructured data is genuinely difficult for computers to understand. Through deep learning and neural networks, modern computers can interpret unstructured data much better, which is why technologies such as speech recognition, image recognition, and natural language processing have emerged.



Week 2: Basics of neural networks

1. Binary classification

A binary classification problem: for example, input a picture and decide whether it shows a cat; output 1 if it does and 0 if it does not.

How does a computer represent a picture?

​ To store a picture, a computer saves three separate matrices, corresponding to the three color channels **red (R), green (G), and blue (B)**. For example, if the input image is 64×64 pixels (64 pixels in both height and width), there are three 64×64 matrices. Putting all the pixel intensity values from these three matrices into a single feature vector x gives a representation of the picture:

​ x = (255, 231, …, 255, 134, …, 255, 134, …)^T. If the picture is 64×64, the total dimension of the vector x is 12288 (that is, 64×64×3). We usually write n_x (or simply n) for the dimension of the input feature vector.
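This flattening can be sketched with NumPy (a toy example; the random array stands in for real pixel data):

```python
import numpy as np

# A hypothetical 64x64 RGB image: three 64x64 channel matrices.
image = np.random.randint(0, 256, size=(64, 64, 3))

# Stack every pixel intensity into one column feature vector x.
x = image.reshape(-1, 1)
print(x.shape)  # (12288, 1), i.e. 64 * 64 * 3 rows
```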
​ In the binary classification problem, the goal is to train a classifier that takes the feature vector x of a picture as input and predicts whether the output y is 1 or 0.

The following are some symbols that need to be used in the course:

(x, y): a single sample, where x is, for example, the feature vector of a picture, and y is 1 or 0;

m: the number of training samples in the training set;

(x^(1), y^(1)): the input and output of sample 1, and so on;

X: the matrix formed by all the x's in the training set, X = (x^(1), x^(2), …, x^(m)); it has n_x rows and m columns;

Y: the matrix formed by all the y's, Y = (y^(1), y^(2), …, y^(m)); it has 1 row and m columns.
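A small sketch of these shape conventions (m = 5 and the random values are arbitrary toy choices):

```python
import numpy as np

n_x, m = 12288, 5                    # feature dimension, number of samples (toy values)
X = np.random.rand(n_x, m)           # column i is the sample x^(i)
Y = np.random.randint(0, 2, (1, m))  # labels: 1 row, m columns of 0/1
print(X.shape, Y.shape)  # (12288, 5) (1, 5)
```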


2. Logistic regression

​ Logistic regression is a generalized linear model used in supervised learning to predict the probability that something happens. In the binary classification problem above, given a cat picture denoted x, we want the output ŷ = P(y = 1 | x); that is, ŷ should tell us the probability that this is a cat picture:

  • x is an n_x-dimensional vector;
  • the parameters of logistic regression are w, also an n_x-dimensional vector, and b, a real number;
  • so, given x, w, and b, we could use the linear equation ŷ = w^T x + b.

The above is the usual way of doing linear regression, but it is not a good binary classification algorithm: we want ŷ to be the probability that y = 1, so ŷ should lie between 0 and 1, and the linear equation cannot guarantee this. The value of w^T x + b may be greater than 1 or even negative, and such a "probability" is meaningless. Therefore, in logistic regression we apply the sigmoid function to the quantity w^T x + b, namely:

ŷ = σ(w^T x + b)

The sigmoid function is a smooth curve rising from 0 to 1, as shown in the figure:

Using z to denote w^T x + b (the horizontal axis of the figure), we have ŷ = σ(z), where σ(z) = 1/(1 + e^(−z)). Observe that when z is large, σ(z) is very close to 1, consistent with the curve in the figure.

After the sigmoid transformation, when z is a large positive number the resulting probability is close to 1, and when z is a large negative number the probability is close to 0, which resolves the meaningless-probability problem above.
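Putting the pieces together, here is a minimal sketch of the logistic regression forward computation (the parameter values are purely illustrative; real w and b would be learned from data):

```python
import numpy as np

def sigmoid(z):
    """Squash any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

n_x = 4                  # toy feature dimension
w = np.zeros((n_x, 1))   # weights (illustrative: all zeros)
b = 0.0                  # bias
x = np.ones((n_x, 1))    # one input sample

z = np.dot(w.T, x) + b   # linear part: w^T x + b
y_hat = sigmoid(z)       # predicted probability that y = 1
print(y_hat)             # [[0.5]] because z = 0 here
```

Note how sigmoid maps very negative z toward 0 and very positive z toward 1, exactly the behavior described above.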


Source: blog.csdn.net/weixin_44051854/article/details/114108443