Deep Learning Specialization Course Notes - Neural Network Basics

The second week of classes begins with a refresher on the basics of neural network programming.

Binary classification

Logistic regression is an algorithm for binary classification. For example: deciding from a picture whether or not it contains a cat.

For a 64*64 image with three color channels (red, green, blue), the image matrix contains 64*64*3 = 12288 numbers, so n_x = 12288 denotes the dimension of the input feature vector x; the output y is the label, 1 or 0. With m training examples, the inputs are stacked column by column to form the matrix X:

X = [ x^(1)  x^(2)  ...  x^(m) ],   an n_x * m matrix

Stacking the examples as columns (n_x * m) turns out to be more convenient for implementing neural networks than the m * n_x convention.
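
As a minimal sketch (the image batch below is random stand-in data made up purely for illustration), flattening images into the n_x * m matrix X with NumPy could look like this:

    import numpy as np

    m = 5                                  # number of examples (made up)
    images = np.random.rand(m, 64, 64, 3)  # m RGB images of size 64 x 64

    # Flatten each image into a length-12288 column vector and stack the columns.
    X = images.reshape(m, -1).T            # shape (12288, m) = (n_x, m)
    Y = np.random.randint(0, 2, (1, m))    # labels 0/1, shape (1, m)

    print(X.shape, Y.shape)                # (12288, 5) (1, 5)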


Logistic Regression

Linear regression is not a good algorithm for binary classification: the predicted probability should lie between 0 and 1, while the linear output w^T x + b easily falls outside that range.

Logistic regression therefore applies the sigmoid function to w^T x + b:

yhat = sigma(w^T x + b),   where sigma(z) = 1 / (1 + e^(-z))

Also note that when implementing a neural network it is easier to keep the parameters w and b separate, where b is a real number (the bias) and w is an n_x-dimensional weight vector.
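
A minimal sketch of this forward computation, assuming the shape conventions above (the inputs and zero-initialized parameters below are stand-ins just for illustration):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    n_x, m = 12288, 5
    X = np.random.rand(n_x, m)           # stand-in inputs, shape (n_x, m)
    w = np.zeros((n_x, 1))               # weight vector, shape (n_x, 1)
    b = 0.0                              # bias, a single real number

    yhat = sigmoid(np.dot(w.T, X) + b)   # predictions, shape (1, m)
    print(yhat)                          # all 0.5 while the weights are zero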



Logistic Regression Cost Function

A loss (cost) function is needed in order to optimize the parameters w and b of the logistic regression model.

The loss function is defined on the output of the algorithm for a single training example and measures how well the algorithm is doing on it (it has to be chosen when setting up the model).

However, the squared-error loss is generally not used in logistic regression, because it makes the optimization problem over the parameters non-convex (gradient descent can then get stuck in local optima). The loss function actually used is:

L(yhat, y) = -( y * log(yhat) + (1 - y) * log(1 - yhat) )

In this formula, when y = 1 the loss reduces to -log(yhat); making L small therefore requires log(yhat) to be large, and since yhat is the output of a sigmoid it can at most approach 1, so yhat is pushed as close to 1 as possible.

When y = 0 the loss reduces to -log(1 - yhat); making L small requires log(1 - yhat) to be large, which pushes yhat as close to 0 as possible.


The cost function averages the loss over all m training examples:

J(w, b) = (1/m) * sum_{i=1}^{m} L(yhat^(i), y^(i))

Training looks for the w and b that give the smallest overall cost across all the samples.
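
A minimal sketch of this cost, assuming yhat and Y are 1 * m arrays of predictions and labels (the values below are made up):

    import numpy as np

    def cost(yhat, Y):
        m = Y.shape[1]
        # average cross-entropy loss over the m examples
        return -np.sum(Y * np.log(yhat) + (1 - Y) * np.log(1 - yhat)) / m

    Y = np.array([[1, 0, 1]])
    yhat = np.array([[0.9, 0.2, 0.7]])
    print(cost(yhat, Y))   # about 0.23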

Logistic regression is a small neural network.


Gradient Descent

The question for this subsection: how to use gradient descent to learn the parameters w and b on the training set.

Plotting J(w, b) from the previous subsection (in practice w is usually much higher-dimensional), the cost function J is a convex function, a single bowl shape; each step of gradient descent tries to move in the steepest downhill direction.


Using := to denote that a parameter is being updated iteratively, repeat until the algorithm converges:

w := w - alpha * dJ(w, b)/dw
b := b - alpha * dJ(w, b)/db

where alpha represents the learning rate and is the step size of gradient descent in each iteration.

When the parameter is to the right of the minimum (too large), the derivative is positive and the update makes it smaller; when it is to the left (too small), the derivative is negative and the update makes it larger.
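
A toy 1-D illustration of the update rule, with the function J(w) = (w - 3)^2 and the values chosen purely for illustration (its derivative is 2(w - 3)):

    w = 0.0       # start too small: the derivative is negative, so w grows
    alpha = 0.1   # learning rate

    for _ in range(100):
        dw = 2 * (w - 3)     # dJ/dw for J(w) = (w - 3)**2
        w = w - alpha * dw   # the := update

    print(w)   # close to 3, the minimum of J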


Derivatives (Calculus)

This covers the slope of a straight line, which is the same everywhere; essentially junior-high-school mathematics.


More Derivatives Examples

This covers the slope of a curve, which is not the same everywhere; again junior-high-school mathematics.


Computation graph

Nothing.


Derivatives with a Computation graph

This explains what backpropagation is: using the chain rule to compute derivatives backwards through a computation graph.
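
A small worked example of the chain rule on a computation graph, with the function J = 3 * (a + b * c) chosen purely for illustration:

    # forward pass
    a, b, c = 5.0, 3.0, 2.0
    u = b * c          # u = 6
    v = a + u          # v = 11
    J = 3 * v          # J = 33

    # backward pass (chain rule), working from J back to the inputs
    dJ_dv = 3.0
    dJ_du = dJ_dv * 1.0         # dv/du = 1
    dJ_da = dJ_dv * 1.0         # dv/da = 1
    dJ_db = dJ_du * c           # du/db = c
    dJ_dc = dJ_du * b           # du/dc = b
    print(dJ_da, dJ_db, dJ_dc)  # 3.0 6.0 9.0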


Logistic Regression Gradient Descent

Continuing from the previous subsection, the meaning of backpropagation here: working backwards from the loss to compute how w and b should change, i.e. the derivatives dw and db.
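
For a single example, the cross-entropy loss and sigmoid give dz = a - y, dw = x * dz and db = dz (with a = yhat). A minimal sketch, with the example values made up:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([[1.0], [2.0]])   # one example with n_x = 2 features
    y = 1.0
    w = np.zeros((2, 1))
    b = 0.0

    z = float(np.dot(w.T, x) + b)
    a = sigmoid(z)                 # yhat for this example

    dz = a - y                     # dL/dz
    dw = x * dz                    # dL/dw, shape (2, 1)
    db = dz                        # dL/db
    print(dw.ravel(), db)          # [-0.5 -1. ] -0.5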


Gradient Descent on m examples


With large datasets, explicit for loops over the examples are too slow, so vectorization is generally used instead.
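
A minimal sketch of one vectorized gradient step over all m examples at once (shapes follow the conventions above; the random data is a stand-in):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    n_x, m = 4, 100
    X = np.random.rand(n_x, m)              # inputs, shape (n_x, m)
    Y = np.random.randint(0, 2, (1, m))     # labels, shape (1, m)
    w, b, alpha = np.zeros((n_x, 1)), 0.0, 0.1

    A = sigmoid(np.dot(w.T, X) + b)         # predictions, shape (1, m)
    dZ = A - Y                              # shape (1, m)
    dw = np.dot(X, dZ.T) / m                # shape (n_x, 1), no explicit loop
    db = np.sum(dZ) / m

    w = w - alpha * dw                      # one gradient descent step
    b = b - alpha * db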
