Master deep learning in one article (eight): thoroughly understand vectorization in logistic regression

In the previous article, we learned about gradient descent over multiple samples, but it has a drawback: the implementation relies on explicit loops. That is not what we want, because the data processed in deep learning is enormous in scale, and looping over it wastes a great deal of time and slows down development. Vectorization solves this problem. You may not take it on faith that vectorization really takes less time than looping, so let's measure it with numpy:

import numpy as np
import time

a = np.random.rand(100000)
b = np.random.rand(100000)

# vectorized dot product
start = time.time()
c = np.dot(a, b)
end = time.time()
print('Time taken by the vectorized version: ' + str(1000 * (end - start)) + 'ms')

# the same dot product computed with an explicit loop
c = 0
start = time.time()
for i in range(100000):
    c += a[i] * b[i]
end = time.time()
print('Time taken by the loop version: ' + str(1000 * (end - start)) + 'ms')

The results are as follows:

As you can see, the vectorized version runs in far less time than the loop version, so learning vectorization is well worth the effort.

Next, we will use vectorization to process multiple samples. Let's go!

Before vectorization, forward propagation over multiple samples was written as a loop over the samples: for each sample i, we computed z^{(i)}=w^{T}x^{(i)}+b and then a^{(i)}=\sigma(z^{(i)}).

After vectorization, the entire loop reduces to two formulas:

Z=w^{T}X+b

A=\sigma(Z)

Let's explain below:

Z is a row vector that stores the z computed for each sample, namely Z=[z^{(1)},z^{(2)},...,z^{(m)}].

X is a matrix that stores the input x of each sample, namely X=[x^{(1)},x^{(2)},...,x^{(m)}].

A is a row vector that stores the a obtained by applying the activation function to each sample's z, namely A=[a^{(1)},a^{(2)},...,a^{(m)}].

Suppose there are m samples and each sample has two features, that is, x^{(i)}=\begin{bmatrix} x_{1}^{(i)}\\ x_{2}^{(i)} \end{bmatrix}. Then X forms a matrix, X=[x^{(1)},x^{(2)},...,x^{(m)}]=\begin{bmatrix} x_{1}^{(1)} & x_{1}^{(2)} & ... & x_{1}^{(m)} \\ x_{2}^{(1)} & x_{2}^{(2)} & ... & x_{2}^{(m)} \end{bmatrix}, where the superscript (i) indexes the sample and the subscript indexes the feature.
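To make this layout concrete, here is a tiny numpy sketch (the three sample vectors are made-up numbers, purely for illustration) showing how samples are stacked as columns of X:

import numpy as np

# three hypothetical samples, each with two features
x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, 4.0])
x3 = np.array([5.0, 6.0])

# stack the samples as columns: X has shape (2, 3), i.e. 2*m with m=3
X = np.column_stack([x1, x2, x3])
print(X.shape)  # (2, 3)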

Let's calculate the dimensions: X is 2*m; w is 2*1, so w^{T} is 1*2 and the product w^{T}X is 1*m; b is a single number that numpy broadcasting expands across the 1*m row; therefore Z is 1*m, and A has the same dimensions as Z.

Let's check it out:

The dimensions all check out, which shows that our vectorized forward propagation is correct.
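As a minimal sketch of the vectorized forward propagation (the sample count m, the random data, and the variable names here are illustrative assumptions, not values from the article):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

m = 5                        # assumed number of samples
X = np.random.rand(2, m)     # 2 features per sample, shape 2*m
w = np.random.rand(2, 1)     # weights, shape 2*1
b = 0.0                      # bias: a single number, broadcast by numpy

Z = np.dot(w.T, X) + b       # shape 1*m
A = sigmoid(Z)               # shape 1*m, same as Z
print(Z.shape, A.shape)      # (1, 5) (1, 5)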

Next we look at back propagation:

Before vectorization, backpropagation over multiple samples was also written as a loop: for each sample i, we computed dz^{(i)}=a^{(i)}-y^{(i)}, accumulated its contribution to dw and db, and finally divided by m.

With vectorization, we can write:

dZ=A-Y

dW=\frac{1}{m}XdZ^{T}

db=\frac{1}{m}\sum_{i=1}^{m}dz^{(i)}

The dimensions of dZ, dW, and db here are the same as those of Z, W, and b. Let’s take a look:

dZ=A-Y: A is 1*m and Y is also 1*m, so dZ is also 1*m, the same dimension as Z.

dW=\frac{1}{m}XdZ^{T}: X is 2*m and dZ^{T} is m*1, so dW is 2*1, the same dimension as W.

db=\frac{1}{m}\sum_{i=1}^{m}dz^{(i)}: dZ is 1*m and we average its entries, so db is a single number, the same as b.
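Continuing the forward-propagation sketch above, a minimal numpy version of these three gradient formulas (the labels Y are made up for illustration):

Y = np.random.randint(0, 2, (1, m))  # labels, shape 1*m

dZ = A - Y                  # shape 1*m
dW = np.dot(X, dZ.T) / m    # shape 2*1
db = np.sum(dZ) / m         # a single number, like b
print(dZ.shape, dW.shape, db)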

Then update the parameters:

W=W-\alpha \times dW

b=b-\alpha \times db
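Putting the pieces together, here is one possible sketch of the full vectorized gradient-descent loop (the data, the learning rate alpha, and the iteration count are all assumptions for illustration, not values from the article):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

m = 100
X = np.random.rand(2, m)               # 2*m inputs (made up)
Y = np.random.randint(0, 2, (1, m))    # 1*m labels (made up)
w = np.zeros((2, 1))                   # 2*1 weights
b = 0.0                                # scalar bias
alpha = 0.01                           # assumed learning rate

for _ in range(1000):
    # forward propagation (no loop over samples)
    Z = np.dot(w.T, X) + b             # 1*m
    A = sigmoid(Z)                     # 1*m
    # backward propagation
    dZ = A - Y                         # 1*m
    dW = np.dot(X, dZ.T) / m           # 2*1
    db = np.sum(dZ) / m                # scalar
    # parameter update
    w = w - alpha * dW
    b = b - alpha * db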

Finally, here is a picture from Andrew Ng's video:

If you found this article helpful, please follow so you don't lose track of future posts~

That is the entire content of this article. To get the deep learning materials and courses, scan the official account below and reply with the word "data". Happy learning!

Origin: blog.csdn.net/qq_38230338/article/details/107655753