PyTorch is an open-source machine learning library for Python, based on Torch, used for applications such as natural language processing.
1. Introduction
This article is the second part of [Numpy Basics] in the PyTorch deep learning series. It covers array reshaping, batch processing, universal functions, and the broadcasting mechanism.
Previous article in this series:
Introduction to Deep Learning PyTorch Deep Learning - Numpy Basics (Part 1)
2. Array reshaping
In machine learning and deep learning tasks, processed data usually has to be fed to the model in a format the model accepts; the model then runs a series of operations and returns a result.
However, different models expect different input formats, so the data often has to be transformed to meet the model's requirements. When working with matrices or arrays, you will frequently need to merge several vectors or matrices along a given axis, or to flatten them (for example, in convolutional or recurrent neural networks, a matrix must be flattened before it is passed to a fully connected layer).
Several commonly used data transformation methods are introduced below.
2.1 Changing the shape of an array
Changing the shape of an array is one of the most common operations in Numpy, and there are many ways to do it. The following table lists some commonly used functions.
function | description |
---|---|
arr.reshape | Changes the dimensions of arr without modifying arr itself |
arr.resize | Changes the dimensions of arr and modifies arr itself |
arr.T | Transposes arr |
arr.ravel | Flattens arr into a 1-dimensional array, returning a view of the original data where possible instead of a copy |
arr.flatten | Flattens arr into a 1-dimensional array and always returns a copy |
arr.squeeze | Removes axes of length 1; applying it to an array with no such axes raises no error and has no effect |
arr.transpose | Swaps the axes of a high-dimensional array |
Next, let's look at some examples.
2.1.1 reshape
Change the dimensions of a vector (without modifying the vector itself):
import numpy as np
arr = np.arange(10)
print(arr)
# Reshape arr into 2 rows and 5 columns
print(arr.reshape(2, 5))
# You can specify only the number of rows or columns and use -1 for the other
print(arr.reshape(5, -1))
print(arr.reshape(-1, 5))
Output result:
[0 1 2 3 4 5 6 7 8 9]
[[0 1 2 3 4]
[5 6 7 8 9]]
[[0 1]
[2 3]
[4 5]
[6 7]
[8 9]]
[[0 1 2 3 4]
[5 6 7 8 9]]
Note that reshape cannot take only a row count or only a column count; the remaining dimension must be given as -1 so that Numpy can infer it. The specified number of rows or columns must also divide the total number of elements evenly. For example, changing the code above to arr.reshape(3,-1) raises an error, because 10 is not divisible by 3.
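The divisibility requirement can be checked directly. The following is a minimal sketch; the variable names `inferred` and `reshape_failed` are only illustrative and are not part of the original example.

```python
import numpy as np

arr = np.arange(10)

# -1 asks Numpy to infer the missing dimension: 10 elements / 5 rows = 2 columns.
inferred = arr.reshape(5, -1)
print(inferred.shape)

# 10 elements cannot be split into 3 equal rows, so reshape raises ValueError.
try:
    arr.reshape(3, -1)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape(3, -1) failed:", err)
```

Catching ValueError here is only for demonstration; in real code a failed reshape usually signals a bug in how the data was prepared.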
2.1.2 resize
Change the dimensions of a vector (modifying the vector itself):
import numpy as np
arr = np.arange(10)
print(arr)
# Reshape arr in place into 2 rows and 5 columns
arr.resize(2, 5)
print(arr)
Output result:
[0 1 2 3 4 5 6 7 8 9]
[[0 1 2 3 4]
[5 6 7 8 9]]
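To make the difference between the two methods concrete, the short sketch below puts reshape and resize side by side. The names `original`, `reshaped`, `in_place`, and `result` are illustrative only.

```python
import numpy as np

# reshape returns a new array object and leaves the original shape untouched.
original = np.arange(10)
reshaped = original.reshape(2, 5)
print(original.shape, reshaped.shape)

# resize changes the array itself and returns None.
in_place = np.arange(10)
result = in_place.resize(2, 5)
print(result, in_place.shape)
```

Because resize returns None, writing `arr = arr.resize(2, 5)` would discard the array, which is an easy mistake to make when switching between the two methods.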
2.1.3 T (transpose)
Transpose a matrix:
import numpy as np
arr = np.arange(12).reshape(3, 4)
# arr has 3 rows and 4 columns
print(arr)
# Transpose arr into 4 rows and 3 columns
print(arr.T)
Output result:
[[0 1 2 3]
[4 5 6 7]
[8 9 10 11]]
[[0 4 8]
[1 5 9]
[2 6 10]
[3 7 11]]
2.1.4 ravel
Flatten an array:
import numpy as np
arr = np.arange(6).reshape(2, -1)
print(arr)
# Flatten in column-major order
print("Flatten in column-major order")
print(arr.ravel('F'))
# Flatten in row-major order
print("Flatten in row-major order")
print(arr.ravel())
Output result:
[[0 1 2]
 [3 4 5]]
Flatten in column-major order
[0 3 1 4 2 5]
Flatten in row-major order
[0 1 2 3 4 5]
2.1.5 flatten
Converting a matrix into a vector with flatten is a common step between the convolutional layers and the fully connected layers of a network.
import numpy as np
a=np.floor(10*np.random.random((3,4)))
print(a)
print(a.flatten())
Output result:
[[4. 0. 8. 5.]
[1. 0. 4. 8.]
[8. 2. 3. 7.]]
[4. 0. 8. 5. 1. 0. 4. 8. 8. 2. 3. 7.]
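The practical difference between ravel and flatten is whether the result shares memory with the original array. The sketch below checks this with np.shares_memory; it assumes a contiguous input array, which is the case where ravel can return a view.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

raveled = a.ravel()      # a view here, because a is contiguous
flattened = a.flatten()  # always a fresh copy

print(np.shares_memory(a, raveled))
print(np.shares_memory(a, flattened))

# Writing through the view changes the original; writing to the copy does not.
raveled[0] = 100
flattened[1] = -1
print(a)
```

If the flattened data will be modified independently of the source, flatten is the safer choice; if it is only read, ravel avoids an unnecessary copy.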
2.1.6 squeeze
This function is mainly used for dimensionality reduction: it removes the axes of length 1 from an array. PyTorch also has the opposite operation, torch.unsqueeze, which will be introduced later.
import numpy as np
arr = np.arange(3).reshape(3, 1)
print(arr.shape)  # (3, 1)
print(arr.squeeze().shape)  # (3,)
arr1 = np.arange(6).reshape(3, 1, 2, 1)
print(arr1.shape)  # (3, 1, 2, 1)
print(arr1.squeeze().shape)  # (3, 2)
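squeeze also accepts an axis argument for removing a single length-1 axis, and Numpy's own counterpart for adding an axis back is np.expand_dims. The sketch below is illustrative; the variable names are not from the original text.

```python
import numpy as np

arr = np.arange(6).reshape(3, 1, 2, 1)

# Remove only the last axis instead of every length-1 axis.
only_last = arr.squeeze(axis=3)
print(only_last.shape)

# np.expand_dims inserts a length-1 axis at the given position.
restored = np.expand_dims(only_last, axis=3)
print(restored.shape)

# Squeezing an axis whose length is not 1 raises ValueError.
try:
    arr.squeeze(axis=0)
    squeeze_failed = False
except ValueError:
    squeeze_failed = True
print(squeeze_failed)
```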
2.1.7 transpose
Swaps the axes of a high-dimensional array. This is common in deep learning, for example when changing the color order of an image from RGB to GBR.
import numpy as np
arr2=np.arange(24).reshape(2,3,4)
print(arr2.shape) #(2,3,4)
print(arr2.transpose(1,2,0).shape) #(3,4,2)
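It can help to see where an individual element ends up after the axes are permuted. In the sketch below, the indices `i`, `j`, `k` and the name `swapped` are chosen only for illustration.

```python
import numpy as np

arr2 = np.arange(24).reshape(2, 3, 4)
swapped = arr2.transpose(1, 2, 0)

# New axis 0 is old axis 1, new axis 1 is old axis 2, new axis 2 is old axis 0,
# so the element at [i, j, k] moves to [j, k, i].
i, j, k = 1, 2, 3
print(arr2[i, j, k], swapped[j, k, i])

# transpose returns a view of the same data; nothing is copied.
print(np.shares_memory(arr2, swapped))
```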
2.2 Merging arrays
Merging arrays is another very common operation. The following table lists common functions for merging arrays or vectors.
function | description |
---|---|
np.append | Uses more memory |
np.concatenate | No memory issues |
np.stack | Joins a sequence of arrays along a new axis |
np.hstack | Stacks arrays horizontally (column-wise) |
np.vstack | Stacks arrays vertically (row-wise) |
np.dstack | Stacks arrays depth-wise (along the third axis) |
np.vsplit | Splits an array vertically into a list of sub-arrays |
Notes:
(1) append, concatenate, and stack all take an axis parameter, which controls whether the arrays are merged by row or by column.
(2) For append and concatenate, the arrays to be merged must have the same number of rows or the same number of columns (whichever is not the merge axis).
(3) stack requires the arrays to have exactly the same shape; hstack and dstack require matching sizes on every axis except the one being joined.
Some commonly used functions are illustrated below.
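As a quick reference for the functions in the table that are not covered in the subsections, here is a small sketch of hstack, vstack, dstack, and vsplit on two 2×2 arrays. The arrays `a` and `b` are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6], [7, 8]])   # shape (2, 2)

print(np.hstack((a, b)).shape)   # joined side by side
print(np.vstack((a, b)).shape)   # joined one above the other
print(np.dstack((a, b)).shape)   # joined along a third axis

# vsplit reverses vstack: split the 4x2 result back into two 2x2 arrays.
top, bottom = np.vsplit(np.vstack((a, b)), 2)
print(np.array_equal(top, a), np.array_equal(bottom, b))
```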
2.2.1 append
Merge one-dimensional arrays :
import numpy as np
a=np.array([1,2,3])
b=np.array([4,5,6])
c=np.append(a,b)
print(c)
#[1 2 3 4 5 6]
Merge multidimensional arrays :
import numpy as np
a = np.arange(4).reshape(2, 2)
b = np.arange(4).reshape(2, 2)
# Merge by row
c = np.append(a, b, axis=0)
print('Result of merging by row')
print(c)
print('Shape after merging', c.shape)
# Merge by column
d = np.append(a, b, axis=1)
print('Result of merging by column')
print(d)
print('Shape after merging', d.shape)
Output result:
Result of merging by row
[[0 1]
 [2 3]
 [0 1]
 [2 3]]
Shape after merging (4, 2)
Result of merging by column
[[0 1 0 1]
 [2 3 2 3]]
Shape after merging (2, 4)
2.2.2 concatenate
Join arrays or matrices along a specified axis:
import numpy as np
a=np.array([[1,2],[3,4]])
b=np.array([[5,6]])
c=np.concatenate((a,b),axis=0)
print(c)
d=np.concatenate((a,b.T),axis=1)
print(d)
Output result:
[[1 2]
[3 4]
[5 6]]
[[1 2 5]
[3 4 6]]
2.2.3 stack
Stack arrays or matrices along a specified axis:
import numpy as np
a=np.array([[1, 2],[3, 4]])
b=np.array([[5, 6],[7, 8]])
print(np.stack((a,b),axis=0))
Output result:
[[[1 2]
[3 4]]
[[5 6]
[7 8]]]
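The effect of the axis argument on stack is easier to see when the inputs are not square. The sketch below uses two 3×4 arrays (an assumption made for illustration) and contrasts stack, which adds a new axis, with concatenate, which extends an existing one.

```python
import numpy as np

a = np.zeros((3, 4))
b = np.ones((3, 4))

# stack inserts a new axis of length 2 at the requested position.
stack_shapes = [np.stack((a, b), axis=axis).shape for axis in (0, 1, 2)]
print(stack_shapes)

# concatenate keeps the number of dimensions and lengthens one axis instead.
print(np.concatenate((a, b), axis=0).shape)
```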
3. Batch processing
In deep learning, the source data is usually large, so it generally has to be processed in batches. Stochastic gradient descent (SGD), which computes gradients on batches, is a typical application. Deep learning computations are generally complex and the data volumes large; processing the entire dataset at once is very likely to hit a resource bottleneck.
For more efficient computation, the dataset is generally split into batches. The opposite extreme is to process only one record at a time, which is not a sound approach either, because it cannot take full advantage of the parallelism of GPUs and Numpy. In practice, therefore, mini-batch processing is what is normally used.
How can a large dataset be split into multiple batches? The following steps can be used:
(1) Get the dataset
(2) Randomly shuffle the data
(3) Define the batch size
(4) Split the dataset into batches
Let’s use an example to illustrate:
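import numpy as np
# Generate 10000 matrices of shape 2x3
data_train = np.random.randn(10000, 2, 3)
# This is a 3-dimensional array: the first dimension is the number of samples, the last two are the shape of each sample
print(data_train.shape)
# (10000, 2, 3)
# Shuffle the 10000 samples
np.random.shuffle(data_train)
# Define the batch size
batch_size = 100
# Process the data in batches
for i in range(0, len(data_train), batch_size):
    x_batch_sum = np.sum(data_train[i:i+batch_size])
    print("Batch {}, sum of the data in this batch: {}".format(i, x_batch_sum))
The last 5 lines of output:
Batch 9500, sum of the data in this batch: 17.63702580438092
Batch 9600, sum of the data in this batch: -1.360924607368387
Batch 9700, sum of the data in this batch: -25.912226239266445
Batch 9800, sum of the data in this batch: 32.018136957835814
Batch 9900, sum of the data in this batch: 2.9002576614446935
[Note] The number printed is the starting index of each batch, which begins at 0 and advances by batch_size, so the last batch is labeled 9900.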
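In training code the samples usually come with labels, and both must be shuffled in the same order. One possible way to package the four steps is a small generator that shuffles an index array instead of the data itself. This is only a sketch: the function name `iterate_minibatches`, the label array, and the dataset size of 1050 are assumptions for illustration, not part of the original example.

```python
import numpy as np

def iterate_minibatches(data, labels, batch_size, rng):
    """Yield (data, labels) mini-batches in shuffled order (hypothetical helper)."""
    # Shuffle indices so that data and labels stay aligned.
    order = rng.permutation(len(data))
    for start in range(0, len(data), batch_size):
        idx = order[start:start + batch_size]
        yield data[idx], labels[idx]

rng = np.random.default_rng(0)
data = rng.standard_normal((1050, 2, 3))
labels = np.arange(1050)  # illustrative labels: each label is the sample's index

batches = list(iterate_minibatches(data, labels, batch_size=100, rng=rng))
print(len(batches))
print(batches[0][0].shape, batches[-1][0].shape)
```

When the dataset size is not a multiple of the batch size, the last batch is simply smaller; here 1050 samples give ten batches of 100 and one of 50.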
4. Universal functions
Numpy provides two basic objects: ndarray and ufunc. ndarray was introduced earlier; this section introduces the other one, the universal function (ufunc).
ufunc is short for universal function: a function that operates on each element of an array. Many ufuncs are implemented in C, so they are very fast. They are also more flexible than the functions in the math module.
The functions in the math module generally take a scalar as input, whereas Numpy functions accept vectors or matrices. Working with vectors and matrices avoids explicit loops, which is very important in machine learning and deep learning.
The following table lists several commonly used Numpy functions.
function | description |
---|---|
sqrt | Computes the square root of array data |
sin, cos | Trigonometric functions |
abs | Computes the absolute value of array data |
dot | Matrix multiplication (dot product) |
log, log10, log2 | Logarithmic functions |
exp | Exponential function |
cumsum, cumprod | Cumulative sum, cumulative product |
sum | Sums the elements of an array |
mean | Computes the mean |
median | Computes the median |
std | Computes the standard deviation |
var | Computes the variance |
corrcoef | Computes the correlation coefficient |
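A few of the functions in the table can be tried on a small array, together with the scalar-only behavior of the math module mentioned above. The input values below are chosen only so that the results are easy to verify by hand.

```python
import math
import numpy as np

x = np.array([0.0, 1.0, 4.0, 9.0])

roots = np.sqrt(x)      # element-wise square root
running = np.cumsum(x)  # cumulative sum
print(roots)
print(running)
print(x.mean(), x.std())

# math.sqrt only accepts scalars, so passing the whole array raises TypeError.
try:
    math.sqrt(x)
    math_failed = False
except TypeError:
    math_failed = True
print(math_failed)
```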
4.1 Performance comparison between math and numpy functions
import time
import math
import numpy as np
# time.clock() was removed in Python 3.8; time.perf_counter() is used instead
x = [i * 0.001 for i in np.arange(1000000)]
start = time.perf_counter()
for i, t in enumerate(x):
    x[i] = math.sin(t)
print("math.sin:", time.perf_counter() - start)
x = [i * 0.001 for i in np.arange(1000000)]
x = np.array(x)
start = time.perf_counter()
np.sin(x)
print("numpy.sin:", time.perf_counter() - start)
Print result:
math.sin:0.5169950000000005
numpy.sin:0.05381199999999886
4.2 Comparison of Loop and Vector Operations
Making full use of the built-in functions of Python's Numpy library to vectorize computations can greatly improve running speed. Numpy's built-in functions use SIMD instructions, so the vectorized code below is much faster than the loop-based version. A GPU would be more powerful still, but Numpy does not support GPUs.
PyTorch does support GPUs, and how PyTorch uses them to accelerate algorithms will be introduced later.
import time
import numpy as np
x1 = np.random.rand(1000000)
x2 = np.random.rand(1000000)
# Compute the dot product of the two vectors with a loop
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot += x1[i] * x2[i]
toc = time.process_time()
print("dot=" + str(dot) + "\n for loop----- Computation time = " + str(1000 * (toc - tic)) + "ms")
# Compute the dot product with the numpy function
tic = time.process_time()
dot = 0
dot = np.dot(x1, x2)
toc = time.process_time()
print("dot=" + str(dot) + "\n vector version---- Computation time = " + str(1000 * (toc - tic)) + "ms")
Output result:
dot=250215.601995
 for loop----- Computation time = 798.3389819999998ms
dot=250215.601995
 vector version---- Computation time = 1.885051999999554ms
Judging from the results, the for loop takes about 400 times as long as the vectorized operation. Therefore, deep learning algorithms generally operate on vectorized matrices.
5. Broadcast mechanism
Numpy's universal functions require the input arrays to have the same shape. When the shapes differ, the broadcasting mechanism is used. However, adjusting the arrays to a common shape must follow certain rules, otherwise an error occurs. These rules can be summarized in the following 4 items.
(1) All input arrays are aligned with the array that has the longest shape; missing dimensions are filled in by prepending 1s. For example:
a: 2×3×2
b: 3×2
b is aligned with a by prepending a 1, so its shape becomes 1×3×2.
(2) The shape of the output array is the maximum along each axis of the input array shapes.
(3) An input array can take part in the calculation if, on every axis, its length either equals that of the output array or is 1; otherwise an error occurs.
(4) When an input array has length 1 along an axis, the first (and only) set of values on that axis is used (or copied) for every operation along that axis.
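Rules (1) to (3) can be written out as a few lines of Python operating on shape tuples. The function `broadcast_shape` below is a hypothetical helper written for illustration, not a Numpy API, and it follows the rules exactly as stated above (it does not handle zero-length axes). The final line compares it with the shape Numpy itself computes through np.broadcast.

```python
import numpy as np

def broadcast_shape(*shapes):
    """Apply rules (1)-(3) to a set of shapes (hypothetical helper)."""
    ndim = max(len(s) for s in shapes)
    # Rule 1: left-pad shorter shapes with 1s.
    padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    # Rule 2: the output takes the maximum length on each axis.
    out = tuple(max(lengths) for lengths in zip(*padded))
    # Rule 3: each input must match the output or have length 1 on every axis.
    for shape in padded:
        for length, target in zip(shape, out):
            if length != target and length != 1:
                raise ValueError("shapes {} cannot be broadcast".format(shapes))
    return out

print(broadcast_shape((2, 3, 2), (3, 2)))
print(broadcast_shape((4, 1), (3,)))
print(np.broadcast(np.empty((4, 1)), np.empty((3,))).shape)
```

Shapes such as (2, 3) and (4,) fail rule 3, because after padding the last axis has lengths 3 and 4, neither of which is 1.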
Broadcasting is used throughout Numpy to decide how to handle arrays of different shapes in arithmetic operations such as +, -, *, and /. The rules are rigorous but not intuitive, so the following combines a diagram and code to explain them further.
Goal: compute A+B, where A is a 4×1 matrix and B is a one-dimensional vector of shape (3,).
The addition requires the following steps:
According to rule 1, B must be aligned, so its shape becomes (1,3).
According to rule 2, the output takes the maximum on each axis, so the result is a (4,3) matrix. How, then, does A go from (4,1) to (4,3), and B from (1,3) to (4,3)?
According to rule 4, the first set of values on the length-1 axis is copied along that axis (the key is to identify which axis). In actual processing the values are not really copied, since that would consume too much memory; Numpy reuses the existing values internally instead. The detailed process is shown in the figure below.
Code:
import numpy as np
A = np.arange(0, 40, 10).reshape(4, 1)
B = np.arange(0, 3)
print("Shape of A: {}, shape of B: {}".format(A.shape, B.shape))
C = A + B
print("Shape of C: {}".format(C.shape))
print(C)
Output result:
Shape of A: (4, 1), shape of B: (3,)
Shape of C: (4, 3)
[[ 0  1  2]
 [10 11 12]
 [20 21 22]
 [30 31 32]]
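The intermediate (4,3) versions of A and B can be inspected with np.broadcast_to, which returns a read-only view rather than a copy. The sketch below uses the same A and B; the names `A_wide` and `B_tall` are illustrative. A stride of 0 on an axis means the same values are reused along that axis.

```python
import numpy as np

A = np.arange(0, 40, 10).reshape(4, 1)
B = np.arange(0, 3)

A_wide = np.broadcast_to(A, (4, 3))  # each row repeats its single value
B_tall = np.broadcast_to(B, (4, 3))  # the one row is repeated 4 times

print(A_wide)
print(B_tall)
print(A_wide.strides, B_tall.strides)

# Adding the expanded views gives the same result as A + B.
print(np.array_equal(A_wide + B_tall, A + B))
```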
6. Summary
This chapter introduced common Numpy operations, especially operations on matrices, which are used frequently in later programs. Numpy offers far more than is covered here; to learn more, visit the official Numpy website (http://www.Numpy.org/).
I have recently been studying optimization algorithms for deep learning. I am not sure which one you would like to see first, so please vote below!