A Python PyTorch tutorial - from getting started to hands-on practice (all of the code is runnable)

I actually wrote a version of this tutorial before, but this time I am writing it again with more content, working from the basics up to the harder parts and adding some practical examples.

Here is our content directory:

1. Let's start with data types

1.1 How are PyTorch's various data types created?
1.2 What attributes do PyTorch's various data types have?
1.3 What function operations do PyTorch's various data types support?

2. With data types and their operations covered, pick a direction and start practicing (deep learning).

2.1 Differentiation
2.2 Loss functions
2.3 Optimizers
2.4 Linear regression in practice
2.5 Convolutional neural networks in practice
2.6 Neural networks in practice
2.7 RNN and LSTM in practice

1. Let's start with data types

If we start from data types, we need to look at them from the following angles:

1.1 How are PyTorch's various data types created?

1.2 What attributes do PyTorch's various data types have?

1.3 What function operations do PyTorch's various data types support?

First of all, PyTorch is a computing library (some people call it a deep learning library), and a computing library has to have its own data types.
PyTorch's computation revolves around Tensors. Tensors are similar to NumPy's ndarrays, but a Tensor can be computed on a GPU, while an ndarray cannot and lives only in CPU memory. A Tensor is really just a data structure: it stores data and supports all kinds of operations on it - creation, modification, indexing, arithmetic, and so on.
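For instance, here is a minimal sketch of that difference (it checks whether a CUDA GPU is actually available before moving anything, since not every machine has one):

import torch

t = torch.randn(2, 3)              # a Tensor, created on the CPU by default
print(t.device)                    # cpu

if torch.cuda.is_available():      # only move to the GPU if one is present
    t_gpu = t.to("cuda")           # an ndarray cannot do this; a Tensor can
    print(t_gpu.device)            # cuda:0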

So let's start with PyTorch's data types; without its own data types there would be nothing to talk about. If you have studied data structures, you know that whether you are building a large system, a small one, or just a set of functional interfaces - and PyTorch is essentially a set of functional interfaces - you always start from the most basic classes. Each PyTorch data type is really a class: the class defines the data it holds and the operations on it, such as addition, subtraction, multiplication, division and gradient updates. So when we look at PyTorch's data types, we can look at them from the perspective of classes.

Python's basic numeric types each correspond to a PyTorch tensor type: a Python float corresponds to FloatTensor, an int to IntTensor/LongTensor, and so on. PyTorch actually has more types than this, but in general mastering these few is enough, even if all you do is deep learning.
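As a quick, hand-made illustration of that correspondence, you can check which tensor type a plain Python value turns into:

import torch

print(torch.tensor(1.0).type())    # torch.FloatTensor  (Python float -> 32-bit float tensor)
print(torch.tensor(1).type())      # torch.LongTensor   (Python int   -> 64-bit integer tensor)
print(torch.tensor(True).type())   # torch.BoolTensor   (Python bool  -> boolean tensor)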

Now for the main content.

1.1 How are PyTorch's various data types created?
Creating PyTorch's data types is fairly simple, although there are quite a few cases, so let's introduce them one by one.

Before starting 1.1, there is one method we need to mention ahead of time (strictly speaking it belongs in 1.3): type().

Called with no arguments on a tensor, type() returns that tensor's data type as a string; for example torch.randn(2, 3).type() returns 'torch.FloatTensor'. We will use it constantly below to check what each creation function returns.

(1) Generate data types through pytorch functions

1. The randn function returns a tensor filled with random numbers drawn from the standard normal distribution.

import torch
# create torch data types using torch functions
print(torch.randn(2,3))
print(torch.randn(2,3).type())

The output is a 2x3 tensor of random values, and its type is torch.FloatTensor.

2. The rand function looks similar to randn but is actually quite different: it draws uniform random numbers from [0, 1) instead of the standard normal distribution.


print(torch.rand(2,3))

print(torch.rand(2,3).type())

This also returns a tensor of the FloatTensor data type.
3. torch.eye() returns a two-dimensional tensor, again of the FloatTensor data type. Anyone who has studied linear algebra will know how useful this is: it generates identity matrices.
One thing may seem odd: identity-matrix elements are usually thought of as integers, so why is the result a float type? PyTorch simply standardizes on FloatTensor for convenience: keeping everything in one floating-point type avoids losing data to type conversions and makes the tensors easy to operate on.


print(torch.eye(2,3))

print(torch.eye(2,3).type())


4. from_numpy converts a numpy.ndarray into a Tensor.
It comes with some caveats: the returned tensor and the ndarray share the same memory, so modifying one modifies the other, and the returned tensor cannot be resized.
Keep these points in mind when you use it.


a=np.random.randn(2,3)

#print(a)

tor=torch.from_numpy(a)
print(tor)
print(tor.type())

The output's data type is DoubleTensor, because NumPy's default floating-point type is 64-bit.
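Here is a small sketch of the shared-memory behaviour, so you can see why it deserves attention:

import numpy as np
import torch

arr = np.zeros(3)
t = torch.from_numpy(arr)   # t shares memory with arr
arr[0] = 7                  # modify the ndarray in place...
print(t)                    # ...and the tensor sees it: tensor([7., 0., 0.], dtype=torch.float64)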

5. linspace returns a one-dimensional tensor. You set a start point, an end point and a number of points, and it returns that many evenly spaced values from the start to the end. The returned data type is FloatTensor.

Note that the third argument is a count of points, so it must be an integer; start and end can be set freely.

print(torch.linspace(2,10,2))

print(torch.linspace(2,10,2).type())

The output is tensor([ 2., 10.]) - with only two points you just get the two endpoints - and the type is torch.FloatTensor.

6. logspace also returns a one-dimensional tensor. You set a start exponent, an end exponent and a number of elements, and it returns that many points spaced evenly on a logarithmic scale, running from base^start to base^end.
A fourth argument selects the base; the default is 10.



print(torch.logspace(2,10,20))

print(torch.logspace(2,10,20).type())
# use base 5 here (the default base is 10)
print(torch.logspace(2,10,20,5))

print(torch.logspace(2,10,20,5).type())

7. ones returns a tensor filled entirely with ones.

print(torch.ones(10))

print(torch.ones(10).type())

The returned data type is also FloatTensor.

8. randperm returns a random permutation of integers, so you pass in a single integer n and the returned elements are the integers from 0 to n-1 in random order.


print(torch.randperm(10))

print(torch.randperm(10).type())

Note that the return type is LongTensor.
9. The arange function also returns a one-dimensional tensor, but it works like Python's range: you give a start, an end (exclusive) and a step, and unlike range the step may be fractional.

Look at the code below to understand:

print(torch.arange(10))

print(torch.arange(2,10))
print(torch.arange(2,10,2))
print(torch.arange(2,10,0.5))
print(torch.arange(10).type())
print(torch.arange(2,10,0.5).type())

Note that the data type produced is not fixed: torch.arange(10) gives a LongTensor, while torch.arange(2, 10, 0.5) gives a FloatTensor, because a fractional step forces a floating-point result.

10. zeros is used almost exactly like ones; you can refer back to the ones function.
Look at the code below:


print(torch.zeros(10))

print(torch.zeros(10).type())

It returns a tensor full of zeros, of type FloatTensor, with as many elements as we ask for.
11. zeros_like generates an all-zeros tensor with the same shape as the given tensor, whatever the number of dimensions.

Look at the code:

print(torch.zeros_like(a))
print(torch.zeros_like(b))
print(torch.zeros_like(a).type())

12. empty_like is used like zeros_like, but the tensor it produces is uninitialized.

To be precise, the memory is allocated but never filled in, so the values you see are just whatever happened to be there.


a=torch.eye(2,3)
b=torch.ones(12)
print(torch.empty_like(a))
print(torch.empty_like(b))
print(torch.empty_like(a).type())

13. The full function: pass in a shape s and a value v, and it generates a tensor of shape s filled with the value v. The code is as follows:


print(torch.full((4,6),10))

print(torch.full((4,6),10).type())

14. full_like is very similar to zeros_like, except that you also pass the fill value; let's look at the code.

a=torch.eye(2,3)
b=torch.ones(6)
print(torch.full_like(a,12))
print(torch.full_like(b,12))
print(torch.full_like(a,12).type())


15. The as_tensor function is very important. The previous fourteen functions all matter, but this one matters even more because it is a bridge: if you already know your way around lists, tuples and ndarrays, this function is especially practical, since it converts lists, tuples and ndarrays directly into tensors.

The code is as follows:

a=[1,2,3]
b=(1,2,3)
a2=[[1,2,3],[1,2,3]]
c=np.array([1,2,3])
print(torch.as_tensor(a))
print(torch.as_tensor(b))
print(torch.as_tensor(a2))
print(torch.as_tensor(c))

print(torch.as_tensor(a).type())

The output shows that the conversion follows the input: Python ints become LongTensor, Python floats become FloatTensor, and a NumPy array keeps its own dtype (so a 64-bit float array becomes a DoubleTensor).

16. rand_like is similar to full_like, zeros_like and ones_like above, but it fills the result with uniform random numbers from [0, 1).
Let's look directly at the code below:



a=torch.eye(2,3)
b=torch.ones(6)
print(torch.rand_like(a))
print(torch.rand_like(b))
print(torch.rand_like(a).type())


17. randint returns a tensor filled with random integers; you set the range of the integers and the shape of the tensor.
The sample code is as follows:

print(torch.randint(0,10,size=(12,)))
print(torch.randint(0,10,size=(2,3)))
print(torch.randint(0,10,size=(2,3)).type())

Note the range you pass for the random integers: the lower bound is inclusive and the upper bound is exclusive.
18. randint_like needs no extra explanation after the above; the code makes it clear:

a=torch.eye(2,3)
b=torch.ones(6)
print(torch.randint_like(a,0,10))
print(torch.randint_like(b,0,10))
print(torch.randint_like(a,0,10).type())


19. randn_like works the same way: it takes the shape of the input tensor and returns a tensor of that shape filled with samples from the standard normal distribution.
Look at the code:


a=torch.eye(2,3)
b=torch.ones(6)
print(torch.randn_like(a))
print(torch.randn_like(b))
print(torch.randn_like(a).type())


(2) Generating a data type with the tensor function. This function was touched on above, but it deserves its own mention because it is special: it lets us directly specify the data type we want to generate. It could fairly be called extremely important.


a = torch.tensor([3, 2], dtype=torch.float32) 
print(a.type())
a = torch.tensor([3, 2], dtype=torch.int32) 
print(a.type())
a = torch.tensor([3, 2], dtype=torch.int64) 
print(a.type())
a = torch.tensor([3, 2], dtype=torch.float64) 
print(a.type())

The output is torch.FloatTensor, torch.IntTensor, torch.LongTensor and torch.DoubleTensor in turn.

Ok, at this point, our 1.1 is over. When you learn a data structure, you must first learn how it is generated, and then we can get to other content.

All sample codes for 1.1 are as follows:


#coding=gbk
import os
import torch
import numpy  as np
# create torch data types using torch functions
print(torch.randn(2,3))

print(torch.randn(2,3).type())

a=torch.randn(2,3)
print(a.type())

print(torch.rand(2,3))

print(torch.rand(2,3).type())


print(torch.eye(2,3))

print(torch.eye(2,3).type())

a=np.random.randn(2,3)

#print(a)

tor=torch.from_numpy(a)
print(tor)
print(tor.type())


print(torch.linspace(2,10,2))

print(torch.linspace(2,10,2).type())


print(torch.logspace(2,10,20))

print(torch.logspace(2,10,20).type())
# use base 5 here (the default base is 10)
print(torch.logspace(2,10,20,5))

print(torch.logspace(2,10,20,5).type())

print(torch.ones(10))

print(torch.ones(10).type())


print(torch.randperm(10))

print(torch.randperm(10).type())
print(torch.arange(10))

print(torch.arange(2,10))
print(torch.arange(2,10,2))
print(torch.arange(2,10,0.5))
print(torch.arange(10).type())
print(torch.arange(2,10,0.5).type())

print(torch.zeros(10))

print(torch.zeros(10).type())

a=torch.eye(2,3)
b=torch.ones(12)
print(torch.zeros_like(a))
print(torch.zeros_like(b))
print(torch.zeros_like(a).type())


a=torch.eye(2,3)
b=torch.ones(6)
print(torch.empty_like(a))
print(torch.empty_like(b))
print(torch.empty_like(a).type())

print(torch.full((4,6),10))

print(torch.full((4,6),10).type())




a=torch.eye(2,3)
b=torch.ones(6)
print(torch.full_like(a,12))
print(torch.full_like(b,12))
print(torch.full_like(a,12).type())

a=[1,2,3]
b=(1,2,3)
a2=[[1,2,3],[1,2,3]]
c=np.array([1,2,3])
print(torch.as_tensor(a))
print(torch.as_tensor(b))
print(torch.as_tensor(a2))
print(torch.as_tensor(c))

print(torch.as_tensor(a).type())


a=torch.eye(2,3)
b=torch.ones(6)
print(torch.rand_like(a))
print(torch.rand_like(b))
print(torch.rand_like(a).type())

print(torch.randint(0,10,size=(12,)))
print(torch.randint(0,10,size=(2,3)))
print(torch.randint(0,10,size=(2,3)).type())

a=torch.eye(2,3)
b=torch.ones(6)
print(torch.randint_like(a,0,10))
print(torch.randint_like(b,0,10))
print(torch.randint_like(a,0,10).type())

a=torch.eye(2,3)
b=torch.ones(6)
print(torch.randn_like(a))
print(torch.randn_like(b))
print(torch.randn_like(a).type())

os.system("pause")

For the creation methods covered in 1.1, you can generally choose which device the tensor is stored on, GPU or CPU. If you choose the CPU, subsequent operations on that tensor run on the CPU; likewise, if you choose the GPU, they run on the GPU. So pick the device according to your needs.
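A minimal sketch of choosing the device at creation time (again guarded, because not every machine has a GPU):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(2, 3, device=device)     # create the tensor directly on the chosen device
b = torch.ones(2, 3).to(device)          # or create it on the CPU and move it afterwards
print((a + b).device)                    # the result lives on the same device as its inputs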

1.2 What attributes do PyTorch's various data types have?

This part is much lighter, because the attributes are essentially shared by every data type, so one example covers them all.
How do we get started?

By creating a tensor first, of course! That is exactly what 1.1 was about.
Create a tensor:

import os
import torch
import numpy  as np
# create torch data types using torch functions

a=torch.randn(2,3)
1. type: view the data type

Then check its data type:

print(a.type())

So the first attribute is the data type, which really is an attribute of the tensor.
Strictly speaking, type is a method rather than a plain attribute: it is callable with parentheses, and when given an argument it performs type conversion.


a=torch.randn(2,3)
print(a.type())
a = a.type(torch.int64)

print(a.type())

The code above performs a type conversion. Note that it returns a new tensor backed by newly allocated memory rather than converting the original in place.
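A small check of that point (my own sketch): the converted tensor is backed by different storage from the original.

import torch

a = torch.randn(2, 3)
b = a.type(torch.int64)                # the conversion returns a new tensor
print(a.data_ptr() == b.data_ptr())    # False - the data lives in a different block of memory
print(a.type(), b.type())              # torch.FloatTensor torch.LongTensor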

2. size: view the shape of the data

Usage is also very simple:


a=torch.randn(2,3)
print(a.size())

a=torch.randn(2,)
print(a.size())
a=torch.randn(6)
print(a.size())

The output is torch.Size([2, 3]), torch.Size([2]) and torch.Size([6]).

And that is already the end of 1.2. There genuinely is not much here: attributes exist so we can understand a variable, and type and size tell us everything we need to know about its state.
1.2 The code is as follows:


#coding=gbk

import os
import torch
import numpy  as np
# create torch data types using torch functions


a=torch.randn(2,3)
print(a.type())
a = a.type(torch.int64)

print(a.type())

a=torch.randn(2,3)
print(a.size())

a=torch.randn(2,)
print(a.size())
a=torch.randn(6)
print(a.size())

os.system("pause")

1.3 What function operations do PyTorch's various data types support?

The function operations here are basically the same across the data types, whether for creating, deleting, modifying and querying elements or for numerical processing, so let's introduce them:

1. is_tensor and is_storage. When we create a tensor, the variable we get back does not point directly at the data: it points at an object that stores the tensor's metadata, such as its type and size, along with a reference to where the real data lives.
So every tensor really has two parts: the tensor object, which stores that information, and the storage, which holds the data itself. Now let's look at the sample code:


a=torch.randn(2,3)
print(torch.is_tensor(a))

c=a.storage()

print(torch.is_storage(a))

print(torch.is_storage(c))

Running it prints True, False, True: a is a tensor but not a storage, while a.storage() is a storage.

2. numel returns the total number of elements in a tensor, which is also very important; see the sample code:

a=torch.randn(2,3)
b=torch.randn(7)
print(torch.numel(a))
print(torch.numel(b))

The outputs are 6 and 7.
3. sparse_coo_tensor builds a sparse tensor. Sparse matrices get their own function because PyTorch stores them in a special format, which saves memory.
Let's look at how to use it. The sample code is as follows:



index=torch.tensor([[1,2],[2,1]])
value=torch.tensor([2,3],dtype=torch.float32)

t=torch.sparse_coo_tensor(index,value,(3,4))
print(t)

The output is a sparse tensor showing the indices, the values and the size (3, 4).

4. cat concatenates a sequence of input tensors along a given dimension. It takes two parameters.

The first parameter is the sequence of tensors to concatenate, joined in the order you pass them. The tensors must agree in shape except along the concatenation dimension: to concatenate along rows the column counts must match, and to concatenate along columns the row counts must match.
The second parameter, dim, selects the dimension: dim=0 concatenates along rows, dim=1 along columns.

Take a look at the sample code:


a=torch.randn(2,3)

b=torch.randn(2,4)

c=torch.randn(4,3)

print(a)
print(b)
print(c)
print(torch.cat((a,b),1))
print(torch.cat((a,c),0))

Note that the second argument selects the axis to concatenate along, and the sizes along the remaining axes must be equal.

5. chunk splits a tensor into chunks. It takes three arguments: the tensor, the number of chunks, and the axis to split along.
Let's take a look at the sample code:



a=torch.randn(12,3)
print(torch.chunk(a,6,0))
a=torch.randn(3,12)
print(torch.chunk(a,6,1))


6. gather gathers values along an axis: it picks elements out of a tensor according to an index tensor.
Have a look at the example usage:

a=torch.randn(4,3)
index1=torch.LongTensor([[0,1,2,1]])
index2=torch.LongTensor([[0,1,2]])
print(a)
print(torch.gather(a,1,index1))

print(torch.gather(a,0,index2))


7. index_select is close to gather, but where gather picks out individual elements, index_select picks out whole rows or columns by index.
Take a look at the sample code


a=torch.randn(4,3)
index1=torch.LongTensor([0,1,2,1])
index2=torch.LongTensor([0,1,2])

print(a)
print(torch.index_select(a,1,index1))

print(torch.index_select(a,0,index2))


8. The ge function: pass in a tensor and a value v, and it returns a boolean tensor of the same shape, True wherever the element is greater than or equal to v and False where it is smaller.
Look at the sample code below.

This function is quite handy and worth knowing well.

a=torch.randint(0,10,(2,3))

print(a)
b=torch.ge(a,5)
print(b)


9. masked_select: the previous function is a good lead-in to this one. It takes two tensors, the second being a boolean mask, and returns the elements of the first tensor at the positions where the mask is True.
Look at the sample code below:


a=torch.randint(0,10,(2,3))

print(a)
b=torch.ge(a,5)
print(b)
print(torch.masked_select(a,b))


10. nonzero: pass in a tensor and it returns the indices of its non-zero elements.
Take a look at the sample code:


a=torch.randint(0,10,(2,3))
print(a)
print(torch.nonzero(a))


11. reshape changes the shape of a tensor (the total number of elements stays the same).
Let's take a look at the sample code:


a=torch.randint(0,10,(2,3))
b=a.reshape(3,2)
print(b)
b=a.reshape(6,1)
print(b)
b=a.reshape(1,6)
print(b)
b=a.reshape(6,)
print(b)

This function is very important and well worth mastering, which is why several examples are given.

12. The split function is very similar to chunk, but more flexible: instead of a number of chunks you can also pass a list of split sizes. The sample code is as follows:



a=torch.randn(6,3)

print(a)
print(torch.split(a,3,0))

print(torch.split(a,[1,5],0))
a=torch.randn(3,6)
print(a)
print(torch.split(a,2,1))

13. unsqueeze inserts a dimension of size 1 at the given position, expanding the dimensionality of the input tensor.




a=torch.randint(0,10,(2,3))

print(a)
b=torch.unsqueeze(a,0)
print(b)
b=torch.unsqueeze(a,1)
print(b)

This one is used somewhat less often.
14. squeeze is used more often; it is effectively a dimensionality-reduction function: it removes dimensions of size 1 from the input tensor's shape and returns the result.
Take a look at the sample code:


a=torch.randint(0,10,(2,3))

print(a)
b=torch.squeeze(a,0)
print(b)
a=torch.randint(0,10,(6,1))

b=torch.squeeze(a,0)
print(b)

Note that only dimensions of size 1 get squeezed away.

15. The stack function also joins tensors and is somewhat similar to cat, but where cat joins along an existing dimension, stack requires tensors of identical shape and stacks them along a new dimension.

Let's take a look at the sample code:



a=torch.randint(0,10,(2,3))

b=torch.randint(0,10,(2,3))
print(a)
print(b)

print(torch.stack((a,b),1))
print(torch.stack((a,b),0))


16. The t function is the transpose; this is the matrix transpose from linear algebra.
The sample code is as follows:


a=torch.randint(0,10,(2,3))
print(a)
print(torch.t(a))


17. The transpose function is an upgraded version of t: you pick any two dimensions and they are swapped.

a=torch.randint(0,10,(2,3))
print(a)
print(torch.transpose(a,0,1))

If the tensor has more than two dimensions, transpose is the one to use.

18. unbind removes the given dimension and returns a tuple containing every slice along it; in other words it takes the incoming tensor apart.


a=torch.randint(0,10,(2,3))
print(a)
print(torch.unbind(a,0))
print(torch.unbind(a,1))

It returns a tuple, splitting the tensor along the axis you pass.

19. where is something of a PyTorch speciality. You pass in a condition and two tensors: wherever the condition holds, the element of the first tensor is returned; wherever it does not, the element of the second tensor is returned. It is commonly used and worth remembering.
a=torch.randint(0,10,(2,3))

b=torch.randint(0,10,(2,3))
print(a)
print(b)
print(torch.where(a>5,a,b))

20. manual_seed and initial_seed: manual_seed sets the random seed, and initial_seed returns the currently set seed.
Take a look at the sample code.


torch.manual_seed(10)
print(torch.initial_seed())

The returned result is 10.
Once the seed is set, the random data we generate becomes reproducible. Doesn't that make it non-random? The point is to be able to repeat experiments exactly.
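A tiny sketch of that reproducibility:

import torch

torch.manual_seed(10)
first = torch.randn(2, 3)
torch.manual_seed(10)               # reset the seed...
second = torch.randn(2, 3)          # ...and the "random" draw repeats exactly
print(torch.equal(first, second))   # True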

21. bernoulli: anyone who has done statistics will recognize this one. It takes a tensor of probabilities (every element must lie between 0 and 1), uses each element as the parameter of a Bernoulli distribution, and returns a tensor of 0s and 1s. Take a look at the code:


a=torch.tensor([[0.5,0.2],[0.4,0.6]])
print(a)
print(torch.bernoulli(a))

22. The multinomial function draws from a multinomial distribution: pass in a tensor of weights and a number of samples, and it returns the indices of elements selected according to those weights. First look at the code:


a=torch.tensor([[0.5,0.2],[0.4,0.6]])
print(a)
print(torch.multinomial(a,4, replacement=True))

a=torch.rand(2,3)

print(torch.multinomial(a,4, replacement=True))
a=torch.rand(2,3)
print(a)
print(torch.multinomial(a,4, replacement=True))

It returns integers: the indices sampled according to the weights, as many per row as the number we requested.

23. The normal function is very important: it samples from the normal distribution. You set the mean and the standard deviation, then the shape, and it generates a tensor of that shape drawn from the corresponding distribution. Let's look at the code.


print(torch.normal(0,1,(2,3)))

print(torch.normal(0,3,(2,3)))

If you pass 0 and 1 as the first two arguments, it reduces to randn.

24. save: anyone doing machine learning, and neural networks in particular, will know this one. It is the save we use in deep learning to store tensors and models.
Let's take a look at the sample code:


a=torch.rand(2,3)
torch.save(a,"tensor.pt")

torch.save(model,'net.pth')  # save the network structure together with the parameters

torch.save(model.state_dict(),'net_params.pth')  # save only the network parameters


Incidentally, there is no meaningful difference between using the .pth and .pt extensions.
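As a sketch of the usual model-saving pattern (the little Net class below is made up purely for illustration, not something defined earlier in this post):

import torch
import torch.nn as nn

class Net(nn.Module):                 # a hypothetical tiny model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3, 1)

    def forward(self, x):
        return self.fc(x)

model = Net()
torch.save(model.state_dict(), "net_params.pth")        # save only the parameters

model2 = Net()                                           # rebuild the same architecture...
model2.load_state_dict(torch.load("net_params.pth"))     # ...then load the saved parameters back in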

25. After save, the natural next topic is load, which loads the data back.
Take a look at the sample code:


device=torch.device('cpu')
a=torch.load('tensor.pt',map_location=device)

print(a)


Now let's move on to some functions for mathematical processing.

Most of their usages are basically the same and they are easy to use, so I will introduce them in batches directly in the code:
26. Mathematical functions that process the elements directly

a=torch.randn(2,3)
print(a)
# absolute value
print(torch.abs(a))
a=torch.rand(2,3)

# arccosine
print(torch.acos(a))

# arcsine
print(torch.asin(a))

# sine
print(torch.sin(a))

# tangent
print(torch.tan(a))

# hyperbolic tangent
print(torch.tanh(a))

# hyperbolic sine
print(torch.sinh(a))
# arctangent
print(torch.atan(a))

# round up (ceiling)

print(torch.ceil(a))
# exponential
print(torch.exp(a))
# natural logarithm
print(torch.log(a))
# log(a+1)
print(torch.log1p(a))
# negate
print(torch.neg(a))

# reciprocal

print(torch.reciprocal(a))
# remainder of division
print(torch.remainder(a,0.2))
# add a value to every element
print(torch.add(a,10))  # add 10 to each element
# element-wise remainder with a number
print(torch.fmod(a,10))  # remainder of each element divided by 10
# round down (floor)
print(torch.floor(a))
# round to nearest
print(torch.round(a))

# reciprocal of the square root
print(torch.rsqrt(a))
# sigmoid
print(torch.sigmoid(a))

# sign of each element
print(torch.sign(a))
# square root
print(torch.sqrt(a))

27. The addcdiv function divides tensor1 by tensor2 element-wise, multiplies the result by the scalar value, and adds it to the input tensor: out = tensor + value * (tensor1 / tensor2).
Take a look at the sample code:

c=torch.tensor([[0,0,0],[0,0,0]])
a=torch.randn(2,3)
b=torch.rand(2,3)
print(c)
print(a)

print(b)
print(torch.addcdiv(c,0.5,a,b))

28. addcmul works the same way except that it multiplies instead of divides: out = tensor + value * tensor1 * tensor2.
Sample code:


c=torch.tensor([[0,0,0],[0,0,0]])
a=torch.randn(2,3)
b=torch.rand(2,3)
print(c)
print(a)
print(b)
print(torch.addcmul(c,0.5,a,b))

29. lerp performs linear interpolation: pass in two tensors of the same shape and a weight, and it computes out = start + weight * (end - start).

Sample code:

a=torch.randn(2,3)
b=torch.rand(2,3)
print(a)
print(b)

print(torch.lerp(a,b,10))

30. mul takes two tensors of the same shape and multiplies them element-wise.


a=torch.randn(2,3)
b=torch.rand(2,3)
print(a)
print(b)

print(torch.mul(a,b))

31. cumprod computes cumulative products, along an axis that we choose.
Take a look at the sample code:


a=torch.randint(1,4,(3,3))
print(a)
print(torch.cumprod(a,0))
print(torch.cumprod(a,1))

By choosing the axis you can accumulate the product row by row or column by column.

32. cumsum is the same idea, except it accumulates sums instead of products.
Sample code:

a=torch.randint(1,4,(3,3))
print(a)
print(torch.cumsum(a,0))
print(torch.cumsum(a,1))

33. The dist function computes the p-norm of the difference of two tensors: pass in two tensors and a value p, and it returns ||a - b||_p, where the p-norm is ||x||_p = (sum_i |x_i|^p)^(1/p).

The sample code is as follows:

a=torch.randint(1,5,(6,))

b=torch.randint(1,5,(6,))
a=a.type(dtype=torch.float32)
b=b.type(dtype=torch.float32)
print(a)
print(b)
print(torch.dist(a,b,2))


print(torch.dist(a,b,4))


a=torch.randint(1,5,(2,3))

b=torch.randint(1,5,(2,3))
a=a.type(dtype=torch.float32)
b=b.type(dtype=torch.float32)
print(a)
print(b)
print(torch.dist(a,b,2))


print(torch.dist(a,b,4))

This function is very important; please make sure you master it.
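One way to see what dist is doing (a sketch of my own): it is just the p-norm of the element-wise difference, so it matches torch.norm applied to a - b.

import torch

a = torch.randn(2, 3)
b = torch.randn(2, 3)
print(torch.dist(a, b, 2))      # Euclidean (p=2) distance between a and b
print(torch.norm(a - b, 2))     # the same value, computed as the 2-norm of the difference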

34. Statistics-related processing functions

The usage of these is straightforward, so here is the code directly; running it should make everything clear.

a=torch.randint(1,4,(3,3))
print(a)

# mean; the second argument selects the axis (by row or by column)
print(torch.mean(a.type(dtype=torch.float32),0))
print(torch.mean(a.type(dtype=torch.float32),1))



# median; the second argument selects the axis
print(torch.median(a,0))
print(torch.median(a,1))



# mode
print(torch.mode(a,0))
print(torch.mode(a,1))
# p-norm; the second argument is p, the third selects the axis
print(torch.norm(a.type(dtype=torch.float32),2,0))
print(torch.norm(a.type(dtype=torch.float32),2,1))


# product of the elements; the second argument selects the axis
print(torch.prod(a.type(dtype=torch.float32),0))
print(torch.prod(a.type(dtype=torch.float32),1))
# standard deviation
print(torch.std(a.type(dtype=torch.float32),0))
print(torch.std(a.type(dtype=torch.float32),1))
# sum
print(torch.sum(a,0))
print(torch.sum(a,1))
# variance
print(torch.var(a.type(dtype=torch.float32),0))
print(torch.var(a.type(dtype=torch.float32),1))
# maximum
print(torch.max(a,0))
print(torch.max(a,1))
# minimum
print(torch.min(a,0))
print(torch.min(a,1))
# sort
print(torch.sort(a,0))
print(torch.sort(a,1))
35. Next come some comparison functions. The first is eq: pass in two arguments, a tensor and then a number or a tensor, and it compares them element-wise for equality. Take a look at the sample code:


a=torch.randint(1,4,(3,3))

b=torch.randint(1,4,(3,3))
print(torch.eq(a,1))
print(torch.eq(a,b))

36. The equal function goes further: it checks whether two tensors are entirely equal. Let's take a look:
a=torch.randint(1,4,(3,3))

b=torch.randint(1,4,(3,3))
print(torch.equal(a,b))
a=torch.tensor([[1,2,3]])

b=torch.tensor([[1,2,3]])
print(torch.equal(a,b))

b=b.type(dtype=torch.float32)
print(a.type())
print(b.type())
print(torch.equal(a,b))

From the output you can see that it compares values and shapes rather than exact types.

37. ge compares two tensors element-wise: ge(a, b) returns True wherever a >= b and False otherwise. The second argument can be a value or a tensor.
Take a look at the sample code:


a=torch.randint(1,4,(3,3))

b=torch.randint(1,4,(3,3))
print(a)
print(b)
print(torch.ge(a,b))

38. gt also compares two tensors but drops the equal sign: gt(a, b) returns True wherever a > b and False otherwise. It needs no sample code; it works just like the one above.

39. le compares the other way: le(a, b) returns True wherever a <= b and False otherwise. Note that the comparison is element by element, and the second argument can be a value or a tensor.

40. lt is the same without the equal sign: lt(a, b) returns True wherever a < b and False otherwise, again element by element, with the second argument a value or a tensor.

41. kthvalue takes a tensor, a k and a dim; dim selects the axis, and it returns the k-th smallest value along that axis (along with its index).
Take a look at the sample code:


a=torch.randint(1,4,(4,4))

print(a)

print(torch.kthvalue(a,2,0))

print(torch.kthvalue(a,2,1))

42. topk takes a tensor, a k and a dim; dim selects the axis, and it returns the k largest values along that axis, similar to kthvalue above.


a=torch.randint(1,4,(4,4))

print(a)

print(torch.topk(a,2,0))

print(torch.topk(a,2,1))

43. cross computes the cross product (also called the vector product), familiar from linear algebra: for 3-element vectors, a x b = (a2*b3 - a3*b2, a3*b1 - a1*b3, a1*b2 - a2*b1).

Take a look at the sample code:


a=torch.randint(1,4,(2,3))
b=torch.randint(1,4,(2,3))
print(a)
print(b)
print(torch.cross(a,b))


44. diag: if the input is a vector, it returns a 2-D square matrix with that vector on the diagonal; if the input is a matrix, it returns a 1-D tensor containing the diagonal elements.
Take a look at the sample code:



a=torch.randint(1,4,(2,3))
print(a)

print(torch.diag(a))
print(torch.diag(torch.tensor([1,2,3])))

45. The histc function computes a histogram: pass in a tensor, the number of bins, and a lower and upper bound. If the bounds are omitted, the minimum and maximum values in the tensor are used by default.
The sample code is as follows:


a=torch.randint(1,10,(2,3))
print(a)

print(torch.histc(a.type(dtype=torch.float32),5))
a=torch.randint(1,10,(10,))
print(a)

print(torch.histc(a.type(dtype=torch.float32),5))
print(torch.histc(a.type(dtype=torch.float32),5,5,10))

46. renorm returns a tensor in which each sub-tensor along dimension dim is rescaled so that its p-norm does not exceed maxnorm; sub-tensors whose p-norm is already below maxnorm are left unchanged.
Take a look at the sample code:


a=torch.randint(1,10,(2,3))
print(a)

print(torch.renorm(a.type(dtype=torch.float32),2,0,10))

print(torch.renorm(a.type(dtype=torch.float32),2,1,10))

47. trace outputs the sum of the diagonal elements of a two-dimensional matrix.



a=torch.randint(1,10,(2,3))
print(a)

print(torch.trace(a.type(dtype=torch.float32)))

Note that it also works for non-square matrices.

48. tril takes a matrix and returns it with all elements above the diagonal set to 0.


a=torch.randint(1,10,(3,3))
print(a)

print(torch.tril(a.type(dtype=torch.float32)))

49. triu takes a matrix and returns it with all elements below the diagonal set to 0.


a=torch.randint(1,10,(3,3))
print(a)

print(torch.triu(a.type(dtype=torch.float32)))

50. dot returns the dot product of two one-dimensional tensors.
Take a look at the sample code:


a=torch.randint(1,10,(3,))
b=torch.randint(1,10,(3,))
print(a)
print(b)
print(torch.dot(a,b))

51. linalg.eig computes the eigenvalues and eigenvectors of a matrix; this is very important.
Take a look at the sample code:



a=torch.randint(1,10,(3,3))
print(a)

print(torch.linalg.eig(a.type(dtype=torch.float32)))

52. inverse inverts a matrix.





a=torch.randint(1,10,(3,3))
print(a)

print(torch.inverse(a.type(dtype=torch.float32)))

53. mm performs matrix multiplication.



a=torch.randint(1,5,(2,3))

b=torch.randint(1,5,(3,2))
print(a)
print(b)
print(torch.mm(a.type(dtype=torch.float32),b.type(dtype=torch.float32)))

54. mv multiplies a matrix by a vector.




a=torch.randint(1,5,(2,3))

b=torch.randint(1,5,(3,))
print(a)
print(b)
print(torch.mv(a.type(dtype=torch.float32),b.type(dtype=torch.float32)))

55. any returns True if any element of the tensor is True, otherwise False.

56. all returns True only if every element of the tensor is True, otherwise False.

At this point, our 1.3 is considered over. Below we attach all the codes of 1.3:

#coding=gbk


import os
import torch
import numpy  as np
# create torch data types using torch functions


a=torch.randn(2,3)
print(torch.is_tensor(a))

c=a.storage()

print(torch.is_storage(a))

print(torch.is_storage(c))


a=torch.randn(2,3)
b=torch.randn(7)
print(torch.numel(a))
print(torch.numel(b))


index=torch.tensor([[1,2],[2,1]])
value=torch.tensor([2,3],dtype=torch.float32)

t=torch.sparse_coo_tensor(index,value,(3,4))
print(t)


a=torch.randn(2,3)

b=torch.randn(2,4)

c=torch.randn(4,3)

print(a)
print(b)
print(c)
print(torch.cat((a,b),1))
print(torch.cat((a,c),0))


a=torch.randn(12,3)

print(torch.chunk(a,6,0))


a=torch.randn(3,12)

print(torch.chunk(a,6,1))

a=torch.randn(4,3)
index1=torch.LongTensor([[0,1,2,1]])
index2=torch.LongTensor([[0,1,2]])

print(a)
print(torch.gather(a,1,index1))

print(torch.gather(a,0,index2))



a=torch.randn(4,3)
index1=torch.LongTensor([0,1,2,1])
index2=torch.LongTensor([0,1,2])

print(a)
print(torch.index_select(a,1,index1))

print(torch.index_select(a,0,index2))

a=torch.randint(0,10,(2,3))

print(a)
b=torch.ge(a,5)
print(b)

a=torch.randint(0,10,(2,3))

print(a)
b=torch.ge(a,5)
print(b)
print(torch.masked_select(a,b))


a=torch.randint(0,10,(2,3))
print(a)
print(torch.nonzero(a))



a=torch.randint(0,10,(2,3))
b=a.reshape(3,2)
print(b)
b=a.reshape(6,1)
print(b)
b=a.reshape(1,6)
print(b)
b=a.reshape(6,)
print(b)



a=torch.randn(6,3)

print(a)
print(torch.split(a,3,0))

print(torch.split(a,[1,5],0))
a=torch.randn(3,6)
print(a)
print(torch.split(a,2,1))



a=torch.randint(0,10,(2,3))

print(a)
b=torch.unsqueeze(a,0)
print(b)
b=torch.unsqueeze(a,1)
print(b)



a=torch.randint(0,10,(2,3))

print(a)
b=torch.squeeze(a,0)
print(b)
a=torch.randint(0,10,(6,1))

b=torch.squeeze(a,0)
print(b)


a=torch.randint(0,10,(2,3))

b=torch.randint(0,10,(2,3))
print(a)
print(b)

print(torch.stack((a,b),1))
print(torch.stack((a,b),0))


a=torch.randint(0,10,(2,3))
print(a)
print(torch.t(a))



a=torch.randint(0,10,(2,3))
print(a)
print(torch.transpose(a,0,1))




a=torch.randint(0,10,(2,3))
print(a)
print(torch.unbind(a,0))
print(torch.unbind(a,1))





a=torch.randint(0,10,(2,3))

b=torch.randint(0,10,(2,3))
print(a)
print(b)
print(torch.where(a>5,a,b))

torch.manual_seed(10)
print(torch.initial_seed())


a=torch.tensor([[0.5,0.2],[0.4,0.6]])
print(a)
print(torch.bernoulli(a))


a=torch.tensor([[0.5,0.2],[0.4,0.6]])
print(a)
print(torch.multinomial(a,4, replacement=True))

a=torch.rand(2,3)

print(torch.multinomial(a,4, replacement=True))
a=torch.rand(2,3)
print(a)
print(torch.multinomial(a,4, replacement=True))


print(torch.normal(0,1,(2,3)))

print(torch.normal(0,3,(2,3)))

a=torch.rand(2,3)
torch.save(a,"tensor.pt")


device=torch.device('cpu')
a=torch.load('tensor.pt',map_location=device)

print(a)



a=torch.randn(2,3)
print(a)
# absolute value
print(torch.abs(a))
a=torch.rand(2,3)

# arccosine
print(torch.acos(a))

# arcsine
print(torch.asin(a))

# sine
print(torch.sin(a))

# tangent
print(torch.tan(a))

# hyperbolic tangent
print(torch.tanh(a))

# hyperbolic sine
print(torch.sinh(a))
# arctangent
print(torch.atan(a))

# round up (ceiling)

print(torch.ceil(a))
# exponential
print(torch.exp(a))
# natural logarithm
print(torch.log(a))

# negate
print(torch.neg(a))

# reciprocal

print(torch.reciprocal(a))
# remainder of division
print(torch.remainder(a,0.2))
# add a value to every element
print(torch.add(a,10))  # add 10 to each element

# round to nearest
print(torch.round(a))

# reciprocal of the square root
print(torch.rsqrt(a))
# sigmoid
print(torch.sigmoid(a))

# sign of each element
print(torch.sign(a))
# square root
print(torch.sqrt(a))

c=torch.tensor([[0,0,0],[0,0,0]])
a=torch.randn(2,3)
b=torch.rand(2,3)
print(c)
print(a)

print(b)
print(torch.addcdiv(c,0.5,a,b))


c=torch.tensor([[0,0,0],[0,0,0]])
a=torch.randn(2,3)
b=torch.rand(2,3)
print(c)
print(a)
print(b)
print(torch.addcmul(c,0.5,a,b))

a=torch.randn(2,3)
b=torch.rand(2,3)
print(a)
print(b)

print(torch.mul(a,b))

a=torch.randint(1,4,(3,3))
print(a)
print(torch.cumprod(a,0))
print(torch.cumprod(a,1))



a=torch.randint(1,4,(3,3))
print(a)
print(torch.cumsum(a,0))
print(torch.cumsum(a,1))


a=torch.randint(1,5,(6,))

b=torch.randint(1,5,(6,))
a=a.type(dtype=torch.float32)
b=b.type(dtype=torch.float32)
print(a)
print(b)
print(torch.dist(a,b,2))


print(torch.dist(a,b,4))


a=torch.randint(1,5,(2,3))

b=torch.randint(1,5,(2,3))
a=a.type(dtype=torch.float32)
b=b.type(dtype=torch.float32)
print(a)
print(b)
print(torch.dist(a,b,2))


print(torch.dist(a,b,4))



a=torch.randint(1,4,(3,3))
print(a)

# mean; the second argument selects the axis (by row or by column)
print(torch.mean(a.type(dtype=torch.float32),0))
print(torch.mean(a.type(dtype=torch.float32),1))



# median; the second argument selects the axis
print(torch.median(a,0))
print(torch.median(a,1))



# mode
print(torch.mode(a,0))
print(torch.mode(a,1))
# p-norm; the second argument is p, the third selects the axis
print(torch.norm(a.type(dtype=torch.float32),2,0))
print(torch.norm(a.type(dtype=torch.float32),2,1))


# product of the elements; the second argument selects the axis
print(torch.prod(a.type(dtype=torch.float32),0))
print(torch.prod(a.type(dtype=torch.float32),1))
# standard deviation
print(torch.std(a.type(dtype=torch.float32),0))
print(torch.std(a.type(dtype=torch.float32),1))
# sum
print(torch.sum(a,0))
print(torch.sum(a,1))
# variance
print(torch.var(a.type(dtype=torch.float32),0))
print(torch.var(a.type(dtype=torch.float32),1))
# maximum
print(torch.max(a,0))
print(torch.max(a,1))
# minimum
print(torch.min(a,0))
print(torch.min(a,1))
# sort
print(torch.sort(a,0))
print(torch.sort(a,1))



a=torch.randint(1,4,(3,3))

b=torch.randint(1,4,(3,3))
print(torch.eq(a,1))
print(torch.eq(a,b))



a=torch.randint(1,4,(3,3))

b=torch.randint(1,4,(3,3))
print(torch.equal(a,b))
a=torch.tensor([[1,2,3]])

b=torch.tensor([[1,2,3]])
print(torch.equal(a,b))

b=b.type(dtype=torch.float32)
print(a.type())
print(b.type())
print(torch.equal(a,b))



a=torch.randint(1,4,(3,3))

b=torch.randint(1,4,(3,3))
print(a)
print(b)
print(torch.ge(a,b))



a=torch.randint(1,4,(4,4))

print(a)

print(torch.kthvalue(a,2,0))

print(torch.kthvalue(a,2,1))


a=torch.randint(1,4,(4,4))

print(a)

print(torch.topk(a,2,0))

print(torch.topk(a,2,1))


a=torch.randint(1,4,(2,3))
b=torch.randint(1,4,(2,3))
print(a)
print(b)



print(torch.cross(a,b))



a=torch.randint(1,4,(2,3))
print(a)

print(torch.diag(a))
print(torch.diag(torch.tensor([1,2,3])))


a=torch.randint(1,10,(2,3))
print(a)

print(torch.histc(a.type(dtype=torch.float32),5))
a=torch.randint(1,10,(10,))
print(a)

print(torch.histc(a.type(dtype=torch.float32),5))
print(torch.histc(a.type(dtype=torch.float32),5,5,10))


a=torch.randint(1,10,(2,3))
print(a)

print(torch.renorm(a.type(dtype=torch.float32),2,0,10))

print(torch.renorm(a.type(dtype=torch.float32),2,1,10))


a=torch.randint(1,10,(2,3))
print(a)

print(torch.trace(a.type(dtype=torch.float32)))



a=torch.randint(1,10,(3,3))
print(a)

print(torch.tril(a.type(dtype=torch.float32)))


a=torch.randint(1,10,(3,3))
print(a)

print(torch.triu(a.type(dtype=torch.float32)))

a=torch.randint(1,10,(3,))
b=torch.randint(1,10,(3,))
print(a)
print(b)
print(torch.dot(a,b))



a=torch.randint(1,10,(3,3))
print(a)

print(torch.linalg.eig(a.type(dtype=torch.float32)))



a=torch.randint(1,10,(3,3))
print(a)

print(torch.inverse(a.type(dtype=torch.float32)))


a=torch.randint(1,5,(2,3))

b=torch.randint(1,5,(3,2))
print(a)
print(b)
print(torch.mm(a.type(dtype=torch.float32),b.type(dtype=torch.float32)))



a=torch.randint(1,5,(2,3))

b=torch.randint(1,5,(3,))
print(a)
print(b)
print(torch.mv(a.type(dtype=torch.float32),b.type(dtype=torch.float32)))
os.system("pause")

2. With data types and their operations covered, pick a direction and start practicing.

Once we have the basics, we have to put them into practice. PyTorch can be used in many directions, but the most common one is deep learning, so that is the direction I will explain from.

For the deep learning practice part, we will work through the following steps:
2.1 Differentiation
2.2 Loss functions
2.3 Optimizers
2.4 Linear regression in practice
2.5 Convolutional neural networks in practice
2.6 Neural networks in practice
2.7 RNN and LSTM in practice

Now for the main content.
2.1 Differentiation
Why do we need differentiation? To iterate the model parameters: training generally uses gradient descent, so we need derivatives to compute the gradient and then update the parameters.

Since the gradient is there to update something, the parameters must be modifiable, i.e. variables. PyTorch provides this through Variable in torch.autograd (in current PyTorch a plain tensor created with requires_grad=True plays the same role); a Variable can store its own gradient.
Let's look at the code that creates a Variable object:



from torch.autograd import Variable
x = Variable(torch.randint(0,4,(2, 3)).type(dtype=torch.float32), requires_grad=True)

requires_grad=True means that gradient information will be stored for this tensor.

Now let's look at a piece of code that actually computes gradients:

print(x)
# do some computation on x
y=x*x
out=y.mean()
print(y)

print(out)

out.backward()
print(x.grad)

The function above is out = sum(x*x)/n, where n is the number of elements (6 here): it squares x element by element and takes the mean. Each entry of the gradient stored in x.grad is therefore 2*x/n.
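A quick numerical check of that gradient (my own sketch):

import torch

x = torch.randn(2, 3, requires_grad=True)
out = (x * x).mean()
out.backward()
print(x.grad)                      # the gradient computed by autograd
print(2 * x.data / x.numel())      # the hand-derived gradient 2*x/n - the two match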

Let's look at a gradient update code:

learning_rating=0.001
y_list=[]

def grad_update(x):
    print("||")
    print(x)
    for i in range(1000):
        y=x*x
        out=torch.abs(y.sum())
        print(out)
        out.backward()
        x.data=x.data-(learning_rating*x.grad.data)
        x.grad.data.zero_()  # clear the accumulated gradient before the next backward pass
       # print(x.data)
        y_list.append(out.data)


grad_update(x)
plt.plot(list(range(len(y_list))),y_list)
plt.show()



Plotting out during the updates of x shows the problem of setting the learning rate too large: as the loss approaches its lowest point, some parameters overshoot downward and others overshoot upward, so the update jumps straight past the optimum instead of settling on it.
Okay, that is all for the differentiation part.

2.1 The complete code is as follows:

#coding=gbk
import os
import torch
import numpy  as np
import matplotlib.pyplot as plt
# create torch data types using torch functions

from torch.autograd import Variable
x = Variable(torch.randint(0,4,(2, 3)).type(dtype=torch.float32), requires_grad=True)

print(x)
# do some computation on x
y=x*x
out=y.mean()
print(y)

print(out)

out.backward()
print(x.grad.data)


learning_rating=0.000001
y_list=[]

def grad_update(x):
    print("||")
    print(x)
  
   
    for i in range(1000):
        y=x*x
        out=torch.abs(y.sum())
        print(out)
        out.backward()
        x.data=x.data-(learning_rating*x.grad.data)
        x.grad.data.zero_()  # clear the accumulated gradient before the next backward pass
       # print(x.data)
        y_list.append(out.data)
        print(x.data)
        print(x.grad.data)


grad_update(x)
plt.plot(list(range(len(y_list))),y_list)
plt.show()

2.2 Loss functions
Now for the second part, loss functions.
Before defining a loss function we first need two tensors for it to act on: a target tensor and a prediction tensor.

import os
import torch
import torch.nn as nn
import numpy  as np
import matplotlib.pyplot as plt
# create torch data types using torch functions

from torch.autograd import Variable
predict = Variable(torch.randint(0,4,(2, 3)).type(dtype=torch.float32), requires_grad=True)


a=torch.Tensor([[1,2,2],[1,1,2]])

target = Variable (a)

(1) The first loss function is nn.L1Loss, which takes the mean of the absolute errors between the predicted values and the true values.

print(predict)

print(target)
criterion = nn.L1Loss()
loss = criterion(predict, target)

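Since L1Loss is just the mean absolute difference, you can reproduce it by hand; a small sketch of my own:

import torch
import torch.nn as nn

predict = torch.randn(2, 3)
target = torch.randn(2, 3)
loss = nn.L1Loss()(predict, target)
manual = torch.mean(torch.abs(predict - target))   # mean of |prediction - target|
print(loss, manual)                                # the two values are identical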

(2) The nn.SmoothL1Loss loss function: where the error falls in (-1, 1) it uses the squared loss, and elsewhere it behaves like the L1 loss.


print(predict)
print(target)
criterion = nn.SmoothL1Loss()
loss = criterion(predict, target)


(3) The nn.MSELoss loss function is the squared loss: the mean of the squared differences between the predicted values and the true values.

print(predict)
print(target)
criterion = nn.MSELoss()
loss = criterion(predict, target)
print(loss)

(4) nn.CrossEntropyLoss, the cross-entropy loss function.

For a row of scores x and a target class, the cross-entropy loss is loss(x, class) = -log( exp(x[class]) / sum_j exp(x[j]) ), averaged over the batch. The code is the same as before:

print(predict)
print(target)
criterion = nn.CrossEntropyLoss()
loss = criterion(predict, target)
print(loss)

(5) nn.NLLLoss, the negative log likelihood loss. Given log-probabilities as input, it picks out, for each sample, the value at the target class, negates it, and averages: loss = -mean(input[n, target[n]]).

This function is often used together with LogSoftmax, for example in image classification.



m = nn.LogSoftmax(dim=1)  # compute along each row
loss = nn.NLLLoss()
torch.manual_seed(2)
# input of 3 rows and 5 columns: 3 samples with 5 scores each, turned into log-probabilities by LogSoftmax
input = torch.randn(3, 5, requires_grad=True)
target = torch.tensor([1, 0, 4])
# NLL takes column 1 of row 0, column 0 of row 1 and column 4 of row 2, negates them and averages
output = loss(m(input), target)

Those are a few commonly used loss functions. We will get comfortable with them gradually through practice.
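One relationship worth knowing: CrossEntropyLoss is exactly LogSoftmax followed by NLLLoss, which a small sketch can confirm:

import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(3, 5, requires_grad=True)   # 3 samples, 5 classes
labels = torch.tensor([1, 0, 4])

ce = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), labels)
print(ce, nll)                                   # the two losses are equal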
2.2 The complete sample code is as follows:

#coding=gbk


import os
import torch
import numpy  as np
import matplotlib.pyplot as plt
# create torch data types using torch functions
import torch.nn as nn

from torch.autograd import Variable
predict = Variable(torch.randint(0,4,(2, 3)).type(dtype=torch.float32), requires_grad=True)


a=torch.Tensor([[1,2,2],[1,1,2]])

target = Variable (a)
print(predict)

print(target)
criterion = nn.L1Loss()
loss = criterion(predict, target)
print(loss)

print(predict)
print(target)
criterion = nn.SmoothL1Loss()
loss = criterion(predict, target)
print(loss)

print(predict)
print(target)
criterion = nn.MSELoss()
loss = criterion(predict, target)
print(loss)



print(predict)
print(target)
criterion = nn.CrossEntropyLoss()
loss = criterion(predict, target)
print(loss)



m = nn.LogSoftmax(dim=1)  # compute along each row
loss = nn.NLLLoss()
torch.manual_seed(2)
# input of 3 rows and 5 columns: 3 samples with 5 scores each, turned into log-probabilities by LogSoftmax
input = torch.randn(3, 5, requires_grad=True)
target = torch.tensor([1, 0, 4])
# NLL takes column 1 of row 0, column 0 of row 1 and column 4 of row 2, negates them and averages
output = loss(m(input), target)
print(output)


2.3 Optimizers

An optimizer is, in layman's terms, the algorithm that applies the computed gradients to the parameters. The point of having many different optimizers is to let users pick one suited to their own scenario. The most important measure of an optimizer is how smoothly the optimization proceeds: ideally each round of sample data moves the weights toward the target at a steady pace instead of jumping up and down, so a steadily decreasing loss is a very important sign of a healthy deep learning model.
(1) The SGD optimizer: mini-batch stochastic gradient descent. It randomly selects part of the data set to take part in each update, i.e. it is the batch version of gradient descent.

In other words, it picks a batch of samples, computes their combined loss, and uses that loss to update the model.

SGD's update formula is w = w - lr * g, where g is the gradient of the batch loss.
With momentum, a velocity term is kept as well: v = gamma * v + lr * g and then w = w - v, where gamma is the momentum coefficient (PyTorch's implementation folds the learning rate in slightly differently, but the idea is the same).

Use the code as follows:


from torch import optim
optimizer = optim.SGD(model.parameters(), lr = 0.01, momentum=0.9)

Note that this function does not do the batching for us: splitting the data into batches is something we handle ourselves in advance.
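Whichever optimizer you pick, the way it is used inside a training loop is the same. A minimal sketch (the model and the mini-batch here are made up for illustration):

import torch
import torch.nn as nn
from torch import optim

model = nn.Linear(4, 1)                                   # a stand-in model
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_function = nn.MSELoss()

x = torch.randn(32, 4)                                    # one hand-made mini-batch
y = torch.randn(32, 1)

optimizer.zero_grad()             # clear gradients left over from the previous step
loss = loss_function(model(x), y)
loss.backward()                   # compute gradients
optimizer.step()                  # update the parameters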
(2) The RMSprop optimizer adaptively adjusts the learning rate.

Its update keeps a moving average of the squared gradient: s_t = beta * s_(t-1) + (1 - beta) * g_t^2, and the parameters are then updated as theta = theta - alpha * g_t / (sqrt(s_t) + eps). Here g_t is the gradient at step t, s_t is the moving average of the squared gradient, beta is the moving-average coefficient (typically 0.9), alpha is the learning rate, and eps is a tiny value that keeps the denominator from being zero.

Use the code as follows:

torch.optim.RMSprop(params,
                    lr=0.01,
                    alpha=0.99,
                    eps=1e-08,
                    weight_decay=0,
                    momentum=0,
                    centered=False)

(3) The AdaGrad optimizer

AdaGrad adjusts the learning rate automatically: you only set a global learning rate ϵ, but that is not the rate actually applied; the effective rate for each parameter is inversely proportional to the square root of the accumulated sum of its squared gradients. As a formula: the accumulator is updated as r = r + g*g, and the parameter update is Δθ = -(ϵ / (δ + sqrt(r))) * g, where δ is a small constant (around 1e-7) that prevents division by zero.

Required quantities: the global learning rate ϵ, the initial parameters θ, and the numerical-stability constant δ.
Intermediate variable: the gradient accumulator r (initialized to 0).
Each iteration:

1. Randomly draw a batch of m samples {x1, ..., xm} from the training set together with their targets yi.
2. Compute the gradient and the error, update r, and then compute the parameter update from r and the gradient as above.

Use the code:
torch.optim.Adagrad(params, lr=0.01, lr_decay=0, weight_decay=0)

(4) The Adadelta optimizer. Let's see how it works.

Adadelta replaces AdaGrad's raw accumulation with exponentially decaying averages. It keeps a running average of the squared gradients, E[g^2]_t = rho * E[g^2]_(t-1) + (1 - rho) * g_t^2, and takes its square root, RMS[g]_t = sqrt(E[g^2]_t + ϵ), where ϵ is a constant that keeps the denominator from being zero. The parameter update at step t is then Δθ_t = -(RMS[Δθ]_(t-1) / RMS[g]_t) * g_t, where RMS[Δθ] is the same kind of running average kept over the updates themselves, and finally θ_(t+1) = θ_t + Δθ_t.

The main idea is to update using these expectations over a sliding window, so each update takes the recent gradient history into account and the whole process is more stable.

Use the code as follows:

torch.optim.Adadelta(params, lr=1.0, rho=0.9, eps=1e-06, weight_decay=0)

(5) The Adam optimizer

Adam keeps two moving averages: the first moment m_t = beta1 * m_(t-1) + (1 - beta1) * g_t and the second moment v_t = beta2 * v_(t-1) + (1 - beta2) * g_t^2. After bias correction (m̂ = m_t / (1 - beta1^t), v̂ = v_t / (1 - beta2^t)) the update is θ = θ - lr * m̂ / (sqrt(v̂) + ϵ). So besides the learning rate we mainly pass the two decay coefficients beta1 and beta2 (the betas below).
It is called as follows:

torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)

(6) The Adamax optimizer

Adamax is a variant of Adam that replaces the second-moment term with an infinity norm: u_t = max(beta2 * u_(t-1), |g_t|), and the update becomes θ = θ - (lr / (1 - beta1^t)) * m_t / u_t. We generally only need to set the learning rate and the two betas; the defaults are shown below.

Use the code:

torch.optim.Adamax(params, lr=0.002, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)

That wraps up the optimizers. In practice, the choice of optimizer usually does not change the final performance of the model very much; its main job is to speed up convergence to the optimal solution and to help avoid getting stuck in local optima.
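
Whichever optimizer you pick, the calling pattern inside a training loop is always the same. A minimal skeleton (the model and data here are placeholders of my own, just to show the pattern):

import torch

model = torch.nn.Linear(4, 1)                                # any model
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)   # swap in SGD / RMSprop / Adagrad ... freely
loss_function = torch.nn.MSELoss()

x = torch.randn(8, 4)
y = torch.randn(8, 1)
for _ in range(10):
    loss = loss_function(model(x), y)
    optimizer.zero_grad()   # clear the old gradients
    loss.backward()         # compute new gradients
    optimizer.step()        # apply the chosen optimizer's update rule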

2.4 Linear regression code combat

Because linear regression is relatively simple, we can simulate the data ourselves. To make the experiment reproducible, we first set a random seed and import the related dependencies:

import os
import torch

from torch.utils import data
import numpy  as np
import matplotlib.pyplot as plt


from numpy import random
# use torch functions to generate torch data types
import torch.nn as nn

from torch.autograd import Variable
torch.manual_seed(1)

Next we generate the data: take a set of x values, apply a linear transformation, and add some noise.

X = np.linspace(-1, 1, 200)
Y = 0.5 * X + 0.2* np.random.normal(0, 0.5, (200, ))
plt.scatter(X,Y)
plt.show()
# reshape X and Y into tensors of shape (200, 1)
X=Variable(torch.Tensor(X.reshape(200,1)))
Y=Variable(torch.Tensor(Y.reshape(200,1)))

This draws the following scatter plot:
insert image description here

The data generated above is relatively simple.

Then look at the model solving code:


model = torch.nn.Sequential(torch.nn.Linear(1, 1),)  # input dimension 1, output dimension 1

optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
loss_function = torch.nn.MSELoss()
for i in range(300):
     prediction = model(X)
     loss = loss_function(prediction, Y)
     optimizer.zero_grad()
     loss.backward()
     optimizer.step()

plt.figure(1, figsize=(5, 5))

plt.title('model')
plt.scatter(X.data.numpy(), Y.data.numpy())
plt.plot(X.data.numpy(), prediction.data.numpy(), 'r-', lw=5)
plt.show()

The linear model is obtained as follows:
insert image description here

The example above is quite simple, so next let's try a multiple linear regression. Multiple regression can't be shown with a 2D plot, so instead we look at its loss to judge the result.
Go directly to the code:



X =torch.randn(100,4)
w=torch.tensor([1,2,3,4])

Y =torch.matmul(X, w.type(dtype=torch.float))  + torch.normal(0, 0.1, (100, ))+6.5
Y=Y.reshape((-1, 1))
print(Y.type())
print(w.type())
print(X.type())
# wrap X and Y as Variables
X=Variable(X)
Y=Variable(Y)
def load_array(data_arrays, batch_size, is_train=True):
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

data_iter = load_array((X, Y), 32)


model = torch.nn.Sequential(torch.nn.Linear(4, 1))

optimizer = torch.optim.SGD(model.parameters(), lr=0.03)
loss_function = torch.nn.MSELoss()
num_epochs = 20
for epoch in range(num_epochs):
    for x, y in data_iter:
        l = loss_function(model(x), y)
        optimizer.zero_grad()
        l.backward()
        optimizer.step()
    l = loss_function(model(X), Y)
    print(f'epoch {epoch + 1}, loss {l:f}')

for para in model.parameters():
        print(para)  

This is a linear regression with four features, i.e. a four-variable linear regression.

Take a look at the output:
insert image description here
The learned parameters are almost identical to the ones we set. What I want to point out is that the code above is, strictly speaking, a complete machine-learning pipeline: unlike the first example, it feeds the data in mini-batches through a DataLoader (mini-batch SGD) instead of using the whole dataset at every step, and that difference matters a lot in practice.
That's it for the linear regression code.
The complete code is attached below:

#coding=gbk


import os
import torch
import numpy  as np
import matplotlib.pyplot as plt

from torch.utils import data
from numpy import random
# use torch functions to generate torch data types
import torch.nn as nn

from torch.autograd import Variable
torch.manual_seed(1)

X = np.linspace(-1, 1, 200)
Y = 0.5 * X + 0.2* np.random.normal(0, 0.5, (200, ))
#plt.scatter(X,Y)
#plt.show()
# reshape X and Y into tensors of shape (200, 1)
X=Variable(torch.Tensor(X.reshape(200,1)))
Y=Variable(torch.Tensor(Y.reshape(200,1)))

model = torch.nn.Sequential(torch.nn.Linear(1, 1),)

optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
loss_function = torch.nn.MSELoss()
for i in range(300):
     prediction = model(X)
     loss = loss_function(prediction, Y)
     optimizer.zero_grad()
     loss.backward()
     optimizer.step()

#plt.figure(1, figsize=(5, 5))

#plt.title('model')
#plt.scatter(X.data.numpy(), Y.data.numpy())
#plt.plot(X.data.numpy(), prediction.data.numpy(), 'r-', lw=5)
#plt.show()


X =torch.randn(100,4)
w=torch.tensor([1,2,3,4])

Y =torch.matmul(X, w.type(dtype=torch.float))  + torch.normal(0, 0.1, (100, ))+6.5
Y=Y.reshape((-1, 1))
print(Y.type())
print(w.type())
print(X.type())
# wrap X and Y as Variables
X=Variable(X)
Y=Variable(Y)
def load_array(data_arrays, batch_size, is_train=True):
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

data_iter = load_array((X, Y), 32)


model = torch.nn.Sequential(torch.nn.Linear(4, 1))

optimizer = torch.optim.SGD(model.parameters(), lr=0.03)
loss_function = torch.nn.MSELoss()
num_epochs = 20
for epoch in range(num_epochs):
    for x, y in data_iter:
        l = loss_function(model(x), y)
        optimizer.zero_grad()
        l.backward()
        optimizer.step()
    l = loss_function(model(X), Y)
    print(f'epoch {epoch + 1}, loss {l:f}')

for para in model.parameters():
        print(para)  

2.5 Convolutional Neural Networks in Practice

A convolutional neural network is, in the end, just another model.
Below is the classic pattern: convolution first, then a ReLU activation, then pooling, which produces the feature map we want.
insert image description here

Then the matrix size calculation formula after convolution is as follows:
insert image description here
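
As a quick check of that formula, output size = (W − K + 2P) / S + 1, here is the arithmetic for the layers used below (the helper function is my own, purely for illustration):

def conv_out_size(w, k, p, s):
    # output width = (input - kernel + 2*padding) / stride + 1
    return (w - k + 2 * p) // s + 1

w = conv_out_size(28, k=5, p=2, s=1)   # conv1:       28 -> 28
w = conv_out_size(w, k=2, p=0, s=2)    # maxpool 2x2: 28 -> 14
w = conv_out_size(w, k=5, p=2, s=1)    # conv2:       14 -> 14
w = conv_out_size(w, k=2, p=0, s=2)    # maxpool 2x2: 14 -> 7
print(w)  # 7, which is why the fully connected layer below takes 7*7*32 inputs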

First, we import the relevant dependency packages, and then load the dataset:

import torch 
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
 
 
# device configuration
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
 
# hyperparameters
num_epochs = 5
num_classes = 10
batch_size = 32
learning_rate = 0.001

# MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='../../data/',
                                           train=True, 
                                           transform=transforms.ToTensor(),
                                           download=True)
 
test_dataset = torchvision.datasets.MNIST(root='../../data/',
                                          train=False, 
                                          transform=transforms.ToTensor())
 
# data loaders
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, 
                                           shuffle=True)
 
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size, 
                                          shuffle=False)
 

Below we define the network:


class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc = nn.Linear(7*7*32, num_classes)
        
    def forward(self, x):
        out = self.layer1(x)#1x28x28->16x14x14
        out = self.layer2(out)#16x14x14->32x7x7
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out
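
Before wiring up the loss, it can help to sanity-check the shapes by pushing a dummy batch through the network (a small CPU-only snippet of my own, not part of the training code):

dummy = torch.randn(4, 1, 28, 28)   # 4 fake MNIST images: (batch, channel, height, width)
net = ConvNet(num_classes=10)
print(net(dummy).shape)             # expected: torch.Size([4, 10]), one score per class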
 

Instantiate the model, then define the loss function and the optimizer:


# instantiate the model
model = ConvNet(num_classes).to(device)

# loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
 

Train the model and test the model:

# train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        
        # forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        # backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
 
# test the model
model.eval()  # eval mode (batchnorm uses moving mean/variance instead of mini-batch mean/variance)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
 
    print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total))
 

Save the model:

 
# save the model
torch.save(model.state_dict(), 'model.ckpt')
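
To load the saved weights back later, a brief sketch ('model.ckpt' is simply the file name used above):

model = ConvNet(num_classes=10)
model.load_state_dict(torch.load('model.ckpt'))
model.eval()   # switch to evaluation mode before running inference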

This example is very typical.
The complete code is attached below:

#coding=gbk

import torch 
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
 
 
# define hyperparameters
num_epochs = 5
num_classes = 10
batch_size = 32
learning_rate = 0.001
# device configuration
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
 


# MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='../../data/',
                                           train=True, 
                                           transform=transforms.ToTensor(),
                                           download=True)
 
test_dataset = torchvision.datasets.MNIST(root='../../data/',
                                          train=False, 
                                          transform=transforms.ToTensor())
 
# data loaders
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, 
                                           shuffle=True)
 
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size, 
                                          shuffle=False)

 
class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))

        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
      
      
        self.fc = nn.Linear(7*7*32, num_classes)
        
    def forward(self, x):
        out = self.layer1(x)   # 1x28x28 -> 16x14x14
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out
 
model = ConvNet(num_classes).to(device)
 
# loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
 
# train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        
        # forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        # backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
 
# test the model
model.eval()  # eval mode (batchnorm uses moving mean/variance instead of mini-batch mean/variance)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
 
    print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total))
 
# save the model
torch.save(model.state_dict(), 'model.ckpt')

This result is great:
insert image description here
2.6 Neural Networks in Action

Next, we will start using pytorch for actual combat of neural networks.

In fact, neural networks are much simpler than convolutional neural networks.

Here we again use the MNIST dataset. The idea is to flatten each picture into a long vector of 784 values, feed it through a hidden layer of 200 neurons, and finish with a fully connected layer of 10 neurons that outputs the classification. Sigmoid is used as the activation function.
Let's look at the code.
The first step is to import the relevant library, load the data set, and set the hyperparameters:


import torch 
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
 
 
# define hyperparameters
num_epochs = 5
num_classes = 10
batch_size = 32
learning_rate = 0.001
# device configuration
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
 


# MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='../../data/',
                                           train=True, 
                                           transform=transforms.ToTensor(),
                                           download=True)
 
test_dataset = torchvision.datasets.MNIST(root='../../data/',
                                          train=False, 
                                          transform=transforms.ToTensor())
 
# data loaders
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, 
                                           shuffle=True)
 
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size, 
                                          shuffle=False)

Define model structure, optimizer, loss function:


class Classifier(nn.Module):

  def __init__(self):
    # initialize the PyTorch parent class
    super().__init__()

    # define the network layers
    self.model = nn.Sequential(
        nn.Linear(784, 200),
        nn.Sigmoid(),
        nn.Linear(200, 10),
        nn.Sigmoid()
    )

    # the loss function is created outside the class
  

  def forward(self, inputs):
    # run the model directly
    
    inputs=inputs.reshape(-1,784)
    return self.model(inputs)

model = Classifier().to(device)
 
criterion=nn.MSELoss()

optimizer= torch.optim.SGD(model.parameters(), lr=0.01)

Train the model and test the model:



# train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        
        # forward pass
        outputs = model(images)
       # print(outputs)
        loss=0
        p=0
        for j in labels:
            label=torch.tensor([0,0,0,0,0,0,0,0,0,0])
            label[j]=1
            label=label.type(dtype=torch.float)
            label = label.to(device)
          #  print(label)
             
            loss = criterion(outputs[p], label)+loss
            
            p=p+1
        
        # backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
       # print(i)
        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
 
# test the model
model.eval()  # eval mode (batchnorm uses moving mean/variance instead of mini-batch mean/variance)


with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        
        outputs = model(images)

        loss=0
        p=0
        for j in labels:
            label=torch.tensor([0,0,0,0,0,0,0,0,0,0])
            label[j]=1
            label=label.type(dtype=torch.float)
            label = label.to(device)

            if outputs[p].argmax()==j:   # the predicted class should match the true label j
                correct=correct+1
          #  print(label)
             
            loss = criterion(outputs[p], label)+loss

            p=p+1
            total=total+1
        
      
       
 
    print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total))
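
A note on the per-sample loop above: building the one-hot targets element by element works, but it is slow. A vectorized sketch of the same loss (my own snippet using torch.nn.functional.one_hot, not part of the original code, with stand-in tensors for the model outputs and labels):

import torch
import torch.nn.functional as F

criterion = torch.nn.MSELoss()
outputs = torch.sigmoid(torch.randn(32, 10))          # stand-in for model(images)
labels = torch.randint(0, 10, (32,))                  # stand-in for the batch labels
targets = F.one_hot(labels, num_classes=10).float()   # whole batch one-hot in one call
loss = criterion(outputs, targets)
print(loss.item())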

Save the model:

 
# save the model
torch.save(model.state_dict(), 'model.ckpt')

Then take a look at our test results:

insert image description here

Here is the full code for 2.6:

#coding=gbk

import torch 
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
 
 
# define hyperparameters
num_epochs = 5
num_classes = 10
batch_size = 32
learning_rate = 0.001
# device configuration
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
 


# MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='../../data/',
                                           train=True, 
                                           transform=transforms.ToTensor(),
                                           download=True)
 
test_dataset = torchvision.datasets.MNIST(root='../../data/',
                                          train=False, 
                                          transform=transforms.ToTensor())
 
# data loaders
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, 
                                           shuffle=True)
 
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size, 
                                          shuffle=False)

 
class Classifier(nn.Module):

  def __init__(self):
    # initialize the PyTorch parent class
    super().__init__()

    # define the network layers
    self.model = nn.Sequential(
        nn.Linear(784, 200),
        nn.Sigmoid(),
        nn.Linear(200, 10),
        nn.Sigmoid()
    )

    # the loss function is created outside the class
  

  def forward(self, inputs):
    # run the model directly
    
    inputs=inputs.reshape(-1,784)
    return self.model(inputs)

model = Classifier().to(device)
 
criterion=nn.MSELoss()

optimizer= torch.optim.SGD(model.parameters(), lr=0.01)


total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        
        # forward pass
        outputs = model(images)
       # print(outputs)
        loss=0
        p=0
        for j in labels:
            label=torch.tensor([0,0,0,0,0,0,0,0,0,0])
            label[j]=1
            label=label.type(dtype=torch.float)
            label = label.to(device)
          #  print(label)
             
            loss = criterion(outputs[p], label)+loss
            
            p=p+1
        
        # backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
       # print(i)
        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
 
# test the model
model.eval()  # eval mode (batchnorm uses moving mean/variance instead of mini-batch mean/variance)


with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        
        outputs = model(images)

        loss=0
        p=0
        for j in labels:
            label=torch.tensor([0,0,0,0,0,0,0,0,0,0])
            label[j]=1
            label=label.type(dtype=torch.float)
            label = label.to(device)

            if outputs[p].argmax()==j:   # the predicted class should match the true label j
                correct=correct+1
          #  print(label)
             
            loss = criterion(outputs[p], label)+loss

            p=p+1
            total=total+1
        
      
       
 
    print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total))
 
# save the model
torch.save(model.state_dict(), 'model.ckpt')

2.7 RNN and LSTM in practice

The RNN model was introduced earlier, so let's go straight to a hands-on example.

The idea is to take the 28x28 MNIST images and treat each one as a sequence of 28 time steps that the model reads row by row.
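
In other words, one image of shape (1, 28, 28) becomes a sequence with time_step=28 and input_size=28. A tiny sketch of that reshaping (illustration only, with a random stand-in batch):

import torch

batch = torch.randn(64, 1, 28, 28)   # a batch of MNIST images: (B, C, H, W)
seq = batch.view(-1, 28, 28)         # -> (batch, time_step=28, input_size=28)
print(seq.shape)                     # torch.Size([64, 28, 28]), fed row by row into the RNN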
Still the same, import related libraries, set hyperparameters, load dataset:

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

# Hyper Parameters
EPOCH = 1               # train the training data n times, to save time, we just train 1 epoch
BATCH_SIZE = 64
TIME_STEP = 28          # rnn time step / image height
INPUT_SIZE = 28         # rnn input size / image width
LR = 0.01               # learning rate
DOWNLOAD_MNIST = True   # set to True if haven't download the data

# Mnist digital dataset
train_data = torchvision.datasets.MNIST(
    root='../../data/',
    train=True,                         # this is training data
    transform=transforms.ToTensor(),    # Converts a PIL.Image or numpy.ndarray to
                                        # torch.FloatTensor of shape (C x H x W) and normalize in the range [0.0, 1.0]
    download=DOWNLOAD_MNIST,            # download it if you don't have it
)
print(train_data.train_data.size())     # (60000, 28, 28)
print(train_data.train_labels.size())   # (60000)

# load the training set
train_loader = torch.utils.data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)

# load the test set; take only 2000 samples to speed up testing
test_data = torchvision.datasets.MNIST(root='../../data/', train=False, transform=transforms.ToTensor())
test_x = test_data.test_data.type(torch.FloatTensor)[:2000]/255.   # shape (2000, 28, 28) value in range(0,1)
test_y = test_data.test_labels.numpy()[:2000]    # covert to numpy array

Define the RNN model:


class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()

        self.rnn = nn.LSTM(         # if use nn.RNN(), it hardly learns
            input_size=INPUT_SIZE,
            hidden_size=64,         # rnn hidden unit
            num_layers=1,           # number of rnn layer
            batch_first=True,       # input & output will has batch size as 1s dimension. e.g. (batch, time_step, input_size)
        )
        self.out = nn.Linear(64, 10)

    def forward(self, x):
        # x shape (batch, time_step, input_size)
        # r_out shape (batch, time_step, output_size)
        # h_n shape (n_layers, batch, hidden_size)
        # h_c shape (n_layers, batch, hidden_size)
        r_out, (h_n, h_c) = self.rnn(x, None)   # None represents zero initial hidden state

        # choose r_out at the last time step
        out = self.out(r_out[:, -1, :])
        return out
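
A quick way to check that the class is wired up correctly is to push a dummy sequence through it (my own sanity-check snippet, separate from the training code below):

net = RNN()
dummy = torch.randn(5, 28, 28)   # (batch, time_step, input_size)
print(net(dummy).shape)          # expected: torch.Size([5, 10])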

Instantiate the model, then define the optimizer and the loss function, and run training and testing:


rnn = RNN()
optimizer = torch.optim.Adam(rnn.parameters(), lr=LR)
loss_func = nn.CrossEntropyLoss()

# training
for epoch in range(EPOCH):
    for step, (b_x, b_y) in enumerate(train_loader):        # gives batch data
        b_x = b_x.view(-1, 28, 28)              # reshape x to (batch, time_step, input_size)

        output = rnn(b_x)                               # rnn output
        loss = loss_func(output, b_y)                   # cross entropy loss
        optimizer.zero_grad()                           # clear gradients for this training step
        loss.backward()                                 # backpropagation, compute gradients
        optimizer.step()                                # apply gradients

        if step % 50 == 0:
            test_output = rnn(test_x)                   # (samples, time_step, input_size)
            pred_y = torch.max(test_output, 1)[1].data.numpy()
            accuracy = float((pred_y == test_y).astype(int).sum()) / float(test_y.size)
            print('Epoch: ', epoch, '| train loss: %.4f' % loss.data.numpy(), '| test accuracy: %.2f' % accuracy)

# print 10 predictions from test data
test_output = rnn(test_x[:10].view(-1, 28, 28))
pred_y = torch.max(test_output, 1)[1].data.numpy()
print(pred_y, 'prediction number')
print(test_y[:10], 'real number')

Effect:
insert image description here

The model's results are already quite strong.

Next comes the LSTM version; its model code is implemented as follows:


class LSTM(nn.Module):
     def __init__(self):
        super(LSTM, self).__init__()

        self.rnn = nn.LSTM(         # if use nn.RNN(), it hardly learns
            input_size=INPUT_SIZE,
            hidden_size=64,         # rnn hidden unit
            num_layers=1,           # number of rnn layer
            batch_first=True,       # input & output will has batch size as 1s dimension. e.g. (batch, time_step, input_size)
        )
        self.out = nn.Linear(64, 10)

     def forward(self, x):
        # x shape (batch, time_step, input_size)
        # r_out shape (batch, time_step, output_size)
        # h_n shape (n_layers, batch, hidden_size)
        # h_c shape (n_layers, batch, hidden_size)
        r_out, (h_n, h_c) = self.rnn(x, None)   # None represents zero initial hidden state

        # choose r_out at the last time step
        out = self.out(r_out[:, -1, :])
        return out

That is the LSTM version: because RNN and LSTM are used in almost exactly the same way, nothing else needs to change; we only swap in the LSTM model defined above.

The LSTM model turns out to be noticeably stronger:

insert image description here

The complete code for this part is as follows:

#coding=gbk

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

# Hyper Parameters
EPOCH = 1               # train the training data n times, to save time, we just train 1 epoch
BATCH_SIZE = 64
TIME_STEP = 28          # rnn time step / image height
INPUT_SIZE = 28         # rnn input size / image width
LR = 0.01               # learning rate
DOWNLOAD_MNIST = True   # set to True if haven't download the data

# Mnist digital dataset
train_data = torchvision.datasets.MNIST(
    root='../../data/',
    train=True,                         # this is training data
    transform=transforms.ToTensor(),    # Converts a PIL.Image or numpy.ndarray to
                                        # torch.FloatTensor of shape (C x H x W) and normalize in the range [0.0, 1.0]
    download=DOWNLOAD_MNIST,            # download it if you don't have it
)
print(train_data.train_data.size())     # (60000, 28, 28)
print(train_data.train_labels.size())   # (60000)

# load the training set
train_loader = torch.utils.data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)

# load the test set; take only 2000 samples to speed up testing
test_data = torchvision.datasets.MNIST(root='../../data/', train=False, transform=transforms.ToTensor())
test_x = test_data.test_data.type(torch.FloatTensor)[:2000]/255.   # shape (2000, 28, 28) value in range(0,1)
test_y = test_data.test_labels.numpy()[:2000]    # covert to numpy array


class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()

        self.rnn = nn.RNN(         # if use nn.RNN(), it hardly learns
            input_size=INPUT_SIZE,
            hidden_size=28,         # rnn hidden unit
            num_layers=1,           # number of rnn layer
            batch_first=True,       # input & output will has batch size as 1s dimension. e.g. (batch, time_step, input_size)
        )
        self.out = nn.Linear(28, 10)

    def forward(self, x):
        # x shape (batch, time_step, input_size)
        # r_out shape (batch, time_step, output_size)
        # h_n shape (n_layers, batch, hidden_size)
        # h_c shape (n_layers, batch, hidden_size)
     #   print(self.rnn(x, None))
        r_out,h = self.rnn(x, None)   # None represents zero initial hidden state

        # choose r_out at the last time step
        out = self.out(r_out[:, -1, :])
        return out


#print(rnn)


class LSTM(nn.Module):
     def __init__(self):
        super(LSTM, self).__init__()

        self.rnn = nn.LSTM(         # if use nn.RNN(), it hardly learns
            input_size=INPUT_SIZE,
            hidden_size=64,         # rnn hidden unit
            num_layers=1,           # number of rnn layer
            batch_first=True,       # input & output will has batch size as 1s dimension. e.g. (batch, time_step, input_size)
        )
        self.out = nn.Linear(64, 10)

     def forward(self, x):
        # x shape (batch, time_step, input_size)
        # r_out shape (batch, time_step, output_size)
        # h_n shape (n_layers, batch, hidden_size)
        # h_c shape (n_layers, batch, hidden_size)
        r_out, (h_n, h_c) = self.rnn(x, None)   # None represents zero initial hidden state

        # choose r_out at the last time step
        out = self.out(r_out[:, -1, :])
        return out


rnn = LSTM()
#rnn = RNN()

optimizer = torch.optim.Adam(rnn.parameters(), lr=LR)
loss_func = nn.CrossEntropyLoss()

# training
for epoch in range(EPOCH):
    for step, (b_x, b_y) in enumerate(train_loader):        # gives batch data
        b_x = b_x.view(-1, 28, 28)              # reshape x to (batch, time_step, input_size)

        output = rnn(b_x)                               # rnn output
    #    print(output)
        loss = loss_func(output, b_y)                   # cross entropy loss
        optimizer.zero_grad()                           # clear gradients for this training step
        loss.backward()                                 # backpropagation, compute gradients
        optimizer.step()                                # apply gradients

        if step % 50 == 0:
            test_output = rnn(test_x)                   # (samples, time_step, input_size)
            pred_y = torch.max(test_output, 1)[1].data.numpy()
            accuracy = float((pred_y == test_y).astype(int).sum()) / float(test_y.size)
            print('Epoch: ', epoch, '| train loss: %.4f' % loss.data.numpy(), '| test accuracy: %.2f' % accuracy)

# print 10 predictions from test data
test_output = rnn(test_x[:10].view(-1, 28, 28))
pred_y = torch.max(test_output, 1)[1].data.numpy()
print(pred_y, 'prediction number')
print(test_y[:10], 'real number')

Okay, that brings this tutorial to an end. The blogger will keep updating content in this area in the future; if you have any questions, please leave a message below the blog.
