Comparing two deep learning frameworks
Dynamic graph (PyTorch): the graph is built while the code runs, so operations execute as the computation is defined.
Static graph (TensorFlow < 2.0): the graph is defined first and run afterwards, with its own naming system and its own control-flow mechanisms, and it is hard to intervene in while it runs.
Advantages of using a deep learning framework: GPU acceleration (CUDA), automatic differentiation, and APIs for common network layers.
PyTorch features: GPU support, dynamic neural networks, Python first, an imperative programming experience, and easy extension.
1. About PyTorch
PyTorch is a Python-based scientific computing package released by Facebook's AI research team. It is designed to serve two purposes:
a replacement for NumPy that exploits the power of GPUs (not applicable where the environment has no GPU);
a deep learning research platform that provides a high degree of flexibility and efficiency.
2. PyTorch features and benefits
2.1 PyTorch features
PyTorch provides:
a tensor manipulation library that runs on GPU or CPU;
a built-in neural network library;
model training utilities;
a multiprocessing library that supports shared-memory concurrency.
2.2 PyTorch benefits
PyTorch belongs to the Python ecosystem, the largest in machine learning, so developers can use most Python libraries and tools, such as NumPy, SciPy, and Cython (which compiles Python to C for speed);
(greatest advantage) it offers a faster way to improve an existing neural network without rebuilding the whole network from scratch. This is because PyTorch uses a dynamic computational graph, rather than the static graph used by most open-source frameworks (TensorFlow, Caffe, CNTK, Theano, etc.);
it provides toolkits such as torch, torch.nn, and torch.optim.
3. Common PyTorch toolkits
torch: a tensor library similar to NumPy, with strong GPU support;
torch.autograd: a tape-based automatic differentiation library that supports all differentiable tensor operations in torch;
torch.nn: a neural network library deeply integrated with autograd, designed for maximum flexibility;
torch.optim: an optimization package used with torch.nn, containing standard optimization methods such as SGD, RMSProp, LBFGS, and Adam;
torch.multiprocessing: Python multiprocessing with torch tensors shared in memory between processes;
torch.utils: data loaders, trainers, and other convenience utilities;
torch.legacy (.nn / .optim): legacy code ported from Torch for backward compatibility.
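The dynamic graph idea described above can be illustrated with a small sketch. The function name repeated_double and the threshold value are hypothetical, chosen only for this example; the point is that ordinary Python control flow decides, at run time, how many operations end up in the graph.

```python
import torch

def repeated_double(x, threshold=100.0):
    """Double x until its norm reaches the threshold (hypothetical helper).

    The number of loop iterations depends on the data, so the graph that
    autograd records is different for different inputs.
    """
    steps = 0
    while x.norm() < threshold:
        x = x * 2
        steps += 1
    return x, steps

x = torch.ones(3, requires_grad=True)
y, steps = repeated_double(x)

# Backpropagate through however many doublings actually happened.
y.sum().backward()
print(steps, x.grad)
```

With a static graph, a data-dependent loop like this would have to be expressed through the framework's own control-flow operators instead of a plain Python while statement.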
PyTorch Getting Started
1. PyTorch Overview
PyTorch is the Python version of Torch (which was written in Lua), developed by Facebook; it took academia by storm in 2017
Officially, PyTorch targets two kinds of users: those who want a GPU-enabled NumPy, and those who want a deep learning research platform
PyTorch uses a dynamic graph mechanism, which is more flexible than the static graph of early TensorFlow
Currently supported systems: Windows, Linux, macOS
2. PyTorch basic libraries
Commonly used PyTorch libraries include:
torch: general-purpose functions, much like NumPy
torch.Tensor: tensor operations, callable as methods via tensor.xx()
torch.nn: commonly used model components, such as RNN and CNN layers
torch.nn.functional: commonly used functions, such as sigmoid and softmax
torch.optim: optimization algorithms, such as SGD and Adam
torch.utils.data: utilities for iterating over data
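As a rough illustration of how these namespaces relate, the sketch below calls the same reduction once as a torch function and once as a Tensor method, and uses torch.nn.functional for softmax. The variable names are illustrative only.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([1.0, 2.0, 3.0])

total_fn = torch.sum(x)        # function in the torch namespace
total_method = x.sum()         # the same operation as a Tensor method

probs = F.softmax(x, dim=0)    # functional form, no learnable parameters
squashed = torch.sigmoid(x)    # element-wise sigmoid

print(total_fn, total_method, probs, squashed)
```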
3. Basic operations
a. Tensor operations
import torch
# Uninitialized tensor
torch.empty(3, 4)
# Randomly initialized tensor
torch.rand(4, 3)
# Zero-initialized tensor
torch.zeros(4, 3, dtype=torch.int)
# Build a tensor from existing data
x = torch.tensor([3, 4], dtype=torch.float)
x = torch.IntTensor([3, 4])
# Get the size of the tensor (a tuple-like torch.Size)
x.shape
x.size()
# Methods ending in _ operate in place: they change the tensor itself
x = torch.ones(3, 4)
# The following three statements all add x to itself
x = x + x
x = torch.add(x, x)
x.add_(x)
# Indexing works the same way as in NumPy
x[:, 1]
# Change the shape
x.view(-1)
x.view(4, 3)
# If the tensor contains only one element, get its value as a Python number
x = torch.randn(1)
x.item()
# Add a dimension
input = torch.randn(32, 32)
input = input.unsqueeze(0)
input.size()
# .data is still a tensor, but with requires_grad = False
x.data.requires_grad
# Change the type
x.type(torch.LongTensor)
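The difference between out-of-place and in-place operations, and the fact that view shares storage with the original tensor, can be checked with a short sketch like the following. The variable names are illustrative only.

```python
import torch

x = torch.ones(3, 4)

y = x + x                  # out-of-place: a new tensor is created, x is unchanged
same_object = x.add_(x)    # in-place: x itself is modified and returned

flat = x.view(-1)          # a view with 12 elements that shares storage with x
flat[0] = 10.0             # writing through the view also changes x[0, 0]

batched = x.unsqueeze(0)   # adds a leading dimension: shape (1, 3, 4)

print(same_object is x, x[0, 0].item(), batched.shape)
```

Because a view shares storage, it is cheap, but changes made through it are visible in the original tensor.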
b. Converting between NumPy and tensors
import numpy
# Conversion; the array and the tensor share memory
a = numpy.array([1, 2, 3])
a = torch.from_numpy(a)
a.numpy()
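A minimal sketch of what "shared memory" means in practice: a write on either side is visible on the other, whereas torch.tensor makes an independent copy. The variable names are illustrative only.

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])
t = torch.from_numpy(arr)    # shares memory with arr

arr[0] = 100                 # change made through the NumPy array
t[1] = 200                   # change made through the tensor
back = t.numpy()             # also shares the same memory

copied = torch.tensor(arr)   # torch.tensor copies the data
arr[2] = 300                 # not visible in the copy

print(t, back, copied)
```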
c. Using the GPU
# Check whether a GPU is available
torch.cuda.is_available()
# Select a device
device = torch.device('cpu')  # 'cuda' or 'cpu'
a = torch.tensor([1, 2, 3], device='cuda')  # create directly on the GPU
a = a.to(device)  # move to the selected device
a = a.to('cpu')  # move to a named device, 'cuda' or 'cpu'
a = a.cuda()  # move to the GPU
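A common way to combine these calls is to pick the device once and reuse it, so the same script runs with or without a GPU. This is only a sketch; the variable names are illustrative.

```python
import torch

# Fall back to the CPU when CUDA is not available.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

a = torch.tensor([1, 2, 3], device=device)        # created on the chosen device
b = torch.ones(3, dtype=torch.long).to(device)    # created on the CPU, then moved

# Operands must live on the same device; move the result back for printing.
c = (a + b).to('cpu')
print(device, c)
```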
d. Gradients
.requires_grad: whether the tensor is differentiable (whether gradients are tracked for it)
.backward(): computes the gradients; no argument is needed when the output is a single value (a scalar), otherwise a weight tensor of the same size as the output must be passed
.grad: stores the accumulated gradient. Only leaf tensors hold a gradient value; intermediate computation nodes do not
.detach(): returns what amounts to a new tensor, cut off from the history of the computation graph
with torch.no_grad(): useful when evaluating a model; no gradients are computed inside the block
.grad_fn: records how the node was produced; for a user-created tensor such as tensor([1, 2, 3]), grad_fn is None
.data: the tensor's values, with requires_grad = False
# Create a differentiable tensor
x = torch.ones(2, 3, requires_grad=True)
# Change differentiability
x.requires_grad_(False)
# Obtain gradient values
x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = torch.sum(z)
out.backward()
x.grad
# No gradient tracking, raises an error
with torch.no_grad():
    x = torch.ones(2, 2, requires_grad=True)
    y = x + 2
    z = y * y * 3
    out = torch.sum(z)
    out.backward()  # RuntimeError: out was computed without gradient tracking
    x.grad
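The attributes listed in this section can be exercised together in one sketch. For out = sum(3 * (x + 2)^2), the derivative with respect to x is 6 * (x + 2), which is 18 when x is 1. The variable names are illustrative only.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.sum()
out.backward()
first = x.grad.clone()             # 6 * (x + 2) = 18 for every element

# .grad accumulates: a second backward pass adds to the stored value.
out2 = (3 * (x + 2) ** 2).sum()
out2.backward()
accumulated = x.grad.clone()       # 18 + 18 = 36

# A non-scalar output needs a weight tensor of the same shape.
x.grad.zero_()
v = x * 2
v.backward(torch.ones_like(v))
non_scalar = x.grad.clone()        # 2 for every element

detached = y.detach()              # same values, no graph history
print(first, accumulated, non_scalar)
print(x.grad_fn, y.grad_fn, detached.requires_grad)
```

The accumulation behaviour is the reason training loops clear the gradients before each backward pass.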
e. Defining a model
There are two ways to define a model:
as a class
with Sequential
# Defined as a class
from torch import nn
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Declare the layers whose parameters need to be learned as instance attributes
        self.fc1 = nn.Linear(5, 10)
        self.fc2 = nn.Linear(10, 20)
    def forward(self, x):
        # Define the computation graph
        x = self.fc1(x)
        x = nn.functional.relu(x)
        x = self.fc2(x)
        return x
net = Net()
# Defined with Sequential
net = nn.Sequential(
    nn.Linear(5, 10),
    nn.ReLU(),
    nn.Linear(10, 20)
)
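To see that the two definitions describe the same architecture, the sketch below builds both, feeds them a random batch, and counts the learnable parameters. The batch size of 4 is an arbitrary choice for illustration.

```python
import torch
from torch import nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(5, 10)
        self.fc2 = nn.Linear(10, 20)

    def forward(self, x):
        x = nn.functional.relu(self.fc1(x))
        return self.fc2(x)

net_class = Net()
net_seq = nn.Sequential(nn.Linear(5, 10), nn.ReLU(), nn.Linear(10, 20))

batch = torch.randn(4, 5)            # 4 samples with 5 features each
out_class = net_class(batch)
out_seq = net_seq(batch)

# (5*10 + 10) + (10*20 + 20) learnable values in both models
n_class = sum(p.numel() for p in net_class.parameters())
n_seq = sum(p.numel() for p in net_seq.parameters())
print(out_class.shape, out_seq.shape, n_class, n_seq)
```

The class form is the more flexible of the two, since forward can contain arbitrary Python logic; Sequential is shorter when the layers are simply applied one after another.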
f. Working with model parameters
# Get the model parameters
net.parameters()  # returns an iterator
# Clear the gradients of the model parameters
net.zero_grad()
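A short sketch of iterating over parameters and clearing their gradients follows. It uses named_parameters, which yields the name along with each parameter. Depending on the PyTorch version, zero_grad either zeroes the gradients or resets them to None, so the check below accepts both.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(5, 10), nn.ReLU(), nn.Linear(10, 20))

# Map each parameter name to its shape.
shapes = {name: tuple(p.shape) for name, p in net.named_parameters()}

# Run one backward pass so that every parameter gets a gradient.
net(torch.randn(2, 5)).sum().backward()
had_grad = all(p.grad is not None for p in net.parameters())

net.zero_grad()
cleared = all(
    p.grad is None or bool((p.grad == 0).all())
    for p in net.parameters()
)
print(shapes, had_grad, cleared)
```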
g. Defining the loss function
criterion = nn.CrossEntropyLoss()
h. Defining the optimizer
from torch import optim
optimizer = optim.SGD(net.parameters(), lr=0.01)
i. Training
optimizer.zero_grad()  # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()
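Put together, the loss, the optimizer, and the training step might look like the sketch below. The data is random and the layer sizes, learning rate, and step count are assumptions made only for this example.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

net = nn.Sequential(nn.Linear(5, 10), nn.ReLU(), nn.Linear(10, 3))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.1)

# Synthetic batch: 16 samples, 5 features, 3 classes (illustrative only).
inputs = torch.randn(16, 5)
targets = torch.randint(0, 3, (16,))

losses = []
for step in range(50):
    optimizer.zero_grad()               # clear gradients from the last step
    output = net(inputs)                # forward pass
    loss = criterion(output, targets)   # compare predictions with labels
    loss.backward()                     # compute gradients
    optimizer.step()                    # update the parameters
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Note that CrossEntropyLoss expects raw scores (logits) and integer class labels, so the model has no softmax layer at the end.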
j. Testing
# Testing
with torch.no_grad():
    output = net(input)
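A slightly fuller sketch of the evaluation step: net.eval() switches layers such as dropout and batch normalization to inference behaviour (this small model has none, so the call is only there to show the usual pattern), and no_grad skips graph construction. The model and inputs are illustrative only.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(5, 10), nn.ReLU(), nn.Linear(10, 3))
inputs = torch.randn(8, 5)

net.eval()                    # inference mode for layers that behave differently
with torch.no_grad():         # no graph is recorded inside this block
    output = net(inputs)

predictions = output.argmax(dim=1)   # predicted class per sample
print(output.requires_grad, predictions)
```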
k. Saving and loading
# Whole model
torch.save(net, file)
net = torch.load(file)
# Parameters only
torch.save(net.state_dict(), file)
net = Model()
net.load_state_dict(torch.load(file))  # load_state_dict expects a dict, not a file name
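The parameters-only route can be checked with a round trip through a temporary file, as sketched below. The helper make_model and the file name net_params.pt are hypothetical, introduced only for this example.

```python
import os
import tempfile

import torch
from torch import nn

def make_model():
    """Hypothetical factory so both models share the same architecture."""
    return nn.Sequential(nn.Linear(5, 10), nn.ReLU(), nn.Linear(10, 20))

net = make_model()
path = os.path.join(tempfile.mkdtemp(), 'net_params.pt')

# Save only the parameters.
torch.save(net.state_dict(), path)

# Rebuild the architecture, then load the saved parameters into it.
restored = make_model()
restored.load_state_dict(torch.load(path))

x = torch.randn(2, 5)
same = torch.equal(net(x), restored(x))
print(same)
```

Saving the state_dict keeps the file independent of the class definition's location, at the cost of having to rebuild the model in code before loading.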
4. A complete machine-learning process
Data
    load the data
    process the data
    construct an iterator
Model
Loss
Optimizer
Create / load the model
    Create a new model
    Load an existing model
        load the whole model directly
        load the parameters
            create a new model
            load the model parameters (for optimizers such as Adam, the optimizer's state also needs to be loaded)
Training
    batch training
        for i, batch in enumerate(dataloader):
            x_batch, y_batch = batch
            outputs = net(x_batch)
            loss = criterion(outputs, y_batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    at intervals, print the validation set loss
    periodically save the model
Test
    load the test data
    process the data
    construct an iterator (optional)
    feed the data into the model and get the output
    compute the accuracy
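The whole process outlined above could be sketched end to end as follows. Everything specific here is an assumption for illustration: the data is synthetic (the label depends only on the sign of the first feature), and the layer sizes, batch size, learning rate, and epoch count are arbitrary.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Data: load, process, and wrap in iterators.
features = torch.randn(200, 5)
labels = (features[:, 0] > 0).long()
train_set = TensorDataset(features[:160], labels[:160])
test_set = TensorDataset(features[160:], labels[160:])
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
test_loader = DataLoader(test_set, batch_size=16)

# Model, loss, optimizer.
net = nn.Sequential(nn.Linear(5, 10), nn.ReLU(), nn.Linear(10, 2))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.01)

# Training: iterate over batches and record the mean loss per epoch.
epoch_losses = []
for epoch in range(20):
    net.train()
    running = 0.0
    for x_batch, y_batch in train_loader:
        outputs = net(x_batch)
        loss = criterion(outputs, y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running += loss.item() * x_batch.size(0)
    epoch_losses.append(running / len(train_set))

# Checkpoint: with Adam, the optimizer state is kept alongside the model.
checkpoint = {'model': net.state_dict(), 'optimizer': optimizer.state_dict()}

# Test: feed the test data through the model and compute the accuracy.
net.eval()
correct = 0
with torch.no_grad():
    for x_batch, y_batch in test_loader:
        correct += (net(x_batch).argmax(dim=1) == y_batch).sum().item()
accuracy = correct / len(test_set)
print(epoch_losses[0], epoch_losses[-1], accuracy)
```

In a real project the checkpoint dictionary would be written to disk with torch.save at intervals, and a separate validation set would be evaluated during training.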