
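Preface

First, a word on why I'm writing this. Often you read the official tutorial and feel you understand it, yet have no idea where to start when it comes to actually doing something. So I plan to put together a few posts to help everyone get started quickly. If you have suggestions, feel free to leave a comment at the end of the post.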

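1. Introduction to PyTorch

There are plenty of popular machine learning frameworks these days, such as TensorFlow and Keras, so why use PyTorch? I had been using Keras all along, but it is, after all, an extra layer of wrapping and not flexible enough. PyTorch is also easy to learn, so I plan to move part of my work over to it and pick up another skill along the way.
Starting today, I'll be learning PyTorch together with you.
As for learning resources, the documentation is detailed and there is plenty of code on GitHub. The PyTorch website should be reachable without a proxy, I think? It has detailed installation instructions and documentation, so this post won't cover installation. When I have time I'll add a post on setting up a CUDA environment; I've noticed that many posts barely touch on CUDA programming, which is actually quite interesting too.
I thought about where to begin, and after going back and forth decided to follow the pytorch-tutorial repository on GitHub. Typing out code by hand is still the fastest way to learn.
Here is how my learning went.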

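2. What are we doing today?

We start with the basic operations of PyTorch; the full code is in PyTorch Basics. If you have already gone through the official tutorial and know the basics, feel free to skip this part and go straight to the later posts.
First, import the packages we need.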

import torch 
import torchvision
import torch.nn as nn
import numpy as np
import torchvision.transforms as transforms

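In real work we constantly deal with images, text, and the like. How does this unstructured data show up inside a framework? As the name TensorFlow hints, the basic unit of computation is the tensor. And what do we do once we have tensors? Add, subtract, multiply, and divide them; after a series of operations we get a result, and then we differentiate that result. A so-called network is really just a somewhat complicated formula!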
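To make the "add, subtract, multiply" idea concrete, here is a minimal sketch of a few tensor operations. The values and the variable names (`a`, `b`, `total`, `scaled`, `product`) are made up for illustration and do not come from the original tutorial.

```python
import torch

# A 2x3 tensor built from a nested Python list, and a 2x3 tensor of ones.
a = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
b = torch.ones(2, 3)

total = a + b           # element-wise addition
scaled = a * 2          # element-wise multiplication by a scalar
product = a @ b.t()     # matrix product: (2, 3) @ (3, 2) -> (2, 2)

print(total)
print(scaled)
print(product.shape)    # torch.Size([2, 2])
```

Element-wise operations keep the shape of their inputs, while the matrix product combines the inner dimension, which is exactly what a linear layer does later in this post.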

三、一个小例子

Here is a simple example. We start by creating a few tensors from constants:

# Create tensors.
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

# Build a computational graph.
y = w * x + b    # y = 2 * x + 3

# Compute gradients.
y.backward()

# Print out the gradients.
print(x.grad)    # x.grad = 2 
print(w.grad)    # w.grad = 1 
print(b.grad)    # b.grad = 1

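Differentiating y = 2 * x + 3 could hardly be simpler. requires_grad=True tells PyTorch to track the operations on a tensor so that gradients can be computed for it later. All we need to type is: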

# Compute gradients.
y.backward()

and the gradient with respect to each tensor is computed.

# Print out the gradients.
print(x.grad)    # x.grad = 2 
print(w.grad)    # w.grad = 1 
print(b.grad)    # b.grad = 1
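As a small extra check, here is a sketch of the same mechanism on a non-linear function, y = x^2 + 3x, whose derivative 2x + 3 equals 7 at x = 2. It also shows one behaviour worth knowing early: gradients accumulate across backward calls until they are reset. The function and the variable names are my own choice for illustration.

```python
import torch

x = torch.tensor(2., requires_grad=True)

# First forward/backward pass: dy/dx = 2 * x + 3 = 7 at x = 2.
y = x ** 2 + 3 * x
y.backward()
first_grad = x.grad.item()
print(first_grad)        # 7.0

# A second backward pass ADDS to the stored gradient: 7 + 7 = 14.
y2 = x ** 2 + 3 * x
y2.backward()
accumulated_grad = x.grad.item()
print(accumulated_grad)  # 14.0

# Reset the stored gradient in place before the next pass.
x.grad.zero_()
print(x.grad.item())     # 0.0
```

This accumulation is why training loops usually reset gradients before every backward pass.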

That example is fairly simple, though, so let's look at a more complex one.

4. A more complex example

First we construct some tensors. To make the data more representative, we want them to be multi-dimensional.

# Create tensors of shape (10, 3) and (10, 2).
x = torch.randn(10, 3)
y = torch.randn(10, 2)

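Now that we have two "matrices" (10×3 and 10×2), what can we do with them? Let's treat y as the target: can we find a way to fit y using x? It seems possible; we just need something of the form y = Wx + b, right? How do we represent W and b in PyTorch? We can use the linear layer under torch.nn.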

# Build a fully connected layer.
linear = nn.Linear(3, 2)
print('w: ', linear.weight)
print('b: ', linear.bias)

The printed output looks like this:

w:  Parameter containing:
tensor([[ 0.4690, -0.5511,  0.2672],
        [ 0.4337, -0.4777,  0.1417]], requires_grad=True)
b:  Parameter containing:
tensor([-0.2510,  0.2035], requires_grad=True)

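As we can see, in the end these are just tensors! So what exactly is nn.Linear? Let's trace into the source, linear.py.
It contains a class, class Linear(Module), and the docstring describes it as follows.
Applies a linear transformation to the incoming data: :math:`y = xA^T + b`
There are three arguments, and they are easy to understand: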

    Args:
        in_features: size of each input sample
        out_features: size of each output sample
        bias: If set to False, the layer will not learn an additive bias.
            Default: ``True``

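With that, nn.Linear(3, 2) is easy to understand: the first argument is the input dimension (3) and the second is the output dimension (2). It is just a matrix multiplication.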
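To check the "it is just a matrix multiplication" claim, the sketch below compares the layer's output with the formula from the docstring, y = xA^T + b, computed by hand. The seed and the names `out_layer` and `out_manual` are assumptions added for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)            # fixed seed so the run is repeatable
linear = nn.Linear(3, 2)        # in_features=3, out_features=2
x = torch.randn(10, 3)          # 10 samples, 3 features each

# Output computed by the layer itself.
out_layer = linear(x)

# The same thing written out: weight has shape (out_features, in_features).
out_manual = x @ linear.weight.t() + linear.bias

print(linear.weight.shape)      # torch.Size([2, 3])
print(out_layer.shape)          # torch.Size([10, 2])
print(torch.allclose(out_layer, out_manual, atol=1e-6))
```

Note that the weight is stored as (out_features, in_features), which is why the formula uses the transpose.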
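Now let's see how to approximate y. What counts as approximating y? The smaller the difference, the better: we backpropagate to compute gradients and shrink the error step by step.
Could we define the difference ourselves, as in the example above? Let's think it through: what does the loss formula look like?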
$ \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
l_n = \left( x_n - y_n \right)^2$

$ \ell(x, y) = \begin{cases}
\operatorname{mean}(L), & \text{if}\; \text{size\_average} = \text{True},\\
\operatorname{sum}(L), & \text{if}\; \text{size\_average} = \text{False}.
\end{cases}$
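Of course we could, but there is a catch: would we really have to implement it ourselves every time? That is too much trouble, so let's call the function PyTorch has already implemented for us!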

# Build loss function and optimizer.
criterion = nn.MSELoss()
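For comparison, here is a sketch of the squared-error loss written by hand next to the built-in one. It assumes a reasonably recent PyTorch in which the `reduction` argument ('mean' or 'sum') plays the role that `size_average` plays in the formula above; the input values are made up.

```python
import torch
import torch.nn as nn

pred = torch.tensor([[1., 2.], [3., 4.]])
target = torch.tensor([[0., 2.], [3., 6.]])

# l_n = (x_n - y_n)^2 for every element.
squared_error = (pred - target) ** 2     # [[1, 0], [0, 4]]

manual_mean = squared_error.mean()       # (1 + 0 + 0 + 4) / 4 = 1.25
manual_sum = squared_error.sum()         # 1 + 0 + 0 + 4 = 5

# Built-in versions (reduction='mean' is the default).
builtin_mean = nn.MSELoss(reduction='mean')(pred, target)
builtin_sum = nn.MSELoss(reduction='sum')(pred, target)

print(manual_mean.item(), builtin_mean.item())   # 1.25 1.25
print(manual_sum.item(), builtin_sum.item())     # 5.0 5.0
```

The hand-written version is only a few lines, but the built-in module spares us from rewriting it for every project.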

Likewise, PyTorch provides optimizers that update the parameters from the computed gradients; we can use one directly.

optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

So far we have sorted out the overall idea and set up the loss and the optimizer. Let's carry on with the rest!

# Forward pass.
pred = linear(x)

# Compute loss.
loss = criterion(pred, y)
print('loss: ', loss.item())

Now pull the trigger on backpropagation:

# Backward pass.
loss.backward()

# Print out the gradients.
print('dL/dw: ', linear.weight.grad)
print('dL/db: ', linear.bias.grad)

The result looks like this:

loss:  1.4480115175247192
dL/dw:  tensor([[ 0.9242,  0.2026,  0.4504],
        [ 0.6620, -0.2875,  0.1874]])
dL/db:  tensor([-0.0164,  0.1238])

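The code above completed one backward pass, that is, it computed the gradients. To see their effect more directly, take one gradient descent step as below; the commented-out low-level update gives nearly the same result: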

# 1-step gradient descent.
optimizer.step()

# You can also perform gradient descent at the low level.
# linear.weight.data.sub_(0.01 * linear.weight.grad.data)
# linear.bias.data.sub_(0.01 * linear.bias.grad.data)

# Print out the loss after 1-step gradient descent.
pred = linear(x)
loss = criterion(pred, y)
print('loss after 1 step optimization: ', loss.item())
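One step rarely changes much, so here is a sketch of repeating it in a loop. The number of steps (100), the seed, and the `losses` list are my own additions for illustration; the call to optimizer.zero_grad() is there because gradients accumulate across backward passes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                 # fixed seed so the run is repeatable
x = torch.randn(10, 3)
y = torch.randn(10, 2)

linear = nn.Linear(3, 2)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

losses = []
for step in range(100):
    pred = linear(x)                 # forward pass
    loss = criterion(pred, y)        # compute loss
    optimizer.zero_grad()            # clear gradients from the previous step
    loss.backward()                  # backward pass
    optimizer.step()                 # update weight and bias
    losses.append(loss.item())

print('first loss:', losses[0])
print('last loss: ', losses[-1])
```

Since y is random noise here, the loss will not reach zero, but it should go down from the first step to the last.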

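5. Loading data from numpy

One especially convenient thing about PyTorch is how easily you can convert between numpy arrays and torch tensors. Let's go straight to the code.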

# Create a numpy array.
x = np.array([[1, 2], [3, 4]])

# Convert the numpy array to a torch tensor.
y = torch.from_numpy(x)

# Convert the torch tensor to a numpy array.
z = y.numpy()
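One detail worth knowing about this conversion: torch.from_numpy shares memory with the source array, whereas torch.tensor makes a copy. The sketch below demonstrates the difference; the names `arr`, `shared`, and `copied` are made up for illustration.

```python
import numpy as np
import torch

arr = np.array([[1, 2], [3, 4]])

shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # independent copy of the data

# Modify the numpy array in place.
arr[0, 0] = 100

print(shared[0, 0].item())       # 100, the change is visible
print(copied[0, 0].item())       # 1, the copy is unaffected
```

If you need the tensor to stay independent of the array, make a copy explicitly.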

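6. Reading data

Reading data is often the first problem people run into when adapting code. Let's use a standard dataset to walk through it.

Step 1: download the dataset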

# Download and construct CIFAR-10 dataset.
train_dataset = torchvision.datasets.CIFAR10(root='../../data/',
                                             train=True, 
                                             transform=transforms.ToTensor(),
                                             download=True)

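It is really simple: specify a location and download the data. So what does the data look like? We had better take a look!

Step 2: preview the data
The dataset's own documentation describes the data; what we need to do is check its format and size.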

# Fetch one data pair (read data from disk).
image, label = train_dataset[0]
print(image.size())
print(label)

The output is:

torch.Size([3, 32, 32])
6

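Does PyTorch give us a convenient interface for iterating over the data? Of course. Looking at the documentation we find class DataLoader; here is what it says.
**Data loader. Combines a dataset and a sampler, and provides single- or multi-process iterators over the dataset.**
What arguments does it take? It looks like a lot, but they are actually easy to follow.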

Arguments:
        dataset (Dataset): dataset from which to load the data.
        batch_size (int, optional): how many samples per batch to load
            (default: 1).
        shuffle (bool, optional): set to ``True`` to have the data reshuffled
            at every epoch (default: False).
        sampler (Sampler, optional): defines the strategy to draw samples from
            the dataset. If specified, ``shuffle`` must be False.
        batch_sampler (Sampler, optional): like sampler, but returns a batch of
            indices at a time. Mutually exclusive with batch_size, shuffle,
            sampler, and drop_last.
        num_workers (int, optional): how many subprocesses to use for data
            loading. 0 means that the data will be loaded in the main process.
            (default: 0)
        collate_fn (callable, optional): merges a list of samples to form a mini-batch.
        pin_memory (bool, optional): If ``True``, the data loader will copy tensors
            into CUDA pinned memory before returning them.
        drop_last (bool, optional): set to ``True`` to drop the last incomplete batch,
            if the dataset size is not divisible by the batch size. If ``False`` and
            the size of dataset is not divisible by the batch size, then the last batch
            will be smaller. (default: False)
        timeout (numeric, optional): if positive, the timeout value for collecting a batch
            from workers. Should always be non-negative. (default: 0)
        worker_init_fn (callable, optional): If not None, this will be called on each
            worker subprocess with the worker id (an int in ``[0, num_workers - 1]``) as
            input, after seeding and before data loading. (default: None)

Picking up where we left off, let's look at an example and instantiate a DataLoader first:

Step 3: define the loader

# Data loader (this provides queues and threads in a very simple way).
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=64, 
                                           shuffle=True)

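As you can see, only a few options are commonly needed; the dataset here is not that large, so we ignore parallel loading for now. So what is this train_loader, and how do we access its elements one by one? The documentation shows that DataLoader is accessed by iteration, so we can simply create an iterator from it.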

# When iteration starts, queue and thread start to load data from files.
data_iter = iter(train_loader)

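Step 4: use the data
How many samples are read each time is set by batch_size above. Now that we have an iterator, we can pull out batches one at a time and actually work with them.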

# Mini-batch images and labels.
images, labels = next(data_iter)

# Actual usage of the data loader is as below.
for images, labels in train_loader:
    # Training code should be written here.
    pass
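To try the loader without downloading CIFAR-10, here is a sketch that uses a small in-memory TensorDataset of random "images" as a stand-in. The dataset size (10), the batch size (4), and the variable names are assumptions for illustration; the example also shows what drop_last does to the last incomplete batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 fake images shaped like CIFAR-10 samples, with integer labels 0..9.
images = torch.randn(10, 3, 32, 32)
labels = torch.arange(10)
dataset = TensorDataset(images, labels)

# 10 samples with batch_size=4 -> batches of 4, 4, and 2.
loader = DataLoader(dataset, batch_size=4, shuffle=False)
batch_sizes = [batch_images.size(0) for batch_images, _ in loader]
print(batch_sizes)           # [4, 4, 2]

# drop_last=True discards the final incomplete batch.
loader_drop = DataLoader(dataset, batch_size=4, shuffle=False, drop_last=True)
print(len(loader_drop))      # 2

# Fetch a single batch with the built-in next().
first_images, first_labels = next(iter(loader))
print(first_images.shape)    # torch.Size([4, 3, 32, 32])
print(first_labels.tolist()) # [0, 1, 2, 3]
```

With shuffle=True the labels in each batch would come out in a random order instead.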

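7. Loading data from a custom dataset

Above we loaded a built-in dataset directly, which is clearly not enough for anyone with their own data. What we need to do is subclass PyTorch's Dataset and override a few methods.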

# You should build your custom dataset as below.
class CustomDataset(torch.utils.data.Dataset):
    def __init__(self):
        # TODO
        # 1. Initialize file paths or a list of file names. 
        pass
    def __getitem__(self, index):
        # TODO
        # 1. Read one data from file (e.g. using numpy.fromfile, PIL.Image.open).
        # 2. Preprocess the data (e.g. torchvision.transforms).
        # 3. Return a data pair (e.g. image and label).
        pass
    def __len__(self):
        # You should change 0 to the total size of your dataset.
        return 0 

# You can then use the prebuilt data loader. 
custom_dataset = CustomDataset()
train_loader = torch.utils.data.DataLoader(dataset=custom_dataset,
                                           batch_size=64, 
                                           shuffle=True)
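The template above is only a skeleton: until the TODOs are filled in, `__len__` returns 0 and `__getitem__` returns nothing useful. As a filled-in sketch, here is a hypothetical toy dataset, `SquaresDataset`, in which sample i is the pair (i, i squared). The class and its contents are invented purely for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i ** 2)."""

    def __init__(self, n):
        # Here we only store the size; a real dataset would store file paths.
        self.n = n

    def __getitem__(self, index):
        # Build one (input, target) pair on demand.
        x = torch.tensor([float(index)])
        y = torch.tensor([float(index) ** 2])
        return x, y

    def __len__(self):
        return self.n


dataset = SquaresDataset(8)
loader = DataLoader(dataset, batch_size=4, shuffle=False)

xs, ys = next(iter(loader))
print(len(dataset))             # 8
print(xs.shape)                 # torch.Size([4, 1])
print(ys.squeeze(1).tolist())   # [0.0, 1.0, 4.0, 9.0]
```

The loader only relies on `__getitem__` and `__len__`, so any data source that can answer those two questions can be plugged in the same way.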

8. Loading a pretrained model

PyTorch provides a number of pretrained models; you can find them under torchvision.models.

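# Download and load the pretrained ResNet-18.
# Note: newer torchvision releases deprecate `pretrained=True` in favor of the `weights=` argument.
resnet = torchvision.models.resnet18(pretrained=True)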

# If you want to finetune only the top layer of the model, set as below.
for param in resnet.parameters():
    param.requires_grad = False

# Replace the top layer for finetuning.
resnet.fc = nn.Linear(resnet.fc.in_features, 100)  # 100 is an example.

# Forward pass.
images = torch.randn(64, 3, 224, 224)
outputs = resnet(images)
print(outputs.size())     # (64, 100)
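To see what freezing actually does, here is a sketch of the same freeze-then-replace pattern on a tiny stand-in network, so that nothing has to be downloaded. The nn.Sequential model below is hypothetical and is NOT ResNet-18; its last layer simply plays the role of resnet.fc. Passing only the trainable parameters to the optimizer is one common choice, not something the original code does.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained network (not ResNet-18).
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 4),    # plays the role of resnet.fc
)

# Freeze every existing parameter.
for param in model.parameters():
    param.requires_grad = False

# Replace the last layer; a newly created layer is trainable by default.
model[2] = nn.Linear(model[2].in_features, 100)

# Hand only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)

num_trainable = sum(p.numel() for p in trainable)
print(num_trainable)     # 16 * 100 + 100 = 1700

outputs = model(torch.randn(5, 8))
print(outputs.shape)     # torch.Size([5, 100])
```

Only the new head receives gradients, so fine-tuning touches a small fraction of the weights.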

9. Saving and loading models

These are very simple operations; I'm sure you will get them at a glance.


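# Save and load the entire model.
# Note: recent PyTorch releases default torch.load to weights_only=True,
# so loading a whole pickled model may require passing weights_only=False.
torch.save(resnet, 'model.ckpt')
model = torch.load('model.ckpt')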

# Save and load only the model parameters (recommended).
torch.save(resnet.state_dict(), 'params.ckpt')
resnet.load_state_dict(torch.load('params.ckpt'))
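As a quick sanity check of the recommended approach, the sketch below saves the parameters of a small layer, loads them into a freshly created layer, and confirms that both produce the same output. The small nn.Linear model, the temporary directory, and the variable names are assumptions chosen to keep the example self-contained.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 2)
x = torch.randn(4, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'params.ckpt')

    # Save only the parameters.
    torch.save(model.state_dict(), path)

    # A fresh layer starts with different random weights...
    restored = nn.Linear(3, 2)
    # ...until the saved parameters are loaded into it.
    restored.load_state_dict(torch.load(path))

same = torch.allclose(model(x), restored(x))
print(same)   # True
```

Note that the architecture has to be rebuilt in code before load_state_dict is called; the file holds only the parameter values.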

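10. Summary

Today we covered PyTorch's basic operations and some commonly used built-in modules. Pretty simple, isn't it? More content will follow in later posts, so stay tuned!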

Getting Started with PyTorch: Learn by Doing, Part 01, Basics