Getting started with PyTorch

Preface

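When I first got into deep learning I started with Theano. Back then I was amazed that it could do automatic differentiation; compared with hand-writing weight-update formulas in Python, it felt incredibly powerful. Later I came across TensorFlow, found it even more convenient than Theano, and wrote quite a few articles about it. During my internship I also saw that many internet companies use TensorFlow, so I was ready to settle down with it. Admittedly, TensorFlow has Google behind it and a very large user base, and its active community, thorough documentation, and distributed capabilities have won over many companies.

Recently, though, I discovered PyTorch from Facebook, and it felt like finding a new continent. I had not paid much attention to Torch before because it is written in Lua, which I did not know well. PyTorch, however, supports Python, and the key point is that it builds dynamic graphs: unlike the static graphs of TensorFlow, which make debugging extremely hard, PyTorch lets you print intermediate results at any time. So I decided to dive into PyTorch.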

PyTorch syntax basics

Tensor

A PyTorch tensor means the same thing as a TensorFlow tensor: a multi-dimensional array that supports a set of basic operations. You can create one directly from a Python list:

>>> import torch
>>> a=[[1.0,2.0,3.0],[2.5,3.5,4.5]]
>>> t=torch.Tensor(a)
>>> t

 1.0000  2.0000  3.0000
 2.5000  3.5000  4.5000
[torch.FloatTensor of size 2x3]

You can also pass the values straight to a typed constructor such as torch.FloatTensor:

>>> t=torch.FloatTensor([1,2,3])
>>> t

 1
 2
 3
[torch.FloatTensor of size 3]

Tensors support arithmetic, for example addition:

>>> t1=torch.FloatTensor([1,2,3])
>>> t2=torch.FloatTensor([1,2,3])
>>> t3=t1+t2
>>> t3

 2
 4
 6
[torch.FloatTensor of size 3]

They also support NumPy-style matrix operations, such as concatenation:

>>> t1=torch.FloatTensor(torch.randn(2,3))
>>> t1

 1.9273 -1.2105 -3.1721
 0.7561 -0.6986 -0.1638
[torch.FloatTensor of size 2x3]

>>> t2=torch.FloatTensor(torch.randn(2,2))
>>> t2

 0.1947 -1.3970
-0.0017  0.9051
[torch.FloatTensor of size 2x2]

>>> t3=torch.cat([t1,t2],dim=1)
>>> t3

 1.9273 -1.2105 -3.1721  0.1947 -1.3970
 0.7561 -0.6986 -0.1638 -0.0017  0.9051
[torch.FloatTensor of size 2x5]

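Tensors can also be converted to and from NumPy arrays. Note that torch.from_numpy shares memory with the source array, so the in-place update to a below is visible through b: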

>>> import numpy as np
>>> a = np.ones(5)
>>> b = torch.from_numpy(a)
>>> np.add(a, 1, out=a)
array([ 2.,  2.,  2.,  2.,  2.])
>>> print(a)
[ 2.  2.  2.  2.  2.]
>>> print(b)

 2
 2
 2
 2
 2
[torch.DoubleTensor of size 5]

>>> c=b.numpy()
>>> c
array([ 2.,  2.,  2.,  2.,  2.])

If you need a GPU, you can move a tensor to the GPU, and moving it back to the CPU is supported as well:

>>> d=b.cuda()
>>> d

 2
 2
 2
 2
 2
[torch.cuda.DoubleTensor of size 5 (GPU 0)]

>>> e=d.cpu()
>>> e

 2
 2
 2
 2
 2
[torch.DoubleTensor of size 5]
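
Calling .cuda() fails on a machine without a usable GPU. A minimal sketch of device-agnostic code follows, assuming a recent PyTorch release where torch.device and Tensor.to are available (they are not part of the session shown above):

```python
import torch

# Pick the GPU when one is visible, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

t = torch.ones(5, dtype=torch.float64)
d = t.to(device)   # moves the data to the GPU if there is one, no-op on CPU
e = d.cpu()        # always safe: brings the data back to host memory

print(d.device, e.device)
```

The same script then runs unchanged on machines with and without a GPU.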

Variable

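PyTorch has a type that closely resembles tf.Variable in TensorFlow: autograd.Variable. It supports automatic differentiation. Here is a simple example of differentiating with respect to a parameter (note that grad is an attribute, not a method, as the traceback shows):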

>>> from torch.autograd import Variable
>>> a=Variable(torch.FloatTensor(torch.randn(2,2)),requires_grad=True)
>>> b=a+2
>>> c=b*b*3
>>> out=c.mean()
>>> out.backward()
>>> a.grad()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Variable' object is not callable
>>> a.grad
Variable containing:
 1.1911  5.0219
 4.1623  2.0740
[torch.FloatTensor of size 2x2]
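
The gradient above can be checked by hand: out is the mean of 3(a+2)^2 over 4 elements, so d(out)/d(a_ij) = 6(a_ij+2)/4 = 1.5(a_ij+2). The sketch below compares that with autograd. It assumes a recent PyTorch release, where a plain tensor accepts requires_grad directly and the Variable wrapper is no longer needed:

```python
import torch

# Assumes a recent PyTorch release: tensors track gradients themselves.
a = torch.randn(2, 2, requires_grad=True)
b = a + 2
c = b * b * 3
out = c.mean()
out.backward()

# Analytic gradient: d(out)/d(a_ij) = 6 * (a_ij + 2) / 4
expected = 1.5 * (a.detach() + 2)
print(torch.allclose(a.grad, expected))
```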

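The official documentation describes Variable as a wrapper around Tensor, and it is this wrapper that provides differentiation.
Note that when backward is called on a non-scalar Variable, you must pass a gradient argument whose size matches that of the Variable backward is called on. It can be thought of as a weight applied to each output element before the derivatives are propagated back (for a scalar such as out above, no argument is needed). For example: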

>>> a=Variable(torch.FloatTensor(torch.ones(2,2)),requires_grad=True)
>>> b=a+2
>>> r=torch.randn(2,2)
>>> r

 0.2842 -0.5370
 0.7611  0.3377
[torch.FloatTensor of size 2x2]

>>> b.backward(r)
>>> a.grad
Variable containing:
 0.2842 -0.5370
 0.7611  0.3377

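After computing gradients, PyTorch frees the buffers of the graph, so by default you cannot call backward a second time on the same graph. To differentiate more than once, pass retain_graph=True to the backward call.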
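
A minimal sketch of running backward twice on one graph, assuming a recent PyTorch release. retain_graph is an argument of backward(), and gradients accumulate in .grad across calls:

```python
import torch

a = torch.ones(2, 2, requires_grad=True)
out = (a * a).sum()              # d(out)/da = 2 * a

# First pass: keep the graph buffers so a second pass is possible.
out.backward(retain_graph=True)
first = a.grad.clone()           # every entry is 2

# Second pass over the same graph; the new gradient is added to a.grad.
out.backward()
second = a.grad.clone()          # every entry is 4

print(first)
print(second)
```

Because of this accumulation, training loops normally reset gradients (for example with optimizer.zero_grad()) before each backward pass.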

Getting started with neural networks in PyTorch

Building a neural network in PyTorch is easy. The main package is torch.nn. Let's build a simple two-layer network with it:

#coding=utf-8
"""
author:luchi
date:8/6/2016
"""
import numpy as np
import torch
from torch.autograd import Variable

class network(torch.nn.Module):
    def __init__(self,in_num,hidden_num,out_num):
        super(network,self).__init__()
        self.input_layer=torch.nn.Linear(in_num,hidden_num)
        self.sigmoid=torch.nn.Sigmoid()
        self.output_layer=torch.nn.Linear(hidden_num,out_num)
        self.softmax=torch.nn.LogSoftmax()
    def forward(self,input_x):
        h_1 = self.sigmoid(self.input_layer(input_x))
        h_2 = self.softmax(self.output_layer(h_1))
        return h_2

in_num=100
hidden_num=50
out_num=2
batch_n=8
input_data = Variable(torch.randn(batch_n,in_num))
target = np.zeros([batch_n],dtype=np.int64)
for idx,t in enumerate(target):
    if idx%2==0:
        target[idx]=1
    else:target[idx]=0

target = Variable(torch.from_numpy(target))
net=network(in_num,hidden_num,out_num)
loss_function=torch.nn.NLLLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=1e-1, momentum=0.9)
for i in range(100):
    out=net(input_data)
    #print out
    loss=loss_function(out,target)
    print ("loss is %f"%loss.data.numpy())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

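This is a simple two-layer network. The last layer is a log-softmax, which together with NLLLoss gives the softmax cross-entropy loss. The training output is: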

loss is 0.744480
loss is 0.694724
loss is 0.674453
loss is 0.658577
loss is 0.596174
loss is 0.527202
loss is 0.478970
loss is 0.411746
loss is 0.320474
loss is 0.239478
loss is 0.176087
loss is 0.121769
loss is 0.077865
loss is 0.047481
loss is 0.028717
loss is 0.017663
loss is 0.011171
loss is 0.007291
loss is 0.004912
loss is 0.003414
loss is 0.002444
loss is 0.001799
loss is 0.001360

Training clearly works, and intermediate results can be printed at any point, which makes debugging easy.

Loss functions in PyTorch

First, the familiar cross-entropy loss:

class torch.nn.CrossEntropyLoss(weight=None, size_average=True)

The official documentation gives the computation, which is just the cross-entropy formula:

loss(x, class) = -log(exp(x[class]) / (\sum_j exp(x[j])))
               = -x[class] + log(\sum_j exp(x[j]))

Here x is a 2-D matrix of shape [batch_size, N] and class is a 1-D tensor of shape [batch_size]. This is the single-label case, where each sample belongs to exactly one class.

The library also provides another loss function:

class torch.nn.NLLLoss(weight=None, size_average=True)

It is computed as:

loss(x, class) = -x[class]

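At first glance it is not obvious what this is for. It is meant to be used together with:

nn.LogSoftmax()

The output of LogSoftmax is fed to NLLLoss as its input x, and the combination is exactly the softmax cross-entropy loss.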
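
The relationship between the two formulas can be checked for a single sample in plain Python. This is only an illustrative sketch: the helper names and the scores are made up, and it does not call the PyTorch classes themselves:

```python
import math

def log_softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    m = max(x)
    log_sum = m + math.log(sum(math.exp(v - m) for v in x))
    return [v - log_sum for v in x]

def nll_loss(log_probs, target):
    # loss(x, class) = -x[class]
    return -log_probs[target]

def cross_entropy(x, target):
    # loss(x, class) = -x[class] + log(sum_j exp(x[j]))
    return -x[target] + math.log(sum(math.exp(v) for v in x))

scores = [1.0, 2.0, 0.5]   # made-up scores for one sample, 3 classes
target = 1

via_nll = nll_loss(log_softmax(scores), target)
direct = cross_entropy(scores, target)
print(via_nll, direct)
```

Both values agree up to floating-point error, which is why LogSoftmax followed by NLLLoss can stand in for CrossEntropyLoss on raw scores.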

What about multi-label classification, where a sample can belong to several classes at once? PyTorch provides a loss for that too:

class torch.nn.MultiLabelSoftMarginLoss(weight=None, size_average=True)

It is computed as:

loss(x, y) = - sum_i (y[i] log( exp(x[i]) / (1 + exp(x[i])))
                      + (1-y[i]) log(1/(1+exp(x[i])))) / x:nElement()

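Here x is again a 2-D matrix of shape [batch_size, N], and y is a 2-D binary matrix of the same shape in which each row may mark several classes with 1.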
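
A plain-Python sketch of the formula above for a single sample may make it more concrete. The function name and the inputs are hypothetical, and the code follows the formula as written rather than calling the PyTorch class:

```python
import math

def multilabel_soft_margin(x, y):
    # x: raw scores for one sample, y: 0/1 labels of the same length.
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-xi))   # equals exp(x) / (1 + exp(x))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return -total / len(x)                # divide by the number of elements

x = [2.0, -1.0, 0.0]   # made-up scores for 3 classes
y = [1, 0, 1]          # classes 0 and 2 are both active
loss = multilabel_soft_margin(x, y)
print(loss)
```

Each class is scored as an independent yes/no decision, which is what allows several entries of y to be 1 at once.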

Also, with the default size_average=True, PyTorch averages these losses, so the loss value is a scalar.

Common networks in PyTorch: RNN and CNN

Here I simply paste the example code from the official documentation and give a brief introduction.

CNN in PyTorch

import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F


class MNISTConvNet(nn.Module):

    def __init__(self):
        # this is the place where you instantiate all your modules
        # you can later access them using the same names you've given them in
        # here
        super(MNISTConvNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, 5)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(10, 20, 5)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    # it's the forward function that defines the network structure
    # we're accepting only a single input in here, but if you want,
    # feel free to use more
    def forward(self, input):
        x = self.pool1(F.relu(self.conv1(input)))
        x = self.pool2(F.relu(self.conv2(x)))

        # in your model definition you can go full crazy and use arbitrary
        # python code to define your model structure
        # all these are perfectly legal, and will be handled correctly
        # by autograd:
        # if x.gt(0) > x.numel() / 2:
        #      ...
        #
        # you can even do a loop and reuse the same module inside it
        # modules no longer hold ephemeral state, so you can use them
        # multiple times during your forward pass
        # while x.norm(2) < 10:
        #    x = self.conv1(x)

        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return x
net = MNISTConvNet()
input = Variable(torch.randn(1, 1, 28, 28))
out = net(input)
print(out.size())

The arguments are not hard to understand. One call in the code deserves a note:

 nn.Conv2d(1, 10, 5)

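This means 1 input channel, 10 output channels, and a kernel of size [5,5]. When the kernel size is given as a single integer, as here, the kernel is square, with width and height both equal to 5.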

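One more point: unlike typical TensorFlow code, this PyTorch code does not spell out padding for any layer or stride for the convolutions. Working backward from the output sizes, the convolution behaves like "VALID" padding in TensorFlow with a stride of 1, and max-pooling also behaves like "VALID" padding with a stride equal to the pooling window. This matches the documented defaults: nn.Conv2d uses stride=1 and padding=0, and nn.MaxPool2d uses padding=0 and a stride equal to its kernel size when none is given (here nn.MaxPool2d(2, 2) passes the stride of 2 explicitly).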
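
The sizes can be traced with the standard output-size formula, floor((size + 2*padding - kernel) / stride) + 1. The helper below is a hypothetical illustration, not a PyTorch function; it shows where the 320 in nn.Linear(320, 50) comes from for a 28x28 input:

```python
def out_size(size, kernel, stride=1, padding=0):
    # Standard output-size formula for convolution and pooling layers.
    return (size + 2 * padding - kernel) // stride + 1

s = 28
s = out_size(s, kernel=5)             # conv1: 28 -> 24
s = out_size(s, kernel=2, stride=2)   # pool1: 24 -> 12
s = out_size(s, kernel=5)             # conv2: 12 -> 8
s = out_size(s, kernel=2, stride=2)   # pool2: 8 -> 4

flat = 20 * s * s                     # conv2 has 20 output channels
print(s, flat)                        # 4 320
```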

RNN in PyTorch

class RNN(nn.Module):

    # you can also accept arguments in your model constructor
    def __init__(self, data_size, hidden_size, output_size):
        super(RNN, self).__init__()

        self.hidden_size = hidden_size
        input_size = data_size + hidden_size

        self.i2h = nn.Linear(input_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)

    def forward(self, data, last_hidden):
        input = torch.cat((data, last_hidden), 1)
        hidden = self.i2h(input)
        output = self.h2o(hidden)
        return hidden, output


rnn = RNN(50, 20, 10)

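This code shows the simplest possible RNN and is not hard to follow. Note that it first concatenates the input data with the hidden state. This corresponds to the RNN formula $W \cdot V_w + h \cdot V_h + b$ (with $W$ the input and $h$ the previous hidden state); the two products are simply computed in one step on the concatenated vector. One thing I have not yet found is any mention of batched RNN operation in the official documentation; the NLP example code there does not appear to use batches either. This still needs looking into.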
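
That concatenation is equivalent to two separate products can be checked numerically. The sketch below uses NumPy rather than PyTorch, with made-up random data; the matrix V plays the role of the i2h weights and is stored as (input features x hidden size) for readability, which is an assumption of this example and not how nn.Linear lays out its weight:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, data_size, hidden_size = 4, 50, 20

x = rng.standard_normal((batch, data_size))     # input data
h = rng.standard_normal((batch, hidden_size))   # previous hidden state

# One weight matrix for the concatenated input, plus a bias.
V = rng.standard_normal((data_size + hidden_size, hidden_size))
b = rng.standard_normal(hidden_size)

# Splitting V by rows gives the two separate matrices of the formula.
V_x, V_h = V[:data_size], V[data_size:]

joint = np.concatenate([x, h], axis=1) @ V + b
split = x @ V_x + h @ V_h + b
print(np.allclose(joint, split))
```

The first dimension here is a batch of 4 samples, and the concatenation along axis 1 works row by row, just as torch.cat((data, last_hidden), 1) does in the class above.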

PyTorch vs. TensorFlow

Convenience and learning curve

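In convenience and ease of getting started, PyTorch is far ahead of TensorFlow. The reason is that PyTorch is based on dynamic graphs while TensorFlow is based on static computation graphs. In PyTorch you can print the value of a tensor at any time, whereas in TensorFlow you have to set up a callback-style mechanism to print it, and if you want to check that a variable has the right value, you can only use assert. TensorFlow really does fall short of PyTorch here, and PyTorch is also easier to pick up.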

Official documentation and community

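In the thoroughness of its official documentation and the activity of its community, TensorFlow is far ahead of PyTorch. With its distributed capabilities and the backing of many large companies, it looks very much alive, while PyTorch seems to lag a little.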

A small gripe about TensorFlow

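The painful part of TensorFlow is how quickly its API changes between versions, and the changes are substantial, so the cost of keeping up is rather high. I hope TensorFlow becomes more user-friendly in the future.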

Beijing, June 8, 2017
