PyTorch's Tensor (Part 1)

Background

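What is a neural network mainly made of? Roughly: a variety of ops, the way those ops are combined, and a loss. Mapped onto an organization, these correspond to the people in it, its structure, and its system of rewards and penalties. Seen this way, everything from a single person to an entire country has the same shape, like a self-similar fractal. In a neural network, each op has more or fewer parameters, and the Tensor is the foundation for storing and computing with those parameters.

Going a step further, PyTorch describes itself as a numpy with GPU support that can also differentiate automatically. What does that actually mean? In this article, Gemfield covers Tensor from the following angles: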

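1. How do you create a tensor? What happens when a tensor is created?

2. What is the difference between a CUDA tensor and a CPU tensor? How do you convert between the two, and what happens during the conversion?

3. For a method call on a tensor, where is the actual execution logic defined? How does execution differ between a CPU tensor and a CUDA tensor?

4. More importantly, what does a tensor's requires_grad flag mean? And what does a tensor's grad_fn attribute represent?

5. Finally, the unavoidable question: what happens when backward() is called on a tensor? And what does the grad attribute on each tensor mean?

Welcome, then, to the PyTorch Tensor series, which is probably the lowest-level treatment of tensors among PyTorch articles. This is the first article in the series and is devoted to how a Tensor is created.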

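The PyTorch Tensor inheritance hierarchy in Python

In Gemfield's article "详解Pytorch中的网络构造" (on network construction in PyTorch), Gemfield mentioned that all learnable parameters (such as weights and bias) are of type Parameter, and that Parameter's parent class is torch.Tensor. Parameter differs from torch.Tensor in only four ways: it reimplements serialization, printing, and deep copy, and its requires_grad defaults to True. The parent class of torch.Tensor is in turn torch._C._TensorBase. The Tensor inheritance hierarchy looks like this: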

# The Parameter class is defined in Python
class Parameter(torch.Tensor)

# The torch.Tensor class is defined in Python
class Tensor(torch._C._TensorBase)

// The Variable class is defined in C++
struct TORCH_API Variable : public at::Tensor

// PyObject* Py_InitModule(char *name, PyMethodDef *methods)
// Creates torch._C
Py_InitModule("torch._C", methods.data());

// Creates torch._C._TensorBase
PyModule_AddObject(module, "_TensorBase",   (PyObject *)&THPVariableType);
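The Python half of this hierarchy can be inspected at runtime. The following is a minimal sketch, assuming PyTorch is installed; `tensor_hierarchy` is a hypothetical helper written for this illustration, and the exact name of the C-level base class may differ between PyTorch versions.

```python
import importlib.util


def tensor_hierarchy():
    """Return the class names along Parameter's MRO, or None without torch."""
    # Skip quietly when PyTorch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    # __mro__ lists Parameter, its parent torch.Tensor, the C-level base
    # class exported by torch._C, and finally object.
    return [f"{c.__module__}.{c.__qualname__}" for c in torch.nn.Parameter.__mro__]


chain = tensor_hierarchy()
if chain is not None:
    for name in chain:
        print(name)
```

The entry printed right after torch.Tensor is the class that the C++ side registers on torch._C.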

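To understand PyTorch's Tensor, we need to understand its inheritance hierarchy and how each class differs from its parent. Gemfield starts at the very bottom, with the extension module torch._C. When you import torch, Python runs the code in torch/__init__.py, as the language specifies: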

from torch._C import *
......
__all__ += [name for name in dir(_C) if name[0] != '_' and not name.endswith('Base')]
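To see what this filter keeps, here is a small sketch that applies the same list comprehension to a stand-in module. `fake_C` and every name placed on it are hypothetical and only mimic the kinds of names that torch._C exports.

```python
import types

# Hypothetical stand-in for torch._C; none of these are real PyTorch symbols.
fake_C = types.ModuleType("fake_C")
fake_C.add = lambda a, b: a + b                             # public function: kept
fake_C._initExtension = lambda: None                        # leading "_": dropped
fake_C._TensorBase = type("_TensorBase", (), {})            # leading "_": dropped
fake_C.FloatStorageBase = type("FloatStorageBase", (), {})  # "Base" suffix: dropped

# The same filter that torch/__init__.py applies to dir(_C).
exported = [name for name in dir(fake_C)
            if name[0] != '_' and not name.endswith('Base')]
print(exported)  # ['add']
```

Only names with neither a leading underscore nor a "Base" suffix end up in `__all__`, which is why torch._C._TensorBase is not re-exported at the top level of torch.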

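The key here is torch._C, because the Tensor class inherits from torch._C._TensorBase. In order of creation (that is, initialization order): first torch._C comes into being, then torch._C._TensorBase, and then torch.Tensor, which inherits from torch._C._TensorBase. But torch._C is implemented in C++. For Python to be able to import torch._C, the symbol has to be exported through Python's extension mechanism, and that is what PyTorch does: on Python 2 it uses the Py_InitModule API, and on Python 3 it uses the PyModule_Create API. Either way, the call creates the Python object torch._C: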

//name is "torch._C"
PyObject* Py_InitModule(char *name, PyMethodDef *methods)

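A list of functions is also registered on torch._C, and it is a very long list. Each function is described by a PyMethodDef, and these are stored in several very long lists. The names of these symbols all show up in dir(torch), except those with a "_" prefix or a "Base" suffix. With this long methods list in hand, the CPython API can create a new Python module, and torch._C is born.

The next job is to inject some (in fact many) members into the torch._C object. One of them is torch._C._TensorBase, which is initialized by the following call: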

// From civilnet's torch/csrc/autograd/python_variable.cpp
bool THPVariable_initModule(PyObject *module)
{
  static std::vector<PyMethodDef> methods;
  THPUtils_addPyMethodDefs(methods, torch::autograd::variable_methods);
  THPUtils_addPyMethodDefs(methods, extra_methods);
  THPVariableType.tp_methods = methods.data();
  if (PyType_Ready(&THPVariableType) < 0)
    return false;
  Py_INCREF(&THPVariableType);
  PyModule_AddObject(module, "_TensorBase",   (PyObject *)&THPVariableType);
  torch::autograd::initTorchFunctions(module);
  return true;
}

When THPVariable_initModule runs, it uses

PyModule_AddObject(module, "_TensorBase",   (PyObject *)&THPVariableType);

to register THPVariableType as torch._C._TensorBase. So now you know: torch._C._TensorBase is THPVariableType in C++ (its type is PyTypeObject, the most important type in the Python object system). With the Python class torch._C._TensorBase registered, the next step is to register functions on it:

THPUtils_addPyMethodDefs(methods, torch::autograd::variable_methods);
THPUtils_addPyMethodDefs(methods, extra_methods);
......
torch::autograd::initTorchFunctions(module);

Here, torch::autograd::variable_methods contains the following 358 methods:

// torch::autograd::variable_methods, from syszux's
// torch/csrc/autograd/generated/python_variable_methods.cpp

"__add__", (PyCFunction)THPVariable_add
......
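The registration steps above can be modeled in pure Python. This is only an analogy under stated assumptions: a dict stands in for the PyMethodDef table, `type(...)` for the type object made ready by PyType_Ready, and `setattr` for PyModule_AddObject. The names `fake_C`, `tensor_init` and `tensor_add` are hypothetical; the real implementations are C++.

```python
import types


def tensor_init(self, value):
    self.value = value


def tensor_add(self, other):
    # Stand-in for THPVariable_add, which in PyTorch is generated C++ code.
    return type(self)(self.value + other.value)


# Plays the role of the PyMethodDef table: Python-visible name -> implementation.
method_table = {"__init__": tensor_init, "__add__": tensor_add}

# Plays the role of THPVariableType once its method table is attached.
TensorBase = type("_TensorBase", (object,), method_table)

# Plays the role of PyModule_AddObject(module, "_TensorBase", ...).
fake_C = types.ModuleType("fake_C")
setattr(fake_C, "_TensorBase", TensorBase)


# The Python-side subclass, analogous to `class Tensor(torch._C._TensorBase)`.
class Tensor(fake_C._TensorBase):
    def __repr__(self):
        return f"Tensor({self.value})"


result = Tensor(2) + Tensor(3)
print(result)  # Tensor(5)
```

The subclass defines no `__add__` of its own; the `+` operator finds it on the base class, just as torch.Tensor picks up `__add__` from the method table registered in C++.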

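Also, right after torch._C._TensorBase is initialized, torch._C._VariableFunctions (which holds a large number of methods) is initialized. torch._C._VariableFunctions mainly exposes symbols for torch/functional.py to use. That completes the initialization of torch._C._TensorBase. One big question remains unanswered, though: where are the 359 methods on torch._C._TensorBase actually implemented? Isn't that the most important part? And wait... since the start of this article we have already come across three groups of functions: the methods of torch._C, of torch._C._TensorBase, and of torch._C._VariableFunctions. In this section we focus on the methods of torch._C._TensorBase. These are all torch::autograd::variable_methods. Does that ring a bell? As described in Gemfield's article "PyTorch Autograd代码的动态生成" (on the dynamic generation of PyTorch Autograd code), their implementations are generated dynamically, and the inline dispatch functions defined in the generated python_variable_methods_dispatch.h forward the logic of these variable_methods to the corresponding methods of the Tensor class, such as native, cuda, and so on.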
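The idea of forwarding one Python-visible method to a backend-specific implementation can be sketched as a lookup table keyed by backend. This is a toy model, not PyTorch's generated dispatch code: `FakeTensor`, `add_cpu`, `add_cuda` and `dispatch_add` are all hypothetical names, and the "cuda" function here runs on the CPU and merely tags its result.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    backend: str   # "cpu" or "cuda"
    values: list


def add_cpu(a, b):
    return FakeTensor("cpu", [x + y for x, y in zip(a.values, b.values)])


def add_cuda(a, b):
    # A real CUDA kernel would run on the GPU; this one only tags the result.
    return FakeTensor("cuda", [x + y for x, y in zip(a.values, b.values)])


# One entry per backend, selected at call time.
ADD_KERNELS = {"cpu": add_cpu, "cuda": add_cuda}


def dispatch_add(a, b):
    """Pick the implementation that matches the operands' backend."""
    if a.backend != b.backend:
        raise ValueError("operands live on different backends")
    return ADD_KERNELS[a.backend](a, b)


out = dispatch_add(FakeTensor("cpu", [1, 2]), FakeTensor("cpu", [3, 4]))
print(out)  # FakeTensor(backend='cpu', values=[4, 6])
```

The caller only ever sees `dispatch_add`; which function does the work depends on where the data lives.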

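The PyTorch Tensor inheritance hierarchy in C++

The most important properties of a tensor are its dimension information, its values, its grad, its type, its backend, and so on. More importantly, a tensor needs careful memory management. In C++, a tensor is built from these low-level classes: DataPtr, StorageImpl, Storage, TensorImpl, Tensor, Variable::Impl, Variable, and AutogradMeta. The hierarchy looks like this:

# Vertical arrows mean inheritance, horizontal arrows mean "is contained in", and () marks a class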

DataPtr -> StorageImpl ->  Storage ->  (TensorImpl)  ->  (Tensor)
                                             |              |
                                             v              v
                            (Tensor) ->  Variable::Impl   Variable  -> AutogradMeta -> (TensorImpl)
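The containment and inheritance relationships in the diagram can be mirrored with plain Python classes. This is a toy model that follows the arrows above and nothing more; the field names (`buffer`, `impl`, `sizes`, `data`, and so on) are assumptions chosen for readability, not the real C++ member names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataPtr:
    buffer: bytearray            # the raw memory


@dataclass
class StorageImpl:
    data_ptr: DataPtr            # DataPtr is contained in StorageImpl


@dataclass
class Storage:
    impl: StorageImpl            # StorageImpl is contained in Storage


@dataclass
class AutogradMeta:
    grad: Optional["Variable"] = None   # a Variable is contained in AutogradMeta


@dataclass
class TensorImpl:
    storage: Storage                              # Storage is contained in TensorImpl
    sizes: tuple = ()
    autograd_meta: Optional[AutogradMeta] = None  # AutogradMeta is contained too


@dataclass
class Tensor:
    impl: TensorImpl             # TensorImpl is contained in Tensor


class Variable(Tensor):          # vertical arrow: Variable inherits from Tensor
    pass


@dataclass
class VariableImpl(TensorImpl):  # vertical arrow: Variable::Impl inherits from TensorImpl
    data: Optional[Tensor] = None   # a Tensor is contained in Variable::Impl


# Two tensors with different shapes that share one underlying buffer.
storage = Storage(StorageImpl(DataPtr(bytearray(16))))
a = Tensor(TensorImpl(storage, sizes=(4,)))
b = Tensor(TensorImpl(storage, sizes=(2, 2)))
print(a.impl.storage.impl.data_ptr is b.impl.storage.impl.data_ptr)  # True
```

Keeping shape information in TensorImpl and memory in Storage is what lets two tensors with different shapes refer to the same buffer.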

(Figure: a flowchart of the hierarchy above, redrawn by the reposter.)
