Study Notes on the Official PyTorch English Documentation (I. Tensors)

A tensor is a data structure very similar to arrays and matrices.
In PyTorch, we use tensors to encode a model's inputs and outputs, as well as the model's parameters.

I. Initializing tensors
torch.tensor(data, dtype=None, device=None, requires_grad=False, pin_memory=False) → Tensor

·data: the data may be a list, a tuple, a NumPy ndarray, a scalar, or several other types.
·dtype: optional, defaults to None. If not set, the resulting Tensor copies the data type of the values passed in data; e.g. if the data are floats, the result is a torch.FloatTensor by default.
·device: optional, defaults to None. If not set, memory for the resulting Tensor is allocated on the current device.
·requires_grad: optional, defaults to False. When False, the created Tensor cannot take part in gradient computation; set it to True to enable gradient computation.
·pin_memory: optional, defaults to False. If True, the Tensor is allocated in pinned memory; this only applies to CPU tensors. (See the sketch below.)
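A minimal sketch exercising these parameters (the "cuda" device string assumes a GPU build, and requires_grad needs a floating-point dtype):

import torch

t = torch.tensor([[1, 2], [3, 4]],
                 dtype=torch.float32,   # override the inferred int64
                 device="cpu",          # or "cuda" on a GPU machine
                 requires_grad=True)    # record operations for autograd
print(t.dtype, t.device, t.requires_grad)  # torch.float32 cpu True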


The function for setting the requires_grad attribute
Tensor.requires_grad_(requires_grad=True) → Tensor
Change if autograd should record operations on this tensor: sets this tensor’s requires_grad attribute in-place. Returns this tensor.
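For example:

x = torch.zeros(3)
x.requires_grad_()      # in-place; the argument defaults to True
print(x.requires_grad)  # True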

The effect of with torch.no_grad()

Inside this context, every tensor produced by a computation has requires_grad automatically set to False, which can save a lot of GPU memory.
Even if a tensor x has requires_grad = True, a new tensor w computed from x inside with torch.no_grad() has requires_grad = False and grad_fn = None, i.e. w will not be differentiated. Example:

x = torch.rand(10, 5, requires_grad = True)
y = torch.rand(10, 5, requires_grad = True)
with torch.no_grad():
    w = x + y
    print(w.requires_grad)
    print(w.grad_fn)
#False
#None


1. Initializing directly from data
data=[[1,2],[3,4]]
x_data=torch.tensor(data)

2. Converting from a NumPy array

np_array=np.array(data)
x_np=torch.from_numpy(np_array)

t = torch.ones(5)
print(f"t: {t}")
n = t.numpy()  # conversely, a NumPy array can also be obtained from a tensor
print(f"n: {n}")
#t: tensor([1., 1., 1., 1., 1.])
#n: [1. 1. 1. 1. 1.]
t.add_(1)
print(f"t: {t}")
print(f"n: {n}")
#t: tensor([2., 2., 2., 2., 2.])
#n: [2. 2. 2. 2. 2.]
#Tensors on the CPU and NumPy arrays can share their underlying memory locations, so changing one also changes the other.

3. Deriving from the shape of another tensor

x_ones = torch.ones_like(x_data)
print(f"Ones Tensor: \n {x_ones} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float)
print(f"Random Tensor: \n {x_rand} \n")
#Ones Tensor:
# tensor([[1, 1],
#        [1, 1]])

#Random Tensor:
# tensor([[0.5900, 0.9701],
#        [0.3483, 0.4137]])

4. From random or constant values, given a shape tuple

shape = (2,3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)
print(f"Random Tensor: \n {
      
      rand_tensor} \n")
print(f"Ones Tensor: \n {
      
      ones_tensor} \n")
print(f"Zeros Tensor: \n {
      
      zeros_tensor}")
#Random Tensor:
# tensor([[0.8330, 0.5998, 0.8980],
#        [0.6941, 0.3293, 0.7102]])
#
#Ones Tensor:
# tensor([[1., 1., 1.],
#        [1., 1., 1.]])
#
#Zeros Tensor:
# tensor([[0., 0., 0.],
#        [0., 0., 0.]])



torch.Tensor(data): converts the input data to a torch.FloatTensor.
torch.Tensor() can create an empty FloatTensor, whereas torch.tensor() with no arguments raises an error. Note in the example below that torch.Tensor(1) treats 1 as a size and returns an uninitialized 1-element FloatTensor, while torch.tensor(1) stores the value 1 as a LongTensor.

a=torch.tensor(1)
print(a)
print(a.type())
a=torch.Tensor(1)
print(a)
print(a.type())
#tensor(1)
#torch.LongTensor
#tensor([1.0010e-38])
#torch.FloatTensor
II. Tensor attributes

Tensor.shape[i] gives the size of the i-th dimension

tensor = torch.rand(3,4)
print(f"Shape of tensor: {tensor.shape}")
print(f"Shape of tensor: {tensor.shape[0]}")
print(f"Shape of tensor: {tensor.shape[1]}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")
print(tensor.requires_grad) # whether autograd should compute gradients for this tensor; default False
print(tensor.grad) # stores the gradient values; initially None
print(tensor.grad_fn) # the function used to compute gradients during backpropagation
print(tensor.is_leaf) # whether this tensor is a leaf node in the computation graph
'''
A tensor created directly by the user is a leaf node;
a tensor produced by an operation on other tensors is not a leaf node.
'''
print(f"mean of tensor: {tensor.mean()}")
print(f"variance of tensor: {tensor.var()}")
#Shape of tensor: torch.Size([3, 4])
#Shape of tensor: 3
#Shape of tensor: 4
#Datatype of tensor: torch.float32
#Device tensor is stored on: cpu
#False
#None
#None
#True

requires_grad: True if gradients need to be computed for this tensor, otherwise False. When creating a tensor we can set requires_grad=True (the default is False).

grad_fn: records how a tensor was produced, so that gradients can be computed; for y = x*3, y.grad_fn records that y was computed from x. Leaf nodes created directly have grad_fn=None.

grad: after backward() has run, x.grad holds the gradient of x.

For a deeper discussion of requires_grad, grad, grad_fn, and is_leaf, refer to the article linked in the original post.
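A minimal sketch tying these attributes together:

x = torch.tensor([2.0], requires_grad=True)
y = x * 3                     # y is produced by an operation, so it is not a leaf
y.backward()                  # populate gradients
print(x.is_leaf, y.is_leaf)   # True False
print(y.grad_fn)              # <MulBackward0 object at 0x...>
print(x.grad)                 # tensor([3.]) since dy/dx = 3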


III. Operations on tensors

1. Tensors are created on the CPU by default. We need to explicitly move a tensor to the GPU with the .to method (after checking that a GPU is available):

if torch.cuda.is_available():
    tensor = tensor.to("cuda")


2. Indexing, slicing, and transposing (permute, transpose)

tensor = torch.ones(4, 4)
print(f"First row: {tensor[0]}")
print(f"First column: {tensor[:, 0]}")
print(f"Last column: {tensor[..., -1]}")  # either : or ... works here
tensor[:,1] = 2
print(tensor)
#First row: tensor([1., 1., 1., 1.])
#First column: tensor([1., 1., 1., 1.])
#Last column: tensor([1., 1., 1., 1.])
#tensor([[1., 2., 1., 1.],
#        [1., 2., 1., 1.],
#        [1., 2., 1., 1.],
#        [1., 2., 1., 1.]])

'''
def transpose(self, dim0, dim1): -> Tensor
def permute(self, *dims): -> Tensor
transpose: can only swap two dimensions of a tensor
permute: can reorder a tensor's dimensions in any specified order
'''
# a 4-D tensor is needed for the permute example below (the 4x4 tensor above is only 2-D)
tensor4d = torch.rand(2, 3, 4, 5)
tensor1 = tensor4d.transpose(0, 1)  # e.g. the element at index (0,1,1,0) moves to (1,0,1,0)
tensor2 = tensor4d.permute(1, 0, 3, 2)  # e.g. the element at index (0,1,1,0) moves to (1,0,0,1)

Note: in NumPy, by contrast, the method def transpose(self, *axes) reorders the axes of a numpy.array in any specified order (i.e. it behaves like PyTorch's permute).


3. squeeze and unsqueeze: removing and adding dimensions

torch.unsqueeze(input, dim) → Tensor

Returns a new tensor with a dimension of size one inserted at the specified position.
The returned tensor shares the same underlying data with this tensor.
A dim value within the range [-input.dim() - 1, input.dim() + 1) can be used. Negative dim will correspond to unsqueeze() applied at dim = dim + input.dim() + 1.

>>> x = torch.tensor([1, 2, 3, 4])  # shape is (4,)
>>> torch.unsqueeze(x, 0)  # shape becomes 1×4
tensor([[ 1,  2,  3,  4]])
>>> torch.unsqueeze(x, 1)  # shape becomes 4×1
tensor([[ 1],
        [ 2],
        [ 3],
        [ 4]])

torch.squeeze(input, dim=None) → Tensor

Returns a tensor with all the dimensions of input of size 1 removed.
For example, if input is of shape (A × 1 × B × C × 1 × D), then the output tensor will be of shape (A × B × C × D).
When dim is given, a squeeze operation is done only in the given dimension. If input is of shape (A × 1 × B), squeeze(input, 0) leaves the tensor unchanged, but squeeze(input, 1) will squeeze the tensor to the shape (A × B).

>>> x = torch.zeros(2, 1, 2, 1, 2)
>>> x.size()
torch.Size([2, 1, 2, 1, 2])

>>> y = torch.squeeze(x)
>>> y.size()
torch.Size([2, 2, 2])

>>> y = torch.squeeze(x, 0)#leaves the tensor unchanged
>>> y.size()
torch.Size([2, 1, 2, 1, 2])

>>> y = torch.squeeze(x, 1)
>>> y.size()
torch.Size([2, 2, 1, 2])


4. Arithmetic operations: torch.matmul and +, -, *, /

Basic arithmetic
a + b is equivalent to torch.add()
a - b is equivalent to torch.sub()
a * b is equivalent to torch.mul()
a / b is equivalent to torch.div()
These can be applied directly, along with element-wise square root via torch.sqrt

a=10000**(torch.arange(0, 256, 2).float() / 256)
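A quick check of these equivalences (values chosen arbitrarily):

a = torch.tensor([1.0, 4.0, 9.0])
b = torch.tensor([1.0, 2.0, 3.0])
print(torch.add(a, b))  # tensor([ 2.,  6., 12.]), same as a + b
print(torch.sub(a, b))  # tensor([0., 2., 6.]),    same as a - b
print(torch.mul(a, b))  # tensor([ 1.,  8., 27.]), same as a * b
print(torch.div(a, b))  # tensor([1., 2., 3.]),    same as a / b
print(torch.sqrt(a))    # tensor([1., 2., 3.])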

Multiplication (*, i.e. mul) breaks down into the cases below; division (/) behaves analogously.

(1) Tensor * scalar k: every element of the Tensor is multiplied by k

>>> a = torch.ones(3,4)
>>> a
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
>>> a * 2
tensor([[2., 2., 2., 2.],
        [2., 2., 2., 2.],
        [2., 2., 2., 2.]])

(2) Tensor * 1-D tensor
Tensor * row vector: each column is multiplied by the corresponding entry of the row vector
Tensor * column vector: each row is multiplied by the corresponding entry of the column vector

>>> a = torch.ones(3,4)
>>> a
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
>>> b = torch.Tensor([1,2,3,4])
>>> b
tensor([1., 2., 3., 4.])
>>> a * b
tensor([[1., 2., 3., 4.],
        [1., 2., 3., 4.],
        [1., 2., 3., 4.]])

>>> b = torch.Tensor([1,2,3]).reshape((3,1))
>>> b
tensor([[1.],
        [2.],
        [3.]])
>>> a * b
tensor([[1., 1., 1., 1.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.]])

(3) Tensor1 * Tensor2
If the two tensors have different numbers of dimensions, they can still be multiplied as long as Tensor2's shape matches the trailing dimensions of Tensor1;
Tensor2 is first broadcast (expanded by repetition) to Tensor1's shape, as the sketch below shows.

When Tensor1 and Tensor2 have the same number of dimensions, each pair of corresponding sizes must be equal; where they differ, one of them must be 1, otherwise an error is raised.
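A small sketch of case (3) for multiplication (shapes chosen for illustration):

a = torch.ones(2, 3, 4)
b = torch.full((3, 4), 2.)  # matches the trailing dimensions of a
print((a * b).shape)        # torch.Size([2, 3, 4]); b is broadcast along dim 0
c = torch.ones(2, 1, 4)
print((a * c).shape)        # torch.Size([2, 3, 4]); the size-1 dim is broadcast to 3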

Addition (+, i.e. add) breaks down into the same cases; subtraction (-) behaves analogously.
Cases (1) and (2) are as above.
(3) Tensor1 + Tensor2
If the two tensors have different numbers of dimensions, they can still be added as long as Tensor2's shape matches the trailing dimensions of Tensor1;
Tensor2 is first broadcast (expanded by repetition) to Tensor1's shape.

a = torch.ones([8, 4, 5, 6])
b = torch.ones([5, 6])
c = a+b
print(c.shape)#(8, 4, 5, 6)


When Tensor1 and Tensor2 have the same number of dimensions, each pair of corresponding sizes must be equal, or one of them must be 1.
a = torch.ones([8, 4, 5, 6])
b = torch.ones([8, 1, 5, 6])
c = torch.ones([8, 2, 5, 6])
a + b works: b is first broadcast to the same shape as a
a + c raises an error

The matmul operation

y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)  # matrix multiplication

y3 = torch.rand_like(y1)
torch.matmul(tensor, tensor.T, out=y3)
# y1, y2, y3 are identical

z1 = tensor * tensor
z2 = tensor.mul(tensor)  # element-wise multiplication of two tensors

z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)
# z1, z2, z3 are identical
#tensor([[1., 4., 1., 1.],
#       [1., 4., 1., 1.],
#       [1., 4., 1., 1.],
#       [1., 4., 1., 1.]])


5. torch.reshape(input, shape) → Tensor

'''
Parameters
input (Tensor) – the tensor to be reshaped
shape (tuple of python:ints) – the new shape
A single dimension may be -1, in which case it's inferred from the remaining dimensions and
 the number of elements in input.
'''
Example application: suppose the DataLoader's batch_size is 64, and after a convolution (out_channels=6) we want to visualize the result with TensorBoard, whose writer.add_images(output) expects color images with in_channels=3.
We then need to reshape the convolved images: torch.Size([64, 6, 30, 30]) is reshaped to (-1, 3, 30, 30), where -1 means the batch size is computed automatically.
Since the channel count drops from 6 to 3, the number of images grows accordingly, so the result is torch.Size([128, 3, 30, 30]), as sketched below.
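A minimal sketch of that reshape, with random data standing in for real feature maps:

output = torch.rand(64, 6, 30, 30)                 # conv output: batch of 6-channel maps
reshaped = torch.reshape(output, (-1, 3, 30, 30))  # -1 lets PyTorch infer the batch size
print(reshaped.shape)                              # torch.Size([128, 3, 30, 30])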


6. Tensor.view(*shape) → Tensor; sometimes contiguous() must be called first, as shown below

Returns a new tensor with the same data as the self tensor but of a different shape.

x = torch.randn(4, 4)
x.size()#torch.Size([4, 4])
y = x.view(16)
y.size()#torch.Size([16])
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
z.size()#torch.Size([2, 8])
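Why contiguous() is sometimes needed: view requires a compatible memory layout, and operations such as transpose return non-contiguous views. A small sketch:

xt = x.t()                    # transposing the 4×4 x above gives a non-contiguous view
# xt.view(16)                 # would raise a RuntimeError here
w = xt.contiguous().view(16)  # copy into contiguous memory first, then view works
print(w.size())               # torch.Size([16])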


Key example: for a 3×4 tensor t, reshape(t, (4, 3)) and t.transpose(-1, -2) produce the same new shape, but the values at corresponding positions differ: reshape and view simply refill the values into the new shape, whereas transpose actually transposes the matrix.
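A quick demonstration of the difference:

t = torch.arange(12).view(3, 4)
print(torch.reshape(t, (4, 3)))
# tensor([[ 0,  1,  2],
#         [ 3,  4,  5],
#         [ 6,  7,  8],
#         [ 9, 10, 11]])
print(t.transpose(-1, -2))
# tensor([[ 0,  4,  8],
#         [ 1,  5,  9],
#         [ 2,  6, 10],
#         [ 3,  7, 11]])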

7. torch.cat and torch.split

torch.cat(tensors, dim=0, *, out=None) → Tensor
Concatenates the given sequence of seq tensors in the given dimension. All tensors must either have the same shape (except in the concatenating dimension) or be empty.

Parameters:
tensors (sequence of Tensors) – any python sequence of tensors of the same type. Non-empty tensors provided must have the same shape, except in the cat dimension.
dim (int, optional) – the dimension over which the tensors are concatenated
dim=-1 means the last dimension

>>> x = torch.randn(2, 3)
>>> x
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497]])
>>> torch.cat((x, x, x), 0)  # shape is 6×3
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497]])
>>> torch.cat((x, x, x), 1)  # shape is 2×9
tensor([[ 0.6580, -1.0969, -0.4614,  0.6580, -1.0969, -0.4614,  0.6580,
         -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497, -0.1034, -0.5790,  0.1497, -0.1034,
         -0.5790,  0.1497]])
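torch.split(tensor, split_size_or_sections, dim=0) is the rough inverse of cat: it chops a tensor into chunks along a dimension, taking either a chunk size or a list of section sizes:

>>> x = torch.arange(10).reshape(5, 2)
>>> torch.split(x, 2)        # chunks of (up to) 2 rows along dim 0
(tensor([[0, 1],
         [2, 3]]),
 tensor([[4, 5],
         [6, 7]]),
 tensor([[8, 9]]))
>>> torch.split(x, [1, 4])   # explicit section sizes summing to 5
(tensor([[0, 1]]),
 tensor([[2, 3],
         [4, 5],
         [6, 7],
         [8, 9]]))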


8. torch.clamp(input, min=None, max=None, *, out=None) → Tensor

Clamps all elements in input into the range [ min, max ]. Letting min_value and max_value be min and max, respectively, this returns:

Formula: y_i = min(max(x_i, min_value_i), max_value_i)
That is, values outside the [min, max] range are clipped to min or max; values inside the range are unchanged.

If min is None, there is no lower bound. Or, if max is None there is no upper bound.

>>> a = torch.randn(4)
>>> a
tensor([-1.7120,  0.1734, -0.0478, -0.0922])
>>> torch.clamp(a, min=-0.5, max=0.5)
tensor([-0.5000,  0.1734, -0.0478, -0.0922])

>>> min = torch.linspace(-1, 1, steps=4)
>>> torch.clamp(a, min=min)
tensor([-1.0000,  0.1734,  0.3333,  1.0000])


9. torch.linspace(start, end, steps, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Creates a one-dimensional tensor of size steps whose values are evenly spaced from start to end, inclusive. That is, the values are:

(start, start + (end - start)/(steps - 1), ..., start + (steps - 2) * (end - start)/(steps - 1), end)

>>> torch.linspace(3, 10, steps=5)
tensor([  3.0000,   4.7500,   6.5000,   8.2500,  10.0000])
>>> torch.linspace(-10, 10, steps=5)
tensor([-10.,  -5.,   0.,   5.,  10.])
>>> torch.linspace(start=-10, end=10, steps=5)
tensor([-10.,  -5.,   0.,   5.,  10.])
>>> torch.linspace(start=-10, end=10, steps=1)
tensor([-10.])


10. repeat and expand

*Tensor.repeat(sizes) → Tensor
Repeats this tensor along the specified dimensions.

>>> x = torch.tensor([1, 2, 3])
>>> x.repeat(4, 2)  # x is treated as 1×3; dim 0 is tiled 4 times and dim 1 twice
tensor([[ 1,  2,  3,  1,  2,  3],
        [ 1,  2,  3,  1,  2,  3],
        [ 1,  2,  3,  1,  2,  3],
        [ 1,  2,  3,  1,  2,  3]])
>>> x.repeat(4, 2, 1).size()  # x is treated as 1×1×3; the dims are tiled 4, 2, and 1 times
torch.Size([4, 2, 3])

*Tensor.expand(sizes) → Tensor
Returns a new view of the self tensor with singleton dimensions expanded to a larger size.
Passing -1 as the size for a dimension means not changing the size of that dimension.

Tensor can be also expanded to a larger number of dimensions, and the new ones will be appended at the front. For the new dimensions, the size cannot be set to -1.

>>> x = torch.tensor([[1], [2], [3]])
>>> x.size()
torch.Size([3, 1])
>>> x.expand(3, 4)
tensor([[ 1,  1,  1,  1],
        [ 2,  2,  2,  2],
        [ 3,  3,  3,  3]])
>>> x.expand(-1, 4)   # -1 means not changing the size of that dimension
tensor([[ 1,  1,  1,  1],
        [ 2,  2,  2,  2],
        [ 3,  3,  3,  3]])
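Expanding to a larger number of dimensions appends the new ones at the front; a quick check with the same x:

>>> x.expand(2, 3, 4).size()   # new leading dim of size 2; -1 is not allowed for new dims
torch.Size([2, 3, 4])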


11. torch.sum(input, dim, keepdim=False, *, dtype=None)

Returns the sum of each row of the input tensor in the given dimension dim. If dim is a list of dimensions, reduce over all of them.
Parameters:
·input (Tensor) – the input tensor.
·dim (int or tuple of ints, optional) – the dimension or dimensions to reduce. If None, all dimensions are reduced.
·keepdim (bool) – whether the output tensor has dim retained or not.

Keyword Arguments:
·dtype (torch.dtype, optional) – the desired data type of returned tensor. If specified, the input tensor is casted to dtype before the operation is performed. This is useful for preventing data type overflows. Default: None.

>>> a = torch.arange(3*2*2).view(2, 2, 3)
>>> print(a)
tensor([[[ 0,  1,  2],
         [ 3,  4,  5]],

        [[ 6,  7,  8],
         [ 9, 10, 11]]])
>>> b = torch.sum(a, (1, 2))
>>> # the elements at (0,0,0), (0,0,1), (0,0,2), (0,1,0), (0,1,1), (0,1,2) are summed, and so on
>>> print(b)
tensor([15, 51])

>>> c = torch.sum(a, 0)  # i.e. the elements at (0,0,0) and (1,0,0) are summed, and so on
>>> print(c)
tensor([[ 6,  8, 10],
        [12, 14, 16]])
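With keepdim=True the reduced dimensions are kept with size 1, which is convenient for later broadcasting:

>>> d = torch.sum(a, (1, 2), keepdim=True)
>>> print(d.shape)
torch.Size([2, 1, 1])
>>> print(d)
tensor([[[15]],

        [[51]]])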


12. torch.cumprod

>>> a = torch.Tensor([[0.1, 0.2], [0.3, 0.4]])
>>> b = torch.cumprod(a, dim=0)  # along dim 0, so the entries at (0,0) and (1,0) are cumulatively multiplied
tensor([[0.1000, 0.2000],
        [0.0300, 0.0800]])
>>> b = torch.cumprod(a, dim=1)
tensor([[0.1000, 0.0200],
        [0.3000, 0.1200]])

13. torch.flatten(n): flattens all dimensions from dim n onward
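A quick sketch (as a Tensor method, the start dimension is the first argument):

>>> a = torch.rand(2, 3, 4, 5)
>>> a.flatten(1).shape   # flatten everything from dim 1 onward
torch.Size([2, 60])
>>> a.flatten(2).shape
torch.Size([2, 3, 20])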

14. Single-element tensors and in-place operations

agg = tensor.sum().item()  # .item() converts a one-element tensor to a Python number
print(agg, type(agg))
tensor.add_(5)
# In-place operations: operations that store the result into the operand are called in-place.
# They are denoted by a _ suffix.

Reposted from blog.csdn.net/m0_50364811/article/details/126650237