[Deep Learning] Tensor Broadcast Topics

1. Description

        Tensor broadcasting is a mechanism that automatically expands a lower-dimensional tensor to match a higher-dimensional one, so that element-wise operations (such as addition, subtraction, and multiplication) can be performed between tensors of different shapes. During broadcasting, the tensor with fewer dimensions is conceptually replicated along axes of length 1; once the shapes match, the two tensors can be operated on together.

2. The basic concept of tensor

        Broadcasting occurs when a smaller tensor is "stretched" to have a compatible shape with the larger tensor in order to perform an operation.


Broadcasting can be an efficient way to perform tensor operations without creating duplicate data.

According to PyTorch, tensors are "broadcastable" when:

Every tensor has at least one dimension

When iterating over the dimension sizes, starting from the trailing dimension, each pair of sizes must be equal, one of them must be 1, or one of them must not exist

When comparing shapes, the trailing dimension is the rightmost number.

In the image above, the general process can be seen:

1. Determine if the rightmost dimensions are compatible

  • Does every tensor have at least one dimension?
  • Are the sizes equal? Is one of them 1? Does one of them not exist?

2. Stretch the dimensions to the appropriate size

3. Repeat the above steps for the next dimension

These steps can be seen in the example below.
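Before moving to concrete examples, the rules above can be sketched as a small helper function. This is a minimal illustration, not part of PyTorch; `broadcast_shape` is a hypothetical name, while `torch.broadcast_shapes` is the built-in equivalent.

```python
import torch

def broadcast_shape(shape_a, shape_b):
    """Compute the broadcast result shape following the rules above."""
    result = []
    # Walk both shapes from the trailing (rightmost) dimension;
    # a missing dimension is treated as size 1.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        a = shape_a[-i] if i <= len(shape_a) else 1
        b = shape_b[-i] if i <= len(shape_b) else 1
        if a != b and a != 1 and b != 1:
            raise ValueError(f"incompatible sizes {a} and {b}")
        result.append(max(a, b))
    return tuple(reversed(result))

print(broadcast_shape((2, 3, 3), (3, 1)))         # (2, 3, 3)
print(torch.broadcast_shapes((2, 3, 3), (3, 1)))  # torch.Size([2, 3, 3])
```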

3. Element-level operations

        All element-wise operations require the tensors to have the same shape; broadcasting is what produces matching shapes when they differ.

3.1 Vector and scalar examples

import torch
a = torch.tensor([1, 2, 3])
b = 2  # broadcast as if it were tensor([2, 2, 2])

a * b
tensor([2, 4, 6])

        In this example, the scalar b behaves like a zero-dimensional tensor and the vector has shape (3,). As shown, b is broadcast to shape (3,) and the Hadamard (element-wise) product behaves as expected.

3.2 Matrix and vector example 1


        In this example, A has shape (3, 3) and b has shape (3,).

When the multiplication occurs, the vector is replicated row by row to form a matrix, as shown in the image above. Now, both A and b have shape (3, 3).

        This can be seen below.


A = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

b = torch.tensor([1, 2, 3])

A * b
tensor([[ 1,  4,  9],
        [ 4, 10, 18],
        [ 7, 16, 27]])
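The stretching in this example can be made explicit with `expand_as`, which creates a broadcast view of `b` without copying data. This is a quick check, not part of the original example:

```python
import torch

A = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
b = torch.tensor([1, 2, 3])

# expand_as replicates b's single row as a view (no data copy)
b_expanded = b.expand_as(A)
print(b_expanded)
# tensor([[1, 2, 3],
#         [1, 2, 3],
#         [1, 2, 3]])

# The broadcast product equals the product with the expanded tensor
assert torch.equal(A * b, A * b_expanded)
```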

3.3 Matrix and vector example 2


        In this example, A has shape (3, 3) and b has shape (3, 1).

        When the multiplication occurs, the column vector is replicated to create two additional columns, as shown in the image above. Now, both A and b have shape (3, 3).

A = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

b = torch.tensor([[1], 
                  [2], 
                  [3]])
A * b
tensor([[ 1,  2,  3],
        [ 8, 10, 12],
        [21, 24, 27]])
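The difference between this example and the previous one comes down to the vector's shape: a (3,) row vector is stretched down the rows, so it scales the columns of A, while a (3, 1) column vector is stretched across the columns, so it scales the rows. A side-by-side check (illustrative only):

```python
import torch

A = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
row = torch.tensor([1, 2, 3])   # shape (3,): scales column j by row[j]
col = row.reshape(3, 1)         # shape (3, 1): scales row i by col[i]

print(A * row)
# tensor([[ 1,  4,  9],
#         [ 4, 10, 18],
#         [ 7, 16, 27]])
print(A * col)
# tensor([[ 1,  2,  3],
#         [ 8, 10, 12],
#         [21, 24, 27]])
```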

Tensor and vector example

         In this example, A is a tensor of shape (2, 3, 3) and b is a column vector of shape (3, 1).

A = (2, 3, 3)
b = ( , 3, 1)

        Starting with the rightmost dimension, b's single column is stretched to produce a (3, 3) matrix. The middle dimensions are equal, so at this point b is simply a (3, 3) matrix. The leftmost dimension of b does not exist, so one must be added; the matrix is then broadcast along it to reach shape (2, 3, 3). Now there are two (3, 3) matrices, which can be seen in the figure above.

        This allows the Hadamard product to be computed, producing a (2, 3, 3) result:

A = torch.tensor([[[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]],

                  [[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]]])

b = torch.tensor([[1], 
                  [2], 
                  [3]])

A * b
tensor([[[ 1,  2,  3],
         [ 8, 10, 12],
         [21, 24, 27]],

        [[ 1,  2,  3],
         [ 8, 10, 12],
         [21, 24, 27]]])
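The two broadcasting steps described above can be reproduced manually with `expand` and `unsqueeze`. This is a sketch to verify the shapes; `b_manual` is an illustrative name:

```python
import torch

A = torch.arange(1, 10).reshape(3, 3).repeat(2, 1, 1)  # shape (2, 3, 3)
b = torch.tensor([[1],
                  [2],
                  [3]])                                # shape (3, 1)

# Step 1: stretch the trailing dimension 1 -> 3
# Step 2: add the missing leading dimension, then stretch it 1 -> 2
b_manual = b.expand(3, 3).unsqueeze(0).expand(2, 3, 3)

assert b_manual.shape == (2, 3, 3)
# The manual expansion gives the same product as broadcasting
assert torch.equal(A * b, A * b_manual)
```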

3.4 Tensor and matrix examples

        In this example, A is a tensor of shape (2, 3, 3) and B is a matrix of shape (3, 3).

A = (2, 3, 3)
B = ( , 3, 3)

        This example is simpler than the previous one because the two rightmost dimensions already match. The matrix B only needs to be broadcast along the leftmost dimension to reach shape (2, 3, 3); in other words, one extra copy of the matrix is required.

        Computing the Hadamard product then yields a result of shape (2, 3, 3).

A = torch.tensor([[[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]],
                   
                  [[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]]])

B = torch.tensor([[1, 2, 3], 
                  [1, 2, 3], 
                  [1, 2, 3]])

A * B
tensor([[[ 1,  4,  9],
         [ 4, 10, 18],
         [ 7, 16, 27]],

        [[ 1,  4,  9],
         [ 4, 10, 18],
         [ 7, 16, 27]]])
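That "extra copy" of the matrix is only conceptual: expanding B adds a leading dimension with stride 0, so no data is actually duplicated. A quick check of the view mechanics:

```python
import torch

B = torch.tensor([[1, 2, 3],
                  [1, 2, 3],
                  [1, 2, 3]])

# The broadcast copy is a view: the new leading dimension has stride 0
B3 = B.unsqueeze(0).expand(2, 3, 3)
print(B3.shape)     # torch.Size([2, 3, 3])
print(B3.stride())  # (0, 3, 1)
assert B3.data_ptr() == B.data_ptr()  # same underlying storage
```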

4. Matrix and tensor multiplication and dot product

        In all of the previous examples, the goal was to end up with matching shapes so that element-wise multiplication could be performed. The goal of this section is matrix and tensor multiplication via the dot product, which requires the last dimension of the first operand to match the second-to-last dimension of the second operand.

        For matrix multiplication:

  • (m, n) x (n, r) = (m, r)

        For 3D tensor multiplication:

  • (c, m, n) x (c, n, r) = (c, m, r)

For 4D tensor multiplication:

  • (z, c, m, n) x (z, c, n, r) = (z, c, m, r)
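The shape rules above can be checked directly with random tensors (the sizes here are arbitrary):

```python
import torch

m, n, r, c, z = 2, 3, 4, 5, 6

# Matrix multiplication: (m, n) x (n, r) = (m, r)
assert (torch.randn(m, n) @ torch.randn(n, r)).shape == (m, r)

# 3D batched multiplication: (c, m, n) x (c, n, r) = (c, m, r)
assert (torch.randn(c, m, n) @ torch.randn(c, n, r)).shape == (c, m, r)

# 4D batched multiplication: (z, c, m, n) x (z, c, n, r) = (z, c, m, r)
assert (torch.randn(z, c, m, n) @ torch.randn(z, c, n, r)).shape == (z, c, m, r)
```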

Example

        For this example, A has shape (2, 3, 3) and B has shape (3, 2). The last two dimensions already qualify for dot-product multiplication. A leading dimension needs to be added to B, and the (3, 2) matrix is broadcast across it to create a shape of (2, 3, 2).

        The result of this tensor multiplication is (2, 3, 3) x (2, 3, 2) = (2, 3, 2).

A = torch.tensor([[[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]],
                   
                  [[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]]])

B = torch.tensor([[1, 2], 
                  [1, 2], 
                  [1, 2]])

A @ B  # equivalent to A.matmul(B)
tensor([[[ 6, 12],
         [15, 30],
         [24, 48]],

        [[ 6, 12],
         [15, 30],
         [24, 48]]])

        Additional information about broadcasting, tensors, and their operations can be found in the PyTorch documentation.


Origin blog.csdn.net/gongdiwudu/article/details/131758414