Hands-on Deep Learning - Basics of Linear Algebra

A brief introduction to the basic mathematical objects, arithmetic, and operations of linear algebra, representing them with mathematical notation and corresponding code implementations (PyTorch).

1.1 Scalar

A scalar is represented by a tensor with only one element.

Example: Instantiate two scalars and perform some familiar arithmetic operations, namely addition, multiplication, division, and exponentiation.

import torch

x = torch.tensor(3.0)
y = torch.tensor(2.0)

x + y, x * y, x / y, x**y

output:

(tensor(5.), tensor(6.), tensor(1.5000), tensor(9.))

1.2 Vectors

A vector can be thought of as a list of scalar values. We call these scalar values the elements or components of the vector.

Example:

x = torch.arange(4)
x

output:

tensor([0, 1, 2, 3])

We can use a subscript to refer to any element of a vector. For example, we can refer to the i-th element as x[i].

x[3]

output:

tensor(3)

A vector is just an array of numbers, and just as every array has a length, so does every vector.
As with normal Python arrays, we can access the length of a tensor by calling Python's built-in len() function.

len(x)

output:

4

When representing a vector (with only one axis) as a tensor, we can also access the length of the vector via the .shape attribute. The shape is a tuple listing the length (dimensionality) of the tensor along each axis. For tensors with only one axis, the shape has just one element.

x.shape

output:

torch.Size([4])

Note that the word "dimension" tends to have different meanings in different contexts, which is often confusing. For clarity: the dimension of a vector or an axis refers to its length, i.e., the number of elements of the vector or axis. However, the dimension of a tensor refers to the number of axes the tensor has. In this sense, the dimension of some axis of a tensor is the length of that axis.
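
A minimal sketch of this distinction, using the tensor attributes ndim (the number of axes) and shape (the length along each axis):

M = torch.arange(6).reshape(2, 3)
M.ndim   # number of axes, i.e. the "dimension" of the tensor: 2
M.shape  # the length ("dimension") of each axis: torch.Size([2, 3])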

1.3 Matrices

Just as vectors generalize scalars from order zero to order one, matrices generalize vectors from order one to order two. In code, a matrix is represented as a tensor with two axes.
When calling a function to instantiate a tensor, we can create an m×n matrix by specifying the two components m and n of its shape.

A = torch.arange(20).reshape(5, 4)
A

output:

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]])

Transpose of a matrix:

A.T

output:

tensor([[ 0,  4,  8, 12, 16],
        [ 1,  5,  9, 13, 17],
        [ 2,  6, 10, 14, 18],
        [ 3,  7, 11, 15, 19]])

Define a matrix:

B = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
B

output:

tensor([[1, 2, 3],
        [2, 0, 4],
        [3, 4, 5]])

Comparing matrices:
Only matrices with the same shape can be compared element-wise. Here we compare B with its transpose; every entry matches because B is a symmetric matrix:

B == B.T

output:

tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])

1.4 Tensors

Just as a vector is a generalization of a scalar and a matrix is a generalization of a vector, we can build data structures with even more axes.
Tensors give us a generic way to describe n-dimensional arrays with an arbitrary number of axes. For example, a vector is a first-order tensor, and a matrix is a second-order tensor.
Tensors become even more important when we start working with images, which arrive as n-dimensional arrays with 3 axes corresponding to the height, the width, and a channel axis for the color channels (red, green, and blue).

X = torch.arange(24).reshape(2, 3, 4)
X

output:

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])
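
For instance, a single RGB image could be stored as a third-order tensor. The channels-first (3, height, width) layout below is just one common convention, used here for illustration:

img = torch.zeros(3, 32, 32)  # a hypothetical 32x32 RGB image: (channels, height, width)
img.shape                     # torch.Size([3, 32, 32])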

Given any two tensors with the same shape, the result of any element-wise binary operation will be a tensor of the same shape. For example, adding two matrices of the same shape performs element-wise addition on both matrices.

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = A.clone()  # Assign a copy of A to B by allocating new memory
A, A + B, A * B

output:

(tensor([[ 0.,  1.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.],
         [12., 13., 14., 15.],
         [16., 17., 18., 19.]]),
 tensor([[ 0.,  2.,  4.,  6.],
         [ 8., 10., 12., 14.],
         [16., 18., 20., 22.],
         [24., 26., 28., 30.],
         [32., 34., 36., 38.]]),
 tensor([[  0.,   1.,   4.,   9.],
         [ 16.,  25.,  36.,  49.],
         [ 64.,  81., 100., 121.],
         [144., 169., 196., 225.],
         [256., 289., 324., 361.]]))

Adding a scalar to a tensor, or multiplying a tensor by a scalar, does not change the tensor's shape: each element of the tensor is added to, or multiplied by, the scalar.

a = 2
X = torch.arange(24).reshape(2, 3, 4)
a + X, (a * X).shape

output:

(tensor([[[ 2,  3,  4,  5],
          [ 6,  7,  8,  9],
          [10, 11, 12, 13]],

         [[14, 15, 16, 17],
          [18, 19, 20, 21],
          [22, 23, 24, 25]]]),
 torch.Size([2, 3, 4]))

1.5 Dot Product

The dot product of two vectors is the sum of the products of the elements at the same positions. Among other uses, a dot product can express a weighted average: when the weights are non-negative and sum to one, the dot product of the values with the weights gives the weighted average of the values (see the sketch after the example below).

x = torch.arange(4, dtype=torch.float32)
y = torch.ones(4, dtype=torch.float32)
x, y, torch.dot(x, y)

output:

(tensor([0., 1., 2., 3.]), tensor([1., 1., 1., 1.]), tensor(6.))
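
As a sketch of the weighted-average interpretation (the values and weights below are made up): when the weights are non-negative and sum to one, the dot product of the values with the weights is exactly their weighted average. Equivalently, we could write torch.sum(values * weights).

values = torch.tensor([10.0, 20.0, 30.0])
weights = torch.tensor([0.2, 0.3, 0.5])  # non-negative and sum to one
torch.dot(values, weights)               # weighted average: tensor(23.)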

1.6 Matrix Multiplication

Example:

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = torch.ones(4, 3)
A, B, torch.mm(A, B)

output:

(tensor([[ 0.,  1.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.],
         [12., 13., 14., 15.],
         [16., 17., 18., 19.]]),
 tensor([[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]]),
 tensor([[ 6.,  6.,  6.],
         [22., 22., 22.],
         [38., 38., 38.],
         [54., 54., 54.],
         [70., 70., 70.]]))

We perform matrix multiplication on A and B. Here A is a matrix with 5 rows and 4 columns, and B is a matrix with 4 rows and 3 columns. After multiplying the two, we get a matrix with 5 rows and 3 columns.
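
As a quick sanity check (a sketch): the inner dimensions must match, and for two-dimensional tensors the @ operator computes the same product as torch.mm:

torch.mm(A, B).shape  # torch.Size([5, 3])
(A @ B).shape         # same result: torch.Size([5, 3])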

1.7 Norm

Some of the most useful operators in linear algebra are norms. Informally, the norm of a vector tells us how big the vector is. The notion of size here does not refer to dimensionality, but to the magnitude of the components.
Given an arbitrary vector x, a vector norm must satisfy a few properties (a quick numerical check follows the list below).

  • One property is that if we scale all elements of a vector by a constant factor α, its norm is also scaled by the absolute value of the same constant factor:
    $\|\alpha x\| = |\alpha| \|x\|$

  • The second property is the familiar triangle inequality:
    $\|x + y\| \le \|x\| + \|y\|$

  • The third property simply says that the norm must be non-negative:
    $\|x\| \ge 0$
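
These properties can be verified numerically. Here is a minimal sketch with arbitrarily chosen vectors x and y and factor alpha:

alpha = -3.0
x = torch.tensor([3.0, -4.0])
y = torch.tensor([1.0, 2.0])

torch.isclose(torch.norm(alpha * x), abs(alpha) * torch.norm(x))  # scaling: tensor(True)
torch.norm(x + y) <= torch.norm(x) + torch.norm(y)                # triangle inequality: tensor(True)
torch.norm(x) >= 0                                                # non-negativity: tensor(True)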

L2 norm:
The Euclidean distance is an L2 norm. Suppose the elements of an n-dimensional vector x are x_1, …, x_n; its L2 norm is the square root of the sum of the squares of the vector's elements:

$\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$

The subscript 2 is often omitted from the L2 norm, so $\|x\|$ is equivalent to $\|x\|_2$. In code, we can calculate the L2 norm of a vector as follows. In deep learning, we more often work with the square of the L2 norm (a sketch follows the example below).

u = torch.tensor([3.0, -4.0])
torch.norm(u)

output:

tensor(5.)
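
Since deep learning works with the square of the L2 norm so often, here is a sketch of two equivalent ways to compute it:

torch.norm(u) ** 2  # square of the L2 norm: tensor(25.)
torch.dot(u, u)     # equivalently, the dot product of u with itself: tensor(25.)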

L1 norm: expressed as the sum of the absolute values of the vector's elements:

$\|x\|_1 = \sum_{i=1}^{n} |x_i|$

Compared to the L2 norm, the L1 norm is less affected by outliers. To compute the L1 norm, we combine the absolute value function with an element-wise sum.

torch.abs(u).sum()

output:

tensor(7.)

Both the L2 norm and the L1 norm are special cases of the more general Lp norm:

$\|x\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$
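
A sketch of computing an Lp norm for a general p (taking p = 3 as an arbitrary example). torch.norm accepts p directly (newer PyTorch versions prefer torch.linalg.norm), and the manual computation follows the formula above:

p = 3
(u.abs() ** p).sum() ** (1 / p)  # manual Lp norm
torch.norm(u, p=p)               # the same value via torch.norm
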
In deep learning, we often try to solve optimization problems: maximizing the probability assigned to the observed data, or minimizing the distance between predictions and the true observations. We represent items (such as words, products, or news articles) with vectors so that the distance between similar items is minimized and the distance between dissimilar items is maximized. The objective, perhaps the most important component of a deep learning algorithm (besides the data), is usually expressed as a norm.

Origin blog.csdn.net/qq_52118067/article/details/122522021