This article explains what a tensor is in deep learning, how tensor operations work, and how to think about tensor dimensions. It is easier than it sounds.

The mathematical foundation of deep learning (don’t be intimidated, it’s very simple)

Data representation and tensor operations

Tensor

Multidimensional Numpy arrays are also called tensors. Generally speaking, all current machine learning systems use tensors as their basic data structure.

The core concept of a tensor is that it is a container for data. The data it contains is almost always numerical, so it is a container for numbers. You may be familiar with matrices, which are 2D tensors. Tensors are a generalization of matrices to an arbitrary number of dimensions (note that a dimension of a tensor is usually called an axis).

0. Scalar (0D tensor)

A tensor that contains only one number is called a scalar (also called a scalar tensor, a zero-dimensional tensor, or a 0D tensor). In Numpy, a float32 or float64 number is a scalar tensor (or scalar array). You can use the ndim attribute to check the number of axes of a Numpy tensor. A scalar tensor has 0 axes (ndim == 0). The number of axes of a tensor is also called its rank. Below is a Numpy scalar.
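For instance, a minimal sketch:

>>> import numpy as np
>>> x = np.array(12)
>>> x
array(12)
>>> x.ndim
0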

1. Vector (1D tensor)

An array of numbers is called a vector or one-dimensional tensor (1D tensor). A one-dimensional tensor has only one axis. Below is a Numpy vector.

>>> x = np.array([12, 3, 6, 14, 7])
>>> x
array([12, 3, 6, 14, 7])
>>> x.ndim
1

This vector has 5 elements and is therefore called a 5D vector. Don't confuse a 5D vector with a 5D tensor! A 5D vector has only one axis, with 5 dimensions along that axis, whereas a 5D tensor has 5 axes (and may have any number of dimensions along each axis). Dimensionality can refer either to the number of elements along a specific axis (as in a 5D vector) or to the number of axes of a tensor (as in a 5D tensor), which can sometimes be confusing. In the latter case, the technically more accurate term is a tensor of rank 5 (the rank of a tensor is its number of axes), but the ambiguous notation 5D tensor is more common.

Supplement, from Andrew Ng's "Machine Learning" course: a vector is an n × 1 matrix, i.e., a matrix with only one column.

2. Matrix (2D tensor)
>>> x = np.array([[5, 78, 2, 34, 0],
                  [6, 79, 3, 35, 1],
                  [7, 80, 4, 36, 2]])
>>> x.ndim
2

The elements on the first axis are called rows, and the elements on the second axis are called columns. In the above example, [5, 78, 2, 34, 0] is the first row of x and [5, 6, 7] is the first column.

Key attributes
  • Number of axes (rank). For example, a 3D tensor has 3 axes and a matrix has 2 axes. This is also called the tensor's ndim in Python libraries such as Numpy.
  • Shape. This is a tuple of integers giving the number of dimensions (elements) of the tensor along each axis. For example, the matrix above has shape (3, 5), and a 3D tensor stacking three such matrices would have shape (3, 3, 5). The shape of a vector has a single element, such as (5,), while the shape of a scalar is empty, that is, ().
  • Data type (often called dtype in Python libraries). This is the type of data contained in the tensor. For example, the type of a tensor can be float32, uint8, float64, etc. In rare cases, you may encounter character (char) tensors. Note that string tensors do not exist in Numpy (and most other libraries) because tensors are stored in pre-allocated contiguous memory segments, while strings are of variable length and cannot be stored this way.
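As a quick sketch, these three attributes can be read directly from the matrix in the example above:

>>> x = np.array([[5, 78, 2, 34, 0],
                  [6, 79, 3, 35, 1],
                  [7, 80, 4, 36, 2]])
>>> x.ndim
2
>>> x.shape
(3, 5)
>>> x.dtype
dtype('int64')

(The exact integer dtype can vary by platform, e.g. int32 on some systems.)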

Matrix addition (figure)

Scalar multiplication (figure)

Matrix multiplication (figure)

Implementation technique for predicting the prices of multiple houses: use matrix operations instead of a for loop, which makes the calculation more efficient.

If there are several candidate parameter sets, multiply the matrix by another matrix (equivalent to splitting the second matrix into several column vectors); see the sketch below.
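A rough Numpy sketch of this idea (the house features and parameter values below are made up purely for illustration):

import numpy as np

# Hypothetical data: 4 houses, 3 features each (e.g. size, rooms, age)
X = np.array([[2104., 5., 45.],
              [1416., 3., 40.],
              [1534., 3., 30.],
              [ 852., 2., 36.]])

w = np.array([0.1, 20., -0.5])   # one hypothetical parameter vector
b = 80.

# One matrix-vector product replaces a Python for loop over the houses
predictions = np.dot(X, w) + b                           # shape (4,)

# Several candidate parameter sets: a matrix-matrix product
W = np.stack([w, np.array([0.2, 10., -1.0])], axis=1)    # shape (3, 2), one set per column
all_predictions = np.dot(X, W) + b                       # shape (4, 2), one column per set

Each column of W holds one candidate parameter set, so each column of all_predictions holds the predictions made with that set.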

Data batch

Generally speaking, the first axis (0 axis, because the index starts from 0) of all data tensors in deep learning is the samples axis (sometimes also called the sample dimension).

For this kind of batch tensor, the first axis (0 axis) is called the batch axis or batch dimension.

batch = train_images[:128]                    # first batch of 128 samples
batch = train_images[128:256]                 # next batch
batch = train_images[128 * n:128 * (n + 1)]   # n-th batch
Real-world data tensors

  • Vector data: 2D tensor of shape (samples, features).
  • Time series data or sequence data: 3D tensor of shape (samples, timesteps, features).
  • Images: 4D tensor of shape (samples, height, width, channels) or (samples, channels, height, width).
  • Video: 5D tensor of shape (samples, frames, height, width, channels) or (samples, frames, channels, height, width).

Vector data

This is the most common kind of data. In such a dataset, each data point is encoded as a vector, so a batch of data is encoded as a 2D tensor (i.e., an array of vectors), where the first axis is the samples axis and the second axis is the features axis.

  • A demographic dataset that includes each person's age, zip code, and income. Each person can be represented as a vector of 3 values, so the entire dataset of 100,000 people can be stored in a 2D tensor of shape (100000, 3).
  • A dataset of text documents, in which each document is represented by how many times each word of a dictionary of 20,000 common words appears in it. Each document can be encoded as a vector of 20,000 values (one count per word in the dictionary), so the entire dataset of 500 documents can be stored in a tensor of shape (500, 20000).
Time series data or sequence data

When time (or sequence order) is important to the data, the data should be stored in a 3D tensor with an explicit time axis.

Each sample can be encoded as a sequence of vectors (i.e., a 2D tensor), so a batch of data is encoded as a 3D tensor.

By convention, the time axis is always the 2nd axis (the axis with index 1). Let's look at a few examples.

  • Stock price dataset. Every minute, we store the current price of the stock, the highest price in the past minute, and the lowest price in the past minute. Each minute is thus encoded as a 3D vector, an entire trading day is encoded as a 2D tensor of shape (390, 3) (there are 390 minutes in a trading day), and 250 days of data can be stored in a 3D tensor of shape (250, 390, 3). Here, each sample is one day of stock data.
  • Tweets dataset. We encode each tweet as a sequence of 280 characters, each character coming from the 128-character alphabet. In this case, each character can be encoded as a binary vector of size 128 (only the index position corresponding to the character has a value of 1, and other elements are 0). Then each tweet can be encoded as a 2D tensor of shape (280, 128), and a dataset containing 1 million tweets can be stored in a tensor of shape (1000000, 280, 128).
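A small sketch of how such an encoding might be built (the helper function below is illustrative, not from the book):

import numpy as np

def encode_tweet(text, max_len=280, alphabet_size=128):
    # One-hot encode a tweet as a 2D tensor of shape (max_len, alphabet_size)
    encoding = np.zeros((max_len, alphabet_size))
    for i, char in enumerate(text[:max_len]):
        encoding[i, ord(char) % alphabet_size] = 1.
    return encoding

batch = np.stack([encode_tweet("hello world"), encode_tweet("deep learning")])
print(batch.shape)   # (2, 280, 128)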
Image data

Images typically have three dimensions: height, width, and color depth.

Although grayscale images (such as MNIST digit images) have only one color channel and could therefore be stored in 2D tensors, by convention image tensors are always 3D, with a one-dimensional color channel for grayscale images. A batch of 128 grayscale images of size 256×256 can therefore be stored in a tensor of shape (128, 256, 256, 1), and a batch of 128 color images can be stored in a tensor of shape (128, 256, 256, 3).

There are two conventions for the shape of image tensors: the channels-last convention (used by TensorFlow) and the channels-first convention (used by Theano). Google's TensorFlow machine learning framework places the color depth axis at the end: (samples, height, width, color_depth). Theano, in contrast, places the color depth axis right after the batch axis: (samples, color_depth, height, width). With the Theano convention, the previous two examples would become (128, 1, 256, 256) and (128, 3, 256, 256). The Keras framework supports both formats.
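A small sketch of converting between the two conventions with Numpy (np.moveaxis returns a view, so no data is copied):

import numpy as np

images = np.zeros((128, 256, 256, 3))     # channels-last: (samples, height, width, channels)
images_cf = np.moveaxis(images, -1, 1)    # move the channel axis right after the batch axis
print(images_cf.shape)                    # (128, 3, 256, 256), i.e. channels-first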

Video data

Video data is one of the few types of real-world data that requires 5D tensors. A video can be viewed as a sequence of frames, each frame being a color image. Since each frame can be stored in a 3D tensor of shape (height, width, color_depth), a sequence of frames can be stored in a 4D tensor of shape (frames, height, width, color_depth), and a batch of different videos can therefore be stored in a 5D tensor of shape (samples, frames, height, width, color_depth).

For example, a 60-second YouTube video clip sampled at 4 frames per second, with a frame size of 144×256, has 240 frames in total. A batch of 4 such video clips can be stored in a tensor of shape (4, 240, 144, 256, 3). That is 106,168,320 values in total! If the dtype of the tensor is float32, each value takes 32 bits, so the tensor takes up 405 MB. That's huge! The videos you encounter in real life are much smaller, because they are not stored as float32 and are usually heavily compressed (for example, as MPEG).
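A quick check of that arithmetic:

>>> values = 4 * 240 * 144 * 256 * 3
>>> values
106168320
>>> values * 4 / 1024 ** 2    # float32 uses 4 bytes per value; result in MB
405.0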

Tensor operations
Element-wise operations

The relu operation and addition are both element-wise operations, that is, operations applied independently to each element of the tensor. This means they lend themselves well to massively parallel (vectorized) implementations (the term vectorized comes from the vector processor supercomputer architecture of the 1970s-1990s). Below are naive Python implementations.

def naive_relu(x):
    assert len(x.shape) == 2      # x is a 2D Numpy tensor
    x = x.copy()                  # avoid overwriting the input tensor
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

def naive_add(x, y):
    assert len(x.shape) == 2      # x and y are 2D Numpy tensors
    assert x.shape == y.shape
    x = x.copy()                  # avoid overwriting the input tensor
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[i, j]
    return x

Following the same pattern, you can implement element-wise multiplication, subtraction, and so on.
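In practice you would not write these loops yourself: Numpy's built-in operations perform the same element-wise computations in heavily optimized low-level code. A minimal sketch:

import numpy as np

x = np.random.random((2, 3))
y = np.random.random((2, 3))

z = x + y               # element-wise addition
z = np.maximum(z, 0.)   # element-wise relu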

Broadcasting

The naive_add implementation in the previous section only supports the addition of two 2D tensors with identical shapes. But in the Dense layer introduced earlier, we added a 2D tensor and a vector. When the shapes of the two tensors being added differ, the smaller tensor will be broadcast to match the shape of the larger tensor, provided there is no ambiguity. Broadcasting consists of the following two steps:

  • (1) Add an axis (called a broadcast axis) to the smaller tensor so that its ndim is the same as the larger tensor.
  • (2) Repeat the smaller tensor along the new axis so that it has the same shape as the larger tensor.

Let's look at a concrete example. Suppose X has shape (32, 10) and y has shape (10,). First, we add an empty first axis to y, so that its shape becomes (1, 10). Then we repeat y 32 times along this new axis, so that we end up with a tensor Y of shape (32, 10), where Y[i, :] == y for i in range(0, 32). Now we can add X and Y, because they have the same shape.

In an actual implementation, no new 2D tensor is created, because that would be very inefficient. The repetition is entirely virtual: it happens at the algorithmic level rather than in memory. But imagining the vector repeated 32 times along a new axis is a useful mental model. Below is a naive implementation.

def naive_add_matrix_and_vector(x, y):
    assert len(x.shape) == 2      # x is a 2D Numpy tensor
    assert len(y.shape) == 1      # y is a Numpy vector
    assert x.shape[1] == y.shape[0]
    x = x.copy()                  # avoid overwriting the input tensor
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[j]
    return x
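With Numpy itself, this broadcasting happens automatically, so the loop above is only an illustration of what goes on conceptually. A minimal sketch:

import numpy as np

X = np.random.random((32, 10))   # 2D tensor
y = np.random.random((10,))      # vector

Z = X + y                        # y is broadcast along the first axis of X
print(Z.shape)                   # (32, 10)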
Tensor dot product

The dot product operation, also called tensor product (not to be confused with element-wise product), is the most common and most useful tensor operation. Unlike element-wise operations, it merges the elements of the input tensor together.

In Numpy, Keras, Theano, and TensorFlow, * implements the element-wise product. Dot products use a different syntax in TensorFlow, but in both Numpy and Keras the dot product is performed with the standard dot operation.

import numpy as np

z = np.dot(x, y)

In mathematical notation, the dot product is written with a dot (·):

z = x · y

The dot product of two vectors is a scalar

The dot product of a matrix and a vector is a vector

Dot products can be generalized to tensors with any number of axes. The most common application is probably the dot product of two matrices. You can take the dot product of two matrices x and y (dot(x, y)) if and only if x.shape[1] == y.shape[0]. The result is a matrix of shape (x.shape[0], y.shape[1]).

More generally, you can take the dot product of higher-dimensional tensors, as long as their shapes are matched following the same rules as for 2D tensors:

  • (a, b, c, d) . (d,) -> (a, b, c)

  • (a, b, c, d) . (d, e) -> (a, b, c, e)
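A quick sketch verifying these shape rules with np.dot:

>>> x = np.ones((3, 4, 5, 6))
>>> np.dot(x, np.ones((6,))).shape
(3, 4, 5)
>>> np.dot(x, np.ones((6, 7))).shape
(3, 4, 5, 7)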

Tensor reshaping

reshape

>>> x = np.array([[0., 1.],
                  [2., 3.],
                  [4., 5.]])
>>> print(x.shape)
(3, 2)
>>> x = x.reshape((6, 1))
>>> print(x.shape)
(6, 1)

Transpose

>>> x = np.zeros((300, 20)) 
>>> x = np.transpose(x)
>>> print(x.shape)
(20, 300)
Geometric interpretation of tensor operations

Tensor operations operate on tensors whose elements can be interpreted as coordinates of points in some geometric space.

Adding two tensors: geometrically, this is equivalent to chaining the two vector arrows together, and the resulting position represents the vector corresponding to the sum of the two vectors.

Generally speaking, basic geometric operations such as affine transformations, rotations, and scaling can be expressed as tensor operations. For example, to rotate a two-dimensional vector by an angle theta, you can take its dot product with a 2×2 matrix R = [u, v], where u and v are both vectors in the plane: u = [cos(theta), sin(theta)], v = [-sin(theta), cos(theta)].
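A small sketch of that rotation for theta = 90 degrees:

import numpy as np

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # columns are the vectors u and v from the text

v = np.array([1., 0.])
print(np.dot(R, v))   # approximately [0., 1.]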

Geometric explanation of deep learning

Imagine two sheets of colored paper placed one on top of the other, then crumpled together into a small ball. This crumpled paper ball is your input data, and each sheet of paper corresponds to a class in the classification problem. What a neural network (or any machine learning model) has to do is find a transformation that flattens the paper ball, so that the two classes are cleanly separable again. With deep learning, this can be achieved as a series of simple transformations of the three-dimensional space, like the movements you would make with your fingers to uncrumple the ball. That is what machine learning is about: finding neat representations of complex, highly folded data manifolds.

By now you should have a good idea of ​​why deep learning is particularly good at this: it breaks down complex geometric transformations step by step into long lists of basic geometric transformations, in much the same way humans would unfold a paper ball. Each layer of a deep network unravels the data a little bit through a transformation—many layers stacked together can achieve very complex unraveling processes.

Gradient-based optimization

Mini-batch stochastic gradient descent, also known as mini-batch SGD. The term stochastic means that each batch of data is drawn at random (stochastic is the scientific synonym for random).

Note that a variant of mini-batch SGD draws a single sample and target at each iteration instead of a batch of data. This is called true SGD (as opposed to mini-batch SGD). At the other extreme, every iteration can be run on all of the data at once, which is called batch SGD. Each update is then more accurate, but also far more expensive. An efficient compromise between these two extremes is to use mini-batches of reasonable size.
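A self-contained toy sketch of this loop (fitting y = 2x with a single weight; the data, model, and learning rate are made up purely for illustration):

import numpy as np

x = np.random.random((1000, 1))
y = 2 * x                                  # the "true" relationship we want to recover

w = np.zeros((1, 1))                       # the single model parameter
learning_rate = 0.1
batch_size = 128

for step in range(100):
    idx = np.random.randint(0, len(x), batch_size)               # draw a random mini-batch
    batch_x, batch_y = x[idx], y[idx]
    pred = np.dot(batch_x, w)                                    # forward pass
    grad = 2 * np.dot(batch_x.T, pred - batch_y) / batch_size    # gradient of the mean squared error
    w -= learning_rate * grad                                    # move the weight against the gradient

print(w)   # close to [[2.]]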

References

This article is based on the book Deep Learning with Python (Python 深度学习) by François Chollet.


Source: blog.csdn.net/Zilong0128/article/details/125744754