Tensors in deep learning

Definition of a tensor

The data used in the previous example is stored in multi-dimensional NumPy arrays, also called tensors. In general, all current machine learning systems use tensors as their basic data structure. Tensors are so fundamental to the field that Google's TensorFlow is named after them. At its core, a tensor is a container for data. The data is almost always numeric, so it is a container for numbers. You may already be familiar with matrices, which are 2D tensors. Tensors generalize matrices to an arbitrary number of dimensions (note that in the context of tensors, a dimension is often called an axis).

Scalars and vectors

A tensor that contains only one number is called a scalar (also called a scalar tensor, zero-dimensional tensor, or 0D tensor).
An array of numbers is called a vector, or one-dimensional tensor (1D tensor). A 1D tensor has exactly one axis. Below is a NumPy vector.

>>> import numpy as np
>>> x = np.array([12, 3, 6, 14, 7])
>>> x
array([12,  3,  6, 14,  7])
>>> x.ndim
1

This vector has 5 entries, so it is called a 5-dimensional vector (5D vector). Don't confuse a 5D vector with a 5D tensor! A 5D vector has only one axis, with 5 dimensions along that axis, whereas a 5D tensor has 5 axes (and may have any number of dimensions along each axis). Dimensionality can denote either the number of entries along a specific axis (as in a 5D vector) or the number of axes in a tensor (as in a 5D tensor), which can be confusing. In the latter case, it is technically more accurate to speak of a tensor of order 5 (the order of a tensor being its number of axes), but the ambiguous term 5D tensor is more common.
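The distinction between a 5D vector and a 5D tensor can be checked directly with `ndim` and `shape`. The following is a minimal sketch. The variable names and the axis sizes of the 5D tensor are arbitrary choices for illustration.

```python
import numpy as np

# A scalar (0D tensor): a single number, no axes.
scalar = np.array(12)

# A 5D vector: one axis, with 5 entries along it.
vector_5d = np.array([12, 3, 6, 14, 7])

# A 5D tensor: five axes (the sizes along each axis are arbitrary here).
tensor_5d = np.zeros((2, 3, 4, 5, 6))

print(scalar.ndim, scalar.shape)        # 0 ()
print(vector_5d.ndim, vector_5d.shape)  # 1 (5,)
print(tensor_5d.ndim, tensor_5d.shape)  # 5 (2, 3, 4, 5, 6)
```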

By packing multiple 3D tensors into an array, you can create a 4D tensor, and so on. Deep learning generally works with tensors from 0D to 4D, although you may encounter 5D tensors when processing video data.
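As a rough illustration of packing lower-order tensors into higher-order ones, the sketch below uses `np.stack`, which joins arrays along a new first axis. The sizes (3, 5), 3, and 2 are assumptions made for the example.

```python
import numpy as np

matrix = np.zeros((3, 5))              # 2D tensor
tensor_3d = np.stack([matrix] * 3)     # pack 3 matrices into a 3D tensor
tensor_4d = np.stack([tensor_3d] * 2)  # pack 2 3D tensors into a 4D tensor

print(tensor_3d.shape)  # (3, 3, 5)
print(tensor_4d.shape)  # (2, 3, 3, 5)
print(tensor_4d.ndim)   # 4
```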

Key attributes

A tensor is defined by the following three key attributes.
• Number of axes (order). For example, a 3D tensor has 3 axes, and a matrix has 2 axes. Python libraries such as NumPy call this the tensor's ndim.
• Shape. This is a tuple of integers that gives the number of dimensions (entries) the tensor has along each axis. For example, a matrix with 3 rows and 5 columns has shape (3, 5), and a 3D tensor made of three such matrices has shape (3, 3, 5). The shape of a vector has a single element, such as (5,), and the shape of a scalar is empty: ().
• Data type (usually called dtype in Python libraries). This is the type of the data contained in the tensor, for example float32, uint8, or float64. In rare cases you may encounter char tensors. Note that string tensors are not a typical tensor type in NumPy (or in most other libraries): tensors live in pre-allocated, contiguous memory segments, which does not suit variable-length strings.
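The three attributes above map onto the `ndim`, `shape`, and `dtype` attributes of a NumPy array. The sketch below inspects them on a small matrix whose values are made up for the example.

```python
import numpy as np

# A 3 x 5 matrix of 8-bit unsigned integers (values are arbitrary).
x = np.array([[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]], dtype=np.uint8)

print(x.ndim)   # 2
print(x.shape)  # (3, 5)
print(x.dtype)  # uint8

# Converting the data type leaves the number of axes and the shape unchanged.
y = x.astype(np.float32)
print(y.dtype)  # float32
```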

Manipulating tensors

In the previous example, we used the syntax train_images[i] to select a specific digit along the first axis. Selecting specific elements of a tensor is called tensor slicing.
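A few slicing patterns are sketched below. The real `train_images` array is not shown in this text, so a hypothetical stand-in of 100 zero-filled 28 × 28 images is assumed here.

```python
import numpy as np

# Hypothetical stand-in for train_images: 100 grayscale images of 28 x 28 pixels.
train_images = np.zeros((100, 28, 28), dtype=np.uint8)

# Select one image along the first axis.
digit = train_images[4]
print(digit.shape)       # (28, 28)

# Select images 10 to 99 (index 100 is excluded).
my_slice = train_images[10:100]
print(my_slice.shape)    # (90, 28, 28)

# Equivalent, with every axis written out explicitly.
same_slice = train_images[10:100, :, :]
print(same_slice.shape)  # (90, 28, 28)

# Crop the bottom-right 14 x 14 pixels of every image.
crop = train_images[:, 14:, 14:]
print(crop.shape)        # (100, 14, 14)
```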

Data batches

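In general, the first axis (axis 0, because indexing starts at 0) of every data tensor in deep learning is the samples axis (sometimes called the samples dimension).
Deep learning models usually do not process an entire dataset at once; they split it into small batches. For such a batch tensor, the first axis (axis 0) is called the batch axis or batch dimension.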
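Splitting a dataset into batches amounts to slicing along axis 0. The sketch below assumes a hypothetical dataset of 1000 images and a batch size of 128. The helper `get_batch` is introduced only for illustration.

```python
import numpy as np

# Hypothetical dataset: 1000 grayscale images of 28 x 28 pixels.
train_images = np.zeros((1000, 28, 28), dtype=np.uint8)
batch_size = 128

def get_batch(data, n, batch_size):
    """Return the n-th batch (0-based) by slicing along the first axis."""
    return data[batch_size * n : batch_size * (n + 1)]

first = get_batch(train_images, 0, batch_size)
last = get_batch(train_images, 7, batch_size)

print(first.shape)  # (128, 28, 28)
print(last.shape)   # (104, 28, 28): the final batch holds the remaining samples
```

Only axis 0 changes between the full dataset and a batch. The remaining axes keep their sizes.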

Data tensors in the real world

• Vector data: 2D tensors of shape (samples, features).
• Time series data or sequence data: 3D tensors of shape (samples, timesteps, features).
• Images: 4D tensors of shape (samples, height, width, channels) or (samples, channels, height, width).
• Video: 5D tensors of shape (samples, frames, height, width, channels) or (samples, frames, channels, height, width).

Vector data

This is the most common case. In such a dataset, each data point is encoded as a vector, so a batch of data is encoded as a 2D tensor (that is, an array of vectors), where the first axis is the samples axis and the second axis is the features axis.
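As a small sketch of vector data, consider a made-up dataset of 4 people described by 3 features each (age, postal code, income). All values here are invented for illustration.

```python
import numpy as np

# Hypothetical data: one row per person, columns = (age, postal code, income).
people = np.array([[34, 10001, 52000.0],
                   [51, 94105, 87000.0],
                   [27, 60601, 43000.0],
                   [45, 73301, 61000.0]])

print(people.shape)  # (4, 3): (samples, features)

one_person = people[0]  # a single sample: a vector of 3 features
ages = people[:, 0]     # a single feature across all samples

print(one_person.shape)  # (3,)
print(ages.shape)        # (4,)
```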

Time series data or sequence data

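Following the shape (samples, timesteps, features) listed earlier, a batch of sequences can be sketched as a 3D array. The example below is an assumption for illustration: 250 trading days, 390 minutes per day, and 3 price features per minute.

```python
import numpy as np

# Hypothetical stock-price dataset (all sizes are assumed):
# 250 days (samples), 390 minutes per day (timesteps),
# 3 values per minute (features), e.g. current, highest, lowest price.
prices = np.zeros((250, 390, 3))

print(prices.shape)      # (250, 390, 3)

one_day = prices[0]      # one sample: a 2D tensor (timesteps, features)
one_minute = prices[0, 0]  # one timestep of one sample: a feature vector

print(one_day.shape)     # (390, 3)
print(one_minute.shape)  # (3,)
```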

Image data

Images typically have three dimensions: height, width, and color depth. Although grayscale images (such as MNIST digits) have only a single color channel and could therefore be stored in 2D tensors, by convention image tensors are always 3D, with a one-dimensional color channel for grayscale images. A batch of 128 grayscale images of size 256 × 256 can thus be stored in a tensor of shape (128, 256, 256, 1), and a batch of 128 color images in a tensor of shape (128, 256, 256, 3).
There are two conventions for the shape of image tensors: channels-last (used by TensorFlow) and channels-first (used by Theano). Google's TensorFlow machine learning framework places the color depth axis at the end: (samples, height, width, color_depth). Theano, by contrast, places the color depth axis right after the batch axis: (samples, color_depth, height, width).
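Converting between the two layouts is a matter of reordering axes. The sketch below uses `np.transpose` on a small hypothetical batch of 8 color images. The batch size and the marker pixel are chosen only for the example.

```python
import numpy as np

# Hypothetical channels-last batch: (samples, height, width, color_depth).
batch_last = np.zeros((8, 256, 256, 3), dtype=np.uint8)
batch_last[0, 10, 20, 2] = 255  # mark one pixel in the third channel

# Reorder the axes to channels-first: (samples, color_depth, height, width).
batch_first = np.transpose(batch_last, (0, 3, 1, 2))

print(batch_first.shape)            # (8, 3, 256, 256)
print(batch_first[0, 2, 10, 20])    # 255: same pixel, new index order

# Transposing back restores the channels-last layout.
restored = np.transpose(batch_first, (0, 2, 3, 1))
print(restored.shape)               # (8, 256, 256, 3)
```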

Video data

Video data is one of the few types of real-world data that requires 5D tensors. A video can be seen as a sequence of frames, each frame being a color image. Because each frame can be stored in a 3D tensor of shape (height, width, color_depth), a sequence of frames can be stored in a 4D tensor of shape (frames, height, width, color_depth), and a batch of different videos can be stored in a 5D tensor of shape (samples, frames, height, width, color_depth).
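To get a feel for the size of such a tensor, the sketch below computes the number of values for an assumed batch of 4 clips, each with 240 frames of 144 × 256 pixels and 3 color channels. These numbers are illustrative assumptions, and the large tensor is never allocated. Only a tiny 5D array is created to show the axis count.

```python
import math
import numpy as np

# Assumed sizes: 4 clips, 240 frames each, 144 x 256 pixels, 3 color channels.
video_shape = (4, 240, 144, 256, 3)  # (samples, frames, height, width, color_depth)

n_values = math.prod(video_shape)
bytes_float32 = n_values * np.dtype(np.float32).itemsize

print(n_values)       # 106168320
print(bytes_float32)  # 424673280 bytes if stored as float32

# A tiny 5D tensor with the same axis order, small enough to allocate freely.
tiny_batch = np.zeros((2, 5, 16, 16, 3), dtype=np.float32)
print(tiny_batch.ndim)  # 5
```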

Origin blog.csdn.net/qq_39905917/article/details/104546998