Hands-on Machine Learning - Data Preprocessing & Linear Algebra

1. Supervised Learning

  • A good rule of thumb is that any question about "how much" or "how many" is likely to be a regression problem.
    In these cases, we try to learn a model that minimizes the difference between predicted and actual label values. In most chapters of this book, we focus on minimizing the squared error loss function.

  • This "which one?" question is called a classification problem. In classification problems, we want the model to be able to predict which category (category, formally called class) the sample belongs to. For example, for handwritten numbers, we might have 10 classes for the digits 0 to 9. When we have more than two classes, we call this problem a multiclass classification problem. Common examples include handwritten character recognition. Unlike solving regression problems, a common loss function for classification problems is called cross-entropy

  • Tagging problems: learning to predict classes that are not mutually exclusive is called multi-label classification. Take, for example, the tags people put on tech blogs, such as "machine learning," "technology," "gadgets," "programming languages," "Linux," "cloud computing," "AWS." A typical article might carry 5-10 tags because the concepts are interrelated.

2. Unsupervised Learning

Your boss might hand you a ton of data and ask you to do some data science with it, without specifying what the results should be. We call machine learning problems where the data has no "target" unsupervised learning, and we will discuss unsupervised learning techniques in later chapters.

  • Clustering problem: k-means and similar algorithms are commonly used to group a set of unlabeled data into clusters.
  • Principal component analysis problem: can we find a small number of parameters that accurately capture the linearly dependent properties of the data? For example, the trajectory of a ball can be described by its velocity, diameter, and mass. As another example, tailors have developed a small set of measurements that describe the shape of the human body fairly accurately so that clothes fit. Yet another example: is there a representation of (arbitrarily structured) objects in Euclidean space such that their symbolic properties are well matched? This could be used to describe entities and their relationships, e.g. "Rome" - "Italy" + "France" = "Paris".
  • Causality and probabilistic graphical models: can we describe the root causes of the data we observe? For example, if we have demographic data on housing prices, pollution, crime, geography, education, and wages, can we discover the relationships among them simply from empirical data?
  • Generative adversarial networks: give us a way to synthesize data, even complex unstructured data like images and audio. The underlying statistical mechanism is a test that checks whether real and fake data are distinguishable, and it is another important and exciting area of unsupervised learning.

3. Reinforcement Learning

If you are interested in using machine learning to develop agents that interact with and act in an environment, you may end up focusing on reinforcement learning. This includes artificial intelligence (AI) applied to robots, dialogue systems, and even the development of video games. Deep reinforcement learning, which applies deep learning to reinforcement learning problems, is a very popular research field. The groundbreaking deep Q-network (DQN) that beat humans at Atari games using only visual input, and the AlphaGo program that beat a world champion at the board game Go, are two prominent examples of reinforcement learning.

4. Getting Started Knowledge

First, we introduce n-dimensional arrays, also known as tensors. This section will be familiar to readers who have used the NumPy computing package in Python. Whichever deep learning framework you use, its tensor class (ndarray in MXNet, Tensor in PyTorch and TensorFlow) is similar to NumPy's ndarray, but with two important extra capabilities: first, tensors support accelerated computation on GPUs, while NumPy supports only CPU computation; second, tensor classes support automatic differentiation. These features make tensor classes better suited to deep learning. Unless otherwise specified, "tensor" in this book refers to an instance of the tensor class.

  • First, we can use arange to create a row vector x. This row vector contains the first 12 integers starting from 0; they are created as integers by default, though you can also specify a floating-point type. Each value in a tensor is called an element. For example, the tensor x has 12 elements. Unless otherwise specified, new tensors are stored in main memory and computed on the CPU, as in the sketch below.
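    For example, a minimal sketch (the import is assumed throughout this section):
    import torch
    x = torch.arange(12)  # tensor([0, 1, ..., 11]), integer dtype by default
    x_f = torch.arange(12, dtype=torch.float32)  # floating-point variant
    x.numel()  # 12 elements
    x.shape  # torch.Size([12])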

  • To change the shape of a tensor without changing the number of elements or their values, call the reshape function. For example, the tensor x can be converted from a row vector of shape (12,) into a matrix of shape (3,4). The new tensor contains exactly the same values as before the transformation, but it is treated as a matrix of 3 rows and 4 columns: the shape has changed, while the element values and the total size have not.

  • We don't need to specify every dimension manually when reshaping. That is, if our target shape is (height, width), then once the width is known the height can be calculated automatically, and we don't have to do the division ourselves. In the example above, to get a 3-row matrix we manually specified both its 3 rows and 4 columns; fortunately, we can ask for an automatically inferred dimension by passing -1. That is, we can replace x.reshape(3,4) with x.reshape(-1,4) or x.reshape(3,-1), as shown below.
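    For example:
    X = x.reshape(3, 4)  # 12-element row vector -> 3x4 matrix
    X2 = x.reshape(-1, 4)  # the -1 dimension (3 rows) is inferred automatically
    X3 = x.reshape(3, -1)  # equivalent: the 4 columns are inferred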

  • Sometimes we want to initialize a tensor with all 0s, all 1s, some other constant, or numbers randomly sampled from a particular distribution. We can create a tensor of shape (2,3,4) with all elements set to 0: x = torch.zeros((2,3,4))

  • Similarly, we can create a tensor of shape (2,3,4) with all elements set to 1:
    y = torch.ones((2,3,4))

  • Sometimes we want each element of a tensor to be drawn at random from some probability distribution. For example, when we construct arrays to serve as parameters in a neural network, we usually initialize their values randomly. The following creates a tensor of shape (3,4) whose elements are sampled from a standard Gaussian (normal) distribution with mean 0 and standard deviation 1: y = torch.randn((3,4))

  • The common standard arithmetic operators (+, -, *, /, and **) can all be applied element-wise to any two tensors of the same shape, as in the sketch below; for example, x ** y, where the ** operator performs exponentiation.
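    A small sketch of element-wise arithmetic on two same-shape tensors:
    x = torch.tensor([1.0, 2, 4, 8])
    y = torch.tensor([2.0, 2, 2, 2])
    x + y, x - y, x * y, x / y, x ** y  # all computed element by element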

  • y = torch.exp(x) produces a tensor of the same shape as x whose elements are e raised to the power x_i.

X = torch.arange(12, dtype=torch.float32).reshape((3,4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
torch.cat((X, Y), dim=0), torch.cat((X, Y), dim=1)  # concatenate along rows (dim=0) and along columns (dim=1)

Result: concatenation along dim=0 produces a (6, 4) tensor with Y stacked below X; along dim=1, a (3, 8) tensor with Y placed to the right of X.

  • Summing all elements in a tensor produces a one-element tensor.
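    For example, with the X defined above:
    X.sum()  # tensor(66.), a tensor with a single element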
  • Broadcasting mechanism: when a and b have different but compatible shapes (say a is a 3x1 matrix and b a 1x2 matrix), adding them directly would not match shape-wise. Broadcasting expands the two matrices into a larger common shape: matrix a copies its columns and matrix b copies its rows, and then the element-wise addition is performed, as in the sketch below.
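    A minimal sketch, assuming the usual example shapes (a is 3x1, b is 1x2):
    a = torch.arange(3).reshape((3, 1))
    b = torch.arange(2).reshape((1, 2))
    a + b  # both are broadcast to shape (3, 2) before the element-wise addition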
  • As with any Python array, the first element has index 0 and the last element index -1; a range can be specified that includes its first index and excludes its last. As shown below, we can use [-1] to select the last element and [1:3] to select the second and third elements.

Note that because X is a two-dimensional array of 3 rows and 4 columns, X[-1] takes the entire last row, so the result is a one-dimensional array of 4 elements.
Also, slices are half-open intervals: the start index is included and the stop index is excluded.

  • Here X[row slice, column slice] indexes rows and columns simultaneously, as in the sketch below.
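    For example:
    X[-1]  # the last row, a vector of 4 elements
    X[1:3]  # rows 1 and 2 (the stop index 3 is excluded)
    X[1, 2] = 9  # write a single element at row 1, column 2
    X[0:2, :] = 12  # assign 12 to rows 0 and 1, all columns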
  • Z = torch.zeros_like(Y) does what the name suggests: it creates an all-zero tensor with the same shape as Y.
  • During model training, whenever we accumulate or display the loss value it is practical to call loss.item(), which converts a one-element tensor into a plain Python number, as below.
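    A minimal sketch:
    loss = torch.tensor(3.5)
    loss.item()  # 3.5, a plain Python float
    float(loss), int(loss)  # also work for one-element tensors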

5. Data preprocessing

  • To load the raw dataset from a CSV file we created, we import the pandas package and call the read_csv function, as in the sketch below.
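    A minimal sketch; the file name and columns here are illustrative, modeled on a tiny house-price dataset:
    import os
    import pandas as pd
    os.makedirs('data', exist_ok=True)
    with open('data/house_tiny.csv', 'w') as f:
        f.write('NumRooms,Alley,Price\n')  # column names
        f.write('NA,Pave,127500\n')  # each row is a sample; NA marks a missing value
        f.write('2,NA,106000\n')
        f.write('4,NA,178100\n')
        f.write('NA,NA,140000\n')
    data = pd.read_csv('data/house_tiny.csv')
    print(data)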
  • There are two main ways to deal with missing values: imputation (filling them in) and deletion.
    Through the position index iloc, we split the data into inputs and outputs, where the former is the first two columns and the latter the last column. For missing values in inputs, we replace the "NaN" entries with the mean of the same column.
  • Data filling uses inputs = inputs.fillna(inputs.mean()) to fill all missing values in the inputs DataFrame with the mean of each column.
  • Convert to tensor format: inputs = torch.tensor(inputs.values). A complete sketch of the whole pipeline follows.
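    Putting the steps together (continuing the sketch above; note that recent pandas versions need numeric_only=True for mean() when non-numeric columns are present, and the categorical column must be made numeric before the tensor conversion):
    import pandas as pd
    import torch
    inputs, outputs = data.iloc[:, 0:2], data.iloc[:, 2]
    inputs = inputs.fillna(inputs.mean(numeric_only=True))  # impute numeric NaNs with column means
    inputs = pd.get_dummies(inputs, dummy_na=True)  # turn the categorical column into indicator columns
    X = torch.tensor(inputs.values.astype(float))
    y = torch.tensor(outputs.values)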

6. Linear Algebra

  • Strictly speaking, a scalar is a value that contains exactly one numerical quantity; accordingly, writer.add_scalar("name", scalar, step) in TensorBoard is used to record the trend of a scalar such as the loss.
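    A minimal sketch of that call (assuming torch.utils.tensorboard is available):
    from torch.utils.tensorboard import SummaryWriter
    writer = SummaryWriter()  # writes event files under ./runs/ by default
    for step, loss in enumerate([1.0, 0.5, 0.25]):
        writer.add_scalar("loss", loss, step)  # tag, scalar value, global step
    writer.close()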
  • You can think of a vector as a list of scalar values. We call these scalar values the elements or components of the vector.
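    For example:
    x = torch.arange(4)
    x[3]  # access an element by index
    len(x)  # 4, the length of the vector
    x.shape  # torch.Size([4])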
  • Now we access the transpose of the matrix in code
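    For example:
    A = torch.arange(20).reshape(5, 4)
    A.T  # the transpose, shape (4, 5)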
  • Tensors will become even more important when we start working with images, which arrive as n-dimensional arrays with 3 axes corresponding to height, width, and a channel axis that represents the color channels (red, green, and blue).
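    For example, a tensor with three axes:
    X = torch.arange(24).reshape(2, 3, 4)  # shape (2, 3, 4)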
  • By default, calling the sum function reduces a tensor along all of its axes, producing a scalar. We can also specify the axis along which the tensor is reduced by summation. Taking a matrix as an example, to reduce along axis 0 by summing the elements of all rows, we specify axis=0 when calling the function; since the input matrix is reduced along axis 0, that dimension disappears from the output shape, leaving a vector. Specifying axis=1 instead sums the elements of all columns (axis 1), so the dimension of axis 1 disappears from the output shape.
  • Besides sum there is also mean, which can likewise be controlled with axis: for a matrix with 4 columns, axis=0 averages over the rows, producing one mean per column (4 values), while axis=1 averages over the columns, producing one mean per row.

Note that axis=0 does not mean summing each row by itself: it sums across the rows, i.e. for each position i it adds up the i-th element of every row; axis=1 is the reverse. An example follows.
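For example, for a 5x4 matrix A:

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
A.sum()  # scalar: the sum over all 20 elements
A.sum(axis=0)  # shape (4,): one sum per column
A.sum(axis=1)  # shape (5,): one sum per row
A.mean(axis=0)  # shape (4,): one mean per column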

6.1 Summation without dimensionality reduction

Sometimes it is useful to keep the number of axes constant when calling the function to calculate the sum or mean.
If we want the cumulative sum of the elements of A along some axis, say axis=0 (row by row), we can call the cumsum function. This function does not reduce the dimensionality of the input tensor along any axis: each output element is the running total of everything up to and including that position.
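For example (reusing the 5x4 matrix A from above):

sum_A = A.sum(axis=1, keepdims=True)  # shape (5, 1): the summed axis is kept
A / sum_A  # broadcasting now works row by row
A.cumsum(axis=0)  # running totals down each column; same shape as A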

6.2 Dot product

The dot product of two vectors is the sum of the products of the elements at the same positions: x^T y = sum_i x_i * y_i. In code it is computed with torch.dot(x, y), or equivalently by an element-wise multiplication followed by a sum.
Note that torch.arange(5) produces a vector, not a matrix, so be careful.
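For example:

x = torch.arange(4, dtype=torch.float32)  # tensor([0., 1., 2., 3.])
y = torch.ones(4)
torch.dot(x, y)  # tensor(6.)
torch.sum(x * y)  # equivalent: element-wise multiply, then sum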

6.3 Matrix multiplication

The matrix-matrix product C = AB has entries c_ij = sum_k a_ik * b_kj; in code it is computed with torch.mm(A, B).
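For example:

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = torch.ones(4, 3)
torch.mm(A, B)  # matrix-matrix product, shape (5, 3)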

6.4 Norm

L2 norm:
||x||_2 = sqrt(x_1^2 + ... + x_n^2)

The L2 norm is the square root of the sum of the squares of the elements; it is calculated with torch.norm(tensor).

L1 norm:
||x||_1 = |x_1| + ... + |x_n| (the sum of the absolute values of the elements)
Both are special cases of the more general Lp norm: ||x||_p = (sum_i |x_i|^p)^(1/p); substituting p = 1 or p = 2 recovers the two formulas above.

Frobenius norm:
||X||_F = sqrt(sum_i sum_j x_ij^2), the square root of the sum of the squares of a matrix's elements; it behaves like the L2 norm of the matrix flattened into a vector.
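For example:

u = torch.tensor([3.0, -4.0])
torch.norm(u)  # L2 norm: tensor(5.)
torch.abs(u).sum()  # L1 norm: tensor(7.)
torch.norm(torch.ones((4, 9)))  # Frobenius norm of a matrix: tensor(6.)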
