Introduction to Basic Concepts of Machine Learning

Machine learning is about giving a machine the ability to find a function.

For example:

  • Speech recognition: the function's input is an audio signal, and its output is text

  • Image recognition: the function's input is an image, and its output is the name of the object in the image

  • AlphaGo: the function's input is the arrangement of black and white stones on the board, and its output is the position of the next move

Different types of Functions (machine learning tasks)

Regression

Output a single numerical value (a scalar).

Classification

Choose the correct one from a set of given options (classes).

AlphaGo itself is also a classification problem, just with far more output options (one per board position).

Structured Learning

Generate a structured object, i.e. let the machine learn to create, for example drawing a picture or writing an article.

Steps (steps for machine learning to find functions)

1. Function with Unknown Parameters

Build a model (based on domain knowledge); that is, define a function with unknown parameters.
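As a minimal sketch of step 1, a linear model in Python might look like the following, where `w` and `b` are the unknown parameters that training will later determine:

```python
# A function with unknown parameters: x is a known feature,
# while w (weight) and b (bias) are unknowns to be learned.

def model(x, w, b):
    """Linear model y = b + w * x."""
    return b + w * x

# With guessed parameter values w=2, b=1, the model predicts:
print(model(3.0, w=2.0, b=1.0))  # 7.0
```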

2. Define Loss from Training Data

The loss function takes the unknown parameters of the model as its input and measures how good a given set of parameter values is.

Common loss functions:

  • MAE - L is the mean of the absolute errors (mean absolute error)
  • MSE - L is the mean of the squared errors (mean squared error)
  • If the predicted value and the true value are probability distributions --> cross-entropy
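The two error-based losses above can be sketched directly in Python (a toy illustration with hand-picked predictions and labels):

```python
def mae(preds, labels):
    """Mean absolute error: average of |prediction - label|."""
    return sum(abs(p - y) for p, y in zip(preds, labels)) / len(labels)

def mse(preds, labels):
    """Mean squared error: average of (prediction - label)^2."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)

preds, labels = [2.0, 4.0], [1.0, 6.0]
print(mae(preds, labels))  # 1.5  ( (1 + 2) / 2 )
print(mse(preds, labels))  # 2.5  ( (1 + 4) / 2 )
```

Note that MSE punishes large errors much harder than MAE, which is one reason the choice of loss matters.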

Note:

  • The loss function is something you define yourself, so its value may well be negative.

  • The true value is often denoted as the label .

3. Optimization

Find the optimal parameters, i.e. the set of parameters that minimizes the value of the loss function.

Gradient Descent

Steps:

  1. Randomly pick an initial value for each parameter.

  2. Compute the derivative of the loss function with respect to each unknown parameter at the current point, then substitute it into the update formula θ ← θ − η·(∂L/∂θ) to obtain the new parameter value.

    η : the learning rate. The learning rate is a value you choose yourself; a larger setting makes learning faster.

    Values like this, which you must set yourself when doing machine learning, are called hyperparameters.

  3. Iteratively update the unknown parameters.
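The three steps above can be sketched for a single parameter w, using the toy loss L(w) = (w − 3)², whose derivative is 2(w − 3) (the loss and starting value are made up for illustration):

```python
# Gradient descent on L(w) = (w - 3)^2, minimized at w = 3.

def grad(w):
    """Derivative of the toy loss: dL/dw = 2 * (w - 3)."""
    return 2.0 * (w - 3.0)

w = 0.0      # step 1: pick an initial value (fixed here, not random)
eta = 0.1    # learning rate, a hyperparameter
for _ in range(100):          # step 3: iterate
    w = w - eta * grad(w)     # step 2: w <- w - eta * dL/dw

print(round(w, 4))  # 3.0 (converges to the minimum)
```

Each iteration shrinks the distance to the minimum by a constant factor, which is why a larger η speeds things up until it becomes too large and the updates overshoot.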

Models

Linear Models

Each feature is multiplied by a weight, and a bias is added, to produce a prediction:

  • Format: y = b + w*x

But linear models usually have a severe limitation: no matter how the parameters are set, they can only represent straight lines. This limitation, which comes from the model itself, is called Model Bias .

Piecewise Linear Curves

Hard Sigmoid

for Linear Regression

All piecewise linear curves = constant + sum of a set of Hard Sigmoid functions.

All piecewise linear curves can be written as combinations of a set of Hard Sigmoids (the blue, step-like function in the original figures).

Soft Sigmoid(Sigmoid)

For Logistic Regression

The Hard Sigmoid (the blue function) can be approximated using the sigmoid function.

The shape of the curve can be adjusted through the parameters of the sigmoid function y = c·sigmoid(b + w·x):

  • w controls the slope
  • b moves the curve left and right
  • c adjusts the height
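A small sketch of this parameterized sigmoid, showing how c sets the height and b shifts the midpoint (the parameter values below are arbitrary examples):

```python
import math

def soft_sigmoid(x, c, b, w):
    """c * sigmoid(b + w * x): c scales height, b shifts left/right, w sets slope."""
    return c / (1.0 + math.exp(-(b + w * x)))

# Far to the right the curve approaches its height c:
print(round(soft_sigmoid(100.0, c=2.0, b=0.0, w=1.0), 4))  # 2.0
# With b = -w * x0 the midpoint sits at x0; at the midpoint the output is c/2:
print(soft_sigmoid(5.0, c=2.0, b=-5.0, w=1.0))  # 1.0
```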

When we have multiple features, i.e. multiple x values, the function takes the form y = b + Σᵢ cᵢ·sigmoid(bᵢ + Σⱼ wᵢⱼ·xⱼ), and linear algebra lets us simplify it to the matrix form y = b + cᵀ·σ(b⃗ + W·x), where the vector b⃗ collects the bᵢ, the matrix W collects the weights wᵢⱼ, and c collects the cᵢ.
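A direct transcription of that multi-feature formula in plain Python (the parameter values are hypothetical, chosen only so the arithmetic is easy to check):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, W, b_vec, c, b):
    """y = b + sum_i c[i] * sigmoid(b_vec[i] + sum_j W[i][j] * x[j])."""
    y = b
    for i in range(len(c)):
        r = b_vec[i] + sum(W[i][j] * x[j] for j in range(len(x)))
        y += c[i] * sigmoid(r)
    return y

# 2 sigmoids over 3 features; both pre-activations come out to 0,
# so each sigmoid contributes 0.5 and y = 0.5 + 0.5 + 0.5 = 1.5.
x = [1.0, 0.0, -1.0]
W = [[1.0, 0.0, 1.0],   # row i holds the weights feeding sigmoid i
     [0.0, 2.0, 0.0]]
print(predict(x, W, b_vec=[0.0, 0.0], c=[1.0, 1.0], b=0.5))  # 1.5
```

In practice this loop is exactly what the matrix form y = b + cᵀσ(b⃗ + Wx) computes in one shot.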

Optimization (Supplementary)

gradient

Collect the partial derivatives of the loss function with respect to each parameter into a vector called the gradient, abbreviated ∇L(θ⁰), where θ⁰ means the derivatives are evaluated at the point θ⁰.

update and epoch

Since datasets are usually large, the loss L over all the data is generally not used directly for optimization. Instead, the data is divided into batches (the grouping can be arbitrary), and each parameter update uses only the L of one batch.

Each parameter update is called an update, and one full pass over all the batches is called an epoch.

Updating the parameters takes a long time, so training usually ends when you decide to stop updating, or when the derivative reaches 0.
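The relationship between updates and epochs can be sketched with a toy loop (the data and batch size here are made up; with N examples and batch size B, one epoch performs N / B updates):

```python
data = list(range(10))   # 10 toy examples
batch_size = 2           # a hyperparameter
epochs = 3

updates = 0
for epoch in range(epochs):
    # One epoch: visit every batch once.
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # ... compute the loss on this batch only and update parameters ...
        updates += 1

print(updates)  # 15 updates: 5 per epoch * 3 epochs
```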

hyperparameters
  • The batch size here is also set by yourself, so it is a hyperparameter.

  • The number of sigmoids used is also a hyperparameter.

ReLU

Of course, it is not necessary to use the soft sigmoid; the hard sigmoid can be used as well. In machine learning, the function used to build the hard sigmoid is called ReLU (Rectified Linear Unit): y = c·max(0, b + w·x).

If you want to convert the sigmoid model into a ReLU model, you only need to replace the sigmoid in the function with max(0, ·). Since it takes a combination of two max (ReLU) terms to form one hard sigmoid, the number of summed terms i should be doubled to 2i.

1000 ReLUs can model a piecewise linear curve with a thousand segments.
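The claim that two ReLUs combine into one hard sigmoid can be verified with a tiny sketch (the ramp interval [0, 1] is an arbitrary choice for illustration):

```python
def relu(z):
    """ReLU: max(0, z)."""
    return max(0.0, z)

def hard_sigmoid(x):
    """Difference of two ReLUs: flat at 0, ramps up on [0, 1], flat at 1."""
    return relu(x) - relu(x - 1.0)

print(hard_sigmoid(-2.0))  # 0.0  (flat left section)
print(hard_sigmoid(0.5))   # 0.5  (sloped middle section)
print(hard_sigmoid(3.0))   # 1.0  (flat right section)
```

This is why converting a sigmoid model to ReLU doubles the number of terms: each hard sigmoid costs two ReLUs.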

Activation function

Sigmoid and ReLU are called activation functions , and of course there are other activation functions as well.

To make the model more powerful, we can take the results produced by sigmoid or ReLU and pass them through sigmoid or ReLU again.
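Stacking activations like this amounts to feeding one layer's outputs into another layer, as in this sketch (the weights, biases, and layer sizes are arbitrary examples):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One layer: each output is sigmoid(bias + weighted sum of the inputs)."""
    return [sigmoid(b + sum(w * x for w, x in zip(ws, inputs)))
            for ws, b in zip(weights, biases)]

# A 2-feature input passed through two stacked layers:
x = [1.0, -1.0]
h = layer(x, weights=[[1.0, 1.0], [2.0, 0.0]], biases=[0.0, 0.0])  # first layer
y = layer(h, weights=[[1.0, -1.0]], biases=[0.0])                  # second layer
print(len(h), len(y))  # 2 1
```

The second call applies sigmoid to results that already went through sigmoid, which is exactly the repetition described above.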

Deep Learning

These sigmoid or ReLU units are called Neurons , and many Neurons form a Neural Network .

Because Neural Networks were hyped too much when they first appeared, they were later dismissed; so, to revive their reputation, the name was changed.

A layer of Neurons is called a hidden layer , and a model composed of many layers is called Deep Learning .

Overfitting

During training, a neural network's accuracy keeps improving on the training data, but its accuracy on unseen data becomes worse.

When more layers of activation functions are stacked, overfitting becomes more likely.

Steps (steps for machine learning to find functions - neural networks)

  1. Define Neural Network
  2. Loss Function
  3. Optimization Algorithm

Origin blog.csdn.net/qq_61539914/article/details/126574064