"30 minutes" will take you into the world of linear regression, and easily learn the recommendation algorithm of Internet marketing!

 

Foreword

This chapter introduces and derives linear regression, an algorithm used in recommendation systems. The article proceeds as follows: starting from an introduction to machine learning, it narrows to supervised learning, and then focuses on the linear regression problem within regression, together with its derivation.

Some background in statistics and advanced mathematics will be helpful.

1. Starting from machine learning

1.1 The Beginning of Machine Learning

In 1952, IBM's Arthur Samuel (known as the "Father of Machine Learning") designed a checkers program that could learn.

The program could build new models by observing the positions of the pieces and use them to improve its play.

Samuel played many games against the program and found that its skill grew better and better over time.

1.2 What is learning? What is machine learning?

Starting from people: humans learn by ① studying theory and summarizing from practice; ② deriving results in theory and testing them in practice; ③ in general, acquiring knowledge or skills through various means.

So how do machines learn?

  • It handles a specific task by drawing on a large amount of "experience"
  • There are defined criteria for judging how well the task is completed
  • It gets better at the task by analyzing the empirical data

1.3 Classification of Machine Learning Algorithms

Machine learning is a method by which a computer builds probabilistic and statistical models from the distribution of data and uses those models to analyze and predict data. According to how the data distribution is learned, the main forms of machine learning are supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning:

Other subdivisions, as shown in the figure below:

1.4 Machine Learning Modeling Process

  1. Clarify the business problem

Defining the business problem is the prerequisite for machine learning. The real business problem must be abstracted into a machine learning solution: the goal is to decide what data to learn from as input, and what kind of model should produce the decisions as output.

  2. Data selection

The data determines the upper limit of the machine learning result; the algorithm can only approach that limit. The quality of the data determines the final performance of the model.

  3. Feature engineering

Feature engineering converts the raw data into features useful to the model, and includes data preprocessing and feature extraction.

  4. Model training

Model training is the process of selecting a model and letting it learn the data distribution. This process also requires adjusting the algorithm's (hyper)parameters according to the training results to improve them.

  5. Model evaluation

The purpose of model learning is to obtain a model with good predictive ability (generalization ability) on new data. In practice, how well the model fits the training data and how well it generalizes are usually evaluated by the training error and the test error, respectively (see the short sketch after this list).

  6. Model application

Decision-making is the ultimate goal of machine learning: the model's predictions are analyzed, interpreted, and applied to the actual field of work.

These steps will not be explained in detail here; they will be broken down in the next chapter.
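As a taste of steps 4 and 5, here is a minimal sketch of training a model and comparing training error with test error; sklearn and the synthetic data are assumptions chosen purely for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data: a linear signal plus noise (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 1, 100)

# Hold out part of the data to measure generalization
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)                  # model training
train_err = mean_squared_error(y_train, model.predict(X_train))   # training error
test_err = mean_squared_error(y_test, model.predict(X_test))      # test error
print("train MSE:", train_err, "test MSE:", test_err)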

1.5 Introduction to Supervised Learning

Supervised learning uses labeled training data; the "supervision" can be understood as knowing the desired output signal (label) for each training sample (the input data).

In the supervised learning process, labeled training data is first provided to a machine learning algorithm to fit a prediction model, and the model is then used to make predictions on new, unlabeled data.

Classification of Supervised Learning

Supervised learning problems can be mainly divided into two categories, namely classification problems and regression problems.

  • Classification problems predict which class the data belongs to: a discrete output
  • Regression problems predict a value from the data: a continuous output

Classification problems

Classification problems predict which category the data belongs to. Examples of classification include spam detection, churn prediction, sentiment analysis, dog breed detection, etc.

For example: judging whether a tumor is benign or malignant from its characteristics. The result is "benign" or "malignant", which is discrete.
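A minimal classification sketch; sklearn's built-in breast cancer dataset is used here purely as an illustrative stand-in for the tumor example:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Each sample is a tumor described by numeric features; the label is benign/malignant
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)  # discrete (class) output
print("test accuracy:", clf.score(X_test, y_test))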

Related classification (Classification) algorithms:

  • K Nearest Neighbors (K-NN)
  • Naive Bayes
  • Logistic Regression
  • Support Vector Machine (SVM)
  • Decision Trees

Regression problems

Regression problems predict values based on previously observed data. Examples of regression include house price prediction, stock price prediction, height-weight prediction, etc.

For example: predicting housing prices by fitting a continuous curve to the sample set.

Related regression (Regression) algorithms:

  • Linear Regression
  • Polynomial Regression
  • Ridge/Lasso Regression

2. Linear Regression (Linear Regression) for regression problems

2.1 Introducing regression through a problem

For example: suppose we need to analyze which factors affect the amount of a bank loan.

After feature engineering, two features are extracted (salary and age) that affect the bank loan, as follows:

salary    age    quota (loan amount)
4000      25     20000
8000      30     70000
5000      28     35000
7500      33     50000
12000     40     85000

Goal: predict how much the bank will lend me (the label)

Consider: salary and age both affect the loan result, so how much influence does each of them have? (the parameters)
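Before any derivation, here is a minimal sketch that fits these five samples directly, just to make the goal concrete; using sklearn here is an assumption for illustration, and the fitted coefficients are the "parameters" we are after:

import numpy as np
from sklearn.linear_model import LinearRegression

# The five samples from the table above: features are (salary, age), label is quota
X = np.array([[4000, 25], [8000, 30], [5000, 28], [7500, 33], [12000, 40]])
y = np.array([20000, 70000, 35000, 50000, 85000])

model = LinearRegression().fit(X, y)
print("influence of salary and age:", model.coef_)   # one weight per feature
print("bias term:", model.intercept_)
print("predicted quota for salary=6000, age=30:", model.predict([[6000, 30]]))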

Fitting equation

With the two features $x_1$ (salary) and $x_2$ (age), the fitted (hypothesis) equation takes the standard linear form

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 = \theta^T x$$

where $\theta_0$ is the bias term (appending a constant feature $x_0 = 1$ lets the whole equation be written compactly as $\theta^T x$).

Error term - loss function

Each true value differs from the prediction by an error term $\varepsilon$:

$$y^{(i)} = \theta^T x^{(i)} + \varepsilon^{(i)}$$

Assuming the errors $\varepsilon^{(i)}$ are independent, identically distributed Gaussian noise with mean 0 and variance $\sigma^2$, maximizing the likelihood of the observed data is equivalent to minimizing the sum of squared errors, which yields the loss function

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

Parameter Solving

Stacking the $m$ samples into a matrix $X$ (one row per sample, including the constant feature) and the labels into a vector $y$, the loss becomes

$$J(\theta) = \frac{1}{2} (X\theta - y)^T (X\theta - y)$$

Setting the gradient to zero,

$$\nabla_\theta J(\theta) = X^T (X\theta - y) = 0$$

gives the closed-form solution, known as the normal equation:

$$\theta = (X^T X)^{-1} X^T y$$

2.2 Least square method

For simple (one-variable) linear regression $y = wx + b$ with $M$ samples, the least square method minimizes the total squared error

$$L(w, b) = \sum_{i=1}^{M} (y_i - w x_i - b)^2$$

Setting the partial derivatives with respect to $w$ and $b$ to zero gives

$$w = \frac{\sum_{i=1}^{M} y_i (x_i - \bar{x})}{\sum_{i=1}^{M} x_i^2 - M \bar{x}^2}, \qquad b = \frac{1}{M} \sum_{i=1}^{M} (y_i - w x_i)$$

where $\bar{x}$ is the mean of the $x_i$. These are exactly the formulas implemented in Section 3.1 below.
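The multivariate closed form from Section 2.1 can be checked in a few lines of NumPy. A minimal sketch, reusing the illustrative loan table from above (in practice np.linalg.lstsq or np.linalg.pinv is the numerically safer choice):

import numpy as np

# The loan table from Section 2.1 (illustrative data)
X = np.array([[4000, 25], [8000, 30], [5000, 28], [7500, 33], [12000, 40]], dtype=float)
y = np.array([20000, 70000, 35000, 50000, 85000], dtype=float)

Xb = np.hstack([np.ones((len(X), 1)), X])      # prepend the constant feature x0 = 1
theta = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y    # the normal equation
print("theta (bias, salary weight, age weight):", theta)

# np.linalg.lstsq solves the same least squares problem more stably
theta_lstsq, *_ = np.linalg.lstsq(Xb, y, rcond=None)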

2.3 Gradient Descent

2.3.1 Introduction

Gradient descent is an iterative optimization algorithm: starting from an initial point, it repeatedly takes a small step in the direction of the negative gradient (the direction of steepest descent), walking "downhill" on the loss surface until it reaches a minimum.

2.3.2 Introduction to Gradient Descent

For a loss function $J(\theta)$, each parameter is updated by stepping against its partial derivative:

$$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}$$

α is called the learning rate or step size of the gradient descent algorithm: it controls how far each step moves. α can be neither too large nor too small. If it is too small, reaching the minimum takes a very long time; if it is too large, the lowest point may be stepped over and missed!

Gradient descent idea

The central idea is to adjust the parameters iteratively so as to minimize the cost function.

Start from a random value of θ (random initialization) and improve gradually: each step tries to reduce the cost function (such as the MSE) a little, until the algorithm converges to a minimum. The effective step size is proportional to the slope of the cost function, so the steps naturally become smaller as the parameters approach the minimum.

If the learning rate is too low, the algorithm needs a large number of iterations to converge.

If the learning rate is too high, each step may jump across the valley and land higher than the previous point, causing the algorithm to diverge, with the values growing larger and larger (see the sketch below).
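Both behaviors are easy to reproduce. A minimal sketch on y = x^2 (gradient 2x); the particular learning rates are illustrative choices:

# lr = 0.1 shrinks x toward the minimum at 0; lr = 1.1 overshoots
# further with every step and diverges
for lr in (0.1, 1.1):
    x = 5.0
    for _ in range(10):
        x = x - lr * 2 * x          # gradient descent update on y = x^2
    print(f"lr={lr}: x after 10 steps = {x}")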

Two main challenges of gradient descent

Gradient descent faces two main challenges:

  • It may fail to find the global optimal solution and settle for a local optimum instead
  • Only when the loss function is convex (as the MSE of linear regression is) is the solution it finds guaranteed to be the global optimum

2.3.3 Comparison of Gradient Descent and Normal Equation

With an optimization algorithm such as gradient descent, regression gains the ability to "learn automatically". The two approaches compare as follows:

Gradient descent                          Normal equation (e.g. least squares)
Requires choosing a learning rate         Not needed
Solved by iteration                       Solved in one computation
Usable with a large number of features    Must compute (XᵀX)⁻¹; time complexity is O(n³)

2.3.4 Gradient descent classification

There are three different forms of gradient descent: Batch Gradient Descent, Stochastic Gradient Descent, and Mini-Batch Gradient Descent.

 

batch gradient descent

(1) Take the partial derivative of the objective function with respect to each parameter:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

(the factor $1/m$ averages the gradient over all samples; it only rescales the learning rate)

(2) Update the parameters on each iteration, using all $m$ samples:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

It reaches the optimal solution reliably, but because all samples are considered at every step, it is very slow.

stochastic gradient descent

The specific idea: each update of θ no longer traverses the whole training set; a single training sample is enough for one update, after which the next sample is used for the next update, iterating and updating continuously just as batch gradient descent does.

Summary: each update looks at a single sample, so iteration is fast, but an individual step does not necessarily move in the direction of convergence.

Mini-batch Gradient Descent

Each update selects a small batch of the data for the computation, a practical compromise between the two! A sketch covering all three variants follows.
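The three variants differ only in how many samples feed each update. A minimal NumPy sketch, in which batch_size = M recovers batch gradient descent and batch_size = 1 recovers stochastic gradient descent (the data, learning rate and batch size are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
M = 200
X = rng.uniform(0, 10, size=(M, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 1, M)             # illustrative linear data
Xb = np.hstack([np.ones((M, 1)), X])                   # prepend the bias feature

theta = np.zeros(2)
lr, epochs, batch_size = 0.01, 200, 20

for _ in range(epochs):
    order = rng.permutation(M)                         # shuffle the samples each epoch
    for start in range(0, M, batch_size):
        batch = order[start:start + batch_size]
        Xm, ym = Xb[batch], y[batch]
        grad = Xm.T @ (Xm @ theta - ym) / len(batch)   # averaged gradient on the batch
        theta -= lr * grad                             # parameter update

print("theta (bias, slope):", theta)                   # should approach (2, 3)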

2.4 Others: Newton and quasi-Newton methods

Newton method

Newton's method uses second-order information: at each iteration it approximates the objective locally by a quadratic and jumps to that quadratic's minimum,

$$\theta := \theta - H^{-1} \nabla_\theta J(\theta)$$

where $H$ is the Hessian matrix (the matrix of second-order partial derivatives) of the objective function.

quasi Newton method

  • Newton's method must compute and invert the Hessian matrix of the objective function at every step, which is computationally expensive
  • The quasi-Newton method approximates the inverse of the Hessian with a positive definite matrix built from gradient information, which greatly simplifies the computation
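In practice these methods are rarely implemented by hand; scipy's optimizer exposes BFGS, a classic quasi-Newton method. A minimal sketch, with an illustrative convex objective:

import numpy as np
from scipy.optimize import minimize

def objective(theta):
    # illustrative convex objective: (t0 - 3)^2 + (t1 + 1)^2
    return (theta[0] - 3) ** 2 + (theta[1] + 1) ** 2

result = minimize(objective, x0=np.zeros(2), method='BFGS')
print("minimizer:", result.x)     # should be close to [3, -1]
print("iterations:", result.nit)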

 

3. Algorithm implementation

3.1 Least square method

Prepare data: data.csv

Code:

import numpy as np
import matplotlib.pyplot as plt

# 1. Import data
points = np.genfromtxt('data.csv', delimiter=',')
# Extract two columns of data in points, as x, y respectively
x = points[:, 0]
y = points[:, 1]

# Draw a scatterplot with plt
plt.scatter(x, y)
plt.show()

# 2. Define the loss function
# The loss function is a function of the coefficient, and the x, y of the data must also be passed in
def compute_cost(w, b, points):
    total_cost = 0
    M = len(points)
    
    # Calculate the squared loss error point by point, then average
    for i in range(M):
        x = points[i, 0]
        y = points[i, 1]
        total_cost += ( y - w * x - b ) ** 2
    
    return total_cost/M

# First define a helper function to compute the mean
def average(data):
    total = 0
    num = len(data)
    for i in range(num):
        total += data[i]
    return total / num

# 3. Define the core fitting function
def fit(points):
    M = len(points)
    x_bar = average(points[:, 0])
    
    sum_yx = 0
    sum_x2 = 0
    sum_delta = 0
    
    for i in range(M):
        x = points[i, 0]
        y = points[i, 1]
        sum_yx += y * ( x - x_bar )
        sum_x2 += x ** 2
    # Calculate w according to the formula
    w = sum_yx / ( sum_x2 - M * (x_bar**2) )
    
    for i in range(M):
        x = points[i, 0]
        y = points[i, 1]
        sum_delta += ( y - w * x )
    b = sum_delta / M
    
    return w, b

# 4. Test
w, b = fit(points)

print("w is: ", w)
print("b is: ", b)

cost = compute_cost(w, b, points)

print("cost is: ", cost)

# 5. Draw the fitting curve
plt.scatter(x, y)
# For each x, calculate the predicted y value
pred_y = w * x + b

plt.plot(x, pred_y, c='r')
plt.show()

The resulting plot is as follows:

3.2 Gradient Descent

The following example applies the gradient descent algorithm to find the minimum of y=x^2+1.

Algorithm flow:

  1. Define the initial value of the independent variable x1
  2. Compute the gradient of the function at x1 and update the parameter (finding the steepest next point x2)
  3. Compute the gradient of the function at x2 and update the parameter (finding the steepest next point x3)
  4. Iterate continuously until successive values are approximately equal (the bottom has been reached)

#### 1-1 Load dependent libraries and define the function
import numpy as np
import matplotlib.pyplot as plt
# define y=x^2+1 function
def function(x):
    x = np.array(x)
    y = x ** 2 + 1
    return y

#### 1-2 Define the initial values of the parameters
# Specify the number of updates of the independent variable (the number of iterations)
epochs = 50
# Specify the value of the learning rate
lr = 0.1
# Initialize the value of the argument
xi = -18

#### 1-3 Solve the gradient, update the parameter, and train continuously
# Find the gradient value of the function
def get_gradient(x):
    gradient = 2 * x
    return gradient
# Used to store the value after each independent variable update
trajectory = []
# Use the gradient descent algorithm to find the value x_star of the independent variable that makes the function take the minimum value
def get_x_star(xi):
    for i in range(epochs):
        trajectory.append(xi)
        xi = xi - lr * get_gradient(xi)
    x_star = xi
    return x_star
# Run the get_x_star function
get_x_star(xi)


#### 1-4 Display the result
x1 = np.arange(-20, 20, 0.1)
y = function(x1)
# draw function graph
plt.plot(x1, y)
x_trajectory = np.array(trajectory)
y_trajectory = function(x_trajectory)
# The value of the independent variable and its corresponding function during the drawing update process
plt.scatter(x_trajectory, y_trajectory)
plt.show()

The resulting plot is shown below:

4. Linear regression application based on sklearn

Linear regression is a regression analysis method that models the relationship between one or more independent variables and a dependent variable. If there is only one independent variable, it is called simple linear regression; if there are two or more, it is called multiple regression. The linear_model module in sklearn integrates almost all linear models, and linear regression can be performed through its LinearRegression class.

Case: find the fitted line y=ax+b of house price against house area, and predict the house price from the area.

Prepare data:

The code is as follows:

# 1-1 Import the corresponding data modules
# (pandas is required by get_data below and was missing from the original imports)
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Read data from the csv file, respectively: X list and corresponding Y list
def get_data(file_name):
    # 1. Read csv with pandas
    data = pd.read_csv(file_name)
    print('data', data)
    # 2. Construct X list and Y list
    X_parameter = []
    Y_parameter = []
    for single_square_feet,single_price_value in zip(data['square_feet'],data['price']):
        X_parameter.append([float(single_square_feet)])
        Y_parameter.append([float(single_price_value)])
    return X_parameter,Y_parameter

# 1-2 Import the corresponding basic training data set
X,Y = get_data('./house_price.csv')

# 1-3 Drawing
regr = LinearRegression() # Construct regression object
regr.fit(X,Y)
predict_outcome = regr.predict([[700]]) # Get the predicted value and predict the price of a house with a size of 700 square feet

# Forecast line information
print(regr.intercept_) # intercept value
print(regr.coef_) # regression coefficient (slope value)
print(predict_outcome) # predicted value

plt.scatter(X,Y,color = 'blue') # Draw known data scatter diagram
plt.plot(X,regr.predict(X),color = 'red',linewidth = 4) # draw the predicted line
plt.title('Predict the house price')
plt.xlabel('square feet')
plt.ylabel('price')
plt.show() # display image

The effect is as follows:

Stochastic gradient descent application

# Stochastic gradient descent with sklearn
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Assume X, Y are the house data loaded above; flatten Y to 1-D and hold out a test set
# (the split is an assumption added here to make the fragment runnable)
X_train, X_test, y_train, y_test = train_test_split(X, np.ravel(Y), test_size=0.2, random_state=0)

# Standardize the data (gradient descent is sensitive to feature scale)
std = StandardScaler()
std.fit(X_train)
X_train_std = std.transform(X_train)
X_test_std = std.transform(X_test)

# max_iter is the number of passes over the training data
# (this parameter was called n_iter in old sklearn versions, with a default of 5)
sgd_reg = SGDRegressor(max_iter=100)
sgd_reg.fit(X_train_std, y_train)
print(sgd_reg.score(X_test_std, y_test))   # score() returns the R² coefficient


Origin blog.csdn.net/weixin_43805705/article/details/130974478