Machine Learning Regression Algorithms: Linear Regression

1. Concept

Linear regression (Linear Regression) is one of the simpler regression algorithms. It is a supervised learning algorithm similar to logistic regression, except that linear regression does not pass its output through a Sigmoid function.

Linear regression fits a straight line that, to some degree, captures the trend and distribution of the data points. Once the line is fitted, it can be used to extrapolate where later points will fall, which is what makes prediction possible.
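For instance, fitting a line to a handful of points and reading off a prediction takes only a few lines of NumPy. This is a minimal sketch; the sample numbers and the use of np.polyfit here are purely illustrative and independent of the implementation below:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])          # points lying roughly on y = 2x
slope, intercept = np.polyfit(x, y, deg=1)  # fit a degree-1 polynomial, i.e. a line
print(slope, intercept)                     # roughly 1.94 and 0.15
print(slope * 5.0 + intercept)              # predict y at a new point x = 5 (about 9.85)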

2. Computation

Apart from omitting the Sigmoid function, the computation is the same as in logistic regression: a linear combination of the inputs is fitted to the targets by gradient descent.
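In symbols, with a bias column of ones appended to the data matrix $X$ (as done by the augment helper below), the hypothesis, the squared-error loss, and the gradient-descent update are:

$$\hat{y} = Xw, \qquad J(w) = \tfrac{1}{2}\lVert Xw - y \rVert^2, \qquad w \leftarrow w - \alpha\, X^\top (Xw - y)$$

This update, repeated until the step becomes negligible, is exactly the loop in the hand-written implementation below.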

3. Implementation

Linear regression is implemented twice below, once with sklearn and once with a hand-written gradient-descent version:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

cs = ['black', 'blue', 'brown', 'red', 'yellow', 'green']  # color palette (unused in this demo)


def create_sample():
    """Generate a 2-D point cloud whose spread grows with the point index."""
    np.random.seed(5)  # fix the random seed so every run generates the same data
    n_dim = 2
    num = 100
    k = 1
    data_mat = 1 * np.random.randn(1, n_dim)
    for i in range(num - 1):
        k += 1
        b = k * np.random.randn(1, n_dim)  # the scale factor k grows, spreading points out
        data_mat = np.concatenate((data_mat, b))
    return {'data_mat': data_mat}


def grad_descent(data_mat, class_label, alpha):
    """Fit y = w * x + b by batch gradient descent on the squared error."""
    data_matrix = np.asarray(data_mat).reshape(-1, 1)
    label_mat = np.asarray(class_label).reshape(-1, 1)
    data_matrix = augment(data_matrix)  # append a bias column of ones
    weight = np.ones((data_matrix.shape[1], 1))
    while True:
        error = data_matrix @ weight - label_mat  # residuals: Xw - y
        step = alpha * data_matrix.T @ error  # gradient step: alpha * X^T (Xw - y)
        if np.sum(np.abs(step)) < 0.00001:  # stop once the update is negligible
            break
        weight = weight - step
    return weight.flatten()


def augment(data_matrix):
    """Append a column of ones so the bias term is learned as an extra weight."""
    n = data_matrix.shape[0]
    return np.concatenate((data_matrix, np.ones((n, 1))), axis=1)


def plot_data(samples, color, plot_type='o'):
    plt.plot(samples[:, 0], samples[:, 1], plot_type, markerfacecolor=color, markersize=14)


def sk_linear_regression(x, y):
    """Fit the same line with sklearn and return [slope, intercept]."""
    linear_regression = linear_model.LinearRegression()
    linear_regression.fit(x, y)
    # coef_ has shape (1, 1) and intercept_ has shape (1,), so flatten them together
    return np.append(linear_regression.coef_.ravel(), linear_regression.intercept_)


def main():
    data = create_sample()
    # sklearn expects 2-D inputs, hence the column slices
    weight_sk = sk_linear_regression(data['data_mat'][:, 0:1], data['data_mat'][:, 1:2])
    print(weight_sk)
    weight = grad_descent(data['data_mat'][:, 0], data['data_mat'][:, 1], 0.000001)
    print(weight)
    plot_data(data['data_mat'], 'red')
    lx = [-200, 200]  # x range over which to draw both fitted lines
    ly = [-200 * weight[0] + weight[1], 200 * weight[0] + weight[1]]
    ly_sk = [-200 * weight_sk[0] + weight_sk[1], 200 * weight_sk[0] + weight_sk[1]]
    plt.plot(lx, ly)
    plt.plot(lx, ly_sk)
    plt.show()


if __name__ == '__main__':
    main()

Results:

sklearn: [0.1165388985642626 3.958251157566739]
hand-written version: [0.11655941 3.85822306]

As the numbers show, the two fits differ only slightly, and the two lines essentially coincide when plotted.
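As an extra sanity check (not part of the original post), the same line can also be recovered in closed form with ordinary least squares, for example via np.linalg.lstsq on the bias-augmented data; the result should match the sklearn numbers above:

import numpy as np

def least_squares_fit(x, y):
    # Closed-form least squares: solve min ||Xw - y||^2 with a bias column appended to X.
    x = np.asarray(x).reshape(-1, 1)
    x_aug = np.concatenate((x, np.ones((x.shape[0], 1))), axis=1)
    w, *_ = np.linalg.lstsq(x_aug, np.asarray(y).reshape(-1, 1), rcond=None)
    return w.flatten()  # [slope, intercept]

Calling least_squares_fit(data['data_mat'][:, 0], data['data_mat'][:, 1]) on the sample data should reproduce the sklearn result.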


Reposted from www.cnblogs.com/small-office/p/10363848.html