Machine Learning – Linear Regression Models (sklearn)

Linear regression models include: the general form of simple (one-variable) and multiple linear regression; ridge regression (Ridge), which uses the L2 norm; lasso regression (Lasso), which uses the L1 norm; ElasticNet regression, which uses both the L1 and L2 norms (a fusion of Lasso and ridge regression); and logistic regression.

Linear regression-sklearn library calling method and parameter explanation:

from sklearn.linear_model import LinearRegression
LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)

Parameter meaning:

  1. fit_intercept: Boolean value, specifying whether to calculate the intercept in the linear regression, i.e. the b value. If False, the b value is not calculated.
  2. normalize: Boolean value. If True, the training samples are normalized before fitting.
  3. copy_X: Boolean value. If True, a copy of the training data is made.
  4. n_jobs: an integer. The number of CPUs to use when tasks are parallelized. A value of -1 uses all available CPUs.

Attributes:

  1. coef_: weight vector
  2. intercept_: intercept b value

Methods:

  1. fit(X,y): train the model.
  2. predict(X): Use the trained model to predict and return the predicted value.
  3. score(X,y): Returns the score of predictive performance.
    The calculation formula is: score = 1 - u/v,
    where u = ((y_true - y_pred) ** 2).sum() and v = ((y_true - y_true.mean()) ** 2).sum().
    The maximum score is 1, but it may be negative (the prediction effect is too poor). The larger the score, the better the prediction performance.
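As a quick check of this formula, here is a minimal sketch (on made-up synthetic data, not part of the original example) that computes 1 - u/v by hand and compares it with score():

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(50, 3)                                   # 50 samples, 3 features (made-up data)
y = X @ np.array([1.5, -2.0, 3.0]) + 0.5 + 0.1 * rng.randn(50)

regr = LinearRegression().fit(X, y)
y_pred = regr.predict(X)

u = ((y - y_pred) ** 2).sum()                         # residual sum of squares
v = ((y - y.mean()) ** 2).sum()                       # total sum of squares
print(1 - u / v)                                      # manual score
print(regr.score(X, y))                               # should print the same value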

Linear regression with L2 regularization - sklearn library calling method and parameter explanation

from sklearn.linear_model import Ridge
Ridge(alpha=1.0, fit_intercept=True, normalize=False, copy_X=True, max_iter=None,
tol=1e-3, solver="auto", random_state=None)

Parameter meaning:

  1. alpha: regularization coefficient; the larger the value, the stronger the regularization. The suggested practice is to set it close to 0 at first so that a good learning rate is determined; once the learning rate is fixed, give alpha a small value and then increase or decrease it by factors of 10 according to the accuracy on the validation set. Factors of 10 give a coarse adjustment; once the appropriate order of magnitude is found, fine-tune within that order of magnitude (see the alpha-search sketch after this parameter list).

  2. fit_intercept: Boolean value, specifying whether to calculate the intercept b value. False does not calculate the b value.

  3. normalize: Boolean value. If True, the data is normalized before model training. Normalization has two benefits: (1) it improves the convergence speed of the model, reducing the time needed to find the optimal solution; (2) it can improve the accuracy of the model.

  4. copy_X: Boolean value. If set to True, a copy of the training data will be made.

  5. max_iter: integer. Specifies the maximum number of iterations. If None, the solver's default is used.

  6. tol: threshold. Determine whether the iteration converges or meets the accuracy requirements.

  7. solver: string. Specifies an algorithm for solving an optimization problem.
    (1) solver='auto': the algorithm is automatically selected according to the data set.
    (2) solver='svd': use singular value decomposition to compute the solution.
    (3) solver='cholesky': use the scipy.linalg.solve function to find the solution.
    (4) solver='sparse_cg': use the scipy.sparse.linalg.cg function to find the solution.
    (5) solver='sag': use the Stochastic Average Gradient descent algorithm to solve the optimization problem.

  8. random_state: An integer or a RandomState instance, or None. It is used when solver="sag".
    (1). If an integer, it specifies the seed of the random number generator.
    (2). If it is a RandomState instance, a random number generator is specified.
    (3). If None, use the default random number generator.
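For the coarse "factor of 10" alpha search described in parameter 1 above, a minimal sketch (assumed, using the diabetes data that also appears in the practical cases below) might look like this:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

best_alpha, best_score = None, -np.inf
for alpha in [1e-3, 1e-2, 1e-1, 1, 10, 100]:          # coarse search over orders of magnitude
    score = Ridge(alpha=alpha).fit(X_train, y_train).score(X_val, y_val)
    if score > best_score:
        best_alpha, best_score = alpha, score
print(best_alpha, best_score)                          # then fine-tune within this order of magnitude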

Attributes:

  1. coef_: weight vector.
  2. intercept_: The value of the intercept b.
  3. n_iter_: The actual number of iterations.

Methods:

  1. fit(X,y): train the model.
  2. predict(X): Use the trained model to predict and return the predicted value.
  3. score(X,y): Returns the score of predictive performance.
    The calculation formula is: score = 1 - u/v,
    where u = ((y_true - y_pred) ** 2).sum() and v = ((y_true - y_true.mean()) ** 2).sum().
    The maximum score is 1, but it may be negative (the prediction effect is too poor). The larger the score, the better the prediction performance.
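As a minimal sketch (assumed, not part of the original post) of the solver options listed above, the snippet below fits Ridge with 'svd' and with 'sag' (which samples randomly, hence random_state) on the diabetes data:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True)
ridge_svd = Ridge(alpha=1.0, solver='svd').fit(X, y)
ridge_sag = Ridge(alpha=1.0, solver='sag', max_iter=10000, random_state=0).fit(X, y)
print(ridge_svd.coef_)
print(ridge_sag.coef_)
print(ridge_sag.n_iter_)                               # actual number of iterations (available for 'sag')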

Linear regression with L1 regularization - sklearn library calling method and parameter explanation

from sklearn.linear_model import Lasso
Lasso(alpha=1.0, fit_intercept=True, normalize=False, precompute=False,
copy_X=True, max_iter=1000,
tol=1e-4, warm_start=False, positive=False,random_state=None,
selection='cyclic')

Parameter meaning:

  1. alpha: regularization term coefficient
  2. fit_intercept: Boolean value, specifying whether to calculate the intercept b value. False does not calculate the b value.
  3. max_iter: Specifies the maximum number of iterations.
  4. normalize: Boolean value. If True, the data is normalized before model training. Normalization has two benefits: (1) it improves the convergence speed of the model, reducing the time needed to find the optimal solution; (2) it can improve the accuracy of the model.
  5. precompute: A boolean or a sequence. It decides whether to calculate the Gram matrix in advance to speed up the calculation.
  6. tol: threshold. Determine whether the iteration converges or meets the accuracy requirements.
  7. warm_start: Boolean value. If True, continue training using the previous training results. Otherwise train from scratch.
  8. positive: Boolean value. If True, forces all components of the weight vector to be positive.
  9. selection: string, can be "cyclic" or "random". It specifies which component of the weight vector is chosen to be updated each iteration.
    (1) "random": When updating, randomly select a component of the weight vector to update.
    (2) "cyclic": When updating, select a component of the weight vector from front to back to update
  10. random_state: An integer or a RandomState instance, or None.
    (1): If integer, it specifies the seed of the random number generator.
    (2): If it is a RandomState instance, it specifies a random number generator.
    (3): If None, use the default random number generator.

Attributes:

  1. coef_: weight vector.
  2. intercept_: Intercept b value.
  3. n_iter_: the actual number of iterations.

Methods:

  1. fit(X,y): train the model.
  2. predict(X): Use the model to predict and return the predicted value.
  3. score(X,y): Returns the score of predictive performance.
    The calculation formula is: score = 1 - u/v,
    where u = ((y_true - y_pred) ** 2).sum() and v = ((y_true - y_true.mean()) ** 2).sum().
    The maximum score is 1, but it may be negative (the prediction effect is too poor). The larger the score, the better the prediction performance.
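A minimal sketch (assumed) of the practical effect of the L1 penalty: as alpha grows, Lasso drives more and more coefficients exactly to zero. The diabetes data is the same dataset used in the practical cases below:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)
for alpha in [0.1, 1.0, 10.0]:
    lasso = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    n_zero = np.sum(lasso.coef_ == 0)                  # count of exactly-zero weights
    print('alpha={}: {} of {} coefficients are zero'.format(alpha, n_zero, X.shape[1]))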

ElasticNet regression (a fusion of Lasso regression and ridge regression, whose regularization term is a trade-off between the L1 and L2 norms) - sklearn library calling method and parameter explanation

The default regularization term is: alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2

from sklearn.linear_model import ElasticNet
ElasticNet(alpha=1.0, l1_ratio=0.5, fit_intercept=True,
normalize=False,precompute=False, max_iter=1000,copy_X=True, tol=1e-4,
warm_start=False, positive=False, random_state=None, selection='cyclic')

Parameter meaning:

  1. alpha: The alpha value in the regularization term.
  2. l1_ratio: The l1_ratio value in the regularization term.
  3. fit_intercept: Boolean value, specifying whether the intercept b-value needs to be calculated. False does not calculate the b value.
  4. max_iter: Specifies the maximum number of iterations.
  5. normalize: Boolean value. If True, the data is normalized before model training. Normalization has two benefits:
    (1) It improves the convergence speed of the model, reducing the time needed to find the optimal solution.
    (2) It can improve the accuracy of the model.
  6. copy_X: Boolean value. If set to True, a copy of the training data will be made.
  7. precompute: A boolean or a sequence. It decides whether to calculate the Gram matrix in advance to speed up the calculation.
  8. tol: threshold. Determine whether the iteration converges or meets the accuracy requirements.
  9. warm_start: Boolean value. If True, continue training using the previous training results. Otherwise train from scratch.
  10. positive: Boolean value. If True, forces all components of the weight vector to be positive.
  11. selection: string, can be "cyclic" or "random". It specifies which component of the weight vector is chosen to be updated each iteration.
    (1) "random": When updating, randomly select a component of the weight vector to update.
    (2) "cyclic": When updating, select a component of the weight vector to update in sequence from front to back.
  12. random_state: An integer or a RandomState instance, or None.
    (1): If integer, it specifies the seed of the random number generator.
    (2): If it is a RandomState instance, it specifies a random number generator.
    (3): If None, use the default random number generator.

Attributes:

  1. coef_: weight vector.
  2. intercept_: Intercept b value.
  3. n_iter_: The actual number of iterations.

Methods:

  1. fit(X,y): train the model.
  2. predict(X): Use the model to predict and return the predicted value.
  3. score(X,y): Returns the score of predictive performance.

The calculation formula is: score = 1 - u/v,
where u = ((y_true - y_pred) ** 2).sum() and v = ((y_true - y_true.mean()) ** 2).sum().
The maximum score is 1, but it may be negative (the prediction effect is too poor). The larger the score, the better the prediction performance.
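The practical cases below do not include ElasticNet, so here is a minimal sketch (assumed, in the same style) that sweeps both alpha and l1_ratio on the diabetes data:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    *load_diabetes(return_X_y=True), test_size=0.2, random_state=0)

for alpha in [0.01, 0.1, 1.0]:
    for l1_ratio in [0.2, 0.5, 0.8]:                   # 0 -> pure ridge, 1 -> pure lasso
        regr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=10000).fit(X_train, y_train)
        print('alpha={}, l1_ratio={}, score={:.3f}'.format(alpha, l1_ratio,
                                                           regr.score(X_test, y_test)))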

Logistic regression - sklearn library calling method and parameter explanation

from sklearn.linear_model import LogisticRegression
LogisticRegression(penalty='l2', dual=False, tol=1e-4, C=1.0,
fit_intercept=True, intercept_scaling=1, class_weight=None,
random_state=None, solver='liblinear', max_iter=100,
multi_class='ovr', verbose=0, warm_start=False, n_jobs=1)

Parameter meaning:

  1. penalty: A string specifying the regularization strategy. The default is "l2"
    (1) If it is "l2", the optimized objective function is: 0.5 ∗ ∣ ∣ w ∣ ∣ 2 2 + C ∗ L ( w ) , C > 0 0.5*||w||^2_2 +C*L(w),C>00.5w22+CL(w),C>0 , L(w) is the maximum likelihood function.
    (2) If it is "l1", the optimized objective function is∣ ∣ w ∣ ∣ 1 + C ∗ L ( w ), C > 0 ||w||_1+C*L(w), C>0w1+CL(w),C>0 , L(w) is the maximum likelihood function.
  2. dual: Boolean value. The default is False. If equal to True, its dual form is solved.
    The dual form is only available when penalty="l2" and solver="liblinear". If False, the original form is solved. When n_samples > n_features, favor dual=False.
  3. tol: threshold. Determine whether the iteration converges or meets the accuracy requirements.
  4. C: float, the default is 1.0. Specifies the reciprocal of the coefficient of the regularization term. Must be a positive floating point number. The smaller the value of C, the larger the regularization term.
  5. fit_intercept: Boolean value. The default is True. If False, the b value is not calculated.
  6. intercept_scaling: float, default 1. Only meaningful if solver="liblinear" and fit_intercept=True. In this case, it is equivalent to adding a feature to the last column of the training data, which is always 1. Its corresponding weight is b.
  7. class_weight: dict or 'balanced', default: None.
    (1) If it is a dictionary, the weight of each category is given. Follow the form of {class_label: weight}.
    (2) If it is "balanced": the weight of each category is inversely proportional to the frequency of the category appearing in the sample set.
    nsamples / ( nclasses ∗ np . bincount ( y ) ) n_samples / (n_classes * np. bincount(y))nsamples/(nclassesn p . b i n c o u n t ( y ) )
    (3) If not specified, the weight of each category is 1.
  8. random_state: int, RandomState instance or None, default: None
    (1): If integer, it specifies the seed of the random number generator.
    (2): If it is a RandomState instance, it specifies a random number generator.
    (3): If None, use the default random number generator.
  9. solver: A string specifying the algorithm to solve the optimization problem.
    Options: {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, default: 'liblinear'.
    (1) solver='liblinear': a good choice for small datasets. For large datasets, 'sag' and 'saga' are faster.
    (2) solver='newton-cg': use Newton's method.
    (3) solver='lbfgs': use the L-BFGS quasi-Newton method.
    (4) solver='sag': use the Stochastic Average Gradient descent algorithm.
    (5) For multi-class problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle the multinomial loss; 'liblinear' is limited to the 'ovr' scheme.
    (6) 'newton-cg', 'lbfgs' and 'sag' can only handle the L2 penalty; 'liblinear' and 'saga' can also handle the L1 penalty.
  10. max_iter: Specifies the maximum number of iterations. default: 100. Only available for 'newton-cg', 'sag' and 'lbfgs'.
  11. multi_class: {'ovr', 'multinomial'}, default: 'ovr'. Specifies the strategy for classification problems.
    (1) multi_class='ovr', adopt the 'one_vs_rest' strategy.
    (2) multi_class='multinomal', directly adopt the multi-class logistic regression strategy.
  12. verbose: Controls whether logging output is printed during the iterations.
  13. warm_start: Boolean value. If True, continue training using the previous training results. Otherwise train from scratch.
  14. n_jobs: int, default: 1. Specifies the number of CPUs for parallel tasks. If -1, use all available CPUs.
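A minimal sketch (assumed) of the 'balanced' class_weight formula quoted in parameter 7, checked against sklearn's compute_class_weight helper on made-up labels:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])           # imbalanced toy labels (made up)
manual = len(y) / (len(np.unique(y)) * np.bincount(y)) # n_samples / (n_classes * np.bincount(y))
print(manual)
print(compute_class_weight('balanced', classes=np.unique(y), y=y))   # should match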

Attributes:

  1. coef_: weight vector.
  2. intercept_: Intercept b value.
  3. n_iter_: the actual number of iterations.

Methods:

  1. fit(X,y): train the model.
  2. predict(X): Use the trained model to predict and return the predicted value.
  3. predict_log_proba(X): Returns an array whose elements are the logarithms of the predicted probabilities of each class for the samples in X.
  4. predict_proba(X): Returns an array whose elements are the predicted probabilities of each class for the samples in X.
  5. score(X,y): Returns the accuracy of the prediction.
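A minimal sketch (assumed) of predict_proba and predict_log_proba on the iris data:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)
proba = clf.predict_proba(X[:3])                       # one row per sample, one column per class
print(proba)
print(np.log(proba))                                   # equals clf.predict_log_proba(X[:3])
print(clf.predict(X[:3]), clf.score(X, y))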

Practical cases:

Multiple linear regression

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets,linear_model,model_selection
from sklearn.model_selection import train_test_split
def load_data():
    diabetes = datasets.load_diabetes()
    return train_test_split(diabetes.data,diabetes.target,test_size=0.2)

def test_LinearRegression(*data):
    X_train,X_test,y_train,y_test = data
    regr = linear_model.LinearRegression()
    regr.fit(X_train,y_train)
    print('Coefficients:{},intercept:{}'.format(regr.coef_,regr.intercept_))
    # mean squared error on the test set (note the parentheses: square first, then average)
    print('Mean squared error:{}'.format(np.mean((regr.predict(X_test)-y_test)**2)))
    print('Scores:{}'.format(regr.score(X_test,y_test)))
    
if __name__ == '__main__':
    X_train,X_test,y_train,y_test = load_data()
    test_LinearRegression(X_train,X_test,y_train,y_test)

Lasso regression (Lasso)

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets,linear_model
from sklearn.model_selection import train_test_split

def load_data():
    diabetes = datasets.load_diabetes()
    return train_test_split(diabetes.data,diabetes.target,test_size=0.2)

def test_Lasso(*data):
    X_train,X_test,y_train,y_test = data
    regr = linear_model.Lasso()
    regr.fit(X_train,y_train)
    print('Coefficients:{},intercept:{}'.format(regr.coef_,regr.intercept_))
    print('Mean squared error:{}'.format(np.mean((regr.predict(X_test)-y_test)**2)))
    print('Scores:{}'.format(regr.score(X_test,y_test)))
    
def test_Lasso_alpha(*data):
    X_train,X_test,y_train,y_test = data
    alphas = [0.01,0.02,0.05,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000]
    scores = []
    for i,alpha in enumerate(alphas):
        regr = linear_model.Lasso(alpha=alpha)
        regr.fit(X_train,y_train)
        scores.append(regr.score(X_test,y_test))
    
    fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    ax.plot(alphas,scores)
    ax.set_xlabel("alpha")
    ax.set_ylabel("score")
    ax.set_xscale("log")
    ax.set_title('Lasso')
    plt.show()
    
if __name__ == '__main__':
    X_train,X_test,y_train,y_test = load_data()
    test_Lasso(X_train,X_test,y_train,y_test)
    test_Lasso_alpha(X_train,X_test,y_train,y_test)
    

Ridge Regression (Ridge)

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets,linear_model
from sklearn.model_selection import train_test_split

def load_data():
    diabetes = datasets.load_diabetes()
    return train_test_split(diabetes.data,diabetes.target,test_size=0.2,random_state=0)

def test_Ridge(*data):
    X_train,X_test,y_train,y_test = data
    regr = linear_model.Ridge()
    regr.fit(X_train,y_train)
    print('Coefficients:{},intercept:{}'.format(regr.coef_,regr.intercept_))
    print('Mean squared error:{}'.format(np.mean((regr.predict(X_test)-y_test)**2)))
    print('Scores:{}'.format(regr.score(X_test,y_test)))
    
def test_Ridge_alpha(*data):
    X_train,X_test,y_train,y_test = data
    alphas = [0.01,0.02,0.05,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000]
    scores = []
    for i,alpha in enumerate(alphas):
        regr = linear_model.Ridge(alpha=alpha)
        regr.fit(X_train,y_train)
        scores.append(regr.score(X_test,y_test))
    
    fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    ax.plot(alphas,scores)
    ax.set_xlabel("alpha")
    ax.set_ylabel("score")
    ax.set_xscale("log")
    ax.set_title('Ridge')
    plt.show()
    
if __name__ == '__main__':
    X_train,X_test,y_train,y_test = load_data()
    test_Ridge(X_train,X_test,y_train,y_test)
    test_Ridge_alpha(X_train,X_test,y_train,y_test)

Logistic regression

from math import exp
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def create_data():
    iris = load_iris()
    df = pd.DataFrame(iris.data,columns=iris.feature_names)
    df['label'] = iris.target
    df.columns =['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
    data = np.array(df.iloc[:100,[0,1,-1]])
    return data[:,:2],data[:,-1]

X,y = create_data()
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3)

class LogisticRegressionClassifier(): # custom implementation
    def __init__(self,max_iter=700,learning_rate=0.01):
        self.max_iter = max_iter
        self.learning_rate = learning_rate
    
    def sigmoid(self,x):
        return 1 / (1 + np.exp(-x))
    
    def data_matrix(self,X):
        data_mat = []
        for d in X:
            data_mat.append([1.0,*d])
        return data_mat
    
    def fit(self, X, y):
        data_mat = self.data_matrix(X)
        self.weights = np.zeros((len(data_mat[0]), 1), dtype=np.float32)
        
        for iter_ in range(self.max_iter):
            for i in range(len(X)):
                result = self.sigmoid(np.dot(data_mat[i], self.weights))
                error = y[i] - result
                self.weights += self.learning_rate * error * np.transpose([data_mat[i]])
        print('LogisticRegression Model(learning_rate={}, max_iter={})'.
                            format(self.learning_rate, self.max_iter))

    def score(self, X_test, y_test):
        right = 0
        X_test = self.data_matrix(X_test)
        for x, y in zip(X_test, y_test):
            result = np.dot(x, self.weights)
            if (result > 0 and y == 1) or (result < 0 and y == 0):
                right += 1
        return right / len(X_test)
 
lr_clf = LogisticRegressionClassifier()
lr_clf.fit(X_train, y_train)
lr_clf.score(X_test, y_test)
x_points = np.arange(4, 8)
y_ = -(lr_clf.weights[1] * x_points + lr_clf.weights[0]) / lr_clf.weights[2]
plt.plot(x_points, y_)
plt.scatter(X[:50, 0], X[:50, 1], label='0')
plt.scatter(X[50:, 0], X[50:, 1], label='1')
plt.legend()     

# Calling the sklearn implementation
from sklearn import linear_model

Logistic_model = linear_model.LogisticRegression()
Logistic_model.fit(X_train,y_train)

Logistic_model.score(X_test,y_test)

Origin blog.csdn.net/weixin_50918736/article/details/125586127