Machine Learning Algorithm Implementation: Linear Regression

Preface: given a case, first analyze it:

  Decide which parts are classification problems, which are regression problems, and which are optimization problems, and what their target values are.

  Then pick the factors: which factors influence the classification, which influence the regression, and which relate to the optimization.

For linear regression, the workflow is as follows.

First, import all the required modules and packages

# Import all the necessary packages
from sklearn.model_selection import train_test_split  # split the data for training/testing (cross-validation)
from sklearn.linear_model import LinearRegression     # linear regression
from sklearn.preprocessing import StandardScaler      # data standardization
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
from pandas import DataFrame
import time

Load the data

# Load the data
path1 = '  '
df = pd.read_csv(path1, sep=';', low_memory=False)  # low_memory=False avoids chunked type inference on mixed-type columns, at the cost of more memory
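A runnable illustration of the same read_csv call, using an in-memory semicolon-separated CSV (the data below is made up for demonstration, since the original file path is not given):

```python
import io
import pandas as pd

# tiny made-up semicolon-separated dataset standing in for the real file
csv_text = "a;b;c\n1;2;3\n4;5;6\n"
df = pd.read_csv(io.StringIO(csv_text), sep=';', low_memory=False)
print(df.shape)  # (2, 3): two rows, three columns
```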

Inspect the data and its format

df.head(10)  # look at the first 10 rows of data
# view format information
df.info()

Get X and Y
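The post does not show the extraction step itself; a common sketch, assuming the target is the last column (illustrated here on a made-up frame):

```python
import pandas as pd

# made-up stand-in for the loaded df
df = pd.DataFrame({'f1': [1.0, 2.0, 3.0],
                   'f2': [4.0, 5.0, 6.0],
                   'target': [7.0, 8.0, 9.0]})
X = df.iloc[:, :-1]  # feature matrix: all columns except the last
Y = df.iloc[:, -1]   # label series: the last column
print(X.shape, Y.shape)  # (3, 2) (3,)
```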

Split into a training set and a test set

# Split the dataset into a training set and a test set
# X: feature matrix (typically a DataFrame)
# Y: labels corresponding to the features (typically a Series)
# test_size: fraction of the data assigned to the test set; a float in (0, 1)
# random_state: seed for the random splitter; fixing it to an int guarantees the same split on every run
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
# look at the shapes of the training and test sets
print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
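A self-contained check of the split proportions described above, on synthetic data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
Y = np.arange(10)
# test_size=0.2 sends 2 of the 10 samples to the test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```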

Standardization / normalization: as a rule of thumb, standardize continuous features and normalize discrete ones, depending on the case

# fit computes the mean and variance; transform subtracts the mean and divides by the standard deviation; the two can be combined as fit_transform
ss = StandardScaler()  # create a standardization model object

Fitting and standardization: fit and transform can also be written as two separate steps

# Standardize the training set, then the test set
X_train = ss.fit_transform(X_train)  # fit on the training set and transform it
X_test = ss.transform(X_test)        # transform the test set with the fitted model, so it is scaled with the training set's mean and variance
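A small sketch of the two-step form (fit, then transform), also showing that the test set is scaled with the training set's statistics:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[0.0], [2.0], [4.0]])  # training mean is 2.0
X_test = np.array([[2.0]])

ss = StandardScaler()
ss.fit(X_train)                    # step 1: learn mean and variance from the training set
X_train_s = ss.transform(X_train)  # step 2: apply (x - mean) / std
X_test_s = ss.transform(X_test)    # test set reuses the training statistics

print(X_train_s.ravel())  # zero mean, unit variance
print(X_test_s.ravel())   # [0.] because 2.0 equals the training mean
```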

Model training and prediction

# Fitting: train the model
lr = LinearRegression(fit_intercept=True)  # build the linear model object; fit_intercept controls whether an intercept is included
lr.fit(X_train, Y_train)  # fit (train) the model
# Model validation
y_predict = lr.predict(X_test)  # predict

Check the goodness of fit, i.e., R-squared

print("Training set R2:", lr.score(X_train, Y_train))
print("Test set R2:", lr.score(X_test, Y_test))
# mse = np.average((y_predict - Y_test) ** 2)
# rmse = np.sqrt(mse)
# print("RMSE:", rmse)
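The commented-out RMSE lines, made runnable on made-up values; sklearn's mean_squared_error gives the same MSE:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

Y_test = np.array([3.0, -0.5, 2.0, 7.0])    # made-up true values
y_predict = np.array([2.5, 0.0, 2.0, 8.0])  # made-up predictions

mse = np.average((y_predict - Y_test) ** 2)  # mean of squared errors
rmse = np.sqrt(mse)                          # root mean squared error
print("RMSE:", rmse)
print(np.isclose(mse, mean_squared_error(Y_test, y_predict)))  # True
```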

Output the parameters obtained from training

# Output the parameters obtained from model training
print("Model coefficients (theta):", end="")
print(lr.coef_)
print("Model intercept:", end='')
print(lr.intercept_)
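The learned coefficients and intercept fully determine the model: X @ coef_ + intercept_ reproduces lr.predict. A check on a tiny exact-fit example with made-up data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# made-up data generated from y = 2*x1 + 3*x2 + 1
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = X @ np.array([2.0, 3.0]) + 1.0

lr = LinearRegression(fit_intercept=True)
lr.fit(X, Y)
manual = X @ lr.coef_ + lr.intercept_       # the linear model written out by hand
print(np.allclose(manual, lr.predict(X)))   # True
print(np.round(lr.coef_, 6), round(float(lr.intercept_), 6))
```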


Origin www.cnblogs.com/qianchaomoon/p/12103810.html