Preface: when you get a case to analyze:
First decide which parts of the problem are classification, which are regression, and which are optimization, and what the target value of each part is.
Then pick the influencing factors: which factors affect the classification, which affect the regression, and which relate to the optimization.
For linear regression, the workflow is as follows.
First, import all of the required modules and packages.
```python
# Import all the necessary packages
from sklearn.model_selection import train_test_split  # splits the data, also used for cross-validation
from sklearn.linear_model import LinearRegression     # linear regression
from sklearn.preprocessing import StandardScaler      # data standardization
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
from pandas import DataFrame
import time
```
Load the data
```python
# Load the data
path1 = ''  # path to the data file (left blank in the original)
df = pd.read_csv(path1, sep=';', low_memory=False)
# low_memory=False reads the file in one pass, avoiding mixed-type columns
# at the cost of more memory
```
Inspect the data and its format
```python
df.head(10)  # look at the first 10 rows of data
df.info()    # view format information
```
Get x and y
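This step has no code in the original. As a sketch, X can be taken as all feature columns and Y as the target column; the column names below are hypothetical, since the dataset path above is left blank:

```python
import pandas as pd

# Hypothetical frame standing in for df; the real column names depend on
# the CSV loaded above.
df = pd.DataFrame({
    'feature_a': [1.0, 2.0, 3.0, 4.0],
    'feature_b': [10.0, 20.0, 30.0, 40.0],
    'target':    [1.5, 3.1, 4.4, 6.2],
})

X = df.drop(columns=['target'])  # feature matrix (DataFrame)
Y = df['target']                 # label vector (Series)

print(X.shape)  # (4, 2)
print(Y.shape)  # (4,)
```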
Split into a training set and a test set
```python
# Split the dataset into a training set and a test set
# X: feature matrix (usually a DataFrame)
# Y: labels corresponding to the features (usually a Series)
# test_size: fraction of the data used as the test set, a float in (0, 1)
# random_state: seed for the random splitter; fixing it to an int ensures
#               that the same split is produced every time
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
# Check the shapes of the training and test sets
print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
```
Standardization / normalization: standardize continuous features and normalize discrete ones, as the case requires
```python
# fit computes the mean and variance; transform applies them (subtract the
# mean, divide by the standard deviation); the two can be combined as
# fit_transform
ss = StandardScaler()  # create a standardization model object
```
Fitting and standardizing: fit and transform can be written as two separate steps
```python
# Standardize the training set and the test set
X_train = ss.fit_transform(X_train)  # fit on the training set and transform it
X_test = ss.transform(X_test)        # transform the test set with the fitted model,
                                     # so it uses the training set's mean and variance
```
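A tiny sketch (with made-up numbers, not the dataset above) of why `transform` is used on the test set: it reuses the mean and variance learned from the training data rather than recomputing them:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[0.0], [2.0], [4.0]])  # training mean is 2
test = np.array([[2.0]])

ss = StandardScaler()
train_scaled = ss.fit_transform(train)  # fit on training data only
test_scaled = ss.transform(test)        # reuse the training mean/variance

print(ss.mean_)     # [2.] -- mean learned from the training set
print(test_scaled)  # [[0.]] -- 2.0 is exactly the training mean
```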
Model training and prediction
```python
# Fit (train) the model
lr = LinearRegression(fit_intercept=True)  # build the linear model object; fit_intercept controls whether an intercept is included
lr.fit(X_train, Y_train)                   # fit on the training set
# Model validation
y_predict = lr.predict(X_test)             # predictions on the test set
```
Check the goodness of fit, i.e., R²
```python
print("training set R2:", lr.score(X_train, Y_train))
print("test set R2:", lr.score(X_test, Y_test))
# mse = np.average((y_predict - Y_test) ** 2)
# rmse = np.sqrt(mse)
# print("RMSE:", rmse)
```
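As a sketch, the commented-out MSE/RMSE computation can also be done with `sklearn.metrics`; the numbers below are a tiny hand-made example, not results from the dataset above:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])

mse = mean_squared_error(y_true, y_pred)  # average of squared errors: 4/3
rmse = np.sqrt(mse)
print("MSE:", mse)
print("RMSE:", rmse)
print("R2:", r2_score(y_true, y_pred))  # can be negative for a poor fit
```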
Output the parameters obtained from training
```python
# Output the parameters the trained model has learned
print("model coefficients (θ):", end='')
print(lr.coef_)
print("model intercept:", end='')
print(lr.intercept_)
```
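Putting the whole pipeline together, here is a minimal end-to-end sketch on synthetic data (used instead of the real dataset, since the file path above is left blank); it follows the same steps: split, standardize, fit, score, and inspect the parameters:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: Y = 3*x0 - 2*x1 + 1 plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 + rng.normal(scale=0.1, size=200)

# Split, keeping 20% for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# Standardize with statistics from the training set only
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

# Train and evaluate
lr = LinearRegression(fit_intercept=True)
lr.fit(X_train, Y_train)
print("training set R2:", lr.score(X_train, Y_train))
print("test set R2:", lr.score(X_test, Y_test))
print("coefficients:", lr.coef_)
print("intercept:", lr.intercept_)
```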