<Data preprocessing>
- Aggregation: combine multiple samples or features into one (reduces data size, changes the scale, gives more stable values)
- Sampling: select a representative subset of the samples
- Dimensionality reduction: representing samples in a low-dimensional space (PCA, SVD)
- Feature selection: select important features (Lasso)
- Feature creation: constructing useful new features (Fourier transform)
- Discretization
- The process of converting continuous attributes into discrete attributes
- Commonly used for classification
- Binarization
- Map continuous or categorical attributes to one or more binary variables
- Commonly used for association analysis
- Convert continuous attributes into categorical attributes, then convert categorical attributes into a set of binary variables
- Variable transformation
- Transforms all values of a given attribute
- Simple function transformation (for example, a linear transformation)
- Standardization (normalization)
- Min-max normalization (rescaling to a fixed range such as [0, 1])
- Z-score normalization (zero-mean normalization)
- Decimal scaling normalization
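The transformations listed above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a library implementation: the function names, the choice of equal-width bins for discretization, and the sample `ages` list are assumptions made for the example, and the functions assume the input values are not all identical.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: map [min, max] linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def z_score(values):
    """Z-score normalization: subtract the mean, divide by the population std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Decimal scaling: divide by 10**j, j the smallest integer with max|v| < 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


def equal_width_bins(values, n_bins):
    """Discretization: index (0-based) of the equal-width interval of each value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def one_hot(categories):
    """Binarization: one binary indicator per distinct categorical value."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


ages = [20, 30, 40, 60]            # hypothetical sample attribute
print(min_max(ages))               # [0.0, 0.25, 0.5, 1.0]
print(z_score(ages))
print(decimal_scaling(ages))
print(equal_width_bins(ages, 2))   # [0, 0, 1, 1]
print(one_hot(["a", "b", "a"]))    # [[1, 0], [0, 1], [1, 0]]
```

In scikit-learn the same ideas are available as ready-made transformers (for example `MinMaxScaler` and `StandardScaler`), which are usually preferable in a real pipeline.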
<sklearn machine learning platform>
Machine learning library:
- Algorithms covered: classification algorithms, clustering algorithms, regression algorithms, dimensionality reduction algorithms
- Scikit-learn main usage:
- Naming conventions: training data, training labels, test data, test labels, complete data, label data
- Data partitioning:
- train_test_split(X, y, random_state=seed)
- shuffle=True (the default)
- Data preprocessing
- Supervised learning algorithms (classification)
- Logistic regression
- Support Vector Machines
- Naive Bayes
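The usage pattern outlined above (split the data, preprocess it, then fit a supervised classifier) might look roughly like this in scikit-learn. This is a sketch under assumptions: the iris dataset, the 30% test size, the seed `0`, standardization as the preprocessing step, and the default hyperparameters are all choices made for illustration and do not come from the notes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Complete data (X) and label data (y); iris is just a convenient built-in set.
X, y = load_iris(return_X_y=True)

# Data partitioning: shuffle=True is the default, random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, shuffle=True
)

# Data preprocessing: fit the scaler on the training data only, then apply it
# to both parts so no information from the test set leaks into training.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Supervised learning: the three classifiers named in the list above.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "support vector machine": SVC(),
    "naive Bayes": GaussianNB(),
}
scores = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    scores[name] = model.score(X_test_s, y_test)  # mean accuracy on the test set
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

All three estimators share the same `fit` / `score` interface, which is why they can be swapped inside one loop.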
Chapter 3 Regression Analysis
3.1 Basic concepts of regression analysis
- regression analysis
- Divided by the number of independent variables involved: univariate regression, multiple regression
- Divided by the number of dependent variables: simple regression (one dependent variable), multivariate regression (several dependent variables)
- Divided according to the type of relationship between independent variables and dependent variables: linear regression analysis, nonlinear regression analysis.
- Problems solved by regression analysis:
- Correlation between variables: deterministic relationship, non-deterministic relationship
- Predict or control the value of one or more variables
- Regression analysis steps
- Determine the variables: identify the related influencing factors (independent variables) and pick out the main ones
- Build the prediction model: estimate it from historical data on the independent and dependent variables
- Conduct correlation analysis: measure how strongly the independent variables are related to the predicted object
- Calculate the prediction error: decide whether the model can be used for actual prediction
- Determine the predicted value: analyze the predicted value comprehensively before settling on the final forecast
3.2 Univariate linear regression
F test, t test
- Y = a + bX + ε
- Model features:
- Y is a linear function of X plus an error term
- The linear part reflects changes in Y due to changes in X
- The error term ε is a random variable
- For a given value of X, the expected value of Y is E(Y) = a+bX
- Regression equation:
- Regression equation solving and model testing:
- Least Squares (Equation Solving), Residual Sum of Squares
- Goodness of fit test (model test)
- Significance test of the linear relationship: at a chosen significance level, test the regression equation as a whole (F test, built from ESS and RSS) and the individual regression parameters (t test)
- Univariate linear regression example
- Evaluation criterion: R²
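As a rough worked example of the steps above, the least squares estimates for Y = a + bX + ε, together with RSS, ESS, R² and the F statistic, can be computed directly. The data points and the function name `fit_simple_linear` are invented for illustration. ESS is taken here to mean the explained (regression) sum of squares and RSS the residual sum of squares; the notes do not define these abbreviations, so that reading is an assumption.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a, b, R^2 and the F statistic."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b = sxy / sxx            # slope that minimizes the residual sum of squares
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)

    fitted = [a + b * x for x in xs]
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual sum of squares
    ess = sum((f - y_bar) ** 2 for f in fitted)          # explained sum of squares
    tss = sum((y - y_bar) ** 2 for y in ys)              # total sum of squares

    r2 = 1 - rss / tss               # goodness of fit
    f_stat = ess / (rss / (n - 2))   # F statistic with (1, n - 2) degrees of freedom
    return a, b, r2, f_stat


xs = [1, 2, 3, 4, 5]                 # hypothetical observations
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, r2, f_stat = fit_simple_linear(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}, R^2 = {r2:.4f}, F = {f_stat:.1f}")
```

A large F value (compared with the critical value at the chosen significance level) and an R² close to 1 both indicate that the straight line explains most of the variation in Y.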
3.3 Multiple linear regression
- Y = a + b1X1 + b2X2 + … + bnXn + ε
- Model features:
- Y has a linear relationship with X1, X2, X3, …, Xn
- The observations Yi (i = 1, 2, 3, …) are independent of each other
- Random error ε ~ N(0, σ²)
- Solving the multiple regression equation using the least squares method
- Goodness of fit test
- Significance test of regression parameters
- Multiple linear regression example
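A multiple linear regression can be solved by least squares in the same way, with one column per independent variable plus a column of ones for the intercept. The sketch below uses NumPy and synthetic data: the true coefficients (1, 2, -3), the noise level, and the seed are assumptions chosen so that the fit can be checked by eye.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: Y = 1 + 2*X1 - 3*X2 + noise, with noise ~ N(0, 0.1^2).
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=50)

# Design matrix: a leading column of ones estimates the intercept a.
A = np.column_stack([np.ones(len(X)), X])

# Least squares solution of A @ coef ≈ y.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b1, b2 = coef

# Goodness of fit: R^2 = 1 - RSS / TSS.
residuals = y - A @ coef
r2 = 1 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)

print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}, R^2 = {r2:.4f}")
```

The recovered coefficients should land close to the values used to generate the data, and R² should be close to 1 because the noise is small.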
3.4 Polynomial regression
- Polynomial regression equation (nonlinear → linear)
- Polynomial regression equation example
- Solving polynomial regression equations
- Regression equation F test
- Polynomial regression equation t-test
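The "nonlinear → linear" step can be illustrated as follows: treating x and x² as two separate independent variables turns a quadratic fit into an ordinary multiple linear regression. The data below are synthetic and noise-free, generated from y = 1 + 2x + 0.5x², so the coefficients are an assumption of the example and not values from the notes.

```python
import numpy as np

# Synthetic, noise-free data from y = 1 + 2x + 0.5x^2.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x + 0.5 * x ** 2

# Substitute z1 = x and z2 = x^2: the model is now linear in (1, z1, z2).
Z = np.column_stack([np.ones_like(x), x, x ** 2])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)

# np.polyfit solves the same least squares problem; it returns the
# coefficients from the highest degree down, so reverse them to compare.
coef_polyfit = np.polyfit(x, y, deg=2)[::-1]

print("design-matrix fit:", np.round(coef, 6))
print("np.polyfit fit:   ", np.round(coef_polyfit, 6))
```

Because the transformed model is linear in its coefficients, the F test and t test used for multiple linear regression carry over unchanged.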
Evaluation criteria for regression
- Mean Squared Error (MSE)
- Root mean square error (RMSE)
- Mean Absolute Error (MAE)
- Choose MSE or MAE?
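One way to approach the MSE versus MAE question is to compute both on the same predictions with and without a single large error. The sketch below uses made-up numbers: the squared error makes MSE (and RMSE) react much more strongly to the outlier than MAE does.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: square root of MSE, in the same units as y."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 7.0, 9.0]             # hypothetical targets
y_pred = [2.0, 5.0, 8.0, 9.0]             # residuals: 1, 0, -1, 0
y_pred_outlier = [2.0, 5.0, 8.0, 19.0]    # same, but the last residual is -10

print(mse(y_true, y_pred), mae(y_true, y_pred))                   # 0.5 0.5
print(mse(y_true, y_pred_outlier), mae(y_true, y_pred_outlier))   # 25.5 3.0
print(rmse(y_true, y_pred_outlier))
```

If large errors should be penalized heavily, MSE or RMSE is the natural choice; if the data contain outliers that should not dominate the score, MAE is the more robust measure.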