1. Linear regression
2. Logistic regression
Principle
Logistic regression models a dependent variable y that takes only two values, 0 and 1. Assume that, under the effect of the p independent variables x1, x2, ..., xp, the probability that y takes the value 1 is p = P(y = 1 | X), so the probability that it takes the value 0 is 1 - p. The ratio of the probability of 1 to the probability of 0, p / (1 - p), is called the odds of the event. Taking the natural logarithm of the odds gives the logistic (logit) transformation, ln(p / (1 - p)).
When p varies over (0, 1), the odds range from 0 to positive infinity, and ln(p / (1 - p)) ranges from negative infinity to positive infinity.
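To make these ranges concrete, here is a minimal sketch that evaluates the odds and the logit for a few probabilities. The function names are made up for this illustration, and inverse_logit (the mapping back from a real number to a probability) is added only to show that the transformation is reversible.

```python
import math


def odds(p):
    """Odds of an event with probability p, for 0 < p < 1."""
    return p / (1 - p)


def logit(p):
    """Logistic (logit) transformation: natural log of the odds."""
    return math.log(odds(p))


def inverse_logit(z):
    """Map any real number z back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))


# As p approaches 0 the logit heads toward negative infinity,
# and as p approaches 1 it heads toward positive infinity.
for p in (0.001, 0.25, 0.5, 0.75, 0.999):
    print(f"p={p:<6} odds={odds(p):10.4f} logit={logit(p):8.4f}")
```

Because the logit covers the whole real line, it can be set equal to a linear combination of x1, x2, ..., xp without the predicted probability ever leaving (0, 1).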
Modeling steps
- Set the indicator variables according to the mining objective (dependent variable y; independent variables x1, x2, ..., xp) and filter the features
- Write out the regression equation
- Estimate the regression coefficients
- Check the model (accuracy, confusion matrix, ROC curve, KS value)
- Use the model for prediction and control
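The model-checking step can be sketched in plain Python so that each quantity is visible. The labels and scores below are made up for illustration, the helper names are hypothetical, and the KS value is taken here as the largest gap between the true positive rate and the false positive rate along the ROC curve. In practice, sklearn.metrics offers confusion_matrix and roc_curve for the same purpose.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) counts for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp


def accuracy(y_true, y_pred):
    """Share of samples whose predicted label matches the true label."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (tn + tp) / len(y_true)


def roc_points(y_true, scores):
    """(fpr, tpr) pairs from thresholding at each distinct score, highest first."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def ks_value(y_true, scores):
    """KS statistic: largest gap between TPR and FPR over all thresholds."""
    return max(tpr - fpr for fpr, tpr in roc_points(y_true, scores))


# Made-up labels and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # classify at a 0.5 cut-off

print("confusion matrix (tn, fp, fn, tp):", confusion_matrix(y_true, y_pred))
print("accuracy:", accuracy(y_true, y_pred))
print("KS value:", ks_value(y_true, scores))
```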
Code
    import pandas as pd
    from sklearn.linear_model import LogisticRegression as LR
    # RandomizedLogisticRegression was deprecated in scikit-learn 0.19 and
    # removed in 0.21, so this import only works on older releases.
    from sklearn.linear_model import RandomizedLogisticRegression as RLR

    # Initialize parameters
    filename = '../data/bankloan.xls'
    data = pd.read_excel(filename)
    x = data.iloc[:, :8].values  # feature columns (.values replaces the removed as_matrix())
    y = data.iloc[:, 8].values   # label column

    rlr = RLR()  # build a randomized logistic regression model to filter the variables
    rlr.fit(x, y)  # train the model
    support = rlr.get_support()  # feature screening result; per-feature scores are in .scores_
    features = data.columns[:8][support]  # index only the feature columns, not the label
    print('Effective features: %s' % ', '.join(features))
    x = data[features].values  # keep only the selected features

    lr = LR()  # build the logistic regression model
    lr.fit(x, y)  # train the model on the selected features
    print('Average model accuracy: %s' % lr.score(x, y))  # mean accuracy on the training data
The main idea of recursive feature elimination is to build a model repeatedly: in each round, pick the best (or worst) feature, set it aside, and repeat the process on the remaining features until every feature has been traversed. The order in which the features are eliminated during this process is the feature ranking.
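As a rough sketch of this idea, scikit-learn's RFE class wraps an estimator and drops the weakest feature in each round. The data below is synthetic (generated with make_classification), and the choice of keeping 3 out of 8 features is an assumption made only for this example, not something taken from the bank loan data.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a real table: 8 features, 3 of them informative.
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

# Refit the model each round and drop one feature (step=1) until 3 remain.
selector = RFE(estimator=LogisticRegression(max_iter=1000),
               n_features_to_select=3, step=1)
selector.fit(X, y)

print("kept features:", selector.support_)  # boolean mask over the 8 columns
print("ranking:", selector.ranking_)        # 1 = kept; larger = eliminated earlier

# Train the final logistic regression on the selected columns only.
X_selected = selector.transform(X)
model = LogisticRegression(max_iter=1000).fit(X_selected, y)
print("training accuracy:", model.score(X_selected, y))
```

The ranking_ attribute records the elimination order described above: the kept features share rank 1, and the feature dropped in the first round gets the largest number.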