【Feature selection】Embedded feature selection method

This is an original post; please credit the source when reprinting.

    Embedded feature selection uses a machine learning model itself to perform feature selection. The selection process is tied to the learner: it is integrated into the learner's training, so features are selected automatically while the model is trained.

 

Feature selection via L1 regularization
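
The reason an L1 penalty works for feature selection is that it drives many coefficients exactly to zero while the model is being fitted, so the selection happens as a side effect of training. A minimal illustrative sketch (Lasso and the parameter values here are illustrative, not part of the original post):

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=100, n_features=10, n_informative=3, random_state=0)
lasso = Lasso(alpha=1.0).fit(X, y)
print(lasso.coef_)                                               # many coefficients are exactly 0
print((lasso.coef_ != 0).sum(), "features effectively selected")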

 

      sklearn implements embedded feature selection with the SelectFromModel meta-transformer in the feature_selection module. SelectFromModel selects features based on the importance scores exposed by sklearn's built-in estimators, coef_ or feature_importances_: any feature whose score falls below a preset threshold is removed. The threshold can be specified explicitly, or chosen with a heuristic; common heuristics include the mean and the median of the scores.
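
For example, a minimal sketch of choosing the threshold heuristically (the random forest and the parameter values here are illustrative, not from the original post; its feature_importances_ attribute supplies the scores):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=200, n_features=10, n_informative=3, random_state=0)

# keep only the features whose importance is at least the median importance
selector = SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0),
                           threshold="median")
X_selected = selector.fit_transform(X, y)
print(X.shape, "->", X_selected.shape)  # roughly half of the columns are kept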

# L1+LR feature selection

# -*- coding: utf-8 -*-
"""
# author: wanglei5205
# blog:   http://cnblogs.com/wanglei5205
# github: http://github.com/wanglei5205

"""
### Generate data
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000,          # number of samples
                           n_features=25,           # number of features
                           n_informative=3,         # number of informative features
                           n_redundant=2,           # number of redundant features (random combinations of the informative features)
                           n_repeated=0,            # number of repeated features (random combinations of informative and redundant features)
                           n_classes=8,             # number of classes
                           n_clusters_per_class=1,  # number of clusters per class
                           random_state=0)

### Feature selection
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(penalty="l1", C=0.1, solver="liblinear")  # the L1 penalty requires the liblinear (or saga) solver

from sklearn.feature_selection import SelectFromModel
X_L1 = SelectFromModel(estimator=lr).fit_transform(X, y)  # drop features whose coefficients fall below the threshold

### Data split
from sklearn.model_selection import train_test_split
x_a_train, x_a_test, y_a_train, y_a_test = train_test_split(X, y, random_state=33, test_size=0.25)     # all 25 features
x_b_train, x_b_test, y_b_train, y_b_test = train_test_split(X_L1, y, random_state=33, test_size=0.25)  # L1-selected features

### Effect comparison
from sklearn.svm import SVC
svc1 = SVC().fit(x_a_train, y_a_train)
print(svc1.score(x_a_test, y_a_test))  # accuracy with all features

svc2 = SVC().fit(x_b_train, y_b_train)
print(svc2.score(x_b_test, y_b_test))  # accuracy with the L1-selected features
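
The code above only keeps the transformed matrix X_L1. If you also want to know which of the original columns were retained, fit the selector separately and query it; a minimal sketch continuing from the lr estimator above:

selector = SelectFromModel(estimator=lr).fit(X, y)
print(selector.get_support(indices=True))  # indices of the retained features
print(selector.threshold_)                 # the threshold actually used by the selector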

# Reference blog

http://www.cnblogs.com/stevenlk/p/6543646.html
