Original blog post. Please credit the source when reprinting!
Embedded feature selection uses a machine learning model to select features. The selection is tied to the learner: it is integrated into the learner's training process and is carried out automatically while the learner is trained.
Feature selection via L1 regularization
sklearn's feature_selection module provides SelectFromModel, which implements embedded feature selection. SelectFromModel selects features using the importance values that a fitted sklearn estimator exposes through coef_ or feature_importances_: if a feature's coef_ or feature_importances_ value is below the preset threshold, that feature is removed. The threshold can be specified explicitly as a number, or chosen by a heuristic. The usual heuristics are "mean" and "median".
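As a rough illustration of how the threshold works, the sketch below fits the same L1-penalized logistic regression twice, once with an explicit numeric threshold and once with the "mean" heuristic. The toy binary dataset and the names X_demo, y_demo, l1_lr, sel_small and sel_mean are assumptions made for this example and do not come from the original script. The liblinear solver is used here only because it supports the L1 penalty on a binary problem.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# Toy binary problem (illustrative data, not the 8-class set used later).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=20, n_informative=4,
    n_redundant=0, random_state=0,
)

# L1-penalized logistic regression; liblinear supports the L1 penalty.
l1_lr = LogisticRegression(penalty="l1", C=0.1, solver="liblinear", random_state=0)

# Explicit numeric threshold: keep features whose |coef_| is at least 1e-5.
sel_small = SelectFromModel(l1_lr, threshold=1e-5).fit(X_demo, y_demo)

# Heuristic threshold: keep features whose |coef_| is at least the mean |coef_|.
sel_mean = SelectFromModel(l1_lr, threshold="mean").fit(X_demo, y_demo)

for name, sel in [("1e-5", sel_small), ("mean", sel_mean)]:
    mask = sel.get_support()  # boolean mask over the original columns
    print(name, "threshold_ =", sel.threshold_, "features kept:", int(mask.sum()))
```

The fitted selector exposes the resolved threshold as threshold_ and the kept columns through get_support(), which makes it easy to check how aggressive a given setting is before training the downstream model.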
# L1+LR feature selection
    # -*- coding: utf-8 -*-
    """
    # author: wanglei5205
    # blog: http://cnblogs.com/wanglei5205
    # github: http://github.com/wanglei5205
    """
    ### Generate data
    from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=1000,          # number of samples
                               n_features=25,           # total number of features
                               n_informative=3,         # number of informative features
                               n_redundant=2,           # redundant features (random linear combinations of informative ones)
                               n_repeated=0,            # repeated features (duplicates drawn from informative and redundant ones)
                               n_classes=8,             # number of classes
                               n_clusters_per_class=1,  # number of clusters per class
                               random_state=0)

    ### Feature selection
    from sklearn.linear_model import LogisticRegression
    # The default lbfgs solver does not support the L1 penalty; saga does, including for multiclass targets.
    lr = LogisticRegression(penalty="l1", C=0.1, solver="saga", max_iter=5000)
    from sklearn.feature_selection import SelectFromModel
    X_L1 = SelectFromModel(estimator=lr).fit_transform(X, y)

    ### Data split
    from sklearn.model_selection import train_test_split
    x_a_train, x_a_test, y_a_train, y_a_test = train_test_split(X, y, random_state=33, test_size=0.25)
    x_b_train, x_b_test, y_b_train, y_b_test = train_test_split(X_L1, y, random_state=33, test_size=0.25)

    ### Effect comparison
    from sklearn.svm import SVC
    svc1 = SVC().fit(x_a_train, y_a_train)
    print(svc1.score(x_a_test, y_a_test))  # accuracy with all 25 features

    svc2 = SVC().fit(x_b_train, y_b_train)
    print(svc2.score(x_b_test, y_b_test))  # accuracy with the L1-selected features
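The script above fits the selector on all 1000 samples before splitting, so the test rows also influence which features are kept. A possible variant, sketched here as an assumption and not as the author's method, splits first and wraps the selector and the SVC in a Pipeline so that selection is learned from the training rows only. The names selector, pipe and kept, the explicit threshold of 1e-5, and the saga solver settings are choices made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Same synthetic data settings as the script above.
X, y = make_classification(n_samples=1000, n_features=25, n_informative=3,
                           n_redundant=2, n_repeated=0, n_classes=8,
                           n_clusters_per_class=1, random_state=0)

# Split first, so the test rows play no part in feature selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=33)

# L1-penalized logistic regression as the selecting estimator (assumed settings).
selector = SelectFromModel(
    LogisticRegression(penalty="l1", C=0.1, solver="saga",
                       max_iter=5000, random_state=0),
    threshold=1e-5,
)

# The pipeline fits the selector on the training data, then trains the SVC
# on the reduced feature set.
pipe = make_pipeline(selector, SVC())
pipe.fit(X_train, y_train)

# Indices of the original columns that survived selection.
kept = pipe.named_steps["selectfrommodel"].get_support(indices=True)
print("kept columns:", kept)
print("test accuracy:", pipe.score(X_test, y_test))
```

Printing the kept column indices alongside the score shows how many of the 25 features the L1 penalty retained, which helps when comparing against the all-features baseline.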
# Reference blog