How to perform feature selection with BaggingRegressor?

DN1 :

I am trying to select features from gradient boosting using bootstrapping, performing the bootstrapping via BaggingRegressor in scikit-learn. I am not sure this is possible or correct, but this is what I've tried:

from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor
from sklearn.feature_selection import SelectFromModel

# bag gradient-boosting models, resampling features for each estimator
bag = BaggingRegressor(base_estimator=GradientBoostingRegressor(),
                       bootstrap_features=True, random_state=seed)
bag.fit(X, Y)
model = SelectFromModel(bag, prefit=True, threshold='mean')
gbr_boot = model.transform(X)
print('gbr_boot', gbr_boot.shape)

This gives the error:

ValueError: The underlying estimator BaggingRegressor has no `coef_` or `feature_importances_` attribute. Either pass a fitted estimator to SelectFromModel or call fit before calling transform.

I am not sure how to address this error; I thought gradient boosting provides feature_importances_. I have tried working around it with:

import numpy as np

bag = BaggingRegressor(base_estimator=GradientBoostingRegressor(),
                       bootstrap_features=True, random_state=seed)
bag.fit(X, Y)

# average the importances reported by each bagged gradient-boosting model
feature_importances = np.mean([
    est.feature_importances_ for est in bag.estimators_
], axis=0)

# use the mean importance as the selection threshold
threshold = np.mean(feature_importances)

# record each feature's importance, or 'null' when it falls below the threshold
temp = ()
for i in feature_importances:
    if i > threshold:
        temp = temp + (i,)
    else:
        temp = temp + ('null',)


import pandas as pd

# pair each column name with its importance (or 'null') in one table
model_features = data.columns
feature = pd.DataFrame(np.array(model_features))
df = pd.DataFrame(temp)
df_total = pd.concat([feature, df], axis=1)

This seems to succeed in listing the features that exceed the importance threshold I set, but I am not sure whether this matches the feature selection that SelectFromModel would perform on the BaggingRegressor, or whether (as the scikit-learn error implies to me) such a selection simply does not exist for this estimator. For clarity, the reason I am trying bootstrapping with BaggingRegressor is that SelectFromModel with gradient boosting alone fluctuates in the number of features it selects, and I read a paper (section 7.1) saying bootstrapping can reduce this variance (as I understood it; I don't have a CS/stats background).

Venkatachalam :

You have to create a wrapper around BaggingRegressor for this problem.

import numpy as np
from sklearn.ensemble import BaggingRegressor

class MyBaggingRegressor(BaggingRegressor):
    @property
    def feature_importances_(self):
        # BaggingRegressor has no feature_importances_ of its own, so
        # average the importances reported by the fitted base estimators
        return np.mean([est.feature_importances_
                        for est in self.estimators_], axis=0)

    @property
    def coef_(self):
        # only meaningful when the base estimator is a linear model
        return np.mean([est.coef_ for est in self.estimators_], axis=0)
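For illustration, a minimal sketch of how the wrapper would then slot into the original SelectFromModel call (reusing X, Y and seed from the question). Note that with bootstrap_features=True each base estimator is fitted on a resampled set of columns, recorded in estimators_features_, so a plain average like the one above ignores that remapping:

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectFromModel

bag = MyBaggingRegressor(base_estimator=GradientBoostingRegressor(),
                         bootstrap_features=True, random_state=seed)
bag.fit(X, Y)

# SelectFromModel now finds feature_importances_ on the wrapper
model = SelectFromModel(bag, prefit=True, threshold='mean')
gbr_boot = model.transform(X)
print('gbr_boot', gbr_boot.shape)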

There is an existing issue regarding this in scikit-learn, along with a corresponding PR.

Note: you don't have to go for BaggingRegressor if your base_estimator is GradientBoostingRegressor; use the subsample parameter to achieve the same effect.

subsample: float, optional (default=1.0)
The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.
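For illustration, a rough sketch of that alternative (the 0.8 value is just an example, not from the question):

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectFromModel

# each boosting stage is fit on a random 80% of the rows (stochastic
# gradient boosting); the fitted model exposes feature_importances_
# directly, so no wrapper is needed
gbr = GradientBoostingRegressor(subsample=0.8, random_state=seed)
gbr.fit(X, Y)

model = SelectFromModel(gbr, prefit=True, threshold='mean')
X_selected = model.transform(X)
print('X_selected', X_selected.shape)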
