[Machine Learning] Filter Methods

It is common for datasets to have thousands of features. However, processing thousands of features during training and testing can be computationally infeasible, and many irrelevant features can lead to overfitting. We therefore want to select the most relevant features in order to obtain faster, more accurate, and easier-to-understand models. There are many approaches to feature selection, such as wrapper methods and filter methods, which can be either univariate or multivariate. Here I want to talk about filter methods.

A filter method ranks all features by a measure of correlation with the label, and then selects the top K features to use in the model. There are several ways to measure the correlation between a feature X and the label Y: mutual information, the chi-square statistic, the Pearson correlation coefficient, the signal-to-noise ratio, and the t-test.
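As a minimal sketch of the "rank and take top K" step (the function name and the toy scores are my own, not from the post):

```python
import numpy as np

def select_top_k(scores, k):
    """Return the indices of the k features with the highest scores."""
    return np.argsort(scores)[::-1][:k]

# Toy example: 5 feature scores, keep the best 2.
scores = np.array([0.1, 0.9, 0.3, 0.7, 0.2])
print(select_top_k(scores, 2))  # -> [1 3]
```

Any of the correlation measures below can produce the `scores` array; the selection step itself stays the same.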

1) Mutual Information:

As a basic fact of probability, if X and Y are independent then P(X, Y) = P(X)P(Y); mutual information measures how far the joint distribution departs from this.

Measure of dependence:

I(X; Y) = Σ_x Σ_y P(x, y) · log( P(x, y) / (P(x) P(y)) )

It is 0 when X and Y are independent.

It is maximum when X=Y.
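As a sketch, this quantity can be computed directly from the empirical frequencies of two nominal variables (the helper below is my own):

```python
import numpy as np

def mutual_information(x, y):
    """I(X;Y) = sum over (x,y) of p(x,y) * log(p(x,y) / (p(x)p(y)))."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))  # joint probability
            px = np.mean(x == xv)                 # marginal of X
            py = np.mean(y == yv)                 # marginal of Y
            if pxy > 0:                           # 0 * log 0 treated as 0
                mi += pxy * np.log(pxy / (px * py))
    return mi

print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # independent -> 0.0
print(mutual_information([0, 1, 0, 1], [0, 1, 0, 1]))  # X == Y -> log 2
```

This matches the two properties above: independence gives 0, and X = Y gives the maximum (the entropy of X).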

Limitations of the MI method:

- works only with nominal (categorical) features and labels

- biased toward high-arity features

- may choose redundant features

- features may become relevant only in the context of others

(For a comparison between MI, X², and LLR, see [Dunning, CL '93], Accurate Methods for the Statistics of Surprise and Coincidence.)

2) Chi-Square Test of Independence

The chi-square statistic compares the observed joint counts of a feature and the label against the counts expected if they were independent; a large value indicates dependence.
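A quick way to try this (the 2x2 counts below are invented for illustration) is `scipy.stats.chi2_contingency`, which computes the statistic and p-value from a contingency table:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: feature value 0/1; columns: label 0/1 (toy counts).
table = np.array([[30, 10],
                  [10, 30]])

chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)  # large chi2, tiny p -> feature and label look dependent
```

For feature selection, the chi-square statistic (or 1 − p) of each feature serves as its ranking score.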

3) Pearson Correlation Coefficient

r(X, Y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / √( Σᵢ (xᵢ − x̄)² · Σᵢ (yᵢ − ȳ)² )
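A direct implementation of this formula (the helper name is mine):

```python
import numpy as np

def pearson_r(x, y):
    """Sample Pearson correlation between two numeric sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # center x
    yc = y - y.mean()  # center y
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

print(pearson_r([1, 2, 3], [2, 4, 6]))  # -> 1.0 (perfect positive)
print(pearson_r([1, 2, 3], [6, 4, 2]))  # -> -1.0 (perfect negative)
```

When ranking features, the absolute value |r| is typically used, since strong negative correlation is just as informative as strong positive correlation.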

4) Signal-to-Noise Ratio

S2N(X, Y) = (μ⁺ − μ⁻) / (σ⁺ + σ⁻)

where μ⁺ and σ⁺ are the mean and standard deviation of the feature over the positive examples, and μ⁻, σ⁻ likewise over the negatives.
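A sketch for one numeric feature with a binary label (the helper and toy data are my own assumptions):

```python
import numpy as np

def s2n(x, y):
    """Signal-to-noise ratio of feature x w.r.t. binary label y (1 = positive)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    pos, neg = x[y == 1], x[y == 0]
    return (pos.mean() - neg.mean()) / (pos.std() + neg.std())

# Positives cluster around 5.5, negatives around 1.5 -> well-separated feature.
print(s2n([1, 2, 5, 6], [0, 0, 1, 1]))  # -> 4.0
```

A large |S2N| means the two classes are well separated relative to their spread, so the feature is a good discriminator.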

5) T-test

t(X, Y) = (μ⁺ − μ⁻) / √( σ⁺²/n⁺ + σ⁻²/n⁻ )

where n⁺ and n⁻ are the numbers of positive and negative examples.
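Rather than hand-coding this, `scipy.stats.ttest_ind` with `equal_var=False` (Welch's t-test) computes the same σ²/n form of the statistic; the toy samples below are invented:

```python
from scipy.stats import ttest_ind

# Feature values for positive and negative examples (toy data).
pos = [5.1, 5.3, 4.9, 5.2]
neg = [3.0, 3.2, 2.9, 3.1]

t, p = ttest_ind(pos, neg, equal_var=False)  # Welch's two-sample t-test
print(t, p)  # large |t|, small p -> means differ, feature is informative
```

As with chi-square, each feature's |t| (or its p-value) can serve directly as the ranking score for top-K selection.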

Posted on 2010-05-10 05:45 by Zhu Qing

Reposted from: https://www.cnblogs.com/Qing_Zhu/archive/2010/05/10/1731471.html
