Feature selection algorithm study notes 2

These notes mainly cover common evaluation functions.

The evaluation function quantifies how good or bad a selected feature subset is, giving an intuitive measure of its quality. Just like the fitness function in an intelligent (heuristic) algorithm, it has to express that quality as a number.

(A) Mind map


I personally find that this mind map explains things very clearly and sums the topic up well. Source: https://www.cnblogs.com/babyfei/p/9674128.html

(B) Common evaluation functions for feature selection fall into three main categories

  • Filter
  • Wrapper
  • Embedded

Filter

1. Definition: each feature dimension is given a "score", i.e. a weight that represents the importance of that dimension, and the features are then ranked by this weight.
In short, the features are evaluated with statistical / probabilistic measures.
2. Common methods:
2.1 Correlation (Correlation)
      Using correlation to measure the quality of a feature subset rests on this assumption: a good feature subset should contain features that are highly correlated with the class label, while the correlation among the features themselves should be low. The linear (Pearson) correlation coefficient can be used to measure the degree of linear correlation between two vectors; in R it is computed with the cor() function.
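
As a rough illustration (my own sketch, not from the original post), here is a minimal correlation-based filter in Python: it ranks features by the absolute Pearson correlation with the label, using a made-up matrix `X` and label vector `y`.

```python
import numpy as np

def correlation_ranking(X, y):
    """Rank features by |Pearson correlation| with the label (filter-style scoring)."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1], scores   # best features first

# toy data: 100 samples, 4 features; feature 0 is informative, the rest are noise
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)
X = rng.normal(size=(100, 4))
X[:, 0] += 2 * y                      # make feature 0 correlated with the label

order, scores = correlation_ranking(X, y)
print("feature ranking:", order)      # feature 0 should come first
print("scores:", scores.round(3))
```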

2.2 Distance (Distance Metrics)
Using a distance metric for feature selection rests on this assumption: a good feature subset should make the distance between samples of the same class as small as possible, and the distance between samples of different classes as large as possible.
Common distance metrics (similarity measures) include the Euclidean distance, the standardized Euclidean distance and the Mahalanobis distance. The ordinary Euclidean distance is
\[ \operatorname{dist}(X, Y)=\sqrt{\sum_{i=1}^{n}\left(x_{i}-y_{i}\right)^{2}} \]
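
A crude sketch of this idea (again my own illustration): score a feature subset by the ratio of the mean between-class Euclidean distance to the mean within-class distance, so that larger is better.

```python
import numpy as np

def distance_score(X, y, subset):
    """Mean between-class Euclidean distance divided by mean within-class distance.
    Larger values suggest the subset separates the classes better."""
    Xs = X[:, subset]
    within, between = [], []
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            d = np.linalg.norm(Xs[i] - Xs[j])
            (within if y[i] == y[j] else between).append(d)
    return np.mean(between) / np.mean(within)

# toy usage with two candidate subsets
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 3))
X[:, 0] += 3 * y                          # feature 0 separates the classes
print(distance_score(X, y, [0]))          # relatively large
print(distance_score(X, y, [1, 2]))       # close to 1 (no separation)
```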

2.3 Chi-squared test (chi-square test)
The idea is to pick out features that are irrelevant to the prediction target, so the procedure computes a chi-square statistic between each feature and the prediction target and keeps the features with the largest statistics.
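
For illustration only (the post itself shows no code), here is a small sketch using scikit-learn's chi2 scorer together with SelectKBest:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)        # features are non-negative, as chi2 requires

# keep the 2 features with the largest chi-square statistic w.r.t. the target
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

print("chi-square scores:", selector.scores_.round(2))
print("selected feature indices:", selector.get_support(indices=True))
print("reduced shape:", X_new.shape)      # (150, 2)
```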

2.4 Consistency (Consistency)
If sample 1 and sample 2 belong to different classes but take exactly the same values on features A and B, then the feature subset {A, B} should not be chosen as the final feature set.
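
A minimal sketch of that rule (my own illustration): count how many pairs of samples agree on every feature in the subset yet carry different labels; a count of zero means the subset is consistent.

```python
import numpy as np

def inconsistency_count(X, y, subset):
    """Number of sample pairs that are identical on `subset` but have different labels."""
    Xs = X[:, subset]
    count = 0
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            if np.array_equal(Xs[i], Xs[j]) and y[i] != y[j]:
                count += 1
    return count

# toy example: samples 0 and 1 agree on features A (index 0) and B (index 1) but differ in class
X = np.array([[1, 0, 5],
              [1, 0, 7],
              [0, 1, 5]])
y = np.array([0, 1, 0])
print(inconsistency_count(X, y, [0, 1]))   # 1 -> subset {A, B} should be rejected
print(inconsistency_count(X, y, [0, 2]))   # 0 -> consistent
```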
2.5 Information gain and entropy
Entropy measures uncertainty: the larger the entropy, the greater the uncertainty.
\[ H(X)=-\sum_{i=1}^{n} P_{i} \log _{2} P_{i} \]
The information gain of a feature t is the difference between the amount of information the system has with feature t and without it; that difference is the amount of information the feature brings to the system, i.e. the gain. The entropy of the system when it contains all the features is easy to compute: it is just the equation above.
Entropy has the following properties: if the distribution of the elements of a set Y is "pure", its entropy is small; if the distribution of Y is "disordered", its entropy is large. In the extreme cases: if Y can take only one value, i.e. P1 = 1, then H(Y) takes its minimum value 0; conversely, if every value occurs with equal probability 1/m, then H(Y) takes its maximum value log2(m). (See https://blog.csdn.net/weixin_42296976/article/details/81126883)
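
The following Python sketch (my own, under the usual definition IG(Y, t) = H(Y) − H(Y | t) for a discrete feature t) shows how the entropy formula above turns into an information-gain score:

```python
import numpy as np

def entropy(labels):
    """H(Y) = -sum_i p_i * log2(p_i) over the label distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    """IG(Y, t) = H(Y) - H(Y | t) for a discrete feature t."""
    h_y = entropy(labels)
    h_y_given_t = 0.0
    for v in np.unique(feature):
        mask = feature == v
        h_y_given_t += mask.mean() * entropy(labels[mask])
    return h_y - h_y_given_t

y = np.array([0, 0, 1, 1])
t_informative = np.array([0, 0, 1, 1])    # perfectly predicts y
t_useless     = np.array([0, 1, 0, 1])    # independent of y
print(entropy(y))                         # 1.0 (maximum for two equally likely classes)
print(information_gain(t_informative, y)) # 1.0
print(information_gain(t_useless, y))     # 0.0
```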

Wrapper

  At the moment the wrapper-style papers are the ones I have read the most, mainly those that combine feature selection with meta-heuristic algorithms.
  1. Definition: treat subset selection as a search and optimization problem: generate different combinations of features, evaluate each combination, and compare it with the others. Subset selection thus becomes an optimization problem, and there are many optimization algorithms that can solve it, in particular heuristic optimization algorithms such as GA, PSO, DE, ABC, GWO, WOA, FA, FPA, BOA, ALO and ACO. In general, the improved variants used here are mostly binary versions of these algorithms.

  2.1 Classification error rate
     Using a specific classifier, classify the sample set with the given feature subset, and use the classification accuracy (or error rate) to measure the quality of the feature subset.
 Formula: $$\text{error\_rate}=\frac{\sum_{i} \mathbf{1}\left[\hat{Y}_{i} \neq Y_{i}\right]}{N}$$ where $\hat{Y}_{i}$ is the predicted label, $Y_{i}$ the true label and $N$ the number of samples.

Most of these papers use a fitness function of the form \[ \text{Fitness}=\alpha \gamma_{R}(D)+\beta \frac{|R|}{|C|} \] where $\gamma_{R}(D)$ is typically the classification error rate obtained with the selected subset $R$ on data $D$, $|R|$ is the number of selected features, $|C|$ is the total number of features, and $\alpha, \beta$ are weights (often $\beta = 1-\alpha$).
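
A minimal sketch of such a wrapper fitness (my own illustration, with assumed weights alpha = 0.99 and beta = 0.01, and a KNN classifier as the evaluator, as many of these papers do):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def wrapper_fitness(X, y, mask, alpha=0.99, beta=0.01):
    """Fitness = alpha * error_rate + beta * |R| / |C| (smaller is better)."""
    subset = np.flatnonzero(mask)
    if subset.size == 0:                       # an empty subset is the worst case
        return 1.0
    knn = KNeighborsClassifier(n_neighbors=5)
    accuracy = cross_val_score(knn, X[:, subset], y, cv=5).mean()
    error_rate = 1.0 - accuracy                # gamma_R(D)
    return alpha * error_rate + beta * subset.size / X.shape[1]

X, y = load_iris(return_X_y=True)
print(wrapper_fitness(X, y, np.array([1, 1, 1, 1])))   # all features
print(wrapper_fitness(X, y, np.array([0, 0, 1, 1])))   # petal features only
```

A binary heuristic such as GA or PSO would then search over 0/1 masks to minimize this value.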

  3. Common classifiers
  I will write a separate post on these, since there are quite a few; in the papers, KNN and SVM are the most commonly used. A quick example follows below.
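  To make the last point concrete (my own example, not from the post): evaluating the same feature subset with both classifiers takes one scikit-learn call each.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
subset = [2, 3]                                   # a candidate feature subset (petal length/width)

# compare the two classifiers most often used as wrapper evaluators
for clf in (KNeighborsClassifier(n_neighbors=5), SVC(kernel="rbf")):
    acc = cross_val_score(clf, X[:, subset], y, cv=5).mean()
    print(type(clf).__name__, round(acc, 3))
```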

Embedded

  1. Definition:
    With the model already fixed, learn the attributes that do the most to improve the model's accuracy. This sentence is not that easy to understand; what it really means is that, during the process of fitting the model, we pick out the attributes that matter most for training it. Embedded feature selection merges the feature selection process with the learner's training process: both are completed in the same optimization, i.e. feature selection happens automatically while the learner is trained.
  2. Here is a mind map
   ![](https://img2018.cnblogs.com/blog/1365906/201911/1365906-20191110095305442-150755377.png)
   There are a few classifiers here; I will write them up separately later and give code. A small sketch follows below.
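   As a placeholder until then, here is a minimal sketch (my own, not from the post) of the classic embedded approach: train an L1-regularized model and keep the features whose coefficients stay non-zero.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# The L1 penalty drives the coefficients of unimportant features to zero,
# so selection happens as a side effect of training the model.
l1_model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear")
selector = SelectFromModel(l1_model).fit(X, y)

print("selected feature indices:", selector.get_support(indices=True))
print("reduced shape:", selector.transform(X).shape)
```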

References:
1. 《机器学习》(Machine Learning), 周志华 (Zhou Zhihua)
2. Binary ant lion approaches for feature selection
3. Binary butterfly optimization approaches for feature selection
4. Whale optimization approaches for wrapper feature selection
5. https://www.cnblogs.com/stevenlk/p/6543628.html#%E7%A7%BB%E9%99%A4%E4%BD%8E%E6%96%B9%E5%B7%AE%E7%9A%84%E7%89%B9%E5%BE%81-removing-features-with-low-variance
6. M. Dash, H. Liu, Feature Selection for Classification. Intelligent Data Analysis 1 (1997) 131–156.
7. Lei Yu, Huan Liu, Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution
8. Ricardo Gutierrez-Osuna, Introduction to Pattern Analysis (Lecture 11: Sequential Feature Selection), http://courses.cs.tamu.edu/rgutier/cpsc689_f08/l11.pdf
