"Machine learning" Watermelon Book Chapter XI sparse feature selection and learning


11.1 Subset Search and Evaluation

Attributes that are useful for the current learning task are called "relevant features," while attributes that are of no use are called "irrelevant features." The process of selecting a subset of relevant features from a given feature set is called "feature selection."

Feature selection is an important "data preprocessing" step: after the data are obtained, feature selection is usually performed first, and the learner is then trained on the selected features.

Why perform feature selection? Two reasons: ① to mitigate the curse of dimensionality caused by an excessive number of attributes; ② removing irrelevant features tends to reduce the difficulty of the learning task.

How can we select, from the initial feature set, a subset that contains all the important information? Generate a "candidate subset" and evaluate how good it is, generate the next candidate subset based on the evaluation result and evaluate it in turn, and repeat until no better candidate subset can be found.

The first step is the "subset search" problem. Given a feature set {a_1, a_2, ..., a_d}, we can treat each individual feature as a candidate subset and evaluate these d single-feature candidate subsets; suppose {a_2} turns out to be the best, so it becomes the selected set of the first round. Then one more feature is added to the previous round's selected set to form two-feature candidate subsets, which are evaluated in the same way, and so on. Suppose that in round k+1 the best (k+1)-feature candidate subset is no better than the selected set of the previous round; then the generation of candidate subsets stops, and the k-feature set selected in the previous round is taken as the result of feature selection. This strategy of gradually adding relevant features is called "forward" search. Similarly, if we start from the complete feature set and try to remove one irrelevant feature at a time, this strategy of gradually reducing features is called "backward" search. The two can also be combined, each round adding relevant features to the selected set while removing irrelevant ones; such a strategy is called "bidirectional" search.
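To make the search strategies concrete, here is a minimal sketch of greedy forward search in Python. The function `evaluate_subset` is a hypothetical scoring callable standing in for whatever subset-evaluation criterion is used (e.g., the information gain discussed below, or a learner's validation accuracy as in Section 11.3); backward and bidirectional search follow the same pattern with the add/remove moves changed.

```python
def forward_search(features, evaluate_subset):
    """Greedy forward search: add one feature per round and stop as soon as
    the best (k+1)-feature candidate is no better than the current selected
    set. `evaluate_subset` is an assumed callable that scores a feature
    subset (higher is better)."""
    selected, best_score = frozenset(), float("-inf")
    while True:
        candidates = [selected | {f} for f in features if f not in selected]
        if not candidates:
            break                      # all features already selected
        round_best = max(candidates, key=evaluate_subset)
        round_score = evaluate_subset(round_best)
        if round_score <= best_score:
            break                      # no improvement: keep the previous round's set
        selected, best_score = round_best, round_score
    return selected, best_score
```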

The second step is the "subset evaluation" problem. Given a data set D, the information gain of an attribute subset A can serve as the evaluation criterion:

Gain(A) = Ent(D) - Σ_{v=1..V} (|D^v| / |D|) · Ent(D^v)

where the V distinct value combinations taken on A split D into the subsets {D^1, D^2, ..., D^V}.

Here the information entropy is defined as:

Ent(D) = - Σ_{k=1..|Y|} p_k log_2 p_k

where p_k is the proportion of samples of the k-th class in D.

The larger the information gain Gain(A), the more information useful for classification the feature subset A contains. Therefore, we can compute the information gain of each candidate subset on the training data set D and use it as the evaluation criterion.
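A minimal sketch of this criterion, assuming the data set is a pandas DataFrame `D` with discrete-valued feature columns and a class label column named "y" (all names here are illustrative):

```python
import numpy as np
import pandas as pd

def entropy(labels: pd.Series) -> float:
    """Ent(D) = -sum_k p_k * log2(p_k), computed from the class proportions."""
    p = labels.value_counts(normalize=True)
    return float(-(p * np.log2(p)).sum())

def info_gain(D: pd.DataFrame, A: list, label_col: str = "y") -> float:
    """Gain(A) = Ent(D) - sum_v |D^v|/|D| * Ent(D^v), where the groups D^v
    form the partition of D induced by the joint values of the subset A."""
    weighted = sum(len(g) / len(D) * entropy(g[label_col])
                   for _, g in D.groupby(list(A)))
    return entropy(D[label_col]) - weighted
```

For example, info_gain(D, ["a2", "a4"]) would score the two-feature subset {a_2, a_4}; the candidate subset with the largest gain is preferred.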

More generally, a feature subset A in fact determines a partition of the data set D, where each partition block corresponds to one value combination of A, while the sample label information Y corresponds to the true partition of D. By estimating the difference between these two partitions we can evaluate A: the smaller the difference, the better the subset.

Combining a subset-search mechanism with a subset-evaluation mechanism yields a feature selection method.

Common feature selection methods can be roughly divided into three categories: filter, wrapper, and embedded methods.

11.2 Filter Selection

Filter methods first perform feature selection on the data set and then train the learner; the feature selection process is independent of the subsequent learner. This amounts to first "filtering" the initial features with a feature selection process and then training the model on the filtered features.

Relief is a well-known filter feature selection method. It designs a "relevance statistic" to measure the importance of features. This statistic is a vector in which each component corresponds to one initial feature, and the importance of a feature subset is determined by the sum of the relevance-statistic components corresponding to the features in the subset. In the end, one only needs to specify a threshold τ and select the features whose relevance-statistic components are larger than τ.

How is the relevance statistic determined? Given a training set

{(x_1,y_1),(x_2,y_2),...,(x_m,y_m)}

for each example x_i, Relief first finds its nearest neighbor x_{i,nh} among the samples of the same class, called the "near-hit," and then finds its nearest neighbor x_{i,nm} among the samples of a different class, called the "near-miss." The component of the relevance statistic corresponding to attribute j is then

δ^j = Σ_i [ -diff(x_i^j, x_{i,nh}^j)^2 + diff(x_i^j, x_{i,nm}^j)^2 ]

where x_a^j denotes the value of sample x_a on attribute j, and diff(·, ·) depends on the attribute type (for numerical attributes it is the normalized absolute difference).

If the distance between x_i and its near-hit on attribute j is smaller than the distance between x_i and its near-miss, then attribute j is beneficial for distinguishing same-class from different-class samples, so the relevance-statistic component corresponding to attribute j is increased; conversely, if it is larger, attribute j has a negative effect, so the component corresponding to attribute j is decreased. Finally, the estimates obtained from the different samples are averaged to give the relevance-statistic component of each attribute; the larger a component's value, the stronger the classification ability of the corresponding attribute.
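The following is a minimal sketch of this computation, assuming numerical attributes already scaled to [0, 1] (so diff reduces to an absolute difference) and Manhattan distance for the nearest-neighbor search; the book's Relief also handles discrete attributes, which this sketch omits:

```python
import numpy as np

def relief(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Single-pass Relief: return the relevance-statistic component delta[j]
    for every attribute j. Assumes X is an (m, d) array of numerical
    attribute values scaled to [0, 1]."""
    m, d = X.shape
    delta = np.zeros(d)
    for i in range(m):
        same = np.where(y == y[i])[0]
        same = same[same != i]                  # same-class samples, excluding x_i
        other = np.where(y != y[i])[0]          # different-class samples
        if len(same) == 0 or len(other) == 0:
            continue                            # no valid near-hit / near-miss
        nh = X[same[np.argmin(np.abs(X[same] - X[i]).sum(axis=1))]]    # near-hit
        nm = X[other[np.argmin(np.abs(X[other] - X[i]).sum(axis=1))]]  # near-miss
        # delta_j += -diff(x_i^j, x_nh^j)^2 + diff(x_i^j, x_nm^j)^2
        delta += -(X[i] - nh) ** 2 + (X[i] - nm) ** 2
    return delta / m                            # average over the m samples
```

Features whose component of delta exceeds the chosen threshold τ (or the top-k features by component value) are then kept.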

11.3 Wrapper Selection

Unlike filter feature selection, which does not take the subsequent learner into account, wrapper feature selection directly uses the performance of the learner that will ultimately be deployed as the evaluation criterion for feature subsets.
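In code, a wrapper method simply plugs the target learner's estimated performance into the subset-evaluation slot of a search strategy. A minimal sketch using scikit-learn, with a decision tree standing in for whichever learner will actually be used (the estimator choice and variable names are illustrative):

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def wrapper_score(X, y, subset):
    """Score a candidate feature subset by the cross-validated accuracy of
    the learner that will eventually be deployed (a decision tree here,
    purely as an illustrative choice)."""
    cols = sorted(subset)                        # column indices of the subset
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X[:, cols], y, cv=5).mean()

# Combined with the forward search sketched in Section 11.1, e.g.:
# best_subset, best_acc = forward_search(range(X.shape[1]),
#                                        lambda s: wrapper_score(X, y, s))
```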

 


Source: www.cnblogs.com/ttzz/p/11730679.html