Feature engineering: what is a "combined feature"? How to deal with "high-dimensional combination features"?

To improve a model's ability to fit complex relationships, feature engineering often combines first-order discrete features pairwise to form higher-order combined features (feature crosses). In practice, however, the raw features are often high-dimensional, and naively crossing them pairwise still leads to problems such as an explosion in the number of parameters and overfitting.

How can combination features be found efficiently? One effective approach is to use decision trees: each root-to-leaf path corresponds to a conjunction of feature conditions, i.e. a combined feature.
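The decision-tree idea can be sketched with a gradient-boosted ensemble: the leaf each sample lands in encodes one path of feature splits, and one-hot encoding those leaf indices yields learned combined features (the GBDT + LR pattern). This is a minimal sketch assuming scikit-learn is available; the synthetic data and all names here are illustrative, not from the original text.

```python
# Sketch: discover feature combinations with a tree ensemble.
# Each leaf a sample reaches corresponds to one conjunction of
# feature conditions, i.e. one learned combined feature.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))              # 4 raw first-order features
y = (X[:, 0] * X[:, 1] > 0).astype(int)    # label driven by an interaction

gbdt = GradientBoostingClassifier(n_estimators=10, max_depth=3, random_state=0)
gbdt.fit(X, y)

# apply() returns the leaf index each sample reaches in every tree;
# for binary classification the last axis has size 1.
leaves = gbdt.apply(X)[:, :, 0]            # shape: (n_samples, n_estimators)

# One-hot encode leaf indices: each column is a combined feature that
# a downstream linear model (e.g. logistic regression) can consume.
combined = OneHotEncoder().fit_transform(leaves).toarray()
```

A linear model trained on `combined` can then capture the interaction that the raw first-order features express only jointly.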

For example, a film and TV recommendation problem has two first-order features, "language" and "genre". Language takes the values Chinese and English; genre takes the values movie and TV series. Crossing the two features gives four second-order combined features: (Chinese, movie), (Chinese, TV series), (English, movie), (English, TV series). The data in the first table below can then be transformed into the second:

| Click | Language | Genre |
|-------|----------|-----------|
| 0 | Chinese | Movie |
| 1 | English | Movie |
| 1 | Chinese | TV series |
| 0 | English | TV series |
| Click | Language=Chinese, Genre=Movie | Language=English, Genre=Movie | Language=Chinese, Genre=TV series | Language=English, Genre=TV series |
|-------|-------|-------|-------|-------|
| 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 |
| 0 | 0 | 0 | 0 | 1 |

Taking logistic regression as an example, suppose the feature vector of the data is $X = (x_1, x_2, \dots, x_k)$. Then:

$$Y = \mathrm{sigmoid}\Big(\sum_i \sum_j w_{ij}\,\langle x_i, x_j \rangle\Big)$$

Here $\langle x_i, x_j \rangle$ denotes the combined feature of $x_i$ and $x_j$, and the number of weights $w_{ij}$ equals the number of distinct values of feature $i$ times the number of distinct values of feature $j$. In the example above, "language" takes two values (Chinese, English) and "genre" takes two values (movie, TV series), so $w_{ij}$ has $2 \times 2 = 4$ dimensions. When the features being crossed each take only a small number of distinct values, this approach poses no real problem. But some problems involve features like user IDs and item IDs, where the counts reach tens of millions; the cross then requires $m \times n$ parameters (tens of millions times tens of millions), far too many to learn.
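The parameter count described above can be made concrete with a small sketch. The sizes and index choices below are illustrative assumptions, not values from the original text.

```python
# Sketch: parameter count of a pairwise-cross logistic regression.
# For two categorical features with m and n distinct values, the
# cross term needs an m-by-n weight matrix w_ij.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m, n = 2, 2                  # language: {Chinese, English}; genre: {Movie, TV series}
W = np.zeros((m, n))         # one weight per (language value, genre value) pair
n_params = W.size            # 2 * 2 = 4 parameters, as in the text

# With user IDs and item IDs the same scheme explodes:
users, items = 10_000_000, 10_000_000
cross_params = users * items  # 10^14 weights: infeasible to learn

# Prediction for a sample with language index 0 and genre index 0:
y_hat = sigmoid(W[0, 0])
```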

How to deal with such "high-dimensional combination features"? Suppose there are $m$ users and $n$ items. An effective method is to represent each side of the cross with a $k$-dimensional low-dimensional vector ($k \ll m$, $k \ll n$). The number of parameters to learn then drops from $m \times n$ to $m \times k + n \times k$, which is in fact equivalent to matrix factorization in recommendation algorithms.
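The low-rank trick can be sketched as follows: replace the full $m \times n$ weight matrix with per-user and per-item embeddings whose inner product plays the role of $w_{ij}$. The sizes below are illustrative assumptions.

```python
# Sketch: factorize the m-by-n cross-weight matrix into k-dimensional
# embeddings, so w_ij = <u_i, v_j> (the matrix-factorization view).
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 1000, 2000, 16          # k << m and k << n
U = rng.normal(size=(m, k))       # user embeddings u_i
V = rng.normal(size=(n, k))       # item embeddings v_j

full_params = m * n               # original cross-weight count
factored_params = m * k + n * k   # parameter count after factorization

# The cross weight for user i and item j is now just an inner product:
i, j = 3, 7
w_ij = U[i] @ V[j]
```

With these (made-up) sizes the parameter count drops from 2,000,000 to 48,000, mirroring the $m \times n \to m \times k + n \times k$ reduction in the text.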


Reference:
[1] Zhuge Yue (诸葛越), Hulu (葫芦娃), *Hundred Faces of Machine Learning* (《百面机器学习》), Posts & Telecom Press, China Industry and Information Technology Publishing Group.

Reprinted from: blog.csdn.net/weixin_39679367/article/details/124317646