Machine learning knowledge review: filling the gaps (feature engineering)

1. Feature Engineering

Data sets the upper bound on what machine learning can achieve, and algorithms merely try to approach that bound as closely as possible. If the input data does not reflect the underlying facts, no algorithm, however powerful, can make up for it. A dedicated process is therefore needed for collecting, organizing, and selecting the input data (features), so that the model can better capture the patterns behind those facts.

Broadly speaking, feature engineering can be divided into three parts: feature construction, feature generation, and feature selection.

2. Feature Construction

Feature construction means manually identifying and building features with business meaning from the raw data. It requires examining the raw data with business knowledge in mind, thinking about the ways the target might be influenced, and adding new features accordingly. Splitting and combining attributes are the most common methods: attributes that act jointly can be combined into a new feature that captures their interaction. Time information is a typical case, because different granularities have different effects: a timestamp can be split into the hour of day, or into weekends versus weekdays.
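
As an illustration of splitting and combining attributes, here is a minimal sketch using only the Python standard library. The function names `time_features` and `cross_feature`, and the attribute values such as `"region_12"`, are hypothetical and chosen only for this example; a real pipeline would typically do the same thing column-wise in a dataframe library.

```python
from datetime import datetime


def time_features(ts: datetime) -> dict:
    """Split one timestamp into coarser time-granularity features (hypothetical helper)."""
    return {
        "hour": ts.hour,                  # hour of day, 0-23
        "day_of_week": ts.weekday(),      # Monday=0 ... Sunday=6
        "is_weekend": ts.weekday() >= 5,  # True for Saturday or Sunday
    }


def cross_feature(a: str, b: str) -> str:
    """Combine two categorical attribute values into one new feature value (hypothetical helper)."""
    return f"{a}_x_{b}"


# Usage: 2016-01-02 was a Saturday
feats = time_features(datetime(2016, 1, 2, 8, 30))
combo = cross_feature("region_12", f"hour_{feats['hour']}")
print(feats, combo)
```

The crossed value lets a model treat "region 12 at 8 a.m." as its own category, which is one way to expose an interaction that neither attribute shows on its own.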

For example, in the Didi competition on predicting the gap between driver supply and order demand, some teams used the raw order data to construct the order volume in the three hours preceding each time period, as an indicator of the short-term order trend for each map region on that day.
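
A feature like "order volume in the preceding three hours" is essentially a set of lag counts. The sketch below is not the competing teams' actual code; it assumes, for illustration only, that orders arrive as `(region_id, datetime)` pairs and that counts are bucketed by whole hours. The helper names `hourly_counts` and `lag_features` are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(orders):
    """Count orders per (region, hour bucket); orders is an iterable of (region_id, datetime)."""
    counts = Counter()
    for region, ts in orders:
        bucket = ts.replace(minute=0, second=0, microsecond=0)  # floor to the hour
        counts[(region, bucket)] += 1
    return counts


def lag_features(counts, region, slot_start, n_lags=3):
    """Order counts for each of the n_lags whole hours before slot_start, most recent first."""
    slot_hour = slot_start.replace(minute=0, second=0, microsecond=0)
    return [
        counts.get((region, slot_hour - timedelta(hours=k)), 0)
        for k in range(1, n_lags + 1)
    ]


# Usage with a few made-up orders
orders = [
    ("A", datetime(2016, 1, 1, 7, 15)),
    ("A", datetime(2016, 1, 1, 9, 5)),
    ("A", datetime(2016, 1, 1, 9, 40)),
    ("B", datetime(2016, 1, 1, 9, 10)),
]
counts = hourly_counts(orders)
print(lag_features(counts, "A", datetime(2016, 1, 1, 10, 0)))  # → [2, 0, 1]
```

Precomputing the counts once and then looking them up keeps each lag lookup cheap, which matters when features must be built for many regions and time periods.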

In my view, this is the part that depends most on personal experience and domain knowledge.

3. Feature Generation

 

 

4. Feature Selection

 
