Dimensionality reduction in machine learning (the curse of dimensionality and dimensionality explosion)

In machine learning, dimensionality reduction is the process of simplifying a data representation by reducing the number of features. High-dimensional data sets often contain redundant information, and the goal of dimensionality reduction is to retain as much useful information as possible while lowering the dimensionality of the data. Its main advantages include improving the computational efficiency of the model, mitigating the effects of the curse of dimensionality, and making the data easier to visualize.
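As a concrete illustration, here is a minimal sketch using scikit-learn's PCA; the digits dataset and the choice of 2 components are arbitrary assumptions for demonstration, not choices made in this article:

```python
# Minimal sketch: reducing 64-dimensional data to 2 dimensions with PCA.
# The digits dataset and n_components=2 are illustrative assumptions.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)   # X has shape (1797, 64)
pca = PCA(n_components=2)             # keep the 2 directions of largest variance
X_2d = pca.fit_transform(X)           # shape (1797, 2), suitable for plotting

print(X.shape, "->", X_2d.shape)
print("variance retained:", pca.explained_variance_ratio_.sum())
```

The printed variance ratio shows how much of the original information (in the variance sense) survives the reduction, which is exactly the trade-off described above.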

Curse of dimensionality and dimensionality explosion

1. Curse of Dimensionality:

The curse of dimensionality refers to the fact that in high-dimensional space, many commonly used distance measures and machine learning algorithms degrade, and intuitions that hold in low-dimensional space no longer apply. It mainly manifests in the following ways:

  • Sample sparsity: As the dimensionality increases, the training data becomes very sparse in the high-dimensional space, and the distances between samples become relatively large.

  • Distance calculation problem: In high-dimensional space, Euclidean distance is affected by the growing number of dimensions: the distances between all pairs of data points tend toward the same value, which weakens the discriminative power of distance (see the numerical sketch after this list).

  • More data is needed: As the dimensionality increases, more data points are needed to maintain the same sample density; otherwise the model is prone to overfitting.
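The distance-concentration effect in the second bullet is easy to observe numerically. The sketch below (plain NumPy/SciPy; the sample size of 500 points and the dimensions tested are arbitrary choices) draws uniform random points in increasing dimensions and reports the relative spread of the pairwise Euclidean distances; as the dimension grows, the spread shrinks and all points become roughly equidistant:

```python
# Sketch: pairwise Euclidean distances concentrate as dimension grows.
# Sample size (500) and the dimensions tested are illustrative assumptions.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.random((500, d))   # 500 uniform random points in [0, 1]^d
    dist = pdist(X)            # condensed vector of all pairwise Euclidean distances
    # Relative spread (max - min) / mean shrinks toward 0 as d grows,
    # i.e. distances lose their discriminative power.
    print(f"d={d:5d}  relative spread = {(dist.max() - dist.min()) / dist.mean():.3f}")
```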

2. Dimensionality Explosion:

Dimensionality explosion refers to the sharp increase in distances between data points in high-dimensional space, leading to problems such as degraded model performance and weaker generalization.

  • Increased computational complexity: In high-dimensional space, the running time and memory cost of many algorithms grow rapidly with the number of dimensions, since every distance or similarity computation must touch every coordinate.
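A small timing sketch makes this cost growth concrete (the sample count of 2000 points and the dimensions tested are arbitrary illustrative choices): a brute-force nearest-neighbor query does O(n · d) work, so its time grows roughly linearly with the dimension d.

```python
# Sketch: brute-force nearest-neighbor query cost grows with dimension d.
# Sample count (2000) and the dimensions tested are illustrative assumptions.
import time
import numpy as np

rng = np.random.default_rng(0)
for d in (10, 100, 1000, 10000):
    X = rng.random((2000, d))   # 2000 random points in d dimensions
    q = rng.random(d)           # one query point
    t0 = time.perf_counter()
    nearest = np.argmin(((X - q) ** 2).sum(axis=1))   # O(n * d) work per query
    print(f"d={d:6d}  query time = {time.perf_counter() - t0:.4f}s  nearest = {nearest}")
```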
