DR of ML: Introduction to sklearn.manifold (the manifold learning and dimensionality reduction module), partial source-code interpretation, and a detailed guide to case applications

Table of contents

Introduction to sklearn.manifold

Overview of sklearn.manifold (algorithm module for manifold learning and dimensionality reduction)

Manifold learning (from the scikit-learn user guide)

Partial interpretation of the sklearn.manifold source code


Introduction to sklearn.manifold

Overview of sklearn.manifold (algorithm module for manifold learning and dimensionality reduction)

Introduction

sklearn.manifold is a module in the scikit-learn machine learning library that implements manifold learning algorithms for nonlinear dimensionality reduction. It contains implementations such as LLE (Locally Linear Embedding), Isomap, MDS (Multidimensional Scaling), Spectral Embedding, and t-SNE (t-distributed Stochastic Neighbor Embedding). Note that linear methods such as PCA (Principal Component Analysis) live in sklearn.decomposition, not in this module.

Purpose

The sklearn.manifold module makes it straightforward to apply manifold learning and dimensionality reduction algorithms. Each estimator also exposes a number of parameters that users can tune to match the requirements of their application.

In addition, the low-dimensional embeddings produced by this module lend themselves well to visualization: a dataset reduced to two or three dimensions can be plotted (for example with matplotlib) so that users can analyze and present the results.

Instructions for use

In the sklearn.manifold module, each algorithm has a corresponding class. For example, the LLE algorithm corresponds to the LocallyLinearEmbedding class, the Isomap algorithm corresponds to the Isomap class, MDS to the MDS class, and t-SNE to the TSNE class. (The PCA class, by contrast, belongs to sklearn.decomposition.)

These classes all provide a fit_transform method, which performs dimensionality reduction or manifold learning on a dataset and returns the reduced dataset. They also accept constructor parameters such as n_components, which specifies the output dimension, and n_neighbors, which specifies the number of neighbors used when building the neighborhood graph.
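As a minimal sketch of this usage pattern (the digits dataset and the subsample size are our choices for illustration):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap, TSNE

# Load 64-dimensional digit images; subsample for speed
X, _ = load_digits(return_X_y=True)
X = X[:300]

# Isomap: n_components sets the output dimension,
# n_neighbors the size of each point's neighborhood
X_iso = Isomap(n_components=2, n_neighbors=10).fit_transform(X)
print(X_iso.shape)  # (300, 2)

# t-SNE: perplexity plays a role similar to a neighborhood size
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(X_tsne.shape)  # (300, 2)
```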

Manifold learning (from the scikit-learn user guide)

Manifold learning is an approach to non-linear dimensionality reduction. Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high.

High-dimensional datasets can be very difficult to visualize. While data in two or three dimensions can be plotted to show the inherent structure of the data, equivalent high-dimensional plots are much less intuitive. To aid visualization of the structure of a dataset, the dimension must be reduced in some way.

The simplest way to accomplish this dimensionality reduction is by taking a random projection of the data. Though this allows some degree of visualization of the data structure, the randomness of the choice leaves much to be desired. In a random projection, it is likely that the more interesting structure within the data will be lost.
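For illustration, such a random projection of the digits data down to two dimensions might look like this (GaussianRandomProjection is scikit-learn's implementation; the dataset choice is ours):

```python
from sklearn.datasets import load_digits
from sklearn.random_projection import GaussianRandomProjection

X, _ = load_digits(return_X_y=True)  # 1797 samples, 64 features

# Project onto 2 random Gaussian directions;
# interesting structure survives only by chance
rp = GaussianRandomProjection(n_components=2, random_state=0)
X_2d = rp.fit_transform(X)
print(X_2d.shape)  # (1797, 2)
```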

To address this concern, a number of supervised and unsupervised linear dimensionality reduction frameworks have been designed, such as Principal Component Analysis (PCA), Independent Component Analysis, Linear Discriminant Analysis, and others. These algorithms define specific rubrics to choose an “interesting” linear projection of the data. These methods can be powerful, but often miss important non-linear structure in the data.
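By contrast, PCA chooses the linear projection that preserves the most variance. A minimal sketch, again using the digits dataset for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print(X_pca.shape)  # (1797, 2)

# Fraction of the total variance captured by the two components
print(pca.explained_variance_ratio_.sum())
```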

Manifold Learning can be thought of as an attempt to generalize linear frameworks like PCA to be sensitive to non-linear structure in data. Though supervised variants exist, the typical manifold learning problem is unsupervised: it learns the high-dimensional structure of the data from the data itself, without the use of predetermined classifications.
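A common way to see this difference is the "swiss roll" dataset: a linear method such as PCA projects straight through the roll, while a manifold method such as LLE can unroll it. A minimal sketch (parameter values are illustrative):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding

# 3-D data lying on an intrinsically 2-D rolled-up sheet
X, color = make_swiss_roll(n_samples=1000, random_state=0)

X_pca = PCA(n_components=2).fit_transform(X)   # linear projection
X_lle = LocallyLinearEmbedding(
    n_components=2, n_neighbors=12, random_state=0
).fit_transform(X)                             # nonlinear unrolling
print(X_pca.shape, X_lle.shape)  # (1000, 2) (1000, 2)
```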

Partial interpretation of the sklearn.manifold source code
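As a rough sketch of what scikit-learn's Isomap computes internally (a k-neighbors graph, geodesic distances via graph shortest paths, then a classical-MDS eigendecomposition), here is a simplified re-implementation; the actual library source adds input validation, caching, and configurable solvers:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

X, _ = make_swiss_roll(n_samples=300, random_state=0)

# 1) k-nearest-neighbors graph with Euclidean edge weights
graph = kneighbors_graph(X, n_neighbors=12, mode="distance")

# 2) geodesic distances = shortest paths through the graph
D = shortest_path(graph, directed=False)

# 3) classical MDS: double-center the squared distances, then eigendecompose
n = D.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
K = -0.5 * H @ (D ** 2) @ H
eigvals, eigvecs = np.linalg.eigh(K)
top = np.argsort(eigvals)[::-1][:2]      # two largest eigenvalues
embedding = eigvecs[:, top] * np.sqrt(eigvals[top])
print(embedding.shape)  # (300, 2)
```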

Origin blog.csdn.net/qq_41185868/article/details/130320793