[Sklearn Lecture] scikit-learn methods at a glance (diagram)

Preface
sklearn needs little introduction: it is one of the best-known Python modules in machine learning, and anyone who wants to get results in the field will find it hard to avoid.

Official sklearn website: http://scikit-learn.org/stable/index.html#

First, here is the structure diagram from the official sklearn website:

Links for each category: https://blog.csdn.net/fuqiuai/article/details/79495865

 

 

As the chart shows, machine learning is divided into four blocks:

 classification,

 regression,

 clustering,

 dimensionality reduction.

Given a sample's features, we want to predict its corresponding attribute value. If that value is discrete, this is a classification problem; if it is a continuous real number, it is a regression problem.
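As a rough illustration of the discrete versus continuous distinction, the sketch below asks scikit-learn's `type_of_target` helper how it would treat two kinds of attribute values. The sample values (`species`, `weight_kg`) are made up for this example and do not come from the original post.

```python
from sklearn.utils.multiclass import type_of_target

# Hypothetical attribute values for the same five samples.
species = ["cat", "dog", "cat", "bird", "dog"]   # discrete labels
weight_kg = [4.2, 11.5, 3.9, 0.3, 9.8]           # continuous real numbers

# Discrete values point to a classification problem,
# continuous values to a regression problem.
species_kind = type_of_target(species)
weight_kind = type_of_target(weight_kg)

print(species_kind, weight_kind)
```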

Given a set of sample features with no corresponding attribute values, we may instead want to explore how the samples are distributed in the feature space, for example which samples lie close together and which lie far apart. This is a clustering problem.

If we want to represent the original high-dimensional feature space with a lower-dimensional subspace, this is a dimensionality reduction problem.

classification & regression
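Both classification and regression aim to build a predictive model that, given an input, produces an output.
The only difference is that the output is discrete in classification and continuous in regression. On the whole, the learning algorithms for the two problems are therefore very similar, which is why the chart shows algorithms used for classification appearing under regression as well. The most common learning algorithms for classification include SVM (support vector machine), SGD (stochastic gradient descent), Naive Bayes, Ensemble, and KNN. Regression can likewise use SVR, SGD, and Ensemble methods, as well as linear regression algorithms.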
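To make the similarity concrete, here is a minimal sketch that fits a support vector classifier (`SVC`) and a support vector regressor (`SVR`) on the same input features. The toy data, the threshold at 5.0, the linear target, and the `C=100.0` setting are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.svm import SVC, SVR

# Toy data (made up for illustration): one feature, 20 samples.
X = np.linspace(0.0, 10.0, 20).reshape(-1, 1)

y_class = (X.ravel() > 5.0).astype(int)   # discrete output: 0 or 1
y_reg = 2.0 * X.ravel() + 1.0             # continuous output

# Same algorithm family, same fit/predict interface, different output type.
clf = SVC(kernel="linear").fit(X, y_class)
reg = SVR(kernel="linear", C=100.0).fit(X, y_reg)

X_new = np.array([[2.0], [8.0]])
class_pred = clf.predict(X_new)   # class labels
reg_pred = reg.predict(X_new)     # real-valued estimates

print(class_pred, reg_pred)
```

Both estimators are trained and queried the same way; only the type of the target, and therefore of the prediction, changes.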

clustering
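Clustering also analyzes the attributes of samples and is somewhat similar to classification. The difference is that classification knows the range of the output before predicting, that is, how many categories there are, whereas clustering does not know the range of the attribute. For this reason classification is usually counted as supervised learning and clustering as unsupervised learning.
Because clustering does not know the attribute range of the samples in advance, it can only analyze the samples through their distribution in the feature space. Such problems are generally more complex. Commonly used algorithms include k-means and GMM (Gaussian mixture model).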
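The two algorithms named above can be tried on unlabeled data roughly as follows. This is a sketch on synthetic data: the two blobs, their centers, and the random seed are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Two well-separated synthetic blobs; no labels are given to the algorithms.
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# k-means assigns each sample to the nearest of two learned centers.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# GMM fits two Gaussian components and assigns each sample to the likelier one.
gmm_labels = GaussianMixture(n_components=2, random_state=0).fit(X).predict(X)

print(kmeans_labels[:5], kmeans_labels[-5:])
print(gmm_labels[:5], gmm_labels[-5:])
```

Note that both estimators still take the number of groups as a parameter (`n_clusters`, `n_components`); what they do not need is a label for each training sample. The cluster numbering itself is arbitrary.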

dimensionality reduction
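Dimensionality reduction is another important area of machine learning with many important applications. When the feature dimension is too high, it increases the training burden and the storage space required. Dimensionality reduction aims to remove redundancy among features and represent them with fewer dimensions. The most basic dimensionality reduction algorithm is PCA, and many later algorithms evolved from it.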
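A small sketch of removing feature redundancy with PCA is given below. The data is synthetic and assumed for illustration: the third feature is deliberately built as a near copy of the first, so two dimensions should be enough to describe the samples.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: three features, where the third nearly duplicates the first.
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + 0.01 * rng.normal(size=200)])

# Project the 3-dimensional features onto a 2-dimensional subspace.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of the original variance kept by the two components.
explained = pca.explained_variance_ratio_.sum()

print(X_reduced.shape, explained)
```

Because the dropped direction carries almost no variance, the reduced representation keeps nearly all of the information while storing one fewer value per sample.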





Origin www.cnblogs.com/ljt1412451704/p/11598679.html