Bobo's Machine Learning Notes, Lesson 7: How to Find the First N Principal Components

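In the previous post, Bobo introduced the theory behind principal component analysis (PCA) and showed in code how to find a single principal component. This post covers how to find the first N principal components and how to implement that in code.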

1. How do we find the first N principal components?

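The main idea is to transform the data: remove each sample's component along the first principal component, then find the first principal component of the resulting data. The first principal component of the new data is the second principal component of the original data. How do we obtain the new data? The figure illustrates it.

In the figure, X' is the new data obtained by subtracting from the original data its component along the first principal component, w is the first principal component (a unit vector), and X_project is the component of the data X in the direction of w. In formula form: X_project = (X · w) w and X' = X - X_project.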
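To make the subtraction concrete, here is a minimal sketch of one such step. The random data and the unit vector `w` are placeholders chosen only for illustration (in the real algorithm `w` would come from gradient ascent); the names `X_project` and `X_new` are also my own.

```python
import numpy as np

# Illustrative data only: 100 samples, 2 features, centered to zero mean.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
X = X - X.mean(axis=0)

# Hypothetical unit vector standing in for the first principal component.
w = np.array([0.6, 0.8])

# Component of every sample along w: the scalar (x . w) times the direction w.
X_project = X.dot(w).reshape(-1, 1) * w

# New data with the w-direction removed.
X_new = X - X_project

# Every row of X_new should now have (numerically) zero component along w.
residual = X_new.dot(w)
print(np.abs(residual).max())
```

After this step the new data has no variance left along `w`, which is why the next round of gradient ascent converges to a direction perpendicular to it.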

Code implementation:

'''
A hand-written PCA class for finding the first N principal components.
'''
import numpy as np
import matplotlib.pyplot as plt


class PCA(object):

    def __init__(self, n_components):
        self.n_components = n_components  # number of principal components to find
        self.components_ = None

    def fit(self, x_train, eta=0.01, n_iters=1e4, eplosion=1e-18):

        def demean(X):
            """
            Center X so that every column has zero mean.
            :param X:  matrix of samples (one row per sample)
            :return:  copy of X whose column means are zero
            """
            return X - np.mean(X, axis=0)

        def f(X, w):
            return np.sum((X.dot(w)) ** 2) / len(X)

        def direction(w):
            return w / np.linalg.norm(w)

        def df(X, w):
            return (X.T.dot(X.dot(w))) * 2. / len(X)

        def first_component(X, eta, initial_w, n_iters, eplosion):
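            # Note: I had confused two things here. With stochastic gradient descent the loop is not
            # controlled by n_iters, but gradient ascent still needs it, because it still moves along
            # the gradient, i.e. the direction of steepest increase.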

            w = direction(initial_w)
            cur_index = 0
            while cur_index < n_iters:
                last_w = w
                w = w + eta * df(X, w)
                w = direction(w)
                if abs(f(X, w) - f(X, last_w)) < eplosion:
                    break
                cur_index += 1

            return w

        X = demean(x_train)

        self.components_= np.empty((self.n_components, X.shape[1]))
        for i in range(self.n_components):
            # Re-initialize w for every component
            initial_w = np.random.random(X.shape[1])
            w = first_component(X, eta, initial_w, n_iters, eplosion)
            self.components_[i, :] = w
            # Remove the component along w so the next pass finds the next principal component
            X = X - X.dot(w.reshape(-1, 1)) * w

        return self.components_


    def __repr__(self):
        return "PCA(n_components=%d)" % self.n_components


if __name__ == '__main__':
    x = np.random.randint(1, 100, size=100)
    X = np.empty((100, 2))
    X[:, 0] = x
    X[:, 1] = 0.75 * x + 3. + np.random.normal(1,10., size=len(x))
    pca = PCA(n_components=2)

    pca.fit(X)

    print(pca.components_)

Output:

[[ 0.75221024 0.65892318]
[ 0.65892319 -0.75221023]]

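These are the two principal components of the data X. Taking the dot product of the two components gives a result of 0 (up to floating-point error), because they are perpendicular. On my machine the computed value is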

2.053357484044227e-12
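The check can be reproduced with a few lines. This sketch copies the two components from the printed output above, so the values are rounded to eight decimals and the dot product will not match the number above exactly; it should still be very close to zero.

```python
import numpy as np

# Components copied from the printed output (rounded to 8 decimal places).
components = np.array([[0.75221024, 0.65892318],
                       [0.65892319, -0.75221023]])

# Dot product of the two directions: near zero means they are perpendicular.
dot = components[0].dot(components[1])

# Each component should also be a unit vector.
norms = np.linalg.norm(components, axis=1)

print(dot)
print(norms)
```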

Summary:

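1. Make sure you understand how the component (projection) along a direction is computed.

2. Pay attention to array dimensions when taking dot products.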
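The dimension point is easiest to see by printing shapes. The sketch below uses a tiny made-up matrix and a hypothetical unit vector `w` to show why the implementation reshapes `w` into a column before multiplying; `np.outer` is used only as an independent cross-check.

```python
import numpy as np

# Tiny illustrative matrix: 3 samples, 2 features.
X = np.arange(6, dtype=float).reshape(3, 2)
w = np.array([0.6, 0.8])  # hypothetical unit direction

# Dot with a 1-D vector gives a 1-D result of shape (3,).
scalars_1d = X.dot(w)

# Dot with a column vector gives a 2-D result of shape (3, 1).
scalars_col = X.dot(w.reshape(-1, 1))

# (3, 1) * (2,) broadcasts to (3, 2): each row is (x . w) * w.
projection = scalars_col * w

# The same matrix via an outer product, as a cross-check.
outer = np.outer(scalars_1d, w)

print(scalars_1d.shape, scalars_col.shape, projection.shape)
```

Multiplying `scalars_1d * w` directly would fail, because shapes (3,) and (2,) cannot be broadcast together; the column shape (3, 1) is what makes the row-wise scaling work.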
