PCA dimensionality reduction notes

  1. An unsupervised machine learning algorithm

  2. Mainly used for dimensionality reduction of data

  3. Through dimensionality reduction, features that are easier for humans to understand can be found

  4. Other applications: visualization; denoising

PCA (Principal Component Analysis) is a common method of data analysis.

PCA uses a linear transformation to convert the original data into a set of representations whose dimensions are linearly independent. It can be used to extract the principal components of the data and is commonly used for dimensionality reduction of high-dimensional data.
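As a quick illustration of that use case, here is a minimal sketch assuming scikit-learn is installed; the data X is hypothetical random data, not from these notes (the sklearn class is imported as SkPCA to avoid clashing with the PCA function defined later):

import numpy as np
from sklearn.decomposition import PCA as SkPCA

# hypothetical data: 100 samples with 10 features
X = np.random.rand(100, 10)

# keep the 2 components with the largest variance
pca = SkPCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                 # (100, 2)
print(pca.explained_variance_ratio_)   # fraction of variance kept by each component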

The idea behind dimensionality reduction

Raw data:

Reduce the two-dimensional data to one dimension:

 

This shows the two-dimensional data after removing one feature so that it becomes one-dimensional. From the figure we can feel that the right side is better than the left: the points are more spread out, so they are easier to tell apart and more of the information in the data is retained, i.e., less information is lost. This achieves our purpose of not only reducing the dimensionality but also retaining as much information as possible.

How do we find the axis along which the spacing between the samples is the largest?

Dimensionality reduction derivation

 

Solving dimensionality reduction with gradient ascent

Step 1: zero-center the samples (subtract the mean)

x = x-x_mean

 

Step 2: find the unit direction vector of the first principal component, w = (w1, w2)

All of the samples are projected onto this direction vector, giving a new vector that we call X_pr. Our final goal is therefore to find the direction vector w that maximizes the variance of X_pr:
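Written out (a reconstruction, assuming the samples have already been zero-centered), the objective is:

$$\max_{w}\ \mathrm{Var}(X_{pr}) = \max_{w}\ \frac{1}{m}\sum_{i=1}^{m}\left(X^{(i)}\cdot w\right)^{2},\qquad \text{subject to } \lVert w\rVert = 1$$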

 

Step 3: use gradient ascent to maximize the variance
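The gradient used in that ascent (it matches the gradient line in the code near the end of these notes) is:

$$\nabla f(w)=\frac{2}{m}X^{T}Xw,\qquad w\ \leftarrow\ \frac{w+\eta\,\nabla f(w)}{\lVert w+\eta\,\nabla f(w)\rVert}$$

where η is the learning rate and w is re-normalized to unit length after every step.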

Solving dimensionality reduction with PCA

The optimization problem of reducing to the target dimension

To reduce a set of N-dimensional vectors to K dimensions (0 < K < N), the goal is to select K orthonormal basis vectors (each of unit length) such that, after the original data is transformed onto this basis, the covariance between every pair of fields is 0 and the variance of each field is as large as possible (under the orthogonality constraint, take the K largest variances).

 

Covariance matrix diagonalization

Premise:
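A plausible reconstruction of the premise, with each field a already zero-centered and the m samples arranged as the columns of X:

$$\mathrm{Var}(a)=\frac{1}{m}\sum_{i=1}^{m}a_i^{2},\qquad \mathrm{Cov}(a,b)=\frac{1}{m}\sum_{i=1}^{m}a_i b_i,\qquad C=\frac{1}{m}XX^{T}$$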

 

Suppose the covariance matrix of the original data matrix X is C, and P is a matrix whose rows are a set of basis vectors. Let Y = PX; then Y is the data obtained by transforming X onto the basis P. Let the covariance matrix of Y be D. We now derive the relationship between D and C:
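The derivation follows directly from the premise above:

$$D=\frac{1}{m}YY^{T}=\frac{1}{m}(PX)(PX)^{T}=P\left(\frac{1}{m}XX^{T}\right)P^{T}=PCP^{T}$$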

 

Now things are very clear: the P we are looking for is exactly the P that diagonalizes the original covariance matrix C.

Steps of the PCA algorithm

Suppose there are m pieces of n-dimensional data.

  1. Transpose the raw data to form X

  2. Zero-center each row of X (each row represents an attribute field), i.e., subtract the mean of that row

  3. Compute the covariance matrix

  4. Find the eigenvalues of the covariance matrix and their corresponding eigenvectors

  5. Arrange the eigenvectors as rows, from top to bottom in order of decreasing eigenvalue, and take the first k rows to form the matrix P

  6. Y = PX is the data reduced to k dimensions

PCA example

Use PCA to reduce the whole two-dimensional dataset to one dimension

 

1. Transpose the original data X

2. Zero-center each row of X (each row represents an attribute field), i.e., subtract the mean of that row

3. Compute the covariance matrix

4. Find the eigenvalues of the covariance matrix and their corresponding eigenvectors

5. Arrange the eigenvectors as rows, from top to bottom in order of decreasing eigenvalue, and take the first k rows to form the matrix P

6. Y = PX is the data reduced to k dimensions

 

Here we can see that D is a diagonal matrix, satisfying the conditions required above; a numpy sketch of these six steps follows.
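The following sketch reproduces the six steps in numpy; the 2-dimensional data here is a made-up example, not necessarily the one from the original figures:

import numpy as np

# hypothetical 2-D data, one sample per row (5 samples, 2 features)
X = np.array([[-1., -2.], [-1., 0.], [0., 0.], [2., 1.], [0., 1.]])
k = 1

# 1. transpose so that each row is one attribute field
X = X.T
# 2. zero-center each row (subtract the row mean)
X = X - X.mean(axis=1, keepdims=True)
# 3. covariance matrix C = (1/m) X X^T
C = X.dot(X.T) / X.shape[1]
# 4. eigenvalues and eigenvectors of C
eig_val, eig_vec = np.linalg.eig(C)
# 5. sort eigenvectors by eigenvalue (largest first), take the first k as the rows of P
order = np.argsort(eig_val)[::-1]
P = eig_vec[:, order[:k]].T
# 6. Y = PX is the data reduced to k dimensions
Y = P.dot(X)
print(Y)
# D = P C P^T comes out diagonal, as stated above
print(P.dot(C).dot(P.T))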

 

Solving dimensionality reduction with SVD

In other words, in the PCA procedure, steps 3 and 4 are replaced by a singular value decomposition:

 

Steps for computing the SVD

(1) Perform an eigendecomposition of the product of the transpose of A with A
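Written out, using the convention that AᵀA yields the right singular vectors (which matches the statement below):

$$(A^{T}A)\,v_i=\lambda_i\,v_i,\qquad \sigma_i=\sqrt{\lambda_i}$$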

 

where v is the right singular vector.

(2) Solve for the left singular vectors
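A standard way to write this step, assuming the vᵢ and σᵢ from step (1):

$$u_i=\frac{1}{\sigma_i}A\,v_i$$

where uᵢ is the i-th left singular vector.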

The singular values σ are very similar to eigenvalues; in the matrix Σ they are also arranged from largest to smallest, and they decrease especially quickly. In many cases, the sum of the largest 10% or even 1% of the singular values already accounts for more than 99% of the sum of all singular values. In other words, we can approximate the matrix using only the r largest singular values, where r << n. Here we define the partial (truncated) singular value decomposition:
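A common way to state the partial (truncated) SVD:

$$A_{m\times n}\approx U_{m\times r}\,\Sigma_{r\times r}\,V^{T}_{r\times n}$$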

 

(3) By choosing an appropriate value of r, the original data can be compressed

Implementing dimensionality reduction with eigendecomposition

import numpy as np


def PCA(X, k):
    # zero-center the data (subtract the mean of each column)
    data = X - np.mean(X, axis=0)
    # compute the covariance matrix
    cov = np.cov(data.T)
    # compute the eigenvalues and eigenvectors of the covariance matrix
    eig_val, eig_vec = np.linalg.eig(cov)
    # pair each eigenvalue with its eigenvector
    eig_pairs = [(np.abs(eig_val[i]), eig_vec[:, i]) for i in range(data.shape[1])]
    # sort the pairs by eigenvalue, from largest to smallest
    eig_pairs.sort(key=lambda pair: pair[0], reverse=True)
    # keep the k eigenvectors with the largest eigenvalues
    ft = []
    for i in range(k):
        ft.append(list(eig_pairs[i][1]))
    # project the data onto the selected eigenvectors
    return np.dot(data, np.array(ft).T)

Implementing dimensionality reduction with SVD

def SVD(data, k=2):
    # zero-center the data (subtract the mean of each column)
    data = data - np.mean(data, axis=0)
    # singular value decomposition: data = U * S * Vt
    u, s, vt = np.linalg.svd(data)
    # take the first k right singular vectors
    v_reduce = vt[:k, :].T
    # project the data onto them
    Z = np.dot(data, v_reduce)
    return Z
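A small usage sketch for the two functions above, on hypothetical random data (the two results should agree up to the sign of each column):

import numpy as np

np.random.seed(0)
X = np.random.rand(100, 5)          # hypothetical data: 100 samples, 5 features

Z_eig = PCA(X, k=2)                 # eigendecomposition-based reduction
Z_svd = SVD(X, k=2)                 # SVD-based reduction
print(Z_eig.shape, Z_svd.shape)     # (100, 2) (100, 2)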

Implementing dimensionality reduction with gradient ascent

def GD_PCA(X, n_components, eta=0.01, n_iters=1e4):
    # zero-center the data, i.e. shift the coordinate axes to the mean
    def demean(X):
        return X - np.mean(X, axis=0)

    # objective function: the variance of the projected data
    def f(w, X):
        return np.sum((X.dot(w) ** 2)) / len(X)

    # turn w into a unit vector, i.e. divide it by its own norm
    def direction(w):
        return w / np.linalg.norm(w)

    # gradient ascent for a single component
    def first_component(X, initial_w, eta=0.01, n_iters=1e4, epsilon=1e-8):
        # make sure w starts as a unit vector
        w = direction(initial_w)
        cur_iter = 0
        while cur_iter < n_iters:
            # gradient of the objective: (2/m) * X^T X w
            gradient = X.T.dot(X.dot(w)) * 2. / len(X)
            last_w = w
            # gradient ascent step
            w = w + eta * gradient
            # re-normalize w to a unit vector before the next iteration
            w = direction(w)
            if abs(f(w, X) - f(last_w, X)) < epsilon:
                break
            cur_iter += 1
        return w

    # first move the center of the data to the origin
    X_pca = demean(X)
    # one row per component; the number of columns equals the number of features
    components_ = np.empty(shape=(n_components, X.shape[1]))
    # loop once per component
    for i in range(n_components):
        # the initial w must not be the zero vector
        initial_w = np.random.random(X_pca.shape[1])
        w = first_component(X_pca, initial_w, eta, n_iters)
        components_[i, :] = w
        # remove the component of X_pca along w
        X_pca = X_pca - X_pca.dot(w).reshape(-1, 1) * w
    # project the (demeaned) data onto the components
    return demean(X).dot(components_.T)

Dimensionality reduction on the iris dataset:
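A minimal sketch of what such a demo could look like, using scikit-learn's bundled iris data and the PCA function defined above (matplotlib is assumed for the plot):

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target        # 150 samples, 4 features

# reduce the 4-dimensional iris data to 2 dimensions
X_reduced = PCA(X, k=2)              # GD_PCA(X, n_components=2) gives a similar result

# scatter plot of the two retained components, colored by species
plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y)
plt.xlabel("component 1")
plt.ylabel("component 2")
plt.show()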

 

 

 

Origin www.cnblogs.com/TimVerion/p/11305588.html