PCA algorithm theory and implementation in Python

Principal component analysis (PCA) of data: principles and a Python implementation

1. In principal component analysis, after the first principal component has been found, the next component (if needed) is obtained by removing the first component from the original data: subtract from X its projection onto the first component to get new data X', and the first principal component of X' is then the second principal component of the original data X. Repeating this cycle yields further components, as restated in the formula below.
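In formula form (this just restates the step above, with w1 the unit direction vector of the first component):

X' = X − (X · w1) w1ᵀ

that is, each sample has its projection onto w1 subtracted; the code below implements exactly this per row as x2[i] = x_demean[i] - x_demean[i].dot(w1) * w1.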


2. The PCA algorithm performs data dimensionality reduction according to the mathematical process shown below. Some information is lost during the reduction, so when the recovery step is used to map the reduced data back to high dimensions, the recovered points lie on the new principal components rather than at the original coordinate points.

(1) Reducing the high-dimensional data (from the original n dimensions down to k dimensions)


(2) Restoring the reduced k-dimensional data back to the original n-dimensional space
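In standard matrix notation (the usual PCA formulation; the original figures are not reproduced here): let X be the de-meaned m×n data matrix and W_k the k×n matrix whose rows are the first k unit principal components. Then

X_reduce = X · W_kᵀ  (an m×k matrix, the reduced data)
X_restore = X_reduce · W_k  (an m×n matrix, the recovered data, which lies in the subspace spanned by the k components)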

 

3. A concrete implementation of the dimensionality-reduction principle is given in the code below:
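Each component is found by gradient ascent on the projected variance (this restates what the functions f and df_math in the code compute): for de-meaned data X with m samples, maximize

f(w) = (1/m) · Σᵢ (X⁽ⁱ⁾ · w)²,  whose gradient is  ∇f(w) = (2/m) · Xᵀ (X · w),

subject to the constraint that w is a unit vector.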

import numpy as np
import matplotlib.pyplot as plt

# Build a 2-D toy data set: the second feature is roughly 0.75 times the first
# plus Gaussian noise, so the data has one dominant direction.
x = np.empty((100, 2))
x[:, 0] = np.random.uniform(0.0, 100.0, size=100)
x[:, 1] = 0.75 * x[:, 0] + 3.0 * np.random.normal(0, 3, size=100)
plt.figure()
plt.scatter(x[:, 0], x[:, 1])
plt.show()

# Definition of the demean (mean-removal) helper
def demean(x):
    return x - np.mean(x, axis=0)

print(x)
print(np.mean(x, axis=0))
print(demean(x))
print(np.mean(demean(x), axis=0))  # each column mean should now be (numerically) zero
x_demean = demean(x)

# Functions for gradient ascent on the projected variance
def f(w, x):
    # Objective: mean squared length of the projections of the samples onto w
    return np.sum((x.dot(w)) ** 2) / len(x)

def df_math(w, x):
    # Analytic gradient of f with respect to w
    return x.T.dot(x.dot(w)) * 2.0 / len(x)

def df_debug(w, x, epsilon=0.00001):
    # Numerical gradient (central differences), used only to verify df_math
    res = np.empty(len(w))
    for i in range(len(w)):
        w1 = w.copy()
        w1[i] = w1[i] + epsilon
        w2 = w.copy()
        w2[i] = w2[i] - epsilon
        res[i] = (f(w1, x) - f(w2, x)) / (2 * epsilon)
    return res

def derection(w):
    # Rescale w to a unit vector
    return w / np.linalg.norm(w)

def gradient_ascent1(x, eta, w_initial, erro=1e-8, n=1e6):
    w = derection(w_initial)
    i = 0
    while i < n:
        gradient = df_math(w, x)
        last_w = w
        w = w + gradient * eta
        w = derection(w)  # note 1: w must be renormalized to a unit vector at every step
        if abs(f(w, x) - f(last_w, x)) < erro:
            break
        i += 1
    return w

w0 = np.random.random(x.shape[1])  # note 2: the search must not start from the zero vector
print(w0)
eta = 0.001  # note 3: the data must not be standardized (e.g. with StandardScaler); only demean it
w1 = gradient_ascent1(x_demean, eta, w0)
print(w1)

# Plot the direction of the first principal component over the raw data
q = np.linspace(-40, 40)
Q = q * w1[1] / w1[0]
plt.figure(1)
plt.scatter(x[:, 0], x[:, 1])
plt.plot(q, Q, "r")
plt.show()
print(w1[1] / w1[0])  # should be close to the slope 0.75 used to generate the data

# Find the first n principal components of the data by repeating the removal step
x2 = np.empty(x.shape)
for i in range(len(x)):
    # Subtract from each sample its projection onto the first component
    x2[i] = x_demean[i] - x_demean[i].dot(w1) * w1
plt.figure()
plt.scatter(x2[:, 0], x2[:, 1], color="g")
plt.show()
w00 = np.random.random(x.shape[1])
print(w00)
w2 = gradient_ascent1(x2, eta, w00)
print(w2)
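# Added check (not in the original post): the two components should be
# orthogonal unit vectors, so their dot product is expected to be close to 0.
print(w1.dot(w2))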

# Wrapper function: find the first n principal components of n-dimensional data
def first_n_compnent(n, x, eta=0.001, erro=1e-8, m=1e6):
    x_pca = x.copy()
    x_pca = demean(x_pca)
    res = []
    for i in range(n):
        w0 = np.random.random(x.shape[1])
        w = gradient_ascent1(x_pca, eta, w0, erro, m)
        res.append(w)
        # Remove the component just found before searching for the next one
        x_pca = x_pca - x_pca.dot(w).reshape(-1, 1) * w
    return res

print(first_n_compnent(2, x))
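As a small follow-up sketch (not part of the original post), the components returned by first_n_compnent can be used to actually project the data down and map it back, following the formulas in section 2 above; it assumes the x, demean and first_n_compnent defined earlier.

# Sketch: reduce the data with the components found above, then restore it.
ws = first_n_compnent(1, x)                  # keep only the first component here
W_k = np.array(ws)                           # (k, n) matrix of unit component vectors
x_reduce = demean(x).dot(W_k.T)              # (m, k) reduced representation
x_restore = x_reduce.dot(W_k) + np.mean(x, axis=0)  # back to (m, n); points lie on the component line
plt.figure()
plt.scatter(x[:, 0], x[:, 1])
plt.scatter(x_restore[:, 0], x_restore[:, 1], color="r")
plt.show()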
(The original post shows screenshots of the actual running results here.)

Origin www.cnblogs.com/Yanjy-OnlyOne/p/11323342.html