Re-orient: applying $U^T$ to $x$ is the same as taking the dot product of $x$ with each row of $U^T$; $x$ is projected onto each orthonormal basis vector (the new coordinates), which are the rows of $U^T$. Geometrically, $U^Tx$ rotates $x$ into the new coordinate system (orthonormal basis) given by $U^T$.
Scale: $\Sigma U^Tx$ uses $\Sigma$ to stretch or shrink the rotated $x$ along each new coordinate. (If some $\lambda=0$, that dimension is removed.)
Re-orient (back to original): same meaning as $U^T$; $U$ rotates $x$ back into the original coordinate system.
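The rotate–scale–rotate reading above can be checked numerically. A minimal numpy sketch (the matrix and variable names are my own, not from the notes) for a symmetric matrix $A = U\Sigma U^T$:

```python
import numpy as np

rng = np.random.default_rng(0)

# A symmetric matrix has an orthonormal eigenbasis: A = U diag(lam) U^T.
B = rng.standard_normal((3, 3))
A = (B + B.T) / 2
lam, U = np.linalg.eigh(A)          # columns of U are orthonormal eigenvectors

x = rng.standard_normal(3)

step1 = U.T @ x                     # re-orient: coordinates of x in the eigenbasis
step2 = lam * step1                 # scale each coordinate by its eigenvalue
step3 = U @ step2                   # re-orient back to the original coordinates

assert np.allclose(step3, A @ x)    # the three steps reproduce A @ x
```

The same decomposition holds for a general $A = U\Sigma V^T$, except the first rotation uses $V^T$ and the final one uses $U$.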
PCA
Main idea: find a coordinate transformation matrix P applied to the data such that (an intuitive view: we want the projected values to be as spread out as possible; the more spread out the data, the stronger its separability, and the stronger the separability, the more completely the probability distribution is preserved):
The variability of the transformed data can be explained as much as possible along the new coordinates (the information loss after dimensionality reduction should be as small as possible, preserving the probability distribution of the original samples as much as possible).
The new coordinates should be orthogonal to each other to avoid redundancy (the bases after reduction are mutually orthogonal).
So the final objective function is
$$\operatorname*{arg\,min}_{P\in\mathbb{R}^{m\times d},\;U\in\mathbb{R}^{d\times m}} \sum_{i=1}^{n}\lVert x_i - UPx_i\rVert_2^2 \quad (*)$$
where $d$ is the original dimension of the data and $m$ is the new dimension of the transformed data ($r=\operatorname{rank}(A)\le m$), $x_1,\dots,x_n\in\mathbb{R}^d$, $P\in\mathbb{R}^{m\times d}$ is the transformation into the new coordinates, and $U\in\mathbb{R}^{d\times m}$ (in the end $U=P^T$) is the transformation back to the original coordinates.
To solve the above problem, the essential idea is to maximize the variance of the transformed data points projected onto the new coordinates, while minimizing the covariance between any two different coordinates (orthogonal ⟺ covariance is 0). Equivalently, we can diagonalize the symmetric covariance matrix of the transformed data.
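The objective $(*)$ can also be verified numerically. A small numpy sketch (toy data, sizes, and names are my own choices): projecting onto the top-$m$ eigenvectors of the covariance gives a lower reconstruction error than projecting onto arbitrary orthonormal directions.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m, n = 3, 2, 300
X = rng.standard_normal((d, n))
X = np.diag([3.0, 1.0, 0.1]) @ X                 # unequal variances along the axes
X = X - X.mean(axis=1, keepdims=True)            # center the data

lam, vecs = np.linalg.eigh((X @ X.T) / n)        # eigendecompose the covariance
P = vecs[:, np.argsort(lam)[::-1][:m]].T         # top-m eigenvectors as rows: P in R^{m x d}
U = P.T                                          # U = P^T, as stated above

recon_err = np.sum((X - U @ (P @ X)) ** 2)       # sum_i ||x_i - U P x_i||^2

# Any other rank-m orthogonal projection (here: onto random directions) does worse.
Q, _ = np.linalg.qr(rng.standard_normal((d, m)))
other_err = np.sum((X - Q @ (Q.T @ X)) ** 2)
assert recon_err <= other_err
```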
The data transformed by matrix $P$ is $Y=PX$, where $P\in\mathbb{R}^{m\times d}$. The covariance matrix of the transformed data is then:
$$\frac{1}{n}YY^T = \frac{1}{n}(PX)(PX)^T = \frac{1}{n}PXX^TP^T = P\left(\frac{1}{n}XX^T\right)P^T = PAP^T$$
Once the minimization $(*)$ is done, $\frac{1}{n}YY^T$ will look like $\Sigma$. The underlying technique used to achieve this is formula (2), so that
$$PAP^T = \Sigma = \begin{bmatrix}\lambda_1 & & \\ & \ddots & \\ & & \lambda_m\end{bmatrix}$$
or, equivalently, $A = P^T\Sigma P$,
where $\lambda_1=\sigma_1^2 \ge \lambda_2=\sigma_2^2 \ge \dots \ge \lambda_m=\sigma_m^2$, and the corresponding eigenvectors form the transformation matrix
$$P = \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_m^T \end{bmatrix} \in \mathbb{R}^{m\times d}.$$
Depending on the problem, we can select a different subset of $r \le m$ rows of $P$.
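As a sanity check on the diagonalization above, here is a small numpy sketch (toy data and names are my own, not from the notes): the rows of $P$ are the eigenvectors of the covariance matrix $A$, and both $PAP^T$ and the covariance of $Y=PX$ come out diagonal.

```python
import numpy as np

rng = np.random.default_rng(1)

d, n = 2, 500                                # toy feature dim and sample count
X = rng.standard_normal((d, n))
X = np.array([[2.0, 0.0], [1.0, 0.5]]) @ X   # correlate the two features
X = X - X.mean(axis=1, keepdims=True)        # center so (1/n) X X^T is the covariance

A = (X @ X.T) / n                            # covariance matrix of the raw data
lam, vecs = np.linalg.eigh(A)                # eigh returns eigenvalues in ascending order
order = np.argsort(lam)[::-1]                # re-sort descending: lambda_1 >= lambda_2 >= ...
lam, vecs = lam[order], vecs[:, order]

P = vecs.T                                   # rows of P are the eigenvectors u_1^T, ..., u_d^T
Y = P @ X                                    # transformed data

# P A P^T is (numerically) the diagonal matrix of eigenvalues
assert np.allclose(P @ A @ P.T, np.diag(lam), atol=1e-10)
# equivalently, the covariance of Y is diagonal: the new coordinates are uncorrelated
assert np.allclose((Y @ Y.T) / n, np.diag(lam), atol=1e-10)
```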
SVD
In SVD, $A$ need not be a symmetric matrix; it has dimension $m\times n$. Note (SVD): the original feature dimension is $n$ (the number of columns), and the new feature dimension is $m$. Note (PCA above): the original feature dimension is $m$ (the number of rows).
Formula of the Singular Value Decomposition: $A_{m\times n} = U\Sigma V^T$
The main goal of SVD is to solve the following two tasks:
First, find a set of orthonormal basis vectors in the $n$-dimensional space.
Second, after applying $A$ to those vectors, the resulting vectors in the $m$-dimensional space are still orthogonal.
Suppose we already have a set of orthonormal basis vectors in the $n$-dimensional space, $V_{n\times n} = [v_1, v_2, \dots, v_n]$, where $v_i \perp v_j$ for $i \ne j$. (Note: some of the images $Av_i$ can be zero vectors.)
Applying $A$ to this basis, we have $[Av_1, Av_2, \dots, Av_n]$.
To make the projected basis orthogonal as well, we need the following to hold:
$$Av_i \cdot Av_j = (Av_i)^T Av_j = v_i^T A^TA v_j = 0.$$
Therefore, if the $v_i$ are eigenvectors of $A^TA$, the resulting projected basis is also orthogonal, because:
$$Av_i \cdot Av_j = (Av_i)^T Av_j = v_i^T (A^TA v_j) = \lambda_j v_i^T v_j = 0.$$
To scale the projected orthogonal basis to unit length, note that
$$|Av_i|^2 = (Av_i)^T(Av_i) = v_i^T(A^TAv_i) = \lambda_i v_i^T v_i = \lambda_i \quad (\text{note: } v_i^Tv_i = 1),$$
so
$$u_i = \frac{Av_i}{|Av_i|} = \frac{Av_i}{\sqrt{\lambda_i}}.$$
That is, $Av_i = \sqrt{\lambda_i}\,u_i = \sigma_i u_i$, where $\sigma_i = \sqrt{\lambda_i}$ is called a singular value, $1 \le i \le r$, $r = \operatorname{rank}(A)$.
In the end, we extend $[u_1, u_2, \dots, u_r]$ to $[u_1, u_2, \dots, u_r \mid u_{r+1}, \dots, u_m]$ and select $[v_{r+1}, \dots, v_n]$ from the null space of $A$, where $Av_i = 0$ and $\sigma_i = 0$ for $i > r$. (In the figure below, $k$ is $r$.)
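The construction above can be sketched in numpy (the toy matrix and tolerances are my own choices): take eigenvectors $v_i$ of $A^TA$, set $\sigma_i = \sqrt{\lambda_i}$, and form $u_i = Av_i/\sigma_i$ for $i \le r$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 3
A = rng.standard_normal((m, n))

lam, V = np.linalg.eigh(A.T @ A)         # eigenvectors v_i of A^T A (ascending order)
order = np.argsort(lam)[::-1]            # sort eigenvalues descending
lam, V = lam[order], V[:, order]
sigma = np.sqrt(np.clip(lam, 0, None))   # singular values sigma_i = sqrt(lambda_i)

r = int(np.sum(sigma > 1e-10))           # numerical rank
U = (A @ V[:, :r]) / sigma[:r]           # u_i = A v_i / sigma_i, for i <= r

# A v_i = sigma_i u_i, and the projected basis is orthonormal
assert np.allclose(A @ V[:, :r], U * sigma[:r])
assert np.allclose(U.T @ U, np.eye(r), atol=1e-8)
# rank-r reconstruction: A = U_r diag(sigma_r) V_r^T
assert np.allclose(U @ np.diag(sigma[:r]) @ V[:, :r].T, A, atol=1e-8)
```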
A=XY
Take-away message
PCA just computes the left or right singular matrix of the SVD (depending on how you define the covariance matrix).
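A quick numerical check of this claim (the setup is my own): for centered data $X$ with columns as samples, the eigenvectors of $\frac{1}{n}XX^T$ match the left singular vectors of $X$ up to sign.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 3, 200
X = rng.standard_normal((d, n))
X = X - X.mean(axis=1, keepdims=True)        # centered data, columns are samples

# PCA route: eigenvectors of the covariance (1/n) X X^T
lam, E = np.linalg.eigh((X @ X.T) / n)
E = E[:, np.argsort(lam)[::-1]]              # descending eigenvalue order

# SVD route: left singular vectors of X
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Columns agree up to sign, so |E^T U| should be (numerically) the identity.
assert np.allclose(np.abs(E.T @ U), np.eye(d), atol=1e-8)
```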