Use PCA for coordinate system conversion and dimensionality reduction

Use PCA for coordinate system conversion

PCA is a commonly used method of dimensionality reduction. The step that actually reduces the dimension is:

  • Select the eigenvectors belonging to the first k (largest) eigenvalues.

If we skip this step, no dimensionality reduction is performed; PCA then only converts the data to a new coordinate system.
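
To make the role of that step concrete, here is a minimal sketch of the usual PCA computation with base functions only. The data matrix X, the value k = 2, and all variable names are assumptions made for illustration; the pca function used later in this article should return the same axes up to sign.

```matlab
% Minimal PCA sketch (illustrative data and names, not from the article).
X = randn(200, 3) * diag([3 1 0.2]);   % hypothetical 200-by-3 data set
k = 2;                                 % number of components to keep (assumption)

Xc = X - mean(X, 1);                   % 1. center each column (needs R2016b or later)
C = cov(Xc);                           % 2. covariance matrix, 3-by-3
[V, D] = eig(C);                       % 3. eigenvectors (columns of V) and eigenvalues
[lambda, order] = sort(diag(D), 'descend');  % 4. sort by decreasing eigenvalue
V = V(:, order);

Vk = V(:, 1:k);                        % 5. keep the first k eigenvectors
Y_reduced = Xc * Vk;                   % reduced data, 200-by-k

Y_full = Xc * V;                       % step 5 skipped: still 3 columns, new axes only
```

Y_reduced is the dimensionality-reduced data, while Y_full is the same data expressed in the new coordinate system without discarding anything.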

Specific steps

1. First, generate data from a two-dimensional Gaussian distribution

MATLAB code:

mul = [1 2];                     % mean vector
SIGMA = [1 0.81; 0.81 1];        % covariance matrix
data1 = mvnrnd(mul,SIGMA,500);   % 500 samples, one per row
plot(data1(:,1),data1(:,2),'*');
axis equal
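
If the Statistics and Machine Learning Toolbox is not available, a sample of the same kind can be drawn with base functions. The sketch below is one common way to do it, not necessarily what mvnrnd does internally: multiply standard normal samples by the Cholesky factor of SIGMA and add the mean. The names R and n are introduced here only for illustration.

```matlab
mul = [1 2];                    % mean vector, same values as above
SIGMA = [1 0.81; 0.81 1];       % covariance matrix, same values as above
n = 500;                        % number of samples (illustrative name)

R = chol(SIGMA);                % upper triangular factor with R' * R equal to SIGMA
data1 = randn(n, 2) * R + mul;  % rows have mean mul and covariance SIGMA
                                % (the "+ mul" relies on implicit expansion, R2016b or later)

disp(cov(data1));               % sample covariance, should be close to SIGMA
```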


2. Use PCA to select the coordinate axes

PCA chooses the new coordinate axes so that they are mutually orthogonal and the variance of the data along them is as large as possible: the first axis points in the direction of largest variance, and each following axis takes the largest remaining variance.
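
Both properties can be checked numerically. The sketch below is only a sanity check under the same settings as the rest of this article; the names orth_err, var_err and off_diag are made up for illustration. For this particular SIGMA the exact eigenvalues are 1 + 0.81 = 1.81 and 1 - 0.81 = 0.19, with eigenvectors along [1 1] and [1 -1], so the estimated axes should lie close to the two diagonals.

```matlab
mul = [1 2];
SIGMA = [1 0.81; 0.81 1];
data1 = mvnrnd(mul, SIGMA, 500);
[pc, score, latent] = pca(data1);

% Orthogonality: the columns of pc are orthonormal, so pc' * pc is the identity.
orth_err = norm(pc' * pc - eye(2));

% Variance: the variance of the data along each new axis equals latent,
% and latent is sorted in descending order.
var_err = norm(var(score)' - latent);

% The new coordinates are uncorrelated: cov(score) is diagonal.
S = cov(score);
off_diag = abs(S(1, 2));

fprintf('orthogonality error: %g\n', orth_err);
fprintf('variance mismatch:   %g\n', var_err);
fprintf('off-diagonal cov:    %g\n', off_diag);
disp(latent');   % should be roughly [1.81 0.19] for this SIGMA
```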

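clear;clc;close all;
mul = [1 2];
SIGMA = [1 0.81; 0.81 1];
data1 = mvnrnd(mul,SIGMA,500);
% pc: principal axes (one per column), score: data in the new coordinates,
% latent: variance along each axis
[pc,score,latent] = pca(data1);
figure(1)
plot(data1(:,1),data1(:,2),'*');
hold on
% draw both principal axes as arrows starting at the mean (1,2)
quiver(1,2,pc(1,1),pc(2,1),5)
quiver(1,2,pc(1,2),pc(2,2),5)
% overlay the same data expressed in the new coordinate system
plot(score(:,1),score(:,2),'.')
axis equal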

In this way, a new coordinate system can be established.
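
The conversion between the two coordinate systems can also be written out by hand. This is a small sketch, assuming the default behaviour of pca (centering on the sample mean); mu_hat, score_manual and data_back are illustrative names.

```matlab
mul = [1 2];
SIGMA = [1 0.81; 0.81 1];
data1 = mvnrnd(mul, SIGMA, 500);
[pc, score] = pca(data1);

mu_hat = mean(data1, 1);               % pca centers the data on the sample mean
score_manual = (data1 - mu_hat) * pc;  % old coordinates -> new coordinates
data_back = score * pc' + mu_hat;      % new coordinates -> old coordinates

% Both errors should be at the level of floating-point rounding.
fprintf('forward error:  %g\n', norm(score_manual - score, 'fro'));
fprintf('backward error: %g\n', norm(data_back - data1, 'fro'));
```

Because pc is orthogonal, no information is lost as long as all columns are kept, which is why this is a coordinate change rather than a reduction.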

3. Main steps of the .m code

  1. Generate random data from a two-dimensional Gaussian distribution
  2. Call the pca function
  3. Draw the principal-axis vectors on the plot

Use PCA for multi-dimensional dimensionality reduction and evaluate the result

PCA is a linear projection, so reducing the dimensionality with it can lose the local manifold structure of the data and give bad results.

1. Generate data

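First, define a function that generates a series of points scattered along a circle:

function [x1,y1] = creat_circle(r1 , r1_ratio,sita_ratio)
% Generate a series of points near a circle of radius r1.
% r1_ratio:   largest inward radial offset, as a fraction of r1
% sita_ratio: fraction of the candidate angles that is kept
sita = 0:0.05:2*pi;                       % candidate angles
all_num = numel(sita);
n_keep = floor(sita_ratio*all_num);       % number of points to generate
% random angles: pick n_keep distinct entries of sita
sita_p = sita(randperm(all_num,n_keep));
% random radii: shrink r1 by up to r1*r1_ratio
r_p = rand(1,n_keep)*r1*r1_ratio;
r1_p = r1 - r_p;
x1 = r1_p.*cos(sita_p);
y1 = r1_p.*sin(sita_p);
scatter(x1,y1)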

Then run the following code:

% Build the coordinate points
clear;clc;close all;
[x1,y1] = creat_circle(3,0.05,0.95);
[x2,y2] = creat_circle(5,0.05,0.95);
[x3,y3] = creat_circle(9,0.05,0.95);
num = size(x1);
z1 = normrnd(5,1,1,num(1,2))+x1;
z2 = wgn(1,num(1,2),1)+4+y2;
z3 = rand(1,num(1,2))+2+x3;
% Plot
figure(1)
scatter(x1,y1,'r')
hold on
scatter(x2,y2,'b')
scatter(x3,y3,'g')
figure(2)
scatter3(x1,y1,z1,'r')
hold on
scatter3(x2,y2,z2,'b');
scatter3(x3,y3,z3,'g');

After generation, we can look at the distribution of these points.
Viewed from another angle, the pattern becomes visible. We hope to preserve this pattern after dimensionality reduction.
However, after PCA is actually used to reduce the data to 2 dimensions, the result (figure not shown) does not keep this pattern well, so the effect of dimensionality reduction is not good.
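
The article does not show the code for this reduction step. The sketch below is one way it could look; it is not the author's script. It builds three rings with base functions instead of creat_circle, uses a simplified third coordinate (z = x + noise + 5 for every ring), and the names n, radii, labels, Y and kept are all made up for illustration. The fifth output of pca, explained, gives the percentage of variance carried by each component, which is a common first check of a reduction.

```matlab
n = 120;                          % points per ring (assumption)
radii = [3 5 9];                  % same radii as in the script above
X = zeros(0, 3);
labels = zeros(0, 1);
for i = 1:numel(radii)
    theta = 2*pi*rand(n, 1);      % random angles on the ring
    x = radii(i) * cos(theta);
    y = radii(i) * sin(theta);
    z = x + randn(n, 1) + 5;      % simplified third coordinate (assumption)
    X = [X; x y z];
    labels = [labels; i * ones(n, 1)];
end

% explained(j) is the percentage of total variance along component j.
[coeff, score, latent, ~, explained] = pca(X);

Y = score(:, 1:2);                % data reduced to 2 dimensions
kept = sum(explained(1:2));       % percentage of variance that survives
fprintf('variance kept by 2 components: %.1f%%\n', kept);

figure
scatter(Y(:, 1), Y(:, 2), 15, labels, 'filled')   % color by ring index
axis equal
```

A large value of kept only says that little variance was discarded. It does not by itself show that the rings stay separable in the 2-D plot, so the scatter plot still needs to be inspected.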

Origin blog.csdn.net/qq_43110298/article/details/104478012