Problem Description
At its core, the Problem C I got is a big-data analysis problem: you have to infer various conclusions from the data provided. However, the data come with many kinds of parameters, and you have to choose which ones to use yourself. I'll leave that choice to each reader.
After choosing your data labels, you need to extract the data corresponding to those labels from the large data set.
Of course, the organizers don't make it that easy. Most of the data you select will be incomplete, and the missing years may even differ from one series to another, which is quite painful.
Solutions
MATLAB provides powerful tools for importing data from many kinds of files, so importing is not a concern. That leaves the following problems to solve:
- How do you accurately find the data corresponding to your selected labels in a large data set?
- What do you do about missing data?
How to select the data varies from person to person. I have posted my own code below in the hope of giving everyone a little inspiration.
As for missing data, either give up the label and select labels whose data are complete, or find a way to fill in the missing values.
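If you take the first route, dropping incomplete labels is a one-line mask once the data sit in a years-by-labels matrix. Below is a minimal sketch, assuming a hypothetical matrix `M` (rows are years, columns are labels) with NaN marking missing values, and a hypothetical label list `labels`; neither comes from the competition data.

```matlab
% Hypothetical data: 5 years (rows) x 3 labels (columns); NaN marks a missing value.
labels = {'LABEL_A', 'LABEL_B', 'LABEL_C'};
M = [1.0  2.0  NaN;
     1.1  2.1  3.1;
     1.2  NaN  3.2;
     1.3  2.3  3.3;
     1.4  2.4  3.4];

% A label is "complete" when none of the entries in its column is NaN.
complete = all(~isnan(M), 1);

% Keep only the complete labels and their columns.
keptLabels = labels(complete);
keptData   = M(:, complete);
```

If zeros also stand for missing values in your data, convert them to NaN first, otherwise they will pass this check.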
When it comes to filling in data, many people naturally think of interpolation, fitting, or regression. The subtle differences between these methods are something readers should think through themselves, so I won't go into detail. What does deserve attention when filling in data is the difference between NaN and 0. Also make sure each filled value is placed in its corresponding year; otherwise it will distort the subsequent analysis.
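To make the NaN-versus-0 point concrete, here is a minimal sketch on a hypothetical five-year series in which one missing year was recorded as 0. It assumes that 0 really means "not recorded"; in a series where 0 is a legitimate measurement, do not convert it. Linear interpolation is used only as one simple example of the filling methods mentioned above.

```matlab
% Hypothetical series for 1960-1964; the 1962 entry was stored as 0 but is really missing.
years  = (1960:1964)';
values = [10; 12; 0; 16; 18];

% Treated as a real value, the 0 drags the mean down.
meanWithZero = mean(values);            % (10+12+0+16+18)/5 = 11.2

% Mark it as missing instead, then average only the observed values.
values(values == 0) = NaN;
observed = ~isnan(values);
meanObserved = mean(values(observed));  % (10+12+16+18)/4 = 14

% Fill the gap by interpolating on the year itself, so the filled value
% lands in the 1962 row instead of being appended elsewhere.
values(~observed) = interp1(years(observed), values(observed), years(~observed), 'linear');
```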
Code display
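```matlab
% param_cell: cell array of records to search; each record is itself a cell
%             whose 1st entry is the MSN label and whose 4th entry is the value returned.
% MSN:        the single MSN label to look up.
% Returns a row vector of the matching values, in the order the records appear.
function OUTPUT = select_MSN_Index(param_cell, MSN)
temp = [];
for ind = 1:numel(param_cell)
    % strcmp instead of ==, which errors when the two labels differ in length
    if strcmp(MSN, param_cell{ind}{1})
        temp = cat(2, temp, param_cell{ind}{4});
    end
end
OUTPUT = temp;
end
```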
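The loop in `select_MSN_Index` can also be written with `cellfun`. Below is a minimal sketch, assuming records laid out as {label, state code, year, value}. The code above only confirms that entry 1 is the label and entry 4 the value, so treating entry 3 as the year is an assumption, and the labels are hypothetical. Sorting by year removes the function's implicit reliance on the records arriving in year order.

```matlab
% Hypothetical records: {MSN label, state code, year, value}, deliberately not sorted by year.
records = { {'LABEL_A', 'AZ', 1961, 5}, ...
            {'LABEL_B', 'AZ', 1960, 7}, ...
            {'LABEL_A', 'AZ', 1960, 4} };
target = 'LABEL_A';

% strcmp compares whole strings, so labels of different lengths are safe.
mask = cellfun(@(r) strcmp(r{1}, target), records);

% Pull out the years and values of the matching records.
yrs  = cellfun(@(r) r{3}, records(mask));
vals = cellfun(@(r) r{4}, records(mask));

% Sort by year so the output order does not depend on the record order.
[yrs, order] = sort(yrs);
vals = vals(order);
```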
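```matlab
function DataSet = Split_Data(ProblemData, MSN)
% MSN is a row cell array of MSN labels; keep each label only once
MSN = unique(MSN);
TX = (1960:2009)';   % first column: the 50 years
% ...
for i = 1:4
    % records for the i-th two-letter code in CODE (CODE is defined in the
    % elided part; select_Code_Index is not shown in the post)
    TEMP = select_Code_Index(ProblemData, CODE(i*2-1:2*i), 2);
    for j = 1:583
        TTEMP = select_MSN_Index(TEMP, char(MSN{j}));
        % fewer than 50 years: pad with NaN at the FRONT, which assumes
        % the missing years are the earliest ones
        if numel(TTEMP) < 50
            TTEMP = cat(2, NaN(1, 50 - numel(TTEMP)), TTEMP);
        end
        switch i
            case 1
                TX(:, j+1) = TTEMP';
            % ...
        end
    end
    switch i
        case 1
            DataSet(1) = {TX};
        % ...
    end
end
end
```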
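`Split_Data` pads short series with NaN at the front, which puts each value in the right year only if every gap is at the start of the period. When gaps are scattered, one option is to write each observation into the row of its own year. Below is a minimal sketch, assuming you have the year of every observation (the names `obsYears` and `obsValues` and their contents are hypothetical).

```matlab
% Full year axis used in the post.
TX = (1960:2009)';

% Hypothetical observations for one label; the gaps are scattered,
% not only at the start of the period.
obsYears  = [1960; 1961; 1975; 2009];
obsValues = [3.0;  3.5;  4.0;  9.0];

% Start with NaN everywhere, then write each value into the row of its year.
column = NaN(numel(TX), 1);
[found, row] = ismember(obsYears, TX);
column(row(found)) = obsValues(found);
```

The resulting `column` could then take the place of `TTEMP'` when assigning `TX(:, j+1)`.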
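```matlab
% Extract the data for the selected local factors (regions)
function [FData_M, LOCATION, Lab] = Split_Factor_Data(title, F_title, DataSet, index)
LOCATION = [];
Lab = {};
NUM = 1;
for i = 1:index
    OBJECT = F_title{i, 1};
    % column 1 of title is the year, so label columns start at 2
    for j = 2:584
        MSN = title{1, j};
        % strcmp instead of ==, which errors when the labels differ in length
        if strcmp(MSN, OBJECT)
            LOCATION(:, NUM) = j;
            Lab(i, 1) = {MSN};
            Lab(i, 2) = {OBJECT};
            NUM = NUM + 1;
        end
    end
end
FData_M = DataSet(:, LOCATION);
end
```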
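The double loop in `Split_Factor_Data` is essentially a lookup, which `ismember` can do in one call. Below is a minimal sketch with hypothetical names: `header` plays the role of `title`, `wanted` plays the role of `F_title`, and `DataSet` is a tiny made-up matrix. Unlike the original, it does not build `Lab`, and it silently skips labels that are not found.

```matlab
% Hypothetical header row: the first entry is the year column, the rest are labels.
header = {'Year', 'LABEL_A', 'LABEL_B', 'LABEL_C'};
% Hypothetical labels to extract, in the desired order.
wanted = {'LABEL_C'; 'LABEL_A'};
% Hypothetical data: one row per year, one column per header entry.
DataSet = [1960 1 2 3;
           1961 4 5 6];

% Look the wanted labels up among the label columns only (skipping the year
% column), then add 1 to turn positions into column numbers of the full header.
[found, pos] = ismember(wanted, header(2:end));
LOCATION = pos(found)' + 1;
FData_M  = DataSet(:, LOCATION);
```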
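```matlab
% Fill in the missing values of the selected, extracted data.
% Despite its name, this function fits a one-term exponential a*exp(b*x)
% ('exp1'), not a cubic interpolant; fit needs the Curve Fitting Toolbox.
function Data = FitData_Cubicinterp(DataSet, index)
for i = 1:index
    Data_X = (1960:2009)';
    Data_Y = DataSet(:, i);
    % zeros and NaNs are both treated as missing
    LOCATION = find(Data_Y == 0 | isnan(Data_Y));
    TEMP = Data_Y;
    Data_X(LOCATION) = [];
    Data_Y(LOCATION) = [];
    Param = fit(Data_X, Data_Y, 'exp1');
    % row k holds year 1959 + k, so evaluate the fit at the missing years
    TEMP(LOCATION) = Param(LOCATION + 1959);
    Data(:, i) = TEMP;
    figure
    plot((1960:2009)', TEMP)
end
end
```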
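If the Curve Fitting Toolbox is not available, one rough substitute for the 'exp1' model y = a*exp(b*x) is to fit a straight line to log(y) with base-MATLAB `polyfit`. This is a different technique: it minimizes error on the log scale, whereas `fit` uses nonlinear least squares on y itself, so on real data the coefficients will generally differ somewhat, and it only works when the observed values are positive. Below is a minimal sketch on a hypothetical exact exponential with two gaps; centering the years (t = year - 1960) is my own choice to keep the fit well conditioned.

```matlab
% Hypothetical column: an exact exponential with two "missing" years,
% marked as 0 and NaN as in FitData_Cubicinterp.
Data_X = (1960:1969)';
Data_Y = 2 * exp(0.1 * (Data_X - 1960));
Data_Y([3 7]) = [0; NaN];

% Same missing-value rule as the original: zeros and NaNs are both gaps.
missing = (Data_Y == 0) | isnan(Data_Y);

% Fit log(y) = log(a) + b*t by linear least squares on the observed years.
t = Data_X - 1960;
p = polyfit(t(~missing), log(Data_Y(~missing)), 1);

% Evaluate the fitted exponential only at the missing years and write it
% back into those rows, leaving the observed values untouched.
Filled = Data_Y;
Filled(missing) = exp(polyval(p, t(missing)));
```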
Results display
Because the results are so varied, I am sharing the MATLAB workspace directly; interested readers can download it themselves.
Link password: dsj3