MATLAB Data Processing Series - Elimination of Duplicate Data



Foreword

Suppose a 100×8 matrix contains duplicate rows. How can we remove the duplicates, keep one copy of each row, and preserve the original row order?


1. Function

Use MATLAB's built-in unique function. Its call format is:

[uniqueData, ~, idx] = unique(data, 'rows', 'stable');

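Here 'rows' means that whole rows are compared, and 'stable' means that the result keeps the order in which each row first appears instead of being sorted. uniqueData is the deduplicated array (the same type as data, so a numeric matrix for numeric input), and idx gives, for each row of data, the index of the matching row in uniqueData. It can therefore be used to restore the original array: data equals uniqueData(idx, :).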
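To make the roles of the outputs concrete, here is a minimal sketch on a small made-up matrix (the values are illustrative and not from the 100×8 data set). It shows the deduplicated rows, the mapping in idx, and the reconstruction of the original matrix.

```matlab
% Small illustrative matrix (made-up values): rows 1 and 3 are identical.
data = [3 1; 2 2; 3 1; 1 5];

[uniqueData, ~, idx] = unique(data, 'rows', 'stable');

disp(uniqueData)   % rows in order of first appearance: [3 1; 2 2; 1 5]
disp(idx')         % 1 2 1 3

% idx maps each original row to its row in uniqueData,
% so indexing uniqueData with idx rebuilds the original matrix.
restored = uniqueData(idx, :);
disp(isequal(restored, data))   % 1 (true)
```

Without 'stable', the same call would return the rows sorted in ascending order rather than in order of appearance.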

2. Usage

1. If the data is all numeric, you can use the following code:

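[uniqueRows, firstIdx] = unique(data, 'rows', 'stable');
uniqueA = data(firstIdx, :);

The second output of unique, firstIdx, holds the index at which each distinct row first occurs in data. Indexing data with it extracts those rows, so uniqueA is identical to uniqueRows. The third output (idx above) should not be used for this step: unique(idx) is simply 1:k for k distinct rows, so data(unique(idx), :) would return the first k rows of data whether or not they are duplicates.
Note: The data used here is numeric.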
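As a quick check of the first-occurrence indices, the following sketch uses a small made-up matrix. The variable names firstIdx and isDuplicate are illustrative choices, not part of the original post.

```matlab
% Made-up numeric data: rows 3 and 5 repeat rows 1 and 2.
data = [4 4; 1 1; 4 4; 2 2; 1 1];

% Second output: index of the first occurrence of each distinct row.
[uniqueRows, firstIdx] = unique(data, 'rows', 'stable');
disp(firstIdx')    % 1 2 4

% Extracting by first-occurrence index gives the same rows as uniqueRows.
uniqueA = data(firstIdx, :);
disp(isequal(uniqueA, uniqueRows))   % 1 (true)

% Logical mask marking the rows that were dropped as duplicates.
isDuplicate = true(size(data, 1), 1);
isDuplicate(firstIdx) = false;
disp(isDuplicate')   % 0 0 1 0 1
```

The mask is handy when you want to inspect or log the removed rows, for example with data(isDuplicate, :).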

2. What if the data contains both numeric and text values? First import the data; two methods are shown here.

(1) Use the readtable function to read an Excel file containing text data, and then remove duplicate rows.

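%% Read the data
filename = 'path\example.xlsx'; % file path and file name
sheet = 'Sheet1'; % worksheet name
data = readtable(filename,'Sheet',sheet);
%% Remove duplicate rows
% Compare columns 5 to 11 only; firstIdx is the first occurrence of each distinct combination.
% For a table, unique always compares whole rows, so the 'rows' flag is not needed.
[uniqueStrData, firstIdx] = unique(data(:,5:11), 'stable');
% Use the indices to extract the non-duplicated rows with all columns
result = data(firstIdx, :);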
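The same idea can be tried without an Excel file. The sketch below builds a small hypothetical table in memory (the variable names Name, City and Score and all values are made up), deduplicates on two key columns, and keeps every column of the first row found for each key.

```matlab
% Hypothetical table built in memory so the sketch runs without an Excel file.
Name  = {'Ann'; 'Bob'; 'Ann'; 'Cid'};
City  = {'Oslo'; 'Rome'; 'Oslo'; 'Rome'};
Score = [90; 85; 70; 85];
T = table(Name, City, Score);

% Compare only the key columns (Name and City); Score is ignored,
% so row 3 counts as a duplicate of row 1 even though its Score differs.
[~, firstIdx] = unique(T(:, {'Name', 'City'}), 'stable');

% Keep all columns of the first row seen for each key.
result = T(firstIdx, :);
disp(result)
```

Because only the key columns are compared, the Score of the first occurrence is the one that survives; whether that is the right row to keep depends on your data.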

(2) Read the Excel file into a cell array, and then remove duplicate rows.

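[a,b,c]=xlsread('path\name.xlsx','Sheet'); % adjust to your own file name
% Note: a is the numeric part of the Excel sheet (matrix), b is the text content (cell array), and c is the entire content (cell array).

% Convert the cell array to a string array
strData = string(c);

% Remove duplicate rows with unique. You can specify the column range to compare;
% here columns 5 to 11 form the basis for comparing rows.
[uniqueStrData, firstIdx] = unique(strData(:,5:11), 'rows', 'stable'); % uniqueStrData holds columns 5:11 only

% To deduplicate on columns 5:11 while keeping all columns, use the following line.
result = strData(firstIdx, :); % the previous line must be run first, because firstIdx comes from it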
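For a file-free illustration of the cell-array route, the sketch below uses a small hypothetical cell array in place of the raw output c of xlsread (all values are made up). It compares only the first two columns. As a variation on the code above, it indexes the original cell array rather than strData, so numeric cells keep their numeric type in the result.

```matlab
% Hypothetical mixed cell array standing in for the raw output of xlsread.
c = {'A01', 10, 'red';
     'B02', 20, 'blue';
     'A01', 10, 'green';
     'C03', 20, 'blue'};

% Every cell becomes a string, for example 10 becomes "10".
strData = string(c);

% Compare columns 1 and 2 only; row 3 duplicates row 1 on those columns.
[~, firstIdx] = unique(strData(:, 1:2), 'rows', 'stable');

% Index the original cell array so that numeric cells stay numeric.
result = c(firstIdx, :);
disp(result)
```

If everything downstream works on text anyway, strData(firstIdx, :) gives the same rows as a string array.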

3. Conclusion

That is all for this walkthrough. I have been a little busy recently and have not posted updates. See you in the next issue!

Origin blog.csdn.net/xiatiandexia123/article/details/130140049