【Detailed Explanation of Practical Cases of MATLAB Data Processing (18)】: Using a Self-Organizing Feature Map Network to Cluster Asian Football Teams by Level

1. Problem description

The results of the Chinese men's football team have always mattered deeply to its fans. Many people believe that the Chinese team is already third-rate, or even at the lowest level, in Asia: the team once performed well in Asian competitions, but has suffered a string of heavy defeats in recent years.
Against this background, it is worth compiling the match data of Asian teams systematically and using it to give a convincing picture of the level and strength of men's football in each country.
Clustering, a form of unsupervised learning, suits this task. It does not require knowing the level of any team in advance. Only the number of categories has to be given, and the algorithm divides all samples into categories according to their similarity.
The key to accurate clustering results is selecting appropriate sample features, that is, deciding which indicators correctly reflect a team's strength and level. To compare the development of football across countries, the results of each country's domestic league are not a natural choice; the performance of each national team in official international competitions should be used instead. The results are coded as follows:
(1) For the World Cup: if the team reaches the final tournament, use its final ranking (1-32); if it reaches the top-ten qualifying round but not the final tournament, code it as 33; if it is eliminated in the qualifying group stage, code it as 43.
(2) For the Asian Cup: if the team reaches the semi-finals, use its final ranking (1-4); if it reaches the quarter-finals, code it as 5; if it reaches the round of 16, code it as 9; if it fails to qualify, code it as 17.
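The coding rules above can be sketched as a small lookup. The stage labels ('finals', 'top10', and so on), the variable names, and the sample inputs are made up for illustration and do not come from the original post.

```matlab
% Hypothetical inputs: {competition, stage reached, final ranking (NaN if none)}
results = {
    'WorldCup', 'finals',       17;
    'WorldCup', 'top10',        NaN;
    'WorldCup', 'eliminated',   NaN;
    'AsianCup', 'semifinal',    3;
    'AsianCup', 'quarterfinal', NaN;
    'AsianCup', 'top16',        NaN;
    'AsianCup', 'eliminated',   NaN};

codes = zeros(size(results, 1), 1);
for k = 1:size(results, 1)
    key = [results{k, 1} ':' results{k, 2}];
    switch key
        case {'WorldCup:finals', 'AsianCup:semifinal'}
            codes(k) = results{k, 3};   % use the final ranking directly
        case 'WorldCup:top10'
            codes(k) = 33;
        case 'WorldCup:eliminated'
            codes(k) = 43;
        case 'AsianCup:quarterfinal'
            codes(k) = 5;
        case 'AsianCup:top16'
            codes(k) = 9;
        case 'AsianCup:eliminated'
            codes(k) = 17;
        otherwise
            error('Unknown competition/stage: %s', key);
    end
end
disp(codes')
```

With this coding, a smaller number always means a better result, which is what the ranking step later in the article relies on.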
The list of team results is as follows:
[Figure: table of team results (image not included)]

2. Principle of clustering Asian football levels with a self-organizing feature map network

The algorithm flow chart is as follows:
[Figure: algorithm flow chart (image not included)]
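For orientation, the competitive-learning idea behind a self-organizing map can be sketched in a few lines of base MATLAB. This is a generic online SOM under assumed settings (random placeholder data, linear decay schedules, a Gaussian neighborhood); it is not the batch algorithm that selforgmap uses internally, and none of the names below come from the original post.

```matlab
% Generic online SOM sketch (assumptions: placeholder data, hand-picked schedules)
rng(0);                               % fixed seed so the sketch is repeatable
X = rand(4, 16);                      % placeholder data: 4 features x 16 samples
gridPos = [0 0; 0 1; 1 0; 1 1]';      % grid coordinates of the 2x2 neurons (2 x 4)
W = rand(4, 4);                       % one weight column per neuron
nEpochs = 100;

for epoch = 1:nEpochs
    frac  = 1 - (epoch - 1) / nEpochs;
    lr    = 0.5 * frac;               % decaying learning rate
    sigma = 0.1 + 1.0 * frac;         % shrinking neighborhood width
    for n = randperm(size(X, 2))
        x = X(:, n);
        d = sum((W - x).^2, 1);       % squared distance to every neuron (R2016b+ expansion)
        [~, win] = min(d);            % winning neuron
        g = sum((gridPos - gridPos(:, win)).^2, 1);  % squared grid distance to the winner
        h = exp(-g / (2 * sigma^2));  % Gaussian neighborhood weights (1 x 4)
        W = W + lr * (x - W) .* h;    % pull the winner and its neighbors toward x
    end
end

% Assign every sample to its nearest neuron
labels = zeros(1, size(X, 2));
for n = 1:size(X, 2)
    [~, labels(n)] = min(sum((W - X(:, n)).^2, 1));
end
disp(labels)
```

Each neuron ends up representing one group of similar samples, so the index of the winning neuron serves as the cluster label.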

3. Algorithm steps

3.1 Define samples

The clustering covers 16 countries. Each team's performance is represented by a four-dimensional vector of the codes defined above; the matrix is transposed at the end so that each column of data is one sample:

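%% Define the input samples
N = 16;     % number of teams
strr = {'China','Japan','South Korea','Iran','Saudi Arabia','Iraq','Qatar','UAE','Uzbekistan','Thailand',...
    'Vietnam','Oman','Bahrain','North Korea','Indonesia','Australia'};
data = [43,43,9,9;        % China
    28,9,4,1;               % Japan
    17,15,3,3;              % South Korea
    25,33,5,5;              % Iran
    28,33,2,9;              % Saudi Arabia
    43,43,1,5;              % Iraq
    43,33,9,5;              % Qatar
    43,33,9,9;              % UAE
    33,33,5,4;              % Uzbekistan
    43,43,9,17;             % Thailand
    43,43,5,17;             % Vietnam
    43,43,9,17;             % Oman
    33,33,9,9;              % Bahrain
    33,32,17,9;             % North Korea
    43,43,9,17;             % Indonesia
    16,21,5,2]';            % Australia (transpose: 4 features x 16 samples)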

3.2 Create a network

Use the selforgmap function in the MATLAB Neural Network Toolbox to create a 2×2 map, whose four neurons correspond to four clusters:

% 2x2 self-organizing map network
net = selforgmap([2,2]);

3.3 Network training

Normalize the input samples with mapminmax, then train the network with the train function:

data = mapminmax(data);     % scale each feature (row) to [-1, 1]
net = init(net);            % initialize the weights
net = train(net, data);     % all four feature rows are used
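To make the normalization step concrete, the default behavior of mapminmax (scaling each row linearly to [-1, 1]) can be reproduced by hand. The small matrix below is made up for illustration, and the sketch assumes no row is constant, since that would divide by zero here.

```matlab
% Manual row-wise min-max scaling to [-1, 1] (illustrative values only)
X = [43 28 17;      % feature 1 across three samples
      9  1  5];     % feature 2 across three samples
xmin = min(X, [], 2);
xmax = max(X, [], 2);
Xn = 2 * (X - xmin) ./ (xmax - xmin) - 1;   % implicit expansion, R2016b or later
disp(Xn)
```

Because the scaling is increasing within each row, a lower code still means a better result after normalization.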

3.4 Testing

Testing a self-organizing network differs from testing in supervised learning: the test data is the same as the training data. Feed the training matrix back into the network to obtain the cluster label of each sample:

%% Test
y = net(data);              % one column per sample, with a 1 at the winning neuron

% Convert the vector-encoded classes to scalar indices
result = vec2ind(y);
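The conversion performed by vec2ind can be illustrated without the toolbox: for a matrix with a single 1 per column, the label is the row index of that 1. The matrix below is a made-up example, not network output.

```matlab
% Each column has a single 1 marking the winning neuron (illustrative values)
y = [0 1 0 0;
     1 0 0 0;
     0 0 0 1;
     0 0 1 0];
[~, labels] = max(y, [], 1);   % row index of the 1 in each column
disp(labels)
```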

3.5 Display clustering results

When clustering finishes, samples assigned to the same cluster share a label, but which number is attached to which cluster is arbitrary. To display the results in a meaningful order, the mean of the feature values of each cluster is computed. Because a lower value means a higher level, these means can be used to rank the clusters from strongest to weakest:

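%% Output the results
% Sort the cluster labels by strength
M = 4;                          % number of clusters (2x2 map)
score = zeros(1,M);
for i=1:M
    t = data(:, result==i);
    score(i) = mean(t(:));      % NaN for an empty cluster; sort places NaN last
end
[~,ind] = sort(score);          % ascending: lowest mean = strongest

result_ = zeros(1,N);
for i=1:M
    result_(result == ind(i)) = i;
end

fprintf('  Team             Level\n');
for i = 1:N
    fprintf('   %-14s  Tier %d\n', strr{i}, result_(i));
end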
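The relabeling step can be checked on a toy example. The sketch below uses accumarray as an alternative to the loop; the labels and per-sample averages are invented for illustration, and the variable names are not from the original post.

```matlab
% Toy relabeling example (made-up labels and per-sample mean feature values)
M = 4;
result = [3 1 1 2 3 4];                    % raw cluster labels from the network
avg    = [-0.9 0.5 0.7 0.1 -0.7 0.9];      % lower value = stronger team

% Mean value per raw label; NaN marks a label with no samples
score = accumarray(result(:), avg(:), [M 1], @mean, NaN)';
[~, ind] = sort(score);                    % ascending, NaN last

tierOf = zeros(1, M);
tierOf(ind) = 1:M;                         % tierOf(c) is the tier of raw label c
tier = tierOf(result);
disp(tier)
```

Here raw label 3 has the lowest mean, so its samples become tier 1, while raw label 4 has the highest mean and becomes tier 4.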

4. Running results

The program running results are as follows:
Neural network training process:
[Figure: training window (image not included)]
Classification results:
[Figure: program output listing each team's tier (image not included)]

5. Complete code

The complete code is as follows:

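% football.m
% Clustering of Asian football levels
%% Clear the workspace
clear,clc
close all;

rng('shuffle')  % seed from the current time; rng expects a nonnegative integer seed, which now does not return
M=4;            % number of clusters (2x2 map)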
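%% Define the input samples
N = 16;     % number of teams
strr = {'China','Japan','South Korea','Iran','Saudi Arabia','Iraq','Qatar','UAE','Uzbekistan','Thailand',...
    'Vietnam','Oman','Bahrain','North Korea','Indonesia','Australia'};
data = [43,43,9,9;        % China
    28,9,4,1;               % Japan
    17,15,3,3;              % South Korea
    25,33,5,5;              % Iran
    28,33,2,9;              % Saudi Arabia
    43,43,1,5;              % Iraq
    43,33,9,5;              % Qatar
    43,33,9,9;              % UAE
    33,33,5,4;              % Uzbekistan
    43,43,9,17;             % Thailand
    43,43,5,17;             % Vietnam
    43,43,9,17;             % Oman
    33,33,9,9;              % Bahrain
    33,32,17,9;             % North Korea
    43,43,9,17;             % Indonesia
    16,21,5,2]';            % Australia (transpose: 4 features x 16 samples)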

%% Create the network
% 2x2 self-organizing map network
net = selforgmap([2,2]);

%% Train the network
data = mapminmax(data);     % scale each feature (row) to [-1, 1]
tic
net = init(net);
net = train(net, data);
toc

%% Test
y = net(data);

% Convert the vector-encoded classes to scalar indices
result = vec2ind(y);

%% Output the results
% Sort the cluster labels by strength
score = zeros(1,M);
for i=1:M
    t = data(:, result==i);
    score(i) = mean(t(:));      % NaN for an empty cluster; sort places NaN last
end
[~,ind] = sort(score);          % ascending: lowest mean = strongest

result_ = zeros(1,N);
for i=1:M
    result_(result == ind(i)) = i;
end

fprintf('  Team             Level\n');
for i = 1:N
    fprintf('   %-14s  Tier %d\n', strr{i}, result_(i));
end

Origin blog.csdn.net/didi_ya/article/details/130422238