Multi-Objective Particle Swarm Optimization Approach for Cost-Based Feature Selection in Classification

Citation

LaTeX

@ARTICLE{7243331,
author={Y. Zhang and D. W. Gong and J. Cheng},
journal={IEEE/ACM Transactions on Computational Biology and Bioinformatics},
title={Multi-Objective Particle Swarm Optimization Approach for Cost-Based Feature Selection in Classification},
year={2017},
volume={14},
number={1},
pages={64-75},
keywords={bioinformatics;decision making;encoding;feature selection;particle swarm optimisation;probability;signal processing;PSO-based multiobjective feature selection algorithm;Pareto domination relationship;Pareto front;bioinformatics;classification performance;classification problems;cost-based feature selection problems;crowding distance;data-preprocessing technique;decision-makers;effective hybrid operator;external archive;feature subsets;multiobjective feature selection algorithms;multiobjective particle swarm optimization;nondominated solutions;probability-based encoding technology;signal processing;single-objective optimization problem;Bioinformatics;Classification algorithms;Genetic algorithms;IEEE transactions;Optimization;Particle swarm optimization;Search problems;Feature selection;cost;multi-objective;particle swarm optimization},
doi={10.1109/TCBB.2015.2476796},
ISSN={1545-5963},
month={Jan},}

Normal

Y. Zhang, D. W. Gong and J. Cheng, "Multi-Objective Particle Swarm Optimization Approach for Cost-Based Feature Selection in Classification," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 14, no. 1, pp. 64-75, Jan.-Feb. 2017.
doi: 10.1109/TCBB.2015.2476796
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7243331&isnumber=7842713


Abstract

Feature selection is an important data-preprocessing technique for classification problems in fields such as bioinformatics and signal processing. Two objectives are considered:

  • maximizing the classification performance
  • minimizing the cost that may be associated with features

cost-based feature selection

multi-objective particle swarm optimization (PSO)

  • a probability-based encoding technology
  • an effective hybrid operator
  • the ideas of the crowding distance, the external archive, and the Pareto domination relationship

compared with: several multi-objective feature selection algorithms

5 benchmark datasets


Main Content


PSO

(figure omitted)
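
The omitted figure presumably showed the canonical PSO update rules. For reference, the standard (generic, not paper-specific) velocity and position updates are:

$$v_{id}^{t+1} = w\,v_{id}^{t} + c_1 r_1 \big(pbest_{id} - x_{id}^{t}\big) + c_2 r_2 \big(gbest_{d} - x_{id}^{t}\big)$$

$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1}$$

where $w$ is the inertia weight, $c_1, c_2$ are the acceleration coefficients, and $r_1, r_2$ are uniform random numbers in $[0, 1]$.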


FS

  1. filter
  2. wrapper

Multi-objective

  1. the number of features
  2. the classification performance

support vector machine classifier

chaotic mappings (standard definitions below):

  • logistic
  • tent
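
For reference, the two chaotic maps listed above are commonly defined as follows (standard forms, not taken from this paper):

$$x_{n+1} = \mu\, x_n (1 - x_n) \quad \text{(logistic map, typically } \mu = 4)$$

$$x_{n+1} = \begin{cases} 2 x_n, & x_n < 0.5 \\ 2 (1 - x_n), & x_n \ge 0.5 \end{cases} \quad \text{(tent map)}$$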

Algorithm


A. Particle Encoding

(figure omitted)

Decoding:

(formula image omitted)
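
Below is a minimal sketch of how a probability-encoded particle might be decoded into a feature subset. The fixed 0.5 threshold is an illustrative assumption; the paper's exact decoding rule is given in the omitted formula above.

```python
import numpy as np

def decode(particle, threshold=0.5):
    """Decode a probability-encoded particle into a binary feature mask.

    Each dimension of `particle` is treated as the probability that the
    corresponding feature is selected; here a feature is kept when its
    probability exceeds `threshold` (an assumed rule, for illustration).
    """
    return np.asarray(particle, dtype=float) > threshold

# example: features 1 and 2 are selected
mask = decode([0.12, 0.87, 0.55, 0.30])   # -> [False  True  True  False]
```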


B. Fitness Evaluation

Cost:

(formula image omitted)

Total cost:

(formula image omitted)

the classification error rate:
the leave-one-out cross-validation (LOOCV) error of k-NN

the one nearest neighbor (1-NN) method:
In this method, a datum from the original dataset is selected as a testing sample, and the rest constitute the training samples. Then the 1-NN classifier predicts the class of the testing sample by calculating and sorting the distances between the testing sample and the training ones.
(this is repeated for each datum)

(formula image omitted)
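
A minimal sketch of the two fitness values for one decoded feature subset, using scikit-learn: the LOOCV error of a 1-NN classifier as described above, plus a total cost that is assumed here to be the plain sum of the selected features' costs (the paper's exact cost formulas are in the omitted images).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

def evaluate(mask, X, y, feature_costs):
    """Return (LOOCV error of 1-NN, total cost) for one feature subset.

    The total cost is assumed to be the sum of the costs of the selected
    features; an empty subset is assigned the worst possible error.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 1.0, 0.0
    clf = KNeighborsClassifier(n_neighbors=1)
    accuracy = cross_val_score(clf, X[:, mask], y, cv=LeaveOneOut()).mean()
    total_cost = float(np.sum(np.asarray(feature_costs)[mask]))
    return 1.0 - accuracy, total_cost
```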


C. External Archive Update

the crowding distance

the Pareto domination comparison
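
A minimal sketch of the Pareto-domination test and a basic non-dominated archive update built on it (both objectives are minimized; the crowding-distance truncation applied when the archive overflows is omitted here):

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization):
    no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive, candidate):
    """Insert `candidate` (an objective vector) and keep only
    non-dominated members."""
    if any(dominates(member, candidate) for member in archive):
        return archive                       # candidate is dominated: discard
    archive = [m for m in archive if not dominates(candidate, m)]
    archive.append(candidate)
    return archive
```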


D. Updating Gbest and Pbest

Pbest: updated with a domination-based strategy.

Gbest: selected from the external archive; the diversity of the non-dominated solutions is measured by the crowding distance, and Gbest is chosen by a binary tournament based on the crowding distances (sketched below).
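
A minimal sketch of the crowding-distance computation and of a binary tournament over it for picking Gbest from the archive (this assumes the archive holds at least two objective vectors; index handling is illustrative):

```python
import random
import numpy as np

def crowding_distance(front):
    """Crowding distance of each solution in a non-dominated front,
    given as an (n, m) array of objective vectors; larger = less crowded."""
    front = np.asarray(front, dtype=float)
    n, m = front.shape
    dist = np.zeros(n)
    for j in range(m):
        order = np.argsort(front[:, j])
        dist[order[0]] = dist[order[-1]] = np.inf      # boundary solutions
        span = front[order[-1], j] - front[order[0], j]
        if span == 0:
            continue
        for k in range(1, n - 1):
            dist[order[k]] += (front[order[k + 1], j] - front[order[k - 1], j]) / span
    return dist

def select_gbest(archive_objectives):
    """Binary tournament on crowding distance: sample two archive members
    and return the index of the less crowded (more diverse) one."""
    dist = crowding_distance(archive_objectives)
    i, j = random.sample(range(len(archive_objectives)), 2)
    return i if dist[i] >= dist[j] else j
```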


E. Hybrid Mutation

To avoid being trapped in local optima, two operators are used (sketched below):

  1. the re-initialization operator: re-initializes the flight velocities of 10% of the particles in each generation
  2. the jumping mutation: each particle may uniformly jump in any dimension with probability p_m (a partial re-initialization)

The two operators do not add much computational burden.
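
A minimal sketch of the hybrid operator under assumed value ranges (velocities in [-1, 1], probability-encoded positions in [0, 1]); the 10 percent re-initialization proportion and the jumping probability p_m follow the notes above:

```python
import numpy as np

rng = np.random.default_rng()

def hybrid_mutation(positions, velocities, p_reinit=0.10, p_jump=0.01,
                    v_bounds=(-1.0, 1.0), x_bounds=(0.0, 1.0)):
    """Re-initialize the velocities of a fixed proportion of the swarm,
    then let every position dimension jump to a random value with
    probability p_jump (a partial re-initialization). Value ranges and
    parameter names are illustrative assumptions."""
    n, d = positions.shape
    # 1) velocity re-initialization for roughly 10% of the particles
    k = max(1, round(p_reinit * n))
    chosen = rng.choice(n, size=k, replace=False)
    velocities[chosen] = rng.uniform(*v_bounds, size=(k, d))
    # 2) jumping mutation: each dimension jumps with probability p_jump
    jump = rng.random((n, d)) < p_jump
    positions[jump] = rng.uniform(*x_bounds, size=int(jump.sum()))
    return positions, velocities
```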


F. Algorithm Framework

(figure omitted)

acceleration coefficients:

(formula image omitted)


G. Convergence Analysis

(figure omitted)


H. Complexity Analysis

Space complexity:
the archive memorizer: O(N_a × D)
the memorizer for the particles: O(N_s × D)
total: O(max(N_a, N_s) × D)

Computational complexity (main time complexity):
the Pareto comparison: O(M × N_s × N_a) basic operations
the crowding distance metric: O(M × N_a × log N_a)
the Pbest update: O(M × N_s)
the Gbest update: O(N_s)
the worst-case time complexity: O(M × N_s × N_a)


Experiments


A. Datasets

(table omitted)


B. Compared Algorithms and Parameters

  1. the DE-based multi-objective feature selection algorithm (DEMOFS)
  2. the NSGA-based feature selection algorithm (NSGAFS), based on the idea of NSGA-II
  3. the SPEA2-based feature selection algorithm (SPEAFS)
  4. HMPSOFS (the algorithm proposed in this paper)

All the algorithms are wrapper approaches

K-nearest neighbor (KNN)

the jumping probability is set to 0.01


C. Performance Metrics

  1. the hyper-volume (HV) metric
  2. the two-set coverage (SC) metric: measures the degree of convergence between two Pareto optimal sets (see the sketch below)
  3. the SP metric: estimates the distribution of solutions throughout the Pareto optimal set
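
A minimal sketch of the two-set coverage idea (this version counts strict Pareto domination; definitions in the literature sometimes also count equal solutions). It reuses the `dominates` helper from the archive-update sketch above:

```python
def two_set_coverage(A, B):
    """C(A, B): fraction of the objective vectors in B that are dominated
    by at least one objective vector in A (both objectives minimized).
    C(A, B) = 1 means every solution in B is covered by A."""
    return sum(any(dominates(a, b) for a in A) for b in B) / len(B)
```

Note that the measure is not symmetric, so both C(A, B) and C(B, A) are usually reported.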

D. Analysis of the Hybrid Mutation

  1. HMPSOFS/J: HMPSOFS without the jumping mutation
  2. HMPSOFS/JR: HMPSOFS without both operators

(figures omitted)
the effect of the re-initialization proportion is also analyzed


E

(figures omitted)


Reposted from blog.csdn.net/u010203404/article/details/80209165