Quote
LaTeX
@article{JIMENEZ201775,
title = "Multi-objective evolutionary feature selection for online sales forecasting",
journal = "Neurocomputing",
volume = "234",
pages = "75 - 92",
year = "2017",
issn = "0925-2312",
doi = "https://doi.org/10.1016/j.neucom.2016.12.045",
url = "http://www.sciencedirect.com/science/article/pii/S0925231216315612",
author = "F. Jiménez and G. Sánchez and J.M. García and G. Sciavicco and L. Miralles",
keywords = "Multi-objective evolutionary algorithms, Feature selection, Random forest, Regression model, Online sales forecasting"
}
Normal
F. Jiménez, G. Sánchez, J.M. García, G. Sciavicco, L. Miralles,
Multi-objective evolutionary feature selection for online sales forecasting,
Neurocomputing,
Volume 234,
2017,
Pages 75-92,
ISSN 0925-2312,
https://doi.org/10.1016/j.neucom.2016.12.045.
(http://www.sciencedirect.com/science/article/pii/S0925231216315612)
Keywords: Multi-objective evolutionary algorithms; Feature selection; Random forest; Regression model; Online sales forecasting
Summary
- Context: historical sales figures, together with product characteristics and peculiarities, are the basis of sound financial and business plans.
- Goal: an accurate regression model for online sales forecasting.
- Contribution: a novel feature selection methodology built on a multi-objective evolutionary algorithm, ENORA (Evolutionary NOn-dominated Radial slots based Algorithm).
- A wrapper method: the regression model learner is Random Forest.
- Integrates feature selection for regression, model evaluation, and decision making, in order to choose the most satisfactory model through an a posteriori process in a multi-objective context.
Main content
- Error measure: root mean squared error (RMSE)
ENORA (Evolutionary NOn-dominated Radial slots based Algorithm):
- a (μ + λ) survival strategy (an elitist method), with μ = λ = N, where N is the size of the population
- binary tournament selection
- self-adaptive crossover and mutation
- for multi-objective evolutionary optimization, individuals are compared with a rank-crowding-better function
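A minimal Python sketch of the (μ + λ) elitist survival step and the binary tournament listed above. The rank-crowding comparison is abstracted behind a sort `key` and a `better` predicate; all function names here are illustrative, not the paper's code:

```python
import random

def survival_mu_plus_lambda(parents, offspring, n, key):
    """(mu + lambda) elitist survival: parents and offspring compete
    together and the n best individuals (lowest `key`) survive.
    With mu = lambda = N, both input lists have length n."""
    pool = parents + offspring
    return sorted(pool, key=key)[:n]

def binary_tournament(population, better, rng=random):
    """Draw two distinct individuals at random; the one judged
    `better` (e.g. by rank, then crowding distance) wins."""
    a, b = rng.sample(population, 2)
    return a if better(a, b) else b
```

Because the selection pool contains the parents, the best individual found so far can never be lost, which is what makes the strategy elitist.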
- Objective functions: two objectives exist after normalization, the RMSE of the regression model and the cardinality of the selected feature subset.
NSGA-II (Non-dominated Sorting Genetic Algorithm II):
- a (μ + λ) survival strategy
- a binary tournament selection
- a rank-crowding better function
Difference between ENORA and NSGA-II
how the ranking of the individuals in the population is computed:
- ENORA: the non-domination level of the individual in its slot
- NSGA-II: the non-domination level of the individual in the whole population
In the binary tournament, can the dominating individual always win? Can individual C be judged better than B in order to improve diversity?
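The quantity the two algorithms compute differently can be illustrated with a plain non-dominated-sort sketch. NSGA-II computes these levels over the whole population, while ENORA applies the same computation separately inside each radial slot; this is an illustrative sketch, not the paper's implementation:

```python
def dominates(a, b):
    """Pareto dominance for minimization: a is no worse than b in
    every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_domination_levels(points):
    """Non-domination level of each point within the given set
    (level 0 = Pareto front, level 1 = front after removing level 0,
    and so on). For ENORA, `points` would be one radial slot; for
    NSGA-II, the whole population."""
    remaining = dict(enumerate(points))
    levels = {}
    level = 0
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(remaining[j], remaining[i])
                            for j in remaining if j != i)]
        for i in front:
            levels[i] = level
        for i in front:
            del remaining[i]
        level += 1
    return [levels[i] for i in range(len(points))]
```

For example, with objective vectors (1, 2) and (2, 1) neither dominates the other, so both sit on level 0, while (2, 2) is dominated by both and lands on level 1.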
Feature selection
Algorithms:
- supervised
- unsupervised
- semi-supervised
Depends on whether the training set is labeled
Models:
- filter — statistical measures
- wrapper — a search problem
- embedded — model-dependent
Algorithm steps:
- subset generation — greedy hill-climbing approach, sequential forward selection, sequential backward elimination, bi-directional selection, branch and bound, beam search, Las Vegas algorithms, evolutionary algorithms, and particle swarm optimization algorithms.
- subset evaluation — multivariate filter methods (the distance, the uncertainty, the dependence, and the consistency) + wrapper methods (the accuracy)
- stopping criterion
- result validation
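A sketch of the wrapper-style subset evaluation step above: a candidate subset is scored by the cross-validated accuracy (here RMSE) of a learner trained only on the selected features. The paper's learner is Random Forest; a trivial mean predictor stands in below so the example stays self-contained, and all names are illustrative:

```python
import math

def predict_mean(train_X, train_y, test_X):
    """Placeholder learner: always predicts the training-set mean
    (the paper uses Random Forest in this role)."""
    mean = sum(train_y) / len(train_y)
    return [mean] * len(test_X)

def wrapper_rmse(X, y, mask, k=5, learner=predict_mean):
    """Wrapper evaluation of a feature subset: k-fold cross-validated
    RMSE of a learner trained only on columns where mask[j] == 1."""
    cols = [j for j, m in enumerate(mask) if m]
    Xs = [[row[j] for j in cols] for row in X]
    n = len(Xs)
    sq_errs = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        tr_X = [Xs[i] for i in range(n) if i not in test_idx]
        tr_y = [y[i] for i in range(n) if i not in test_idx]
        te_X = [Xs[i] for i in range(n) if i in test_idx]
        te_y = [y[i] for i in range(n) if i in test_idx]
        preds = learner(tr_X, tr_y, te_X)
        sq_errs.extend((p - t) ** 2 for p, t in zip(preds, te_y))
    return math.sqrt(sum(sq_errs) / len(sq_errs))
```

This is what makes wrapper methods a search problem: each candidate mask costs a full train-and-evaluate cycle, which the evolutionary search repeats per individual.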
Many possible goals:
- accuracy
- number of features
- number of instances
- the cardinality and granularity of the subset selection
- the cross-validation accuracy
- the false positive rate
- the false negative rate
- the sensitivity
- the specificity
- measures of consistency, dependency, distance and information
- error identification rate
- undetected identification rate
Algorithm
- Simultaneous optimization of the feature representation and of the crossover and mutation operators used
- Targets to optimize: the root mean squared error and the cardinality of the feature subset
- Each feature's inclusion in the subset is encoded as a Bernoulli random variable
- Goal of the self-adaptation: maintaining diversity in the population while sustaining the convergence capacity of the evolutionary algorithm
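One way to sketch a self-adaptive, Bernoulli-based mutation in the spirit of the notes above: the individual carries its own mutation rate, which is perturbed before the bits are flipped. The log-normal rate update and the bounds are assumptions for illustration, not the paper's exact operator:

```python
import math
import random

def self_adaptive_mutation(bits, rate, tau=0.22, rng=random):
    """Self-adaptive bit-flip mutation: first perturb the individual's
    own mutation rate log-normally, then flip each bit with the new
    rate (one Bernoulli trial per bit). High rates keep diversity up;
    rates shrinking over time let the population converge."""
    new_rate = min(0.5, max(1e-3, rate * math.exp(tau * rng.gauss(0, 1))))
    new_bits = [b ^ (1 if rng.random() < new_rate else 0) for b in bits]
    return new_bits, new_rate
```

Because the rate evolves along with the solution, individuals that mutate too aggressively or too timidly tend to produce worse offspring and are selected out, so the operator tunes itself without a fixed schedule.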
Test
Data set: the Online Product Sales competition from the Kaggle community, which hosts predictive modeling competitions.
Population size equal to 1000, run for 100 generations, i.e. 100,000 evaluations.
10-fold cross-validation.