Citation
LaTeX
@article{GUMUS201323,
title = "Multi objective SNP selection using pareto optimality",
journal = "Computational Biology and Chemistry",
volume = "43",
pages = "23 - 28",
year = "2013",
issn = "1476-9271",
doi = "https://doi.org/10.1016/j.compbiolchem.2012.12.006",
url = "http://www.sciencedirect.com/science/article/pii/S1476927112001156",
author = "Ergun Gumus and Zeliha Gormez and Olcay Kursun",
keywords = "Feature selection, Principal component analysis (PCA), Mutual information (MI), Genomic–geographical distance, Human Genome Diversity Project SNP dataset"
}
Plain text
Ergun Gumus, Zeliha Gormez, Olcay Kursun,
Multi objective SNP selection using pareto optimality,
Computational Biology and Chemistry,
Volume 43,
2013,
Pages 23-28,
ISSN 1476-9271,
https://doi.org/10.1016/j.compbiolchem.2012.12.006.
(http://www.sciencedirect.com/science/article/pii/S1476927112001156)
Keywords: Feature selection; Principal component analysis (PCA); Mutual information (MI); Genomic–geographical distance; Human Genome Diversity Project SNP dataset
Summary
Biomarker discovery
SNP: single nucleotide polymorphism
Traditional approach: a single objective, maximizing classification accuracy
This paper: two objectives, balanced through Pareto optimality
1 High classification accuracy
2 Correlation between genetic diversity of ethnic groups and geographic distance
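To make the idea of balancing two objectives concrete, here is a minimal sketch of picking the Pareto-optimal (non-dominated) candidates. It assumes each candidate already has one score per objective and that larger is better for both. The function name `pareto_front` and the toy scores are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def pareto_front(scores):
    """Return indices of non-dominated rows (larger is better in every column)."""
    scores = np.asarray(scores, dtype=float)
    keep = []
    for i, s in enumerate(scores):
        # Row s is dominated if another row is >= in every objective
        # and strictly > in at least one objective.
        at_least_as_good = np.all(scores >= s, axis=1)
        strictly_better = np.any(scores > s, axis=1)
        if not np.any(at_least_as_good & strictly_better):
            keep.append(i)
    return keep

# Hypothetical scores: column 0 = objective 1, column 1 = objective 2.
toy_scores = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.4, 0.4], [0.9, 0.05]]
front = pareto_front(toy_scores)
```

A candidate stays on the front as long as no other candidate beats it on one objective without losing on the other, so no weighting between the two goals has to be chosen in advance.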
Main content
Dataset:
Human Genome Diversity Project (HGDP) SNP dataset
1064 individuals
52 populations
Raw data:
1043 individuals
660,918 SNPs per individual; the 163 SNPs from mitochondrial DNA are excluded, leaving 660,755
Each SNP has 2 alleles; the genotype code is expressed as:
Goal one:
High classification accuracy, scored with mutual information (MI)
MI is built on the entropy of a random variable
Goal two:
Genomic–geographic correlation, scored with principal component analysis (PCA)
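As a rough illustration of the first goal, the mutual information between one SNP and the population label can be computed from entropies with the standard identity I(X;Y) = H(X) + H(Y) - H(X,Y). The function names, the integer genotype coding, and the toy data below are assumptions made for this sketch; the notes do not record how the paper estimates MI.

```python
import numpy as np

def entropy_from_counts(counts):
    """Shannon entropy in bits of the empirical distribution given by counts."""
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_information(x, y):
    """Empirical I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete samples."""
    x = np.asarray(x)
    y = np.asarray(y)
    _, cx = np.unique(x, return_counts=True)
    _, cy = np.unique(y, return_counts=True)
    # Count each distinct (x, y) pair to get the joint distribution.
    pairs = np.stack([x, y], axis=1)
    _, cxy = np.unique(pairs, axis=0, return_counts=True)
    return (entropy_from_counts(cx) + entropy_from_counts(cy)
            - entropy_from_counts(cxy))

# Hypothetical toy data: 4 individuals, 2 populations, integer-coded genotypes.
population = [0, 0, 1, 1]
informative_snp = [0, 0, 1, 1]    # genotype tracks the population exactly
uninformative_snp = [0, 1, 0, 1]  # genotype is independent of the population
mi_high = mutual_information(informative_snp, population)
mi_low = mutual_information(uninformative_snp, population)
```

A SNP whose genotype tracks the population label gets a high score, while a SNP that is independent of the label scores zero, which is why MI can rank SNPs by how class-discriminative they are.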
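Because the dimensionality is very high, a "dimensionality trick" is used for PCA:
The covariance matrix of the centered data matrix has one row and one column per SNP, which is too large to decompose directly
The small matrix formed by multiplying the centered data matrix with its transpose has only one row and one column per individual, so its eigenvectors are computed instead
Multiplying both sides of that eigenvalue equation by the transpose of the centered data matrix turns each eigenvector of the small matrix into an eigenvector of the covariance matrix, with the same eigenvalue up to the normalizing constant
Applying this to the leading eigenvectors gives the principal components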
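The dimensionality trick can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: rows of `X` are individuals, columns are SNPs, both matrices are normalized by the number of individuals `n`, and the function name `pca_gram_trick` is hypothetical. The notation and normalization used in the paper were lost from these notes.

```python
import numpy as np

def pca_gram_trick(X, k):
    """Top-k principal directions of X (n samples x d features, d >> n).

    Decomposes the n x n matrix Xc @ Xc.T / n instead of the
    d x d covariance matrix Xc.T @ Xc / n.
    k must not exceed the rank of the centered data (at most n - 1).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)          # centered data matrix
    G = Xc @ Xc.T / n                # small n x n matrix
    evals, V = np.linalg.eigh(G)     # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]
    evals, V = evals[order], V[:, order]
    # If G v = lam v, multiplying both sides by Xc.T gives
    # (Xc.T @ Xc / n) (Xc.T v) = lam (Xc.T v),
    # so Xc.T v is an eigenvector of the covariance matrix.
    U = Xc.T @ V
    U /= np.linalg.norm(U, axis=0)   # rescale to unit length
    return evals, U

# Hypothetical toy data: 20 individuals, 500 features.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(20, 500))
top_evals, top_dirs = pca_gram_trick(X_toy, k=3)
```

With 1,043 individuals and 660,755 SNPs, the small matrix has roughly 1,043 rows and columns, whereas the full covariance matrix would have 660,755, which is the motivation for working on the small side.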