Multi-objective SNP selection based on Pareto optimality

Citation

BibTeX:

@article{GUMUS201323,
  title = "Multi objective SNP selection using pareto optimality",
  journal = "Computational Biology and Chemistry",
  volume = "43",
  pages = "23 - 28",
  year = "2013",
  issn = "1476-9271",
  doi = "https://doi.org/10.1016/j.compbiolchem.2012.12.006",
  url = "http://www.sciencedirect.com/science/article/pii/S1476927112001156",
  author = "Ergun Gumus and Zeliha Gormez and Olcay Kursun",
  keywords = "Feature selection, Principal component analysis (PCA), Mutual information (MI), Genomic–geographical distance, Human Genome Diversity Project SNP dataset"
}

Plain text:

Ergun Gumus, Zeliha Gormez, Olcay Kursun,
Multi objective SNP selection using pareto optimality,
Computational Biology and Chemistry,
Volume 43,
2013,
Pages 23-28,
ISSN 1476-9271,
https://doi.org/10.1016/j.compbiolchem.2012.12.006.
(http://www.sciencedirect.com/science/article/pii/S1476927112001156)
Keywords: Feature selection; Principal component analysis (PCA); Mutual information (MI); Genomic–geographical distance; Human Genome Diversity Project SNP dataset


Summary

Biomarker discovery

SNP — single nucleotide polymorphism

Traditional approaches optimize a single objective: maximizing classification accuracy. This paper instead selects SNPs by Pareto optimality over two objectives (a selection sketch follows this list):

1. High classification accuracy
2. Correlation between the genetic diversity of ethnic groups and their geographic distance
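Pareto optimality here means keeping the SNPs that no other SNP beats on both objectives at once. Below is a minimal sketch, assuming each SNP already carries two scores to maximize; the arrays `mi` and `geo` and the toy data are illustrative, not the paper's implementation.

```python
# Minimal sketch of Pareto (non-dominated) SNP selection over two objectives.
# Assumption: mi[i] and geo[i] are precomputed scores for SNP i, both "larger is better".
import numpy as np

def pareto_front(mi, geo):
    """Return indices of SNPs that are not dominated by any other SNP."""
    scores = np.column_stack([mi, geo])           # shape (n_snps, 2)
    keep = np.ones(len(scores), dtype=bool)
    for i in range(len(scores)):
        # SNP j dominates SNP i if j is >= on both objectives and > on at least one.
        dominated_by = np.all(scores >= scores[i], axis=1) & np.any(scores > scores[i], axis=1)
        if dominated_by.any():
            keep[i] = False
    return np.flatnonzero(keep)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mi, geo = rng.random(1000), rng.random(1000)  # toy scores
    print("SNPs on the first Pareto front:", pareto_front(mi, geo))
```

If the first front yields fewer SNPs than needed, successive fronts can be peeled from the remaining candidates; that is a common extension, not necessarily the paper's exact procedure.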

Main content

Dataset:
Human Genome Diversity Project (HGDP) SNP dataset: 1064 individuals, 52 populations
Raw data used: 1043 individuals
Per individual: 660,918 SNPs; the 163 mitochondrial-DNA SNPs are excluded, leaving 660,755
Each SNP is biallelic; genotypes are coded as {-1, 0, 1} (see the encoding sketch below)
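A small encoding sketch, assuming the common biallelic convention: one homozygote maps to -1, the heterozygote to 0, and the other homozygote to +1. The allele labels 'A'/'B' and this exact mapping are assumptions for illustration, not taken from the paper.

```python
# Sketch: code biallelic genotypes as {-1, 0, +1} (the mapping is an assumed convention).
import numpy as np

CODE = {"AA": -1, "AB": 0, "BA": 0, "BB": 1}

def encode(genotypes):
    """Map genotype strings such as 'AA', 'AB', 'BB' to numeric codes."""
    return np.array([CODE[g] for g in genotypes], dtype=np.int8)

print(encode(["AA", "AB", "BB", "BA"]))  # [-1  0  1  0]
```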

Goal one:

High classification accuracy, scored with mutual information (MI)

I(X; C) = H(X) + H(C) - H(X, C)

where H is the entropy of a random variable:

H(X) = - Σ_x p(x) log p(x)
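A short sketch of the MI score for a single SNP, using the plug-in entropies from the formulas above; the base-2 logarithm and the variable names are illustrative choices, not the paper's code.

```python
# Sketch: I(X; C) = H(X) + H(C) - H(X, C) between one coded SNP column X
# and the population label C, using empirical (plug-in) probabilities.
import numpy as np

def entropy(values):
    """H(V) = -sum_v p(v) log2 p(v), with p estimated from counts."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, c):
    joint = [f"{xi}|{ci}" for xi, ci in zip(x, c)]  # joint variable (X, C)
    return entropy(x) + entropy(c) - entropy(joint)

x = np.array([-1, -1, 0, 1, 1, 0])                            # coded genotypes for one SNP
c = ["Europe", "Europe", "Asia", "Africa", "Africa", "Asia"]  # population labels
print(mutual_information(x, c))
```

SNPs with higher MI against the population labels are the ones favored by the first objective.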

Goal two:

Correlation between genomic and geographic distances, based on principal component analysis (PCA)

Because the number of SNPs (D) far exceeds the number of samples (N), a "dimensionality trick" is used for PCA:

C = (1/N) A^T A is the D × D covariance matrix, where A is the N × D centered data matrix and N ≪ D.

Since D is far too large to eigendecompose C directly, start from the much smaller N × N matrix:

(1/N) A A^T k_i = λ_i k_i

where k_i is the i-th eigenvector of (1/N) A A^T. Multiplying both sides on the left by A^T gives:

(1/N) A^T A (A^T k_i) = λ_i (A^T k_i)

This yields:

v_i = A^T k_i is an eigenvector of the covariance matrix C = (1/N) A^T A with the same eigenvalue λ_i, so the leading principal components are obtained without ever forming the D × D matrix.
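A sketch of the trick in the notation above: take the eigenvectors of the small N × N matrix, lift them with A^T, and verify that they satisfy the covariance eigen-equation. The matrix sizes and random data are only illustrative.

```python
# Sketch of the PCA "dimensionality trick" for N << D.
# A is the N x D centered data matrix; the D x D covariance is never formed.
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 5000                       # few samples, many SNPs (toy sizes)
X = rng.normal(size=(N, D))
A = X - X.mean(axis=0)                 # center each column

small = (A @ A.T) / N                  # N x N matrix instead of D x D
eigvals, K = np.linalg.eigh(small)     # columns of K are the eigenvectors k_i

V = A.T @ K                            # lift: v_i = A^T k_i, shape (D, N)
V /= np.linalg.norm(V, axis=0, keepdims=True)  # normalize to unit length

# Check the largest eigenpair: C v = lambda v with C = (1/N) A^T A,
# computing C v as A^T (A v) / N so the D x D matrix is never built.
v, lam = V[:, -1], eigvals[-1]
print(np.allclose(A.T @ (A @ v) / N, lam * v))  # True
```

Projecting individuals onto the leading v_i then gives the low-dimensional genomic coordinates that the second objective compares against geographic distance.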
