Title
EPS: automated feature selection in case-control studies using extreme pseudo-sampling
Abstract
Finding informative predictive features in high-dimensional biological case-control datasets is challenging. The Extreme Pseudo-Sampling (EPS) algorithm offers a solution to the challenge of feature selection via a combination of deep learning and linear regression models. First, using a variational autoencoder, it generates complex latent representations for the samples. Second, it classifies the latent representations of cases and controls via logistic regression. Third, it generates new samples (pseudo-samples) around the extreme cases and controls in the regression model. Finally, it trains a new regression model over the upsampled space. The most significant variables in this regression are selected. We present an open-source implementation of the algorithm that is easy to set up, use and customize. Our package enhances the original algorithm by providing new features and customizability for data preparation, model training and classification functionalities. We believe the new features will enable the adoption of the algorithm for a diverse range of datasets.
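The four steps named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the package's API and not the authors' implementation. The variational autoencoder is replaced by an identity encoding (so the "latent" space is just the input features), pseudo-samples are drawn as Gaussian perturbations around the extremes, and "most significant" is approximated by the largest absolute coefficient rather than a significance test. All names (`eps_sketch`, `encode`, `fit_logistic`, `make_toy_data`) and parameters (`n_extreme`, `n_pseudo`, `noise`, `n_select`) are hypothetical.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Batch gradient-descent logistic regression; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(xi, w, b) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def encode(x):
    # ASSUMPTION: identity stand-in for the VAE encoder named in the abstract.
    return list(x)

def eps_sketch(X, y, n_extreme=20, n_pseudo=10, noise=0.1, n_select=1, seed=0):
    rng = random.Random(seed)
    # Step 1: latent representations (identity here, a VAE in the abstract).
    Z = [encode(x) for x in X]
    # Step 2: logistic regression separating cases (1) from controls (0).
    w, b = fit_logistic(Z, y)
    scored = sorted(((predict(z, w, b), z, lab) for z, lab in zip(Z, y)),
                    key=lambda t: t[0])
    # Extremes: controls with the lowest and cases with the highest probability.
    controls = [z for _, z, lab in scored if lab == 0][:n_extreme]
    cases = [z for _, z, lab in scored if lab == 1][-n_extreme:]
    # Step 3: pseudo-samples as Gaussian perturbations around the extremes
    # (ASSUMPTION: the abstract does not specify the sampling scheme).
    pseudo_X, pseudo_y = [], []
    for group, lab in ((controls, 0), (cases, 1)):
        for z in group:
            for _ in range(n_pseudo):
                pseudo_X.append([v + rng.gauss(0.0, noise) for v in z])
                pseudo_y.append(lab)
    # Step 4: new regression on the upsampled space; rank features by
    # |coefficient| (ASSUMPTION: a proxy for statistical significance).
    w2, _ = fit_logistic(pseudo_X, pseudo_y)
    ranked = sorted(range(len(w2)), key=lambda j: abs(w2[j]), reverse=True)
    return ranked[:n_select], pseudo_X, pseudo_y

def make_toy_data(n=200, d=5, seed=1):
    """Synthetic case-control data in which only feature 0 carries signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        lab = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(d)]
        row[0] += 1.5 if lab else -1.5
        X.append(row)
        y.append(lab)
    return X, y

X, y = make_toy_data()
selected, pseudo_X, pseudo_y = eps_sketch(X, y)
print("selected feature indices:", selected)
```

On this toy data the only informative feature is index 0, so the ranking should place it first. Substituting a trained autoencoder for `encode` (plus a decoder to map pseudo-samples back to feature space) would bring the sketch closer to the pipeline the abstract describes.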
Year
2021
DOI
10.1093/bioinformatics/btab214
Venue
BIOINFORMATICS
DocType
Journal
Volume
37
Issue
19
ISSN
1367-4803
Citations
0
PageRank
0.34
References
0
Authors
4
Name                Order  Citations  PageRank
Ruhollah Shemirani  1      0          0.34
Stephane Wenric     2      0          0.34
Eimear Kenny        3      0          0.34
José Luis Ambite    4      958110.89