Title
Identification of phenotype-relevant differentially expressed genes in breast cancer demonstrates enhanced quantile discretization protocol's utility in multi-platform microarray data integration.
Abstract
Microarray transcriptomics experiments often suffer from limited statistical power due to small sample sizes. Quantile discretization (QD) maps the expression values of a sample into a series of equally sized 'bins' that represent a discrete numerical range, e.g. -4 to +4, which enables normalized data from multiple experiments and/or expression platforms to be combined for re-analysis. We found, however, that informal selection of the bin number often resulted in loss of the underlying correlation structure in the data, because genes that are in reality expressed at significantly different levels within a sample were assigned the same numerical value. Here we report a procedure for determining the optimal bin number for a dataset. Applying this procedure to integrated public breast cancer datasets enabled statistical identification of several differentially expressed tumorigenesis-related genes that were not found when analyzing the individual datasets, as well as several cancer biomarkers not previously indicated as having utility in the disease. Notably, differential modulation of translational control and protein synthesis via multiple pathways was found to potentially play central roles in breast cancer development and progression. These findings suggest that our protocol has significant utility for making meaningful novel biomedical discoveries by leveraging large public expression data repositories.
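The abstract does not spell out the binning rule or the bin-number criterion, so the following is only a minimal Python sketch under stated assumptions. `quantile_discretize` ranks one sample's values and splits the ranks into an odd number of nearly equally populated bins labelled symmetrically around zero (9 bins give -4 to +4). `choose_n_bins` is a HYPOTHETICAL stand-in for the paper's procedure: it picks the smallest candidate bin count whose mean per-sample Pearson correlation between raw and discretized values reaches a threshold. All function names, the candidate list and the 0.95 threshold are illustrative assumptions, not the authors' published method.

```python
import math


def quantile_discretize(values: list[float], n_bins: int = 9) -> list[int]:
    """Map one sample's expression values onto n_bins (nearly) equally
    populated bins labelled symmetrically around zero, e.g. 9 -> -4..+4."""
    if n_bins < 1 or n_bins % 2 == 0:
        raise ValueError("n_bins must be a positive odd integer")
    n = len(values)
    half = n_bins // 2
    # Rank genes by expression; the stable sort keeps tied values in their
    # original order, so equal values can straddle a bin boundary.
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        # rank * n_bins // n splits the ranks into n_bins groups whose
        # sizes differ by at most one gene.
        labels[idx] = rank * n_bins // n - half
    return labels


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; both inputs must be non-constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def choose_n_bins(samples, candidates=(3, 5, 7, 9, 11), threshold=0.95):
    """HYPOTHETICAL criterion (not the authors' procedure): return the
    smallest candidate bin count whose mean per-sample correlation between
    raw and discretized values reaches `threshold`, else the largest one."""
    for k in sorted(candidates):
        retained = [pearson(s, quantile_discretize(s, k)) for s in samples]
        if sum(retained) / len(retained) >= threshold:
            return k
    return max(candidates)
```

For example, `quantile_discretize([5.0, 1.0, 3.0], n_bins=3)` returns `[1, -1, 0]`. The per-sample correlation with raw values is only a proxy for the concern raised in the abstract; a criterion closer to it might instead compare gene-gene correlation matrices computed before and after discretization.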
Year: 2016
DOI: 10.1142/S0219720016500220
Venue: JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY
Keywords: Microarray, cross-platform integration, differential expression, quantile discretization, statistical analysis
Field: Data mining, Normalization (statistics), Breast cancer, Biology, Correlation, Quantile, Microarray analysis techniques, Bioinformatics, Cancer biomarkers, Statistical power, Sample size determination
DocType: Journal
Volume: 14
Issue: SP5
ISSN: 0219-7200
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name | Order | Citations | PageRank
Darlington S. Mapiye | 1 | 0 | 0.34
Alan Christoffels | 2 | 100 | 10.81
Junaid Gamieldien | 3 | 8 | 1.83