CAREER: Theory and Application of Small-Sample Error Estimation in Genomic Signal Processing
NSF Proposal #0845407
Title: Theory and Application of Small-Sample Error Estimation in Genomic Signal Processing
P.I.: Ulisses Braga-Neto

Project Abstract

Genomic Signal Processing (GSP) is the engineering discipline that studies modeling and statistical issues related to biological signals measured by high-throughput technology, such as gene-expression microarrays or protein-abundance mass spectrometry. Research in GSP typically involves the discovery of reliable molecular markers for disease diagnosis and prognosis, using pattern recognition or machine learning approaches. Such approaches rely on the accuracy of error estimation for classification and prediction. This accuracy is particularly critical because of the small sample sizes that are common in GSP applications. Novel, robust small-sample error estimation methodologies in GSP are needed to enable reproducible scientific discovery that leads to genuine medical advancement.

The goal of this research is to solve significant computational and statistical problems in small-sample error estimation. The open problems to be addressed are: (1) obtaining exact and approximate representations of the joint sampling distribution of the estimated and true errors for linear continuous classifiers, which will lead to better-performing error estimators and practical tools to assess the significance of results; (2) studying error estimation for discrete classifiers, including the binary coefficient of determination (CoD), using both analytical and complete enumeration approaches; (3) developing the methodology of bolstered error estimation, addressing its application in high-dimensional spaces and with adaptive kernels, with an emphasis on feature selection; and (4) applying these error estimation techniques to the problem of biomarker discovery for diagnosis and prognosis in cancer and infectious diseases, in partnership with medical collaborators at Translational Genomics (TGen), the Johns Hopkins Medical School, and the Oswaldo Cruz Foundation, Brazil (FIOCRUZ).