EFFECTS OF PARTIAL REPORTING OF CLASSIFICATION RESULTS Conference Paper

abstract

  • When proposing a new classification scheme, such as a classification rule or a feature-selection method, modelers in the bioinformatics literature typically report its performance on data sets of interest, such as gene-expression microarrays. These data sets often contain thousands of features but only a small number of sample points, which increases variability in feature selection and error estimation and makes reported performances highly imprecise. This suggests that, if only the best results are reported, the reported performance of the proposed scheme will be less correlated with the actual performance and strongly biased relative to it. This paper confirms this through a large simulation study on both modeled and real data, showing how the joint distribution of the minimum reported estimated error and the corresponding true error behaves as a function of the number of samples tested. © 2010 IEEE.
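
The selection effect described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' experimental setup: the uniform range of true errors, the binomial error estimator, the sample size of 30, and the names `min_reported_error`, `mean_bias`, and `num_tested` are all assumptions chosen for illustration. It draws several candidate results, reports only the one with the smallest estimated error, and compares that estimate with the true error of the same candidate.

```python
import random
import statistics


def min_reported_error(num_tested, sample_size, rng):
    """Return (smallest estimated error, true error of that same candidate)."""
    best = None
    for _ in range(num_tested):
        # Assumption: true errors are spread uniformly between 0.2 and 0.4.
        true_err = rng.uniform(0.2, 0.4)
        # Assumption: the estimate is the misclassified fraction of
        # sample_size independent test points (a simple binomial estimator).
        mistakes = sum(rng.random() < true_err for _ in range(sample_size))
        est_err = mistakes / sample_size
        if best is None or est_err < best[0]:
            best = (est_err, true_err)
    return best


def mean_bias(num_tested, sample_size=30, trials=2000, seed=0):
    """Average of (reported estimate - true error) when only the minimum is reported."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        est_err, true_err = min_reported_error(num_tested, sample_size, rng)
        diffs.append(est_err - true_err)
    return statistics.mean(diffs)


if __name__ == "__main__":
    for m in (1, 2, 5, 10, 20):
        print(f"candidates tested: {m:2d}  mean bias: {mean_bias(m):+.3f}")
```

Under these assumptions the mean bias is close to zero when a single result is reported and becomes increasingly negative (optimistic) as more candidates are tried and only the best one is kept.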

name of conference

  • 2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)

published proceedings

  • 2010 IEEE INTERNATIONAL WORKSHOP ON GENOMIC SIGNAL PROCESSING AND STATISTICS (GENSIPS)

author list (cited authors)

  • Yousefi, M. R., Hua, J., Chao, S., & Dougherty, E. R.

citation count

  • 0

complete list of authors

  • Yousefi, Mohammadmahdi R; Hua, Jianping; Chao, Sima; Dougherty, Edward R

publication date

  • November 2010