Design Issues and Comparison of Methods for Microarray-Based Classification
- Additional Document Info
- View All
© 2006 Springer Science+Business Media, Inc. All rights reserved. 9. Conclusion: Except in situations where the amount of data is large in comparison to the number of variables, classifier design and error estimation involve subtle issues. This is especially so in applications such as cancer classification where there is no prior knowledge concerning the vector-label distributions involved. It is clearly prudent to try to achieve classification using small numbers of genes and rules of low complexity (low VC dimension), and to use cross-validation when it is not possible to obtain large independent samples for testing. Even when one uses a cross-validation method such as leave-one-out estimation, one is still confronted by the high variance of the estimator. In many applications, large samples are impossible owing to either cost or availability. Therefore, it is unlikely that a statistical approach alone will provide satisfactory results. Rather, one can use the results of classification analysis to discover gene sets that potentially provide good discrimination, and then focus attention on these. In the same vein, one can utilize the common engineering approach of integrating data with human knowledge to arrive at satisfactory systems.
author list (cited authors)
Dougherty, E. R., & Attoor, S. N.
Computational and Statistical Approaches to Genomics