Quality-Based Distance Measures and Applications to Clustering Conference Paper

abstract

  • When analyzing biological data sets, a common approach is to partition the data into clusters. Examples include finding a subset of genes with co-regulated expression across experiments, grouping similar disease phenotypes, and implicating regions of genetic variation in disease. The ability to separate the data into subsets depends on the structure of the distribution of points and on the choice of clustering algorithm. Furthermore, the biological relevance of the clustering results is biased by the variation among the data points themselves. We introduce a mathematical quality-based distance metric that allows all data, regardless of error, to be included in the analysis without introducing a cutoff. This removes the need to exclude points or to change the dimensionality. The advantage of this approach is shown by clustering simulated data with added noise. © 2006 IEEE.

name of conference

  • 2006 IEEE/NLM Life Science Systems and Applications Workshop

published proceedings

  • 2006 IEEE/NLM Life Science Systems and Applications Workshop

author list (cited authors)

  • Taverna, D. M., Brun, M., Dougherty, E. R., & Chen, Y.

citation count

  • 0

complete list of authors

  • Taverna, Darin M; Brun, Marcel; Dougherty, Edward R; Chen, Yidong

publication date

  • July 2006