Quality-Based Distance Measures and Applications to Clustering
Conference Paper
Overview
abstract
When analyzing biological data sets, a common approach is to partition the data into clusters. Examples include finding a subset of genes with co-regulated expression across experiments, grouping similar disease phenotypes, and implicating regions of genetic variation in disease. The ability to separate the data into subsets depends on the structure of the distribution of points and on the choice of clustering algorithm. Furthermore, the biological relevance of the clustering results is biased by the variation among the data points themselves. We introduce a mathematical quality-based distance metric that allows all data, regardless of error, to be included in the analysis without introducing a cutoff. This removes the need to exclude points or to change the dimensionality. The advantage of this approach is shown by clustering simulated data with added noise. © 2006 IEEE.
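The abstract does not give the metric's definition, so the following is only an illustrative sketch of the general idea, not the paper's formula. It assumes each coordinate carries a per-measurement error estimate and uses a hypothetical weight of 1 / (1 + combined variance), so that noisy coordinates contribute less to the distance rather than being discarded by a cutoff. The function name and the weighting scheme are assumptions made for this example.

```python
import math


def quality_weighted_distance(x, y, x_err, y_err):
    """Euclidean-style distance with per-coordinate quality weights.

    Hypothetical weighting (an assumption, not taken from the paper):
    each squared difference is scaled by 1 / (1 + sx^2 + sy^2), where
    sx and sy are the error estimates of the two measurements.
    """
    if not (len(x) == len(y) == len(x_err) == len(y_err)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for xi, yi, sx, sy in zip(x, y, x_err, y_err):
        combined_variance = sx * sx + sy * sy
        weight = 1.0 / (1.0 + combined_variance)  # 1.0 when error is zero
        total += weight * (xi - yi) ** 2
    return math.sqrt(total)


# Same two points, first with error-free measurements, then with a
# very noisy second coordinate.
clean = quality_weighted_distance([0.0, 0.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0])
noisy = quality_weighted_distance([0.0, 0.0], [3.0, 4.0], [0.0, 10.0], [0.0, 10.0])
```

With zero error the sketch reduces to the ordinary Euclidean distance. When the second coordinate is noisy, the distance is driven mostly by the reliable first coordinate, yet the noisy value is still included and the dimensionality is unchanged.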
name of conference
2006 IEEE/NLM Life Science Systems and Applications Workshop