A probabilistic theory of clustering - Texas A&M University (TAMU) Scholar

abstract

Data clustering is typically considered a subjective process, which makes it problematic. For instance, how does one make statistical inferences based on clustering? The matter is different with pattern classification, for which two fundamental characteristics can be stated: (1) the error of a classifier can be estimated using "test data," and (2) a classifier can be learned using "training data." This paper presents a probabilistic theory of clustering, including both learning (training) and error estimation (testing). The theory is based on operators on random labeled point processes. It includes an error criterion in the context of random point sets and representation of the Bayes (optimal) cluster operator for a given random labeled point process. Training is illustrated using a nearest-neighbor approach, and trained cluster operators are compared to several classical clustering algorithms. © 2003 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
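The abstract refers to an error criterion for cluster operators on random labeled point sets and to estimating that error from test data. The minimal Python sketch below illustrates one common way to make this concrete, under stated assumptions. It assumes the error of a clustering on a single labeled point set is the fraction of misassigned points, minimized over relabelings of the clusters, and that the expected error is estimated by averaging over independently sampled point sets. The point process (two 1-D Gaussian classes), the mean-threshold cluster operator, and all function names are hypothetical illustrations, not the paper's own definitions or its nearest-neighbor training method.

```python
import random
from itertools import permutations


def partition_error(true_labels, pred_labels, k):
    """Fraction of misassigned points, minimized over all relabelings of
    the predicted clusters. Assumes labels are integers in range(k)."""
    n = len(true_labels)
    if n == 0 or n != len(pred_labels):
        raise ValueError("label sequences must be non-empty and equal in length")
    best = n
    for perm in permutations(range(k)):
        # perm maps each predicted cluster index to a candidate true label
        mismatches = sum(1 for t, p in zip(true_labels, pred_labels) if perm[p] != t)
        best = min(best, mismatches)
    return best / n


def sample_labeled_point_set(n, rng):
    """Hypothetical random labeled point process: each label is 0 or 1 with
    equal probability; points are N(0, 1) for label 0 and N(3, 1) for label 1."""
    labels = [rng.randint(0, 1) for _ in range(n)]
    points = [rng.gauss(3.0 * y, 1.0) for y in labels]
    return points, labels


def threshold_cluster_operator(points):
    """Hypothetical cluster operator: split the point set at its sample mean."""
    m = sum(points) / len(points)
    return [1 if x > m else 0 for x in points]


def estimate_expected_error(operator, n_points=50, n_trials=200, seed=0):
    """Monte Carlo estimate of the operator's expected partition error,
    averaged over independently sampled labeled point sets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        points, labels = sample_labeled_point_set(n_points, rng)
        total += partition_error(labels, operator(points), k=2)
    return total / n_trials


if __name__ == "__main__":
    print(partition_error([0, 0, 1, 1], [1, 1, 0, 0], k=2))  # 0.0: same partition, swapped names
    print(estimate_expected_error(threshold_cluster_operator))
```

Minimizing over label permutations makes the criterion depend only on how the points are partitioned, not on which arbitrary names the operator gives the clusters. Enumerating all k! permutations is only practical for small k.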

authors

Dougherty, Edward

published proceedings

Pattern Recognition

author list (cited authors)

Dougherty, E. R., & Brun, M.

citation count

42

complete list of authors

Dougherty, Edward R.; Brun, Marcel

publication date

May 2004

publisher

Elsevier

published in

Pattern Recognition Journal

Digital Object Identifier (DOI)

10.1016/j.patcog.2003.10.003

start page

917

end page

925

volume

37

issue

5

URL

http://dx.doi.org/10.1016/j.patcog.2003.10.003
