Evaluation of geometric & probabilistic distance measures to retrieve electron density patterns for protein structure determination
Similarity between cases in pattern recognition is typically measured by computing distances between feature vectors. This paper evaluates the effectiveness of various measures of similarity in retrieving good matches in TEXTAL, a system that uses nearest neighbor learning to retrieve matching 3D patterns of electron density to incrementally determine the structure of proteins by X-ray crystallography. We investigate various geometric measures of similarity, including Euclidean, Manhattan (city-block, or L1), the generalized Minkowski metric (Lm) and the Cosine measure. We also experiment with a probabilistic distance metric, namely a likelihood measure based on the Bayesian classifier. Our experiments in the protein crystallography domain show that the probabilistic measure of similarity significantly outperforms the geometric ones. We present a general framework for efficient pattern retrieval from a large database using feature-based matching, and argue that probabilistic and statistical measures of similarity are more robust in noisy, high-dimensional feature spaces representing visual patterns.
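The measures named above can be illustrated with a minimal Python sketch of feature-based nearest neighbor retrieval. It is not TEXTAL's implementation: the function names, the toy feature vectors, and in particular the form of the probabilistic score are assumptions made for illustration. The abstract does not specify the likelihood measure, so the sketch assumes one plausible form, a log-likelihood ratio over per-feature absolute differences modeled as independent Gaussians for "match" and "non-match" pairs.

```python
import math


def minkowski(a, b, m):
    """Generalized Minkowski distance L_m (m=1: Manhattan, m=2: Euclidean)."""
    return sum(abs(x - y) ** m for x, y in zip(a, b)) ** (1.0 / m)


def manhattan(a, b):
    return minkowski(a, b, 1)


def euclidean(a, b):
    return minkowski(a, b, 2)


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def gaussian_logpdf(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def log_likelihood_ratio(a, b, match_stats, nonmatch_stats):
    """Hypothetical probabilistic score (an assumption, not the paper's formula).

    match_stats and nonmatch_stats hold one (mean, std) pair per feature,
    describing the absolute feature difference for matching and non-matching
    pairs. Features are treated as independent. Higher means more similar.
    """
    score = 0.0
    for x, y, (mu_m, sd_m), (mu_n, sd_n) in zip(a, b, match_stats, nonmatch_stats):
        d = abs(x - y)
        score += gaussian_logpdf(d, mu_m, sd_m) - gaussian_logpdf(d, mu_n, sd_n)
    return score


def nearest(query, database, distance):
    """Index of the database vector with the smallest distance to the query."""
    return min(range(len(database)), key=lambda i: distance(query, database[i]))


# Toy feature vectors (illustrative values only, not real electron density features).
database = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.5], [3.0, 1.0, 0.0]]
query = [1.0, 0.2, 1.8]
best = nearest(query, database, euclidean)
```

To retrieve with a similarity score such as the likelihood ratio, where larger is better, the same `nearest` helper can be given the negated score as its distance function.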