Determining relevant features to recognize electron density patterns in X-ray protein crystallography.
High-throughput computational methods in X-ray protein crystallography are indispensable for meeting the goals of structural genomics. In particular, automated interpretation of electron density maps, especially those at mediocre resolution, can significantly speed up protein structure determination. TEXTAL(TM) is a software application that uses pattern recognition, case-based reasoning, and nearest-neighbor learning to produce reasonably refined molecular models, even from average-quality data. In this work, we discuss a key issue in enabling fast and accurate interpretation of typically noisy electron density data: which features should be used to characterize the density patterns, and how relevant is each of them? We discuss the challenges of constructing features in this domain and describe SLIDER, an algorithm that determines the weights of these features. SLIDER searches a space of weights, using the ranking of matching patterns (relative to mismatching ones) as its evaluation function. Because exhaustive search is intractable, SLIDER adopts a greedy approach that restricts the search to those weight values at which the ranking of good matches changes. We show that SLIDER contributes significantly to finding the similarity between density patterns, and we discuss the sensitivity of feature relevance to the underlying similarity metric.
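The idea of restricting a weight search to values where the ranking changes can be illustrated with a minimal sketch. It is not the published SLIDER implementation: the weighted squared-Euclidean distance, the one-feature-at-a-time sweep, and all names (`match_rank`, `crossover_weights`, `greedy_weights`, the `cases` layout) are assumptions made here for illustration. Each case is a hypothetical triple of a query feature vector, one known good match, and an array of mismatches.

```python
import numpy as np

def match_rank(w, query, match, mismatches):
    """Count the mismatches that lie closer to the query than the good match."""
    d_match = np.sum(w * (query - match) ** 2)
    d_mis = np.sum(w * (query - mismatches) ** 2, axis=1)
    return int(np.sum(d_mis < d_match))

def total_rank(w, cases):
    """Evaluation function: summed rank of the good match over all cases (lower is better)."""
    return sum(match_rank(w, q, m, X) for q, m, X in cases)

def crossover_weights(w, k, cases):
    """Positive values of w[k] at which some mismatch ties with the good match,
    holding all other weights fixed. The ranking can only change at these values."""
    points = []
    for q, m, X in cases:
        dm = (q - m) ** 2              # per-feature squared differences, match
        dx = (q - X) ** 2              # per-feature squared differences, mismatches
        w0 = w.copy()
        w0[k] = 0.0
        a_m = np.sum(w0 * dm)          # match distance without feature k
        a_x = dx @ w0                  # mismatch distances without feature k
        slope = dm[k] - dx[:, k]
        ok = slope != 0                # parallel lines never cross
        c = (a_x[ok] - a_m) / slope[ok]
        points.extend(c[c > 0].tolist())
    return sorted(set(points))

def greedy_weights(cases, n_features, max_sweeps=10):
    """Greedy search: for each feature in turn, try one weight value inside every
    interval between consecutive crossover points and keep any strict improvement."""
    w = np.ones(n_features)
    best = total_rank(w, cases)
    for _ in range(max_sweeps):
        improved = False
        for k in range(n_features):
            pts = [0.0] + crossover_weights(w, k, cases)
            candidates = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
            candidates.append(pts[-1] + 1.0)   # region beyond the last crossover
            for c in candidates:
                trial = w.copy()
                trial[k] = c
                score = total_rank(trial, cases)
                if score < best:
                    best, w, improved = score, trial, True
        if not improved:
            break
    return w, best

# Toy data (made up): feature 0 separates the match from the mismatches, feature 1 misleads.
query = np.array([0.0, 0.0])
match = np.array([0.1, 5.0])
mismatches = np.array([[3.0, 0.5], [4.0, 1.0]])
weights, score = greedy_weights([(query, match, mismatches)], n_features=2)
print(weights, score)
```

Because the rank of the good match is piecewise constant in each weight, testing a single value per interval between crossover points covers every distinct ranking reachable by changing that one weight, which is what keeps this kind of search tractable compared with scanning a fine grid.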