Adapting indexing trees to data distribution in feature spaces Academic Article uri icon

abstract

  • Fast similarity retrieval is critical for content-based image retrieval systems. Tree indexing is a classical technique for fast retrieval, but the practical performance increase offered by the indexing tree depends on the intrinsic dimension of the data. Data with a low intrinsic dimension can be indexed more efficiently than data with high intrinsic dimension. This suggests that an indexing tree that is adapted to the data distribution may be more efficient. This paper proposes two adaptation procedures that are guaranteed to improve indexing efficiency. The procedures are based on a formula for average number of node tests incurred during the retrieval. The formula clearly shows how indexing performance varies with the distribution of feature points and the query. Greedy and optimal tree adaptation procedures are derived based on the formula. Both procedures explicitly enhance the retrieval performance of indexing trees. The optimally adapted tree carries the mathematical guarantee that it is the best performing tree in a set of possible trees obtained by node elimination. The adaptation procedures are applied to kdb-trees and hierarchical clustering trees for indexing synthetic as well as real data sets in medical image databases. Experimental results validate the claim that adaptation procedures increase retrieval efficiency. © 2009 Elsevier Inc. All rights reserved.

author list (cited authors)

  • Qian, X., & Tagare, H. D.

citation count

  • 3

publication date

  • January 2010