The coefficient of intrinsic dependence (feature selection using el CID) Academic Article

abstract

  • Measuring the strength of dependence between two sets of random variables lies at the heart of many statistical problems, in particular, feature selection for pattern recognition. We believe that there are some basic desirable criteria for a measure of dependence not satisfied by many commonly employed measures, such as the correlation coefficient. Briefly stated, a measure of dependence should: (1) be model-free and invariant under monotone transformations of the marginals; (2) fully differentiate different levels of dependence; (3) be applicable to both continuous and categorical distributions; (4) not require the dependence of X on Y to be the same as the dependence of Y on X; (5) be readily estimated from data; and (6) be straightforwardly extended to multivariate distributions. The new measure of dependence introduced in this paper, called the coefficient of intrinsic dependence (CID), satisfies these criteria. The main motivating idea is that Y is strongly (weakly, resp.) dependent on X if and only if the conditional distribution of Y given X is significantly (mildly, resp.) different from the marginal distribution of Y. We measure the difference by the normalized integrated square difference distance so that the full range of dependence can be adequately reflected in the interval [0, 1]. The paper treats estimation of the CID, provides simulations and comparisons, and applies the CID to gene prediction and cancer classification based on gene-expression measurements from microarrays. © 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
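The motivating idea in the abstract (compare the conditional distribution of Y given X with the marginal distribution of Y, then normalize the integrated squared difference into [0, 1]) can be illustrated with a rough sketch. This is not the estimator from the paper, whose exact form the abstract does not give. The function name `cid_sketch`, the rank-based binning of X, the `n_bins` parameter, and the choice of normalizer are all assumptions made for illustration, and the sketch assumes a continuous X without ties.

```python
import numpy as np

def cid_sketch(x, y, n_bins=10):
    """Illustrative dependence score of y on x in [0, 1].

    Hypothetical sketch only: NOT the paper's estimator.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Bin x by rank so the score does not change under increasing
    # transformations of x (assumes no ties in x).
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable")
    bins = (ranks * n_bins) // n  # labels 0..n_bins-1, near-equal bin sizes
    # ind[i, j] = 1 if y[i] <= y[j]; the sample values of y serve as
    # evaluation points, which approximates integrating against dF_Y.
    ind = (y[:, None] <= y[None, :]).astype(float)
    marginal = ind.mean(axis=0)  # empirical marginal CDF of y
    num = np.zeros(n)
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            cond = ind[mask].mean(axis=0)  # empirical CDF of y within bin b
            num += mask.mean() * (cond - marginal) ** 2
    # Normalizer: variance of the indicator 1(Y <= y), i.e. F(y) * (1 - F(y)).
    den = marginal * (1.0 - marginal)
    total = den.mean()
    return float(num.mean() / total) if total > 0 else 0.0

# Usage: y = x**2 is fully determined by x, but x is only partly
# determined by y (its sign is lost), so the score is asymmetric.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 1000)
y = x ** 2
noise = rng.uniform(-1.0, 1.0, 1000)
print("y on x:     ", cid_sketch(x, y))
print("x on y:     ", cid_sketch(y, x))
print("noise on x: ", cid_sketch(x, noise))
```

Because the between-bin variance of an indicator can never exceed its total variance, the score always lies in [0, 1]. In this sketch the dependence of `y` on `x` should come out clearly larger than the dependence of `x` on `y`, and the score for independent noise should be close to zero, which mirrors criterion (4) above. The binning step also caps the score below 1 for finite `n_bins`, a limitation of the sketch rather than of the CID itself.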

published proceedings

  • PATTERN RECOGNITION

author list (cited authors)

  • Hsing, T. L., Liu, L. Y., Brun, M., & Dougherty, E. R.

citation count

  • 20

complete list of authors

  • Hsing, TL||Liu, LY||Brun, M||Dougherty, ER

publication date

  • May 2005