Detecting feature interactions from accuracies of random feature subsets
Conference Paper
Abstract
Interaction among features notoriously causes difficulty for machine learning algorithms because the relevance of one feature for predicting the target class can depend on the values of other features. In this paper, we introduce a new method for detecting feature interactions by evaluating the accuracies of a learning algorithm on random subsets of features. We give an operational definition of feature interaction: a set of features interacts when it allows a learning algorithm to achieve higher accuracy than would be expected under an independence assumption. We then show how to adjust the sampling of random subsets so that it is fair and balanced given a limited amount of time. Finally, we show how decision trees built from sets of interacting features can be converted into DNF expressions to form constructed features. We demonstrate the effectiveness of the method empirically by showing that it can improve the accuracy of the C4.5 decision-tree algorithm on several benchmark databases.
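The core idea in the abstract can be illustrated with a minimal sketch. The code below is not the paper's algorithm (which uses C4.5, balanced subset sampling, and a specific independence-based expectation); it is a toy stand-in that evaluates a simple majority-class lookup-table learner on feature subsets of an XOR dataset, and flags an interaction when the joint-subset accuracy clearly exceeds the best single-feature accuracy, a crude proxy for the paper's expected-accuracy-under-independence criterion. The learner, dataset, and threshold are all illustrative assumptions.

```python
import random
from collections import Counter

def subset_accuracy(X, y, feats):
    """Leave-one-out accuracy of a majority-class lookup-table learner
    restricted to the given feature subset (a toy stand-in for the
    decision-tree learner used in the paper)."""
    n = len(X)
    correct = 0
    for i in range(n):
        key = tuple(X[i][f] for f in feats)
        # Majority class among other examples sharing this key.
        votes = Counter(y[j] for j in range(n)
                        if j != i and tuple(X[j][f] for f in feats) == key)
        if votes:
            pred = votes.most_common(1)[0][0]
        else:
            pred = Counter(y[j] for j in range(n) if j != i).most_common(1)[0][0]
        correct += (pred == y[i])
    return correct / n

# XOR data: neither feature alone predicts y, but together they do,
# so {0, 1} is an interacting feature set.
random.seed(0)
X = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(200)]
y = [a ^ b for a, b in X]

acc_0 = subset_accuracy(X, y, (0,))
acc_1 = subset_accuracy(X, y, (1,))
acc_01 = subset_accuracy(X, y, (0, 1))

# Crude independence baseline (an assumption for this sketch, not the
# paper's definition): the best single-feature accuracy.  A joint
# accuracy well above it flags an interaction.
expected = max(acc_0, acc_1)
interaction = acc_01 > expected + 0.1
print(acc_0, acc_1, acc_01, interaction)
```

On this XOR data the single-feature accuracies hover near chance while the two-feature subset is almost perfectly predictive, so the check reports an interaction, matching the intuition in the abstract that interacting features yield higher-than-expected accuracy only when evaluated together.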