Detecting feature interactions from accuracies of random feature subsets
Conference Paper
Abstract
Interaction among features notoriously causes difficulty for machine learning algorithms because the relevance of one feature for predicting the target class can depend on the values of other features. In this paper, we introduce a new method for detecting feature interactions by evaluating the accuracies of a learning algorithm on random subsets of features. We give an operational definition of feature interaction: a set of features interacts when it allows a learning algorithm to achieve higher accuracy than would be expected under an independence assumption. We then show how to adjust the sampling of random subsets so that it is fair and balanced given a limited amount of time. Finally, we show how decision trees built from sets of interacting features can be converted into DNF expressions to form constructed features. We demonstrate the effectiveness of the method empirically by showing that it can improve the accuracy of the C4.5 decision-tree algorithm on several benchmark databases.
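The core idea in the abstract can be illustrated with a minimal sketch. The code below is not the paper's algorithm (which uses C4.5, balanced subset sampling, and a specific independence-based expectation); it is a toy stand-in that evaluates a simple majority-class lookup-table learner on feature subsets of an XOR dataset, and flags an interaction when the joint-subset accuracy clearly exceeds the best single-feature accuracy, a crude proxy for the paper's expected-accuracy-under-independence criterion. The learner, dataset, and threshold are all illustrative assumptions.

```python
import random
from collections import Counter

def subset_accuracy(X, y, feats):
    """Leave-one-out accuracy of a majority-class lookup-table learner
    restricted to the given feature subset (a toy stand-in for the
    decision-tree learner used in the paper)."""
    n = len(X)
    correct = 0
    for i in range(n):
        key = tuple(X[i][f] for f in feats)
        # Majority class among other examples sharing this key.
        votes = Counter(y[j] for j in range(n)
                        if j != i and tuple(X[j][f] for f in feats) == key)
        if votes:
            pred = votes.most_common(1)[0][0]
        else:
            pred = Counter(y[j] for j in range(n) if j != i).most_common(1)[0][0]
        correct += (pred == y[i])
    return correct / n

# XOR data: neither feature alone predicts y, but together they do,
# so {0, 1} is an interacting feature set.
random.seed(0)
X = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(200)]
y = [a ^ b for a, b in X]

acc_0 = subset_accuracy(X, y, (0,))
acc_1 = subset_accuracy(X, y, (1,))
acc_01 = subset_accuracy(X, y, (0, 1))

# Crude independence baseline (an assumption for this sketch, not the
# paper's definition): the best single-feature accuracy.  A joint
# accuracy well above it flags an interaction.
expected = max(acc_0, acc_1)
interaction = acc_01 > expected + 0.1
print(acc_0, acc_1, acc_01, interaction)
```

On this XOR data the single-feature accuracies hover near chance while the two-feature subset is almost perfectly predictive, so the check reports an interaction, matching the intuition in the abstract that interacting features yield higher-than-expected accuracy only when evaluated together.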