Cross-validation and the estimation of probability distributions with categorical data
Overview
Abstract
In this paper, we consider the problem of estimating a joint distribution that is defined over a set of discrete variables. We use a smoothing kernel estimator to estimate the joint distribution. We allow for the case in which some of the discrete variables are uniformly distributed, and explicitly address the vector-valued smoothing parameter case due to its practical relevance. We show that the cross-validated smoothing parameters differ in their asymptotic behavior depending on whether a variable is uniformly distributed or not. We also discuss the mixed discrete and continuous variable case. Simulations show that the proposed estimator performs much better than the commonly used frequency estimator.
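As an illustration of the kind of estimator the abstract describes, the following is a minimal sketch, not the paper's implementation. The abstract does not name the kernel or the cross-validation criterion, so the sketch assumes a product Aitchison-Aitken kernel (weight 1 - lambda_s on a matching category, lambda_s / (c_s - 1) otherwise) and a least-squares cross-validation objective. The function names (aa_product_kernel, kernel_pmf, lscv, select_bandwidths), the grid search, and the simulated data are hypothetical choices made for this example. Categories are assumed to be coded 0, ..., c_s - 1.

```python
import itertools
import numpy as np


def aa_product_kernel(X, x, lam, c):
    """Product Aitchison-Aitken weight of every row of X at the cell x.

    Assumed kernel: 1 - lam[s] if X[i, s] == x[s], else lam[s] / (c[s] - 1).
    """
    X = np.asarray(X)
    lam = np.asarray(lam, dtype=float)
    c = np.asarray(c, dtype=float)
    match = (X == np.asarray(x))                      # shape (n, d)
    w = np.where(match, 1.0 - lam, lam / (c - 1.0))   # broadcast over rows
    return w.prod(axis=1)


def kernel_pmf(X, lam, c):
    """Smoothed estimate of the joint pmf on every cell of the support."""
    cells = list(itertools.product(*[range(int(k)) for k in c]))
    probs = np.array([aa_product_kernel(X, x, lam, c).mean() for x in cells])
    return cells, probs


def lscv(X, lam, c):
    """Least-squares CV objective: sum_x p(x)^2 - (2/n) sum_i p_{-i}(X_i)."""
    X = np.asarray(X)
    n = len(X)
    _, probs = kernel_pmf(X, lam, c)
    loo = np.empty(n)
    for i in range(n):
        w = aa_product_kernel(X, X[i], lam, c)
        loo[i] = (w.sum() - w[i]) / (n - 1)           # leave observation i out
    return np.sum(probs ** 2) - 2.0 * loo.mean()


def select_bandwidths(X, c, n_grid=11):
    """Grid search for the vector lam over [0, (c_s - 1) / c_s] per variable."""
    c = np.asarray(c, dtype=float)
    grids = [np.linspace(0.0, (k - 1.0) / k, n_grid) for k in c]
    best_lam, best_score = None, np.inf
    for lam in itertools.product(*grids):
        lam = np.array(lam)
        score = lscv(X, lam, c)
        if score < best_score:
            best_lam, best_score = lam, score
    return best_lam, best_score


if __name__ == "__main__":
    # Simulated data (an assumption for illustration): the first variable is
    # uniform on {0, 1, 2}, the second is not.
    rng = np.random.default_rng(0)
    x1 = rng.integers(0, 3, size=200)
    x2 = rng.choice(3, size=200, p=[0.7, 0.2, 0.1])
    X = np.column_stack([x1, x2])
    lam_hat, score = select_bandwidths(X, c=[3, 3])
    print("selected smoothing parameters:", lam_hat)
```

With this kernel, setting every smoothing parameter to 0 reproduces the frequency estimator, and setting lambda_s to its upper bound (c_s - 1) / c_s makes the estimate uniform in variable s, so that variable is effectively smoothed out. The abstract states that the cross-validated parameters behave differently for uniform and non-uniform variables as the sample grows; a single run on 200 simulated observations is only suggestive of that behavior.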