Bias properties of Bayesian statistics in finite mixture of negative binomial regression models in crash data analysis.
Academic Article
Overview
abstract
Factors that cause heterogeneity in crash data are often unknown to researchers, and failure to accommodate such heterogeneity in statistical models can undermine the validity of empirical results. A recently proposed finite mixture of negative binomial regression models has shown a potential advantage in addressing unobserved heterogeneity as well as providing useful information about features of the population under study. Despite its usefulness, however, no study appears to have examined the performance of this finite mixture under the range of sample sizes and sample-mean values that are common in crash data analysis. This study investigated the bias associated with the Bayesian summary statistics (posterior mean and median) of dispersion parameters in the two-component finite mixture of negative binomial regression models. A simulation study was conducted using various sample sizes under different sample-mean values. Two prior specifications (non-informative and weakly-informative) on the dispersion parameter were also compared. The results showed that the posterior mean using the non-informative prior exhibited a high bias for the dispersion parameter and should be avoided when the dataset contains fewer than 2,000 observations (even for high sample-mean values). The posterior median showed much better bias properties, particularly at small sample sizes and small sample means. However, as the sample size increased, the posterior median using the non-informative prior also began to exhibit an upward-bias trend. In such cases, the posterior mean or median with the weakly-informative prior provided smaller bias. Based on the simulation results, guidelines on the choice of priors and summary statistics are presented for different sample sizes and sample-mean values.
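To make the simulation setting concrete, the following is a minimal sketch of how count data from a two-component finite mixture of negative binomial regressions could be generated, and how bias of a summary statistic could be computed across replications. It is not the study's own code. The single uniform covariate, the log link, the parameter values, the function names, and the parameterization (variance = mu + alpha * mu^2, with alpha as the dispersion parameter) are all assumptions made for illustration; the abstract does not specify them. Model fitting and the priors are not shown.

```python
import math
import random


def _poisson_knuth(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's multiplication method (small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= threshold:
            return k - 1


def poisson(rng, lam, chunk=30.0):
    """Draw Poisson(lam) as a sum of independent Poisson chunks to avoid underflow."""
    total = 0
    while lam > chunk:
        total += _poisson_knuth(rng, chunk)
        lam -= chunk
    return total + _poisson_knuth(rng, lam)


def negative_binomial(rng, mu, alpha):
    """Draw NB(mu, alpha) via the gamma-Poisson mixture.

    Assumed parameterization: mean mu, variance mu + alpha * mu**2.
    """
    lam = rng.gammavariate(1.0 / alpha, alpha * mu)  # shape, scale
    return poisson(rng, lam)


def simulate_mixture(n, weight, betas, alphas, seed=0):
    """Simulate n observations from a two-component NB regression mixture.

    weight: probability of component 0 (hypothetical value chosen by the caller)
    betas:  [(b0, b1), (b0, b1)] intercept and slope per component, log link
    alphas: [alpha_0, alpha_1] dispersion parameter per component
    Returns a list of (x, component, y) tuples.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()                      # assumed covariate: Uniform(0, 1)
        k = 0 if rng.random() < weight else 1  # latent component label
        b0, b1 = betas[k]
        mu = math.exp(b0 + b1 * x)
        data.append((x, k, negative_binomial(rng, mu, alphas[k])))
    return data


def bias(estimates, true_value):
    """Bias of an estimator: mean of the estimates minus the true value."""
    return sum(estimates) / len(estimates) - true_value


if __name__ == "__main__":
    sample = simulate_mixture(2000, 0.5, [(0.0, 0.5), (1.0, 0.5)], [0.5, 1.0], seed=1)
    sample_mean = sum(y for _, _, y in sample) / len(sample)
    print(f"n = {len(sample)}, sample mean = {sample_mean:.2f}")
```

In a study of this kind, one would repeat the simulation many times for each combination of sample size and sample mean, fit the mixture model to each replicate, and pass the resulting posterior means or medians of the dispersion parameter to `bias` together with the value used to generate the data.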