Modeling over-dispersed crash data with a long tail: Examining the accuracy of the dispersion parameter in Negative Binomial models

abstract

  • © 2015 Elsevier Ltd. All rights reserved. Although many statistical models have been proposed for modeling motor vehicle crashes, the Negative Binomial (NB) model remains the most commonly used statistical tool. Crash data collected for safety studies may exhibit over-dispersion and a long tail (i.e., a few sites have an unusually high number of crashes). However, some studies have shown that NB models cannot adequately handle over-dispersed count data with a long tail. So far, no work has investigated the performance of the dispersion parameter of the NB model when analyzing over-dispersed crash data with a long tail, even though this parameter plays an important role in various types of transportation safety analysis. The first objective of this study is to examine whether the dispersion parameter truly reflects the level of dispersion in over-dispersed crash data with a long tail. The second objective is to determine whether the dispersion term of the Sichel (SI) model can be used as an alternative to the dispersion parameter of the NB model. To accomplish these objectives, crash data sets are simulated from NB and SI regression models using different values for the mean and the dispersion level. For the simulated data sets, the dispersion parameter and the dispersion term are estimated and compared to the true values. To complement the simulation study, crash data collected in Texas are also used to compare the dispersion parameter and the dispersion term. The results suggest that the dispersion parameter of the NB model can misestimate the level of dispersion in over-dispersed count data with a long tail, and that the dispersion term of the SI model is more reliable in estimating the true level of dispersion. These findings suggest that the dispersion term may offer a viable alternative for analyzing over-dispersed crash data with a long tail.
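The simulate-then-estimate idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes a constant mean rather than a regression model, draws NB counts as a gamma-Poisson mixture, and uses a simple method-of-moments estimator in place of whatever estimator the study used. The function names `simulate_nb_counts` and `moment_dispersion`, and the parameterization Var(Y) = mu + alpha * mu^2, are assumptions made for illustration only. The Sichel model is not covered here.

```python
import math
import random
import statistics


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Suitable only for moderate lam, since it relies on exp(-lam).
    """
    threshold = math.exp(-lam)
    k = 0
    p = 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_nb_counts(mu, alpha, n, seed=0):
    """Simulate n NB counts with mean mu and dispersion alpha (hypothetical helper).

    Assumed parameterization: Var(Y) = mu + alpha * mu**2.
    Each count is Poisson with a gamma-distributed rate
    (shape 1/alpha, scale mu*alpha), which yields an NB count.
    """
    rng = random.Random(seed)
    shape = 1.0 / alpha
    scale = mu * alpha
    return [poisson_draw(rng.gammavariate(shape, scale), rng) for _ in range(n)]


def moment_dispersion(counts):
    """Method-of-moments estimate of alpha: (s^2 - mean) / mean^2."""
    mean = statistics.mean(counts)
    var = statistics.variance(counts)  # sample variance
    return (var - mean) / mean ** 2


# Compare the estimate with the true value used to generate the data.
true_alpha = 0.5
counts = simulate_nb_counts(mu=2.0, alpha=true_alpha, n=20000, seed=1)
print("true alpha:", true_alpha, "estimated alpha:", moment_dispersion(counts))
```

Repeating this comparison on data generated from a heavier-tailed model instead of the NB itself is the kind of check the abstract describes; the sketch only shows the baseline case where the NB assumption holds.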

author list (cited authors)

  • Zou, Y., Wu, L., & Lord, D.

citation count

  • 36

publication date

  • January 2015