Finite mixture modeling for vehicle crash data with application to hotspot identification.
Academic Article
Overview
Research
Identity
Additional Document Info
Other
View All
Overview
abstract
The application of finite mixture regression models has recently gained an interest from highway safety researchers because of its considerable potential for addressing unobserved heterogeneity. Finite mixture models assume that the observations of a sample arise from two or more unobserved components with unknown proportions. Both fixed and varying weight parameter models have been shown to be useful for explaining the heterogeneity and the nature of the dispersion in crash data. Given the superior performance of the finite mixture model, this study, using observed and simulated data, investigated the relative performance of the finite mixture model and the traditional negative binomial (NB) model in terms of hotspot identification. For the observed data, rural multilane segment crash data for divided highways in California and Texas were used. The results showed that the difference measured by the percentage deviation in ranking orders was relatively small for this dataset. Nevertheless, the ranking results from the finite mixture model were considered more reliable than the NB model because of the better model specification. This finding was also supported by the simulation study which produced a high number of false positives and negatives when a mis-specified model was used for hotspot identification. Regarding an optimal threshold value for identifying hotspots, another simulation analysis indicated that there is a discrepancy between false discovery (increasing) and false negative rates (decreasing). Since the costs associated with false positives and false negatives are different, it is suggested that the selected optimal threshold value should be decided by considering the trade-offs between these two costs so that unnecessary expenses are minimized.