Using Topic Modeling to Develop Multi-level Descriptions of Naturalistic Driving Data from Drivers with and without Sleep Apnea.
Academic Article
Overview
Research
Identity
Additional Document Info
Other
View All
Overview
abstract
One challenge in using naturalistic driving data is producing a holistic analysis of these highly variable datasets. Typical analyses focus on isolated events, such as large g-force accelerations indicating a possible near-crash. Examining isolated events is ill-suited for identifying patterns in continuous activities such as maintaining vehicle control. We present an alternative approach that converts driving data into a text representation and uses topic modeling to identify patterns across the dataset. This approach enables the discovery of non-linear patterns, reduces the dimensionality of the data, and captures subtle variations in driver behavior. In this study topic models are used to concisely described patterns in trips from drivers with and without untreated obstructive sleep apnea (OSA). The analysis included 5000 trips (50 trips from 100 drivers; 66 drivers with OSA; 34 comparison drivers). Trips were treated as documents, and speed and acceleration data from the trips were converted to "driving words." The identified patterns, called topics, were determined based on regularities in the co-occurrence of the driving words within the trips. This representation was used in random forest models to predict the driver condition (i.e., OSA or comparison) for each trip. Models with 10, 15 and 20 topics had better accuracy in predicting the driver condition, with a maximum AUC of 0.73 for a model with 20 topics. Trips from drivers with OSA were more likely to be defined by topics for smaller lateral accelerations at low speeds. The results demonstrate topic modeling as a useful tool for extracting meaningful information from naturalistic driving datasets.