Improving Human Action Recognition Using Fusion of Depth Camera and Inertial Sensors (Academic Article)


  • © 2014 IEEE. This paper presents a fusion approach that improves human action recognition by combining two sensors of different modalities: a depth camera and an inertial body sensor. Computationally efficient action features are extracted from the depth images provided by the depth camera and from the accelerometer signals provided by the inertial body sensor. These features consist of depth motion maps and statistical signal attributes, respectively. For action recognition, both feature-level fusion and decision-level fusion are examined using a collaborative representation classifier. In feature-level fusion, the features generated from the two sensors are merged before classification. In decision-level fusion, Dempster-Shafer theory is used to combine the classification outcomes of two classifiers, one per sensor. The fusion framework is evaluated on the Berkeley multimodal human action database. The results indicate that, because the data from the two sensors are complementary, the fusion approaches improve recognition rates by 2% to 23%, depending on the action, compared with using either sensor alone.
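
The two fusion strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The collaborative representation step is written as ridge-regularized least squares with class-wise residuals, which is the usual formulation of that classifier. The function names, the regularization value `lam`, the `uncertainty` mass, and in particular the mapping from residuals to belief masses (`residuals_to_masses`) are assumptions made for illustration, since the abstract does not specify them. Feature-level fusion corresponds to concatenating the depth and inertial feature vectors before calling `crc_residuals`; decision-level fusion corresponds to running it once per sensor and merging the results with `dempster_combine`.

```python
import numpy as np

def crc_residuals(train, labels, x, lam=1e-3):
    """Collaborative representation: code x over ALL training samples with an
    L2 penalty, then measure how well each class alone reconstructs x."""
    labels = np.asarray(labels)
    A = np.asarray(train, dtype=float).T          # columns are training samples
    x = np.asarray(x, dtype=float)
    # Ridge solution: alpha = (A^T A + lam * I)^-1 A^T x
    alpha = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ x)
    classes = np.unique(labels)
    residuals = np.array([
        np.linalg.norm(x - A[:, labels == c] @ alpha[labels == c])
        for c in classes
    ])
    return classes, residuals                     # smallest residual wins

def residuals_to_masses(residuals, uncertainty=0.1):
    """Hypothetical mapping (an assumption, not from the paper): turn residuals
    into masses on single classes, reserving `uncertainty` for the whole frame."""
    scores = np.exp(-np.asarray(residuals, dtype=float))
    return (1.0 - uncertainty) * scores / scores.sum(), uncertainty

def dempster_combine(m1, theta1, m2, theta2):
    """Dempster's rule for two mass functions whose focal elements are the
    single classes (m1, m2) plus the whole frame (theta1, theta2)."""
    m1, m2 = np.asarray(m1, dtype=float), np.asarray(m2, dtype=float)
    # Mass agreeing on class i: both say i, or one says i and the other is unsure.
    singles = m1 * m2 + m1 * theta2 + theta1 * m2
    theta = theta1 * theta2
    total = singles.sum() + theta                 # equals 1 - conflict
    return singles / total, theta / total
```

For decision-level fusion, each sensor's features would be passed through `crc_residuals` and `residuals_to_masses` separately, and the predicted action would be the class with the largest fused mass returned by `dempster_combine`.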

published proceedings

  • IEEE Transactions on Human-Machine Systems

author list (cited authors)

  • Chen, C., Jafari, R., & Kehtarnavaz, N.

citation count

  • 205

complete list of authors

  • Chen, Chen||Jafari, Roozbeh||Kehtarnavaz, Nasser

publication date

  • October 2014