A Mining Technique Using $N$-Grams and Motion Transcripts for Body Sensor Network Data Repository
Recent years have witnessed a large influx of applications in the field of cyber-physical systems. An important class of these systems is body sensor networks (BSNs), in which lightweight embedded processors and communication systems are tightly coupled with the human body. BSNs can give researchers, care providers, and clinicians access to valuable information extracted from data collected in users' natural environments. With this information, one can monitor the progression of a disease, identify its early onset, or simply assess a user's wellness. One major obstacle is managing the repositories that store the large amounts of sensing data. To address this issue, we propose a data mining approach inspired by experience in text and natural language processing. We represent sensor readings as a sequence of characters, called a motion transcript. Transcripts significantly reduce the complexity of the data while maintaining the morphological and structural properties of the physiological signals. To take further advantage of the signals' structure, our data mining technique focuses on their characteristic transitions, which are efficiently captured using n-grams. To keep mining lightweight and fast, we reduce the overwhelmingly large number of n-grams via information gain (IG) feature selection. We report the effectiveness of the proposed approach in terms of mining speed while maintaining acceptable accuracy, measured by the F-score, which combines precision and recall. © 2012 IEEE.
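The pipeline summarized in the abstract (transcript, n-grams, IG-based selection) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how transcripts are generated, so the equal-width quantization in `to_transcript` is an assumption, as are all function names, the four-letter alphabet, and the choice to treat each n-gram as a binary present/absent feature when computing information gain.

```python
import math
from collections import Counter


def to_transcript(samples, lo, hi, alphabet="abcd"):
    """Map each reading to a character using equal-width bins (assumed scheme)."""
    width = (hi - lo) / len(alphabet)
    chars = []
    for x in samples:
        idx = int((x - lo) / width)
        idx = min(max(idx, 0), len(alphabet) - 1)  # clamp out-of-range readings
        chars.append(alphabet[idx])
    return "".join(chars)


def ngrams(transcript, n):
    """Return every contiguous substring of length n, in order."""
    return [transcript[i:i + n] for i in range(len(transcript) - n + 1)]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(gram_sets, labels, gram):
    """IG of the binary feature 'gram occurs in the transcript'."""
    present = [y for s, y in zip(gram_sets, labels) if gram in s]
    absent = [y for s, y in zip(gram_sets, labels) if gram not in s]
    total = len(labels)
    conditional = (len(present) / total) * entropy(present) \
        + (len(absent) / total) * entropy(absent)
    return entropy(labels) - conditional


def select_ngrams(transcripts, labels, n, k):
    """Keep the k n-grams with the highest IG (ties broken alphabetically)."""
    gram_sets = [set(ngrams(t, n)) for t in transcripts]
    vocab = set().union(*gram_sets)
    ranked = sorted(vocab,
                    key=lambda g: (-information_gain(gram_sets, labels, g), g))
    return ranked[:k]


# Hypothetical toy data: four transcripts from two made-up movement classes.
transcripts = ["abab", "abba", "cdcd", "cddc"]
labels = ["walk", "walk", "sit", "sit"]
print(to_transcript([0.0, 0.3, 0.6, 0.9, 1.0], 0.0, 1.0))
print(select_ngrams(transcripts, labels, n=2, k=4))
```

In this toy data, the bigrams that occur in every transcript of one class and in none of the other receive the maximum gain, while a bigram such as `bb`, which appears in only one transcript, ranks lower and is dropped. This is the sense in which IG selection shrinks the n-gram vocabulary before mining.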