MacVisSTA: a system for multimodal analysis
Conference Paper
Overview
abstract
The study of embodied communication requires access to multiple data sources such as multistream video and audio, and various derived data and metadata such as gesture, head, posture, facial expression, and gaze information. The common element that runs through these data is the co-temporality of the multiple modes of behavior. In this paper, we present the multimedia Visualization for Situated Temporal Analysis (MacVisSTA) system for the analysis of multimodal human communication through video, audio, speech transcriptions, and gesture and head orientation data. The system uses a multiple linked representation strategy in which the different representations are linked by the current time focus. In this framework, the multiple display components associated with the disparate data types are kept in synchrony, with each component serving as both a controller of the system and a display. Hence the user is able to analyze and manipulate the data from different analytical viewpoints (e.g., through the time-synchronized speech transcription or through motion segments of interest). MacVisSTA supports analysis of the synchronized data at varying timescales. It provides an annotation interface that permits users to code the data into 'music-score' objects, and to make and organize multimedia observations about the data. Hence MacVisSTA integrates flexible visualization with annotation within a single framework. An XML database manager has been created for storage and search of annotation data. We compare the system with other existing annotation tools with respect to functionality and interface design. The software runs on Macintosh OS X computer systems. Copyright 2004 ACM.
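The "multiple linked representation" strategy the abstract describes amounts to an observer pattern built around a shared time focus: each view subscribes to the focus and re-renders when it changes, and each view can also move the focus, acting as both display and controller. The following is a minimal sketch of that pattern, not code from the paper; all class and method names (TimeFocus, TranscriptView, VideoView) are hypothetical.

```python
from __future__ import annotations
from typing import Callable, List, Tuple


class TimeFocus:
    """Shared current-time focus; notifies every registered
    component whenever any component changes the focus."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[float], None]] = []

    def register(self, listener: Callable[[float], None]) -> None:
        self._listeners.append(listener)

    def set_time(self, t: float) -> None:
        for listener in self._listeners:
            listener(t)


class TranscriptView:
    """Shows the transcript line at the focus time, and moves the
    focus when the user selects a line (display + controller)."""

    def __init__(self, focus: TimeFocus,
                 lines: List[Tuple[float, str]]) -> None:
        self.focus = focus
        self.lines = lines  # (start_time_seconds, text) pairs
        focus.register(self.on_time)

    def on_time(self, t: float) -> None:
        spoken = [text for start, text in self.lines if start <= t]
        print(f"[transcript @ {t:.2f}s] {spoken[-1] if spoken else '(silence)'}")

    def click_line(self, index: int) -> None:
        # Acting as a controller: jump the whole system to this line.
        self.focus.set_time(self.lines[index][0])


class VideoView:
    """Seeks the video to the frame at the focus time."""

    def __init__(self, focus: TimeFocus, fps: float = 30.0) -> None:
        self.fps = fps
        focus.register(self.on_time)

    def on_time(self, t: float) -> None:
        print(f"[video] seek to frame {int(t * self.fps)}")


focus = TimeFocus()
video = VideoView(focus)
transcript = TranscriptView(
    focus, [(0.0, "hello"), (1.5, "so, about the plan"), (4.2, "right")]
)

transcript.click_line(1)  # every linked view jumps to t = 1.5 s
focus.set_time(4.2)       # scrubbing the timeline has the same effect
```

Because synchronization lives entirely in the shared focus, adding another data type (e.g., a gesture-segment or head-orientation track) only requires registering one more listener, which is presumably what lets the system keep disparate displays in lockstep at varying timescales.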
name of conference
Proceedings of the 6th International Conference on Multimodal Interfaces