Li, Wenzhe (2012-08). Acoustic Based Sketch Recognition. Master's Thesis.

abstract

  • Sketch recognition is an active research field whose goal is to have a computer automatically recognize hand-drawn diagrams. The technology lets people interact freely with digital devices such as tablet PCs, Wacom tablets, and multi-touch screens. These devices are easy to use and have become very popular on the market. However, they are still quite costly and need more time to be integrated into existing systems. For example, handwriting recognition systems, while gaining in accuracy and capability, still require users to sketch on a tablet PC. As computers get smaller and smartphones become more common, our vision is to let people sketch with ordinary pencil and paper while a simple microphone, such as the one in their smartphone, interprets what they write. Since the only device we need is a single simple microphone, our work is not limited to common mobile devices; it can also be integrated into many other small devices, such as a ring. In this thesis, we thoroughly investigate this new area, which we call acoustic based sketch recognition, and evaluate its potential as a new interaction technique. We focus specifically on building a recognition engine for acoustic sketch recognition. We first propose a dynamic time warping (DTW) algorithm for recognizing isolated sketch sounds using MFCCs (Mel-Frequency Cepstral Coefficients). After analyzing its performance limitations, we propose improved DTW algorithms that work on a hybrid basis, using both MFCCs and four global features: skewness, kurtosis, curviness, and peak location. The proposed approaches offer both robustness and reduced computational cost. Finally, we evaluate our algorithms on acoustic data collected from participants using a device's built-in microphone.
Using our improved algorithm, we achieved an accuracy of 90% on a 10-digit gesture set, 87% on the 26 letters of the English alphabet, and over 95% on a set of seven commonly used gestures.
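The abstract names the building blocks of the recognizer but not their exact formulation. The following is a minimal Python sketch under stated assumptions, not the thesis's implementation. It assumes that MFCC frames and a 1-D amplitude envelope have already been computed for each recording, and it uses classic DTW with a Euclidean frame distance. The skewness, excess kurtosis, and relative peak location are computed over the envelope, and they are combined with the DTW cost through a hypothetical additive weight. Curviness is left out because the abstract does not define it. All function names (`dtw_distance`, `global_features`, `hybrid_distance`, `classify`) are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one MFCC vector per analysis frame (assumed precomputed)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Local cost between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a: List[Frame], seq_b: List[Frame]) -> float:
    """Classic dynamic time warping cost between two frame sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]


def global_features(envelope: List[float]) -> List[float]:
    """Skewness, excess kurtosis, and relative peak location (0..1) of an envelope.
    Assumption: features are taken over an amplitude envelope; curviness omitted."""
    n = len(envelope)
    mean = sum(envelope) / n
    var = sum((x - mean) ** 2 for x in envelope) / n
    std = math.sqrt(var) if var > 0 else 1.0  # avoid division by zero on flat input
    skew = sum(((x - mean) / std) ** 3 for x in envelope) / n
    kurt = sum(((x - mean) / std) ** 4 for x in envelope) / n - 3.0
    peak = envelope.index(max(envelope)) / max(n - 1, 1)
    return [skew, kurt, peak]


def hybrid_distance(mfcc_a: List[Frame], env_a: List[float],
                    mfcc_b: List[Frame], env_b: List[float],
                    weight: float = 1.0) -> float:
    """DTW cost on MFCC frames plus a weighted global-feature distance.
    Hypothetical: the additive combination and `weight` are not from the source."""
    local = dtw_distance(mfcc_a, mfcc_b)
    glob = euclidean(global_features(env_a), global_features(env_b))
    return local + weight * glob


def classify(query: Tuple[List[Frame], List[float]],
             templates: List[Tuple[str, List[Frame], List[float]]]) -> str:
    """1-nearest-neighbour over labelled templates (label, mfcc, envelope)."""
    q_mfcc, q_env = query
    best = min(templates, key=lambda t: hybrid_distance(q_mfcc, q_env, t[1], t[2]))
    return best[0]
```

Matching here is 1-nearest-neighbour against labelled templates, a common pairing with DTW. The abstract does not say whether the thesis uses the same matching scheme, feature weighting, or any pruning step to obtain its reduced computational cost.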

publication date

  • August 2012