Comparing Visual, Textual, and Multimodal Features for Detecting Sign Language in Video Sharing Sites Conference Paper

abstract

  • © 2018 IEEE. Easy recording and sharing of video content has led to the creation and distribution of increasing quantities of sign language (SL) content. With current capabilities, locating SL videos on a desired topic depends on the existence and correctness of metadata indicating both the language and topic of the video. Automated techniques to detect sign language content can help address this problem. This paper compares metadata-based classifiers and multimodal classifiers, using both early and late fusion techniques, with video content-based classifiers in the literature. Comparisons of applying TF-IDF, LDA, and NMF in the generation of metadata features indicate that NMF performs best, either when used independently or when combined with video features. Multimodal classifiers perform better than unimodal SL video classifiers. Experiments show that multimodal features obtained results of up to 86% precision, 81% recall, and 84% F1 score. This represents an improvement in F1 score of roughly 9% in comparison with the video-based approach presented in the literature and an improvement of 6% over text-based features extracted using NMF.
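
As a reading aid for the metrics quoted in the abstract, the sketch below shows how an F1 score is derived from precision and recall, using the standard harmonic-mean definition. The function name `f1_score` is an illustrative choice and is not taken from the paper, and the inputs are the rounded percentages from the abstract, so the result can differ slightly from the reported figure.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Rounded best-case multimodal figures quoted in the abstract (assumed inputs).
f1 = f1_score(0.86, 0.81)
print(f"{f1:.3f}")  # prints 0.834
```

The value from the rounded inputs is about 0.83; the reported 84% is consistent with this if the unrounded precision and recall were slightly higher than the quoted figures.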

author list (cited authors)

  • Monteiro, C., Shipman, F., & Gutierrez-Osuna, R.

citation count

  • 3

publication date

  • April 2018

publisher