Crawling and Classification Strategies for Generating a Multi-Language Corpus of Sign Language Video (Conference Paper)


  • © 2019 IEEE. Although considerable sign language content is available online, it can be hard to locate content in a specific sign language on a particular topic. The Sign Language Digital Library (SLaDL) aims to improve access by generating a multi-language corpus of sign language video. SLaDL combines crawling, which collects potential sign language content, with multimodal sign language detection and identification classifiers, which winnow the collected videos to those believed to be in a particular sign language. Here we compare the quantity and variety of sign language videos located via breadth-first, depth-first, and focused crawling strategies. Then we examine the accuracy of different approaches to combining textual metadata and video features for the 3-way classification task of identifying videos in American Sign Language (ASL), videos in British Sign Language (BSL), and videos without sign language. Finally, because generating the video features used for classification is computationally expensive, we explore how classifier accuracy is affected by using a cascading classifier and by generating features from motion in sampled frames.

name of conference

  • 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL)

published proceedings


author list (cited authors)

  • Shipman, F. M., & Monteiro, C.

citation count

  • 1

complete list of authors

  • Shipman, Frank M; Monteiro, Caio DD

publication date

  • January 2019