Generating Gestural Scores from Acoustics Through a Sparse Anchor-Based Representation of Speech Conference Paper


  • Copyright 2016 ISCA. We present a procedure for generating gestural scores from speech acoustics. The procedure is based on our recent SABR (sparse, anchor-based representation) algorithm, which models the speech signal as a linear combination of acoustic anchors. We present modifications to SABR that encourage temporal smoothness by restricting the number of anchors that can be active over an analysis window. We propose that peaks in the SABR weights can be interpreted as "keyframes" that determine when vocal tract articulations occur. We validate the approach in two ways. First, we compare SABR keyframes to maxima in the velocity of electromagnetic articulography (EMA) pellets from an articulatory corpus. Second, we use keyframes and SABR weights to build a gestural score for the VocalTractLab (VTL) model, and compare synthetic EMA trajectories generated by VTL against those in the articulatory corpus. We find that SABR keyframes occur within 15-20 ms (on average) of EMA velocity maxima, suggesting that SABR keyframes can be used to identify articulatory phenomena. However, a comparison of synthetic and real EMA pellet trajectories shows moderate correlation on tongue pellets but weak correlation on lip pellets, a result that may be due to differences between the VTL speaker model and the source speaker in our corpus.
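
The idea summarized in the abstract (weights over acoustic anchors, a limit on how many anchors are active, and weight peaks read as keyframes) can be illustrated with a small sketch. This is not the authors' SABR implementation: the solver below is a generic projected-gradient update with hard thresholding, swapped in for illustration, and it limits active anchors per frame rather than over an analysis window. The function names, the `k` and `min_height` parameters, and the 5 ms frame step are all assumptions.

```python
import numpy as np


def sparse_anchor_weights(frames, anchors, k=2, n_iter=200):
    """Approximate each frame as a non-negative mix of at most k anchors.

    frames:  (T, D) array of acoustic feature vectors (hypothetical features).
    anchors: (A, D) array of anchor vectors.
    Returns a (T, A) weight matrix. Illustrative solver, not the paper's.
    """
    n_frames, n_anchors = frames.shape[0], anchors.shape[0]
    gram = anchors @ anchors.T
    step = 1.0 / np.linalg.norm(gram, 2)  # 1 / largest eigenvalue of the Gram matrix
    weights = np.zeros((n_frames, n_anchors))
    for t in range(n_frames):
        w = np.zeros(n_anchors)
        b = anchors @ frames[t]
        for _ in range(n_iter):
            # Gradient step on the squared error, then clip to non-negative.
            w = np.maximum(w - step * (gram @ w - b), 0.0)
            if k < n_anchors:
                # Zero every weight except the k largest (assumed sparsity rule).
                w[np.argsort(w)[:-k]] = 0.0
        weights[t] = w
    return weights


def keyframes(weights, min_height=0.5):
    """Return sorted (frame, anchor) pairs where a weight track peaks."""
    peaks = []
    for a in range(weights.shape[1]):
        w = weights[:, a]
        for t in range(1, len(w) - 1):
            if w[t] >= min_height and w[t] > w[t - 1] and w[t] >= w[t + 1]:
                peaks.append((t, a))
    return sorted(peaks)


def mean_offset_ms(key_frames, ref_frames, frame_ms=5.0):
    """Mean distance from each keyframe to its nearest reference event, in ms.

    frame_ms is an assumed frame step, not a value taken from the paper.
    """
    ref = np.asarray(ref_frames, dtype=float)
    dists = [np.min(np.abs(ref - t)) for t in key_frames]
    return float(np.mean(dists) * frame_ms)
```

Under these assumptions, `keyframes(sparse_anchor_weights(frames, anchors))` yields candidate articulation times, and `mean_offset_ms` gives one simple way to compare them against reference events such as EMA velocity maxima.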

name of conference

  • Interspeech 2016

published proceedings

  • Interspeech 2016

author list (cited authors)

  • Liberatore, C., & Gutierrez-Osuna, R.

citation count

  • 2

complete list of authors

  • Liberatore, Christopher; Gutierrez-Osuna, Ricardo

editor list (cited editors)

  • Morgan, N.

publication date

  • September 2016