Data driven articulatory synthesis with deep neural networks

abstract

The conventional approach to data-driven articulatory synthesis models the joint acoustic-articulatory distribution with a Gaussian mixture model (GMM), followed by a post-processing step that optimizes the resulting acoustic trajectories. This final step can significantly improve the accuracy of the GMM frame-by-frame mapping, but it is computationally intensive and requires the entire utterance to be synthesized beforehand, which makes it unsuited for real-time synthesis. To address this issue, we present a deep neural network (DNN) articulatory synthesizer that uses a tapped-delay input line, allowing the model to capture context information in the articulatory trajectory without the need for post-processing. We characterize the DNN as a function of the context size and the number of hidden layers, and compare it against two GMM articulatory synthesizers: a baseline model that performs a simple frame-by-frame mapping, and a second model that also performs trajectory optimization. Our results show that a DNN with a 60-ms context window and two 512-neuron hidden layers can synthesize speech at four times the frame rate, comparable to frame-by-frame mappings, while improving on the accuracy of trajectory optimization (a 9.8% reduction in Mel cepstral distortion). Subjective evaluation through pairwise listening tests also shows a strong preference for the DNN articulatory synthesizer over GMM trajectory optimization. © 2015 Elsevier Ltd. All rights reserved.
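The abstract specifies the DNN only by its input context (a tapped-delay line spanning 60 ms), its hidden layers (two layers of 512 units), and its accuracy metric (Mel cepstral distortion, MCD). The sketch below illustrates those three pieces in plain Python under stated assumptions; it is not the authors' implementation. The 10-ms frame shift, the past-only (causal) taps, the tanh activation, the random initialization, and every function name are hypothetical. MCD uses the common definition (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2) per frame, averaged over frames and excluding the energy coefficient c0. The weights are untrained, so the code shows data flow and shapes, not a working synthesizer.

```python
import math
import random


def taps_for_window(window_ms, frame_shift_ms):
    """Number of taps covering a context window (frame shift is an assumption)."""
    return max(1, round(window_ms / frame_shift_ms))


def tapped_delay(frames, n_taps):
    """Stack each frame with its n_taps - 1 predecessors, oldest first.

    Assumes past-only taps; frames before the utterance start are padded
    by repeating frame 0.
    """
    out = []
    for t in range(len(frames)):
        window = []
        for k in range(n_taps - 1, -1, -1):
            window.extend(frames[max(t - k, 0)])
        out.append(window)
    return out


def init_layer(n_in, n_out, rng):
    """Random weights scaled by 1/sqrt(n_in); biases start at zero."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def make_dnn(n_in, n_out, hidden=(512, 512), seed=0):
    """Fully connected net; default hidden sizes follow the abstract."""
    rng = random.Random(seed)
    sizes = [n_in, *hidden, n_out]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def forward(net, x):
    """Tanh hidden layers (assumed activation) and a linear output layer."""
    for i, (weights, biases) in enumerate(net):
        y = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        x = y if i == len(net) - 1 else [math.tanh(v) for v in y]
    return x


def mel_cepstral_distortion(ref, syn):
    """Frame-averaged MCD in dB, skipping c0 (a common convention)."""
    k = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for r, s in zip(ref, syn):
        total += k * math.sqrt(sum((a - b) ** 2 for a, b in zip(r[1:], s[1:])))
    return total / len(ref)
```

For example, with hypothetical 12-dimensional articulatory frames and a 10-ms shift, `taps_for_window(60, 10)` gives 6 taps, so `tapped_delay` yields 72-dimensional inputs for `make_dnn(72, n_acoustic)`. With past-only taps each output frame depends only on frames already received, so the network can run frame by frame as data arrive. This is the property that lets a tapped-delay design avoid whole-utterance trajectory optimization.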

authors

Gutierrez-Osuna, Ricardo

published proceedings

Computer Speech and Language

author list (cited authors)

Aryal, S., & Gutierrez-Osuna, R.

citation count

21

complete list of authors

Aryal, Sandesh||Gutierrez-Osuna, Ricardo

publication date

January 2016

publisher

Elsevier

published in

Computer Speech and Language Journal

keywords

Articulatory Synthesis
Deep Learning
Electromagnetic Articulography
Gaussian Mixture Models

Digital Object Identifier (DOI)

10.1016/j.csl.2015.02.003

start page

260

end page

273

volume

36

URL

http://dx.doi.org/10.1016/j.csl.2015.02.003
