SABR: Sparse, Anchor-Based Representation of the Speech Signal (Conference Paper)

abstract

  • Copyright 2015 ISCA. We present SABR (Sparse, Anchor-Based Representation), an analysis technique that decomposes the speech signal into speaker-dependent and speaker-independent components. Given a collection of utterances from a particular speaker, SABR uses the centroid of each phoneme as an acoustic "anchor," then applies Lasso regularization to represent each speech frame as a sparse, non-negative combination of the anchors. We illustrate the performance of the method on a speaker-independent phoneme recognition task and a voice conversion task. Using a linear classifier, SABR weights achieve significantly higher phoneme recognition rates than Mel-frequency cepstral coefficients. SABR weights can also be used directly to perform accent conversion without the need to train a speaker-to-speaker regression model.
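
The two steps named in the abstract (per-phoneme centroids as anchors, then a sparse non-negative fit of each frame) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the objective 0.5*||x - sum_k w_k a_k||^2 + lam*sum(w) with w >= 0, the coordinate-descent solver, the function names `phoneme_centroids` and `sparse_nonneg_weights`, the value of `lam`, and the toy 3-dimensional "frames" are all assumptions made for the example. The abstract does not specify the features, solver, or regularization strength.

```python
def phoneme_centroids(frames, labels):
    """Average the feature vectors of each phoneme label: one anchor per phoneme."""
    sums, counts = {}, {}
    for vec, lab in zip(frames, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def sparse_nonneg_weights(frame, anchors, lam=0.1, n_iter=200):
    """Non-negative Lasso by coordinate descent (assumed objective):
    minimize 0.5 * ||frame - sum_k w[k] * anchors[k]||^2 + lam * sum(w), w >= 0.
    """
    w = [0.0] * len(anchors)
    residual = list(frame)  # frame minus the current reconstruction
    norms = [sum(a * a for a in anchor) for anchor in anchors]
    for _ in range(n_iter):
        for j, anchor in enumerate(anchors):
            if norms[j] == 0.0:
                continue  # an all-zero anchor cannot explain anything
            # Correlation of anchor j with the residual, adding back its own part.
            rho = sum(a * r for a, r in zip(anchor, residual)) + norms[j] * w[j]
            # Soft-threshold, then clip at zero to keep the weight non-negative.
            new_w = max(0.0, (rho - lam) / norms[j])
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - delta * a for r, a in zip(residual, anchor)]
                w[j] = new_w
    return w


# Hypothetical toy data: two labelled frames for each of three phonemes.
frames = [
    [0.5, 0.0, 0.0], [1.5, 0.0, 0.0],   # "aa"
    [0.0, 0.5, 0.0], [0.0, 1.5, 0.0],   # "iy"
    [0.0, 0.0, 0.5], [0.0, 0.0, 1.5],   # "uw"
]
labels = ["aa", "aa", "iy", "iy", "uw", "uw"]

centroids = phoneme_centroids(frames, labels)
phonemes = sorted(centroids)                 # fixed anchor order
anchors = [centroids[p] for p in phonemes]

# Represent a new frame as a sparse, non-negative mix of the anchors.
weights = sparse_nonneg_weights([0.6, 0.3, 0.0], anchors, lam=0.1)
print(dict(zip(phonemes, weights)))
```

In this sketch the weight vector has one entry per phoneme anchor, so a frame that lies mostly along one anchor receives a large weight there and zeros elsewhere; the L1 penalty `lam` controls how aggressively small weights are driven to zero.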

name of conference

  • Interspeech 2015

published proceedings

  • 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5

author list (cited authors)

  • Liberatore, C., Aryal, S., Wang, Z., Polsley, S., & Gutierrez-Osuna, R.

citation count

  • 5

complete list of authors

  • Liberatore, Christopher; Aryal, Sandesh; Wang, Zelun; Polsley, Seth; Gutierrez-Osuna, Ricardo

publication date

  • January 2015