Workshop on Speech Production in Automatic Speech Recognition
August 30, 2013
Speaker: Carol Espy-Wilson (University of Maryland)
Title: The Invariant Property of Gestures
Variability in speech, particularly as a consequence of production rate, is still a great challenge in the development of automatic speech recognition (ASR) systems that perform well with minimal constraints. Articulatory Phonology provides a unified framework for understanding the acoustic consequences of changes in speech production due to gestural overlap and gestural reduction, consequences that are often reported as assimilations, insertions, deletions, and substitutions. In this talk, I will discuss the development of our speech inversion system and its ability to extract vocal tract constriction variables and, hence, gestures from speech produced at different speaking rates. We have conducted several studies showing that augmenting acoustic features with such articulatory information improves the robustness of ASR systems in noise. An additional goal is to provide a framework that seamlessly models speech variability due to coarticulation and lenition.