Sanjeev Khudanpur, Johns Hopkins University
Thursday, May 4, 11:00am
Title: Innovations in Acoustic Modeling for Automatic Speech Recognition
We will describe two new developments in neural network based acoustic models for speech recognition. The first is a TDNN-LSTM network architecture that combines two notable recent advances in Kaldi: Time-Delay Neural Networks (ASpIRE Challenge, Interspeech 2015) and the Lattice-Free Maximum Mutual Information training criterion (Chain Models, Interspeech 2016). It matches or outperforms bidirectional LSTM (BLSTM) acoustic models without incurring the high training/test complexity and decoding latency of BLSTMs. The second, inspired by a recent idea called adversarial training, is a modification of the standard stochastic gradient descent (SGD) algorithm used for neural network training; it counteracts finite-sample bias by taking a small step in the opposite direction of the SGD step before taking a step in the gradient descent direction. Both innovations are still works in progress, and many details are still being understood. The empirical results so far are nonetheless promising, including a notable reduction in word error rate on far-field speech transcription tasks. Think of this as two back-to-back talks rather than one long presentation.
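To make the second idea concrete, here is a minimal sketch of one way such a two-part update could look. It is an illustration under stated assumptions, not the speaker's implementation: the function name, the `alpha` parameter, the choice to recompute the gradient after the backward step, and the `(1 + alpha)` scaling of the forward step are all hypothetical.

```python
def backward_then_forward_step(params, grad_fn, lr=0.1, alpha=0.3):
    """One update: a small step up the gradient, then a larger step down.

    params  : list of floats (model parameters)
    grad_fn : callable mapping params to a gradient (list of floats);
              in practice this would be a minibatch gradient
    lr      : learning rate (hypothetical default)
    alpha   : relative size of the backward step (hypothetical default)
    """
    # Part 1: small step in the opposite direction of an ordinary SGD step.
    grad = grad_fn(params)
    params = [p + alpha * lr * g for p, g in zip(params, grad)]

    # Part 2: descent step from the displaced point. Recomputing the
    # gradient here and scaling by (1 + alpha) are assumptions of this sketch.
    grad = grad_fn(params)
    return [p - (1 + alpha) * lr * g for p, g in zip(params, grad)]


if __name__ == "__main__":
    # Toy objective f(x) = x^2, whose gradient is 2x.
    x = [1.0]
    for _ in range(50):
        x = backward_then_forward_step(x, lambda p: [2 * v for v in p])
    print(x)
```

On this toy quadratic the iterate still converges to the minimum; the effect on finite-sample bias described in the abstract would only show up with noisy minibatch gradients on real training data.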
Sanjeev Khudanpur received a B.Tech. in Electrical Engineering from the Indian Institute of Technology, Bombay, in 1988, and a Ph.D. in Electrical Engineering from the University of Maryland, College Park, in 1997. Since 1996, he has been on the faculty of the Johns Hopkins University. Until June 2001, he was an Associate Research Scientist in the Center for Language and Speech Processing and, from July 2001 to June 2008, an Assistant Professor in the Department of Electrical and Computer Engineering and the Department of Computer Science; he became an Associate Professor in July 2008. He is a founding member of the Johns Hopkins University Human Language Technology Center of Excellence, and a member of the steering committee of the Johns Hopkins University Science of Learning Institute. His research interests are in the application of information theoretic and statistical methods to human language technologies, including automatic speech recognition, machine translation and information retrieval. In his spare time, he organizes the annual Johns Hopkins Summer Workshops to advance the greater research agenda of this field.