Introduction

Spoken keyword spotting (also called Spoken Term Detection) refers to the detection of all occurrences of any given word in a speech signal. This webpage presents a spoken keyword spotting benchmark covering clean read speech, non-clean read speech and spontaneous speech, drawn from several well-known speech corpora: TIMIT, HTIMIT, WSJ, and OGI Stories. The goal of this benchmark is to propose a standard way to evaluate keyword spotting algorithms. The need for such a standard evaluation method arose from our paper on Discriminative Keyword Spotting (see references below).


Description

The benchmark proposes a list of keywords and a list of utterances for evaluating a keyword spotter. In all experiments, the 39-phoneme set defined in Lee and Hon (1989) was used. For each dataset (TIMIT, HTIMIT, WSJ, and OGI Stories), a list of keywords is given, and for each keyword, a list of positive utterances (containing the keyword) and negative utterances (not containing it) is given. The benchmark data can be downloaded here: [benchmark.tar.gz].




Source Code


How to install?

The code can be downloaded from [kasr.tar.gz]. The code is written in standard C++ and should compile on Linux, Mac and Windows. If you don’t have BLAS and ATLAS installed on your system, either install them from http://math-atlas.sourceforge.net/ or remove the compilation flag -D _USE_ATLAS_ from the Makefile.

A more user-friendly version of the source code will be published very soon. If you have any problems using the code, don’t hesitate to contact me (jkeshet at ttic dot edu).


Useful References and Links

  1. J. Keshet, D. Grangier and S. Bengio, Discriminative Keyword Spotting, Speech Communication, Volume 51, Issue 4, pp. 317-329, April 2009.

  2. Corpora Group at CSLU/OGI

  3. LDC - Linguistic Data Consortium

  4. NIST Spoken Term Detection Portal

Spoken Keyword Spotting Benchmark

Joseph Keshet, David Grangier and Samy Bengio

Results

The results presented here compare two systems: the kernel-based discriminative system presented in our paper Discriminative Keyword Spotting, and a context-independent HMM-based system built on the Torch3 library. Both systems were trained on the TIMIT training set and evaluated on the different corpora without further adaptation or re-training. A detailed description of the training is given in the paper. The results are reported in terms of per-utterance keyword detection.
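To make the per-utterance evaluation concrete, here is a minimal sketch of how ROC points and the area under the ROC curve (AUC) could be computed for a single keyword from its positive and negative utterance lists. It is an illustration, not the released evaluation code: the function names keyword_auc and roc_points are hypothetical, and it assumes each system assigns one confidence score per (keyword, utterance) pair, with higher scores meaning the keyword is more likely present. The AUC is computed as the fraction of (positive, negative) utterance pairs that are ranked correctly, with ties counted as half.

```python
from typing import List, Tuple


def keyword_auc(pos_scores: List[float], neg_scores: List[float]) -> float:
    """AUC for one keyword: fraction of (positive, negative) utterance
    pairs in which the positive utterance scores higher (ties count 0.5)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative utterance")
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


def roc_points(pos_scores: List[float],
               neg_scores: List[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) pairs obtained by
    sweeping a detection threshold over every distinct score."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing detected
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


# Hypothetical scores for one keyword (illustrative values, not benchmark data).
pos = [0.9, 0.8, 0.4]   # utterances that contain the keyword
neg = [0.7, 0.3, 0.2]   # utterances that do not
auc = keyword_auc(pos, neg)
curve = roc_points(pos, neg)
```

With these illustrative scores, 8 of the 9 positive/negative pairs are ranked correctly, so the AUC is 8/9. Averaging the per-keyword AUC over all keywords of a corpus would give a single summary number per system, assuming every keyword has at least one positive and one negative utterance.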

ROC curves of the models generated from the discriminative system and the context-independent HMM-based system evaluated on 80 keywords from WSJ test set.
 
ROC curves of the models generated from the discriminative system and the context-independent HMM-based system evaluated on 80 keywords from TIMIT test set.
 
Summary of the empirical performance of the discriminative system and the context-independent HMM-based system in all experiments.
 

Last update: Oct 9, 2009

new revised code and scripts