Introduction
Spoken keyword spotting (also called Spoken Term Detection) refers to the detection of all occurrences of any given word in a speech signal. This webpage presents a spoken keyword spotting benchmark covering clean read speech, non-clean read speech and spontaneous speech, drawn from several well-known speech corpora: TIMIT, HTIMIT, WSJ, and OGI Stories. The goal of this benchmark is to propose a standard way to evaluate keyword spotting algorithms. The need for such a standard evaluation method arose from our paper on Discriminative Keyword Spotting (see references below).
Description
The benchmark provides a list of keywords and a list of utterances for evaluating a keyword spotter. In all experiments, the 39-phoneme set defined in Lee and Hon (1989) was used. For each dataset (TIMIT, HTIMIT, WSJ, and OGI Stories) a list of keywords is given, and for each keyword a list of positive utterances and a list of negative utterances is given. The benchmark data can be downloaded from here [benchmark.tar.gz].
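The positive and negative utterance lists lend themselves to a ranking-based score per keyword. The sketch below is only an illustration, not the scoring code shipped with the benchmark: it assumes the keyword spotter outputs one confidence score per utterance, and it assumes the area under the ROC curve (AUC) as the metric, computed as the fraction of (positive, negative) utterance pairs in which the positive utterance scores higher. The function names, the in-memory dictionary layout, and the example keywords and scores are all hypothetical.

```python
def keyword_auc(pos_scores, neg_scores):
    """AUC for one keyword: fraction of (positive, negative) utterance
    pairs in which the positive utterance scores higher. Ties count half."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative utterance")
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


def benchmark_auc(scores_by_keyword):
    """Average the per-keyword AUC over all keywords.

    scores_by_keyword maps keyword -> (positive scores, negative scores).
    This layout is an assumption made for the sketch.
    """
    aucs = [keyword_auc(pos, neg) for pos, neg in scores_by_keyword.values()]
    return sum(aucs) / len(aucs)


# Hypothetical confidence scores for two keywords.
example = {
    "dark": ([0.9, 0.7], [0.8, 0.1]),
    "water": ([0.6], [0.2, 0.3]),
}

if __name__ == "__main__":
    for keyword, (pos, neg) in example.items():
        print(keyword, keyword_auc(pos, neg))
    print("average", benchmark_auc(example))
```

Averaging per keyword rather than pooling all pairs gives every keyword equal weight regardless of how many utterances it has; other aggregation choices are equally possible.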
Source Code
How to install?
The code can be downloaded from [kasr.tar.gz]. The code is written in standard C++ and should compile on Linux, Mac and Windows. If you don’t have BLAS and ATLAS installed on your system, either install them from http://math-atlas.sourceforge.net/ or remove the compilation flag -D _USE_ATLAS_ from the Makefile.
A more user-friendly version of the source code will be published very soon. If you have any problems using the code, don’t hesitate to contact me (jkeshet at ttic dot edu).
Useful References and Links
- J. Keshet, D. Grangier and S. Bengio, Discriminative Keyword Spotting, Speech Communication, Volume 51, Issue 4, pp. 317-329, April 2009.



