This directory contains data based on the University of Wisconsin X-ray Microbeam Database (referred to here as XRMB).
The original XRMB manual can be found here: http://www.haskins.yale.edu/staff/gafos_downloads/ubdbman.pdf
We acknowledge John Westbury for providing the original data and for permitting this post-processed version to be redistributed. The original data collection was supported (in part) by research grant number R01 DC 00820 from the National Institute on Deafness and Other Communication Disorders, U.S. National Institutes of Health.
The post-processed data provided here was produced as part of work supported in part by NSF grant IIS-1321015.
Some of the original XRMB articulatory data is missing due to issues such as pellet tracking errors. The missing data has been reconstructed using the algorithm described in this paper:
Wang, Arora, and Livescu, "Reconstruction of articulatory measurements with smoothed low-rank matrix completion," SLT 2014.
http://ttic.edu/livescu/papers/wang_SLT2014.pdf
The data provided here has been used for multi-view acoustic feature learning in this paper:
Tang, Wang, and Livescu, "Acoustic Feature Learning via Deep Variational Canonical Correlation Analysis," Interspeech 2017.
If you use this version of the data, please cite the papers above. This version of the data is slightly different from data used in the following paper:
Wang, Arora, Livescu, and Bilmes, "Unsupervised learning of acoustic features via deep canonical correlation analysis," ICASSP 2015.
http://ttic.edu/livescu/papers/wang_ICASSP2015.pdf
The difference between the two versions is that this version also includes all silence frames. For the version of the data used in the ICASSP paper, please refer to 'http://ttic.uchicago.edu/~klivescu/XRMB_data/full/README'.
There are 10 downloadable .mat files in total: XRMB_SEQ.mat, train_z.mat, tune_z.mat, test_z.mat, train_h1.mat, tune_h1.mat, test_h1.mat, train_h2.mat, tune_h2.mat, and test_h2.mat.
******************
About XRMB_SEQ.mat
******************
XRMB_SEQ.mat contains the acoustic measurements, the articulatory measurements, and the frame-wise labels for this version of the XRMB data. Here are the details:
'MFCCS' -- 2430668 x 39, acoustic features
'ARTICS' -- 2430668 x 16, articulatory features
'LENGTHS' -- 2357 x 1, lengths of the 2357 sequences (the lengths sum to 2430668)
'LABELS' -- 2430668 x 1, per-frame labels. There are 41 labels in total (0-40):
aa 0
ae 1
ah 2
aw 3
ay 4
b 5
ch 6
d 7
dh 8
dx 9
eh 10
er 11
ey 12
f 13
g 14
hh 15
ih 16
iy 17
jh 18
k 19
l 20
m 21
n 22
ng 23
ow 24
oy 25
p 26
r 27
s 28
sh 29
t 30
th 31
uh 32
uw 33
v 34
w 35
y 36
z 37
zh 38
cg 39
sil 40
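As a sketch of how these fields fit together, the flat frame matrices can be split back into per-utterance sequences using 'LENGTHS'. The snippet below uses a tiny synthetic stand-in for the real arrays (the actual file would be loaded with scipy.io.loadmat("XRMB_SEQ.mat")); the shapes here are illustrative only, not the real 2430668-frame matrices.

```python
import numpy as np

# Hypothetical miniature stand-in for XRMB_SEQ.mat's contents.
lengths = np.array([3, 2, 4])                            # stands in for 'LENGTHS' (really 2357 x 1)
mfccs = np.arange(9 * 39, dtype=float).reshape(9, 39)    # stands in for 'MFCCS' (really 2430668 x 39)

# The frame matrices are stored flat; split them back into one array
# per sequence at the cumulative-length boundaries.
boundaries = np.cumsum(lengths)[:-1]
sequences = np.split(mfccs, boundaries, axis=0)

assert len(sequences) == len(lengths)
assert all(seq.shape == (n, 39) for seq, n in zip(sequences, lengths))
```

The same split applies to 'ARTICS' and 'LABELS', since all three share the frame axis.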
***********************************************
About train_z.mat, tune_z.mat, test_z.mat,
train_h1.mat, tune_h1.mat, test_h1.mat,
train_h2.mat, tune_h2.mat and test_h2.mat
***********************************************
We first concatenated the articulatory and acoustic measurements over a 35-frame window around each frame, giving 560-dimensional and 1365-dimensional inputs for the two views, respectively. We then trained a VCCAP model (Variational Canonical Correlation Analysis with Private Variables, as described in the Interspeech paper) and used the trained model to generate the 9 .mat files.
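The window concatenation can be sketched as follows. This is a minimal illustration, not the original pipeline's code; in particular, the edge handling here (repeating the first/last frame) is an assumption, since the README does not specify how utterance boundaries were padded.

```python
import numpy as np

def stack_context(feats, win=35):
    """Concatenate features over a win-frame window centered on each frame.
    Edges are padded by repeating the first/last frame (an assumption)."""
    half = win // 2
    padded = np.pad(feats, ((half, win - 1 - half), (0, 0)), mode="edge")
    # One shifted copy of the sequence per window offset, stacked feature-wise.
    return np.hstack([padded[i:i + len(feats)] for i in range(win)])

artics = np.zeros((100, 16))   # articulatory view: 16 dims per frame
mfccs = np.zeros((100, 39))    # acoustic view: 39 dims per frame

assert stack_context(artics).shape == (100, 560)    # 16 * 35
assert stack_context(mfccs).shape == (100, 1365)    # 39 * 35
```

This reproduces the stated dimensionalities: 16 x 35 = 560 for the articulatory view and 39 x 35 = 1365 for the acoustic view.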
train_z.mat, tune_z.mat and test_z.mat include 'mean' and 'var' of the posterior distributions of common variables z for train, tune and test sets respectively.
train_h1.mat, tune_h1.mat and test_h1.mat include 'mean' and 'var' of the posterior distributions of private variables h1 for train, tune and test sets respectively.
train_h2.mat, tune_h2.mat and test_h2.mat include 'mean' and 'var' of the posterior distributions of private variables h2 for train, tune and test sets respectively.
These 9 .mat files, which encode the posterior distributions learned from the 35-frame concatenated data, can serve as new priors when training VCCAP with a larger context window (e.g., 100 concatenated frames).
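As a sketch of how the posterior files can be consumed, each file holds a 'mean' and a 'var' array of matching shape, from which Gaussian samples can be drawn. The snippet round-trips a tiny synthetic file with the same layout as train_z.mat; the shapes (5 frames, 70 latent dimensions) are made up for illustration and are not the real dimensionalities.

```python
import io

import numpy as np
from scipy.io import loadmat, savemat

# Write a tiny synthetic stand-in for train_z.mat: 'mean' and 'var' of the
# posterior over the common variables z (shapes here are hypothetical).
buf = io.BytesIO()
savemat(buf, {"mean": np.zeros((5, 70)), "var": np.ones((5, 70))})
buf.seek(0)

# In practice this would be loadmat("train_z.mat").
post = loadmat(buf)
z_mean, z_var = post["mean"], post["var"]

# Draw one Gaussian sample per frame from the diagonal posterior,
# e.g. to seed training of a wider-context VCCAP model.
z_sample = z_mean + np.sqrt(z_var) * np.random.randn(*z_mean.shape)
assert z_sample.shape == z_mean.shape
```

The h1 and h2 files follow the same 'mean'/'var' layout for the two views' private variables.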
********************
To Download the Data
********************
http://ttic.uchicago.edu/~qmtang/Data/Interspeech2017/XRMB_SEQ.mat
http://ttic.uchicago.edu/~qmtang/Data/Interspeech2017/train_z.mat
http://ttic.uchicago.edu/~qmtang/Data/Interspeech2017/tune_h1.mat
http://ttic.uchicago.edu/~qmtang/Data/Interspeech2017/test_h2.mat