Below are resources that I helped develop in or before 2017. For more recent resources, see the links under the relevant paper entries.


Training and validation data created for the LAMBADA word prediction task, described here.  [lambada-train-valid.tar.gz (330MB)]
Manual analysis of 100 LAMBADA instances from the paper above.  [lambada-analysis.tar.gz]

Code for training charagram models, along with pre-trained models, from this EMNLP16 paper (developed by John Wieting)  [link]

Who-did-What reading comprehension dataset from this EMNLP16 paper  [link]

Resources for commonsense knowledge representation from this ACL16 paper  [link]

Code for training paragram phrase embeddings and other models from this ICLR16 paper (developed by John Wieting)  [link]

Pre-trained paragram word embeddings and annotated phrase similarity datasets (developed by John Wieting)  [link]

Rampion, a framework for training statistical machine translation models  [link]

Twitter part-of-speech tagger and tweets manually annotated with part-of-speech tags  [link]

NFL game data and aligned tweets  [link]

Code for performing inference for monolingual and bilingual gappy pattern models  [link] [sample patterns]

Code to find trigger word pairs using mutual information (a reimplementation of Rosenfeld, 1994)  [code]
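A minimal sketch of the scoring idea, not the released code: it assumes a tokenized corpus, uses unigram frequencies as a simple proxy for the event probabilities, and the names (WINDOW, MIN_COUNT, trigger_pairs) are illustrative choices.

    import math
    from collections import Counter

    WINDOW = 50      # how far back in the history a trigger may occur (illustrative)
    MIN_COUNT = 10   # discard rare words (illustrative)

    def trigger_pairs(tokens, top_k=100):
        unigrams = Counter(tokens)
        total = len(tokens)
        pair_counts = Counter()
        # Count (trigger, target) events: the trigger appears somewhere in the
        # WINDOW tokens preceding the target position.
        for i, target in enumerate(tokens):
            for trig in set(tokens[max(0, i - WINDOW):i]):
                pair_counts[(trig, target)] += 1
        scored = []
        for (a, b), c_ab in pair_counts.items():
            if unigrams[a] < MIN_COUNT or unigrams[b] < MIN_COUNT:
                continue
            p_a = unigrams[a] / total   # proxy for P(a appears in the history)
            p_b = unigrams[b] / total   # P(b at the current position)
            p_ab = c_ab / total
            # Joint-probability-weighted pointwise mutual information
            # (the positive co-occurrence term of the average mutual information).
            scored.append((p_ab * math.log(p_ab / (p_a * p_b)), a, b))
        return sorted(scored, reverse=True)[:top_k]

Rosenfeld's average mutual information sums over all four trigger/target event combinations; for brevity the sketch keeps only the positive co-occurrence term.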

Factoid question-answer pairs from Wikipedia articles, with difficulty ratings  [link]

Scripts for performing bootstrap resampling to assess the statistical significance of BLEU score differences  [link]
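A minimal sketch of the paired bootstrap procedure in the style of Koehn (2004), not the released scripts: it assumes sentence-aligned outputs from two systems plus references, and a corpus-level compute_bleu function supplied by the caller.

    import random

    def paired_bootstrap(sys_a, sys_b, refs, compute_bleu, n_samples=1000, seed=0):
        """Fraction of resampled test sets on which system A scores higher than B."""
        rng = random.Random(seed)
        n = len(refs)
        wins_a = 0
        for _ in range(n_samples):
            # Draw a pseudo test set by sampling sentence indices with replacement.
            idx = [rng.randrange(n) for _ in range(n)]
            bleu_a = compute_bleu([sys_a[i] for i in idx], [refs[i] for i in idx])
            bleu_b = compute_bleu([sys_b[i] for i in idx], [refs[i] for i in idx])
            if bleu_a > bleu_b:
                wins_a += 1
        return wins_a / n_samples

If system A wins on at least 95% of the resampled test sets, the BLEU difference is conventionally reported as significant at p < 0.05.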