Below are resources that I helped develop.


Training and validation data created for the LAMBADA word prediction task; described here.  [lambada-train-valid.tar.gz (330MB)]
Manual analysis of 100 LAMBADA instances from the paper above.  [lambada-analysis.tar.gz]

Code for training charagram models and pre-trained models from this EMNLP16 paper (developed by John Wieting)  [link]

Who-did-What reading comprehension dataset from this EMNLP16 paper  [link]

Resources for commonsense knowledge representation from an ACL16 paper  [link]

Code for training paragram phrase embeddings and other models from an ICLR16 paper (developed by John Wieting)  [link]

Pretrained paragram word embeddings and annotated phrase similarity datasets (developed by John Wieting)  [link]

TTIC Segmental CRF toolkit (developed by Hao Tang)   [link]

Rampion, a framework for training statistical machine translation models  [link]

Twitter part-of-speech tagger and tweets manually annotated with part-of-speech tags  [link]

NFL game data and aligned tweets  [link]

Restaurant menus with item descriptions and prices  [train.json.gz, dev.json.gz, test.json.gz]

Code for performing inference for monolingual and bilingual gappy pattern models  [link] [sample patterns]

Code to find trigger word pairs using mutual information (reimplementation of Rosenfeld, 1994)  [code]

Corpus of movie critic reviews and opening weekend revenues (updated Feb. 2015 with preprocessed data for running regression experiments)  [link]

Factoid question-answer pairs from Wikipedia articles with difficulty ratings [link]

Scripts for performing bootstrap resampling for BLEU significance testing  [link]
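
For readers unfamiliar with the idea behind the last item, here is a minimal sketch of paired bootstrap resampling for comparing two systems by BLEU. It is not the code in the linked scripts. The function names, the single-reference setup, the unsmoothed 4-gram BLEU, and the default of 1000 resamples are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_stats(hyp, ref, max_n=4):
    """Per-sentence statistics: [hyp_len, ref_len, match_1, total_1, ..., match_N, total_N]."""
    stats = [len(hyp), len(ref)]
    for n in range(1, max_n + 1):
        h, r = ngram_counts(hyp, n), ngram_counts(ref, n)
        stats.append(sum((h & r).values()))      # clipped n-gram matches
        stats.append(max(len(hyp) - n + 1, 0))   # n-grams in the hypothesis
    return stats


def bleu_from_stats(stats_list, max_n=4):
    """Corpus-level BLEU (single reference, no smoothing) from summed statistics."""
    totals = [sum(col) for col in zip(*stats_list)]
    hyp_len, ref_len = totals[0], totals[1]
    log_prec = 0.0
    for n in range(max_n):
        matches, total = totals[2 + 2 * n], totals[3 + 2 * n]
        if matches == 0 or total == 0:
            return 0.0
        log_prec += math.log(matches / total) / max_n
    log_bp = min(0.0, 1.0 - ref_len / hyp_len)   # log of the brevity penalty
    return math.exp(log_bp + log_prec)


def paired_bootstrap(hyps_a, hyps_b, refs, num_samples=1000, seed=0):
    """Fraction of resampled test sets on which system A's BLEU exceeds system B's."""
    rng = random.Random(seed)
    stats_a = [sentence_stats(h, r) for h, r in zip(hyps_a, refs)]
    stats_b = [sentence_stats(h, r) for h, r in zip(hyps_b, refs)]
    n = len(refs)
    wins_a = 0
    for _ in range(num_samples):
        # Draw sentence indices with replacement; both systems share the same draw.
        idx = [rng.randrange(n) for _ in range(n)]
        bleu_a = bleu_from_stats([stats_a[i] for i in idx])
        bleu_b = bleu_from_stats([stats_b[i] for i in idx])
        if bleu_a > bleu_b:
            wins_a += 1
    return wins_a / num_samples
```

Inputs are lists of tokenized sentences (lists of strings), aligned by position. A returned fraction near 1.0 suggests that system A's advantage is stable across resampled test sets. Caching per-sentence statistics keeps each resample cheap, since corpus BLEU only needs their sums.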