World Modeling for Natural Language Understanding

How should AI systems understand what they read? Should they emulate people? And how do people understand what they read?

While there are many theoretical models of human reading comprehension (McNamara and Magliano, 2009), certain concepts have drawn widespread support from psychologists and cognitive scientists. For example, human readers identify protagonists, attribute mental states to them, and expect them to act in a goal-directed manner; events are situated within spatiotemporal contexts; and causal links are formed among events (Graesser et al., 1994; Zwaan and Radvansky, 1998; Elson, 2012; inter alia). We refer to this information as the world described by the text.

We seek to develop systems to automatically construct rich representations of the world underlying the text being analyzed. In designing world construction models we use abstractions identified by cognitive scientists and psychologists as fundamental in human understanding and reading comprehension. We also design targeted probing tasks to enable fine-grained assessment of the extent to which systems have captured this information.

There is a tension in the AI community between pattern recognition and model-building (Lake et al., 2017). Pattern recognition, the dominant paradigm in most application areas, interprets learning as the ability to make predictions. Model-building, by contrast, seeks to construct a hypothesis that can explain a set of observations. This project seeks to develop a model-building framework for natural language understanding, taking inspiration from Lake et al. in using ideas from cognitive science and psychology. The goal is to develop AI systems that are highly effective while offering greater flexibility, robustness, and interpretability than methods based on pattern recognition.

Below we describe individual research projects oriented towards these goals:

Memory-Augmented Neural Readers


We are developing memory-augmented neural networks in which the memory is designed to capture information about named entities in order to resolve references. Below is a visualization of our current model, PeTra ("People Tracking"), run on text from the GAP dataset:

The visualization above shows a memory with four memory cells, showing overwrite (OW) and coreference (CR) scores for each memory cell at each position in the text. The model has learned to associate one memory cell with each entity mentioned, and correctly resolves the reference in the GAP annotations (the blue mentions in the text).
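The bookkeeping behind these scores can be sketched in simplified form. The snippet below is a toy illustration, not the PeTra architecture itself: `track_entities` and its gating scheme are our own simplifications, the parameters are untrained stand-ins, and the actual model (see the paper) learns its overwrite and coreference behavior end-to-end from sparse supervision. The sketch shows only the mechanics: each token gets a coreference (CR) score against every memory cell and an overwrite (OW) score favoring unused cells, and cells are updated accordingly.

```python
import math

def sigmoid(z):
    # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def track_entities(tokens, num_cells=4):
    """Toy entity memory: returns per-token overwrite (OW) and
    coreference (CR) scores for each memory cell, as in the
    visualization. All parameters here are untrained stand-ins."""
    dim = len(tokens[0])
    memory = [[0.0] * dim for _ in range(num_cells)]  # entity cells, initially empty
    usage = [0.0] * num_cells                         # how occupied each cell is
    ow_scores, cr_scores = [], []
    for h in tokens:
        # CR: does this token refer to the entity stored in cell k?
        cr = [sigmoid(dot(memory[k], h)) for k in range(num_cells)]
        # OW: a newly introduced entity should overwrite the least-used cell
        ow = softmax([-5.0 * u for u in usage])
        gate = [max(c, o) for c, o in zip(cr, ow)]    # simplified write gate
        for k in range(num_cells):
            memory[k] = [(1.0 - gate[k]) * m + gate[k] * x
                         for m, x in zip(memory[k], h)]
            usage[k] += gate[k]
        ow_scores.append(ow)
        cr_scores.append(cr)
    return ow_scores, cr_scores

# toy "token embeddings" for a three-token span
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
ow, cr = track_entities(tokens, num_cells=2)
```

Plotting the OW and CR scores per cell across token positions, as returned here, is what produces a visualization like the one above.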

For more details, please see the following:
PeTra: A Sparsely Supervised Memory Model for People Tracking
Shubham Toshniwal, Allyson Ettinger, Kevin Gimpel, Karen Livescu
ACL 2020
[arxiv] [code] [colab] [bib]

Probing for Entity and Event Information in Contextualized Embeddings


Targeted diagnostic analyses are important both for assessing our models' success in capturing world information and for establishing baselines for how well existing NLP models capture it. In this project we focus on the world information represented by pre-trained contextual encoders, systematically testing how this information is distributed across the contextualized representations of different tokens within each sentence.

We design probing tasks for transitive sentences (subject-verb-object structure), probing each token of the sentence for information about the entity participants (subject and object nouns) and the event (verb) described in the sentence. We probe for several types of features of these components: for subject/object nouns, the number, gender, and animacy of the corresponding entities; for verbs, the timing of the event (tense), flexibility in the number of participants (causative-inchoative alternation), and whether the event type is dynamic or stative. Mapping the encoding of these features across token embeddings of BERT, ELMo, and GPT models, we find that most tokens of a sentence encode information about the subject, object, and verb, but that different encoders prioritize different types of information. Figure 1 below shows the distribution of information about the subject noun, and Figure 2 shows the distribution of information about the object noun.


Figure 1:
Figure 2:

These figures show, for instance, that BERT, and to a lesser extent ELMo, greatly deprioritize gender information about the entities; ELMo in particular deprioritizes information about the object entity on subject tokens. GPT also deprioritizes object entity information, with the object token encoding more information about the subject entity than about the object entity.
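A probing task of this kind reduces to training a diagnostic classifier on token embeddings. The snippet below is a toy sketch with synthetic "embeddings", not our experimental setup: `train_probe` and `probe_accuracy` are hypothetical names, and the real probes are trained on representations from the actual encoders, per token position and per feature, with held-out evaluation. It shows only the core idea: high probe accuracy indicates that a feature is linearly decodable from a token's representation.

```python
import math
import random

def sigmoid(z):
    # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def train_probe(embeddings, labels, epochs=100, lr=0.1):
    """Fit a logistic-regression probe that predicts a binary feature
    (e.g. subject number) from a token embedding, via SGD on log loss."""
    dim = len(embeddings[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            # gradient of the log loss w.r.t. the logit
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def probe_accuracy(w, b, embeddings, labels):
    hits = sum(int((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (y == 1))
               for x, y in zip(embeddings, labels))
    return hits / len(labels)

# synthetic "token embeddings": dimension 0 weakly encodes the feature,
# dimension 1 is pure noise
rng = random.Random(0)
X, Y = [], []
for y in [0, 1] * 50:
    X.append([float(y) + rng.gauss(0, 0.3), rng.gauss(0, 1.0)])
    Y.append(y)

w, b = train_probe(X, Y)
acc = probe_accuracy(w, b, X, Y)  # high accuracy -> feature is linearly decodable
```

In the actual experiments, probe accuracy is measured separately for each token position and each feature, and comparing these accuracies across positions yields distributional maps like those in Figures 1 and 2 above.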

For more details, please see the following:
Spying on your neighbors: Fine-grained probing of contextual embeddings for information about surrounding words
Josef Klafka, Allyson Ettinger
ACL 2020
[arxiv] [bib]

Principal Investigators:





This material is based upon work supported by the National Science Foundation under Award Nos. 1941178 and 1941160.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.