◆ Raymond J. Mooney, The University of Texas at Austin
Keynote Topic: Answering Why Questions about Narrative Text
Short Bio: Raymond J. Mooney is a Professor in the Department of Computer Science at the University of Texas at Austin. He received his Ph.D. in 1988 from the University of Illinois at Urbana-Champaign. He is the author of over 180 published research papers, primarily in the areas of machine learning and natural language processing. He was the President of the International Machine Learning Society from 2008 to 2011, program co-chair for AAAI 2006, general chair for HLT-EMNLP 2005, and co-chair for ICML 1990. He is a Fellow of AAAI, ACM, and ACL and the recipient of the Classic Paper award from AAAI-19 and best paper awards from AAAI-96, KDD-04, ICML-05, and ACL-07.
Homepage: https://www.cs.utexas.edu/~mooney/.
Abstract: Answering questions about why people perform particular actions is central to understanding and reasoning about narrative text. Despite recent progress in QA, it is not clear whether existing models can answer “why” questions, which generally require commonsense knowledge external to the narrative and inference about people's plans and goals. We have been developing novel data and methods for such why-question answering. TellMeWhy is a new crowd-sourced corpus of more than 30k questions and free-form answers concerning why characters in short narratives perform the actions described. Given the limitations of automated evaluation for this task, we have also designed a systematized human evaluation interface for this dataset. Our evaluation of state-of-the-art models shows that they fall significantly below human performance on answering such questions. We have also explored which aspects of the knowledge required to answer why questions are accessible in current large language models, and which can be made accessible via external commonsense knowledge resources. Not surprisingly, larger models perform better, but all of the models we explored, across a range of sizes, benefited from the injection of question-specific knowledge from the COMET knowledge base. We also developed a simple ontology of knowledge types and analyzed the relative coverage of different models across these categories.
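As a rough illustration of the knowledge-injection idea mentioned above, the sketch below prepends COMET-style commonsense inferences to a why-question input before it reaches a QA model. The `comet_inferences` helper and the bracketed tagging scheme are hypothetical stand-ins for exposition, not the authors' pipeline.

```python
# Sketch of question-specific knowledge injection for why-QA.
# `comet_inferences` is a hypothetical stand-in for querying a
# COMET-style commonsense model (relation names follow ATOMIC-style
# conventions); the tagging scheme is likewise an assumption.

def comet_inferences(sentence: str) -> dict:
    """Pretend COMET call: commonsense inferences about the actor."""
    return {
        "xIntent": ["to satisfy their hunger"],  # why the actor acted
        "xWant": ["to eat something"],           # what the actor wants next
    }

def knowledge_augmented_input(narrative: str, question: str) -> str:
    """Prepend question-specific commonsense to the QA model's input."""
    knowledge = " ".join(
        f"[{rel}] {inf}"
        for rel, infs in comet_inferences(narrative).items()
        for inf in infs
    )
    return f"{knowledge} [narrative] {narrative} [question] {question}"

print(knowledge_augmented_input(
    "Maria opened the fridge.", "Why did Maria open the fridge?"
))
```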
◆ Jason Eisner, Johns Hopkins University
Keynote Topic: Imposing Structure on Sequence Models
Short Bio: Jason Eisner is Professor of Computer Science at Johns Hopkins University, as well as Director of Research at Microsoft Semantic Machines. He is a Fellow of the Association for Computational Linguistics. At Johns Hopkins, he is also affiliated with the Center for Language and Speech Processing, the Mathematical Institute for Data Science, and the Cognitive Science Department. His goal is to develop the probabilistic modeling, inference, and learning techniques needed for a unified model of all kinds of linguistic structure. His 150+ papers have presented various algorithms for parsing, machine translation, and weighted finite-state machines; formalizations, algorithms, theorems, and empirical results in computational phonology; and unsupervised or semi-supervised learning methods for syntax, morphology, and word-sense disambiguation. He is also the lead designer of Dyna, a new declarative programming language that provides an infrastructure for AI research. He has received two school-wide awards for excellence in teaching, as well as recent Best Paper Awards at ACL 2017, EMNLP 2019, and NAACL 2021, and an Outstanding Paper Award at ACL 2022.
Homepage: https://www.cs.jhu.edu/~jason/.
Abstract: LSTMs and Transformers do very well at predicting the future from the past. They develop predictive internal representations based on large training sets. Sometimes, however, we may want to use domain knowledge to enforce hard constraints on the output. The same handwritten grammar rules that enforce constraints can also be used to guide representation learning. By structuring how the model attends to the past, this strategy may improve generalization and interpretability.
After giving some introductory examples, I will present two interesting architectures that make use of rules.
● A finite-state transducer (FST) is a classical architecture that can be built from a regular expression. It describes the legal outputs that can be aligned to each input, in terms of a path through a labeled graph. I'll describe how to featurize those paths and score them neurally, breaking the usual Markov assumption in a way that requires approximate inference algorithms such as beam search or particle smoothing (see the first sketch after this list). The neural architecture can be made to follow the structure of the regular expression.
● Datalog is a classical language that uses logic rules to derive facts from other facts within a database. We extend Datalog to handle temporal databases, so that its rules model how the facts can change over time. At any time, the database determines the set of possible next events, and when an event happens, it modifies some facts, causing the derived facts and possible next events to change. This is a kind of automaton that can be used to model either language (e.g., the events are words) or irregularly spaced event sequences such as transactions, clicks, posts, news events, or medical events. I'll explain how to turn it into an autoregressive probability model, whose neural architecture embeds the database facts and events by following the structure of the Datalog proofs (see the second sketch after this list).
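A minimal illustration of the first architecture, under loose assumptions: the toy FST, its arc table, and the `score_step` scorer below are invented for exposition, but the scorer deliberately conditions on the entire output history (breaking the Markov property), so exact dynamic programming no longer applies and beam search serves as the approximate inference step.

```python
# Toy FST sketch: non-Markov neural-style scoring of alignment paths,
# with beam search as approximate inference.

# Arcs: (state, input symbol) -> list of (next state, output symbol).
ARCS = {
    (0, "a"): [(0, "x"), (1, "y")],
    (0, "b"): [(0, "x")],
    (1, "a"): [(1, "y")],
    (1, "b"): [(0, "x"), (1, "z")],
}
FINAL_STATES = {0, 1}

def score_step(history: tuple, symbol: str) -> float:
    """Toy scorer that reads the *entire* output history, which is
    exactly what breaks the usual Markov assumption."""
    last = history[-1] if history else None
    bonus = 0.5 if symbol != last else 0.0  # mildly discourage repeats
    return bonus - 0.1 * len(history)

def beam_search(input_str: str, beam_size: int = 4):
    """Keep only the top-k partial paths, since exact DP is unavailable."""
    beam = [(0.0, 0, ())]  # (cumulative score, state, output history)
    for ch in input_str:
        candidates = [
            (score + score_step(hist, out), nxt, hist + (out,))
            for score, state, hist in beam
            for nxt, out in ARCS.get((state, ch), [])
        ]
        beam = sorted(candidates, reverse=True)[:beam_size]
    return [(s, h) for s, st, h in beam if st in FINAL_STATES]

for score, output in beam_search("abba"):
    print(f"{score:+.2f}  {' '.join(output)}")
```

And a minimal illustration of the second architecture: a single Datalog-style rule derives facts from base facts, events update the base facts, and a softmax over (stubbed) event scores yields an autoregressive next-event distribution. The rule, the events, and the scorer are invented for exposition, not the talk's model.

```python
# Toy temporal-Datalog-style event model: the database determines the
# possible next events; an event modifies base facts, which changes
# the derived facts and hence the next-event distribution.
import math

facts = {("logged_in", "alice")}

def derive(base: set) -> set:
    """One Datalog-style rule: can_post(U) :- logged_in(U)."""
    return base | {("can_post", u) for rel, u in base if rel == "logged_in"}

def possible_events(db: set) -> list:
    """The current (derived) database determines the legal next events."""
    events = [("login", u) for u in ("alice", "bob") if ("logged_in", u) not in db]
    events += [("post", u) for rel, u in db if rel == "can_post"]
    return events

def event_score(event: tuple, db: set) -> float:
    """Stand-in for a neural scorer that would embed facts and events
    by following the structure of the Datalog proofs."""
    return 1.0 if event[0] == "post" else 0.2

def next_event_distribution(db: set) -> dict:
    """Autoregressive step: softmax over scores of the possible events."""
    events = possible_events(derive(db))
    weights = [math.exp(event_score(e, db)) for e in events]
    total = sum(weights)
    return {e: w / total for e, w in zip(events, weights)}

def apply_event(db: set, event: tuple) -> set:
    """An event modifies base facts; derived facts then change with them."""
    kind, user = event
    return db | {("logged_in", user)} if kind == "login" else db

dist = next_event_distribution(facts)
for event, prob in dist.items():
    print(f"P({event}) = {prob:.2f}")
facts = apply_event(facts, max(dist, key=dist.get))  # condition the next step on the update
```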
This is joint work with Chu-Cheng Lin, Hongyuan Mei, and other collaborators.
◆ Luke Zettlemoyer, University of Washington
Keynote Topic: Large Language Models: Will they keep getting bigger? And, how will we use them if they do?
Short Bio: Luke Zettlemoyer is a Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, and a Research Scientist at Meta. His research focuses on empirical methods for natural language semantics, and involves designing machine learning algorithms, introducing new tasks and datasets, and, most recently, studying how to best develop self-supervision signals for pre-training. His honors include being named an ACL Fellow as well as winning a PECASE award, an Allen Distinguished Investigator award, and multiple best paper awards. Luke received his PhD from MIT and was a postdoc at the University of Edinburgh.
Homepage: https://www.cs.washington.edu/people/faculty/lsz.
Abstract: The trend of building ever larger language models has dominated much research in NLP over the last few years. In this talk, I will discuss our recent efforts to (at least partially) answer two key questions in this area: Will we be able to keep scaling? And, how will we actually use the models if we do? I will cover our recent efforts on learning new types of sparse mixture-of-experts (MoE) models. Unlike model-parallel algorithms for learning dense models, which are very difficult to scale further with existing hardware, our sparse approaches have significantly reduced cross-node communication costs and could provide the next big leap in performance, although finding a version that scales well in practice remains an open challenge. I will also present our recent work on prompting language models in ways that better control for surface-form variation, to improve the performance of models that are so big we can only afford to do inference, with little to no task-specific fine-tuning. Finally, time permitting, I will discuss work on new forms of supervision for language model training, including learning from the hypertext and multi-modal structure of web pages to provide new signals for both learning and prompting the model. Together, these methods present our best guesses for how to keep the scaling trend alive as we move forward to the next generation of NLP models.
This talk describes work done at the University of Washington and Meta, primarily led by Armen Aghajanyan, Ari Holtzman, Mike Lewis, Sewon Min, and Peter West.
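The "controls for surface-form variation" point plausibly refers to PMI-style answer scoring in the spirit of the surface-form-competition work by some of the collaborators named above; the sketch below illustrates only that general calibration idea, with a stubbed `log_prob` standing in for a real language model and toy numbers substituted for real scores.

```python
# Sketch of PMI-style answer scoring to control for surface-form
# variation: score an answer by how much the prompt raises its
# probability, rather than by its raw (frequency-biased) probability.

def log_prob(text: str, context: str) -> float:
    """Pretend LM call: log P(text | context), stubbed with toy numbers."""
    toy = {
        ("dog", "Q: What barks? A:"): -1.0,
        ("canine", "Q: What barks? A:"): -4.0,
        ("dog", "A:"): -2.0,     # "dog" is a priori frequent...
        ("canine", "A:"): -6.0,  # ..."canine" is a priori rare.
    }
    return toy[(text, context)]

def pmi_score(answer: str, prompt: str, null_prompt: str = "A:") -> float:
    """PMI between prompt and answer: subtract the answer's prior
    probability under a content-free context."""
    return log_prob(answer, prompt) - log_prob(answer, null_prompt)

prompt = "Q: What barks? A:"
for answer in ("dog", "canine"):
    print(answer,
          f"raw={log_prob(answer, prompt):.1f}",
          f"pmi={pmi_score(answer, prompt):+.1f}")
# Raw probability favors "dog"; PMI favors "canine", whose probability
# is raised more by the prompt relative to its low prior frequency.
```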
◆ Alexander Rush, Cornell University
Keynote Topic: Prompting, Metadatasets, and Zero-Shot NLP
Short Bio: Alexander Rush is a Professor at Cornell Tech and a researcher at Hugging Face. His work is at the intersection of natural language processing and probabilistic deep learning, with applications in text generation and efficient inference. He has written several popular open-source software projects supporting NLP research and data science, as well as pedagogical implementations of popular libraries. He is the secretary of ICLR and developed the MiniConf software used to run ML/NLP virtual conferences during COVID. His work has received paper and demo awards at major NLP, visualization, and hardware conferences, an NSF CAREER Award, and a Sloan Fellowship.
Homepage: https://rush-nlp.com/.
Abstract: The paradigm of NLP tasks is changing, expanding from mostly single-dataset supervised learning in structured form to multi-dataset semi-supervised learning expressed in natural language. This talk focuses on T0, a large-scale language model trained on multitask prompted data (Sanh et al., 2022). Despite being an order of magnitude smaller than GPT-3-class models, T0 exhibits similar zero-shot accuracy on unseen task categories. In addition to the modeling elements, this talk highlights the community processes of collecting datasets and prompts for models at this scale. The work was done as part of BigScience, an international, collaborative effort to study large language models.
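To make "multitask prompted data" concrete: the toy template below turns a labeled NLI example into an (input, target) text pair of the kind T0 trains on. The template wording and the example are invented for illustration; the actual T0 prompts come from a community-written collection.

```python
# Sketch of a T0-style prompt template: a natural-language template
# turns a structured, labeled example into plain text-to-text data.

def apply_template(example: dict) -> tuple:
    """Render one NLI example as an (input text, target text) pair."""
    prompt = (
        f"{example['premise']}\n"
        f"Question: Does the previous sentence imply that "
        f"\"{example['hypothesis']}\"? Yes, no, or maybe?"
    )
    target = ["Yes", "Maybe", "No"][example["label"]]
    return prompt, target

example = {
    "premise": "A dog is sleeping on the porch.",
    "hypothesis": "An animal is resting.",
    "label": 0,  # entailment
}
inp, tgt = apply_template(example)
print(inp, "->", tgt)
```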