Automatic Differentiation of Functional Programs and its use for Probabilistic Programming
Jeffrey Mark Siskind
School of Electrical and Computer Engineering
It is extremely useful to be able to take derivatives of functions expressed as computer programs to support function optimization and approximation, parameter estimation, machine learning, and ultimately computational science and engineering design. The established discipline of Automatic Differentiation (AD) has largely focussed on imperative languages, where it is most efficiently implemented as a source-to-source transformation performed by a preprocessor. This talk will present a novel formulation of AD for functional programs expressed in the lambda calculus. A key novel aspect of our formulation is that AD is performed by higher-order functions that reflectively transform closure bodies. Our methods exhibit an important closure property that prior AD formulations lack: the ability to transform the entire space of input programs, including those produced by the AD transformations themselves. This is crucial for solving the kinds of nested optimizations that arise in domains such as game theory and automatic control. Furthermore, since the input and output of our transformations is the lambda calculus, efficient implementation is facilitated by novel extensions of standard compilation techniques. We exhibit a novel "almost" union-free polyvariant flow-analysis algorithm, formulated as abstract interpretation, that partially evaluates calls to the AD operators, migrating reflective source-to-source transformation to compile time. This yields code with run-time performance that exceeds the best existing AD implementations for imperative languages by a factor of two and outperforms all AD implementations for functional languages by two to three orders of magnitude.
AD has traditionally been applied to purely numeric programs written in imperative languages like Fortran. Our novel methods can be applied to mixed symbolic-numeric programs written in functional languages. This is useful for developing complex stochastic models such as is done in the emerging field of probabilistic programming. We demonstrate how our methods support this enterprise by constructing evaluators for two different probabilistic programming languages, one based on Scheme and one based on Prolog, and using both forward-mode and reverse-mode variants of our AD methods to take the gradients of such evaluators executing probabilistic programs in their respective target languages in order to perform gradient-based maximum-likelihood estimation of the distribution parameters of the free random variables in those programs. We demonstrate that for this domain our methods yield performance that exceeds straightforward implementation of AD in functional languages by many orders of magnitude.
Joint work with Barak A. Pearlmutter
Jeffrey M. Siskind received the B.A. degree in computer science from the Technion, Israel Institute of Technology, Haifa, in 1979, the S.M. degree in computer science from the Massachusetts Institute of Technology (M.I.T.), Cambridge, in 1989, and the Ph.D. degree in computer science from M.I.T. in 1992. He did a postdoctoral fellowship at the University of Pennsylvania Institute for Research in Cognitive Science from 1992 to 1993. He was an assistant professor at the University of Toronto Department of Computer Science from 1993 to 1995, a senior lecturer at the Technion Department of Electrical Engineering in 1996, a visiting assistant professor at the University of Vermont Department of Computer Science and Electrical Engineering from 1996 to 1997, and a research scientist at NEC Research Institute, Inc. from 1997 to 2001. He joined the Purdue University School of Electrical and Computer Engineering in 2002 where he is currently an associate professor. His research interests include machine vision, artificial intelligence, cognitive science, computational linguistics, child language acquisition, and programming languages and compilers.