Archive for October 2010

## Automatic Differentiation

I asked my son Alexey Radul what exactly he is doing for his postdoc at the Hamilton Institute in Ireland. Here is his reply:

The short, jargon-loaded version: We are building an optimizing compiler for a programming language with first-class automatic differentiation, and exploring mathematical foundations, connections, applications, etc.

Interpretation of jargon:

Automatic differentiation is a technique for turning a program that computes a function into a program that computes that function together with its derivative; with a constant factor overhead. This is better than the usual symbolic differentiation that, say, Mathematica does because there is no intermediate-expression bulge. For example, if your function is a large product

Product f1(x) f2(x) … fn(x),

the symbolic derivative has size n2

Sum (Product f1′(x) f2(x) … fn(x)) (Product f1(x) f2′(x) … fn(x)) … (Product f1(x) f2(x) … fn’(x))

automatic differentiation avoids that cost. Automatic (as opposed to symbolic) differentiation also extends to conditionals, data structures, higher-order functions, and all the other wonderful things that distinguish a computer program from a mathematical expression.

First-class means that the differentiation operations are normal citizens of the programming language. This is not the case with commonly used automatic differentiation systems, which are all preprocessors that rewrite C or Fortran source code. In particular, we want to be able to differentiate any function written in the language, even if it is a derivative of something, or contains a derivative of something, etc. The automatic differentiation technique works but becomes more complicated in the presence of higher order, multivariate, or nested derivatives.

We are building an optimizing compiler because the techniques necessary to get good performance and correct results with completely general automatic differentiation are exactly the techniques used to produce aggressive optimizing compilers for functional languages, so we might as well go all the way.

It appears that the AD trick (or at least half of it) is just an implementation of synthetic differential geometry in the computer. This leads one to hope that a good mathematical foundation can be found that dictates the behavior of the system in all the interesting special cases; there is lots of math to be thought about in the vicinity of this stuff.

Applications are also plentiful. Any time you want to optimize anything with respect to real parameters, gradients help. Any time you are dealing with curves, slopes help. Computer graphics, computer vision, physics simulations, economic and financial models, probabilities — there’s so much stuff to apply a high quality such system to that we don’t know where to begin.

## Modern Coin-Weighing Puzzles

I usually give a lot of lectures and I never used to announce them in my blog. This time I will give a very accessible lecture at the MIT “Women in Mathematics” series. It will be on Wednesday October 6th at 5:30-6:30 PM in room 2-135. If you are in Boston, feel free to join. Here is the abstract.

I will discuss several coin-weighing puzzles and related research. Here are two examples of such puzzles:

1. Among 10 given coins, some may be real and some may be fake. All real coins weigh the same. All fake coins weigh the same, but have a different weight than real coins. Can you prove or disprove that all ten coins weigh the same in three weighings on a balance scale?

2. Among 100 given coins, four are fake. All real coins weigh the same. All fake coins weigh the same, but they are lighter than real coins. Can you find at least one real coin in two weighings on a balance scale?

You are not expected to come to my talk with the solutions to the above puzzles, but you are expected to know how to find the only fake coin among many real coins in the minimum number of weighings.