log-dot-exp: generalizing the log-sum-exp trick

In scientific computing applications we often have to work with very small positive numbers.  With finite-precision representations (like 32- or 64-bit floats), this can be a problem when precision matters even at the smallest scales.  In these cases, it is useful to represent numbers in log space and perform computations directly in that space.  Working in log space can make a numerically unstable algorithm stable.  Suppose, for example, that we want to know the sum of $n$ positive real numbers: $Z = \phi_1 + \dots + \phi_n$.  In log space, we would instead work with $\theta_i = \log{(\phi_i)}$.  To perform this summation in log space, we need to evaluate the following expression:

$$ \log Z = \log \sum_{i} \exp{(\theta_i)} $$

When done naively, this computation can be numerically unstable, e.g., if $\theta_i$ is large then $\exp{(\theta_i)}$ may cause an overflow error.  The Log-Sum-Exp trick provides a way to do this computation in a numerically stable manner.  In particular, let $\theta_* = \max_{i} \theta_i$.  Then we can equivalently compute:

$$ \log Z = \theta_* + \log \sum_{i} \exp{(\theta_i - \theta_*)} $$

Performing the computation this way is equivalent but numerically stable, because the largest value we need to exponentiate is $0$.  In python, one can simply call scipy.special.logsumexp(theta) to perform this computation.
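As a quick illustration (the numbers below are made up just to trigger an overflow), the naive computation blows up while the shifted version does not:

    import numpy as np
    from scipy.special import logsumexp

    theta = np.array([1000.0, 999.0, 998.0])  # large log-space values

    # Naive: np.exp(1000.0) overflows to inf, so the result is inf
    naive = np.log(np.sum(np.exp(theta)))

    # Log-Sum-Exp trick: the largest exponentiated value is exp(0) = 1
    stable = theta.max() + np.log(np.sum(np.exp(theta - theta.max())))

    print(naive, stable, logsumexp(theta))  # stable and logsumexp agree (~1000.41)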

But what if we want to perform other computations in log space in a numerically stable manner?  Below I provide three generalizations of this trick that I have personally needed a couple of times.  As with the log-sum-exp trick, it's not hard to work out these generalizations yourself; they just might not be obvious at first glance.

For a simple dot product between two vectors $\mathbf{a}$ and $\mathbf{b}$, we can use the following Log-Dot-Exp trick:

$$ \log{\Big( \exp{(\mathbf{a})}^T \exp{(\mathbf{b})} \Big)} = \log{\Big( \sum_i \exp{(a_i)} \exp{(b_i)} \Big)} = \log \sum_i \exp{(a_i + b_i)} $$

Thus, log-dot-exp just simplifies to log-sum-exp of $\mathbf{a}+\mathbf{b}$, and can be implemented efficiently in python using

        scipy.special.logsumexp(a+b) 
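For example (with made-up values small enough that the direct computation is still safe to compare against):

    import numpy as np
    from scipy.special import logsumexp

    a = np.log(np.array([0.2, 0.5, 0.3]))
    b = np.log(np.array([0.1, 0.6, 0.3]))

    direct = np.log(np.exp(a) @ np.exp(b))  # only safe when values are moderate
    stable = logsumexp(a + b)
    print(direct, stable)  # both equal log(0.2*0.1 + 0.5*0.6 + 0.3*0.3)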

Now suppose we want to compute the matrix-vector product between a matrix $\mathbf{A}$ and a vector $\mathbf{b}$, when both are represented in log space (note that in python, this computation is also called "dot").  This computation produces a vector $\mathbf{y}$, where each entry is a dot product between a row of $\mathbf{A}$ and $\mathbf{b}$.  Thus, it just reduces to repeated calls to Log-Dot-Exp from above.  In python, we could write

  np.array([scipy.special.logsumexp(a + b) for a in A]) 

which will iterate through the rows of $\mathbf{A}$ one by one and perform the computation for each. This works, but in practice it would be better to avoid the list comprehension in favor of a vectorized implementation. That is, we could equivalently write

  scipy.special.logsumexp(A + b[None], axis=1)
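Putting this together, a minimal sketch of a log-space matrix-vector product could look like the following (logdotexp is just a name I made up for illustration):

    import numpy as np
    from scipy.special import logsumexp

    def logdotexp(logA, logb):
        """Compute log(exp(logA) @ exp(logb)) stably, entirely in log space."""
        # Broadcast logb across the rows of logA, then log-sum-exp each row.
        return logsumexp(logA + logb[None, :], axis=1)

    # Quick check against the direct computation on moderate values
    logA = np.log(np.array([[0.2, 0.8], [0.5, 0.5]]))
    logb = np.log(np.array([0.4, 0.6]))
    print(np.exp(logdotexp(logA, logb)))  # should match exp(logA) @ exp(logb)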

Finally, suppose we want to evaluate the matrix-vector product where $\mathbf{A}$ is not represented in log space, but $\mathbf{b}$ is.  That is, we want to evaluate $ \log{\big( \mathbf{A} \exp{(\mathbf{b})} \big)} $.  This computation can be useful when $\mathbf{A}$ contains zeros or negative entries, which have no log-space representation.  However, one has to be careful that every entry of $\mathbf{A} \exp{(\mathbf{b})}$ is positive, or else the $\log$ is undefined.  This computation can be performed similarly to Log-Sum-Exp.  Let's consider one entry at a time, where $\mathbf{a}$ is a row of $\mathbf{A}$, so we need to compute $\log{( \mathbf{a}^T \exp{(\mathbf{b})} )}$.  Letting $b_* = \max_i b_i$, we have:

$$ \log{( \mathbf{a}^T \exp{(\mathbf{b})} )} = \log \sum_i a_i \exp{(b_i)} = \log \sum_i a_i \exp{(b_i - b_*)} \exp{(b_*)} = b_* + \log \sum_i a_i \exp{(b_i - b_*)} $$

The entire matrix-vector product can be done in vectorized form using

    b.max() + np.log(A @ np.exp(b - b.max()))
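As a quick sanity check (with made-up values chosen so that $\mathbf{A} \exp{(\mathbf{b})}$ is positive), this agrees with the direct computation:

    import numpy as np

    A = np.array([[1.0, -0.5, 2.0],
                  [0.0,  3.0, 1.0]])       # not in log space; may contain zeros/negatives
    b = np.log(np.array([0.4, 0.1, 0.5]))  # in log space

    direct = np.log(A @ np.exp(b))                      # safe here since values are moderate
    stable = b.max() + np.log(A @ np.exp(b - b.max()))
    print(direct, stable)                               # should match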
