### Kronecker Product for Matrices with Special Structure Using LinearOperators

I have been working with very large matrices for my research lately.  Some of them are sparse, while others are dense but have special structure that can be exploited to represent them efficiently.  I have also been constructing enormous matrices from these smaller building blocks using the Kronecker product.  Both numpy and scipy have built-in functions to compute the Kronecker product of two matrices, but they require the input matrices to be explicitly represented (as a dense numpy array or a sparse scipy matrix), so they are not compatible with the special-structure matrices that I've been dealing with.  I recently solved this problem, and that is the topic of this blog post.

#### An Example

As an example to make things more concrete, consider the two matrices shown below:

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$

As we scale up, $A$, which is a row of all ones stacked on top of an identity matrix, is very sparse, while $B$, the lower-triangular matrix of all ones, is dense but has very special structure.  The Kronecker product of these matrices is a new $16 \times 12$ matrix obtained by multiplying each (scalar) entry of $A$ by the matrix $B$ to form a block matrix.

$$A \otimes B = \begin{bmatrix} B & B & B \\ B & 0 & 0 \\ 0 & B & 0 \\ 0 & 0 & B \end{bmatrix}$$

where now the $0$'s refer to $4 \times 4$ matrices.

#### Exploiting Special Structure

A matrix is nothing more than an explicit way to represent a linear transformation.  Representing $B$ from the example above explicitly is wasteful (as it grows larger, that is) because the linear transformation $u = B v$ of the vector $v$ can be computed very efficiently by a simple running-sum algorithm: $u[0] = v[0]$; $u[i] = u[i-1] + v[i]$.  It is possible to do a lot, even if we are only equipped with functions like this to compute $B v$ and $B^T v$ for arbitrary vectors $v$.  For example, methods like GMRES, an iterative Krylov subspace method for solving a linear system of equations, only require the ability to apply $B$ (and, for some methods, $B^T$) to vectors, and $B$ can be passed in explicitly as a dense numpy array or sparse scipy matrix, or implicitly as a scipy sparse LinearOperator.  We can instantiate a LinearOperator for $B$ using the simple algorithm above together with a similar algorithm for computing $B^T v$, which allows us to use GMRES (or other iterative solvers) to solve the system of equations $B x = y$ for $x$.  Note that for this particular example it is trivial to solve $B x = y$ without GMRES, using an algorithm similar to the one used to express the linear transformation, but for other matrices with special structure the linear transformation may be cheap to compute while solving a linear system is non-trivial.
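
A minimal sketch of this setup, implementing $B v$ with `np.cumsum` and $B^T v$ with a reversed cumulative sum (this snippet is illustrative, not part of the final implementation below):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

n = 4

# B v is a running sum of v; B^T v is a reversed running sum.
B = LinearOperator(
    (n, n),
    matvec=lambda v: np.cumsum(v),
    rmatvec=lambda v: np.cumsum(v[::-1])[::-1],
    dtype=float,
)

# Solve B x = y iteratively; B is never formed as an explicit matrix.
y = np.array([1.0, 3.0, 6.0, 10.0])  # equals B @ [1, 2, 3, 4]
x, info = gmres(B, y)
```

GMRES itself only ever calls `matvec`, so $B$ stays implicit throughout the solve.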

To exploit the special structure of $A$, it suffices to use a scipy sparse matrix, which only stores the indices and values of the non-zero entries.
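
For the example $A$, a sketch using scipy's sparse matrices:

```python
import numpy as np
from scipy import sparse

# A row of all ones stacked on top of a 3 x 3 identity matrix.
A = sparse.vstack([sparse.csr_matrix(np.ones((1, 3))), sparse.identity(3)]).tocsr()

# Only the 6 non-zero entries are stored (A.nnz == 6).
```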

#### Kronecker Product with Linear Operators

In scipy, a LinearOperator encodes an $n \times m$ linear transformation $P$ as two functions: one that computes $P v$ implicitly, and another that computes $P^T v$ implicitly.  I use the word implicitly here to emphasize that $P$ is never represented in matrix form.  Just like regular matrices, LinearOperators can be added together (if they have the same shape) and multiplied together (if their shapes are compatible).  From the scipy documentation, "The result of these operations is always a new, composite LinearOperator, that defers linear operations to the original operators and combines the results".  For example, matrix multiplication is just function composition: for two linear operators $P$ and $Q$ such that $P v = f(v)$ and $Q u = g(u)$, we have $(P Q) u = P (Q u) = f(g(u))$.
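
A small sketch of this composition behavior, wrapping plain numpy arrays with `aslinearoperator` for illustration:

```python
import numpy as np
from scipy.sparse.linalg import aslinearoperator

# Two small operators wrapped from explicit matrices for illustration.
P = aslinearoperator(np.array([[1.0, 2.0], [3.0, 4.0]]))
Q = aslinearoperator(np.array([[0.0, 1.0], [1.0, 0.0]]))

# P * Q and P + Q are composite LinearOperators: applying them
# defers to the matvec functions of P and Q and combines the results.
u = np.array([1.0, 2.0])
composed = (P * Q).matvec(u)  # same as applying Q, then P
summed = (P + Q).matvec(u)    # same as applying P and Q, then adding
```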

While scipy provides functions to multiply and add LinearOperators, it does not provide functions for other matrix operations we might want to perform with LinearOperators.  For example, both numpy and scipy have functions for stacking matrices vertically (vstack) and horizontally (hstack), but analogous functions are not provided for LinearOperators.  Similarly, there is no provided implementation of the Kronecker product - that's okay, though, because we can define all of these things ourselves!

Returning to the example from earlier (although what proceeds applies more generally), we can use the following identity for Kronecker products:

$$(A \otimes B) vec(X) = vec(A X B^T)$$

where $X$ is a $3 \times 4$ matrix and $vec(Z)$ flattens the input matrix $Z$ into a column vector by stacking the rows of $Z$.  (There is a similar version of the identity when $vec(Z)$ stacks the columns of $Z$ instead.)  We can rewrite the equation above in terms of a column vector $x$ as

$$(A \otimes B) x = vec(A \, mat(x) \, B^T)$$

where $mat$ is the inverse of $vec$ - it turns a column vector back into a matrix of known size.  We can apply this equation to define the functions required to instantiate a LinearOperator for $A \otimes B$ in terms of the functions defined for $A$ and $B$.  First note that $C = A \, mat(x) = \begin{bmatrix} A x_1 & A x_2 & A x_3 & A x_4 \end{bmatrix}$, where $x_i$ is the $i^{th}$ column of $mat(x)$, so $C$ can be computed by applying $A$ to each column of $mat(x)$.  In our example $C$ is a $4 \times 4$ matrix, and the problem reduces to finding $vec(C B^T)$.  Using the self-inverse property of the transpose, we can rewrite this in a form that a LinearOperator can deal with: $vec((B C^T)^T)$.  We are back in the same setting as before - a linear operator $B$ multiplied by a matrix $C^T$ - so we can use the same idea and apply $B$ to each column of $C^T$.
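
A quick numerical check of this identity with dense arrays (numpy's row-major `reshape` and `flatten` match the row-stacking convention for $vec$ used here):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 4))
x = rng.standard_normal(12)

# (A kron B) x == vec(A mat(x) B^T), with vec stacking rows.
lhs = np.kron(A, B) @ x
rhs = (A @ x.reshape(3, 4) @ B.T).flatten()
```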

I have shown how to define the linear transformation $A \otimes B$ in terms of the linear transformations $A$ and $B$ without needing to explicitly represent any of them - but what about the transpose, $(A \otimes B)^T$?  The math works out roughly the same; we just use the identity $(A \otimes B)^T = A^T \otimes B^T$.

#### A Python Implementation

Below is an implementation of this idea in python:
```python
from scipy.sparse import linalg

def kron(A, B):
    # A and B are linalg.LinearOperators
    am, an = A.shape
    bm, bn = B.shape
    shape = (am * bm, an * bn)

    def matvec(x):
        X = x.reshape(an, bn)
        return B.matmat(A.matmat(X).T).T.flatten()

    def rmatvec(x):
        X = x.reshape(am, bm)
        return B.H.matmat(A.H.matmat(X).T).T.flatten()

    return linalg.LinearOperator(shape, matvec, rmatvec, dtype=float)
```
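
As a usage sketch, we can check the implicit operator against numpy's explicit Kronecker product on the matrices from the example (the kron definition is repeated here so the snippet is self-contained):

```python
import numpy as np
from scipy.sparse import linalg

def kron(A, B):
    # A and B are linalg.LinearOperators (the implementation from above).
    am, an = A.shape
    bm, bn = B.shape
    shape = (am * bm, an * bn)
    def matvec(x):
        X = x.reshape(an, bn)
        return B.matmat(A.matmat(X).T).T.flatten()
    def rmatvec(x):
        X = x.reshape(am, bm)
        return B.H.matmat(A.H.matmat(X).T).T.flatten()
    return linalg.LinearOperator(shape, matvec, rmatvec, dtype=float)

# The matrices from the example, wrapped as LinearOperators.
A = np.array([[1, 1, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
B = np.tril(np.ones((4, 4)))
K = kron(linalg.aslinearoperator(A), linalg.aslinearoperator(B))

# Compare against the explicit Kronecker product on a test vector.
x = np.arange(12, dtype=float)
implicit = K.matvec(x)
explicit = np.kron(A, B) @ x
```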
The final implementation of kron turned out to be quite nice, and the code for vstack and hstack can be made similarly clean.  For my research, getting this working was a big breakthrough: before, my only option was to explicitly represent the dense lower-triangular matrix, which caused a lot of performance problems (mainly memory consumption).  The new LinearOperator-based implementation has much better performance, which allows me to scale up to bigger problems.

Note that if you plan to take the Kronecker product of more than two matrices, that can be done with functools.reduce(kron, matrices), where matrices is a list of LinearOperators.  However, since the composite operator defers evaluation to the original operators, this can be pretty inefficient if it composes too many things.  An easy workaround is to use scipy.sparse.kron when the matrices can be represented efficiently as sparse matrices, and only use the LinearOperator-based kron when it is necessary.

I hope you found this blog post interesting, or maybe even useful if you are currently facing the same problems that I was when I figured all this stuff out.