A central question in the theory of algorithms is to determine the constant , called the exponent of matrix multiplication. This constant is defined as the infimum of all real numbers such that for all there is an algorithm for multiplying matrices running in time . Until the late 1960s it was believed that , i.e. that no improvement can be found for the problem. In 1969, Strassen surprised everyone by showing that two matrices can be multiplied in time. This discovery spawned a twenty-year-long extremely productive time in which the upper bound on was gradually lowered to . After a twenty-year stall, some very recent research has brought the upper bound down to .

**Bilinear algorithms and recursion.**

Strassen’s approach was to exploit the inherent recursive nature of matrix multiplication: the product of two matrices can be viewed as the product of two matrices, the entries of which are matrices. Suppose that we have an algorithm *ALG* that runs in time and multiplies two matrices. Then one can envision obtaining a fast recursive algorithm for multiplying matrices (for any integer ) as well: view the matrices as matrices the entries of which are matrices; then multiply the matrices using *ALG* and when *ALG* requires us to multiply two matrix entries, recurse.

This approach only works, provided that the operations that *ALG* performs on the matrix entries make sense as matrix operations: e.g. entry multiplication, taking linear combination of entries etc. One very general type of such algorithm is the so called *bilinear* algorithm: Given two matrices and , compute products

i.e. take possibly different linear combinations of entries of and multiply each one with a possibly different linear combination of entries of . Then, compute each entry of the product as a linear combination of the : .

Given a bilinear algorithm *ALG* for multiplying two matrices (for constant ) that computes products , the recursive approach that multiplies matrices using *ALG* gives a bound . To see this, notice that the number of additions that one has to do is no more than : at most to compute the linear combinations for each and at most for each of the outputs . Since matrix addition takes linear time in the matrix size, we have a recurrence of the form .

As long as we get a nontrivial bound on . Strassen’s famous algorithm used and thus showing that . A lot of work went into getting better and better “base algorithms” for varying constants . Methods such as Pan’s method of trilinear aggregation were developed. This approach culminated in Pan’s algorithm (1978) for multiplying matrices that used products and hence showed that **.**

**Approximate algorithms and Schonhage’s theorem.**

A further step was to look at more general algorithms, so called *approximate* bilinear algorithms. In the definition of a bilinear algorithm the coefficients were constants. In an approximate algorithm, these coefficients can be formal linear combinations of integer powers of an indeterminate, (e.g. ). The entries of the product are then only “approximately” computed, in the sense that , where the term is a linear combination of *positive* powers of . The term “approximate” comes from the intuition that if you set to be close to , then the algorithm would get the product almost exactly.

Interestingly enough, Bini et al. (1980) showed that when dealing with the asymptotic complexity of matrix multiplication, approximate algorithms suffice for obtaining bounds on . This is not obvious! What Bini et al. show, in a sense, is that as the size of the matrices grows, the “approximation” part can be replaced by a sort of bookkeeping which does not present an overhead asymptotically. The upshot is that if there is an *approximate* bilinear algorithm that computes products to compute the product of two matrices, then .

Bini et al. (1979) gave the first approximate bilinear algorithm for a matrix product. Their algorithm used entry products to multiply a matrix with a matrix. Although this algorithm is for rectangular matrices, it can easily be converted into one for square matrices: a matrix is a matrix with entries that are matrices with entries that are matrices, and so multiplying matrices can be done recursively using Bini et al.’s algorithm three times, taking entry products. Hence .

Schonhage (1981) developed a sophisticated theory involving the bilinear complexity of rectangular matrix multiplication that showed that approximate bilinear algorithms are even more powerful. His paper culminated in something called the Schonhage -theorem, or the asymptotic sum inequality. This theorem is one of the most useful tools in designing and analyzing matrix multiplication algorithms.

Schonhage’s -theorem says roughly the following. Suppose we have several instances of matrix multiplication, each involving matrices of possibly different dimensions, and we are somehow able to design an approximate bilinear algorithm that solves all instances and uses fewer products than would be needed when computing each instance separately. Then this bilinear algorithm can be used to multiply (larger) square matrices and would imply a nontrivial bound on .

What is interesting about Schonhage’s theorem is that it is believed that when it comes to *exact* bilinear algorithms, one cannot use fewer products to compute several instances than one would use by just computing each instance separately. This is known as Strassen’s additivity conjecture. Schonhage showed that the additivity conjecture is false for *approximate* bilinear algorithms. In particular, he showed that one can approximately compute the product of a by a vector and the product of a by a vector together using only entry products, whereas any exact bilinear algorithm would need at least products. His theorem then implied , and this was a huge improvement over the previous bound of Bini et al.

**Using fast solutions for problems that are not matrix multiplications.**

The next realization was that there is no immediate reason why the “base algorithm” that we use for our recursion has to compute a matrix product at all. Let us focus on the following family of computational problems. We are given two vectors and and we want to compute a third vector . The dependence of on and is given by a three-dimensional tensor as follows: . The vector is a bilinear form. The tensor can be arbitrary, but let us focus on the case where . Notice that completely determines the computational problem. Some examples of such bilinear problems are polynomial multiplication and of course matrix multiplication. For polynomial multiplication, if and only if , and for matrix multiplication, if and only if and .

The nice thing about these bilinear problems is that one can easily extend the theory of bilinear algorithms to them. A bilinear algorithm computing a problem instance for tensor computes products of the form and then sets . Here, an algorithm is nontrivial if the number of products that it computes is less than the number of positions where the tensor is nonzero.

In order to be able to talk about recursion for general bilinear problems, it is useful to define the tensor product of two tensors and : . Thus, the bilinear problem defined by can be viewed as a bilinear problem defined by , where each product is actually itself a bilinear problem defined by .

This allows one to compute an instance of the problem defined by using an algorithm for and an algorithm for . One can similarly define the tensor power of a tensor as tensor-multiplying by itself times. Then any bilinear algorithm computing an instance defined by using entry products can be used recursively to compute the tensor power of using products, just as in the case of matrix multiplication.

A crucial development in the study of matrix multiplication algorithms was the discovery that sometimes algorithms for bilinear problems that do not look at all like matrix products can be converted into matrix multiplication algorithms. This was first shown by Strassen in the development of his “laser method” and was later exploited in the work of Coppersmith and Winograd (1987,1990). The basic idea of the approach is as follows.

Consider a bilinear problem for which you have a really nice approximate algorithm *ALG* that uses entry products. Take the tensor power of (for large ), and use* ALG* recursively to compute using entry products. is a bilinear problem that computes a long vector from two long vectors and . Suppose that we can embed the product of two matrices and into as follows: we put each entry of into some position of and set all other positions of to , we similarly put each entry of into some position of and set all other positions of to , and finally we argue that each entry of the product is in some position of the computed vector (all other entries are ). Then we would have a bilinear algorithm for computing the product of two matrices using entry products, and hence .

The goal is to make as large of a function of as possible, thus minimizing the upper bound on .

Strassen’s laser method and Coppersmith and Winograd’s paper, and even Schonhage’s -theorem, present ways of embedding a matrix product into a large tensor power of a different bilinear problem. The approaches differ in the starting algorithm and in the final matrix product embedding. We’ll give a very brief overview of the Coppersmith-Winograd algorithm.

**The Coppersmith-Winograd algorithm.**

The bilinear problem that Coppersmith and Winograd start with is as follows. Let be an integer. Then we are given two vectors and of length and we want to compute a vector of length defined as follows:

,

for , and .

Notice that is far from being a matrix product. However, it is related to matrix products:

which is the inner product of two -length vectors,

, , and , which are three scalar products, and

the two matrix products computing and for which are both products of a vector with a scalar.

If we could somehow convert Coppersmith and Winograd’s bilinear problem into one of computing these products as *independent *instances, then we would be able to use Schonhage’s -theorem. Unfortunately, however, the matrix products are merged in a strange way, and it is unclear whether one can get anything meaningful out of an algorithm that solves this bilinear problem.

Coppersmith and Winograd develop a multitude of techniques to show that when one takes a large tensor power of the starting bilinear problem, one can actually decouple these merged matrix products, and one can indeed apply the -theorem. The -theorem then gives the final embedding of a large matrix product into a tensor power of the original construction, and hence defines a matrix multiplication algorithm.

Their approach combines several impressive ingredients: sets avoiding -term arithmetic progressions, hashing and the probabilistic method. The algorithm computing their base bilinear problem is also impressive. The number of entry products it computes is , which is exactly the length of the output vector ! That is, their starting algorithm is optimal for the particular problem that they are solving.

What is not optimal, however, is the analysis of how good of a matrix product algorithm one can obtain from the base algorithm. Coppersmith and Winograd noticed this themselves: They first applied their analysis to the original bilinear algorithm and obtained an embedding of an matrix product into the -tensor power of the bilinear problem for some explicit function . (Then they took to go to infinity and obtained .) Then they noticed, that if one applies the analysis to the second tensor power of the original construction, then one obtains an embedding of an matrix product into the same -tensor power, where . That is, although one is considering embeddings into the same () tensor power of the construction, the analysis crucially depends on which tensor power of the construction you start from! This led to the longstanding bound . Coppersmith and Winograd left as an open problem what bound one can get if one starts from the third or larger tensor powers.

**The recent improvements on .**

It seems that many researchers attempted to apply the analysis to the third tensor power of the construction, but this somehow did not improve bound on . Because of this and since each new analysis required a lot of work, the approach was abandoned, at least until 2010. In 2010, Andrew Stothers carried through the analysis on the fourth tensor power and discovered that the bound on can be improved to .

As mentioned earlier, the original Coppersmith-Winograd bilinear problem was related to different matrix multiplication problems that were merged together. The tensor power of the bilinear problem is similarly composed of merged instances of simpler bilinear ptoblems, however these instances may no longer be matrix multiplications. When applying a Coppersmith-Winograd-like analysis to the tensor power, there are two main steps.

The first step involves analyzing each of the bilinear problems, intuitively in terms of how close they are to matrix products; there is a formal definition of the similarity measure called the *value *of the bilinear form. The second step defines a family of matrix product embeddings in the tensor power in terms of the values. These embeddings are defined via some variables and constraints, and each represents some matrix multiplication algorithm. Finally, one solves a nonlinear optimization program to find the best among these embeddings, essentially finding the best matrix multiplication algorithm in the search space.

Both the Coppersmith-Winograd paper and Stothers’ thesis perform an entirely new analysis for each new tensor power . The main goal of my work was to provide a general framework so that the two steps of the analysis do not have to be redone for each new tensor power . My paper first shows that the first step, the analysis of each of the values, can be completely automated by solving linear programs and simple systems of linear equations. This means that instead of proving theorems one only needs to solve linear programs and linear systems, a simpler task. My paper then shows that the second step of the analysis, the theorem defining the search space of algorithms, can also be replaced by just solving a simple system of linear equations. (Amusingly, the fact that matrix multiplication algorithms can be used to solve linear equations implies that good matrix multiplication algorithms can be used to search for better matrix multiplication algorithms.) Together with the final nonlinear program, this presents a fully automated approach to performing a Coppersmith-Winograd-like analysis.

After seeing Stothers’ thesis in the summer of last year, I was impressed by a shortcut he had used in the analysis of the values of the fourth tensor power. This shortcut gave a way to use recursion in the analysis, and I was able to incorporate it in my analysis to show that the number of linear programs and linear systems one would need to solve to compute the values for the tensor power drops down to , at least when is a power of . This drop in complexity allowed me to analyze the tensor power, thus obtaining an improvement in the bound of : .

There are several lingering open questions. The most natural one is, how does the bound on change when applying the analysis to higher and higher tensor powers. I am currently working together with a Stanford undergraduate on this problem: we’ll apply the automated approach to several consecutive powers, hoping to uncover a pattern so that one can then mathematically analyze what bounds on can be proven with this approach.

A second open question is more far reaching: the Coppersmith-Winograd analysis is not optimal– in a sense it computes an approximation to the best embedding of a matrix product in the tensor power of their bilinear problem. What is the optimal embedding? Can one analyze it mathematically? Can one automate the search for it?

Finally, I am fascinated by automating the search for algorithms for problems. In the special case of matrix multiplication we were able to define a search space of algorithms and then use software to optimize over this search space. What other computational problems can one approach in this way?

September 25, 2012 at 7:11 am |

The Salem-Spencer sets (those avoiding 3-term arithmetic progressions) can be avoided: see Section 15.7 of Algebraic Complexity Theory (Bürgisser et al.), which contains an alternative construction due to Strassen.