Matrix multiplication has become a popular topic recently. The main goal in this area is determining the value of the matrix multiplication constant $\omega$, which is the infimum over all $\tau$ such that two $n \times n$ complex matrices can be multiplied using $O(n^\tau)$ arithmetic operations (addition, subtraction, multiplication, and division; arbitrary complex constants can appear as operands). Sloppy authors often define $\omega$ as a minimum instead of an infimum, but it is not clear that any $O(n^\omega)$ algorithm exists. Indeed, given the common belief that $\omega = 2$, a classic result of Ran Raz which gives a lower bound of $\Omega(n^2 \log n)$, assuming all the constants used are small, suggests that there is no $O(n^2)$ algorithm (though there could be an asymptotically optimal algorithm, say one which runs in time $O(n^2 \log n)$).

In this blog post we describe a result due to Coppersmith and Winograd that implies that a certain class of techniques *provably* cannot yield an optimal exponent, i.e. an $O(n^\omega)$ algorithm: namely, all algorithms which result from a single invocation of Schönhage’s *asymptotic sum inequality*. This class includes Strassen’s original algorithm and all algorithms since, up to Strassen’s *laser method*, used in the celebrated $O(n^{2.376})$ algorithm of Coppersmith and Winograd; the laser method corresponds to infinitely many invocations of the asymptotic sum inequality, and so is not subject to this limitation. The proof proceeds by showing how any identity (to which the asymptotic sum inequality can be applied) can be improved to another identity yielding a better bound on $\omega$.

**Strassen’s original algorithm and the language of tensors**

In 1969, Strassen stunned the world of science by describing a matrix multiplication algorithm running in time $O(n^{\log_2 7}) \approx O(n^{2.81})$. The algorithm relies on an identity showing how to multiply two $2 \times 2$ matrices in non-commuting variables using $7$ “essential” multiplications (that is, not including multiplication by constants). Running this algorithm recursively (by treating a $2n \times 2n$ matrix as a $2 \times 2$ matrix whose entries are $n \times n$ matrices) allows us to multiply two $n \times n$ matrices using $O(n^{\log_2 7})$ essential multiplications, and the resulting algorithm has complexity $O(n^{\log_2 7})$.
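To make the recursive construction concrete, here is a short Python sketch of Strassen's recursion (the helper names and the power-of-two size restriction are ours, for simplicity):

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def strassen(A, B):
    """Multiply two n x n matrices (n a power of two) using 7 recursive products."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    # Split each matrix into a 2 x 2 matrix of h x h blocks.
    q = lambda M, r, c: [row[c:c + h] for row in M[r:r + h]]
    A11, A12, A21, A22 = q(A, 0, 0), q(A, 0, h), q(A, h, 0), q(A, h, h)
    B11, B12, B21, B22 = q(B, 0, 0), q(B, 0, h), q(B, h, 0), q(B, h, h)
    # Strassen's 7 essential multiplications.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    # Reassemble the four blocks of the product using only additions.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bot
```

The recursion performs $7$ multiplications of half-size matrices at each level, which is where the $n^{\log_2 7}$ count comes from; the (many) additions only affect the constant factor.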

Later on, Strassen and others developed a framework in which the basic identity can be expressed succinctly as

$$R(\langle 2,2,2 \rangle) \leq 7.$$

Generalizing the recursive construction gives Strassen’s theorem:

$$R(\langle n,m,p \rangle) \leq r \implies (nmp)^{\omega/3} \leq r.$$

It is now time to explain the notations $\langle 2,2,2 \rangle$ and $R$. The notation $\langle 2,2,2 \rangle$, or more generally $\langle n,m,p \rangle$, represents a certain *tensor*, a three-dimensional analog of a matrix:

$$\langle n,m,p \rangle = \sum_{i=1}^n \sum_{j=1}^m \sum_{k=1}^p x_{ij} y_{jk} z_{ki}.$$

The expression on the right-hand side represents a three-dimensional array whose rows are indexed by the elements $x_{ij}$, whose columns are indexed by the elements $y_{jk}$, and whose third dimension is indexed by the elements $z_{ki}$. Each term $x_{ij} y_{jk} z_{ki}$ in the sum indexes a particular cell in this three-dimensional array, and represents the fact that this cell has value $1$. All other cells have value $0$. For comparison, a matrix $A$ could be encoded in a similar fashion by the double sum $\sum_{i,j} A_{ij} x_i y_j$. The tensor $\langle n,m,p \rangle$ represents the multiplication of an $n \times m$ matrix (whose entries are indexed by the $x$-variables) by an $m \times p$ matrix (whose entries are indexed by the $y$-variables), resulting in an $n \times p$ matrix (whose entries are indexed by the $z$-variables).
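In code, $\langle n,m,p \rangle$ is just a three-dimensional $0/1$ array, and contracting its first two dimensions with the entries of two matrices recovers their product. A small illustration (the flattening conventions are ours):

```python
def mat_mult_tensor(n, m, p):
    """The tensor <n,m,p> as a 3-dimensional 0/1 array: the cell indexed by
    x = (i,j), y = (j,k), z = (k,i) holds 1; all other cells hold 0."""
    T = [[[0] * (p * n) for _ in range(m * p)] for _ in range(n * m)]
    for i in range(n):
        for j in range(m):
            for k in range(p):
                T[i * m + j][j * p + k][k * n + i] = 1
    return T

def contract(T, A, B, n, m, p):
    """Contract the x- and y-dimensions of T with the entries of the n x m
    matrix A and the m x p matrix B; the z-dimension then carries the
    entries of the product AB."""
    a = [A[i][j] for i in range(n) for j in range(m)]   # flattened x-variables
    b = [B[j][k] for j in range(m) for k in range(p)]   # flattened y-variables
    C = [[0] * p for _ in range(n)]
    for x in range(n * m):
        for y in range(m * p):
            for z in range(p * n):
                if T[x][y][z]:
                    k, i = divmod(z, n)       # z-index (k,i) addresses (AB)_{ik}
                    C[i][k] += a[x] * b[y]
    return C
```

This is of course a cubic-time way to multiply matrices; the point is only that the tensor faithfully encodes the bilinear map.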

The notation $R(T)$ stands for the *rank* of the tensor $T$, which is the smallest number of rank one tensors which sum to $T$. A rank one tensor is an outer product of the form

$$\left(\sum_i a_i x_i\right)\left(\sum_j b_j y_j\right)\left(\sum_k c_k z_k\right).$$

The usual matrix rank can also be defined in this fashion, and this is the easiest way to show that row rank equals column rank for matrices.

Here is Strassen’s identity in this language:

$$\begin{aligned} \langle 2,2,2 \rangle ={}& (x_{11}+x_{22})(y_{11}+y_{22})(z_{11}+z_{22}) \\ &+ (x_{21}+x_{22})y_{11}(z_{12}-z_{22}) + x_{11}(y_{12}-y_{22})(z_{21}+z_{22}) \\ &+ x_{22}(y_{21}-y_{11})(z_{11}+z_{12}) + (x_{11}+x_{12})y_{22}(z_{21}-z_{11}) \\ &+ (x_{21}-x_{11})(y_{11}+y_{12})z_{22} + (x_{12}-x_{22})(y_{21}+y_{22})z_{11}. \end{aligned}$$

Tensorial notation makes it clear that $R(\langle n,m,p \rangle) = R(\langle m,p,n \rangle)$, and the rank is the same for the other four permutations of $n,m,p$ as well.

Two important operations on tensors are *direct sum* and *tensor product*. The direct sum $T_1 \oplus T_2$ of two tensors is a “block diagonal tensor” which has $T_1$ as one block and $T_2$ as the other block. In tensorial notation, after renaming variables so that the two tensors share no variables, $T_1 \oplus T_2 = T_1 + T_2$. For example,

$$\langle n,1,m \rangle \oplus \langle 1,k,1 \rangle = \sum_{i=1}^n \sum_{j=1}^m x_i y_j z_{ji} + \sum_{l=1}^k x'_l y'_l z'.$$

The variable names we chose are arbitrary; in fact, the equals sign really means “equal up to isomorphism”, or rather “equal up to the choice of indices for rows, columns and the third dimension”. The two constituent tensors represent an outer product and an inner product of vectors, respectively, and will figure prominently in the sequel. It is not hard to check that in general $R(T_1 \oplus T_2) \leq R(T_1) + R(T_2)$, but, as we will see, this is not always tight.

The tensor product of two tensors corresponds to the Kronecker product of matrices. Instead of explaining it formally, here is an example:

$$\langle n,m,p \rangle \otimes \langle n',m',p' \rangle = \langle nn', mm', pp' \rangle.$$

Formally speaking, the rows of the product tensor $T_1 \otimes T_2$ are indexed by pairs $(i_1,i_2)$ of row indices, and similarly for columns and the third dimension; the $((i_1,i_2),(j_1,j_2),(k_1,k_2))$ entry of the product tensor is equal to $(T_1)_{i_1 j_1 k_1} (T_2)_{i_2 j_2 k_2}$. It is not hard to check that $R(T_1 \otimes T_2) \leq R(T_1) R(T_2)$.
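For matrices the same pairing of indices gives the familiar Kronecker product, which may make the index bookkeeping more concrete; here is a minimal sketch (our own helper):

```python
def kron(A, B):
    """Kronecker product of two matrices: entry ((i1,i2),(j1,j2)) of the
    product equals A[i1][j1] * B[i2][j2], with the pair (i1,i2) flattened
    to i1*rows(B) + i2, and similarly for the column pairs."""
    ra, ca, rb, cb = len(A), len(A[0]), len(B), len(B[0])
    return [[A[i1][j1] * B[i2][j2] for j1 in range(ca) for j2 in range(cb)]
            for i1 in range(ra) for i2 in range(rb)]
```

The tensor product of three-dimensional tensors extends this by pairing up the third-dimension indices in exactly the same way.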

The tensor product can be iterated to produce the *tensor power* $T^{\otimes s}$. In terms of rank, $R(T^{\otimes s}) \leq R(T)^s$.

**Border rank and the asymptotic sum inequality**

The rank operator on matrices is lower semicontinuous, that is, if a matrix $A$ has rank $r$, then all matrices close enough to $A$ have rank at least $r$. Surprisingly, this is not the case for tensors: the identity

$$\epsilon^{-1}\left[(x_1+\epsilon x_2)(y_1+\epsilon y_2)z_1 - x_1 y_1 (z_1 - \epsilon z_2)\right] = x_1y_1z_2 + x_1y_2z_1 + x_2y_1z_1 + \epsilon\, x_2y_2z_1$$

shows that for any $\epsilon \neq 0$, the rank of the tensor on the left is $2$, and on the other hand one can check that when $\epsilon = 0$ the rank of the right-hand side is $3$. The *border rank* of a tensor $T$, denoted $\underline{R}(T)$, is the minimal $r$ such that $T$ is the limit of a sequence of tensors of rank at most $r$. Here is an example due to Schönhage:

$$\begin{aligned} \epsilon^2&\left(\langle n,1,m \rangle \oplus \langle 1,(n-1)(m-1),1 \rangle\right) + E \\ ={}& \sum_{i=1}^{n-1}\sum_{j=1}^{m-1}(x_i+\epsilon u_{ij})(y_j+\epsilon v_{ij})(w+\epsilon^2 z_{ji}) + \sum_{i=1}^{n-1} x_i\Big(y_m-\epsilon\sum_{j=1}^{m-1}v_{ij}\Big)(w+\epsilon^2 z_{mi}) \\ &+ \sum_{j=1}^{m-1}\Big(x_n-\epsilon\sum_{i=1}^{n-1}u_{ij}\Big)y_j(w+\epsilon^2 z_{jn}) + x_ny_m(w+\epsilon^2 z_{mn}) - \Big(\sum_{i=1}^n x_i\Big)\Big(\sum_{j=1}^m y_j\Big)w, \end{aligned}$$

where $\langle n,1,m \rangle = \sum_{i,j} x_iy_jz_{ji}$ and the inner product is $\langle 1,(n-1)(m-1),1 \rangle = \sum_{i<n,j<m} u_{ij}v_{ij}w$.

In this identity, $E$ is some expression only involving powers of $\epsilon$ which are $3$ or higher. If we divide both sides by $\epsilon^2$ and let $\epsilon \to 0$, we recover $\langle n,1,m \rangle \oplus \langle 1,(n-1)(m-1),1 \rangle$ on the left-hand side. Since there are $nm+1$ terms on the right-hand side, we conclude that

$$\underline{R}\left(\langle n,1,m \rangle \oplus \langle 1,(n-1)(m-1),1 \rangle\right) \leq nm+1.$$

This is rather surprising, since clearly $\underline{R}(\langle n,1,m \rangle) = nm$. At the cost of an increase of one in the border rank, we are somehow able to compute a large disjoint inner product!
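As a sanity check, the smallest case $n = m = 2$ of Schönhage's identity can be verified numerically: five products compute $\langle 2,1,2 \rangle \oplus \langle 1,1,1 \rangle$ up to an error of order $\epsilon^3$. (The variable names below are ours; $u, v, w$ are the variables of the inner product.)

```python
def schonhage_error(eps, val):
    """Difference between the five rank-one products and
    eps^2 * (<2,1,2> + <1,1,1>), evaluated at the point val.
    The result should be of order eps^3."""
    x1, x2, y1, y2, z11, z12, z21, z22, u, v, w = val
    five_products = ((x1 + eps * u) * (y1 + eps * v) * (w + eps ** 2 * z11)
                     + x1 * (y2 - eps * v) * (w + eps ** 2 * z12)
                     + (x2 - eps * u) * y1 * (w + eps ** 2 * z21)
                     + x2 * y2 * (w + eps ** 2 * z22)
                     - (x1 + x2) * (y1 + y2) * w)
    target = (x1 * y1 * z11 + x1 * y2 * z12 + x2 * y1 * z21 + x2 * y2 * z22  # <2,1,2>
              + u * v * w)                                                   # <1,1,1>
    return five_products - eps ** 2 * target
```

Evaluating at a fixed point and shrinking $\epsilon$ by a factor of $10$ shrinks the error by a factor of about $1000$, as expected of an $O(\epsilon^3)$ error term.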

It is a deep fact due to Strassen that *all* upper bounds on the border rank arise as “approximate identities”, which are identities like Schönhage’s identity mentioned above (the power of $\epsilon$ on the left-hand side can be different); all powers of $\epsilon$ appearing in such identities are positive integers. This shows that $\underline{R}(T_1 \oplus T_2) \leq \underline{R}(T_1) + \underline{R}(T_2)$ and $\underline{R}(T_1 \otimes T_2) \leq \underline{R}(T_1)\underline{R}(T_2)$ for all tensors $T_1, T_2$.

How is Schönhage’s identity useful for multiplying two square matrices?

**Asymptotic sum inequality:** For a set of triples $(n_i,m_i,p_i)$, $1 \leq i \leq k$, not all equal to $(1,1,1)$,

$$\sum_{i=1}^k (n_i m_i p_i)^{\omega/3} \leq \underline{R}\left(\bigoplus_{i=1}^k \langle n_i, m_i, p_i \rangle\right).$$

This result of Schönhage is a vast generalization of Strassen’s theorem mentioned above, which corresponds to the special case in which there is only one triple (with border rank replaced by the rank). Unfortunately, explaining the proof of the asymptotic sum inequality would take us too far afield. The proof can be found in many sources, such as my lecture notes. One interesting aspect is that the proof of the asymptotic sum inequality does *not* result in an $O(n^{\omega_0})$ algorithm, where $\omega_0$ is the upper bound on $\omega$ resulting from the inequality. Rather, the proof gives a sequence of algorithms whose exponents *converge* to $\omega_0$.

Choosing $n = m = 4$ in Schönhage’s identity and applying the asymptotic sum inequality, we obtain

$$16^{\omega/3} + 9^{\omega/3} \leq 17.$$

Solving for $\omega$ numerically, we get the upper bound $\omega \leq 2.55$.
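For the record, bounds of this shape are easy to extract numerically. The sketch below (ours) applies bisection, using the fact that the left-hand side of the asymptotic sum inequality is increasing in $\omega$:

```python
def asi_bound(products, border_rank, lo=2.0, hi=3.0, iters=200):
    """Largest w in [lo, hi] with sum(nmp**(w/3)) <= border_rank, found by
    bisection; each entry of `products` is a volume n_i * m_i * p_i."""
    f = lambda w: sum(nmp ** (w / 3) for nmp in products) - border_rank
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid          # sum already exceeds the border rank
        else:
            lo = mid
    return (lo + hi) / 2

# Schönhage's identity with n = m = 4: <4,1,4> + <1,9,1> has border rank 17.
omega_bound = asi_bound([16, 9], 17)   # about 2.548
```

The same helper recovers Strassen's exponent from the single triple $\langle 2,2,2 \rangle$ with rank $7$: `asi_bound([8], 7)` is about $2.807 \approx \log_2 7$.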

**The asymptotic sum inequality isn’t optimal**

Finally we are ready to state Coppersmith and Winograd’s result.

**Main theorem.** Let $\omega_0$ be the upper bound on $\omega$ obtained from an application of the asymptotic sum inequality. Then $\omega < \omega_0$.

In other words, a single invocation of the asymptotic sum inequality cannot yield the optimal value of . One way around this limitation is to apply the asymptotic sum inequality to a sequence of identities — the course of action taken by Strassen’s laser method.

In the rest of this post, we provide an (almost) complete proof of this theorem. We start with a result generalizing Schönhage’s identity, which shows that in certain situations one can compute an additional inner product for free or at only a small cost in the border rank. This result is strong enough to prove a baby version of the main theorem.

**Baby theorem.** For all $n,m,p \geq 2$, $(nmp)^{\omega/3} < R(\langle n,m,p \rangle)$.

This shows that no single identity à la Strassen’s original identity can yield the optimal value of .

Proving the main theorem itself requires more work, but the main ideas are contained in the proof of the baby theorem.

**Main construction and proof of baby theorem**

All the results of Coppersmith and Winograd rely on the following construction. We say that a tensor involves $q$ *independent* x-variables if it involves $q$ of the x-variables, and it cannot be written in terms of fewer than $q$ linear combinations of the x-variables.

**Main construction.** *Let $T$ be a tensor involving $q$ independent x-variables and $r$ independent y-variables, and suppose that*

$$T = \sum_{i=1}^s (a_i \cdot x)(b_i \cdot y)(c_i \cdot z), \quad \text{where } \sum_{i=1}^s (a_i \cdot x)(b_i \cdot y) = 0.$$

*Then $\underline{R}(T \oplus \langle 1, s-q-r, 1 \rangle) \leq s$, and furthermore this is witnessed by a decomposition of the form*

$$\sum_{i=1}^s (a_i \cdot x + \epsilon X_i)(b_i \cdot y + \epsilon Y_i)(w + \epsilon^2\, c_i \cdot z) = \epsilon^2 \left(T \oplus \langle 1, s-q-r, 1 \rangle\right) + E,$$

*where $w$, the z-variable of the inner product, doesn’t appear in $T$, and*

$$\sum_{i=1}^s (a_i \cdot x + \epsilon X_i)(b_i \cdot y + \epsilon Y_i) = \epsilon^2 \sum_{l=1}^{s-q-r} x'_l y'_l.$$

(Here the $X_i$ and $Y_i$ are linear combinations of the new x-variables $x'_l$ and y-variables $y'_l$ of the inner product, and $E$ only involves powers of $\epsilon$ which are $3$ or higher.)

Before proving the main construction, let us see why it is useful.

**Corollary.** *For every tensor $T$ involving $q$ independent x-variables and $r$ independent y-variables, putting $R = R(T)$ we have*

$$\underline{R}\left(T \oplus \langle 1, R+1-q-r, 1 \rangle\right) \leq R+1.$$

This corollary generalizes Schönhage’s identity: taking $T = \langle n,1,m \rangle$ and using $R(\langle n,1,m \rangle) = nm$, we obtain $\underline{R}(\langle n,1,m \rangle \oplus \langle 1,(n-1)(m-1),1 \rangle) \leq nm+1$.

**Proof of corollary.** Consider a decomposition $T = \sum_{t=1}^R (a_t \cdot x)(b_t \cdot y)(c_t \cdot z)$ of minimal length $R$. Choose vectors $\alpha$ and $\beta$ such that $(\alpha \cdot a_t)(\beta \cdot b_t)$ is non-zero for all $t$; here $\alpha \cdot a_t$ and $\beta \cdot b_t$ are the results of substituting $\alpha$ for the x-variables and $\beta$ for the y-variables. Such vectors exist since $a_t, b_t \neq 0$ due to the minimality of the decomposition. Consider now the following decomposition of length $R+1$:

Notice that

since we substituted . On the other hand, also

This shows that the premises of the main construction are satisfied, with $s = R+1$. Since $s - q - r = R+1-q-r$, the corollary follows directly from the main construction.

Using the corollary, the baby theorem follows quite easily.

**Proof of baby theorem.** The proof uses the standard inequality $R(\langle n,m,p \rangle) \geq nm + p - 1$, which can be proved using the so-called *substitution method*. Let $r = R(\langle n,m,p \rangle)$, so that $(nmp)^{\omega/3} \leq r$. For each integer $s$, $R(\langle n,m,p \rangle^{\otimes s}) \leq r^s$. If $R(\langle n,m,p \rangle^{\otimes s}) < r^s$ for some $s$ then Strassen’s theorem shows that $(nmp)^{\omega/3} \leq (r^s-1)^{1/s} < r$, so assume that $R(\langle n,m,p \rangle^{\otimes s}) = r^s$ for all $s$. The corollary shows that

$$\underline{R}\left(\langle n^s,m^s,p^s \rangle \oplus \langle 1,\, r^s+1-(nm)^s-(mp)^s,\, 1 \rangle\right) \leq r^s+1,$$

and the asymptotic sum inequality shows that

$$(nmp)^{s\omega/3} + \left(r^s+1-(nm)^s-(mp)^s\right)^{\omega/3} \leq r^s+1.$$

Since $r > nm$ and $r > mp$ (by the standard inequality and the invariance of the rank under permutations) and so $r^s+1-(nm)^s-(mp)^s \to \infty$, for large enough $s$ we have $r^s+1-(nm)^s-(mp)^s \geq 2$, and so

$$(nmp)^{s\omega/3} \leq r^s+1-2^{\omega/3} < r^s,$$

which shows that $(nmp)^{\omega/3} < r$.

**Proof of main construction**

Let us briefly recall the assumptions of the construction: $T$ is a tensor involving $q$ independent x-variables and $r$ independent y-variables, $T$ decomposes as a sum of $s$ rank one tensors $T = \sum_{i=1}^s (a_i \cdot x)(b_i \cdot y)(c_i \cdot z)$, and $\sum_{i=1}^s (a_i \cdot x)(b_i \cdot y) = 0$.

For each y-index $j$, let $e_j$ be the corresponding standard basis vector. Substituting $e_j$ for the y-variables in $\sum_i (a_i \cdot x)(b_i \cdot y) = 0$, we obtain

$$\sum_{i=1}^s (b_i)_j\, (a_i \cdot x) = 0.$$

We rephrase this in the language of matrices. Let $A$ be the matrix whose columns are the $a_i$, and let $b^{(j)}$ be the column vector whose entries are $(b_1)_j, \ldots, (b_s)_j$. The displayed equation is equivalent to $A b^{(j)} = 0$. Since the x-variables are independent, $A$ has rank $q$, and so the subspace of column vectors annihilated by $A$ has dimension $s-q$. The independence of the y-variables implies that the vectors $b^{(1)}, \ldots, b^{(r)}$ are linearly independent, and so we can find vectors $u^{(1)}, \ldots, u^{(s-q-r)}$ that together with $b^{(1)}, \ldots, b^{(r)}$ form a basis for the subspace of column vectors annihilated by $A$.

We can apply the same argument the other way around. Let $B$ be the matrix whose rows are the $b_i$, and let $a^{(j)}$ be the row vector whose entries are $(a_1)_j, \ldots, (a_s)_j$. As before, $a^{(j)} B = 0$, and the vectors $a^{(1)}, \ldots, a^{(q)}$ are linearly independent. Furthermore, the subspace of row vectors annihilating $B$ has dimension $s-r$, and so $a^{(1)}, \ldots, a^{(q)}$ can be completed to a basis for this subspace by the addition of $s-q-r$ vectors $v^{(1)}, \ldots, v^{(s-q-r)}$.

Consider now the matrix $P$ whose rows are $a^{(1)}, \ldots, a^{(q)}, v^{(1)}, \ldots, v^{(s-q-r)}$ and the matrix $Q$ whose columns are $b^{(1)}, \ldots, b^{(r)}, u^{(1)}, \ldots, u^{(s-q-r)}$. The matrix $P$ has full row rank and the matrix $Q$ has full column rank, and so the matrix $PQ$ has rank at least $(s-r)+(s-q)-s = s-q-r$. On the other hand, the last $s-q-r$ rows of $P$ are simply the matrix $V$ (whose rows are the $v^{(l)}$), and the last $s-q-r$ columns of $Q$ simply form the matrix $U$ (whose columns are the $u^{(l)}$). This shows that $PQ$ is a block diagonal matrix of the form $\begin{pmatrix} 0 & 0 \\ 0 & M \end{pmatrix}$, where $M = VU$ is an $(s-q-r) \times (s-q-r)$ square block. Since $PQ$ has rank at least $s-q-r$, this block must be invertible. In fact, by choosing a different basis $M^{-1}V$ instead of $V$, we can guarantee that $M = I$.

The next step is to lift the vectors $u^{(l)}$ and $v^{(l)}$ to linear combinations of new x-variables $x'_1, \ldots, x'_{s-q-r}$ and y-variables $y'_1, \ldots, y'_{s-q-r}$: for each $i$, define

$$X_i = \sum_{l=1}^{s-q-r} v^{(l)}_i\, x'_l, \qquad Y_i = \sum_{l=1}^{s-q-r} u^{(l)}_i\, y'_l.$$

The linear combination $Y_i$ has the same relation to the $y'_l$ as $b_i \cdot y$ has to the y-variables. The identity stated above for $PQ$ can be recast as

$$\sum_{i=1}^s (a_i \cdot x + \epsilon X_i)(b_i \cdot y + \epsilon Y_i) = \epsilon^2 \sum_{l=1}^{s-q-r} x'_l y'_l.$$

This allows us to add the additional inner product to the original identity. Let $w$ be a new z-variable, and consider the identity

$$\sum_{i=1}^s (a_i \cdot x + \epsilon X_i)(b_i \cdot y + \epsilon Y_i)(w + \epsilon^2\, c_i \cdot z) = \epsilon^2 \left(T + \sum_{l=1}^{s-q-r} x'_l y'_l\, w\right) + E,$$

where $E$ only involves powers of $\epsilon$ which are $3$ or higher. This shows that $\underline{R}(T \oplus \langle 1, s-q-r, 1 \rangle) \leq s$. Furthermore, the coefficient of $w$ is exactly $\epsilon^2 \sum_{l} x'_l y'_l$. This completes the proof of the main construction.

**Proof of main theorem**

The proof of the main theorem proceeds in three steps. In the first step we show how to absorb a tensor into another tensor satisfying the requirements of the main construction. In the second step we iterate the first step to obtain a result similar to the corollary used above to prove the baby theorem. Finally, in the third step we prove the main theorem itself, using a result akin to the lower bound $R(\langle n,m,p \rangle) \geq nm+p-1$ employed in the proof of the baby theorem.

We start with the first step.

**Auxiliary construction.** *Suppose that $T$ is a tensor along with a decomposition satisfying the requirements of the main construction, and that $T'$ is an arbitrary tensor (on different variables) whose border rank satisfies $\underline{R}(T') \leq s-q-r$ (notation as in the main construction). Then there is a decomposition of $T \oplus T'$ of length $s$ satisfying the requirements of the main construction.*

**Proof.** Apply the main construction to $T$ to obtain a decomposition

Furthermore, let be a decomposition of . By adjusting one of the decompositions, we can assume without loss of generality that .

Denote by the substitution that sets for and otherwise, and define similarly. Applying these substitutions to the decomposition of above, we get

Moreover, the coefficient of $w$ in the first expression is exactly

We can rewrite the combined decomposition as

where the coefficient of $w$ vanishes. Removing $w$, we obtain a decomposition satisfying the requirements of the main construction, completing the proof.

The next step is iterating the auxiliary construction, with an (almost) arbitrary starting point. Applying the main construction at the very end, we obtain the following corollary.

**Auxiliary corollary.** *Suppose $T$ is a tensor involving $q$ independent x-variables and $r$ independent y-variables, whose border rank is at least . Then for every integer $c$,*

In the statement of the corollary, $T^{\oplus c}$ is the direct sum of $c$ copies of $T$.

**Proof.** Consider some decomposition $T = \sum_{t=1}^R (a_t \cdot x)(b_t \cdot y)(c_t \cdot z)$, and extend it to the decomposition

of length $R+1$, which satisfies the requirements of the main construction. By the assumption on the border rank of $T$, we can apply the auxiliary construction to obtain a decomposition of $T \oplus T$ of length $R+1$ satisfying the requirements of the main construction with $2q$ independent x-variables and $2r$ independent y-variables. Once again by the same assumption, we can apply the auxiliary construction again (to $T \oplus T$ and $T$) to obtain a decomposition of $T^{\oplus 3}$ of length $R+1$ satisfying the requirements of the main construction. In this way, for every $c$ we can obtain a decomposition of $T^{\oplus c}$ of length $R+1$ satisfying the requirements of the main construction with $cq$ independent x-variables and $cr$ independent y-variables. A final application of the main construction completes the proof.

Before we can prove the main theorem, we need one simple result.

**Lemma.** *Suppose that applying the asymptotic sum inequality to the tensors $T_1$ and $T_2$ yields the same upper bound $\omega_0$ on $\omega$. If $\underline{R}(T_1 \oplus T_2) < \underline{R}(T_1) + \underline{R}(T_2)$ then applying the asymptotic sum inequality to $T_1 \oplus T_2$ shows that $\omega < \omega_0$, and otherwise the same upper bound on $\omega$ is obtained. Similarly, if $\underline{R}(T_1 \otimes T_2) < \underline{R}(T_1)\underline{R}(T_2)$ then applying the asymptotic sum inequality to $T_1 \otimes T_2$ shows that $\omega < \omega_0$, and otherwise the same upper bound on $\omega$ is obtained.*

*Proof.* Let $T_1 = \bigoplus_i \langle n_i,m_i,p_i \rangle$ have border rank $R_1$ and let $T_2 = \bigoplus_j \langle n'_j,m'_j,p'_j \rangle$ have border rank $R_2$. By assumption,

$$\sum_i (n_i m_i p_i)^{\omega_0/3} = R_1 \quad\text{and}\quad \sum_j (n'_j m'_j p'_j)^{\omega_0/3} = R_2.$$

Applying the asymptotic sum inequality to the direct sum gives

$$\sum_i (n_i m_i p_i)^{\omega/3} + \sum_j (n'_j m'_j p'_j)^{\omega/3} \leq \underline{R}(T_1 \oplus T_2).$$

If $\underline{R}(T_1 \oplus T_2) = R_1 + R_2$ then this gives the bound $\omega \leq \omega_0$, and otherwise a better bound is obtained (recall that $\underline{R}(T_1 \oplus T_2) \leq R_1 + R_2$).

Applying the asymptotic sum inequality to the tensor product gives

$$\sum_{i,j} (n_i n'_j \cdot m_i m'_j \cdot p_i p'_j)^{\omega/3} \leq \underline{R}(T_1 \otimes T_2).$$

If $\underline{R}(T_1 \otimes T_2) = R_1 R_2$ then this gives the bound $\omega \leq \omega_0$, and otherwise a better bound is obtained (recall that $\underline{R}(T_1 \otimes T_2) \leq R_1 R_2$).

We can now complete the proof of the main theorem, using the above lemma freely.

**Proof of main theorem.** Let $t = \bigoplus_{i=1}^k \langle n_i,m_i,p_i \rangle$ be an arbitrary tensor such that $n_i m_i p_i > 1$ for some $i$, denote the border rank of $t$ by $R$, and let $\omega_0$ be the solution to the equation

$$\sum_{i=1}^k (n_i m_i p_i)^{\omega_0/3} = R.$$

Our goal is to show that $\omega < \omega_0$.

Any tensor can be “rotated” by replacing the x,y,z-variables by y,z,x-variables. Any such permutation of variables results in a tensor with the same rank and border rank. In particular, the tensor

has border rank , and the same number of x-variables, y-variables and z-variables, say of each. If then the asymptotic sum inequality shows that , so we can assume that . A lower bound proved using the substitution method shows that (this is Theorem 6.1 in the paper of Coppersmith and Winograd). Pick a power so large that , and put . Note that has each of x-variables, y-variables and z-variables, and its border rank is at most . If the border rank is smaller then the asymptotic sum inequality shows that , so we can assume that .

For brevity, put , , and . Also, let , where

The auxiliary corollary shows that for every , . Applying the asymptotic sum inequality gives

Choose so large that . Then

showing that . This completes the proof.

July 16, 2014 at 3:27 pm |

nice/impressive work, seems like a real result, should this be a paper? is this your 1st blog ever?

fyi RJL recently mused on the general theme of this blog does program size matter? … am collecting links & plan to blog on the subj at some pt also.

July 17, 2014 at 2:18 am |

As mentioned in the second paragraph, this post is an exposition of a paper by Coppersmith and Winograd from 1982. Another blog post, describing recent novel results, will appear in a few days.

July 17, 2014 at 2:28 am

lol ok. maybe the ref to the “baby thm” was what tripped me up that there could be new results or a new/ different/ alternate/ simplified proof/ angle. presumably not called that in any other paper… 😉

July 17, 2014 at 3:10 am

another funky idea just occurred to me. could this be an algorithm that converges ie on repeated applications is actually estimating to higher degrees of precision ie similar to iterative algorithms that compute digits of $\pi$? in other words are the supposed “improvements” increasingly small?

July 17, 2014 at 3:35 am

This is mentioned as an open question by Coppersmith and Winograd. In some sense, Strassen’s laser method, especially as used by Coppersmith and Winograd, is an answer. One can probably also show by experiment that the improvements decrease sharply, though this is of course not a proof.
