Archive for the ‘Matrix multiplication’ Category

Limits on the recent progress on matrix multiplication algorithms (guest post by Yuval Filmus)

July 20, 2014

Virginia Vassilevska Williams stunned the world of computer science by presenting the first improvement to the matrix multiplication constant \omega in twenty years (later it was found that a more modest progress had been obtained independently and slightly earlier by Andrew James Stothers). The current world record is held by François Le Gall, who showed that \omega < 2.3728639 by perfecting the methods of Vassilevska Williams. For an exposition of this line of work, check out my recent lecture notes. For the full details, I recommend Le Gall’s paper and the journal version of Stothers’ thesis (with his advisor A. M. Davie). The recent progress heavily builds on classic work by Coppersmith and Winograd. Briefly, they came up with a basic identity known as the Coppersmith–Winograd identity. Applying Strassen’s laser method with their own ingenious construction (relying on Salem–Spencer sets), they obtained the bound \omega < 2.388. Applying their method to the tensor square of the basic identity, they obtained the improved bound \omega < 2.376. That’s where things had been standing since 1990, until Stothers managed to perform the computations for the fourth tensor power, obtaining the bound \omega < 2.373 in 2010. A year later (but independently), Vassilevska Williams analyzed the eighth tensor power and obtained the bound \omega < 2.3729. Further progress was obtained by Le Gall, who analyzed the sixteenth and thirty-second tensor powers, obtaining the current world record stated above. Although taking ever higher tensor powers improves the bound on \omega, the improvements are diminishing, and it seems safe to conjecture that taking the Nth tensor power does not yield the conjectured \omega = 2 as N \to \infty. However, how can we be certain? Perhaps the improvements slow down, but like a slowly divergent series, eventually go all the way down to 2? In the rest of the blog post, we describe a recent result of Andris Ambainis and myself, which shows that the best bound that this method can produce is \omega < 2.3078, for any value of N. In fact, our result allows for a wider class of methods which utilize the Coppersmith–Winograd identity, and hint to a new technique which can potentially lead to better algorithms. Very recently, Le Gall was able to use our methods to show that the best bound that can be obtained by taking an Nth tensor power of the Coppersmith–Winograd identity is \omega < 2.3725, and so the current analysis of the identity is quite tight. (more…)


The asymptotic sum inequality is not optimal (guest post by Yuval Filmus)

July 15, 2014

Matrix multiplication has become a popular topic recently. The main goal in this area is determining the value of the matrix multiplication constant \omega, which is the infimum over all \alpha such that two n\times n complex matrices can be multiplied using O(n^\alpha) arithmetic operations (addition, subtraction, multiplication, and division; arbitrary complex constants can appear as operands). Sloppy authors often define \omega as a minimum instead of an infimum, but it is not clear that any O(n^\omega) algorithm exists. Indeed, given the common belief that \omega = 2, a classic result of Ran Raz which gives a lower bound of \Omega(n^2\log n), assuming all the constants used are small, suggests that there is no O(n^\omega) algorithm (though there could be an asymptotically optimal algorithm, say one which runs in time O(n^2\log n)).

In this blog post we describe a result due to Coppersmith and Winograd that implies that a certain class of techniques provably cannot yield an optimal exponent, i.e. an O(n^\omega) algorithm, namely all algorithms which result from an invocation of Schönhage’s asymptotic sum inequality. This class includes Strassen’s original algorithm and all algorithms hence up to Strassen’s laser method, used in the celebrated algorithm of Coppersmith and Winograd, which corresponds to infinitely many invocations of the asymptotic sum inequality, and so is not subject to this limitation. The proof proceeds by showing how any identity (to which the asymptotic sum inequality can be applied) can be improved to another identity yielding a better bound on \omega.


On the recent progress on matrix multiplication algorithms (guest post by Virginia Vassilevska Williams)

September 22, 2012

A central question in the theory of algorithms is to determine the constant \omega, called the exponent of matrix multiplication. This constant is defined as the infimum of all real numbers such that for all \varepsilon>0 there is an algorithm for multiplying n\times n matrices running in time O(n^{\omega+\varepsilon}). Until the late 1960s it was believed that \omega=3, i.e. that no improvement can be found for the problem. In 1969, Strassen surprised everyone by showing that two n\times n matrices can be multiplied in O(n^{2.81}) time. This discovery spawned a twenty-year-long extremely productive time in which the upper bound on \omega was gradually lowered to 2.376. After a twenty-year stall, some very recent research has brought the upper bound down to 2.373.

Bilinear algorithms and recursion.
Strassen’s approach was to exploit the inherent recursive nature of matrix multiplication: the product of two kn\times kn matrices can be viewed as the product of two k\times k matrices, the entries of which are n\times n matrices. Suppose that we have an algorithm ALG that runs in o(k^3) time and multiplies two k\times k matrices. Then one can envision obtaining a fast recursive algorithm for multiplying k^i\times k^i matrices (for any integer i>1) as well: view the k^i\times k^i matrices as k\times k matrices the entries of which are k^{i-1}\times k^{i-1} matrices; then multiply the k\times k matrices using ALG and when ALG requires us to multiply two matrix entries, recurse.

This approach only works, provided that the operations that ALG performs on the matrix entries make sense as matrix operations: e.g. entry multiplication, taking linear combination of entries etc. One very general type of such algorithm is the so called bilinear algorithm: Given two matrices A and B, compute r products

P_l = (\sum_{i,j} u_{ijl} A[i,j])(\sum_{ij} v_{ijl} B[i,j]),

i.e. take r possibly different linear combinations of entries of A and multiply each one with a possibly different linear combination of entries of B. Then, compute each entry of the product AB as a linear combination of the P_l: AB[i,j]=\sum_{l} w_{ijl} P_l.

Given a bilinear algorithm ALG for multiplying two k\times k matrices (for constant k) that computes r products P_l, the recursive approach that multiplies k^i\times k^i matrices using ALG gives a bound \omega\leq \log_k r. To see this, notice that the number of additions that one has to do is no more than 3rk^2: at most 2k^2 to compute the linear combinations for each P_l and at most r for each of the k^2 outputs AB[i,j]. Since matrix addition takes linear time in the matrix size, we have a recurrence of the form T(k^i) = r T(k^{i-1}) + O(rk^{2i}).

As long as r<k^3 we get a nontrivial bound on \omega. Strassen’s famous algorithm used k=2 and r=7 thus showing that \omega\leq \log_2 7. A lot of work went into getting better and better “base algorithms” for varying constants k. Methods such as Pan’s method of trilinear aggregation were developed. This approach culminated in Pan’s algorithm (1978) for multiplying 70\times 70 matrices that used 143,640 products and hence showed that \omega\leq \log_{70} 143,640<2.796.

Approximate algorithms and Schonhage’s theorem.
A further step was to look at more general algorithms, so called approximate bilinear algorithms. In the definition of a bilinear algorithm the coefficients u_{ijl},v_{ijl},w_{ijl} were constants. In an approximate algorithm, these coefficients can be formal linear combinations of integer powers of an indeterminate, \lambda (e.g. 2\lambda^2-5\lambda^{-3}). The entries of the product AB are then only “approximately” computed, in the sense that AB[i,j]=\sum_{l} w_{ijl} P_l + O(\lambda), where the O(\lambda) term is a linear combination of positive powers of \lambda. The term “approximate” comes from the intuition that if you set \lambda to be close to 0, then the algorithm would get the product almost exactly.

Interestingly enough, Bini et al. (1980) showed that when dealing with the asymptotic complexity of matrix multiplication, approximate algorithms suffice for obtaining bounds on \omega. This is not obvious! What Bini et al. show, in a sense, is that as the size of the matrices grows, the “approximation” part can be replaced by a sort of bookkeeping which does not present an overhead asymptotically. The upshot is that if there is an approximate bilinear algorithm that computes r products P_l to compute the product of two k\times k matrices, then \omega\leq \log_k r.

Bini et al. (1979) gave the first approximate bilinear algorithm for a matrix product. Their algorithm used 10 entry products to multiply a 2\times 3 matrix with a 3\times 3 matrix. Although this algorithm is for rectangular matrices, it can easily be converted into one for square matrices: a 12\times 12 matrix is a 2\times 3 matrix with entries that are 3\times 2 matrices with entries that are 2\times 2 matrices, and so multiplying 12\times 12 matrices can be done recursively using Bini et al.’s algorithm three times, taking 1,000 entry products. Hence \omega\leq \log_{12} 1,000<2.78.

Schonhage (1981) developed a sophisticated theory involving the bilinear complexity of rectangular matrix multiplication that showed that approximate bilinear algorithms are even more powerful. His paper culminated in something called the Schonhage \tau-theorem, or the asymptotic sum inequality. This theorem is one of the most useful tools in designing and analyzing matrix multiplication algorithms.

Schonhage’s \tau-theorem says roughly the following. Suppose we have several instances of matrix multiplication, each involving matrices of possibly different dimensions, and we are somehow able to design an approximate bilinear algorithm that solves all instances and uses fewer products than would be needed when computing each instance separately. Then this bilinear algorithm can be used to multiply (larger) square matrices and would imply a nontrivial bound on \omega.

What is interesting about Schonhage’s theorem is that it is believed that when it comes to exact bilinear algorithms, one cannot use fewer products to compute several instances than one would use by just computing each instance separately. This is known as Strassen’s additivity conjecture. Schonhage showed that the additivity conjecture is false for approximate bilinear algorithms. In particular, he showed that one can approximately compute the product of a 3\times 1 by a 1\times 3 vector and the product of a 1\times 4 by a 4\times 1 vector together using only 10 entry products, whereas any exact bilinear algorithm would need at least 3\cdot 3+4 = 13 products. His theorem then implied \omega<2.55, and this was a huge improvement over the previous bound of Bini et al.

Using fast solutions for problems that are not matrix multiplications.
The next realization was that there is no immediate reason why the “base algorithm” that we use for our recursion has to compute a matrix product at all. Let us focus on the following family of computational problems. We are given two vectors x and y and we want to compute a third vector z. The dependence of z on x and y is given by a three-dimensional tensor t as follows: z_k = \sum_{ij} t_{ijk} x_i y_j. The vector z is a bilinear form. The tensor t can be arbitrary, but let us focus on the case where t_{ijk}\in \{0,1\}. Notice that t completely determines the computational problem. Some examples of such bilinear problems are polynomial multiplication and of course matrix multiplication. For polynomial multiplication, t_{ijk} = 1 if and only if j=k-i, and for matrix multiplication, t_{(i,i'),(j,j'),(k,k')}=1 if and only if i'=j, j'=k and k'=i.

The nice thing about these bilinear problems is that one can easily extend the theory of bilinear algorithms to them. A bilinear algorithm computing a problem instance for tensor t computes r products P_l of the form (\sum_{i} u_{il} x_i)(\sum_j v_{jl} y_j) and then sets z_k=\sum_k w_{kl} P_l. Here, an algorithm is nontrivial if the number of products r that it computes is less than the number of positions t_{ijk} where the tensor t is nonzero.

In order to be able to talk about recursion for general bilinear problems, it is useful to define the tensor product t\otimes t' of two tensors t and t': (t\otimes t')_{(i,i'),(j,j'),(k,k')} = t_{ijk}\cdot t_{i'j'k'}. Thus, the bilinear problem defined by t\otimes t' can be viewed as a bilinear problem z_k = \sum_{ij} t_{ijk} x_i y_j defined by t, where each product x_iy_j is actually itself a bilinear problem z_{ijk'}=\sum_{i'j'} t'_{i'j'k'} x_{ii'}y_{jj'} defined by t'.

This allows one to compute an instance of the problem defined by t\otimes t' using an algorithm for t and an algorithm for t'. One can similarly define the k^{th} tensor power of a tensor t as tensor-multiplying t by itself k times. Then any bilinear algorithm computing an instance defined by t using r entry products can be used recursively to compute the k^{th} tensor power of t using r^k products, just as in the case of matrix multiplication.

A crucial development in the study of matrix multiplication algorithms was the discovery that sometimes algorithms for bilinear problems that do not look at all like matrix products can be converted into matrix multiplication algorithms. This was first shown by Strassen in the development of his “laser method” and was later exploited in the work of Coppersmith and Winograd (1987,1990). The basic idea of the approach is as follows.

Consider a bilinear problem P for which you have a really nice approximate algorithm ALG that uses r entry products. Take the n^{th} tensor power P^n of P (for large n), and use ALG recursively to compute P^n using r^n entry products. P^n is a bilinear problem that computes a long vector z from two long vectors x and y. Suppose that we can embed the product C of two N\times N matrices A and B into P^n as follows: we put each entry of A into some position of x and set all other positions of x to 0, we similarly put each entry of B into some position of y and set all other positions of y to 0, and finally we argue that each entry of the product C is in some position of the computed vector z (all other z entries are 0). Then we would have a bilinear algorithm for computing the product of two N\times N matrices using r^n entry products, and hence \omega\leq \log_N r^n.

The goal is to make N as large of a function of n as possible, thus minimizing the upper bound on \omega.

Strassen’s laser method and Coppersmith and Winograd’s paper, and even Schonhage’s \tau-theorem, present ways of embedding a matrix product into a large tensor power of a different bilinear problem. The approaches differ in the starting algorithm and in the final matrix product embedding. We’ll give a very brief overview of the Coppersmith-Winograd algorithm.

The Coppersmith-Winograd algorithm.
The bilinear problem that Coppersmith and Winograd start with is as follows. Let q\geq 3 be an integer. Then we are given two vectors x and y of length q+2 and we want to compute a vector z of length q+2 defined as follows:

z_0 = \sum_{i=1}^q (x_iy_i+x_0y_{q+1})+x_{q+1}y_0,

z_i=x_iy_0+x_0y_i for i\in \{1,\ldots, q\}, and z_{q+1}=x_0y_0.

Notice that z is far from being a matrix product. However, it is related to 6 matrix products:

\sum_{i=1}^q x_iy_i which is the inner product of two q-length vectors,

x_0y_{q+1}, x_{q+1}y_0, and x_0y_0, which are three scalar products, and

the two matrix products computing x_iy_0 and x_0y_i for i\in \{1,\ldots, q\} which are both products of a vector with a scalar.

If we could somehow convert Coppersmith and Winograd’s bilinear problem into one of computing these 6 products as independent instances, then we would be able to use Schonhage’s \tau-theorem. Unfortunately, however, the 6 matrix products are merged in a strange way, and it is unclear whether one can get anything meaningful out of an algorithm that solves this bilinear problem.

Coppersmith and Winograd develop a multitude of techniques to show that when one takes a large tensor power of the starting bilinear problem, one can actually decouple these merged matrix products, and one can indeed apply the \tau-theorem. The \tau-theorem then gives the final embedding of a large matrix product into a tensor power of the original construction, and hence defines a matrix multiplication algorithm.

Their approach combines several impressive ingredients: sets avoiding 3-term arithmetic progressions, hashing and the probabilistic method. The algorithm computing their base bilinear problem is also impressive. The number of entry products it computes is q+2, which is exactly the length of the output vector z! That is, their starting algorithm is optimal for the particular problem that they are solving.

What is not optimal, however, is the analysis of how good of a matrix product algorithm one can obtain from the base algorithm. Coppersmith and Winograd noticed this themselves: They first applied their analysis to the original bilinear algorithm and obtained an embedding of an f(n)\times f(n) matrix product into the 2n-tensor power of the bilinear problem for some explicit function f. (Then they took n to go to infinity and obtained \omega< 2.388.) Then they noticed, that if one applies the analysis to the second tensor power of the original construction, then one obtains an embedding of an f'(n)\times f'(n) matrix product into the same 2n-tensor power, where f'(n)>f(n). That is, although one is considering embeddings into the same (2n) tensor power of the construction, the analysis crucially depends on which tensor power of the construction you start from! This led to the longstanding bound \omega<2.376. Coppersmith and Winograd left as an open problem what bound one can get if one starts from the third or larger tensor powers.

The recent improvements on \omega.
It seems that many researchers attempted to apply the analysis to the third tensor power of the construction, but this somehow did not improve bound on \omega. Because of this and since each new analysis required a lot of work, the approach was abandoned, at least until 2010. In 2010, Andrew Stothers  carried through the analysis on the fourth tensor power and discovered that the bound on \omega can be improved to 2.374.

As mentioned earlier, the original Coppersmith-Winograd bilinear problem was related to 6 different matrix multiplication problems that were merged together. The k^{th} tensor power of the bilinear problem is similarly composed of poly(k) merged instances of simpler bilinear ptoblems, however these instances may no longer be matrix multiplications. When applying a Coppersmith-Winograd-like analysis to the k^{th} tensor power, there are two main steps.

The first step involves analyzing each of the poly(k) bilinear problems, intuitively in terms of how close they are to matrix products; there is a formal definition of the similarity measure called the value of the bilinear form. The second step defines a family of matrix product embeddings in the n^{th} tensor power in terms of the values. These embeddings are defined via some variables and constraints, and each represents some matrix multiplication algorithm. Finally, one solves a nonlinear optimization program to find the best among these embeddings, essentially finding the best matrix multiplication algorithm in the search space.

Both the Coppersmith-Winograd paper and Stothers’ thesis perform an entirely new analysis for each new tensor power k. The main goal of my work was to provide a general framework so that the two steps of the analysis do not have to be redone for each new tensor power k. My paper first shows that the first step, the analysis of each of the values, can be completely automated by solving linear programs and simple systems of linear equations. This means that instead of proving poly(k) theorems one only needs to solve poly(k) linear programs and linear systems, a simpler task. My paper then shows that the second step of the analysis, the theorem defining the search space of algorithms, can also be replaced by just solving a simple system of linear equations. (Amusingly, the fact that matrix multiplication algorithms can be used to solve linear equations implies that good matrix multiplication algorithms can be used to search for better matrix multiplication algorithms.) Together with the final nonlinear program, this presents a fully automated approach to performing a Coppersmith-Winograd-like analysis.

After seeing Stothers’ thesis in the summer of last year, I was impressed by a shortcut he had used in the analysis of the values of the fourth tensor power. This shortcut gave a way to use recursion in the analysis, and I was able to incorporate it in my analysis to show that the number of linear programs and linear systems one would need to solve to compute the values for the k^{th} tensor power drops down to O(k^2), at least when k is a power of 2. This drop in complexity allowed me to analyze the 8^{th} tensor power, thus obtaining an improvement in the bound of \omega: \omega<2.3727.

There are several lingering open questions. The most natural one is, how does the bound on \omega change when applying the analysis to higher and higher tensor powers. I am currently working together with a Stanford undergraduate on this problem: we’ll apply the automated approach to several consecutive powers, hoping to uncover a pattern so that one can then mathematically analyze what bounds on \omega can be proven with this approach.

A second open question is more far reaching: the Coppersmith-Winograd analysis is not optimal– in a sense it computes an approximation to the best embedding of a matrix product in the n^{th} tensor power of their bilinear problem. What is the optimal embedding? Can one analyze it mathematically? Can one automate the search for it?

Finally, I am fascinated by automating the search for algorithms for problems. In the special case of matrix multiplication we were able to define a search space of algorithms and then use software to optimize over this search space. What other computational problems can one approach in this way?

Combinatorial problems related to matrix multiplication (guest post by Chris Umans)

September 10, 2012

Determining the exponent of matrix multiplication, \omega, is one of the most prominent open problems in algebraic complexity. Strassen famously showed that \omega < 2.81 in 1969, and after a sequence of works culminating in work of Stothers and Williams the best current upper bound is \omega < 2.3727. It is believed that indeed \omega = 2, meaning that there is a family of algorithms with running time O(n^{2 + \epsilon}) for every \epsilon > 0. In this sense, if \omega =2, matrix multiplication exhibits “speedup”: there is no single best asymptotic algorithm, only a sequence of algorithms having running times O(n^a), with a approaching 2. In fact Coppersmith and Winograd showed in 1982 that this phenomenon occurs whatever the true value of \omega is (two, or larger than two), a result that is often summarized by saying “the exponent of matrix multiplication is an infimum, not a minimum.”

Despite the algebraic nature of the matrix multiplication problem, many of the suggested routes to proving \omega = 2 are combinatorial. This post is about connections between one such combinatorial conjecture (the “Strong Uniquely Solvable Puzzle” conjecture of Cohn, Kleinberg, Szegedy and Umans) and some more well-known combinatorial conjectures. These results appear in this recent paper with Noga Alon and Amir Shpilka.

The cap set conjecture, a strengthening, and a weakening

We start with a well-known, and stubborn, combinatorial problem, the “cap-set problem.” A cap-set in Z_3^n is a subset of vectors in Z_3^n with the property that no three vectors (not all the same) sum to 0. The best lower bound is 2.21...^n due to Edel. Let us denote by the “cap set conjecture” the assertion that one can actually find sets of size (3 - o(1))^n. It appears that there is no strong consensus on whether this conjecture is true or false, but one thing that we do know is that it is a popular blog topic (see herehere, here, and the end of this post).

Now it happens that the only triples of elements of Z_3 that sum to 0 are \{0,0,0\}, \{1,1,1\}, \{2,2,2\} and \{1,2,3\} — triples that are “all the same, or all different.” So the cap set conjecture can be rephrased as the conjecture that there are large subsets of vectors in Z_3^n so that for any x, y,z in the set (not all equal), there is some coordinate i such that (x_i, y_i, z_i) are “not all the same, and not all different.” Generalizing from this interpretation, we arrive at a family of much stronger assertions, one for each D: let’s denote by the “Z_D^n conjecture” the assertion that there are subsets of vectors in Z_D^n, of size (D - o(1))^n with the property that for any x, y,z in the set (not all equal), there is some coordinate i such that (x_i, y_i, z_i) are “not all the same, and not all different.” Such sets of size (D - o(1))^n in Z_D^n imply size (3 - o(1))^n-size sets of this type in Z_3^n by viewing each symbol in Z_D as a block of \log_3 D symbols in Z_3, so the Z_D^n conjecture is stronger (i.e. it implies the cap-set conjecture). Indeed, if you play around with constructions you can see that as D gets larger it seems harder and harder to have large sets avoiding triples of vectors for which every coordinate i has (x_i, y_i, z_i) “all different.” Thus one would probably guess that the Z_D^n conjecture is false.

One of the results in our paper is that the Z_D^n conjecture is in fact *equivalent* to falsifying the following well-known sunflower conjecture of Erdos-Szemeredi: there exists an \epsilon > 0 such that any family of at least 2^{(1 - \epsilon)n} subsets of [n] contains a 3-sunflower (i.e., three sets whose pairwise intersections are all equal).

So the intuition that the Z_D^n conjecture is false agrees with Erdos and Szemeredi’s intuition, which is a good sign.

Now let’s return to the cap-set conjecture in Z_3^n. Being a cap set is a condition on *all* triples of vectors in the set. If one restricts to a condition on only some of the triples of vectors, then a construction becomes potentially easier. We will be interested in a “multicolored” version in which vectors in the set are assigned one of three colors (say red, blue, green), and we only require that no triple of vectors x,y,z sums to 0 with x being red, y being blue, and z being green. But a moment’s thought reveals that the problem has become too easy: one can take (say) the red vectors to be all vectors beginning with 01, the green vectors to be all vectors beginning with 00, and the the blue vectors to be all vectors beginning with 1. A solution that seems to recover the original flavor of the problem, is to insist that the vectors come from a collection of red, green, blue triples (x, y,z) with x+y+z = 0; we then require that every red, green, blue triple *except those in the original collection* not sum to 0. So, let’s denote by the “multicolored cap-set conjecture” the assertion that there are subsets of *triples of vectors* from Z_3^n of size (3 - o(1))^n, with each triple summing to 0, and with the property that for any three triples (x_1, y_1,z_1), (x_2, y_2, z_2), (x_3, y_3, z_3) in the set (not all equal), x_1+y_2+z_3 \neq 0.

If S is a cap set in Z_3^n, then the collection of triples \{(x,x,x): x \in S\} constitutes a multicolored cap-set of the same size, so the multicolored version of the conjecture is indeed weaker (i.e. it is implied by the cap-set conjecture).

A matrix multiplication conjecture between the two

The SUSP conjecture of Cohn, Kleinberg, Szegedy, and Umans is the following: there exist subsets of *triples of vectors* from \{0,1\}^n of size (27/4 - o(1))^{n/3}, with each triple summing (in the integers) to the all-ones vector, and with the property that for any three triples [(x_1, y_1,z_1), (x_2, y_2, z_2), (x_3, y_3, z_3)] in the set (not all equal), there is a coordinate i such that x_1[i]+y_2[i]+z_3[i] = 2.

The mysterious constant (27/4)^{1/3} comes from the fact {n \choose {n/3}} \approx (27/4)^{n/3}, and it is easy to see that one cannot have multiple triples with the same “x vector” (or y, or z…). More specifically, “most” triples (x, y, z) that sum to the all-ones vector are balanced (meaning that x, y, z each have weight n/3), and there can be at most {n \choose {n/3}} balanced triples, without repeating an x vector. So the conjecture is that there are actually subsets satisfying the requirements, whose sizes approach this easy upper bound.

If in the statement of the SUSP conjecture, one replaces “there is a coordinate i such that x_1[i]+y_2[i]+z_3[i] = 2” with “there is a coordinate i such that x_1[i]+y_2[i]+z_3[i] \neq 1” one gets the weaker “Uniquely Solvable Puzzle” conjecture instead of the “Strong Uniquely Solvable Puzzle” conjecture. Here one is supposed to interpret the (x,y,z) triples as triples of “puzzle pieces” that fit together (i.e. their 1’s exactly cover the n coordinates without overlap); the main requirement then ensures that no other way of “assembling the puzzle” fits together in this way, thus it is “uniquely solvable.” It is a consequence of the famous Coppersmith-Winograd paper that the Uniquely Solvable Puzzle conjecture is indeed true. Cohn, Kleinberg, Szegedy and Umans showed that if the stronger SUSP conjecture is true, then \omega = 2.

Two of the main results in our paper are that (1) the Z_D^n conjecture implies the SUSP conjecture, and (2) the SUSP conjecture implies the multicolored cap-set conjecture.

So by (1), *disproving the SUSP* conjecture is as hard as proving the Erdos-Szemeredi conjecture (which recall is equivalent to disproving the Z_D^n conjecture). Of course we hope the SUSP conjecture is true, but if it is not, it appears that will be difficult to prove.

And by (2), proving the SUSP conjecture entails proving the multicolored cap-set conjecture. So apart from being a meaningful weakening of the famous cap-set conjecture in Z_3^n, the multicolored cap-set conjecture can be seen as a “warm-up” to (hopefully) proving the SUSP conjecture and establishing \omega = 2. As a start, we show in our paper a lower bound of 2.51^n for multicolored cap-sets, which beats the 2.21^n lower bound of Edel for ordinary cap sets.

Returning to “speedup,” the topic of this blog, notice that the SUSP conjecture, as well as the cap-set, the multicolored cap-set, and the Z_D^n conjectures, all assert that there exist sets of size (c-o(1))^n with certain properties, for various constants c. In all cases c^n is easily ruled out, and so all one can hope for is a sequence of sets, one for each n, whose sizes approach c^n as n grows. If the SUSP conjecture is true, then it is this sequence of sets that directly corresponds to a sequence of matrix multiplication algorithms with running times O(n^a) for a approaching 2. This would then be a concrete manifestation of the “speedup” phenomenon for matrix multiplication.