## Archive for the ‘Matrix multiplication’ Category

### Limits on the recent progress on matrix multiplication algorithms (guest post by Yuval Filmus)

July 20, 2014

Virginia Vassilevska Williams stunned the world of computer science by presenting the first improvement to the matrix multiplication constant $\omega$ in twenty years (later it was found that a more modest progress had been obtained independently and slightly earlier by Andrew James Stothers). The current world record is held by François Le Gall, who showed that $\omega < 2.3728639$ by perfecting the methods of Vassilevska Williams. For an exposition of this line of work, check out my recent lecture notes. For the full details, I recommend Le Gall’s paper and the journal version of Stothers’ thesis (with his advisor A. M. Davie). The recent progress heavily builds on classic work by Coppersmith and Winograd. Briefly, they came up with a basic identity known as the Coppersmith–Winograd identity. Applying Strassen’s laser method with their own ingenious construction (relying on Salem–Spencer sets), they obtained the bound $\omega < 2.388$. Applying their method to the tensor square of the basic identity, they obtained the improved bound $\omega < 2.376$. That’s where things had been standing since 1990, until Stothers managed to perform the computations for the fourth tensor power, obtaining the bound $\omega < 2.373$ in 2010. A year later (but independently), Vassilevska Williams analyzed the eighth tensor power and obtained the bound $\omega < 2.3729$. Further progress was obtained by Le Gall, who analyzed the sixteenth and thirty-second tensor powers, obtaining the current world record stated above. Although taking ever higher tensor powers improves the bound on $\omega$, the improvements are diminishing, and it seems safe to conjecture that taking the $N$th tensor power does not yield the conjectured $\omega = 2$ as $N \to \infty$. However, how can we be certain? Perhaps the improvements slow down, but like a slowly divergent series, eventually go all the way down to $2$? In the rest of the blog post, we describe a recent result of Andris Ambainis and myself, which shows that the best bound that this method can produce is $\omega < 2.3078$, for any value of $N$. In fact, our result allows for a wider class of methods which utilize the Coppersmith–Winograd identity, and hint to a new technique which can potentially lead to better algorithms. Very recently, Le Gall was able to use our methods to show that the best bound that can be obtained by taking an $N$th tensor power of the Coppersmith–Winograd identity is $\omega < 2.3725$, and so the current analysis of the identity is quite tight. (more…)

### The asymptotic sum inequality is not optimal (guest post by Yuval Filmus)

July 15, 2014

Matrix multiplication has become a popular topic recently. The main goal in this area is determining the value of the matrix multiplication constant $\omega$, which is the infimum over all $\alpha$ such that two $n\times n$ complex matrices can be multiplied using $O(n^\alpha)$ arithmetic operations (addition, subtraction, multiplication, and division; arbitrary complex constants can appear as operands). Sloppy authors often define $\omega$ as a minimum instead of an infimum, but it is not clear that any $O(n^\omega)$ algorithm exists. Indeed, given the common belief that $\omega = 2$, a classic result of Ran Raz which gives a lower bound of $\Omega(n^2\log n)$, assuming all the constants used are small, suggests that there is no $O(n^\omega)$ algorithm (though there could be an asymptotically optimal algorithm, say one which runs in time $O(n^2\log n)$).

In this blog post we describe a result due to Coppersmith and Winograd that implies that a certain class of techniques provably cannot yield an optimal exponent, i.e. an $O(n^\omega)$ algorithm, namely all algorithms which result from an invocation of Schönhage’s asymptotic sum inequality. This class includes Strassen’s original algorithm and all algorithms hence up to Strassen’s laser method, used in the celebrated algorithm of Coppersmith and Winograd, which corresponds to infinitely many invocations of the asymptotic sum inequality, and so is not subject to this limitation. The proof proceeds by showing how any identity (to which the asymptotic sum inequality can be applied) can be improved to another identity yielding a better bound on $\omega$.

### On the recent progress on matrix multiplication algorithms (guest post by Virginia Vassilevska Williams)

September 22, 2012

A central question in the theory of algorithms is to determine the constant $\omega$, called the exponent of matrix multiplication. This constant is defined as the infimum of all real numbers such that for all $\varepsilon>0$ there is an algorithm for multiplying $n\times n$ matrices running in time $O(n^{\omega+\varepsilon})$. Until the late 1960s it was believed that $\omega=3$, i.e. that no improvement can be found for the problem. In 1969, Strassen surprised everyone by showing that two $n\times n$ matrices can be multiplied in $O(n^{2.81})$ time. This discovery spawned a twenty-year-long extremely productive time in which the upper bound on $\omega$ was gradually lowered to $2.376$. After a twenty-year stall, some very recent research has brought the upper bound down to $2.373$.

Bilinear algorithms and recursion.
Strassen’s approach was to exploit the inherent recursive nature of matrix multiplication: the product of two $kn\times kn$ matrices can be viewed as the product of two $k\times k$ matrices, the entries of which are $n\times n$ matrices. Suppose that we have an algorithm ALG that runs in $o(k^3)$ time and multiplies two $k\times k$ matrices. Then one can envision obtaining a fast recursive algorithm for multiplying $k^i\times k^i$ matrices (for any integer $i>1$) as well: view the $k^i\times k^i$ matrices as $k\times k$ matrices the entries of which are $k^{i-1}\times k^{i-1}$ matrices; then multiply the $k\times k$ matrices using ALG and when ALG requires us to multiply two matrix entries, recurse.

This approach only works, provided that the operations that ALG performs on the matrix entries make sense as matrix operations: e.g. entry multiplication, taking linear combination of entries etc. One very general type of such algorithm is the so called bilinear algorithm: Given two matrices $A$ and $B$, compute $r$ products

$P_l = (\sum_{i,j} u_{ijl} A[i,j])(\sum_{ij} v_{ijl} B[i,j]),$

i.e. take $r$ possibly different linear combinations of entries of $A$ and multiply each one with a possibly different linear combination of entries of $B$. Then, compute each entry of the product $AB$ as a linear combination of the $P_l$: $AB[i,j]=\sum_{l} w_{ijl} P_l$.

Given a bilinear algorithm ALG for multiplying two $k\times k$ matrices (for constant $k$) that computes $r$ products $P_l$, the recursive approach that multiplies $k^i\times k^i$ matrices using ALG gives a bound $\omega\leq \log_k r$. To see this, notice that the number of additions that one has to do is no more than $3rk^2$: at most $2k^2$ to compute the linear combinations for each $P_l$ and at most $r$ for each of the $k^2$ outputs $AB[i,j]$. Since matrix addition takes linear time in the matrix size, we have a recurrence of the form $T(k^i) = r T(k^{i-1}) + O(rk^{2i})$.

As long as $r we get a nontrivial bound on $\omega$. Strassen’s famous algorithm used $k=2$ and $r=7$ thus showing that $\omega\leq \log_2 7$. A lot of work went into getting better and better “base algorithms” for varying constants $k$. Methods such as Pan’s method of trilinear aggregation were developed. This approach culminated in Pan’s algorithm (1978) for multiplying $70\times 70$ matrices that used $143,640$ products and hence showed that $\omega\leq \log_{70} 143,640<2.796$.

Approximate algorithms and Schonhage’s theorem.
A further step was to look at more general algorithms, so called approximate bilinear algorithms. In the definition of a bilinear algorithm the coefficients $u_{ijl},v_{ijl},w_{ijl}$ were constants. In an approximate algorithm, these coefficients can be formal linear combinations of integer powers of an indeterminate, $\lambda$ (e.g. $2\lambda^2-5\lambda^{-3}$). The entries of the product $AB$ are then only “approximately” computed, in the sense that $AB[i,j]=\sum_{l} w_{ijl} P_l + O(\lambda)$, where the $O(\lambda)$ term is a linear combination of positive powers of $\lambda$. The term “approximate” comes from the intuition that if you set $\lambda$ to be close to $0$, then the algorithm would get the product almost exactly.

Interestingly enough, Bini et al. (1980) showed that when dealing with the asymptotic complexity of matrix multiplication, approximate algorithms suffice for obtaining bounds on $\omega$. This is not obvious! What Bini et al. show, in a sense, is that as the size of the matrices grows, the “approximation” part can be replaced by a sort of bookkeeping which does not present an overhead asymptotically. The upshot is that if there is an approximate bilinear algorithm that computes $r$ products $P_l$ to compute the product of two $k\times k$ matrices, then $\omega\leq \log_k r$.

Bini et al. (1979) gave the first approximate bilinear algorithm for a matrix product. Their algorithm used $10$ entry products to multiply a $2\times 3$ matrix with a $3\times 3$ matrix. Although this algorithm is for rectangular matrices, it can easily be converted into one for square matrices: a $12\times 12$ matrix is a $2\times 3$ matrix with entries that are $3\times 2$ matrices with entries that are $2\times 2$ matrices, and so multiplying $12\times 12$ matrices can be done recursively using Bini et al.’s algorithm three times, taking $1,000$ entry products. Hence $\omega\leq \log_{12} 1,000<2.78$.

Schonhage (1981) developed a sophisticated theory involving the bilinear complexity of rectangular matrix multiplication that showed that approximate bilinear algorithms are even more powerful. His paper culminated in something called the Schonhage $\tau$-theorem, or the asymptotic sum inequality. This theorem is one of the most useful tools in designing and analyzing matrix multiplication algorithms.

Schonhage’s $\tau$-theorem says roughly the following. Suppose we have several instances of matrix multiplication, each involving matrices of possibly different dimensions, and we are somehow able to design an approximate bilinear algorithm that solves all instances and uses fewer products than would be needed when computing each instance separately. Then this bilinear algorithm can be used to multiply (larger) square matrices and would imply a nontrivial bound on $\omega$.

What is interesting about Schonhage’s theorem is that it is believed that when it comes to exact bilinear algorithms, one cannot use fewer products to compute several instances than one would use by just computing each instance separately. This is known as Strassen’s additivity conjecture. Schonhage showed that the additivity conjecture is false for approximate bilinear algorithms. In particular, he showed that one can approximately compute the product of a $3\times 1$ by a $1\times 3$ vector and the product of a $1\times 4$ by a $4\times 1$ vector together using only $10$ entry products, whereas any exact bilinear algorithm would need at least $3\cdot 3+4 = 13$ products. His theorem then implied $\omega<2.55$, and this was a huge improvement over the previous bound of Bini et al.

Using fast solutions for problems that are not matrix multiplications.
The next realization was that there is no immediate reason why the “base algorithm” that we use for our recursion has to compute a matrix product at all. Let us focus on the following family of computational problems. We are given two vectors $x$ and $y$ and we want to compute a third vector $z$. The dependence of $z$ on $x$ and $y$ is given by a three-dimensional tensor $t$ as follows: $z_k = \sum_{ij} t_{ijk} x_i y_j$. The vector $z$ is a bilinear form. The tensor $t$ can be arbitrary, but let us focus on the case where $t_{ijk}\in \{0,1\}$. Notice that $t$ completely determines the computational problem. Some examples of such bilinear problems are polynomial multiplication and of course matrix multiplication. For polynomial multiplication, $t_{ijk} = 1$ if and only if $j=k-i$, and for matrix multiplication, $t_{(i,i'),(j,j'),(k,k')}=1$ if and only if $i'=j, j'=k$ and $k'=i$.

The nice thing about these bilinear problems is that one can easily extend the theory of bilinear algorithms to them. A bilinear algorithm computing a problem instance for tensor $t$ computes $r$ products $P_l$ of the form $(\sum_{i} u_{il} x_i)(\sum_j v_{jl} y_j)$ and then sets $z_k=\sum_k w_{kl} P_l$. Here, an algorithm is nontrivial if the number of products $r$ that it computes is less than the number of positions $t_{ijk}$ where the tensor $t$ is nonzero.

In order to be able to talk about recursion for general bilinear problems, it is useful to define the tensor product $t\otimes t'$ of two tensors $t$ and $t'$: $(t\otimes t')_{(i,i'),(j,j'),(k,k')} = t_{ijk}\cdot t_{i'j'k'}$. Thus, the bilinear problem defined by $t\otimes t'$ can be viewed as a bilinear problem $z_k = \sum_{ij} t_{ijk} x_i y_j$ defined by $t$, where each product $x_iy_j$ is actually itself a bilinear problem $z_{ijk'}=\sum_{i'j'} t'_{i'j'k'} x_{ii'}y_{jj'}$ defined by $t'$.

This allows one to compute an instance of the problem defined by $t\otimes t'$ using an algorithm for $t$ and an algorithm for $t'$. One can similarly define the $k^{th}$ tensor power of a tensor $t$ as tensor-multiplying $t$ by itself $k$ times. Then any bilinear algorithm computing an instance defined by $t$ using $r$ entry products can be used recursively to compute the $k^{th}$ tensor power of $t$ using $r^k$ products, just as in the case of matrix multiplication.

A crucial development in the study of matrix multiplication algorithms was the discovery that sometimes algorithms for bilinear problems that do not look at all like matrix products can be converted into matrix multiplication algorithms. This was first shown by Strassen in the development of his “laser method” and was later exploited in the work of Coppersmith and Winograd (1987,1990). The basic idea of the approach is as follows.

Consider a bilinear problem $P$ for which you have a really nice approximate algorithm ALG that uses $r$ entry products. Take the $n^{th}$ tensor power $P^n$ of $P$ (for large $n$), and use ALG recursively to compute $P^n$ using $r^n$ entry products. $P^n$ is a bilinear problem that computes a long vector $z$ from two long vectors $x$ and $y$. Suppose that we can embed the product $C$ of two $N\times N$ matrices $A$ and $B$ into $P^n$ as follows: we put each entry of $A$ into some position of $x$ and set all other positions of $x$ to $0$, we similarly put each entry of $B$ into some position of $y$ and set all other positions of $y$ to $0$, and finally we argue that each entry of the product $C$ is in some position of the computed vector $z$ (all other $z$ entries are $0$). Then we would have a bilinear algorithm for computing the product of two $N\times N$ matrices using $r^n$ entry products, and hence $\omega\leq \log_N r^n$.

The goal is to make $N$ as large of a function of $n$ as possible, thus minimizing the upper bound on $\omega$.

Strassen’s laser method and Coppersmith and Winograd’s paper, and even Schonhage’s $\tau$-theorem, present ways of embedding a matrix product into a large tensor power of a different bilinear problem. The approaches differ in the starting algorithm and in the final matrix product embedding. We’ll give a very brief overview of the Coppersmith-Winograd algorithm.

The bilinear problem that Coppersmith and Winograd start with is as follows. Let $q\geq 3$ be an integer. Then we are given two vectors $x$ and $y$ of length $q+2$ and we want to compute a vector $z$ of length $q+2$ defined as follows:

$z_0 = \sum_{i=1}^q (x_iy_i+x_0y_{q+1})+x_{q+1}y_0$,

$z_i=x_iy_0+x_0y_i$ for $i\in \{1,\ldots, q\}$, and $z_{q+1}=x_0y_0$.

Notice that $z$ is far from being a matrix product. However, it is related to $6$ matrix products:

$\sum_{i=1}^q x_iy_i$ which is the inner product of two $q$-length vectors,

$x_0y_{q+1}$, $x_{q+1}y_0$, and $x_0y_0$, which are three scalar products, and

the two matrix products computing $x_iy_0$ and $x_0y_i$ for $i\in \{1,\ldots, q\}$ which are both products of a vector with a scalar.

If we could somehow convert Coppersmith and Winograd’s bilinear problem into one of computing these $6$ products as independent instances, then we would be able to use Schonhage’s $\tau$-theorem. Unfortunately, however, the $6$ matrix products are merged in a strange way, and it is unclear whether one can get anything meaningful out of an algorithm that solves this bilinear problem.

Coppersmith and Winograd develop a multitude of techniques to show that when one takes a large tensor power of the starting bilinear problem, one can actually decouple these merged matrix products, and one can indeed apply the $\tau$-theorem. The $\tau$-theorem then gives the final embedding of a large matrix product into a tensor power of the original construction, and hence defines a matrix multiplication algorithm.

Their approach combines several impressive ingredients: sets avoiding $3$-term arithmetic progressions, hashing and the probabilistic method. The algorithm computing their base bilinear problem is also impressive. The number of entry products it computes is $q+2$, which is exactly the length of the output vector $z$! That is, their starting algorithm is optimal for the particular problem that they are solving.

What is not optimal, however, is the analysis of how good of a matrix product algorithm one can obtain from the base algorithm. Coppersmith and Winograd noticed this themselves: They first applied their analysis to the original bilinear algorithm and obtained an embedding of an $f(n)\times f(n)$ matrix product into the $2n$-tensor power of the bilinear problem for some explicit function $f$. (Then they took $n$ to go to infinity and obtained $\omega< 2.388$.) Then they noticed, that if one applies the analysis to the second tensor power of the original construction, then one obtains an embedding of an $f'(n)\times f'(n)$ matrix product into the same $2n$-tensor power, where $f'(n)>f(n)$. That is, although one is considering embeddings into the same ($2n$) tensor power of the construction, the analysis crucially depends on which tensor power of the construction you start from! This led to the longstanding bound $\omega<2.376$. Coppersmith and Winograd left as an open problem what bound one can get if one starts from the third or larger tensor powers.

The recent improvements on $\omega$.
It seems that many researchers attempted to apply the analysis to the third tensor power of the construction, but this somehow did not improve bound on $\omega$. Because of this and since each new analysis required a lot of work, the approach was abandoned, at least until 2010. In 2010, Andrew Stothers  carried through the analysis on the fourth tensor power and discovered that the bound on $\omega$ can be improved to $2.374$.

As mentioned earlier, the original Coppersmith-Winograd bilinear problem was related to $6$ different matrix multiplication problems that were merged together. The $k^{th}$ tensor power of the bilinear problem is similarly composed of $poly(k)$ merged instances of simpler bilinear ptoblems, however these instances may no longer be matrix multiplications. When applying a Coppersmith-Winograd-like analysis to the $k^{th}$ tensor power, there are two main steps.

The first step involves analyzing each of the $poly(k)$ bilinear problems, intuitively in terms of how close they are to matrix products; there is a formal definition of the similarity measure called the value of the bilinear form. The second step defines a family of matrix product embeddings in the $n^{th}$ tensor power in terms of the values. These embeddings are defined via some variables and constraints, and each represents some matrix multiplication algorithm. Finally, one solves a nonlinear optimization program to find the best among these embeddings, essentially finding the best matrix multiplication algorithm in the search space.

Both the Coppersmith-Winograd paper and Stothers’ thesis perform an entirely new analysis for each new tensor power $k$. The main goal of my work was to provide a general framework so that the two steps of the analysis do not have to be redone for each new tensor power $k$. My paper first shows that the first step, the analysis of each of the values, can be completely automated by solving linear programs and simple systems of linear equations. This means that instead of proving $poly(k)$ theorems one only needs to solve $poly(k)$ linear programs and linear systems, a simpler task. My paper then shows that the second step of the analysis, the theorem defining the search space of algorithms, can also be replaced by just solving a simple system of linear equations. (Amusingly, the fact that matrix multiplication algorithms can be used to solve linear equations implies that good matrix multiplication algorithms can be used to search for better matrix multiplication algorithms.) Together with the final nonlinear program, this presents a fully automated approach to performing a Coppersmith-Winograd-like analysis.

After seeing Stothers’ thesis in the summer of last year, I was impressed by a shortcut he had used in the analysis of the values of the fourth tensor power. This shortcut gave a way to use recursion in the analysis, and I was able to incorporate it in my analysis to show that the number of linear programs and linear systems one would need to solve to compute the values for the $k^{th}$ tensor power drops down to $O(k^2)$, at least when $k$ is a power of $2$. This drop in complexity allowed me to analyze the $8^{th}$ tensor power, thus obtaining an improvement in the bound of $\omega$: $\omega<2.3727$.

There are several lingering open questions. The most natural one is, how does the bound on $\omega$ change when applying the analysis to higher and higher tensor powers. I am currently working together with a Stanford undergraduate on this problem: we’ll apply the automated approach to several consecutive powers, hoping to uncover a pattern so that one can then mathematically analyze what bounds on $\omega$ can be proven with this approach.

A second open question is more far reaching: the Coppersmith-Winograd analysis is not optimal– in a sense it computes an approximation to the best embedding of a matrix product in the $n^{th}$ tensor power of their bilinear problem. What is the optimal embedding? Can one analyze it mathematically? Can one automate the search for it?

Finally, I am fascinated by automating the search for algorithms for problems. In the special case of matrix multiplication we were able to define a search space of algorithms and then use software to optimize over this search space. What other computational problems can one approach in this way?

### Combinatorial problems related to matrix multiplication (guest post by Chris Umans)

September 10, 2012

Determining the exponent of matrix multiplication, $\omega$, is one of the most prominent open problems in algebraic complexity. Strassen famously showed that $\omega < 2.81$ in 1969, and after a sequence of works culminating in work of Stothers and Williams the best current upper bound is $\omega < 2.3727$. It is believed that indeed $\omega = 2$, meaning that there is a family of algorithms with running time $O(n^{2 + \epsilon})$ for every $\epsilon > 0$. In this sense, if $\omega =2$, matrix multiplication exhibits “speedup”: there is no single best asymptotic algorithm, only a sequence of algorithms having running times $O(n^a)$, with $a$ approaching $2$. In fact Coppersmith and Winograd showed in 1982 that this phenomenon occurs whatever the true value of $\omega$ is (two, or larger than two), a result that is often summarized by saying “the exponent of matrix multiplication is an infimum, not a minimum.”

Despite the algebraic nature of the matrix multiplication problem, many of the suggested routes to proving $\omega = 2$ are combinatorial. This post is about connections between one such combinatorial conjecture (the “Strong Uniquely Solvable Puzzle” conjecture of Cohn, Kleinberg, Szegedy and Umans) and some more well-known combinatorial conjectures. These results appear in this recent paper with Noga Alon and Amir Shpilka.

### The cap set conjecture, a strengthening, and a weakening

We start with a well-known, and stubborn, combinatorial problem, the “cap-set problem.” A cap-set in $Z_3^n$ is a subset of vectors in $Z_3^n$ with the property that no three vectors (not all the same) sum to 0. The best lower bound is $2.21...^n$ due to Edel. Let us denote by the “cap set conjecture” the assertion that one can actually find sets of size $(3 - o(1))^n$. It appears that there is no strong consensus on whether this conjecture is true or false, but one thing that we do know is that it is a popular blog topic (see herehere, here, and the end of this post).

Now it happens that the only triples of elements of $Z_3$ that sum to 0 are $\{0,0,0\}, \{1,1,1\}, \{2,2,2\}$ and $\{1,2,3\}$ — triples that are “all the same, or all different.” So the cap set conjecture can be rephrased as the conjecture that there are large subsets of vectors in $Z_3^n$ so that for any $x, y,z$ in the set (not all equal), there is some coordinate $i$ such that $(x_i, y_i, z_i)$ are “not all the same, and not all different.” Generalizing from this interpretation, we arrive at a family of much stronger assertions, one for each $D$: let’s denote by the “$Z_D^n$ conjecture” the assertion that there are subsets of vectors in $Z_D^n$, of size $(D - o(1))^n$ with the property that for any $x, y,z$ in the set (not all equal), there is some coordinate $i$ such that $(x_i, y_i, z_i)$ are “not all the same, and not all different.” Such sets of size $(D - o(1))^n$ in $Z_D^n$ imply size $(3 - o(1))^n$-size sets of this type in $Z_3^n$ by viewing each symbol in $Z_D$ as a block of $\log_3 D$ symbols in $Z_3$, so the $Z_D^n$ conjecture is stronger (i.e. it implies the cap-set conjecture). Indeed, if you play around with constructions you can see that as $D$ gets larger it seems harder and harder to have large sets avoiding triples of vectors for which every coordinate $i$ has $(x_i, y_i, z_i)$ “all different.” Thus one would probably guess that the $Z_D^n$ conjecture is false.

One of the results in our paper is that the $Z_D^n$ conjecture is in fact *equivalent* to falsifying the following well-known sunflower conjecture of Erdos-Szemeredi: there exists an $\epsilon > 0$ such that any family of at least $2^{(1 - \epsilon)n}$ subsets of $[n]$ contains a 3-sunflower (i.e., three sets whose pairwise intersections are all equal).

So the intuition that the $Z_D^n$ conjecture is false agrees with Erdos and Szemeredi’s intuition, which is a good sign.

Now let’s return to the cap-set conjecture in $Z_3^n$. Being a cap set is a condition on *all* triples of vectors in the set. If one restricts to a condition on only some of the triples of vectors, then a construction becomes potentially easier. We will be interested in a “multicolored” version in which vectors in the set are assigned one of three colors (say red, blue, green), and we only require that no triple of vectors $x,y,z$ sums to 0 with $x$ being red, $y$ being blue, and $z$ being green. But a moment’s thought reveals that the problem has become too easy: one can take (say) the red vectors to be all vectors beginning with $01$, the green vectors to be all vectors beginning with $00$, and the the blue vectors to be all vectors beginning with $1$. A solution that seems to recover the original flavor of the problem, is to insist that the vectors come from a collection of red, green, blue triples $(x, y,z)$ with $x+y+z = 0$; we then require that every red, green, blue triple *except those in the original collection* not sum to 0. So, let’s denote by the “multicolored cap-set conjecture” the assertion that there are subsets of *triples of vectors* from $Z_3^n$ of size $(3 - o(1))^n$, with each triple summing to 0, and with the property that for any three triples $(x_1, y_1,z_1), (x_2, y_2, z_2), (x_3, y_3, z_3)$ in the set (not all equal), $x_1+y_2+z_3 \neq 0$.

If $S$ is a cap set in $Z_3^n$, then the collection of triples $\{(x,x,x): x \in S\}$ constitutes a multicolored cap-set of the same size, so the multicolored version of the conjecture is indeed weaker (i.e. it is implied by the cap-set conjecture).

### A matrix multiplication conjecture between the two

The SUSP conjecture of Cohn, Kleinberg, Szegedy, and Umans is the following: there exist subsets of *triples of vectors* from $\{0,1\}^n$ of size $(27/4 - o(1))^{n/3}$, with each triple summing (in the integers) to the all-ones vector, and with the property that for any three triples $[(x_1, y_1,z_1), (x_2, y_2, z_2), (x_3, y_3, z_3)]$ in the set (not all equal), there is a coordinate $i$ such that $x_1[i]+y_2[i]+z_3[i] = 2$.

The mysterious constant $(27/4)^{1/3}$ comes from the fact ${n \choose {n/3}} \approx (27/4)^{n/3}$, and it is easy to see that one cannot have multiple triples with the same “$x$ vector” (or $y$, or $z$…). More specifically, “most” triples $(x, y, z)$ that sum to the all-ones vector are balanced (meaning that $x, y, z$ each have weight $n/3$), and there can be at most ${n \choose {n/3}}$ balanced triples, without repeating an $x$ vector. So the conjecture is that there are actually subsets satisfying the requirements, whose sizes approach this easy upper bound.

If in the statement of the SUSP conjecture, one replaces “there is a coordinate $i$ such that $x_1[i]+y_2[i]+z_3[i] = 2$” with “there is a coordinate $i$ such that $x_1[i]+y_2[i]+z_3[i] \neq 1$” one gets the weaker “Uniquely Solvable Puzzle” conjecture instead of the “Strong Uniquely Solvable Puzzle” conjecture. Here one is supposed to interpret the $(x,y,z)$ triples as triples of “puzzle pieces” that fit together (i.e. their 1’s exactly cover the $n$ coordinates without overlap); the main requirement then ensures that no other way of “assembling the puzzle” fits together in this way, thus it is “uniquely solvable.” It is a consequence of the famous Coppersmith-Winograd paper that the Uniquely Solvable Puzzle conjecture is indeed true. Cohn, Kleinberg, Szegedy and Umans showed that if the stronger SUSP conjecture is true, then $\omega = 2$.

Two of the main results in our paper are that (1) the $Z_D^n$ conjecture implies the SUSP conjecture, and (2) the SUSP conjecture implies the multicolored cap-set conjecture.

So by (1), *disproving the SUSP* conjecture is as hard as proving the Erdos-Szemeredi conjecture (which recall is equivalent to disproving the $Z_D^n$ conjecture). Of course we hope the SUSP conjecture is true, but if it is not, it appears that will be difficult to prove.

And by (2), proving the SUSP conjecture entails proving the multicolored cap-set conjecture. So apart from being a meaningful weakening of the famous cap-set conjecture in $Z_3^n$, the multicolored cap-set conjecture can be seen as a “warm-up” to (hopefully) proving the SUSP conjecture and establishing $\omega = 2$. As a start, we show in our paper a lower bound of $2.51^n$ for multicolored cap-sets, which beats the $2.21^n$ lower bound of Edel for ordinary cap sets.

Returning to “speedup,” the topic of this blog, notice that the SUSP conjecture, as well as the cap-set, the multicolored cap-set, and the $Z_D^n$ conjectures, all assert that there exist sets of size $(c-o(1))^n$ with certain properties, for various constants $c$. In all cases $c^n$ is easily ruled out, and so all one can hope for is a sequence of sets, one for each $n$, whose sizes approach $c^n$ as $n$ grows. If the SUSP conjecture is true, then it is this sequence of sets that directly corresponds to a sequence of matrix multiplication algorithms with running times $O(n^a)$ for $a$ approaching $2$. This would then be a concrete manifestation of the “speedup” phenomenon for matrix multiplication.