## The asymptotic sum inequality is not optimal (guest post by Yuval Filmus)

Matrix multiplication has become a popular topic recently. The main goal in this area is determining the value of the matrix multiplication constant $\omega$, which is the infimum over all $\alpha$ such that two $n\times n$ complex matrices can be multiplied using $O(n^\alpha)$ arithmetic operations (addition, subtraction, multiplication, and division; arbitrary complex constants can appear as operands). Sloppy authors often define $\omega$ as a minimum instead of an infimum, but it is not clear that any $O(n^\omega)$ algorithm exists. Indeed, given the common belief that $\omega = 2$, a classic result of Ran Raz which gives a lower bound of $\Omega(n^2\log n)$, assuming all the constants used are small, suggests that there is no $O(n^\omega)$ algorithm (though there could be an asymptotically optimal algorithm, say one which runs in time $O(n^2\log n)$).

In this blog post we describe a result due to Coppersmith and Winograd which implies that a certain class of techniques provably cannot yield an optimal algorithm, that is, an $O(n^\omega)$ algorithm: namely, all algorithms obtained from a single invocation of Schönhage’s asymptotic sum inequality. This class includes Strassen’s original algorithm and all subsequent algorithms up to, but not including, Strassen’s laser method, which is used in the celebrated algorithm of Coppersmith and Winograd; the laser method corresponds to infinitely many invocations of the asymptotic sum inequality, and so is not subject to this limitation. The proof proceeds by showing how any identity to which the asymptotic sum inequality can be applied can be improved to another identity yielding a better bound on $\omega$.

### Strassen’s original algorithm and the language of tensors

In 1969, Strassen stunned the world of science by describing a matrix multiplication algorithm running in time $O(n^{\log_2 7})$. The algorithm relies on an identity showing how to multiply two $2\times 2$ matrices in non-commuting variables using $7$ “essential” multiplications (that is, not counting multiplications by constants). Running this algorithm recursively (by treating a $2^m\times 2^m$ matrix as a $2\times 2$ matrix whose entries are $2^{m-1}\times 2^{m-1}$ matrices) allows us to multiply two $2^m\times 2^m$ matrices using $7^m$ essential multiplications, and the resulting algorithm performs $O(7^m) = O(n^{\log_2 7})$ arithmetic operations, where $n = 2^m$.
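For concreteness, here is a minimal Python sketch of the recursion (illustrative, not from the original post); it multiplies integer matrices given as lists of lists and assumes the dimension is a power of two.

```python
import random

def add(A, B):
    """Entrywise sum of two matrices given as lists of lists."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    """Entrywise difference."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def split(A):
    """The four n/2 x n/2 blocks of an n x n matrix (n even)."""
    h = len(A) // 2
    return ([r[:h] for r in A[:h]], [r[h:] for r in A[:h]],
            [r[:h] for r in A[h:]], [r[h:] for r in A[h:]])

def strassen(A, B):
    """Multiply two 2^m x 2^m matrices using 7 recursive multiplications."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    m1 = strassen(add(a11, a22), add(b11, b22))
    m2 = strassen(add(a21, a22), b11)
    m3 = strassen(a11, sub(b12, b22))
    m4 = strassen(a22, sub(b21, b11))
    m5 = strassen(add(a11, a12), b22)
    m6 = strassen(sub(a21, a11), add(b11, b12))
    m7 = strassen(sub(a12, a22), add(b21, b22))
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(sub(add(m1, m3), m2), m6)
    return ([r1 + r2 for r1, r2 in zip(c11, c12)] +
            [r1 + r2 for r1, r2 in zip(c21, c22)])

def naive(A, B):
    """Textbook cubic-time multiplication, for comparison."""
    n = len(A)
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)]

random.seed(0)
A = [[random.randint(-5, 5) for _ in range(8)] for _ in range(8)]
B = [[random.randint(-5, 5) for _ in range(8)] for _ in range(8)]
assert strassen(A, B) == naive(A, B)
```

Each level of the recursion spawns $7$ subproblems of half the size, which is where the count of $7^m$ essential multiplications comes from.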

Later on, Strassen and others developed a framework in which the basic identity can be expressed succinctly as

$R(\langle 2,2,2 \rangle) \leq 7.$

Generalizing the recursive construction gives Strassen’s theorem

$\omega \leq \log_m R(\langle m,m,m \rangle).$

It is now time to explain the notations $\langle 2,2,2 \rangle$ and $R(\cdot)$. The notation $\langle 2,2,2 \rangle$, or more generally $\langle n,m,p \rangle$, represents a certain tensor, a three-dimensional analog of a matrix:

$\displaystyle \langle n,m,p \rangle = \sum_{i=1}^n \sum_{j=1}^m \sum_{k=1}^p x_{ij} y_{jk} z_{ki}.$

The expression on the right-hand side represents a three-dimensional array whose rows are indexed by the $nm$ elements $x_{ij}$, whose columns are indexed by the $mp$ elements $y_{jk}$, and whose third dimension is indexed by the $pn$ elements $z_{ki}$. Each term $x_{ij} y_{jk} z_{ki}$ in the sum indexes a particular cell in this three-dimensional array, and represents the fact that this cell has value $1$. All other cells have value $0$. For comparison, a matrix $A = (a_{ij})$ could be encoded in a similar fashion by the double sum $\sum_{ij} a_{ij} x_i y_j$. The tensor $\langle n,m,p \rangle$ represents the multiplication of an $n\times m$ matrix (whose entries are indexed by the $x$ variables) by an $m\times p$ matrix (whose entries are indexed by the $y$ variables), resulting in an $n\times p$ matrix (whose entries are indexed by the $z$ variables).

The notation $R(T)$ stands for the rank of the tensor $T$, which is the smallest number of rank one tensors which sum to $T$. A rank one tensor is an outer product of the form

$\displaystyle (\sum_a \alpha_a x_a) (\sum_b \beta_b y_b) (\sum_c \gamma_c z_c) = \sum_{a,b,c} \alpha_a \beta_b \gamma_c x_a y_b z_c.$

The usual matrix rank can also be defined in this fashion, and this is the easiest way to show that row rank equals column rank for matrices.

Here is Strassen’s identity in this language:

$\langle 2,2,2 \rangle = (x_{11} + x_{22})(y_{11} + y_{22})(z_{11} + z_{22}) +$

$(x_{21}+x_{22})y_{11}(z_{21}-z_{22}) + x_{11}(y_{12}-y_{22})(z_{12}+z_{22}) + x_{22}(y_{21}-y_{11})(z_{11}+z_{21}) +$

$(x_{11}+x_{12})y_{22}(-z_{11}+z_{12}) + (x_{21}-x_{11})(y_{11}+y_{12})z_{22} + (x_{12}-x_{22})(y_{21}+y_{22})z_{11}.$
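The identity can be checked mechanically by expanding the seven rank-one terms and comparing with $\langle 2,2,2 \rangle$ cell by cell. Here is an illustrative Python sketch (mine, not from the post); note that as printed, the identity pairs the monomial $x_{ij} y_{jk}$ with the z-variable indexed $(i,k)$, which agrees with $\langle 2,2,2 \rangle$ after relabeling the z-variables.

```python
from itertools import product
from collections import defaultdict

def outer(alpha, beta, gamma):
    """Rank-one tensor spanned by three linear forms (dicts: index -> coefficient)."""
    t = defaultdict(int)
    for a, ca in alpha.items():
        for b, cb in beta.items():
            for c, cc in gamma.items():
                t[a, b, c] += ca * cb * cc
    return t

# the seven rank-one terms of Strassen's identity, transcribed as printed
terms = [
    ({(1, 1): 1, (2, 2): 1},  {(1, 1): 1, (2, 2): 1},  {(1, 1): 1, (2, 2): 1}),
    ({(2, 1): 1, (2, 2): 1},  {(1, 1): 1},             {(2, 1): 1, (2, 2): -1}),
    ({(1, 1): 1},             {(1, 2): 1, (2, 2): -1}, {(1, 2): 1, (2, 2): 1}),
    ({(2, 2): 1},             {(2, 1): 1, (1, 1): -1}, {(1, 1): 1, (2, 1): 1}),
    ({(1, 1): 1, (1, 2): 1},  {(2, 2): 1},             {(1, 1): -1, (1, 2): 1}),
    ({(2, 1): 1, (1, 1): -1}, {(1, 1): 1, (1, 2): 1},  {(2, 2): 1}),
    ({(1, 2): 1, (2, 2): -1}, {(2, 1): 1, (2, 2): 1},  {(1, 1): 1}),
]

total = defaultdict(int)
for alpha, beta, gamma in terms:
    for cell, value in outer(alpha, beta, gamma).items():
        total[cell] += value

# A cell of the multiplication tensor holds 1 exactly when the j-indices match
# and the z-index is (i, k); this is the labeling used by the printed identity
# (transposing the z-indices recovers the z_{ki} convention of <n,m,p>).
idx = [(i, j) for i in (1, 2) for j in (1, 2)]
for a, b, c in product(idx, idx, idx):
    expected = 1 if a[1] == b[0] and c == (a[0], b[1]) else 0
    assert total[a, b, c] == expected
```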

Tensorial notation makes it clear that $R(\langle n,m,p \rangle) = R(\langle m,p,n \rangle)$, and the rank is the same for the other four permutations as well.

Two important operations on tensors are direct sum and tensor product. The direct sum $T_1 \oplus T_2$ of two tensors is a “block diagonal tensor” which has $T_1$ as one block and $T_2$ as the other block. In tensorial notation, after ensuring that variables appearing in both tensors are disjoint, $T_1 \oplus T_2 = T_1 + T_2$. For example,

$\displaystyle \langle n,1,m \rangle \oplus \langle 1,p,1 \rangle = \sum_{i=1}^n \sum_{k=1}^m x_i y_k z_{ki} + \sum_{j=1}^p X_j Y_j Z.$

The variable names we chose are arbitrary; in fact, the equals sign really means “equal up to isomorphism”, or rather “equal up to choice of indices for rows, columns and the third dimension”. The two constituent tensors represent an outer product and an inner product of vectors, respectively, and will figure prominently in the sequel. It is not hard to check that in general $R(T_1 \oplus T_2) \leq R(T_1) + R(T_2)$; perhaps surprisingly, the analogous inequality for border rank (defined below) is not always tight, as Schönhage’s identity below shows.

The tensor product $T_1 \otimes T_2$ of two tensors corresponds to the Kronecker product of matrices. Instead of explaining it formally, here is an example:

$\langle n_1,m_1,p_1 \rangle \otimes \langle n_2,m_2,p_2 \rangle =$

$\displaystyle \sum_{i_1=1}^{n_1} \sum_{j_1=1}^{m_1} \sum_{k_1=1}^{p_1} x_{i_1j_1} y_{j_1k_1} z_{k_1i_1} \otimes \sum_{i_2=1}^{n_2} \sum_{j_2=1}^{m_2} \sum_{k_2=1}^{p_2} x_{i_2j_2} y_{j_2k_2} z_{k_2i_2} =$

$\displaystyle \sum_{(i_1,i_2)=(1,1)}^{(n_1,n_2)} \sum_{(j_1,j_2)=(1,1)}^{(m_1,m_2)} \sum_{(k_1,k_2)=(1,1)}^{(p_1,p_2)} x_{(i_1,i_2)(j_1,j_2)} y_{(j_1,j_2)(k_1,k_2)} z_{(k_1,k_2)(i_1,i_2)} =$

$\langle n_1n_2,m_1m_2,p_1p_2 \rangle.$

Formally speaking, the rows of the product tensor are indexed by pairs of row indices, and similarly for columns and the third dimension; the $(a_1,a_2),(b_1,b_2),(c_1,c_2)$ entry of the product tensor is equal to $T_1(a_1,b_1,c_1) T_2(a_2,b_2,c_2)$. It is not hard to check that $R(T_1 \otimes T_2) \leq R(T_1) R(T_2)$.

The tensor product can be iterated to produce the tensor power $T^{\otimes N}$. In terms of rank, $R(T^{\otimes N}) \leq R(T)^N$.
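As a sanity check, the following Python sketch (illustrative, with hypothetical helper names) realizes matrix multiplication tensors as sets of cells holding $1$ and confirms that $\langle 2,1,3 \rangle \otimes \langle 1,2,1 \rangle$ becomes $\langle 2,2,3 \rangle$ once paired indices are regrouped.

```python
def mm_tensor(n, m, p):
    """Cells of <n,m,p> holding 1; a cell is (x-index, y-index, z-index)."""
    return {((i, j), (j, k), (k, i))
            for i in range(n) for j in range(m) for k in range(p)}

def tensor_product(T1, T2):
    """Index each coordinate of the product tensor by a pair, one from each factor."""
    return {((a1, a2), (b1, b2), (c1, c2))
            for (a1, b1, c1) in T1 for (a2, b2, c2) in T2}

def regroup(T):
    """Identify the pair of x-indices ((i1,j1),(i2,j2)) with ((i1,i2),(j1,j2)), etc."""
    return {(((a1[0], a2[0]), (a1[1], a2[1])),
             ((b1[0], b2[0]), (b1[1], b2[1])),
             ((c1[0], c2[0]), (c1[1], c2[1])))
            for ((a1, a2), (b1, b2), (c1, c2)) in T}

def paired_mm(dims1, dims2):
    """<n1*n2, m1*m2, p1*p2> written with composite indices (i1,i2), (j1,j2), (k1,k2)."""
    (n1, m1, p1), (n2, m2, p2) = dims1, dims2
    return {(((i1, i2), (j1, j2)), ((j1, j2), (k1, k2)), ((k1, k2), (i1, i2)))
            for i1 in range(n1) for i2 in range(n2)
            for j1 in range(m1) for j2 in range(m2)
            for k1 in range(p1) for k2 in range(p2)}

prod = regroup(tensor_product(mm_tensor(2, 1, 3), mm_tensor(1, 2, 1)))
assert prod == paired_mm((2, 1, 3), (1, 2, 1))
assert len(prod) == 2 * 2 * 3   # one cell per (i, j, k) triple of <2,2,3>
```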

### Border rank and the asymptotic sum inequality

Matrix rank is well-behaved under limits: the set of matrices of rank at most $R$ is closed, so a limit of matrices of rank at most $R$ itself has rank at most $R$. Surprisingly, this is not the case for tensors: the identity

$x_1 y_1 z_1 + \epsilon x_2 y_1 z_1 + x_2 y_2 z_1 + x_1 y_2 z_2 = (x_1 + \epsilon x_2)(y_1 + \epsilon^{-1} y_2)z_1 + x_1 y_2 (-\epsilon^{-1} z_1+z_2)$

shows that for any $\epsilon \neq 0$, the rank of the tensor on the left is $2$, and on the other hand one can check that when $\epsilon = 0$ the rank is $3$. The border rank of a tensor $T$, denoted $\underline{R}(T)$, is the smallest $R$ such that $T$ is the limit of a sequence of tensors of rank at most $R$. Here is an example due to Schönhage:

$\displaystyle \epsilon^2 \left(\sum_{i=1}^n \sum_{k=1}^m x_i y_k z_{ki} + \sum_{(i,k)=(1,1)}^{(n-1,m-1)} X_{(i,k)} Y_{(i,k)} Z\right) + O(\epsilon^3) =$

$\displaystyle \sum_{i=1}^{n-1} \sum_{k=1}^{m-1} (x_i + \epsilon X_{(i,k)})(y_k + \epsilon Y_{(i,k)}) (\epsilon^2 z_{ki} + Z) +$

$\displaystyle \sum_{i=1}^{n-1} x_i (y_m - \epsilon \sum_{k=1}^{m-1} Y_{(i,k)})(\epsilon^2 z_{mi} + Z) + \sum_{k=1}^{m-1} (x_n - \epsilon \sum_{i=1}^{n-1} X_{(i,k)}) y_k (\epsilon^2 z_{kn} + Z) +$

$\displaystyle x_n y_m (\epsilon^2 z_{mn} + Z) - \left(\sum_{i=1}^n x_i\right) \left(\sum_{k=1}^m y_k\right) Z.$

In this identity, $O(\epsilon^3)$ is some expression only involving powers of $\epsilon$ which are $\epsilon^3$ or higher. If we divide both sides by $\epsilon^2$ and let $\epsilon \to 0$, we recover $\langle n,1,m \rangle \oplus \langle 1, (n-1)(m-1), 1 \rangle$ on the left-hand side. Since there are $nm + 1$ terms on the right-hand side, we conclude that

$\underline{R}(\langle n,1,m \rangle \oplus \langle 1,(n-1)(m-1),1 \rangle) \leq nm+1$.

This is rather surprising, since clearly $\underline{R}(\langle n,1,m \rangle) = nm$. At the cost of an increase of one in the border rank, we are somehow able to compute a large disjoint inner product!
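Schönhage’s identity can likewise be verified mechanically. The sketch below (illustrative, not from the post) treats each scalar as a polynomial in $\epsilon$, expands the right-hand side for $n = 3$, $m = 4$, and checks that the coefficients of $\epsilon^0$ and $\epsilon^1$ vanish while the coefficient of $\epsilon^2$ is exactly $\langle n,1,m \rangle \oplus \langle 1,(n-1)(m-1),1 \rangle$.

```python
from collections import defaultdict
from itertools import product

def pmul(p, q):
    """Multiply polynomials in eps, represented as dicts {degree: coefficient}."""
    r = defaultdict(int)
    for d1, c1 in p.items():
        for d2, c2 in q.items():
            r[d1 + d2] += c1 * c2
    return dict(r)

def term(alpha, beta, gamma, acc):
    """Accumulate a rank-one term; each linear form maps variable -> polynomial."""
    for (a, ca), (b, cb), (c, cc) in product(alpha.items(), beta.items(), gamma.items()):
        for deg, coef in pmul(pmul(ca, cb), cc).items():
            acc[a, b, c][deg] = acc[a, b, c].get(deg, 0) + coef

n, m = 3, 4
ONE, E, E2 = {0: 1}, {1: 1}, {2: 1}   # the polynomials 1, eps, eps^2
acc = defaultdict(dict)

# main block of (n-1)(m-1) terms
for i in range(n - 1):
    for k in range(m - 1):
        term({('x', i): ONE, ('X', i, k): E},
             {('y', k): ONE, ('Y', i, k): E},
             {('z', k, i): E2, ('Z',): ONE}, acc)
# boundary terms (note the + Z in the last factor)
for i in range(n - 1):
    beta = {('y', m - 1): ONE}
    beta.update({('Y', i, k): {1: -1} for k in range(m - 1)})
    term({('x', i): ONE}, beta, {('z', m - 1, i): E2, ('Z',): ONE}, acc)
for k in range(m - 1):
    alpha = {('x', n - 1): ONE}
    alpha.update({('X', i, k): {1: -1} for i in range(n - 1)})
    term(alpha, {('y', k): ONE}, {('z', k, n - 1): E2, ('Z',): ONE}, acc)
# corner term and correction term
term({('x', n - 1): ONE}, {('y', m - 1): ONE},
     {('z', m - 1, n - 1): E2, ('Z',): ONE}, acc)
term({('x', i): {0: -1} for i in range(n)},
     {('y', k): ONE for k in range(m)},
     {('Z',): ONE}, acc)

# expected eps^2 coefficient: <n,1,m> plus the disjoint inner product
target = {(('x', i), ('y', k), ('z', k, i)): 1 for i in range(n) for k in range(m)}
target.update({(('X', i, k), ('Y', i, k), ('Z',)): 1
               for i in range(n - 1) for k in range(m - 1)})

for cell, poly in acc.items():
    assert poly.get(0, 0) == 0 and poly.get(1, 0) == 0   # no eps^0 or eps^1 residue
    assert poly.get(2, 0) == target.get(cell, 0)
assert all(acc[cell].get(2, 0) == 1 for cell in target)
```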

It is a deep fact due to Strassen that all upper bounds on the border rank arise as “approximate identities” like Schönhage’s identity above (the power of $\epsilon$ on the left-hand side can be different); moreover, such identities can always be normalized so that all powers of $\epsilon$ appearing in them are nonnegative integers. This shows that $\underline{R}(T \oplus T') \leq \underline{R}(T) + \underline{R}(T')$ and $\underline{R}(T \otimes T') \leq \underline{R}(T) \underline{R}(T')$ for all tensors $T,T'$.

How is Schönhage’s identity useful for multiplying two square matrices?

Asymptotic sum inequality: For any finite set of triples $(n_t,m_t,p_t)$, not all equal to $(1,1,1)$,

$\displaystyle \sum_t (n_t m_t p_t)^{\omega/3} \leq \underline{R}(\bigoplus_t \langle n_t,m_t,p_t \rangle).$

This result of Schönhage is a vast generalization of Strassen’s theorem mentioned above, which corresponds to the special case in which there is only one triple (with border rank replaced by the rank). Unfortunately, explaining the proof of the asymptotic sum inequality will take us too far afield. The proof can be found in many sources, such as my lecture notes. One interesting aspect is that the proof of the asymptotic sum inequality does not result in an $O(n^\alpha)$ algorithm, where $\alpha$ is the upper bound on $\omega$ resulting from the inequality. Rather, the proof gives a sequence of algorithms whose exponents converge to $\alpha$.

Choosing $n=m=4$ in Schönhage’s identity and applying the asymptotic sum inequality, we obtain

$16^{\omega/3} + 9^{\omega/3} \leq 17.$

Solving for $\omega$, we get the upper bound $\omega \leq 2.55$.
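The bound can be recovered numerically; the following Python snippet (illustrative) solves $16^{\omega/3} + 9^{\omega/3} = 17$ by bisection, using that the left-hand side is increasing in $\omega$.

```python
def bisect_root(f, lo, hi, iters=100):
    """Bisection for the unique root of an increasing function f on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return hi

# 16^(w/3) + 9^(w/3) is increasing in w, below 17 at w = 2 and above at w = 3
omega_bound = bisect_root(lambda w: 16 ** (w / 3) + 9 ** (w / 3) - 17, 2.0, 3.0)
assert 2.54 < omega_bound < 2.55
```

The root is approximately $2.548$.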

### The asymptotic sum inequality isn’t optimal

Main theorem. Let $\alpha$ be the upper bound on $\omega$ obtained from an application of the asymptotic sum inequality. Then $\omega < \alpha$.

In other words, a single invocation of the asymptotic sum inequality cannot yield the optimal value of $\omega$. One way around this limitation is to apply the asymptotic sum inequality to a sequence of identities — the course of action taken by Strassen’s laser method.

In the rest of this post, we provide an (almost) complete proof of this theorem. We start with a result generalizing Schönhage’s identity, which shows that in certain situations one can compute an additional inner product for free or at only a small cost in the border rank. This result is strong enough to prove a baby version of the main theorem.

Baby theorem. For all $n \geq 2$, $\omega < \log_n R(\langle n,n,n \rangle)$.

This shows that no single identity à la Strassen’s original identity can yield the optimal value of $\omega$.

Proving the main theorem itself requires more work, but the main ideas are contained in the proof of the baby theorem.

### Main construction and proof of baby theorem

All the results of Coppersmith and Winograd rely on the following construction. We say that a tensor $T$ involves $r$ independent x-variables if it involves $r$ x-variables and cannot be rewritten in terms of fewer than $r$ linear combinations of the x-variables.

Main construction. Let $T$ be a tensor involving $r$ independent x-variables and $s$ independent y-variables, and suppose that

$\displaystyle \epsilon^d T + O(\epsilon^{d+1}) = \sum_{t=1}^L \alpha_t \beta_t \gamma_t$, where $\displaystyle \sum_{t=1}^L \alpha_t \beta_t = 0$.

Then $\underline{R}(T \oplus \langle 1,L-r-s,1 \rangle) \leq L$, and furthermore this is witnessed by a decomposition of the form

$\displaystyle \epsilon^D(T \oplus \langle 1,L-r-s,1 \rangle) + O(\epsilon^{D+1}) = \sum_{t=1}^L \alpha'_t \beta'_t (\gamma'_t + \zeta),$

where $\zeta$, the z-variable of the inner product, doesn’t appear in $\gamma'_t$, and

$\displaystyle \sum_{t=1}^L \alpha'_t \beta'_t \zeta = \epsilon^D \langle 1,L-r-s,1 \rangle$.

Before proving the main construction, let us see why it is useful.

Corollary. For all $n,m,p$, putting $\Lambda = R(\langle n,m,p \rangle)$ we have

$\underline{R}(\langle n,m,p \rangle \oplus \langle 1,\Lambda-m(n+p-1),1 \rangle) \leq \Lambda + m.$

This corollary generalizes Schönhage’s identity: taking $(n,m,p) = (n,1,m)$ and using $R(\langle n,1,m \rangle) = nm$, we obtain $\underline{R}(\langle n,1,m \rangle \oplus \langle 1, (n-1)(m-1), 1 \rangle) \leq nm+1$.

Proof of corollary. Consider a decomposition $\langle n,m,p \rangle = \sum_{t=1}^\Lambda \alpha_t \beta_t \gamma_t$. Choose vectors $a \in \mathbb{R}^n$ and $b \in \mathbb{R}^p$ such that $c_t \triangleq \gamma_t(ab')$ is non-zero for all $t$; here $ab'$ is the outer product of $a$ and $b$, and $\gamma_t(ab')$ is the result of substituting the transpose of the matrix $ab'$ for the matrix variables $z_{ki}$: $z_{ki} = a_i b_k$. Such vectors $a,b$ exist since $\gamma_t \neq 0$ due to the minimality of the decomposition. Consider now the following decomposition of length $\Lambda+m$:

$\displaystyle \langle n,m,p \rangle = \sum_{t=1}^\Lambda (c_t\alpha_t) \beta_t (c_t^{-1}\gamma_t) + \sum_{j=1}^m (-\sum_{i=1}^n a_i x_{ij}) (\sum_{k=1}^p b_k y_{jk}) (0).$

Notice that

$\displaystyle \sum_{t=1}^\Lambda (c_t\alpha_t) \beta_t = \sum_{t=1}^\Lambda \alpha_t \beta_t \gamma_t(ab') = \sum_{i=1}^n \sum_{j=1}^m \sum_{k=1}^p x_{ij} y_{jk} a_i b_k,$

since we substituted $z_{ki} = a_i b_k$. On the other hand, also

$\displaystyle \sum_{j=1}^m (\sum_{i=1}^n a_i x_{ij}) (\sum_{k=1}^p b_k y_{jk}) = \sum_{i=1}^n \sum_{j=1}^m \sum_{k=1}^p x_{ij} y_{jk} a_i b_k.$

This shows that the premises of the main construction are satisfied, with $L=\Lambda+m$, $r=nm$, $s=mp$. Since $L-r-s = \Lambda-m(n+p-1)$, the corollary follows directly from the main construction. $\square$

Using the corollary, the baby theorem follows quite easily.

Proof of baby theorem. The proof uses the standard inequality $R(\langle n,n,n \rangle) > n^2$, which can be proved using the so-called substitution method. Let $R(\langle n,n,n \rangle) = n^\alpha$, so that $\alpha > 2$. For each integer $d$, $R_d \triangleq R(\langle n^d,n^d,n^d \rangle) = R(\langle n,n,n \rangle^{\otimes d}) \leq n^{d\alpha}$. If $R_d < n^{d\alpha}$ for some $d$ then Strassen’s theorem shows that $\omega \leq \log_{n^d} R_d < \alpha$, so assume that $R_d = n^{d\alpha}$ for all $d$. The corollary shows that

$\underline{R}(\langle n^d,n^d,n^d \rangle \oplus \langle 1,n^{d\alpha}-n^d(2n^d-1),1 \rangle) \leq n^{d\alpha} + n^d,$

and the asymptotic sum inequality shows that

$n^{d\omega} + (n^{d\alpha}-n^d(2n^d-1))^{\omega/3} \leq n^{d\alpha} + n^d.$

Since $\alpha > 2$ and so $\alpha > 3/\alpha$, for large enough $d$ we have $n^{d\alpha} - n^d(2n^d-1) > n^{d(3/\alpha)}$, and so

$n^{d\omega} + n^{d(\omega/\alpha)} < n^{d\alpha} + n^d,$

which shows that $\omega < \alpha$. $\square$
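To see the strict improvement concretely, here is an illustrative numeric instance (assuming, as in the proof, the pessimistic case $R_d = n^{d\alpha}$) with $n = 2$, $\alpha = \log_2 7$ and $d = 10$; the bound extracted from the displayed inequality is strictly smaller than $\alpha$.

```python
import math

def bisect_root(f, lo, hi, iters=200):
    """Bisection for the unique root of an increasing function f on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return hi

n, d = 2, 10
alpha = math.log2(7)                                   # from R(<2,2,2>) <= 7
inner = n ** (d * alpha) - n ** d * (2 * n ** d - 1)   # size of the extra inner product
rhs = n ** (d * alpha) + n ** d

# bound on omega implied by the displayed inequality, assuming R_d = n^(d*alpha)
w = bisect_root(lambda t: n ** (d * t) + inner ** (t / 3) - rhs, 2.0, alpha)
assert 2.0 < w < alpha
```

For these parameters the bound comes out strictly below $\alpha \approx 2.807$ (roughly $2.77$): the improvement is real but small.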

### Proof of main construction

Let us briefly recall the assumptions of the construction: $T$ is a tensor involving $r$ independent x-variables and $s$ independent y-variables, $\epsilon^d T + O(\epsilon^{d+1})$ decomposes as a sum of $L$ rank one tensors $\alpha_t \beta_t \gamma_t$, and $\sum_{t=1}^L \alpha_t \beta_t = 0$.

For each y-index $j$, let $e_j$ be the corresponding standard basis vector. Substituting $e_j$ for the y-variables, we obtain

$\sum_{t=1}^L \alpha_t \beta_t(e_j) = 0.$

We rephrase this in the language of matrices. Let $A$ be the $r \times L$ matrix whose columns are $\alpha_t$, and let $b_j$ be the $L \times 1$ column vector whose entries are $\beta_t(e_j)$. The displayed equation is equivalent to $Ab_j = 0$. Since the x-variables are independent, $A$ has rank $r$, and so the subspace of column vectors annihilating $A$ has dimension $L-r$. The independence of the y-variables implies that the $s$ vectors $b_j$ are linearly independent, and so we can find $L-r-s$ vectors $q_1,\ldots,q_{L-r-s}$ that together with $b_1,\ldots,b_s$ form a basis for the subspace of column vectors annihilating $A$.

We can apply the same argument the other way around. Let $B$ be the $L \times s$ matrix whose rows are $\beta_t$, and let $a_i$ be the $1 \times L$ row vector whose entries are $\alpha_t(e_i)$. As before, $a_i B = 0$, and the vectors $a_i$ are linearly independent. Furthermore, the subspace of row vectors annihilating $B$ has dimension $L-s$, and so $a_1,\ldots,a_r$ can be completed to a basis for this subspace by the addition of $L-r-s$ vectors $p_1,\ldots,p_{L-r-s}$.

Consider now the $(L-s)\times L$ matrix $P$ whose rows are $p_1,\ldots,p_{L-r-s},a_1,\ldots,a_r$ and the $L\times (L-r)$ matrix $Q$ whose columns are $q_1,\ldots,q_{L-r-s},b_1,\ldots,b_s$. The matrix $P$ has full row rank and the matrix $Q$ has full column rank, and so the $(L-s)\times (L-r)$ matrix $PQ$ has rank at least $L-r-s$. On the other hand, the last $r$ rows of $P$ are simply the matrix $A$, and the last $s$ columns of $Q$ simply form the matrix $B$. This shows that $PQ$ is a block diagonal matrix of the form $PQ = \begin{pmatrix} \ast & 0 \\ 0 & 0 \end{pmatrix}$, where $\ast$ is an $(L-r-s)\times(L-r-s)$ square block. Since $PQ$ has rank at least $L-r-s$, this block must be invertible. In fact, by choosing a different basis instead of $p_1,\ldots,p_{L-r-s}$, we can guarantee that

$PQ = \begin{pmatrix} I_{L-r-s} & 0 \\ 0 & 0 \end{pmatrix}.$

The next step is to lift $p_l,q_l$ to linear combinations of new x-variables $\xi_1,\ldots,\xi_{L-r-s}$ and y-variables $\eta_1,\ldots,\eta_{L-r-s}$: for each $t$, define

$\displaystyle \pi_t = \sum_{l=1}^{L-r-s} p_l(t)\, \xi_l, \qquad \kappa_t = \sum_{l=1}^{L-r-s} q_l(t)\, \eta_l.$

The linear combination $\pi_t$ has the same relation to the $p_l$ as $\alpha_t$ has to the $a_i$, and similarly for $\kappa_t$. The identity stated above for $PQ$ can be recast as

$\displaystyle \sum_{t=1}^L (\alpha_t + \pi_t) (\beta_t + \kappa_t) = \sum_{l=1}^{L-r-s} \xi_l \eta_l.$

This allows us to add the additional inner product to the original identity. Let $\zeta$ be a new z-variable, and consider the identity

$\displaystyle \sum_{t=1}^L (\alpha_t + \epsilon^{d+1} \pi_t) (\beta_t + \epsilon^{d+1} \kappa_t) (\epsilon^{d+2} \gamma_t + \zeta) =$

$\displaystyle \sum_{t=1}^L \alpha_t \beta_t \zeta + \epsilon^{d+1} \sum_{t=1}^L (\alpha_t \kappa_t + \pi_t \beta_t)\zeta +$

$\displaystyle \epsilon^{d+2}\sum_{t=1}^L \alpha_t \beta_t \gamma_t + \epsilon^{2d+2} \sum_{t=1}^L \pi_t \kappa_t \zeta + O(\epsilon^{2d+3}) =$

$\displaystyle \epsilon^{2d+2} \left(T + \sum_{l=1}^{L-r-s} \xi_l \eta_l \zeta\right) + O(\epsilon^{2d+3}).$

This shows that $\underline{R}(T \oplus \langle 1,L-r-s,1 \rangle) \leq L$. Furthermore, the coefficient of $\zeta$ is exactly $\epsilon^{2d+2} \sum_{l=1}^{L-r-s} \xi_l \eta_l$, matching the statement of the main construction with $D = 2d+2$. This completes the proof of the main construction. $\square$

### Proof of main theorem

The proof of the main theorem proceeds in three steps. In the first step we show how to absorb a tensor into another tensor satisfying the requirements of the main construction. In the second step we iterate the first step to obtain a result similar to the corollary used above to prove the baby theorem. Finally, in the third step we prove the main theorem itself, using a result akin to the lower bound $R(\langle n,n,n \rangle) > n^2$.

Auxiliary construction. Suppose that $T$ is a tensor along with a decomposition satisfying the requirements of the main construction, and that $T'$ is an arbitrary tensor (on different variables) whose border rank satisfies $L' \triangleq \underline{R}(T') \leq L-r-s$ (notation as in the main construction). Then there is a decomposition of $T \oplus T'$ of length $L +L'$ satisfying the requirements of the main construction.

Proof. Apply the main construction to $T$ to obtain a decomposition

$\displaystyle \epsilon^d(T + \sum_{l=1}^{L-r-s} \xi_l \eta_l \zeta) + O(\epsilon^{d+1}) = \sum_{t=1}^L \alpha_t \beta_t (\gamma_t + \zeta).$

Furthermore, let $\epsilon^{d'} T' + O(\epsilon^{d'+1}) = \sum_{t'=1}^{L'} \alpha'_{t'} \beta'_{t'} \gamma'_{t'}$ be a decomposition of $T'$. By adjusting one of the decompositions, we can assume without loss of generality that $d' = d$.

Denote by $\xi \gets -\alpha'$ the substitution that sets $\xi_l = -\alpha'_l$ for $l \leq L'$ and $\xi_l = 0$ for $L' < l \leq L-r-s$, and define $\eta \gets \beta'$ similarly. Applying these substitutions to the decomposition of $T$ above, we get

$\displaystyle \sum_{t=1}^L \alpha_t(\xi \gets -\alpha') \beta_t(\eta \gets \beta') (\gamma_t + \zeta) + \sum_{t'=1}^{L'} \alpha'_{t'} \beta'_{t'} (\gamma'_{t'} + \epsilon^d \zeta) =$

$\displaystyle \epsilon^d (T - \sum_{t'=1}^{L'} \alpha'_{t'} \beta'_{t'} \zeta) + \epsilon^d (T' + \sum_{t'=1}^{L'} \alpha'_{t'} \beta'_{t'} \zeta) + O(\epsilon^{d+1}) =$

$\epsilon^d (T + T') + O(\epsilon^{d+1}).$

Moreover, the coefficient of $\zeta$ in the first expression is exactly

$\displaystyle \epsilon^d\left(\left[\sum_{l=1}^{L-r-s} \xi_l \eta_l\right](\xi\gets -\alpha', \eta\gets \beta') + \sum_{t'=1}^{L'} \alpha'_{t'} \beta'_{t'}\right) = 0.$

We can rewrite the combined decomposition as

$\displaystyle \epsilon^d (T + T') + O(\epsilon^{d+1}) = \sum_{t=1}^{L+L'} \alpha''_t \beta''_t (\gamma''_t + c_t \zeta),$

where $c_t = 1$ for the first $L$ terms, $c_t = \epsilon^d$ for the remaining $L'$ terms, and the total coefficient of $\zeta$ vanishes. Setting $\zeta = 0$, we obtain a decomposition satisfying the requirements of the main construction, completing the proof. $\square$

The next step is iterating the auxiliary construction, with an (almost) arbitrary starting point. Applying the main construction at the very end, we obtain the following corollary.

Auxiliary corollary. Suppose $T$ is a tensor involving $r$ independent x-variables and $s$ independent y-variables, whose border rank $L$ is at least $r+s$. Then for every integer $D \geq 1$,

$\underline{R}(D \cdot T \oplus \langle 1,L+D(L-r-s),1 \rangle) \leq (D+1) L.$

In the statement of the corollary, $D \cdot T$ is the direct sum of $D$ copies of $T$.

Proof. Consider some decomposition $\epsilon^d T + O(\epsilon^{d+1}) = \sum_{t=1}^L \alpha_t \beta_t \gamma_t$, and extend it to the decomposition

$\epsilon^d T + O(\epsilon^{d+1}) = \sum_{t=1}^L \alpha_t \beta_t \gamma_t + \sum_{t=1}^L (-\alpha_t) \beta_t (0)$

of length $2L$, which satisfies the requirements of the main construction. By assumption $\underline{R}(T) = L \leq 2L - r - s$, and so we can apply the auxiliary construction to obtain a decomposition of $2\cdot T$ of length $3L$ satisfying the requirements of the main construction with $2r$ independent x-variables and $2s$ independent y-variables. Once again, by assumption $\underline{R}(T) = L \leq 3L - 2r - 2s$, and so we can apply the auxiliary construction again (to $2\cdot T$ and $T$) to obtain a decomposition of $3\cdot T$ of length $4L$ satisfying the requirements of the main construction. In this way, for every $D$ we can obtain a decomposition of $D\cdot T$ of length $(D+1)L$ satisfying the requirements of the main construction with $Dr$ independent x-variables and $Ds$ independent y-variables. A final application of the main construction completes the proof. $\square$

Before we can prove the main theorem, we need one simple result.

Lemma. Suppose that applying the asymptotic sum inequality to the tensors $T,T'$ yields the same upper bound $\alpha$ on $\omega$. If $\underline{R}(T \oplus T') < \underline{R}(T) + \underline{R}(T')$ then applying the asymptotic sum inequality to $T \oplus T'$ shows that $\omega <\alpha$, and otherwise the same upper bound on $\omega$ is obtained. Similarly, if $\underline{R}(T \otimes T') < \underline{R}(T) \underline{R}(T')$ then applying the asymptotic sum inequality to $T\otimes T'$ shows that $\omega < \alpha$, and otherwise the same upper bound on $\omega$ is obtained.

Proof. Let $T = \bigoplus_l \langle n_l,m_l,p_l \rangle$ have border rank $L$ and let $T' = \bigoplus_{l'} \langle n'_{l'},m'_{l'},p'_{l'} \rangle$ have border rank $L'$. By assumption,

$\displaystyle \sum_l (n_lm_lp_l)^{\alpha/3} = L \quad \text{and} \quad \sum_{l'} (n'_{l'}m'_{l'}p'_{l'})^{\alpha/3} = L'.$

Applying the asymptotic sum inequality to the direct sum gives

$\displaystyle \sum_l (n_lm_lp_l)^{\omega/3} + \sum_{l'} (n'_{l'}m'_{l'}p'_{l'})^{\omega/3} \leq \underline{R}(T\oplus T').$

If $\underline{R}(T\oplus T') = L+L'$ then this gives the bound $\omega \leq \alpha$, and otherwise a better bound is obtained (recall that $\underline{R}(T\oplus T') \leq L+L'$).

Applying the asymptotic sum inequality to the tensor product gives

$\displaystyle \sum_l (n_lm_lp_l)^{\omega/3} \sum_{l'} (n'_{l'}m'_{l'}p'_{l'})^{\omega/3} = \sum_{l,l'} (n_ln'_{l'}m_lm'_{l'}p_lp'_{l'})^{\omega/3} \leq \underline{R}(T\otimes T').$

If $\underline{R}(T\otimes T') = LL'$ then this gives the bound $\omega \leq \alpha$, and otherwise a better bound is obtained (recall that $\underline{R}(T\otimes T') \leq LL'$). $\square$

We can now complete the proof of the main theorem, using the above lemma freely.

Proof of main theorem. Let $T = \bigoplus_l \langle n_l,m_l,p_l \rangle$ be an arbitrary tensor such that $(n_l,m_l,p_l) \neq (1,1,1)$ for some $l$, denote the border rank of $T$ by $L$, and let $\alpha$ be the solution to the equation

$\displaystyle \sum_l (n_lm_lp_l)^{\alpha/3} = L.$

Our goal is to show that $\omega < \alpha$.

Any tensor can be “rotated” by replacing the x,y,z-variables by y,z,x-variables. Any such permutation of variables results in a tensor with the same rank and border rank. In particular, the tensor

$\displaystyle T_S = \bigoplus_l \langle n_l,m_l,p_l \rangle \oplus \langle m_l,p_l,n_l \rangle \oplus \langle p_l,n_l,m_l \rangle$

has border rank $\underline{R}(T_S) \leq 3L$, and the same number of x-variables, y-variables and z-variables, say $r$ of each. If $\underline{R}(T_S) < 3L$ then the asymptotic sum inequality shows that $\omega < \alpha$, so we can assume that $\underline{R}(T_S) = 3L$. A lower bound proved using the substitution method shows that $3L > r$ (this is Theorem 6.1 in the paper of Coppersmith and Winograd). Pick a power $d$ so large that $(3L)^d \geq (r+1)^d > 2r^d$, and put $T' = T_S^{\otimes d}$. Note that $T'$ has $r^d$ each of x-variables, y-variables and z-variables, and its border rank is at most $(3L)^d$. If the border rank is smaller then the asymptotic sum inequality shows that $\omega < \alpha$, so we can assume that $\underline{R}(T') = (3L)^d > 2r^d$.

For brevity, put $L' = (3L)^d$, $r' = r^d$, and $\delta = L'-2r' \geq 1$. Also, write $T' = \bigoplus_{l'} \langle n'_{l'},m'_{l'},p'_{l'} \rangle$, where

$\displaystyle \sum_{l'} (n'_{l'}m'_{l'}p'_{l'})^{\alpha/3} = L'.$

The auxiliary corollary shows that for every $D$, $\underline{R}(D\cdot T' \oplus \langle 1,L'+D\delta,1 \rangle) \leq (D+1)L'$. Applying the asymptotic sum inequality gives

$\displaystyle D \sum_{l'}(n'_{l'}m'_{l'}p'_{l'})^{\omega/3} + (L'+D\delta)^{\omega/3} \leq (D+1)L'.$

Choose $D$ so large that $(L'+D\delta)^{\omega/3} > L'$. Then

$D \sum_{l'} (n'_{l'} m'_{l'} p'_{l'})^{\omega/3} + L' < (D+1)L',$

showing that $\omega < \alpha$. This completes the proof. $\square$

### 6 Responses to “The asymptotic sum inequality is not optimal (guest post by Yuval Filmus)”

1. vznvzn Says:

nice/impressive work, seems like a real result, should this be a paper? is this your 1st blog ever?
fyi RJL recently mused on the general theme of this blog does program size matter? … am collecting links & plan to blog on the subj at some pt also.

• Yuval Filmus Says:

As mentioned in the second paragraph, this post is an exposition of a paper by Coppersmith and Winograd from 1982. Another blog post, describing recent novel results, will appear in a few days.

• vznvzn Says:

lol ok. maybe the ref to the “baby thm” was what tripped me up that there could be new results or a new/ different/ alternate/ simplified proof/ angle. presumably not called that in any other paper… 😉

• vzn Says:

another funky idea just occurred to me. could this be an algorithm that converges ie on repeated applications is actually estimating $\omega$ to higher degrees of precision ie similar to iterative algorithms that compute digits of $\pi$? in other words are the supposed “improvements” increasingly small?

• Yuval Filmus Says:

This is mentioned as an open question by Coppersmith and Winograd. In some sense, Strassen’s laser method, especially as used by Coppersmith and Winograd, is an answer. One can probably also show by experiment that the improvements decrease sharply, though this is of course not a proof.
