The asymptotic sum inequality is not optimal (guest post by Yuval Filmus)

Matrix multiplication has become a popular topic recently. The main goal in this area is determining the value of the matrix multiplication constant \omega, which is the infimum over all \alpha such that two n\times n complex matrices can be multiplied using O(n^\alpha) arithmetic operations (addition, subtraction, multiplication, and division; arbitrary complex constants can appear as operands). Sloppy authors often define \omega as a minimum instead of an infimum, but it is not clear that any O(n^\omega) algorithm exists. Indeed, given the common belief that \omega = 2, a classic result of Ran Raz, which gives a lower bound of \Omega(n^2\log n) assuming all the constants used are small, suggests that there is no O(n^\omega) algorithm (though there could still be an asymptotically optimal algorithm, say one running in time O(n^2\log n)).

In this blog post we describe a result due to Coppersmith and Winograd which implies that a certain class of techniques provably cannot yield an optimal exponent, i.e. an O(n^\omega) algorithm: namely, all algorithms obtained from a single invocation of Schönhage’s asymptotic sum inequality. This class includes Strassen’s original algorithm and all subsequent algorithms up to (but not including) Strassen’s laser method, used in the celebrated algorithm of Coppersmith and Winograd; the laser method corresponds to infinitely many invocations of the asymptotic sum inequality, and so is not subject to this limitation. The proof proceeds by showing how any identity to which the asymptotic sum inequality can be applied can be improved to another identity yielding a better bound on \omega.

Strassen’s original algorithm and the language of tensors

In 1969, Strassen stunned the world of science by describing a matrix multiplication algorithm running in time O(n^{\log_2 7}). The algorithm relies on an identity showing how to multiply two 2\times 2 matrices in non-commuting variables using 7 “essential” multiplications (that is, not including multiplication by constants). Running this algorithm recursively (by treating a 2^m\times 2^m matrix as a 2\times 2 matrix whose entries are 2^{m-1}\times 2^{m-1} matrices) allows us to multiply two 2^m\times 2^m matrices using 7^m essential multiplications, and the resulting algorithm has complexity O(7^m) = O(n^{\log_2 7}) for n = 2^m.
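Here is a minimal Python sketch of this recursion (an illustration, not the original algorithm's code; it only handles matrices whose size is a power of two, with no practical cutoff to classical multiplication):

```python
# Strassen's recursion for 2^m x 2^m matrices (illustrative sketch).
import numpy as np

def strassen(A, B):
    n = A.shape[0]
    if n == 1:
        return A * B  # 1x1 base case
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # the seven "essential" multiplications
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

A, B = np.random.randn(8, 8), np.random.randn(8, 8)
assert np.allclose(strassen(A, B), A @ B)
```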

Later on, Strassen and others developed a framework in which the basic identity can be expressed succinctly as

R(\langle 2,2,2 \rangle) \leq 7.

Generalizing the recursive construction gives Strassen’s theorem

\omega \leq \log_m R(\langle m,m,m \rangle).

It is now time to explain the notations \langle 2,2,2 \rangle and R(\cdot). The notation \langle 2,2,2 \rangle, or more generally \langle n,m,p \rangle, represents a certain tensor, a three-dimensional analog of a matrix:

\displaystyle \langle n,m,p \rangle = \sum_{i=1}^n \sum_{j=1}^m \sum_{k=1}^p x_{ij} y_{jk} z_{ki}.

The expression on the right-hand side represents a three-dimensional array whose rows are indexed by the nm elements x_{ij}, whose columns are indexed by the mp elements y_{jk}, and whose third dimension is indexed by the pn elements z_{ki}. Each term x_{ij} y_{jk} z_{ki} in the sum indexes a particular cell in this three-dimensional array, and represents the fact that this cell has value 1. All other cells have value 0. For comparison, a matrix A = (a_{ij}) could be encoded in a similar fashion by the double sum \sum_{ij} a_{ij} x_i y_j. The tensor \langle n,m,p \rangle represents the multiplication of an n\times m matrix (whose entries are indexed by the x variables) by an m\times p matrix (whose entries are indexed by the y variables), resulting in an n\times p matrix (whose entries are indexed by the z variables).
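For readers who like to experiment, here is one way to build \langle n,m,p \rangle as a three-dimensional 0/1 array in Python (a sketch; the helper name and the flattening of double indices are arbitrary choices, not part of the post):

```python
# Build <n,m,p> as an (nm) x (mp) x (pn) array of 0/1 entries, straight from its definition.
import numpy as np

def mmt(n, m, p):
    T = np.zeros((n * m, m * p, p * n))
    for i in range(n):
        for j in range(m):
            for k in range(p):
                # the cell indexed by x_{ij}, y_{jk}, z_{ki} has value 1
                T[i * m + j, j * p + k, k * n + i] = 1
    return T

print(mmt(2, 2, 2).shape)  # (4, 4, 4)
```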

The notation R(T) stands for the rank of the tensor T, which is the smallest number of rank one tensors which sum to T. A rank one tensor is an outer product of the form

\displaystyle (\sum_a \alpha_a x_a) (\sum_b \beta_b y_b) (\sum_c \gamma_c z_c) = \sum_{a,b,c} \alpha_a \beta_b \gamma_c x_a y_b z_c.

The usual matrix rank can also be defined in this fashion, and this is the easiest way to show that row rank equals column rank for matrices.

Here is Strassen’s identity in this language:

\langle 2,2,2 \rangle = (x_{11} + x_{22})(y_{11} + y_{22})(z_{11} + z_{22}) +

(x_{21}+x_{22})y_{11}(z_{12}-z_{22}) + x_{11}(y_{12}-y_{22})(z_{21}+z_{22}) + x_{22}(y_{21}-y_{11})(z_{11}+z_{12}) +

(x_{11}+x_{12})y_{22}(-z_{11}+z_{21}) + (x_{21}-x_{11})(y_{11}+y_{12})z_{22} + (x_{12}-x_{22})(y_{21}+y_{22})z_{11}.
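The identity can be checked mechanically. The following Python sketch (an illustration; each variable with double index (a,b) is encoded as a basis vector of length 4) verifies that the seven rank-one terms sum to \langle 2,2,2 \rangle as defined above:

```python
# Check that the seven rank-one terms sum to <2,2,2> = sum_{i,j,k} x_{ij} y_{jk} z_{ki}.
import numpy as np

def e(a, b):
    v = np.zeros(4)
    v[2 * (a - 1) + (b - 1)] = 1.0
    return v

def term(xs, ys, zs):
    # rank-one tensor (outer product) of three vectors
    return np.einsum('a,b,c->abc', xs, ys, zs)

x, y, z = e, e, e  # x_{ij}, y_{jk}, z_{ki} all use the same flattening

target = np.zeros((4, 4, 4))
for i in (1, 2):
    for j in (1, 2):
        for k in (1, 2):
            target += term(x(i, j), y(j, k), z(k, i))

decomposition = (
      term(x(1, 1) + x(2, 2), y(1, 1) + y(2, 2), z(1, 1) + z(2, 2))
    + term(x(2, 1) + x(2, 2), y(1, 1),           z(1, 2) - z(2, 2))
    + term(x(1, 1),           y(1, 2) - y(2, 2), z(2, 1) + z(2, 2))
    + term(x(2, 2),           y(2, 1) - y(1, 1), z(1, 1) + z(1, 2))
    + term(x(1, 1) + x(1, 2), y(2, 2),          -z(1, 1) + z(2, 1))
    + term(x(2, 1) - x(1, 1), y(1, 1) + y(1, 2), z(2, 2))
    + term(x(1, 2) - x(2, 2), y(2, 1) + y(2, 2), z(1, 1))
)
assert np.allclose(decomposition, target)
```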

Tensorial notation makes it clear that R(\langle n,m,p \rangle) = R(\langle m,p,n \rangle), and the rank is the same for the other four permutations as well.

Two important operations on tensors are direct sum and tensor product. The direct sum T_1 \oplus T_2 of two tensors is a “block diagonal tensor” which has T_1 as one block and T_2 as the other block. In tensorial notation, after ensuring that variables appearing in both tensors are disjoint, T_1 \oplus T_2 = T_1 + T_2. For example,

\displaystyle \langle n,1,m \rangle \oplus \langle 1,p,1 \rangle = \sum_{i=1}^n \sum_{k=1}^m x_i y_k z_{ki} + \sum_{j=1}^p X_j Y_j Z.

The variable names we chose are arbitrary; in fact, the equals sign really means “equal up to isomorphism”, or rather “equal up to choice of indices for rows, columns and the third dimension”. The two constituent tensors represent an outer product and an inner product of vectors, respectively, and will figure prominently in the sequel. It is not hard to check that in general R(T_1 \oplus T_2) \leq R(T_1) + R(T_2); the analogous bound for border rank is not always tight, as Schönhage’s example below shows.

The tensor product T_1 \otimes T_2 of two tensors corresponds to the Kronecker product of matrices. Instead of explaining it formally, here is an example:

\langle n_1,m_1,p_1 \rangle \otimes \langle n_2,m_2,p_2 \rangle =

\displaystyle \sum_{i_1=1}^{n_1} \sum_{j_1=1}^{m_1} \sum_{k_1=1}^{p_1} x_{i_1j_1} y_{j_1k_1} z_{k_1i_1} \otimes \sum_{i_2=1}^{n_2} \sum_{j_2=1}^{m_2} \sum_{k_2=1}^{p_2} x_{i_2j_2} y_{j_2k_2} z_{k_2i_2} =

\displaystyle \sum_{(i_1,i_2)=(1,1)}^{(n_1,n_2)} \sum_{(j_1,j_2)=(1,1)}^{(m_1,m_2)} \sum_{(k_1,k_2)=(1,1)}^{(p_1,p_2)} x_{(i_1,i_2)(j_1,j_2)} y_{(j_1,j_2)(k_1,k_2)} z_{(k_1,k_2)(i_1,i_2)} =

\langle n_1n_2,m_1m_2,p_1p_2 \rangle.

Formally speaking, the rows of the product tensor are indexed by pairs of row indices, and similarly for columns and the third dimension; the (a_1,a_2),(b_1,b_2),(c_1,c_2) entry of the product tensor is equal to T_1(a_1,b_1,c_1) T_2(a_2,b_2,c_2). It is not hard to check that R(T_1 \otimes T_2) \leq R(T_1) R(T_2).

The tensor product can be iterated to produce the tensor power T^{\otimes N}. In terms of rank, R(T^{\otimes N}) \leq R(T)^N.
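These operations are easy to experiment with. The following Python sketch (an illustration; the helper names and the flattening of paired indices are our own choices) forms the product of two matrix multiplication tensors and checks that it has the dimensions and number of nonzero entries of \langle n_1n_2,m_1m_2,p_1p_2 \rangle; the two arrays agree exactly after re-pairing the double indices, which we do not spell out here.

```python
import numpy as np

def mmt(n, m, p):
    T = np.zeros((n * m, m * p, p * n))
    for i in range(n):
        for j in range(m):
            for k in range(p):
                T[i * m + j, j * p + k, k * n + i] = 1
    return T

def tensor_product(T1, T2):
    # rows of the product are pairs of rows, and so on; entries multiply
    P = np.einsum('abc,def->adbecf', T1, T2)
    s = P.shape
    return P.reshape(s[0] * s[1], s[2] * s[3], s[4] * s[5])

n1, m1, p1, n2, m2, p2 = 2, 1, 3, 1, 2, 1
T12 = tensor_product(mmt(n1, m1, p1), mmt(n2, m2, p2))
T = mmt(n1 * n2, m1 * m2, p1 * p2)
# same tensor up to how the index pairs are flattened, so we only compare
# the shapes and the number of 1 entries here
assert T12.shape == T.shape and np.count_nonzero(T12) == np.count_nonzero(T)
```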

Border rank and the asymptotic sum inequality

For matrices, rank cannot be lowered by approximation: if a sequence of matrices of rank at most R converges to M, then M itself has rank at most R (the set of matrices of rank at most R is closed, being cut out by the vanishing of all (R+1)\times(R+1) minors). Surprisingly, this is not the case for tensors: the identity

x_1 y_1 z_1 + \epsilon x_2 y_1 z_1 + x_2 y_2 z_1 + x_1 y_2 z_2 = (x_1 + \epsilon x_2)(y_1 + \epsilon^{-1} y_2)z_1 + x_1 y_2 (-\epsilon^{-1} z_1+z_2)

shows that for any \epsilon \neq 0, the rank of the tensor on the left is 2, and on the other hand one can check that when \epsilon = 0 the rank is 3. The border rank of a tensor T, denoted \underline{R}(T), is the minimum rank of a sequence of tensors converging to T. Here is an example due to Schönhage:

\displaystyle \epsilon^2 \left(\sum_{i=1}^n \sum_{k=1}^m x_i y_k z_{ki} + \sum_{(i,k)=(1,1)}^{(n-1,m-1)} X_{(i,k)} Y_{(i,k)} Z\right) + O(\epsilon^3) =

\displaystyle \sum_{i=1}^{n-1} \sum_{k=1}^{m-1} (x_i + \epsilon X_{(i,k)})(y_k + \epsilon Y_{(i,k)}) (\epsilon^2 z_{ki} + Z) +

\displaystyle \sum_{i=1}^{n-1} x_i (y_m - \epsilon \sum_{k=1}^{m-1} Y_{(i,k)})(\epsilon^2 z_{mi} + Z) + \sum_{k=1}^{m-1} (x_n - \epsilon \sum_{i=1}^{n-1} X_{(i,k)}) y_k (\epsilon^2 z_{kn} + Z) +

\displaystyle x_n y_m (\epsilon^2 z_{mn} + Z) - \left(\sum_{i=1}^n x_i\right) \left(\sum_{k=1}^m y_k\right) Z.

In this identity, O(\epsilon^3) is some expression only involving powers of \epsilon which are \epsilon^3 or higher. If we divide both sides by \epsilon^2 and let \epsilon \to 0, we recover \langle n,1,m \rangle \oplus \langle 1, (n-1)(m-1), 1 \rangle on the left-hand side. Since there are nm + 1 terms on the right-hand side, we conclude that

\underline{R}(\langle n,1,m \rangle \oplus \langle 1,(n-1)(m-1),1 \rangle) \leq nm+1.

This is rather surprising, since clearly \underline{R}(\langle n,1,m \rangle) = nm. At the cost of an increase of one in the border rank, we are somehow able to compute a large disjoint inner product!
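As a sanity check, here is a small sympy computation (an illustration, not part of the original argument) verifying the identity as displayed above in the smallest interesting case n = m = 2, where the left-hand side is \langle 2,1,2 \rangle \oplus \langle 1,1,1 \rangle and the right-hand side has 2\cdot 2 + 1 = 5 terms:

```python
# Symbolic check of Schoenhage's identity for n = m = 2.
import sympy as sp

eps = sp.Symbol('eps')
x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2')
z11, z12, z21, z22 = sp.symbols('z11 z12 z21 z22')  # z_{ki}
X, Y, Z = sp.symbols('X Y Z')                        # the extra inner-product variables

target = eps**2 * (x1*y1*z11 + x1*y2*z21 + x2*y1*z12 + x2*y2*z22 + X*Y*Z)

rhs = (
    (x1 + eps*X) * (y1 + eps*Y) * (eps**2*z11 + Z)   # i = k = 1
    + x1 * (y2 - eps*Y) * (eps**2*z21 + Z)           # i = 1, k = m
    + (x2 - eps*X) * y1 * (eps**2*z12 + Z)           # i = n, k = 1
    + x2 * y2 * (eps**2*z22 + Z)
    - (x1 + x2) * (y1 + y2) * Z
)

diff = sp.expand(rhs - target)
# every surviving monomial has eps-degree at least 3, i.e. diff = O(eps^3)
assert all(sp.expand(diff.coeff(eps, d)) == 0 for d in (0, 1, 2))
```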

It is a deep fact due to Strassen that all upper bounds on the border rank arise as “approximate identities”, which are identities like Schönhage’s identity mentioned above (the power of \epsilon on the left-hand side can be different); all powers of \epsilon appearing in such identities can be taken to be nonnegative integers. This shows that \underline{R}(T \oplus T') \leq \underline{R}(T) + \underline{R}(T') and \underline{R}(T \otimes T') \leq \underline{R}(T) \underline{R}(T') for all tensors T,T'.

How is Schönhage’s identity useful for multiplying two square matrices?

Asymptotic sum inequality: For any finite collection of triples (n_t,m_t,p_t), not all equal to (1,1,1),

\displaystyle \sum_t (n_t m_t p_t)^{\omega/3} \leq \underline{R}(\bigoplus_t \langle n_t,m_t,p_t \rangle).

This result of Schönhage is a vast generalization of Strassen’s theorem mentioned above, which corresponds to the special case in which there is only one triple (with border rank replaced by the rank). Unfortunately, explaining the proof of the asymptotic sum inequality will take us too far afield. The proof can be found in many sources, such as my lecture notes. One interesting aspect is that the proof of the asymptotic sum inequality does not result in an O(n^\alpha) algorithm, where \alpha is the upper bound on \omega resulting from the inequality. Rather, the proof gives a sequence of algorithms whose exponents converge to \alpha.

Choosing n=m=4 in Schönhage’s identity and applying the asymptotic sum inequality, we obtain

16^{\omega/3} + 9^{\omega/3} \leq 17.

Solving for \omega, we get the upper bound \omega \leq 2.55.
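The bound can be reproduced with a few lines of Python (a sketch; the helper name asi_bound is ours): solve \sum_t (n_tm_tp_t)^{\omega/3} = 17 for \omega by bisection.

```python
def asi_bound(triples, rank, lo=2.0, hi=3.0, iters=100):
    """Largest w in [lo, hi] with sum of (n*m*p)**(w/3) over the triples <= rank."""
    f = lambda w: sum((n * m * p) ** (w / 3) for (n, m, p) in triples) - rank
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) <= 0 else (lo, mid)
    return lo

print(asi_bound([(4, 1, 4), (1, 9, 1)], 17))  # about 2.548, i.e. omega <= 2.55
```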

The asymptotic sum inequality isn’t optimal

Finally we are ready to state Coppersmith and Winograd’s result.

Main theorem. Let \alpha be the upper bound on \omega obtained from an application of the asymptotic sum inequality. Then \omega < \alpha.

In other words, a single invocation of the asymptotic sum inequality cannot yield the optimal value of \omega. One way around this limitation is to apply the asymptotic sum inequality to a sequence of identities — the course of action taken by Strassen’s laser method.

In the rest of this post, we provide an (almost) complete proof of this theorem. We start with a result generalizing Schönhage’s identity, which shows that in certain situations one can compute an additional inner product for free or at only a small cost in the border rank. This result is strong enough to prove a baby version of the main theorem.

Baby theorem. For all n \geq 2, \omega < \log_n R(\langle n,n,n \rangle).

This shows that no single identity à la Strassen’s original identity can yield the optimal value of \omega.

Proving the main theorem itself requires more work, but the main ideas are contained in the proof of the baby theorem.


Main construction and proof of baby theorem

All the results of Coppersmith and Winograd rely on the following construction. We say that a tensor T involves r independent x-variables if it involves r x-variables and cannot be written in terms of fewer than r linear combinations of the x-variables.

Main construction. Let T be a tensor involving r independent x-variables and s independent y-variables, and suppose that

\displaystyle \epsilon^d T + O(\epsilon^{d+1}) = \sum_{t=1}^L \alpha_t \beta_t \gamma_t, where \displaystyle \sum_{t=1}^L \alpha_t \beta_t = 0.

Then \underline{R}(T \oplus \langle 1,L-r-s,1 \rangle) \leq L, and furthermore this is witnessed by a decomposition of the form

\displaystyle \epsilon^D(T \oplus \langle 1,L-r-s,1 \rangle) + O(\epsilon^{D+1}) = \sum_{t=1}^L \alpha'_t \beta'_t (\gamma'_t + \zeta),

where \zeta, the z-variable of the inner product, doesn’t appear in \gamma'_t, and

\displaystyle \sum_{t=1}^L \alpha'_t \beta'_t \zeta = \epsilon^D \langle 1,L-r-s,1 \rangle.

Before proving the main construction, let us see why it is useful.

Corollary. For all n,m,p, putting \Lambda = R(\langle n,m,p \rangle) we have

\underline{R}(\langle n,m,p \rangle \oplus \langle 1,\Lambda-m(n+p-1),1 \rangle) \leq \Lambda + m.

This corollary generalizes Schönhage’s identity: taking (n,m,p) = (n,1,m) and using R(\langle n,1,m \rangle) = nm, we obtain \underline{R}(\langle n,1,m \rangle \oplus \langle 1, (n-1)(m-1), 1 \rangle) \leq nm+1.

Proof of corollary. Consider a decomposition \langle n,m,p \rangle = \sum_{t=1}^\Lambda \alpha_t \beta_t \gamma_t. Choose vectors a \in \mathbb{R}^n and b \in \mathbb{R}^p such that c_t \triangleq \gamma_t(ab') is non-zero for all t; here ab' is the outer product of a and b, and \gamma_t(ab') is the result of substituting the transpose of the matrix ab' for the matrix variables z_{ki}: z_{ki} = a_i b_k. Such vectors a,b exist since \gamma_t \neq 0 due to the minimality of the decomposition. Consider now the following decomposition of length \Lambda+m:

\displaystyle \langle n,m,p \rangle = \sum_{t=1}^\Lambda (c_t\alpha_t) \beta_t (c_t^{-1}\gamma_t) + \sum_{j=1}^m (-\sum_{i=1}^n a_i x_{ij}) (\sum_{k=1}^p b_k y_{jk}) (0).

Notice that

\displaystyle \sum_{t=1}^\Lambda (c_t\alpha_t) \beta_t = \sum_{t=1}^\Lambda \alpha_t \beta_t \gamma_t(ab') = \sum_{i=1}^n \sum_{j=1}^m \sum_{k=1}^p x_{ij} y_{jk} a_i b_k,

since we substituted z_{ki} = a_i b_k. On the other hand, also

\displaystyle \sum_{j=1}^m (\sum_{i=1}^n a_i x_{ij}) (\sum_{k=1}^p b_k y_{jk}) = \sum_{i=1}^n \sum_{j=1}^m \sum_{k=1}^p x_{ij} y_{jk} a_i b_k.

This shows that the premises of the main construction are satisfied, with L=\Lambda+m,r=nm,s=mp. Since L-r-s = \Lambda-m(n+p-1), the corollary follows directly from the main construction. \square
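To make the cancellation concrete, here is a small sympy check (an illustration; the choice a=(1,1), b=(1,2) is arbitrary, and any choice making all c_t nonzero would do) for Strassen's decomposition of \langle 2,2,2 \rangle displayed earlier:

```python
import sympy as sp

x11, x12, x21, x22 = sp.symbols('x11 x12 x21 x22')
y11, y12, y21, y22 = sp.symbols('y11 y12 y21 y22')
a = {1: 1, 2: 1}
b = {1: 1, 2: 2}

# (alpha_t, beta_t, c_t) with c_t = gamma_t evaluated at z_{ki} = a_i * b_k
terms = [
    (x11 + x22, y11 + y22, a[1]*b[1] + a[2]*b[2]),   # gamma = z11 + z22
    (x21 + x22, y11,       a[2]*b[1] - a[2]*b[2]),   # gamma = z12 - z22
    (x11,       y12 - y22, a[1]*b[2] + a[2]*b[2]),   # gamma = z21 + z22
    (x22,       y21 - y11, a[1]*b[1] + a[2]*b[1]),   # gamma = z11 + z12
    (x11 + x12, y22,      -a[1]*b[1] + a[1]*b[2]),   # gamma = -z11 + z21
    (x21 - x11, y11 + y12, a[2]*b[2]),               # gamma = z22
    (x12 - x22, y21 + y22, a[1]*b[1]),               # gamma = z11
]
assert all(c != 0 for (_, _, c) in terms)  # every c_t is nonzero for this choice of a, b

xvar = {(1, 1): x11, (1, 2): x12, (2, 1): x21, (2, 2): x22}
yvar = {(1, 1): y11, (1, 2): y12, (2, 1): y21, (2, 2): y22}
extra = sum((-sum(a[i] * xvar[(i, j)] for i in (1, 2)))
            * sum(b[k] * yvar[(j, k)] for k in (1, 2)) for j in (1, 2))
total = sum(c * alpha * beta for (alpha, beta, c) in terms) + extra
assert sp.expand(total) == 0  # the xy-parts of the extended decomposition sum to zero
```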

Using the corollary, the baby theorem follows quite easily.

Proof of baby theorem. The proof uses the standard inequality R(\langle n,n,n \rangle) > n^2, which can be proved using the so-called substitution method. Let R(\langle n,n,n \rangle) = n^\alpha, so that \alpha > 2. For each integer d, R_d \triangleq R(\langle n^d,n^d,n^d \rangle) = R(\langle n,n,n \rangle^{\otimes d}) \leq n^{d\alpha}. If R_d < n^{d\alpha} for some d then Strassen’s theorem shows that \omega \leq \log_{n^d} R_d < \alpha, so assume that R_d = n^{d\alpha} for all d. The corollary shows that

\underline{R}(\langle n^d,n^d,n^d \rangle \oplus \langle 1,n^{d\alpha}-n^d(2n^d-1),1 \rangle) \leq n^{d\alpha} + n^d,

and the asymptotic sum inequality shows that

n^{d\omega} + (n^{d\alpha}-n^d(2n^d-1))^{\omega/3} \leq n^{d\alpha} + n^d.

Since \alpha > 2 and so \alpha > 3/\alpha, for large enough d we have n^{d\alpha} - n^d(2n^d-1) > n^{d(3/\alpha)}, and so

n^{d\omega} + n^{d(\omega/\alpha)} < n^{d\alpha} + n^d,

which shows that \omega < \alpha, since the left-hand side is an increasing function of \omega which equals the right-hand side at \omega = \alpha. \square
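To get a feel for the numbers, the following sketch evaluates the inequality obtained from the corollary for n = 2 and \alpha = \log_2 7, under the proof's assumption that R_d = n^{d\alpha}; already at d = 2 the resulting bound on \omega drops below \alpha \approx 2.807 (the helper name bound is ours).

```python
from math import log2

def bound(f, lo=2.0, hi=3.0, iters=100):
    """Largest w in [lo, hi] with f(w) <= 0, for increasing f (bisection)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) <= 0 else (lo, mid)
    return lo

n, alpha = 2, log2(7)
for d in (1, 2, 3, 4):
    rank = n ** (d * alpha)                    # assumed: R(<n^d, n^d, n^d>) = n^(d*alpha)
    inner = rank - n ** d * (2 * n ** d - 1)   # size of the extra inner product
    f = lambda w: n ** (d * w) + inner ** (w / 3) - (rank + n ** d)
    print(d, bound(f))  # d = 1 gives only the useless bound 3; d >= 2 already beats alpha
```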

Proof of main construction

Let us briefly recall the assumptions of the construction: T is a tensor involving r independent x-variables and s independent y-variables, \epsilon^d T + O(\epsilon^{d+1}) decomposes as a sum of L rank one tensors \alpha_t \beta_t \gamma_t, and \sum_{t=1}^L \alpha_t \beta_t = 0.

For each y-index j, let e_j be the corresponding standard basis vector. Substituting e_j for the y-variables, we obtain

\sum_{t=1}^L \alpha_t \beta_t(e_j) = 0.

We rephrase this in the language of matrices. Let A be the r \times L matrix whose columns are \alpha_t, and let b_j be the L \times 1 column vector whose entries are \beta_t(e_j). The displayed equation is equivalent to Ab_j = 0. Since the x-variables are independent, A has rank r, and so the subspace of column vectors annihilating A has dimension L-r. The independence of the y-variables implies that the s vectors b_j are linearly independent, and so we can find L-r-s vectors q_1,\ldots,q_{L-r-s} that together with b_1,\ldots,b_s form a basis for the subspace of column vectors annihilating A.

We can apply the same argument the other way around. Let B be the L \times s matrix whose rows are \beta_t, and let a_i be the 1 \times L row vector whose entries are \alpha_t(e_i). As before, a_i B = 0, and the vectors a_i are linearly independent. Furthermore, the subspace of row vectors annihilating B has dimension L-s, and so a_1,\ldots,a_r can be completed to a basis for this subspace by the addition of L-r-s vectors p_1,\ldots,p_{L-r-s}.

Consider now the (L-s)\times L matrix P whose rows are p_1,\ldots,p_{L-r-s},a_1,\ldots,a_r and the L\times (L-r) matrix Q whose columns are q_1,\ldots,q_{L-r-s},b_1,\ldots,b_s. The matrix P has full row rank and the matrix Q has full column rank, and so by Sylvester’s rank inequality the (L-s)\times (L-r) matrix PQ has rank at least (L-s)+(L-r)-L = L-r-s. On the other hand, the last r rows of P are simply the matrix A, and the last s columns of Q simply form the matrix B. This shows that PQ is a block diagonal matrix of the form PQ = \begin{pmatrix} \ast & 0 \\ 0 & 0 \end{pmatrix}, where \ast is an (L-r-s)\times(L-r-s) square block. Since PQ has rank at least L-r-s, this block must be invertible. In fact, by choosing a different basis instead of p_1,\ldots,p_{L-r-s}, we can guarantee that

PQ = \begin{pmatrix} I_{L-r-s} & 0 \\ 0 & 0 \end{pmatrix}.
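The following Python sketch (an illustration with random data and floating-point arithmetic, not part of the proof) carries out this construction: it generates A and B with AB = 0, completes the columns of B to a basis of the kernel of A and the rows of A to a basis of the left kernel of B, and checks the block form of PQ after normalizing.

```python
import numpy as np

rng = np.random.default_rng(0)
r, s, L = 2, 3, 8

A = rng.standard_normal((r, L))                    # r x L, full row rank (almost surely)
_, _, Vt = np.linalg.svd(A)
nullA = Vt[r:].T                                   # L x (L-r), orthonormal basis of ker(A)
B = nullA @ rng.standard_normal((L - r, s))        # L x s, columns lie in ker(A), so A @ B = 0
assert np.allclose(A @ B, 0)

# complete B's columns to a basis of ker(A): the vectors q_1, ..., q_{L-r-s}
_, _, Wt = np.linalg.svd(B.T @ nullA)
Q_extra = nullA @ Wt[s:].T                         # L x (L-r-s), in ker(A), independent of B

# complete A's rows to a basis of the left kernel of B: the vectors p_1, ..., p_{L-r-s}
_, _, Vt2 = np.linalg.svd(B.T)
nullBT = Vt2[s:]                                   # (L-s) x L, rows span the left kernel of B
_, _, Zt = np.linalg.svd(A @ nullBT.T)
P_extra = Zt[r:] @ nullBT                          # (L-r-s) x L, independent of A's rows

star = P_extra @ Q_extra                           # the (L-r-s) x (L-r-s) block, invertible
P = np.vstack([np.linalg.solve(star, P_extra), A]) # normalize so the block becomes I
Q = np.hstack([Q_extra, B])

expected = np.zeros((L - s, L - r))
expected[:L - r - s, :L - r - s] = np.eye(L - r - s)
assert np.allclose(P @ Q, expected)                # PQ = diag(I_{L-r-s}, 0)
```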

The next step is to lift p_l,q_l to linear combinations of new x-variables \xi_1,\ldots,\xi_{L-r-s} and y-variables \eta_1,\ldots,\eta_{L-r-s}: for each t, define

\displaystyle \pi_t = \sum_{l=1}^{L-r-s} p_l(t) \xi_l, \qquad \kappa_t = \sum_{l=1}^{L-r-s} q_l(t) \eta_l.

The linear combination \pi_t has the same relation to the p_l as \alpha_t has to the a_i. The identity stated above for PQ can be recast as

\displaystyle \sum_{t=1}^L (\alpha_t + \pi_t) (\beta_t + \kappa_t) = \sum_{l=1}^{L-r-s} \xi_l \eta_l.

This allows us to add the additional inner product to the original identity. Let \zeta be a new z-variable, and consider the identity

\displaystyle \sum_{t=1}^L (\alpha_t + \epsilon^{d+1} \pi_t) (\beta_t + \epsilon^{d+1} \kappa_t) (\epsilon^{d+2} \gamma_t + \zeta) =

\displaystyle \sum_{t=1}^L \alpha_t \beta_t \zeta + \epsilon^{d+1} \sum_{t=1}^L (\alpha_t \kappa_t + \pi_t \beta_t)\zeta +

\displaystyle \epsilon^{d+2}\sum_{t=1}^L \alpha_t \beta_t \gamma_t + \epsilon^{2d+2} \sum_{t=1}^L \pi_t \kappa_t \zeta + O(\epsilon^{2d+3}) =

\displaystyle \epsilon^{2d+2} (T + \sum_{l=1}^{L-r-s} \xi_l \eta_l \zeta) + O(\epsilon^{2d+3}).

This shows that \underline{R}(T \oplus \langle 1,L-r-s,1 \rangle) \leq L. Furthermore, the coefficient of \zeta in this decomposition is exactly \epsilon^{2d+2} \sum_{l=1}^{L-r-s} \xi_l \eta_l, as required by the statement (with D = 2d+2). This completes the proof of the main construction. \square

Proof of main theorem

The proof of the main theorem proceeds in three steps. In the first step we show how to absorb a tensor into another tensor satisfying the requirements of the main construction. In the second step we iterate the first step to obtain a result similar to the corollary used above to prove the baby theorem. Finally, in the third step we prove the main theorem itself, using a result akin to the lower bound R(\langle n,n,n \rangle) > n^2.

We start with the first step.

Auxiliary construction. Suppose that T is a tensor along with a decomposition satisfying the requirements of the main construction, and that T' is an arbitrary tensor (on different variables) whose border rank satisfies L' \triangleq \underline{R}(T') \leq L-r-s (notation as in the main construction). Then there is a decomposition of T \oplus T' of length L +L' satisfying the requirements of the main construction.

Proof. Apply the main construction to T to obtain a decomposition

\displaystyle \epsilon^d(T + \sum_{l=1}^{L-r-s} \xi_l \eta_l \zeta) + O(\epsilon^{d+1}) = \sum_{t=1}^L \alpha_t \beta_t (\gamma_t + \zeta).

Furthermore, let \epsilon^{d'} T' + O(\epsilon^{d'+1}) = \sum_{t'=1}^{L'} \alpha'_{t'} \beta'_{t'} \gamma'_{t'} be a decomposition of T'. By adjusting one of the decompositions, we can assume without loss of generality that d' = d.

Denote by \xi \gets -\alpha' the substitution that sets \xi_l = -\alpha'_l for l \leq L' and \xi_l = 0 for L' < l \leq L-r-s (recall that L' \leq L-r-s), and define \eta \gets \beta' similarly. Applying these substitutions to the decomposition of T above, we get

\displaystyle \sum_{t=1}^L \alpha_t(\xi \gets -\alpha') \beta_t(\eta \gets \beta') (\gamma_t + \zeta) + \sum_{t'=1}^{L'} \alpha'_{t'} \beta'_{t'} (\gamma'_{t'} + \zeta) =

\displaystyle \epsilon^d (T - \sum_{t'=1}^{L'} \alpha'_{t'} \beta'_{t'} \zeta) + \epsilon^d (T' + \sum_{t'=1}^{L'} \alpha'_{t'} \beta'_{t'} \zeta) + O(\epsilon^{d+1}) =

\epsilon^d (T + T') + O(\epsilon^{d+1}).

Moreover, the coefficient of \zeta in the first expression is exactly

\displaystyle \left[\sum_{l=1}^{L-r-s} \xi_l \eta_l\right](\xi\gets -\alpha', \eta\gets \beta') + \sum_{t'=1}^{L'} \alpha'_{t'} \beta'_{t'} = 0.

We can rewrite the combined decomposition as

\displaystyle \epsilon^d (T + T') + O(\epsilon^{d+1}) = \sum_{t=1}^{L+L'} \alpha''_t \beta''_t (\gamma''_t + \zeta),

where the coefficient of \zeta vanishes. Removing \zeta, we obtain a decomposition satisfying the requirements of the main construction, completing the proof. \square

The next step is iterating the auxiliary construction, with an (almost) arbitrary starting point. Applying the main construction at the very end, we obtain the following corollary.

Auxiliary corollary. Suppose T is a tensor involving r independent x-variables and s independent y-variables, whose border rank L is at least r+s. Then for every integer D \geq 1,

\underline{R}(D \cdot T \oplus \langle 1,L+D(L-r-s),1 \rangle) \leq (D+1) L.

In the statement of the corollary, D \cdot T is the direct sum of D copies of T.

Proof. Consider some decomposition \epsilon^d T + O(\epsilon^{d+1}) = \sum_{t=1}^L \alpha_t \beta_t \gamma_t, and extend it to the decomposition

\epsilon^d T + O(\epsilon^{d+1}) = \sum_{t=1}^L \alpha_t \beta_t \gamma_t + \sum_{t=1}^L (-\alpha_t) \beta_t (0)

of length 2L, which satisfies the requirements of the main construction. By assumption \underline{R}(T) = L \leq 2L - r - s, and so we can apply the auxiliary construction to obtain a decomposition of 2\cdot T of length 3L satisfying the requirements of the main construction with 2r independent x-variables and 2s independent y-variables. Once again, by assumption \underline{R}(T) = L \leq 3L - 2r - 2s, and so we can apply the auxiliary construction again (to 2\cdot T and T) to obtain a decomposition of 3\cdot T of length 4L satisfying the requirements of the main construction. In this way, for every D we can obtain a decomposition of D\cdot T of length (D+1)L satisfying the requirements of the main construction with Dr independent x-variables and Ds independent y-variables. A final application of the main construction completes the proof. \square

Before we can prove the main theorem, we need one simple result.

Lemma. Suppose that applying the asymptotic sum inequality to the tensors T,T' yields the same upper bound \alpha on \omega. If \underline{R}(T \oplus T') < \underline{R}(T) + \underline{R}(T') then applying the asymptotic sum inequality to T \oplus T' shows that \omega <\alpha, and otherwise the same upper bound on \omega is obtained. Similarly, if \underline{R}(T \otimes T') < \underline{R}(T) \underline{R}(T') then applying the asymptotic sum inequality to T\otimes T' shows that \omega < \alpha, and otherwise the same upper bound on \omega is obtained.

Proof. Let T = \bigoplus_l \langle n_l,m_l,p_l \rangle have border rank L and let T' = \bigoplus_{l'} \langle n'_{l'},m'_{l'},p'_{l'} \rangle have border rank L'. By assumption,

\displaystyle \sum_l (n_lm_lp_l)^{\alpha/3} = L \quad \text{and} \quad \sum_{l'} (n'_{l'}m'_{l'}p'_{l'})^{\alpha/3} = L'.

Applying the asymptotic sum inequality to the direct sum gives

\displaystyle \sum_l (n_lm_lp_l)^{\omega/3} + \sum_{l'} (n'_{l'}m'_{l'}p'_{l'})^{\omega/3} \leq \underline{R}(T\oplus T').

If \underline{R}(T\oplus T') = L+L' then this gives the bound \omega \leq \alpha, and otherwise a better bound is obtained (recall that \underline{R}(T\oplus T') \leq L+L').

Applying the asymptotic sum inequality to the tensor product gives

\displaystyle \sum_l (n_lm_lp_l)^{\omega/3} \sum_{l'} (n'_{l'}m'_{l'}p'_{l'})^{\omega/3} = \sum_{l,l'} (n_ln'_{l'}m_lm'_{l'}p_lp'_{l'})^{\omega/3} \leq \underline{R}(T\otimes T').

If \underline{R}(T\otimes T') = LL' then this gives the bound \omega \leq \alpha, and otherwise a better bound is obtained (recall that \underline{R}(T\otimes T') \leq LL'). \square

We can now complete the proof of the main theorem, using the above lemma freely.

Proof of main theorem. Let T = \bigoplus_l \langle n_l,m_l,p_l \rangle be an arbitrary tensor such that (n_l,m_l,p_l) \neq (1,1,1) for some l, denote the border rank of T by L, and let \alpha be the solution to the equation

\displaystyle \sum_l (n_lm_lp_l)^{\alpha/3} = L.

Our goal is to show that \omega < \alpha.

Any tensor can be “rotated” by replacing the x,y,z-variables by y,z,x-variables. Any such permutation of variables results in a tensor with the same rank and border rank. In particular, the tensor

\displaystyle T_S = \bigoplus_l \langle n_l,m_l,p_l \rangle \oplus \langle m_l,p_l,n_l \rangle \oplus \langle p_l,n_l,m_l \rangle

has border rank \underline{R}(T_S) \leq 3L, and the same number of x-variables, y-variables and z-variables, say r of each. If \underline{R}(T_S) < 3L then the asymptotic sum inequality shows that \omega < \alpha, so we can assume that \underline{R}(T_S) = 3L. A lower bound proved using the substitution method shows that 3L > r (this is Theorem 6.1 in the paper of Coppersmith and Winograd). Pick a power d so large that (3L)^d \geq (r+1)^d > 2r^d, and put T' = T_S^{\otimes d}. Note that T' has r^d each of x-variables, y-variables and z-variables, and its border rank is at most (3L)^d. If the border rank is smaller then the asymptotic sum inequality shows that \omega < \alpha, so we can assume that \underline{R}(T') = (3L)^d > 2r^d.

For brevity, put L' = (3L)^d, r' = r^d, and \delta = L'-2r' \geq 1. Also, write T' = \bigoplus_{l'} \langle n'_{l'},m'_{l'},p'_{l'} \rangle, where

\displaystyle \sum_{l'} (n'_{l'}m'_{l'}p'_{l'})^{\alpha/3} = L'.

The auxiliary corollary shows that for every D, \underline{R}(D\cdot T' \oplus \langle 1,L'+D\delta,1 \rangle) \leq (D+1)L'. Applying the asymptotic sum inequality gives

\displaystyle D \sum_{l'}(n'_{l'}m'_{l'}p'_{l'})^{\omega/3} + (L'+D\delta)^{\omega/3} \leq (D+1)L'.

Choose D so large that (L'+D\delta)^{\omega/3} > L'. Then

D \sum_{l'} (n'_{l'} m'_{l'} p'_{l'})^{\omega/3} + L' < (D+1)L',

showing that \sum_{l'} (n'_{l'} m'_{l'} p'_{l'})^{\omega/3} < L' = \sum_{l'} (n'_{l'} m'_{l'} p'_{l'})^{\alpha/3}, and hence \omega < \alpha. This completes the proof. \square

6 Responses to “The asymptotic sum inequality is not optimal (guest post by Yuval Filmus)”

  1. vznvzn Says:

    nice/impressive work, seems like a real result, should this be a paper? is this your 1st blog ever?
    fyi RJL recently mused on the general theme of this blog does program size matter? … am collecting links & plan to blog on the subj at some pt also.

    • Yuval Filmus Says:

      As mentioned in the second paragraph, this post is an exposition of a paper by Coppersmith and Winograd from 1982. Another blog post, describing recent novel results, will appear in a few days.

      • vznvzn Says:

        ^^’ lol ok. maybe the ref to the “baby thm” was what tripped me up that there could be new results or a new/ different/ alternate/ simplified proof/ angle. presumably not called that in any other paper… 😉

      • vzn Says:

        another funky idea just occurred to me. could this be an algorithm that converges ie on repeated applications is actually estimating \omega to higher degrees of precision ie similar to iterative algorithms that compute digits of \pi? in other words are the supposed “improvements” increasingly small?

      • Yuval Filmus Says:

        This is mentioned as an open question by Coppersmith and Winograd. In some sense, Strassen’s laser method, especially as used by Coppersmith and Winograd, is an answer. One can probably also show by experiment that the improvements decrease sharply, though this is of course not a proof.

