Archive for the ‘General’ Category

Levin’s and Schnorr’s optimality results

March 22, 2013

This week on “Speedup in Computational Complexity” we’re going to learn how to write an optimal solver for SAT. Thanks to Leonid Levin we know that a very partial answer to “Are there familiar computational problems with no best algorithm?” is “Well, computing a satisfying assignment for a boolean formula does have a best algorithm, that’s for sure!”. In fact we’ll explore several variations of an idea by Levin all of which provide us with computational problems that each have a best algorithm. One particular variation, by Claus-Peter Schnorr, applies to the class of self-reducible problems.

A word of warning before we start: The below constructions only work in some models of complexity. One of the models in which the results will not apply is the Turing Machine model. I will mention the model requirements as we go along but if you’d like a more detailed discussion of this topic, I refer you to Gurevich’s paper in the references.

Without further ado let’s jump straight into Levin’s neat little trick, performed by a combination of an interpreter and a program enumerator.

A wonderful sequence of values

The program that we’ll design in this section takes an input x and runs infinitely, outputting an infinite sequence of values. Our program will output a new number in the sequence every k steps for some constant k. The sequence produced will turn out to be quite a wonderful characterization of x (if you love computational complexity). I’ll use the name iseq(x) for the infinite sequence generated on input x.

To design our program – let’s call it iseqprog – we’ll need two other programs to start from: A program enumerator and an interpreter for the enumerated programs.

The program enumerator, progen, takes as input a pair (i,x) and returns the initial configuration of the program with index i on input x. We’ll expect this operation to be constant-time when either i=0 or we already called progen on (i-1,x). In other words: progen is more like a method of an object (with internal state) which expects to be called with inputs (0,x), (1,x), (2,x),… and is able to process each item in such a call sequence in constant-time.

The interpreter we’ll need cannot be any old interpreter. In these modern times we can expect a certain service level. The interpreter should work like a slot machine in the arcades: Whenever I put in a new coin I continue my game with three more lives. In other words, when I give the interpreter the configuration of program p after t steps on input x, it returns an updated configuration representing the state of program p after t+1 steps on input x. It also tells me if p terminated in step t+1 and, if so, the return value of p on x. All of this happens in constant time. After all, the interpreter only needs to simulate one single step of p on x.

Comment: Almost any old interpreter can be used for Levin’s construction, but the exposition would become more complex.

Now I’ll describe the computation of iseqprog on input x. The computation proceeds in rounds, and each round consists of a number of stages. There is an infinite number of rounds. The number of stages in each round is finite but not constant across rounds.

Round 1 has only 1 stage. In this first stage of the first round, iseqprog runs progen on (0,x) and gets back the initial configuration of program 0 on input x. iseqprog then uses the interpreter to interpret just 1 step of program 0 on input x. If program 0 happens to terminate on input x in that first step, iseqprog immediately outputs program 0’s output on input x. Regardless of whether the interpretation of program 0 terminated, iseqprog itself does not terminate; it is on its way to generate an infinite sequence. If the interpretation of program 0 on input x did not terminate in its first step, iseqprog outputs the value 0 before continuing, providing us with a signal that it’s still live and running. This concludes the first (1-stage) round of iseqprog’s computation on x.

The computation continues in an infinite sequence of rounds. In each round, iseqprog calls progen once, adding one new item to the set of program configurations it has accumulated during the previous rounds. Each of these program configurations is interpreted for a number of steps. Every time the interpreter has executed one step of a program i, iseqprog outputs one value. The output value will be 0 or program i’s output on x. Whatever program you may think of, iseqprog will eventually generate its output on x (in between a lot of 0’s and many other values).

If we use this shorthand:

  • +i”: means “create the initial configuration for program i on input x, then interpret 1 step on that configuration and output one value”
  • i” means “interpret 1 more step on the stored configuration for program i and output one value”

then we can sum up iseqprog’s first rounds like this:

Round 1: +1
Round 2: 11+2
Round 3: 111122+3
Round 4: 11111111222233+4
Round 5: 111111111111111122222222333344+5
Round 6: 32 1’s, 16 2’s, 8 3’s, 4 4’s, 2 5’s, and one +6

I hope it has become clear why iseqprog should be able to generate a new item of iseq every k steps or less for some constant k. Apart from the administrative work of looking up and saving configurations in some table, each step involves at most one call to the program enumerator and one call to the interpreter. These calls were assumed to be constant-time. The administrative work I wlll simply assume to be constant-time as well. iseqprog cannot work as intended in all complexity models; in particular, it doesn’t work for Turing machines.

Now let’s have a look at the sequence iseq(x) itself. The key observation is that although any individual program does not get much attention from iseqprog, it does get a specific percentage of attention that is not dependent on the input x. For instance, program 3 accounts for \frac{1}{2^3}=\frac{1}{8} of the interpreter calls made by iseqprog regardless of the input x. The percentage is tied only to the program’s index number according to the program enumerator. From this observation we can derive (proof left to the reader) the salient feature of iseq(x):

If program p outputs y on input x in time t, then y appears in iseq(x) at an index less than ct for c depending only on p.

I think this is great! Whatever you want to compute from x, you’ll find it in iseq(x). What’s more: Your answer appears quite early in the sequence – so early, in fact, that you might as well just run through iseq(x) rather than perform the computation itself! That’s why I decided to call iseq(x) a wonderful sequence.

It’s too good to be true…if it wasn’t for two caveats. First, how do you recognize the value that you’re looking for? And second, what about that constant c? We’ll address these two questions below.

Comment: Another caveat is that the above doesn’t apply to all complexity models, in particular to Turing Machines. For most of the common complexity models, I expect that the result will be true if you replace ct by poly(t) where poly is a polynomial depending only on p

I’ll end this section with a simple specialization of the above that is too nice not to mention:

For any function f in P, there is a polynomial p such that \min \{ i | {\it iseq(x)}_i = f(x) \} < p(|x|)

And yes, iseqprog generates a new item of iseq(x) in k steps or less for some constant k!

The optimal solver for SAT

So what good is all this if you cannot recognize the value you’re looking for. Luckily there are some situations where validating a correct answer is simpler than producing it – yes, I’m thinking about SAT. A satisfying assignment for a boolean formula can be validated in linear time. How can we exploit Levin’s idea to create an optimal solver for SAT?

The simplest answer is to modify the program enumerator. Our new program enumerator, call it progenSAT, wraps each program generated by the original program enumerator in a SAT validator. The computation of progenSAT(i,x) will proceed in two phases like this:

Phase 1: Run progen(i,x) and assign its output value to variable y.
Phase 2: If y is a satisfying assignment for the boolean formula x then output y else loop infinitely.

If we plug progenSAT into iseqprog we get a new program iseqprogSAT generating a new sequence iseqSAT(x) on input x.

Like the original iseqprog, our new program iseqprogSAT generates a new item every k steps or less for some constant k. I’m assuming that progenSAT also takes constant time to generate each new program configuration. Let us adapt the key observation about iseq(x) to the sequence iseqSAT(x) (once again, I’ll leave the proof to the reader):

If program p outputs y on input x in time t, and y is a satisfying assigment for the boolean formula x, then y appears in iseqSAT(x) at an index less than c'(t+|x|) for c’ depending only on p.

This is remarkable! This means we have a concrete program that is optimal (up to a constant factor) for solving SAT. As a consequence, The question of P vs. NP boils down to a question about this single program’s running time. Define \it time^{\neg 0}_p(x) to be the number of steps program p takes to generate a nonzero value on input x. Now P=NP if and only if there is a polynomial p such that \it time^{\neg 0}_{\it iseqprogSAT}(x) < p(|x|) for every satisfiable boolean formula x.

In other words, there may be 1 million US$ waiting for you if you’re able to analyze iseqprogSAT‘s running time in detail.

Notes for the experimentalists

Now we’ll have a look at the other caveat about Levin’s idea: The constant factor. In the 1990’s, under the supervision of Neil Jones and Stephen Cook, I worked on implementing a program enumerator that would get iseqprog to actually terminate on some toy problems. The problem, of course, is that the constant factors involved are so large you’ll be tempted to never use big-O-notation ever again. Let’s assume that your programs are sequences of k different instructions, and that every sequence of instructions is a valid program. Then the index of a program p is roughly k^{|p|}. The constant factor c is then approximately 2^{k^{|p|}} i.e. doubly exponential in |p|. So to get an answer from iseqprog the useful programs need to be really short.

Actually I found that iseqprog favours short programs so much that it sometimes fails to find program that actually computes the function you’re looking for. In one case, half of the inputs caused one little program, p’, to give the correct result while the other half of the inputs caused another little program, p’’, to give iseqprog its output. A program that tested the input then continued as either p’ or p’’ was too long to ever get simulated.

It’s actually possible to reduce the constant factor c by a lot, if you’re willing to sacrifice the optimality in asymptotical running time. By revising the strategy used to pick which program to interpret, you wil obtain different tradeoffs between constant factor and asymptotical relation. For instance, consider the variant of iseq(x), call it iseq_triangle(x) obtained by using the following simple strategy in Levin’s construction:

Round 1: +1
Round 2: 1+2
Round 3: 12+3
Round 4: 123+4
Round 5: 1234+5

I’ll postulate the following, leaving the proof to the reader: If program p outputs y on input x in time t, then y appears in iseq_triangle(x) at an index less than {\it index}_p^2 t^2.

I once identified a few strategies of this kind but never got around to clarifying in more detail which tradeoffs are possible; or indeed optimal. Could the “triangle” strategy be improved so that the expression above instead would be {\it index}_p^2 t \log (t)? I doubt it, but have no proof. It seems like a small but interesting math exercise.

In one variation of iseqprog the programs are actually enumerated in the order of their descriptive complexity. See the references below for details on that.

Schnorr’s self-reducible predicates

Claus-Peter Schnorr analyzed applications of Levin’s result in a 1976 ICALP paper. In particular, he was interested in defining a class of predicates that do not allow asymptotical speedup. The contrast to the above results should be noticed: It is an actual predicate, a 2-valued function, that does not have speedup.

I have not been able to prove Schnorr’s main result (the paper’s proof is missing a few details) but I’d like to outline his central idea because it is interesting, and maybe one of the readers can be helpful by providing a proof in the comments of this blog post. I have simplified his definition a bit and refer you to the ICALP paper for the general definition, and for his results on graph isomorphism and speedup in the Turing machine space model.

Let us adapt some notation from the previous blog post by Amir Ben-Amram and define the complexity set T_f of a function f to be

T_f \stackrel{\it def}= \{ {\it time}_p(x)\ |\ p\ {\it computes}\ f \}

In the remainder of this section, all data will be bit strings, and P will designate a binary predicate, i.e. P(x,y) \in \{ 0, 1\}. You may think of SAT as a prime example of the kind of predicates Schnorr analyzes. The decision function for P will be defined by

{\it decide}_P \stackrel{\it def}= \lambda x.\ \exists y: P(x,y)

A function w is a witness function for P if and only if

\forall x: P(x, w(x))\ \vee (\forall y: \neg P(x,y))

The idea behind Schnorr’s result is to consider a class of predicates, P for which there is a tight connection between the complexity set T_{{\it decide}_P} and the complexity sets of the associated witness functions:

\bigcup \{ T_w\ |\ w\ \textrm{is a witness function for}\ P\}

The class in question is the class of (polynomial-time) self-reducible predicates. The criteria for being self-reducible are a bit complex. I will provide a simplified, less general, version here. P is self-reducible if P(x,y) implies |x| = |y| and there is a polynomial-time function {\it spec}_P mapping a pair of (bit string, bit) to a bit string such that

  1. P(x, y_1 \cdots y_n) = P({\it spec}_P(x, y1), y_2 \cdots y_n)
  2. |{\it spec}_P(x, b)| < |x|

Theorem (Schnorr, 1976, Theorem 2.4, rewritten): When P is self-reducible, there is an integer k and witness function w for P such that

t \in T_{{\it decide}_P} \Rightarrow \exists t' \in T_w: t' = \mathcal{O}(\lambda x .\ |x|^k t(x))

This theorem is not too hard to prove. To find a witness y for an x, you figure out the bits of y one at a time. It takes |x| rounds in which we test both 0 and 1 as the potential “next bit” of a witness. For the details, I refer you to Schnorr’s paper.

The main theorem of interest to this blog post is Schnorr’s Theorem 2.7. A precise statement of the Theorem requires more technical detail than I’m able to provide here, but its essence is this: For a self-reducible predicate P, the decision problem {\it decide}_P cannot be sped up by a factor of |x|.

As mentioned above, I’ve not been able to construct a proof based on the ICALP paper, so I’ll leave this a homework to the readers! It certainly seems like all of the necessary constructions have been lined up, but at the place where “standard methods of diagonalization” should be applied I cannot find a satisfactory interpretation of how to combine the big-O notation with the quantification of the variable i. I’d be very interested in hearing from readers that succeeded in proving this Theorem.

Historical notes and references

All papers mentioned below appear in this blog’s bibliography

Leonid Levin introduced the idea in (Levin, 1973). I must admit that I’ve never read the original Russian paper nor its translation in (Trakhtenbrot, 1984), so I rely on (Gurevich, 1988) and (Li and Vitányi, 1993) in the following. The paper presented his “Universal Search Theorem” as a result concerning resource-bounded descriptional complexity. There was no proof in the paper, but he provided the proof in private communications to Gurevich. Levin’s paper uses an advanced strategy for selecting which program to generate in each round. This strategy causes the constant factor associated with a program p to be 2^{K(p)+1} where K(p) is the prefix complexity of p and K(p) \leq |p| + 2 \log |p| + k for some constant k. This is explained in Section 7.5 of (Li and Vitányi, 1993).

Schnorr’s paper (Schnorr, 1976) is the earliest English exposition on this topic that I know of, and it seems to be the first application of Levin’s idea to predicates rather than functions with arbitrarily complex values. Gurevich dedicated most of (Gurevich, 1988) to explaining Levin’s idea which seems to have been largely unknown at the time. A major topic in Gurevich’s discussion is the complexity models in which Levin’s idea can be utilized. Amir Ben-Amram wrote a clear and precise exposition on Levin’s idea in Neil Jones’s complexity book (Ben-Amram, 1997), in his guest chapter “The existence of optimal programs”.

There have been some experiments with practical implementation of Levin’s idea. (Li and Vitányi, 1993) mentions work from the 1980’s that combines Levin’s algorithm with machine learning. My own experiments (Christensen, 1999) were done without knowledge of this prior and does not use machine learning but focuses on tailored programming languages and efficient implementations.

About the author

I’m a Ph.D. of computer science based in Copenhagen, Denmark. I am currently a Senior System Engineer working on developing highly scalable, distributed systems for Issuu, the leading digital publishing platform (see My interest in complexity theory was nurtured by Neil D. Jones and I was brought up on his book “Computability and Complexity From a Programming Perspective”. I recently had the pleasure of co-authoring a paper with Amir Ben-Amran and Jakob G. Simonsen for the Chicago Journal of Theoretical Computer Science, see


A Fundamental Theorem on Optimality and Speedup (guest post by Amir Ben-Amram)

October 11, 2012

Leonid Levin is known for several fundamental contributions to Complexity Theory. The most widely known is surely the notion of “universal search problem,” a concept similar to (and developed concurrently with) NP-completeness. Next, we might cite the proof of the existence of optimal “honest” algorithms for search problems (an honest algorithm uses a given verifier to check its result; expositions of this theorem can be found, among else, here and here).  This is an important result regarding the tension between speedup and the existence of optimal algorithms, and will surely be discussed here in the future.

Then, there is a third result, which seems to be less widely known, though it definitely ought to: I will call it The Fundamental Theorem of Complexity Theory, a name given by Meyer and Winklmann to a theorem they published in 1979, following related work including (Meyer and Fischer, 1972) and (Schnorr and Stumpf, 1975). As with the NP story, similar ideas were being invented by Levin in the USSR at about the same time.  Several years later, with Levin relocated to Boston, these lines of research united and the ultimate, tightest, polished form of the theorem was formed, and presented in a short paper by Levin and, thankfully, a longer “complete exposition” by Seiferas and Meyer (here – my thanks to Janos Simon for pointing this paper out to me). Seiferas and Meyer did not name it The Fundamental Theorem, perhaps to avoid ambiguity, but I think that it does deserve a name more punchy than “a characterization of realizable space complexities” (the title of the article).

My purpose in this blog post is to give a brief impression of this result and its significance to the study of speedup phenomena.  The reader who becomes sufficiently interested can turn to the complete exposition mentioned (another reason to do so is the details I omit, for instance concerning partial functions).

Why this theorem is really fundamental

Some Definitions: An algorithm will mean, henceforth, a Turing machine with a read-only input tape and a worktape whose alphabet is \{0,1\} (the machine can sense the end of the tape – so no blank necessary).  Program also means such a machine, but emphasizes that its “code” is written out as a string. Complexity will mean space complexity as measured on the worktape. For a machine e, its space complexity function is denoted by S_e. A function that is the S_e of some e is known as space-constructible.

Note that up to a certain translation overhead, results will apply to any Turing-equivalent model and a suitable complexity measure (technically, a Blum measure).

The Seiferas-Meyer paper makes use of a variety of clever programming techniques for space-bounded Turing machines; one example is universal simulation with a constant additive overhead.

I will argue that this theorem formulates an (almost) ultimate answer to the following (vague) question: “what can be said about optimal algorithms and speedup in general, that is, not regarding specific problems of interest?”

Examples of things that can be said are Blum’s speedup theorem: there exist problems with arbitrarily large speedup; the Hierarchy theorem: there do exist problems that have an optimal algorithm, at various levels of complexity (the type of result they prove is called a compression theorem, which unfortunately creates confusion with the well-known tape compression theorem. The appelation hierarchy theorem may bring the right kind of theorem to mind).

As shown by Seiferas and Meyer, these results, among others, can all be derived from the Fundamental Theorem, and for our chosen measure of space, the results are particularly tight: their Compression Theorem states that for any space-constructible function S, there are algorithms of that complexity that cannot be sped up by more than an additive constant. So, even a tiny (1-\varepsilon) multiplicative speedup is ruled out for these algorithms – and the algorithm may be assumed to compute a predicate (so, the size of the output is one bit and is not responsible for the complexity).

An important step to this result is the choice of a clever way to express the notion of “a problem’s complexity” (more specifically, the complexity of computing a given function). To the readership of this blog it may be clear that such a description cannot be, as one may naïvely assume, a single function that describes the complexity of a single, best algorithm for the given problem. The good answer is a so-called complexity set. This is the set of all functions S_e for machines e that solve the given function (Meyer and Fischer introduced a similar concept, complexity sequence, which is specifically intended to describe problems with speedup).

How can a complexity set be specified? Since we are talking about a set of computable functions here (in fact, space-constructible), it can be given as a set \cal E of programs that compute the functions. This is called a complexity specification. The gist of the Theorem is that it tells us which sets of programs do represent the complexity of something – moreover, it offers a choice of equivalent characterizations (an always-useful type of result).

Clearly, if a f can be computed in space S, it can also be computed in a larger space bound; it can also be computed in a space bound smaller by a constant (a nice exercise in TM hacking – note that we have fixed the alphabet). If S_1 and S_2 are space bounds that suffice, then \min(S_1,S_2) does too (another Turing-machine programming trick). So, we can assume that a set of programs \cal E represent the closure of the corresponding set of functions under the above rules. We can also allow it to include programs that compute functions which are not space-constructible: they will not be space bounds themselves, but will imply that constructible functions above them are. So, even a single program can represent an infinite family of time bounds: specifically, the bounds S_e lower-bounded (up to a constant) by the given function.

A statement of the theorem (roughly) and consequences

Theorem. Both of the following are ways to specify all existing complexity sets:

  • Sets \cal E of programs described by \Sigma_2 predicates (i.e., {\cal E} = \{ e \mid \exists a \forall b \, P(a,b,e)\}, where P is a decidable predicate).
  • Singletons.

The last item I find the most surprising. It is also very powerful. For any machine e, the fact that \{e\} is a complexity specification tells us there there is a function (in fact, a predicate) computable in space S if and only if S \ge S_e - O(1). Here is our Compression theorem!

The first characterization is important when descriptions by an infinite number of functions are considered. For example, let us prove the following:

Theorem. There is a decision problem, solvable in polynomial (in fact, quadratic) space, that has no best (up to constant factors) algorithm.

Proof. Let e_k be a program that computes \lceil n^{1+\frac{1}{k}} \rceil, written so that k is just a hard-coded constant. The idea is for the set of e_k to be recursively enumerable. Note that all these functions are constructible, that is, prospective space bounds.

Then, by the fundamental theorem, there is a decision problem solvable in space S if and only if S is at least one of these functions (up to an additive constant). If some algorithm for this problem takes at least \lceil n^{1+\frac{1}{k}} \rceil space, then there is also a solution in \lceil n^{1+\frac{1}{k+1}} \rceil, and so forth.

Limitations and some loose ends

As exciting as I find this theorem, it has its limitations. Not all the speedup-related results seem to follow from it; for instance, the other Levin’s theorem doesn’t (or I couldn’t see how). Also, results like those given here and here about the measure, or topological category, of sets like the functions that have or do not have speedup, do not seem to follow. In fact, Seiferas and Meyer only prove a handful of “classic” results like Blum speedup, the Compression theorem and a Gap theorem. What new questions about complexity sets can be asked and answered using these techniques?

Another limitation is that for complexity measures other than space we do not have such tight results. So, for example, for Turing-machine time we are stuck with the not-so-tight hierarchy theorems proved by diagonalization, padding etc. (see references in the bibliography). Is this a problem with our proof methods? Or could some surprising speedup phenomenon be lurking there?

[Oct 16, 2012. Fixed error in last theorem]

Several Comments from Manuel Blum

September 13, 2012

Dear Hunter — What a nice idea. And many of your examples below are both new to me and quite interesting. Thanks! And good luck! — manuel

While thinking about your blog, two things occurred to me that might be worth mentioning:

1. If a function f(x) has speedup, then any lower bound on its computation can be improved by a corresponding amount. For example, if every program for computing f(x) can be sped up to run twice as fast (on all but a finite number of integers), then any lower bound G(x) on its run time can be raised from G(x) to 2G(x) (on all but a finite number of integers). For another example, if any program for computing f(x) can be sped up by a sqrt (so that any run time F(x) can be reduced to a runtime of at most sqrt(F(x)), then any lower bound G(x)on its run time can be raised to [G(x)]^2, etc. This is all easy to see.

2. Much harder to see is a curious relation between speedup and inductive inference, which has to do with inferring an algorithm from observation of the sequence of integers that it generates. Theorem: there exists an inductive inference algorithm for inferring all sequences that have optimal algorithms (i.e. have programs that cannot be sped up)! This was quite a surprise (and a breakthrough) for me. Still is. To explain it though, I’d have to explain inductive inference, etc, and this would take me a bit of time. Some day…

Anyway, thanks again for your blog.

Best wishes and warm regards,

manuel (blum)

Speedup in Computational Complexity

September 10, 2012

Welcome to this new blog aimed at promoting discussion of speedup for natural computational problems, that is, the question in computational complexity of whether a problem has a best algorithm. The intention is to have a series of guest posts by researchers in the field and to provide a shared, user maintained bibliography of work on the topic.

I believe this blog is necessary because the topic is interesting and underappreciated. It is a fundamental property of a computational problem as to whether it does or does not have an optimal algorithm. While a few classic results on speedup versus optimality are widely known, such as Blum’s speedup for artificially constructed languages and Levin’s optimal NP search algorithm, other key results on speedup for natural problems are not familiar. For instance, there is no best Strassen-style bilinear identity for matrix multiplication. The existence of efficient algorithms for MM is linked to key problems in combinatorics such as the Erdos-Rado sunflower conjecture (see Alon et al) and efforts to lower the MM exponent have recently resumed. Curiously, four of the five problems with a gap between their monotone and nonmonotone circuit complexity rely upon integer or matrix multiplication, which are conjectured to have speedup. As another example, Stockmeyer’s PhD thesis presented a result on speedup for natural problems that had been virtually forgotten. It has also been established that if there is an infinitely often (i.o.) superpolynomial speedup for the time required to accept any (paddable) coNP-complete language, then all (paddable) coNP-complete languages have this speedup, which is equivalent to a superpolynomial speedup in proof length in propositional proof systems for tautologies.

I have argued that there is a correspondence between the properties “has no algorithm at all” and “has no best algorithm” by demonstrating that a resource-bounded version of the statement “no algorithm recognizes all non-halting Turing machines” is equivalent to an infinitely often (i.o.) superpolynomial speedup for the time required to accept any (paddable) coNP-complete language. A literature review on speedup can be found here.

Please feel free to contribute by commenting on blog posts or adding papers to the shared bibliography and also let me know if you would like to make a guest post on a topic of general interest or on your latest research.

Hunter Monroe