Levin’s and Schnorr’s optimality results


This week on “Speedup in Computational Complexity” we’re going to learn how to write an optimal solver for SAT. Thanks to Leonid Levin we know that a very partial answer to “Are there familiar computational problems with no best algorithm?” is “Well, computing a satisfying assignment for a boolean formula does have a best algorithm, that’s for sure!”. In fact we’ll explore several variations of an idea by Levin all of which provide us with computational problems that each have a best algorithm. One particular variation, by Claus-Peter Schnorr, applies to the class of self-reducible problems.

A word of warning before we start: The below constructions only work in some models of complexity. One of the models in which the results will not apply is the Turing Machine model. I will mention the model requirements as we go along but if you’d like a more detailed discussion of this topic, I refer you to Gurevich’s paper in the references.

Without further ado let’s jump straight into Levin’s neat little trick, performed by a combination of an interpreter and a program enumerator.

A wonderful sequence of values

The program that we’ll design in this section takes an input x and runs infinitely, outputting an infinite sequence of values. Our program will output a new number in the sequence every k steps for some constant k. The sequence produced will turn out to be quite a wonderful characterization of x (if you love computational complexity). I’ll use the name iseq(x) for the infinite sequence generated on input x.

To design our program – let’s call it iseqprog – we’ll need two other programs to start from: A program enumerator and an interpreter for the enumerated programs.

The program enumerator, progen, takes as input a pair (i,x) and returns the initial configuration of the program with index i on input x. We’ll expect this operation to be constant-time when either i=0 or we already called progen on (i-1,x). In other words: progen is more like a method of an object (with internal state) which expects to be called with inputs (0,x), (1,x), (2,x),… and is able to process each item in such a call sequence in constant-time.

The interpreter we’ll need cannot be any old interpreter. In these modern times we can expect a certain service level. The interpreter should work like a slot machine in the arcades: Whenever I put in a new coin I continue my game with three more lives. In other words, when I give the interpreter the configuration of program p after t steps on input x, it returns an updated configuration representing the state of program p after t+1 steps on input x. It also tells me if p terminated in step t+1 and, if so, the return value of p on x. All of this happens in constant time. After all, the interpreter only needs to simulate one single step of p on x.

Comment: Almost any old interpreter can be used for Levin’s construction, but the exposition would become more complex.

Now I’ll describe the computation of iseqprog on input x. The computation proceeds in rounds, and each round consists of a number of stages. There is an infinite number of rounds. The number of stages in each round is finite but not constant across rounds.

Round 1 has only 1 stage. In this first stage of the first round, iseqprog runs progen on (0,x) and gets back the initial configuration of program 0 on input x. iseqprog then uses the interpreter to interpret just 1 step of program 0 on input x. If program 0 happens to terminate on input x in that first step, iseqprog immediately outputs program 0’s output on input x. Regardless of whether the interpretation of program 0 terminated, iseqprog itself does not terminate; it is on its way to generate an infinite sequence. If the interpretation of program 0 on input x did not terminate in its first step, iseqprog outputs the value 0 before continuing, providing us with a signal that it’s still live and running. This concludes the first (1-stage) round of iseqprog’s computation on x.

The computation continues in an infinite sequence of rounds. In each round, iseqprog calls progen once, adding one new item to the set of program configurations it has accumulated during the previous rounds. Each of these program configurations is interpreted for a number of steps. Every time the interpreter has executed one step of a program i, iseqprog outputs one value. The output value will be 0 or program i’s output on x. Whatever program you may think of, iseqprog will eventually generate its output on x (in between a lot of 0’s and many other values).

If we use this shorthand:

  • +i”: means “create the initial configuration for program i on input x, then interpret 1 step on that configuration and output one value”
  • i” means “interpret 1 more step on the stored configuration for program i and output one value”

then we can sum up iseqprog’s first rounds like this:

Round 1: +1
Round 2: 11+2
Round 3: 111122+3
Round 4: 11111111222233+4
Round 5: 111111111111111122222222333344+5
Round 6: 32 1’s, 16 2’s, 8 3’s, 4 4’s, 2 5’s, and one +6

I hope it has become clear why iseqprog should be able to generate a new item of iseq every k steps or less for some constant k. Apart from the administrative work of looking up and saving configurations in some table, each step involves at most one call to the program enumerator and one call to the interpreter. These calls were assumed to be constant-time. The administrative work I wlll simply assume to be constant-time as well. iseqprog cannot work as intended in all complexity models; in particular, it doesn’t work for Turing machines.

Now let’s have a look at the sequence iseq(x) itself. The key observation is that although any individual program does not get much attention from iseqprog, it does get a specific percentage of attention that is not dependent on the input x. For instance, program 3 accounts for \frac{1}{2^3}=\frac{1}{8} of the interpreter calls made by iseqprog regardless of the input x. The percentage is tied only to the program’s index number according to the program enumerator. From this observation we can derive (proof left to the reader) the salient feature of iseq(x):

If program p outputs y on input x in time t, then y appears in iseq(x) at an index less than ct for c depending only on p.

I think this is great! Whatever you want to compute from x, you’ll find it in iseq(x). What’s more: Your answer appears quite early in the sequence – so early, in fact, that you might as well just run through iseq(x) rather than perform the computation itself! That’s why I decided to call iseq(x) a wonderful sequence.

It’s too good to be true…if it wasn’t for two caveats. First, how do you recognize the value that you’re looking for? And second, what about that constant c? We’ll address these two questions below.

Comment: Another caveat is that the above doesn’t apply to all complexity models, in particular to Turing Machines. For most of the common complexity models, I expect that the result will be true if you replace ct by poly(t) where poly is a polynomial depending only on p

I’ll end this section with a simple specialization of the above that is too nice not to mention:

For any function f in P, there is a polynomial p such that \min \{ i | {\it iseq(x)}_i = f(x) \} < p(|x|)

And yes, iseqprog generates a new item of iseq(x) in k steps or less for some constant k!

The optimal solver for SAT

So what good is all this if you cannot recognize the value you’re looking for. Luckily there are some situations where validating a correct answer is simpler than producing it – yes, I’m thinking about SAT. A satisfying assignment for a boolean formula can be validated in linear time. How can we exploit Levin’s idea to create an optimal solver for SAT?

The simplest answer is to modify the program enumerator. Our new program enumerator, call it progenSAT, wraps each program generated by the original program enumerator in a SAT validator. The computation of progenSAT(i,x) will proceed in two phases like this:

Phase 1: Run progen(i,x) and assign its output value to variable y.
Phase 2: If y is a satisfying assignment for the boolean formula x then output y else loop infinitely.

If we plug progenSAT into iseqprog we get a new program iseqprogSAT generating a new sequence iseqSAT(x) on input x.

Like the original iseqprog, our new program iseqprogSAT generates a new item every k steps or less for some constant k. I’m assuming that progenSAT also takes constant time to generate each new program configuration. Let us adapt the key observation about iseq(x) to the sequence iseqSAT(x) (once again, I’ll leave the proof to the reader):

If program p outputs y on input x in time t, and y is a satisfying assigment for the boolean formula x, then y appears in iseqSAT(x) at an index less than c'(t+|x|) for c’ depending only on p.

This is remarkable! This means we have a concrete program that is optimal (up to a constant factor) for solving SAT. As a consequence, The question of P vs. NP boils down to a question about this single program’s running time. Define \it time^{\neg 0}_p(x) to be the number of steps program p takes to generate a nonzero value on input x. Now P=NP if and only if there is a polynomial p such that \it time^{\neg 0}_{\it iseqprogSAT}(x) < p(|x|) for every satisfiable boolean formula x.

In other words, there may be 1 million US$ waiting for you if you’re able to analyze iseqprogSAT‘s running time in detail.

Notes for the experimentalists

Now we’ll have a look at the other caveat about Levin’s idea: The constant factor. In the 1990’s, under the supervision of Neil Jones and Stephen Cook, I worked on implementing a program enumerator that would get iseqprog to actually terminate on some toy problems. The problem, of course, is that the constant factors involved are so large you’ll be tempted to never use big-O-notation ever again. Let’s assume that your programs are sequences of k different instructions, and that every sequence of instructions is a valid program. Then the index of a program p is roughly k^{|p|}. The constant factor c is then approximately 2^{k^{|p|}} i.e. doubly exponential in |p|. So to get an answer from iseqprog the useful programs need to be really short.

Actually I found that iseqprog favours short programs so much that it sometimes fails to find program that actually computes the function you’re looking for. In one case, half of the inputs caused one little program, p’, to give the correct result while the other half of the inputs caused another little program, p’’, to give iseqprog its output. A program that tested the input then continued as either p’ or p’’ was too long to ever get simulated.

It’s actually possible to reduce the constant factor c by a lot, if you’re willing to sacrifice the optimality in asymptotical running time. By revising the strategy used to pick which program to interpret, you wil obtain different tradeoffs between constant factor and asymptotical relation. For instance, consider the variant of iseq(x), call it iseq_triangle(x) obtained by using the following simple strategy in Levin’s construction:

Round 1: +1
Round 2: 1+2
Round 3: 12+3
Round 4: 123+4
Round 5: 1234+5

I’ll postulate the following, leaving the proof to the reader: If program p outputs y on input x in time t, then y appears in iseq_triangle(x) at an index less than {\it index}_p^2 t^2.

I once identified a few strategies of this kind but never got around to clarifying in more detail which tradeoffs are possible; or indeed optimal. Could the “triangle” strategy be improved so that the expression above instead would be {\it index}_p^2 t \log (t)? I doubt it, but have no proof. It seems like a small but interesting math exercise.

In one variation of iseqprog the programs are actually enumerated in the order of their descriptive complexity. See the references below for details on that.

Schnorr’s self-reducible predicates

Claus-Peter Schnorr analyzed applications of Levin’s result in a 1976 ICALP paper. In particular, he was interested in defining a class of predicates that do not allow asymptotical speedup. The contrast to the above results should be noticed: It is an actual predicate, a 2-valued function, that does not have speedup.

I have not been able to prove Schnorr’s main result (the paper’s proof is missing a few details) but I’d like to outline his central idea because it is interesting, and maybe one of the readers can be helpful by providing a proof in the comments of this blog post. I have simplified his definition a bit and refer you to the ICALP paper for the general definition, and for his results on graph isomorphism and speedup in the Turing machine space model.

Let us adapt some notation from the previous blog post by Amir Ben-Amram and define the complexity set T_f of a function f to be

T_f \stackrel{\it def}= \{ {\it time}_p(x)\ |\ p\ {\it computes}\ f \}

In the remainder of this section, all data will be bit strings, and P will designate a binary predicate, i.e. P(x,y) \in \{ 0, 1\}. You may think of SAT as a prime example of the kind of predicates Schnorr analyzes. The decision function for P will be defined by

{\it decide}_P \stackrel{\it def}= \lambda x.\ \exists y: P(x,y)

A function w is a witness function for P if and only if

\forall x: P(x, w(x))\ \vee (\forall y: \neg P(x,y))

The idea behind Schnorr’s result is to consider a class of predicates, P for which there is a tight connection between the complexity set T_{{\it decide}_P} and the complexity sets of the associated witness functions:

\bigcup \{ T_w\ |\ w\ \textrm{is a witness function for}\ P\}

The class in question is the class of (polynomial-time) self-reducible predicates. The criteria for being self-reducible are a bit complex. I will provide a simplified, less general, version here. P is self-reducible if P(x,y) implies |x| = |y| and there is a polynomial-time function {\it spec}_P mapping a pair of (bit string, bit) to a bit string such that

  1. P(x, y_1 \cdots y_n) = P({\it spec}_P(x, y1), y_2 \cdots y_n)
  2. |{\it spec}_P(x, b)| < |x|

Theorem (Schnorr, 1976, Theorem 2.4, rewritten): When P is self-reducible, there is an integer k and witness function w for P such that

t \in T_{{\it decide}_P} \Rightarrow \exists t' \in T_w: t' = \mathcal{O}(\lambda x .\ |x|^k t(x))

This theorem is not too hard to prove. To find a witness y for an x, you figure out the bits of y one at a time. It takes |x| rounds in which we test both 0 and 1 as the potential “next bit” of a witness. For the details, I refer you to Schnorr’s paper.

The main theorem of interest to this blog post is Schnorr’s Theorem 2.7. A precise statement of the Theorem requires more technical detail than I’m able to provide here, but its essence is this: For a self-reducible predicate P, the decision problem {\it decide}_P cannot be sped up by a factor of |x|.

As mentioned above, I’ve not been able to construct a proof based on the ICALP paper, so I’ll leave this a homework to the readers! It certainly seems like all of the necessary constructions have been lined up, but at the place where “standard methods of diagonalization” should be applied I cannot find a satisfactory interpretation of how to combine the big-O notation with the quantification of the variable i. I’d be very interested in hearing from readers that succeeded in proving this Theorem.

Historical notes and references

All papers mentioned below appear in this blog’s bibliography

Leonid Levin introduced the idea in (Levin, 1973). I must admit that I’ve never read the original Russian paper nor its translation in (Trakhtenbrot, 1984), so I rely on (Gurevich, 1988) and (Li and Vitányi, 1993) in the following. The paper presented his “Universal Search Theorem” as a result concerning resource-bounded descriptional complexity. There was no proof in the paper, but he provided the proof in private communications to Gurevich. Levin’s paper uses an advanced strategy for selecting which program to generate in each round. This strategy causes the constant factor associated with a program p to be 2^{K(p)+1} where K(p) is the prefix complexity of p and K(p) \leq |p| + 2 \log |p| + k for some constant k. This is explained in Section 7.5 of (Li and Vitányi, 1993).

Schnorr’s paper (Schnorr, 1976) is the earliest English exposition on this topic that I know of, and it seems to be the first application of Levin’s idea to predicates rather than functions with arbitrarily complex values. Gurevich dedicated most of (Gurevich, 1988) to explaining Levin’s idea which seems to have been largely unknown at the time. A major topic in Gurevich’s discussion is the complexity models in which Levin’s idea can be utilized. Amir Ben-Amram wrote a clear and precise exposition on Levin’s idea in Neil Jones’s complexity book (Ben-Amram, 1997), in his guest chapter “The existence of optimal programs”.

There have been some experiments with practical implementation of Levin’s idea. (Li and Vitányi, 1993) mentions work from the 1980’s that combines Levin’s algorithm with machine learning. My own experiments (Christensen, 1999) were done without knowledge of this prior and does not use machine learning but focuses on tailored programming languages and efficient implementations.

About the author

I’m a Ph.D. of computer science based in Copenhagen, Denmark. I am currently a Senior System Engineer working on developing highly scalable, distributed systems for Issuu, the leading digital publishing platform (see http://issuu.com/about). My interest in complexity theory was nurtured by Neil D. Jones and I was brought up on his book “Computability and Complexity From a Programming Perspective”. I recently had the pleasure of co-authoring a paper with Amir Ben-Amran and Jakob G. Simonsen for the Chicago Journal of Theoretical Computer Science, see http://cjtcs.cs.uchicago.edu/articles/2012/7/contents.html


6 Responses to “Levin’s and Schnorr’s optimality results”

  1. Jesper Kristensen Says:

    > I’ll postulate the following, leaving the proof to the reader: If program p outputs y on input x in time t, then y appears in iseq_triangle(x) at an index less than {\it index}_p^2 t^2.

    It appears to me that this could be strengthened to ({\it index}_p + t)^2.

    As a solution to your small exercise I claim that the strategy below has a complexity of O(t*{\it index}_p*log(t*{\it index}_p)):
    Round n: For each p: Continue computation of program p until step 2^n/{\it index}_p rounded down.

    • Niels H. Christensen Says:

      > It appears to me that this could be strengthened to ({\it index}_p + t)^2.

      Yes, you’re right! After {\it index}_p + t rounds, program p will have been interpreted for t steps, and each round up to that has at most {\it index}_p + t steps.

    • Niels H. Christensen Says:

      > the strategy below has a complexity of \mathcal{O}(t\ {\it index}_p  \log ( t\  {\it index}_p))

      It’s an interesting strategy but how do you obtain the above limit? The lowest limit I can see right now is \mathcal{O}(t\  {\it index}^2_p + t^2 {\it index}_p))

      • Jesper Kristensen Says:

        I skipped my sketch proof of the claimed complexity in my previous comment as it got longer and uglier than I hoped for.

        After revisiting the idea I came up with a different version of the strategy that is easier to reason about:

        Round n: For every integer k with 0<=k<=n: Simulate the first 2^k steps of every of the first 2^(n-k) programs (that is; the 2^(n-k) programs p for which {\it index}_p <= 2^(n-k))

        Note that round n consists of n+1 subtasks each of length 2^k * 2^(n-k) = 2^n. The work of round n is thus (n+1)*2^n. The work of all rounds from 1 to n is sum((i+1) * 2^i) for 1<=i<=n). This is less than (n+1)*(2^(n+1)).

        The values a and b introduced below are just syntactic sugar to make the proof more readable.
        For any {\it index}_p define the integer a such that 2^a < {\it index}_p <= 2^(a+1)
        For any t define the integer b such that 2^b < t <= 2^(b+1)

        If the program p prints a solution in time t then the strategy will simulate this event in round (a+1)+(b+1) (when doing 2^(b+1) simulations of program p with {\it index}_p <= is 2^(a+1)).

        Summing the work required for all a+b+2 rounds we get the upper bound ((a+b+2)+1) * (2^((a+b+2)+1)) = 8*(a+b+3) * (2^a * 2^b) < 8 * (log_2 {\it index}_p + log_2 t + 3) * {\it index}_p * t

        This last expression gives a complexity slightly lower than the stated \mathcal{O}(t\ {\it index}_p \log ( t\ {\it index}_p))

      • Niels H. Christensen Says:

        Thanks for the proof! Yes, that checks out.

  2. Jesper Kristensen Says:

    In my earlier response I forgot to account for the work needed to create the initial configuration for a program.

    However, this work can be considered amortized contant time which essentially allows us to ignore it.

    It just requires that our strategy runs through the programs in some order where it is amortized O(1) work to modify one initial configuration into the next.

    Each program is then simulated using the initial configuration and copy-on-write (like when creating a duplicate process using fork).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: