Explicit Laziness

This post describes a particular technique that is useful for optimizing functional programs, which I call explicit laziness. This example came about because I needed a purely functional implementation of an infinite list of primes, to demonstrate the capabilities of exporting OpenTheory logic packages to Haskell. The Sieve of Eratosthenes is a classic algorithm to compute this, and in an imperative language can be implemented with an array where it is cheap to update elements. In translating the algorithm to a purely functional language the array data structure naturally becomes a list, where it is much cheaper to update leading elements than trailing elements. This led to the idea of using explicit laziness to minimize the number of updates to trailing elements, and the result is an improved algorithm that could be backported to speed up an imperative implementation. The price of the improvement is greater complexity, but this can be hidden inside an abstract type, and the increased verification challenge can be met by the HOL Light theorem prover which is used to generate the source logic package.

Example Sieve Program

To illustrate the explicit laziness technique, we’ll use it to optimize a Haskell program for computing prime numbers based on a sieve method. This is the central Sieve data structure:

newtype Sieve =
  Sieve { unSieve :: (Natural,[(Natural,Natural)]) }

Values of type Natural are non-negative integers of unbounded size. Every value Sieve (n,ps) of type Sieve must satisfy the invariant that map fst ps is the list of all prime numbers up to the perimeter n in ascending order, and the counter k in every element (p,k) of the list ps satisfies k == n `mod` p. The following examples are valid values of type Sieve:

  • Sieve (9,[(2,1),(3,0),(5,4),(7,2)])
  • Sieve (10,[(2,0),(3,1),(5,0),(7,3)])
  • Sieve (11,[(2,1),(3,2),(5,1),(7,4),(11,0)])
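To make the invariant concrete, here is a minimal sketch of a checking predicate (invariantHolds and isPrime are illustrative names introduced here, not part of the package; Natural comes from Numeric.Natural):

-- Check the Sieve invariant: map fst ps lists the primes up to the
-- perimeter n in ascending order, and each counter k equals n `mod` p.
invariantHolds :: Sieve -> Bool
invariantHolds s =
    map fst ps == filter isPrime [2..n] &&
    all (\(p,k) -> k == n `mod` p) ps
  where
    (n,ps) = unSieve s
    -- Trial division; only ever applied here to m >= 2.
    isPrime m = all (\d -> m `mod` d /= 0) [2 .. m - 1]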

The Sieve API supports two operations for creating new values of type Sieve:

initial :: Sieve
increment :: Sieve -> (Bool,Sieve)

The initial value of a Sieve is simply:

initial :: Sieve
initial = Sieve (1,[])

The increment operation takes a value Sieve (n,_) and returns a new value Sieve (n + 1, _), together with a boolean indicating whether the new perimeter n + 1 is a prime number:

increment :: Sieve -> (Bool,Sieve)
increment =
  \s ->
    let (n,ps) = unSieve s in
    let n' = n + 1 in
    let ps' = inc ps in
    let b = check ps' in
    let ps'' = if b then ps' ++ [(n',0)] else ps' in
    (b, Sieve (n', ps''))
  where
    inc = map (\(p,k) -> (p, (k + 1) `mod` p))
    check = all (\(_,k) -> k /= 0)

The increment operation works by adding 1 to the perimeter (n <- n + 1) and to every counter (k <- (k + 1) `mod` p). It then checks whether any counter is zero, which happens exactly when the new perimeter is divisible by a known prime. If no counter is zero then the new perimeter n' is prime, and so (n',0) is added to the end of the list.
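To see the API in action, here is a minimal driver that unfolds the infinite list of primes (the primes name is illustrative, not part of the Sieve API):

-- Repeatedly increment the perimeter, keeping it whenever it is prime.
primes :: [Natural]
primes = go initial
  where
    go s =
      let (b,s') = increment s in
      let n = fst (unSieve s') in
      if b then n : go s' else go s'

For example, take 5 primes evaluates to [2,3,5,7,11].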

Motivating Explicit Laziness

How can we optimize this program? Well, there are certainly many ways that we can help the compiler produce efficient target code, but before we get into that we should consider algorithmic improvements. For example, are we performing any unnecessary computation? It seems that we are, because we increment every counter even though the boolean result only needs to know whether any counter is zero. We could stop at the first zero counter and just make a note (called a closure) that the other counters need to be incremented later. This is an instance of lazy evaluation, and since Haskell is a lazy language this optimization will be performed automatically.

Unfortunately, there is still a performance problem here. Consider a time in the future when we are incrementing counters and we come to a counter with an increment closure. At this point Haskell’s lazy evaluation will execute the closure to increment the counter, and then increment it for a second time to see whether it is zero. This is functionally correct, but it would be much more efficient to add 2 to the counter rather than add 1 and then add 1 again. Even worse, it turns out that the increment closures stack up on counters during evaluation, to the point that each time we discover a new prime p we have to increment the counter k in the final list element (q,k) a total of p - q times before we can check that it is non-zero and record the new prime.

How can we avoid this inefficient behavior of adding m to a counter by adding 1 to it m times? This is a case where Haskell’s automatic laziness is not flexible enough, because whenever evaluation encounters a value with a closure, the only possible course of action is to execute the closure and then continue evaluation. There is no way to examine the closure to find out what computation it will perform. However, we can manually simulate lazy evaluation by storing data structures together with explicit notes saying what computation needs to be done before they can be used.
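To see the idea in miniature before returning to the sieve, here is a sketch of a single counter paired with an explicit note of pending increments (the LazyCounter, delay and force names are illustrative):

-- A counter whose pending increments are stored as inspectable data
-- rather than hidden inside a closure.
data LazyCounter =
  LazyCounter { value :: Natural, pending :: Natural }

-- Noting a further increment merges it into the pending note in O(1),
-- instead of stacking another closure.
delay :: Natural -> LazyCounter -> LazyCounter
delay i (LazyCounter v p) = LazyCounter v (p + i)

-- Forcing performs all pending increments with a single addition.
force :: LazyCounter -> Natural
force (LazyCounter v p) = v + p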

Demonstrating Explicit Laziness

Here is how explicit laziness works in our example. First, we extend the Sieve data structure as follows:

newtype Sieve =
  Sieve { unSieve :: (Natural,[(Natural,(Natural,Natural))]) }

This is exactly the same as before, except that each list element (p,(k,j)) has an extra natural number component j, which is a note that j must be added to every counter following this one in the list.
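One way to pin down the meaning of the notes is a hypothetical flush function (illustrative, not part of the package) that applies every outstanding j note and recovers the representation used by the original program:

-- Thread an accumulator of the notes attached to earlier elements:
-- each counter receives the sum of the notes of the elements before it.
flush :: [(Natural,(Natural,Natural))] -> [(Natural,Natural)]
flush = go 0
  where
    go _ [] = []
    go acc ((p,(k,j)) : ps) =
      (p, (k + acc) `mod` p) : go (acc + j) ps

For example, flush [(2,(1,0)),(3,(0,2)),(5,(2,0)),(7,(0,0))] (the perimeter 9 value shown below) evaluates to [(2,1),(3,0),(5,4),(7,2)], the perimeter 9 value of the original program.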

The definition of the initial value remains the same:

initial :: Sieve
initial = Sieve (1,[])

However, the increment operation now takes into account the explicit j notes:

increment :: Sieve -> (Bool,Sieve)
increment =
  \s ->
    let (n,ps) = unSieve s in
    let n' = n + 1 in
    let (b,ps') = inc n' 1 ps in
    (b, Sieve (n',ps'))
  where
    inc n _ [] = (True, (n,(0,0)) : [])
    inc n i ((p,(k,j)) : ps) =
      let k' = (k + i) `mod` p in
      let j' = j + i in
      if k' == 0 then (False, (p,(0,j')) : ps)
      else let (b,ps') = inc n j' ps in (b, (p,(k',0)) : ps')

The inc and check functions in the previous implementation have been combined into one inc function which takes a parameter i recording how much to increment counters. Initially i is set to 1, but it absorbs the j values for the counters it visits. As soon as a zero counter is discovered, the i parameter is added to the current j value and the rest of the list is left untouched. This last point is crucial: explicit laziness eliminates the overhead of closures that are normally generated to update data structures not directly required for the result. For comparison with the version of the program without explicit laziness, here are the Sieve values with perimeters 9, 10 and 11:

  • Sieve (9,[(2,(1,0)),(3,(0,2)),(5,(2,0)),(7,(0,0))])
  • Sieve (10,[(2,(0,1)),(3,(0,2)),(5,(2,0)),(7,(0,0))])
  • Sieve (11,[(2,(1,0)),(3,(2,0)),(5,(1,0)),(7,(4,0)),(11,(0,0))])

Values of the new version of the Sieve type are much harder to read than values of the original version, because each j value has to be added to every counter following it in the list. This complicates the invariant for Sieve values, but it is also a measure of the efficiency gained. For example, the j value of 2 in the Sieve value with perimeter 9 replaces 4 counter increments in the original program: 2 for the 5 counter and another 2 for the 7 counter.
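As a sanity check, the hypothetical flush function sketched earlier recovers the original values; for perimeter 10:

checkFlush :: Bool
checkFlush =
  flush [(2,(0,1)),(3,(0,2)),(5,(2,0)),(7,(0,0))] ==
  [(2,0),(3,1),(5,0),(7,3)]
-- checkFlush evaluates to True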

Performance Comparison

Even though explicit laziness is an algorithmic optimization, and there remain many optimizations that can be applied to both versions of the example sieve program, it is nevertheless interesting to gauge the performance effects of explicit laziness using a standard compiler. The following table presents the time taken to compute the Nth prime using both versions of the example sieve program, for various values of N:

Nth prime                  100     200     400     800     1600    3200
Without explicit laziness  0.007s  0.019s  0.068s  0.284s  1.169s  5.008s
With explicit laziness     0.004s  0.008s  0.025s  0.083s  0.345s  1.495s

It is gratifying to see that the version of the example sieve program with explicit laziness is about 3x faster on this small experiment, with the speedup growing from roughly 1.8x at N = 100 to roughly 3.3x at N = 3200.
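For reference, here is a minimal sketch of how such timings can be gathered (an assumed harness, not the one used for this post; it relies on the illustrative primes driver from earlier):

import System.Environment (getArgs)

-- Print the Nth prime, so a run can be timed from the shell,
-- e.g. with: time ./sieve 3200
main :: IO ()
main = do
  [n] <- getArgs
  print (primes !! (read n - 1))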

Summary

In this post we have described explicit laziness, a useful technique for optimizing functional programs. Like all optimization techniques its effectiveness depends on the context: automatic laziness is more flexible and guarantees that closures will be evaluated at most once, so should be the default. However, the judicious introduction of explicit laziness can eliminate closures and allow future computations to be combined into more efficient variants before executing them (a limited form of computation on computations).

Appendix

The code for the optimized version of the example sieve program is available on Hackage: this verified Haskell package is automatically generated from an OpenTheory package. The performance results were gathered using the GHC 7.4.1 compiler on a MacBook Air with a 1.8GHz Intel Core i7 processor and 4GB of RAM, running OS X 10.7.5.
