Category Archives: Haskell

The Slowest Software Development Methodology in the World

For some time now I’ve been practicing what can only be described as the slowest software development methodology in the world: a three-step waltz of Prototyping, Verification and Export.

Prototyping: I begin quite normally by writing some code in Haskell, testing out different ways of solving some problem that I’m interested in. In this step I’m mainly focused on achieving a clean design with reasonable performance, and wherever possible I use existing Haskell packages (but restricted to those that have been developed with the same software development methodology). I also write QuickCheck tests to check various properties of the functions in the design. At this point I’m essentially done with the traditional coding phase, though usually I spend what might be perceived as an excessive amount of time polishing to make the design and properties as simple and clear as possible.

Verification: Next I use the HOL Light interactive theorem prover to develop a logical theory that (i) defines the types and functions in my Haskell design, and (ii) proves the QuickCheck properties as theorems that follow from the definitions. If the Haskell package depends on other packages that follow the same software development methodology then their contents must also be formalized in theories, and the logical definitions can build on each other in the same way as the Haskell definitions. This step is the most labour-intensive, and often some degree of creativity is required to formulate properties of recursive functions that can be proved by induction. This is where the effort spent polishing the prototype repays itself, since any unnecessary complexity in the design will inevitably thread its way through the proof of many properties.

Export: The final step is to use the opentheory tool to export the logical theory as a Haskell package. I first package up the HOL Light theory in OpenTheory format, by rendering its proof in a low-level article format and adding some meta-data such as the package name and the list of OpenTheory packages it depends on. There is also some Haskell-specific meta-data, such as the mapping from OpenTheory symbol names to Haskell names, and selecting which theorems should be exported as QuickCheck properties. Once that is complete the opentheory tool can automatically export the OpenTheory package as a Haskell package (in a form ready to be uploaded to Hackage).

Why Would You Do That?

At this point you might reasonably ask: why develop a Haskell package, throw it away, and then develop a logical theory only to export the same Haskell package again? Excellent question, and the rest of this post is devoted to answering it, but let’s begin with a depressing truth: it’s not the proof. Despite the countless hours and myriad creative insights you poured into that final Q.E.D., no one wants to see or even hear about your proof. Formal verification proofs are write-only. Sorry about that.

Specification: The proofs might be irrelevant, but the proved theorems constitute a precise specification of the Haskell package, grounded all the way down to standard mathematical objects formalized in higher order logic. I view Haskell as a computational platform for higher order logic (rather than higher order logic as a verification platform for Haskell), a perspective that emphasizes the consistent set of logical theories underlying the Haskell package that defines its semantics. A practical consequence of this is that any differences in the behaviour of the Haskell package and its higher order logic semantics (for example, due to bugs in the export code) must be fixed by changing the Haskell computational platform to conform to its higher order logic semantics.

Let’s take a look at an example specification for an extended GCD function egcd defined in the Haskell package opentheory-divides:

egcd :: Natural -> Natural -> (Natural, (Natural, Natural))

The following theorem about egcd is one of many proved in the corresponding logical theory natural-divides:

∀a b g s t. ¬(a = 0) ∧ egcd a b = (g,(s,t)) ⇒ t * b + g = s * a

This says that the three components of the egcd result (g, s and t) will satisfy a particular equation with its arguments (a and b), so long as the first argument a is non-zero. Side-conditions like this don’t fit naturally into the Haskell type system, and so the burden falls on programmers to discover them and code around them when necessary. This may be one reason why programmers prefer to write their own software components (with side-conditions tailored to the application) rather than reuse existing components (with unknown and/or inconvenient side-conditions). Using this software development methodology I shift the burden of discovering side-conditions to HOL Light, which will force me to prove that the first argument of egcd is non-zero in any code that relies on the above property of the result. If the code just uses the property that the result g of egcd is the greatest common divisor then HOL Light will understand that no side-condition proof is necessary, because another theorem in the logical theory proves that this is true for all arguments to egcd:

∀a b. fst (egcd a b) = gcd a b

Proof: It may be surprising to learn that verified Haskell packages can be useful as a building block to formalize mathematics in higher order logic. Once we have defined Haskell types and functions in the logic and proved that they satisfy their specification, we can build on this formal development to prove mathematical theorems that do not refer to any Haskell symbols. To continue the egcd example, suppose we have two natural numbers a and b (b is greater than 1) that are coprime (i.e., gcd a b = 1). Then we can reduce modulo b both sides of the first property of egcd to show the existence of a multiplicative inverse for a modulo b:

∀a b. 1 < b ∧ gcd a b = 1 ⇒ ∃s. (s * a) `mod` b = 1

This is a non-trivial higher order logic theorem which refers only to standard mathematical concepts, but its proof rests on the properties of the verified Haskell egcd function.

Maintenance: Software maintenance is a pain, and it is here that this software development methodology really shines (and who knows, it may eventually cause the tortoise to pass the hare). Formally verified software offers a new capability: by examining the logical theory of a Haskell package, we can see exactly which properties of dependent packages it relies on. We can use this information to automatically check whether or not a new version of a dependent package will preserve functional correctness: if so then extend the acceptable version range; and if not then report an error to the maintainer. The opentheory tool uses this technique to automatically generate maximal acceptable version ranges for dependent packages whenever it exports Haskell packages, removing another cognitive burden from the programmer. Earlier I said that formal verification proofs were write-only, but really all I meant is that humans won’t read them: as this version analysis shows, machines are quite happy to read them and extract useful information.

Fun: Actually I retract the whole depressing truth: it is totally about the proof. Not for showing to people (humans still won’t read formal verification proofs) but rather for the pure fun of writing them. Those myriad creative insights you made over the course of countless hours? For me formal verification is as immersive as a video game, and my high scores come with a new artefact to play with and build upon. Free toy inside!


A Natural GCD Algorithm

The greatest common divisor (gcd) of two non-negative integers is the largest integer that divides both of them, so for example

gcd 15 6 == 3
gcd 21 10 == 1
gcd 10 0 == 10
gcd 0 0 == 0  -- by convention

Two numbers that have a gcd of 1 (like 21 and 10) are said to be coprime. Although in general it’s a hard problem to factor an integer into divisors

91 == 7 * 13
39 == 3 * 13

it’s an easy problem to compute the gcd of two integers

gcd 91 39 == 13

and if they are not coprime then a non-trivial divisor of both is revealed.

Here is a simple recursive definition that efficiently computes gcd:

gcd :: Integer -> Integer -> Integer
gcd a 0 = a
gcd a b = gcd b (a `mod` b)
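To see why this recursion terminates quickly, it can help to list the argument pairs it visits. The following is a minimal sketch: gcdSteps is a hypothetical helper (not part of the post or of any package) that mirrors the two equations of the definition above and records each call.

```haskell
-- Hypothetical helper: the (a, b) argument pairs visited by the
-- recursive gcd definition, ending with the base case (g, 0).
gcdSteps :: Integer -> Integer -> [(Integer, Integer)]
gcdSteps a 0 = [(a, 0)]
gcdSteps a b = (a, b) : gcdSteps b (a `mod` b)
```

For example, gcdSteps 15 6 evaluates to [(15,6),(6,3),(3,0)], and the first component of the last pair is the gcd.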

How would we check that this definition always computes the gcd of two non-negative integers? If we define a divides relation

divides :: Integer -> Integer -> Bool
divides 0 b = b == 0
divides a b = b `mod` a == 0

then we can easily test that gcd returns some common divisor:

commonDivisorTest :: Integer -> Integer -> Bool
commonDivisorTest a b = let g = gcd a b in divides g a && divides g b

But how to check that gcd returns the greatest common divisor? Consider the linear combination g of a and b

g = s * a + t * b

where the coefficients s and t are arbitrary integers. Any common divisor of a and b must also be a divisor of g, and so if g is itself a common divisor of a and b then it must be the greatest common divisor. The extended gcd algorithm computes the coefficients s and t together with the gcd:

egcd :: Integer -> Integer -> (Integer,(Integer,Integer))
egcd a 0 = (a,(1,0))
egcd a b =
    (g, (t, s - (a `div` b) * t))
  where
    (g,(s,t)) = egcd b (a `mod` b)

which allows us to test that egcd returns the greatest common divisor:

greatestCommonDivisorTest :: Integer -> Integer -> Bool
greatestCommonDivisorTest a b =
    let (g,(s,t)) = egcd a b in
    divides g a && divides g b && s * a + t * b == g

The coefficients s and t are useful for much more than checking the gcd result: for any coprime a and b we have

s * a + t * b == 1
(s * a) `mod` b == 1  -- assuming b > 1
(t * b) `mod` a == 1  -- assuming a > 1

which shows that s is the multiplicative inverse of a modulo b, and t is the multiplicative inverse of b modulo a.
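As a small sketch of how the coefficients yield a modular inverse, the block below repeats the integer egcd from above and adds inverseMod, a hypothetical helper that is not part of the post. It assumes non-negative a and reduces s modulo b, since the integer coefficient may be negative.

```haskell
-- Extended gcd over the integers, as defined in the post.
egcd :: Integer -> Integer -> (Integer, (Integer, Integer))
egcd a 0 = (a, (1, 0))
egcd a b =
    (g, (t, s - (a `div` b) * t))
  where
    (g, (s, t)) = egcd b (a `mod` b)

-- Hypothetical helper: the multiplicative inverse of a modulo b,
-- or Nothing when b <= 1 or when a and b are not coprime.
-- Assumes a is non-negative.
inverseMod :: Integer -> Integer -> Maybe Integer
inverseMod a b
  | b > 1 && g == 1 = Just (s `mod` b)  -- normalise a possibly negative s
  | otherwise = Nothing
  where
    (g, (s, _)) = egcd a b
```

With these definitions egcd 21 10 is (1,(1,-2)), so inverseMod 21 10 is Just 1 and inverseMod 10 21 is Just 19 (the coefficient -2 reduced modulo 21).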

Computing Over the Natural Numbers

The extended gcd algorithm is a classic integer algorithm, but what if we wanted a version that computed over the type of natural numbers? This would be useful in environments in which unsigned arithmetic is more natural and/or efficient than signed arithmetic (one such environment being the natural number theory of an interactive theorem prover, which is where this need arose).

The immediate problem is that one of the s and t coefficients returned by the extended gcd algorithm is usually a negative number, which is to be expected since they must satisfy

s * a + t * b == g

where g is generally less than both a and b. The solution is to modify the specification and ask for natural number coefficients s and t that satisfy

s * a == t * b + g

Fortunately for us, it turns out that such coefficients always exist except when a is zero (and b is non-zero).

Unfortunately for us, our modified specification is no longer symmetric in a and b. Given coprime a and b as input, s is the multiplicative inverse of a modulo b as before, but t is now the negation of the multiplicative inverse of b modulo a (the additive inverse of the multiplicative inverse—funny). This asymmetry breaks the recursion step of the extended gcd algorithm, because there is no longer a way to derive coefficients s and t satisfying

s * a == t * b + g

from coefficients s’ and t’ satisfying

s' * b == t' * (a `mod` b) + g

[If you try you’ll discover that a and b are the wrong way around.] This problem is solved by unwinding the recursion so that the algorithm makes two reductions before calling itself recursively, giving coefficients s’ and t’ satisfying

s' * (a `mod` b) == t' * (b `mod` (a `mod` b)) + g

Although this expression is more complicated, it is now possible to define coefficients

s = s' + (b `div` (a `mod` b)) * t'
t = t' + (a `div` b) * s

which satisfy the desired equation

s * a == t * b + g
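The step from the primed coefficients to the desired equation can be checked by hand. In the derivation below the abbreviations q, c, r and d are introduced only for this sketch: q = a `div` b and c = a `mod` b (so a = q * b + c), and r = b `div` c and d = b `mod` c (so b = r * c + d). The assumption is s' * c = t' * d + g, and the definitions are s = s' + r * t' and t = t' + q * s.

```latex
\begin{align*}
s \cdot a &= s \cdot (q \cdot b + c) \\
          &= q \cdot s \cdot b + (s' + r \cdot t') \cdot c \\
          &= q \cdot s \cdot b + s' \cdot c + r \cdot t' \cdot c \\
          &= q \cdot s \cdot b + (t' \cdot d + g) + r \cdot t' \cdot c \\
          &= q \cdot s \cdot b + t' \cdot (r \cdot c + d) + g \\
          &= (t' + q \cdot s) \cdot b + g \\
          &= t \cdot b + g
\end{align*}
```

Every step uses only addition and multiplication, so the argument goes through over the natural numbers.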

Here is the finished implementation:

egcd :: Natural -> Natural -> (Natural,(Natural,Natural))
egcd a 0 = (a,(1,0))
egcd a b =
    if c == 0
    then (b, (1, a `div` b - 1))
    else (g, (u, t + (a `div` b) * u))
  where
    c = a `mod` b
    (g,(s,t)) = egcd c (b `mod` c)
    u = s + (b `div` c) * t

This satisfies a modified test confirming that it produces a greatest common divisor (for non-zero a):

greatestCommonDivisorTest :: Natural -> Natural -> Bool
greatestCommonDivisorTest a b =
    let (g,(s,t)) = egcd a b in
    divides g a && divides g b && s * a == t * b + g

And we can also check the upper bounds on the s and t coefficients (for non-zero a and b at least 2):

coefficientBoundTest :: Natural -> Natural -> Bool
coefficientBoundTest a b =
    let (_,(s,t)) = egcd a b in
    s < b && t < a
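Both tests can be run exhaustively over a small range without any testing library. The sketch below gathers the definitions from this post into one module, assuming Natural comes from Numeric.Natural in base; checkUpTo is a hypothetical driver that applies each test only on the inputs for which the post states it.

```haskell
import Numeric.Natural (Natural)

divides :: Natural -> Natural -> Bool
divides 0 b = b == 0
divides a b = b `mod` a == 0

-- Natural number extended gcd, as defined in the post.
egcd :: Natural -> Natural -> (Natural, (Natural, Natural))
egcd a 0 = (a, (1, 0))
egcd a b =
    if c == 0
    then (b, (1, a `div` b - 1))
    else (g, (u, t + (a `div` b) * u))
  where
    c = a `mod` b
    (g, (s, t)) = egcd c (b `mod` c)  -- only demanded when c /= 0
    u = s + (b `div` c) * t

greatestCommonDivisorTest :: Natural -> Natural -> Bool
greatestCommonDivisorTest a b =
    let (g, (s, t)) = egcd a b in
    divides g a && divides g b && s * a == t * b + g

coefficientBoundTest :: Natural -> Natural -> Bool
coefficientBoundTest a b =
    let (_, (s, t)) = egcd a b in
    s < b && t < a

-- Hypothetical driver: gcd test for non-zero a, bound test for
-- non-zero a and b at least 2, over all inputs up to n.
checkUpTo :: Natural -> Bool
checkUpTo n =
    and [greatestCommonDivisorTest a b | a <- [1 .. n], b <- [0 .. n]]
    && and [coefficientBoundTest a b | a <- [1 .. n], b <- [2 .. n]]
```

Evaluating checkUpTo 40 in GHCi exercises a few thousand input pairs; the ranges deliberately exclude a == 0, where the subtraction a `div` b - 1 would underflow.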

Application: The Chinese Remainder Theorem

The Chinese Remainder Theorem states that for coprime a and b, given an x and y satisfying

x < a
y < b

there is a unique n satisfying

n < a * b
n `mod` a == x
n `mod` b == y

But how to compute this n given natural number inputs a, b, x and y? We build on the natural number extended gcd algorithm, of course:

chineseRemainder :: Natural -> Natural -> Natural -> Natural -> Natural
chineseRemainder a b =
    \x y -> (x * tb + y * sa) `mod` ab
  where
    (_,(s,t)) = egcd a b
    ab = a * b
    sa = s * a
    tb = (a - t) * b

The λ expression in the body maximizes the precomputation on a and b, allowing efficient Chinese remaindering of different x and y inputs once the a and b inputs have been fixed. As with all computation over the natural numbers we must be careful with the subtraction operation (a - t) to ensure that the first argument is always larger than the second, but here we know from the coefficientBoundTest that t is always less than a.
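Here is a self-contained sketch of the function in use, again assuming Natural from Numeric.Natural and repeating the natural number egcd so that the block stands alone. The name crt35 is hypothetical and only illustrates fixing a and b once.

```haskell
import Numeric.Natural (Natural)

-- Natural number extended gcd, as defined in the post.
egcd :: Natural -> Natural -> (Natural, (Natural, Natural))
egcd a 0 = (a, (1, 0))
egcd a b =
    if c == 0
    then (b, (1, a `div` b - 1))
    else (g, (u, t + (a `div` b) * u))
  where
    c = a `mod` b
    (g, (s, t)) = egcd c (b `mod` c)
    u = s + (b `div` c) * t

chineseRemainder :: Natural -> Natural -> Natural -> Natural -> Natural
chineseRemainder a b =
    \x y -> (x * tb + y * sa) `mod` ab
  where
    (_, (s, t)) = egcd a b
    ab = a * b
    sa = s * a        -- congruent to 0 modulo a and to 1 modulo b
    tb = (a - t) * b  -- congruent to 1 modulo a and to 0 modulo b

-- Hypothetical example: fix the moduli 3 and 5 once, then reuse.
crt35 :: Natural -> Natural -> Natural
crt35 = chineseRemainder 3 5
```

For the moduli 3 and 5 the extended gcd gives s = 2 and t = 1, so sa = 6 and tb = 10, and crt35 2 3 is (2 * 10 + 3 * 6) `mod` 15 = 8, which indeed leaves remainder 2 modulo 3 and 3 modulo 5.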


All of the natural number computations in this blog post have been formally verified and are available as the Haskell package opentheory-divides.

Explicit Laziness

This post describes a particular technique that is useful for optimizing functional programs, which I call explicit laziness. This example came about because I needed a purely functional implementation of an infinite list of primes, to demonstrate the capabilities of exporting OpenTheory logic packages to Haskell. The Sieve of Eratosthenes is a classic algorithm to compute this, and in an imperative language can be implemented with an array where it is cheap to update elements. In translating the algorithm to a purely functional language the array data structure naturally becomes a list, where it is much cheaper to update leading elements than trailing elements. This led to the idea of using explicit laziness to minimize the number of updates to trailing elements, and the result is an improved algorithm that could be backported to speed up an imperative implementation. The price of the improvement is greater complexity, but this can be hidden inside an abstract type, and the increased verification challenge can be met by the HOL Light theorem prover which is used to generate the source logic package.

Example Sieve Program

To illustrate the explicit laziness technique, we’ll use it to optimize a Haskell program for computing prime numbers based on a sieve method. This is the central Sieve data structure:

newtype Sieve =
  Sieve { unSieve :: (Natural,[(Natural,Natural)]) }

Values of type Natural are non-negative integers of unbounded size. Every value Sieve (n,ps) of type Sieve must satisfy the invariant that map fst ps is the list of all prime numbers up to the perimeter n in ascending order, and the counter k in every element (p,k) of the list ps satisfies k == n `mod` p. The following examples are valid values of type Sieve:

  • Sieve (9,[(2,1),(3,0),(5,4),(7,2)])
  • Sieve (10,[(2,0),(3,1),(5,0),(7,3)])
  • Sieve (11,[(2,1),(3,2),(5,1),(7,4),(11,0)])

The Sieve API supports two operations for creating new values of type Sieve:

initial :: Sieve
increment :: Sieve -> (Bool,Sieve)

The initial value of a Sieve is simply:

initial :: Sieve
initial = Sieve (1,[])

The increment operation takes a value Sieve (n,_) and returns a new value Sieve (n + 1, _), together with a boolean indicating whether the new perimeter n + 1 is a prime number:

increment :: Sieve -> (Bool,Sieve)
increment =
  \s ->
    let (n,ps) = unSieve s in
    let n' = n + 1 in
    let ps' = inc ps in
    let b = check ps' in
    let ps'' = if b then ps' ++ [(n',0)] else ps' in
    (b, Sieve (n', ps''))
  where
    inc = map (\(p,k) -> (p, (k + 1) `mod` p))
    check = all (\(_,k) -> k /= 0)

The increment operation works by adding 1 to the perimeter n <- n + 1 and every counter k <- (k + 1) `mod` p. It then checks whether any counter is zero, which happens when the perimeter is divisible by a known prime. If no counter is zero then the perimeter is a new prime, and so (n,0) is added to the end of the list.
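One way to turn this API into the infinite list of primes is to increment repeatedly and keep each perimeter flagged as prime. The sketch below repeats the Sieve definitions so that it stands alone, assuming Natural from Numeric.Natural; the name primes and its local go loop are hypothetical and not taken from the post.

```haskell
import Numeric.Natural (Natural)

newtype Sieve =
  Sieve { unSieve :: (Natural, [(Natural, Natural)]) }

initial :: Sieve
initial = Sieve (1, [])

increment :: Sieve -> (Bool, Sieve)
increment =
  \s ->
    let (n, ps) = unSieve s in
    let n' = n + 1 in
    let ps' = inc ps in
    let b = check ps' in
    let ps'' = if b then ps' ++ [(n', 0)] else ps' in
    (b, Sieve (n', ps''))
  where
    inc = map (\(p, k) -> (p, (k + 1) `mod` p))
    check = all (\(_, k) -> k /= 0)

-- Hypothetical helper: every perimeter that increment reports as prime.
primes :: [Natural]
primes = go initial
  where
    go s =
      let (b, s') = increment s in
      let (n, _) = unSieve s' in
      if b then n : go s' else go s'
```

With these definitions take 10 primes evaluates to [2,3,5,7,11,13,17,19,23,29].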

Motivating Explicit Laziness

How can we optimize this program? Well, there are certainly many ways that we can help the compiler produce efficient target code, but before we get into that we should consider algorithmic improvements. For example, are we performing any unnecessary computation? It seems that we are, because we increment every counter even though the boolean result only needs to know whether any counter is zero. We could stop at the first zero counter and just make a note (called a closure) that the other counters need to be incremented later. This is an instance of lazy evaluation, and since Haskell is a lazy language this optimization will be performed automatically.

Unfortunately, there is still a performance problem here. Consider a time in the future when we are incrementing counters and we come to a counter with an increment closure. At this point Haskell’s lazy evaluation will execute the closure to increment the counter, and then increment it for a second time to see whether it is zero. This is functionally correct, but it would be much more efficient to add 2 to the counter rather than add 1 and then add 1 again. Even worse, it turns out that the increment closures stack up on counters during evaluation, to the point that each time we discover a new prime p we have to increment the counter k in the final list element (q,k) a total of p - q times before we can check that it is non-zero and record the new prime.

How can we avoid this inefficient behavior of adding m to a counter by adding 1 to it m times? This is a case where Haskell’s automatic laziness is not flexible enough, because whenever evaluation encounters a value with a closure, the only possible course of action is to execute the closure and then continue evaluation. There is no way to examine the closure to find out what computation it will perform. However, we can manually simulate lazy evaluation by storing data structures together with explicit notes saying what computation needs to be done before they can be used.

Demonstrating Explicit Laziness

Here is how explicit laziness works in our example. First, we extend the Sieve data structure as follows:

newtype Sieve =
  Sieve { unSieve :: (Natural,[(Natural,(Natural,Natural))]) }

This is exactly the same as before, except that each list element (p,(k,j)) has an extra natural number component j, which is a note that j must be added to every counter following this one in the list.

The definition of the initial value remains the same:

initial :: Sieve
initial = Sieve (1,[])

However, the increment operation now takes into account the explicit j notes:

increment :: Sieve -> (Bool,Sieve)
increment =
  \s ->
    let (n,ps) = unSieve s in
    let n' = n + 1 in
    let (b,ps') = inc n' 1 ps in
    (b, Sieve (n',ps'))
  where
    inc n _ [] = (True, (n,(0,0)) : [])
    inc n i ((p,(k,j)) : ps) =
      let k' = (k + i) `mod` p in
      let j' = j + i in
      if k' == 0 then (False, (p,(0,j')) : ps)
      else let (b,ps') = inc n j' ps in (b, (p,(k',0)) : ps')

The inc and check functions in the previous implementation have been combined into one inc function which takes a parameter i recording how much to increment counters. Initially i is set to 1, but it absorbs the j values for the counters it visits. As soon as a zero counter is discovered, the i parameter is added to the current j value and the rest of the list is untouched. This last point is crucial: explicit laziness eliminates the overhead of closures that are normally generated to update data structures not directly required for the result. For comparison with the version of the program without explicit laziness, here are the Sieve values with perimeters 9, 10 and 11:

  • Sieve (9,[(2,(1,0)),(3,(0,2)),(5,(2,0)),(7,(0,0))])
  • Sieve (10,[(2,(0,1)),(3,(0,2)),(5,(2,0)),(7,(0,0))])
  • Sieve (11,[(2,(1,0)),(3,(2,0)),(5,(1,0)),(7,(4,0)),(11,(0,0))])

Values of the new version of the Sieve type are much harder to read than values of the original version, because each j value has to be added to every counter following it in the list. This complicates the invariant for Sieve values, but is actually a measure of the efficiency gained. For example, the j value of 2 in the Sieve value with perimeter 9 replaces 4 counter increments in the original program: 2 for the 5 counter and another 2 for the 7 counter.
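One way to read a value of the new type is to push the pending j notes through the list by hand. The sketch below does this with flatten, a hypothetical helper that is not part of the Sieve API: it carries the running total of the j notes seen so far and adds it to each later counter, recovering the counter list of the original representation. Natural is assumed to come from Numeric.Natural.

```haskell
import Numeric.Natural (Natural)

-- Hypothetical helper: apply all pending j notes, turning the
-- explicitly lazy counter list into the original (prime, counter) form.
flatten :: [(Natural, (Natural, Natural))] -> [(Natural, Natural)]
flatten = go 0
  where
    -- i is the sum of the j notes of all earlier list elements
    go _ [] = []
    go i ((p, (k, j)) : ps) = (p, (k + i) `mod` p) : go (i + j) ps
```

Applied to the perimeter 9 list above, flatten [(2,(1,0)),(3,(0,2)),(5,(2,0)),(7,(0,0))] gives [(2,1),(3,0),(5,4),(7,2)], which is the counter list of the original Sieve value with perimeter 9.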

Performance Comparison

Even though explicit laziness is an algorithmic optimization, and there remain many optimizations that can be applied to both versions of the example sieve program, it is nevertheless interesting to gauge the performance effects of explicit laziness using a standard compiler. The following table presents the time taken to compute the Nth prime using both versions of the example sieve program, for various values of N:

Nth prime                  100     200     400     800     1600    3200
Without explicit laziness  0.007s  0.019s  0.068s  0.284s  1.169s  5.008s
With explicit laziness     0.004s  0.008s  0.025s  0.083s  0.345s  1.495s

It is gratifying to see that the version of the example sieve program with explicit laziness is faster throughout this small experiment: about 1.8x faster at N = 100, rising to about 3.4x faster from N = 800 onwards.


In this post we have described explicit laziness, a useful technique for optimizing functional programs. Like all optimization techniques its effectiveness depends on the context: automatic laziness is more flexible and guarantees that closures will be evaluated at most once, so should be the default. However, the judicious introduction of explicit laziness can eliminate closures and allow future computations to be combined into more efficient variants before executing them (a limited form of computation on computations).


The code for the optimized version of the example sieve program is available at Hackage: this verified Haskell package is automatically generated from an OpenTheory package. The performance results were gathered using the GHC 7.4.1 compiler on a MacBook Air with a 1.8GHz Intel Core i7 processor and 4GB of RAM, running OS X 10.7.5.