# Large Knight’s Tours

I was at middle school when I first heard about the knight’s tour problem: place a knight on any square on a chess board, and move it to every square on the board without visiting any square more than once. A chess knight moves by jumping two squares forward and one square sideways, so a big part of the challenge is getting into and out of all the corners. I remember filling up pages and pages of squared notebooks with attempted solutions, although my record from that time was a tour that covered only 61 of the 64 squares.

### Enter the Computer

Well, it’s one thing to look for knight’s tours by hand, but quite another to search for them with a computer. It is actually a fun programming exercise: a standard backtracking algorithm works, but it needs a couple of tricks to perform well. Firstly, if you have a choice of squares to jump to, then you should consider them in ascending order of accessibility (i.e., the number of unvisited squares from which a knight could jump to that square). This helps to ensure that you don’t waste last chances to visit squares. However, even with this trick, it is possible to make mistakes early on in the search that you can never recover from, and end up lost forever in a maze with 10⁵¹ turns and no exits. An ugly way to fix this is to assign a move budget (I chose 100 times the number of squares): if you don’t find a solution within this many jumps, then give up and start again from an empty board.
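The post doesn't include the search code itself, but the backtracking-plus-accessibility idea can be sketched as follows. This is a minimal illustration, not the original program: the names `tour`, `jumps` and `Square` are mine, and the move-budget restart trick is omitted for brevity.

```haskell
import Data.List (sortOn)

type Square = (Int, Int)

-- All knight jumps from a square that land on an m-by-n board.
jumps :: Int -> Int -> Square -> [Square]
jumps m n (x, y) =
  [ (x + dx, y + dy)
  | (dx, dy) <- [(1,2),(2,1),(2,-1),(1,-2),(-1,-2),(-2,-1),(-2,1),(-1,2)]
  , x + dx >= 0, x + dx < m, y + dy >= 0, y + dy < n ]

-- Backtracking search for a knight's tour, considering candidate
-- squares in ascending order of accessibility.
tour :: Int -> Int -> Square -> Maybe [Square]
tour m n start = go [start] start
  where
    go visited cur
      | length visited == m * n = Just (reverse visited)
      | otherwise = try (sortOn accessibility unvisited)
      where
        unvisited = filter (`notElem` visited) (jumps m n cur)
        accessibility sq =
          length (filter (`notElem` visited) (jumps m n sq))
        try [] = Nothing
        try (sq : rest) =
          case go (sq : visited) sq of
            Just t  -> Just t
            Nothing -> try rest
```

Because the search is complete, it is guaranteed to find a tour if one exists; the accessibility ordering just makes it find one quickly in practice.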

Armed with a fast computer, a search algorithm and a couple of tricks, I was finally able to realize a childhood dream and find a knight’s tour:

Note that we always begin our tours in the bottom right corner, so that the search program only has to work out how to get into and out of 3 corners: that time spent filling squared notebooks taught me something!

### Building Up

Once the initial problem was solved, I naturally started wondering whether knight’s tours could be found on larger boards. It is easy to modify the search program to change the board dimensions from an 8×8 square to an m×n rectangle, but as the board size increases the time it takes to find a knight’s tour grows rapidly:

| Board size (n×n) | 8 | 10 | 12 | 14 | 16 | 18 | 20 | 22 | 24 | 26 | 28 | 30 |
|------------------|---|----|----|----|----|----|-----|-----|-----|-----|-------|-------|
| Search time (ms) | 1 | 3  | 7  | 11 | 26 | 51 | 102 | 225 | 401 | 701 | 1,074 | 1,738 |

The search time roughly doubles every time the board grows by 2 in each dimension, so extrapolating these results to a 100×100 board, it seems that my search program would take over a thousand years to find a knight’s tour! Clearly I need a new approach to find knight’s tours for large boards.

Our 8×8 knight’s tour above looks like a little circuit, and drawn the same way a 100×100 knight’s tour would look like a huge circuit. But let’s take some inspiration from the way that huge circuits are designed in practice: not all at once, but rather by wiring together smaller circuits. Consider this special 6×6 knight’s tour:

One reason this knight’s tour is special is because it has no start and end: it is a loop! There are probably clever ways to find knight’s tours that are loops, but I took a simple-minded approach. I modified my program to keep searching for knight’s tours until it found one where the end was a knight’s jump away from the start, and then I connected the ends of the tour to make a loop.
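In code, the extra acceptance condition for this simple-minded approach is easy to state: check whether the two ends of a found tour are a knight's jump apart. A small sketch (`closes` is an illustrative name, not from the original program):

```haskell
-- A tour can be closed into a loop exactly when its first and last
-- squares are a knight's jump apart.
closes :: [(Int, Int)] -> Bool
closes squares =
  case (squares, reverse squares) of
    ((x1, y1) : _, (x2, y2) : _) ->
      (abs (x1 - x2), abs (y1 - y2)) `elem` [(1, 2), (2, 1)]
    _ -> False
```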

There is another reason that this knight’s tour is special, which is not so easy to see. You can put two of them together:

then switch two of the jumps around to connect the tours together:

and the result is a knight’s tour on a 12×6 board. To see that the result is indeed a tour, let’s visualize the two separate knight’s tours as loops using a schematic diagram:

And now when we perform the connection step the schematic diagram makes it clear that we have created one big loop from the two separate loops:

The same thing also works by vertically stacking the special 6×6 knight’s tour:

and connecting them like so:

### Assembling Large Tours

I’ve shown how to connect special 6×6 knight’s tours together, and the next step is to connect them in an m×n grid to make a knight’s tour of a 6m×6n board. Connecting them together in a grid requires a little bit of care to avoid fragmenting the knight’s tour into disconnected paths. Here’s a schematic diagram for a 2×2 grid illustrating how tours can be fragmented into disconnected paths:

One way to avoid tour fragmentation is to connect along some path through the grid that visits every 6×6 knight’s tour. I chose a spiral, starting at the bottom right of the grid, travelling anti-clockwise and ending in the middle. Here is the schematic diagram for a spiral path on a 3×3 grid:

And here’s the resulting knight’s tour on an 18×18 board:

### Filling in the Gaps

At this stage we have the machinery to construct knight’s tours for arbitrarily large boards of dimension 6m×6n. But what about boards of other dimensions, such as the 100×100 board that we set as our target? These boards can be handled by searching for more special tours of dimensions m×n, where 6 ≤ m,n ≤ 11, and using these in place of 6×6 knight’s tours on the bottom and right edges of the grid. For example, here is a knight’s tour of a 23×23 board:

This has the same 3×3 grid pattern as the 18×18 knight’s tour, but with larger pieces around the right and bottom edges. Also, the knight’s tour in the bottom right corner is not a loop—you can see one of its ends in the bottom right corner—so the schematic diagram for the assembled tour ends up looking a little different:

In fact, it is impossible to find a knight’s tour that is a loop on this board, because it has an odd number of squares. To see why, imagine the squares of the board painted light and dark colors in a checkerboard pattern:

As a knight jumps around the board it alternates between light and dark squares, and a tour of a board with an odd number of squares will contain an even number of jumps. This means the start and end squares of the tour will have the same color, and so there’s no way to jump from the end square of the tour back to the start square to close the loop.
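The color-alternation step of this argument can be checked mechanically: every one of the eight knight moves changes the coordinate sum by an odd amount, so the parity of x + y (which determines the square's color) always flips. A small sketch, with illustrative names:

```haskell
-- The color of a square in the checkerboard painting: the parity of
-- the coordinate sum distinguishes light squares from dark squares.
color :: (Int, Int) -> Int
color (x, y) = (x + y) `mod` 2

knightMoves :: [(Int, Int)]
knightMoves = [(1,2),(2,1),(2,-1),(1,-2),(-1,-2),(-2,-1),(-2,1),(-1,2)]

-- Every knight jump flips the color: each (dx, dy) has an odd sum,
-- so the parity of x + y always changes.
flipsColor :: Bool
flipsColor =
  and [ color (x, y) /= color (x + dx, y + dy)
      | (x, y) <- [(0, 0), (3, 5), (7, 2)]
      , (dx, dy) <- knightMoves ]
```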

### Summary

Using these techniques you can easily assemble knight’s tours for large boards: you can view a tour of a 100×100 board or even assemble your own knight’s tour.

But these techniques only work for boards whose width and height are both at least 6: there are also infinitely many thin boards that might contain knight’s tours:

Can you think of techniques for assembling knight’s tours on thin boards? ♘

### Appendix

The performance results were gathered using a MacBook Air with a 1.8GHz Intel Core i7 processor and 4 GB of RAM, running OS X 10.7.5.

# Explicit Laziness

This post describes a particular technique that is useful for optimizing functional programs, which I call explicit laziness. This example came about because I needed a purely functional implementation of an infinite list of primes, to demonstrate the capabilities of exporting OpenTheory logic packages to Haskell. The Sieve of Eratosthenes is a classic algorithm to compute this, and in an imperative language can be implemented with an array where it is cheap to update elements. In translating the algorithm to a purely functional language the array data structure naturally becomes a list, where it is much cheaper to update leading elements than trailing elements. This led to the idea of using explicit laziness to minimize the number of updates to trailing elements, and the result is an improved algorithm that could be backported to speed up an imperative implementation. The price of the improvement is greater complexity, but this can be hidden inside an abstract type, and the increased verification challenge can be met by the HOL Light theorem prover which is used to generate the source logic package.

### Example Sieve Program

To illustrate the explicit laziness technique, we’ll use it to optimize a Haskell program for computing prime numbers based on a sieve method. This is the central `Sieve` data structure:

```
newtype Sieve =
  Sieve { unSieve :: (Natural,[(Natural,Natural)]) }
```

Values of type `Natural` are non-negative integers of unbounded size. Every value `Sieve (n,ps)` of type `Sieve` must satisfy the invariant that `map fst ps` is the list of all prime numbers up to the perimeter `n` in ascending order, and the counter `k` in every element `(p,k)` of the list `ps` satisfies ``k == n `mod` p``. The following examples are valid values of type `Sieve`:

• `Sieve (9,[(2,1),(3,0),(5,4),(7,2)])`
• `Sieve (10,[(2,0),(3,1),(5,0),(7,3)])`
• `Sieve (11,[(2,1),(3,2),(5,1),(7,4),(11,0)])`
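The invariant can be captured as an executable predicate. This checker is not part of the post's API; `validSieve` and the naive `isPrime` are my names, introduced purely for illustration:

```haskell
import Numeric.Natural (Natural)

newtype Sieve =
  Sieve { unSieve :: (Natural, [(Natural, Natural)]) }

-- Check the invariant: the first components of ps are exactly the
-- primes up to the perimeter n, in ascending order, and each
-- counter k equals n `mod` p.
validSieve :: Sieve -> Bool
validSieve (Sieve (n, ps)) =
  map fst ps == filter isPrime [2 .. n]
    && all (\(p, k) -> k == n `mod` p) ps
  where
    -- Naive trial division; fine for checking small examples.
    isPrime m = all (\d -> m `mod` d /= 0) [2 .. m - 1]
```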

The `Sieve` API supports two operations for creating new values of type Sieve:

```
initial :: Sieve
increment :: Sieve -> (Bool,Sieve)
```

The `initial` value of a `Sieve` is simply:

```
initial :: Sieve
initial = Sieve (1,[])
```

The `increment` operation takes a value `Sieve (n,_)` and returns a new value `Sieve (n + 1, _)`, together with a boolean indicating whether the new perimeter `n + 1` is a prime number:

```
increment :: Sieve -> (Bool,Sieve)
increment =
  \s ->
    let (n,ps) = unSieve s in
    let n' = n + 1 in
    let ps' = inc ps in
    let b = check ps' in
    let ps'' = if b then ps' ++ [(n',0)] else ps' in
    (b, Sieve (n', ps''))
  where
    inc = map (\(p,k) -> (p, (k + 1) `mod` p))
    check = all (\(_,k) -> k /= 0)
```

The `increment` operation works by adding 1 to the perimeter (`n <- n + 1`) and to every counter (``k <- (k + 1) `mod` p``). It then checks whether any counter is zero, which happens when the perimeter is divisible by a known prime. If no counter is zero then the perimeter is a new prime, and so `(n,0)` is added to the end of the list.
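Putting the pieces together, the infinite list of primes can be produced by iterating `increment` and keeping each perimeter that is flagged as prime. This sketch repeats the definitions above, substituting `Integer` for `Natural` so the block stands alone; `primes` and `go` are my names:

```haskell
newtype Sieve =
  Sieve { unSieve :: (Integer, [(Integer, Integer)]) }

initial :: Sieve
initial = Sieve (1, [])

increment :: Sieve -> (Bool, Sieve)
increment s =
  let (n, ps) = unSieve s
      n' = n + 1
      ps' = map (\(p, k) -> (p, (k + 1) `mod` p)) ps
      b = all (\(_, k) -> k /= 0) ps'
      ps'' = if b then ps' ++ [(n', 0)] else ps'
  in (b, Sieve (n', ps''))

-- Repeatedly increment the sieve, keeping each perimeter
-- that is reported to be prime.
primes :: [Integer]
primes = go initial
  where
    go s =
      let (b, s') = increment s
          (n, _) = unSieve s'
      in if b then n : go s' else go s'
```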

### Motivating Explicit Laziness

How can we optimize this program? Well, there are certainly many ways that we can help the compiler produce efficient target code, but before we get into that we should consider algorithmic improvements. For example, are we performing any unnecessary computation? It seems that we are, because we increment every counter even though the boolean result only needs to know whether any counter is zero. We could stop at the first zero counter and just make a note (called a closure) that the other counters need to be incremented later. This is an instance of lazy evaluation, and since Haskell is a lazy language this optimization will be performed automatically.

Unfortunately, there is still a performance problem here. Consider a time in the future when we are incrementing counters and we come to a counter with an increment closure. At this point Haskell’s lazy evaluation will execute the closure to increment the counter, and then increment it for a second time to see whether it is zero. This is functionally correct, but it would be much more efficient to add 2 to the counter rather than add 1 and then add 1 again. Even worse, it turns out that the increment closures stack up on counters during evaluation, to the point that each time we discover a new prime `p` we have to increment the counter `k` in the final list element `(q,k)` a total of `p - q` times before we can check that it is non-zero and record the new prime.

How can we avoid this inefficient behavior of adding `m` to a counter by adding 1 to it `m` times? This is a case where Haskell’s automatic laziness is not flexible enough, because whenever evaluation encounters a value with a closure, the only possible course of action is to execute the closure and then continue evaluation. There is no way to examine the closure to find out what computation it will perform. However, we can manually simulate lazy evaluation by storing data structures together with explicit notes saying what computation needs to be done before they can be used.

### Demonstrating Explicit Laziness

Here is how explicit laziness works in our example. First, we extend the Sieve data structure as follows:

```
newtype Sieve =
  Sieve { unSieve :: (Natural,[(Natural,(Natural,Natural))]) }
```

This is exactly the same as before, except that each list element `(p,(k,j))` has an extra natural number component `j`, which is a note that `j` must be added to every counter following this one in the list.

The definition of the `initial` value remains the same:

```
initial :: Sieve
initial = Sieve (1,[])
```

However, the `increment` operation now takes into account the explicit `j` notes:

```
increment :: Sieve -> (Bool,Sieve)
increment =
  \s ->
    let (n,ps) = unSieve s in
    let n' = n + 1 in
    let (b,ps') = inc n' 1 ps in
    (b, Sieve (n',ps'))
  where
    inc n _ [] = (True, (n,(0,0)) : [])
    inc n i ((p,(k,j)) : ps) =
      let k' = (k + i) `mod` p in
      let j' = j + i in
      if k' == 0 then (False, (p,(0,j')) : ps)
      else let (b,ps') = inc n j' ps in (b, (p,(k',0)) : ps')
```

The `inc` and `check` functions in the previous implementation have been combined into a single `inc` function, which takes a parameter `i` recording how much to increment counters. Initially `i` is set to 1, but it absorbs the `j` values for the counters it visits. As soon as a zero counter is discovered, the `i` parameter is added to the current `j` value and the rest of the list is left untouched. This last point is crucial: explicit laziness eliminates the overhead of closures that are normally generated to update data structures not directly required for the result. For comparison with the version of the program without explicit laziness, here are the `Sieve` values with perimeters 9, 10 and 11:

• `Sieve (9,[(2,(1,0)),(3,(0,2)),(5,(2,0)),(7,(0,0))])`
• `Sieve (10,[(2,(0,1)),(3,(0,2)),(5,(2,0)),(7,(0,0))])`
• `Sieve (11,[(2,(1,0)),(3,(2,0)),(5,(1,0)),(7,(4,0)),(11,(0,0))])`

Values of the new version of the `Sieve` type are much harder to read than values of the original version, because the `j` values have to be added to every counter following them in the list. This complicates the invariant for `Sieve` values, but it is actually a measure of the efficiency gained. For example, the `j` value of 2 in the `Sieve` value with perimeter 9 replaces 4 counter increments in the original program: 2 for the 5 counter and another 2 for the 7 counter.
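To decode such a value, the pending `j` notes can be folded back into the counters. This hypothetical helper (not part of the original API; `effective` is my name) recovers the plain `(p,k)` counters of the earlier representation, using `Integer` for self-containment:

```haskell
-- Add each element's pending note j to every counter after it in the
-- list, recovering the counters of the original Sieve representation.
effective :: [(Integer, (Integer, Integer))] -> [(Integer, Integer)]
effective = go 0
  where
    go _ [] = []
    go pending ((p, (k, j)) : ps) =
      (p, (k + pending) `mod` p) : go (pending + j) ps
```

Applied to the perimeter-9 value above, it yields the counters of the corresponding original `Sieve` value.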

### Performance Comparison

Even though explicit laziness is an algorithmic optimization, and there remain many optimizations that can be applied to both versions of the example sieve program, it is nevertheless interesting to gauge the performance effects of explicit laziness using a standard compiler. The following table presents the time taken to compute the Nth prime using both versions of the example sieve program, for various values of N:

| Nth prime | 100 | 200 | 400 | 800 | 1600 | 3200 |
|---------------------------|--------|--------|--------|--------|--------|--------|
| Without explicit laziness | 0.007s | 0.019s | 0.068s | 0.284s | 1.169s | 5.008s |
| With explicit laziness    | 0.004s | 0.008s | 0.025s | 0.083s | 0.345s | 1.495s |

It is gratifying to see that the version of the example sieve program with explicit laziness is about 3x faster on this small experiment.

### Summary

In this post we have described explicit laziness, a useful technique for optimizing functional programs. Like all optimization techniques its effectiveness depends on the context: automatic laziness is more flexible and guarantees that closures will be evaluated at most once, so should be the default. However, the judicious introduction of explicit laziness can eliminate closures and allow future computations to be combined into more efficient variants before executing them (a limited form of computation on computations).

### Appendix

The code for the optimized version of the example sieve program is available at Hackage: this verified Haskell package is automatically generated from an OpenTheory package. The performance results were gathered using the GHC 7.4.1 compiler on a MacBook Air with a 1.8GHz Intel Core i7 processor and 4 GB of RAM, running OS X 10.7.5.