The *greatest common divisor* (gcd) of two non-negative integers is the largest integer that divides both of them, so for example

```haskell
gcd 15 6 == 3
gcd 21 10 == 1
gcd 10 0 == 10
gcd 0 0 == 0 -- by convention
```

Two numbers that have a gcd of 1 (like 21 and 10) are said to be *coprime*. Although in general it’s a hard problem to factor an integer into divisors

```haskell
91 == 7 * 13
39 == 3 * 13
```

it’s an easy problem to compute the gcd of two integers

```haskell
gcd 91 39 == 13
```

and if they are not coprime then a non-trivial divisor of both is revealed.

Here is a simple recursive definition that efficiently computes gcd:

```haskell
gcd :: Integer -> Integer -> Integer
gcd a 0 = a
gcd a b = gcd b (a `mod` b)
```
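To see how quickly the recursion bottoms out, here is a worked trace (a self-contained sketch that repeats the definition above, with the example values chosen for illustration):

```haskell
import Prelude hiding (gcd)

gcd :: Integer -> Integer -> Integer
gcd a 0 = a
gcd a b = gcd b (a `mod` b)

-- The recursion unwinds as follows:
--
--   gcd 91 39
--     == gcd 39 13   -- since 91 `mod` 39 == 13
--     == gcd 13 0    -- since 39 `mod` 13 == 0
--     == 13          -- base case
main :: IO ()
main = print (gcd 91 39)  -- prints 13
```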

How would we check that this definition always computes the gcd of two non-negative integers? If we define a divides relation

```haskell
divides :: Integer -> Integer -> Bool
divides 0 b = b == 0
divides a b = b `mod` a == 0
```

then we can easily test that gcd returns *some* common divisor:

```haskell
commonDivisorTest :: Integer -> Integer -> Bool
commonDivisorTest a b = let g = gcd a b in divides g a && divides g b
```
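One quick way to exercise this property without any testing framework is an exhaustive check over a small range of non-negative inputs (a sketch that repeats the definitions above; the bound of 50 is arbitrary):

```haskell
import Prelude hiding (gcd)

gcd :: Integer -> Integer -> Integer
gcd a 0 = a
gcd a b = gcd b (a `mod` b)

divides :: Integer -> Integer -> Bool
divides 0 b = b == 0
divides a b = b `mod` a == 0

commonDivisorTest :: Integer -> Integer -> Bool
commonDivisorTest a b = let g = gcd a b in divides g a && divides g b

-- Check every pair of small non-negative integers
main :: IO ()
main = print (and [commonDivisorTest a b | a <- [0..50], b <- [0..50]])
-- prints True
```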

But how to check that gcd returns the *greatest* common divisor? Consider the linear combination g of a and b

`g = s * a + t * b`

where the coefficients s and t are arbitrary integers. Any common divisor of a and b must also be a divisor of g, and so if g is itself a common divisor of a and b then it must be the greatest common divisor. The *extended* gcd algorithm computes the coefficients s and t together with the gcd:

```haskell
egcd :: Integer -> Integer -> (Integer,(Integer,Integer))
egcd a 0 = (a,(1,0))
egcd a b =
    (g, (t, s - (a `div` b) * t))
  where
    (g,(s,t)) = egcd b (a `mod` b)
```
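For example (a self-contained check that repeats the definition above, with values chosen for illustration):

```haskell
egcd :: Integer -> Integer -> (Integer,(Integer,Integer))
egcd a 0 = (a,(1,0))
egcd a b =
    (g, (t, s - (a `div` b) * t))
  where
    (g,(s,t)) = egcd b (a `mod` b)

main :: IO ()
main = do
  print (egcd 91 39)  -- (13,(1,-2)): 1 * 91 + (-2) * 39 == 13
  print (egcd 21 10)  -- (1,(1,-2)):  1 * 21 + (-2) * 10 == 1
```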

which allows us to test that egcd returns the greatest common divisor:

```haskell
greatestCommonDivisorTest :: Integer -> Integer -> Bool
greatestCommonDivisorTest a b =
  let (g,(s,t)) = egcd a b in
  divides g a && divides g b && s * a + t * b == g
```

The coefficients s and t are useful for much more than checking the gcd result: for any coprime a and b we have

```haskell
s * a + t * b == 1
(s * a) `mod` b == 1 -- assuming b > 1
(t * b) `mod` a == 1 -- assuming a > 1
```

which shows that s is the multiplicative inverse of a modulo b, and t is the multiplicative inverse of b modulo a.
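For instance, with the coprime pair 21 and 10 (reusing the integer egcd definition above; note that Haskell’s `mod` always returns a result with the sign of the divisor, so this works even when t is negative):

```haskell
egcd :: Integer -> Integer -> (Integer,(Integer,Integer))
egcd a 0 = (a,(1,0))
egcd a b =
    (g, (t, s - (a `div` b) * t))
  where
    (g,(s,t)) = egcd b (a `mod` b)

main :: IO ()
main = do
  let (_,(s,t)) = egcd 21 10  -- s == 1, t == -2
  print ((s * 21) `mod` 10)   -- 1: s is the inverse of 21 modulo 10
  print ((t * 10) `mod` 21)   -- 1: t is the inverse of 10 modulo 21
```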

### Computing Over the Natural Numbers

The extended gcd algorithm is a classic integer algorithm, but what if we wanted a version that computed over the type of natural numbers? This would be useful in environments in which unsigned arithmetic is more natural and/or efficient than signed arithmetic (one such environment being the natural number theory of an interactive theorem prover, which is where this need arose).

The immediate problem is that one of the s and t coefficients returned by the extended gcd algorithm is usually a negative number, which is to be expected since they must satisfy

`s * a + t * b == g`

where g is generally less than both a and b. The solution is to modify the specification and ask for natural number coefficients s and t that satisfy

`s * a == t * b + g`

Fortunately for us, it turns out that such coefficients always exist except when a is zero (and b is non-zero).

Unfortunately for us, our modified specification is no longer symmetric in a and b. Given coprime a and b as input, s is the multiplicative inverse of a modulo b as before, but t is now the *negation* of the multiplicative inverse of b modulo a (the additive inverse of the multiplicative inverse—funny). This asymmetry breaks the recursion step of the extended gcd algorithm, because there is no longer a way to derive coefficients s and t satisfying

`s * a == t * b + g`

from coefficients s’ and t’ satisfying

```haskell
s' * b == t' * (a `mod` b) + g
```

[If you try you’ll discover that a and b are the wrong way around.] This problem is solved by unwinding the recursion so that the algorithm makes two reductions before calling itself recursively, giving coefficients s’ and t’ satisfying

```haskell
s' * (a `mod` b) == t' * (b `mod` (a `mod` b)) + g
```

Although this expression is more complicated, it is now possible to define coefficients

```haskell
s = s' + (b `div` (a `mod` b)) * t'
t = t' + (a `div` b) * s
```

which satisfy the desired equation

`s * a == t * b + g`
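To double-check the algebra, write $c = a \bmod b = a - q_1 b$ and $b \bmod c = b - q_2 c$, where $q_1 = \lfloor a/b \rfloor$ and $q_2 = \lfloor b/c \rfloor$. Substituting the definitions of s and t and then applying the invariant from the recursive call:

```latex
\begin{aligned}
s a - t b &= (s' + q_2 t')\,a - (t' + q_1 s)\,b \\
          &= s'(a - q_1 b) + t'\bigl(q_2(a - q_1 b) - b\bigr) \\
          &= s' c - t'(b - q_2 c) \\
          &= s'(a \bmod b) - t'\bigl(b \bmod (a \bmod b)\bigr) \\
          &= g
\end{aligned}
```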

Here is the finished implementation:

```haskell
egcd :: Natural -> Natural -> (Natural,(Natural,Natural))
egcd a 0 = (a,(1,0))
egcd a b =
    if c == 0
      then (b, (1, a `div` b - 1))
      else (g, (u, t + (a `div` b) * u))
  where
    c = a `mod` b
    (g,(s,t)) = egcd c (b `mod` c)
    u = s + (b `div` c) * t
```
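A couple of concrete evaluations (a self-contained sketch that repeats the definition above; `Natural` comes from `Numeric.Natural` in the base package):

```haskell
import Numeric.Natural (Natural)

egcd :: Natural -> Natural -> (Natural,(Natural,Natural))
egcd a 0 = (a,(1,0))
egcd a b =
    if c == 0
      then (b, (1, a `div` b - 1))
      else (g, (u, t + (a `div` b) * u))
  where
    c = a `mod` b
    (g,(s,t)) = egcd c (b `mod` c)
    u = s + (b `div` c) * t

main :: IO ()
main = do
  print (egcd 91 39)  -- (13,(1,2)):   1 * 91 == 2 * 39 + 13
  print (egcd 10 21)  -- (1,(19,9)):  19 * 10 == 9 * 21 + 1
```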

This satisfies a modified test confirming that it produces a greatest common divisor (for non-zero a):

```haskell
greatestCommonDivisorTest :: Natural -> Natural -> Bool
greatestCommonDivisorTest a b =
  let (g,(s,t)) = egcd a b in
  divides g a && divides g b && s * a == t * b + g
```

And we can also check the upper bounds on the s and t coefficients (for non-zero a and b at least 2):

```haskell
coefficientBoundTest :: Natural -> Natural -> Bool
coefficientBoundTest a b =
  let (_,(s,t)) = egcd a b in
  s < b && t < a
```
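As before, an exhaustive check over a small range gives quick confidence in both tests (a sketch that repeats the definitions above; the bound of 30 is arbitrary, and a and b are restricted as stated):

```haskell
import Numeric.Natural (Natural)

egcd :: Natural -> Natural -> (Natural,(Natural,Natural))
egcd a 0 = (a,(1,0))
egcd a b =
    if c == 0
      then (b, (1, a `div` b - 1))
      else (g, (u, t + (a `div` b) * u))
  where
    c = a `mod` b
    (g,(s,t)) = egcd c (b `mod` c)
    u = s + (b `div` c) * t

divides :: Natural -> Natural -> Bool
divides 0 b = b == 0
divides a b = b `mod` a == 0

greatestCommonDivisorTest :: Natural -> Natural -> Bool
greatestCommonDivisorTest a b =
  let (g,(s,t)) = egcd a b in
  divides g a && divides g b && s * a == t * b + g

coefficientBoundTest :: Natural -> Natural -> Bool
coefficientBoundTest a b =
  let (_,(s,t)) = egcd a b in
  s < b && t < a

-- Check all pairs with non-zero a and b at least 2
main :: IO ()
main = print (and [ greatestCommonDivisorTest a b && coefficientBoundTest a b
                  | a <- [1..30], b <- [2..30] ])
-- prints True
```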

### Application: The Chinese Remainder Theorem

The *Chinese Remainder Theorem* states that for coprime a and b, given an x and y satisfying

```haskell
x < a
y < b
```

there is a unique n satisfying

```haskell
n < a * b
n `mod` a == x
n `mod` b == y
```

But how to compute this n given natural number inputs a, b, x and y? We build on the natural number extended gcd algorithm, of course:

```haskell
chineseRemainder :: Natural -> Natural -> Natural -> Natural -> Natural
chineseRemainder a b =
    \x y -> (x * tb + y * sa) `mod` ab
  where
    (_,(s,t)) = egcd a b
    ab = a * b
    sa = s * a
    tb = (a - t) * b
```

The λ expression in the body maximizes the precomputation on a and b, allowing efficient Chinese remaindering of different x and y inputs once the a and b inputs have been fixed. As with all computation over the natural numbers, we must be careful with the subtraction operation `a - t` to ensure that the first argument is never smaller than the second, but here we know from the `coefficientBoundTest` that t is always less than a.
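For example, fixing a = 3 and b = 5 (a self-contained sketch that repeats the definitions above, with the x and y values chosen for illustration):

```haskell
import Numeric.Natural (Natural)

egcd :: Natural -> Natural -> (Natural,(Natural,Natural))
egcd a 0 = (a,(1,0))
egcd a b =
    if c == 0
      then (b, (1, a `div` b - 1))
      else (g, (u, t + (a `div` b) * u))
  where
    c = a `mod` b
    (g,(s,t)) = egcd c (b `mod` c)
    u = s + (b `div` c) * t

chineseRemainder :: Natural -> Natural -> Natural -> Natural -> Natural
chineseRemainder a b =
    \x y -> (x * tb + y * sa) `mod` ab
  where
    (_,(s,t)) = egcd a b
    ab = a * b
    sa = s * a
    tb = (a - t) * b

main :: IO ()
main = do
  let cr = chineseRemainder 3 5  -- precomputation on a and b happens once
  print (cr 2 3)  -- 8: indeed 8 `mod` 3 == 2 and 8 `mod` 5 == 3
  print (cr 1 4)  -- 4: indeed 4 `mod` 3 == 1 and 4 `mod` 5 == 4
```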

### References

All of the natural number computations in this blog post have been formally verified and are available as the Haskell package `opentheory-divides`.
