Sudoku

Pro­gram­mers waste enor­mous amounts of time think­ing about, or wor­ry­ing about, the speed of non­crit­i­cal parts of their pro­grams, and these attempts at effi­ciency actu­ally have a strong neg­a­tive impact when debug­ging and main­te­nance are con­sid­ered. We should for­get about small effi­cien­cies, say about 97% of the time: pre­ma­ture opti­miza­tion is the root of all evil. Yet we should not pass up our oppor­tu­ni­ties in that crit­i­cal 3%.

Don­ald Knuth, Struc­tured Pro­gram­ming With Go To State­ments

Sudoku is a game played on 9 by 9 grid such as the one below. The goal is to fill the spots marked with dots with the dig­its 1–9 such that each row, col­umn, and 3 by 3 box, con­tains the dig­its 1–9 exactly once.

2

.

.

.

.

1

.

3

8

.

.

.

.

.

.

.

.

5

.

7

.

.

.

6

.

.

.

.

.

.

.

.

.

.

1

3

.

9

8

1

.

.

2

5

7

3

1

.

.

.

.

8

.

.

9

.

.

8

.

.

.

2

.

.

5

.

.

6

9

7

8

4

4

.

.

2

5

.

.

.

.

We will write a pro­gram that can solve Sudoku puz­zles. Our first attempt will be straight­for­ward, but highly inef­fi­cient for actual boards. We will refine this ini­tial solu­tion to even­tu­ally develop a pro­gram that is prac­ti­cal, but retains its sim­plic­ity.

Data Definitions

Our first step is to iden­tify what sorts of infor­ma­tion must be rep­re­sen­tated and chose how to rep­re­sent it.This is the first step of the Design Recipe, a sys­tem­atic pro­gram design process.

The most obvi­ous rep­re­sen­ta­tion of the Sudoku grid is as a matrix, a list of lists. The type of the ele­ments of the rows must also be spec­i­fied. Here we chose the Char type so we can rep­re­sent empty spaces as dots. Are there other rea­son­able choices? Maybe Int is pos­si­ble, where Noth­ing rep­re­sents a blank space. Think about the advan­tages and dis­ad­van­tages of this approach.

type Grid = Matrix Value
type Matrix a = [Row a]
type Row a = [a]
type Value = Char

It will also be valu­able to spec­ify some con­stants and con­ven­tions up front. We want 9 by 9 grids, so the boxes will be 3 by 3. Our cells can have the dig­its 1–9 and the spe­cial value . indi­cat­ing an empty cell.

box­Size :: Int
box­Size = 3

val­ues :: [Value]
val­ues = ['1' .. '9']

empty :: Value -> Bool
empty = (== '.')

sin­gle :: [a] -> Bool
sin­gle [_] = True
sin­gle _   = False

The last def­i­n­i­tion, sin­gle, is a helper func­tion that we’ll use later. Notice all bind­ings have explicit type anno­ta­tions and vari­ables have camel-cased names. This is good style. For future test­ing, we can define an exam­ple puz­zle.

puz­zle :: Grid
puz­zle = ["2....1.38",
          "........5",
          ".7...6...",
          ".......13",
          ".981..257",
          "31....8..",
          "9..8...2.",
          ".5..69784",
          "4..25...."]

But wait, where are our rows? Remem­ber, in Haskell String is an alias for [Char] so we can use string lit­er­als instead of the list con­struc­tion syn­tax. This is a nice con­ve­nience and one that comes as a con­se­quence of our choice of Value as Char.

Checking Correctness

You can imag­ine the first step to solv­ing Sudoku puz­zles is deter­min­ing whether a grid is solved or not. What does it mean to be solved? Our rows, columns, and boxes must have all the right dig­its with­out dupli­cates. If we have func­tions that extract these com­po­nents, valid­ity is just check­ing this cri­te­ria over all the com­po­nents.

valid :: Grid -> Bool
valid g = all no­Dups (rows g) &&
          all no­Dups (cols g) &&
          all no­Dups (boxes g)

no­Dups :: Eq a => [a] -> Bool
no­Dups [] = True
no­Dups (x : xt) = not (elem x xt) && no­Dups xt

Don’t worry too much about the Eq a busi­ness, we will see what this means next week. For now, just think of it as a require­ment to use the elem func­tion.

Now we need the extrac­tion func­tions. Rows are sim­ple, we already have them!This rep­re­sen­ta­tion of matri­ces is called row-major order. If we pre­ferred columns, we could have col­umn-major order. Then our imple­men­ta­tions of rows and cols would be swapped. Columns require trans­po­si­tion, flip­ping the rows and columns, an oper­a­tion Haskell imple­ments already. Boxes are more com­pli­cated. You should try to imple­ment it your­self and see how it com­pares to the imple­men­ta­tion I will pro­vide.

rows :: Matrix a -> [Row a]
rows =  id

cols :: Matrix a -> [Row a]
cols = trans­pose

boxes :: Matrix a -> [Row a]
boxes = unpack . map cols . pack
        where
          pack   = split . map split
          split  = chop box­Size
          unpack = map con­cat . con­cat

chop :: Int -> [a] -> [[a]]
chop n [] = []
chop n xs = take n xs : chop n (drop n xs)

Intermezzo: What’s a list?

You’ve used lists a lot dur­ing your time pro­gram­ming. You prob­a­bly feel pretty com­fort­able with them. So then, what is a list?

One way to think of a list is as a con­tainer with a bunch of ele­ments inside it. So a list of num­bers if a like a box with num­bers con­tained within. This I’ll call the con­tainer inter­pre­ta­tion of lists.

How­ever, this is not the only way to think of a list.

You can view lists as non-deter­min­is­tic com­pu­ta­tions. A value like 100 or “what” can be viewed as a deter­min­is­tic com­pu­ta­tion that has only one result, whereas a list like [1, 2, 3] can be viewed as a com­pu­ta­tion that can’t decide on which result it wants to have, so it presents us with all of the pos­si­ble results.

Miran Lipo­vača, Learn You a Haskell for Great Good!

I’ll call this the com­pu­ta­tion inter­pre­ta­tion of lists. What does this per­spec­tive buy us? Well, sup­pose we think about adding two lists of num­bers [1, 2] and [3, 4]. Under the con­tainer view, this doesn’t really make sense. What does it mean to add a box to a box? How­ever this has a clear mean­ing under the com­pu­ta­tion per­spec­tive, because [1, 2] is thought of as a sin­gle non-deter­min­is­tic value. As a num­ber, it makes sense to speak of adding it to another num­ber. The sum of these two lists should be [4, 5, 5, 6] cor­re­spond­ing to every pos­si­ble sum of the num­bers.

Brute First

Let’s model the entire space of solu­tions to a Sudoku puz­zle. Any pre-filled cell is a deter­min­is­tic entry. Any empty cell is a non-deter­min­is­tic entry over all dig­its. Using a list as a rep­re­sen­ta­tion of a non-deter­min­is­tic com­pu­ta­tion, we can imple­ment this.

type Choices = [Value]

choices :: Grid -> Matrix Choices
choices g = map (map choice) g
            where
              choice v = if empty v then val­ues else [v]

Using choices we get a fixed matrix with non-deter­min­is­tic val­ues of type Matrix [a]. Instead, we need a non-deter­min­is­tic matrix with fixed val­ues of type [Matrix a]. The sequence func­tion can do this for a list.

sequence [[1, 2], [3, 4], [5, 6]] =
   [[1, 3, 5], [1, 3, 6], [1, 4, 5], [1, 4, 6],
    [2, 3, 5], [2, 3, 6], [2, 4, 5], [2, 4, 6]]

We can extend this to matri­ces by map­ping.

col­lapse :: Matrix [a] -> [Matrix a]
col­lapse =  sequence . map sequence

In our exam­ple puz­zle there were 51 empty squares. So, col­lapse (choices puz­zle) yields a list of grids with every pos­si­bil­ity for the empty squares filled in. Find­ing all solu­tions is just a mat­ter of fil­ter­ing the valid solu­tions.

solve­Brute :: Grid -> [Grid]
solve­Brute =  fil­ter valid . col­lapse . choices

How large is the out­put list? There are 51 empty spaces each with 9 pos­si­ble dig­its. That’s 9 to the 51 grids, a num­ber of the same order of mag­ni­tude as the num­ber of mol­e­cules on Earth. This is intractable to say the least, but our effort wasn’t for naught.

Instead of start­ing from scratch, think about how you would solve a Sudoku and how we could apply that rea­son­ing to shrink the search space.

Prune Second

Enu­mer­at­ing all dig­its for empty spaces doesn’t take into account the con­straints of the puz­zle itself. When you solve a puz­zle, you don’t just list off every pos­si­bil­ity. You restrict your search based on already filled-in squares. We can do this too.

For each non-deter­min­is­tic cell, we can throw away any incon­sis­tent choice. In other words, any value that already occurs as a fixed entry in the same row, col­umn, or box.

prune :: Matrix Choices -> Matrix Choices
prune =  prune­By boxes . prune­By cols . prune­By rows
         where prune­By f = f . map reduce . f

reduce :: Row Choices -> Row Choices
reduce xss =  [xs `minus` sin­gles | xs <- xss]
              where sin­gles = con­cat (fil­ter sin­gle xss)

minus :: Choices -> Choices -> Choices
xs `minus` ys = if sin­gle xs then xs else xs \\ ys

Now we can prune the search space before we col­lapse it.

solve­Prune :: Grid -> [Grid]
solve­Prune =  fil­ter valid . col­lapse . prune . choices

This is bet­ter, but still infea­si­ble. Think though, why did we just prune once? In the process of prun­ing, we may have gen­er­ated more fixed posi­tions. A sub­se­quent prune could use this new infor­ma­tion to elim­i­nate even more options. Indeed, we should be prune until we can’t prune any­more. When is that? When prun­ing stops elim­i­nat­ing choices. This is called a fixed point.

solve­Fix­Prune :: Grid -> [Grid]
solve­Fix­Prune = fil­ter valid . col­lapse . fix prune . choices

fix :: Eq a => (a -> a) -> a -> a
fix f x =  if x == x' then x else fix f x'
           where x' = f x

Bet­ter, but still not good enough.

Refine Third

A hypo­thet­i­cal choice affects all sub­se­quent choices. When you solve a puz­zle, there are points where you may have to make an assump­tion, a guess. This guess impacts every sub­se­quent move you make. If your guess was wrong, you back­track and make a dif­fer­ent one.

At the moment we con­sider every all hypo­thet­i­cal choices inde­pen­dently. Instead, we should make one guess at a time and see how this affects the rest of the puz­zle.

The solve func­tion will act as before, but instead of gen­er­at­ing the entire search space with col­lapse and fil­ter­ing over it, we will make guesses one-by-one with the search func­tion. We can then prune the search space based on our guess and con­tinue search­ing.You may won­der why we don’t use fix prune. We could, but it turns out not to help that much in prac­tice.

solve :: Grid -> [Grid]
solve = search . prune . choices

search :: Matrix Choices -> [Grid]
search m
  | blocked m  = []
  | com­plete m = col­lapse m
  | oth­er­wise  = [g | m' <- guesses m
                    , g  <- search (prune m')]

There’s a num­ber of helpers that have to be writ­ten for this to work. We need a guesses func­tion to pick the first non-deter­min­is­tic entry and return a list of grids fix­ing every pos­si­ble value.Try out a ver­sion of guesses that instead picks the cell with the fewest choices and expands those. Is the per­for­mance any bet­ter? You can think of this as the one cell equiv­a­lent of col­lapse.

guesses :: Matrix Choices -> [Matrix Choices]
guesses m =
  [rows1 ++ [row1 ++ [c] : row2] ++ rows2 | c <- cs]
  where
    (rows1, row : rows2) = break (any (not . sin­gle)) m
    (row1, cs : row2)    = break (not . sin­gle) row

We also need to know when we have a fully filled in grid or a bad grid that’s impos­si­ble to make more progress on. This is com­plete and blocked respec­tively.

com­plete :: Matrix Choices -> Bool
com­plete = all (all sin­gle)

blocked :: Matrix Choices -> Bool
blocked m = void m || not (safe m)

void :: Matrix Choices -> Bool
void = any (any null)

safe :: Matrix Choices -> Bool
safe cm = all con­sis­tent (rows cm) &&
          all con­sis­tent (cols cm) &&
          all con­sis­tent (boxes cm)

con­sis­tent :: Row Choices -> Bool
con­sis­tent = no­Dups . con­cat . fil­ter sin­gle

Notice a grid can be “bad” in two ways. Either there’s a choice with no more valid pos­si­bil­i­ties left, we call this void, or there is some incon­sis­tency in the grid, we’ll say this isn’t safe.

Conclusion

main :: IO ()
main = (put­Str­Ln . unlines . head . solve) puz­zle

Com­pile and watch this churn out a solu­tion to our exam­ple instan­ta­neously.

The upshot here is that you shouldn’t be afraid of the brute­force solu­tion. We started with that straight­for­ward approach and it was pre­dictable bad. How­ever, we used it as a foun­da­tion to build opti­miza­tions on top of it. These were guided by com­mon­sense and intu­itive tech­niques that any­one who has done Sudoku puz­zles fol­lows. What resulted was a pro­gram that works and is easy to under­stand.

Additional Reading