Cracking Word Searches with Haskell

Posted May 31, 2013

After learning Haskell a few months ago, I've had disappointingly few opportunities to use it. For a simple data-crunch a python repl starts up faster and lets me work quicker, and many of my other projects require a specific language. So when I was browsing reddit and saw /u/SWEAR_WORD_SEARCH, I thought it would be a fun, quick project to crack these in Haskell. (quick warning: if you didn't guess from the SWEAR_WORD part, this article contains some profane word searches ;-)

The format of the word searches posted by this user are pretty simple. Here's an example:

N Z C G S M Z C M Q A C B O S W R E D V
X O K P G H N R E X T C V P E Z G W F V
W C D T N C P I Y F M E R J S Z W L E S
T R L K Z A S T M Z J V O N W F C Z X K
E S N N O H R W Y J I S Z A S M Y I S P
K T M W J R A D F W F M G W L I G O Q Q
C L D O L W G A M U V K G N I K C U F S
V X L K U O G H H I E Y Y Z C I S F B H
G J W B V U O R F H P E Y M N I I T O Q
G P V S N Y Y M R N H S A C J F J X H Z
T G J M F B R L E M H N R C S R I J M M
K V S A J V O N A W J E A F U U D W L B
P B H R H H Y A D B D W D Y G S W H A A
I Q I T M K I A Q I Y M W E A T U M J M
P P X W A T C H B H Y T Y Z D R S U Y U
V Q J N T L X L F V S D P G P A E Q E K
S Z N A E C Y N E J K H T W M T S P Z M
G Q M S E X T L R G Y G U B Y I J I A B
J C U D P E Z P M D W F Z G I N Z L L Y
P Y G X F W H P N T X R S I F G C C L T

Find the words: FRUSTRATING INCREDIBLY FUCKING WATCH SMART There's two parts here: the word search itself, and the word list we need to find. We could parse this with a combinator library like Parsec, but it's simple enough that lines should be enough. Indeed, after a bit of hacking in ghci and refactoring, I got

parsePuzzle :: String -> ([[Char]], [String])
parsePuzzle p = (parseLetters $ lines p, parseWords $ lines p)
    where dup f a = f a a
          parseLine = map head . words
          parseLetters = map parseLine . dup (take . (subtract 2) . length)
          parseWords = words . last

While the point-free style didn't work perfectly on parseLetters (I'm not a fan of Forth-like argument manipulation in Haskell, such as flip and dup), I think it's better than the alternative with names. Now we need to locate the words in the 2d letter matrix. This is surprisingly easy, using Data.List.transpose. We need to search through rows, columns, and diagonals. Rows are easy, they're simply the fst of the tuple we get from parsePuzzle. Columns are similar, simply use transpose on the rows. We need a little trick for the diagonals, though, however it's not as hard as it initially seems. First, we need a helper function mapIndex. This is similar to the function map, which you are hopefully familiar with, however it also provides the function with the index of the current item.

mapIndex :: (a -> Int -> b) -> [a] -> [b]
mapIndex f l = map (uncurry f) $ zip l [0..length l - 1]

(uncurry has signature (a -> b -> c) -> (a, b) -> c, and is used to unwrap the tuple from zip into two arguments) This might not seem particularly useful, but in fact we're only a step away from finding the diagonals now. By passing flip drop to mapIndex, and transposing the result, we get

ghci> putStrLn $ unlines p
NZCGSMZCMQACBOSWREDV
XOKPGHNREXTCVPEZGWFV
WCDTNCPIYFMERJSZWLES
TRLKZASTMZJVONWFCZXK
ESNNOHRWYJISZASMYISP
KTMWJRADFWFMGWLIGOQQ
CLDOLWGAMUVKGNIKCUFS
VXLKUOGHHIEYYZCISFBH
GJWBVUORFHPEYMNIITOQ
GPVSNYYMRNHSACJFJXHZ
TGJMFBRLEMHNRCSRIJMM
KVSAJVONAWJEAFUUDWLB
PBHRHHYADBDWDYGSWHAA
IQITMKIAQIYMWEATUMJM
PPXWATCHBHYTYZDRSUYU
VQJNTLXLFVSDPGPAEQEK
SZNAECYNEJKHTWMTSPZM
GQMSEXTLRGYGUBYIJIAB
JCUDPEZPMDWFZGINZLLY
PYGXFWHPNTXRSIFGCCLT

ghci> putStrLn $ unlines $ transpose $ mapIndex (flip drop) p
NODKORGHFNHEDEDASILT
ZKTZHAAHHHNAYAREPAY
CPNARDMIPSRFGTSQZB
GGCSWFUEEACUSUUEM
SHPTYWVYYCSUWMYK
MNIMJFKYMJRDHJU
ZRYZIMGZNFIWAM
CEFJSGNCIJJLA
MXMVZWIIIXMB
QTEOALKSTHM
ACRNSICFOZ
CVJWMGUBQ
BPSFYOFH
OEZCIQS
SZWZSQ
WGLXP
RWEK
EFS
DV
V

... the left to right diagonals! We can get the left to right diagonals easily as well, by using (map reverse p) instead of p in the right of the pipeline. Finally, to cover the bottom diagonals as well we need to reverse p before we process it, these are the prime versions. Now, we should easily be able to find if a word exists in the word search.

findWord :: String -> [[Char]] -> String
findWord s p = head $ filter (checkIn s) (rows p ++ cols p ++ dial p ++ diar p ++ dial' p ++ diar' p)
    where checkIn s line = s `isInfixOf` line || reverse s `isInfixOf` line
          rows p = p
          cols p = transpose p
          dial p = transpose $ mapIndex (flip drop) p
          diar p = transpose $ mapIndex (flip drop) (map reverse p)
          dial' p = transpose $ mapIndex (flip drop) (reverse p)
          diar' p = transpose $ mapIndex (flip drop) (map reverse (reverse p))




ghci> findWord "FUCKING" (fst $ parsePuzzle puzzle)
"CLDOLWGAMUVKGNIKCUFS"

ghci> take 7 $ drop 1 $ reverse "CLDOLWGAMUVKGNIKCUFS"
"FUCKING"

That's all well and good - we can see if a word appeared in the word search! But we really need to know what letter positions the word appears in, so we can do something special when printing those positions out. There's a few ways we could do this: a zipper, printing as we go, etc. However, the simplest solution is to simply keep position data with the characters. It will be preserved through the transformations, and at the end we can simply check to see if the characters we're printing are in those positions. First, let's add a type synonym.

type Letter = (Char, Int, Int)

and modify parsePuzzle to generate Letters instead of Chars:

parsePuzzle :: String -> ([[Letter]], [String])
parsePuzzle p = (parseLetters $ lines p, parseWords $ lines p)
    where dup f a = f a a
          parseLine chars y = mapIndex (\[c] x -> (c, x, y)) $ words chars
          parseLetters = mapIndex parseLine . dup (take . (subtract 2) . length)
          parseWords = words . last

Sadly the point-free style doesn't work for parseLine anymore, but luckily we had to make only one change to parseLetters, changing map to our mapIndex function from earlier. Finally, we need to change findWord to use Letter as well. This is pretty simple:

findWord :: String -> [[Letter]] -> [Letter]
findWord s p = head $ filter (checkIn s) (rows p ++ cols p ++ dial p ++ diar p ++ dial' p ++ diar' p)
        where lts = map (\(c, _, _) -> c)
              checkIn s line = s `isInfixOf` lts line || reverse s `isInfixOf` lts line
              rows p = p
              cols p = transpose p
              dial p = transpose $ mapIndex (flip drop) p
              diar p = transpose $ mapIndex (flip drop) (map reverse p)
              dial' p = transpose $ mapIndex (flip drop) (reverse p)
              diar' p = transpose $ mapIndex (flip drop) (map reverse (reverse p))



ghci> findWord "FUCKING" (fst $ parsePuzzle puzzle)
[('C',0,6),('L',1,6),('D',2,6),('O',3,6),('L',4,6),('W',5,6),('G',6,6),('A',7,6),('M',8,6),('U',9,6),('V',10,6),('K',11,6),('G',12,6),('N',13,6),('I',14,6),('K',15,6),('C',16,6),('U',17,6),('F',18,6),('S',19,6)]

It might not seem like much, but in fact, we're incredibly close to finished! First, let's collect the positions for every word into one list:

ghci> let (p, w) = parsePuzzle puzzle
ghci> nub $ foldl (\pos word -> findWord word p ++ pos) [] w
[('G',3,0),('P',3,1),('T',3,2),('K',3,3),('N',3,4),('W',3,5),('O',3,6),('K',3,7),('B',3,8),('S',3,9),('M',3,10),('A',3,11),('R',3,12),('T',3,13),('W',3,14),('N',3,15),('A',3,16),('S',3,17),('D',3,18),('X',3,19),('P',0,14),('P',1,14),('X',2,14),('A',4,14),('T',5,14),('C',6,14),('H',7,14),('B',8,14),('H',9,14),('Y',10,14),('T',11,14),('Y',12,14),('Z',13,14),('D',14,14),('R',15,14),('S',16,14),('U',17,14),('Y',18,14),('U',19,14),('C',0,6),('L',1,6),('D',2,6),('L',4,6),('W',5,6),('G',6,6),('A',7,6),('M',8,6),('U',9,6),('V',10,6),('K',11,6),('G',12,6),('N',13,6),('I',14,6),('K',15,6),('C',16,6),('U',17,6),('F',18,6),('S',19,6),('W',15,0),('Z',15,1),('Z',15,2),('F',15,3),('M',15,4),('I',15,5),('I',15,7),('I',15,8),('F',15,9),('R',15,10),('U',15,11),('S',15,12),('T',15,13),('A',15,15),('T',15,16),('I',15,17),('N',15,18),('G',15,19)]

(nub, from Data.List, removes duplicates from a list) Finally, we need to print out the puzzle. First, let's focus on just printing it without any decorations for where words are.

mapM_ (\ row -> do
         mapM_ (\ (c, _, _) -> putChar c) row
         putStrLn ""
      ) p

Expanding it to highlight the rows is easy enough as well.

main = do
    let (p, w) = parsePuzzle puzzle
        positions = nub $ foldl (\pos word -> findWord word p ++ pos) [] w
    mapM_ (\ row -> do
             mapM_ (\ l@(c, _, _) ->
                      if l `notElem` positions
                      then putChar c
                      else putStr ("\x1b[32m" ++ [c] ++ "\x1b[0m")
                   ) row
             putStrLn ""
          ) p

While using raw escape codes to color the output is a bit ugly, and doesn't work on Windows, it's better than pulling in a whole library just for this. This works OK:

Running the program in a terminal. It highlights the entire line the word is on, not just the word.

However, you might see a problem. Usually when solving a word search, you would only highlight the word, not the entire line. To fix this, we need to change findWord. Right now, findWord returns the entire line. This is a relic from before we returned positions. Now, since we return positions, there's enough context in the word itself. The current code of findWord looks like this:

findWord :: String -> [[Letter]] -> [Letter]
findWord s p = head $ filter (checkIn s) (rows p ++ cols p ++ dial p ++ diar p ++ dial' p ++ diar' p)
        where lts = map (\(c, _, _) -> c)
              checkIn s line = s `isInfixOf` lts line || reverse s `isInfixOf` lts line
              rows p = p
              cols p = transpose p
              dial p = transpose $ mapIndex (flip drop) p
              dial' p = transpose $ mapIndex (flip drop) (reverse p)
              diar p = transpose $ mapIndex (flip drop) (map reverse p)
              diar' p = transpose $ mapIndex (flip drop) (map reverse (reverse p))

There's a few places we could change the code to make this work. However, I'm going to add an additional step after we extract the filtered list's head. This is a bit inefficient, since it means we have to locate s/reverse s twice, however it keeps the code cleaner and simpler. To start out, we need a function, similar to isInfixOf, but that returns the position of the infix list. Sadly this doesn't exist in the standard library, but happily it exists on the Haskell wiki. Now that we have this function, we can modify findWord to work like we want.

findWord :: String -> [[Letter]] -> [Letter]
findWord s p = onlyString s $ head $ filter (checkIn s) (rows p ++ cols p ++ dial p ++ diar p ++ dial' p ++ diar' p)
    where lts = map (\(c, _, _) -> c)
          checkIn s line = s `isInfixOf` lts line || reverse s `isInfixOf` lts line
          indexOfR s line = fromJust $ findSublistIndex s (lts line) `orElse` findSublistIndex (reverse s) (lts line)
          onlyString s line = take (length s) $ drop (indexOfR s line) $ line
          rows p = p
          cols p = transpose p
          dial p = transpose $ mapIndex (flip drop) p
          dial' p = transpose $ mapIndex (flip drop) (reverse p)
          diar p = transpose $ mapIndex (flip drop) (map reverse p)
          diar' p = transpose $ mapIndex (flip drop) (map reverse (reverse p)

We added the functions onlyString, which returns only the range of letters in line corresponding to s, and indexOfR, which returns the index of either s or reverse s in line. It uses the function Data.Generics.Aliases.orElse, which returns the first or second Maybe, depending on which is not Nothing. We then unwrap that with fromJust, since we know that at least one will not be Nothing. This works a lot better, only highlighting the actual words:

Only highlighting the words now!

This about wraps everything up. Here's the full code to the project:

import Data.List (transpose, findIndex, isInfixOf, isPrefixOf, tails, nub)
import Data.Maybe (fromJust)
import Data.Generics.Aliases (orElse)

type Letter = (Char, Int, Int)

main = do
    let (p, w) = parsePuzzle puzzle
        positions = nub $ foldl (\pos word -> findWord word p ++ pos) [] w
    mapM_ (\ row -> do
             mapM_ (\ l@(c, _, _) ->
                      if l `notElem` positions
                      then putChar c
                      else putStr ("\x1b[32m" ++ [c] ++ "\x1b[0m")
                   ) row
             putStrLn ""
          ) p

findSublistIndex :: Eq a => [a] -> [a] -> Maybe Int
findSublistIndex xss xs = findIndex (isPrefixOf xss) $ tails xs

mapIndex :: (a -> Int -> b) -> [a] -> [b]
mapIndex f l = map (uncurry f) $ zip l [0..length l - 1]

findWord :: String -> [[Letter]] -> [Letter]
findWord s p = onlyString s $ head $ filter (checkIn s) (rows p ++ cols p ++ dial p ++ diar p ++ dial' p ++ diar' p)
    where lts = map (\(c, _, _) -> c)
          checkIn s line = s `isInfixOf` lts line || reverse s `isInfixOf` lts line
          indexOfR s line = fromJust $ findSublistIndex s (lts line) `orElse` findSublistIndex (reverse s) (lts line)
          onlyString s line = take (length s) $ drop (indexOfR s line) $ line
          rows p = p
          cols p = transpose p
          dial p = transpose $ mapIndex (flip drop) p
          dial' p = transpose $ mapIndex (flip drop) (reverse p)
          diar p = transpose $ mapIndex (flip drop) (map reverse p)
          diar' p = transpose $ mapIndex (flip drop) (map reverse (reverse p))

parsePuzzle :: String -> ([[Letter]], [String])
parsePuzzle p = (parseLetters $ lines p, parseWords $ lines p)
    where dup f a = f a a
          parseLine chars y = mapIndex (\[c] x -> (c, x, y)) $ words chars
          parseLetters = mapIndex parseLine . dup (take . (subtract 2) . length)
          parseWords = words . last

puzzle = "\
\N Z C G S M Z C M Q A C B O S W R E D V\n\
\X O K P G H N R E X T C V P E Z G W F V\n\
\W C D T N C P I Y F M E R J S Z W L E S\n\
\T R L K Z A S T M Z J V O N W F C Z X K\n\
\E S N N O H R W Y J I S Z A S M Y I S P\n\
\K T M W J R A D F W F M G W L I G O Q Q\n\
\C L D O L W G A M U V K G N I K C U F S\n\
\V X L K U O G H H I E Y Y Z C I S F B H\n\
\G J W B V U O R F H P E Y M N I I T O Q\n\
\G P V S N Y Y M R N H S A C J F J X H Z\n\
\T G J M F B R L E M H N R C S R I J M M\n\
\K V S A J V O N A W J E A F U U D W L B\n\
\P B H R H H Y A D B D W D Y G S W H A A\n\
\I Q I T M K I A Q I Y M W E A T U M J M\n\
\P P X W A T C H B H Y T Y Z D R S U Y U\n\
\V Q J N T L X L F V S D P G P A E Q E K\n\
\S Z N A E C Y N E J K H T W M T S P Z M\n\
\G Q M S E X T L R G Y G U B Y I J I A B\n\
\J C U D P E Z P M D W F Z G I N Z L L Y\n\
\P Y G X F W H P N T X R S I F G C C L T\n\
\Find the words:\n\
\FRUSTRATING INCREDIBLY FUCKING WATCH SMART"

Exercises for the reader:

Allow it to read word searches from standard input
Allow it to load them from Reddit comments, using HTTP and a JSON library
Many thanks to https://reddit.com/user/Verroq, who pointed out a bug in my code and showed how arrows made it cleaner.

Previous entry: Fixing network list duplication with nm-applet on Linux
Next entry: I've migrated my blog!