I don’t normally play mobile games, but I do take the train every day so I’m somewhat aware of the mobile game landscape. One game that seems popular is basically a word jumble in the form of a crossword puzzle. One example of such a game is Word Crossword Search. You get a series of letters and try to make to make as many words as you can. The crossword format gives you hints.

Word Crossword Search

You know that the gameplay must be good that people are willing to put up with the garish style. Rather than submit to my desires, I decided I can build the game instead. By the time its complete, I will be thoroughly exhausted about thinking about the idea that I’ll no longer be interested in playing.

Exploring Common Words

The first step is to find a word list. Most unix systems including osx have a word list under /usr/share/dict/words but there are over 200k words and many are obscure (e.g. abaciscus, abacist, abactinal). I found a list of 5,000 common words although I had to put in an email (throw-away mail works). I made some slight manual changes to the word list (removed duplicates and added word length as a column). I kept all the columns in case I want to reference them later, but my table schema looks like this:

rank   word   len   partOfSpeech   frequency   dispersion

I loaded them into an array of Words:

import fs from 'fs'
const filename = './common_words.csv'
const getCommonWords = (filename: string) => {
    const raw = fs.readFileSync(filename, 'utf8')
    return raw.split('\r\n').slice(1).map(r => {
        const [rank, word, len, partOfSpeech, frequency, dispersion] = r.split(',')
        return {
            rank: Number(rank),
            word, len: Number(len),
            partOfSpeech,
            frequency: Number(frequency),
            dispersion: Number(dispersion)
        }
    })
}

const words = getCommonWords(filename)
export const allWords = words.map(w => w.word)

Here are the first ten words:

allWords.slice(0,10) =>
  [ 'the', 'be', 'and', 'of', 'a', 'in', 'to', 'have', 'it', 'I' ]

From there I had to build out some general utils to process the array of words. The first function I need is a counter, similar to the counter in python’s collection utils.

In our case, we want to take a word and return a map of characters and their respective counts (e.g. counter(“hello”) => {h: 1, e: 1, l: 2, o: 1})

The great thing about typed languages is that you can word on the function signature, and when you decide on that the rest of the work becomes considerably easier.

Static v Dynamic

In our case, we want to take in a string and return a Map<string, number>.

export const counter = (word: string): Map<string, number> => {
    const ct = new Map<string,number>()
    word.split('').forEach(ch => ct.set(ch, (ct.get(ch) || 0) + 1))
    return ct
}

The next thing we need is the see if a word is a subset of another word. The easiest way to do that is to determine a counter of each word, and see if all the character counts in word1 and less than or equal to the character counts in word2.

export const subsetWord = (word1: string, word2: string): boolean => {
    const ct1 = counter(word1)
    const ct2 = counter(word2)
    return Array.from(ct1).every(([k,v]) => (ct2.get(k) || 0) >= v)
}

The above function iterates through the counter object of word1, tries to get the corresponding character count from word2 and checks if word1 contains more characters than that of word2.

subsetWord('abe','anabelle') => true
subsetWord('sam','anabelle') => false

Now that we have a way to check to see if a word is a subset of another word, we can filter all the words for words that are a subset of a given word.

const getSubsetWords = (word: string, words: string[]): string[] => 
  words.filter(w => subsetWord(w, word))

To see the subset of the word method:

getSubsetWords('method', allWords) =>
  [ 'the', 'to', 'he', 'do', 'them', 'me', 'home', 'oh', 'hot', 'method', 'mode', 'toe', 'dot' ]

So let’s see what 6 letter common word has the most and least subset common words.

const allWords6 = allWords.filter(w => w.length === 6)
const subsetWords6 = allWords6.map(w => [w, getSubsetWords(w, allWords)])

You would think that the type of subsetWords6 would be [string, string[]][], but Typescript infers the more general format [string | string[]][][]. But we can give Typescript a hint and it won’t complain. This is common when your maps return arrays.

const subsetWords6: [string, string[]][] = allWords6.map(w => [w, getSubsetWords(w, allWords)])

Finally, we need to find the 6 letter word with the largest number of subsetWords. Typescript doesn’t have a max that can take in a generic. Math.max also only words on number. But what is a max really? It’s just a reduce!

const best = subsetWords6.reduce((a,b) => (a.length > b.length) ? a : b)

Finally lets look at the results:

['pistol', 
  ['to', 'it', 'so', 'its', 'lot', 'sit', 'stop', 'oil', 'top', 'list', 'spot', 'lots', 'lip', 'pilot', 'slip', 'soil', 'tip', 'post', 'pot', 'lost', 'split', 'plot', 'pit', 'opt', 'slot', 'spit', 'pistol']]
[ 'people', [ 'people', 'pop', 'pole', 'peel' ] ]

When choosing what words to use, we may want to limit to words that have a certain number of subset words. So we may want to exclude people as that puzzle would be too short. The next step is to figure out how to plot it on a crossword and how to represent that, which I’ll cover next time.