While I unfortunately haven’t had a lot of time to contribute to Factor for a couple of years, I still love using it for the little random programming tasks I have to deal with day-to-day. Factor’s design makes it perfect for the kind of exploratory work that hits at the fringe of what it makes sense to automate. And on top of that, I still think Factor should be more widely used, so I like doing what little I can with what time I have to “make fetch happen”.

One of my current addictions is to the New York Times’ Crossword app. The app, in addition the actual crossword, has a pile of other word games that are perfect for burning a couple minutes. And one of them, Spelling Bee, drives me slightly crazy.

Spelling Bee is a simple game: you are presented with seven tiles, each with a unique letter. One of the tiles is gold. Your goal is to come up with as many words as possible that are at least four letters long, use only those seven letters, and use the gold letter at least once. You are allowed to use any given tile multiple times, so, given tiles that include A, B, and O, a word like baobab would be completely legal (provided at least one of those letters was gold). If you happen to get a word that involves all seven letters at least once, that’s called a Pantograph or something, and you get the most points.

The thing is, sometimes, I cannot begin to guess what on Earth the Pentateuch is for a given day’s Spelling Bee. At the time I’m writing this, the letter files are X, E, O, I, F, L, and N, with N being the gold letter. Short of British spellings like “inflexion” (which the game does not accept), I have no idea what word they were aiming for.

Wouldn’t it be great if I had a program that could tell me? Maybe written in a language of my choosing for once, rather than whatever’s got the library I need?

So anyway, the answer is yes, and we’ll be doing it in Factor.

This is a very short and simple program that’s not itself interesting. My intent, rather, is to give a quick idea of how I approach this problem in an environment like Factor, where the program is running the whole time I’m working on it, and how much the rich library makes very short work of this kind of issue. If you want to follow along, this entire example fits comfortably inside Factor’s graphical REPL, which it calls the Listener, so it’s quite easy to run the code yourself. Just grab a nightly (under the “Development release” section), fire it up, and follow along.

First up, I know I’m going to need a list of words. Some quick googling and a words.txt later, and I’m ready to do the first step of loading the lines:

USING: io.encodings.utf8 io.files ;
"/Users/benjamin/Downloads/words.txt" utf8 file-lines

That’ll put an array of the words from the file on the stack. The USING: lists the libraries (“vocabularies”) we want to have available; I just need the UTF8 and file-reading vocabularies for this, the first for the utf8 word (which just says to read the file in UTF-8), and the second for file-lines (which is a handy way to yoink an entire file into RAM, broken up by lines).

I can see on quick inspection it’s got a lot of garbage in it I don’t want: proper nouns, contractions, words with numbers in them (what?). Since Factor’s concatenative, I can just do some more operations on this unnamed value on the stack, kind of like if I were building up a pipeline in a shell.

First, let’s get rid of the short words:

USE: sequences
[ length 4 >= ] filter

filter preserves elements in a collection that obey a given predicate, which in this case is the anonymous function (“quotation”) [ length 4 >= ], which will simply check if a string is at least four characters long.

Next, let’s nuke anything that is a proper noun or has characters like ' in it that aren’t letters:

USE: unicode
[ [ letter? ] all? ] filter

Okay, much better. letter?, from the unicode vocabulary, ensures that a character is a lowercase unicode letter. (Letter? would allow uppercase, and LETTER?, the reverse of letter?, would require it.) The all? ensures that every character in the string obeys letter?. After we run this filter, we can easily see in the REPL that the list of candidates is now entirely reasonable.

Now, one of the reasons I love Factor for this kind of problem is that it has an amazingly rich collection vocabulary, including a deep pile of set operations. On top of that, any linear collection (“sequence”) can function as a set. And guess what strings are? They’re just a sequence of characters. Sure, it won’t be efficient, but for now, I’m just playing around. All I really need to do is find any words in that word list that are subsets of a string containing all the tiles.

Well, that’s almost too easy:

USE: sets
[ "xeoifln" subset? ] filter

And boom, there’s my list. Well, almost; since N is a gold tile, I need words that actually involve N.

[ CHAR: n swap in? ] filter

And I probably should sort these from longest to shortest, since long words have the most points:

[ length ] inv-sort-by

And bingo, we’ve got our answer: inflexion. Which unfortunately was a word I already tried that it rejected. But at least, next time, when it’s not looking for some weird and obscure word, I can find it.[1]

But of course, if I want to reuse all of this, it’ll be annoying to enter these commands one-by-one on the REPL. Thankfully, since Factor’s concatenative, I can almost literally paste the entire thing into a source file and be done with it. Here’s what that might look like:

USING: kernel combinators.short-circuit io.encodings.utf8 io.files sequences sets unicode ;
IN: spelling-bee

! Words that help us filter the word list down
: word-length-okay? ( string -- f ) length 4 >= ;
: word-letters-okay? ( string -- f ) [ letter? ] all?
: word-okay? ( string -- f ) { [ word-length-okay? ] [ word-letters-okay? ] } 1&& ;

! A word to actually load and filter the word list
: word-list ( -- seq ) "/Users/benjamin/Downloads/words.txt" utf8 file-lines [ word-okay? ] filter ;

! And finally, two words to find candidate words
: candidate-words ( string -- seq ) '[ _ subset? ] word-list swap filter ;
: best-candidate-words ( string -- seq ) candidate-words [ length ] inv-sort-by ;

If you look back at what we did in the REPL, you can see that all I’m doing is naming the various steps we did, and then executing them in the exact same way.[2] Even though I don’t have tests yet, I have extremely high confidence this works, because I just interactively built it up. And if I now want to optimize this (because its performance is awful, though “awful” here just means “it takes 250ms on my machine”, so not too bad), I can easily add some tests and begin refactoring in a similar manner. In fact, after I wrote this post, I used bit-set, which is a…well, a bit set, to improve the execution speed by a factor of about 20, such that it only takes about 20ms to give me the solution. (I also added a word that finds only Pentathalons or whatever they call them, which simply swaps out subset? for set= in candidate-words.) Not too bad for a couple minutes of work.

Hopefully, that gives some insight into what actually working in Factor feels like. It’s not unique in a strict sense, since Smalltalk and Common Lisp strongly encourage the same development style. But it’s definitely a very different approach from any mainstream language I’ve used. Even in environments that do have a rich REPL, such as JavaScript, it’s not nearly so easy to migrate your test examples out of the REPL and into a source code file for further refinement; you often end up doing a lot more massaging than we did here, where we basically just named the steps.

If you’re interested in poking more, you’ll find that Factor has a very rich set of documentation, and that our Discord community is very welcoming. Stop on by, and I’ll be happy to help you out.

  1. It actually turned out that they didn’t accept inflexion, but did accept flexion. This is an outrage, and they will be hearing from my attorney. ↩︎

  2. I confess I tossed a 1&& in. Without getting into how it works, it’s just joining the two conditions we did earlier so we don’t need to run filter twice, once for length and once for letters. ↩︎