Calculator texts
Some weeks ago HN user wonger_
posted this list of words
to Hacker News: all the words you can spell on an
upside-down calculator. The word "boobies" (5318008) is perhaps the
best-known example, but it's far from the only one.
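To make the constraint concrete, here is a minimal sketch of the trick in Python (the letter-to-digit convention varies a little between lists; this is a common version):

```python
# Each letter maps to the digit that looks like it once the display is
# flipped 180 degrees. (Conventions vary a bit; some people also read
# 9 as "G".)
LETTER_TO_DIGIT = {
    "o": "0", "i": "1", "z": "2", "e": "3", "h": "4",
    "s": "5", "g": "6", "l": "7", "b": "8",
}

def spellable(word):
    """True if every letter of the word can appear on a flipped display."""
    return all(c in LETTER_TO_DIGIT for c in word.lower())

def keystrokes(word):
    """The number you type so that the flipped display reads `word`."""
    digits = [LETTER_TO_DIGIT[c] for c in word.lower()]
    return "".join(reversed(digits))

print([w for w in ["boobies", "giggles", "hello", "table"] if spellable(w)])
print(keystrokes("boobies"))  # 5318008
print(keystrokes("hello"))    # 07734
```

Run any dictionary through `spellable` and you get essentially the list from the HN post.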
This comment by user chriscbr
went a step
further and annotated every word with its part of speech, while
this one by user jprete
raised the bar: can
you write a long work of fiction using only calculator words?
I spent some time trying to make the longest possible text, and this post is a long, complicated way to say "probably not".
Some years ago Randall Munroe of XKCD fame published this article on how to write texts using only a subset of letters. He was interested in phone keypads, but there's no reason his code can't be adapted to our task. The basic idea is a trigram model: you train a model that, given the two preceding words, produces an ordered list of the words most likely to come next; you then restrict the candidates to those that fit your constraint (in our case, words whose letters map to calculator digits), train the model on some data, and you're done.
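The trigram idea can be sketched in a few lines (this is a toy stand-in, not Randall's actual code, and the one-sentence corpus is made up for illustration):

```python
from collections import Counter, defaultdict

# Count which word follows each pair of words in the training text, then
# at generation time keep only candidates that survive the calculator filter.
ALLOWED = set("beghilosz")

def spellable(word):
    """True if every letter maps to a digit on a flipped display."""
    return all(c in ALLOWED for c in word.lower())

def train(text):
    model = defaultdict(Counter)
    words = text.lower().split()
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        model[(w1, w2)][w3] += 1
    return model

def next_word(model, w1, w2):
    # Walk the continuations from most to least frequent and return the
    # first one that can be spelled on the calculator.
    for cand, _count in model[(w1, w2)].most_common():
        if spellable(cand):
            return cand
    return None

# A tiny stand-in corpus; the real training data was Wikipedia text.
model = train("he is his boss he is his size he is tall")
print(next_word(model, "he", "is"))  # his
```

With a real corpus you chain `next_word` calls, feeding each output back in as the new second word of the context.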
After downloading Randall's code, updating it for Python 3, and making it slightly more efficient, I trained it with as much Wikipedia text as I had patience for. The exercise produced some interesting word combinations, although nothing resembling a coherent long work of fiction:
- Be less (would make a nice parody of "Be best")
- Oh Ohio hills
- High heel shoes
- I'll go see his leg
- His shoe size is big
- He is obese she is his size
- He is high, so is his boss, so is she
The next step was to use a more capable language model, and for that we move onto LLM territory. The idea is straightforward: during generation, an LLM looks at the input words and produces an ordered list of the most probable next words in the sequence. Usually we want one of the top words, but nothing stops us from picking the most likely word that uses only a specific set of characters. We need to do this by hand because LLMs are incapable of backtracking: if one has generated "Hegel oozes ego" and then realizes it has written itself into a corner, there's no way for it to go back and try something different. As a result, sooner or later they all choose a word that doesn't fit the instructions, which is where we come into play.
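The selection step looks something like this. In a real run the scores would come from an actual model's logits (e.g. GPT-2 via the `transformers` library); here a small dict of made-up scores stands in:

```python
# Instead of taking the overall argmax over next-token scores, take the
# highest-scoring token that satisfies the calculator constraint.
ALLOWED = set("beghilosz")

def fits(token):
    """True if the token is non-empty and every letter survives the flip."""
    return bool(token) and all(c in ALLOWED for c in token.lower())

def pick_next(logits):
    for token, _score in sorted(logits.items(), key=lambda kv: -kv[1]):
        if fits(token):
            return token
    return None

# The "natural" continuation is "the", but 't' has no upside-down digit,
# so we fall through to the best legal word.
scores = {"the": 3.1, "his": 2.4, "a": 2.0, "goose": 1.2}
print(pick_next(scores))  # his
```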
Writing the prediction code was straightforward, but punctuation was an issue: we want to keep some punctuation so the text looks natural, but at the same time the phrase "Oil... Oil... Oil... Oil..." is more likely than something like "Hillbillies besiege his soil". This meant that neither GPT-2 nor Llama 2 could generate anything long, although some short phrases ended up being interesting enough:
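One way to tame the "Oil... Oil... Oil..." loop is a repetition penalty on top of the constrained selection. This is a sketch of the general technique, not the exact code I used; note the simple division assumes positive scores:

```python
# Allow a handful of punctuation tokens, but divide the score of any token
# we've already emitted, so repeating it again and again becomes unattractive.
PUNCTUATION = {".", ",", "...", ";"}
ALLOWED = set("beghilosz")

def legal(token):
    t = token.lower()
    return t in PUNCTUATION or (bool(t) and all(c in ALLOWED for c in t))

def pick_next(logits, history, penalty=2.0):
    """Best legal token, penalizing tokens already present in the history."""
    best, best_score = None, float("-inf")
    for token, score in logits.items():
        if not legal(token):
            continue
        score = score / penalty ** history.count(token)
        if score > best_score:
            best, best_score = token, score
    return best

scores = {"oil": 3.0, "...": 2.5, "soil": 1.0}
print(pick_next(scores, []))                            # oil
print(pick_next(scores, ["oil", "...", "oil", "..."]))  # soil
```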
- Bob is eligible.
- He is 2 eggs high. See his gills.
- He is 90. She... she is his lobolo.
- Hello Bill, Bob is his big ol' goose.
In the end, the best approach for me was to go back to the beginning, take the annotated list of words, and build phrases by hand. This led me to the short story:
Bob sees Zoe boil his sole beige goose egg, sighs. She giggles.
HN user araes
used ChatGPT
and, after some sentences that didn't stick to the prompt (color me surprised),
eventually generated the text:
Ellie sells loose shoes; Bill shills big goose oil.
Not one to be outdone, HN user bcherny
used Claude 3.5 to obtain:
Ellie sees Bill. Bill sells bibles. Ellie lies, "Bill, sell bibles else." Bill sells bibles. Ellie gobbles bibles. Bill obsesses. Ellie sighs.
Which I now counteract with my own:
Leslie obliges. She'll go see Giles' bill "Oh, Ohio hills". She seizes Giles' high heel shoes, giggles. "Bozo, go sell blisses".
Or, if you want something even longer and feel like reading some really weird prose:
Giles sells big oil. Loose soil geologies is his hobbies, his Lego. He sees Shell be so big he is ill. He begs his boss Zoe, his gig is hell.
"Hell is high heel shoes. Hell is Boise, Ohio. Hell is hillbillies", his boss hisses. "Go be Bob's hell".
Giles obliges.
Bob oozes ego. Hegel is his Bible. Bob is so high he giggles. "Oh ho ho! Hello bozo".
Giles sighs. "Hello Bob. Zoe lobs blisses."
"Blisses? She lobs ills."
Can we do better? Probably. Could we generate "a long work of fiction"? I'm going to say "probably not": having only 35 verbs and lacking both the letter "a" and the word "the" restricts the set of possibilities a lot. But given enough patience I bet you could write a page of more-or-less coherent text.
Let me close this post with some of my favorite generations that didn't make it into the longer stories because they are a bit inappropriate:
- "Google is so big, ISIS less so": Google had ca. 180k full-time employees in 2023 while ISIS may have around 16k. So yes, ISIS is "less big" than Google.
- "He is so high he sees Oz": That is, indeed, pretty high. L. Frank Baum would be proud.
- "Hi, I'll go see hell": it's not the end of the world, nor The Restaurant at the End of the Universe, but maybe someone is just visiting the Museo del Prado in Madrid.
- "Oslo is hellhole": Wow, rude!