Huewords - A puzzle game of words and logic

jsnell

Level 0

« on: May 25, 2024, 05:12:57 AM »

Huewords is a browser-based word game with a logic puzzle aspect. It has a daily puzzle and random puzzles in three different sizes. There’s a playable version at huewords.snellman.net.

The basic goal is similar to crosswords. Fill a grid with letters, such that you get valid English words both down and across. But instead of being given clues about the words, you’re given the letters. The letters are grouped together into sets of 3 to 4 letters, and will need to be placed on the grid such that all letters from a group are placed into the same area:

I’ve been working on this for about 6 weeks now, so the first few entries will be backfilling.



jsnell

Level 0

Re: Huewords - A puzzle game of words and logic

« Reply #1 on: May 25, 2024, 05:24:32 AM »

Prototyping and hacking something together

I’d been thinking of making a word game for a while. Separate from that, I’d also been thinking of making a puzzle game with a grid of colored areas. The latter might sound like an absurdly specific thing to want to make a game about, but I’d been playing a lot of Sudoku variants during the winter, and ones where the best way to solve the puzzle involved coloring the grid based on some property were always my favorite. There’s just something aesthetically appealing about it.

Individually these thoughts weren’t super actionable, but combined there is some actual design direction. A colored grid that you place letters on, with the letters needing to be placed in the right colors to form words seemed like the obvious idea. I tested it out by manually building a tiny crossword, assigning colors to the grid, moving the letters off the grid, pretending to forget what the words were, and trying to fill out the grid with just logic. This is what my first iteration Google Spreadsheet looked like:

The way I was imagining the UI at this point was with an elaborate deduction game matrix showing “is it possible for this group of letters to be placed into this area”, so that’s how I organized the spreadsheet. This first playtest was workable, but had a couple of problems. It was a bit too hard to get started, but more importantly it turned out that the puzzle was ambiguous, and it was hard (impossible?) to place the areas in a way that would guarantee non-ambiguous puzzles.

So for iteration two, I used two sizes of areas, and revealed one of the words but not its location at the start. I was dubious about the first idea, it felt very non-elegant, but it turned out to be brilliant. Not only did it give way more freedom in how to construct the grid and avoid ambiguities, it made the deduction a lot more tractable and fun.

This was promising enough to whip up a puzzle generator for real testing. It was basically the crudest thing that could possibly work, less than 200 lines of C++ with a word list nicked from the original Wordle. The generator created puzzles as plain text, and then I solved them in a text editor.

The game continued to be fun when played for real rather than knowing the solution in advance. From timestamps, it looks like I played this in a text editor for an hour and a half. Again, promising enough to make a UI and test it out on a friend. Again I wanted to move quickly, so I just took the UI of a Javascript game I’d made a few years earlier (https://linjat.snellman.net/) and hacked it to show these new puzzle grids rather than the old ones. The interaction design was very different from how I’d envisioned the game originally, and how I’d been playing it in the text editor. The deduction matrix that I’d thought would be the centerpiece is nowhere to be seen. I never missed it, and no playtester ever asked for anything like that.

The first playtest went well. I handed a friend a phone with the game loaded, and asked them to figure out what the game was about with no documentation, tutorials, or guidance from me. They figured it out quickly enough. But more encouragingly, they didn’t hand the phone back after solving a level and change the topic, but continued playing for a few more levels.

Going from the first idea to a playable prototype that somebody voluntarily played multiple games of in a couple of days felt pretty satisfying. This is what that first iteration of the game looked like:

And this is what I wrote about it at the time:

Quote

Anyway, it feels good enough already and the known problems are tractable enough that I’ll probably throw it over the wall in a week or two. The only absolute blockers are honestly finding a better name and making a daily puzzle mode.

The estimates for both what was still needed and how long it’d take were pretty far off.



jsnell

Level 0

Re: Huewords - A puzzle game of words and logic

« Reply #2 on: May 26, 2024, 02:14:24 PM »

Making a dictionary

For the very first prototype I was using the word list from the original Wordle, but that was strictly a placeholder asset that would need to be replaced before allowing others to play. Licensing is one obvious reason (the list isn't open source), but the word list should also be optimized for the use case.

For Wordle, having a word the player doesn't know as the goal word is a bad experience, so the list needs to be pretty careful about obscure words. But due to the structure of the game, they also need to avoid entire classes of words because they have patterns that would make guessing monotonous and too easy. For example, Wordle does not include plurals or past tenses of words. If they did, something like a third of the words would be plurals ending in an "s", and that would bias the core search problem badly.

Neither of those problems applied to my game: rare words are fine, as long as you only have 1-2 in a single puzzle. There's plenty of ways to solve the puzzles even if you don't know every word. And having a lot of predictable letters or letter combinations in certain locations of the word isn't *that* bad.

The freely licensed word lists I could find were intended for use in games like Scrabble. For those use cases, the primary goal is that a valid word should always be in the list. It doesn't matter how ridiculously obscure a word is, if there's any way one can argue it's a valid word, the right outcome is to allow it. So that's what those lists contain: archaic words not used in hundreds of years, variant spellings that you'd never see, extremely specialized scientific or medical terms, outright foreign words that aren't *really* used anywhere near often enough in English to count as loan words. This kind of list wouldn't work for me either. Rather than 1-2 words the player doesn't know, they wouldn't know 2/3rds of the grid.

Also, these "anything one could argue is a word" dictionaries predictably contained a lot of slurs (ethnic, sexual), vulgar terms, etc. Again that's something that's fine for Scrabble. If somebody intends to score points with some racist slur, it's up to them to read the room before doing it. The dictionary is enabling it, but not forcing it. In this game, the player doesn't get to choose the goal words. If the puzzle contains an offensive word, there's no way to avoid the player interacting with that word. Forcing the players to do that is not great, but it's also not clear where you draw the line.

So what I needed was a word list somewhere between what Wordle had (~3k words) and the utterly bonkers word game dictionaries with liberal licensing terms (~10k words, most of which are total bullshit). I couldn't find any, so clearly the way to do it would be to create my own by starting from a large list of potential words, and whittling it down to just what fit my use case.

In some initial testing, I found I could rate about 500 words as common/uncommon/not a valid word/too offensive in 10 minutes, but with a rather high 2% error rate. (Not 2% error rate compared to an objective truth, but a 2% error rate compared to my own answers to the same question an hour earlier). That kind of error rate wasn't workable.
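That self-consistency check can be expressed as a small calculation. This is only a sketch: the function name, the label strings, and the sample words are made up for illustration, and the real ratings covered about 500 words per pass.

```python
def disagreement_rate(first_pass, second_pass):
    """Fraction of words rated in both passes whose labels differ."""
    shared = first_pass.keys() & second_pass.keys()
    if not shared:
        return 0.0
    changed = sum(1 for word in shared if first_pass[word] != second_pass[word])
    return changed / len(shared)


# Hypothetical ratings of the same words, an hour apart.
first = {"crane": "common", "abaci": "uncommon", "zzzzz": "invalid", "house": "common"}
second = {"crane": "common", "abaci": "common", "zzzzz": "invalid", "house": "common"}

rate = disagreement_rate(first, second)
print(f"self-disagreement: {rate:.0%}")
```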

My second thought was to use LLMs for this. I'll be damned if I let an AI do my coding, but menial data classification? That sounds perfect. I tried using 4 different LLMs for this (Llama 3, Mistral, Wizard, Gemma), and got very inconsistent results from them. This was after hours figuring out just how to write a prompt in a way that the LLMs could answer best (e.g. what output format to ask for, what kind of classification categories to ask for, how to make them say whether a word is offensive rather than refuse to answer entirely, whether asking for definitions/example usage gave better results, etc). Like, at this point I probably spent more time on just tweaking those AI prompts than on the entire programming work on the puzzle generator, and the results still weren't reliable.

What I ended up with was basically an ensemble model:

I first classified 10k words manually. I then compared my classifications to the LLMs. If my opinion matched sufficiently many LLM outputs, I just accepted the results outright. That left maybe 2k words to be triaged more carefully. For that triaging, I did the work in tranches of similar words to try to make the decisions consistent, and used the LLMs a bit to aid in the decision making. (E.g. if I had a gut feeling on what the word probably meant, and most LLMs gave a similar definition, and none of the LLMs claimed red flags like the word being archaic or slang, then there's a good chance it should be included). But still, all the actual decisions were made by a human. Based on my results, trying to fully automate this would have been a disaster.
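The agreement step could look roughly like the sketch below. The threshold of 3 matching LLMs out of 4 is an assumption (the post only says "sufficiently many"), and the function name, labels, and sample words are hypothetical.

```python
from collections import Counter


def triage(manual, llm_labels, min_agree=3):
    """Split words into auto-accepted ones and ones needing careful review.

    manual:     {word: label} from the human pass
    llm_labels: list of {word: label} dicts, one per LLM
    min_agree:  assumed threshold for "sufficiently many" matching LLMs
    """
    accepted, needs_review = {}, []
    for word, my_label in manual.items():
        votes = Counter(labels[word] for labels in llm_labels if word in labels)
        if votes[my_label] >= min_agree:
            accepted[word] = my_label  # the human label stands
        else:
            needs_review.append(word)  # still decided by a human later
    return accepted, needs_review


manual = {"crane": "common", "abaci": "uncommon", "vozhd": "invalid"}
llm_labels = [
    {"crane": "common", "abaci": "uncommon", "vozhd": "uncommon"},
    {"crane": "common", "abaci": "common", "vozhd": "invalid"},
    {"crane": "common", "abaci": "uncommon", "vozhd": "uncommon"},
    {"crane": "common", "abaci": "common", "vozhd": "invalid"},
]
accepted, needs_review = triage(manual, llm_labels)
print(accepted, needs_review)
```

Note that the LLM votes never override the manual label here. They only decide whether a word skips the slower triage pass.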

Once I did my first round of playtesting, it quickly became obvious that I wasn't calibrated quite correctly, and the words were a tiny bit too obscure. But after a change to remove the tranche of 100 words I was least confident in, the complaints about obscure words mostly stopped. I'm still getting a trickle of inclusion requests, most of which are pretty reasonable.

Another trick I discovered too late was to maintain two lists: one list used for puzzle generation, another used to decide during game play whether a word is valid or not. I first added it to make removing words easier. Moving a word from the first list to the second would ensure no new puzzles would use that word, but none of the existing puzzles would break. But it also turned out to be super useful when reasoning about offensive words.

Like, my initial criteria for offensive words were basically slurs, words for genitals, and vulgar words for sex or bodily functions. It's a workable definition, and I didn't want to make the list too large since it'd make the game feel too unpredictable and arbitrary to the player. "I tried to use this slightly offensive word and it was rejected, but the other one that *I* think is also offensive was accepted". With the two-dictionary solution, I could put the borderline cases (not that many) into the second list. Nobody would be forced to type them, as they'd never be part of the intended solution. But anyone trying to use a common word they didn't think was that bad wouldn't be rebuffed by an "unknown word" error either.
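A minimal sketch of the two-list idea follows. It assumes the second list holds only the extra words that are valid during play but never used for generation; the class, method names, and sample words are hypothetical, not the game's actual code.

```python
class Dictionary:
    def __init__(self, generation_words, extra_valid_words=()):
        # Words the puzzle generator may pick as part of a solution.
        self.generation_words = set(generation_words)
        # Words accepted during play but never chosen by the generator
        # (retired words and borderline cases).
        self.extra_valid_words = set(extra_valid_words)

    def can_generate_with(self, word):
        return word in self.generation_words

    def is_valid_guess(self, word):
        return word in self.generation_words or word in self.extra_valid_words

    def retire(self, word):
        """Stop using a word in new puzzles without breaking existing ones."""
        if word in self.generation_words:
            self.generation_words.remove(word)
            self.extra_valid_words.add(word)


words = Dictionary({"crane", "house", "abaci"}, extra_valid_words={"butts"})
words.retire("abaci")  # too obscure for new puzzles, still accepted in play
print(words.can_generate_with("abaci"), words.is_valid_guess("abaci"))
```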

I've only done this work for 5 letter words. Doing it also for 4 and 6 letter words would be really good, as it'd give a lot more flexibility for making interesting puzzle layouts. And it'd be amazing to be able to show word definitions after the game. But given what I know now about this process, it's going to be tons of work. I still want to do it, but would need to invest in writing some better tooling.



jsnell

Level 0

Re: Huewords - A puzzle game of words and logic

« Reply #3 on: June 03, 2024, 04:04:50 PM »

Skipping ahead to the present, I did some work to try to distill the feel of the gameplay into about 15 seconds. A game with just one view where you make a move every ten seconds doesn't make for good trailer material.

Speeding up actual gameplay doesn't seem viable.

Here's the current iteration:

https://www.youtube.com/shorts/7qgHglo9WfI

Basically I added a mode to the game where I could record all the moves, and then a separate mode where I could replay the moves back with a timing of my choosing. What I did was a really simple algorithm for whether a given move should have a long pause (500ms) or short pause (200ms) before it, and then added some noise. It gives very predictable results, but also feels kind of organic when watched.
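The timing logic can be sketched as follows. The 500ms and 200ms values come from the description above; everything else is an assumption for illustration, in particular the rule that a move gets the long pause when it touches a different area than the previous move, the 50ms jitter, and the move format.

```python
import random

LONG_PAUSE_MS = 500
SHORT_PAUSE_MS = 200


def replay_delays(moves, jitter_ms=50, seed=0):
    """Return the pause (in ms) to wait before each recorded move."""
    rng = random.Random(seed)  # fixed seed keeps the replay reproducible
    delays = []
    previous_area = None
    for move in moves:
        # Assumed heuristic: switching areas reads as "thinking time".
        switched = move["area"] != previous_area
        base = LONG_PAUSE_MS if switched else SHORT_PAUSE_MS
        delays.append(base + rng.uniform(-jitter_ms, jitter_ms))
        previous_area = move["area"]
    return delays


# Hypothetical recording: two moves in one area, then a move in another.
recorded = [{"area": "red"}, {"area": "red"}, {"area": "blue"}]
delays = replay_delays(recorded)
print([round(d) for d in delays])
```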

In one sense, this is a ridiculous thing to get distracted by. I don't think I'll actually have any use for a trailer, and have no plans to enable the replay functionality for players. On the other hand, it was a very entertaining and different problem.


