16 January 2017

This past weekend was the 2017 edition of the annual MIT Mystery Hunt. The Mystery Hunt is one of the largest so-called "puzzle hunts," where teams set out to solve puzzles that lead to more puzzles and eventually to some sort of coin that marks the final victory. Part of the Mystery Hunt tradition is that the winner of one year's hunt writes the hunt for the next year. Last year's hunt was won by Setec Astronomy, a relatively old team that had written the 2000, 2002, and 2005 hunts, and had been famously avoiding winning for quite some time. According to some of my teammates, their past hunts were a big part of the transformation that took Mystery Hunt from a small series of puzzles to the large extravaganza it is today.

There are a lot of different facets to Mystery Hunt, and this year did some of them well and some of them not so well. I certainly did not feel that this year's hunt was bad, but let's get the most obvious criticism out of the way first.

## Hunt Length

The hunt started at about 1 pm on Friday, and the first team found the coin at 4:23 am on Saturday, for a time of a bit over 15 hours. That is absurdly short for a hunt, and it wasn't because of the top team being particularly speedy. The second place team was close behind, and my team, Hunches in Bunches, finished in eighth place around 7 pm on Saturday. Being short isn't inherently bad, but my expectation (as well as that of many other people that I talked to), was that as a team that was very unlikely to win, we'd still be solving puzzles on Sunday, with a hope to finish sometime early Sunday afternoon.

In all fairness, Setec's goal was exactly that expectation. They wanted to write a hunt that would see the first place team finish on Saturday evening (right around the time that Hunches actually finished), and then have ten or so teams trickle through endgame over the next 18 to 24 hours. If you were to ask me how long a hunt should be, I would say pretty much the exact same thing (though I wouldn't mind a longer hunt either).

A short hunt isn't all bad. As a result of the hunt being short, seventeen teams were able to finish, which I believe is an all-time high. As Setec mentioned during wrap-up, there are a lot of different teams who want a lot of different kinds of hunts. For the smaller teams who were able to finish a hunt for the first time this year, this is exactly the hunt that they wanted. On the other hand, for the bigger teams, it was a bit disappointing. One of the best compromises in my eyes is the idea of having a midpoint that has its own runaround and miniature endgame, to serve as a point of completion for the smaller teams. This idea was pioneered (as far as I'm aware) by the 2009 hunt, and has been incorporated into several hunts since then.

Setec intended for the character endgame to be that midpoint, though I didn't realize it until they said it at wrap-up. Given the results of the hunt, I think that it did its job perfectly fine. I think they said that 29 teams were able to complete the character endgame, which was unlocked after solving the six character metapuzzles. Keeping that the same but beefing up the quest portion would have been great for the larger and stronger teams, while preserving the experience for most of the smaller teams (though, of course, it would be worse for the teams that finished this hunt and wouldn't have finished a longer one).

Getting the length of hunt right does require some amount of luck. When Hunches (known as Random Fish at the time) wrote the hunt, our rule was to get two independent testsolves. Two solves is just a glimpse into what can happen to teams on the actual day. Maybe your cluephrase worked perfectly for people on your own team, but does that mean it will work for everyone? What if a TV show kills off a character two days before hunt? Even with best practices during testsolving, there is still a good amount of guesswork that goes into making hunt end when you want it to.

## Metapuzzles

One of the reasons that the hunt was so short was that the metapuzzles were all rather straightforward. There were six character metapuzzles and eight quest metapuzzles, and only one gave us any real trouble. Our team has historically struggled with metapuzzles, so this was a big change, and I really believe it was because of the metapuzzles and not because of something being different within our team.

A good metapuzzle is one of the hardest things to design. There is a very narrow line between the lands of obvious and impossible, and the best metapuzzles are perfectly balanced on that line. However, an impossible metapuzzle is generally going to be a worse experience than an obvious one, which means that a writing team might decide to play it safe to avoid teams getting stuck and frustrated.

The metapuzzles this year reminded me a lot of the metapuzzles from the 2014 hunt, themed around Alice in Wonderland. The character metas from this year and the MIT metas from 2014 were so-called pure metas, in which the focus is purely on the puzzle answers and noticing a pattern within them. The quest metas and the Wonderland metas both included quite a bit of flavor text, giving a clue about what to do, any additional information that might be used, and a question to be answered.

The resulting feel of the metas is, in my opinion, suboptimal. Rather than looking at the information gained from solving puzzles, much of the time spent solving this style of meta is in reading the flavor text and trying to understand what it is telling you to do. As with most mechanics, the hints can fall on a spectrum from very subtle to very obvious. I felt that this year's metas fell too far on the obvious side, though the criminal meta looks like it was executed well (sadly I was asleep while my team solved it).

Thematic answers are a tricky thing. First, let me be clear about what I mean by a thematic answer. In the 2014 hunt, there was a White Queen round whose story went that the beast had come and turned everything red. The White Queen, being the White Queen, wants things to be white. So how can you turn things white again? The metapuzzle answer, CLEAN WITH IVORY SOAP, is a direct answer to that question.

Thematic answers can be great in some regards. They give the solver a feeling that the act of solving puzzles is actually related to the advancement of the plot. On the other hand, an answer being thematic goes hand in hand with the answer being guessable. If an answer is too easily guessed, there are lots of downsides, especially if the answer is to a metapuzzle.

The biggest downside is that a guessable metapuzzle answer causes some of the constituent puzzles to be almost thrown away. The reason that people write puzzles is that they want people to solve them. By default, if a round has 9 puzzles and the metapuzzle is solved with only 3 of them, then the remaining 6 are worthless. Most writing teams incentivize going back to these puzzles by having every solve contribute toward unlocking new puzzles, but that's not a perfect solution.

The problem, of course, is backsolving. Metas usually place a good number of constraints on the answers of their constituent puzzles, such as requiring each answer to be 13 letters long, or to have certain letters in certain positions. When a team figures out the meta mechanic, they usually have some amount of information that can guide them toward appropriate solving paths. When they have both the meta mechanic and the meta answer, they can often narrow the set of possible answers down to a very small list and call them in until they find the right one. That is the essence of backsolving.
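The constraint-filtering step above can be sketched in a few lines. To be clear, the constraints (13 letters, fixed letters at certain positions) and the candidate answers here are invented purely for illustration; real metas impose whatever constraints their mechanic dictates.

```python
# Sketch of backsolving: filter candidate answers against the constraints a
# meta imposes. Constraint values and candidates below are made up.

def satisfies_meta(answer, length, fixed_letters):
    """Check a candidate answer against hypothetical meta constraints.

    fixed_letters maps 0-based positions to required letters.
    """
    word = answer.replace(" ", "")
    if len(word) != length:
        return False
    return all(word[i] == c for i, c in fixed_letters.items())

# Hypothetical guesses a team might brainstorm from the puzzle's theme.
candidates = [
    "MYSTERY HUNTER",   # 13 letters, but fails the position constraints
    "PUZZLEMASTERS",    # 13 letters, fits the constraints below
    "BACKSOLVED",       # too short
]

constraints = {4: "L", 9: "T"}  # e.g. 5th letter L, 10th letter T
plausible = [a for a in candidates if satisfies_meta(a, 13, constraints)]
# The surviving candidates are what a team would call in one by one.
```

In practice teams run this kind of filter over a large wordlist or thematic phrase list rather than a handful of guesses, which is exactly why tight meta constraints plus a known meta answer make backsolving so effective.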

As a solver, I love backsolving. The feeling you get from a successful backsolve is very similar to the feeling of stumbling on a useful glitch in a video game, or finding a more efficient route than the obviously intended one. It's amazing and I think that hunts should be supportive of backsolving. However, when a team backsolves a puzzle, they will typically no longer be looking at that puzzle during the hunt, because there is nothing more to be gained. This means that if backsolves happen too quickly, there is no time for people to even attempt to forward solve the puzzle. One of my biggest regrets in how we ran the 2015 hunt is that we made The 10000 Puzzle Pyramid the last puzzle to be unlocked in its round. The result was that most teams got an hour or two into it (in testsolving it took around 10 hours) and then backsolved it from the meta.

In this year's hunt, there were several times that we had very confident backsolves for puzzles that we had not even unlocked. As soon as the puzzle unlocked, we would call in the answer we already knew, and never even look at the puzzle. As much as I think backsolving is awesome, I think it's a shame that this happened to so many puzzles, and I hope that future teams put in effort to avoid it.

I have a related story from when I was one of the main problem writers for the Harvard-MIT Math Tournament. In one of the earlier years of the competition, one of the hardest problems on the contest had the simple answer 1. Since there is no penalty for guessing, plenty of people decided to write down 1 on a whim and got a very valuable question right despite not actually solving it (and, in some cases, not solving several of the significantly easier problems either). With respect to competitive integrity, I view this as a disaster. Do you really want a contest where someone might solve a couple of 3-point questions, then guess the correct answer to an 8-point question, and place higher than someone who solved the 3-point questions and a 4-point question?

I had an informal rule for myself that questions should strive to have answers that you wouldn't guess without some significant insight into the problem. An easy method for a math competition is to have non-integral answers as often as possible. It's easy for someone to guess something like 0, 1, 2, 3, 4, or 5; it's essentially impossible for someone to guess $$\frac{2^{2011} + 1}{3 \cdot 2^{2011}}$$. That said, many problems are best stated in a form with an integral answer, but keeping answers above 10 will usually prevent people from getting them right with an uninformed guess.

I believe that Mystery Hunt would benefit from similar non-guessability. It should certainly be possible to intuit the answer to a meta if you're only missing one or two puzzles, but on the other hand it should usually not be possible with only one or two puzzles, unless the round is only three puzzles long or that's an intentional design decision. Getting good testsolving data is one of the most difficult problems of writing hunt, so I don't expect to ever have it perfect, but here are some ideas to start out with.

For normal puzzles, consider having testsolvers answer the following question within 10 minutes of opening the puzzle: if you were forced to call in 5 answers right now, what would you call in? These will probably be various things based on the theme of the puzzle and so on. If your testsolvers hit the actual answer, it's probably too guessable.

For meta puzzles, the problem is a lot stickier. In a perfect world, you'd test the metapuzzle with every possible subset of answers. In the real world, it's impossible to get that many testsolves on a single puzzle. So what's the best you can do with somewhere between 2 and 5 testsolves (if there's one place to put extra testsolving power, metas are definitely it)? Ideally you'd start out a testsolving group with a very small number of answers (zero?). The testsolving group needs to then be able to say something like "At this point we are confident that we won't make more progress with what we have and need another answer to move forward."

There are several reasons why you might want another answer. It might be that you've noticed a pattern in the answers, but that pattern could have appeared by chance on the subset of answers you have, so you want another answer in order to test the predictive power of your pattern. It might be that you just don't see any pattern yet so you need more answers to free associate with. It might be that you are very confident on the mechanic but don't have enough answers to actually extract from. In any case, testsolvers should start with a set of answers that's almost assuredly too small to solve from, and then work their way up until they have a complete solve.

However, there are potential pitfalls to this approach. If the round is composed of mostly easy puzzles, then real teams might acquire answers faster than they get stuck, producing a solving experience that you wouldn't find with this type of testsolving. You also might find that your testsolvers asked for answers faster than they really needed to, since it's easy to feel stuck without actually being stuck. When we were writing the 2015 hunt, I think we started testsolvers with about half of the answers to the round's puzzles, and then they could ask for additional answers one at a time as they got stuck. It worked pretty well, but as always I'd be interested in hearing about alternative methods and how they've turned out.

## Puzzle Quality

Shortly after last year's hunt, I wrote a blog post about what I feel constitutes a good puzzle. The puzzles in this year's hunt were all very well constructed and clean, and going through the list of points in my old post, I'd say that Setec's puzzles nailed every single one except the last. In other words, the puzzles were good but not great.

The greatest puzzles in my mind are those that push the boundary of what it means to be a puzzle. Examples from the past are puzzles that change without telling you, puzzles that have more than one answer, puzzles that are based on statistical analysis rather than constant data, puzzles that clearly have too much data to go through by hand, and many more.

There was no puzzle that I worked on this past weekend where I got the feeling of "Wow, I can't believe that someone actually had the idea to write this puzzle." While the puzzles were very clean, they also felt very safe, sticking to tried and true puzzle mechanics rather than exploring new ones.

There's a parallel here to the game development industry. Many games are variations on an old theme. Many first-person shooters have essentially interchangeable gameplay, with the main differentiator being the story around them. These games can be good, or even very good, but they won't leave a legacy as genre-defining classics the way games like Ocarina of Time or Super Mario 64 have. Nintendo in particular has a philosophy of making sure that when they publish a new game, it has something unique that makes it stand out among all of the games that came before it. They try to stretch the idea of gaming with fundamentally new ways to play, rather than putting a new story on an old game.

What I'd like to see is for writing teams to be willing to take these risks and produce great puzzles rather than merely good ones. Attempts at new ideas will fail here and there, but the payoff is completely worth it. While I don't necessarily want to see another hunt go into Monday afternoon, if that happens because of an ambitious attempt at innovative puzzle writing, I think we should be encouraging that behavior.

## Plot and Unlock Structure

While the main focus of the hunt is the individual puzzles and their interactions via metapuzzles, there is quite a lot of support around the puzzles that Setec did extraordinarily well and should be mentioned. The first of those is the plot.

Setec chose a plot about a DnD group getting trapped in their game world by an evil sorcerer, and the hunt was a quest to defeat the villain in order to rescue the players and the dungeon master. This plot idea came from their desired unlock structure, in which a team would have multiple values that could increase, and different puzzles would require thresholds on different values. They manifested this idea as the levels of the characters in a party, which initially had 3 members but grew to 6. With this unlock structure, teams could choose which puzzles to prioritize based on what they would unlock.

I think that the unlock structure worked well, but I would give a few cautions to future writing teams. First, just like with puzzles, a great Mystery Hunt will stretch the idea of what it means to be a Mystery Hunt. Make a structure that stands out among all the other Mystery Hunts. Hunt shouldn't be a series of one round after another, but rather a cohesive experience. Be cognizant of how your round structures affect that experience and adjust accordingly.

Second, think about how many puzzles you want to have open at a given time. This year's hunt opened extremely wide. Supposedly at one point Death and Mayhem (the winning team) had a whopping 51 unsolved puzzles unlocked. For comparison, I believe we tried to keep that number around 20 for top teams in the 2015 hunt. A wide open hunt means that larger teams have an advantage. Manpower doesn't scale linearly on a single puzzle, but it comes really close to scaling linearly as long as there are puzzles that nobody is working on. It's not healthy for the hunters to group into 250-man "superteams", and writing a hunt where a superteam is not particularly advantaged is a good way to disincentivize that behavior.

I'd also like to point out that the plot choice gave Setec a lot of freedom as to the themes of their individual rounds. While each round might be thought of as a "quest", there can be a wide variety of quests drawing from many different sources of inspiration. This is one of the properties of a theme that I pushed for most strongly when writing our hunt, which ended up being an underwater expedition. By creating your own fictional world, it is much easier to fit in a variety of puzzle types and appeal to a wider solving audience.

A plot based on existing works is quite limiting, and you can see that in several past hunts. For example, the 2012 hunt was based around The Producers and Broadway in general, leading to a very polarizing experience where people into Broadway loved it and people not into Broadway felt left out (I was one of those people who felt left out).

## Hunt Tech

The technology behind Mystery Hunt is the other big supportive aspect of running it. A broken or slow website makes hunt less enjoyable, in part because it breaks the immersion. This year's website worked amazingly well, and the main quirk that I noticed was that the people calling teams from the answer queue were told whether the submitted answer was correct or incorrect, but not what the answer was. It wasn't a huge deal because the website tracked everything as well, but it did make the calls feel less useful than they could have been.

So to all future hunt writers: Please continue to invest your resources into making the hunt run smoothly! I know how much goes on behind the scenes and how much is liable to break, but a polished and functioning website adds a lot to the solving experience.

## Overall Review

Last year I started a list of the hunts that I solved onsite ranked by how much I enjoyed them. This list is here to be a guide for future hunt writers to understand the kinds of hunts that I enjoy, and to have an idea of which hunts to look at for examples of what to strive for. I hope that hunt writers can avoid taking low rankings personally. Writing hunt is always a learning experience and it's entirely understandable when stuff doesn't go quite right. I will also mention that this ranking is roughly based on my feelings about the hunts at the time that they debuted, so while I hope it is a rough indicator of the success of the hunt, it may be thrown off by effects like the novelty of the first few years.

1. Escape from Zyzzlvaria (2009)
2. Coin Heist (2013)
3. Time Travel (2010)
4. Video Games (2011)
5. Alice in Wonderland (2014)
6. Dungeons and Dragons (2017)
7. Huntception (2016)
8. The Producers (2012)