22 July 2017

In my opinion, one of the most important tenets of good communication is to say what you mean. One very common situation where people falter on this front is when there are two related words that aren't quite the same. To communicate most effectively, it is worthwhile to examine those words, understand the difference between them, and then consistently apply the more appropriate word.

I've written on one of these situations in the past, namely that people often conflate what is being done and how it is being done, especially in the contexts of mathematics and programming. Today I'm writing about another case: the difference between something being hard, and something being unlikely.

First let me explain the distinction as I see it. Something is hard if doing it requires a significant amount of effort or skill. For example, it is hard to run a mile in under 4 minutes. On the other hand, something is unlikely if it happens very infrequently. For example, it is unlikely to win the jackpot prize of a lottery.

At the same time, it's clear that winning a jackpot requires very little effort. For most lotteries, you buy a ticket and see if you won. For non-jackpot prizes, you can improve your results by buying many tickets spread out to cover as many number combinations as possible, or by other methods such as timing your ticket purchases for when their expected value exceeds their cost. In those cases, it would be reasonable to attribute some amount of skill to playing the lottery. But in the specific case of a jackpot prize, you typically need to get every number correct, and so you cannot positively affect your chances of winning.

To some extent, we can estimate how skill-based an event is by studying how repeatable it is. A person is no more likely to win a jackpot because they have won one before. On the other hand, someone who has run a four-minute mile can realistically be expected to run one again much more often than someone who hasn't.
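To make the repeatability idea a little more concrete, here is a minimal sketch. It assumes each person has a fixed chance of success per attempt and that attempts are independent given the person. The function name `repeat_lift` and all of the numbers are made up for illustration. It computes how much more likely a past success makes a future one, compared to the population base rate.

```python
from fractions import Fraction

def repeat_lift(success_probs):
    """P(success on 2nd try | success on 1st) divided by the base rate.

    success_probs holds one per-attempt success probability per person
    (hypothetical values). A result of 1 means a past success tells you
    nothing; a larger result means success is repeatable.
    """
    n = len(success_probs)
    base = sum(success_probs) / n                   # P(success)
    both = sum(p * p for p in success_probs) / n    # P(success twice)
    return (both / base) / base

# Lottery: everyone has the same tiny chance (illustrative number).
lottery = [Fraction(1, 1000)] * 4

# Four-minute mile: most people cannot do it at all, and one trained
# runner manages it half the time (illustrative numbers).
milers = [Fraction(0), Fraction(0), Fraction(0), Fraction(1, 2)]

print(repeat_lift(lottery))  # 1
print(repeat_lift(milers))   # 4
```

In this toy model the lottery shows no lift at all, while a previous four-minute mile makes another one four times as likely as the base rate. The exact values mean nothing; the point is that the lift comes entirely from differences between people.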

One personally relevant example is trainer fights in Pokemon Red, especially with regard to speedruns. It's common to hear someone say "Misty is one of the hardest fights." But of course, this is wildly inaccurate. Misty is in fact one of the easiest fights. You simply need to select thrash as your first move, and then mash through text. What they mean, of course, is that Misty is one of the fights where you are most likely to die.

But there are legitimately difficult fights in a Pokemon Red speedrun. The one that comes to my mind is the rival fight in Silph Co. Winning the fight is fairly straightforward: you set up with a few items and then one-shot all of the rival's pokemon. However, in the context of a speedrun, you don't just want to win the fight. You want to win the fight while ending in what is known as "red bar", which requires being below 5/24 of your maximum HP.

In order to maneuver into a red bar situation, there are several different sequences of items and moves that can be used. However, each turn contains several random components. The opposing pokemon will use a random move, which could either cause damage or not. If it does select a damaging attack, that attack will do a random amount of damage within a known range. Executing the fight properly is very difficult, as it requires making snap judgments about which sequence of moves will give the greatest probability of success.
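As a rough sketch of the kind of calculation involved, the snippet below checks the red bar condition (below 5/24 of maximum HP, and still alive) and computes the chance that a single incoming hit leaves you there. It assumes every damage value in the known range is equally likely, which is a simplification and not necessarily how the game rolls damage. The function names and HP numbers are hypothetical.

```python
from fractions import Fraction

def in_red_bar(hp, max_hp):
    # Alive and strictly below 5/24 of max HP.
    # Integer arithmetic avoids any rounding questions.
    return hp > 0 and 24 * hp < 5 * max_hp

def red_bar_chance(hp, max_hp, min_damage, max_damage):
    """Chance that one hit leaves us in red bar.

    Assumes each damage value from min_damage to max_damage
    (inclusive) is equally likely; the real game may differ.
    """
    rolls = range(min_damage, max_damage + 1)
    good = sum(1 for d in rolls if in_red_bar(hp - d, max_hp))
    return Fraction(good, len(rolls))

# With 120 max HP, red bar means 1 to 24 HP remaining.
print(red_bar_chance(50, 120, 20, 35))  # 5/8
print(red_bar_chance(40, 120, 20, 35))  # 1
```

In this made-up case, healing to 50 HP rather than 40 before taking the hit turns a guaranteed red bar into a 5/8 chance. Weighing trade-offs like that across several turns, in real time, is where the difficulty comes from.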

Even perfect play at the Silph Rival fight will end in a bad outcome sometimes, be it death or not being in red bar. However, with practice and a strong understanding of how the probabilities in the battle work, the chances of success can be increased substantially. So Silph red bar is something that is hard, as well as somewhat unlikely.

For another example, look at this article about splits in bowling. The article very clearly conflates hard and unlikely. In the entirety of the discussion, the focus is on how frequently or infrequently a spare is made given the pins that are left. At no point does it talk about what area of the pins the bowler is aiming for, or what sort of entry angle is required, or anything like that. The closest it gets is the quote from Parker Bohn III, where he suggests that a 7-10 split doesn't require very much from the bowler.

Now, I haven't made a 7-10 split, but the impression I get is that the bowler is essentially throwing their ball the same way as they would to pick up just one of the two pins, and sometimes that pin will pop out of the back and knock the other pin down. On the other hand, if you look at the 6-7-10, which is converted 7 times more frequently, the bowler needs to hit the very right side of the 6 pin so that it will fly over and knock down the 7 pin. Definitely a precise and difficult shot.

So should we say that the 6-7-10 is harder than the 7-10? Probably not, but really the data that we have doesn't tell us either way. First, we'd need to know if there is any entry angle and position at all that would consistently convert the 7-10. If there is, then a bowler could potentially aim for it. But if there isn't, then the shot is essentially a lottery, and it would not be sensible to call it hard, or at least not much harder than hitting the 7 or 10 pin in the first place.

While I'm on the topic, I think it's also important to stress that the data on spare conversions is inherently tainted. When a professional bowler leaves a tough split like a 6-7-10, a 4-7-10, or a 4-6-7-10, they will very often not attempt to convert it and instead aim to collect two of the pins with high probability rather than take an all-or-nothing risk. As a result, the conversion rate of splits like the Greek Church or Big Four will be deflated. Is the 7-10 deflated more or less than the Greek Church? The data in the article gives us no way to know.
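A small sketch shows why the deflation matters. It assumes a split that is not attempted is never converted, so the recorded conversion rate is the fraction of leaves that are attempted times the conversion rate when attempted. The function name and both sets of numbers are invented for illustration and are not taken from the article.

```python
from fractions import Fraction

def observed_rate(attempt_fraction, rate_when_attempted):
    # Recorded conversions per leave, assuming unattempted
    # splits are never converted.
    return attempt_fraction * rate_when_attempted

# Split A: always attempted, converted 1 time in 100.
split_a = observed_rate(Fraction(1), Fraction(1, 100))

# Split B: attempted 1 time in 10, converted 1 time in 10 when attempted.
split_b = observed_rate(Fraction(1, 10), Fraction(1, 10))

print(split_a, split_b)  # 1/100 1/100
```

Both hypothetical splits show up in the data with the same conversion rate, even though one is ten times more makeable when the bowler actually goes for it. Without knowing the attempt fraction, the recorded rate can't separate the two.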

The third example is perhaps the most impactful. If you read Hacker News like I do, you'll have surely seen discussions about the startup success rate. There are many people who see the statistic that over 90% of startups fail and conclude that the startup industry is essentially a lottery. They use this idea to suggest that people who have been part of a startup failure should not feel bad.

It seems reasonable to me that startups have some degree of luck, or uncontrollable factors that are indistinguishable from luck. However, it seems implausible that anyone would run their startup to perfectly maximize their chances. While a failure may not have directly been caused by a mistake from the founders or employees, it should be a reason to think about what could have been done better, especially if the involved people plan to work at a startup again.

Interestingly, this is sort of the opposite of conflation. Here the low probability is being treated as a sign that the event is independent from skill. In reality, of course, the probability of success and the amount of skill involved are rather orthogonal. There are plenty of things that take a lot of skill, but are very consistent once you have that skill. On the other hand, there are also plenty of things that are very unlikely, but take essentially no skill whatsoever.