3 August 2016

Recently, I saw two articles complaining about mathematical notation. The first complained about the fact that \( x^n \), \( \log_x(y) \), and \( \sqrt[n]{y} \) have very different notations, despite the fact that they are all related to the single equation \( x^n = y \). The second complained about the notation for expected value, as well as the fact that there are multiple competing notations for the same thing.

Now, I'm of the opinion that mathematical notation is pretty great, and part of the reason I like Haskell so much is that its syntax is reminiscent of mathematical notation. In fact, I am generally looking for ways to make my programming more like math, whereas most people seem to go the other way. So I thought it would be interesting to think about why I find mathematical notation so great.

The purpose of notation

In order to fairly consider proposed notational changes, we first have to know what we're trying to get out of our notation. It's very important to realize that notation was not created to be a tool to scare off newcomers or anything like that.

In fact, some notation is so ubiquitous that we teach it to everyone, such as writing numbers in Arabic numerals. We could write 1527 perfectly well as "one thousand five hundred twenty-seven", but the numeral format is significantly easier to work with and understand once you get used to it. Children do take some time to get used to the idea of "place values" and so on before being comfortable with writing numbers this way, but I think nearly everyone agrees that it is worth the effort.

With more advanced notation, we should be considering it the same way. So what exactly are the payoffs that we can expect from good notation?

Notation should help understanding existing ideas.

Fundamentally, written mathematics is about communicating ideas. The author has some idea in mind, and wants the reader to understand it as well. A long time ago, people attempted to communicate these ideas purely in words, but over time it became clear that notation could assist in communicating the ideas more rapidly.

As an example, we can compare the quadratic formula given by Brahmagupta to the way it's presented with modern algebraic notation:

To the absolute number multiplied by four times the [coefficient of the] square, add the square of the [coefficient of the] middle term; the square root of the same, less the [coefficient of] middle term, being divided by twice the [coefficient of the] square is the value.

We could translate this to modern notation (note that this procedure only defines one solution) as

A solution to \( ax^2 + bx = c \) is \( x = \frac{\sqrt{4ac + b^2} - b}{2a} \).

It is significantly easier to understand the modern version, despite the fact that Brahmagupta's version doesn't use any words that you don't already know. That's a sign that introducing this notation is valuable.

Notation should help generating and understanding new ideas.

The ideas to communicate have to come from somewhere, and so a good notation should be helpful during the process of coming up with these ideas. To use algebraic notation as an illustrative example again, it is possible to derive the quadratic formula with some algebraic manipulations:

  • \( ax^2 + bx + c = 0 \)
  • \( x^2 + \frac{b}{a}x + \frac{c}{a} = 0 \)
  • \( x^2 + \frac{b}{a}x + \frac{b^2}{4a^2} + \frac{4ac - b^2}{4a^2} = 0 \)
  • \( (x + \frac{b}{2a})^2 = \frac{b^2 - 4ac}{4a^2} \)
  • \( x + \frac{b}{2a} = \frac{\pm\sqrt{b^2-4ac}}{2a} \)
  • \( x = \frac{-b \pm \sqrt{b^2-4ac}}{2a} \)
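
The derivation above can be sanity-checked numerically. Here is a small Python sketch (the name quadratic_roots is mine, not standard) that implements the final formula, assuming \( a \neq 0 \) and a non-negative discriminant:

```python
import math

def quadratic_roots(a, b, c):
    """Roots of a*x^2 + b*x + c = 0, using the formula derived above.

    Assumes a != 0 and a non-negative discriminant (real roots only)."""
    disc = b * b - 4 * a * c
    root = math.sqrt(disc)
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))

# x^2 - 5x + 6 = 0 factors as (x - 2)(x - 3), so the roots are 3 and 2.
print(quadratic_roots(1, -5, 6))  # (3.0, 2.0)
```

Checking the result against a factored polynomial like this is a quick way to convince yourself that no sign was dropped during the algebraic manipulations.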

I'll leave trying to express this chain of deductions with only words as an exercise for the reader who remains unconvinced that notation is meant to be helpful.

At first blush, it might seem acceptable to have two notations: one for generating an idea and another for communicating it. But I believe that would just be counterproductive. As with memorizing algorithms, memorizing formulas doesn't get you very far in math. True understanding involves internalizing the motivations and concepts behind the results. In other words, the reader should be interested not only in learning what the author came up with, but also how they came up with it, or, in some cases, in a more efficient path to the same results than the one the author took.

Notation should be precise.

Natural language is often fuzzy, though we don't necessarily notice it day to day. While rendering math in words might work for simple cases, it breaks down quite quickly. Here's an example: "eleven minus five" is fine; it means \( 11 - 5 \). Now what about "eleven minus five minus three"? Well, it probably means \( 11 - 5 - 3 \), but maybe it means \( 11 - (5 - 3) \), which is quite a different beast.

Maybe we introduce the phrase "the quantity" to distinguish these two. Now "eleven minus five minus three" is \( 11 - 5 - 3 \), and \( 11 - (5 - 3) \) would be "eleven minus the quantity five minus three." But this convention only holds up for a moment: does "eleven minus the quantity five minus three minus two" mean \( 11 - (5 - 3) - 2 \) or \( 11 - (5 - 3 - 2) \)? A good notation makes these distinctions clear.
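
Programming languages resolve exactly this ambiguity with associativity rules. A minimal Python illustration of the two readings:

```python
# Subtraction is left-associative, so the ungrouped phrase
# "eleven minus five minus three" is read as (11 - 5) - 3.
left_assoc = 11 - 5 - 3   # (11 - 5) - 3

# "eleven minus the quantity five minus three" needs explicit grouping.
grouped = 11 - (5 - 3)

print(left_assoc, grouped)  # 3 9
```

The two readings differ by six, which is precisely why natural language alone can't be trusted to carry the distinction.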

Non-purposes of notation

These three goals are great, but it's also useful to think about what would not be a good argument in favor of a notational change.

Notation does not need to be independent of context.

There are different amounts of context. For example, English has multiple words spelled "read". Which word is meant depends on the surrounding context (and sometimes even that isn't enough). That sort of ambiguity is a legitimate negative, because "read" and "read" can often appear in similar situations.

However, there are other kinds of context that really shouldn't be considered issues. For example, you hear someone say "Hi." Maybe you think that they are greeting someone. But what if you're in Japan? Maybe they actually said 「はい」. The context of which language you are listening to is important for understanding the meaning behind words. The fact that English and Japanese have words that are pronounced the same isn't an issue with either language.

Similarly, it's not an issue that the word "coerce" has very different meanings in programming and in law. It's not an issue that the word "force" has a different meaning in physics from economics. It also is not an issue that the symbol \( \wedge \) can mean either the boolean and operator or the exterior product, because those two concepts almost never appear in the same context.

In the cases where multiple different meanings for a symbol are used in close proximity, it may be useful to have notation where there is an option to make the meaning and/or context explicit. This occurs, for example, in category theory. When you work with a single category, there is no ambiguity about what is meant by \( \text{Hom}(X, Y) \). But if you have multiple categories, you'll often see a subscript to emphasize which category the morphisms belong to (e.g. \( \text{Hom}_D(FX, Y) \)), despite the fact that it could be determined by the arguments.

Notation may require education to understand.

Just as children take time to understand how place values work when we write numbers with Arabic numerals, it is perfectly fine for a notation to have some aspects that aren't immediately obvious to a newcomer, as long as learning those features serves a purpose. A common pattern in criticisms of syntax or notation is that the critic has not understood the underlying concepts deeply enough to realize that a complication is fundamental rather than artificial. Again, you might think that a separate notation that elides the complication would be beneficial, but it would just end up requiring people to eventually unlearn that notation, and cause confusion because they aren't used to thinking about those difficulties.

Responses to a few notational criticisms

So I want to close out this post by responding to a couple of the common complaints I see about mathematical notation.

Why is everything one letter?

This is probably the most common complaint about math that I see from programmers. There are a few important realizations to make. First, equations and other things written with this kind of notation generally are emphasizing the relationships and interactions between multiple conceptual objects, and are not interested in the objects themselves. While a longer name may make it more clear what a particular object is, that starts interfering with seeing its connections to the other objects.

Additionally, using single symbols often leads to more visually distinct names. For example, mathematicians often write composition as \( f \circ g \), meaning the function such that \( (f \circ g)(x) = f(g(x)) \). A programmer might object to the use of the single letter names \(f\) and \(g\), and maybe also object to the symbol \(\circ\). Maybe we propose the alternative "compose function1 function2". Incidentally, the fact that the functions are numbered is potentially confusing, since function2 is applied before function1. But furthermore, it is much easier to visually distinguish \( f \circ g \) from \( g \circ f \) than it is to distinguish "compose function1 function2" from "compose function2 function1".
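
To make the order-of-application point concrete, here is a small Python sketch (the names compose, double, and increment are mine, chosen for illustration):

```python
def compose(f, g):
    """Return f ∘ g: the function that applies g first, then f."""
    return lambda x: f(g(x))

double = lambda x: 2 * x
increment = lambda x: x + 1

# (double ∘ increment)(3) = double(increment(3)) = double(4) = 8
print(compose(double, increment)(3))  # 8

# (increment ∘ double)(3) = increment(double(3)) = increment(6) = 7
print(compose(increment, double)(3))  # 7
```

Even in code, the single-letter convention survives inside compose itself: f and g are the clearest possible names precisely because the definition is about the relationship between two arbitrary functions, not about what those functions do.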

The other main realization is that math notation isn't meant as a complete replacement for prose. Writing an equation in a math paper is much more like including an illustration than including a paragraph of text. It is a compact representation that illustrates the concept, and not something that stands alone. Once a reader becomes familiar enough with the conventions of a field or a particular author, they might be able to start focusing purely on the parts written in notation, but an author really does not expect you to guess what a particular letter stands for unless it's already a field-wide standard (and even then, they will often define it).

The one letter variable names aren't there to scare people off from learning. On the contrary, they are there because compact notation is more amenable to rapid manipulation. Finding a new idea that works often involves trying out many ideas that don't. Therefore, we prefer notation that allows this process to occur faster.

The same word/symbol is used all over the place!

It's understandable to be confused at first why we might use the symbol \( \times \) for multiplying numbers (at least in elementary school), but then also use it for the cartesian product of sets. Not even exponentiation is safe! If \( X \) and \( Y \) are sets then \( X^Y \) means the set of functions from \( Y \) to \( X \). And in this context, "3" will often mean the set \( \{0, 1, 2\} \).

In reality, most of the time when a symbol is used in multiple ways, it's completely on purpose. Similarity of notation is used to emphasize similarity between concepts. The set of functions from \( Y \) to \( X \) really does look like an exponent. It's even called an exponential because of these similarities.

Using familiar notation for a new concept is very powerful. It sets up expectations for things that should be true. For example, in the real numbers we have \( (xy)^z = x^zy^z \). With set exponentials, the analogous fact \( (X \times Y)^Z \simeq X^Z \times Y^Z \) is also true. Aligning our expectations with a more familiar context aids in finding proofs for new theorems. If I want to prove a statement about set exponentials, I can often think about how I would prove the corresponding statement for numbers, and then translate the proof.
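
For finite sets, the analogy can be checked by brute force. A Python sketch (the helper name functions is mine, not standard) that enumerates \( X^Y \) and compares cardinalities:

```python
from itertools import product

def functions(Y, X):
    """All functions from Y to X, each represented as a dict: the set X^Y."""
    Y, X = list(Y), list(X)
    return [dict(zip(Y, values)) for values in product(X, repeat=len(Y))]

# In this context, "3" means the set {0, 1, 2} and "2" means {0, 1}.
X, Y, Z = {0, 1, 2}, {0, 1}, {0, 1}

# |X^Y| = |X| ** |Y|, so the exponential notation matches the cardinality.
print(len(functions(Y, X)))  # 9, i.e. 3 ** 2

# (X × Y)^Z ≃ X^Z × Y^Z, mirroring (xy)^z = x^z y^z: both sides count 36.
XxY = set(product(X, Y))
print(len(functions(Z, XxY)), len(functions(Z, X)) * len(functions(Z, Y)))  # 36 36
```

The cardinality check is of course weaker than exhibiting the bijection, but it is exactly the kind of expectation-setting the notation buys you.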

I think of this proof technique as "proof by analogy". You don't use the analogy when writing up the actual proof, but it helps immensely in thinking it up. When proving lemmas about arbitrary rings, it is often (though not always) the case that a proof for the integers directly translates to a proof for arbitrary rings. Sometimes you need to use a particular property of the integers, in which case the proof translates only to Noetherian rings, or to unique factorization domains, or to another particular type of ring. Similarly, when you start working with various algebraic objects, if they form a cartesian closed category, many theorems about them can be guided by analogy with the corresponding statements for the category of sets.

A lot of complaints about notation ring hollow because they either don't propose a new notation or their new notation doesn't provide enough improvement over the existing one to warrant introducing a competitor. It can be easy to conclude from this that mathematicians are resistant to new notation. In truth, notation is constantly evolving, and if a notation is superior for both communication and generation of ideas, that notation will stick.

One example of this is Chinese Dumbass Notation. (As of the time of writing, the link is broken. I hope it comes back up, but I might have to recreate the pdf before the contents disappear for good.) I wasn't the first person to come up with writing three-variable homogeneous polynomials as triangular arrays, but I believe I was the first person to really write up examples of how the notation helps with proving inequalities faster, and many math olympiad competitors have since adopted the notation because they have found it helpful.