10 February 2016

Coming from a heavy math background, a lot of the culture in the software development world has been surprising. So many ideas that I thought were standard seem to be almost unheard of. One of these ideas is the distinction between "What is it doing" and "How is it doing it." Moreover, many ideas have more than one way to describe what they are, and changing viewpoint can be invaluable for solving problems.

Triangle Centers

Let's start with an example from geometry. What is the circumcenter of a triangle? It can be defined as the center of the circle passing through the vertices of the triangle. Notice that this definition said nothing of how to actually find a circumcenter.

So if we have a triangle, how do we find a circumcenter? We draw the perpendicular bisectors for two of the sides and their intersection is the circumcenter. How do we draw the perpendicular bisector for a line segment? We draw a circle centered at each vertex with the segment as its radius, then those two circles will intersect at two points and the line between them is the perpendicular bisector.

But these algorithmic descriptions strongly hide a lot of the structure that is present in the constructions. Consider the following definitions:

  • A circle is the set of points that are a specified distance (called the radius) from a specified point (called the center).
  • The perpendicular bisector of a line segment is the set of points that are the same distance from each of its vertices.
  • The circumcenter of a triangle is the point that is equally distant from its three vertices.

These are all statements that could in theory be derived from the constructions, but are not at all obvious. And yet these are the definitions that are the easiest to reason about. For example, why should a circumcenter even exist? The algorithmic definition tries to hide this because if you only draw two perpendicular bisectors then they'll intersect at a well-defined point, but there's still the question of whether that point depends on which two sides you decide to bisect.

But when you look at this set of definitions, it's not too difficult to prove that the circumcenter exists and that the algorithm above produces it regardless of which two sides you choose to bisect:

Let the vertices of the triangle be \( A \), \( B \), and \( C \). Consider the intersection of the perpendicular bisectors of \( AB \) and \( AC \) and call it \( O \). By the definition of the perpendicular bisector, \( OA = OB \) and \( OA = OC \). Therefore, we also have \( OB = OC \), so \( O \) also lies on the perpendicular bisector of \( BC \). Therefore, it did not matter which two bisectors we chose. Furthermore, since \( OA = OB = OC \), there is a circle centered at \( O \) passing through \( A \), \( B \), and \( C \).

This illustrates a pattern that shows up many times in mathematics, where a particular concept has multiple definitions or viewpoints. Often, the different definitions will be useful for different things. In this case, we have:

  • The circumcenter of a triangle is the intersection of the perpendicular bisectors of the sides.
  • The circumcenter of a triangle is the center of the circle passing through the triangle's vertices.
  • The circumcenter of a triangle is the point that is equidistant from the triangle's vertices.

All of these definitions are equally valid, and all are useful to consider, depending on the circumstance.

Invertible Matrices

This idea is embodied to an extreme extent in the invertible matrix theorem which says that the following are all perfectly good definitions of an \( n \times n \) invertible matrix (a subset of the list in the linked article):

  • There exists a matrix \( B \) such that \( AB \) is the identity matrix.
  • The determinant of \( A \) is nonzero.
  • For \( x \in \mathbb{R}^n \), if \( Ax = 0 \) then \( x = 0 \).

These can be thought of simply as properties of invertible matrices, but that is only really half of the story. Any one of these properties can also be used for proving that a matrix is invertible. Depending on the circumstance, any of these definitions might be most convenient.

However, once again notice that most of these definitions are not algorithmic. Imagine that you were given a formula for the determinant for a matrix, and told that a matrix is invertible if it is nonzero. Sure, you can execute the algorithm and successfully determine whether a matrix is invertible or not, but you really have no idea about what invertibility means or why it is important.

In reality, some of these definitions are better for establishing that a matrix is invertible, some are better for establishing that a matrix is not invertible, and some are very useful once you know that a matrix is invertible but hard to prove on their own.

Data Structures

I have found that it is common for people to use "priority queue" and "heap" interchangeably. This is a case where the what and the how are being conflated.

A priority queue is a data structure that supports the following operations:

  • Insert an element
  • Find the minimum element
  • Delete the minimum element

Nothing is said about how any of these operations work internally. The concept of a priority queue is very much about the what.

On the flip side, a heap is a specific strategy often used for implementing a priority queue. A heap is a tree (usually, but not always binary) where each node has a smaller value than all of its children.

It is worth mentioning that the definition I just gave is also a description of what a heap is, and not how to build one or how it is used. At one level, the heap is the what, and the actual implementation of the heap operations is the how. But there's also a level where the priority queue is the what, and the heap invariants are the how.

After all, there are plenty of other ways of implementing priority queues, and not all priority queues behave the same way. For example, you could use a binary search tree as a priority queue. Or you could use a Brodal queue. Or you could even have a list that gets continually scanned in order to find the minimum.

Universal Properties

Going back to math, I think nothing really illustrates the separation of what and how more than the idea of universal properties. For example, the product of two sets is often defined as \( A \times B = \{ (x, y)\ |\ x \in A, y \in B \} \).

However, it can also be seen that the product of two sets \( A \) and \( B \) is the set \( X \) with two maps \( \pi_1 : X \to A, \pi_2 : X \to B \) such that if we have any set \( T \) with maps \( f_1 : T \to A \) and \( f_2 : T \to B \), then there is a unique map \( g : T \to X \) such that \( f_1 = \pi_1 \circ g \) and \( f_2 = \pi_2 \circ g \).

If we start with the explicit construction, we can see that it satisfies the universal property because we can take \( \pi_1((x,y)) = x \) and \( \pi_2((x, y)) = y \), and then \( g(t) = (f_1(t), f_2(t)) \) is the unique function satisfying the required equations.

But in a sense, the universal property is really "what" the product is. It doesn't make any reference to the elements of a set, so it allows us to understand the product of sets while looking just at sets themselves and maps between them. That allows it to give a much more general understanding of products, and the universal property formulation generalizes to contexts where we don't have elements, or where taking tuples doesn't give us what we want.

What Does This Code Do?

Finally, let's get to what caused me to write a post on this topic in the first place. In discussions on the Go programming language I often see the opinion that Go's simplicity makes programs easier to understand. For example, these three Hacker News comments illustrate the opinion:

In my experience working with Go, the language offers essentially no help in understanding what a piece of code does, and instead has code that is a description of how it is done. Furthermore, the language actively impedes attempts to factor out common structure to make it more clear what is being done.

Here is a piece of code taken from the Docker source code.

// FilterByHosts filters the list of PublicKeys to only those which contain a
// 'hosts' pattern which matches the given host. If *includeEmpty* is true,
// then keys which do not specify any hosts are also returned.
func FilterByHosts(keys []PublicKey, host string, includeEmpty bool) ([]PublicKey, error) {
    filtered := make([]PublicKey, 0, len(keys))

    for _, pubKey := range keys {
        var hosts []string
        switch v := pubKey.GetExtendedField("hosts").(type) {
        case []string:
            hosts = v
        case []interface{}:
            for _, value := range v {
                h, ok := value.(string)
                if !ok {
                    continue
                }
                hosts = append(hosts, h)
            }
        }

        if len(hosts) == 0 {
            if includeEmpty {
                filtered = append(filtered, pubKey)
            }
            continue
        }

        // Check if any hosts match pattern
        for _, hostPattern := range hosts {
            match, err := filepath.Match(hostPattern, host)
            if err != nil {
                return nil, err
            }

            if match {
                filtered = append(filtered, pubKey)
                continue
            }
        }
    }

    return filtered, nil
}

This code is well documented with its intention, but imagine you're doing a code review, and think about how much of this function you need to look at to determine that it is in fact doing a filter.

  • It loops over every element of keys.
  • The only shared state between iterations of the loop is the output array.
  • The only operation being done to the output array is appends.
  • Each element is appended at most once.
  • Whether an element is appended is independent of what other elements have been appended.

When just reading the code, identifying that it is filtering an array requires checking the entire loop. Here's how this code might be written with hypothetical filterOrError and anyOrError functions:

// FilterByHosts filters the list of PublicKeys to only those which contain a
// 'hosts' pattern which matches the given host. If *includeEmpty* is true,
// then keys which do not specify any hosts are also returned.
func FilterByHosts(keys []PublicKey, host string, includeEmpty bool) ([]PublicKey, error) {
    return filterOrError(
        func(key PublicKey) (bool, error) {
            var hosts []string
            switch v := pubKey.GetExtendedField("hosts").(type) {
            case []string:
                hosts = v
            case []interface{}:
                for _, value := range v {
                    h, ok := value.(string)
                    if !ok {
                        continue
                    }
                    hosts = append(hosts, h)
                }
            }

            if len(hosts) == 0 {
                if includeEmpty {
                    return true, nil
                }
            }

            return anyOrError(
                func(hostPattern string) (bool, error) {
                    return filepath.Match(hostPattern, host)
                },
                hosts,
            )
        },
        keys,
    )
}

This code says right up front that it is filtering the keys parameter, so as a reader of this code I already know that a non-error output will be a subset of the keys in the input, and the inner function is telling me which ones will be included. When I get to the anyOrError call, I can know that the condition is based on whether any of the hosts satisfy a condition, and reading a little further, I can see that the condition is a call to filepath.Match.

In other words, I would push for this rewrite because it is more explicit about what it is doing. You could make the argument that it is hiding how the computation is being done, but those steps are purely mechanical and actually make it harder to understand what is happening, not easier.

For completeness, here are hypothetical implementations of filterOrError and anyOrError. These are not possible to implement in Go as it stands because the type system does not allow for this kind of polymorphism in user written code (some builtins, such as append, do have it).

func filterOrError(func f(A) (bool, error), as []A) ([]A, error) {
    filtered := make([]A, 0)
    for _, a := range as {
        pass, err := f(a)
        if err != nil {
            return nil, err
        }
        if pass {
            filtered = append(filered, a)
        }
    }
    return filtered, nil
}

func anyOrError(func f(A) (bool, error), as []A) (bool, error) {
    for _, a := range as {
        pass, err := f(a)
        if err != nil {
            return false, err
        }
        if pass {
            return true, nil
        }
    }
    return false, nil
}