Merriam-Webster defines an algorithm as a set of steps that are to be followed in order to solve a mathematical problem or to complete a computer process. This way of thinking is very forward-oriented. You start from some inputs, do some operations, create some output. Thinking this way you might imagine chaining together algorithms like logic gates.

There is also backward-oriented thinking: reducing one problem to another. For example, suppose you have relatively prime integers \( x \) and \( y \) and want to find integers \( a \) and \( b \) such that \( ax + by = 1 \). Following the idea of the Euclidean algorithm, you might notice that if \( x = qy + r \) then \( y \) and \( r \) will also be relatively prime. Then, if you could find \( a', b' \) such that \( a'y + b'r = 1 \), you could rewrite that equation as \( a'y + b'(x - qy) = (a' - b'q)y + b'x = 1 \). In other words, finding \( a' \) and \( b' \) are sufficient for finding \( a \) and \( b \). That is the idea behind the extended Euclidean algorithm. If forward-oriented thinking suggests chaining algorithms like logic gates, backward-oriented thinking suggests nesting algorithms within each other, and often recursively.

The perspectives don't stop there. One method that I've found repeatedly useful starts from considering the final result, like in backward-oriented thinking, but instead of looking for something that's sufficient to construct the result, to look for properties that are necessarily true about that result. In a sense, it's working forward from the end.

Working forward from the end is different from the other two perspectives I mentioned in that the results will not directly translate into an algorithm. Rather, it sheds light on structure that is inherent to the situation at hand. The understanding gained from investigating that structure illuminates the path from start to finish.

## Shortest Paths

Given a weighted graph and two vertices \( s, t \), find the path starting at \( s \) and ending at \( t \) with the minimal total weight.

For our purposes, we'll interpret edge weights as lengths, so that the minimal total weight path is the one that is "shortest". I have left some aspects of the graph unspecified. For example, is it directed or undirected? Are the edge weights allowed to be negative? These are questions that you probably know to ask from past experience with various shortest path algorithms. Let's put that aside and ask ourselves, "What is a shortest path?"

If there is a negative weight cycle reachable from \( s \) that can reach \( t \), then there is no shortest path from \( s \) to \( t \).

Indeed, in the presence of a negative weight cycle, by travelling around it repeatedly, you can make a path as short as you'd like. The only thing necessary for that to be possible is that there is any path that contains that cycle, and for that you just need to be able to get from the start to any point on the cycle, and from any point on the cycle to the end. Notice that in an undirected graph, a single negative weight edge means that there will be a negative weight cycle, because you can simply traverse that edge in both directions.

When \( s \neq t \), a shortest path from \( s \) to \( t \) must be composed of an edge from \( s \) to \( s' \) and a shortest path from \( s' \) to \( t \), for some \( s' \).

Why is this true? Well consider a shortest path from \( s \) to \( t \) and consider the second vertex along that path (the first one that isn't \( s \)). That vertex will be \( s' \). The remainder of the path must be a shortest path from \( s' \) to \( t \), or else we could swap that out to get a shorter path than the one we started with.

Similarly, we could imagine breaking up a path from the end, or any point in the middle.

When \( s \neq t \), a shortest path from \( s \) to \( t \) must be composed of a shortest path from \( s \) to \( t' \) and an edge from \( t' \) to \( t \), for some \( t' \).

If \( v \) is on a shortest path from \( s \) to \( t \), then that path is composed of a shortest path from \( s \) to \( v \) and a shortest path from \( v \) to \( t \).

Let's define the distance between two points as the length of a shortest path between them (\( -\infty \) if there are arbitrarily short paths, and \( \infty \) if there are no paths). We'll write the distance between \( s \) and \( t \) as \( d(s,t) \).

For any vertex \( v \), \( d(s, t) \leq d(s, v) + d(v, t) \).

This is a restatement of an obvious fact. Going from \( s \) to \( v \) to \( t \) is just another way of going from \( s \) to \( t \), and merely adding constraints can't produce a shorter path. But let's go a bit deeper.

If \( v \) is on any shortest path from \( s \) to \( t \), then \( d(s, t) = d(s, v) + d(v, t) \).

This is essentially a distance version of the idea that we can decompose a shortest path into two shortest paths by splitting it at any intermediate point. Similarly, we could break it right after the start, or right before the end. Let \( e(x,y) \) denote the length of an edge between \( x \) and \( y \).

When \( s \neq t \), there is some \( s' \) such that \( d(s, t) = e(s, s') + d(s', t) \).

When \( s \neq t \), there is some \( t' \) such that \( d(s, t) = d(s, t') + e(t', t) \).

Let's take that last version and refine it a bit.

When \( s \neq t \), \( d(s, t) = \min_{t'} d(s, t') + e(t', t) \). Furthermore, if all of the edge weights are positive, then we have \( d(s, t) = \min_{d(s, t') < d(s, t)} d(s, t') + e(t', t) \).

And thus we've essentially reached Dijkstra's algorithm. To compute shortest paths, we start from a source and work our way outward, ordering vertices by their distance from the source. By visiting vertices in that order, we ensure that the distance of the next vertex depends in a simple way on the edge weights and the distances that we've already computed. Of course, that order isn't known ahead of time, but it can be determined over the course of the computation.

Every shortest path algorithm that I've seen follows a similar pattern of embodying some particular facts about shortest paths. Sometimes these are quite general, such as Bellman-Ford which uses only facts that are still truei n the presence of negative edge weights. Other times they are quite specific, such as breadth-first search, which assumes that all the edges have the same weight.

## Maximal Vector Sums

Given a set of vectors in \( \mathbb{R}^2 \), find the subset whose sum has the largest magnitude.

I came across this problem in the context of competitive programming, though the key insight is identical to that of IMO 1997 Shortlist Problem 3. In the spirit of working forward from the end, we want to ask ourselves, "What are properties that the maximizing subset must satisfy?"

Certainly, it must be the case that adding another vector does not increase its magnitude.

Let \( u \) be the vector sum of a maximizing subset, and \( v \) be any vector not in that subset. Then \( |u + v| \leq |u| \).

The magnitude of a vector sum isn't particularly nice, but the squared magnitude is nicer: \( |u + v|^2 = |u|^2 + |v|^2 + 2u\cdot v \). So we can say:

Let \( u \) be the vector sum of a maximizing subset, and \( v \) be any vector not in that subset. Then \( |v|^2 + 2u\cdot v \leq 0 \). In particular, \( u\cdot v \leq 0 \), and that is strict if \( v \) is nonzero.

Similarly, we can consider vectors that are in the subset:

Let \( u \) be the vector sum of a maximizing subset, and \( v \) be any vector in that subset. Then \( |v|^2 - 2u\cdot v \leq 0 \). In particular, \( u\cdot v \geq 0 \), and that is strict if \( v \) is nonzero.

These facts characterize the potential maximizing subsets. They can be defined by some vector \( u \) and include all of the vectors \( v \) with \( u \cdot v > 0 \). Any subset that isn't of this form can't possibly maximize the sum's magnitude. Thus, we arrive at a fast algorithm for finding the maximizing subset: Sort the vectors by angle and look at every possible half plane. Since the actual set of vectors contained in a half plane only changes when the defining vector \( u \) is orthogonal to a given vector, there are only a linear number of half planes that need to be checked, and the sum can be updated by adding and removing the appropriate vectors rather than recomputing every time.

There is also another way to reach the same conclusion that I'm a little bit more fond of, but may seem a little bit trickier. The idea is to realize that the sum of a maximizing subset also maximizes a related quantity.

Let \( u \) be the vector sum of a maximizing subset, and \( \hat{u} \) be the unit vector in that direction. Then the same subset maximizes the dot product of the vector sum with \( \hat{u} \).

This is simple to prove, as for any vector \( x \), \( x \cdot \hat{u} \leq |x| \), but \( u \cdot \hat{u} = |u| \). So if \( x \) is the vector sum of any subset, we have \( x \cdot \hat{u} \leq |x| \leq |u| = u \cdot \hat{u} \).

Let \( w \) be any unit vector. The subset of vectors \( v \) with \( v \cdot w > 0 \) is a subset whose vector sum maximizes \( x \cdot w \).

So by relating the problem at hand to a simpler problem, we realize that the solutions are the same, except that we don't know what to choose for \( w \). But as before, this characterization reduces the number of candidate subsets to a mere linear number, which can be efficiently searched.