22 March 2017

Dependent types are types that depend on values. With dependent types, in addition to being able to express a type that is "a list of integers," you could also express things like "a list of integers of length 5." Having access to dependent types allows you to express invariants whose values aren't known until runtime, but have relations that can be proven at compile time.

For example, consider the following Haskell function that zips two lists together:

zip :: [a] -> [b] -> [(a,b)]
zip _ [] = []
zip [] _ = []
zip (x:xs) (y:ys) = (x,y) : zip xs ys

This function has the property that if the two lists are different sizes, then the end of the longer one is essentially thrown away. In some cases, that might be undesirable or surprising behavior. So we might hope for a version of zip that can only be called when we know for sure that the two lists are the same length. If we have a list type that is parameterized by length, then that's possible. We'd give zip a type like forall a b (n : Int). List a n -> List b n -> List (a, b) n. (The syntax could vary considerably). As an added bonus, we'd no longer need two base cases, and we could instead replace them with the case zip [] [] = [].

Unfortunately, dependent types are very difficult to implement in a reasonable way. Ideally, we want to be able to encode an property in our type, and we want the type checker to prove that the code we write has that property. In extreme cases, these properties could be things like "This operation takes logarithmic runtime." The problem is that this means that the compiler needs to also be a theorem prover, and as the proofs become more and more complicated, the programmer is going to need to significantly guide the compiler.

With our current technology and techniques, using dependent types in this way is a far cry from the ideal of writing down properties and letting the compiler check that they hold (On the other hand, if you are willing to use randomize testing instead of theorem proving, tools such as QuickCheck do a good job).

That said, there are some applications of dependent types where the proofs are not complicated at all. For example, if we're working with numbers mod n, then in C we'd need to write code like ((a*b % n)*c % n), and so on, being very careful to % n after each operation to prevent overflow. It'd be much nicer and lead to simpler code if we could have a type that was "Integers modulo n", and have the arithmetic operators do the intermediate modulos for us. This kind of thing is possible in Haskell using the reflection package, but the resulting type signatures take a while to get used to, and feel much more complicated than the problem that I want to be able to solve.

In an application like this, our type parameter is just going to be a constant for a huge section of our program. It's trivial to prove equality in cases like this: we're just saying x = x. So I'd like to propose a variant of dependent types where only trivial proofs are allowed. For lack of a better name, let's call these trivial dependent types.

Implementing these in a Haskell-like language with immutable data should not be too hard. Suppose we have a dependent type Foo (x : Int), which has a numerical parameter. During type checking we might need to determine whether Foo x and Foo y are equal. The answer in this case is the same as the answer to whether x and y can share the same memory location. So if x is read from user input, while y is the result of a long computation, then Foo x and Foo y won't be compatible. On the other hand, if you write let y = x, then they will.

It should also straightforward to have type inference with this style of dependent types. A value parameter to a type can be represented in the compiler by the location in the AST that defines the expression, or an equivalence class of those locations that you're computing anyway in order to do common subexpression elimination.

In the presence of mutable values, inference will be harder, but should still be possible. You could, for example, have your type checker (but not your code generator!) interpret a changing value as the creation of a new variable of the same name that shadows the old one. In this way, if you try to use Foo n both before and after a potential modification, you'd get a type mismatch because the n after the assignment refers to a different n as before the assignment, even though those two variables will use the same memory location when it comes to code generation.

Loops also pose a potential problem, but again it should be possible to work with in. In addition to values that might have changed in prior lines, you also need to worry about the possibility that the value changed in lines that appear later in the loop but in earlier iterations. It's possible that this could lead you down a similar road that Rust went down to manage memory, but I'm not familiar enough with Rust's borrow checker to highlight the similarities.

I'd love to try out a language with these types to see if it feels expressive enough or if I'll just end up wishing I had the next step of dependent types, and I think it'd be a fun project to work on building such a language, but realistically I won't have the time or the motivation to do it. If someone out there is interested, let me know and I'd probably help out.

To close this post, I'll highlight a couple cool things that you could do with this type of dependent type, whether in functional or imperative style.

Augmented Data Structures

Binary search tree variants such as splay trees can often be used as a starting point for efficient algorithms by augmenting each node of the tree with some auxillary data, such as the sum of all the elements in that subtree. With trivial dependent types, we could write a tree parameterized by the type of the augmentation and the function that produces a node's augment from either its value (if the node is a leaf) or the augments of its children.

data TreeF a x = Leaf a | Branch x x
data AugmentedTree a aug (combine : TreeF a aug -> aug) = ...

Safe indexing without bounds checking

The technique here provides less help than something like refinement types, but we can still do something. If you write C code, it's very common to write a loop such as:

// a is an array of size N
for(int i = 0; i < N; i++) {
    // do something with a[i]
}

With trivial dependent types, we could create a type Index (x : Array a) that is parameterized by any type of array and represents an in-bounds index into that array. We could then write an indexing function index : (x : Array a) -> (i : Index x) -> a that is essentially implemented as x[i]. Then, we could define iteration constructs that give us an Index x instead of an int, and avoid any need to do bounds checking if we only index by values of this type.

Of course, we lose the ability to add two indices together, but that's an inherently unsafe operation, and in most cases we'd want a bounds check in that case anyway. Refinement types or similar techniques might be able to prove that a particular computation is safe, but once again those techniques are much more difficult to get right. While we lose some amount of arithmetic, we can still compare indices perfectly fine, and more generally provide any operation that cannot produce invalid indices.