10 November 2013

I've worked with Python a great deal, and one of the things I've wanted most from it is a more powerful type system and static type checking. One reason for this is that type information is one of the most important parts of the documentation, and if I'm going to include it in the source code, I want it to actually mean something to the compiler, so that it is easy to verify both that the type information is up to date and that every part of the program has a consistent view of the data.

Often I find myself looking at the documentation for something and I can understand that it does what I want, but I have no idea how to use it, either because I don't understand what sorts of arguments it expects or because I don't know how it returns the data. For a recent example, I'll pick on the Twitch API. In particular, let's look at the user resource.

The documentation for GET /users/:user at the time of writing is "Returns a user object." Thankfully, the structure of the URL is pretty self-evident, but I could not find any documentation about what it means to be a user object, except for the example response:

{
  "name": "test_user1",
  "created_at": "2011-03-19T15:42:22Z",
  "updated_at": "2012-06-14T00:14:27Z",
  "_links": {
    "self": "https://api.twitch.tv/kraken/users/test_user1"
  },
  "logo": "http://static-cdn.jtvnw.net/long/image/url.jpeg",
  "_id": 21229404,
  "display_name": "test_user1"
}

Unfortunately, this isn't really sufficient to get an idea of what makes a user object. Are all of these fields always going to be present? Well, the documentation is at least clear on that point: "Blank fields are included as null instead of being omitted." However, that just brings up the related question: which fields are nullable? One could try to argue that any field is nullable, but surely the name field should always be present, at the very least. So the best solution I found is to assume that a field is not nullable, and then correct that assumption when you get an error due to receiving null. If Twitch had documentation with Haskell-style typing, this information could be more easily expressed.
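As a rough sketch of what such documentation could look like, here is the user object written as a Haskell record. The field names follow the example response, but the type name, the helper function, and above all the choice of which fields are wrapped in Maybe are my own guesses; the Twitch documentation does not say which fields are nullable.

```haskell
-- Hypothetical typed description of the Twitch user object.
-- Which fields are Maybe is an assumption, not documented behaviour.
data User = User
  { userId      :: Int           -- "_id"
  , name        :: String        -- "name", assumed always present
  , displayName :: String        -- "display_name", assumed always present
  , logo        :: Maybe String  -- "logo", assumed nullable
  , createdAt   :: String        -- "created_at", timestamp kept as text
  , updatedAt   :: String        -- "updated_at", timestamp kept as text
  , selfLink    :: String        -- "_links.self"
  } deriving (Show, Eq)

-- A caller cannot use the logo as a String without first
-- deciding what to do when it is absent.
logoOrDefault :: User -> String
logoOrDefault user = case logo user of
  Just url -> url
  Nothing  -> "(no logo)"

-- The example response from the documentation, as a value of this type.
exampleUser :: User
exampleUser = User
  { userId      = 21229404
  , name        = "test_user1"
  , displayName = "test_user1"
  , logo        = Just "http://static-cdn.jtvnw.net/long/image/url.jpeg"
  , createdAt   = "2011-03-19T15:42:22Z"
  , updatedAt   = "2012-06-14T00:14:27Z"
  , selfLink    = "https://api.twitch.tv/kraken/users/test_user1"
  }
```

With a declaration like this, the question "which fields are nullable?" is answered by reading the types rather than by waiting for a null to show up at runtime.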

This lack of clarity in the definition of what makes an object type seems to be quite prevalent in modern programming languages. One of the ideas behind duck typing is explicitly not checking for types but instead using more traditional forms of documentation to convey similar information. However, I've found that while it's pretty common for arguments to be documented well, return values aren't documented nearly as well.

At my internship this last summer, I was working with some code that most of the time would return a list of events, but sometimes it would return an event object (not a list of one event) that was supposed to signal an error case. This feature wasn't well documented, and I wasn't aware of it until it made my code crash; a suitable type system would have caught the error much sooner.
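To make that concrete, here is a minimal sketch of how the two outcomes could be spelled out in a Haskell type. Every name here (Event, FetchError, fetchEvents, summarize) is invented for illustration and has nothing to do with the actual internship code.

```haskell
-- Hypothetical stand-ins; none of these names come from the real codebase.
data Event = Event { eventName :: String } deriving (Show, Eq)

newtype FetchError = FetchError String deriving (Show, Eq)

-- The return type states both outcomes: an error, or a list of events.
-- The Bool argument only simulates success or failure for this sketch.
fetchEvents :: Bool -> Either FetchError [Event]
fetchEvents ok
  | ok        = Right [Event "login", Event "logout"]
  | otherwise = Left (FetchError "backend unavailable")

-- The result cannot be treated as a plain list; the caller has to
-- unwrap the Either, which puts the error case in plain sight.
summarize :: Either FetchError [Event] -> String
summarize (Left (FetchError msg)) = "error: " ++ msg
summarize (Right events)          = show (length events) ++ " events"
```

Writing something like `length (fetchEvents True)` in the hope of counting events would not do what I wanted, and mapping over the result as though it were a list of events would be rejected by the type checker, so the surprise would have arrived at compile time instead of as a crash.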

What I find in practice is that using documentation to convey type information is much less reliable than a static type system that can be checked by a compiler. There are often many assumptions spread throughout the code, and it's often not clear whether you know about all of them.

I think Haskell's type system handles these assumptions very well. Haskell's version of duck typing is type classes -- if a type has an instance defining (==), it's an Eq. If a type constructor has an instance defining return and (>>=), it's a Monad. In this way, the compiler keeps track of these assumptions, and as a result you get notified at compile time when you make an incorrect one. Furthermore, because the Haskell compiler does type inference, you get these advantages with very few type annotations.
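Here is a small sketch of both points, using a made-up Color type and a made-up countMatches function. Note that membership in a type class is declared with an instance, so the compiler knows exactly which types satisfy the assumption.

```haskell
-- A hypothetical type used only for illustration.
data Color = Red | Green | Blue

-- Declaring this instance is what makes Color an Eq.
instance Eq Color where
  Red   == Red   = True
  Green == Green = True
  Blue  == Blue  = True
  _     == _     = False

-- No type annotation here. GHC infers
--   countMatches :: Eq a => a -> [a] -> Int
-- so the "elements can be compared" assumption is recorded in the type.
countMatches target = length . filter (== target)

-- Rejected at compile time, because functions have no Eq instance:
--   countMatches id [id]
```

The same unannotated function works for any type with an Eq instance, for example `countMatches Red [Red, Green, Red]` or `countMatches 'a' "banana"`.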