A place to be (re)educated in Newspeak

Sunday, June 05, 2011

Types are Anti-Modular

Last week I attended a workshop on language design. I made the off-the-cuff remark that types are actually anti-modular, and that comment resonated enough that I decided to tweet it. This prompted some questions, tweets being a less than perfect format for elaborate explanation of ideas (tweets are anti-communicative?). And so, I decided to expand on this in a blog post.

Saying that types are anti-modular doesn’t mean that types are bad (though it certainly isn’t a good thing). Types have pros and cons, and this is one of the cons. Anyway, I should explain what I mean and how I justify it.

The specific point I was discussing when I made this comment was the distinction between separate compilation and independent compilation. Separate compilation allows you to compile parts of a program separately from other parts. I would say it was a necessary, but not sufficient, requirement for modularity.

In a typed language, you will find that the compiler needs some information about the types your compilation unit is using. Typically, some of these types originate outside the compilation unit. Even if your program is just: print(“Hello World”), one may need to know that string literals have a type String, and that the argument type of print is String. The definition of String comes from outside the compilation unit. This is a trivial example, because it is common for String to be part of the language definition. However, substantial programs will tend to involve user-defined types at the boundaries of compilation units (or of module declarations,which may or may not be the same thing).

A consequence of the above is that you need some extra information to actually compile. This could come in the form of interface/signature declaration(s) for any type(s) not defined within your compilation unit/module, or as binary file(s) representing the compiled representation of the code where the type(s) originated. Java class files are an example of the latter.

Whatever the form of the aforementioned type information, you depend on it to compile your code - you cannot compile without it. In some languages, this introduces ordering dependencies among compilation units. For example, if you have a Java package P1 that depends on another package P2, you cannot compile P1 before compiling P2. You either compile them simultaneously (giving up on even separate compilation) or you must compile P2 first so you have class files for it around. The situation is better if the language supports separate signature declarations (like Modula-3 or ML) - but you still have to have these around before you compile.

Semi-tangent: Of course, you can fake signature declarations by dummy package declarations. Java chose to avoid the conceptual overhead of separate signature declarations, on the assumption that pragmatically, one could get by without them.

Contrast this with independent compilation, where you compile your module/compilation-unit independently of anything else. The code that describes the values (and types) that your module requires may not even exist yet. Obviously, independent compilation is more modular than separate compilation. How do you achieve this in the presence of types? The short answer is you don’t.

Wait: we are blessed, and live in a world where the gods have bestowed upon us such gifts as type inference. What if I don’t have to write my types at all, and have the compiler just figure out what types I need? The problem is that inference only works well within a module (if that). If you rely on inference at module boundaries, your module leaks implementation information. Why? Because if you infer types from your implementation, those types may vary when you modify your implementation. That is why, even in ML, signatures are declared explicitly.

Wait, wait: Surely optional types avoid this problem? Not exactly. With an optional type system you can compile independently - but you cannot typecheck independently. Indeed, this is the point: there is no such thing as modular typechecking. If you want typechecking across modules, you need to use some of the same types in across modules. You can either replicate the types or place them in some specific module(s). The former clearly isn’t very modular. The latter makes some modules dependent on declarations defined elsewhere which means they cannot be typechecked independently. In the common case where types are mandatory, modules can not be compiled independently.

Now, there is an argument to be made that modules have dependencies regardless, and that we cannot reason about them without being aware of these dependencies. Ergo, the types do not make change things fundamentally. All true. Even in dynamic language we have some notion of type or signature in our head. Formalizing that notion can be helpful in some ways, but it has downsides. One such downside is that formalizing the types reduces our ability to manage things in a perfectly modular way. You cannot typecheck modules in isolation (except in trivial cases) because types capture cross-module knowledge.

One often hears the claim that types are in fact valuable (or even essential) to modularity because they can describe the interface(s) between modules. There lies the problem: the type cannot serve this purpose unless it is available in more than one module. Types are inherently non-local - they describe data that flows across module boundaries. The absence of types won’t buy you modularity on its own though. Far from it. But types and typechecking act as an inhibitor - a countervailing force to modularity.