A place to be (re)educated in Newspeak

Wednesday, August 15, 2007

Object Initialization and Construction Revisited

In my last post, which discussed object initialization and construction, I had promised to come back to the topic and clarify it with concrete examples. I've finally found time to do that; hopefully I will dispel some of the misunderstandings that the last post engendered, no doubt replacing them with fresh, deeper misunderstandings.

Below is a standard example - a class representing points in the plane. What’s non-standard is that it is written in Newspeak, an experimental language in the style of Smalltalk, which I and some of my colleagues are working on right now. In cases where the syntax is non-obvious, I’ll use comments (Pascal style, like so: (* this is a comment *)) to show how a similar construct might be written in a more conventional (and less effective) notation.

class Point2D x: i y: j = ( 
(* Javanese version might look like this :
class Point2D setXY(i, j) { ...} *)
(* A class representing points in 2-space *)
|
public x ::= i.
public y ::= j.
|
) ( (* instance side *)

public printString = (
^ ' x = ', x printString, ' y = ', y printString
)
)


This declaration introduces the class Point2D. The class name is immediately followed by a message pattern (method signature for readers of Javanese) x: i y: j. This pattern describes the primary constructor message for the class. The pattern introduces two formal parameters, i and j, which are in scope in the class body. The result of sending this message to the class is a fresh instance, e.g.:

Point2D x: 42 y: 91 
(* In Javanese, you might write Point2D.setXY(42, 91);
But don’t even think of interpreting setXY as a static method!
*)


yields a new instance of Point2D with x = 42 and y = 91. The message causes a new instance to be allocated and executes the slot initializers for that instance, in this case

x ::= i.
y ::= j.


The slots are accessed only through automatically generated getters (x and y) and setters (x: and y:).

How is all this different from mainstream constructors?
Because an instance is created by sending a message to an object, and not by some special construct like a constructor invocation, we can replace the receiver of that message with any object that responds to that message. It can be another class (say, an implementation based on polar coordinates), or it can be a factory object that isn’t a class at all.

Here is a method that takes the class/factory as a parameter:

makePoint: pointFactory = (
(* In Javanese:
makePoint(pointFactory) {
return pointFactory.setXY(100, 200)
}
*)
^pointFactory x: 100 y: 200
)


We can invoke this so:

makePoint: Point2D


but also so:

makePoint: Polar2D


where Polar2D might be written as follows:

class Polar2D rho: r theta: t = (
(* A class representing points in 2-space *)
|
public rho ::= r.
public theta ::= t.
|
) ( (* instance side *)
public x = ( ^rho * theta cos) (* emulate x/y interface *)
public y = (^rho * theta sin)
...
public printString = (
^ ' x = ', x printString, ' y = ', y printString
)
) : ( (* class side begins here*)
public x: i y: j = (
| r t |
(* Cartesian to polar; assumes Squeak-style sqrt and arcTan: on numbers *)
r := ((i * i) + (j * j)) sqrt.
t := j arcTan: i.
^rho: r theta: t
)
)


Here, Polar2D has a secondary constructor, a class method x:y:, which will be invoked by makePoint:.

You cannot do this with constructors or with static factories; you simply cannot abstract over them.

You could use reflection in Java, passing the Class object as a parameter and then searching for a constructor or static method matching the desired signature. Even then, you would have to commit to using a class. Here we can use any object that responds to the message x:y:.
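A rough sketch of that reflective approach, assuming a hypothetical Java class Point2D with a public constructor taking two doubles. The class ReflectivePointMaker and the choice to wrap failures in IllegalStateException are invented for this illustration and are not from the post:

```java
import java.lang.reflect.Constructor;

// Hypothetical Java counterpart of the post's Point2D, for illustration only.
class Point2D {
    final double x;
    final double y;

    public Point2D(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

class ReflectivePointMaker {
    // Finds a public (double, double) constructor on pointClass and invokes it.
    // The argument must be a Class; an arbitrary factory object will not do.
    static Object makePoint(Class<?> pointClass) {
        try {
            Constructor<?> ctor = pointClass.getConstructor(double.class, double.class);
            return ctor.newInstance(100.0, 200.0);
        } catch (ReflectiveOperationException e) {
            // Covers a missing constructor, access failures, and exceptions
            // thrown by the constructor itself.
            throw new IllegalStateException("cannot construct " + pointClass.getName(), e);
        }
    }
}
```

A call such as ReflectivePointMaker.makePoint(Point2D.class) hands back a plain Object, so the caller still has to cast it before using x or y.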

Using Java core reflection in this case is awkward and verbose, and historically hasn’t been available on configurations like JavaME. And it doesn’t work well with proxy objects either (see the OOPSLA 2004 paper we wrote for details). What’s more, you may not have the right security permissions to do it. The situation is not much better with the VM from the makers of Zune (tm) either.

Zune is a trademark of Microsoft Corporation. Microsoft is also a trademark of Microsoft Corporation. But GNU’s not Unix.

Alternatively, you could also define the necessary factory interface, implement it with factory classes, create factory objects and only pass those around. You’d have to do this for every class of course, whether you declared it or not. This is tedious, error prone, and very hard to enforce. The language should be doing this for you.
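To make the amount of boilerplate concrete, here is a minimal sketch of the factory-interface approach in Java. Every name in it (Point, PointFactory, CartesianPoint, PolarPoint, the two factory classes and PointClient) is hypothetical and chosen only to mirror the Newspeak examples above:

```java
// All names below are hypothetical; the post does not define a Java API.
interface Point {
    double x();
    double y();
}

// The factory interface every point implementation must be paired with.
interface PointFactory {
    Point create(double x, double y);
}

final class CartesianPoint implements Point {
    private final double x;
    private final double y;

    CartesianPoint(double x, double y) {
        this.x = x;
        this.y = y;
    }

    public double x() { return x; }
    public double y() { return y; }
}

final class PolarPoint implements Point {
    private final double rho;
    private final double theta;

    PolarPoint(double rho, double theta) {
        this.rho = rho;
        this.theta = theta;
    }

    // Emulate the x/y interface on top of the polar representation.
    public double x() { return rho * Math.cos(theta); }
    public double y() { return rho * Math.sin(theta); }
}

// One hand-written factory class per point class.
final class CartesianPointFactory implements PointFactory {
    public Point create(double x, double y) {
        return new CartesianPoint(x, y);
    }
}

final class PolarPointFactory implements PointFactory {
    public Point create(double x, double y) {
        // Convert Cartesian coordinates to polar before constructing.
        return new PolarPoint(Math.hypot(x, y), Math.atan2(y, x));
    }
}

final class PointClient {
    // Counterpart of makePoint: in the post; works with any PointFactory.
    static Point makePoint(PointFactory pointFactory) {
        return pointFactory.create(100, 200);
    }
}
```

PointClient.makePoint accepts either factory, but only because someone wrote the interface and a factory class for each point class by hand, and nothing stops other code from calling the constructors directly.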

So far, we’ve shown how to manufacture instances of a class. What about subclassing? This is usually where things get sticky.

Here’s a class of 3D points:

class Point3D x: i y: j z: k = Point2D x: i y: j (
(* A class representing points in 3-space *)
| public z ::= k. |
) (* end class header *)
( (*begin instance side *)
public printString = (
^super printString, ' z = ', z printString
)
)


One detail that’s new here is the superclass clause: Point3D inherits from Point2D, and calls Point2D’s primary constructor. This is a requirement, enforced dynamically at instance creation time. It helps ensure that an object is always completely initialized.

Unlike Smalltalk, one cannot call a superclass’ constructors on a subclass. This prevents you from partially instantiating an object, say by writing:

Point3D x: 1 y: 2  (* illegal! *)


without initializing z as intended. Also, unlike Smalltalk, there’s no instance method that does the initialization on behalf of the class object. So you cannot initialize an object multiple times, unless the designer deliberately creates an API to allow it. The idea is to ensure every object is initialized once and only once, but without the downsides associated with constructors.

Preventing malicious subclasses from undermining the superclass initialization takes care. We’re still considering potential solutions. The situation is no worse than in Java, it seems, and we may be able to make it better.

A different concern is that the subclass must call the primary constructor of the superclass. So what happens when I want to change the primary constructor? Say I want to change Point2D to use polar representation. Can I make rho:theta: the primary constructor? How can I do this without breaking subclasses of Point2D, such as Point3D? We can't do it directly yet (though we should have a fix for that before too long), but I can redefine Point2D as:

class Point2D x: i y: j = Polar2D rho: ... theta: ... ()()
: ( (* class side begins here *)
(* secondary constructor *)
public rho: r theta: t = (
^x: r * t cos y: r * t sin
)
)



Now anyone who uses a Point2D gets a point in polar representation, while preserving the existing interface. And anyone who wants to can of course create polar points using the secondary constructor. I can also arrange for that constructor to return instances of Polar2D directly:

public rho: r theta: t = (
^Polar2D rho: r theta: t
)


If you find this interesting, you might want to read a short position paper I wrote for the Dynamic Languages Workshop at ECOOP. It only deals with one specific issue regarding the interaction of nested classes and inheritance, and it’s just a position paper describing work in progress, but if you’ve gotten this far, you might take a look.

I still haven’t explained why I see no need for dependency inversion frameworks. The short answer is that because Newspeak classes nest arbitrarily, we can define a whole class library nested inside a class, and parameterize that class with respect to any external classes the library depends on. That probably needs more explanation; indeed, I think there’s a significant academic paper to be written on the subject. Given the length of this post, I won’t expand on the topic just yet.
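As a very rough approximation of that idea in Java rather than Newspeak, an outer class can stand in for the library and take its external dependencies as constructor parameters. Logger, ShapesLibrary and Circle are hypothetical names invented for this sketch. Java inner classes cannot be overridden the way nested Newspeak classes can, so this captures only the parameterization and none of the rest:

```java
// Hypothetical dependency that the nested "library" needs from the outside.
interface Logger {
    void log(String message);
}

// The outer class plays the role of a library; its constructor parameters
// are the library's external dependencies.
final class ShapesLibrary {
    private final Logger logger;

    ShapesLibrary(Logger logger) {
        this.logger = logger;
    }

    // An inner (non-static) class sees the enclosing instance's dependencies
    // without any of them being passed in explicitly.
    final class Circle {
        final double radius;

        Circle(double radius) {
            this.radius = radius;
            logger.log("created circle with radius " + radius);
        }
    }

    // Convenience method so clients need not write lib.new Circle(...).
    Circle circle(double radius) {
        return new Circle(radius);
    }
}
```

Two ShapesLibrary instances built with different Logger implementations behave as two independently configured copies of the same library, with no framework doing the wiring.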