August 7th, 2008

Concise Instance Creation Expressions - Dissection

Due to popular demand, here is a fresh explanation of why CICE (the stuff I’ve been raving about in past blog posts) is good, and what it tries to do.

Part I: Definitions

In order to explain CICE, and why it beats BGGA, we first have to look at closures themselves. Strictly speaking, a closure is a tuple: It’s a combination of a so-called ‘lexical context’ and a function (procedure/method/‘def’, etc. From now on I’ll call it a function). The function refers to identifiers, and these identifiers are taken not only from its own internal scope (e.g., things the function itself defines), but also from the scope where the closure was created/declared.

This is complex, so I’ll show you a javascript example, which does closures naturally:

function adder(addValue) {
    return function(x) { return x+addValue;}
}

var myAdder = adder(10);
alert(myAdder(20));   //prints '30'

Click here to run the above straight in your browser. I’ll wait. (That’s a link to javascript:adder=function(x){return function(y) {return x + y;};};myAdder=adder(10);alert(myAdder(20)); which is pretty much the above example but rolled into a single line).

There are two important things going on here:

1) The ‘adder’ function returns… a function. In other words, a function, which pretty much by definition returns objects, returned a function. Hence, functions ARE objects in javascript. Python, and as far as I know ruby, and virtually all functional languages, including LISP, work very similarly. We shall call this feature function objects. Note that on the surface, java does NOT have them. Dig a little deeper and java does, but I’m getting ahead of myself.

2) The inner function, the one with ‘x’ as argument, refers to ‘addValue’ within its operation. However, ‘addValue’ is not defined by this inner function itself - it’s getting addValue from the OUTER function. In other words, the inner function: function(x) { return x+addValue; } is referring to something that is not part of the inner function at all - the ‘addValue’ variable. Take the function out of the place where it is defined, and at runtime the interpreter would have no clue what that ‘addValue’ even means. The inner function can access the outer function’s variables. This introduces a complication: ‘addValue’ changes over the course of the program. Perhaps the ‘adder’ function gets run a million times during the course of the full program, every time with a different ‘addValue’. So what value of ‘addValue’ does the inner function take? Well, simply the ‘addValue’ of the particular call to ‘adder’ that created that inner function. We’ll call this ability to access stuff outside your own brackets lexical scoping.

This is what closures are, this is what the term really means.

However, there’s a problem. Closures consist of 2 not immediately related concepts, and oftentimes, a request for ‘closure support’ could be satisfied by introducing only 1 of these 2 features (usually function objects). Still, there’s a good reason for both function objects and lexical scoping, and java would be in bad shape if it supported neither.

Part II: The BGGA proposal

In part III I’ll explain that java has supported something which is functionally 100% equivalent to closures since version 1.1, practically from the start. However, the BGGA proposal basically acts like java just plain doesn’t have closures and implements them in a very direct way: The basis of BGGA is to add the whole closure shebang to java, including the exact implementation of functions as objects, and implicit and complete lexical scoping. Every object has a type - this is true in virtually all OO programming languages, from Java to C# to Python to Javascript.

Dynamics

However, in those last 2 (’dynamic’ languages), any function object is simply of type ‘function’. e.g., our adder has a type function. It’s not type function(int) returns (function(int) returns int) - which is what java would do, as it forces you to explain exactly and in detail what parameters go in, and what result values come out. (If that went too fast, read from left to right: our ‘adder’ is a function that takes one integer as argument and it returns a function with signature x when you call it. x = a function that takes an int as argument, and also returns an int when you call it).

Because a function does not have to declare what it takes, returns, or throws in a dynamic language, you can take, return, or throw anything you like. First run you return a function, second run a string, third run a number, fourth run a complex object, it doesn’t matter, all is fair. This is why these languages are called ‘dynamic’. Everything can return anything at any time. This makes it very easy and quick to write short code.

Unfortunately it makes it impossible for automated tools to do any type of serious analysis on your code. That’s why python and ruby have very few “compile-time” errors (e.g., errors when you turn a .py file into a .pyc file). Interpreters like python still have to parse the text, and if that fails, you get an immediate error. Other than that, though, the code has to be run to find problems, even obvious ones.

This automated analysis lies at the base of java’s strength. Eclipse has extremely extensive automation which allows everything from massive amounts of as-you-write-code errors (“compile time” is as-you-write-time as far as an IDE is concerned, “run time” is NOT) to automatic tooltips with the full signature and javadoc for each method call, to guessing arguments from local context, to complex refactor scripts, to finding exactly where a certain method gets called from elsewhere in the code (‘comefrom’) etc, etc, etc.

This is the ‘java’ way. You pay a little extra in the form of typing more (though a good IDE will type almost all of the extra stuff for you, and re-use is just a refactor script away), and in return you get an IDE that acts as a second pair of eyes - a handy assistant. Either you like this way, or you don’t. If you don’t, you use something like Jython to program ‘dynamic-style’ and you still get a class file. The JVM doesn’t care one whit what you use, it happily runs whatever class file you throw at it. This is the strength of a common runtime, and I admit that microsoft’s CLI has ‘gotten’ this a bit better. Still, there’s no practical reason why the JVM can’t do the same thing, and, indeed, just about every language ever devised can be run in the JVM, including at least Python, Ruby, C#, Javascript, LISP, and many others.

While my viewpoint on which methodology is better is well known (java’s is better), I think even if you disagree with me, you do believe that a language that tries to do everything is a bad compromise - that’s what a JVM supporting multiple languages is for. The last time someone tried a language that can do everything, from declarative (prolog-like) to functional (LISP, Haskell) to OO (java, javascript, python) to imperative (C, python, PHP), was PL/I. It is mostly considered a massive failure. The problems with a language-of-all-jobs are legion, but just to give you a glimpse: Any two random joes that claim to know UberLanguage can’t read each other’s code because they use a completely different style, implementing compilers and the like is a huge pain, syntax is forced into ugliness in order to support so many different primitive operations (perl-soup!), and it can be difficult to make code written in one style work well with other styles.

Common Runtimes solve this problem well - they allow interoperation between different styles while keeping a very distinct separation between styles and, possibly even more importantly, distinct names. JVM is one thing, and on it you run some Jython and some Java code. Perfectly obvious what’s going on. “I got some PL/I code”… totally ambiguous. No idea if I can even read it, or work with it.

We now have our answer why BGGA is a rotten piece of idiocy that ruins java as a language, and, being the mothership of the JVM, the whole java as a framework stack. That’s because BGGA hacks closures into java by making java look like python - it applies a large brush of dynamics onto the scene. It turns java into a dynamic/static hybrid. Bad.

BGGA the proposal explained

BGGA adds function objects verbatim. Because in java everything has an explicit type, this introduces a completely new type of declaration. You used to have primitives (boolean, int, long, float, double, char, short, byte), classes and interfaces (String, Object, List, etcetera), and arrays of any of these elements. Those are the ONLY type declarations java supports. Functions themselves aren’t in here. BGGA adds them. They look like a grouping of input and output parameters:

int(int) x; //x will contain functions that can be called with 1 parameter of type integer, and when you do, it returns an integer.

The problem is that this doesn’t fit with java. It’s a massive change, making the compile/parse job much more difficult (important, because Eclipse, NetBeans, IDEA, and all those other IDEs need to interpret the whole file every time you enter so much as a single character in order to help you as-you-type), and more importantly it takes away typing information.

Example: A comparator compares 2 values and tells you which one is considered to be ranked higher. You can supply one to a sort function to sort lists according to your own specifications. Python uses a function object:

def mySort(a, b):
    return abs(a)-abs(b)

In Java, you use an instance of the Comparator interface - its type would be Comparator<Integer>, whereas in python the type, if anything, is ‘object’, and if you push hard enough, ‘function’.

With BGGA, this would be replaced with an int(int, int) - a function that takes 2 integers (a and b), and returns an integer (-1/0/+1 depending on if a is smaller than, equal to, or larger than b).

Where is the javadoc for the int(int, int) type? It might just as well be a component of a calculator button - the ‘*’ button calls code that also converts 2 integers on input into one outputted integer. Yet, a multiply function is completely and utterly unrelated to a comparator. The fact their signatures are the same is a total coincidence. Assigning the sorter to a calculator button is probably a mistake, and assigning the multiplier function to the sorter is -definitely- a mistake. In java, right now, you can’t screw this up - you have to give a Comparator. In BGGA you can. Furthermore, all that static information for eclipse, like eg the javadoc of Comparator, so you know whether a negative output means that a is larger or smaller than b (I can never remember!) is gone.
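To make that concrete, here is a small sketch in today's java. BinaryOperation and NominalTypeDemo are names I made up for illustration; only Comparator and Collections.sort are real library types. The two interfaces have the same 'shape' (two ints in, one int out), yet the compiler refuses to mix them up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class NominalTypeDemo {
    // Hypothetical calculator-button interface, same shape as a comparator.
    interface BinaryOperation {
        int apply(int a, int b);
    }

    // Ranks integers by absolute value, like the python mySort above.
    static final Comparator<Integer> BY_ABS = new Comparator<Integer>() {
        public int compare(Integer a, Integer b) {
            return Integer.compare(Math.abs(a), Math.abs(b));
        }
    };

    static final BinaryOperation MULTIPLY = new BinaryOperation() {
        public int apply(int a, int b) {
            return a * b;
        }
    };

    static List<Integer> sortByAbs(List<Integer> input) {
        List<Integer> copy = new ArrayList<Integer>(input);
        Collections.sort(copy, BY_ABS);
        // Collections.sort(copy, MULTIPLY); // would not compile: not a Comparator
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortByAbs(Arrays.asList(-3, 1, -2))); // prints [1, -2, -3]
        System.out.println(MULTIPLY.apply(6, 7));                // prints 42
    }
}
```

With a bare function type, both BY_ABS and MULTIPLY would fit the same slot, and the commented-out line would compile.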

It’s a big gun that completely changes how you think about java.

Part III: CICE - back to your roots!

The CICE proposal is smarter, because it acknowledges that java has supported something 100% functionally equivalent to closures: Anonymous classes.

Here’s an example. This is valid -right now- in java and has been possible since java 1.1, when anonymous classes were introduced.

The javascript adder example, but now in java:

public interface NumberModifier {
    public int modify(int input);
}

public static NumberModifier makeAdder(final int addValue) {
    return new NumberModifier() {
        public int modify(int x) {
            return addValue + x;
        }
    };
}

System.out.println(makeAdder(10).modify(20)); //prints 30.

Note how you get what amounts to a function object that has lexical scoping! Where we define an on-the-fly (anonymous) implementation of the NumberModifier interface, we use the ‘addValue’ from outside the scope, yet this will compile fine, and work exactly the same as the javascript example. It’s just a bit wordy - this needs to be spread around 2 files (the NumberModifier interface is one, and the rest in another) and especially the declaration of the anonymous version is a whole bunch of lines.
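If you want to try this without juggling two files, here is a self-contained sketch. The wrapper class AdderDemo is my own invention for the sake of having something to run; NumberModifier and makeAdder are the ones from above, with the interface nested so everything fits in one file:

```java
class AdderDemo {
    // Nested here only so the example fits in a single file.
    interface NumberModifier {
        int modify(int input);
    }

    static NumberModifier makeAdder(final int addValue) {
        // The anonymous class captures addValue from the enclosing method.
        return new NumberModifier() {
            public int modify(int x) {
                return addValue + x;
            }
        };
    }

    public static void main(String[] args) {
        NumberModifier addTen = makeAdder(10);
        NumberModifier addHundred = makeAdder(100);
        // Each instance keeps the addValue it was created with.
        System.out.println(addTen.modify(20));     // prints 30
        System.out.println(addHundred.modify(20)); // prints 120
    }
}
```

The two adders don't interfere with each other, which is exactly the lexical scoping behaviour from the javascript example.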

CICE asserts that this practical equivalent to closures is fine as is. You can’t very well remove this, it would make java completely not backwards compatible with everything out there. Adding BGGA closures while this exists as well is making two completely different semantics that do the exact same thing, which in general is bad. Perl Soup alert! PL/I ahead! That way lies madness.

CICE simply recognizes that that example above is needlessly wordy. Anytime you create an anonymous class, you take a parent type (in this case, NumberModifier, but it also works for Comparator, or even an existing class, abstract or no). Inside the brackets you then define at the very least each ‘abstract’ method (meaning, each method declared by the supertype but not implemented. In interfaces, that’s all declared methods. In (abstract) classes, only the methods explicitly declared ‘abstract’). If you like you can define more methods, but this is basically pointless unless you are overriding (non-)abstract methods from your superclass.

This is fine, unless you are overriding only one method - then it all becomes a bit superfluous. Turns out an anonymous class implementing an interface with but one method declaration (like, say, NumberModifier above, or Comparator, our sorter function) is functionally equivalent to a function object - it’s a function, accessible as object. It’s just declared as a very specific interface (or abstract class). So, CICE takes the above declaration of NumberModifier and simply makes it smaller without sacrificing any sort of explicit typing or information:

public interface NumberModifier {
    public int modify(int input);
}

public static NumberModifier makeAdder(final int addValue) {
    return NumberModifier(int x) {
        return addValue + x;
    };
}

System.out.println(makeAdder(10).modify(20)); //prints 30.

The line declaring we’re implementing the ‘modify’ method is gone, and so is the ‘new’ keyword! Instead, there’s an all new parameter list tacked onto our interface name (the (int x) after the NumberModifier text - the non-CICE example only had empty constructor parentheses there). There’s more to a method - there are annotations, access type (public/private/protected/package private), return type (our modify function returns an int), and exception list (our modify function throws no explicit exceptions). All of these are missing now, but 99.99% of all methods implemented inside an anonymous class just copy from the parent anyway. In fact, for stuff like return type and exception list, you can’t stray far from the parent’s declaration, or it’s a compiler error. That’s how CICE works: It simply assumes all those things are equal to where the modify method is declared (in the NumberModifier interface itself).

But how does the CICE syntax know that that ‘return addValue + x’ is the implementation for the modify method, and not some other method, like, say, the toString() method which all java objects have?

Because the type you are implementing (NumberModifier) only has ONE abstract method - modify. CICE assumes you’re implementing that one.

So now you know how CICE works. CICE syntax works -only- if the type you are implementing has only 1 abstract method. Trying to CICE your way around a WindowListener (which has windowOpened, windowClosing, windowIconified, etc functions) doesn’t work - it has more than 1 abstract method. Compile-time error. CICE calls all types that have only 1 abstract method, be it interfaces that declare only 1 method, or abstract classes with only 1 abstract method, ‘SAM’ types (Single-Abstract-Method).

The key is: SAMs -are- function objects. The CICE syntax merely makes it easier and less wordy to write what we already know. You can expand ANY CICE declaration into java 1.1-style anonymous class code if you like, in a simple and automated fashion.
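Here is what such an expansion looks like, done by hand on a Comparator. The CICE form in the comment is written the way I understand the proposal's syntax, so treat it as illustrative; the expanded form below it is plain java. CiceExpansionDemo and byLength are names made up for this sketch:

```java
import java.util.Arrays;
import java.util.Comparator;

class CiceExpansionDemo {
    // CICE form (proposed syntax, NOT valid java today):
    //   Comparator<String>(String a, String b) { return a.length() - b.length(); }
    //
    // Expanded form: the one abstract method of Comparator, compare(), gets
    // the parameter list and body; modifiers and return type are copied
    // from the interface declaration.
    static Comparator<String> byLength() {
        return new Comparator<String>() {
            public int compare(String a, String b) {
                return a.length() - b.length();
            }
        };
    }

    public static void main(String[] args) {
        String[] words = { "ccc", "a", "bb" };
        Arrays.sort(words, byLength());
        System.out.println(Arrays.toString(words)); // prints [a, bb, ccc]
    }
}
```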

BGGA can make no such claim, because the entire construct of BGGA’s closure simply doesn’t exist in java NOW.

All of the extensive changes made in java1.5 (except annotations) can be automatically translated to pre-1.5 code. For example, the foreach statement:

public void printAll(List<String> list) {
    for ( String item : list ) System.out.println(item);
}

is 100% equivalent in behaviour (the compiler generates essentially the same bytecode for both) to this java1.4 code:

public void printAll(List list) {
    Iterator i = list.iterator();
    while ( i.hasNext() ) {
        String item = (String)i.next();
        System.out.println(item);
    }
}
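If you'd rather check the equivalence than take my word for it, a quick sketch like this runs both forms side by side. ForEachDemo and the two join methods are hypothetical names; they collect items into a string instead of printing, so the results can be compared:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ForEachDemo {
    // java1.5 style: the foreach statement.
    static String joinWithForEach(List<String> list) {
        StringBuilder out = new StringBuilder();
        for (String item : list) out.append(item).append(';');
        return out.toString();
    }

    // Hand-written iterator loop, the pre-1.5 way of doing the same thing.
    static String joinWithIterator(List<String> list) {
        StringBuilder out = new StringBuilder();
        Iterator<String> i = list.iterator();
        while (i.hasNext()) {
            String item = i.next();
            out.append(item).append(';');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "c");
        System.out.println(joinWithForEach(list));                                // prints a;b;c;
        System.out.println(joinWithForEach(list).equals(joinWithIterator(list))); // prints true
    }
}
```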

This tactic is an excellent way to grow your language: Once the java-using population starts using a certain ‘pattern’ repeatedly, make it easier to write the pattern - so easy, in fact, that it ceases to be a pattern and just becomes something the language does naturally. That’s exactly what happened with foreach statements. That’s exactly what CICE does. It takes the SAM pattern, which is used a lot in java (Runnable and Comparator for starters), and makes it so simple to write that it becomes part of the language itself instead of an oft-used pattern. Waiting for uptake amongst the population, even though the syntax is convoluted, is a good way to vet features. It’s important that a feature is used a lot, otherwise you run into PL/I and perl problems. It becomes very very difficult to know what your language can do exactly, and writing parsers/compilers and further syntax extensions becomes exponentially more difficult.

Part IV: Considerations

A. ‘final’

You may have noted that anytime you use lexical scoping in java by declaring classes (anonymous or not) inside methods, you can only access from the outer lexical context those local variables that are final. The reason is basically as a failsafe - while java could certainly let you access non-finals from the inner scope, trying to process exactly what happens to a variable becomes a total crapshoot, because you never know when the inner code is called. Yet sometimes this is exactly what you are looking for. There’s a very simple tactic to allow write access where no write access is normally allowed: Use a 1-element array. You never change the array itself, merely the elements in it:

public void someMethod() {
    int x = 2; //not final, can NOT be accessed in new CICE declaration
    final int y = 2; //final, so accessible to CICE...
    // y = 5; //...but this would not compile, as 'y' is final.
    final int[] z = new int[1];  //final, so accessible.
    z[0] = 10; //perfectly legal: 'z' itself never changes, only its element does.
}

However, this is ugly syntax. Some of the advanced considerations in the CICE proposal suggest that explicitly having to follow the compiler’s advice and declare variables used inside your ‘closures’ as final is silly, and instead the compiler should just go ahead and assume any variable used inside a closure is final. Where you want to explicitly change the variable’s value, use an explicit declaration of ‘public’ or somesuch to tell the compiler that, yes, you know what you are doing, nevermind that usually this kind of thing means you wrote buggy code. It’s really not that big an issue, given that you can emulate what you like with the ‘array’ trick above. Just eliminates the need for the array ‘hack’.
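For completeness, here is the array trick in a form you can actually run. CounterDemo and runThreeTimes are made-up names; the point is only that the anonymous Runnable writes to count[0] even though count itself is final:

```java
class CounterDemo {
    static int runThreeTimes() {
        // The array reference is final, so the anonymous class may capture it;
        // the single element inside it remains writable.
        final int[] count = new int[1];
        Runnable increment = new Runnable() {
            public void run() {
                count[0]++;
            }
        };
        increment.run();
        increment.run();
        increment.run();
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(runThreeTimes()); // prints 3
    }
}
```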

B. Annotations

Annotations are a java1.5 device that allows you to decorate certain elements of your java code (methods among them) with identifiers that tools can then look at. For example, you could stick a @Profile annotation on a method, and then your eclipse might generate a report of how many milliseconds a given run of your code spent inside each @Profile-marked method (profiling them all becomes very slow and would generate an enormous report).

Here’s a method signature with each element used:
public @Profile void methodName(String arg1, int arg2) throws SomeException

‘public’, ‘void’, ‘methodName’, and ‘throws SomeException’ are all taken from the declaring interface or abstract class in CICE, and the parameter list ‘String arg1, int arg2’ is explicitly provided with the CICE declaration. So, everything is covered… except the @Profile annotation.

Just chunking them in along with the CICE might be possible but probably makes parsing too difficult, not to mention just generally looks ugly (annotations are not normally supposed to show up just anywhere). The CICE proposal does not support annotating the method, though neither does BGGA. The old-style way DOES. Because I expect that tagging SAMs with an annotation is not exactly a common occurrence, you could just go with CICE, and then, if you need to tag a method, ‘split’ the CICE declaration back into an old-style declaration and add it there. Because CICE can be automatically split back to old style this is no problem. In BGGA however, there’s simply no alternative.
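A sketch of the old-style route, to show that the annotation really does end up on the method. The @Profile annotation here is one I define myself (there is no such annotation in the java library), and AnnotatedSamDemo, makeTask and isProfiled are equally hypothetical:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class AnnotatedSamDemo {
    // Made-up marker annotation, kept at runtime so a tool could find it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Profile {}

    static Runnable makeTask() {
        // Old-style anonymous class: there is a method declaration to annotate.
        return new Runnable() {
            @Profile
            public void run() {
                // the work to be profiled would go here
            }
        };
    }

    // Stand-in for what a profiling tool would do: look for the marker.
    static boolean isProfiled(Runnable task) {
        try {
            return task.getClass().getMethod("run").isAnnotationPresent(Profile.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isProfiled(makeTask())); // prints true
    }
}
```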

C. Library

Any change in the language should be matched by updates to the library. The CICE proposal names some places where CICE currently can’t be used, though it would be useful if it could be. All fairly simple changes. For BGGA these changes would also be necessary, and far more extensive.

Example: take the somewhat unfortunate java.awt.event.WindowListener interface. It has a bunch of methods (more than 1), so it’s not a SAM and you can’t CICE one. Yet it would be very handy if somehow you could. Adding the following class would help, and preserves 100% backwards compatibility. Just add this class, throw it in java.awt.event, and you can CICE with WindowListener, no further changes required:

public abstract class WindowIconifiedListener extends WindowAdapter {
    public abstract void windowIconified(WindowEvent event);
}

And similar for each other method in the WindowAdapter class. Now you can add a CICEy iconifiedlistener like so:

appWindow.addWindowListener(WindowIconifiedListener(WindowEvent event) { goToSystrayInstead(); });

Nice, neat, legible one-liner. NOT adding such struts to the Java Runtime library would be a big mistake, obviously, because it would limit the use of CICE declarations - once you have to make your own WindowIconifiedListener there’s a lot less impulse to use this.

BGGA would require far more extensive changes in order to incorporate something like a BGGA’ed WindowIconifiedListener - a full wrapper that calls the BGGA closure instead of this extending one-liner.
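Until such a class ships with the library, nothing stops you from rolling your own and using it old-style. A minimal sketch, with the helper class nested in a made-up IconifiedListenerDemo and a boolean flag standing in for goToSystrayInstead():

```java
import java.awt.event.WindowAdapter;
import java.awt.event.WindowEvent;
import java.awt.event.WindowListener;

class IconifiedListenerDemo {
    // Re-declares one WindowAdapter method as abstract, which turns the
    // class into a SAM type: exactly one method left to implement.
    abstract static class WindowIconifiedListener extends WindowAdapter {
        public abstract void windowIconified(WindowEvent event);
    }

    static WindowListener makeListener(final boolean[] called) {
        return new WindowIconifiedListener() {
            public void windowIconified(WindowEvent event) {
                called[0] = true; // stand-in for goToSystrayInstead()
            }
        };
    }

    public static void main(String[] args) {
        boolean[] called = new boolean[1];
        WindowListener listener = makeListener(called);
        // Invoked directly with a null event so no real window is needed.
        listener.windowIconified(null);
        System.out.println(called[0]); // prints true
    }
}
```

In a real application you would pass the listener to addWindowListener instead of calling it by hand.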

D. Inference

Java doesn’t normally like inference, on the premise that if the compiler gets it wrong, you either don’t immediately notice or only notice at runtime, both of which can cause potentially days of debugging and/or rewriting, whereas explicitly writing the type is a job so simple, eclipse can do it for you.

Take: List<String> list = new ArrayList<String>();

The only thing that is pretty stupidly superfluous here is the dual ‘String’. Sometimes these generics types get very large (e.g. a map that binds IP addresses to a set of valid hostnames for them would look like: Map<InetAddress, Set<String>> map = new HashMap<InetAddress, Set<String>>();), so the doubling up of the generics info can be annoying.

Static methods infer any generics off of the context and parameters. For example, this is perfectly legal, warning free:

List<String> stringList = Collections.singletonList("hello");

- singletonList is a static method with generic types, but between the fact that “hello” is a string and that it gets assigned to a list of strings, the java compiler infers that singletonList is being invoked here under the generics resolution of T = String.

But for constructors (stuff with ‘new’) you MUST specify the generics stuff again in the constructor call or you get a generics warning. A constructor is pragmatically almost the same as a static method - you call either with no context of an object. In fact, a static method that returns an object of its own class type is functionally equivalent to a constructor and is an oft-used pattern. Why static methods infer generics types where possible but constructor calls never do is a mystery to me, and it makes CICE calls needlessly wordy still - specifying explicitly that you have a Comparator that compares integers, immediately followed by a parameter list with 2 integers in it, is overkill. The CICE ‘future thoughts’ section mentions this.
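The usual workaround is to wrap the constructor in a static factory method so the inference kicks in. A sketch, using a hypothetical newHashMap helper of my own (not a java library method) and plain String keys to keep it short:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class InferenceDemo {
    // Static factory: K and V are inferred from the assignment context,
    // so the caller writes the generics info only once.
    static <K, V> Map<K, V> newHashMap() {
        return new HashMap<K, V>();
    }

    static Map<String, Set<String>> buildHosts() {
        Map<String, Set<String>> hosts = newHashMap(); // no repeated type arguments
        Set<String> names = new HashSet<String>();
        names.add("localhost");
        hosts.put("127.0.0.1", names);
        return hosts;
    }

    public static void main(String[] args) {
        System.out.println(buildHosts().get("127.0.0.1")); // prints [localhost]
    }
}
```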
