May 16th, 2008

Python/Ruby: script languages, nothing more.

Tim Bray of Sun has been interviewing the JRuby guys and stumbled upon the crux of why Python and Ruby are effectively sucktastic when it comes to implementing projects of more than, say, 6 man-months: they don’t play ball with the IDE. Rather unfortunately, this conclusion leads him to the wrong realization. See here, the fantastic delusions of an evangelist.

To summarize, he advocates letting the IDE watch over the shoulder of the interpreter as it runs through a unit test suite. That way, the IDE can identify, for each variable it finds in the code, which types of values it ever holds. (Ruby/Python aren’t weakly typed; each object does have a very specific type. It’s just that references to them are untyped, so for any given occurrence of a reference in the code, the IDE has no clue what it could contain. Objects exist only at runtime, not at compile (read: writing code) time).
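To make the idea concrete, here is a minimal sketch of what "watching over the interpreter's shoulder" could look like in Python. It is not Tim Bray's design or any IDE's actual mechanism; it simply uses the standard `sys.settrace` hook, and the names `observed`, `watch`, `double` and `test_double` are made up for illustration.

```python
import sys
from collections import defaultdict

# Hypothetical store: (function name, variable name) -> set of type names seen.
observed = defaultdict(set)

def _tracer(frame, event, arg):
    # On every executed line and on return, note the type of each local variable.
    if event in ("line", "return"):
        for name, value in frame.f_locals.items():
            observed[(frame.f_code.co_name, name)].add(type(value).__name__)
    return _tracer  # keep tracing inside this frame

def watch(test_func):
    """Run test_func under the tracer and return the types that were observed."""
    previous = sys.gettrace()
    sys.settrace(_tracer)
    try:
        test_func()
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return observed

def double(x):
    result = x * 2
    return result

def never_tested(y):
    return y + 1

def test_double():
    assert double(2) == 4
    assert double("ab") == "abab"

types_seen = watch(test_double)
```

After the run, `types_seen` knows that `x` in `double` held both an `int` and a `str`, but it has no entry at all for `never_tested`: the tracer only learns about code the tests actually execute.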

Hold the phone! Are you nuts? Short of 100% code coverage, any refactoring is always an unknown, and any bugs it causes will by definition not be caught by any of your unit tests. Reaching 100% code coverage itself is waaaaaaaaaaaaaaaaaaaaaaaaaaaaay (I cannot stress enough by how many factors!) more work compared to the very mild job of adding types to all your declarations, something your IDE practically does for you, I might add. Not to mention the time it saves you on debugging.

I’m beginning to believe that python/ruby folk are so obsessed with unit tests exactly because they can’t trust their compiler any farther than they can throw it, because it can’t help but see your code as a pile of text instead of a structured set of links. (For you agile fanboys out there who believe my statements that unit tests aren’t all that useful and generally a waste of time, remember, agile blows).

Tim, and all others convinced by the droning of the evangelists out there: It’s really quite simple. An IDE is a fantastically useful device to work with, and purposefully eliminating any chance your IDE has of becoming a second pair of eyes (because, let’s face it, nobody really does pair programming, it just sounds nice) is a monumental error.

Python and Ruby are scripting languages because they do not scale along the development process. Working on a big project with more than one person very quickly spirals out of control in languages like that, without an IDE to keep ordnung. I’m not saying java is freed of all blame here - there’s loads of boilerplate and syntax issues that can be cleared up (without ruining the strengths of java like Gafter’s and Gosling’s latest brainfart, The closures for java proposal), and there’s a general feeling amongst the community that convoluted forests of factories and XML settings files make for ‘flexible’ code. Unfortunate. At least these issues are fixable.

Weaning python and ruby off of their obsession with dynamics is impossible.

There’s Hope! Sane java closures!

The insanity of Gafter and Gosling, as showcased in this ridiculous proposal in an apparent vain attempt to change java into something it clearly isn’t, has finally met serious resistance.

Java hero Josh Bloch (yeah, the Java Puzzlers guy) together with other java aficionados Bob Lee and Doug Lea have proposed a brilliant counter ‘implementation’ of java closures which simply highlights the fact that java already has ‘em - the single-method interface (or abstract class with only 1 abstract method).

I know the proposal is brilliant because, save for the fact that Roel and I kept the ‘new’ in the more concise syntax for anonymous implementations of such a single-method interface, it’s exactly the same. Important: this highlights how natural the CICE (as Bloch/Lee/Lea call it) proposal really is; a group of two resp. three people came up with the same plan to solve the same problem. Exactly the same plan, in fact. How natural is that?

CICE for president^H^H^H^H^H^H^H^Hjava closures!

Concise Instance Creation Expressions - Dissection

Due to popular demand, a redux explanation of why CICE (the stuff I’ve been raving about in the past log) is good, and what it tries to do.

A cunning plan!

Well, after some persistence I can participate in the Dutch Programming Championship as a spectator (too old to join as a contestant :-P ). I’ve worked out an interesting observation during the university-level round held 2 weeks ago.

First some background:

The ICPC model (used during the whole lot from uni-level to country-level to europe-level to world-level) of programming championship involves your team (team size maximum 3, but there’s only one computer) getting 7 to 9 assignments. The assignments vary in difficulty and are randomly ordered. You get 5 hours to solve as many of the assignments as you can. You can submit solutions for an assignment as often as you like, and each attempt (successful or not) is logged on a publicly viewable scoreboard. A correct submission scores you a scorepoint, but also costs you X penalty minutes, where X = (time since start of tournament in minutes + 20 * wrong submits on that assignment). Your performance is rated first on scorepoints (so with 3 points you win from all 2-pointers, regardless of penalty minutes). If the scorepoints are a tie, the penalty minutes become relevant.

In other words, as long as you don’t ever solve a certain assignment, you can submit as many wrong solutions as you like, with no penalty whatsoever. Your attempts show up on the scoreboard, but no clue is given as to why the submit was wrong (not even to you, by the way).
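The scoring rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not the real contest software; the function names and the `(problem, minute, accepted)` tuple layout are assumptions.

```python
def team_score(submissions):
    """Score one team.

    submissions: list of (problem, minute, accepted) tuples in time order.
    Returns (scorepoints, penalty_minutes).
    """
    wrong = {}   # problem -> number of wrong submits so far
    solved = {}  # problem -> penalty minutes charged for it
    for problem, minute, accepted in submissions:
        if problem in solved:
            continue  # anything after a correct submit is ignored
        if accepted:
            # X = time since start in minutes + 20 * wrong submits on this problem
            solved[problem] = minute + 20 * wrong.get(problem, 0)
        else:
            wrong[problem] = wrong.get(problem, 0) + 1
    return len(solved), sum(solved.values())

def rank_key(score):
    """Sort key: more scorepoints first, then fewer penalty minutes."""
    points, penalty = score
    return (-points, penalty)

# Team 1 solves A and B (B after one wrong try) and fails C three times.
team1 = team_score([
    ("C", 5, False), ("C", 6, False), ("C", 7, False),
    ("A", 30, True),
    ("B", 90, False), ("B", 100, True),
])
# Team 2 solves only A, but very quickly.
team2 = team_score([("A", 20, True)])
ranking = sorted([("team2", team2), ("team1", team1)], key=lambda t: rank_key(t[1]))
```

Note that the three failed attempts on C cost team 1 nothing, because C was never solved, which is exactly the loophole the plan below relies on.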

One of the challenges is to finish the assignments in the correct skill order, from easiest to hardest. That way you pick up as few penalty minutes as possible. Some puzzles are obvious in skill level, but some seem so incredibly convoluted that either there’s a gimmick, or it’s NP-complete and thus simply a matter of working out any solution - it ought to be fast enough. (A big part of the game is to submit a fast entry; the reference implementation’s time is multiplied by about 10ish, and any submissions slower than that are wrong. Most assignments would be trivially easy if it WEREN’T for the requirement that you build the fastest order of algorithm).

Trying to analyse if an assignment is ‘hard’ or ‘easy’ is very difficult. The scoreboard has traditionally provided a big clue. If you notice a load of teams quickly (you can see the time difference between 2 successful submits by looking at the penalty minutes and attempts, and doing the math) solving an ostensibly difficult assignment, you can bet good money there’s a hidden gimmick that allows you to reduce the whole thing to a trivial set of loops or even a one-liner formula. If no one is trying a certain assignment at all, you can bet fair money everyone who has looked at it so far thinks it’s hard.

All teams take their cues from each other for which assignments to try. However, and here’s the interesting bit: During the university-local rounds (every university runs the same assignment set), universities only see the scoreboard of their own university’s teams. Turns out that per university, the set of assignments that no one tried is different. In Delft, no one tried I; at another university, no one tried G. Apparently, if the first couple of groups start trying one assignment, the rest automatically assume those must be the easy ones and also try the same ones. This in turn drives still more groups over to try those ones - yet it is not at all certain that the ones everyone’s working on are actually the next easiest assignments in line.

Wisdom of Crowds? Nope. Even in this case, where you can barely see what people think, there are plenty of good reasons not to blindly follow the crowd, and most participants do have a clue, the wisdom of the crowd fails. I re-assert my theory that any kind of easy viewing of choices made earlier by others reverses the Wisdom of Crowds and gives you the Stupidity of Sheeple. I bet if you remove the scoreboard completely, the spread of tried assignments roughly matches the difficulties.

And now for my cunning plan:

During the NKP (21st of October if memory serves), I’ll quickly submit an empty solution for those assignments I’m guessing are very difficult. I’m likely correct, in which case I waste no penalty points (you get no penalty points for assignments that you never succeeded at, even if you submitted many wrong ones), and I might convince other teams to waste time on them, especially if I convince one or two other teams on the sly to try the same trick. The scoreboard will then list 2 teams quickly trying assignment X - all wrong, but still trying, which might cause people to veer to that one, under the assumption we figured out a gimmick (especially if we submit faster than ordinary programming appears to suggest is possible, insinuating there’s a trick to quickly calculate the answer) and are just polishing out some bugs.

In the unlikely case I’m wrong and the assignment is actually easy/there IS a gimmick, those same people I inadvertently pointed at the assignment will at some point succeed, and this will be reflected on the scoreboard. Depending on the performance of the succeeding teams and the time it took, I can then make a much more solid assessment of the chance of solving that assignment - without spending more than about 2 minutes analysing it.

It isn’t cheating: the other team(s) that are playing the same psychological game do not have to communicate with each other during the game. Seeing them quickly submit a wrong answer for an assignment that looks, at a quick 1 minute glance, not trivial, tips you off what’s going on, and then you can submit an empty file yourself, shoring up the appearance that there must be a gimmick. In fact, it’s beneficial: That way you can leech off of each other’s assessment.

If I spent 10 minutes verifying that assignment C is indeed nigh undoable, and some other team spends 10 minutes doing the same thing to H, our mutual quick submits tip us off that neither is worth trying until someone nails it and proves us wrong, yet for those who don’t know what we’re doing, it actually looks like C and H are easy.

Of course some tactical planning is necessary to make it look somewhat believable (submitting an empty C and H 2 minutes apart would be bad, but a 15 to 20 minute hole sounds good).

I’ll let you know how it works out.

NB: Readers who are going, you know what to do :-)

A crotchety old Joel gets close.

Joel reviews Beyond Java here. For those not in the know, Beyond Java is a book that eschews java in favour of the recently popular scripting languages (python, ruby, you know ‘em). Joel first rightly explains that in many ways programming is just plain hard, and no magic bullet exists that will ever solve that problem.

Joel then goes on to explain ‘accidental difficulty’ - a rather obvious (but nonetheless very correct) view on what programming languages are designed to do.

But now we get to the meaty bits: Joel lists 5 crucial ‘revolutions’ in eliminating accidental difficulty, from the machine code by hand->assembler as the first, to memory managed languages (from LISP to java) as last. Joel then asks what the 6th might be… and then proceeds to bash static typing’s skull in, praising latent typing to the heavens with absolutely no arguments whatsoever.

Shame, because the 6th revolution is already here.

It’s the IDE.

Think about it: You know it’s true. LISP has it in the form of the REPL. Java has it in the form of the big 3 (Eclipse, NetBeans, IDEA).

Python and Ruby don’t have it. Both languages are designed, quite excellently I might add, to be the best, most productive tools on the block if you can’t use an IDE, but the very thriftiness that makes them so good if all you get is notepad blows their legs right off in the IDE department. It’s not really the fault of python, or ruby. They were conceived well before seriously cool and useful IDEs were even a glimmer. I also strongly believe java itself sort of lucked into the whole IDE revolution. I doubt Gosling and co decided to dump conditional compilation by the wayside because they realized that NOT having any sort of dynamics whatsoever makes eclipse capable of auto-complete, guaranteed safe refactoring, javadoc popups, errors-as-you-type, and all the other fun goodness.

Latent Typing is a lame duck. It saves you some characters and that’s it. How could it possibly be a ‘revolution’? Assembler has latent typing, guys - it isn’t new*.

*) Actually, assembler has both latent typing and weak typing, which means python is still a big improvement. At least you can check types at runtime in python.

Learning new stuff: Java is king once again

In the span of the past 2 weeks, I’ve tried to pick up both GWT and TurboGears while on the clock for a project that needs to be done yesterday. Not much time to fiddle about, in other words.

TurboGears failed horribly and has been left by the wayside. GWT, on the other hand, is now something I’m already quite knowledgeable about, with about as much time spent working on either.

The key: auto-complete. For GWT, anytime I’m stumped I’ll auto-complete my way to an answer. Without moving away from the code file I’m editing, I notice that all graphical widgets are in the same package, so I copy/paste the package name, add a dot, and hit ctrl+space. There’s the whole list of all widgets GWT has to offer me, right there. Anytime I stop my scrolling and linger on a specific one, I get a popup with a little story on what it does, exactly.

In the mean time, TurboGears offers extensive documentation, multiple ‘quickstarts’ and ‘tutorials’, and a video that walks you through building a basic WIKI in about 15 minutes. None of this is available for GWT - all it has is some very simple sample apps, and 1 serious one (a mail app. I only looked at the CSS for some stylesheet tricks, not at the code).

I’ll accept that TurboGears tries to tackle more, but what with me writing a web server, and never having done fancy client-side styling before, the familiarity bonus definitely sides with TurboGears.

The key to this story is static typing. Between the javadocs, the auto-complete, and the instant red-line marker (“Whoops, that’s wrong. Please fix that.”) that keeps me from having to wade through a very large series of silly mistakes when I get around to running my doodles, learning GWT was fun. A lot of fun. I was smiling all the way through, amazed at what I could build with only minutes and no reading of any documentation whatsoever.

TurboGears in the mean time threatened to wipe my left pinky out for having to press cmd+tab so often. Fun it wasn’t. In this phase, having a language that allows very compact code tricks is something that might be fun when learning how to program, but consider this:

If you’re like most people and only reach expert level in one programming language, you only go through that process once. Picking up new tools and libraries is something you do once a year if not more often.

Knowing how to navigate java’s verbosity with patterns and eclipse templates is something I can do already. I’ve learned that once. Getting a massive boost in picking up new tools is priceless, and pays out -every- time.

Making your language easier to pick up whilst making new libraries and tools harder to learn down the road is a bad trade-off that might create fuzzy feelings when you dive into the language itself (It certainly did for me. When I decided to try out python I really liked it - until I started writing actual serious projects in it, then I turned into the java evangelist you know), but bites you in the arse in the long run.

Sidenote: GWT is -really- cool, and if your project can operate in a primarily SPA (Single-Page-App) way, it makes web devving so much easier, I’d call it a revolution.

Python nails one - What makes a good language change

Python 2.5 includes a with syntax (it’s nothing like javascript’s “with”) that helps work with objects that represent resources that need to be explicitly released. Loads of those around, from files to network connections to database sets and queries.

Good.

Basically, you write:

# Python 2.5 itself needs: from __future__ import with_statement
with open("x.txt") as f:
    data = f.read()
    print(data)  # do something with data

instead of:

f = open("x.txt")
try:
    data = f.read()
    print(data)  # do something with data
finally:
    f.close()

The object returned by open("x.txt") is now ‘guarded’ - it doesn’t matter how the with code block closes, that object gets notified that it should clean itself up.
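How the object "gets notified" is worth a small illustration: the with statement calls the object's `__enter__` method on the way in and its `__exit__` method on the way out, even when the block raises. The `Resource` class below is a made-up stand-in for a file or connection, a sketch rather than anything from the standard library.

```python
class Resource:
    """Hypothetical resource that logs its own lifecycle, for illustration only."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        # Called when the with block is entered; the return value is bound by 'as'.
        self.events.append("acquired")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called however the block ends: normally, via return, or via an exception.
        self.events.append("released")
        return False  # False means: do not swallow the exception

res = Resource()
try:
    with res as r:
        r.events.append("used")
        raise ValueError("something went wrong inside the block")
except ValueError:
    pass  # the exception still propagates out of the with block
```

Even though the block blew up halfway, `res.events` ends with "released": the cleanup ran before the exception reached the surrounding try.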

It’s a good idea. In fact, it’s such a good idea, it’s been floating around java suggestion land for a while now in the form of this proposal by Josh Bloch. Apparently in java7 this has a real shot.

The reason this kind of language change is useful is because it nails 4 important bases for being a good addition to a programming language.

I’ll walk through them, explaining how the ‘with’ stuff passes, and how the world’s dumbest language ‘improvement’ ever, fails it. That would be C#’s coalesce.

1. It fills a needed gap.
Some changes enable stuff that simply wasn’t possible before. Always a tough call to see if the user base starts playing with such a feature. Others are simply a simplification or evolution of things already being done.

with: No brainer - try/finally constructions to ensure proper cleanup of resources amount to the same thing, and are being used with some frequency today. All those instances can be replaced with the with syntax. In general, convoluted but often used ‘patterns’ may safely graduate to language syntax. Similar stuff happened with e.g. java’s foreach statement.

coalesce: null checks in series hardly ever happen.

2. It improves readability by highlighting intent.
with: A ‘with’ statement is like a comment. It explicitly shows intent: This variable here is being initialized here with a resource that must be properly released. Not doing so is bound to cause hard-to-detect memory leaks (go write your little unit tests that find leaks, I dare you!) and a quick hack 2 days before the deadline might easily screw with the delicate balance of a try/finally based cleanup device. With a with-powered cleanup guard, you’d have to be a real idiot to screw it up now. Excellent.

coalesce: Intent isn’t really an issue here. The long way around, with a load of if statements, or just a library call, is just as obvious in intent.

3. It improves readability by parsing better
with: ‘with’ is somewhat succinct for impressing on the reader of the code what’s going on, but it’s certainly not a wild guess like ruby’s puts. More importantly, the shorter code combined with having all the relevant setup/teardown information in one line instead of spread out over just before the try and in the finally block severely helps your brain group things together when glancing at it quickly.

coalesce: I’ve asked over 20 people what ?? would do and not one of them knew. That’s bad. A load of google searches on coalesce amount to saying how cool this little-used/overlooked C# 2.0 feature is. That would imply those that managed to find this thing didn’t know what it did beforehand. Ruins code readability. Furthermore, Library.coalesce(a, b, c, d, e, f); is not significantly harder to grok at a quick glance compared to a ?? b ?? c ?? d ?? e ?? f, and the method gives you a name to search for and a place for documentation to explain what’s going on.

4. A library expansion doesn’t solve the problem
Oftentimes a language change can be implemented virtually as well using a core runtime library. Libraries tend to have the property that documentation for an aspect of them can be found far more quickly compared to a language quirk. Take javadocs/pydocs/etc and compare with finding the exact specifications of the syntax of a python generator. It’s just plain easier to look up some library function. It also avoids having to mess with the compiler, and all related language tools, like all IDEs, templating engines and who knows what. Even if a library solution isn’t quite as elegant as a new language feature, the gap has to be large to warrant tacking more features onto the language itself.

with: with is a language feature but there’s no easy way to pull this off with libraries. In fact, the with statement leads to better ‘librarification’ of your code - the setup and teardown can now be squared away in the object itself, without having to explicitly remember the ‘setup’ and ‘teardown’ method calls and putting them just before the try and in the finally. A library hack would have involved passing a pointer to a function together with a pointer to a construction device to some library system, where the library first asks the construction system to produce an object, then calls the function in a try/finally guarded block, and calls the constructed object’s teardown method in the finally block. Python makes writing on-the-spot methods a bit annoying (if it doesn’t fit into a lambda) and it would have been a serious nuisance to do things this way - it makes the ‘guarded’ code and the stuff around it look more separate than it really is.
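For comparison, the library hack described above might look roughly like this. It is a sketch only: `guarded` is a hypothetical helper, not an existing library function, and it assumes the teardown method is called `close`.

```python
import io

def guarded(factory, body):
    """Hypothetical library helper: build a resource, run body on it, always tear down."""
    resource = factory()       # ask the 'construction device' for an object
    try:
        return body(resource)  # run the guarded code
    finally:
        resource.close()       # assumed teardown method

# The guarded code has to be pulled out into its own function (or squeezed into
# a lambda), which separates it from the code around it.
buf = io.StringIO("hello")

def read_all(f):
    return f.read()

data = guarded(lambda: buf, read_all)
```

It works, and `buf` is closed afterwards, but the body of the work now lives in `read_all`, away from the place where the resource is opened. That separation is the nuisance the with statement avoids.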

coalesce: At least in java coalesce could have been implemented in a better way with a 3-liner library function and generics:

public static <E> E coalesce(E... objects) {
   if ( objects == null ) return null;
   for ( E object : objects ) if ( object != null ) return object;
   return null;
}

That would give you Library.coalesce(a, b, c, d, e, f); instead of a ?? b ?? c ?? d ?? e ?? f. The gap has to be well in favour of the language change to make it a good plan, and in this case I’d actually say the library solution is better.



A couple of other interesting criteria exist but I’d say these are the main ones to consider.
