Queues and stack traces
rzwitserloot posted in programming on March 8th, 2007
A lot of Java apps employ queues and separate threads to handle the jobs inside them. Lombok, the single-threaded webserver I’ve been working on, is one of them, but there are lots of apps where this can help.
One of the problems with this is utterly pointless stack traces. The stack trace only shows the logistics of the daemon thread handling the queue, not the code that introduced the faulty job to the queue.
A simple way to help yourself is to store a stack trace at the time the job is introduced to the queue. For example:
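public void addToQueue(Job job) {
    // Capture the caller's stack now; the worker thread will never see it.
    Throwable stackOnQueue = new Throwable();
    synchronized ( this ) {
        job.setStackOnQueue(stackOnQueue);
        jobQueue.add(job);
        notifyAll();
    }
}

public void run() {
    while ( running && !Thread.interrupted() ) {
        Job nextJob;
        synchronized ( this ) {
            nextJob = jobQueue.poll();
            if ( nextJob == null ) {
                try {
                    wait();
                } catch ( InterruptedException ie ) {
                    return;
                }
                continue;
            }
        }
        // Handle the job outside the lock so addToQueue is never blocked by a running job.
        try {
            handleJob(nextJob);
        } catch ( Exception e ) {
            //e's stack trace here isn't of much use...
            //but nextJob.getStackOnQueue() is much more useful!
        }
    }
}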

March 8th, 2007 at 21:51
That problem is definitely for real! It gets even worse when you use remote calls - each node logs a slice of the actual invocation chain, but it’s very hard to match them up…
Ideas for improvements:
You can use job.getStackOnQueue().initCause(e) so you’ll get a cheap combined stack trace - the “stackOnQueue” stacktrace goes up to the point where the job was added, and the “e” stacktrace continues to the point where the original exception was thrown. This works because you know that the stackOnQueue throwable does not have a cause yet.
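The initCause suggestion above can be sketched roughly as follows. This is a minimal illustration, not the code from the post: the class name CombinedTraceSketch and the helper combine are invented here. It relies on the documented behavior that Throwable.initCause returns the same Throwable it was called on, and throws IllegalStateException if a cause was already set.

```java
class CombinedTraceSketch {
    /**
     * Attaches the worker-side failure as the cause of the Throwable
     * captured at enqueue time, and returns that same Throwable.
     * (Hypothetical helper; the name is not from the original post.)
     */
    static Throwable combine(Throwable stackOnQueue, Exception failure) {
        // Safe only because stackOnQueue was built with a constructor that
        // takes no cause; a second initCause call would throw
        // IllegalStateException.
        return stackOnQueue.initCause(failure);
    }

    public static void main(String[] args) {
        Throwable stackOnQueue = new Throwable("job was queued here");
        Exception failure = new IllegalStateException("job failed");
        // Prints the enqueue-time trace first, then "Caused by:" with the
        // trace of the actual failure.
        combine(stackOnQueue, failure).printStackTrace();
    }
}
```

In the worker loop, the catch block could then log the combined Throwable instead of e alone.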
I’m not quite sure that the throwable and the job are matched up correctly in your example - wouldn’t you have to store the throwable in the job, or use a second parallel queue for the throwables? Otherwise, you may get the stack trace of the job last added, instead of the stack trace of the job just executed.
Last point would be about the performance of creating exceptions - if your jobs are very lightweight (e.g. numeric calculations or graphics rendering, as opposed to e.g. database access) then the cost of constructing an exception can be significant, as the JVM needs to traverse the stack.
March 9th, 2007 at 2:19
The throwable IS stored in the job. The reason addToQueue generates the Throwable is to have less cruft in the trace: if setStackOnQueue did it, there’d be two pointless calls in there, the call to setStackOnQueue and the call to addToQueue. A more involved alternative is to use getStackTrace to manually eliminate stack trace entries until you get to the call to addToQueue, take that one off, and work with whatever’s left.
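The "more involved alternative" might look something like the sketch below. It is an assumption-laden illustration: the class TrimmedTraceSketch, the helpers trim and capture, and the stand-in addToQueue and caller methods are all invented for the example. Only getStackTrace, setStackTrace and Arrays.copyOfRange are real JDK calls.

```java
import java.util.Arrays;

class TrimmedTraceSketch {
    /**
     * Drops every frame up to and including the first frame whose method
     * name equals queueMethod. Leaves the trace untouched if no such frame
     * exists.
     */
    static Throwable trim(Throwable t, String queueMethod) {
        StackTraceElement[] frames = t.getStackTrace();
        for (int i = 0; i < frames.length; i++) {
            if (frames[i].getMethodName().equals(queueMethod)) {
                t.setStackTrace(Arrays.copyOfRange(frames, i + 1, frames.length));
                break;
            }
        }
        return t;
    }

    // Hypothetical helper: creating the Throwable here adds an extra frame,
    // which trim() removes again together with the addToQueue frame.
    static Throwable capture() {
        return trim(new Throwable(), "addToQueue");
    }

    // Stand-in for the real queue method; it only returns the capture.
    static Throwable addToQueue() {
        return capture();
    }

    // Stand-in for application code that queues a job.
    static Throwable caller() {
        return addToQueue();
    }

    public static void main(String[] args) {
        // The first remaining frame is the code that called addToQueue.
        System.out.println(caller().getStackTrace()[0].getMethodName());
    }
}
```

Matching on the method name alone is crude; a real version would probably compare the class name as well.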
I like the initCause thing, especially because the verbiage adds up. The stack trace is (stuff when job was queued) and the cause was (actual exception). Excellent.
Yes, this is only for big jobs. In this specific case, it’s for the mailer queue (it’s a webserver, it should ship with a simple way to send mails, and now it does). The actual ‘servlets’ (RequestHandler - had to reinvent servlet from the ground up for this) are all guaranteed to run in a single thread. You can communicate with any other running connection handler without worrying about synchronizing.
Incidentally, the absolute requirement to never block while handling jobs also introduces ‘functional’-like programming in a big way. E.g. all my inner API calls work thusly:
CheckCookie.go(db(), cookieFromJsonRequest, new CheckCookie.Receiver() {
    public void checkCookieOk(int userId, String email) { /* do stuff */ }
    public void checkCookieFailed(Reason reason) { /* do stuff */ }
});
Where Reason is an enum defined inside Receiver (so that’s an enum inside an interface inside a class). The code is extremely readable; CICE or closures don’t help one iota. Just about every job has at least two entirely different results (success and failure, and sometimes more), and once you’re in multi-callpoint land, standard Java anonymous class notation is not in any significant way annoying to use, save perhaps for the need to declare variables final.
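For readers who want to see the shape of such an API, here is a self-contained sketch. Everything in it beyond the names CheckCookie, Receiver, Reason, checkCookieOk and checkCookieFailed is an assumption: the Reason constants are invented, a plain Map stands in for db(), and go calls the receiver synchronously, whereas the real thing would presumably call back later from the event loop.

```java
import java.util.HashMap;
import java.util.Map;

class CheckCookieSketch {
    /** Hypothetical stand-in for the CheckCookie class described above. */
    static class CheckCookie {
        interface Receiver {
            // An enum inside an interface inside a class.
            // The constants are invented for this sketch.
            enum Reason { UNKNOWN_COOKIE, MALFORMED }

            void checkCookieOk(int userId, String email);
            void checkCookieFailed(Reason reason);
        }

        // Assumption: a map of cookie -> "userId:email" plays the role of db().
        static void go(Map<String, String> db, String cookie, Receiver receiver) {
            if (cookie == null || cookie.isEmpty()) {
                receiver.checkCookieFailed(Receiver.Reason.MALFORMED);
                return;
            }
            String row = db.get(cookie);
            if (row == null) {
                receiver.checkCookieFailed(Receiver.Reason.UNKNOWN_COOKIE);
                return;
            }
            String[] parts = row.split(":", 2);
            receiver.checkCookieOk(Integer.parseInt(parts[0]), parts[1]);
        }
    }

    /** Runs one lookup and reports which of the two continuations fired. */
    static String describe(Map<String, String> db, String cookie) {
        final StringBuilder out = new StringBuilder();
        CheckCookie.go(db, cookie, new CheckCookie.Receiver() {
            public void checkCookieOk(int userId, String email) {
                out.append("ok ").append(userId).append(' ').append(email);
            }
            public void checkCookieFailed(Reason reason) {
                out.append("failed ").append(reason);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<String, String>();
        db.put("abc123", "42:someone@example.com");
        System.out.println(describe(db, "abc123")); // ok 42 someone@example.com
        System.out.println(describe(db, "nope"));   // failed UNKNOWN_COOKIE
    }
}
```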
So far it’s shaping up to actually be a pretty good web platform in general, and it gets up to ludicrous pages/sec speeds as there is never any synchronization going on, and never any thread creation, construction, or much swapping. A thing or two depends on how heavily you’re using the DB of course (which runs its queries in a pool of threads, and hands you the results back in the ‘main’ thread).
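One way to picture "queries in a pool of threads, results back in the main thread" is the sketch below. It is not Lombok's code: the class HandBackSketch, the Callback interface, and the methods query and runNext are all hypothetical, and it uses Java 8 lambdas for brevity. The pool threads never invoke the callback themselves; they only post a Runnable to a queue that the main thread drains.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

class HandBackSketch {
    /** Hypothetical callback; exactly one of result and error is non-null on success/failure. */
    interface Callback<T> {
        void done(T result, Exception error);
    }

    // Work items that must run on the main thread.
    private final BlockingQueue<Runnable> mainQueue = new LinkedBlockingQueue<>();
    // Assumption: a small fixed pool stands in for the DB thread pool.
    private final ExecutorService dbPool = Executors.newFixedThreadPool(2);

    /** Runs work on a pool thread, then queues the callback for the main thread. */
    <T> void query(Callable<T> work, Callback<T> callback) {
        dbPool.execute(() -> {
            try {
                T result = work.call();
                mainQueue.add(() -> callback.done(result, null));
            } catch (Exception e) {
                mainQueue.add(() -> callback.done(null, e));
            }
        });
    }

    /** Called by the main thread: blocks for the next callback and runs it. */
    void runNext() {
        try {
            mainQueue.take().run();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void shutdown() {
        dbPool.shutdown();
    }

    public static void main(String[] args) {
        HandBackSketch loop = new HandBackSketch();
        loop.query(() -> 6 * 7, (result, error) ->
            System.out.println(Thread.currentThread().getName() + " got " + result));
        loop.runNext(); // the callback runs here, on the main thread
        loop.shutdown();
    }
}
```

Because every callback runs on the one main thread, handler code can touch shared state without locks, which matches the "never any synchronization" claim for the handlers themselves; the queue is the only synchronized hand-off point.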
March 9th, 2007 at 6:47
This is nothing but continuation-passing style. What’s more, programming in this style with success/failure continuations is a specific instance of a monad.
March 10th, 2007 at 1:32
Yup, sure is. It’s in fact the only sensible answer I managed to come up with that avoids spaghetti code, ugly hacks, inflexible garbage, and a number of other pitfalls. Fortunately this is perfect.
So you now have come to your senses and are in agreement that java can in fact do monads and all that jazz? Finally, progress!
March 10th, 2007 at 6:50
If Java could do monads, you wouldn’t be doing the CPS transform by hand. Sorry, but the mere fact that you’re writing a web server whose sole design goal is to avoid spawning threads for I/O because threads are too expensive shows how limited Java is. Nothing dictates that language threads have to map 1:1 to native threads; it is just the way the Java designers chose to do it. Hence nio, thread pools, work queues, all kinds of unnecessary abstractions.
March 10th, 2007 at 16:54
So does Factor have a webserver yet that beats Apache’s performance?
March 10th, 2007 at 22:58
It doesn’t beat Apache’s performance yet, but we’re working on it. It’s fast enough to run all the sites I need to host.
I don’t believe your web server is faster than Apache either. If it is, I still win because mine is smaller (1200 lines of code).
March 10th, 2007 at 23:05
Erlang’s Yaws is a fast web server written in a real language: http://www.sics.se/~joe/apachevsyaws.html
“Apache dies at about 4,000 parallel sessions. Yaws is still functioning at over 80,000 parallel connections.”
Is your web server more scalable than Yaws?
March 10th, 2007 at 23:26
Provided the OS itself doesn’t begin borking out, 80k is not a problem whatsoever. (My MacBook just starts dropping connections once the active localhost-to-localhost TCP connections exceed about 1000; possibly that is just a hard limit on sockets that can be fixed in e.g. a custom Linux kernel build, or perhaps a setting someplace.)
I’ll assume for a moment that Yaws keeps logistic bitpushing to minimums similar to Lombok’s, which means the CPU’s actual ability to process stuff is the only limit; the number of connections is simply not a relevant number anymore.
Note that in the test I refer to above, any connection that gets handled immediately spawns another one, and after about a minute of that, you hit about 1 page served every 0.8 milliseconds, while Eclipse is running, on a 1.8 GHz Core Duo MacBook with 1 GB of memory, not even using the ‘server jvm’ (though I doubt that’ll make a large difference).
Your average servlet is lucky if it manages 10 ms/page.
So, yeah, I feel quite confident I can take on yaws.
March 10th, 2007 at 23:29
Uh, who gives a flying tard about how many LOC a webserver is?
I know some extremely funny and impressive basic one-liners. Do those ‘win’?
My webserver beats the pants off of Apache and has for a while now. Throw in actual dynamic serving, and it beats entire collections of pants off of Apache, depending on the combo (tomcat/servlets is near trivial to beat, as is Ruby on Rails or django/turbogears, but even properly tweaked FastCGI based setups are left in the dust).
My current job (http://tipit.to/) will launch on the back of this webserver. Given that, if successful, it’ll be serving a small nugget of data to an enormous stack of pages, it’ll be a fine test of stability and performance.
March 19th, 2007 at 21:37
Is Lombok public? I’d like to sniff that architectural style… I guess I can appreciate it better in Java, since my Smalltalk is a bit rusty…
March 20th, 2007 at 9:07
Soon. It’s mostly an issue of packaging it up nicely and not having the time to do this at the moment. But… soon!