MOX: Mapped Object XML
Parse any XML - a 15 minute tutorial
This document serves three purposes: it explains what MOX is, it serves as a tutorial for MOX, and it shows you why MOX is usually the XML parsing technology you will want to use.
Why use another XML parser technology?
Java currently ships with 2 different XML parsing and writing technologies: DOM and SAX. MOX offers something new: MOX is simple. Extremely simple.
MOX only works if you know the structure of the XML beforehand. However, unless you are writing some sort of XML editor, this is virtually always the case. The need to be aware of the XML structure is the only limiting factor to MOX. Other than that, the sky's the limit - read XML, write XML, read it all into memory before working with the data, or process the data in small chunks. MOX can do all of this, very very simply.
MOX is easy to implement and easy to learn, and your code looks extremely clean instead of the usual mess you get when parsing XML with SAX or DOM.
There are existing alternatives to SAX and DOM, but none of them are as easy to use as MOX. MOX is also stand-alone; it's a small 33k jar file which does not use java's existing XML architecture. It's compatible with any java edition (including j2me) and you can ship it with any java product without bloating the size of your own application.
An example XML format to parse: ATOM
To show off why MOX is so simple, we'll write a parser for a real, moderately complicated XML-based format: ATOM - the feed/syndication format.
Here's a sizable ATOM sample, and we'll parse all of it inside of 10 minutes:
    <?xml version="1.0" encoding="utf-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom">
      <title>dive into mark</title>
      <updated>2005-07-31T12:29:29-0100</updated>
      <id>tag:example.org,2003:3</id>
      <link rel="alternate" type="text/html" hreflang="en" href="http://example.org/"/>
      <link rel="self" type="application/atom+xml" href="http://example.org/feed.atom"/>
      <rights>Copyright (c) 2003, Mark Pilgrim</rights>
      <generator uri="http://www.example.com/" version="1.0">
        Example Toolkit
      </generator>
      <entry>
        <title>Atom draft-07 snapshot</title>
        <link rel="alternate" type="text/html" href="http://example.org/2005/04/02/atom"/>
        <link rel="enclosure" type="audio/mpeg" length="1337" href="http://example.org/audio/ph34r_my_podcast.mp3"/>
        <id>tag:example.org,2003:3.2397</id>
        <updated>2005-07-31T12:29:29+0100</updated>
        <published>2003-12-13T08:29:29GMT-04:00</published>
        <author>
          <name>Mark Pilgrim</name>
          <uri>http://example.org/</uri>
          <email>f8dy@example.com</email>
        </author>
        <contributor>
          <name>Sam Ruby</name>
        </contributor>
        <contributor>
          <name>Joe Gregorio</name>
        </contributor>
        <content>
          [Update: The Atom draft is finished.]
        </content>
      </entry>
    </feed>
Looks like a fairly difficult job if you had to parse this using DOM or SAX, no? With MOX, all we have to do is create very simple java objects that are structurally related to the XML we wish to parse. In this case, that means creating a class for each complicated tag structure we spot.
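For comparison, here is a rough sketch of what reading just the feed title and the top-level link targets might look like with the DOM API that ships with java. The class name DomFeedSketch, its helper methods, and the cut-down inline XML are made up for this illustration; only the javax.xml.parsers and org.w3c.dom calls are standard.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical comparison class; not part of MOX.
final class DomFeedSketch {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // A cut-down version of the sample feed, inlined so the sketch is self-contained.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
        + "<feed xmlns=\"http://www.w3.org/2005/Atom\">"
        + "<title>dive into mark</title>"
        + "<link rel=\"alternate\" href=\"http://example.org/\"/>"
        + "<link rel=\"self\" href=\"http://example.org/feed.atom\"/>"
        + "<entry><title>Atom draft-07 snapshot</title>"
        + "<link rel=\"alternate\" href=\"http://example.org/2005/04/02/atom\"/>"
        + "</entry>"
        + "</feed>";

    // Parse the XML text with a namespace-aware DOM parser and return the root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Direct children only: getElementsByTagNameNS would also return
    // the title and link tags nested inside <entry>.
    static List<Element> children(Element parent, String localName) {
        List<Element> result = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && ATOM_NS.equals(node.getNamespaceURI())
                    && localName.equals(node.getLocalName())) {
                result.add((Element) node);
            }
        }
        return result;
    }

    static String feedTitle(Element feed) {
        List<Element> titles = children(feed, "title");
        return titles.isEmpty() ? null : titles.get(0).getTextContent();
    }

    static List<String> feedLinkHrefs(Element feed) {
        List<String> hrefs = new ArrayList<>();
        for (Element link : children(feed, "link")) {
            hrefs.add(link.getAttribute("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        Element feed = parseRoot(SAMPLE);
        System.out.println(feedTitle(feed));
        System.out.println(feedLinkHrefs(feed));
    }
}
```

That covers two tags only, and it already needs a helper just to skip the tags nested inside entry.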
Our first mapped class: FEED
The top-level element in the ATOM XML format is the 'feed' element. We'll map the structure of the feed tag onto a new class, which we'll also call Feed. Start your favourite code editor, create an 'atom' subdir which we'll use as a package, and open a new file called Feed.java.
Our first order of business is to import com.itipjar.mox.* and to indicate to MOX that this class is in fact the mapping class for the feed tag:
    package atom;

    import com.itipjar.mox.*;

    @Tag(namespace="http://www.w3.org/2005/Atom")
    public class Feed {
    }
As you can see, we've imported mox so we can use the annotations found there, and we've marked our Feed class as being a mapping class by adding the Tag annotation to it. We've also passed along the namespace, as MOX supports XML namespaces. We don't actually have to say that the tagname is 'feed' - MOX can figure this out by itself because the classname is 'Feed'.
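This tutorial doesn't show how MOX works out the tag name internally. As a rough illustration of the general technique, the sketch below defines its own stand-in annotation (SketchTag, deliberately not MOX's @Tag) and reads it back with reflection, falling back to the class name with its first letter lowercased. Every name in it is hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical illustration; this is not how MOX is known to be implemented.
final class TagNameSketch {
    // Stand-in for an annotation like MOX's @Tag. RUNTIME retention is
    // required, otherwise reflection cannot see the annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface SketchTag {
        String namespace() default "";
        String tagName() default "";
    }

    @SketchTag(namespace = "http://www.w3.org/2005/Atom")
    static class Feed {
    }

    // An explicit tagName wins; otherwise use the simple class name
    // with its first letter lowercased ('Feed' becomes 'feed').
    static String tagNameOf(Class<?> type) {
        SketchTag tag = type.getAnnotation(SketchTag.class);
        if (tag == null) {
            throw new IllegalArgumentException(type.getName() + " is not a mapped class");
        }
        if (!tag.tagName().isEmpty()) {
            return tag.tagName();
        }
        String simple = type.getSimpleName();
        return Character.toLowerCase(simple.charAt(0)) + simple.substring(1);
    }

    static String namespaceOf(Class<?> type) {
        SketchTag tag = type.getAnnotation(SketchTag.class);
        return tag == null ? "" : tag.namespace();
    }

    public static void main(String[] args) {
        System.out.println(tagNameOf(Feed.class));
        System.out.println(namespaceOf(Feed.class));
    }
}
```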
Next, let's add mappings for the simple stuff. <title>, <updated>, <id>, and <rights> all look really simple. All those tags just have a single piece of data as content, and no attributes. MOX calls these kinds of tags 'primitive tags', and you do not have to create explicit mapping classes for them. title, id, and rights are just strings, which are simple. updated is a bit harder - it's a date. MOX supports dates, and to mark a piece of data as a date we need to know the date format, according to java's SimpleDateFormat formatting rules. Its javadoc shows this very date format as an example, so we simply take its formatting string and use it for our Feed class:
    package atom;

    import com.itipjar.mox.*;

    @Tag(namespace="http://www.w3.org/2005/Atom")
    public class Feed {
        @Map String title;
        @Map(dateFormat="yyyy-MM-dd'T'HH:mm:ssz") long updated;
        @Map String id;
        @Map String rights;
    }
The @Map annotation indicates that this field maps onto a child tag inside the Feed tag. The name of the child tag is figured out automatically from the field name. We had to specify the date format for the updated field, but because we did, we can now let MOX take care of the date parsing; we get java's native time storage format: milliseconds since the epoch, as a long.
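To see what that format string does on its own, here is a small sketch that feeds the three timestamps from the sample XML through java's SimpleDateFormat directly. The class and method names are made up for this illustration, and it makes no claim about how MOX calls SimpleDateFormat internally.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical helper; only SimpleDateFormat itself is standard.
final class DateFormatSketch {
    // The same pattern string used in the dateFormat parameter above.
    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ssz";

    static long toEpochMillis(String text) {
        // SimpleDateFormat is not thread-safe, so create a fresh one per call.
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.ENGLISH);
        try {
            // Date.getTime() is milliseconds since the epoch, as a long.
            return format.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a valid timestamp: " + text, e);
        }
    }

    public static void main(String[] args) {
        // When parsing, 'z' accepts both the "GMT-04:00" form
        // and the RFC 822 "-0100" form.
        System.out.println(toEpochMillis("2005-07-31T12:29:29-0100"));
        System.out.println(toEpochMillis("2005-07-31T12:29:29+0100"));
        System.out.println(toEpochMillis("2003-12-13T08:29:29GMT-04:00"));
    }
}
```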
Generator looks a bit more complicated, so we'll create a unique new mapping class for it, which we will call Generator.java. First, we add the mapping for the generator tag in our Feed.java:
    ...
    @Map Generator generator;
    ...
Our generator class will introduce some new concepts. For one, the generator tag has attributes. Secondly, generator does not contain any child tags (we know how to handle those, that's what @Map is for). It has direct text content. If that's all it had, we would just treat Generator as a primitive tag, but, due to the attributes, we can't. We want to mark a field inside our Generator class as mapping to the direct content of the tag. Turns out MOX can help us out, here: We use the @Attribute and @SimpleContent annotations, like so:
    package atom;

    import com.itipjar.mox.*;

    @Tag
    public class Generator {
        @Attribute String uri;
        @Attribute String version;
        @SimpleContent String content;
    }
@Attribute can be used almost exactly like we've been using @Map so far, except that you are restricted to primitive types. The dateFormat trick we used to parse the date earlier can also be used with the @Attribute annotation. The @SimpleContent annotation can only be used on String fields. Note how we no longer bother with any namespaces - because Generator is used inside Feed, which has a namespace marker, MOX will assume this new <generator> tag goes with the same namespace.
Our next job is to deal with the <link> tags. We'll use a dedicated map class for the link tag, and we know how to make them now; we can use @Attribute just like we did when we built Generator.java. However, we can't use a plain vanilla @Map in our Feed.java file to map it; there is more than one <link> tag! We'll have to tell MOX that we accept more than one link tag, and we'll have to use a data field that can handle more than one object. You can pick either some suitable class from the java collections framework, or an array. In this case, we'll use a List, because it seems the obvious thing to do. Here's what we add to Feed.java:
    ...
    @Map(mapType=MapType.MULTIPLE_TAGS, tagName="link") List<Link> links;
    ...
We've introduced a couple of new concepts here. Let's go through them one by one:
The MULTIPLE_TAGS indicator tells MOX that we are mapping multiple tags onto a single field. The other alternatives are SINGLE_TAG, which is the default and which is what we've been using so far, and COLLECTION_TAG, which is what you use if there is a single <links> tag with a bunch of <link> tags inside it. When we use MULTIPLE_TAGS (or COLLECTION_TAG), the field must be some sort of collection type. An array works, or you can use something from the java collections api, such as lists, sets, vectors, queues, and the like.
Normally, a collection field has a plural name to show that it represents more than one tag. However, now MOX is lost and can no longer match your field to the tag name (as 'links' no longer matches when comparing it to the actual tag name 'link'). To fix this, we explicitly specify the name of the tag we're matching with this @Map statement: tagName="link".
Now that we have a @Map annotation, all that's left to handle the link stuff is to define the actual Link class. Here's Link.java:
    package atom;

    import com.itipjar.mox.*;

    @Tag
    public class Link {
        public enum LinkRelation {
            alternate, enclosure, self;
        }

        @Attribute String type;
        @Attribute Integer length;
        @Attribute String href;
        @Attribute LinkRelation rel = LinkRelation.enclosure;
    }
A couple of new concepts here. One of the things you may have noticed when looking at the example: not every link tag has a length attribute; of the two inside the entry, only the enclosure link does. To handle missing attributes properly, use the wrapper class form of int: java.lang.Integer. A link tag without a length attribute will now just have null as the value of the length field.
The second new concept is the use of java 1.5's enum feature. rel can only have a couple of values: enclosure, self, or alternate. As long as the names of the enum values match the XML values, you can use @Attribute (and @Map on a primitive tag) to read and write enums.
We also see one of the two mechanisms for default values. MOX overwrites any existing values, but will not touch a value if the XML doesn't contain any data for it. In this case, if there is no attribute named 'rel', the default 'LinkRelation.enclosure' will be untouched.
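The combined effect of the Integer wrapper, the enum, and the field initializer can be sketched in plain java. The code below is a hand-written stand-in for the behaviour described above, not MOX's implementation: the fromAttributes helper and the attribute map are hypothetical, and only Integer.valueOf and the enum's valueOf are standard.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the behaviour described above; not MOX code.
final class LinkDefaultsSketch {
    enum LinkRelation { alternate, enclosure, self }

    // Same fields as Link.java, minus the annotations.
    static final class Link {
        String type;
        Integer length;                             // stays null when the attribute is absent
        String href;
        LinkRelation rel = LinkRelation.enclosure;  // default survives when 'rel' is absent
    }

    // Overwrite a field only when the XML actually supplies a value for it.
    static Link fromAttributes(Map<String, String> attrs) {
        Link link = new Link();
        if (attrs.containsKey("type")) {
            link.type = attrs.get("type");
        }
        if (attrs.containsKey("length")) {
            link.length = Integer.valueOf(attrs.get("length"));
        }
        if (attrs.containsKey("href")) {
            link.href = attrs.get("href");
        }
        if (attrs.containsKey("rel")) {
            // Works because the enum constant names match the XML values exactly.
            link.rel = LinkRelation.valueOf(attrs.get("rel"));
        }
        return link;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("rel", "alternate");
        attrs.put("type", "text/html");
        attrs.put("href", "http://example.org/2005/04/02/atom");
        Link link = fromAttributes(attrs);
        System.out.println(link.rel + " " + link.length);
    }
}
```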
Now that we have our link tags all set up, we move on to feed's final tag: entries. There's only one entry in the example XML, but, obviously, there could be more than one entry. We use MULTIPLE_TAGS again. Adding the @Map for the entry tags, we are done with Feed.java. Here's the final version:
    package atom;

    import java.util.List;
    import com.itipjar.mox.*;

    @Tag(namespace="http://www.w3.org/2005/Atom")
    public class Feed {
        @Map String title;
        @Map(dateFormat="yyyy-MM-dd'T'HH:mm:ssz") long updated;
        @Map String id;
        @Map(mapType = MapType.MULTIPLE_TAGS, tagName="link") List<Link> links;
        @Map String rights;
        @Map Generator generator;
        @Map(mapType = MapType.MULTIPLE_TAGS, tagName = "entry") List<Entry> entries;
    }
Now we have to tackle the Entry tag. Obviously, we use a dedicated mapping class, as it's a complicated tag. Most of it will be quite simple; we already have Link.java which we'll just reuse for the link tags appearing inside the entry tag. The rest of the tags are primitive, except for <author> and both <contributor> tags. We'll have to create dedicated mapped classes for them later. Here's all the code for Entry.java:
    package atom;

    import java.util.List;
    import com.itipjar.mox.*;

    @Tag
    public class Entry {
        @Map String title;
        @Map(mapType = MapType.MULTIPLE_TAGS, tagName="link") List<Link> links;
        @Map String id;
        @Map(dateFormat="yyyy-MM-dd'T'HH:mm:ssz") long updated;
        @Map(dateFormat="yyyy-MM-dd'T'HH:mm:ssz") long published;
        @Map Author author;
        @Map(mapType = MapType.MULTIPLE_TAGS, tagName="contributor") List<Contributor> contributors;
        @Map String content;
    }
Our last job is to create Contributor.java and Author.java. As you may have noticed, even though they are differently named, their internal structure is exactly the same. We can represent this in java by creating a superclass, which we shall call Person, that contains the common elements (in this case, everything except the name of the tag!). Because person is not actually a valid tag itself, we do NOT mark our Person class with the @Tag annotation.
Here's Person.java:
    package atom;

    import com.itipjar.mox.*;

    public class Person {
        @Map String name;
        @Map String uri;
        @Map String email;
    }
Now with Person as a basis, we can create the mapped classes for <contributor> and <author>:
Author.java:
    package atom;

    import com.itipjar.mox.*;

    @Tag
    public class Author extends Person {}
Contributor.java:
    package atom;

    import com.itipjar.mox.*;

    @Tag
    public class Contributor extends Person {}
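The tutorial doesn't say how MOX discovers the fields that Author and Contributor inherit from Person. One common approach for reflection-based mappers is to walk up the superclass chain, which the sketch below shows under that assumption. The fieldNames helper and the nested stand-in classes are hypothetical; only the java.lang.reflect calls are standard.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; MOX may well do this differently.
final class InheritedFieldsSketch {
    // Stand-ins for Person.java, Author.java and Contributor.java, minus the annotations.
    static class Person {
        String name;
        String uri;
        String email;
    }

    static class Author extends Person {}

    static class Contributor extends Person {}

    // Walk from the given class up to (but not including) Object,
    // collecting the names of the fields declared at each level.
    static List<String> fieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            // getDeclaredFields() returns only the fields declared on c itself,
            // in no guaranteed order.
            for (Field field : c.getDeclaredFields()) {
                if (!field.isSynthetic()) {
                    names.add(field.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Author declares nothing itself; all three names come from Person.
        System.out.println(fieldNames(Author.class));
        System.out.println(fieldNames(Contributor.class));
    }
}
```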
Reading and Writing XML
Now that we've designed all the classes we are going to use, all that's left is to write code that uses MOX to read an XML file and convert it into a Feed object. Here's a very simple example that reads from 'atomexample.xml', converts it to a Feed object, and then prints the Feed object to standard output in XML format again.
Here's AtomExample.java:
    package atom;

    import java.io.IOException;
    import com.itipjar.mox.*;

    public class AtomExample {
        public static void main(String[] args) throws IOException, MoxException {
            XmlReader reader = XmlReader.makeReader();
            MoxClassRegistry registry = new MoxClassRegistry();
            registry.addPackage("atom");
            Feed feed = (Feed)reader.process(registry, "atomexample.xml");
            XmlWriter writer = XmlWriter.makeWriter();
            writer.process(System.out, feed);
        }
    }
The MoxClassRegistry is a collection of mapped class definitions. Before MOX can use the classes we've been making so far, such as Feed.java, for parsing the XML, it needs to know about them. That's what the registry is for. You can add single classes, or, as a convenience, you can add all @Tag marked classes inside an entire package, which is the method used in the example above.
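To make the idea of a registry a little more concrete, here is a bare-bones sketch of a lookup table from tag names to mapping classes. It is not MoxClassRegistry: the class, its addClass and lookup methods, and the naming rule are all assumptions made for this illustration, and it leaves out namespaces and package scanning entirely.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the registry idea; not MOX's MoxClassRegistry.
final class RegistrySketch {
    // Stand-ins for two of the mapped classes from this tutorial.
    static class Feed {}

    static class Entry {}

    private final Map<String, Class<?>> byTagName = new HashMap<>();

    // Register one class under the tag name derived from its class name
    // (first letter lowercased, so 'Feed' is stored under 'feed').
    void addClass(Class<?> type) {
        String simple = type.getSimpleName();
        String tagName = Character.toLowerCase(simple.charAt(0)) + simple.substring(1);
        byTagName.put(tagName, type);
    }

    // Returns the class registered for this tag name, or null if there is none.
    Class<?> lookup(String tagName) {
        return byTagName.get(tagName);
    }

    public static void main(String[] args) {
        RegistrySketch registry = new RegistrySketch();
        registry.addClass(Feed.class);
        registry.addClass(Entry.class);
        System.out.println(registry.lookup("feed"));
        System.out.println(registry.lookup("generator"));
    }
}
```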
Run it, and you have finished the tutorial - and you should be well on your way to parsing and writing XML in a much saner environment.
Compiling the examples / downloading MOX
Obviously, to use MOX, you need the MOX libraries. I'm trying to convince the java 1.6 'mustang' team that MOX should be included in the next release of the JDK, but until that happens, you have to download the MOX jar. You can get the latest version here (right click, save as).
To compile the examples, let's say you have a directory c:\moxtutorial. Within this directory you have mox.jar and the 'atom' directory containing all the sources. The sample XML is located in c:\moxtutorial\atomexample.xml, because the example program opens 'atomexample.xml' relative to the directory you run it from. To compile all sources, go to the c:\moxtutorial directory and type:
javac -classpath .;mox.jar atom\*.java
and to run the example program:
java -classpath .;mox.jar atom.AtomExample
You can download all the code written so far, including the sample XML, here.
A word to the wise...
MOX is currently in beta. It hasn't been thoroughly tested, and some important features are missing. This tutorial hasn't covered all the details yet, either. I'll write some documentation dealing with the more advanced concepts, such as MapType.CONTAINER_TAG, @MapAll, and @AttributeMap at a later date. There are also some important features I'd like to add in a second release:
Contacting me
My name is Reinier Zwitserloot. I hack java mostly for a hobby. I've developed this first version of MOX in the span of a weekend, in response to Graham Hamilton's request for java boilerplate. You can contact me by mailing me at my firstname @ my lastname.com.