ROXML 2.0: An open-source takeover (now with defaults, blocks, hash mapping & better syntax)

I've remarked privately from time to time how I couldn't possibly achieve the things I've been working on without a million bits of work put forth by others. The list is massive, everything from the OS I live in to the web framework I build on and the language it's written in, to my source-control management program & IDE, my browser, and a million smaller pieces. Not to mention the many web services, both commercial and not-for-profit, and various tools disguised as websites.

Most important of these, lately, seems to have been the git & github team. I'd never done much open-source development in my days, before picking up git. Even if I made changes to an existing project, it was too much of a fuss to publish them, and I was busy with other things. Busy enough that what changes I made, I kept for myself. But now it seems that git(hub) has changed that. Over the past few months I've made a number of contributions, of minor significance, but I'm graduating into more interesting territory, which is this: ROXML 2.0.

ROXML is short for "Ruby Object to XML Mapping Library." XML goes in, Ruby objects come out. These days, with web services flying so freely across the web, and XML being one of the common languages thereof, it seems useful that one might be able to declare a mapping in the form of a collection of objects, then extend those objects with functionality, and interact with them cleanly, as objects, not as structured text. I had just this need, and came across ROXML and a number of other similar options. I picked ROXML because it was relatively clean, simple and to the point. I've been hacking away at it in the spare parts of my nights and weekends, and by now there's very little recognizable from the original library, so I figure it's time to release :-). For an initial look at ROXML, you can see John Nunemaker's recent post, or the old site & the docs therein. I'll be using those as a baseline.

A syntax makeover

Now, I can be a stickler for syntax. As I've said before, syntax matters. It changes the way we interact with the system, it changes what is done, how it is done, and what can be done. Take, for example, the old syntax:

[sourcecode language='ruby']
class Posts
  include ROXML

  xml_attribute :user
  xml_attribute :tag
  xml_object :post, Post, ROXML::TAG_ARRAY
end
[/sourcecode]

Read-only-ability

There's something important missing in those definitions above. Quick, is :user modifiable, or no? What about :tag and :post? Surely it's one or the other, but which? It turns out that the attributes above are writable, which is the default. To override this you'd have to write the following:

[sourcecode language='ruby']
xml_attribute :user, ROXML::TAG_READONLY
[/sourcecode]

As syntaxes go, this is a pretty obtuse barrier to const-correctness, and will likely lead to most developers simply leaving their attributes writable, even when a more restrictive setting would be correct. The Ruby community may have cast aside strict typing, but const-correctness is still a very important part of object-oriented programming, what with factoring being all about minimum exposure and minimum coupling, and it ought to be treated as such.

The solution is to treat writability the same way the standard attr methods do, by making it a key part of the declaration name. The type name is relegated to a parameter, which gives us flexibility we'll exploit later. In short, you end up with this:

[sourcecode language='ruby']
# read-only:
xml_reader :user, :attr

# writable:
xml_accessor :user, :attr
[/sourcecode]
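To make the parallel with attr_reader and attr_accessor concrete, here is a minimal plain-Ruby sketch of how declaration macros of this shape could encode writability in their names. The MiniMapping module and everything inside it are hypothetical names invented for illustration. This is not ROXML's implementation, and no XML parsing happens here.

```ruby
# Hypothetical stand-in for a mapping library; NOT ROXML's actual code.
module MiniMapping
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declarations recorded so far, as { name => type }.
    def mapping_declarations
      @mapping_declarations ||= {}
    end

    # Read-only: only a getter is generated.
    def xml_reader(name, type = :text)
      mapping_declarations[name] = type
      attr_reader name
    end

    # Writable: both a getter and a setter are generated.
    def xml_accessor(name, type = :text)
      mapping_declarations[name] = type
      attr_accessor name
    end
  end
end

class Post
  include MiniMapping

  xml_reader :user, :attr
  xml_accessor :tag, :attr

  def initialize(user, tag)
    @user = user
    @tag = tag
  end
end

post = Post.new('ben', 'ruby')
post.tag = 'xml'                # fine: :tag was declared with xml_accessor
puts post.respond_to?(:user=)   # false: :user was declared with xml_reader
```

With the writability in the method name, a reader of the class can tell at a glance which attributes are meant to change, exactly as with the built-in attr methods.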

Object-tivity

Now you may notice above that :attr declares the referenced type as the second argument. This is consistent throughout and there are several more types. They are:
  • :attr: an xml attribute on the current node, returned as text
  • :text: the contents of a named sub-node, returned as text
  • :content: the contents of the current node, returned as text
  • Object: Any ROXML object can be provided to declare sub-types, including recursive types (provided recursion terminates)
  • [Object], [:text]: Put the type in an Array to declare that there are multiple instances of this type which should be provided in a collection
  • {}: A hash type can be populated with sub-nodes and attributes, in various ways
:text is the default if no type is declared.
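As a rough illustration of what the text-returning types refer to, the sketch below pulls the corresponding values out of a small document by hand. It uses REXML only because it ships with Ruby. The sample XML and its element names are made up for this example, and the code shows just the lookups each type corresponds to, not how ROXML performs them.

```ruby
require 'rexml/document'

# Made-up sample document, for illustration only.
xml = <<~XML
  <post user="ben">Hello<title>ROXML 2.0</title><tag>ruby</tag><tag>xml</tag></post>
XML

node = REXML::Document.new(xml).root

# :attr -> an attribute on the current node
attr_value = node.attributes['user']

# :text -> the contents of a named sub-node
text_value = node.elements['title'].text

# :content -> the contents of the current node itself
content_value = node.text

# [:text] -> one entry per matching sub-node, collected in an Array
tag_values = node.get_elements('tag').map(&:text)

puts attr_value, text_value, content_value, tag_values.inspect
```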

Named args & TAG_what?

The old ROXML uses only positional arguments and these TAG_ constants to declare aspects of the declaration. But the ROXML::TAG_ stuff is unnecessarily heavyweight, so the new ROXML uses symbols instead, e.g. :cdata rather than ROXML::TAG_CDATA. Likewise, many optional arguments are now named, rather than positional. So rather than having to put in the default values for these parameters, or nil, you can simply omit them. So these:

[sourcecode language='ruby']
xml_text :name, 'NAME', ROXML::TAG_CDATA | ROXML::TAG_READONLY, 'USER'
xml_text :name, nil, ROXML::TAG_READONLY, 'USER'
[/sourcecode]

Become:

[sourcecode language='ruby']
xml_reader :name, :from => 'NAME', :in => 'USER', :as => :cdata
xml_reader :name, :in => 'USER'
[/sourcecode]

The options map as follows:
  • :in: Previously 'wrapper'
  • :from: Previously 'name'
  • :else: Used to declare a default value in case the entity is missing; previously unavailable
  • :as: Previously 'options'. Can be passed as a single symbol, or as multiple symbols in an array
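The appeal of named options is that anything left out simply falls back to a default. Here is a small plain-Ruby sketch of that idea. The normalize_options helper and the DEFAULTS table are hypothetical and not part of ROXML, and the rule that :from falls back to the accessor's own name is my assumption for the sake of the example.

```ruby
# Hypothetical helper, for illustration only; not part of ROXML.
DEFAULTS = { from: nil, in: nil, else: nil, as: nil }.freeze

def normalize_options(name, opts = {})
  unknown = opts.keys - DEFAULTS.keys
  raise ArgumentError, "unknown option(s): #{unknown.inspect}" unless unknown.empty?

  merged = DEFAULTS.merge(opts)
  merged[:from] ||= name.to_s       # assumed fallback: the accessor's own name
  merged[:as] = Array(merged[:as])  # accept one symbol, an array of symbols, or nothing
  merged
end

p normalize_options(:name, :in => 'USER')
p normalize_options(:name, :from => 'NAME', :in => 'USER', :as => :cdata)
```

Because Array() wraps a lone symbol and passes an existing array through, callers can write :as => :cdata or :as => [:cdata] interchangeably.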

Hash attack!

One of the more important additions is the Hash base type mapping. Hash declarations have a syntax of their own which enables you to pull from attributes, contents, names and sub-nodes of a series of entries. This can be super-useful for web services which provide collections of named attributes, which fit naturally in this type. The ROXML documentation covers these cases well. Here are a few examples of the syntax:

[sourcecode language='ruby']
xml_reader :definitions, {:attrs => ['dt', 'dd']}
xml_reader :definitions, {:key => {:attr => 'word'}, :value => :content}, :in => 'definitions'
[/sourcecode]
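To give a feel for the kind of result these two declarations describe, the sketch below builds the equivalent hashes by hand with REXML, used here only because it ships with Ruby. The sample documents, and in particular the definition element name, are assumptions of mine rather than anything the declarations above spell out, and this is not how ROXML does the work internally.

```ruby
require 'rexml/document'

# Made-up sample documents; the <definition> element name is an assumption.
attrs_xml = <<~XML
  <dictionary>
    <definition dt="quaquaversally" dd="sloping downward in all directions"/>
    <definition dt="defenestrate" dd="to throw out of a window"/>
  </dictionary>
XML

content_xml = <<~XML
  <dictionary>
    <definitions>
      <definition word="quaquaversally">sloping downward in all directions</definition>
      <definition word="defenestrate">to throw out of a window</definition>
    </definitions>
  </dictionary>
XML

# Roughly what {:attrs => ['dt', 'dd']} describes:
# the key and the value both come from attributes of each entry.
entries = REXML::Document.new(attrs_xml).root.get_elements('definition')
from_attrs = entries.map { |e| [e.attributes['dt'], e.attributes['dd']] }.to_h

# Roughly what {:key => {:attr => 'word'}, :value => :content}, :in => 'definitions' describes:
# the key is an attribute, the value is the entry's content, inside a wrapper node.
wrapper = REXML::Document.new(content_xml).root.elements['definitions']
from_content = wrapper.get_elements('definition').map { |e| [e.attributes['word'], e.text] }.to_h

puts from_attrs['defenestrate']
puts from_content['defenestrate']
```

Both documents carry the same information, so both hashes come out with the same keys and values. The declaration syntax is what lets one accessor name cover either layout.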

Blocks, yo

As XML is by its very nature textual, it may be necessary to coerce it into a certain type or otherwise modify the data before it makes proper sense in the context of an object. As such, you can supply a block which enables you to modify the incoming text, for example:

[sourcecode language='ruby']
xml_reader :count, :attr do |val|
  Integer(val)
end
[/sourcecode]
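For the curious, the mechanics of a block-taking declaration can be sketched in a few lines of plain Ruby: keep the block around, then run the raw text through it when a value is read. The Declaration class below is a hypothetical illustration, not ROXML's implementation.

```ruby
# Hypothetical sketch; not ROXML's implementation.
class Declaration
  attr_reader :name

  def initialize(name, &block)
    @name = name
    @block = block
  end

  # Apply the block, if one was given, to the raw text from the document.
  def coerce(raw_text)
    @block ? @block.call(raw_text) : raw_text
  end
end

count = Declaration.new(:count) { |val| Integer(val) }
label = Declaration.new(:label)

p count.coerce('42')   # => 42
p label.coerce('42')   # => "42"
```

Note that Integer(val) raises an ArgumentError on text that isn't a number, whereas val.to_i would quietly return 0, so the choice of conversion in the block decides how strict the mapping is about bad input.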

Under the hood

Finally, an invisible improvement is that I've moved it over to libxml-ruby rather than REXML, for the sake of performance. So it should be significantly zippier on large sets of data, though to be honest I've done no profiling to confirm this.

Wrapping up...

So there you have it. You can get the gem from my github. You can see the docs here. I'll see if I can get this onto the official rubyforge site as well. The code is fresh, and while I've increased testing on it significantly (up to 131 assertions from the initial 25), there's plenty of possibility for bugs therein. Please just send me a message or pull request on github if you run into anything. Finally, I'd be remiss if I didn't thank Anders Engstrom, Zak Mandhro and Russ Olsen for their prior work on ROXML, which made this all a lot easier to get going. Thanks guys.

Open Source in Action

Ever since I picked up Ruby on Rails, a Ruby-language web framework, for a project of mine, I've been curious about what web frameworks were out there for Python, another of my favored languages. A few of them are Django and TurboGears. Another that happened to catch my attention was web.py, a framework by Aaron Swartz, whom I recently met at the first (of hopefully many yet to come) Startup School in Boston.

Anyway, I was browsing the source of 0.13 of Aaron's framework and noticed something amiss. Specifically, in storify, Aaron was testing isinstance(k, list) on a value k which he had just used to index into an array. Now in Python arrays are mutable, and arrays and dictionaries can only be indexed by immutable values. That's ints and strings, but not lists. Seeing that something was amiss, I sent him a quick e-mail, and now I'm in his changelog entry for version 0.132. I'm reproducing the relevant line here for my future self: Fix bug with storify when it received multiple inputs (tx Ben Woosley).

In other news, Apple just announced it was recognizing contributors to their WebCore HTML Layout engine, with gifts of computers and free tickets to Apple's WWDC. They list the non-trivial contributions in a posting on David Hyatt's blog, Surfin' Safari. One could call these events arguments both for the model of open source, and for code reviews as well. But then my evidence is anecdotal, so you're welcome to your own interpretation.