I've remarked privately from time to time how I couldn't possibly achieve the things I've been working on without a million bits of work put forth by others. The list is massive, everything from the OS I live in to the web framework I build on and the language it's written in, to my source-control management program & IDE, my browser, and a million smaller pieces. Not to mention the many web services, both commercial and not-for-profit, and various tools disguised as websites.
Most important of these, lately, seems to have been the git & github team. I'd never done much open-source development before picking up git. Even if I made changes to an existing project, it was too much of a fuss to publish them, and I was busy with other things. Busy enough that what changes I made, I kept for myself. But now it seems that git(hub) has changed that. Over the past few months I've made a number of contributions of minor significance, but I am graduating into more interesting territory, which is this: ROXML 2.0.
ROXML is short for "Ruby Object to XML Mapping Library." XML goes in, Ruby objects come out. In these days, with web-services flying so freely across the web, and XML being one of the common languages thereof, it seems useful that one might be able to declare a mapping in the form of a collection of objects, then extend those objects with functionality, and interact with them cleanly, as objects, not as structured text.
I had just this need, and came across ROXML and a number of other similar options. I picked ROXML because it was relatively clean, simple and to the point. I've been hacking away at it in the spare parts of my nights and weekends, and by now there's very little recognizable from the original library, so I figure it's time to release :-).
For an initial look at ROXML, you can see John Nunnemaker's recent post, or the old site & the docs therein. I'll be using those as a baseline.
A syntax makeover
Now, I can be a stickler for syntax. As I've said before, syntax matters. It changes the way we interact with the system, it changes what is done, how it is done, and what can be done.
Take, for example, the old syntax:
[sourcecode language='ruby']
class Posts
  include ROXML
  xml_attribute :user
  xml_attribute :tag
  xml_object :post, Post, ROXML::TAG_ARRAY
end
[/sourcecode]
Read-only-ability
There's something important missing in those definitions above. Quick, is :user modifiable, or no? What about :tag and :post? Surely it's one or the other, but which?
It turns out that the attributes above are writable, which is the default. To override this you'd have to write the following:
[sourcecode language='ruby']
xml_attribute :user, ROXML::TAG_READONLY
[/sourcecode]
As syntaxes go, this is a pretty obtuse barrier to const-correctness, and will likely lead to most developers simply leaving their attributes writable, even when a more restrictive setting would be correct. The Ruby community may have cast aside strict typing, but const-correctness is still a very important part of object-oriented programming, what with factoring being all about minimum exposure and minimum coupling, and it ought to be treated as such.
The solution is to treat writability the same way the standard attr methods do, by making it a key part of the declaration name. The type name is relegated to a parameter, which gives us flexibility we'll exploit later. In short, you end up with this:
[sourcecode language='ruby']
# read-only:
xml_reader :user, :attr
# writable:
xml_accessor :user, :attr
[/sourcecode]
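To make the reader/accessor split concrete, here's a minimal sketch in plain Ruby of how a declaration macro can lean on attr_reader and attr_accessor so that writability lives in the declaration name. This is not ROXML's actual implementation; MiniMapping, mapped_reader and mapped_accessor are hypothetical names invented for illustration, and no XML parsing is involved.

```ruby
# Hypothetical stand-in for a mapping library; not ROXML's actual code.
module MiniMapping
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Read-only: defines a getter only, like attr_reader.
    def mapped_reader(name, type = :text)
      mappings[name] = type
      attr_reader name
    end

    # Writable: defines a getter and a setter, like attr_accessor.
    def mapped_accessor(name, type = :text)
      mappings[name] = type
      attr_accessor name
    end

    # Remembers the declared type for each mapped name.
    def mappings
      @mappings ||= {}
    end
  end
end

class Posts
  include MiniMapping

  mapped_reader   :user, :attr
  mapped_accessor :tag,  :attr

  def initialize(user, tag)
    @user = user
    @tag = tag
  end
end

posts = Posts.new('ben', 'ruby')
posts.tag = 'xml'          # allowed, :tag was declared writable
posts.respond_to?(:user=)  # false, :user was declared read-only
```

The point of the design is that the reader of the class sees writability at a glance, without having to hunt for a flag buried in the argument list.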
Object-tivity
Now you may notice above that :attr declares the referenced type as the second argument. This is consistent throughout and there are several more types. They are:
- :attr: an xml attribute on the current node, returned as text
- :text: the contents of a named sub-node, returned as text
- :content: the contents of the current node, returned as text
- Object: Any ROXML object can be provided to declare sub-types, including recursive types (provided recursion terminates)
- [Object], [:text]: Put the type in an Array to declare that there are multiple instances of this type which should be provided in a collection
- {}: A hash type can be populated with sub-nodes and attributes, in various ways
:text is the default if no type is declared.
Named args & TAG_what?
The old ROXML uses only positional arguments and these TAG_ constants to declare aspects of the declaration. But the ROXML::TAG_ stuff is unnecessarily heavyweight, so the new ROXML uses symbols instead, e.g. :cdata rather than ROXML::TAG_CDATA.
Likewise, many optional arguments are now named, rather than positional. So rather than have to put in the default values for these parameters, or nil, you can simply omit them. So these:
[sourcecode language='ruby']
xml_text :name, 'NAME', ROXML::TAG_CDATA | ROXML::TAG_READONLY, 'USER'
xml_text :name, nil, ROXML::TAG_READONLY, 'USER'
[/sourcecode]
Become:
[sourcecode language='ruby']
xml_reader :name, :from => 'NAME', :in => 'USER', :as => :cdata
xml_reader :name, :in => 'USER'
[/sourcecode]
The options map as follows:
- :in: Previously 'wrapper'
- :from: Previously 'name'
- :else: Used to declare a default value in case the entity is missing; previously unavailable
- :as: Previously 'options'. Can be passed as a single symbol, or multiple in an array
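As a rough illustration of why named options are nicer to work with than positional ones, here's a minimal plain-Ruby sketch of a helper that fills in defaults and accepts :as as either one symbol or an array. The helper normalize_options and the DEFAULTS table are hypothetical, not part of ROXML, and the rule that :from falls back to the accessor's own name is an assumption on my part.

```ruby
# Hypothetical helper, not part of ROXML: fills in defaults for named options.
DEFAULTS = { :from => nil, :in => nil, :else => nil, :as => [] }

def normalize_options(name, opts = {})
  # Reject misspelled options early instead of silently ignoring them.
  unknown = opts.keys - DEFAULTS.keys
  raise ArgumentError, "unknown option(s): #{unknown.inspect}" unless unknown.empty?

  merged = DEFAULTS.merge(opts)
  merged[:from] ||= name.to_s       # assumption: default to the accessor's name
  merged[:as] = Array(merged[:as])  # a single symbol or an array both work
  merged
end

normalize_options(:name, :in => 'USER')
normalize_options(:name, :from => 'NAME', :in => 'USER', :as => :cdata)
```

Anything the caller leaves out simply takes its default, so there's no need to pad the argument list with nil to reach a later parameter.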
Hash attack!
One of the more important additions is the Hash base type mapping. Hash declarations have a syntax of their own which enables you to pull from attributes, contents, names and sub-nodes of a series of entries. This can be super-useful for web-services which provide collections of named attributes, which fit naturally in this type.
The ROXML documentation covers these cases well.
Here are a few examples of the syntax:
[sourcecode language='ruby']
xml_reader :definitions, {:attrs => ['dt', 'dd']}
xml_reader :definitions, {:key => {:attr => 'word'},
                          :value => :content}, :in => 'definitions'
[/sourcecode]
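To give a feel for the kind of result a key/value declaration like the second one is aiming at, here's a small plain-Ruby sketch. It does no XML parsing: Entry is a hypothetical stand-in for an already-parsed node, and hash_from_entries is a made-up helper, so treat this as an illustration of the idea rather than ROXML's behavior.

```ruby
# Stand-in for a parsed XML entry; a hypothetical type, not part of ROXML.
Entry = Struct.new(:attributes, :content)

# Builds a Hash from entries, taking each key from a named attribute
# and each value from the entry's text content.
def hash_from_entries(entries, key_attr)
  entries.each_with_object({}) do |entry, result|
    result[entry.attributes[key_attr]] = entry.content
  end
end

entries = [
  Entry.new({ 'word' => 'ruby' }, 'a red gemstone'),
  Entry.new({ 'word' => 'xml' },  'a markup language')
]

definitions = hash_from_entries(entries, 'word')
definitions['ruby']  # the content of the entry whose word attribute is 'ruby'
```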
Blocks, yo
As XML is by its very nature textual, it may be necessary to coerce it into a certain type or otherwise modify the data before it makes proper sense in the context of an object. As such, you can supply a block which enables you to modify the incoming text, for example:
[sourcecode language='ruby']
xml_reader :count, :attr do |val|
  Integer(val)
end
[/sourcecode]
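For the curious, the general mechanism behind a block like this can be sketched in a few lines of plain Ruby: the declaration captures the block and the generated reader runs the raw text through it. This is a sketch of the technique, not ROXML's implementation; Record and coerced_reader are hypothetical names, and a Hash of strings stands in for parsed XML.

```ruby
# Hypothetical stand-in, not ROXML's implementation.
class Record
  # Declares a reader whose raw text is passed through an optional block.
  def self.coerced_reader(name, &block)
    define_method(name) do
      raw = @raw[name.to_s]
      block ? block.call(raw) : raw
    end
  end

  coerced_reader :title
  coerced_reader :count do |val|
    Integer(val)
  end

  # raw: a Hash of String keys to String values, standing in for parsed XML.
  def initialize(raw)
    @raw = raw
  end
end

record = Record.new('title' => 'Posts', 'count' => '42')
record.title  # left as text, since no block was given
record.count  # an Integer rather than the String '42'
```

Using Integer(val) rather than val.to_i means malformed input raises an ArgumentError instead of quietly becoming 0.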
Under the hood
Finally, an invisible improvement is that I've moved it over to libxml-ruby rather than REXML, for the sake of performance. So it should be significantly zippier on large sets of data, though to be honest I've done no profiling to confirm this.
Wrapping up...
So there you have it. You can get the gem from my github. You can see the docs here. I'll see if I can get this onto the official rubyforge site as well.
The code is fresh, and while I've increased testing on it significantly (up to 131 assertions from the initial 25), there's plenty of possibility for bugs therein. Please just send me a message or pull request on github if you run into anything.
Finally, I'd be remiss if I didn't thank Anders Engstrom, Zak Mandhro and Russ Olsen for their prior work on ROXML, which made this all a lot easier to get going. Thanks guys.