ROXML 2.0: An open-source takeover (now with defaults, blocks, hash mapping & better syntax)

I've remarked privately from time to time how I couldn't possibly achieve the things I've been working on without a million bits of work put forth by others. The list is massive, everything from the OS I live in to the web framework I build on and the language it's written in, to my source-control management program & IDE, my browser, and a million smaller pieces. Not to mention the many web services, both commercial and not-for-profit, and various tools disguised as websites. Most important of these, lately, seems to have been the git & github team. I'd never done much open-source development in my days, before picking up git. Even if I made changes to an existing project, it was too much of a fuss to publish changes, and I was busy with other things. Busy enough that what changes I made, I kept for myself. But now it seems that git(hub) has changed that. Over the past few months I've made a number of contributions, of minor significance, but am graduating into more interesting territory, which is this, ROXML 2.0.

ROXML is short for "Ruby Object to XML Mapping Library." XML goes in, Ruby objects come out. These days, with web-services flying so freely across the web, and XML being one of the common languages thereof, it seems useful that one might be able to declare a mapping in the form of a collection of objects, then extend those objects with functionality, and interact with them cleanly, as objects, not as structured text. I had just this need, and came across ROXML and a number of other similar options. I picked ROXML because it was relatively clean, simple and to the point. I've been hacking away at it in the spare parts of my nights and weekends and by now there's very little recognizable from the original library, so I figure it's time to release :-).

For an initial look at ROXML, you can see John Nunemaker's recent post, or the old site & the docs therein. I'll be using those as a baseline.

A syntax makeover

Now, I can be a stickler for syntax. As I've said before, syntax matters. It changes the way we interact with the system, it changes what is done, how it is done, and what can be done. Take, for example, the old syntax:

[sourcecode language='ruby']
class Posts
  include ROXML

  xml_attribute :user
  xml_attribute :tag
  xml_object :post, Post, ROXML::TAG_ARRAY
end
[/sourcecode]

Read-only-ability

There's something important missing in those definitions above. Quick, is :user modifiable, or no? What about :tag and :post? Surely it's one or the other, but which? It turns out that the attributes above are writable, which is the default. To override this you'd have to write the following:

[sourcecode language='ruby']
xml_attribute :user, ROXML::TAG_READONLY
[/sourcecode]

As syntaxes go, this is a pretty obtuse barrier to const-correctness, and will likely lead to most developers simply leaving their attributes writable, even when a more restrictive setting would be correct. The Ruby community may have cast aside strict typing, but const-correctness is still a very important part of object-oriented programming, what with factoring being all about minimum exposure and minimum coupling, and it ought to be treated as such.

The solution is to treat writability the same way the standard attr methods do, by making it a key part of the declaration name. The type name is relegated to a parameter, which gives us flexibility we'll exploit later. In short, you end up with this:

[sourcecode language='ruby']
# read-only:
xml_reader :user, :attr

# writable:
xml_accessor :user, :attr
[/sourcecode]
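
To make the parallel with attr_reader and attr_accessor concrete, here is a minimal sketch. ToyMapping is a hypothetical stand-in written for this illustration, not ROXML's implementation: it ignores the XML side entirely and only shows how the declaration name can carry writability.

```ruby
# Hypothetical stand-in for the declaration style; NOT ROXML's implementation.
module ToyMapping
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Read-only: defines only a getter, like attr_reader.
    def xml_reader(name, type = :text)
      attr_reader name
    end

    # Writable: defines a getter and a setter, like attr_accessor.
    def xml_accessor(name, type = :text)
      attr_accessor name
    end
  end
end

class Posts
  include ToyMapping

  xml_reader :user, :attr
  xml_accessor :tag, :attr

  def initialize(user, tag)
    @user = user
    @tag = tag
  end
end

posts = Posts.new('ben', 'ruby')
posts.tag = 'xml'                              # allowed: declared with xml_accessor
user_writable = posts.respond_to?(:user=)      # false: declared with xml_reader
```

With this shape, a reader of the class can tell at a glance which attributes are meant to change after parsing.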

Object-tivity

Now you may notice above that :attr declares the referenced type as the second argument. This is consistent throughout and there are several more types. They are:
  • :attr: an xml attribute on the current node, returned as text
  • :text: the contents of a named sub-node, returned as text
  • :content: the contents of the current node, returned as text
  • Object: Any ROXML object can be provided to declare sub-types, including recursive types (provided recursion terminates)
  • [Object], [:text]: Put the type in an Array to declare that there are multiple instances of this type which should be provided in a collection
  • {}: A hash type can be populated with sub-nodes and attributes, in various ways
:text is the default if no type is declared.
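
To see what the three text-returning types refer to, here is a rough sketch using plain REXML from the standard distribution rather than ROXML itself. The sample document and variable names are made up for illustration, and the comments describe my reading of the list above, not ROXML internals.

```ruby
require 'rexml/document'

# Made-up sample document for illustration.
xml = <<~XML
  <book isbn="12345">
    <title>Programming Ruby</title>
    <author>David Thomas</author>
    <author>Andrew Hunt</author>
  </book>
XML

book = REXML::Document.new(xml).root

# :attr     an attribute on the current node
isbn = book.attributes['isbn']

# :text     the contents of a named sub-node
title = book.elements['title'].text

# [:text]   several sub-nodes of the same name, collected in an array
authors = book.get_elements('author').map { |node| node.text }

# :content  the contents of the current node itself
#           (here the first <author> plays the role of "current node")
first_author = book.elements['author']
content = first_author.text
```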

Named args & TAG_what?

The old ROXML uses only positional arguments and these TAG_ constants to declare aspects of the declaration. But the ROXML::TAG_ stuff is unnecessarily heavyweight, so the new ROXML uses symbols instead, e.g. :cdata rather than ROXML::TAG_CDATA. Likewise, many optional arguments are now named, rather than positional. So rather than have to put in the default values for these parameters, or nil, you can simply omit them. So these:

[sourcecode language='ruby']
xml_text :name, 'NAME', ROXML::TAG_CDATA | ROXML::TAG_READONLY, 'USER'
xml_text :name, nil, ROXML::TAG_READONLY, 'USER'
[/sourcecode]

Become:

[sourcecode language='ruby']
xml_reader :name, :from => 'NAME', :in => 'USER', :as => :cdata
xml_reader :name, :in => 'USER'
[/sourcecode]

The options map as follows:
  • :in: Previously 'wrapper'
  • :from: Previously 'name'
  • :else: Used to declare a default value in case the entity is missing; previously unavailable
  • :as: Previously 'options'. Can be passed as a single symbol, or multiple in an array
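
As a rough illustration of why named options are easier to live with than positional ones, here is a hypothetical helper (normalize_options is my own name, and the defaults are assumptions, not ROXML's code) that fills in whatever the caller leaves out and accepts :as as either one symbol or an array.

```ruby
# Hypothetical helper for illustration; names and defaults are assumptions.
def normalize_options(name, opts = {})
  {
    :from => opts.fetch(:from, name.to_s),  # assumed default: the accessor's own name
    :in   => opts[:in],                     # optional wrapper element
    :else => opts[:else],                   # default value when the entity is missing
    :as   => Array(opts[:as])               # one symbol or several, always stored as an array
  }
end

full    = normalize_options(:name, :from => 'NAME', :in => 'USER', :as => :cdata)
minimal = normalize_options(:name, :in => 'USER')
```

Nothing needs a nil placeholder: anything omitted simply takes its default.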

Hash attack!

One of the more important additions is the Hash base type mapping. Hash declarations have a syntax of their own which enables you to pull from attributes, contents, names and sub-nodes of a series of entries. This can be super-useful for web-services which provide collections of named attributes, which fit naturally in this type. The ROXML documentation covers these cases well. Here are a few examples of the syntax:

[sourcecode language='ruby']
xml_reader :definitions, {:attrs => ['dt', 'dd']}
xml_reader :definitions, {:key => {:attr => 'word'}, :value => :content}, :in => 'definitions'
[/sourcecode]
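
To give a feel for the second declaration, here is the lookup done by hand with plain REXML. This is a sketch of my reading of the syntax, not ROXML's code, and the sample XML and element names are invented for the example.

```ruby
require 'rexml/document'

# Invented sample document for illustration.
xml = <<~XML
  <dictionary>
    <definitions>
      <definition word="quaquaversally">in every direction</definition>
      <definition word="tergiversate">to change sides repeatedly</definition>
    </definitions>
  </dictionary>
XML

root = REXML::Document.new(xml).root
wrapper = root.elements['definitions']     # :in => 'definitions'

definitions = {}
wrapper.each_element do |entry|
  key = entry.attributes['word']           # :key => {:attr => 'word'}
  definitions[key] = entry.text            # :value => :content
end
```

The declaration collapses that loop into a single line, which is the point.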

Blocks, yo

As XML is by its very nature textual, it may be necessary to coerce it into a certain type or otherwise modify the data before it makes proper sense in the context of an object. As such you can supply a block which enables you to modify the incoming text, for example:

[sourcecode language='ruby']
xml_reader :count, :attr do |val|
  Integer(val)
end
[/sourcecode]
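
One design note on that block, in plain Ruby with no ROXML involved: Integer() is the strict choice, since it raises on text that isn't a number, whereas String#to_i quietly returns 0. The variable names below are just for illustration.

```ruby
# The same body as the block above, held in a lambda so it can be called directly.
coerce = lambda { |val| Integer(val) }

count = coerce.call('42')            # well-formed text becomes an Integer

# Malformed text is rejected rather than silently accepted.
rejected = begin
  coerce.call('n/a')
  false
rescue ArgumentError
  true
end

# By contrast, to_i falls back to zero without complaint.
lenient = 'n/a'.to_i
```

Which behavior you want depends on whether bad data should fail loudly or be papered over.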

Under the hood

Finally, an invisible improvement is that I've moved it over to libxml-ruby rather than REXML, for the sake of performance. So it should be significantly zippier on large sets of data, though to be honest I've done no profiling to confirm this.
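
For anyone who wants to check the speed claim themselves, a timing harness along these lines would be a starting point. It only measures REXML, on a synthetic document whose size and shape are arbitrary assumptions; running the equivalent parse with libxml-ruby would give the other half of the comparison.

```ruby
require 'benchmark'
require 'rexml/document'

# Synthetic document; the element count and shape are arbitrary.
xml = '<posts>' + ('<post tag="ruby">hello</post>' * 5_000) + '</posts>'

doc = nil
seconds = Benchmark.realtime { doc = REXML::Document.new(xml) }
post_count = doc.root.elements.size

puts format('REXML parsed %d posts in %.3fs', post_count, seconds)
```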

Wrapping up...

So there you have it. You can get the gem from my github. You can see the docs here. I'll see if I can get this onto the official rubyforge site as well. The code is fresh, and while I've increased testing on it significantly (up to 131 assertions from the initial 25), there's plenty of possibility for bugs therein. Please just send me a message or pull request on github if you run into anything. Finally, I'd be remiss if I didn't thank Anders Engstrom, Zak Mandhro and Russ Olsen for their prior work on ROXML, which made this all a lot easier to get going. Thanks guys.
14 responses
Awesome! I was just getting ready to fork the ROXML project myself to switch the XML library over to libxml-ruby. REXML was one of the reasons I wasn't going to use ROXML, despite it having the best Ruby/XML binding API currently available.

I like the changes you made to the syntax. It adds even more power and cleanliness to the API.

The only suggestion I would like to make is to change the name of the project enough to distinguish it from the old ROXML. I almost missed this because I didn't realize that it was actually a new project. I just thought the ROXML project had moved over to GitHub.

I'd like to sometime build a code generator for creating your ROXML classes based on an XSD schema (kind of like JAXB does for Java).

Anyways, this all looks really cool. I'll be sure to report any bugs I might find.

Very cool.

The only problem I'm running into is namespace support. It's not finding the elements and attributes if there is a namespace present in the root node.

Does this ROXML have a way of dealing with namespaces?

Update: This is not a libxml bug, and the problem is fixed in ROXML 2.1

Yeah Jimmy, unfortunately this is a LibXml bug. I made a first stab at forking their rubyforge svn into github, to fix this, but ran into a git import error, which may be fixed in a more current version of git-svn.

Until this is fixed, you can use REXML. I've added fallback to REXML in my own development branch on github, and that will be in 2.1. REXML is used if libxml is not present or if you precede your require with the following (admittedly ugly) declaration. This may be cleaned up for final release:



module ROXML
  module XML
    ENGINE = 'rexml'
  end
end
require 'roxml'

Until the libxml bug is fixed, you also might consider forking ROXML (ideally from my development branch) and having it manually remove the namespace declaration in this case (if libxml is being used). That would go early in the parse function. Otherwise, I'll get around to fixing it myself soon, and include the workaround in 2.1.

Great work Ben! Thanks for picking up ROXML and giving it some much needed TLC.
Hey, thanks for this overhaul.

We've been using ROXML 1.X for an internal gem for working with the (rather large) ONIX standard. I started hacking on a new branch last week that replaces ROXML with raw libxml for performance reasons, but it looks like I won't need to continue with it any more :)
Just a reminder: if you have any trouble or feel a need to extend the library, please modify it publicly on github so I can pick up the changes. Thanks!
I too worship at the altar of git, so that'll be no problem :)
Ben,

Are you able to explain the difference between these 2 attribute declarations? They seem to do more or less the same thing in my testing, but I'm probably missing something.

xml_accessor :contributors, Contributor, :as => :array
xml_accessor :contributors, [Contributor]
They're equivalent. The former is used internally when we translate the arguments into something more manageable, and can be used without problem. The latter is the preferred syntax, and I think the only documented one.
ROXML is great, but I have a question. How can I define a superclass and subclass it to extend its XML attributes? The following does not work:

class MySuper
  include ROXML
  xml_name :sitt
end

class MySub < MySuper
  xml_accessor :user_id
end

The to_xml method called on an instance of MySub will not have the XML name 'sitt' as defined in the superclass.
This would be a really useful feature to DRY up the code.

The second issue I have is the dependency on the extensions library, which causes problems when using activescaffold.
Thanks Andy, this should be fixed in 2.4.0, thanks to this commit: http://github.com/Empact/roxml/commit/55da49cb8...

As for extensions... I'd need more details to know what problem exactly it has with activescaffold, which I haven't used. Feel free to fork the code on github and I'll pick up whatever changes you make.
So today I happened to run into my own problem with the extensions dependency, so I went ahead and removed it from the just-released ROXML 2.4.1. Hope this helps!
Firstly, great library. Very handy. Has anyone tried to do something that would encapsulate an array of arbitrary objects? Like this:

class Fish
  include ROXML
  xml_accessor :name, :text
end

class Shark < Fish
  xml_accessor :teeth, :text
end

class Goldfish < Fish
  xml_accessor :friendly, :text
end

class Fishtank
  include ROXML
  xml_accessor :inhabitants, [Fish]
end

bob = Goldfish.new
bob.friendly = true
bob.name = "bob"

jaws = Shark.new
jaws.name = "jaws"
jaws.teeth = "sharp"

aquarium = Fishtank.new
aquarium << bob
aquarium << jaws

the_xml = aquarium.to_xml.to_s

you should get:

"<fishtank><inhabitant><teeth>sharp</teeth><name>Jaws</name></inhabitant><inhabitant><friendly>true</friendly><name>Bob</name></inhabitant></fishtank>"

Which you notice contains all the user-defined fields of all the objects. My expectation would be that since I declared the object array as type Fish, it should only contain Fish methods. How do I get my array of different types out of this XML string? Simply using from_xml results in only Fish being returned. Using Object results in errors.

Any ideas?
sorry, that should be

aquarium.inhabitants << bob
aquarium.inhabitants << jaws