Pennin' a DSL — err.the

Okay, so you couldn’t give a flip about mofo or microformats. I understand. But DSLs are pretty cool. Here’s how I went about implementing the microformat definition DSL in mofo.

Real quickly: a microformat is, at least for the sake of this blog post, a bunch of HTML tags flagged by special CSS classes which describe certain properties of a something. Like a review, or a person. Each something exists within a container with a class designating its contents as a specific microformat. An hCard microformat container has a class of vcard, for example.

Here’s a reaaaaal simple hCard example:

<div class="vcard">
  <a class="url fn" href="http://ozmm.org/">Chris Wanstrath</a>
  <div class="org">Err the Blog</div>
  <div class="email">chris[at]ozmm[dot]org</div>
</div>

What we want to get from this is an object with the following properties: url, fn, org, and email. So far, so good? So good.
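To make that concrete, here's a rough sketch that pulls the class names out of the hCard above. It leans on a regex scan, which is an assumption made purely for illustration: it is not a real HTML parser and it is not how mofo does its parsing.

```ruby
# Illustration only: a regex scan is NOT a real HTML parser.
html = <<~HTML
  <div class="vcard">
    <a class="url fn" href="http://ozmm.org/">Chris Wanstrath</a>
    <div class="org">Err the Blog</div>
    <div class="email">chris[at]ozmm[dot]org</div>
  </div>
HTML

# Collect every class attribute, then split multi-class values such as "url fn".
class_names = html.scan(/class="([^"]+)"/).flatten.flat_map(&:split)

# The first class names the container; the rest are the properties we're after.
container, *properties = class_names

p container   # "vcard"
p properties  # ["url", "fn", "org", "email"]
```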

The DSL

Here’s the mofo definition for an hReview, another type of microformat.

class HReview < Microformat
  one :version, :summary, :type, :dtreviewed, :rating, :description

  one :reviewer => HCard

  one :item do
    one :fn
  end

  many :tags => RelTag
end

Knowing what you know about the hCard, can you visualize what sort of HTML we’re looking for in an hReview? There may be a few questions (what do the arrows and the nesting mean?), but I imagine you’ve got an idea.

Shelve your curiosity. How would you implement this in Ruby? What do you want this to look like internally? Our parser needs to know what to search for: how will it know, given only the definition above?

(This is starting to sound like a bad programming book where they barrage you with questions you already knew were coming. Sorry!)

The Inside

First: when I look at this DSL, I see only two plain jane Ruby methods: one and many. Both appear to take a list of symbols, a hash, or a symbol and a block. I probably need to treat properties flagged as one or many differently in my parser, so I need to remember the mappings of type to property.
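If you want to see those three call shapes from Ruby's side, here's a minimal sketch. The inspect_call method is a made-up helper (not part of mofo) that just reports what it was handed.

```ruby
class HCard; end

# Hypothetical helper: reports the arguments a DSL-style method would receive.
def inspect_call(*properties, &block)
  { :properties => properties, :block_given => !block.nil? }
end

list_call  = inspect_call(:version, :summary)  # a list of symbols
hash_call  = inspect_call(:reviewer => HCard)  # the "arrow" form arrives as a hash
block_call = inspect_call(:item) { :ignored }  # a symbol plus a block

p list_call   # {:properties=>[:version, :summary], :block_given=>false}
p hash_call   # properties is [{:reviewer=>HCard}]
p block_call  # properties is [:item], block_given is true
```

So the arrow isn't special syntax at all: it's a hash literal with the braces left off, and it lands in the splat as a single element.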

I think I’ll use a hash with two keys: :one and :many.

Maybe something like this:

  { :one => [ :version, :summary, :type, :dtreviewed, :rating, :description,
     :reviewer, :item ], :many => [ :tags ] }

That works, but I’m forgetting about the arrows and the nesting. I’ll store the properties with arrows as hashes, giving me this:

  { :one => [ :version, :summary, :type, :dtreviewed, :rating, :description,
    {:reviewer => HCard}, :item ], :many => [ {:tags => RelTag} ] }

Now, how to do the nesting? The nesting represents, conceptually, the same idea our entire microformat thingie does: microformats are a collection of classes inside of a container and some of those classes can also act as a container with their own classes / properties within. A good place for a block.

If the blockage is a mini microformat (sort of), I can store the same :one and :many hash within it.

The final version of our hash:

  { :one => [ :version, :summary, :type, :dtreviewed, :rating, :description,
    {:reviewer => HCard}, {:item => {:one => [:fn]} } ],
    :many => [ {:tags => RelTag} ] }

Because we’re using arrays as values, the order in which we define properties is maintained. Not bad, in case it matters.
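Here's a sketch of how something downstream might walk that structure. The describe helper is hypothetical (the post doesn't show mofo's parser), and the hash is trimmed to a few properties to keep things short.

```ruby
class HCard; end
class RelTag; end

properties = {
  :one  => [ :version, :summary, {:reviewer => HCard}, {:item => {:one => [:fn]}} ],
  :many => [ {:tags => RelTag} ]
}

# Hypothetical helper: returns one line per property, in definition order.
def describe(properties, depth = 0)
  indent = "  " * depth
  lines  = []
  properties.each do |arity, entries|
    entries.each do |entry|
      case entry
      when Symbol
        lines << "#{indent}#{arity} #{entry}"
      when Hash
        entry.each do |name, target|
          if target.is_a?(Hash)
            # A nested block: same :one / :many shape, so recurse.
            lines << "#{indent}#{arity} #{name} (container)"
            lines.concat(describe(target, depth + 1))
          else
            # An arrow: the property is itself another microformat class.
            lines << "#{indent}#{arity} #{name} => #{target}"
          end
        end
      end
    end
  end
  lines
end

puts describe(properties)
```

The recursion works because a nested block is stored in exactly the same shape as the top level.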

So we’ve got a bones simple DSL and the data structure we want it converted to. How do we get from A to B?

The Implementation

You got it in your head already, yeah? Not too difficult.

See, one and many are class methods of Microformat. Since we’re defining a new class, HReview, we can build out the properties hash we previously dreamt up as the methods are being called. If we store it all in a class instance variable, it will be unique to HReview and not affect Microformat or its other children (more on class vs class instance variables here).
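A quick sketch of the difference, in case it's fuzzy. Base, ChildA, and ChildB are made-up names for this illustration only.

```ruby
class Base
  @@shared = []        # class variable: one copy for Base and every subclass

  def self.shared
    @@shared
  end

  def self.own
    @own ||= []        # class instance variable: one copy per class
  end
end

class ChildA < Base; end
class ChildB < Base; end

ChildA.shared << :a
ChildA.own    << :a

p ChildB.shared  # [:a]  (leaked across siblings)
p ChildB.own     # []    (untouched)
p Base.own       # []    (the parent is untouched too)
```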

class Microformat
  class << self
    def properties_hash
      @properties_hash ||= Hash.new([])
    end

    def one(*properties)
      properties_hash[:one] += properties
    end

    def many(*properties)
      properties_hash[:many] += properties
    end
  end
end

If you’re going to run this code, you may want to define empty HCard and RelTag classes to avoid any constant missing errors:

class HCard; end
class RelTag; end

Okay, so in Microformat we’ve got three methods wrapped in class << self (which opens up the class’s singleton class, so anything we def in there becomes a class method). The first, properties_hash, gives us access to a hash whose missing keys default to an empty array. (See here for more hash tricks.) The second and third, as promised, simply take the parameters passed and add them to the appropriate array. Once we define this class and then define our HReview class, we can check to see if it worked:

>> require 'microformat'
=> true
>> require 'hreview'
=> true
>> HReview.properties_hash
=> {:many=>[{:tags=>RelTag}], :one=>[:version, :summary, :type, :dtreviewed, :rating, :description, {:reviewer=>HCard}, :item]}

Looks pretty damn good to me. But what about :item? We didn’t capture its block. Hrm.

We’re going to have to check to see if our class method was passed a block and, if so, create a hash keyed by the passed property (:item) consisting of a properties hash of :one and :many. Not as hard as it sounds. I think I’ll just cheat and do it somewhat recursively.

Here’s my new Microformat (I haven’t touched many yet, just one):

class Microformat
  class << self
    def properties_hash
      @properties_hash ||= Hash.new([])
    end

    def properties_hash=(value)
      @properties_hash = value
    end

    def one(*properties)
      if block_given?
        current_hash = properties_hash.dup
        properties_hash.clear
        instance_eval { yield }
        properties = [{ properties.first => properties_hash }]
        self.properties_hash = current_hash
      end
      properties_hash[:one] += properties
    end

    def many(*properties)
      properties_hash[:many] += properties
    end
  end
end

Doing a p on HReview.properties_hash now gives us:

{:many=>[{:tags=>RelTag}], :one=>[:version, :summary, :type, :dtreviewed,
 :rating, :description, {:reviewer=>HCard}, {:item=>{:one=>[:fn]}}]}

Bloody paydirt. But what’s all that code, and how can we add it to many?

What we’re doing is checking to see if a block was passed. If so, we save a copy of the current properties_hash in the current_hash local variable. This is so we can erase it (with the next line, properties_hash.clear) and re-run the DSL-collecting code we’ve already written on the block.

Next we do an instance_eval on the passed block. The instance, in this case, is our class: the one method in the block is the same as a one method in the class. The difference is that the block is working on a fresh, empty properties_hash: it’s conceptually a different scope, so we need to start with a blank slate.
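Here's a small sketch of what instance_eval does to self. Recorder and its nested method are invented for this example. Note that the block is written at the top level, where a bare call to one would normally blow up.

```ruby
class Recorder
  def self.calls
    @calls ||= []
  end

  def self.one(*names)
    calls.concat(names)
  end

  # Hypothetical method: runs the block with the class itself as self.
  def self.nested(&block)
    instance_eval(&block)
  end
end

seen_self = nil

Recorder.nested do
  seen_self = self   # self is Recorder inside the block
  one :fn            # so this resolves to Recorder.one
end

p seen_self       # Recorder
p Recorder.calls  # [:fn]
```

In the HReview definition the block is already written inside the class body, so self would be the class either way; instance_eval just makes that explicit.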

Once we build the properties_hash for the block (in this case, just { :one => [:fn] }), we can re-assign what was passed in. Instead of a simple array, we’re changing our properties variable to be a single element array consisting of a hash containing our new properties_hash. Just like we planned when dreaming up what the internal representation of our DSL should be. Remember, we’re cheating and only temporarily using the properties_hash for the passed block.

Our new properties local variable will look something like [ :item => { :one => [:fn] } ] in this situation.

Finally, we set the class’ properties_hash back to what it was before we started and then commence normal execution.

Now, we need to share this new code with the many method. We could try and DRY it up by using define_method, but the problem there is the block_given? and yield: define_method’s creations don’t handle those so well. Another idea is to inject this into method_missing and do it all in there, passing calls which don’t match one or many up the chain. I’m feeling less tricky this evening, though. Here’s my final code:
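To see the define_method snag in isolation, here's a sketch using a made-up Probe class. The third method relies on declaring an explicit &block parameter in the body, which needs Ruby 1.9 or later.

```ruby
class Probe
  # A plain def sees the block passed by the caller.
  def self.via_def
    block_given?
  end

  # Inside a define_method-style body, block_given? looks at the scope where
  # the body was written (the class body here), not at the caller's block.
  define_singleton_method(:via_define_method) { block_given? }

  # Declaring an explicit &block parameter gets the caller's block back.
  define_singleton_method(:via_explicit_block) { |&block| !block.nil? }
end

p Probe.via_def            { }  # true
p Probe.via_define_method  { }  # false
p Probe.via_explicit_block { }  # true
```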

class Microformat
  class << self
    def properties_hash
      @properties_hash ||= Hash.new([])
    end

    def one(*properties, &block)
      add_properties_to_hash(:one, *properties, &block)
    end

    def many(*properties, &block)
      add_properties_to_hash(:many, *properties, &block)
    end

  private
    def properties_hash=(value)
      @properties_hash = value
    end

    def add_properties_to_hash(key, *properties, &block)
      properties = nested_hash_from_block(properties.first, &block) if block_given?
      properties_hash[key] += properties
    end

    def nested_hash_from_block(key, &block)
      current_hash = properties_hash.dup
      properties_hash.clear
      instance_eval(&block)
      local_properties_hash = properties_hash
      self.properties_hash  = current_hash
      [{ key => local_properties_hash }]
    end
  end
end

Which gives us:

{:many=>[{:tags=>RelTag}], :one=>[:version, :summary, :type, :dtreviewed,
 :rating, :description, {:reviewer=>HCard}, {:item=>{:one=>[:fn]}}]}

Victory. Now write some tests and get out of here.

It’s perhaps worth noting that “Hash.new([])” doesn’t do what you might expect it to do. It will return the same array object for every key that you try to access. For instance…

props = Hash.new([])

props[:one] << 'author'
props[:many] << 'tags'

props[:one] # => ["author", "tags"]

The bug isn’t noticed in the Microformat class because += is used, which assigns a new array object each time. In my (contrived) example, << is used, which modifies the array in-place.

What you’d usually want is something like

Hash.new{|h,k| h[k] = [] }

Jon beat me to the punch. I thought that line looked suspicious.

I tried:

a = Hash.new([])
a[:b] = a[:b] << 'c'
a[:b]
a[:d]

Sure enough, a[:d] returned ['c'].

Jon, thanks for the pointer on how to do that properly. I tried out shorter forms of what you did; the best I could come up with is: Hash.new(&Proc.new{[]})
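One caveat with the shorter form, sketched below: a default block that only returns a fresh array never stores it, so an in-place << on a missing key is lost. The variable names are made up for the comparison.

```ruby
storing     = Hash.new { |hash, key| hash[key] = [] }  # assigns on first read
non_storing = Hash.new(&Proc.new { [] })               # same as Hash.new { [] }

storing[:one]     << 'author'
non_storing[:one] << 'author'   # appends to a throwaway array

p storing[:one]      # ["author"]
p non_storing[:one]  # []
p non_storing.keys   # []

# += still works with either form, because it assigns the result back.
non_storing[:one] += ['author']
p non_storing[:one]  # ["author"]
```

That's the same reason the Microformat class gets away with Hash.new([]): it only ever uses +=.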
