So, a world of resources, right? Right. The D is talking about a design philosophy, so let’s philosophize: instead of a URL representing a resource, what if the HTML served at that URL literally was the resource? What if instead of requesting /posts/1 and /posts/1.xml, all you ever needed was /posts/1? I’m not making this up. We are indeed seeing the epoch of a world of resources, and in many places we can take the RESTful design a step further: we can eliminate some of our respond_to and params[:format] requirements. Less code.
What do I mean? Microformats. Remember when everyone realized that HTTP was really good at what it did and since we already knew it, we should just use it and stop going crazy with proprietary / complex / lesser known protocols? Yeah, me too. That was cool. Well here’s another H for you with a similar story: HTML.
I’m going to skip right to the chase. Hrm, who can I pick on? Oh, I know. atmos.
The Flickr Profile
Here’s atmos’ flickr profile.
Let’s say I want to grab his name and some of his personal information. I bet Flickr has an API. Google it. Works. Find a client. Install. API key. Etc. That’s not so bad, and now I can do whatever I want with Flickr. (fear me). But Flickr’s API is Flickr’s API.
Now, microformats. Go ahead and install the Tails extension for Firefox. You’ll thank me later. Now navigate to atmos’ flickr profile again. See that little leaf in the bottom right hand corner? Tails found you a microformat. Click it. Let’s take a look.
It’s some info pulled from atmos’ profile page. How? Straight from the HTML. By adding a few CSS classes into containers which don’t affect display, you can construct a microformat data structure. In this case, Flickr uses hCard—like vCard for the web. They put a few special classes into the HTML that displays atmos’ profile information (like name, url, avatar) and thus created a microformat data structure that any microformat parser, like Tails, can consume. Right there. No people.xml necessary. Check the source and look for vcard. That’s where it all begins.
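To make the "straight from the HTML" idea concrete, here is a minimal sketch of pulling hCard properties out of markup by class name. It is not how Tails or any real parser works internally. The markup is hypothetical (modeled on the properties mentioned above), the helper names are made up for illustration, and it leans on Ruby's bundled REXML, which only copes with well-formed markup.

```ruby
require 'rexml/document'

# Hypothetical, well-formed profile markup. Real pages are messier and
# need a forgiving HTML parser rather than an XML one.
PROFILE_HTML = <<~HTML
  <div class="vcard">
    <img class="logo" src="http://example.com/buddyicon.jpg" alt="" />
    <span class="fn n">
      <span class="given-name">Corey</span>
      <span class="family-name">Donohoe</span>
    </span>
    <span class="nickname">a t m o s</span>
    <a class="url" href="http://www.atmos.org">website</a>
  </div>
HTML

# Space-separated class names of an element, as an array.
def class_names(element)
  element.attributes['class'].to_s.split
end

# The element itself, or its first descendant, carrying the class name.
def first_with_class(root, name)
  return root if class_names(root).include?(name)
  root.each_recursive { |el| return el if class_names(el).include?(name) }
  nil
end

# Concatenated text of a node and all of its descendants.
def text_of(node)
  node.children.map do |child|
    case child
    when REXML::Text then child.value
    when REXML::Element then text_of(child)
    else ''
    end
  end.join
end

# Links and images carry their value in an attribute; everything else in text.
def property_value(card, name)
  el = first_with_class(card, name)
  return nil unless el
  case el.name
  when 'a' then el.attributes['href']
  when 'img' then el.attributes['src']
  else text_of(el).split.join(' ')
  end
end

def parse_hcard(html)
  card = first_with_class(REXML::Document.new(html).root, 'vcard')
  return nil unless card
  %w[fn nickname url logo].to_h { |name| [name, property_value(card, name)] }
end

card = parse_hcard(PROFILE_HTML)
puts card['fn']
puts card['url']
```

The classes are the whole interface: the display markup can change freely as long as vcard, fn, url and friends stay attached to the right elements.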
The Mod Squad
What other types of resources are out there? Well, Cork’d uses hReview and hCard on its wines, Upcoming uses hCalendar for all its events, and ever stoic Err the Blog uses hCard and hAtom for its posts and comments.
Yeah, hAtom. Think about that. Soon you may not even need to code feeds—either your readers will understand hAtom and be capable of subscribing to a URL directly (which NetNewsWire can already do), or you can (now) use an hAtom to Atom converter. Just point Feedburner at the converted feed and bam, insta-Atom. No extra code required.
And that’s the whole point. Less code. The information is already there. Help the machines find it.
Cool, but what good is a browser extension?
Okay, I see your point. Maybe we need a Ruby library? The Google will lead us to two existing implementations: uformats and scrAPI. The latter is maintained by Assaf Arkin whose blog, Labnotes, is essential reading. Subscribe. Consistently super Web 2.0, Ruby, and microformats stuff. Anyway, while these libraries are solid, a nagging voice in the back of my mind kept saying the same damn thing over and over: Hpricot.
Enter mofo
mofo is a microformat parser for Ruby based on Hpricot. It’s got a nice little DSL for defining microformats and currently supports hCard, hCalendar, hReview, hEntry, xoxo, rel-tag, and rel-bookmark. There may be a few kinks, but hey, it’s new.
$ sudo gem install mofo
Make sure you have the newest Hpricot, okay? (0.4.59 at least, 0.4 alone won’t cut it)
$ sudo gem install hpricot --source code.whytheluckystiff.net
Back to atmos.
>> require 'mofo'
=> true
>> atmos = HCard.find 'http://flickr.com/people/atmos'
=> #<HCard:0x58ba10 ... >
>> atmos.nickname
=> "a t m o s"
>> atmos.properties
=> ["url", "nickname", "n", "fn", "logo"]
>> atmos.url
=> "http://www.atmos.org"
>> atmos.logo
=> "http://static.flickr.com/69/buddyicons/10813452@N00.jpg?1154559081"
>> atmos.n.given_name
=> "Corey"
>> atmos.fn
=> "Corey Donohoe"
Man, he is so busted. Let’s try it on Cork’d now.
>> corkd = HReview.find 'http://corkd.com/wine/view/8001'
=> [#<HReview:0x3ac9630 ... >, #<HReview:0x3a8460c ... >, #<HReview:0x3a3f804 ... >]
>> corkd.size
=> 3
>> review = corkd.first
=> #<HReview:0x3ac9630 ... >
>> review.dtreviewed
=> Tue Jul 25 00:00:00 PDT 2006
>> review.tags
=> ["dry", "fruity", "medium-bodied", "mellow", "smooth", "soft", "unoaked"]
>> review.rating
=> 5
>> review.item.fn
=> "Dry Comal Creek 2003 Unoaked Cabernet Sauvignon Reserve"
>> review.reviewer.fn
=> "cynicalpink"
Fruity, dry, and mellow? C’mon.
Of course, the DSL is pretty simple:
class HReview < Microformat
  one :version, :summary, :type, :dtreviewed, :rating, :description, :reviewer => HCard
  many :tags => RelTag
  one :item! do
    one :fn
  end
end
Again: the kinks are still being worked out. But it’s mostly a nice start, yeah? See if you can’t figure out what all is going on in that DSL definition.
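If you want a hint, here is one way class-level macros like one and many could be wired up. This is a guess at the general technique, not mofo's actual implementation. MiniFormat, MiniCard and MiniReview are hypothetical names, and treating the trailing ! as "required" is purely an assumption.

```ruby
# Hypothetical base class: records property declarations, parses nothing.
class MiniFormat
  class << self
    # Per-class registry of declared properties.
    def properties
      @properties ||= {}
    end

    # Declare properties that appear at most once.
    def one(*names, &block)
      register(:one, names, &block)
    end

    # Declare properties that may repeat.
    def many(*names)
      register(:many, names)
    end

    private

    def register(arity, names, &block)
      names.each do |name|
        if name.is_a?(Hash)
          # :reviewer => MiniCard means "this property is itself a microformat".
          name.each { |prop, type| add(prop, arity, type) }
        else
          nested = nil
          if block
            # A block describes an anonymous nested format.
            nested = Class.new(MiniFormat)
            nested.class_eval(&block)
          end
          add(name, arity, nested)
        end
      end
    end

    def add(name, arity, type)
      # ASSUMPTION: a trailing "!" marks the property as required.
      required = name.to_s.end_with?('!')
      key = name.to_s.chomp('!').to_sym
      properties[key] = { arity: arity, type: type, required: required }
    end
  end
end

class MiniCard < MiniFormat
  one :fn, :url
end

class MiniReview < MiniFormat
  one :summary, :rating, :reviewer => MiniCard
  many :tags
  one :item! do
    one :fn
  end
end

MiniReview.properties.each do |name, spec|
  puts "#{name}: #{spec[:arity]}#{' (required)' if spec[:required]}"
end
```

A real parser would then walk these declarations and look up each property name as a CSS class inside the microformat's root element.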
Oh, also: mofo doubles as a Rails plugin.
$ ./script/plugin install svn://errtheblog.com/svn/projects/mofo
Just drop it in and all its microformat classes are available in your Rails app. Speaking of which…
Real Life
So, you’re still not convinced. That’s fine. Why don’t you skip on over to Chow.com and check out the footer. Bottom right. See it? It’s a list of the most recent posts on Chowhound (a food message board site).
On some page requests, Chow makes a call to the Chowhound REST API and gets a list of the most recent topics in XML. We then process and cache those topics, then stick them in the footer all linked up. Pretty standard stuff.
Except, PJ removed the Chowhound API today. We don’t need it anymore. See, we marked up all the Chowhound topics with the hAtom microformat. Instantly we have feeds, pardon my French, out the wazoo.
>> recent_topics = HEntry.find 'http://www.chowhound.com'
=> [#<HEntry:0x148f0d4 ... > ... ]
>> recent_topics.size
=> 17
>> recent_topics.first.entry_title
=> "Red Pearl Kitchen (Huntington Beach) Review + Pics"
Who needs the extra code? Even our API client is simpler—we just let mofo do the work.
And, hello?, bonus: mofo was built to work with to_yaml, Marshal, and memcached. Here’s the Chowhound model we use to wrap mofo and do our caching with acts_as_cached:
class Chowhound
  acts_as_cached

  def self.find(which)
    case which
    when :groups then XOXO.find("http://www.chowhound.com", :class => true)
    when :topics then HEntry.find("http://www.chowhound.com").first(5)
    end
  end
end
Now we can just call Chowhound.get_cache(:topics) to get the 5 most recent topics. When the cache expires, get_cache will hit Chowhound.find and grab the newest 5 from Chowhound.
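The pattern at work is a read-through cache: serve the stored value while it is fresh, and call the finder again once it has expired. Here is a tiny in-memory sketch of that idea. It is not how acts_as_cached or memcached work internally. TinyCache, its ttl argument and the injectable clock are all hypothetical, there only so the behavior is easy to see.

```ruby
# Hypothetical read-through cache with a time-to-live, in plain Ruby.
class TinyCache
  def initialize(ttl, clock = -> { Time.now })
    @ttl = ttl      # seconds a value stays fresh
    @clock = clock  # injectable so expiry can be demonstrated without sleeping
    @store = {}
  end

  # Returns the cached value for key, or runs the block and caches its result.
  def fetch(key)
    now = @clock.call
    entry = @store[key]
    return entry[:value] if entry && now < entry[:expires_at]

    value = yield
    @store[key] = { value: value, expires_at: now + @ttl }
    value
  end
end

lookups = 0
cache = TinyCache.new(300)

# Stand-in for an expensive fetch such as scraping a page for topics.
2.times do
  cache.fetch(:topics) do
    lookups += 1
    ['Topic A', 'Topic B']
  end
end

puts "expensive lookups: #{lookups}"
```

The second call never runs the block, which is the whole point: the remote page is only parsed when the cached copy has gone stale.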
What’s this XOXO business? Another microformat. Did you happen to see the Chowhound board listing in the subnav of Chow?
How do you think we propagate that? Hardcoding the boards? Ya gotta be kiddin’ me. We used to grab the board list through our API, but who needs it: Chowhound has its own board list on every page in its subnav. We marked it up as XOXO, gave it a class of xoxo (that’s what the :class => true is for—to let XOXO know you only want trees with a parent class of xoxo), then laughed like crazy.
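For the curious, here is roughly what reading an XOXO outline involves: find the lists carrying the xoxo class and turn their nested items into a nested Ruby structure. This is a sketch under assumptions, not mofo's XOXO parser. The subnav markup and the function names are made up, and it uses the bundled REXML, so it only handles well-formed markup.

```ruby
require 'rexml/document'

# Hypothetical subnav: one ordinary list and one list marked up as XOXO.
NAV_HTML = <<~HTML
  <div id="subnav">
    <ul class="links"><li>About</li></ul>
    <ul class="xoxo">
      <li>Boards
        <ul>
          <li>Los Angeles</li>
          <li>Manhattan</li>
        </ul>
      </li>
      <li>Site Talk</li>
    </ul>
  </div>
HTML

# Turns a list element into an array of labels. An item that contains a
# sublist becomes a one-key hash of { label => children }.
def outline(list)
  list.get_elements('li').map do |item|
    label = item.texts.map(&:value).join.split.join(' ')
    sublist = item.elements['ul'] || item.elements['ol']
    sublist ? { label => outline(sublist) } : label
  end
end

# Only lists that carry the xoxo class are treated as outlines.
def xoxo_outlines(html)
  doc = REXML::Document.new(html)
  roots = []
  doc.each_recursive do |el|
    next unless %w[ul ol].include?(el.name)
    roots << el if el.attributes['class'].to_s.split.include?('xoxo')
  end
  roots.map { |list| outline(list) }
end

p xoxo_outlines(NAV_HTML)
```

Requiring the class is what keeps every other list on the page, like the links list here, out of the result.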
(Chris Neukirchen (aka the amazing Anarchaia) has a more complete Ruby XOXO parser but I couldn’t get it to work with any of the XOXO examples on the microformats wiki. I suck.)
Anyone keeping track? That’s two API pages and two API clients we got rid of. I live for destruction.
The Markup
What did we really have to do? A lot of work, right? No! Wrong! Rather than bore you with more of my Chowhound anecdotes, here’s a completely arbitrary example.
Your blog post HTML, before microformats:
<div class="post">
  <h3>Megadeth Show Last Night</h3>
  <span class="subtitle">Posted by Chris on June 4th</span>
  <div class="content">Went to a show last night. Megadeth. It was alright.</div>
</div>
Your blog HTML, marked up with hAtom:
<div class="post hentry">
  <h3 class="entry-title">Megadeth Show Last Night</h3>
  <span class="subtitle">Posted by <span class="author vcard fn">Chris</span> on <abbr class="updated" title="2006-06-04T10:32:10Z">June 4th</abbr></span>
  <div class="content entry-content">Went to a show last night. Megadeth. It was alright.</div>
</div>
All I did was add the hentry, entry-title, and entry-content classes to existing containers. Then I went ahead and wrapped the date in an <abbr> tag, putting a machine-readable timestamp in its title attribute in the microformat-standard way. Finally I put a span around Chris, marking him as the author field of the hEntry and making it a valid hCard by including the vcard and fn classes. It’s really not all that hard. Did I mess it up? Maybe, but I’m sure I got close. And I didn’t even use a reference. Pfft.
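And here is what a consumer gets back for that effort. The sketch below reads the marked-up post with Ruby's bundled REXML and pulls out the hEntry fields by class name, taking the date from the abbr's title attribute rather than its human-friendly text. It is an illustration only, not mofo's code. The helper names are hypothetical, and REXML needs well-formed markup.

```ruby
require 'rexml/document'
require 'time'

# The hAtom-flavored post from above.
ENTRY_HTML = <<~HTML
  <div class="post hentry">
    <h3 class="entry-title">Megadeth Show Last Night</h3>
    <span class="subtitle">Posted by <span class="author vcard fn">Chris</span> on
      <abbr class="updated" title="2006-06-04T10:32:10Z">June 4th</abbr></span>
    <div class="content entry-content">Went to a show last night. Megadeth. It was alright.</div>
  </div>
HTML

def class_names(element)
  element.attributes['class'].to_s.split
end

# The element itself, or its first descendant, carrying the class name.
def first_with_class(root, name)
  return root if class_names(root).include?(name)
  root.each_recursive { |el| return el if class_names(el).include?(name) }
  nil
end

# Text of a node and its descendants, with whitespace collapsed.
def text_of(node)
  node.children.map do |child|
    case child
    when REXML::Text then child.value
    when REXML::Element then text_of(child)
    else ''
    end
  end.join.split.join(' ')
end

def parse_hentry(html)
  entry = first_with_class(REXML::Document.new(html).root, 'hentry')
  return nil unless entry

  author = first_with_class(entry, 'author')
  updated = first_with_class(entry, 'updated')
  {
    title: text_of(first_with_class(entry, 'entry-title')),
    # The author is an embedded hCard, so its name lives in the fn element.
    author: author && text_of(first_with_class(author, 'fn')),
    # The machine-readable date is in the title attribute, not the text.
    updated: updated && Time.iso8601(updated.attributes['title']),
    content: text_of(first_with_class(entry, 'entry-content'))
  }
end

post = parse_hentry(ENTRY_HTML)
puts post[:title]
puts post[:updated].year
```

Note that "June 4th" never has to be parsed. The abbr keeps the pretty date for humans and the ISO 8601 timestamp for machines.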
Hey: check out the source of this very blog post and search for hentry or vcard. It’s a learning exercise—you can see how simple it really is. Compare Err’s vcard to the Flickr one. Fascinating.
Information.find(:more => true)
It’s very simple once you get comfortable with the idea. Just a little extra markup to help your data live a longer, more semantic life.
Drew McClellan recently asked via amazing photo-slide, “Can Your Website Be Your API?”. I don’t want to spoil the slides, because they are amazing and you should check them immediately, but the answer is yes. It really can.
I already mentioned Assaf and his microposting, but I left something out: he has a great Rails helper complete with a cheat sheet ($ cheat microformats_helper).
Tantek and Ryan King of Technorati are microformat demi-gods. Chris Messina and (unsurprisingly) Dan Cederholm also have great blogs with a sprinkle of micro. And just the other day I found this Make Data Make Sense site, with greatness abound.
Brian Suda’s projects are way off the hook, like his cheat sheet and geo microformat to kml converter (for Google Maps). Oh yeah, he wrote a book, too.
Word is Mephisto supports microformats out of the box. Truth? Worth investigating.
And, dude? Firefox 3 and IE8 both have whisperings of microformat support. What are you waiting for? Dive in and start mofoin’.
Call to Arms
Let’s get going with this. We’re using mofo at work so expect to see it moving ahead rapidly. Find something it can’t parse? File a bug. Got an idea? Feature request. Patches? Please. Send me an e-mail if you want a little personal TLC or have a question. I’m also interested in cleaning up and speeding up the parser. So hey, if you like parsing junk and want commit access, just send a patch and you’re in. Group effort. There’s also a mofo mailing list where we might do some talking, if you’re into that sort of thing.
Now go get your REST on, dammit.
Update: Fixed a link. Dammit.
Update 2: That amazing mofo logo was created by Todd Matthews. Thanks, Todd.
Update 3: I changed the SVN location. Sorry!
“Tails extension” link points to atmos’s flickr site. I had the feeling it should go somewhere else.
To abandon an XML API in favour of microformats in the HTML is happy and super where there is a microformat schema for the data you want to present. I mean, is there an hPorn microformat schema? I thought not. So my armada of Ruby on Rails Pron sites will just have to keep their XML API, which also produces Javascript by default if you ask it.
In summary, are microformats just a neat trick that have a limited set of fixed schemas, and thusly doomed?
Unless all those porn sites magically implement identical APIs, it’s no more work to consume custom microformats than it is to consume custom XML APIs. And it’s significantly less work to publish them. The benefit is on the publishing side.
Great article. I’ll try to give a ride to your library soon.
To Dr Nic: It seems to me that you have dismissed the microformats idea pretty easily. Take a look at the wiki (http://microformats.org/wiki/Main_Page): 9 specifications (of which a couple are already adopted or to be adopted by W3C), 11 drafts, 6 patterns and a huge list of exploratory discussions. I wouldn’t call this (after only a year of existence!) “doomed”.
If you need a microformat, and only if you need one, just do it and propose it to others on the microformats.org wiki. Hell, you can even keep it to yourself.
The microformat proposal process is detailed here (http://microformats.org/wiki/process), and it starts with: “Why? There must be a problem to be solved. No problem, no microformat.” So if you don’t need one, don’t call those needing one “crazy”.
Sorry if I am too harsh …
Microformats are great for publishers, and they’re great for making information trivially discoverable by generic user-agents. They’re also great when they’re standardized, but that holds true of any format.
They’re not a magic bullet. For clients, they make screen-scraping easier, and that’s all. They’re no more a replacement for an API than screen-scraping is. They’re harder to parse (“just use mofo” isn’t an adequate response for a couple of reasons; the one that springs immediately to mind is that not everyone uses Ruby), much heavier down the wire than, say, JSON, and might cause other issues down the track—it’s possible that the browser-viewable HTML can’t be safely cached for as long as the API-response data, for example.
I don’t think mashup developers will thank you.
Mephisto doesn’t support it out of the box, unless your template does. Though, there is a scribblish port, which uses hAtom.
Since this replaced your REST API, can you then post the actual html representation of your post to Chowhound? Can you imagine that? Instead of posting an ATOM entry or converting to an XML-RPC struct, they just mark up your entry in your template and post it.
that’s one hell of a mofo logo
The Doctor: The hAtom format works really well for most content, in my limited experience. There is no hMessageBoardPost format but we got away with hAtom just fine. The reason things like hCard and hCalendar exist is so they can be exported and converted to other formats sanely—where are you going to take porn that’s special? Into Quicktime? Then maybe we need hVideo, but hCard can become a vcard in your Outlook and hCalendar can become an iCal event in your calendar app. That’s my understanding, at least. Also, what evan said.
@everything – thanks for helping me through this. I will subsequently apply additional markup to help Scrapers that want my data, but are fractionally too lazy to write their own scraper code :)
@my mother – if you find this, I don’t have porn sites. If I did I’d make more money and visit you more often.
Thanks for giving me the nudge to play with microformats.
I implemented the hCalendar standard on the stolen bike data over at Finetoothcog.
I wrote a blog post about my experience.
BTW: The service at Technorati (http://technorati.com/events/) for converting hCalendar items to iCal subscriptions is super sweet!
Oh noes, your blog has fallen victim to comment spammers. Maybe your blog needs some anti-spam measures.
Anyway, thought you’d appreciate this. Mini-icons made for microformat entities! http://www.factorycity.net/projects/microformats-icons/
Good thing you don’t have a spam filter to block that link, heh.
Glad I finally got around to reading this. This is good stuff. Simple is good. No need to try to make it out to be more than it is, which I think is simply a nod in the right direction, with more potential in the right hands.
I was futzing with Flickr API libs for Ruby until it dawned on me to parse the RSS feed with Hpricot. A short while later I had exactly what I wanted, and there is still plenty of info to be harvested from that lil’ feed.
I’ve submitted a bug: http://rubyforge.org/tracker/index.php?func=detail&aid=7982&group_id=2548&atid=9799
hmm… that “u” in “uFormat” should be a ?. But you know that.
You ate my utf-8!
Awesome, I was planning on writing something like this, looks like I don’t need to!