Err the Blog
Rubyisms and Railities
  • “OpenStruct IRL”
    – Chris on October 06, 2006

    Hey, here’s a fun one. Just last week cdcarter needed to scrape and Rubyify the SciFi channel’s listings. (Wow, those guys really like tables, huh? And nondescript markup. And PHP3. (PHP3 was sooo the best.))

    Quickly carter and I dusted off Hpricot and, with it, scraped the hell out of the listing page. We then turned each listing into a Show object, easy. With OpenStruct.

    %w[open-uri rubygems hpricot ostruct].each { |f| require f }
    
    class Show < OpenStruct
      LISTINGS = 'http://www.scifi.com/schedulebot/index.php3?feed_req=US:Central:E'
    
      def to_s
        "#{time}: #{title}" << (program ? " [#{program}]" : '')
      end
    
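      # Scrape today's listings page and return an array of Show objects.
      def self.find_all_from_today
        shows = []
        doc = Hpricot open(LISTINGS)
        # Keep only the table cells marked up with class="text".
        tds = (doc/:td).select { |td| td.respond_to?(:[]) && td['class'] == 'text' }
        tds.each_with_index do |td, i|
          # A cell that looks like a time (e.g. "7:00 AM") starts a listing.
          next unless td.innerHTML =~ /:.+(AM|PM)/
          time    = td.innerHTML
          # The next two cells hold the program (link markup stripped) and the title.
          program = tds[i+1].innerHTML.gsub(/<a.+>(.+)<\/a>/, '\1')
          title   = tds[i+2].innerHTML
          shows << new(:time => time, :program => program, :title => title)
        end
        shows
      end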
    end
    
    # print all found shows for today
    Show.find_all_from_today.each { |show| puts show.to_s }
    

    Run it. I get something like this:

    5:00 AM: [PAID PROGRAMMING]
    7:00 AM: SHADOW PLAY [TWILIGHT ZONE, THE]
    7:30 AM: BLACK MARKET [BATTLESTAR GALACTICA (SEASON 2)]
    8:30 AM: SCAR [BATTLESTAR GALACTICA (SEASON 2)]
    9:30 AM: SACRIFICE [BATTLESTAR GALACTICA (SEASON 2)]
    ...
    

    Way cool (even though I’m dying to slip in some returning action). You can imagine how this might be expanded into a nice little pirate RSS feed or something.
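
    If you want to poke at the OpenStruct half without hitting the network, here's a minimal sketch of the same idea. It assumes nothing beyond the standard library's ostruct; the sample values are lifted from the output above, and rating is a made-up attribute that the scraper never sets.

```ruby
require 'ostruct'

# Same shape as the Show class above, minus the scraping.
class Show < OpenStruct
  def to_s
    line = "#{time}: #{title}"
    line += " [#{program}]" if program
    line
  end
end

# Attributes come straight from the hash; no attr_accessor declarations needed.
show = Show.new(:time => '7:00 AM', :title => 'SHADOW PLAY', :program => 'TWILIGHT ZONE, THE')
puts show.to_s   # 7:00 AM: SHADOW PLAY [TWILIGHT ZONE, THE]

# An attribute that was never set reads as nil instead of raising.
bare = Show.new(:time => '5:00 AM', :title => 'PAID PROGRAMMING')
puts bare.to_s   # 5:00 AM: PAID PROGRAMMING

# New attributes can be bolted on later (rating is hypothetical).
bare.rating = 'TV-G'
puts bare.rating # TV-G
```

    The point is that the hash keys become methods, so to_s can call time, title and program as if they had been declared.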

    Any more cool Struct or OpenStruct uses floating around out there? Jay Fields has done lots of messin’ with OpenStruct and kindly sprinkles a few write-ups throughout his blog. How’s about yous?
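
    For anyone weighing the two, here's a rough side-by-side, assuming only the standard library. Listing and rating are names invented for this sketch, not anything from the scraper above.

```ruby
require 'ostruct'

# Struct: the member list is declared once, up front.
Listing = Struct.new(:time, :title, :program)
fixed = Listing.new('8:30 AM', 'SCAR', 'BATTLESTAR GALACTICA (SEASON 2)')
puts fixed.title

# Assigning a member that was never declared raises NoMethodError.
rejected = false
begin
  fixed.rating = 'TV-14'
rescue NoMethodError
  rejected = true
end
puts "Struct refused rating=: #{rejected}"

# OpenStruct: attributes are whatever you hand it, now or later.
loose = OpenStruct.new(:time => '8:30 AM', :title => 'SCAR')
loose.rating = 'TV-14'
puts loose.rating
```

    Struct is the tighter fit when the fields are known ahead of time; OpenStruct earns its keep when they aren't, or when you just don't feel like declaring them.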

  • ChrisJ, 1 day later:

    Nice article. I’ve been trying to find some “excuse” to try Hpricot. This gives me an idea for stats on nfl.com.

  • Chris Carter, 1 day later:

    Thanks for giving me the mention :) I think I’m gonna turn the Buggy code to use OpenStruct. It’s so clever. So many classes could just inherit from it, like a superjavabean.

  • Seth Thomas Rasmussen, 1 day later:

    I started playing with Hpricot just today! Me likeyy.

  • kbrown, 4 days later:

    What is the advantage of using OpenStruct in this case? I can see why the opts_parse module uses it, because the methods vary based on program options, but you are just using three methods: time, program, and title. No?

  • John Nunemaker, 2 months later:

    I used Hpricot on my twitter gem and absolutely fell in love. Wouldn’t have thought of it without this article. Unfortunately, using OpenStruct slipped my mind. That is a really nice touch. Glad I revisited this.

This is Err, the weblog of PJ Hyett and Chris Wanstrath.
All original content copyright ©2006-2008 the aforementioned.