Parse XML with Hpricot — err.the_blog

“Parse XML with Hpricot”

– PJ on July 31, 2006
Given a piece of XML:
```
<Export>
  <Product>
    <SKU>403276</SKU>
    <ItemName>Trivet</ItemName>
    <CollectionNo>0</CollectionNo>
    <Pages>0</Pages>
  </Product>
</Export>
```
One might assume that REXML is the way to parse it, but we all know how slow it is.
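For reference, here is roughly what the REXML route looks like. This is a minimal sketch, not code from the original project: the inline `SAMPLE_XML` string and the `parse_products` helper are hypothetical names made up for illustration, and it returns plain hashes rather than saving anything.

```ruby
require "rexml/document"

# Hypothetical inline copy of the sample document shown above.
SAMPLE_XML = <<~XML_DOC
  <Export>
    <Product>
      <SKU>403276</SKU>
      <ItemName>Trivet</ItemName>
      <CollectionNo>0</CollectionNo>
      <Pages>0</Pages>
    </Product>
  </Export>
XML_DOC

FIELDS = %w[SKU ItemName CollectionNo Pages]

# Hypothetical helper: returns one hash per <Product> element,
# keyed by the field names listed in FIELDS.
def parse_products(xml)
  doc = REXML::Document.new(xml)
  doc.elements.to_a("Export/Product").map do |xml_product|
    FIELDS.each_with_object({}) do |field, attrs|
      # elements[name] returns the first child element with that name.
      attrs[field] = xml_product.elements[field].text
    end
  end
end

parse_products(SAMPLE_XML).each do |attrs|
  puts attrs["SKU"]
end
```

REXML is pure Ruby, which is a large part of why it is slow on big files.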

Enter _why’s HTML parser, Hpricot. It’s written in C, and since XHTML is just XML with a fixed vocabulary, there’s no reason it shouldn’t be able to parse my file.

Turns out it can: it’s really fast, and the code is dead simple.
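```
require "hpricot"

FIELDS = %w[SKU ItemName CollectionNo Pages]

# Product is assumed to be the application's own model, defined elsewhere.
doc = Hpricot.parse(File.read("my.xml"))

# One <Product> element per record; copy each field's text into the model.
(doc/:product).each do |xml_product|
  product = Product.new
  FIELDS.each do |field|
    product[field] = (xml_product/field.intern).first.innerHTML
  end
  product.save
end
```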
Update: Slight refactoring of the code above. Chris figured out last night that you can use innerHTML, which eliminated the only ugly part of the code.
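The loop assumes a `Product` class already exists. The post does not show it, so treating it as a model that supports `product[field] = value` and `save` is an assumption. To try the loop outside the original application, a hypothetical in-memory stand-in along these lines should be enough:

```ruby
# Hypothetical stand-in for the application's Product model.
# It keeps attributes in a hash and "saves" into an in-memory list.
class Product
  @saved = []

  class << self
    # Every product that has been saved so far.
    attr_reader :saved
  end

  def initialize
    @attributes = {}
  end

  # Lets the parsing loop write product["SKU"] = "403276".
  def []=(field, value)
    @attributes[field.to_s] = value
  end

  def [](field)
    @attributes[field.to_s]
  end

  # Records the product in memory instead of writing to a database.
  def save
    self.class.saved << self
    true
  end
end

product = Product.new
product["SKU"] = "403276"
product.save
```

Without some definition like this loaded first, Ruby raises `uninitialized constant Product (NameError)` at the `Product.new` line.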

Works great. I replaced REXML with Hpricot in a project: the tests used to take 4.6 seconds, and now they take 0.6. Big improvement.

That saves a ton of time indeed. Thanks.

I have found an error: if you have a tag in your file (as in the KML file type), it is not recognized. Perhaps because this is the markup for CSS styles in an HTML document?

Hi, it is really good, but when I run this code I get an error saying “uninitialized constant Product (NameError)”.

How to solve this?

-Nicholas I
