Given a piece of XML:
    <Export>
      <Product>
        <SKU>403276</SKU>
        <ItemName>Trivet</ItemName>
        <CollectionNo>0</CollectionNo>
        <Pages>0</Pages>
      </Product>
    </Export>
One might assume that REXML is the way to parse it, but we all know how slow it is.
Enter _why’s HTML parser, Hpricot. It’s written in C, and since XHTML is an application of XML, there’s no reason it shouldn’t be able to parse my file.
Turns out it can, it’s really fast, and the code is dead simple.
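    require 'hpricot'

    FIELDS = %w[SKU ItemName CollectionNo Pages]

    doc = Hpricot.parse(File.read("my.xml"))

    # Build and save one Product per <Product> element
    (doc/:product).each do |xml_product|
      product = Product.new
      FIELDS.each do |field|
        # Copy the contents of each child element into the matching attribute
        product[field] = (xml_product/field.intern).first.innerHTML
      end
      product.save
    end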
Update: Slight refactoring of the code above. Chris figured out last night that you can use innerHTML, which eliminated the only ugly part of the code.
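For comparison, here is a rough sketch of what the same loop might look like with REXML, the slower baseline mentioned at the top. It is only an illustration: the `Product` class below is a hypothetical in-memory stand-in (the real one is assumed to be a model defined elsewhere in the application), `SAMPLE_XML` and `parse_products` are names made up for this example, and the results are collected in an array instead of being saved.

```ruby
require 'rexml/document'

FIELDS = %w[SKU ItemName CollectionNo Pages]

# Hypothetical stand-in for the post's Product class, which is assumed to be
# a model defined elsewhere. It only supports product[field] reads and writes.
class Product
  def initialize
    @attributes = {}
  end

  def []=(field, value)
    @attributes[field] = value
  end

  def [](field)
    @attributes[field]
  end
end

# The sample document from the top of the post.
SAMPLE_XML = <<~XML_DOC
  <Export>
    <Product>
      <SKU>403276</SKU>
      <ItemName>Trivet</ItemName>
      <CollectionNo>0</CollectionNo>
      <Pages>0</Pages>
    </Product>
  </Export>
XML_DOC

# Parse the XML with REXML and return one Product per <Product> element.
def parse_products(xml)
  doc = REXML::Document.new(xml)
  products = []
  doc.elements.each('Export/Product') do |xml_product|
    product = Product.new
    FIELDS.each do |field|
      # Read the text of the child element named after the field
      product[field] = xml_product.elements[field].text
    end
    products << product
  end
  products
end

products = parse_products(SAMPLE_XML)
puts products.first['ItemName']
```

Unlike the Hpricot version, the element names here are matched exactly as they appear in the file. On recent Ruby versions REXML ships as a bundled gem, so `require 'rexml/document'` may need the `rexml` gem to be installed.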
Works great. I replaced REXML with Hpricot in a project. The tests used to take 4.6 seconds; now they take 0.6. Big improvement.
That saves a ton of time indeed. Thanks.
I have found an error: if you have tag in your file (as in the KML file type), then it is not recognized. Perhaps because this is markup for CSS styles in an HTML document?
Hi, this is really good, but when I run this code I get an error saying “uninitialized constant Product (NameError)”.
How do I solve this?
-Nicholas I