Problem Setup

I noted in a previous post that I have been using the XML package in R to process an XML export of our database. I used xmlToDataFrame to convert the XML into an R data.frame, and I have found it to be remarkably slow. After some Googling, I found a link where the author states that xmlToDataFrame is a generic function and that, if you know the structure of the data, you can leverage that to speed up the conversion.

So, that’s what I did for my data. I think this approach is applicable to similarly structured XML data, so I thought I’d share.
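To make the general idea concrete before getting to my own data, here is a minimal sketch of the difference between the generic conversion and a structure-aware one. The XML, the tag names (export, record, id, name) and the variable names are all made up for illustration; they are not my real export. The sketch also assumes that every record contains each field exactly once, which is what lets one XPath query per field line up into columns.

```r
library(XML)

# Hypothetical XML, for illustration only: a flat list of <record> nodes,
# each holding the same simple child fields.
xml_text <- '<?xml version="1.0"?>
<export>
  <record><id>1</id><name>a</name></record>
  <record><id>2</id><name>b</name></record>
</export>'

doc <- xmlParse(xml_text, asText = TRUE)

# Generic route: xmlToDataFrame has to discover the layout on its own.
df_generic <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Structure-aware route: we already know the field names, so pull each
# column with a single XPath query.
# Assumption: every <record> has each field exactly once; otherwise the
# columns would have different lengths or be misaligned.
fields <- c("id", "name")
cols <- lapply(fields, function(f) {
  xpathSApply(doc, paste0("//record/", f), xmlValue)
})
names(cols) <- fields
df_fast <- as.data.frame(cols, stringsAsFactors = FALSE)
```

Both routes give character columns here, so any type conversion (for example as.integer on id) would still have to be done afterwards. Whether the second route is actually faster on a given file is something to time with system.time rather than assume.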

Data Structure

Let’s look at the data structure. For my data, an example XML would be:

which tells me a few things:

  1. I’m XML (first line). There are other pieces of information which can be extracted as tags, but…

