Read raw data from database (without using horrible hacks)

August 08, 2012, 02:32:16 PM

I am analyzing data stored in a MySQL database and have a function, xcmsRaw.db, that creates an xcmsRaw object from the database. My problem is that while I can call xcmsRaw.db on its own, I can't get xcmsSet to use it in place of the standard xcmsRaw.

I came up with a horrible, evil, terrible hack to force xcms to use xcmsRaw.db instead of the normal xcmsRaw within xcmsSet and elsewhere: I overwrite the binding for xcmsRaw in the namespace:xcms environment. Since xcmsRaw.db needs a database connection object (I use RODBC) and optional query parameters, and xcmsSet has no way to pass them along, those must be "passed in" by hacking variables into the dynamic context of the call to xcmsRaw.db. The complete code is here and hacking in my replacement is done with:

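##' Seriously, this function is bad news.
##' Swaps xcms's own xcmsRaw for xcmsRaw.db (which must already be defined).
HackRawReader <- function() {
  # The enclosing environment of xcmsSet is the xcms namespace.
  xEnv <- environment(xcms:::xcmsSet)
  # Namespace bindings are locked, so unlock before overwriting.
  unlockBinding("xcmsRaw", xEnv)
  assign("xcmsRaw", xcmsRaw.db, envir = xEnv)
  # Also shadow xcmsRaw in the caller's environment.
  assign("xcmsRaw", xcmsRaw.db, envir = parent.frame())
}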

Like I said, this is an abomination, but it works, for certain definitions of the word "works." I am able to run findPeaks() and plotPeaks() on the object imported from the database and get the same result as when it is imported from the corresponding mzML file.

Is there a less horrifying way to accomplish this goal? I could imagine turning xcmsRaw into a generic function that chooses a reader based on the filename string it is given. If the string ends in ".cdf" it could be read using mzR::netCDFOpen or xcmsRaw.cdf, if it ends with ".mzML" it could be read with mzR::rampOpen or xcmsRaw.mzml, and if it begins with "db://" (or some such) it could be read with xcmsRaw.db. This would allow additional backends to be added without invasive changes to the xcmsRaw function; handling a new data source would be a matter of adding a regular expression and an associated function.
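To make the idea concrete, here is a rough sketch of that kind of pattern-based dispatch. It is not code from xcms: rawReaders and readRawFrom are hypothetical names, and the reader functions are placeholders that only report which backend would be chosen instead of actually reading any data.

```r
# Hypothetical registry: each entry pairs a regular expression with a reader.
# The readers are stand-ins; real ones would return the raw scan data.
rawReaders <- list(
  list(pattern = "\\.cdf$",  reader = function(src) paste("netCDF reader:", src)),
  list(pattern = "\\.mzML$", reader = function(src) paste("mzML reader:", src)),
  list(pattern = "^db://",   reader = function(src) paste("database reader:", src))
)

# Try each pattern in order and call the first reader that matches.
readRawFrom <- function(src, readers = rawReaders) {
  for (entry in readers) {
    if (grepl(entry$pattern, src, ignore.case = TRUE)) {
      return(entry$reader(src))
    }
  }
  stop("no reader registered for source: ", src)
}

readRawFrom("sample01.mzML")
readRawFrom("db://experiment/42")
```

Supporting a new backend in this scheme means appending one more pattern/reader pair to the list; readRawFrom itself does not change.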

I'm fairly new at R and even newer at xcms, but if people think it's a good idea, I could try my hand at generifying xcmsRaw.

[Patch] Read raw data from any source

Reply #1 – August 25, 2012, 02:43:15 AM

I took it upon myself to modify xcms to provide the functionality I was looking for. I've created a Git repository with my changes here. The essence of my changes is to remove the hard-coded reliance on mzR for reading files and replace it with an S4 virtual class called xcmsSource. The only requirement is that classes implementing xcmsSource provide a loadRaw method which returns a list in the same format as netCDFRawData and rampRawData.

The filename argument to the xcmsRaw function need no longer be a character string (though it still can be), but can be any object with an xcmsSource method defined. The resulting xcmsSource object is still stored in the @filename slot of xcmsRaw objects for backwards compatibility. In fact, this branch is a drop-in replacement for the old one: R CMD check passes with flying colors.
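For readers unfamiliar with S4, the design described above could look roughly like the following standalone sketch. It is not the code from the patch: memorySource is a hypothetical in-memory stand-in for a database-backed source, and the fields of the returned list (rt, tic, scanindex, mz, intensity) are an assumption about the format netCDFRawData produces. It only needs the methods package, and is best run in a session where xcms is not attached, since it reuses the names xcmsSource and loadRaw.

```r
library(methods)

# Virtual base class: every data source inherits from it.
setClass("xcmsSource", representation("VIRTUAL"))

# Each concrete source must implement loadRaw().
setGeneric("loadRaw", function(object, ...) standardGeneric("loadRaw"))

# Converts whatever was passed as `filename` into an xcmsSource object.
setGeneric("xcmsSource", function(object, ...) standardGeneric("xcmsSource"))

# Something that already is an xcmsSource is returned unchanged.
setMethod("xcmsSource", "xcmsSource", function(object, ...) object)

# Hypothetical in-memory source, standing in for a database-backed one.
setClass("memorySource",
         representation(scans = "list", rt = "numeric"),
         contains = "xcmsSource")

# Flatten the per-scan data into one list (field names are an assumption).
setMethod("loadRaw", "memorySource", function(object, ...) {
  npoints <- vapply(object@scans, function(s) length(s$mz), integer(1))
  list(rt = object@rt,
       tic = vapply(object@scans, function(s) sum(s$intensity), numeric(1)),
       # Zero-based offset of each scan's first point in the flat vectors.
       scanindex = utils::head(c(0L, cumsum(npoints)), -1),
       mz = unlist(lapply(object@scans, function(s) s$mz)),
       intensity = unlist(lapply(object@scans, function(s) s$intensity)))
})

# Usage: two toy scans with two and three data points.
src <- new("memorySource",
           scans = list(list(mz = c(100.1, 200.2), intensity = c(10, 20)),
                        list(mz = c(150.5, 250.0, 300.3), intensity = c(5, 15, 25))),
           rt = c(1.0, 2.0))
raw <- loadRaw(xcmsSource(src))
str(raw)
```

A character-string method for xcmsSource, one that inspects the file name and returns the matching file-backed source, would be what keeps plain file names working as before.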

I'd like to see these changes upstream, and am happy to make whatever improvements are needed for the code to be accepted.