Topic: CentWave error

Re: CentWave error

Reply #15
Folks, please start new threads for new topics.

write.mzdata uses the XML package, which is very slow and might be the cause of the memory leak.
I guess bypassing the XML package and writing plain text directly would be much faster.
Code contributions/optimizations are very welcome!
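
As a rough illustration of the plain-text idea (this is not the actual write.mzdata code), one can open a file connection once and write each scan with cat() instead of building an XML tree in memory. The helper name write_spectrum_text and the element layout are hypothetical and far simpler than real mzData, which among other things stores the peak arrays in encoded form.

```r
## Hypothetical helper: write one spectrum as XML-like plain text.
## Element names are illustrative only, not the mzData schema.
write_spectrum_text <- function(con, id, mz, intensity) {
  cat('<spectrum id="', id, '">\n', sep = "", file = con)
  cat("  <mz>", paste(mz, collapse = " "), "</mz>\n", sep = "", file = con)
  cat("  <intensity>", paste(intensity, collapse = " "), "</intensity>\n",
      sep = "", file = con)
  cat("</spectrum>\n", file = con)
}

out <- tempfile(fileext = ".xml")
con <- file(out, open = "w")   # open once, write many scans, close once
write_spectrum_text(con, 1, c(100.05, 200.1), c(10, 20))
write_spectrum_text(con, 2, 150.2, 5)
close(con)

written <- readLines(out)
```

Nothing is held in memory beyond the scan being written, which is the property the suggestion above is after.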

If you are just looking for a fast way to convert cdf to mzData, you could also try OpenMS' FileConverter, which is very fast, though it might be pickier when reading the cdf files.

Re: CentWave error

Reply #16
Sorry Ralf,

I viewed it as a continuation of this thread, or I would have started a new one. 

I downloaded OpenMS to try it, and it doesn't seem to be able to read cdf files. I think I am stuck for the time being. I am trying to push Waters to help with the conversion options, but that is moving slowly. ProteoWizard has a tool, but it doesn't use the lockmass data, so mass accuracy is compromised.

Re: CentWave error

Reply #17
Out of curiosity, why don't you process the netCDF files directly instead of converting to mzXML?

Re: CentWave error

Reply #18
I am collecting Waters raw data. I need to get it into a usable format for processing in XCMS. Waters' conversion tool can convert to cdf, but not much else. They do have tools (in proteomics packages) that can convert to mzXML, but they require profile data, and I have centroid data. As far as I can tell, there are no tools in the Waters packages to convert centroid data to anything other than cdf (as far as XCMS compatibility goes, ASCII is the only other output format).

I am operating a Q-TOF, which requires a lock-mass correction. Waters stores its raw data without lock-mass correction; the correction is applied on the fly, either during conversion to CDF or for viewing in Waters software. I looked into ProteoWizard, which will convert to mzXML but does not perform lock-mass correction, an issue they are working on with Waters. So the mzXML masses are off and the accurate mass data is gone.

I can use the cdf format with the matchedFilter algorithm, but the centWave algorithm does not work with cdf data as written by DataBridge, Waters' conversion tool. I want to use centWave for this application because (1) it is supposed to be better for accurate mass data, and (2) it can give me Gaussian fit parameters for each peak, which aid downstream interpretation.

As an aside, we are collecting data using MSe acquisition, and a nice perk of the DataBridge conversion is that raw data files are split into independent data files for function 1: MS; function 2: MSe; and function 3: lockmass. This means that I can perform peak detection in XCMS on both the MS and MSe functions, a critical aspect of the workflow we are developing for mining indiscriminate MSMS data. I am trying to talk Waters into offering the option to split their raw files during conversion to mzXML in ProteoWizard as well, but it is slow going; in fact I have no idea if it will happen, much less when.

I feel like I am fighting data conversion problems at every turn. It is not the fault of the XCMS developers at all - I was just hoping to use the write.mzdata function to get all my CDF files into a format that centWave could read, so I could actually use the peak detection algorithm best suited to the data.

Re: CentWave error

Reply #19
The problem for centWave is not the cdf format of the Waters data - it is the gaps in the data caused by the lock mass scans (and converting to mzXML won't fix these gaps).
Paul just updated his gap correction module
Code:
xcmsSet(..., lockMassFreq= TRUE)
that makes use of 1:MS netcdf files and 3:lockmass files to fix these gaps.

I haven't had a chance yet to test the updated module but maybe you can let us know if it works for you.
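
To make the "gaps" concrete, here is a small base R sketch on simulated scan times. It is not Paul's module; it only shows how missing lock mass scans appear as longer steps between consecutive retention times. The one-scan-per-second timing, the every-10th-scan lock mass interval, and the 1.5 x median threshold are all assumptions chosen for illustration.

```r
## Simulated acquisition: one scan per second, with every 10th
## acquisition slot taken by a lock mass scan (assumed interval).
slot <- 1:50
is_lockmass <- slot %% 10 == 0
scantime <- slot[!is_lockmass]   # what the 1:MS file would contain

## A gap is any step between consecutive scans clearly longer than usual.
step <- diff(scantime)
gap_after <- which(step > 1.5 * median(step))

## Retention time of the last MS scan before each gap
last_before_gap <- scantime[gap_after]
```

A chromatographic peak that spans one of these positions is missing a data point, which is what the gap correction is meant to repair.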

Re: CentWave error

Reply #20
Interesting.  I will try it.

That being said, the write.mzdata output does work with centWave. I have tried it on many files successfully, and centWave never works on the cdf files they were derived from. I have looked at the centWave output and I believe it is working properly on the mzData files converted by xcms::write.mzdata().

I was under the impression that the lockmass fill function that Paul was working on was more to correct for the signal gap to provide better quantitative values for XCMS output, rather than to allow functionality.

Re: CentWave error

Reply #21
Interesting. I don't see a reason why centWave should work on the mzData but not on the cdf, since it should be exactly the same data!

Maybe Paul can jump in here, but as far as I know the lock mass scans cause "gaps" in the chromatographic peaks, so feature detection will not work as expected and will miss many features.

Paul's paper has more details on that:

citEntry(entry="article", 
  author="H. Paul Benton, Elizabeth J. Want and Timothy M. D. Ebbels",
  title="Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data",
  journal="BIOINFORMATICS",
  year="2010",
  volume="26",
  pages="2488",
  textVersion = paste("H. Paul Benton, Elizabeth J. Want and Timothy M. D. Ebbels",
  "Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data",
  "Bioinformatics, 26:2488 (2010)"))

Re: CentWave error

Reply #22
Ralf,

You mentioned earlier in this thread a 'corrupt file' phenomenon with the cdf format. That recommendation is what I was following in attempting to convert to mzData. If the data is somehow corrupted in cdf form, it becomes uncorrupted in mzData form. Further, I tried (once) to use the write.cdf() function, and the rewritten cdf file also failed to work with centWave.

Re: CentWave error

Reply #23
I see. That actually is the only modification/fix that write.mzdata will make to the data. I added this a while ago to make it possible to fix "corrupt" files (here: unsorted scans).
Code:
    if (is.unsorted(peaks[,"mz"])) { ## fix "bad" scans
        peaks <- peaks[order(peaks[,"mz"]),]
    }   

It will simply order the peaks in scans if unsorted. But it will not fix the gaps caused by the lock mass scans. That is done "on the fly" during processing using Paul's module.
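
For anyone who wants to see what that fix does, here is the same check applied to a tiny made-up scan. The matrix values are invented; only the is.unsorted()/order() logic is taken from the snippet above, with drop = FALSE added so the result stays a matrix.

```r
## One centroided scan with m/z values out of order (made-up numbers)
peaks <- cbind(mz = c(300.2, 100.1, 200.3),
               intensity = c(30, 10, 20))

if (is.unsorted(peaks[, "mz"])) {
  ## reorder whole rows so each intensity stays with its m/z
  peaks <- peaks[order(peaks[, "mz"]), , drop = FALSE]
}
```

After this, the m/z column is ascending and each intensity is still paired with its original m/z value.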

Re: CentWave error

Reply #24
FYI:

I figured out a way to modify my R script and run it in batch from the Windows command prompt. It is far from elegant, but it appears functional and requires little person-time (still lots of computer time, c'est la vie). To do so, open the command prompt and change the working directory to the R directory
(the R folder should contain the R executable file Rscript.exe); in my case:

cd C:\Program Files\R\R-2.15.1\bin\x64


Next: Create an R script (I called it convert1.R):

## start from a clean workspace, then load xcms and its helpers
rm(list=ls(all=TRUE))
library(xcms)
library(snow)
library(ncdf)
library(caTools)
library(XML)

setwd("C:/cdf_mzdata")
print(getwd())
## seq.csv lists the sample names in its first column
seq<-read.csv(file="seq.csv", header=TRUE, row.names=1)
dim(seq)
dataset<-list.files(getwd(), pattern="CDF", recursive = FALSE)
length(dataset)
## keep only the CDF files named in seq.csv, in that order
dataset<-as.vector(sapply(row.names(seq), grep, x = dataset, value = TRUE))
length(dataset)

## next file to convert: number of mzData files already written, plus one
i<-length(list.files(getwd(), pattern="mzData", recursive = FALSE))+1
xr <- xcmsRaw(dataset[i], profstep=0)
name<- dataset[i]
name<-substr(name, 1, nchar(name)-4)  ## drop the ".CDF" extension
write.mzdata(xr, file=paste(name, ".mzData", sep=""))
rm(xr)
gc()
q()

Save this script in the same R folder.
One more step.  Create a 'batch' file called 'batch.bat' which contains
this:
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R


with the same line repeated for as many files as you have in your directory.  The R script counts the number of converted .mzData files and adds one to that number, so it advances one file each time. 

Then at the command prompt, type batch.bat and hit enter.

I am running it now and it seems to have bypassed the memory leak issue, as R is closed after each conversion.
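
A possible alternative to repeating the line in batch.bat by hand is a small driver script that works out how many files are left and starts one fresh Rscript process per file. This is only a sketch: n_pending and run_conversions are hypothetical names, it assumes Rscript is on the PATH, that convert1.R is the script above, and that every CDF file in the folder is also listed in seq.csv.

```r
## Hypothetical helper: how many CDF files still lack an mzData file?
n_pending <- function(dir = ".") {
  n_cdf    <- length(list.files(dir, pattern = "CDF$"))
  n_mzdata <- length(list.files(dir, pattern = "mzData$"))
  max(n_cdf - n_mzdata, 0)
}

## Start a fresh R process per file so memory is released after
## each conversion, as with the batch file.
run_conversions <- function(dir = ".", script = "convert1.R") {
  for (k in seq_len(n_pending(dir))) {
    status <- system2("Rscript", script)
    if (status != 0) stop("conversion ", k, " failed")
  }
}

## Usage (not run here): run_conversions("C:/cdf_mzdata")
```

Stopping on a non-zero exit status avoids looping over the same failing file, since the count of mzData files would not advance in that case.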

Re: CentWave error

Reply #25
Hi,

I followed Ralf's idea to go back to plain cat() for writing
the mzData out. I haven't done any performance checks,
but at least the data looks almost as before, we just lost
proper indenting. Pushed as 1.35.1

It passes the unit tests, but I would be glad for some feedback
about the generated mzData. I also added polarity for the
MSn scans, which was lacking before. I just assumed it is
the same polarity as the parent scan, which I consider a safe guess ;-)

And if somebody can post some impressive speed improvement factors
before/after, that would be really cool.

Yours,
Steffen
--
IPB Halle                          Mass spectrometry & Bioinformatics
Dr. Steffen Neumann         http://www.IPB-Halle.DE
Weinberg 3 06120 Halle     Tel. +49 (0) 345 5582 - 1470
sneumann(at)IPB-Halle.DE

Re: CentWave error

Reply #26
Here you go for the impressive speed improvements.
I wrote the same file 20 times.

Old way:  used 1.39GB of memory and took 9.83min
New way: used 0.03GB of memory and took 3.27min


So 3 times as fast and memory leak gone :) wonderful!
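
If someone wants to repeat this kind of comparison from within R, a minimal sketch could look like the following. The helper time_and_memory is a hypothetical name, and the workload in the usage line is a stand-in (writing random numbers to temporary files), not write.mzdata. Note that gc() only sees R's own heap, so memory allocated by a C library outside R will not show up here and would have to be checked with the operating system's tools.

```r
## Hypothetical helper: time an expression and report the peak
## memory seen by R's garbage collector while it ran.
time_and_memory <- function(expr) {
  gc(reset = TRUE)                            # reset the "max used" counters
  elapsed <- system.time(expr)[["elapsed"]]   # expr is evaluated here
  mem <- gc()
  ## assumption: the last column of gc() is "max used" in Mb
  c(seconds = elapsed, max_mb = sum(mem[, ncol(mem)]))
}

## Stand-in workload: write 20 small files
res <- time_and_memory(
  for (k in 1:20) writeLines(as.character(runif(1e4)), tempfile())
)
```

Running it once with the old writer and once with the new one on the same file gives numbers comparable in spirit to the ones above.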
Blog: stanstrup.github.io

Re: CentWave error

Reply #27
Thanks Steffen,

I will give it a try soon!

Corey

Re: CentWave error

Reply #28
alright,
what am I doing wrong here: 

I went here to download the version:
http://www.bioconductor.org/packages/de ... /xcms.html

installed from the download folder in R.

Installation went fine, but

> library(xcms)
Loading required package: mzR
Loading required package: Rcpp
Error in eval(expr, envir, enclos) :
  could not find function ".getNamespace"
In addition: Warning message:
package ‘xcms’ was built under R version 2.16.0
Error : unable to load R code in package ‘xcms’
Error: package/namespace load failed for ‘xcms’


If I download via:
source("http://bioconductor.org/biocLite.R")
biocLite("xcms")

I get an older version that does not generate this error. 
I even uninstalled R and reinstalled R 2.15.1.

Re: CentWave error

Reply #29
It almost looks like the problem is related to Rcpp.
Do you have the latest version of Rcpp installed?
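
One way to gather the relevant details is to list the installed versions and the R version each package was built under, and compare them with the running R. The warning above mentions a build under R 2.16.0, so that comparison may be worth a look, though this sketch does not prove it is the cause. The helper built_under is a hypothetical name; packageDescription(), packageVersion() and getRversion() are standard R functions.

```r
## Hypothetical helper: which R version was an installed package built under?
built_under <- function(pkg) {
  built <- packageDescription(pkg)$Built   # starts with "R <version>; ..."
  numeric_version(sub("^R ([0-9.]+);.*$", "\\1", built))
}

running <- getRversion()
for (pkg in c("Rcpp", "mzR", "xcms")) {
  if (!nzchar(system.file(package = pkg))) next   # not installed, skip
  cat(pkg, as.character(packageVersion(pkg)),
      "built under R", as.character(built_under(pkg)),
      if (built_under(pkg) > running) "(newer than the running R)",
      "\n")
}
```

The check uses system.file() rather than loading the packages, so it also works when library(xcms) itself fails.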