Metabolomics Society Forum

Software => R => XCMS => Topic started by: cbroeckl on July 25, 2011, 12:48:33 PM

Title: CentWave error
Post by: cbroeckl on July 25, 2011, 12:48:33 PM
Hello users,

I am playing with XCMS v1.27.4. I am having an issue with a dataset that I started working with. If I use matchedFilter peak detection, the process goes smoothly. If I use the centWave method:

xset3<- xcmsSet(dataset, method = "centWave", ppm=25, peakwidth=c(2,40), snthresh=2, integrate=2, mzdiff=0.1, fitgauss=FALSE, verbose.columns=TRUE)

I receive this error message:

"
11-07-18_test_005501:
 Detecting mass traces at 25 ppm ...
 % finished: 0 10 20 30 40 50 60 Error in .local(object, ...) :
  m/z sort assumption violated ! (scan 670, p 129, current 170.1777 (I=5.06), last 170.1777)
In addition: Warning messages:
1: closing unused connection 6 (<-PMFLAB-04.cvmbs.ColoState.EDU:10187)
2: closing unused connection 5 (<-PMFLAB-04.cvmbs.ColoState.EDU:10187)
3: closing unused connection 4 (<-PMFLAB-04.cvmbs.ColoState.EDU:10187)
4: closing unused connection 3 (<-PMFLAB-04.cvmbs.ColoState.EDU:10187)
"
where 11-07-18_test_005501 is the problematic file. When I use the nSlaves option, I end up with this message:

"
Error in checkForRemoteErrors(val) :
  5 nodes produced errors; first error: m/z sort assumption violated ! (scan 670, p 129, current 170.1777 (I=5.06), last 170.1777)
"

and:
> traceback()
4: stop(count, " nodes produced errors; first error: ", firstmsg)
3: checkForRemoteErrors(val)
2: xcmsClusterApply(cl = snowclust, x = argList, fun = findPeaksPar)
1: xcmsSet(dataset, nSlaves = 4, method = "centWave", ppm = 25,
      peakwidth = c(2, 40), snthresh = 2, integrate = 2, mzdiff = 0.1,
      fitgauss = FALSE, verbose.columns = TRUE)


So apparently there is something about a few of these files that centWave does not like but matchedFilter is OK with. These are cdf files produced by MarkerLynx DataBridge from Waters .raw files. I am using R 2.13.1. Has anyone seen this issue before?

Thanks
Title: Re: CentWave error
Post by: Ralf on August 01, 2011, 03:07:56 AM
Yes, this issue with corrupt data files has been seen before.

You can also use OpenMS'
FileInfo -c -in mydata.mzXML
to verify the file integrity.

Try to re-convert and/or report the problem to the vendor.

The only known work-around is to read the files into XCMS using xcmsRaw and write them out with the write.mzdata function; the bad scans will be fixed automatically when the mzData files are written.

PS: Not sure, but OpenMS' FileConverter might also be able to fix the files.
Title: Re: CentWave error
Post by: cbroeckl on August 12, 2011, 09:11:12 AM
So the matchedFilter method is insensitive to this issue while centWave is?
Title: Re: CentWave error
Post by: Ralf on August 12, 2011, 09:38:59 AM
matchedFilter generates the profile matrix from the raw data and detects peaks from the binned data.
It simply doesn't check for this type of problem. However, the effects have not been studied.

centWave's ROI algorithm works directly on the raw data and does a number of checks to ensure raw data integrity.

You can use XCMS or OpenMS to fix these files.
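As an illustration of the kind of integrity check described above, here is a minimal base-R sketch that reports which scans have m/z values out of order. It is not centWave's actual C code. The data layout (one concatenated m/z vector plus 0-based start offsets per scan, similar in spirit to xcmsRaw's scanindex) and the helper name `find_unsorted_scans` are assumptions for illustration.

```r
# Hypothetical helper: return the indices of scans whose m/z values are
# not in ascending order. Assumed layout: `mz` is the concatenated m/z
# vector of all scans, `scanindex` the 0-based start offset of each scan.
# With strictly = TRUE, repeated m/z values are also reported.
find_unsorted_scans <- function(mz, scanindex, strictly = FALSE) {
  starts <- scanindex + 1L                # 1-based first point of each scan
  ends <- c(scanindex[-1], length(mz))    # 1-based last point of each scan
  bad <- integer(0)
  for (s in seq_along(starts)) {
    if (ends[s] < starts[s]) next         # empty scan: nothing to check
    scan_mz <- mz[starts[s]:ends[s]]
    if (isTRUE(is.unsorted(scan_mz, strictly = strictly))) bad <- c(bad, s)
  }
  bad
}

# Three hypothetical scans; the second one is out of order
mz <- c(100.1, 150.2, 200.3,   120.0, 110.0, 130.0,   90.0, 95.0)
scanindex <- c(0L, 3L, 6L)
find_unsorted_scans(mz, scanindex)  # 2
```

Whether centWave also rejects equal neighbouring m/z values is not stated in this thread, which is why `strictly` is left as an option.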
Title: Re: CentWave error
Post by: cbroeckl on August 12, 2011, 10:18:05 AM
Thanks Ralf,

I was just hoping to avoid having to convert to yet another format. MassLynx DataBridge doesn't convert to mzXML, so I now have another step and another (roughly) doubling of hard drive space for each dataset. I will try it though.
Title: Re: CentWave error
Post by: Ralf on August 12, 2011, 10:58:37 AM
Have you tried using ProteoWizard for the conversion?
It supports the MassLynx .raw file format: https://xcmsonline.scripps.edu/docs/fileformats.html
Title: Re: CentWave error
Post by: cbroeckl on October 03, 2012, 11:13:33 AM
Ralf,

When using xcmsRaw(), does the profstep value influence the mass accuracy of the raw data?  For example:

xr <- xcmsRaw("test.CDF", profstep=0.01)
write.mzdata(xr, file="test.mzData")

Will the native mass values be preserved better or worse using a profstep value of 0.01 vs 0.5? Can I use this if I want to preserve the accurate mass data? Thanks again
Title: Re: CentWave error
Post by: Ralf on October 03, 2012, 02:40:34 PM
It will not affect the raw data (and its mass accuracy) in this example.
The xcmsRaw object always contains the complete, unmodified raw data.

The profile matrix is generated on the fly whenever needed.
Only some functions, like findPeaks.matchedFilter, make use of the so-called profile matrix, where the bin size of this matrix is determined by the profStep value. Only for these functions will the profStep value have an effect on mass accuracy.
findPeaks.centWave for example does not use the profile matrix but reads the raw spectra directly.
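To make the distinction concrete, here is a simplified base-R sketch of binning one scan into a row of a profile matrix. It is not xcms's profile code: the fixed-width bins starting at `mzmin`, keeping the maximum intensity per bin, and the helper name `bin_scan` are assumptions for illustration. The raw `mz` vector is never modified; only the binned view loses m/z resolution as the step grows.

```r
# Simplified illustration (not xcms's own binning): put the raw points of
# one scan into fixed-width m/z bins of size `profstep`, keeping the
# largest intensity in each bin, and report each bin by its centre.
bin_scan <- function(mz, intensity, profstep, mzmin, mzmax) {
  nbins <- ceiling((mzmax - mzmin) / profstep)
  # 1-based bin index for every raw point; the top edge goes into the last bin
  idx <- pmin(floor((mz - mzmin) / profstep) + 1, nbins)
  prof <- numeric(nbins)
  for (k in seq_along(idx)) {
    prof[idx[k]] <- max(prof[idx[k]], intensity[k])
  }
  centers <- mzmin + (seq_len(nbins) - 0.5) * profstep
  list(intensity = prof, mz = centers)
}

# Two hypothetical raw peaks 0.2 m/z apart
mz <- c(200.10, 200.30)
intensity <- c(1000, 800)
coarse <- bin_scan(mz, intensity, profstep = 0.5,   mzmin = 200, mzmax = 202)
fine   <- bin_scan(mz, intensity, profstep = 0.125, mzmin = 200, mzmax = 202)
coarse$mz[coarse$intensity > 0]  # 200.25: both peaks merged into one bin
fine$mz[fine$intensity > 0]      # 200.0625 200.3125
```

Even with the finer step, each peak is reported at a bin centre rather than at its measured m/z, whereas an algorithm reading the raw vectors directly sees 200.10 and 200.30.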
Title: Re: CentWave error
Post by: sneumann on October 03, 2012, 02:42:50 PM
Hi,

write.mzdata() will write the exact values present in the original raw file.

profstep is a parameter for the creation of a re-sampled rectangular
intensity matrix used in some plotting functions and matchedFilter.

Yours,
Steffen
Title: Re: CentWave error
Post by: cbroeckl on October 04, 2012, 10:25:06 AM
Thanks Steffen and Ralf. My concerns are assuaged! As a follow-up, then: does the profstep value have absolutely no influence on the conversion of cdf to mzData? The smaller profstep values generate really big matrices that consume a lot of memory; if there is no need to use a smaller profstep, I will set it relatively high.
corey
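The memory cost of a small profstep can be estimated with simple arithmetic: a dense profile matrix has one row per m/z bin and one column per scan. The sketch below assumes double-precision storage (8 bytes per cell), an m/z range of 50 to 1000 and 3000 scans; these numbers and the helper name `profile_matrix_bytes` are hypothetical, not taken from the data in this thread.

```r
# Rough size of a dense profile matrix, assuming 8-byte doubles:
# nbins = ceiling((mzmax - mzmin) / profstep) rows, one column per scan.
profile_matrix_bytes <- function(mzmin, mzmax, profstep, nscans) {
  nbins <- ceiling((mzmax - mzmin) / profstep)
  nbins * nscans * 8
}

profile_matrix_bytes(50, 1000, 0.5, 3000) / 1e6   # 45.6 (MB)
profile_matrix_bytes(50, 1000, 0.01, 3000) / 1e9  # 2.28 (GB)
```

Going from a 0.5 to a 0.01 step multiplies the number of bins by 50 (1900 to 95000), and the memory with it.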
Title: Re: CentWave error
Post by: Ralf on October 04, 2012, 11:03:59 AM
You can just set it to zero  ;)
Title: Re: CentWave error
Post by: cbroeckl on October 04, 2012, 12:04:04 PM
even better!  Much obliged...
Title: Re: CentWave error
Post by: cbroeckl on October 08, 2012, 10:34:23 AM
I am trying to convert a bunch of files (~1600) from cdf to mzData, and am seeing behavior that reminds me of memory leak issues.  I am running R 2.15.1 64 bit. My code:

##load xcms library
rm(list=ls(all=TRUE))
library(xcms)
library(snow)
library(ncdf)
library(caTools)

seq<-read.csv(file="seq.csv", header=TRUE, row.names=1)
dim(seq)
dataset<-list.files(getwd(), pattern="CDF", recursive = FALSE)
length(dataset)
dataset<-as.vector(sapply(row.names(seq), grep, x = dataset, value = TRUE))
length(dataset)

for (i in seq_along(dataset)) {
  xr <- xcmsRaw(dataset[i], profstep=0)
  ## strip the 4-character ".CDF" extension from the file name
  name <- substr(dataset[i], 1, nchar(dataset[i])-4)
  write.mzdata(xr, file=paste(name, ".mzData", sep=""))
  rm(xr)
  gc()
}


The symptoms: writing mzData files is slow, about 2.5 minutes per file on average. The first 90 or so files went fine, but then a low memory message was triggered, and my resource monitor indicated that I had no RAM left. I am using all the R cleanup steps I know of to manage memory (rm() and gc()), but I still ran out of memory when I don't think I should have. Does anyone see any bugs in the script or have tips for preventing this? It will take me weeks to do this 90 files at a time. Thanks

Corey
Title: Re: CentWave error
Post by: Jan Stanstrup on October 08, 2012, 12:35:35 PM
Yep... There is a memory leak in write.mzdata. You can show this if you just write the same data over and over again. Memory usage will increase. I tried to look at the code some time ago but couldn't find an obvious problem.

I hope someone will find time to fix this issue soon as I too wanted to use xcms as a batch converter.
I am hoping that the writing routine can be made more efficient too. It seems illogical that writing is so many times slower than reading the data; maybe this is my lack of understanding of what needs to be done, though. As I remember from reading the code, it adds one scan at a time to the file. Maybe this is not an efficient way?
Title: Re: CentWave error
Post by: cbroeckl on October 08, 2012, 12:55:09 PM
addendum

I just realized after starting a new batch that I am actually running into huge memory consumption after < 10 files, and then it starts running slow for a long time before running out of memory.
Title: Re: CentWave error
Post by: Ralf on October 08, 2012, 04:59:15 PM
Folks, please start new threads for new topics.

write.mzdata uses the XML package, which is very slow and might be causing the memory leak.
I guess bypassing the XML package and writing plain text directly would be much faster.
Code contributions/optimizations are very welcome!

If you are just looking for a fast way to convert cdf -> mzData, you could also try OpenMS' FileConverter, which is very fast. It might be pickier when reading the cdf files, though ...
Title: Re: CentWave error
Post by: cbroeckl on October 09, 2012, 10:37:09 AM
Sorry Ralf,

I viewed it as a continuation of this thread, or I would have started a new one. 

I downloaded OpenMS to try it, and it doesn't seem to be able to read cdf files. I think I am stuck for the time being. I am trying to push Waters to help with the conversion options, but that is moving slowly. ProteoWizard has a tool, but it doesn't use the lockmass data, so mass accuracy is compromised.
Title: Re: CentWave error
Post by: Ralf on October 09, 2012, 11:11:33 AM
Out of curiosity, why don't you process the netCDF files directly instead of converting to mzXML?
Title: Re: CentWave error
Post by: cbroeckl on October 09, 2012, 11:48:37 AM
I am collecting Waters raw data. I need to get it into a usable format for processing in XCMS. Waters' conversion tool can convert to cdf, but not much else. They do have tools (in proteomics packages) that can convert to mzXML, but they require profile data, and I have centroid data. As far as I can tell, there are no tools in the Waters packages to convert centroid data to anything other than cdf (with respect to XCMS compatibility; ASCII is the only other output format).

I am operating a Q-TOF, which requires a lock-mass correction. Waters stores its raw data without lockmass correction; the correction is applied on the fly, either during conversion to CDF or for viewing in Waters software. I looked into ProteoWizard, which will convert to mzXML but does not perform lock-mass correction, an issue they are working on with Waters. So the mzXML masses are off and the accurate mass data is gone.

I can use cdf format with the matchedFilter algorithm, but the centWave algorithm does not work with cdf data as it is written by DataBridge, Waters' conversion tool. I want to use centWave for this application because 1. it is supposed to be better for accurate mass data, and 2. it can give me some Gaussian fit parameters for each peak, which aid in downstream interpretation.

As an aside, we are collecting data using MSe acquisition, and a nice perk of the DataBridge conversion is that raw data files are split into independent data files for function 1: MS; function 2: MSe; and function 3: lockmass. This means that I can perform peak detection in XCMS on both the MS and MSe functions, a critical aspect of the workflow we are developing for mining indiscriminate MS/MS data. I am trying to talk Waters into offering the option to split their raw files during conversion to mzXML in ProteoWizard as well, but it is slow going; in fact I have no idea if, much less when, that will happen.

I feel like I am fighting data conversion problems at every turn. It is not the fault of the XCMS developers at all; I was just hoping to use the write.mzdata function to get all my CDF files into a format that centWave could read, so I could actually use the peak detection algorithm best suited to the data.
Title: Re: CentWave error
Post by: Ralf on October 09, 2012, 12:30:44 PM
The problem for centWave is not the cdf format of the Waters data - it is the gaps in the data caused by the lock mass scans (and converting to mzXML won't fix these gaps).
Paul just updated his gap correction module
xcmsSet(..., lockMassFreq= TRUE)
that makes use of 1:MS netcdf files and 3:lockmass files to fix these gaps.

I haven't had a chance yet to test the updated module but maybe you can let us know if it works for you.
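One quick way to see the gaps described above is to look at the spacing of scan times in the MS function (for example the scantime slot of an xcmsRaw object): where lock-mass scans were taken, consecutive MS scans are further apart than usual. The base-R sketch below only flags such gaps; it is not Paul's correction module, and the helper name `find_scan_gaps`, the threshold of 1.5 times the median interval and the example scan times are assumptions for illustration.

```r
# Hypothetical helper: flag positions where the interval between
# consecutive scan times is clearly longer than the median interval.
find_scan_gaps <- function(scantime, factor = 1.5) {
  dt <- diff(scantime)
  which(dt > factor * median(dt))  # a gap lies between scan k and scan k + 1
}

# Hypothetical retention times (s) with longer intervals after scans 4 and 8
scantime <- c(0, 1, 2, 3, 5, 6, 7, 8, 10, 11)
find_scan_gaps(scantime)  # 4 8
```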
Title: Re: CentWave error
Post by: cbroeckl on October 09, 2012, 12:44:40 PM
Interesting.  I will try it.

That being said, the output of write.mzdata does work with centWave. I have tried it on many files successfully, and centWave never works on the cdf files they were derived from. I have looked at the centWave output, and I believe it is working properly on the mzData files converted by xcms::write.mzdata().

I was under the impression that the lockmass fill function that Paul was working on was more to correct for the signal gap to provide better quantitative values for XCMS output, rather than to allow functionality.
Title: Re: CentWave error
Post by: Ralf on October 09, 2012, 12:53:53 PM
Interesting. I don't see a reason why centWave should work on the mzData but not on the cdf - since it should be exactly the same data !

Maybe Paul can jump in here, but afaik the lock mass scans cause "gaps" in the chromatographic peaks; therefore feature detection will not work as expected and will miss many features.

Paul's paper has more details on that:

H. Paul Benton, Elizabeth J. Want and Timothy M. D. Ebbels. Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data. Bioinformatics, 26:2488 (2010).
Title: Re: CentWave error
Post by: cbroeckl on October 09, 2012, 01:13:01 PM
Ralf,

You mentioned earlier in this thread a 'corrupt file' phenomenon with the cdf format. I was following that recommendation in attempting to convert to mzData. If a file is somehow corrupted in cdf form, it becomes uncorrupted in mzData form. Further, I tried (once) to use the write.cdf() function, and the rewritten cdf file also failed to work with centWave.
Title: Re: CentWave error
Post by: Ralf on October 09, 2012, 01:28:43 PM
I see. That actually is the only modification/fix that write.mzdata will make to the data; I added this a while ago to have a way to fix "corrupt" files (here: unsorted scans).
    if (is.unsorted(peaks[,"mz"])) { ## fix "bad" scans
        peaks <- peaks[order(peaks[,"mz"]),]
    }   

It will simply order the peaks in scans if unsorted. But it will not fix the gaps caused by the lock mass scans. That is done "on the fly" during processing using Paul's module.
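The snippet above works on the peak matrix of one scan. A minimal base-R sketch of the same fix applied to a whole run is shown below. It assumes a concatenated layout (one m/z vector, one intensity vector, 0-based scan start offsets) chosen for illustration, and `sort_scans` is a hypothetical helper, not an xcms function. Intensities are reordered with the same permutation, so each value stays paired with its m/z.

```r
# Hypothetical helper: sort the points of every unsorted scan by m/z,
# applying the same reordering to the intensities.
sort_scans <- function(mz, intensity, scanindex) {
  starts <- scanindex + 1L                # 1-based first point of each scan
  ends <- c(scanindex[-1], length(mz))    # 1-based last point of each scan
  for (s in seq_along(starts)) {
    if (ends[s] < starts[s]) next         # empty scan: nothing to sort
    rows <- starts[s]:ends[s]
    if (isTRUE(is.unsorted(mz[rows]))) {
      o <- order(mz[rows])
      mz[rows] <- mz[rows][o]
      intensity[rows] <- intensity[rows][o]
    }
  }
  list(mz = mz, intensity = intensity)
}

# Two hypothetical scans; the second is out of order
mz <- c(100, 150, 200, 120, 110, 130)
intensity <- c(1, 2, 3, 4, 5, 6)
res <- sort_scans(mz, intensity, scanindex = c(0L, 3L))
res$mz         # 100 150 200 110 120 130
res$intensity  # 1 2 3 5 4 6
```

As noted above, this ordering does nothing about the lock-mass gaps; it only removes the sort violation.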
Title: Re: CentWave error
Post by: cbroeckl on October 09, 2012, 04:40:46 PM
FYI:

I figured out a way to modify my R script and run it from the Windows command prompt in batch. It is far from elegant, but it appears functional and requires little person-time (still lots of computer time, C'est la vie). To do so, open the command prompt and change the working directory to the R directory
(the folder that contains the R executable Rscript.exe); in my case:

cd "C:\Program Files\R\R-2.15.1\bin\x64"


Next, create an R script (I called it convert1.R):

##load xcms library
rm(list=ls(all=TRUE))
library(xcms)
library(snow)
library(ncdf)
library(caTools)
library(XML)

setwd("C:/cdf_mzdata")
print(getwd())
seq<-read.csv(file="seq.csv", header=TRUE, row.names=1)
dim(seq)
dataset<-list.files(getwd(), pattern="CDF", recursive = FALSE)
length(dataset)
dataset<-as.vector(sapply(row.names(seq), grep, x = dataset, value = TRUE))
length(dataset)

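## index of the next file: number of .mzData files already written, plus one
i<-length(list.files(getwd(), pattern="mzData", recursive = FALSE))+1
if (i > length(dataset)) q(save = "no")  ## every file is already converted
xr <- xcmsRaw(dataset[i], profstep=0)
name <- substr(dataset[i], 1, nchar(dataset[i])-4)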
write.mzdata(xr, file=paste(name, ".mzData", sep=""))
rm(xr)
gc()
q()

Save this script in the same R folder.
One more step.  Create a 'batch' file called 'batch.bat' which contains
this:
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R


with the same line repeated for as many files as you have in your directory.  The R script counts the number of converted .mzData files and adds one to that number, so it advances one file each time. 

Then at the command prompt, type batch.bat and hit enter.

I am running it now and it seems to have bypassed the memory leak issue, as R is closed after each conversion.
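Counting .mzData files only points at the right CDF file if nothing else in the folder matches "mzData" and the files are converted strictly in order. A possible alternative, sketched below as an assumption rather than a tested recipe, is to pick the first CDF file that has no matching .mzData file yet; `next_unconverted` is a hypothetical helper built only on base R and the tools package that ships with R.

```r
# Hypothetical helper: return the first CDF file without a matching
# .mzData file (compared by file name without extension), or NA when
# every file has been converted.
next_unconverted <- function(cdf_files, mzdata_files) {
  done <- tools::file_path_sans_ext(basename(mzdata_files))
  todo <- cdf_files[!tools::file_path_sans_ext(basename(cdf_files)) %in% done]
  if (length(todo) == 0) NA_character_ else todo[1]
}

next_unconverted(c("a.CDF", "b.CDF", "c.CDF"), c("a.mzData", "c.mzData"))
# "b.CDF"

# In a convert1.R-style script this could replace the counting line, e.g.
# f <- next_unconverted(dataset, list.files(getwd(), pattern = "mzData"))
# if (is.na(f)) q(save = "no")
```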
Title: Re: CentWave error
Post by: sneumann on October 11, 2012, 03:32:04 PM
Hi,

I followed Ralf's idea to go back to plain cat() for writing
the mzData out. I haven't done any performance checks,
but at least the data looks almost as before, we just lost
proper indenting. Pushed as 1.35.1

It passes the unit tests, but I would be glad for some feedback
about the generated mzData. I also added polarity for the
MSn scans, which was lacking before. I just assumed it is
the same polarity as the parent scan, which I consider a safe guess ;-)

And if somebody can post some impressive speed improvement factors
before/after, that would be really cool.

Yours,
Steffen
Title: Re: CentWave error
Post by: Jan Stanstrup on October 12, 2012, 08:24:00 AM
Here you go with the impressive speed improvements.
I wrote the same file 20 times.

Old way:  used 1.39GB of memory and took 9.83min
New way: used 0.03GB of memory and took 3.27min


So 3 times as fast and memory leak gone :) wonderful!
Title: Re: CentWave error
Post by: cbroeckl on October 12, 2012, 09:49:18 AM
Thanks Steffen,

I will give it a try soon!

Corey
Title: Re: CentWave error
Post by: cbroeckl on October 12, 2012, 01:45:28 PM
alright,
what am I doing wrong here: 

I went here to download the version:
http://www.bioconductor.org/packages/devel/bioc/html/xcms.html

installed from the download folder in R.

Installation went fine, but

> library(xcms)
Loading required package: mzR
Loading required package: Rcpp
Error in eval(expr, envir, enclos) :
  could not find function ".getNamespace"
In addition: Warning message:
package ‘xcms’ was built under R version 2.16.0
Error : unable to load R code in package ‘xcms’
Error: package/namespace load failed for ‘xcms’


If I download via:
source("http://bioconductor.org/biocLite.R")
biocLite("xcms")

I get an older version that does not generate this error. 
I even uninstalled R and reinstalled R 2.15.1.
Title: Re: CentWave error
Post by: Ralf on October 12, 2012, 01:51:12 PM
It almost looks like the problem is related to Rcpp.
Do you have the latest version of Rcpp installed ?
Title: Re: CentWave error
Post by: cbroeckl on October 12, 2012, 01:55:57 PM
That did it,
Thanks Ralf

Not quite. I actually must have been installing the latest stable version. I am still getting the same error when I install the devel version:

Loading required package: mzR
Loading required package: Rcpp
Error in eval(expr, envir, enclos) :
  could not find function ".getNamespace"
In addition: Warning message:
package ‘xcms’ was built under R version 2.16.0
Error : unable to load R code in package ‘xcms’
Error: package/namespace load failed for ‘xcms’

I am on 64 bit windows, FWIW.
Title: Re: CentWave error
Post by: sneumann on October 13, 2012, 12:42:52 PM
Hi,

can you load the individual dependencies,
esp. library(mzR) ?

Yours,
Steffen
Title: Re: CentWave error
Post by: cbroeckl on October 16, 2012, 01:35:20 PM
> library(Rcpp)
> library(mzR)
> library(xcms)
Error in eval(expr, envir, enclos) :
  could not find function ".getNamespace"
In addition: Warning message:
package ‘xcms’ was built under R version 2.16.0
Error : unable to load R code in package ‘xcms’
Error: package/namespace load failed for ‘xcms’
>

Same issue.  This is a new R session.  Doesn't sound like Jan had this problem, so it is either user error or platform dependent.
Title: Re: CentWave error
Post by: cbroeckl on October 17, 2012, 09:11:23 AM
any chance I need R 2.16.0?  I am running the latest stable version 2.15.1.
Title: Re: CentWave error
Post by: cbroeckl on October 18, 2012, 04:50:00 PM
Found this online:

"Beginning on October 2, 2012, with the release of Bioconductor 2.11, the way to use the development (devel) version of Bioconductor (2.12) is to install R-devel (R-2.16). Packages can then be installed normally; for example, this will install the devel version of IRanges and its dependencies:"

So I went and downloaded R-devel, which is necessary for devel Bioconductor, and apparently one or the other is necessary for devel xcms, because now I seem to have the right version installed:

trying URL 'http://bioconductor.org/packages/2.12/bioc/bin/windows/contrib/2.16/xcms_1.35.1.zip'
Content type 'application/zip' length 1871951 bytes (1.8 Mb)
opened URL
downloaded 1.8 Mb

package ‘xcms’ successfully unpacked and MD5 sums checked

> library(xcms)
Loading required package: mzR
Loading required package: Rcpp
>
Title: Re: CentWave error
Post by: cbroeckl on October 18, 2012, 05:01:50 PM
and the new 'conversion' tool is working great.  Everything has to be devel versions! Thanks again,
Corey
Title: Re: CentWave error
Post by: sneumann on October 19, 2012, 05:00:28 AM
Hi,

glad you have it solved, sometimes the version and devel stuff
can be frustrating. Hope it helps someone else when googling
their problem.

Yours,
Steffen
Title: Re: CentWave error
Post by: cbroeckl on October 19, 2012, 09:01:22 AM
I had two files that generate this error with the new write.mzdata function.
The two files are two functions (MS and MSe) derived from the same waters raw file, that was converted to two independent cdf files:

Error in if (is.unsorted(peaks[, "mz"])) { :
  missing value where TRUE/FALSE needed
Title: Re: CentWave error
Post by: Ralf on October 19, 2012, 12:04:52 PM
That means that some scans are either empty or contain only NAs for some reason?
Can you send me one of those files ?
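The error message points at NA values: in base R, `is.unsorted()` returns NA when its input contains missing values and `na.rm = FALSE` (the default), and `if (NA)` fails with exactly "missing value where TRUE/FALSE needed". A scan with zero peaks would not trigger it, since `is.unsorted()` of a zero-length vector is FALSE. The sketch below reproduces this and shows one possible defensive variant that drops NA m/z rows with a warning before sorting. `fix_scan` is hypothetical and is not what xcms does; whether dropping such points is acceptable is a judgment call.

```r
# A scan whose peak matrix contains an NA m/z value (hypothetical data)
peaks <- cbind(mz = c(101.2, NA, 99.8), intensity = c(10, 20, 30))
is.unsorted(peaks[, "mz"])  # NA, so if (is.unsorted(...)) stops with an error

# Hypothetical defensive variant of the write.mzdata snippet:
# drop rows with NA m/z (with a warning), then sort by m/z.
fix_scan <- function(peaks) {
  ok <- !is.na(peaks[, "mz"])
  if (!all(ok)) {
    warning(sum(!ok), " peak(s) with NA m/z dropped")
  }
  peaks <- peaks[ok, , drop = FALSE]  # keep a matrix even if one row is left
  if (is.unsorted(peaks[, "mz"])) {
    peaks <- peaks[order(peaks[, "mz"]), , drop = FALSE]
  }
  peaks
}

fixed <- suppressWarnings(fix_scan(peaks))
fixed[, "mz"]  # 99.8 101.2
```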