Topic: CentWave error

CentWave error

Hello users,

I am working with XCMS v1.27.4 and having an issue with a new dataset. If I use matchedFilter peak detection, the process goes smoothly. If I use the centWave method:

xset3 <- xcmsSet(dataset, method="centWave", ppm=25, peakwidth=c(2,40), snthresh=2, integrate=2, mzdiff=0.1, fitgauss=FALSE, verbose.columns=TRUE)

I receive this error message:

"
11-07-18_test_005501:
 Detecting mass traces at 25 ppm ...
 % finished: 0 10 20 30 40 50 60 Error in .local(object, ...) :
  m/z sort assumption violated ! (scan 670, p 129, current 170.1777 (I=5.06), last 170.1777)
In addition: Warning messages:
1: closing unused connection 6 (<-PMFLAB-04.cvmbs.ColoState.EDU:10187)
2: closing unused connection 5 (<-PMFLAB-04.cvmbs.ColoState.EDU:10187)
3: closing unused connection 4 (<-PMFLAB-04.cvmbs.ColoState.EDU:10187)
4: closing unused connection 3 (<-PMFLAB-04.cvmbs.ColoState.EDU:10187)
"
where 11-07-18_test_005501 is the problem file. When I use the nSlaves option, I end up with this message:

"
Error in checkForRemoteErrors(val) :
  5 nodes produced errors; first error: m/z sort assumption violated ! (scan 670, p 129, current 170.1777 (I=5.06), last 170.1777)
"

and this traceback:
> traceback()
4: stop(count, " nodes produced errors; first error: ", firstmsg)
3: checkForRemoteErrors(val)
2: xcmsClusterApply(cl = snowclust, x = argList, fun = findPeaksPar)
1: xcmsSet(dataset, nSlaves = 4, method = "centWave", ppm = 25,
      peakwidth = c(2, 40), snthresh = 2, integrate = 2, mzdiff = 0.1,
      fitgauss = FALSE, verbose.columns = TRUE)


So apparently there is something about a few of these files that centWave does not like but matchedFilter is OK with. These are CDF files produced by MarkerLynx DataBridge from Waters raw files. I am using R 2.13.1. Has anyone seen this issue before?

Thanks

Re: CentWave error

Reply #1
Yes, this issue with corrupt data files has been seen before.

You can use OpenMS' FileInfo to verify the file integrity:

FileInfo -c -in mydata.mzXML

Try to re-convert and/or report the problem to the vendor.

The only known work-around is to read the files into XCMS with xcmsRaw and write them back out with the write.mzdata function; the bad scans are fixed automatically when the mzData files are written.
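For example, a minimal round-trip might look like this (a sketch using the problem file from above; profstep=0 skips building the profile matrix, which is not needed for the conversion):

library(xcms)
## read the raw data; the profile matrix is not needed for conversion
xr <- xcmsRaw("11-07-18_test_005501.CDF", profstep=0)
## the bad scans are fixed automatically when the mzData file is written
write.mzdata(xr, file="11-07-18_test_005501.mzData")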

PS: Not sure, but OpenMS' FileConverter might also be able to fix the files.
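An untested sketch of what that might look like (check whether your OpenMS version reads the input format):

FileConverter -in mydata.mzXML -out mydata.fixed.mzML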

Re: CentWave error

Reply #2
So the matchedFilter method is insensitive to this issue while centWave is not?

Re: CentWave error

Reply #3
matchedFilter generates the profile matrix from the raw data and detects peaks in the binned data.
It simply doesn't check for this type of problem; however, the effects of such corrupt scans on its results have not been studied.

centWave's ROI algorithm works directly on the raw data and does a number of checks to ensure raw data integrity.

You can use XCMS or OpenMS to fix these files.

Re: CentWave error

Reply #4
Thanks Ralf,

I was just hoping to avoid converting to yet another format. MassLynx DataBridge doesn't convert to mzXML, so I now have another step and another (roughly) doubling of hard drive space for each dataset. I will try it, though.


Re: CentWave error

Reply #6
Ralf,

When using xcmsRaw(), does the profstep value influence the mass accuracy of the raw data?  For example:

xr <- xcmsRaw("test.CDF", profstep=0.01)
write.mzdata(xr, file="test.mzData")

Will the native mass values be preserved better or worse using a profstep value of 0.01 vs 0.5? Can I use this approach if I want to preserve the accurate mass data? Thanks again.

Re: CentWave error

Reply #7
It will not affect the raw data (and its mass accuracy) in this example.
The xcmsRaw object always contains the complete, unmodified raw data.

The profile matrix is generated on the fly whenever needed.
Only some functions, like findPeaks.matchedFilter, make use of the so-called profile matrix, whose bin size is determined by the profStep value. Only for these functions does the profStep value have an effect on mass accuracy.
findPeaks.centWave for example does not use the profile matrix but reads the raw spectra directly.
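A quick way to convince yourself (a sketch along the lines of the example above; the raw m/z values live in the object's environment):

xr1 <- xcmsRaw("test.CDF", profstep=0.01)
xr2 <- xcmsRaw("test.CDF", profstep=0.5)
## the raw m/z vector is stored unmodified, independent of profstep
identical(xr1@env$mz, xr2@env$mz)   # TRUE; only the profile matrix differs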

Re: CentWave error

Reply #8
Hi,

write.mzdata() will write the exact values present in the original raw file.

profstep is a parameter for the creation of a re-sampled rectangular
intensity matrix used in some plotting functions and matchedFilter.

Yours,
Steffen
--
IPB Halle                          Mass spectrometry & Bioinformatics
Dr. Steffen Neumann         http://www.IPB-Halle.DE
Weinberg 3 06120 Halle     Tel. +49 (0) 345 5582 - 1470
sneumann(at)IPB-Halle.DE

Re: CentWave error

Reply #9
Thanks Steffen and Ralf. My concerns are assuaged! As a follow-up, then: does the profstep value have absolutely no influence on the conversion of CDF to mzData? Smaller profstep values generate really big matrices that consume a lot of memory; if there is no need to use a smaller profstep, I will set it relatively high.
corey

Re: CentWave error

Reply #10
You can just set it to zero  ;)

Re: CentWave error

Reply #11
Even better! Much obliged...

Re: CentWave error

Reply #12
I am trying to convert a bunch of files (~1600) from CDF to mzData and am seeing behavior that reminds me of a memory leak. I am running R 2.15.1, 64-bit. My code:

## clear the workspace and load the required libraries
rm(list=ls(all=TRUE))
library(xcms)
library(snow)
library(ncdf)
library(caTools)

## read the sample sequence and keep only the matching CDF files
seq <- read.csv(file="seq.csv", header=TRUE, row.names=1)
dim(seq)
dataset <- list.files(getwd(), pattern="CDF", recursive=FALSE)
length(dataset)
dataset <- as.vector(sapply(row.names(seq), grep, x=dataset, value=TRUE))
length(dataset)

## convert each CDF file to mzData
for (i in 1:length(dataset)) {
    xr <- xcmsRaw(dataset[i], profstep=0)
    name <- substr(dataset[i], 1, nchar(dataset[i])-4)
    write.mzdata(xr, file=paste(name, ".mzData", sep=""))
    rm(xr)
    gc()
}


The symptoms: writing the mzData files is slow, about 2.5 minutes per file on average. The first 90 or so files went fine, but then a low-memory message was triggered, and my resource monitor indicated I had no RAM left. I am using all the R cleanup steps I know of (rm() and gc()) but still ran out of memory, when I think I should not. Does anyone see any bugs in the script, or have tips for preventing this? It will take me weeks to do this 90 files at a time. Thanks

Corey

Re: CentWave error

Reply #13
Yep... there is a memory leak in write.mzdata. You can demonstrate it by writing the same data over and over: memory usage keeps increasing. I tried to look at the code some time ago but couldn't find an obvious problem.

I hope someone will find time to fix this issue soon, as I too wanted to use xcms as a batch converter.
I am also hoping the writing routine can be made more efficient. It seems illogical that writing is so many times slower than reading the data, though maybe that reflects my lack of understanding of what needs to be done. As I remember from reading the code, it looked like it adds one scan at a time to the file; maybe that is not an efficient way.
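Until the leak is fixed, a possible workaround (a sketch, not tested; convert_one.R is a hypothetical helper that converts a single file) is to run each conversion in a fresh R process, so the leaked memory is reclaimed when the process exits:

## convert_one.R -- hypothetical helper; called as: Rscript convert_one.R file.CDF
library(xcms)
f <- commandArgs(trailingOnly=TRUE)[1]
xr <- xcmsRaw(f, profstep=0)
write.mzdata(xr, file=sub("\\.CDF$", ".mzData", f))

## driver session: one fresh R process per file
for (f in list.files(pattern="CDF")) {
    system2("Rscript", c("convert_one.R", shQuote(f)))
}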
Blog: stanstrup.github.io

Re: CentWave error

Reply #14
Addendum:

I just realized, after starting a new batch, that I am actually running into huge memory consumption after fewer than 10 files; it then runs slowly for a long time before running out of memory.