I am trying to convert a bunch of files (~1600) from cdf to mzData, and am seeing behavior that reminds me of memory leak issues. I am running R 2.15.1 64 bit. My code:
seq <- read.csv(file="seq.csv", header=TRUE, row.names=1)
dim(seq)
dataset <- list.files(getwd(), pattern="CDF", recursive = FALSE)
length(dataset)
dataset <- as.vector(sapply(row.names(seq), grep, x = dataset, value = TRUE))
length(dataset)
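for (i in 1:length(dataset)) {
  # read one file per iteration (index with [i], not the whole vector)
  xr <- xcmsRaw(dataset[i], profstep=0)
  name <- dataset[i]
  # drop the 4-character ".CDF" extension
  name <- substr(name, 1, nchar(name)-4)
  write.mzdata(xr, file=paste(name, ".mzData", sep=""))
  rm(xr)
  gc()
}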
The symptoms: writing mzData files is slow, about 2.5 minutes per file on average. The first 90 or so files went fine, but then a low-memory message was triggered, and my resource monitor showed that I had no RAM left. I am using all the R cleanup steps I know of to manage memory (rm() and gc()) but still ran out of memory, which I don't think should happen. Does anyone see any bugs in the script, or have tips for preventing this? It will take me weeks to do this 90 files at a time. Thanks
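In case it helps anyone hitting the same wall, here is a minimal sketch of a resumable batch loop: it converts only the files whose output does not exist yet, so a restarted R session picks up where the last one stopped. The helper name convert_pending and its arguments are made up for illustration and are not part of xcms; in real use the convert_one argument would presumably wrap the xcmsRaw() and write.mzdata() calls from the loop above. The demonstration uses a stand-in converter that just copies the file.

```r
# Hypothetical helper (not an xcms function): convert only files whose
# output file is still missing, so an interrupted run can be resumed.
# `convert_one` is any function(infile, outfile).
convert_pending <- function(files, convert_one, ext_out = ".mzData") {
  outfiles <- paste0(tools::file_path_sans_ext(files), ext_out)
  todo <- which(!file.exists(outfiles))
  for (i in todo) {
    convert_one(files[i], outfiles[i])
    gc()  # ask R to release unused memory between files
  }
  invisible(outfiles[todo])
}

# Toy demonstration with dummy files and a stand-in converter.
tmp <- tempfile("cdfdemo")
dir.create(tmp)
demo_files <- file.path(tmp, c("a.CDF", "b.CDF", "c.CDF"))
for (f in demo_files) writeLines("dummy", f)
# Pretend "a" was already converted in an earlier session.
file.copy(demo_files[1], file.path(tmp, "a.mzData"))

copy_converter <- function(infile, outfile) file.copy(infile, outfile)
done <- convert_pending(demo_files, copy_converter)
basename(done)  # only the files that still needed converting
```

This does not fix a leak, if there is one. It only makes it cheap to restart R every so often without redoing finished files.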
Thanks Steffen and Ralf. My concerns are assuaged! As a follow-up then, does that profstep value have absolutely no influence on the conversion of cdf to mzData? The smaller profstep values generate really big matrices that consume a lot of memory. If there is no need to use a smaller profstep, I will set it relatively high. Corey
Will the native mass values be preserved better or worse using a profstep value of 0.01 vs 0.5? Can I use this if I want to preserve the accurate mass data? Thanks again
Are there any functions in the XCMS peak picking steps to diagnose peak symmetry? I know that there are functions for fitting a Gaussian to a detected peak as part of the area estimation. Can any peak shape values be returned? Thanks. Corey
I really haven't made any progress. Does anyone have any tips? I just want to average a few spectra from infusion data to increase spectrum quality. Thanks, Corey
I am running some infusions of standard compounds and am trying to use the getSpec() function to collect an averaged spectrum for a range of about 30 seconds during which time my standard is eluting.
test <- xcmsRaw("120703_601.CDF", profstep=0.01)
test
spec <- getSpec(test, rtrange=c(45:90))
spec2 <- spec[order(spec[,2], decreasing=TRUE),]
spec2[1:20,]
As you can see, I am not getting any averaging of the spectra unless the mass matches exactly, though the documentation for getScan suggests it can be used for averaging multiple scans. Am I missing a parameter setting somewhere (a ppm error, for example), or am I trying to do something the function wasn't designed for? Any advice is greatly appreciated!
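To make the kind of averaging I am after concrete, here is a rough base-R sketch that puts every scan on a fixed m/z grid and averages the intensities per grid position, so masses only need to agree to within the bin width rather than exactly. The function average_scans and its bin_width argument are hypothetical names for illustration, not part of xcms, and the masses and intensities are invented toy values.

```r
# Hypothetical helper (not an xcms function): average scans on a fixed
# m/z grid. `scans` is a list of two-column matrices (mz, intensity).
average_scans <- function(scans, bin_width = 0.01) {
  all_points <- do.call(rbind, scans)
  # Integer index of the nearest grid position for every data point.
  bin_index <- round(all_points[, 1] / bin_width)
  summed <- tapply(all_points[, 2], bin_index, sum)
  # Dividing by the number of scans treats a missing peak as intensity 0.
  cbind(mz = as.numeric(names(summed)) * bin_width,
        intensity = as.numeric(summed) / length(scans))
}

# Two toy scans whose masses differ slightly in the fourth decimal.
scan1 <- cbind(mz = c(195.0876, 301.1410), intensity = c(1000, 200))
scan2 <- cbind(mz = c(195.0881, 301.1402), intensity = c(1200, 400))
avg <- average_scans(list(scan1, scan2), bin_width = 0.01)
avg
```

A fixed grid is the crudest option: a peak that sits near a bin edge can be split across two bins, and the reported m/z is the bin position rather than the measured mass.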
Corey
PS: as a secondary but relevant question: is there any way to subtract a background spectrum from a portion of the infusion before the compound elutes?
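What I have in mind for the background subtraction is roughly the following base-R sketch: bin the sample and background spectra onto the same m/z grid, then subtract bin by bin. The function subtract_background is a hypothetical helper, not something from xcms; clipping negative results to zero is my own assumption, and the numbers are invented.

```r
# Hypothetical helper (not an xcms function): subtract a background
# spectrum from a sample spectrum on a shared m/z grid.
# Both inputs are two-column matrices (mz, intensity).
subtract_background <- function(sample, background, bin_width = 0.01) {
  to_bins <- function(spec) {
    sums <- tapply(spec[, 2], round(spec[, 1] / bin_width), sum)
    list(bin = as.numeric(names(sums)), intensity = as.numeric(sums))
  }
  s <- to_bins(sample)
  b <- to_bins(background)
  # Look up the background intensity for each sample bin;
  # bins that are absent from the background count as 0.
  idx <- match(s$bin, b$bin)
  bg <- ifelse(is.na(idx), 0, b$intensity[idx])
  # Assumption: negative differences are clipped to 0.
  cbind(mz = s$bin * bin_width, intensity = pmax(s$intensity - bg, 0))
}

sample_spec <- cbind(mz = c(195.0876, 150.0500, 301.1410),
                     intensity = c(1000, 300, 200))
background_spec <- cbind(mz = c(150.0503, 88.0400),
                         intensity = c(250, 50))
corrected <- subtract_background(sample_spec, background_spec)
corrected
```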
It isn't the number of scans that is variable, it is the length of each scan. On a particular method I am using, a lockmass scan is collected every 20 seconds with a scan time of 0.5 seconds, while analytical scans are 0.2 seconds.
I had the same problem, for what it is worth. And I have been collecting data files where the lockmass scan time does differ from the analytical scan time, so having that feature in there would still be valuable, in my opinion. Thanks for all the effort Paul.
I just tried deleting the lockmass scan data from the raw file, then converting to mzXML using MassWolf. The masses in the MS channel data are the same in the original Waters data file, as viewed with MassLynx, as they are in the converted mzXML data, as viewed with the Insilicos viewer. This implies that the data, as stored in the Waters-format raw data file, are in fact already lockmass corrected, so the converter should not need to do it. Maybe ProteoWizard is trying to use it to adjust the MS data???
massWolf --mzXML R:Project.PRODataFilename.RAW
Also, the conversion seemed to work fine: all the MS data looks like it is there, and it seems that all the MSMS data is there as well. So simply deleting the FUNC0003 DAT, IDX, and STS files might be a simple option for MassWolf conversion. I am not a programmer, but I bet one could easily script that. I have only tried it on a single file, but it seems functional.
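For anyone who wants to try scripting it, something like the following base-R sketch might be a starting point. It works on a copy of the .RAW directory rather than the original and leaves out the files for one function. The helper name strip_function and the file-name pattern are assumptions on my part (I have not checked how the function files are named across instruments or software versions), and the demo builds a dummy directory rather than touching real data.

```r
# Hypothetical helper: copy a .RAW directory to a new location, leaving
# out the DAT/IDX/STS files of one function (here function 3).
# The file-name pattern is an assumption; check it against your own data.
strip_function <- function(raw_dir, out_dir,
                           pattern = "FUNC0*3\\.(DAT|IDX|STS)$") {
  dir.create(out_dir, recursive = TRUE)
  files <- list.files(raw_dir, full.names = TRUE)
  keep <- files[!grepl(pattern, basename(files), ignore.case = TRUE)]
  file.copy(keep, out_dir)
  invisible(basename(keep))
}

# Toy demonstration on a dummy directory with made-up file names.
raw <- file.path(tempfile("rawdemo"), "Filename.RAW")
dir.create(raw, recursive = TRUE)
demo_names <- c("_FUNC001.DAT", "_FUNC002.DAT", "_FUNC003.DAT",
                "_FUNC003.IDX", "_FUNC003.STS", "_HEADER.TXT")
for (n in demo_names) writeLines("dummy", file.path(raw, n))

stripped <- paste0(raw, "_nolock")
kept <- strip_function(raw, stripped)
kept  # files copied into the stripped directory
```

The sketch assumes a flat directory with no subfolders. Any header or index files that still list the removed function are left untouched, which may or may not matter to the converter.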
I actually collect the data as centroid, lockmass corrected, so I don't think that is the issue. It should already be lockmass corrected, I think??? Unless I am misunderstanding the way Waters writes the data. And no, I don't have a fix for the lockmass intermingled with the experimental data, even though the MassWolf readme says "If the last function is MS and not MSMS, then it is assumed to be lockspray;". I am thinking that I just need to delete it from the 'raw' files before conversion, but I haven't tried this yet; such a crude method may cause more problems.
All I want is to be able to bring MSMS data files from Waters into XCMS, and it isn't trivial. Any ideas?!