I am trying to use XCMS to perform peak detection on some authentic standards to get a nice clean spectrum. Here is an example for the compound catechin:
The original peak of interest (the molecular ion in this case) is there:

xset6@peaks[which(xset6@peaks[,"mz"] > 291.086 & xset6@peaks[,"mz"] < 291.087),]
But it is not making it into the peak table, even though I have tried setting the minfrac or minsamp values to zero, or to some trivial non-zero value such as 0.000001. I could work around this by accessing the @peaks slot, but that seems like reinventing the wheel. And it isn't a mass-accuracy grouping artifact: there are no peak groups within several daltons of the 291.0866 peak in the xset6 peak table. Does anyone have any idea how I can get a peakTable that hasn't filtered out the 'rare' features?
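In the meantime, the unfiltered peaks can be pulled straight out of the @peaks slot and matched by an m/z window. A minimal sketch of that workaround (xset6 and the m/z bounds are from my session; adjust to your object and target mass):

```r
## Workaround: query the unfiltered @peaks slot directly instead of peakTable().
## 'xset6' is the xcmsSet object from my session; the m/z window brackets the
## catechin ion at m/z 291.0866.
pk <- xset6@peaks
hits <- pk[pk[, "mz"] > 291.086 & pk[, "mz"] < 291.087, , drop = FALSE]
## one row per detected peak, with the usual mz / rt / into columns
hits
```

drop = FALSE keeps the result a matrix even when only one peak matches, so downstream column indexing doesn't break.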
I had two files that generate this error with the new write.mzXML function. The two files correspond to two functions (MS and MSe) derived from the same Waters raw file, which was converted to two independent cdf files:
Error in if (is.unsorted(peaks[, "mz"])) { : missing value where TRUE/FALSE needed
"Beginning on October 2, 2012, with the release of Bioconductor 2.11, the way to use the development (devel) version of Bioconductor (2.12) is to install R-devel (R-2.16). Packages can then be installed normally; for example, this will install the devel version of IRanges and its dependencies:"
So I went and downloaded devel R, which is necessary for devel Bioconductor, and apparently one or the other is necessary for devel xcms, because now I seem to have the right version installed:
> library(Rcpp)
> library(mzR)
> library(xcms)
Error in eval(expr, envir, enclos) : could not find function ".getNamespace"
In addition: Warning message:
package ‘xcms’ was built under R version 2.16.0
Error : unable to load R code in package ‘xcms’
Error: package/namespace load failed for ‘xcms’
>
Same issue. This is a new R session. It doesn't sound like Jan had this problem, so it is either user error or platform-dependent.
Not quite. I actually must have been installing the latest stable version. I am still getting the same error when I install the devel version:
Loading required package: mzR
Loading required package: Rcpp
Error in eval(expr, envir, enclos) : could not find function ".getNamespace"
In addition: Warning message:
package ‘xcms’ was built under R version 2.16.0
Error : unable to load R code in package ‘xcms’
Error: package/namespace load failed for ‘xcms’
> library(xcms)
Loading required package: mzR
Loading required package: Rcpp
Error in eval(expr, envir, enclos) : could not find function ".getNamespace"
In addition: Warning message:
package ‘xcms’ was built under R version 2.16.0
Error : unable to load R code in package ‘xcms’
Error: package/namespace load failed for ‘xcms’
If I download via:

source("http://bioconductor.org/biocLite.R")
biocLite("xcms")
I get an older version that does not generate this error. I even uninstalled R and reinstalled R 2.15.1.
I figured out a way to modify my R script and run it from the Windows command prompt in batch. It is far from elegant, but it appears functional and requires little person-time (still lots of computer time; c'est la vie). To do so, open the command prompt and change the working directory to the R directory (the R folder should contain the executable Rscript.exe); in my case:
cd C:\Program Files\R\R-2.15.1\bin\x64
Next, create an R script (I called it convert1.R):
Save this script in the same R folder. One more step: create a 'batch' file called 'batch.bat' which contains this:

Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
Rscript convert1.R
with the same line repeated for as many files as you have in your directory. The R script counts the number of converted .mzData files and adds one to that number, so it advances one file each time.
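For reference, a minimal sketch of what convert1.R can look like, based on the description above. The directory path is a placeholder, and the counting logic assumes the converted .mzData files land in the same directory as the cdf files:

```r
## convert1.R -- convert the "next" cdf file in the directory to mzData.
## Run once per invocation (a fresh R session each time) so memory is released.
library(xcms)

dir  <- "C:/data/cdf"   # placeholder: your data directory
cdfs <- list.files(dir, pattern = "\\.cdf$", full.names = TRUE,
                   ignore.case = TRUE)
done <- list.files(dir, pattern = "\\.mzData$", ignore.case = TRUE)

## count converted files and add one, so each run advances one file
i <- length(done) + 1
if (i <= length(cdfs)) {
  raw <- xcmsRaw(cdfs[i], profstep = 0)
  out <- sub("\\.cdf$", ".mzData", cdfs[i], ignore.case = TRUE)
  write.mzdata(raw, out)
}
```

Because each Rscript invocation starts a fresh R process that exits after one conversion, whatever memory the conversion holds is returned to the OS between files.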
Then at the command prompt, type batch.bat and hit enter.
I am running it now and it seems to have bypassed the memory leak issue, as R is closed after each conversion.
You mentioned earlier in this thread a 'corrupt file' phenomenon with the cdf format. That recommendation was what I was following in attempting to convert to mzData: if it is somehow corrupted in cdf form, it becomes uncorrupted in mzData form. Further, I tried (once) to use the write.cdf() function, and the rewritten cdf file also failed to work with centWave.
That being said, the write.mzdata() output is functional with centWave; I have tried it on many files successfully. And centWave never works on the cdf files the mzData was derived from. I have looked at the centWave output and I believe it is working properly on the mzData files converted by xcms::write.mzdata().
I was under the impression that the lockmass-fill function Paul was working on was more to correct for the signal gap, to provide better quantitative values in the XCMS output, rather than to enable functionality.
I am collecting Waters raw data and need to get it into a usable format for processing in XCMS. The Waters conversion tool can convert to cdf, but not much else. They do have tools (in proteomics packages) that can convert to mzXML, but those require profile data, and I have centroid data. As far as I can tell, there are no tools in the Waters packages to convert centroid data to anything other than cdf (with respect to XCMS compatibility; ASCII is the only other output format).
I am operating a Q-TOF, which requires a lock-mass correction. As Waters stores its raw data, it is not lockmass corrected; the correction is applied on the fly, either during conversion to cdf or for viewing in Waters software. I looked into ProteoWizard, which will convert to mzXML but does not perform the lock-mass correction (an issue they are working on with Waters). So the mzXML masses are off, and the accurate-mass information is gone.
I can use the cdf format with the matchedFilter algorithm, but the centWave algorithm does not work with cdf data as written by DataBridge, the Waters conversion tool. I want to use centWave for this application because (1) it is supposed to be better for accurate-mass data, and (2) it can give me Gaussian fit parameters for each peak, which aid downstream interpretation.
As an aside, we collect data using MSe acquisition, and a nice perk of the DataBridge conversion is that raw data files are split into independent data files for function 1 (MS), function 2 (MSe), and function 3 (lockmass). This means I can perform peak detection in XCMS on both the MS and MSe functions, a critical aspect of the workflow we are developing for mining indiscriminant MSMS data. I am trying to talk Waters into offering the option to split their raw files during conversion to mzXML in ProteoWizard as well, but it is slow going; in fact I have no idea if, much less when.
I feel like I am fighting data conversion problems at every turn. It is not the fault of the XCMS developers at all; I was just hoping to use the write.mzdata function to get all my cdf files into a format that centWave could read, so I could actually use the peak detection algorithm best suited to the data.
I viewed it as a continuation of this thread, or I would have started a new one.
I downloaded OpenMS to try it, and it doesn't seem to be able to read cdf files. I think I am stuck for the time being. I am trying to push Waters to help with the conversion options, but that is moving slowly. ProteoWizard has a tool, but it doesn't use the lockmass data, so mass accuracy is compromised.
I just realized, after starting a new batch, that I am actually running into huge memory consumption after fewer than 10 files; then it runs slowly for a long time before running out of memory.