I go through the regular XCMS workflow: peak detection (centWave) -> retention time correction (OBI-Warp) -> grouping (nearest) -> peak filling. However, despite the grouping step, the resulting peak table contains many duplicate peaks, i.e. separate features with (nearly) the same m/z and retention time. This causes problems for the differential report as well as for subsequent multivariate analysis (e.g. PCA).
I've had this problem for some time now. On a previous data set, switching from density-based grouping to nearest grouping seemed to reduce the number of duplicate peaks to a handful. However, on the new data set I'm currently working on, nearest grouping also produces hundreds of duplicate peaks.
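Roughly speaking, density-based grouping in xcms (`group(..., method = "density")`) builds a smoothed density profile of peak retention times within overlapping m/z slices (parameters `mzwid` and `bw`) and groups peaks under separate maxima of that profile. The base R sketch below is a simplified illustration of that idea, not the xcms implementation: it uses `stats::density()` on hypothetical retention times of one m/z observed in six samples and counts how many maxima the profile has for a narrow and a wide Gaussian kernel. The retention times, the bandwidths and the 5% noise threshold are all made-up assumptions.

```r
# Simplified illustration (NOT xcms code): the smoothing bandwidth decides
# whether retention times of one m/z feature form one or two density peaks.

# Hypothetical retention times (s) of the same m/z in six samples, with a
# residual shift of about 10 s between two subsets of samples.
rt <- c(100, 100.5, 101.5, 110, 110.5, 111.5)

# Count local maxima of a Gaussian kernel density estimate of 'x'.
# Maxima lower than 5% of the highest point are ignored as noise.
count_modes <- function(x, bw) {
  d <- stats::density(x, bw = bw, kernel = "gaussian", n = 512)
  y <- d$y
  # slope changes from + to - at a local maximum (aligned with y[2..n-1])
  is_max <- diff(sign(diff(y))) == -2
  sum(is_max & y[-c(1, length(y))] > 0.05 * max(y))
}

count_modes(rt, bw = 1)   # narrow kernel (1 s)
count_modes(rt, bw = 10)  # wide kernel (10 s)
```

With these made-up values the 1 s kernel gives two maxima (two groups) and the 10 s kernel gives one, which shows why a residual retention time shift larger than the bandwidth could split one compound into duplicate features.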
I've tried merging the duplicates manually, but for some samples the duplicate peaks have different values, so I don't know how to combine them.
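A manual merge can at least be made reproducible by linking features whose median m/z and retention time lie within a tolerance and collapsing each linked set into one row. The base R sketch below assumes a feature table with columns `mzmed` and `rtmed` plus one numeric intensity column per sample; the tolerances (10 ppm, 5 s) and the rule of keeping the per-sample maximum are hypothetical choices, not an xcms recommendation (summing the duplicates would be an equally defensible rule). Note that this treats the symptom rather than the cause of the split groups.

```r
# Hypothetical sketch: collapse near-duplicate features (single linkage on
# m/z within 'ppm' and retention time within 'rt_tol' seconds).
merge_duplicates <- function(tab, samples, ppm = 10, rt_tol = 5) {
  tab <- tab[order(tab$mzmed), , drop = FALSE]
  n <- nrow(tab)
  if (n < 2) return(tab)
  lab <- seq_len(n)  # component label per feature; initially all distinct
  for (i in seq_len(n - 1)) {
    mz_tol <- tab$mzmed[i] * ppm * 1e-6
    j <- i + 1
    # rows are sorted by m/z, so stop once m/z is out of tolerance
    while (j <= n && tab$mzmed[j] - tab$mzmed[i] <= mz_tol) {
      if (abs(tab$rtmed[j] - tab$rtmed[i]) <= rt_tol) {
        lab[lab == lab[j]] <- lab[i]  # join the two components
      }
      j <- j + 1
    }
  }
  # one output row per component: mean m/z and RT, per-sample maximum
  merged <- lapply(split(seq_len(n), lab), function(idx) {
    row <- data.frame(mzmed = mean(tab$mzmed[idx]),
                      rtmed = mean(tab$rtmed[idx]),
                      n_merged = length(idx))
    for (s in samples) {
      vals <- tab[[s]][idx]
      row[[s]] <- if (all(is.na(vals))) NA_real_ else max(vals, na.rm = TRUE)
    }
    row
  })
  out <- do.call(rbind, merged)
  rownames(out) <- NULL
  out
}

# Toy feature table (made-up numbers): rows 1 and 2 are duplicates.
feat <- data.frame(
  mzmed = c(301.1410, 301.1412, 450.2000, 301.1411),
  rtmed = c(120.0, 121.5, 300.0, 400.0),
  S1 = c(1000, 950, 200, 50),
  S2 = c(NA, 1200, 180, 60)
)
res <- merge_duplicates(feat, samples = c("S1", "S2"))
```

For an xcmsSet, the input could probably be assembled with something like `data.frame(groups(fset), groupval(fset, value = "into"), check.names = FALSE)` and `samples = sampnames(fset)`, but that combination is untested here.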
What causes these identical peaks to not be grouped together? Thank you for your help.
This is my relevant code:
```r
# peak identification ('rawfiles' and 'out.folder' are defined elsewhere in the script)
set <- xcmsSet(files=rawfiles, method="centWave", ppm=30, peakwidth=c(10,60), prefilter=c(0,0), nSlaves=8)
####################################################################################################
# shorten sample names: keep the text between the first "_" and ".mzdata"
sample.names <- sampnames(set)
class.label <- sampclass(set)
for(r in seq_along(sample.names)) {
  start <- gregexpr(pattern="_", sample.names[r], fixed=TRUE)[[1]][1] + 1
  end <- gregexpr(pattern=".mzdata", sample.names[r], fixed=TRUE)[[1]][1] - 1
  sample.names[r] <- substr(sample.names[r], start, end)
}
sampnames(set) <- sample.names
####################################################################################################
# RT correction (deviation plot is written to rt-cor.pdf)
corset <- set
pdf(file.path(out.folder, "rt-cor.pdf"))
corset <- retcor(corset, method="obiwarp", plottype="deviation", response=10, profStep=0.1, distFunc="cor_opt", gapInit=0.3, gapExtend=2.4)
dev.off()
# group corresponding peaks across samples
corset <- group(corset, method="nearest")
# fill missing peak values
fset <- fillPeaks(corset)
```
Session info:
```
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin13.2.0 (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] gplots_2.14.1       xcms_1.41.0         Biobase_2.25.0      BiocGenerics_0.11.5 mzR_1.11.11         Rcpp_0.11.2

loaded via a namespace (and not attached):
[1] KernSmooth_2.23-13 bitops_1.0-6       caTools_1.17.1     codetools_0.2-9    gdata_2.13.3       gtools_3.4.1       tools_3.1.0
```