Why do I have fewer compounds with more samples?

March 28, 2014, 08:43:42 PM

When I use xcms to process 50 samples, I get ~4000 compounds for one particular dataset. When I use xcms to process 150 samples -- including the original 50 samples -- I get ~1500 compounds. What's going on? Why would more samples result in fewer compounds? This is particularly disconcerting for these data because the aligned data with 50 samples include a compound we're interested in and the aligned data with 150 samples do not.

Here's an example of my code:

# Collect all mzData files below the working directory (relative paths)
Samples <- list.files(getwd(), pattern="mzdata.xml", full.names=FALSE, recursive=TRUE)

xs1 <- xcmsSet(Samples[1:50], method = "centWave",  ppm=15, peakwidth=c(4,12), 
               snthresh = 5, mzCenterFun="apex", prefilter=c(5,500),
               integrate = 1, fitgauss= TRUE)

xs2 <- xcmsSet(Samples[51:150], method = "centWave",  ppm=15, peakwidth=c(4,12), 
               snthresh = 5, mzCenterFun="apex", prefilter=c(5,500),
               integrate = 1, fitgauss= TRUE)

xset.grouped <- group(c(xs1, xs2), method="density", bw=4, 
                      minsamp=1, mzwid=0.007, max=500)

xset.RTcor <- retcor(xset.grouped, method="peakgroups", 
                         missing=20, extra=50, smooth="loess", 
                         family="symmetric", plottype="none")

xset.grouped2 <- group(xset.RTcor, method="density", minsamp=1, 
                           mzwid=0.007, bw=2, max=500)

xset.filledpeaks <- fillPeaks(xset.grouped2)

xset.peaks <- peakTable(xset.filledpeaks, filebase="xset peak table")

If I only align xs1, I get more compounds than if I align both xs1 and xs2.

Thanks for any help!

Laura

Re: Why do I have fewer compounds with more samples?

Reply #1 – April 01, 2014, 01:37:31 AM

In your example you are not setting "minfrac" in the group function, so the default, minfrac = 0.5, applies. If you group only xs1, a peak has to be present in 50% of the samples in xs1. If you group xs1 and xs2 together, a peak has to be present in 50% of all samples to survive grouping, which is the case less often.
Also note that minfrac and minsamp are applied per sample class. XCMS tries to assign the classes based on the folder structure.
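
To make the arithmetic concrete, here is a rough base-R sketch of the threshold. It is not the xcms code itself: survives_grouping is a hypothetical helper, the count of 30 samples is made up, and it assumes all samples end up in a single sample class.

```r
# Hypothetical helper (not part of xcms): does a peak group pass a
# minfrac/minsamp style filter within ONE sample class?
survives_grouping <- function(n_found, n_class, minfrac = 0.5, minsamp = 1) {
  # n_found: number of samples in the class where the peak was detected
  # n_class: total number of samples in the class
  n_found >= n_class * minfrac && n_found >= minsamp
}

# Made-up example: a compound detected in 30 samples.
survives_grouping(30, 50)                  # TRUE:  30 >= 25
survives_grouping(30, 150)                 # FALSE: 30 <  75
survives_grouping(30, 150, minfrac = 0.1)  # TRUE:  30 >= 15
```

So the same peaks can pass with 50 samples and be dropped with 150, unless minfrac is lowered in both group() calls or the samples are split into classes.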

Re: Why do I have fewer compounds with more samples?

Reply #2 – April 01, 2014, 09:59:39 AM

Thank you, Jan; that solved the problem. When I set minfrac lower, I got more compounds, including the particular compound I was looking for. Thanks!