Metabolomics Society Forum

Software => XCMS Online => Topic started by: courtneys on October 17, 2018, 02:24:07 PM

Title: Questions about LCMS data preprocessing R vignette
Post by: courtneys on October 17, 2018, 02:24:07 PM

I have two questions on the vignette for the xcms R package, 'LCMS data preprocessing and analysis with xcms.'

In the vignette it says "We set it to 20,80 for the present example data set" when referring to the peakwidth parameter for centWave, but in the actual function call they use 30,80. This makes a big difference in the faahKO data set. Which is recommended?

Why do they use minFraction = 0.8 for PeakDensityParam but minFraction = 0.85 for PeakGroupsParam?

Title: Re: Questions about LCMS data preprocessing R vignette
Post by: sneumann on October 17, 2018, 11:06:28 PM

So,

COURTNEY SCHIFFMAN wrote:
> In the vignette you say "We set it to 20,80 for the present example
> data set" when referring to the peakwidth parameter for centWave, but
> in the actual function you use 30,80.

Ah, that discrepancy is clearly a typo then.

> This makes a big difference, which do you recommend?

That really depends on the chromatography and gradient used.
E.g., on a 20 minute UPLC gradient we went down to c(5,12).

One way to check is to plot what you actually get:

        hist(peaks(xs)[,"rtmax"]-peaks(xs)[,"rtmin"], breaks=100)

This shows the distribution of peak widths found in your data set,
and you can try a few different peakwidth ranges to see
which peak widths are then found. Beware: even if you choose a blatantly
wrong range, e.g. c(30,80) on the above UPLC gradient,
you will still find "something". But the histogram helps to figure out
whether the majority of peak widths fall within your peakwidth range.
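
To put a number on what the histogram shows, one could also compute the share of detected peaks whose width falls inside the chosen range. This is only a minimal sketch in base R, assuming a matrix with rtmin and rtmax columns like the one returned by peaks(xs); the pk matrix and the width_range values below are made up for illustration:

```r
# Hypothetical stand-in for peaks(xs): one row per detected peak,
# retention time bounds in seconds.
pk <- cbind(rtmin = c(100, 205, 310, 402, 515),
            rtmax = c(108, 214, 316, 440, 522))

width_range <- c(5, 12)   # candidate peakwidth setting (assumed)

# Width of every detected peak, as in the hist() call above
widths <- pk[, "rtmax"] - pk[, "rtmin"]
in_range <- widths >= width_range[1] & widths <= width_range[2]

# Fraction of peaks whose width lies within the candidate range
frac_in_range <- mean(in_range)
print(summary(widths))
print(frac_in_range)
```

If that fraction is low, the chosen peakwidth range probably does not match the chromatography.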

> Why do you use minFraction = 0.8 for PeakDensityParam
> but minFraction = 0.85 for PeakGroupsParam?

I was not aware of that difference, but that threshold
does depend on the size(s) of your sample groups, how
homogeneous you'd expect them to be, and how much "noise"
you'd accept after grouping.
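
As a rough illustration of the dependence on group size, the sketch below converts a minFraction value into the smallest number of samples per group in which a peak would have to be present. It assumes the criterion is "number of samples with a peak >= minFraction * group size"; the group sizes are made up:

```r
group_sizes  <- c(small = 3, medium = 6, large = 12)  # hypothetical groups
min_fraction <- c(0.8, 0.85)

# Assumed criterion: count >= minFraction * group size, so the
# smallest passing count is the ceiling of that product.
min_samples <- sapply(min_fraction, function(f) ceiling(f * group_sizes))
colnames(min_samples) <- paste0("minFraction_", min_fraction)
print(min_samples)
```

Under that assumption, with three samples per group both settings require the peak in every sample, while with twelve samples 0.8 tolerates two missing peaks and 0.85 only one.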

On Tue, 2018-10-16 at 19:55 -0700, COURTNEY SCHIFFMAN wrote:
> ...
> Why with the snthresh=10 in "CentWaveParam" are there still
> chromatographic peaks with an sn less than 10 after running
> "findChromPeaks"?

I had to dig the exact answer out of the code:
https://github.com/sneumann/xcms/blob/eb6c61d2f081ea7ac6aeb1aa958f8a52fb70a91d/R/do_findChromPeaks-functions.R#L950

The summary is that the threshold is calculated as
  sdthr <- sdnoise * snthresh

and the SN you see in the peaks table is

  https://github.com/sneumann/xcms/blob/eb6c61d2f081ea7ac6aeb1aa958f8a52fb70a91d/R/do_findChromPeaks-functions.R#L1066
  round((maxint - baseline) / sdnoise), ##  S/N Ratio

So indeed there is some room for confusion.
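
To make the two quantities concrete, here is a small numeric sketch of the formulas quoted above. All numbers are made up, and which intensity gets compared against sdthr is an assumption here (the maximum of the whole region, called roi_max below), so treat it as one possible way the mismatch can arise rather than as the definitive explanation:

```r
snthresh <- 10
sdnoise  <- 50    # hypothetical noise estimate
baseline <- 200   # hypothetical baseline

sdthr <- sdnoise * snthresh   # threshold in intensity units

# Assumption: the threshold test looks at the strongest signal in the
# whole region, which may contain more than one peak.
roi_max <- 900
passes_threshold <- (roi_max - baseline) >= sdthr

# A weaker peak in the same region gets its own reported S/N,
# computed from its own maximum intensity.
maxint <- 600
sn <- round((maxint - baseline) / sdnoise)

print(c(sdthr = sdthr, passes = passes_threshold, sn = sn))
```

In this made-up case the region passes the threshold, yet the value reported in the sn column for the weaker peak is below snthresh.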

Yours,
Steffen