Metabolomics Society Forum

Topic started by: AmSidebottomIU on November 19, 2011, 11:31:54 AM

Title: Threshold for peak identification and bandwidth?
Post by: AmSidebottomIU on November 19, 2011, 11:31:54 AM
Hello all,

I have 2 questions:

1.  Does the XCMS-generated Excel data include all the peaks identified, or just the ones it considers "different" enough based on the statistical tests performed? I've been thinking that XCMS removes the statistically similar peaks, but the more I read, the more I believe I am wrong. So, does XCMS highlight what is different between samples based on the t-test and p-value, but also give you information on what is the same? Or not?

2.  And, when setting the bandwidth parameter in the second group command (xset2<-group(xset2, bw=10)), what does it actually control, and when is it necessary to change it?

Thank you!
Ashley
Title: Re: Threshold for peak identification and bandwidth?
Post by: hpbenton on November 21, 2011, 07:38:04 AM
Ashley,

1) XCMS doesn't remove peaks as such; rather, it doesn't report them at the end. Also, the features that aren't reported are chosen not by statistical testing on the intensities but by the number of samples in which the peak detection algorithm found the peak. This is controlled by the parameter 'minfrac': the minimum fraction of samples, in at least one class, in which a feature must be detected for it to be kept in the analysis. The default is 0.5, i.e. 50% of one class. So, in your typical KO vs WT example, if you have 10 files in each class, you need to see feature A in at least 5 files of either KO or WT. The parameter can be changed in the 'group' method. However, if you use the newer (not necessarily better) grouping algorithm "nearest" ( 'group(method="nearest")' ), there is no minfrac, so all features are reported by default. So yes, if you set minfrac=0, you'll get all the features. If you're looking for something that is the same in all class groups, I would suggest using a test for equal variances.
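The minfrac rule above can be sketched in a few lines of base R. This is a simplified illustration of the logic as described, not XCMS's own code: the helper `keep_feature`, the detection vectors and the KO/WT labels are hypothetical, and the real density grouping in XCMS also has a `minsamp` argument that this sketch ignores.

```r
# Hypothetical helper (not part of XCMS): does one feature survive the minfrac filter?
# detected: logical vector, TRUE if the peak was found in that sample
# classes:  factor giving the class (e.g. KO or WT) of each sample
keep_feature <- function(detected, classes, minfrac = 0.5) {
  # fraction of samples in each class in which the feature was detected
  frac <- tapply(detected, classes, mean)
  # keep the feature if ANY single class reaches the threshold
  any(frac >= minfrac)
}

classes <- factor(rep(c("KO", "WT"), each = 10))

# Feature A: seen in 5 of 10 KO samples and 0 of 10 WT samples
featA <- c(rep(TRUE, 5), rep(FALSE, 5), rep(FALSE, 10))
# Feature B: seen in 4 of 10 KO and 4 of 10 WT samples
featB <- c(rep(TRUE, 4), rep(FALSE, 6), rep(TRUE, 4), rep(FALSE, 6))

keep_feature(featA, classes)               # TRUE: 5/10 in KO meets 50%
keep_feature(featB, classes)               # FALSE: only 4/10 in each class
keep_feature(featB, classes, minfrac = 0)  # TRUE: minfrac = 0 keeps everything
```

Making the decision per class, rather than over all 20 files, is what lets a feature that appears in only one condition survive, which is often exactly the kind of feature you are looking for.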

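2) The bw parameter is essentially the maximum retention time difference by which peaks from the same feature can be separated across samples; strictly, it is the bandwidth (standard deviation) of the Gaussian kernel used to smooth the density of peak retention times, in seconds. So if your retention time deviation plot comes back showing deviations of about 5 sec, then you want to change bw to 5 sec. With grouping we're essentially asking how big or small the box has to be to encapsulate one feature.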
Code:
library(xcms)
gret <- group(gret, bw = 5, mzwid = 0.1)  ## gret is an xcmsSet; bw is the retention time bandwidth (sec), mzwid the m/z slice width (Da)
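The role of bw can also be sketched in base R: pool the retention times of the peaks that fall in one m/z slice, smooth them with a Gaussian kernel whose standard deviation is bw, and treat each local maximum of the smoothed density as one feature. This is a simplified illustration using made-up retention times, not the actual XCMS implementation (which also slices by mzwid and applies minfrac); the helper `group_rt` and its nearest-maximum assignment rule are hypothetical.

```r
# Hypothetical helper (not part of XCMS): group peak retention times (sec)
# from one m/z slice using a Gaussian kernel density with bandwidth bw.
group_rt <- function(rt, bw) {
  d <- density(rt, bw = bw, kernel = "gaussian", n = 1024,
               from = min(rt) - 3 * bw, to = max(rt) + 3 * bw)
  # local maxima of the smoothed density = candidate feature centres
  peaks <- which(diff(sign(diff(d$y))) == -2) + 1
  centres <- d$x[peaks]
  # assign each peak to the nearest density maximum
  vapply(rt, function(t) which.min(abs(centres - t)), integer(1))
}

# Peaks from 6 samples: two true features near 100 s and 130 s,
# each drifting by a few seconds between samples
rt <- c(98, 101, 103, 99, 102, 100,    # feature near 100 s
        128, 131, 133, 129, 132, 130)  # feature near 130 s

group_rt(rt, bw = 5)   # two groups: first six peaks -> 1, last six -> 2
group_rt(rt, bw = 30)  # one group: the kernel is wide enough to merge both
```

With bw close to the observed drift the two clusters stay separate; with a much wider kernel they are smoothed into a single maximum and would be reported as one feature, which is the "box too big" case described above.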

Hope it helps,

Paul