Topic: fillPeaks deleting feature groups?

fillPeaks deleting feature groups?

Hi all,
When using the "nearest" grouping method followed by fillPeaks, I find that my xcmsSet object has fewer groups after fillPeaks than it did just after grouping.  The peaks in these groups are not sorted into new groups - they're just gone from the dataset.  I've investigated this and can find no pattern in the missing feature groups that would explain why fillPeaks deletes them.
Has anyone else noticed a discrepancy in the number of groups before and after using fillPeaks?
Thanks,
matt
note to moderators: this is my third time trying to post this question. Please let me know by PM if there is some reason why this keeps getting moderated out!  Thanks!!

Re: fillPeaks deleting feature groups?

Reply #1
About your post: I don't know what happened but it seems all old accounts were at some point in 2011 set to another user group. So your posts ended up as posts that needed to be approved. I have approved your posts and your account should be fixed too.

As for your problem, it will be a lot easier to figure out if you can post an example. fillPeaks should not remove anything. Why are you using "nearest", btw? Normally you'd use the default, which is "density".
Blog: stanstrup.github.io

Re: fillPeaks deleting feature groups?

Reply #2
Hi Jan,
Thanks for the reply and the note about the post moderation.  I suppose that makes me a long time lurker/first time poster.  :)
In my experience, any example processing using nearest/fillPeaks will lead to the same outcome.  This is why I'm interested in whether anyone else has seen this.  I see it all the time.  I'm happy to peak-pick some files and upload the .RDA file, if that's what you mean by posting an example.
As for why I use nearest - personally, I don't like the concept of density-based grouping, especially when the bin width is such a non-intuitive parameter.  More to the point, I don't like that it is able to (and often does) group peaks together from the same sample, leading to npeaks > number of samples.
I understand that at some point in that process, one of the (potentially) multiple peaks extracted from the sample is selected as the "best" one, but I'm not sure how that selection is controlled.
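To make that concrete, here is a minimal sketch of how one might count the density-based groups that hold more peaks than there are samples. It assumes the faahKO example data (not my own files), and the variable names are just for illustration. As far as I can tell from the xcms documentation, the `method` argument of `groupval()` ("medret" or "maxint") is what decides which peak is kept per sample when a group contains several, but treat that as my reading rather than a definitive answer.

```r
library(xcms)
library(faahKO)  # assumption: example data providing the `faahko` xcmsSet

xs_dens <- group(faahko, method = "density")

# groups() returns one row per feature group; the "npeaks" column is the
# total number of peaks assigned to that group across all samples.
g <- groups(xs_dens)
n_samples <- length(sampnames(xs_dens))

# A group with more peaks than samples must contain at least one sample
# that contributed more than one peak.
n_multi <- sum(g[, "npeaks"] > n_samples)
n_multi

# One value per group and sample; with method = "maxint" the most intense
# peak is reported when a sample has several peaks in a group.
vals <- groupval(xs_dens, method = "maxint", value = "into")
dim(vals)
```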
I much prefer - conceptually - the idea of matching peaks using a scoring system, one per sample.  It is my experience with synthetic datasets that this gives much more accurate results than density-based grouping.
For high res (UPLC) data, the density method really seems to butcher things.
Thanks for taking the time to read and respond - much appreciated.
matt

 

Re: fillPeaks deleting feature groups?

Reply #3
Code:
library(faahKO)
library(xcms)

xs_dens <- group(faahko,method="density")
xs_near <- group(faahko,method="nearest")

xs_filled_dens <- fillPeaks(xs_dens)
xs_filled_near <- fillPeaks(xs_near)

Code:
> xs_dens
An "xcmsSet" object with 12 samples

Time range: 2506.1-4147.7 seconds (41.8-69.1 minutes)
Mass range: 200.1-599.3338 m/z
Peaks: 4776 (about 398 per sample)
Peak Groups: 407
Sample classes: KO, WT

Profile settings: method = bin
                  step = 0.1

Memory usage: 0.709 MB
> xs_filled_dens
An "xcmsSet" object with 12 samples

Time range: 2502.9-4150.8 seconds (41.7-69.2 minutes)
Mass range: 200.1-599.3338 m/z
Peaks: 6121 (about 510 per sample)
Peak Groups: 407
Sample classes: KO, WT

Profile settings: method = bin
                  step = 0.1

Memory usage: 0.831 MB
>
> xs_near
An "xcmsSet" object with 12 samples

Time range: 2506.1-4147.7 seconds (41.8-69.1 minutes)
Mass range: 200.1-599.3338 m/z
Peaks: 4776 (about 398 per sample)
Peak Groups: 1496
Sample classes: KO, WT

Profile settings: method = bin
                  step = 0.1

Memory usage: 0.86 MB
> xs_filled_near
An "xcmsSet" object with 12 samples

Time range: 2501.4-4150.8 seconds (41.7-69.2 minutes)
Mass range: 200.1-599.4 m/z
Peaks: 17952 (about 1496 per sample)
Peak Groups: 1496
Sample classes: KO, WT

Profile settings: method = bin
                  step = 0.1

Memory usage: 2.16 MB


Not sure if we are talking about the same thing, but the number of peak groups seems to stay constant in this example.
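If you want to check this on your own data without reading through the printed summaries, a small sketch like the following compares the group counts directly. It again uses the faahKO data as a stand-in, so swap in your own xcmsSet; the variable names are only illustrative.

```r
library(xcms)
library(faahKO)  # assumption: example data providing the `faahko` xcmsSet

xs_near <- group(faahko, method = "nearest")
xs_filled_near <- fillPeaks(xs_near)

# Number of feature groups before and after filling
n_before <- nrow(groups(xs_near))
n_after <- nrow(groups(xs_filled_near))
c(before = n_before, after = n_after)

# Number of peaks added to the peak table by fillPeaks
nrow(peaks(xs_filled_near)) - nrow(peaks(xs_near))
```

If `before` and `after` differ on your files, posting those two numbers along with the xcms version from `sessionInfo()` would help narrow it down.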