Re: Extracting blank samples in a discovery batch
Reply #3 –
You only really need to group the data if you want to use the stats part of XCMS. As for what is best to do with the groups, I am not that sure myself. Personally, I usually don't group the data and just do the kind of calculations you have done.
Some thoughts to consider:
If you expect your groups to be very different, then it might be easier to do the alignment/grouping sensibly by grouping the data.
Also, if for example you have many groups and you do the calculations you did, you might end up with a very low threshold for inclusion, which could let a lot of noise in.
I would recommend not processing your samples together with your blanks if possible, as the blanks often cause a lot of trouble for the alignment (since they contain so few features).
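To put rough numbers on the point about many groups: a minimal Python sketch, assuming the calculation in question is "smallest group size divided by total number of samples" for ungrouped data. The function name, the group sizes and the `tolerated_misses` argument are made up for illustration and are not part of XCMS.

```python
def ungrouped_minfrac(group_sizes, tolerated_misses=0):
    """Fraction of ALL samples that corresponds to 'found in (nearly)
    every sample of the smallest group' when the data is not grouped.

    Hypothetical helper for illustration only, not an XCMS function.
    """
    total = sum(group_sizes)
    # Allow a few misses in the smallest group, but require at least 1 sample.
    needed = max(min(group_sizes) - tolerated_misses, 1)
    return needed / total


# Two groups of 10 samples: the feature must be in half of all samples.
few_groups = ungrouped_minfrac([10, 10])

# Eight groups of 10 samples: the same rule drops to one eighth of all samples,
# and those samples do not even have to come from the same group.
many_groups = ungrouped_minfrac([10] * 8)

print(few_groups, many_groups)
```

The more groups you have, the smaller the fraction gets, and that is where the noise can get in.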
Yes, but I would put it a bit lower so that it is allowed to miss a few samples in the smallest group too.
But if you do that, a feature would only need to be found in 15% of the samples, and those could be any of the samples, which might be a bit too liberal. minfrac and minsamp should work as an AND, so you could also just set minfrac low and use minsamp to control what happens in the "real groups". That might be easier to do sensibly without grouping the data, if you have similar-sized groups that is. Otherwise, grouping the data starts to make the most sense.
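Here is a small Python sketch of how I understand the AND rule: a feature is kept if at least one sample group satisfies both minfrac and minsamp. This is my reading of the behaviour, not the actual XCMS code; `keep_feature` and all the numbers are hypothetical.

```python
def keep_feature(found_per_group, group_sizes, minfrac, minsamp):
    """Return True if any single group meets BOTH conditions:
    found >= minfrac * group size AND found >= minsamp.

    Hypothetical illustration of the rule, not the XCMS implementation.
    """
    return any(
        found >= minfrac * size and found >= minsamp
        for found, size in zip(found_per_group, group_sizes)
    )


# Ungrouped data: everything is one "group" of 40 samples.
# minfrac is set low and minsamp does the real work.
ungrouped_3 = keep_feature([3], [40], minfrac=0.05, minsamp=5)  # passes minfrac, fails minsamp
ungrouped_5 = keep_feature([5], [40], minfrac=0.05, minsamp=5)  # passes both

# Grouped data: a small group of 6 and a large group of 34.
# Found in 5 of the 6 small-group samples: kept.
grouped_real = keep_feature([5, 0], [6, 34], minfrac=0.8, minsamp=1)
# Also 5 hits, but scattered over both groups: rejected.
grouped_scattered = keep_feature([2, 3], [6, 34], minfrac=0.8, minsamp=1)

print(ungrouped_3, ungrouped_5, grouped_real, grouped_scattered)
```

Note that the ungrouped rule cannot tell the last two cases apart, since both are 5 hits out of 40 samples. That is the situation where grouping lets you be stricter.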
No, I am not suggesting that you process the datasets separately. Just for the grouping step, you should consider whether or not you want to keep features that are only found in a subset of the samples. If you expect such features, you can either group the data and set minfrac and minsamp more strictly or, if you don't group the data, do the considerations/calculations you did.
In my mind it only really matters if you have very different groups so you'd expect some features to be absent from one group. Then you might be able to be a bit more strict (and thus get less noise) within each group than you would otherwise have to be with the considerations you just did.
Otherwise, the calculations you did are how I would think about it. Just put the requirements a bit lower so that a feature does not have to be found in every single sample within a group.