At which steps are the sample classes used?

November 20, 2014, 12:10:33 PM

Hi,
I am processing some data using XCMS and I want to make sure I do not use the sample class information at all when processing the data, because I think this would bias the later analyses. For example, if you move samples between classes, you get different peaks and different intensities; so what if you're blinded to the groups, or you want to compare cases and controls and then later compare males and females? As a result, I am setting minfrac = 0 and then filtering based on npeaks, as was done here: http://metabolomics-forum.com/viewtopic.php?f=8&t=272

I want to make sure that the sample classes are not used elsewhere, though, for example in the fillPeaks step. It doesn't seem like they should be, but I was confused, for instance, because the post at http://metabolomics-forum.com/viewtopic.php?f=25&t=148 states:

Quote

fillPeaks() will add intensities for peaks not observed in a certain sample, but in some/most of the other samples of a sample class.

It seems like this would introduce differences between the two sample classes even if there were none...
Basically, I want to know whether the sample classes are used in any processing step other than the group step, where they enter via minfrac and minsamp.
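To make the group-step behaviour concrete, here is a minimal Python sketch (not XCMS code) of the per-class check as we understand it from group.density: a peak group is kept if at least one sample class has a peak in at least `minfrac` of its samples and in at least `minsamp` of them. The function name `passes_minfrac` and the case/control labels are hypothetical. It shows how moving samples between classes can flip the decision for the very same group.

```python
from collections import Counter
from typing import Sequence, Set


def passes_minfrac(samples: Set[int], classes: Sequence[str],
                   minfrac: float = 0.5, minsamp: int = 1) -> bool:
    """Hypothetical model of the per-class filter in XCMS group.density.

    samples: indices of the samples in which the group has a peak.
    classes: class label of every sample, indexed by sample number.
    The group is kept if ANY class has a peak in at least `minfrac`
    of its samples and in at least `minsamp` of its samples.
    """
    class_sizes = Counter(classes)                 # samples per class
    hits = Counter(classes[s] for s in samples)    # detections per class
    return any(hits[c] >= minfrac * n and hits[c] >= minsamp
               for c, n in class_sizes.items())


detected = {0, 1}  # the group was found in samples 0 and 1 only
labels = ["case", "case", "case", "ctrl", "ctrl", "ctrl"]
scrambled = ["case", "ctrl", "case", "ctrl", "case", "ctrl"]

print(passes_minfrac(detected, labels))     # True: 2 of 3 cases >= 0.5
print(passes_minfrac(detected, scrambled))  # False: 1 of 3 in each class
```

With minfrac = 0 (and the default minsamp = 1), any group detected in at least one sample passes regardless of labels, which is why the two-step approach sidesteps the classes here.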

Thank you so much for your help! I'm a newbie in this area and I've found this forum super-helpful so far!

Cheers,
Maria

Re: At which steps are the sample classes used?

Reply #1 – November 25, 2014, 03:03:09 AM

Hi,

the quote you found on fillPeaks() only refers to the fact
that peaks are filled only for groups that were found in a previous
group() step, and you're correct that the sample class matters there
(unless the processing is done the way you described).
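A hypothetical Python sketch of this point, assuming a simplified peak table (group name mapped to one intensity per sample, with `None` where no peak was detected) and a stand-in callback for re-integrating the raw signal in the group's m/z-RT region. It is not XCMS's fillPeaks() implementation; `fill_missing` and `fake_raw` are made-up names. It only illustrates that the filling itself needs no class labels, so classes can affect the result only through which groups the earlier group() step kept.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical peak table: group name -> one intensity per sample (None = missing).
Matrix = Dict[str, List[Optional[float]]]


def fill_missing(intensities: Matrix,
                 integrate_raw: Callable[[str, int], float]) -> Matrix:
    """Return a copy in which every missing value is replaced by
    integrate_raw(group, sample).

    Only groups already present in `intensities` (i.e. those that
    survived grouping) are touched; class labels are never used.
    """
    filled: Matrix = {}
    for group, row in intensities.items():
        filled[group] = [integrate_raw(group, s) if v is None else v
                         for s, v in enumerate(row)]
    return filled


def fake_raw(group: str, sample: int) -> float:
    """Hypothetical stand-in for integrating raw data in the group's window."""
    return 1.0


# Toy example: two surviving groups across three samples.
table = {"g1": [100.0, None, 120.0], "g2": [None, 50.0, 55.0]}
print(fill_missing(table, fake_raw))
# {'g1': [100.0, 1.0, 120.0], 'g2': [1.0, 50.0, 55.0]}
```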

Of course diffreport() uses sample classes, but you knew that :-)

Yours,
Steffen

Re: At which steps are the sample classes used?

Reply #2 – November 25, 2014, 03:20:14 AM

If I understand correctly, you want to make sure that the data is treated as one large "class". I believe you can use phenoData(xset) to make sure all samples are assigned to the same class.
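A minimal Python sketch of why a single shared class removes the label dependence, reusing the per-class check as we understand it from XCMS group.density. The function `passes_minfrac` and the label "all" are hypothetical, not XCMS API. With one class, minfrac simply becomes a fraction of all samples, so there is nothing left to scramble.

```python
from collections import Counter
from typing import Sequence, Set


def passes_minfrac(samples: Set[int], classes: Sequence[str],
                   minfrac: float = 0.5, minsamp: int = 1) -> bool:
    # Hypothetical model of group.density's per-class check:
    # keep the group if any class reaches both thresholds.
    sizes = Counter(classes)
    hits = Counter(classes[s] for s in samples)
    return any(hits[c] >= minfrac * n and hits[c] >= minsamp
               for c, n in sizes.items())


# Every sample gets the same (hypothetical) label.
one_class = ["all"] * 6

# With a single class, minfrac is a fraction of ALL six samples.
print(passes_minfrac({0, 1}, one_class))     # False: 2/6 < 0.5
print(passes_minfrac({3, 4, 5}, one_class))  # True: 3/6 >= 0.5
```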

Re: At which steps are the sample classes used?

Reply #3 – November 30, 2014, 09:32:34 PM

Thank you both!
Indeed, the sample classes are only used at the minfrac stage (well, apart from diffreport(), of course!). I checked this both by putting all the files in a single folder and by scrambling the samples between two different folders.

The minfrac parameter plays a really big role in the preprocessing, and it seems to me that a lot of users just use the default of 0.5. I wonder if in the future there could be a built-in option to filter directly on npeaks, instead of the two-step approach I used. I just find it weird to obtain different peaks when I scramble the files between the folders... The only case where this might make sense is when the 2 or 3 sample classes correspond to technical replicates, i.e. the same samples run 2 or 3 times.
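A minimal Python sketch of the kind of label-free filter wished for above, assuming each peak group is reduced to the set of sample indices in which a peak was detected. Note that it counts distinct samples, whereas XCMS's own npeaks column counts peaks, which can be larger when one sample contributes several peaks to a group. `filter_by_sample_count` and `min_samples` are hypothetical names, not XCMS functions.

```python
from typing import Dict, List, Set


def filter_by_sample_count(groups: Dict[str, Set[int]],
                           min_samples: int) -> List[str]:
    """Keep groups detected in at least `min_samples` distinct samples.

    Hypothetical helper: sample class labels are never an input, so
    relabelling or reshuffling samples cannot change the result.
    """
    return [name for name, samples in groups.items()
            if len(samples) >= min_samples]


# Toy data: 6 samples (indices 0-5); each group lists the samples with a peak.
groups = {
    "g1": {0, 1, 2, 3, 4, 5},  # present everywhere
    "g2": {0, 1, 2},           # present in half the samples
    "g3": {4},                 # present in a single sample
}

print(filter_by_sample_count(groups, min_samples=3))  # ['g1', 'g2']
```

Because class labels are never passed in, scrambling files between folders cannot change which groups survive; the trade-off is that a single global threshold ignores any real class structure.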

Thanks again!
Cheers,
Maria