
Why can you not compare multiple cases to a single control in XCMS?

Hello,

Something I have noticed when I process my own data in a case-control manner: if I process the data once for each of my two cases (each case together with the control), the two aligned peak tables contain more unique features in total than the single peak table I get when I process both cases with the same control. Each case and the control have an equal number of replicates. I know xcms is meant to be run in a case/control fashion, but does anyone know why I might observe this difference in the number of unique features?

Just to clarify a few things: I have used peak-picking parameters that have been optimized for our instrument and the raw data. For the most part, they are similar. I have also set classes for each experimental case, and set the retention time correction parameters to account for the number of samples that may be missing because a feature is present in only one treatment and not the other.
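For context on the retention time correction parameters mentioned here: as far as I understand the xcms documentation for the peak-groups method, `missing` and `extra` decide which peak groups are used as alignment landmarks, not which features end up in the final peak table. The Python sketch below is a hypothetical re-implementation of that selection rule (the function name and data layout are invented), assuming a group qualifies when it has peaks in at least n_samples - missing distinct samples and no more than n_samples + extra peaks in total.

```python
# Hypothetical sketch of landmark selection for peak-group retention time
# correction. ASSUMPTION: a group qualifies when it has peaks in at least
# n_samples - missing distinct samples and at most n_samples + extra peaks.

def landmark_groups(groups, n_samples, missing=1, extra=1):
    """groups: dict mapping a group name to the sample index of each peak in it
    (a sample can contribute more than one peak to the same group)."""
    selected = []
    for name, sample_ids in groups.items():
        n_distinct = len(set(sample_ids))  # samples with at least one peak
        n_peaks = len(sample_ids)          # total peaks in the group
        if n_distinct >= n_samples - missing and n_peaks <= n_samples + extra:
            selected.append(name)
    return selected

# 6 samples: indices 0-2 are the case, 3-5 the control (invented data).
groups = {
    "in_all": [0, 1, 2, 3, 4, 5],
    "case_only": [0, 1, 2],
    "one_missing": [0, 1, 2, 3, 4],
    "noisy": [0, 0, 1, 1, 2, 3, 4, 5],
}
print(landmark_groups(groups, n_samples=6, missing=1, extra=1))
# ['in_all', 'one_missing']
```

With `missing=1`, a feature found only in the three case samples is not used as a landmark; raising `missing` to 3 would admit it, at the cost of anchoring the alignment on groups absent from half of the samples.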

I would be very grateful if anyone had any knowledge of how these algorithms work so I could improve my ability to use them.
Craig

Reply #1
Hi,

I am not sure it is completely clear what you are comparing. Are you talking about processing with and without dividing the samples in groups? Or two completely separate processing of the two groups?

Very likely it is the grouping step. The settings there (e.g. minfrac and minsamp) are applied per sample group, so it matters whether you process things as one group or not: https://rawgit.com/stanstrup/XCMS-course/master/1.%20XCMS.html#/30
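One way the grouping step can make the feature count depend on which samples are processed together is shown in this toy Python illustration. It is NOT xcms's kernel-density grouping: it assumes a simplified rule where peaks sorted by retention time are merged whenever consecutive peaks are no more than `bw` seconds apart. The peaks and the `bw` value are invented.

```python
# Toy illustration (NOT xcms's kernel-density grouping): peaks are sorted by
# retention time and a new feature starts wherever the gap to the previous
# peak exceeds `bw` seconds. Sample data are invented for illustration.

def group_by_rt(peaks, bw):
    """Return a list of features; each feature is a list of (sample, rt) peaks."""
    features = []
    for sample, rt in sorted(peaks, key=lambda p: p[1]):
        if features and rt - features[-1][-1][1] <= bw:
            features[-1].append((sample, rt))   # close enough: same feature
        else:
            features.append([(sample, rt)])     # gap too large: new feature
    return features

# One compound eluting slightly differently in case A and case B
peaks_A = [("A1", 70.0), ("A2", 70.1)]
peaks_B = [("B1", 70.4), ("B2", 70.5)]
peaks_C = []  # not detected in the control

bw = 0.5
ac = group_by_rt(peaks_A + peaks_C, bw)
bc = group_by_rt(peaks_B + peaks_C, bw)
abc = group_by_rt(peaks_A + peaks_B + peaks_C, bw)
print(len(ac) + len(bc), len(abc))  # 2 1
```

Processed pairwise, the A peaks and the B peaks each form one feature (2 in total, unless a later rt window merges them); processed together, the 0.3 s gap between them falls within `bw` and they collapse into a single feature.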
Blog: stanstrup.github.io

Reply #2
Hi Jan,

Thank you for your comment and the link. In the past, your PowerPoint was very helpful for me to learn more about XCMS.

Let me try to clarify. In my case, I have extracted the metabolome of cells under two different types of stress conditions as well as a control condition. When I processed the data, I set the class of each group of data using the sampclass() function in xcms to indicate which group the data came from. I have made sure that the minsamp parameter is below the total number of samples within each treatment group. I have also adjusted the extra and missing parameters to permit features that appear in only one of the three groups to be retained. Yet I get different peak tables if I process each case against the control separately than if I process both cases together with the control. I do not understand why this is.
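To make the per-class nature of minsamp concrete, here is a Python sketch of a presence filter, assuming it mirrors the documented behaviour of xcms's density grouping: a feature is kept if, in at least one sample class, it is found in at least `minsamp` samples and in at least a fraction `minfrac` of that class. The function name and sample layout are hypothetical.

```python
# Sketch of a per-class presence filter. ASSUMPTION: a feature is kept if, in
# at least one sample class, it is detected in >= minsamp samples AND in
# >= minfrac of the samples of that class.
from collections import Counter

def keep_feature(samples_with_peak, sample_class, minfrac=0.5, minsamp=1):
    """samples_with_peak: sample names in which the feature was detected.
    sample_class: dict mapping every sample name to its class label."""
    class_sizes = Counter(sample_class.values())
    hits = Counter(sample_class[s] for s in set(samples_with_peak))
    return any(
        hits[cls] >= minsamp and hits[cls] / size >= minfrac
        for cls, size in class_sizes.items()
    )

sample_class = {"A1": "A", "A2": "A", "A3": "A",
                "B1": "B", "B2": "B", "B3": "B",
                "C1": "C", "C2": "C", "C3": "C"}

print(keep_feature(["A1", "A2"], sample_class, minfrac=0.5, minsamp=2))        # True
print(keep_feature(["A1", "B1", "C1"], sample_class, minfrac=0.5, minsamp=2))  # False
```

Because the check runs per class, a feature detected only in one treatment survives as long as it is consistent within that class, whereas a feature scattered thinly across all classes can be dropped.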

Thank you
Craig

Reply #3
If I understand correctly you have 3 groups: A, B, C.
1) If you do A and C you get features found in A+C
2) If you do B and C you get features found in B+C
3) If you do A, B and C you get features found in A+B+C.

So why would you not get different peak tables in those three cases?

Reply #4
Kind of. If I take the sum of all features detected from A + C and B + C, accounting for overlap by setting appropriate retention time and ppm windows, I get more features than if I process A + B + C together. I am not taking the intersection of any two or all three groups during processing. Instead, I get more features that are exclusive to A, B, or C when I process samples pairwise than when I process them together.
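The overlap step described here (merging two peak tables by ppm and retention time windows) could look roughly like this Python sketch. Feature values and tolerances are invented, and the matching is a simple "any match within tolerance" rule, not a one-to-one assignment.

```python
# Sketch of merging two peak tables by m/z (ppm) and retention-time windows.
# Features are (mz, rt) tuples; tolerances and values are hypothetical.

def matches(f, g, ppm=10.0, rt_tol=5.0):
    """True if features f and g agree within the ppm and rt tolerances."""
    mz_ok = abs(f[0] - g[0]) / g[0] * 1e6 <= ppm
    rt_ok = abs(f[1] - g[1]) <= rt_tol
    return mz_ok and rt_ok

def union_count(table1, table2, ppm=10.0, rt_tol=5.0):
    """Count features in table1 plus those in table2 with no match in table1."""
    unmatched = [g for g in table2
                 if not any(matches(f, g, ppm, rt_tol) for f in table1)]
    return len(table1) + len(unmatched)

ac_table = [(162.1125, 72.0), (204.0867, 310.0)]
bc_table = [(162.1127, 73.0), (301.1410, 455.0)]
print(union_count(ac_table, bc_table))  # 3
```

The result depends directly on the windows: with a tighter rt window the 162.11 pair would no longer match and the union would grow to 4, which is one way pairwise totals can exceed a joint run.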
Craig

Reply #5
Hmm. It is not obvious to me what is going on then. Do you see the same thing before retcor, i.e. after the first grouping?

Reply #6
Hello,

I have a similar issue. I'm rather new to xcms (and metabolomics in general) and would appreciate some clarification.

I also have more than 1 sample class beside control, to build on the previous posts let them be class A, B, C (control). I organized them in subdirectories to allow for the automated classification, and I am following the standard protocol:

xcmsSet -> retcor -> group -> fillPeaks -> diffreport
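The subdirectory layout mentioned above is what lets xcmsSet assign sample classes automatically. This Python sketch only mimics the idea for a flat, one-level layout (the file paths are invented); xcms's own handling of nested directories may differ.

```python
# Sketch of deriving sample classes from a one-level folder layout, similar in
# spirit to xcmsSet's automatic classification. Paths are hypothetical.
from pathlib import PurePosixPath

def classes_from_paths(paths):
    """Map each file's stem (sample name) to the name of its parent directory."""
    return {PurePosixPath(p).stem: PurePosixPath(p).parent.name for p in paths}

files = ["data/A/A1.mzML", "data/A/A2.mzML", "data/B/B1.mzML", "data/C/C1.mzML"]
print(classes_from_paths(files))
# {'A1': 'A', 'A2': 'A', 'B1': 'B', 'C1': 'C'}
```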

diffreport takes only two class arguments, "class1" and "class2"; it returns a t-test for those two classes and, if more classes are present, also performs an ANOVA.

The xcmsSets obtained by processing A+C, B+C and A+B+C will all differ because of the applied corrections (e.g. I found that a feature M162T72 became M162T73). I guess there is no "correct" one, but if I want to be able to compare all my samples, I should use a consistent set of features, i.e. xsetABC.
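The M162T72 to M162T73 change comes from how feature names are built: "M" plus the rounded m/z, "T" plus the rounded retention time in seconds. This Python sketch assumes the default rounding to 0 decimals for both (the function name is hypothetical) and shows why names alone are unreliable for matching features across separately processed xcmsSets.

```python
# Sketch of xcms-style feature names ("M<mz>T<rt>"). ASSUMPTION: m/z and
# retention time (s) are rounded to 0 decimals by default.

def feature_name(mz, rt, mzdec=0, rtdec=0):
    """Build a name like 'M162T72' from m/z and retention time."""
    return f"M{mz:.{mzdec}f}T{rt:.{rtdec}f}"

# A retention time shift of 0.2 s after a different retcor changes the name.
print(feature_name(162.1125, 72.4))  # M162T72
print(feature_name(162.1125, 72.6))  # M162T73
```

Matching on m/z and retention time within tolerances is more robust than matching on these names.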

My question is: is it correct to just run diffreport(xsetABC, "C", "A") and diffreport(xsetABC, "C", "B"), or is there something else?

In particular, Craig, may I ask what you mean by "process each case against the control separately, rather than if I process both cases together with the control at the same time"? Is it the same as what I mean when I say that I obtain either xsetAC, xsetBC or xsetABC?

Finally, is metaXCMS the way to go? In the published protocol they write "Although XCMS and other metabolomic programs that have been developed are well suited for the analysis of large sample numbers, the programs are limited in that they only compare two different sample groups directly". That sounds rather definitive, as if there were no alternative to pairwise comparison.

Thanks!

Reply #7
My thought too was that it had to be the different retcor. That is why I asked whether the same issue appears after the first grouping.

I definitely don't see any need for metaXCMS if the samples were analyzed together.

As for stats I really urge you not to just use the stats in XCMS. You are missing things that you probably need such as:
* Drift correction
* Correction for multiple testing --> FDR.
* Statistical model that takes into consideration all the factors in your study.
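For the first item in the list above, drift correction, here is a minimal Python sketch for a single feature. It assumes pooled QC samples were injected throughout the run, fits a straight line to the QC intensities against injection order, and rescales every sample by that trend. This linear fit is a simplification chosen for illustration; published QC-based methods usually use LOESS or splines.

```python
# Minimal sketch of QC-based drift correction for ONE feature. ASSUMPTIONS:
# pooled QCs are spread over the run and the fitted trend stays positive.
# A straight line is used for simplicity; LOESS/splines are more common.
from statistics import median

def linear_fit(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def drift_correct(order, intensity, is_qc):
    """Divide each intensity by the QC trend, rescaled to the QC median."""
    qc_x = [o for o, q in zip(order, is_qc) if q]
    qc_y = [v for v, q in zip(intensity, is_qc) if q]
    slope, intercept = linear_fit(qc_x, qc_y)
    target = median(qc_y)
    return [v * target / (slope * o + intercept) for o, v in zip(order, intensity)]

order = [1, 3, 5, 7, 9]                       # injection order
intensity = [100.0, 95.0, 90.0, 85.0, 80.0]   # steadily decreasing signal
is_qc = [True, False, True, False, True]
print(drift_correct(order, intensity, is_qc))
```

In this invented example the signal loss is perfectly linear, so every corrected value ends up at the QC median of 90.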

I don't think there is anything *wrong* with doing C vs A and C vs B but at the very least you'd need FDR correction on the whole set of p-values.
So I'd advise investing some time in doing stats in R using lm or lmer (and/or something multivariate), depending on your study.
Rick Dunn talks about some of these things in the last talk of the Data processing workshop here (unfortunately the last part was cut):
 http://metabolomicssociety.org/site-map/articles/88-videos/262-2017-conference-workshop-videos-public
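To illustrate the point about FDR correction on the whole set of p-values, here is a Python sketch of the Benjamini-Hochberg adjustment applied to the pooled p-values of both comparisons (C vs A and C vs B). The p-values are made up; in R, p.adjust(p, method = "BH") does the same job.

```python
# Sketch of Benjamini-Hochberg FDR adjustment on the POOLED p-values of two
# comparisons. All p-values below are invented for illustration.

def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p_CvsA = [0.001, 0.020, 0.300]
p_CvsB = [0.004, 0.045, 0.800]
pooled = p_CvsA + p_CvsB          # adjust all 6 tests together
print(bh_adjust(pooled))
```

Adjusting each comparison on its own would use m = 3 instead of m = 6 and therefore be less strict than correcting the whole set.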

Reply #8
Hi Jan,

thanks a lot for the heads-up, I will check the points you mentioned!

cheers
anto