retention time correction for individual sample classes

August 31, 2015, 08:27:56 AM

We are working with xcms to analyze plant secondary metabolites from different species of closely related plants using a Waters G2QTOF UPLC-MS.

We have a question about correcting for retention time shifts across many samples with a limited number of shared features. It is possible that the sample classes will share only a single internal standard.

Each species makes up a sample class, with 5 individuals as replicates per class. However, given the total number of samples, data acquisition has been spread over time, and RT shifts on the order of +/- 1.0 min have been observed.

Our understanding of the retcor loess and obiwarp methods is that we will have trouble aligning all sample classes because there are not enough shared features. Similarly, obiwarp chooses a single sample against which to align the rest; with such a diverse set of samples, the chosen sample will not be representative of all the others.

Does retcor() work by aligning samples within a sample class first and then aligning across sample classes?

Is it possible to apply RT correction to each sample class individually and then merge or save these corrected xcmsSet objects for a general analysis of all samples?

Thank you for your help!

Re: retention time correction for individual sample classes

Reply #1 – August 31, 2015, 08:45:20 AM

Hi Dale,

This sounds like a severe batch effect in your chromatography. Can you give a rough RT deviation estimate for each batch?
Can you guess whether it would help to add or subtract some offset for each batch? Then, with some non-trivial R hacking,
it is possible to give the xcmsSet a first round of corrected retention times and use the normal retcor()
to do a second round.
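
A minimal sketch of what such a first-round offset could look like, using plain R objects rather than a real xcmsSet. The helper names `shift_rt_list` and `shift_peaks`, the offsets and all values are hypothetical; the assumption is that retention times live both in the per-sample rt lists and in the rt, rtmin and rtmax columns of the peak matrix, so both would need the same shift.

```r
# Hypothetical helper: add one offset (in seconds) per sample to a list of
# per-sample retention time vectors.
shift_rt_list <- function(rt_list, offsets) {
  stopifnot(length(rt_list) == length(offsets))
  Map(function(rt, off) rt + off, rt_list, offsets)
}

# Hypothetical helper: apply the same offsets to the rt columns of a peak
# matrix that carries a "sample" index column.
shift_peaks <- function(peaks, offsets) {
  cols <- c("rt", "rtmin", "rtmax")
  peaks[, cols] <- peaks[, cols] + offsets[peaks[, "sample"]]
  peaks
}

# Made-up data: sample 2 elutes 60 s late.
raw <- list(c(2501, 2503, 2505), c(2561, 2563, 2565))
peaks <- cbind(rt = c(2503, 2563), rtmin = c(2501, 2561),
               rtmax = c(2505, 2565), sample = c(1, 2))
offsets <- c(0, -60)  # assumed per-sample correction, e.g. from the standard

corrected <- shift_rt_list(raw, offsets)
peaks_shifted <- shift_peaks(peaks, offsets)
```

A constant offset only removes the bulk of the shift; any nonlinear remainder would be left for the regular group/retcor round.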

Yours,
Steffen

Re: retention time correction for individual sample classes

Reply #2 – August 31, 2015, 10:29:49 AM

Yes, this is a very severe batch effect! We have many samples so the overall run time for all samples has been months, which we understand is way less than ideal BUT is necessary for our project. To give a little perspective, if we were to re-run all samples having the machine run 24 hrs a day it would still take a minimum of 30 days...

Based on the internal standard the mean shift is 0.1 mins. However, about 10% of our samples have shifted by 0.8 - 1.4 mins.

1) We could use our RT standard to do a rough initial shift for all peaks. My big fear in doing this is that shifts in chromatography across a gradient tend to be nonlinear.

You are recommending the default retcor(). Is this because, as stated above, no single sample will represent all samples, since each sample class has a different set of metabolites? Does the default retcor() have a minimum number of "well behaved peaks", and will it fail if there are too few overlapping compounds between less similar sample classes?

2) Reading the forum, it seems it is possible to merge and split samples after xcmsSet(), but when merging and splitting, the RT correction information is lost. Why is this? Is there a hack that would allow this information to be stored?

3) Given the consistency of the majority of our samples, we could potentially re-run all samples with an RT shift greater than an acceptable threshold. Is there an RT shift threshold below which xcms effectively ignores small shifts (i.e., with our peak width of c(5,12), up to 0.2 min, wouldn't shifts of less than 0.2 min still fall into the same peak width)?
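
The idea of picking out samples for a re-run can be sketched as follows. The internal standard retention times, the sample names and the 0.2 min threshold are all made up for illustration; only the logic (deviation from the median, then a cutoff) is the point.

```r
# Hypothetical observed retention times (minutes) of the internal standard,
# one value per sample.
is_rt <- c(s01 = 5.02, s02 = 4.98, s03 = 5.10,
           s04 = 6.20, s05 = 4.95, s06 = 4.10)

# Use the median as the reference so a few strongly shifted runs do not
# drag the reference along with them.
reference <- median(is_rt)
shift <- is_rt - reference

# Assumed acceptance threshold in minutes.
threshold <- 0.2
flagged <- names(shift)[abs(shift) > threshold]
```

Here `flagged` holds the names of the samples whose standard moved by more than the threshold; those would be the candidates for a re-run or for a manual offset.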

Re: retention time correction for individual sample classes

Reply #3 – September 01, 2015, 08:32:13 AM

Hi,

Quote from: "dlforrister"

Based on the internal standard the mean shift is 0.1 mins. However, about 10% of our samples have shifted by 0.8 - 1.4 mins.
1) We could use our RT standard to do a rough initial shift for all peaks. My big fear in doing this is that shifts in chromatography across a gradient tend to be nonlinear.
You are recommending the default retcor(). Is this because, as stated above, no single sample will represent all samples, since each sample class has a different set of metabolites? Does the default retcor() have a minimum number of "well behaved peaks", and will it fail if there are too few overlapping compounds between less similar sample classes?

Check out the extra= and especially the missing= parameter of retcor(). You can probably set missing
to something like 5% of your number of samples to handle those "too few overlapping compounds
between less similar sample classes".

I'd hope that the non-linear aspect is caught by the second round of group/retcor.
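
As a rough sketch of this two-round approach: the sample count, the bw values and the object name `xset` are assumptions, and the xcms calls only run if the package is installed and an xcmsSet called `xset` already exists in the session.

```r
# Hypothetical number of samples in the experiment.
n_samples <- 200

# Allow roughly 5% of the samples to be missing from a peak group that is
# used for retention time correction.
n_missing <- round(0.05 * n_samples)

if (requireNamespace("xcms", quietly = TRUE) && exists("xset")) {
  # First round: wide bandwidth so that shifted peaks still group together.
  xset <- xcms::group(xset, bw = 30)
  xset2 <- xcms::retcor(xset, method = "peakgroups",
                        missing = n_missing, extra = n_missing,
                        smooth = "loess")
  # Second round: regroup with a narrower (assumed) bandwidth.
  xset2 <- xcms::group(xset2, bw = 10)
}
```

The bandwidths 30 and 10 are placeholders; suitable values depend on how large the remaining shifts are after each round.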

Quote from: "dlforrister"

2) Reading the forum, it seems it is possible to merge and split samples after xcmsSet(), but when merging and splitting, the RT correction information is lost. Why is this? Is there a hack that would allow this information to be stored?

Yes, splitting is possible, but when I wrote the c() joining function, I had no idea how to handle
the RT correction. Should the corrected times just stay the same? I had no really good answer.
A hack could involve manually working on the faahko@rt lists, which hold the RT
for each raw file before and after the correction:

Code:

> str(faahko@rt)
List of 2
 $ raw      :List of 12
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
 $ corrected:List of 12
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
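
A small sketch of how such a list could be inspected, using a made-up list with the same raw/corrected layout instead of a real xcmsSet. With a real object one would presumably start from faahko@rt; the numbers below are invented.

```r
# Made-up stand-in for the rt slot: two samples, three scans each.
rt <- list(
  raw       = list(c(2501, 2503, 2505), c(2501, 2503, 2505)),
  corrected = list(c(2501, 2503, 2505), c(2495, 2498, 2501))
)

# Per-scan deviation introduced by the correction, one vector per sample.
deviation <- Map(`-`, rt$corrected, rt$raw)

# Largest absolute correction per sample, in seconds.
max_abs_dev <- vapply(deviation, function(d) max(abs(d)), numeric(1))
```

A sample whose `max_abs_dev` is 0 has not been corrected at all, which is a quick way to check whether correction information survived a split or merge.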

Quote from: "dlforrister"

3) Given the consistency of the majority of our samples, we could potentially re-run all samples with an RT shift greater than an acceptable threshold. Is there an RT shift threshold below which xcms effectively ignores small shifts (i.e., with our peak width of c(5,12), up to 0.2 min, wouldn't shifts of less than 0.2 min still fall into the same peak width)?

Not sure what you mean here. peakwidth=c(5,12) refers to peak picking.
Your issue is the group()ing step. There, the important parameter is bw (in seconds)
for the kernel density estimation that is behind the grouping (cf. the 2006 xcms paper
or the xcmsPreprocess vignette).
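
To illustrate why the bandwidth matters, here is a simplified sketch using base R's density(). It is not the xcms grouping code, and the retention times and the helper `count_modes` are made up: one compound is seen at about 101 s in three samples and about 12 s (0.2 min) later in three others.

```r
# Hypothetical helper: count local maxima of a Gaussian kernel density
# estimate of the retention times, for a given bandwidth in seconds.
count_modes <- function(rt, bw) {
  d <- density(rt, bw = bw)
  slope <- sign(diff(d$y))
  slope <- slope[slope != 0]   # ignore exact plateaus
  sum(diff(slope) == -2)       # rising followed by falling = local maximum
}

# Made-up retention times (seconds) of one compound in six samples.
rt <- c(100, 101, 102, 112, 113, 114)

modes_narrow <- count_modes(rt, bw = 2)   # narrow kernel
modes_wide   <- count_modes(rt, bw = 30)  # wide kernel
```

With these values the narrow bandwidth keeps the two clusters apart as separate density maxima, while the wide one merges them into a single maximum, which is roughly what decides whether shifted peaks end up in the same group.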

Maybe try some more experimenting with the group/retcor parameters first
to get an acceptable xcmsSet without having to resort to hacking.
After that, a more directed hacking approach may be able to squeeze
even more out of the data.

You can also check http://metabolomics-forum.com/viewtopic.php?f=26&t=137
and there esp. the lower code snippet to cluster the samples w.r.t. their retention time profiles/deviation.
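
The following is not the snippet from the linked topic, just a minimal sketch of the general idea under stated assumptions: each sample is described by a vector of RT deviations at the same scan positions (all values and sample names are invented), and hierarchical clustering then separates samples with similar deviation profiles.

```r
# Made-up RT deviation profiles (seconds), one row per sample.
dev <- rbind(
  a1 = c(0, 1, 2, 3),
  a2 = c(0, 1, 2, 2),
  b1 = c(40, 45, 50, 60),
  b2 = c(41, 44, 52, 61)
)

# Cluster samples by Euclidean distance between their deviation profiles.
hc <- hclust(dist(dev), method = "average")

# Cut the tree into an assumed number of batches.
batch <- cutree(hc, k = 2)
```

Samples that fall into the same cluster could then be treated as one batch, for example when choosing a common offset before the regular group/retcor round.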

Yours,
Steffen