Metabolomics Society Forum

Software => R => XCMS => Topic started by: andgan on October 23, 2012, 03:31:48 PM

Title: Find same feature in separately processed studies
Post by: andgan on October 23, 2012, 03:31:48 PM
Hi All,

My problem is to find the features common to two studies (each with many samples) that were run on the same machine but processed separately with XCMS.

Specifically, I aligned the two studies separately using the same XCMS parameters, and I now need to find the features common to both studies.
There are two options to do that:

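Fixed approach: I specify the maximum deviation in mass and in retention time allowed between the two studies (I think this is similar to what metaXCMS does).
How it works: a feature from study 1 is matched to any feature in study 2 whose mass lies within the fixed mass tolerance and whose retention time lies within +/- x seconds.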
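A minimal sketch of the fixed approach, written in Python for illustration only (it is not XCMS or metaXCMS code). The function names `parse_feature` and `match_fixed` are hypothetical, the 0.02 Da mass tolerance is the example value used for the dynamic approach in this post, and the 5 second retention time window is an arbitrary assumption standing in for "x". The feature names are the ones from the worked example later in this post.

```python
import re


def parse_feature(name):
    """Split a feature name such as 'M280.093T32.408' into (mass, rt)."""
    m = re.fullmatch(r"M(\d+(?:\.\d+)?)T(\d+(?:\.\d+)?)", name)
    if m is None:
        raise ValueError(f"unrecognised feature name: {name}")
    return float(m.group(1)), float(m.group(2))


def match_fixed(feature, candidates, mass_tol, rt_tol):
    """Return the candidates within fixed mass and retention time windows."""
    mass, rt = parse_feature(feature)
    hits = []
    for cand in candidates:
        c_mass, c_rt = parse_feature(cand)
        # Both tolerances are constants: the same window is used at every rt.
        if abs(c_mass - mass) < mass_tol and abs(c_rt - rt) <= rt_tol:
            hits.append(cand)
    return hits


study2 = [
    "M280.093T31.831", "M280.089T144.406", "M280.111T333.820",
    "M280.108T454.073", "M280.104T578.557", "M280.103T890.490",
    "M280.103T299.882",
]
# rt_tol=5.0 is an assumed value, not one given in the post.
hits = match_fixed("M280.093T32.408", study2, mass_tol=0.02, rt_tol=5.0)
print(hits)
```

The weakness of this scheme is visible in the signature: `rt_tol` is a single number, so it cannot be tight where the chromatography is stable and loose where it drifts.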

However, we know that the retention time correction is itself a function of the retention time; it is not uniform across the whole run.

This is, for example, the median retention time correction for study 1. The dashed lines are the median +/- 1.96 * the median absolute deviation (MAD).


Figure 1 (attachment): rt_correction_vs_rt_corrected_twge.pdf


This is the same graph for study 2.


Figure 2 (attachment): rt_correction_vs_rt_corrected_pivus.pdf
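For readers who want to reproduce curves like these, here is a rough Python sketch of how the median correction and its +/- 1.96 * MAD band could be computed at one retention time. It assumes the per-sample raw and corrected scan times have already been exported from XCMS into plain lists; the function name `correction_band`, the sign convention (corrected minus raw) and the 1.4826 scaling (the default of R's `mad()`) are all assumptions, since the post does not state them.

```python
from statistics import median


def correction_band(raw, corrected, t, z=1.96, mad_scale=1.4826):
    """Median rt correction at retention time t, with a +/- z * MAD band.

    raw, corrected: one list of scan retention times per sample.
    Returns (median, median - z * MAD, median + z * MAD).
    """
    deltas = []
    for raw_s, cor_s in zip(raw, corrected):
        # Use the scan whose corrected retention time is closest to t.
        k = min(range(len(cor_s)), key=lambda j: abs(cor_s[j] - t))
        deltas.append(cor_s[k] - raw_s[k])  # assumed sign: corrected - raw
    med = median(deltas)
    mad = mad_scale * median([abs(d - med) for d in deltas])
    return med, med - z * mad, med + z * mad


# Toy data for three samples (made up purely to exercise the function).
raw = [[30.0, 31.0, 32.0, 33.0]] * 3
corrected = [
    [30.1, 31.1, 32.1, 33.1],
    [30.3, 31.3, 32.3, 33.3],
    [30.2, 31.2, 32.2, 33.2],
]
med, lower, upper = correction_band(raw, corrected, t=32.4)
print(round(med, 3), round(lower, 3), round(upper, 3))
```

Evaluating this on a grid of retention times would give one median curve and two dashed curves per study.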


The second possibility is:

Dynamic approach: I specify the maximum deviation in mass (e.g. +/- 0.02 Da) and a dynamic deviation in retention time.
A dynamic retention time deviation means that, when I search for features with a similar retention time (given that they have a similar mass), I do not use a fixed window; instead, the allowed deviation depends on the retention time itself.


Let's see an example:

I have this feature from study 1: M280.093T32.408

My goal is to determine if there is a similar feature in study 2.

This is the algorithm I used to compare features and find out if they are the same:


1. For the feature i of interest from study 1, with retention time t_i and mass m_i, search for all the features in study 2 with mass m such that |m_i - m| < 0.02. This gives the candidate features x(1)...x(n).

Example: M280.093T31.831 M280.089T144.406 M280.111T333.820 M280.108T454.073 M280.104T578.557 M280.103T890.490 M280.103T299.882

2. Determine the confidence interval of t_i.
To do that, you need the median retention time correction (median(to)_s1) and the variability of the retention time correction (mad(to)_s1) at the retention time closest to t_i (Figure 1). These values can be obtained from XCMS (datasetname1@rt) by looping across all individuals to get the median and the variability.
The upper confidence limit of t_i is then:
U_t_i = t_i + |median(to)_s1 + 1.96*mad(to)_s1|
and the lower confidence limit is:
L_t_i = t_i - |median(to)_s1 - 1.96*mad(to)_s1|

Example: at a retention time of ~32.408, median(to)_s1 + 1.96*mad(to)_s1 = 0.942 and median(to)_s1 - 1.96*mad(to)_s1 = -0.573, so U_t_i = 33.350 and L_t_i = 31.835.


3. Start with feature x(1) and determine the confidence interval of t_x(1) (the retention time of x(1)).
To do that, you need the median retention time correction (median(to)_s2) and the variability of the retention time correction (mad(to)_s2) at the retention time closest to t_x(1) (Figure 2). These values can be obtained from XCMS (datasetname2@rt) by looping across all individuals to get the median and the variability.
The upper confidence limit of t_x(1) is then:
U_t_x(1) = t_x(1) + |median(to)_s2 + 1.96*mad(to)_s2|
and the lower confidence limit is:
L_t_x(1) = t_x(1) - |median(to)_s2 - 1.96*mad(to)_s2|

Example: at a retention time of ~31.831, median(to)_s2 + 1.96*mad(to)_s2 = 1.912 and median(to)_s2 - 1.96*mad(to)_s2 = -1.061, so U_t_x(1) = 33.743 and L_t_x(1) = 30.770.


4. Check if the confidence intervals of t_i and t_x(1) overlap.

Example: [31.835, 33.350] and [30.770, 33.743] overlap: TRUE


5. Repeat steps 3 and 4 for x(2)...x(n).

Example: TRUE FALSE FALSE FALSE FALSE FALSE FALSE

Then I conclude that M280.093T32.408 from study 1 is the same as M280.093T31.831 from study 2.
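Steps 1 to 4 can be sketched in a few lines of Python. This is only an illustration of the interval logic described above, not XCMS code; the function names are hypothetical, and each `band` argument is assumed to hold the two precomputed values (median - 1.96*mad, median + 1.96*mad) for the relevant study and retention time. The numbers are the ones from the worked example; the bands for x(2)...x(n) are not given in the post, so only x(1) is checked.

```python
def rt_interval(t, lower_offset, upper_offset):
    """Interval around t, following the U and L formulas in steps 2 and 3."""
    return t - abs(lower_offset), t + abs(upper_offset)


def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def same_feature(mass_i, t_i, band_i, mass_x, t_x, band_x, mass_tol=0.02):
    """Step 1 (mass filter), then steps 2 to 4 (retention time intervals)."""
    if abs(mass_i - mass_x) >= mass_tol:
        return False
    ci_i = rt_interval(t_i, *band_i)
    ci_x = rt_interval(t_x, *band_x)
    return intervals_overlap(ci_i, ci_x)


# Study 1 feature M280.093T32.408 against study 2 feature M280.093T31.831.
band_s1 = (-0.573, 0.942)   # values from the step 2 example
band_s2 = (-1.061, 1.912)   # values from the step 3 example
result = same_feature(280.093, 32.408, band_s1, 280.093, 31.831, band_s2)
print(result)  # True
```

One thing the sketch makes explicit: because both offsets pass through `abs()`, the interval always contains t, even when the median correction is far from zero.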

What do you think? Does this approach make sense?

Best,

Andrea

[attachment deleted by admin]