Find same feature in separately processed studies

Hi All,

My problem is to find the features common to two studies (each with many samples) that were run on the same machine but processed separately with XCMS.

Specifically, I aligned the two studies separately using the same XCMS parameters, and now I need to find the features common to both.
There are two options to do that:

    1. Use a fixed approach, where I specify the maximum deviation in mass and retention time between the two studies (I think this is similar to what metaXCMS does).
    How it works:

        a. For each mass in study 1, find all the masses in study 2 that are equal within +/- 0.02 Da.
        b. For each common mass, find the retention times in study 2 that are equal within +/- x seconds.
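
The fixed approach can be sketched in a few lines of Python. This is only a minimal illustration, not what metaXCMS actually does: features are assumed to be plain (mass, retention time) pairs, the function name `match_fixed` is made up for this example, and the retention time tolerance of 5 seconds is an arbitrary placeholder for the unspecified x.

```python
def match_fixed(features_1, features_2, mz_tol=0.02, rt_tol=5.0):
    """Return (i, j) index pairs whose mass and retention time both agree.

    features_1, features_2: lists of (mass, retention_time) tuples.
    mz_tol: maximum mass difference in Da (0.02 as in the post).
    rt_tol: maximum retention time difference in seconds
            (placeholder value, the post leaves x unspecified).
    """
    pairs = []
    for i, (mz1, rt1) in enumerate(features_1):
        for j, (mz2, rt2) in enumerate(features_2):
            # Step a: masses equal within the tolerance.
            # Step b: retention times equal within the fixed window.
            if abs(mz1 - mz2) <= mz_tol and abs(rt1 - rt2) <= rt_tol:
                pairs.append((i, j))
    return pairs


# One feature from study 1 and three candidate features from study 2.
study_1 = [(280.093, 32.408)]
study_2 = [(280.093, 31.831), (280.089, 144.406), (280.111, 333.820)]
print(match_fixed(study_1, study_2))  # [(0, 0)]
```

The same `rt_tol` is applied at every retention time, which is exactly the limitation of this approach.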

However, we know that the retention time correction is not uniform across the run: it is a function of the retention time.

For example, this is the median retention time correction for study 1. The dashed lines mark 1.96 * the median absolute deviation (MAD).


[Figure 1: rt_correction_vs_rt_corrected_twge.pdf]


This is the same graph for study 2.


[Figure 2: rt_correction_vs_rt_corrected_pivus.pdf]
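
As a rough sketch of how such curves could be computed once the raw and corrected retention times have been exported from XCMS: the code below assumes that every sample shares the same scan grid, that the correction is defined as corrected minus raw, and that the MAD is unscaled (R's `mad()` multiplies by 1.4826 by default, so the numbers would differ by that factor). The function names are made up for this example.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation."""
    med = median(values)
    return median([abs(v - med) for v in values])


def correction_bands(raw, corrected, z=1.96):
    """Per-scan median correction with a z * MAD band across samples.

    raw, corrected: one list of retention times per sample, all of the
    same length (assumption: identical scan grid in every sample).
    Returns one tuple per scan:
    (median corrected rt, median - z * MAD, median + z * MAD).
    """
    bands = []
    for k in range(len(raw[0])):
        # Correction of scan k in every sample (assumed: corrected - raw).
        offsets = [c[k] - r[k] for r, c in zip(raw, corrected)]
        med = median(offsets)
        spread = z * mad(offsets)
        rt = median([c[k] for c in corrected])
        bands.append((rt, med - spread, med + spread))
    return bands


# Toy data: three samples, two scans each (not real measurements).
raw = [[10.0, 20.0], [10.0, 20.0], [10.0, 20.0]]
corrected = [[10.5, 20.0], [10.0, 21.0], [9.5, 20.5]]
for rt, lower, upper in correction_bands(raw, corrected):
    print(round(rt, 3), round(lower, 3), round(upper, 3))
```

The second and third values of each tuple are the quantities that would be plotted as dashed lines against the corrected retention time.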


The second possibility is:

    2. Use a dynamic approach, where I specify the maximum deviation in mass (e.g. +/- 0.02 Da) and a dynamic deviation in retention time.
    A dynamic retention time means that, when I look for features with similar retention times (given that they have similar masses), I do not use a fixed range; instead, the allowed deviation depends on the retention time itself.


Let's see an example:

I have this feature from study 1: M280.093T32.408

My goal is to determine if there is a similar feature in study 2.

This is the algorithm I used to compare features and find out if they are the same:


1. Given a feature i of interest from study 1 with retention time t_i and mass m_i, search for all the features in study 2 with mass m such that |m_i-m| < 0.02. This yields n features x(1)...x(n).

Example: M280.093T31.831 M280.089T144.406 M280.111T333.820 M280.108T454.073 M280.104T578.557 M280.103T890.490 M280.103T299.882

2. Determine the confidence interval of t_i.
To do that, you need the median retention time correction (median(to)_s1) and the variability of the retention time correction (mad(to)_s1) at the retention time closest to t_i (Figure 1). These values can be obtained from XCMS (datasetname1@rt) by looping across all individuals to get the median and the variability.
Then the upper limit of the confidence interval of t_i is:
U_t_i=t_i+|median(to)_s1+1.96*mad(to)_s1|
and the lower limit is:
L_t_i=t_i-|median(to)_s1-1.96*mad(to)_s1|.

Example: at ~32.408, median(to)_s1+1.96*mad(to)_s1=0.942 and median(to)_s1-1.96*mad(to)_s1=-0.573, so U_t_i=33.350 and L_t_i=31.835


3. Start with feature x(1) and determine the confidence interval of t_x(1) (the retention time of x(1)).
To do that, you need the median retention time correction (median(to)_s2) and the variability of the retention time correction (mad(to)_s2) at the retention time closest to t_x(1) (Figure 2). These values can be obtained from XCMS (datasetname2@rt) by looping across all individuals to get the median and the variability.
Then the upper limit of the confidence interval of t_x(1) is:
U_t_x(1)=t_x(1)+|median(to)_s2+1.96*mad(to)_s2|
and the lower limit is:
L_t_x(1)=t_x(1)-|median(to)_s2-1.96*mad(to)_s2|.

Example: at ~31.831, median(to)_s2+1.96*mad(to)_s2=1.912 and median(to)_s2-1.96*mad(to)_s2=-1.061, so U_t_x(1)=33.743 and L_t_x(1)=30.770


4. Check if the confidence intervals of t_i and t_x(1) overlap.

Example: [31.835, 33.350] overlaps [30.770, 33.743]: TRUE


5. Repeat steps 3 and 4 for x(2)...x(n).

Example: TRUE FALSE FALSE FALSE FALSE FALSE FALSE
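
Steps 1 to 4 can be sketched in Python as follows. This is only an illustration using the numbers from the examples above: the function names are made up, and the band values (median +/- 1.96*MAD) are typed in by hand rather than read from XCMS. Only x(1) is checked in step 4, because the band values for the other candidates are not given here.

```python
def mass_candidates(m_i, features_2, mz_tol=0.02):
    """Step 1: features of study 2 whose mass is within mz_tol of m_i."""
    return [(m, t) for (m, t) in features_2 if abs(m_i - m) < mz_tol]


def rt_interval(rt, band_lower, band_upper):
    """Steps 2 and 3: interval around rt.

    band_lower = median - 1.96 * MAD of the correction near rt,
    band_upper = median + 1.96 * MAD of the correction near rt.
    The absolute values follow the formulas in the post.
    """
    return (rt - abs(band_lower), rt + abs(band_upper))


def intervals_overlap(a, b):
    """Step 4: True if the closed intervals a and b share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Feature of interest from study 1 and the study 2 features listed above.
m_i, t_i = 280.093, 32.408
features_2 = [
    (280.093, 31.831), (280.089, 144.406), (280.111, 333.820),
    (280.108, 454.073), (280.104, 578.557), (280.103, 890.490),
    (280.103, 299.882),
]

candidates = mass_candidates(m_i, features_2)
print(len(candidates))  # 7: every listed feature passes the mass filter

interval_i = rt_interval(t_i, -0.573, 0.942)                # study 1 band
interval_x1 = rt_interval(candidates[0][1], -1.061, 1.912)  # study 2 band
print(intervals_overlap(interval_i, interval_x1))  # True
```

Step 5 would repeat the last two lines for each remaining candidate, looking up the study 2 band at that candidate's retention time.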

Then I conclude that M280.093T32.408 from study 1 is the same as M280.093T31.831 from study 2.

What do you think? Does this approach make sense?

Best,

Andrea

[attachment deleted by admin]