Title: help interpreting result
Post by: osuct on August 17, 2012, 10:41:07 AM

So my data is generated using Waters UPLC/QTOF-MS

I ran the following commands in xcms:

xset <- xcmsSet(method="centWave")
xset <- group(xset)
xset@groups
         mzmed     mzmin     mzmax    rtmed   rtmin   rtmax npeaks samples
 [1,]  90.52499  90.50587  90.53768 566.7325 493.651 571.501    602     434
 [2,]  96.96339  96.96293  96.96393 573.6620 572.318 574.551    432     432
 [3,]  97.99267  97.96968  97.99426 570.5130 568.622 573.041    819     432
 [4,]  98.51168  98.51125  98.52936 569.8830 480.473 571.088    441     440
 [5,] 102.03374 102.03304 102.12967 567.2895 467.341 570.347    544     503
 [6,] 111.98470 111.98376 112.03649 571.1140 535.050 572.944    545     462
 [7,] 113.96607 113.96540 113.97762 571.2150 493.610 575.702    434     433
 [8,] 118.12385 117.93697 118.12470 551.8370 433.981 580.286   1192     432
 [9,] 125.98792 125.97683 125.98878 567.3455 490.244 570.640    436     432
[10,] 128.95311 128.95244 128.95365 571.7455 571.056 572.930    432     432
[11,] 141.96116 141.96064 141.98669 571.6450 478.937 572.517    448     432
[12,] 145.93201 145.93138 145.93242 574.9675 573.559 576.317    432     432
[13,] 149.02481 149.02377 149.04653 490.3790 415.157 546.035    568     558
[14,] 158.96294 158.96252 158.96682 572.2490 517.538 573.042    434     432
[15,] 171.15070 171.10154 171.17518 163.7375 105.204 261.115    722     717
[16,] 186.95823 186.95770 186.95874 571.7560 570.904 572.583    432     432
[17,] 187.12794 186.95827 187.15414 206.0160 205.260 263.677    521     516
[18,] 550.63239 550.55762 550.63547 505.1365 498.520 555.740    462     432
[19,] 551.63567 551.55751 551.63963 504.9905 498.503 520.841    442     432

It looks like the program identified 19 peak groups. That means there are 19 analytes identified across multiple samples?
The first analyte is eluting at 566.7325 retention time (median) which has 602 peaks and it appears in 434 samples?

From another thread, I read that I don't have to do retention time correction for UPLC/QTOF. Can anyone tell me why?

Thanks in advance!

Title: Re: help interpreting result
Post by: Carsten on August 20, 2012, 03:24:16 AM

Quote from: "osuct"

It looks like the program identified 19 peak groups. That means there are 19 analytes identified across multiple samples?
The first analyte is eluting at 566.7325 retention time (median) which has 602 peaks and it appears in 434 samples?

The group function is an alignment function: it matches Peak X from Sample A to its corresponding Peak X in Sample B, and so on,
and puts the corresponding peaks into one feature group.
For the underlying method, please check the xcms paper.

The xset@groups output gives you an overview of all detected features,
which are regions defined by an m/z range and a retention time range.
The "npeaks" column is the total number of peaks that fall into that range over all samples.
The "samples" column is the number of samples in which one or more peaks appear in that range. That is also the reason
why npeaks can be higher than the number of samples.
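
A minimal sketch of how these two columns relate, in base R. The `peaks` table is made up for illustration; only the m/z and rt limits are taken from the first feature in the output above.

```r
# Toy peak list (made-up values): one row per detected peak.
peaks <- data.frame(
  sample = c(1, 1, 2, 3, 3, 4),
  mz     = c(90.520, 90.530, 90.525, 90.510, 90.535, 95.000),
  rt     = c(566.7, 493.7, 566.9, 567.0, 571.5, 300.0)
)

# A feature is an m/z range plus a retention time range
# (limits copied from feature 1 of the posted xset@groups output).
in_feature <- peaks$mz >= 90.50587 & peaks$mz <= 90.53768 &
              peaks$rt >= 493.651  & peaks$rt <= 571.501

npeaks  <- sum(in_feature)                            # all peaks inside the region
samples <- length(unique(peaks$sample[in_feature]))   # distinct samples contributing

cat("npeaks:", npeaks, " samples:", samples, "\n")
```

Samples 1 and 3 each contribute two peaks to the region, so npeaks ends up larger than samples.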

At this point in the analysis I would recommend optimizing your parameters (see ?group.density for a short description),
because the retention time range of your first feature is very large compared to that of the second feature.
The default bw = 30 is meant for an HPLC setup, so for your UPLC a good starting point would be bw = 10.
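
One quick way to spot such features is to compare the retention time width (rtmax - rtmin) of each group. A small sketch in base R, using the first four rows of the output posted above; the 10 second cutoff is an arbitrary assumption for illustration, not an xcms default.

```r
# First four rows of the posted xset@groups output (m/z and rt limits only).
groups <- data.frame(
  mzmed = c(90.52499, 96.96339, 97.99267, 98.51168),
  rtmin = c(493.651, 572.318, 568.622, 480.473),
  rtmax = c(571.501, 574.551, 573.041, 571.088)
)

# Width of the retention time window covered by each feature, in seconds.
groups$rtwidth <- groups$rtmax - groups$rtmin

# Arbitrary cutoff (assumption): flag features spanning more than 10 s.
groups$wide <- groups$rtwidth > 10

print(groups)
```

With a real object, the same columns could be read from xset@groups instead of being typed in by hand.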

You could also set sleep = 5 (5 seconds per feature), which produces a figure for each feature. It gives you an overview of the detected feature and shows, for example,
whether the large difference in rt means that two different peaks occur within a short time in the same m/z slice.

Carsten

Title: Re: help interpreting result
Post by: osuct on August 20, 2012, 02:15:28 PM

Thanks for your reply.

So, what are the figures generated using the group(xset,sleep) command? What is the x-axis?
Below, the first figure was generated using group(xset,bw=30) and the second figure was generated using group(xset,bw=3).
How can I tell "if the huge difference in rt means that on the same mz slide, two different peaks occurs within a short time" ?
(http://img402.imageshack.us/img402/9029/74030876.jpg)
Also, I have 19 aligned features in 1 and 17 aligned features in 2.

Title: Re: help interpreting result
Post by: Carsten on August 21, 2012, 06:18:03 AM

The x-axis is the retention time in seconds, the title shows the m/z value, and N is the number of peaks that fall into that m/z bin.

Quote from: "osuct"

How can I tell "if the huge difference in rt means that on the same mz slide, two different peaks occurs within a short time" ?

The output of groups shows all information about the specific feature.

Quote

   mzmed    mzmin    mzmax    rtmed   rtmin   rtmax npeaks samples
90.52499 90.50587 90.53768 566.7325 493.651 571.501    602     434

So this feature contains 602 peaks from 434 samples, with mz values between 90.50587 and 90.53768 and the peaks occur between 493 and 571 seconds.

If you look at the figures and see two vertical lines of points at 493 seconds and 571 seconds within one smooth Gaussian curve,
then the bw parameter was too high.
The cluster at ~580 shows little deviation, which should be normal for a UPLC,
because the retention times are very stable.

As you can also see in the second plot, the kernel widths are much smaller, which should result in more feature groups.
Normally you should also see coloured, dotted vertical lines, which indicate the identified groups and help you to interpret the results.
So I assume you have no features with the mass of 81.52?

Another parameter you could optimize is mzwid, which is the width of those m/z slices.
The default of 0.25 m/z is very wide for a QTOF.

Title: Re: help interpreting result
Post by: osuct on August 23, 2012, 09:29:20 AM

Hi Carsten,

Thank you very much for taking the time to explain this. It helps a lot.
So, points that line up represent peaks from multiple samples that have the same retention time, correct?
The y-axis is labeled density. What density is this referring to?

Yes. There is no feature detected with the mass of 81.52

I tried to change the mzwid parameter to 0.025 and got a lot more detected features (> 1000) but is there a diagnostic like the group plot to see whether this is too small or too big?

Also, for the following figure, does it mean two features (with two different rt ranges) are detected because of the two dashed lines? Or is only 1 feature detected, with a larger rt range? Why is there only 1 Gaussian curve for these two lines of points?
The result from group array is as follows:

Code:

> xsg@groups[11:18,]
        mzmed    mzmin    mzmax   rtmed   rtmin   rtmax npeaks 01andQC
[1,] 73.00000 72.99943 73.00084 205.306 204.608 206.198    245     245
[2,] 73.07730 73.06574 73.07793  92.926  81.407  94.020    407     405
[3,] 73.53199 73.53149 73.53286 565.248 564.269 565.595    406     406
[4,] 75.05659 75.05601 75.05755 137.237 137.173 137.914    268     268
[5,] 76.95368 76.95315 76.95496 213.623 212.638 223.605    265     239
[6,] 79.02238 79.02130 79.02321  21.143  20.465  21.251    259     259
[7,] 80.95015 80.94924 80.95109  17.889  17.221  19.171    412     412
[8,] 81.52041 81.51986 81.52122 565.281 564.269 565.606    455     455

(http://img805.imageshack.us/img805/1517/mz2o.jpg)

Thanks again!

Title: Re: help interpreting result
Post by: uwe.geppert on August 26, 2012, 08:28:48 AM

I need help in the Interpretation of XCMS-Results, too.
For my analysis it was not possible to use the same amount of plant tissue in each sample.
Does XCMS compare absolute or relative amounts of substances?

Title: Re: help interpreting result
Post by: Carsten on August 30, 2012, 06:19:07 AM

Sorry for the late answer, but I was on holiday the last few days. ;)

Quote from: "osuct"

So, points that line up represent peaks from multiple samples that have the same retention time, correct?

Exactly!

Quote from: "osuct"

The y-axis is labeled density. What density is this referring to?

This density is calculated with a "standard" Gaussian kernel density estimation.
By the way, the bw parameter of the group function is the smoothing bandwidth for this kernel.

Quote from: "osuct"

I tried to change the mzwid parameter to 0.025 and got a lot more detected features (> 1000) but is there a diagnostic like the group plot to see whether this is too small or too big?

Not a general one, as far as I know. If I want to optimize my parameters I normally look at some specific features and their group plots,
and also at how many features have an npeaks number higher than the number of samples, ...
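
The second check can be sketched in base R. The numbers below are the npeaks and sample counts from the eight rows of xsg@groups[11:18,] posted earlier in this thread; nothing else is assumed.

```r
# npeaks and per-class sample counts ("01andQC" column) from the posted rows.
npeaks  <- c(245, 407, 406, 268, 265, 259, 412, 455)
samples <- c(245, 405, 406, 268, 239, 259, 412, 455)

# Features where at least one sample contributes more than one peak.
multi <- npeaks > samples

cat(sum(multi), "of", length(multi), "features have npeaks > samples\n")
cat("Extra peaks in the flagged features:", (npeaks - samples)[multi], "\n")
```

A feature with many extra peaks, such as the one at m/z 76.95, is a natural candidate for a closer look at its group plot.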

But keep in mind that the mzwid parameter is unfortunately an absolute value, not a relative one.
So mzwid = 0.01 could be optimal for m/z 100 - 400, but for m/z 400 - 1000 you need mzwid = 0.025, and therefore it has to be set to 0.025 for the whole range.
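
The arithmetic behind this can be sketched in base R. The 25 ppm figure is an assumption, chosen only because it reproduces the two widths mentioned above at m/z 400 and m/z 1000; `ppm_to_mz` is a hypothetical helper, not an xcms function.

```r
# Convert a relative tolerance in ppm to an absolute m/z width at a given m/z.
ppm_to_mz <- function(mz, ppm) mz * ppm / 1e6

# Illustrative assumption: a constant relative tolerance of 25 ppm.
mz  <- c(100, 400, 1000)
wid <- ppm_to_mz(mz, ppm = 25)

# Absolute width that corresponds to 25 ppm at each m/z.
print(data.frame(mz = mz, width = wid))
```

A single absolute mzwid therefore has to be wide enough for the highest m/z of interest, which makes it looser than necessary at low m/z.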

Quote from: "osuct"

Also, for the following figure, does it mean two features (with two different rt ranges) are detected because of the two dashed lines? Or is only 1 feature detected, with a larger rt range? Why is there only 1 Gaussian curve for these two lines of points?

As you can see in your groups result, you have only one feature at m/z 76.95. The dashed lines here mark the left and right ends of the rt region in which the feature is defined.
So for this feature, the left line is at 212.638 and the right at 223.605:
        mzmed    mzmin    mzmax   rtmed   rtmin   rtmax npeaks 01andQC
[5,] 76.95368 76.95315 76.95496 213.623 212.638 223.605    265     239

The width of the Gaussian curve certainly depends on the detected peaks, but also on the previously mentioned bw parameter.
The kernels are scaled such that the bw parameter is the standard deviation of those kernels.
So if you know that your chromatography is very stable and the retention time of a feature doesn't change much even over hundreds of samples,
then you can decrease the bw parameter. This results in much narrower curves, which would split this feature in two.
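
The effect of bw can be reproduced with the density() function from base R, which scales its Gaussian kernel in the same way (bw is the kernel's standard deviation). The retention times below are simulated to resemble a large cluster near 213 s and a small one near 223 s; they are not the real peaks, and `count_modes` is a hypothetical helper written for this sketch.

```r
# Simulated retention times (seconds) for one m/z slice. Values are made up.
rt <- c(seq(212.6, 213.8, length.out = 240),
        seq(223.2, 223.6, length.out = 25))

# Count local maxima of a Gaussian kernel density estimate with bandwidth bw.
# Maxima below 1% of the highest point are ignored as numerical noise.
count_modes <- function(x, bw) {
  y <- density(x, bw = bw, kernel = "gaussian")$y
  interior <- y[2:(length(y) - 1)]
  is_max <- diff(sign(diff(y))) == -2
  sum(is_max & interior > 0.01 * max(y))
}

cat("bw = 30:", count_modes(rt, 30), "mode(s)\n")
cat("bw = 1: ", count_modes(rt, 1), "mode(s)\n")
```

With these simulated values the wide bandwidth smooths both clusters into a single curve, while the narrow one leaves two separate maxima, which is the situation in which the grouping would report two features.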

Carsten

Title: Re: help interpreting result
Post by: Carsten on August 30, 2012, 06:39:47 AM

Quote from: "uwe.geppert"

I need help in the Interpretation of XCMS-Results, too.
For my analysis it was not possible to use the same amount of plant tissue in each sample.
Does XCMS compare absolute or relative amounts of substances?

To keep threads clean it would be best to create a new one instead of posting here,
but unfortunately I have no moderation rights to move your posting, so let's keep it here.

The fold changes within the diffreport function in xcms assume that all samples use the same amount.
Paul Benton posted a script to adjust peak intensities to fresh weight in the xcms cookbook:
http://metabolomics-forum.com/viewtopic.php?f=26&t=143&sid=9bf36eb94e0e4c467b23563e34d0488d

However, we can't guarantee a linear relationship between fresh weight and peak intensity,
so be careful when you check the results.
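
The general idea of such an adjustment can be sketched in base R. This is not the cookbook script referred to above, only a generic illustration with made-up intensities and weights.

```r
# Hypothetical peak intensities: rows are features, columns are samples.
intensity <- matrix(c(1000, 2000, 1500,
                       400,  800,  600),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("feature1", "feature2"),
                                    c("s1", "s2", "s3")))

# Hypothetical fresh weights in mg, one per sample.
fresh_weight <- c(s1 = 50, s2 = 100, s3 = 75)

# Divide every column by the fresh weight of its sample.
# This assumes intensity scales linearly with the amount of tissue.
per_mg <- sweep(intensity, 2, fresh_weight[colnames(intensity)], "/")

print(per_mg)
```

In this toy case the intensities were chosen to be exactly proportional to the weights, so every sample ends up with the same value per mg; real data will only approximate that.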

Carsten

Title: Re: help interpreting result
Post by: osuct on March 07, 2013, 12:04:36 PM

Hi Carsten,

Sorry. I kinda postponed the project for a while when I didn't get your reply. That shows how valuable your help is. :)

I've gone back and tried to run xcms on my QC samples only.
I tried running:

Code:

xset <- xcmsSet(method="centWave")

but for all of the QC files, I got warnings that only a few peaks are found.
Below are parts of the warnings:
> warnings()
Warning messages:
1: Only 4 peaks found in sample20111207_PS_POS_0402
2: No peaks found in sample 20111207_PS_POS_11202
3: Only 4 peaks found in sample20111207_PS_POS_13302
4: No peaks found in sample 20111207_PS_POS_15402
5: Only 1 peak found in sample 20111207_PS_POS_17502
6: Only 1 peak found in sample 20111207_PS_POS_19902
7: Only 2 peaks found in sample20111207_PS_POS_22002
8: Only 6 peaks found in sample20111207_PS_POS_24102
9: Only 4 peaks found in sample20111207_PS_POS_2502

when I changed some of the parameters I got the following:

Code:

> xs <- xcmsSet(method="centWave",ppm=25, peakwidth=c(5,20))
20111207_PS_POS_0402: 
 Detecting mass traces at 25 ppm ... 
 % finished: 10 30 50 60 80 100 
 23 m/z ROI's.

 Detecting chromatographic peaks ... 
 % finished: Error in if (!(!is.null(dim(wCoefs)) && any(wCoefs - baseline >= sdthr))) next : 
  missing value where TRUE/FALSE needed

Code:

> xs <- xcmsSet()
20111207_PS_POS_0402: 100:0 150:0 200:0 250:0 300:0 350:0 400:0 450:0 500:0 550:2 600:2 650:2 700:2 750:2 800:2 850:2 
20111207_PS_POS_11202: 100:0 150:0 200:0 250:0 300:0 350:0 400:0 450:0 500:0 550:0 600:0 650:0 700:0 750:0 800:0 850:0 
Error in logical(nrow(m)) : invalid 'length' argument

I'm not sure what's going on.... :?

Title: Re: help interpreting result
Post by: hpbenton on March 15, 2013, 01:01:25 PM

osuct,

First, how are your QCs made? Are they a linear combination of your biological samples, or closer to blanks, or something different? What do your TICs look like? The error you're getting occurs because centWave cannot find anything that looks like a peak. You then ran matchedFilter, the other peak detection algorithm, and it also reports that no peaks are found using the default parameters. There may be no detectable peaks! I would run the TIC overlay in the cookbook to get an idea of how many peaks you should expect and what the alignment/reproducibility is.

I would look at the help page on centWave to understand the parameters a little more. I understand that you're using the Waters UPLC-MS system. I normally find that a smaller minimum peak width is needed, otherwise we lose a lot of peaks! I like to use a peakwidth of something around c(3,25). However, this really depends on your chromatography!

Hope it helps. Let us know how you get on.

Cheers,

Paul

Metabolomics Society Forum
