help interpreting result

My data were generated on a Waters UPLC/QTOF-MS system.

I ran the following commands in xcms:
Code: [Select]
xset <- xcmsSet(method="centWave")
xset <- group(xset)
xset@groups
        mzmed    mzmin    mzmax    rtmed  rtmin  rtmax npeaks samples
 [1,]  90.52499  90.50587  90.53768 566.7325 493.651 571.501    602    434
 [2,]  96.96339  96.96293  96.96393 573.6620 572.318 574.551    432    432
 [3,]  97.99267  97.96968  97.99426 570.5130 568.622 573.041    819    432
 [4,]  98.51168  98.51125  98.52936 569.8830 480.473 571.088    441    440
 [5,] 102.03374 102.03304 102.12967 567.2895 467.341 570.347    544    503
 [6,] 111.98470 111.98376 112.03649 571.1140 535.050 572.944    545    462
 [7,] 113.96607 113.96540 113.97762 571.2150 493.610 575.702    434    433
 [8,] 118.12385 117.93697 118.12470 551.8370 433.981 580.286  1192    432
 [9,] 125.98792 125.97683 125.98878 567.3455 490.244 570.640    436    432
[10,] 128.95311 128.95244 128.95365 571.7455 571.056 572.930    432    432
[11,] 141.96116 141.96064 141.98669 571.6450 478.937 572.517    448    432
[12,] 145.93201 145.93138 145.93242 574.9675 573.559 576.317    432    432
[13,] 149.02481 149.02377 149.04653 490.3790 415.157 546.035    568    558
[14,] 158.96294 158.96252 158.96682 572.2490 517.538 573.042    434    432
[15,] 171.15070 171.10154 171.17518 163.7375 105.204 261.115    722    717
[16,] 186.95823 186.95770 186.95874 571.7560 570.904 572.583    432    432
[17,] 187.12794 186.95827 187.15414 206.0160 205.260 263.677    521    516
[18,] 550.63239 550.55762 550.63547 505.1365 498.520 555.740    462    432
[19,] 551.63567 551.55751 551.63963 504.9905 498.503 520.841    442    432

It looks like the program identified 19 peak groups. That means there are 19 analytes identified across multiple samples?
The first analyte is eluting at 566.7325 retention time (median) which has 602 peaks and it appears in 434 samples?

From another thread, I read that I don't have to do retention time correction for UPLC/QTOF. Can anyone tell me why?

Thanks in advance!

Re: help interpreting result

Reply #1
Quote from: "osuct"
It looks like the program identified 19 peak groups. That means there are 19 analytes identified across multiple samples?
The first analyte is eluting at 566.7325 retention time (median) which has 602 peaks and it appears in 434 samples?
The group function is an alignment function:
it matches Peak X from Sample A to its corresponding Peak X in Sample B, and so on, and puts the corresponding peaks into one feature group.
For the underlying method, please check the xcms paper.

The xset@groups output gives you an overview of all detected features,
each of which is defined by an m/z range and a retention time range.
The "npeaks" column is the total number of peaks, over all samples, that fall into those ranges.
The "samples" column is the number of samples in which one or more peaks appear in that specific range. That is also the reason
why npeaks can be higher than the number of samples.
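
To make the difference between the two columns concrete, here is a minimal base R sketch. It does not call xcms; the peak table and the feature boundaries are invented for illustration, and count_feature is a hypothetical helper, not an xcms function.

```r
# Toy peak table: one row per detected peak (all values are invented).
peaks <- data.frame(
  mz     = c(90.525, 90.526, 90.524, 90.525, 96.963, 96.963),
  rt     = c(566.7, 566.9, 494.0, 567.1, 573.6, 573.7),
  sample = c(1, 2, 2, 3, 1, 2)
)

# Hypothetical feature boundaries, laid out like the columns of xset@groups.
features <- data.frame(
  mzmin = c(90.50, 96.96), mzmax = c(90.54, 96.97),
  rtmin = c(490, 572),     rtmax = c(572, 575)
)

# For one feature: count the peaks inside its m/z and rt window (npeaks)
# and the number of distinct samples those peaks come from (samples).
count_feature <- function(f) {
  inside <- peaks$mz >= f[["mzmin"]] & peaks$mz <= f[["mzmax"]] &
            peaks$rt >= f[["rtmin"]] & peaks$rt <= f[["rtmax"]]
  c(npeaks = sum(inside), samples = length(unique(peaks$sample[inside])))
}

summary_table <- t(apply(features, 1, count_feature))
print(summary_table)
```

In this toy table sample 2 contributes two peaks to the first window, so npeaks for that feature is one larger than samples.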

At this point of the analysis I would recommend optimizing your parameters; see ?group.density for a short description.
The retention time range of your first feature (493.651 to 571.501 s) is very wide compared with that of the second feature (572.318 to 574.551 s).
The default bw = 30 is intended for an HPLC setup, so for your UPLC a good starting point would be bw = 10.

You could also set sleep = 5 (a pause of 5 seconds per feature), which plots a figure for each feature. The figure gives an overview of the detected feature and shows, for example, whether the large difference in rt means
that two different peaks occur within a short time in the same m/z slice.

Carsten

Re: help interpreting result

Reply #2
Thanks for your reply.

So, what do the figures generated by the group(xset,sleep) command show? What is the x-axis?
Below, the first figure was generated using group(xset,bw=30) and the second figure was generated using group(xset,bw=3).
How can I tell "if the huge difference in rt means that on the same mz slide, two different peaks occurs within a short time"?

Also, I have 19 aligned features in figure 1 and 17 aligned features in figure 2.

Re: help interpreting result

Reply #3
The x-axis is the retention time in seconds, the title shows the m/z value, and N is the number of peaks that fall into that m/z bin.
Quote from: "osuct"
How can I tell "if the huge difference in rt means that on the same mz slide, two different peaks occurs within a short time" ?
The output of groups shows all information about the specific feature.
Quote
mzmed    mzmin    mzmax      rtmed    rtmin    rtmax    npeaks samples
90.52499  90.50587  90.53768 566.7325 493.651 571.501    602    434
So this feature contains 602 peaks from 434 samples, with mz values between 90.50587 and 90.53768 and the peaks occur between 493 and 571 seconds.

If you look at the figures and see two vertical lines of points, at 493 seconds and 571 seconds, under one smooth Gaussian curve,
then the bw parameter was too high.
The cluster at ~580 does not show much deviation, which should be normal for a UPLC,
because the retention times are very stable.
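
As a rough way to find such candidates without looking at every plot, one could screen the groups table itself. This is only a sketch in base R: the four rows are copied from the xset@groups output in the first post, while the 10 second threshold and the variable names are my own assumptions, not xcms defaults.

```r
# Four rows copied from the xset@groups output in the first post.
groups <- matrix(
  c( 90.52499, 566.7325, 493.651, 571.501, 602, 434,
     96.96339, 573.6620, 572.318, 574.551, 432, 432,
     97.99267, 570.5130, 568.622, 573.041, 819, 432,
    128.95311, 571.7455, 571.056, 572.930, 432, 432),
  ncol = 6, byrow = TRUE,
  dimnames = list(NULL, c("mzmed", "rtmed", "rtmin", "rtmax", "npeaks", "samples"))
)

# Width of each feature's retention time window, in seconds.
rt_span <- groups[, "rtmax"] - groups[, "rtmin"]

# Assumed threshold: a window wider than 10 s is suspicious for stable UPLC data.
wide_rt <- rt_span > 10

# Features in which at least one sample contributes more than one peak.
multi_peak <- groups[, "npeaks"] > groups[, "samples"]

print(data.frame(mzmed = groups[, "mzmed"], rt_span, wide_rt, multi_peak))
```

Features flagged by either column are the ones worth inspecting in the group plots first.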

As you can also see in the second plot, the kernel widths are much smaller, which should result in more feature groups.
Normally you should also see coloured, dotted vertical lines, which indicate the identified groups and help you to interpret the results.
So I assume you have no features with the mass of 81.52?

Another parameter you could optimize is mzwid, which is the width of those m/z slices.
The default of 0.25 m/z is quite large for a QTOF.

Re: help interpreting result

Reply #4
Hi Carsten,

Thank you very much for taking the time to explain this. It helps a lot.
So, points that line up represent peaks from multiple samples that have the same retention time, correct?
The y-axis is labeled density. What density is this referring to?

Yes. There is no feature detected with the mass of 81.52

I tried to change the mzwid parameter to 0.025 and got a lot more detected features (> 1000) but is there a diagnostic like the group plot to see whether this is too small or too big?

Also, for the following figure, does it mean two features (with two different rt ranges) are detected because of the two dashed lines? Or is only 1 feature detected, with a larger rt range? Why is there only 1 Gaussian curve for these two lines of points?
The result from group array is as follows:
Code: [Select]
> xsg@groups[11:18,]
        mzmed    mzmin    mzmax  rtmed  rtmin  rtmax npeaks 01andQC
[1,] 73.00000 72.99943 73.00084 205.306 204.608 206.198    245    245
[2,] 73.07730 73.06574 73.07793  92.926  81.407  94.020    407    405
[3,] 73.53199 73.53149 73.53286 565.248 564.269 565.595    406    406
[4,] 75.05659 75.05601 75.05755 137.237 137.173 137.914    268    268
[5,] 76.95368 76.95315 76.95496 213.623 212.638 223.605    265    239
[6,] 79.02238 79.02130 79.02321  21.143  20.465  21.251    259    259
[7,] 80.95015 80.94924 80.95109  17.889  17.221  19.171    412    412
[8,] 81.52041 81.51986 81.52122 565.281 564.269 565.606    455    455



Thanks again!

Re: help interpreting result

Reply #5
I need help in the Interpretation of XCMS-Results, too.
For my analysis it was not possible to use the same amount of plant tissue in each sample.
Does XCMS compare absolute or relative amounts of substances?

Re: help interpreting result

Reply #6
Sorry for the late answer, but I was on holiday the last few days. ;)

Quote from: "osuct"
So, points that line up represent peaks from multiple samples that have the same retention time, correct?
Exactly!

Quote from: "osuct"
The y-axis is labeled density. What density is this referring to?
This density is calculated with a "standard" Gaussian kernel density estimation.
By the way, the bw parameter of the group function is the smoothing bandwidth for this kernel.

Quote from: "osuct"
I tried to change the mzwid parameter to 0.025 and got a lot more detected features (> 1000) but is there a diagnostic like the group plot to see whether this is too small or too big?
Not a general one, as far as I know. If I want to optimize my parameters, I normally look at some specific features and their group plots,
and also at how often the npeaks number is higher than the number of samples.

But keep in mind that the mzwid parameter is unfortunately an absolute value, not a relative one.
So mzwid = 0.01 could be optimal for m/z 100 - 400, but if you need mzwid = 0.025 for m/z 400 - 1000, then it has to be set to 0.025 for the whole range.
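
To see what an absolute width means in relative terms, this small base R sketch converts mzwid into ppm at a few m/z values. The helper mzwid_to_ppm is hypothetical (it is not part of xcms), and the m/z values are just examples.

```r
# Relative width (in ppm) of an absolute m/z window at a given m/z.
mzwid_to_ppm <- function(mzwid, mz) mzwid / mz * 1e6

mz <- c(100, 400, 1000)

ppm_table <- data.frame(
  mz        = mz,
  ppm_0.01  = mzwid_to_ppm(0.01, mz),
  ppm_0.025 = mzwid_to_ppm(0.025, mz)
)
print(ppm_table)
```

The same absolute width is ten times wider in relative terms at m/z 100 than at m/z 1000, which is why one global value is always a compromise.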

Quote from: "osuct"
Also, for the following figure, does it mean two features (with two different rt ranges) are detected because of the two dashed lines? Or is only 1 feature detected, with a larger rt range? Why is there only 1 Gaussian curve for these two lines of points?
As you can see in your groups result, you have only one feature at m/z 76.95. The dashed lines here mark the left and the right end of the rt region in which the feature is defined.
So according to the feature, the left line is at 212.638 and the right one at 223.605:
        mzmed    mzmin    mzmax  rtmed  rtmin  rtmax npeaks 01andQC
[5,] 76.95368 76.95315 76.95496 213.623 212.638 223.605    265    239

The width of the Gaussian curve certainly depends on the detected peaks, but also on the previously mentioned bw parameter.
The kernels are scaled such that the bw parameter is the standard deviation of those kernels.
So if you know that your chromatography is very stable and the retention time of a feature doesn't change much, even over hundreds of samples,
then you can decrease the bw parameter. This results in much narrower curves, which would separate this feature into two.
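
The effect of bw can be reproduced outside xcms with the density() function from base R, whose bw argument is likewise the standard deviation of the Gaussian kernel. The sketch below is an illustration only: the two retention times are taken from the rtmed and rtmax of the m/z 76.95 feature, the split into 239 and 26 peaks is an assumption derived from its npeaks and samples columns, and count_modes is a hypothetical helper.

```r
# Two groups of peaks about 10 s apart, as in the m/z 76.95 feature.
# The 239 / 26 split is assumed for illustration.
rt <- c(rep(213.623, 239), rep(223.605, 26))

# Count local maxima of a density estimate, ignoring numerical noise
# below 1 % of the highest point.
count_modes <- function(d, min_rel_height = 0.01) {
  y <- d$y
  i <- 2:(length(y) - 1)
  is_peak <- y[i] > y[i - 1] & y[i] >= y[i + 1] &
             y[i] > min_rel_height * max(y)
  sum(is_peak)
}

# bw is the standard deviation of the Gaussian kernel, in seconds.
modes_bw30 <- count_modes(density(rt, bw = 30))
modes_bw1  <- count_modes(density(rt, bw = 1))

cat("bw = 30:", modes_bw30, "mode(s)\n")
cat("bw = 1: ", modes_bw1, "mode(s)\n")
```

With the wide kernel the two lines of points merge under one curve; with the narrow kernel they form separate maxima, which is what allows them to end up in separate groups.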

Carsten

Re: help interpreting result

Reply #7
Quote from: "uwe.geppert"
I need help in the Interpretation of XCMS-Results, too.
For my analysis it was not possible to use the same amount of plant tissue in each sample.
Does XCMS compare absolute or relative amounts of substances?

To keep threads clean, it would be best to create a new one instead of posting here,
but unfortunately I have no moderation rights to move your posting, so let's keep it here.

The fold changes calculated by the diffreport function in xcms assume that all samples contain the same amount of material.
Paul Benton posted a script to adjust peak intensities to fresh weight in the xcms cookbook:
http://metabolomics-forum.com/viewtopic.php?f=26&t=143&sid=9bf36eb94e0e4c467b23563e34d0488d

However, we can't guarantee a linear relationship between fresh weight and peak intensity,
so be careful when you check the results.
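
Just to illustrate the idea (this is not Paul's script), a minimal base R sketch of scaling intensities by fresh weight could look like this. The intensity matrix, the sample names and the weights are all invented, and the sketch assumes exactly the linear relationship mentioned above.

```r
# Invented intensity matrix: rows are features, columns are samples.
intensities <- matrix(
  c(1000, 2000, 1500,
     400,  800,  600),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("feature1", "feature2"), c("s1", "s2", "s3"))
)

# Invented fresh weights in mg, named by sample.
fresh_weight_mg <- c(s1 = 50, s2 = 100, s3 = 75)

# Divide every column by the fresh weight of its sample
# (this assumes intensity scales linearly with the amount of tissue).
normalized <- sweep(intensities, 2,
                    fresh_weight_mg[colnames(intensities)], "/")
print(normalized)
```

Indexing the weights by column name guards against the weights being listed in a different order than the samples.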

Carsten

Re: help interpreting result

Reply #8
Hi Carsten,

Sorry. I kind of postponed the project for a while when I didn't get your reply. That shows how valuable your help is.  :)

I've gone back and tried to run xcms on my QC samples only.
I tried running:
Code: [Select]
xset <- xcmsSet(method="centWave")
but for all of the QC files, I got warnings that only a few peaks are found.
Below are parts of the warnings:
> warnings()
Warning messages:
1: Only 4 peaks found in sample20111207_PS_POS_0402
2: No peaks found in sample 20111207_PS_POS_11202
3: Only 4 peaks found in sample20111207_PS_POS_13302
4: No peaks found in sample 20111207_PS_POS_15402
5: Only 1 peak found in sample 20111207_PS_POS_17502
6: Only 1 peak found in sample 20111207_PS_POS_19902
7: Only 2 peaks found in sample20111207_PS_POS_22002
8: Only 6 peaks found in sample20111207_PS_POS_24102
9: Only 4 peaks found in sample20111207_PS_POS_2502

When I changed some of the parameters, I got the following:

Code: [Select]
> xs <- xcmsSet(method="centWave",ppm=25, peakwidth=c(5,20))
20111207_PS_POS_0402:
 Detecting mass traces at 25 ppm ...
 % finished: 10 30 50 60 80 100
 23 m/z ROI's.

 Detecting chromatographic peaks ...
 % finished: Error in if (!(!is.null(dim(wCoefs)) && any(wCoefs - baseline >= sdthr))) next :
  missing value where TRUE/FALSE needed

Code: [Select]
> xs <- xcmsSet()
20111207_PS_POS_0402: 100:0 150:0 200:0 250:0 300:0 350:0 400:0 450:0 500:0 550:2 600:2 650:2 700:2 750:2 800:2 850:2
20111207_PS_POS_11202: 100:0 150:0 200:0 250:0 300:0 350:0 400:0 450:0 500:0 550:0 600:0 650:0 700:0 750:0 800:0 850:0
Error in logical(nrow(m)) : invalid 'length' argument

I'm not sure what's going on.... :?

Re: help interpreting result

Reply #9
osuct,


First, how are your QCs made? Are they a linear combination of your biological samples, closer to blanks, or something different? What do your TICs look like? The error that you're getting occurs because centWave cannot find anything that looks like a peak. You then ran matchedFilter, the other peak detection algorithm, and it also reports that no peaks are found with the default parameters. There may be no detectable peaks! I would run the TIC overlay from the cookbook to get an idea of how many peaks you should expect and what the alignment/reproducibility looks like.

I would look at the help page for centWave to understand the parameters a little more. I understand that you're using the Waters UPLC-MS system. I normally find that a smaller minimum peak width is needed, otherwise we lose a lot of peaks! I like to use something around peakwidth = c(3, 25). However, this really depends on your chromatography!

Hope it helps. Let us know how you get on.

Cheers,

Paul
~~
H. Paul Benton
Scripps Research Institute
If you have an error with XCMS Online please send me the JOBID and submit an error via the XCMS Online contact page