So my data is generated using Waters UPLC/QTOF-MS
I ran the following commands in xcms:
xset <- xcmsSet(method="centWave")
xset <- group(xset)
xset@groups
mzmed mzmin mzmax rtmed rtmin rtmax npeaks samples
[1,] 90.52499 90.50587 90.53768 566.7325 493.651 571.501 602 434
[2,] 96.96339 96.96293 96.96393 573.6620 572.318 574.551 432 432
[3,] 97.99267 97.96968 97.99426 570.5130 568.622 573.041 819 432
[4,] 98.51168 98.51125 98.52936 569.8830 480.473 571.088 441 440
[5,] 102.03374 102.03304 102.12967 567.2895 467.341 570.347 544 503
[6,] 111.98470 111.98376 112.03649 571.1140 535.050 572.944 545 462
[7,] 113.96607 113.96540 113.97762 571.2150 493.610 575.702 434 433
[8,] 118.12385 117.93697 118.12470 551.8370 433.981 580.286 1192 432
[9,] 125.98792 125.97683 125.98878 567.3455 490.244 570.640 436 432
[10,] 128.95311 128.95244 128.95365 571.7455 571.056 572.930 432 432
[11,] 141.96116 141.96064 141.98669 571.6450 478.937 572.517 448 432
[12,] 145.93201 145.93138 145.93242 574.9675 573.559 576.317 432 432
[13,] 149.02481 149.02377 149.04653 490.3790 415.157 546.035 568 558
[14,] 158.96294 158.96252 158.96682 572.2490 517.538 573.042 434 432
[15,] 171.15070 171.10154 171.17518 163.7375 105.204 261.115 722 717
[16,] 186.95823 186.95770 186.95874 571.7560 570.904 572.583 432 432
[17,] 187.12794 186.95827 187.15414 206.0160 205.260 263.677 521 516
[18,] 550.63239 550.55762 550.63547 505.1365 498.520 555.740 462 432
[19,] 551.63567 551.55751 551.63963 504.9905 498.503 520.841 442 432
It looks like the program identified 19 peak groups. Does that mean 19 analytes were identified across multiple samples?
The first analyte elutes at a median retention time of 566.7325, has 602 peaks, and appears in 434 samples?
From another thread, I read that I don't have to do retention time correction for UPLC/QTOF. Can anyone tell me why?
Thanks in advance!
The group function is an alignment function: it matches Peak X from Sample A to its corresponding Peak X in Sample B and so on, and puts the corresponding peaks into one feature group.
For the underlying method, please check the xcms paper.
The xset@groups output gives you an overview of all detected features, each of which is defined by an m/z range and a retention time range.
The "npeaks" column is the total number of peaks that fall into those ranges over all samples.
The "samples" column is the number of samples in which one or more peaks appear in that specific range. That is also the reason why npeaks can be higher than the number of samples.
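To make the npeaks/samples distinction concrete, here is a minimal sketch in plain Python. It is not xcms code: the peak list and the helper name summarize_feature are made up for illustration, and only the m/z and rt limits are taken from the first feature in the output above.

```python
# Hypothetical peak list: (sample_id, mz, rt_seconds). All values are invented.
peaks = [
    ("A", 90.5250, 566.7),
    ("A", 90.5252, 493.7),   # second peak from sample A inside the same ranges
    ("B", 90.5249, 566.9),
    ("C", 90.5251, 571.5),
    ("C", 118.1238, 551.8),  # outside the m/z range, so it is not counted
]

def summarize_feature(peaks, mzmin, mzmax, rtmin, rtmax):
    """Count all peaks inside the ranges, and the distinct samples they come from."""
    inside = [p for p in peaks
              if mzmin <= p[1] <= mzmax and rtmin <= p[2] <= rtmax]
    npeaks = len(inside)                    # every peak counts
    samples = len({p[0] for p in inside})   # each sample counts once
    return npeaks, samples

# Ranges of feature [1,] from the groups output above.
npeaks, samples = summarize_feature(peaks, 90.50587, 90.53768, 493.651, 571.501)
print(npeaks, samples)  # 4 3
```

Sample A contributes two peaks but is counted as one sample, which is exactly how npeaks ends up larger than samples.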
At this point of the analysis I would recommend optimizing your parameters; see ?group.density for a short description.
I recommend this because the retention time range of your first feature is quite large compared with that of the second feature.
The default bw = 30 is meant for an HPLC setup, so for your UPLC a good starting point would be bw = 10.
You could also set sleep = 5 (5 seconds per feature), which draws a figure for each feature. The figure gives you an overview of the detected feature and shows, for example, whether the large difference in rt means that two different peaks occur within a short time in the same m/z slice.
Carsten
Thanks for your reply.
So, what are the figures generated by the group(xset,sleep) command? What is the x-axis?
Below, the first figure was generated using group(xset,bw=30) and the second using group(xset,bw=3).
How can I tell whether the large difference in rt means that two different peaks occur within a short time in the same m/z slice?
(http://img402.imageshack.us/img402/9029/74030876.jpg)
Also, I have 19 aligned features in 1 and 17 aligned features in 2.
The x-axis is the retention time in seconds, the title shows the m/z value, and N is the number of peaks that fall into that m/z bin.
The groups output shows all the information about a specific feature.
So this feature contains 602 peaks from 434 samples, with m/z values between 90.50587 and 90.53768, and the peaks occur between 493 and 571 seconds.
If you look at the figures and see two vertical lines of points, at 493 seconds and at 571 seconds, under one smooth Gaussian curve, then the bw parameter was too high.
The cluster at ~580 shows little deviation, which should be normal for a UPLC, because the retention times are very stable.
As you can also see in the second plot, the kernel widths are much smaller, which should result in more feature groups.
Normally you should also see coloured, dotted vertical lines, which indicate the identified groups and help you interpret the results.
So I assume you have no features with the mass of 81.52?
Another parameter you could optimize is mzwid, which is the width of those m/z slices.
The default of 0.25 m/z is quite large for a QTOF.
Hi Carsten,
Thank you very much for taking the time to explain this. It helps a lot.
So, points that line up represent peaks from multiple samples that have the same retention time, correct?
The y-axis is labeled density. What density is this referring to?
Yes. There is no feature detected with the mass of 81.52.
I tried changing the mzwid parameter to 0.025 and got a lot more detected features (> 1000). Is there a diagnostic like the group plot to see whether this value is too small or too big?
Also, for the following figure, do the two dashed lines mean that two features (with different rt ranges) are detected, or is only one feature detected with a larger rt range? Why is there only one Gaussian curve for these two lines of points?
The result from the groups array is as follows:
> xsg@groups[11:18,]
mzmed mzmin mzmax rtmed rtmin rtmax npeaks 01andQC
[1,] 73.00000 72.99943 73.00084 205.306 204.608 206.198 245 245
[2,] 73.07730 73.06574 73.07793 92.926 81.407 94.020 407 405
[3,] 73.53199 73.53149 73.53286 565.248 564.269 565.595 406 406
[4,] 75.05659 75.05601 75.05755 137.237 137.173 137.914 268 268
[5,] 76.95368 76.95315 76.95496 213.623 212.638 223.605 265 239
[6,] 79.02238 79.02130 79.02321 21.143 20.465 21.251 259 259
[7,] 80.95015 80.94924 80.95109 17.889 17.221 19.171 412 412
[8,] 81.52041 81.51986 81.52122 565.281 564.269 565.606 455 455
(http://img805.imageshack.us/img805/1517/mz2o.jpg)
Thanks again!
I need help in the Interpretation of XCMS-Results, too.
For my analysis it was not possible to use the same amount of plant tissue in each sample.
Does XCMS compare absolute or relative amounts of substances?
Sorry for the late answer, but I was on holiday the last few days. ;)
Exactly!
This density is calculated with a "standard" Gaussian kernel density estimation.
By the way, the bw parameter of the group function is the smoothing bandwidth for this kernel.
Not a general one, as far as I know. When I want to optimize my parameters, I normally look at some specific features and their group plots, and also at how often npeaks is higher than the number of samples, ...
But keep in mind that the mzwid parameter is unfortunately an absolute value, not a relative one.
So mzwid = 0.01 could be optimal for m/z 100 - 400, but if m/z 400 - 1000 needs mzwid = 0.025, then it must be set to 0.025 overall.
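A quick way to see why an absolute mzwid is awkward is to convert it into a relative width (ppm) at different masses. This is just arithmetic in plain Python, not anything from xcms; the function name mzwid_in_ppm is made up for this sketch.

```python
def mzwid_in_ppm(mzwid, mz):
    """Express an absolute m/z slice width as parts per million of the given m/z."""
    return mzwid / mz * 1e6

# The same absolute width is a very different relative width across the mass range.
widths = {mz: round(mzwid_in_ppm(0.025, mz), 1) for mz in (100, 400, 1000)}
for mz, ppm in widths.items():
    print(f"mzwid = 0.025 at m/z {mz}: {ppm} ppm")
```

So a width that is only 25 ppm at m/z 1000 is already 250 ppm at m/z 100, which is why a single value is always a compromise between the low and high ends of the mass range.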
As you can see in your groups result, you have only one feature at m/z 76.95. The dashed lines here mark the left and right ends of the rt region in which the feature is defined.
So for this feature, the left line is at 212.638 and the right at 223.605:
mzmed mzmin mzmax rtmed rtmin rtmax npeaks 01andQC
[5,] 76.95368 76.95315 76.95496 213.623 212.638 223.605 265 239
The width of the Gaussian curve certainly depends on the detected peaks, but also on the previously mentioned bw parameter.
The kernels are scaled such that the bw parameter is the standard deviation of those kernels.
So if you know that your chromatography is very stable and the retention time of a feature doesn't change much even over hundreds of samples, then you can decrease the bw parameter. This results in much narrower curves, which would separate this feature into two.
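The effect of bw can be reproduced with a small stand-alone sketch. It assumes a plain Gaussian kernel density whose standard deviation is bw, as described above; it is not the xcms implementation, and the retention times are invented to mimic two lines of points near 213 s and 223.5 s.

```python
import math

# Hypothetical retention times (seconds): ten peaks near 213 s, five near 223.4 s.
rts = [212.6 + 0.1 * i for i in range(10)] + [223.2 + 0.1 * i for i in range(5)]

def density(x, points, bw):
    """Unnormalized Gaussian kernel density at x; bw is the kernel standard deviation."""
    return sum(math.exp(-0.5 * ((x - p) / bw) ** 2) for p in points)

def count_modes(points, bw, step=0.1):
    """Count local maxima of the density on a regular grid around the data."""
    lo, hi = min(points) - 3 * bw, max(points) + 3 * bw
    n = int((hi - lo) / step) + 1
    d = [density(lo + i * step, points, bw) for i in range(n)]
    return sum(1 for i in range(1, n - 1) if d[i - 1] < d[i] >= d[i + 1])

for bw in (30, 3):
    print(f"bw = {bw}: {count_modes(rts, bw)} density maximum/maxima")
```

With bw = 30 the two clusters sit under a single broad curve, while with bw = 3 the density has two separate maxima, so a grouping based on those maxima would see two candidates instead of one.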
Carsten
To keep threads clean, it would be best to create a new one instead of posting here, but unfortunately I have no moderation rights to move your post, so let's keep it here.
The fold changes in the diffreport function in xcms assume that all samples use the same amount of material.
Paul Benton posted a script to adjust peak intensities to fresh weight in the xcms cookbook:
http://metabolomics-forum.com/viewtopic.php?f=26&t=143&sid=9bf36eb94e0e4c467b23563e34d0488d
However, we can't guarantee a linear relationship between fresh weight and peak intensity, so be careful when you check the results.
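To illustrate the idea only (this is not Paul Benton's script, and all sample names and numbers are invented), a per-weight scaling could look like the sketch below. It assumes exactly the linear relationship between fresh weight and intensity that, as said above, cannot be guaranteed.

```python
# Hypothetical raw intensities of one feature and fresh weights in mg.
intensity = {"plant_1": 12000.0, "plant_2": 18000.0}
fresh_weight_mg = {"plant_1": 50.0, "plant_2": 100.0}

def per_mg(intensity, weight):
    """Divide each sample's intensity by its fresh weight (assumes linearity)."""
    return {s: intensity[s] / weight[s] for s in intensity}

norm = per_mg(intensity, fresh_weight_mg)
raw_fold = intensity["plant_2"] / intensity["plant_1"]
norm_fold = norm["plant_2"] / norm["plant_1"]
print(raw_fold, norm_fold)  # 1.5 0.75
```

In this made-up case the raw data suggest the feature is higher in plant_2, but per mg of tissue it is actually lower, which is why unequal sample amounts matter for fold changes.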
Carsten
Hi Carsten,
Sorry, I kind of postponed the project for a while when I didn't get your reply. That shows how valuable your help is. :)
I've gone back and tried to run xcms on my QC samples only.
I tried running:
xset <- xcmsSet(method="centWave")
but for all of the QC files, I got warnings that only a few peaks were found.
Below are parts of the warnings:
> warnings()
Warning messages:
1: Only 4 peaks found in sample20111207_PS_POS_0402
2: No peaks found in sample 20111207_PS_POS_11202
3: Only 4 peaks found in sample20111207_PS_POS_13302
4: No peaks found in sample 20111207_PS_POS_15402
5: Only 1 peak found in sample 20111207_PS_POS_17502
6: Only 1 peak found in sample 20111207_PS_POS_19902
7: Only 2 peaks found in sample20111207_PS_POS_22002
8: Only 6 peaks found in sample20111207_PS_POS_24102
9: Only 4 peaks found in sample20111207_PS_POS_2502
When I changed some of the parameters, I got the following:
> xs <- xcmsSet(method="centWave",ppm=25, peakwidth=c(5,20))
20111207_PS_POS_0402:
Detecting mass traces at 25 ppm ...
% finished: 10 30 50 60 80 100
23 m/z ROI's.
Detecting chromatographic peaks ...
% finished: Error in if (!(!is.null(dim(wCoefs)) && any(wCoefs - baseline >= sdthr))) next :
missing value where TRUE/FALSE needed
> xs <- xcmsSet()
20111207_PS_POS_0402: 100:0 150:0 200:0 250:0 300:0 350:0 400:0 450:0 500:0 550:2 600:2 650:2 700:2 750:2 800:2 850:2
20111207_PS_POS_11202: 100:0 150:0 200:0 250:0 300:0 350:0 400:0 450:0 500:0 550:0 600:0 650:0 700:0 750:0 800:0 850:0
Error in logical(nrow(m)) : invalid 'length' argument
I'm not sure what's going on.... :?
osuct,
First, how are your QCs made? Are they a linear combination of your biological samples, or closer to blanks, or something different? What do your TICs look like? The error that you're getting is because centWave cannot find anything that looks like a peak. You then ran matchedFilter, the other peak detection algorithm, and it also reports that no peaks are found with the default parameters. There may be no detectable peaks! I would run the TIC overlay in the cookbook to get an idea of how many peaks you should expect and what the alignment/reproducibility is.
I would look at the help page on centWave to understand the parameters a little more. I understand that you're using the Waters UPLC-MS system. I normally find that a smaller minimum peak width is needed, otherwise we lose a lot of peaks! I like to use something around peakwidth = c(3, 25). However, this really depends on your chromatography!
Hope it helps. Let us know how you get on.
Cheers,
Paul