Skip to main content
Topic: group or peakTable problem (Read 10833 times) previous topic - next topic

group or peakTable problem

I am trying to use XCMS to perform peak detection on some authentic standards to get a nice clean spectrum. 
Here is an example for the compound catechin:

stand1<-xcmsSet(filenames, nSlaves=2, method="centWave", ppm=25, peakwidth=c(1.5,15), mzdiff=0.01, fitgauss=TRUE, verbose.columns=TRUE)
xstand@peaks[which(xstand@peaks[,"mz"] > 291.086 & xstand@peaks[,"mz"] < 291.087),]

          mz    mzmin    mzmax      rt  rtmin  rtmax      into      intb      maxo    sn    egauss      mu    sigma        h    f dppm scale
[1,] 291.0866 291.0862 291.0870 109.8970 106.897 112.896 1349308.18 1347901.04 338777.25  873 0.08150396 245.1626 3.932065 340105.13 4863    2    2
[2,] 291.0863 291.0818 291.0898 111.3708 107.112 117.395  59929.07  59922.21  15187.01 15186 0.11094711 244.9695 4.436207  13578.48 2527    4    -1
    scpos scmin scmax lmin lmax sample
[1,]  244  242  246  49  63    51
[2,]    -1    -1    -1  48  72    52

So I am getting the molecular ion that I know to be there in the feature list. 

I next try to concatenate this xset (stand1) with a 'background' xset, group them, retention time correct, and regroup:

xstand<-c(xset1, stand1)
xset4 <- group(xstand, bw=2, minfrac=0.5, max = 50, mzwid=0.02)
xset5 <- retcor(xset4,  method="loess", family = "gaussian", plottype = "mdevden", span=2, missing=round(length(dataset)*0.05, digits=0))
xset6 <- group(xset5, bw=1, minsamp=0, max= 1000, mzwid=0.02)
xset6

An "xcmsSet" object with 52 samples

Time range: -0.4-1330.2 seconds (0-22.2 minutes)
Mass range: 55.0173-1199.8205 m/z
Peaks: 70298 (about 1352 per sample)
Peak Groups: 397
Sample classes: Library_serumC8

Profile settings: method = bin
                  step = 0.1

Memory usage: 23.1 MB

And this appears to have worked, however, when I generate the peak table and look for my peak of interest, it isn't there. 

peaklist <- peakTable(xset6, value="into")
peaklist[which(peaklist[,"mz"] > 291.05 & peaklist[,"mz"] < 291.11),]

...returns an empty table. 

The original peak of interest (the molecular ion in this case) is there:
xset6@peaks[which(xset6@peaks[,"mz"] > 291.086 & xset6@peaks[,"mz"] < 291.087),]

      mz    mzmin    mzmax      rt    rtmin    rtmax      into      intb      maxo    sn    egauss      mu    sigma        h    f dppm scale
[1,] 291.0866 291.0862 291.0870 107.1962 104.1962 110.1952 1349308.18 1347901.04 338777.25  873 0.08150396 245.1626 3.932065 340105.13 4863    2    2
[2,] 291.0863 291.0818 291.0898 108.6867 104.4027 114.6857  59929.07  59922.21  15187.01 15186 0.11094711 244.9695 4.436207  13578.48 2527    4    -1
    scpos scmin scmax lmin lmax sample
[1,]  244  242  246  49  63    51
[2,]    -1    -1    -1  48  72    52

But is not making it into the peak table even though I have tried to set the minfrac or minsamp values to zero, or some trivial non-zero value such as 0.000001.  I could work around this and access the @peaks slot, but this seems to be reinventing the wheel.  And it isn't a mass accuracy grouping artifact, as there are no peak groups within several dalton of the 291.0866 peak in the xset6 peak table.  Any one have any idea how I can get a peakTable which hasn't filtered out the 'rare' features?

Thanks

Re: group or peakTable problem

Reply #1
Any one out there have a suggestion I am missing?  Shouldn't I be able to recover all the signals from peak detection in the grouped dataset if I set the minFrac setting to zero?  Thanks,
Corey

Re: group or peakTable problem

Reply #2
It should work but bw=1 might be a little bit too small.
How do these peaks look before retention time correction ?
Can you try without retcor, just setting bw=2 or 3  ?

Re: group or peakTable problem

Reply #3
Ralf,

I am a bit confused I think.  Even if they aren't properly aligned - they should be in the resulting dataset right?  I can play with the bw setting tommorrow, but assuming that works, can you explain to me why the bw matters?  Shouldn't the peak end up as its own column in the peaklist if it is present in one sample?

Re: group or peakTable problem

Reply #4
I see now what you mean. Can you make sure that you set minfrac=0 in your second grouping step (xset6 <- ) ?

Re: group or peakTable problem

Reply #5
I have tried both minfrac=0 and minsamp=0 in the second peak grouping step.  I also tried trivial non zero values for the minfrac step, the feature representing the standard molecular ion is absent in either case.

Corey

Re: group or peakTable problem

Reply #6
Strange. Maybe retcor is doing something unexpected. What are the corrected retention times of these two features in xset5 ?

Re: group or peakTable problem

Reply #7
The retention time was adjusted by two seconds.  Increasing the bw for the post-retention time correction xset does not result in the 291 peak making it into the peakTable.  Any other ideas?  Could it be a bug somewhere?  This behavior doesn't make sense to me at all. 

> xset4@peaks[which(xset4@peaks[,"mz"] > 291.086 & xset4@peaks[,"mz"] < 291.087),]
          mz    mzmin    mzmax      rt  rtmin  rtmax      into      intb      maxo    sn    egauss      mu    sigma        h    f dppm scale
[1,] 291.0866 291.0862 291.0870 109.8970 106.897 112.896 1349308.18 1347901.04 338777.25  873 0.08150396 245.1626 3.932065 340105.13 4863    2    2
[2,] 291.0863 291.0818 291.0898 111.3708 107.112 117.395  59929.07  59922.21  15187.01 15186 0.11094711 244.9695 4.436207  13578.48 2527    4    -1
    scpos scmin scmax lmin lmax sample
[1,]  244  242  246  49  63    51
[2,]    -1    -1    -1  48  72    52

> xset5@peaks[which(xset5@peaks[,"mz"] > 291.086 & xset5@peaks[,"mz"] < 291.087),]
          mz    mzmin    mzmax      rt    rtmin    rtmax      into      intb      maxo    sn    egauss      mu    sigma        h    f dppm scale
[1,] 291.0866 291.0862 291.0870 107.7186 104.7037 110.7324 1349308.18 1347901.04 338777.25  873 0.08150396 245.1626 3.932065 340105.13 4863    2    2
[2,] 291.0863 291.0818 291.0898 109.0032 104.7192 115.0076  59929.07  59922.21  15187.01 15186 0.11094711 244.9695 4.436207  13578.48 2527    4    -1
    scpos scmin scmax lmin lmax sample
[1,]  244  242  246  49  63    51
[2,]    -1    -1    -1  48  72    52
> xset6@peaks[which(xset6@peaks[,"mz"] > 291.086 & xset6@peaks[,"mz"] < 291.087),]
          mz    mzmin    mzmax      rt    rtmin    rtmax      into      intb      maxo    sn    egauss      mu    sigma        h    f dppm scale
[1,] 291.0866 291.0862 291.0870 107.7186 104.7037 110.7324 1349308.18 1347901.04 338777.25  873 0.08150396 245.1626 3.932065 340105.13 4863    2    2
[2,] 291.0863 291.0818 291.0898 109.0032 104.7192 115.0076  59929.07  59922.21  15187.01 15186 0.11094711 244.9695 4.436207  13578.48 2527    4    -1
    scpos scmin scmax lmin lmax sample
[1,]  244  242  246  49  63    51
[2,]    -1    -1    -1  48  72    52
>

> peaklist <- peakTable(xset6, value="into")
> peaklist[which(peaklist[,"mz"] > 291.05 & peaklist[,"mz"] < 291.11),]
 [1] mz                    mzmin                mzmax                rt                    rtmin                rtmax               
 [7] npeaks                Library_serumC8      Library_120912_011001 Library_120917_018301 Library_120917_009801 Library_120912_007801
[13] Library_120925_000701 Library_120921_013201 Library_120912_013501 Library_120817_016501 Library_120912_004001 Library_120917_000701
[19] Library_120917_011601 Library_120917_013301 Library_120912_017501 Library_120921_003901 Library_120917_003901 Library_120817_000901
[25] Library_120912_004701 Library_120817_015501 Library_120917_001301 Library_120912_000401 Library_120917_002201 Library_120921_002601
[31] Library_120925_001901 Library_120817_004801 Library_120925_001401 Library_120912_011002 Library_120917_018302 Library_120917_009802
[37] Library_120912_007802 Library_120925_000702 Library_120921_013202 Library_120912_013502 Library_120817_016502 Library_120912_004002
[43] Library_120917_000702 Library_120917_011602 Library_120917_013302 Library_120912_017502 Library_120921_003902 Library_120917_003902
[49] Library_120817_000902 Library_120912_004702 Library_120817_015502 Library_120917_001302 Library_120912_000402 Library_120917_002202
[55] Library_120921_002602 Library_120925_001902 Library_120817_004802 Library_120925_001402 Library_120817_004401 Library_120817_004402
<0 rows> (or 0-length row.names)
>

> peaklist <- peakTable(xset4, value="into")
> peaklist[which(peaklist[,"mz"] > 291.05 & peaklist[,"mz"] < 291.11),]
 [1] mz                    mzmin                mzmax                rt                    rtmin                rtmax               
 [7] npeaks                Library_serumC8      Library_120912_011001 Library_120917_018301 Library_120917_009801 Library_120912_007801
[13] Library_120925_000701 Library_120921_013201 Library_120912_013501 Library_120817_016501 Library_120912_004001 Library_120917_000701
[19] Library_120917_011601 Library_120917_013301 Library_120912_017501 Library_120921_003901 Library_120917_003901 Library_120817_000901
[25] Library_120912_004701 Library_120817_015501 Library_120917_001301 Library_120912_000401 Library_120917_002201 Library_120921_002601
[31] Library_120925_001901 Library_120817_004801 Library_120925_001401 Library_120912_011002 Library_120917_018302 Library_120917_009802
[37] Library_120912_007802 Library_120925_000702 Library_120921_013202 Library_120912_013502 Library_120817_016502 Library_120912_004002
[43] Library_120917_000702 Library_120917_011602 Library_120917_013302 Library_120912_017502 Library_120921_003902 Library_120917_003902
[49] Library_120817_000902 Library_120912_004702 Library_120817_015502 Library_120917_001302 Library_120912_000402 Library_120917_002202
[55] Library_120921_002602 Library_120925_001902 Library_120817_004802 Library_120925_001402 Library_120817_004401 Library_120817_004402
<0 rows> (or 0-length row.names)
>

Re: group or peakTable problem

Reply #8
Anyone have any suggestions as to why a feature in two of my samples (out of 52) does not make into my peakTable?  Any tips are appreciated.  Thanks,
Corey

Re: group or peakTable problem

Reply #9
Hmm, really strange, my first idea would also go into the direction of setting minfrac and minsamp to 0.
Could be a very rare bug  :?:

Perhaps you could try the other grouping method group.nearest and look if it performs better.

Cheers,
Carsten

Re: group or peakTable problem

Reply #10
After some debugging I think I have an idea, where the problem lies.
The group.density function bins all m/z values (meaning all m/z values over the whole retention time) and then loops over these bins.
It defines a number of peaks which has to occur in that bin, otherwise it loops into the next.
The value is defined as the minimum number of samples per class divided per two.
So for the faahKO data set this number is 3, because we have 6 samples per class.

If the number of peaks within that bin is higher than the threshold, the density function divides the m/z bin into different retention time groups.
For these groups the minfrac and minsamp values are then checked.

I have no idea why this threshold exist and if it can be safely removed.

Carsten

Re: group or peakTable problem

Reply #11
thanks carsten,

I am perplexed.  I haven't used the other grouping method but can try it.  I have tried setting the minsamp or minfrac to zero and that doesn't work.  If I take the two samples that contain the catechin molecular ion and group then the catechine molecular ion makes it intot he peak table.  If I use the c() command to add this to my 'background' xcms set, then group - the catechin molecular ion, while still present in the xcms@peaks slot, is absent from the peak Table. 

Is it possible that it is a function of the c() step?  I don't use this routinely, but was hoping to do so in this instance so that I didn't have to perform peak detection on the background files each time - just once.

Corey

Re: group or peakTable problem

Reply #12
Carsten,

Just saw your second response.  Thanks for the help.  I will try the other grouping method instead in the short term.

Corey

Re: group or peakTable problem

Reply #13
This is actually a little concerning.  I do not routinely structure my files for XCMS to recognize groups.  So if all my files (52) are in a single group, then I am only recovering features that are present in 1/2 of my files - 26 samples?  Shouldn't the minfrac take precendence over that value? 

Corey

Re: group or peakTable problem

Reply #14
FYI,

group.nearest did work for this application.  The catechin molecular ion was present in two of the 52 files, and ended up as a feature group in the peakTable.  I am a little shocked at how many more feature groups there are:  nearly 10 fold more, but that was a first run through, and I can't say that I have optimized parameters at all.  It does take quite a bit longer than the density algorithm.  Thanks for the tips.
Corey