Metabolomics Society Forum

Software => R => XCMS => Topic started by: cbroeckl on November 27, 2012, 11:54:55 AM

Title: group or peakTable problem
Post by: cbroeckl on November 27, 2012, 11:54:55 AM
I am trying to use XCMS to perform peak detection on some authentic standards to get a nice clean spectrum. 
Here is an example for the compound catechin:

stand1<-xcmsSet(filenames, nSlaves=2, method="centWave", ppm=25, peakwidth=c(1.5,15), mzdiff=0.01, fitgauss=TRUE, verbose.columns=TRUE)
xstand@peaks[which(xstand@peaks[,"mz"] > 291.086 & xstand@peaks[,"mz"] < 291.087),]

          mz    mzmin    mzmax      rt  rtmin  rtmax      into      intb      maxo    sn    egauss      mu    sigma        h    f dppm scale
[1,] 291.0866 291.0862 291.0870 109.8970 106.897 112.896 1349308.18 1347901.04 338777.25  873 0.08150396 245.1626 3.932065 340105.13 4863    2    2
[2,] 291.0863 291.0818 291.0898 111.3708 107.112 117.395  59929.07  59922.21  15187.01 15186 0.11094711 244.9695 4.436207  13578.48 2527    4    -1
    scpos scmin scmax lmin lmax sample
[1,]  244  242  246  49  63    51
[2,]    -1    -1    -1  48  72    52

So I am getting the molecular ion that I know to be there in the feature list. 

I next try to concatenate this xset (stand1) with a 'background' xset, group them, retention time correct, and regroup:

xstand<-c(xset1, stand1)
xset4 <- group(xstand, bw=2, minfrac=0.5, max = 50, mzwid=0.02)
xset5 <- retcor(xset4,  method="loess", family = "gaussian", plottype = "mdevden", span=2, missing=round(length(dataset)*0.05, digits=0))
xset6 <- group(xset5, bw=1, minsamp=0, max= 1000, mzwid=0.02)
xset6

An "xcmsSet" object with 52 samples

Time range: -0.4-1330.2 seconds (0-22.2 minutes)
Mass range: 55.0173-1199.8205 m/z
Peaks: 70298 (about 1352 per sample)
Peak Groups: 397
Sample classes: Library_serumC8

Profile settings: method = bin
                  step = 0.1

Memory usage: 23.1 MB

And this appears to have worked, however, when I generate the peak table and look for my peak of interest, it isn't there. 

peaklist <- peakTable(xset6, value="into")
peaklist[which(peaklist[,"mz"] > 291.05 & peaklist[,"mz"] < 291.11),]

...returns an empty table. 

The original peak of interest (the molecular ion in this case) is there:
xset6@peaks[which(xset6@peaks[,"mz"] > 291.086 & xset6@peaks[,"mz"] < 291.087),]

      mz    mzmin    mzmax      rt    rtmin    rtmax      into      intb      maxo    sn    egauss      mu    sigma        h    f dppm scale
[1,] 291.0866 291.0862 291.0870 107.1962 104.1962 110.1952 1349308.18 1347901.04 338777.25  873 0.08150396 245.1626 3.932065 340105.13 4863    2    2
[2,] 291.0863 291.0818 291.0898 108.6867 104.4027 114.6857  59929.07  59922.21  15187.01 15186 0.11094711 244.9695 4.436207  13578.48 2527    4    -1
    scpos scmin scmax lmin lmax sample
[1,]  244  242  246  49  63    51
[2,]    -1    -1    -1  48  72    52

But is not making it into the peak table even though I have tried to set the minfrac or minsamp values to zero, or some trivial non-zero value such as 0.000001.  I could work around this and access the @peaks slot, but this seems to be reinventing the wheel.  And it isn't a mass accuracy grouping artifact, as there are no peak groups within several dalton of the 291.0866 peak in the xset6 peak table.  Any one have any idea how I can get a peakTable which hasn't filtered out the 'rare' features?

Thanks
Title: Re: group or peakTable problem
Post by: cbroeckl on December 03, 2012, 01:06:12 PM
Any one out there have a suggestion I am missing?  Shouldn't I be able to recover all the signals from peak detection in the grouped dataset if I set the minFrac setting to zero?  Thanks,
Corey
Title: Re: group or peakTable problem
Post by: Ralf on December 04, 2012, 01:43:46 PM
It should work but bw=1 might be a little bit too small.
How do these peaks look before retention time correction ?
Can you try without retcor, just setting bw=2 or 3  ?
Title: Re: group or peakTable problem
Post by: cbroeckl on December 04, 2012, 05:59:08 PM
Ralf,

I am a bit confused I think.  Even if they aren't properly aligned - they should be in the resulting dataset right?  I can play with the bw setting tommorrow, but assuming that works, can you explain to me why the bw matters?  Shouldn't the peak end up as its own column in the peaklist if it is present in one sample?
Title: Re: group or peakTable problem
Post by: Ralf on December 04, 2012, 07:00:19 PM
I see now what you mean. Can you make sure that you set minfrac=0 in your second grouping step (xset6 <- ) ?
Title: Re: group or peakTable problem
Post by: cbroeckl on December 05, 2012, 10:35:57 AM
I have tried both minfrac=0 and minsamp=0 in the second peak grouping step.  I also tried trivial non zero values for the minfrac step, the feature representing the standard molecular ion is absent in either case.

Corey
Title: Re: group or peakTable problem
Post by: Ralf on December 05, 2012, 11:57:24 AM
Strange. Maybe retcor is doing something unexpected. What are the corrected retention times of these two features in xset5 ?
Title: Re: group or peakTable problem
Post by: cbroeckl on December 07, 2012, 02:53:21 PM
The retention time was adjusted by two seconds.  Increasing the bw for the post-retention time correction xset does not result in the 291 peak making it into the peakTable.  Any other ideas?  Could it be a bug somewhere?  This behavior doesn't make sense to me at all. 

> xset4@peaks[which(xset4@peaks[,"mz"] > 291.086 & xset4@peaks[,"mz"] < 291.087),]
          mz    mzmin    mzmax      rt  rtmin  rtmax      into      intb      maxo    sn    egauss      mu    sigma        h    f dppm scale
[1,] 291.0866 291.0862 291.0870 109.8970 106.897 112.896 1349308.18 1347901.04 338777.25  873 0.08150396 245.1626 3.932065 340105.13 4863    2    2
[2,] 291.0863 291.0818 291.0898 111.3708 107.112 117.395  59929.07  59922.21  15187.01 15186 0.11094711 244.9695 4.436207  13578.48 2527    4    -1
    scpos scmin scmax lmin lmax sample
[1,]  244  242  246  49  63    51
[2,]    -1    -1    -1  48  72    52

> xset5@peaks[which(xset5@peaks[,"mz"] > 291.086 & xset5@peaks[,"mz"] < 291.087),]
          mz    mzmin    mzmax      rt    rtmin    rtmax      into      intb      maxo    sn    egauss      mu    sigma        h    f dppm scale
[1,] 291.0866 291.0862 291.0870 107.7186 104.7037 110.7324 1349308.18 1347901.04 338777.25  873 0.08150396 245.1626 3.932065 340105.13 4863    2    2
[2,] 291.0863 291.0818 291.0898 109.0032 104.7192 115.0076  59929.07  59922.21  15187.01 15186 0.11094711 244.9695 4.436207  13578.48 2527    4    -1
    scpos scmin scmax lmin lmax sample
[1,]  244  242  246  49  63    51
[2,]    -1    -1    -1  48  72    52
> xset6@peaks[which(xset6@peaks[,"mz"] > 291.086 & xset6@peaks[,"mz"] < 291.087),]
          mz    mzmin    mzmax      rt    rtmin    rtmax      into      intb      maxo    sn    egauss      mu    sigma        h    f dppm scale
[1,] 291.0866 291.0862 291.0870 107.7186 104.7037 110.7324 1349308.18 1347901.04 338777.25  873 0.08150396 245.1626 3.932065 340105.13 4863    2    2
[2,] 291.0863 291.0818 291.0898 109.0032 104.7192 115.0076  59929.07  59922.21  15187.01 15186 0.11094711 244.9695 4.436207  13578.48 2527    4    -1
    scpos scmin scmax lmin lmax sample
[1,]  244  242  246  49  63    51
[2,]    -1    -1    -1  48  72    52
>

> peaklist <- peakTable(xset6, value="into")
> peaklist[which(peaklist[,"mz"] > 291.05 & peaklist[,"mz"] < 291.11),]
 [1] mz                    mzmin                mzmax                rt                    rtmin                rtmax               
 [7] npeaks                Library_serumC8      Library_120912_011001 Library_120917_018301 Library_120917_009801 Library_120912_007801
[13] Library_120925_000701 Library_120921_013201 Library_120912_013501 Library_120817_016501 Library_120912_004001 Library_120917_000701
[19] Library_120917_011601 Library_120917_013301 Library_120912_017501 Library_120921_003901 Library_120917_003901 Library_120817_000901
[25] Library_120912_004701 Library_120817_015501 Library_120917_001301 Library_120912_000401 Library_120917_002201 Library_120921_002601
[31] Library_120925_001901 Library_120817_004801 Library_120925_001401 Library_120912_011002 Library_120917_018302 Library_120917_009802
[37] Library_120912_007802 Library_120925_000702 Library_120921_013202 Library_120912_013502 Library_120817_016502 Library_120912_004002
[43] Library_120917_000702 Library_120917_011602 Library_120917_013302 Library_120912_017502 Library_120921_003902 Library_120917_003902
[49] Library_120817_000902 Library_120912_004702 Library_120817_015502 Library_120917_001302 Library_120912_000402 Library_120917_002202
[55] Library_120921_002602 Library_120925_001902 Library_120817_004802 Library_120925_001402 Library_120817_004401 Library_120817_004402
<0 rows> (or 0-length row.names)
>

> peaklist <- peakTable(xset4, value="into")
> peaklist[which(peaklist[,"mz"] > 291.05 & peaklist[,"mz"] < 291.11),]
 [1] mz                    mzmin                mzmax                rt                    rtmin                rtmax               
 [7] npeaks                Library_serumC8      Library_120912_011001 Library_120917_018301 Library_120917_009801 Library_120912_007801
[13] Library_120925_000701 Library_120921_013201 Library_120912_013501 Library_120817_016501 Library_120912_004001 Library_120917_000701
[19] Library_120917_011601 Library_120917_013301 Library_120912_017501 Library_120921_003901 Library_120917_003901 Library_120817_000901
[25] Library_120912_004701 Library_120817_015501 Library_120917_001301 Library_120912_000401 Library_120917_002201 Library_120921_002601
[31] Library_120925_001901 Library_120817_004801 Library_120925_001401 Library_120912_011002 Library_120917_018302 Library_120917_009802
[37] Library_120912_007802 Library_120925_000702 Library_120921_013202 Library_120912_013502 Library_120817_016502 Library_120912_004002
[43] Library_120917_000702 Library_120917_011602 Library_120917_013302 Library_120912_017502 Library_120921_003902 Library_120917_003902
[49] Library_120817_000902 Library_120912_004702 Library_120817_015502 Library_120917_001302 Library_120912_000402 Library_120917_002202
[55] Library_120921_002602 Library_120925_001902 Library_120817_004802 Library_120925_001402 Library_120817_004401 Library_120817_004402
<0 rows> (or 0-length row.names)
>
Title: Re: group or peakTable problem
Post by: cbroeckl on December 12, 2012, 11:06:50 AM
Anyone have any suggestions as to why a feature in two of my samples (out of 52) does not make into my peakTable?  Any tips are appreciated.  Thanks,
Corey
Title: Re: group or peakTable problem
Post by: Carsten on December 13, 2012, 06:14:20 AM
Hmm, really strange, my first idea would also go into the direction of setting minfrac and minsamp to 0.
Could be a very rare bug  :?:

Perhaps you could try the other grouping method group.nearest and look if it performs better.

Cheers,
Carsten
Title: Re: group or peakTable problem
Post by: Carsten on December 13, 2012, 08:02:13 AM
After some debugging I think I have an idea, where the problem lies.
The group.density function bins all m/z values (meaning all m/z values over the whole retention time) and then loops over these bins.
It defines a number of peaks which has to occur in that bin, otherwise it loops into the next.
The value is defined as the minimum number of samples per class divided per two.
So for the faahKO data set this number is 3, because we have 6 samples per class.

If the number of peaks within that bin is higher than the threshold, the density function divides the m/z bin into different retention time groups.
For these groups the minfrac and minsamp values are then checked.

I have no idea why this threshold exist and if it can be safely removed.

Carsten
Title: Re: group or peakTable problem
Post by: cbroeckl on December 13, 2012, 10:37:09 AM
thanks carsten,

I am perplexed.  I haven't used the other grouping method but can try it.  I have tried setting the minsamp or minfrac to zero and that doesn't work.  If I take the two samples that contain the catechin molecular ion and group then the catechine molecular ion makes it intot he peak table.  If I use the c() command to add this to my 'background' xcms set, then group - the catechin molecular ion, while still present in the xcms@peaks slot, is absent from the peak Table. 

Is it possible that it is a function of the c() step?  I don't use this routinely, but was hoping to do so in this instance so that I didn't have to perform peak detection on the background files each time - just once.

Corey
Title: Re: group or peakTable problem
Post by: cbroeckl on December 13, 2012, 11:33:21 AM
Carsten,

Just saw your second response.  Thanks for the help.  I will try the other grouping method instead in the short term.

Corey
Title: Re: group or peakTable problem
Post by: cbroeckl on December 13, 2012, 11:46:53 AM
This is actually a little concerning.  I do not routinely structure my files for XCMS to recognize groups.  So if all my files (52) are in a single group, then I am only recovering features that are present in 1/2 of my files - 26 samples?  Shouldn't the minfrac take precendence over that value? 

Corey
Title: Re: group or peakTable problem
Post by: cbroeckl on December 13, 2012, 04:56:26 PM
FYI,

group.nearest did work for this application.  The catechin molecular ion was present in two of the 52 files, and ended up as a feature group in the peakTable.  I am a little shocked at how many more feature groups there are:  nearly 10 fold more, but that was a first run through, and I can't say that I have optimized parameters at all.  It does take quite a bit longer than the density algorithm.  Thanks for the tips.
Corey
Title: Re: group or peakTable problem
Post by: Carsten on December 20, 2012, 03:55:00 AM
I also observe that group.nearest produces more feature groups, but more doesn't mean "better"
But I haven't had the time to really compare both methods, so normally I stick to density, because I'm more familiar with it.

I spoke with Ralf about the group.density problem and we removed the threshold in the next xcms version,
because it's unnecessary.

That should also fix your problem.

Carsten
Title: Re: group or peakTable problem
Post by: cbroeckl on December 20, 2012, 06:01:28 PM
Thanks Carsten and Ralf.  I will upgrade soon then!
Corey
Title: Re: group or peakTable problem
Post by: Ralf on December 20, 2012, 07:39:31 PM
Thank you Carsten for looking into this.
XCMS version 1.35.3 includes the patch. Pull from SVN now or get it from BioC in a day or two.

Corey, please let us know if that fixes your problem.

Ralf