Isotope annotation and grouping: approach question

November 16, 2011, 05:24:33 PM

The original publication describing CAMERA suggests that isotope and adduct annotation is performed using a sliding retention time window, so that isotopes with non-identical retention times can be recognized for all features. In the user guide accessed in R by typing ?findIsotopes, the recommendation, for the sake of performance, is to first group the peaks into pseudospectra. If one first groups peaks using groupFWHM() and then performs isotope and adduct annotation, does the sliding window only apply within a grouped pseudospectrum?

If not, then there is a real possibility that two features that would be correlated by the validation tools within groupCorr might already be separated at that point, as groupFWHM groups by retention time with each window centered on an abundant feature. Does this sound correct?

Also, if a feature is assigned to a pseudospectrum by groupFWHM, and is removed from a pseudospectrum with groupCorr, is there any effort made to regroup the removed feature with other removed features within a range of retention times?

Just trying to understand the overall approach. Thanks.

Re: Isotope annotation and grouping: approach question

Reply #1 – November 18, 2011, 09:20:09 AM

Maybe I should rephrase the question a bit. I am interested in using CAMERA to help in the ID process. I am doing so starting from the features that are the most statistically interesting, based on the experimental design. These features are often not the most abundant features, which are the ones CAMERA's groupFWHM seems to use as group centers. Does the approach used by groupFWHM compromise the grouping of the lower-abundance features? If so, is there a way to make groupFWHM center the retention time window on the feature of interest, rather than on the largest feature? Thanks.

Re: Isotope annotation and grouping: approach question

Reply #2 – November 18, 2011, 12:41:13 PM

If chromatographically resolved, feature groups should be well defined, independent of which feature is used as the center (starting point) for that group.
However, the feature with the highest intensity gives the best estimate of the expected retention time window for grouping, based on its FWHM.

The correlation-based validation was designed to eliminate false positive assignments and to find subgroups (in the case of overlapping feature groups).
No features are added to the groups at this point.

Re: Isotope annotation and grouping: approach question

Reply #3 – November 18, 2011, 04:57:25 PM

Thanks Ralf,

It is the first three words of your response that I am puzzled about. "Chromatographically resolved" doesn't often apply to complex samples. So if I have incomplete resolution of an abundant peak and a low-abundance peak, CAMERA is going to group the features from the low-abundance peak with the features from the high-abundance peak. Let's say that these two peaks are separated by 1.5 seconds, and you are using FWHM and sigma values for grouping which make your groupFWHM retention time window about 3.5 seconds wide, i.e. 1.75 seconds on either side (hypothetically). Some of the features from the less abundant peak are going to fall within the groupFWHM window, while others will not. This means that, since we are using an abundance-based selection process to direct the grouping, the less abundant peaks will tend to be misgrouped at the first groupFWHM step. The correlation-based filters can then remove the less abundant peak's features from the group containing the abundant peak, but the features that have been removed from the original group are now without a group (or in a group by themselves), correct? While the subgrouping is a nice feature, if a true group has been split because of its proximity to the retention time window boundaries of a major group, there is no way to put the lower-abundance group back together, correct?

It seems that the solution to this would be a function that allows the user to select the feature of interest and center the retention time window around it, rather than around the nearest large peak. I have been trying to figure out how to do this myself, but haven't really succeeded. Basically, I just want to target the CAMERA process to a particular feature, rather than run it for the whole dataset, since the grouping process is driven by the major features.
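
The hypothetical numbers in this scenario can be made concrete with a small base-R sketch. Only the 1.5 s offset and the 1.75 s half-window come from the paragraph above; the individual retention times and the ion names are invented for illustration.

```r
# Hypothetical retention times in seconds; all values are made up.
rt_major    <- 100.0   # apex of the abundant peak (window center)
half_window <- 1.75    # half of the ~3.5 s grouping window

# Features of the low-abundance compound, nominally 1.5 s later, with a
# little apex-picking jitter between individual ions.
rt_minor <- c(ion_a = 101.3, ion_b = 101.5, ion_c = 101.8, ion_d = 102.0)

# A feature joins the major group only if its apex lies inside the window.
in_window <- abs(rt_minor - rt_major) < half_window
split(names(rt_minor), ifelse(in_window, "grouped_with_major", "left_out"))
```

With these invented values, two ions of the minor compound land in the major group and two are left out, which is the split described above.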

Re: Isotope annotation and grouping: approach question

Reply #4 – November 18, 2011, 05:48:29 PM

The assumption we make here is that, even at low abundance, the peak center of the smaller peak should still be within the FWHM range of the larger peak.

Do you have an example or data where this assumption doesn't hold?

Re: Isotope annotation and grouping: approach question

Reply #5 – November 20, 2011, 03:39:48 PM

The problem with dividing peaks by retention time is how to calculate and select the boundaries.
As Ralf already mentioned, we start with the feature with the highest intensity, assign to it the other features within its FWHM window, and then move on to the feature with the second-highest intensity.
Under the assumption that the feature detection retrieves the correct retention time in 100% of cases, this works quite well.
If the retention time of a low-abundance peak is miscalculated, your scenario can happen, but it should occur rarely.

A user-supplied list of feature retention time centers, as you suggested, would have the same "binning" problem.

As a small hack, you could change the intensity of your feature of interest to the highest intensity. The algorithm would then start with it, and it would end up in pseudospectrum 1. Afterwards you can change the value back.

Re: Isotope annotation and grouping: approach question

Reply #6 – November 21, 2011, 03:52:44 PM

Thanks Ralf and Carsten,

I just want to make sure I am understanding the procedure used.

1. The largest peak is selected.
2. FWHM window is assigned
3. all other features within the window are now grouped.
4. Go to the next highest abundance feature in the entire dataset
5. repeat steps 2. and 3.
6. Correlational analysis for peak shape (within peak) and dataset wide (between peaks)
7. features removed from initial FWHM groupings are assigned to a new group (question: is there any attempt to regroup these?).
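
Steps 1 to 5 can be sketched on a toy peak table in base R. This is a simplified illustration, not CAMERA's code: `greedy_rt_group` and the fixed `half_width` column are hypothetical, and the retention times and intensities are invented to resemble the matrix / 195 / 138 situation discussed in this thread.

```r
# Toy version of the greedy, intensity-ordered retention-time grouping.
# NOT CAMERA's implementation; names and values are hypothetical.
greedy_rt_group <- function(peaks) {
  group   <- rep(NA_integer_, nrow(peaks))
  next_id <- 1L
  # Visit features from most to least intense.
  for (i in order(peaks$intensity, decreasing = TRUE)) {
    if (!is.na(group[i])) next               # already claimed by a larger feature
    inside <- abs(peaks$rt - peaks$rt[i]) < peaks$half_width[i]
    group[inside & is.na(group)] <- next_id  # claim all unassigned features in the window
    next_id <- next_id + 1L
  }
  group
}

peaks <- data.frame(
  name       = c("matrix", "mz195", "mz138"),
  rt         = c(100.0, 101.5, 102.0),
  intensity  = c(1e6, 2e4, 1e4),
  half_width = c(1.75, 1.75, 1.75),
  stringsAsFactors = FALSE
)
peaks$group <- greedy_rt_group(peaks)
peaks[, c("name", "rt", "group")]
```

In this toy table the large matrix feature seeds the first window and claims mz195, while mz138 falls just outside and ends up in a group of its own.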

When I am collecting data for people it is much more likely that the features of most interest are small, rather than large. And by extension of the same logic used when you developed CAMERA, a retention time based grouping is going to work best when centered around the feature of interest. It will still suffer from the 'binning' limitation, but will suffer less if the bin is centered on the feature that is changing in response to the treatment than if centered on a nearby large feature.

To provide a more concrete example: I recently ran a test dataset in which I spiked a plant extract with five compounds, including caffeine. I ran the samples and processed the data with XCMS and CAMERA. At low spike levels, the molecular ion for caffeine was grouped with several ions that were part of the plant 'matrix' (background), which were present at higher levels. There is also an in-source fragment at m/z 138. Though I know this is a genuine in-source fragment, it was not part of the CAMERA grouping, as there was a nearby background peak which shifted the average retention time by about 1.5-2 seconds, due to the broadening of the 138 peak. The goal is that the 138 fragment and the 195 parent should be in the same group after FWHM grouping. At low abundance, they are not. From my understanding (which is incomplete, so if I am mistaken, please do correct me), this is because the CAMERA grouping was built not around 195, but around a larger 'matrix' feature, whose window included 195 but not 138. So I then have a CAMERA spectrum containing one feature representing caffeine, while the other caffeine feature is in a different group. I have no idea how commonly this would occur, but given that interesting features are often relatively low in abundance, I suspect it isn't rare. If, on the other hand, the FWHM grouping was centered around 195, there is a good chance that the two features would be grouped together, at least after the FWHM step. They may then be separated based on peak shape, as the interfering matrix 138 would result in a different peak shape than that of the parent 195. But the dataset-wide correlation may still retain it if a peak shape filter isn't applied.

Carsten, regarding your 'small hack'. How would one change that value? Also, on a possibly related note: how does the automatic file selection work for the CiS (peak shape) filter?

Thanks for all the feedback. I am trying to get the most out of these programs, and it helps a lot to have the authors/developers so accessible.

Re: Isotope annotation and grouping: approach question

Reply #7 – November 24, 2011, 09:01:14 AM

Quote from: "cbroeckl"

1. The largest peak is selected.
....
7. features removed from initial FWHM groupings are assigned to a new group (question: is there any attempt to regroup these?).

That is the exact procedure; steps 1-5 are contained in groupFWHM and steps 6-7 in groupCorr.
At the moment there is unfortunately no direct attempt to regroup the peaks.

A possible solution could be to raise the perfwhm parameter until both peaks are contained in one group.
A higher perfwhm value results in larger retention time windows. This way more compounds would fall into one group before groupCorr.
Would that be okay for your analysis?
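
To get a feel for how much perfwhm widens the window, here is a rough base-R sketch. The formula is an assumption about how the window is derived (peak base width treated as `sigma` standard deviations, FWHM taken as 2.35 standard deviations, then scaled by `perfwhm`); it is not stated in this thread and should be checked against the CAMERA source. `rt_half_window` is a hypothetical helper, and the rtmin/rtmax values are those of peak 13 in the faahKO example further down in this reply.

```r
# ASSUMED relationship between peak width, sigma and perfwhm; verify against
# the CAMERA source before relying on it. `rt_half_window` is hypothetical.
rt_half_window <- function(rtmin, rtmax, sigma = 6, perfwhm = 0.6) {
  fwhm <- (rtmax - rtmin) / sigma * 2.35   # base width -> estimated FWHM
  fwhm * perfwhm / 2                       # half-width of the grouping window
}

# rtmin / rtmax (seconds) of peak 13 in the faahKO example
perfwhm_values <- c(0.6, 1.0, 1.5)
half_widths <- rt_half_window(2504.508, 2534.242, perfwhm = perfwhm_values)
data.frame(perfwhm = perfwhm_values, half_window_s = round(half_widths, 2))
```

Under this assumption the window grows linearly with perfwhm, so raising it pulls more neighbouring features into each pseudospectrum before groupCorr is applied.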

Quote from: "cbroeckl"

Carsten, regarding your 'small hack'. How would one change that value?

Possible hack for the faahKO data set:
It is a little bit longer, but I hope it is reproducible.
We assume feature 10 is the m/z 195 peak.

library(CAMERA)
library(faahKO)

# Alignment for the faahko data set
xs <- group(faahko)

# Show intensities for feature 10
groupval(xs, value = "maxo")[10, ]
#  ko15  ko16  ko18  ko19  ko21  ko22  wt15  wt16  wt18  wt19  wt21  wt22
# 12957  8557  3291    NA  7865 10080 11416  9831    NA  4230 12368  9892
# The first sample has the highest intensity for this feature

# Look up the peak indices
groupval(xs)[10, ]
# ko15 ko16 ko18 ko19 ko21 ko22 wt15 wt16 wt18 wt19 wt21 wt22
#   13  495 1051   NA 1787 2112 2450 2907   NA 3758 4060 4374
# In the first sample, feature 10 has peak index 13

# Take a quick look at this peak
xs@peaks[13, ]
#          mz       mzmin       mzmax          rt       rtmin       rtmax        into        intf        maxo        maxf      sample
#    236.0956    236.0000    236.1000   2518.5930   2504.5080   2534.2420 252282.0354 472730.1389  12957.0000  25108.7063      1.0000
xs@peaks[13, "maxo"]
#  maxo
# 12957

# Save the old maxo value
oldvalue <- xs@peaks[13, "maxo"]
# Change the value to an artificial maximum
xs@peaks[13, "maxo"] <- 10000000

# Use CAMERA
xsa <- xsAnnotate(xs)
xsa.group <- groupFWHM(xsa)
# Start grouping after retention time.
# Created 133 pseudospectra.

# Show the first group (truncated for brevity)
getpspectra(xsa.group, 1)[, 1:6]
#         mz    mzmin    mzmax       rt    rtmin    rtmax
# 1 219.0848 219.0488 219.1000 2524.852 2518.592 2529.547
# 2 236.1018 236.0678 236.1188 2523.287 2518.592 2529.547
# 3 315.0000 315.0000 315.0000 2520.939 2507.638 2545.199
# 4 316.0000 316.0000 316.0230 2520.939 2509.203 2543.634
# 5 332.0000 332.0000 332.0121 2520.157 2507.638 2545.199
# 6 333.0012 333.0000 333.0286 2520.157 2507.638 2545.199
# 7 334.0316 334.0170 334.0733 2520.939 2509.203 2545.199
# 8 337.0000 336.9918 337.0000 2520.157 2509.203 2545.199

# We see feature 10 is in group 1
# Write the correct intensity back
xsa.group@xcmsSet@peaks[13, "maxo"] <- oldvalue

Hope this helps

Quote from: "cbroeckl"

how does the automatic file selection work for the CiS (peak shape) filter?

For each pseudospectrum, the automatic sample selection chooses the sample that contains the most abundant peak.
The idea is similar to groupFWHM: the most abundant peaks are expected to have the "nicest" peak shape.

Carsten
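
The selection rule described in this reply can be illustrated on a toy intensity matrix in base R. This is only a sketch of the stated idea, not CAMERA's code; the intensity values, the group membership and the helper `pick_sample` are invented.

```r
# Toy intensity matrix: rows are features, columns are samples (made-up values).
intensities <- matrix(
  c(12957,  8557, 3291,
      400,   900,  350,
     5000, 20000, 7000,
      300,   800,   NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("f", 1:4), c("ko15", "ko16", "ko18"))
)
pspectrum <- c(f1 = 1, f2 = 1, f3 = 2, f4 = 2)   # hypothetical group membership

# For one pseudospectrum, return the sample holding its most intense peak.
pick_sample <- function(rows) {
  sub     <- intensities[rows, , drop = FALSE]
  col_max <- apply(sub, 2, max, na.rm = TRUE)    # best peak per sample
  names(which.max(col_max))
}

selected <- sapply(split(names(pspectrum), pspectrum), pick_sample)
selected
```

Each pseudospectrum gets exactly one sample, so a low-abundance compound that shares a group with a large matrix peak is evaluated in the sample where the matrix peak, not the compound of interest, is strongest.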

Re: Isotope annotation and grouping: approach question

Reply #8 – November 28, 2011, 01:54:41 PM

Thanks Carsten,

I am hesitant to simply broaden the FWHM window for this particular instance: I want this to be a broadly applicable tool, so I can't optimize for a single chromatographic peak. What I may try is to apply your 'hack' to each feature of interest, looping through all the features I am interested in, just to see how that works. I really appreciate the advice and clarification.

Corey
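
The looping idea can be sketched on a toy peak table in base R. `greedy_rt_group` is a hypothetical stand-in for groupFWHM and `group_around` is an invented helper; with real data the corresponding step would be the temporary overwrite of `xs@peaks[idx, "maxo"]` shown in Carsten's example, followed by a fresh groupFWHM call for each feature of interest.

```r
# Stand-in for groupFWHM on a toy table (hypothetical, not CAMERA's code).
greedy_rt_group <- function(peaks) {
  group <- rep(NA_integer_, nrow(peaks))
  id <- 1L
  for (i in order(peaks$intensity, decreasing = TRUE)) {
    if (!is.na(group[i])) next
    inside <- abs(peaks$rt - peaks$rt[i]) < peaks$half_width[i]
    group[inside & is.na(group)] <- id
    id <- id + 1L
  }
  group
}

# For one feature of interest: boost its intensity on a copy of the table,
# regroup, and return the members of the group it seeds.
group_around <- function(peaks, target) {
  boosted <- peaks                                   # original stays untouched
  boosted$intensity[boosted$name == target] <- max(peaks$intensity) * 10
  grp <- greedy_rt_group(boosted)
  peaks$name[grp == grp[boosted$name == target]]
}

peaks <- data.frame(
  name = c("matrix", "mz195", "mz138"),
  rt = c(100.0, 101.5, 102.0),
  intensity = c(1e6, 2e4, 1e4),
  half_width = 1.75,
  stringsAsFactors = FALSE
)
targets <- c("mz195", "mz138")
result <- lapply(setNames(targets, targets), function(t) group_around(peaks, t))
result
```

Because each target is regrouped on its own copy, the results for different targets can overlap or disagree; reconciling them is left open here, as it is in the thread.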

Re: Isotope annotation and grouping: approach question

Reply #9 – November 28, 2011, 03:34:58 PM

Carsten,

I don't know how difficult this would be, but could you, rather than assigning features that are removed using groupCorr to a new group, leave those removed features unassigned? The next step being to perform groupFWHM again, only on unassigned peaks, another groupCorr on the new groupings, etc etc? This might be a more versatile workaround, ultimately, than using the 'hack' described above. Just a thought.

Re: Isotope annotation and grouping: approach question

Reply #10 – November 29, 2011, 06:24:15 AM

Quote from: "cbroeckl"

Carsten,

I don't know how difficult this would be, but could you, rather than assigning features that are removed using groupCorr to a new group, leave those removed features unassigned? The next step being to perform groupFWHM again, only on unassigned peaks, another groupCorr on the new groupings, etc etc? This might be a more versatile workaround, ultimately, than using the 'hack' described above. Just a thought.

That would require a lot of changes in the source code, because it contradicts the original design.

Another idea that came to mind is to skip the groupFWHM part and run only groupCorr. This is possible but requires many more calculations, because all features are put into one big group,
and the subsequent graph separation needs a lot more memory. It could be worth a try, depending on your workstation.

Carsten

Re: Isotope annotation and grouping: approach question

Reply #11 – November 30, 2011, 12:48:26 PM

Thanks Carsten,
I'll have to try that one too. I think I should have plenty of memory.
Corey

Re: Isotope annotation and grouping: approach question

Reply #12 – December 07, 2011, 01:48:52 PM

Carsten,

Could you point me to more information on the highly connected subgraph and label-propagation community algorithms and how they are used in CAMERA? Would they work as tools to group features without going through the groupFWHM steps first? Thanks,
Corey

Re: Isotope annotation and grouping: approach question

Reply #13 – December 13, 2011, 05:56:37 AM

Hi Corey,

In general we assume that all features originating from one substance share high EIC correlations. Correlations to features from other substances are significantly lower or occur by chance.
Both algorithms work in two parts, of which the first is identical.
As a first step we calculate the pairwise EIC correlation matrix between all features of one group (normally predefined by groupFWHM).
Afterwards we build a graph with the features as nodes. For the edges we use the correlation value, integrated with additional information such as recognized isotopes.
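
A minimal base-R sketch of this first part, using simulated EICs. The peak shapes, the noise level and the 0.75 cutoff are assumptions made for illustration, and only the correlation is used for the edges; the isotope information mentioned above is left out.

```r
# Simulated EICs: rows are scans, columns are features. Shapes are invented.
scans <- 1:30
gauss <- function(center, width) exp(-(scans - center)^2 / (2 * width^2))
set.seed(1)
noise <- function() rnorm(length(scans), sd = 0.01)
eic <- cbind(
  a1 = 100 * gauss(14, 3) + noise(),   # compound A, two ions
  a2 =  40 * gauss(14, 3) + noise(),
  b1 =  80 * gauss(18, 2) + noise(),   # coeluting compound B, two ions
  b2 =  20 * gauss(18, 2) + noise()
)

cor_mat   <- cor(eic)                  # pairwise Pearson correlation of EICs
threshold <- 0.75                      # assumed cutoff for drawing an edge
adjacency <- cor_mat >= threshold & !diag(ncol(eic))   # no self-loops
weights   <- cor_mat * adjacency       # edge weights; 0 means no edge
round(weights, 2)
```

Ions of the same simulated compound end up connected by an edge with weight close to 1, while the partially overlapping compounds A and B stay unconnected at this cutoff.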

The methods differ at the second step. If the predefined group contains more than one compound (coeluting substances), both algorithms try to separate them at the graph level.
The hcs algorithm cuts low-correlation edges, which leaves the desired highly connected subgraphs.
The lpc algorithm labels all features with a unique number and then updates the labels by majority voting among the neighboring features, taking the edge weights into account. This produces consensus groups, each with a unique label.
Both methods return the separated feature groups.

In my experience, hcs produces more singletons due to the edge cutting, whereas lpc tends to hold the ions of a compound together better.
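
The label-propagation idea can be sketched in a few lines of base R. This is a simplified, deterministic variant written for illustration (fixed visiting order, first-maximum tie-breaking); it is not CAMERA's lpc implementation, and the graph weights are invented.

```r
# Weighted adjacency for six features: two tight triangles joined by one
# weak edge. All weights are invented.
w <- matrix(0, 6, 6, dimnames = list(paste0("f", 1:6), paste0("f", 1:6)))
edges <- rbind(c(1, 2, 0.95), c(1, 3, 0.90), c(2, 3, 0.92),
               c(4, 5, 0.93), c(4, 6, 0.91), c(5, 6, 0.94),
               c(3, 4, 0.30))
for (k in seq_len(nrow(edges))) {
  w[edges[k, 1], edges[k, 2]] <- w[edges[k, 2], edges[k, 1]] <- edges[k, 3]
}

label_propagation <- function(w, max_iter = 20) {
  labels <- seq_len(nrow(w))                 # every node starts with its own label
  for (iter in seq_len(max_iter)) {
    changed <- FALSE
    for (i in seq_len(nrow(w))) {            # fixed visiting order (deterministic)
      votes <- tapply(w[i, ], labels, sum)   # summed edge weight per label
      if (max(votes) == 0) next              # isolated node keeps its label
      best <- as.integer(names(votes)[which.max(votes)])
      if (best != labels[i]) {
        labels[i] <- best
        changed <- TRUE
      }
    }
    if (!changed) break                      # converged
  }
  setNames(labels, rownames(w))
}

communities <- label_propagation(w)
split(names(communities), communities)
```

On this toy graph the weak 0.30 edge is outvoted on both sides, so the two triangles end up as separate groups without any edge being cut explicitly.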

Quote from: "cbroeckl"

Would they work as tools to group features without going through the groupFWHM steps first?

In general, based on the implementation, it is possible to use groupCorr without groupFWHM. In this case all ions are considered as one big group. I would suggest using lpc, because hcs runs into runtime and memory problems with large groups, whereas lpc is stated to run in "near linear time". Even so, I would expect a considerably longer runtime.

Re: Isotope annotation and grouping: approach question

Reply #14 – December 14, 2011, 11:25:42 AM

I tried using this command:

an<-groupCorr(xset5, calcCiS=FALSE, calcCaS=TRUE, graphMethod="lpc")

and get an error:
Calculating peak correlations across samples.
% finished: 100 Error: cannot allocate vector of size 506.6 Mb

xset5 is an xsAnnotate object built from a filled, aligned xcmsSet object.
I have plenty of RAM (8 GB installed, 64-bit Windows 7), so why do I get this error? Thanks again for all the advice.

Corey
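
A back-of-envelope check of the failed allocation, in base R. It assumes, without confirmation from the error message, that the vector in question is a dense feature-by-feature matrix of doubles, which is what a correlation matrix over one big group would look like.

```r
# Rough size check; the dense n-by-n assumption is a guess, not something
# the error message confirms.
failed_mb  <- 506.6                # size reported in the error message
bytes      <- failed_mb * 1024^2   # R reports Mb in units of 2^20 bytes
n_doubles  <- bytes / 8            # one double takes 8 bytes
n_features <- sqrt(n_doubles)      # side length if it is an n x n matrix
round(n_features)
```

Under that assumption the request corresponds to roughly eight thousand features in a single group. The message refers only to the one additional vector that could not be allocated, so it can appear well below the installed RAM when objects of similar size are already held in memory.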