Way too many ions being assigned to the same compound

I'm trying to use CAMERA to annotate peaks from data collected in ESI- on a QToF in single-MS mode. I started off using the parameters listed in "LC-MS Peak Identification and Annotation with CAMERA" by Carsten Kuhl, Ralf Tautenhahn and Steffen Neumann, and then, when those parameters gave the result that many, many ions were all caused by the same compound, I tried adjusting. I've tried making my parameters stricter and stricter, and I've now got parameters that suggest that we've got the world's most amazingly mass- and retention-time accurate QToF, but I'm still coming up with the same number of grouped features every time. For example, a bunch of stuff co-elutes around 12.5 minutes, and CAMERA has put 137 ions into that pcgroup, and I just can't believe that one compound could really generate 137 ions. What am I doing wrong? Am I misunderstanding the output? I thought that CAMERA would take a peak-picked, peak-aligned and peak-filled XCMS object and determine which of all those mass features were caused by the same compound. For example, let's say that two compounds co-elute and each generates one Na adduct and one 13C peak in addition to their major, monoisotopic peak. Doesn't CAMERA then decipher those data and tell you, "Hey, it looks like you've got two different compounds that co-elute and these three peaks are because of compound A and those three peaks are because of compound B."? Wouldn't those two compounds have a different number listed under pcgroup?

Here's the code I'm using, in case that's illuminating.
Set1.annot <- xsAnnotate(Set1.filledpeaks)
Set1.F <- groupFWHM(Set1.annot, perfwhm=0.005)
Set1.C <- groupCorr(Set1.F, cor_eic_th=1.6, pval=0.001, calcIso=TRUE,
                      calcCiS=TRUE)
Set1.FI <- findIsotopes(Set1.C, maxcharge=3, maxiso=4, ppm=10,
                          mzabs=0.0001, intval="maxo", minfrac=0.1)
Set1.FA <- findAdducts(Set1.FI, ppm=10, mzabs=0.0001, multiplier=3,
                        polarity="negative", rules=NULL, max_peaks=100)
Set1.peaklist <- getPeaklist(Set1.FA)
write.csv(Set1.peaklist, file="Set1 annotated peaklist.csv")

By the way, what are the allowable values for cor_eic_th? Does it refer to equation 1 in Kuhl 2012, Analytical Chemistry? If so, can that parameter range from 0 to 3?

Thank you very much in advance!

Laura

Re: Way too many ions being assigned to the same compound

Reply #1
If the features really are perfectly co-eluting, then the correlation across the peaks would be perfect and calcCiS would not be able to tell that they are different compounds. Have you checked in the raw data whether they really are co-eluting?
If you have a reasonable number of samples you can try enabling calcCaS, which looks for correlation across samples. That way, features that co-elute perfectly but do not correlate across samples can be separated (if two features come from the same compound, then when one is high in a sample the other must be too).
If they are both perfectly coeluting and related in a way that makes them also highly correlated across samples then there is no magic that will tell you which truly belong together. Only MSn experiments can help you determine that.
Blog: stanstrup.github.io

Re: Way too many ions being assigned to the same compound

Reply #2
Thank you very much for your response! That clarified a lot! The software that I was using previously was Agilent's MassHunter Qualitative Analysis, and while its algorithm is proprietary and black-box, my understanding of how it works is that anytime it can reasonably assign an isotopic peak or an adduct peak to something, then and only then it will count those as the same compound. In other words, it assumes that co-eluting ions arise because of different compounds unless it has reason to believe they're the same. But what you're telling me, then, is that CAMERA works with the opposite assumption: CAMERA assumes that co-eluting ions are the same compound unless it has a reason to think that they are not. Wouldn't it be better the other way from a statistical perspective? If the burden of proof is to show that ions are caused by the same compound, then you're probably going to miss some ions that really are caused by the same compound and classify them as different. When you do statistical testing, then, the compounds will not be completely independent. On the other hand, if the burden of proof is to show that ions are caused by different compounds, then you'll sometimes mistakenly assign ions arising from multiple compounds as belonging to just one compound. If that happens, unless you're ridiculously lucky (or maybe unlucky), you'd probably have issues with false negatives because some compounds in that peak group might correlate with what you want and many would not. You'd increase the "noise" of your data a lot by misassignment of peak groups.

I see what you're saying about correlating across samples, and that makes sense to me if you're comparing two groups and calcCiS looks within one group at a time. Is that how it works? I mean, if you had some compound that was interesting because it's high in group 1 and low in group 2, what does CAMERA do with that information when it's calculating correlations across samples? And what about situations where you're not comparing two groups? In my research, I'm trying to find compounds that correlate with a separate measurement from the same subjects. I don't have multiple groups; I'm looking for what compound correlates linearly with this separately determined measurement. I expect that compounds that wind up being interesting to us will never have the same intensity across samples.

Laura

Re: Way too many ions being assigned to the same compound

Reply #3
I think you are misunderstanding how it works.

calcCiS: Calculate correlation inside samples
That means correlation across the peak = is it really coeluting or not?
It is correlation inside the sample, not inside a sample group. This means that CAMERA goes back to the raw data and compares extracted ion chromatograms.
The illustration in Carsten's paper shows this: http://pubs.acs.org/doi/abs/10.1021/ac202450g
This will fail if compounds are perfectly coeluting.
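
To make the peak-shape idea concrete, here is a minimal base-R sketch with simulated EICs. It is not CAMERA's code: the Gaussian peaks, the helper gauss() and the threshold of 0.75 are assumptions made for illustration, and it assumes that cor_eic_th is compared against a Pearson correlation coefficient.

```r
# Simulated extracted ion chromatograms (EICs) for three ions in ONE sample.
# All numbers are made up for illustration.
rt <- seq(740, 760, by = 0.5)  # retention time in seconds

# Hypothetical helper: a Gaussian chromatographic peak
gauss <- function(rt, center, width, height) {
  height * exp(-0.5 * ((rt - center) / width)^2)
}

eic_a <- gauss(rt, center = 750, width = 2, height = 1e5)  # ion A
eic_b <- gauss(rt, center = 750, width = 2, height = 2e4)  # same shape as A, lower intensity
eic_c <- gauss(rt, center = 753, width = 2, height = 5e4)  # elutes 3 s later than A

cor_ab <- cor(eic_a, eic_b)  # identical peak shape, so essentially 1
cor_ac <- cor(eic_a, eic_c)  # shifted peak, clearly lower

th <- 0.75                   # assumed correlation threshold
cor_ab > th                  # A and B would be kept together
cor_ac > th                  # A and C would be split
```

Two perfectly co-eluting ions from different compounds would look exactly like A and B here, which is why this check cannot separate them. Note also that a Pearson coefficient lies between -1 and 1, so under the assumption above a threshold such as 1.6 is outside the range the coefficient can reach.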

calcCaS: Calculate correlation across samples
They are correlated if high intensity of feature A means high intensity of feature B. The study design or sample groups are not used for this information.
Look at these plots. Each dot is a sample.

Features that are highly correlated between samples
[attachment: cor.png]
Features that are uncorrelated between samples
[attachment: uncor.png]
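
Since the attachments are no longer available, the same two situations can be reproduced with simulated data. This is only a sketch in base R: the number of samples, the log-normal abundances and the feature names are all invented, and nothing here is taken from CAMERA itself.

```r
set.seed(1)
n_samples <- 40

# Hypothetical abundances of two unrelated compounds across the samples
compound_x <- rlnorm(n_samples, meanlog = 10, sdlog = 0.5)
compound_y <- rlnorm(n_samples, meanlog = 10, sdlog = 0.5)

# Two ions of compound X (different response factors, 5% noise) and one ion of Y
feat_a <- compound_x * 1.0 * rlnorm(n_samples, 0, 0.05)
feat_b <- compound_x * 0.3 * rlnorm(n_samples, 0, 0.05)
feat_c <- compound_y * 0.8 * rlnorm(n_samples, 0, 0.05)

cor_same <- cor(feat_a, feat_b)  # same compound: high correlation
cor_diff <- cor(feat_a, feat_c)  # different compounds: low correlation

# Each dot is a sample, as in the plots described above
if (interactive()) {
  op <- par(mfrow = c(1, 2))
  plot(feat_a, feat_b, main = "Same compound")
  plot(feat_a, feat_c, main = "Different compounds")
  par(op)
}
```

No study design or class labels enter the calculation; only the per-sample intensities of the two features are used.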


These methods are not meant to solve the problem of features not being independent. That is a statistical problem. Even if you could tell perfectly which features belong to the same group, you would still have correlated groups.
These functions are helpful for structure elucidation. For that reason you would rather include too much in one group than too little.
Would you rather have a group consisting of several compounds, or not be aware that a feature belongs in a group? I would choose the former. You will need to assess the data manually in either case. And the adduct annotation will help you greatly in "guesstimating" which are the true pseudo-molecular ions in your group. But it can do nothing if a compound has been split into different groups.


Re: Way too many ions being assigned to the same compound

Reply #4
Hi, Jan.

Ah, yes, I was misunderstanding how it works. That helps!

Mind if I ask a personal preference question, then? When you are first setting out to analyze your data and you have a list of mass features and their intensities from XCMS, do you try to do anything to assign which ions might come from the same compound before doing any statistics on your data set? Or do you use the output from diffreport or peakTable as is, figure out which ions are the most statistically significant for your research question and then use CAMERA solely to start structure elucidation?

Thank you very much for all your help!

Laura

Re: Way too many ions being assigned to the same compound

Reply #5
Well, I don't think I should be giving advice on statistics... But no, I don't use assignment at all before I do statistics. So yes, I only use CAMERA for identification after the statistics have told me which features are interesting. I don't use the statistics in xcms, as the studies I am working on have a design that requires more complicated statistics.

It sounds like PLS might be the appropriate statistical tool for your problem.

Re: Way too many ions being assigned to the same compound

Reply #6
Hi Laura,

as Jan already pointed out, CAMERA uses several kinds of information to decide whether peaks within a short retention time window originate from different co-eluting substances or from the same substance. Those peaks can be adducts, clusters, isotopes and fragments.
For example, in our QToF system we observe a lot of in-source fragments.

If you have only a single-sample experiment, as in your case, you can only use correlation based on peak shape similarity (in short: calcCiS).
The groupCorr function, which is a wrapper for all the underlying grouping functions, recognizes this automatically.

So in short, only those compounds that share a high peak shape similarity stay together.
But it can happen, as Jan mentioned, that two compounds have a perfect correlation. I just added one example from our data.
Here we have 2 substances (red, blue) that co-elute perfectly, even in peak shape.
[attachment: Bsp5.png]
But in that case we were lucky, and CAMERA was able to annotate them as two different pseudo-molecular ion groups afterwards.

If you went directly to the annotated peaks only, important peaks could be sorted out.
For example, we have a high-abundance fragment peak with different adducts, like [F+H]+ and [F+Na]+, but only a small [M+H]+ with no isotopes or adducts. If the mass difference between M and F is not in your rule set, the two would be separated. But the peak shape analysis suggests a correlation between them.

So we think that high correlation is mandatory and adduct annotation helps in further interpretation.
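
The fragment example above can be sketched as a rule lookup in base R. This is not CAMERA's implementation: explain_pair() is a hypothetical helper, the fragment and molecule masses are invented, the small rule table uses commonly tabulated positive-mode adduct shifts, and the ppm and mzabs tolerances simply reuse values quoted elsewhere in this thread.

```r
# Commonly tabulated m/z shifts for singly charged positive-mode adducts
adduct_shift <- c("[M+H]+"   = 1.007276,
                  "[M+NH4]+" = 18.033823,
                  "[M+Na]+"  = 22.989218,
                  "[M+K]+"   = 38.963158)

# Hypothetical helper: which pairs of rules would give both ions the same neutral mass?
explain_pair <- function(mz1, mz2, shifts = adduct_shift, ppm = 5, mzabs = 0.015) {
  tol <- mzabs + max(mz1, mz2) * ppm * 1e-6
  hits <- character(0)
  for (a in names(shifts)) for (b in names(shifts)) {
    if (a == b) next
    # mz1 read as adduct a and mz2 read as adduct b must imply the same mass
    if (abs((mz1 - shifts[[a]]) - (mz2 - shifts[[b]])) <= tol) {
      hits <- c(hits, paste(a, "/", b))
    }
  }
  hits
}

# Invented masses: a fragment F and the intact molecule M
frag_h  <- 163.0390 + 1.007276   # [F+H]+
frag_na <- 163.0390 + 22.989218  # [F+Na]+
mol_h   <- 325.0918 + 1.007276   # [M+H]+

explain_pair(frag_h, frag_na)  # "[M+H]+ / [M+Na]+": the two fragment ions are linked
explain_pair(frag_h, mol_h)    # character(0): no rule covers the M to F difference
```

With this rule table the two fragment ions annotate each other, while the small [M+H]+ is left unexplained, so only the peak shape correlation keeps it in the same group.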

Cheers,
Carsten


Re: Way too many ions being assigned to the same compound

Reply #7
Thank you very much, Jan and Carsten. I was planning to use other tools for statistical analyses; your answers helped clarify the intent of CAMERA for me, though, and that was very helpful.

Thanks!

Laura

Re: Way too many ions being assigned to the same compound

Reply #8
Hi everybody,

I have the same question. What I understand from Jan and Carsten is that features in the same group are not necessarily the same compound; they may be two or more compounds with high correlation. This group is independent of the annotations iso and adduct.

In my experiment there are 3 classes, and each class has 5 replicate samples. Because I have multiple samples, I use the calcCaS method, not calcCiS.

I give you my CAMERA script:

diffreportcombi.neg1 <- annotateDiffreport(xset4, perfwhm=0.4, calcCiS=FALSE, calcIso=TRUE,
                                           calcCaS=TRUE, maxcharge=3, maxiso=4, minfrac=0.05,
                                           ppm=5, mzabs=0.015, polarity="negative")
write.csv(diffreportcombi.neg1,file="diffreport test 1.csv")

These are two first groups in my peaklist:

            name       isotopes   adduct               pcgroup
265.3/725   M265T725                                   1
266.3/723   M266T723                                   1
333.1/723   M333T723                                   1
401/723     M401T723                                   1
836.3/723   M836T723                                   1
350.1/723   M350T723                                   2
620.9/722   M621T722                                   2
553.2/722   M553T722                                   2
835.4/722   M835T722              [M+Cl]- 800.421      2
554.2/722   M554T722                                   2
837.4/722   M837T722              [M-2H+K]- 800.421    2

My questions are:
Peaks M401T723 and M836T723 are in the same group. What does this mean? I don't think it means they are the same compound. Maybe they are co-eluting? If they are co-eluting, how does CAMERA separate these two groups, which have the same retention time? I checked the intensities of these 11 features, and they show the same evolution across all the samples. If the features in group 1 are highly correlated with each other, aren't the features in group 2 also highly correlated with group 1, since they show the same evolution?

I know a lot about CAMERA is still a mystery to me. Thank you for your help.

Re: Way too many ions being assigned to the same compound

Reply #9
I am not exactly sure what you are asking but some observations:

Quote
This group is independent of the annotations iso and adduct.
The annotation is done "inside" each group, so you cannot say that they are independent. But the annotation is done after the grouping and is not used to define the grouping. The exception is calcIso=TRUE: then CAMERA tries to annotate isotopes before grouping and uses that to figure out whether features belong together. That also means that if you run each step separately, findIsotopes should come before groupCorr if calcIso=TRUE. I am not sure how annotateDiffreport handles that.

I don't understand why you would set calcCiS=FALSE. You can use both calcCaS and calcCiS at the same time, and in your case that is probably preferable. You only have 15 samples to calculate correlations across samples (calcCaS). That is not that strong (think of linear regression on very noisy data with 15 points). I myself use only calcCaS if I have a large number of samples, since calcCiS can cause false positives with perfectly co-eluting peaks (a bigger problem especially on short gradients). But I would not rely on calcCaS alone with only 15 samples.
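
To put a rough number on "not that strong", here is a small base-R sketch of the approximate 95% confidence interval of a Pearson correlation using Fisher's z-transform. The helper cor_ci() and the observed correlation of 0.8 are assumptions for illustration; this is standard statistics, not something CAMERA reports.

```r
# Hypothetical helper: approximate confidence interval for a Pearson
# correlation r observed on n samples (Fisher z-transform)
cor_ci <- function(r, n, level = 0.95) {
  z  <- atanh(r)
  se <- 1 / sqrt(n - 3)
  q  <- qnorm(1 - (1 - level) / 2)
  tanh(c(lower = z - q * se, upper = z + q * se))
}

ci_15  <- cor_ci(0.8, 15)   # roughly 0.49 to 0.93
ci_150 <- cor_ci(0.8, 150)  # roughly 0.73 to 0.85
```

With 15 samples an observed correlation of 0.8 is compatible with a true value anywhere from moderate to very high, which is why combining it with the peak shape check is the safer choice.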

In your case CAMERA said that the features in group 1 behave the same way across samples. But they appear to behave differently than the compounds in group 2. You can try to plot the intensities against each other like I did above to get an idea of what is going on.
I am not sure I understand what you mean by "evolution across all the samples". Evolution as in chromatographic profile or in relation to your study design?


You will never get perfect grouping in CAMERA because there is no clear cut-off between "features that correlate quite poorly because of intensity variations" (for example, adducts are not always linearly related to the pseudo-molecular ion) and "features that correlate slightly because they are strongly biologically related". All it does is give you a rough idea.
Good annotation of fragments/adducts helps a lot, though. But that is highly dependent on a good list of adducts and fragments. I have compiled a list of adduct/fragment rules that is much larger than what comes with CAMERA, and you can find it here: https://github.com/stanstrup/chemhelper ... st/extdata.

Please ask if you have more questions.

Re: Way too many ions being assigned to the same compound

Reply #10
Hi Jan,

Thank you very much for your helpful explanation, you are so so so great!!!

For yesterday’s CAMERA run I didn’t have all my samples yet, so I just used several samples to test CAMERA. By the evolution of groups 1 and 2 I mean the chromatographic profile. Groups 1 and 2 have the same expression: they are all strongly expressed in class 2.
Today I did a new CAMERA run with all my samples. I have 3 classes: control and treatments 1 and 2. Within these three classes there are 9 time subclasses, and each subclass contains 5 replicate samples. I'll try to list my questions clearly : )

1.
In XCMS I use the obiwarp method for retention time correction. I don’t know whether that will influence my CAMERA results, for example through the perfwhm value.

2.
Following your explanation, I used both calcCiS and calcCaS this time. But I don’t want to use isotopic relationships for peak grouping, so calcIso is FALSE. First I ran each step separately with your CAMERA rules, so that I can plot EICs:

> xsa.neg<-xsAnnotate(xset3,polarity="negative")
> xsaG.neg<-groupFWHM(xsa.neg,perfwhm=0.6)
> xsaC.neg<-groupCorr(xsaG.neg,calcCiS=TRUE,calcIso=FALSE,calcCaS=TRUE)
> xsaFI.neg<-findIsotopes(xsaC.neg,maxcharge=3,maxiso=4,minfrac=0.05,ppm=5, mzabs=0.015)
> file<-system.file('rules/CAMERA_rules_neg.csv',package="CAMERA")
> rules<-read.csv(file,sep=";")
> xsaFA.neg<-findAdducts(xsaFI.neg,polarity="negative",rules=rules)
> xsaFA.neg

Then annotateDiffreport for combining xcms and camera results:

> diffreportcombi.neg<-annotateDiffreport(xset3,perfwhm=0.6,calcCiS=TRUE,calcIso=FALSE,calcCaS=TRUE,maxcharge=3,maxiso=4,minfrac=0.05,ppm=5, mzabs=0.015,polarity="negative",rules=rules)

I used the same parameter values for these two processes, but I get different results. When I ran each step separately, CAMERA found 59 isotopes, 320 adducts and 646 annotation groups. With annotateDiffreport, CAMERA found 50 isotopes, 245 adducts and 802 annotation groups. Do you have any idea what causes this difference? Have you combined the xcms matrix and the CAMERA matrix before? If not, how do you integrate these two matrices?

3.
In your CAMERA rules, quasi is set to 0 for everything except [M-H]-. Does that mean that if an annotation group contains only adducts, the group is excluded? So, for each [nM+ions]-, there should be a [M]- in the same annotation group?

4.
I sorted my annotation groups in ascending order. Most adducts are in the first 50 groups, and after group 252 there are no adducts. I think the order of the groups may have some meaning?

5.
In my results I have some huge groups. For example, my group 6 contains 92 features. There are 56 adducts in this group, with masses from 472 to 1195 and retention times from 1227 s to 1233 s.
Do you know how to plot the two right graphs of figure 2 in Carsten Kuhl’s paper (Anal Chem. 2012 January 3; 84(1): 283–289. doi:10.1021/ac202450g) ?
I tried calcPC, but I get errors:
> calcPC.hcs(xsaFA.neg)
Error in `colnames<-`(`*tmp*`, value = c(NA, NA, "weight")) :
  attempt to set 'colnames' on an object with less than two dimensions
> calcPC.lpc(xsaFA.neg)
Error in `colnames<-`(`*tmp*`, value = c(NA, NA, "weight")) :
  attempt to set 'colnames' on an object with less than two dimensions

That's all the questions for today~~~ Thank you very much in advance!!!

Re: Way too many ions being assigned to the same compound

Reply #11
1.
I am not 100% sure, but I think yes. As far as I could figure out, the FWHM is calculated from rtmin and rtmax, which change when you do retcor.

2.
Dunno. I never used annotateDiffreport.

Looking at the code it seems it is using slightly different defaults.
Quote
annotateDiffreport
intval = "into"
Quote
groupFWHM, findIsotopes
intval = "maxo"

Maybe that is it?

3. from the CAMERA docs:
Quote
A annotation group must include at least one ion with quasi set to 1 for this adduct. If a annotation group only includes optional adducts (rule set to 0) then this group is excluded.

So in my list I require it to find either [M-H]- or [M+Cl]- to make an annotation group. If you know compounds that don't necessarily make those you can change that. I would also be interested in such examples.

4. Again not sure about the inner workings of CAMERA but I think it starts from the highest peaks. So there are probably not many groups with a high index that actually have more than one member. And hence no annotation. At least that is what I have observed.

5. Sorry dunno. I don't think there is an automagic function for that. I guess he extracted EICs a bit manually to do that plot.
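
The quasi rule quoted under point 3 can be sketched as a simple filter in base R. The table below is entirely hypothetical (invented hypothesis numbers, ions and quasi flags, with [M-H]- and [M+Cl]- flagged as described above), and the code only mirrors the wording of the documentation, not CAMERA's internals.

```r
# Hypothetical candidate annotation groups: one row per ion in a hypothesis
hyp <- data.frame(
  hypothesis = c(1, 1, 2, 2, 3, 3),
  ion        = c("[M-H]-", "[M+Cl]-", "[2M-H]-", "[M-2H+K]-", "[M+Cl]-", "[M-2H+K]-"),
  quasi      = c(1, 1, 0, 0, 1, 0)
)

# A hypothesis is kept only if at least one of its ions has quasi == 1
has_quasi <- tapply(hyp$quasi == 1, hyp$hypothesis, any)
kept <- hyp[as.character(hyp$hypothesis) %in% names(has_quasi)[has_quasi], ]

unique(kept$hypothesis)  # hypotheses 1 and 3 survive, 2 is dropped
```

Hypothesis 2 consists only of optional adducts, so it is excluded. Hypothesis 3 survives because its [M+Cl]- ion carries the quasi flag, even though no [M-H]- was found.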

Re: Way too many ions being assigned to the same compound

Reply #12
Hi Jan,
I appreciate your suggestions and your patience. You are right, the intval is not the same, when I set intval="maxo" in annotateDiffreport, it's ok.
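
For readers wondering why the intensity measure matters: in xcms, "maxo" is the maximum intensity of a peak and "into" is its integrated area, and the two can rank the same peaks differently. Here is a minimal base-R sketch with two simulated Gaussian peaks; the peak shapes are invented, and the simple sum times step width is only a stand-in for the real peak integration.

```r
step <- 0.5
rt <- seq(0, 60, by = step)  # retention time in seconds

# Hypothetical helper: a Gaussian chromatographic peak
gauss <- function(rt, center, width, height) {
  height * exp(-0.5 * ((rt - center) / width)^2)
}

narrow <- gauss(rt, center = 30, width = 2, height = 1000)
broad  <- gauss(rt, center = 30, width = 4, height = 1000)

# "maxo"-like value: peak maximum; "into"-like value: area under the peak
maxo <- c(narrow = max(narrow), broad = max(broad))
into <- c(narrow = sum(narrow) * step, broad = sum(broad) * step)

maxo[["broad"]] / maxo[["narrow"]]  # equal heights
into[["broad"]] / into[["narrow"]]  # the broad peak has about twice the area
```

Any step that compares intensities between features, such as an isotope ratio check, can therefore give different answers depending on which of the two is used.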

I've run into a new problem... I think I made a grave mistake at the beginning, in xcmsSet. My MS data are in centroid mode, but in the findPeaks step I used the matchedFilter method, which is for profile-mode MS data. I don't know whether matchedFilter can be used for centroid mode. Maybe that's the reason for my unsatisfactory CAMERA list. I tried xcms with the centWave method, but no peak groups were found. I am studying centWave now. Maybe see you later in the XCMS section. Thanks again for your help.

Re: Way too many ions being assigned to the same compound

Reply #13
matchedFilter is not for profile mode but for centroid mode. You might be confusing "profile matrix" with profile mode. "profile matrix" is XCMS lingo for a matrix of EICs.
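
As a rough picture of what a "matrix of EICs" looks like, here is a base-R sketch that bins centroided data into fixed m/z steps. The centroid table is invented and the binning is a simplification for illustration, not the xcms implementation; the step of 0.1 is an assumed bin width.

```r
# Hypothetical centroid data: one row per centroid (scan index, m/z, intensity)
centroids <- data.frame(
  scan = c(1, 1, 2, 2, 3, 3, 3),
  mz   = c(265.31, 401.02, 265.29, 401.04, 265.30, 401.03, 836.33),
  int  = c(100, 50, 400, 220, 150, 90, 30)
)

step <- 0.1  # assumed m/z bin width
centroids$bin <- sprintf("%.1f", round(centroids$mz / step) * step)

# Rows are m/z bins, columns are scans; each row is the EIC of one bin
prof <- xtabs(int ~ bin + scan, data = centroids)
prof
```

The data stay centroided throughout; only the bookkeeping is a matrix with one row per m/z bin and one column per scan.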

Re: Way too many ions being assigned to the same compound

Reply #14
Just so. You've guessed it !!