Questions about CAMERA and psgroup

August 19, 2015, 06:42:54 AM

Dear all,
I am currently working on an untargeted LC-MS metabolomics project. I have analysed plasma samples in 4 different modes, namely HILIC positive mode, HILIC negative mode, reversed phase positive mode and reversed phase negative mode. I use XCMS for peak picking and alignment, and then CAMERA for ion annotation. However, I do not feel I can trust the pseudospectra information obtained from CAMERA. Here is an example.
For hilic negative mode:
XCMS:
AhilicNEG_Xcset<- xcmsSet(AhilicNEG, method="centWave", peakwidth=c(5, 20), ppm=15, snthresh=6, mzdiff=0.01, prefilter=c(3,100), mzCenterFun="wMean", integrate=1)
AhilicNEG_re=retcor(AhilicNEG_Xcset,method="obiwarp",profStep=1,gapInit=0.296,gapExtend=2.4)
AhilicNEG_gr=group(AhilicNEG_re,method="density",bw=3,mzwid=0.0234,minfrac=0.1,minsamp=1,max=50)
AhilicNEG_peaks=fillPeaks(AhilicNEG_gr)
CAMERA:
cameraAhilicNEG_peaks=xsAnnotate(AhilicNEG_peaks)
cameraAhilicNEG_peaks_FWHM=groupFWHM(cameraAhilicNEG_peaks,sigma=6, perfwhm=0.4)
cameraAhilicNEG_peaks_ISO=findIsotopes(cameraAhilicNEG_peaks_FWHM,maxcharge=3, maxiso=4, minfrac=0.5,ppm=10, mzabs=0.015)
cameraAhilicNEG_peaks_GC=groupCorr(cameraAhilicNEG_peaks_ISO,cor_eic_th=0.7,cor_exp_th=0.4,pval=0.001,calcCiS=TRUE,calcIso=TRUE, calcCaS=TRUE, graphMethod="hcs")
cameraAhilicNEG_peaks_ADD=findAdducts(cameraAhilicNEG_peaks_GC, ppm=10, mzabs=0.015, multiplier=3,polarity="negative", rules=NULL, max_peaks=100)
cameraAhilicNEG_peaks_peaklist=getPeaklist(cameraAhilicNEG_peaks_ADD)
However, I am not sure whether I can trust the CAMERA results in this case. I am sorry to bother you with such a long message, but I am very eager to get your suggestions and help. Your help will definitely move my ongoing project forward!

Some of the results are shown below; the psgroup EICs have been uploaded as attachments.
174 features belong to pcgroup 2
49 features belong to pcgroup 3
Even worse, 688 features belong to 9 pcgroup
Another example: these three features elute at similar retention times but have been separated into three pcgroups, as you can see below.
feature   isotopes       pcgroup   m/z        m/z min    m/z max    rt
2026      [119][M]-      543       229.0538   229.0509   229.0551   26.88
2037      [119][M+1]-    544       230.0575   230.0555   230.059    26.88
2061      [119][M+2]-    545       231.0338   231.0323   231.0357   26.28

How could I optimize the parameters of CAMERA, or even XCMS, to get well-defined psgroup information for ion annotation? I am very confused.
I look forward to your comments on this!

Lin Shi

[attachment deleted by admin]

Re: Questions about CAMERA and psgroup

Reply #1 – August 19, 2015, 08:28:52 AM

It is not easy to tell if you have a problem or not. It is normal that at the beginning of the chromatograms you have a lot of ions that get grouped together because they are all co-eluting and probably also correlated because of ion suppression.
You can never "trust" CAMERA. It is a great help but you need to make sense of it in the end. I don't understand why you have set cor_exp_th=0.4. That seems so low as to be useless... So that is the first thing I would change and see if it helps.
You might find the discussion here useful: viewtopic.php?f=24&t=278.
When choosing calcCiS and calcCaS you need to consider the number of samples you have. Also graphMethod="lpc" might be better than "hcs".

For features that you believe are erroneously split, you should go through each step of the CAMERA process and see where they get split. You can simply use getPeaklist after each step and check what happened to the ions you are not satisfied with.
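The step-by-step check could be sketched roughly like this in R. `trace_features` is a hypothetical helper (not part of CAMERA), and the toy pcgroup numbers for the FWHM step are invented for illustration. It only assumes that each peak list is a data frame with at least `mz`, `rt` and `pcgroup` columns, as returned by getPeaklist.

```r
# Hypothetical helper: find the same features (matched by m/z) in the
# peak list produced after each CAMERA step and stack the results.
trace_features <- function(peaklists, mz_targets, mz_tol = 0.005) {
  cols <- c("mz", "rt", "isotopes", "adduct", "pcgroup")
  rows <- lapply(names(peaklists), function(step) {
    pl <- peaklists[[step]]
    # Tolerate peak lists that lack some annotation columns
    for (col in setdiff(cols, names(pl))) pl[[col]] <- rep(NA, nrow(pl))
    # Keep features whose m/z lies within mz_tol of any target m/z
    hit <- vapply(pl$mz, function(m) any(abs(m - mz_targets) <= mz_tol), logical(1))
    if (!any(hit)) return(NULL)
    data.frame(step = step, feature = rownames(pl)[hit],
               pl[hit, cols, drop = FALSE],
               row.names = NULL, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Toy data standing in for getPeaklist() output (FWHM pcgroups are made up)
toy <- list(
  FWHM = data.frame(mz = c(229.0538, 230.0575, 231.0338, 300.1000),
                    rt = c(26.88, 26.88, 26.28, 40.00),
                    pcgroup = c(12, 12, 12, 30)),
  CORR = data.frame(mz = c(229.0538, 230.0575, 231.0338, 300.1000),
                    rt = c(26.88, 26.88, 26.28, 40.00),
                    pcgroup = c(543, 544, 545, 31))
)
traced <- trace_features(toy, mz_targets = c(229.0538, 230.0575, 231.0338))
print(traced)

# With the real objects from the first post, the list would be built as:
# steps <- list(FWHM = getPeaklist(cameraAhilicNEG_peaks_FWHM),
#               ISO  = getPeaklist(cameraAhilicNEG_peaks_ISO),
#               CORR = getPeaklist(cameraAhilicNEG_peaks_GC),
#               ADD  = getPeaklist(cameraAhilicNEG_peaks_ADD))
# trace_features(steps, mz_targets = c(229.0538, 230.0575, 231.0338))
```

The step at which the pcgroup column stops being identical for the three ions is the step whose parameters deserve a closer look.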

Re: Questions about CAMERA and psgroup

Reply #2 – August 21, 2015, 04:41:20 AM

Hi Jan,
Thank you so much for help.
I have attached a file explaining why we chose cor_exp_th=0.4, together with the plot of pcgroups when I increase cor_exp_th. I hope you can have a look. I have a large sample size, more than 1000 samples. I have more than 3000 pcgroups, but 70% of them have really bad EICs. Some of them are not even real peaks. And only one ion is involved in one pcgroup. However, several features are VIP features in my PLS model. In this case, how could I identify them?
As you mentioned before, you use CAMERA after statistics. Do you mean that you use all the well-behaved and reproducible features after XCMS for statistics? Many features after XCMS belong to one compound. In the model, isotopes, adducts and other fragments from the same compound will contribute together. Do you think this will increase the chance of overfitting the multivariate statistical model?
I have not tried the lpc option so far. I have no idea how it works, but I am trying it out.
Thank you again for your kind help.
Best regards,
Lin

[attachment deleted by admin]

Re: Questions about CAMERA and psgroup

Reply #3 – August 21, 2015, 06:23:13 AM

Hi Lin,

I am not sure I agree with the interpretation of your plots. If I understood you correctly, you say that cor_exp_th=0.4 is a threshold because after that the number of pcgroups starts increasing? To me that just shows that calcCaS does not do its job at 0.4... Everything is able to be correlated at 0.4. Think of R^2 = 0.4. Would you consider that nicely correlated (http://www.jerrydallal.com/lhsp/pix/corrp.gif)?
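To put a number on this, here is a minimal R sketch with simulated data. It assumes that cor_exp_th is compared against a Pearson correlation coefficient r (an assumption, not checked against the CAMERA source here). All variable names are made up for illustration.

```r
# Simulate two "features" measured across 1000 samples whose true
# Pearson correlation is 0.4, then look at how much variance they share.
set.seed(1)
n_samples <- 1000
r_target  <- 0.4

feature_a <- rnorm(n_samples)
# feature_b = part driven by feature_a + independent noise, scaled so
# that the expected correlation with feature_a equals r_target
feature_b <- r_target * feature_a + sqrt(1 - r_target^2) * rnorm(n_samples)

r_observed      <- cor(feature_a, feature_b)  # close to 0.4
shared_variance <- r_observed^2               # close to 0.16

cat("observed r:     ", round(r_observed, 2), "\n")
cat("shared variance:", round(shared_variance, 2), "\n")

# If the 0.4 were read as R^2 instead, the matching r would be:
cat("r for R^2 = 0.4:", round(sqrt(0.4), 2), "\n")
```

Under that assumption, two peaks passing a threshold of r = 0.4 share only about 16% of their variance, which co-eluting but unrelated ions can easily reach.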

Also, with so many samples I would consider turning calcCiS off. The reason is that with 1000 samples a good correlation between peaks across samples (calcCaS) is a very strong indication that the peaks are from the same compound. On the other hand, calcCiS is prone to "false positives" when you have a lot of co-eluting peaks, so if you can rely on calcCaS only, that might improve things. I show an example of this in chapter 4.1 of my thesis: https://www.researchgate.net/publicatio ... n_pipeline
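As a sketch only, the groupCorr call from the first post could be changed along these lines. The value cor_exp_th=0.75 is an assumed, stricter starting point and not a recommendation from this thread, and the object name cameraAhilicNEG_peaks_GC_CaS is made up. The guard keeps the snippet from failing when CAMERA or the earlier objects are not available.

```r
# Sketch: rely on correlation across samples (calcCaS) only.
# Runs only if CAMERA is installed and the isotope-annotated object
# from the first post exists in the workspace.
if (requireNamespace("CAMERA", quietly = TRUE) &&
    exists("cameraAhilicNEG_peaks_ISO")) {
  cameraAhilicNEG_peaks_GC_CaS <- CAMERA::groupCorr(
    cameraAhilicNEG_peaks_ISO,
    cor_exp_th  = 0.75,   # assumed value, stricter than the original 0.4
    pval        = 0.001,
    calcCiS     = FALSE,  # skip peak-shape (EIC) correlation within samples
    calcIso     = TRUE,
    calcCaS     = TRUE,   # keep intensity correlation across samples
    graphMethod = "hcs"   # "lpc" is the alternative mentioned earlier
  )
  # Compare pcgroup sizes with those from the original settings
  print(table(table(CAMERA::getPeaklist(cameraAhilicNEG_peaks_GC_CaS)$pcgroup)))
}
```

Comparing the distribution of pcgroup sizes before and after the change is a quick way to see whether the very large groups break up.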

Quote

And only one ion is involved in one pcgroup. However, several features are VIP features in my PLS model. In this case, how could I identify them?

It is normal that there are a lot of ions that are in their own pcgroup. They are either noise or compounds that don't make fragments. I would not be too concerned about that. If you look manually at the peaks, are there other ions that seem like they belong with your marker?
So one question is: do you trust your PLS model? Was it properly validated? I cannot help much there since I have never used PLS in my own studies, but with 1000 samples there should be plenty to build a very robust model, I would assume. What do the box plots look like? Are they clear markers? What do the peaks look like? Are they real peaks? Did you randomize properly (sampling, sample prep, analysis)?
As for identification, your only chance is to see if you can do MS/MS on the ion. If not, it becomes a question of how much you want it, and you revert to classical isolation and structure elucidation methods if you need to.

About your noise features: that should be a peak-picking issue. Are your noise features found at all retention times and not just at the beginning? What instrument are you using? If it is a q-tof, your ppm might be a bit low. You can also try to raise the s/n threshold (snthresh). Your peakwidth might also be too narrow to cover everything. Your prefilter also seems low for the instruments I have seen (but of course that depends on the scale the instrument is using).
You can also try IPO (http://www.biomedcentral.com/1471-2105/16/118) for automatic optimization of XCMS parameters.

Quote

As you mentioned before, you use CAMERA after statistics. Do you mean that you use all the well-behaved and reproducible features after XCMS for statistics? Many features after XCMS belong to one compound. In the model, isotopes, adducts and other fragments from the same compound will contribute together. Do you think this will increase the chance of overfitting the multivariate statistical model?

I use all peaks from XCMS to run my statistics. I normally do univariate mixed linear models + correction for multiple testing. It's fast and easy to interpret.
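A rough illustration of that per-feature workflow in R, on simulated data. Note the simplification: this uses an ordinary linear model (lm) per feature rather than a mixed model, and Benjamini-Hochberg correction via p.adjust is an assumed choice, since the correction method is not stated. All names and numbers are invented.

```r
# One univariate model per feature, then multiple-testing correction.
set.seed(42)
n_samples  <- 40
n_features <- 200
group <- factor(rep(c("control", "case"), each = n_samples / 2))

# Simulated intensity matrix: samples in rows, features in columns
intensities <- matrix(rnorm(n_samples * n_features), nrow = n_samples)
# Make the first 10 features truly different between the groups
intensities[group == "case", 1:10] <- intensities[group == "case", 1:10] + 3

# Fit feature ~ group for every feature and keep the group p-value
pvals <- apply(intensities, 2, function(feature) {
  fit <- lm(feature ~ group)
  summary(fit)$coefficients[2, "Pr(>|t|)"]
})

# Control the false discovery rate across all features
padj <- p.adjust(pvals, method = "BH")
significant <- which(padj < 0.05)
cat("features passing FDR 0.05:", length(significant), "\n")
```

Because every feature is tested on its own, isotopes and adducts of one compound simply show up as several significant rows, which can then be grouped with CAMERA afterwards.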

I am reluctant to comment on statistics (I can drive a car but taking my directions for building one is probably not wise...). I don't know how having more variables describe the same thing affects overfitting but isn't it the whole point of multivariate stats? To allow for highly correlated data.

Re: Questions about CAMERA and psgroup

Reply #4 – August 21, 2015, 10:09:57 AM

Hi Jan,
Thank you so much for making a lot of things clear for me!!
I will turn calcCiS off and see what happens.
For statistics, so far I am using an in-house developed multilevel PLS model with repeated double cross-validation. So I think this model is robust and properly validated. 40% of the VIP features are nice, real peaks. I need to double check these. Sorry that I cannot answer all your questions, but they are very useful!! Sample preparation and sampling were randomized properly.
We use an HPLC-qTOF-MS system (Agilent Technologies), consisting of a 1290 LC system, a Jetstream electrospray ionization (ESI) source, and a 6540 UHD accurate-mass qTOF spectrometer. Regarding the peak picking, I tested different XCMS parameters and also applied IPO for optimization. I also referred to the values suggested by the XCMS online version and by several publications. So I assume my XCMS parameters are optimized. For reversed phase, the peakwidth is larger in my case.

Thank you again for helping me so much.
Best regards,
Lin