I know that those peak shapes don't look nice, but I'd estimate that 80% of my peaks look like that or even worse. However, the peaks themselves are extracted pretty well.
Quote
Only the core region of your features seems to be within that 30 (?) ppm window
The core region of the first peak has an m/z width of 4 ppm; the entire peak, after widening by +/- 0.0003 Da, has a width of 8 ppm. The core region of the second peak has a width of only 2 ppm, but after manually widening by +/- 0.0025 Da the width is still 8 ppm. Plus, I plotted exactly the RT range indicated in the peak table. So even if the peaks do not look nice, why wouldn't I want the entire peak extracted and integrated? Please let me know what else I should look at or provide to better understand what is going wrong.
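The ppm arithmetic above can be sketched in a few lines of R; the m/z values are those of the first example peak from the peak table quoted elsewhere in this thread:

```r
# Width of an m/z window in ppm, relative to the window midpoint
ppm_width <- function(mzmin, mzmax) {
  (mzmax - mzmin) / mean(c(mzmin, mzmax)) * 1e6
}

ppm_width(162.0762, 162.0769)                # core region: ~4 ppm
ppm_width(162.0762 - 3e-4, 162.0769 + 3e-4)  # widened by +/- 0.0003 Da: ~8 ppm
```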
Quote
CAMERA uses getEIC for EIC extraction, so the results will depend on the profstep parameter of your xcmsSet.
Ok! Which brings me back to the starting point of this and the other thread: I do not specify a profstep (which means the default of 0.1 is applied). To which value should I set profstep then? And how reliable is the peak correlation in CAMERA against the background we already discussed in the other thread?
Thanks for your reply. In most cases, adding +/- 0.0001 is sufficient to recover the peak. But there are still cases where it's necessary to widen the m/z window further to extract the centroids belonging to the peak. I can give you two examples: [attachment: pk2040.png] +/- 3E-4 needed with
mz       mzmin    mzmax    rt    rtmin rtmax  into    intb     maxo     sn
162.0765 162.0762 162.0769 76.68 44.58 107.07 1002001 970706.5 29823.78 55

egauss mu sigma h  f    dppm scale scpos scmin scmax lmin lmax
NA     NA NA    NA 3961 1    10    75    65    85    43   105
[attachment: pk2089.png] more than +/- 25E-4 needed with
mz       mzmin    mzmax    rt     rtmin  rtmax  into     intb     maxo     sn
854.1456 854.1447 854.1467 146.85 133.55 171.06 494611.6 475216.1 19721.88 39

egauss mu sigma h  f    dppm scale scpos scmin scmax lmin lmax
NA     NA NA    NA 4135 1    6     140   134   146   127  171
As long as I only need to extract the ion chromatograms for plotting, that's fine and I know how to deal with it. But my biggest concern is whether any of the observations made in this thread affect the generation of pseudospectra in CAMERA in a negative way.
For the implementation of the "rawEIC" function for xcmsSet I am using the m/z ranges given in the peak table. However, I just realized that mzmin and mzmax do not specify the borders of my peaks exactly. findmzROI finds the m/z range perfectly, but later in the implementation of findPeaks.centWave the m/z range is narrowed depending on the found scale. Could someone explain why this is done?
The problem is that many EICs look like badly disrupted zigzag curves when signals within the peak have an m/z outside of this range. So how can I determine (or at least approximate) the true m/z range of the peaks from the peak table?
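One pragmatic approximation (my assumption, not an official xcms answer): widen the reported mzmin/mzmax by the instrument's scan-to-scan accuracy before extracting the EIC. The 35 ppm default is the accuracy discussed in this thread, not an xcms value:

```r
# Widen a peak-table m/z range by a given ppm tolerance
widen_mz_range <- function(mzmin, mzmax, ppm = 35) {
  d <- mean(c(mzmin, mzmax)) * ppm * 1e-6
  c(mzmin - d, mzmax + d)
}

widen_mz_range(162.0762, 162.0769)  # a window wide enough to catch stray centroids
```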
To be honest, I am not a CAMERA ninja but have just started to get my hands dirty. So this is just a thought:
Quote
after groupFWHM() we have a pseudospectrum A with 100 peaks. After groupCorr() some peaks will be withdrawn due to low correlations and pseudospectrum A will only have, let's say, 70 peaks.
Let's say that within those 100 peaks you find two peaks which fulfill the criteria for being isotopes of each other (12C/13C ratio + m/z difference), but their peak shapes correlate less than groupCorr() "expects". Which of these contrary observations is more reliable? Especially considering that the abundance of the M+1 or M+2 peak is typically only a small fraction of that of the corresponding M peak? Consequently, the peak shapes might look different in certain cases, since large M peaks can be subject to ion suppression, while low-abundance M+X peaks can interfere with noise and the baseline.
Now, if those peaks are separated by groupCorr(), neither of them will be withdrawn, but they will be assigned to independent pseudospectra A and B. If you now apply findIsotopes(), you will lose the M+1 information for the M peak (in pseudospectrum A), making a subsequent formula generation less reliable. And in pseudospectrum B you'll find the M+1 without any annotation, or worse, with a wrong one.
That's why I currently apply findIsotopes() prior to groupCorr(), as suggested in the vignette. Does this make sense to you?
I am just curious: within the CAMERA workflow, you call findIsotopes() as the last step. Is there a reason for that? I am asking because I usually call findIsotopes() between groupFWHM() and groupCorr(), since the isotope information can be used during correlation grouping, which avoids separating isotope peaks of the same compound into different pseudospectra.
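For reference, the order I mean, sketched against the CAMERA API; `xs` is assumed to be an already peak-picked and grouped xcmsSet:

```r
library(CAMERA)

xsa <- xsAnnotate(xs)     # xs: grouped xcmsSet, assumed to exist
xsa <- groupFWHM(xsa)     # group co-eluting peaks by retention time
xsa <- findIsotopes(xsa)  # annotate isotopes before correlation grouping ...
xsa <- groupCorr(xsa)     # ... so isotope peaks are not split across pseudospectra
xsa <- findAdducts(xsa, polarity = "positive")
```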
I am afraid the grouping just takes that long. Usually I use group.density for my data and, for comparison, just tried group.nearest on an example data set (6 samples, around 3,000 peaks/sample). My machine has almost the same configuration as yours (but running Mac OS X). Grouping takes about 1.5 minutes. From the description I expect that the algorithm runs in roughly O(n^2), meaning that if you have 100 times more peaks, your runtime should be around 10,000 times longer. I am curious whether someone else has an idea, but I am afraid you either have to live with it or lower the number of detected peaks somehow.
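The back-of-the-envelope scaling I mean, as a quick R check (the 1.5 minutes and the peak counts are the numbers from above):

```r
# O(n^2) scaling: runtime grows with the square of the peak count
scaled_runtime <- function(t_ref, n_ref, n_new) t_ref * (n_new / n_ref)^2

scaled_runtime(1.5, 3000, 300000)  # 100x the peaks -> 10000x the time: 15000 min
```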
Before going on to grouping I'd try to better understand the peak table. Maybe you should start by looking only at the peaks of one sample:
a) How many peaks overlap in RT? This could give you an idea of whether you have to contend with many adducts or in-source fragmentations.
b) How many peaks are very close in m/z, and how close are they to each other on the RT axis?
c) How are the abundances distributed? Maybe you are collecting too much noise at low abundance and could avoid this directly when calling xcmsSet.
Just a couple of ideas. Sorry that I couldn't help more.
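To make a), b) and c) concrete, here is a small self-contained sketch; the three-row peak table is made up and just stands in for one sample of `peaks(xs)`:

```r
# Toy peak table standing in for one sample's rows of xcms::peaks(xs)
# (all values are invented for illustration)
pk1 <- data.frame(
  mz    = c(162.0765, 162.0772, 854.1456),
  rtmin = c(44.6, 45.0, 133.6),
  rtmax = c(107.1, 106.0, 171.1),
  maxo  = c(29823.8, 1500.2, 19721.9)
)

# a) peaks overlapping in RT with at least one other peak
overlaps <- sapply(seq_len(nrow(pk1)), function(i)
  sum(pk1$rtmin < pk1$rtmax[i] & pk1$rtmax > pk1$rtmin[i]) - 1)

# b) peak pairs within 50 ppm of each other (an arbitrary cutoff)
d <- abs(outer(pk1$mz, pk1$mz, "-")) / outer(pk1$mz, pk1$mz, pmax) * 1e6
close_pairs <- which(d < 50 & upper.tri(d), arr.ind = TRUE)

# c) abundance distribution on a log scale; a long low-intensity
#    tail would suggest that noise is being collected
summary(log10(pk1$maxo))
```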
However, I have found that using 3 slaves on a low number of chromatograms might be faster in the end, especially considering how Ivy Bridge's Turbo Boost works. In your case (just 5 chromatograms) I would even use only 2 slaves. In order to use the multicore functionality you need Rmpi or snow installed as a parallelization backend.
3.) I am not sure, but your netCDF files seem to be really huge. Are the MS data in profile mode? If yes, could you convert them to centroid mode prior to the xcms analysis?
4.) Do 289,296 peaks per sample seem reasonable to you? It is this peak count that makes the grouping so slow. Are you really expecting a scan-to-scan accuracy of 2 ppm? Our qTOF, which is advertised with a mass accuracy of less than 5 ppm (meaning a weighted m/z mean over an entire peak), actually exhibits a scan-to-scan accuracy of up to 35 ppm. Choosing the ppm parameter too low might lead to peaks disrupted along the m/z dimension. Additionally, peaks might be disrupted in the time dimension as well if peakwidth is not chosen properly. I guess you are running a UPLC (in UPLC mode), given the peakwidth of (6, 15)?
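As a hedged sketch only (the parameter values are my guesses for a ~35 ppm qTOF, not settings tested on your data, and `files` is a placeholder for your raw-data paths):

```r
library(xcms)

# files: character vector of raw-data file paths, assumed to exist
xs <- xcmsSet(files, method = "centWave",
              ppm       = 30,         # tolerate the scan-to-scan m/z spread
              peakwidth = c(10, 60),  # widen if peaks get cut in the time dimension
              prefilter = c(3, 500))  # raise to collect less low-abundance noise
```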
5.) You mention that your peaks are not correctly aligned. Aligned along retention time? Then you should consider calling retcor prior to grouping. But watch out: retcor with method "obiwarp" always took 10-50 times longer for me than calling group.density on the same data set. So, if it makes sense to you, try to lower the number of found peaks before proceeding.
Obviously I misunderstood it again. Since I applied centWave, I never specified a step size and did not expect it to be set and used internally. Looking at the implementation of getEIC( xcmsSet ) I just saw what happens.
My concern is also how this influences the subsequent workflow in CAMERA. I could imagine that groupCorr makes heavy use of getEIC.
Quote
An alternative would be to implement a "rawEIC" method for xcmsSet, i.e. not to use the profile matrix but the full raw data (rawEIC for xcmsRaw) for EIC generation. I think this might be nice to have anyway, especially in combination with centWave, so I'll add this to my personal to-do list.
I have done this already halfway. Should I submit it somewhere after polishing?
Good point: the data are acquired on an HPLC-qTOF instrument (scan-to-scan accuracy around 30 ppm). If we look at the data with the vendor software (Agilent MassHunter), we can clearly see that these are two distinct peaks, of which the first has around 20-30 times higher intensity (as is correctly calculated by xcms). Most probably, the second peak is a result of instrumental detector ringing. I have attached the raw chromatographic data, extracted manually from an xcmsRaw:
However, getEIC( xcmsSet, group ) as well as getEIC( xcmsSet, mzrange ) collapse the two (distinctly detected) mass traces into one, and my question is how this can be avoided.
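One way around it that I can think of (an assumption on my side, not a built-in option): extract each mass trace from the raw data with its own narrow window via `rawEIC` on the `xcmsRaw`, instead of `getEIC` on the `xcmsSet`. The file name and the exact window of the second trace are illustrative:

```r
library(xcms)

xraw <- xcmsRaw("sample01.mzdata.xml")                 # placeholder file name
low  <- rawEIC(xraw, mzrange = c(854.1447, 854.1467))  # first mass trace
high <- rawEIC(xraw, mzrange = c(854.1888, 854.1908))  # trace ~0.0441 higher

plot(xraw@scantime[low$scan], low$intensity, type = "l")
lines(xraw@scantime[high$scan], high$intensity, col = "red")
```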
I have an xcmsSet (centWave peak-picked, obiwarp RT-corrected, grouped, and peaks filled) in which two peak groups have a very small m/z difference of around 0.0441 and the same peak shapes but different abundances and apices:
Many thanks for the explanations. Actually, you're right:
Quote
...would probably make sense as long as the mzmin & mzmax that you're talking about is from a single file, ie from a single run of 'findPeaks'.
I should have mentioned that I am calling `getEIC` directly on a single chromatogram, solely to access ion traces of targeted compounds. I am not even calling `findPeaks`. My data are acquired on a qTOF with a scan-to-scan accuracy around 35 ppm.
I understand the need for binning when performing profile generation, although I do not really understand why this is necessary when accessing ion traces on the raw data. And in the case of overbinning I would have expected all bins within the given m/z range to be collapsed, which is obviously not the case. I currently work around this with the following solution (not yet vectorized; it could be improved by using `data.table` instead of `data.frame`):
  # sum intensities of signals within single scans
  eic <- aggregate( intensity ~ scantime, eic, sum )
  return(eic)
}

eic <- getIonTrace( xcmsRaw( filename = "test.mzdata.xml" ), mz = 378.0977, ppm = 35 )
plot( intensity ~ scantime, eic )
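Since the snippet starts mid-function, here is a guess at how the missing upper half could look, built on `rawMat()`; `getIonTrace` is the poster's own helper, not an xcms function, and the body below is my reconstruction, not the original:

```r
library(xcms)

# Reconstruction (assumed, not the poster's original code): collect raw
# signals within a ppm window and sum them per scan
getIonTrace <- function(xraw, mz, ppm = 35) {
  d <- mz * ppm * 1e-6
  m <- rawMat(xraw, mzrange = c(mz - d, mz + d))
  eic <- data.frame(scantime = m[, "time"], intensity = m[, "intensity"])
  # sum intensities of signals within single scans
  aggregate(intensity ~ scantime, eic, sum)
}
```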
As a follow-up question, which has nothing to do with `getEIC`: does the step parameter influence the peak or ROI detection in `xcmsSet( files, method = "centWave", ... )`? I thought that no profMethod is applied when calling xcmsSet with centWave, but today I learned that every xcmsRaw is subject to binning. So, if the step parameter makes a difference, what would be a good value for our qTOF data?
I am playing around with getEIC on xcmsRaw objects and do not really understand the step parameter.
In the beginning I ignored this parameter completely and left it at its default (0.1). But then I realized that although I provided m/z ranges with a width of about 10-35 ppm, getEIC extracted mass traces within a much broader range. So I lowered the step parameter to 0.0001. However, now I again get strange results, as shown by the three ion chromatograms:
So my question is: what happened to the second and third EIC (30 ppm/35 ppm, step = 0.0001), and why? To which value should I set the step size? Would something like 0.1*(mzmax - mzmin) be a robust value?
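The heuristic in the last question, spelled out (just the arithmetic; no claim that getEIC actually behaves well with it):

```r
# Candidate step size derived from the requested window width
step_for_range <- function(mzmin, mzmax) 0.1 * (mzmax - mzmin)

# a ~35 ppm window around m/z 378.0977 -> step of roughly 0.0026
step_for_range(378.0845, 378.1109)
```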