Messages - Biesterfeld

2
XCMS / Re: simpleloess - XCMS
Hey manal,

hard to say without a reproducible example. What's the output of traceback()? What's your sessionInfo()?

I assume the error occurs when you call xcms::retcor.peakgroups? Could you give us a minimal reproducible example?

Cheers, Isam
3
XCMS / Re: Again getEIC(): Is it supposed to work like this
Hey Ralf,

this is HILIC under HPLC conditions. The MS data are recorded by an Agilent qTOF/MS 6540. Currently, I call centWave with
Code: [Select]
ppm = 35, peakwidth = c(12, 300), snthresh = 10,  prefilter = c(0,0), mzCenterFun = "wMean", integrate = 2, mzdiff = 0.001,  fitgauss = FALSE, noise = 1000
I know that those peak shapes do not look very nice, but I'd estimate that 80% of my peaks look like that or even worse. However, the peaks themselves are extracted pretty well.

Quote
Only the core region of your features seem to be within that 30 (?) ppm window
The core region of the first peak has an m/z width of 4 ppm; the entire peak after widening by +/- 0.0003 Da has a width of 8 ppm. The core region of the second peak has a width of only 2 ppm, but after manually widening by +/- 0.0025 Da the width is still only 8 ppm. Plus, I plotted exactly the RT range indicated in the peak table. So even if the peaks are not really nice looking, why wouldn't I want to have the entire peak extracted and integrated? Please let me know what else I should look for or provide to better understand what is going wrong.
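
For reference, the ppm widths quoted above can be recomputed from the mzmin/mzmax columns of the two peak-table rows posted in this thread (peaks at m/z 162.0765 and 854.1456). This is only a small sketch; `ppm_width` is a helper name made up for illustration and is not part of xcms.

```r
# Width of an m/z window in ppm, relative to the window centre.
# Hypothetical helper, not part of xcms.
ppm_width <- function(mzmin, mzmax) {
  (mzmax - mzmin) / ((mzmin + mzmax) / 2) * 1e6
}

# mzmin/mzmax as reported in the peak table
core1 <- ppm_width(162.0762, 162.0769)
core2 <- ppm_width(854.1447, 854.1467)

# the same windows after manual widening by +/- 0.0003 Da and +/- 0.0025 Da
wide1 <- ppm_width(162.0762 - 0.0003, 162.0769 + 0.0003)
wide2 <- ppm_width(854.1447 - 0.0025, 854.1467 + 0.0025)

round(c(core1 = core1, wide1 = wide1, core2 = core2, wide2 = wide2), 1)
```

Both widened windows come out at roughly 8 ppm, well below the ppm = 35 passed to centWave.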

Quote
CAMERA uses getEIC for EIC extraction, so the results will depend on the profstep parameter of your xcmsSet.
Ok! Which brings me again to the starting point in this and the other thread: I do not specify a profstep (which means the default of 0.1 is applied). To which value should I set the profstep then? How reliable is the peak correlation in CAMERA against the background we discussed already in the other thread?
4
XCMS / Re: Again getEIC(): Is it supposed to work like this
Hey Ralf,

thanks for your reply. In most cases adding +/- 0.0001 is sufficient to recover the peak. But there are still cases where it's necessary to make the m/z window wider to extract the centroids belonging to the peak. I can give you just two examples:
[attachment: pk2040.png] +/- 3E-4 needed with
Code: [Select]
      mz    mzmin    mzmax    rt rtmin  rtmax    into     intb     maxo sn egauss mu sigma  h    f dppm scale scpos scmin scmax lmin lmax
162.0765 162.0762 162.0769 76.68 44.58 107.07 1002001 970706.5 29823.78 55    NA NA    NA NA 3961    1    10    75    65    85  43  105

[attachment: pk2089.png] more than +/- 25E-4 needed with
Code: [Select]
      mz    mzmin    mzmax     rt  rtmin  rtmax     into     intb     maxo sn egauss mu sigma  h    f dppm scale scpos scmin scmax lmin lmax
854.1456 854.1447 854.1467 146.85 133.55 171.06 494611.6 475216.1 19721.88 39    NA NA    NA NA 4135    1    6  140  134  146  127  171

As long as I need to extract the ion chromatograms just for plotting, it's fine and I know how to deal with it. But my biggest concern is whether any of the observations made in this thread negatively affect the generation of pseudospectra in CAMERA.

Thanks, Isam
5
XCMS / Re: Again getEIC(): Is it supposed to work like this
As a follow-up:

For the implementation of the "rawEIC" function for xcmsSet I am using the m/z ranges given in the peak table. However, I just realized that mzmin and mzmax do not specify the borders of my peaks exactly. findmzROI finds the m/z range perfectly, but later in the implementation of findPeaks.centWave the m/z range is narrowed depending on the scale found. Maybe someone could explain why this is done?

The problem is that many EICs look like badly disrupted zigzag curves when signals within the peak have an m/z outside of this range. So how can I determine (or at least approximate) the true m/z range of the peaks from the peak table?

Many thanks
Isam
6
XCMS / Re: Long runtime while grouping!
Hi Dominic,

to be honest I am not a CAMERA ninja; I have only just started to get my hands on it. So this is just a thought:

Quote
after groupFWHM() we have a pseudospectrum A with 100 peaks. After groupCorr() some peaks will be withdrawn due to low correlations and pseudospectrum A will only have lets say 70 peaks.

Let's say within those 100 peaks you find two peaks which fulfill the criteria for being isotopes of each other (12C/13C ratio + m/z difference), but their peak shapes correlate less than groupCorr() "expects". Which of these contradictory observations is more reliable? Especially when considering that the abundance of the M+1 or M+2 peak is typically only a small fraction of that of the corresponding M peak? Consequently, the peak shapes might look different in certain cases, considering that large M peaks might be subject to ion suppression and that low M+X peaks could interfere with noise and the baseline.

Now, if those peaks are separated by groupCorr(), neither of them will be withdrawn, but they will be assigned to independent pseudospectra A and B. If you now apply findIsotopes(), you will lose the M+1 information for the M peak (in ps A), making a subsequent formula generation less reliable. And in ps B you'll find the M+1 without any annotation or, worse, with a wrong one.

That's why I currently apply findIsotopes() prior to groupCorr() as suggested in the vignette. Does this make sense to you?
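
The ordering described here could be sketched roughly as follows. It assumes the CAMERA package is installed and that `xset` is a grouped xcmsSet; the wrapper name `annotate_isotopes_first` is made up for illustration, and all CAMERA functions are called with their default parameters, which would need tuning for real data.

```r
# Sketch only: requires the CAMERA package when the function is called.
# `annotate_isotopes_first` is a hypothetical wrapper, not a CAMERA function.
annotate_isotopes_first <- function(xset) {
  an <- CAMERA::xsAnnotate(xset)   # wrap the grouped xcmsSet
  an <- CAMERA::groupFWHM(an)      # first pseudospectra, grouped by retention time
  an <- CAMERA::findIsotopes(an)   # annotate isotopes before correlation grouping
  an <- CAMERA::groupCorr(an)      # refine pseudospectra by peak-shape correlation
  an
}
```

Swapping the findIsotopes() and groupCorr() lines gives the other ordering discussed in this thread, which makes it easy to compare the resulting pseudospectra on the same data.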

Cheers,
Isam
7
XCMS / Re: Long runtime while grouping!
Hey Dominic,

good to hear that.

I am just curious: within the CAMERA workflow, you call findIsotopes() as the last step. Is there a reason for that? I am just asking because I usually call findIsotopes() between groupFWHM() and groupCorr(), since the isotope information can be used during correlation grouping, which avoids separating isotope peaks of the same compound into different pseudospectra.

Cheers,
Isam
8
XCMS / Re: Long runtime while grouping!
Dominic,

I am afraid the grouping just takes that long. Usually I use group.density for my data and just tried group.nearest for comparison on an example data set (6 samples, around 3000 peaks/sample). My machine has almost the same configuration as yours (but running Mac OS X). Grouping takes about 1.5 minutes. From the description I expect that the algorithm runs in something like O(n^2), meaning that if you have 100 times more peaks, your runtime should be around 10000 times longer. I am curious whether someone else has an idea, but I am afraid that you either have to live with it or lower the number of detected peaks somehow.
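
As a back-of-the-envelope check of that estimate, here is a tiny sketch that extrapolates a runtime under the assumption of quadratic scaling. The quadratic behaviour is only my expectation from the description, not something measured, and `scale_runtime_quadratic` is a made-up helper name.

```r
# Extrapolate a runtime assuming quadratic scaling in the number of peaks.
# Hypothetical helper for a rough estimate only.
scale_runtime_quadratic <- function(t_ref, n_ref, n_new) {
  t_ref * (n_new / n_ref)^2
}

# 1.5 min for about 3000 peaks per sample; now 100 times more peaks
est_minutes <- scale_runtime_quadratic(t_ref = 1.5, n_ref = 3000, n_new = 300000)
est_minutes         # 15000 minutes, i.e. 10000 times longer
est_minutes / 60    # the same estimate in hours
```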

Before going on to grouping I'd try to better understand the peak table. Maybe you should start by looking only at the peaks of one sample:
Code: [Select]
pks <- peaks(xset)
pksSample1 <- pks[ pks[ ,"sample" ] == 1, ]

I'd focus on three aspects:

a) How many peaks overlap in RT? This could give you an idea of whether you have to deal with many adducts or in-source fragments.
b) How many peaks are very close in m/z and how close are they on the RT axis to each other?
c) How are the abundances distributed? Maybe you are collecting too much noise at low abundance and could avoid this directly when calling xcmsSet.
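
The three checks above could be sketched in base R roughly like this. The toy matrix `pks1` uses made-up values with the usual xcms peak-table column names; with real data one would use the `pksSample1` matrix from above instead.

```r
# Toy peak table with xcms-style column names (values are made up).
pks1 <- cbind(
  mz    = c(115.0867, 115.1308, 162.0765, 162.0770, 854.1456),
  rt    = c(64.2, 64.6, 76.7, 120.0, 146.9),
  rtmin = c(52.6, 59.6, 44.6, 110.0, 133.6),
  rtmax = c(75.7, 69.7, 107.1, 130.0, 171.1),
  into  = c(6.3e5, 1.9e4, 1.0e6, 5.0e3, 4.9e5)
)

# a) for each peak, the number of other peaks whose RT window overlaps it
rt_overlaps <- sapply(seq_len(nrow(pks1)), function(i) {
  sum(pks1[, "rtmin"] <= pks1[i, "rtmax"] &
      pks1[, "rtmax"] >= pks1[i, "rtmin"]) - 1
})

# b) m/z distance (in ppm) and RT distance between m/z-sorted neighbours
ord <- order(pks1[, "mz"])
neighbours <- data.frame(
  mz      = pks1[ord, "mz"][-1],
  dmz_ppm = diff(pks1[ord, "mz"]) / pks1[ord, "mz"][-1] * 1e6,
  drt     = abs(diff(pks1[ord, "rt"]))
)

# c) distribution of integrated abundances on a log10 scale
abund_quantiles <- quantile(log10(pks1[, "into"]))
```

For a full sample with several hundred thousand peaks the sapply() in a) compares every peak against all others, so it may be worth trying it on a subset first.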

Just a couple of ideas. Sorry that I couldn't help more.

Cheers,
Isam
9
XCMS / Re: Long runtime while grouping!
Hey Dominic,

1.) Which R version are you actually using? 32 or 64 bit? What's the output of
Code: [Select]
R.version
Can you see from the task manager how much memory is allocated to Rsession?

2.) Your CPU has 4 cores, so you should make use of them! At least xcmsSet can be parallelized by
Code: [Select]
xcmsSet( ..., nSlaves = 4 )
However, in my experience using 3 slaves on a low number of chromatograms might be faster in the end, especially considering how Ivy Bridge's Turbo Boost works. In your case (just 5 chromatograms) I would even use only 2 slaves. In order to use the multicore functionality you need to have Rmpi or snow installed as the parallelization backend.

3.) I am not sure, but your netCDF files seem to be really huge. Are the MS data in profile mode? If yes, could you convert them to centroid mode prior to the xcms analysis?

4.) Do 289,296 peaks per sample seem reasonable to you? It's this peak count which makes the grouping so slow. Are you really expecting a scan-to-scan accuracy of 2 ppm? Our qTOF, which is advertised with a mass accuracy better than 5 ppm (meaning a weighted m/z mean over an entire peak), actually exhibits a scan-to-scan accuracy of up to 35 ppm. Choosing the ppm parameter too low might lead to peaks that are disrupted along the m/z dimension. Additionally, peaks might be disrupted in the time dimension as well when peakwidth is not chosen properly. I guess you are running a UPLC (in UPLC mode), given that you set peakwidth to (6,15)?

5.) You mention that your peaks are not correctly aligned. Aligned along retention time? Then you should consider calling retcor prior to grouping. But watch out, retcor.obiwarp always took 10-50 times longer for me than calling group.density on the same data set. So, if it makes any sense to you, try to lower the number of found peaks before proceeding.

Cheers,
Isam
10
XCMS / Re: Again getEIC(): Is it supposed to work like this
Sorry Ralf,

obviously I misunderstood it again. Since I applied centWave, I never specified a step size and did not expect that it is set and used internally. Looking at the implementation of getEIC( xcmsSet ) I just saw what happens.

My concern is also how this influences the subsequent workflow in CAMERA. I could imagine that groupCorr makes heavy use of getEIC.

Quote
An alternative would be to implement a "rawEIC" method for xcmsSet, i.e. not to use the profile matrix but the full raw data (rawEIC for xcmsRaw) for EIC generation.
I think this might be nice to have anyway, especially in combination with centWave, so I'll add this to my personal to-do list.

I have done this already halfway. Should I submit it somewhere after polishing?

Cheers,
Isam
11
XCMS / Re: Again getEIC(): Is it supposed to work like this
meow,

good point: the data are acquired on an HPLC-qTOF instrument (scan-to-scan accuracy around 30 ppm). If we look at the data with the vendor software (Agilent MassHunter) we can clearly see that these are two distinct peaks, where the first has around 20-30 times higher intensity (as is correctly calculated by xcms). Most probably, the second peak is a result of instrumental detector ringing. I have attached the raw chromatographic data, extracted manually from an xcmsRaw:

[attachment: raw.png]

However, getEIC( xcmsSet, group ) as well as getEIC( xcmsSet, mzrange ) collapse the two (distinctly detected) mass traces to one and my question is, how this can be avoided?

Many thanks,
Isam
12
XCMS / Again getEIC(): Is it supposed to work like this
Hey there,

I already had some issues with getEIC() a couple of weeks ago (see http://www.metabolomics-forum.com/viewtopic.php?f=8&t=384), but since my new problem has nothing to do with that, I opened a new topic.

I have an xcmsSet (centWave peak-picked, obiwarp RT-corrected, grouped, and peaks filled) where two peak groups have a very small m/z difference of around 0.0441 and the same peak shapes but different abundances and apexes:

Code: [Select]
> groups( xset )[98:99, ]
        mzmed    mzmin    mzmax rtmed rtmin rtmax npeaks B_Cal D_Cal
[1,] 115.0867 115.0867 115.0868 64.23 63.65 64.75      6    3    3
[2,] 115.1308 115.1293 115.1309 64.62 64.61 64.81      6    3    3
> pks <- peaks( xset )[ unlist( xset@groupidx[ 98:99 ] ), c(1:6,9) ]
> pks
            mz    mzmin    mzmax    rt rtmin rtmax      maxo
 [1,] 115.0868 115.0864 115.0870 63.66 52.62 75.72 631219.562
 [2,] 115.0867 115.0861 115.0870 64.75 52.65 75.22 505789.812
 [3,] 115.0867 115.0864 115.0868 64.62 52.62 75.66 433946.500
 [4,] 115.0868 115.0864 115.0869 63.65 48.54 78.77 513818.281
 [5,] 115.0868 115.0864 115.0869 63.84 53.44 75.68 453958.312
 [6,] 115.0867 115.0866 115.0868 64.62 52.56 75.66 397969.844
 [7,] 115.1308 115.1306 115.1310 64.62 59.64 68.64  19231.410
 [8,] 115.1309 115.1303 115.1311 64.75 54.72 71.49  14101.018
 [9,] 115.1293 115.1292 115.1294 64.62 59.64 69.66  9610.525
[10,] 115.1309 115.1308 115.1310 64.61 60.59 68.64  15007.605
[11,] 115.1308 115.1307 115.1309 64.81 59.08 69.64  12008.001
[12,] 115.1308 115.1306 115.1309 64.62 59.64 69.66  10699.456

However, getEIC evidently does not resolve those peaks:
Code: [Select]
plot( getEIC( xset, group = 98 ) )
plot( getEIC( xset, group = 99 ) )
[attachment: eics.png]

Even if I specify the mzrange explicitly the mass traces are not resolved:
Code: [Select]
plot( getEIC( xset, mzrange= pks[ 1:6, 2:3 ] , rtrange = pks[ 1:6, 5:6 ] ) )
plot( getEIC( xset, mzrange= pks[ 7:12, 2:3 ] , rtrange = pks[ 1:6, 5:6 ] ) )
[attachment: eic2.png]

Is it something I am doing wrong or is getEIC just supposed to work like this?
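
One possible explanation, which the replies in this thread point to, is the profile step: getEIC on an xcmsSet works on the binned profile matrix rather than on the raw centroids. The sketch below only illustrates the idea. It assumes bins centred on multiples of the step, which is my assumption about the binning and not taken from the xcms source; `bin_center` is a made-up helper.

```r
# Median m/z of the two peak groups from the groups() output above
mz_groups <- c(group98 = 115.0867, group99 = 115.1308)

# Hypothetical helper: centre of the bin an m/z value falls into,
# assuming bins centred on multiples of `step`.
bin_center <- function(mz, step) round(mz / step) * step

# m/z difference between the groups in ppm (several hundred ppm)
diff_ppm <- unname(diff(mz_groups) / mz_groups[1] * 1e6)

bins_coarse <- bin_center(mz_groups, step = 0.1)   # default step: same bin
bins_fine   <- bin_center(mz_groups, step = 0.01)  # finer step: separate bins
```

Under that assumption both groups collapse into one bin at a step of 0.1 although they are far more than 35 ppm apart, while a step of 0.01 keeps them separate.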

Many thanks in advance,
Isam
14
XCMS / Re: How does "step" influence the result of getEIC?
Hey Paul,

many thanks for the explanations. Actually you're right:
Quote
...would probably make sense as long as the mzmin & mzmax that you're talking about is from a single file, ie from a single run of 'findPeaks'.
I should have mentioned that I am calling `getEIC` directly on a single chromatogram with the sole purpose of accessing ion traces of targeted compounds. I am not even calling `findPeaks`. My data are acquired by a qTOF with a scan-to-scan accuracy of around 35 ppm.

I understand the need for binning when performing profile generation, although I do not really understand why this is necessary when trying to access ion traces in the raw data. And in the case of over-binning I would have expected all bins within the given m/z range to be collapsed, which is obviously not the case. I now circumvent this with the following solution (which is not vectorized yet and could be improved by using `data.table` instead of `data.frame`):

Code: [Select]
getIonTrace <- function( obj, mz, ppm, rtRange ) {
  # get table mz, intensity, scantime
  scanTime <- rep( obj@scantime, times = diff(  c( obj@scanindex, length( obj@env$mz ) ) ) )
  eic <- data.frame( intensity = obj@env$intensity, mz = obj@env$mz, scantime = scanTime ) 
 
  # filter RT
  if( !missing( rtRange ) ) {
    eic <- eic[ eic$scantime >= rtRange[1] & eic$scantime <= rtRange[2], ]
  }
 
  # filter mz
  mzRange <- c( mz * (1 - 1E-6 * ppm) , mz * (1 + 1E-6 * ppm) )
  eic <- eic[ eic$mz >= mzRange[1] & eic$mz <= mzRange[2], ]
 
  # sum intensities of signals within single scans
  eic <- aggregate( intensity ~ scantime, eic, sum )
  return(eic)
}
eic <- getIonTrace( xcmsRaw( filename="test.mzdata.xml"  ) , mz = 378.0977, ppm = 35 )
plot( intensity ~ scantime, eic )
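
To make the scan-time expansion in the first line of the function easier to follow, here is the same logic on toy vectors that stand in for the xcmsRaw slots. All values are made up, and `scanindex` is assumed to hold the zero-based offset of each scan's first data point, which is how the function above uses it.

```r
# Toy stand-ins for obj@scantime, obj@scanindex, obj@env$mz, obj@env$intensity
scantime  <- c(10.0, 10.5, 11.0)   # one retention time per scan
scanindex <- c(0L, 3L, 5L)         # zero-based offset of each scan's first point
mz        <- c(378.09, 378.10, 400.2, 378.10, 378.105, 378.09, 500.0)
intensity <- c(100, 200, 50, 300, 150, 400, 10)

# number of data points per scan, then one scan time per data point
pointsPerScan    <- diff(c(scanindex, length(mz)))
scanTimePerPoint <- rep(scantime, times = pointsPerScan)

# same m/z filter and per-scan summation as in getIonTrace
eicToy  <- data.frame(intensity = intensity, mz = mz, scantime = scanTimePerPoint)
mzRange <- 378.0977 * (1 + c(-1, 1) * 1e-6 * 35)
eicToy  <- eicToy[eicToy$mz >= mzRange[1] & eicToy$mz <= mzRange[2], ]
eicToy  <- aggregate(intensity ~ scantime, eicToy, sum)
```

Note that scans without any signal inside the m/z window simply drop out of the aggregated result instead of appearing with an intensity of zero.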

As a follow-up question, which has nothing to do with `getEIC`: Does the step parameter influence the peak or ROI detection in `xcmsSet( files, method = "centWave", ... )`? I thought that no profMethod is applied when calling xcmsSet with centWave, but today I understood that every xcmsRaw is subject to binning. So, what would be a good parameter for our qTOF data, if the step parameter makes a difference?

Many thanks,
Isam
15
XCMS / How does "step" influence the result of getEIC?
Hey there,

I am playing around with getEIC on xcmsRaw objects and do not really understand the step parameter.

In the beginning I ignored this parameter completely and left it at its default (0.1). But then I realized that although I provided m/z ranges with a width of about 10-35 ppm, getEIC extracted mass traces within a much broader range. So I lowered the step parameter to 0.0001. However, now I get strange results again, as shown by the three ion chromatograms:

[attachment: eic1.png] [attachment: eic2.png] [attachment: eic3.png]

So my question is: what happened to the second and third EIC (30 ppm/35 ppm, step = 0.0001), and why? To which value should I set the step size? Would something like 0.1*(mzmax - mzmin) be a robust value?
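
To put a number on that proposal, here is a small sketch that computes the m/z window for a target mass and the resulting step of 0.1*(mzmax - mzmin). It treats ppm as a half-width around the target m/z, which is an assumption; `step_from_ppm` is a made-up helper, and the example mass is just one of my target compounds. Whether such a step is actually robust is exactly the open question.

```r
# m/z window for a target mass with a +/- ppm tolerance, plus the step size
# proposed above. Hypothetical helper, not part of xcms.
step_from_ppm <- function(mz, ppm, fraction = 0.1) {
  mzmin <- mz * (1 - 1e-6 * ppm)
  mzmax <- mz * (1 + 1e-6 * ppm)
  c(mzmin = mzmin, mzmax = mzmax, step = fraction * (mzmax - mzmin))
}

res35 <- step_from_ppm(mz = 378.0977, ppm = 35)
res10 <- step_from_ppm(mz = 378.0977, ppm = 10)
signif(rbind(ppm35 = res35, ppm10 = res10), 7)
```

Because the window width scales with m/z, a step derived this way differs for every target mass and tolerance, so it cannot be one fixed value for a whole data set.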

Many thanks
Isam