Messages - cbroeckl

61
XCMS - FAQ / Re: Time for scan x greater than scan y - ProteoWizard
Paul,

I am trying to use ProteoWizard as you describe (I have been using DataBridge) to convert Waters raw data to mzXML format.  However, I am seeing some conversion issues which make me uncomfortable.  Whenever I use DataBridge to convert to cdf, the masses are identical between the raw data and the cdf data.  When I use massWolf to convert to mzXML, I get the same mass data.  When I use MSConvert - either through the command line or the GUI - I see a shift in mass, with an increase of about 0.05 - 0.1 Da.  Have you seen this?  I just want to make sure I am not doing anything wrong - and if I am not, then I want to make sure that others are aware of the bug - it kinda throws a wrench in mass accuracy....
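
To put the reported shift in perspective, here is a minimal sketch that converts an absolute mass shift in Da into a relative error in ppm. The m/z value of 195 is an assumption chosen only for illustration (the post does not say which ions were checked), and `ppm_error` is a hypothetical helper, not part of ProteoWizard or XCMS.

```python
def ppm_error(shift_da, mz):
    """Relative mass error in parts per million for an absolute shift in Da."""
    return shift_da / mz * 1e6

# Hypothetical ion at m/z 195; the 0.05 and 0.1 Da shifts are the range from the post.
for shift in (0.05, 0.1):
    print(f"{shift} Da at m/z 195 is about {ppm_error(shift, 195.0):.0f} ppm")
```

A shift of this size would be in the hundreds of ppm for a small molecule, which is far outside what accurate-mass work on a Q-TOF normally tolerates.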
62
XCMS / XCMS issue on Linux cluster
I am trying for the first time to run XCMS on a cluster running Linux:

> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8      LC_NUMERIC=C             
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8   
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8 
 [7] LC_PAPER=en_US.UTF-8      LC_NAME=C               
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C     

attached base packages:
[1] stats    graphics  grDevices utils    datasets  methods  base   

other attached packages:
[1] xcms_1.26.1

loaded via a namespace (and not attached):
[1] tools_2.12.2

When I try to run XCMS on a cdf format Waters Q-TOF data file I get an error message which I haven't seen before.
> xset <- xcmsSet(filenames, method = "matchedFilter", fwhm = 8, max = 500, snthresh = 3,
            step = 0.05, steps = 2, mzdiff = 0.05, index = FALSE, sleep = 0)

111101_TwinGene_0095b01: Error in if (del == 0 && to == 0) return(to) :
  missing value where TRUE/FALSE needed

I have run the same data file using the same script on my Windows 7 desktop in the past.

> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          LC_TIME=English_United States.1252   

attached base packages:
[1] stats    graphics  grDevices utils    datasets  methods  base   

other attached packages:
[1] xcms_1.30.3     
>

I have tried to run mzXML formatted data through both, and that format works well on both the Linux and Windows systems.  Obviously, there are differences in both the R version and the XCMS package versions.  The problem is that I do not have control of the Linux cluster, so I was hoping to get some guidance on a potential resolution before requesting R and package upgrades.  I have had problems with cdf files previously using centWave, but not matchedFilter - but on the Linux system, I get an error message using either method.  Also worth noting - I am not using any of the parallel functions for this - just a single file/core.  So I assume it is more likely a Unix/installation issue rather than a parallel/cluster issue.  Any advice is appreciated.
63
CAMERA / Re: Isotope annotation and grouping: approach question
I did get it to work, but it took all 12 GB of RAM that I had available.  lpc uses a lot of memory, and is a bit messy.  I was maxed out until I ran gc() to trigger garbage collection.  The actual CAMERA object is only 203 MB.  If my dataset were any bigger, I don't think I could have made this work.
64
CAMERA / Re: Isotope annotation and grouping: approach question
R version 2.13.1 (2011-07-08)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-mingw32/x64 (64-bit)

I think that I may actually be running out of RAM, but I am not sure why.  If I keep an eye on the resource monitor, I see that I am maxing out, so the process must be consuming much more RAM than the reported half GB.  I will have to try on a bigger computer or find a way to reduce RAM use.  Thanks,
65
CAMERA / Re: Isotope annotation and grouping: approach question
I tried using this command:

an<-groupCorr(xset5, calcCiS=FALSE, calcCaS=TRUE, graphMethod="lpc")

and get an error:
Calculating peak correlations across samples.
 % finished: 100  Error: cannot allocate vector of size 506.6 Mb

xset5 is an xsAnnotate object from a filled, aligned xcmsSet object.
I have plenty of RAM (8 GB installed, 64-bit Windows 7), so why do I get this error?  Thanks again for all the advice.

Corey
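
A back-of-the-envelope sketch of what a 506.6 Mb allocation could correspond to. It rests on two assumptions that the post does not confirm: that the failing vector is a dense square matrix of 8-byte doubles (for example a feature-by-feature correlation matrix), and that the size in the error message is in units of 2^20 bytes. The helper names are hypothetical.

```python
import math

BYTES_PER_DOUBLE = 8
BYTES_PER_MB = 2 ** 20  # assumption: "Mb" in the error message means 2^20 bytes

def doubles_in_allocation(size_mb):
    """Number of 8-byte doubles that fit in an allocation of size_mb."""
    return int(size_mb * BYTES_PER_MB / BYTES_PER_DOUBLE)

def square_matrix_side(size_mb):
    """Side length n of the largest n x n double matrix that fits in size_mb."""
    return math.isqrt(doubles_in_allocation(size_mb))

n = square_matrix_side(506.6)
print(f"506.6 Mb holds a dense matrix of roughly {n} x {n} doubles")
```

Under those assumptions the request is for a matrix on the order of 8,000 x 8,000 values. The error reports only the single allocation that failed, so it says nothing about memory already held by earlier intermediate objects; a few copies of a matrix this size would add up to several GB.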
66
CAMERA / Re: Isotope annotation and grouping: approach question
Carsten,

Could you point me to more information on the highly connected subgraph and label-propagation community algorithms and how they are used in CAMERA?  Would they work as tools to group features without going through the groupFWHM steps first?  Thanks,
Corey
68
CAMERA / Re: Isotope annotation and grouping: approach question
Carsten,

I don't know how difficult this would be, but could you, rather than assigning features that are removed using groupCorr to a new group, leave those removed features unassigned?  The next step being to perform groupFWHM again, only on unassigned peaks, another groupCorr on the new groupings, etc etc?  This might be a more versatile workaround, ultimately, than using the 'hack' described above.  Just a thought.
69
CAMERA / Re: Isotope annotation and grouping: approach question
Thanks Carsten,

I am hesitant to simply broaden the FWHM window for this particular instance.  Because I want it to be a broadly applicable tool, I can't optimize for a single chromatographic peak.  I think what I may try is to apply your 'hack' for each feature of interest, looping through all those features I am interested in, just to see how that works.  I really appreciate the advice and clarification.

Corey
70
CAMERA / Re: Isotope annotation and grouping: approach question
Thanks Ralf and Carsten,

I just want to make sure I am understanding the procedure used. 

1. The largest peak is selected.
2. A FWHM window is assigned.
3. All other features within the window are now grouped.
4. Go to the next highest abundance feature in the entire dataset.
5. Repeat steps 2. and 3.
6. Correlational analysis for peak shape (within peak) and dataset wide (between peaks).
7. Features removed from initial FWHM groupings are assigned to a new group (question: is there any attempt to regroup these?).
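
Steps 1-5 as listed above can be sketched in a few lines. This is only an illustration of the procedure as I have described it, not CAMERA's actual groupFWHM code; the `Feature` class, the `group_by_rt_window` function, the retention times, and the intensities are all hypothetical, and skipping already-grouped features in step 4 is my assumption.

```python
from dataclasses import dataclass

@dataclass
class Feature:
    name: str
    rt: float         # retention time in seconds
    intensity: float

def group_by_rt_window(features, half_width):
    """Greedy grouping: seed each group at the most intense unassigned
    feature and pull in every unassigned feature within +/- half_width."""
    unassigned = sorted(features, key=lambda f: f.intensity, reverse=True)
    groups = []
    while unassigned:
        seed = unassigned[0]
        members = [f for f in unassigned if abs(f.rt - seed.rt) <= half_width]
        groups.append(members)
        unassigned = [f for f in unassigned if abs(f.rt - seed.rt) > half_width]
    return groups

# Hypothetical features: one large matrix peak and two small caffeine ions.
features = [
    Feature("matrix", 100.0, 1e6),
    Feature("mz195", 101.5, 1e4),
    Feature("mz138", 102.0, 5e3),
]
groups = group_by_rt_window(features, half_width=1.75)
print([[f.name for f in g] for g in groups])
```

With these made-up numbers the window is seeded on the matrix peak, so the 195 ion lands in the matrix group while the 138 ion falls just outside it and ends up in a group of its own.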

When I am collecting data for people it is much more likely that the features of most interest are small, rather than large.  And by extension of the same logic used when you developed CAMERA, a retention time based grouping is going to work best when centered around the feature of interest.  It will still suffer from the 'binning' limitation, but will suffer less if the bin is centered on the feature that is changing in response to the treatment than if centered on a nearby large feature. 

To provide a more concrete example:  I have recently run a test dataset in which I spiked a plant extract with five compounds, including caffeine.  I ran samples, then performed XCMS and CAMERA on the data.  The molecular ion for caffeine, at low spike levels, was grouped with several ions that were part of the plant 'matrix' or background at higher levels.  There is also an in-source fragment at m/z 138.  Though I know this is a genuine in-source fragment, it was not part of the CAMERA grouping, as there was a nearby background peak which shifted the average retention time by about 1.5-2 seconds, due to the broadening of the 138 peak.  The goal is that the 138 fragment and the 195 parent should be in the same group after FWHM grouping.  At low abundance, they are not.  From my understanding (which is incomplete, so if I am mistaken, please do correct me), this is because the CAMERA grouping was built not around 195, but around a larger 'matrix' feature, in which 195, but not 138, was included.  So I then have a CAMERA spectrum containing one feature representing caffeine, while the other caffeine feature is with a different group.  I have no idea how commonly this would occur, but given that interesting features are often relatively low in abundance, I suspect it isn't rare.  If, on the other hand, the FWHM grouping was centered around 195, there is a good chance that the two features would be grouped together, at least after the FWHM step.  They may then be separated based on peak shape, as the interfering matrix 138 would result in a different peak shape than that of the parent 195.  But the dataset-wide correlation may still retain it if a peak shape filter isn't applied.

Carsten, regarding your 'small hack':  how would one change that value?  Also, on a (possibly???) related note - how does the automatic file selection work for the CiS (peak shape) filter?

Thanks for all the feedback.  I am trying to get the most out of these programs, and it does help a lot to have the authors/developers so accessible.
71
CAMERA / Re: Isotope annotation and grouping: approach question
Thanks Ralf,

It is the first three words of your response that I am puzzled about.  "Chromatographically resolved" doesn't often apply to complex samples.  So if I have incomplete resolution of an abundant peak and a low-abundance peak, CAMERA is going to group the features from the low-abundance peak with the features from the high-abundance peak.  Let's say that these two peaks are separated by 1.5 seconds, and you are using FWHM and sigma values for grouping that make your retention time window using groupFWHM about 3.5 seconds - 1.75 seconds on either side (hypothetically).  Some of the features from the less abundant peak are going to fall within the groupFWHM window, while others will not.  This means that since we are using an abundance-based selection process for directing the grouping, the less abundant peaks will tend to be misgrouped at the first groupFWHM step.  The correlation-based filters can then remove the less abundant peak from the group containing the abundant peak, but the features that have been removed from the original group are now without a group (or in a group by themselves), correct?  While the subgrouping is a nice feature, if a true group has been split because of its retention time proximity to the retention time window boundaries of a major group, there is no way to put the lower abundance group back together, correct?

It seems that the solution to this would be to have a function that allows the user to select the feature of interest and center the retention time window around it, rather than around the nearest large peak.  I have been trying to figure out how to do this myself, but haven't really succeeded.  Basically, I just want to target the CAMERA process to a particular feature, rather than do so for the whole dataset, since the grouping process is driven by the major features.
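
To make the idea concrete, here is a minimal sketch of a retention time window centered on a chosen feature rather than on the largest peak. It is not a CAMERA function; `group_around_target` and the feature list are hypothetical, and the numbers reuse the hypothetical 1.5 second separation and 1.75 second half-window from above.

```python
def group_around_target(features, target_name, half_width):
    """Return the names of all features whose retention time lies within
    +/- half_width seconds of the named target feature."""
    target_rts = [rt for name, rt in features if name == target_name]
    if not target_rts:
        raise ValueError(f"no feature named {target_name!r}")
    target_rt = target_rts[0]
    return [name for name, rt in features if abs(rt - target_rt) <= half_width]

# Hypothetical (name, retention time in seconds) pairs.
features = [("abundant", 100.0), ("minor_a", 101.5), ("minor_b", 102.0)]

print(group_around_target(features, "abundant", half_width=1.75))
print(group_around_target(features, "minor_a", half_width=1.75))
```

Centered on the abundant peak, the window catches minor_a but misses minor_b; centered on minor_a, it catches both minor features (and the abundant peak too, which the correlation filters would then have to remove).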
72
CAMERA / Re: Isotope annotation and grouping: approach question
Maybe I should rephrase the question a bit.  I am interested in using CAMERA to help in the ID process.  I am doing so in a manner that is based on the features that are the most statistically interesting, based on the experimental design.  These features are often not the most abundant features, which CAMERA's groupFWHM seems to use to seed its groups.  Does the approach used by groupFWHM compromise the grouping of the less abundant features?  If so, is there a way to make CAMERA's groupFWHM center the retention time window on the feature of interest, rather than the largest feature?  Thanks.
73
CAMERA / Isotope annotation and grouping: approach question
The original publication describing CAMERA suggests that the isotope and adduct annotation is performed using a sliding retention time window, such that isotopes with non-identical retention times can be recognized for all features.  In the user guides accessed in R by typing ?findIsotopes, the recommendation, for the sake of performance, is to first group the peaks into pseudospectra.  If one first groups peaks using groupFWHM(), then performs isotope and adduct annotation, does the sliding window only apply within a grouped pseudospectrum?

If so, then there is a real possibility that two features that would be correlated using the validation tools within groupCorr might already be separated at that point, as the groupFWHM tool groups by retention time based on a window centered on an abundant feature.  Does this sound correct?

Also, if a feature is assigned to a pseudospectrum by groupFWHM, and is removed from a pseudospectrum with groupCorr, is there any effort made to regroup the removed feature with other removed features within a range of retention times? 

Just trying to understand the overall approach.  Thanks.
74
XCMS / Re: calibrate
Thanks for the response, Paul,

I am a bit confused by your comment - are you saying that for the calibrate function to work, you have to have an EXACT mass match?  If you have an exact mass match, why would you need to calibrate at all?  I am just very confused by the application of this function and the few results I have seen.  Thanks again,

Corey
75
XCMS / Re: calibrate
Pretty quiet here....  Anyone ever use the calibrate function?