Hello,
I'm running into some problems with the annotate() wrapper function in the CAMERA package. I am using CAMERA to annotate 686 LC/MS samples, hoping to eventually extract the isotope and "pseudospectra" data from the final object.
I am using an xcmsSet (xset) that I converted from an XCMSnExp object, as shown below, after peak detection, grouping, retention time correction, regrouping, and peak filling in xcms.
xset <- x_filled
xset <- as(xset, "xcmsSet")
I run this to avoid this error: https://support.bioconductor.org/p/69414/
imports = parent.env(getNamespace("CAMERA"))
unlockBinding("groups", imports)
imports[["groups"]] = xcms::groups
lockBinding("groups", imports)
Then my function call:
xset_a = annotate(xset,
                  quick=FALSE,
                  sample=NA,
                  nSlaves=16,
                  sigma=6,
                  perfwhm=0.6,
                  cor_eic_th=0.75,
                  graphMethod="hcs",
                  pval=0.05,
                  calcCiS=TRUE,
                  calcIso=TRUE,
                  calcCaS=FALSE,
                  maxcharge=4,
                  maxiso=4,
                  minfrac=0.5, # 0.25?
                  psg_list=NULL,
                  rules=NULL,
                  polarity=subset.polarity,
                  multiplier=3,
                  max_peaks=100,
                  intval="into",
                  ppm=2.5,
                  mzabs=0.0015
)
I get this output:
Starting snow cluster with 16 local sockets.
Run cleanParallel after processing to remove the spawned slave processes!
Start grouping after retention time.
Warning: Feature 1193 looks odd for at least one peak. Please check afterwards.
Warning: Feature 1190 looks odd for at least one peak. Please check afterwards.
Warning: Feature 239 looks odd for at least one peak. Please check afterwards.
Warning: Feature 1195 looks odd for at least one peak. Please check afterwards.
Warning: Feature 1204 looks odd for at least one peak. Please check afterwards.
Warning: Feature 236 looks odd for at least one peak. Please check afterwards.
Warning: Feature 10783 looks odd for at least one peak. Please check afterwards.
Warning: Feature 6130 looks odd for at least one peak. Please check afterwards.
Warning: Feature 10777 looks odd for at least one peak. Please check afterwards.
Warning: Feature 10790 looks odd for at least one peak. Please check afterwards.
Warning: Feature 1666 looks odd for at least one peak. Please check afterwards.
Warning: Feature 1672 looks odd for at least one peak. Please check afterwards.
Warning: Feature 3679 looks odd for at least one peak. Please check afterwards.
Warning: Feature 10786 looks odd for at least one peak. Please check afterwards.
Warning: Feature 4177 looks odd for at least one peak. Please check afterwards.
Warning: Feature 4185 looks odd for at least one peak. Please check afterwards.
Warning: Feature 1207 looks odd for at least one peak. Please check afterwards.
Warning: Feature 6139 looks odd for at least one peak. Please check afterwards.
....
*etc. I've cut the warnings down to save space. There were 623, the same as the number of samples.*
Created 796 pseudospectra.
Generating peak matrix!
Run isotope peak annotation
% finished: 10 20 30 40 50 60 70 80 90 100
Found isotopes: 34751
Start grouping after correlation.
Generating EIC's ..
Warning: Found NA peaks in selected sample.
Calculating peak correlations in 796 Groups...
% finished: 10 20 30 40 50 60 70 80 90 100
Calculating isotope assignments in 796 Groups...
% finished: 10 20 30 40 50 60 70 80 90 100
Calculating graph cross linking in 796 Groups...
% finished: 10
The function has been hanging here for about a day.
I have two questions:
One, what do these warnings imply? I haven't received them before. I looked into the CAMERA source code to try to clarify, but I am still not sure what they mean.
Two, is there an obvious reason my code would be hanging here? I have run CAMERA a lot and never experienced it hanging like this before.
Thanks for any insights you all might have. If there is any more info I should provide, or other places I should inquire, let me know.
Thanks
Henry Holm
PhD Student
"Found isotopes: 34751" indicates that you have an insane number of peaks. How many peaks in your xset? That is probably why it is hanging.
Hi Jan,
Thank you so much for the response. Yes, it is a very large data set. Here is the peak-filled XCMSnExp object before I turn it into an xset.
> x_filled
MSn experiment data ("XCMSnExp")
Object size in memory: 109.46 Mb
- - - Spectra data - - -
MS level(s): 1
Number of spectra: 500238
MSn retention times: -1:39 - 30:21 minutes
- - - Processing information - - -
Data loaded [Fri Mar 30 17:59:13 2018]
Filter: select MS level(s) 1 [Fri Mar 30 17:59:18 2018]
MSnbase version: 2.4.2
- - - Meta data - - -
phenoData
rowNames: AE1319_A001_QE001613.mzXML AE1319_A002_QE001672.mzXML ... NetTrapQC107_QE002304.mzXML (623 total)
varLabels: sampleNames
varMetadata: labelDescription
Loaded from:
[1] AE1319_A001_QE001613.mzXML... [623] NetTrapQC107_QE002304.mzXML
Use 'fileNames(.)' to see all files.
protocolData: none
featureData
featureNames: F001.S0001 F001.S0005 ... F623.S3285 (500238 total)
fvarLabels: fileIdx spIdx ... spectrum (28 total)
fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
- - - xcms preprocessing - - -
Chromatographic peak detection:
method: centWave
28322726 peaks identified in 623 samples.
On average 45462 chromatographic peaks per sample.
Alignment/retention time adjustment:
method: peak groups
Correspondence:
method: chromatographic peak density
106874 features identified.
Median mz range of features: 0.0052374
Median rt range of features: 15.996
19590141 filled peaks (on average 31444.85 per sample).
Is there a way for me to run CAMERA in parallel for steps other than the peak annotation? I understand it makes a snow cluster for that, but there don't seem to be built-in ways to run the isotope finding, groupCorr, etc. in parallel.
Are there other steps I should take to manage the size of the xset?
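The peak and feature counts of the converted xset itself can also be tallied directly. A minimal sketch in base R, where `peaks_per_sample` is a hypothetical helper and the toy matrix stands in for the real peak table; the commented lines assume that xcms is loaded and that `peaks()` and `groups()` return the peak and feature matrices of an xcmsSet:

```r
# Hypothetical helper: summarise a peak matrix that has a "sample" column.
peaks_per_sample <- function(peak_matrix) {
  counts <- as.vector(table(peak_matrix[, "sample"]))
  c(total = nrow(peak_matrix), mean_per_sample = mean(counts))
}

# Toy peak table: two peaks in sample 1, one peak in sample 2.
toy_peaks <- cbind(mz = c(100.1, 200.2, 300.3), sample = c(1, 1, 2))
print(peaks_per_sample(toy_peaks))

# With a real xcmsSet (assumption: xcms is loaded and `xset` exists):
# peaks_per_sample(xcms::peaks(xset))
# nrow(xcms::groups(xset))   # number of features
```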
You have 106,874 features. That is an insane number, about 10 times what I think is reasonable to get. CAMERA chokes trying to build a network between all these features. With about 800 pseudospectra after the first CAMERA step, you have an average of roughly 130 features in each group (106,874 / 796), and some groups might be much larger.
You should focus on understanding why you get so many features. It looks like a peak picking issue, since each sample has a very high number of peaks. Check the sanity of your settings. Off the top of my head, these are the things that could lead to that:
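To find out which groups are the large ones, the first CAMERA step can be run on its own and the group sizes inspected before the correlation step. This is a minimal sketch, not a tested workflow. It assumes CAMERA's xsAnnotate() and groupFWHM() with the settings from the annotate() call above, and that the pseudospectra are stored in the `pspectra` slot as a list of peak indices. `group_size_summary` and `inspect_groups` are hypothetical helper names.

```r
# Summarise pseudospectrum sizes and the number of peak pairs each one implies.
# `pspectra` is assumed to be a list of integer vectors (peak indices per group).
group_size_summary <- function(pspectra) {
  n <- lengths(pspectra)
  data.frame(
    group   = seq_along(n),
    n_peaks = n,
    n_pairs = n * (n - 1) / 2  # pairwise comparisons grow quadratically
  )
}

# Hypothetical wrapper: run only the first CAMERA step, then list the
# pseudospectra from largest to smallest. Requires CAMERA when called.
inspect_groups <- function(xset, polarity = NULL) {
  xsa <- CAMERA::xsAnnotate(xset, sample = NA, polarity = polarity)
  xsa <- CAMERA::groupFWHM(xsa, sigma = 6, perfwhm = 0.6, intval = "into")
  sizes <- group_size_summary(xsa@pspectra)
  sizes[order(-sizes$n_peaks), ]
}

# Toy example with three groups of 2, 5 and 1000 peaks.
toy <- list(1:2, 1:5, 1:1000)
print(group_size_summary(toy))

# Rough average group size for the numbers reported in this thread.
print(106874 / 796)
```

Because the number of pairs grows with the square of the group size, a single very large group can dominate the run time even when the average group is modest.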
1) too low ppm (split peaks)
2) too low allowed minimum peak width (spikes could get picked)
3) too low allowed maximum peak width (split peaks)
4) No prefilter or too liberal settings (picking noise)
5) noisy data in general
6) contaminants with persistent presence together with liberal settings
7) Profile (continuum) mode data instead of centroid mode data (important!)
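Several of these points map onto arguments of the centWave peak picker. A minimal sketch, assuming xcms's CentWaveParam() constructor with its `ppm`, `peakwidth`, `prefilter` and `snthresh` arguments. The numeric values are placeholders for illustration, not recommendations, and `check_centwave_settings` is a hypothetical helper:

```r
# Placeholder centWave settings; values are illustrative only and must be
# tuned to the instrument and chromatography.
cw_settings <- list(
  ppm       = 5,           # m/z tolerance; too low can split peaks
  peakwidth = c(5, 30),    # min/max chromatographic peak width in seconds
  prefilter = c(3, 5000),  # keep regions with >= 3 scans above intensity 5000
  snthresh  = 10           # signal-to-noise cutoff
)

# Hypothetical helper: flag settings that are obviously malformed.
check_centwave_settings <- function(s) {
  problems <- character(0)
  if (s$ppm <= 0)
    problems <- c(problems, "ppm must be positive")
  if (length(s$peakwidth) != 2 || s$peakwidth[1] >= s$peakwidth[2])
    problems <- c(problems, "peakwidth must be c(min, max) with min < max")
  if (length(s$prefilter) != 2 || s$prefilter[2] <= 0)
    problems <- c(problems, "prefilter intensity must be positive to remove noise")
  problems
}

print(check_centwave_settings(cw_settings))

# Build the parameter object only if xcms is installed.
if (requireNamespace("xcms", quietly = TRUE)) {
  cwp <- xcms::CentWaveParam(
    ppm       = cw_settings$ppm,
    peakwidth = cw_settings$peakwidth,
    prefilter = cw_settings$prefilter,
    snthresh  = cw_settings$snthresh
  )
  print(cwp)
}
```

Re-running peak picking on a handful of samples with stricter settings and comparing the peaks per sample is a quick way to see which setting is responsible.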