Topic: memory error during fillpeak step

Re: memory error during fillpeak step

Reply #15
bump?


I also find that fillPeaks now seems to "lose" the intb column. I couldn't find the cause, though.
Blog: stanstrup.github.io

Re: memory error during fillpeak step

Reply #16
fillPeaks seems to lose all values except mz, mzmin, mzmax, rt, rtmin, rtmax, into, maxo and sample.
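
A quick way to see exactly which columns get dropped (assuming a grouped xcmsSet called xset):
Code:
## compare the peak-table columns before and after filling
before <- colnames(peaks(xset))
xset_filled <- fillPeaks(xset)
setdiff(before, colnames(peaks(xset_filled)))   # columns dropped by fillPeaks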

Is this somehow related to R 3.0?

any solutions?

br
Gunnar

Re: memory error during fillpeak step

Reply #17
Hi,

Quote from: "gunnar"
fillPeaks seems to lose all values except mz, mzmin, mzmax, rt, rtmin, rtmax, into, maxo and sample.
I think that problem was fixed with

CHANGES IN VERSION 1.37.1
--------------------------
BUG FIXES
    o fixed fillPeaks, which 1) dropped non-standard columns
    and 2) failed if nothing to do, based on patches by Tony Larson.

Or is it still present there?
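
To check whether the installed xcms already contains that fix (assuming the fix shipped with 1.37.1 and later):
Code:
packageVersion("xcms")               # installed version
packageVersion("xcms") >= "1.37.1"   # TRUE if the column fix should be included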

Yours,
Steffen
--
IPB Halle                          Mass spectrometry & Bioinformatics
Dr. Steffen Neumann         http://www.IPB-Halle.DE
Weinberg 3 06120 Halle     Tel. +49 (0) 345 5582 - 1470
sneumann(at)IPB-Halle.DE

Re: memory error during fillpeak step

Reply #18
I just wanted to refresh this thread with my own memory problems:
Windows 7 machine, 64 GB RAM
R 3.0.2
xcms 1.38.0

> xset
An "xcmsSet" object with 1610 samples

Time range: 1.7-1205.4 seconds (0-20.1 minutes)
Mass range: 55.0152-1199.8425 m/z
Peaks: 7274098 (about 4518 per sample)
Peak Groups: 6469

Memory usage: 1360 MB


xset <- fillPeaks.chrom(xset, nSlaves=2)
Error: cannot allocate vector of size 39.7 Mb

Windows Task Manager shows the memory completely full at the point of failure; running gc() afterwards frees it again.
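
For anyone reproducing this, a few ways to watch R's own memory use from inside the session (memory.size() is Windows-only):
Code:
gc()                      # force a garbage collection and print current usage
memory.size()             # MB currently in use by this R session (Windows only)
memory.size(max = TRUE)   # maximum MB obtained from the OS so far (Windows only)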

Re: memory error during fillpeak step

Reply #19
Hi,

how far do you get with nSlaves=1? The parallel fillPeaks currently
passes the whole xcmsSet down to every slave, and each slave operates on a subset
of the samples. The cleverer way would be to pass trimmed xcmsSets to the slaves,
so the memory requirement is not multiplied by the number of slaves.
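
As a rough back-of-the-envelope for the current behaviour (assuming every slave really receives a full copy; xset here is just a placeholder for the set reported above):
Code:
xset_mb  <- as.numeric(object.size(xset)) / 2^20   # size of the xcmsSet in MB
n_slaves <- 2
xset_mb * (1 + n_slaves)   # roughly: the master copy plus one full copy per slave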

Steffen
--
IPB Halle                          Mass spectrometry & Bioinformatics
Dr. Steffen Neumann         http://www.IPB-Halle.DE
Weinberg 3 06120 Halle     Tel. +49 (0) 345 5582 - 1470
sneumann(at)IPB-Halle.DE

Re: memory error during fillpeak step

Reply #20
Quote from: "sneumann"
Hi,

how far do you get with nSlaves=1? The parallel fillPeaks currently
passes the whole xcmsSet down to every slave, and each slave operates on a subset
of the samples. The cleverer way would be to pass trimmed xcmsSets to the slaves,
so the memory requirement is not multiplied by the number of slaves.

Steffen
Though this is true, the biggest problem is that the data is repeated for each sample...
Blog: stanstrup.github.io

Re: memory error during fillpeak step

Reply #21
nSlaves=1 still fails.  I don't know how far it gets, but the memory climbs pretty quickly until I receive the same error message as before.

Re: memory error during fillpeak step

Reply #22
Hi,

It would be great if someone could test an xcms version prior to 1.35.4, e.g. 1.34.0 from
http://bioconductor.org/packages/2.11/b ... /xcms.html

That would be a way to rule out that my changes for parallel fillPeaks()
cause the problem. If 1.34.0 is fine, I'd need to think about an optimisation
for the parallel case.
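
One possible way to get that release (the repository URL below assumes the usual layout of the versioned Bioconductor 2.11 repository):
Code:
install.packages("xcms", repos = "http://bioconductor.org/packages/2.11/bioc")
packageVersion("xcms")   # should then report 1.34.0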

Yours,
Steffen
--
IPB Halle                          Mass spectrometry & Bioinformatics
Dr. Steffen Neumann         http://www.IPB-Halle.DE
Weinberg 3 06120 Halle     Tel. +49 (0) 345 5582 - 1470
sneumann(at)IPB-Halle.DE

Re: memory error during fillpeak step

Reply #23
Steffen et al.,

I just started testing the fillPeaks step using R v2.15.2 and xcms v1.34.0. I haven't completed the process yet, but I am 99.9% certain that this version will complete the fillPeaks step on the dataset I began earlier. I am ~10% through the data files and the memory usage by R has hardly changed over that time.
I will update this when I know for certain whether it successfully finished.
Corey

Re: memory error during fillpeak step

Reply #24
It did succeed. It seems the newer versions of fillPeaks are rather memory-inefficient.

Re: memory error during fillpeak step

Reply #25
I have a suggestion for a quick fix: use an environment to pass gvals. That way it is not repeated once per sample, and the code change is only minimal.
In my test with 600 samples, argList used 140 MB instead of 10.4 GB.

Code:
## store the (large) group-value matrix once, in an environment shared by all samples
gvals_env <- new.env(parent=baseenv())
assign("gvals", gvals, envir = gvals_env)

argList <- apply(ft,1,function(x) {
  ## Add only those samples which actually have NA in them
  if (!any(is.na(gvals[,as.numeric(x["id"])]))) {
    ## nothing to do.
    list()
  } else {
    list(file=x["file"],id=as.numeric(x["id"]),
        params=list(method="chrom",
                    gvals=gvals_env,  ## pass a reference to the shared environment, not a copy
                    prof=prof,
                    dataCorrection=object@dataCorrection,
                    polarity=object@polarity,
                    rtcor=object@rt$corrected[[as.numeric(x["id"])]],
                    peakrange=peakrange))
  }
})


fillPeaksChromPar would then need to do:
Code:
 gvals <- params$gvals$gvals
instead of
Code:
 gvals <- params$gvals
I don't know if there is a way around this. Is this used by any other functions? If not, it seems like a fast fix, though my knowledge of environments is very limited, so I don't know whether this has any unforeseen consequences.
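
For context, a minimal standalone sketch of why the environment trick shrinks argList (the matrix and sample count are made up for illustration; object.size() does not detect shared objects, so it counts the matrix once per element):
Code:
gvals <- matrix(rnorm(1e5), ncol = 100)   # stand-in for the group-value matrix

## matrix embedded in every element: reported size grows with the number of samples
arg_copy <- lapply(1:600, function(i) list(id = i, gvals = gvals))

## environment embedded instead: every element only holds a small reference
gvals_env <- new.env(parent = baseenv())
assign("gvals", gvals, envir = gvals_env)
arg_ref <- lapply(1:600, function(i) list(id = i, gvals = gvals_env))

format(object.size(arg_copy), units = "Mb")   # roughly 600 x the matrix size
format(object.size(arg_ref),  units = "Mb")   # only a small fraction of that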
Blog: stanstrup.github.io

Re: memory error during fillpeak step

Reply #26
Thanks Jan, this sounds impressive!

We have now also created the github-bioc bridge
at https://github.com/sneumann/xcms
Could you send this patch as a pull request?

Thanks in advance,
yours,
Steffen
--
IPB Halle                          Mass spectrometry & Bioinformatics
Dr. Steffen Neumann         http://www.IPB-Halle.DE
Weinberg 3 06120 Halle     Tel. +49 (0) 345 5582 - 1470
sneumann(at)IPB-Halle.DE

Re: memory error during fillpeak step

Reply #27
Done. I also managed to compile xcms and test the function directly. It appears to work as intended :)
Blog: stanstrup.github.io