I think I found an unfortunate behavior that causes fillPeaks to require an excessive amount of memory.
The problem seems to be here:
argList <- apply(ft, 1, function(x)
    list(file = x["file"],
         id = as.numeric(x["id"]),
         params = list(method = "chrom",
                       gvals = gvals,
                       prof = prof,
                       dataCorrection = object@dataCorrection,
                       polarity = object@polarity,
                       rtcor = object@rt$corrected[[as.numeric(x["id"])]],
                       peakrange = peakrange)))
As far as I can see, gvals is repeated for each sample.
This is the object I used for testing (we have a bigger set where it runs out of memory):
An "xcmsSet" object with 456 samples
Time range: 0.6-361 seconds (0-6 minutes)
Mass range: 60.0429-999.9048 m/z
Peaks: 1050355 (about 2303 per sample)
Peak Groups: 4232
Sample classes: data_converted_pos
Profile settings: method = bin
step = 0.005
Memory usage: 266 MB
So gvals is a 4232 x 456 matrix that takes 7.6 MB. When it is repeated for each of the 456 samples, that alone eats about 3.5 GB in this case, and it grows quadratically with the number of samples (each extra sample adds one column to gvals and one more copy of it).
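For illustration, a quick back-of-the-envelope check in R (assuming gvals is stored as integer; a double matrix would be twice the size):

## Matrix with the dimensions from the xcmsSet above:
gvals <- matrix(0L, nrow = 4232, ncol = 456)
print(object.size(gvals), units = "MB")          # ~7.4 MB

## One per-sample task as built by the apply() above (file name made up):
one_task <- list(file = "sample001.mzML", id = 1,
                 params = list(method = "chrom", gvals = gvals))
## Within a single R session the list elements all share the one matrix
## (copy-on-modify), but as soon as each task is serialized for a parallel
## worker, every payload carries its own full copy of gvals:
length(serialize(one_task, NULL)) / 2^20         # ~7.4 MB per task
## 456 tasks  =>  roughly 3.4 GB shipped to the workers in total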
If I understand correctly, gvals is identical in every element of the list. So could the structure of argList be changed to hold only one copy, along the lines of the sketch below?
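Something like this might work (untested sketch; the worker on the receiving end would have to be adapted to take the shared parameters as a separate argument, and the names sharedParams / fillOneSample are made up here):

## Sample-independent parameters live in a single shared object:
sharedParams <- list(method = "chrom",
                     gvals = gvals,
                     prof = prof,
                     dataCorrection = object@dataCorrection,
                     polarity = object@polarity,
                     peakrange = peakrange)

## argList now carries only the small per-sample pieces:
argList <- apply(ft, 1, function(x) {
    id <- as.numeric(x["id"])
    list(file = x["file"], id = id,
         rtcor = object@rt$corrected[[id]])
})

## The worker then receives sharedParams once (per node, not per sample):
## res <- lapply(argList, function(arg) fillOneSample(arg, sharedParams))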
edit: not to mention that the memory requirement gets multiplied by the number of cores you want to use for gap filling :shock: