Skip to main content

Topic: Peak alignment with large dataset (over 2500 samples and growing) (Read 632 times) previous topic - next topic

  • metabolon1
  • [*]
Re: Peak alignment with large dataset (over 2500 samples and growing)
Reply #15
Thanks all!!!

johannes.rainer, I tried your suggested code. It seems that my largest files are ~150MB. Using this number to make a conservative estimate, I would need about 12GB of RAM to hold 78 raw data files in memory, which is well below the 350+ GB on the server. But as you said, there are also other processes/objects that need RAM.

CoreyG mentioned that 4 threads keeps the RAM below 16 GB on a low spec desktop. So roughly 4 GB per core, which is much higher than the 150 MB core estimated above. But also, you mentioned that the fillChromPeaks is the most memory intensive process in your script, requiring a limit of 1 thread on a 32GB system. I've also noticed that fillChromPeaks is the step that takes the longest.

It does seem like I'll need to do some optimization on our system. Any advice on how to determine how stable the system is with a given number of cores? I don't mind starting with a large # of available cores and working down, as CoreyG suggests. However, for various reasons, I do not want to test stability by whether or not it crashes the server. Do you think that using the 'ulimit' approach to limit RAM would help to prevent the server from crashing? Can an I/O bottleneck cause a crash?

Perhaps I could work through the script step by step, optimizing the number of cores at each step before moving on to the next...