Messages - Jan Stanstrup

Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
Yes, the loadings do suggest that. But since you don't see it in your boxplots, I think the pattern is too complex to see there. So I think you need to look at some individual compounds. For the same reason, the correction methods always work on each feature individually.
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
I did not know you had done Obiwarp. If it works well then it could be fine.

Looking at intensity distributions IMO won't tell you enough. As illustrated in the Brunius paper, features behave differently and some remain constant. I would look at a few compounds I know and see how they behave across your batches.

Looking at the loadings of your PCA might also give you a clue. You should be able to see whether your batch difference is dominated by a few features or whether all features are moving in one direction.
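As a rough sketch of how one might inspect the loadings in R (assuming a feature matrix `X` with samples in rows and features in columns; the variable names are hypothetical):

```r
# Hypothetical feature matrix X: samples in rows, features in columns
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Rank the PC1 loadings by magnitude: a few dominant features vs. many
# similar values tells you whether the batch effect is driven by a
# handful of features or by all features moving together.
pc1 <- sort(abs(pca$rotation[, 1]), decreasing = TRUE)
head(pc1, 10)
```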
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
You can set `binSize` in `PeakDensityParam`. It is set in Daltons; you cannot set it in ppm. The default is 0.25 Da, so it is unlikely to help you. You should probably set it much lower.
With such a large dataset it is very likely to suffer from big intensity drift issues you'd need to fix. XCMS does not provide tools for that but the papers you list have some tools.

A few other observations:
  • setting `minFraction` to 1/n is extremely permissive: a feature then only needs to be found in a single sample. I assume you get a lot of features. `minSamples`/`minFraction` are critical for getting rid of false positives from the peak picking
  • are you sure `bw` is right? That value means you don't expect more than ~1/60 = 0.017 min retention time difference between all your samples. Setting it so low might also give you batch effects if you have retention time differences between batches.
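Put together, the grouping step might look something like this (a sketch only; the concrete values for `binSize`, `bw`, and `minFraction` are placeholders you would tune for your own chromatography and mass accuracy, and `sample_groups` is an assumed vector of group labels):

```r
library(xcms)

# Correspondence (grouping) with the parameters discussed above.
param <- PeakDensityParam(
    sampleGroups = sample_groups,  # assumed vector of batch/group labels
    binSize      = 0.01,           # m/z slice width in Da (default 0.25)
    bw           = 5,              # retention-time bandwidth in seconds
    minFraction  = 0.5             # feature must be in 50% of a group
)
xdata <- groupChromPeaks(xdata, param = param)
```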
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
MZMine is known to use a lot of memory. I imagine that is where your bottleneck is, but you should check that.

XCMS is much more memory efficient. Be aware that each core will use a certain amount of memory. So on a system like yours, not using all cores will use less memory, which might save you if memory is your bottleneck. Also, don't use 80 cores for processes that are bottlenecked by HDD reads (like reading the raw data).

That said, with 10,000 samples you really need to be careful about how greedy you can afford to be in terms of how low in intensity you want to pick peaks.
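One way to limit how many cores XCMS uses is through BiocParallel (a sketch; the worker count is an arbitrary example):

```r
library(BiocParallel)

# Register a backend with a modest number of workers; each worker holds
# its own copy of the data in memory, so fewer workers = less total RAM.
register(SnowParam(workers = 8))

# For I/O-bound steps (e.g. reading raw files from a spinning disk),
# serial processing can even be faster than many parallel readers:
# register(SerialParam())
```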
Tools / Re: Data from waters - mass measure in centroid mode
What values are you comparing? How do you get a single m/z value from the profile-mode data to compare to?
So there are three things: the profile-mode data, the Waters-centroided m/z, and the msconvert-centroided m/z. The last two will differ because of different centroiding algorithms. The documentation says the CWT method is not very good. You could use Waters centroiding (if that is the one that is good?) by centroiding in MassLynx first (to a new raw file) and then converting without any additional processing.

Alternatively, the R package MSnbase might have more advanced alternatives.
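For example, MSnbase can centroid profile-mode data directly in R (a sketch; the file name is a placeholder, and the `refineMz` settings are just one possible configuration):

```r
library(MSnbase)

# Read profile-mode data and centroid it in R; pickPeaks() offers
# centroiding options beyond msconvert's CWT method, including
# m/z refinement around each detected peak.
raw <- readMSData("profile_data.mzML", mode = "onDisk")  # placeholder file
centroided <- pickPeaks(raw, refineMz = "kNeighbors", k = 2)
```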
XCMS / Re: Gaussian shape peak filtering
Some example data would probably make it a lot easier to help.
What is the error?
If you post the output of
it might give a hint about the problem.

One comment: with XCMS3 you make an object with the raw data before the XCMS object. Instead of re-reading the raw files in your function, you could reuse this object.
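A sketch of what that reuse could look like (the object and file names are placeholders):

```r
library(xcms)

# XCMS3-style workflow: the raw data object is created once up front.
raw_data <- readMSData(files, mode = "onDisk")   # created once

# Peak picking builds the XCMS result from that same object:
xdata <- findChromPeaks(raw_data, param = CentWaveParam())

# A custom function can then take raw_data as an argument instead of
# calling readMSData() again internally.
```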
XCMS / Re: Error when executing xcmsset
`center` is the correct argument for `retcor.obiwarp`.
XCMS is transitioning to a new interface (new functions with different arguments that basically do the same), where `centerSample` is the corresponding argument to `ObiwarpParam`. That it doesn't work with `center = NULL` but does when you don't specify it might be a bug; I think it should work.
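In the new interface that would look something like this (a sketch; the sample index is an arbitrary example):

```r
library(xcms)

# New-interface equivalent of retcor.obiwarp's `center` argument:
# centerSample picks the sample the others are aligned against.
param <- ObiwarpParam(centerSample = 3)  # arbitrary example index
xdata <- adjustRtime(xdata, param = param)
```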

I think the reason you don't get a plot is that you have specified `plottype = c("none", "deviation")`. You should only use one of the two; otherwise the first option is used, i.e. `"none"`.
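With the old interface that would be (a sketch; `xset` is a placeholder for your xcmsSet object):

```r
# Pick exactly one plottype; if you pass a vector, retcor uses the
# first element ("none"), which is why no plot appears.
xset2 <- retcor(xset, method = "obiwarp", plottype = "deviation")
```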

Yes, default values are used if you don't specify explicitly.
XCMS / Re: Error when executing xcmsset
`family` is a parameter for `retcor.peakgroups`, not for `retcor.obiwarp` (i.e. `method = "obiwarp"`). See the help of the individual methods.
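For example (a sketch; `xset` is a placeholder for your xcmsSet object):

```r
# `family` belongs to the peakgroups method...
xset2 <- retcor(xset, method = "peakgroups", family = "symmetric")

# ...with obiwarp, leave `family` out:
xset2 <- retcor(xset, method = "obiwarp")
```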