Messages - Jan Stanstrup

XCMS / Re: Consequences of using Centwave for profile mode data
Guesses as to what will happen:
* She uses a wide mz window in peak picking to get the whole peak inside, in effect treating the data as if it had much lower resolution.
* She uses a normal mz window (ppm) and each peak is split into many pieces. Each real peak is then represented by many features in her feature table, and there are probably many more features than you would expect.

It is hard to imagine you could do this without noticing. Perhaps they put the raw data up but forgot to say that they centroided?
XCMS / Re: Side/ Partial Peak artifacts
For the Orbitrap-like shoulder peaks I wrote a filter, xcmsRaw.orbifilter, which you can find here
It runs through each scan in a file and starts with the largest mass and eliminates everything in a set mz range around that peak that is smaller than some fraction of the main peak.
It runs through all peaks in the scan until all peaks have been "evaluated".
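
To make the idea concrete, here is a minimal sketch of that per-scan logic in Python. It is not the actual xcmsRaw.orbifilter code. It assumes "largest" means the most intense peak, and the function name, `mz_window` and `min_fraction` are hypothetical names and values chosen only for illustration.

```python
def filter_shoulder_peaks(mz, intensity, mz_window=0.1, min_fraction=0.05):
    """Return the (mz, intensity) pairs of one scan that survive the filter.

    mz_window and min_fraction are illustrative values, not recommendations.
    """
    # Visit peaks from most to least intense.
    order = sorted(range(len(mz)), key=lambda i: intensity[i], reverse=True)
    removed = set()
    for i in order:
        if i in removed:
            continue  # already eliminated as a shoulder of a bigger peak
        threshold = intensity[i] * min_fraction
        for j in range(len(mz)):
            if j == i or j in removed:
                continue
            # Drop small peaks that sit close to the current main peak.
            if abs(mz[j] - mz[i]) <= mz_window and intensity[j] < threshold:
                removed.add(j)
    return [(mz[k], intensity[k]) for k in range(len(mz)) if k not in removed]


scan_mz = [200.00, 200.05, 200.50, 300.00]
scan_int = [1000, 20, 30, 500]
kept = filter_shoulder_peaks(scan_mz, scan_int)
print(kept)
```

In this toy scan the peak at 200.05 is removed because it is within 0.1 of the main peak at 200.00 and below 5% of its intensity. The small peak at 200.50 is kept because it lies outside the window.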

Currently you'd need to do this on all the raw files and write them out to a new set of raw files.

As for the chromatographic side peaks, I am not sure you can fix that with parameters (apart from the grouping you already mentioned). Also, in your plots they look to me like legitimate additional small peaks, so I don't see what the peak picker should be doing differently. You have fronting and tailing peaks and I guess that will always be problematic.
Your ~447 peak is very noisy. If your data in general looks like that, the matchedFilter algorithm could give you better results. It is generally more robust to noisy data.
Courses and training / [COURSE] Introduction to Nutritional Metabolomics
Course dates
01 July 2019 - 05 July 2019

Place: Copenhagen

Info, sign-up, program 

The course will provide an overview of LC-MS based untargeted metabolomics and its application in nutrition. It will be delivered using a mixture of lectures, hands-on data preparation and analysis, computer-based practical sessions, and discussions. Visits to wet labs and instruction on human sample preparation procedures are included, but with minimal hands-on work.

The students will go through the common steps in a typical metabolomics study using a real-life case. This case study includes plasma (or urine) samples collected from a nutritional intervention. The sample preparation and analysis on UPLC-QTOF have already been conducted, and the students will further process and analyse the acquired data with various freeware tools (MZmine, Workflow4Metabolomics and MetaboAnalyst). Finally, they will work on identification of relevant metabolites using several web-based structure elucidation tools. The course will conclude with presentations of reports generated by the students based on the case study.

The course will be structured as initial short lectures on theory followed by hands-on exercises which will teach the students to transfer the theoretical information to practice.

Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
Yes, the loadings do suggest that. But since you don't see it in your boxplots, I think the pattern is too complex to appreciate there. So I think you need to look at some individual compounds. The correction methods always work on each feature individually for the same reason.
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
I did not know you had done Obiwarp. If it works well then it could be fine.

Looking at intensity distributions IMO won't tell you enough. As illustrated in the Brunius paper, features behave differently and some remain constant. I would look at a few compounds I know and see how they behave across your batches.

Looking at the loadings of your PCA might also give you a clue. You should be able to see if your batch difference is dominated by a few features or it is all features moving in one direction.
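
As a rough sketch of how you could put numbers on that, the snippet below takes the loading vector of the component that separates your batches and computes two things: how much of the squared loading sits in the top few features, and how many loadings share the same sign. The function names and the example loadings are hypothetical and only for illustration. It assumes you have already extracted the loadings from whatever PCA tool you use.

```python
def top_feature_share(loadings, k=10):
    """Fraction of the total squared loading carried by the k largest features."""
    squared = sorted((x * x for x in loadings), reverse=True)
    total = sum(squared)
    if total == 0:
        raise ValueError("all loadings are zero")
    return sum(squared[:k]) / total


def same_sign_share(loadings):
    """Fraction of non-zero loadings that point in the majority direction."""
    nonzero = [x for x in loadings if x != 0]
    if not nonzero:
        raise ValueError("all loadings are zero")
    positive = sum(1 for x in nonzero if x > 0)
    return max(positive, len(nonzero) - positive) / len(nonzero)


# Made-up loadings: one feature dominates, all move in the same direction.
example = [0.9, 0.1, 0.1, 0.1, 0.1]
print(top_feature_share(example, k=1))
print(same_sign_share(example))
```

A top-feature share close to 1 for a small `k` points to a few features driving the batch difference. A same-sign share close to 1 combined with a low top-feature share points to everything drifting together.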
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
You can set `binSize` in `PeakDensityParam`. It is set in Dalton and you cannot set it in ppm. The default is 0.25 Da, so it is unlikely to help you. You should probably set it much lower.
With such a large dataset it is very likely to suffer from big intensity drift issues you'd need to fix. XCMS does not provide tools for that but the papers you list have some tools.
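
To put the 0.25 Da default of `binSize` in perspective, this small sketch converts between an absolute width in Da and a relative width in ppm. The helper names and the m/z of 500 and 10 ppm target are just illustrative assumptions, not values from your data.

```python
def da_to_ppm(width_da, mz):
    """Relative width in ppm of an absolute width in Da at a given m/z."""
    return width_da / mz * 1e6


def ppm_to_da(width_ppm, mz):
    """Absolute width in Da of a relative width in ppm at a given m/z."""
    return width_ppm * mz / 1e6


mz = 500.0  # hypothetical m/z
print(da_to_ppm(0.25, mz))  # the default bin expressed in ppm at this m/z
print(ppm_to_da(10, mz))    # a 10 ppm tolerance expressed in Da at this m/z
```

At m/z 500 the default bin corresponds to roughly 500 ppm, while a 10 ppm tolerance is about 0.005 Da. Since the value is fixed in Da, whatever you choose will be relatively wider at low m/z than at high m/z.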

A few other observations:
  • setting `minFraction` to 1/n is extremely aggressive. I assume you get a lot of features. `minSamples`/`minFraction` are critical for getting rid of false positives in the peak picking
  • are you sure `bw` is right? That means you don't expect more than ~1/60=0.017 min retention time difference between all your samples. Setting that so low might also give you batch effects if you have retention time differences between batches.
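
To illustrate why a `minFraction` of 1/n is so permissive, here is a simplified sketch of this kind of filter. It is not XCMS code. It assumes a feature is kept if, in at least one sample group, it has a peak in at least `min_samples` samples and in at least `min_fraction` of that group's samples. The function name and the toy data are hypothetical.

```python
def keep_feature(present_by_group, min_fraction=0.5, min_samples=1):
    """Decide whether a feature passes the fraction/sample-count filter.

    present_by_group maps a group name to a list of booleans, one per sample,
    saying whether a peak was found for this feature in that sample.
    """
    for flags in present_by_group.values():
        if not flags:
            continue  # skip empty groups
        n_present = sum(flags)
        if n_present >= min_samples and n_present / len(flags) >= min_fraction:
            return True
    return False


# A feature seen in a single sample out of four in one batch.
feature = {
    "batch1": [True, False, False, False],
    "batch2": [False, False, False, False],
}
print(keep_feature(feature, min_fraction=1 / 4))  # 1/n: a single hit is enough
print(keep_feature(feature, min_fraction=0.5))    # stricter: rejected
```

With 1/n every peak that was picked in any one sample becomes a feature, so false positives from the peak picking are never filtered out.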
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
MZmine is known to use a lot of memory. I imagine that is where your bottleneck is. But you should check that.

XCMS is much more memory efficient. Be aware that each core will use a certain amount of memory. So on a system like yours not using all cores will use less memory and might save you if memory is your bottleneck. Also don't use 80 cores on processes that are bottlenecked by HDD reads (like reading the raw data).
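
As a back-of-the-envelope sketch of that trade-off, the snippet below picks a worker count limited by memory instead of by core count. The function name and all numbers are hypothetical. You would need to measure the memory use of a single worker on your own data first.

```python
def max_workers(total_cores, available_mem_gb, mem_per_worker_gb, reserve_gb=0):
    """Number of parallel workers that fit in memory, capped by core count."""
    if mem_per_worker_gb <= 0:
        raise ValueError("mem_per_worker_gb must be positive")
    by_memory = int((available_mem_gb - reserve_gb) // mem_per_worker_gb)
    # Always allow at least one worker, never more than there are cores.
    return max(1, min(total_cores, by_memory))


# Hypothetical machine: 80 cores, 256 GB RAM, 8 GB per worker, 16 GB reserved.
print(max_workers(80, 256, 8, reserve_gb=16))
```

For steps that are limited by disk reads, a lower number than this is likely to work just as well.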

That said, with 10,000 samples you really need to be careful about how greedy you need to be in terms of how low in intensity you want to pick.
Tools / Re: Data from waters - mass measure in centroid mode
What values are you comparing? How do you get a single m/z value from the profile mode data to compare to?
So there are the profile mode data, the Waters centroided m/z and the msconvert centroided m/z. The last two will differ because they use different centroiding algorithms. The documentation says the CWT method is not very good. You could use Waters centroiding (if that is the one that is good?) if you centroid in MassLynx first (to a new raw file) and then convert without any additional processing.

Alternatively the R package MSnbase might have more advanced alternatives:
XCMS / Re: Gaussian shape peak filtering
Some example data would probably make it a lot easier to help.
What is the error?
If you post the output of
it might give a hint about the problem.

One comment: With XCMS3 you make an object with the raw data before the XCMS object. Instead of re-reading the raw files in your function you could reuse this object.