
Messages - CoreyG

XCMS / Re: How to run featureSpectra (or another function) on a subset of samples
Great work, Taylan. You've made good progress considering the minimal/incomplete advice I gave. However, I'm sorry you still haven't got a working solution.

I've been travelling and haven't had much internet/computer access. I'll be able to put together a more complete solution mid next week.

Regarding the second solution, it helps to know that peakidx in featureDefinitions corresponds to the row numbers of chromPeaks (not the row.names). This means that deleting rows in chromPeaks will misalign chromPeaks and peakidx.
The simplest solution is to add a column to chromPeaks that contains the row number: cp <- cbind(cp, rowid = seq(nrow(cp))). Then filter chromPeaks, and 'match' each peakidx to the temporary column: match(fd@listData$peakidx[[i]], cp[, "rowid"]).
This gives you the new peakidx to use. Note that it will probably contain NAs for the missing samples; na.omit() is useful to get rid of them.
For consistency, it might be good to overwrite the row.names of cp as well (not sure if it matters).

This could take a while, as match doesn't know that rowid is a sorted vector. If cp or fd are large, you'll be doing a whole heap of value lookups.
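
To make that concrete, here is a small self-contained toy example of the rowid/match trick, with plain matrices standing in for chromPeaks and for one featureDefinitions peakidx entry (all names and numbers are made up for illustration):

```r
## 'cp' stands in for chromPeaks(xdata): one row per detected peak
cp <- cbind(mz = c(100, 200, 300, 400, 500), sample = c(1, 2, 1, 3, 2))
cp <- cbind(cp, rowid = seq_len(nrow(cp)))     # remember original row numbers

## Filter out all peaks from sample 3
keep <- cp[, "sample"] != 3
cp_filtered <- cp[keep, , drop = FALSE]

## 'peakidx' stands in for one featureDefinitions peakidx entry:
## a feature built from rows 2, 4 and 5 of the unfiltered chromPeaks
peakidx <- c(2, 4, 5)
new_idx <- match(peakidx, cp_filtered[, "rowid"])  # old row -> new row
new_idx <- as.vector(na.omit(new_idx))             # NA = peak from a removed sample
new_idx   # rows of cp_filtered: 2 and 4
```

The same mapping applied to every peakidx entry gives you an internally consistent filtered object.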

Regarding solution 1: internally, xcms calls a parallel version of lapply (bplapply from BiocParallel). So the 'res' returned for each file just gets built up into a list. Looking at the xcms code on GitHub might make this easier to understand than my rambling  :D

This should get you over the line. If not, I'll be able to give you some more help mid-week. It's also a good idea to compare the final featureValues before and after doing this, to make sure the results are consistent for the samples still present.
^ again, the above is all from memory, so you might need to use some judgment.

EDIT: looks like some of my code is being scrambled by the wysiwyg editor.
Other / Re: Converting .dat ASC II File with EI-MS spectrum
Hi Biswa,

I'd be happy to take a look, but could you post a smaller section of one of those files?

Do you want to convert the data to another format or extract the spectrum for specific compounds?

XCMS / Re: How to run featureSpectra (or another function) on a subset of samples
Hi Metabolon1,

A great deal of work was put into making XCMS analyses reproducible. This ensures the same results can be obtained just by having the final object, i.e. all the data and 'process history' are contained within it.

I'll suggest a bit of a hack that you can try, which I'm not able to test at the moment, but hopefully it can help get you somewhere. There is almost certainly a proper way to do what you want, but I can appreciate the frustration that can ensue when things don't work the way you want  :D

This is basically following the calls that featureSpectra makes internally (skipping all the checks along the way...)
Code: [Select]
pks <- chromPeaks(xdata)
xdata_filtered <- filterMsLevel(as(xdata, "OnDiskMSnExp"), 2L)

## Split data per file
file_factor <- factor(pks[, "sample"])
pks <- split.data.frame(pks, f = file_factor)
xdata_filtered <- lapply(as.integer(levels(file_factor)), filterFile,
                         object = xdata_filtered)

## You then need to loop through xdata_filtered and pks for the samples you
## need. Each entry in xdata_filtered becomes 'x', and 'pks' below is the
## corresponding entry in the pks list ('fromFile' is that sample's index).
sps <- spectra(x)
pmz <- precursorMz(x)
rtm <- rtime(x)

## Make sure you define all the required parameters i.e. method = "closest_mz"
res <- vector(mode = "list", nrow(pks))
for (i in 1:nrow(pks)) {
  if ([i, "mz"]))
    next
  idx <- which(pmz >= pks[i, "mzmin"] & pmz <= pks[i, "mzmax"] &
                 rtm >= pks[i, "rtmin"] & rtm <= pks[i, "rtmax"])
  if (length(idx)) {
    if (length(idx) > 1 & method != "all") {
      if (method == "closest_rt")
        idx <- idx[order(abs(rtm[idx] - pks[i, "rt"]))][1]
      if (method == "closest_mz")
        idx <- idx[order(abs(pmz[idx] - pks[i, "mz"]))][1]
      if (method == "signal") {
        sps_sub <- sps[idx]
        ints <- vapply(sps_sub, function(z) sum(intensity(z)), numeric(1))
        idx <- idx[order(abs(ints - pks[i, "maxo"]))][1]
      }
    }
    res[[i]] <- lapply(sps[idx], function(z) {
      z@fromFile <- fromFile
      z
    })
  }
}
names(res) <- rownames(pks)

If that sounds like too much stuffing around, you can save featureDefinitions, featureValues and chromPeaks, filter the file down to just the DDA samples, then edit these to be internally consistent and write them back to xdata.

After all that, hopefully someone who knows what they are talking about comes in to give you the proper directions  ;)
MS-DIAL / Re: MS2 spectrum missing peak information
Hi Larissa,

I haven't used MS-DIAL or mzmine2 enough to give you definite answers about the software.
However, I do see that the precursor masses differ between the two images. Not by much, but enough to suggest that they aren't showing the same DDA product scans. I assume this is why they differ, i.e. they are showing different scans at different retention times, and hence different abundances of co-eluting peaks.

If the software is showing the 'best product ion scans' that match a library, the difference would be in how they have implemented spectral similarity. There are many parameters that can be tuned and I guess the authors have settled on slightly different ones.

I hope that helps.

Announcements / Re: Announcement of Opportunity
Congratulations to the EMN committee of 2019-2020!

Very happy to introduce the faces of the EMN committee for 2019-2020 👩‍🎓👨‍🎓. Our members are based across the world and they love all things metabolomics! Watch this space for all the exciting projects we are working on!

EMN Metabolomics Society (@EMN_MetSoc)

EMN Metabolomics Society Facebook Page
Conferences and seminars / EMN Webinar 20th November 2019 (15:00 UTC)
Coming up 20th November 2019 at 15:00 UTC (7:00 PST, 10:00 EST, 16:00 CET)!

Metabolomics as a tool for elucidating plant growth regulation

Rising demand for food and fuels makes it crucial to develop breeding strategies for increasing crop yield/biomass. Plant biomass production is tightly associated with growth and relies on a tight regulation of a complex signaling network that integrates external and internal stimuli. The main goal of our group is to elucidate the processes underlying plant growth and production of biomass by combining physiology, metabolomics, and gene expression analyses. In my presentation, I will provide examples of i) how the evolutionary conserved Target of Rapamycin pathway fine-tunes metabolic homeostasis to promote biosynthetic growth in plants; ii) the potential of metabolite profiles to predict plant performance as biomarkers.

Click here to learn more about this latest webinar by Camila Caldana (PhD).

Please register for “Metabolomics as a tool for elucidating plant growth regulation," to be held on 20th November 2019 at 15:00 UTC (7:00 PST, 10:00 EST, 16:00 CET) at:

After registering, you will receive a confirmation email with information about joining the webinar.

Brought to you by the EMN of the Metabolomics Society.

XCMS Online / Problems with XCMS Online? Look here
Dear Forum members and visitors,

We have been very happy to see users supporting each other to solve each other's issues with XCMS Online. However, the number of users with detailed knowledge of XCMS Online has dwindled. This has left many unanswered questions in the forum.

To help those going forward, we recommend directing specific questions to the XCMS Online Contact Form.
It might also help to look at the online documentation.

For questions related to the R version of XCMS, please ask them in this board.
Mass spectrometry / Re: Experimental design for multiple-batch study
Hi djb17,

I feel I wasn't so clear in my last message.

Let's imagine we have 2 groups. We don't know if Metabolite X is different between the two groups, so we decide to measure it.
We run the first group in one batch and get an average concentration of 10.
We run the second group separately and get an average concentration of 20.

Does that mean the second group has twice the concentration of the first? That depends on how reproducible (accurate) our measurement is.
Basically, we don't know if the difference we saw is due to the biological difference in groups or the way we ran the experiment (batches).

Someone suggests 'normalizing' the batches to each other. So we multiply the first batch by 1.5 to get an average of 15, and divide the second batch by 4/3 (i.e. multiply by 0.75) to also get an average of 15. Now the batches are normalized, but we no longer see any difference between the groups.

If we had randomized the samples beforehand, we might get an average of 15 in the first batch, because half the samples from group 1 were in there along with half the samples from group 2.
What if the batch 2 average was 30?
We might say the reproducibility wasn't great, so we'll normalize the two batches; for simplicity, we'll divide batch 2 by 2, so its average is also 15.

Now we can compare our two groups and we see that group 1 average is 10 and group 2 average is 20. Success! We fixed an issue and didn't lose our biological information.
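
The arithmetic above can be sketched in a few lines of R (simulated concentrations; the numbers are illustrative only): two groups randomized across two batches, each batch scaled to a common average, and the group difference survives.

```r
set.seed(1)
conc  <- c(rnorm(10, mean = 10, sd = 1),  # group 1, true average ~10
           rnorm(10, mean = 20, sd = 1))  # group 2, true average ~20
group <- rep(c(1, 2), each = 10)
batch <- rep(c(1, 2), times = 10)         # randomized: both groups in both batches

## Scale each batch so its average matches the overall average
target <- mean(conc)
for (b in unique(batch)) {
  sel <- batch == b
  conc[sel] <- conc[sel] * target / mean(conc[sel])
}

tapply(conc, group, mean)   # group difference is preserved after normalization
```

If the two groups had instead been confounded with the two batches, the same scaling would have erased the biological difference, which is the whole point of randomizing first.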

This has become standard practice, and, as you brought up, so has running pooled samples throughout the run.
This provides extra insurance and the ability to adjust for within-batch variability/drift.
However, the pooled sample does not need to be from the samples you are running! It just has to be representative i.e. don't run human plasma as a QC alongside algae extract.

We run 3 types of QC samples (not counting blanks): a pooled plasma QC (which we use for all our cohorts), a pre-extracted pooled QC that we use to monitor system stability, and a pooled reference sample from NIST.

I hope that helps a bit.
If someone can explain it better, please chip in.
Mass spectrometry / Re: Experimental design for multiple-batch study
Hi djb17,

It is always a good idea to randomize samples across batches. There are special cases where you might want to perform block randomization, such as what Jan described.
I guess you could say that the goal is to make systematic/technical variation orthogonal to biological variation.

Given that, how different are the distributions?
Say there are 3 conditions to which all samples were randomly assigned. It would be fine to have one batch with 30/40/50 samples from each condition (assuming they are run in a randomized order). However, it wouldn't be good to have a distribution like 50/0/60, and even worse would be 10/0/100.
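
A quick sanity check along these lines (purely illustrative numbers) is to randomize the run order and tabulate conditions per batch:

```r
set.seed(42)
condition <- rep(c("A", "B", "C"), times = 40)  # 120 samples, 3 conditions
run_order <- sample(seq_along(condition))       # fully randomized run order
batch     <- rep(1:2, each = 60)                # first 60 injections = batch 1
table(batch, condition[run_order])              # counts per batch and condition
```

With full randomization, every condition should appear in every batch in roughly equal numbers; a zero anywhere in that table is the 50/0/60 situation to avoid.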

For pooled QCs, it's a good idea to create the pooled samples and put them with the biological samples as soon as possible. This ensures they capture as much technical variation as possible, i.e. all the variation from sample storage, defrosting, extraction, running etc.
This is especially important when batches are processed separately (extracted separately or stored in different freezers).

XCMS - FAQ / Re: XCMS - MS1 scans empty
I don't have any Agilent QqQ precursor ion scan data on hand, but could probably generate some next week if required.

Anyway, this is an old, janky solution I was using to "convert" MS2 data to look like MS1 data:
Code: [Select]
## Load mzR library
library(mzR)

## Open the mzML file that you want to convert
dat <- openMSfile("input.mzML")

## Get spectra data
pks <- spectra(dat)
## Get file header
hdr <- header(dat)

## Remove all scans but ms2 scans
keep <- hdr$msLevel == 2L
pks <- pks[keep]
hdr <- hdr[keep, ]

## Take a quick look and make sure everything looks ok
head(hdr)

## Provide new scan/acquisition numbers
hdr$seqNum <- seq_len(nrow(hdr))
hdr$acquisitionNum <- seq_len(nrow(hdr))

## Clear all precursor charge/Intensity/MZ/ScanNum columns
## Not sure if this is required and how precursor ion scans will differ
hdr$precursorCharge <- 0L
hdr$precursorIntensity <- 0
hdr$precursorMZ <- 0
hdr$precursorScanNum <- 0L

## Overwrite msLevel to 'pretend' to be MS1 data
hdr$msLevel <- 1L

## Write out the new 'MS1' data
writeMSData(pks, file = "output_MS1.mzML", header = hdr)
There are a few fields that still contain MS2 information, so I'm not sure if they will conflict with anything downstream.
So, I'm not sure how well this will work for you. But let everyone know if it's helpful.

Good luck!
XCMS - FAQ / Re: XCMS - MS1 scans empty
Very cool resources, Jan.

AmelieV, if you only have one product ion per file, it's probably sufficient to "pretend" the MS2 data are MS1 scans.
Pretty sure I have a solution for that somewhere and some precursor ion scan data to test it on. I'll see if I can dig it up.

However, if you have already found the precursor masses, it's probably quickest to set up a MassHunter Quant or Qual method to integrate all the peaks (assuming this is what you are interested in?)
XCMS - FAQ / Re: XCMS - MS1 scans empty
Hi AmelieV,

You can find some quick notes and screenshots at the following sites: and

There are probably more detailed descriptions/tutorials around, including on this forum. However, I've found that XCMS is fairly robust to most (reasonable) settings in MSconvert.
I'd say the most important thing is to have 'peak picking' as a filter and 'MS levels' with a 1 in the first position (indicating you want MS1 centroided). Both of the links above show this.
The difference in settings between the links shouldn't affect XCMS function (mzXML vs mzML, 32 vs 64 bit, zlib compression).

Hope this can help you get things working.
R packages for metabolomics / Re: CliqueMS new R package for the annotation of adducts and fragments in LC-MS
Thanks for sharing, Osenan.

I've been using different packages for the annotation of isotopes, adducts, dimers and fragments.
There are, of course, many ways this can be done: from using just exact mass, to examining co-elution peak shape, to correlating peak areas.

I'd be interested in looking at how your network algorithm can improve our annotations.

Look forward to some interesting discussions in the future!

Mass spectrometers / Re: Suggestion to purchase of a high-resolution MS
Hi Sam,

I don't have a lot of HRMS experience, but I've gradually been doing more. Our focus is high-throughput, so robustness is something we care about as well. As a general rule, the top spec instruments tend to be less robust than the lower tier models, although there are always exceptions.

So far I've used an Agilent qTOF (6540) and two Thermo Orbitraps (Fusion Lumos and HF-X). All of these were set up for analytical-flow LC-MS.
The Agilent system had quite a few boards replaced over a few years, but I can't say that the instrument was looked after particularly well.
The Fusion Lumos ran well for over 1000 continuous samples (plasma), but beyond that I can't say how robust the machine is.
The HF-X had a bad reputation for 'dirtying', but Thermo says they have fixed this in the current generation. If you are willing to wait a few months I can fill you in on how it goes :)

For the work that I was doing (lipidomics), the much higher resolution given by the orbitrap was very useful. Overall sensitivity was higher with the orbitraps as well - but we are comparing 2 very new instruments to a much older one.
However, both the orbitraps have very rough ion funnels. This causes in-source fragmentation of fragile molecules. This isn't exclusive to Thermo instruments.
Most vendors are aware of this, so you should bring it up with them if this is a concern.

Regarding quantitation, there are different levels of quantitation that people expect. Most metabolomics/lipidomics people have a relaxed view on quantitation. That is, there are caveats and assumptions that everyone accepts can't be resolved (not enough internal standards, shotgun vs LC...)
Most newer instruments provide a fairly decent dynamic range. Certainly much more than the expected variability you would see in a single metabolite in a group of people. So I guess it's important to ensure that sample prep is correct to put the right amount of 'stuff' in the machine.

I'm sure others can chime in with some more experiences and knowledge.