
looking for xcms setup help for untargeted metabolomics on a low-res ion trap

Sorry complete n00b here:

I am using ye olde LCQ DECA Plus, a low-res ion trap from Thermo, with a conventional HPLC, doing untargeted metabolomics on GMO plant samples that have purposefully altered metabolisms (often randomly so, via heterologous secondary-metabolism genes).

I was originally doing all the data analysis by hand, but as the sets got larger this became extremely impractical, time-consuming, and generally painful, so I switched over to xcms using mostly some pre-made recipes.  Things seemed to be working well for a while, but we recently got a small data set, so a coworker and I attacked it both by hand and with xcms so that we could compare results.  We were happy that xcms found a few things that we didn’t find by hand, but were appalled that it missed some very significant critical features that we did discover by hand.  xcms is a powerful tool, so I have no doubt that if I set it up correctly it would find everything.  But I have flipped through the manual and played around with many settings (my findings are outlined below), and I still cannot get the behavior from xcms that I would like, hence why I am here imposing on the experts.

My general xcms workflow is very light.  I pretty much only use the “xcmsSet”, “group”, and “retcor” functions before outputting a “diffreport” and going on from there by hand.  So the first thing that I need to debug is my use of xcmsSet.  Traditionally I just started with:
Code: [Select]
A<-xcmsSet(step=0.5)
which seemed to give fairly good results until we did that comparison with the hand analysis (and yes, this ion trap’s resolution and consistency is so low that “step=0.5” is unfortunately representative of the data and thus very necessary).  Most of the significant features that we’re missing relative to the hand analysis had two properties that I thought might be causing xcms to miss them: 1) they were very narrow despite their significant height and our old failing HPLC (their peak half-widths were under 3 seconds), and 2) they only appeared in one set of plant constructs, so that they were only represented in about 10% of the traces within one group (which ironically means they are far more important than features common across more of the traces).  So for issue 1) it seemed like the solution would be to play with the “fwhm” variable in “xcmsSet”, and for issue 2) it seemed like the solution would be to play with the “minfrac” variable in the “group” function.
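For concreteness, the whole light pipeline with those two knobs turned looks roughly like this (a sketch only; the fwhm/minfrac values are illustrative, and the class names in the diffreport call are made up):
Code: [Select]
## sketch only: fwhm and minfrac are illustrative values,
## and "wildtype"/"construct" are hypothetical class names
library(xcms)
A <- xcmsSet(step = 0.5, fwhm = 3)        # narrow fwhm for the sharp peaks
A <- group(A, minfrac = 0.1)              # keep features present in only ~10% of a group
A <- retcor(A, method = "peakgroups")     # retention-time correction
A <- group(A, minfrac = 0.1)              # regroup on corrected retention times
report <- diffreport(A, class1 = "wildtype", class2 = "construct")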

So I tried
Code: [Select]
A<-xcmsSet(fwhm=3, step=0.5)
but while the eventual report now contained the missing critical narrow peaks, it lost even more of the critical thicker peaks, and also caused a large number of problems with grouping and quasi-redundant peaks which I will get to in a bit.

So I tried something in between like:
Code: [Select]
A<-xcmsSet(fwhm=10, step=0.5)
which gave a somewhat cleaner final output, but can miss both some of the narrowest peaks and some of the thickest.

I don’t really understand what it means or does but I noticed that the “fwhm” variable can take a range so I tried a few things like:
Code: [Select]
A<-xcmsSet(fwhm=c(8,25), step=0.5)
but in addition to producing over 50 errors when building the xcmsSet object for certain data sets (it works fine on others), it apparently completely wrecks grouping, because while grouping itself seems to proceed more or less normally, subsequent calls to the “retcor” function produce errors like:
Code: [Select]
Error in match.arg(method, getOption("BioC")$xcms$retcor.methods) :
  'arg' should be one of "obiwarp", "peakgroups"

One thing that kind of worked … kind of :? … was to set the signal-to-noise threshold (the “snthresh” variable) low and then compensate by setting the “max” variable high, so that there would be enough peak slots per ion to cover all the extra noise looked at by xcms.  So I tried something like
Code: [Select]
A<-xcmsSet(fwhm=3, step=0.5, max = 20, snthresh=2.5)
does find all the critical peaks identified by the hand analysis, but it makes a horrible, terrible mess too.  For starters, the resulting xcmsSet object is such a disaster that the group function either chokes or otherwise doesn’t do much, and the retcor function basically doesn’t work at all; it either has no grouped peaks to work with or drops out with multiple errors about more esoteric faulty aspects of the data that I don’t understand.  Further, in addition to the tremendous amount of garbage/noise that this puts into the eventual diffreport, most of the features are significantly fragmented and appear to be tracked over completely unreasonable amounts of time.  Which is to say: if one takes the resulting diffreport and groups the data by average retention time (“rtmid”) and then m/z, one finds that most real features are now registered as something like six or more features, with the real data binned seemingly at random between these semi-redundant peak registries.  Together they form a sort of Gaussian, with the most intense and most frequently integrated version of the feature at the center of all the redundant copies, each of which gets less intense and less frequently used for binning/integration the further it sits from the major central one.  Worse still was the range over which xcms was tracking these semi-redundant fragment features.  Looking at the actual data, most of the real peaks only varied about 4 seconds or so over the whole data set, but for every feature the difference between “rtmin” and “rtmax” for each fragment peak was 2 to 4 minutes, even though each fragment was only a fraction of a second from its nearest faster or slower fragment neighbor of the same feature.  This is somewhere between ugly and disastrous.

I kind of figured part of the issue might be setting the “max” variable so high, but if I drop it, important peaks disappear.  It is pretty easy to understand why: the injection peak at the start of all my traces and the equilibration peak at the end are complex “ion rainbows,” so as “fwhm” and/or “snthresh” get smaller, the number of “features” xcms identifies in these ion rainbows explodes.  Indeed, with the setup as before:
Code: [Select]
A<-xcmsSet(fwhm=3, step=0.5, max = 20, snthresh=2.5)
well over 75% of my features end up being in the two “ion rainbows” in my diffreport, and I basically start all my analysis by deleting that 75% or more of my “data” in these locations.  The obvious thing to try to compensate for this was the “scanrange” variable.  But any time I try something like:
Code: [Select]
C<-xcmsSet(step=0.5, scanrange = c(40,700))
I just get the following error:
Code: [Select]
Sample1: Error in .local(object, …) :
  unused argument(s) (scanrange = c(40, 700))
Perhaps this is for the best, as some of the only things that “retcor” seems to think are okay to lock onto for alignment are in these starting and finishing ion rainbows.  This is a bit frustrating, though, as that is basically garbage (though perhaps it is reproducible garbage).  But there should be better peaks to track: all the base plants are the same, so even before the addition of our internal standards to each sample, all the samples have a wide range of metabolites at many points along the chromatogram that are consistent across the whole data set, with a variability in retention time of only a couple of seconds, and yet the retcor function refuses to acknowledge any of them for grouping or alignment.

Well, this post is already getting really long, so I guess I’ll save discussion of my ham-handed abuse of the “group” function for another post and just focus on asking: any thoughts on how I could best set up xcmsSet to find all the interesting and significant features without making a total hash of things?

Indeed, please don’t think I am asking people to read and address all the questions in here; any advice on any aspect that people can think to suggest would be very much appreciated.

 Thanks.

Re: looking for xcms setup help for untargeted metabolomics

Reply #1
Hi, Nat S.

I'm not sure what you mean by finding features manually. How many features are we talking about that you would find by hand? I typically find thousands of mass features in my data after processing by XCMS, and the thought of looking for those manually makes me think I'd look for a new job first! Are you sure that XCMS is the right approach for you? If you're doing a targeted analysis where you just have a bunch of compounds that you're looking for, it might be better to just set up a targeted method using the instrument manufacturer's software.

If you're not finding mass features that you expect to find, i.e. mass features that you're certain are real, there are a few places where you could have settings that are off. First, your xcmsSet parameter: It sounds like you have an older instrument. Is it giving you profile data? If so, then the default method for findPeaks, which is what you've got, is probably fine. You might try tightening the step size to something smaller, but I defer to your expertise on how your own instrument performs in terms of resolution.

I don't use the findPeaks default method within xcmsSet, so I'm not terribly familiar with it. How about trying a different algorithm? Have you tried centWave? You could set it up like this:
Code: [Select]
xcmsSet(Samples, method = "centWave", snthresh = 3, ppm = 1000, peakwidth = c(2, 20), mzCenterFun = "wMean", integrate = 1, fitgauss = TRUE)
That's a pretty wide chromatographic window, and the gaussian fit is somewhat flexible, to my understanding.

For the grouping question, I prefer at this point in my data analysis to leave out any filtering by how frequently a mass feature is present. Instead, I filter later, once I've already got my xcms dataset and I'm ready to do some statistical analyses. An example of how I run the group algorithm where "Data.RTcor" is my RT-corrected dataset:
Code: [Select]
group(Data.RTcor, method="density", minsamp=1, mzwid=0.004, bw=10, max=100)
One last thing you didn't address: I ALWAYS do recursive peak filling. There have been times that recursion has found peaks that xcms missed the first time around. The code is really straightforward:
Code: [Select]
fillPeaks(Data)
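To put those pieces together, the order I follow is group, retention-correct, regroup, then fill (a sketch only; the parameter values are the illustrative ones from above, and "Samples" stands in for your vector of raw-data file paths):
Code: [Select]
## sketch of the full sequence; parameter values would need tuning for your instrument
library(xcms)
xset <- xcmsSet(Samples, method = "centWave", snthresh = 3, ppm = 1000,
                peakwidth = c(2, 20), mzCenterFun = "wMean",
                integrate = 1, fitgauss = TRUE)
xset <- group(xset, method = "density", minsamp = 1,
              mzwid = 0.004, bw = 10, max = 100)
xset <- retcor(xset, method = "peakgroups")
xset <- group(xset, method = "density", minsamp = 1,   # regroup after correction
              mzwid = 0.004, bw = 10, max = 100)
xset <- fillPeaks(xset)                                # recursive peak filling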

Good luck!

Laura Shireman

Re: looking for xcms setup help for untargeted metabolomics

Reply #2
Hi Laura,

    Thanks for the response.

No, we are doing untargeted metabolomics.  When I say finding features manually, I mean finding features of a meaningful size and statistical significance.  We are talking about dozens, not thousands, but they are dozens of meaningful ones, which xcms might represent as six or more features each (due to mass fragmentation and whatnot) that are obviously all part of the same real feature when done by hand.  Done by hand this is still very, very hard and time-consuming, even on our smaller data sets.  The things we found by hand that xcms missed were new features in just a few samples in that new (but smaller) data set, but again, this is all untargeted, and what we are looking for is new in almost every data set.

I got the advice to try the centWave peak-finding algorithm in my other, more specific post on the same board here: (http://metabolomics-forum.com/viewtopic ... 93f63dde0f).  Unfortunately, I am not having any luck with the centWave method.  I did figure out, via some experimentation and analysis (including some suggestions from others), that I need to set the ppm for the centWave function such that 800 > ppm > ~450.  But the resulting xcmsSet object I get out still doesn't group, and thus still won't retcor.  I usually use the fillPeaks method at the end too, but that didn't solve my old problem by finding the missing peaks that could be located by hand pre-centWave, and with centWave I can't get through grouping or retcor successfully, so I haven't gotten far enough to play around with fillPeaks yet.

I will play around with your suggested "xcmsSet" and "group" code (after adjusting the ppm); that might solve something.  I'll get back to you when I have some results, good or bad.

Re: looking for xcms setup help for untargeted metabolomics

Reply #3
Hi Laura, I am running some tests now, but I am having one issue with your code up front.

I had to cut out the mzCenterFun = "wMean" part, because whenever I include that in xcmsSet I get: Sample1: Error in .local(object, ...) : unused argument(s) (mzCenterFun = "wMean").  I checked the xcms manual, and that command and option are definitely there, and I spelled them correctly, but I can't get xcmsSet to run with them.  The only call I am making before this is library(xcms), so I shouldn't be messing anything up from its default state.  Perhaps that is the issue: I need to do more setup before I can pass mzCenterFun = "wMean" to xcmsSet?  Currently I am running your exact code, but with ppm = 600 and without assigning anything to the mzCenterFun parameter, and it seems to be working reasonably well, though it is taking a while.  Any theories on what I might be doing wrong with the mzCenterFun = "wMean" assignment?

(Random P.S.: apparently 600 ppm is now too big, as I am getting a few mass insertion errors, even though that value works great on the same data set without your other suggested centWave parameters.  What has changed that might influence this?)

Re: looking for xcms setup help for untargeted metabolomics

Reply #4
Hi Laura,

Unfortunately, even after playing with the parameters to your centWave-based xcmsSet setup, I can't ever get an xcmsSet object that will either group or undergo retention correction at all.  Various setups for the group function all run and give the illusion of chugging along correctly, but the object they act on never contains anything other than a <0 x 0> matrix when grouping is finished, and all attempts at retcor just fail utterly with errors.

The problem can't be the data set, as I can get the mediocre, non-ideal (but still useful) results that I was complaining about above by using the default matchedFilter method on this exact data set and then grouping it, which works all right.

Also, the weird thing with the centWave setup you recommended is, as I mentioned above, that it gives peak data insertion errors even at ppm values below what was ideal before.  For instance, if I do A<-xcmsSet(method="centWave", peakwidth=c(3,20), ppm=700), it runs fine on this data set (though it still produces something that won't group, retcor, or do anything else).  But when I add in your integration parameters (like "fitgauss=TRUE"), then even going as low as "ppm=525" I get peak data insertion errors and am told to lower the ppm value.  This is bad, as going lower may start to cross the actual error threshold for this machine as best I can measure it, which will make a real serious hash of things.

  Thoughts?

Re: looking for xcms setup help for untargeted metabolomics

Reply #5
Through trial and error I have worked my way down to ppm=400, and I am still getting a few (but only a few) data insertion errors.  So now the ppm value is below the measurable error in large key peaks (like the caffeine internal standard), and I am still getting data insertion errors, and nothing groups into anything other than an empty matrix or undergoes retention correction without errors.

In conclusion, I simply can't get anything to work like this, even with exhaustive trial and error.  Any thoughts or suggestions for getting this working?

Re: looking for xcms setup help for untargeted metabolomics

Reply #6
Hi, Nat.

Sorry, I should have been checking back here more frequently!

I have just a few thoughts on your issues.  First, I wonder if the mass error of your instrument is just too high to work well with xcms.  As someone working in an academic lab, I certainly understand limitations on how nice your instrumentation is, but 400-800 ppm is just awfully high for metabolomics.  Most instruments people use for metabolomics have better mass accuracy than what you're stuck with, and I wonder if there's something in the code for xcms that just doesn't accommodate such a large imprecision in mass measurements.  Probably, when Colin Smith, Ralf Tautenhahn, et al. coded xcms, they did so with the mindset that people would be using instruments with relatively good mass accuracy.  Have you tried any other metabolomics peak-picking and peak-alignment software?  Some other options: MSInspect SmallMolecule, MetAlign, MZMine.  I'm not terribly familiar with them, as I've been pretty happy with xcms.

Second, you mentioned that some of your problem peaks are really narrow.  I was just rereading Tautenhahn 2008 BMC Bioinformatics, and starting on p. 4 of that paper they talk about the advantage of fitting chromatographic data of varying widths with the Mexican-hat wavelet they describe.  This would be the parameter where you set integrate=1 within the xcmsSet command.  On p. 8 of that paper, they say, "Optionally, a Gaussian curve is fitted to the feature, using the Nonlinear Least Squares (NLS) implementation of R."  That must be fitgauss=TRUE within the same command.  Have you tried setting fitgauss=FALSE?  I'm not sure how the Gaussian fit works in conjunction with the Mexican-hat wavelet function, but maybe they're not playing together nicely in your data.  The Mexican-hat wavelet sounds as though it IS what you want, though, based on my understanding of this paper and the nature of your data.

Third, your error with mzCenterFun="wMean": Have you tried mzCenterFun="apex"? How close to a Gaussian shape are your peaks? I suspect that the problem lies with your poor mass resolution. I don't know the code, but it probably isn't used to having to include such a large range of m/z to calculate the weighted mean of a single peak. Have you looked at plotRaw to get a visual representation of what your peaks look like on the mass and time axes? Here's an example:
Code: [Select]
xset.raw <- xcmsRaw("filename.mzdata.xml", profstep=0.01, profmethod="bin")
mzRange=c(512.2,512.3)
RTRange=c(705,740)
plotRaw(xset.raw, mzrange=mzRange, rtrange=RTRange, log=FALSE)
When I do this with my data, which was collected on a pretty good QTOF, I can clearly see that at the apex of the peak, I have the best mass accuracy, and at the sides, the mass accuracy is pretty low, which is what you'd expect. What do your data look like? (I'd insert a picture here, but I'm not sure how to do that with this site. It's asking for a URL for the image, and I'm not sure where I'd put it.)

Fourth, you asked about peak insertion errors. What do you mean? I wonder if the problem with the fitgauss parameter comes back to poor mass accuracy again.

I'm so sorry that this has been so frustrating for you! I can certainly relate, although I haven't had the exact problems you describe. Is there any way at all that you might get access to a better instrument?

Good luck. I'll try to be better about checking back here more frequently.

Laura

Re: looking for xcms setup help for untargeted metabolomics

Reply #7
Nat & Laura,

Just quickly: I've been looking at this thread (I confess I didn't read everything, so this may well have been answered) and wanted to clear up a few things.  The centWave peak detection algorithm is for use with high-resolution data which is centroided.  If you have low-res data, which is what would be collected from a Thermo Deca, I would use the original peak detection algorithm, matchedFilter.  This can be run simply using xcmsSet without needing to specify the method (matchedFilter is the default).

Many years ago we tried collecting metabolite profiling data from the ion traps in our lab.  It was a pretty difficult data set to analyse; the mass accuracy made annotations difficult, never mind identifications.  In the end, I'm not sure if the differences we saw were truly due to a biological change.  Time was set aside for rerunning the experiment on the ESI-TOF instruments.

Laura, for uploading images to the forum, you have to click on the Upload attachment tab below the Save draft/Preview/Submit buttons.  Upload the image and it'll be displayed at the bottom of your post, rather than using the img button at the top.

Hope it helps,

Cheers,

Paul
~~
H. Paul Benton
Scripps Research Institute
If you have an error with XCMS Online please send me the JOBID and submit an error via the XCMS Online contact page

Re: looking for xcms setup help for untargeted metabolomics

Reply #8
Hi Nat, Laura and Paul,

I've been using xcms on low-res LCQ data for years, using both the original (matchedFilter) and centWave algorithms.  The problems are thus:

1) matchedFilter works well with low-res data but is not very adaptive in finding peaks with very different peak widths (i.e. the fwhm setting).
2) centWave is good at finding a range of peak widths but doesn't work well with low-res data.  You can set ppm to 600-800 to find peaks, but it's a fine line between exceeding the mass accuracy limits of a low-res instrument and getting centWave-specific peak insertion problems.

The solution I've come up with that works best for me is to do multiple rounds of peak picking using matchedFilter with different FWHM settings, combine the peak lists and remove redundant peaks (i.e. peaks found more than once within some m/z and rt limits), and then proceed as normal with grouping, alignment, etc.  E.g.:

#make 3 xcmsSet objects using 3 FWHM values keeping all else the same
set1a <- xcmsSet(files = mzXML.files, method = "matchedFilter", fwhm = 10, max = 500, snthresh = 10, step = 0.1, steps = 2, mzdiff = 0.8)
set1b <- xcmsSet(files = mzXML.files, method = "matchedFilter", fwhm = 30, max = 500, snthresh = 10, step = 0.1, steps = 2, mzdiff = 0.8)
set1c <- xcmsSet(files = mzXML.files, method = "matchedFilter", fwhm = 60, max = 500, snthresh = 10, step = 0.1, steps = 2, mzdiff = 0.8)

#combine into one xcmsSet by using one of the above as a template and overriding its peaklist with a combination of all three
set1 <- set1c
set1@peaks <- rbind(set1a@peaks, set1b@peaks, set1c@peaks)
set1@peaks <- set1@peaks[order(set1@peaks[, "sample"], decreasing = FALSE), ]

#remove redundant peaks, in this case where there are any peaks within an absolute m/z value of 0.2 and within 3 s for any one sample in the xcmsSet (the largest peak is kept)
set2 <- deDuper(set1, mz.abs = 0.2, rt.abs = 3)

#then group, etc.

the deDuper function is something I've written that you are welcome to try:

deDuper <- function(object, mz.abs = 0.1, rt.abs = 2)
{
require("xcms")

mzdiff = 0

peaks.mat <- object@peaks
mz.min <- peaks.mat[, "mz"] - mz.abs
mz.max <- peaks.mat[, "mz"] + mz.abs
rt.min <- peaks.mat[, "rt"] - rt.abs
rt.max <- peaks.mat[, "rt"] + rt.abs

peaks.mat.out <- NULL

samples <- unique(peaks.mat[,"sample"])

cat("\n", "Duplicate peak removal; % complete: ")
percplus <- -1

for(i in 1:length(samples))
        {
        perc <- round(i / length(samples) * 100)
        if(perc %% 10 == 0 && perc != percplus)
                {
                cat(perc, " ")
                }
        percplus <- perc

        peaks.mat.i <- peaks.mat[which(peaks.mat[, "sample"] == samples[i]), , drop = FALSE]
        mz.min.i <- mz.min[which(peaks.mat[, "sample"] == samples[i])]
        mz.max.i <- mz.max[which(peaks.mat[, "sample"] == samples[i])]
        rt.min.i <- rt.min[which(peaks.mat[, "sample"] == samples[i])]
        rt.max.i <- rt.max[which(peaks.mat[, "sample"] == samples[i])]

        uorder.i <- order(peaks.mat.i[, "into"], decreasing = TRUE)
        uindex.i <- xcms:::rectUnique(cbind(mzmin = mz.min.i, mzmax = mz.max.i, rtmin = rt.min.i, rtmax = rt.max.i), uorder.i, mzdiff)
        peaks.mat.i <- peaks.mat.i[uindex.i, , drop = FALSE]
        peaks.mat.out <- rbind(peaks.mat.out, peaks.mat.i)
        }

cat("\n")
object@peaks <- peaks.mat.out
return(object)

}


cheers
Tony

Re: looking for xcms setup help for untargeted metabolomics

Reply #9
Wow, Thanks all for the replies there is some great stuff here.

Sorry I haven't peeked in more often to give some of this a try.  I am kind of doing chemical synthesis, molecular biology, biology, managing and designing new projects, writing in-house scripts, dealing with gene expression clustering, and trying metabolomics all at the same time (my job looks a little like this: http://www.youtube.com/watch?v=sE2jy23iG2M, except obviously I am not quite as talented or competent), so I am often divided many ways, and things get left by the wayside from time to time, like the problems I was asking for help with here.

Yes, I was initially working with the default matchedFilter method, and it was working okay, but every time I post a problem on these forums, the first suggestion I get is always to move to the centWave method.  Given that this was universal advice, I was trying to make the transition, but I was beginning to suspect that it just wasn't possible with my low-res data, as no amount of help from this forum or effort on my part seemed to get centWave to work with my data.

Tony seems to have the best solution to my problem that I have seen so far, especially as he seems to have been dealing with EXACTLY the same issue I am.  I'll give his proposal a try and report back, but that might be a while, as metabolomics work is a bit far back in the rather full queue of stuff for me at the moment.

Thanks again all for all your help, I really appreciate every bit of it.

Re: looking for xcms setup help for untargeted metabolomics

Reply #10
Hi Tony,

I really like your deDuper algorithm and general method for dealing with our mutual issue.  I am switching over to using it now, more or less permanently.

  I have, however, two small issues with it:

1)  The deDuper algorithm discards peaks in favor of the treatment that gives the better integration.  This is generally all well and good; HOWEVER, I have a setup that creates different isomers of metabolites, which produce fine doublets (or triplets or quadruplets, etc.) in some critical treatments.  The problem here is that while the really small-fwhm xcmsSet calls (fwhm = 4, usually) pick up these fine neighboring peaks, the larger-fwhm xcmsSet calls (fwhm = 20+) integrate them as a single peak.  Naturally, the single-peak integration has much more integrated area, so the deDuper algorithm actually deletes the fine multi-peak treatment (i.e. the correct and important one) in favor of the big, lazy, combined integration.  This could be very hard to fix.  I don't know how to program in R, but if I did, one possibility would be for the deDuper algorithm to take a list of the independent xcmsSets instead of one preemptively concatenated one.  Then the deDuper algorithm would label all the peaks based on what set they came from, and then merge and sort the sets.  At this point it could give preference to multiple peaks from one set over fewer, larger peaks from an alternate set when deleting duplicates, and in cases where there is an identical number of peaks, go with the treatment that gives the best integration, as before.

2)  The first column in the xcmsSet object is just a numeric label (not the one that starts with "M" under the "name" heading; just the one that has a number, often three or four digits, whose column in the diffreport has no heading, just an empty cell).  I am sure it has a function, though I don't know what it is.  After merging sets and deDuping, though, some of the entries have an identical numeric label according to xcms.  This isn't inherently a problem, but when creating the critical diffreport from it, xcms throws a warning that makes it look like it might have deleted a bunch of things with duplicate names in this category:
Code: [Select]
In data.row.names(row.names, rowsi, i) :
    some row.names duplicated: 1132 --> row.names NOT used
I haven't checked yet whether critical data has evaporated because of this, but it is concerning, so I just wanted to bring it to your attention.  If it is a problem, a quick fix would be to have deDuper make one last pass through the merged xcmsSet after removing duplicates and just renumber/rename everything in the order it is encountered (thus guaranteeing a unique number/name) before returning the xcmsSet object.

If I get time (hahah, ooohh, that is a good one, having time) I'll try to learn enough R to implement these suggestions, but I thought I might as well inform you of them, as they may be relevant to you as well.

Thanks again for all your help on this.  Your help has been hugely beneficial.

 

Re: looking for xcms setup help for untargeted metabolomics

Reply #11
Hi Nat,
Regarding point (2), I'm not sure what you're talking about (I don't usually use diffreport).  Could you be more specific?

Regarding point (1), this is trickier.  The rectUnique() function within xcms effectively discards all duplicates in an ordered matrix, keeping only the first hit.  So if it is ordered by the biggest peak, all smaller peaks are discarded.  What you want to do is keep multiple peaks in preference to one integrated peak encompassing them all, which is quite a different task.  In that case, you would not use rectUnique.  What you could do instead is iteratively select "duplicated" peaks within the same sample, and instead of picking the peak with the largest area, pick the fwhm parameter that gives the highest number of peaks in the group of duplicates, and discard the other possibilities.  This is certainly doable in R without too much effort (e.g. using a while loop on an intensity-ordered peak matrix and table() to select by peak frequency).  But I'm a bit snowed under at the moment.  I could spend an hour on it next week, maybe....
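A rough, untested sketch of that idea, assuming the combined peak matrix carries a hypothetical "setID" column recording which fwhm run each peak came from (you'd add it before rbind-ing the peak lists):

Code: [Select]
## sketch only: "setID" is a hypothetical column tagging each peak's fwhm run;
## within each duplicate cluster, keep the run contributing the MOST peaks
pickByCount <- function(peaks.mat, mz.abs = 0.2, rt.abs = 3)
{
        keep <- rep(TRUE, nrow(peaks.mat))
        for(s in unique(peaks.mat[, "sample"]))
                {
                idx <- which(peaks.mat[, "sample"] == s)
                for(i in idx)
                        {
                        if(!keep[i]) next
                        # all peaks (from any fwhm run) overlapping this one in m/z and rt
                        dup <- idx[abs(peaks.mat[idx, "mz"] - peaks.mat[i, "mz"]) <= mz.abs &
                                   abs(peaks.mat[idx, "rt"] - peaks.mat[i, "rt"]) <= rt.abs]
                        if(length(dup) < 2) next
                        counts <- table(peaks.mat[dup, "setID"])
                        winner <- names(counts)[which.max(counts)]  # run with most peaks here
                        keep[dup[peaks.mat[dup, "setID"] != winner]] <- FALSE
                        }
                }
        peaks.mat[keep, , drop = FALSE]
}

Usage would then be something like set1@peaks <- pickByCount(set1@peaks) in place of the deDuper call, so fine doublets from the small-fwhm run win over the one fat merged integration from the large-fwhm run.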

All the best,
Tony

Re: looking for xcms setup help for untargeted metabolomics

Reply #12
Thanks,

  I'll look into that when I get a chance – don't worry about squandering your time on my issues.  You have done so much to help me for nothing already.

Regarding point 2, I can't really be more specific, because I honestly don't know the full extent of the "damage" done (assuming there is any damage at all).  The only thing I have is the warning message when creating the diffreport, one example of which I put in the code block in my previous post.  All the other examples look about the same, except that the number following "row.names duplicated:" changes depending on the setup of previous functions like xcmsSet and whatnot.  I don't really know what is happening there; it may be 100% fine, but in my previous post I was speculating about what I think might be happening based on those fairly vague messages, and I could be totally off.  The only thing I can confirm is that the number that pops up in that message following "row.names duplicated:" appears to correspond to the type of number/names that appear in the first column of the diffreport.

  Thanks again for all your help.