Messages - Nat S

1
XCMS / Re: looking for xcms setup help for untargeted metabolomics
Thanks,

  I'll look into that when I get a chance – don't worry about squandering your time on my issues.  You have done so much to help me for nothing already.

  Regarding point 2, I can't really be more specific because I honestly don't know the full extent of the "damage" done (assuming there is any damage at all).  The only thing I have is the error message when creating the diffreport, one example of which I put in the code block in my previous post.  All the other examples look about the same, except that the number following "row.names duplicated:" changes depending on the setup of earlier calls such as xcmsSet.  I don't really know what is happening there.  It may be 100% fine; in my previous post I was only speculating about what might be happening based on those fairly vague error messages, and I could be totally off.  The only thing I can confirm is that the number that pops up in that error message following "row.names duplicated:" appears to correspond to the type of number/names that appear in the first row of the diffreport.

  Thanks again for all your help.
2
XCMS / Re: looking for xcms setup help for untargeted metabolomics
Hi Tony,

  I really like your DeDuper algorithm and general method for dealing with our mutual issue.  I am switching over to using that now more or less permanently.

  I have, however, two small issues with it:

1)  The DeDuper algorithm discards peaks in favor of the treatment that gives the better integration.  This is generally all well and good.  HOWEVER, I have a setup that creates different isomers of metabolites, which produce fine doublets (or triplets or quadruplets, etc.) in some critical treatments.  The problem is that while the really small fwhm xcmsSet calls (usually fwhm = 4) pick up these fine neighboring peaks, the larger fwhm xcmsSet calls (fwhm = 20+) integrate them as a single peak.  Naturally the single-peak integration has much more integrated area, so the DeDuper algorithm actually deletes the fine multi-peak treatment (i.e. the correct and important one) in favor of the big, lazy, combined integration.  This could be very hard to fix.  I don't know how to program in R, but if I did, one possibility would be for the DeDuper algorithm to take a list of the independent xcmsSets instead of one preemptively concatenated one.  The algorithm would then label all the peaks based on which set they came from, and then merge and sort the sets.  At that point, when deleting duplicates, it could give preference to multiple peaks from one set over fewer, larger peaks from an alternate set, and in cases where the number of peaks is identical it could go with the treatment that gives the best integration, as before.

2)  The first column in the xcmsSet object is just that number label thingy (not the one that starts with "M" under the "name" heading, just the one that has a number, often three or four digits; its column in the diffreport has no heading, just an empty cell).  I am sure it has a function, though I don't know what it is.  After merging sets and DeDuping, though, some of the entries have an identical number label according to XCMS.  This isn't inherently a problem, but when creating the critical diffreport from it, XCMS throws a warning that makes it look like it might have deleted a bunch of things with duplicate names in this category:
Code: [Select]
In data.row.names(row.names, rowsi, i) :
    some row.names duplicated: 1132 --> row.names NOT used
I haven't checked yet whether critical data has evaporated because of this, but it is concerning so I just wanted to bring it to your attention.  If it is a problem, a quick fix to that would be to have DeDuper make one last pass through the merged xcmsSet after removing duplicates, and just renumber/name everything in the order it encounters it (thus a unique number/name) before returning the xcmsSet object.
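
The renumbering pass suggested above could look roughly like this in base R. This is only a sketch on a made-up matrix: `peak_table` and its values are hypothetical, not taken from DeDuper or from a real xcmsSet, and it assumes the duplicated labels are ordinary row names.

```r
# Hypothetical merged peak table with a duplicated row label ("1132").
peak_table <- matrix(
  c(195.1, 393.2, 393.3,
    100,   250,   252),
  ncol = 2,
  dimnames = list(c("1131", "1132", "1132"), c("mz", "rt"))
)

# anyDuplicated() returns 0 when every label is unique.
has_duplicates <- anyDuplicated(rownames(peak_table)) > 0

# Renumber every row in the order it is encountered, so each label is unique.
renumbered <- peak_table
rownames(renumbered) <- seq_len(nrow(renumbered))
```

Renumbering throws away the old labels, so it is only appropriate if nothing downstream relies on them.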

If I get time (hahah, ooohh that is a good one, having time) I'll try to learn enough of the R-script to implement these suggestions, but I thought I might as well inform you about them as they may be relevant to you as well.
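
As a starting point for the first suggestion (label peaks by source set, prefer the set with more peaks in a region, and fall back to the larger integrated area on a tie), a rough base R sketch might look like the following. Everything here is an assumption for illustration: the `peaks` table, the column names, the tolerances, and the crude binning used to decide which peaks overlap. It is not how DeDuper is actually written.

```r
# Hypothetical merged peak list; 'set' records which xcmsSet call found each peak.
peaks <- data.frame(
  mz   = c(195.1, 195.1, 195.1, 393.2, 393.2),
  rt   = c(100, 103, 106, 250, 251),
  into = c(400, 900, 350, 1200, 1500),
  set  = c("fwhm4", "fwhm20", "fwhm4", "fwhm4", "fwhm20"),
  stringsAsFactors = FALSE
)

prefer_multiplets <- function(peaks, mz_tol = 0.5, rt_tol = 10) {
  # Bin m/z, sort by bin then retention time, and start a new cluster
  # whenever the bin changes or the retention-time gap exceeds rt_tol.
  mz_bin <- round(peaks$mz / mz_tol)
  ord <- order(mz_bin, peaks$rt)
  peaks <- peaks[ord, ]
  mz_bin <- mz_bin[ord]
  new_cluster <- c(TRUE, diff(mz_bin) != 0 | diff(peaks$rt) > rt_tol)
  cluster <- cumsum(new_cluster)

  kept <- lapply(split(peaks, cluster), function(cl) {
    by_set  <- split(cl, cl$set)
    n_peaks <- vapply(by_set, nrow, integer(1))
    area    <- vapply(by_set, function(s) sum(s$into), numeric(1))
    # Most peaks wins; total integrated area breaks ties.
    by_set[[order(-n_peaks, -area)[1]]]
  })

  result <- do.call(rbind, kept)
  rownames(result) <- NULL  # fresh, unique row labels
  result
}

deduped <- prefer_multiplets(peaks)
```

In this toy data the doublet near m/z 195 is kept from the fwhm = 4 set even though the single fwhm = 20 peak has more area, while the lone peak near m/z 393 is taken from whichever set integrated it better.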

Thanks again for all your help on this.  Your help has been hugely beneficial.
3
XCMS / Re: looking for xcms setup help for untargeted metabolomics
Wow, thanks all for the replies; there is some great stuff here.

Sorry I haven't peeked in more often to give some of this a try.  I am kind of doing chemical synthesis, molecular biology, biology, managing and designing new projects, writing in-house scripts, dealing with gene expression clustering, and trying metabolomics all at the same time (my job looks a little like this: http://www.youtube.com/watch?v=sE2jy23iG2M except obviously I am not quite as talented or competent).  So I am often divided many ways, and things like the problems I was asking for help with here get left by the wayside from time to time.

Yes, I was initially working with the default matchedFilter method, and it was working okay, but every time I post a problem on these forums the first suggestion I get is always to move to the centWave method.  Given that this was universal advice I was trying to make the transition, but I was beginning to suspect that it just wasn't possible with my low-res data, as no amount of help from this forum or effort on my part seemed to get centWave to work with my data.

Tony seems to have the best solution to my problem that I have seen so far, especially as he seems to have been dealing with EXACTLY the same issue I am.  I'll give his proposal a try and report back, but that might be a while as metabolomic work is a bit far back in the rather full queue of stuff for me at the moment.

Thanks again all for all your help, I really appreciate every bit of it.
4
XCMS / Re: looking for xcms setup help for untargeted metabolomics
Through trial and error I have worked my way down to ppm=400 and I am still getting a few (but only a few) data insertion errors.  So now the ppm value is below the measurable error in large key peaks (like the caffeine internal standard) and I am still getting data insertion errors.  Nothing groups to produce anything other than an empty matrix, and nothing undergoes retention correction without errors.

  In conclusion, I simply can't get anything to work like this, even with exhaustive trial and error.  Any thoughts or suggestions for getting this working?
5
XCMS / Re: looking for xcms setup help for untargeted metabolomics
Hi Laura,

  Unfortunately, even when playing with the parameters of your centWave-based xcmsSet setup, I can't ever get an xcmsSet object that will either group or undergo retention correction at all.  Various setups for the group function all run and give the illusion of chugging along correctly, but the object they act on never has anything other than a <0 x 0> matrix when grouping is finished, and all attempts at retcor just fail utterly with errors.

  The problem can't be the data set, as I can get the mediocre and less-than-ideal (but still useful) results that I was complaining about above by using the default matchedFilter method on this exact data set and then grouping it, which works alright.

  Also, the weird thing with the centWave setup you recommended is, as I mentioned above, that it gives peak data insertion errors even at ppm values below what were ideal before.  For instance, if I do A<-xcmsSet(method="centWave", peakwidth=c(3,20), ppm=700) it runs fine on this data set (though it still produces something that won't group, retcor, or do anything else).  But when I add in your integration parameters (like "fitgauss=TRUE"), then even going as low as "ppm=525" I get peak data insertion errors and am told to lower the ppm value.  This is bad, as going lower may start to cross the actual threshold for error on this machine, as best I can measure it, which would make a real hash of things.

  Thoughts?
6
XCMS / Re: looking for xcms setup help for untargeted metabolomics
Hi Laura, I am running some tests now, but I am having one issue with your code up front.

I had to cut out mzCenterFun = "wMean" because whenever I include it in xcmsSet I get: Sample1: Error in .local(object, ...) : unused argument(s) (mzCenterFun = "wMean").  I checked the xcms manual and that argument and option are definitely there, and I spelled them correctly, but I can't get xcmsSet to run with them.  The only call I am making before this is library(xcms), so I shouldn't be changing anything from its default state.  Perhaps that is the issue, and I need to do more setup before I can pass mzCenterFun = "wMean" to xcmsSet?  Currently I am running your exact code but with ppm = 600 and without assigning anything to the mzCenterFun parameter, and it seems to be working reasonably well, though it is taking a while.  Any theories on what I might be doing wrong with the mzCenterFun = "wMean" assignment?

(Random P.S.: apparently 600 ppm is now too big, as I am getting a few mass insertion errors, even though that value works great on the same data set without your other suggested parameters for the centWave function.  What has changed that might influence this?)
7
XCMS / Re: looking for xcms setup help for untargeted metabolomics
Hi Laura,

    Thanks for the response.

    No we are doing untargeted metabolomics.  When I say finding features manually I mean finding features of a meaningful size and statistical significance.  We are talking about dozens not thousands, but they are dozens of meaningful ones, which xcms might represent as six or more each due to mass fragmentation and what not that is obviously all part of the same real feature when done by hand.  Done by hand this is still very very hard and time consuming even on our smaller data sets.  The things we found by hand that XCMS missed were new features for just a few samples in that new (but smaller) data set, but again this is all untargeted, and what we are looking for is new in almost every data set.

  I got the advice to try the centWave peak finding algorithm in my other, more specific post on the same board here: (http://metabolomics-forum.com/viewtopic ... 93f63dde0f).  Unfortunately I am not having any luck with the centWave method.  Through some experimentation and analysis, including some suggestions from others, I did figure out that for the centWave function I need to set the ppm somewhere below 800 and above roughly 450.  But the resulting xcmsSet object still doesn't group and thus still won't do a retcor.  I usually use the fillPeaks method at the end too, but pre-centWave that didn't solve my old problem of missing peaks that could be located by hand, and with centWave I can't get through grouping or retcor successfully, so I haven't gotten far enough to play around with fillPeaks yet.

  I will play around with your suggested "xcmsSet" and "group" code suggestions though (after adjusting the ppm), that might solve something.  I'll get back to you when I have some results, good or bad in relation to this.
9
XCMS / Re: error with "scanrange" variable of xcmsSet
Quote
Is your data centroided or in profile mode?

  Oooh, good question, I'll have to look into that.  I didn't care before because the matchedFilter method I was using didn't seem to mind whatever form I had the data in and could give me results with it, but if that is an issue for centWave I'll have a look into it now.

  Thanks for the suggestion.
10
XCMS / Re: error with "scanrange" variable of xcmsSet
Thanks Steffen,

  While waiting for your previous response I have already played with the ppm value via trial and error to see right where things fall apart.  No matter how low I go with the ppm value, though, it doesn't affect the failure to group (again, this is before your recommendation to mess with the group function, which I will try right after posting this).  I stop getting errors around ppm=700 (though again I still can't get grouping).  In the end, though, the ppm variable doesn't seem to be the primary source of my problems.  You are right: my (likely improper) use of "group" is probably where my issues are coming from.

  As for your plotRaw suggestion, please forgive my ignorance, but what exactly should I be looking for in the raw data plot from plotRaw?  What is the difference between the yellow and green dots on this scatter plot?  What I am noticing is that the data comes in bands grouped around masses, and I am guessing that what I want is to get a good gauge of how wide those bands are near important peaks and set the ppm parameter accordingly.  Doing this for a nice large signal, I see the vast majority of the dots on the band for that signal fall between 393.15 and 393.40, so that would be about 636 ppm.  Stepping through scan by scan in a GUI browser to look for the single largest change between scans on the same peak, I see the worst is 393.17 to 393.30, which would be about 330 ppm.  Similarly for a second peak, plotRaw gives a band mostly between 195.0 and 195.20, which is about 1026 ppm, and stepping in the GUI the worst per-scan change is 195.17 to 195.25, which would be about 410 ppm.  So the 700 ppm I got via trial and error looks like a fairly reasonable number (something >450 looks about right), though again, maybe I didn’t perform the analysis you were suggesting correctly, as I am a bit new to all of this.
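
The ppm figures quoted above can be reproduced with a one-line helper. This is just a sketch of the arithmetic; `ppm_width` is a made-up name, and measuring the width relative to the lower edge of the band is an assumption about how the numbers were derived.

```r
# Relative width of an m/z band in parts per million,
# measured against the lower edge of the band.
ppm_width <- function(mz_low, mz_high) {
  (mz_high - mz_low) / mz_low * 1e6
}

band_393 <- ppm_width(393.15, 393.40)  # plotRaw band around the first peak
jump_393 <- ppm_width(393.17, 393.30)  # worst scan-to-scan change, first peak
band_195 <- ppm_width(195.00, 195.20)  # plotRaw band around the second peak
jump_195 <- ppm_width(195.17, 195.25)  # worst scan-to-scan change, second peak
```

These come out within about one ppm of the values in the post, and they show why the same absolute spread looks much larger in ppm at m/z 195 than at m/z 393.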

  I'll try playing with the group function now as you suggest.  I’ll root through the xcms manual to see what might be the best way to analyze “A” for the groups it may or may not contain and the quality of them, but do you have any thoughts on simple code snippets for the best way to look at “A” for its groups and the quality thereof?  If not, that is fine.  I feel bad about taking so much of your time with my ignorance, and I’ll monkey around until I get anything to do something more than what is happening currently.
11
XCMS / Re: error with "scanrange" variable of xcmsSet
Sure.  It does look like it is the same object, though (I am trying to keep things as simple as possible, so I am just using the nice simple generic object "A" for all my xcms work).  This is at the command line, so I'll put my commands into the code block, with ">" starting each line I typed, followed by the output (abridged to the first object's output in the case of long repetitive outputs such as xcmsSet).

Code: [Select]
> A<-xcmsSet(method="centWave", mzdiff=-0.5, ppm=4000, scanrange = c(40,800))
Sample1:
  Detecting mass traces at 4000 ppm ...
  % finished: 10 20 30 40 50 60 70 80 90 100 Warning: There were 116247 peak data insertion problems.
  Please try lowering the "ppm" parameter.

  4085 m/z ROI's

  Detecting chromatographic peaks ...
  % finished: 10 20 30 40 50 60 70 80 90 100
  322 Peaks.
  ##ABRIDGED ...
> group(A)
163 226 288 351 413 476 538 601 663 726 788 851 913 976
> retcor(A, family="s", plottype="m", method="linear")
Error in .local(object, ...) : No group information found
> group(A, bw=c(5,20), mzwid=0.5)
225 350 475 600 725 850 975
There were 50 or more warnings (use warnings() to see the first 50)
> retcor(A, family="s", plottype="m", method="linear")
Error in .local(object, ...) : No group information found
> Reporttab<-diffreport(A, "runs", "control", "centWaveTry1", 5)
Error in .local(object, ...) : No group information found

I just tried this all again with "ppm=400" in the xcmsSet function and got basically the same results, except that it found many fewer peaks in each file and didn't warn about lowering the ppm.  I still got the exact same failure from group, retcor, etc.
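
One thing the transcript above does not show is the result of group() being stored anywhere. R functions normally return a modified copy and leave their argument untouched, and as far as I understand xcms follows the same convention, so calling group(A) on its own line would leave A without group information. The toy example below illustrates the behaviour with a plain list standing in for the xcmsSet object; `toy_set` and `toy_group` are made up for illustration and are not xcms functions.

```r
# Toy stand-in for an xcmsSet: a list with an empty 'groups' element.
toy_set <- list(peaks = 1:5, groups = NULL)

# Toy stand-in for group(): returns a copy with 'groups' filled in.
toy_group <- function(x) {
  x$groups <- "grouped"
  x
}

invisible(toy_group(toy_set))            # result is discarded
still_ungrouped <- is.null(toy_set$groups)  # the original is unchanged

toy_set <- toy_group(toy_set)            # assign the result back
now_grouped <- !is.null(toy_set$groups)

# With xcms the analogous pattern would be:
#   A <- group(A)
#   A <- retcor(A, ...)
```

If that is what happened here, it would explain "No group information found" regardless of the ppm setting, though the warnings from the second group() call may point to a separate issue.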
12
XCMS / Re: error with "scanrange" variable of xcmsSet
Similarly trying to do a diffreport with the above data after grouping but skipping the retcor step produces an error
Code: [Select]
Error in .local(object, ...) : No group information found
Looks like what I am doing with centWave (or possibly group) is VERY wrong and is just making a mess.
13
XCMS / Re: error with "scanrange" variable of xcmsSet
Okay I tried running the setup of xcmsSet with the centWave method, but I am not sure I set it up properly for our low resolution instrument.  I tried
Code: [Select]
A<-xcmsSet(method="centWave", mzdiff = -0.5, ppm=4000, scanrange=c(40,800))
based on the crude estimate that our instrument has just under 0.5 m/z units of accuracy and variability (across the whole data set, not on a per-scan nor likely on a per-chromatogram basis).
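
Assuming the 0.5 figure is an absolute error in m/z units (which the step=0.5 setting used elsewhere in this thread suggests), it does not map to a single ppm value, because ppm is relative to the mass being measured. A small sketch of the conversion follows; `da_to_ppm` is a made-up helper and the three masses are only illustrative.

```r
# Convert an absolute m/z tolerance into ppm at a given m/z.
da_to_ppm <- function(tol_mz, mz) {
  tol_mz / mz * 1e6
}

# The same 0.5 m/z tolerance at three illustrative masses.
ppm_at_masses <- da_to_ppm(0.5, c(125, 195, 393))
```

A tolerance of 0.5 m/z is 4000 ppm only at m/z 125; at higher masses the same absolute error is a considerably smaller ppm value.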

It churned through the data set but seemed upset about the ppm value, the results of all my chromatograms looked like this as they were processed
Code: [Select]
Some Sample:
  Detecting mass traces at 4000 ppm ...
  % finished: 0 10 20 30 40 50 60 70 80 90 100 Warning: there were 125126 peak data insertion problems.
  Please try lowering the "ppm" parameter.
 
  4275 m/z ROI's.
 
  Detecting chromatographic peaks ...
  % finished: 0 10 20 30 40 50 60 70 80 90 100
  613 Peaks.
Then following my attempt to use the "group" function (which, at least superficially, seems to run without errors) the retcor function just produced
Code: [Select]
Error in .local(object, ...)  : No group information found
Suggesting that things aren't in such great condition in the resulting xcmsSet object.

Thoughts?
14
XCMS / Re: error with "scanrange" variable of xcmsSet
Okay, thanks that makes sense. I'll give that a try.

One question though, is the centWave method appropriate for particularly low res mass spectrometers? One of the reasons I was just using the default matchedFilter method is that the xcms manual suggests that pretty much all of the others were for high-res instruments, and as the
Code: [Select]
step=0.5
in my entry suggests this isn't exactly a precision instrument.
15
XCMS / error with "scanrange" variable of xcmsSet
Okay, so my long post was apparently too daunting a wall of text so I'll break this into manageable bits, first:

Over 75% of my features end up being in two "ion rainbows", the first being the injection spike and the second being the re-equilibration portion at the end of all my runs.  As a result I basically start all my analysis by deleting 75% or more of my “data” in these locations in the diffreport.  The obvious thing to try to compensate for this was the “scanrange” variable.  But any time I try something like:

Code: [Select]
C<-xcmsSet(step=0.5, scanrange = c(40,700))

I just get the following error:
Code: [Select]
Sample1: Error in .local(object, ...) :
  unused argument(s) (scanrange = c(40, 700))

Thoughts?
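
Until scanrange works, the manual trimming described above could at least be scripted. The sketch below filters a report table by retention time in base R. The `report` data frame is made up, the `rtmed` column name is an assumption about what the diffreport output contains, and the window limits are placeholders, not values from my runs.

```r
# Hypothetical stand-in for a diffreport table; 'rtmed' is assumed to hold
# the median retention time of each feature in seconds.
report <- data.frame(
  name  = c("M85T30", "M195T310", "M393T620", "M120T1150"),
  rtmed = c(30, 310, 620, 1150),
  stringsAsFactors = FALSE
)

# Placeholder window (seconds) between the injection spike and re-equilibration.
rt_window <- c(60, 1000)

# Keep only features whose retention time falls inside the window.
in_window <- report$rtmed >= rt_window[1] & report$rtmed <= rt_window[2]
trimmed <- report[in_window, ]
```

Note that scanrange is given in scan numbers while this filter uses seconds, so the two limits are not directly interchangeable.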