Show Posts - Jan Stanstrup



181

XCMS / Re: m/z value off by 0.1~0.2 Da

November 18, 2015, 03:19:43 AM

That's very interesting and good news if there is finally a proper converter. Let us know if it works out and which version works.

- Jan

182

XCMS / Re: Confusions about handling metabolomics data with XCMS

November 13, 2015, 07:43:35 AM

If the polarity is set correctly for each scan in the file you should be able to use the polarity parameter in xcmsSet to only use positive/negative scans.

You can also re-convert the continuous mode data with msconvert from proteowizard and centroid (peakPicking filter in msconvert). Be sure to compare your files with what it shows in your vendor software to verify that the files were converted correctly.

183

XCMS / Re: m/z value off by 0.1~0.2 Da

November 11, 2015, 03:19:41 AM

How did you do the centroiding yourself?
They probably use Proteowizard in XCMS online, so I guess that is why the mass is off. I have no idea how continuous data is handled there. The way I would get correct files is to do one of the following:
1) databridge followed by proteowizard with centroiding enabled
2) my masswolf wrapper followed by proteowizard with centroiding enabled

You can then use xcms online or anything else you want.

184

XCMS / Re: m/z value off by 0.1~0.2 Da

November 10, 2015, 08:03:50 AM

How did you convert the files? Proteowizard does not calibrate the data. You can use masswolf or databridge. See here: viewtopic.php?f=26&t=359&p=1694#p1694. I made a wrapper for masswolf that makes it possible to extract each "function" correctly.

185

CAMERA / Re: Error in x$membership

October 18, 2015, 12:08:32 PM

For reference here is the discussion of the problem: https://support.bioconductor.org/p/69414/

186

XCMS / Re: Too few features? Synapt G2S UPLC HILIC

October 13, 2015, 08:27:38 AM

Some thoughts:

ppm at 8-10 might be too low. This accuracy is needed for the whole mass peak. Not just the apex. ~30 ppm might be a better starting point.
I have had data where centWave didn't do well. I think it generally needs pretty good data to work well (many scans per peak, clear peaks). You can try matchedFilter, which is better at picking up peaks, but you will get more noise. But this should be more similar to markerlynx.
obiwarp is slow and never worked well for me personally. Perhaps try the loess method.
It sounds like your main problem is the grouping step since you get few peaks after that.
The bw parameter seems low. Try setting it higher, at least to see if it improves grouping.
The bw setting is usually the most important, while a sensible mzwid should not affect the grouping that much.
You have not set your minfrac and minsamp parameters in group. Sensible settings depend on whether you defined the groups (by putting them in different folders) and how homogeneous you think each group should be.
profparam might be causing some bad data, but first try to understand if your grouping works. This is much more critical. I don't think 0.1 makes much sense for a QTOF instrument. Whether it does anything bad depends on whether you have compounds with similar mass eluting closely. Again, this should be relatively rare, so it should not completely scramble your data.
My approach would be this: Find a peak/fragment that should be in all samples but is lost after grouping. I would then use a function I wrote, analyze.xcms.group (viewtopic.php?f=8&t=577&p=1789&hilit=analyze.xcms.group#p1789), to visualize what happened during grouping. If there are too few dots, it is a peak picking issue. Otherwise: did it not group them? Did it do it wrong? Can you understand why when you compare the plot to your grouping parameters (is the m/z or the RT dimension the problem)?
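
To illustrate why minfrac and minsamp interact with how the sample groups are defined, here is a minimal Python sketch. It is not XCMS code: the function name keeps_feature and all sample counts are made up, and it assumes the rule as described in the group documentation, namely that a feature is kept when at least one sample group has both enough samples and a large enough fraction of samples with a peak.

```python
def keeps_feature(peaks_per_class, class_sizes, minfrac=0.5, minsamp=1):
    """Return True if at least one sample class passes both thresholds.

    peaks_per_class: number of samples with a peak, per class
    class_sizes:     total number of samples, per class
    """
    for cls, n_with_peak in peaks_per_class.items():
        n_total = class_sizes[cls]
        if n_with_peak >= minsamp and n_with_peak >= minfrac * n_total:
            return True
    return False


# Hypothetical feature found in 6 of 10 treated samples and in no controls.
with_classes = keeps_feature(
    {"control": 0, "treated": 6}, {"control": 10, "treated": 10}
)

# Same feature, but all 20 samples were put in one folder (one class).
without_classes = keeps_feature({"all": 6}, {"all": 20})

print("kept with two classes:", with_classes)
print("kept with one class:", without_classes)
```

Under these assumptions the same feature survives when the two groups are defined (6 of 10 in one class) but is dropped when all samples are treated as one group (6 of 20).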

187

XCMS / Re: Too few features? Synapt G2S UPLC HILIC

October 09, 2015, 11:22:08 AM

profParam is used during fillPeaks though so it can influence your intensities. More details here: viewtopic.php?f=8&t=598&p=1853&hilit=profParam#p1853

188

XCMS / Re: load a CDF file in R

September 28, 2015, 01:35:11 PM

How large is this file?
From googling, it seems that it might be because something is running in 32-bit mode. I don't know the internals well enough to venture a guess as to where the problem is or how to fix it.

https://code.zmaw.de/boards/4/topics/468
http://www.aps.anl.gov/epics/tech-talk/ ... g01231.php

EDIT: idea: if your files are > 2GB because they are in profile mode you could probably get the size down by converting them to centroid mode with msconvert from proteowizard. You can try that anyway to see if it is a CDF specific issue.

189

Other / Re: Automatic detection of problems during analysis

September 24, 2015, 11:22:14 AM

I was thinking of the first: a running analysis (updated when a new file is available) that would warn of potential problems, so that you have a chance to catch problems when they appear and not 2 months later during data analysis. My thought was that XCMS + shiny for a graphical report could do it.

190

Other / Automatic detection of problems during analysis

September 20, 2015, 05:48:37 AM

Dear all,

I was wondering if anyone is aware of any software (or a process included in any software) that can automatically test for problems with your analysis during the analysis itself?
I imagine a tool that would check for drift in retention time and line broadening, continuously check the mass calibration, check for increases in common contaminants, dramatic drops in sensitivity etc. and give a running report.
Does something like this exist? Or in part?

Any hints you can give me will be highly appreciated. I am also interested to know if others would find such a tool useful.
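
As a rough illustration of the kind of running check described above, the following Python sketch compares each new injection of a reference compound against expected values and reports retention time drift, mass error and intensity drops. It is only a sketch: the function name check_injection, the tolerances and all numbers are hypothetical, and it does not cover peak broadening or contaminants.

```python
def check_injection(reference, observed, rt_tol=0.1, ppm_tol=10.0,
                    min_intensity_ratio=0.5):
    """Compare one injection of a reference compound to its expected values.

    reference and observed are dicts with keys 'rt' (minutes), 'mz' and
    'intensity'. Returns a list of warnings (empty if all is within limits).
    """
    warnings = []

    rt_shift = observed["rt"] - reference["rt"]
    if abs(rt_shift) > rt_tol:
        warnings.append(f"retention time shifted by {rt_shift:+.2f} min")

    ppm_error = (observed["mz"] - reference["mz"]) / reference["mz"] * 1e6
    if abs(ppm_error) > ppm_tol:
        warnings.append(f"mass error {ppm_error:+.1f} ppm")

    ratio = observed["intensity"] / reference["intensity"]
    if ratio < min_intensity_ratio:
        warnings.append(f"intensity dropped to {ratio:.0%} of reference")

    return warnings


# Made-up expected values and two made-up injections.
reference = {"rt": 5.20, "mz": 195.0877, "intensity": 1.0e6}
injections = [
    {"rt": 5.21, "mz": 195.0879, "intensity": 9.5e5},
    {"rt": 5.45, "mz": 195.0925, "intensity": 3.0e5},
]

for i, obs in enumerate(injections, start=1):
    problems = check_injection(reference, obs)
    print(f"injection {i}:", "; ".join(problems) if problems else "OK")
```

In a real setup the observed values would come from peak picking on each new file as it appears, and the warnings would feed a running report rather than a print statement.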

- Jan Stanstrup.

191

XCMS / Re: Working with profile mode data

September 19, 2015, 04:45:07 AM

1) I don't think so
2) Yes. You can use msconvert from proteowizard to re-convert the files and centroid the data. The peakPicking filter should be what you need.
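
For reference, a minimal Python sketch of what such an msconvert call could look like. It only builds and prints the command and does not run it. The file and folder names are placeholders, the helper name msconvert_centroid_command is made up, and the filter string "peakPicking true 1-" (prefer vendor centroiding, all MS levels) is an assumption about the msconvert version in use, so check it against your installation.

```python
import shlex


def msconvert_centroid_command(input_file, output_dir="."):
    """Build an msconvert command line that writes centroided mzML."""
    return [
        "msconvert", input_file,
        "--mzML",                            # output format
        "--filter", "peakPicking true 1-",   # centroid MS level 1 and up
        "-o", output_dir,                    # output directory
    ]


cmd = msconvert_centroid_command("sample01.raw", "centroided")
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) on a machine where msconvert is on the PATH. Afterwards, compare a few spectra with the vendor software to confirm that the centroiding actually happened.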

192

CAMERA / Re: Questions about CAMERA and psgroup

August 21, 2015, 06:23:13 AM

Hi Lin,

I am not sure I agree with the interpretation of your plots. If I understood you correctly, you say that cor_exp_th=0.4 is a threshold because after that the number of pcgroups starts increasing? To me that just shows that calcCaS does not do its job at 0.4... Everything can be correlated at 0.4. Think of R^2 = 0.4. Would you consider that nicely correlated (http://www.jerrydallal.com/lhsp/pix/corrp.gif)?
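
To get a feel for how permissive a threshold of 0.4 is, the Python sketch below computes a Pearson correlation for two made-up intensity profiles across eight samples. The data are invented, and the stricter comparison value of 0.75 is an assumption chosen for illustration, not something taken from this thread.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented intensities of two features across the same eight samples.
feature_a = [1, 2, 3, 4, 5, 6, 7, 8]
feature_b = [2, 1, 4, 3, 8, 3, 6, 5]

r = pearson(feature_a, feature_b)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
for threshold in (0.4, 0.75):
    print(f"threshold {threshold}: would be grouped = {r >= threshold}")
```

Here r is about 0.62 (r^2 about 0.38): a noisy relationship that passes a threshold of 0.4 but not the stricter one.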

Also with so many samples I would consider turning calcCiS off. The reason is that with 1000 samples a good correlation across peaks (calcCaS) is a very strong indication that the peaks are from the same compound. On the other hand calcCiS is prone to "false positives" when you have a lot of co-eluting peaks so if you can rely on calcCaS only that might improve things. I show an example of this in chapter 4.1 of my thesis: https://www.researchgate.net/publicatio ... n_pipeline

Quote

And only one ion has been involved in one pcgroup. However, several features are the VIP features after my PLS model. In this case, how could I identify them?

It is normal that there are a lot of ions that are in their own pcgroup. Either noise or compounds that don't make fragments. I would not be too concerned about that. If you look manually at the peaks, are there other ions that seem like they belong with your marker?
So one question is: do you trust your PLS model? Was it properly validated? I cannot help much there since I have never used PLS in my own studies, but with 1000 samples there should be plenty to build a very robust model, I would assume. What do the box plots look like? Are they clear markers? What do the peaks look like? Are they real peaks? Did you randomize properly (sampling, sample prep, analysis)?
As for identification your only chance is to see if you can do MS/MS on the ion. If not it becomes a question of how much you want it and you revert to classical isolation and structure elucidation methods if you need to.

About your noise features: Should be a peak-picking issue. Are your noise features found at all retention times and not just at the beginning? What instrument are you using? If it is a q-tof your ppm might be a bit low. You can also try to raise the s/n. Your peakwidth might also be too narrow to cover everything. Your prefilter also seems low for the instruments I have seen (but of course depends on the scale the instrument is using).
You can also try IPO (http://www.biomedcentral.com/1471-2105/16/118) for automatic optimization of XCMS parameters.
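
To make the ppm remark above concrete, this small Python sketch converts a ppm tolerance into absolute m/z limits. The helper name ppm_window and the example m/z of 500 are made up for illustration. How exactly a given peak picker applies its ppm setting is not shown here.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z limits for a +/- ppm tolerance around mz."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


for ppm in (10, 30):
    low, high = ppm_window(500.0, ppm)
    print(f"{ppm} ppm at m/z 500: {low:.4f} to {high:.4f}")
```

At m/z 500, 10 ppm corresponds to 0.005 Da and 30 ppm to 0.015 Da, which gives an idea of how much scan-to-scan scatter across a whole mass peak each setting tolerates.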

Quote

As you mentioned before, you use CAMERA after statistics. Do you mean that you use all the well behaved and reproducible features after XCMS for statistics ? Many features after XCMS belong to one compound. In model, isotope, adduct and other fragments from same compound will contribute together. Do you think it will increase the chance of overfitting the multi statistical model?

I use all peaks from XCMS to run my statistics. I normally do univariate mixed linear models + correction for multiple testing. It's fast and easy to interpret.

I am reluctant to comment on statistics (I can drive a car but taking my directions for building one is probably not wise...). I don't know how having more variables describe the same thing affects overfitting but isn't it the whole point of multivariate stats? To allow for highly correlated data.

193

CAMERA / Re: Questions about CAMERA and psgroup

August 19, 2015, 08:28:52 AM

It is not easy to tell if you have a problem or not. It is normal that in the beginning of the chromatograms you have a lot of ions that get grouped together because they are all co-eluting and probably also correlated because of ion suppression.
You can never "trust" CAMERA. It is a great help but you need to make sense of it in the end. I don't understand why you have set cor_exp_th=0.4? That seems so low as to be useless... So that is the first thing I would change and see if it helps.
You might find the discussion here useful: viewtopic.php?f=24&t=278.
When choosing calcCiS and calcCaS you need to consider the number of samples you have. Also "lpc" might be better than "hcs".

For features that you believe are erroneously split you should go through each step of the CAMERA process and see when it gets split. You can simply use getPeaklist after each step and check what happened to the ions you are not satisfied with.

194

XCMS / Re: unreliable peak intensity in report table

July 31, 2015, 04:29:11 AM

I don't know why that doesn't work... I tried re-converting your files and I cannot get it to centroid the data either... You'd probably have to ask the proteowizard people. Sorry.

195

XCMS / Re: unreliable peak intensity in report table

July 31, 2015, 03:10:06 AM

OK the problem is that your files are in profile mode. That is the reason it is so slow (it runs through every m/z in each scan). The function is only meant for centroided data. Also xcms is only meant to use centroided data so that is probably the source of your troubles in the first place...