Too few features? Synapt G2S UPLC HILIC

Dear all.

We're currently using XCMS to process data obtained from a Waters UPLC-Synapt G2S qTOF system. A short description of the samples might be in order:
Polar cell extracts (MeOH/H2O fraction), injection volume 5 uL, MS in resolution mode, HILIC chromatography, run time about 18 minutes (we collect up to 800 Da).
These are our current settings:

xset<-xcmsSet(method="centWave", ppm=5, peakwidth=c(3,15), snthresh=5)
xset2<-retcor(xset, method="obiwarp", profStep=0.1, plottype="deviation")
xset3<-group(xset2, bw=5, mzwid=0.01)  # have also tried adding minfrac=0.75
xset3<-fillPeaks(xset3)

We're picking up around 50000 ROIs, and peaks average about 5000 per sample. Once we group the samples we end up with a low number of groups (500), and even fewer with minfrac (280).

We have checked our peak widths for selected substances, some are small (2 s) while others are quite large (>15 s).
The mass accuracy of the system should be around 0.02 Da or so; however, we have noticed some issues with mass accuracy in the past. We have checked through some raw data: selected metabolites are of high intensity (e5-e7), so we "think" our runs are good enough.
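
A quick way to relate the 0.02 Da figure to the ppm argument of centWave is to convert between the two. This is a minimal base-R sketch; the helper names and the m/z values are arbitrary illustrations, not taken from the data:

```r
# Convert between absolute (Da) and relative (ppm) m/z tolerances.
# Both helpers are hypothetical names introduced for this illustration.
ppm_to_da <- function(mz, ppm) mz * ppm / 1e6
da_to_ppm <- function(mz, da)  da / mz * 1e6

mz <- c(100, 200, 400, 800)   # illustrative m/z values within the 800 Da range

tol <- data.frame(
  mz           = mz,
  da_at_5ppm   = ppm_to_da(mz, 5),     # window implied by ppm=5
  ppm_at_20mDa = da_to_ppm(mz, 0.02)   # ppm equivalent of a 0.02 Da error
)
print(tol)
```

At m/z 800 a 0.02 Da error corresponds to 25 ppm, whereas ppm=5 allows only 0.004 Da, so the two numbers quoted above are not describing the same tolerance.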

Here is the issue: we're picking up way too few features from a set comprising 105 samples, including several QC injections. I'm wondering the following:
1. Is there anything in particular with our processing which might cause this?
2. Are the settings suitable for a high-end qTOF instrument with UPLC chromatography?
3. What are the prime reasons for not picking up "enough" features (if the sample is considered "good enough")?
4. Is snthresh the way to go, or would the noise parameter be better?

I'm getting absolutely smashed by this  :cry: ; our instrumentation should perform way better than this. Perhaps we are missing something vital?

Re: Too few features? Synapt G2S UPLC HILIC

Reply #1
Hi MikaelE,

your parameters don't look too far off.

1) How much RT deviation does your retcor() graphic show? Is it around +/- 5 secs?

2) How many sample classes do you have? All 105 in a single sample class?
    I think you should at least separate samples and QC.

3) On Waters one has to be careful with the expected mass accuracy.
    AFAIK, the Waters DLL that is available to mzML converters like proteowizard
    does not export the recalibrated data. IIRC netCDF data from DataBridge
    does have calibrated data, but lacks the MS^2 spectra if you have any.

4) Recent xcms versions have "plotQC(xs)" to show some diagnostic plots,
    due to 3) above, you could especially check m/z deviations.

5) You might want to check out http://www.biomedcentral.com/1471-2105/16/118 and
https://github.com/glibiseller/IPO for automatic parameter optimisation.
Caveat: don't use them blindly, use them wisely and as food for thought.
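
To illustrate why point 2) matters for the grouping step: roughly speaking, density grouping keeps a feature group when at least one sample class has the peak in at least minfrac of its samples (and in at least minsamp samples). The following is a minimal base-R sketch of that rule, not the xcms code itself; keep_group is a hypothetical helper, and the class sizes (5 classes of 21 samples) and peak counts are made-up assumptions:

```r
# Hypothetical helper mimicking the minfrac/minsamp filter of density grouping.
# n_with_peak: number of samples per class that contain the peak
# class_size:  total number of samples per class
keep_group <- function(n_with_peak, class_size, minfrac = 0.5, minsamp = 1) {
  any(n_with_peak >= class_size * minfrac & n_with_peak >= minsamp)
}

# A feature found in 20 of the 21 samples of one class and nowhere else.
single_class <- keep_group(20, 105, minfrac = 0.75)                        # all 105 in one class
five_classes <- keep_group(c(20, 0, 0, 0, 0), rep(21, 5), minfrac = 0.75)  # classes separated

print(c(single_class = single_class, five_classes = five_classes))
```

With everything in one class the feature falls below 75 % of 105 samples and is dropped; with separate classes it passes because it is present in nearly all samples of one class.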

Yours,
Steffen
--
IPB Halle                          Mass spectrometry & Bioinformatics
Dr. Steffen Neumann         http://www.IPB-Halle.DE
Weinberg 3 06120 Halle     Tel. +49 (0) 345 5582 - 1470
sneumann(at)IPB-Halle.DE

Re: Too few features? Synapt G2S UPLC HILIC

Reply #2
Thank you for the kind reply.
Currently processing, so I can't access the retcor plot yet :-( I will update once I have it.
The set comprises 4 groups and 1 QC group.
We use Waters DataBridge for the netCDF conversion, so that should be fine then, I guess?
I'll check m/z deviations with plotQC and return with info once I have it.

I do have one very specific question related to "profparam", which we have not touched (I read somewhere that the default is 0.1): is it relevant for centWave detection on high-resolution instruments or not?
If so, will lowering this value provide any benefits? This is about the only question mark I have right now... which is sad; I wish there was something more that I could do.

Re: Too few features? Synapt G2S UPLC HILIC

Reply #3
Hi,

centWave does not use the profParam for the feature detection step.
It is used in e.g. the plotRaw() which displays the raw data.
So there should be no need to modify it.
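
For intuition on what the profile step size means: profile data is built by binning m/z values on a fixed grid. The sketch below uses plain floor() binning, which is a simplification and not the xcms profile code; mz_bin is a hypothetical helper and both m/z values are made up. It shows that two ions 0.044 Da apart share a bin at step 0.1 but not at step 0.01:

```r
# Simplified fixed-width m/z binning (assumption: xcms' real profile
# generation differs in detail; this only illustrates the step size).
mz_bin <- function(mz, step) floor(mz / step)

mz <- c(200.034, 200.078)   # two hypothetical ions, 0.044 Da apart

shared_coarse <- length(unique(mz_bin(mz, 0.1)))  == 1   # step = 0.1
shared_fine   <- length(unique(mz_bin(mz, 0.01))) == 1   # step = 0.01

print(c(step_0.1 = shared_coarse, step_0.01 = shared_fine))
```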

Yours,
Steffen

Re: Too few features? Synapt G2S UPLC HILIC

Reply #5
Ok, thank you I think I understand better now.
Our main concern now is that when we process our data with MarkerLynx (Waters) we get a PCA plot which describes very nicely what we expected to see. However, when we process with XCMS everything is a mess. Right now we're just trying to get our XCMS processing to generate a PCA similar to the one from MarkerLynx.

The MarkerLynx method uses a signal threshold of 500 counts, and the features are allowed an RT drift of 6 seconds. This produces an incredibly huge model with 45K features (I know this is not optimal :-) ); however, there is a clear difference (and similarities) between the groups, according to how it "should look".

So the same data went into XCMS. We altered profparam to 0.01 (might not have been a good idea; however, this should only affect fillPeaks as I understand it).
We read in the data with centWave (ppm=8, peakwidth=c(3,25), snthresh=5).
This is followed by retcor (method="obiwarp", profStep=0.1, plottype="deviation"); the plot indicates drifts of around 5-10 seconds.
xset3<-group(xset2, bw=5, mzwid=0.01)
xset3<-fillPeaks(xset3)

This dataset looks horrible:
1. The data (along with multiple QC injections) are divided into two groups; the loading plot identifies many contributors as m/z values around 700 or so.
2. No grouping as seen in MarkerLynx.
3. Very few features (in the order of 2000 or so).
So, given that MarkerLynx somehow handles this data, which might have issues primarily with retention time drifts (m/z accuracy seems fine), what settings in XCMS could help us generate data more similar to the MarkerLynx data?
A few thoughts:
1. The centWave section seems fine; however, we are currently re-processing with ppm=10 and snthresh=3 instead.
2. Could it be the group section that is messing with the data? How much could the bw and mzwid settings affect the outcome? <- please suggest logical alterations here
3. Might fillPeaks be the issue (remember that we used profparam=0.01)? <- the new processing uses the default now (0.1); could that change the data?

Thanks a ton guys, I really do wanna use XCMS and not Markerlynx...

 

Re: Too few features? Synapt G2S UPLC HILIC

Reply #6
Some thoughts:
  • ppm at 8-10 might be too low. This accuracy is needed for the whole mass peak, not just the apex. ~30 ppm might be a better starting point.
  • I have had data where centWave didn't do well. I think it generally needs pretty good data to work well (many scans per peak, clear peaks). You can try matchedFilter, which is better at picking up peaks, but you will get more noise. It should be more similar to MarkerLynx, though.
  • obiwarp is slow and never worked well for me personally. Perhaps try the loess method.
  • It sounds like your main problem is the grouping step since you get few peaks after that.
  • The bw parameter seems low. Try setting it higher, at least to see if it improves grouping.
  • The bw setting is usually the most important, while a sensible mzwid should not affect the grouping that much.
  • You have not set the minfrac and minsamp parameters in group. Sensible settings depend on whether you defined the groups (by putting them in different folders) and how homogeneous you think each group should be.
  • profparam might be causing some bad data, but first try to understand if your grouping works; this is much more critical. I don't think 0.1 makes much sense for a QTOF instrument. Whether it does anything bad depends on whether you have compounds with similar mass eluting closely. Again, this should be relatively rare, so it should not completely scramble your data.
  • My approach would be this: find a peak/fragment that should be in all samples but is lost after grouping. I would then use a function I wrote, analyze.xcms.group (viewtopic.php?f=8&t=577&p=1789&hilit=analyze.xcms.group#p1789), to visualize what happened during grouping. Too few dots --> peak picking issue. Otherwise: did it not group them? Did it group them wrong? Can you understand why when you compare the plot to your grouping parameters (is the m/z or the RT dimension the problem)?
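
The effect of bw can be illustrated without xcms. As far as I know, density grouping smooths the peak retention times within an m/z slice with a Gaussian kernel of standard deviation bw and forms groups around the density maxima. The sketch below is a base-R illustration under that assumption, not the xcms implementation; count_rt_modes is a hypothetical helper and the retention times are made up (one compound in six samples, with roughly 12 s of residual shift between two batches):

```r
# Count local maxima of a Gaussian kernel density estimate of retention times.
# Hypothetical helper for illustration; xcms does more than this when grouping.
count_rt_modes <- function(rt, bw) {
  den <- density(rt, bw = bw, from = min(rt) - 3 * bw, to = max(rt) + 3 * bw, n = 512)
  d <- diff(den$y)
  d[abs(d) < 1e-12 * max(den$y)] <- 0   # ignore floating-point noise on flat tops
  s <- sign(d)
  s <- s[s != 0]
  sum(diff(s) == -2)                    # a rise followed by a fall = one maximum
}

# Made-up apex retention times (seconds) of one compound in six samples
rt <- c(100.2, 101.0, 102.1, 112.4, 113.0, 114.3)

modes <- sapply(c(bw5 = 5, bw15 = 15), function(bw) count_rt_modes(rt, bw))
print(modes)
```

With these made-up values, bw=5 leaves two separate density maxima, so the compound would be split into two groups that each cover only half of the samples, while bw=15 merges them into one. Too large a bw will of course start merging genuinely different peaks.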
Blog: stanstrup.github.io