Metabolomics Society Forum

Software => R => XCMS => Topic started by: MikaelE on October 08, 2015, 07:34:26 AM

Title: Too few features? Synapt G2S UPLC HILIC
Post by: MikaelE on October 08, 2015, 07:34:26 AM
Dear all.

We're currently using XCMS to process data obtained from a UPLC-Synapt G2S qTOF system from Waters. A short description of the samples might be in order:
Polar cell extracts (MeOH/H2O fraction), injection volume 5 uL, MS in resolution mode, HILIC chromatography, run time about 18 minutes (we collect up to 800 Da).
These are our current settings:

xset<-xcmsSet(method="centWave", ppm=5, peakwidth=c(3,15), snthresh=5)
xset2<-retcor(xset, method="obiwarp", profStep=0.1, plottype="deviation")
xset3<-group(xset2, bw=5, mzwid=0.01)  # have also tried adding minfrac=0.75
xset3<-fillPeaks(xset3)

We're picking up around 50000 ROIs, and peaks average 5000 per sample. Once we group the samples we end up with a low number of groups (500), and even fewer with minfrac (280).

We have checked our peak widths for selected substances, some are small (2 s) while others are quite large (>15 s).
The mass accuracy of the system should be around 0.02 Da or so, however we have noticed some issues with mass accuracy in the past. We have checked through some raw data, selected metabolites are of high intensity (e5-e7), as such we "think" our runs are good enough.

Here is the issue: we're picking up way too few features from a set comprising 105 samples, including several QC injections. I'm wondering the following:
1. Is there anything in particular with our processing which might cause this?
2. Are the settings suitable for a high-end qTOF instrument with UPLC chromatography?
3. What are the prime reasons for not picking up "enough" features (if the sample is considered "good enough")?
4. Is snthresh the way to go, or would the noise parameter be better?

I'm getting absolutely smashed by this  :cry: , our instrumentation should perform way better than this, perhaps we are missing something vital?
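
For comparing the quoted mass accuracy with the centWave ppm setting, the 0.02 Da figure can be converted into ppm at a few m/z values. This is only a minimal base-R sketch: da_to_ppm is a hypothetical helper (not an xcms function) and the m/z values are illustrative picks within the acquired range.

```r
# Hypothetical helper (not part of xcms): convert an absolute mass
# error in Da into a relative error in ppm at a given m/z.
da_to_ppm <- function(delta_da, mz) {
  delta_da / mz * 1e6
}

# Illustrative m/z values within the acquired range (up to 800 Da).
mz <- c(100, 400, 800)

# 0.02 Da is the mass accuracy quoted in the post above.
ppm_equiv <- da_to_ppm(0.02, mz)

print(data.frame(mz = mz, ppm = ppm_equiv))
```

If the 0.02 Da figure really applies, it corresponds to 25 ppm at m/z 800 and more at lower masses, so it may be worth comparing against the ppm=5 passed to centWave.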
Title: Re: Too few features? Synapt G2S UPLC HILIC
Post by: sneumann on October 08, 2015, 07:48:12 AM
Hi MikaelE,

your parameters don't look too far off.

1) How much RT deviation does your retcor() plot show? Is it around +/- 5 secs?

2) How many sample classes do you have? Are all 105 in a single sample class?
    I think you should at least separate samples and QC.

3) On Waters one has to be careful with the expected mass accuracy.
    AFAIK, the Waters DLL that is available to mzML converters like ProteoWizard
    does not export the recalibrated data. IIRC netCDF data from DataBridge
    does have calibrated data, but lacks the MS^2 spectra if you have any.

4) Recent xcms versions have "plotQC(xs)" to show some diagnostic plots,
    due to 3) above, you could especially check m/z deviations.

5) You might want to check out http://www.biomedcentral.com/1471-2105/16/118 and
https://github.com/glibiseller/IPO for automatic parameter optimisation.
Caveat: don't use them blindly, use them wisely and as food for thought.
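
A note on point 2: in group(), minfrac is evaluated per sample class, and a feature group is kept if it reaches that fraction in at least one class. The base-R sketch below only illustrates what this means in sample counts. It assumes, purely for illustration, five equally sized classes of 21 samples (the real class sizes are not given in this thread), and min_samples is a hypothetical helper, not an xcms function.

```r
# Hypothetical helper (not part of xcms): smallest number of samples
# in a class that must contain a peak to satisfy a given minfrac.
min_samples <- function(class_sizes, minfrac) {
  ceiling(class_sizes * minfrac)
}

# All 105 samples treated as a single class, minfrac = 0.75.
one_class <- min_samples(105, 0.75)

# Assumed split into 5 classes of 21 samples each (illustrative only).
five_classes <- min_samples(rep(21, 5), 0.75)

print(one_class)
print(five_classes)
```

With a single class a feature has to appear in 79 of the 105 samples, whereas with the assumed split 16 samples within any one class would be enough, which is one reason class assignment can change the number of groups that survive minfrac.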

Yours,
Steffen
Title: Re: Too few features? Synapt G2S UPLC HILIC
Post by: MikaelE on October 08, 2015, 08:03:33 AM
Thank you for the kind reply.
Currently processing, so I can't access the retcor plot yet :-( I will update once I have it.
The set comprises 4 groups and 1 QC group.
We use the Waters DataBridge for netCDF conversion, that should be fine then I guess?
I'll check m/z deviations with plotQC and return with info once I have it.

I do have one very specific question related to the "profparam", which we have not done anything about (I read somewhere that the default is 0.1): is it relevant for centWave detection, or not for high-resolution instruments?
If so, will lowering this value provide any benefits? This is about the only question mark I have right now...which is sad, wish there was something more that I could do.
Title: Re: Too few features? Synapt G2S UPLC HILIC
Post by: sneumann on October 09, 2015, 05:40:37 AM
Hi,

centWave does not use the profParam for the feature detection step.
It is used in e.g. the plotRaw() which displays the raw data.
So there should be no need to modify it.

Yours,
Steffen
Title: Re: Too few features? Synapt G2S UPLC HILIC
Post by: Jan Stanstrup on October 09, 2015, 11:22:08 AM
profParam is used during fillPeaks though, so it can influence your intensities. More details here: http://www.metabolomics-forum.com/viewtopic.php?f=8&t=598&p=1853&hilit=profParam#p1853
Title: Re: Too few features? Synapt G2S UPLC HILIC
Post by: MikaelE on October 12, 2015, 04:41:33 AM
Ok, thank you I think I understand better now.
Our main concern now is that when we process our data with MarkerLynx (Waters) we get a PCA plot which describes very nicely what we expected to see. However, when we process with XCMS everything is a mess. Right now we're just trying to get our XCMS processing to generate a PCA similar to the one from MarkerLynx.

The MarkerLynx method uses a signal threshold of 500 counts, and the features are allowed an RT drift of 6 seconds. This produces an incredibly huge model with 45K features (I know this is not optimal :-) ), however there is a clear difference (and similarities) between the groups, according to how it "should look".

So the same data went into XCMS. We altered the profparam to 0.01 (might not have been a good idea, however this should just affect fillPeaks as I understand it).
We read in the data with centWave (ppm=8, peakwidth=c(3,25), snthresh=5).
This is followed by retcor (obiwarp, profStep=0.1, plottype="deviation") - the plot indicates drifts of around 5-10 seconds
xset3<-group(xset2, bw=5, mzwid=0.01)
xset3<-fillPeaks(xset3)

This dataset looks horrible:
    Issue 1: the data (along with multiple QC injections) are divided into two groups; the loading plot identifies the main contributors as m/z values around 700 or so.
    No grouping as seen in MarkerLynx.
    Very few features (on the order of 2000 or so).
So given the fact that MarkerLynx somehow handles this data, which might have issues primarily with retention time drifts (m/z accuracy seems fine), what settings in XCMS could help us generate data with more similarity to the MarkerLynx data?
A few thoughts:
    The centWave section seems fine, however we are currently re-processing with ppm=10 and snthresh=3 instead.
    Could it be the group section that is messing with the data? How much could the bw and mzwid settings affect the outcome? <- please suggest logical alterations here
    Might fillPeaks be the issue (remember that we used profparam=0.01)? <- the new processing uses the default (0.1) now, could that change the data?

Thanks a ton guys, I really do wanna use XCMS and not Markerlynx...
Title: Re: Too few features? Synapt G2S UPLC HILIC
Post by: Jan Stanstrup on October 13, 2015, 08:27:38 AM
Some thoughts: