Does anyone know if there is a way to directly access the data in the NIST library? I would like to automate some things in R, but the NIST library seems to be in some binary format. Has anyone done something similar? Any hints?
My thought, too, was that it had to be the different retcor. That is why I asked whether there is the same issue after the first grouping.
I definitely don't see any need for metaXCMS if the samples were analyzed together.
As for stats, I really urge you not to just use the stats in XCMS. You are missing things that you probably need, such as:

* Drift correction
* Correction for multiple testing --> FDR
* A statistical model that takes into consideration all the factors in your study
I don't think there is anything *wrong* with doing C vs A and C vs B, but at the very least you'd need FDR correction on the whole set of p-values. So I'd advise investing some time into doing stats in R using lm or lmer (and/or something multivariate), depending on your study. Rick Dunn talks about some of these things in the last talk of the data processing workshop here (unfortunately the last part was cut): http://metabolomicssociety.org/site-map/articles/88-videos/262-2017-conference-workshop-videos-public
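To illustrate the FDR point, here is a minimal sketch in R of a per-feature test followed by correction of the whole set of p-values. Everything here is simulated and the two-group design is just an assumption; a real analysis would put your actual covariates (batch, run order, etc.) into lm or lmer:

```r
# Minimal sketch: one linear model per feature + BH/FDR correction.
# The data are simulated; replace X with your feature intensity matrix
# (features in rows, samples in columns) and add your real covariates.
set.seed(1)
n_feat <- 200
group  <- factor(rep(c("A", "C"), each = 10))      # hypothetical two-group design
X      <- matrix(rnorm(n_feat * 20), nrow = n_feat)

# raw p-value for the group effect, one lm per feature
pvals <- apply(X, 1, function(y) summary(lm(y ~ group))$coefficients[2, 4])

# correct the WHOLE set of p-values for multiple testing
fdr <- p.adjust(pvals, method = "BH")
sum(fdr < 0.05)   # with pure noise this should be (close to) 0
```

The important part is that p.adjust sees all the p-values from all comparisons at once, not each comparison separately.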
I am interested in a similar issue. For data analysis it would be nice to have blank samples in your dataset so that you can assess noise compared to sample values. But since there are no or few peaks, peak grouping and alignment have failed in my hands.
What would be nice to have is at least being able to add new samples and do a dumb integration with fillPeaks on the new samples. I have managed this by forcing the new files into the xcms object and faking the scan times (keeping the raw/original times and setting the corrected RTs using the mean correction of the original set), but this is very dirty...
@johannes.rainer Any thoughts on whether something clean is feasible here?
If I understand correctly you have 3 groups: A, B, C.

1) If you do A and C you get features found in A+C
2) If you do B and C you get features found in B+C
3) If you do A, B and C you get features found in A+B+C
So why should you not get different peak tables in those 3 cases?
Depends on why it is an outlier. RT shifts, or just very different intensities? If the first, it could make sense. If it is just the intensities, then no, I would say. If there is no shift but there are unique features, pruning those features from the peak table might be fine.
I am not sure it is completely clear what you are comparing. Are you talking about processing with and without dividing the samples into groups? Or two completely separate processing runs for the two groups?
To view raw files? No. To browse converted files you can use MZmine.
To do the centroiding? Yes, msconvert from ProteoWizard can, but from the docs it seems not to do it well. msconvert cannot use Waters' centroiding the way it can for other vendor formats, so it has to use its own, supposedly inferior, implementation.
When you have files that are 1 GB, they are almost certainly in continuum mode. You need to first convert them to centroid mode in MassLynx to be able to use XCMS. Typically centroid-mode files are 50-100 MB. MassLynx --> tools --> accurate mass measure --> Automatic peak detection. Then convert the resulting raw files.
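For the conversion step, a typical msconvert call looks something like this (a sketch only; file names are placeholders and the filter syntax is the one I know from the ProteoWizard docs):

```shell
# Convert a MassLynx-centroided .raw folder to mzML (names are placeholders)
msconvert sample_centroided.raw --mzML -o converted/

# If you have to centroid with msconvert itself instead: since Waters vendor
# centroiding is not available, this falls back to ProteoWizard's own peak picking
msconvert sample.raw --mzML --filter "peakPicking true 1-" -o converted/
```

If you centroided in MassLynx first, the plain conversion (first command) is all you need.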
As for the functions, I am afraid MSe is also listed as TOF MS. So you probably have to ask the people who did the experiment what each one is, unless you can guess from the _extern.inf file. But that probably requires comparison with something you already know, unless you are a really hardcore MS person.
You need to figure out which of the 3 files you need. Mixing different functions will likely mess things up. Since you have Databridge I guess you have MassLynx. So open a chromatogram --> display --> TIC. Here you have the functions listed and you can get the TIC of each function. If that doesn't clear it up, either ask the people who did the experiment or try to decipher the _extern.inf in the raw folders. That contains all the experiment settings and has sections for each function. The format is not very consistent between versions, but I just checked and it seems MSe functions will have things like "Transfer MS Collision Energy Low (eV) 10.0" and "Transfer MS Collision Energy High (eV) 40.0" listed, at least in my files.
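As an illustration of what to look for, this sketch greps a simulated fragment of _extern.inf for those collision-energy lines; in practice you would point grep at the _extern.inf inside your .raw folder, and the exact key names and values vary between MassLynx versions:

```shell
# Simulate a fragment of _extern.inf; in practice use sample.raw/_extern.inf
cat > _extern.inf <<'EOF'
Transfer MS Collision Energy Low (eV)	10.0
Transfer MS Collision Energy High (eV)	40.0
EOF

# Functions with collision-energy settings like these are likely MSe
grep -i "collision energy" _extern.inf
```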
What do you mean by "load the files with XCMS"? Using xcmsSet? How heavy this is depends on the settings. What happens? Do you run out of memory? At which point? How large are the files? Maybe they are not centroided?
You get one file per "function" with Databridge. So you need to know how your experiment was set up to know which one to use. Typically you might have a normal MS1 function and the lockmass function (not sure the latter is written as a file, but it might be). Then you might have added an MSe function. You can check what each one is in MassLynx.
What do you mean by clog up the system? XCMS doesn't load all raw data at the same time.
From the discussion there it seems mzR at least should be able to read something from those files. Maybe MSnbase can even read the files, but I don't know.