Guesses as to what will happen:
* She uses a wide m/z window in peak picking to get the whole peak inside, in effect treating the data as if it had much lower resolution.
* She uses a normal m/z window (ppm) and each peak is split into many pieces. Each real peak is then represented by many features in her feature table, so there are probably many more features than you would expect.
It is hard to imagine you could do this without noticing. Perhaps they put the raw data up but forgot to say that they centroided?
For the Orbitrap-like shoulder peaks I wrote a filter, xcmsRaw.orbifilter, which you can find here: https://gitlab.com/R_packages/chemhelper. It runs through each scan in a file, starts with the largest mass, and eliminates everything within a set m/z range around that peak that is smaller than some fraction of the main peak. It continues through the scan until all peaks have been "evaluated".
Currently you'd need to do this on all the raw files and write them out to a new set of raw files.
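To make the idea concrete, here is a minimal Python sketch of that kind of shoulder-peak filter for a single scan. It is not the code of xcmsRaw.orbifilter (which is R); the function name, the `mz_window` and `min_fraction` parameters, and the example scan are all made up for illustration, and I am assuming "largest" means most intense.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def filter_shoulder_peaks(scan: List[Peak], mz_window: float,
                          min_fraction: float) -> List[Peak]:
    """Drop small peaks that sit close to a much larger peak in one scan."""
    # Visit peaks from most to least intense (assumed meaning of "largest").
    order = sorted(range(len(scan)), key=lambda i: scan[i][1], reverse=True)
    removed = set()
    for i in order:
        if i in removed:
            continue  # already eliminated by a larger neighbour
        main_mz, main_int = scan[i]
        for j, (mz, inten) in enumerate(scan):
            if j == i or j in removed:
                continue
            # Remove neighbours inside the window that are below the
            # chosen fraction of the main peak's intensity.
            if abs(mz - main_mz) <= mz_window and inten < min_fraction * main_int:
                removed.add(j)
    # Keep the survivors in their original (m/z) order.
    return [p for k, p in enumerate(scan) if k not in removed]


# Hypothetical scan: one big peak with two shoulders, plus unrelated peaks.
scan = [(199.992, 1.5e4), (200.000, 1e6), (200.010, 2e4),
        (200.500, 3e4), (350.100, 5e5)]
cleaned = filter_shoulder_peaks(scan, mz_window=0.05, min_fraction=0.05)
print(cleaned)
```

The two shoulders next to m/z 200.000 are dropped, while the small peak at 200.500 survives because it lies outside the window. In practice both the window and the fraction would have to be tuned on your own data.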
As for the chromatographic side peaks, I am not sure you can fix that with parameters (apart from the grouping you already mentioned). Also, in your plots they look to me like legitimate additional small peaks, so I don't see what the peak picker should be doing differently. You have fronting and tailing peaks and I guess that will always be problematic. Your ~447 peak is very noisy. If your data in general looks like that, the matchedFilter algorithm could give you better results. It is generally more robust to noisy data.
Content

The course will provide an overview of LC-MS based untargeted metabolomics and its application in nutrition. It will be delivered using a mixture of lectures, hands-on data preparation and analysis, computer-based practical sessions, and discussions. Visits to wet labs and instructions on human sample preparation procedures are included, but with minimal hands-on work.
The students will go through common steps in a typical metabolomics study using a real-life case. This case study includes collected plasma (or urine) samples from a nutritional intervention. The sample preparation and analysis on UPLC-QTOF have been conducted, and the students will further process and analyse the acquired data with various freeware tools (MZmine, Workflow4Metabolomics and MetaboAnalyst). They will finally work on identification of relevant metabolites using several web-based structure elucidation tools. The course will conclude with presentations of reports generated by the students based on the case study.
The course will be structured as initial short lectures on theory followed by hands-on exercises which will teach the students to transfer the theoretical information to practice.
Yes, the loadings do suggest that. But since you don't see it in your boxplots, I think the pattern is too complex to appreciate there, so you need to look at some individual compounds. The correction methods always work on each feature individually for the same reason.
I did not know you had done Obiwarp. If it works well then it could be fine.
Looking at intensity distributions IMO won't tell you enough. As illustrated in the Brunius paper, features behave differently and some remain constant. I would look at a few compounds I know and see how they behave across your batches.
Looking at the loadings of your PCA might also give you a clue. You should be able to see whether your batch difference is dominated by a few features or whether all features are moving in one direction.
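As a rough illustration of how you could put numbers on that, the sketch below takes the loadings of the component that separates your batches and computes two simple summaries. The function names and both example loading vectors are invented for illustration; you would export the real loadings from whatever PCA tool you use.

```python
from typing import Sequence


def loading_concentration(loadings: Sequence[float], top_k: int) -> float:
    """Share of the total squared loading carried by the top_k features."""
    squared = sorted((x * x for x in loadings), reverse=True)
    return sum(squared[:top_k]) / sum(squared)


def same_sign_fraction(loadings: Sequence[float]) -> float:
    """Fraction of features whose loading has the majority sign."""
    positive = sum(1 for x in loadings if x > 0)
    negative = sum(1 for x in loadings if x < 0)
    return max(positive, negative) / len(loadings)


# Hypothetical loadings on the batch-separating component.
few_features = [0.90, 0.40, 0.05, -0.03, 0.02, -0.01, 0.04, -0.02]
global_shift = [0.36, 0.35, 0.34, 0.37, 0.35, 0.33, 0.36, 0.36]

for name, vec in (("few features", few_features), ("global shift", global_shift)):
    print(name,
          round(loading_concentration(vec, top_k=2), 3),
          round(same_sign_fraction(vec), 3))
```

A concentration close to 1 for a small `top_k` points to a handful of features driving the batch difference. Evenly spread loadings that nearly all share one sign point to everything moving in the same direction.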
You can set `binSize` in `PeakDensityParam`. It is set in Dalton and you cannot set it in ppm. The default is 0.25 Da, so the default is unlikely to help you; you should probably set it much lower. A dataset this large is also very likely to suffer from big intensity drift issues that you would need to fix. XCMS does not provide tools for that, but the papers you list have some tools.
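To see why a fixed 0.25 Da is so wide, it helps to convert between Da and ppm at a few masses. This is plain arithmetic, not an XCMS call; the 10 ppm tolerance and the three m/z values are just assumed example numbers.

```python
def da_to_ppm(width_da: float, mz: float) -> float:
    """Express an absolute m/z width (Da) as ppm at a given m/z."""
    return width_da / mz * 1e6


def ppm_to_da(width_ppm: float, mz: float) -> float:
    """Express a relative m/z width (ppm) as Da at a given m/z."""
    return width_ppm * mz / 1e6


# Compare the 0.25 Da default with an assumed 10 ppm tolerance.
for mz in (100.0, 500.0, 1000.0):
    print(f"m/z {mz:7.1f}: 0.25 Da = {da_to_ppm(0.25, mz):6.0f} ppm, "
          f"10 ppm = {ppm_to_da(10, mz):.4f} Da")
```

At m/z 500 the default corresponds to about 500 ppm, whereas 10 ppm is only 0.005 Da. Since the bin size cannot scale with mass, a value chosen for the top of your mass range is a reasonable starting point.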
A few other observations:
* Setting `minFraction` to 1/n is extremely aggressive. I assume you get a lot of features. `minSamples`/`minFraction` are critical for getting rid of false positives in the peak picking.
* Are you sure `bw` is right? That means you don't expect more than ~1/60 = 0.017 min retention time difference between all your samples. Setting it that low might also give you batch effects if you have retention time differences between batches.
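To show what a `minFraction` of 1/n means in practice, here is a small sketch of the fraction check on its own. It is a simplification, not the XCMS code: as far as I know XCMS evaluates the fraction per sample group and also applies `minSamples`, both of which are ignored here, and the 1,000 samples are an assumed number.

```python
def passes_min_fraction(n_present: int, n_samples: int,
                        min_fraction: float) -> bool:
    """True if a feature seen in n_present of n_samples meets the fraction."""
    return n_present / n_samples >= min_fraction


n_samples = 1_000  # hypothetical study size

# With minFraction = 1/n, a peak found in one single sample becomes a feature.
print(passes_min_fraction(1, n_samples, 1 / n_samples))

# With minFraction = 0.5, it has to be found in at least half of the samples.
print(passes_min_fraction(1, n_samples, 0.5))
print(passes_min_fraction(500, n_samples, 0.5))
```

So 1/n effectively switches the filter off: every peak that was picked anywhere, including noise picked only once, ends up in the feature table.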
MZmine is known to use a lot of memory. I imagine that is where your bottleneck is, but you should check that.
XCMS is much more memory efficient. Be aware that each core will use a certain amount of memory. So on a system like yours not using all cores will use less memory and might save you if memory is your bottleneck. Also don't use 80 cores on processes that are bottlenecked by HDD reads (like reading the raw data).
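A back-of-the-envelope way to pick the number of workers is to let memory, not core count, set the limit. The sketch below is only that arithmetic; the memory per worker, the total RAM and the reserve are hypothetical numbers that you would have to measure on your own system.

```python
def max_workers(n_cores: int, total_mem_gb: float,
                mem_per_worker_gb: float, reserve_gb: float = 8.0) -> int:
    """Largest worker count that fits in memory, capped at the core count."""
    usable = total_mem_gb - reserve_gb  # leave room for the OS and main process
    by_memory = int(usable // mem_per_worker_gb)
    return max(1, min(n_cores, by_memory))


# Hypothetical 80-core machine with 256 GB RAM and ~6 GB needed per worker.
print(max_workers(n_cores=80, total_mem_gb=256, mem_per_worker_gb=6))
```

With these assumed numbers memory allows far fewer than 80 workers. For steps limited by disk reads the sensible number is probably lower still.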
That said, with 10,000 samples you really need to be careful about how greedy you want to be in terms of how low in intensity you pick.
What values are you comparing? How do you get a single m/z value from the profile mode data to compare to? There is the profile mode data, the Waters centroided m/z and the msconvert centroided m/z. The last two will be different due to different centroiding algorithms. The documentation says the CWT method is not very good. You could use Waters centroiding (if that is the one that is good?) by centroiding in MassLynx first (to a new raw file) and then converting without any additional processing.
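To illustrate why two centroiding algorithms can disagree on the same profile peak, the sketch below compares two very simple choices: the m/z of the highest point and the intensity-weighted mean. Neither is the Waters or the msconvert algorithm, and the slightly asymmetric profile peak is made up.

```python
from typing import List, Tuple

Profile = List[Tuple[float, float]]  # (m/z, intensity) points of one peak


def apex_centroid(profile: Profile) -> float:
    """m/z of the most intense profile point."""
    return max(profile, key=lambda p: p[1])[0]


def weighted_centroid(profile: Profile) -> float:
    """Intensity-weighted mean m/z over all profile points."""
    total = sum(inten for _, inten in profile)
    return sum(mz * inten for mz, inten in profile) / total


# Hypothetical profile peak with a small tail towards higher m/z.
profile = [(500.000, 10.0), (500.002, 60.0), (500.004, 100.0),
           (500.006, 80.0), (500.008, 40.0), (500.010, 10.0)]

apex = apex_centroid(profile)
weighted = weighted_centroid(profile)
diff_ppm = (weighted - apex) / apex * 1e6
print(f"apex {apex:.4f}, weighted {weighted:.4f}, difference {diff_ppm:.2f} ppm")
```

Even on this clean toy peak the two definitions differ by more than 1 ppm, so when comparing values it matters exactly how each number was derived.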