I've noticed something odd with post-identification. What I want is the ability to identify all possible adducts for my metabolite library. So the postIdent .txt file has a separate entry for each adduct of each metabolite. Like this:
I have almost managed to push a set of 385 Q-ToF measurements (around 200 Gb of centroid data :-) through MS-DIAL v. 4.24. It has now finished gap filling (which took ca. 90 h) and has been doing something in limbo for another 24 h. The app is responding, something is written to the HDD from time to time, and there is some activity in RAM, though quite little. Only one or two of the 16 cores hit 100% load from time to time. I don't know when it'll be done, but I'm already excited to see the results.
In this run, one of our in-house databases was used for metabolite identification. However, we've had another idea and would like to check against another database. Is it possible to do another post-identification run on the existing ion table without going through the whole sequence of peak picking / alignment / gap filling / finalization again?
IIRC MS-DIAL always does this when a new alignment instance is created. And it would take another week, I guess, if there is no way around it. Another important question also arises: let's say I suddenly get a power outage during the gap-filling stage. Is it possible to resume from the point where everything stopped?
While working on annotation methodology, I've stumbled upon an issue with data handling. For instance, we have some big guys doing some good metabolomics (10.1021/acs.analchem.8b04698). The data in that work were acquired on a good instrument with good resolution and mass accuracy (I slightly doubt it was actually a stable 0.1 mDa, that's quite optimistic :-), in centroid mode, without any justification. Or I was unable to trace the explanation back to the group's previous works, which could also be the issue.
Centroid data are supposed to be equal in quality to data collected in profile mode. The only reason to use centroids should be data volume reduction. However, when processing the data we can see and perfectly replicate the following: profile data produce fewer features than centroided data, whether the software employed is MS-DIAL or ProgenesisQI.
Workflow is standard (for MS-DIAL):
Acquisition -> MSConvert if prof. -> ABF converter -> MS-DIAL (full scan tolerance 0.3mDa) -> ... -> Data matrix for statistics prof. / cent. prof.->cent.
(Sample: human plasma PP, identification/annotation via in-house DB)
There is a small difference depending on whether we produce centroids on the fly with the instrument or with the Thermo MSFileReader library. They adjust the algorithms slightly with each update, and the new RawFileReader API was introduced recently. But it's negligible.
The big issue is that we get 280 features in profile mode against 350 using centroids. Manual curation reduces the numbers to 220/250. The difference is still >10%. So the devil should be somewhere in the details. I assume the ABF converter simply extracts the MS data array from .RAW files using the Thermo API. If we are in profile mode, it should simply be data points against scans. No problem here. Then MS-DIAL performs centroiding on its own.
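For reference, the relative differences behind those feature counts can be worked out quickly. This is just the arithmetic on the numbers quoted above; the helper name `relative_excess` is made up for illustration.

```python
def relative_excess(profile: int, centroid: int) -> float:
    """Extra centroid-mode features as a fraction of the profile-mode count."""
    return (centroid - profile) / profile


raw = relative_excess(280, 350)      # before manual curation
curated = relative_excess(220, 250)  # after manual curation

print(f"raw: {raw:.1%}, curated: {curated:.1%}")  # raw: 25.0%, curated: 13.6%
```

So curation roughly halves the gap, but it stays above 10% either way.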
The big question is how MS-DIAL does the job: is it the same noise estimation + slicing algorithm used in chromatogram (EIC) extraction, or something else?
Otherwise, centroiding picks up all the shoulders and distorted peaks, creating too much garbage, which reduces the S/N ratio, as is stated for Progenesis (with a recommendation to use profile data). I cannot be sure what happens in Progenesis, as it is a proprietary black box. But more or less the same thing is observed there as well.
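To illustrate the shoulder problem, here is a minimal sketch of a deliberately naive centroider. It is not the algorithm used by MS-DIAL, Progenesis, or the Thermo libraries (I don't know what those do internally). The function name, the threshold, and the data points are all made up. It only shows that a "every local maximum becomes a centroid" rule turns one peak with a shoulder into two centroids.

```python
def local_maxima_centroids(mz, intensity, min_intensity=5.0):
    """Return (centroid m/z, apex intensity) for every local maximum.

    Naive on purpose: each local maximum at or above min_intensity becomes
    a centroid, so a shoulder on a larger peak yields its own centroid.
    """
    centroids = []
    for i in range(1, len(intensity) - 1):
        is_peak = intensity[i - 1] < intensity[i] >= intensity[i + 1]
        if is_peak and intensity[i] >= min_intensity:
            # Intensity-weighted mean m/z over the apex and its two neighbours.
            window = slice(i - 1, i + 2)
            total = sum(intensity[window])
            center = sum(m * a for m, a in zip(mz[window], intensity[window])) / total
            centroids.append((center, intensity[i]))
    return centroids


# Synthetic profile points: a main peak with a shoulder (made-up numbers).
mz = [200.000, 200.001, 200.002, 200.003, 200.004, 200.005, 200.006, 200.007]
intensity = [0.0, 10.0, 50.0, 100.0, 60.0, 70.0, 20.0, 0.0]

peaks = local_maxima_centroids(mz, intensity)
print(len(peaks))  # 2: the main peak and the shoulder
```

A centroider that merges maxima separated by a shallow valley would report a single peak here, which is one way different centroiding implementations could end up with different feature counts.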
P.S. I prefer profile data for different, but not completely unrelated, reasons (FTMS research).
Hi, Hiroshi! Great thx for MS-DIAL and all the work you guys do on it!
I have a couple of questions regarding post-identification with a tab-separated file:
1) Does the current version of MS-DIAL limit the size of the post-identification tab-separated text file? We are doing metabolomics on endogenous metabolites. At some point we use an in-house database containing around 800 entries for a tailored identification process. When I try to load the complete list, I get a "Loading libraries..." window after pressing the "Finish" button, which disappears immediately, and nothing happens. When I reduced the list to around 300 entries, everything went well.
2) For the post-identification library, can Formula/InChI be included so that they are displayed in the Basic Peak Property/Compound detail tabs?
3) As far as I understand, for post-identification you need to enter the masses of the ionized species, right? So post-identification is only possible for one peak with the selected adduct. This can be overcome by having 5-6 rows with different probable adducts for a single compound, but that brings back the size limit problem.
Will it be possible in the future to use neutral masses in post-identification, so that the program searches through all combinations based on the selected adducts?
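In the meantime, the multi-row workaround from point 3 can be scripted. Below is a rough sketch that expands neutral monoisotopic masses into one row per adduct and writes them tab-separated. The adduct selection, the two example compounds with their RTs, and especially the column order and header are assumptions for illustration only; the layout MS-DIAL actually expects for the post-identification file should be checked against its documentation.

```python
import csv
import sys

# Monoisotopic mass shifts (Da) for common singly charged positive adducts.
# Which adducts to include is an assumption; adjust to your own method.
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
}


def expand_adducts(compounds):
    """Yield (name, m/z, RT, adduct) for every compound/adduct pair.

    compounds: iterable of (name, neutral_monoisotopic_mass, rt_min).
    """
    for name, neutral_mass, rt in compounds:
        for adduct, shift in ADDUCT_SHIFTS.items():
            yield (f"{name} {adduct}", round(neutral_mass + shift, 5), rt, adduct)


# Hypothetical two-entry library; the RT values are placeholders.
library = [("Glucose", 180.06339, 1.20), ("Caffeine", 194.08038, 4.50)]
rows = list(expand_adducts(library))

writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
writer.writerow(["Name", "MZ", "RT", "Adduct"])  # hypothetical header and column order
writer.writerows(rows)
```

With four adducts an 800-entry library becomes 3200 rows, so this does not get around the size limit problem; it only removes the manual work.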