Yeah, I don't like this. Instead, I've worked out that converting your data to mzML means you can just add an RT shift to the time value of each scan. I've only applied a fixed shift, but if I spent a couple more hours sampling the shifts between my batches, I could use a function to correct it instead.
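Here is a minimal sketch of the fixed-shift idea in Python, using only the standard library. It assumes the scan times are stored in the PSI-MS "scan start time" cvParam (accession MS:1000016), with units of minutes (UO:0000031) or seconds (UO:0000010). The function name shift_scan_times is hypothetical. Because it takes a correction function rather than a number, the same code covers a fixed shift or a fitted drift correction.

```python
import xml.etree.ElementTree as ET
from typing import Callable

MZML_NS = "http://psi.hupo.org/ms/mzml"
SCAN_START_TIME = "MS:1000016"  # PSI-MS CV term "scan start time"
UO_SECOND = "UO:0000010"        # unit: second (anything else is treated as minutes)


def shift_scan_times(mzml_text: str, correct: Callable[[float], float]) -> str:
    """Apply correct(rt_minutes) -> new_rt_minutes to every scan start time.

    A fixed shift is correct=lambda rt: rt + 0.5; a drift correction can be
    any function of the original RT (hypothetical helper, not a ProteoWizard API).
    """
    ET.register_namespace("", MZML_NS)  # keep the default mzML namespace on output
    root = ET.fromstring(mzml_text)
    for param in root.iter(f"{{{MZML_NS}}}cvParam"):
        raw = param.get("value")
        if param.get("accession") != SCAN_START_TIME or raw is None:
            continue
        in_seconds = param.get("unitAccession") == UO_SECOND
        rt_min = float(raw) / 60.0 if in_seconds else float(raw)
        new_min = correct(rt_min)
        # Write the value back in the file's original unit
        param.set("value", repr(new_min * 60.0 if in_seconds else new_min))
    return ET.tostring(root, encoding="unicode")
```

For example, `shift_scan_times(text, lambda rt: rt + 0.5)` adds 0.5 min to every scan. This approach loads the whole file into memory. Also, if the file is indexedmzML, the byte offsets in its index will be stale after editing. Converting without an index (I believe msconvert has a --noindex option; check `msconvert --help`) or re-indexing afterwards is worth considering.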
Please tell me more about this. Is this done in ProteoWizard? I follow your blog and have benefited from it when converting .L to .msp and merging them, so thank you.
For candidate 3: when Tobias Kind at UC Davis tried to analyze more than 2,000 ABF files at once, MS-DIAL required around 200 GB of RAM when the joint aligner was executed.
During peak picking, MS-DIAL holds the entire MS data for each file in memory. This means that if you set the number of threads to 1 and your ABF file is around 1 GB, the program needs less than 2 GB of RAM for processing. In the alignment step, on the other hand, the peak-picked data for all analysis files (stored as *.pai2 files) are held in memory at the same time.
Although each PAI2 file is only around 1 MB, the memory requirement grows as you import more files.
This is really important to know. I got this error trying to process 985 mzML DIA files of ~100 MB each. My VM has 64 GB of RAM and the PAI2 files generated are ~3.5 MB each (985 × 3.5 MB ≈ 3.4 GB), so that seemed like plenty for this analysis. I assume that MS-DIAL builds its alignment file by combining all these individual files, so the data in memory would be double that amount, but that's still a fraction of what it's using. I reran the analysis with some repeat samples removed (680 files) and got the same error. I'm now seeking a temporary RAM boost for the VM to get this analysis done.
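Here is a rough back-of-envelope sketch that uses only the figures quoted in this thread. The first is about 2× the file size during peak picking, from the 1-thread / 1 GB file example. The second is about 200 GB / 2,000 files ≈ 100 MB per file during alignment, from the >2,000-file report. Three things here are my assumptions rather than documented MS-DIAL behaviour: the linear scaling, the idea that memory grows with thread count, and the function names. DIA data, feature counts and the MS-DIAL version will all move the real numbers.

```python
def peak_picking_ram_gb(file_size_gb: float, threads: int = 1) -> float:
    """Rough upper estimate for peak picking.

    From the thread: 1 thread and a ~1 GB file needs < 2 GB, i.e. ~2x the file size.
    ASSUMPTION: each thread holds one file, so memory scales with thread count.
    """
    return 2.0 * file_size_gb * threads


def alignment_ram_gb(n_files: int, ref_files: int = 2000, ref_ram_gb: float = 200.0) -> float:
    """Rough estimate for the joint aligner by linear extrapolation.

    Reference point from the thread: ~200 GB for >2000 ABF files (~100 MB per file).
    ASSUMPTION: memory scales linearly with the number of files.
    """
    return n_files * ref_ram_gb / ref_files


for n in (985, 680):
    print(f"{n} files: ~{alignment_ram_gb(n):.1f} GB for alignment")
```

On those assumptions, 985 files would need roughly 98.5 GB and 680 files roughly 68 GB for alignment. Both are above 64 GB, which would be consistent with the errors described above.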
People in the Australian BioCommons community have told me that they don't use MS-DIAL because it couldn't handle large datasets. Making these limits explicit in the documentation might help broaden the user base.
For pure spectral matching to your NIST23 library (jealous!), in your identification settings tab just set the RT and RI tolerances to 10,000. It will stop penalising matches by RT delta.
I don't know about NIST23, but if you want RT matching, you need to make sure that the RI values for the NIST library are their polar column values. In NIST 2020 there's a file in the nist_ri folder called ri.dat, which you can parse to a plain text file with both nonpolar (5% phenyl) and polar (wax/PEG) column data. You'll need someone to write a script to combine the data with your .msp export.
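Once ri.dat has been parsed to plain text, a sensible first step is to pull out one RI per compound for the column type you need. I haven't reproduced the real layout of that export here. This sketch assumes a hypothetical tab-separated file with name, column type ("polar" or "nonpolar") and RI, and takes the median where a compound has several reported values. The function name is also hypothetical.

```python
import csv
import io
import statistics
from collections import defaultdict


def median_ri_by_column(text: str, column_type: str = "polar") -> dict[str, float]:
    """Return {upper-cased compound name: median RI} for one column type.

    ASSUMPTION: hypothetical tab-separated layout of name, column type, RI.
    Header, blank and malformed lines are skipped.
    """
    wanted = column_type.lower()
    values: dict[str, list[float]] = defaultdict(list)
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) != 3:
            continue
        name, column, ri = row
        if column.strip().lower() != wanted:
            continue
        try:
            values[name.strip().upper()].append(float(ri))
        except ValueError:
            continue  # e.g. a header row or a non-numeric RI
    return {name: statistics.median(ris) for name, ris in values.items()}
```

The resulting dictionary is keyed on upper-cased compound names. Matching on CAS number would be more robust if your export includes it.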
I don't think that's possible. Opening individual features in MS-FINDER one at a time isn't very efficient. You need to export all of your features from MS-DIAL to .mat files and then work with the whole list in MS-FINDER. You can then reflect your MS-FINDER results back to the .mat files via the "Tool" menu.
To export to .mat files, click the "Show ion table" button in the main window. When it opens, click the export to MS-FINDER button and then set some filters in the "settings" option that appears at the bottom right. If you have a list of features you want to investigate that span a range of abundances, RTs and m/z values, you probably want to export the whole data set and use a script to copy the relevant files to a new folder for processing. If you're doing pure discovery work and you're interested in the most abundant features, you can set a threshold and the blank filter to export the top "n" most abundant features to a folder. MS-FINDER can then open all features in that folder and run batch formula and structure searching.
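For the scripting route, here is a minimal Python sketch. It copies only the exported .mat files whose precursor m/z and RT fall within a tolerance of a target list. It assumes each .mat file carries PRECURSORMZ: and RETENTIONTIME: lines (as I understand MS-FINDER's MAT format does), with RT in minutes. The function names, folder names and default tolerances are hypothetical.

```python
import shutil
from pathlib import Path


def read_mat_fields(path: Path) -> dict[str, str]:
    """Return the first PRECURSORMZ and RETENTIONTIME values found in a .mat file."""
    fields: dict[str, str] = {}
    for line in path.read_text(errors="replace").splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().upper()
        if sep and key in ("PRECURSORMZ", "RETENTIONTIME"):
            fields.setdefault(key, value.strip())
    return fields


def copy_matching_mat_files(src, dst, targets, mz_tol=0.01, rt_tol=0.1):
    """Copy .mat files from src to dst if they match any (m/z, RT) target within tolerance."""
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for mat in sorted(src.glob("*.mat")):
        fields = read_mat_fields(mat)
        try:
            mz = float(fields["PRECURSORMZ"])
            rt = float(fields["RETENTIONTIME"])
        except (KeyError, ValueError):
            continue  # skip files without a usable precursor m/z or RT
        if any(abs(mz - t_mz) <= mz_tol and abs(rt - t_rt) <= rt_tol for t_mz, t_rt in targets):
            shutil.copy2(mat, dst / mat.name)
            copied.append(mat.name)
    return copied
```

A typical call would be something like `copy_matching_mat_files("exported_mat", "for_msfinder", targets)`, where `targets` is your list of (m/z, RT) pairs. You would then point MS-FINDER at the destination folder.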
I very much empathise. It is a fantastic tool but it's not perfect. If you have the chance you can raise an issue on their GitHub or make suggestions on the Twitter feed. They had a thread asking for suggestions for improvements in MS-DIAL 5. https://github.com/systemsomicslab/MsdialWorkbench
Are all the RTs shifted by 0.5 min, or is it a progressive change across the batch? If the former, you can convert your data to mzML and use Python to shift the RTs for the dodgy batch by a fixed value so the peaks line up. I had to use this hack recently.
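You can answer the fixed-or-progressive question before shifting anything. One option is to take a reference compound (or internal standard) seen in every injection, compute its RT deviation from the expected value in injection order, and fit a straight line. The helper names and the 0.05 min tolerance below are assumptions for illustration.

```python
from statistics import fmean


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired points")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def classify_rt_shift(deviations: list[float], tolerance: float = 0.05) -> tuple[str, float]:
    """deviations: observed minus expected RT (min) for one reference compound, in injection order.

    Returns ("fixed", mean shift) if the fitted drift across the whole batch is
    within tolerance, otherwise ("progressive", fitted slope per injection).
    """
    order = [float(i) for i in range(len(deviations))]
    slope, _ = fit_line(order, deviations)
    total_drift = abs(slope) * (len(deviations) - 1)
    if total_drift <= tolerance:
        return "fixed", fmean(deviations)
    return "progressive", slope
```

A slope near zero suggests a single fixed shift (the mean deviation). A clear slope suggests progressive drift, in which case a correction that depends on injection order would fit better than a constant.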
My problem is that ChEBI IDs for lipids are difficult to find in databases, as the ones used in the Reactome pathways are often high-level, generalised ones. For example, https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:17815 is a generic 1,2-diglyceride. When I search Reactome for matches to my diglyceride annotations, nothing matches, because my annotations are for specific DG species. Is there a hierarchical system by which I can find matches for my specific annotations, as well as any entries in ChEBI for higher ontological classes?
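One way to do the hierarchical matching, sketched in Python under some assumptions. First, load ChEBI's is_a relationships from the ontology in OBO format (ChEBI offers OBO downloads). Then walk upwards from each specific annotation until you reach IDs that appear in your set of Reactome-used ChEBI IDs. How you build that set is up to you; Reactome publishes ChEBI mapping files. The function names are hypothetical. In the test data, every ID other than CHEBI:17815 is a placeholder, and whether a given DG species really sits under CHEBI:17815 depends on the actual ontology.

```python
def parse_is_a(obo_text: str) -> dict[str, list[str]]:
    """Map each [Term] id to its is_a parent ids from OBO-format text."""
    parents: dict[str, list[str]] = {}
    in_term, current = False, None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            in_term, current = line == "[Term]", None  # skip [Typedef] and other stanzas
        elif in_term and line.startswith("id:"):
            current = line[3:].strip()
            parents.setdefault(current, [])
        elif in_term and current and line.startswith("is_a:"):
            # "is_a: CHEBI:17815 ! 1,2-diglyceride" -> "CHEBI:17815"
            parents[current].append(line[5:].split("!")[0].strip())
    return parents


def nearest_in_targets(term: str, parents: dict[str, list[str]], targets: set[str]) -> list[str]:
    """Breadth-first walk up the is_a graph; return the closest ancestors found in targets."""
    seen, frontier = {term}, [term]
    while frontier:
        hits = sorted(t for t in frontier if t in targets)
        if hits:
            return hits
        next_frontier = []
        for t in frontier:
            for p in parents.get(t, []):
                if p not in seen:
                    seen.add(p)
                    next_frontier.append(p)
        frontier = next_frontier
    return []
```

Because the search goes level by level, it returns the most specific Reactome entry available rather than jumping straight to a very broad class.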
Hi, sorry, but I can't find it. I wrote it more than 5 years ago, when I was doing a lot of GC-MS, and that was for NIST17 anyway. I think NIST20 includes RI data in mainlib now, but I may be mistaken.
I don't use MS-DIAL v5 because I haven't found it to be stable. I'm using 4.9.221218.
In your first post you mention that you acquired the library spectra from single standard runs, extracted the spectra and compiled them into a single .msp. Can you clarify whether your question is if the same standard solution should give you a dot product of 1.00 every time you run it (it won't), or whether, if you analyse the file from which the exact library spectrum was extracted, you should get a dot product of 1.00 (you should)?
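To illustrate the two cases, here is a plain (unweighted) cosine in Python. A spectrum compared with itself scores 1.00, while a replicate injection with slightly different relative intensities scores just below 1. This is a generic cosine on spectra already binned to the same m/z keys, not MS-DIAL's own scoring, and the intensities are made up.

```python
import math


def cosine(a: dict[float, float], b: dict[float, float]) -> float:
    """Cosine similarity of two spectra given as {m/z bin: intensity}.

    Peaks only match if their m/z keys are identical (i.e. already binned).
    """
    dot = sum(intensity * b.get(mz, 0.0) for mz, intensity in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


library = {100.0: 999.0, 150.0: 500.0, 200.0: 100.0}    # made-up library spectrum
replicate = {100.0: 999.0, 150.0: 420.0, 200.0: 130.0}  # made-up re-injection

print(f"{cosine(library, library):.2f}")    # same spectrum: 1.00
print(f"{cosine(library, replicate):.3f}")  # replicate: slightly below 1
```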
There are a few caveats here, depending on your acquisition settings. For example, if you acquired multiple DDA scans for a peak, the one you extracted for your library might not be the one MS-DIAL extracts and uses, and they can be quite different. This is especially likely if the peak was particularly large, or if you extracted the first scan on the rising edge of the peak with quite a low threshold.
Hi Amin. Suggestions:
- Try msconvert from the ProteoWizard package instead of ABF Converter (convert to mzML, not ABF).
- Check your source file isn't corrupted.
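If a converted file still won't load, one quick sanity check is to confirm the mzML is well-formed XML and that the number of spectra matches the count declared on spectrumList. This is a sketch, not part of ProteoWizard, and the function name is hypothetical. It streams the file, so it should cope with large files.

```python
import xml.etree.ElementTree as ET

NS = "{http://psi.hupo.org/ms/mzml}"


def check_mzml(source) -> tuple[int, int]:
    """Stream-parse an mzML file (path or binary file object).

    Raises ET.ParseError if the XML is malformed or truncated.
    Returns (declared spectrum count, spectra actually found); declared is -1 if absent.
    """
    declared, found = -1, 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == NS + "spectrumList":
            declared = int(elem.get("count", -1))
        elif event == "end" and elem.tag == NS + "spectrum":
            found += 1
            elem.clear()  # free each spectrum's children as we go
    return declared, found
```

A mismatch between the two counts, or a ParseError near the end of the file, points to a truncated conversion or a problem with the source file.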
Yes, if you export spectra to a library and then use that library to annotate the source file you exported from, you should get a score of 1.00. I see that you got that in your third screenshot. What version of MS-DIAL are you using?
Frustratingly, the dot product often applied in mass spec is not this algorithm but an optimised, or weighted, dot product. See Stein & Scott, 1994: Optimization and testing of mass spectral library search algorithms for compound identification. https://www.sciencedirect.com/science/article/pii/1044030594870098
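The weighted dot product multiplies each peak by powers of its m/z and intensity before taking the cosine. This increases the influence of higher-mass peaks, and an intensity exponent below 1 reduces the dominance of a few intense peaks. Below is a sketch with the exponents as parameters; the function and parameter names are hypothetical. Take actual exponent values from the paper rather than from this sketch, and note this is not necessarily what MS-DIAL computes.

```python
import math


def weighted_cosine(query: dict[float, float], library: dict[float, float],
                    mz_power: float = 0.0, intensity_power: float = 1.0) -> float:
    """Cosine on peaks weighted as (m/z ** mz_power) * (intensity ** intensity_power).

    Spectra are {m/z bin: non-negative intensity}. With the defaults this is
    the plain, unweighted cosine.
    """
    def weigh(spectrum: dict[float, float]) -> dict[float, float]:
        return {mz: (mz ** mz_power) * (i ** intensity_power) for mz, i in spectrum.items()}

    wq, wl = weigh(query), weigh(library)
    dot = sum(v * wl.get(mz, 0.0) for mz, v in wq.items())
    norm_q = math.sqrt(sum(v * v for v in wq.values()))
    norm_l = math.sqrt(sum(v * v for v in wl.values()))
    if norm_q == 0 or norm_l == 0:
        return 0.0
    return dot / (norm_q * norm_l)
```

Because the weighting changes which peaks dominate, the same pair of spectra can score differently under different exponents. This is one reason scores from different tools are hard to compare.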
Bad jargon but more robust spectral matching.
Reverse dot product uses a reverse filter so that only peaks present in the library spectrum are matched. Search for the term here: http://www.bioconductor.org/packages/devel/bioc/vignettes/msPurity/inst/doc/msPurity-lcmsms-data-processing-and-spectral-matching-vignette.html Quote: "The reverse dot product cosine (rpdc) uses the same algorithm as dpc but all peaks that do not match in the query spectra (based on the alignment) are omitted from the calculation. This will improve scores when the query spectra is noisy but should be used with caution as it might lead to more false positives."
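Here is a sketch of the reverse filter described in the quote, on spectra already binned to shared m/z keys. msPurity itself aligns peaks with an m/z tolerance, which this sketch skips, and the function names are hypothetical. The made-up query is the library spectrum plus one extra peak, as you might see from a co-eluting interference.

```python
import math


def cosine(a: dict[float, float], b: dict[float, float]) -> float:
    """Plain cosine similarity of two binned spectra {m/z bin: intensity}."""
    dot = sum(i * b.get(mz, 0.0) for mz, i in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reverse_cosine(query: dict[float, float], library: dict[float, float]) -> float:
    """Drop query peaks with no counterpart in the library spectrum, then take the cosine."""
    filtered = {mz: i for mz, i in query.items() if mz in library}
    return cosine(filtered, library)


library = {100.0: 999.0, 150.0: 500.0}
query = {100.0: 999.0, 150.0: 500.0, 80.0: 800.0}  # library peaks plus one interfering peak

print(round(cosine(query, library), 2))          # about 0.81
print(round(reverse_cosine(query, library), 2))  # 1.0
```

The extra peak drags the forward score down but is ignored by the reverse score. This is the behaviour the quote warns can also inflate scores for wrong matches.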
I have reached out to Hiroshi Tsugawa in the past to ask for specifics of the dot product algorithms that MS-DIAL uses. I haven't had a response.
I ran a comparison between v4.92 and v5.1 for my AIF data. v5.1 crashed during processing so I'm sticking with v4.92. I haven't found any version of MS-DIAL 5 to be stable.
Hi Justin, if you have a NIST EI library from about 2014 onwards, it will include an RI library in text format that contains both non-polar (5% phenyl residue) and polar (wax column) RIs for many of the compounds. You will need to convert the spectral library to .msp format for MS-DIAL using the Lib2NIST program. I wrote a script to match the RI values to the .msp file entries so that MS-DIAL could then use them.
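Here is a minimal sketch of that kind of matching script; it is not the author's original. It adds an RI line after the Name line of each .msp entry whose name appears in an RI lookup, such as a {name: RI} dictionary built from the NIST RI text file. The RETENTIONINDEX key is an assumption about what MS-DIAL reads, so check it against the MS-DIAL documentation. Matching on names is also fragile compared with CAS or NIST numbers. The function name is hypothetical.

```python
import re


def add_ri_to_msp(msp_text: str, ri_by_name: dict[str, float], key: str = "RETENTIONINDEX") -> str:
    """Insert '<key>: <RI>' after the Name line of each matching .msp entry.

    ASSUMPTIONS: entries are separated by blank lines, ri_by_name is keyed on
    upper-cased names, and MS-DIAL reads the RI from a line starting with `key`.
    Entries that already carry that key are left unchanged.
    """
    out_entries = []
    for entry in re.split(r"\n\s*\n", msp_text.strip()):
        lines = entry.splitlines()
        name_idx = next((i for i, line in enumerate(lines) if line.lower().startswith("name:")), None)
        has_ri = any(line.upper().startswith(key.upper() + ":") for line in lines)
        if name_idx is not None and not has_ri:
            name = lines[name_idx].split(":", 1)[1].strip().upper()
            if name in ri_by_name:
                lines.insert(name_idx + 1, f"{key}: {ri_by_name[name]:g}")
        out_entries.append("\n".join(lines))
    return "\n\n".join(out_entries) + "\n"
```

Entries without a match are passed through unchanged, so you can count the unmatched names afterwards to judge how well the name-based lookup worked.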
Hi, you'll have to provide more information. What sort of data are you processing? GC-MS or LC-MS? DDA or DIA? What parameters have you set in the alignment tab of the method? You should post screenshots to help, or reference the steps you've taken from the tutorial: https://mtbinfo-team.github.io/mtbinfo.github.io/MS-DIAL/tutorial.html