Messages - drchrispook

1
MS-DIAL / Re: [MS-Dial 4.20] Error in JointAligner, alignment file not found
candidate 3: out of memory issue.

For candidate 3, when UC Davis's Tobias Kind tried to analyze >2000 abf files at once, MS-DIAL required around 200 GB RAM when the joint aligner was executed.

In the peak picking process for each file, MS-DIAL stores the entire MS data on the PC memory.
It means, if you set the thread size as 1, and if your abf file size is around 1GB, the program requires < 2GB RAM for the processing.
On the other hand, in the alignment process, the peak picked data (stored in *.pai2 format file) for all analysis files will be stored on the PC memory.

Although the file size of PAI2 file is around 1M, it will be elevated if you import many files.


This is really important to know. I got this error trying to process 985 mzML DIA files of ~100MB each. My VM has 64GB of RAM and the PAI2 files generated are ~3.5MB each, so that seems like plenty for this analysis. I assume that MS-DIAL builds its alignment file by combining all these individual files so the data in memory will be double that amount, but that's still a fraction of what it's using. I've rerun the analysis with some repeat samples removed, so 680 files, and I've got the same error. I'm now seeking a temporary boost to the RAM on the VM to get this analysis done.
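
A back-of-envelope check of those numbers, as a rough sketch only. The first function is the naive estimate from the PAI2 file sizes, using my assumption that the data is held twice. The second scales linearly from the ~200 GB for ~2000 files quoted above, which is itself an assumption: that report was for abf files and ">2000" is only a lower bound, so the real scaling may differ.

```python
def naive_pai2_gb(n_files, pai2_mb, copies=2):
    """Memory in GB if only the PAI2 data (times `copies`) were held in RAM."""
    return n_files * pai2_mb * copies / 1024


def scaled_from_report_gb(n_files, reported_gb=200, reported_files=2000):
    """Linear extrapolation from the quoted ~200 GB for ~2000 files."""
    return n_files * reported_gb / reported_files


for n in (985, 680):
    print(f"{n} files: naive {naive_pai2_gb(n, 3.5):.1f} GB, "
          f"scaled {scaled_from_report_gb(n):.1f} GB")
# 985 files: naive 6.7 GB, scaled 98.5 GB
# 680 files: naive 4.6 GB, scaled 68.0 GB
```

If the linear scaling is anywhere near right, both runs would exceed 64 GB, which would be consistent with seeing the same error at 680 files.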

People in the Australian Biocommons community have told me that they don't use MS-DIAL because it couldn't handle large datasets. It might help broaden the user base if the documentation made these limits explicit.
2
MS-DIAL / Re: Unable to Completely Disregard Retention Information for Scoring/Identification
For pure spectral matching to your NIST23 library (jealous!), in your identification settings tab just set the RT and RI tolerances to 10,000. It will stop penalising matches by RT delta.

I don't know about NIST23 but if you want RT matching you need to make sure that the RI values for the NIST library are from their polar column values. In NIST 2020 there's a file in the nist_ri folder called ri.dat, which you can parse to a plain text file with both nonpolar (5% phenyl) and polar (wax/PEG) column data. You'll need someone to write a script to combine the data with your .msp export.
3
MS-DIAL / Re: From MS-FINDER to MS-DIAL
I don't think that's possible. If you're opening individual features in MS-FINDER that's not really efficient. You need to export all of your features from MS-DIAL to .mat files and then work with the whole list in MS-FINDER. You can then reflect your MS-FINDER results back to the .mat files via the "Tool" menu.

To export to .mat files click on the "Show ion table" button from the main window. When it opens click the export to MS-FINDER button and then set some filters in the "settings" option which appears at bottom right. If you have a list of features you want to investigate that range in abundance, RT and m/z then you probably want to export the whole data set and use a scripting program to copy the relevant files to a new folder for processing. If you're doing pure discovery work and you're interested in the most abundant features then you can set a threshold and the blank filter to export the top "n" most abundant features to a folder. MS-FINDER can then open all features in that folder and run batch formula and structure searching.
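
A minimal sketch of that copying step in Python. It assumes each exported .mat file starts with "KEY: value" header lines that include PRECURSORMZ and RETENTIONTIME. Check this against one of your own exports, as the field names here are my assumption. The function names and tolerances are hypothetical.

```python
import shutil
from pathlib import Path


def read_mat_header(path):
    """Return (precursor m/z, RT) from the header, or None if either is missing."""
    fields = {}
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip().upper()] = value.strip()
            if "PRECURSORMZ" in fields and "RETENTIONTIME" in fields:
                return float(fields["PRECURSORMZ"]), float(fields["RETENTIONTIME"])
    return None


def copy_matching(src_dir, dst_dir, targets, mz_tol=0.01, rt_tol=0.1):
    """Copy .mat files whose m/z and RT fall within tolerance of any target pair."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for mat in sorted(Path(src_dir).glob("*.mat")):
        header = read_mat_header(mat)
        if header is None:
            continue  # skip files without the assumed header fields
        mz, rt = header
        if any(abs(mz - t_mz) <= mz_tol and abs(rt - t_rt) <= rt_tol
               for t_mz, t_rt in targets):
            shutil.copy2(mat, dst / mat.name)
            copied.append(mat.name)
    return copied
```

Usage would be something like copy_matching("export", "selected", [(301.2, 5.2)]), where the list holds the (m/z, RT in minutes) pairs of the features you care about.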
4
MS-DIAL / Re: Batch alignment in MS-DIAL 4.92
I very much empathise. It is a fantastic tool but it's not perfect. If you have the chance you can raise an issue on their GitHub or make suggestions on the Twitter feed. They had a thread asking for suggestions for improvements in MS-DIAL 5.
https://github.com/systemsomicslab/MsdialWorkbench

Are all the RTs shifted by 0.5 mins, or is it a progressive change across the batch? If the former, you can convert your data to mzML and use Python to shift the RTs for the dodgy batch by a fixed value to roughly align the peaks. I had to use this hack recently.
5
Pathway analysis / CHEBI hierarchy
Hi, I'm trying to apply this very cool pathway analysis to my lipidomics data.
https://pubmed.ncbi.nlm.nih.gov/36376837/

My problem is that CHEBI IDs for lipids are difficult to find in databases, as the ones used in the Reactome pathways are often high-level, generalised ones. E.g. https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:17815 is a generic 1,2-diglyceride. When I search for matches in Reactome for my diglyceride annotations, they don't match as they are for specific DG species. Is there a hierarchical system by which I can find matches for my specific annotations, as well as any entries in CHEBI for higher ontological classes?

To complicate matters further, there is a separate entry for a diglyceride with unspecified positions, https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI%3A18035, and a third for diradylglycerides with unspecified R-groups that may be acyl, https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI%3A76578.
8
MS-DIAL / Re: Spectral Similarities of in-house measured reference compounds quite low
I don't use MS-DIAL v5 because I haven't found it to be stable. I'm using 4.9.221218.

In your first post you mention that you acquired the library spectra from single standard runs, extracted the spectra and compiled them into a single .msp. Can you clarify whether your question is if the same standard solution should give you a dot product of 1.00 every time you run it (it won't), or if analysing the file from which the exact library spectrum was extracted should give you a dot product of 1.00 (it should)?

There are a few caveats here around what your acquisition settings are. For example, if you acquired multiple DDA scans for a peak, then the one that you've extracted for your library might not necessarily be the one MS-DIAL extracts and uses. They can be quite different, especially if the peak was particularly large, or you extracted the first scan on the rising edge of the peak with quite a low threshold.
9
MS-DIAL / Re: lipidomics/metabolomics
Hi Amin. Suggestions:
- Try msconvert from the ProteoWizard package instead of ABF converter (convert to mzML, not ABF).
- Check your source file isn't corrupted.
11
MS-DIAL / Re: Formula for calculating spectral similarity and total score
Dot product is straightforward vector maths.
https://en.wikipedia.org/wiki/Dot_product

Frustratingly, in mass spec the dot product that is often applied is not this algorithm but an optimised, or weighted, dot product. See Stein & Scott, 1994: Optimization and testing of mass spectral library search algorithms for compound identification.
https://www.sciencedirect.com/science/article/pii/1044030594870098

Bad jargon but more robust spectral matching.

Reverse dot product uses a reverse filter to only match peaks which are present in the library spectrum. Search for the term here: http://www.bioconductor.org/packages/devel/bioc/vignettes/msPurity/inst/doc/msPurity-lcmsms-data-processing-and-spectral-matching-vignette.html
Quote:
"The reverse dot product cosine (rpdc) uses the same algorithm as dpc but all peaks that do not match in the query spectra (based on the alignment) are omitted from the calculation. This will improve scores when the query spectra is noisy but should be used with caution as it might lead to more false positives."

I have reached out to Hiroshi Tsugawa in the past to ask for specifics of the dot product algorithms that MS-DIAL uses. I haven't had a response.
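
Since MS-DIAL's exact formulas aren't confirmed, the following is only a generic sketch of the plain, weighted and reverse variants discussed above, not what MS-DIAL actually does. It assumes spectra are already binned to matching m/z keys (e.g. nominal mass) and stored as dicts of m/z to intensity. The weighting exponents are left as parameters rather than fixed to any published values.

```python
import math


def weight(spectrum, intensity_power=1.0, mz_power=0.0):
    """Scale each peak as intensity**a * mz**b (defaults leave it unchanged)."""
    return {mz: (inten ** intensity_power) * (mz ** mz_power)
            for mz, inten in spectrum.items()}


def cosine(query, library):
    """Normalised dot product of two spectra keyed by m/z."""
    shared = set(query) & set(library)
    num = sum(query[mz] * library[mz] for mz in shared)
    den = (math.sqrt(sum(v * v for v in query.values()))
           * math.sqrt(sum(v * v for v in library.values())))
    return num / den if den else 0.0


def reverse_cosine(query, library):
    """As cosine, but query peaks absent from the library are dropped first."""
    filtered = {mz: v for mz, v in query.items() if mz in library}
    return cosine(filtered, library)


library = {57: 100.0, 71: 50.0, 85: 20.0}
query = {57: 100.0, 71: 50.0, 85: 20.0, 99: 30.0}  # one extra "noise" peak

print(round(cosine(query, library), 3))          # 0.967
print(round(reverse_cosine(query, library), 3))  # 1.0
print(round(cosine(weight(query, 0.5, 1.0), weight(library, 0.5, 1.0)), 3))
```

The extra peak at m/z 99 drags the forward score down but is ignored by the reverse score, which is why the reverse version is more forgiving of noisy query spectra and also more prone to false positives.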
13
MS-DIAL / Re: How to Specify Column Polarity and use Retention Index for ID
Hi Justin, if you have a NIST EI library from about 2014 on then it will have an RI library in text format that contains both non-polar (5% phenyl residue) and polar (wax column) RIs for many of the compounds. You will need to convert the spectral library to .msp format for MS-DIAL using the Lib2NIST program. I wrote a script to match the RI values to the .msp file entries so that MS-DIAL could then use them.
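
A rough sketch of that kind of matching script, under several assumptions. The RI data is assumed to have been parsed into a tab-separated table with hypothetical columns name, nonpolar_ri and polar_ri, which is not the native NIST layout. Entries are matched on the Name: line, which is cruder than matching on CAS number. RETENTIONINDEX: is the field name I'm assuming MS-DIAL reads from an .msp, so verify that before relying on it.

```python
import csv
import io


def load_ri_table(tsv_text, column="polar_ri"):
    """Map lower-cased compound name to the RI in the chosen column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["name"].strip().lower(): row[column]
            for row in reader if row[column]}


def add_ri_to_msp(msp_text, ri_by_name):
    """Insert a RETENTIONINDEX line after each Name line that has a match."""
    out = []
    for line in msp_text.splitlines():
        out.append(line)
        if line.lower().startswith("name:"):
            name = line.split(":", 1)[1].strip().lower()
            ri = ri_by_name.get(name)
            if ri is not None:
                out.append(f"RETENTIONINDEX: {ri}")
    return "\n".join(out) + "\n"


# Decane is an n-alkane, so its RI is 1000 by definition on any column.
ri_table = "name\tnonpolar_ri\tpolar_ri\nDecane\t1000\t1000\n"
msp = "Name: Decane\nNum Peaks: 2\n57 999\n71 500\n\nName: Unknownium\nNum Peaks: 1\n43 999\n"
print(add_ri_to_msp(msp, load_ri_table(ri_table)))
```

Entries without a match are left untouched, so it is worth counting how many got an RI before loading the library.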
15
MS-DIAL / Re: RT correction
Hi, glad you found that post useful. It took me a year or so to work it out for myself so I'm glad other people are benefitting. Is it too late to give it a DOI?  :))

To manually shift the RT of mzML files you need to write a script that creates a copy of the file line-by-line and edits the RT with a fixed shift. It's simple brute force but it works. I thought about using a polynomial instead but that will corrupt the quantitative information in the data as this depends on a uniform scan interval.
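
A minimal sketch of that line-by-line approach. It assumes the mzML is pretty-printed with one cvParam per line (as converter output usually is) and that each scan's RT is stored in the "scan start time" cvParam, accession MS:1000016. The function names are hypothetical, and the shift is given in minutes and converted if the file stores seconds.

```python
import re

SCAN_START_TIME = 'accession="MS:1000016"'   # PSI-MS term "scan start time"
SECONDS_UNIT = 'unitAccession="UO:0000010"'  # Unit Ontology term "second"
VALUE_ATTR = re.compile(r'\bvalue="([^"]*)"')


def shift_line(line, shift_min):
    """Return the line with its scan start time moved by shift_min minutes."""
    if SCAN_START_TIME not in line:
        return line
    shift = shift_min * 60.0 if SECONDS_UNIT in line else shift_min

    def replace(match):
        return f'value="{float(match.group(1)) + shift:.6f}"'

    return VALUE_ATTR.sub(replace, line, count=1)


def shift_mzml(src_path, dst_path, shift_min):
    """Copy src to dst line by line, applying the fixed RT shift."""
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(shift_line(line, shift_min))
```

Two caveats: in an indexed mzML the byte offsets and checksum at the end of the file will be stale if any line changes length, so the copy may need re-indexing, and chromatogram time arrays stored as binary are not touched by this. It also doesn't guard against a negative shift pushing early scans below zero.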