At least size is not a problem. I am using concatenated libraries in .msp format that go well above 5-6 GB with no problems at all! Most likely, it's the "keys/fields" that have formatting issues.
For a test run, I always take a few known spectra from this 3 GB library and create a small .msp file to check that it works with, say, a 0% or 5% cosine similarity cutoff, just to make sure the library is at least recognized!
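If it helps, the small test file can be carved out of the big library with a short script instead of by hand. This is a minimal Python sketch, assuming records are separated by blank lines and use "KEY: value" header lines; the REQUIRED_KEYS list is my own assumption about what a usable record needs, not MS-DIAL's documented rule. It streams the file, so a multi-GB library is never loaded into memory, and it reports which of the assumed keys are missing in each copied record.

```python
# Keys assumed (for illustration only) to be needed in each record.
REQUIRED_KEYS = ("NAME", "PRECURSORMZ", "NUM PEAKS")


def read_msp_records(path):
    """Yield each MSP record as a list of lines; records are separated by blank lines."""
    record = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.strip():
                record.append(line)
            elif record:
                yield record
                record = []
    if record:  # last record may not end with a blank line
        yield record


def record_keys(record):
    """Map upper-cased KEY to value for every 'KEY: value' line (peak lines are skipped)."""
    keys = {}
    for line in record:
        key, sep, value = line.partition(":")
        if sep and not key[:1].isdigit():
            keys[key.strip().upper()] = value.strip()
    return keys


def extract_subset(src, dst, names):
    """Copy records whose NAME is in `names` (case-insensitive) from src to dst.

    Returns (name, missing_keys) for every copied record.
    """
    wanted = {n.lower() for n in names}
    report = []
    with open(dst, "w", encoding="utf-8") as out:
        for record in read_msp_records(src):
            keys = record_keys(record)
            name = keys.get("NAME", "")
            if name.lower() in wanted:
                out.write("\n".join(record) + "\n\n")
                missing = [k for k in REQUIRED_KEYS if k not in keys]
                report.append((name, missing))
    return report
```

Usage would be something like `extract_subset("big_library.msp", "test_subset.msp", ["Glutamic acid"])`, where both file names are placeholders.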
The mzTab-M export will only work if (A) you have 'aligned' the data and export from the alignment results, and (B) you also choose the corresponding text-file export for height or area, not mzTab-M alone. Importantly, either "Area" or "Height" has to be checked in addition to the "mzTab-M" option, because they are exported in pairs, i.e., a .txt version of height/area alongside the mzTab-M version of height/area. So, at least 2 checks.
I have not faced such an issue with mzTab-M export, at least in the current version or the 1-2 versions before it.
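To confirm that an exported mzTab-M file actually contains rows (and not just a header), a quick count of the line prefixes can help. This is a minimal Python sketch; the prefixes come from the mzTab-M 2.0 specification, and I am not assuming which optional sections MS-DIAL fills in. The `looks_populated` check is my own rough heuristic.

```python
from collections import Counter

# Line prefixes from the mzTab-M 2.0 specification:
# MTD = metadata, SMH/SML = small-molecule header/rows,
# SFH/SMF = feature header/rows, SEH/SME = evidence header/rows, COM = comment.
SECTIONS = ("MTD", "SMH", "SML", "SFH", "SMF", "SEH", "SME", "COM")


def summarize_mztab_m(path):
    """Count how many lines of each section prefix the file contains."""
    counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            prefix = line.split("\t", 1)[0].strip()
            if prefix in SECTIONS:
                counts[prefix] += 1
    return counts


def looks_populated(counts):
    """Heuristic: metadata present plus at least some summary or feature rows."""
    return counts["MTD"] > 0 and (counts["SML"] > 0 or counts["SMF"] > 0)
```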
If you have NOT normalized the data after alignment, checking the 'Normalized data' option will result in a crash or no export. So, before exporting the "Normalized data matrix", the data needs to be normalized while you are on the "aligned results" tab in the left-hand side panel.
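For context on why the "Normalized data matrix" cannot be exported before normalization: it is a derived matrix computed from the aligned intensities. MS-DIAL offers several normalization methods; the Python sketch below only illustrates one simple TIC-style scaling and is not MS-DIAL's implementation.

```python
def tic_normalize(matrix):
    """Scale each sample (column) so its summed intensity equals the mean sample total.

    matrix: list of feature rows, each a list of per-sample intensities.
    Columns whose total is zero are set to 0.0.
    """
    n_samples = len(matrix[0])
    totals = [sum(row[j] for row in matrix) for j in range(n_samples)]
    target = sum(totals) / n_samples
    return [
        [row[j] * target / totals[j] if totals[j] else 0.0 for j in range(n_samples)]
        for row in matrix
    ]
```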
Check out the 2 attached figs and see if those help too!
Taking a stab at it from my past experience, here are some steps that may help:
[a] Sciex's WIFF/WIFF2 files can be converted into centroided mzML using ProteoWizard's MSConvert: http://proteowizard.sourceforge.net/tools.shtml. See the attached pic "MSconvert".
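The same conversion can be scripted with the msconvert command-line tool that ships with ProteoWizard. This is a minimal Python sketch, assuming msconvert is on the PATH; the Sciex vendor reader generally needs the Windows build, and a .wiff file usually needs its companion .wiff.scan file in the same folder. `--mzML`, `--filter "peakPicking vendor msLevel=1-"` (vendor centroiding on all MS levels) and `-o` are msconvert options; the function names are only illustrative.

```python
import shutil
import subprocess


def msconvert_command(raw_file, out_dir):
    """Build an msconvert command that writes centroided mzML into out_dir."""
    return [
        "msconvert",
        str(raw_file),
        "--mzML",
        # Vendor peak picking should be the first filter applied.
        "--filter", "peakPicking vendor msLevel=1-",
        "-o", str(out_dir),
    ]


def convert(raw_file, out_dir):
    """Run msconvert, failing loudly if it is missing or returns an error."""
    if shutil.which("msconvert") is None:
        raise FileNotFoundError("msconvert not found on PATH (install ProteoWizard)")
    subprocess.run(msconvert_command(raw_file, out_dir), check=True)
```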
Once these individual mzML files with spectra for the reference standards are available, you can process them in MS-DIAL and export individual spectra from each file as a NIST .msp format file, as shown in the attached pic "Exporting Spectra"!
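If you'd rather write reference spectra to .msp from a script, a record is just a few "KEY: value" lines, a "Num Peaks" line and tab-separated m/z-intensity pairs, with a blank line between records. This Python sketch assumes MS-DIAL-style keys (NAME, PRECURSORMZ, PRECURSORTYPE, IONMODE); please compare it against a spectrum exported from MS-DIAL before relying on it.

```python
def format_msp_record(name, precursor_mz, peaks, precursor_type=None, ion_mode=None):
    """Return one MSP record (ending in a blank line) with peaks sorted by m/z."""
    lines = [f"NAME: {name}", f"PRECURSORMZ: {precursor_mz:.4f}"]
    if precursor_type:
        lines.append(f"PRECURSORTYPE: {precursor_type}")
    if ion_mode:
        lines.append(f"IONMODE: {ion_mode}")
    peaks = sorted(peaks)
    lines.append(f"Num Peaks: {len(peaks)}")
    lines.extend(f"{mz:.4f}\t{intensity:g}" for mz, intensity in peaks)
    return "\n".join(lines) + "\n\n"
```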
Exactly. See the FAQ site: http://prime.psc.riken.jp/compms/msdial/faq.html
"What is the 'w/o' tag of identification results?
1. This program first tries to find a metabolite candidate from an MSP file by means of MS/MS similarity, accurate mass, isotope ratio, and retention time. The metabolite with the 'highest score' among the candidates is annotated. Note that this score is the total score from retention time similarity, isotope ratio similarity, accurate mass similarity, and MS/MS similarity.
2. But if the 'highest score' is less than the user-defined identification cutoff, 'non-MS/MS based identification' is performed. That is, the program next tries to find a metabolite candidate by means of accurate mass, isotope ratio, and retention time. Then, the metabolite with the 'highest score' among the candidates is annotated. Note that this score is the total score from retention time, isotope ratio, and accurate mass. This result will be shown as 'w/o MS2:***'.
3. If no compounds are found by the above two criteria, the result will be 'unknown'."
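The tiered logic described in the FAQ can be sketched roughly as follows. This is a Python illustration only: the equal weighting of the similarity terms and reusing the same cutoff in step 2 are my assumptions, not MS-DIAL's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    rt_sim: float       # retention-time similarity (0-1)
    isotope_sim: float  # isotope-ratio similarity (0-1)
    mass_sim: float     # accurate-mass similarity (0-1)
    msms_sim: float     # MS/MS similarity (0-1); 0 when no MS/MS match


def score_with_msms(c):
    # Step 1 total score; equal weights are an assumption.
    return (c.rt_sim + c.isotope_sim + c.mass_sim + c.msms_sim) / 4


def score_without_msms(c):
    # Step 2 total score from RT, isotope ratio and accurate mass only.
    return (c.rt_sim + c.isotope_sim + c.mass_sim) / 3


def annotate(candidates, cutoff):
    if not candidates:
        return "unknown"
    best = max(candidates, key=score_with_msms)
    if score_with_msms(best) >= cutoff:
        return best.name
    best_wo = max(candidates, key=score_without_msms)
    if score_without_msms(best_wo) >= cutoff:  # assumed: same cutoff reused
        return f"w/o MS2:{best_wo.name}"
    return "unknown"
```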
(a) Can you first select/check "MS2 assigned" instead of "suggested" and see whether you get any hits/spectra in the Ion Table? If not, then either the dataset does not have any MS/MS spectra or your search criteria are too stringent.
(b) The "w/o" hits are based on precursor ion (MS1) matches alone within a certain mass window. They are lower-confidence hits, and I would typically not consider them for any downstream analysis.
(c) It's most likely that the files do not have MS/MS data in them, or that the MS/MS data were lost during conversion to mzML. Also, could you please share the RAW unprocessed vendor file and the converted mzML file through Google Drive etc. with us/me (email: email@example.com) so we can look at the files and see what's going on?
Without those, it's difficult to suggest anything from pictures alone!
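Before sharing files, a quick way to check whether the converted mzML still contains MS/MS spectra is to count spectra per MS level. This is a minimal Python sketch using only the standard library; it reads the "ms level" cvParam (PSI-MS accession MS:1000511) inside each spectrum. Spectra that only reference a param group for their MS level are counted under None.

```python
import xml.etree.ElementTree as ET
from collections import Counter

MS_LEVEL = "MS:1000511"  # PSI-MS controlled-vocabulary accession for "ms level"


def local(tag):
    """Strip the XML namespace, e.g. '{http://psi.hupo.org/ms/mzml}spectrum' -> 'spectrum'."""
    return tag.rsplit("}", 1)[-1]


def count_ms_levels(mzml_path):
    """Count spectra per MS level, streaming so large files stay manageable."""
    counts = Counter()
    for _event, elem in ET.iterparse(mzml_path, events=("end",)):
        if local(elem.tag) != "spectrum":
            continue
        level = None
        for child in elem.iter():
            if local(child.tag) == "cvParam" and child.get("accession") == MS_LEVEL:
                level = int(child.get("value"))
                break
        counts[level] += 1
        elem.clear()  # drop the spectrum's children (incl. binary data) once counted
    return counts
```

If this reports only MS level 1, the MS/MS spectra were not written to the mzML and the problem lies in the conversion rather than in MS-DIAL.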
Many many congratulations and thanks to you for preparing newer versions of these nice tools and updating us!
Loads of kudos to you for bringing such a fantastic set of tools to the entire metabolomics research community in an open-source and free manner, with EXCELLENT documentation! You fix bugs, maintain great version control, incorporate (most of) our requests, help troubleshoot even our stupidest concerns, and work hard to respond to all our queries here!
Just wanted to acknowledge your efforts and contribution to the field!!!
I am trying to process a single positive-mode LC-MS/MS DDA data file (CE: 40 eV) in MS-DIAL. After converting it to mzML with msConvert (neither the 32-bit nor the 64-bit version worked, and none of the peak-picking settings helped), I can see lots of MS2 spectra in Mass++ (see pic below), but I cannot see any MS/MS spectra in the MS-DIAL interface (see pic). The parameters used are attached as a .txt file. Why is MS-DIAL not able to find the MS/MS data in my processing workflow?
I'm not sure what I am doing wrong in MS-DIAL with this single data file.
I too face the same issues when converting the entire NIST library/spectra to an MS-DIAL-readable MSP (even though both are MSP files).
I have uploaded 5 files here (A-E). The forum does not allow uploading .MSP files (it reads them as .exe), so I attached them as .txt files.
For “1. could you please send one .msp spectral record created by lib2nist?”, see A - “NIST_Spectra_Export.gif”, which shows exporting an individual spectrum (glutamic acid) from NIST; the result looks like B - “Glutamic acid_From_NIST_MSP”.
For “2. could you please share the detail of how you convert your nist library format file to msp file in lib2nist?”, I did the following:
C - I converted the openly available EPA Starter Kit spectral library using the LIB2NIST program; “LIB2NIST_from_EPA.gif” shows the process.
D - “EPA_Library.MSP.gif” shows the library.
E - “EPA_Starter_LB2NIST_ConvertedMSP.txt” shows what the converted spectra look like.
Hope this helps address our concerns!
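Since the issue seems to be the keys/fields, one option is to rewrite NIST-style records with MS-DIAL-style keys. This is a Python sketch only; KEY_MAP is a hypothetical mapping I put together for illustration (e.g., Precursor_type to PRECURSORTYPE), so please verify it against a record MS-DIAL already reads correctly. It also flattens NIST peak lines, which can hold several "mz intensity" pairs separated by ";" plus quoted annotations, into one tab-separated pair per line. Unmapped keys are passed through unchanged.

```python
import re

# Hypothetical NIST/lib2nist -> MS-DIAL key mapping (upper-cased NIST key on the left).
KEY_MAP = {
    "NAME": "NAME",
    "PRECURSORMZ": "PRECURSORMZ",
    "PRECURSOR_TYPE": "PRECURSORTYPE",
    "ION_MODE": "IONMODE",
    "FORMULA": "FORMULA",
    "INCHIKEY": "INCHIKEY",
    "COLLISION_ENERGY": "COLLISIONENERGY",
    "INSTRUMENT_TYPE": "INSTRUMENTTYPE",
}

NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def convert_record(lines):
    """Convert one NIST-style MSP record (list of lines) to MS-DIAL-style text."""
    header, peaks = [], []
    in_peaks = False
    for line in lines:
        key, sep, value = line.partition(":")
        if not in_peaks and sep and key.strip().upper() == "NUM PEAKS":
            in_peaks = True  # everything after this line is peak data
            continue
        if in_peaks:
            for chunk in line.split(";"):
                chunk = chunk.split('"', 1)[0]  # drop quoted peak annotations
                nums = NUMBER.findall(chunk)
                if len(nums) >= 2:
                    peaks.append((nums[0], nums[1]))
        elif sep:
            norm = key.strip().upper()
            header.append(f"{KEY_MAP.get(norm, key.strip())}: {value.strip()}")
    header.append(f"Num Peaks: {len(peaks)}")  # recounted after flattening
    header.extend(f"{mz}\t{inten}" for mz, inten in peaks)
    return "\n".join(header)
```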
Thank you for all the quick answers and continually improving the greatly useful tool!! : )
A wish list for your further consideration (based on challenges I face where I feel MS-DIAL could help me!):
1. What would it take for MS-DIAL to also recognize spectral libraries/DBs in “.mgf” format? That way, in addition to .MSP, we could also use, for example, the entire GNPS Natural Products libraries, amounting to thousands of spectra with more being added daily, for MS-DIAL annotations, especially since .mgf to .msp format conversions are not trivial.
2. Integrating “MS-FLO” into MS-DIAL would help (a) consolidate redundant features and (b) report unique features across positive- and negative-mode runs.
3. I see pathway functionalities have been added to MS-DIAL, but since KEGG is not ideal for mapping or enrichment of lipidomics data, adding “ChemRICH” as an MS-DIAL option would be fantastic for users! Especially with many untargeted LC-MS workflows capturing lots of lipids, doing separate pathway analyses for metabolites and lipids does not make sense!!
4. Lastly, maybe as a goal for 2022, I would like to see support for GCxGC (2D GC-MS) data (.cdf) analysis, given the current LACK of open-source tools in this area and the monopoly of 1-2 vendors (and MATLAB tools), which leads to inconveniences!
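Regarding point 1, until native .mgf support exists, a basic .mgf to .msp conversion can be sketched in Python. It assumes the usual BEGIN IONS/END IONS blocks with KEY=value lines and "m/z intensity" peak lines; using NAME (or else TITLE) as the record name and the first PEPMASS value as PRECURSORMZ are my assumptions, and other GNPS-specific fields are not mapped.

```python
def _to_msp(fields, peaks):
    """Build one MSP record from parsed MGF fields and peaks."""
    name = fields.get("NAME") or fields.get("TITLE") or "unknown"
    lines = [f"NAME: {name}"]
    if "PEPMASS" in fields:
        # PEPMASS may hold "m/z intensity"; keep only the m/z.
        lines.append(f"PRECURSORMZ: {fields['PEPMASS'].split()[0]}")
    if "IONMODE" in fields:
        lines.append(f"IONMODE: {fields['IONMODE']}")
    lines.append(f"Num Peaks: {len(peaks)}")
    lines.extend(f"{mz}\t{inten}" for mz, inten in peaks)
    return "\n".join(lines)


def mgf_to_msp(mgf_text):
    """Convert MGF text into MSP text, one record per BEGIN IONS/END IONS block."""
    records, current, peaks = [], None, []
    for raw in mgf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "BEGIN IONS":
            current, peaks = {}, []
        elif line == "END IONS":
            if current is not None:
                records.append(_to_msp(current, peaks))
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                key, value = line.split("=", 1)
                current[key.strip().upper()] = value.strip()
            else:
                parts = line.split()
                if len(parts) >= 2:
                    peaks.append((parts[0], parts[1]))
    return "\n\n".join(records) + "\n" if records else ""
```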
Looking forward to your thoughts and some promises!
Hope we all come out of this COVID pandemic unscathed! Stay safe and healthy!!!!