We were using a 10-minute reversed-phase LC method in which most of the analytes of interest come out within the first 5-7 minutes. In addition, we were trying to analyze 100+ human fecal samples along with a pooled QC sample. The samples contained a wide variety of additional material that often showed up during the high %B hold after 7 minutes or so. If you overlaid the samples you'd see that some had very little signal in that region while others had massive, bulky TICs. Regardless, cutting the RT window down to 5 minutes, which removed that contaminated zone, allowed the alignment to continue as expected.
Anyhow it was not clear at first why the program wasn't moving forward at the time, but we figured it out.
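For anyone who wants to see the effect of that RT cutoff outside the program, here is a minimal sketch of the same idea applied to a plain peak list. The `Peak` type, the `trim_rt_window` helper, and the example values are all made up for illustration; this is not how MS-DIAL applies its RT range internally.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    rt_min: float      # retention time in minutes
    mz: float          # precursor m/z
    intensity: float   # peak height, arbitrary units


def trim_rt_window(peaks, rt_start=0.0, rt_end=5.0):
    """Keep only peaks whose retention time falls inside [rt_start, rt_end]."""
    return [p for p in peaks if rt_start <= p.rt_min <= rt_end]


# Made-up example values, not real data.
peaks = [
    Peak(1.2, 180.063, 5.0e4),
    Peak(4.8, 302.115, 2.1e5),
    Peak(7.9, 663.454, 9.8e6),  # late-eluting material in the high %B hold
]

kept = trim_rt_window(peaks)
print(len(kept), "of", len(peaks), "peaks kept")
```

The late-eluting peak is dropped before anything downstream has to deal with it, which is the same effect as narrowing the RT range in the project parameters.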
I've run into this issue as well. While I don't have an exact answer on how to solve it I can say I found 2 resources that explain what's happening.
https://mtbinfo-team.github.io/mtbinfo.github.io/MS-DIAL/tutorial.html#section-9-4 "...The value of ‘-2’ in “Peak ID” column means that the peak is not detected by peak picking process. (but calculated by gap-filling method). In the case of gap-filled peak, the colors of the “Peak Int.” and “Peak Area” columns become light blue. In normal, the colors (red) reflect the level of peak intensity or peak area. You cannot refine the peak and alignment yet, but that function will be developed."
How Peak ID, alignment, and gap filling work is explained in their math FAQ document, which can be found here:
I wish their documentation were a bit better or a bit more up to date. Sometimes the examples are from an older version where the GUI choices or display are different from the version you're using.
So for what you've shown us, 'file ID' 0-1, 4-7, 18, 19, and 21 are all examples of where gap filling by compulsion has happened. No peak was detected there, but because it's in your QCs it was forced into those samples (see the last 2 pages of the math FAQ).
I believe all the answers lie under the 'alignment' tab. I think you could have it exclude those by using a peak count filter (it says %, but I think it reflects an intensity value) or n% detected in one group. However, in those cases the feature shouldn't appear as an aligned feature at all, rather than having signal forced into those blank samples. Alternatively, you might be able to get them excluded by using blank subtraction. I'm thinking gap filling might be the problem here.
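If you want to check how much of a feature is gap-filled before deciding on a filter, something like the sketch below could work. It relies only on the tutorial statement quoted above that a Peak ID of -2 marks a gap-filled peak. The dictionary layout, the `gap_fill_summary` name, and the example values are my own assumptions, not an MS-DIAL export format.

```python
GAP_FILLED = -2  # per the tutorial quote: Peak ID -2 means "not detected, gap-filled"


def gap_fill_summary(peak_ids_by_file):
    """Split file IDs into detected vs gap-filled for one aligned feature.

    peak_ids_by_file maps file ID -> Peak ID (hypothetical layout).
    """
    gap_filled = sorted(f for f, pid in peak_ids_by_file.items() if pid == GAP_FILLED)
    detected = sorted(f for f, pid in peak_ids_by_file.items() if pid != GAP_FILLED)
    total = len(peak_ids_by_file)
    return {
        "detected": detected,
        "gap_filled": gap_filled,
        "detected_fraction": len(detected) / total if total else 0.0,
    }


# Made-up Peak IDs for six files; files 0, 1, 4 and 5 were gap-filled.
feature = {0: -2, 1: -2, 2: 153, 3: 160, 4: -2, 5: -2}
summary = gap_fill_summary(feature)
print(summary["gap_filled"], round(summary["detected_fraction"], 2))
```

A feature where most of the files land in `gap_filled` is a good candidate for a closer look before trusting its intensities in those samples.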
If I'm being honest all the stuff I've tried prior to exporting the aligned peak results has not had the desired results. For our data sets we have so many samples & ref matched IDs that it is incredibly laborious to manually investigate and possibly alter every sample over every data point.
We have used MS-Dial a decent bit. Typically we collect small-molecule metabolomics data on a UHPLC-Sciex 5600 QToF setup, acquiring IDA data in 10-minute runs. Most experiments involve collection of the following data types:
Pooled Samples MS2
Pooled Sample MS1
Individual Sample MS1
We do not convert the data to a common centroided format before ingesting it into MS-Dial. We churn the data on a dedicated Dell workstation that has a 6 core/6 hyperthreaded Xeon CPU, 32 GB of DDR4 RAM, and a mechanical HD. For small projects (5-30 samples) it's fine, but for larger projects (>30 samples) we run into issues. Specifically, the issues arise after the individual samples are processed, during the 'peak filling, identification, and alignment' stage. Add gap filling in there too.
The program bogs down the PC to the point where it's unresponsive or sluggish. MS-Dial's progress bar during that period often freezes at a low %. Sometimes the project completes and other times it does not. It could take half a day or multiple days to complete.
I've pulled up Task Manager and other monitoring software during the processing to see what's happening. During the initial sample processing the CPU use is most intensive, but since that step can be multi-threaded in parallel it's quick. During the aforementioned 'peak filling, identification, and alignment' stage the CPU usage drops to almost nothing, whereas the RAM and the disk activity shoot up to 90-100%.
I have a few questions
Are there tips on improving the speed at which larger projects can be computed?
Would a faster storage component, like a solid-state drive, improve the speed of that step?
How can we tell if the 'peak filling, identification, and alignment' is actually progressing or the program has stalled out as unresponsive?
I'd be happy to provide any information on our process, computer system, or software if it would help us tackle the problem. We like MS-Dial a lot, but the slowdown and questionable completion of the data processing have us questioning whether to keep using it.
Ok. We have a 5600 QToF as well and I investigated what you're showing with my own data. A few things pop out.
First, with respect to what you're seeing in Peakview, you have the wrong data displayed (MS1 scan). If it is IDA data you need to pull up the IDA explorer and find the parent ion at the expected RT to pull out the relevant MS2 scans. Those would properly reflect what MS-Dial is trying to find and display to you in the windows you showed containing only the parent ion.
I'm not sure if you're trying to process SWATH data or not. You mentioned IDA, but what's shown in the MS-Dial Parameters includes deconvolution settings that differ from the standard inputs including an MS2 cutoff. That might also be part of the problem, however if that's not the case then no worries.
Also why are you converting to mzML file format? That might also be part of the problem. You can use wiff files so long as you select 'profile' for both MS1 and MS2 settings up front.
I have a follow up MS-Finder question - how do you clear the cache or previous work when you want to start a new project or set of IDs. For some reason there isn't a 'clear all' button in the program nor can you select previous items to be cleared individually. It's rather confusing.
MS-DIAL automatically aligns all extracted peaks. I don't think the batch information is taken into account in that process. You have to set the RT window in the alignment tab high enough to catch any RT drift from batch to batch and then it will recognise those peaks as the same feature. If not, you will get multiple features with the peaks split between them. My students have tried various RT windows up to about 0.5 min, depending on how bad their batch effects are. The problem here is when you have isomeric pairs and the alignment mismatches them. There's no perfect solution, so you have to accept some compromises. If you want to get the most out of the data you have to go through your entire data set peak-by-peak to correct dodgy alignments and integrations.
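To illustrate the trade-off, here is a toy sketch of how an RT tolerance decides whether drifted peaks end up in one feature or get split. It is a simple greedy grouping I made up for illustration, not MS-DIAL's actual alignment algorithm, and the sample names and RT values are invented.

```python
def group_by_rt(peaks, rt_tol):
    """Greedy grouping of same-m/z peaks by retention time.

    peaks is a list of (sample_name, rt_minutes). A peak joins the current
    group while it is within rt_tol minutes of that group's first peak.
    """
    groups = []
    for sample, rt in sorted(peaks, key=lambda p: p[1]):
        if groups and rt - groups[-1][0][1] <= rt_tol:
            groups[-1].append((sample, rt))
        else:
            groups.append([(sample, rt)])
    return groups


# Same compound in two batches; batch 2 drifted by about 0.3 min (made-up values).
peaks = [("b1_s1", 3.10), ("b1_s2", 3.12), ("b2_s1", 3.40), ("b2_s2", 3.42)]

narrow = group_by_rt(peaks, rt_tol=0.1)
wide = group_by_rt(peaks, rt_tol=0.5)
print(len(narrow), "features with 0.1 min tolerance")
print(len(wide), "feature(s) with 0.5 min tolerance")
```

With the narrow tolerance the two batches split into separate features; with the wide one they merge. The catch is that a genuine isomer eluting 0.3 min later would be merged in exactly the same way, which is the compromise described above.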
Thanks for the info. We've had some difficulty in reconciling a large batch of LC-MSMS data in which some subsets of collected samples shifted in RT by ~0.5 minutes or so. Trying to correct the data with the alignment wizard hasn't been super successful on our part, partly because the documentation for that process is poor and limited. If you don't correct it, the consequence is, as you mention, a larger than expected number of reference-matched ID hits spread over different %fills, because the software appears to think that the shifted samples and their QC samples contain different analytes than the 'correct' samples and their respective QC samples. Another thought was to drop the QCs from the messed up samples, but that doesn't appear to do much to improve the issue. Your other post is helpful, but also frustrating, because I feel like this should be a capability of the MS-Dial software, yet the options available to correct it either don't work or aren't documented well enough, specifically for the alignment wizard, to devise a solution within the program itself. MS-Dial is an awesome program when it works, but when it doesn't it can be frustrating.
You have to go in and make a judgement call on which one is correct. Do you have a standard to compare it against to verify the expected RT? That would help resolve it. Otherwise you can open the data in MS-Dial and see whether you have any representative MS2 to base the call on, judging by its similarity to a known standard another group produced. Hopefully it's not just the loss of the sodium. Your %Fill for the last two denotes that it was found in every sample at those RTs. Not sure if that helps either, but it might be an indicator.
There isn't enough information to go on here, but we could tell you how to better narrow it down with a bit more information.
In my haste I accidentally mixed up my file types. The Sciex .wiff raw files I was searching for were actually .mzXML files. Upon double checking the correct folder containing the .wiff files the program is behaving as expected. Let this be a lesson to double check the file type.
I'm still not getting any ref matched IDs, or even suggested ones, when I parse the files with similar project parameters. I've gone back and compared against v4.9 using the same raw files and the same in-house library. It's really confusing.
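On the file type mix-up, a quick way to double check what is actually sitting in a folder before pointing the import dialog at it is to count the files by extension. This is just a small stdlib sketch; the `count_extensions` name and the demo file names are made up.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_extensions(folder):
    """Count the files directly inside `folder` by lowercase extension."""
    return Counter(p.suffix.lower() for p in Path(folder).iterdir() if p.is_file())


# Demo on a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("qc_01.mzXML", "sample_01.mzXML", "notes.txt"):
        (Path(tmp) / name).touch()
    counts = count_extensions(tmp)

# A count of 0 for ".wiff" would mean this is the wrong folder for a .wiff import.
print(dict(counts), "wiff files:", counts[".wiff"])
```

Pointing `count_extensions` at the real data folder instead of the temporary one gives the same tally for your own files.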
I was excited to try out the new MS-Dial v5.1.22, but was surprised to run into some issues with file import. I was doubly confused by the tutorial videos provided which demonstrates import of Sciex files in a way that I cannot replicate.
The video is 'Creating new project' @ ~50 seconds it shows the user going to the file folder to retrieve the raw files. They have to use the file type selection choice to change from the default (.ABF) to the Sciex (.Wiff) file. When they do that the expected (.Wiffs) appear & can be selected for import. When I do that nothing shows up as it did in v4.9 & previous versions.
Doubly confounding is that the field for 'Analysis File Path' --> Browse, which leads to the aforementioned window, claims among other things '(Only ABF files now)', which you can briefly see in the video.
After converting my (.Wiff) files to (.ABF) I can see them when I repeat the process. Going forward with the project using the (.ABF) files, I cannot get any usable ref matched IDs for the data files using similar project settings as we had before. I've tried several times to no avail. I've ensured the polarity and fragmentation types are all correct. I've tried our in-house RT enforceable library with many different settings. In addition, I've also tried both centroid and profile settings to be sure. Nothing improves the results in a way that comes close to the results we get in v4.9.
This is quite confusing. Any thoughts on why this might be?
I just wanted to say, as a user of your software, that the new tutorial videos posted on YouTube are a huge boon. While the written tutorials on GitHub are also welcome, many of them appear to have been made with older versions of the software and don't necessarily reflect the changes made since. The video versions of the tutorial are a nice touch to show up-to-date information on the wonderful software this group has put together. Thank you so much for both the software & the informative videos.