Hello Hiroshi,
    I don't know if anyone has asked about this in the forum before. I have difficulty understanding what "gap-fill" is and when and how the program gap-fills. I am currently using version 4.18.
    For instance, under the EIC plot window at the top, there is a table-viewer function in the right-click menu where I can view the chromatograms of a specific alignment across samples. One of the columns tells whether the peak was gap-filled (shown as -2) or not. However, what I don't understand is the last column, which shows the peak with points in the chromatogram: for peaks that are gap-filled, does the program make up the points of the peak? If gap-filled means that the peak does not exist, why would the program create points for a non-existent peak? On the other hand, if the points are based on an actual, existing peak, why would the program gap-fill? If this sounds confusing and you would prefer some screenshots, please let me know. I appreciate your help!


I have pasted my previous answers on the same topic below (they were not posted in this forum). I hope they help you.


Question 1. If I understand correctly, gap filling adds a peak to a sample in which a peak was not found. Is that correct?

My answer: A peak below the "minimum abundance" parameter of the peak detection tab will not be detected even though the peak exists. It can still be gap-filled in the alignment process if a peak with the same RT and m/z is detected in other samples.

Question 2. It seems that in the alignment a sample was labelled as gap-filled, but when I look at the sample I have a peak there.

My answer: This is actually quite difficult to explain. It is most likely a limitation of our alignment and gap-filling process.
In the alignment process, the program makes a master peak list which, in the worst case, looks like this:

(A) RT: 1.05 min, m/z: 100.01
(B) RT: 1.1 min, m/z: 100.015
(C) RT: 1.2 min, m/z: 100.03.

Then a peak with, for example, RT: 1.1 min and m/z: 100.02 in a file (Q) will be aligned to (B) of the master list. After that, the program recognizes that file (Q) has no peaks for (A) and (C), and it performs gap filling to fill in those values. However, suppose the user sets the alignment tolerances to 0.1 min for RT and 0.01 Da for m/z. The extracted ion chromatogram used to gap-fill peak (A) is then drawn with RT = 1.05 min +/- 0.1 min and m/z = 100.01 +/- 0.01, so the newly created peak for (A) in file (Q) will be nearly identical to the peak that file (Q) already has for (B). As a result, MS-DIAL is expected to export very similar peak height/area values for (A) and (B) of the master peak list in the result for file (Q). Here, (B) is recorded as "detected" and (A) is recorded as "gap-filled", although both come from the same peak.
You can check the peak origins (detected or gap-filled) in the 'peak id' matrix from the alignment result export option, where a value of "-1" indicates that the peak value was inserted by the gap-filling process.
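
The overlap in the worked example above can be checked with a small sketch. This is only an illustration of the tolerance windows, not MS-DIAL's actual code: the function name, the inclusive window rule, and the small epsilon for floating-point noise are all assumptions.

```python
# Hypothetical illustration; the names and the inclusive-window rule are
# assumptions, not MS-DIAL's actual implementation.
EPS = 1e-9  # guards against floating-point noise at the window edges

# Master peak list from the example: name -> (RT in min, m/z)
master = {
    "A": (1.05, 100.010),
    "B": (1.10, 100.015),
    "C": (1.20, 100.030),
}


def windows_containing(peak_rt, peak_mz, master_list, rt_tol, mz_tol):
    """Return the master-list entries whose tolerance window contains the peak."""
    hits = []
    for name, (rt, mz) in master_list.items():
        in_rt = abs(peak_rt - rt) <= rt_tol + EPS
        in_mz = abs(peak_mz - mz) <= mz_tol + EPS
        if in_rt and in_mz:
            hits.append(name)
    return hits


# The single peak of file (Q): RT 1.1 min, m/z 100.02
hits = windows_containing(1.10, 100.02, master, rt_tol=0.1, mz_tol=0.01)
print(hits)
```

With these tolerances the one real peak in file (Q) sits inside the window of more than one master-list entry, which is why a gap-filled entry can end up with almost the same height/area as a detected one.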

Thank you lh1989 for asking this question and Hiroshi for your explanation. I have to admit that I am not really satisfied with the automated gap filling, and until now I thought I could switch the gap filling on or off as needed. I also wondered why there were no gaps in my dataset, even when "Data processing" -> "Analysis parameter setting" -> "Alignment" -> "gap filling by compulsion" was off.

Now I am considering exporting the aligned data and replacing the gap-filled spots with 1/10 of the peak minimum, so that I can concentrate on measured values instead of interpolations. I have the feeling that this modification of my data has consequences for the final assessment (especially if many gaps were filled). But I really like the statistics etc. of MS-DIAL and really don't want to leave the software! How do the others feel about the gap filling? Or is there another option to avoid the gap filling?
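
A minimal sketch of that replacement, assuming the exported values and the 'peak id' matrix have been loaded as two row-aligned lists of lists. The function name is hypothetical, "peak minimum" is taken here to mean the smallest measured value in the same row, and any negative peak id is treated as gap-filled (the thread mentions -1 in the export and -2 in the table viewer).

```python
def replace_gap_filled(values, peak_ids, divisor=10):
    """Replace gap-filled cells with (row minimum of measured values) / divisor.

    Assumptions (not confirmed MS-DIAL behaviour): a negative peak id marks a
    gap-filled cell, and rows without any measured value are left unchanged.
    """
    cleaned = []
    for row_vals, row_ids in zip(values, peak_ids):
        measured = [v for v, pid in zip(row_vals, row_ids) if pid >= 0]
        if not measured:
            cleaned.append(list(row_vals))  # nothing to derive a minimum from
            continue
        floor = min(measured) / divisor
        cleaned.append(
            [v if pid >= 0 else floor for v, pid in zip(row_vals, row_ids)]
        )
    return cleaned


# Toy data: one row per aligned spot, one column per sample.
values = [[1000.0, 200.0, 900.0], [50.0, 60.0, 55.0]]
peak_ids = [[12, -1, 40], [-1, -1, -1]]
cleaned = replace_gap_filled(values, peak_ids)
print(cleaned)
```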



After I learned how to identify the gap-filled values in my dataset, I looked more closely at how much they might influence my dataset. What I basically did was:

1) I filtered all gap filled spots and compared their mean with the mean of the real values (measured) of this peak (row-wise).
2) I calculated a ratio of real value / gap filled values and converted this ratio into a factor.
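
The two steps above could be sketched roughly as follows. This is only my reading of the procedure: the function name is hypothetical, negative peak ids are assumed to mark gap-filled cells, and the later conversion of the ratio into a factor is not reproduced here.

```python
from statistics import mean


def measured_to_gap_filled_ratio(row_vals, row_ids):
    """Ratio of mean(measured values) to mean(gap-filled values) for one row.

    Assumption: a negative peak id marks a gap-filled cell. Returns None when
    the row has no measured cells, no gap-filled cells, or a zero gap-filled mean.
    """
    measured = [v for v, pid in zip(row_vals, row_ids) if pid >= 0]
    filled = [v for v, pid in zip(row_vals, row_ids) if pid < 0]
    if not measured or not filled or mean(filled) == 0:
        return None
    return mean(measured) / mean(filled)


# Toy rows: one aligned spot each, one column per sample.
ratio_mixed = measured_to_gap_filled_ratio([100.0, 300.0, 50.0, 150.0], [3, 7, -1, -1])
ratio_no_gaps = measured_to_gap_filled_ratio([10.0, 20.0], [1, 2])
print(ratio_mixed, ratio_no_gaps)
```

A ratio below 1 means the gap-filled cells are on average higher than the measured ones for that spot.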

The result of this comparison is plotted in Plot_gap_filling - Copy.png. Most of you will find this plot self-explanatory, but I still want to highlight what it shows.

Peaks to the left of the red line (in my case ~22% of the data) have higher values in the gap-filled cells than in the measured cells. All the bars between 1 (equal values) and 5 (measured values 5x higher) depict cases (43% of all peaks) where the real values and the interpolated values are very close.

This has a great impact on my results, and I am wondering if I am doing something wrong. Is there someone in the metabolomics community who wants to comment on this?

I greatly appreciate your comments or explanations!


Hi Stefan,

I am actually interested in this result. Could you please generate the results for both with and without the option of "gap filling by compulsion"?