Topic: Gap-filled  (Read 6904 times)

Gap-filled

Hello Hiroshi,
    I don't know if anyone has asked about this in the forum before. I have difficulty understanding what "gap-fill" is and when and how the program gap-fills. I am currently using version 4.18.
    For instance, in the EIC plot window at the top, the right-click menu offers a table viewer where I can view the chromatograms of a specific alignment spot across samples. One column shows whether the peak was gap-filled (shown as -2) or not. What I don't understand is the last column, which shows the peak with points in the chromatogram: for gap-filled entries, does the program make up the points of a peak? If gap-filled means that the peak does not exist, why would it create points for a non-existent peak? On the other hand, if the points come from an actual, existing peak, why would the program gap-fill it? If this sounds confusing and you would prefer some screenshots, please let me know. I appreciate your help!
   

-Luann

Re: Gap-filled

Reply #1
I would like to get more familiar with the MS-DIAL software. I have previously been working with Sciex software, but I would like to explore MS-DIAL as well. To do so, I have two questions for you:

•   Can I use MS-DIAL to analyze PFAS compounds in a non-target analysis workflow?
•   I have raw data in .wiff format and would like to convert it to ABF format with the Abf Converter. However, I get the following error when I try to convert the files. Do you have any thoughts on how I can resolve this issue?

Re: Gap-filled

Reply #2
Hi,

Below I have pasted my previous answers on the same topic (they were not posted in this forum). I hope they help you.
Thanks,

Hiroshi

Question 1. If I understand correctly, gap filling adds a peak to a sample in which no peak was found. Is that correct?

My answer: A peak below the "minimum abundance" parameter in the peak detection tab will not be detected even though it exists, but it can be gap-filled during the alignment process if a peak with the same RT and m/z is detected in other samples.
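The rule in this answer can be sketched in a few lines of R. This is a simplified model written for illustration, not MS-DIAL's code: the function name `feature_status()` is hypothetical, each sample is reduced to a single raw peak height, and `min_abundance` stands in for the "minimum abundance" parameter.

```r
# Simplified sketch (NOT MS-DIAL's implementation): status of ONE aligned
# feature across samples, given each sample's raw peak height for it.
# 'min_abundance' stands in for the "minimum abundance" detection parameter.
feature_status <- function(heights, min_abundance) {
  detected <- heights >= min_abundance
  # A feature detected in no sample never becomes an alignment spot at all
  if (!any(detected)) return(NULL)
  # Below-threshold samples still contain signal; alignment fills them in
  ifelse(detected, "detected", "gap-filled")
}

# Hypothetical heights: S3 has a real but small peak below the threshold
feature_status(c(S1 = 5000, S2 = 4200, S3 = 600), min_abundance = 1000)
```

In this toy case S3 is reported as gap-filled even though a real (small) peak is present, which matches the situation described in the question.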

Question 2. It seems that in the alignment a sample was labelled as gap-filled, but when I look at that sample, there is a peak.

My answer: This is actually difficult to explain. It is likely an issue in our alignment and gap-filling process.
In the alignment process, the program builds a master peak list, which in the worst case looks like:

(A) RT: 1.05 min, m/z: 100.01
(B) RT: 1.1 min, m/z: 100.015
(C) RT: 1.2 min, m/z: 100.03.

Then, a peak with, e.g., RT 1.1 min and m/z 100.02 in a file (Q) will be aligned to (B) of the master list. After that, the program recognizes that file (Q) has no peaks for (A) and (C), and performs the gap-filling method to fill those values. However, if, for example, users set the alignment tolerances to 0.1 min for RT and 0.01 Da for m/z, the newly created peak for master list entry (A) in file (Q) should be nearly identical to the peak that file (Q) has for entry (B), because the extracted ion chromatogram for gap-filling peak (A) is drawn with the tolerances RT = 1.05 +/- 0.1 min and m/z = 100.01 +/- 0.01. As a result, MS-DIAL is expected to export very similar peak height/area values for entries (A) and (B) in the result for file (Q). Here, (B) is recognized as "detected" and (A) as "gap-filled", although both values originate from the same peak.
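The tolerance argument above can be reproduced with a small R sketch. It is a simplification under stated assumptions, not MS-DIAL's gap-filling routine: the raw points of file (Q) are invented, and the hypothetical `fill_from_window()` simply takes the highest intensity inside RT +/- 0.1 min and m/z +/- 0.01 Da instead of integrating a real extracted ion chromatogram.

```r
# Master peak list from the example above
master <- data.frame(
  id = c("A", "B", "C"),
  rt = c(1.05, 1.10, 1.20),
  mz = c(100.01, 100.015, 100.03)
)

# Hypothetical raw (rt, m/z, intensity) points in file Q: one real peak
# around RT 1.1 min, m/z 100.02, plus an unrelated point far away
raw_q <- data.frame(
  rt        = c(1.08, 1.10, 1.12, 2.50),
  mz        = c(100.02, 100.02, 100.02, 150.00),
  intensity = c(4000, 10000, 3500, 800)
)

# Hypothetical window lookup: highest intensity within RT +/- rt_tol and
# m/z +/- mz_tol; eps keeps points lying exactly on the window edge from
# being lost to floating-point rounding
fill_from_window <- function(raw, rt, mz, rt_tol = 0.1, mz_tol = 0.01, eps = 1e-9) {
  hit <- abs(raw$rt - rt) <= rt_tol + eps & abs(raw$mz - mz) <= mz_tol + eps
  if (!any(hit)) return(NA_real_)
  max(raw$intensity[hit])
}

master$height_q <- mapply(fill_from_window, rt = master$rt, mz = master$mz,
                          MoreArgs = list(raw = raw_q))
print(master)
```

With these tolerances, entries (A), (B) and (C) all pick up the same intensity (10000) from the single real peak in (Q), which is why a "gap-filled" value can look identical to a "detected" one.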
You can check the peak origin (detected or gap-filled) in the 'peak id' matrix available from the alignment result export option, where "-1" is written when the value was inserted by the gap-filling process.
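To see how many values were inserted by gap filling, the exported peak-ID matrix can be summarized per sample and per peak. A minimal R sketch, assuming the sample columns of that table have already been read into a numeric matrix; `gap_fill_summary()` is a hypothetical helper, and `flag` must be set to whatever marker your export uses (this thread mentions both -1 and -2).

```r
# Hypothetical helper: summarize gap-filled cells in a peak-ID matrix
# (rows = aligned peaks, columns = samples)
gap_fill_summary <- function(ids, flag) {
  is_gap <- ids == flag
  list(
    per_sample = colMeans(is_gap),            # fraction of gap-filled cells per sample
    per_peak   = rowSums(is_gap),             # number of gap-filled samples per peak
    complete   = sum(rowSums(is_gap) == 0)    # peaks without any gap-filled cell
  )
}

# Toy peak-ID matrix with -2 as the gap-fill marker
ids <- matrix(c(3, -2,  7,
                4,  5, -2,
                1,  2,  6), nrow = 3, byrow = TRUE,
              dimnames = list(NULL, c("S1", "S2", "S3")))
s <- gap_fill_summary(ids, flag = -2)
print(s)
```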




Re: Gap-filled

Reply #3
Thank you lh1989 for asking this question and Hiroshi for your explanation. I have to admit that I am not really satisfied with the automated gap filling; until now I thought I could switch gap filling on or off as needed. I also wondered why there were no gaps in my dataset, even when "Data processing" -> "Analysis parameter setting" -> "Alignment" -> "gap filling by compulsion" was turned off.

Now I am considering exporting the aligned data and replacing the gap-filled spots with 1/10 of the peak minimum, so that I can concentrate on measured values instead of interpolations. I have the feeling that this modification of my data has consequences for the final assessment (especially if many gaps were filled). But I really like the statistics etc. of MS-DIAL and really don't want to leave the software! How do others feel about the gap filling? Or is there another option to avoid it?
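The replacement described here could look like the following R sketch. It interprets "1/10 of the peak minimum" as one tenth of the smallest measured (non-gap-filled) value in the same aligned peak; that interpretation, the helper name `replace_gap_filled()` and the toy matrices are assumptions for illustration.

```r
# Hypothetical helper: replace gap-filled cells with min(measured) / divisor,
# computed row-wise (rows = aligned peaks, columns = samples)
replace_gap_filled <- function(heights, ids, flag = -2, divisor = 10) {
  out <- heights
  for (i in seq_len(nrow(heights))) {
    gap <- ids[i, ] %in% flag
    # Only rows that have both gap-filled and measured cells are changed
    if (any(gap) && any(!gap)) {
      out[i, gap] <- min(heights[i, !gap]) / divisor
    }
  }
  out
}

# Toy data: row 1 has a gap-filled cell in sample 3, row 2 in sample 2
heights <- matrix(c( 500, 800,  30,  700,
                    1200,  60, 900, 1000,
                      50,  40,  45,   55), nrow = 3, byrow = TRUE)
ids     <- matrix(c(10, 11, -2, 12,
                    20, -2, 21, 22,
                    30, 31, 32, 33), nrow = 3, byrow = TRUE)
res <- replace_gap_filled(heights, ids)
print(res)
```

Using the row minimum of measured cells keeps the substitute below every real observation of that peak, which is the intent of a "below detection" placeholder.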

Cheers,
Stefan

Re: Gap-filled

Reply #4
After learning how to identify the gap-filled values in my dataset, I analyzed in more depth how much they might influence it. What I basically did was:

1) For each aligned peak (row-wise), I separated the gap-filled cells and compared their mean with the mean of the measured (real) values of this peak.
2) I calculated the ratio of measured values / gap-filled values and converted this ratio into a factor.

The result of this comparison is plotted in Plot_gap_filling - Copy.png. Most of you will find this plot self-explanatory, but I still want to highlight what it shows.

Peaks to the left of the red line (~22% of the data in my case) have higher values in the gap-filled cells than in the measured cells, and all bars between 1 (equal values) and 5 (measured values 5x higher) depict cases (43% of all peaks) where the measured and the interpolated values are very close.

This has a great impact on my results, and I am wondering whether I am doing something wrong. Is there someone in the metabolomics community who wants to comment on this?

I would greatly appreciate your comments or explanations!

Stefan

Re: Gap-filled

Reply #5
Hi Stefan,

I am actually interested in this result. Could you please generate the results both with and without the "gap filling by compulsion" option?
Thanks,

Hiroshi

 

Re: Gap-filled

Reply #6
Dear Hiroshi,

thank you for your reply! I really appreciate having direct contact with the developer (which hasn't been the case with any other software so far). I have now run the alignment with and without the "Gap filling by compulsion" option. There is basically no difference in the output: there are still many peaks filled with values higher than the measured values. Please have a look at the attached plot. This time I evaluated another dataset, so the plot differs slightly, but the main message stays the same.

If someone else wants to explore their own alignment results, here is my R code. Please adapt the marked values (file names, number of samples, first sample column) to your data.
 
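# Compare measured vs gap-filled values per aligned peak in MS-DIAL exports.
# Adapt to your data: file names, number of samples, first sample column.
aligned_samples <- 64   # number of sample columns
first_col <- 29         # first sample column in the exported tables
sample_cols <- first_col:(first_col + aligned_samples - 1)

ID <- read.table("PeakID_1_20201221427.txt", header = TRUE, sep = "\t", dec = ".", skip = 4)
ID_data <- ID[, sample_cols]

data <- read.table("Normalized_1_20201221427.txt", header = TRUE, sep = "\t", dec = ".", skip = 4)
data <- data[, sample_cols]

flag <- -2  # peak ID value that marks a gap-filled cell

ratio <- numeric(nrow(data))
for (i in seq_len(nrow(data))) {
  gap <- unlist(ID_data[i, ], use.names = FALSE) %in% flag
  values <- as.numeric(unlist(data[i, ], use.names = FALSE))
  # mean of measured cells / mean of gap-filled cells (NaN if either set is empty)
  ratio[i] <- mean(values[!gap]) / mean(values[gap])
  data[i, gap] <- NA  # blank out gap-filled cells for downstream analyses
}

# Rows with an undefined ratio: no gap-filled cell (complete alignment) or no measured cell
no_gap_fill <- sum(is.nan(ratio))
ratio <- ratio[!is.nan(ratio)]  # logical indexing also works when no row is NaN

lr <- log10(ratio)
lr <- lr[is.finite(lr)]  # drop NA and zero/infinite ratios that cannot be placed on a log axis
min_max <- c(floor(min(lr)), ceiling(max(lr)))

# Histogram plot
hist(lr, las = 1, main = "MS-DIAL log10(real values/gap filled values)", breaks = 100,
     xlab = "log10 ratio", xlim = min_max, xaxt = "n")
abline(v = 0, col = "red")  # log10(1): measured and gap-filled means are equal
ticks <- min_max[1]:min_max[2]
# Negative ticks are labelled -10^|k| (gap-filled higher), the others 10^k
axis(1, at = ticks, labels = ifelse(ticks < 0, -10^abs(ticks), 10^ticks))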



Cheers,
Stefan

Re: Gap-filled

Reply #7
Dear Hiroshi,

I have another question regarding the default gap filling. Are peaks that were generated in silico (and marked by -2) excluded from the blank filtering? In the worst case, there are no peaks in the blank, but after the alignment, in-silico generated peaks have replaced the empty spots. These peaks might then have the same average as the samples, and therefore the peak is excluded from the alignment. Have you considered this scenario?
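The scenario can be illustrated with a hypothetical blank filter in R. The rule used here (keep a peak if its sample maximum exceeds 5 times the blank average), the function name `blank_filter()` and all values are assumptions chosen for illustration, not necessarily what MS-DIAL applies; the point is only to compare a blank average that includes gap-filled cells with one that ignores them.

```r
# Hypothetical blank filter: keep a peak if max(samples) > fold * blank average.
# With ignore_gap_filled = TRUE, only detected blank cells enter the average;
# a peak with no detected blank cell gets a blank average of 0.
blank_filter <- function(samples, blanks, blank_ids, flag = -2, fold = 5,
                         ignore_gap_filled = TRUE) {
  keep <- logical(nrow(samples))
  for (i in seq_len(nrow(samples))) {
    use <- if (ignore_gap_filled) !(blank_ids[i, ] %in% flag) else rep(TRUE, ncol(blanks))
    blank_avg <- if (any(use)) mean(blanks[i, use]) else 0
    keep[i] <- max(samples[i, ]) > fold * blank_avg
  }
  keep
}

# Peak 1: both blank cells are gap-filled; peak 2: both are truly detected
samples   <- matrix(c(1000, 950,
                      1000, 900), nrow = 2, byrow = TRUE)
blanks    <- matrix(c(900, 880,
                      800, 820), nrow = 2, byrow = TRUE)
blank_ids <- matrix(c(-2, -2,
                       5,  6), nrow = 2, byrow = TRUE)

blank_filter(samples, blanks, blank_ids)                             # peak 1 kept
blank_filter(samples, blanks, blank_ids, ignore_gap_filled = FALSE)  # peak 1 removed
```

When gap-filled blank values are counted as real, peak 1 is filtered out even though nothing was detected in the blanks, which is the concern raised in this post.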

Greetings from Germany
Stefan

Re: Gap-filled

Reply #8
Hi Stefan,

can you share the dataset with me, together with a few explanatory slides, so that I can understand your issue on my side?
I will look into it to improve the MS-DIAL program very soon.
Thanks,

Hiroshi

Re: Gap-filled

Reply #9
Hi guys, any follow-up on this thread?
Would a way forward be to allow disabling the gap-filling algorithm, so that users can decide on their own how to impute missing values?
Best regards, Carlos.

Re: Gap-filled

Reply #10
Hey Everyone,

I was busy writing a paper, so I haven't checked the forum for a while. @Hiroshi: I will supply some data for testing by the end of the week.

I could imagine a simple solution: adding an option "clear gap-filled values after alignment" to the alignment tab (so that the bar plot of a peak is not affected by gap-filled values), or at least offering this option in the alignment export. The locations of the gap-filled spots are already tracked, so this shouldn't be a big hurdle.
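Until such an option exists, the proposed "clear gap-filled values after alignment" step can be approximated as post-processing on the exported tables. A minimal R sketch, assuming the heights and the peak IDs have been read into matrices of the same shape; the function name is hypothetical and -2 is the gap-fill flag used earlier in this thread.

```r
# Hypothetical post-processing step: set every gap-filled cell to NA
clear_gap_filled <- function(heights, ids, flag = -2) {
  stopifnot(identical(dim(heights), dim(ids)))
  heights[!is.na(ids) & ids == flag] <- NA
  heights
}

h   <- matrix(c(1, 2, 3, 4), nrow = 2)    # toy heights
ids <- matrix(c(7, -2, -2, 8), nrow = 2)  # toy peak IDs, two gap-filled cells
res <- clear_gap_filled(h, ids)
print(res)
```

Leaving NA instead of a substitute value lets each user choose their own imputation afterwards, as suggested in the previous reply.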

But I know Hiroshi receives many requests, so I don't want to underestimate the workload. Instead, I'd rather say thank you for this handy tool!

Cheers
Stefan

Re: Gap-filled

Reply #11
Hi Carlos and Stefan

Please evaluate the following MS-DIAL build:
https://briefcase.riken.jp/public/T9PMQAKnKYhAAkc

In this build, I changed the meaning of the "Gap filling by compulsion" option: when it is unticked, peaks are not filled at all in the alignment gap-filling process. Just untick the "Gap filling by compulsion" option in the alignment tab.

Let me know your thoughts.
Thanks,

Hiroshi

Re: Gap-filled

Reply #12
Hi Hiroshi, it seems to work just fine!
Thanks for it.
Best regards,

Re: Gap-filled

Reply #13
Dear Hiroshi,

Has this improvement been incorporated into version 4.7? I have the feeling that I am seeing the same issue.

Cheers,

Henry