Metabolomics Society Forum

Software => R => XCMS => Topic started by: A_Escourrou on July 01, 2016, 07:39:39 AM

Title: [QUESTION] Package that would filter the "bad gaussian curves" kept by XCMS
Post by: A_Escourrou on July 01, 2016, 07:39:39 AM
Hello everyone,

This is my first message on this forum, so first of all I'd like to thank the administrators, who have already helped me by creating it: I have found a lot of answers to my questions here!

Now, I am hoping that some of you might be able to answer this question:

Like many of you, we routinely process MS data with XCMS and deconvolute the features with CAMERA. As you already know, one challenge of metabolomics assays is dealing with the huge number of observed variables.

Therefore, I'd like to know whether an XCMS-dependent R package already exists that filters out the "bad gaussians" kept by XCMS and processed in CAMERA. It is quite time-consuming to check every single EIC, and CAMERA often produces "wrong matches" of adducts/fragments because of poor peak shape.

I might be dreaming but that would be great!

I hope I was clear enough, and thank you for your time,

Antoine Escourrou
Title: Re: [QUESTION] Package that would filter the "bad gaussian curves" kept by XCMS
Post by: jcapellades on July 02, 2016, 02:18:55 PM
Hi Antoine,
I do not know of any package that does what you describe.
If you want to improve your xcms processing settings, you may test this R package: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0562-8

Maybe someone else can help you better,

Jordi
Title: Re: [QUESTION] Package that would filter the "bad gaussian curves" kept by XCMS
Post by: A_Escourrou on July 05, 2016, 07:44:38 AM
Thank you Jordi for this quick answer.

IPO is a package that we already use too, and it is indeed part of the answer to my question, since it automates the optimization of XCMS parameters for the data at hand. This makes the data processing more reliable, but there is still no function to filter the "bad gaussians" that remain even after XCMS and IPO.

So my question still stands, despite your useful answer!

Thank you again,

Antoine
Title: Re: [QUESTION] Package that would filter the "bad gaussian curves" kept by XCMS
Post by: hpbenton on July 05, 2016, 12:20:37 PM
Dear Antoine,

You could also look at XCMS Online. In the mobile app we have released a quick EIC sorter named Hot or Not. This allows you to quickly annotate the good and bad EICs in the dataset. The annotations can then be used as a filter in the table view.

Cheers, Paul
Title: Re: [QUESTION] Package that would filter the "bad gaussian curves" kept by XCMS
Post by: A_Escourrou on July 07, 2016, 03:57:22 AM
Thank you Paul for this answer! I did not know at all that there was a mobile app for XCMS Online. This piece of information will be useful for me and surely for others as well!

This is really user-friendly and saves me time during re-processing, which is what I was ultimately looking for, so thank you again!
Title: Re: [QUESTION] Package that would filter the "bad gaussian curves" kept by XCMS
Post by: krista on August 16, 2016, 11:01:45 AM
Pre-2011, I found a script on the XCMS Google Groups forum that was written by Tony Larson. I updated it and now use it regularly to filter out peaks that don't meet Gaussian criteria. I run this script immediately after the peak picking in XCMS.

I hope this helps,
Krista

The script is as follows:


Code:
#original file version from the Google Groups for xcms from Tony Larson
#Krista Longnecker updated this August 2011

#peakShape function to remove non-gaussian peaks from an xcmsSet
#code originally had cor.val = 0.9; 0.5 is too low (not doing enough pruning)
peakShape <- function(object, cor.val=0.9)
{
require(xcms)

files <- object@filepaths
peakmat <- object@peaks
peakmat.new <- matrix(-1,1,ncol(peakmat))
colnames(peakmat.new) <- colnames(peakmat)
for(f in 1:length(files))
        {
        xraw <- xcmsRaw(files[f], profstep=0)
        sub.peakmat <- peakmat[which(peakmat[,"sample"]==f),,drop=FALSE]
        corr <- numeric()
        #seq_len() skips the loop cleanly when a sample has no peaks
        for (p in seq_len(nrow(sub.peakmat)))
                {
                #extract using rawEIC method +/- 0.001 m/z to give smoother traces
                tempEIC <- as.integer(rawEIC(xraw,mzrange=c(sub.peakmat[p,"mzmin"]-0.001,sub.peakmat[p,"mzmax"]+0.001))$intensity)
                minrt.scan <- which.min(abs(xraw@scantime-sub.peakmat[p,"rtmin"]))[1]
                maxrt.scan <- which.min(abs(xraw@scantime-sub.peakmat[p,"rtmax"]))[1]
                eics <- tempEIC[minrt.scan:maxrt.scan]
                #set min to 0 and normalise
                eics <- eics-min(eics)
                if(max(eics)>0)
                        {
                        eics <- eics/max(eics)
                        }
                #fit gauss and let failures to fit through as corr=1
                fit <- try(nls(y ~ SSgauss(x, mu, sigma, h), data.frame(x = 1:length(eics), y = eics)),silent=TRUE)
                if(inherits(fit, "try-error"))
                        {
                        corr[p] <- 1
                        } else
                        {
                        #calculate correlation of eics against gaussian fit
                        #need more than 4 usable points and more than 4 distinct values on each side
                        if(sum(!is.na(eics-fitted(fit))) > 4 &&
                           sum(!is.na(unique(eics))) > 4 && sum(!is.na(unique(fitted(fit)))) > 4)
                                {
                                #cor.test drops incomplete pairs itself; a failed test comes back as a try-error
                                ct <- try(cor.test(eics,fitted(fit),method="pearson"),silent=TRUE)
                                if (!inherits(ct,"try-error") && !is.na(ct$p.value) && ct$p.value <= 0.05)
                                        {
                                        corr[p] <- ct$estimate
                                        } else corr[p] <- 0
                                } else corr[p] <- 0
                        }
                }
        #drop=FALSE keeps a matrix even when only one peak survives
        filt.peakmat <- sub.peakmat[which(corr >= cor.val),,drop=FALSE]
        peakmat.new <- rbind(peakmat.new, filt.peakmat)
        n.rmpeaks <- nrow(sub.peakmat)-nrow(filt.peakmat)
        cat("Peakshape evaluation: sample ", sampnames(object)[f]," ",n.rmpeaks,"/",nrow(sub.peakmat)," peaks removed","\n")
        if (.Platform$OS.type == "windows") flush.console()
        }

#remove the placeholder first row; drop=FALSE keeps a matrix if a single peak remains
peakmat.new <- peakmat.new[-1,,drop=FALSE]

object.new <- object
object.new@peaks <- peakmat.new
return(object.new)
}
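
For readers who want to see the per-peak test in isolation, here is a minimal base-R sketch of the same idea: normalise a trace, fit a Gaussian, and correlate the fit with the data. It is only an illustration under stated assumptions. gaussCorr() is a hypothetical helper (not part of xcms or of the script above), the Gaussian is written out explicitly with crude starting values instead of using the self-starting SSgauss model, and the two traces are synthetic.

```r
# gaussCorr(): hypothetical helper, for illustration only
gaussCorr <- function(eic) {
  # baseline-subtract and scale to a maximum of 1, as the script does
  eic <- eic - min(eic)
  if (max(eic) > 0) eic <- eic / max(eic)
  d <- data.frame(x = seq_along(eic), y = eic)
  # explicit Gaussian model with crude starting values (an assumption;
  # the script relies on SSgauss to pick starting values itself)
  fit <- try(nls(y ~ h * exp(-(x - mu)^2 / (2 * sigma^2)), data = d,
                 start = list(mu = which.max(eic),
                              sigma = length(eic) / 6,
                              h = 1)),
             silent = TRUE)
  # unlike the script, report a failed fit as NA rather than corr = 1
  if (inherits(fit, "try-error")) return(NA_real_)
  # Pearson correlation between the observed trace and the fitted curve
  unname(cor(d$y, fitted(fit)))
}

set.seed(1)
x <- 1:31
# synthetic Gaussian-shaped peak with a little noise
good <- 1000 * exp(-(x - 16)^2 / (2 * 4^2)) + rnorm(31, sd = 20)
# synthetic trace with no peak shape at all
bad <- runif(31, 0, 1000)

gaussCorr(good)  # close to 1 for a clean peak
gaussCorr(bad)   # low, or NA if the fit does not converge
```

A peak would pass the filter when this value is at least cor.val. Note that the script above deliberately lets failed fits through (corr = 1), whereas this sketch returns NA so the two cases stay distinguishable.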

Title: Re: [QUESTION] Package that would filter the "bad gaussian curves" kept by XCMS
Post by: Jan Stanstrup on August 16, 2016, 11:24:36 AM
Interesting. But centWave already outputs "egauss" (the RMSE of the Gaussian fit). Couldn't you just filter by that? As far as I can tell from the code, it is basically doing the same thing, just using correlation instead of RMSE.
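
A rough sketch of what filtering on egauss could look like, assuming the peak table actually contains an egauss column (with some xcms versions this may require verbose.columns = TRUE in addition to fitgauss = TRUE). filterByEgauss() is a hypothetical helper, the 0.2 cutoff is arbitrary, and the toy matrix only stands in for the peaks slot of an xcmsSet.

```r
# filterByEgauss(): hypothetical helper; max.egauss = 0.2 is an arbitrary cutoff
filterByEgauss <- function(peakmat, max.egauss = 0.2) {
  stopifnot("egauss" %in% colnames(peakmat))
  # peaks with NA in egauss (no usable Gaussian fit) are dropped here
  keep <- !is.na(peakmat[, "egauss"]) & peakmat[, "egauss"] <= max.egauss
  # drop = FALSE keeps a matrix even if zero or one peak survives
  peakmat[keep, , drop = FALSE]
}

# toy peak matrix standing in for the peaks slot of an xcmsSet
peaks <- cbind(mz     = c(150.1, 200.2, 300.3, 410.4),
               rt     = c(60, 120, 180, 240),
               egauss = c(0.05, 0.45, NA, 0.15))

filterByEgauss(peaks)
```

On a real object the filtered matrix would be written back to the peaks slot before grouping, in the same way the peakShape script does. What counts as a sensible cutoff would have to be judged by inspecting EICs from your own data.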
Title: Re: [QUESTION] Package that would filter the "bad gaussian curves" kept by XCMS
Post by: krista on August 17, 2016, 08:01:57 AM
I do set fitgauss to TRUE in the centWave step. However, I found that I still had peaks that were not of as high quality as I had hoped. The peakShape code lets me be stricter in requiring peaks to fit a Gaussian, and I can tune how strict it is by setting the correlation threshold (cor.val) higher or lower.