Johannes - thanks for taking the time to go through and find the error, and for tweaking my code. I have learned a few more short cuts in looking at how you changed the function.
With this, I am not ready to move into XCMS > 3. I look forward to seeing future developments in the program.
Hi Jan, Sorry...sample code would be helpful. In the section below where I source 'peakShape_XCMS3.r' that is the code in the chunk from my first post (if I could figure out how to post an attachment, I would just attach it...).
A while back, Tony Larson posted code on the old Google groups forum. The code went through each mzRT feature and checked for a Gaussian fit for each peak. I have found it is stricter that the 'fitgauss=TRUE' in centWave and using XCMS < 3.0, I implement the code after I do the peakPicking. I am trying to implement this in XCMS > 3.0, but have run into one final hiccup when I try to reinsert my results back into the XCMSnExp object.
The code follows. I have left the 'old code' in the file for now, but commented out. I have managed to figure out how to get the data I want out of the mzML files, but I am stuck at the end (comment says 'STUCK HERE'). It seems like I should be able to use chromPeaks <- to set my answer. It will allow me to do this, but then I lose all the chromatography information that is needed in downstream processing.
Any thoughts? This is the final step keeping me from switching to XCMS > 3.0, so I hope to get it working.
#and then proceed with retention time correction and correspondence/grouping # Decreasing cor.val will allow more non-gaussian peaks through the filter
#original file version from the Google Groups for xcms from Tony Larson #KL corrected this version 8/23/2011 for XCMS < 3 #KL updating to using XCMS3, 10/30/2018
#original peakShape function to remove non-gaussian peaks from an xcmsSet #code originally had cor.val = 0.9; 0.5 is too low (not doing enough pruning) #object is updated to be an XCMSnExp class
#now set the Rt range...use sub.peakmat min and max RT rtRange = c(sub.peakmat[p,"rtmin"],sub.peakmat[p,"rtmax"]) subsetOnMZandRT <- filterRt(subsetOnMZ, rt = rtRange)
eics <- intensity(subsetOnMZandRT) #get the intensity values eics[sapply(eics, function(x) length(x)==0)] <- 0 #if empty in a scan, convert to 0 eics <- as.integer(unlist(eics)) #use as.double for Lumos
#filter out features that are less than the noise level I have already set... setThreshold <- which(eics < useNoise) eics <- eics[-setThreshold] rm(setThreshold)
#remove any NA (easier bc downstream leaving it in causes issues) eics <- eics[!is.na(eics)]
getIdx <- which(eics == min(eics))[1] #if multiple values, just need the first match #set min to 0 and normalise eics <- eics-eics[getIdx]
if(max(eics,na.rm=TRUE)>0) { eics <- eics/max(eics, na.rm=TRUE) } #fit gauss and let failures to fit through as corr=1 fit <- try(nls(y ~ SSgauss(x, mu, sigma, h), data.frame(x = 1:length(eics), y = eics)),silent=T)
if(class(fit) == "try-error") { corr[p] <- 1 } else { #calculate correlation of eics against gaussian fit if (length(which(!is.na(eics - fitted(fit)))) > 4 && length(!is.na(unique(eics)))>4 && length(!is.na(unique(fitted(fit))))>4) { cor <- NULL options(show.error.messages = FALSE) cor <- try(cor.test(eics,fitted(fit),method="pearson",use="complete")) options(show.error.messages = TRUE) if (!is.null(cor)) { if(cor$p.value <= 0.05) { corr[p] <- cor$estimate } else { corr[p] <- 0 } } else corr[p] <- 0 } else corr[p] <- 0 } } #this ends to the 'p' loop (going through one mzRT feature at a time)
Jo, This is perfect. Exactly what I needed. The plots are helpful, and the code to loop over each row in chrs generates the PDF I want. For others, here's what I have:
#setup the name of the PDF file that I want: fName_peaksPicked <- 'yourFilenamehere.pdf'
#preset the colors, use ncol to figure out how many samples I have (and hence # of colors needed)nColors <- ncol(chrs) useColors <- brewer.pal(nColors,'YlGnBu')
cairo_pdf(file = fName_peaksPicked,onefile=TRUE) for (i in 1:nrow(chrs)){ par(mfrow = c(1, 2)) plot(chrs[i, ], col = useColors) plot(chrs_raw[i, ], col = useColors)
# You could also use the highlightChromPeaks function to indicate the identified #peaks in the individual XICs: if (TRUE) { plot(chrs[i, ], col = useColors) highlightChromPeaks(xgF, rt = range(lapply(chrs[i, ], rtime)), mz = range(lapply(chrs[i, ], mz)), border = useColors) } } dev.off()
Thanks for the code. This really helps me out. I will take a look at it and see how things work. Either way I will make sure to post a reply later so that we can see how this works.
CoreyG: I have swapped between the XCMSnExp and xcmsSet using the as function. I used that to have CAMERA work for isotopes and adducts. It 'sort-of' works for the code that I posted, although I get an error message during the plotting that I cannot quite sort out:
Error in legend(rtrange[i, 2], max(maxint), legtext, col = unique(col[sampidx]), : 'legend' is of length 0
I am not sure where I would set the 'legend' parameters, so I am not sure how to get around this.
Jo: thanks for thinking about how to implement this in XCMS3. I used to use the PDF generated during the centWave peakPicking parameters, but at some point (even prior to XCMS3) that went away. Either is fine with me. I just like to see all/most of my features to see how I am doing beyond a pre-defined set of mz/RT values.
Thanks in advance. I look forward to moving forward with the transition to XCMS3.
I am in the process of updating to use XCMS3. So far, I like XCMS3 and most of the transition to XCMS3 has been easy in terms of how to set peak picking, retention time correction, and defining correspondence across a set of files (the old 'grouping'). However, one thing I miss is the ability to plot all of our mzRT features into a PDF that we could use to assess how well our parameters were working. I find multiple examples to extract a pre-determined mz / RT combination in the vignettes provided with XCMS3. Therefore, if I know the mz and retention time for a single peak, I can easily plot it. However, what I cannot figure out is out to get all the peaks across my set of samples.
I used to use the code listed below. In the example, xgF is the data after peak picking, group, retention time correction, and fillPeaks.
#colorList should be as long as the number of possible levels in the dataset... colorList <- c("deepskyblue","magenta","forestgreen","darkorchid","firebrick", "gold")
#plot first feature from diffreport (not sorted after pval!!) cairo_pdf(file = fName_peaksPicked,onefile=TRUE) for (i in 1:nrow(xs.fill@groups)){ #for (i in 1:5){
par(mfrow= c(2,1)) #rt row data plot(xeic.raw, xs.fill,groupidx=i,col=colorList) #rt corrected data plot(xeic.corrected, xs.fill,groupidx=i,col=colorList)
} dev.off()
My sessionInfo()
R version 3.5.1 (2018-07-02) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1
We invite applications for one or more postdoctoral researchers in the area of environmental metabolomics and/or environmental analytical chemistry. The Kujawinski lab uses analytical chemistry to explore the intersection of microbial biology and chemistry within marine systems. Potential projects include: use of metabolomics to understand the physiology of model marine organisms, comparison of laboratory and field data to infer the contribution of model organisms to marine organic matter cycling, and others. The successful applicant will be trained in all aspects of environmental sample analysis, including sample processing, mass spectrometry, and data analysis. A PhD in chemical oceanography, biogeochemistry, analytical chemistry or a related field is required. Experience with mass spectrometry and downstream data analysis as well as knowledge of programming tools such as MATLAB, Python, and R would be advantageous. Candidates with interests in linking chemical and biological datasets are especially encouraged to apply. The position is available for one year, renewable for a maximum of two years. Review of applications will begin Feb 1, 2018 and continue until one position is filled. Inquiries should be addressed to ekujawinski@whoi.edu.
We invite applications for multiple graduate research assistants in the area of environmental microbial metabolomics and/or environmental analytical chemistry. Background: Graduate students in the Kujawinski lab at the Woods Hole Oceanographic Institution (Woods Hole, Massachusetts, USA) are enrolled primarily in the MIT/WHOI Joint Program in Applied Ocean Sciences and Engineering. The research group uses analytical chemistry to explore the intersection of microbial biology and chemistry within Earth’s systems, with a specific focus on the oceans. We have access to sophisticated mass spectrometry tools as well as computational resources and develop new tools through which to interrogate the chemical composition of microbes and their environments. We collaborate with biologists in oceanography, soil science and human systems in order to place our work in the broadest context. My philosophy: Graduate students in my research group are full colleagues in our intellectual enterprise (see projects listed on this website). Students are expected to develop their own research questions and to participate in the research group activities to the fullest. Thus, I support independent creative individuals who are eager to answer sophisticated science questions at the cutting edge of environmental chemistry and microbiology. To the best of my ability, I provide financial support for laboratory research, field activities and networking opportunities. I encourage my students to explore non-traditional career opportunities and to seek out additional training resources, as their time and interests allow, with the goal of training well-rounded scientists for their chosen careers. Who succeeds: The lab is based on chemistry and so most of our members have training in the chemical arts. However, we welcome chemically-minded individuals with a physical science degree. We develop data analysis pipelines, when needed, and so students are encouraged to learn computer programming in MATLAB, R and/or Python. In my experience, students who have taken at least a year off after their undergraduate studies are more prepared for graduate school, so I give preference to individuals who have experience beyond their undergraduate degree. That said, excellent applications from graduating seniors are always considered. For additional information about graduate school at WHOI, click here. Questions should be addressed to Elizabeth Kujawinski (ekjuawinski@whoi.edu).
We invite applications for a postdoctoral researcher in the area of environmental metabolomics and/or environmental analytical chemistry. The Kujawinski lab at the Woods Hole Oceanographic Institution (Woods Hole, Massachusetts, USA) uses analytical chemistry to explore the intersection of microbial biology and chemistry within marine systems. The funded project involves complex mixture analysis by qualitative and quantitative mass spectrometry, with a focus on development of new tools for analysis of metabolites dissolved in seawater. The successful applicant will be trained in all aspects of environmental sample analysis, including sample processing, mass spectrometry, and data analysis. A PhD in chemical oceanography, biogeochemistry, analytical chemistry or a related field is required. Experience with mass spectrometry and downstream data analysis as well as knowledge of programming tools such as MATLAB, Python, and R would be advantageous. Candidates with interests in linking chemical and biological datasets are especially encouraged to apply. The position is available for one year, renewable for a maximum of two years. Inquiries should be addressed to ekujawinski@whoi.edu. Please apply online through the Human Resources Department.
I would like to compare the results from findmzROI with the output from findPeaks.centWave.
In my dataset, I know we have peaks that are not being resolved with our LC method. I would like to know how many different m/z values this is. Here, by not resolved, I mean I have many isomers that span multiple minutes in our LC run. I am most of the way through this thanks to some of the code and responses on this forum. However, I have found some undocumented output from findPeaks.centWave and am hoping that someone knows a little more about the behind the scenes steps to help me understand the output. Here's what I have done:
1. Find the list of ROIs using findmzROI as described by HP Benton (available here) 2. Run findPeaks.Centwave using my parameters of choice and verbose.columns = TRUE.
After step 1, I have a list of features (mz, mzmin,mzmax, scmin, scmax, length, intensity). After step 2, I have everything described in the help, and a few additional columns: lmin, lmax, sample.
What I would like to do is to directly tie my results from step 1 with the shorter list in step #2. However, I don't see any obvious index/marker that does this. I could search with some narrow m/z and retention time window, but I would rather know which ROI corresponds to which feature without setting some bounds on search parameters.
I hoped that 'f' (described as the 'region number of m/z ROI where peak was localised') would be helpful, but I am not sure how to use this information.
Thanks in advance for thoughts on this.
From sessionInfo() R version 3.0.3 (2014-03-06) XCMS_1.38.0
I do set fitgauss to TRUE when I do the centwave step. However, I was finding that I still had peaks that were not as high quality as I hoped. The peakShape code allows me to be stricter in requiring peaks to meet a Gaussian fit. The code also has the benefit that I can set how strict I want to be by setting the correlation value higher or lower.
Pre-2011, I found a script on the XCMS Google groups forum that was written by Tony Larson. I updated and now use it regularly to filter out peaks that don't meet Gaussian criteria. I run this script immediately after the peak picking in XCMS.
#original file version from the Google Groups for xcms from Tony Larson #Krista Longnecker updated this August 2011
#peakShape function to remove non-gaussian peaks from an xcmsSet #code originally had cor.val = 0.9; 0.5 is too low (not doing enough pruning) peakShape <- function(object, cor.val=0.9) { require(xcms)
files <- object@filepaths peakmat <- object@peaks peakmat.new <- matrix(-1,1,ncol(peakmat)) colnames(peakmat.new) <- colnames(peakmat) for(f in 1:length(files)) { xraw <- xcmsRaw(files[f], profstep=0) sub.peakmat <- peakmat[which(peakmat[,"sample"]==f),,drop=F] corr <- numeric() for (p in 1:nrow(sub.peakmat)) { #extract using rawEIC method +/1 0.01 m/z to give smoother traces tempEIC <- as.integer(rawEIC(xraw,mzrange=c(sub.peakmat[p,"mzmin"]-0.001,sub.peakmat[p,"mzmax"]+0.001))$intensity) minrt.scan <- which.min(abs(xraw@scantime-sub.peakmat[p,"rtmin"]))[1] maxrt.scan <- which.min(abs(xraw@scantime-sub.peakmat[p,"rtmax"]))[1] eics <- tempEIC[minrt.scan:maxrt.scan] #set min to 0 and normalise eics <- eics-min(eics) if(max(eics)>0) { eics <- eics/max(eics) } #fit gauss and let failures to fit through as corr=1 fit <- try(nls(y ~ SSgauss(x, mu, sigma, h), data.frame(x = 1:length(eics), y = eics)),silent=T) if(class(fit) == "try-error") { corr[p] <- 1 } else { #calculate correlation of eics against gaussian fit if(length(which(!is.na(eics-fitted(fit)))) > 4 && length(!is.na(unique(eics)))>4 && length(!is.na(unique(fitted(fit))))>4) { cor <- NULL options(show.error.messages = FALSE) cor <- try(cor.test(eics,fitted(fit),method="pearson",use="complete")) options(show.error.messages = TRUE) if (!is.null(cor)) { if(cor$p.value <= 0.05) corr[p] <- cor$estimate else corr[p] <- 0 } else corr[p] <- 0 } else corr[p] <- 0 } } filt.peakmat <- sub.peakmat[which(corr >= cor.val),] peakmat.new <- rbind(peakmat.new, filt.peakmat) n.rmpeaks <- nrow(sub.peakmat)-nrow(filt.peakmat) cat("Peakshape evaluation: sample ", sampnames(object)[f]," ",n.rmpeaks,"/",nrow(sub.peakmat)," peaks removed","\n") if (.Platform$OS.type == "windows") flush.console() }