Topic: Optimize peak-picking

Optimize peak-picking

Hi,

I'm looking for advice on optimizing the performance of peak picking for my MS data.

I am having trouble getting all the peaks of a chromatogram. I have tried to optimize the parameters to catch most of them, but I still lose one of the most intense peaks. Conversely, if I change the parameters to get that peak, I lose some of the less intense ones. The commands I am using are shown below, and the link below points to a file with the chromatograms and the peaks I am currently finding, as well as the mzXML file. If possible, I'd like to get all of the peaks.

https://github.com/maialba3/peak-picking-doubts

########################
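library(xcms)

# mzXML file with the chromatograms (see the GitHub link above)
file <- "acq1.mzXML"

# centWave peak picking: 10 ppm m/z tolerance, peak widths of 3-25 s,
# signal-to-noise cutoff of 2, no Gaussian fitting
xs <- xcmsSet(file, method = "centWave", ppm = 10, peakwidth = c(3, 25),
              snthresh = 2, mzCenterFun = "wMean", integrate = 2,
              fitgauss = FALSE, scanrange = NULL, noise = 0, sleep = 0,
              verbose.columns = FALSE)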
 
########################

Thank you,
Maribel

 

Re: Optimize peak-picking

Reply #1
I think this will be very difficult to achieve with so much overlap. What I would try first is lowering the minimum peakwidth. If that doesn't work, the last resort is trying matchedFilter instead of centWave.
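To see why heavy overlap is the limiting factor, here is a small base-R simulation on made-up data (not from acq1.mzXML). It sums two equal Gaussian peaks and counts the apexes in the combined trace. For two equal Gaussians with standard deviation sigma, the sum has two apexes only when their separation exceeds 2*sigma; closer than that there is a single apex, so no peakwidth setting can reveal a second one from the apex alone. This only illustrates the geometry: centWave itself uses a wavelet-based approach, not this simple apex count, and `gaussian_peak()`, `count_local_maxima()` and `overlap_maxima()` are hypothetical helpers.

```r
# Illustration only (hypothetical helpers, simulated data):
# two co-eluting Gaussian peaks are summed and the apexes of the sum counted.

gaussian_peak <- function(rt, centre, sigma, height = 1) {
  height * exp(-0.5 * ((rt - centre) / sigma)^2)
}

# Count strict local maxima in a numeric vector (end points excluded).
count_local_maxima <- function(y) {
  n <- length(y)
  if (n < 3) return(0L)
  inner <- 2:(n - 1)
  sum(y[inner] > y[inner - 1] & y[inner] > y[inner + 1])
}

# Two equal peaks of width `sigma` (s), separated by `delta` seconds.
overlap_maxima <- function(delta, sigma = 2, step = 0.1) {
  rt <- seq(380, 420, by = step)
  y <- gaussian_peak(rt, 400, sigma) + gaussian_peak(rt, 400 + delta, sigma)
  count_local_maxima(y)
}

for (delta in c(2, 3, 6, 8)) {
  cat(sprintf("separation %g s -> %d apex(es)\n", delta, overlap_maxima(delta)))
}
```

With sigma = 2 s, separations of 2 and 3 s give a single apex, while 6 and 8 s give two.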
Blog: stanstrup.github.io

Re: Optimize peak-picking

Reply #2
Hi,
which peak exactly are you missing? As you can see below, I can happily find
M184T407.

Yours,
Steffen



sneumann@acryl:/tmp/maialba$ R

> library(xcms)

This is xcms version 1.51.9

>
> file <- "acq1.mzXML"
>
> xs <- xcmsSet(file, method="centWave", ppm=10, peakwidth=c(3,25), snthresh=2, mzCenterFun="wMean", integrate=2, fitgauss=F, scanrange=NULL, noise=0, sleep=0, verbose.columns=F)
DEBUG: using original centWave.
Detecting mass traces at 10 ppm ... OK
Detecting chromatographic peaks in 31566 regions of interest ... OK: 12594 found.
>
> peaks(xs)[peaks(xs)[,"mz"] < 185 & peaks(xs)[,"mz"] > 184 & peaks(xs)[,"rt"] < 450 & peaks(xs)[,"rt"] > 350, ]
            mz    mzmin    mzmax      rt   rtmin   rtmax       into       intb      maxo   sn sample
 [1,] 184.0751 184.0749 184.0752 407.033 400.336 412.055 32211658.9 32192175.1 4220260.0 5793      1
 [2,] 184.0748 184.0746 184.0750 448.890 442.190 456.424 45293500.1 45270113.3 3970454.8 5450      1
 [3,] 184.0743 184.0742 184.0745 385.269 376.898 390.291  7070136.5  7048055.0 1122035.1 1539      1
 [4,] 184.0743 184.0742 184.0745 371.038 366.016 376.898  2287296.5  2269111.5  421025.2  576      1
 [5,] 184.0746 184.0745 184.0747 437.168 424.612 442.190 35453580.9 35425005.4 3052630.8 4190      1
 [6,] 184.0745 184.0743 184.0746 416.241 412.055 424.612 19979393.7 19958609.4 2557234.2 3510      1
 [7,] 184.0743 184.0742 184.0744 361.830 356.808 366.016   793141.5   777554.1  107556.0  146      1
 [8,] 184.0743 184.0742 184.0745 395.314 390.291 400.336  4780042.9  4763156.6  691897.0  948      1
 [9,] 184.0744 184.0743 184.0746 429.634 424.612 431.308  9079473.6  9067784.3 1943207.9 2666      1
>
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] xcms_1.51.9         MSnbase_2.1.10      BiocParallel_1.9.4
[4] Biobase_2.30.0      ProtGenerics_1.7.0  BiocGenerics_0.16.1
[7] mzR_2.9.10          Rcpp_0.12.8        

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2     BiocInstaller_1.20.3   plyr_1.8.4            
 [4] iterators_1.0.8        tools_3.2.3            zlibbioc_1.16.0       
 [7] MALDIquant_1.16        digest_0.6.11          tibble_1.2            
[10] preprocessCore_1.32.0  gtable_0.2.0           lattice_0.20-33       
[13] Matrix_1.2-3           foreach_1.4.3          stringr_1.1.0         
[16] S4Vectors_0.8.11       IRanges_2.4.8          multtest_2.26.0       
[19] stats4_3.2.3           grid_3.2.3             impute_1.44.0         
[22] survival_2.40-1        XML_3.98-1.5           RANN_2.5              
[25] limma_3.26.9           ggplot2_2.2.1          reshape2_1.4.2        
[28] magrittr_1.5           MASS_7.3-45            splines_3.2.3         
[31] scales_0.4.1           pcaMethods_1.60.0      codetools_0.2-14      
[34] MassSpecWavelet_1.36.0 assertthat_0.1         mzID_1.13.0           
[37] colorspace_1.3-2       stringi_1.1.2          affy_1.48.0           
[40] lazyeval_0.2.0         munsell_0.4.3          doParallel_1.0.10     
[43] vsn_3.38.0             affyio_1.40.0         
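The subsetting expression in the session above can be wrapped in a small helper for looking up individual features. This is a base-R sketch: `peaks_in_window()` is a hypothetical name, not an xcms function, and the toy matrix reuses three rows of the output above with only the mz, rt and maxo columns. Adding `drop = FALSE` keeps the result a matrix even when exactly one peak matches, which the one-liner above would otherwise turn into a plain vector.

```r
# Hypothetical helper (not part of xcms): rows of a peak matrix whose "mz"
# and "rt" columns fall strictly inside the given ranges.
peaks_in_window <- function(pk, mzrange, rtrange) {
  keep <- pk[, "mz"] > mzrange[1] & pk[, "mz"] < mzrange[2] &
          pk[, "rt"] > rtrange[1] & pk[, "rt"] < rtrange[2]
  # drop = FALSE keeps a matrix even if only one row matches
  pk[keep, , drop = FALSE]
}

# Toy peak table: three rows of the output above (mz, rt, maxo only)
pk <- cbind(mz   = c(184.0751, 184.0748, 184.0743),
            rt   = c(407.033, 448.890, 361.830),
            maxo = c(4220260.0, 3970454.8, 107556.0))

hits <- peaks_in_window(pk, mzrange = c(184, 185), rtrange = c(400, 415))
print(hits)
```

On the real data the call would be `peaks_in_window(peaks(xs), c(184, 185), c(350, 450))`.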




--
IPB Halle                          Mass spectrometry & Bioinformatics
Dr. Steffen Neumann         http://www.IPB-Halle.DE
Weinberg 3 06120 Halle     Tel. +49 (0) 345 5582 - 1470
sneumann(at)IPB-Halle.DE

Re: Optimize peak-picking

Reply #3
Hi,
I'm still missing T338, T345, T370, T470, T500, T510, T580 and T620. Can you find them?

I've also tried changing the minimum peakwidth, but I still miss some peaks: when overlapping peaks are detected, I lose the less intense ones.
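The list of missing features can also be checked against the peak table programmatically instead of by eye. This sketch assumes that labels such as T338 give the apex retention time in seconds (as M184T407 corresponds to the peak at rt 407.033 in the output above); `find_targets()` and the 3 s tolerance are hypothetical choices, and the toy table is invented for illustration. The m/z of these features is not stated in the thread, so the m/z window is optional.

```r
# Hypothetical helper (not part of xcms): for labels like "T338", report
# whether any peak has an apex rt within `tol` seconds, optionally
# restricted to an m/z window.
find_targets <- function(pk, targets, tol = 5, mzrange = NULL) {
  if (!is.null(mzrange)) {
    pk <- pk[pk[, "mz"] > mzrange[1] & pk[, "mz"] < mzrange[2], , drop = FALSE]
  }
  target_rt <- as.numeric(sub("^T", "", targets))
  found <- vapply(target_rt,
                  function(t) any(abs(pk[, "rt"] - t) <= tol),
                  logical(1))
  data.frame(target = targets, rt = target_rt, found = found,
             stringsAsFactors = FALSE)
}

# Toy peak table (invented values, not from acq1.mzXML)
pk <- cbind(mz = c(184.07, 184.07, 210.10),
            rt = c(336.5, 471.2, 500.0))

res <- find_targets(pk, c("T338", "T345", "T470", "T500"), tol = 3,
                    mzrange = c(184, 185))
print(res)
```

Here T338 and T470 are matched, T345 has no peak within 3 s, and the peak near T500 is excluded by the m/z window.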