Show Posts - sneumann


XCMS Online / Re: Questions about LCMS data preprocessing R vignette

October 17, 2018, 11:06:28 PM

So,

COURTNEY SCHIFFMAN wrote:
> In the vignette you say "We set it to 20,80 for the present example
> data set" when referring to the peakwidth parameter for centWave, but
> in the actual function you use 30,80.

Ah, that discrepancy is clearly a typo then.

> This makes a big difference, which do you recommend?

That really depends on the chromatography and gradient used.
E.g., on a 20 minute UPLC gradient we went down to c(5,12).

One way to check is to plot what you actually get:

hist(peaks(xs)[,"rtmax"]-peaks(xs)[,"rtmin"], breaks=100)

This shows the distribution of peak widths found in your data set,
and you can try a few different peakwidth ranges to see
which peak widths are then found. Beware: if you select a blatantly wrong range,
e.g. c(30,80) on the above UPLC gradient,
you will still find "something". But the histogram helps to figure out
whether the majority of peak widths fall within your peakwidth range.
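
To put a number on that check, here is a minimal sketch in base R. It assumes a peak table shaped like the matrix returned by peaks(xs), with rtmin and rtmax columns in seconds. The simulated table and the helper name frac_within_peakwidth are made up for illustration and are not part of xcms.

```r
# Hypothetical helper (not an xcms function): fraction of detected peaks whose
# width (rtmax - rtmin) lies inside a candidate peakwidth range c(min, max).
frac_within_peakwidth <- function(pk, peakwidth) {
  w <- pk[, "rtmax"] - pk[, "rtmin"]
  mean(w >= peakwidth[1] & w <= peakwidth[2])
}

# Simulated stand-in for peaks(xs); replace it with your real peak table.
set.seed(1)
rtmin <- runif(500, min = 0, max = 1200)
width <- pmax(rnorm(500, mean = 8, sd = 2), 1)   # widths around 8 s
pk <- cbind(rtmin = rtmin, rtmax = rtmin + width)

hist(pk[, "rtmax"] - pk[, "rtmin"], breaks = 100)

# Compare two candidate ranges on the same table.
frac_within_peakwidth(pk, c(5, 12))
frac_within_peakwidth(pk, c(30, 80))
```

Keep in mind that the widths in a real peak table were themselves found with some peakwidth setting, so it is worth repeating this over a few runs with different ranges.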

> Why do you use minFraction = 0.8 for PeakDensityParam
> but minFraction = 0.85 for PeakGroupsParam?

I was not aware of that difference, but that threshold
does depend on the size(s) of your sample groups, how
homogeneous you'd expect them to be, and how much "noise"
you'd accept after grouping.

On Tue, 2018-10-16 at 19:55 -0700, COURTNEY SCHIFFMAN wrote:
> ...
> Why with the snthresh=10 in "CentWaveParam" are there still
> chromatographic peaks with an sn less than 10 after running
> "findChromPeaks"?

I had to dig the exact answer out of the code:
https://github.com/sneumann/xcms/blob/eb6c61d2f081ea7ac6aeb1aa958f8a52fb70a91d/R/do_findChromPeaks-functions.R#L950

The summary is that the threshold is calculated as
sdthr <- sdnoise * snthresh

and the SN you see in the peaks table is

https://github.com/sneumann/xcms/blob/eb6c61d2f081ea7ac6aeb1aa958f8a52fb70a91d/R/do_findChromPeaks-functions.R#L1066
round((maxint - baseline) / sdnoise), ## S/N Ratio

So indeed there is some room for confusion.
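
A toy calculation with made-up numbers shows how the two quantities can disagree. It assumes, for illustration only, that the threshold test and the reported sn are based on different intensity values (called tested_int and maxint here). Please check the linked code for what is actually compared at each step.

```r
# Illustrative numbers only, not taken from real data.
snthresh <- 10
sdnoise  <- 100    # estimated noise standard deviation
baseline <- 500    # estimated baseline intensity

# Threshold as computed in the centWave code linked above.
sdthr <- sdnoise * snthresh

# Assumption for this sketch: the threshold is tested on some intensity in the
# region (tested_int), while the peaks table reports S/N from the peak's maxint.
tested_int <- 1600
maxint     <- 1400

passes_threshold <- (tested_int - baseline) >= sdthr
reported_sn      <- round((maxint - baseline) / sdnoise)

passes_threshold
reported_sn
```

Under that assumption the region passes the snthresh = 10 test, yet the sn column ends up below 10.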

Yours,
Steffen

Other / Re: Conversion from .RAW to mzXML and then export specific scans as .txt

August 20, 2018, 05:10:35 AM

Hi, in pwiz you have the mscat command line tool; in R you can use mzR for reading raw mz* data,
and/or MSnbase for a higher-level interface. Yours, Steffen
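
A minimal sketch of the mzR route, assuming the Bioconductor package mzR is installed and that the .RAW file has already been converted to mzXML or mzML. The helper names write_spectrum_txt and export_scans_txt and the output file naming are made up for illustration.

```r
# Hypothetical helper: write one spectrum (two columns: m/z, intensity)
# as tab-separated text.
write_spectrum_txt <- function(spectrum, path) {
  spectrum <- as.matrix(spectrum)
  colnames(spectrum) <- c("mz", "intensity")
  write.table(spectrum, file = path, sep = "\t",
              quote = FALSE, row.names = FALSE)
  invisible(path)
}

# Hypothetical wrapper: export the given scan numbers from an mzXML/mzML file.
# mzR is only needed when this function is actually called.
export_scans_txt <- function(msfile, scans, outdir = ".") {
  ms <- mzR::openMSfile(msfile)
  for (s in scans) {
    spectrum <- mzR::peaks(ms, s)   # m/z and intensity matrix for scan s
    write_spectrum_txt(spectrum,
                       file.path(outdir, sprintf("scan_%05d.txt", s)))
  }
  invisible(scans)
}

# Usage (file name and scan numbers are placeholders):
# export_scans_txt("mydata.mzXML", scans = c(10, 42))
```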

XCMS / Re: export XCMS2 fragments

August 20, 2018, 05:08:43 AM

Hi, `xcmsFragments` is a rather old object class.
You might want to look into the XCMS3 interface,
which relies heavily on MSnbase, which has much better
support for MS^n data. Yours, Steffen

XCMS / Re: Optimize peak-picking

April 12, 2017, 03:33:32 AM

Hi,
which peak exactly are you missing? As you see below, I can happily find
M184T407.

Yours,
Steffen

Code:

sneumann@acryl:/tmp/maialba$ R

R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(xcms)
Loading required package: mzR
Loading required package: Rcpp

Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
    do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl,
    intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unlist, unsplit

Loading required package: ProtGenerics
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: MSnbase
Loading required package: BiocParallel

This is MSnbase version 2.1.10 
  Read '?MSnbase' and references therein for information
  about the package and how to get started.


Attaching package: ‘MSnbase’

The following object is masked from ‘package:stats’:

    smooth

The following object is masked from ‘package:base’:

    trimws


This is xcms version 1.51.9 

> 
> file <- "acq1.mzXML"
> 
> xs <- xcmsSet(file, method="centWave", ppm=10, peakwidth=c(3,25), snthresh=2, mzCenterFun="wMean", integrate=2, fitgauss=F, scanrange=NULL, noise=0, sleep=0, verbose.columns=F)
DEBUG: using original centWave.
Detecting mass traces at 10 ppm ... OK
Detecting chromatographic peaks in 31566 regions of interest ... OK: 12594 found.
> 
> peaks(xs)[peaks(xs)[,"mz"] < 185 & peaks(xs)[,"mz"] > 184 & peaks(xs)[,"rt"] < 450 & peaks(xs)[,"rt"] > 350, ]
            mz    mzmin    mzmax      rt   rtmin   rtmax       into       intb
 [1,] 184.0751 184.0749 184.0752 407.033 400.336 412.055 32211658.9 32192175.1
 [2,] 184.0748 184.0746 184.0750 448.890 442.190 456.424 45293500.1 45270113.3
 [3,] 184.0743 184.0742 184.0745 385.269 376.898 390.291  7070136.5  7048055.0
 [4,] 184.0743 184.0742 184.0745 371.038 366.016 376.898  2287296.5  2269111.5
 [5,] 184.0746 184.0745 184.0747 437.168 424.612 442.190 35453580.9 35425005.4
 [6,] 184.0745 184.0743 184.0746 416.241 412.055 424.612 19979393.7 19958609.4
 [7,] 184.0743 184.0742 184.0744 361.830 356.808 366.016   793141.5   777554.1
 [8,] 184.0743 184.0742 184.0745 395.314 390.291 400.336  4780042.9  4763156.6
 [9,] 184.0744 184.0743 184.0746 429.634 424.612 431.308  9079473.6  9067784.3
           maxo   sn sample
 [1,] 4220260.0 5793      1
 [2,] 3970454.8 5450      1
 [3,] 1122035.1 1539      1
 [4,]  421025.2  576      1
 [5,] 3052630.8 4190      1
 [6,] 2557234.2 3510      1
 [7,]  107556.0  146      1
 [8,]  691897.0  948      1
 [9,] 1943207.9 2666      1
> 
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] xcms_1.51.9         MSnbase_2.1.10      BiocParallel_1.9.4 
[4] Biobase_2.30.0      ProtGenerics_1.7.0  BiocGenerics_0.16.1
[7] mzR_2.9.10          Rcpp_0.12.8        

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2     BiocInstaller_1.20.3   plyr_1.8.4            
 [4] iterators_1.0.8        tools_3.2.3            zlibbioc_1.16.0       
 [7] MALDIquant_1.16        digest_0.6.11          tibble_1.2            
[10] preprocessCore_1.32.0  gtable_0.2.0           lattice_0.20-33       
[13] Matrix_1.2-3           foreach_1.4.3          stringr_1.1.0         
[16] S4Vectors_0.8.11       IRanges_2.4.8          multtest_2.26.0       
[19] stats4_3.2.3           grid_3.2.3             impute_1.44.0         
[22] survival_2.40-1        XML_3.98-1.5           RANN_2.5              
[25] limma_3.26.9           ggplot2_2.2.1          reshape2_1.4.2        
[28] magrittr_1.5           MASS_7.3-45            splines_3.2.3         
[31] scales_0.4.1           pcaMethods_1.60.0      codetools_0.2-14      
[34] MassSpecWavelet_1.36.0 assertthat_0.1         mzID_1.13.0           
[37] colorspace_1.3-2       stringi_1.1.2          affy_1.48.0           
[40] lazyeval_0.2.0         munsell_0.4.3          doParallel_1.0.10     
[43] vsn_3.38.0             affyio_1.40.0

XCMS / Re: fillpeaks creates high background in GC-MS

January 19, 2017, 09:01:58 AM

Thanks for the question. I think adding other integration methods to fillPeaks would not be entirely trivial,
but might be possible. Patches welcome :-)
Another potential solution would be to pass the missing peak as an ROI to centWave with a much lower S/N threshold,
and merge these peaks. No guarantee that the second run picks them up, though.
Yours, Steffen

XCMS / Re: RT correction_ 3 minutes

January 18, 2017, 02:49:38 AM

OK, so the general strategy I'd recommend is to first use very lax parameters
for the group()ing and retcor() to see which samples have what shift.
Had it been a column change, one way would have been to identify the samples
affected and correct the retention times based on the prior knowledge
about the sample shift. Since you say the changes are gradual over acquisition time,
you need the "normal" xcms way to correct.

So, the initial step is group(); make sure your bw is big enough to cover
the whole expected shift. The problem with a large bw is false positives,
where peaks are put together that should not be. But that is OK for an initial
RT correction guesstimate. Then you can use plotQC(xcmsSet, what=""),
the last plot will give you the estimated RT shift per sample.
If you retcor() with plottype="mdevden", you can see how your landmark
peaks are distributed across the gradient, and whether the correction looks
good or erroneous. You can also look at https://github.com/sneumann/IPB-2014-01/blob/master/IPB-2014-01.rmd#retention-time-outlier-visualisation
for a clustering based on RT behaviour.

There is no ready-made snippet for visualising the shifts of your spiked standard,
but I would expect some people have already done something like that.
You need some code like the one used in plotQC: https://github.com/sneumann/xcms/blob/devel/R/plotQC.R#L159
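
As a starting point, here is a minimal base-R sketch of such a visualisation. It is not the plotQC code. It assumes you have already extracted the retention time of the spiked standard in each sample into a numeric vector in acquisition order (simulated here), and the helper name rt_shift is made up.

```r
# Hypothetical helper: retention time deviation of each sample from the median.
rt_shift <- function(rt) rt - median(rt, na.rm = TRUE)

# Simulated retention times (seconds) of one standard across 30 injections,
# drifting gradually by about three minutes; replace with your own values.
set.seed(42)
rt <- 400 + seq(0, 180, length.out = 30) + rnorm(30, sd = 2)
shift <- rt_shift(rt)

# One point per injection; the dashed line marks zero deviation.
plot(seq_along(shift), shift, type = "b",
     xlab = "Injection order", ylab = "RT shift vs. median (s)")
abline(h = 0, lty = 2)
```

A gradual drift shows up as a slope across the injection order, while a column change would look like a step.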

Yours,
Steffen

XCMS / Re: RT correction_ 3 minutes

January 17, 2017, 01:53:31 AM

Hi Sanju,
depending on the data, this should be possible. Some questions: What chromatography / gradient are you using?
What MS are you using? Is it random shifts of up to 3 min all over the samples, or did something happen (e.g. a column change) and all remaining samples are shifted by three minutes? Yours, Steffen

Job opportunities / PhD position Bioinformatics / Metabolomics

January 05, 2017, 08:27:48 AM

The Leibniz Institute of Plant Biochemistry (IPB) is an international research institute located on the Weinberg-campus in Halle and provides state-of-the-art facilities for research in bioinformatics, metabolomics and plant biochemistry.

The research group “Bioinformatics and Mass Spectrometry” in the department of Stress- and Developmental Biology at the Leibniz-Institute of Plant Biochemistry (IPB) is seeking applications by highly motivated candidates for a position as Research assistant (PhD student Bioinformatics / Metabolomics) in the context of the Leibniz project „DiSeMiNation“ on mangrove systems to contribute to global conservation.
The focus of the position will be to analyse mass spectrometry data on the biotic composition of mangrove systems, adapt methods for creation of current and historic metabolite profiles from sediment samples and to adapt methods for metabolite annotation to Pyrolysis GC/MS data.

You should hold a degree in bioinformatics or computer science, with experience in algorithm and software engineering and statistics. You are able to program in the statistics framework R, and have worked in Java. Knowledge in metabolomics would be an advantage.

The position is limited to 3 years, and available from April 1st, 2017. Payment is according to local regulations TV-L. Further information is available from the IPB job opportunities page.

Job opportunities / Vacancy: Two Postdoc positions Bioinformatics / Metabolomics at IPB Halle

June 29, 2016, 05:13:51 AM

Research assistant (Postdoc Bioinformatics / Metabolomics)

in the context of the German Network for Bioinformatics Infrastructure (de.NBI). The focus of the position will be to establish an infrastructure to support the experimental metabolomics community with efficient computational metabolomics services. The successful candidate will engage with users, provide support with metabolite annotation tools (MassBank, MetFrag) and standards-compliant data sharing.

You should hold a PhD in bioinformatics or computer science, with experience in algorithm and software engineering and statistics. You are able to program in the statistics framework R, and have worked in Java or C/C++. Knowledge in metabolomics or Grid-/Cloud computing would be an advantage. The position is limited to 3 years, and available from 01.11.2016. Payment is according to local regulations TV-L.

Further information is available from the institute's homepage http://www.ipb-halle.de/en/career/job-vacancies/ ; for inquiries please contact
Dr. Steffen Neumann, telephone: +49 345 5582-1470, e-mail: sneumann@ipb-halle.de .

Please send your application (cover letter addressing your research interests, CV, transcripts, and names/contacts of two references) quoting reference number 7/2016 until July 31st, 2016 to:

Leibniz-Institut für Pflanzenbiochemie (IPB)
Stiftung des öffentlichen Rechts
AG Personalangelegenheiten
Frau Kerstin Balkenhohl
Weinberg 3
06120 Halle (Saale)

or to bewerbungen@ipb-halle.de

CAMERA / Re: plotPsSpectrum error

April 24, 2016, 11:54:32 PM

Hi,
yes, I can reproduce it, and I am tracking it at
https://github.com/sneumann/CAMERA/issues/8

One thing is that you don't use the result of the fillPeaks(),
since you create the result from xset3 (and overwrite xset4 ...)
but that is not the core issue.

Yours,
Steffen

CAMERA / Re: plotPsSpectrum error

April 22, 2016, 01:37:58 PM

Hi Mike,

can you post a self-contained (cut&pastable) example?
You can use the data from either of the packages faahKO or mtbls2.

Yours,
Steffen

XCMS / Re: How to evaluate different software? XCMS and Progenesis

April 20, 2016, 12:43:31 PM

Hi,

software evaluation is always a tricky beast.

* You can ask three mass spectrometrists to evaluate the 7000+17000 features. Not going to happen.
* You can create a special evaluation experiment, e.g. http://pubs.acs.org/doi/abs/10.1021/ac301482k and evaluate which software gives better results.
* You can create a special evaluation experiment, e.g. http://bmcbioinformatics.biomedcentral. ... 2105-9-504 where the design allows a ground truth to be designated automatically, and then make the task harder for the software, while the same ground truth should still be detected.
* You can try to import the Progenesis output into an xcmsSet, and compare the quality score from the IPO package http://bmcbioinformatics.biomedcentral. ... 562-8#CR15. This has the benefit that you can do it on your existing data.

If you measure a well designed benchmark dataset, consider submitting it to http://www.ebi.ac.uk/metabolights .

Yours,
Steffen

XCMS / Re: m/z sort assumption violated

April 01, 2016, 06:35:00 AM

Hi,

There is now an automatic fix in https://github.com/sneumann/xcms/tree/f ... assumption
which I'll pull into a Bioconductor version after the next release in April.

Yours,
Steffen

XCMS / Re: mz sorting violation

April 01, 2016, 06:16:04 AM

Hi,

There is now an automatic fix in https://github.com/sneumann/xcms/tree/f ... assumption
which I'll pull into a Bioconductor version after the next release in April.

Yours,
Steffen

Task groups / CFP: metaRbolomics workshop at the 2016 Metabolomics Society

February 23, 2016, 06:43:57 AM

Dear colleagues,

we are calling for abstracts for the workshop “metaRbolomics: The R toolbox for Metabolomics”,
held on Monday the 25th, 2016 as part of the annual meeting of the Metabolomics Society
in Dublin (http://metabolomics2016.org/).

The workshop is aimed at Biologists, Bioinformaticians, and Chemists interested in high-throughput analysis.
Contributions should highlight the tools available for metabolomics, how existing packages
can be combined, and encourage participants to envision future developments and synergies.
Speakers will be chosen based on the creativity of their approaches for metabolomics data processing
and analysis in R, ideally with combinations of two or more R packages. Example data and R code
should be available to the audience to explore for themselves the power of R.

Please enter your abstract in the form at http://goo.gl/forms/Lc9QLVRsKG.
The deadline for submissions is Monday 7th of March 2016.

Yours,
Jan Stanstrup and Steffen Neumann