Messages - sneumann

16
Computational Mass Spectrometry / Computational mass spectrometry and metabolomics in Dublin
Dear CompMS community,

one of the big events this year will be the annual conference of the international Metabolomics Society in Dublin (27-30 June 2016, see http://metabolomics2016.org/), where around 1000 people are expected to attend. Last year we coordinated several proposals for workshops and sessions, and we are delighted that ALL THREE proposed sessions were accepted! Alongside other scientific topics, such as the session on “Advances in Statistical Tools”, we expect to have these sessions related to computational mass spectrometry and metabolomics:

  • New Approaches for Identification of Metabolites applying MS and NMR (Session team: Witting/Dunn)
  • Network and Pathway Analysis for Metabolomics (Session team: Willighagen/Jourdan)
  • Computational Mass Spectrometry (Session team: Neumann/Böcker)

There will also be these workshops:

  • Computational Workflows and Workflow Engines (Workshop lead: Christoph Steinbeck)
  • Workshop On Data Sharing and Standardisation (Workshop team: Reza Salek et al.)
  • metaRbolomics: The R toolbox for Metabolomics (Workshop team: Stanstrup/Neumann)

Of course, the official call for participation and papers by the conference organisers will follow in the next weeks, but you could already think about your contribution to make the workshops and sessions a whopping success!

Yours,
the session and workshop organisers!
17
XCMS / Re: Get fold, p-value for set of EIC m/z values
Hi JHela001,

I could imagine that one could create an xcmsSet without peaks,
add your EICs (which should be mz and RT rectangles) as "groups" and let fillPeaks() do the integration job.

Yours,
Steffen
18
XCMS / Re: Confusions about handling metabolomics data with XCMS
Hi,

The netCDF files will not have the polarity (pos/neg) properly annotated. If you KNOW
the order in which the scans come, you can split them like this:

Code:

    library(xcms)
    library(faahKO)

    # Assume that's one of yours:
    file <- system.file('cdf/KO/ko15.CDF', package = "faahKO")
    xposneg <- xcmsRaw(file)

    # Vector of alternating pos/neg
    polarity <- rep(c("positive", "negative"), length.out=length(xposneg@scantime))

    # Split into a list with two xcmsRaw objects:
    xrs <- split(xposneg, f=polarity)

    # Write out to new netCDF files
    write.cdf(xrs[["positive"]], filename="xrpos.cdf")
    write.cdf(xrs[["negative"]], filename="xrneg.cdf")


Repeat for all your files...
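
The alternating factor can be checked without xcms. The following is a minimal sketch in base R on a toy scan count (n_scans and scan_idx are made-up names for illustration, not part of xcms). It shows how rep() with length.out behaves when the number of scans is odd, and which scan indices end up in each polarity, assuming the first scan is positive:

```r
# Toy stand-in for length(xposneg@scantime); 7 is an arbitrary odd number
n_scans <- 7

# Alternating labels, truncated to the number of scans
polarity <- rep(c("positive", "negative"), length.out = n_scans)

# Scan indices per polarity (base R split on an integer vector)
scan_idx <- split(seq_len(n_scans), polarity)

scan_idx$positive  # 1 3 5 7
scan_idx$negative  # 2 4 6
```

If your acquisition starts with a negative scan, the order of the two labels in rep() has to be swapped.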

Yours,
Steffen
19
Task groups / Coordinated session proposals for Dublin 2016
Dear all,

Next year's Metabolomics 2016 conference has a call for session and workshop proposals (see below).

It would be great if we could coordinate some of the CompMS related proposals beforehand to ensure a good coverage of the topics we're interested in. Please add your suggestions to the overview document at [1] and form small session organising teams.

Everyone with the link can view the document, but you will need to request edit permissions in the Google doc. This is only so that it is simpler to send messages to everyone who is involved.

Once we have converged on a set of sessions, the session teams will have to send their detailed proposals to the Dublin organisers by October 30, 2015.

Yours,
Sebastian Böcker and Steffen Neumann

[1] https://docs.google.com/document/d/1Qpp ... sp=sharing


--------------------------------------------------------------------------------------

From: http://www.metabonews.ca/Oct2015/MetaboNews_Oct2015.htm

URGENT: Open Call to Metabolomics Society Members for Scientific
Sessions and Workshops
We are in the process of planning workshops and scientific sessions and
we welcome you to submit proposals for these important meetings.

Additional details can be found by clicking on the links below.
Do not delay – the proposals are due by October 30, 2015.

  Call for Scientific Sessions
  https://www.regonline.com/custImages/29 ... pdated.pdf

  Call for Workshops
  https://www.regonline.com/custImages/29 ... s-2016.pdf
20
XCMS / Re: Too few features? Synapt G2S UPLC HILIC
Hi,

centWave does not use the profParam for the feature detection step.
It is used in e.g. the plotRaw() which displays the raw data.
So there should be no need to modify it.

Yours,
Steffen
21
XCMS / Re: Too few features? Synapt G2S UPLC HILIC
Hi MikaelE,

your parameters don't look too far off.

1) How much RT deviation does your retcor() graphic show? Is it around +/- 5 secs?

2) How many sample classes do you have? All 105 in a single sample class?
    I think you should at least separate samples and QC.

3) On Waters one has to be careful with the expected mass accuracy.
    AFAIK, the Waters DLL that is available to mzML converters like proteowizard
    does not export the recalibrated data. IIRC netCDF data from DataBridge
    does have calibrated data, but lacks the MS^2 spectra if you have any.

4) Recent xcms versions have "plotQC(xs)" to show some diagnostic plots,
    due to 3) above, you could especially check m/z deviations.

5) You might want to check out http://www.biomedcentral.com/1471-2105/16/118
https://github.com/glibiseller/IPO for an automatic Parameter optimisation.
Caveat: don't use them blindly, use them wisely and as food for thought.

Yours,
Steffen
22
XCMS / Re: Alignment of matrices with obiwarp
Hi,

it is possible to create an xcmsSet from a peak list,
as shown by the following minimal example.
But this will not allow you to use Obiwarp, since Obiwarp
goes back to the raw data files, which in turn would have
to be netCDF, mzML etc.

Code:
library(xcms)

intensity <- matrix(1:32, ncol=4)
mz <- rep(1:8, ncol(intensity))
rt <- rep(8:1, ncol(intensity))

xs <- new("xcmsSet")

# Use the rt vector defined above for the retention time columns
peaks(xs) <- cbind(mz=mz, mzmin=mz, mzmax=mz,
                  rt=rt, rtmin=rt, rtmax=rt,
                  into=as.vector(intensity), intf=as.vector(intensity),
                  maxo=as.vector(intensity), maxf=as.vector(intensity),
                  sample=rep(seq(1,ncol(intensity)), each=nrow(intensity))
                  )
sampnames(xs) <- 1:4

xsg <- group(xs)
xsg

If you need Obiwarp, you will need to write a new constructor
for the xcmsSource class that reads a peaklist into an xcmsRaw,
see also this thread: viewtopic.php?f=8&t=310

and the code in https://github.com/sneumann/xcms/blob/m ... msSource.R
and the use in https://github.com/sneumann/xcms/blob/m ... /xcmsRaw.R


Yours,
Steffen
23
XCMS / Re: load a CDF file in R
Hi,

the file you sent loads fine over here, so I suspect something is wrong with your installation.
If the smaller files load fine, I suspect RAM issues. How much memory do you have?

Code:
> library(xcms)
Loading required package: mzR
Loading required package: Rcpp
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from ‘package:stats’:

    xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
    duplicated, eval, evalq, Filter, Find, get, intersect, is.unsorted,
    lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, Position, rank, rbind, Reduce, rep.int, rownames,
    sapply, setdiff, sort, table, tapply, union, unique, unlist

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Attaching package: ‘xcms’

The following object is masked from ‘package:Biobase’:

    phenoData, phenoData<-

> xr <- xcmsRaw("data2.cdf")
> xr
An "xcmsRaw" object with 13257 mass spectra

Time range: 360-4204.2 seconds (6-70.1 minutes)
Mass range: 14.9984-519.9868 m/z
Intensity range: 0-1384450

MSn data on  0  mass(es)
with  0  MSn spectra
Profile method: bin
Profile step: 1 m/z (506 grid points from 15 to 520 m/z)

Memory usage: 5110 MB
> sessionInfo()
R version 3.0.0 Patched (2013-04-04 r62494)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8      LC_NUMERIC=C             
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8   
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8 
 [7] LC_PAPER=C                LC_NAME=C               
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C     

attached base packages:
[1] parallel  stats    graphics  grDevices utils    datasets  methods 
[8] base   

other attached packages:
[1] xcms_1.43.1        Biobase_2.22.0    BiocGenerics_0.8.0 mzR_2.1.10       
[5] Rcpp_0.11.2     

loaded via a namespace (and not attached):
[1] codetools_0.2-14 zlibbioc_1.8.0 

The matrix that you get has the dimensions:
Code:
> dim(xr@env$profile)
[1]  506 13257

So the 13257 columns correspond to the scans, and the 506 rows to the grid points from 15 to 520 m/z.
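
To make the layout concrete, here is a minimal sketch in base R with a toy matrix standing in for xr@env$profile. The names mz_grid and mz_to_row are made up for illustration, and the sketch assumes a 1 m/z grid from 15 to 520 as in the output above:

```r
# Toy stand-in for xr@env$profile: rows are m/z grid points, columns are scans
mz_grid <- 15:520
n_scans <- 10                      # the real file has 13257 scans
profile <- matrix(0, nrow = length(mz_grid), ncol = n_scans)

# Hypothetical helper: row index of the grid point closest to a given m/z
mz_to_row <- function(mz, grid = mz_grid) which.min(abs(grid - mz))

# Put a fake signal at m/z 73 in scan 4, then pull out that m/z trace
profile[mz_to_row(73), 4] <- 1000
eic_73 <- profile[mz_to_row(73), ]

dim(profile)     # 506 10
mz_to_row(73)    # 59
```

One row of the matrix is therefore the intensity trace of one m/z bin over all scans, and one column is a binned spectrum.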

Yours,
Steffen
24
XCMS / Re: load a CDF file in R
Hi,
if your installation works in principle, there is little I can think of.
Is this LECO GCxGC data ?

If your file has  6,479,713  bytes, that's only 6MB, so not huge at all.

For your other question: once you have the xcmsRaw, you can find the
raw data as a matrix in xr@env$profile, provided you've set profStep=1,
where 1 is the resolution of the matrix in Da.

Yours,
Steffen
25
XCMS / Re: load a CDF file in R
Hi Mohammad,

the file you sent seems to be fine on my Ubuntu Linux box.
What is your operating system and R version ?
Can you run R from a command line, without RStudio
around it?

Yours,
Steffen


Code:
> library(xcms)
> xr <- xcmsRaw("m.cdf")
> xr
An "xcmsRaw" object with 1029 mass spectra

Time range: 1199.8-1500 seconds (20-25 minutes)
Mass range: 28.8909-501.0926 m/z
Intensity range: 0-4153340

MSn data on  0  mass(es)
with  0  MSn spectra
Profile method: bin
Profile step: 1 m/z (473 grid points from 29 to 501 m/z)

Memory usage: 24.5 MB
26
XCMS / Re: retention time correction for individual sample classes
Hi,

Quote from: "dlforrister"
Based on the internal standard the mean shift is 0.1 mins. However, about 10% of our samples have shifted by 0.8 - 1.4 mins.
1) We could use our RT standard to do a rough initial shift for all peaks. My big fear of doing this that shifts in chromatography across a gradient tends to be nonlinear.
 You are recommending using default retcor(), Is this because as stated above no single sample will represent all samples because each sample class has a different set of metabolites? Does the default retcor have a minimum number of "well behaved peaks"? and will it fail if there are two few overlapping compounds between less similar sample classes.

Check out the extra= and especially the missing= parameter for retcor(). You can probably set missing
to something like 5% of your number of samples to catch those "too few overlapping compounds
between less similar sample classes".

I'd hope that the non-linear aspect is caught by the second round of group/retcor.

Quote from: "dlforrister"
2) Reading the forum it seems it is possible to merge and split samples after xcmsSET(), but when merging and splitting RT correction information is lost. Why is this? IS there a hack which would allow this information to be stored?

Yes, splitting is possible, but when I wrote the c() joining function, I had no idea how to handle
the RT correction. Should they just stay the same ? I had no really good answer.
A hack could involve manually working on the faahko@rt lists, which have the RT
for each raw file before/after the correction:

Code:
> str(faahko@rt)
List of 2
 $ raw      :List of 12
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
 $ corrected:List of 12
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...
  ..$ : num [1:1278] 2501 2503 2505 2506 2508 ...

Quote from: "dlforrister"
3) Given the consistency of the majority of our samples, we could potentially re-run all samples with a RT shift > an acceptable threshold. Is there a RT shift threshold where xcms effectively ignores small shifts (i.e based on our peak width of c(5,12) 0.2 mins, wouldn't shifts less than 0.2 mins still fall into the same peak width)?
Not sure what you mean here. peakwidth=c(5,12) refers to peak picking.
Your issue is the group()ing step. There the important parameter is bw (in seconds)
for the kernel density estimation that is behind the grouping (cf. the 2006 xcms paper
or the xcmsPreprocess vignette).
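
To illustrate why bw matters for shifted samples, here is a toy sketch in base R. It is not the xcms implementation, only a plain Gaussian kernel density over made-up retention times of one compound: three samples elute around 101 s and two shifted samples around 149 s (roughly the 0.8 min shift mentioned above). The names rt_peak, kde and count_modes are made up for illustration:

```r
# Made-up retention times (seconds) of the same compound in five samples
rt_peak <- c(100, 101, 102, 148, 150)
grid <- seq(0, 250, by = 0.5)

# Gaussian kernel density evaluated on the grid, bw = kernel standard deviation
kde <- function(bw) {
  sapply(grid, function(g) mean(dnorm(g, mean = rt_peak, sd = bw)))
}

# Count local maxima of a curve, ignoring flat stretches
count_modes <- function(y) {
  s <- sign(diff(y))
  s <- s[s != 0]
  sum(diff(s) == -2)
}

count_modes(kde(5))    # 2: the shifted samples form their own density peak
count_modes(kde(30))   # 1: all five samples fall under a single density peak
```

With a narrow bandwidth the shifted samples would end up in a separate group, with a wider one they merge, at the price of possibly merging neighbouring compounds as well.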

Maybe do some more experimenting with the group/retcor parameters first,
to get an acceptable xcmsSet without having to resort to hacking.
After that, a more directed hacking approach might tweak even more
out of the data.

You can also check http://metabolomics-forum.com/viewtopic.php?f=26&t=137
and there esp. the lower code snippet to cluster the samples w.r.t. their retention time profiles/deviation.

Yours,
Steffen
27
XCMS / Re: retention time correction for individual sample classes
Hi Dale,

sounds like some severe batch effect in your chromatography. Can you give a rough RT deviation estimate for each batch?
Can you guess whether it would help to add/subtract some offset for each batch? Then, with some non-trivial R hacking,
it is possible to give a first round of corrected retention times to the xcmsSet, and use the normal retcor()
to do a second round.
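
The per-batch offset idea can be sketched in base R on a toy list that only mimics the raw/corrected structure of the retention time lists in an xcmsSet. This is an illustration under assumptions, not a tested recipe: the batch labels and the offset values are made up, and in a real xcmsSet the rt, rtmin and rtmax columns of the peak table would presumably need the same shift.

```r
# Toy stand-in for the raw/corrected RT lists: one numeric vector per file
rt <- list(raw = list(c(10, 20, 30), c(10, 20, 30),
                      c(58, 68, 78), c(58, 68, 78)))

# Made-up batch assignment per file and offset per batch (seconds)
batch  <- c("A", "A", "B", "B")
offset <- c(A = 0, B = -48)

# First-round correction: shift every scan time of a file by its batch offset
rt$corrected <- Map(function(r, b) r + offset[[b]], rt$raw, batch)

rt$corrected[[3]]  # 10 20 30
```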

Yours,
Steffen
28
Job opportunities / Vacancy: Bioinformatician (Metabolomics) at the IPB Halle
The IPB is an international research institute located on the Weinberg-campus in Halle and provides state-of-the-art facilities for research in bioinformatics, metabolomics and plant biochemistry.

The research group “Bioinformatics and Mass Spectrometry” in the department of Stress- and Developmental Biology at the Leibniz-Institute of Plant Biochemistry (IPB) is seeking applications by highly motivated candidates for a position as research assistant. In the context of the EU project  PhenoMeNal you will work on the integration of existing computational metabolomics methods into efficient and versatile workflows, and their execution on local and European grid infrastructures.

You should hold a diploma or master's degree in bioinformatics or computer science, with experience in algorithm and software engineering and statistics. You are able to program in the statistics framework R, and have worked in Java or C/C++. Knowledge in metabolomics or Grid-/Cloud computing would be an advantage. The position is limited to 3 years, and available from 01.10.2015.

Payment is according to local regulations TV-L.

Further information is available from the institute's homepage http://www.ipb-halle.de/en/, for inquiries please contact Dr. Steffen Neumann, telephone: +49 345 5582-1470, e-mail: bewerbungen@ipb-halle.de
29
XCMS / Re: using plotPeaks to plot selected peaks
Hi,

It's always good to post a self-contained code snippet in a question,
so that I can cut&paste to find a solution. I think the trick is to have
a loop around the plotPeaks() to plot the next page of peaks.
That way you can also sort prior to plotting, e.g. by descending intensity
or fitgauss or ...

Hope that helped,

Yours,
Steffen

Code:
library(xcms)

file <- system.file('cdf/KO/ko15.CDF', package = "faahKO")
xraw <- xcmsRaw(file)
p <- findPeaks.centWave(xraw, fitgauss=T, verbose=T, sleep=0.001)

plotPeaks(xraw, p, figs=c(8, 4))

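# Plot the peaks page by page, 32 peaks per page;
# min() keeps the last page within the peak table
for (i in seq(1, nrow(p), by=32)) {
  plotPeaks(xraw, p[seq(i, min(i+31, nrow(p))), , drop=FALSE], figs=c(8, 4))
}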
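
The paging arithmetic of such a loop can be checked without xcms. A minimal sketch in base R, assuming a made-up peak count of 70 and pages of 32 peaks (n_peaks, page_size and pages are illustrative names):

```r
n_peaks   <- 70   # stand-in for nrow(p)
page_size <- 32

# First row of every page
starts <- seq(1, n_peaks, by = page_size)

# Row indices per page; min() stops the last page at the end of the table
pages <- lapply(starts, function(i) seq(i, min(i + page_size - 1, n_peaks)))

lengths(pages)   # 32 32 6
```

To sort before plotting, reorder the peak table first, e.g. with order() on the into column in decreasing order, and then page through the sorted table.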

30
XCMS / Re: Help I don't want to get fillpeaks
Hi anvien,

diffreport() uses Student's t test for the statistics, which in turn
does not allow NA values.

If you want a table without using fillPeaks(), check out peakTable().

Yours,
Steffen