Skip to main content

Messages

This section allows you to view all Messages made by this member. Note that you can only see Messages made in areas you currently have access to.

Messages - johannes.rainer

16
XCMS / Re: Implementing custom retention time alignment algorithms
Thanks for clarifying!

Regarding your concern, the fillChromPeaks will always use the adjusted retention times if retention time adjustment has been performed. So, the results should be the same, with or without applyAdjustedRtime.

Regarding the error: I fixed that. You can install the updated version from github:

For the developmental version you are using:
Code: [Select]
devtools::install_github("sneumann/xcms")

For the Bioconductor 3.8 release (R-3.5.x)
Code: [Select]
devtools::install_github("sneumann/xcms", ref = "RELEASE_3_8")

17
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
Excellent summary @CoreyG ! In the end the 78 cores don't help unless you have enough memory to fit the full data from 78 mzML files at once into memory. Peak detection and filling in missing peak data both need to load the full data of a file (all spectra). You could load one of your mzML files into memory and then calculate the size of that using the object_size function from the pryr package:

Code: [Select]
obj <- readMSData(filename, mode = "inMem")
pryr::object_size(obj)

You could thus estimate how many processes you could run in parallel. On top of that you will however also need some more memory to fit the chromPeaks matrix and later the featureDefinitions data frame and R will need some more space to copy stuff in the background.

19
Tools / Re: Data from waters - mass measure in centroid mode
We have AB Sciex data and so far I used proteowizard centroiding (with vendor option). This turned out to be a poor choice. What I am doing now is to export all data from Sciex as mzML in profile mode and perform the centroiding in R using MSnbase.

Have also a look at https://github.com/jotsetung/metabolomics2018 where I describe the centroiding with MSnbase (specifically https://jotsetung.github.io/metabolomics2018/xcms-preprocessing.html#23_centroiding_of_profile_ms_data).
20
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
Just chiming in with some explanations how xcms works now with large projects. We use now an object from the MSnbase package to represent the raw data, that only reads the raw (spectrum) data if required. That way the memory use is minimized. Peak detection is then performed on a per-file basis, i.e. reading the full data from one file, performing the peak detection on that and then removing the spectrum data again from memory. As Jan mentioned, you should be careful to not have too many parallel processes running, as the I/O will be the bottleneck, not the number of CPUs. On our cluster I use not more than 24 CPUs in parallel (although we have 292) because otherwise I got serious troubles with the I/O (this is most likely because our disk setup is ... suboptimal).

Just have a look at a recent xcms vignette (R 3.5.1, Bioconductor 3.8) how to perform the analysis. xcms uses by default this "onDisk" mode.

cheers, jo
23
Other / Re: Feature intensity drift correction suggestions
I'm using linear models to estimate the drift for each feature (similar to Wehrens et al mentione by Jan above) on QC samples requiring that I have valid measurements from the QC samples for 3/4 of the injection index range within a measurement run.

I had also high hopes on the RUVIII approach, but it failed completely on the data set I tested it on. My best guess for this failure is that it needs internal standards to estimate the "unwanted variance", and the values I got from the internal standard were not representative for the "real" samples.

cheers, jo
24
XCMS / Re: Gaussian shape peak filtering
In fact you did everything correctly. The problem was that replacing the identified peaks with chromPeaks(object) <- removes also the processing history (i.e. the information with which algorithm the peaks were detected). The "show" method for the XCMSnExp object had a bug that caused the error that you've seen. So nothing you did wrong, it was a bug in the function that ouputs the information on the object (I'll fix that in xcms).

The simple solution is to add the line
Code: [Select]
object.new@.processHistory <- object@.processHistory

to your function right before you return the result

I did also tweak your function a little bit. This version works also with objects after alignment:

Code: [Select]
peakShape_XCMS3v2 <- function(object, cor.val = 0.9, useNoise = 100) {
    require(xcms)
    ## Ensure we use adjusted retention times!
    xdata <- applyAdjustedRtime(object)
    peakmat <- chromPeaks(xdata) #extract everything

    res <- lapply(split.data.frame(peakmat, peakmat[, "sample"]), function(z) {
        ## Subset the data to the present file keeping only spectra that
        ## are in the peak range.
        currentSample <- filterRt(filterFile(xdata, z[1, "sample"]),
                                  rt = range(z[, c("rtmin", "rtmax")]))
        ## Loading the data into memory - could be faster for some settings
        ## can however also be removed
        currentSample <- as(as(currentSample, "OnDiskMSnExp"), "MSnExp")
        corr <- numeric(nrow(z))
        for (i in seq_len(nrow(z))) {
            mzRange <- z[i, c("mzmin", "mzmax")] + c(-0.001, 0.001)
            rtRange <- z[i, c("rtmin", "rtmax")]
            suppressWarnings(
                ints <- intensity(filterMz(filterRt(currentSample, rtRange),
                                           mzRange))
            )
            ints[lengths(ints) == 0] <- 0
            ints <- as.integer(unlist(ints))
            ints <- ints[!is.na(ints)]
            ints <- ints[ints > useNoise]
            ## Only proceed if we have values left
            if (length(ints)) {
                ## Normalize/baseline subtract
                ints <- ints - min(ints)
                if (max(ints) > 0)
                    ints <- ints / max(ints)
                fit <- try(nls(y ~ SSgauss(x, mu, sigma, h),
                               data.frame(x = 1:length(ints), y = ints)),
                           silent = TRUE)
                if (class(fit) == "try-error") {
                    corr[i] <- 1        # Shouldn't that be 0???
                } else {
                    ## calculate correlation of eics against gaussian fit
                    if (sum(!is.na(ints - fitted(fit))) > 4 &&
                        sum(!is.na(unique(ints))) > 4 &&
                        sum(!is.na(unique(fitted(fit)))) > 4) {
                        cor <- NULL
                        options(show.error.messages = FALSE)
                        cor <- try(cor.test(ints, fitted(fit),
                                            method = "pearson",
                                            use = "complete"))
                        options(show.error.messages = TRUE)
                        if (!is.null(cor) && cor$p.value <= 0.05)
                            corr[i] <- cor$estimate
                    }
                }
            }
        }                               # End for loop
        message("Peakshape evaluation: sample ",
                 basename(fileNames(currentSample)), ": ", sum(corr < cor.val),
                 "/", nrow(z), " peaks removed")
        z[corr >= cor.val, , drop = FALSE]
    })
    ## Need to "remember" the process history.
    ph <- processHistory(object)
    chromPeaks(object) <- do.call(rbind, res)
    ## Have to add also the process history
    object@.processHistory <- ph
    object
}
25
XCMS / Re: Gaussian shape peak filtering
Thanks Krista for providing the example!
It's also nice to see that you used the filterRt/filterMz in your code.

I'll have a look at it and post soon.

cheers, jo
26
XCMS / Re: How to export XCMS results
Hi,

it looks like you are using the new interface and functions (aka XCMS3) - groupval is not available for the new result objects. To extract the definitions of the features (previously called "groups") you can use the featureDefinitions function and to extract the actual intensity matrix for the features you will have to use featureValues (ensure also to set value = "into" when calling this function to extract the intensities).

Eventually you should also have a look at the xcms vignette that tries to describe how you can access results etc: https://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html

cheers, jo
27
XCMS / Re: replacement for getEIC in XCMS3
Just to update: we've added the function to the xcms codebase. It will be officially available with the next Bioconductor release (3.8) which will be released end of October 2018.

cheers, jo
28
XCMS / Re: replacement for getEIC in XCMS3
For now you can use the function below (just copy-paste into R). I will add it also to xcms and update the vignette to describe your use case.

Code: [Select]
featureChromatograms <- function(x, expandRt = 0, aggregationFun = "max",
                                 ...) {
    if (!hasFeatures(x))
        stop("No feature definitions present. Please run first 'groupChromPeaks'")
    pks <- chromPeaks(x)
    mat <- do.call(rbind, lapply(featureDefinitions(x)$peakidx, function(z) {
        pks_current <- pks[z, , drop = FALSE]
        c(range(pks_current[, c("rtmin", "rtmax")]),
          range(pks_current[, c("mzmin", "mzmax")]))
    }))
    colnames(mat) <- c("rtmin", "rtmax", "mzmin", "mzmax")
    chromatogram(x, rt = mat[, 1:2], mz = mat[, 3:4],
                 aggregationFun = aggregationFun, ...)
}

This function takes an XCMSnExp object, extracts chromatograms for each feature and returns it as an Chromatograms object. The Chromatograms arranges the XIC in a two-dimensional matrix, columns being the individual samples. Each row contains the XIC for one feature. It is thus straight forward to plot the data for one specific feature. The example code below illustrates it on the famous faahko data set:

Code: [Select]
library(xcms)
library(faahKO)
faahko_3_files <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
                    system.file('cdf/KO/ko16.CDF', package = "faahKO"),
                    system.file('cdf/KO/ko18.CDF', package = "faahKO"))

## Do a quick and dirty preprocessing
od <- readMSData(faahko_3_files, mode = "onDisk")
od <- findChromPeaks(od, param = CentWaveParam(peakwidth = c(30, 80),
        noise = 1000))
od <- adjustRtime(od, param = ObiwarpParam(binSize = 0.6))
od <- groupChromPeaks(od,
        param = PeakDensityParam(minFraction = 0.8, sampleGroups = rep(1, 3)))

## Extract ion chromatograms for each feature (need to post the code above or use xcms version > 3.3.4
chrs <- featureChromatograms(od)

## Extract also XICs without adjusted retention times
chrs_raw <- featureChromatograms(od, adjustedRtime = FALSE)

## Plot the XIC for the first feature using different colors for each file
par(mfrow = c(1, 2))
plot(chrs[1, ], col = c("red", "green", "blue"))
plot(chrs_raw[1, ], col = c("red", "green", "blue"))


In your use case you could simply make then a for loop over nrow(chrs) to plot the data for each feature.

Note: you could also use the highlightChromPeaks function to indicate the identified peaks in the individual XICs:

Code: [Select]
plot(chrs[1, ], col = c("red", "green", "blue"))
highlightChromPeaks(od, rt = range(lapply(chrs[1, ], rtime)), mz = range(lapply(chrs[1, ], mz)),
    border = c("red", "green", "blue"))



Hope that helps.
cheers, jo
29
Job opportunities / Postdoctoral position in Computational Metabolomics
Eurac Research is looking for a

Postdoctoral researcher in Computational Metabolomics
 
The Institute for Biomedicine is seeking a highly motivated researcher to join the Computational Metabolomics team.
 

The Institute for Biomedicine is home to an international and interdisciplinary team of scientists, students and support staff, integrated in an excellent network of international and national partners. Living and working in Bozen/Bolzano allows you to experience a unique combination of alpine and Mediterranean culture right at the heart of one of the most exciting mountain regions in Europe.

The Institute for Biomedicine is running the Cooperative Health Research in South Tyrol (CHRIS) study, a population-based study on cardiometabolic and neurological health of 13,000 general population subjects [Pattaro et al 2015] that excels with a high participation rate from complete families. From this population study, targeted (Biocrates p180) and LC-MS-based untargeted metabolomics data are available for 7,500 and 5,000 individuals, respectively. The Computational Metabolomics group focuses, besides contributing to the xcms and MSnbase Bioconductor packages for untargeted metabolomics and mass spectrometry data processing, on the analysis of these and other metabolomics data sets generated at the Institute. The successful candidate will perform metabolomics data analyses, contribute to software development and is expected to write scientific articles.

Tasks:
· Analyze large scale targeted and untargeted metabolomics data sets.
· Investigate metabolic profiles in relation to family structure within a population study-based data set.
· Identify metabolic profiles related to health, lifestyle and nutrition.
· Integration of metabolomics and genotype data.
· In silico annotation and identification of features identified by LC-MS experiments.
· Implement and extend software for the analysis of untargeted metabolomics data.
· Present results and/or software at international conferences.
· Write scientific articles.
 
Requirements:
· PhD or equivalent experience in computational biology, bioinformatics, metabolomics or a related field.
· Experience in the analysis of LC-MS-based metabolomics data sets.
· Proficient R-software usage. Software development skills are beneficial.
· Good communication skills; fluent in the English language.
· Team-working ability.
 
We offer:
· A one-year full-time position with the possibility of extension.
· A collaborative work environment with multidisciplinary teams of experts form different fields of biomedical research.
· Access to cutting-edge data and computational resources.
· Attractive living and working conditions.
 
How to apply:
Interested candidates should submit their application (CV, cover letter, reference contacts and further relevant documents) within 31.12.2018 via Email to jobs.biomedicine@eurac.edu  stating “Researcher in Computational Metabolomics” as subject:
Eurac Research
Institute for Biomedicine
Via Galvani 31 – 39100 Bolzano
Email: jobs.biomedicine@eurac.edu  - www.eurac.edu
Tel: +39 0471 055 500 / Fax: + 39 0471 055 599

Information about the Institute is available at: http://www.eurac.edu/en/research/health/biomed

For further information please contact Johannes Rainer (johannes.rainer@eurac.edu).


Please attach, after reading the privacy policy in compliance with the EU Regulation No. 2016/679 (GDPR) and the national legislation, the following consent to your personal record: 'I have read the privacy policy under http://www.eurac.edu/en/aboutus/Jobs/Pages/default.aspx and hereby authorize Eurac Research to use my personal data in accordance to EU Regulation No. 2016/679 and national legislation.' We inform you that we will not be allowed to consider any application without this compliancy declaration.

Please add the following consent if it is of interest to you: “I hereby explicitly authorize Eurac Research to store my personal data for the purpose of being contacted for potential future job openings”.
30
XCMS - FAQ / Re: xcms does not execute
Eventually there is something strange going on with the parallel processing. Please add the following line right after you load the xcms package and retry:

library(xcms)
register(SerialParam())

with that you disable parallel processing - if it runs without parallel processing we can think of a way to fix parallel processing.

cheers, jo