Skip to main content

Messages

This section allows you to view all Messages made by this member. Note that you can only see Messages made in areas you currently have access to.

Messages - johannes.rainer

1
Other / Re: Open software for SRM experiments
I'm also not familiar with SRM/MRM data. But we have implemented a readSRMData function in MSnbase to read chromatographic data from mzML files. Would be nice to know if that works for you and what is missing. Note also that xcms can do now peak detection also on purely chromatographic data (i.e. Chromatogram/Chromatograms classes which are returned by the readSRMData function).
2
XCMS / Re: Implementing custom retention time alignment algorithms
Thanks for sharing your ideas. If you would like some changes in xcms I would however appreciate if you open an issue at the xcms github repository as this enables me also to keep track of what to do (and what was done) - a pull request would actually be even better ;)

Regarding the code in xcms centWave - I did not write that code and I am veeerrry hesitant to change anything within it as this will affect all xcms users.

thanks again, jo
3
XCMS / Re: Implementing custom retention time alignment algorithms
Thanks for clarifying!

Regarding your concern, the fillChromPeaks will always use the adjusted retention times if retention time adjustment has been performed. So, the results should be the same, with or without applyAdjustedRtime.

Regarding the error: I fixed that. You can install the updated version from github:

For the developmental version you are using:
Code: [Select]
devtools::install_github("sneumann/xcms")

For the Bioconductor 3.8 release (R-3.5.x)
Code: [Select]
devtools::install_github("sneumann/xcms", ref = "RELEASE_3_8")

4
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
Excellent summary @CoreyG ! In the end the 78 cores don't help unless you have enough memory to fit the full data from 78 mzML files at once into memory. Peak detection and filling in missing peak data both need to load the full data of a file (all spectra). You could load one of your mzML files into memory and then calculate the size of that using the object_size function from the pryr package:

Code: [Select]
obj <- readMSData(filename, mode = "inMem")
pryr::object_size(obj)

You could thus estimate how many processes you could run in parallel. On top of that you will however also need some more memory to fit the chromPeaks matrix and later the featureDefinitions data frame and R will need some more space to copy stuff in the background.

6
Tools / Re: Data from waters - mass measure in centroid mode
We have AB Sciex data and so far I used proteowizard centroiding (with vendor option). This turned out to be a poor choice. What I am doing now is to export all data from Sciex as mzML in profile mode and perform the centroiding in R using MSnbase.

Have also a look at https://github.com/jotsetung/metabolomics2018 where I describe the centroiding with MSnbase (specifically https://jotsetung.github.io/metabolomics2018/xcms-preprocessing.html#23_centroiding_of_profile_ms_data).
7
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
Just chiming in with some explanations how xcms works now with large projects. We use now an object from the MSnbase package to represent the raw data, that only reads the raw (spectrum) data if required. That way the memory use is minimized. Peak detection is then performed on a per-file basis, i.e. reading the full data from one file, performing the peak detection on that and then removing the spectrum data again from memory. As Jan mentioned, you should be careful to not have too many parallel processes running, as the I/O will be the bottleneck, not the number of CPUs. On our cluster I use not more than 24 CPUs in parallel (although we have 292) because otherwise I got serious troubles with the I/O (this is most likely because our disk setup is ... suboptimal).

Just have a look at a recent xcms vignette (R 3.5.1, Bioconductor 3.8) how to perform the analysis. xcms uses by default this "onDisk" mode.

cheers, jo
10
Other / Re: Feature intensity drift correction suggestions
I'm using linear models to estimate the drift for each feature (similar to Wehrens et al mentione by Jan above) on QC samples requiring that I have valid measurements from the QC samples for 3/4 of the injection index range within a measurement run.

I had also high hopes on the RUVIII approach, but it failed completely on the data set I tested it on. My best guess for this failure is that it needs internal standards to estimate the "unwanted variance", and the values I got from the internal standard were not representative for the "real" samples.

cheers, jo
11
XCMS / Re: Gaussian shape peak filtering
In fact you did everything correctly. The problem was that replacing the identified peaks with chromPeaks(object) <- removes also the processing history (i.e. the information with which algorithm the peaks were detected). The "show" method for the XCMSnExp object had a bug that caused the error that you've seen. So nothing you did wrong, it was a bug in the function that ouputs the information on the object (I'll fix that in xcms).

The simple solution is to add the line
Code: [Select]
object.new@.processHistory <- object@.processHistory

to your function right before you return the result

I did also tweak your function a little bit. This version works also with objects after alignment:

Code: [Select]
peakShape_XCMS3v2 <- function(object, cor.val = 0.9, useNoise = 100) {
    require(xcms)
    ## Ensure we use adjusted retention times!
    xdata <- applyAdjustedRtime(object)
    peakmat <- chromPeaks(xdata) #extract everything

    res <- lapply(split.data.frame(peakmat, peakmat[, "sample"]), function(z) {
        ## Subset the data to the present file keeping only spectra that
        ## are in the peak range.
        currentSample <- filterRt(filterFile(xdata, z[1, "sample"]),
                                  rt = range(z[, c("rtmin", "rtmax")]))
        ## Loading the data into memory - could be faster for some settings
        ## can however also be removed
        currentSample <- as(as(currentSample, "OnDiskMSnExp"), "MSnExp")
        corr <- numeric(nrow(z))
        for (i in seq_len(nrow(z))) {
            mzRange <- z[i, c("mzmin", "mzmax")] + c(-0.001, 0.001)
            rtRange <- z[i, c("rtmin", "rtmax")]
            suppressWarnings(
                ints <- intensity(filterMz(filterRt(currentSample, rtRange),
                                           mzRange))
            )
            ints[lengths(ints) == 0] <- 0
            ints <- as.integer(unlist(ints))
            ints <- ints[!is.na(ints)]
            ints <- ints[ints > useNoise]
            ## Only proceed if we have values left
            if (length(ints)) {
                ## Normalize/baseline subtract
                ints <- ints - min(ints)
                if (max(ints) > 0)
                    ints <- ints / max(ints)
                fit <- try(nls(y ~ SSgauss(x, mu, sigma, h),
                               data.frame(x = 1:length(ints), y = ints)),
                           silent = TRUE)
                if (class(fit) == "try-error") {
                    corr[i] <- 1        # Shouldn't that be 0???
                } else {
                    ## calculate correlation of eics against gaussian fit
                    if (sum(!is.na(ints - fitted(fit))) > 4 &&
                        sum(!is.na(unique(ints))) > 4 &&
                        sum(!is.na(unique(fitted(fit)))) > 4) {
                        cor <- NULL
                        options(show.error.messages = FALSE)
                        cor <- try(cor.test(ints, fitted(fit),
                                            method = "pearson",
                                            use = "complete"))
                        options(show.error.messages = TRUE)
                        if (!is.null(cor) && cor$p.value <= 0.05)
                            corr[i] <- cor$estimate
                    }
                }
            }
        }                               # End for loop
        message("Peakshape evaluation: sample ",
                 basename(fileNames(currentSample)), ": ", sum(corr < cor.val),
                 "/", nrow(z), " peaks removed")
        z[corr >= cor.val, , drop = FALSE]
    })
    ## Need to "remember" the process history.
    ph <- processHistory(object)
    chromPeaks(object) <- do.call(rbind, res)
    ## Have to add also the process history
    object@.processHistory <- ph
    object
}
12
XCMS / Re: Gaussian shape peak filtering
Thanks Krista for providing the example!
It's also nice to see that you used the filterRt/filterMz in your code.

I'll have a look at it and post soon.

cheers, jo
13
XCMS / Re: How to export XCMS results
Hi,

it looks like you are using the new interface and functions (aka XCMS3) - groupval is not available for the new result objects. To extract the definitions of the features (previously called "groups") you can use the featureDefinitions function and to extract the actual intensity matrix for the features you will have to use featureValues (ensure also to set value = "into" when calling this function to extract the intensities).

Eventually you should also have a look at the xcms vignette that tries to describe how you can access results etc: https://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html

cheers, jo
14
XCMS / Re: replacement for getEIC in XCMS3
Just to update: we've added the function to the xcms codebase. It will be officially available with the next Bioconductor release (3.8) which will be released end of October 2018.

cheers, jo
15
XCMS / Re: replacement for getEIC in XCMS3
For now you can use the function below (just copy-paste into R). I will add it also to xcms and update the vignette to describe your use case.

Code: [Select]
featureChromatograms <- function(x, expandRt = 0, aggregationFun = "max",
                                 ...) {
    if (!hasFeatures(x))
        stop("No feature definitions present. Please run first 'groupChromPeaks'")
    pks <- chromPeaks(x)
    mat <- do.call(rbind, lapply(featureDefinitions(x)$peakidx, function(z) {
        pks_current <- pks[z, , drop = FALSE]
        c(range(pks_current[, c("rtmin", "rtmax")]),
          range(pks_current[, c("mzmin", "mzmax")]))
    }))
    colnames(mat) <- c("rtmin", "rtmax", "mzmin", "mzmax")
    chromatogram(x, rt = mat[, 1:2], mz = mat[, 3:4],
                 aggregationFun = aggregationFun, ...)
}

This function takes an XCMSnExp object, extracts chromatograms for each feature and returns it as an Chromatograms object. The Chromatograms arranges the XIC in a two-dimensional matrix, columns being the individual samples. Each row contains the XIC for one feature. It is thus straight forward to plot the data for one specific feature. The example code below illustrates it on the famous faahko data set:

Code: [Select]
library(xcms)
library(faahKO)
faahko_3_files <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
                    system.file('cdf/KO/ko16.CDF', package = "faahKO"),
                    system.file('cdf/KO/ko18.CDF', package = "faahKO"))

## Do a quick and dirty preprocessing
od <- readMSData(faahko_3_files, mode = "onDisk")
od <- findChromPeaks(od, param = CentWaveParam(peakwidth = c(30, 80),
        noise = 1000))
od <- adjustRtime(od, param = ObiwarpParam(binSize = 0.6))
od <- groupChromPeaks(od,
        param = PeakDensityParam(minFraction = 0.8, sampleGroups = rep(1, 3)))

## Extract ion chromatograms for each feature (need to post the code above or use xcms version > 3.3.4
chrs <- featureChromatograms(od)

## Extract also XICs without adjusted retention times
chrs_raw <- featureChromatograms(od, adjustedRtime = FALSE)

## Plot the XIC for the first feature using different colors for each file
par(mfrow = c(1, 2))
plot(chrs[1, ], col = c("red", "green", "blue"))
plot(chrs_raw[1, ], col = c("red", "green", "blue"))


In your use case you could simply make then a for loop over nrow(chrs) to plot the data for each feature.

Note: you could also use the highlightChromPeaks function to indicate the identified peaks in the individual XICs:

Code: [Select]
plot(chrs[1, ], col = c("red", "green", "blue"))
highlightChromPeaks(od, rt = range(lapply(chrs[1, ], rtime)), mz = range(lapply(chrs[1, ], mz)),
    border = c("red", "green", "blue"))



Hope that helps.
cheers, jo