Skip to main content

Messages

This section allows you to view all Messages made by this member. Note that you can only see Messages made in areas you currently have access to.

Messages - johannes.rainer

1
XCMS / Re: Scan numbering for DDA/IDA-experiments
One possibility would be to first extract the ion chromatograms for all detected peaks and then count the number of data points in each:

Code: [Select]
## Subset to one file, assuming xdata is an XCMSnExp with identified chrom peaks
xdata_1 <- filterFile(xdata, 1)
chrs <- chromatogram(xdata_1, rt = chromPeaks(xdata_1)[, c("rtmin", "rtmax")],
        mz = chromPeaks(xdata_1)[, c("mzmin", "mzmax")])
head(lengths(chrs))
## median number of scans for all peaks in a the file
median(lengths(chrs))

Note: this should be done separately for each file, hence the filterFile step.
2
XCMS Online / Re: Discrepancy between XCMS online and XCMS R-Package
AFAIK xcmsOnline uses an old version of xcms (that's what you see in the logs) - and most likely also an old version of R. The development of xcms and xcmsOnline somehow diverged at some point and all the new developments in xcms (aka xcms3) are not used/available in the online version.
 
Some discrepancies could eventually be explained by the changes in xcms that we did during the update and modernization (have also a look at the vignettes of xcms https://bioconductor.org/packages/release/bioc/html/xcms.html, specifically the New and modified functionality in xcms).

For a thourough comparison one would however have to start with an standalone R xcms version 1.47.3 and compare its results to those of xcmsOnline. Could be that xcmsOnline has some internal helper functions and modifications that are not available in the standalone R version of xcms ... but I am only guessing here.
3
XCMS / Re: Scan numbering for DDA/IDA-experiments
I would be careful with the scmin/scmax lmin/lmax columns - I do not recall what they exactly mean. We do by default not record from which spectrum the data of a chromatographic peak comes, but with the retention time and m/z range available it is easy to subset/extract all spectra for one chromatographic peak.

What exactly do you want/need to do with the data? Maybe there is a simple solution for that...
4
Other / Re: Open software for SRM experiments
I'm also not familiar with SRM/MRM data. But we have implemented a readSRMData function in MSnbase to read chromatographic data from mzML files. Would be nice to know if that works for you and what is missing. Note also that xcms can do now peak detection also on purely chromatographic data (i.e. Chromatogram/Chromatograms classes which are returned by the readSRMData function).
5
XCMS / Re: Implementing custom retention time alignment algorithms
Thanks for sharing your ideas. If you would like some changes in xcms I would however appreciate if you open an issue at the xcms github repository as this enables me also to keep track of what to do (and what was done) - a pull request would actually be even better ;)

Regarding the code in xcms centWave - I did not write that code and I am veeerrry hesitant to change anything within it as this will affect all xcms users.

thanks again, jo
6
XCMS / Re: Implementing custom retention time alignment algorithms
Thanks for clarifying!

Regarding your concern, the fillChromPeaks will always use the adjusted retention times if retention time adjustment has been performed. So, the results should be the same, with or without applyAdjustedRtime.

Regarding the error: I fixed that. You can install the updated version from github:

For the developmental version you are using:
Code: [Select]
devtools::install_github("sneumann/xcms")

For the Bioconductor 3.8 release (R-3.5.x)
Code: [Select]
devtools::install_github("sneumann/xcms", ref = "RELEASE_3_8")

7
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
Excellent summary @CoreyG ! In the end the 78 cores don't help unless you have enough memory to fit the full data from 78 mzML files at once into memory. Peak detection and filling in missing peak data both need to load the full data of a file (all spectra). You could load one of your mzML files into memory and then calculate the size of that using the object_size function from the pryr package:

Code: [Select]
obj <- readMSData(filename, mode = "inMem")
pryr::object_size(obj)

You could thus estimate how many processes you could run in parallel. On top of that you will however also need some more memory to fit the chromPeaks matrix and later the featureDefinitions data frame and R will need some more space to copy stuff in the background.

9
Tools / Re: Data from waters - mass measure in centroid mode
We have AB Sciex data and so far I used proteowizard centroiding (with vendor option). This turned out to be a poor choice. What I am doing now is to export all data from Sciex as mzML in profile mode and perform the centroiding in R using MSnbase.

Have also a look at https://github.com/jotsetung/metabolomics2018 where I describe the centroiding with MSnbase (specifically https://jotsetung.github.io/metabolomics2018/xcms-preprocessing.html#23_centroiding_of_profile_ms_data).
10
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
Just chiming in with some explanations how xcms works now with large projects. We use now an object from the MSnbase package to represent the raw data, that only reads the raw (spectrum) data if required. That way the memory use is minimized. Peak detection is then performed on a per-file basis, i.e. reading the full data from one file, performing the peak detection on that and then removing the spectrum data again from memory. As Jan mentioned, you should be careful to not have too many parallel processes running, as the I/O will be the bottleneck, not the number of CPUs. On our cluster I use not more than 24 CPUs in parallel (although we have 292) because otherwise I got serious troubles with the I/O (this is most likely because our disk setup is ... suboptimal).

Just have a look at a recent xcms vignette (R 3.5.1, Bioconductor 3.8) how to perform the analysis. xcms uses by default this "onDisk" mode.

cheers, jo
13
Other / Re: Feature intensity drift correction suggestions
I'm using linear models to estimate the drift for each feature (similar to Wehrens et al mentione by Jan above) on QC samples requiring that I have valid measurements from the QC samples for 3/4 of the injection index range within a measurement run.

I had also high hopes on the RUVIII approach, but it failed completely on the data set I tested it on. My best guess for this failure is that it needs internal standards to estimate the "unwanted variance", and the values I got from the internal standard were not representative for the "real" samples.

cheers, jo
14
XCMS / Re: Gaussian shape peak filtering
In fact you did everything correctly. The problem was that replacing the identified peaks with chromPeaks(object) <- removes also the processing history (i.e. the information with which algorithm the peaks were detected). The "show" method for the XCMSnExp object had a bug that caused the error that you've seen. So nothing you did wrong, it was a bug in the function that ouputs the information on the object (I'll fix that in xcms).

The simple solution is to add the line
Code: [Select]
object.new@.processHistory <- object@.processHistory

to your function right before you return the result

I did also tweak your function a little bit. This version works also with objects after alignment:

Code: [Select]
peakShape_XCMS3v2 <- function(object, cor.val = 0.9, useNoise = 100) {
    require(xcms)
    ## Ensure we use adjusted retention times!
    xdata <- applyAdjustedRtime(object)
    peakmat <- chromPeaks(xdata) #extract everything

    res <- lapply(split.data.frame(peakmat, peakmat[, "sample"]), function(z) {
        ## Subset the data to the present file keeping only spectra that
        ## are in the peak range.
        currentSample <- filterRt(filterFile(xdata, z[1, "sample"]),
                                  rt = range(z[, c("rtmin", "rtmax")]))
        ## Loading the data into memory - could be faster for some settings
        ## can however also be removed
        currentSample <- as(as(currentSample, "OnDiskMSnExp"), "MSnExp")
        corr <- numeric(nrow(z))
        for (i in seq_len(nrow(z))) {
            mzRange <- z[i, c("mzmin", "mzmax")] + c(-0.001, 0.001)
            rtRange <- z[i, c("rtmin", "rtmax")]
            suppressWarnings(
                ints <- intensity(filterMz(filterRt(currentSample, rtRange),
                                           mzRange))
            )
            ints[lengths(ints) == 0] <- 0
            ints <- as.integer(unlist(ints))
            ints <- ints[!is.na(ints)]
            ints <- ints[ints > useNoise]
            ## Only proceed if we have values left
            if (length(ints)) {
                ## Normalize/baseline subtract
                ints <- ints - min(ints)
                if (max(ints) > 0)
                    ints <- ints / max(ints)
                fit <- try(nls(y ~ SSgauss(x, mu, sigma, h),
                               data.frame(x = 1:length(ints), y = ints)),
                           silent = TRUE)
                if (class(fit) == "try-error") {
                    corr[i] <- 1        # Shouldn't that be 0???
                } else {
                    ## calculate correlation of eics against gaussian fit
                    if (sum(!is.na(ints - fitted(fit))) > 4 &&
                        sum(!is.na(unique(ints))) > 4 &&
                        sum(!is.na(unique(fitted(fit)))) > 4) {
                        cor <- NULL
                        options(show.error.messages = FALSE)
                        cor <- try(cor.test(ints, fitted(fit),
                                            method = "pearson",
                                            use = "complete"))
                        options(show.error.messages = TRUE)
                        if (!is.null(cor) && cor$p.value <= 0.05)
                            corr[i] <- cor$estimate
                    }
                }
            }
        }                               # End for loop
        message("Peakshape evaluation: sample ",
                 basename(fileNames(currentSample)), ": ", sum(corr < cor.val),
                 "/", nrow(z), " peaks removed")
        z[corr >= cor.val, , drop = FALSE]
    })
    ## Need to "remember" the process history.
    ph <- processHistory(object)
    chromPeaks(object) <- do.call(rbind, res)
    ## Have to add also the process history
    object@.processHistory <- ph
    object
}
15
XCMS / Re: Gaussian shape peak filtering
Thanks Krista for providing the example!
It's also nice to see that you used the filterRt/filterMz in your code.

I'll have a look at it and post soon.

cheers, jo