Messages - CoreyG

1
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
Random Forest batch correction (Quality Control – Random Forest Signal Correction; QC-RFSC) is incorporated into statTarget and is available in R (https://bioconductor.org/packages/release/bioc/html/statTarget.html).

Alternatively, there was a recent paper by Fiehn's lab on Systematic Error Removal using Random Forest (SERRF). Some details of the method can be found at https://slfan2013.github.io/SERRF-online/. Unfortunately, they have set it up as an online portal, where you have to upload your data to their server. This could constitute a breach of ethics for data sharing/storage, so be sure you have permission to do so if you use it.
2
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
Glad to see things moving along, Metabolon1.

Regarding batch correction, take a look at a recent topic that came up on the forums: Feature intensity drift correction suggestions. In addition, check out the Metabolomics Society Wiki page on batch effects for some other packages: Batch Effect.
There are also a few Random Forest-based methods floating around. I'll grab the name of the package tomorrow.

Regarding the latest plots you put up, they looked remarkably similar to plots I made when I "exported" my data with featureValues but didn't specify 'value = "into"'. If you don't specify this, featureValues just returns the row number in chromPeaks for each peak rather than its intensity.
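
To illustrate the difference, here is a toy mock in plain R (not xcms itself; the peak table and index values are invented). It shows why the index output is just a set of row numbers, while looking up the "into" column gives intensities. In xcms the equivalent would be calling featureValues with value = "into".

```r
# Toy stand-in for chromPeaks(): one row per detected peak (values invented)
peaks <- cbind(
  mz   = c(100.05, 100.05, 250.10, 250.10),
  into = c(15200, 14800, 980, 1105)
)

# Toy stand-in for the feature definitions: for each feature (row) and
# sample (column), the row number of the matching peak in `peaks`
peak_index <- matrix(c(1, 3, 2, 4), nrow = 2,
                     dimnames = list(c("FT1", "FT2"), c("s1", "s2")))

# Analogous to the index output: just the row numbers
index_values <- peak_index

# Analogous to value = "into": look up the integrated intensity of each peak
into_values <- peak_index
into_values[] <- peaks[as.vector(peak_index), "into"]

print(index_values)
print(into_values)
```

If the exported table contains small integers that grow with sample order, it is most likely the index output.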

Lastly, I too had to set 'binSize' (in PeakDensityParam) quite a bit lower as I observed the feature m/z range was a bit too large given the ppm error that I would expect.
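
For a rough idea of scale, here is a back-of-the-envelope calculation in plain R. The helper name is my own and 10 ppm is only an example value. The m/z width implied by a ppm tolerance is mz * ppm / 1e6, which can be compared against the binSize in use.

```r
# m/z width (in Da) implied by a given ppm tolerance at a given m/z
ppm_to_da <- function(mz, ppm) mz * ppm / 1e6

mz_values <- c(100, 500, 1000)
widths <- ppm_to_da(mz_values, ppm = 10)   # 10 ppm is only an example value

print(data.frame(mz = mz_values, width_da = widths))
```

Even at m/z 1000, 10 ppm corresponds to only 0.01 Da, so a binSize much larger than that will group peaks over a wider m/z range than the instrument error would justify.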

Cheers,
Corey
3
XCMS / Re: Implementing custom retention time alignment algorithms
Thanks Johannes,

I'll take a look at making the changes and generating a pull request (never done one before).

No problem regarding xcms centWave. I don't think I'd be brave enough to suggest changes there.

Cheers,
Corey
4
Other / Re: Open software for SRM experiments
Hi VSO,

I've used Skyline (https://skyline.ms/project/home/software/Skyline/begin.view) a fair bit in the past for metabolomics SRM data analysis. You can export peak areas, retention times (apex, start of integration, end of integration) and FWHM fairly easily (using the document grid). I don't know if it generates a metric for noise (or S/N)...

The hardest part when getting started is that you have to manually specify which transitions you want to look at (edit->Insert->Transition list). Skyline won't read the transition list from a file automatically.

Cheers,
Corey
5
XCMS / Re: Implementing custom retention time alignment algorithms
Thanks for looking into and fixing the error - very much appreciated.

I was quite intrigued that you said fillChromPeaks always uses the adjusted retention time, so I looked a bit deeper into the code.
It seems fairly simple to allow the ability to integrate using the original rt range.

You could add another parameter to getChromPeakData to select whether to switch back to the unadjusted rt range. The function would then store the unadjusted rtime, work out which indexes of rtim correspond to rtmin and rtmax, and use those indexes to get the original rt for the calculation of res[,"into"]:
Code: [Select]
.getChromPeakData <- function(object, peakArea, sample_idx,
                             mzCenterFun = "weighted.mean",
                             cn = c("mz", "rt", "into", "maxo", "sample"),
                             unadjusted=FALSE) {
...
rtim <- rtime(object)
# Keep the unadjusted retention times so they can be used for integration below
if(unadjusted) rtim_unadjusted <- rtime(object, adjusted = FALSE)
...
rtScans<-range(which(rtim >= rtr[1] & rtim <= rtr[2]))
...
if(unadjusted) rtrange<-rtim_unadjusted[rtScans] else rtrange<-rtim[rtScans]
...
res[i, "into"] <- sum(mtx[, 3], na.rm = TRUE) *
          ((rtrange[2] - rtrange[1]) /
             max(1, (sum(rtim >= rtr[1] & rtim <= rtr[2]) - 1)))

However, this highlighted something else in the code that felt odd to me (again, I'm making a lot of assumptions).
By using 'rtr[2] - rtr[1]' in the calculation of "into", don't we always end up overestimating the area of the peak?
rtr comes from the medians of other samples, but getChromPeakData integrates using scans found between these limits. So the rt range of where it integrates is notionally smaller than 'rtr[2] - rtr[1]'. In the example above, rtrange is indeed smaller (with unadjusted=FALSE).
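
A small made-up example of what I mean, in plain R with invented scan times and a constant signal to keep the arithmetic simple. It uses the same formula as above, once with the width taken from rtr and once with the width taken from the scans that were actually integrated.

```r
# Invented scan times (s), one scan per second
rtim <- seq(100, 110, by = 1)
# Invented median rt range from the other samples; it falls between scans
rtr <- c(101.5, 105.5)

in_range    <- rtim >= rtr[1] & rtim <= rtr[2]
intensity   <- rep(1000, sum(in_range))   # constant signal of 1000 per scan
n_intervals <- max(1, sum(in_range) - 1)

# Width taken from the median range (what the formula currently uses)
into_rtr <- sum(intensity) * (rtr[2] - rtr[1]) / n_intervals

# Width taken from the first and last scan that were actually integrated
rt_scans   <- range(rtim[in_range])
into_scans <- sum(intensity) * diff(rt_scans) / n_intervals

print(c(into_rtr = into_rtr, into_scans = into_scans))
```

Here four scans (102 to 105 s) fall inside rtr, so the integrated scans span 3 s while rtr spans 4 s, and the rtr-based value comes out a third larger.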

Could we iterate over peakArea and calculate new rtmin and rtmax based on the actual rtime?
Code: [Select]
peakArea<-apply(peakArea,1,function(pk) {
    # Get start and end index of rtim between rt range
    rtScans<-range(which(rtim >= pk["rtmin"] & rtim <= pk["rtmax"]))
   
    # Convert median rt range to actual rt range
    pk[c("rtmin","rtmax")]<-rtim[rtScans]
   
    # If the user wants unadjusted rt range give it to them, otherwise just rt range
    if(unadjusted) rtrange<-rtim_unadjusted[rtScans] else rtrange<-rtim[rtScans]
   
    # Save rt range in peakArea, so it can be used instead of rtr[2]-rtr[1] for res[i,'into']
    pk[c("rtDiff")]<-diff(rtrange)
   
    return(pk)
})
peakArea<-t(peakArea)

I'm not sure how centWave integrates peaks and how the rtmin and rtmax are chosen. So maybe this doesn't make sense...

Cheers,
Corey
6
XCMS / Re: Implementing custom retention time alignment algorithms
Hi Johannes,
Just to be clear, I am using 'adjustedRtime' to apply the adjusted retention times (and not following it with 'applyAdjustedRtime').
Code: [Select]
adjustedRtime(xdata)<-scans

If I use applyAdjustedRtime after adjustedRtime, I do not get an error with fillChromPeaks. This is likely because hasAdjustedRtime returns FALSE, so the processHistory check never gets performed (methods-XCMSnExp.R#L651, I think)
My concern with using applyAdjustedRtime is that the data will be slightly warped and so the integration during fillChromPeaks will be slightly off. That is, unless it is using the retention time when loading the raw data again? In which case the whole thing is solved  :))

Nonetheless, I am using 'R Under development (unstable) (2019-01-14 r75992)', 'xcms_3.5.1' and 'MSnbase_2.9.3'.
I compiled this version of xcms from the github page ("sneumann/xcms") to utilize the subset feature, so I'm not sure if that version number above is necessarily correct.

Thanks
7
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
Hopefully someone with more experience will chime in when they get the chance.

In my experience, limiting the number of available cores will reduce how much total RAM is used/allocated, i.e. RAM usage should be somewhat linear with the number of parallel processes. On our lower spec desktops, we use 4 threads to keep the RAM usage below 16 GB.
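
As a rough sketch of that rule of thumb in plain R: the helper name and the per-process figures are my own assumptions, so measure the per-process usage on your own data before relying on it.

```r
# Hypothetical helper: how many parallel workers fit in a RAM budget,
# assuming RAM use grows roughly linearly with the number of workers
max_workers <- function(ram_budget_gb, ram_per_worker_gb, overhead_gb = 0) {
  max(1, floor((ram_budget_gb - overhead_gb) / ram_per_worker_gb))
}

# If each worker peaks at about 4 GB (an assumed figure), 16 GB fits 4 workers
print(max_workers(ram_budget_gb = 16, ram_per_worker_gb = 4))

# If a step needs about 20 GB per worker (assumed), 32 GB only fits 1 worker
print(max_workers(ram_budget_gb = 32, ram_per_worker_gb = 20))
```

The result can then be used as the number of workers for whichever BiocParallel backend you pass as BPPARAM.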

Based on my understanding, readMSData generates the "onDisk" object without having to load all the raw data into RAM. findChromPeaks will cause each parallel process to load a raw data file from the hard drive into memory, perform the peak detection, then clear the unneeded data from memory. So this step will be memory and IO intensive.
After that, retention time alignment may or may not be similarly demanding. Obiwarp works on the raw data, so it will need to be loaded into RAM again. Peakgroups, on the other hand, doesn't require the raw data.
The last operation I usually perform is fillChromPeaks. This, again, requires the raw data to be loaded. In my hands, this seems to be the most memory intensive step, requiring me to run it single threaded even with 32 GB of system RAM.

You certainly could get away with changing the number of available cores at different steps. But you might need to experiment to determine what works best for your system. In our case, we ran scripts with different number of threads and monitored the systems - reducing the number until it was stable.
8
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
Take my comments with a grain of salt, because I only ever work with Windows desktop machines.

I would imagine it comes down to the capabilities of the machine (IO performance etc).
There is a batch of samples running on a desktop right next to me, using 6 threads and 20 GB of RAM (including Windows overhead). The hard drives are barely being touched; just the occasional access. So it would hardly be a problem here. But you could imagine if 78 cores are all reading/writing data on the hard drives - that would be a significant bottleneck.

Given what Johannes said, I would limit the core count. The worst thing you could do is limit the available RAM so it has to keep paging the memory to the hard drives (I'm assuming servers would have to do this as well?).

Good luck!
9
XCMS / Implementing custom retention time alignment algorithms
Hi Everyone,

We've been working on some retention time alignment algorithms recently. The major driver for this is that we have noticed that compounds from different classes can exhibit unique retention time drift behavior. So even though they initially elute at very close retention times, one class of compounds will begin to elute later in the chromatogram while the other class elutes earlier.

The issue we are facing is how best to apply these adjusted retention times to an XCMSnExp object and still maintain all the capabilities of XCMS.

Currently, we can force the adjusted retention times into an XCMSnExp object by manually calling adjustedRtime. I know this isn't recommended practice, but is there an alternative? We don't want to use applyAdjustedRtime as we want to go back and integrate missed peaks.

For the most part, this appears to work well, except when we run fillChromPeaks. We get the following error:
Code: [Select]
> xdata<-fillChromPeaks(xdata,BPPARAM=SerialParam())
Defining peak areas for filling-in ....Error in if (idx_rt_adj > idx_pk_det) { : argument is of length zero

This appears to be produced when dropChromPeaks is called (fillChromPeaks->filterFile->chromPeaks->dropChromPeaks). Essentially, dropChromPeaks looks in processHistory to determine whether peak detection occurred before or after retention time alignment. But the retention time alignment isn't in processHistory.

I couldn't quite see how we can add entries into the processHistory. Eventually I resorted to fudging an entry to circumvent the error.
Code: [Select]
processHolder<-processHistory(xdata)
processHolder[[1]]@type<-"Retention time correction"
xdata@.processHistory<-c(processHistory(xdata),processHolder)
Is there a simpler way to accomplish this?

Would anybody care to offer some advice/suggestions for any part of this? I'm happy to have anyone's input.
Thanks!
10
Other / Re: MAVEN
As another follow up, you could look at "El MAVEN" (https://elucidatainc.github.io/ElMaven/). The website says the following:
Quote
Maven and El-MAVEN share following features:
  • Multi-file chromatographic aligner
  • Peak-feature detector
  • Isotope and adduct calculator
  • Formula predictor
  • Pathway visualizer
  • Isotopic flux animator
El-MAVEN is robust, faster and with more user friendly features compared to Maven.
It has been updated fairly recently, with the latest release coming out just 5 days ago.
11
Other / Re: MAVEN
Hi Debbie,

I haven't used MAVEN and by the lack of responses, it doesn't seem like many (any?) people here use it.

I would suggest trying a few things and letting us know if any worked/didn't:
  • Increase the ppm window until you see the peak you expect. If you never see it, there is likely a bigger issue going on.
  • Use msconvert and save the data with and without centroiding.
  • Use msconvert and save the data with a filter for MS1 only.

I assume you are using MAVEN for the analysis part of the program?
13
Chromatography / Re: Buffering both Eluents - Acetonitrile with Ammonium Carbonate
Hi Debbie,

I'm not a user of HILIC columns, but I've heard a few times that retention time reproducibility can be quite sensitive to pH changes. So adding a buffer to the apolar solvent could help.

There are a few discussions going on in the forum about correcting for retention time drift. So if you can't sort it out, it is possible to correct some of the issues with software (such as XCMS).

Keep us informed of your trials - there are a lot of others that will benefit from your experiences!

Cheers,
Corey
14
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
The "onDisk" mode of xcms has allowed us to process ~1,000 samples comfortably on a desktop machine - although it does take some time. Retention time alignment and correspondence happen quite fast and haven't given us any trouble at all.
The only problem we've had is with fillChromPeaks, where we need to run it single threaded due to memory constraints.