Topics - jamesrco

XCMS / Optimizing centWave settings for HPLC-ESI-MS Orbitrap data
Hello!

First post in this forum. I have used xcms in the past, but never before for Orbitrap data. I am having a heck of a time identifying centWave settings that yield acceptable-looking peaks, so I am hoping for some advice. I’ve reviewed a lot of the existing posts, but nothing has really done the trick. I have experimented with ranges of values for almost all of the centWave parameters, but I am still getting some rather questionable peaks. If anyone out there has any suggestions, I would be very grateful.

Background information: I am working with an 18-sample set of natural lipid extracts from an oxidation experiment. All samples were run on an Agilent 1100 HPLC stack coupled to an Orbitrap Exactive Plus (100-1500 m/z scan range; the chromatogram generally runs out to ~25 min). I used an implementation of msconvert to convert and centroid the Thermo .raw data files, and to extract the positive and negative mode scans into separate files. I am looking now at the positive mode data. I confirmed successful centroiding by inspecting the mzXML in several of the files with a text reader. I see:

Quote
<dataProcessing centroided="1">
    </dataProcessing>
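For anyone who wants to repeat this header check across many files rather than opening each one by hand, here is a minimal sketch in base R. The helper name `has_centroid_flag` and the stand-in file are made up for illustration. Note that it only looks for the attribute in the header text; it does not inspect the spectra themselves.

```r
# Hypothetical helper (not part of xcms): scan the first lines of an mzXML
# file for the centroided="1" attribute shown in the quote above.
has_centroid_flag <- function(path, n_lines = 200) {
  header <- readLines(path, n = n_lines, warn = FALSE)
  any(grepl('centroided="1"', header, fixed = TRUE))
}

# Tiny stand-in file for demonstration only (not real data)
demo_file <- tempfile(fileext = ".mzXML")
writeLines(c("<msRun>",
             '<dataProcessing centroided="1">',
             "</dataProcessing>",
             "</msRun>"),
           demo_file)

has_centroid_flag(demo_file)
```

With real data, something like `sapply(mzXMLfiles, has_centroid_flag)` would apply the same check to every file in a list of paths.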

Right now, I am just working with a single xcmsRaw file, then generating plots using plotPeaks, in order to tune my settings as best as possible. I’ve already run an entire xcmsSet dataset containing all files through a full xcms workup, but I’m skeptical of the results since my peaks look so questionable.

Here’s what I have so far, and some example plots using plotPeaks() from this single sample. I started with the recommended centWave settings for HPLC/Orbitrap in Table 1 of Patti et al., 2012, "Meta-analysis of untargeted metabolomic data from multiple profiling experiments," Nature Protocols 7: 508-516. I’ve also experimented with massifquant, but massifquant appears to be identifying things that definitely aren’t peaks (confirmed by visual inspection when I use sleep = 0.01).

Code:
mzXMLfiles = list.files(mzXMLfiles_folder_pos, recursive = TRUE, full.names = TRUE)

# create xcmsRaw object from just a single sample (for method development, just using the first sample)

xfile_raw = xcmsRaw(mzXMLfiles[1])
profStep(xfile_raw) = 0.05

xfile_raw is:

Quote
An "xcmsRaw" object with 601 mass spectra

Time range: 0.1-1740.7 seconds (0-29 minutes)
Mass range: 100.0007-1499.9558 m/z
Intensity range: 849.698-406951000

MSn data on  0  mass(es)
   with  0  MSn spectra
Profile method: bin
Profile step: 0.05 m/z (28000 grid points from 100 to 1499.95 m/z)

Memory usage: 149 MB
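As a rough sanity check on this summary, the numbers it reports can be turned into a grid-point count and an average scan interval with a few lines of base R. This is only arithmetic on the values printed above; the variable names are illustrative, and the 10 to 45 s window is the peakwidth range tried in the centWave call that follows.

```r
# Values copied from the xcmsRaw summary above
n_scans   <- 601
rt_range  <- c(0.1, 1740.7)     # retention time range, seconds
mz_range  <- c(100, 1499.95)    # profile grid limits, m/z
prof_step <- 0.05               # profile step, m/z

# Number of profile grid points implied by the range and step
n_grid <- round(diff(mz_range) / prof_step) + 1

# Average time between consecutive scans, in seconds
scan_interval <- diff(rt_range) / (n_scans - 1)

# Approximate number of scans across a 10 s and a 45 s wide peak
scans_per_peak <- c(10, 45) / scan_interval

n_grid
scan_interval
scans_per_peak
```

This reproduces the 28000 grid points in the summary and gives roughly 2.9 s between scans, so a 10 s wide peak is sampled by only about 3 to 4 scans in this file.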

I then run:

Code:
rawpeaks = findPeaks.centWave(xfile_raw,
                ppm = 2.5,               # setting of Patti et al., 2012
                peakwidth = c(10, 45),   # upper bound lowered from the 60 s that Patti et al., 2012 recommend for HPLC, based on visual inspection of sample data in Xcalibur
                fitgauss = TRUE,
                noise = 5000,
#               sleep = 1,
                verbose.columns = TRUE,
                snthresh = 10,
                integrate = 1,
                prefilter = c(3, 10000)  # 3.5k recommended by Patti et al. appears to be too low
#               nSlaves = 4              # commented out for now since this is just one sample
                )

I receive the following output:

Quote
Detecting mass traces at 2.5 ppm ...
 % finished: 0 10 20 30 40 50 60 70 80 90 100
 43503 m/z ROI's.

 Detecting chromatographic peaks ...
 % finished: 0 10 20 30 40 50 60 70 80 90 100
 15107  Peaks.
Warning message:
In .local(object, ...) :
  It looks like this file is in profile mode. centWave can process only centroid mode data !

As I said, I have repeatedly confirmed that the data are centroided.
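For reference, here is what the ppm = 2.5 setting in the call above corresponds to in absolute m/z units across the scan range. This is a small base R sketch; `ppm_to_mz` is an illustrative helper, not an xcms function.

```r
# Illustrative helper: convert a ppm tolerance to an absolute m/z deviation
ppm_to_mz <- function(mz, ppm) {
  mz * ppm / 1e6
}

# Tolerance at the low end, middle, and high end of the 100-1500 m/z range
tolerance <- ppm_to_mz(c(100, 500, 1500), ppm = 2.5)
tolerance
```

So the allowed deviation ranges from 0.00025 m/z at m/z 100 to 0.00375 m/z at m/z 1500.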

Here's where I am very skeptical of the peak-picking. Below are images of three sets of peaks from this list of rawpeaks; I am just choosing these three sets of 25 peaks at random. I used this code:

Code:
plotPeaks(xfile_raw, rawpeaks[1:25,], figs = c(5,5), width = 100)

plotPeaks(xfile_raw, rawpeaks[2150:2174,], figs = c(5,5), width = 100)

plotPeaks(xfile_raw, rawpeaks[10150:10174,], figs = c(5,5), width = 100)

to generate these plots:

[attachment: Rplot.pdf]

[attachment: Rplot2.pdf]

[attachment: Rplot.pdf]
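The three plotPeaks() calls each show a contiguous run of rows from the peak table. If a truly random, reproducible sample of rows is wanted instead, a minimal sketch in base R could look like the following. The seed value and the name `idx` are arbitrary choices for illustration; the peak count is taken from the centWave output above.

```r
n_peaks <- 15107    # number of peaks reported by centWave above
set.seed(42)        # arbitrary seed so the same rows are drawn every time

# Draw 25 distinct row indices and sort them for easier reading
idx <- sort(sample.int(n_peaks, 25))
idx

# With xfile_raw and rawpeaks from above in the session, the sample could
# then be plotted with the same call as before:
# plotPeaks(xfile_raw, rawpeaks[idx, ], figs = c(5, 5), width = 100)
```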

For the first set, I am not reading too much into things; we (like everyone) have a lot of junk and noise that co-elutes early at low m/z. However, in some of the subplots, the peaks supposedly bounded by the grey lines aren't even visible. My settings also seem to be picking many things that I wouldn't consider peaks.

I've worked up this same dataset using analogous settings in MAVEN (GUI application from the Rabinowitz lab at Princeton) and the vast majority of the peaks I see using that program are much better. So, I'm fairly certain it isn't something wrong with the underlying data (or our chromatography or MS settings). I am interested in using xcms, however, because we have a follow-on custom pipeline that's already scripted in R.

I appreciate any suggestions, and will very happily clarify anything I've written here if it's not clear.

Thank you in advance!

Jamie

[attachment deleted by admin]