Hi Corey, if you use findPeaks with verbose.columns=TRUE and fitgauss=TRUE, then the peak table contains additional columns with information about the wavelet analysis and the Gaussian parameters mu, sigma and h.
So this feature contains 602 peaks from 434 samples, with mz values between 90.50587 and 90.53768 and the peaks occur between 493 and 571 seconds.
If you look at the figures and see two vertical lines of points, at 493 seconds and 571 seconds, within one smooth Gaussian curve, then the bw parameter was too high. The cluster at ~580 shows little deviation, which should be normal for a UPLC, because the retention times are very stable.
As you can also see in the second plot, the kernel widths are much smaller, which should result in more feature groups. Normally you should also see coloured, dotted vertical lines, which indicate the identified groups and help you to interpret the results. So I assume you have no features with the mass of 81.52?
Another parameter you could optimize is mzwid, which is the width of the m/z slices. The default of 0.25 m/z is quite large for a QTOF.
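To make the mzwid point concrete, here is a small base R sketch. The m/z values are made up, and the slicing is simplified to non-overlapping bins (as far as I know, group.density works with overlapping slices), so treat it as an illustration only.

```r
# Hypothetical peak m/z values: two distinct ions only 0.1 m/z apart
mz <- c(90.506, 90.512, 90.521, 90.606, 90.611, 90.618)

# Assign each peak to a non-overlapping slice of width mzwid
# (simplification, not the actual xcms slicing code)
slice_id <- function(mz, mzwid) floor(mz / mzwid)

n_slices_wide   <- length(unique(slice_id(mz, 0.25)))  # wide slices
n_slices_narrow <- length(unique(slice_id(mz, 0.05)))  # narrow slices

cat("slices with mzwid = 0.25:", n_slices_wide, "\n")
cat("slices with mzwid = 0.05:", n_slices_narrow, "\n")
```

With the wide slice both ions end up in the same slice and can only be separated by retention time; with the narrow slice they are treated separately.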
It looks like the program identified 19 peak groups. That means there are 19 analytes identified across multiple samples? The first analyte is eluting at 566.7325 retention time (median) which has 602 peaks and it appears in 434 samples?
The group function is an alignment function, which matches Peak X from Sample A to its corresponding Peak X in Sample B and so on, and puts the corresponding peaks into one feature group. For the underlying method please check the xcms paper.
The xset@groups output gives you an overview of all detected features, which are regions defined by an m/z range and a retention time range. The "npeaks" column is the total number of peaks that fall into that range over all samples. The "samples" column is the number of samples in which one or more peaks appear in that specific range. That is also the reason why npeaks can be higher than the number of samples.
At this point of the analysis I would recommend optimizing your parameters. See ?group.density for a short description. The retention time difference within your first feature is quite large compared to the second feature. The default bw = 30 is meant for an HPLC setup, so for your UPLC a good starting point would be bw = 10.
You could also set sleep = 5 (5 seconds per feature), which produces a nice figure for each feature. There you get an overview of the detected feature and can see, for example, whether the large difference in rt means that two different peaks occur within a short time in the same m/z slice.
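A small base R sketch of how npeaks and samples relate. The peak table below is invented; only the m/z and retention time limits are taken from the feature discussed above.

```r
# Hypothetical peak table: one row per detected peak
peaks <- data.frame(
  mz     = c(90.510, 90.512, 90.515, 90.530, 91.200),
  rt     = c(495, 500, 568, 566, 500),
  sample = c(1, 1, 2, 3, 2)
)

# Peaks that fall into the m/z and retention time range of the feature
in_feature <- peaks$mz >= 90.50587 & peaks$mz <= 90.53768 &
              peaks$rt >= 493 & peaks$rt <= 571

npeaks  <- sum(in_feature)                           # all peaks in the range
samples <- length(unique(peaks$sample[in_feature]))  # distinct samples only

cat("npeaks:", npeaks, " samples:", samples, "\n")
```

Sample 1 contributes two peaks to the same range, so npeaks is larger than samples.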
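The effect of bw can be tried out with the base R density() function. This is only a sketch: the retention times are made up, and the helper count_groups is hypothetical. As far as I understand, group.density relies on a kernel density estimate of this kind, but the code below is not the xcms implementation.

```r
# Hypothetical retention times: two tight clusters 40 seconds apart
rt <- c(seq(498, 502, by = 0.5), seq(538, 542, by = 0.5))

# Count the clear local maxima of the kernel density estimate
count_groups <- function(rt, bw) {
  d <- density(rt, bw = bw, from = min(rt) - 3 * bw,
               to = max(rt) + 3 * bw, n = 512)
  is_max <- which(diff(sign(diff(d$y))) == -2) + 1
  # ignore negligible bumps far below the main maximum
  sum(d$y[is_max] > 0.1 * max(d$y))
}

groups_bw30 <- count_groups(rt, bw = 30)
groups_bw10 <- count_groups(rt, bw = 10)

cat("groups with bw = 30:", groups_bw30, "\n")
cat("groups with bw = 10:", groups_bw10, "\n")
```

With the wide kernel the two clusters melt into one smooth curve, with the narrow kernel they stay separate.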
This is somewhat similar to the CAMERA plotEICs function. As far as I know you can't collapse all plots directly from the xcms plotEIC into one plot. One solution could be to use the layout function, as Ralf suggested in another thread, so that multiple plots appear on one page. This should work with the normal plotEIC, perhaps with sleep > 0, but I am not sure.
If you really want all EICs in one plot (be aware of differences in retention time and intensity), try this snippet. It is an adaptation of the last example, and the plot quality may not be optimal.
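A minimal sketch of the layout idea in base R. The chromatograms are simulated Gaussian curves standing in for real EICs; with real data you would call your plotting function inside the loop instead (that part is an assumption and not tested here).

```r
# Simulated chromatograms standing in for the EICs (hypothetical data)
rt      <- seq(480, 600, by = 1)
centers <- c(500, 520, 550, 570)

# Split the page into a 2 x 2 grid; layout() returns the number of panels
n_panels <- layout(matrix(1:4, nrow = 2, byrow = TRUE))

for (ctr in centers) {
  intensity <- 1000 * exp(-(rt - ctr)^2 / (2 * 4^2))
  plot(rt, intensity, type = "l",
       xlab = "Retention Time", ylab = "Intensity",
       main = paste("EIC, apex at", ctr))
}

layout(1)  # back to one plot per page
```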
#generate plot
plot(0, 0, type = "n", xlim = c(rt.min, rt.max), ylim = c(0, max(maxint)),
     xaxs = "i", xlab = "Retention Time", ylab = "Intensity",
     main = paste("Extracted Ion Chromatograms for ", "\nTime: From",
                  round(rt.min, 3), "to", round(rt.max, 3)))
#make nice colors, change to number of peaks plotted
col <- c("red", "blue")
for (i in seq(along = xeic.raw@eic[[1]])) {
  points(xeic.raw@eic[[1]][[i]], type = "l", col = col[i])
}
#make legend
legend("topright", col = col, legend = mzrange, lty = 1)
Good question, because the output of getIsotopeCluster can't be mapped directly. It would need a larger code snippet, so I prefer a small change in the function itself. I will report back as soon as I'm finished.
Concerning the value argument: xcms reports 3 different intensity values for each peak.
maxo - maximum peak intensity
into - integrated peak intensity
intb - integrated peak intensity (baseline corrected)
The choice of the intensity value is important for the detection in findIsotopes (for the calculation of the C12/C13 threshold). Within getIsotopeCluster it only changes the reported intensity value.
In our data sets the intb values work best, mainly for peaks with low intensity.
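To illustrate the difference between the three values, here is a simplified base R sketch on a simulated peak with a constant baseline. The trapz helper and the constant baseline are assumptions made for the illustration; xcms computes these values internally in its own way.

```r
# Simulated peak sitting on a constant baseline (hypothetical data)
rt        <- seq(0, 20, by = 1)
baseline  <- 50
intensity <- 1000 * exp(-(rt - 10)^2 / (2 * 2^2)) + baseline

# Trapezoidal integration helper (illustration only)
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

maxo <- max(intensity)                    # maximum peak intensity
into <- trapz(rt, intensity)              # integrated peak intensity
intb <- trapz(rt, intensity - baseline)   # integrated, baseline corrected

cat("maxo:", maxo, " into:", into, " intb:", intb, "\n")
```

The baseline adds the same area to every peak, so for a low intensity peak it makes up a larger share of into than for a high one.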
Aw, bummer. In the pdf help file, it lists "scanrange" as one of the parameters, but I see that "scanrange" is missing when I do as you suggest and type ?xcmsRaw in R. I was hoping I could limit the scan range so that I could look at more than one sample at a time. Currently, each xcmsRaw object is so large that I can only load one at a time into the working memory of my PC.
The xcmsRaw object is too large? Hmm, you could try to set profstep=0. This way no profile matrix is generated. That should save some memory, and centWave works perfectly without it. But every function which depends on the profile matrix certainly does not!
Another way could be to split the xcmsRaw file, depending on your setup, for example if the file contains MS2. Could you provide me with some more details, if the above doesn't help?
Hi Laura, this doesn't work, because scanrange is not a parameter of the xcmsRaw constructor. ("Unused argument" means the argument doesn't exist in the function definition, see ?xcmsRaw)
In general, you read the complete sample, and the subsequent functions like getEIC or findPeaks.centWave can use a subset.
As Jan already pointed out, CAMERA uses multiple pieces of information to decide whether peaks within a short retention time window originate from different co-eluting substances or from the same substance. Those peaks can be adducts, clusters, isotopes and fragments. For example, in our QToF system we observe a lot of in-source fragments.
If you have only a single sample experiment, as in your case, you can only use correlation based on peak shape similarity (short: groupCiS). The groupCorr function, which is a wrapper function for all underlying grouping functions, automatically recognizes this.
So in short, only those compounds stay together which share a high peak shape similarity. But there can be the case, as Jan mentioned, that two compounds have a perfect correlation. I just added one example from our data (attachment Bsp5.png). Here we have 2 substances (red, blue) which share perfect co-elution, even in the peak shape. But in that case we were lucky and CAMERA was able to annotate both to two different pseudo-molecular ion groups afterwards.
If you went directly to the annotated peaks only, it could happen that important peaks are sorted out. For example, we have a high abundance fragment peak with different adducts, like [F+H]+ and [F+Na]+, but only a small [M+H]+ with no isotopes and adducts. If the mass difference between M and F is not in your rule set, both would be separated. But the peak shape analysis suggests a correlation between both.
So we think that high correlation is mandatory and adduct annotation helps in further interpretation.
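The idea of peak shape similarity can be sketched with a plain Pearson correlation in base R. The EICs below are simulated Gaussian peaks and the gauss helper is hypothetical; this shows the principle, not the CAMERA code.

```r
# Simulated EICs on a common scan axis (hypothetical data)
rt <- 1:60
gauss <- function(center, height, sigma = 4) {
  height * exp(-(rt - center)^2 / (2 * sigma^2))
}

eic_a <- gauss(30, 1000)  # e.g. a high abundance ion
eic_b <- gauss(30, 300)   # same shape and apex, lower intensity
eic_c <- gauss(36, 1000)  # apex shifted by 6 scans

cor_ab <- cor(eic_a, eic_b)  # same substance: correlation close to 1
cor_ac <- cor(eic_a, eic_c)  # shifted peak: clearly lower correlation

cat("cor a/b:", round(cor_ab, 3), " cor a/c:", round(cor_ac, 3), "\n")
```

The intensity difference between a and b does not matter for the correlation, only the shape and the position do. Two different substances with identical shape and apex would also reach a correlation near 1, which is the case described above.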
In short, the centWave algorithm searches for m/z signals occurring in consecutive scans within a specific m/z error. The min/max number of necessary scans is calculated from the peakwidth parameter. The m/z error is the combination of the mzwid and ppm parameters.
At the peak apex the m/z error is certainly within the 10 ppm range you mentioned, but at the peak borders, which are normally at low intensities, the mass accuracy is worse. This also applies to low abundance peaks. That is the reason for choosing higher ppm values.
To get an impression of your data, you can look at the @peaks slot or the general peak list, where for each peak, besides the mz value (which is calculated at the peak apex), the mzmin and mzmax values are also reported.
As far as I know, the cdf object written by write.cdf() can be read again only by xcms itself. Other programs like AMDIS or OpenMS fail, because they have additional requirements for the generated cdf.
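A rough base R sketch of why a higher ppm value can help. The m/z values per scan are invented, with larger deviations at the first and last scan of the peak, and longest_run is a hypothetical helper, not part of centWave.

```r
# Hypothetical m/z values of one ion over 7 consecutive scans;
# the first and last scan (peak borders) are less accurate
mz_by_scan <- c(90.5227, 90.5209, 90.5212, 90.5210, 90.5211, 90.5213, 90.5196)
target_mz  <- 90.5211

# Deviation from the reference m/z in ppm
ppm_error <- function(mz, ref) abs(mz - ref) / ref * 1e6

# Longest run of consecutive scans within the ppm tolerance
longest_run <- function(mz, ref, ppm) {
  within <- ppm_error(mz, ref) <= ppm
  runs   <- rle(within)
  if (!any(runs$values)) return(0L)
  max(runs$lengths[runs$values])
}

run_10ppm <- longest_run(mz_by_scan, target_mz, 10)
run_25ppm <- longest_run(mz_by_scan, target_mz, 25)

cat("scans within 10 ppm:", run_10ppm, "\n")
cat("scans within 25 ppm:", run_25ppm, "\n")
```

With the tight tolerance the border scans are lost, so the signal looks shorter than it really is and may fall below the minimum number of scans.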
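For example, the m/z spread of each peak can be converted to ppm like this. The small table is made up and only mimics the mz, mzmin and mzmax columns of the peak list.

```r
# Hypothetical excerpt of a peak list with the mz, mzmin and mzmax columns
peaks <- data.frame(
  mz    = c(90.5211, 181.0700),
  mzmin = c(90.5196, 181.0691),
  mzmax = c(90.5227, 181.0709)
)

# m/z spread of each peak relative to its apex m/z, in ppm
ppm_spread <- (peaks$mzmax - peaks$mzmin) / peaks$mz * 1e6

print(round(ppm_spread, 2))
```

If many peaks show a spread well above your instrument specification, that is a hint that the ppm parameter should not be set too tight.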
I'm not sure, if there was some progress in the xcms development, since the last time I checked.
In .local(object, ...) : It looks like this file is in profile mode. centWave can process only centroid mode data !
This is just a warning from a heuristic function that tries to detect profile data. It can be ignored if your samples are in centroid mode.
Quote
No peak groups found for retention time correction
This can happen with a very large sample set, although I would expect that 2990 features should be enough to find at least one. Are the group parameters (defaults) suitable for your setup?
I think this applies to the older "matchedFilter" algorithm. Within "centWave" this value is much lower and it should occur very rarely. Have a look at the manpage with: