
access centroids from getEIC


I would like access to the centroid (rt, m/z, int) triplets associated with any peak. I have searched both forums and the PDF documentation for how to use the function getEIC(). However, it has been difficult to unlock the information within the returned object beyond just plotting it. I don't necessarily need the plot. I want something like the following:

eic(1): { (rt, m/z, int) (1), ..., (rt, m/z, int) (n) }
eic(n):  { (rt, m/z, int) (1), ..., (rt, m/z, int) (n) }

That would allow me to compare its performance to other algorithms on a centroid basis. I would prefer to avoid using the findPeaks.centWave() method that returns (among other things)

eic(1): { mzmin, mzmax, rtmin, rtmax }.

I could use this to get all points that fall in this region, but that will sometimes return more than one centroid per scan, which would violate the ROI algorithm.


Re: access centroids from getEIC

Reply #1
I'm a little bit confused by the terminology you are using.

XCMS terminology (which unfortunately might not be consistent throughout the documentation due to the different contributors):

A feature describes a 2D region in the m/z vs. retention time space.
An EIC is the projection (sum) of the intensity values within a defined m/z range.
The term peak is used for 1D signals in the spectrum or the chromatogram, but is also sometimes used as a synonym for feature.
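
To make the EIC definition above concrete, here is a minimal Python sketch (not xcms code) that sums the intensities per scan inside an m/z window. The function name `extract_eic` and the sample values are hypothetical, and centroids are assumed to be plain (rt, m/z, intensity) tuples with the retention time standing in for the scan.

```python
from collections import defaultdict


def extract_eic(centroids, mz_min, mz_max):
    """Sum intensities per retention time for centroids with mz_min <= m/z <= mz_max."""
    summed = defaultdict(float)
    for rt, mz, intensity in centroids:
        if mz_min <= mz <= mz_max:
            summed[rt] += intensity
    # Return (rt, summed intensity) pairs ordered by retention time.
    return sorted(summed.items())


# Hypothetical centroid-mode data: (rt, m/z, intensity)
centroids = [
    (1.0, 100.01, 50.0),
    (1.0, 100.02, 25.0),   # second centroid in the same scan and m/z window
    (1.0, 250.00, 10.0),   # outside the m/z window
    (2.0, 100.01, 80.0),
]

eic = extract_eic(centroids, 100.0, 100.05)
print(eic)
```

Note that in this sketch scans without any centroid in the window are simply absent from the result, and the individual m/z values are lost in the sum, which is why an EIC alone cannot give back the (rt, m/z, int) triplets.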

If you are interested in the raw data that corresponds to a list of features, returned by a feature detection algorithm,
then you can extract the raw data (centroids in case of centroid mode data) that
corresponds to these features using the plotRaw function.

The EICs for these features can be extracted or plotted using the getEIC, plotChrom or rawEIC functions.

findPeaks.centWave is an algorithm designed for centroid mode data and returns a list of features.

I hope that helps and didn't cause more confusion :)


Re: access centroids from getEIC

Reply #2
Hi Ralf,

Thanks, I should be more precise. My goal is to evaluate centWave's performance on a manually annotated, centroided LC-MS data set.
In order to do that, I need the centroid values associated with each feature that centWave detects. That
way I can generate quantitative scores like X% sensitivity and Y% specificity, or find the fraction of correct centroids
identified for any given feature.

I would like something like this if possible:

>(object or matrix)  = function for feature finding with centwave()

>And some getter function returns a data structure like the following.

feature(1): { (rt, m/z) (1), ..., (rt, m/z) (k) }
feature(n): { (rt, m/z) (1), ..., (rt, m/z) (k) }

What has not worked:

>findPeaks.centWave()  => Returns a list of features (m x 10 matrix), which is nice, but it contains only summary statistics for each feature.

mz mzmin mzmax rt rtmin rtmax ....

(2) (a)
>xcmsRaw(file) => Returns an xcmsRaw object, on which I cannot call feature finding as I can on an xcmsSet object.
>plotRaw(file) => Returns an (m x 3) matrix of (rt, mz, int) for the whole data set, which is nice.
                            I would like something like this, but with respect to features, not the whole data set.
                            However, this function only seems to accept xcmsRaw objects, which cannot find features.

> xcmsSet( ..., method = 'centWave', ...)  => Returns an xcmsSet object with feature-finding functionality.
                                                                However, the documentation only shows it returning EICs for visualization.

Appreciate the time,


Re: access centroids from getEIC

Reply #3
You can write a script that loops through the feature list that you get from centWave,
and for each feature assigns the result of plotRaw(my-xcmsRaw.object, mzrange=..., rtrange=...) to a variable,
specifying mzrange and rtrange using mzmin, mzmax, rtmin, and rtmax for that feature.
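
The logic of that loop can be sketched in plain Python (this is an illustration of the idea, not xcms code). The names `centroids_in_feature` and `one_per_scan` and all sample values are hypothetical. Keeping only the most intense centroid per scan is an assumption made here to handle the case of several centroids per scan inside one box; it is not necessarily what centWave's ROI step does.

```python
def centroids_in_feature(centroids, feature):
    """Return all (rt, m/z, intensity) points inside the feature's bounding box."""
    return [
        (rt, mz, intensity)
        for rt, mz, intensity in centroids
        if feature["mzmin"] <= mz <= feature["mzmax"]
        and feature["rtmin"] <= rt <= feature["rtmax"]
    ]


def one_per_scan(points):
    """Keep only the most intense centroid for each retention time (assumed rule)."""
    best = {}
    for rt, mz, intensity in points:
        if rt not in best or intensity > best[rt][2]:
            best[rt] = (rt, mz, intensity)
    return [best[rt] for rt in sorted(best)]


# Hypothetical raw data and a one-row feature table with centWave-style columns.
raw = [
    (1.0, 100.010, 50.0),
    (1.0, 100.012, 20.0),  # same scan, same box, lower intensity
    (1.0, 300.000, 5.0),   # outside the m/z range
    (2.0, 100.011, 90.0),
    (9.0, 100.010, 7.0),   # outside the rt range
]
features = [{"mzmin": 100.005, "mzmax": 100.015, "rtmin": 0.5, "rtmax": 2.5}]

# One list of centroids per feature, in the order of the feature table.
per_feature = [one_per_scan(centroids_in_feature(raw, f)) for f in features]
print(per_feature)
```

Skipping `one_per_scan` gives every point in the box, which is what the bounding-box extraction alone would return.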

But I am not sure if I understand why you want to compare or evaluate the raw data for each feature region instead of using the feature coordinates (mz, rt) = the feature center point.


Re: access centroids from getEIC

Reply #4
Thanks, I am implementing that right now :) Got some weird bug :(

[quote author="Ralf"]why you want to compare or evaluate the raw data for each feature region instead of using the feature coordinates (mz, rt) = the feature center point.[/quote]

Fair question.

That is not a bad way of determining feature identities; I hadn't thought of that.
With a manual annotation of a feature's associated centroids,
we can better know what fraction of a feature an algorithm finds. This way you can generate
specificity metrics, because centroids that do not belong to any feature are noise and therefore true negatives.
In the absence of centroid-based criteria, we would rely simply on precision and recall, as in your paper,
since it is difficult to define a true negative on a feature basis. What is a non-feature (if you see
what I mean)? Both types of metrics have their own advantages and disadvantages for characterizing performance.
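
As a rough illustration of the centroid-based scoring described above, here is a minimal Python sketch. It assumes every centroid has a unique ID and that both the manual annotation and the algorithm's output are available as sets of those IDs; the function name `centroid_scores` and the example numbers are hypothetical.

```python
def centroid_scores(all_ids, annotated, detected):
    """Centroid-level confusion counts plus sensitivity and specificity."""
    all_ids, annotated, detected = set(all_ids), set(annotated), set(detected)
    tp = len(annotated & detected)            # feature centroids that were found
    fp = len(detected - annotated)            # noise centroids assigned to a feature
    fn = len(annotated - detected)            # feature centroids that were missed
    tn = len(all_ids - annotated - detected)  # noise centroids correctly left out
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "sensitivity": sensitivity, "specificity": specificity}


# Hypothetical example: 10 centroids, 4 annotated as feature, 3 picked by the algorithm.
scores = centroid_scores(all_ids=range(10), annotated={0, 1, 2, 3}, detected={2, 3, 4})
print(scores)
```

On a feature basis only tp, fp and fn are well defined, which is why precision and recall are used there; the tn count only exists because every centroid is classified.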

Re: access centroids from getEIC

Reply #5
The biggest problem that I see for this type of comparison is coming up with a reliable true positive (TP) list.
Standard mixtures in combination with manual or automatic annotation are often used for that.

But even then,  ALL - TP != TN, since it is almost impossible to predict all the possible fragments, adducts and cluster ions.
Furthermore, even commercial standards have only 98% or 99% purity after HPLC, and
some of the contaminants ionize very well, so all these additional features can hardly be counted as FP ...

Re: access centroids from getEIC

Reply #6
Interesting and a very good point. I will consult my PI about that. Right now I am just trying to capture the evaluation from as many angles as possible. For example, your 7/10 cross-sample validation in the centWave paper was a nice creative way of generating a "ground truth". :D