Topic: Formulating the metabolite identification result via probability

Formulating the metabolite identification result via probability

I'm new to metabolomics but familiar with mass spectrometry-based proteomics. I recently read the call from the Metabolite Identification Task Group to improve the current standard for reporting metabolite identification results (Creek D, Dunn W, Fiehn O, Griffin J, Hall R, Lei Z, et al. Metabolite identification: are you sure? And how do your peers gauge your confidence? Metabolomics. 2014;10:350-3.). My suggestion is this: since candidate molecules are ranked by scores or probabilities, the confidence of the top-ranked candidate can be estimated with a p-value or E-value that indicates how far the highest score lies from the distribution of the other scores, which is treated as a null (random) distribution. This rests on the assumption that the other candidates are wrong and were therefore matched by the algorithm at random. For untargeted analysis, where many features are identified at once, this becomes a multiple testing problem, and the error can be controlled by estimating the false positive rate or false discovery rate.
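To make the idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not an established method: the function names are made up for this post, the null distribution is assumed to be roughly normal and is fitted to the lower-ranked candidates' scores, the E-value is taken as the p-value times the number of candidates, and Benjamini-Hochberg is used as one possible false discovery rate procedure across features.

```python
import math
from statistics import mean, stdev


def top_hit_p_value(scores):
    """One-sided p-value of the best score against a normal fit to the other scores.

    Assumption: all lower-ranked candidates are wrong, so their scores
    sample the null (random-match) distribution, taken here to be normal.
    """
    ranked = sorted(scores, reverse=True)
    top, rest = ranked[0], ranked[1:]
    if len(rest) < 2:
        raise ValueError("need at least two lower-ranked candidates to fit a null")
    mu, sigma = mean(rest), stdev(rest)
    if sigma == 0:
        # Degenerate null: every lower-ranked score is identical.
        return 0.0 if top > mu else 1.0
    z = (top - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


def e_value(p_value, n_candidates):
    """Expected number of random candidates scoring at least as high as the top hit."""
    return p_value * n_candidates


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical candidate scores for two features (one list per feature).
features = [[0.90, 0.30, 0.25, 0.20], [0.31, 0.30, 0.25, 0.20]]
p_values = [top_hit_p_value(s) for s in features]
q_values = benjamini_hochberg(p_values)
```

With only a handful of candidates per feature the fitted null is very rough, so in practice a decoy database or a pooled null across features might be a better source for the score distribution.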

Any other suggestions?

Naiping


Reply #1
The ranking system described in the past and in the paper mentioned above is a good approach, but it has to be built into the tools.

We have developed an unpublished knowledge base tool with the Bandeira lab. It gives the ability to do molecular networking, but also ID and comparative ID (dereplication, as it is called in the natural product community). We simply give an output, and you can then rank the hits as wrong, cannot tell, compound ID class/possibly correct (isomers etc. fall into this), or correct ID. We let the community tell us what the confidence is, since more than one person can subscribe to a data set (which can be thousands of LC-MS runs) and give these star rankings. We also give the overall confidence star rating. In other words, build confidence into the software output and it becomes an issue of the past.
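As a rough illustration of how ratings from several subscribers could be combined into one overall star rating, here is a small Python sketch. It is not how the tool itself computes anything: the mapping of the four categories onto 1 to 4 stars, the use of a plain mean, and the names `STARS` and `overall_rating` are all assumptions made for this example.

```python
from collections import Counter
from statistics import mean

# Hypothetical mapping of the four rating categories onto stars.
STARS = {
    "wrong": 1,
    "cannot tell": 2,
    "possibly correct": 3,  # compound ID class, isomers etc.
    "correct": 4,
}


def overall_rating(votes):
    """Summarize community votes for one identification (assumed: simple mean)."""
    stars = [STARS[v] for v in votes]
    return {
        "n_votes": len(stars),
        "mean_stars": mean(stars),
        "counts": dict(Counter(votes)),
    }


# Four subscribers rate the same spectrum match.
summary = overall_rating(["correct", "correct", "possibly correct", "wrong"])
```

Keeping the per-category counts next to the mean lets a reader see whether an average rating reflects agreement or a split between "correct" and "wrong".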


Here is a representative link: http://gnps.ucsd.edu/ProteoSAFe/result.jsp?task=65349d3bdba24eab93e314ea76765cff&view=group_by_spectrum_all_beta