
Messages - Adam Carroll

Standards & Databases Interest Group / Rumours of the death of MSI have been greatly exaggerated......
Hi everyone,

I'm very glad to hear the MSI is still active.

Midway through last year I published a paper describing a central, MSI-compliant, interactive repository and raw data-processing pipeline for GC-MS metabolomics (MetabolomeExpress; www.metabolome-express.org) [Carroll, A.J., Badger, M.R. and Harvey Millar, A. (2010) The MetabolomeExpress Project: enabling web-based processing, analysis and transparent dissemination of GC/MS metabolomics datasets. BMC Bioinformatics, 11, 376. http://www.biomedcentral.com/1471-2105/11/376].

Since publication, over 30 users from around the world have signed up and set up their own repositories to deposit their datasets. The database of metabolic phenotypes currently contains ~12,000 public metabolite response statistics from 22 independent experiments, representing 16 different peer-reviewed publications, and provides a range of query tools including cross-study comparisons and database-driven phenocopy (pattern matching) analysis. I'm finding that I really need to advertise the repository, though, rather than relying on people googling 'metabolomics database'. It is surprising (and disappointing) how often I hear it said that there is no central repository for metabolomics. That's not exactly true, because I spent a decent-sized chunk of my life building one. Although MetabolomeExpress was initially published as a pipeline/repository for GC-MS, I have since adapted the database of processed statistics to accept relative metabolite level statistics from all analytical platforms, albeit without the same level of raw data integration as for GC-MS (this will come). A paper describing this is on the horizon.
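For anyone wondering what "phenocopy (pattern matching) analysis" means in practice, here is a minimal sketch of one common approach: rank stored experiments by the Pearson correlation between their metabolite response profiles (e.g. log2 fold changes) and a query profile, using only the metabolites both profiles share. This is an assumption for illustration, not a description of how MetabolomeExpress actually does it; the function names, the `min_shared` cut-off and the example numbers are all hypothetical.

```python
import math

# A response profile: metabolite name -> log2 fold change (treatment vs. control).
Profile = dict[str, float]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no pattern to match
    return cov / (sx * sy)


def rank_phenocopies(query: Profile, database: dict[str, Profile],
                     min_shared: int = 3) -> list[tuple[str, float]]:
    """Rank stored experiments by similarity of their response profile to `query`."""
    hits = []
    for name, profile in database.items():
        shared = sorted(set(query) & set(profile))
        if len(shared) < min_shared:
            continue  # too few metabolites in common for a meaningful comparison
        r = pearson([query[m] for m in shared], [profile[m] for m in shared])
        hits.append((name, r))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up profiles, purely to show the call.
query = {"citrate": 1.2, "malate": 0.9, "glycine": -0.5, "serine": -0.7}
db = {
    "experiment_A": {"citrate": 1.0, "malate": 0.8, "glycine": -0.4, "serine": -0.9},
    "experiment_B": {"citrate": -1.1, "malate": -0.6, "glycine": 0.5, "serine": 0.8},
}
ranked = rank_phenocopies(query, db)
print(ranked)  # experiment_A (similar pattern) ranks above experiment_B (opposite pattern)
```

Correlation rather than raw distance is used here so that a mutant with the same *pattern* of changes but a weaker overall effect still scores as a phenocopy.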

The point of my post is this: I have a lot of highly annotated metabolomics data to share with other databases, and a set of data-mining tools that would benefit from importing highly annotated data from other databases. Clearly, all database operators have a vested interest in cooperating on the development of a universal open exchange format for raw and processed metabolomics datasets. Like most others, it seems, I really like the ISA tools and think that some kind of ISA-based exchange format is the way to go for metadata exchange. Susanna, I see that the ISA tools come with out-of-the-box *capability* for MSI-compliant annotation, particularly at the higher levels. However, unless I'm missing something (i.e. you have a set of metabolomics configurations and ontologies not supplied with the normal ISA tools downloads), I think more work needs to be done in the following areas:

- Standardising the lower-level configurations for specific assay types. For example, instead of just having "metaboliteprofiling_ms" as an assay type configuration, it would be better to have a range of more focused (not necessarily more detailed) assay configurations such as "metaboliteprofiling_1D-GC-EI-TOF-MS" and "metaboliteprofiling_1D-LC-ESI-MSMS", inside which the range of valid values is already greatly constrained by more specialised fields.

- Defining and refining the ontologies that are appropriate for each field. For example, in the default "metaboliteprofiling_ms" configuration, the field ParameterValue[instrument] points to the Proteomics Standards Initiative Mass Spectrometry Ontology. However, there doesn't appear to be a branch in that ontology with a simple list of the major types of instrument used in metabolomics (e.g. 1D-GC-EI-TOF-MS, 1D-LC-QTOF-MS, etc.). In fact, there isn't even a single mention of electron impact ionisation (THE most widely used ionisation technique in MS metabolomics) in that ontology.

- For the 'ontology' fields, limiting the allowed values to the children of the relevant branches of those ontologies. For example, don't make the user search or browse through the entire Mass Spectrometry ontology to find the list of detector types when there is a branch called detector types; limit their options to that branch so they can just click on the correct one. All these links between fields and ontology branches should be part of an MSI-standard ISA configuration.
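To make the last two points concrete, here is a minimal sketch of a focused assay configuration in which each ontology-backed field is pinned to a branch root, so the pick-list offered to the user is just the descendants of that branch. The ontology fragment below is hand-written for illustration and is not an extract of the real PSI-MS ontology; the configuration layout and the functions `branch_terms`, `allowed_values` and `validate` are hypothetical, not part of the ISA tools.

```python
from collections import deque

# Toy ontology fragment as parent -> children ("is_a") edges.
# Hand-written for illustration; NOT an extract of the real PSI-MS ontology.
ONTOLOGY = {
    "mass spectrometer component": ["detector type", "ionization type"],
    "detector type": ["electron multiplier", "microchannel plate detector"],
    "electron multiplier": ["discrete dynode electron multiplier"],
    "ionization type": ["electrospray ionization", "chemical ionization"],
}

# Hypothetical focused assay configuration: each ontology-backed field is
# pinned to the root of the branch its values must come from.
ASSAY_CONFIG = {
    "metaboliteprofiling_1D-GC-EI-TOF-MS": {
        "ParameterValue[detector]": "detector type",
        "ParameterValue[ionization]": "ionization type",
    },
}


def branch_terms(root: str) -> list[str]:
    """All descendants of `root` (excluding the root itself), breadth-first."""
    found: list[str] = []
    seen: set[str] = set()
    queue = deque(ONTOLOGY.get(root, []))
    while queue:
        term = queue.popleft()
        if term in seen:
            continue  # a term can be reachable by more than one path
        seen.add(term)
        found.append(term)
        queue.extend(ONTOLOGY.get(term, []))
    return found


def allowed_values(assay_type: str, field: str) -> list[str]:
    """The pick-list shown to the user: only terms under the field's configured branch."""
    return branch_terms(ASSAY_CONFIG[assay_type][field])


def validate(assay_type: str, field: str, value: str) -> bool:
    """True if `value` is a term from the field's configured branch."""
    return value in allowed_values(assay_type, field)


gc_tof = "metaboliteprofiling_1D-GC-EI-TOF-MS"
print(allowed_values(gc_tof, "ParameterValue[detector]"))
print(validate(gc_tof, "ParameterValue[ionization]", "electron ionization"))
```

Because the toy fragment has no electron ionisation term under "ionization type", the last check fails, which is the same kind of gap described above for the real ontology: branch restriction only helps once the branches actually contain the terms metabolomics needs.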

Finally, as far as I understand it, the ISA-Tab format only defines study design and associated metadata. A standard exchange format for metabolomics also needs to define standardised, open file formats for raw and processed data that can be referenced from within the ISA metadata. For GC-MS raw data a logical format would be *.CDF; for LC-MS/MS, mzML; for NMR, maybe JCAMP (I don't know, I'm more of a mass spectrometry person myself). What about processed peak identification results? Data matrices? Relative metabolite levels? Statistical results?

I would very much like to cooperate with other stakeholders in bringing out a *widely accepted* complete exchange format so that complete datasets, including metadata, raw and processed data from any of the major instruments or organisms, can be transferred between compliant databases in a truly plug-and-play manner. It's a fair bit of work, so having a variety of specialists sharing the load would be a great thing!

Cheers,

Adam

PS:

As part of making MetabolomeExpress I designed a simple yet extensible tab-delimited metadata format based on the old ArMet schema and the MSI recommendation papers, and built template (validation schema) variants for each of the major model organisms and experiment types, including:

bacterial_ecoli
bacterial_general
environmental
fungal_scerevisiae
fungal_general
insect_dmelanogaster
insect_general
human_invivo
human_invitro
mouse_invivo
mouse_invitro
mammalian_invivo
mammalian_invitro
plant_arabidopsis
plant_rice
plant_general
general_invivo
general_invitro

Having an independent validation template for each research area allowed me to specify which ontology/vocabulary must be used in each field. For example, gene references in 'mouse_invivo' or 'mouse_invitro' must be one of the official mouse gene marker symbols from the Mouse Genome Informatics (MGI) website (stored in a table on the MetabolomeExpress server). I've attached the 'mouse_invivo' validation template as an example. The codes define the range of valid values for each field in the format and are all explained in the Appendix of the MetabolomeExpress manual.
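The idea of per-area templates can be sketched as a mapping from field name to a validation rule, applied row by row to a tab-delimited metadata file. This is a minimal illustration only: the field names, the rule kinds (`vocabulary`, `number`, `free_text`) and the four-symbol stand-in for the MGI table are hypothetical and do not reproduce the actual MetabolomeExpress template codes documented in the manual.

```python
import csv
import io

# Stand-in for the MGI marker-symbol table held on the server (tiny sample only).
MGI_SYMBOLS = {"Lep", "Lepr", "Ins2", "Trp53"}

# Hypothetical fragment of a 'mouse_invivo'-style template: field -> (rule, vocabulary).
MOUSE_INVIVO_TEMPLATE = {
    "GeneReference": ("vocabulary", MGI_SYMBOLS),
    "Sex": ("vocabulary", {"male", "female"}),
    "AgeWeeks": ("number", None),
    "Notes": ("free_text", None),
}


def validate_rows(tsv_text: str, template: dict) -> list[str]:
    """Check every data row of a tab-delimited metadata file against `template`."""
    errors = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = set(template) - set(reader.fieldnames or [])
    if missing:
        errors.append(f"missing columns: {sorted(missing)}")
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for field, (rule, vocab) in template.items():
            value = (row.get(field) or "").strip()
            if rule == "vocabulary" and value not in vocab:
                # controlled vocabularies are matched exactly (MGI symbols are case-sensitive)
                errors.append(f"line {line_no}: {field}={value!r} not in controlled vocabulary")
            elif rule == "number":
                try:
                    float(value)
                except ValueError:
                    errors.append(f"line {line_no}: {field}={value!r} is not a number")
            # "free_text" fields accept anything
    return errors


sample = (
    "GeneReference\tSex\tAgeWeeks\tNotes\n"
    "Lep\tmale\t12\tob/ob\n"
    "LEP\tfemale\ttwelve\t\n"
)
errors = validate_rows(sample, MOUSE_INVIVO_TEMPLATE)
for e in errors:
    print(e)
```

Swapping `MGI_SYMBOLS` for a different table (e.g. a fly or yeast gene list) is all it takes to derive another organism-specific template, which is the main attraction of keeping one template per research area.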