How to create a csv HMDB database without programming

This post describes how to create an HMDB metabolites CSV database for searching mass spectrometry spectra without using scripts such as Python or R. Two options are described. Option 1 is the simplest approach and does not require any specialized software. Option 2 uses freeware and is useful for converting sdf files. Some biofluid-specific HMDB databases only appear to be available as sdf files, so Option 2 may come in useful.

Option 1.
  • Navigate to the HMDB.ca website and, from the top menu, select “Metabolites” from the Browse dropdown.
  • Choose your filters, apply them, then select the Export button, which should be below the filters on the right side (the entire database is approx. 248K entries).
  • The export may take a few minutes. Try again if it does not work the first time. Once it completes, the file will either be downloaded automatically to your browser's default download location or a Save As box will pop up, depending on how you have things configured. Notice that the file does not have an extension. Rename the file as desired and add the .CSV extension on the end.
  • Open the file in Excel.
  • Modify as needed. For example, delete any columns you don't want, and change the column headers so they are recognizable by your mass spec software. The Agilent MassHunter Qualitative software, for instance, recognizes the following column headers:

    Formula
    Retention Time, RT
    Mass
    Compound Name, Compound, Cpd, Name
    Description, Notes, Comments
    CasId, CAS
    KeggId, KEGG
    HmpId, HMP
    Structure
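
None of the above needs a script, but if you later find yourself repeating the header renaming for several exports, it can be automated. Below is a minimal Python sketch of that step, not a tested workflow. The export column names in HEADER_MAP (NAME, CHEMICAL_FORMULA, MONO_MASS, CAS) are assumptions on my part, so check the first line of your own export and adjust them. The target headers are taken from the list above, and the sample row is only for illustration.

```python
import csv
import io

# Hypothetical mapping: export column name -> header the search software
# recognizes. The left-hand names are assumptions; edit them to match the
# first line of your own export.
HEADER_MAP = {
    "NAME": "Name",
    "CHEMICAL_FORMULA": "Formula",
    "MONO_MASS": "Mass",
    "CAS": "CAS",
}

def rename_columns(csv_text, header_map=HEADER_MAP):
    """Keep only the mapped columns and write them under the new headers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(header_map.values()), lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Columns missing from the export are written as empty cells.
        writer.writerow(
            {new: row.get(old, "") for old, new in header_map.items()}
        )
    return out.getvalue()

# Illustrative one-row export (not real export output).
sample = (
    "HMDB_ID,NAME,CHEMICAL_FORMULA,MONO_MASS,CAS\n"
    "HMDB0000001,1-Methylhistidine,C7H11N3O2,169.0851,332-80-9\n"
)
print(rename_columns(sample))
```

To use it on a real file, read the file into a string, pass it to rename_columns, and write the result to a new .csv file.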

Option 2.
  • Download the freeware Datawarrior from: https://openmolecules.org/datawarrior/download.html
  • Load the sdf file. Delete any unwanted columns, then go to File > Save Special > Textfile…
    An alternative to deleting the columns in Datawarrior is to import the file from within Excel by navigating to Data > From Text and selecting the desired columns using the Excel import wizard. Alternatively, you can just edit the file directly in Excel. If an error is returned when searching the csv file, consider running the Excel CLEAN function on some of the columns.
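
For anyone curious what the CLEAN step is doing: Excel's CLEAN removes the non-printing ASCII characters (codes 0 to 31) from text. If the file is too large to handle comfortably in Excel, roughly the same cleanup can be sketched in Python as below. The function names are my own, and this only approximates CLEAN, so treat it as a starting point.

```python
import csv
import io

def clean_field(value):
    """Drop ASCII control characters (codes 0-31), roughly like Excel's CLEAN."""
    return "".join(ch for ch in value if ord(ch) >= 32)

def clean_csv(csv_text):
    """Apply clean_field to every cell of a CSV given as a string."""
    # newline="" leaves line endings alone so the csv module can handle
    # line breaks that sit inside quoted cells.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([clean_field(cell) for cell in row])
    return out.getvalue()
```

Note that this also removes tabs and line breaks inside cells, which is usually what you want when a stray line break in a description column is what trips up the search.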


Other biofluid-specific databases available through the https://www.tmicwishartnode.ca/databases/ website are offered as sdf files and, since they are subsets of the main HMDB database, are much smaller. The same biofluid databases are offered through the HMDB.ca website, but only as zipped XML files.
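
As background on why the sdf files convert to a table so easily: in an sdf file each compound record ends with a line containing $$$$, and each data field is a header line of the form "> <FIELD_NAME>" followed by its value and a blank line. The sketch below reads just those data fields in Python and ignores the structure block. It is a simplified illustration under those assumptions, not a full sdf parser, and the field names in the sample (DATABASE_ID, FORMULA) are hypothetical, so check them against your own file.

```python
import re

def sdf_records(sdf_text):
    """Yield one dict of data fields per sdf record (structure block ignored)."""
    for block in sdf_text.split("$$$$"):
        fields = {}
        lines = block.splitlines()
        i = 0
        while i < len(lines):
            # A data header looks like "> <FIELD_NAME>".
            match = re.match(r">.*<([^>]+)>", lines[i])
            if match:
                values = []
                i += 1
                # The value runs until the next blank line.
                while i < len(lines) and lines[i].strip():
                    values.append(lines[i].strip())
                    i += 1
                fields[match.group(1)] = " ".join(values)
            else:
                i += 1
        if fields:
            yield fields

# Tiny made-up record with hypothetical field names.
sample_sdf = (
    "\n  sketch\n\nM  END\n"
    "> <DATABASE_ID>\nHMDB0000001\n\n"
    "> <FORMULA>\nC7H11N3O2\n\n"
    "$$$$\n"
)
records = list(sdf_records(sample_sdf))
```

Each dict could then be written out as one CSV row, which is essentially what the Datawarrior export in Option 2 does for you.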