InChIKey Generation

Developers,

There seems to be a bug in the InChIKey generation. I'm getting a few InChIKeys that don't seem to match their corresponding metabolites; at least, they return 0 results on PubChem. It looks like they should have an S instead of an N as the third letter from the end (the first letter of the "NA-N" ending). I have listed a few examples below.

Name                     MS-DIAL Generated InChIKey     Correct(?) InChIKey
3-Hydroxyvaleric acid    REKYPYSUBKSCAT-UHFFFAOYNA-N    REKYPYSUBKSCAT-UHFFFAOYSA-N
Leucine                  ROHFNLRQFUQHCH-UHFFFAOYNA-N    ROHFNLRQFUQHCH-UHFFFAOYSA-N
Pyroglutamic acid        ODHCTXKNWHHXJC-UHFFFAOYNA-N    ODHCTXKNWHHXJC-UHFFFAOYSA-N
Threonine                AYFVYJQAPQTCCC-UHFFFAOYNA-N    AYFVYJQAPQTCCC-UHFFFAOYSA-N

Could this be easily patched, or is there a workaround? I guess I could use the SMILES instead of the InChIKeys.
Thanks!

Re: InChIKey Generation

Reply #1
I have the same problem. My workaround is to keep just the first layer of the InChIKey and append UHFFFAOYSA-N. Separation of enantiomers (second layer of the InChIKey) is rare on classic LC-MS anyway.
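A minimal Python sketch of this workaround, for anyone who wants to apply it to a whole list of keys. The function name `to_flat_standard_key` is only illustrative and is not part of MS-DIAL. Note that this deliberately throws away everything after the first layer (the 14 characters before the first hyphen), so stereoisomers and charged forms collapse to one key.

```python
import re

# Overall InChIKey shape: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")

# Suffix used in the workaround: no stereo information, standard, neutral.
FLAT_STANDARD_SUFFIX = "UHFFFAOYSA-N"


def to_flat_standard_key(inchikey: str) -> str:
    """Keep the first layer and append the stereo-free standard suffix."""
    key = inchikey.strip().upper()
    if not INCHIKEY_PATTERN.match(key):
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    first_layer = key.split("-")[0]
    return f"{first_layer}-{FLAT_STANDARD_SUFFIX}"


if __name__ == "__main__":
    for generated in ("REKYPYSUBKSCAT-UHFFFAOYNA-N", "ROHFNLRQFUQHCH-UHFFFAOYNA-N"):
        print(generated, "->", to_flat_standard_key(generated))
```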

@Hiroshi Tsugawa
I am not sure MS-DIAL/MS-FINDER generates InChIKeys by itself. It seems some library in MS-DIAL contains keys ending in UHFFFAOYNA-N. Perhaps changing UHFFFAOYNA-N to UHFFFAOYSA-N would resolve the confusion?

Re: InChIKey Generation

Reply #2
Hi,

I do not use any database information to put InChIKeys into the MSP/MS-FINDER databases.
I use the ChemAxon molconvert program to generate InChIKeys from SMILES/InChI codes.
Therefore, some inconsistencies with databases like PubChem are to be expected. As discussed here, in metabolomics the first layer of the InChIKey can be used for searching metabolites. For example, the second layer's key "UHFFFAOY" means "it does not have stereoisomer information".
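As a rough illustration of searching by the first layer only, the Python sketch below indexes a small list of keys by the 14 characters before the first hyphen and then looks up a PubChem-style key against it. The helper names (`first_layer`, `index_by_first_layer`) and the tiny example library are assumptions made for the example, not anything shipped with MS-DIAL.

```python
from collections import defaultdict


def first_layer(inchikey: str) -> str:
    """Return the part of the key before the first hyphen."""
    return inchikey.strip().upper().split("-")[0]


def index_by_first_layer(inchikeys):
    """Group full InChIKeys under their first layer."""
    index = defaultdict(list)
    for key in inchikeys:
        index[first_layer(key)].append(key)
    return dict(index)


# Keys as they appear in the MS-DIAL output quoted in this thread.
library = [
    "REKYPYSUBKSCAT-UHFFFAOYNA-N",
    "ROHFNLRQFUQHCH-UHFFFAOYNA-N",
    "ODHCTXKNWHHXJC-UHFFFAOYNA-N",
    "AYFVYJQAPQTCCC-UHFFFAOYNA-N",
]
index = index_by_first_layer(library)

# A key with the standard "SA-N" ending still matches on the first layer.
query = "ROHFNLRQFUQHCH-UHFFFAOYSA-N"
matches = index.get(first_layer(query), [])
print(matches)
```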

The last three letters, like "NA-N", describe (1) S/N: whether the key was generated from a standard (S) or non-standard (N) InChI, (2) A: the version of InChIKey, and (3) -N: the ionized form (protonation state; N means neutral).
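To make the positions concrete, here is a small Python sketch that splits a key into the pieces described above. The `InChIKeyParts` tuple and `split_inchikey` function are hypothetical names chosen for this example; the sketch only slices the string by position and does not validate the hash characters.

```python
from typing import NamedTuple


class InChIKeyParts(NamedTuple):
    first_layer: str     # 14 characters before the first hyphen
    second_layer: str    # first 8 characters of the middle part, e.g. "UHFFFAOY"
    standard_flag: str   # "S" = standard, "N" = non-standard
    version: str         # "A" in the keys shown in this thread
    protonation: str     # final character; "N" = neutral


def split_inchikey(inchikey: str) -> InChIKeyParts:
    """Split an InChIKey by position; raise ValueError if the shape is wrong."""
    pieces = inchikey.strip().upper().split("-")
    if len(pieces) != 3 or [len(p) for p in pieces] != [14, 10, 1]:
        raise ValueError(f"unexpected InChIKey shape: {inchikey!r}")
    first, middle, last = pieces
    return InChIKeyParts(first, middle[:8], middle[8], middle[9], last)


if __name__ == "__main__":
    parts = split_inchikey("REKYPYSUBKSCAT-UHFFFAOYNA-N")
    print(parts.standard_flag, parts.version, parts.protonation)
```

Comparing the `standard_flag` field of two keys is a quick way to see whether they differ only in the standard/non-standard flag, as in the examples at the top of this thread.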

We need to know these things when searching for metabolites by InChIKey identifiers.
Thanks,

Hiroshi