ePubs
The open archive for STFC research publications
Full Record Details
Persistent URL
http://purl.org/net/epubs/work/50585941
Record Status
Checked
Record Id
50585941
Title
Translating the InChI: adapting neural machine translation to predict IUPAC names from a chemical identifier
Contributors
J Handsel (STFC Rutherford Appleton Lab.), B Matthews (STFC Rutherford Appleton Lab.), NJ Knight, SJ Coles
Abstract
We present a sequence-to-sequence machine learning model for predicting the IUPAC name of a chemical from its standard International Chemical Identifier (InChI). The model uses two stacks of transformers in an encoder-decoder architecture, a setup similar to the neural networks used in state-of-the-art machine translation. Unlike neural machine translation, which usually tokenizes input and output into words or sub-words, our model processes the InChI and predicts the IUPAC name character by character. The model was trained on a dataset of 10 million InChI/IUPAC name pairs freely downloaded from the National Library of Medicine’s online PubChem service. Training took seven days on a Tesla K80 GPU, and the model achieved a test set accuracy of 91%. The model performed particularly well on organics, with the exception of macrocycles, and was comparable to commercial IUPAC name generation software. The predictions were less accurate for inorganic and organometallic compounds. This can be explained by inherent limitations of standard InChI for representing inorganics, as well as low coverage in the training data.
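The character-by-character processing described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the special tokens (`<pad>`, `<s>`, `</s>`), the function names, and the single ethanol example pair are assumptions chosen for illustration, and the transformer encoder-decoder itself is not shown.

```python
# Hypothetical sketch of character-level tokenization for InChI / IUPAC name pairs.
# Special tokens and helper names are illustrative assumptions, not the paper's API.
PAD, BOS, EOS = "<pad>", "<s>", "</s>"
SPECIALS = [PAD, BOS, EOS]


def build_vocab(strings):
    """Map the special tokens, then every distinct character, to integer ids."""
    chars = sorted({ch for s in strings for ch in s})
    return {tok: i for i, tok in enumerate(SPECIALS + chars)}


def encode(text, vocab):
    """Turn a string into ids, one per character, wrapped in start/end markers."""
    return [vocab[BOS]] + [vocab[ch] for ch in text] + [vocab[EOS]]


def decode(ids, vocab):
    """Invert encode(), dropping the special tokens."""
    inverse = {i: tok for tok, i in vocab.items()}
    return "".join(inverse[i] for i in ids if inverse[i] not in SPECIALS)


# One illustrative pair: ethanol.
inchi = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
name = "ethanol"

source_vocab = build_vocab([inchi])  # vocabulary for the encoder input
target_vocab = build_vocab([name])   # vocabulary for the decoder output

source_ids = encode(inchi, source_vocab)
target_ids = encode(name, target_vocab)
```

In a setup like this, each sequence position corresponds to a single character rather than a word or sub-word, so the vocabularies stay small while the sequences become longer.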
Organisation
STFC, SCI-COMP, SCI-COMP-DST
Keywords
Funding Information
EPSRC, PSDS (EP/S020357/1)
Related Research Object(s):
Licence Information:
Creative Commons Attribution 4.0 International (CC BY 4.0)
Language
English (EN)
Type
Journal Article
Details
Journal of Cheminformatics 13, no. 1 (2021): 79.
URI(s)
doi:10.1186/s13321-021-00535-x
Local file(s)
Year
2021