Explainable Molecular Sets: Using Information Theory to Generate Meaningful Descriptions of Groups of Molecules

Adam C. Mater, Michelle L. Coote

Research output: Contribution to journal › Review article › peer-review

Abstract

Algorithmically identifying the meaningful similarities between an assortment of molecules is a critical chemical problem, and one which is only gaining in relevance as data-driven chemistry continues to progress. Effectively addressing this challenge can be achieved through a reformulation of the problem into information theory, cluster-based supervised classification, and the implementation of key concepts, particularly information entropy and mutual information. These concepts are combined with unsupervised learning atop learned chemical spaces to generate meaningful labels for arbitrary collections of molecules. An open-source and highly extensible codebase is provided to undertake these experiments, demonstrate the viability of the approach on known clusters, and glean insights into the learned representations of chemical space within message-passing neural networks, an architecture not readily permitting interpretability. This approach facilitates the interoperability between human chemical knowledge and the algorithmically derived insights, which will continue to become more prevalent in the coming years.
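The two quantities named in the abstract, information entropy and mutual information, can be illustrated on a toy labelling problem. The sketch below is not taken from the authors' codebase. It assumes each molecule has already been assigned to a cluster and annotated with binary substructure flags, and it ranks candidate labels by how much information each flag shares with cluster membership. The feature names, the toy data, and the helper functions `entropy` and `mutual_information` are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Mutual information I(X; Y) = H(X) + H(Y) - H(X, Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical toy data: six molecules, 1 = inside the cluster of interest.
in_cluster = [1, 1, 1, 0, 0, 0]

# Hypothetical binary substructure flags, one value per molecule.
features = {
    "has_carboxylic_acid": [1, 1, 1, 0, 0, 0],
    "has_aromatic_ring": [1, 0, 1, 0, 1, 0],
}

# Score each candidate label by the information it shares with membership.
scores = {
    name: mutual_information(values, in_cluster)
    for name, values in features.items()
}
best_label = max(scores, key=scores.get)
```

Here the flag that coincides exactly with cluster membership carries the full one bit of membership entropy, so it would be chosen as the description of the cluster, while the flag spread across both groups scores close to zero.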

Original language: English
Pages (from-to): 4877-4889
Number of pages: 13
Journal: Journal of Chemical Information and Modeling
Volume: 61
Issue number: 10
DOIs
Publication status: Published - 25 Oct 2021
Externally published: Yes

Keywords

  • Molecule
  • chemistry
  • chemical space
