TY - JOUR
T1 - Explainable Molecular Sets
T2 - Using Information Theory to Generate Meaningful Descriptions of Groups of Molecules
AU - Mater, Adam C.
AU - Coote, Michelle L.
PY - 2021/10/25
Y1 - 2021/10/25
N2 - Algorithmically identifying the meaningful similarities between an assortment of molecules is a critical chemical problem, and one which is only gaining in relevance as data-driven chemistry continues to progress. Effectively addressing this challenge can be achieved through a reformulation of the problem into information theory, cluster-based supervised classification, and the implementation of key concepts, particularly information entropy and mutual information. These concepts are combined with unsupervised learning atop learned chemical spaces to generate meaningful labels for arbitrary collections of molecules. An open-source and highly extensible codebase is provided to undertake these experiments, demonstrate the viability of the approach on known clusters, and glean insights into the learned representations of chemical space within message-passing neural networks, an architecture not readily permitting interpretability. This approach facilitates the interoperability between human chemical knowledge and the algorithmically derived insights, which will continue to become more prevalent in the coming years.
KW - Molecule
KW - chemistry
KW - chemical space
UR - http://www.scopus.com/inward/record.url?scp=85118134052&partnerID=8YFLogxK
UR - http://purl.org/au-research/grants/ARC/FL170100041
U2 - 10.1021/acs.jcim.1c00519
DO - 10.1021/acs.jcim.1c00519
M3 - Review article
AN - SCOPUS:85118134052
VL - 61
SP - 4877
EP - 4889
JO - Journal of Chemical Information and Modeling
JF - Journal of Chemical Information and Modeling
SN - 1549-9596
IS - 10
ER -