Fig. 1: Estimating unfamiliarity of molecular data using joint modelling.

a, Conceptual representation of the applicability domain. Molecules close to the training data in chemical space are within a model’s applicability domain. Molecules outside this boundary are considered out-of-distribution (OOD). b, The architecture of the JMM, which estimates how ‘unfamiliar’ a molecule is to the model through its reconstruction loss. c, Inducing molecular distribution shifts by separating molecular data into in-distribution and OOD groups through spectral clustering. Results for the Orexin receptor 2 (OX2R) dataset are shown. CNN, convolutional neural network; RNN, recurrent neural network.
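
To make the idea in panel b concrete, the following is a minimal sketch of an unfamiliarity score computed as a reconstruction loss. It assumes the loss is the mean per-token negative log-likelihood of the input SMILES tokens under the decoder's predicted distributions; the caption does not specify the exact loss. The function name `unfamiliarity` and the toy probabilities are hypothetical and are not taken from the JMM implementation.

```python
import math

def unfamiliarity(token_probs, target_tokens):
    """Mean per-token negative log-likelihood of the true tokens.

    token_probs: one dict per sequence position, mapping token -> predicted
        probability (a stand-in for decoder outputs; hypothetical format).
    target_tokens: tokens of the input SMILES string to be reconstructed.
    """
    if not target_tokens:
        raise ValueError("target_tokens must not be empty")
    if len(token_probs) != len(target_tokens):
        raise ValueError("need one probability distribution per target token")
    eps = 1e-12  # guards against log(0) when a token gets zero probability
    nll = [-math.log(max(p.get(t, 0.0), eps))
           for p, t in zip(token_probs, target_tokens)]
    return sum(nll) / len(nll)

# Toy decoder outputs for the SMILES string "CCO" (illustrative numbers only).
target = ["C", "C", "O"]
confident = [{"C": 0.9, "O": 0.1}, {"C": 0.8, "O": 0.2}, {"C": 0.1, "O": 0.9}]
unsure = [{"C": 0.5, "O": 0.5}, {"C": 0.4, "O": 0.6}, {"C": 0.7, "O": 0.3}]

familiar_score = unfamiliarity(confident, target)   # well-reconstructed input
unfamiliar_score = unfamiliarity(unsure, target)    # poorly reconstructed input
```

Under this assumption, a molecule that the model reconstructs poorly receives a higher score, which is the sense in which reconstruction loss can serve as a proxy for unfamiliarity.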