Fig. 3 | Nature Communications

Fig. 3

From: Predicting natural language descriptions of mono-molecular odorants


Predicting olfactory perception across descriptor sets and molecules. a Top: Schematic of the direct models for predicting ratings. During training (top row), the direct semantic model (DirSem, left column) learns a transformation S from the DREAM descriptors’ semantic vectors to the Dravnieks descriptors’ semantic vectors. The direct ratings model (DirRat, right column) learns a transformation R from molecule ratings on DREAM descriptors to molecule ratings on Dravnieks descriptors. During testing (bottom row), the DirSem and DirRat models use transformations S and R, respectively, to predict molecule ratings on Dravnieks descriptors from the ratings given on DREAM descriptors. Note that during training, DirSem uses no molecules, while DirRat uses the shared set of 58 molecules. Both models are tested on these 58 molecules, averaging across 100 repetitions of 10-fold cross-validation. Bottom: The performance of DirSem (blue dots) and DirRat (orange dots), as well as their averaged mixed model (green dots), as the number of molecules used in training is increased.
b Top: Schematic of the indirect models for predicting ratings. During imputation (top row), both models learn the same transformation C from chemoinformatic properties to the ratings on the DREAM descriptors. During training (middle row), the two models, imputed semantics (ImpSem) and imputed ratings (ImpRat), learn transformations S and R using the same procedure as the training phase of DirSem and DirRat, respectively. During testing (bottom row), the ImpSem and ImpRat models use the transformations SC and RC, respectively, to predict molecule ratings on Dravnieks descriptors from the ones given on DREAM descriptors. Note that the ImpSem model uses no molecules during training, while the ImpRat model is trained on molecules from the set of 70 molecules present only in the Dravnieks dataset. Both models are tested on these 70 molecules, using cross-validation. Bottom: The performance of the ImpSem (blue squares) and ImpRat (orange squares) models and the mixed model (green squares), as the number of molecules used in training is increased. Inset: The correlations for the DirSem (black dots) and ImpSem (black squares) models when no molecules are used during training.
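
The DirRat procedure described in panel a (learn a transformation R from ratings on one descriptor set to ratings on another, then evaluate with 10-fold cross-validation) can be illustrated with a minimal sketch. Everything beyond the caption is an assumption made for illustration: the ratings are synthetic, the descriptor counts are placeholders, ridge regression stands in for whatever regression the authors used, and the per-molecule Pearson correlation is one plausible way to score predictions. Only the 58 shared molecules and the 10 folds come from the caption.

```python
import numpy as np

rng = np.random.default_rng(0)

n_molecules = 58   # shared molecules, as stated in the caption
n_src = 19         # placeholder number of source (DREAM-like) descriptors
n_tgt = 146        # placeholder number of target (Dravnieks-like) descriptors

# Synthetic stand-in ratings: targets are a noisy linear function of sources.
X = rng.normal(size=(n_molecules, n_src))
R_true = rng.normal(size=(n_src, n_tgt))
Y = X @ R_true + 0.5 * rng.normal(size=(n_molecules, n_tgt))


def fit_ridge(X_train, Y_train, alpha=1.0):
    """Closed-form ridge solution R minimizing ||X R - Y||^2 + alpha ||R||^2."""
    gram = X_train.T @ X_train + alpha * np.eye(X_train.shape[1])
    return np.linalg.solve(gram, X_train.T @ Y_train)


def cross_validated_predictions(X, Y, n_folds=10, alpha=1.0, seed=0):
    """Predict each molecule's target ratings from a model that never saw it."""
    order = np.random.default_rng(seed).permutation(len(X))
    Y_pred = np.empty_like(Y)
    for test_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, test_idx)
        R = fit_ridge(X[train_idx], Y[train_idx], alpha)
        Y_pred[test_idx] = X[test_idx] @ R
    return Y_pred


def mean_per_molecule_correlation(Y, Y_pred):
    """Average Pearson correlation between true and predicted rating profiles."""
    corrs = [np.corrcoef(y, p)[0, 1] for y, p in zip(Y, Y_pred)]
    return float(np.mean(corrs))


Y_pred = cross_validated_predictions(X, Y)
score = mean_per_molecule_correlation(Y, Y_pred)
print(f"mean held-out correlation: {score:.3f}")
```

Under the same assumptions, the indirect models of panel b would compose two such maps: a first fit C from chemoinformatic properties to source ratings, followed by R, so that predictions take the form of the product RC applied to the chemical features.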
