Extended Data Fig. 5: Lasso optimization identifies parsimonious sets of chemical descriptors that predict neural odour relationships. | Nature


From: Structure and flexibility in cortical representations of odour space


a, Left, descriptors identified through training on one odour set also improve Pearson’s correlation (r) between corresponding chemical and neural distances for held-out sets of odours. C, clustered; G, global; T, tiled. A value of 1 in the matrix corresponds to no improvement from baseline Pearson’s r value after optimization. Baseline chemical–neural correlation is 0.22 for global; 0.48 for clustered; 0.37 for tiled (see Supplementary Table 1 for optimal descriptor sets). Right, reduction in mean-squared error (MSE) between chemical and neural odour pair distances for held-out odour sets (indicated below the x axis) after training on a single odour set (indicated above). Note that the five odours in common between the global and clustered odour sets (names in bold in Extended Data Fig. 2e) were discarded when evaluating performance on held-out data. The chemical features learned from the tiled odour set improved chemical–neural Pearson’s correlations in the clustered odour experiment but not the global odour experiment, consistent with the odours belonging to the tiled set covering only a limited region of chemical odour space (left). However, despite the limited chemical overlap between the tiled and global odour sets, training on the tiled odour set still improved the correspondence between odour chemistry and neural responses for the global odour set as assessed by a reduction in the mean-squared error (right).
b, Identifying a subset of chemical descriptors (from the original superset used to define odour space) using Lasso optimization on odour distances improves the correspondence to cortical activity (Methods, Supplementary Table 1). Training data were derived from the bouton dataset, and testing was performed for bouton responses to held-out odours within the tiled odour set, and also to cortical responses of the tiled odour set. Data are mean ± s.e.m. over cross-validation folds.
c, The same procedure as in b was performed on a limited subset of 15 semantically relevant descriptors that comprise the ‘molecular properties’ block of the Dragon database; these descriptors include metrics that reflect molecular properties associated with functional groups (for example, donor or acceptor atom surface area), molecular weight (for example, van der Waals molecular volume) or a combination of both, such as ‘hydrophilic factor’, and reflect the main axes of diversity in the tiled odour set. Most descriptors enriched in the olfactory bulb covary with molecular weight (red descriptors). Most descriptors enriched in PCx reflect the combined presence of a charged atom and variable number of carbon atoms along the aliphatic series of the tiled odour set (blue descriptors). Note that these descriptors differ from those identified when querying the entire Dragon set using Lasso optimization (Supplementary Table 1), as this limited set of targeted descriptors (selected because their semantic meaning is transparent) may not afford optimal predictions over neural data.
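
As a rough illustration of the kind of procedure described in b and c, the sketch below fits an L1-penalized (Lasso) regression that predicts a neural distance for each odour pair from per-descriptor chemical differences, then compares Pearson's r on held-out odours against an unweighted baseline. It is not the authors' implementation (see Methods for that). The data are synthetic, and the feature construction (squared per-descriptor differences), the penalty `lam`, the coordinate-descent solver and all function names are assumptions made for this example.

```python
import itertools
import math
import random


def pair_features(desc):
    """Squared per-descriptor difference for every unordered odour pair."""
    return [[(a - b) ** 2 for a, b in zip(desc[i], desc[j])]
            for i, j in itertools.combinations(range(len(desc)), 2)]


def soft_threshold(x, lam):
    """Shrink x towards zero by lam (proximal step for an L1 penalty)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def lasso(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - b - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # centred columns
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta:
                resid = [resid[i] - delta * Xc[i][j] for i in range(n)]
                w[j] = new_w
    intercept = y_mean - sum(w[j] * means[j] for j in range(p))
    return intercept, w


def predict(X, intercept, w):
    return [intercept + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Synthetic stand-in data (NOT from the paper): six hypothetical descriptors,
# of which only descriptors 0 and 2 influence the simulated neural distance.
rng = random.Random(0)
TRUE_W = [1.0, 0.0, 0.5, 0.0, 0.0, 0.0]


def simulate(n_odours):
    desc = [[rng.gauss(0, 1) for _ in TRUE_W] for _ in range(n_odours)]
    X = pair_features(desc)
    y = [sum(w * f for w, f in zip(TRUE_W, row)) + rng.gauss(0, 0.05) for row in X]
    return X, y


X_train, y_train = simulate(20)  # training odour set
X_test, y_test = simulate(10)    # held-out odour set

intercept, weights = lasso(X_train, y_train, lam=0.2)
selected = [j for j, w in enumerate(weights) if abs(w) > 1e-9]

# Baseline: all descriptors weighted equally; optimized: Lasso-weighted subset.
r_base = pearson([sum(row) for row in X_test], y_test)
pred = predict(X_test, intercept, weights)
r_lasso = pearson(pred, y_test)
mse_lasso = sum((p - t) ** 2 for p, t in zip(pred, y_test)) / len(y_test)

print("selected descriptors:", selected)
print("held-out Pearson r, baseline vs Lasso:", r_base, r_lasso)
print("held-out MSE after Lasso:", mse_lasso)
```

The ratio `r_lasso / r_base` plays the role of the matrix entries in a, where a value of 1 means no improvement over baseline. A larger `lam` yields a sparser, more parsimonious descriptor set at the cost of more shrinkage in the retained weights.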
