Fig. 1: Embedding the model components in an understandable semantic space allows us to systematically and more easily understand the inner workings of large neural networks.

From: Mechanistic understanding and validation of large AI models with SemanticLens

a, To turn the incomprehensible latent feature space (hidden knowledge) into an understandable representation, we leverage a foundation model \({\mathcal{F}}\) that serves as a semantic expert. Concretely, for each component of the analysed model \({\mathcal{M}}\), (1) concept examples \({\mathcal{E}}\) are extracted from the dataset, representing samples that induce high stimuli (that is, strongly activate the component), and (2) embedded in the latent space of the foundation model, resulting in a semantic representation ϑ. Further, (3) relevance scores \({\mathcal{R}}\) are collected for all components, illustrating their role in decision-making. b, This understandable model representation (the set of ϑ’s, potentially linked to the \({\mathcal{E}}\)’s and \({\mathcal{R}}\)’s) enables one to systematically search, describe, structure and compare the internal knowledge of AI models. Further, it allows one to audit their alignment with human expectations and opens up ways to evaluate and optimize human interpretability. Credit: image in a, Unsplash.
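The pipeline in panel a can be illustrated with a minimal sketch (not the authors' implementation). It assumes three placeholder callables that are not defined in the paper excerpt: `model_activations(x)` returning per-component activations of \({\mathcal{M}}\), `foundation_encode(images)` mapping images into the semantic space of a foundation model such as a CLIP-style image encoder, and `relevance(x)` returning per-component relevance scores from an attribution method.

```python
# Hedged sketch of the SemanticLens-style pipeline from panel a.
# `model_activations`, `foundation_encode` and `relevance` are hypothetical
# placeholders standing in for the analysed model M, the foundation model F
# and an attribution method, respectively.

import torch

def build_semantic_representation(dataset, model_activations, foundation_encode,
                                  relevance, n_examples=9):
    """For every component of the analysed model M:
      (1) collect concept examples E = the top-activating samples,
      (2) embed them with the foundation model F -> semantic vector theta,
      (3) aggregate relevance scores R describing the component's role.
    Returns three dicts keyed by component index."""
    # Assumes a small (x, label) dataset that fits in memory.
    xs = torch.stack([x for x, _ in dataset])
    acts = model_activations(xs)   # shape: (n_samples, n_components)
    rels = relevance(xs)           # shape: (n_samples, n_components)

    concept_examples, theta, R = {}, {}, {}
    for comp in range(acts.shape[1]):
        # (1) samples that induce the highest stimuli for this component
        top_idx = torch.topk(acts[:, comp], k=n_examples).indices
        concept_examples[comp] = xs[top_idx]
        # (2) semantic embedding: mean foundation-model embedding of the examples
        emb = foundation_encode(concept_examples[comp])   # (n_examples, d_semantic)
        theta[comp] = emb.mean(dim=0)
        # (3) relevance of this component for decision-making, averaged over data
        R[comp] = rels[:, comp].mean()
    return concept_examples, theta, R
```

The resulting set of ϑ vectors, optionally linked to the \({\mathcal{E}}\)’s and \({\mathcal{R}}\)’s, is what panel b operates on when searching, describing, structuring, comparing and auditing the model's internal knowledge.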
