Fig. 2: Behavioural results using distributional semantics and large language models.
From: Neural dynamics of semantic control underlying generative storytelling

a Graphical representation of the pipeline adopted for the semantic distance analysis. We used word2vec models to extract word-embedding vectors for the target (orange) and non-target (black) words in participants' stories, excluding stop words (grey). The depicted story comes from a participant sampled from the Creative condition. We then computed the semantic distance between target and non-target words and averaged the results across the 3 targets. b Raincloud plots depicting the semantic distance scores across the Ordinary (green), Creative (pink), and Random (purple) conditions. Each dot represents a participant, and horizontal bars with asterisks indicate statistical significance (N = 24). c Graphical representation of the pipeline adopted for the surprise estimation analysis. We used a large language model (BERT) to perform masked language modelling, masking the target-word tokens and computing the model's predictions for those masked tokens. We then computed the surprise score as the negative log-likelihood (NLL) of the target words under those predictions and averaged the results across the 3 targets. d Raincloud plots showing the surprise scores across the 3 conditions, as in (b); horizontal bars with asterisks indicate statistical significance. For all boxplots in the raincloud plots, the dotted and solid horizontal lines represent the mean and median values, respectively, while the whiskers extend to the minimum and maximum data points that do not exceed 1.5 times the interquartile range.
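The two pipelines in panels (a) and (c) can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' code: the embedding vectors are toy stand-ins for word2vec outputs, `mean_semantic_distance` assumes semantic distance is computed as cosine distance (a common choice for word2vec embeddings), and `surprise` assumes the NLL is taken over a softmax-normalised probability the model assigns to the target token. All names and example words are hypothetical.

```python
import math

# Toy embedding vectors standing in for word2vec outputs (hypothetical words).
EMBEDDINGS = {
    "lamp": [0.9, 0.1, 0.2],
    "light": [0.8, 0.2, 0.1],
    "dragon": [0.1, 0.9, 0.3],
}

def cosine_distance(u, v):
    """1 minus the cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def mean_semantic_distance(target, non_targets):
    """Average cosine distance from one target word to all non-target words,
    as in panel (a); in the study this is then averaged across the 3 targets."""
    t = EMBEDDINGS[target]
    return sum(cosine_distance(t, EMBEDDINGS[w]) for w in non_targets) / len(non_targets)

def surprise(probs, target_index):
    """Surprise of a target token, panel (c): negative log-likelihood of the
    target under the masked language model's predicted distribution."""
    return -math.log(probs[target_index])
```

For example, `mean_semantic_distance("lamp", ["light", "dragon"])` is small when the story words stay near the target's region of embedding space, and `surprise(probs, i)` is 0 when the model assigns probability 1 to the target and grows as the target becomes less predictable.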