Figure 1

Computational models of predictive processing at the lexical and semantic levels. To illustrate the idea of prediction operating across multiple representational levels, consider the sentence “I take my coffee with cream and…”, which ends with either an expected completion (sugar), an unexpected but semantically related completion (salt), or an unexpected and semantically unrelated completion (socks). At the lexical level, salt is unexpected because this sequence of words is extremely rarely heard or read. Processing of this word is therefore assumed to be no different from the processing of other unexpected words (i.e. socks). Conversely, at the higher semantic level, salt is relatively more likely because sugar and salt share common features: both are white, edible powders used as condiments. We used two models, lexical surprisal and semantic dissimilarity, to disentangle the contributions of prediction at the lexical and semantic levels, respectively. Top: For the semantic dissimilarity model, vector representations of the previous words in the sentence are averaged to form an estimate of the event context. The latent semantic features of the averaged vector converge on a representation similar to the predicted target “sugar”, which is consequently more similar to words from the same category (e.g. “salt”) than to words from a different category (e.g. “socks”). Bottom: Conversely, the lexical surprisal model does not distinguish between unexpected words on the basis of their semantic category: it reflects only the probability of encountering each word sequence in the training corpus, and both sequences are rare or non-existent.
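
As a minimal sketch of the two measures (not the paper's actual pipeline), the following Python snippet computes semantic dissimilarity as one minus the cosine similarity between an averaged context vector and the target word's vector, and lexical surprisal as the negative log probability of the target given the preceding words. All embeddings and probabilities below are toy values invented for illustration; a real analysis would use pretrained word vectors (e.g. word2vec or GloVe) and an n-gram language model trained on a large corpus.

```python
import numpy as np
from math import log2

# Toy 4-dimensional word embeddings, invented for illustration.
emb = {
    "i":      np.array([0.10, 0.00, 0.20, 0.10]),
    "take":   np.array([0.00, 0.30, 0.10, 0.00]),
    "my":     np.array([0.20, 0.10, 0.00, 0.10]),
    "coffee": np.array([0.50, 0.40, 0.10, 0.20]),
    "with":   np.array([0.10, 0.20, 0.30, 0.00]),
    "cream":  np.array([0.40, 0.50, 0.20, 0.10]),
    "and":    np.array([0.10, 0.10, 0.10, 0.10]),
    "sugar":  np.array([0.50, 0.50, 0.10, 0.10]),  # expected target
    "salt":   np.array([0.45, 0.50, 0.15, 0.10]),  # same category as sugar
    "socks":  np.array([0.00, 0.10, 0.90, 0.80]),  # unrelated category
}

def semantic_dissimilarity(context_words, target):
    """1 - cosine(averaged context vector, target vector);
    higher values mean the word fits the semantic context less well."""
    ctx = np.mean([emb[w] for w in context_words], axis=0)
    tgt = emb[target]
    cos = ctx @ tgt / (np.linalg.norm(ctx) * np.linalg.norm(tgt))
    return 1.0 - cos

# Toy continuation probabilities standing in for an n-gram language
# model; "salt" and "socks" are equally rare after this context.
p_next = {"sugar": 0.8, "salt": 1e-4, "socks": 1e-4}

def lexical_surprisal(word):
    """Surprisal in bits: -log2 P(word | preceding words)."""
    return -log2(p_next[word])

context = ["i", "take", "my", "coffee", "with", "cream", "and"]
for w in ("sugar", "salt", "socks"):
    print(f"{w:6s}  dissimilarity={semantic_dissimilarity(context, w):.3f}"
          f"  surprisal={lexical_surprisal(w):.2f} bits")
```

With these toy numbers, salt and socks receive the same high surprisal, while salt scores far closer to the context than socks on the semantic measure, mirroring the dissociation between the two levels that the figure illustrates.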