Introduction

Infants are powerful learners, sensitive to statistical variabilities in their environment, and capable of selectively orienting their attention towards stimuli with the maximal potential for learning1,2. It has been suggested that adults structure the information that they present to young learners, whether implicitly or explicitly, to maximise learnability3. In other words, the form of child-directed content typically reflects its function4. This has been shown, for example, for child-directed speech (CDS)5, child-directed gestures6, child-directed television7, and child-directed singing8.

The first and best-known way in which we alter the information content of stimuli that we present to young listeners is by structuring it so that the signal is easily differentiated from the noise9,10,11,12,13,14. For example, in many languages, CDS contains exaggerated pitch, elongated words, and expanded vowel space with stretched formant frequencies15. Previous research suggests that these changes exaggerate the critical features that distinguish one phoneme from another16,17,18. Children learn better from CDS than they do from adult-directed speech (ADS)19, and CDS evokes greater attention-related electrophysiological responses than ADS17. This suggests that the specific form of CDS is tailored to its transmissive function, namely, to allow infants to learn the regularities of their native language.

A second way in which we alter the content of information that we present to young listeners is by balancing the trade-off between predictability and complexity (see Fig. 1). An upcoming stimulus is highly predictable when a particular learner can fully anticipate and encode it based on their prior knowledge without too much cognitive effort (i.e., when the agent’s mental model of the environment does not need to be expanded to process the stimulus fully). Conversely, a stimulus is complex when it is relevant for the individual because it yields substantial information gain (reduction of uncertainty) for a particular learner in a particular environment. These two needs (predictability and complexity), which are fundamental and antagonistic, are essential to enable learners to reduce their uncertainty by creating models of their environment in order to make future predictions20,21,22,23, and also aid effective communication and cooperation24,25,26,27. This idea has been extensively discussed in the context of adult-directed content, such as music28,29,30,31. The balance between these two opposing forces, often referred to as the Goldilocks zone, has also been shown to govern infants’ allocation of their attention to visual and auditory stimuli32,33. For instance, 8-month-old infants already orient their attention to maximise learning2, preferring stimuli that cause a model to update (the complexity force) while still being able to encode novel information (the predictability force)26. Importantly, the predictability-complexity trade-off is not a fixed, universal value but is modulated by the intention, context, and agents’ prior representations and past experiences.

Fig. 1: The preferred information rate is the result of a balance between predictability and complexity.

The mean (preferred information rate) and standard deviation (tolerance for a different information rate) can change over development, intention, and socio-environmental context. For instance, speakers spontaneously increase their speech redundancy when talking in a noisy environment to maintain the same signal-to-noise ratio11. CR (Compression Ratio) indicates complexity measured as the length of the shortest description of the message divided by the length of the original message (see Methods—Text informativeness metric). A small CR (0) indicates low information density, high signal-to-noise ratio, and high predictability; a large CR (1) indicates high information density, low signal-to-noise ratio, and low predictability.

This balance between predictability and complexity is also particularly important when considering the acoustic properties of child-directed songs34,35,36,37. Previous research has investigated these differences from the perspective of fine-grained acoustics: for example, we know that similar acoustic differences exist between child- and adult-directed music as between child- and adult-directed speech5; and that, in music, phonetic and acoustic information are mapped more clearly than in speech alone38, which increases intersensory redundancy and thus guides perceptual differentiation39. Even 10-month-old infants can parse a continuous auditory stream into discrete word segments from a song40, and speech sung along with music helps language acquisition38. Furthermore, caregivers spontaneously alter the information-theoretic characteristic of their prosodic contours, with pitch trajectories being less predictable in infant-directed speech than in adult-directed speech, possibly to attract infant attention by inducing uncertainty-driven curiosity41.

However, thus far the literature has focused on static measures of predictability in information content. Studies have looked at how a single time-invariant information content measure of a specific stimulus impacts infants’ attention or learning. Here, we concentrate instead on how information content is structured over time in child-directed content. To do so, we focused on a different aspect of child-directed songs, namely, the form of the information input carried over time by the song lyrics, in terms of textual predictability. Extensive evidence suggests that semantic and orthographic complexity tend to be aligned in language26,42,43,44,45 (for example, simpler contexts tend to have similar forms). Even though child-directed songs like nursery rhymes contain more than textual information, for instance, melody, harmony, rhythm, and movement46, or in this case, even video, we focus here on textual predictability as a relevant feature to investigate because of the role of nursery rhymes in word learning. We acknowledge that the overall complexity carried by the song is not solely dependent on the textual content. One of the limitations of studying only the form of lyrics, in a medium that also contains musical features, is that we are missing some of the cues that infants use to help them encode the audiostream47. However, as the vocal part of nursery rhymes attracts infants’ attention more than an instrumental version48, and as vocals are present in more than half of infants’ musical experience49, the lyrical content alone should still be constrained by the infant’s needs, as it is a salient part of the audiostream. Specifically, we hypothesised that textual information is introduced differently in child-directed songs (CDSongs) compared to adult-directed songs (ADSongs) and child-directed stories (CDStories). We think that those characteristics emerged from a long evolutionary process (through copying, variation, competition, and selection) carried by children50 along with caregivers51. By looking at the shape of textual content as cultural artefacts52, we can try to unveil the selective pressures that shaped them in the first place.

We hypothesise that the form of CDSongs’ lyrics should be determined by the functions they have and the multiple caregiving roles that they support. In other words, the introduction of semantic information through lyrics, like any other musical feature, is potentially shaped to fulfil specific needs for each agent51,53. Accordingly, we posited that the differences in the patterns of textual information introduction, between CDSongs and ADSongs, and between CDSongs and CDStories, stem from the different functions of each register. CDSongs are used by caregivers across cultures54,55 to assist them in a variety of tasks56, from early social bonding57,58 and soothing59,60 to word learning61,62,63 and collective creative endeavours64,65. In addition to acoustic differences between CDSongs, ADSongs, and CDStories, the form of information introduction patterns in CDSongs could also partly explain infants’ preference for them compared to ADSongs66 and CDSpeech67. Accordingly, we compared the form of CDSongs to ADSongs and the form of CDSongs to CDStories. We expected the CDSongs form to differ from the ADSongs form by showing a relative predictability level adapted to a younger audience, in line with the idea that CDSongs are used by caregivers to teach new words, soothe, and enhance social bonding with their child. We also expected CDSongs to differ from CDStories because of the more salient role of music for social bonding and joint creation compared to storytelling65,68,69, and because of the differences in their performance by caregivers: namely, that CDSongs, unlike CDStories, are usually performed without written support.

We examined differences in information content between CDSongs, ADSongs, and CDStories in three analyses spanning nested timescales. In the first analysis, we compared the overall complexity of the three text types (CDSongs, ADSongs, and CDStories). Complexity was measured by calculating the Compression Ratio (CR), which is the length of the shortest description of the message divided by the length of the original message. This approach, based on minimum description length70, gives us insight into the relative quantity of information contained within the text (see Methods). We first assessed whether there was a relationship between overall complexity and the song’s popularity, indexed as the number of views of that content on YouTube, normalised by time since the content had been posted (Analysis 1a). Based on the Goldilocks principle32,33, we predicted that popularity should follow an inverted U-shaped function of complexity, meaning that the most popular songs should be of intermediate complexity. Our second hypothesis was that CDSongs have greater redundancy in the information content presented, compared to ADSongs or CDStories (Analysis 1b), manifesting as lower complexity. This lower complexity compared to ADSongs should result from infants’ improved word encoding for higher-redundancy language71,72. The lower complexity of CDSongs compared to CDStories should result from the caregiver’s need for easily remembered lyrics that can be sung without written support68,69,73.

In the second analysis (Analysis 2), we examined the rate of change of information, in order to discover when, for a set quantity of information, the information rate should be higher. According to the Serial-Position Effect, information presented at the beginning (primacy effect) and at the end (recency effect) of a sequence is more likely to be recalled than information presented in the middle74, and this effect is present from 6 months of age75. This phenomenon has also been measured for long-term recollection of real-world lyrics76,77. We thus predicted that CDSongs would take advantage of the Serial-Position Effect, i.e., show a higher relative information rate at the beginning and end compared to ADSongs.

Our final set of analyses (Analysis 3) examined the rate of change of new information over time. For this analysis, we examined temporal regularities in the timing with which new information was presented. Temporal regularities are present at numerous levels in early child-caregiver interactions, from sub-second level variabilities in amplitude fluctuations in CDS78 to periodic fluctuations at the time-scale of seconds, minutes, and hours in socio-communicative behaviours and stress physiology79,80. These temporal regularities may serve important functions—first, they allow for the development of predictions22,81; and second, they allow for oscillatory entrainment to develop between periodic endogenous oscillations and external information content80,82. In line with the Smooth Signal Redundancy Hypothesis83 and the Uniform Information Density Principle84,85, we predicted that, compared to CDSongs and ADSongs, CDStories should distribute the information load equally throughout the transmission, in order to be more robust to random noise86,87. Additionally, consistent with the micro-level regularities already noted at the sub-second level in CDS, we expected to observe macro-level oscillatory structure at the second to minute scale in how new information content was presented in CDSongs and ADSongs, and that oscillations in information predictability should be less variable and thus more easily trackable in the CDSongs compared to ADSongs. Endogenous oscillatory activity is present at multiple levels from early life, and evidence from a variety of sources suggests that oscillatory patterns in external information content facilitate learning80.

Methods

Analyses were not pre-registered.

Corpora

We identified three contemporary databases of child-directed songs (CDSongs), adult-directed songs (ADSongs), and child-directed stories (CDStories) in three languages (English, Spanish, and French). Our analyses in the main text are based on the entire corpus; in addition, in the Supplementary Materials, we also include a breakdown of analyses into individual language corpora.

Table 1 shows a breakdown of the number of individual texts included per language. The CDSongs corpus is composed of 313 song lyrics from traditional and contemporary nursery rhymes. The English language content was taken from the YouTube channel Cocomelon—Nursery Rhymes (e.g., Incy Wincy Spider, Wind the Bobbin Up, etc.; see SM Section 4 for a full list). The Spanish content was taken from the YouTube channel Little Angel Español—Canciones Infantiles (e.g., Cinco Patitos Salieron a Nadar, El Viejo MacDonald tenía una Granja, etc.), and the French content was taken from the YouTube channel Titounis—Comptine bébé—Chanson enfant (e.g., Frère Jacques, Il était un petit navire, etc.).

Table 1: Number of texts per condition and language

The ADSongs corpus is composed of 267 song lyrics: 100 song lyrics from the UK Top 100 chart of 07/11/2022 for the English content, 96 song lyrics from the YouTube playlist “Pop Latino 2022–2023 Videos Oficiales” for the Spanish content, and 71 song lyrics from the YouTube playlist “French top songs (most viewed 2022)” for the French content. The child-directed story corpus (CDStories) is composed of 342 Æsop fables (146 in English, 100 in Spanish, and 96 in French). The full list of titles included for all three corpora is given in the SM (Section 4).

For CDSongs and ADSongs, the lyrics were extracted either from the video description, as provided by the YouTube channel, or from an online lyrics catalogue, and the accuracy of all lyrics was checked manually. For CDStories, the corpus was directly extracted as a text file. To assess whether the textual content of these transcriptions, which is what we analyse in this paper, is consistent between the versions that we used and other transcriptions of the same nursery rhymes, songs, and stories from different online sources, we included additional analyses in the Supplementary Materials (see SM Section 2), in which we compare the textual similarity of the same content taken from different online sources. These analyses confirm that the written text transcriptions used by Cocomelon, for example, are highly similar to the texts used by other sources.

The mean (std) length in characters (individual letters and punctuation marks) of the extracted texts was 1023.31 (382.83) for CDSongs, 1603.55 (898.70) for ADSongs, and 955.41 (411.02) for CDStories. Because of these differences in mean length, for analyses concerning overall complexity (Analyses 1a and 1b), we present results based on normalised data, i.e., divided by the length of the original text.

YouTube views as a popularity metric

For Analysis 1a, following previously published papers88,89,90, we extracted the number of YouTube views as a measure of popularity. Viewer recommendations on YouTube are based on the viewer’s personal history and on what others with a similar viewing history are choosing. In its documentation91,92, YouTube describes how the aim of its algorithms is to mimic and continue the selection process initiated by children and caregivers, and describes in some detail the metrics that it uses to measure this (e.g., watch time and repeat views).

Text informativeness metric

Kolmogorov complexity is the quantitative aspect of the principle of Minimum Description Length. The Kolmogorov complexity K(s) of a finite discrete sequence s is the length of the shortest programme that generates s. Such a programme, which losslessly describes s in K(s) symbols, can be thought of as a representation of the original message s reduced to its core information without any redundancy93. Exact Kolmogorov complexity is not computable, but lossless compression algorithms offer a reasonable approximation94. PyLZMA, a Python implementation of the Lempel–Ziv–Markov chain algorithm (LZMA), was used for its compression power and simplicity of use95 (see Fig. 2). LZMA uses a dynamic dictionary to find sequences that repeat earlier parts of the text and then applies entropy coding (a range coder) so that the most predictable sequences receive the shortest codes. This method allows tracking of statistical regularities not only at the zero-order, character-frequency scale, but also in hierarchically increasing n-sequences of characters, to match human language comprehension96. The size of the compressed version represents the number of symbols needed to losslessly encode the sequence. As the cumulative window of analysis increases, the compressed size increases whenever the model needs updating to encode the sequence (Fig. 3A). This increasing cumulative window analysis (left-conditioned) is arguably closer to how we perceive an incoming continuous stream of information: not as a sliding window but as an integrative one97.
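
As an illustration of the cumulative window described above, the following is a minimal sketch rather than the authors' pipeline. It assumes Python's standard-library `lzma` module in place of PyLZMA, a raw LZMA2 stream, and preset 6; the function names and the toy chorus are hypothetical.

```python
import lzma

# Assumption: a raw LZMA2 stream (no container header) with preset 6.
# The original analysis used PyLZMA, whose settings are not given here.
_FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]


def compressed_size(text: str) -> int:
    """Number of bytes needed to losslessly encode `text`."""
    data = text.encode("utf-8")
    return len(lzma.compress(data, format=lzma.FORMAT_RAW, filters=_FILTERS))


def cumulative_compressed_sizes(text: str) -> list[int]:
    """Compressed size of text[:1], text[:2], ..., text[:len(text)]."""
    return [compressed_size(text[:end]) for end in range(1, len(text) + 1)]


# Toy example: a highly repetitive chorus (not taken from the corpus).
chorus = "baby shark doo doo doo doo doo doo " * 4
sizes = cumulative_compressed_sizes(chorus)
print(len(chorus), sizes[-1])  # text length vs. final compressed size
```

Each prefix is compressed from scratch, so the cost grows quadratically with text length; this is acceptable for lyrics of a few thousand characters.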

Fig. 2: Scheme of a general adaptive compression algorithm (adapted from ref. 138).

The incoming symbols are encoded using the previous model, and the model is then updated to maximise encoding. The final encoded representation is a description of the initial symbols with a code that uses fewer symbols. If incoming symbols are already understood within the model, the encoded representation will not increase. The final representation length only updates upon encoding truly new information for the model (unpredictable information). The compression ratio is the length of the encoded representation divided by the total length of the original symbols.

Fig. 3: Illustration of the cumulative lossless compression technique.

Example of a compressed size analysis with a cumulative window (A) and its derivative (B) at the beginning of the song “Baby Shark”. C shows onset and offset of periods of high (plateau = 1) and low (plateau = 0) derivative of compressed size, identified with a threshold of 3. All time series are plotted over character progression.

We examined how new information was introduced over time by calculating the derivative of the cumulative window. As the cumulative window of analysis increases, when the model needs updating to encode the sequence, the compressed size increases and thus the derivative becomes positive (Fig. 3B). This can also be seen as the difference in minimum description length between consecutive timepoints.
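
The derivative can be sketched as a first difference of the cumulative compressed sizes. The snippet below is a minimal illustration under that assumption; the input values are invented for the example and do not come from any song in the corpus.

```python
def information_derivative(cumulative_sizes):
    """First difference of the cumulative compressed size.

    Element i is the number of extra symbols needed to encode one more
    character, i.e. the change in description length between timepoints.
    """
    return [
        later - earlier
        for earlier, later in zip(cumulative_sizes, cumulative_sizes[1:])
    ]


# Hypothetical cumulative sizes: flat stretches are redundant passages,
# jumps mark characters that forced the model to be updated.
toy_sizes = [24, 24, 24, 28, 28, 32]
print(information_derivative(toy_sizes))  # [0, 0, 4, 0, 4]
```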

This LZMA lossless cumulative compression process can be seen as a model for predictive processing98 and for cognitive constructivism20,99,100 where data are sequentially unveiled to a pattern-seeking agent that keeps track of past statistical regularities to fit the incoming data into a condensed representation (see Fig. 2). This low-level statistical technique, already in use for image analysis101, can also be adapted for textual analysis102,103.

A CR was calculated, defined as the compressed size of the lyrics divided by the total length of the uncompressed lyrics to control for the fact that each story/song had a different length (see Fig. 2). A CR close to 1 indicates that no compression was done because of maximal information content, whereas a CR close to 0 indicates minimal information content in the lyrics. The CR was calculated iteratively using a cumulative window that increased in increments of one character between iterations. Analyses 1a and 1b are based on the average CR of the entire song, which is effectively the complexity of the song, standardised across different song lengths.
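
A whole-text CR of this kind can be sketched as follows. This is an illustration only: it assumes the standard-library `lzma` module with a raw LZMA2 stream instead of PyLZMA, and both example texts are written for this sketch rather than drawn from the corpora.

```python
import lzma


def compression_ratio(text: str) -> float:
    """Compressed size divided by the length of the original text."""
    packed = lzma.compress(
        text.encode("utf-8"),
        format=lzma.FORMAT_RAW,  # assumption: no container header
        filters=[{"id": lzma.FILTER_LZMA2, "preset": 6}],
    )
    return len(packed) / len(text)


# Hypothetical inputs: a repetitive lyric and a non-repetitive sentence.
repetitive = "row row row your boat " * 10
prose = (
    "A hungry fox saw some fine bunches of grapes hanging from a vine "
    "that was trained along a high trellis, and did his best to reach "
    "them by jumping as high as he could into the air."
)
print(round(compression_ratio(repetitive), 3))
print(round(compression_ratio(prose), 3))
```

The repetitive text should yield the lower ratio. Because any compressor adds a few bytes of fixed overhead, very short texts can produce ratios that are inflated relative to longer ones.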

For Analyses 2 and 3, we examined how complexity changed over time during each song or story in two ways. Analysis 2 examines coarse-grained changes over the course of the entire song. To this end, we calculated the point within the progression of the song when given percentages of information content (25, 50, 75, 85, 95%) were reached (Fig. 4C for total result and see Fig. 5C for illustrative examples), and compared this between corpora using an analysis of variance (ANOVA). We also summed the derivative of the compressed size into 8 equal bins, divided those sums by the total amount of new information, and compared the amount of information in the first and last segments (segment 1 vs 8) using an ANOVA (see Fig. 4D).
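
The two coarse-grained measures can be sketched as below. This is a minimal illustration with hypothetical function names; it assumes that progression is expressed as a fraction of the text length and that characters are assigned to bins by integer division, neither of which is specified in the text.

```python
def progression_at_fractions(cumulative, fractions=(0.25, 0.50, 0.75, 0.85, 0.95)):
    """Relative position in the text at which each fraction of the final
    compressed size is first reached."""
    total = cumulative[-1]
    n = len(cumulative)
    reached = {}
    for fraction in fractions:
        index = next(i for i, size in enumerate(cumulative) if size >= fraction * total)
        reached[fraction] = (index + 1) / n
    return reached


def binned_information_share(derivative, n_bins=8):
    """Share of all new information that falls into each of n_bins segments."""
    total = sum(derivative)
    n = len(derivative)
    sums = [0.0] * n_bins
    for i, value in enumerate(derivative):
        sums[min(i * n_bins // n, n_bins - 1)] += value
    return [s / total for s in sums] if total else sums


# Toy inputs (invented): a linearly growing profile and a flat derivative.
print(progression_at_fractions([10, 20, 30, 40]))
print(binned_information_share([1] * 16))
```

For the flat derivative each of the eight segments receives the same share, which is the pattern expected from a text that introduces information uniformly.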

Fig. 4: CDSongs are shown in blue (n = 313), ADSongs in red (n = 267), and CDStories in green (n = 342).

A shows the log-normalised number of YouTube views as a function of complexity (i.e., the Compression Ratio). B shows the Compression Ratio (total amount of information divided by length) across conditions. C shows the relative progression at which texts reach 25, 50, 75, 85, and 95% of their total information content. For instance, CDSongs reached 95% of their total information content later than ADSongs. D shows the probability of information input in segments. E shows the R2 values of the compressed size across conditions. F shows the standard deviation of the derivative of the compressed size. G shows the mean redundancy period duration in each song. H shows within-song redundancy period variability (std). Fig. SM35 shows the same analyses broken down into English, French, and Spanish corpora separately.

Fig. 5: CDSongs (CoComelon—Animal Dance) is shown in blue, ADSongs (Taylor Swift—Anti-Hero) in red, and CDStories (Aesop—The Fox and the Crow) in green.

A Raw compressed size over song progression (in characters, i.e., individual letters and punctuation). B Standardised compressed size adjusted to have similar length (1000) and similar final compressed size (100), over song progression (in characters). C Derivative of standardised compressed size over song progression (in characters). D Probability of new information over segment progression computed from the derivative divided by the total amount of information and binned into eight segments.

Analysis 3 examines fine-grained moment-by-moment changes. We conducted three analyses to examine fine-grained changes in the derivative of the compressed size during the song. First, we plotted the compressed size against the progression of the song (see Fig. 3A) and calculated the coefficient of determination R-squared104 (R2). A high R2 indicates that the information profile closely matches a straight line, meaning that the information is introduced uniformly over the entire course of the story/song, which matches what would be expected under the Uniform Information Density Principle. Second, after computing the derivative of the compressed size (see Fig. 3B), we calculated its standard deviation. A high standard deviation of the derivative indicates that new information is introduced at variable rates along the text, with periods of high informativeness and periods of high redundancy, capturing the staircase aspect of the information profile.
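
These first two measures can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes an ordinary least-squares line fitted against character index, and a population (rather than sample) standard deviation; the two toy profiles are invented.

```python
from statistics import pstdev


def r_squared_of_line(values):
    """R-squared of a least-squares straight line fitted to values vs. index."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    syy = sum((v - mean_y) ** 2 for v in values)
    # A completely flat profile is fitted exactly by a horizontal line.
    return 1.0 if syy == 0 else sxy ** 2 / (sxx * syy)


def derivative_std(values):
    """Standard deviation of the first difference (assumption: population std)."""
    return pstdev(b - a for a, b in zip(values, values[1:]))


uniform = [0, 2, 4, 6, 8, 10]       # information added at a constant rate
staircase = [0, 0, 0, 10, 10, 10]   # one burst between two redundant stretches

print(r_squared_of_line(uniform), derivative_std(uniform))
print(r_squared_of_line(staircase), derivative_std(staircase))
```

The uniform profile gives an R2 of 1 and a derivative std of 0, whereas the staircase profile gives a lower R2 and a larger derivative std, which is the contrast the two measures are meant to capture.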

Third, we assessed oscillatory structures in the introduction of information content. We could not do this using standard frequency domain analyses, as the period and phase of the oscillations differed between songs of different lengths. Instead, we computed the average span (the average character length) and variability (std) of redundancy (i.e., low complexity) periods within one text. Periods of redundancy were computed as sequences where the derivative of the compressed size was lower than a fixed threshold, chosen to differentiate high from low information input moments (Fig. 3C). This threshold was set arbitrarily at a value of 3; but to check that the results were not dependent on this specific threshold, the same set of analyses was repeated with each possible threshold and results were found to be consistently significant regardless of the threshold (see SM section 1). Less within-song variability in the span of the redundancy periods indicates more periodic fluctuations in the rate of change of information content over time.
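
The redundancy-period measure can be sketched as a run-length computation. The snippet below is a minimal illustration with a hypothetical function name and an invented derivative; the threshold of 3 follows the text, while the use of the population standard deviation is an assumption.

```python
from itertools import groupby
from statistics import mean, pstdev


def redundancy_spans(derivative, threshold=3):
    """Lengths (in characters) of consecutive runs where the derivative
    of the compressed size stays below the threshold."""
    return [
        len(list(run))
        for is_low, run in groupby(derivative, key=lambda d: d < threshold)
        if is_low
    ]


# Invented derivative: bursts of new information separated by redundancy.
toy_derivative = [5, 0, 0, 1, 6, 0, 2, 0, 4, 0]
spans = redundancy_spans(toy_derivative)
print(spans)          # [3, 3, 1]
print(mean(spans))    # average span of the redundancy periods
print(pstdev(spans))  # within-text variability of the spans
```

A text whose spans are all of similar length has low variability, which is interpreted in the text as more periodic fluctuation in the rate of new information.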

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Results

Our results section is in four parts. First, in Analyses 1a and 1b, we examine differences in overall complexity (standardised for length between songs). In Analysis 1a we examine the relationship between overall complexity and number of YouTube views. In Analysis 1b we compare differences in overall complexity between the different corpora (CDSongs, ADSongs, and CDStories).

Next, in Analyses 2 and 3, we examine differences in the rate of change of information content. In Analysis 2 we examine coarse-grained changes in the rate of new information content over the course of the song, using two separate analyses. In Analysis 3 we examine fine-grained changes in the rate of new information content within the song, using three separate analyses.

All analyses were conducted using ANOVAs; as the texts being examined came from different authors, the assumption that observations are independent was considered to be met. In the Supplementary Materials, we have also included the full breakdown for the English, Spanish, and French corpora considered separately (Section 3). Individual languages almost always show results in the same direction, although not always significantly so; unless otherwise specified, the results were similarly significant for all languages taken separately.

Analysis 1a—overall complexity and number of views

First, we examined whether there was a relationship between stories’ and songs’ overall complexity (total amount of information, as indexed by the compressed size, standardised by song length) and the number of views (standardised by number of days since the video was uploaded). Our prediction was that the relationship would be an inverted U, in line with the Goldilocks principle. To test this, we fitted both a linear and a quadratic model to the CDSongs corpus, and found that the linear model (F(1, 311) = 17.99, p < 0.001, AIC = 752.7, BIC = 760.2) was a more parsimonious fit than the quadratic model (F(2, 310) = 10.28, p < 0.001, AIC = 752.2, BIC = 763.4): the two AIC values are nearly identical, and the BIC favours the linear model. The linear relationship observed was a negative one, indicating that increased complexity is associated with fewer views (Fig. 4A). However, we did find the expected inverted U shape for the ADSongs, as the quadratic model was significant (F(2, 264) = 12.19, p < 0.001, AIC = 588.7, BIC = 599.5), but the linear one was not (p = 0.33). No popularity index was available for the CDStories corpus. Virtually identical patterns were observed when we examined the English, Spanish, and French corpora independently (Fig. SM35). Overall, these results show that, in the CDSongs corpus, decreased complexity was associated with increased views, and that this relationship was linear rather than the predicted inverted U shape. By contrast, the adult corpus reflected the expected inverted U shape relationship between complexity and popularity.
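
The comparison between a linear and a quadratic model can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: it uses ordinary least squares solved through the normal equations, the Gaussian form AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n) (which omits an additive constant, so absolute values will not match those reported), and synthetic inverted-U data generated for the example.

```python
import math


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the normal equations.
    Returns (coefficients from constant term upwards, residual sum of squares)."""
    m = degree + 1
    a = [[sum(xi ** (r + c) for xi in x) for c in range(m)] for r in range(m)]
    b = [sum(xi ** r * yi for xi, yi in zip(x, y)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * m
    for r in reversed(range(m)):
        tail = sum(a[r][c] * coef[c] for c in range(r + 1, m))
        coef[r] = (b[r] - tail) / a[r][r]
    rss = sum(
        (yi - sum(cf * xi ** p for p, cf in enumerate(coef))) ** 2
        for xi, yi in zip(x, y)
    )
    return coef, rss


def aic_bic(rss, n, n_coef):
    """Gaussian AIC and BIC up to a shared additive constant."""
    k = n_coef + 1  # assumption: coefficients plus the residual variance
    base = n * math.log(rss / n)
    return base + 2 * k, base + k * math.log(n)


# Synthetic inverted-U data with a small deterministic perturbation.
x = [i / 20 for i in range(21)]
y = [-(xi - 0.5) ** 2 + 0.001 * ((7 * i) % 5 - 2) for i, xi in enumerate(x)]

_, rss_lin = fit_polynomial(x, y, 1)
coef_quad, rss_quad = fit_polynomial(x, y, 2)
aic_lin, bic_lin = aic_bic(rss_lin, len(x), 2)
aic_quad, bic_quad = aic_bic(rss_quad, len(x), 3)
print("linear:   ", round(aic_lin, 1), round(bic_lin, 1))
print("quadratic:", round(aic_quad, 1), round(bic_quad, 1))
```

Lower values indicate the preferred model. Because BIC penalises each extra parameter by ln(n) rather than 2, it favours the simpler model more strongly than AIC once n exceeds about 7, which is why the two criteria can disagree when the fits are close.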

Analysis 1b—comparison of overall complexity between corpora

We first ran a one-way ANOVA to test whether there were any differences in overall complexity (indexed by the CR) between the CDSongs, ADSongs, and CDStories corpora. This showed a significant main effect of condition (F(2, 914) = 978.5, p < 0.001). We thus ran a series of pairwise comparisons between all three conditions (Fig. 4B). There was a significant difference in CR between CDSongs and ADSongs, t(578) = −4.03, p < 0.001 (M_CDSongs = 0.322, std_CDSongs = 0.137; M_ADSongs = 0.362, std_ADSongs = 0.091, Cohen’s d = −0.33, CI_CDSongs = [0.30; 0.33], CI_ADSongs = [0.35; 0.37]), with CDSongs having lower complexity, although there was no statistically significant difference in the Spanish and French corpora taken individually. We also compared the CR between CDSongs and CDStories, and found it to be higher for the CDStories corpus than for the CDSongs corpus, t(653) = −39.1, p < 0.0001 (M_CDStories = 0.647, std_CDStories = 0.065, Cohen’s d = −3.05, CI_CDStories = [0.64; 0.65]); this was found for all languages taken separately (see SM3, SM4, SM5). Another pairwise comparison revealed that ADSongs had a significantly lower average CR than CDStories (t(607) = −44.80, p < 0.0001, Cohen’s d = −3.65); this was also found for all languages taken separately (see SM3, SM4, SM5). Overall, these results suggest that CDSongs tend to be less complex than ADSongs, and that both CDSongs and ADSongs are less complex than CDStories.
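
As a worked check of how the effect size relates to the reported summary statistics, the sketch below recomputes Cohen's d and the t statistic for the CDSongs versus ADSongs comparison. It assumes a pooled-variance (Student) formulation, which the text does not state explicitly, and it uses the rounded means and standard deviations given above, so small discrepancies from the reported values are expected.

```python
import math


def cohens_d_and_t(mean_1, std_1, n_1, mean_2, std_2, n_2):
    """Cohen's d with pooled SD, and the matching two-sample t statistic."""
    dof = n_1 + n_2 - 2
    pooled_var = ((n_1 - 1) * std_1 ** 2 + (n_2 - 1) * std_2 ** 2) / dof
    pooled_sd = math.sqrt(pooled_var)
    d = (mean_1 - mean_2) / pooled_sd
    t = (mean_1 - mean_2) / (pooled_sd * math.sqrt(1 / n_1 + 1 / n_2))
    return d, t, dof


# Reported summary statistics: CDSongs (n = 313) vs ADSongs (n = 267).
d, t, dof = cohens_d_and_t(0.322, 0.137, 313, 0.362, 0.091, 267)
print(round(d, 2), round(t, 2), dof)
```

Under these assumptions the result has 578 degrees of freedom and gives values close to the reported d of −0.33 and t of −4.03.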

Analysis 2—rate of change of new information content—changes across song

In order to quantify changes in information input over time we computed the derivative of the cumulative compressed size. A positive and high derivative indicates new information occurring, and a derivative close to zero indicates that a segment is redundant—i.e., that it adds no novel information compared to the cumulative amount of information already introduced thus far (see Fig. 3B). Samples of raw data from this analysis are shown in Fig. 5A, C.

In order to quantify whether the way information is introduced over time differed between corpora, we performed two analyses. First, we calculated the point within the progression of the song at which given percentages of information content (25, 50, 75, 85, 95%) were reached (see Fig. 4C for average results and Fig. 5B for raw data illustration). We then ran a two-way ANOVA to assess differences between corpora (CDSongs, ADSongs, and CDStories). The ANOVA revealed an interaction between the condition and the percentage of total information content (F(8, 914) = 2150, p < 0.0001). We thus ran a series of pairwise comparisons between conditions. These showed that CDSongs reached 25% of their total amount of information earlier than ADSongs, t(578) = −4.58, p < 0.001 (M_CDSongs = 0.113, std_CDSongs = 0.051; M_ADSongs = 0.131, std_ADSongs = 0.051, Cohen’s d = −0.38, CI_CDSongs = [0.108; 0.119], CI_ADSongs = [0.126; 0.135]), although no statistical difference was observed in the Spanish corpus taken individually; and that CDSongs reached 95% of their total amount of information later than ADSongs, t(578) = 3.32, p < 0.0001 (M_CDSongs = 0.847, std_CDSongs = 0.171; M_ADSongs = 0.804, std_ADSongs = 0.133, Cohen’s d = 0.27, CI_CDSongs = [0.828; 0.866], CI_ADSongs = [0.788; 0.820]), although there was no statistically significant difference in the French corpus taken individually. This can be interpreted as an earlier fade in and a later fade out (Fig. 4C) of information input in CDSongs compared to ADSongs. CDSongs also reached 25% of their total amount of information sooner than CDStories, t(653) = −27.5, p < 0.0001 (M_CDStories = 0.193, std_CDStories = 0.011, Cohen’s d = −2.15, CI_CDStories = [0.151; 0.159]), and they also reached 95% of their total amount of information sooner than CDStories, t(653) = −10.0, p < 0.0001 (M_CDStories = 0.941, std_CDStories = 0.008, Cohen’s d = −0.78, CI_CDStories = [0.940; 0.942]); similar results were found for all languages taken individually (Fig. SM35). This can be interpreted as an earlier fade in and fade out of information input in CDSongs compared to CDStories.

Second, as a complementary analysis, we summed the derivative of the compressed size into 8 bins and then divided those sums by the total amount of information (final compression size) (see Fig. 4D for results and Fig. 5D for raw data illustration). This allows us to compare the dynamics of information input across songs with different overall information content. To test whether the beginning and end had a higher probability of new information, a two-way ANOVA was performed to analyse the effect of corpora (CDSongs, ADSongs, and CDStories) and progression (segment from 1 to 8) on the probability of information input. The ANOVA revealed an interaction between condition and progression segments (F(8, 914) = 59.91, p < 0.0001). We thus ran a series of pairwise comparisons between conditions. CDSongs showed higher values compared to ADSongs in the first segment, t(578) = 3.33, p < 0.0001, and in the eighth segment, t(578) = 7.69, p < 0.0001, although the higher value in the first segment was not significantly different in Spanish and French considered separately. CDSongs also showed higher values compared to CDStories in the first segment, t(653) = 17.9, p < 0.0001; however, CDSongs showed lower information input in the eighth segment compared to CDStories, t(653) = −9.42, p < 0.0001. Similar patterns were observed when the English, Spanish, and French corpora were considered independently (see Fig. SM35). This is fully consistent with the prior analysis in showing differential patterns in how information is introduced towards the beginning and end of CDSongs as compared to both ADSongs and CDStories. Importantly, though, this probability of information input over progression comes from a normalisation by overall information content, and thus we can only make claims about the relative distribution of information in each text type, not their absolute information content.
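
The binning step can be sketched as follows, assuming the input is a cumulative compressed-size profile sampled at regular steps through the text. The function name `binned_information_input` and the convention of counting the first sample as an increment from zero are illustrative assumptions, not details taken from the study.

```python
def binned_information_input(profile, n_bins: int = 8):
    """Share of total information introduced in each of `n_bins` segments.

    `profile` is a cumulative compressed-size curve. Its discrete derivative
    is summed per bin and divided by the final compressed size.
    """
    # Discrete derivative; the first sample is treated as an increment from 0
    # (an assumption, so that the shares sum to 1).
    increments = [profile[0]] + [b - a for a, b in zip(profile, profile[1:])]
    total = profile[-1]
    n = len(increments)
    sums = [0.0] * n_bins
    for i, inc in enumerate(increments):
        sums[min(i * n_bins // n, n_bins - 1)] += inc
    return [s / total for s in sums]


if __name__ == "__main__":
    # Hypothetical U-shaped item: most new information at the start and end.
    u_shaped = [5, 6, 7, 8, 9, 10, 11, 16]
    print(binned_information_input(u_shaped))
```

Because each share is divided by the item's own total, the output describes only how information is distributed within an item, not how much information the item contains.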

Overall, these results suggest that: i) Relative to the overall information content of each song, more novel information is presented at the beginnings and ends of songs in CDSongs compared with ADSongs; ii) When looking at the beginning, middle, and end, the introduction of novel information is more evenly distributed along the text in CDStories compared to CDSongs and ADSongs.

Analysis 3—rate of change of new information content—fine-grained changes

To compare fine-grained differences in information input between CDSongs, ADSongs, and CDStories, we first plotted the derivative of the compressed size against the progression of the song and calculated the R2 (see Fig. 4E). R2 differed across conditions, F(2, 914) = 143.2, p < 0.0001, with the R2 in the CDSongs condition being lower compared to the ADSongs, t(578) = −4.90, p < 0.001 (MCDSongs = 0.882, stdCDSongs = 0.132; MADSongs = 0.926, stdADSongs = 0.068, Cohen’s d = −0.408, CICDSongs = [0.868; 0.897], CIADSongs = [0.918; 0.935]) (although there was no statistically significant difference in Spanish); and compared to CDStories, t(653) = −15.6, p < 0.0001 (MCDStories = 0.995, stdCDStories = 0.002, Cohen’s d = −1.22, CICDStories = [0.995; 0.996]). We also found that R2 values were significantly lower in the ADSongs compared to the CDStories, t(607) = −18.4, p < 0.0001. This indicates that the rate at which new information is presented during each item is more evenly distributed in CDStories compared with songs, and in ADSongs compared with CDSongs. Second, we calculated the standard deviation of the derivative of the compressed size (Fig. 4F). In line with the previous finding, we also found differences in the variability of information introduction, F(2, 914) = 988.2, p < 0.0001, with variability being significantly higher in the CDSongs condition compared to the CDStories condition, t(653) = 38.2.6, p < 0.001 (MCDSongs = 3.10, stdCDSongs = 0.47; MCDStories = 2.00, stdCDStories = 0.22, Cohen’s d = 2.98, CICDSongs = [3.050; 3.155], CICDStories = [1.981; 2.030]). Variability of information introduction was also higher in the ADSongs condition compared to the CDStories condition, t(607) = 46.3, p < 0.0001 (MADSongs = 3.15, stdADSongs = 0.37, Cohen’s d = 3.78, CIADSongs = [3.110; 3.201]); however, no statistically significant difference was found between CDSongs and ADSongs (p = 0.15). 
Overall, we found more variability in the introduction of information in the CDSongs and ADSongs compared to CDStories, and this was found in all languages considered separately (Fig. SM35). Taken together, the lower R2 values and higher variability of information introduction in CDSongs and ADSongs compared to CDStories indicate that in CDSongs and ADSongs, information is more likely to be allocated in bursts with periods of high informativeness and periods of high redundancy.
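
Both measures can be sketched on a cumulative compressed-size profile. This is an illustration under stated assumptions: R2 is taken here as the coefficient of determination of a straight-line fit of the cumulative profile against progression (one reading of the description above), and variability as the population standard deviation of the discrete derivative. The function names are hypothetical.

```python
import statistics


def linear_r2(profile) -> float:
    """R2 of a least-squares straight-line fit of `profile` against its index.

    Requires at least two samples that are not all equal.
    """
    n = len(profile)
    mean_x = (n - 1) / 2
    mean_y = statistics.fmean(profile)
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    syy = sum((y - mean_y) ** 2 for y in profile)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(profile))
    # For simple linear regression, R2 equals the squared correlation.
    return sxy ** 2 / (sxx * syy)


def derivative_sd(profile) -> float:
    """Population standard deviation of the step-to-step increments."""
    increments = [b - a for a, b in zip(profile, profile[1:])]
    return statistics.pstdev(increments)


if __name__ == "__main__":
    steady = [1, 2, 3, 4, 5, 6]              # hypothetical story-like profile
    bursty = [5, 5, 5, 5, 10, 10, 10, 10]    # hypothetical song-like profile
    print(linear_r2(steady), derivative_sd(steady))
    print(linear_r2(bursty), derivative_sd(bursty))
```

A profile that grows at a constant rate gives an R2 near 1 and a derivative standard deviation near 0, whereas a profile that grows in bursts gives a lower R2 and a higher standard deviation.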

The CDStories showed an information profile that closely matched a straight line (Fig. 4E, i.e., R2 values close to 1), so we excluded them from the next analysis. To further examine the oscillating patterns in the two remaining conditions, we computed the mean and variability of the redundancy periods, when less (or no) new information is presented (Fig. 3B; here, we present results using a threshold of 3, see Methods and SM section 1 for different threshold values). There was a main effect of condition on variability (F(2, 562) = 98.16, p < 0.001, see Fig. 4G, H), where variability was measured as the standard deviation of the redundancy spans within one song. The variability was lower for CDSongs compared to ADSongs, t(562) = −9.90, p < 0.0001 (MCDSongs = 43.21, stdCDSongs = 45.37; MADSongs = 86.22, stdADSongs = 57.50, Cohen’s d = −0.83, CICDSongs = [38.04; 48.39], CIADSongs = [79.27; 93.16]); this was found even when considering each language separately. We also found that the average redundancy spans were shorter in the CDSongs compared to the ADSongs condition, t(652) = −2.29, p < 0.025 (MCDSongs = 59.33, stdCDSongs = 71.34; MADSongs = 71.60, stdADSongs = 53.22, Cohen’s d = −0.19, CICDSongs = [51.20; 67.46], CIADSongs = [65.17; 78.02]), although there were no statistically significant differences in English and Spanish considered separately.
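
The redundancy-span measure can be sketched as below. The exact definition is given in the Methods, so this sketch makes its own assumptions: a redundancy span is taken to be a run of consecutive steps whose increment in compressed size falls strictly below the threshold, span length is counted in steps, and variability is the population standard deviation of span lengths within one item. The function names are hypothetical.

```python
import statistics
from itertools import groupby


def redundancy_spans(increments, threshold: float = 3) -> list[int]:
    """Lengths of consecutive runs where the increment is below `threshold`."""
    return [
        sum(1 for _ in run)
        for is_redundant, run in groupby(increments, key=lambda d: d < threshold)
        if is_redundant
    ]


def span_summary(spans) -> tuple[float, float]:
    """Mean and population standard deviation of span lengths."""
    if not spans:
        return 0.0, 0.0  # no redundancy period found in this item
    return statistics.fmean(spans), statistics.pstdev(spans)


if __name__ == "__main__":
    # Hypothetical increments: regular bursts versus irregular bursts.
    periodic = [5, 0, 1, 0, 6, 0, 0, 2, 7, 1, 1, 1]
    irregular = [5, 0, 6, 0, 0, 0, 0, 0, 7]
    print(span_summary(redundancy_spans(periodic)))
    print(span_summary(redundancy_spans(irregular)))
```

Under these assumptions, spans of similar length (low standard deviation) correspond to more periodic bursts of new information.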

Taken together, our findings show that the within-song variability and duration of the redundancy periods are lower in CDSongs than in ADSongs. This indicates that fluctuations in the rate of change of information content over time are more periodic and less spaced out in CDSongs compared to ADSongs.

Discussion

We compared the rate of change of information content between three open-source corpora: child-directed song (CDSongs), adult-directed song (ADSongs) and child-directed stories (CDStories), in English, Spanish and French.

In Analysis 1a, we examined the relationship between complexity and popularity (number of YouTube views) in CDSongs (Fig. 4A). The inverted U shape relationship predicted based on the literature31,32,33 was observed for the ADSongs corpus. For CDSongs, contrary to our predictions, we did not find the same inverted U shape. Instead, we observed that decreased complexity was associated with increased popularity. Three reasons might explain this. First, it remains possible, despite the evidence in its favour, that the Goldilocks effect is more restricted than previously thought, and cannot be observed using real textual information contents, and that young children do not prefer stimuli of intermediate predictability. Second, it is possible that the conditions to apply the Goldilocks principle might not be met here, e.g., YouTube users might select musical extracts that are more predictable for reasons other than knowledge transmission (e.g., self-soothing105). Third, it is possible that the expected inverted U shape was not observed because extremely predictable text sequences, for instance, a song in which just one word is repeated, are simply not present in the corpus, as lyricists and music producers would themselves consider these unlikely to be popular, and therefore not produce them in the first place.

In Analysis 1b, we found that overall CDSongs showed lower complexity than either ADSongs or CDStories (Fig. 4B), although this was not significant for the Spanish and French corpora considered individually. This is in line with the literature on CDSongs showing a reduced lexicon, as well as “infra-word” structure and other predictable sequences, like counting or verse/chorus oscillations68,106,107. This increased redundancy could allow for more overall infant word encoding99,100,108 and for easier context recognition, in turn enhancing speech neural tracking109 and word learning110. As CDSongs are usually sung without written support and are transmitted through oral tradition68,69, they require sequence repetition to be more easily remembered, recited, and transmitted73,111,112. Potentially, then, one could also posit that this reduced complexity is the result of generational transmission constraints113, where simpler messages are more replicated over time as they require fewer cognitive resources to be remembered and transmitted114,115. The simpler shared songs would then reinforce social bonding (but also social outing) and intergenerational links, as people sing and share the same musical experience more easily, in turn increasing their chances of being replicated. Indeed, the more redundant form of CDSongs compared to CDStories can also be seen as a result of their different “storage of information” techniques116. For CDStories, however, the redundancy constraint is reduced, as the textual trace largely assists the storytelling. The lower absolute complexity found in CDSongs compared to ADSongs and CDStories in Analysis 1b seems to point in that direction.

In Analysis 2, we found that, even if the total amount of information was higher in ADSongs compared to CDSongs, CDSongs had more information at the beginning and end compared to ADSongs relative to their total amount of information (Fig. 4C, D), and individual-language analyses revealed similar patterns. For example, in the song “Baby Shark” (Fig. 3), information is introduced at the beginning of the song (“Baby shark doo doo…”), followed by high levels of repetition/redundancy (i.e., introducing new family members following the same pattern and repeating “doo doo doo doo doo doo”), followed by new content at the end of the song (“Let’s go dance doo doo…”). These patterns contrast with ADSongs, which instead seem to follow, over the song progression, the general pattern of constant entropy rate found in naturally occurring languages, consistent with the Smooth Signal Redundancy and Uniform Information Density hypotheses117,118,119.

In Analysis 3, consistent with this, we found that information was more evenly allocated, at an even finer time-scale, in CDStories compared to CDSongs and ADSongs (Fig. 4F), and this was found for all languages even when considered individually. According to the Smooth Signal Redundancy Hypothesis83, in order to convey information efficiently while still being robust to noise occurring at random times, one should use a code that distributes information and redundancy equally throughout the transmission86,87. This is what we observed in the CDStories corpus but not the CDSongs and ADSongs corpora. CDStories are thus closer to optimal for efficient communication from an information theory perspective, as they have a near-constant information rate, in keeping with the uniform information density principle. This could potentially indicate that efficient information transmission is not the sole function of CDSongs and ADSongs.

The U-shaped input of information that we observed in Analysis 2 could potentially be helpful for learning by virtue of the serial-position effect. The serial-position effect states that information at the start and end of a sequence is better recalled than information introduced in the middle of the sequence75. This strategic adaptation could enhance word learning, as new words are introduced at the preferred serial position for word recall120. Indeed, caregivers speaking English infant-directed speech spontaneously place unfamiliar, less frequent words (words carrying more information by our metric) at the final position of a sequence121. Although not a universal feature of infant-directed speech, the serial position effect could also be a driving force behind the different temporal structure of information at the beginning-middle-end scale in CDSongs compared to ADSongs, possibly supporting both attention elicitation and word learning.

In addition, the more variable information rate that we observed in CDSongs could be beneficial due to being less predictable, as we know that infants tend to pay more attention to variable stimuli6,41. One reason why a variable information rate could be more popular among infant audiences might be the potentially wider range of information processing rates that it can accommodate (Fig. 6). If we assume that infants’ preferred information rate is lower and more variable than that of adults, CDSongs create more opportunities for infants to be on both sides of their preferred information rate. As caregivers might not be aware of their infant’s current processing capacity, having a signal that carries a broad range of information rates over time might increase the chances that at least a part of the message will be processed.

Fig. 6: On the left—preference vs complexity: different hypothetical distributions of information rate processing capacities.

Each curve representing the relationship between preference and complexity is a balance between opposing needs: predictability and complexity (see Fig. 1). Those needs can change according to contexts, and those changes are represented by different means and spread, with their mean value extended to the right-hand plot. From top to bottom: a preference for high complexity with a low tolerance (small range), a preference for medium complexity with a wide tolerance, and a preference for low complexity with medium tolerance. On the right, complexity vs time: CDSongs, ADSongs, and CDStories archetypal complexity temporal profiles (in blue, red, and green, respectively) and different information rate processing capacities (dashed lines). A variable complexity profile can satisfy a wider range of preferences for complexity, as illustrated by the archetypal CDSongs curve having a rewarding complexity level for more than one preference distribution.

In Analysis 3, we also observed oscillating patterns in the information profile of CDSongs and ADSongs, and found that those oscillating patterns were more predictable (less variable in span) in the CDSongs compared to the ADSongs (Fig. 4H); this result was found even when considering languages individually. These observations are consistent with the finding that, during early caregiver-child interaction, information tends to be packaged in oscillatory temporal structures across multiple timescales, from sub-second-scale amplitude fluctuations (which are stronger in CDS) through to daily routines and rhythms79,80,122. These periodicities may help to generate predictability, which, in turn, may facilitate information transfer123. Periodicities may also drive oscillatory entrainment between exogenous rhythms that are present in the external environment and endogenous neural and physiological rhythms that operate across multiple timescales123,124. Importantly, the findings from Analysis 1b (i.e., overall reduction of information content in CDSongs) and those from Analysis 3 (i.e., information appearing in bursts in CDSongs) are interrelated. Indeed, it is during the redundant segments that no new information is added, contributing to the overall reduction of information in relation to the increasing length of the analysis. These oscillations may even help to separate the new information from the redundancy, creating obvious sub-event segmentation guiding infant attention125 and training them for event segmentation, an important part of comprehension and anticipation126.

More speculatively, our findings, and in particular the differences we observed between CDSongs and CDStories, may also be compatible with the idea that one of the functions of CDSongs is not simply to maximise information transfer, but also to create a common ground for synchrony and shared interpretation, developing early social bonding. As Gratier et al. pointed out: “The repetition-variation dialectic of all musical idioms is intrinsically a culture-producing process. It provides a basic architecture for intersubjective engagement”65. As we discussed above, CDSongs offer a platform for co-creation, as high redundancy lyrics are easier to remember and in turn to sing along to127,128,129, potentially blurring the adult/child division compared to the top-down didacticism of CDStories69.

The increased periodicities in new information content that we documented in CDSongs may also permit the temporal alignment of behaviours, as the next information peak can be anticipated80. Isochronous oscillations are more easily trackable as they show less variability130, which may enhance caregiver-infant synchronisation81,131. It has been suggested that interpersonal synchrony may help infant word segmentation132 and foster early social bonding57,133,134,135, for instance by blurring the self-other dichotomy during shared actions and/or by aligning multiple actions towards a shared goal134.

Figure 6 is a visualisation of archetypal information profiles from each corpus. The lines were drawn with respect to the different analyses conducted across multiple timescales. First, CDSongs have an overall lower complexity (area under the curve) compared to ADSongs and CDStories (Analysis 1b). Second, CDSongs have relatively more information at the beginning and end compared to ADSongs (Analysis 2). Third, CDSongs and ADSongs have oscillating unpredictability, with CDSongs having more isochronous oscillations than ADSongs (Analysis 3).

Limitations

One limitation of this study is that, while we measured the difference in temporal patterns of complexity across text types, we can only posit theories that would explain these differences in terms of functions and constraints. Indeed, it remains possible that the forms we describe here arose for other reasons (i.e., under the selective pressures of other, non-measured factors) and/or randomly, and that after being repeatedly exposed to these forms, infants learned how to learn from them optimally. Because we only measure forms here, but not how they impact learning or attention, we cannot draw firm conclusions about the form-function relationships. Another limitation, concerning the use of YouTube views as a popularity metric, is that, to our knowledge, it is unknown how much caregivers interfere in the song selection process. Another limitation of our study is that, as is unfortunately the case for most of the available literature on this topic, the texts we analysed come from a small subset of languages, all coming from WEIRD cultures136, and are not necessarily those listened to every day by children (e.g., the Aesop stories). Moreover, some effects did not hold when considering each language separately. Therefore, our findings have limited generalisation power137, and even if we tried to mitigate this by choosing three languages that are spoken worldwide, we acknowledge that similar methods applied to more diverse and naturalistic stimuli are needed to provide a more complete picture. Establishing a stronger relationship between the form of child-directed content and the functions that it supports could inform interventions or serve as guidelines for early years educators.

Conclusion

In line with the literature suggesting that the characteristics of nursery rhymes are constrained by form-function relationships, we examined the multiscale information dynamics of child-directed songs (CDSongs), adult-directed songs (ADSongs), and child-directed stories (CDStories). We developed a technique to quantify the rate of change of multilingual textual information content over several timescales. We found that ADSongs showed an inverted U relationship between complexity and popularity, but CDSongs only showed a negative relationship. Overall textual complexity was, however, lower in CDSongs. We also found that information was more unevenly allocated in CDSongs, with more information contained at the starts and ends of songs, and more periodic oscillatory patterns in the rate of change of information content. Taken together, these findings may suggest that nursery rhymes act as a form of pattern recognition training for infants and children to learn structure and novelty in a targeted way, while also enabling them to connect with others. These findings also highlight the need to treat infant preference for intermediate complexity as something to extend and analyse in the time dimension.