Introduction

Humans use various conscious and unconscious processes to maintain optimal vision quality. Spontaneous eye blinks, defined as involuntary blinking that is not a reflex response1, serve a key function by preventing the ocular surface from drying out by lubricating the eye with every blink2. However, the average human blink rate is about 15 blinks per minute3, far exceeding the estimated 2 blinks per minute necessary for maintaining ocular lubrication2. Moreover, blink rate patterns exhibit considerable variability, with research indicating that blinking rates vary across individuals and change with age3,4,5,6. Such results have led to the idea that blinking may be driven by factors beyond merely fulfilling biological needs, a hypothesis that was already put forward in the late 20th century7,8,9.

A cognitive role for eye blinks

The notion that blinking could be linked to how the brain manages cognitively demanding tasks was supported by early research showing that people who performed a task with a higher cognitive load blinked significantly less compared to a control group10,11. Subsequent studies corroborated this trend in within-subject designs, demonstrating that the number of blinks decreases during periods with high cognitive load and increases afterward12,13. In addition to these relatively simple laboratory tasks, real-life situations that require increased cognitive effort, such as driving or controlling air traffic, similarly lead to a reduction in blink rate14,15,16.

Higher cognitive demands are closely tied to an elevated state of attention. Hence, it is unsurprising that blink rates tend to decrease when more attention is required in an experimental task17,18,19,20. For instance, in a study by Fukuda21, eye blinks were studied in relation to the amount of selective attention deployed during a deception detection task. The lowest blink rate occurred during the presentation of a relevant visual stimulus, suggesting that minimal blinks were produced during periods of heightened attention. Research has identified analogous effects when participants process purely auditory information, suggesting that blinks serve as content-sensitive markers, irrespective of the modality through which the information is conveyed22,23. Oh and colleagues18 reported similar effects. During an auditory tone counting task, eyeblink activity was more suppressed with higher task difficulty and it rebounded in the silent gap immediately after exposure. Significant differences in blinking patterns were found even in response periods lasting less than one second: blinks were transiently suppressed between the task-relevant response cue and participants’ keypress, with a marked increase in blinks following the response. These findings have reinforced the hypothesis that blinks may be unconsciously regulated through top-down (attentional) mechanisms to optimize task performance18,24,25. Indeed, it has been demonstrated that both the presence and absence of blinking can be considered indicators of vigilance or sustained attention19,26,27,28,29.

In the context of examining spontaneous blinks during attentional states, a limited number of studies have explored blinking behavior in more ecological environments. While watching a movie, people’s blinking rates are considerably lower during memorable scenes30. Conversely, elevated blink rates were present during attentional breakpoints such as the conclusion of an action or during repeated presentations of a similar scene31,32. Such non-random blinking has also been observed when listening to a (visible) speaker, where blinks mainly occurred synchronized with those of the speaker at the end of vocalizations and during pauses33. This coupling between listener’s blink patterns and speech pauses has also been shown in participants listening to audiobooks, where it was observed for attended but not for unattended speech34. Similar to the findings for more artificial tasks described above, it appears that blinking is actively related to the content of the scene or conversation, rather than a mere physiological necessity20. Consistent with blinks’ postulated involvement in the release of attention, they have been associated with momentary activation of the default-mode network35 and deactivation of external-orienting networks36.

The timing of blinks in text reading

In the present study, we aimed to examine such a cognitive role of blinking during text reading. Blinking rates have consistently been found to be lower in reading conditions compared to the resting state, probably due to both the relevance of visual information processing and enhanced cognitive effort37,38,39. Moreover, empirical evidence demonstrates that blink patterns during text reading exhibit variability contingent on the degree of engagement with the content, with fewer blinks correlating with heightened engagement26,27,40. In contrast to the extensive research on eye-movements during reading41,42, there has been limited research examining blinking patterns in relation to specific textual markers during reading. To our knowledge, the only study to investigate this topic was published by Hall7 nearly 80 years ago with the title “The origin and purposes of blinking”. Hall manually tracked the occurrence of blinks in 16 participants reading aloud a short text passage. He reported blinking to occur significantly more at the end of sentences and punctuation marks compared to page-turning events (end of a line) or other special locations in the text. Additionally, he observed that participants blinked more frequently in the presence of unusual or unexpected words. Hall was among the first to propose that spontaneous blinking could have other functions aside from maintaining the tear film by stating that blinking while reading is a form of an ‘acquired technique’, which was interpreted by many authors as a form of high-level processing. Consequently, numerous studies have cited Hall’s findings as the pioneering evidence that blinking can be studied as a reliable marker of our cognitive system9,35,37,43,44,45.

Despite its recognition as a seminal finding, the finding that blinks are associated with punctuation marks or word expectations during text reading has, to our knowledge, never been replicated. Such a replication is critical given the outdated methodology, very limited amount of data and lack of statistical support for the original finding. For example, the blinks were manually counted whereas we know they cannot always be easily spotted46. Hall drew his conclusion based on a total of 139 blinks observed among 16 participants (blinks per participant were not reported). Presumably, conclusions were made based on a visual inspection of the data and not on statistical analyses, as no statistics were reported. Furthermore, the unusual or unexpected words were most likely categorized in this manner based on subjective opinion, since he did not provide clarity on the characteristics of these words. Translated to modern research, the usualness and expectedness of words could be assessed through word frequency and word predictability, respectively. It is well established that word frequency influences reading, with the word frequency effect demonstrating that high-frequency words are processed more efficiently than low-frequency words47. Additionally, words that are more expected given the preceding text are read more quickly48,49. In contrast to Hall’s research, contemporary studies benefit from more valid measures to ascertain word frequency and predictability50,51. In other words, the question remains whether the placement of blinks during reading can be robustly associated with these specific textual factors, as has been demonstrated in other contexts such as scene viewing or listening to speech31,33.

Current study

Using a very large eye-tracking corpus of text reading, we investigated blinking patterns in relation to punctuation marks, word frequency and word predictability. We hypothesized that these factors would influence readers’ blinking patterns. Punctuation marks imply the end of, or a pause in, a sentence, potentially serving as attentional breakpoints during fluent reading, which should elicit blinking. We hypothesized that lower word frequency and predictability would increase cognitive effort relative to higher-frequency and more predictable words, which would be reflected in an elevated probability of blinking following fixations on these words. With this study, we aim to provide a large-scale quantification of the points in text where readers blink, making this the first analysis of this kind since Hall7. Thereby, we seek to enhance our understanding of the temporal dynamics of spontaneous blinking during reading and their potential role in the reading process.

Materials and methods

Eye-tracking corpus

We made use of pre-existing data obtained for the Ghent Eye Tracking Corpus database (GECO)52,53. The corpus contains eye-tracking data from 19 English-Dutch bilinguals and 15 English monolinguals who silently read an entire Agatha Christie novel54, which was presented in paragraphs on the screen. The novel comprised approximately 5,000 sentences (over 54,000 words in total), read across four separate sessions. For the present study, only the monolingual (English) data were used. All participants signed an informed consent form prior to starting the experimental procedure and all the methods were performed in accordance with the Declaration of Helsinki. The experimental procedure was approved by the ethical committee of Ghent University. For a detailed description of the book and a comprehensive overview of the methodology, please refer to Cop et al.52. Notably, blink variables (e.g., timing of blink occurrences and duration of blinks) were not included in the original corpus. However, the raw eye-tracking data were generously shared with us by the authors, allowing us to extract blink occurrences using the EyeLink Data Viewer software (SR Research Ltd., version 4.3.1). Word frequencies were obtained from the SUBTLEX-UK corpus55, with each word assigned a Zipf value, a logarithmic measure of word frequency ranging from one (very low-frequency words) to seven (extremely high-frequency words). Following previous research51,56, word predictability given the preceding context was quantified using surprisal, calculated as the negative natural logarithm of a word’s probability according to a large language model (GPT-2)57. The distribution of word frequency scores and surprisal values throughout the novel is presented in Supplementary Materials (Fig. S1 and S2).
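The two lexical measures described above can be illustrated with a minimal sketch. It is not the pipeline used for the corpus: the function names are hypothetical, the word probability is assumed to be supplied by some language model, and the Zipf computation uses the plain definition (base-10 logarithm of the frequency per billion words) without the smoothing that published frequency norms typically apply.

```python
import math


def surprisal(probability: float) -> float:
    """Surprisal in nats: the negative natural logarithm of a word's probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log(probability)


def zipf_value(count: int, corpus_size_in_words: int) -> float:
    """Unsmoothed Zipf value: log10 of the word's frequency per billion words.

    Assumption: no smoothing is applied, so a count of zero is not supported.
    """
    if count <= 0 or corpus_size_in_words <= 0:
        raise ValueError("count and corpus size must be positive")
    per_billion = count / corpus_size_in_words * 1e9
    return math.log10(per_billion)


if __name__ == "__main__":
    # Hypothetical values: a word the model assigns probability 0.05,
    # and a word seen 200 times in a 200-million-word corpus.
    print(round(surprisal(0.05), 3))
    print(round(zipf_value(200, 200_000_000), 3))
```

Under this definition, a word that occurs once per million words has a Zipf value of 3, and a word the model considers certain has a surprisal of 0.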

Blink extraction

As highlighted by several researchers, there is currently no consensus on the blink duration criteria for extracting spontaneous blinks from naturalistic reading data26,46. In this study, we adhered to pre-processing steps where blinks were defined using only the closed phase (i.e., no pupil was detected), labelled in Data Viewer as blink events58. Blink events were considered outliers if their duration fell outside a 1.5 SD range from the participant’s mean blink duration. This method was applied separately for each participant because of the significant variability in spontaneous blink rates among humans3,4. Additionally, consistent with previous research, segments labelled as blinks lasting longer than 1000 ms were deemed beyond the range of spontaneous blinking and segments shorter than 10 ms were excluded as they could represent measurement noise26. The distribution of blink event durations per participant before applying the exclusion criteria is presented in Supplementary Materials (Fig. S3).
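The exclusion criteria above can be sketched as follows. This is an illustrative reading of the text rather than the original pre-processing script: the function name and data layout are hypothetical, and the order of the two steps (absolute duration bounds first, then the per-participant 1.5 SD rule) is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev

MIN_MS = 10      # shorter segments may reflect measurement noise
MAX_MS = 1000    # longer segments fall outside spontaneous blinking
SD_RANGE = 1.5   # allowed distance from the participant's mean, in SDs


def filter_blinks(durations_by_participant):
    """Return, per participant, the blink durations (ms) that pass both criteria.

    Assumption: absolute bounds are applied before the per-participant SD rule.
    """
    kept = {}
    for participant, durations in durations_by_participant.items():
        plausible = [d for d in durations if MIN_MS <= d <= MAX_MS]
        if len(plausible) < 2:
            # An SD cannot be estimated from fewer than two values.
            kept[participant] = plausible
            continue
        centre = mean(plausible)
        spread = stdev(plausible)
        lower = centre - SD_RANGE * spread
        upper = centre + SD_RANGE * spread
        kept[participant] = [d for d in plausible if lower <= d <= upper]
    return kept


if __name__ == "__main__":
    # Hypothetical durations for one participant.
    example = {"p01": [5, 100, 110, 120, 130, 400, 1500]}
    print(filter_blinks(example))
```

Computing the mean and SD separately for each participant keeps a habitually slow or fast blinker from being filtered against someone else's distribution.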

Statistical analyses

All pre-processing steps and statistical analyses were performed using R (RStudio version 4.4.2).

The primary analysis (Analysis 1) focused on blinking patterns in relation to punctuation marks. As an initial step, the number of times a blink occurred before or after each fixation within a textual position of interest segment was calculated for each participant (see Fig. 1) and divided by the number of occurrences of this position of interest in the raw text, resulting in a relative blink proportion. A position of interest was defined as a text segment containing textual information pertinent to this analysis. Four positions of interest were included: a punctuation mark (Punctuation), an end of line (EOL), a combination of both a punctuation mark and an end of line (Combo), and a residual category (Residual) including all segments that did not contain one of the three previously mentioned textual markers.

Fig. 1

Segment of the novel annotated with data viewer markings. An example paragraph from the novel with added Data Viewer markings. The data were segmented based on preprocessing steps, as indicated by the boxes in the figure, upon which the positions of interest were determined according to the content of these segments. The blue circles represent fixations, with fixation durations indicated, while the yellow arrows denote saccades. The red lines indicate blink activity (the length is not proportional to the blink duration). The initial step of the analysis can be demonstrated with the following example: the last red line (blink) on the fifth line is initiated after a fixation within a text segment containing an end-of-line position (position of interest). Consequently, this blink is classified as occurring in a position of interest.
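The relative blink proportion used in Analysis 1 can be sketched for a single participant as below. The function name, labels and counts are hypothetical; the sketch assumes each blink has already been assigned to one position of interest, as illustrated in Fig. 1.

```python
from collections import Counter


def relative_blink_proportions(blink_positions, position_counts):
    """Blinks observed at each position type divided by how often that type occurs in the text.

    blink_positions: one position label per blink (for one participant).
    position_counts: number of text segments of each position type.
    """
    blink_counts = Counter(blink_positions)
    return {
        position: blink_counts.get(position, 0) / n_segments
        for position, n_segments in position_counts.items()
        if n_segments > 0
    }


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the normalisation.
    blinks = ["Punctuation"] * 3 + ["EOL"] * 2 + ["Residual"] * 5
    segments = {"Punctuation": 30, "EOL": 40, "Combo": 10, "Residual": 500}
    print(relative_blink_proportions(blinks, segments))
```

In this toy example the Residual category collects the most blinks in absolute terms but has the lowest non-zero proportion, which is why the normalisation by the number of segments matters.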

Given that the proportions were confined to the standard unit interval (zero to one) and not normally distributed, a linear regression model was deemed unsuitable. Therefore, we employed a beta regression model, implemented via the ‘betareg’ package59, with average relative blink proportions as the dependent variable and position as the main predictor with four levels. In order to verify the assumptions of our beta regression model, a residual analysis was performed to ensure that the residuals were randomly distributed and followed a beta distribution. Goodness-of-fit was assessed using pseudo-R-squared values, further supporting the validity of the model assumptions. As a follow-up, we also tested whether the effect of punctuation was predominantly driven by a specific type of punctuation. We again employed a beta regression model with average relative blink proportions as the dependent variable and position as the predictor. The position variable consisted of five distinct levels, representing the positions of interest: a point (Point), an exclamation mark (Exclamation), a question mark (Question), a comma (Comma), and the residual category (Residual).

For both models, we additionally controlled for participants which improved the model fit. Note that controlling for participants is also warranted in light of the substantial inter-individual variability in blink rates among humans4. These models were subsequently evaluated through pairwise comparisons utilizing the ‘multcomp’ package60. Bonferroni correction was used to account for multiple comparisons. Finally, odds ratio effect sizes were computed for all statistically significant pairs.
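The two post-hoc steps can be illustrated with a short sketch. It rests on assumptions that the text does not state: that the mean of the beta regression was modelled with a logit link, so that a contrast between two positions on the model scale exponentiates to an odds ratio, and that the Bonferroni adjustment multiplies each p-value by the number of comparisons and caps it at one. The function names and the p-values are hypothetical.

```python
import math


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


def odds_ratio(logit_difference):
    """Odds ratio for a contrast expressed as a difference on the logit scale."""
    return math.exp(logit_difference)


if __name__ == "__main__":
    # Six hypothetical raw p-values, one per pairwise contrast among four positions.
    raw = [0.01, 0.2, 0.004, 0.5, 0.03, 0.001]
    print(bonferroni(raw))
    # A hypothetical contrast of 1.6 on the logit scale.
    print(round(odds_ratio(1.6), 2))
```

With four position levels there are six pairwise contrasts, so each raw p-value is multiplied by six before being compared with the significance threshold.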

The second analysis (Analysis 2) focused on the impact of the word frequency and word surprisal of the last fixated word on the probability of blink occurrence. We used a generalized linear mixed-effects model (GLMM), implemented via the ‘lme4’ package61. The binary outcome variable was modeled with a binomial family and logit link function to account for the dichotomous nature of the response (blink occurrence (1) or blink absence (0)). The fixed effects in the model were the word frequency score and the surprisal value as well as their interaction. These variables were scaled prior to inclusion in the model. Word length was included as a control variable. We included random intercepts per participant, given the substantial inter-individual variability in blink rates among humans4, and a random intercept was included for each individual word. The analysis was restricted to content words only. This approach was taken to avoid confounding frequency or surprisal effects with word type, ensuring that the results reflected effects of interest, rather than differences between content and function words.

Results

We utilized blinking data from 15 participants who read the entire novel across four sessions. Two sessions were excluded due to the absence of saved eye-tracking data, resulting in 58 usable reading sessions. Approximately 10% of all events labelled as blinks were removed following outlier removal protocols. In total, 30,367 blinks were included for further analyses. The average number of blinks per participant across all sessions was 2024.5 blinks (SD = 1137.8) with a minimum of 551 and a maximum of 4,810. The mean blink duration was 128.80 ms (SD = 56.4, range = 10–347 ms). Finally, the average number of blinks per minute across all participants was 10.3 (SD = 5.9). For the distribution of the textual positions of interest in the novel and the total number of blinks per textual position of interest, please refer to Supplementary Materials (Table S4).

In the first analysis (Analysis 1), we compared the average proportion of blinks across the four primary positions of interest (Residual, Punctuation, EOL, and Combo) using a beta regression model. This analysis was done to investigate whether the occurrence of blinks around punctuation marks was higher compared to other positions in the text. Overall, four out of six pairwise contrasts were significant (see Table 1; Fig. 2a). Specifically, the odds of blinking around Punctuation were 4.9 times higher than around Residual positions (z = -11.12, p < .001). Additionally, the odds of blinking around EOL and Combo positions were respectively 3.9 and 6.1 times higher compared to Residual positions (z = 9.40, p < .001; z = -12.84, p < .001, respectively). Finally, the likelihood of blinking around Combo positions was 1.6 times higher than those around EOL positions (z = -4.68, p < .001). No significant differences were observed between Punctuation and the other two positions of interest. Blink proportions of all punctuation mark types were significantly different when compared to Residual positions (see Table 2; Fig. 2b).

Table 1 Results of pairwise comparisons for position (analysis 1).
Fig. 2

(a) Results of analysis 1: the impact of text position on relative blink proportion. Boxplot with midlines showing the median and box limits indicating the first and third quartile of relative blinks proportions per position of interest. The whiskers extend to the minimum and maximum value, with the exception of outliers. Lines connect blink proportions across different positions for a given participant, thereby illustrating the variation and trends in blink rates. Each coloured dot represents one participant. Four out of 6 pairwise contrasts were significant, *** = p < .001. (b) Results of follow-up analysis 1: differentiating different punctuation marks. Boxplot with midlines showing the median and box limits indicating the first and third quartile of relative blinks proportions per position of interest. The whiskers extend to the minimum and maximum value, with the exception of outliers. Lines connect blink proportions across different positions for a given participant, thereby illustrating the variation and trends in blink rates. Each coloured dot represents one participant. All pairwise contrasts including the Residual category were significant, *** = p < .001.

Table 2 Results of pairwise comparisons for position (follow-up analysis 1).

In the second analysis (Analysis 2), we examined the probability of blink occurrence in relation to the word frequency and predictability of the content word (quantified as surprisal) that was last fixated by the reader using a logistic generalized linear mixed model. A correlation analysis revealed a negative correlation between word frequency and word surprisal (r = -.53, p < .001). The fixed effects revealed a significant negative effect of word frequency on blink probability (Beta = -0.038, SE = 0.013, z = -2.891, p = .004), suggesting that higher-frequency words were associated with a reduced likelihood of blinking. Surprisal value had a significant positive effect on blink probability (Beta = 0.071, SE = 0.008, z = 9.121, p < .001), indicating that more surprising words were associated with a higher likelihood of blinking. Importantly, the interaction between surprisal and frequency was also significant (Beta = 0.020, SE = 0.006, z = 3.279, p = .001) (Fig. 3). Please note that the results were qualitatively identical when the analysis was conducted on all words instead of content words only.
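To make the interaction easier to read, the reported fixed-effect estimates can be combined into a conditional slope for surprisal at different levels of word frequency. The sketch below uses only the coefficients reported above and assumes they refer to standardized (z-scored) predictors on the log-odds scale; it ignores the intercept, word length and the random effects, none of which are reported here, and the function names are hypothetical.

```python
import math

# Fixed-effect estimates reported in the text (log-odds scale).
B_SURPRISAL = 0.071
B_INTERACTION = 0.020


def surprisal_slope(frequency_z):
    """Change in log-odds of a blink per 1 SD of surprisal, at a given frequency z-score."""
    return B_SURPRISAL + B_INTERACTION * frequency_z


def surprisal_odds_ratio(frequency_z):
    """The same conditional effect expressed as an odds ratio."""
    return math.exp(surprisal_slope(frequency_z))


if __name__ == "__main__":
    for z in (-1.0, 0.0, 1.0):
        print(z, round(surprisal_slope(z), 3), round(surprisal_odds_ratio(z), 3))
```

Because the interaction coefficient is positive, the surprisal slope grows with word frequency: under these assumptions it is about 0.051 one SD below the mean frequency and about 0.091 one SD above it.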

Fig. 3

Result of analysis 2: impact of word frequency (Zipf Values) and word predictability (Surprisal Values) on blink probability. The figure illustrates the predicted probability of a blink as a function of the word frequency score and surprisal value, as estimated by a generalized linear mixed-effects model. Predictor variables were scaled prior to model fitting. The colored lines represent the predicted blink probabilities for three distinct quantiles of surprisal (0.10, 0.50, and 0.90), while the shaded regions represent the corresponding 95% confidence intervals.

Discussion

Over the last eight decades, several studies have proposed that spontaneous eye blinks play an important cognitive role in addition to their biological functions7,9,18,33,35. Although blinking has been studied across various contexts and experimental paradigms, evidence regarding the connection between spontaneous blinks and cognitive processes during reading remains limited7,8,40,41,42. The current study aimed to investigate this possible cognitive role by analyzing blinking patterns during naturalistic reading, focusing on their occurrence relative to punctuation marks and the frequency and predictability of fixated words. The work could be framed as a much-needed contemporary conceptual replication of the study conducted by Hall7, who investigated blinks during reading using a manual blink count. We used the GECO dataset containing eye movement data from 15 monolingual English participants who read an entire novel in silence. We hypothesized that the probability of blinks occurring around positions containing punctuation marks would be higher compared to other positions in the text, as these positions potentially serve as attentional breakpoints during reading. Furthermore, we hypothesized that blinking probabilities would be modulated by word frequency and predictability, with higher blink probabilities after fixations on lower-frequency and less predictable (i.e., more surprising) words, as processing these words demands more cognitive effort.

Reading between the blinks

Although humans are unaware of it, the loss of perceptual information during blinking is inevitable62,63. The brain may, therefore, regulate blinking behaviour so that as little information as possible is missed while still satisfying our biological needs. This could explain the reduced mean blink rate observed in the current reading study (i.e., about 10 blinks per minute), as compared to the generally established mean blink rate of 15 per minute in humans3. In addition to a reduced blink rate, blinks may also be distributed in a non-random manner. The findings supported both of our hypotheses regarding factors influencing the placement of blinks during text reading. Consistent with the pattern observed by Hall7, average blink probabilities were higher at punctuation marks compared to random positions in the text. Results showed that this trend was not specifically driven by one type of punctuation. In addition, similar effects were shown for other non-random places in the text; blinks occurred more often at the end of a line and when punctuation marks and line endings coincided (compared to random positions). Blink patterns are likewise influenced by word frequency and predictability, with higher word frequencies and predictabilities significantly reducing the probability of blinking.

Firstly, we highlight the effect of punctuation on blinking patterns. Punctuation marks serve primarily to indicate pauses in the flow of words. Event-related potential studies have shown that speech boundaries and commas reliably elicit a similar neural response, known as the Closure Positive Shift (CPS), suggesting that they act as visual cues for implicit prosodic chunking64. The increased likelihood of readers blinking around punctuation may reflect a reduction in the incoming information load and attentional demands. Readers may then also use top-down processes to schedule their blinks to take place at these special positions in the text7,24,25. Although our data do not clearly distinguish between these two interpretations, what is clear is that spontaneous blinks that arise during reading are not only driven by biological needs (such as a reflex or the urge to wet the eye) but also by cognitive processes.

In addition to punctuation marks, there is an increased probability of blinking at the end of a reading line. This phenomenon has been extensively documented and can be attributed to the saccadic movements associated with transitioning to the subsequent line of text8. These saccades are frequently accompanied by blinks, as no visual information is processed during these rapid eye movements65. Our results indicate that the combination of punctuation marks and line endings produces more pronounced effects than line endings alone. One could infer that this is primarily driven by the effect of punctuation, as the combined blinking proportion does not significantly differ from the proportion associated with punctuation by itself.

Secondly, the results showed that blinking patterns are influenced by the word frequency of the word that was last fixated, with higher word frequency scores significantly reducing the likelihood of blinks. Additionally, blink probability is also influenced by the surprisal value, such that more predictable words are associated with a reduced probability of blinking. This can be understood within the cognitive accounts of blinking. The word frequency effect suggests that low-frequency words are more cognitively demanding and therefore are harder to process compared to high-frequency words47. Similarly, the word predictability effect reflects increased processing difficulty associated with less predictable words49,66. It is plausible that after encountering a lower-frequency or less predictable word, participants were more often in need of a cognitive break, causing a post-effort effect. Consistent with the punctuation results, blinks would be more likely to occur during these cognitive breaks.

In addition to the main effects of word frequency and surprisal, the analysis revealed a significant interaction between these two variables indicating that the effect of surprisal value was stronger for high-frequency words. Previous studies have similarly documented such an interaction67,68. A plausible interpretation for this finding is that word frequency and predictability are intrinsically correlated, as reflected in the observed correlation, whereby lower-frequency words are inherently more surprising. Consequently, when a word is already low in frequency, the effort required to recognize it may already be high, limiting the additional benefit that predictability can provide. In contrast, for higher-frequency words, the cognitive processing benefits of predictability are more pronounced.

Overall, our findings align with previous studies suggesting that blinking serves not only a physiological function but also plays a cognitive and attentional role. Since Hall’s introduction of the so-called ‘acquired technique’7, blinking behaviour has been associated with several cognitive processes. It was shown that higher blink rates occur during mentally less demanding periods or during periods with lower attentional involvement, whereas lower blink rates tend to coincide with increased mental effort and attention13,20,45,69,70. Our results further strengthen the evidence for this link by demonstrating that, during novel reading, blinks are not randomly distributed. Blinking is influenced by specific textual factors, with blink probabilities fluctuating during reading based on the level of attentional or cognitive demands. We thereby reinforce and extend earlier findings observed across different contexts, including scene watching, listening to speech, and experimental tasks18,21,31.

Strengths, limitations and avenues for future research

A notable strength of the present study is its utilization of a corpus-based approach. The GECO corpus, comprising extensive naturalistic reading data from a lengthy narrative text, ensures high generalizability and external validity53. A second advantage is the use of a systematic method to characterize blink occurrences relative to text position in an eye-tracking dataset. Creating relative blink proportions ensured that differences in the absolute frequency of the different positions of interest did not affect the results. Moreover, the large volume of data collected over multiple sessions enhances the likelihood of mitigating potential confounders, such as external factors influencing blinking during reading sessions (e.g., light conditions or fatigue71,72) or technical errors originating from the eye-tracker and its software.

However, the fact that the dataset was collected before the present study was designed entails a few limitations. Blinking was not an original variable of interest in the GECO data collection and was rather considered a confounding factor, as is typically the case in eye movement research on reading52. Therefore, blinks and other positions of interest had to be extracted post hoc, whereas most studies that examine blinking patterns use paradigms that are specifically designed to observe or elicit blinks26,46. Moreover, although we expect the vast majority of recorded blink events to represent true blinks, some may have resulted from short-term data loss or error. These occurrences are unlikely to significantly affect the analyses, yet they could introduce additional noise.

Further research is required to determine whether blinks can be associated with other textual characteristics during reading. Previous studies have extensively examined other eye movements, such as fixation latencies and regressions, and found significant correlations with various textual properties, including word length, predictability, general word inconsistencies and difficulty73,74,75. The question remains whether these properties could also be significantly associated with blinking, given that these textual properties are correlated with the level of cognitive demand within a text. In addition to lexical properties, eye-movement research has also focused on higher-level text processing76. Readers tend to segment texts, apart from punctuation marks or white spaces, on so-called situation models in which they perceive subjective boundaries while reading whenever there are changes in time, location, or characters within the narrative77,78. In line with the punctuation effect observed in the current study, these subjective boundaries could similarly elicit increased blinking probabilities due to top-down processes. Such an effect of event segmentation has been found in video watching31; however, it remains an open question whether this phenomenon extends to text reading as well.

Integrating the results of this study with previously described research, a robust link between blinking and cognition emerges. Evidence suggests that these processes are biologically interconnected44 and blink patterns tend to remain stable over time and across trials within the same individual79,80. Consequently, it is not surprising that numerous researchers have proposed blinks as a potentially valuable tool for studying cognition and various related mental processes71,81,82,83. Currently, only very few studies have suggested that blinks can be used to examine cognitive processes, specifically during reading26,27. Based on our results, we encourage future research to further explore this possibility.

Conclusion

Blinking is necessary to maintain clear vision yet causes brief interruptions of visual input. This raises an intriguing question: does our brain efficiently time these blinks during reading? We investigated this by analyzing blinking patterns in eye-tracking data from the GECO corpus, which contains data from participants reading an entire novel. Our analyses examined the impact of punctuation marks, word frequency and word predictability on blink probability. The results indicated increased blinking around punctuation marks compared to other positions in the text and a negative relation between word frequency and predictability and blink probability, conceptually replicating the early observations by Hall7. We propose that these systematic blinking patterns can be linked to our attentional system.