Introduction

Cross-cultural research has yielded valuable insights into the systematic variations in perceptual and cognitive processes across nations and cultural groups. This rich area of research encompasses a diverse range of cognitive processes, such as object categorization, attitudes toward contradictions, object-background separation, selective attention in complex scenes, global-local processing, and sensitivity to visual field changes1.

Expanding on this foundation, a theory of holistic and analytic cognition has been formulated2, which posits two distinct modes of cognitive processing: holistic and analytic. It suggests that individuals tend to engage predominantly in one of these styles, which shapes their cognitive processing. Holistic thinkers, compared to their analytic counterparts, rely more on intuitive strategies than on rigid rules in object categorization3,4, embrace dialectical thinking over formal logic, face challenges in object-background differentiation5, prioritize the contextual relationships between objects over salient focal objects and attributes6, exhibit a preference for global over local features7, demonstrate higher sensitivity to contextual changes than to focal object changes8, and show a lower central fixation bias9. Most of the research mentioned above concludes that the analytic cognitive mode is more typical of Western countries (USA, Western Europe) and the holistic mode is more typical of Asian countries.

Drawing inspiration from this theory, our current research focuses on one specific aspect of visual perception: selective attention to objects and their surrounding context when perceiving complex natural scenes, also referred to as context sensitivity or attention to the field10. Within this framework, “analytic” cognition has been linked to an object-centred attentional mode: people attend to focal objects, classify them via rule-based categories, and explain events by reference to internal object properties. By contrast, “holistic” cognition emphasises field-wide information; attention is distributed across background elements and inter-object relations, and reasoning is guided by similarity and contextual fit rather than abstract rules2. Several previous studies have explored cross-cultural differences in context sensitivity using natural scenes and eye-tracking techniques within the free-viewing paradigm11,12,13. More specifically, some researchers have discovered distinctive patterns of eye movements between Chinese and North American participants11 and between Chinese and African participants12, or between Taiwanese and Czech participants6. In this context, an analytic pattern empirically manifests as longer dwell time on focal objects and correspondingly shorter saccades within the first seconds of viewing, whereas a holistic pattern is characterised by broader scan paths and more time allocated to the background11.

However, some studies failed to replicate or even contradicted the theory of holistic and analytic cognition. For instance, Rayner et al.14 recorded eye movements across six information-processing tasks and found no reliable East–West divergence in attention to objects versus backgrounds. Using the original Chua et al.11 materials, Evans et al.13 likewise reported null cultural effects between American and Chinese participants in eye movements and scene recognition. Furthermore, Rayner et al.15 showed that American and Chinese observers looked equally quickly at the “weird” components of manipulated scenes. Finally, Miellet et al.16 masked central vision dynamically during an animal-search task and found identical oculomotor strategies in Western and Eastern participants. Given the contradictory evidence on systematic cultural differences in perception and cognition, the research in this field is far from concluded.

Despite the existing body of research, some gaps remain to be addressed. Most existing eye-tracking studies contrast a single “Western” sample (typically the United States) with a single “East-Asian” sample (Japan, China, or Korea). This dichotomy leaves unanswered whether the purported holistic–analytic split is categorical or graded, and whether it covaries with specific socio-ecological variables such as urbanisation, education system, or relational mobility17,18,19. We therefore adopt a gradient view: societies positioned closer to the historical core of Western Europe on indices of individualism, “Book religions”, and alphabetic literacy are expected to show the most object-focused (analytic) gaze, whereas societies further away on those dimensions should exhibit progressively more context-focused (holistic) patterns. Taiwan represents the East-Asian pole because its Confucian heritage, logographic writing system, and collectivistic value profile mirror those of the cultures in which holistic gaze patterns were first documented10,11.

Furthermore, the limited sample diversity and the predominant focus on Western (read Anglo-Saxon, associated with analytic cognition) and Eastern countries (read East-Asian, associated with holistic cognition) when defining the theory of holistic and analytic cognition present obstacles to its potential advancement. To expand the body of knowledge on scene perception beyond the commonly compared countries, we gathered data from three European countries (Germany, Bulgaria, Czechia), two samples from a country on the border between Europe and Asia (Türkiye), an African country (Ghana), and one East-Asian country (Taiwan). The decision to collect two separate samples from Türkiye aimed not only to examine cross-country differences but also to explore within-country differences in the largest and most diverse country in the sample. To achieve this, we deliberately gathered data from participants born and raised in either the Western or the Eastern region of Türkiye.

One of the central questions was how to rank the countries on the holistic-analytic spectrum before examining the data, especially given the lack of robust empirical evidence to substantiate our hypotheses, at least for some of the comparisons (e.g. between Türkiye and Ghana). Based on previous research, we expected the participants from Germany (a representative Western country) to be the most analytic11, followed by participants from Czechia and Bulgaria (post-communist Europe)20. Taiwan, on the other hand, was expected to be the most holistic country. Participants from Ghana and Türkiye were expected to fall between these countries12. Finally, we expected no differences between the participants from Eastern Türkiye and their Western counterparts21, but decided to test this expectation.

In the current research, we focus on cross-cultural similarities and differences in one aspect of visual perception, attention to (focal) objects and background, using complex visual scenes (naturalistic photographs). The main variables of interest (dependent variables) were the amount of time spent on objects (focal object dwell time; calculated as a sum of fixation durations) and saccadic length (calculated in pixels). Dwell time reflects how long a viewer attends to focal objects vs. contextual elements, and is expected to differ between object-oriented (analytic) and context-sensitive (holistic) viewing styles. Saccadic length and its temporal changes provide insight into whether a viewer is making smaller, detail-focused movements or larger, more context-focused shifts22. These measures are frequently used in eye-tracking studies of context sensitivity6,11,21,23. We were also interested in the temporal changes in saccadic length during the stimulus presentation, and in factors that might have influenced object-oriented attention. Therefore, we formulated several assumptions related to differences in eye movements across cultural samples.
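To make the first dependent variable concrete, the following minimal Python sketch sums the durations of fixations that land inside focal-object regions of interest. It is an illustration under stated assumptions, not the authors' pipeline (the metrics in this study were exported from Ogama): the function names, the `(x, y, duration_ms)` fixation format, the polygon ROI format, and the demo values are all hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the polygon?

    polygon is a list of (x, y) vertices in drawing order (hypothetical format).
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dwell_time_ms(fixations, rois):
    """Sum the durations of fixations falling inside any focal-object ROI."""
    return sum(
        duration
        for x, y, duration in fixations
        if any(point_in_polygon(x, y, roi) for roi in rois)
    )


# Hypothetical data: one square ROI and three fixations (x, y, duration in ms)
roi = [(100, 100), (300, 100), (300, 300), (100, 300)]
fixations = [(150, 150, 250), (400, 400, 300), (200, 250, 180)]
total = dwell_time_ms(fixations, [roi])  # two fixations fall inside the ROI
```

With two focal objects, both polygons would be passed in `rois`, so the result is the cumulative dwell time across the focal-object ROIs.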

The hypotheses are categorized as major or minor (Hma, Hmi). First, statistical equivalence of both Türkish samples in the dependent variables (dwell time on focal objects, saccadic length) was tested using a TOST procedure (two one-sided t-tests) proposed by Lakens24. When statistical equivalence is not achieved, merging the samples and analyzing them as one group is not advisable. As already mentioned, we expected the same patterns of eye-movements in both Türkish samples (Hmi1).

Next, we tested for differences in the average dwell time on focal objects (main hypothesis Hma1) and the average saccadic length (Hma2). According to the theory, analytic perceivers should attend relatively more to the focal objects; therefore, their average dwell time on objects should be longer. Moreover, analytic perceivers should more often use focal visual scanning patterns, which are characterized by relatively longer fixations and relatively shorter saccades in the first few seconds after stimulus onset22. Therefore, we expected the following order of countries in the average dwell time on focal objects: Taiwan < Ghana < Türkiye-East < Türkiye-West < Bulgaria < Czechia < Germany. The opposite pattern was expected for saccadic length (Taiwan > Ghana > Türkiye-East > Türkiye-West > Bulgaria > Czechia > Germany).

Next, we tested several minor hypotheses. Follet et al.22 demonstrated a significant drop in saccadic length three seconds after stimulus onset during free viewing of complex visual scenes. This effect should be independent of the content of the scene. We had no specific cross-cultural hypothesis regarding this, but we expected to replicate the drop in saccadic length in our study (Hmi2).

In the next set of hypotheses, we wanted to explore potential effects of stimulus characteristics, namely (a) the number of focal objects, and (b) the relative size of focal objects, on the average dwell time. The manipulation of the number of focal objects tests whether increasing object load demands additional visual processing: when two independent items are present, viewers must individuate, compare, and encode each one in turn, which should prolong the cumulative time their gaze remains within the object ROIs. We therefore expect that, as the number of focal objects (1 vs. 2) rises (Hmi3) and as their relative size increases (Hmi4), participants across countries will display longer mean dwell time on the focal objects.

The last exploratory hypothesis investigates whether individual sociodemographic factors such as gender, age, or socioeconomic status25 affect dwell time or saccadic length during free viewing of complex visual scenes (Hmi5), as suggested by some previous research in the field26.

Methods

Research ethics

The Research Ethics Committee of Masaryk University reviewed the application and approved this research project (Proposal No.: 0257/2018) on 13 March 2019. All methods were administered in accordance with the relevant guidelines and regulations of Masaryk University and national laws. Informed consent was obtained in writing from all participants.

Use of LLMs

ChatGPT (v. 4o, https://chatgpt.com/) was used for English proofreading of all sections of the text and for polishing the analytic code.

Materials

Real-world scenes

The stimulus materials consisted of complex scenes (real-world photos). The scenes were obtained from online databases of photos licensed for free non-commercial use (CC BY-NC-SA and more permissive licenses). We set several pre-selection criteria for the photos: (a) the image should contain one or two focal objects, (b) there should be no people or animals present in the image, (c) the image should contain nothing that would reveal the exact location where it was taken (no commonly known landmarks, no writing).

To verify that the scenes fit our criteria, we conducted a qualitative evaluation of the scenes. The preselected images were sent to four independent evaluators (members of the research team) from countries involved in this research. The evaluators were given the definitions of a focal object (“The main figure(s) or object(s) in the picture. It is salient and can be in the foreground.”) and background (“The part of a picture that forms a setting for the main figure(s) or object(s), or appears furthest from the viewer.”) and asked to: (a) indicate how many objects they identified in the image, (b) name those objects. At least three of the four evaluators had to agree on each criterion. Scenes that passed this qualitative evaluation (i.e. the evaluators identified the same number of objects and perceived the same components of the image as objects) were kept in the test battery; the other pictures were discarded.

In total, 98 scenes out of the original 160 were retained after the qualitative evaluation. In the next step, we unified the size of each image and added a black frame occupying 10% of the entire stimulus (see Fig. 1; size without the frame: 922 × 692 px; size with the frame: 1024 × 768 px). This was done due to the lower accuracy of eye-tracking systems in the display periphery27. Subsequently, for each stimulus, we defined the surface occupied by focal objects and computed an index of relative focal object size that was used both for further selection of stimuli and later on in the statistical analysis. In the next step, we ranked the images by the relative size of the focal object (min = 1%, max = 40%). Images with extremely small or large portions occupied by focal object(s) were discarded. The resulting range of relative focal object size was between 5% and 28%. This variable was transformed to z-score. Based on z-scores, the stimuli were placed in one of three categories: stimuli with small (N = 12), medium (N = 43), and large (N = 15) focal objects (cut-off points at z-score − 1 and 1). These 70 stimuli comprised the test stimuli (trials).
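The z-score categorisation described above can be sketched in a few lines of Python. This is only an illustration: the function name and the demo values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def size_categories(relative_sizes, low=-1.0, high=1.0):
    """Label each stimulus as small, medium, or large by its z-score.

    Cut-offs at z = -1 and z = 1 follow the text; the sample SD (stdev)
    is an assumption of this sketch.
    """
    m = mean(relative_sizes)
    s = stdev(relative_sizes)
    labels = []
    for value in relative_sizes:
        z = (value - m) / s
        if z < low:
            labels.append("small")
        elif z > high:
            labels.append("large")
        else:
            labels.append("medium")
    return labels


# Hypothetical relative focal-object sizes in % of the stimulus (not study data)
demo_sizes = [5, 9, 12, 14, 15, 16, 18, 21, 28]
demo_labels = size_categories(demo_sizes)
```

In this toy example only the extreme values (5% and 28%) fall beyond one standard deviation from the mean, so most stimuli land in the medium category, mirroring the uneven category sizes reported for the real stimulus set.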

Fig. 1

Sample stimuli with (A) man-made and (B) natural focal objects.

Each trial was presented for four seconds and was preceded by a central fixation cross displayed until the moment a participant fixated on it. The participants were instructed to “view a series of pictures” and evaluate “how much they liked each picture” on a scale ranging from 1 to 5 (1–very good, 5–very poor). The task of rating scene likability was included solely to maintain participant engagement and prevent boredom effects, contributing to enhanced attention and motivation during the experimental session11,12,22.

Sociodemographic questionnaire

Apart from the scenes, sociodemographic data were collected by a questionnaire. Data on sex, age, nationality, marital status, number of siblings, mother tongue, socioeconomic status, and other variables were collected. This research was part of a larger test battery and not all of the variables collected were used in this research. For a full list of variables collected, including their coding, please see Supplement. All materials were translated by two independent translators using a back-translation method. The translation process for each language was overseen by native members of the research team from the respective countries, all of whom were well-acquainted with the research objectives.

Apparatus and procedure

The sociodemographic data were collected using the software Hypothesis28,29 running on Google Chrome web browser between February 2020 and April 2022. The eye-tracking data were collected using SMI RED 250 mobile in all countries except for Taiwan. In Taiwan, an EyeLink 1000 was used. The sampling frequency was set to 250 Hz with 9-point calibration. The desired accuracy threshold of calibration was set to 1° of visual angle. The selection of eye-tracking systems was guided by their specifications and performance parameters to guarantee methodological comparability of the recorded data. The SMI RED 250 mobile is specified at 0.4° accuracy and 0.03° RMS precision by the manufacturer30, whereas the EyeLink 1000 is specified at ≤ 0.15° (typical 0.25–0.50°) accuracy and 0.01° RMS precision31. Independent benchmark studies report mean empirical accuracies of ~ 0.5–0.9° for the SMI RED 250 under 9-point calibration32 and ~ 0.6° for the EyeLink 1000 in a 49-target fixation grid33. These values confirm that both systems meet the ≤ 1° criterion recommended for dispersion-based scene-perception research34. Both systems can be considered relatively accurate and precise, and also comparable in both parameters.

However, we administered the task even to participants with lower calibration accuracy to keep the entire experimental procedure constant for all participants (the same order of tests in the test battery). In all countries, stimulus material was presented on 22” LCD monitors set to a 60 Hz refresh rate and 1024 × 768 resolution. Participants viewed the stimuli using a chin-rest positioned approximately 60 cm from the monitor. The laboratory setup, data collection, and all subsequent phases of data cleaning and analysis adhere to the guidelines for eye-tracking studies by Dunn et al.35.

Data processing and cleaning

Sampling plan

The sampling plan and participant exclusion criteria for this study were pre-registered (see Data availability section). As part of a larger test battery, we collected data from university students across 7 countries (Ghana, Taiwan, Armenia, Bulgaria, Türkiye, Czechia, and Germany) in 8 independent samples (for contrast, Türkish participants born in the Eastern and Western regions of the country were sampled separately). Sample size was determined by an a priori power analysis specifying a small-to-medium effect size (Cohen’s f = 0.18). Using common settings (α = 0.05, β = 0.20) in G*Power (v3.1.9.7)36 for a one-way ANOVA with 8 groups resulted in a total of 456 participants, i.e. 57 participants per group (noncentrality parameter λ = 14.77, critical F = 2.03, numerator df = 7, denominator df = 448). Allowing for the data loss that is common in eye-tracking studies, we aimed to collect 60–70 participants per cultural group. Due to the low data quality of the sample collected in Armenia, the number of participants in that sample would have dropped far below the desired number (to only 15 participants out of 58 collected). All the data cleaning and sample information below are therefore reported without the Armenian sample (deviation from the pre-registration described in the Supplement).
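The bookkeeping behind these power-analysis figures can be checked with a short Python sketch. It only reproduces the arithmetic that links the reported quantities (total N, noncentrality parameter, degrees of freedom) using the standard relation λ = f² · N for a one-way ANOVA; it does not recompute the power or the critical F, which came from G*Power. The function name is hypothetical.

```python
def anova_design_summary(cohens_f, n_groups, n_per_group):
    """Return total N, noncentrality parameter, and dfs for a one-way ANOVA."""
    n_total = n_groups * n_per_group
    noncentrality = cohens_f ** 2 * n_total  # lambda = f^2 * N
    df_numerator = n_groups - 1
    df_denominator = n_total - n_groups
    return n_total, noncentrality, df_numerator, df_denominator


# Values reported in the text: f = 0.18, 8 groups, 57 participants per group
n_total, lam, df1, df2 = anova_design_summary(0.18, 8, 57)
print(n_total, round(lam, 2), df1, df2)
```

The sketch returns the same total sample size, λ, and degrees of freedom as those listed in the sampling plan, which confirms that the reported values are internally consistent.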

Because the number of groups changed and the final number of participants was below the planned number, we ran a post-hoc sensitivity analysis for the statistical tests used in this study. ANOVA (α = 0.05, β = 0.20) for 7 groups and 408 participants should be able to detect a small-to-medium effect (η² = 0.03). Post-hoc t-tests with the same settings should detect a medium-to-large effect (Hedges’s g = 0.59)37. The TOST procedure24 used for the tests of equivalence between both Türkish samples was able to detect a medium effect (d = 0.40).

Data processing—raw data

Since data collection was conducted using multiple eye-tracking systems and due to the incompatibility of their software (different native fixation detection algorithms, impossibility to import data gathered with one eye-tracker into another), we employed a meticulous data processing procedure described below. This procedure should eliminate or at least minimize any method-related artifacts. Additionally, to assure good data quality, several data quality checks were performed and data not meeting a sufficient quality were discarded from the further analysis. For a full list of discarded participants and reasons for their removal see the Supplement.

In the first step, data were exported from SMI BeGaze and SR Research Data Viewer. Also, the first phase of data cleaning was performed. We discarded all participants with incomplete administration, low accuracy, administration with incorrect sampling frequency, or impossibly short experiment duration. Thirty-two participants were discarded in this phase. Subsequently, we prepared the raw data to be imported to Ogama software (v5.0.1)38. This included mainly changing file names, relabeling of trial names and IDs, removal of duplicate timestamps, relabeling of participant IDs, and reordering columns. The R-codes for these transformations are available in the OSF repository.

Data processing—Ogama

In Ogama, the ROIs were drawn for each test trial (Fig. 2). ROIs were defined around the contours of the focal objects. Both the picture slides and the ROI coordinates exported from Ogama are available at the OSF repository. After the ROIs were defined, the eye-movement metrics were calculated. Ogama uses an I-DT dispersion-type algorithm for fixation detection39. We used the settings recommended for 250 Hz data by Popelka40: maximum distance of 15 px, minimum number of 20 samples, size of 31 px for the fixation detection ring, automated elimination of first fixation, and no merging of consecutive fixations. The following metrics were exported for each participant and trial: trial duration, data loss, fixation count (per trial and per ROI), fixation duration (per trial and per ROI), and saccade distance (per trial and per ROI). Ogama exports saccade length in screen pixels, which we converted to the more commonly used degrees of visual angle. All sites used identical 22′′ monitors (1024 × 768) viewed from a chin-rest-fixed distance of 60 cm. Under this geometry 1 pixel ≈ 0.042° of visual angle, so each pixel value was multiplied by 0.042 to yield saccade length in degrees.
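The pixel-to-degree factor can be reproduced from the stated viewing geometry. The Python sketch below is an illustration with hypothetical function names; it assumes square pixels, so that the panel's physical aspect ratio equals the 1024 × 768 resolution ratio (4:3), which the text does not state explicitly.

```python
import math


def degrees_per_pixel(diagonal_inch, res_x, res_y, distance_cm):
    """Visual angle subtended by one pixel at the screen centre.

    Assumes square pixels, i.e. a physical aspect ratio of res_x:res_y.
    """
    diagonal_px = math.hypot(res_x, res_y)
    width_cm = diagonal_inch * 2.54 * res_x / diagonal_px
    pixel_cm = width_cm / res_x
    return math.degrees(math.atan(pixel_cm / distance_cm))


def saccade_length_deg(length_px, factor=0.042):
    """Linear conversion as described in the text (pixels x 0.042)."""
    return length_px * factor


# Geometry from the text: 22-inch monitor, 1024 x 768 px, 60 cm viewing distance
factor = degrees_per_pixel(22, 1024, 768, 60)
print(round(factor, 3))
```

Under these assumptions the factor rounds to the 0.042° per pixel used in the study. The linear multiplication is a small-angle approximation; it slightly overestimates the angle of very long saccades compared with an exact trigonometric conversion.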

Fig. 2

ROIs with (A) one and (B) two focal objects.

Data processing—quality check after fixation calculation

After the ROI definition, fixation calculation, and export of eye-movement metrics, we conducted a second (pre-registered) phase of data cleaning. Trials were excluded if they met either of the following conditions: (1) the percentage of data loss was greater than 20%, which equated to more than 800 ms of lost data, based on the guidelines proposed by Komogortsev et al.41; or (2) the number of fixations was less than five, as recommended by Follet et al.22. The average percentage of trials excluded in this fashion varied between 4.66% in Czechia and 15.90% in Ghana. Furthermore, if a participant had more than 20% (i.e., 15 or more) of trials excluded based on these criteria, they were excluded from subsequent analyses to ensure data integrity. Forty additional participants were removed in this phase. After the removal of these participants, 4.37% of trials were missing (deleted), and the mean data loss as exported directly from Ogama was 3.05% across all participants, with very similar values in all countries (minimum of 2.71% in Türkiye-West and maximum of 3.82% in Taiwan). Both numbers indicate high data quality after the data cleaning procedure. As an additional quality control, we also computed the median intersample angular distance as a measure of sample-to-sample (S2S) precision34,42,43 for every trial, fixation, and participant, and compared these values across countries using a mixed-effects model (see the Supplement). The S2S precision appears acceptable for meaningful comparisons across cultural samples44.
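The two-level exclusion rule (trial level, then participant level) can be expressed compactly. The following Python sketch is an illustration, not the authors' R code; the constant names, function names, and the `(data_loss_pct, n_fixations)` trial format are hypothetical, while the thresholds are taken from the text.

```python
MAX_DATA_LOSS_PCT = 20      # trials with more than 20% data loss are dropped
MIN_FIXATIONS = 5           # trials with fewer than five fixations are dropped
MAX_EXCLUDED_PCT = 20       # participants losing more than 20% of trials are dropped


def trial_is_valid(data_loss_pct, n_fixations):
    """Apply the two trial-level criteria."""
    return data_loss_pct <= MAX_DATA_LOSS_PCT and n_fixations >= MIN_FIXATIONS


def participant_is_valid(trials):
    """trials: list of (data_loss_pct, n_fixations) tuples for one participant."""
    n_excluded = sum(not trial_is_valid(loss, n_fix) for loss, n_fix in trials)
    # Integer arithmetic avoids floating-point edge cases at exactly 20%
    return n_excluded * 100 <= MAX_EXCLUDED_PCT * len(trials)


# Hypothetical participants with 70 trials each (not study data)
good_trial, bad_trial = (3.0, 11), (35.0, 4)
borderline = [bad_trial] * 14 + [good_trial] * 56   # exactly 20% excluded
too_many = [bad_trial] * 15 + [good_trial] * 55     # more than 20% excluded
```

With 70 test trials, a participant with 14 excluded trials sits exactly at the 20% limit and is retained, whereas 15 excluded trials lead to removal, matching the "15 or more" wording above.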

Tests of assumptions and equivalence

Assumptions of ANOVA (normality, homoscedasticity, sphericity) were tested before each of the presented analyses. When these assumptions were violated, a robust variant of the statistical test was calculated (Welch’s ANOVA, bootstrap heteroscedastic ANOVA for trimmed means, Games-Howell post-hoc). Results of assumptions testing can be found in the Supplement. To test for equivalence of both Türkish samples, we used TOST24 with medium equivalence bounds of d = 0.40. Unlike null hypothesis significance testing, the procedure makes it possible to interpret evidence in favor of the null hypothesis (in our case, in favor of equivalence of the two cultural groups).
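For readers unfamiliar with equivalence testing, the logic of TOST for two independent groups can be written out as follows. This is the generic textbook formulation with a Welch-type standard error, given here as an illustration; the symbols (M for sample means, s for standard deviations, n for group sizes, and the standardizer σ̂ used to convert the bound d = 0.40 into raw units) are notation introduced for this sketch, and the exact standardizer used by the software is not stated in the text.

```latex
\begin{aligned}
H_{01}&:\ \mu_1 - \mu_2 \le -\Delta, &\qquad H_{02}&:\ \mu_1 - \mu_2 \ge \Delta,\\[4pt]
t_L &= \frac{(M_1 - M_2) + \Delta}{SE}, &\qquad t_U &= \frac{(M_1 - M_2) - \Delta}{SE},\\[4pt]
SE &= \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, &\qquad \Delta &= 0.40 \cdot \hat{\sigma}.
\end{aligned}
```

Equivalence is concluded only when both one-sided tests reject their null hypothesis, i.e. when the larger of the two one-sided p-values is below α. A non-significant difference test combined with a non-significant TOST therefore means that the data support neither a difference nor equivalence.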

Data were analyzed using R, version 4.1.245 and the packages afex, effsize46, ggforce, ggdist47, gghalves, lme448, lsr, mclust49, performance50, psych51, rstatix, TOSTER52, and WRS253.

Research sample

After the data cleaning procedure, 408 cases were retained in the research sample (Table 1). The smallest subsamples comprise 42 participants (Ghana and Türkiye West), the largest 92 (Czechia). The percentage of females ranges from 32% in Türkiye East to 66% in Taiwan. The mean age of participants is around 21–22 years, with the exception of Germany, which is represented by somewhat older participants (mean of 25 years). The total sample size is slightly below the desired sample size (see the Sampling plan section), which is primarily caused by the lower number of participants retained in the subsamples from Ghana and Türkiye West.

Table 1 Sample characteristics.

Results

Equivalence of Türkish samples

To determine whether both Türkish samples can be considered equivalent (Hmi1), we performed the TOST24 with equivalence bounds of d = 0.40. If found equivalent, the samples could be grouped and analyzed together (see the Supplement for details on the equivalence tests). Tests were computed for both main dependent variables, the average dwell time on focal objects and the saccadic length. Regarding the mean dwell time on focal objects, there was no significant difference between the two Türkish samples, t(79.8) = 0.93, p = .358, mean difference = 56.52; Hedges’s g = 0.19, CI95% [− 0.15, 0.53]. However, the TOST suggests that the groups are not equivalent, t(79.8) = 0.92, p = .82. Regarding the mean saccadic length, no significant difference was found between the two Türkish samples, t(95.6) = − 1.69, p = .094, mean difference = − 9.71; Hedges’s g = − 0.34, CI95% [− 0.66, − 0.00], and the samples were non-equivalent, t(95.6) = − 1.60, p = .950.

In summary, the equivalence tests for the two dependent variables provide no evidence that the two samples from Türkiye are equivalent. The samples should therefore not be merged and are compared individually in the following analyses.

Dwell time on focal objects

A one-way ANOVA with cultural group as the independent variable and average dwell time on focal objects as the dependent variable was conducted to identify potential differences in dwell time across cultural groups (Hma1; Fig. 3). The effect of culture was significant, F(6, 401) = 12.77, p < .001, with a large effect size, ω² = 0.15, CI95% [0.09, 1.00]. The mean focal object dwell time varied between 1,895 ms (SD = 304, SE = 46) in Ghana and 1,528 ms (SD = 281, SE = 34) in Taiwan. When inspected in the direction from longest to shortest average dwell time, the first significant difference can be observed between Ghana and Czechia (mean difference = 247 ms, p < .001; Hedges’s g = − 0.92, CI95% [− 1.30, − 0.54]). Inspecting the differences in the opposite direction, the first significant difference can be found between Taiwan and Bulgaria (mean difference = 205 ms, p = .001; Hedges’s g = 0.75, CI95% [0.38, 1.11]). Full descriptives and all post-hoc tests are reported in the Supplement.

Fig. 3

Violin plots show the full distribution of participant-level mean dwell time on the focal-object ROIs, aggregated across all trials. The embedded box-and-whisker elements mark the median, the inter-quartile range, and 1.5 × IQR limits; black dots are individual participants. Under the main plot, circles labelled with two-letter country codes are linked by grey bars whenever a post-hoc test found no significant difference between the connected pair. The darker circle in each row is the reference country for that particular set of comparisons.

To further explore patterns of attention to objects vs. the background, we performed a series of t-tests on the differences between dwell time on objects and on the background in each of the cultural samples (Table 2). Each pairwise comparison is statistically significant. Yet, in the case of Taiwan, the effect size is notably smaller (medium) than in the other subsamples (large).

Table 2 Differences in dwell time on focal object (FO) and background (BG) by culture (in ms).

Saccadic length

To analyze potential cross-cultural variations in saccadic length (Hma2; Fig. 4), we conducted a Welch one-way ANOVA. The effect of culture was significant, F(6, 163.2) = 7.43, p < .001, with a large effect size, ω² = 0.18, CI95% [0.08, 1.00]. The mean saccadic length varied between 7.06° (SD = 1.09, SE = 0.17) in Ghana and 8.31° (SD = 1.45, SE = 0.18) in Taiwan. When inspected in the direction from shortest to longest mean saccadic length, the first significant difference can be found between Ghana and Bulgaria (mean difference = 0.75°, p = .001; Hedges’s g = 0.81, CI95% [0.43, 1.18]). When inspecting in the opposite direction, the first significant difference can be found between participants from Taiwan and Türkiye East (mean difference = 0.72°, p = .001; Hedges’s g = 0.44, CI95% [0.44, 1.17]). Full descriptives and all post-hoc tests are reported in the Supplement.

Fig. 4

Layout and symbols follow Fig. 3, but the dependent variable is participant-level mean saccadic length expressed in degrees of visual angle.

Temporal saccadic length changes

To replicate the findings of Follet et al.22, who used an almost identical research design and identified a drop in saccadic length three seconds after stimulus onset, we calculated mean saccadic lengths for each of the four seconds of stimulus presentation (Hmi2). At the aggregate level, the mean saccadic length did not show the expected pattern, but fluctuated from 7.44° (SD = 1.48) in Second 1, through 7.86° (SD = 1.50) in Second 2 and 7.61° (SD = 1.60) in Second 3, to 8.07° (SD = 1.56) in Second 4. The results of a robust (bootstrapped, with 0.2 trimmed means) repeated measures ANOVA showed a significant main effect of time interval after stimulus onset, F(3, 404) = 18.03, p < .001, ξ² = 0.22. All differences between individual levels were significant at alpha 0.05, except for the differences in saccadic length between Seconds 1 and 3, p = .140, Hedges’s g = − 0.18, CI95% [− 0.28, − 0.09], and between Seconds 2 and 4, p = .060, Hedges’s g = − 0.25, CI95% [− 0.35, − 0.15]. For full results see the Supplement. This pattern of fluctuations in saccadic length was similar at the level of each individual cultural group (Fig. 5).
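The per-second aggregation can be illustrated with a short Python sketch. It assigns each saccade to a 1-s bin and averages saccadic length within each bin. Binning by saccade onset time is an assumption of this sketch (the text does not specify how saccades spanning a bin boundary were handled), and the function name, data format, and demo values are hypothetical.

```python
from collections import defaultdict


def mean_length_per_second(saccades, n_bins=4, bin_ms=1000):
    """Average saccadic length within successive 1-s bins of a trial.

    saccades: list of (onset_ms, length_deg) tuples; each saccade is assigned
    to the bin containing its onset (an assumption of this sketch).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for onset_ms, length_deg in saccades:
        bin_index = int(onset_ms // bin_ms)
        if 0 <= bin_index < n_bins:
            sums[bin_index] += length_deg
            counts[bin_index] += 1
    # Bins are reported as 1..n_bins; empty bins are omitted
    return {b + 1: sums[b] / counts[b] for b in range(n_bins) if counts[b]}


# Hypothetical saccades from one 4-s trial (onset in ms, length in degrees)
demo_saccades = [(120, 6.0), (640, 8.0), (1300, 9.0), (2500, 7.0), (3900, 8.5)]
per_second = mean_length_per_second(demo_saccades)
```

Participant-level means per bin, as plotted in the temporal analysis, would then be obtained by averaging these per-trial values across trials for each participant.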

Fig. 5

Temporal changes in saccadic length across cultural samples. Each panel depicts one cultural sample. The x-axis divides the 4-s presentation interval into four successive 1-s bins (1 = 0–1 s, …, 4 = 3–4 s). Within each bin, violin/box plots display the distribution of participant-level mean saccade amplitude.

Effects of stimulus characteristics on dwell time

In the next step, we explored cross-cultural differences in dwell time on focal objects based on the characteristics of stimuli. More specifically, we were interested in the effects of number of focal objects and their relative size (percentage of stimulus occupied by focal object).

Effect of number of focal objects

Overall, the focal object dwell time was higher on stimuli with one focal object (Hmi3; mean = 1694 ms, SD = 305) compared to stimuli with two focal objects (mean = 1628 ms, SD = 333). The difference was significant, t(407) = 6.14, p < .001; Hedges’s g = 0.30, CI95% [0.20, 0.40]. At the cultural level (Table 3), the differences were significant in the case of Taiwan and both Türkish samples, which exhibited longer dwell time on objects for stimuli with one focal object than for stimuli with two objects.

Table 3 Focal object dwell time by culture - stimuli with one and two focal objects (in ms).

Effect of relative size of focal object

Furthermore, we conducted an analysis of effects of relative focal object sizes on dwell time (Hmi4). Across all samples, there is an apparent trend that the dwell time on objects increases together with their relative size. Focal object dwell time in trials with small objects (mean = 1523, SD = 316) was significantly shorter than in trials with medium (mean = 1686, SD = 309), t(407) = − 15.33, p < .001, Hedges’s g = − 0.76, CI95% [− 0.87, − 0.65], and large objects (mean = 1800, SD = 350), t(407) = − 21.27, p < .001, Hedges’s g = − 1.05, CI95% [− 1.17, − 0.93]. The differences between trials with medium and large objects were also significant, t(407) = − 12.13, p < .001, Hedges’s g = − 0.60, CI95% [− 0.70, − 0.49]. The same pattern is observable at the level of individual countries (see Supplement for all descriptives and t-tests).
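The effect sizes reported for the within-participant comparisons are consistent with a standardized paired difference, d_z = t / √n, multiplied by the usual small-sample (Hedges) correction. The Python sketch below illustrates that relationship using the t-values reported in the text; that the effect sizes were computed exactly this way is an assumption, and the function name is hypothetical.

```python
import math


def hedges_g_from_paired_t(t_value, n):
    """Approximate Hedges's g for a paired design from t and sample size.

    Uses d_z = t / sqrt(n) and the correction J = 1 - 3 / (4 * df - 1);
    whether the study used this exact formula is an assumption.
    """
    d_z = t_value / math.sqrt(n)
    df = n - 1
    correction = 1 - 3 / (4 * df - 1)
    return d_z * correction


# t-values reported in the text, all with n = 408 participants (df = 407)
g_one_vs_two = hedges_g_from_paired_t(6.14, 408)
g_small_vs_medium = hedges_g_from_paired_t(-15.33, 408)
g_small_vs_large = hedges_g_from_paired_t(-21.27, 408)
g_medium_vs_large = hedges_g_from_paired_t(-12.13, 408)
```

Rounded to two decimals, these values match the reported effect sizes for the number-of-objects comparison and the three size comparisons, which supports the internal consistency of the reported t-statistics and effect sizes.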

Effects of sociodemographic variables

To determine whether the inclusion of sociodemographic variables improves the model fit (Hmi5), we added gender, age, and socioeconomic status as covariates alongside cultural group, with dwell time on focal objects and saccadic length, respectively, as dependent variables. Subsequently, we performed likelihood ratio tests of the nested regression models. The inclusion of covariates improved the model neither for dwell time, χ²(5) = 379,530, p = .419, nor for saccadic length, χ²(5) = 4059.4, p = .372, and none of the covariates was significant. Corresponding AIC and BIC values of the reported models are in the Supplement.
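The logic of a likelihood ratio test for nested models can be sketched as follows. This is a generic illustration under stated assumptions, not the procedure used to obtain the values above: the log-likelihoods in the usage note are hypothetical, the function names are invented for this sketch, and the chi-square tail probability is computed with a simple series expansion so that the example depends only on the Python standard library (it is intended for moderate values of the statistic).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (series expansion).

    Uses the regularized lower incomplete gamma function
    P(a, z) = z**a * exp(-z) / Gamma(a) * sum_k z**k / (a * (a+1) * ... * (a+k)).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + k)
        total += term
        k += 1
    log_p = a * math.log(z) - z - math.lgamma(a) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_p))


def likelihood_ratio_test(loglik_reduced, loglik_full, df_diff):
    """Test statistic and p value for a reduced model nested in a full model.

    df_diff: number of additional parameters in the full model.
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)
```

With hypothetical log-likelihoods of −100.0 (reduced model) and −98.0 (full model with two extra parameters), `likelihood_ratio_test(-100.0, -98.0, 2)` gives a statistic of 4.0 and a p value of about .135, so the additional parameters would not be judged to improve the fit.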

Discussion

In this paper, we present results of possibly the largest cross-cultural study on perception of complex visual scenes, in both the number of cultural samples and overall sample size. Our findings aim to expand the body of research on holistic and analytic cognition beyond the WEIRD samples. In line with the recent findings by Uskul et al.54, our findings contradict the idea that “Western” samples are characterized by analytic perception while “Eastern” samples perceive holistically. Much more complicated patterns of cultural differences emerge when we focus on characteristics of eye movements such as dwell time on focal objects and background and saccadic length. Our results, therefore, support the idea that a much more differentiated approach to research on cross-cultural specifics of perception and cognition is needed. The insights gained from this study might enable researchers to formulate more refined and context-specific hypotheses in future investigations, thereby advancing the field’s knowledge base.

Most notably, the main hypotheses (Hma1, Hma2), which assumed a specific order of the countries regarding attention to objects11,12, were not fully confirmed in the present study. Even so, participants from Taiwan exhibited the most holistic pattern of scene viewing, with the shortest average dwell time on focal objects and the longest average saccadic length, suggesting the relatively highest proportion of ambient processing. On the other hand, and quite surprisingly, participants from Germany and Czechia, who were assumed to be the most analytic (longest dwell on objects and shortest saccades), were in fact not significantly different from the participants from Taiwan. Moreover, participants from Ghana and Türkiye, who were hypothesized to be somewhat moderate in both main metrics, were in fact the most analytic, with the highest focus on focal objects and the shortest saccades. While not aligning with our initial hypotheses, this finding represents a substantial contribution to the understanding of context sensitivity in scene perception in underrepresented countries, including Ghana, Bulgaria, and Türkiye. Upon examining the relative dwell time allocated to focal objects and the background, a clear conclusion emerges: participants from these countries consistently devoted significantly more time to observing the objects than participants from countries like Taiwan, Czechia or Germany. Notably, the Taiwan sample exhibited distinct characteristics compared to the other samples, as it displayed comparatively smaller differences in dwell times between objects and background. This finding suggests the presence of holistic patterns in the scanning of complex scenes within the Taiwan sample.

In the exploratory analysis of saccadic length changes (Hmi2), we investigated whether the scanning patterns of participants from individual countries develop over time from stimulus onset. Follet et al.22 observed a significant drop in saccadic length three seconds after the start of the trial. This change in saccadic length suggests a switch from the initial ambient scanning of the scene to object-oriented (focal) processing. Their findings were not replicated in the current study. Moreover, we had initially assumed that the decrease in saccadic length would vary across the cultural groups studied, with variations in the timing of the drop following stimulus onset. We expected that in countries with a holistic mindset, the decrease would happen later or be less prominent. However, when examining the average dwell time on focal objects and the average saccadic length as key indicators of holistic and analytic perception in this study (i.e. Ghana representing the most analytic and Taiwan the most holistic), we did not observe any temporal changes in saccadic length that aligned with these expectations. It should be noted, however, that such patterns might emerge in some participants if analyzed at the individual level.

Next, we explored the relationship between characteristics of the scenes, namely the number of focal objects (Hmi3) and the relative size of focal objects (Hmi4), and the dwell time spent on focal objects. Interestingly, participants from Taiwan and both Turkish samples spent significantly more time on objects in stimuli with one focal object compared to stimuli with two focal objects. It was not surprising that dwell time on focal objects rose with their relative size, as the size of objects on the retina (visual saliency) is one of the factors (alongside others such as centrality or semantic saliency) that draw visual attention55.

The final analysis aimed to explore the potential effects of participant age, gender, and socioeconomic status on the dwell time allocated to focal objects (Hmi5). However, none of these variables demonstrated a significant impact on the observed dwell time.

While this research has contributed new insights into cross-cultural similarities and differences in context sensitivity, it is important to acknowledge its limitations. Although we were able to collect a substantial cross-cultural dataset on scene perception, the final number of participants after data cleaning (N = 408, divided into seven groups) fell slightly below the target sample size determined through power analysis (N = 456, calculated for eight groups). Additionally, a sensitivity analysis revealed that we were able to detect small to medium effects in the case of ANOVA, medium to large effects in the case of post hoc tests, and a medium effect in TOST. As a result, there is an increased risk of Type II errors, which means we may have been less likely to detect small true effects53. Future research with a similar number of countries and the ambition to detect small effects should take this into account and increase the sample size.

As mentioned above, research on holistic and analytic perception commonly compares typical “Western” (North America, Western Europe) and typical “Eastern” (East Asia) countries. Despite our research sample covering a relatively diverse set of countries from Europe, the Near East, Africa, and East Asia, the “Western” end of the spectrum is missing as a potential benchmark for an analytic country. This omission was due to the “higher power” intervention of COVID-19 and the associated restrictions. Moreover, it should be noted that our sample consists solely of university students and the results should not be generalized to the broader populations of the studied countries. It is crucial to recognize that our participants are primarily from larger cities and represent individuals with relatively higher levels of education and wealth within their respective cultures. Future research should therefore focus on deeper exploration of within-country variations of scanning patterns during scene perception, especially in the relatively understudied countries.

Apart from possible data collection in other countries, cultures, or subpopulations, future research in the field should focus on the following issues. The field of cross-cultural research on perception and cognition suffers from its inability to find individual- and cultural-level predictors of the observed differences. Current attempts to link the differences in perception with psychological constructs such as interdependent and independent social orientation56, social class26, or means of sustenance57 have found no strong links, are limited to specific cultural contexts, or have not yet been replicated elsewhere.

Progress is, in this respect, also hindered by psychometric and methodological issues connected with the self-report scales that are widely used to measure these psychological constructs. Specifically, research on individualism/collectivism and independence/interdependence encounters issues with the adaptability of self-report scales across cultures, more specifically with measurement invariance across cultures58,59. Therefore, part of the problem is a lack of reliable scales that could be used as predictors for less common cultural samples. For example, Vignoles et al.60 attempted to validate a self-construal scale across 33 nations. Although this research is generally very impressive, the authors report only “marginally acceptable fit on all indices”, which actually translates to a poor fit by the standards of the SEM research field61.

Besides these broader issues, future research should focus on cross-cultural variations of eye-movement patterns under different experimental tasks and designs. As demonstrated by Yarbus62, the instructions alone can influence the eye-movement patterns of participants. The results reported here were obtained with participants instructed to view the pictures and assess their subjective liking. However, it is important to note that different instructions given to participants may lead to varying scanning patterns during scene perception. For example, Senzaki et al.63 observed that cultural differences were strongly mitigated when participants merely observed animated vignettes compared to a condition in which they were asked to describe the content of the scene. A similar observation was made by Köster and Kärtner64, who argue that context sensitivity is socialized via a verbal route. Therefore, it would be valuable to replicate this research using alternative instructions or even different task paradigms, such as visual search, change detection, or moving object tracking. A further limitation is that we did not collect post-trial ratings of image familiarity or semantic clarity; cross-cultural differences in how easily a scene is recognised or interpreted could modulate the gaze patterns we report. Future studies should obtain such ratings to disentangle perceptual style from stimulus familiarity.

The timing of stimulus presentation is an additional factor that may have influenced the observed results. It is plausible that the duration of the trials could have impacted eye-movement patterns, such as attention to objects or saccadic lengths, leading to potential differences across the studied samples. For instance, it is possible that the transition from ambient to focal processing, which is associated with shortened saccadic lengths as described by Follet et al.22, might occur at a later time following stimulus onset or vary in a culturally dependent manner. Moreover, longer presentation times would allow the calculation of eye-tracking metrics that are not informative in this research design, such as transitions between various ROIs.

It is important to note that not all potentially plausible analyses were conducted or reported in this study. Exploring low-level characteristics of the scenes, such as centrality65 or visual saliency66, or utilizing classification algorithms to assess background complexity, would be valuable additions to further enhance the understanding of the phenomenon. Moreover, relatively new and interesting methods of eye-movement analysis are currently available. Studies based on these methods employ a different approach than the analysis of fixations and saccades presented in the current paper. For example, Hsiao et al.67 developed a statistical method for individual-level comparison of eye-movement patterns based on a combination of hidden Markov models and co-clustering, which allows for concurrent clustering of participants in both the spatial (position of fixations) and temporal (order of fixations) dimensions of eye movements. Another option might be to use the Gini coefficient of dispersion at the trial level68.
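As an illustration of the last option, the Gini coefficient can be computed for any set of non-negative values with a few lines of Python. The sketch below is generic; applying it to, for example, dwell time per grid cell within a single trial is an assumption made here for illustration and may differ from the procedure in the cited work. The function name `gini` is hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    Returns 0.0 for a perfectly even distribution and approaches 1.0
    when all of the total is concentrated in a single element.
    """
    xs = sorted(values)
    if not xs:
        raise ValueError("values must not be empty")
    if xs[0] < 0:
        raise ValueError("values must be non-negative")
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0  # no gaze recorded anywhere: treat as perfectly even
    # Rank-weighted formula on the ascending sort, with ranks 1..n.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n
```

Under this reading, a trial in which gaze is spread evenly over the scene yields a value near 0, whereas a trial in which gaze is concentrated on a few regions yields a value closer to 1.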

Although we harmonised all acquisition parameters and showed that key data-quality metrics (data loss, maximum calibration accuracy threshold) did not differ by site, unmeasured device-specific artefacts can never be ruled out completely34. Importantly, our dependent variables, dwell time and saccade length, are relatively insensitive to spatial errors below 1°.