Introduction

Political and ideological polarization has been on the rise in the United States. A growing body of work has studied this phenomenon in political organizations, news media, and voting populations, particularly in the context of social media use1,2. Social media platforms have garnered particular interest as vectors for the spread of political information and misinformation3,4,5. Phenomena such as echo chambers, filter bubbles, and super-spreaders have been well-documented on social media platforms, with evidence suggesting these may be causally related to political polarization6,7,8,9,10. Indeed, recent work suggests that increasing numbers of individuals are being radicalized into extremist views specifically through their online interactions11, with social media platforms seeking out better approaches for mitigating polarization and the presence of extremist and hate groups12,13,14. However, in spite of the increasing prevalence of polarized thinking in the population and indications of social media serving as mediators, it is not clear which psychosocial phenomena drive this trend: what does it mean to think in polarized terms?

Although we have scant knowledge of the origins and nature of polarized political thinking, there exists a well-established body of literature on a similar style of thinking referred to as “cognitive distortions,” patterns of thought known to be associated with certain mental health disorders. These disorders, including depression and anxiety, are referred to as “internalizing disorders.” The theory of Cognitive Behavioral Therapy (CBT)15,16, the current gold standard of evidence-based psychotherapy17,18,19, holds that the subjective distress associated with emotional disorders such as depression stems from cognitive distortions: maladaptive internalized beliefs that color one’s perception of self and environment. Cognitive distortions are thinking patterns “wherein individuals think about themselves, the future, and the world in inaccurate and overly negative ways”20.

Moreover, cognitive distortions are recognizable through an individual’s conscious, verbal ideation. For example, a cognitive distortion occurs when individuals attach negative labels to themselves (e.g., “I am a total loser”), talk about future events in terms of dichotomous expectations of worst-case scenarios (e.g., “My meeting will be a complete disaster”), or make baseless assumptions about someone else’s state of mind (e.g., “Everybody will think that I am a failure”).

These patterns of thought constitute a simplification and reduction of complex, nuanced real-world relations into rigid, black-and-white rules. This may color an individual’s interactions and perceptions, inducing negative affect and avoidant behavioral patterns, especially in the context of environmental stress. For example, the (exaggerated) belief that one is entirely disliked by all of one’s friends may translate into a loss of confidence in oneself, possibly leading to the avoidance of social gatherings and other opportunities for social interaction. This, in turn, may further reinforce one’s loss of confidence in oneself or one’s relationships, resulting in a behavioral feedback loop. CBT, the gold-standard treatment for depression and anxiety, is concerned with the identification of such exaggerated and black-and-white thinking patterns and their associated behaviors in order to mitigate their effects on disorder symptoms21.

In general, CBT recognizes about 12 types of cognitive distortions, all characterized by overly rigid, exaggerated, and black-and-white thinking, and each with striking parallels to political polarization (Supplementary Tables S1 and S2). For example, “labeling and mislabeling” refers to the tendency to assign unequivocal labels to oneself or others (e.g., “I am a loser” or “they are evil”), which has a clear analog in political thought in which political adversaries are labeled as “traitors” or “liars.” Similarly, “overgeneralizing” refers to the tendency to make broad generalizations from a single or a few events, which is also common in polarized thought.

Although cognitive distortions and political polarization show compelling parallels, they may refer to distinct individual and collective processes. Crucially, cognitive distortions describe an individual’s cognitive style, whereas polarization is defined by the individual’s relationship to their social environment. At the present time, it is not clear whether and how they are related, nor what the precise nature of their relationship could be, whether causal or otherwise.

While extensive research exists on polarization and cognitive distortions independently, their intersection has largely remained theoretical. This study aims to bridge this knowledge gap by investigating the occurrence of cognitive distortions and sociopolitical polarization during the 2016 and 2020 US presidential elections in the realm of social media. In particular, we employ language analysis techniques tailored for identifying cognitive distortions in text data to analyze a vast dataset encompassing tweets mentioning a US presidential candidate in both election years2. This dataset, previously explored for longitudinal trends in user ideology, offers a unique opportunity to investigate the possible psychosocial relationship between distorted thinking and sociopolitical ideology.

Methods

The datasets used in this study were initially collected and analyzed in Flamino et al.2. Since the data were anonymized and publicly accessible, formal Institutional Review Board approval was not required for this analysis. Additionally, as the data consisted of user-generated content shared voluntarily on a public platform, informed consent was not obtained from individual users. All procedures adhered to relevant ethical guidelines for using publicly available social media data, including data anonymization and respect for privacy. The study did not collect any personal demographic information, such as sex/gender, age, or race/ethnicity, as this data was not available in the dataset used. The use of Twitter data followed the platform’s terms of service and privacy policy. The analyses and results presented in this work are not associated with any preregistered studies or hypotheses.

Twitter data

We analyzed datasets of all tweets pertaining to political news on Twitter in 2016 and 2020, originally collected and analyzed in ref. 2. These data comprise all tweets posted between 1 June and election day (8 November in 2016 and 2 November in 2020) that mention the name of at least one of the two presidential candidates of that year’s election. We focus on the set of all tweets posted by individuals who posted at least one tweet containing a cognitive distortion schemata (CDS) n-gram, yielding 37,292,720 tweets by 1,558,934 users in 2016 and 47,532,985 tweets by 3,005,657 users in 2020. To capture longitudinal, within-individual trends, we focus particularly on users who posted at least 10 tweets in both the 2016 and 2020 datasets. This longitudinal set comprises 13,376,733 tweets from 2016 and 11,852,917 tweets from 2020 by 97,748 users.

Assumptions of the statistical models, including normality of residuals and homogeneity of variances, were not formally tested. However, linear mixed models are generally robust to moderate deviations from normality, particularly with sufficiently large sample sizes and balanced designs. We relied on this robustness in interpreting the results.

Detection of cognitive distortions in language

The detection of cognitive distortions as patterns or styles of thought in the context of mental health disorders is generally left to the expertise of experienced therapists. However, advances have been made in detecting cognitive distortions in written language using machine learning and natural language processing techniques22. Bathina et al.23 proposed, and demonstrated the effectiveness of, a lexicon of lexical markers of cognitive distortions in language: a panel of 10 CBT experts compiled 241 CDS expressions, encoded as n-grams of length 1 to 5, each marking the expression of at least one type of cognitive distortion. The presence of the lexicon’s n-grams in online text was found to be associated with internalizing disorders such as depression23.

The n-grams in the CDS lexicon were formulated to capture the expression of cognitive distortions as a style of thought, regardless of the particular topic, subject, or mental health disorder. For example, “I am a” marks the expression of a cognitive distortion of the Labeling type, regardless of what the labeling pertains to in terms of its topic or subject. This allows their use for the general detection of cognitive distortions in language beyond the context of mental health disorders and has been successfully demonstrated in a number of contexts, including political framing of online discussions about abortion24 and historical changes in collective language20.

The different types and their definitions that were used to compile the set of n-grams are described in Supplementary Tables S1 and S2 along with sample tweets for each CDS type to illustrate how they could capture cognitive distortions in general language.

CDS prevalence

Our CDS n-gram schemata define a function F(t) → {0, 1} that maps each tweet t to 1 if it contains any CDS n-gram and to 0 otherwise. For each user U, let TU denote the set of tweets produced by U. The prevalence for user U is then calculated as

$${P}_{U}=\frac{{\sum }_{t\in {T}_{U}}F(t)}{| {T}_{U}| }$$
(1)

Grouping over the set of users who post in each election year, we obtain a distribution of CDS prevalence for each election year (Fig. 1). For the set of users who meet our criteria, posting at least 10 tweets in both 2016 and 2020, we can further calculate within-individual prevalence ratios over the 2 years. We calculate the CDS prevalence of the given user U in 2016 and 2020 as PU,2016 and PU,2020, respectively, and so the prevalence ratio for the user is given as PRU = PU,2020/PU,2016. We thus obtain a distribution of prevalence ratios over users, as shown in Supplementary Fig. S1.
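The prevalence computation in Eq. (1) and the within-individual ratio PRU can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the lexicon is a hypothetical toy set (only “I am a” comes from the text; the other entries are placeholders for the 241-item CDS lexicon), tokenization is a simple lowercase word split, and users whose 2016 prevalence is zero are skipped because their ratio is undefined. The study’s own matching and edge-case handling may differ.

```python
import re

# Hypothetical toy lexicon: only "i am a" is taken from the text; the other
# entries are placeholders standing in for the 241-item CDS lexicon.
CDS_NGRAMS = {("i", "am", "a"), ("everyone", "thinks"), ("will", "never")}
MAX_N = max(len(g) for g in CDS_NGRAMS)


def tokenize(text):
    """Lowercase word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def F(tweet):
    """Return 1 if the tweet contains any CDS n-gram as a contiguous token run, else 0."""
    tokens = tokenize(tweet)
    for n in range(1, MAX_N + 1):
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) in CDS_NGRAMS:
                return 1
    return 0


def prevalence(tweets):
    """Eq. (1): share of a user's tweets containing at least one CDS n-gram."""
    return sum(F(t) for t in tweets) / len(tweets)


def prevalence_ratios(tweets_2016, tweets_2020, min_tweets=10):
    """PR_U = P_U,2020 / P_U,2016 for users with >= min_tweets tweets in both years.

    tweets_YYYY maps a user id to that user's list of tweet texts.
    Users with zero 2016 prevalence are skipped (assumption: ratio undefined).
    """
    ratios = {}
    for user in tweets_2016.keys() & tweets_2020.keys():
        t16, t20 = tweets_2016[user], tweets_2020[user]
        if len(t16) < min_tweets or len(t20) < min_tweets:
            continue
        p16 = prevalence(t16)
        if p16 > 0:
            ratios[user] = prevalence(t20) / p16
    return ratios
```

For instance, a user with 2 of 10 tweets flagged in 2016 and 4 of 10 in 2020 would get PRU = 0.4/0.2 = 2.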

Fig. 1: CDS prevalence from 2016 to 2020.

CDS prevalence in 2016 (in blue) and 2020 (in orange) for the same set of users, namely those for whom both CDS prevalence and latent ideology could be defined. Solid vertical lines indicate the population means for each year.

Bootstrap analysis

Next, we use bootstrap resampling to compare differences in the prevalence of individual CDS types. Because restricting counts to a single type would yield many users with zero values, for this analysis we calculate CDS prevalence at the group level, as the proportion of all tweets produced by the resampled population that contain a CDS n-gram of the given type. Our CDS n-gram function thus becomes \(F(t,C)\to \{0,1\}\), which maps each tweet t to 1 if it contains a CDS n-gram of type C and to 0 otherwise. Denoting the sets of tweets from 2016 and 2020 as T2016 and T2020, respectively, the prevalence of CDS type C is calculated as \({P}_{C,year}=\frac{{\sum }_{t\in {T}_{year}}F(t,C)}{| {T}_{year}| }\). Prevalence ratios are thus given by \(P{R}_{C}=\frac{{P}_{C,2020}}{{P}_{C,2016}}\). Each bootstrap replicate draws n individuals with replacement from our sample population, where n is the sample size, and records the CDS type prevalence ratios of the resampled population. Repeating this B = 10,000 times yields a distribution of prevalence ratios for each CDS type C, \(\{P{R}_{C,1}^{* },P{R}_{C,2}^{* },\ldots ,P{R}_{C,B}^{* }\}\). A 95% percentile interval (2.5th to 97.5th percentiles) of this distribution that contains 1 indicates no significant change in prevalence, whereas an interval that excludes 1 indicates a significant change.
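A sketch of the user-level bootstrap for a single CDS type C follows. It assumes each user is summarized by hypothetical fields `n2016`/`n2020` (tweet counts) and `hits2016`/`hits2020` (per-type counts of tweets containing an n-gram of that type). The interval is a simple percentile interval; the exact percentile rule used in the study is not specified, and a resample with no type-C tweets in 2016 would make the ratio undefined (not handled here).

```python
import random


def type_ratio(users, cds_type):
    """PR_C = P_C,2020 / P_C,2016, pooling all tweets of the given users."""
    hits16 = sum(u["hits2016"].get(cds_type, 0) for u in users)
    hits20 = sum(u["hits2020"].get(cds_type, 0) for u in users)
    n16 = sum(u["n2016"] for u in users)
    n20 = sum(u["n2020"] for u in users)
    return (hits20 / n20) / (hits16 / n16)


def bootstrap_ratio(users, cds_type, B=10_000, seed=0):
    """Resample users with replacement B times; return (lo, hi, significant).

    lo/hi are approximate 2.5th/97.5th percentiles of the bootstrapped PR_C;
    'significant' is True when the interval excludes 1 (no-change value).
    """
    rng = random.Random(seed)
    n = len(users)
    stats = sorted(type_ratio(rng.choices(users, k=n), cds_type) for _ in range(B))
    lo = stats[int(0.025 * (B - 1))]
    hi = stats[int(0.975 * (B - 1))]
    return lo, hi, not (lo <= 1.0 <= hi)
```

Resampling whole users, rather than individual tweets, keeps each user’s tweets together, which matches the text’s description of drawing individuals from the sample population.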

Latent ideology

The latent ideology estimation follows the method developed in refs. 25,26 adapted to using retweet interactions instead of following relations. As in ref. 26, we use correspondence analysis27 (CA) to infer ideological positions of Twitter users.

The adjacency matrix, A, of the retweet network between the influencers and their retweeters is the matrix with element aij equal to the number of times user i retweeted influencer j. We only select tweets that have been sent from an official Twitter client in order to limit the presence of bots and professional accounts, and we remove users who show little interest in the US election by removing users who retweeted fewer than three different influencers. For the 2016 data, matrix A has 751,311 rows corresponding to distinct users, 593 columns corresponding to influencers, and a total number of retweets equal to 39,385,772. For the 2020 data, matrix A has 2,034,970 rows corresponding to distinct users, 591 columns corresponding to influencers, and a total number of retweets equal to 153,463,788.
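The filtering and matrix construction described above might look like the following sketch. It assumes the input is an iterable of (retweeter, influencer) pairs, one per retweet, already restricted to official Twitter clients; the function name and data layout are illustrative and not taken from refs. 25,26.

```python
from collections import Counter, defaultdict


def build_retweet_matrix(retweets, min_influencers=3):
    """Return (row_users, col_influencers, A) with A[i][j] = retweets of j by i.

    Users who retweeted fewer than `min_influencers` distinct influencers
    are dropped, as described in the text.
    """
    counts = Counter(retweets)                 # (user, influencer) -> retweet count
    influencers_of = defaultdict(set)
    for user, influencer in counts:
        influencers_of[user].add(influencer)

    users = sorted(u for u, infl in influencers_of.items()
                   if len(infl) >= min_influencers)
    kept = set(users)
    cols = sorted({infl for (u, infl) in counts if u in kept})
    col_index = {infl: j for j, infl in enumerate(cols)}

    A = [[0] * len(cols) for _ in users]
    for i, user in enumerate(users):
        for infl in influencers_of[user]:
            A[i][col_index[infl]] = counts[(user, infl)]
    return users, cols, A
```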

The CA method consists of the following steps28. The matrix of standardized residuals of the adjacency matrix is computed as \({{\bf{S}}}={{\bf{D}}}_{r}^{-1/2}({{\bf{P}}}-{{\bf{r}}}{{\bf{c}}}){{\bf{D}}}_{c}^{-1/2}\), where \({{\bf{P}}}={{\bf{A}}}{({\sum }_{ij}{a}_{ij})}^{-1}\) is the adjacency matrix normalized by the total number of retweets, \({{\bf{r}}}={{\bf{P}}}{{\bf{1}}}\) is the vector of row sums, \({{\bf{c}}}={{\bf{1}}}^{T}{{\bf{P}}}\) is the vector of column sums, \({{\bf{D}}}_{r}={\rm{diag}}({{\bf{r}}})\), and \({{\bf{D}}}_{c}={\rm{diag}}({{\bf{c}}})\). Using the standardized residuals allows the inference to account for variation in the popularity of influencers and in the activity of users26. A singular value decomposition (SVD) is then computed such that \({{\bf{S}}}={{\bf{U}}}{{\bf{D}}}_{\alpha }{{\bf{V}}}^{T}\), with \({{\bf{U}}}^{T}{{\bf{U}}}={{\bf{V}}}^{T}{{\bf{V}}}={{\bf{I}}}\) and \({{\bf{D}}}_{\alpha }\) a diagonal matrix with the singular values on its diagonal. The positions of the users are then given by the standard row coordinates, \({{\bf{X}}}={{\bf{D}}}_{r}^{-1/2}{{\bf{U}}}\), of which we only consider the first dimension, corresponding to the largest singular value. Finally, the ideological positions of the users are obtained by standardizing these coordinates to have a mean of zero and a standard deviation of one. The ideological position of each influencer is given by the median of the weighted positions of their retweeters.
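The CA steps translate fairly directly into NumPy. The sketch below is an illustration rather than the authors’ code: it computes the standardized residuals elementwise, takes the first left singular vector, rescales it to standard row coordinates, and standardizes them. Two assumptions are worth flagging: the sign of an SVD is arbitrary, so the left/right orientation must be fixed afterwards (e.g., by checking where known influencers fall), and “the median of the weighted positions” is interpreted here as a retweet-count-weighted median (`influencer_position` is a hypothetical helper).

```python
import numpy as np


def ca_row_positions(A):
    """First-dimension CA row coordinates, standardized to mean 0 and sd 1.

    Assumes every row and column of A has a positive sum.
    """
    A = np.asarray(A, dtype=float)
    P = A / A.sum()                        # normalize by total number of retweets
    r = P.sum(axis=1)                      # row masses, r = P 1
    c = P.sum(axis=0)                      # column masses, c = 1^T P
    rc = np.outer(r, c)
    S = (P - rc) / np.sqrt(rc)             # D_r^{-1/2} (P - r c) D_c^{-1/2}, elementwise
    U, d, Vt = np.linalg.svd(S, full_matrices=False)  # singular values in descending order
    x = U[:, 0] / np.sqrt(r)               # standard row coordinates, first dimension
    return (x - x.mean()) / x.std()


def influencer_position(x, weights):
    """Weighted median of retweeter positions x, weighted by retweet counts."""
    mask = weights > 0
    v, w = x[mask], weights[mask]
    order = np.argsort(v)
    v, w = v[order], w[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]
```

For influencer j, the position would be `influencer_position(x, A[:, j])`.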

We tested the robustness of our method by varying the construction of matrix A as follows: (1) removing entries with weight 1, to discard relations that do not show a strong ideological alignment; (2) using the logarithm of the number of retweets as the weight, to allow for a sublinear relation between the number of retweets and the strength of ideological alignment; and (3) using a random subsample of the 2020 retweet data of the same size as the 2016 retweet data, to control for a potential effect of the difference in size between the two datasets. All of these robustness tests yield results nearly identical to those of our initial method, with correlation coefficients between the user position distributions of the robustness tests and the initial configuration all above 0.995. We also compared the users’ latent ideology distribution with the users’ average leaning distribution and found a correlation above 0.90 for both 2016 and 2020. The average leaning is computed for all users having at least three tweets classified in at least one media category, and is estimated as the weighted average of the media category positions: fake news = 4/3, extreme bias right = 1, right = 2/3, right-leaning = 1/3, center = 0, left-leaning = −1/3, left = −2/3, extreme bias left = −1.
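The average-leaning validation measure can be sketched as below, using the category positions listed above. We assume, since the text does not say, that the weights are the numbers of a user’s tweets classified in each media category and that the three-tweet threshold applies to the user’s total classified tweets; `average_leaning` is a hypothetical helper name.

```python
# Media category positions as listed in the text.
CATEGORY_POSITION = {
    "fake news": 4 / 3,
    "extreme bias right": 1.0,
    "right": 2 / 3,
    "right-leaning": 1 / 3,
    "center": 0.0,
    "left-leaning": -1 / 3,
    "left": -2 / 3,
    "extreme bias left": -1.0,
}


def average_leaning(category_counts, min_tweets=3):
    """Tweet-count-weighted mean category position, or None below the threshold.

    category_counts maps a category name to the number of the user's tweets
    classified in that category (assumed weighting).
    """
    total = sum(category_counts.values())
    if total < min_tweets:
        return None
    weighted = sum(CATEGORY_POSITION[cat] * k for cat, k in category_counts.items())
    return weighted / total
```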

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Results

In the following analyses, we focus on the subset of users for whom we could determine both latent ideology and CDS prevalence in 2016 and 2020 (see “Methods”). This subset includes 53,493 individuals.

CDS prevalence by year

To identify distorted thinking patterns among social media users discussing the 2016 and 2020 US presidential elections, we determine the prevalence of cognitive distortions in the Flamino et al.2 data using an existing, vetted lexicon of context-free n-grams, referred to as CDS. Each CDS n-gram marks one or more of 12 types of cognitive distortions23, as detailed in the “Methods.”

Analyses of overall CDS prevalence (see “Methods”) demonstrate a clear increase in CDS prevalence from 2016 (in blue) to 2020 (in orange), as shown in Fig. 1. Mean CDS prevalence increased from 0.204 in 2016 to 0.292 in 2020, an increase of over 43%. This increase is statistically significant (p ≪ 0.01) under a one-sided paired t-test, with the null hypothesis that the mean of the distribution underlying the 2020 CDS prevalence is less than or equal to the mean of the 2016 CDS prevalence (alternative hypothesis: the 2020 mean is greater). We found similar results when considering all users for whom CDS prevalence could be defined, as discussed in the Supplementary Information and shown in Supplementary Fig. S1.
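For reference, the one-sided paired test can be sketched with the standard library. Because the longitudinal sample is large, this sketch approximates the t distribution with a standard normal when computing the p value; for small samples an exact t-based p value is preferable, e.g. `scipy.stats.ttest_rel(after, before, alternative="greater")` (SciPy 1.6 or later).

```python
import math
from statistics import mean, stdev


def paired_t_one_sided(before, after):
    """One-sided paired t-test; H0: mean(after - before) <= 0, H1: > 0.

    Returns (t, p), with p from a large-sample normal approximation.
    """
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))
    # Upper-tail probability P(Z > t) for a standard normal Z.
    p = 0.5 * math.erfc(t / math.sqrt(2))
    return t, p
```

Here `before` and `after` would be the per-user 2016 and 2020 prevalences, aligned by user.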

This signal is not limited to population-level trends but is measurable on the level of individual users. To investigate this effect, we computed within-individual prevalence ratios (see “Methods”) for each user and found a mean prevalence ratio of 1.76, as shown in Supplementary Figs. S1 and S2. In other words, for the average user in our dataset, we find a 76% increase in the prevalence of tweets containing at least one instance of cognitive distortion.

These prevalence values and ratios are not specific to any cognitive distortion type, because each tweet in these computations is simply tested for whether it contains at least one instance of any CDS n-gram. However, there may be informative differences in the prevalence of specific CDS types. Bootstrap resampling (see “Methods”) of aggregate population prevalence allows us to measure changes in the prevalence of specific CDS types while also accounting for sampling variability in our particular set of users. Resampling also provides an additional check on the reliability of the prevalence ratios through percentile confidence intervals. We computed these prevalence ratios for each CDS type and for all CDS markers combined. No confidence interval overlapped 1, indicating that the increase in prevalence from 2016 to 2020 holds for every CDS type (see Fig. 2). The largest prevalence ratios occurred for the cognitive distortion types “Emotional Reasoning,” “Overgeneralizing,” “Catastrophizing,” and “Mindreading.” Supplementary Tables S1 and S2 contain examples of these cognitive distortions.

Fig. 2: Bootstrapped prevalence.

Bootstrapped CDS prevalence ratios of tweets by each cohort, separated by CDS n-gram type. Median values are shown by the center lines, with the lower and upper box limits corresponding to the first and third quartiles, respectively, and whiskers corresponding to the 95% CI range. We found no overlap in any confidence interval with 1 (red vertical line), indicating that the increase in prevalence from 2016 to 2020 is persistent across every single CDS type.

CDS prevalence and latent ideology

The findings above indicate a notable rise in the prevalence of cognitive distortions, as measured by CDS, from 2016 to 2020 in discussions of the US presidential elections. In this section, we examine this increase in conjunction with the rise in individual polarization documented in ref. 2. The latent ideology distribution (see “Methods”) for the users for whom we could define both CDS prevalence and latent ideology is shown in Fig. 3. The figure shows a marked increase in users’ polarization in 2020 (right panel) compared with 2016 (left panel), with their latent ideologies converging into two distinct groups. This pattern mirrors that observed for the overall population (i.e., all users for whom we could define the latent ideology)2.

Fig. 3: Latent Ideology from 2016 to 2020.

On the left, displayed in blue, is the latent ideology of users in 2016, whereas on the right, we present the latent ideology of users in 2020. We observe a notable increase in polarization over this period, characterized by more distinctly defined peaks and a marked reduction in the number of users with intermediate latent ideologies.

To assess the consistency of our findings across ideological groups, we focus on users who maintained a stable ideological stance between the two elections. We classified individuals into two groups: left-leaning if their latent ideology score was below −0.5 in both 2016 and 2020 (n = 29,048), and right-leaning if their score was above 0.5 in both years (n = 18,631). Because our ideology score has a bimodal distribution, we characterize each user’s polarity by the peak (left or right) nearest to them. We define the degree of polarization of each user in a given year as one minus the user’s distance from the nearest peak, normalized by the distance of that peak from 0. For example, referring to Fig. 3, left, if the right peak in 2016 is positioned at 1.3, a user with a latent ideology of 0.5 would have a degree of polarization of 1 − |0.5 − 1.3|/1.3 = 1 − 0.8/1.3 ≈ 0.38. Formally,

$${{{\rm{polarization}}}}_{i}^{{{\rm{year}}}}=1-\frac{\left\vert {{\rm{Latent}}}\,{{{\rm{Ideology}}}}_{i}^{{{\rm{year}}}}-{{{\rm{peak}}}}_{\pm }^{{{\rm{year}}}}\right\vert }{\left\vert {{{\rm{peak}}}}_{\pm }^{{{\rm{year}}}}\right\vert }$$
(2)

with \({{{\rm{polarization}}}}_{i}^{{{\rm{year}}}}\) being the degree of polarization of user i in year ∈ {2016, 2020}, and peak± being the right peak (peak+) if \({{\rm{Latent}}}\,{{{\rm{Ideology}}}}_{i}^{year}\ge 0\) and the left peak (peak−) otherwise. Defined in this way, the degree of polarization approaches 1 when the user is highly polarized (i.e., close to the nearest peak) and approaches 0 when the user is far from it (e.g., at the center of the ideological spectrum, where the distance to the peak is maximal).
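Eq. (2) and the stable-group classification can be written as two small functions. The peak locations are taken as inputs because the text does not specify how they were estimated; in the tests and the usage note, only the right-peak value of 1.3 comes from the worked example above, and the left-peak value of −1.2 is a placeholder.

```python
def polarization(x, peak_left, peak_right):
    """Eq. (2): 1 - |x - peak| / |peak|, using the right peak when x >= 0."""
    peak = peak_right if x >= 0 else peak_left
    return 1 - abs(x - peak) / abs(peak)


def stable_group(x2016, x2020, threshold=0.5):
    """'left' / 'right' if the latent ideology stays beyond -/+threshold in both years."""
    if x2016 < -threshold and x2020 < -threshold:
        return "left"
    if x2016 > threshold and x2020 > threshold:
        return "right"
    return None  # not a stable left- or right-leaning user
```

For the worked example, `polarization(0.5, peak_left=-1.2, peak_right=1.3)` evaluates to 1 − 0.8/1.3 ≈ 0.38.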

To analyze changes in CDS prevalence over time, we employed linear mixed-effects models (LMMs) with random intercepts for users, which allow for modeling repeated observations within individuals29,30,31,32. This approach is preferable to paired t-tests, as it accounts for both between-user differences and within-user temporal change without assuming independence. Our primary model included election year (2016 vs. 2020) and polarization as fixed effects:

$${{\rm{CDS}}}\,{{\rm{prevalence}}} \sim {{\rm{year}}}+{{\rm{polarization}}}+(1| {{\rm{user}}}).$$
(3)

where (1 | user) denotes a user-level random intercept. The results are shown in Table 1.
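Eq. (3) is a random-intercept model that can be fitted, for example, with the `mixedlm` formula interface in statsmodels. The sketch below fits it to simulated data only: the data generator and its effect sizes are arbitrary and are not the study’s data, and the study’s own software and estimation settings (e.g., REML vs. ML) are not stated in the text. Coding year as 0 for 2016 and 1 for 2020 is also an assumption.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def simulate_panel(n_users=300, seed=0):
    """Synthetic two-wave panel; all effect sizes are arbitrary, not study data."""
    rng = np.random.default_rng(seed)
    user = np.repeat(np.arange(n_users), 2)
    year = np.tile([0, 1], n_users)                   # 0 = 2016, 1 = 2020
    pol = rng.uniform(0.0, 1.0, size=2 * n_users)
    user_intercept = rng.normal(0.0, 0.05, size=n_users)[user]
    noise = rng.normal(0.0, 0.03, size=2 * n_users)
    cds = 0.15 + 0.10 * year + 0.05 * pol + user_intercept + noise
    return pd.DataFrame({"user": user, "year": year,
                         "polarization": pol, "cds": cds})


def fit_cds_lmm(df):
    """Eq. (3): cds ~ year + polarization with a random intercept per user."""
    model = smf.mixedlm("cds ~ year + polarization", data=df, groups=df["user"])
    return model.fit(reml=True)
```

Calling `result = fit_cds_lmm(simulate_panel())` and then `result.summary()` or `result.conf_int()` gives coefficient estimates, standard errors, and 95% intervals of the kind reported in Table 1. The interaction model used in the second robustness check would swap in a formula such as `"cds ~ year * group + polarization"`.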

Table 1 Linear mixed-effects model results and effect sizes for CDS prevalence by ideological group

Among left-leaning users, CDS prevalence significantly increased from 2016 to 2020 (β = 0.082, SE = 0.001, p < 0.0001, 95% CI = [0.080, 0.084]), reflecting a rising trend in distorted language over time. The degree of polarization was also positively associated with CDS prevalence (β = 0.066, SE = 0.006, p < 0.001, 95% CI = [0.055, 0.077]), indicating that more ideologically extreme left-leaning users were more likely to engage with CDS-related content within a given year.

Among right-leaning users, CDS prevalence likewise increased from 2016 to 2020 (β = 0.092, SE = 0.001, p < 0.0001, 95% CI = [0.090, 0.094]), suggesting a general temporal rise in such language.

Additionally, the baseline level of CDS prevalence, captured by the intercept, was significantly higher for right-leaning users (β = 0.189, SE = 0.006, p < 0.0001, 95% CI = [0.177, 0.201]) than for left-leaning users (β = 0.151, SE = 0.005, p < 0.0001, 95% CI = [0.140, 0.161]) in 2016, when polarization was minimal or average. This suggests that right-leaning users generally engaged with CDS content at higher levels; however, among right-leaning users we did not find a statistically significant relationship between CDS usage and degree of polarization.

To assess the robustness of our findings, we re-estimated the mixed-effects models including total user activity (i.e., number of tweets) as a covariate33,34 (Supplementary Table S3). The core results remained stable. For both ideological groups, CDS prevalence increased significantly from 2016 to 2020, reflecting a general rise in distorted language. Among left-leaning users, polarization continued to show a strong and significant positive association with CDS prevalence (β = 0.069, p < 0.0001, 95% CI = [0.058, 0.081]), even after adjusting for overall activity. For right-leaning users, the effect of polarization was positive but smaller in magnitude and only marginally significant (β = 0.013, p = 0.041, 95% CI = [0.001, 0.026]). Total activity was negatively associated with CDS prevalence in both groups, although the effect size was so small as to be negligible.

As a second robustness check, we tested whether the increase in CDS prevalence over time differed systematically between ideological groups by estimating an LMM with a year-by-group interaction term, CDS prevalence ~ year + group + year × group + polarization + (1 | user), as shown in Supplementary Table S4. Results confirmed a significant overall increase in CDS prevalence from 2016 to 2020 (β = 0.084, p < 0.0001, 95% CI = [0.082, 0.085]), indicating that users were more likely to post content containing a CDS n-gram during the later election year. In the pooled model, right-leaning users had slightly lower CDS prevalence than left-leaning users in 2016 (β = −0.013, p < 0.0001, 95% CI = [−0.015, −0.011]), as indicated by the main effect of group when year = 2016. However, the significant interaction between year and group (β = 0.007, p < 0.0001, 95% CI = [0.004, 0.009]) showed that the increase in CDS prevalence over time was greater for right-leaning users than for left-leaning users. That is, although right-leaning users started at a marginally lower level in 2016 (after controlling for polarization), they exhibited steeper growth in CDS prevalence by 2020. Additionally, polarization was positively associated with CDS prevalence across all users (β = 0.039, p < 0.0001, 95% CI = [0.031, 0.048]), suggesting that more ideologically extreme individuals, whether left- or right-leaning, tended to engage more with CDS discourse. This contrasts with the subgroup analyses above (Table 1), where polarization significantly predicted CDS prevalence only among left-leaning users, with no clear effect among right-leaning users. In the pooled model, however, the overall positive effect of polarization suggests that ideological extremity is broadly associated with higher CDS engagement, even if the strength of that relationship varies across ideological groups.
Taken together, these results indicate that while CDS discourse is more prevalent among right-leaning users overall, its growth over time is steeper for right-leaning users once polarization is accounted for.

Finally, to address the imbalance in group sizes (left-leaning: n = 29,048, right-leaning: n = 18,631), we conducted a bootstrapped down-sampling analysis. We repeatedly sub-sampled the larger group (left-leaning users) to match the size of the right-leaning group and re-estimated the mixed-effects model 100 times. As summarized in Supplementary Table S5, the findings consistently showed that polarization was strongly and significantly associated with increased CDS prevalence among left-leaning users across all iterations (mean β = 0.069, SD = 0.006, 100% significant, 95% CI = [0.0574, 0.0806]). In contrast, the average effect for right-leaning users was substantially smaller and less reliable (mean β = 0.013, SD = 0.007, 95% CI = [0.0005, 0.0255]), with polarization reaching statistical significance in only 41% of iterations. These results reinforce the asymmetry found in the main analysis, confirming that the link between ideological extremity and distorted language (within-year) is more stable and pronounced among left-leaning users.

We emphasize that these findings are purely correlational and do not imply any causal or temporal directionality. The associations observed reflect within-year patterns, not longitudinal changes, and should not be interpreted as evidence that polarization causes changes in CDS prevalence or that CDS engagement influences polarization.

CDS prevalence vs. latent ideology

To understand the relation between polarization and CDS prevalence, we introduce a new variable capturing the change in polarization between 2016 and 2020, which allows us to test whether variation in CDS prevalence is associated with users becoming more (or less) polarized over time. We define the variation of polarization of a user between 2 different years as:

$$\Delta {P}_{i}={{{\rm{polarization}}}}_{i}^{2020}-{{{\rm{polarization}}}}_{i}^{2016}$$
(4)

ΔPi > 0 indicates that a user became more polarized from 2016 to 2020; the larger ΔPi, the larger the increase in polarization relative to 2016. Conversely, ΔPi = 0 indicates no change in the user’s polarization between 2016 and 2020, while ΔPi < 0 indicates a decrease in polarization.

To assess whether longitudinal changes in political polarization predict variation in CDS prevalence, we estimate an LMM that includes the change in ideological polarization between 2016 and 2020 (ΔP) as a fixed-effect predictor, alongside the election year and the ideological group. This model structure allows us to evaluate whether individuals who became more ideologically extreme over time were more likely to engage in distortion-related discourse. Formally, the model is defined as:

$${{\rm{CDS}}}\,{{\rm{prevalence}}} \sim {{\rm{year}}}+{{\rm{group}}}+\Delta P+(1| {{\rm{user}}})$$
(5)

where (1 | user) denotes a user-level random intercept. Results are shown in Table 2.

Table 2 Linear mixed-effects model results and effect sizes for CDS prevalence among moderate users

CDS prevalence increased significantly from 2016 to 2020 (β = 0.076, p < 0.0001, 95% CI = [0.073, 0.079]) and was slightly higher on average among right-leaning users (β = 0.015, p < 0.0001, 95% CI = [0.011, 0.020]). Notably, individuals whose polarization increased more between 2016 and 2020 also exhibited significantly higher CDS prevalence across those years (β = 0.062, p < 0.0001, 95% CI = [0.045, 0.080]). This indicates that longitudinal increases in polarization are positively associated with CDS prevalence, even after accounting for election year and ideological group. Figure 4 visualizes this relationship by comparing users grouped by their level of polarization change (Medium vs. High ΔP) and tracking their CDS prevalence over time. Both groups show an increase in CDS prevalence from 2016 to 2020, but the increase is steeper among users with higher ΔP, and in both years users in the High ΔP group exhibit higher CDS prevalence than their Medium ΔP counterparts. This pattern suggests not only that more polarized individuals are more likely to engage in CDS discourse, but also that larger shifts in political orientation correspond to sharper increases in such engagement. While these results are consistent with a temporal association, we stress that this relationship remains correlational and should not be interpreted as causal.

Fig. 4: Mean CDS prevalence over time by level of polarization change (ΔP).

Users were split into two groups based on the median value of ΔP (Medium vs. High). Those with greater increases in polarization (High ΔP) consistently exhibit higher CDS prevalence and a steeper rise over time. Error bars represent 95% confidence intervals.
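The grouping behind Fig. 4 can be sketched as a median split on ΔP (Eq. 4) followed by per-group means with 95% intervals. The record field names are hypothetical, ties at the median are assigned to the Medium group, and the intervals use a normal approximation; the study may have computed its error bars differently.

```python
import math
from statistics import mean, median, stdev


def mean_ci(values, z=1.96):
    """Mean with a normal-approximation 95% interval: (mean, lower, upper)."""
    m = mean(values)
    half = z * stdev(values) / math.sqrt(len(values))
    return m, m - half, m + half


def median_split_trajectories(records):
    """Split users at the median Delta P and summarize CDS prevalence per year.

    Each record is a dict with hypothetical keys pol2016, pol2020, cds2016, cds2020.
    Returns {(group, year): (mean, lower, upper)}.
    """
    dps = [r["pol2020"] - r["pol2016"] for r in records]   # Eq. (4)
    cut = median(dps)
    groups = {"Medium": [], "High": []}
    for r, dp in zip(records, dps):
        groups["High" if dp > cut else "Medium"].append(r)
    return {(name, year): mean_ci([r[f"cds{year}"] for r in rs])
            for name, rs in groups.items() for year in (2016, 2020)}
```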

To ensure robustness, we performed several supplementary analyses. First, we tested a simplified change-on-change model in which both the dependent variable (CDS prevalence) and the predictor (ΔP) were expressed as user-level differences between years (Supplementary Table S6); the relationship between increasing polarization and CDS prevalence remained significant. Second, we included total user activity (i.e., tweet volume) as a covariate to control for variation in engagement (Supplementary Table S7); the effect of ΔP remained stable. Third, we confirmed the relationship in a reduced model excluding the ideology group and activity covariates (Supplementary Table S8). Across all specifications, increasing ideological polarization (ΔP) consistently predicted higher CDS prevalence.
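A minimal sketch of the change-on-change specification follows. It assumes one record per user with ΔP (`dp`) and CDS prevalence in each year; the field names and toy values are hypothetical. Because each user contributes a single difference score, the sketch uses ordinary least squares without a random intercept. The activity covariate of Supplementary Table S7 would enter as an additional regressor, which this two-variable version omits.

```python
from statistics import mean

def ols_fit(x, y):
    """Closed-form simple OLS: returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def change_on_change(users):
    """Regress each user's CDS change (2020 minus 2016) on their ΔP."""
    d_cds = [u["cds_2020"] - u["cds_2016"] for u in users]
    d_p = [u["dp"] for u in users]
    return ols_fit(d_p, d_cds)

# Hypothetical toy users whose CDS change is exactly 0.02 + 0.05 * ΔP.
users = [{"dp": d, "cds_2016": 0.10, "cds_2020": 0.12 + 0.05 * d} for d in (0.0, 0.1, 0.2, 0.3)]
slope, intercept = change_on_change(users)
print(round(slope, 3), round(intercept, 3))  # 0.05 0.02
```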

To examine whether the relationship between polarization change and CDS prevalence differs by ideological group, we estimated an LMM including an interaction between group and ΔP:

$${{\rm{CDS}}}\,{{\rm{prevalence}}} \sim {{\rm{year}}}+{{\rm{group}}}+\Delta P+{{\rm{group}}}\times \Delta P+(1| {{\rm{user}}}).$$
(6)

As shown in Table 3, the main effect of polarization change (ΔP) was positive (β = 0.100, p < 0.0001, 95% CI = [0.073, 0.127]). Because left-leaning users are the reference group, this coefficient indicates that, among left-leaning users, increases in polarization were linked to greater use of distorted language. The interaction term for right-leaning users was negative and significant (β = −0.066, p < 0.0001, 95% CI = [−0.101, −0.030]), reducing the estimated effect of polarization change for this group to β = 0.100 − 0.066 = 0.034. While ΔP remains a significant predictor of CDS prevalence for right-leaning users, its effect is notably smaller than for left-leaning users. Right-leaning users exhibited a higher baseline level of CDS prevalence (β = 0.032, p < 0.0001, 95% CI = [0.022, 0.042]), which may partially explain this attenuated effect: having already engaged in more distorted language, they may show smaller further increases in CDS expression as polarization rises. These findings suggest an ideological asymmetry: polarization change is a strong and consistent predictor of CDS prevalence among left-leaning users, whereas the relationship is weaker among right-leaning users.
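The group-specific slopes follow directly from the interaction coefficients. The sketch below reproduces that arithmetic using the point estimates reported above. The function name is ours. A confidence interval for the right-leaning slope would also require the covariance between the two coefficient estimates, which this sketch does not use.

```python
def group_slopes(beta_dp, beta_interaction):
    """Per-group slope of ΔP when left-leaning users are the reference group."""
    return {
        "left": beta_dp,                      # main effect = reference-group slope
        "right": beta_dp + beta_interaction,  # main effect + group × ΔP interaction
    }

slopes = group_slopes(beta_dp=0.100, beta_interaction=-0.066)
print({k: round(v, 3) for k, v in slopes.items()})  # {'left': 0.1, 'right': 0.034}
```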

Table 3 Linear mixed-effects model with ΔP × group interaction

A bootstrap analysis using equal-sized samples of left- and right-leaning users (100 iterations) supported the ideological asymmetry, as shown in Supplementary Table S9. While polarization change predicted increased CDS prevalence for both groups, the effect was stronger among left-leaning users. The interaction term remained negative and marginally significant across iterations, suggesting that this pattern is not an artifact of the imbalance in group sizes.
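One way to implement an equal-sized resampling scheme is sketched below. It assumes that each iteration draws, with replacement, as many users from each group as the smaller group contains. It also simplifies the model to one (ΔP, CDS) pair per user, fitted by ordinary least squares with no year term or random intercept. Under that simplification, the group × ΔP interaction in a fully interacted model equals the difference between the two groups' separate OLS slopes, which is what the sketch records. The toy data are synthetic, generated with slopes that echo the Table 3 point estimates; they are not study data.

```python
import random
from statistics import mean

def ols_slope(x, y):
    """Simple OLS slope of y on x; returns None when x is constant."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return None
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

def balanced_bootstrap(left, right, n_iter=100, seed=0):
    """Return interaction estimates (right slope minus left slope) per iteration.

    `left` and `right` are lists of (dp, cds) pairs, one per user. Each iteration
    resamples, with replacement, n users per group, n being the smaller group size.
    """
    rng = random.Random(seed)
    n = min(len(left), len(right))
    estimates = []
    for _ in range(n_iter):
        left_draw = rng.choices(left, k=n)
        right_draw = rng.choices(right, k=n)
        b_left = ols_slope([dp for dp, _ in left_draw], [cds for _, cds in left_draw])
        b_right = ols_slope([dp for dp, _ in right_draw], [cds for _, cds in right_draw])
        if b_left is None or b_right is None:
            continue  # skip degenerate resamples in which ΔP is constant
        estimates.append(b_right - b_left)
    return estimates

def toy_group(n, intercept, slope, rng):
    """Synthetic (dp, cds) pairs with Gaussian noise; not study data."""
    rows = []
    for _ in range(n):
        dp = rng.random()
        rows.append((dp, intercept + slope * dp + rng.gauss(0, 0.01)))
    return rows

gen = random.Random(1)
left = toy_group(300, 0.10, 0.100, gen)   # larger group
right = toy_group(120, 0.13, 0.034, gen)  # smaller group
estimates = balanced_bootstrap(left, right)
print(round(mean(estimates), 3), sum(e < 0 for e in estimates), "of", len(estimates))
```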

To examine whether the association between polarization change (ΔP) and CDS prevalence exhibits nonlinear patterns across ideological groups, we estimated a mixed-effects model including a quadratic term (ΔP²) and its interaction with group identity:

$${{\rm{CDS}}}\,{{\rm{prevalence}}} \sim {{\rm{year}}}+{{\rm{group}}}+\Delta P+\Delta {P}^{2}+{{\rm{group}}}\times \Delta P+{{\rm{group}}}\times \Delta {P}^{2}+(1| {{\rm{user}}})$$
(7)

As shown in Table 4, polarization change was positively associated with CDS prevalence among left-leaning users (the reference group; β = 0.166, p < 0.0001, 95% CI = [0.081, 0.252]). The quadratic term for this group was negative but not statistically significant (β = −0.136, p = 0.109, 95% CI = [−0.303, 0.030]). In contrast, right-leaning users exhibited a negligible linear relationship (β = 0.010) but a significant positive quadratic effect (β = 0.060, p = 0.041). This pattern suggests a nonlinear trend: CDS prevalence among right-leaning users remained relatively flat at lower levels of polarization change but increased more steeply at higher values of ΔP, consistent with a convex (U-shaped) relationship. Figure 5 visualizes these effects, showing predicted CDS prevalence across values of ΔP for each ideological group.
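Differentiating Eq. (7) with respect to ΔP makes the group-specific curvature explicit. In the expression below, β₁ and β₂ denote the coefficients on ΔP and ΔP², β₃ and β₄ those on group × ΔP and group × ΔP², and g = 1 for right-leaning users (g = 0 otherwise). These labels are introduced here for illustration and may not match the notation of Table 4. The year, group, and random-intercept terms do not depend on ΔP and therefore drop out:

$$\frac{\partial\,{\mathbb{E}}[{{\rm{CDS}}}\,{{\rm{prevalence}}}]}{\partial\,\Delta P}=({\beta }_{1}+{\beta }_{3}g)+2({\beta }_{2}+{\beta }_{4}g)\,\Delta P$$

The curve for group g is concave when β₂ + β₄g < 0 and convex when it is positive, and its slope changes sign at ΔP = −(β₁ + β₃g)/[2(β₂ + β₄g)]. Because ΔP was mean-centered before estimation, β₁ + β₃g is the slope at the average polarization change rather than at zero change on the raw scale.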

Table 4 Quadratic mixed-effects model with ΔP2 × group interaction
Fig. 5: Predicted CDS prevalence across values of polarization change (ΔP), based on a mixed-effects model with quadratic terms.

The plot illustrates a steadily increasing, concave relationship for left-leaning users (blue) and a flatter, convex curve for right-leaning users (red), which rises more sharply at higher levels of polarization. This pattern is consistent with an ideological asymmetry: left-leaning users exhibit a near-linear rise in CDS prevalence that flattens slightly at higher ΔP, while right-leaning users show limited change until higher ΔP levels.

To visualize predicted CDS prevalence across levels of polarization change, we computed marginal predictions from the fitted quadratic mixed-effects model. Prior to model estimation, ΔP was mean-centered and then squared to create a nonlinear term (ΔP²), and interaction terms between group and both the linear and quadratic polarization terms were included. For prediction, we generated a grid of centered ΔP values spanning the observed range and computed the corresponding ΔP² values. We held year constant at 2020 to focus on cross-sectional variation and produced predictions separately for left- and right-leaning users. Predicted values were calculated from the model’s fixed effects only, excluding user-specific random intercepts. For visualization, we back-transformed the centered ΔP values to the original (raw) polarization scale and plotted predicted CDS prevalence across this range. To improve interpretability, we restricted the plot to non-negative values of ΔP, reflecting the domain of substantive interest.
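The prediction procedure described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code. The coefficient names, the coding of year as a 2020 indicator, the even grid spacing, and the placeholder coefficient values are all assumptions. Following the text, ΔP is centered at its mean before squaring, only fixed effects enter the prediction, and points whose back-transformed ΔP is negative are dropped.

```python
from statistics import mean

def prediction_grid(coef, dp_raw, n_points=50, year_2020=1):
    """Fixed-effects-only predictions of CDS prevalence for each group.

    `coef` holds hypothetical fixed-effect names; year is coded as a 2020 indicator.
    Returns {"left": [(raw_dp, yhat), ...], "right": [...]} for raw ΔP >= 0.
    """
    center = mean(dp_raw)                      # ΔP is mean-centered before squaring
    lo, hi = min(dp_raw) - center, max(dp_raw) - center
    step = (hi - lo) / (n_points - 1)
    curves = {"left": [], "right": []}
    for i in range(n_points):
        c = lo + i * step                      # centered ΔP on an even grid
        c2 = c * c                             # ΔP² from the centered value
        raw = c + center                       # back-transform to the raw scale
        if raw < 0:
            continue                           # plot only non-negative ΔP
        for label, g in (("left", 0), ("right", 1)):
            yhat = (coef["intercept"] + coef["year2020"] * year_2020
                    + coef["right"] * g
                    + coef["dp"] * c + coef["dp2"] * c2
                    + coef["right_dp"] * g * c + coef["right_dp2"] * g * c2)
            curves[label].append((raw, yhat))  # random intercepts omitted
    return curves

# Placeholder coefficients for illustration only (not the Table 4 estimates).
coef = {"intercept": 0.10, "year2020": 0.05, "right": 0.03, "dp": 0.15,
        "dp2": -0.10, "right_dp": -0.14, "right_dp2": 0.16}
curves = prediction_grid(coef, dp_raw=[-0.3, -0.1, 0.0, 0.2, 0.4, 0.6])
print(len(curves["left"]), curves["right"][-1])
```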

The figure shows that among left-leaning users, CDS prevalence increases with ΔP and then begins to level off, consistent with a mild saturation pattern. In contrast, right-leaning users exhibit a nonlinear trajectory: CDS prevalence remains relatively flat at lower levels of polarization but rises more steeply at higher values, which is inconsistent with a simple ceiling effect. This divergence reveals a complex asymmetry. For left-leaning users, polarization change is strongly and approximately linearly related to CDS engagement, with mild saturation at the upper end. For right-leaning users, the relationship is convex, with a delayed but accelerating increase in distorted language use as polarization intensifies.

Temporal link between polarization and CDS

To explore the potential directional relationship between ideological polarization and engagement in CDS discourse, we conducted a time-lagged mixed-effects regression analysis using user-level data from 2016 to predict outcomes in 2020. Specifically, we estimated two separate LMMs with random intercepts for users. The first model (Model A) tested whether polarization in 2016 predicted CDS prevalence in 2020:

$${{{\rm{CDS}}}}_{2020} \sim {{{\rm{polarization}}}}_{2016}+(1| {{\rm{user}}})$$
(8)

The second model (Model B) tested the reverse association—whether CDS prevalence in 2016 predicted polarization in 2020:

$${{{\rm{polarization}}}}_{2020} \sim {{{\rm{CDS}}}}_{2016}+(1| {{\rm{user}}})$$
(9)
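Models A and B can be sketched by reshaping long-format user-year records into one 2016–2020 pair per user. The sketch below is a simplified illustration. It uses ordinary least squares in place of the mixed-effects estimator, keeps only users observed in both years, and relies on a hypothetical record layout and toy values.

```python
from statistics import mean

def ols_slope(x, y):
    """Simple OLS slope of y on x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def lagged_slopes(records):
    """records: (user, year, polarization, cds) rows in long format.

    Returns the slope of Model A (CDS_2020 on polarization_2016) and of
    Model B (polarization_2020 on CDS_2016), using users seen in both years.
    """
    by_user = {}
    for user, year, pol, cds in records:
        by_user.setdefault(user, {})[year] = (pol, cds)
    pairs = [v for v in by_user.values() if 2016 in v and 2020 in v]
    pol16 = [v[2016][0] for v in pairs]
    cds16 = [v[2016][1] for v in pairs]
    pol20 = [v[2020][0] for v in pairs]
    cds20 = [v[2020][1] for v in pairs]
    return {"A": ols_slope(pol16, cds20), "B": ols_slope(cds16, pol20)}

# Hypothetical toy records: (user, year, polarization, CDS prevalence).
records = []
for i, (p16, c16) in enumerate([(0.1, 0.05), (0.2, 0.10), (0.3, 0.15), (0.4, 0.25)]):
    records.append((f"u{i}", 2016, p16, c16))
    records.append((f"u{i}", 2020, 0.1 + 0.5 * c16, 0.2 - 0.03 * p16))
records.append(("u9", 2016, 0.5, 0.3))  # observed only in 2016, so excluded
print({k: round(v, 3) for k, v in lagged_slopes(records).items()})  # {'A': -0.03, 'B': 0.5}
```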

As shown in Table 5, the two models reveal a directional asymmetry. In Model A, users with higher levels of ideological polarization in 2016 showed slightly lower engagement in CDS discourse in 2020 (β = −0.028, p < 0.0001, 95% CI = [−0.029, −0.026]). In contrast, Model B showed the opposite pattern: users who engaged more in CDS discourse in 2016 were more polarized in 2020 (β = 0.052, p < 0.0001, 95% CI = [0.046, 0.057]). These findings suggest that early CDS engagement is a stronger predictor of future polarization than early polarization is of later CDS behavior. To further test the directionality of this association, we conducted a difference-in-differences (DiD) analysis comparing changes in CDS and polarization from 2016 to 2020 between users who exhibited increased levels of the other variable and those who did not (see Supplementary Tables S10 and S11). Consistent with the lagged models, the DiD results showed that increases in CDS were associated with greater increases in polarization, whereas increases in polarization were less strongly associated with changes in CDS engagement.
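The difference-in-differences comparison can be illustrated with group means. In this sketch, a user counts as treated when the driver variable (for example, CDS prevalence) rose from 2016 to 2020. The DiD is the treated group's mean change in the outcome (for example, polarization) minus the control group's mean change. The data layout and values are hypothetical. The sketch reports only a point estimate, without covariates or standard errors, and both groups must be non-empty.

```python
from statistics import mean

def did(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences of group means: treated change minus control change."""
    return (mean(treated_post) - mean(treated_pre)) - (mean(control_post) - mean(control_pre))

def did_by_increase(users, driver, outcome):
    """Treated users are those whose `driver` rose from 2016 to 2020; DiD is on `outcome`."""
    treated = [u for u in users if u[driver][2020] > u[driver][2016]]
    control = [u for u in users if u[driver][2020] <= u[driver][2016]]
    return did(
        [u[outcome][2016] for u in treated], [u[outcome][2020] for u in treated],
        [u[outcome][2016] for u in control], [u[outcome][2020] for u in control],
    )

# Hypothetical toy users (not study data).
users = [
    {"cds": {2016: 0.10, 2020: 0.15}, "pol": {2016: 0.20, 2020: 0.30}},
    {"cds": {2016: 0.12, 2020: 0.18}, "pol": {2016: 0.25, 2020: 0.35}},
    {"cds": {2016: 0.14, 2020: 0.12}, "pol": {2016: 0.30, 2020: 0.32}},
    {"cds": {2016: 0.16, 2020: 0.16}, "pol": {2016: 0.10, 2020: 0.12}},
]
print(round(did_by_increase(users, driver="cds", outcome="pol"), 3))  # 0.08
```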

Table 5 Lagged mixed-effects models between polarization and CDS prevalence

Overall, while both relationships are statistically robust, their directions differ. Higher polarization at baseline was associated with a slight reduction in later CDS use, whereas earlier CDS engagement was linked to a more pronounced increase in future polarization. This asymmetry suggests that distortion-related discourse may play a stronger role in driving ideological polarization than the reverse. However, because these analyses rely on observational data, the results should be interpreted as correlational rather than causal. They are consistent with a potential feedback loop between polarization and CDS, but further research using experimental or quasi-experimental designs is needed to establish causal mechanisms.

Discussion

By leveraging social media data and analysis methods tailored to detect markers of distorted thought patterns, our study demonstrates the value of online discourse as a lens for understanding complex social dynamics. We observed a rise in language associated with cognitive distortions among Twitter users engaged in U.S. presidential election conversations between 2016 and 2020. Individuals who became more polarized over time also exhibited increased use of cognitively distorted language, though we caution that these associations are correlational rather than causal. The relationship between polarization and cognitive distortions resonates with existing literature, underscoring concerns about the escalation of cognitive distortions in recent years20,35,36,37. The observed cognitive distortions, such as dichotomous thinking and catastrophizing, align with those addressed in CBT interventions15,16,38. Bathina et al. carried out an analysis using the same CDS methodology to compare individuals with and without depression23. They found that personalizing, emotional reasoning, and overgeneralizing were the categories with the greatest increase in prevalence in the population with depression, closely matching our own findings, as shown in Fig. 2. That catastrophizing and mindreading ranked third and fourth in magnitude of change in our analysis may highlight the particular kinds of distortions that differentiate polarization effects from those of depressive disorders. For example, polarization may place greater emphasis on in-group/out-group belief systems, leading to increased mindreading, while perceived large-scale real-world outcomes of political events may lead to increased catastrophizing.

Our results offer an empirical perspective on theories proposing a link between cognitive distortions and political polarization, such as the “reverse-CBT hypothesis” posited by Lukianoff et al.39,40 or the “belief traps” model of Scheffer et al.41. Although both left- and right-leaning users use distorted language, our results reveal that right-leaning individuals not only begin with a higher baseline level of CDS usage but also show a slightly more pronounced increase across the two elections. However, the link between rising CDS prevalence and polarization is weaker among right-leaning users, possibly because of their higher baseline, although the quadratic analysis suggests that this is not a simple saturation effect. This asymmetry aligns with prior investigations showing that political extremism on both sides of the political spectrum is correlated with increased use of angry and negative language42, but that right-leaning users have a greater tendency towards language associated with such motivations as threat, power, security, and anxiety43. These findings also fit into the substantial body of work that has explored asymmetries in psychological motivation along ideological lines. Jost et al.44 argue that conservative ideology is associated with “intolerance of ambiguity, dogmatism, avoidance of uncertainty, cognitive simplicity, and personal needs for order, structure, and closure,” notions that are strikingly similar to patterns of dichotomous thinking and catastrophizing. While these motives have been found to be negatively associated with liberal views in prior time periods, the growth of both polarization and CDS prevalence between 2016 and 2020 may reflect rising perceptions of threat and the need for group belonging in narratives across the political spectrum. Indeed, our results mirror earlier findings of an increasing prevalence of terms associated with cognitive distortions across various domains of public discourse20,45.
By connecting individual cognitive patterns on social media to large-scale social trends, our study not only lends support to these theories but also points to the potential of CBT-inspired interventions to mitigate the polarizing and radicalizing effects of online interactions.

Limitations

Our analysis revealed a link between rising cognitive distortions and increasing political polarization. Notably, the evidence suggests that distortion-related discourse may play a more active role in fueling polarization than simply reflecting it, pointing to CDS as a potential contributor to ideological divides. However, this association warrants cautious interpretation. While the parallel rise of these phenomena is striking, their underlying mechanisms may differ. Cognitive distortions are typically rooted in individual psychological patterns linked to internalizing disorders, whereas polarization arises from social dynamics and broader ideological contexts. It remains unclear whether their relationship reflects co-occurrence, causation, or a more complex interaction shaped by shared psychosocial influences. Our dataset, limited to two time points four years apart, does not allow for robust longitudinal analysis. Future research should investigate the temporal dynamics of CDS in political discourse, aiming to clarify causal pathways and identify potential targets for intervention. Additionally, shifts in platform policies over the study period may have influenced user behavior, underscoring the need for future work to incorporate varied data sources, including other platforms and experimental designs, to support more definitive, long-term insights and intervention strategies46.

We further caution that we are not studying a relation between internalizing disorders, such as depression and anxiety, and political polarization, but rather the possible association of cognitively distorted thinking with polarized modes of thought. However, we note that the literature does indicate the possibility of complex interactions between dysfunctional or distorted thinking, extremist political beliefs, and mental health disorders40,41,47,48. Ford et al.49 find that as engagement with politics permeates daily life, it can exert a detrimental influence on individuals’ mental and physical health. Nayak et al.50 find that individuals who report higher levels of polarization are more likely to develop depression and anxiety disorders. This question remains an active area of research that merits exploration well beyond our own findings.

While mindful of the limitations inherent in our methodology and the scope of our data, this research lays the groundwork for future investigations into the multifaceted dynamics of political polarization. Our work underscores the urgent need for interdisciplinary approaches to tackle the complex challenges at the confluence of mental health, digital communication, and political discourse. Ultimately, by deepening our understanding of the cognitive and psychosocial underpinnings of ideological divisions, we aim to contribute towards more empathetic, informed, and constructive public discourse.