Introduction

“Science is the most reliable guide in life.”

– Mustafa Kemal Atatürk.

“We will raise a religious generation.”

– Recep Tayyip Erdoğan.

In recent years, the surge of political polarization between secularism and political Islam within Turkish society has become a subject of significant concern and scholarly inquiry. In the streets and alleys of a Turkish city like Ankara or Istanbul, a familiar scene unfolds: a woman in soigné attire, walking with a relaxed posture, crosses paths with another woman, a believer, draped in a colorful hijab (as is common in Turkish culture) and wrapped in heavy silk clothing that covers almost all of her head and body. This juxtaposition vividly encapsulates the status quo of politically polarized Turkish streets. Measuring the extent of this phenomenon is crucial for developing effective strategies to mitigate its adverse effects and for understanding the political repercussions of polarization. This project embarks on a multidisciplinary approach, integrating advanced computer vision techniques with statistical estimation methodologies to understand how identities are potentially causally linked to political polarization. Although the growing polarization between religious and secular groups in Turkey serves as a vivid backdrop, the foremost contribution of this paper is methodological. Specifically, we propose a novel computer-vision approach that can be applied to a variety of socio-political contexts, with Turkey functioning here as an illustrative and empirically rich case study. By emphasizing our new technique rather than any single country’s political trajectory, we highlight how this framework can be generalized and adapted to different forms of polarization worldwide.

Understanding political polarization, especially as it visibly manifests on the streets, is crucial in today’s global landscape. In the United States, political polarization has led to significant societal divisions, while in India, religious tensions are influencing political discourse. In Brazil, political clashes between supporters of different parties have become more visible and heated. In France, debates over secularism and the integration of Muslim communities have led to social unrest. Similarly, in the United Kingdom, Brexit has deepened visible ideological rifts within society. By examining Turkey’s secular and Islamic divide, this study provides valuable insights into how religious and cultural norms shape social interactions. These findings can inform global strategies to mitigate polarization, offering policymakers and community leaders data-driven approaches to foster social cohesion and bridge ideological divides in increasingly polarized societies.

Previous studies have shed light on how visual cues in political discourse contribute to polarization. Schill1 highlighted how visual symbols create political unity and highlight differences with opponents. Pauwels2 pointed out that the presentation of visual data can shape partisan views. Krogstad3 traced how visual persuasion has historically affected polarization, while Gerodimos4 emphasized the need for interdisciplinary research on visual stimuli’s emotional and cognitive impacts. Campante5 suggested that less media variety might lead to more uniform but polarized opinions. Simon6 saw politicization, often through visuals, as a precursor to polarization, and Barber7 associated visual branding with shifts to party-line politics. Together, these studies demonstrate the powerful role of visuals in political polarization, despite not relying on large-N data.

The main goal of this article is to understand the extent of political polarization in an ideologically diverse country such as Turkey by focusing on an important piece of religious clothing: the headscarf. The headscarf, varieties of which are also known as the hijab in other cultures, has been a crucial part of the cultural identity, and in some cases a prominent political symbol, of religious women’s lives in Turkey, as it embodies their pious commitment to Islamic identity8. Political Islam plays a crucial part in Turkey’s politics and appears to be one of the main drivers of polarization. “There is significant tension around the issue of secularism or laicism in the country”9. As evidenced by the parliamentary and presidential election results since the 2000s, Turkey’s politics has been characterized by political fragmentation, and the main points of contestation are religiousness (the two main groups being political Islamists and laicists) and ethnic identity (the two main groups being Turks and Kurds). Recep Tayyip Erdogan, the current president of Turkey, reduced political fragmentation among political Islamists by centralizing power and co-opting key factions into the ruling AKP. He strategically used alliances, such as the one with the Gülen movement until their fallout, and later with the Nationalist Action Party (MHP), to consolidate his control, thereby minimizing rival Islamist movements and creating a more unified political front. His political restructuring resulted in a less fragmented political Islamist landscape compared to secular groups and their political formations10,11. This political landscape, together with past studies spanning multiple decades and aspects of political polarization, points to the necessity of a large-N study to analyze polarization in Turkish politics.

While these studies underscore the impact of visual cues, the physical and spatial dynamics of social interactions also play a crucial role in political polarization. Proxemics, as originally conceptualized by Edward T. Hall12, provides a framework for understanding how physical space and social environments reflect cultural dynamics, particularly in culturally diverse societies like Turkey. In this context, religious and non-religious groups occupy distinct social environments—such as different districts and venues—and maintain greater physical distances in public spaces13,14. This segregation deepens political polarization by reinforcing group identities and reducing cross-group interactions. Recent literature, building on these foundations, shows that such spatial behaviors foster group polarization, where homogeneous environments intensify preexisting beliefs, leading to more extreme views and further entrenching ideological divides15,16,17.

Recent large-N studies leveraging visual cues have advanced our understanding of political and social polarization. Bucy18 uses computer vision to analyze visual data from protests and candidate self-presentation, revealing how visual elements reinforce partisan divides. Dietrich19 employs motion detection in C-SPAN videos, finding that decreased cross-party interactions predict a 14.61 percentage point increase in party-line voting. The most closely related work, by Dietrich and Sands20, extends this analysis to racial dynamics, showing that pedestrians in New York City maintain a greater distance from Black individuals, averaging 3.1 to 4.4 inches more, which translates to 3.19% to 3.93% of the sidewalk width. These studies highlight the critical role of visual data in capturing subtle social and political interactions, providing robust quantitative evidence of how visual cues influence behavior and polarization.

Lastly, there is strong evidence that survey analyses do not always lead to a satisfactory understanding of political division in the context of Turkey21. The question then becomes: how do we capture, clean, and meticulously analyze the extent of polarization? We believe that analyzing visual data from the streets could be the answer. The pioneering technical approaches of Dietrich and Sands20 and of Wang and colleagues22, who detected pedestrians on the campuses of Fudan University and the University of Pennsylvania, have motivated us methodologically.

Drawing from the theoretical background and the current state of the literature, we aim to examine the following two groups of hypotheses in this paper:

H1: Visible religious choices will influence social interactions between people of different levels of religiosity.

H1a: Religious people will tend to walk closer to in-group members (versus the out-group).

H1b: Non-religious people will tend to walk closer to in-group members; however, the effect will be less differentiated versus the out-group because of the effects of political fragmentation.

H2: Visible gender identities will influence social interactions between people of different gender identities.

H2a: People of the same perceived gender will tend to walk closer to in-group members (versus the out-group).

H2b: The influence of gender on social interactions will be more pronounced for females (compared to males) because religiosity is more easily perceived due to the headscarf.

An important theoretical rationale behind H1b is that religious individuals often rely on distinctive outward signals—such as specific attire, beards, or headscarves—that immediately convey group membership. Because these markers are relatively standardized within each religious group, strangers can easily recognize a fellow in-group member. By contrast, non-religious individuals do not share a single visible ‘marker’ of non-adherence; they are more heterogeneous in their appearance. This heterogeneity makes it less likely that secular or non-religious pedestrians will spontaneously identify fellow ‘non-religious’ individuals as members of the same in-group. Consequently, we expect religious individuals on the street to group together at closer distances than non-religious individuals, whose group identity is not readily signaled to observers.

Our second main goal in this paper is to curate a comprehensive visual dataset comprising diverse elements that computational social science researchers can use to measure social interactions among human beings and to develop innovative methods for measuring political polarization. The compiled dataset encompasses over 1400 video clips showcasing human-centric street views from various cities of Turkey, thousands of frames extracted from these clips, and official voting and poll results, collectively representing the socio-political spectrum. The visual dataset we have compiled is the largest thematic street dataset presented in the computational social science literature; it is also the largest dataset collected to understand political polarization in Turkey.

To fulfill this purpose, datasets were meticulously selected and machine learning models were trained. The “Turkey Videos Reservoir” (TVR), a dataset that contains links to human-centric Turkish street videos on YouTube, was scraped. By cutting those videos into frames, hand-labeling them, and performing the relevant cleaning, a training set (called HN-7450) and a prediction set (called HN-178413) were generated. These two datasets contain images that depict Turkish pedestrians. YOLOv5, an object detection model (ODM), was carefully selected among potential alternatives. Based on YOLOv5, HN-7450 is used to train Hijab-Net, a series of models that classify the pedestrians in these images into gender and religiousness categories. Based on these classifications, relative distances between pedestrians of different genders and religiousness categories were estimated.

In summary, our paper offers improvements on multiple different aspects of the empirical computer vision literature. The following methodological improvements have been explained in different sections of this paper: (1) Finding the Best Object Detection Model to Detect Humans on Streets: we discuss why YOLOv5, instead of other alternatives, was picked to detect pedestrians on the streets; (2) Human Labeling: a process to generate HN-7450 (which resulted in the collection of a training dataset of over 27,000 human objects that was later used to create our classification model called Hijab-Net) so that each pedestrian on each training image was labeled with designated genders and political categories; (3) Custom Object Detection Model based on YOLOv5: the training, analysis, and prediction process using the Hijab-Net model; (4) 3D Distance Estimation: estimation of relative distances based on the output of the Hijab-Net and conversion of 2D distances to 3D; and (5) Empirical Approach and the Presentation of Results: the use of regression analysis to demonstrate the extent of polarization. Figure 1 below shows the detailed data pipeline used in this study.

Fig. 1. Data pipeline.

Discussion on object detection

A review of different object detection models

Since the purpose of this study is to compare the distances between citizens with different levels of religiosity on streets using frames extracted from videos, the first step is to choose a suitable object detection model to accurately detect pedestrians on the streets. This has been achieved by first looking at the existing models and evaluating their performance using various image datasets.

In the realm of machine learning and computer vision, object detection techniques have evolved significantly. Initially, hand-crafted features like Histogram of Oriented Gradients (HOG) were prominent. With advancements in methodology, the empirical literature at large has transitioned to neural networks and witnessed a shift from two-stage to one-stage detection models.

Our methods section delves into the nature of the models evaluated for our project.

The models include:

  • HOG (Histogram of Oriented Gradients)23: Analyzing gradient orientations to detect objects, emphasizing shape and structure.

  • R-CNN Family24,25,26: A paradigm shift in object detection, addressing localization accuracy and efficient training.

  • YOLO Family27,28: Known for real-time detection, performing detection in a single network pass. YOLOv3 introduced an expansive feature extractor and improved detection capabilities.

  • RetinaNet29: A single-stage model with Feature Pyramid Networks and Focal Loss for improved detection.

  • EfficientDet30: Utilizing compound scaling, EfficientNet backbone, BiFPN, and optimized loss functions for enhanced performance.

Object detection models performance evaluation

In this step, three different datasets (each containing 100 randomly selected images) were chosen to evaluate the accuracy of different object detection models. First, two notable datasets used in past computer vision (CV) studies come to the forefront: Cityscapes31 and Fudan-UPenn Pedestrian22. The Cityscapes dataset, curated from 50 European cities, presents a rich tapestry of diverse urban scenes at a high resolution of 2048 × 1024 pixels. Each image is labeled down to the pixel level, making Cityscapes an excellent choice for projects requiring comprehensive city-scene analysis31. The Fudan-UPenn Pedestrian dataset, a collaboration between Fudan University and the University of Pennsylvania, focuses exclusively on pedestrian identification. The Cityscapes and Fudan-UPenn Pedestrian datasets will be referred to as the “Berlin” and “Fudan-UPenn” datasets, respectively. Second, 100 frames extracted from 29 videos of Istiklal Street in Istanbul’s Beyoglu district were selected as the primary dataset (referred to as the “Istiklal Dataset”), as this is one of the most bustling streets in Turkey. The first two datasets were used for testing the models in general, and the Istiklal dataset was used to ensure that the chosen object detection model is the most suitable one for this specific project.

To select the best-performing object detection models, YOLOv3, YOLOv4, YOLOv5, YOLOv8, RetinaNet, EfficientDet, Faster R-CNN, and HOG + SVM were implemented on the Berlin, Fudan-UPenn, and Istiklal datasets. The results from hand labeling and machine detection with the different object detection models were compared using several standards. Specifically, the Euclidean Distance, standard deviation, F1 score, and their respective averages were calculated and compared. The results are presented in Tables 1, 2, 3, 4, 5, and 6. The mathematical details and meaning of these metrics are discussed in “Appendix C: Performance Measurements for Benchmarking ODMs”.

Table 1 Comparison of different models on Fudan-UPenn Dataset (n = 100).
Table 2 Comparison of different models on Berlin Dataset (n = 100).
Table 3 Comparison of different models on Istiklal Dataset (n = 100).
Table 4 The normalized standard deviation results.
Table 5 The normalized Euclidean distance results.
Table 6 The normalized F1 confusion score results.

Based on the results presented in the tables, it is evident that, for the Fudan-UPenn dataset, YOLOv3, YOLOv4, and EfficientDet exhibited relatively poor performance. Conversely, YOLOv5 and YOLOv8 demonstrated an overall superior performance, showcasing the lowest standard deviation and Euclidean Distance, respectively. In the case of the Berlin Street dataset, the HOG + SVM model displayed significantly higher Euclidean Distance and standard deviation compared to the other models, while YOLOv5 and Faster R-CNN performed exceptionally well. For the street image dataset of Turkey, Faster R-CNN emerged as the top-performing model, showcasing the lowest Euclidean Distance and standard deviation. The F1 score, a weighted metric representing both recall and precision, also aligned with the other evaluation metrics. The cells highlighted in bold, denoting the lowest Euclidean Distance, lowest standard deviation, and highest F1 score, further emphasize these findings.

The comprehensive results reveal that there is no single best algorithm for all datasets. Thus, to arrive at a well-rounded decision for further implementation in this study, a normalization process was undertaken. Upon normalization, both YOLOv5 and YOLOv8 exhibited satisfactory results, showcasing relatively small average standard deviations and Euclidean Distances. Considering that YOLOv5 outperformed YOLOv8 in all aspects and has been readily available for over three years, whereas YOLOv8 is a more recent release with limited testing and debugging, opting for YOLOv5 for further implementation appears to be the most appropriate decision.

Additionally, the runtime is a crucial aspect to consider in the evaluation, especially given the need to process over 170,000 frames for the actual experiment. Although the specific runtime varies based on hardware conditions, Faster R-CNN requires a longer runtime due to its two-stage nature. Therefore, choosing YOLOv5 aligns well with the consideration of runtime efficiency.
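Although our benchmarking relied on custom scripts, the selected model can be exercised through Ultralytics’ public torch.hub interface. The following is a minimal sketch, where the frame path is a placeholder rather than a file from our pipeline:

```python
import torch

# Load a pretrained YOLOv5 model via the public Ultralytics hub interface.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Run inference on one extracted frame (placeholder path).
results = model('frame_0001.jpg')
detections = results.pandas().xyxy[0]  # one row per detected object

# Keep only the 'person' class, i.e., pedestrians.
pedestrians = detections[detections['name'] == 'person']
print(f'{len(pedestrians)} pedestrians detected')
```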

Data collection and labeling

As part of the study, we built a custom object detection model that can detect gender and level of religiosity, based on the architecture of the previously chosen object detection model, YOLOv5. This section aims to produce a gallery of images from publicly available YouTube videos that capture street pedestrians and the urban environment of major cities in Turkey. After carefully hand-labeling the pedestrians in the frames by gender and level of religiousness, the training dataset was used to train the custom object detection model. Overall, based on the TVR, an unlabeled raw image collection named the “Unlabeled HN-9000 dataset” was generated. By carefully applying the “Degree of Religiousness Criteria” (DRC), HN-7450 was manually labeled and pruned from Unlabeled HN-9000. All images utilized in this research are publicly available, eliminating the need for consent regarding their collection, storage, and subsequent empirical analysis.

Collection of unlabeled HN-9000

The relevant video content for the study was collected through a systematic search approach. Specifically, using the English and Turkish languages, all matching responses to the search query “[Province Name] street video” were extracted from YouTube search results. The PyTube package in Python was then used to access the YouTube videos and target specific keywords in the video descriptions. In the following step, keywords such as “street video”, district names, and dates extracted from the metadata were used to filter relevant street video content. For the compilation of a general-purpose video dataset, we also iterated through the TVR, filtering again by keywords of interest such as city names, street names, and video contents. In the final stage, each video was matched with a city and district pair. Based on the information in the TVR, training, validation, and test sets were created in which each pedestrian is labeled with a different color of bounding box. Nine cities in Turkey were selected to create a diverse dataset that aims to distinguish people by different levels of religiousness: Ankara, Antalya, Bursa, Gaziantep, Istanbul, Izmir, Konya, Samsun, and Van. These cities represent population centers from the seven geographic regions of Turkey. For each of these cities, 10 videos were randomly selected from the TVR to extract the frames used in the training set. Utilizing the CV2 Python package, we automated the process of capturing screenshots at a rate of 10 frames per second. Through this process, a random selection of 1,000 video frames from each city was conducted, culminating in a dataset comprising 9,000 video frames, evenly distributed across the nine cities. This dataset is denoted as Unlabeled HN-9000; in total, 27,582 pedestrians were extracted from the randomly sampled 9,000 frames.
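A minimal sketch of this frame-sampling step with the CV2 package follows; the function name and the simplified sampling logic are illustrative stand-ins for our actual scripts:

```python
import cv2

def extract_frames(video_path, fps_out=10):
    """Sample frames from a street video at roughly fps_out frames per second."""
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back if metadata is missing
    step = max(1, round(native_fps / fps_out))    # keep every `step`-th frame
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames
```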

The contents of the videos from the TVR span the street views of 111 distinct districts and 81 unique provinces. Specific variables’ names and explanations are included in “Appendix B: Variables in TVR”. In addition, Unlabeled HN-9000 is a collection of 9,000 raw frames. For further labeling, these frames were uploaded to Roboflow, a Software as a Service (SaaS) site that allows programmers to label and train their neural-network-based object detection models.

Manual labeling process of HN-7450: forming and applying the DRC

The empirical goal is to train a neural-network based model to identify the “level of religiousness” of human objects in Unlabeled HN-9000. Understanding the level of religiousness is crucial to understanding the politics of Turkey. In the following sections, distances will be estimated between human objects. The distances will then be used to quantify the polarization.

Prior to model training, manual labeling is necessary to build up accuracy. The frames are all labeled according to the “Degree of Religiousness Criteria” (DRC). The DRC proposed herein constitutes a comprehensive framework for assessing individuals’ religious affiliation based on observable attributes, primarily attire and gender distinctions. For women, the criteria stratify religiousness into four distinct categories. Non-religious females are identified by their uncovered heads and clothing that accentuates the body, with bare arms, shoulders, and potentially revealing attire. Neutral females maintain uncovered heads while their clothing conceals the body, which is fully or mostly covered. In the religious female category, women’s heads are veiled, yet their clothing remains body-revealing, even though full body coverage would be expected. The very religious female category signifies women with covered heads and non-revealing clothing, characterized by full body concealment. The criteria for men delineate religiousness into analogous classifications: non-religious males wear body-revealing attire; neutral males wear fully or mostly covering attire; religious males wear headwear indicating religious affiliation; and very religious males wear Islamic-style clothing together with religious headwear.
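For concreteness, the eight DRC labels can be summarized as in the illustrative encoding below; the annotation itself was performed manually on Roboflow, and the identifier names are ours, not part of the dataset:

```python
# Illustrative encoding of the eight DRC labels used during annotation;
# the criteria strings paraphrase the description above.
DRC_LABELS = {
    'non_religious_female':  'uncovered head, body-revealing clothing',
    'neutral_female':        'uncovered head, fully or mostly covered body',
    'religious_female':      'covered head (headscarf), body-revealing clothing',
    'very_religious_female': 'covered head, full body concealment',
    'non_religious_male':    'body-revealing attire',
    'neutral_male':          'fully or mostly covering attire',
    'religious_male':        'headwear indicating religious affiliation',
    'very_religious_male':   'Islamic-style clothing and religious headwear',
}
```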

This framework provides a systematic means of evaluating religious adherence, considering gender-specific clothing and its implications. We also believe that our dataset is representative for measuring religiousness and polarization in Turkey. The impact of potential noise in the data, such as the assumptions made regarding adherence to Islam in the Turkish context, as well as the inclusion of non-Turkish individuals in the frames, is limited.

First, a strong majority of the Turkish population has a Muslim identity; studies indicate that 94% of the Turkish population are self-declared Muslims32. Second, a very high percentage of the people captured in the videos are believed to be Turkish citizens. A possible source of concern might be the number of non-Turkish tourists visiting the country; however, this issue has a limited impact on our findings, since the average number of tourists visiting Turkey does not exceed 5–7% of the total population of Turkey on a monthly basis33.

As a result, using the Roboflow platform, a total of 7,450 frames were annotated and over 27,000 people were labeled. Children, people in uniforms that make religiosity hard to distinguish, and all people captured indoors were excluded during the labeling process. Our labeling standards for pedestrians include specific criteria to ensure the accuracy and reliability of the data. We avoid labeling pedestrians if they overlap with each other by more than one-third of their bounding box area or if more than one-third of their bounding box area is out of the image frame. Additionally, bounding boxes with heights less than one-fourth of the frame height (indicating pedestrians far away from the camera) are not labeled. This rigorous standard helps maintain the quality of the labeled data and avoids introducing errors that could impact the validation of our distance measurement methodology. The two main reasons for the loss of 1,550 images were: (1) ambiguity when applying the DRC, and (2) some frames lacked human objects. The labeled image collection is referred to as HN-7450. A discussion of the intercoder reliability scores can be found in “Appendix D”. Following the labeling process, the Roboflow platform autonomously divides the outcome into training, test, and validation sets. Additionally, prior to training a custom object detection model based on YOLOv5, a descriptive analysis was conducted to examine the distribution of the labeled individuals more thoroughly.

Descriptive findings from HN-7450

Figure 2, represented numerically in Table 7, shows the distribution of religiousness in each city in the training set. Blue/purple represents males and red represents females. The plot shows a largely over-represented neutral-male category, while religious males and very religious males are both highly under-represented. Males account for over 50% of each category and about 70% of all observations. Also, people with religious affiliations were less visible on the streets than their non-religious counterparts, with an approximately 3-to-7 ratio between the religious and non-religious groups. It is noteworthy that this ratio is slightly underestimated because neutrally dressed religious people are inevitably classified as “neutral” (and not as potentially religious or non-religious). This is acceptable because pedestrians also perceive each other based on appearance: a study shows that during everyday interactions, how people dress is a shallow but frequently used representation of themselves and each other34. How we identify the pedestrians in these images is therefore most likely not significantly different from how adjacent pedestrians perceive each other.

Fig. 2. Distribution of religiousness categories in HN-7450.

Table 7 Distribution of religiousness in the training data (% of Female (F) and Male (M) Images).

Model building: custom object detection

As no effective pre-trained object detection models that distinguish people by level of religiousness are available for public use, we needed to build a custom object detection model with relatively high accuracy. Based on YOLOv5, HN-7450 is used as the training set to generate Hijab-Net, the customized ODM at the backbone of this study. Hijab-Net is able to classify human figures in frames into different levels of religiousness. Models 1 to 5 are designed for various category groupings.

After we trained and benchmarked five different object detection models (ranging from Model 1 through 5), we decided to choose Models 2, 4, and 5 (our reasoning is provided in the next section). The series of models were then used to make predictions with the HN-178413 dataset (which will also be introduced later).

Models comparison

Model 1: This model provides predictions for each of the eight categories shown in Fig. 2, four for males and four for females, making individual predictions for each subcategory. The model performance is summarized in Table 8. The training took approximately 2.015 h for 100 epochs, and the custom object detection model was trained on a personal laptop with an NVIDIA GeForce RTX 2080 Super GPU and an Intel Core i9-10980HK CPU. We evaluate the performance of the models mainly using the mAP50 score. The mAP (mean Average Precision) score is calculated and interpolated from the precision/recall curve (see “Appendix D”, 3.3.3), and the 50% level was designated as the threshold between positive and negative classification: to be considered positive, the ratio of overlap $a_o$, calculated by the equation below, must exceed 50%, where $B_{\text{predicted}}$ stands for the predicted bounding boxes and $B_{\text{ground truth}}$ represents the manually labeled bounding boxes35. The overall mAP50 for all categories combined was 0.352, with the “neutral-male” category achieving the highest mAP50 score of 0.705 and “very-religious-male” obtaining the lowest score at 0.011.

$$a_{o} = \frac{\text{area}(B_{\text{predicted}} \cap B_{\text{ground truth}})}{\text{area}(B_{\text{predicted}} \cup B_{\text{ground truth}})}$$
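This overlap ratio is the standard intersection-over-union. A minimal sketch, assuming boxes are given in (x1, y1, x2, y2) corner format:

```python
def overlap_ratio(box_pred, box_gt):
    """Intersection-over-union a_o of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box_pred[0], box_gt[0]), max(box_pred[1], box_gt[1])
    ix2, iy2 = min(box_pred[2], box_gt[2]), min(box_pred[3], box_gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_pred = (box_pred[2] - box_pred[0]) * (box_pred[3] - box_pred[1])
    area_gt = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    union = area_pred + area_gt - inter
    return inter / union if union > 0 else 0.0

# A prediction counts as positive when the overlap exceeds the 50% threshold.
is_positive = overlap_ratio((10, 10, 50, 90), (12, 8, 55, 95)) > 0.5
```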

Model 2: For this model, predictions are made for four aggregated categories: religious females (combining religious females and very religious females); religious males (combining religious males and very religious males); non-religious females (combining neutral females and non-religious females); and non-religious males (combining neutral males and non-religious males). The training spanned 2.287 h over 100 epochs. The model achieved an overall mAP50 of 0.581. Among the categories, “non-religious-male” exhibited the best mAP50 result at 0.735. The model performance is summarized in Table 9.

Model 3: The third model focuses on predicting all categories of females, encompassing non-religious females, neutral females, religious females, and very religious females. The training duration was 2.211 h across 100 epochs. The combined mAP50 for all categories was 0.371, with “non-religious-female” demonstrating the highest individual mAP50 score of 0.576. Table 10 summarizes the performance of Model 3.

Model 4: This model predicts two distinct categories: religious females (merging religious females and very religious females) and non-religious females (combining neutral females and non-religious females). Requiring 2.390 h for 100 epochs of training, the model’s mAP50 stood at 0.579. Individually, “non-religious-female” secured a mAP50 of 0.626. Table 11 summarizes the performance of Model 4.

Model 5: The fifth model streamlines predictions into two broad categories: religious people and non-religious people. Religious people is a combination of religious females, very religious females, religious males, and very religious males. Conversely, non-religious people grouped neutral females, non-religious females, neutral males, and non-religious males. The training took 2.493 h over 100 epochs. The overall mAP50 was measured at 0.654, with “non-religious-people” leading with a mAP50 score of 0.766. Table 12 summarizes the performance of Model 5.

Table 8 Hijab-Net Model 1 performance analysis.
Table 9 Hijab-Net Model 2 performance analysis.
Table 10 Hijab-Net Model 3 performance analysis.
Table 11 Hijab-Net Model 4 performance analysis.
Table 12 Hijab-Net Model 5 performance analysis.

In summary, across the five models, consolidating categories appears to enhance precision and mAP scores. Specifically, Model 2, Model 4, and Model 5, which employ broader classifications, demonstrate superior performance metrics. This suggests that for the given dataset, generalized categories might yield more consistent and reliable results than finer classifications. Balancing granularity with performance during model training is vital, and in this scenario, the merged categories offer greater efficacy. YOLOv5, the fundamental ODM underlying these three models, provides an output for each frame with information about every detected human object: the category (as a numerical value, where different numbers represent different categories), the x and y coordinates of the center of the bounding box, and the width and height of each bounding box. All values are normalized to the range 0 to 1.
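A minimal sketch of reading such output back, assuming YOLOv5’s plain-text --save-txt format (one “class cx cy w h [conf]” line per detection); the dictionary layout is our illustrative choice:

```python
def parse_yolo_labels(label_path):
    """Read one YOLOv5 --save-txt file: 'class cx cy w h [conf]' per line,
    with all box values normalized to [0, 1] within the frame."""
    detections = []
    with open(label_path) as f:
        for line in f:
            fields = line.split()
            cls = int(fields[0])
            cx, cy, w, h = map(float, fields[1:5])
            detections.append({'cls': cls, 'cx': cx, 'cy': cy, 'w': w, 'h': h})
    return detections
```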

The implementation of prediction set: HN-178413

The prediction set is generated using a method similar to the one used to generate HN-9000; in this case, one frame is sampled every 10 s across all 1,439 videos. As a result, we obtain 178,413 frames, which are then classified by Hijab-Net Model 2, Model 4, and Model 5 for the detection of human objects. The results of this process are recorded for further study in the distance estimation section.

Methodology: distance estimation

To understand which religiousness and gender groups tend to walk closer together on the street, relative distances are worth estimating. Based on the gender and religiousness groups detected and identified by the three Hijab-Net models (Models 2, 4, and 5), this section examines the distances between them. Multispectral Imaging, Detection, and Active Inference (MiDaR)36, a widely used tool for estimating per-pixel relative depth from 2D images, along with relevant data pruning, calculation, and normalization, is used to perform this estimation through a carefully refined process.

Review of image distance estimation tools

Notably, real-world distances between two objects are measured in 3D, while images, as 2D displays, do not by themselves convey the depth of a scene. Although a wide variety of image-based 3D estimation methods exist, none is suitable for our aim because each requires at least one of the following properties: standardized formatting, a fixed camera angle, or reference objects, none of which the HN-7450 images possess. To exemplify, in “Real-Time 3D Object Detection From Point Cloud Through Foreground Segmentation”22, a LIDAR-based 3D object detection method is used with a high-resolution camera at a fixed angle and height, with known reference objects on the streets. The LIDAR provides a point cloud that is turned into a sparse BEV feature map; without the fixed-angle, high-resolution camera, the task cannot be fulfilled. In “Monitor Social Distance Using Python, YOLOv5, OpenCV”37, the idea of Bird’s Eye-View Transformation is introduced: in the perspective-view image the streets are at an angle, while in the bird’s-eye-view image the streets are parallel to each other. Without bird’s-eye-view images, the actual distances between pedestrians are impractical to calculate. In “Measure Distance in Photos and Videos Using Computer Vision”38, bounding boxes are drawn around the objects of interest, and computer vision is used to calculate the distance between objects on the condition that the objects’ real-world lengths are known; Roboflow’s API is used to count the pixels and convert them into inches to estimate the actual distance.

Since existing methods would not be able to provide a feasible solution to the current problem, the development of a new method is required, which is detailed in the next section.

Relative distance calculation

When calculating the distance between two points in the 2D image displays, the formula of Euclidean Distance has been utilized widely. The formula is shown below, where the only information required is the x and y coordinates for the two points.

$$d(P,Q) = \sqrt{(x_{2} - x_{1})^{2} + (y_{2} - y_{1})^{2}}$$

The output of YOLOv5 from the previous phase provides information on the x and y of each of the bounding boxes. In other words, YOLOv5 returns results for the position, in terms of relative x and relative y, of the center point for pedestrians detected. An important thing to notice is that YOLOv5 only returns the relative values but not the actual ones, meaning that all the values have been normalized within each frame, to the range of 0 to 1.

With relative x and relative y, the Euclidean Distance formula could be applied directly. However, it is questionable to take only x and y into consideration, since real street scenes are three-dimensional and the depth of each pedestrian relative to the camera also plays an important role. Therefore, different approaches to generating a relative depth were implemented one after another, each solving problems present in the previous approach and improving the accuracy of the model. The way the x, y, and z values are combined was also progressively refined.

Approach 1: Besides the x and y of the center of each bounding box, YOLOv5 also returns a relative width and relative height for each bounding box. Assuming that every pedestrian is of roughly the same height and width, the areas of the bounding boxes would be equal if all human objects were at the same relative depth from the camera. In other words, the greater the area of the bounding box, the closer the pedestrian is to the camera in terms of relative depth. A further conclusion can then be drawn: when comparing the relative depths of different object pairs, the area ratio can reflect the closeness of the pair in depth to some extent.

To be more specific, when the area ratio is defined as the larger bounding box’s area divided by the smaller bounding box’s area, the ratio should be larger for pairs that are far apart in depth and smaller for pairs that are close in depth. Imagine a scene in which several individuals appear at varying distances from the camera: “person 1” and “person 2” occupy the foreground and are close to one another in depth, while “person 3” stands further back. The bounding-box areas of persons 1 and 2 are similar, so their ratio is close to one, whereas the ratio for persons 1 and 3 is considerably larger. In other words, the area ratio captures separation along the depth axis that the 2D positions alone cannot convey, even when two subjects appear close together in the image plane.

This positive correlation between the area ratio and the relative depth difference can be used to construct the relative z. For instance, z can be expressed using the equation below.

$$z = \frac{\max(\text{Area of Bounding Box 1},\ \text{Area of Bounding Box 2})}{\min(\text{Area of Bounding Box 1},\ \text{Area of Bounding Box 2})}$$

Moreover, z can be combined with x and y as follows to obtain the relative distance:

$$\text{Relative Distance} = z\sqrt{(x_{1} - x_{2})^{2} + (y_{1} - y_{2})^{2}}$$
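A minimal sketch of Approach 1, reusing the normalized box fields parsed earlier (the function and field names are illustrative):

```python
import math

def approach1_relative_distance(det_a, det_b):
    """Approach 1: scale the 2D center distance by the bounding-box area
    ratio, used here as a crude proxy for the depth difference."""
    area_a = det_a['w'] * det_a['h']
    area_b = det_b['w'] * det_b['h']
    z = max(area_a, area_b) / min(area_a, area_b)  # area ratio >= 1
    d_2d = math.hypot(det_a['cx'] - det_b['cx'], det_a['cy'] - det_b['cy'])
    return z * d_2d
```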

However, a potential problem with this approach is that the relationship between the area ratio and the relative depth is not linear. The nature of the relationship between the area of the bounding box and the depth is described in greater detail in the Appendix (see Fig. 14). Because of this non-linearity, using the area ratio directly to represent the relative depth loses accuracy. Secondly, simply taking the product of z and the 2D Euclidean Distance is not a robust way to combine them. Figure 3 shows the output for a sample image from Istiklal Street, Istanbul, taken by one of the authors.

Fig. 3. MiDaR output sample.

Approach 2: Approach 2 was developed to solve the problem identified in Approach 1, thereby improving the accuracy of the result. As mentioned previously, since the positive correlation between the relative depth and the area ratio of the bounding boxes is not strictly linear, discovering the actual relationship between these two variables is essential for improving accuracy. Therefore, a self-test was conducted (see “Appendix I”), which concluded that there is an inverse square-root relationship between the area of a bounding box and the depth of the human object relative to the camera. The relative depth z for each bounding box can therefore be expressed as below.

$$z = \sqrt{\frac{1}{\text{Area of Bounding Box}}}$$

Secondly, the combination of x, y, and z was improved by using the 3D Euclidean Distance formula to ensure credibility.

$$\text{Relative Distance} = \sqrt{(x_{2} - x_{1})^{2} + (y_{2} - y_{1})^{2} + (z_{2} - z_{1})^{2}}$$

Although Approach 2 solved the problem identified in Approach 1, it still has limitations. First, the method of calculating the relative depth z works in theory but still needs numerous rounds of systematic testing to ensure its credibility, since this formula has rarely been used by researchers before. Self-testing involving depth is time-consuming because, unlike vertical or horizontal distances that can be measured within a frame, depth requires manually measuring distances in real-life settings. Secondly, the relationship between the two variables strictly follows the formula only when the camera is at eye level. Although most of the street videos are shot at roughly eye level, slightly tilted camera angles would lead to inaccurate results. Therefore, a further improvement of the depth estimation method is required.

Approach 3: Approach 3 uses an existing depth estimation method, MiDaR, which largely solves the problems of Approach 2. MiDaR refers to Multispectral Imaging, Detection, and Active Inference; it needs only images as input and imposes no technical requirements on camera angle or focal length. MiDaR technology has been used by many researchers for a variety of purposes and thus, unlike the self-developed formulas in the previous two approaches, does not require self-testing. For example, NASA used a transmitter with MiDaR technology to create maps of coral reefs in Guam.

As mentioned, MiDaR requires a single frame as input and outputs a relative depth for each pixel, where higher values represent closer pixels (in depth), and vice versa. In the setting of this study, each frame has been reshaped to 1024 × 1024 pixels.

The first step is to reverse the result so that closer pixels are represented by smaller values; second, a normalization is needed to ensure the relative depth lies within the range of 0 to 1. Both operations ensure that the relative z value is consistent with the x and y outputs from YOLOv5, as all three must share the same direction and the same range of magnitude for further combination. As a result, the relative z is generated from the MiDaR output using the formula below, where z′ stands for the depth before normalization and z is the relative depth:

$$z = 1 - \frac{z' - \min(z')}{\max(z') - \min(z')}$$

After the relative depth has been obtained for each pixel within each frame (each frame containing 1024 × 1024 pixels), the equation below converts the relative x and relative y (denoted x and y) into the corresponding pixel indices, with the results rounded up to the nearest integer.

$$x' = \lceil 1024\,x \rceil, \qquad y' = \lceil 1024\,y \rceil$$

As the center of each bounding box thus has its own relative depth, the relative distance between human objects is calculated by combining the outputs of MiDaR and YOLOv5 using the 3D Euclidean Distance formula. Although the method might be inaccurate when the center point of a pedestrian is occluded by objects in front of it, such cases rarely occur in the dataset. Therefore, combining the normalized MiDaR result with the YOLOv5 output is the most accurate and efficient method among the alternatives, and it is the method we adopted for our analysis.
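The final computation can be sketched as follows, assuming the depth model has already produced a 1024 × 1024 per-pixel relative depth array for the frame (higher values indicating closer pixels, as described above); the names and array handling are illustrative:

```python
import math
import numpy as np

def relative_distance_3d(det_a, det_b, depth_map):
    """Approach 3: 3D relative distance from YOLOv5 centers (normalized x, y)
    and a 1024x1024 per-pixel relative depth map."""
    # Reverse and min-max normalize so that smaller = closer and z is in [0, 1].
    z_raw = np.asarray(depth_map, dtype=float)
    z_map = 1.0 - (z_raw - z_raw.min()) / (z_raw.max() - z_raw.min())

    def depth_at(det):
        # Convert a normalized center coordinate to a pixel index (rounded up).
        px = min(1023, max(0, math.ceil(1024 * det['cx']) - 1))
        py = min(1023, max(0, math.ceil(1024 * det['cy']) - 1))
        return z_map[py, px]

    dx = det_a['cx'] - det_b['cx']
    dy = det_a['cy'] - det_b['cy']
    dz = depth_at(det_a) - depth_at(det_b)
    return math.sqrt(dx**2 + dy**2 + dz**2)
```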

Although the distance measurement methodology is logical in theory, we found it necessary to validate the method and ensure that it reaches satisfactory accuracy for our research. A validation of the final method, together with a test of its accuracy, can be found in “Appendix I”.

Table 13 below summarizes this progression and the associated improvements.

Table 13 Comparison of methodological approaches: distance estimation.

Actual distance vs. relative distance

After successfully calculating the relative distances within each frame using YOLOv5, MiDaR, and the 3D Euclidean Distance formula, we need to discuss how useful these approaches are in comparison; in other words, how robustly can the relative distance results extracted from each method be compared across different frames?

Converting the relative distance to the actual distance using a reference object would be a typical approach since the topic of this study is to address a real-world problem. However, in this circumstance, converting relative distance to actual distance is neither feasible nor suitable.

First of all, converting relative distance to actual distance is not feasible. The video dataset contains a vast number of observations, including videos from different streets with different camera angles, and there exists no fixed reference object that could be used for the conversion. Even if we assumed an average height for every male and female detected in the videos (for instance, the average heights of males and females in Turkey) and used one pedestrian’s height as a reference, the conversion would still not be feasible. For instance, in an image where two people are far apart along the depth axis but not along the horizontal axis, the resulting actual distance would differ notably depending on whether the left or the right human object is used as the reference.

Secondly, even if one could accurately convert the relative distance to the real distance, it would remain questionable to do so within the context of this research. This is because the paper aims to discuss the distancing between pedestrians due to differences in levels of religiousness, not due to the level of crowdedness of each street. Comparing actual distances takes the level of crowdedness into account, which could significantly impact the results. For instance, consider two scenarios: the first involves two religious pedestrians who are 3 m apart on a less crowded street, while the second involves a non-religious pedestrian and a religious pedestrian who are 0.5 m apart on a crowded street. It would not be reasonable to claim that the religious pair are distancing themselves further than the religious vs. non-religious pair. Therefore, the results are more persuasive when the impact of street crowdedness is minimized, and comparing relative distances is more appropriate for this research. The logic of using relative distance essentially places every group of pedestrians in the same imaginary area (with the same size and level of crowdedness) and observes how they distance themselves from each other, as shown in Fig. 4. In this manner, relative distance has been calculated and used for comparison across the frames, and the approach to using relative distance pairs for the comparison is explained in detail in the subsequent section.

Fig. 4. Demonstration of crowdedness.

Distance score aggregation and normalization method

After frames have been obtained and distances determined for each pair of individuals, the subsequent phase involves computing a score for each class pair in each frame. This transition marks a crucial progression in the analytical process, shifting the focus from individual distance measurements to an aggregated class-level evaluation that enables comparison.

Step 1: Frame selection criteria

The initial stage involves a careful selection of frames that are helpful for an insightful analysis of spatial relationships. Selection is based on criteria ensuring that only frames contributing to inter-class dynamics are included:

  1. Exclusion of homogeneous class frames: Frames that depict individuals from a single class are omitted from the analysis due to the absence of the comparative element vital for assessing inter-class distances.

  2. Elimination of frames with an insufficient number of individuals: Frames with fewer than three individuals are excluded for their limited contribution to the analysis:

     (a) One individual: A frame with only one individual yields no distance score, as a pair of x, y, and z values is required to calculate a pairwise distance.

     (b) Two individuals: Frames with two individuals result in a single normalized score of one, as no other comparative distances exist within the frame. Such frames would bias the data towards the maximum normalized score and are thus excluded.

Tables D14 and D15 in “Appendix D: Data Lost During Distance Estimation” show the data loss after each step of dropping frames for Model 2, Model 4, and Model 5.

Table 14 Categorical regression results with object detection model 2 (dep. variable: normalized distance).
Table 15 Categorical regression results with object detection model 2 (dep. variable: logged normalized distance).

Step 2: Normalization of distance scores

The normalization step is essential for comparing distance measurements across frames. It involves adjusting individual distance scores relative to the frame’s maximum distance score. This is expressed mathematically as:

$$\text{Normalized Distance Score}_{i} = \frac{\text{Distance Score}_{i}}{\text{Max Distance Score}}$$

Such scaling ensures that all distance scores are adjusted to a uniform scale, ranging from 0 to 1, allowing for meaningful comparison and aggregation.

Step 3: Calculation of average distance scores

After normalization, a single distance score must be produced for each class pair in each frame.

  1. Average score computation: For each class pair within a frame, the mean and the median of the normalized distance scores are computed. This average represents the typical distance relationship for that class pair, extracted from the normalized individual pairs.

  2. Handling class absence: When a frame does not feature representatives of every class, the average distance score for the missing class pairs is set to ‘NaN’. This preserves the integrity of the dataset by clearly marking the absence of data.
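Steps 1 through 3 can be condensed into a short aggregation routine; the sketch below assumes the detection dictionaries and distance function from the previous sections, with the pandas usage and naming as our illustrative choices:

```python
import itertools
import pandas as pd

def frame_class_scores(detections, distance_fn):
    """Aggregate pairwise distances within one frame into per-class-pair
    scores. `detections` holds dicts with 'cls', 'cx', 'cy', 'w', 'h';
    `distance_fn` is, e.g., relative_distance_3d bound to the frame's depth map."""
    # Step 1: frame selection -- at least three people and more than one class.
    if len(detections) < 3 or len({d['cls'] for d in detections}) < 2:
        return None
    # Step 2: pairwise distances, normalized by the frame maximum.
    pairs = list(itertools.combinations(detections, 2))
    raw = [distance_fn(a, b) for a, b in pairs]
    max_d = max(raw)
    rows = [{'pair': tuple(sorted((a['cls'], b['cls']))), 'norm_d': d / max_d}
            for (a, b), d in zip(pairs, raw)]
    # Step 3: mean normalized distance per class pair; class pairs absent
    # from the frame are simply missing here and recorded as NaN downstream.
    return pd.DataFrame(rows).groupby('pair')['norm_d'].mean()
```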

Empirical approach

As noted previously, the goal of the empirical analysis is to measure the pairwise physical distances between categories of people on the streets. For each pairwise distance j obtained from each frame i ($\delta_{ij}$), we define a dummy variable D equaling one when we detect more than two people in a frame to whom a gender and religiousness category can be assigned. The reason behind this minimum threshold is the need for at least two pairs of distances to normalize the relative distance between 0 and 1. For example, in a frame that fulfills this requirement, if we detect a pairwise distance of 0.5 between two religious people (RP vs. RP), we have $\delta = 0.5$ and $D_{\text{RP vs. RP}} = 1$. Using this approach, we created one dummy variable per category pair. For the regressions, we use baseline categories that represent the (assumed) theoretically most distant categories of people. Specifically, we use the religious female vs. non-religious male (RF vs. NRM), religious male vs. non-religious female (RM vs. NRF), religious female vs. non-religious female (RF vs. NRF), and religious people vs. non-religious people (RP vs. NRP) pairs as the baseline categories in three different sets of results (each set associated with a different object detection model). These pairs represent the out-group connections that are expected to exhibit the greatest physical distances. In all model specifications, robust standard errors were used.

We acknowledge that the YouTube videos compiled by individuals on the streets of Turkey may exhibit unobserved differences, stemming from the filming decisions made by the videographers. To address these potential discrepancies, we apply normalization to the pairwise distances extracted from each individual frame. This process controls for unobserved differences between our samples coming from different users.

In addition, we have introduced several other dummy variables that fulfill the role of fixed effects. Specifically, our full model contains dummy variables for each city and each district within a city, a dummy indicating the time of day (day or night), and another dummy indicating the season in which the video was taken (summer or not). The accuracy of the day/night image classification model was around 80.55%, and the intercoder reliability score was approximately 0.99. A discussion of the development of a custom model for day/night classification, the intercoder reliability scores, and the associated accuracy can be found in “Appendix C”. For robustness, we use three alternative approaches: (1) a categorical variable regression (one per object detection model, three in total), (2) a categorical variable regression that uses a logarithmic transformation of the relative distance measure as the dependent variable, and (3) a traditional t-test approach that allows us to compare the variation in pairwise distances across different category pairs.

The regression specification we use is the following equation

$$\delta_{ij} = \text{Intercept} + \beta D + \alpha_{\text{city}} + \alpha_{\text{district}} + \alpha_{\text{night}} + \alpha_{\text{summer}} + \epsilon_{ij}$$

where $\alpha_{\text{city}}$, $\alpha_{\text{district}}$, $\alpha_{\text{night}}$, and $\alpha_{\text{summer}}$ represent the fixed effects for cities, districts, time of day, and the season of the video upload, respectively. The summary of our findings can be found in the next section.
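For reference, this specification can be estimated with standard tooling; the sketch below uses statsmodels, and the data-frame and column names are illustrative rather than taken from our replication files:

```python
import statsmodels.formula.api as smf

def estimate_polarization(df):
    """OLS of the normalized pairwise distance on category-pair dummies plus
    city/district/night/summer fixed effects, with robust (HC1) errors.
    Expects columns: norm_dist, pair, city, district, night, summer."""
    model = smf.ols(
        'norm_dist ~ C(pair, Treatment(reference="RF_vs_NRM"))'
        ' + C(city) + C(district) + C(night) + C(summer)',
        data=df,
    ).fit(cov_type='HC1')
    return model

# Example: print(estimate_polarization(df).summary())
```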

Results

The results associated with the largest object detection model (Model 2) are reported in the main part of the article, while the results for the remaining models can be found in “Appendix H”. Figures 5 and 6 demonstrate how in-group interactions significantly differ from out-group interactions. Across all statistical estimations and specifications, the results are very similar, and the findings are robust. In almost all specifications, the physical distances between citizens of similar gender identity and level of religiosity are statistically significantly smaller than those between other categories. Within religious groups, especially among religious females, much smaller physical pairwise distances are observed compared to distances between ideologically different categories.

Fig. 5. Full specification, normalized distances.

Fig. 6. Full specification, logged normalized distances.

The robustness of the findings has been tested through sensitivity analysis (including fixed effects and additional specifications), feature engineering (log-transformation of the dependent variable), and the use of three different object detection models. The latter approach was specifically applied to address classification challenges related to citizens who cannot be clearly assigned to a religiosity category. This issue was particularly apparent in visually separating religious females from very religious females, as well as in differentiating neutral from non-religious males. To address it, relatively broad definitions of religiosity were employed in Models 4 and 5.

The main findings of the analysis are reported below. Tables 14 and 15 present the coefficients associated with the largest object detection model (Model 2) across various specifications. Both tables suggest that citizens of similar gender and ideological backgrounds tend to maintain smaller physical distances on the street than their compatriots; this pattern is particularly pronounced for religious females and religious males. The findings are highly robust: the coefficients change only minimally when fixed effects are introduced and the log-transformation is applied. Overall, the relative distances among religious citizens (female and male) are 23.2% to 93.4% smaller than the distances between ideologically opposite groups. For in-group interactions among non-religious citizens, the findings are mixed, ranging from 38.1% smaller to 11.6% larger than the distances between ideologically opposite groups. In addition, Tables 21, 22, 23, and 24 offer additional model specifications to complement those presented in the main text.
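As an interpretive note on the log specification, a dummy coefficient $\beta$ translates into a percent difference in distance via the standard transformation

$$\%\Delta = \left(e^{\beta} - 1\right) \times 100,$$

so that, for example, a hypothetical coefficient of $\beta = -0.264$ would imply a relative distance about 23.2% smaller, since $e^{-0.264} - 1 \approx -0.232$.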

The regression tables associated with Models 4 and 5 are provided in “Appendix H”. All model specifications yield statistically significant results and show significantly smaller in-group distances among religious females and among religious citizens more broadly. The statistical significance is not sensitive to the choice among different variants of robust standard errors.

Lastly, the results of a traditional t-test are reported below. The box-and-whisker plot shows the variation in physical distances within pairs of categories. As the plot shows, this variation is consistently low for most categories, further underscoring the robustness of the findings. Pairwise t-test results show that almost all pairwise distances differ significantly from one another, with a single exception (see Figs. 7 and 8).
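A minimal sketch of such pairwise comparisons, using Welch's unequal-variance t-test (the exact test variant and column names here are assumptions, not necessarily those behind Figs. 7 and 8):

```python
from itertools import combinations
from scipy.stats import ttest_ind

# Compare normalized distances between every pair of category labels;
# 'pair_category' and 'norm_distance' are illustrative column names.
for cat_a, cat_b in combinations(sorted(df["pair_category"].unique()), 2):
    a = df.loc[df["pair_category"] == cat_a, "norm_distance"]
    b = df.loc[df["pair_category"] == cat_b, "norm_distance"]
    t_stat, p_value = ttest_ind(a, b, equal_var=False)  # Welch's t-test
    print(f"{cat_a} vs {cat_b}: t = {t_stat:.2f}, p = {p_value:.4f}")
```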

Fig. 7: T-test for Model 2.

Fig. 8: Pairwise t-test results for Model 2.

Conceptual clarification and methodological framing

It is important to distinguish clearly between social distance, polarization, and homophily, acknowledging that these concepts are related yet not interchangeable. In our study, social distance refers to the observable or perceived separation in everyday interactions between groups that differ in, for instance, religious or secular identity. This measure does not necessarily indicate animosity; it can also result from selective avoidance, differing routines, or other structural factors. By contrast, polarization implies both attitudinal extremity and mutual antagonism between groups, going beyond mere spatial or social separation. Although social distance may correlate with polarized attitudes, one can observe greater clustering of similar individuals (or avoidance of out-groups) without genuine ideological hostility39.

Homophily plays a key role in clarifying why social distance need not always equate to polarization. As outlined in39, people naturally gravitate toward those who share their interests, demographic traits, or beliefs, which can create in-group clusters. Such patterns do not always stem from negative sentiments toward out-groups. In many contexts, limited cross-group interaction arises from benign or structural reasons (e.g., individuals sharing the same workplaces, educational institutions, or social circles) rather than from ideological aversion.

In light of these distinctions, our method should be seen as capturing a particular facet of social separation in Turkey’s streets rather than a complete measure of polarization. Building on approaches used in other domains40,41, we leverage visual data to document how often, and how closely, people walk with those from different (or similar) religious backgrounds. Although these measures can illuminate possible symptoms or consequences of polarization, they do not, on their own, prove the existence of hostile, ideologically polarized camps. Instead, our core methodological contribution lies in demonstrating how advanced computer vision techniques can reveal social distance patterns at scale.

By framing Turkey as a case study, we demonstrate how this technique can highlight interaction patterns within a highly visible secular–religious cleavage. The presence or absence of religious attire provides a salient visual cue for measuring when and where groups cluster, allowing us to test whether everyday proximity aligns with well-known social or political fault lines. That said, we emphasize that this measure reflects observed social boundaries, which may have multiple causes. In other words, low levels of cross-group interaction could indicate deep political schisms or merely reflect self-selection and homophily. Ultimately, our findings help illustrate how these observable forms of social distance relate to broader social and political divisions, without claiming to capture every dimension of polarization.

Limitations

One major limitation of using personal distance to gauge political polarization is that it does not fully capture the emotional and cognitive nuances of affective polarization42. Deeply held beliefs and strong emotional reactions can shape people’s partisan identities in ways that remain invisible in simple measurements of interpersonal space, even if greater distancing may be correlated with out-group animosity43.

A second issue arises from the numerous confounding factors that affect interpersonal distance, making it difficult to attribute observed patterns solely to politics. Socioeconomic status, for instance, can determine one’s housing situation and transportation options, which in turn shape proximity-related behavior44. Moreover, cultural norms play a critical role in defining “comfortable” personal space45. Meanwhile, individual traits such as introversion or extroversion further complicate distancing tendencies46. Without accounting for these varied influences, interpreting observed spacing as a political signal risks oversimplification.

Third, contextual factors can temporarily distort personal distance as an indicator of deeper or longer-term polarization. Highly charged political contexts, such as elections or divisive public debates, may prompt people to avoid out-groups more conspicuously—but this avoidance could be momentary rather than permanent. During the COVID-19 pandemic, for example, ideological differences47 did shape social-distancing practices, yet these patterns were also driven by local health guidance, infection rates, and personal risk assessments48. Thus, isolating the precise impact of partisan identity becomes inherently difficult.

Fourth, the reliability and generalizability of findings based on personal distance are hindered by methodological challenges. In video-based studies, selection bias can skew which interactions end up being analyzed, while manual labeling of religious or partisan symbols introduces the possibility of coding errors or subjective interpretations49. Even automated methods may misinterpret ambiguous signals, and focusing on one geographic area or community can limit broader applicability50. Knowledge of whether individuals already share a relationship or recognize each other’s affiliations often remains unobservable, making it hard to determine causality.

Finally, personal distance by itself rarely clarifies why individuals choose to stand closer or farther apart. People might simply be avoiding physical touch, adhering to cultural taboos, or responding to nonpolitical preferences such as personal comfort51. Hence, without corroborating evidence (for instance, surveys of group attitudes or explicit markers of political hostility), it is risky to assume that distancing reflects underlying polarization. Such nuances indicate that while personal distance can yield valuable insights into social dynamics, it should be combined with other measures and carefully contextualized so as not to overstate its significance as a standalone gauge of political polarization.

Conclusion

Our study demonstrates how a new methodological framework—rooted in object detection and distance estimation—can provide valuable, quantifiable measures of polarization. While we used Turkey as the testing ground, the same pipeline could be adapted to analyze how visible markers of identity operate in other culturally distinct settings. Future research might build on our method to incorporate more nuanced contextual measures or combine it with deeper qualitative insights, thereby offering an even fuller picture of polarization dynamics across diverse societies.

Highlights of the results from the previous section and possible explanations thereof include:

  • Religious Female vs. Religious Male: The distances between religious females and religious males are consistently higher in every estimation than the in-group distances measured among religious females. We believe that established gender identities within Muslim culture may be causally related to this outcome.

  • Non-Religious Female vs. Non-Religious Female: The in-group distances among non-religious females are comparably higher than the in-group distances among religious females. We think that the political heterogeneity and divided identities within Turkey’s secular population, together with the relatively more unified political representation of religious citizens (largely captured by Erdogan’s AKP), may underlie this phenomenon.

  • Religious Female vs. Non-Religious Female: Although higher than the in-group distances within both the religious and non-religious female categories, the average distance between females from ideologically opposite groups is significantly lower than that between the most ideologically distant reference groups. This finding may suggest that, although less pronounced than religion, established gender identities (especially among females) are also major drivers of social interaction on the streets.

In conclusion, several of the significant findings are consistent and may be explained by social factors. For instance, in-group distances (i.e., distances among religious individuals, religious women, and non-religious women, respectively) are smaller than out-group distances. This observation accords with traditional gender roles in patriarchal societies and with customary social roles in predominantly Muslim communities. Social boundaries are thus mirrored in physical proximity on Turkish streets, once again illustrating the nation’s complex polarization.

It is important to note that Islam has frequently been described as patriarchal, with cultural expectations often confining religious women to domestic or female-dominated spheres. This could help explain the especially close in-group distances we observe among religious women: if their social norms and daily experiences already limit broader cross-gender or out-group interactions, these women may be all the more inclined to cluster together in public spaces. At the same time, these findings may not generalize to contexts where patriarchal norms operate differently or where religious affiliation is not a primary mechanism of social ordering. Future research would benefit from testing similar computer-vision approaches in non-Muslim or less patriarchal societies to see whether gender plays a similarly strong role in conditioning visible patterns of in-group closeness.

The sections on data collection, the methodological discussion of object detection, the training of a custom object detection model (Hijab-Net), the methodological estimation of distances, and, finally, the empirical approach to interpreting the results underscore the technical sophistication and rigor of our methodology. The details provided in the methodology and empirical analysis sections not only enhance the precision of our analysis but also offer nuanced insights into how a variety of computer vision techniques can be used to understand social and political behaviors. The most noteworthy methodological contribution of our paper to the field of political methodology is the integration of a 2D–3D distance estimation technique with object detection, while its most important substantive contribution is the creation of a new metric for understanding political polarization.

This study, through its innovative approach that combines computational techniques with an economic analysis of social interactions, illuminates the complex relationship between religious expression and political polarization in Turkey. Understanding the extent of political polarization through visual cues is especially important for countries with citizens who associate themselves with a variety of cultural, ideological, racial, and religious backgrounds. Visual cues can be particularly important for nations where formal participation in the political system is limited for certain groups and identities.

Thus, these findings not only contribute to a deeper understanding of the dynamics at play in Turkish society but also suggest a scalable framework for examining similar phenomena in diverse global contexts. The implications of this research extend beyond the academic realm, offering policymakers and social scientists a new lens through which to view the complexities of religious expression and its political ramifications. Future research could expand upon this foundation, exploring other forms of religious and cultural symbolism and their impact on societal polarization.