Introduction

The advent of automated vehicles (AVs) transforms transportation, promising enhanced safety, efficiency, and convenience. However, the widespread adoption of AVs faces significant hurdles, mainly stemming from public skepticism about their safety and operational reliability1,2,3,4. Central to overcoming these obstacles is trust—specifically, the willingness of individuals to place their confidence in these autonomous systems5,6,7,8. Existing literature reveals that higher trust correlates with an increased likelihood of AV adoption9,10,11.

Despite the significant role of auditory communication, prior research on human–AV interaction has largely neglected the impact of voice on trust9,12,13,14,15,16. Emerging literature has started to investigate how verbal explanations provided by AVs can foster trust by clarifying vehicle actions, reducing user uncertainty, and aiding in the development of accurate mental models of AV behavior, all of which are crucial during scenarios requiring human intervention9,13,16,17. These studies primarily focus on the content of the communication rather than the characteristics of the voice itself. Within this context, understanding how voice-gender similarity shapes user trust becomes essential.

The discrepancy between user preferences for voice similarity and the widespread adoption of default female voices in AI technologies raises essential questions regarding trust dynamics in autonomous vehicles. On the one hand, similarity attraction theory posits that users are more inclined to trust voices that resemble theirs18,19,20. This implies that customizing AV vocal outputs to mirror user characteristics should foster trust. On the other hand, widespread AI voice assistants such as Apple’s Siri and Amazon’s Alexa, which are increasingly employed in AVs, often default to female voices21,22. Despite the option to modify these voice settings to match their own characteristics, most users choose not to do so, which raises questions regarding the relationship among voice similarity, trust, and user acceptance of AV technologies23,24.

One explanation for why female voices are widely used in AI technologies could be gender stereotypes and gender-role congruity, which reflect societal norms regarding gender roles and characteristics25. Gender stereotypes are oversimplified beliefs about the attributes and roles typically associated with individuals based on their gender26,27,28. Such stereotypes influence perceptions and behavioral expectations, delineating specific traits considered appropriate for men and women29. Gender-role congruity builds on these stereotypes, suggesting that individuals—or AI voices—are evaluated more favorably when their traits and roles align with traditional gender norms, while deviations may lead to negative perceptions or reduced trust30. For example, traditional stereotypes portray men as assertive, competitive, and logical, while women are seen as nurturing, empathetic, and emotional31. These preconceived notions reinforce gender-role congruity, impacting various facets of life, including career choices, interpersonal relationships, and social interactions, as well as the design of AI voice technologies that align with these established expectations.

Understanding the influence of voice similarity and gender stereotypes may be particularly complex when we consider that trust has cognitive and affective components32. Cognitive trust arises from logical assessments that differentiate between dependable and unreliable agents, whereas affective trust derives from emotional connections, encapsulating care and concern within interpersonal relationships32,33. A voice that aligns with a user’s identity may enhance affective trust through emotional resonance yet may simultaneously lack the authoritative qualities necessary to bolster cognitive trust. Gaining clarity on these dynamics is essential not only for understanding trust interactions within AV frameworks but also for informing the design of future technologies that prioritize user confidence.

This research addresses two central objectives. First, we investigate whether the gender similarity between human users and the voices of AVs influences the development of cognitive and affective trust. Second, we examine how users’ perceptions of the AV’s role, along with existing gender stereotypes, moderate the relationship between voice gender similarity and trust. To pursue these objectives, we conducted a randomized experimental study through an online survey platform. Results show that voice gender similarity by itself affected only affective trust, not cognitive trust, and that its impact was moderated by gender-role congruity. These insights illuminate how voice gender characteristics and underlying gender stereotypes affect trust in AVs, enhancing our understanding of design-related factors that either reinforce or challenge societal biases.

Related work

Voice-based explanations and trust in AVs

Voice-based explanations are essential for cultivating trust in AVs as they explain the vehicles’ decision-making processes, thereby reducing user uncertainty and enhancing the predictability of AV behavior9. By improving predictability and comprehensibility, these explanations enable users to develop accurate mental models of AV operations, an essential factor for informed decision-making and trust establishment13. Research indicates that when users are provided with clear, contextually relevant explanations of an AV’s actions, their confidence in the system increases17. Furthermore, studies demonstrate that transparent reasoning enhances user understanding and contributes to a sense of agency and control, further reinforcing trust16.

Previous research has largely focused on auditory and visual modalities for delivering explanations, with a predominant emphasis on auditory explanations. However, the specific characteristics of these auditory explanations, particularly the attributes of the voice used, have received limited attention. For example, Du et al. (2019) utilized a male voice with a standard American accent in a simulated AV environment, while Körber et al. (2018) and Forster (2017) employed a female voice actor’s natural voice9,13,16. Ruijten et al. (2018) also allowed participants to select either a male or female voice for the AV interface14. Our lack of knowledge on the influence of voice gender on human-AV interaction hinders our ability to improve AV design and user trust. Examining the impact of voice gender will contribute to the theoretical understanding of human-AV interaction and yield practical implications for improving user acceptance and satisfaction in autonomous technologies.

Voice gender and preference

Humans naturally anthropomorphize AI systems, attributing human-like traits to AVs and other AI-driven technologies34,35,36. Even minimal social cues, such as voice, lead users to associate characteristics like gender, age, and personality with technology37,38. As voice becomes increasingly integrated into AI interactions, its design plays a significant role in shaping user experiences39.

Voice gender is a highly debated element of voice design in AI technologies. Many voice assistants, such as Siri, Alexa, and Cortana, default to female voices, often drawing on anecdotal evidence suggesting that users across diverse cultures generally prefer female voices40,41,42. Female voices are frequently associated with warmth, gentleness, and cooperation, reflecting traditional gender stereotypes43. Dong (2020) found that drivers tend to favor female voices in automated vehicles (AVs), likely due to their familiarity with female voices in navigation systems44.

However, the universal adoption of female voices in AI applications is increasingly being scrutinized. Research indicates that user preferences for voice gender often depend on the perceived function of the AI system43,45. This suggests that users apply social norms from human interactions to their expectations of AI behavior, influencing their perceptions based on gendered traits.

Voice gender similarity and trust in AVs

User gender has been shown to significantly influence preferences for the gender of AI-generated voices, including those in AVs. Research suggests that individuals tend to prefer voices that match their own gender, a phenomenon explained by similarity-attraction theory18. This theory posits that people are more likely to trust, feel comfortable with, and be positively influenced by individuals who share similar attributes, including voice characteristics19,20. Applied to AVs, this suggests that users may develop stronger trust in an AV with a voice that aligns with their own gender, as gender similarity may create a sense of familiarity and reliability.

Existing studies on voice preference and trust in AI provide empirical support for this idea. Research indicates that male users tend to prefer male voices, whereas female users show a stronger inclination toward female voices in AI interactions37. Lee et al. (2000)46 found that female participants were more likely to agree with and trust a female Text-to-Speech (TTS) voice, while male participants exhibited a similar preference for male TTS voices. This alignment between user gender and AI voice gender has also been observed in human–robot interaction studies, where participants rated robots with gender-matching voices as more acceptable and psychologically closer47. These findings suggest that gender similarity in AV voices may affect user trust, leading to higher engagement and acceptance of automated systems.

Gender stereotypes, gender-role congruity and voice preference

Gender stereotypes significantly influence user perceptions of AI-generated voices. Research categorizes gendered traits into two main groups: communal and agentic attributes26,28,48. Communal characteristics (such as warmth, empathy, and supportiveness) are generally associated with women, while agentic traits (such as assertiveness, authority, and confidence) are more frequently linked to men30. Understanding these associations is crucial for designing AI voice interfaces that align with user expectations and enhance user engagement.

Building on these stereotypes, gender-role congruity reflects the extent to which an individual’s traits, behaviors, or roles align with established societal expectations. Gender role congruity theory, proposed by Eagly and Karau (2002)30, posits that individuals are evaluated more favorably when their characteristics and roles conform to traditional gender norms. Conversely, deviations from these expectations may lead to prejudice or negative evaluations. This framework is particularly relevant in AI voice design, where users demonstrate clear preferences for voices that match stereotypical gender roles. For example, research indicates that male voices are favored in roles demanding authority and expertise, such as financial advising or technical support, while female voices are preferred in assistance-oriented roles, such as customer service and navigation43,45. These preferences reflect broader societal expectations, where women are often perceived as nurturing problem-solvers, whereas men are viewed as authoritative decision-makers49,50,51.

Studies also suggest that voice gender influences perceived dominance in AI interactions. Female-voiced computers are often seen as less authoritative and serious compared to male-voiced systems when delivering evaluations52. Additionally, regardless of an AI system’s visual appearance, users prefer voices that align with traditional gender-role expectations, favoring male voices in leadership and technical roles and female voices in supportive or caregiving roles53. These findings highlight the importance of voice gender in shaping perceptions of authority and effectiveness in AI interactions, underscoring the need for careful consideration of voice selection in the design of AI systems to align with user expectations and desired functionalities.

In AVs, voice preference often depends on the vehicle’s perceived function. Research suggests that male voices are perceived as more credible when delivering factual information about the AV’s actions and surroundings, whereas female voices are preferred for social engagement, such as offering reassurances or addressing user concerns20. Yet, the relationship between voice gender similarity and gender-role congruity in AV interactions remains unexplored. Specifically, the impact of gender similarity (matching the AV’s voice to the user’s gender) and gender-role congruity (aligning voice gender with the AV’s perceived function) requires further investigation. Addressing these gaps could improve AV design, enhancing both user experience and trust.

Cognitive and affective trust

Previous studies have explored the link between trust in AVs and providing explanations. Utilizing various trust theories, researchers like Forster et al. (2017)16 and Du et al. (2019)9 discovered that explanations significantly enhance trust across different dimensions, such as performance, process, purpose, and dependability, especially when delivered before the AV takes action. Yet, contrasting findings by Hatfield (2018)15 suggest that transparency does not impact trust during specific moral dilemmas. The essential role of trust in AV research is well-acknowledged, but further examination is needed to nurture trust in these vehicles.

For instance, the current literature lacks a comprehensive examination of the impact of AV explanations on trust from both cognitive and affective perspectives. This presents a theoretical challenge because trust in AVs has mostly been studied from a cognitive standpoint, focusing on logical reasons for trust, while overlooking the emotional engagement individuals may have with AVs6,54,55. The literature on interpersonal relationships highlights the importance of distinguishing between cognitive and affective trust32,33, where the former is based on rational evaluations of trustworthiness and the latter on emotional connections. Recent work by Lee et al. (2022)56 sought to bridge this gap by investigating trust in AVs through both cognitive and affective lenses. The study revealed that employing politeness strategies in AVs enhanced both forms of trust, with particular implications for fostering affective trust. The findings underscore the significance of considering both cognitive and affective trust in AVs, providing a pathway for promoting a positive human–AV relationship through politeness strategies. This emerging focus on both cognitive and affective trust represents a promising direction for enhancing user trust in AVs, suggesting the need for a more holistic approach in future research to fully comprehend and bolster trust in automated driving systems.

Hypothesis development

This study examined the impact of gender similarity between AV explanatory voices and human listeners. It also explored the potential moderating effects of gender-role congruity on this relationship. Our main goal was to understand how these factors influence cognitive and affective trust in AV contexts. The research framework was grounded in the “Computers Are Social Actors” (CASA) paradigm, similarity attraction theory, and role congruity theory, which together clarify the complex interplay between these elements and their implications for trust formation.

The CASA paradigm reconceptualizes the human-computer relationship, portraying computers not as inanimate tools but as impactful social actors possessing attributes like agency, personality, and social presence34,35,57,58,59. Among the interactions between humans and computers, voice, especially its gendered aspect, emerges as a pivotal component in shaping user experiences and perceptions38,39,60. Similarity attraction theory posits that individuals naturally gravitate toward entities that resemble themselves, fostering trust and stronger connections61,62. Cognitive trust in AVs is largely contingent on user perceptions and past experiences of reliability32,33. When an AV adopts a voice that matches the user’s gender, users may be more inclined to seek AV-related information, thus creating a positive feedback loop where trust and similarity mutually reinforce each other. This could significantly bolster cognitive trust in autonomous vehicles, ultimately paving the way for more interactive and dependable human–AV interactions.

H1

Matching the AV voice gender to the user’s gender enhances cognitive trust in the AV.

Affective trust, a foundation of emotional connection and shared concern, is deeply rooted in feelings and sentiments32,33. When users perceive the gender of AV voices as concordant with their own, it should engender a heightened sense of comfort and emotional security, thereby nurturing affective trust. This resonance should trigger a cascade of positive emotions, fostering a connection that cultivates trust and makes users feel genuinely understood and represented. This sense of validation can significantly enhance users’ engagement with and reliance on AVs, forging deeper bonds between humans and autonomous vehicles.

H2

Matching the AV voice gender to the user’s gender enhances affective trust in the AV.

Building upon the role congruity theory, which posits that social groups receive more favorable evaluations when their attributes align with traditional roles, this research delves into the impact of gender stereotypes and the perceived congruence between an individual’s gender and that person’s role30,63,64,65. In the context of AV interactions, we propose that the concept of gender-role congruity can moderate the effects of gender similarity on both cognitive and affective trust. More specifically, when there is an incongruity between the perceived gender of an AV and its assigned role, this incongruence may diminish the AV’s perceived competency, thereby tempering the influence of gender similarity on cognitive trust.

Moreover, we predict that affective trust may wane when there is a mismatch between the perceived role of the AV and its voice characteristics. This incongruity could lead to negative emotional responses and, consequently, a reduction in affective trust despite gender similarity63,64,65. We propose that gender-role incongruity has the potential to dampen the trust amplification that typically accompanies gender similarity, particularly when contrasted with scenarios where gender roles are congruent.

H3

The impact of voice gender similarity on (a) cognitive trust and (b) affective trust depends on gender-role congruity. More specifically, when the gender of the AV voice does not align with the perceived role of the AV, the impact of gender similarity on either type of trust is weakened, leading to a less pronounced difference in trust between the gender similarity and dissimilarity groups.

Methods

This study received approval of exemption from the institutional review board in compliance with the ethical guidelines of the American Psychological Association. All research was performed following relevant guidelines and regulations. Informed consent was obtained from all participants.

Participants

Our study, conducted via an online survey, involved 333 U.S. drivers recruited from CloudResearch’s diverse participant pool. Seven participants, who either preferred not to disclose their gender (n = 3) or identified as non-binary (n = 4), were excluded from the analysis, as this research stage focused on male and female gender categories. This resulted in a final sample of 326 participants who self-identified as female or male. The demographic distribution was nearly gender-balanced, with 160 females and 166 males. Participants were further categorized into two age groups—younger drivers (18–25 years) and older drivers (55 years and older)—following established classifications from previous research66,67. Among the females, 75 were younger adults with an average age of 22.52 years (SD = 3.09), and 85 were older adults with an average age of 62.15 years (SD = 5.37). For the males, 88 were younger adults with an average age of 22.52 years (SD = 4.85), and 78 were older adults with an average age of 61.83 years (SD = 6.16).

To ensure high-quality data, we implemented two key measures. First, we shortlisted workers who had demonstrated high performance in prior tasks, reflected by a minimum 95% approval rating and the successful completion of at least 1,000 approved tasks. Second, we incorporated two attention-check questions within the survey to prevent rushed or inattentive responses. We also incorporated eligibility screening to verify participants’ suitability for our study. We ensured that they held a valid driver’s license, had no visual or auditory impairments that could affect the outcome, and used devices that could play audio content. Upon completing the survey, which generally took 25–30 min, participants received $5 compensation.

Study design

This study employed a randomized between-subjects design, anchored on two factors: gender similarity and gender-role congruity, each with two levels, rendering a 2 × 2 experimental design. The aim was to delve into the impact of gender similarity between participants and AV explanatory voices, and the moderating effect of gender-role congruity on cognitive and affective trust.

Independent variables

This study examined two main independent variables: gender similarity and gender-role congruity between humans and AV explanation voices. For gender similarity, participants were grouped into similarity and dissimilarity categories based on whether their gender matched that of the AV voice they heard.

To assess gender-role congruity, we examined participants’ perceptions of the AV’s role, classifying it as either “driving assistant” or “driving supervisor.” Participants indicated their perceived AV role using a slider scale, where sliding to the left represented “driving assistant” and sliding to the right represented "driving supervisor." These role perceptions were then matched with the gender of the AV voice according to common gender stereotypes (i.e., male for driving supervisor, female for driving assistant)49,50,51. Those perceiving the AV as a driving assistant and hearing a female voice were placed in the gender-role congruity group. In contrast, those encountering a male voice in the same scenario were placed in the gender-role incongruity group. The distribution across gender similarity and gender-role congruity groups is displayed in Table 1.
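The grouping rules described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' procedure: the slider range (0 to 100), the midpoint cutoff, the handling of a response exactly at the midpoint, and all function and variable names are assumptions introduced here. Only the stereotype pairing (female for driving assistant, male for driving supervisor) comes from the text.

```python
# Assumption: the slider runs from 0 (far left, "driving assistant")
# to 100 (far right, "driving supervisor"); the text does not give the range.
SLIDER_MIDPOINT = 50

# Pairing stated in the text: female -> driving assistant, male -> driving supervisor.
STEREOTYPED_VOICE = {
    "driving assistant": "female",
    "driving supervisor": "male",
}


def perceived_role(slider_value: float) -> str:
    """Map a slider response to a perceived AV role.

    Assumption: a response exactly at the midpoint counts as "driving supervisor".
    """
    if slider_value < SLIDER_MIDPOINT:
        return "driving assistant"
    return "driving supervisor"


def assign_groups(participant_gender: str, voice_gender: str, slider_value: float):
    """Return (gender similarity group, gender-role congruity group)."""
    role = perceived_role(slider_value)
    similarity = "similarity" if participant_gender == voice_gender else "dissimilarity"
    congruity = "congruity" if STEREOTYPED_VOICE[role] == voice_gender else "incongruity"
    return similarity, congruity


# A female participant who hears a female voice and perceives an assistant role.
example = assign_groups("female", "female", 20)
print(example)
```

Under these assumptions, a participant can fall into any of the four cells of the 2 × 2 design, because similarity depends on the participant's gender while congruity depends on the perceived role.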

Table 1 Experimental design and participant distribution.

In our research, we utilized two text-to-speech platforms, Murf and Uberduck, to generate AV voices with a standard American accent and distinct demographic characteristics68. To neutralize any interaction between participant age and the perceived age of the AV voice (e.g., age stereotypes), we produced four voices: two genders (male and female) across two age brackets (younger and older). Participants were purposefully matched with voices reflecting varied age demographics to mitigate bias. Using Murf, we crafted the “younger” personas: Natalie (young female) and Nate (young male). The “older” voices, Charlotte (older female) and Jim (older male), were generated via Uberduck. Since pitch and tone naturally vary between genders and serve as key characteristics that help listeners identify voice gender69,70,71, these attributes were inherently tied to the gender variable. These voices were paired with related video scenarios using the CapCut editor for participant immersion, ensuring accurate alignment between spoken content and on-screen actions.

Dependent variables

This study evaluated two trust-related dependent variables: cognitive trust and affective trust. An overview of all utilized questionnaires is provided in the SI Appendix Table S1.

We implemented a seven-item measure adapted from McAllister (1995)33 and Lee et al. (2022)56 to assess cognitive trust in AVs. To suit the context of AVs, we modified these items accordingly. Participants were asked to rate each item on a 7-point Likert scale, with 1 denoting “strongly disagree” and 7 signifying “strongly agree.” We evaluated affective trust via a questionnaire based on the affective trust questionnaires of McAllister33 and Lee et al.56. The participants were instructed to rate each item on a 7-point Likert scale from 1, indicating strong disagreement, to 7, indicating strong agreement.

Control variables

The study included several control variables, including participants’ familiarity with AVs (e.g., "How much have you heard about self-driving cars?"), their experience with voice agents (e.g., "How often do you use a voice assistant such as Siri, Google Assistant, Cortana, Alexa, Bixby, or an in-car voice agent?"), as well as their age and gender. These variables were considered due to their demonstrated influence on individuals’ trust and perceptions of AVs and voice design, as evidenced by previous studies3,72,73,74.

Study procedure and scenarios

Participants were recruited through the Connect platform on CloudResearch and directed to an online Qualtrics survey. Before beginning the study, participants were briefed on its objectives and instructed to ensure their devices had the necessary audio and visual capabilities. Upon providing consent, participants were introduced to the AV’s functions, emphasizing its autonomous capabilities, adherence to traffic laws, and ability to adapt to different routes. They were informed that the self-driving car was designed for full automation, meaning it would perform all critical driving tasks and monitor roadway conditions throughout the trip. While participants, acting as “drivers,” would provide destination or navigation input, they were not expected to take control of the vehicle at any point during the trip. A preliminary questionnaire was administered to capture participants’ general perceptions of AVs and experiences with voice agents.

The study comprised six video scenarios showcasing AV responses in varied driving contexts (urban, highway, and rural), each accompanied by a detailed “what + why” narrative from the AV voice, explaining the AV’s actions and rationale. The SI Appendix Table S2 provides further insight into the what + why narratives for each scenario. Figure 1 illustrates one scenario titled “Oversized Vehicle Ahead.” After each viewing, participants completed surveys evaluating their affective and cognitive trust in the AV. Then, participants filled out demographic data so we could ensure participant diversity and explore demographic impacts on AV perceptions.

Fig. 1 A video screenshot for the “Oversized Vehicle Ahead” scenario.

Results

Reliability and construct validity

This study examined construct validity and reliability to ascertain the accuracy and consistency of the utilized measurement constructs. Construct validity, delineating how well a scale embodies the intended concept75, encompasses convergent and discriminant validity, assessed here through exploratory factor analysis. Convergent validity, demonstrated by factor loadings of 0.70 or higher on corresponding constructs, and discriminant validity, evidenced by loadings of 0.35 or lower on unrelated constructs76, were both adequately achieved or exceeded by all scale items. The only exception was one item in the cognitive trust question set, which had a loading of 0.63. Construct reliability, indicative of internal consistency among scale items77, was gauged using Cronbach’s alpha78,79. All constructs demonstrated a reliability score meeting or exceeding the accepted threshold of 0.70, thus validating their reliability.
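As a reminder of what the reliability check computes, the following is a minimal Python sketch of Cronbach's alpha, using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The response matrix is made up for illustration and is not the study's data; the function name and the example values are assumptions.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant item ratings.

    responses: list of rows, each row holding one rating per scale item.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Variance of each item across participants.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Variance of each participant's summed score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 7-point Likert ratings: 5 participants x 3 items.
ratings = [
    [6, 5, 6],
    [4, 4, 5],
    [7, 6, 7],
    [3, 3, 2],
    [5, 5, 5],
]
alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.3f}, meets 0.70 threshold: {alpha >= 0.70}")
```

The 0.70 cutoff in the last line is the threshold named in the text; everything else in the example is illustrative.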

Hypothesis testing

The hypotheses were tested with a sample of 326 participants using linear mixed models to analyze the impact of gender similarity and gender-role congruity on cognitive and affective trust. This approach allowed us to identify significant differences in mean values across independent groups based on these factors. Participants, along with control variables (i.e., AV familiarity and voice agent experience), were included in the models as covariates to account for potential non-independence. All analyses were performed using IBM SPSS 28.0, with the significance level set at α = 0.05. To control for Type I error in multiple comparisons, a Bonferroni correction was applied to all post hoc analyses.
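To make the Bonferroni step concrete, the sketch below shows the usual adjustment: each raw p-value is multiplied by the number of comparisons (capped at 1) and then compared with α. The p-values in the example are hypothetical and are not taken from the study; the function name is likewise an assumption, and SPSS applies the equivalent adjustment internally.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted_p, is_significant) for each raw p-value.

    Adjusted p = min(1, raw p * number of comparisons).
    """
    m = len(p_values)
    results = []
    for p in p_values:
        adjusted = min(1.0, p * m)
        results.append((adjusted, adjusted < alpha))
    return results


# Hypothetical raw p-values from three post hoc comparisons.
raw_p = [0.01, 0.04, 0.30]
adjusted = bonferroni(raw_p)
for raw, (adj, significant) in zip(raw_p, adjusted):
    print(f"raw p = {raw:.2f} -> adjusted p = {adj:.2f}, significant: {significant}")
```

Note how a raw p of 0.04, nominally below 0.05, no longer counts as significant once three comparisons are taken into account.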

Manipulation check

In the manipulation check, participants were tasked with identifying the perceived age and gender of the voice in each video post-viewing. The results affirmed the effectiveness of our AV voice manipulation. Notably, a significant distinction in voice age perception between younger and older voice groups was recorded (F(1, 1750) = 1699.550, p < 0.001), validating the age-based voice manipulation. Likewise, participants accurately identified voice gender (F(1, 1954) = 25,089.650, p < 0.001), endorsing the gender-based manipulation. These outcomes collectively substantiate the reliability of our manipulations within the study.

Gender similarity on cognitive trust

As shown in Table 2, there were no significant differences between the gender similarity group (mean = 5.205) and the dissimilarity group (mean = 5.125) in cognitive trust in AVs, t(1,1950) = 1.792, p = 0.073. The means of both groups are approximately 5, indicating that participants “somewhat agree” that they trust the AV cognitively. Therefore, H1 is not supported.

Table 2 Summary of generalized linear model results on the impact of gender similarity.

Gender similarity on affective trust

As shown in Fig. 2 and Table 2, gender similarity has a significant impact on affective trust, t(1,1950) = 4.405, p < 0.001. Specifically, post-hoc comparisons revealed that participants had higher affective trust in the AV when the vehicle’s voice matched their gender (mean = 3.663) compared to the dissimilarity group (mean = 3.410). In other words, when participants heard a voice dissimilar to their gender, they tended to report lower affective trust (closer to “somewhat disagree”). In contrast, the similarity group expressed more neutral affective trust (closer to "neither agree nor disagree"). Thus, H2 is supported. For effect size in multilevel models, we follow the approach by Snijders and Bosker (2012) to compute the proportion of variance in affective trust explained by the covariates in our model80,81. The effect size falls in the medium-to-large range (0.14)82.
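For readers unfamiliar with this measure, one common form of the Snijders and Bosker proportion of explained variance for a two-level random-intercept model is sketched below. The notation is an assumption introduced here for illustration: \hat{\sigma}^2 denotes the estimated residual (within-participant) variance, \hat{\tau}_0^2 the estimated random-intercept (between-participant) variance, "null" the model without predictors, and "full" the model with them. The text does not state which exact variant was computed, so this should be read as a sketch rather than the authors' formula.

```latex
R_1^2 \;=\; 1 \;-\;
\frac{\hat{\sigma}^2_{\text{full}} + \hat{\tau}^2_{0,\text{full}}}
     {\hat{\sigma}^2_{\text{null}} + \hat{\tau}^2_{0,\text{null}}}
```

Under this definition, a value of 0.14 would mean that the predictors reduce the total unexplained variance in affective trust by 14% relative to the null model.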

Fig. 2 The average scores of affective trust between gender similarity/dissimilarity groups.

Gender similarity and gender-role congruity on cognitive trust

To examine the potential moderating effect of gender-role congruity on cognitive trust (H3a), we analyzed the interaction between gender similarity and gender-role congruity. The results revealed a significant interaction effect on cognitive trust, t(1,1948) = 2.734, p = 0.006, with a moderate effect size (0.10), as shown in Table 3. Thus, H3a was supported. Further analysis indicated that the effect of gender similarity on cognitive trust was present only in the gender-role congruity group but not in the incongruity group. For visualization purposes, standardized values are used in Fig. 3a,b. However, all statistical analyses and reported means reflect raw data to maintain interpretability. As shown in Fig. 3a, in the gender-role congruity condition (blue line in Fig. 3a), participants who shared gender similarity exhibited significantly higher cognitive trust (mean = 5.284) compared to those in the dissimilarity condition (mean = 5.068, p < 0.001). However, in the gender-role incongruity group (red line in Fig. 3a), gender similarity had no significant impact on cognitive trust. There was no meaningful difference between the similarity group (mean = 5.140) and the dissimilarity group (mean = 5.190, p = 0.447).

Table 3 Summary of generalized linear model results on the impact of gender similarity and gender-role congruity.
Fig. 3

Effect of the two-way interaction between gender similarity and gender-role congruity on cognitive trust (a) and affective trust (b).

Gender similarity and gender-role congruity on affective trust

The relationship between gender similarity and affective trust appears to be shaped by gender-role congruity. Our analysis revealed a significant interaction effect between these factors, t(1,1948) = 5.349, p < 0.001, with a large effect size (0.22) (see Fig. 3b and Table 3). Thus, H3b was supported.

Specifically, gender similarity positively affected affective trust, but only in the gender-role congruity condition. In this group (blue line in Fig. 3b), participants who shared gender similarity exhibited a substantially higher level of affective trust (mean = 4.026) compared to those in the dissimilarity condition (mean = 3.410, p < 0.001). Conversely, when gender roles were incongruent (red line in Fig. 3b), gender similarity had no measurable impact. Trust levels remained statistically similar between the gender similarity (mean = 3.361) and dissimilarity groups (mean = 3.402, p = 0.625).
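The pattern behind both interactions can be illustrated with simple arithmetic on the cell means reported in the text. The sketch below computes, for each trust type, the similarity-minus-dissimilarity difference within each congruity condition and the difference between those two differences. These are descriptive contrasts on raw means, not the model-based estimates in Table 3, and the dictionary layout and function names are illustrative assumptions.

```python
# Cell means as reported in the text (raw scale).
cell_means = {
    "cognitive": {
        "congruent": {"similar": 5.284, "dissimilar": 5.068},
        "incongruent": {"similar": 5.140, "dissimilar": 5.190},
    },
    "affective": {
        "congruent": {"similar": 4.026, "dissimilar": 3.410},
        "incongruent": {"similar": 3.361, "dissimilar": 3.402},
    },
}


def simple_effect(cells):
    """Similarity minus dissimilarity mean within one congruity condition."""
    return cells["similar"] - cells["dissimilar"]


def interaction_contrast(by_role):
    """Difference between the congruent and incongruent simple effects."""
    return simple_effect(by_role["congruent"]) - simple_effect(by_role["incongruent"])


for trust, by_role in cell_means.items():
    congruent = simple_effect(by_role["congruent"])
    incongruent = simple_effect(by_role["incongruent"])
    contrast = interaction_contrast(by_role)
    print(f"{trust}: congruent {congruent:+.3f}, "
          f"incongruent {incongruent:+.3f}, contrast {contrast:+.3f}")
```

On these means the similarity advantage under congruity is 0.216 for cognitive trust and 0.616 for affective trust, while under incongruity it is slightly negative for both (-0.050 and -0.041). The resulting contrasts, 0.266 and 0.657, are consistent with the larger interaction effect reported for affective trust.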

Summary of the results

This study examined the impact of gender similarity between human users and the perceived gender of the AV’s voice on cognitive and affective trust, as well as the moderating role of gender-role congruity. Notably, gender similarity did not significantly impact cognitive trust, disconfirming H1. In contrast, gender similarity led to higher affective trust than gender dissimilarity, supporting H2. Furthermore, gender-role congruity moderated the impact of gender similarity on cognitive and affective trust. When the AV voice’s perceived gender aligned with its expected role, gender similarity led to significantly higher cognitive and affective trust compared to gender dissimilarity, supporting H3a and H3b. However, the effect was much more pronounced for affective trust than for cognitive trust. The following section explores these findings in greater depth, discussing their scholarly contributions and the study’s limitations.

Discussion

Despite the importance of auditory communication, the influence of voice characteristics on trust in human-AV interaction remains largely unexplored9,12,13,14,15,16. This study investigates how voice gender similarity impacts cognitive and affective trust in AVs and how gender-role congruity moderates this impact. In doing so, this study addresses the discrepancy between voice similarity preferences and the prevalence of female AI voices, raising important questions about trust in AVs.

Gender similarity, gender-role congruity, and cognitive trust

Cognitive trust is built upon rational assessments, where users evaluate an AV’s trustworthiness based on competence, predictability, and alignment with expectations32,33. Contrary to similarity-attraction theory, which suggests that individuals are more likely to trust those who resemble them61, our findings indicate that gender similarity alone does not significantly impact cognitive trust. This suggests that in trust judgments based on reasoning and evidence, social identity cues such as voice gender are not primary determinants of cognitive trust formation.

However, our findings highlight the critical moderating role of gender-role congruity. When the AV’s voice gender aligned with its expected role—such as a male-voiced AV in a supervisory driving role or a female-voiced AV in an assistance role—gender similarity led to higher cognitive trust. This supports role congruity theory, which posits that congruence between gender and expected roles enhances credibility and positive evaluations30,63,64,65. In such scenarios, users likely perceived the AV as more competent, appropriate for its function, and trustworthy due to the reinforcement of societal role expectations.

Conversely, when the AV’s voice gender was incongruent with its expected role, the effect of gender similarity on cognitive trust disappeared. Even when users and AVs shared the same gender, trust was not enhanced when the AV’s voice contradicted gender-based role expectations. These findings align with role congruity theory, suggesting that AVs that deviate from traditional gender norms are often perceived as less suitable for their assigned roles64,65. The disappearance of gender similarity effects in these conditions underscores the dominance of role expectations over demographic similarity in cognitive trust formation.

Gender similarity, gender-role congruity, and affective trust

Unlike cognitive trust, affective trust is emotionally driven, influenced by perceptions of comfort, familiarity, and social bonding32,33. In contrast to cognitive trust, our findings indicate that gender similarity alone significantly influences affective trust, suggesting that shared social identity cues can enhance emotional engagement. Participants reported higher affective trust when interacting with an AV voice that matched their gender, aligning with similarity-attraction theory18. This finding supports prior research indicating that social identity cues, such as shared gender, facilitate emotional connections and reduce psychological distance, making interactions feel more natural and engaging61,62.

Gender-role congruity also played a moderating role in affective trust formation. When the AV’s voice gender was aligned with its perceived role, gender similarity further enhanced affective trust. This suggests that traditional gender-role expectations shape not only rational evaluations (cognitive trust) but also emotional responses (affective trust) in human-AV interactions30,63,64,65. In these congruent conditions, users may have felt greater emotional ease and social resonance with the AV, reinforcing trust through similarity and role alignment. However, in gender-role incongruent conditions, the positive effect of gender similarity on affective trust disappeared. Users did not exhibit increased affective trust when the AV’s voice gender contradicted stereotypical role expectations, suggesting that violations of social norms disrupt the emotional connection that similarity might otherwise foster. This aligns with research demonstrating that gender-role incongruity can diminish social engagement and emotional resonance, leading to lower trust or neutral perceptions20,53. These findings highlight that while gender similarity alone can enhance affective trust, its impact is conditional on broader social role expectations—when violated, the emotional bond facilitated by similarity is weakened or nullified.

Design implications

This study provides valuable insights into AV voice design by examining how gender similarity and gender-role congruity affect trust in human-AV interactions. The findings suggest that the impact of aligning an AV’s voice gender with the user’s gender depends primarily on gender-role congruity. When voice gender similarity aligns with gender-role congruity, it enhances cognitive and affective trust in the AV. However, if voice gender similarity does not correspond with gender-role congruity, it increases only affective trust, leaving cognitive trust unchanged. This may help explain why AV voices are not always designed to match the user’s gender and why users do not consistently adjust voice settings to match their own gender.

These results highlight the tension between leveraging existing gender-role expectations to promote trust in AVs and avoiding the reinforcement of gender stereotypes. While users may prefer voices that match traditional gender roles, designing solely based on these preferences risks perpetuating biases. Instead, AV developers should offer customizable or gender-neutral voice options, allowing users to choose a voice that best suits their preferences rather than defaulting to male or female voices. Beyond customization, AVs could implement adaptive trust-building strategies that tailor voice characteristics based on user interactions rather than relying on static gender assignments. Reducing reliance on gendered voices altogether, by incorporating non-anthropomorphic cues such as auditory tones or visual indicators, could also help shift the focus from gender expectations to functional communication.

Moreover, instead of assuming that certain roles require male or female voices, AV designers can develop voice characteristics that adapt based on situational demands, ensuring that trust is built on role functionality rather than gendered expectations. A more transformative approach involves rethinking the role of AI voices within AVs altogether. Rather than positioning the AV as a subordinate assistant or authoritative decision-maker, a collaborative AI framework could redefine the human-AV relationship as a partnership rather than a hierarchy.

In sum, AV voice design should take a multifaceted approach that considers gender and role expectations where beneficial to enhance trust while integrating design strategies that mitigate the reinforcement of societal biases. Moving beyond rigid gender-based frameworks and embracing adaptive, user-driven, and bias-conscious design choices will enable AV systems to be both trustworthy and socially responsible. This shift not only enhances the quality of human-AV interactions but also fosters a more inclusive and ethically designed AI future, where trust is established through functionality and contextual relevance rather than traditional gender norms.

Limitations and future research

This study has several limitations that merit discussion. First, participants were recruited online; they may be accustomed to the format of online studies and might not reflect the wider population’s familiarity with the AV domain. Second, although our experimental setup ensured robust internal validity, external validity may be limited, underlining the need for subsequent field studies to enhance generalizability. Third, the potential for hypothesis guessing among participants could have affected their responses, although no evidence supporting this was identified.

Moreover, this study focused solely on the gender dimension of voice characteristics, leaving other factors such as personality, age, accent, and race/ethnicity unexplored. Additionally, the study only considered binary gender categories (male and female); future research should include non-binary and diverse gender identities to support a more inclusive and comprehensive understanding of AV voice design83. Furthermore, this study did not examine other aspects of AV explanations, such as how explanations are defined and generated or how alternative courses of action are evaluated, which presents an opportunity for further research.

To conclude, a more comprehensive understanding of AV voice design and explanations requires future studies that explore a wider range of voice characteristics, user identities, and explanatory attributes to create more effective and inclusive AV systems.

Conclusion

In this investigation, we explored the interplay between gender similarity and gender-role congruity in engendering both cognitive and affective trust. To the best of our knowledge, this is a pioneering effort to understand the influence of voice characteristics on the efficacy of explanations within the sphere of Level 5 AVs. Our results emphasize the importance of aligning the gender of the AV explanation voice with that of human users, and its synergy with gender-role congruity, in optimizing the effectiveness of AV explanations vis-à-vis cognitive and affective trust. Collectively, our study provides a more detailed understanding of the factors influencing AV explanation efficacy, offering critical insights that are set to significantly inform the design of future AVs.