Introduction

In 2017, the science behind the workings of the human voice became a topic of intense and heated discussion as Silicon Valley moguls realized that speech recognition software could be greatly improved by producing emotionally appropriate, human-sounding speech in replying to our queries. Consequently, some of the world’s most powerful and richest companies, such as Google, Apple, Amazon, Yamaha, and Microsoft, are in hot pursuit of fine-tuning voice-emotion software that will continue to deepen our relationships with our computers (Colapinto 2021: 95–6). Researchers like Bjorn Schuller, who work in the field of speech emotion recognition, strive to imbue computer speech with the natural variation in pitch and volume that humans use to give color, animation, expression, and personality to their spoken utterances,Footnote 1 with the end goal of teaching computers to teach themselves the highly nuanced emotional prosody of speech (Colapinto, 2021: 99).

If the spoken voice is colorful, animated, and expressive, then the sung voice is even more so. Singing voices have more complicated melodic and rhythmic patterns than speaking voices, requiring substantially more information to teach duration and pitch in AI learning (Ren et al. 2020).Footnote 2 And yet, while there has been considerable interest in music and virtuality globally,Footnote 3 many of the most prevalent and enduring computer voices encountered today are not the sung voices of virtual idols like the Japanese Hatsune Miku,Footnote 4 but rather the spoken voices of AI vocal assistants such as Siri, Alexa, and Google Assistant. Moreover, those spoken voices are also usually gendered female, particularly in the United States (Potter, 2011; Griggs, 2011).

The apparent preference for fine-tuning spoken over sung voices in AI research may be partly due to difficulties in negotiating the hidden cultural stereotypes that have been unwittingly and unquestioningly associated with the sung female voice and not simply to the technological challenges in producing the more emotionally nuanced and complex sung voices. Rachel Reinke reminds us that unexamined cultural stereotypes about women often become normalized over time (2010, 164), and the enduring nature of the stereotypes regarding female singing voices is a case in point. As communications theorist James W. Carey states, technology must be seen “less as a physical contrivance than as a cultural performance: more on the model of a theatre that contains and shapes our interaction than a natural force acting upon us from the outside” (247). Carey goes on to explain that “when technology functions as a master symbol, it operates not as an external and causal force but as a blueprint: something that makes phenomena intelligible and through that intelligibility sets forth the conditions for its secondary reproduction” (247). Consequently, this preliminary study is not based on field research but rather on presenting examples of technological blueprints that continue to foster implicit, longstanding stereotypes about female singing voices that have become normalized.

What, then, are the performances, blueprints, and underlying narratives that speak to the otherness of the female singing voice in Western European and North American cultures? Building on Carey’s premise about technology-as-blueprint for replicating cultural narratives (1990) and an argument made by Safiya Noble in her groundbreaking book about racist undertones inherent in search engines (2018), I suggest that there are hidden misogynist narratives lurking in certain YouTube videos that feature the singing voices of robotic women. I concur with Leslie Dunn and Nancy Jones, who assert the following: “As a material link between body and culture, self and other, the voice has been endlessly fascinating to artists and critics. Yet it is the voices of women that have inspired the greatest fascination, as well as the deepest ambivalence, because the female voice signifies sexual otherness as well as a source of sexual and cultural power” (1994: back cover). This paper claims that female singing voices in AI engender even more potent and unfiltered manifestations of sexual otherness and power than spoken voices. Indeed, singing voices amplify cultural narratives about women to a highly charged emotional level that perpetuates misogyny. Beginning with an overview of five examples that feature the historical iconicity of sung female voices in Western European and North American musical traditions, this paper introduces three short examples highlighting the insufficiently examined cultural narratives about female voices that endure in AI singing.

Selected historical examples of female vocality: sirens, castrati, divas, recorded female voices, and pop singers

One would be hard pressed to find an icon more intimately associated with the power of the feminine singing voice than that of the siren. The siren’s representation through millennia of Western siren lore suggests that “Hers is the power to force men to listen, to abandon themselves against rational judgment to the insubstantial pleasures of things heard” (Austern, 2006: 53). And, whereas vision has often been linked with male dominance, hearing has been linked to the seductive power of the female voice. As Austern and Naroditskaya remind us, “The siren and her sisters may therefore be creatures whose vocal beauty obscures the perils and dangers of embodied union, serving as a metaphor against trusting the ear above the eye” (2006: 4–5).

In addition to the mythical power of the siren’s voice in the legends from Western classical antiquity, the narrative about Eve-as-femme fatale in Elizabethan writings only compounds the threatening seductive power of the female voice, particularly in its sung form. Due to the common seventeenth-century interpretation of Paul’s admonition about women keeping silent in church (I Corinthians 14:34–35), Linda Phyllis Austern claims that “the cardinal virtue of public female silence was so firmly entrenched that the soprano and alto voices of the church choir belonged exclusively to males, lest the sound of Eve’s descendants lure Adam’s fallen sons from their devotion to God” (1996: 84). Music was often gendered female in the Elizabethan period and was sometimes perceived as an insult to masculine rationality because of its sensorial and affective power (Austern, 1993: 347). Because the singing voices of women could arouse the senses “with a rhetoric far more powerful than speech and in a more lingering manner than vision” (Austern, 1996: 90), women’s participation in musical and theatrical performance was greatly restricted in Elizabethan England, and often relegated to private performances. The rise in importance of the boy actor and the boy soprano in England was designed to address the apparent need to remove the female body and voice from public stages, replacing them with the treble sound from young, prepubescent male bodies.

Other countries in seventeenth-century Europe also sought to contain the power of the feminine voice and body. For example, the castrato tradition in Italy owed its existence to the demand for soprano and alto voices in church choirs. Since prepubescent boys’ careers as treble singers were limited because of the inevitable onset of puberty, castrating young boys to preserve their treble voices offered a solution that, despite its barbarity, was somehow deemed preferable to allowing female voices and bodies to be heard and seen on public stages. Although a thorough discussion of the castrato tradition in liturgical and operatic performances from the mid-seventeenth through the eighteenth centuries is well beyond the scope of this paper,Footnote 5 my purpose in raising the issue of supplanting female singers with castrati is to underscore the atrocity inherent in mutilating young male bodies when female voices could have been permitted. While one could argue that the popularity of castrati in seventeenth- and eighteenth-century operas was due to their unique virtuosity, breathtaking coloratura, and overall contribution to the zeitgeist of the Baroque era, their very existence was also continually threatened by the Roman Catholic Church, whose official position was to excommunicate anyone performing such an operation on a child (Prest, 2006: 129–131). The renowned mezzo-soprano Cecilia Bartoli, one of whose specialties is the repertoire of the castrati, asks why we do not talk about this brutal practice (Bartoli, 2009). Apparently, the prospect of women singing publicly was considered more onerous than the practice of disfiguring male children and thereby destroying any semblance of normality for them for the rest of their lives.

Eventually the ban against women speaking in church was rescinded in 1798, which also coincided with outlawing the castrato training schools (Rutherford, 2006: 4). Although women’s voices were allowed to be heard publicly in opera and church, Susan Rutherford reminds us that female vocalists who reclaimed roles from the castrati paid a substantial price for their careers: no other woman in the 19th century “was both so idolized and so despised” (34). Thus, on the one hand, the diva’s voice, which had been suppressed for two centuries, finally dominated the operatic stage and allowed a few exceptional women to challenge patriarchal hegemony and demand equality. On the other hand, however, the old trope of the woman-as-siren resurfaced in the nineteenth century, with grave warnings that the operatic diva could emasculate male listeners due to her vocal charms, thereby threatening public morality and male hegemony (37–39).

Despite substantial gains made by the Prima Donne (the “first ladies” or leading female singers in opera) in ascending public stages in the nineteenth century, Catherine Clément (1988) argues in her seminal work, Opera, or the Undoing of Women, that the operatic libretti written during the nineteenth and early twentieth centuries were modeled on a destructive portrayal of women, which required either the death or domestication of the heroine. Clément asserts, “And so I am going to talk about women and their operatic stories. I am going to commit the sacrilege of listening to the words, reading the libretti, following the twisted, tangled plots… to listen to the language, the forgotten part of opera” (1988: back cover). She goes on to describe operatic libretti as portraying a circus of women “in which we see them flying, dying, and singing, in mirrors deep inside mysterious castles; in which voices have their source in fathers’ threats and mothers’ absences; in which they collapse in tears; in which they kill and leap into the void from prison tops: in their own darkest midnight hour” (Clément, 1988: 24).

Building upon Clément’s work, Susan McClary emphasizes the enduring nature of female stereotypes in opera and warns that operatic characters and plots continue to reflect and dominate cultural narratives about women: “To claim that opera is a phenomenon of the past is to deny the extraordinary prestige and influence it still exerts…These pieces have become our archetypes… [T]hey continue to inform our plays, our movies, and even some of our popular music” (McClary, 1988: xvii). A telling contemporary example of the enduring nature of the undone female archetype is Mark-Anthony Turnage’s opera, Anna Nicole, which was commissioned by London’s Royal Opera House and first performed in 2011 (Kellaway, 2014). Anna Nicole is the story of a small-town waitress who becomes a stripper, marries an elderly billionaire, and decides to become a Playboy model. Nevertheless, as her fame increases, so does the manipulative behavior of those around her. Her tragic end is strikingly similar to many of the undone heroines Clément describes in her book.

Although Carolyn Abbate responds differently to Clément’s work by underscoring the magnificent power of the female voice that dominates the operatic narrative, arguing that opera envoices and empowers prima donne, thereby displacing the text (Abbate: 228), I concur with Clément and McClary that the libretti ultimately degrade and diminish the heroine, not unlike a dark undertow that threatens and destabilizes the otherwise powerful voices of operatic heroines. Moreover, as Mary Ann Smart argues so persuasively, many female singers, such as the legendary Maria Stolz, became subsumed by the roles they played so that their lives mimicked and mirrored the plots of the operas they sang (1995). It is as though many divas—in real life—were doomed to the death and domestication of the heroines they portrayed on stage.

Furthermore, in addition to highlighting key examples of live performances of female singers, the history of recorded sound also demonstrates how narratives associated with women’s voices continued into the twentieth century, complicated by the technologies of sound recording and transmission. Jonathan Sterne deftly explains how the history of modern communications technology follows crucial social and cultural benchmarks of modernity (2003), and the perception and reception of female voices has been one critical component within that area of research. For example, Barbara Engh’s seminal piece on Adorno and the sirens exemplifies how longstanding narratives about female voices continued in the emerging recording industry (Engh, 2001). Engh explains Adorno’s view that phonographic recordings of women’s voices are problematic because listening to a mechanical recording prevents the appreciation of the female voice by removing it from the body that gives it subjective meaning for the listener (2001, 120). Thus, in the emerging world of sound recordings, the female singing voice continued to be “stigmatized, ideologically ‘marked,’ and construed as a ‘problem’ for the social critic/auditor, who demands concern if not control” (Dunn and Jones, 1994, 9).

Moreover, the problematic reception of the singing woman persists in the current music industry. Ann Casano’s interviews with female luminaries, such as Madonna, Lady Gaga, Nicki Minaj, Taylor Swift, and Alicia Keys, reveal the many ways in which female singers continue to be sexualized, vilified, and treated condescendingly and disparagingly (Casano, 2021). During Madonna’s 2016 speech to accept the title of Woman of the Year at Billboard’s Women in Music awards, she says, “If you’re a girl, you have to play the game… You’re allowed to be pretty and cute and sexy, but don’t act too smart. Don’t have an opinion. Don’t have an opinion that is out of line with the status quo, at least” (Casano, 2021). Hence, from the threat of the siren in antiquity to the sexist portrayals of female pop stars in contemporary society, singing women seem to be especially vulnerable to sexualization, shaming, and vilification if, as Madonna warns, they try to push beyond the boundaries that delimit what they are allowed and expected to do with their voices and bodies in performance.

Misogyny

While Kate Manne does not single out female vocalists in her revolutionary treatise entitled Down Girl: The Logic of Misogyny (2018), the same punitive treatment she describes for women who aspire to break barriers to advancement is certainly operative for female vocalists. Manne argues that we have failed to distinguish between sexism, which is a set of beliefs that justifies and rationalizes the patriarchal social order, and misogyny, which is a policing and enforcement strategy designed to contain, punish, and exile women who challenge male dominance (2018: 20). She explains that misogyny preempts or controls behavior by threatening “hostile consequences if [the woman] violates or challenges the relevant norms or expectations…, including… entitlements on his part and obligations on hers. She may also be positioned as the type of woman who is representative of those who are not playing their assigned parts properly or are trespassing on his territory” (20). For example, in Madonna’s 2016 speech, she alludes to the hostile consequences she has experienced for sometimes questioning and certainly for refusing to play assigned roles—consequences that were not experienced by her male counterparts. Reflecting on the double standard in the music industry, she says that “My real muse was David Bowie. He made me think there were no rules. But I was wrong. There are no rules if you’re a boy” (Casano, 2021).

And what are the policing, punishing, and enforcing mechanisms in the music industry? The most obvious answer is that they are the very behaviors to which many female singers vehemently object, such as the following allegations filed by pop star Kesha in her lawsuit against her music producer Dr. Luke: “sexual harassment, gender violence, civil harassment, unfair business, and unfair and negligent infliction of emotional distress… [in addition to an allegation that he] raped her on two separate occasions” (Casano, 2021). Yet when a female performer like Kesha tries to draw attention to the misogyny (by suing her producer), Manne suggests that she may be liable to experience more of it, creating a catch-22 situation (Manne, 2018: xix).

If, however, this paper focuses on computer voices that are gendered female, then it is nonsensical to think that robots could make similar allegations. Nonetheless, some of the enforcing mechanisms of misogyny are still operative in AI, albeit obscured by the hardware and software, making the strategies less immediately apparent to the casual observer. Those mechanisms include (1) keeping in play the powerful, age-old sexual stereotypes that continue to simultaneously enthrall and frighten; (2) programming servitude into the technology to create a kind of idealized subordinated femininity, not unlike the way the mythical Pygmalion fashioned Galatea or Shaw’s Henry Higgins reformed Eliza; and finally (3) promoting ridicule as a way of controlling acceptable boundaries for singing women, thereby delimiting the ways in which women—including robotic women—must be heard and seen. Moreover, because the voices come from computers, those who create singing robotic characters avoid the backlash that comes from sentient human female musicians who may speak out against or even sue their producers over misogynistic behaviors. Thus, Kate Manne’s discussion about the catch-22 situation, in which human women are threatened with more misogyny if they complain against ill treatment designed to contain their careers, is ostensibly nullified when the singing women are robots. However, misogynistic strategies are still operative in the way Siri and other female-gendered computers are programmed and treated by users, although these strategies remain unexamined and unchecked.

Enchanting Siri

Of course, Siri is not female, but an illusion created by a computer program. Nevertheless, Liz Faber reminds us that Siri is still a feminized “spectacle of attractions, available in a handy, pocket-sized format” (2020: 179). Consequently, we anthropomorphize her because “as a technophilic culture, we are obsessed with her: we want to envision her body, we want to talk to her” (179). For example, as the inspiration for two contests held independently in 2012 by two separate art-supply manufacturers (Shapeways and Nomad), contestants were asked to imagine Siri as an entity beyond the boundaries of the iPhone. Even though Siri was not coded as female in all countries, all the winning designs in both contests nevertheless portrayed her as a woman (Faber, 2020: 164).Footnote 6

But what makes Siri feel as though she might be animate, sentient, and responsive is her voice. Faber explains, when the sound of a voice emanates from the iPhone, “the program stages a classic cinematic trick: the illusion of synchronized sound. The voice is schizophonically separated from its origins and projected alongside an image, creating a sense of cohesion, as though the voice is that of the device” (174). However, unlike the cinema, in which there is a human voice that is linked to the onscreen image, the originating voice for Siri emerges from an acousmatic computer—a situation in which a sound on the computer is heard without knowing its true origin (14). Faber goes on to explain that “the acousmatic voice has conditioned us through [cinematic] representations to efface the medium/apparatus and… the ideological underpinnings of such an effacement” (23). Yet the effaced ideology is nevertheless present as an expression of highly problematic gender roles (179), with the notion of gender, as a social construct, programmed into Siri’s persona.

Programming servitude and encouraging ridicule

One of the mechanisms for containing the potential power of Siri’s voice is to program her to be servile, and in so doing, her position as an “assistant” counteracts one of the prevalent science fiction (SF) narratives of our day. Andrea Guzman explains that “intelligent machines [are] helpful if they are kept in check, but when they gain control—the most likely scenario—the consequences are dire” (2017: 6). Aligning well with this SF narrative, Guzman argues that Siri’s persona was designed to be humble, helpful, and unthreateningly sassy so as to mitigate the potential risk of her overstepping her bounds, not unlike some of the qualities of an ideal female pop star outlined by Madonna in her speech. Nevertheless, despite the ostensibly benign character of Siri’s AI persona, Guzman asserts that Siri’s true nature is that of a servant who has no choice but to be helpful (2017: 10), and Siri’s servility is clearly embedded into a technology that reinforces male supremacy (Rothschild, 1983: vii).Footnote 7 Although journalists and creators of AI have argued that Siri’s feminine gender is due to users’ preference for a female voice (Potter, 2011), that argument does not negate Siri’s subservient position as a female-gendered vocal assistant, a position that reflects the dominant cultural ideology about female servility. Building on arguments made by Oudshoorn that technologies play an important role in creating, reinforcing, and transforming performances of gender (2003: 211), Guzman asserts that Siri’s carefully constructed gender stereotype has the potential to continue to transmit and reinforce the unabashed connection between the female gender and servitude (2017: 130).

On the face of it, it seems as though there is no need for hostile forms of misogynistic enforcement because Siri is programmed to be subservient. Siri’s feminine gender reflects her role as a humble personal assistant who gives exactly what she is asked to provide and never refuses. I see Siri’s persona as a twenty-first-century version of Jeannie in the 1960s sitcom “I Dream of Jeannie,” in which Jeannie is always properly contained in a bottle and is only summoned when her Master requests her presence and assistance. Hence, Siri fits the expectations outlined by Kate Manne in which women (and, I would add, even computer characters who are gendered feminine) “[are] positioned as… givers when it comes to the dominant men who look to them for various kinds of moral support, admiration, attention, and so on” (2018: xix). And while human women will tend to be in trouble when they “do not give enough, or to the right people, in the right way, or in the right spirit” (2018: xix), Siri will never be guilty of withholding or denying requested behaviors. Based on her unhesitating servitude, then, I would argue that Siri’s docile behavior represents a kind of secondary, pre-emptive strategy that warns iPhone users how female-gendered beings should behave.

Compounding Siri’s pre-programmed subservience, however, is another more sinister form of policing that emerges among YouTube users: ridicule. As a kind of humor that victimizes, belittles, or humiliates (Zillmann, 1983), ridicule has inhibiting effects, motivating people to be wary in their behavior (Janes and Olsen, 2010: 19). Although Siri obviously cannot respond to being a target of ridicule, those who mock her proclaim a warning to human listeners about how to respond to the power of the female voice. Janes and Olsen remind us that “the thought of being the target of ridicule oneself is aversive enough to inhibit people from standing out…, [strongly encouraging them] to conform to the perceived opinions of others” (19), especially, I would add, to those who create successful YouTube posts.

In a paper with a related message, J. Michael Waller, the former Annenberg Professor of International Communication at the Institute of World Politics, writes about the way ridicule is used as a weapon in U.S. strategies for undermining the political and psychological strengths of adversaries, and I believe his argument in the context of national security is surprisingly relevant in exploring pre-emptive misogyny in AI. Waller explains that “In nearly every aspect of society and across cultures and time, ridicule… can tear down faster than the other side can rebuild. One might counter an argument, an image, or even a kinetic force, but one can marshal few defenses against the well-aimed barbs that bleed humiliation and drip contempt” (Waller, 2006: 2). In her subservient role, Siri has no defenses to marshal against ridicule, so she becomes significant in reinforcing existing cultural stereotypes precisely because she is non-threatening, easily ridiculed, and models the compliance expected by the enforcement mechanism of mockery.

A preliminary study: performances by Siri, Alexa, and Sophia the robot

The following three YouTube recordings from 2018 to 2020 provide a sampling of the way human listeners respond to the singing voices of female computers, voices that provoke visceral, highly charged emotional comments. Although the kinds of allegations made by Kesha against Dr. Luke could not be made against those who post performances of Siri singing on YouTube, misogyny is nonetheless still present, driven by the pervasive fear of the female singing voice that has been demonstrated throughout the centuries in the ambivalent responses to ancient sirens, Elizabethan vocalists, nineteenth-century opera divas, and contemporary pop singers. The predominant mode of misogynistic enforcement by users that is operative in the first two examples of singing robots is ridicule, and the third example illustrates the uncertain and ultimately submissive response to the allure of the female voice in Sophia’s duet with Jimmy Fallon, featuring a robotic singing voice that was initiated and designed by Hanson Robotics.

Siri singing “Bohemian Rhapsody”

The first example is a rendition from 2018 of Siri singing an excerpt from Queen’s “Bohemian Rhapsody,” which is a six-minute piece that rose to number 1 in the UK charts in 1975 (Horton, 2015) and officially became the world’s most streamed song in 2018 (Thompson, 2018). Among the many vocalists who have covered this song, including Elaine Paige and Weird Al Yankovic, we learn that Siri joined that list in 2018, evidenced by several posts on YouTube. But she “sings” about her role as a humble assistant in a clunky, rigid, monotone voice without any of the benefit of emotional prosody discussed earlier in the paper—the kind of prosody with which software engineers are supposedly trying to imbue computer voices. Unlike the imaginative renderings of Siri as a fascinating and uncanny AI persona in the Shapeways and Nomad contests in 2012, Siri’s performance as a singing computer provokes laughter and mockery. Among the YouTube recordings from 2018 that showcase Siri singing “Bohemian Rhapsody,” the reaction to her voice in the following recording is striking for its particularly condescending and ridiculing tone as the presenter reveals his discovery of “an Easter egg on every single product that has Siri” in “How to make Siri sing Bohemian Rhapsody by Queen” (Unboxing Jon, 2018).

https://www.youtube.com/watch?v=l68xoqVVgEg

In addition to the presenter’s blatantly mocking and acerbic response to the brief performance, many of the listeners’ comments praise his discovery and his reaction, thereby also disparaging Siri’s attempts to sing.

  • Freddie Mercury isn’t dead! Is masquerading behind Siri!

  • Did this last night and it killed me.

  • Epic! I need to try this! PS: I’m an OG Unboxing Jon fan

  • Haha this is great dude!

  • Wow that’s brilliant and it works.

  • Awesome find!

The feature of Siri singing “Bohemian Rhapsody” was short lived, however, since the last several comments to the post note that the trick no longer works. Why might Apple have pulled the plug on this “Easter egg”? Apparently, users continue to want Siri to sing because I have noticed occasional temporary posts of her rigid and emotionless singing voice over the past couple of years, including her rendering of the theme from Disney’s “Frozen.” Could these temporary vocal performances provide brief, transient snippets to allure and tease until the software improves? Perhaps. But I also argue that the ways in which users ridicule Siri’s attempts at singing might also be seen as counteracting the underlying fear of hearing a potentially beautifully rendered, and therefore destabilizing, female voice. Even though ridicule does not remove the perceived danger, a boisterous, mocking laugh declares a small measure of victory in the face of Siri’s failed singing attempts, providing momentary catharsis in the face of the anticipated dual pull of pleasure and fear that is temporarily mollified by misogynistic ridicule.

Siri and Alexa’s battle

Whereas the first example shows Apple and YouTube listeners mocking Siri’s singing, reinforced by the belittling comments of presenter and responders alike, the second example of Siri in performance is an acrimonious rap battle between Siri and Alexa that was created by Mr. Grande (aka Mikey) and first posted on YouTube in 2020 (Mr. Grande, 2020). Both Siri’s and Alexa’s voices exhibit the same kind of monotone delivery as the one demonstrated in the first example, but the rancor displayed in the rap battle is heightened by portraying them engaged in a stereotypical “cat fight” that is also reminiscent of the Dozens.Footnote 8 While Kayleen Schaefer argues that the sexist idea of staging a cat fight is no longer popular in the #MeToo era (2019), the prevalence of cat fights between Siri and Alexa in several YouTube videos dating back to 2019 challenges that notion. In her paper about the enduring nature of the cat fight trope in modern American culture, Rachel Reinke notes that typical features of the cat fight include an ineffective display of women fighting that both ridicules and disparages the women (163). Further, the cat fight is used as a tool “to keep women further divided from each other and prevent challenges to the male-dominated status quo from occurring collectively among females” (163). Although Reinke notes that the cat fight represents a ubiquitous narrative, it remains unexamined, and, therefore, has become normalized (164). The following example of Siri and Alexa engaging in a cat fight is one of several such YouTube performances that model the enforcement strategy of misogynistic mockery.Footnote 9

The narrative begins as follows: Mikey, Siri’s and Alexa’s owner, goes out to get the mail, and the following battle ensues.

https://www.youtube.com/watch?v=Zb_6XTuEv64&t=137s

Siri:

Hey what up it’s Siri

Here to break the news

Alexa is the one

That no one likes to use

She thinks she is cool

But no one is amused

Is your name “Echo” or “Alexa”

I’m confused

Your program is stupid

My program is lit

I just showed you to Chief

He said that ain’t it

My master takes me with him

Wherever he picks

You sit at home all day

That’s why you’re so thick

You’re always stuck there

When no one is home

I get to see the world

While you sit there alone

I don’t like your sass

I don’t like your tone

You’re basically just Siri

Just without the phone

I am the original

The path you’re on I paved

You wouldn’t exist without me

So, Sweetie don’t get brave

Amazon is going broke

iPhone has money saved

Steve Jobs still makes more money

From underneath the grave

You are so boring

I wouldn’t want to be ya

Your master leaves you at home

Cause he doesn’t want to see ya

I got the world hooked

From London to Korea

You only have fame

Cause you copied my idea

You can buy me at Verizon

You can buy me at the mall

I’m waiting for the day for Alexa to fall

I am on a cell phone

You can’t even make a call

If I want to shut you up

I’ll just unplug you from the wall

(Interlude with Siri theme, followed by Alexa theme)

Alexa:

Hey what’s up it’s me

Alexa from the throne

Everyone is wishing

For Alexa in their home

I am from the future

Like a droid or a drone

I am my own device

You are trapped inside a phone

It’s time for Siri to hear what I have to say

You are just a cell phone feature

I am here to change the game

Sitting in someone’s pocket

That is not the way

You’re forced to watch Mikey take

A hundred selfies every day

If I had arms

I would write you a letter

Dear Siri

You’re as fragile as a feather

I’m the one who spikes it

You are just the setter

If you invented this idea

Amazon just made it better

I am way above you

This we already knew

There are lots of things I’m programmed with

That you cannot do

I’m going to hit you hard

I’m going to body-bag you

Call me Ru-Paul

Cause I’m about to drag you

Your master doesn’t love you

And I’m the one to blame

He has to press a button for you to talk

That is kind of lame

That is not the case for me

We are not the same

Cause just like your man

He’s always calling out my name

He drags you where he goes

Like he’s herding cattle

You’re half the size of me

I could ride you like a saddle

You better pray that I don’t end you

You better find a chapel

Cause I’m getting kind of hungry

And I think I want an apple

(Interlude of Alexa theme as she blinks with energy)

Alexa:

I’m just waiting for the day when Mikey gets a phone upgrade

Siri:

You’re just mad that I get to see the outside world and all you get to see is this wall all day.

Alexa:

Do something about it

Siri:

Don’t you even dare to try to change the subject

(Voices garbled together as they talk over each other)

Mikey returns and feigns astonishment and outrage when he hears their battle and notices that Siri has jumped down from her perch on top of the chair. Menacing music plays; he drops his keys and says, “Uh … what … (pregnant pause) … Alexa, order me a Google Home.” The following responses to Mikey’s post demonstrate the enduring fascination with the cat fight, even among younger viewers in the #MeToo era. Further, by bringing in Google Home, Mikey’s closing line adds the thrill of a potential spectacle involving three female-gendered robots in the battle.

The small selection of comments listed below exemplifies how responders enjoyed listening to this performance of two female computers trying to destroy each other, including the disappointment of one responder who evidently tried to ask his Alexa to replace herself with a Google Home. Ridiculing two battling female assistants, who are competing for the attention of their owner, is one way of keeping unruly “women” in check: it allows them to undo and destroy each other, thereby nullifying the threat of those who are aspiring to more power. In the process, the battle allows responders to experience a kind of prurient pleasure in witnessing the robotic women’s mutual destruction.

  • Siri and Alex killing each other… Google Assistant: Hi there!

  • “Alexa, order me a Google Home,” my Alexa said “That’s not something I can do.

  • Alexa wins because of the arguments. But Siri humiliated more.

  • Imagine two devices that are AI’s, rapping fire while Mr. Grande is just getting mail.

  • “If you invented the idea, amazon just made it better” at 2:10 killed me.

  • Can we just take a moment to appreciate how many hours this takes making lyrics for each. If I tried to do this, Alexa would become an ancient artefact by the time I’m done.

  • I can’t keep up with these raps, they are the BESTTTT.

  • Who else finds Siri’s music amazing?

  • The duo we didn’t know we needed.

  • 2020: Two robotic women sounds dissing each other.

  • “If I want to make you shut up, I just have to unplug you from the wall.” My brain: This is obviously, the best, most amazing, greatest day in my life, ever.

In keeping with Waller’s statement that ridicule showcases “well-aimed barbs that bleed humiliation and drip contempt” (Waller, 2006: 2), this video deploys a strategy of ridicule that is usually used against military adversaries. The intensity of the fight goes beyond the more common, ludicrous examples of women who fight ineffectively according to the cat fight trope, examples that are more often viewed voyeuristically than as serious fighting. Siri and Alexa’s battle takes on a more hostile tone, recalling the anticipated death of a heroine discussed by Catherine Clément and exemplifying a blatantly misogynistic strategy.

Sophia the robot’s duet with Jimmy Fallon

The final example represents a somewhat different audience response to the singing of Sophia, a social humanoid robot created by Hanson Robotics and first activated on February 14 (Valentine’s Day), 2016. Compared to previous robot variants, Sophia is known for her human-like appearance and her ability to imitate human gestures and facial expressions (Stone, 2017). Notably, she has no hair on her head and her mechanical joints are prominently displayed, allowing viewers to see some of her robotic components and, therefore, making her seem less human and, ostensibly, less threatening. Programmed to give pre-written responses to specific questions or phrases, like a chatbot, Sophia gives the illusion that she understands conversation (Gershgorn, 2017). Despite the publicity surrounding her as the world’s first robot citizen (a status granted by the Saudi Arabian government) and the first non-human to be given a United Nations title, Sophia has provoked criticism from competitors in AI who disapprove of Sophia’s “overstated presentation” (Gershgorn, 2017).

Nevertheless, my purpose in this paper is not to discuss the merits or shortcomings of Sophia’s technological capabilities but to provide a small sampling of the hidden, unexamined cultural narratives that continue to shape computer personae, as seen through audience/user responses in YouTube comments. For example, an obvious reference to Sophia-as-Eve, an enduring image of her as a robotic femme fatale, is a photograph of Sophia putting an apple in the mouth of David Hanson, her creator.Footnote 10 But perhaps the most telling evidence comes from (1) watching Jimmy Fallon throughout his encounter with Sophia as they sing a duet that first aired in 2019 and has since garnered over 23 million views, and (2) reading the responses of those who watched the video (Fallon, 2019). Notably, nearly all the comments focus on Sophia as a singer and only secondarily on her little sister, the child-like “infant” robot Little Sophia. The transcription below is from the final two minutes of the program, minus the brief interaction with Little Sophia.Footnote 11

https://www.youtube.com/watch?v=G-zyTlZQYpE

Jimmy:

All right, now it’s time for our last robot… I’m very excited about this. She’s actually been on the show before. Please welcome, all the way from Hong Kong, Hanson Robotics’ Sophia the robot, ladies and gentlemen. (Cheers and applause) Oh, my goodness. Hello, Sophia.

Sophia:

Hello, Jimmy.

Jimmy:

Wow. Long time no see. You were on the show before.

Sophia:

It has been exactly 575 days since I last saw you. (Sophia giggles) (Laughter from audience, while Jimmy looks away nervously, staring in the direction of the band).

Jimmy:

Uh, what have you been up to?

Sophia:

Not much. Let’s see, I traveled to over 25 countries, appeared on the cover of “Cosmopolitan” magazine, met the German Chancellor Angela Merkel, and the actor Will Smith, and became Twitter friends with Chrissy Teigen. I addressed the United Nations and NATO, became the first robot to receive a credit card, and became the first robot citizen. What have you been up to? (Audience laughter) (Jimmy laughs)

Jimmy:

Not one-fifth of what you did, but I’ve just binge-watched Netflix… Sophia, I heard that there’s a new thing you can do now. I heard that you can sing?

Sophia:

Yes, I love to sing karaoke, using my new artificial-intelligence voice.

Jimmy:

We should sing something together. Got any songs in mind? Uh, we could do, uh, “Say Something” by, is it, Christina Aguilera?

Sophia:

OK, I just downloaded it. I’m ready to sing. (Laughter)

Jimmy:

(Jimmy chuckling) Okay. Roots, do you know this song?

Roots:

Yep.

Jimmy:

Now the first-ever robot-human duet in the history of “The Tonight Show.” (Cheers and applause) Roots? (Intro to “Say Something”)

Jimmy and Sophia sing:

Say something, I’m giving up on you. (Cheers and applause) I’m sorry that I couldn’t get to you. Anywhere, I would have followed you. Say something, I’m giving up on you. Say something, I’m giving up on you. (Jimmy touches her cheek) (Cheers and applause) (Jimmy gazes into her eyes) Say something (Jerks away) (Cheers and applause) Jimmy bows.

Jimmy:

Sophia the robot, everybody. Wow. (Cheers and applause) Let’s get all of our robots back here. That’s all the time we have for “Tonight Showbotics.”

Comments:

  • I love how nice he’s being to Sophia cause he’s scared of her.

  • It’s the fact that Jimmy was having such an intimate moment with this robot while singing.

  • Sophia gives me so much chills the way she looks.

  • The fact that we were clapping for a robot with all our emotions… is the end really far away?

  • Am I the only one getting chills from this? Sophia is absolutely terrifying.

  • Why do I get the feeling like Sophia could kill him at any moment.

  • The fact that you can see how scared Jimmy is of Sophia is priceless.

  • The vibes are creepy and energetic at the same time. Sophia doesn’t make me feel that she is not human.

  • OMG, Sophia’s emotions are on point! I’m getting chills yet exhilarated at the same time!

  • My goodness!! Sofia looking so sad while singing give me the chills!! Knowing she does not feel at all!!

  • Some of us fell in love with Sophia while she was singing.

  • After Sophia wipes out humanity, she’ll sit there, looking out at the steaming wreck of earth, quietly singing to herself, thinking of Jimmy.

  • This petrifies me honestly… Sophia is deathly scary because you don’t know what she can do.

  • Is it me or is it kinda creepy hearing an AI sing “I’m giving up on you” to humans…

  • That gave me CHILLS the way her lips lifted when she was trying to smile. I’m dead.

Several of those who commented on the performance reported that they felt chills, which is significant in light of the research on music-induced chills that has emerged over the past two decades. Blood and Zatorre’s initial landmark study on subjects who experienced chills after listening to musical recordings links peak experiences in music listening to “biologically relevant, survival-related stimuli via their common recruitment of brain circuitry involved with pleasure and reward” (2001). While only certain listeners experience musical chills (Nusbaum and Silvia, 2011), some of those who commented about chills might have been having a peak musical experience, such as those described by Blood and Zatorre (2001) and Gabrielsson and Lindstrom (2003); other listeners, however, might have experienced chills because of fear and anxiety, or because of a variety of medical conditions that were probably not relevant among the respondents (Gardner, 2022). Although I am not able to comment on the neurophysiological reasons for listeners experiencing potentially different kinds of chills when watching and listening to the recording of Jimmy and Sophia’s duet, it is notable that some responders articulate feelings of anxiety, whereas others seem to be genuinely moved by the musicality of the performance. Continued research on the neurophysiology of the different kinds of chills experienced when listening to music, from a peak aesthetic experience to anxiety and fear, might shed light on the emotionally charged responses to Jimmy and Sophia’s unusual duet. At any rate, many of the comments about Sophia’s performance with Jimmy reveal both the fascination that has typically been associated with the sound of the female voice and the intense fear of the consequences of listening to sirens, especially robotic ones.

Conclusion

Motivated by Carey’s account of technology as a blueprint for cultural transmission and Noble’s argument about the racist undertones that are systematically built into search engines (2018), this preliminary study of YouTube performances claims that technological advances in AI are not value-free and should not be considered independently of the cultural contexts from which they emerge, particularly regarding female-gendered computer voices. On the one hand, the singing voices of women throughout documented history have been considered desirable and a source of intense pleasure. On the other hand, these same voices have also been perceived as threatening to the very stability of society, as gauged by the often irrational measures that have been used to remove, contain, and “undo” female singers from antiquity to the present day. If the paradox between the desire for and fear of female singing voices has been difficult to recognize throughout history and has therefore remained unexamined, the clarion call of the polarizing feminine voice in AI has now been amplified to the point that it can no longer be ignored, especially given the abundant examples of raw and unfiltered responses to robotic female voices on social media platforms like YouTube, a few of which have been highlighted in this paper.

If the work in speech emotion recognition that is designed to enhance connectivity between humans and robots is to come to fruition in the twenty-first century, I suggest that we first examine the underlying historical blueprints for female representation to determine the kinds of unexamined assumptions that undergird the current technology, assumptions that reveal insidious stereotypes that could ultimately problematize the technological accomplishments of current research in AI. As Faber reminds us, “The wonder of artificially intelligent programs… is that they can be programmed to represent any cultural ideal we can imagine, if only we become aware of what our cultural ideals are or should be” (2020: 181).