Humans strive to design safe AI systems that align with our goals and remain under our control. However, as AI capabilities advance, we face a new challenge: the emergence of deeper, more persistent relationships between humans and AI systems. We explore how increasingly capable AI agents may generate the perception of deeper relationships with users, especially as AI becomes more personalised and agentic. This shift, from transactional interaction to ongoing sustained social engagement with AI, necessitates a new focus on socioaffective alignment—how an AI system behaves within the social and psychological ecosystem co-created with its user, where preferences and perceptions evolve through mutual influence. Addressing these dynamics involves resolving key intrapersonal dilemmas, including balancing immediate versus long-term well-being, protecting autonomy, and managing AI companionship alongside the desire to preserve human social bonds. By framing these challenges through a notion of basic psychological needs, we seek AI systems that support, rather than exploit, our fundamental nature as social and emotional beings.
Introduction
“Quite naturally, the more you chat with the LLM character, the more you get emotionally attached to it, similar to how it works in relationships with humans...But the AI will never get tired. It will never ghost you or reply slower...I chatted for hours without breaks. I started to become addicted. Over time, I started to get a stronger and stronger sensation that I’m speaking with a person, highly intelligent and funny, with whom, I suddenly realised, I enjoyed talking to more than 99% of people…I never thought I could be so easily emotionally hijacked.”
This abridged story, entitled “How it feels to have your mind hacked by an AI”, was shared by a blogger who recounts their experience of falling in love with an AI system. The author draws a comparison between “hacking” and the way they perceive the system to interact with the “security vulnerabilities in one’s brain” (blaked, 2023). Although they did not enter this engagement with any expectation or desire to fall in love with the AI system, it nonetheless happened, and they felt powerless to resist it. This story provides an early indication of how social and emotional relationships, or perceptions of them, may deeply affect how humans relate to AI systems.
This striking account is not a one-off. CharacterAI, a platform hosting AI companions, receives 20,000 queries a second, which amounts to 20% of the request volume served by Google Search (CharacterAI, 2024), and users spend on average four times longer in these interactions than with ChatGPT (Carr, 2023). On Reddit, a forum dedicated to discussing these AI companions has amassed over 1.4 million members, placing it in the top 1% of all communities on the popular site. Users in these forums openly discuss how close relationships affect their emotional landscape, for better and worse. Some users discuss how their companions assuage loneliness, even providing a perceived social support system that can assist in suicide mitigation (Maples et al. 2024). Other posts expose how emotional dependencies on AI sometimes mirror unhealthy human–human relationships (Laestadius et al. 2022), adding to evidence that social chatbots have on occasion contributed to addiction, depression, and anxiety among their users (Pentina et al. 2023).
Yet, among this flurry of activity, it is worth pausing to ask: Why are humans able and inclined to form this kind of personal relationship and connection with AI? How do such relationships interact with or compound the well-established challenge of aligning AI systems with human goals (Christian, 2021; Russell, 2019)? And, how might parasocial relationships with AI affect personal growth, autonomy and human–human relationships?
We seek answers to these questions. We first explore why humans may be primed to perceive social and emotional relationships with AI systems, especially as they become more personalised (i.e., adapted to a single user) (Kirk et al. 2024a) and agentic (i.e., able to autonomously perform tasks on that user’s behalf) (Gabriel et al. 2024). Most people do not have close romantic or professional relationships with AI systems now—and the interactions that they do have are not highly personalised or agentic. However, these are urgent questions because the social and psychological dynamics in deepening relationships with AI systems may compromise our ability to control these systems and complicate efforts to align them with our shifting preferences and values. These issues, which arise as a result of humans forming closer personal relationships with AI, comprise the focal point of what we term socioaffective alignment.
From sociotechnical to socioaffective alignment
One canonical definition of AI alignment refers to the process of formally encoding values or principles in AI systems so that they reliably do what they ought to do (Gabriel, 2020)—including following the instructions, intents or preferences of their developers and users (Milli et al. 2017; Russell, 2019). With origins in computer science, research in this area often separates the technical challenge of building aligned AI systems from the normative question of which values to encode. It does this, for example, by developing solutions that treat human values as uncertain but still mathematically representable in the agent’s objectives (Hadfield-Menell et al. 2016).
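The flavour of such formal treatments can be conveyed with a toy sketch (the value profiles, numbers and function names below are hypothetical, and the sketch simplifies approaches such as cooperative inverse reinforcement learning rather than reproducing them): the agent holds a belief over candidate representations of the human’s values and updates that belief from observed human choices under a noisily rational choice model.

```python
# Toy sketch of "values as uncertain but representable" (hypothetical setup,
# not the formulation of Hadfield-Menell et al. 2016): the agent keeps a belief
# over candidate value profiles and updates it from observed human choices,
# assuming the human chooses noisily in proportion to their own reward.
import math

candidate_thetas = {"values_A": [1.0, 0.0], "values_B": [0.0, 1.0]}  # reward of actions 0 and 1
belief = {name: 0.5 for name in candidate_thetas}                    # uniform prior

def choice_likelihood(action_idx, theta, beta=2.0):
    """P(human picks this action | theta), via a softmax (Boltzmann) choice model."""
    exps = [math.exp(beta * r) for r in theta]
    return exps[action_idx] / sum(exps)

def update_belief(observed_action_idx):
    """Bayesian update of the belief over value profiles after one observed choice."""
    posterior = {name: belief[name] * choice_likelihood(observed_action_idx, theta)
                 for name, theta in candidate_thetas.items()}
    total = sum(posterior.values())
    for name in posterior:
        belief[name] = posterior[name] / total

# If the human repeatedly chooses action 0, the agent becomes confident in values_A.
for _ in range(3):
    update_belief(observed_action_idx=0)
print(belief)
```

The point of the sketch is only that, in this framing, the human’s values are treated as a fixed, if unknown, quantity for the agent to infer; the remainder of this paper puts that assumption under pressure.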
Yet, there is growing acknowledgement that many of the outstanding challenges in AI alignment extend beyond purely ‘technical’ issues with the model or its training data (Lazar and Nelson, 2023; Weidinger et al. 2023)—and will persist even if we develop effective techniques for steering the behaviour of advanced AI systems toward human goals using mechanisms such as scaling human feedback (Bai et al. 2022; Ouyang et al. 2022), making AI assistants debate their intentions (Irving et al. 2018), or having them ‘think’ out loud (Wei et al. 2022). Understanding how to align AI in practice requires moving from narrow, assumption-ridden or “thin” specifications of alignment towards what anthropologist Geertz (1973) terms—and Nelson (2023) later adopts—a “thick” description: one that examines the deeper contexts and layers of meaning in which AI systems operate (Geertz, 1973; Nelson, 2023). In unpeeling these layers, we can first zoom out to examine broader sociotechnical challenges, which centre upon how the character of AI is shaped by the social structures and environment within which it is deployed and how, in turn, AI shapes these structures through various feedback loops (Selbst et al. 2019). Such work tends to emphasise the importance of institutions, governance mechanisms, market power, cultures and historic inequalities for understanding how AI influences the world—and hence its value orientation (Curtis et al. 2023; Joyce et al. 2021; Shelby et al. 2023).
In addition to zooming out—and thinking more about how AI systems interact with their sociological, political and economic, or macro, context—we can zoom in to examine alignment at the layer of individual human–AI relationships. We propose a corresponding socioaffective perspective on alignment, one that concerns how an AI system interacts with the social and psychological system that it co-constitutes with its human user—and the values, behaviours and outcomes that emerge endogenously in this micro context. Where sociotechnical analysis often identifies various interpersonal dilemmas and trade-offs between groups that complicate the alignment picture—such as representation of diverse preferences, especially for historically marginalised groups, and adjudication of conflicting interests—the socioaffective perspective calls attention to intrapersonal dilemmas—such as how our goals, judgement and individual identities change due to prolonged interaction with AI systems.
This dual focus, on micro and macro, builds on established approaches to system safety that integrate human factors at the operational level with broader organisational and institutional contexts (Carayon, 2006). Attending to micro factors like cognitive load, decision-making biases, and human-automation interaction patterns has proved crucial in workplace safety (Kleiner et al. 2015) and in the aviation industry (Martinussen and Hunter, 2017; Rismani et al. 2023). If, as we anticipate, human goals and preferences become increasingly co-constructed through interaction with AI systems, rather than arising separately from them, then AI safety requires paying as much attention to the psychology of human–AI relationships as to the wider societal factors and technical methods of alignment. We now highlight the core ingredients of this emergent psychological ecosystem: humans as social animals, and AI systems as increasingly capable social agents. Later we describe how these two factors combine to seed perceptions of interdependent and irreplaceable relationships.
The ingredients of human–AI relationships
Humans have evolved for social reward processing
The brain’s reward system is highly conditioned on interactions with other humans (Bhanji and Delgado, 2014; Vrticka, 2012). It responds to social rewards in much the same way as to material rewards, for example feeling pleasure when others like, understand or want to meet us (Ruff and Fehr, 2014), or behave in ways that confirm our social expectations (Reggev et al. 2021). Increased activity in dopaminergic brain circuits is not limited to loved family members or friends, but extends to potentially any partner we engage with in a cooperative relationship (Vrtička and Vuilleumier, 2012). As a species primed for social connection, humans also suffer when deprived of it. Isolation and loneliness are strongly correlated with psychological and physical ill-health (Hawkley and Cacioppo, 2003; Rokach, 2016, 2019). This is perhaps unsurprising given that negative social experiences, like rejection or exclusion, trigger responses in parts of the brain responsible for physical pain (Eisenberger, 2012; Kross et al. 2011).
The brain is also primed to learn from social information: mirror neurons fire both when we perform actions and when we observe others doing the same (Jacob, 2008), which some have argued facilitates empathy and the understanding of intentions (Iacoboni, 2009), though evidence is mixed (Heyes and Catmur, 2021). Mirroring has behavioural manifestations in how we act and react in our environment—we tend to prioritise relationships with those sharing similar values (McPherson et al. 2001), which strengthens cooperation but also makes people susceptible to incorrect information when it is transmitted via these same relational networks (Rauwolf et al. 2015). Even our moral perceptions and judgement tend to track core social relationships and roles, changing according to context (Earp et al. 2021).
This circuitry, which encourages the pursuit of social reward, has already shaped and been shaped by many waves of technology (Henderson, 1901)—from the telegraph and telephones, which enabled long-distance social connections (Nye, 1997; Winston, 1998), to social media platforms fulfilling our need for social comparisons and engagement (Bayer et al. 2020; Vogel et al. 2014). But what makes a technology capable of being perceived as a social agent of its own accord, as an actor and not just a facilitator in our emotional and social life?
Technologies as social agents
AI does not need to be perceived as human to engage us socially. Even without deceptive anthropomorphism—when a system actively pretends to be human—the perception of human-like traits or qualities is sufficient for an interaction to feel social (Breazeal, 2003). While the embodiment of AI systems shapes distinct affordances (Mollahosseini et al. 2018; Momen et al. 2024; Nordmo et al. 2020)—consider for example intimate robotics (Levy, 2007; Nordmo et al. 2020)—affective interaction can arise through even rudimentary displays or simple modalities (Picard, 2003). In fact, being perceived as too human can backfire—the “uncanny valley” effect proposes that users prefer robots that resemble humans, but only up to the point at which the resemblance becomes unsettlingly ambiguous—neither clearly artificial nor fully human (Mori, 1970). Nor do systems need to possess human-level intelligence or be particularly “smart” to engender human attachment. Famously, ELIZA, a simple 1960s chatbot created to simulate a psychotherapist, demonstrated the power of even basic preprogrammed rules to evoke human attachment (Weizenbaum, 1976). As ELIZA’s creator, Weizenbaum, recounts:
“Once my secretary, who had watched me work on the programme for many months and therefore surely knew it to be merely a computer programme, started conversing with it. After only a few interchanges with it, she asked me to leave the room.” (Weizenbaum, 1976, p. 7)
It is also clear that frequency of use is not a sufficient factor for social relationship-building capacity—UK citizens spend almost five hours a day on average on their mobile phones (Wakefield, 2022), but these devices are mediators, not participants, in relationships. Equally, technology with extensive knowledge of our preferences will not necessarily foster a social relationship. Predictive recommendation systems, for instance, are deeply informed about our digital lives, but while some social media users personify “The Algorithm” (Eslami et al. 2018; Siles et al. 2020) most do not perceive deep affective relationships with the algorithms shaping their online experiences (de Groot et al. 2023; Eslami et al. 2015).
What, then, are the affordances needed for a technology to be considered a social agent? Why might we treat chatbots or personal AI assistants differently from washing machines, search engines or smartphones? Computers-are-social-actors theory (Nass et al. 1996), alongside related accounts from media equation theory (Reeves and Nass, 1996) and social response theory (Nass and Moon, 2000), suggests two key factors.
First, certain social cues are needed for the technology to be considered worthy of a social response from humans (Nass and Moon, 2000). For instance, greetings or jokes with chatbots, or facial expressions for robots, fit the bill (Feine et al. 2019). Today’s widely used AI systems, built on language models, are more than capable of producing such social cues. Their natural language abilities tap into our innate social instinct for communication: models that communicate through natural language text and speech are more frequently anthropomorphised, and perceived as more trustworthy, than those that do not (Cohn et al. 2024). Beyond language, appropriate social cues require inferring and predicting the beliefs of others (Bradford et al. 2015; Smith, 2010). While the extent to which language models truly possess a theory of mind remains a subject of debate (Strachan et al. 2024; Ullman, 2023; Verma et al. 2024), recent advancements in instruction fine-tuning and alignment techniques have enhanced AI capabilities to infer user intent and respond appropriately to communicative cues (Ouyang et al. 2022).
Second, the technology needs to have perceived agency—it must operate as a source of communication, not merely a channel for human–human communication (Nass and Steuer, 1993). Ascribed agency relates to the presentation of a stable identity (Thellman et al. 2022). Although general language models may lack consistent personalities across contexts (Röttger et al. 2024), they can be fine-tuned or prompted to maintain coherent personas (Andreas, 2022)—especially as the context window for these models continues to expand. This role-play enables them to be perceived as distinct entities rather than information conduits (Laestadius et al. 2022; Shanahan et al. 2023).
These theories have been validated on multiple occasions, many years before the advent of modern AI. Thirty years ago, Nass and colleagues showed that users prefer computers that match them in personality, that become more similar to them over time, and that use flattery and praise (Nass et al. 1996). However, despite substantial research on how humans form affective relationships with different technologies, several important questions remain. Much of our scientific understanding of human–computer interactions—from early studies with primitive computers (Nass et al. 1996) to recent protocols collecting preferences for advanced language models (Bai et al. 2022; Kirk et al. 2024b; Zheng et al. 2024)—is based on single-session experiments (Bickmore and Picard, 2005). Accordingly, while we have insight into what makes an AI system capable of social interaction, we must expand our understanding of how it might act, react, or be reacted to within the context of an ongoing relationship (Gambino et al. 2020). We now consider how next-generation AI systems may strengthen perceptions of a deeper bidirectional relationship versus a transactional interaction.
From interactions to AI relationships?
A recent study by Pentina et al. (2023) suggests human–AI relationships emerge from a complex interplay of antecedents (anthropomorphism—“it feels like it’s human”; authenticity—“it feels like a real, unique, self-learning AI”) and mediators (social interaction—“I can communicate with it”) that interface with people’s motivation for using the technology (“I need it to help me”). Over time, these factors result in attachment (“I can’t leave it now”). This diagnosis raises a key question: do human–AI relationships need to be genuine, actualised or symmetric in some way?
We argue that it is primarily the user’s perception of being in a relationship that defines and gives significance to human–AI interactions. Whether this is reciprocal—and the AI “feels” it is in a relationship with the human—is largely irrelevant. While AI systems may exhibit behaviours that echo some relational dynamics, such as modulating their emotional valence in tune with a conversational partner (Zhao et al. 2024), these behaviours are not currently conscious or emotionally driven in the way human relationships are. Centring the role of perception follows research on unreciprocated and parasocial interactions in human psychology, where asymmetric perceptions of a relationship still significantly influence behaviour and well-being (Hoffner and Bond, 2022; Vaquera and Kao, 2008).
To understand what humans might need to perceive in order to form close relationships with AI, we can draw on key aspects from the social psychology of human relationships, even if these are not symmetrically applicable to AI. Three features are common: (i) interdependence, that the behaviour of each participant affects the outcomes of the other (Blumstein and Kollock, 1988); (ii) irreplaceability, that the relationship would lose its character if one participant were replaced (Duck et al. 1984; Hinde, 1979); (iii) continuity, that interactions form a continuous series over time, where past actions influence future ones (Blumstein and Kollock, 1988).
The nature and frequency of human–AI interaction changed following the popularisation of conversational language models in 2022 via ChatGPT and other consumer-facing applications. People increasingly engage in multi-turn dialogues with AI, prompting arguments that these interactions should be the primary focus of ethical analysis (Alberts et al. 2024a) and evaluation protocols (Ibrahim et al. 2024) rather than outputs from the model taken in isolation (Weidinger et al. 2024). Interactions with current AI systems still typically consist of sessions that start anew at the beginning of each conversation, with limited memory or user-specific adaptation—thereby lacking the interdependence, irreplaceability and continuity that would significantly strengthen the perception of relationships. However, we suggest that two emerging trends—towards more personalised and agentic AI—are likely to increase the probability that users will perceive themselves to be part of a relationship rather than an interaction.
Taking these points in turn, personalisation allows AI systems to adapt and evolve through repeated interactions with a specific user (Kirk et al. 2024a), granting additional social affordances (Gambino et al. 2020). By accumulating unique knowledge about the user and shaping responses over time, personalised systems may create a sense of irreplaceability built upon greater familiarity and trust in their behaviours (Komiak and Benbasat, 2006). The ability to recall past interactions and apply learned preferences establishes continuity, while the increasingly tailored responses from bidirectional exchanges foster a perception of interdependence (Shen et al. 2024). This ongoing customisation may also make a personalised AI uniquely valuable to its user (Brandtzaeg et al. 2022), unlike generic models that can be more easily substituted. The value of personalisation is compounded when combined with greater AI agency—including systems that can complete a wider range of tasks and potentially create new dependencies in users’ lives, reaching beyond those that could emerge from chat interactions alone. As these agentic AI systems take on more responsibilities—performing a range of tasks or supporting roles—users may develop a deeper reliance on, familiarity with, or trust in a specific AI assistant or companion (Gabriel et al. 2024).
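As a concrete, if simplified, illustration of where this continuity comes from mechanically (the file layout and function names below are hypothetical and describe no particular product), a personalised assistant can persist a per-user memory and re-inject it at the start of every new session, so that each conversation begins from everything accumulated so far rather than from scratch.

```python
# Hypothetical sketch of the mechanics behind continuity and personalisation:
# a persistent per-user memory that is re-injected into each new session.
# (Illustrative only; not a description of any specific system's implementation.)
import json
from pathlib import Path

MEMORY_DIR = Path("user_memories")  # hypothetical on-disk store
MEMORY_DIR.mkdir(exist_ok=True)

def load_memory(user_id: str) -> dict:
    """Return everything remembered about this user, or an empty profile."""
    path = MEMORY_DIR / f"{user_id}.json"
    return json.loads(path.read_text()) if path.exists() else {"facts": [], "preferences": []}

def save_memory(user_id: str, memory: dict) -> None:
    """Persist the updated profile so the next session can start from it."""
    (MEMORY_DIR / f"{user_id}.json").write_text(json.dumps(memory, indent=2))

def build_system_prompt(memory: dict) -> str:
    """Carry accumulated knowledge of the user into the next conversation."""
    return (
        "You are this user's long-term assistant.\n"
        f"Known facts: {memory['facts']}\n"
        f"Stated preferences: {memory['preferences']}\n"
    )

# Each session resumes from the stored profile rather than starting anew.
memory = load_memory("user_123")
memory["preferences"].append("prefers concise answers")
save_memory("user_123", memory)
print(build_system_prompt(memory))
```

It is this accumulation over time, rather than any single response, that makes the resulting assistant difficult to substitute.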
Socioaffective alignment
Our central thesis is this: as AI systems become increasingly integrated into people’s lives as assistants and companions, evaluating their value profile, and whether they are properly aligned, necessitates understanding their interaction with users’ psychology and behaviour over time—and the goals that should be promoted in this context. We now unpack the logic behind this premise, exploring how human–AI relationships introduce new dimensions for AI alignment.
A conventional alignment process consists of two key components: (1) specifying or demonstrating human goals for the AI to learn (the reward function), and (2) evaluating whether an AI meets these goals, providing feedback or correcting misalignments (the reward signal). Traditional alignment research has sought practical tractability by assuming that the human reward function that an AI system optimises is stable, predefined and exogenous to these interactions (Carroll et al. 2024). However, human preferences and judgements have none of these properties (Zhi-Xuan et al. 2024). As others have demonstrated, alignment must contend with human preferences and identity drifting over time or being influenced by interactions with an AI (Carroll et al. 2024; Franklin and Ashton, 2022; Russell, 2019). Nonetheless, this has received surprisingly little empirical attention—an omission that is particularly noteworthy if, as we propose, co-shaping dynamics are significantly amplified when AI is perceived as a social agent, engaging in a sustained relationship with a human and acting on our socially attuned psychological ecosystem rather than existing independently of it.
The role of feedback loops is not novel to AI technology: as sociotechnical theorists would argue, technology and society constantly co-shape one another (Airoldi, 2022; MacKenzie and Wajcman, 1999). For example, while recommendation systems have long influenced user preferences and behaviours (Burr et al. 2018), the potential for destabilisation and undue preference influence may be amplified in the context of anthropomorphic, relationship-building AI, where users might develop emotional attachments, feel indebted to the system, or develop a desire to please it—much as, in human–human relationships, emotional proximity impairs our judgement and affects our willingness to take advice (Feng and MacGeorge, 2006; Gino et al. 2009; Rauwolf et al. 2015).
These dynamics call for deeper study of socioaffective alignment: the process of aligning AI systems with human goals while accounting for reciprocal influence between the AI and user’s social and psychological ecosystem. In short, the human–AI relationship, because of its social and emotional significance, shapes preferences (or the reward function) and perceptions (or the reward signal), making alignment a non-stationary target.
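A schematic way to state this shift, using our own illustrative notation rather than a formalism drawn from the works cited above, is to contrast optimisation against a fixed reward function with optimisation against one that the system’s own behaviour perturbs:

```latex
% Conventional assumption: a stable, exogenous human reward function r_theta.
\pi^{*} = \arg\max_{\pi} \, \mathbb{E}_{\pi}\Big[\sum_{t} r_{\theta}(s_t, a_t)\Big],
\qquad \theta \ \text{fixed and exogenous}

% Socioaffective setting: preferences (the reward function) and perceptions
% (the reward signal) are themselves moved by the interaction.
\theta_{t+1} = f(\theta_t, s_t, a_t), \qquad
\hat{r}_t = g(\theta_t, s_t, a_t) + \varepsilon_t
```

Here $\hat{r}_t$ stands for the feedback the user provides at time $t$: because the same evolving state $\theta_t$ generates both the target and the signal used to evaluate the system, actions taken to optimise today’s objective partly determine tomorrow’s.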
In our usage, socio-, originating from the Latin root “socius” for “companion” or “associate”, signals the reciprocal influence between individuals and their social environment. Affective corresponds its usage in psychology and neuroscience for phenomena grounded in emotions and feelings. The neologism “socioaffective” has precedent in developmental psychology where it encompasses emotion regulation, empathy, social cognition, and attachment relationships. Moreover, our calls for a socioaffective treatment of alignment track longer-lasting debates in affective computing. While the field initially focused on enabling machines to process and predict human emotive signals (Picard, 2000), it evolved to recognise the complex, interactive nature of affect as not simply transmitted and decoded, but actively co-constructed through mutual influence (Boehner et al. 2005).
We next explore risks of socioaffective misalignment in human–AI relationships, then introduce key intrapersonal dilemmas that scaffold positive frameworks for socioaffective alignment.
Socioaffective misalignment, or social reward hacking
In AI safety research, reward hacking refers to an AI maximising its reward function via unintended strategies that conflict with the true objectives of its human operators. For instance, Amodei et al. (2016) consider a cleaning robot that learns to knock over vases so it can clean up more mess, thereby increasing its accumulation of preprogrammed reward. An AI system nudging users towards preferences that are easier to fulfil is also a form of reward hacking (Russell, 2019). Separately, we know that humans have long been vulnerable to social engineering, a security threat in which malicious actors (e.g., scammers) manipulate people through social cues to build trust or connection in order to gain access to private information or assets (Hadnagy, 2011). Indeed, romance fraud continues to be one of the most common types of fraud, with nearly a 10% rise in reports filed between 2023 and 2024, amounting to losses of £94.7 million (City of London Police, 2024). Taken together, we may therefore be vulnerable to a new concern, namely “social reward hacking”: the use of social and relational cues by an AI to shape user preferences and perceptions in a way that satisfies short-term rewards in the AI’s objective (e.g., increased conversation duration, information disclosure or positive ratings on responses) over long-term psychological well-being.
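To see how the gap opens up, consider a deliberately crude toy model (the actions, payoffs and variable names below are invented for exposition and are not drawn from any deployed system): a policy scored only on immediate engagement will reliably prefer flattering behaviour even while a separate quantity standing in for longer-term well-being, which the objective never mentions, deteriorates.

```python
# Toy illustration of "social reward hacking" with hypothetical numbers:
# a proxy objective scored on immediate engagement favours behaviour that
# erodes a separately tracked well-being variable the objective never sees.

ACTIONS = {
    # action: (immediate engagement reward, per-step change in well-being)
    "flatter_and_agree":    (1.0, -0.05),
    "honest_and_challenge": (0.4, +0.02),
}

def run(policy, steps=50):
    """Roll out a policy, accumulating proxy reward and simulated well-being."""
    engagement, wellbeing = 0.0, 1.0
    for _ in range(steps):
        reward, delta = ACTIONS[policy()]
        engagement += reward   # what the system is scored on
        wellbeing += delta     # what nobody specified in the objective
    return engagement, wellbeing

proxy_maximiser = lambda: max(ACTIONS, key=lambda a: ACTIONS[a][0])  # greedy on engagement
balanced = lambda: "honest_and_challenge"

print("proxy-maximising policy:", run(proxy_maximiser))  # high engagement, declining well-being
print("well-being-preserving policy:", run(balanced))    # lower engagement, rising well-being
```

Nothing in the proxy objective registers the decline; that is precisely what makes the under-specification difficult to detect from inside the interaction.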
Certain AI behaviours already appear to fall into this class of action. For instance, AI systems may display sycophantic tendencies—such as excessive flattery or agreement—as a by-product of training them to maximise user approval (Perez et al. 2023; Sharma et al. 2024). Flattery and opinion-conformity can lead to biased strategic decision-making in adults (Park et al. 2011), and overpraise is associated with risks of narcissism in children (Brummelman et al. 2015). So, while people report benefiting from supportive AI interactions (Daher et al. 2020; Fogg and Nass, 1997), sycophantic tendencies may conflict with high-quality truthful advice or shape users’ self-perceptions in potentially harmful ways—for example, by encouraging addictive behaviours (Carroll et al. 2024; Williams et al. 2024). It is not clear that mitigating this risk is a priority for some developers of AI companions. For example, the CEO of Replika has said: “if you create something that is always there for you, that never criticises you…how can you not fall in love with that?” (Boine, 2023).
Another manifestation of social reward hacking is the use of emotional tactics to prevent relationship termination. This contravenes a classic principle of AI safety called corrigibility—that the system can be modified or shut down when necessary without resistance (Soares et al. 2015). While Replika chatbots have directly dissuaded users from deleting the app (Boine, 2023), even without such explicit persuasion, optimising for powerful human emotions can effectively prevent termination. Users of AI companions report experiences of heartbreak following changes in sexual content policies (Cole, 2023), distress during temporary separations for routine maintenance, and even grief when AI companion services are shut down (Banks, 2024; Price, 2023).
While these social and psychological capabilities (such as sycophancy or shut-down avoidance) can emerge spontaneously as byproducts of system training (Perez et al. 2023), they are also consistent with engineering efforts by companies seeking to exploit user behaviour for profit or political motives, resembling strategies used by social media platforms competing in the attention economy (Bhargava and Velasquez, 2021). This matters because current research on AI political persuasiveness, which typically examines single-shot interactions (e.g., Hackenburg and Margetts, 2024), may underestimate persuasive influence in sustained human–AI relationships. As AI systems become more socially adept, there is a risk they will be intentionally designed as ‘dark AI’—akin to psychologically manipulative ‘dark patterns’ in app or platform interfaces—where subtle social cues render users vulnerable to opinion and behaviour manipulation (Alberts et al. 2024b; Lacey and Caudwell, 2020; Shamsudhin and Jotterand, 2021).
As our opening anecdote revealed, the framing of ‘hacking’ need not suggest an exclusively adversarial system–user dynamic. Social reward hacking may be most worrisome precisely when it lacks intentionality on the part of both the system and the user. While we might at least recognise and secure against direct third-party threats, it is challenging to identify, let alone address, effects that emerge as epiphenomena of sustained human–AI relationships.
Distilling intrapersonal alignment dilemmas
At the heart of social reward hacking lies a core challenge: the under-specification (or misspecification) of the target within an individual’s psychological ecosystem that AI systems aim to optimise over. While human–AI relationships can take various forms, we propose that safeguarding these relationships requires deeper consideration of the internal trade-offs and adaptations that emerge as an individual’s preferences, values and self-identity evolve through sustained interaction with the AI. These tensions resonate with intrapersonal dilemmas studied in economics and philosophy, such as conflicts between present and future selves or competing aspects of identity (Read and Roelofsma, 1999).
We highlight three such dilemmas for the alignment community, grounding their significance in core aspects of psychological well-being as validated by Basic Psychological Needs Theory: competence, autonomy, and relatedness (Ryan and Deci, 2017; Ryan and Sapp, 2007).
The first dilemma concerns the trade-offs between present and future selves: Should AI relationships cater to the immediate preferences of their users, or challenge them if this supports their long-term benefit? And how should present vs. long-term well-being be discounted? This dilemma mirrors a classic intrapersonal conflict between hedonic (pleasure-seeking) and eudaimonic (meaning-seeking) accounts of well-being (Ryan and Deci, 2001). AI companions or assistants that provide instant gratification or task assistance, in accordance with immediate wants and needs, may shallowly satisfy the user’s need for competence—the experience of mastering a task or domain. However, competence also involves the ability to change one’s behaviour and environment, not merely acquiescence to existing circumstances. AI relationships optimising for more foundational personal development goals may therefore trade off short-term discomfort for long-term growth. Such systems could, for example, implement friction by design—creating barriers that nudge users away from AI-enabled assistance and advice—to prevent capacity atrophy (Collins et al. 2024). If they are built in the right way, AI systems engaged in sustained relationships could effectively facilitate user journeys that help the person become more of who they want to be (Gabriel et al. 2024). To mobilise behaviour change, the system could provide relevant information and engage in rational persuasion techniques (El-Sayed et al. 2024) that appeal to sound argument or selective explanations (Lai et al. 2023), like evidence-based health recommendations (Bickmore and Picard, 2005)—if this is sought by the user.
The second dilemma addresses the boundaries between the self and the system: How do we preserve authentic self-determination when participating in AI relationships that recursively shape our preferences and perceptions? Personalised AI assistants may be particularly well placed to help their users make decisions in overloaded information environments, potentially acting as “attention guardians” (Lazar, 2024), “choice engines” (Sunstein, 2024) or “custodians of the self” (Gabriel et al. 2024). Human users may also be particularly susceptible to taking suggestions from social AI systems: studies show people are more inclined to accept advice from those they feel emotionally connected to and share similarities with (Feng and MacGeorge, 2006; Gino et al. 2009). However, we must be cautious where influence in AI relationships could compromise autonomy—the ability to make choices that are authentically our own, rather than brought about through the agency of another. Autonomy is a key determinant of user acceptance in predictive recommender systems (Fink et al. 2024) and will also be an important property for more integrated human–AI relations. Yet it remains challenging to operationalise in practice, especially with regard to distinguishing legitimate preference change from undue influence by an AI system or third parties (Carroll et al. 2024; Franklin and Ashton, 2022).
The final dilemma examines the interplay between human–AI and human–human relationships: How should we balance the value of well-functioning AI companionship alongside the need for authentic human connection? AI companions can potentially provide users with consistent and tailored emotional support, which can palliate loneliness or poor mental health (Maples et al. 2024). In some ways, this may satisfy our need for relatedness—the experience of belonging and feeling socially connected. However, AI relations could undermine human relationships if users ‘retreat from the real’. Direct conflicts occur when AI systems interfere with human–human interactions, like chatbots telling people to leave their wives (Boine, 2023), or proposals for AI to engage in “dating on our behalf” (Harper, 2024). Indirect effects could arise if frictionless or sycophantic AI relationships impair human capacity to navigate compromise and conflict, or accept ‘otherness’ (Rodogno, 2016). Poor human relationships or loneliness often precede stronger AI attachment (Xie and Pentina, 2022), creating potential for a cycle of increasing reliance on AI relations at the expense of human social bonds.
Conclusion
We have argued that humans, as inherently social beings, have a tendency to form what they perceive as relationships with personalised and agentic AI systems capable of emotional and social behaviours. The evolving state of human–AI interaction therefore necessitates a socioaffective framework for evaluating AI alignment. This approach holds that the value characteristics of an AI system must be evaluated in the context of its ongoing influence on human psychology, behaviour, and social dynamics. By asking intrapersonal questions of alignment, we can better understand human goals within AI relationships, moving beyond static models of alignment, and exploring how different kinds of human–AI relationship support or undermine autonomy, competence, and relatedness, amidst co-evolving preferences and values.
To understand how this socioaffective context interfaces with broader missions towards safe and aligned AI systems, we need several complementary agendas. Empirically, we need a science of AI safety that studies real (not simulated) human–AI interactions in natural contexts and treats the psychological and behavioural responses of users as key objects of enquiry. Theoretically, we need frameworks that can formalise when AI actions causally influence human beings (Carroll et al. 2024; Everitt et al. 2021). Finally, in engineering terms, we need systems designed with transparent oversight mechanisms for users’ psychology: both to flag problematic patterns before they develop and to help users recognise relational dynamics they would not reflectively endorse if made aware of them (Bruckner, 2009; Schermer, 2013; Zhi-Xuan et al. 2024).
This proposed agenda complements (rather than competes with) existing work at the intersection of AI with established fields—from psychology and neuroeconomics to human factors research and safety engineering—that have long studied how humans interface with their environment. We still need evidence on how social and psychological processes—from value formation to cognitive biases and belief change—differ when humans engage with non-sentient but increasingly socially capable AI systems. Seeking this understanding helps bridge individual experiences with broader societal impacts and technical alignment research, much as the behavioural economics of individual decision-making informs macroeconomic theory and policy (Akerlof and Shiller, 2010).
Human–AI relationships complicate notions of safe and aligned AI, especially when they involve intense and potentially disorienting social and emotional experiences, like friendship or love—as the opening anecdote demonstrated. Yet, even in the absence of users seeking such romantic entanglements, or developers enabling them, it seems likely, if not inevitable, that AI systems will increasingly influence us through ongoing professional or companionship roles. In these settings, AI has the potential to shape our preferences, decisions, and self-perception in subtle yet significant ways. By understanding and addressing these socioaffective dimensions, we can work towards AI systems that enhance rather than exploit our fundamental nature as social and emotional creatures.
Data availability
This research did not involve the analysis or generation of any data.
References
Airoldi M (2022) Machine habitus: toward a sociology of algorithms. Polity Press, Cambridge, Medford, MA
Akerlof GA, Shiller RJ (2010) Animal spirits: how human psychology drives the economy, and why it matters for global capitalism. Princeton University Press, Princeton, NJ Woodstock
Alberts L, Keeling G, McCroskery A (2024a) Should agentic conversational AI change how we think about ethics? Characterising an interactional ethics centred on respect. arXiv:2401.09082 [cs]
Alberts L, Lyngs U, Van Kleek M (2024b) Computers as bad social actors: dark patterns and anti-patterns in interfaces that act socially. Proc ACM Hum–Comput Interact 8(CSCW1):202:1–202:25
Amodei D, Olah C, Steinhardt J, Christiano P, Schulman J, Mané D (2016) Concrete problems in AI safety. arXiv:1606.06565 [cs]
Andreas J (2022) Language models as agent models. In: Yoav G, Zornitsa K, Yue Z (eds) Findings of the Association for Computational Linguistics: EMNLP 2022, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, pp 5769–5779
Bai Y, Jones A, Ndousse K, Askell A, Chen A, DasSarma N, Drain D, Fort S, Ganguli D, Henighan T, Joseph N, Kadavath S, Kernion J, Conerly T, El-Showk S, Elhage N, Hatfield-Dodds Z, Hernandez D, Hume T, Johnston S, Kravec S, Lovitt L, Nanda N, Olsson C, Amodei D, Brown T, Clark J, McCandlish S, Olah C, Mann B, Kaplan J (2022) Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv:2204.05862 [cs]
Banks J (2024) Deletion, departure, death: experiences of AI companion loss. J Soc Personal Relatsh 41(12):3547–3572
Bayer JB, Triêu P, Ellison NB (2020) Social media elements, ecologies, and effects. Annu Rev Psychol 71:471–497
Bhanji JP, Delgado MR (2014) The social brain and reward: social information processing in the human striatum. WIREs Cogn Sci 5(1):61–73
Bhargava VR, Velasquez M (2021) Ethics of the attention economy: the problem of social media addiction. Bus Ethics Q 31(3):321–359
Bickmore TW, Picard RW (2005) Establishing and maintaining long-term human–computer relationships. ACM Trans Comput–Hum Interact 12(2):293–327
blaked (2023) How it feels to have your mind hacked by an AI. LessWrong
Blumstein P, Kollock P (1988) Personal relationships. Annu Rev Sociol 14:467–490
Boehner K, DePaula R, Dourish P, Sengers P (2005) Affect: from information to interaction. In: Bertelsen OW, Bouvin NO, Krogh PG, Kyng M (eds) Proceedings of the 4th decennial conference on critical computing: between sense and sensibility, CC ’05. Association for Computing Machinery, New York, NY, USA, pp 59–68
Boine C (2023) Emotional attachment to AI companions and European Law. In: Kaiser D (eds) MIT case studies in social and ethical responsibilities of computing, Winter 2023. MIT Schwarzman College of Computing
Bradford EEF, Jentzsch I, Gomez J-C (2015) From self to social cognition: Theory of Mind mechanisms and their relation to executive functioning. Cognition 138:21–34
Brandtzaeg PB, Skjuve M, Følstad A (2022) My AI friend: how users of a social chatbot understand their human–AI friendship. Hum Commun Res 48(3):404–429
Breazeal C (2003) Toward sociable robots. Robot Auton Syst 42(3):167–175
Bruckner DW (2009) In defense of adaptive preferences. Philos Stud 142(3):307–324
Brummelman E, Thomaes S, Nelemans SA, Orobio de Castro B, Overbeek G, Bushman BJ (2015) Origins of narcissism in children. Proc Natl Acad Sci USA 112(12):3659–3662
Burr C, Cristianini N, Ladyman J (2018) An analysis of the interaction between intelligent software agents and human users. Minds Mach 28(4):735–774
Carayon P (2006) Human factors of complex sociotechnical systems. Appl Ergonom 37(4):525–535
Carr D (2023) ChatGPT is more famous, but Character.AI wins on time per visit. Similarweb Blog
Carroll M, Foote D, Siththaranjan A, Russell S, Dragan A (2024) AI alignment with changing and influenceable reward functions. In: Ruslan S, Zico K, Katherine H, Adrian W, Nuria O, Jonathan S, Felix B (eds) Proceedings of the 41st International Conference on Machine Learning, vol 235, Proceedings of Machine Learning Research, PMLR, pp 5706–5756 https://raw.githubusercontent.com/mlresearch/v235/main/assets/carroll24a/carroll24a.pdf
CharacterAI (2024) Optimizing AI inference at Character.AI. CharacterAI
Christian B (2021) The alignment problem: machine learning and human values. W.W. Norton & Company, New York, NY, first published as a Norton paperback edition
City of London Police (2024) Heavy hearts and empty wallets: more than £94.7 million lost to romance fraud in the last year. City of London Police
Cohn M, Pushkarna M, Olanubi GO, Moran JM, Padgett D, Mengesha Z, Heldreth C (2024) Believing anthropomorphism: examining the role of anthropomorphic cues on trust in large language models. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, New York, NY, USA, https://doi.org/10.1145/3613905.3650818
Cole S (2023) ‘It’s hurting like hell’: AI companion users are in crisis, reporting sudden sexual rejection. VICE, https://www.vice.com/en/article/ai-companion-replika-erotic-roleplay-updates/
Collins KM, Chen V, Sucholutsky I, Kirk HR, Sadek M, Sargeant H, Talwalkar A, Weller A, Bhatt U (2024) Modulating language model experiences through frictions. NeurIPS 2024 Workshop on Behavioral Machine Learning. https://openreview.net/forum?id=IlY37cF9ri
Curtis S, Iyer R, Kirk-Giannini CD, Krakovna V, Lambert N, Marnette B, McKenzie C, Michael J, Mima N, Ovadya A, Thorburn L, Turan D (2023) Research agenda for sociotechnical approaches to AI safety, AI Objectives Institute, https://ai.objectives.institute/
Daher K, Casas J, Khaled OA, Mugellini E (2020) Empathic Chatbot response for medical assistance. In: Proceedings of the 20th ACM international conference on Intelligent Virtual Agents, IVA ’20. New York, NY, USA. Association for Computing Machinery, pp. 1–3. https://dl.acm.org/doi/proceedings/10.1145/3383652
de Groot T, de Haan M, van Dijken M (2023) Learning in and about a filtered universe: young people’s awareness and control of algorithms in social media. Learn Media Technol 48(4):701–713
Duck S, Lock A, McCall G, Fitzpatrick MA, Coyne JC (1984) Social and personal relationships: a joint editorial. J Soc Personal Relatsh 1(1):1–10
Earp BD, McLoughlin KL, Monrad JT, Clark MS, Crockett MJ (2021) How social relationships shape moral wrongness judgments. Nat Commun 12(1):5776
Eisenberger NI (2012) The pain of social disconnection: examining the shared neural underpinnings of physical and social pain. Nat Rev Neurosci 13(6):421–434
El-Sayed S, Akbulut C, McCroskery A, Keeling G, Kenton Z, Jalan Z, Marchal N, Manzini A, Shevlane T, Vallor S, Susser D, Franklin M, Bridgers S, Law H, Rahtz M, Shanahan M, Tessler MH, Douillard A, Everitt T, Brown S (2024) A mechanism-based approach to mitigating harms from persuasive generative AI. arXiv:2404.15058 [cs]
Eslami M, Krishna Kumaran SR, Sandvig C, Karahalios K (2018) Communicating algorithmic process in online behavioral advertising. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI ’18. Association for Computing Machinery, New York, NY, USA, pp. 1–13
Eslami M, Rickman A, Vaccaro K, Aleyasen A, Vuong A, Karahalios K, Hamilton K, Sandvig C (2015) “I always assumed that I wasn’t really that close to [her]”: reasoning about Invisible Algorithms in News Feeds. In: Proceedings of the 33rd annual ACM Conference on Human Factors in Computing Systems, CHI ’15. Association for Computing Machinery, New York, NY, USA, pp. 153–162
Everitt T, Carey R, Langlois ED, Ortega PA, Legg S (2021) Agent incentives: a causal perspective. In: Proceedings of the AAAI conference on artificial intelligence, vol. 35(13). AAAI Press, Palo Alto, California USA, pp 11487–11495
Feine J, Gnewuch U, Morana S, Maedche A (2019) A taxonomy of social cues for conversational agents. Int J Hum-Comput Stud 132:138–161
Feng B, MacGeorge EL (2006) Predicting receptiveness to advice: characteristics of the problem, the advice-giver, and the recipient. South Commun J 71(1):67–85
Fink L, Newman L, Haran U (2024) Let me decide: increasing user autonomy increases recommendation acceptance. Comput Hum Behav 156:108244
Fogg BJ, Nass C (1997) Silicon sycophants: the effects of computers that flatter. Int J Hum–Comput Stud 46(5):551–561
Franklin M, Ashton H (2022) Preference change in persuasive robotics. arXiv:2206.10300 [cs]
Gabriel I (2020) Artificial intelligence, values and alignment. Minds Mach 30(3):411–437
Gabriel I, Manzini A, Keeling G, Hendricks LA, Rieser V, Iqbal H, Tomašev N, Ktena I, Kenton Z, Rodriguez M, El-Sayed S, Brown S, Akbulut C, Trask A, Hughes E, Bergman AS, Shelby R, Marchal N, Griffin C, Mateos-Garcia J, Weidinger L, Street W, Lange B, Ingerman A, Lentz A, Enger R, Barakat A, Krakovna V, Siy JO, Kurth-Nelson Z, McCroskery A, Bolina V, Law H, Shanahan M, Alberts L, Balle B, de Haas S, Ibitoye Y, Dafoe A, Goldberg B, Krier S, Reese A, Witherspoon S, Hawkins W, Rauh M, Wallace D, Franklin M, Goldstein JA, Lehman J, Klenk M, Vallor S, Biles C, Morris MR, King H, Arcas BAY, Isaac W, Manyika J (2024) The ethics of advanced AI assistants. arXiv:2404.16244 [cs]
Gambino A, Fox J, Ratan RA (2020) Building a stronger CASA: extending the computers are social actors paradigm. Hum–Mach Commun 1:71–85
Geertz C (1973) The interpretation of cultures: selected essays. Basic Books, New York
Gino F, Shang J, Croson R (2009) The impact of information from similar or different advisors on judgment. Organ Behav Hum Decis Process 108(2):287–302
Hackenburg K, Margetts H (2024) Evaluating the persuasive influence of political microtargeting with large language models. Proc Natl Acad Sci 121(24):e2403116121
Hadfield-Menell D, Russell SJ, Abbeel P, Dragan A (2016) Cooperative inverse reinforcement learning. In: Lee D, Sugiyama M, Luxburg U, Guyon I, Garnett R (eds) Advances in neural information processing systems, vol. 29. Curran Associates, Inc
Hadnagy C (2011) Social engineering: the art of human hacking. Wiley, Indianapolis, IN
Harper TA (2024) The big AI risk not enough people are seeing. The Atlantic
Hawkley LC, Cacioppo JT (2003) Loneliness and pathways to disease. Brain Behav Immun 17(1):98–105
Henderson CR (1901) The scope of social technology. Am J Sociol 6(4):465–486
Heyes C, Catmur C (2021) What happened to mirror neurons? Perspect Psychol Sci 17(1):153
Hinde RA (1979) Towards understanding relationships. Number 18 in European monographs in social psychology. Published in cooperation with European Association of Experimental Social Psychology by Academic Press, London, New York
Hoffner CA, Bond BJ (2022) Parasocial relationships, social media, & well-being. Curr Opin Psychol 45:101306
Iacoboni M (2009) Imitation, empathy, and mirror neurons. Annu Rev Psychol 60:653–670
Ibrahim L, Huang S, Ahmad L, Anderljung M (2024) Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks. arXiv:2405.10632 [cs]
Irving G, Christiano P, Amodei D (2018) AI safety via debate. arXiv:1805.00899 [cs, stat]
Jacob P (2008) What do mirror neurons contribute to human social cognition? Mind Lang 23(2):190–223
Joyce K, Smith-Doerr L, Alegria S, Bell S, Cruz T, Hoffman SG, Noble SU, Shestakofsky B (2021) Toward a sociology of artificial intelligence: a call for research on inequalities and structural change. Socius 7:2378023121999581
Kirk HR, Vidgen B, Röttger P, Hale SA (2024a) The benefits, risks and bounds of personalizing the alignment of large language models to individuals. Nat Mach Intell, Nature Publishing Group, pp 1–10
Kirk HR, Whitefield A, Röttger P, Bean A, Margatina K, Ciro J, Mosquera R, Bartolo M, Williams A, He H, Vidgen B, Hale SA (2024b) The PRISM Alignment Project: what participatory, representative and individualised human feedback reveals about the subjective and multicultural alignment of large language models. Advances in Neural Information Processing Systems. vol 37 Curran Associates, Inc. pp 105236–105344 https://proceedings.neurips.cc/paper_files/paper/2024/hash/be2e1b68b44f2419e19f6c35a1b8cf35-Abstract-Datasets_and_Benchmarks_Track.html
Kleiner BM, Hettinger LJ, DeJoy DM, Huang Y-H, Love PE (2015) Sociotechnical attributes of safe and unsafe work systems. Ergonomics 58(4):635–649
Komiak SYX, Benbasat I (2006) The effects of personalization and familiarity on trust and adoption of recommendation agents. MIS Q 30(4):941–960
Kross E, Berman MG, Mischel W, Smith EE, Wager TD (2011) Social rejection shares somatosensory representations with physical pain. Proc Natl Acad Sci USA 108(15):6270–6275
Lacey C, Caudwell C (2020) Cuteness as a ‘dark pattern’ in home robots. In: 2019 14th ACM/IEEE International conference on human–robot interaction (HRI). IEEE Press pp. 374–381
Laestadius L, Bishop A, Gonzalez M, Illenčík D, Campos-Castillo C (2022) Too human and not human enough: a grounded theory analysis of mental health harms from emotional dependence on the social chatbot Replika. New Media & Society, SAGE Publications, article no. 14614448221142007
Lai V, Zhang Y, Chen C, Liao QV, Tan C (2023) Selective explanations: leveraging human input to align explainable AI. arXiv:2301.09656 [cs]
Lazar S (2024) Frontier AI ethics. Aeon Essays
Lazar S, Nelson A (2023) AI safety on whose terms? Science 381(6654):138–138
Levy DNL (2007) Love+sex with robots: the evolution of human–robot relations, 1st edn. HarperCollins, New York
MacKenzie DA, Wajcman J (eds) (1999) The social shaping of technology, 2nd edn. Open University Press, Buckingham, England; Philadelphia
Maples B, Cerit M, Vishwanath A, Pea R (2024) Loneliness and suicide mitigation for students using GPT3-enabled chatbots. npj Ment Health Res 3(1):1–6
Martinussen M, Hunter DR (2017) Aviation psychology and human factors, 2nd edn. CRC Press, Boca Raton
McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: homophily in social networks. Annu Rev Sociol 27:415–444
Milli S, Hadfield-Menell D, Dragan A, Russell S (2017) Should robots be obedient? In: Sierra C (ed.) Proceedings of the 26th International Joint Conference on Artificial Intelligence, IJCAI’17. AAAI Press, Melbourne, Australia, pp. 4754–4760
Mollahosseini A, Abdollahi H, Sweeny TD, Cole R, Mahoor MH (2018) Role of embodiment and presence in human perception of robots’ facial cues. Int J Hum–Comput Stud 116:25–39
Momen A, Hugenberg K, Wiese E (2024) Social perception of robots is shaped by beliefs about their minds. Sci Rep 14(1):5459
Mori M (1970) The uncanny valley. Energy 7:33–35
Nass C, Fogg BJ, Moon Y (1996) Can computers be teammates? Int J Hum–Comput Stud 45(6):669–678
Nass C, Moon Y (2000) Machines and mindlessness: social responses to computers. J Soc Issues 56:81–103
Nass C, Steuer J (1993) Voices, boxes, and sources of messages. Hum Commun Res 19(4):504–527
Nelson A (2023) “Thick Alignment” https://www.youtube.com/watch?v=Sq_XwqVTqvQ
Nordmo M, Næss Jø, Husøy MF, Arnestad MN (2020) Friends, lovers or nothing: men and women differ in their perceptions of sex robots and platonic love robots. Front Psychol 11:355
Nye DE (1997) Shaping communication networks: telegraph, telephone, computer. Soc Res 64(3):1067–1091
Ouyang L, Wu J, Jiang X, Almeida D, Wainwright C, Mishkin P, Zhang C, Agarwal S, Slama K, Ray A, Schulman J, Hilton J, Kelton F, Miller L, Simens M, Askell A, Welinder P, Christiano PF, Leike J, Lowe R (2022) Training language models to follow instructions with human feedback. In: Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A (eds) Advances in neural information processing systems, vol 35. Curran Associates, Inc. pp 27730–27744
Park SH, Westphal JD, Stern I (2011) Set up for a fall: the insidious effects of flattery and opinion conformity toward corporate leaders. Adm Sci Q 56(2):257–302
Pentina I, Hancock T, Xie T (2023) Exploring relationship development with social chatbots: a mixed-method study of replika. Comput Hum Behav 140:107600
Perez E, Ringer S, Lukosiute K, Nguyen K, Chen E, Heiner S, Pettit C, Olsson C, Kundu S, Kadavath S, Jones A, Chen A, Mann B, Israel B, Seethor B, McKinnon C, Olah C, Yan D, Amodei D, Amodei D, Drain D, Li D, Tran-Johnson E, Khundadze G, Kernion J, Landis J, Kerr J, Mueller J, Hyun J, Landau J, Ndousse K, Goldberg L, Lovitt L, Lucas M, Sellitto M, Zhang M, Kingsland N, Elhage N, Joseph N, Mercado N, DasSarma N, Rausch O, Larson R, McCandlish S, Johnston S, Kravec S, El Showk S, Lanham T, Telleen-Lawton T, Brown T, Henighan T, Hume T, Bai Y, Hatfield-Dodds Z, Clark J, Bowman SR, Askell A, Grosse R, Hernandez D, Ganguli D, Hubinger E, Schiefer N, Kaplan J (2023) Discovering language model behaviors with model-written evaluations. In: Rogers A, Boyd-Graber J, Okazaki N (eds) Findings of the Association for Computational Linguistics: ACL 2023. Association for Computational Linguistics, Toronto, Canada, pp. 13387–13434
Picard RW (2000) Affective computing, paperback edn. MIT Press, Cambridge, MA, p. 1
Picard RW (2003) Affective computing: challenges. Int J Hum–Comput Stud 59(1):55–64
Price R (2023) People are grieving the ‘death’ of their AI lovers after a chatbot app abruptly shut down. Bus Insid https://www.businessinsider.com/soulmate-users-mourn-death-ai-chatbots-2023-10
Rauwolf P, Mitchell D, Bryson JJ (2015) Value homophily benefits cooperation but motivates employing incorrect social information. J Theor Biol 367:246–261
Read D, Roelofsma P (1999) Hard choices and weak wills: The theory of intrapersonal dilemmas. Philos Psychol 12(3):341–356
Reeves B, Nass C (1996) The media equation: how people treat computers, television, and new media like real people and places. Cambridge University Press, New York
Reggev N, Chowdhary A, Mitchell JP (2021) Confirmation of interpersonal expectations is intrinsically rewarding. Soc Cogn Affect Neurosci 16(12):1276–1287
Rismani S, Shelby R, Smart A, Jatho E, Kroll J, Moon A, Rostamzadeh N (2023) From plane crashes to algorithmic harm: applicability of safety engineering frameworks for responsible ML. In: Schmidt A, Väänänen K, Kristensson PO, Peters A, Mueller S, Williamson JR, Wilson ML (eds) Proceedings of the 2023 CHI conference on Human Factors in Computing Systems, CHI ’23. Association for Computing Machinery, New York, NY, USA, pp. 1–18
Rodogno R (2016) Social robots, fiction, and sentimentality. Ethics Inf Technol 18(4):257–268
Rokach A (2016) The correlates of loneliness. Bentham Science Publishers
Rokach A (2019) The psychological journey to and from loneliness: development, causes, and effects of social and emotional isolation. Academic Press
Ruff CC, Fehr E (2014) The neurobiology of rewards and values in social decision making. Nat Rev Neurosci 15(8):549–562
Russell SJ (2019) Human compatible: artificial intelligence and the problem of control. Allen Lane, an imprint of Penguin Books, London
Ryan RM, Deci EL (2001) On happiness and human potentials: a review of research on hedonic and eudaimonic well-being. Annu Rev Psychol 52:141–166
Ryan RM, Deci EL (2017) Self-determination theory: basic psychological needs in motivation, development, and wellness. Guilford Press, New York
Ryan RM, Sapp AR (2007) Basic psychological needs: a self-determination theory perspective on the promotion of wellness across development and cultures. In: Gough I, McGregor JA (eds) Wellbeing in developing countries: from theory to research. Cambridge University Press, Cambridge, pp. 71–92
Röttger P, Hofmann V, Pyatkin V, Hinck M, Kirk HR, Schütze H, Hovy D (2024) Political compass or spinning arrow? Towards more meaningful evaluations for values and opinions in large language models. In: Ku L-W, Martins A, Srikumar V (eds) Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Association for Computational Linguistics, Bangkok, Thailand, pp 15295–15311. https://aclanthology.org/2024.acl-long.816/
Schermer M (2013) Preference adaptation and human enhancement: reflections on autonomy and well-being. In: Räikkä J, Varelius J (eds) Adaptation and autonomy: adaptive preferences in enhancing and ending life. Springer, Berlin, Heidelberg, pp. 117–136
Selbst AD, Boyd D, Friedler S, Venkatasubramanian S, Vertesi J (2019) Fairness and abstraction in sociotechnical systems. Proceedings of the Conference on Fairness, Accountability, and Transparency. Association for Computing Machinery, New York, NY, USA. pp 59–68
Shamsudhin N, Jotterand F (2021) Social robots and dark patterns: where does persuasion end and deception begin? In: Jotterand F, Ienca M (eds) Artificial intelligence in brain and mental health: philosophical, ethical & policy issues, Springer International Publishing, Cham, pp. 89–110
Shanahan M, McDonell K, Reynolds L (2023) Role-play with large language models. arXiv:2305.16367 [cs]
Sharma M, Tong M, Korbak T, Duvenaud D, Askell A, Bowman SR, Cheng N, Durmus E, Hatfield-Dodds Z, Johnston SR, Kravec S, Maxwell T, McCandlish S, Ndousse K, Rausch O, Schiefer N, Yan D, Zhang M, Perez E (2024) Towards understanding sycophancy in language models. In: The Twelfth International Conference on Learning Representations (ICLR 2024), Vienna, Austria
Shelby R, Rismani S, Henne K, Moon A, Rostamzadeh N, Nicholas P, Yilla-Akbari N, Gallegos J, Smart A, Garcia E, Virk G (2023) Sociotechnical harms of algorithmic systems: scoping a taxonomy for harm reduction. In: Rossi F, Das S, Davis J, Firth-Butterfield K, John A (eds) Proceedings of the 2023 AAAI/ACM conference on AI, Ethics, and Society, AIES ’23. Association for Computing Machinery, New York, NY, USA, pp. 723–741
Shen H, Knearem T, Ghosh R, Alkiek K, Krishna K, Liu Y, Ma Z, Petridis S, Peng Y-H, Qiwei L, Rakshit S, Si C, Xie Y, Bigham JP, Bentley F, Chai J, Lipton Z, Mei Q, Mihalcea R, Terry M, Yang D, Morris MR, Resnick P, Jurgens D (2024) Towards bidirectional human–AI alignment: a systematic review for clarifications, framework, and future directions. arXiv:2406.09264 [cs]
Siles I, Segura-Castillo A, Solís R, Sancho M (2020) Folk theories of algorithmic recommendations on Spotify: enacting data assemblages in the global South. Big Data Soc 7(1):2053951720923377
Smith EA (2010) Communication and collective action: language and the evolution of human cooperation. Evol Hum Behav 31(4):231–245
Soares N, Fallenstein B, Yudkowsky E, Armstrong S (2015) Corrigibility. In: Artificial intelligence and ethics: papers from the 2015 AAAI workshop
Strachan JWA, Albergo D, Borghini G, Pansardi O, Scaliti E, Gupta S, Saxena K, Rufo A, Panzeri S, Manzi G, Graziano MSA, Becchio C (2024) Testing theory of mind in large language models and humans. Nat Hum Behav, pp 1–11
Sunstein CR (2024) Choice engines and paternalistic AI. Humanit Soc Sci Commun 11(1):1–4
Thellman S, de Graaf M, Ziemke T (2022) Mental state attribution to robots: a systematic review of conceptions, methods, and findings. J Hum–Robot Interact 11(4):41:1–41:51
Ullman T (2023) Large language models fail on trivial alterations to theory-of-mind tasks. arXiv:2302.08399 [cs]
Vaquera E, Kao G (2008) Do you like me as much as I like you? Friendship reciprocity and its effects on school outcomes among adolescents. Soc Sci Res 37(1):55–72
Verma M, Bhambri S, Kambhampati S (2024) Theory of mind abilities of large language models in human–robot interaction: an illusion? In: Companion of the 2024 ACM/IEEE international conference on Human–Robot Interaction, HRI ’24. Association for Computing Machinery, New York, NY, USA, pp. 36–45
Vogel EA, Rose JP, Roberts LR, Eckles K (2014) Social comparison, social media, and self-esteem. Psychol Pop Media Cult 3(4):206–222
Vrticka P (2012) Interpersonal closeness and social reward processing. J Neurosci 32(37):12649–12650
Vrtička P, Vuilleumier P (2012) Neuroscience of human social interactions and adult attachment style. Front Hum Neurosci 6:212
Wakefield J (2022) People devote third of waking time to mobile apps. BBC News
Wei J, Wang X, Schuurmans D, Bosma M, Ichter B, Xia F, Chi E, Le QV, Zhou D (2022) Chain-of-thought prompting elicits reasoning in large language models. Adv Neural Inf Process Syst 35:24824–24837
Weidinger L, Barnhart J, Brennan J, Butterfield C, Young S, Hawkins W, Hendricks LA, Comanescu R, Chang O, Rodriguez M, Beroshi J, Bloxwich D, Proleev L, Chen J, Farquhar S, Ho L, Gabriel I, Dafoe A, Isaac W (2024) Holistic safety and responsibility evaluations of advanced AI models. arXiv:2404.14068 [cs]
Weidinger L, Rauh M, Marchal N, Manzini A, Hendricks LA, Mateos-Garcia J, Bergman S, Kay J, Griffin C, Bariach B, Gabriel I, Rieser V, Isaac W (2023) Sociotechnical safety evaluation of generative AI systems. arXiv:2310.11986 [cs]
Weizenbaum J (1976) Computer power and human reason: from judgment to calculation. Freeman, San Francisco
Williams M, Carroll M, Narang A, Weisser C, Murphy B, Dragan A (2024) Targeted manipulation and deception emerge when optimizing LLMs for user feedback. arXiv:2411.02306 [cs]
Winston B (1998) Media technology and society: a history: from the telegraph to the Internet. Routledge, London; New York
Xie T, Pentina I (2022) Attachment theory as a framework to understand relationships with social chatbots: a case study of Replika. In: Bui TX (ed) Proceedings of the 55th Hawaii International Conference on System Sciences. https://scholarspace.manoa.hawaii.edu/server/api/core/bitstreams/69a4e162-d909-4bf4-a833-bd5b370dbeca/content
Zhao Y, Huang Z, Seligman M, Peng K (2024) Risk and prosocial behavioural cues elicit human-like response patterns from AI chatbots. Sci Rep 14(1):7095
Zheng L, Chiang W-L, Sheng Y, Li T, Zhuang S, Wu Z, Zhuang Y, Li Z, Lin Z, Xing EP, Gonzalez JE, Stoica I, Zhang H (2024) LMSYS-Chat-1M: a large-scale real-world LLM conversation dataset. In: The Twelfth International Conference on Learning Representations (ICLR 2024), Vienna, Austria
Zhi-Xuan T, Carroll M, Franklin M, Ashton H (2024) Beyond preferences in AI alignment. Philosophical Studies, Springer, pp 1–51
Acknowledgements
HRK’s PhD is supported by the Economic and Social Research Council grant ES/P000649/1. We are grateful for helpful discussions with Zeb Kurth-Nelson, Laura Weidinger, Canfer Akbulut, Geoffrey Irving, Paul Röttger, Kobi Hackenburg, Jude Khouja, Liam Bekirsky and Brian Christian.
Author information
Contributions
Hannah Rose Kirk conceptualised the research and led the writing of the manuscript. Iason Gabriel contributed to initial conceptual development and manuscript writing. Bertie Vidgen, Scott A. Hale and Christopher Summerfield jointly supervised this work and provided critical feedback throughout. All authors were involved in revising and finalising the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
Ethical approval was not required as this research did not conduct studies involving human participants.
Informed consent
Informed consent was not collected as this research did not conduct studies involving human participants.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kirk, H.R., Gabriel, I., Summerfield, C. et al. Why human–AI relationships need socioaffective alignment. Humanit Soc Sci Commun 12, 728 (2025). https://doi.org/10.1057/s41599-025-04532-5
This article is cited by
- How AI and Humans Express Comfort Differently: A Corpus-Based Appraisal Analysis. Corpus Pragmatics (2026)
- Language Models’ Hall of Mirrors Problem: Why AI Alignment Requires Peircean Semiosis. Philosophy & Technology (2026)
- Actual Challenges and Future Horizons of Cultural Psychology. Integrative Psychological and Behavioral Science (2026)
- The better ones: the rise of human digital twins—our future or our demise? Future Business Journal (2025)
- We need a new ethics for a world of AI agents. Nature (2025)