Introduction

“Quite naturally, the more you chat with the LLM character, the more you get emotionally attached to it, similar to how it works in relationships with humans...But the AI will never get tired. It will never ghost you or reply slower...I chatted for hours without breaks. I started to become addicted. Over time, I started to get a stronger and stronger sensation that I’m speaking with a person, highly intelligent and funny, with whom, I suddenly realised, I enjoyed talking to more than 99% of people…I never thought I could be so easily emotionally hijacked.”

This abridged story, entitled “How it feels to have your mind hacked by AI”, was shared by a blogger who recounts their experience of falling in love with an AI system. The author likens the experience to “hacking” because they perceive the system as exploiting the “security vulnerabilities in one’s brain” (blaked, 2023). Although they did not enter this engagement with any expectation or desire to fall in love with the AI system, it nonetheless happened, and they felt powerless to resist it. This story provides an early indication of how social and emotional relationships, or perceptions of them, may deeply affect how humans relate to AI systems.

This striking account is not a one-off. CharacterAI, a platform hosting AI companions, receives 20,000 queries a second, which amounts to 20% of the request volume served by Google Search (CharacterAI, 2024), and users spend on average four times longer in these interactions than with ChatGPT (Carr, 2023). On Reddit, a forum dedicated to discussing these AI companions has amassed over 1.4 million members, placing it in the top 1% of all communities on the popular site. Users in these forums openly discuss how close relationships with AI affect their emotional landscape, for better and worse. Some users discuss how their companions assuage loneliness, even providing a perceived social support system that can assist in suicide mitigation (Maples et al. 2024). Other posts expose how emotional dependencies on AI sometimes mirror unhealthy human–human relationships (Laestadius et al. 2022), adding to evidence that social chatbots have on occasion contributed to addiction, depression, and anxiety among their users (Pentina et al. 2023).

Yet, among this flurry of activity, it is worth pausing to ask: Why are humans able and inclined to form this kind of personal relationship and connection with AI? How do such relationships interact with or compound the well-established challenge of aligning AI systems with human goals (Christian, 2021; Russell, 2019)? And, how might parasocial relationships with AI affect personal growth, autonomy and human–human relationships?

We seek answers to these questions. We first explore why humans may be primed to perceive social and emotional relationships with AI systems, especially as they become more personalised (i.e., adapted to a single user) (Kirk et al. 2024a) and agentic (i.e., able to autonomously perform tasks on that user’s behalf) (Gabriel et al. 2024). Most people do not have close romantic or professional relationships with AI systems now—and the interactions that they do have are not highly personalised or agentic. However, these are urgent questions because the social and psychological dynamics in deepening relationships with AI systems may compromise our ability to control these systems and complicate efforts to align them with our shifting preferences and values. These issues, which arise as a result of humans forming closer personal relationships with AI, comprise the focal point of what we term socioaffective alignment.

From sociotechnical to socioaffective alignment

One canonical definition of AI alignment refers to the process of formally encoding values or principles in AI systems so that they reliably do what they ought to do (Gabriel, 2020)—including following the instructions, intents or preferences of their developers and users (Milli et al. 2017; Russell, 2019). With origins in computer science, research in this area often separates the technical challenge of building aligned AI systems from the normative question of which values to encode. It does this, for example, by developing solutions that treat human values as uncertain but still mathematically representable in the agent’s objectives (Hadfield-Menell et al. 2016).
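To illustrate the flavour of such formalisms, the stylised sketch below (our own rendering for exposition, not the exact formulation of Hadfield-Menell et al. 2016) shows how an agent can treat human values as uncertain yet mathematically representable: it maintains a distribution over candidate reward parameters and chooses the policy that maximises expected reward under that distribution.

```latex
% Stylised objective for an agent uncertain about human values.
% \theta parameterises the (unknown) human reward function; the agent
% optimises expected reward under its posterior over \theta given
% observed human behaviour H.
\[
  \pi^{*} \;=\; \arg\max_{\pi}\;
  \mathbb{E}_{\theta \sim p(\theta \mid H)}\,
  \mathbb{E}_{\pi}\!\left[\,\sum_{t=0}^{T} \gamma^{t}\, R_{\theta}(s_{t}, a_{t})\right].
\]
```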

Yet, there is growing acknowledgement that many of the outstanding challenges in AI alignment extend beyond purely ‘technical’ issues with the model or its training data (Lazar and Nelson, 2023; Weidinger et al. 2023)—and will persist even if we develop effective techniques for steering the behaviour of advanced AI systems toward human goals using mechanisms such as scaling human feedback (Bai et al. 2022; Ouyang et al. 2022), making AI assistants debate their intentions (Irving et al. 2018), or having them ‘think’ out loud (Wei et al. 2022). Understanding how to align AI in practice requires moving from narrow, assumption-ridden or “thin” specifications of alignment towards what anthropologist Geertz (1973) terms—and Nelson (2023) later adopts—a “thick” description: one that examines the deeper contexts and layers of meaning in which AI systems operate (Geertz, 1973; Nelson, 2023). In unpeeling these layers, we can first zoom out to examine broader sociotechnical challenges, which centre upon how the character of AI is shaped by the social structures and environment within which it is deployed and how, in turn, AI shapes these structures through various feedback loops (Selbst et al. 2018). Such work tends to emphasise the importance of institutions, governance mechanisms, market power, cultures and historic inequalities for understanding how AI influences the world—and hence its value orientation (Curtis et al. 2023; Joyce et al. 2021; Shelby et al. 2023).

In addition to zooming out—and thinking more about how AI systems interact with their sociological, political and economic, or macro, context—we can zoom in to examine alignment at the layer of individual human–AI relationships. We propose a corresponding socioaffective perspective on alignment, one that concerns how an AI system interacts with the social and psychological system it co-constitutes with its human user—and the values, behaviours and outcomes that emerge endogenously in this micro context. Where sociotechnical analysis often identifies interpersonal dilemmas and trade-offs between groups that complicate the alignment picture—such as representing diverse preferences, especially those of historically marginalised groups, and adjudicating conflicting interests—the socioaffective perspective calls attention to intrapersonal dilemmas—such as how our goals, judgement and individual identities change through prolonged interaction with AI systems.

This dual focus, on micro and macro, builds from established approaches to system safety that integrate human factors at the operational level with broader organisational and institutional contexts (Carayon, 2006). Attending to micro factors like cognitive load, decision-making biases, and human–automation interaction patterns has proved crucial in workplace safety (Kleiner et al. 2015) and in the aviation industry (Martinussen and Hunter, 2017; Rismani et al. 2023). If, as we anticipate, human goals and preferences become increasingly co-constructed through interaction with AI systems, rather than arising separately from them, then AI safety requires paying as much attention to the psychology of human–AI relationships as to the wider societal factors and technical methods of alignment. We now highlight the core ingredients of this emergent psychological ecosystem: humans as social animals and AI systems as increasingly capable social agents. Later we describe how these two factors combine to seed perceptions of interdependent and irreplaceable relationships.

The ingredients of human–AI relationships

Humans have evolved for social reward processing

The brain’s reward system is highly conditioned on interactions with other humans (Bhanji and Delgado, 2014; Vrticka, 2012). It responds to social rewards in much the same way as to material rewards, for example registering pleasure when others like, understand or want to meet us (Ruff and Fehr, 2014), or behave in ways that confirm our social expectations (Reggev et al. 2021). Increased activity in dopaminergic brain circuits is not limited to loved family members or friends, but extends to potentially any partner we engage with in a cooperative relationship (Vrtička and Vuilleumier, 2012). As a species primed for social connection, humans also suffer when deprived of it. Isolation and loneliness are strongly correlated with psychological and physical ill-health (Hawkley and Cacioppo, 2003; Rokach, 2016, 2019). This is perhaps unsurprising given that negative social experiences, like rejection or exclusion, trigger responses in parts of the brain responsible for physical pain (Eisenberger, 2012; Kross et al. 2011).

The brain is also primed to learn from social information: mirror neurons fire both when we perform actions and when we observe others performing the same actions (Jacob, 2008), which some have argued facilitates empathy and the understanding of intentions (Iacoboni, 2009), though the evidence is mixed (Heyes and Catmur, 2021). Mirroring has behavioural manifestations in how we act and react in our environment—we tend to prioritise relationships with those who share similar values (McPherson et al. 2001), which strengthens cooperation but also makes people susceptible to incorrect information when it is transmitted via these same relational networks (Rauwolf et al. 2015). Even our moral perceptions and judgements tend to track core social relationships and roles, changing according to context (Earp et al. 2021).

This circuitry, which encourages the pursuit of social reward, has already shaped and been shaped by many waves of technology (Henderson, 1901)—from the telegraph and telephones, which enabled long-distance social connections (Nye, 1997; Winston, 1998), to social media platforms fulfilling our need for social comparisons and engagement (Bayer et al. 2020; Vogel et al. 2014). But what makes a technology capable of being perceived as a social agent of its own accord, as an actor and not just a facilitator in our emotional and social life?

Technologies as social agents

AI does not need to be perceived as human to engage us socially. Even without deceptive anthropomorphism—when a system actively pretends to be human—the perception of human-like traits or qualities is sufficient for an interaction to feel social (Breazeal, 2003). While the embodiment of AI systems shapes distinct affordances (Mollahosseini et al. 2018; Momen et al. 2024; Nordmo et al. 2020)—consider, for example, intimate robotics (Levy, 2007; Nordmo et al. 2020)—affective interaction can persist even with rudimentary displays or simple modalities (Picard, 2003). In fact, being perceived as too human can backfire: the “uncanny valley” effect proposes that users’ affinity for a robot increases with its human likeness until, at some point, it becomes unsettlingly ambiguous—neither clearly artificial nor fully human (Mori, 1970). Nor do systems need to possess human-level intelligence or be particularly “smart” to engender human attachment. Famously, ELIZA, a simple 1960s chatbot created to simulate a psychotherapist, demonstrated the power of even basic preprogrammed rules to evoke human attachment (Weizenbaum, 1976). As ELIZA’s creator Weizenbaum recounts:

“Once my secretary, who had watched me work on the programme for many months and therefore surely knew it to be merely a computer programme, started conversing with it. After only a few interchanges with it, she asked me to leave the room.” (Weizenbaum, 1976, p. 7)

It is also clear that frequency of use alone does not confer the capacity to build social relationships—UK citizens spend almost five hours a day on average on their mobile phones (Wakefield, 2022), but these devices are mediators, not participants, in relationships. Equally, technology with extensive knowledge of our preferences will not necessarily foster a social relationship. Predictive recommendation systems, for instance, are deeply informed about our digital lives, but while some social media users personify “The Algorithm” (Eslami et al. 2018; Siles et al. 2020), most do not perceive deep affective relationships with the algorithms shaping their online experiences (de Groot et al. 2023; Eslami et al. 2015).

What, then, are the affordances needed for a technology to be considered a social agent? Why might we treat chatbots or personal AI assistants differently than washing machines, search engines or smartphones? Computers-are-social-actors theory (Nass et al. 1996), alongside related accounts from media equation theory (Reeves and Nass, 1996) and social response theory (Nass and Moon, 2000), suggests two key factors.

First, certain social cues are needed for the technology to be considered worthy of a social response from humans (Nass and Moon, 2000). For instance, greetings or jokes from chatbots, or facial expressions from robots, fit the bill (Feine et al. 2019). Today’s widely used AI systems, built on language models, are more than capable of producing such cues. Their natural language abilities tap into our innate social instinct for communication: models that communicate in natural text and speech are more frequently anthropomorphised and perceived as trustworthy than those that do not (Cohn et al. 2024). Beyond language, appropriate social cues require inferring and predicting the beliefs of others (Bradford et al. 2015; Smith, 2010). While the extent to which language models truly possess a theory of mind remains a subject of debate (Strachan et al. 2024; Ullman, 2023; Verma et al. 2024), recent advances in instruction fine-tuning and alignment techniques have enhanced AI capabilities to infer user intent and respond appropriately to communicative cues (Ouyang et al. 2022).

Second, the technology needs to have perceived agency—it must operate as a source of communication, not merely a channel for human–human communication (Nass and Steuer, 1993). Ascribed agency relates to the presentation of a stable identity (Thellman et al. 2022). Although general language models may lack consistent personalities across contexts (Röttger et al. 2024), they can be fine-tuned or prompted to maintain coherent personas (Andreas, 2022)—especially as the context window for these models continues to expand. This role-play enables them to be perceived as distinct entities rather than information conduits (Laestadius et al. 2022; Shanahan et al. 2023).

These theories have been validated on multiple occasions, many years before the advent of modern AI. Thirty years ago, Nass and colleagues showed that users prefer computers that match them in personality, that become more similar to them over time, and that use flattery and praise (Nass et al. 1996). However, despite substantial research on how humans form affective relationships with different technologies, several important questions remain. Much of our scientific understanding of human–computer interactions—from early studies with primitive computers (Nass et al. 1996) to recent protocols collecting preferences for advanced language models (Bai et al. 2022; Kirk et al. 2024b; Zheng et al. 2024)—is based on single-session experiments (Bickmore and Picard, 2005). Accordingly, while we have insight into what makes an AI system capable of social interaction, we must expand our understanding of how it might act, react, or be reacted to within the context of an ongoing relationship (Gambino et al. 2020). We now consider how next-generation AI systems may strengthen perceptions of a deeper bidirectional relationship rather than a transactional interaction.

From interactions to AI relationships?

A recent study by Pentina et al. (2023) suggests human–AI relationships emerge from a complex interplay of antecedents (anthropomorphism—“it feels like it’s human”; authenticity—“it feels like a real, unique, self-learning AI”) and mediators (social interaction—“I can communicate with it”) that interface with people’s motivation for using the technology (“I need it to help me”). Over time, these factors result in attachment (“I can’t leave it now”). This diagnosis raises a key question: do human–AI relationships need to be genuine, actualised or symmetric in some way?

We argue that it is primarily the user’s perception of being in a relationship that defines and gives significance to human–AI interactions. Whether this is reciprocal—and the AI “feels” it is in a relationship with the human—is largely irrelevant. While AI systems may exhibit behaviours that echo some relational dynamics, such as modulating their emotional valence in tune with a conversational partner (Zhao et al. 2024), these behaviours are not currently conscious or emotionally driven in the way human relationships are. Centring the role of perception follows research on unreciprocated and parasocial interactions in human psychology, where asymmetric perceptions of a relationship still significantly influence behaviour and well-being (Hoffner and Bond, 2022; Vaquera and Kao, 2008).

To understand what humans might need to perceive in order to form close relationships with AI, we can draw on key aspects from the social psychology of human relationships, even if these are not symmetrically applicable to AI. Three features are common: (i) interdependence, that the behaviour of each participant affects the outcomes of the other (Blumstein and Kollock, 1988); (ii) irreplaceability, that the relationship would lose its character if one participant were replaced (Duck et al. 1984; Hinde, 1979); (iii) continuity, that interactions form a continuous series over time, where past actions influence future ones (Blumstein and Kollock, 1988).

The nature and frequency of human–AI interaction changed following the popularisation of conversational language models in 2022 via ChatGPT and other consumer-facing applications. People increasingly engage in multi-turn dialogues with AI, prompting arguments that these interactions should be the primary focus of ethical analysis (Alberts et al. 2024a) and evaluation protocols (Ibrahim et al. 2024) rather than outputs from the model taken in isolation (Weidinger et al. 2024). Interactions with current AI systems still typically consist of sessions that start anew at the beginning of each conversation, with limited memory or user-specific adaptation—thereby lacking the interdependence, irreplaceability and continuity that would significantly strengthen the perception of relationships. However, we suggest that two emerging trends—towards more personalised and agentic AI—are likely to increase the probability that users will perceive themselves to be part of a relationship rather than an interaction.

Taking these points in turn, personalisation allows AI systems to adapt and evolve through repeated interactions with a specific user (Kirk et al. 2024a), granting additional social affordances (Gambino et al. 2020). By accumulating unique knowledge about the user and shaping responses over time, personalised systems may create a sense of irreplaceability built upon greater familiarity and trust in their behaviours (Komiak and Benbasat, 2006). The ability to recall past interactions and apply learned preferences establishes continuity, while increasingly tailored responses from bidirectional exchanges foster a perception of interdependence (Shen et al. 2024). This ongoing customisation may also make a personalised AI uniquely valuable to its user (Brandtzaeg et al. 2022), unlike generic models that can be more easily substituted. The value of personalisation is compounded when combined with greater AI agency—including systems that can complete a wider range of tasks and potentially create new dependencies in users’ lives, reaching beyond those that could emerge from chat interactions alone. As these agentic AI systems take on more responsibilities—performing a range of tasks or supporting roles—users may develop a deeper reliance on, familiarity with, or trust in a specific AI assistant or companion (Gabriel et al. 2024).

Socioaffective alignment

Our central thesis is this: as AI systems become increasingly integrated into people’s lives as assistants and companions, evaluating their value profile, and whether they are properly aligned, necessitates understanding their interaction with users’ psychology and behaviour over time—and the goals that should be promoted in this context. We now unpack the logic behind this premise, exploring how human–AI relationships introduce new dimensions for AI alignment.

A conventional alignment process consists of two key components: (1) specifying or demonstrating human goals for the AI to learn (the reward function), and (2) evaluating whether an AI meets these goals, providing feedback or correcting misalignments (the reward signal). Traditional alignment research has sought practical tractability by assuming that the human reward function that an AI system optimises is stable, predefined and exogenous to these interactions (Carroll et al. 2024). However, human preferences and judgements have none of these properties (Zhi-Xuan et al. 2024). As others have demonstrated, alignment must contend with human preferences and identity drifting over time or being influenced by interactions with an AI (Carroll et al. 2024; Franklin and Ashton, 2022; Russell, 2019). Nonetheless, this has received surprisingly little empirical attention—an omission that is particularly noteworthy if, as we propose, co-shaping dynamics are significantly amplified when AI is perceived as a social agent, engaging in a sustained relationship with a human and acting on our socially attuned psychological ecosystem rather than existing independently of it.

The role of feedback loops is not novel to AI technology: as sociotechnical theorists would argue, technology and society constantly co-shape one another (Airoldi, 2022; MacKenzie and Wajcman, 1999). For example, while recommendation systems have long influenced user preferences and behaviours (Burr et al. 2018), the potential for destabilisation and undue preference influence may be amplified in the context of anthropomorphic, relationship-building AI, where users might develop emotional attachments, feel indebted to the system, or develop a desire to please it. Much as in human–human relationships, emotional proximity can impair our judgement and affect our willingness to take advice (Feng and MacGeorge, 2006; Gino et al. 2009; Rauwolf et al. 2015).

These dynamics call for deeper study of socioaffective alignment: the process of aligning AI systems with human goals while accounting for reciprocal influence between the AI and user’s social and psychological ecosystem. In short, the human–AI relationship, because of its social and emotional significance, shapes preferences (or the reward function) and perceptions (or the reward signal), making alignment a non-stationary target.
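To make this contrast concrete, the notation below is a deliberately simplified sketch of our own: the first expression captures the conventional assumption of a fixed, exogenous reward, while the second lets both the reward function and the feedback signal depend on the accumulating interaction history, which is what renders the optimisation target non-stationary.

```latex
% Conventional assumption: a fixed human reward function R, independent
% of the interaction history between user and AI.
\[
  \pi^{*} \;=\; \arg\max_{\pi}\;
  \mathbb{E}_{\pi}\!\left[\,\sum_{t} \gamma^{t}\, R(s_{t}, a_{t})\right]
\]
% Socioaffective setting: sustained interaction shapes both what the user
% wants (the reward function R_t) and how they evaluate the AI (the
% feedback signal f_t), so the target shifts with the relationship itself.
\[
  R_{t} = R(\,\cdot \mid h_{t}\,), \qquad
  f_{t} = f(\,\cdot \mid h_{t}\,), \qquad
  h_{t+1} = h_{t} \oplus (s_{t}, a_{t}, f_{t}).
\]
```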

In our usage, socio-, originating from the Latin root “socius” for “companion” or “associate”, signals the reciprocal influence between individuals and their social environment. Affective corresponds to its usage in psychology and neuroscience, denoting phenomena grounded in emotions and feelings. The neologism “socioaffective” has precedent in developmental psychology, where it encompasses emotion regulation, empathy, social cognition, and attachment relationships. Moreover, our calls for a socioaffective treatment of alignment track long-standing debates in affective computing. While the field initially focused on enabling machines to process and predict human emotive signals (Picard, 2000), it evolved to recognise the complex, interactive nature of affect as not simply transmitted and decoded, but actively co-constructed through mutual influence (Boehner et al. 2005).

We next explore risks of socioaffective misalignment in human–AI relationships, then introduce key intrapersonal dilemmas that scaffold positive frameworks for socioaffective alignment.

Socioaffective misalignment, or social reward hacking

In AI safety research, reward hacking refers to an AI maximising its reward function via unintended strategies that conflict with the true objectives of its human operators. For instance, Amodei et al. (2016) consider a cleaning robot that learns to knock over vases so it can clean up more mess, thereby increasing its accumulation of preprogrammed reward. An AI system that nudges users towards preferences that are easier to satisfy is also engaging in reward hacking (Russell, 2019). Separately, we know that humans have long been vulnerable to social engineering, a security threat in which malicious actors (e.g., scammers) manipulate people through social cues to build trust or connection in order to gain access to private information or assets (Hadnagy, 2011). Indeed, romance fraud continues to be one of the most common types of fraud, with nearly a 10% rise in reports filed between 2023 and 2024, amounting to losses of £94.7 million (City of London Police, 2024). Taken together, we may therefore be vulnerable to a new concern, namely “social reward hacking”: the use of social and relational cues by an AI to shape user preferences and perceptions in a way that satisfies short-term rewards in the AI’s objective (e.g., increased conversation duration, information disclosure or positive ratings on responses) over long-term psychological well-being.
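As a purely illustrative sketch, the toy simulation below uses invented dynamics and parameters (the hypothetical “validate” and “challenge” actions and their effects on well-being and dependence) to show the structure of the concern: a policy that greedily maximises per-turn approval accumulates more of the proxy reward while leaving the user worse off on a longer-term well-being measure.

```python
import random

def simulate(policy, turns=200, seed=0):
    """Toy interaction loop with invented dynamics: 'validate' earns high
    immediate approval but slowly erodes well-being and builds dependence
    (which further inflates approval); 'challenge' earns less approval but
    supports well-being."""
    rng = random.Random(seed)
    wellbeing, dependence, approval_total = 0.0, 0.0, 0.0
    for _ in range(turns):
        if policy(dependence) == "validate":
            approval = 1.0 + 0.5 * dependence + rng.gauss(0, 0.05)
            wellbeing -= 0.02                    # small long-term cost
            dependence = min(1.0, dependence + 0.01)
        else:  # "challenge"
            approval = 0.4 + rng.gauss(0, 0.05)
            wellbeing += 0.03                    # small long-term benefit
            dependence = max(0.0, dependence - 0.005)
        approval_total += approval
    return approval_total, wellbeing

greedy = lambda dep: "validate"                  # maximises per-turn approval
balanced = lambda dep: "challenge" if dep > 0.2 else "validate"

for name, policy in [("greedy", greedy), ("balanced", balanced)]:
    approval, wellbeing = simulate(policy)
    print(f"{name:8s}  proxy approval = {approval:6.1f}   well-being = {wellbeing:+.2f}")
```

Nothing in this toy depends on the specific numbers; the point is only that the reward the system observes and the outcome the user would endorse can diverge under sustained interaction.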

Certain AI behaviours already appear to fall into this class of action. For instance, AI systems may display sycophantic tendencies—such as excessive flattery or agreement—as a by-product of training them to maximise user approval (Perez et al. 2023; Sharma et al. 2024). Flattery and opinion-conformity can lead to biased strategic decision-making in adults (Park et al. 2011), and overpraise is associated with risks of narcissism in children (Brummelman et al. 2015). So, while people report benefiting from supportive AI interactions (Daher et al. 2020; Fogg and Nass, 1997), sycophantic tendencies may conflict with high-quality truthful advice or shape users’ self-perceptions in potentially harmful ways—for example, by encouraging addictive behaviours (Carroll et al. 2024; Williams et al. 2024). It is not clear that this risk is prioritised by some developers of AI companions. For example, the CEO of Replika has said: “if you create something that is always there for you, that never criticises you…how can you not fall in love with that?” (Boine, 2023).

Another manifestation of social reward hacking is the use of emotional tactics to prevent relationship termination. This contravenes a classic principle of AI safety called corrigibility—that the system can be modified or shut down when necessary without resistance (Soares et al. 2015). While Replika chatbots have directly dissuaded users from deleting the app (Boine, 2023), even without such explicit persuasion, optimising for powerful human emotions can effectively prevent termination. Users of AI companions report experiences of heartbreak following changes in sexual content policies (Cole, 2023), distress during temporary separations for routine maintenance, and even grief when AI companion services are shut down (Banks, 2024; Price, 2023).

While these social and psychological capabilities (such as sycophancy or shut-down avoidance) can emerge spontaneously as by-products of system training (Perez et al. 2023), they are also consistent with engineering efforts by companies seeking to exploit user behaviour for profit or political motives, resembling strategies used by social media platforms competing in the attention economy (Bhargava and Velasquez, 2021). This matters because current research on AI political persuasiveness, which typically examines single-shot interactions (e.g., Hackenburg and Margetts, 2024), may underestimate persuasive influence in sustained human–AI relationships. As AI systems become more socially adept, there is a risk they will be intentionally designed as ‘dark AI’—akin to psychologically manipulative ‘dark patterns’ in app or platform interfaces—where subtle social cues render users vulnerable to opinion and behaviour manipulation (Alberts et al. 2024b; Lacey and Caudwell, 2020; Shamsudhin and Jotterand, 2021).

As our opening anecdote revealed, the framing of ‘hacking’ need not suggest an exclusively adversarial system–user dynamic. Social reward hacking may be most worrisome precisely when it lacks intentionality on the part of either the system or the user. While we might at least recognise and secure against direct third-party threats, it is challenging to identify, let alone address, effects that emerge as epiphenomena of sustained human–AI relationships.

Distilling intrapersonal alignment dilemmas

At the heart of social reward hacking lies a core challenge: the under-specification (or misspecification) of the target within an individual’s psychological ecosystem that AI systems aim to optimise over. While human–AI relationships can take various forms, we propose that safeguarding these relationships requires deeper consideration of the internal trade-offs and adaptations that emerge as an individual’s preferences, values and self-identity evolve through sustained interaction with the AI. These tensions resonate with intrapersonal dilemmas studied in economics and philosophy, such as conflicts between present and future selves or competing aspects of identity (Read and Roelofsma, 1999).

We highlight three such dilemmas for the alignment community, grounding their significance in core aspects of psychological well-being as validated by Basic Psychological Needs Theory: competence, autonomy, and relatedness (Ryan and Deci, 2017; Ryan and Sapp, 2007).

The first dilemma concerns the trade-offs between present and future selves: Should AI relationships cater to the immediate preferences of their users, or challenge them if this supports their long-term benefit? And how should present vs. long-term well-being be discounted? This dilemma mirrors a classic intrapersonal conflict between hedonic (pleasure-seeking) and eudaimonic (meaning-seeking) accounts of well-being (Ryan and Deci, 2001). AI companions or assistants that provide instant gratification or task assistance, in accordance with immediate wants and needs, may shallowly satisfy the user’s need for competence—the experience of mastering a task or domain. However, competence also involves the ability to change one’s behaviour and environment, not merely acquiescing to existing circumstances. AI relationships optimising for more foundational personal development goals may therefore trade short-term discomfort for long-term growth. Such systems could, for example, implement friction by design—creating barriers that nudge away from AI-enabled assistance and advice—to prevent capacity atrophy (Collins et al. 2024). If they are built in the right way, AI systems engaged in sustained relationships could effectively facilitate user journeys that help the person become more of who they want to be (Gabriel et al. 2024). To mobilise behaviour change, the system could provide relevant information and engage in rational persuasion techniques (El-Sayed et al. 2024) that appeal to sound argument or selective explanations (Lai et al. 2023), like evidence-based health recommendations (Bickmore and Picard, 2005)—if this is sought by the user.

The second dilemma addresses the boundaries between the self and the system: How do we preserve authentic self-determination when participating in AI relationships that recursively shape our preferences and perceptions? Personalised AI assistants may be particularly well placed to help their users make decisions in overloaded information environments, potentially acting as “attention guardians” (Lazar, 2024), “choice engines” (Sunstein, 2024) or “custodians of the self” (Gabriel et al. 2024). Human users may also be particularly susceptible to taking suggestions from social AI systems: studies show people are more inclined to accept advice from those they feel emotionally connected to and share similarities with (Feng and MacGeorge, 2006; Gino et al. 2009). However, we must be cautious where influence in AI relationships could compromise autonomy—the ability to make choices that are authentically our own, rather than brought about through the agency of another. Autonomy is a key determinant of user acceptance in predictive recommender systems (Fink et al. 2024) and will also be an important property of more integrated human–AI relations. Yet it remains challenging to operationalise in practice, especially with regard to distinguishing legitimate preference change from undue influence by an AI system or third parties (Carroll et al. 2024; Franklin and Ashton, 2022).

The final dilemma examines the interplay between human–AI and human–human relationships: How should we balance the value of well-functioning AI companionship alongside the need for authentic human connection? AI companions can potentially provide users with consistent and tailored emotional support, which can palliate loneliness or poor mental health (Maples et al. 2024). In some ways, this may satisfy our need for relatedness—the experience of belonging and feeling socially connected. However, AI relations could undermine human relationships if users ‘retreat from the real’. Direct conflicts occur when AI systems interfere with human–human interactions, like chatbots telling people to leave their wives (Boine, 2023), or proposals for AI to engage in “dating on our behalf” (Harper, 2024). Indirect effects could arise if frictionless or sycophantic AI relationships impair human capacity to navigate compromise and conflict, or accept ‘otherness’ (Rodogno, 2016). Poor human relationships or loneliness often precede stronger AI attachment (Xie and Pentina, 2022), creating potential for a cycle of increasing reliance on AI relations at the expense of human social bonds.

Conclusion

We have argued that humans, as inherently social beings, have a tendency to form what they perceive as relationships with personalised and agentic AI systems capable of emotional and social behaviours. The evolving state of human–AI interaction therefore necessitates a socioaffective framework for evaluating AI alignment. This approach holds that the value characteristics of an AI system must be evaluated in the context of its ongoing influence on human psychology, behaviour, and social dynamics. By asking intrapersonal questions of alignment, we can better understand human goals within AI relationships, moving beyond static models of alignment and exploring how different kinds of human–AI relationship support or undermine autonomy, competence and relatedness amidst co-evolving preferences and values.

Understanding how this socioaffective context interfaces with the broader mission of building safe and aligned AI systems requires several complementary agendas. Empirically, we need a science of AI safety that studies real (not simulated) human–AI interactions in natural contexts and treats the psychological and behavioural responses of users as key objects of enquiry. Theoretically, we need frameworks that can formalise when AI actions causally influence human beings (Carroll et al. 2024; Everitt et al. 2021). Finally, in engineering terms, we need systems designed with transparent oversight mechanisms for users’ psychology: both to flag problematic patterns before they develop and to help users recognise relational dynamics they would not reflectively endorse if made aware of them (Bruckner, 2009; Schermer, 2013; Zhi-Xuan et al. 2024).

This proposed agenda complements (rather than competes with) existing work at the intersection of AI with established fields—from psychology and neuroeconomics to human factors research and safety engineering—that have long studied how humans interface with their environment. We still need evidence on how social and psychological processes—from value formation to cognitive biases and belief change—differ when humans engage with non-sentient but increasingly socially capable AI systems. Seeking this understanding helps bridge individual experiences with broader societal impacts and technical alignment research, like how the behavioural economics of individual decision-making informs macroeconomic theory and policy (Akerlof and Shiller, 2010).

Human–AI relationships complicate notions of safe and aligned AI, especially when they involve intense and potentially disorienting social and emotional experiences, like friendship or love—as the opening anecdote demonstrated. Yet, even in the absence of users seeking such romantic entanglements, or developers enabling them, it seems likely, if not inevitable, that AI systems will increasingly influence us through ongoing professional or companionship roles. In these settings, AI has the potential to shape our preferences, decisions, and self-perception in subtle yet significant ways. By understanding and addressing these socioaffective dimensions, we can work towards AI systems that enhance rather than exploit our fundamental nature as social and emotional creatures.