Introduction

The implicit or explicit assumption of fair treatment is a cornerstone of human moral behavior1,2. Numerous studies have provided evidence that individuals are more likely to behave prosocially to the extent that they trust that others will allocate resources equitably and will not systematically abuse such trust3,4. The research literatures on resource allocation and on interpersonal trust have, understandably, focused almost exclusively on human-to-human interactions5,6,7,8. However, with the increasing capabilities of LLM-based social agents to simulate human behavior, a growing literature has begun to turn these questions toward humans’ moral expectations of AI agents with the aim of better understanding the mechanisms - and limits - of anthropomorphism9,10,11,12,13.

Several research teams have approached these questions using classic behavioral economic paradigms. For example, Karpus et al.9 reported that in a one-shot Prisoner’s Dilemma game, participants expected the same degree of cooperative behavior from an AI counterpart as from a human, although they themselves responded significantly more cooperatively toward the human (49%) than toward the AI counterpart (36%) (see also13). Similarly, in a one-shot Trust Game, participants expected equivalent payouts from a human or an AI counterpart, although they themselves reciprocated prosocially 75% of the time when their counterpart was human, but only 34% of the time when it was an AI agent. Karpus et al.9 reported analogous results with a one-shot Dictator Game, and Nielsen et al.10 reported parallel findings with a Public Goods Game. In summary, although participants tend to approach an AI counterpart with the assumption that it will treat them as equitably as a human would, they feel less obligated to respond in kind toward the AI counterpart14.

The ultimatum game

Although the Prisoner’s Dilemma, Trust Game, and Dictator Game all partially concern expectations of fair treatment, they also invoke a range of further concerns, including interpersonal trust and strategic gamesmanship. Arguably a more direct method to assess assumptions of fair treatment per se is the Ultimatum Game (UG)15,16. In a one-shot UG, the Proposer starts with a sum of money. The Proposer decides how to split the money with the Responder (who is made aware of the total sum of money). The Responder may accept or reject this split. If the Responder accepts it, the money is split accordingly. If the Responder rejects the split, both players receive nothing. Before play begins, both players are informed of all possible outcomes. Thus, if the Responder rejects a disadvantageous split (e.g., 20%−80%), thereby choosing to receive nothing rather than something, this can be interpreted as a signal of the Responder’s displeasure over the violation of an implied expectation of fair treatment. Studies using the UG have consistently revealed that, indeed, a higher proportion of Responders reject disadvantageous splits (typically starting around the 30% Responder-70% Proposer range) than would be predicted by orthodox, rational choice theories17,18. In this respect, the UG differs from the related Dictator Game (DG); in the DG, the Responder is not afforded the opportunity to reject the offer.
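
As a concrete illustration of these outcome rules, the following minimal Python sketch (ours, for exposition only; the function and variable names do not come from any of the cited studies) encodes the one-shot UG payoffs:

    def ultimatum_payoffs(total, proposer_keeps, responder_accepts):
        """Return (proposer_payoff, responder_payoff) for a one-shot UG round."""
        if responder_accepts:
            return proposer_keeps, total - proposer_keeps
        return 0, 0  # a rejection leaves both players with nothing

    # Rejecting a disadvantageous 80/20 split costs the Responder 20 units,
    # signalling displeasure at the violated expectation of fair treatment.
    print(ultimatum_payoffs(100, 80, responder_accepts=True))   # (80, 20)
    print(ultimatum_payoffs(100, 80, responder_accepts=False))  # (0, 0)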

To what extent does this pattern extend to human-AI interaction? In one of the first studies to investigate the UG in this context, Sanfey et al.19 reported, from a small sample (19 participants), that Responders showed stronger emotional reactions to disadvantageous offers from a human than from a computer counterpart. Participants’ anterior insula activation during the rejection of inequitable offers correlated with emotional expressions, suggesting a possible neural substrate for the relevant emotional processes. The authors proposed that participants reacted more negatively toward the human’s disadvantageous offer than the computer’s disadvantageous offer because people grant a higher degree of autonomy13, agency11, intentionality20 and, in turn, moral responsibility21 to humans than to AI agents. (See also22 for related findings.)

Evolving expectations of AI agents

In the decades since, however, enormous advances in computational power and neural architecture have led to the proliferation of commercially available AI products. This has brought human-AI interaction more squarely into the realm of everyday experience. As such, expectations about AI moral behavior are likely to have evolved. The idea of anthropomorphically blaming, for example, ChatGPT (not the programmers, not the OpenAI corporation) for a perceived transgression may seem less far-fetched to laypeople in 2025 than in the past.

For this reason, we hypothesized that participants’ responses in an Ultimatum Game against an AI counterpart would differ from the pattern reported by19: when provided with a disadvantageous offer, present-day people will respond more negatively toward AI agents than toward humans. We suspected this for four reasons. First, recent studies have demonstrated that (relative to humans) people generally hold machines to, and expect, a higher standard of reliability, a phenomenon dubbed the Perfect Automation Schema23 (see also11,24). Thus, a violation of such expected reliability and fairness represents a greater betrayal25. Second, many people may hold the assumption that AI agents should serve humans, not undermine them (e.g.,26). A disadvantageous offer may violate the AI agent’s presumed subordinate role. Third, interpersonal politeness norms that demand negative emotion suppression27 may be reduced or absent when interacting with an AI counterpart (compared to a human counterpart). That is, people may feel freer to express their outrage toward an entity that is assumed to lack the capacity to experience feelings, such as insult or hurt11. Fourth, participants may feel less reluctant to aggress toward, or even ‘punish’, an AI counterpart because they see it as a mechanism that possesses fewer (if any) moral rights21,28.

Advantageous offers are also a violation of fairness norms

Considerably less work in this area has examined the psychology of receiving an unfairly advantageous offer, or advantageous inequity. Our method permitted us to measure participants’ offer rejection rate, self-reported affect, and heart-rate variability in response to the experience of being over-benefitted (e.g., 80% Responder-20% Proposer) by either a human or an AI counterpart. While a strict, rational self-interest approach might suggest that people should enjoy receiving unexpectedly high returns, there are grounds to suspect that, in practice, people will generally find advantageous inequity discomfiting. First, people may experience feelings of guilt upon receiving a payout that is perceived to be larger than deserved. This concept has a long history in psychological research, including classic work on Equity Theory in social relationships29, as well as relatively recent conceptualizations30,31. Second, people may doubt their ability to maintain the positive outcomes into the future and may thus experience anxiety over the likely impending decrease32. Third, separately from the valence of the payout (i.e., over- vs. under-benefit), people may generally experience confusion upon receiving any sort of unexpected result. Finally, the literature on downregulation of positive emotion33,34 indicates that, upon receiving positive news, people are typically sensitive to the emotional state of others in their environment and calibrate their own emotional expression so as not to deviate too much in a positive direction.

Building on the neuroeconomic foundation laid by19, later work by Civai et al.35 demonstrated a functional split within the fairness network: anterior insula activity flags violations of the equality norm, whereas mPFC/aMCC activity supports the effortful override of that norm when self-interest is salient, such as when accepting an advantageous offer. Complementary behavioural data indicate that equality itself functions as a default heuristic36; as such, people appear to be relatively averse to both giving and receiving offers that are advantageous to themselves. Studies on Advantageous Inequity Aversion (AIA) have demonstrated that medial frontal negativity (MFN), an index of neural activity associated with expectancy violation, responds negatively to advantageous inequities37,38. This aversion to unfair advantage aligns with societal norms that equate fairness with equal treatment, an observation supported by behavioral responses in the Ultimatum Game, in which advantageous offers are often rejected despite the potential gain. Furthermore, research with children indicates that this aversion is socially conditioned and develops with age. Whereas younger children tend to accept unfair advantages, by around age 8 they begin to reject such benefits, prioritizing fairness over personal gain39. Finally, there is evidence that people tend to view inequities (whether advantageous or disadvantageous) as violations of moral principles40.

To what extent might such concerns extend to AI agents? Given that most people assume that AI entities are (a) largely incapable of experiencing emotions11 and (b) less socially embedded than humans20, these interpersonal emotional concerns should be largely moot when one is over-benefitted by an AI agent. Thus, we hypothesized that participants would display lower advantageous inequity aversion (expressed in lower rejection rates, lower emotional disturbance, and modulation of heart-rate variability as an indicator of regulatory capacity) when over-benefitted by an AI counterpart than when over-benefitted by a human counterpart.

Our design involved a 21-round game rather than a one-shot game. This allowed us to aggregate across participants’ responses to multiple fair or unfair offers, yielding an arguably more robust measure than can be provided by a single, one-shot game. In addition, the extended play period permitted us to measure physiological indices of heart rate variability. We suspected that such indices, which we describe next, would provide additional insight into the processes that contribute to accept/reject decisions.

Heart rate variability

A healthy heart does not beat like a regular clock. Instead, its oscillations are complex and non-linear. Heart rate variability (HRV) refers to the fluctuation in the time interval between adjacent heartbeats41. It indexes neurocardiac function and is generated by heart-brain interactions and dynamic, non-linear autonomic nervous system (ANS) processes42. The beat-to-beat fluctuations of a healthy heart are best described by mathematical chaos43. Such nonlinear variability is thought to provide the flexibility to rapidly cope with an uncertain and changing environment44. An optimal level of HRV is associated with health, self-regulatory capacity, and adaptability or resilience41.

HRV (measured via time, frequency, and non-linear metrics) has been identified as a physiological marker of stress-induced elevation in the sympathetic nervous system (SNS) and reduction in the parasympathetic nervous system (PNS)45. Time-domain indices of HRV quantify the amount of variability in measurements of the inter-beat interval (IBI). Frequency-domain measurements estimate the distribution of absolute or relative power into frequency bands. Non-linear measurements allow researchers to quantify the unpredictability of a time series46.
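
For reference, the two most commonly reported time-domain indices can be written in terms of the N normal-to-normal (NN) intervals within an analysis window; these are standard definitions from the HRV literature rather than formulas specific to the present study:

    \mathrm{SDNN} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(\mathrm{NN}_i-\overline{\mathrm{NN}}\right)^{2}},
    \qquad
    \mathrm{RMSSD} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N-1}\left(\mathrm{NN}_{i+1}-\mathrm{NN}_i\right)^{2}},

where \overline{\mathrm{NN}} denotes the mean NN interval in the window.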

Several studies have focused on how individual differences in heart rate variability are associated with decision-making in the UG. For example, Sütterlin et al.47 reported that participants with higher resting HRV, which indicates greater parasympathetic nervous system activity and inhibitory control, were more likely to reject unfair offers in the UG. Additionally, performance on a separate motor response inhibition task (Stop Signal Task, SST) also predicted rejection rates in the UG. Combining HRV and SST measures explained a significant portion of the variance in rejection rates, suggesting that self-regulatory capacity plays a crucial role in overcoming economic self-interest and promoting fairness-related behavior in the UG. In a related study, Dulleck et al.48 reported that lower HRV (indicative of higher self-regulatory effort and/or stress) correlated with both rejecting and making unfair offers in the UG.

The primary time-domain measure used to estimate vagally mediated changes reflected in HRV is the root mean square of successive differences between normal heartbeats (RMSSD)42. Osumi and Ohira49 found that heart rate deceleration was more pronounced when participants received offers that they subsequently rejected, suggesting that the perception of disadvantage triggers physiological responses that contribute to the decision to reject. Another research team reported that criminal judges were not only slower in their responses, but also rejected a greater proportion of unfair offers, with their decision-making and rejection rates significantly correlated with higher HRV scores50. These data suggest that HRV reflects self-regulatory effort in moral decision-making contexts, including the UG context. Taken together, these findings have introduced a new perspective to UG behavior by highlighting inhibitory control processes that may contribute to the decision to reject an offer.

Materials and methods

Participants

One hundred and fifty participants initially enrolled in the study. However, due to participant nonresponse or system malfunctions, the final sample size for analysis was 142 participants (mean age = 19.22 years, SD = 2.68; age data were available for 136 participants), with 63 randomly assigned to the human counterpart condition and 79 to the AI counterpart condition. The sample size was determined using an a priori power analysis, informed by effect sizes and sample sizes reported in comparable prior studies (e.g.,47,50). All methods were approved by the Research Ethics Board at the University of Toronto and conducted in accordance with the university’s published ethical guidelines and regulations. Informed consent was obtained from all participants prior to the experimental procedures. Socio-demographic data (age, race/ethnicity, socioeconomic status) were collected via questionnaires at the end of the session. Exclusion criteria included cardiovascular disease, arrhythmias, serious medical conditions, and the use of medications that affect heart rate variability (e.g., beta-blockers, calcium channel blockers). Because they can influence HRV, participants also reported their weekly frequency of alcohol intake, tobacco intake, caffeine intake, exercise sessions of at least fifteen minutes, and meditation sessions of at least fifteen minutes. Participants were required to abstain from smoking and caffeine on the day of the experiment and to avoid strenuous exercise and heavy meals prior to testing.

Procedure

Fig. 1

Experimental protocol overview. Participants were randomized into Human versus AI conditions. Both conditions involved a 5-minute baseline PPG recording, 21 offers in the Ultimatum Game, a 5-minute final baseline, and questionnaires. Continuous PPG recording was maintained throughout.

After providing informed consent, proposers and responders briefly introduced themselves. In the AI condition, participants interacted with “SAM,” a small, disk-shaped tabletop speaker that delivered pre-recorded utterances generated with Amazon Polly. A human assistant in an adjoining room triggered these audio files using a standard Wizard-of-Oz (WoZ) procedure. Thus, SAM was not a fully autonomous agent. After each session, research assistants informally verified perceived agent identity; nearly all participants in the AI condition reported that they believed they had played against an AI. We note, however, that our credibility check was verbal rather than a standardized written manipulation check, a limitation given that WoZ methods, while common in HCI/HRI, can introduce expectation effects and demand characteristics if participants infer human control or over-attribute capabilities to AI51,52. To strengthen ecological validity in future work, we recommend incorporating formal post-task questionnaires to document agent beliefs, preregistered exclusion criteria for non-believers, and, where feasible, double-blind protocols or genuinely autonomous agents51,52.

SAM greeted the participants with, “Hi, my name is SAM, it’s nice to meet you today, how have you been doing?” SAM was always presented as the proposer. In the Human condition, the proposer, a research assistant posing as another participant, made a similar verbal greeting. Next, participants completed a series of questionnaires that test hypotheses beyond the scope of the present article; these are not discussed further here and can be found in the supplementary material. (One measure assessed trait-level dehumanization tendencies53. A second measure assessed individual differences in anthropomorphism tendencies54. A third measure was a single item assessing the perceived importance of fairness in decision-making. A fourth measure assessed trait-level interpersonal trust55.)

Then each participant was taken to a separate room and connected to the physiological recording system (Fig. 1). Throughout the session, continuous heart rate monitoring was conducted using PPG sensors. (See details below.) After tests of the PPG system, participants turned their attention to the computer terminal to play a modified version of the Ultimatum Game (UG). To briefly recap the UG procedure, on each round the proposer offers a split of points ranging from 0 to 100 (e.g., 70/30, such that the proposer keeps 70 and 30 goes to the responder). If the responder (the participant) agrees to the division, both parties receive the proposed amounts. However, a rejection by the responder results in neither player earning any points. All participants were assigned the role of the responder after completing prerequisite comprehension questions about the game rules. Research assistants, posing as fellow participants, served as proposers in the human counterpart condition. The sequence of offers was quasi-randomized to ensure that each participant received an equal distribution of offers: 7 fair, 7 disadvantageous, and 7 advantageous.
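
For concreteness, the sketch below (our own Python illustration; the specific offer values within each category are hypothetical, as the exact amounts are not enumerated here) shows how such a quasi-randomized 21-offer schedule can be generated and played out:

    import random

    # Hypothetical Responder shares (percent of the 100 points) for each category.
    FAIR = [50] * 7
    DISADVANTAGEOUS = [10, 20, 20, 30, 30, 40, 40]   # Responder offered 0-40%
    ADVANTAGEOUS = [60, 60, 70, 70, 80, 90, 100]     # Responder offered 60-100%

    def make_offer_schedule(seed=None):
        """Return a shuffled 21-round schedule with 7 offers of each type."""
        rng = random.Random(seed)
        schedule = FAIR + DISADVANTAGEOUS + ADVANTAGEOUS
        rng.shuffle(schedule)
        return schedule

    for round_number, responder_share in enumerate(make_offer_schedule(seed=1), start=1):
        proposer_share = 100 - responder_share
        # The participant (Responder) then chooses Accept or Reject;
        # a rejection yields 0 points for both players on that round.
        print(f"Round {round_number}: Proposer keeps {proposer_share}, offers {responder_share}")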

On each round, the human/AI counterpart’s offer was presented to participants on the screen. Participants’ task was to select ‘Accept’ or ‘Reject’. The Ultimatum Game was conducted using the oTree platform56. Following each of the 21 rounds, participants completed a brief self-reported affect questionnaire modeled on Osumi and Ohira49. In keeping with the exact language used by Osumi and Ohira49, participants rated their feelings of anger, aversion, reassurance, and pleasure. In addition, we added a new exploratory item (disgust) in light of the extensive literature on the link between disgust and moral condemnation (e.g.,57,58).

Heart rate variability measurement

To measure heart rate variability, we used photoplethysmography (PPG), which detects cardiovascular pulse waves via a light source and detector. This technique captures variations in blood volume or flow, with the intensity of backscattered light corresponding to changes in blood volume59,60. PPG offers a low-cost, non-invasive alternative to the traditional electrocardiogram (ECG), and embedded devices such as the Raspberry Pi and Arduino extend its functionality. Consistent with61, we used the Pulse Sensor Amped, an Arduino-based heart-rate sensor, to collect PPG data from participants, recording continuously from the initial baseline to the final baseline. Previous data, including analyses in both the time and frequency domains, indicate that PPG can be as reliable as ECG when a sampling frequency of at least 25 Hz is used62. To optimize accuracy, participants were asked to minimize movement of the hand equipped with the sensor. We used the Python toolbox NeuroKit263 for data analysis. The raw PPG signal, sampled at 160.414 Hz, underwent preprocessing with a bandpass filter (0.5–8 Hz, Butterworth 3rd order, following64). For more accurate RR interval estimation from PPG signals, we deployed specific functions from NeuroKit2 based on methodologies reported by65,66.
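
A minimal sketch of this preprocessing pipeline in Python with NeuroKit2 is shown below. It is our reconstruction for illustration only; the exact functions and artifact-correction settings used for RR estimation65,66 may have differed.

    import neurokit2 as nk
    import numpy as np

    SAMPLING_RATE = 160.414  # Hz, as recorded from the Arduino-based sensor

    def ppg_to_rr_intervals(raw_ppg):
        """Band-pass filter the raw PPG signal and derive RR (inter-beat) intervals in ms."""
        # 0.5-8 Hz Butterworth filter, 3rd order, as described above.
        cleaned = nk.signal_filter(raw_ppg, sampling_rate=SAMPLING_RATE,
                                   lowcut=0.5, highcut=8,
                                   method="butterworth", order=3)
        # Detect systolic peaks in the filtered signal.
        info = nk.ppg_findpeaks(cleaned, sampling_rate=SAMPLING_RATE)
        peak_samples = np.asarray(info["PPG_Peaks"])
        # Successive peak-to-peak differences give RR intervals in milliseconds.
        rr_ms = np.diff(peak_samples) / SAMPLING_RATE * 1000
        return rr_ms, peak_samples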

Data analysis

Accept/Reject decisions

Behavioral data were processed using RStudio version 2024.4.2.764. We defined rejection rates as the ratio of rejected offers to total offers in each category (Disadvantageous, Advantageous, Fair, and Overall). Offer types were categorized as Disadvantageous if the participant was offered 0–40%, Fair if offered 50%, and Advantageous if offered 60–100%. Given the non-normality of the distribution of rejection rates, a Kruskal-Wallis test was conducted to compare rejection rates between offer types. The Wilcoxon rank-sum test was used to assess between-condition (Human vs. AI) differences in rejection rates for advantageous and disadvantageous offers. Consistent with previous approaches in the literature (e.g., Osumi & Ohira49), repeated measures ANOVAs were performed on the affect questionnaire data to evaluate the effect of offer type (Fair, Advantageous, Disadvantageous) on subjective feelings. Additionally, Spearman’s rank correlation was used to examine relationships between self-reported emotions, heart rate variability (HRV) metrics, and rejection rates and counts. To ensure consistency across affective measures, the negative affect items (Anger, Aversion, and Disgust) were reverse coded on a standardised scale, so that higher values indicated higher positive affect. Although ‘anger,’ ‘aversion,’ ‘reassurance,’ and ‘pleasure’ suggest four semantically distinct emotion concepts, analyses revealed that participants’ responses were highly intercorrelated: Cronbach’s α = 0.86. Given this high degree of intercorrelation, we averaged the scores for each participant to create a composite affect index, with higher values indicating higher affect positivity.
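
Although the behavioral analyses were carried out in RStudio, the following Python sketch (ours; column names such as offer_pct and accepted are hypothetical) illustrates the offer categorization, the non-parametric tests, and the composite affect index described above.

    import pandas as pd
    from scipy import stats

    def offer_type(offer_pct):
        """Categorize the Responder's share of the 100 points."""
        if offer_pct <= 40:
            return "Disadvantageous"
        if offer_pct == 50:
            return "Fair"
        return "Advantageous"

    def rejection_rates(trials):
        """trials: DataFrame with columns participant, offer_pct, accepted (bool)."""
        trials = trials.assign(offer_type=trials["offer_pct"].map(offer_type),
                               rejected=~trials["accepted"])
        return (trials.groupby(["participant", "offer_type"])["rejected"]
                      .mean().unstack())

    def compare_offer_types(rates):
        """Kruskal-Wallis test of rejection rates across the three offer types."""
        return stats.kruskal(rates["Disadvantageous"].dropna(),
                             rates["Fair"].dropna(),
                             rates["Advantageous"].dropna())

    def compare_conditions(human_rates, ai_rates):
        """Wilcoxon rank-sum (Mann-Whitney U) comparison of Human vs. AI rejection rates."""
        return stats.mannwhitneyu(human_rates, ai_rates)

    def composite_affect(items):
        """Average affect items (negative items already reverse coded) into a positivity index.

        items: DataFrame of per-participant mean ratings; column names are hypothetical.
        Cronbach's alpha is computed from first principles; the paper reports
        alpha = 0.86 for anger, aversion, reassurance, and pleasure.
        """
        k = items.shape[1]
        alpha = k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))
        return items.mean(axis=1), alpha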

Physiological data analysis

From the PPG recordings, we extracted the following time, frequency, and nonlinear measures using the NeuroKit2 toolbox:

  • Standard Deviation of NN Intervals (SDNN), which indicates overall heart rate variability.

  • Low Frequency (LF) and High Frequency (HF) components, which reflect a combination of sympathetic and parasympathetic activity, and parasympathetic activity, respectively.

  • Cardiac Vagal Index (CVI), which specifically measures parasympathetic function.

  • Mean of normal-to-normal intervals (MeanNN), which is the mean of all NN intervals in milliseconds and is inversely related to heart rate.

  • Root Mean Square of the Successive Differences (RMSSD), which captures short-term variability linked to parasympathetic activity.

  • pNN50, which is the percentage of NN intervals differing by more than 50 ms.

  • Shannon Entropy, which measures the complexity and unpredictability of heart rate patterns.

  • SD1 and SD2, which reflect short-term and long-term autonomic regulation, respectively, and are considered to be sensitive to dynamic changes in stress and recovery.

We calculated these measures during the baseline period (5 min), then estimated them over three five-minute recording windows during Ultimatum Game play. The first and last 30 s of the data were excluded from the analysis to avoid potential artifacts or signal instability commonly observed during the start and end of PPG/ECG recordings, ensuring more reliable and accurate measurements. For short-term HRV measurements, 5-minute segments are generally considered appropriate67.
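
The windowing and metric extraction can be sketched as follows (Python/NeuroKit2; our illustration, which assumes peak indices such as those produced by the preprocessing sketch above, and uses NeuroKit2's HRV_* column-naming convention, which may vary slightly across versions):

    import neurokit2 as nk
    import numpy as np

    SAMPLING_RATE = 160.414                   # Hz
    WINDOW = int(5 * 60 * SAMPLING_RATE)      # 5-minute analysis windows (in samples)
    TRIM = int(30 * SAMPLING_RATE)            # 30-s trims at recording start and end

    METRICS = ["HRV_SDNN", "HRV_RMSSD", "HRV_pNN50", "HRV_MeanNN", "HRV_LF",
               "HRV_HF", "HRV_SD1", "HRV_SD2", "HRV_ShanEn", "HRV_CVI"]

    def hrv_per_window(peak_samples, window_starts):
        """Compute HRV indices for each 5-minute window, given systolic peak indices."""
        peak_samples = np.asarray(peak_samples)
        rows = []
        for start in window_starts:
            stop = start + WINDOW
            in_window = peak_samples[(peak_samples >= start) & (peak_samples < stop)]
            hrv = nk.hrv(in_window, sampling_rate=SAMPLING_RATE)   # one-row DataFrame
            rows.append(hrv.reindex(columns=METRICS).iloc[0])      # keep the metrics of interest
        return rows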

Results

Behavioral results: Accept/Reject decisions

Rejection rates varied by offer type

First, to ascertain that the data replicated previous findings (e.g.,22,49,68), we compared the rejection rates by offer type (Advantageous, Fair, Disadvantageous), collapsing across the human and AI counterpart conditions. A Kruskal-Wallis test revealed a significant difference in rejection rates among the different offer types in the predicted direction (i.e., highest for Disadvantageous, lowest for Advantageous), χ² (2) = 257, p <.001.

The rejection rate for disadvantageous offers was higher in the AI condition than in the human condition

Next, we compared the rejection rates for disadvantageous offers between the human and AI conditions. A Wilcoxon rank-sum test revealed that participants were more likely to reject disadvantageous offers from an AI counterpart than from a human counterpart, W = 1688, p =.0067 (Fig. 2A). Note that this is consistent with our hypothesis; whereas19 reported a higher rejection rate with the human counterpart, these data, collected over two decades later and with a much larger sample, differed notably from the original effect. As we discuss below, this is likely due to evolving expectations about computerized/AI agents since 2003. We wish to note the following caveat, however: our use of a between-participants manipulation of Human vs. AI differed from Sanfey et al.’s19 within-participants manipulation of Human vs. Computer. Thus, our study should be considered an inexact replication.

The advantageous offer rejection rate was higher in the human condition

Next, we compared the rejection rates for advantageous offers between the human and AI conditions. A Wilcoxon rank-sum test indicated that participants exhibited a significantly higher rejection rate for advantageous offers when playing with a human counterpart, W = 2703.5, p =.007 (Fig. 2B). This is also consistent with our hypothesis.

Fig. 2

Plot (A) shows the counterpart type comparison of the rejection rate for disadvantageous offers. Plot (B) shows the counterpart type comparison of rejection rate for advantageous offers. Significance levels: *p <.05, **p <.01, ***p <.001.

Impact of affect

Subjective affect varied by human vs. AI condition

After removing mindless or logically incoherent entries (for more information on the exclusion criteria, see the Supplementary Material; the results did not meaningfully vary whether these entries were included or excluded), we examined the effect of counterpart type and offer type on self-reported affect following fair offers, disadvantageous offers, and advantageous offers. The results are summarised in Figs. 3, 4 and 5. Recall that higher values indicate higher affect positivity.

Disadvantageous offers

A one-way analysis of variance (ANOVA) on the self-reported affect index following disadvantageous offers revealed a significant effect of Counterpart Type (Human vs. AI), F (1, 128) = 6.79, p =.01, ηp² = 0.05. The direction of the effect indicates that participants reported feeling less positive affect/more negative affect following disadvantageous offers made by AI counterparts than by human counterparts.

Advantageous offers

In contrast, an analogous ANOVA on self-reported affect following advantageous offers revealed no significant effect of Counterpart Type, F (1, 130) = 0.707, p =.402, ηp² = 0.005. This suggested no substantial difference in self-reported affect when participants received advantageous offers from human versus AI counterparts.

Fair offers

An analogous ANOVA for fair offers revealed no significant main effect of Counterpart Type (Human vs. AI) on self-reported affect, F (1, 130) = 1.981, p =.162, ηp² = 0.015. This indicates that participants’ emotional responses to fair offers did not differ significantly when the offers were made by a human counterpart or an AI counterpart.

These results suggest that AI counterparts elicit more negative affect than human counterparts only following a disadvantageous offer. This pattern is consistent with the finding that participants were more likely to reject disadvantageous offers from the AI counterpart than from the human counterpart.

Fig. 3

Affect by Counterpart Type. Higher values on the Y axis indicate higher positive affect. The figure summarizes the pairwise comparisons for (A) Disadvantageous offers. (B) Fair offers. (C) Advantageous offers. Significance levels: *p <.05, **p <.01, ***p <.001.

Subjective affect was generally correlated with rejection rate, but this association did not vary as a function of counterpart type (AI versus Human)

To assess the relationship between subjective affect and rejection rate, we conducted correlation analyses between affective responses and rejection rates for each offer type, as depicted in Table 1.

Disadvantageous offers

Following disadvantageous offers, affect positivity was negatively correlated with rejection rate in both the AI condition (rs = − 0.321, p <.01) and the Human condition (rs = − 0.497, p <.001). To investigate whether the relation between affect and rejection rate was stronger in the Human condition than in the AI condition, we regressed rejection rate onto affect positivity, counterpart type, and the interaction term. Results indicated that affect positivity significantly predicted rejection rate overall, B = −0.089, SE = 0.022, p <.001, with the negative sign indicating that more positive affect was associated with a lower rejection rate. However, the interaction between counterpart type and affect was not significant, B = 0.030, SE = 0.030, p =.325, suggesting that the relationship between affect and rejection rate did not differ as a function of playing with an AI or a human counterpart.
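
This moderation test can be sketched with a formula-style regression. The example below uses Python/statsmodels on synthetic stand-in data purely for illustration; the original analyses were run in R, and the variable names are hypothetical.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Synthetic stand-in data; in the actual analysis each row would be one
    # participant's disadvantageous-offer rejection rate, composite affect
    # positivity, and counterpart condition.
    rng = np.random.default_rng(0)
    n = 142
    df = pd.DataFrame({
        "affect": rng.normal(3, 1, n),
        "counterpart": rng.choice(["Human", "AI"], n),
    })
    df["rejection_rate"] = np.clip(0.6 - 0.09 * df["affect"] + rng.normal(0, 0.1, n), 0, 1)

    # The affect x counterpart interaction term tests whether the affect-rejection
    # association differs between the Human and AI conditions.
    model = smf.ols("rejection_rate ~ affect * C(counterpart)", data=df).fit()
    print(model.summary())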

Advantageous offers

For advantageous offers, affect positivity was negatively correlated with rejection in the AI condition (rs = − 0.291, p <.05) and in the Human condition (rs = − 0.465, p <.001). As above, we performed a regression analysis to investigate whether the relation between affect and the rejection of advantageous offers was stronger in the Human condition than in the AI condition. Results indicated a significant main effect of counterpart type, B = −0.071, SE = 0.028, p =.013, indicating that participants rejected advantageous offers from human counterparts more frequently than from AI counterparts. Affect also significantly predicted rejection rate, B = −0.084, SE = 0.018, p <.001, with higher positive affect associated with lower rejection rates. The interaction between counterpart type and affect was not significant, B = 0.042, SE = 0.024, p =.078, indicating that the effect of affect on rejection rate was not robustly stronger in the Human condition.

Fair offers

Affect positivity following fair offers was negatively correlated with the rejection rate of fair offers in the AI condition (rs = − 0.339, p <.01) and the Human condition (rs = − 0.477, p <.001). That is, the more negative (less positive) affect participants reported feeling after an offer, the more likely they were to reject that offer. To test for possible differences between the AI condition and the Human condition, we conducted a regression analysis analogous to the ones above. It revealed that the main effect of counterpart type was not significant, B = −0.046, SE = 0.035, p =.194, indicating no significant difference in fair offer rejection rates between AI and Human counterparts. However, affect significantly predicted rejection rate, B = −0.075, SE = 0.014, p <.001, with more positive affect associated with lower rejection rates. The interaction between counterpart type and affect was not significant, B = 0.027, SE = 0.024, p =.254, suggesting that the effect of affect on rejection rate did not differ between AI and Human counterparts.

Table 1 Correlations between affect and offer rejection rates in the human and AI conditions. Significance levels: *p <.05, **p <.01, ***p <.001.

In summary, analyses of participants’ self-reported affect revealed that participants reported feeling (a) generally more displeasure following disadvantageous offers than following the other types of offers and (b) more displeasure following disadvantageous offers from an AI counterpart than from a human counterpart. However, these differences did not differentially influence participants’ ultimate decisions to accept or reject offers from AI versus human counterparts. This may indicate that self-reported affective state is a comparatively crude measure, susceptible to both reporting biases and high variability. It may also indicate that additional processes, beyond affect, play an important role in accept/reject decisions. We turn next to such potential processes.

Heart rate variability

Manipulation check: analysis of HRV responses across initial baseline, Ultimatum Game, and final baseline periods

First, to examine the effect of playing the Ultimatum Game on heart rate variability (HRV), we analyzed HRV across three time periods: the initial baseline (B1), the UG period, and the final baseline (B2). Significant changes in HRV were observed across these periods, reflecting physiological variations. Because no significant interaction or main effect of counterpart type (Human vs. AI) was detected, we combined the conditions (Human/AI) and conducted a repeated measures one-way ANOVA to focus on the overall effect of time. Because the baseline and UG periods were of different lengths, for consistency we focused on the middle five minutes of the UG period. Data from 12 participants were excluded because sensor data collection errors resulted in the loss of the final baseline, or of both the Ultimatum Game (UG) period and the final baseline.
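
A sketch of how such a repeated-measures comparison can be set up is shown below (Python with the pingouin package on synthetic stand-in data; the article does not state which software computed these ANOVAs, and the column names are hypothetical):

    import numpy as np
    import pandas as pd
    import pingouin as pg

    # Synthetic long-format stand-in: one row per participant x period,
    # with an HRV metric (CVI shown) measured at B1, UG, and B2.
    rng = np.random.default_rng(1)
    n = 110
    long_df = pd.DataFrame({
        "participant": np.repeat(np.arange(n), 3),
        "period": np.tile(["B1", "UG", "B2"], n),
        "CVI": rng.normal(loc=np.tile([4.4, 4.5, 4.7], n), scale=0.3),
    })

    # Repeated-measures ANOVA with Greenhouse-Geisser correction ...
    aov = pg.rm_anova(data=long_df, dv="CVI", within="period",
                      subject="participant", correction=True)
    # ... followed by Bonferroni-corrected pairwise comparisons with Cohen's d
    # (pg.pairwise_ttests in older pingouin versions).
    posthoc = pg.pairwise_tests(data=long_df, dv="CVI", within="period",
                                subject="participant", padjust="bonf", effsize="cohen")
    print(aov, posthoc, sep="\n")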

A one-way ANOVA revealed a significant main effect of time for CVI, F (1.74, 190.09) = 17.76, p <.001, partial η² = 0.14. Pairwise comparisons demonstrated that CVI was significantly lower during the initial baseline compared to the second baseline, t (109) = − 5.08, p <.001, Cohen’s d = − 0.48. The initial baseline was not significantly different from the UG period, t (109) = − 2.35, p =.062, Cohen’s d = − 0.22. Additionally, CVI was significantly higher during the second baseline compared to the UG period, t (109) = 4.34, p <.001, Cohen’s d = 0.41.

For RMSSD, a significant main effect of time was observed, F (1.66, 181.02) = 4.30, p =.015, partial η² = 0.038. Pairwise comparisons showed no significant difference between the initial and second baselines, t (109) = − 2.33, p =.066, Cohen’s d = − 0.22, nor between the initial baseline and the UG period, t (109) = − 0.71, p =.481, Cohen’s d = − 0.07. However, RMSSD was significantly higher in the second baseline compared to the UG period, t (109) = 2.53, p =.039, Cohen’s d = 0.24.

For SD1, a significant main effect of time was observed, F (1.66, 181.03) = 4.31, p =.015, partial η² = 0.038. Pairwise comparisons indicated no significant difference between the initial baseline and the second baseline, t (109) = − 2.33, p =.065, Cohen’s d = − 0.22, or between the initial baseline and the UG period, t (109) = − 0.71, p =.481, Cohen’s d = − 0.07. The second baseline was significantly higher than the UG period, t (109) = 2.53, p =.039, Cohen’s d = 0.24.

For SD2, a significant main effect of time was observed, F (1.83, 199.98) = 29.17, p <.001, partial η² = 0.211. Pairwise comparisons revealed that SD2 was significantly lower during the initial baseline compared to the second baseline, t (109) = − 6.57, p <.001, Cohen’s d = − 0.63. The initial baseline was significantly lower than the UG period, t (109) = − 2.55, p =.037, Cohen’s d = − 0.24. Additionally, SD2 was significantly higher during the second baseline compared to the UG period, t (109) = 5.34, p <.001, Cohen’s d = 0.51.

For SDNN, a significant main effect of time was observed, F (1.78, 194.42) = 16.49, p <.001, partial η² = 0.131. Pairwise comparisons indicated that SDNN was significantly lower during the initial baseline compared to the second baseline, t (109) = − 5.44, p <.001, Cohen’s d = − 0.52. The initial baseline was not significantly different from the UG period, t (109) = − 2.01, p =.142, Cohen’s d = − 0.19. SDNN was significantly higher during the second baseline compared to the UG period, t (109) = 4.87, p <.001, Cohen’s d = 0.46.

For Shannon Entropy, a significant main effect of time was observed, F (1.78, 194.42) = 21.25, p <.001, partial η² = 0.163. Pairwise comparisons revealed that ShanEn was significantly lower during the initial baseline compared to the second baseline, t (109) = − 4.85, p <.001, Cohen’s d = − 0.46. The difference between the initial baseline and the UG period was not significant, t (109) = − 1.97, p =.154, Cohen’s d = − 0.19. However, ShanEn was significantly higher in the second baseline compared to the UG period, t (109) = 4.24, p <.001, Cohen’s d = 0.40.

Lower HRV during the initial baseline aligns with anticipatory stress before engaging in a task69. The UG period likely imposed cognitive or emotional challenges, reflected in reduced HRV measures such as SDNN, CVI, RMSSD, SD1, SD2. Similarly, Entropy reductions during stress periods (initial baseline and UG) likely reflect lower complexity in heart rate variability, a marker of reduced autonomic adaptability under stress70. Increased variability during the second baseline indicates the individual’s recovery and adaptive capacity following the stressor, as suggested by research on autonomic recovery dynamics45.

These analyses provide evidence that HRV measures do capture heightened cognitive or emotional demands during the UG period. These data also serve as a manipulation check. The initial baseline’s lower HRV likely reflects anticipatory stress, while the final baseline’s higher HRV indicates recovery and relaxation following task completion. Consistent with systematic review evidence that heart-rate variability (HRV) is a general index of cognitive and emotional load rather than a task-specific marker71, we observed a tonic suppression of HRV across the entire Ultimatum Game (UG) period. Our focus, following47, was on resting (trait-like) HRV as an index of inhibitory capacity; we therefore did not analyse phasic, event-locked changes. The observed decrease should be interpreted as reflecting the overall cognitive-emotional demands of making UG decisions.

Association between heart rate variability and ultimatum game decisions

Next, we examined the relationship between various HRV measures and fair, advantageous, disadvantageous, and overall offer rejection rates in the Human and AI conditions. Spearman correlations are presented in Table 2, as they are generally robust to outliers72. Outliers were not removed, but rather winsorized, following the recommendations of69. The results reveal distinct patterns across various HRV metrics. In line with Sütterlin et al.47, who reported Pearson correlations, we also tested Pearson correlations; the detailed results largely align with the Spearman correlation findings.
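
For illustration, the winsorization and condition-wise Spearman correlations can be sketched as follows (Python; column names are hypothetical, and the winsorization limits shown are an assumption, as the exact percentile cutoffs are not reported here):

    import numpy as np
    from scipy import stats
    from scipy.stats.mstats import winsorize

    def condition_spearman(df, hrv_col, rate_col, limits=(0.05, 0.05)):
        """Winsorize an HRV metric, then correlate it with rejection rate per condition.

        df: DataFrame with columns counterpart ("Human"/"AI"), the HRV metric,
        and the rejection rate.
        """
        out = {}
        for condition, grp in df.groupby("counterpart"):
            hrv = np.asarray(winsorize(grp[hrv_col].to_numpy(), limits=limits))
            rho, p = stats.spearmanr(hrv, grp[rate_col])
            out[condition] = (rho, p)
        return out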

Following prior work that treats resting/tonic HRV as a trait-like proxy for inhibitory/self-regulatory capacity, we focused our analyses on tonic HRV and did not analyze phasic, event-locked changes47. Accordingly, the decrease across the UG period, discussed in the Manipulation Check section above, should be read as a task-general engagement of autonomic regulation rather than a fairness-condition–specific effect71,73. To aid interpretation, we report time-domain and non-linear HRV metrics that capture complementary facets of autonomic self-regulation. SDNN (SD of normal-to-normal intervals) reflects overall autonomic flexibility across mixed time scales, whereas RMSSD (root-mean-square of successive differences) and HF-HRV are more specifically vagally mediated and therefore represent tighter indices of short-epoch self-regulatory capacity42,67. From the Poincaré analysis, SD1 indexes short-term (beat-to-beat) variability with strong parasympathetic influence, and SD2 indexes longer-term oscillations involving both vagal and baroreflex contributions42. MeanNN is the mean RR interval (inverse of mean heart rate), and entropy-based measures (e.g., Shannon entropy) summarize the non-linear complexity of the RR series, with higher values generally denoting a more adaptable regulatory system42. The Cardiac Vagal Index (CVI), derived from Poincaré plot descriptors, provides an additional nonlinear estimate of parasympathetic activity42,74. Together, higher values on vagal-specific indices (e.g., RMSSD, SD1, CVI) indicate greater regulatory capacity, whereas lower values index reduced autonomic flexibility under cognitive–emotional load42,67. To strengthen fairness-specific inferences in future work, we recommend adding event-related (phasic) HRV analyses time-locked to offer onset and outcome feedback, alongside standardized resting baselines and post-decision recovery intervals42,73. Phasic/reactivity designs have proven sensitive to trial-level regulatory demands in cognitively effortful decisions, suggesting that similar alignment in the UG could dissociate general task engagement from fairness-condition–specific regulation73,75.

At the aggregate level, a correlational analysis suggests differences in HRV metrics between AI and Human conditions. For overall rejection rate, correlations between HRV measures and rejection rate were consistently stronger in the AI condition: SDNN (0.334 for AI vs. 0.062 for H), SD2 (0.401 for AI vs. 0.098 for H), and ShanEn (0.312 for AI vs. 0.035 for H). These differences indicate that participants in the AI condition generally displayed a stronger association between physiological metrics and rejection of offers.

Disadvantageous offers

For disadvantageous offers, a similar trend emerged, with stronger correlations in the AI condition for SD2 (0.201 for AI vs. −0.08 for H) and ShanEn (0.153 for AI vs. −0.114 for H). The negative correlations in the Human condition (e.g., MeanNN: −0.167 for H vs. 0.181 for AI) might suggest differing physiological responses to unfairness when interacting with AI versus humans. To examine more directly whether the type of counterpart (human vs. AI) influenced the relationship between heart rate variability (HRV) predictors and rejection of disadvantageous offers, we conducted multiple regression analyses. We found no association between body mass index (BMI) and HRV, so we excluded it as a confounding variable in the subsequent regression analysis. We regressed the disadvantageous offer rejection rate onto each specific HRV measure, condition (Human vs. AI), and the interaction term. Significant interaction effects were observed for SDNN (β = −0.0028, p =.021), MeanNN (β = −0.0011, p =.002), SD2 (β = −0.0023, p =.040), and ShanEn (β = −0.1652, p =.003), such that participants in the AI condition exhibited a stronger positive association between these HRV metrics and rejection rate than did participants in the human condition.
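
The per-metric moderation analyses can be sketched as a loop over HRV predictors (Python/statsmodels illustration; as with the affect model above, the original analyses were run in R and the column names are hypothetical):

    import statsmodels.formula.api as smf

    HRV_PREDICTORS = ["SDNN", "MeanNN", "SD2", "ShanEn", "RMSSD", "SD1", "CVI"]

    def hrv_moderation_models(df, outcome="disadv_rejection_rate"):
        """Fit outcome ~ HRV x counterpart separately for each HRV metric.

        df: one row per participant with the rejection rate, HRV columns named
        as in HRV_PREDICTORS, and a counterpart column ("Human"/"AI").
        """
        results = {}
        for metric in HRV_PREDICTORS:
            formula = f"{outcome} ~ {metric} * C(counterpart)"
            results[metric] = smf.ols(formula, data=df).fit()
        return results

    # The coefficient on the 'metric:C(counterpart)[T.Human]' (or [T.AI], depending
    # on the reference level) term is the interaction reported in the text.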

Advantageous offers

For advantageous offers, the differences between AI and Human conditions were less pronounced but still notable. Measures such as SD2 (0.161 for AI vs. 0.102 for H) and MeanNN (0.185 for AI vs. 0.212 for H) exhibited comparable correlations in both conditions. To examine more directly whether the type of counterpart (Human vs. AI) influenced the relationship between heart rate variability (HRV) measures and rejection of advantageous offers, we conducted analogous regression analyses. Rejection rate was regressed onto each HRV measure, Counterpart Type (Human vs. AI), and the interaction term. No significant interaction effects were observed for SDNN (β = −0.00005, p =.940), MeanNN (β = −0.00001, p =.947), SD2 (β = −0.00009, p =.892), or ShanEn (β = −0.0011, p =.974), indicating that the associations between these HRV measures and rejection rates did not differ between the Human and AI conditions. A significant main effect of MeanNN was observed, β = 0.0004, p =.006, suggesting a positive association between MeanNN and rejection rate regardless of whether the counterpart was a human or an AI agent. Additionally, counterpart type significantly predicted rejection rates, β = −0.052, p =.005, with participants in the Human condition rejecting advantageous offers more frequently than participants in the AI condition. Neither Affect (β = −0.002, p =.813) nor its interaction with SDNN (β = −0.0002, p =.327) were significant predictors of rejection rate.

Fair offers

For fair offers, the Spearman correlation analysis revealed stronger associations in the AI condition compared to the Human condition for most HRV measures. Notable differences include SDNN (0.377 for AI vs. 0.068 for H), SD2 (0.405 for AI vs. 0.099 for H), and ShanEn (0.344 for AI vs. 0.077 for H). To examine more directly whether the type of counterpart (Human vs. AI) influenced the relationship between heart rate variability (HRV) measures and rejection of fair offers, we conducted analogous multiple regression analyses. Rejection rate was regressed onto each HRV measure, Counterpart Types (Human vs. AI), and their interaction term. Significant interaction effects were observed for SDNN (β = −0.0023, p =.016) and SD2 (β = −0.0021, p =.014), indicating that these HRV measures were more strongly associated with rejection rate in the AI condition compared to the human condition.

Table 2 Comparison of Spearman correlations between HRV measures (SDNN, SD1, SD2, RMSSD, MeanNN, ShanEn, and CVI) and Ultimatum Game decisions (rejection rates for fair offers, disadvantageous offers, advantageous offers, and overall rejection rate) in the Human and AI conditions. Significant differences highlight stronger associations in the AI condition across multiple HRV metrics.

In summary, these analyses reveal stronger HRV associations with rejection rate in the AI condition than in the Human condition, suggesting that individuals experience higher physiological activation when interacting with an AI agent (Fig. 4). The inhibitory component of rejecting a disadvantageous offer is thought to involve overriding one’s default preference to accept any money that is offered47. Thus, these analyses suggest that participants’ higher rejection rate of disadvantageous offers from the AI counterpart is driven, in part, by higher parasympathetic system activity associated with inhibitory control. Was this association moderated by participants’ affect? If so, it would suggest that emotional tone contributes to shaping participants’ deployment of self-regulatory processes. If not, it would suggest that affective processes and self-regulatory processes represent largely independent contributions to accept/reject decisions. We turn next to this question.

Fig. 4

Counterpart Type differences in the association between HRV indices and disadvantageous offer rejection rates. The interaction between Condition (H = Human, AI = Artificial Intelligence) and four heart rate variability (HRV) indices is shown: (A) SDNN (B) Shannon Entropy (ShanEn), (C) SD2 and (D) MeanNN. Blue solid lines represent the AI condition group, and orange dashed lines represent the Human condition group. Shaded regions indicate 95% confidence intervals. While HRV indices (ShanEn, SDNN, SD2, MeanNN) generally predict increasing rejection rates in the AI condition group, the Human condition group shows flatter or decreasing trends.

In regression analyses of rejection rates of Advantageous, Disadvantageous, and Fair offers with Affect, HRV (SDNN), and the interaction term as predictors, distinct patterns emerged between participants who faced human versus AI counterparts. For Advantageous offers, participants in the human condition showed no significant main effects for SDNN (β = 0.00078, p =.220) or Affect (β = −0.00476, p =.541), but a significant interaction between SDNN and Affect (β = −0.00060, p =.034). The negative sign indicates that as parasympathetic activity (SDNN) increases, the relationship between Affect and rejection rate weakens or reverses. In contrast, participants in the AI condition displayed no significant main effects of SDNN (β = 0.00060, p =.156) or Affect (β = 0.00102, p =.843), and no interaction effects (β = 0.00011, p =.623).

For Disadvantageous offers, participants in the human condition exhibited a significant main effect of Affect (β = −0.0266, p =.023), indicating that greater negative affect predicted higher rejection rates, while SDNN (β = −0.00057, p =.549) and the interaction term (β = −0.00005, p =.898) were non-significant. In the AI condition, in contrast, SDNN showed a significant positive main effect (β = 0.00174, p =.033), suggesting that higher parasympathetic activity was associated with increased inhibitory control when rejecting disadvantageous offers, whereas Affect (β = −0.0115, p =.252) and the interaction term (β = 0.00018, p =.675) were not significant.

For Fair offers, participants in the human condition displayed a significant main effect of Affect (β = −0.0286, p =.008), indicating that greater positive affect was associated with lower rejection rates, while SDNN (β = 0.0012, p =.159) and the interaction (β = −0.00050, p =.184) were not significant. In the AI condition, SDNN emerged as a significant positive predictor of rejection rate (β = 0.00279, p <.001), while Affect (β = −0.0064, p =.310) and the interaction term (β = −0.00025, p =.365) were non-significant.

In summary, these analyses suggest distinct mechanisms driving rejection decisions when interacting with human versus AI counterparts. Specifically, the decision to reject an offer appears to be influenced by two sources: (a) affective experience and (b) inhibition of the tendency to accept any amount of money. With a human counterpart, the decision to reject an offer is primarily driven by affective state, such that higher positive emotion reduces the likelihood of rejection. This reflects a relatively straightforward process, as participants can readily interpret the experience of receiving a disadvantageous offer within a typical, human-to-human, normative framework. Moreover, human proposers may display emotional cues that may activate or heighten responders’ own emotional state through, for example, arousal attributional processes76 or emotional contagion77. In contrast, when interacting with an AI counterpart, rejection decisions are less influenced by emotional states, but more influenced by the deployment of inhibitory processes—reflected in higher parasympathetic activity (e.g. SDNN) which override the default tendency to accept any offer. This additional inhibitory control may stem from participants’ unfamiliarity and uncertainty with how to navigate social conventions with a quasi-human entity, making the act of rejecting an AI offer comparatively more effortful. Moreover, the AI counterpart in our study did not display any emotional cues that might activate or intensify participants’ affective state. Figure 5 illustrates this proposed relation between affect and self-regulation by enlarging the affect component for human counterparts but broadening the self-regulation component for AI counterparts.

General discussion

These data yielded several novel findings regarding individuals’ behavioral, physiological, and affective responses to disadvantageous, advantageous, and fair offers from AI agents. First, participants were more likely to reject disadvantageous offers from an AI counterpart than from a human counterpart. Although, as noted above, our method was not a direct replication of19, the pattern nonetheless differs from Sanfey et al.’s19 finding (and was obtained with a much larger sample). We suggest at least four reasons for this difference. First, with the passage of time, AI entities have assumed a larger role in millions of people’s everyday lives, which has likely led to evolving expectations of AI social behavior since 2003. Second, research on the Perfect Automation Schema23 has demonstrated that people often expect higher reliability and impartiality from autonomous agents. For example, Shariff, Bonnefon, and Rahwan24 reported that participants held autonomous vehicles to a higher standard of safety than they required from human drivers. Third, people may believe that AI agents should be subservient to humans26; thus, an AI agent that attempts to dominate a human with a lowball offer would violate its presumed subservient position. These findings align with Treiman et al.28, who demonstrated that participants even exhibit a willingness to incur personal costs to instill fairness in AI. Fourth, although LLMs are increasingly convincing as conversational agents, the social norms that inhibit negative emotional expressions in human-to-human interactions may nonetheless be weaker in human-to-AI interactions11. Thus, people may feel freer to express displeasure toward, or even ‘punish’, an entity that will presumably not take offence. Although we interpret the higher rejection of AI offers through the lens of the Perfect Automation Schema (PAS) and the agent’s presumed subordinate role, two complementary mechanisms likely also contribute. The first is anthropomorphism; to the extent that participants perceive artificial agents to possess less of a human-like mind, they are less likely to grant artificial agents standing as moral agents or patients20,21,78. Although we did not measure perceived anthropomorphism in this study (we recorded trait anthropocentrism to address questions beyond the scope of the present paper), future work should experimentally vary the robot/agent’s psychological features along the VASS dimensions (Values, Autonomy, Social connection, and Self-awareness), which have been shown to shape trust in robots13 (see also20). Such manipulations should be calibrated against uncanny-valley risks, as increasing human-likeness can backfire and induce unease in users53.

Second, emotional signals strongly shape trust, cooperation, and willingness to concede79. Artificial agents typically offer limited and non-contingent emotional feedback. Virtual agents’ facial/affective signals (or lack thereof) and beliefs about their origin (e.g., algorithmic vs. human-controlled, or how explainable they are) systematically shape users’ decisions in negotiation and social-dilemma tasks80. Related work shows that an agent’s voice quality/prosody systematically shifts perceived anthropomorphism and competence/intelligence in human–agent interactions and teams81,82. If AI partners supply limited socio-emotional information, responders may rely more on internal regulation (e.g., HRV-indexed control) when adjudging unfairness. Future studies should manipulate vocal prosody and affective cues to test whether richer emotion signals attenuate the moderating role of prefrontal control and alter affective responses.

Finally, in human–human Ultimatum Games, acceptance of favorable offers can be constrained by internalized social-norm expectations (e.g., reciprocity; long-term relationship management), an idea formalized in social-norm and inequity-aversion accounts83,84. In contrast, interacting with a more instrumental AI counterpart may weaken such relational considerations. More broadly, people may find it less intuitive to evaluate machines’ moral decisions, which can raise thresholds for praise/blame and for what counts as a “fair” move78,85. Thus, social-normative mechanisms may also contribute to why favorable human offers are rejected more than comparable AI offers.

Influence of HRV versus affect

Although participants generally reported feeling negative emotion after receiving disadvantageous offers, self-reported positive/negative affect was equivalently associated with the rejection of disadvantageous offers from human and AI counterparts. In contrast, the relation between heart rate variability (HRV) metrics such as MeanNN, SDNN, and RMSSD and rejection rate displayed a clear sensitivity to counterpart type (AI vs. Human), with stronger associations between parasympathetic engagement and rejection of offers from AI counterparts than from human counterparts. These HRV metrics are commonly thought to index individuals’ level of effortful focus. Thus, the experience of playing the Ultimatum Game with an AI agent - and especially receiving a lowball offer from an AI agent - appears to intensify self-control processes.

Taken together, we propose the following explanatory model (depicted in Fig. 5): In typical, human-human interactions, societal fairness norms act as a check against the default tendency to accept any money that is offered. Such culturally ingrained fairness norms contribute to the rejection of both disadvantageous offers (“You are ripping me off!”) and advantageous offers (“I am ripping you off!”). In human-AI interactions, however, such social norms have yet to be clearly established and are thus weaker. In such cases, to restrain the default inclination to accept a disadvantageous offer, additional input is needed. This input appears to come from effortful, self-regulatory processes, captured by measures of HRV. In other words, in the absence of traditional emotional cues from the AI agent, physiological regulation reflected in HRV may serve as an important complementary mechanism that contributes to the decision to accept or reject the offer. This model aligns with prior research that has emphasized the role of HRV in regulating fairness-related behavior47.

Fig. 5

Interaction model illustrating hypothesized decision-making processes in human-human and human-AI Ultimatum Game play.
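The model depicted in Fig. 5 implies a testable moderation: physiological regulation (HRV) should predict rejection more strongly when the counterpart is an AI agent, whereas affect should matter more when the counterpart is human. As an illustration only - the data file, column names, and single-level specification below are assumptions, not the analysis reported in this paper - such an interaction could be probed with a logistic regression along the following lines. In practice, a mixed-effects model with participant-level random intercepts would be preferable for trial-level data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical trial-level data: one row per offer decision.
# Columns assumed for illustration: 'reject' (0/1), 'rmssd_z' (standardized
# RMSSD), 'neg_affect_z' (standardized negative affect), and 'counterpart'
# ("human" or "ai").
df = pd.read_csv("ug_trials.csv")

# The HRV x counterpart and affect x counterpart interaction terms test
# whether physiological regulation and self-reported affect predict
# rejection differently for human vs. AI Proposers.
model = smf.logit(
    "reject ~ rmssd_z * C(counterpart) + neg_affect_z * C(counterpart)",
    data=df,
).fit()
print(model.summary())
```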

HRV may also index the resolution of normative conflict (self-interest vs. fairness). In human–human bargaining, richer socio-emotional cues (perceived intentionality, agency, and emotional feedback) amplify affect-driven rejections; in our data, the markedly stronger affect–behavior associations in the human condition are consistent with that pattern. By contrast, AI counterparts offer sparse socio-affective signals, leading decision makers to rely more on internal control, for which resting vagal HRV is a well-validated proxy for top-down regulation within neurovisceral-integration frameworks86,87. This interpretation aligns with prior UG work showing that trait HRV and inhibitory capacity predict the rejection of unfair offers, consistent with self-control overriding short-term gain to uphold fairness norms47. It also aligns with evidence that HRV indexes controlled regulation in morally demanding judgments and that elevated HRV facilitates the flexible integration of cognitive and affective inputs88,89. Taken together, these findings extend dual-process theories of moral judgment to human–AI contexts: affective drives appear more influential when the counterpart is human, whereas cognitive control (indexed by HRV) is more influential in human-AI interactions88,90,91. Importantly, Rosas et al.88 found results consistent with a conflict-resolution account of HRV, reporting that only participants with high tonic HRV (unlike those with low tonic HRV) showed positive correlations between tonic and phasic HRV and sensitivity to the utilitarian gradient. This suggests that higher HRV supports more accurate mapping of normative trade-offs, consistent with efficient conflict resolution88. Converging evidence from criminal judges (who are trained to regulate affect in punitive contexts) shows that resting HRV selectively predicts rejection of unfair offers, indicating that vagal control contributes to fairness-oriented decisions when emotional engagement must be held in check50. We think that this mirrors our AI condition, in which reduced socio-emotional reciprocity likely elevates the role of internal control mechanisms indexed by HRV.

Advantageous inequity aversion (AIA)

Advantageous offers, despite being beneficial, are perceived as violations of fairness norms, producing discomfort linked to guilt or embarrassment. Participants in this study demonstrated higher rejection rates of advantageous offers from human counterparts than from AI counterparts. This finding resonates with work by Shaw and Choshen-Hillel40 and McAuliffe et al.92, who provided evidence for the role of societal norms and social conditioning in shaping fairness perceptions. One possible contributor to this phenomenon is emotional synchrony, a process captured by Adam Smith in his Theory of Moral Sentiments, in which he describes how individuals moderate their emotional expressions (including positive emotions) to align with societal expectations:

“To see the emotions of their hearts, in every respect, beat in time to his own, constitutes his sole consolation. But he can only hope to obtain this by lowering his passion to that pitch in which the spectators are capable of going along with him. He must flatten, if I may be allowed to say so, the sharpness of its natural tone…” Smith, 1759/2009, p. 2893

That is, people often “flatten” their positive emotional tone when interacting with other humans to avoid eliciting resentment or discomfort. This emotional calibration, however, becomes less relevant in interactions with AI agents, which are perceived, by default, to be generally lower in - or devoid of - emotional capacity20,53. Although the current procedure did not include measures that directly test this positive-emotion suppression mechanism, we consider it a plausible partial explanation for the diminished rejection rates of advantageous offers and the lower emotional disturbance when dealing with AI agents. We encourage future researchers to test this possibility more directly.

Some accounts suggest that rejecting unfair offers necessitates inhibitory control to suppress economically self-interested impulses, a notion supported by neuroimaging and neuromodulation studies linking such rejections to prefrontal cortex activity47,94,95. On the other hand, alternative views posit that fairness-based social norms serve as intuitive defaults guiding behavior, independent of economic self-interest, especially in third-party contexts83,95. Extending these literatures, our study demonstrates that interactions with AI agents, which are largely devoid of established social cues and norms, still engage self-regulatory processes. Specifically, we observed that heart rate variability (HRV), a physiological marker of inhibitory control and emotional regulation, plays a critical role in navigating fairness decisions involving AI partners. This suggests that even in the absence of traditional social frameworks, individuals rely on internal regulatory mechanisms to uphold fairness norms, highlighting the adaptability and resilience of these processes in novel interaction contexts. Whereas emotional responses play the dominant role in human-human UG exchanges, self-regulation appears to assume a larger role when the counterpart is an AI agent. We suggest that in future studies, HRV measures might emerge as effective indices for capturing decision-making in human-AI interactions beyond the UG, particularly in contexts involving fairness.

A limitation of our approach is that, by randomly assigning participants to either the human or the AI condition (rather than using a within-subjects design as in19), we did not create an explicit side-by-side contrast. As a result, any “motivational gap” between human and AI partners was not cued directly. We believe, however, that the lack of a perceived social intention behind an AI offer still emerges in a between-subjects design even without a side-by-side comparison. In our design, participants judged each AI offer on its own terms, against their own internal fairness norms rather than against a human benchmark. We suggest that this between-subjects approach yields, if anything, a ‘purer’ measure of participants’ intuitions in isolation, free from potential contrast effects. Moreover, although we acknowledge that the universe of the Ultimatum Game is rather abstracted from everyday life, consider that people typically interact with either a chatbot or a human, not one and then the other. Another limitation is that our participants were mainly first-year university students (M = 19.2 years); future research should replicate this paradigm with more age-diverse and culturally heterogeneous samples, including working professionals and older adults, to determine whether the patterns observed here hold across different populations.

Because our sample was composed primarily of young North American university students (mean age ≈ 19 years), generalizability is constrained, and the reported fairness patterns should be interpreted with cultural caution. Cross-cultural research shows that Ultimatum Game norms and fairness expectations vary markedly across societies96, and human–AI/robot interaction studies indicate that responses to agent “humanness” and social context are also culturally moderated97,98. Cultures that interpret technology as a neutral tool or that emphasize relational harmony may engage different affective and self-regulatory processes when evaluating AI fairness. Accordingly, we explicitly recommend the inclusion of age- and culture-diverse samples in future work, with targeted recruitment from collectivist and individualist contexts, preregistered cross-cultural comparisons, and analyses that test measurement invariance for affective and physiological indices across groups97,98.

Real-world human–AI relationships unfold over time, and familiarity can reshape both trust and fairness expectations. We therefore believe that the affective and HRV patterns reported here may evolve with repeated exposure to the same AI partner. Longitudinal studies that track shifts in emotion, autonomic regulation, and rejection behavior across multiple interactions will be necessary to determine whether inequity aversion toward AI diminishes or intensifies as users gain experience and confidence in the technology.

The findings have significant implications for the design and implementation of AI systems in social and economic contexts. Understanding how people perceive fairness in AI interactions can guide the development of systems that align with human moral and ethical standards. For example, embedding human-like social cues in AI agents could reduce users’ reliance on physiological regulation and foster trust and cooperation13. To ease this cognitive and emotional self-regulatory burden, designers should incorporate social cues that explicitly convey the agent’s intentions and values - for instance, having the agent signal its moral commitments explicitly (e.g., stating that it “will not exploit” the user or that it “prioritizes fairness over short-term gain”), given experiments demonstrating that attributing values and moral principles to robots increases trust13. In parallel, vocal cues can be designed systematically to shape social perception and to increase cooperation and perceived competence. Moreover, as AI becomes more integrated into daily life, shifting cultural norms surrounding human-AI interaction must be continually assessed. Finally, clear cultural differences in comfort with robotic and AI agents have been documented98. Future researchers will need to be mindful of participants’ cultural context when assessing their perceptions of AI fairness.