Introduction

Non-communicable diseases (NCDs) have become the leading cause of global mortality, accounting for 76% of deaths worldwide1. This dramatic epidemiological shift has created an urgent need for innovative health communication solutions. Electronic health (eHealth) platforms have emerged as a promising approach to deliver scalable health interventions, but their effectiveness is often limited by low user engagement and high attrition rates2.

Embodied conversational agents (ECAs) offer a potential solution by simulating face-to-face health communication. These digital interfaces can provide personalized health advice and support through natural interactions3. However, a critical challenge remains: how to design ECAs that are truly persuasive in promoting health behavior change. While previous research has separately examined the effects of ECA appearance and message tone, little is known about how these elements interact to influence user perceptions and decisions. This gap in knowledge significantly limits our ability to create optimally persuasive health communication agents.

ECA in eHealth

ECAs are autonomous software entities designed for health communication, utilizing embodied interfaces to simulate face-to-face interaction4. Their effectiveness stems from the integration of verbal and non-verbal social cues - including gestures, facial expressions, and vocal prosody - which enhance user engagement through naturalistic responses5. Florence, WHO’s first virtual health worker, provided smoking cessation content to help people develop cessation plans6. Clinical applications demonstrate their versatility across: (1) chronic disease management (e.g., diabetes, hypertension), (2) mental health interventions (e.g., depression, anxiety), and (3) preventive care (e.g., smoking cessation, physical activity)7,8. Notably, 72% of users prefer ECAs for lifestyle-related guidance over complex medical consultations9, suggesting their persuasive potential hinges on optimizing both informational content (central processing) and social presence (peripheral processing) - a duality reflected in health communication models10. This evidence justifies our focus on lifestyle domains where ECAs show highest adoption rates.

Impact of ECA appearance features

The appearance of ECAs—encompassing attire, demographics, and expressions—serves as a critical social cue in health communication. Professional attire (e.g., white coats) consistently enhances perceived authority, replicating the physician “white coat effect” in digital contexts11,12. Recent evidence suggests this visual professionalism reduces cognitive conflict during message processing13. According to the Elaboration Likelihood Model (ELM), persuasion occurs through two distinct pathways: (1) the central route involves careful evaluation of message content when users are motivated and able to process information deeply; (2) the peripheral route relies on heuristic cues (like appearance) when users lack capacity or motivation for detailed analysis14. Here, professional attire serves as a strong peripheral cue that primes acceptance of health messages. Conversely, casual attire improves approachability but may undermine message credibility15 —reflecting the fundamental warmth-competence tradeoff in social cognition, where warmth (approachability) and competence (expertise) are often perceived as inversely related16.

Demographic features show similar contextual effects. Users prefer young female agents for peer-like interactions17, whereas authority-driven scenarios benefit from mature appearances18. Expressive elements further modulate these effects: static professional images paired with neutral text achieve high persuasion by facilitating central route processing19, while dynamic ECAs require emotional congruence between appearance and tone to optimize peripheral route effects20. This aligns with cue consistency principles in multimodal communication, where aligned visual and verbal cues enhance processing fluency13.

Effects of the emotional tonality of health information

According to ELM, neutral tones facilitate central route processing by enabling objective evaluation of factual content, while positive tones leverage peripheral route persuasion through affective arousal21. This dichotomy is evident in health communication: neutral texts excel for complex medical information by enhancing credibility20, whereas positive tones boost motivation in behavioral interventions through emotional resonance22.

Notably, text-based emotional cues trigger stronger social perceptions than previously assumed. Users instinctively associate positive tones with communicative warmth and neutral tones with expertise13—a pattern explained by ELM’s peripheral cue processing, where emotional valence serves as a heuristic for agent personality. However, this effect is context-dependent: while positive expressions enhance perceived agent helpfulness in peer interactions, neutral tones prove more effective for authority-driven advice15. This tension underscores the need for strategic tone selection based on communication goals.

The effect of matching ECA appearance to the emotional tone of the message on persuasion

Persuasion can be described as changing the attitudes and/or behavior of others. In the context of eHealth, the persuasive power of an ECA refers to its ability to effectively communicate health information and influence users’ perceived attitudes, intentions, and behaviors. Studies have highlighted the important role of a system’s perceived persuasiveness23. It has also been shown that ECAs whose appearance matches the health topic (e.g., a chef for cooking) receive higher ratings of persuasion and intention to use18.

Persuasion in eHealth requires strategic alignment between ECA appearance and message tone—a phenomenon where ELM’s dual routes interact with Social Cues Theory’s congruence principle13. When professional ECAs deliver neutral health messages, users engage central route processing to evaluate factual content, while the authoritative appearance simultaneously validates message credibility through peripheral cues14. Conversely, casual ECAs with positive messages leverage peripheral route persuasion through social rapport24, demonstrating how warmth-competence tradeoffs16 dictate optimal cue combinations.

Importantly, mismatches trigger cognitive strain—as predicted by ELM’s principle that conflicting central/peripheral cues impair persuasion25. Current applications already reflect these insights: clinical ECAs like WHO’s Florence use professional-neutral pairing for credibility26, while lifestyle coaches adopt casual-positive combinations for engagement18.

ERP in AI agents

Event-related potentials (ERPs), electrophysiological signals associated with neural responses to events, provide critical insights into AI agent interactions by capturing implicit neural processes that self-reports cannot access27. In ECA research, the N400 component (a negative deflection peaking about 400 ms after stimulus onset) reflects semantic congruence, directly measuring ELM’s central route processing when users evaluate message-appearance matches28,29. Similarly, the late positive potential (LPP; a late positive component peaking at about 600 ms) indexes motivational attention allocation30,31, quantifying peripheral route engagement through emotional arousal.

These neural markers may resolve key theoretical debates: (1) N400 amplitudes confirm professional-neutral pairings reduce cognitive conflict32, validating Social Cues Theory’s congruence principle; (2) LPP enhancements to positive-casual pairings demonstrate peripheral route efficacy28. Crucially, ERP data reveal ELM-predicted interactions between routes—when peripheral cues (appearance) and central content (tone) align, they synergistically enhance persuasion29. This explains why multimodal consistency—not isolated cues—drives ECA effectiveness13.

Research objectives

While prior research has established the independent effects of ECA appearance and message tone, how their congruence influences persuasion through integrated neurocognitive pathways remains unexplored at the intersection of ELM and Social Cues Theory. This study aims to: (1) identify optimal appearance-tone matches for eHealth ECAs, (2) uncover the neural mechanisms underlying their persuasive effects, and (3) provide evidence-based design guidelines. Specifically, we test four hypotheses that examine both explicit evaluations and implicit processing:

H1 Users perceive a higher degree of match for ECAs that combine neutral messages with a professional image than for other combinations.

H2 Users perceive ECAs with neutral emotional messages and professional appearance as more persuasive than ECAs with other combinations.

H3 Users perceive less conflict and produce smaller N400 amplitudes when neutral emotional text is paired with a professional image.

H4 Users perceive higher similarity and produce larger LPP amplitudes when neutral emotional text is paired with a professional image.

Materials and methods

Participants

We recruited 42 students (23 females and 19 males, mean age 20.95 ± 2.118 years) from a University as participants. While the use of a homogeneous sample (e.g., university students) may limit the generalizability of our findings, it helps to reduce between-subject variability and increase statistical power for detecting neurocognitive effects, which is particularly important for ERP research with its typically moderate effect sizes33. This approach allows for a more sensitive test of our experimental manipulation under controlled conditions, although generalizability to broader populations requires future verification. All participants were right-handed, had normal or corrected-to-normal vision, and had no history of neurological or psychiatric disorders. In addition, all participants were required to be well rested and not taking stimulants or psychotropic drugs. The study was approved by the Science and Technology Ethics Committee of a University, and all participants signed an informed consent form before the experiment and received monetary compensation at the end of the experiment.

Stimuli

Based on previous research34, and to avoid interference from other variables, the ECAs were two cartoon female characters with the same face shape, the same simple smile, and the same hairstyle, appearing to be around 20–30 years old. They differed only in attire: one wore a white coat with a stethoscope and the other wore casual clothes. They were designed and generated by art and design professionals and reviewed by three scholars specializing in communication and medicine.

The original health text materials comprised 18 text messages about healthy lifestyles (Fitness & Exercise, Healthy Eating, and Stress Management), 9 positive and 9 neutral, all written in Chinese. The three sub-themes were selected based on their prominence as modifiable risk factors in global health guidelines35. The textual materials were adapted from WebMD and Mayo Clinic, consumer health platforms with established medical content accuracy36, then refined through a two-stage process: (1) a physician and a communication researcher classified texts as positive or neutral using the CDC’s Clear Communication Index37 for clinical validity, and (2) a linguist standardized emotional tone using the LIWC lexicon38 to ensure linguistic consistency. Emotional tone was manipulated as follows: positive texts used optimistic and inspirational language intended to lift the reader’s mood, whereas neutral texts used objective, factual expressions, avoided emotional overtones, and delivered information directly and precisely. Texts with a negative tone were excluded, in line with WHO guidelines discouraging fear appeals in health promotion39.

To confirm the validity of the stimulus material, we recruited online 31 university students (target users of eHealth interventions40) who did not participate in the subsequent ERP experiment (15 females and 16 males, mean age 21 ± 1.4 years). They first rated ECA professionalism on a 5-point scale (1 = Very unprofessional to 5 = Very professional) based on explicit visual cues (white coat = professional, casual clothes = unprofessional), then evaluated text emotionality using the same 5-point format. Although these raters were not experts, the perception of the target users (university students) is the most relevant criterion for this study. Paired t-tests demonstrated significant discriminability: professional vs. casual images (p < 0.001, d = 1.2); positive vs. neutral texts (p < 0.001, d = 1.1). The 6 most discriminative messages (3 positive and 3 neutral) were selected as final stimuli. The formal experimental materials are shown in Fig. 1.
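The discriminability check described above can be illustrated with a short sketch. It is not the authors' analysis script: the ratings are placeholder values, the function name `paired_t_and_d` is hypothetical, and Cohen's d is computed as the mean difference divided by the standard deviation of the differences (often written d_z), which is only one of several conventions for paired data; the text does not state which convention was used.

```python
import math
import statistics

def paired_t_and_d(x, y):
    """Return (t, df, d_z) for two paired samples of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    d_z = mean_diff / sd_diff  # assumed effect-size convention
    return t, n - 1, d_z

# Placeholder 5-point ratings from six hypothetical raters (not study data).
professional = [5, 4, 5, 4, 5, 3]
casual = [2, 3, 2, 2, 3, 2]
t, df, d_z = paired_t_and_d(professional, casual)
print(f"t({df}) = {t:.2f}, d_z = {d_z:.2f}")
```

With this convention the two statistics are linked by t = d_z × √n, so a reported effect size and sample size can be cross-checked against the reported test statistic.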

Fig. 1 Experimental stimulus materials.

Procedure

This experiment was conducted in the laboratory of the College of Humanities at a University. The experimental apparatus was the MindBridge-NaNo (developed by Guangzhou Qianga Neuroscience Technology Co., Ltd.). Electrodes were located at 32 standard positions according to the extended version of the International 10–20 Electrode Placement System (Fig. 2). Stimuli were presented on a 19-inch LCD screen (1920 × 1080 pixels, 60 Hz).

Fig. 2 A diagram of the electrodes used in the experiment.

Before the start of the experiment, all participants were seated 70 cm from the computer screen to view the stimulus images, giving a viewing angle of approximately 33° × 19°. The ERP task was programmed and presented using Python 3.8. Pictures containing textual information or ECA appearances were each repeated 10 times and presented in random order to eliminate sequence effects, as shown in Fig. 3. Initially, a 1-minute countdown was used to put participants in a relaxed state; a “+” sign then appeared to help them focus on the center of the screen, and the formal experiment began. On each trial, a randomly selected picture containing textual information appeared for 3000–5000 ms, followed by a picture of an ECA, and participants judged how persuasive this ECA was in relation to the text that had just appeared, with 1 indicating very unpersuasive and 5 indicating very persuasive. Before each stimulus, a blank screen with a central “+” sign was shown for 400–600 ms to return participants’ visual perception to baseline. After the ERP experiment, participants completed a questionnaire on the perceived persuasiveness of the stimuli and the degree of match. Each session lasted 30 min with one break.
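The reported viewing geometry can be checked with a short calculation. The sketch below assumes that the stimuli filled the whole screen and that the 19-inch display has a 16:9 aspect ratio (inferred from the 1920 × 1080 resolution with square pixels); the function names are illustrative only.

```python
import math

def screen_size_cm(diagonal_in, aspect_w=16, aspect_h=9):
    """Width and height in cm from the diagonal (inches) and aspect ratio."""
    diagonal_cm = diagonal_in * 2.54
    unit = diagonal_cm / math.hypot(aspect_w, aspect_h)
    return aspect_w * unit, aspect_h * unit

def visual_angle_deg(size_cm, distance_cm):
    """Full visual angle of an extent centred on the line of sight."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

width_cm, height_cm = screen_size_cm(19)    # 19-inch screen, assumed 16:9
h_angle = visual_angle_deg(width_cm, 70)    # 70 cm viewing distance
v_angle = visual_angle_deg(height_cm, 70)
print(f"{h_angle:.1f} x {v_angle:.1f} degrees")
```

Under these assumptions the screen subtends roughly 33° horizontally and 19° vertically, in agreement with the values given in the procedure.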

Fig. 3 Task paradigm with the timing of presentation.

Measurement

Participants’ immediate responses to the ECA persuasion were recorded via keystrokes in the formal ERP experiment. In the post-test questionnaire, the stimulus material from the ERP experiment was presented to the participants again. For matching, participants were asked to select, for text messages of each emotional tone, which of the two ECAs they perceived as the best match. For perceived persuasion, participants rated the following three statements (adapted from a validated scale18): (1) The health advice provided by this character is persuasive; (2) The health advice provided by this character will influence me; (3) The health advice provided by this character will make me pay attention to my own (this aspect of) health behavior. Ratings were all on a five-point Likert scale ranging from strongly disagree to strongly agree.

Data recording and analysis

EEG activity was recorded with a neuroscanning cap, and EEG signals were acquired at a sampling rate of 1000 Hz. Reference electrodes (A1 and A2) were placed on the bilateral mastoid processes, and the impedance of each electrode was less than 5 kΩ. After recording, offline preprocessing was performed using the EEGLAB toolbox to obtain clean ERP data. Preprocessing consisted of the following steps: (1) re-referencing to the average of the bilateral mastoids; (2) high-pass filtering at 0.1 Hz and low-pass filtering at 30 Hz; (3) segmenting into epochs from -200 ms to 800 ms and baseline correction; (4) independent component analysis (ICA); (5) manual identification of artifacts in the epoched data; (6) superimposing and averaging the ERP data.

Based on previous studies29,41 and visual inspection of the grand-average waveforms (Fig. 4), the N400 was analyzed in the 400–440 ms window at six frontal and fronto-central electrodes (F3, FZ, F4, FC3, FCZ, FC4), and the LPP was analyzed in the 620–670 ms window at nine frontal to central electrodes (F3, FZ, F4, FC3, FCZ, FC4, C3, CZ, C4).
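To make the amplitude measure concrete, the sketch below shows how a mean amplitude in the N400 and LPP windows could be extracted from a single baseline-corrected epoch. It is a plain-Python illustration rather than the EEGLAB pipeline used in the study: the epoch is synthetic, the helper names are hypothetical, and the baseline is assumed to be the pre-stimulus interval from -200 ms to 0 ms (the text gives only the epoch range).

```python
SAMPLING_RATE_HZ = 1000   # sampling rate stated in the recording description
EPOCH_START_MS = -200     # epoch spans -200 ms to 800 ms

def ms_to_index(t_ms):
    """Sample index within the epoch for a latency given in ms."""
    return round((t_ms - EPOCH_START_MS) * SAMPLING_RATE_HZ / 1000)

def mean_amplitude(epoch, start_ms, end_ms):
    """Mean of the samples in the half-open window [start_ms, end_ms)."""
    segment = epoch[ms_to_index(start_ms):ms_to_index(end_ms)]
    return sum(segment) / len(segment)

def baseline_correct(epoch, baseline_ms=(-200, 0)):
    """Subtract the mean pre-stimulus amplitude (assumed baseline window)."""
    baseline = mean_amplitude(epoch, *baseline_ms)
    return [v - baseline for v in epoch]

# Synthetic single-channel epoch (not study data): a constant offset of 2.0,
# a -3.0 deflection in 400-440 ms, and a +4.0 deflection in 620-670 ms.
epoch = [2.0] * 1000
for i in range(ms_to_index(400), ms_to_index(440)):
    epoch[i] -= 3.0
for i in range(ms_to_index(620), ms_to_index(670)):
    epoch[i] += 4.0

corrected = baseline_correct(epoch)
n400 = mean_amplitude(corrected, 400, 440)
lpp = mean_amplitude(corrected, 620, 670)
print(n400, lpp)
```

In a full analysis these values would be computed per participant, condition, and electrode from the trial-averaged waveform before entering the ANOVA.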

Fig. 4 ERP waveforms and topographical maps of N400 and LPP under different conditions.

The keystroke persuasion data were analyzed using a two-way (appearance × emotional information) repeated-measures ANOVA. For the ERP components, a three-way (appearance × emotional information × electrode) repeated-measures ANOVA was used. Participants’ keystroke data and the persuasion data from the post-test questionnaire were submitted to a repeated-measures ANOVA to determine whether perceived persuasion remained consistent across time points. The relationship between keystroke data and ERP amplitudes was examined using Pearson correlation analysis. All statistical analyses were conducted in SPSS Statistics 26, with results considered statistically significant at p < 0.05. The Greenhouse-Geisser correction was applied to the ANOVA results.
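The logic of the two-way within-subject analysis can be sketched without a statistics package. In a 2 × 2 repeated-measures design every effect has one degree of freedom, so its F statistic equals the squared one-sample t statistic of a per-participant contrast score. The code below is an illustration under that identity, not a reproduction of the SPSS analysis: the ratings are placeholders, the function names are hypothetical, and the three-way model with the multi-level electrode factor (where a sphericity correction becomes relevant) is not covered.

```python
import math
import statistics

def f_from_contrast(scores):
    """F(1, n - 1) for a one-degree-of-freedom within-subject contrast."""
    n = len(scores)
    t = statistics.mean(scores) / (statistics.stdev(scores) / math.sqrt(n))
    return t * t, (1, n - 1)

def rm_anova_2x2(data):
    """data: one (pro_neu, pro_pos, cas_neu, cas_pos) tuple per participant."""
    appearance = [(a + b) / 2 - (c + d) / 2 for a, b, c, d in data]
    tone = [(a + c) / 2 - (b + d) / 2 for a, b, c, d in data]
    interaction = [(a - b) - (c - d) for a, b, c, d in data]
    return {
        "appearance": f_from_contrast(appearance),
        "tone": f_from_contrast(tone),
        "interaction": f_from_contrast(interaction),
    }

# Placeholder ratings for five hypothetical participants (not study data).
ratings = [
    (4.5, 3.5, 3.0, 3.5),
    (4.0, 4.0, 3.5, 3.0),
    (5.0, 3.5, 3.5, 4.0),
    (4.0, 3.0, 3.0, 3.0),
    (4.5, 4.0, 4.0, 3.5),
]
results = rm_anova_2x2(ratings)
for effect, (f_value, df) in results.items():
    print(f"{effect}: F{df} = {f_value:.2f}")
```

Because each factor here has only two levels, sphericity holds trivially and no correction changes these F tests.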

Results

Match

For neutral emotional text, 39 participants (93%, Table 1) judged the professional image to be the better match, and for positive emotional text, 28 participants (67%) judged the unprofessional image to be the better match, supporting H1.

Table 1 Participants’ matching judgments for each ECA under each type of text.

Difference between two persuasion judgments

In the ERP experiment, participants’ keystroke responses addressed only the question of how persuasive the ECA was, while the post-test questionnaire measured persuasion more specifically through the validated perceived persuasiveness scale. A repeated-measures ANOVA was conducted to determine whether participants’ persuasion judgments differed significantly across time. The results showed a nonsignificant main effect of time of measurement (p = 0.57) and nonsignificant interactions between time of measurement and the other variables (p > 0.05), with the exception of a significant interaction between emotional information and time of measurement (p = 0.006). On further analysis, however, the simple effect of emotional information was not significant at either the first measurement (p = 0.08) or the second measurement (p = 0.89), and the simple effect of measurement time was not significant in either emotional information condition (p = 0.08; p = 0.52). That is, although the interaction was statistically significant, it may have little practical significance. Thus, the results indicate that participants’ evaluations of the persuasiveness of the stimuli remained consistent across time points.

ERP result

The results are shown in Tables 2 and 3. The effect of emotional information on N400 amplitude was not significant (p = 0.90), the main effect of image type was significant (p = 0.004), and the interaction effect of emotional information and image was not significant (p = 0.17). Unprofessional image (M = -1.940, SE = 8.977) elicited a larger N400 amplitude compared to professional image (M = 0.895, SE = 6.533). There was a nonsignificant effect of emotional information on LPP amplitude (p = 0.66), a significant effect of appearance type on LPP amplitude (p = 0.02), and a significant interaction effect (p = 0.049). Further analysis revealed that professional images had higher mean values of LPP amplitude than unprofessional images at the significance level (p = 0.008) in the neutral emotional text condition.

ERP results showed that neutral text combined with a professional image elicited smaller N400 and larger LPP amplitudes, supporting H3 and H4.

A two-way repeated-measures ANOVA on the keystroke persuasion ratings (Tables 2 and 3) found no significant difference in persuasion between the two types of emotional message (p = 0.08), a significant main effect of image type (p < 0.001), and a significant interaction between emotional message and appearance (p < 0.001). Persuasion elicited by the professional image (M = 3.967, SE = 0.650) was significantly higher than persuasion elicited by the unprofessional image (M = 3.473, SE = 0.738) (p < 0.001). A repeated-measures ANOVA with the combination as a single factor revealed significant differences in persuasion between combinations (p < 0.001). Bonferroni-corrected multiple comparisons revealed that the combination of neutral emotional text and professional image was significantly more persuasive than the combination of neutral emotional text and unprofessional image (p < 0.001), the combination of positive emotional text and professional image (p = 0.003), and the combination of positive emotional text and unprofessional image (p < 0.001). The other three combinations did not differ significantly from each other in persuasion. These results support H2.

Table 2 Results of the repeated-measures ANOVAs on the mean amplitudes of N400 and LPP and on persuasion.
Table 3 Mean (amplitudes) of the ERP components and persuasion in the ERP study.

Pearson correlation analyses of the keystroke data and ERP amplitudes found no significant overall correlation between N400 and persuasion (p = 0.23) or between LPP and persuasion (p = 0.16). Further analyses by condition revealed that N400 showed a significant negative correlation with persuasion only for the combination of positive emotional text and professional image (r = − 0.306, p = 0.049). In addition, in the positive emotional text condition, LPP and persuasion showed a significant negative correlation, both in combination with the professional image (r = − 0.319, p = 0.04) and with the unprofessional image (r = − 0.326, p = 0.04).
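The significance of these correlations follows from the standard conversion of a Pearson coefficient into a t statistic, t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The sketch below applies that formula to the reported coefficients; it assumes that all 42 participants contributed to each correlation, which the text does not state explicitly, and the function name is illustrative.

```python
import math

def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for a Pearson correlation r."""
    return r * math.sqrt((n - 2) / (1 - r * r))

N_PARTICIPANTS = 42  # assumption: full sample in every correlation
reported = [
    ("N400, positive text + professional image", -0.306),
    ("LPP, positive text + professional image", -0.319),
    ("LPP, positive text + unprofessional image", -0.326),
]
t_values = {label: t_from_r(r, N_PARTICIPANTS) for label, r in reported}
for label, t in t_values.items():
    print(f"{label}: t({N_PARTICIPANTS - 2}) = {t:.2f}")
```

Under this assumption each |t| exceeds 2.021, the two-tailed critical value for 40 degrees of freedom at α = 0.05, which is consistent with the reported p values falling just below 0.05.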

Discussion

Principal results

This study examined how ECA appearances and emotional text affect perceived persuasion through the integrated lens of Social Cues Theory and the Elaboration Likelihood Model (ELM). The results demonstrated that professional ECAs matched with neutral text were consistently more persuasive (supporting H1 and H2), with neural evidence further validating these effects (H3 and H4).

The matching results revealed that 93% of participants perceived the professional-image ECA as the best match for neutral emotional health messages. This strong preference aligns with the “diagnosticity principle” in social cognition, where professional attire provides unambiguous expertise signals26, while neutral tones maintain objective credibility. The white coat effect11 explains how professional imagery automatically activates trust schemas, and when combined with fact-based messaging, creates a powerful persuasive synergy that can significantly influence health behaviors18. Interestingly, unprofessional images paired with positive emotional text also showed notable matching effects (67% approval), likely because casual appearances foster peer-like rapport42 while positive tones enhance social engagement20, a combination particularly effective for motivational contexts.

Behavioral data clearly established the persuasive superiority of professional-neutral pairings (M = 4.17 vs. 3.47 for unprofessional-neutral, p < 0.001). This effect reflects ELM’s central route processing, where credible sources facilitate deeper message elaboration. The interaction analysis revealed an important nuance: while professional images suffered persuasive penalties when paired with positive texts (due to role incongruence15), casual appearances actually benefited from positive emotional tones through what have been termed “peer affinity effects”24. This dichotomy illustrates Social Cues Theory’s core premise: different social roles (expert vs. peer) demand distinct communication styles.

ERP findings provided neural validation of these effects. The professional-neutral combination elicited significantly smaller N400 amplitudes (p = 0.004), indicating reduced cognitive conflict during schema matching43 - neural evidence for the white coat effect’s automaticity. Concurrently, larger LPP amplitudes (p = 0.02) reflected enhanced motivational attention, suggesting this pairing successfully engages both automatic and controlled processing systems44,45. These neural signatures confirm that optimal persuasion occurs when visual and verbal cues mutually reinforce expected social scripts26.

Notably, mismatched conditions revealed the costs of violating cue consistency. Professional ECAs delivering positive messages produced significant N400-LPP dissociation (r = -0.32, p = 0.04), reflecting the neural strain of reconciling conflicting expertise and warmth cues46. Behaviorally, these pairings showed 12% lower persuasion scores, demonstrating how cue incongruence undermines ELM’s peripheral route effectiveness. Similarly, positive emotional texts unexpectedly generated negative LPP-persuasion correlations regardless of accompanying images, likely because health contexts trigger unique emotional processing patterns where excessive positivity may seem inappropriate31.

The complete findings collectively demonstrate that ECA persuasion operates through dual pathways: professional-neutral pairings dominate through credibility-driven systematic processing (ELM’s central route), while casual-positive combinations offer alternative peripheral route appeal via social-emotional engagement. This comprehensive account bridges Social Cues Theory’s focus on multimodal congruence with ELM’s processing depth continuum, providing both theoretical integration and practical design principles for health communication systems.

Strengths and limitations

The study’s primary strength lies in its triangulation of behavioral and neural measures to decode ECA persuasion mechanisms—an approach that operationalizes ELM’s dual-process theory through complementary explicit (questionnaire) and implicit (ERP) indicators. By capturing both conscious evaluations and subconscious processing, we bridge Social Cues Theory’s focus on multimodal congruence with ELM’s attention to processing depth. The replication of key effects across methods (e.g., professional-neutral superiority in both N400/LPP and persuasion ratings) enhances validity.

Three limitations warrant consideration: (1) Focusing solely on lifestyle topics (fitness, nutrition, stress) prioritizes ecological validity over breadth, though this aligns with ECAs’ predominant use cases10. (2) While static ECAs control for animation artifacts, they preclude examination of dynamic emotion-expression matching—a key dimension in Social Cues Theory’s extended framework20. (3) University students offer cognitive homogeneity for ERP but limit generalizability to age/education extremes, particularly relevant given ELM’s emphasis on receiver characteristics14.

These constraints suggest three theoretical-methodological synergies for future work: (1) dynamic ECAs to test ELM’s peripheral cue flexibility, (2) cross-cultural samples to probe Social Cues’ universality, and (3) clinical populations where central/peripheral route balances may differ.

Conclusions

This study advances our understanding of health communication by demonstrating how the strategic alignment of ECA appearance and message tone optimizes persuasion through complementary mechanisms. The robust superiority of professional ECAs delivering neutral messages confirms Social Cues Theory’s diagnosticity principle while validating ELM’s central route processing under high-credibility conditions. Conversely, the effectiveness of casual-positive pairings illustrates peripheral route persuasion through social-emotional engagement. Crucially, our neural evidence reveals that successful persuasion requires congruence between visual cues (which prime processing expectations) and textual tones (which fulfill those expectations)—a synthesis only possible through integrating both theoretical frameworks. These findings provide actionable guidelines: professional appearances should dominate fact-based health communication, while peer-like ECAs suit motivational contexts. Future research should explore how these principles generalize across cultures, health literacy levels, and dynamic interaction contexts.