Abstract
Social interaction supports brain health and recovery after neurological injury. Yet no validated tool exists for real-time measurement in individuals with and without neurological deficits. We developed SocialBit, a lightweight, privacy-preserving machine learning algorithm that detects social interactions using ambient audio features on a commercial smartwatch. In a prospective validation study, we evaluated SocialBit against minute-by-minute human-coded ground truth, obtained via livestream video, in 153 hospitalized stroke patients who wore the device for up to 8 days, generating 88,918 min of observation. Stroke severity and cognition spanned broad clinical ranges (NIH Stroke Scale 0–25; Montreal Cognitive Assessment 8–30), and 24 patients had aphasia across diverse subtypes, including severe presentations. SocialBit achieved high overall performance (sensitivity 0.87, specificity 0.88, area under the curve 0.94) and maintained accuracy in patients with language deficits (AUC 0.93). Despite its lower temporal sampling, SocialBit produced interaction frequency distributions closely matching minute-by-minute human coding. Performance was robust across environments and interaction types. Of clinical relevance, SocialBit showed that patients with more severe strokes engaged in less social interaction, paralleling human-coded results. SocialBit is an accurate digital biomarker of social interaction with potential applications in remote monitoring and clinical trials.
Introduction
Engaging in social interactions is one of the most complex cognitive tasks performed by humans. It requires the close coordination of multiple cognitive processes, including meaning making, narrative coherence, perspective-taking, and non-verbal synchrony. These functions rely on distributed neural systems across frontal, temporal, and parietal cortices that were shaped by evolutionary pressures1,2. For instance, functional neuroimaging studies show that coordinated activity in the medial prefrontal cortex, temporoparietal junction, and fusiform gyrus contribute to processes such as empathy, trait inference, and facial recognition that are required for social interactions3. These areas develop together from infancy through processes of activity-dependent plasticity that are driven by the child’s experiences of the social environment4. Collectively, engagement in social interactions represents a high-order, multimodal expression of cognitive function.
Beyond serving as a marker of cognitive health, social interactions also play a protective role in long-term health outcomes5. Robust evidence shows that social connections increase longevity and reduce cardiovascular disease, cognitive decline, and depression with effects comparable to smoking or obesity6. The World Health Organization now lists “social support networks” as a determinant of health, alongside income, education, and access to care. When individuals are integrated into social communities, biological systems function more adaptively; conversely, persistent social isolation leads to elevated stress responses and increased allostatic load7. Despite the clear evidence of its importance, social interaction remains difficult to measure in clinical settings, and there are no widely implemented tools for passive, real-time social monitoring.
The implications of this measurement gap are particularly consequential for individuals recovering from stroke. Stroke survivors frequently experience social isolation due to speech, cognitive, and mobility impairments—factors that also impair their ability to complete self-report instruments8. This isolation occurs precisely during a time when social stimulation may offer the greatest benefit. A large body of preclinical evidence shows that enriched environments that include social interaction promote neurogenesis, synaptic remodeling, and functional recovery after stroke9,10. These effects are mediated by upregulation of neurotrophic factors and increased neuronal activity in peri-infarct tissue9. Yet, inpatient rehabilitation settings are often devoid of such social stimulation11. Even if social stimulation was prioritized, we lack scalable tools to track and optimize social engagement during the recovery process.
To address this need, we developed SocialBit, a lightweight machine learning algorithm that detects social interaction using ambient audio features collected passively on commercial smartwatches (Fig. 1)12. SocialBit does not store raw audio or use natural language processing, preserving privacy while enabling deployment in persons with and without language difficulties. Prior methods for interaction detection have relied on sensors worn by all parties, GPS tracking, or intrusive recording devices, which were not suited for cognitively, linguistically, or motor-impaired individuals13,14,15. In contrast, SocialBit aims to be technically parsimonious and universally accessible.
SocialBit is a Smartwatch App that Measures Social Interaction. This figure is a schematic representation, and the displayed interaction and noninteraction bars do not reflect SocialBit’s actual temporal resolution. Notably, social interaction was defined as any utterance spoken by or to the patient with another person. Image credit: The left-side image in Fig. 1 is adapted from “Compassionate healthcare nurse providing gentle care to elderly patient in modern hospital room” by Jelena, via Adobe Stock (Image ID: #1554007103).
In this study, we tested the accuracy of SocialBit in a cohort of 153 hospitalized stroke patients with diverse neurological abilities. Against minute-by-minute human-coded ground truth of patients’ interactions via live stream video, we evaluated the algorithm’s performance under varying conditions.
Results
Participants
We enrolled 153 hospitalized patients with ischemic stroke between June 2021 and March 2025. Figure S1 shows the participant flow diagram. As detailed in Table 1, participants had a mean age of 66 years (SD 13.9), and 53% were male. Stroke severity spanned a broad clinical range, with NIH Stroke Scale scores from 0 to 25 (median 2, IQR 0–4), encompassing mild (0–5), moderate (6–15), and severe (> 15) stroke presentations. Cognitive performance also varied widely, with Montreal Cognitive Assessment scores from 8 to 30 (mean 23.6, SD 4.5). Aphasia was present in 24 participants (16%), including global (n = 7), Wernicke’s (n = 2), Broca’s (n = 5), mixed (n = 5), and unspecified subtypes (n = 5). This represented a heterogeneous aphasia cohort, mostly with moderate to severe stroke presentations. Most participants (84%) spoke English only, while 15% were bilingual or primarily Spanish-speaking.
Data collected
The ground truth dataset included 88,918 min (~ 1,482 h) of human-coded data. The SocialBit dataset included 14,045 min (~ 234 h) of algorithm-generated data. The ground truth data, of which SocialBit data are a subset, spanned 325 hospital days in total, with an average of 2.21 coded days per participant. Of the total minutes, 66,576 min (~ 1,110 h) were collected at Brigham and Women’s Hospital, and 12,109 min (~ 202 h) at Spaulding Rehabilitation Hospital. The median amount of coding per participant was 428 min (interquartile range [IQR] 525.5), with a mean of 604.88 min (SD 566.37). Coding duration per participant was higher at Spaulding (median 888 min, IQR 764) compared to Brigham and Women’s (median 378 min, IQR 497). There were no adverse events during the data collection.
To evaluate potential bias from SocialBit’s 1-in-5-minute sampling strategy, we compared kernel density plots of social interaction frequency from SocialBit with those from minute-by-minute human coding (Fig. 2). The proportion of time spent interacting was defined as the number of minutes coded as social interaction divided by all minutes observed by human coders or by SocialBit. The distributions overlapped closely, with comparable means and standard deviations (human coding: M = 0.51, SD = 0.20; SocialBit: M = 0.50, SD = 0.20).
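For illustration, the per-participant interaction frequency and its kernel density estimate can be computed as in the following Python sketch; the function and variable names are ours and are not part of the SocialBit codebase.

```python
import numpy as np
from scipy.stats import gaussian_kde

def interaction_frequency(minute_labels):
    """Proportion of observed minutes coded as social interaction (1) versus not (0)."""
    return float(np.mean(np.asarray(minute_labels)))

def kde_curve(per_participant_freqs, grid=np.linspace(0, 1, 200)):
    """Kernel density estimate of participant-level interaction frequencies."""
    return grid, gaussian_kde(per_participant_freqs)(grid)

# One curve is computed from minute-by-minute human coding and one from
# SocialBit's 1-in-5-minute samples; the two are then overlaid as in Fig. 2.
```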
Algorithm’s overall accuracy
As seen in Table 2 and Fig. 3, the fine-tuned SocialBit LSTM and SocialBit Transformer algorithms performed better than the benchmark AudioSet Speech and AudioSet Conversation classifiers. SocialBit LSTM and SocialBit Transformer outperformed AudioSet Speech by 4.9% and 6.1% in balanced accuracy, respectively. The SocialBit Transformer algorithm achieved the highest overall performance, with a mean balanced accuracy of 0.87 and AUC of 0.94.
To further assess subject-level generalization, we examined the distribution of test accuracy for individual patients derived from the 5-fold cross-validation. The model achieved a mean participant-level accuracy of 0.86 (SD = 0.10, n = 139). This variance was partially attributable to data volume; the standard deviation in accuracy dropped by half (from 0.12 to 0.06) when comparing patients with fewer than 100 samples to those with 100 or more. While variance was higher at the individual level compared to the aggregate fold level, the mean performance remained consistent with the overall model metrics.
For specificity, the SocialBit Transformer outperformed the AudioSet Speech and AudioSet Conversation classifiers by 8.0% and 31.6%, respectively, reducing false positives from non-social sounds (e.g., background TV). For sensitivity, SocialBit Transformer exceeded AudioSet Speech and AudioSet Conversation by 4.8% and 13.0%, respectively, demonstrating strong detection even in noisy or ambiguous interaction settings. We report subsequent analyses using the SocialBit Transformer, which showed higher performance across sensitivity, specificity, balanced accuracy, and AUC.
Algorithm’s accuracy in patients with aphasia
SocialBit maintained high performance in patients with aphasia (n = 24), achieving 0.82 sensitivity, 0.87 specificity, 0.84 balanced accuracy, and 0.93 area under the curve (AUC) (Table 3). Compared to patients without aphasia, this represented a moderate decrease of 6.8% in sensitivity, 2.2% in specificity, 4.5% in balanced accuracy, and 1.1% in AUC. Figure 4 displays SocialBit sensitivity across aphasia subtypes, highlighting variability by subtype, with the lowest sensitivity observed in global aphasia. The reduction in sensitivity in patients with aphasia may reflect fewer speech contributions from the patient and briefer social interactions.
Algorithm’s clinical validity
We evaluated the clinical validity of SocialBit by examining its relationship with bedside-assessed stroke severity and comparing it to ground-truth human-coded interaction data. Higher National Institutes of Health Stroke Scale (NIHSS) scores (M = 3.0, SD = 4.4; range = 0–25) correlated significantly with fewer social interactions measured by SocialBit (r = − 0.19, p = 0.029) and by human coders (r = − 0.22, p = 0.008). Specifically, each 1-point increase in NIHSS corresponded to a 0.9% decrease in SocialBit-estimated interaction time and a 1.1% decrease in ground-truth-coded interaction time. These findings demonstrate that SocialBit captures meaningful social effects linked to stroke severity with high psychometric fidelity (Fig. 5).
Algorithm performance across interaction characteristics
We evaluated the sensitivity of SocialBit across six features of social interaction: depth, tone, number of speakers, interaction partner, communication modality, and spoken language (Table 4). Because annotations were available only for positive samples, we report sensitivity only.
Depth of interaction
Sensitivity increased with conversational depth: 0.55 for level 1, 0.79 for level 2, and ≥ 0.91 for levels 3–5. Shallow interactions (e.g., greetings) were harder to detect, while deeper exchanges produced sustained features aiding detection.
Tone
Sensitivity was high across tones, with slightly better performance in negatively valenced interactions. Neutral tones (76% of samples) and mildly positive tones (tone + 1; 20% of samples) had sensitivities of 0.87 and 0.89, respectively; strongly positive tones (tone + 2) yielded 0.86. Negative tones yielded 0.95 (tone − 1) and 0.99 (tone − 2).
Number of speakers
Consistent with the idea that more speech facilitates social interaction detection, performance improved with more speakers: 0.86 for one-on-one, 0.90 for two speakers, and 0.93 for three or more.
Interaction partner
Sensitivity was consistent across partners: 0.88 for clinicians/family, 0.82 for patients, 0.86 for caregivers, 0.92 for strangers, and 0.90 for unidentified.
Spoken Language
Performance was similar for English (0.88) and non-English (0.89; n = 210 samples) interactions.
Communication modality
Sensitivity varied modestly across communication formats: 0.87 for in-person interactions, 0.91 for phone calls, and 0.88 for video calls, with the highest detection on phone calls.
Algorithm performance across acoustic and environmental conditions
We tested SocialBit’s robustness in four real-world conditions that introduce acoustic complexity: (1) side conversations, (2) television sounds, (3) care setting, and (4) smartwatch hardware (Table 5).
Side conversation
There was stable performance in the presence of side conversations. The model had a balanced accuracy of 0.84 and an AUC of 0.92 when side conversations were present, compared to 0.90 and 0.95 when side conversations were absent.
Television sound
Television presence had a minor effect on performance, primarily reducing sensitivity. The model had a balanced accuracy of 0.85 and AUC 0.92 with television present, compared to 0.88 and 0.95 with television absent.
Care setting
Performance was consistent across care settings: In the hospital, the model had a balanced accuracy of 0.87 and an AUC of 0.94. In the rehabilitation setting, the model had a balanced accuracy of 0.90 and an AUC of 0.96.
Smartwatch hardware
Performance remained equivalent across different smartwatch devices. For the TicWatch Pro 3, the model had a balanced accuracy of 0.88 and AUC of 0.94. For the Galaxy Watch5 Pro, the model had a balanced accuracy of 0.87 and AUC of 0.94.
Discussion
Our findings provide evidence for SocialBit as an accurate, universally accessible, and privacy-preserving sensor for detecting social interactions in patients with diverse neurological abilities. In 153 stroke survivors, SocialBit achieved high sensitivity (0.87), specificity (0.88), and overall discriminative ability (AUC 0.94) when benchmarked against minute-by-minute human coding and against existing speech and conversation classifiers. SocialBit maintained high accuracy even in patients with aphasia, a population historically excluded from survey-based research. The algorithm captured meaningful variation across key features of social interaction and performed reliably across environmental conditions and smartwatch devices.
Our findings build on recent advances in ambient audio sensing for social behavior. Prior methods have shown the feasibility of detecting face-to-face interactions using smartphone audio, foreground speech localization, and multi-party wearables13,14,15,16,17. For example, Liang et al. introduced a smartwatch model detecting conversational turn-taking via foreground speech and speaker change detection16. SocialBit complements these efforts by expanding detection to interactions without clear speaker alternation or linguistic symmetry—common in patients with aphasia, cognitive impairment, or serious illness. By using a single wearable without requiring turn-taking or speech content, SocialBit offers a technically simple, socially inclusive framework for real-world interaction monitoring.
Our findings can be contextualized relative to prior observational studies of inpatient stroke social activity, particularly the work by Bernhardt et al.11 In that multicenter study of 64 patients within 14 days of stroke, observed between 8 am and 5 pm, patients were with others for ~ 40% of the day. Our study shared key design features, including daytime observation in hospitalized stroke populations and documentation of the presence of other people. However, we observed higher rates of social interaction, approaching 50% by both human coding and SocialBit. Several factors may explain this difference. Our observations used higher temporal resolution, with one-minute human coding and five-minute SocialBit estimates, which may capture brief or fleeting interactions missed by ten-minute sampling. Our cohort also included both acute stroke units and inpatient rehabilitation settings, where social exposure may differ. Finally, social interaction after stroke likely varies across hospital environments and care models. Together, these findings suggest that observed social interaction after stroke varies by setting and measurement approach, warranting further study.
Our findings align with a growing field that uses voice as a digital biomarker for health. Voice has been linked to physical and mental well-being, reflecting changes in respiratory function, emotional state, and neurological status through passive monitoring18. This interest has expanded into cognitive health, where patterns in spontaneous speech—such as lexical diversity, syntactic complexity, prosody, pause duration, and turn-taking—are associated with mild cognitive impairment, early dementia, and functional decline19. Disruptions in these features often signal altered neurocognitive status, including slower responses, reduced topic maintenance, and diminished conversational synchrony20. However, most approaches depend on natural language processing and cooperative verbal output, limiting their use in everyday naturalistic settings. SocialBit complements this work by shifting focus from speech content to non-lexical acoustic features.
SocialBit’s passive detection of social interactions has practical applications across medical conditions. In clinical research, SocialBit can serve as an objective, longitudinal outcome for clinical trials, capturing aspects of quality of life that are central to recovery yet invisible to traditional clinical metrics. Studies have shown that relationships are central to quality of life, with strong social ties linked to greater resilience, emotional well-being, and functional independence5. In stroke recovery, social engagement is a marker and modifiable target of cognitive rehabilitation21,22. By enabling continuous measurement without reliance on self-report, SocialBit is particularly well suited for patients with aphasia or cognitive impairment, who are often excluded from existing assessments.
In clinical workflows, SocialBit could support remote monitoring by identifying early reductions in social engagement that may precede cognitive or functional decline, enabling timely intervention23,24. In inpatient and rehabilitation settings, social interaction metrics could complement physical and cognitive measures to track recovery trajectories and tailor therapy intensity or social enrichment strategies. Over longer time horizons, SocialBit may enable outcome assessment beyond discharge, supporting personalized rehabilitation, caregiver engagement, and evaluation of interventions aimed at reducing isolation. By quantifying real-world sociality, SocialBit provides a foundation for operationalizing social prescriptions and establishing social interaction as a measurable vital sign within routine clinical care.
This study has limitations. First, although conducted in naturalistic hospital settings with continuous minute-level labeling, the algorithm may need further tuning in home or community contexts. Future studies should explore longer-term use in ambulatory populations. Second, although our cohort was clinically and demographically diverse, most interactions were in English. Third, the numbers of patients with severe stroke presentations and aphasia were limited, representing an important area for future work. Fourth, the current model classifies the presence of interaction but not its depth, tone, or quality, features that are important to functional outcomes. Finally, integrating social sensing into clinical workflows will require shifts in culture, reimbursement, and regulatory frameworks to support behavioral vital signs as routine metrics.
By passively detecting real-world interactions in individuals with diverse neurological abilities, SocialBit fills a gap in digital health. It provides a technically parsimonious, privacy-preserving, and inclusive method for measuring a core dimension of brain health. Looking ahead, SocialBit could help establish social interaction as a routinely monitored dimension of clinical care, serving as both a research endpoint and a practical tool for guiding rehabilitation and preventing isolation in clinical populations.
Methods
Study design and participants
We provide a full description of the study rationale, design, and procedures in a protocol paper12. In brief, we conducted a prospective observational study to test the validity of SocialBit. Our primary aim was to assess the accuracy of SocialBit in detecting minute-level social interactions, using human observation as the ground truth.
We recruited participants in a consecutive series while they were inpatients at Brigham and Women’s Hospital from June 2021 to March 2025. Inclusion criteria were: (1) diagnosis of acute ischemic stroke, defined clinically with support from imaging, and (2) age 18 years or older. Exclusion criteria were: (1) on Comfort Measures Only (an end-of-life care plan that focuses on pain relief and quality of life), (2) dementia diagnosed prior to stroke in the medical record, (3) inability to obtain informed consent from the patient or patient decision maker, and (4) patient or patient decision maker unable to understand or speak English well enough to complete surveys.
Algorithm development
We developed the SocialBit algorithm to detect minute-level social interaction using ambient audio captured from a consumer-grade smartwatch. Our primary goal was to balance high model performance with technical feasibility for clinical research, including battery life constraints and privacy protections.
For algorithm development, we began with YAMNet25, a pre-trained deep neural network designed for general-purpose audio classification using the AudioSet ontology of 521 sound events26. YAMNet is based on the MobileNet v1 architecture27, containing 3.7 million parameters and capable of running in real time on a smartwatch while preserving battery life. It processes non-overlapping 0.96-second windows of raw audio and outputs class probability scores. For our purposes, we extracted the 1024-dimensional penultimate-layer embeddings, which provide rich, compressed feature representations that exclude sensitive content such as specific words or speaker identity. Notably, in preliminary analyses, we evaluated commonly used voice activity detection approaches, including WebRTC VAD and Silero VAD, as potential baselines. However, these methods, which are optimized for binary speech detection in telephony settings, showed limited robustness in our on-body, real-world recordings with overlapping speakers and environmental noise; we therefore selected the YAMNet Speech and Conversation classes as more task-relevant and rigorous baselines for social interaction detection. The algorithm required no preprocessing or artifact mitigation strategies.
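As a minimal sketch of the embedding step, YAMNet can be loaded from TensorFlow Hub and applied to a mono 16 kHz waveform; only the 1024-dimensional embeddings are retained, while class scores, the spectrogram, and the raw audio are discarded. The helper function below is illustrative and is not the deployed on-watch implementation.

```python
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Pre-trained YAMNet model (MobileNet v1 backbone, AudioSet ontology).
yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

def extract_embeddings(waveform_16k: np.ndarray) -> np.ndarray:
    """Return 1024-dimensional YAMNet embeddings for mono 16 kHz audio in [-1, 1].

    YAMNet frames the waveform internally and yields one embedding row per
    analysis window; the class scores and spectrogram are discarded so that
    no word-level content or raw audio needs to be stored.
    """
    waveform = tf.convert_to_tensor(waveform_16k, dtype=tf.float32)
    scores, embeddings, spectrogram = yamnet(waveform)
    return embeddings.numpy()  # shape: (num_frames, 1024)
```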
We used YAMNet embeddings as input to fine-tune sequential models that could learn patterns of social interaction over time. Specifically, we trained two models: a Long Short-Term Memory (LSTM) network28 and a Transformer model29. Both models were designed to classify each one-minute segment of audio as containing social interaction or not. The LSTM model captured temporal dependencies by processing the sequence of YAMNet embeddings step-by-step, while the Transformer model used parallelized self-attention to evaluate all time steps simultaneously, offering computational advantages and improved handling of longer-range dependencies.
The SocialBit LSTM model architecture comprised a bidirectional LSTM layer with 350 units, followed by a dropout layer (75%), and a dense two-unit softmax output layer. The SocialBit Transformer model architecture included two sequential Transformer units, each composed of layer normalization, multi-head attention (six heads, 768 dimensions), dropout (35%), residual connections, and an additional normalization step. These were followed by a 1D convolutional layer (nine filters, kernel size 1 × 1), a fully connected layer with 35 units, and a final dropout layer. Notably, to preserve battery life, the SocialBit algorithms collected audio features for 1 out of every 5 min. In practice, the realized sampling yield was slightly lower than the nominal 20% due to technical factors, including intermittent time synchronization issues and transient software or data upload failures. These issues occasionally prevented scheduled sampling windows from being recorded but did not reflect systematic resets or errors in the sampling logic. During the study, we deployed system stability updates to maintain compatibility with evolving Wear OS versions and to improve data reliability.
For transparency and reproducibility, we provide layer-by-layer specifications of the SocialBit LSTM and Transformer architectures as Figure S2. This includes model structure, hyperparameters, and implementation details in TensorFlow/Keras.
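For orientation, the following Keras sketch mirrors the architectures described above. The sequence length (about 62 embedding frames per one-minute segment, assuming non-overlapping 0.96 s windows), the attention key dimension (128 per head, i.e., 6 × 128 = 768 total), the pooling step before the dense layers, and the activation choices are our assumptions; the authoritative specifications are those in Figure S2.

```python
from tensorflow.keras import layers, models

SEQ_LEN, EMB_DIM = 62, 1024  # ~62 YAMNet frames per one-minute segment (assumption)

def transformer_unit(x, num_heads=6, key_dim=128, dropout=0.35):
    # Layer normalization -> multi-head self-attention -> dropout -> residual -> normalization.
    h = layers.LayerNormalization()(x)
    h = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(h, h)
    h = layers.Dropout(dropout)(h)
    x = layers.Add()([x, h])
    return layers.LayerNormalization()(x)

def build_socialbit_transformer():
    inputs = layers.Input(shape=(SEQ_LEN, EMB_DIM))
    x = inputs
    for _ in range(2):                              # two sequential Transformer units
        x = transformer_unit(x)
    x = layers.Conv1D(filters=9, kernel_size=1, activation="relu")(x)
    x = layers.GlobalAveragePooling1D()(x)          # pooling step assumed
    x = layers.Dense(35, activation="relu")(x)
    x = layers.Dropout(0.35)(x)
    outputs = layers.Dense(2, activation="softmax")(x)
    return models.Model(inputs, outputs)

def build_socialbit_lstm():
    inputs = layers.Input(shape=(SEQ_LEN, EMB_DIM))
    x = layers.Bidirectional(layers.LSTM(350))(inputs)  # bidirectional LSTM, 350 units
    x = layers.Dropout(0.75)(x)
    outputs = layers.Dense(2, activation="softmax")(x)
    return models.Model(inputs, outputs)
```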
For this study, we performed YAMNet feature extraction on-device on the smartwatch and conducted backend model training and evaluation off-device. Fully on-device inference represents a future implementation goal and was not assessed in this study.
Deployment in clinical setting
Participants wore a smartwatch running the SocialBit algorithm each day from 9:00 am to 5:00 pm for up to 5 days at Brigham and Women’s Hospital, or up to 3 days at Spaulding Rehabilitation Hospital. During this period, trained human coders used a HIPAA-compliant livestream system to label the presence or absence of social interaction and interaction characteristics minute-by-minute in a REDCap ground truth table (Table S1). Social interaction was defined as any utterance spoken by or to the patient with another person. Utterances included fragmented speech, nonsensical speech, or non-verbal sounds, as may occur in patients with aphasia (for detailed guidelines on making coding judgements, see protocol)12. We also administered cognitive testing and standardized questionnaires to assess social connectedness, mood, and function at enrollment and again at 3 months. We determined aphasia status and subtype by detailed review of the medical record, including neurology and speech-language pathology documentation. We classified aphasia as global, Broca’s, Wernicke’s, mixed, or unspecified based on documented expressive and receptive language deficits. All participants provided written informed consent. The Mass General Brigham Institutional Review Board approved all study procedures. All methods were performed in accordance with the relevant guidelines and regulations.
Evaluation of SocialBit performance
We evaluated two fine-tuned SocialBit algorithms, SocialBit LSTM and SocialBit Transformer, against the two most closely corresponding Google AudioSet classifiers as benchmarks, AudioSet Speech and AudioSet Conversation. We evaluated the raw AudioSet model’s output on these classes as a baseline, using the maximum score across each 1-minute window. Any time-points missing human-adjudicated data (such as when patients were being cleaned or changed) were removed. As described in our protocol, we determined sample size using iterative performance metrics and standards for deep learning algorithms in prior studies12.
We evaluated the performance of the models using standard classification metrics: sensitivity, specificity, balanced accuracy, and area under the receiver operating characteristic curve (AUC). Sensitivity was the proportion of correctly identified positive samples, while specificity was the proportion of correctly identified negative samples. Balanced accuracy was the average of sensitivity and specificity. AUC summarized the trade-off between sensitivity and specificity across classification thresholds.
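These metrics can be computed from per-minute predictions as in the sketch below; it assumes probability outputs and is not the project’s evaluation code.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def classification_metrics(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity, balanced accuracy, and AUC for one-minute samples."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)   # correctly identified interaction minutes
    specificity = tn / (tn + fp)   # correctly identified non-interaction minutes
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "auc": roc_auc_score(y_true, y_prob),
    }
```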
We implemented participant-level 5-fold cross-validation to ensure that data from any individual participant appeared in only one fold, preventing overlap between training and testing sets. For the AudioSet Speech and AudioSet Conversation baseline models, within each training fold, we selected classification thresholds (τ) that maximized the harmonic mean of sensitivity and specificity.
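A minimal sketch of this threshold selection for the baseline classifiers, assuming per-minute maximum class scores, is shown below; variable names are ours.

```python
import numpy as np

def select_threshold(y_true, y_score):
    """Return the tau maximizing the harmonic mean of sensitivity and specificity
    on a training fold; candidate thresholds are the observed scores."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    best_tau, best_hmean = 0.5, -1.0
    for tau in np.unique(y_score):
        y_pred = (y_score >= tau).astype(int)
        sens = np.mean(y_pred[y_true == 1])          # true positive rate
        spec = np.mean(1 - y_pred[y_true == 0])      # true negative rate
        hmean = 0.0 if sens + spec == 0 else 2 * sens * spec / (sens + spec)
        if hmean > best_hmean:
            best_tau, best_hmean = tau, hmean
    return best_tau
```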
For the fine-tuned SocialBit LSTM and SocialBit Transformer models, each training set (comprising four folds) was further split into training and validation subsets in a 3:1 ratio. We used validation subsets for hyperparameter tuning, including early stopping. This procedure yielded 20 trained models per architecture across the five outer folds. To reduce training bias from class imbalance, we applied under-sampling of the majority class randomly at the sample level during model training30.
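The fold construction can be sketched with scikit-learn’s GroupKFold, which keeps each participant in a single fold, combined with random under-sampling of the majority class among the training indices; the further 3:1 train/validation split is omitted for brevity and the function is illustrative only.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

def participant_level_folds(y, participant_ids, n_splits=5, seed=0):
    """Yield (balanced_train_idx, test_idx) with no participant in both sets."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    gkf = GroupKFold(n_splits=n_splits)
    X_placeholder = np.zeros((len(y), 1))  # features are not needed to define the splits
    for train_idx, test_idx in gkf.split(X_placeholder, y, groups=participant_ids):
        pos = train_idx[y[train_idx] == 1]
        neg = train_idx[y[train_idx] == 0]
        minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
        kept_majority = rng.choice(majority, size=len(minority), replace=False)
        balanced = np.concatenate([minority, kept_majority])
        rng.shuffle(balanced)
        yield balanced, test_idx
```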
We performed hyperparameter optimization using Bayesian optimization with validation accuracy as the objective, tuning hyperparameters independently within each training fold. Across folds, we observed highly consistent selected hyperparameter configurations and validation performance for both the LSTM and Transformer models; therefore, we adopted a single representative configuration per architecture for all reported analyses. The optimized hyperparameters, search ranges, and selected values are reported in Table S2.
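A sketch of such a tuning loop using KerasTuner’s BayesianOptimization is given below; the search ranges, trial count, and tuned parameters are illustrative stand-ins for the values reported in Table S2.

```python
import keras_tuner as kt
from tensorflow.keras import layers, models

SEQ_LEN, EMB_DIM = 62, 1024  # approximate frame count per minute (assumption)

def build_tunable_lstm(hp):
    # Hyperparameters searched here (units, dropout) are examples only.
    model = models.Sequential([
        layers.Input(shape=(SEQ_LEN, EMB_DIM)),
        layers.Bidirectional(layers.LSTM(hp.Int("units", 100, 500, step=50))),
        layers.Dropout(hp.Float("dropout", 0.2, 0.8, step=0.05)),
        layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

tuner = kt.BayesianOptimization(
    build_tunable_lstm,
    objective="val_accuracy",
    max_trials=25,
    directory="socialbit_tuning",
    project_name="lstm",
)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val), epochs=30)
# best_model = tuner.get_best_models(num_models=1)[0]
```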
For clinical validation, we tested whether SocialBit models produced effect estimates comparable to the human-coded ground truth. We evaluated the face-valid hypothesis that higher levels of stroke severity are related to lower social interaction frequency (indicating the socially isolating effects of stroke). We correlated stroke severity (NIH Stroke Scale) with social interaction frequency, defined as the proportion of observed minutes with interaction based on human coding and SocialBit. Effect estimates that are comparable in magnitude provide psychometric evidence that SocialBit can be used as a research algorithm on par with ground-truth human codings.
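The correlation and slope estimates can be obtained as in the following sketch, run once on the human-coded frequencies and once on the SocialBit-estimated frequencies; the function name is ours.

```python
import numpy as np
from scipy.stats import pearsonr, linregress

def severity_vs_interaction(nihss, interaction_freq):
    """Correlate NIHSS with the proportion of observed minutes containing interaction."""
    nihss, interaction_freq = np.asarray(nihss), np.asarray(interaction_freq)
    r, p = pearsonr(nihss, interaction_freq)
    slope = linregress(nihss, interaction_freq).slope  # change in proportion per NIHSS point
    return {"r": r, "p": p, "slope_per_point": slope}
```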
Finally, we conducted secondary analyses to assess SocialBit’s robustness across diverse conditions. First, we examined sensitivity by interaction characteristics—depth, tone, number of speakers, partner type, modality, and language—using human-coded annotations. Second, we evaluated balanced accuracy and AUC across challenging environments, including side conversations, background TV, care settings, and smartwatch models. These tests assessed the algorithm’s generalizability to real-world variability.
We used R version 4.4.1 for all statistical analyses. We used the STARD (Standards for Reporting Diagnostic accuracy studies) Statement Guideline in reporting results (Table S3).
Data availability
The data that support the findings of this study are available from Brigham and Women’s Hospital, but restrictions apply to their availability because they were used under HIPAA regulations for the current study and are therefore not publicly available. Data are, however, available from the authors upon reasonable request and with permission of Brigham and Women’s Hospital.
References
Dunbar, R. I. M. & Shultz, S. Evolution in the social brain. Science 317, 1344–1347 (2007).
Adolphs, R. The social brain: neural basis of social knowledge. Annu. Rev. Psychol. 60, 693–716 (2009).
Stanley, D. A. & Adolphs, R. Toward a neural basis for social behavior. Neuron 80, 816–826 (2013).
Carozza, S., Kletenik, I., Astle, D., Schwamm, L. & Dhand, A. Whole-brain white matter variation across childhood environments. Proc. Natl. Acad. Sci. U S A. 122, e2409985122 (2025).
Livingston, G. et al. Dementia prevention, intervention, and care: 2024 report of the Lancet standing Commission. Lancet 404, 572–628 (2024).
Holt-Lunstad, J., Smith, T. B., Baker, M., Harris, T. & Stephenson, D. Loneliness and social isolation as risk factors for mortality: a meta-analytic review. Perspect. Psychol. Sci. 10, 227–237 (2015).
Beckes, L. & Coan, J. A. Social baseline theory: the role of social proximity in emotion and economy of action. Soc. Pers. Psychol. Compass. 5, 976–988 (2011).
Northcott, S. & Hilari, K. Why do people lose their friends after a stroke? Int. J. Lang. Communication Disorders. 46, 524–534 (2011).
Li, W. et al. Changing genes, cells and networks to reprogram the brain after stroke. Nat. Neurosci. https://doi.org/10.1038/s41593-025-01981-8 (2025).
Murphy, T. H. & Corbett, D. Plasticity during stroke recovery: from synapse to behaviour. Nat. Rev. Neurosci. 10, 861–872 (2009).
Bernhardt, J., Dewey, H., Thrift, A. & Donnan, G. Inactive and alone: physical activity within the first 14 days of acute stroke unit care. Stroke 35, 1005–1009 (2004).
White, K. et al. SocialBit: protocol for a prospective observational study to validate a wearable social sensor for stroke survivors with diverse neurological abilities. BMJ Open. 13, e076297 (2023).
Choudhury, T. & Pentland, A. Sensing and modeling human networks using the sociometer. in Proceedings of the Seventh IEEE International Symposium on Wearable Computers 216–222 (2003). https://doi.org/10.1109/ISWC.2003.1241414
Chen, D., Yang, J., Malkin, R. & Wactlar, H. D. Detecting social interactions of the elderly in a nursing home environment. ACM Trans. Multimedia Comput. Commun. Appl. 3, 6–es (2007).
Harari, G. M. et al. Sensing sociability: individual differences in young adults’ conversation, calling, texting, and app use behaviors in daily life. J. Pers. Soc. Psychol. 119, 204–228 (2020).
Liang, D., Zhang, A. & Thomaz, E. Automated Face-To-Face conversation detection on a commodity smartwatch with acoustic sensing. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 7, 1–29 (2023).
Mobile Sensing in Psychology: Methods and Applications (The Guilford Press, 2024).
Fagherazzi, G., Fischer, A., Ismael, M. & Despotovic, V. Voice for health: the use of vocal biomarkers from research to clinical practice. Digit. Biomark. 5, 78–88 (2021).
Tavabi, N. et al. Cognitive digital biomarkers from automated transcription of spoken Language. J. Prev. Alzheimers Dis. 9, 791–800 (2022).
Alhanai, T., Au, R. & Glass, J. Spoken language biomarkers for detecting cognitive impairment. in 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 409–416 (2017). https://doi.org/10.1109/ASRU.2017.8268965
Krueger, K. R. et al. Social engagement and cognitive function in old age. Exp. Aging Res. 35, 45–60 (2009).
Christelis, D. & Dobrescu, L. I. The causal effect of social activities on cognition: evidence from 20 European countries. Soc. Sci. Med. 247, 112783 (2020).
Kimura, N. et al. Modifiable lifestyle factors and cognitive function in older people: a cross-sectional observational study. Front. Neurol. 10, 401 (2019).
Dhand, A. et al. Leveraging social networks for the assessment and management of neurological patients. Semin Neurol. 42, 136–148 (2022).
Plakal, M. & Ellis, D. YAMNet. GitHub (2020). https://github.com/tensorflow/models/tree/master/research/audioset/yamnet
Gemmeke, J. F. et al. Audio set: an ontology and human-labeled dataset for audio events. in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 776–780 (2017). https://doi.org/10.1109/ICASSP.2017.7952261
Howard, A. G. et al. MobileNets: efficient convolutional neural networks for mobile vision applications. Preprint at https://doi.org/10.48550/arXiv.1704.04861 (2017).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
Vaswani, A. et al. Attention is all you need. in Advances in Neural Information Processing Systems (eds Guyon, I. et al.) vol. 30 (Curran Associates, Inc., 2017).
Snoek, J., Larochelle, H. & Adams, R. P. Practical Bayesian optimization of machine learning algorithms. Preprint at https://doi.org/10.48550/arXiv.1206.2944 (2012).
Acknowledgements
This study was funded by National Institutes of Health/National Center for Medical Rehabilitation Research R01HD099176, PI: Dhand. The funder played no role in the study design, data collection, analysis and interpretation of data, or the writing of the manuscript. We used ChatGPT (GPT-5, OpenAI) to assist with improving readability and clarity of the manuscript text. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.
Funding
This study was funded by National Institutes of Health/National Center for Medical Rehabilitation Research R01HD099176, PI: Dhand. The funder played no role in the study design, data collection, analysis and interpretation of data, or the writing of the manuscript.
Author information
Contributions
AD, SN, MRM, and MS conceptualized the study. ST, AD, MRM, CM, DF, MB, ML, and MS designed and implemented the model. AD, SN, SC, OA, GC, OB, RD, ML, VD, GSU, KW, AMB, RZ, SN, MW, MRM, and MS curated, analyzed, and interpreted the data. AD, ST, SC, AMB, MRM, and MS performed the statistical analyses. AD, ST, MRM, and MS directly accessed and verified the data. All authors read and revised the initial draft, approved the final version, and received frequent updates on results.
Ethics declarations
Competing interests
AD reports consulting fees or stock from ECHAS and Availity. ST reports consulting fees from ECHAS. SC reports consulting fees from Availity. SN is Chief Scientist and Co-Founder with equity stake of Behavioral Signals, a technology company focused on creating technologies for emotional and behavioral machine intelligence in consumer services. He is Co-Founder with equity stake of Lyssn.io, a technology company focused on tools to support training, supervision, and quality assurance of psychotherapy and counseling. He is also a Visiting Faculty Researcher at Google DeepMind. All other authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Dhand, A., Tate, S., Mack, C. et al. Validation of socialbit as a smartwatch algorithm for social interaction detection in a clinical population. Sci Rep 16, 4529 (2026). https://doi.org/10.1038/s41598-026-37746-x