Introduction

Obesity remains a significant public health challenge despite various pharmacological and behavioral interventions. Since 1975, the global prevalence of obesity has tripled. As of 2022, 2.5 billion adults aged 18 years and older were considered overweight, including over 890 million who were classified as having obesity1. Traditional behavioral weight loss interventions often fail to provide long-term results, with many individuals experiencing weight regain within 12 months post-intervention2,3. One common target of obesity interventions and treatments is overeating4. However, these efforts are often unsuccessful, potentially because the specific patterns and behaviors that contribute to overeating are not well understood5. A critical reason for this knowledge gap is the lack of a multifaceted approach to studying eating behavior, which often excludes investigation into the complex and interrelated factors contributing to overeating. Consequently, most current treatments do not account for the dynamic interplay of psychological, contextual, and physiological factors, highlighting the need for more personalized and adaptive intervention strategies6.

Considerable effort has been devoted to identifying predictors of overeating, primarily using self-reported data in combination with linear statistical approaches. Most of these studies have focused on single proximal determinants such as stress, cravings, and loss of control (LOC) eating as potential predictors of overeating7,8,9,10. For example, recent studies have identified various single predictors of overeating, such as emotional eating and impulsive responses to food cues11,12,13. Ecological Momentary Assessment (EMA) is a research methodology that repeatedly samples participants’ behaviors, experiences, and moods in real time and in their natural environments. Although EMA provides valuable insights into eating behaviors, the accuracy of its meal timing and portion-size reports can be limited. Factors such as recall bias or delayed entries may affect the precision of self-reported data; therefore, objective measures are needed to improve our understanding of overeating.

Wearable sensors are a promising source of objective data on overeating behaviors and their predictors. Wearables can collect data passively and continuously, enabling researchers to obtain behavioral measurements that are both richer and more frequent than those obtained through self-reported measures. Applying wearable sensors to eating studies can increase data reliability and open new avenues for analyzing the occurrence and co-occurrence of overeating predictors by providing richer, finer-grained datasets.

In this study, we first applied machine learning algorithms to identify features that predict overeating using passive sensing data, with and without ecological momentary assessment (EMA) inputs. We then used these identified features to construct distinct clusters of overeating and eating episodes, allowing us to differentiate theoretically and clinically relevant patterns of problematic overeating behaviors. This approach provides individualized data for personalized, adaptive interventions, overcoming the limitations of current one-size-fits-all strategies.

Results

Out of the initial 65 participants, 48 adults were included in the subsequent analyses: seven participants dropped out of the study, five lacked dietitian-administered recalls, and five recorded fewer than 10 meals during the study period, as shown in Fig. 1. The final EMA-only dataset comprised 2302 meal-level observations from participants with obesity, averaging 48 meals per participant. Participants had a mean age of 41 years (range: 21–66), and 77.1% were female. Baseline demographics and the EMA response summary are presented in Table 1. Supplementary Table 1 summarizes the EMA and passive sensing data, including demographic information, EMA response details, and statistics related to passive sensing features for 700 meals (average of 17 meals per participant). To illustrate the distribution of observations across participants, Supplementary Fig. 1 shows the distribution of EMA and feature-complete data (incorporating all available features) per participant. Supplementary Fig. 2 presents the hourly distribution of meals for EMA and feature-complete data across all participants. Details on wear time, adherence, and participant feedback for the SenseWhy wearable camera are available in Supplementary Note 1, Supplementary Table 2, and Supplementary Fig. 3.

Fig. 1: Consort diagram.

This figure presents the participant flow through the study, from initial recruitment to final analysis, showing the number of participants at each stage (screened, enrolled, and included in the analysis) and the reasons for exclusions at various points.

Table 1 Baseline Demographics and EMA Response Summary

Supervised overeating detection

We selected XGBoost as the best-performing model after comparing it with SVM and Naïve Bayes. While SVM is particularly effective in high-dimensional spaces and Naïve Bayes is efficient and robust, XGBoost, a non-linear ensemble method, proved more effective in capturing complex patterns in the data (Fig. 2). We further evaluated the model by examining the training and validation errors across iterations to ensure robustness (see Supplementary Fig. 4). For the detection of overeating, we conducted three separate analyses:

Fig. 2: AUROC, AUPRC, and Brier Score for XGBoost compared to SVM and Naïve Bayes.

Performance of XGBoost, SVM, and Naïve Bayes models across three data scenarios: (a) EMA-only data, (b) passive sensing-only data, and (c) feature-complete data. Each graph shows the AUROC, AUPRC, and Brier score loss for each model to allow comparison of performance across different data inputs and modeling approaches.
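
As an illustration of the three metrics reported in Fig. 2, the following is a minimal pure-Python sketch and not the evaluation code used in the study. The function names are hypothetical, AUPRC is approximated here by average precision, and tied scores are ranked in input order; all of these are assumptions made for the example.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Average precision, a step-wise estimate of the area under the PR curve."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive label")
    # Rank meals from highest to lowest predicted score (stable for ties).
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy labels (1 = overeating) and predicted probabilities, invented for illustration.
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, probs), average_precision(labels, probs), brier_score(labels, probs))
```

AUROC and AUPRC reward correct ranking of meals, whereas the Brier score also penalizes poorly calibrated probabilities, which is why the three are reported together.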

EMA-only analysis

XGBoost yielded a mean (SD) AUROC of 0.83 (0.02), an AUPRC of 0.81 (0.02), and a Brier score loss of 0.13 (0.01). The top five features identified by SHAP were: light refreshment (negative), pre-meal biological hunger (positive), perceived overeating (positive), evening eating (positive), and pleasure-driven desire for food (mixed association). In this context, a positive association means the feature increases the likelihood of overeating, a negative association means it reduces the likelihood, and a mixed association indicates varying effects depending on the instance.
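
To make the sign conventions above concrete, the sketch below computes exact Shapley values for a toy three-feature risk function by enumerating all feature orderings. This is only an illustration: the study used SHAP on an XGBoost model, whereas `toy_risk`, its coefficients, and the single all-zeros baseline are hypothetical choices made for this example.

```python
from itertools import permutations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   instance: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of `instance` relative to a single `baseline` point."""
    n = len(instance)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = instance[i]      # switch feature i from baseline to observed
            updated = model(current)
            phi[i] += updated - previous  # marginal contribution in this ordering
            previous = updated
    return [total / factorial(n) for total in phi]


def toy_risk(x: Sequence[float]) -> float:
    """Hypothetical risk score; the coefficients are invented for illustration."""
    hunger, light_refreshment, evening = x
    return 0.2 + 0.3 * hunger - 0.25 * light_refreshment + 0.1 * hunger * evening


phi = shapley_values(toy_risk, instance=[1, 1, 1], baseline=[0, 0, 0])
for name, value in zip(["hunger", "light_refreshment", "evening"], phi):
    print(f"{name}: {value:+.3f}")
```

In this toy case hunger receives a positive attribution and light refreshment a negative one. The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the property that makes SHAP values additive. A mixed association would arise when a feature's attribution changes sign across instances, for example through interactions.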

Passive sensing-only analysis

XGBoost resulted in an AUROC of 0.69 (0.04), an AUPRC of 0.69 (0.05), and a Brier score loss of 0.18 (0.02). The top five features were: number of chews (positive), chew interval (negative), chew-bite ratio (negative), number of bites (positive), and chew rate (mixed).
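
The passive sensing features named above can be derived from annotated bite and chew timestamps. The sketch below shows one plausible way to do so; the exact feature definitions used in the study are not given in this section, so the formulas (mean gap between consecutive chews, chews per minute over the chewing span, chews per bite) and the `MealAnnotation` structure are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MealAnnotation:
    """Hypothetical container: event times in seconds from the start of the meal."""
    bite_times: List[float]
    chew_times: List[float]


def chewing_features(meal: MealAnnotation) -> Dict[str, Optional[float]]:
    chews = sorted(meal.chew_times)
    n_chews = len(chews)
    n_bites = len(meal.bite_times)
    # Mean time between consecutive chews (assumed definition of "chew interval").
    gaps = [later - earlier for earlier, later in zip(chews, chews[1:])]
    chew_interval = sum(gaps) / len(gaps) if gaps else None
    # Chews per minute over the span from first to last chew (assumed "chew rate").
    span_seconds = chews[-1] - chews[0] if n_chews > 1 else 0.0
    chew_rate = n_chews / (span_seconds / 60.0) if span_seconds > 0 else None
    # Average number of chews per bite (assumed "chew-bite ratio").
    chew_bite_ratio = n_chews / n_bites if n_bites else None
    return {
        "n_chews": float(n_chews),
        "n_bites": float(n_bites),
        "chew_interval_s": chew_interval,
        "chew_rate_per_min": chew_rate,
        "chew_bite_ratio": chew_bite_ratio,
    }


# Two bites, each followed by four chews (invented data).
example = MealAnnotation(bite_times=[0, 30], chew_times=[1, 2, 3, 4, 31, 32, 33, 34])
print(chewing_features(example))
```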

Feature-complete dataset

In this combined dataset, XGBoost achieved an AUROC of 0.86 (0.04), an AUPRC of 0.84 (0.04), and a Brier score loss of 0.11 (0.02). To improve prediction probability calibration, we applied post-calibration using the sigmoid method (Platt’s scaling), which resulted in better alignment between predicted probabilities and observed outcomes (Supplementary Fig. 5). The top five predictive features were: perceived overeating (positive), number of chews (positive), light refreshment (negative), loss of control (positive), and chew interval (negative).
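
Sigmoid (Platt) calibration fits a two-parameter logistic curve that maps raw model scores to probabilities. The sketch below fits those parameters by plain gradient descent on the log loss. It is a simplified illustration rather than the study's procedure: Platt's original method also smooths the targets and uses a second-order optimizer, calibration is normally fitted on held-out data, and the function names, learning rate, and toy data here are assumptions.

```python
import math
from typing import Sequence, Tuple


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def calibrate(score: float, a: float, b: float) -> float:
    """Calibrated probability sigmoid(a * score + b)."""
    return _sigmoid(a * score + b)


def log_loss(scores: Sequence[float], labels: Sequence[int], a: float, b: float) -> float:
    eps = 1e-12
    total = 0.0
    for s, y in zip(scores, labels):
        p = min(max(calibrate(s, a, b), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(scores)


def fit_platt(scores: Sequence[float], labels: Sequence[int],
              lr: float = 0.1, n_iter: int = 5000) -> Tuple[float, float]:
    """Fit slope a and intercept b by gradient descent on the mean log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            err = calibrate(s, a, b) - y
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Invented raw scores and outcomes (1 = overeating), ideally from a held-out fold.
raw_scores = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
outcomes = [0, 0, 1, 0, 0, 1, 1, 1]
a, b = fit_platt(raw_scores, outcomes)
print([round(calibrate(s, a, b), 3) for s in raw_scores])
```

Because the mapping is monotonic, this kind of calibration leaves the ranking of meals unchanged (so AUROC is unaffected when the fitted slope is positive) while it can lower the Brier score.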

Figure 3 displays SHAP dot and bar plots, highlighting feature importance and impact for each of the three analyses.

Fig. 3: SHAP plots illustrating the feature importance and impact on XGBoost model output for different datasets.

(a) EMA-only, (b) passive sensing-only, and (c) feature-complete (combining EMA and passive sensing). The left panels show the mean absolute SHAP values, ranking features by their overall contribution to the model, while the right panels show the distribution of SHAP values for individual feature impacts associated with overeating (red for high feature values and blue for low feature values).

Semi-supervised overeating phenotype clustering

After removing 56 zero-calorie meals (e.g., non-caloric foods and beverages), we arrived at a final dataset of 2,246 meals, of which 369 (16.4%) were identified as overeating episodes. This adjustment ensured that the analysis focused on meals that accurately represented eating behavior, allowing for more reliable detection of overeating patterns. We employed the semi-supervised clustering pipeline and evaluated cluster separability using the silhouette score for 2 to 35 clusters. The separability was visually confirmed through 2D projection using UMAP. The pipeline, applied to the entire dataset of both normal and overeating meals, identified 30 distinct clusters, with a maximum silhouette score of 0.53 and a homogeneity score of 0.66 (Supplementary Fig. 6). To define a cluster as an overeating cluster, we set a threshold of 0.05 for the proportion of total overeating instances.
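
The rule for labeling a cluster as an overeating cluster can be sketched as follows, under the interpretation that a cluster qualifies when it contains at least 5% of all overeating instances in the dataset. The function name and the toy data are hypothetical, and the comparison operator (at least, rather than strictly above) is an assumption.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def overeating_clusters(cluster_ids: Sequence[Hashable],
                        is_overeating: Sequence[bool],
                        min_share: float = 0.05) -> Dict[Hashable, float]:
    """Return clusters holding at least `min_share` of all overeating meals,
    mapped to their share of the total."""
    total = sum(1 for flag in is_overeating if flag)
    if total == 0:
        return {}
    counts = Counter(c for c, flag in zip(cluster_ids, is_overeating) if flag)
    shares = {cluster: n / total for cluster, n in counts.items()}
    return {cluster: share for cluster, share in shares.items() if share >= min_share}


# Invented example: 40 overeating meals spread over three clusters, plus normal meals.
ids = ["A"] * 30 + ["B"] * 9 + ["C"] * 1 + ["C"] * 60
flags = [True] * 40 + [False] * 60
selected = overeating_clusters(ids, flags)
print(selected)
print(sum(selected.values()))  # cumulative proportion of overeating instances covered
```

Summing the shares of the selected clusters gives the cumulative proportion of overeating instances that the retained clusters account for.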

Visual inspection further confirmed separability in five predominant clusters, each characterized by a high proportion of overeating instances (Supplementary Fig. 7). The final clustering solution achieved a mean purity of 81.4%, a cumulative proportion of overeating instances of 0.85, and an entropy score of 0.36 across the five overeating clusters, further validated using GMM (Supplementary Note 2). Additionally, the silhouette score of 0.59 supported the coherence and distinctiveness of the identified clusters.
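
Purity and entropy summarize how mixed each cluster is with respect to the overeating label. This section does not state the exact formulas used, so the sketch below relies on common textbook definitions (majority-class fraction for purity, Shannon entropy in bits for entropy); the reported values may have been computed differently, for example with another logarithm base or weighting.

```python
import math
from typing import Sequence


def cluster_purity(is_overeating: Sequence[bool]) -> float:
    """Fraction of meals in the cluster that belong to its majority class."""
    share = sum(1 for flag in is_overeating if flag) / len(is_overeating)
    return max(share, 1.0 - share)


def cluster_entropy(is_overeating: Sequence[bool]) -> float:
    """Shannon entropy (bits) of the overeating/normal split; 0 means a pure cluster."""
    share = sum(1 for flag in is_overeating if flag) / len(is_overeating)
    entropy = 0.0
    for p in (share, 1.0 - share):
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


# Invented cluster with three overeating meals and one normal meal.
cluster = [True, True, True, False]
print(cluster_purity(cluster), cluster_entropy(cluster))
```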

Z-score analysis of overeating clusters

We applied z-score analysis to highlight differences within each cluster while identifying shared patterns and overarching themes across clusters. For contextual and psychological factors within each cluster, we selected features with z-scores exceeding the predefined cut-off (|z| ≥ 1), allowing us to identify the dominant co-occurring factors within the same clusters and uncover shared patterns across clusters. Figure 4 presents a polar bar plot illustrating the z-scores for each contextual and psychological factor across all overeating clusters. We characterized each phenotype based on which feature-level z-scores were large in magnitude for each cluster, highlighting the distinctive characteristics of the cluster. Cluster names were assigned to concisely reflect the key factors of each cluster and to provide an intuitive understanding of the overeating patterns observed in the data. A detailed analysis of all contextual factors exceeding the z-score threshold is provided in Table 2. Based on these results, we characterized the following overeating phenotypes:

  • Take-out Feasting

    Preference for indulging in restaurant-sourced meals (take-out), often enjoyed in a social setting, emphasizing the social aspect of shared dining experiences.

  • Evening Restaurant Reveling

    Pleasure-driven indulgence in food, with a preference for restaurant-sourced meals (dine-in), typically consumed in the evening as part of social dining experiences.

  • Evening Craving

    Eating in the evening, often involving self-prepared meals and characterized by hunger, serving as a way to unwind at the end of day.

  • Uncontrolled Pleasure Eating

    Focus on the hedonic aspect of food, involving eating for pleasure, often perceived as overeating with loss of control, and accompanied by task-oriented distractions.

  • Stress-driven Evening Nibbling

    Eating in the evening in response to stress and feelings of loneliness.

Fig. 4: Polar bar plot illustrating the z-scores of features associated with distinct overeating phenotypes.

The radial axis represents the magnitude of the z-scores, with bold labels indicating features with |z| ≥ 1. The circumference is divided into five phenotype clusters, each represented by a unique color, highlighting the key features that differentiate these clusters. This visualization provides insights into the characteristic features of each overeating phenotype, such as psychological factors, contextual influences, and behavioral patterns, enabling a deeper understanding of their unique profiles.

Table 2 Phenotype interpretation
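
The z-score screening used to characterize the phenotypes can be sketched as below. The reference distribution is not specified in this section, so the example assumes that each cluster's mean feature value is standardized against the mean and standard deviation of that feature over all meals considered; the function names and toy data are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Hashable, List, Sequence


def cluster_feature_z(values: Sequence[float],
                      cluster_ids: Sequence[Hashable],
                      cluster: Hashable) -> float:
    """z-score of the cluster mean relative to the overall mean and SD of one feature."""
    overall_mean = mean(values)
    overall_sd = pstdev(values)
    if overall_sd == 0:
        return 0.0  # a constant feature cannot distinguish clusters
    in_cluster = [v for v, c in zip(values, cluster_ids) if c == cluster]
    return (mean(in_cluster) - overall_mean) / overall_sd


def dominant_features(features: Dict[str, List[float]],
                      cluster_ids: Sequence[Hashable],
                      cluster: Hashable,
                      cutoff: float = 1.0) -> Dict[str, float]:
    """Features whose cluster-level z-score satisfies |z| >= cutoff."""
    selected = {}
    for name, values in features.items():
        z = cluster_feature_z(values, cluster_ids, cluster)
        if abs(z) >= cutoff:
            selected[name] = z
    return selected


# Invented data: six meals, two clusters, two features.
meal_features = {
    "evening": [1, 1, 0, 0, 0, 0],
    "stress": [3, 2, 3, 2, 3, 2],
}
meal_clusters = ["a", "a", "b", "b", "b", "b"]
print(dominant_features(meal_features, meal_clusters, "a"))
```

In this toy example only the evening feature passes the cut-off for cluster "a", so it would be the feature used to name and describe that cluster.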

Discussion

In the face of a global obesity crisis imposing staggering healthcare costs on societies, the need for innovative and effective interventions has never been more critical. Obesity treatment presents the unique challenge of modifying a core behavior—eating—that is essential for survival yet influenced by several factors. Research indicates that eating is affected not only by physiological hunger but also by psychological factors such as stress and emotions, social interactions, and environmental cues14,15. The complex nature of obesity necessitates a nuanced understanding of eating behaviors and their surrounding influences. Recognizing this complexity, the health sector is increasingly embracing innovative technologies such as wearable sensors and EMA to detect, monitor, and interpret eating behaviors16,17. These tools offer promising avenues for capturing detailed data on dietary intake and contextual factors, enabling more personalized and effective interventions. By leveraging these advancements, researchers can gain valuable insights into overeating patterns and develop strategies that address the multifaceted nature of eating behaviors, ultimately advancing our efforts to combat obesity.

Findings from our study using EMA data revealed that light refreshments, pre-meal biological hunger, perceived overeating, and pleasure-driven desire for food were key factors influencing overeating. Our results demonstrate that biological hunger and pre-meal pleasure-driven desire (i.e., appetite) reflect distinct influences on eating behavior. Biological hunger is a physiological drive based on energy requirements, whereas pleasure-driven desire leads to food intake based on hedonic value. Prior research supports that despite satiation, hedonic factors contribute to excessive intake when highly palatable foods are available18,19. Additionally, our finding of a negative association between light refreshments and overeating suggests that smaller meals may help prevent overeating, aligning with existing literature20. Perceived overeating positively contributed to objective overeating, highlighting the role of self-awareness. Because perceived overeating was assessed post-meal, this suggests individuals can reliably reflect on their eating behavior after consumption and may benefit from interventions that enhance this self-awareness, such as by using mobile health (mHealth) apps to track eating behaviors, provide feedback, and promote mindfulness when eating21,22.

Given the significance of eating speed and bite and chewing patterns in influencing overeating23,24, our analysis of bite and chew counts indicates the potential for defining data-driven thresholds beyond which the likelihood of overeating substantially increases, as demonstrated by SHAP values. Specifically, our results show that at thresholds of approximately 500 chews and 75 bites per meal, SHAP values increase rapidly, signifying an elevated risk of overeating (Supplementary Figs. 8a, c). These findings could inform early interventions targeting individuals who exceed these thresholds. Notably, the model’s AUROC of 0.69 and AUPRC of 0.69 for passive sensing data alone demonstrate moderate predictive performance, supporting the potential for real-time feedback systems to encourage users to reflect on their eating behavior. Such systems could translate complex patterns—such as elevated chew rate or bite count—into simple, actionable feedback or nudges (e.g., timely reminders to slow down), while also providing an objective measure for evaluating the success of various behavioral nutrition interventions. Adding EMA features further enhanced the model’s performance, increasing the AUROC to 0.86 and the AUPRC to 0.84, substantially improving predictive accuracy.

We also found that a high number of chews, around 500 or more, might indicate a prolonged meal duration, potentially leading to overeating due to extended exposure to food (see Supplementary Fig. 8a). Conversely, when the number of chews is relatively low, around 100, the SHAP values also increase, suggesting that a large amount of food was consumed rapidly (see Supplementary Fig. 8b). This rise in SHAP values at a lower number of chews corresponds to a higher bite rate, serving as a proxy for eating pace. A higher bite rate in these instances indicates that food was being consumed more quickly, further elevating the likelihood of overeating during these meals. Furthermore, our SHAP analysis of the chew-bite ratio (Supplementary Fig. 8d) indicates that a higher average number of chews per bite predicts a lower likelihood of overeating. This finding suggests that thoroughly chewing food might enhance satiety and reduce the risk of overeating, consistent with prior literature reporting that increased chewing promotes fullness and regulates appetite hormones25,26.

Through clustering analysis, we uniquely identified distinct overeating phenotypes that align with well-documented behaviors in the literature, offering insights into the drivers of overeating and their potential implications for intervention strategies. The “Evening Craving” and “Stress-driven Evening Nibbling” phenotypes reflect circadian-driven eating and emotional eating behaviors, respectively. The “Evening Craving” phenotype, characterized by nighttime eating of self-prepared meals and driven by biological hunger as a way to unwind, aligns with research on circadian rhythm disruptions influencing eating patterns. Studies have indicated that eating later in the day or at night can lead to increased hunger and a preference for energy-dense foods27. For instance, Goel et al.28 found that individuals with delayed circadian timing exhibited heightened appetite in the evening, which may promote late-night eating. This behavior is also consistent with findings on Night Eating Syndrome (NES), where individuals consume a significant portion of their daily intake during the night due to altered circadian eating rhythms29,30. These findings suggest that interventions targeting meal timing, such as structured eating schedules and circadian-based dietary strategies, may help mitigate the metabolic consequences of eating later in the day.

Our operationalization of evening eating diverges from the traditional definition of NES, which involves nocturnal awakenings to eat due to hunger or emotional distress, accompanied by heightened stress levels before and after the meal. Given the limitations of our data, we categorized evening eating as consumption occurring between 5 p.m. and 6 a.m., without the capacity to objectively assess nocturnal awakenings or associated stress responses.

The “Stress-driven Evening Nibbling” phenotype reflects emotional eating patterns widely reported in the literature. Emotional eating involves consuming food in response to negative emotions rather than physiological hunger, often leading to overeating, particularly of high-calorie “comfort” foods. Research indicates that stress can elevate cortisol levels, which increases cravings for energy-dense foods and can trigger overeating later in the day31,32. Studies33,34 have demonstrated that individuals under stress are more likely to engage in emotional eating, using food as a coping mechanism to alleviate negative feelings such as loneliness or anxiety. While most studies emphasize the impact of stress on increased eating, some research suggests that stress can also lead to reduced appetite in certain individuals32. We observed a group of meals that were not classified as overeating episodes but had some of the highest pre- and post-meal stress levels. These instances often involved snacking rather than consuming large meals, suggesting a pattern related to undereating or altered eating behaviors under stress. This indicates that stress responses can be heterogeneous, and individual differences may influence whether stress leads to overeating or undereating. Strategies such as mindfulness-based stress reduction and structured coping mechanisms have been shown to help regulate emotional eating behaviors and mitigate stress-related disruptions in appetite35.

Furthermore, the “Evening Restaurant Reveling” phenotype reflects the concept of social facilitation of eating, where the presence of others affects food intake. Our results show that this phenotype specifically involves eating with family and friends, indicating that individuals are more likely to overeat in comfortable social settings. Studies have shown that individuals tend to consume more food when eating in groups with friends compared to eating alone or with strangers, owing to extended meal duration and the influence of social norms. This pattern is partly explained by behavioral mimicry, where individuals adjust their eating to match that of their companions36,37,38. However, prior literature suggests that individuals with obesity are less likely to overeat in groups where they feel less comfortable39.

Similarly, the “Take-out Feasting” phenotype aligns with research on convenience eating and the impact of readily accessible, energy-dense foods on consumption patterns. The accessibility of take-out or fast food, combined with social settings, has been shown to lead to overconsumption due to larger portion sizes and the high palatability of foods40,41. Cohen et al.42 discussed how environmental factors, such as the ubiquity of fast-food outlets and marketing strategies, contribute to automatic eating behaviors that override internal hunger cues, especially in social situations where food is a central component.

In our analysis, we identified another distinct overeating phenotype—“Uncontrolled Pleasure Eating”—characterized by overeating for pleasure and a loss of control during tasks such as work or study. The inclusion of loss of control in this phenotype suggests a deeper psychological component, where external cues or emotional states trigger compulsive eating behaviors43. Environments associated with work or study may contribute to this behavior, as cognitive load can impair self-regulation, leading to mindless eating and a diminished ability to control food intake44. Overall, these phenotypes highlight the multifaceted nature of overeating behaviors, encompassing circadian misalignment, emotional stress, social influences, environmental convenience, and psychological factors such as loss of control. Understanding these distinct patterns is essential for designing targeted interventions that address the underlying mechanisms of overeating and support more effective treatments to reduce overeating.

The phenotypes were derived at the meal level, as our primary objective was to characterize meal-based overeating patterns. Because individuals may exhibit multiple overeating phenotypes across different meals, a single individual-level classification can obscure the nuanced interplay of factors, which prior research has shown to overlap rather than remain mutually exclusive45,46. While this study focuses on meal-level classification, future work should explore personalized phenotype trajectories over time, potentially leading to more individualized frameworks for identifying and intervening on shifting overeating behaviors.

Another avenue of research involves examining these phenotypes in relation to clinically relevant variables (e.g., BMI). Because all participants in this study met obesity criteria, BMI variability was limited, precluding robust analyses of phenotype-BMI associations. Future work could assess whether specific phenotypes correlate with broader health markers or weight trajectories, offering deeper insights into the underlying drivers and progression of overeating and obesity.

A strength of this research was the use of a personalized, objective measure of overeating. Overeating in the literature is often defined subjectively, relying on an individual’s perception of whether they consumed more than needed11. While this perception is important, it has long been known that subjective overeating is fraught with recall and participant bias, resulting in errors47. Moreover, overeating is typically reported over extended periods (days, weeks, or months), with little attention given to overeating at the meal level. Although overeating over long periods contributes to weight gain, the promise of sensors48 and EMA10 makes timely interventions feasible. Without a solid meal-level definition of overeating, however, we may not be able to identify when, where, and how to intervene effectively. We define overeating episodes as those where an individual’s energy intake exceeds their personal meal/snack average by one standard deviation. Fixed thresholds49 (e.g., meals over 1,000 kcal) do not adjust for differences in BMI, sex, or typical eating habits, potentially misclassifying normal consumption as overeating for some individuals while overlooking it in others. Our method is also consistent with participants’ own perceptions of overeating, as people often consider themselves to have overeaten when they consume more than what is typical for them personally50. This approach assumes relatively stable meal patterns and energy intake over the 14-day study period. Future studies of longer duration might segment data into shorter intervals for recalculating z-scores or employ dynamic methods (e.g., rolling window averages, adaptive Bayesian models) to continuously update reference distributions and account for evolving eating behaviors over time.
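
The personalized definition of overeating described above can be sketched as follows. This is a minimal illustration under stated assumptions: all eating events of a participant are pooled (the study refers to a meal/snack average, which may imply separate references), the sample standard deviation is used, the comparison is strict, and the function name and data are invented.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def label_overeating(meals: Sequence[Tuple[str, float]]) -> List[bool]:
    """Flag meals whose energy exceeds the participant's own mean + 1 SD.

    `meals` is a sequence of (participant_id, kcal) pairs.
    """
    by_person: Dict[str, List[float]] = defaultdict(list)
    for participant, kcal in meals:
        by_person[participant].append(kcal)
    thresholds = {}
    for participant, values in by_person.items():
        if len(values) >= 2:  # an SD needs at least two meals
            thresholds[participant] = mean(values) + stdev(values)
    return [
        participant in thresholds and kcal > thresholds[participant]
        for participant, kcal in meals
    ]


# Invented intake data for two participants with different typical meal sizes.
records = [
    ("p1", 400), ("p1", 500), ("p1", 600), ("p1", 1200),
    ("p2", 900), ("p2", 1000), ("p2", 1100), ("p2", 1000),
]
print(label_overeating(records))
```

In this toy data a 1,000 kcal meal is typical for the second participant and is not flagged, whereas a fixed 1,000 kcal threshold would treat it as borderline for everyone. For longer studies, the same function could be applied to a rolling window of recent meals to update each participant's reference.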

A further strength of this work is the use of advanced machine learning techniques through a clustering approach that captures complex data representations, enabling deeper insights into nuanced eating patterns. While clustering algorithms are inherently sensitive to data characteristics, we mitigated potential limitations by leveraging a DNN encoder to extract nonlinearities from the features and learn latent representations. Additionally, UMAP preserved the intrinsic data topology without distorting the overall data structure. Through two-dimensional projection visualization, we confirmed cluster separability with minimal overlap. While we acknowledge the necessity of testing our method on an independent dataset to assess its generalizability, our novel pipeline effectively addresses common clustering pitfalls, enhancing the reliability and validity of our findings. Moreover, we note that addressing class imbalance with SMOTE can introduce synthetic examples that may not fully represent the true data distribution. While critiques of SMOTE are valid, our Supplementary Note 3 provides further analyses indicating its continued utility in this context51.

Our study additionally distinguishes itself by observing individuals with obesity in their natural settings, encouraging them to maintain their usual routines. This approach captures authentic, often problematic eating habits, which we redefine using digital longitudinal data. By characterizing distinct overeating phenotypes via clustering methodologies, we can stratify individuals during a run-in phase, enabling more targeted phenotype-specific interventions that improve both effectiveness and scalability in addressing obesity. Although we prioritized accurate caloric intake estimates via 24-hour recalls collected by dietitians, the actual classification of overeating episodes was data-driven, allowing the process to be automated for broader adoption. For large-scale implementation, mobile applications and self-reporting tools could estimate caloric intake while wearable devices passively track behavioral and physiological signals, reducing user burden. Future work could evaluate the trade-offs between accuracy and feasibility to ensure that these automated methods align with established dietary assessments, enabling timely, phenotype-specific interventions to help individuals reduce their propensity to overeat.

Despite these notable strengths, some additional limitations warrant discussion. Reactivity to continuous measurement is a recognized concern in studies using wearable cameras and EMA methodologies. To address this, participants were instructed to “act naturally” and maintain their usual routines, and the devices were designed to be small, lightweight, and require minimal user input, thus reducing awareness of data collection. The two-week study duration also allowed participants to acclimate to the procedures, further diminishing potential reactivity. In addition, prior research suggests that activity-oriented cameras can reduce perceived surveillance and social discomfort, helping to mitigate reactivity52. Nevertheless, some level of reactivity is inevitable, even with these precautions. EMA can reduce biases compared to other self-report methods53, yet it remains vulnerable to underreporting or stigma54. Integration of wearable sensors and cameras provides an objective means to validate EMA by capturing micromovement patterns, meal timing, dietary composition, and other consumption behaviors16,48. Discrepancies between self-reported EMA data and sensor-derived metrics may indicate inaccuracies arising from stigma or recall bias. Moving forward, triangulating data from EMA, passive sensing, and dietitian-led recalls can yield a more comprehensive and validated dataset that improves our understanding of human behavior.

In conclusion, there has been growing interest in examining phenotypes of overeating. Our study supports a promising new direction by utilizing both EMA and sensor data to lay the foundation for testing timely personalized interventions that mitigate overeating.

Methods

To improve the accuracy of dietary intake assessments, we incorporated passive data collection methods using wearable cameras. Wearable cameras provide objective, direct verification of meals through recorded footage with validated timestamps, capturing both meal timing and dietary composition. Additionally, participants completed 24-hour dietitian-administered dietary recalls, which included a photo-assisted feature to minimize reporting biases and ensure professional oversight of caloric intake estimations55. To analyze the collected data, we applied advanced machine learning techniques to identify patterns in eating behavior. Specifically, we used wearable device data and machine learning models to detect meal timing and examine the relationships between behavioral, psychological, and contextual predictors of overeating episodes. This methodological approach enables a comprehensive analysis of the co-occurring factors that contribute to overeating.

Study design

The SenseWhy study involved 65 adult participants with obesity (BMI ≥ 30 kg/m²) residing in the Chicago Metropolitan Area, and was conducted over a 14-day (two-week) period56. Each participant received a sensing suite, including a chest-worn, activity-oriented wearable camera secured with a neck-worn lanyard and an under-the-shirt magnetic pad. The camera featured an upward-facing sensor array that captured video and thermal images of the face and surrounding areas, enabling visual tracking of eating behaviors in naturalistic settings. An infrared sensor enhanced data quality, particularly for nighttime or low-light eating conditions. Participants could pause recording to protect bystander privacy when others were present. Further details on device placement and orientation are illustrated in Supplementary Fig. 9. Participants used a customized mobile application, installed on their personal smartphones, to record both dietary intake and event-based EMA surveys. The daily number of EMAs varied according to individual eating frequency, averaging approximately 4 entries per day. For each eating event logged, participants completed two pre-meal surveys and one post-meal survey. The pre-meal surveys—administered at the “Decided To” and “About To” stages—collectively consisted of 8 EMA items along with a meal image capture component for food content reporting. Following the meal, participants completed a post-meal survey that similarly comprised 8 EMA items, a meal image capture, and content reporting. This self-initiated, event-based design aimed to capture real-time, context-specific data while minimizing unnecessary notifications or disruptions.

We used the validated multiple-pass method with the Nutritional Data System for Research (NDSR)57 to collect 24-hour dietary recalls. This approach includes five steps: (1) listing all foods and beverages, (2) reviewing for forgotten items, (3) gathering additional details, (4) probing for forgotten foods, and (5) reviewing the entire recall58. The before- and after-meal photographs and brief descriptions for each eating event were provided to the dietitians via a web-portal to assist during recalls. Image-assisted dietary recall methods have been shown to reduce underreporting and recall bias59. Images were processed by trained dietary interviewers as an initial step, with subsequent steps completed during the recall interview. Although these images were not used for calorie estimation, they helped confirm foods consumed, thereby enhancing recall accuracy.

This study was approved by the Institutional Review Board (IRB) at Northwestern University under protocol number STU00204564, ensuring compliance with ethical standards.

Operationalization and validation of meal data collection

Self-reported EMA data on meals were collected using a mobile application. To validate caloric intake and meal composition, data from 24-hour dietary recalls and self-reported app data were merged by matching meals from both approaches within 60-minute windows. This process resulted in an EMA-only dataset containing validated caloric intake for each meal. From the camera footage, trained annotators labeled the precise start and end times of each meal following a standardized protocol. An eating episode was defined as a series of consecutive feeding gestures with intervals not exceeding 15 min, based on definitions from previous research60. Drinking gestures were also labeled separately. Additionally, fine-grained annotations of chews and bites were conducted. A bite was defined as a large jaw open/close movement when food touches the mouth, followed by chews defined as consecutive small jaw open/close movements.
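The two time-based rules in this paragraph (matching meals across sources within 60-minute windows, and grouping feeding gestures separated by no more than 15 min into one eating episode) can be illustrated with a minimal sketch. The function names, the greedy nearest-match strategy, and the reading of the 60-minute window as an absolute time difference are assumptions made for illustration and are not taken from the study's code.

```python
from datetime import datetime, timedelta


def group_gestures_into_episodes(gesture_times, max_gap_min=15):
    """Group feeding-gesture timestamps into (start, end) eating episodes.

    A new episode starts whenever the gap to the previous gesture
    exceeds max_gap_min minutes.
    """
    episodes = []
    for t in sorted(gesture_times):
        if episodes and t - episodes[-1][-1] <= timedelta(minutes=max_gap_min):
            episodes[-1].append(t)
        else:
            episodes.append([t])
    return [(ep[0], ep[-1]) for ep in episodes]


def match_meals(ema_times, recall_times, window_min=60):
    """Pair each EMA meal with the nearest unused recall meal within the window.

    Greedy nearest matching is an illustrative assumption.
    Returns a list of (ema_index, recall_index) pairs.
    """
    window = timedelta(minutes=window_min)
    unused = set(range(len(recall_times)))
    pairs = []
    for i, t in enumerate(ema_times):
        candidates = [j for j in unused if abs(recall_times[j] - t) <= window]
        if candidates:
            best = min(candidates, key=lambda j: abs(recall_times[j] - t))
            pairs.append((i, best))
            unused.remove(best)
    return pairs
```

Meals left unmatched by such a rule would simply be absent from the merged dataset; how the study handled ambiguous matches is not specified here.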

To accurately associate meals with the camera footage labels, we merged each meal using the start and end time labels from all data sources. We visually confirmed these associations by comparing the contextual self-reported data and the food content from the dietitian-administered 24-hour dietary recall with the camera footage. These validated meals formed a dataset comprising EMA responses and passive-sensing data. We defined an overeating episode as any meal that met or exceeded one standard deviation (a z-score of 1) above an individual’s usual daily energy intake56,61,62. By analyzing the calorie distribution of all meals an individual consumed, we estimated a personalized cutoff of one z-score for typical meal-level consumption. Overeating was then operationalized as a dichotomous variable, with 1 indicating an overeating meal and 0 representing a non-overeating meal. Supplementary Fig. 10 displays 14-day eating profiles for several individuals, showing energy intake for all consumed meals.
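As a hedged illustration of the personalized cutoff, the sketch below labels a meal as overeating when its energy content is at least one z-score above the mean of that participant's own meals. The function name, the use of the population standard deviation, and the handling of participants with zero variance are assumptions for this example.

```python
import numpy as np


def label_overeating(meal_kcal, participant_ids, z_cutoff=1.0):
    """Return 1 for meals at or above the per-participant z-score cutoff, else 0."""
    meal_kcal = np.asarray(meal_kcal, dtype=float)
    participant_ids = np.asarray(participant_ids)
    labels = np.zeros(len(meal_kcal), dtype=int)
    for pid in np.unique(participant_ids):
        mask = participant_ids == pid
        mean = meal_kcal[mask].mean()
        sd = meal_kcal[mask].std()  # population SD (assumption)
        if sd > 0:  # with no variation, no meal is flagged
            z = (meal_kcal[mask] - mean) / sd
            labels[mask] = (z >= z_cutoff).astype(int)
    return labels


# Hypothetical example: one participant with a single unusually large meal.
example = label_overeating([400, 500, 600, 1200], ["a", "a", "a", "a"])
```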

Preprocessing and feature extraction

We extracted psychological, contextual, and behavioral features known to be associated with obesity for each meal from various data sources. Supplementary Table 3 provides detailed descriptions of EMA questions, camera-derived features, and the timing of data collection (before or after an eating event). Psychological features were collected using EMA questionnaires administered through the study’s mobile application. Participants responded to Likert-scale questions related to stress, emotions, affect, and hunger before and after each meal.

Contextual features were derived from EMA questions focusing on social and environmental triggers during meals. To streamline the raw responses and address correlations among different answers, we preprocessed the data to consolidate responses into specific categories. Meals cooked from scratch, prepared using ingredient delivery services, or made from frozen prepackaged foods were classified as self-prepared meals. Meals obtained from restaurants were labeled as restaurant-sourced meals, while meals consisting solely of snacks, cereals, or beverages were designated as light refreshments. Participants’ activities during meals were categorized based on whether they were engaged in other tasks. If participants were socializing, watching TV, working, studying, or driving while eating, these instances were classified as task-oriented distractions. If they were not engaged in any of the activities listed above, this was termed focused eating. Meals consumed alone were labeled as solo dining, whereas those eaten with others were termed social dining experiences. Evening eating was defined as any meal consumed between 5 p.m. and 6 a.m., based on meal distribution patterns observed in our dataset.

Behavioral features were extracted from fine-grained annotations obtained through detailed analysis of camera footage for each meal, using a standardized annotation protocol. These features included the total number of chews (chewing motions during a meal), the total number of bites (instances where food was brought to the mouth), the chew rate (number of chews divided by the meal duration in minutes), the bite rate (number of bites divided by the meal duration in minutes), the chew-bite ratio (ratio of total chews to total bites), and the meal duration itself, validated through camera recordings. Annotation was performed by trained raters from a third-party professional labeling service, following a structured training program developed by the research team. Inter- and intra-rater agreement was monitored through an internal auditing system, periodic quality checks, and final validation by the research team. Further details on the annotation protocol, rater training, and quality control measures are provided in Supplementary Note 4. Building upon the extracted features, we conducted our analysis at three levels to evaluate the predictive capabilities of different data subsets. The first level, termed the passive-sensing-only analysis, included only the behavioral features obtained from the camera footage. The second level, the EMA-only analysis, utilized the psychological and contextual features derived from the EMA questionnaires. The third level, the feature-complete analysis, incorporated all available features. A detailed description of the analytical pipeline is shown in Fig. 5.
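The behavioral features defined in this paragraph are simple ratios of annotated counts and meal duration. A minimal sketch follows; the function name and dictionary keys are hypothetical, and returning NaN for the chew-bite ratio of a meal with no bites is an assumption.

```python
def behavioral_features(n_chews, n_bites, duration_min):
    """Compute per-meal behavioral features from annotated counts.

    n_chews and n_bites are annotated totals; duration_min is the meal
    duration in minutes.
    """
    if duration_min <= 0:
        raise ValueError("meal duration must be positive")
    return {
        "n_chews": n_chews,
        "n_bites": n_bites,
        "duration_min": duration_min,
        "chew_rate": n_chews / duration_min,   # chews per minute
        "bite_rate": n_bites / duration_min,   # bites per minute
        "chew_bite_ratio": n_chews / n_bites if n_bites > 0 else float("nan"),
    }


# Hypothetical meal: 300 chews and 20 bites over 10 minutes.
features = behavioral_features(300, 20, 10.0)
```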

Fig. 5: Analytical pipeline, a step-by-step process of the study’s analytical approach.

a Conceptual framework for an integrated ML pipeline aimed at identifying and addressing overeating phenotypes in practice. The pipeline begins with data collection, integrating multimodal sources such as sensor data, EMA data, and dietary recalls to capture a comprehensive view of eating behaviors. This is followed by preprocessing to clean and harmonize the data, ensuring consistency across modalities, and feature extraction to derive key indicators such as chew rate, bite frequency, and emotional states associated with eating episodes. The next stage involves the development of ML models for overeating detection and phenotype ideation. Supervised learning models identify key features predictive of overeating episodes, leveraging behavioral and psychological features, while clustering techniques group individuals into distinct overeating phenotypes based on shared behavioral and contextual patterns. Once phenotypes are characterized, they can be integrated into personalized treatment strategies, tailoring interventions to address specific overeating patterns (context- and behavior-driven overeating). These treatments may include real-time feedback systems to prompt users to reflect on their behaviors, along with recommendations for sustainable behavioral changes. The framework culminates in system deployment, where real-time feedback and monitoring enable continuous assessment of eating behaviors and treatment efficacy. Data from deployed systems can feed back into the pipeline, enabling refinement and validation of models and interventions over time. This iterative process supports the practical application of overeating phenotype identification and management in real-world settings, creating a closed-loop system for adaptive and effective health interventions. b Methodological approach used in this study. 
The process begins with data preparation, including the labeling and validation of meal times from sensor data and the integration of psychological and contextual factors from EMA data and 24-hour dietary recalls. Overeating detection is performed using supervised models, incorporating SMOTE for imbalanced data and Bayesian optimization for fine-tuning. A semi-supervised clustering approach identifies overeating phenotypes, leveraging a non-linear encoder, UMAP for dimensionality reduction, and K-means clustering with z-score analysis for phenotype characterization. Evaluation metrics include AUROC, AUPRC, and Brier score loss for model performance, SHAP for interpretability, and clustering metrics such as silhouette score, homogeneity, and entropy.

Overeating detection: supervised machine learning

Prior to training machine learning models, we performed feature preprocessing, including standardization of continuous variables (subtracting the mean and dividing by the standard deviation) to ensure zero mean and unit variance. For each subset of features (EMA-only, passive-sensing-only, and feature-complete), we implemented a 5-fold cross-validation procedure, stratified by class (overeating vs. non-overeating) to maintain class proportions within each fold at the meal level. Within each fold, the data were randomly split into training (60%), validation (20%), and test (20%) sets, ensuring no meal data cross-contamination between the sets. To assess whether meal-based and participant-based splitting strategies yielded comparable results, we conducted an additional participant-level split analysis in which each participant was assigned exclusively to one of the training, validation, or test sets. Full details of this analysis are provided in Supplementary Note 5. To address the imbalance between overeating and non-overeating labeled meals, we applied the Synthetic Minority Oversampling Technique (SMOTE)63 to the training set. This ensured that the model learned from an equal representation of overeating and non-overeating meals, enhancing its ability to generalize across both categories.
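The core idea of SMOTE, generating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbors, can be sketched in a few lines of NumPy. This is a simplified illustration and not the implementation used in the study; the function name, the default of five neighbors, and the choice to balance the classes exactly are assumptions.

```python
import numpy as np


def smote_like_oversample(X, y, k=5, seed=0):
    """Balance a binary dataset by interpolating between minority neighbors.

    Simplified SMOTE-style sketch: synthetic rows are appended after the
    original rows. Intended for the training split only.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_needed = counts.max() - counts.min()
    X_min = X[y == minority]
    if n_needed == 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    # Pairwise Euclidean distances among minority samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # exclude each sample as its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_needed)
    chosen = neighbors[base, rng.integers(0, k, size=n_needed)]
    gap = rng.random((n_needed, 1))  # interpolation weight in [0, 1)
    synthetic = X_min[base] + gap * (X_min[chosen] - X_min[base])
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.full(n_needed, minority)])
    return X_out, y_out
```

Applying the oversampling only to the training split keeps the validation and test sets free of synthetic meals, so reported metrics reflect the original class imbalance.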

We evaluated several machine learning models, including XGBoost, support vector machines (SVM), and Naïve Bayes, to predict overeating episodes. Models were trained on the SMOTE-upsampled training set. We performed hyperparameter optimization using Bayesian optimization64, utilizing the validation set to evaluate model performance during the optimization process. Final performance metrics were derived by averaging the results across the five folds. We also reported the standard deviation of these results, computed from the between-fold results, to assess the variability and robustness of the model’s performance. Ultimately, we selected the best-performing model based on the final performance metrics.

Supervised evaluation and explainability

Model performance was evaluated on the test set within each fold of the cross-validation procedure, using multiple evaluation metrics, including the Area Under the Receiver Operating Characteristic Curve (AUROC), the Area Under the Precision-Recall Curve (AUPRC), and the Brier score loss. We reported the mean and standard deviation of these metrics across the five cross-validation folds. Furthermore, we employed SHapley Additive exPlanations (SHAP)65 to interpret the models and identify the importance and directionality of individual features contributing to the prediction of overeating outcomes. SHAP analysis was conducted on the best-performing fold for each dataset (EMA-only, passive-sensing-only, and feature-complete), allowing us to understand how each feature influenced the model’s predictions and to uncover the underlying factors associated with overeating episodes. The results from SHAP also provided insights into the relative importance of behavioral, psychological, and contextual factors in predicting overeating episodes.
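The per-fold metrics and their between-fold summary could be computed along the following lines, assuming scikit-learn is available. The function name and input format are hypothetical. Average precision is used here as one common estimator of AUPRC, and the population standard deviation across folds is an assumption; the study does not state which estimators it used.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    brier_score_loss,
    roc_auc_score,
)


def summarize_folds(fold_results):
    """Summarize test-set metrics across cross-validation folds.

    fold_results: list of (y_true, y_prob) pairs, one per fold, where
    y_prob is the predicted probability of the overeating class.
    Returns {metric_name: (mean, std)}.
    """
    metrics = {"auroc": [], "auprc": [], "brier": []}
    for y_true, y_prob in fold_results:
        metrics["auroc"].append(roc_auc_score(y_true, y_prob))
        metrics["auprc"].append(average_precision_score(y_true, y_prob))
        metrics["brier"].append(brier_score_loss(y_true, y_prob))
    return {
        name: (float(np.mean(vals)), float(np.std(vals)))
        for name, vals in metrics.items()
    }
```

Higher AUROC and AUPRC indicate better discrimination, whereas a lower Brier score indicates better-calibrated probabilities.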

Overeating phenotype extraction: semi-supervised machine learning

We focused on EMA-based data in phenotype generation primarily to ensure comparability with past and future studies, facilitating broader implementation in future interventions. This approach also provided the largest set of recorded meals, thereby increasing statistical power to capture a wider range of overeating episodes and enabling the identification of more distinct behavioral patterns. To uncover the intrinsic structure of our data and enhance interpretability, we implemented a semi-supervised learning pipeline that combined dimensionality reduction and clustering techniques. Dimensionality reduction was performed through a two-step process. First, we employed a deep neural network architecture consisting of a feedforward multilayer perceptron (MLP) serving as a non-linear encoder. This encoder transformed the high-dimensional input data \({\rm{X}}\in {{\mathbb{R}}}^{n\times d}\) into a lower-dimensional latent representation \({\rm{Z}}\in {{\mathbb{R}}}^{n\times h}\), where \(n\) is the number of samples, \(d\) is the number of original features, and \(h\) is the dimension of the reduced feature space. The encoder compresses the input by learning a compact latent representation, ensuring that \(h < d\) while retaining the most informative aspects of the data. The encoder captured non-linear relationships within the data, passively positioning similar data points closer together in the latent space while pushing dissimilar samples apart. The transformation is given in Eq. (1):

$${\rm{Z}}={f}_{{encoder}}\left({\rm{{\rm X}}};\Theta \right)$$
(1)

where \({f}_{{encoder}}\) denotes the non-linear function parameterized by weights \(\Theta\) learned during training.
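To make the shapes in Eq. (1) concrete, the following sketch implements the forward pass of a small feedforward encoder in NumPy. The single hidden layer, the ReLU activation, the layer sizes, and the random (untrained) weights are all assumptions for illustration; the study's actual architecture and training procedure are not specified in this section.

```python
import numpy as np


def init_encoder(d, hidden, h, seed=0):
    """Create random weights Theta for a d -> hidden -> h MLP (untrained)."""
    rng = np.random.default_rng(seed)
    sizes = [d, hidden, h]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def f_encoder(X, theta):
    """Map X (n x d) to a latent representation Z (n x h)."""
    a = np.asarray(X, dtype=float)
    for layer, (W, b) in enumerate(theta):
        a = a @ W + b
        if layer < len(theta) - 1:
            a = np.maximum(a, 0.0)  # ReLU on hidden layers only
    return a


# Hypothetical sizes: n = 10 meals, d = 16 features, h = 4 latent dimensions.
theta = init_encoder(d=16, hidden=8, h=4)
Z = f_encoder(np.ones((10, 16)), theta)
```

In practice the weights would be learned during training rather than drawn at random; the sketch only shows how an \(n\times d\) input becomes an \(n\times h\) output with \(h < d\).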

Subsequently, the latent representations \({\rm{Z}}\) obtained from the encoder were further processed using the Uniform Manifold Approximation and Projection (UMAP)66, a non-linear manifold learning technique. UMAP reduced the data to a two-dimensional space \({\rm{Y}}\in {{\mathbb{R}}}^{n\times 2}\), preserving both local and global data structures while eliminating redundant and irrelevant features. This step facilitated visualization and interpretability of the high-dimensional data by mapping it onto a two-dimensional space, as shown in Eq. (2):

$${\rm{Y}}={\rm{UMAP}}\left({\rm{Z}}\right)$$
(2)

Following dimensionality reduction, we applied the k-means clustering algorithm to the transformed data \({\rm{Y}}\). To validate the robustness of our clustering results, we performed a sensitivity analysis using an alternative clustering method, Gaussian Mixture Modeling (GMM), which allows for soft assignments where each point has a probability of belonging to multiple clusters. We selected the number of clusters that maximized the silhouette score67, indicating the most appropriate number of clusters. Visual confirmation was performed using the two-dimensional UMAP projection to ensure well-separated clusters corresponding to distinct overeating meals.
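Selecting the number of clusters by maximizing the silhouette score can be sketched as below, assuming scikit-learn is available. The function name, the candidate range of cluster counts, and the k-means settings are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def select_k_by_silhouette(Y, k_values=range(2, 9), seed=0):
    """Fit k-means for each candidate k and return the k with the best silhouette.

    Y: array of shape (n_samples, 2), for example a 2D embedding.
    Returns (best_k, {k: silhouette score}).
    """
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Y)
        scores[k] = silhouette_score(Y, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

The same loop could be repeated with `sklearn.mixture.GaussianMixture` in place of `KMeans` to mirror the sensitivity analysis, using the hard assignments from `predict` when computing the silhouette score.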

Z-score analysis for cluster interpretation

To interpret clusters, we computed cluster-level z-scores for each variable. This involved calculating cluster means of a given variable, and then computing a z-score across cluster means for that variable. The resulting value characterizes the deviation of a given cluster from the average value of a given variable in terms of standard deviation units. Specifically, for each feature \(j\) and within each cluster \(i\), we computed the z-score in Eq. (3):

$${z}_{ij}=\frac{{\mu }_{ij}-{\mu }_{j}}{{\sigma }_{j}},\quad {\mu }_{j}=\frac{1}{k}\sum _{i=1}^{k}{\mu }_{ij},\quad {\sigma }_{j}=\sqrt{\frac{1}{k}\sum _{i=1}^{k}{\left({\mu }_{ij}-{\mu }_{j}\right)}^{2}}$$
(3)

Here, \(k\) is the total number of clusters, \({\mu }_{ij}\) represents the mean of feature \(j\) within cluster \(i\), \({\mu }_{j}\) is the overall mean of cluster-specific means for feature \(j\), and \({\sigma }_{j}\) is the standard deviation of the cluster means for feature \(j\).

This z-score represents how many standard deviations the cluster mean \({\mu }_{ij}\) deviates from the mean of cluster means \({\mu }_{j}\) for that feature. Features with high absolute z-scores (e.g., \(|{z}_{ij}|\ge 1\)) were considered substantially different from the average cluster mean, indicating a strong influence on overeating behaviors within that cluster.
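Eq. (3) translates directly into a few array operations. The sketch below computes the cluster means, then standardizes them across clusters using the population standard deviation (the \(1/k\) normalization in Eq. (3)). The function name is hypothetical, and returning NaN for features whose cluster means are all identical is an assumption.

```python
import numpy as np


def cluster_z_scores(X, labels):
    """Compute z_ij for each cluster i and feature j, following Eq. (3).

    X: (n_samples, n_features) feature matrix; labels: cluster assignments.
    Returns (cluster_ids, Z) where Z has shape (k, n_features).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels)
    # mu_ij: mean of feature j within cluster i.
    mu_ij = np.vstack([X[labels == c].mean(axis=0) for c in cluster_ids])
    mu_j = mu_ij.mean(axis=0)      # mean of the cluster means
    sigma_j = mu_ij.std(axis=0)    # SD of the cluster means (1/k normalization)
    sigma_j = np.where(sigma_j == 0, np.nan, sigma_j)
    return cluster_ids, (mu_ij - mu_j) / sigma_j
```

For example, with three clusters whose means on a feature are 1, 2, and 3, the mean of means is 2 and the standard deviation is \(\sqrt{2/3}\approx 0.816\), giving z-scores of approximately -1.22, 0, and 1.22.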

Clustering evaluation metrics

Overeating was not included as a feature in the clustering analysis; instead, we evaluated clusters based on the proportion of overeating episodes within each cluster, following clear criteria outlined below. We evaluated the resulting overeating clusters using per-cluster purity68, which measures the extent to which each cluster contains data points from a single class. The silhouette score was used to assess how well-separated and cohesive the clusters are by comparing intra-cluster and inter-cluster distances. Entropy69 quantifies the uncertainty within each cluster, with lower values indicating higher purity. For overall clustering assessment, homogeneity70 measures how uniform the clusters are with respect to the true class labels across the entire dataset. Lastly, we calculated the proportion of correctly assigned overeating meals to assess the effectiveness of the clustering approach in identifying relevant clusters.
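Per-cluster purity and entropy with respect to the binary overeating label can be sketched as follows. The function name and output keys are hypothetical, and the base-2 logarithm for entropy is an assumption.

```python
import numpy as np


def per_cluster_purity_entropy(cluster_labels, y):
    """Compute purity, entropy, and overeating fraction for each cluster.

    cluster_labels: cluster assignment per meal; y: 0/1 overeating label.
    """
    cluster_labels = np.asarray(cluster_labels)
    y = np.asarray(y, dtype=int)
    results = {}
    for c in np.unique(cluster_labels):
        members = y[cluster_labels == c]
        p = np.bincount(members, minlength=2) / len(members)
        nonzero = p[p > 0]
        entropy = float(-(nonzero * np.log2(nonzero)).sum()) + 0.0
        results[int(c)] = {
            "purity": float(p.max()),          # share of the majority class
            "entropy": entropy,                # 0 for a pure cluster
            "overeating_fraction": float(p[1]),
        }
    return results
```

A cluster containing only overeating meals has purity 1 and entropy 0, whereas an evenly mixed cluster has purity 0.5 and entropy 1 bit.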