Introduction

Over the past decade, adolescents have been the fastest-growing demographic of social media users. In 2022, the time spent online by U.S. adolescents had doubled compared to 2014-15, with 95% of teens using YouTube, 67% TikTok, 62% Instagram, and 59% Snapchat1. A similar trend can be found in Switzerland, where 99% of 12- to 19-year-olds own a smartphone, 95% use instant messaging apps, and 91% use social media sites, with Instagram, YouTube, Snapchat, and TikTok being the most popular2. Along with the increased use of social media, mental health problems are also rising among adolescents. Studies have found that the steepest decline in life satisfaction occurs between 15 and 24 years of age in most of the world’s regions3. In 2021, almost half of U.S. teens felt persistently sad or hopeless, and one-third reported poor mental health4, while about 37% of Swiss teens experienced mental health problems5. Moreover, adolescence is a window of developmental sensitivity to the negative well-being outcomes of social media6. As stated in the U.S. Surgeon General Advisory on Social Media and Youth Mental Health (Office of the Surgeon General, 2023) and the update of the Digital Services Act in early 20247, the parallel rise in adolescents’ mental health problems and social media use is sounding alarms for our society. Hence, exploring the impact of social media and smartphone use on adolescent well-being with objective social media data is increasingly vital to provide robust results beyond cross-sectional data and self-reported measures, and to inform policymaking.

Despite compelling studies8,9,10, the (bi)directional relationship between social media use (SMU) and adolescent well-being remains debated. The U.S. Surgeon General’s advisory stated that “there are ample indicators that social media can also have a profound risk of harm to the mental health and well-being of children and adolescents” (p. 4) but called for more research using advanced methodologies to disentangle both negative and (understudied) positive effects of SMU on adolescents’ well-being11. So far, the literature on SMU and adolescent well-being includes many systematic reviews and reviews of reviews8,12,13,14, a growing number of intensive longitudinal studies using Ecological Momentary Assessments (EMAs)10,15,16,17, and, in some rare cases, trace data10,18. The consensus among previous studies is that the extent of SMU has a minimal (often small and negative) impact on adolescents’ mental health12,14, but the mechanisms behind the association need to be further examined.

In the sections below, we present a review of the four perspectives for advancing studies on SMU and adolescent well-being and discuss how the current study will incorporate these perspectives.

Current objective social media data collection

Central to the debate on SMU and well-being is identifying how activities on social media affect well-being, a task complicated by the idiosyncratic nature of SMU behavior. Current self-reported measures of social media time are inadequate: they fail to capture different activities, are often underreported19 or distorted by users20, and are only moderately correlated with objective metrics21. Therefore, granular and accurate data on users’ social media activity are essential to unpack the detailed mechanisms between SMU and adolescent well-being. The European Union’s Digital Services Act (DSA), effective February 2024, is a significant advancement that requires platforms to share data on platform design and operation with researchers through appropriate interfaces22 for use in future studies. Although some US states have pilot programs23, similar comprehensive legislation is lacking in other global regions, so collecting objective social media data for empirical research remains challenging.

Objective social media data are activity records as processed or saved through an online environment24 and can be collected using Application Programming Interfaces (APIs), app data donation, and passive sensing, but each of these approaches is hampered by various challenges25. First, data from APIs are constrained by content limitations to protect user privacy, which remove most identifiable information, such as user demographics. This leads to the second limitation: it is difficult to link SMU data requested from APIs to well-being and mental health outcomes, which are mostly measured through self-reports25,26. Third, data from APIs are subject to rate limits and platform policy changes. For example, the Twitter Academic API and Reddit Pushshift API, once free to access for academic research, were both gated with subscription fees in 202327,28.

Another option is using Data Donation Packages, archives of social media platforms that each user has the right to download. In both Europe and the United States, social media platforms are legally mandated to share user trace data with their users. However, challenges remain at both the data collection and analysis stages, along with privacy issues. A low compliance rate has been reported for this kind of study29, and analyzing the content of data donation packages is often complicated by the quick changes in data infrastructure implemented by social media platforms29.

Passive sensing apps like EARS and Ethica (now Avicenna) collect data on screen state and battery status from smartphones without the user’s active input30. However, most sensing apps only collect overall screen time, and only a few Android-specific sensing apps can track the exact screen on and off times by app31. To partially overcome these limitations, the Human Screenome Project proposes Screenomic analysis, capturing smartphone screenshots every 5 s while the user is actively using the device32. While this method offers detailed data, it can be highly intrusive and increases privacy risks33, especially for underage users. High-frequency data collection generates a vast amount of image data, requiring substantial computational effort and posing challenges in deriving meaningful measurements from the images. The Screenomic approach also faces challenges from privacy regulations, such as the General Data Protection Regulation (GDPR) in the European Union, which limit its applicability outside the U.S.

In summary, collecting objective and specific data on teen smartphone usage and SMU at scale, while adhering to international regulations and minimizing privacy invasion among underage populations, presents a significant research challenge. The current study introduces a novel method for collecting objective social media data from adolescents using user-donated screenshots to track specific app usage on smartphones. Reflecting the concept of screenomes, which record sequences of digital screen states, we developed “Appnomes”: metrics that measure app-specific screen time, number of activations, and number of notifications per participant per day, derived from user-donated screenshots. Appnomes encapsulate the body of app-specific usage information collected in this manner. For more detailed information on the methodology for collecting and analyzing user-donated screenshots, please refer to a previous study by our team34.
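To make the Appnome definition concrete, the following is a minimal sketch of how such records could be organized per participant, per day, and per app. The class name, field names, helper function, and all values are hypothetical illustrations and are not taken from the study's pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AppUsageRecord:
    """One app's metrics for one participant on one day (hypothetical schema)."""
    participant_id: str
    day: str                 # ISO date string
    app: str
    screen_time_min: float   # app-specific screen time in minutes
    activations: int         # number of unlocks that opened this app
    notifications: int       # number of notifications received from this app


def build_appnomes(records):
    """Group records into {participant_id: {day: {app: record}}}."""
    appnomes = defaultdict(lambda: defaultdict(dict))
    for rec in records:
        appnomes[rec.participant_id][rec.day][rec.app] = rec
    return appnomes


# Made-up example values, for illustration only.
records = [
    AppUsageRecord("p01", "2022-04-04", "Instagram", 36.0, 5, 50),
    AppUsageRecord("p01", "2022-04-04", "WhatsApp", 20.0, 8, 136),
    AppUsageRecord("p01", "2022-04-05", "Instagram", 161.0, 9, 48),
]
appnomes = build_appnomes(records)
print(appnomes["p01"]["2022-04-05"]["Instagram"].screen_time_min)
```

Keying by participant, then day, then app mirrors the "per participant per day" granularity described above and keeps daily aggregation straightforward.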

Between- and within-level analysis in SMU and well-being

The effects of SMU on adolescent well-being are person-specific and highly heterogeneous35, necessitating both between- and within-level analyses to capture the underlying complexity36. Intensive longitudinal designs such as EMAs enable multilevel modeling, where each survey assessment (level one) is nested within participants (level two)37. In multilevel modeling, the between-level analysis examines intergroup relationships, while within-level analysis focuses on intragroup levels, examining how SMU predictors at different times are associated with the well-being of the same individual38 (e.g., N = 1 approach36).

Multilevel modeling in SMU and well-being studies enables researchers to compare how between- and within-person associations differ. Beyens et al.15 found no significant between-person association between active and passive SMU and well-being, yet observed a positive association within persons. Moreover, the association varied among adolescents, with 44% feeling no change, 46% feeling better, and 10% feeling worse after passive SMU. Orben et al.39 analyzed data from seven waves of a UK national survey, noting small negative between-level associations between SMU and life satisfaction, but not at the within-person level. Other studies have reported differences in the between- and within-level analysis results9,10,16,40. A key challenge in this approach is its inability to support inferences from an individual to a targeted population. Therefore, scholars have advocated for principles to quantify, interpret, and explain variation in average within-person effects41. These findings indicate that focusing only on between-level analyses can obscure important individual dynamics. Therefore, both within- and between-level analyses, combined with longitudinal designs, should be conducted in studying SMU and well-being.
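One common way to separate the two levels is person-mean centering: each participant's average carries the between-person information, and the daily deviations from that average carry the within-person information. The sketch below illustrates this under the assumption of a simple list of (participant, value) pairs. The function name and data are hypothetical, and the source does not state whether predictors were centered this way.

```python
from collections import defaultdict
from statistics import mean


def decompose(observations):
    """Split repeated measures into between- and within-person parts.

    observations: list of (participant_id, value) pairs.
    Returns (person_means, within), where person_means maps each participant
    to their average and within lists (participant_id, deviation from mean).
    """
    by_person = defaultdict(list)
    for pid, value in observations:
        by_person[pid].append(value)
    person_means = {pid: mean(vals) for pid, vals in by_person.items()}
    within = [(pid, value - person_means[pid]) for pid, value in observations]
    return person_means, within


# Made-up daily screen-time minutes for two participants.
obs = [("a", 30.0), ("a", 60.0), ("a", 90.0), ("b", 10.0), ("b", 20.0)]
person_means, within = decompose(obs)
print(person_means)
print(within)
```

Between-person analyses then compare the values in `person_means` across participants, while within-person analyses relate the deviations in `within` to fluctuations in the same participant's well-being.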

Temporal dynamics of SMU and well-being

Recent research has emphasized the temporal dynamics of the association between SMU and well-being, exploring the “delayed effects” where SMU at one time point relates to well-being later. For example, a study42 with 89 young adults found that passive Facebook usage was related to lagged affective well-being two hours later. Another experience sampling study found that SMU was associated with greater perceived uncertainty at the same time point (T1), but the correlation did not persist (T2)43, suggesting a transient relationship. Marciano et al.10 found that frequent smartphone use significantly predicted improved well-being the day after, and this effect size shrank at three days and returned to near zero at seven days. These studies underscore the temporal dynamics, suggesting that the potential effects of SMU on well-being are not constant but evolve over time.

However, determining the appropriate time unit for longitudinal studies analyzing the SMU and well-being relationship (minutes, hours, days, or months) remains challenging. While exposure to social media can trigger neural reactions within seconds44, most EMA studies collect data from the previous hours or the current day, and cohort studies, like those by Orben et al.39, aggregate associations over even longer periods. Indeed, panel studies that track participants for months or years may not account for these temporal dynamics, potentially biasing study interpretations. This mismatch between the rapid neural responses and the feasible measurement intervals in observational studies leaves unresolved questions (e.g., how can effects accumulate over time?). Ephemeral associations (effects lasting only a few minutes) may not be captured, while long-term associations (effects over several years) remain unexamined. More empirical data are needed to identify these dynamics across different time scales. In the current study, we examine how SMU is associated with well-being on the same day and with one- and two-day lags.
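The same-day, one-day, and two-day lag structure can be illustrated with a short sketch that builds lagged predictors within each participant. It assumes days are indexed by consecutive integers, and the function name and data are hypothetical. Looking up lags by day index, rather than by shifting rows, avoids pairing non-adjacent days when a participant skipped an upload.

```python
def add_lags(daily, max_lag=2):
    """Build same-day and lagged values per participant.

    daily: {participant_id: {day_index: value}} with integer day indices.
    Returns rows of (participant_id, day, lag0, lag1, ..., lag_max), using
    None where the lagged day has no observation.
    """
    rows = []
    for pid, series in daily.items():
        for day in sorted(series):
            lagged = [series.get(day - k) for k in range(max_lag + 1)]
            rows.append((pid, day, *lagged))
    return rows


# Made-up screen-time minutes; day 4 is missing on purpose.
usage = {"p01": {1: 40.0, 2: 55.0, 3: 20.0, 5: 70.0}}
for row in add_lags(usage):
    print(row)
```

In this example, day 5 has no one-day lag because day 4 is missing, but it still has a two-day lag taken from day 3.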

App-specific social media components and psychological processes

In examining SMU and well-being, it is essential to recognize that different apps offer unique features, functions, and content, which result in different user activities and experiences. Studies have distinguished between active usage, involving user interactions, and passive usage, which focuses on content consumption on social media45. Recent research has identified core components of the social media ecosystem and the psychological processes related to them, such as profiling with self-presentation, streaming that triggers social comparison, and networking with receiving social support46.

Different social media apps emphasize these components to varying degrees. For instance, WhatsApp mainly provides channels for networking and messaging, TikTok focuses more on streaming through short videos, while Facebook and Instagram blend the components. This variation in emphasis leads to differing psychological processes and, subsequently, distinct relationships between platform use and well-being across different apps. For example, during COVID-19, a cross-sectional study found that passive Facebook use was linked to decreased well-being through social comparison, active Instagram use was associated with enhanced life satisfaction via social support, while TikTok had no significant impact47. Another EMA study has shown that both Facebook and Instagram use predicted increased social comparison, lower self-esteem, and negative affect48. Additionally, a two-wave panel survey found perceived information overload in YouTube among adults, in WhatsApp and Snapchat among older adults, but not in Facebook or Instagram49. Another experimental study found no significant differences in subjective well-being between active and passive uses of Instagram, Facebook, or Twitter50. Most of these studies targeted adults, yet adolescents use Facebook less and prefer TikTok and Instagram more than adults do1. Together, these studies emphasize the importance of moving beyond general metrics of social media use to focus on specific apps, especially among adolescents. Moreover, rather than merely focusing on specific social media platforms, studying app-specific use in relation to well-being helps identify unique platform features and underlying psychological processes, which further supports the study of psychological mechanisms between SMU and well-being.

Our research also explores the relationship between gaming apps like “Clash of Clans” and “Pokemon Go,” and well-being, less studied together with social media apps in existing literature. Previous studies have largely debunked the idea that video gaming harms mental health, showing minimal evidence of harm and, in some cases, noting benefits such as stress reduction51,52,53. Digital games meet various psychological needs—identification, social interaction, and escapism—linked to improved hedonic, eudaimonic, and social well-being53. Yet, research also shows a significant negative impact of problematic gaming on adolescents’ mental and psychosocial health54,55. Given that adolescents spend as much time gaming as they do using social media56, and both activities engage similar psychological processes, it is crucial to compare the impact of gaming apps and social media apps on well-being.

From ill-being to well-being

Another important perspective for studies on SMU is to move beyond focusing on ill-being, such as psychopathological symptoms, and shift towards well-being and flourishing57. This shift is important because the absence of ill-being does not necessarily equate to well-being58. Moreover, this transition reflects a broader philosophical change in social media studies: moving from identifying failures of social media platforms to exploring how platform designs or user behaviors can promote well-being59,60.

Responding to the call for a broader perspective, our study adopts the human flourishing framework to examine well-being as a cornerstone for societal thriving61. We focus on hedonic well-being (happiness or contentment derived from experiencing pleasure and avoiding pain) and eudaimonic well-being61, rooted in Aristotle’s concept of self-actualization and meaningful life purposes. Hedonic well-being includes aspects such as life satisfaction, positive affect, and negative affect62. Theoretical dimensions of eudaimonic well-being include autonomy, environmental mastery, personal growth, positive relations with others, purpose in life, and self-acceptance62. A study63 showed how specific social media activities are related to adolescent flourishing levels; however, only between-person associations were explored. Only recently, a meta-analysis60 summarized the association between social media use and “positive” well-being indicators, finding that hedonic well-being was positively related to communication activities on social media (r = 0.11) and positive online experiences (r = 0.21), and negatively related to problematic social media use (r = −0.13) and social comparison on social media (r = −0.30), while eudaimonic well-being was only associated with problematic social media use (r = −0.26). However, the included studies were mostly cross-sectional, and none used intensive longitudinal data with social media data collected through screenshots.

The present study

The present study provides a possible solution to address previous issues on collecting objective social media data to disentangle the between- and within-person variation, examine the temporal dynamics, and consider adolescent well-being outcomes in a continuum from ill- to well-being. Using Appnomes—app usage metrics on screentime, number of activations, and number of notifications derived from user-donated screenshots—and well-being measures from EMAs, we applied multilevel modeling to explore between- and within-person associations and answer the following research questions:

  • RQ1: How do Appnomes relate to well-being at the between- and within-person level?

  • RQ2: Is the relationship between App-specific usage metrics and well-being consistent over time?

The << blinded for review>> study applied an EMA design and collected data on the well-being of 374 adolescents living in << blinded for review>> (mean age = 15.71, SD = 0.82; 62.8% female) over two weeks in Spring 2022. Participants were invited to rate their well-being three times per day (with prompts at 12 pm, 6 pm, and 9 pm) with questions related to the last two hours. In total, 277 participants completed 2,273 assessments and uploaded screenshots, achieving a 58.6% compliance rate at the assessment level. Well-being measures included positive emotions (i.e., happiness, relaxation, and contentment) and negative emotions (i.e., anger, sadness, and loneliness), while eudaimonic well-being was related to engagement in activities (i.e., feeling “engaged”, “competent”, and “focused”, and feeling that the activities “were worthwhile and meaningful”, “made me feel connected with others”, and “made me feel satisfied”). Answer options were on a Visual Analogue Scale (VAS) from 0 “not at all” to 100 “A lot”. Participants also donated daily screenshots of app-specific usage metrics, including the total screentime, number of unlocks, and number of notifications, for the three most used apps from the previous days. The study received ethical approval from the IRB of USI, Università della Svizzera Italiana, Lugano, Switzerland, and was supported by the Department of Education, Culture, and Sport of Canton Ticino. Informed consent was obtained from all study participants following regulations of the involved institutes in Switzerland. All data collection and study analyses were performed in accordance with relevant guidelines and regulations of the involved institutes.

Results

Individual “Appnome”

Using text extracted from user-donated screenshots, we constructed an individual “Appnome” for each user, summarizing total and top three apps’ metrics for each dimension (screentime, unlocks, and notifications) across the study period. For the analyses in this paper, we selected the top 6 social media Apps with the highest average rank in screen time, number of activations, and notifications. Figure 1A-1–C-1 shows the Appnome of one female participant who primarily used WhatsApp, TikTok, and Instagram. She spent 36–161 min on Instagram and 21–99 min on TikTok daily, unlocked her smartphone 8–23 times for WhatsApp, and received 136–342 WhatsApp notifications and around 50 Instagram notifications during the study. Figure 1A-2–C-2 displays the Appnome of one male participant, who predominantly used WhatsApp, YouTube, and some Instagram. His daily YouTube screen time ranged from 130 to 330 min, with around 10 activations each for YouTube and Instagram, and 332 WhatsApp notifications on a Wednesday.

Fig. 1

An example of Appnome on the screentime (A), number of activations (B), and number of notifications (C), of one female (1) and one male (2) study participant. The figure shows the App-use features of one female adolescent and one male adolescent in the study. Study participants uploaded daily screenshots from their cellphone settings, detailing App-specific screen time, unlocks, and notifications throughout the study period. Image processing and OCR techniques were employed to extract text from these screenshots. As shown in (A-1), the female adolescent spent most of her screen time on TikTok and Instagram during almost the entire study period. As shown in (B-1), most of the screen unlocks come from activating WhatsApp. As shown in (C-1), the adolescent received more than 150 notifications from WhatsApp daily and fewer than 100 notifications from Instagram daily in the study period. The male adolescent has a different app-use profile compared to the female participant. As shown in (A-2), the male adolescent spent more than 1000 min using YouTube in the observed days. The adolescent also spent some time on Instagram and Game Apps. As shown in (B-2), most of the screen unlocks come from activating Instagram, WhatsApp, and YouTube. As shown in (C-2), the adolescent received notifications mainly from WhatsApp in the study period.

Between-person level app usage characteristics

Figure 2 and Supplementary Table 1 show that females primarily used TikTok, averaging 78.19 min (SD = 62.33 min) daily, followed by Instagram (mean = 37.89 min, SD = 38.13 min) and WhatsApp (mean = 36.99 min, SD = 33.92 min). Males used YouTube the most, averaging 44.27 min (SD = 53.94 min) daily, followed by Instagram (mean = 27.85 min, SD = 30.38 min), WhatsApp (mean = 25.11 min, SD = 29.37 min), and TikTok (mean = 24.93 min, SD = 47.46 min). Females spent significantly more time on TikTok than males (P < 0.001), while males spent significantly more time on YouTube than females (P = 0.001). WhatsApp accounted for the most screen unlocks, with daily averages of 33 (SD = 26) for females and 23 (SD = 23) for males. Both females (mean = 114 times, SD = 78 times) and males (mean = 107 times, SD = 113 times) received the most notifications from WhatsApp, followed by Instagram.

Fig. 2

The distribution of person-level average screentime (A), number of activations (B), and number of notifications (C) by gender in the << blinded for review>> study. The figure presents the distribution of the person-level average App-use features. Study participants uploaded daily screenshots from their cellphone settings, detailing app-specific screen time, unlocks, and notifications throughout the study period. OCR techniques were employed to extract text from these screenshots. Utilizing the extracted text, we calculated the average app-use features for each participant and depicted the distribution of these averages among the study participants in a violin plot.

Between-person level correlation between app usage and well-being

We examined the partial correlation (controlled for age and gender) between app-specific usage and well-being outcomes at the between-person level. Figure 3 reveals that longer TikTok screentime (r = −0.15), more Game Apps activations (r = −0.21), and more Snapchat notifications (r = −0.12) correlated with lower positive emotion. Conversely, longer screen time (r = −0.12) and more notifications (r = −0.12) on YouTube were linked to lower negative emotions. None of the correlations were statistically significant following Bonferroni correction (p < 0.008).
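As an illustration of the quantities involved, a partial correlation that controls for covariates can be computed recursively from ordinary Pearson correlations, and the Bonferroni threshold is the nominal level divided by the number of tests. The sketch below uses only the standard library. The function names and toy data are hypothetical, and this is not the authors' analysis code.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, covariates):
    """Partial correlation of x and y given a list of covariate sequences.

    Uses the standard recursion that removes one covariate at a time:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    """
    if not covariates:
        return pearson(x, y)
    *rest, z = covariates
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Toy data: x and y are related only through the shared covariate z.
z = [1, 2, 3, 4]
x = [2, 1, 2, 5]
y = [2, -1, 6, 3]
print(pearson(x, y))             # raw correlation is positive
print(partial_corr(x, y, [z]))   # close to zero once z is controlled for
print(bonferroni_threshold(0.05, 6))
```

In the toy data, the raw correlation between `x` and `y` comes entirely from `z`, so it vanishes after adjustment. With 6 tests, the threshold of 0.05/6 is about 0.0083, matching the value reported above.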

Fig. 3

Correlation between average screentime (A), number of activations (B), and number of notifications (C) of selected Apps and average hedonic and eudaimonic well-being at the between-person level, in the << blinded for review>> study. The figure presents the partial correlation of the between-person level well-being outcomes (Y) and App-use features (X), adjusting for age and gender. For instance, the average score of positive emotion (y1) and screentime on Snapchat (x1) are calculated for each person. The partial correlation coefficient between y1 and x1 is then estimated among the study participants. Bonferroni correction is conducted to correct for multiple testing (adjusted P value threshold = 0.05/# = 0.0083, where # = 6). None of the correlation coefficients are statistically significant after the correction.

Within-person level correlation between app usage and well-being

We conducted multilevel analyses, with time at level one and participant at level two, controlling for age, gender, school, and the previous day’s well-being outcome (the autoregressive term) to account for within-person variations, for each pair of app-specific usage and well-being outcomes. Table 1, Fig. 4, and Supplementary Fig. 1 present the standardized and unstandardized associations between app-specific features and well-being outcomes. As shown in Table 1, after Bonferroni correction, most associations are not statistically significant. However, several features of apps like TikTok and WhatsApp show significant associations with well-being. For statistical significance, we set the threshold at p < .008 (0.05/6), accounting for 6 competing hypotheses for each pair of well-being outcomes and app features.
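For orientation, a two-level model of the kind described here can be written as follows. This is a sketch in assumed notation, not the authors' exact specification: \(i\) indexes participants, \(t\) indexes days, \(x_{it}\) is an app-specific feature, \(y_{i,t-1}\) is the previous day's outcome, and \(\mathbf{c}_i\) collects age, gender, and school. The distributional assumptions are the conventional ones and are not stated in the source.

```latex
\[
y_{it} = \beta_0 + \beta_1 x_{it} + \beta_2 y_{i,t-1}
       + \boldsymbol{\gamma}^{\top}\mathbf{c}_i
       + u_{0i} + u_{1i}\,x_{it} + \varepsilon_{it},
\qquad
\varepsilon_{it} \sim \mathcal{N}(0, \sigma^2),
\quad
(u_{0i}, u_{1i})^{\top} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_u)
\]
```

The random-intercept model corresponds to dropping the \(u_{1i}\,x_{it}\) term, so that only the baseline level \(u_{0i}\) varies across participants. The random-slope model keeps it, allowing the association \(\beta_1 + u_{1i}\) to differ by person.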

Table 1 Mixed linear model results on the standardized associations between screentime, number of activations, and number of notifications and hedonic/eudaimonic well-being in the << blinded for review>> study.
Fig. 4

Forest plot of the mixed linear model results on the associations between TikTok, WhatsApp, and Instagram screentime, number of activations, and number of notifications and hedonic/eudaimonic well-being. The figure shows the unstandardized coefficients and confidence intervals between well-being outcomes (Y) and App-specific features (X) of the same day from mixed linear models, adjusting for the well-being outcome of the previous day, age, gender, and school in each model, with time (each day) as level one and adolescents as level two. Well-being outcomes are collected from self-reported questionnaires, and App-specific features are derived using a text extraction pipeline from the cellphone screenshots of the settings pages uploaded by users, both within the same EMA study. For example, happy (y1) and screentime of Instagram (x1) form one pair of well-being outcome and App-use feature predictor. We have 288 such pairs of outcomes (12 individual outcomes and 4 aggregated outcomes) and predictors (3 features per App for 6 Apps). The figure shows the results for TikTok, WhatsApp, and Instagram. The results for Game Apps, YouTube, and Snapchat are presented in S-Fig. 4. For each pair of well-being outcome and App-use feature predictor, we fit a random intercept model and a random slope model. We apply the likelihood ratio test to select the best model following the parsimony rule. Bonferroni correction is applied to the confidence interval to account for multiple testing. For each well-being outcome and App-use feature, we adjust the level of significance by 0.05/# of tests (# = 6) and the confidence interval based on the adjusted level of significance. The confidence interval is adjusted by \(\text{Estimate} \pm \text{qnorm}\left(1-\frac{0.05}{2\left(\#-1\right)}\right) \times \text{Standard error}\), where # = 6. The interactive Plotly version of this plot is available in the OSF project in path “/code_and_data_for_replication /within_level”.
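The interval adjustment in the caption can be reproduced with the standard normal quantile function. The sketch below implements the formula literally as printed, including the \(\#-1\) term in the denominator, using Python's standard library in place of R's `qnorm`. The function name and the example estimate and standard error are hypothetical.

```python
from statistics import NormalDist


def adjusted_ci(estimate, standard_error, n_tests=6, alpha=0.05):
    """Multiplicity-adjusted interval, following the caption's printed formula:
    estimate +/- qnorm(1 - alpha / (2 * (n_tests - 1))) * standard_error.
    """
    z = NormalDist().inv_cdf(1 - alpha / (2 * (n_tests - 1)))
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


# Hypothetical coefficient and standard error, for illustration only.
low, high = adjusted_ci(-0.04, 0.012)
print(low, high)
```

An association is flagged in the forest plot when its adjusted interval excludes zero. In this made-up example, both bounds are negative.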

Time dynamics of the associations between app usage and well-being

We also applied multilevel models to explain well-being using app-specific features with one-day and two-day lags. The unstandardized coefficients for the same-day, 1-day, and 2-day lags are presented for TikTok, WhatsApp, and Instagram (Fig. 5) and for Game apps, YouTube, and Snapchat (Supplementary Fig. 2). We found that most relationships between app-specific features and well-being outcomes remained non-significant over the following one or two days, with the exceptions of Snapchat and TikTok.

Fig. 5

Associations of TikTok, WhatsApp, and Instagram usage with hedonic and eudaimonic well-being outcomes on the same day, the following day (1-day lag), and two days later (2-day lag), results from mixed linear models. The figure shows the unstandardized coefficients and confidence intervals between well-being outcomes (Y) and App-specific features (X) of the same day, the following day (1-day lag), and two days later (2-day lag) from mixed linear models, adjusting for the well-being outcome of the previous day, age, gender, and school in each model, with time (each day) as level one and adolescents as level two. Well-being outcomes are collected from self-reported questionnaires, and App-specific features are derived using a text extraction pipeline from the cellphone screenshots of the settings pages uploaded by users, both within the same EMA study. The figure shows the results for TikTok, WhatsApp, and Instagram. The results for Game Apps, YouTube, and Snapchat are presented in S-Fig. 5. For each pair of well-being outcome and App-use feature predictor, we fit a random intercept model and a random slope model. We apply the likelihood ratio test to select the best model following the parsimony rule. Bonferroni correction is applied to the confidence interval to account for multiple testing. For each well-being outcome and App-use feature, we adjust the level of significance by 0.05/# of tests (# = 6) and the confidence interval based on the adjusted level of significance. The confidence interval is adjusted by \(\text{Estimate} \pm \text{qnorm}\left(1-\frac{0.05}{2\left(\#-1\right)}\right) \times \text{Standard error}\), where # = 6. The interactive Plotly version of this plot is available in the OSF project in path “/code_and_data_for_replication /within_level”.

Specific app usage and adolescent well-being

TikTok

At the between-person level, longer screen time on TikTok correlated with lower levels of feeling content (r = −0.18, p = 0.032), feeling focused (r = −0.18, p = 0.035), feeling engaged (r = −0.17, p = 0.039), feeling competent (r = −0.17, p = 0.042), feeling connected (r = −0.17, p = 0.044), and eudaimonic activity (r = −0.18, p = 0.029) across participants, but the associations were not statistically significant after Bonferroni correction. At the within-person level (Table 1), time spent on TikTok was significantly associated with lower eudaimonic well-being (β = −0.08, unstandardized B = −0.04, p = 0.001) on the same day. In other words, one additional minute of TikTok use corresponded to a 0.04-point decrease in eudaimonic well-being on a 0–100 scale. Additionally, increased TikTok screen time was linked to lower levels of contentment (β = −0.07, p = 0.003), engagement in daily activities (β = −0.06, p = 0.008), satisfaction (β = −0.07, p = 0.007), and feelings of connection with others (β = −0.10, p < .001) on the same day (Fig. 4), but to increased positive emotion the following day (β = 0.06, p = 0.004) (Fig. 5). Moreover, TikTok notifications were associated with increased negative emotions (β = 0.06, p = 0.003) and sadness (β = 0.07, p = 0.002) on the same day, while TikTok smartphone unlocks were not significantly associated with well-being.

WhatsApp

At the between-person level (Fig. 3), WhatsApp features showed no statistically significant associations with well-being. At the within-person level (Table 1), longer WhatsApp screentime correlated with increased negative emotion (β = 0.06, p = 0.008), and this relationship diminished to nearly zero on the subsequent day (Fig. 5). Additionally, WhatsApp smartphone unlocks were related to a greater sense of connection (β = 0.08, p = 0.008), while the number of WhatsApp notifications was not significantly associated with well-being on the same or following days.

Instagram

At the between-person level, increased Instagram screen activation correlated with higher levels of anger (r = 0.21, p = 0.010) and feeling connected (r = 0.21, p = 0.012), and more Instagram notifications were related to stronger feelings of connection (r = 0.19, p = 0.010) and higher eudaimonic well-being (r = 0.15, p = 0.044). However, these relationships were not statistically significant after Bonferroni correction. At the within-person level, Instagram notifications were positively related to feeling focused (β = 0.06, p = 0.006) on the following day.

YouTube

At the between-person level, more YouTube smartphone unlocks were associated with lower feelings of connection (r = −0.18, p = 0.034), and more YouTube notifications correlated with higher feelings of satisfaction (r = 0.16, p = 0.032), though neither association was statistically significant after Bonferroni correction. At the within-person level, on the following day, more YouTube smartphone unlocks were associated with an increased feeling of meaningfulness (β = 0.07, p = 0.006); other associations were not statistically significant.

Snapchat

At the between-person level, more Snapchat smartphone activations correlated with lower happiness (r = −0.15, p = 0.043) on the same day, but this was not statistically significant after Bonferroni correction. At the within-person level, no significant association was observed between Snapchat screentime and well-being outcomes on the same day after the Bonferroni correction. However, on the following day, Snapchat notifications were negatively related to feeling relaxed (β = −0.06, p = 0.003), feeling content (β = −0.07, p = 0.001), feeling competent in activity (β = −0.06, p = 0.002), feeling satisfied (β = −0.07, p = 0.001), and positive emotion (β = −0.06, p = 0.005).

Game apps

No significant relationship was found at either the between-person or the within-person level after the Bonferroni correction.

Discussion

The current study employs user-donated screenshots to build individual Appnomes, which comprise different app usage metrics, and examines their between- and within-person relationships with hedonic and eudaimonic adolescent well-being collected through EMAs. The feasibility, ethical, and privacy concerns of user-donated screenshots and Appnome analysis are discussed in another study by our group34. Here, we focus on the main results.

According to the Appnome, the most frequently used social media apps were, in order, WhatsApp, Instagram, TikTok, YouTube, Snapchat, and Game Apps (aggregated from various game apps). With few exceptions, the between-person relationships with well-being were not statistically significant. This aligns with the majority of reviews and meta-reviews, which found small and inconsistent effects45,60,64. Exceptions included YouTube, where notifications were associated with higher levels of feeling satisfied at the between-person level. Previous studies found a positive correlation between psychological well-being and viewing inspiring videos on YouTube65, as well as watching music and infotainment content on YouTube66. A review found both negative and positive associations between YouTube use and well-being in both cross-sectional and longitudinal studies67. As YouTube is a video browsing app, users primarily consume content passively. Therefore, future studies should aim to unpack the relationship between the content on YouTube and user outcomes. Additionally, at the between-person level, more notifications from Instagram were associated with higher levels of feeling connected and eudaimonic well-being, and more screen activations due to Instagram were also related to a higher level of feeling connected. A previous study found that people who have an Instagram account are less lonely, anxious, and depressed than those who do not have an account68. Our findings suggest that the networking component of Instagram may benefit our study participants by providing online social support and a sense of community69. Alternatively, users who have better social connectedness might be more inclined to use Instagram.

At the within-person level, results indicated both positive and negative small effects across platforms. As discussed earlier, different apps are characterized by distinct components, such as profiling, networking, or streaming46. This leads users to engage with them in varied ways and contexts to fulfill different needs, which correspond to different psychological processes and well-being outcomes. Below, we discuss the findings for each platform and explore the potential mechanisms associated with each.

At the within-person level, increased screen time on TikTok was associated with decreased levels of eudaimonic well-being, contentment, and feelings of connection with others on the same day, but increased positive emotion the following day. As an app that provides streams of short videos through the “For You” page, users spend more time passively consuming content on TikTok compared to other social media platforms70. Therefore, the content and user experience on TikTok play a crucial role in shaping well-being outcomes. According to previous studies, TikTok users report both “enjoyment” and “time distortion” as well as telepresence (immersion in the world of TikTok)69. The enjoyment dimension may reflect the benefits of engaging, humorous content and online friendships within TikTok71. The time distortion and telepresence dimensions suggest that longer TikTok screen time may serve as an escape from everyday worries, although using TikTok as a coping strategy is suboptimal and can lead to lower eudaimonic well-being on the same day69,72. Moreover, since this is a correlational study, it’s important to consider that stressed adolescents might use TikTok as a way to cope with their stress73.

In addition, increased screen time on WhatsApp was related to elevated levels of negative emotions, which contrasts with previous studies finding that longer time spent on WhatsApp is positively associated with higher psychological well-being74,75. As a platform primarily focused on networking and messaging components, WhatsApp offers social support to users but can also increase stress due to information overload or online vigilance, leading to different well-being outcomes76. In a previous study, users with more active usage of the read receipt feature reported higher levels of perceived stress77. Also, at the within-person level, we found WhatsApp smartphone activations were associated with an increased feeling of being connected, which is likely related to strengthened bonding social capital74.

Furthermore, more notifications from Snapchat were not associated with well-being on the same day but were linked to decreased positive emotion, feelings of competence, and satisfaction the following day. Snapchat is a social media platform for multimedia messaging, where people exchange pictures and videos (snaps) that are meant to disappear after being viewed. In previous studies, high Snapchat use was linked to higher delinquent behaviors, school avoidance, and more anxiety and stress78,79. One potential explanation for the negative association between Snapchat and well-being is that certain behaviors on the platform increase social comparison. For example, monitoring ex-partners on Snapchat and using the Snap Map feature have been associated with increased jealousy in romantic relationships79. Snapchat offers a mix of profiling, networking, and streaming components, compelling researchers to not only link specific user motivations or activities on Snapchat to well-being but also focus on the underlying psychological processes in future studies.

Additionally, we did not find a significant association between well-being and usage of Game apps, measured as an aggregation of various games identified from screenshots. However, it is important to note that “game apps” are an aggregated category, including puzzles, card games, adventure games, and action games, among others. Not all game apps in the app store have social networking aspects. As a result, different types of mobile games can trigger various psychological responses; some alleviate loneliness and boost achievement, while others may lead to addiction among adolescents with low self-esteem due to the game’s rewarding nature80. Future studies on game apps should ensure a thorough understanding of the features of the games and conduct more focused analyses accordingly.

Overall, our results mainly showed small or non-significant associations. This aligns with the position in the ongoing scientific debate that associations might be negligible45,64 and speaks against the moral panic about social media and adolescent mental health. Besides the rise of social media, other factors such as a broader definition of mental health and greater awareness of mental health issues could also explain the decline in adolescent mental health81. Previous research has also shown ways in which social media can benefit youth mental health82. In the current study, although shifting from aggregated SMU metrics towards within-person variations offers deeper insights into the underlying mechanisms of social media’s impact on adolescent well-being15,16,39, the small size of the effects showed that these features are still insufficient to fully capture the social media behaviors or activities that put well-being at risk or promote it. One practical implication is that intervening on these features might not have a large effect on well-being.

The fact that many associations were not statistically significant after applying a more restrictive threshold for significance highlights the importance of adjusting for multiple hypothesis testing when studying app-specific data. Correcting for multiple testing is essential to prevent p-hacking83 and to guard against spurious results due to chance when exploring a large amount of data.

Finally, we examined the temporality of within-person associations between app-specific usage features and well-being. In cases where within-person associations were almost significant on the same day, most relationships diminished to near zero the following day. Overall, these results raise an important point concerning the temporality of the effects of social media10,43 and underline that these relationships can change over time. We can speculate that these temporal effects are due to several reasons. On the one hand, the impact of social media might be cumulative rather than instant. Repeated exposure to certain types of content or interactions could gradually influence well-being, becoming more apparent a day later as these effects build up. On the other hand, the feedback from social media, such as comments and likes, might not be seen immediately but reviewed the next day, thus influencing well-being once the user has had time to absorb and react to this feedback. These temporal changes should be considered even when policymakers need to act.

Limitations

The current study has three main limitations. First, the study experienced a relatively high attrition rate. Only 74.1% (277 out of 374) of the participants who completed the baseline survey finished at least one of the 14 daily assessments and submitted screenshots along with it. The resulting attrition rate (25.9%) falls within the slightly lower range of those reported in existing EMAs84,85. Our attrition analysis showed that the drop-off group and the group with good adherence were generally comparable, except that female participants were more likely to remain in the sample (Supplementary Table 2). However, despite the attrition analysis results, the high level of attrition may mean that data are missing not at random, particularly with regard to SMU or other socio-demographic characteristics that we did not measure. Second, the observed app-specific usage time from user-donated screenshots is subject to a missing data problem. Only the top three apps’ usage features are observed in the screenshots; the fourth app and beyond are not. In the current analysis, we assume that the usage time of an unobserved app is zero. However, this assumption may be incorrect, as the app could have been used for some time. Third, the current study could be improved by considering the reciprocal relationship between SMU and well-being through advanced modeling approaches such as cross-lagged models and structural equation models. Given the scope of the current study, we did not conduct these additional analyses, but we plan to explore these approaches in future research.

Conclusion

The current study offers a novel approach to studying the relationships between Appnomes and hedonic and eudaimonic adolescent well-being. Our results highlight the importance of studying both between- and within-person variations, as well as the temporal dynamics in the association between SMU and well-being.

Methods

Data collection

The << blinded for review>> study included the collection of intensive longitudinal data by tracking SMU and well-being in 374 Swiss adolescents (Mean-age = 15.71; SD-age = 0.82, 62.8% female) over 14 days in Spring 2022. Note that the target sample size for our EMA was set at 250 participants prior to data collection, based on a power analysis to achieve 90% power for detecting small effect sizes. Students in grades 1 and 2 were recruited from 4 high schools in Switzerland and joined the study between the 7th and the 23rd of March, 2022. Participants completed a baseline survey and EMAs using the Ethica app (now Avicenna). The timeline of the << blinded for review>> study data collection is shown in Supplementary Fig. 3. The baseline survey collected information on socio-demographic characteristics, while EMAs were conducted three times daily from Day 1 to Day 14 at 12 pm, 6 pm, and 9 pm via the Ethica App. EMAs measured self-reported current well-being, including hedonic and eudaimonic well-being assessments. Participants also provided screenshots of their phones’ settings, showing App-specific usage patterns including daily usage time, screen activations, and notification counts of used apps. Example screenshots are shown in Supplementary Fig. 4. To collect objective data on social media App-specific usage patterns, we asked participants to go to their smartphone’s Settings → Usage Time, take screenshots of the platform usage information from the previous day, and upload them to Ethica. The platform usage information includes usage time, number of screen unlocks, and number of notifications. The study received ethical approval from the IRB of USI, Università della Svizzera Italiana, Lugano, Switzerland, and was supported by the Department of Education, Culture, and Sport of Canton Ticino.
Informed consent was secured from all participants and all data collection and analysis procedures were conducted in accordance with the regulations of the participating institutions in Switzerland.

Well-being measurements

On a scale of 0-100, participants were asked to report their level of positive emotion (measured by happiness, relaxation, and contentment), negative emotion (measured by anger, sadness, and loneliness), eudaimonic activity (measured by feeling engaged, competent, and focused), and eudaimonic well-being (measured by feeling meaningful, connected, and satisfied) in the previous two hours. The well-being measures were collected three times per day during the study period and averaged into daily measures in the downstream analyses. The distributions of specific and aggregated well-being measures are presented in Supplementary Fig. 5 and Supplementary Fig. 6.
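The daily aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the item names, the record layout, and the `daily_scores` helper are hypothetical, and averaging the three items of each composite is an assumption (the text says only that each composite is "measured by" its items).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical item names grouped into the four composites named in the text.
COMPOSITES = {
    "positive_emotion": ["happy", "relaxed", "content"],
    "negative_emotion": ["angry", "sad", "lonely"],
    "eudaimonic_activity": ["engaged", "competent", "focused"],
    "eudaimonic_wellbeing": ["meaningful", "connected", "satisfied"],
}


def daily_scores(ema_rows):
    """Average the prompts answered on each (participant, day) into daily item
    scores, then average each composite's items (an assumed aggregation)."""
    by_day = defaultdict(list)
    for row in ema_rows:
        by_day[(row["participant"], row["day"])].append(row["items"])

    scores = {}
    for key, prompts in by_day.items():
        item_names = set().union(*(p.keys() for p in prompts))
        # Daily item score: mean over the prompts in which the item was answered.
        daily = {
            name: mean(p[name] for p in prompts if name in p)
            for name in item_names
        }
        # Composite score: only computed when all of its items are available.
        for composite, members in COMPOSITES.items():
            if all(m in daily for m in members):
                daily[composite] = mean(daily[m] for m in members)
        scores[key] = daily
    return scores


# Toy data (made up for illustration): three prompts on one day.
rows = [
    {"participant": "p1", "day": 1, "items": {"happy": 60, "relaxed": 50, "content": 40}},
    {"participant": "p1", "day": 1, "items": {"happy": 70, "relaxed": 50, "content": 60}},
    {"participant": "p1", "day": 1, "items": {"happy": 80, "relaxed": 50, "content": 80}},
]
scores = daily_scores(rows)
```

In this toy case the daily item means are 70, 50, and 60, so the assumed positive-emotion composite for that day is 60.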

Text extraction from user-donated screenshots

We extracted information on App-specific usage patterns, including the total and app-specific screentime, number of notifications, and number of activations, from the collected screenshots. For example, in Supplementary Fig. 4 (A), we intended to extract the following information: a total screentime of 278 min (4 h 38 m) from the previous day, with 63 min from Chrome, 53 min from Office, and 47 min from YouTube. Similarly, in Supplementary Fig. 4 (B), we extracted a total of 157 screen activations on the previous day, with 44 activations due to WhatsApp, 19 activations due to Mobile FFS, and another 19 activations due to Photomath.

We developed a pipeline to extract details on total screen time, number of activations, and notifications for the top 3 apps from screenshots. The text extraction pipeline incorporates image preprocessing, Tesseract optical character recognition (OCR), text translation, and post-processing34. OCR is a method for converting text in images into machine-encoded text. We applied OCR to transfer the objective social media usage data from the screenshots into text. Then, we conducted a post hoc heuristic correction of the extracted text, correcting the extracted app names against a database of app names using edit distance.
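The edit-distance correction of app names can be sketched as follows. This is a minimal illustration rather than the pipeline's actual implementation: the short app list, the `max_distance` threshold, and the function names are assumptions introduced for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Illustrative stand-in for the database of app names (assumption).
KNOWN_APPS = ["WhatsApp", "Instagram", "TikTok", "YouTube", "Snapchat"]


def correct_app_name(raw, known=KNOWN_APPS, max_distance=2):
    """Map an OCR-extracted name to the closest known app name, leaving it
    unchanged when no known name is within max_distance edits (assumed cutoff)."""
    best = min(known, key=lambda name: edit_distance(raw.lower(), name.lower()))
    if edit_distance(raw.lower(), best.lower()) <= max_distance:
        return best
    return raw
```

With these assumed settings, an OCR misread such as "WhatsAqp" is mapped to "WhatsApp", while a name with no close match in the list, such as "Photomath", is returned unchanged. The cutoff trades off recovering misreads against wrongly merging distinct apps with similar names.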

To evaluate the accuracy of our text extraction, we randomly selected a total of 600 screenshots, 100 each for iOS/Android screen time, activations, and notifications, and annotated them manually to establish the ground truth. For example, the ground-truth labels for Supplementary Fig. 4C are: 347, “WhatsApp”, “267”, “TikTok”, “24”, “Instagram”, and “17”. In general, we generated 7 measures for each screenshot: the total screentime/count, followed by the app name and the screentime/count for each of the apps ranked 1st, 2nd, and 3rd by screentime/count. We evaluated the pipeline’s performance by the accuracy rate, defined as the percentage of screenshots where the extracted text precisely matched the ground truth, with the results shown in Supplementary Table 2.
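The accuracy rate can be sketched as an exact-match comparison. This is one reading of the definition, in which a screenshot counts as correct only when all 7 measures match the annotation; the function name and the tuple layout are assumptions.

```python
def exact_match_accuracy(extracted, ground_truth):
    """Percentage of screenshots whose extracted measures all equal the
    manually annotated ground truth (all-fields-match reading, an assumption)."""
    if len(extracted) != len(ground_truth):
        raise ValueError("extracted and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("at least one annotated screenshot is required")
    hits = sum(1 for e, g in zip(extracted, ground_truth) if tuple(e) == tuple(g))
    return 100.0 * hits / len(ground_truth)


# Ground-truth labels quoted in the text for one screenshot, plus a copy.
truth = [
    ("347", "WhatsApp", "267", "TikTok", "24", "Instagram", "17"),
    ("347", "WhatsApp", "267", "TikTok", "24", "Instagram", "17"),
]
# Hypothetical pipeline output: the second screenshot has one misread digit.
output = [
    ("347", "WhatsApp", "267", "TikTok", "24", "Instagram", "17"),
    ("347", "WhatsApp", "261", "TikTok", "24", "Instagram", "17"),
]
accuracy = exact_match_accuracy(output, truth)
```

Under this reading a single misread field makes the whole screenshot count as incorrect, so the measure is strict; a per-field accuracy would give a higher figure for the same output.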

Appnome construction

To establish App-specific usage patterns for each participant, we generated an Appnome that displays daily screen time, activation count, and notifications per app and app category across the study period. For the downstream analyses, we restricted the Appnome to the 6 apps with the highest average rank in screen time, number of activations, and number of notifications: TikTok, WhatsApp, Instagram, YouTube, Game apps, and Snapchat. Note that we aggregated all game apps into the general Games category in the app-specific analysis. Using the text extracted from screenshots, we created an Appnome for each participant, detailing daily screen time, notifications, and activations for the top three apps. This is visualized for each participant, as illustrated in Fig. 1.

Before downstream analyses, we applied a series of filters to ensure data quality, preprocessing the text extracted from the screenshots at both the screenshot and the participant level. At the screenshot level, we (1) dropped screenshots without total counts; (2) removed screenshots without the name of the first app; (3) excluded screentime screenshots with total or app-specific screentime exceeding 20 h (1200 min); (4) dropped screenshots in which the total screentime (or count) was smaller than the sum of the screentime (or counts) of the top three apps; and (5) removed screenshots in which the screentime (or count) of the first app was smaller than that of the second app as well as the third app. These scenarios could indicate a measurement error, such as the participant submitting a weekly instead of a daily view, or an error in the text extraction pipeline. At the participant level, we kept participants who had at least 2 days of observations with both the App-specific usage patterns and the well-being outcome measures. The filtered sample comprised 1,151 assessments from 151 people for the screentime analysis, 1,105 assessments from 150 people for the activation analysis, and 1,357 assessments from 191 people for the notification analysis.
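The screenshot-level filters can be sketched as a single predicate. This is a hedged illustration: the record layout (`total`, `kind`, `apps`) and the function name are hypothetical, and the ranking rule is read here as "the first app's value is below that of a lower-ranked app", which is one interpretation of the wording above.

```python
MAX_MINUTES = 1200  # 20 h, the screentime ceiling stated in the text


def keep_screenshot(shot):
    """Return True if a parsed screenshot passes the screenshot-level filters.

    shot is a dict with:
      "kind":  "screentime", "activations", or "notifications"
      "total": total minutes or count, or None if extraction failed
      "apps":  list of (name, value) for the top apps, ordered by rank
    """
    total = shot.get("total")
    apps = shot.get("apps", [])
    if total is None:                       # no total extracted
        return False
    if not apps or not apps[0][0]:          # name of the first app missing
        return False
    values = [value for _, value in apps[:3]]
    if shot.get("kind") == "screentime" and (
        total > MAX_MINUTES or any(v > MAX_MINUTES for v in values)
    ):                                      # implausibly long screentime
        return False
    if total < sum(values):                 # total smaller than top-three sum
        return False
    if any(values[0] < v for v in values[1:]):  # ranking violated (assumed reading)
        return False
    return True


# Values quoted in the text for Supplementary Fig. 4 (A).
example = {
    "kind": "screentime",
    "total": 278,
    "apps": [("Chrome", 63), ("Office", 53), ("YouTube", 47)],
}
```

The example record passes every check, since 63 + 53 + 47 = 163 is below the total of 278 and the apps are in descending order. A weekly view submitted by mistake would typically fail the 1200-minute ceiling.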

Between and within-level analyses on app usage and well-being

First, we conducted a between-level analysis by calculating partial correlations between the participants’ average well-being scores and their average app usage across the study period, adjusting for age and gender, as shown in Fig. 3. Bonferroni correction was applied to correct for multiple testing. For example, for positive emotion and app-specific screentime, six models were fitted to test competing hypotheses, so the adjusted significance level is 0.05/6 = 0.0083.
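The Bonferroni threshold used above can be illustrated with a short sketch. It uses the textbook form α/m with m = 6; the function names are assumptions, and the interval helper shows the standard normal-quantile construction, which may differ in its divisor from the interval formula given in the figure captions.

```python
from statistics import NormalDist


def bonferroni_alpha(alpha=0.05, n_tests=6):
    """Per-test significance level under Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value, alpha=0.05, n_tests=6):
    """Compare an unadjusted p-value with the Bonferroni-adjusted level."""
    return p_value < bonferroni_alpha(alpha, n_tests)


def bonferroni_ci(estimate, std_error, alpha=0.05, n_tests=6):
    """Two-sided normal-approximation interval at the adjusted level
    (textbook construction, stated here as an assumption)."""
    z = NormalDist().inv_cdf(1 - bonferroni_alpha(alpha, n_tests) / 2)
    return estimate - z * std_error, estimate + z * std_error


# p-values quoted in the Results: a between-person correlation (p = 0.032)
# and the same-day within-person TikTok association (p = 0.001).
between_ok = is_significant(0.032)
within_ok = is_significant(0.001)
```

With six tests the threshold is about 0.0083, so p = 0.032 does not survive the correction whereas p = 0.001 does, which matches the pattern reported in the Results.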

Then, we analyzed well-being outcomes (positive and negative emotions, eudaimonic activity, and eudaimonic well-being) and app usage features at level one (days), nested within individuals at level two, with the lme4 package in R (version 4.2.1)86. Using mixed linear models, we explored within-person variations in well-being outcomes (Y) and App-specific features (X) of the same day, adjusting for the well-being outcome of the previous day (Yt−1), age, gender, and school, with days as level one and individuals as level two. For example, positive emotion (y1) and screentime of Instagram (x1) form one pair of well-being outcome and App-use feature predictor. In total, we fit mixed linear models on 288 such pairs: 16 outcomes (12 specific outcomes and 4 aggregated outcomes) by 18 predictors (3 features per app for 6 apps). For each well-being outcome and App-use feature, we adjusted the P values by P_adjusted = P_unadjusted * 6. The coefficient estimates that are significant after the Bonferroni correction are marked with * and highlighted in bold.
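The count of outcome-predictor pairs and the P value adjustment can be checked with a short sketch. The variable names and short outcome labels are illustrative assumptions, and capping the adjusted P value at 1 is a common convention added here rather than something stated in the text.

```python
from itertools import product

# 12 specific outcomes and 4 aggregated outcomes, as listed in the Methods.
SPECIFIC = [
    "happiness", "relaxation", "contentment",
    "anger", "sadness", "loneliness",
    "engaged", "competent", "focused",
    "meaningful", "connected", "satisfied",
]
AGGREGATED = [
    "positive_emotion", "negative_emotion",
    "eudaimonic_activity", "eudaimonic_wellbeing",
]
APPS = ["TikTok", "WhatsApp", "Instagram", "YouTube", "Snapchat", "Game apps"]
FEATURES = ["screentime", "activations", "notifications"]

# Every (outcome, app, feature) combination gets its own mixed model.
pairs = [
    (outcome, app, feature)
    for outcome, (app, feature) in product(SPECIFIC + AGGREGATED, product(APPS, FEATURES))
]


def adjust_p(p_unadjusted, n_tests=6):
    """Bonferroni-adjusted P value, capped at 1 (the cap is an assumption)."""
    return min(1.0, p_unadjusted * n_tests)
```

Enumerating the combinations gives 16 × 18 = 288 pairs. An unadjusted P value of 0.001 becomes 0.006 after adjustment and remains below 0.05, whereas 0.032 becomes 0.192.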

For each outcome and feature pair, we fitted two mixed linear models, one with a random intercept, and one with a random intercept and a random slope. We then conducted likelihood ratio tests comparing the two models to select the model that provides a good fit and is more parsimonious. The App-specific usage features were person-mean centered in the mixed linear models, with the random intercept models specified as:

$$\mathrm{Wellbeing}_{ti}=\beta_{0i}+\beta_{1}\,\mathrm{AppFeature}_{ti}+\beta_{2}\,\mathrm{Wellbeing}_{(t-1)i}+\beta_{3}\,\mathrm{Age}_{i}+\beta_{4}\,\mathrm{Gender}_{i}+\beta_{5}\,\mathrm{HighSchool}_{i}+e_{ti}$$
$$\beta_{0i}=\gamma_{00}+\mu_{0i}$$

And the random slope random intercept model is specified as:

$$\mathrm{Wellbeing}_{ti}=\beta_{0i}+\beta_{1i}\,\mathrm{AppFeature}_{ti}+\beta_{2i}\,\mathrm{Wellbeing}_{(t-1)i}+\beta_{3}\,\mathrm{Age}_{i}+\beta_{4}\,\mathrm{Gender}_{i}+\beta_{5}\,\mathrm{HighSchool}_{i}+e_{ti}$$
$$\beta_{0i}=\gamma_{00}+\mu_{0i}$$
$$\beta_{1i}=\gamma_{10}+\mu_{1i}$$
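Two steps in the model specification above, person-mean centering and the likelihood ratio comparison, can be sketched without fitting the models themselves. This is an illustration under stated assumptions: the function names are hypothetical, the log-likelihoods would come from the fitted models, and the test is assumed to have 2 degrees of freedom (one slope variance and one intercept-slope covariance), for which the chi-square tail probability has the closed form exp(−x/2).

```python
from collections import defaultdict
from math import exp
from statistics import mean


def person_mean_center(rows):
    """Subtract each participant's own mean from their daily values.

    rows is a list of (participant_id, value); the result keeps the same order.
    """
    by_person = defaultdict(list)
    for pid, value in rows:
        by_person[pid].append(value)
    means = {pid: mean(values) for pid, values in by_person.items()}
    return [value - means[pid] for pid, value in rows]


def likelihood_ratio_test(loglik_intercept, loglik_slope):
    """Compare a random intercept model with a random slope model.

    Assumes 2 extra parameters in the larger model, so the p-value is the
    chi-square survival function with 2 df, which equals exp(-stat / 2).
    """
    stat = max(0.0, 2.0 * (loglik_slope - loglik_intercept))
    return stat, exp(-stat / 2.0)


# Toy values (made up): minutes of use for two participants.
centered = person_mean_center([("a", 10), ("a", 20), ("b", 5)])
# Hypothetical log-likelihoods of the two fitted models.
stat, p_value = likelihood_ratio_test(-100.0, -97.0)
```

Centering leaves only each participant's day-to-day deviation from their own average, which is what gives the coefficient its within-person interpretation. Under the parsimony rule, the random slope model would be retained only when the test rejects the simpler model.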

The mixed linear model results are presented in Table 1, which focuses on the relationship between the four aggregated well-being outcomes (positive emotion, negative emotion, eudaimonic activity, and eudaimonic well-being) and App-specific usage features. In addition, the associations between App-specific usage features and individual well-being measures, such as sadness and feeling satisfied, are presented in Fig. 4 (for TikTok, WhatsApp, and Instagram) and Supplementary Fig. 1 (for Games Apps, YouTube, and Snapchat).

Same-day, 1-day, and 2-day lagged within-level analyses on app usage and well-being

Next, we extended the same-day analysis to 1- and 2-day lagged analyses to test intra-individual relationships between well-being outcomes and App-specific usage features using multilevel modeling. Similar to the same-day analysis, we conducted Bonferroni correction for multiple testing. The 1-day lag random intercept model is specified as:

$$\mathrm{Wellbeing}_{ti}=\beta_{0i}+\beta_{1}\,\mathrm{AppFeature}_{(t-1)i}+\beta_{2}\,\mathrm{Wellbeing}_{(t-1)i}+\beta_{3}\,\mathrm{Age}_{i}+\beta_{4}\,\mathrm{Gender}_{i}+\beta_{5}\,\mathrm{HighSchool}_{i}+e_{ti}$$
$$\beta_{0i}=\gamma_{00}+\mu_{0i}$$

And the 1-day lag random slope random intercept model is specified as:

$$\mathrm{Wellbeing}_{ti}=\beta_{0i}+\beta_{1i}\,\mathrm{AppFeature}_{(t-1)i}+\beta_{2i}\,\mathrm{Wellbeing}_{(t-1)i}+\beta_{3}\,\mathrm{Age}_{i}+\beta_{4}\,\mathrm{Gender}_{i}+\beta_{5}\,\mathrm{HighSchool}_{i}+e_{ti}$$
$$\beta_{0i}=\gamma_{00}+\mu_{0i}$$
$$\beta_{1i}=\gamma_{10}+\mu_{1i}$$

The results comparing the same-day, 1-day, and 2-day lagged multilevel models for TikTok, WhatsApp, and Instagram are presented in Fig. 5, and the results for Games Apps, YouTube, and Snapchat are presented in Supplementary Fig. 2.
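The construction of lagged predictors for one participant can be sketched as follows. The function name and data layout are assumptions; following the 1-day lag equations, each row pairs the outcome on day t with the app feature on day t − lag and the outcome on day t − 1. Using the same layout for the 2-day lag is an assumption, since that model is not written out.

```python
def lagged_rows(feature_by_day, wellbeing_by_day, lag=1):
    """Build model rows (day, feature[t - lag], wellbeing[t - 1], wellbeing[t])
    for one participant, skipping days where any of the three values is missing."""
    rows = []
    for day in sorted(wellbeing_by_day):
        feature_day = day - lag
        if feature_day in feature_by_day and (day - 1) in wellbeing_by_day:
            rows.append((
                day,
                feature_by_day[feature_day],
                wellbeing_by_day[day - 1],
                wellbeing_by_day[day],
            ))
    return rows


# Toy data (made up): the day-3 screenshot is missing.
feature = {1: 30, 2: 45, 4: 10}
wellbeing = {1: 60, 2: 55, 3: 70, 4: 65}

same_day = lagged_rows(feature, wellbeing, lag=0)
one_day = lagged_rows(feature, wellbeing, lag=1)
two_day = lagged_rows(feature, wellbeing, lag=2)
```

Because rows are dropped whenever a required day is missing, the lagged analyses can rest on fewer observations than the same-day analysis, which is worth keeping in mind when comparing effect estimates across lags.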

Attrition analysis

We removed participants from the data analysis if they finished fewer than 2 assessments across the entire study period. We conducted hypothesis testing to examine whether participants who dropped out of the study differed significantly in demographic characteristics, App-specific usage features, and well-being outcomes. Analysis of Variance (ANOVA) and Chi-squared tests were applied to check whether group means and frequencies were significantly different. The results of the attrition analysis are presented in Supplementary Table 3. Given the attrition we observed in our study, we also conducted a power analysis for the multilevel modeling using the mlmpower package in R87. Based on the observed data, the intraclass correlation coefficient (ICC) ranged from 0.45 to 0.6, and the variance explained by the fixed effects of Level 1 and Level 2 predictors was around 0.003 and 0.0035, respectively. The estimated power results show that with n_between = 150 and n_within = 7, the power estimates for the app-specific usage variables ranged from 0.66 ± 0.07 to 0.75 ± 0.06, and with n_between = 150 and n_within = 10, the power estimates ranged from 0.80 ± 0.08 to 0.88 ± 0.07. We discussed the attrition issue in the “Limitations” section above.