Introduction

Diabetic kidney disease (DKD) is a common microvascular complication of diabetes and a major cause of end-stage renal disease1. With the global rise in diabetes prevalence, the incidence of DKD has also increased, occurring in approximately 30–40% of patients with diabetes2,3,4. In China, DKD has surpassed glomerulonephritis-related chronic kidney disease as the leading cause of chronic kidney disease5. Although integrated management strategies can delay progression, no curative therapy is currently available6,7,8. Therefore, it is crucial to raise public awareness of early DKD detection and self-management.

In the digital era, the widespread adoption of the internet and the rapid emergence of short video platforms have granted the public unprecedented access to health information, substantially enhancing health literacy9,10,11. One study found that approximately 80% of internet users turn to online sources for health information12. TikTok and Bilibili, two of the leading short video platforms in China, have become important channels for disseminating medical and health knowledge to the public12,13. Owing to their wide reach, speed, and interactivity, these platforms offer considerable opportunities for health education and public engagement14. However, because videos can be freely uploaded without filtering, these platforms often feature low-quality and unreliable content, with some even conveying misleading or deceptive information15,16. This issue is particularly concerning for chronic diseases such as DKD, which require long-term management. Exposure to inaccurate or incomplete information may impair patients’ understanding and reduce treatment adherence.

Although previous studies have assessed the quality of health information in short videos on topics such as liver cancer15, cervical cancer17, gastroesophageal reflux disease18, and schizophrenia19, systematic investigations focusing on DKD-related content remain scarce. Most research emphasizes content quality while neglecting user engagement, and few studies have quantitatively assessed or predicted behaviors such as likes. This gap limits both understanding of DKD-related content and insights into the factors shaping user engagement.

Against this background, the present study focuses on health videos related to DKD on TikTok and Bilibili. A cross-sectional design was employed to systematically assess the quality and reliability of video content using standardized assessment tools. Additionally, the XGBoost algorithm was applied to model the relationship between video characteristics and user engagement metrics. This study was designed as an exploratory evaluation of DKD-related health information on China’s leading short video platforms, focusing on video quality, reliability, and factors associated with user engagement.

Materials and methods

Search strategy and data collection

This exploratory cross-sectional study analyzed DKD-related short videos on TikTok and Bilibili. On April 4, 2025, we systematically searched the Chinese versions of TikTok and Bilibili using the keyword “糖尿病肾病” (DKD in Chinese) in the search bar. To minimize personalized recommendations, we cleared the browsing history, conducted the searches in anonymous mode, and manually set the sorting method to comprehensive ranking; however, some algorithm-dependent bias could not be entirely eliminated. Videos were excluded if they were advertisements, entirely in English, duplicates (only the first occurrence was retained), or unrelated to the theme (as defined in Supplementary Table S1); screening continued until the top 100 relevant videos per platform were identified. The sample size was based on previous cross-sectional studies of short videos, which have demonstrated good representativeness and statistical stability15,20.

Data collection was completed on April 5, 2025, encompassing basic information for all selected videos. The study conducted a quantitative analysis of video data across several dimensions, including user interaction metrics such as likes, comments, saves, and shares, as well as content features such as background music (BGM), dietary management, subtitle settings, and video length. Additionally, uploader-related information was recorded, including whether the uploader was a patient, whether the account was verified, the number of followers, and the total number of videos published.

Video classification

Each video was assigned a classification label based on its style and uploader attributes, which facilitated subsequent analyses. Video styles were categorized into five types: (1) solo narration, (2) PPT/classroom-style explanation, (3) animation/action, (4) clinical setting, and (5) TV show/documentary. Uploader identity was classified into four categories: (1) professional institution, (2) professional individual, (3) non-professional institution, and (4) non-professional individual. This classification framework was adapted from the standards established by Zheng15. To further refine uploader attributes, a secondary classification for “verified” accounts was implemented. Detailed classification criteria are provided in Supplementary Tables S2–S4.

Video quality and reliability assessments

Three established tools were used to assess video quality and reliability. Although originally developed for websites or written health materials, mDISCERN and GQS have been increasingly applied in studies evaluating health-related videos on platforms such as TikTok and Bilibili13,21,22,23. The GQS employs a 5-point scale to evaluate overall content quality, focusing on dimensions such as content flow, information completeness, and clinical applicability24. In contrast, mDISCERN emphasizes scientific accuracy and reliability, providing a content quality framework based on five dimensions: information accuracy, evidence reliability, balance of viewpoints, clarity of information sources, and expression of uncertainty25. The MQ-VET integrates patient education and clinical applicability, offering a more detailed and structured evaluation through four modules: content accuracy, educational effectiveness, comprehensibility, and applicability, with a total of 15 specific criteria assessed across these modules26. Detailed descriptions of these tools can be found in Supplementary Tables S5–S8. Prior to scoring, two assessors with medical backgrounds (JJ and LS) underwent standardized training and calibration tests. Discrepancies were adjudicated by a nephrology expert (ZH) with extensive clinical experience to ensure the reliability of the assessments.

XGBoost-based prediction of likes

To further examine user engagement with DKD-related short videos, the dataset of 200 videos was randomly divided into training and testing sets at a 7:3 ratio for XGBoost model development and performance evaluation. Likes, as the most immediate and simplest form of feedback, reflect viewers’ instant recognition and emotional resonance with the content27. A previous study has shown that videos with higher like counts are more likely to be recommended to users, as likes consistently demonstrate a positive correlation with diverse engagement metrics and serve as a key indicator of content popularity28. Therefore, likes were selected as the outcome variable, with video and creator features such as follower count, BGM, subtitle availability, certification status, video length, number of topics, uploader type, and days since upload as predictors. Given the skewed distribution and presence of zero values in the like count, a log(1 + x) transformation was applied to normalize the data and improve model fit. The model was developed using the “xgboost” package, with three-fold cross-validation to assess robustness and grid search for hyperparameter tuning to prevent overfitting. Feature importance was visualized, and SHapley Additive exPlanations (SHAP) values were used to interpret the impact of each variable on the predicted like count. Finally, the model results were integrated into a Shiny application framework for interactive web-based visualization.
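
To make this pipeline concrete, the R sketch below reproduces its main steps. It is a minimal illustration under stated assumptions: the data frame “videos”, its column names, and the hyperparameter grid are hypothetical stand-ins rather than the exact study code.

```r
# A minimal sketch of the modeling workflow (not the exact study code).
# Column names and the hyperparameter grid are assumptions for illustration.
library(xgboost)

set.seed(2025)
n <- nrow(videos)                                   # 200 videos
train_idx <- sample(n, size = round(0.7 * n))       # 7:3 train/test split

features <- c("followers", "bgm", "subtitles", "verified", "length_sec",
              "n_topics", "uploader_type", "days_online")
X <- data.matrix(videos[, features])                # factors become integer codes
y <- log1p(videos$likes)                            # log(1 + x) transform

dtrain <- xgb.DMatrix(X[train_idx, ], label = y[train_idx])
dtest  <- xgb.DMatrix(X[-train_idx, ], label = y[-train_idx])

# Grid search with three-fold cross-validation on the training set
grid <- expand.grid(eta = c(0.05, 0.1), max_depth = c(3, 5))
cv_rmse <- apply(grid, 1, function(g) {
  cv <- xgb.cv(params = list(objective = "reg:squarederror",
                             eta = g["eta"], max_depth = g["max_depth"]),
               data = dtrain, nrounds = 300, nfold = 3,
               early_stopping_rounds = 20, verbose = 0)
  min(cv$evaluation_log$test_rmse_mean)
})
best <- grid[which.min(cv_rmse), ]

fit <- xgb.train(params = list(objective = "reg:squarederror",
                               eta = best$eta, max_depth = best$max_depth),
                 data = dtrain, nrounds = 300)

cor(predict(fit, dtrain), y[train_idx])^2           # training R-squared
shap <- predict(fit, dtest, predcontrib = TRUE)     # per-feature SHAP values
xgb.plot.importance(xgb.importance(model = fit))    # feature importance plot
```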

Statistical analysis

All statistical analyses were performed using IBM SPSS Statistics (version 27.0) and R (version 4.4.3). Continuous variables were tested for normality using the Shapiro-Wilk test. For normally distributed data, results were presented as mean ± standard deviation, and comparisons were made using the independent samples t-test. For non-normally distributed data, results were presented as median and interquartile range (IQR), and comparisons were performed using the Mann-Whitney U test. Categorical variables were presented as frequencies and percentages, and group comparisons were conducted using the chi-square test or Fisher’s exact test. To ensure the reliability of the assessments, inter-rater consistency was quantified using the intraclass correlation coefficient (ICC)29. To address potential non-independence among videos from the same uploader, uploader-level ICCs were calculated to quantify clustering. Random-intercept multilevel linear models were additionally fitted, where appropriate, to examine whether the main results remained consistent after adjusting for this clustering. A p-value < 0.05 was considered statistically significant.
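
For transparency, the R sketch below illustrates how these tests and models could be implemented; the data frame “df” and its column names are hypothetical stand-ins for the study dataset.

```r
# Illustrative R equivalents of the tests above; the data frame "df" and its
# columns are hypothetical, and part of the original analysis was run in SPSS.
library(irr)   # icc() for inter-rater reliability
library(lme4)  # lmer() for random-intercept multilevel models

shapiro.test(df$mqvet)                                  # normality check
t.test(mqvet ~ platform, data = df, var.equal = TRUE)   # normally distributed data
wilcox.test(likes ~ platform, data = df)                # Mann-Whitney U test
chisq.test(table(df$platform, df$uploader_type))        # or fisher.test() for sparse cells

# Inter-rater reliability between the two assessors' GQS scores
icc(cbind(df$gqs_rater1, df$gqs_rater2),
    model = "twoway", type = "agreement", unit = "single")

# Random-intercept model adjusting for clustering of videos within uploaders
m <- lmer(gqs ~ professional + (1 | uploader_id), data = df)
summary(m)

# Uploader-level ICC from the variance components:
# ICC = var(random intercept) / (var(random intercept) + var(residual))
vc <- as.data.frame(VarCorr(m))
vc$vcov[1] / sum(vc$vcov)
```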

Results

Video characteristics

Initially, 141 videos were identified on TikTok and 143 on Bilibili. After applying standardized inclusion and exclusion criteria, 41 TikTok videos (24 irrelevant, 17 duplicates) and 43 Bilibili videos (28 irrelevant, 13 duplicates, 1 in English, 1 advertisement) were excluded. Ultimately, 100 eligible videos were retained per platform, yielding 200 videos for analysis (Fig. 1).

Fig. 1

Flowchart of DKD-related video selection on TikTok and Bilibili.

As shown in Table 1, user engagement data revealed that TikTok significantly outperformed Bilibili in terms of likes, saves, shares, and comments (all p < 0.001). Additionally, the two platforms differed in video length (p < 0.001) and days since upload (p < 0.001): TikTok videos were generally more recent and shorter. Regarding content presentation, Bilibili videos predominantly featured PPT/classroom-style explanation (47.0%) and solo narration (26.0%), while TikTok videos primarily featured solo narration (66.0%) (Supplementary Figure S1). Compared to Bilibili, TikTok videos made greater use of technical elements, with 49.0% incorporating BGM and 97.0% including subtitles. In terms of content, both platforms focused on “symptoms and diagnosis” (Bilibili: 74.0%; TikTok: 72.0%) and “treatment and prevention” (Bilibili: 69.0%; TikTok: 63.0%). The radar chart in Supplementary Figure S2 illustrates that Bilibili covered a broader range of topics, particularly in core medical areas such as “pathogenesis,” “epidemiology,” and “treatment and prevention.” Content related to dietary management was underrepresented on both platforms, with coverage below 30.0% (Table 1).

Table 1 Comparison of DKD-related video characteristics between TikTok and Bilibili.

Uploader characteristics

This study included 127 video uploaders, with 67 on Bilibili and 60 on TikTok. As shown in Table 2, TikTok uploaders had a median follower count of 111,500 and posted more videos than those on Bilibili. On TikTok, 95.0% of uploaders were professional individuals, a significantly higher proportion than Bilibili’s 49.3%. In contrast, non-professional individuals made up the second largest group on Bilibili, accounting for 29.8%. Among verified uploaders, the majority on both platforms were healthcare professionals (TikTok: 98.3%; Bilibili: 82.6%).

Table 2 Uploader characteristics for DKD-related videos on Bilibili and TikTok.

Video quality and reliability assessment

The ICC values were all > 0.8 (Supplementary Table S9), indicating good inter-rater agreement and supporting the reliability of the subsequent analyses. As shown in Fig. 2, TikTok outperformed Bilibili in both GQS and MQ-VET scores (GQS: p < 0.001; MQ-VET: p = 0.013). Specifically, the median GQS score for TikTok was 4.0 (IQR: 3.0–4.0), higher than Bilibili’s median of 3.0 (IQR: 2.0–4.0). The median MQ-VET score for TikTok was 45.5 (IQR: 41.0–49.0), also higher than Bilibili’s median of 42.0 (IQR: 40.0–46.0). In contrast, there was no significant difference between the two platforms in mDISCERN scores, with both platforms having a median score of 2.0 (IQR: 2.0–2.0).

Fig. 2

Distribution and comparison of quality and reliability scores of DKD-related short videos between TikTok and Bilibili. Note: (A) Distribution of GQS scores for TikTok and Bilibili; (B) Comparison of GQS scores between the two platforms; (C) Distribution of mDISCERN scores for TikTok and Bilibili; (D) Comparison of mDISCERN scores between the two platforms; (E) Distribution of MQ-VET scores for TikTok and Bilibili; (F) Comparison of MQ-VET scores between the two platforms. ns, not significant; *p < 0.05, ****p < 0.0001.

To further investigate the impact of uploader background on the scores, we conducted subgroup analyses. Uploaders were categorized into two groups based on their background: the professional group (comprising professional individuals and institutions) and the non-professional group (comprising non-professional individuals and institutions). As shown in Fig. 3 and Supplementary Table S10, videos from the professional group generally scored higher than those from the non-professional group across GQS, mDISCERN, and MQ-VET, suggesting that professional background may be associated with improved informational quality and reliability.

Fig. 3

Comparison of quality and reliability scores of DKD-related videos from professional and non-professional uploaders on Bilibili and TikTok. Note: (A) Comparison of GQS scores between professional and non-professional videos on Bilibili; (B) Comparison of GQS scores between professional and non-professional videos on TikTok; (C) Comparison of mDISCERN scores between professional and non-professional videos on Bilibili; (D) Comparison of mDISCERN scores between professional and non-professional videos on TikTok; (E) Comparison of MQ-VET scores between professional and non-professional videos on Bilibili; (F) Comparison of MQ-VET scores between professional and non-professional videos on TikTok. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001.

Sensitivity and clustering assessment

Evidence of uploader-level clustering was observed for video quality scores on both platforms (Supplementary Table S11), suggesting some degree of within-uploader correlation. After adjustment for random effects using multilevel linear models, the positive association between uploader professionalism and video quality remained statistically significant on both platforms. Videos from professional creators generally achieved higher GQS, mDISCERN, and MQ-VET scores than those from non-professionals (Supplementary Tables S12–S13). Although clustering may have influenced the precision of estimates to a limited extent, the overall pattern of associations appeared broadly consistent after adjustment.

When comparing engagement metrics between platforms, ICCs for likes, saves, shares, and comments ranged from 0.062 to 0.132 (Supplementary Table S14), indicating weak clustering at the uploader level. After accounting for this dependence, TikTok videos generally received more likes (p = 0.006), saves (p = 0.004), and shares (p = 0.029) than Bilibili videos, while the difference in comments did not reach statistical significance (p = 0.057). These findings suggest that accounting for uploader-level clustering had limited impact on the observed platform differences in engagement (Supplementary Table S15).

Like-count prediction model

The like-count prediction model achieved an R² of 0.833 on the training set, indicating a good fit (Fig. 4A). As shown in Fig. 4B, follower count, video length, and days since upload were the primary factors influencing the number of likes, with follower count showing the strongest association. In contrast, certification status and BGM contributed less to the prediction, indicating their relatively limited influence on user engagement. The beeswarm plot further revealed a negative correlation between video length and like count (Supplementary Figure S3). To enhance the model’s interpretability and practical application, we developed an interactive visualization platform based on the Shiny framework (https://mdyy.shinyapps.io/dkdvedio0/), which allows real-time prediction of like counts from the characteristics of DKD-related short videos and dynamically displays the influence of each variable on the predicted outcome (Fig. 4C).
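
As an illustration of how such a predictor can be wired together, the minimal Shiny sketch below maps three hypothetical inputs to a previously trained model object “fit”; the deployed application additionally handles the full feature set and dynamic variable-influence displays.

```r
# A minimal sketch of such an interface (not the deployed app). It assumes a
# trained XGBoost model object "fit" built on three hypothetical predictors;
# the deployed application uses the full feature set described in Methods.
library(shiny)
library(xgboost)

ui <- fluidPage(
  titlePanel("DKD video like-count predictor (sketch)"),
  numericInput("followers",   "Follower count",         value = 10000, min = 0),
  numericInput("length_sec",  "Video length (seconds)", value = 120,   min = 1),
  numericInput("days_online", "Days since upload",      value = 30,    min = 0),
  textOutput("pred")
)

server <- function(input, output) {
  output$pred <- renderText({
    x <- matrix(c(input$followers, input$length_sec, input$days_online),
                nrow = 1,
                dimnames = list(NULL, c("followers", "length_sec", "days_online")))
    # Back-transform the log(1 + x)-scale prediction to the original like scale
    sprintf("Predicted likes: %.0f", expm1(predict(fit, xgb.DMatrix(x))))
  })
}

shinyApp(ui, server)
```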

Fig. 4

XGBoost-based prediction of like counts and web application deployment. Note: (A) Scatter plot showing the performance of the XGBoost model in predicting like counts; (B) Feature importance ranking of predictors in the model; (C) Web-based application interface for predicting like counts of DKD-related short videos based on input features.

Discussion

Principal findings

This study found that TikTok outperformed Bilibili in both GQS and MQ-VET scores, while the two platforms showed similar performance in mDISCERN scores. Although the overall video quality on TikTok was higher than that on Bilibili, both platforms exhibited varying degrees of deficiencies in content depth, evidence-based support, and the structural integrity of the information. Subgroup analysis revealed that uploaders with a medical background achieved significantly higher scores than non-professional uploaders across all three instruments. This trend persisted after adjusting for uploader-level clustering, suggesting that professionalism may be associated with more accurate and comprehensible content, though this relationship should be interpreted with caution.

Video quality and reliability

This study found that very few DKD-related short videos cited authoritative references, as reflected in their mDISCERN scores, indicating that these videos still lack sufficient evidence-based support. Additionally, the “content accuracy” module of the MQ-VET tool scored significantly lower than the other modules, further reflecting the videos’ limitations in information reliability. Most videos did not specify recording dates, update times, or references, which weakened the verifiability and timeliness of the content. These issues are not unique to DKD-related videos. Prior research has reported similar concerns, showing that many health-related videos on short video platforms fail to provide credible references or balanced perspectives, thereby increasing the risk of misinformation12,30. For example, Mueller et al. reported that 48.0% of atopic dermatitis–related short videos contained potentially harmful information, and more than two-thirds of psoriasis-related videos disseminated misleading claims31,32.

Several factors may contribute to this problem. First, citing authoritative sources requires substantial time and effort for searching, screening, and verification, which increases the workload for content creators. Second, even medical professionals often prefer simplified, accessible language in public health communication to facilitate comprehension and engagement33. Third, the short and fast-paced format of videos, particularly solo narration, makes it difficult to systematically reference clinical guidelines or research evidence.

Association between uploader background and video quality

Videos uploaded by professionals tended to show higher quality and reliability than those from non-professionals, in line with previous findings22,34. Healthcare professionals often possess stronger medical knowledge and are trained to follow evidence-based guidelines, which may enhance the accuracy and credibility of their content. Moreover, professional creators are more likely to disclose their authorship transparently, a factor that may contribute to higher mDISCERN scores.

The difference between professional and non-professional uploaders was particularly pronounced on TikTok (Fig. 3). TikTok enforces stricter verification standards for professional uploaders, such as requiring physicians to be licensed at tertiary hospitals and hold at least the title of attending physician22. In contrast, Bilibili applies more lenient entry criteria, allowing non-professionals easier access to publish health-related content, which may contribute to greater variability in quality22.

In our study, we also noted that some uploaders identified themselves as healthcare professionals but lacked verified credentials. This raises the possibility that certain individuals may present themselves as medical experts to increase credibility and attract viewers20,35. While such cases appear relatively uncommon, they highlight the need for caution when interpreting online health information, particularly in the absence of clear verification.

Factors influencing engagement

In the XGBoost model, follower count emerged as the strongest predictor of likes, accounting for 60.6% of the model’s feature importance. This finding is consistent with previous studies indicating that creators with larger audiences generally achieve greater visibility and engagement on video-sharing platforms27,28. Once users follow a creator, they are more likely to receive updates and encounter that creator’s videos in their feed, thereby having more opportunities to engage than non-followers28. A larger follower base may also provide broader initial exposure, which in turn increases the likelihood of accumulating user interactions. Although the correlation between follower base and engagement appears robust, the present analysis cannot establish a definitive causal relationship.

This study revealed a negative association between video length and likes, consistent with prior work linking longer videos to lower viewing ratios and fewer views, with each additional minute estimated to reduce viewership by nearly one-sixth28. Because users’ attention spans are limited and expectations are high, longer videos are less likely to sustain viewing or generate engagement28. Other factors, including certification status and background music, contributed only marginally to predicting likes. Although uploader verification has been recognized as an important factor for enhancing the credibility of health-related videos20,36, its impact on immediate engagement in our dataset was limited. Similarly, while background music may improve the overall viewing atmosphere, it does not appear to be a major factor influencing user engagement27,37.

Limitations

This study has several limitations. First, restricting the sample to the top 100 videos on TikTok and Bilibili may have introduced algorithm-driven selection bias and excluded other relevant content. Second, the quality assessment tools (GQS and mDISCERN), although widely used in prior research, were not specifically designed for short-form videos and may not fully capture their characteristics. Third, multiple videos from the same uploader could lead to clustering effects and reduce the independence of observations. Fourth, subgroup analyses were limited by small sample sizes and should be regarded as exploratory and hypothesis-generating. Finally, as a cross-sectional study based on data collected on a single day, the findings represent only a snapshot of platform content. Given the algorithm-driven dynamics and cultural specificity of TikTok and Bilibili, the representativeness and generalizability of the results are limited. Future studies should incorporate multi-platform and multi-time-point sampling, as well as patient perspectives and guideline concordance, to provide a more comprehensive evaluation.

Conclusion

This exploratory study offers an overview of the current landscape of DKD–related short videos on TikTok and Bilibili. Overall video quality was suboptimal, particularly regarding information accuracy and source attribution. Videos from professional uploaders were generally of higher quality, and follower count appeared to be associated with greater user engagement. Considering the algorithm-driven sampling, potential clustering among uploaders, and the limited adaptability of evaluation tools to short-form content, these findings should be interpreted as descriptive observations rather than causal inferences, offering a reference point for future systematic and longitudinal research.