Introduction

Orthognathic surgery is a common and effective treatment for correcting dentofacial deformities caused by congenital anomalies, trauma, or developmental disorders. It not only improves occlusal function but also significantly enhances facial aesthetics and quality of life for affected individuals1,2. As awareness of facial appearance and oral health continues to grow, more patients are actively seeking information about orthognathic surgery prior to treatment.

In parallel, the Internet has become a major source of health information. Among various online media, short video platforms such as YouTube, TikTok, and BiliBili have rapidly emerged as influential channels for health education3. These platforms allow easy access to visually engaging content and are widely used by both health professionals and the public. However, recent studies have raised concerns about the quality and reliability of health-related content on these platforms. Research evaluating the accuracy of medical videos on YouTube has revealed that a substantial portion of content is either incomplete or misleading, potentially influencing patients’ understanding and decision-making3,4.

Despite this, little attention has been given to short video platforms, especially with regard to orthognathic surgery. BiliBili and TikTok are among the most popular platforms in China, with millions of daily active users, including many seeking health-related information5. Yet there is currently no systematic analysis of the credibility, educational value, and influencing factors of videos related to orthognathic surgery on these platforms.

Therefore, the aim of this study is to evaluate the quality and reliability of orthognathic surgery-related content on BiliBili and TikTok, and to identify how factors such as video source, content type, and user engagement influence video quality. By doing so, this study aims to provide insights into the current status of health information dissemination in the digital age and highlight the need for improved regulation and professional participation in online health communication.

Methods

Search strategy and data collection

In this cross-sectional study, BiliBili and TikTok were searched for “正颌手术” (“orthognathic surgery”) on February 24, 2025, using a newly registered account to avoid algorithm bias, and the top 100 videos on each platform were collected (Fig. 1). Non-Chinese, duplicate, and irrelevant videos were excluded. Basic video data (source, content, duration, upload date, and user interactions, namely likes, collects, comments, and shares) were extracted on the same day to minimise time bias. Professional accounts were verified via platform details and hospital websites. All data were recorded in Excel (Microsoft Inc., Redmond, United States).

Fig. 1 Video search strategy for orthognathic surgery.

Classification of videos

Video sources were divided into two categories: medical professionals and nonmedical persons. For the purposes of this study, orthognathic surgeons and doctors from other fields were classified as medical professionals, while science communicators and patients were classified as nonmedical persons. Video content was categorised as disease knowledge, surgical treatment, risks and complications, personal experiences, or news/advertisements.

Video assessment

Video quality was assessed with the Global Quality Scale (GQS), and the modified DISCERN (mDISCERN) tool was used to determine the reliability of the information presented6. The GQS ranges from 1 to 5; the higher the score, the better the quality. The mDISCERN score ranges from 0 to 5 and is based on five yes/no questions that reflect the reliability of the information7. The scoring criteria are shown in Tables 1 and 2. All videos were independently scored by two orthognathic surgeons, Liang Xia and Jianfei Zhang; disagreements were settled through discussion with a senior specialist, Wenwen Yu.
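
The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' workflow: the `Rating` class and `needs_adjudication` function are hypothetical names, and the sketch assumes that a "disagreement" means the two raters' total scores differ (the text does not say whether disagreement was judged per item or per total).

```python
from dataclasses import dataclass


@dataclass
class Rating:
    gqs: int                 # Global Quality Scale, 1 (lowest) to 5 (highest)
    mdiscern_answers: tuple  # five yes/no answers, stored as booleans

    def __post_init__(self):
        if not 1 <= self.gqs <= 5:
            raise ValueError("GQS must be between 1 and 5")
        if len(self.mdiscern_answers) != 5:
            raise ValueError("mDISCERN needs exactly five yes/no answers")

    @property
    def mdiscern(self) -> int:
        # One point per "yes", so the total ranges from 0 to 5.
        return sum(bool(answer) for answer in self.mdiscern_answers)


def needs_adjudication(rater_a: Rating, rater_b: Rating) -> bool:
    """True when the two raters' total scores differ (assumed definition)."""
    return rater_a.gqs != rater_b.gqs or rater_a.mdiscern != rater_b.mdiscern


# Illustrative values only, not data from the study.
first = Rating(gqs=3, mdiscern_answers=(True, True, False, False, False))
second = Rating(gqs=2, mdiscern_answers=(True, True, False, False, False))
print(first.mdiscern, needs_adjudication(first, second))
```

Under this assumed definition, a video would go to the senior specialist whenever either total differs between the two raters.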

Table 1 Description of the global quality score (GQS) for assessing the quality of videos (Scoring ranges from 1 to 5).
Table 2 Description of the modified DISCERN (mDISCERN) tool for evaluating the reliability of videos (Scoring ranges from 0 to 5).

Statistical analysis

The Shapiro‒Wilk test was used to test the data for normality. Depending on the normality of the distribution and the homogeneity of variance, Student’s t test, Welch’s t test, or the Mann‒Whitney U test was used for group comparisons. Interrater agreement was measured by Cohen’s kappa. Spearman correlation analysis was used to assess the associations between video quality and user interactions, such as likes, collects, comments, and shares. A P value < 0.05 was considered to indicate statistical significance. Analyses were performed via GraphPad Prism 9.5.1 (GraphPad Software, San Diego, United States).
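
The choice among the three group-comparison tests can be read as a simple decision rule. The sketch below is one plausible reading, not the procedure run in GraphPad Prism. The function name `choose_two_group_test` is hypothetical. The sketch also assumes that both groups must pass the normality check and that variance equality is judged by a separate test, which the text does not name.

```python
def choose_two_group_test(p_normal_a: float, p_normal_b: float,
                          p_equal_var: float, alpha: float = 0.05) -> str:
    """Pick a two-group comparison test from preliminary p-values.

    p_normal_a, p_normal_b: Shapiro-Wilk p-values for each group.
    p_equal_var: p-value from a variance-equality test (assumed input;
                 the source does not say which test was used).
    """
    both_normal = p_normal_a >= alpha and p_normal_b >= alpha
    if not both_normal:
        return "Mann-Whitney U test"   # non-parametric fallback
    if p_equal_var >= alpha:
        return "Student's t test"      # normal data, equal variances
    return "Welch's t test"            # normal data, unequal variances


# Illustrative p-values only, not results from the study.
print(choose_two_group_test(0.40, 0.22, 0.01))
```

With the illustrative inputs, both groups pass the normality check but the variances differ, so the rule selects Welch's t test.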

Results

Basic characteristics of the videos

A total of 200 videos were analysed, with 100 each from BiliBili and TikTok. As shown in Table 3, TikTok videos received more likes, shares, collects, and comments than BiliBili videos did, although the differences were not statistically significant (P > 0.05). However, the BiliBili videos were significantly longer in duration (330.74 ± 286.74 vs. 87.64 ± 94.31 s, P < 0.001) and had been online for a longer period (986.79 ± 730.15 vs. 315.24 ± 220.53 days, P < 0.001).

Table 3 Baseline characteristics of the videos on BiliBili and TikTok.

Figure 2 and Tables 4 and 5 summarise the sources and types of videos. On both platforms, more than half of the videos shared patients’ personal experiences. On BiliBili, patients uploaded 54% of the videos, followed by medical professionals (18%) and science communicators (10%). On TikTok, patients uploaded 51% of the videos, and medical professionals uploaded 48%. In terms of content, the main types on both platforms were videos sharing patients’ personal experiences and videos presenting medical knowledge (Fig. 2). On BiliBili, 55% of the videos shared experiences, 22% presented medical knowledge, 18% covered surgical treatment, and the rest concerned complications or advertisements. On TikTok, 57% of the videos shared experiences, 26% presented medical knowledge, 14% covered surgical content, and 3% discussed risks or complications.

Fig. 2 Percentage of orthognathic surgery videos by source and content on BiliBili and TikTok. (A) Distribution of video sources on both BiliBili and TikTok. (B) Distribution of video sources on BiliBili. (C) Distribution of video sources on TikTok. (D) Distribution of content types on both BiliBili and TikTok. (E) Distribution of content types on BiliBili. (F) Distribution of content types on TikTok.

Table 4 Characteristics of videos by source and content on BiliBili.
Table 5 Characteristics of videos by source and content on TikTok.

Assessment of video quality and reliability

Interobserver consistency for the GQS and mDISCERN scores was confirmed, with κ values of 0.631 and 0.592, respectively. The BiliBili videos had higher average scores (GQS: 2.50 ± 0.93; mDISCERN: 1.96 ± 0.91) than did the TikTok videos (GQS: 1.89 ± 0.81; mDISCERN: 1.57 ± 0.62), with both differences being statistically significant (P < 0.001). However, overall video quality and reliability on both platforms were low (Table 3; Fig. 3).
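
For readers unfamiliar with the statistic, the following sketch shows how an unweighted Cohen's kappa can be computed from two raters' scores. It is illustrative only: the study's values came from GraphPad Prism, the ratings below are made up, and the source does not say whether an unweighted or weighted kappa was used, so the unweighted form is an assumption.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical GQS scores from two raters for eight videos.
scores_a = [1, 2, 3, 3, 2, 1, 4, 5]
scores_b = [1, 2, 3, 2, 2, 1, 4, 4]
print(round(cohen_kappa(scores_a, scores_b), 3))  # 0.68
```

In this toy example the raters agree on six of eight videos (observed agreement 0.75) and chance agreement is 14/64, which gives a kappa of 0.68.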

Fig. 3 GQS, mDISCERN scores, and quality/reliability distributions of short videos associated with orthognathic surgery across BiliBili and TikTok. (A) Comparison of the GQS between the BiliBili and TikTok videos. (B) Proportion of videos at different quality levels. (C) Comparison of mDISCERN scores between the BiliBili and TikTok videos. (D) Proportion of videos at different reliability levels. ***P < 0.001.

Further analysis revealed that videos from medical professionals had significantly higher GQS scores than those from patients or nonprofessional communicators (P < 0.05), and videos focused on disease knowledge or surgical treatment scored higher than personal experience-sharing videos (both P < 0.0001) (Fig. 4A). The mDISCERN results paralleled the GQS findings (Fig. 4B), with videos by medical professionals showing significantly higher reliability than those by patients (P < 0.0001). Videos related to disease knowledge and surgical treatment also scored higher than personal experience-sharing videos (P < 0.0001).

Fig. 4 GQS (A) and mDISCERN (B) scores of orthognathic surgery-related videos categorised by source and content. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.

To investigate whether the type of professional affected video quality and reliability, videos from medical professionals were divided into those from orthognathic surgeons and those from other fields (almost all of whom were orthodontists). Interestingly, the GQS and mDISCERN scores of videos by orthognathic surgeons were lower than those of videos by professionals from other fields (P < 0.01 and P < 0.001, respectively) (Fig. 4).

Spearman correlation analysis

Spearman correlation was used (Table 6) because the data were not normally distributed. Notably, video quality was significantly positively correlated with video reliability (r = 0.762, P < 0.001), collects (r = 0.273, P < 0.001), shares (r = 0.166, P = 0.019), duration (r = 0.235, P < 0.001), and days since release (r = 0.285, P < 0.001). Video reliability showed similar correlations with collects (r = 0.191, P = 0.007) and shares (r = 0.180, P = 0.011). Likes were highly correlated with collects (r = 0.907, P < 0.001), shares (r = 0.891, P < 0.001), and saves (r = 0.900, P < 0.001). Collects were significantly positively correlated with shares (r = 0.897, P < 0.001), comments (r = 0.878, P < 0.001), and duration (r = 0.237, P < 0.001), whereas shares were linked to comments (r = 0.183, P = 0.001). Additionally, days since release correlated positively with quality, reliability, collects, shares, comments, and duration.
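
As a reminder of what these coefficients measure, the sketch below computes a Spearman correlation as the Pearson correlation of rank-transformed data, with tied values given their average rank. It is a plain-Python illustration with invented numbers, not a reproduction of the Prism analysis. The function names are hypothetical, and no P value is computed.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least two items")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical GQS scores and collect counts for six videos.
gqs = [1, 2, 2, 3, 4, 5]
collects = [3, 10, 8, 40, 35, 120]
print(round(spearman_rho(gqs, collects), 3))
```

Because the calculation uses ranks only, heavily skewed counts such as likes or collects do not dominate the coefficient, which suits interaction data that are not normally distributed.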

Table 6 Spearman correlation analysis among video variables.

Discussion

In this cross-sectional study, the GQS and mDISCERN tools were used to evaluate the quality and reliability of the top 100 orthognathic surgery-related videos on each of BiliBili and TikTok. TikTok videos drew more user interactions, although the differences were not statistically significant, while BiliBili videos were generally longer in duration. Videos on TikTok scored significantly lower than those on BiliBili in both quality and reliability, likely due to shorter durations and limited content coverage. However, overall scores on both platforms were suboptimal, as more than half of the videos were patient-generated rather than professionally curated. The ease of uploading videos and the lack of strict scientific and authenticity checks may also contribute to this issue. Only one of the 200 videos analyzed achieved a full mDISCERN score.

Professionally uploaded videos, especially those explaining medical knowledge or demonstrating surgical techniques, exhibited higher quality and reliability. Orthognathic surgeons’ videos were generally inferior to those made by orthodontists, potentially due to clinical workload or limited media experience. Patient-uploaded videos were of the lowest quality, often disorganized and lacking educational value. Videos sharing personal experiences were less trustworthy and informative compared to those focused on medical education or surgical demonstration.

Video quality was closely related to its source and content. More than half of the videos were uploaded by nonmedical individuals, often lacking critical information or context. In contrast, videos created by medical professionals were generally more reliable, in line with previous findings8,9. Patient-generated content is prone to emotional bias and limited medical understanding, which can mislead viewers10. Moreover, the brief nature of short videos restricts depth, which further compromises quality. A positive correlation was observed between video duration and quality, explaining why BiliBili videos tended to be of better quality than those on TikTok.

Although video quality and reliability were positively correlated, neither was significantly associated with user interaction metrics such as likes or comments. This implies that viewers may lack the ability to discern high-quality content11,12. Popularity-driven algorithms may amplify low-quality videos, as noted in studies on YouTube12. While previous research indicates viewer preference for shorter videos13,14, in our study longer videos were more likely to be informative and reliable.

GQS and mDISCERN were effective for assessing video quality and reliability. The GQS, a 5-point scale developed by Bernard et al., is widely used to assess health-related education materials15,16,17,18. mDISCERN evaluates the quality of treatment-related health information19,20,21 and has been adapted for video assessment on a 5-point scale22. Despite their consistency, both tools are limited by their focus on textual content and may inadequately capture the visual or interactive elements of videos. Additional tools may be needed to comprehensively assess video-based health information.

With growing demand for orthognathic surgery to improve facial aesthetics, occlusion, and airway function2,23,24, Internet-based platforms play a crucial role in patient education25. However, the quality of online health videos remains uneven due to insufficient regulation, and patients often struggle to evaluate credibility3,17,20,26. High-quality videos are vital to helping patients build realistic expectations and avoid misinformation. The Chinese government has issued regulations aimed at improving the quality of online health content27. Medical professionals are encouraged to produce accurate, accessible, and engaging videos to raise public awareness of orthognathic surgery.

This study has several strengths. It is among the few that compare both TikTok and BiliBili, the two leading Chinese short-video platforms, which serve different demographics. The GQS and mDISCERN were applied systematically to assess video quality and reliability, and few studies have approached this particular content from an orthognathic surgeon’s perspective. To support interrater reliability, two independent raters scored all videos, with discrepancies resolved by a third reviewer.

This study also has limitations. First, only the top 100 videos from each platform were analysed; however, prior research supports the representativeness of such small samples11,27,28. Second, this cross-sectional analysis does not account for content changes over time, so longitudinal studies are needed. Lastly, the study focused exclusively on Chinese-language videos, limiting generalizability to other languages or regions. Given the global relevance of orthognathic surgery, future cross-cultural studies are warranted to promote reliable online medical information worldwide.

Conclusion

With the increasing demand for orthognathic surgery, short video platforms have recently become pivotal sources of health information. However, the quality and reliability of most orthognathic surgery-related videos on BiliBili and TikTok are poor, especially patient-made videos that lack a scientific basis. In contrast, videos made by medical professionals, the vast majority of which explain diseases and treatments, are relatively better. This emphasises the need for stricter control over medical content on social media, such as establishing review systems, improving recommendation algorithms, and giving high-quality content greater exposure. Evidence-based, accurate videos made by medical professionals are welcome, but viewers’ ability to think critically when assessing the credibility of health information is even more important.