Abstract
Addressing the risk of uncontrolled dissemination of AI deepfake videos in entertainment scenarios, this study constructs an explainable ensemble learning prediction framework from an entertainment computing perspective, systematically revealing the diffusion mechanisms of technology-enabled entertainment content. Guided by information ecosystem theory, the study first identifies nine core factors influencing deepfake video propagation through multidimensional feature decomposition. It innovatively proposes the RFECV-GA-PSO-RF hybrid feature selection algorithm to achieve efficient dimensionality reduction of entertainment computing features. Subsequently, the study employs a PSO-GA-XGBOOST ensemble model—fusing particle swarm optimization (PSO) and genetic algorithm (GA)—to achieve precise predictions of deepfake video propagation on real-world Chinese video platforms. This approach significantly outperforms existing models, demonstrating average improvements of 42.95% across four evaluation metrics (RMSE reduced to 1.230, MAPE reduced to 0.280, MAE reduced to 1.063, R² reaching 0.818). Finally, leveraging the interpretability of this predictive model, the study quantifies the importance of each feature and feature dimension. The proposed integrated prediction model not only provides novel predictive tools for the field of entertainment computing but also offers quantitative decision support for dissemination regulation and content ecosystem optimization in the era of intelligent entertainment, expanding the theoretical boundaries of interdisciplinary research in entertainment technology.
Introduction
Under the Web 3.0 technological paradigm, Generative Artificial Intelligence (GAI) is reshaping the production logic of entertainment content, forming a new content ecosystem characterized by the triadic symbiosis of “technology-entertainment-users”. Artificial Intelligence Generated Content (AIGC) has emerged as a novel content generation model following Professional Generated Content (PGC) and User Generated Content (UGC)1. As a quintessential manifestation of GAI technology, deepfake technology employs generative adversarial networks (GAN) and diffusion models to achieve hyper-realistic forgeries of multimedia elements like faces and voices2. The resulting deepfake videos have become among the most controversial technological artifacts in the digital entertainment sphere. This type of video profoundly influences contemporary entertainment practices through three key entertainment attributes: technological entertainment (creating surreal audiovisual experiences via algorithms), social entertainment (sparking viral dissemination on short-form video platforms), and ethical entertainment (deconstructing public figures’ images through playful satire). However, this technology-driven entertainment innovation faces a dual paradox. On the one hand, deepfake videos satisfy users’ primal craving for sensational entertainment through “technological deception”, spawning novel entertainment formats like “deepfake celebrity impersonation shows” and “AI face-swap variety shows” on platforms like TikTok and YouTube. On the other hand, the black-box nature of their algorithms blurs the boundaries of entertainment authenticity, triggering governance crises such as the erosion of news veracity, disordered communication systems, and failed public discourse guidance3. This contradiction creates a unique research tension in the field of entertainment computing: how can a dynamic equilibrium between “entertainment innovation” and “dissemination security” be achieved through technological means?
Consequently, effectively controlling the dissemination of AI deepfake videos and predicting their dissemination trends have become critical issues requiring urgent resolution. They also represent new challenges posed by emerging entertainment paradigms in the AI era. Notable cases include: in March 2025, AI-generated propaganda featuring Academician Zhang Boli of the Chinese Academy of Engineering promoting skincare products circulated; in January 2024, a deepfake video of Hong Kong SAR Chief Executive John Lee Ka-chiu selling investment products began spreading; in May 2023, deepfake videos emerged showing Democratic candidate Hillary Clinton endorsing Republican candidate DeSantis and Biden expressing dissatisfaction with transgender individuals, severely disrupting the 2024 U.S. presidential election; in March 2023, YouTube circulated a deepfake video of Ukrainian President Zelenskyy surrendering to Russia4.
The widespread dissemination of AI deepfake videos and their potential societal impacts cannot be overlooked, including misleading public perception, damaging personal reputation, undermining social trust, and even threatening national security5. When deployed as information weapons in great power competition, deepfake videos pose a significant threat to national security, social stability, and public trust. Specifically, at the national security level, deepfake videos involving state leaders could severely damage national image and disrupt the international relations landscape, while deepfake military operation videos could influence military decision-making and arms control. At the social stability level, deepfake videos containing financial insider information or economic policy content could undermine economic order, while deepfake videos depicting ethnic discrimination or violence could threaten public safety. At the level of public trust, deepfake videos concerning critical issues like human rights and ethnicity could undermine citizens’ political identity and trigger a crisis of public trust6.
To address the aforementioned issues, this study proposes an AI deepfake video dissemination prediction method based on the information ecosystem and PSO-GA-XGBOOST. First, feature elements for predicting AI deepfake video dissemination are identified using the information ecosystem theory. Next, the RFECV-GA-PSO-RF combined model is employed to screen these features, yielding core features for training the deepfake video dissemination prediction model. Finally, the PSO-GA-XGBOOST combined prediction model forecasts the dissemination of AI deepfake videos. Concurrently, leveraging XGBOOST’s inherent interpretability, it accurately identifies key feature indicators and dimensions influencing deepfake video dissemination, thereby revealing the underlying logical patterns governing their dissemination. This model offers three key advantages. First, during the feature element identification process, it integrates considerations of the technical characteristics that distinguish AI deepfake videos from other PGC and UGC videos, grounded in the theoretical framework of information ecosystems. This approach ensures a more comprehensive and targeted feature element identification process. Second, during feature selection, the RFECV-GA-PSO-RF combined model can systematically identify the feature subset that contributes most significantly to the target variable. This approach enables it to escape local optima, explore a broader feature space, and simultaneously enhance feature selection efficiency. Third, during predictive model construction, the PSO-GA-XGBOOST combined prediction model effectively addresses challenges such as XGBOOST’s feature selection difficulties, complex parameter tuning, and high overfitting risks, thereby enhancing the accuracy of predicting the dissemination of AI deepfake videos.
Literature review
Research on factors affecting video dissemination effectiveness
Existing research on factors influencing video dissemination effectiveness primarily revolves around three theoretical frameworks: the Lasswell 5W communication model, the heuristic-systematic model, and the elaboration likelihood model. The Lasswell 5W Communication Model, proposed by American scholar Harold Lasswell7 in 1948, is a communication process analysis framework. This model identifies five fundamental elements in the communication process: the communicator, the message, the channel, the audience, and the response. Scholars have applied this theory to analyze the communication effects of cultural UGC videos8, online videos9, and other similar content. The Heuristic-Systematic Model (HSM) is an information processing model proposed by psychologist Chaiken S.10 in 1980 to explain users’ thinking and behavior when receiving and processing persuasive information. This model comprises two components: heuristic cues and systematic cues. Scholars have applied this theory to analyze the dissemination effects of various online knowledge-based videos11, rumor-debunking short videos12, science popularization short videos13, university library videos14, and Cantonese opera videos15. The Elaboration Likelihood Model (ELM) is an information processing model proposed by American psychologists Richard E. Petty and John T. Cacioppo16 in 1986. It explains the fundamental process by which users are persuaded and change their attitudes. The model comprises two components: the central route and the peripheral route. Scholars have applied this theory to analyze the dissemination effects of false short videos17, health science popularization short videos18, and other content. Additionally, scholars have analyzed factors influencing video dissemination effectiveness based on information diffusion models and technology diffusion models, with research subjects including COVID-19 videos19.
Research on methods for predicting video dissemination effectiveness
Regarding the prediction of video dissemination effectiveness, existing research primarily quantifies this through metrics such as video likes, comments, saves, and shares. Different studies vary in how they name this composite indicator, with related terms including video popularity20,21. In this study, we uniformly refer to these metrics as video dissemination effectiveness. Prediction methods for video dissemination effectiveness fall into three categories of predictive modeling: statistical models, time series regression models, and machine learning models. Statistical models primarily employ least squares and linear regression for forecasting. For instance, Stephanie M. Brewer utilized ordinary least squares regression to reveal the relationship between video dissemination effectiveness and factors such as budget and reviews22; Wenbin Zhang integrated textual features into linear regression analysis, significantly improving the prediction of video dissemination effectiveness23. Time series regression models predict outcomes by examining correlations before and after video dissemination. For instance, C. Dellarocas developed a time series prediction model grounded in diffusion theory, accounting for word-of-mouth’s impact on movie dissemination effectiveness, with results demonstrating superior predictive power compared to baseline models24. Machine learning models improve prediction accuracy by training on large amounts of data to minimize the error between predicted outputs and true labels, and are then used to predict the dissemination effectiveness of new videos. They include attention-based prediction models25,26, backpropagation neural networks27, network representation learning algorithms28, autoencoder algorithms29, and support vector machines30,31.
In summary, existing research has explored the factors influencing video dissemination effectiveness and predictive methodologies, yet several limitations remain. First, current studies on factors affecting video dissemination effectiveness primarily focus on PGC and UGC content, lacking analysis of AIGC. This fails to bridge the transition from the Web 2.0 era to Web 3.0, and neglects personalized consideration of emerging entertainment paradigms in the AI era. Second, due to the technical distinctiveness of AI deepfake videos compared to other video formats, existing video dissemination prediction models exhibit limitations in forecasting deepfake video dissemination. Their limited applicability results in suboptimal prediction outcomes. Therefore, there is an urgent need to integrate considerations of AI technological characteristics, identify key dissemination features of AI deepfake videos, and subsequently develop a high-precision prediction model for deepfake video dissemination.
Methodological framework
This paper first identifies the characteristic elements of deepfake video dissemination based on information ecosystem theory. It then collects multi-source data, extracts features for each indicator, and proposes a quantitative method for assessing the dissemination effectiveness of deepfake videos. Next, it employs the combined RFECV-GA-PSO-RF model to screen features, obtaining the core features used to train the deepfake video dissemination prediction model. Finally, it proposes a PSO-GA-XGBOOST combined prediction model to forecast the dissemination of AI deepfake videos, and leverages XGBOOST’s inherent interpretability to identify the key feature indicators and dimensions influencing that dissemination, thereby revealing the underlying logic governing it. Making the dissemination of deepfake videos predictable supports governance of their uncontrolled spread and helps balance AI technological advancement with safety. The research framework is illustrated in Fig. 1.
Identifying key elements of AI deepfake video dissemination based on the information ecosystem theory
The information ecosystem is an artificial system composed of information subject, information, information technology, and the information environment, possessing certain self-regulating functions. Due to the interactions among these elements, they collectively drive information dissemination and influence the ultimate dissemination effectiveness. Consequently, it is frequently employed to analyze the information dissemination process32,33. The information subject serves as both the starting point and endpoint of the dissemination process, encompassing two roles: the sender and the receiver. The sender is primarily responsible for encoding information, selecting appropriate communication channels, and transmitting the message to the receiver. As the originator of the communication activity, the sender determines the nature, form, and method of the message’s delivery. As the endpoint of the communication process, the receiver is responsible for receiving and decoding the information transmitted by the sender. Furthermore, upon receiving the information, the receiver forms a feedback mechanism through responses or interactions, which in turn influences the sender’s subsequent communication behavior8. In the dissemination process of AI deepfake videos, “information” refers to the deepfake video as the disseminated content, “information technology” denotes the technical characteristics of the AI deepfake technology itself, and “information environment” signifies the information ecosystem in which deepfake videos operate. Based on the information ecosystem theory, this paper identifies nine characteristic elements of AI deepfake video dissemination across four feature dimensions: information subject, information, information technology, and information environment. The specific elements and their definitions are presented in Table 1.
The analysis of deepfake video dissemination proposed in this paper based on information ecosystem theory can also be understood as an extension of the traditional Lasswell “5W” communication theory for the information age. It employs information ecosystem theory to identify key elements of deepfake video dissemination, while selecting communication effects from Lasswell’s framework as the output variable. The input design incorporates not only Lasswell’s components (communicator, message, channel, and receiver) but also information technology factors. This approach highlights the technical characteristics of generative AI deepfake technology and its pivotal role in the dissemination process.
Information subjects comprise two categories: disseminators and recipients. The attributes of both disseminators and recipients directly influence the effectiveness of video dissemination11. Therefore, this study employs disseminator popularity and user age distribution to represent the attributes of disseminators and recipients, respectively. Disseminator popularity is measured by the number of followers the disseminator possesses on the video platform13, while user age distribution is assessed by the proportion of users aged 30 and below among the platform’s user base34.
Information refers to deepfake video content, where different characteristics of video content exert varying influences on its dissemination. Existing research indicates that video theme categories impact dissemination effectiveness35. For deepfake videos, the video theme category reflects both the content theme and the purpose of fabrication, potentially affecting dissemination outcomes. Regarding video duration, fragmentation is a core characteristic of videos in the information age, as users consume content during fragmented leisure time36. A study based on the YouTube platform revealed a significant negative correlation between video duration and user attention37, with user attention being a key factor influencing information dissemination38. Regarding video title length, existing research indicates its significant impact on dissemination effectiveness39. Video title length influences readership40, thereby affecting video propagation. Concerning the number of video tags, content creators reduce users’ search costs and enhance users’ assessment of content value by adding genre-specific tags during publication41. Studies confirm that tag quantity is a crucial factor influencing video dissemination effectiveness42.
Information technology factors constitute the distinctive element that differentiates the dissemination of AI deepfake videos from other types of video content. The developmental objective of AI deepfake technology lies in generating highly realistic multimodal content such as images and videos. Therefore, when analyzing the impact of this technology on the dissemination effectiveness of deepfake videos, the examination primarily focuses on two aspects: visual dissemination technology factors and audiovisual matching dissemination technology factors. Among these, visual dissemination technology factors primarily refer to image generation and processing techniques, examining the authenticity of video footage, image detail levels, color balance, lighting effects, and consistency in dynamic imagery—that is, the degree to which images appear lifelike and whether they exhibit obvious artificiality. Audiovisual matching dissemination technology factors primarily refer to techniques ensuring temporal, content, and emotional consistency between audio and video content generated via deepfake technology. This evaluates the degree of audio-visual alignment, synchronization, and coordination within videos, reflecting the difficulty in discerning deepfake content.
The information environment refers to the information ecosystem in which deepfake videos exist, specifically the channels through which they are disseminated. It serves as the specific medium for the realization of dissemination activities. Drawing upon Zhang Ying’s research42, this study analyzes how differences in video account characteristics influence the dissemination effectiveness of AI-generated deepfake videos. Analogous to Hu Bing’s mechanism analysis of how content verticality influences video dissemination effectiveness13, this study selects account technical content verticality as the personalized factor distinguishing deepfake video dissemination from other UGC and PGC videos. It quantifies this factor using the proportion of AI-related videos within an account’s content, thereby analyzing the impact of the information environment on deepfake video dissemination.
Quantifying the dissemination effectiveness of AI deepfake videos
Assessing the dissemination effectiveness of AI deepfake videos requires comprehensive consideration of multiple indicator coding factors; it cannot be quantified solely through a single metric. This study proposes to utilize the Bilibili and Douyin platforms as data collection sources for deepfake videos. Therefore, in quantifying video dissemination effects, it references the methodology employed by Shen Hongzhou’s research43, which similarly relies on Bilibili and Douyin as video data collection platforms. This study employs a currently mainstream method for measuring dissemination effectiveness44,45, calculated based on likes, comments, and shares. The specific formula is shown in Eq. (1). This calculation method is applicable for processing integrated dissemination effectiveness data from both Bilibili and Douyin platforms. The constant 1 is included to avoid obtaining a logarithm of zero.
Dissemination Effectiveness
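As a rough illustration of the description above, the sketch below assumes one common form of such an index: the natural logarithm of the summed likes, comments, and shares plus the constant 1. The equal weighting, the log base, and the function name `dissemination_effectiveness` are assumptions made for illustration and may differ from Eq. (1).

```python
import math


def dissemination_effectiveness(likes: int, comments: int, shares: int) -> float:
    """Assumed form: ln(likes + comments + shares + 1).

    The +1 keeps the logarithm defined when a video has no interactions.
    """
    if min(likes, comments, shares) < 0:
        raise ValueError("interaction counts must be non-negative")
    return math.log(likes + comments + shares + 1)


# A video with no interactions maps to 0.0 because ln(1) = 0.
print(dissemination_effectiveness(0, 0, 0))
# Hypothetical counts for a single video.
print(round(dissemination_effectiveness(1200, 85, 40), 3))
```

Under this assumed form, the logarithm compresses the heavy-tailed interaction counts typical of video platforms, so a small number of viral videos does not dominate the regression target.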
Feature selection for AI deepfake video dissemination prediction based on RFECV-GA-PSO-RF
To effectively predict the dissemination trends of deepfake videos, this paper proposes a hybrid feature selection method (RFECV-GA-PSO-RF) based on Recursive Feature Elimination Cross-Validation (RFECV), Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Random Forest (RF). Through multi-stage feature selection and optimization, this method eliminates redundant features, enhancing the performance and interpretability of deepfake video dissemination prediction models. The specific steps are as follows.
Step 1: Divide the dataset into training and testing sets.
Step 2: Use Random Forest (RF) as the base model for preliminary training on the extracted features.
Step 3: Evaluate each feature’s contribution to model prediction based on its importance score within the RF model.
Step 4: Gradually eliminate less important features using the RFECV algorithm, evaluating model performance with cross-validation after each removal. Specifically, in each iteration, remove one or more of the least important features, then retrain the model and compute the cross-validation score. Repeat this process until either a preset number of features is reached or model performance no longer improves significantly. Through RFECV, a feature subset is obtained that minimizes the number of features while ensuring model performance.
Step 5: Encode the feature subset selected by RFECV, where each feature corresponds to a gene position. The gene position value is either 0 or 1, indicating whether the feature is selected. Randomly generate a certain number of individuals, each representing a feature combination, to form the initial population. Design a fitness function to evaluate the quality of each individual. Based on the values of the fitness function, individuals with higher fitness are selected for reproduction. New individuals are generated through crossover and mutation operations. The selection, crossover, and mutation operations are repeated until the preset number of iterations is reached or convergence criteria are satisfied. Through GA optimization, the feature space can be further explored to discover better feature combinations.
Step 6: Use the feature combinations optimized by GA as the initial particle swarm, where each particle represents a feature combination. Assign an initial velocity to each particle. Update the velocity and position of each particle based on its own historical optimal position and the global optimal position of the swarm. Repeat the velocity and position update operations until the preset iteration count is reached or convergence criteria are satisfied. Through PSO optimization, further fine-tune the feature combinations to enhance the model’s performance.
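Steps 5 and 6 can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the study's implementation: the fitness function here is a toy stand-in (hypothetical per-feature gains minus a per-feature penalty) for what would in practice be a cross-validated Random Forest score, and the operator choices (truncation selection, one-point crossover, sigmoid-based binary PSO) and all parameter values are assumptions.

```python
import math
import random

N_FEATURES = 8
# Hypothetical per-feature gains; a real fitness would be a cross-validated
# Random Forest score computed on the columns selected by the mask.
TOY_GAIN = [0.30, 0.05, 0.22, 0.01, 0.18, 0.02, 0.15, 0.07]
PENALTY = 0.06  # assumed cost per selected feature, discourages redundancy


def fitness(mask):
    """Toy fitness: summed gain of selected features minus a size penalty."""
    return sum(g for g, bit in zip(TOY_GAIN, mask) if bit) - PENALTY * sum(mask)


def ga_select(rng, pop_size=20, generations=30, p_mut=0.1):
    """Step 5: evolve 0/1 masks with selection, crossover, and mutation."""
    pop = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]        # truncation selection
        children = [parents[0][:]]            # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, N_FEATURES - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    return sorted(pop, key=fitness, reverse=True)


def binary_pso(rng, swarm, iterations=30, w=0.7, c1=1.5, c2=1.5):
    """Step 6: refine masks with binary PSO seeded from the GA population."""
    vel = [[0.0] * N_FEATURES for _ in swarm]
    pbest = [p[:] for p in swarm]
    gbest = max(pbest, key=fitness)[:]
    for _ in range(iterations):
        for i, pos in enumerate(swarm):
            for d in range(N_FEATURES):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[d])
                             + c2 * rng.random() * (gbest[d] - pos[d]))
                # The sigmoid of the velocity is the probability the bit is 1.
                pos[d] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            if fitness(pos) > fitness(pbest[i]):
                pbest[i] = pos[:]
        gbest = max(pbest + [gbest], key=fitness)[:]
    return gbest


rng = random.Random(0)
ga_population = ga_select(rng)
best_mask = binary_pso(rng, [p[:] for p in ga_population])
print(best_mask, round(fitness(best_mask), 3))
```

Because each particle's personal best starts at its GA-derived mask, the PSO stage in this sketch can only match or improve on the best mask found by the GA stage.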
AI deepfake video dissemination prediction based on PSO-GA-XGBOOST
This paper employs a PSO-GA-XGBOOST model to fit the selected core features for predicting the dissemination of deepfake videos generated by artificial intelligence. XGBOOST is a powerful gradient boosting tree model, but practical applications often encounter challenges such as difficulty in feature selection, complex parameter tuning, and high risks of overfitting. To address this, this study proposes a hybrid optimization algorithm based on genetic algorithms (GA) and particle swarm optimization (PSO). The hybrid optimization process is similar to that of the aforementioned hybrid optimized random forest algorithm. In this approach, genetic algorithms can retain the features most influential on prediction outcomes, thereby enhancing model accuracy, while eliminating redundant features to reduce model complexity. Particle Swarm Optimization efficiently searches for the optimal hyperparameter combination of XGBOOST within the continuous parameter space. Simultaneously, it avoids getting stuck in local optima, thereby improving the stability of parameter tuning.
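The hyperparameter search described above can be illustrated with a small particle swarm written in standard-library Python. This is a sketch under stated assumptions: the two tuned parameters (`learning_rate`, `max_depth`), their ranges, and the PSO coefficients are chosen for illustration, and `toy_cv_error` is a hypothetical stand-in for the cross-validated error of an XGBOOST model trained with the candidate parameters.

```python
import random

# Assumed search ranges for two XGBoost-style hyperparameters.
BOUNDS = {"learning_rate": (0.01, 0.3), "max_depth": (2.0, 10.0)}


def toy_cv_error(params):
    """Stand-in for cross-validated RMSE; smallest at lr=0.1, depth=6."""
    return ((params["learning_rate"] - 0.1) ** 2 * 100
            + (params["max_depth"] - 6.0) ** 2 * 0.05)


def pso_minimize(objective, bounds, rng, n_particles=15, iterations=60,
                 w=0.7, c1=1.5, c2=1.5):
    """Minimize `objective` over the box defined by `bounds` with basic PSO."""
    names = list(bounds)
    lo = [bounds[n][0] for n in names]
    hi = [bounds[n][1] for n in names]

    def as_dict(x):
        return dict(zip(names, x))

    pos = [[rng.uniform(l, h) for l, h in zip(lo, hi)] for _ in range(n_particles)]
    vel = [[0.0] * len(names) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(as_dict(p)) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(len(names)):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # Clamp the new position to the allowed range.
                pos[i][d] = min(hi[d], max(lo[d], pos[i][d] + vel[i][d]))
            val = objective(as_dict(pos[i]))
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return as_dict(gbest), gbest_val


best_params, best_error = pso_minimize(toy_cv_error, BOUNDS, random.Random(0))
print(best_params, best_error)
```

In a real run, integer-valued parameters such as tree depth would be rounded before the model is fitted, and the objective would train and cross-validate the model on the GA-selected features rather than evaluate a closed-form function.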
Experiments and analysis of results
Data sources and preprocessing
In the Web 3.0 era, the dissemination of deepfake videos is a process involving the interaction of four factors: information subject, information, information technology, and information environment. Relevant data originates from industry research reports, video platform operators, the deepfake videos themselves, and video users. This study references Shen Hongzhou’s analysis of the dissemination effectiveness of emergency knowledge short videos43, similarly selecting Bilibili and Douyin as the two mainstream platforms for video dissemination research. During video data collection, since no vertical channel for “deepfake videos” has been established across these platforms and video labeling lacks standardization and consistency, instances persist where users generate false videos using AI synthesis technology without explicitly indicating “AI” in video titles or tags. Therefore, this study builds upon Liu Chunnian’s deepfake video retrieval methodology46 while further refining search parameters. Keywords including “deepfake”, “AI synthesis”, and “AI-generated” were used to retrieve videos on Bilibili and Douyin. We selected videos from the “Comprehensive Ranking” and “Most Played” lists under each keyword on Bilibili, and from the “Comprehensive Ranking” and “Most Liked” lists under each keyword on Douyin. These videos demonstrate high influence and attention, indicating the samples possess a degree of representativeness. Additionally, these deepfake videos originate from different account entities, suggesting the samples exhibit diversity and heterogeneity.
The selection criteria for the aforementioned deepfake videos are as follows. (1) Authentic videos related to deepfake technology, such as science popularization content, news reports, awareness campaigns, and video generation tutorials, are excluded from this study. (2) Ordinary special effects videos created using video editing and compositing software to add animations, transitions, filters, or other effects that do not alter the fundamental content or character features within the video, and where the effects are relatively easy to distinguish from the real content, are excluded from this study. (3) Videos featuring virtual digital humans are excluded due to significant technical differences from deepfake videos and their generally discernible nature. (4) Duplicate samples obtained from the same video platform under different search conditions are excluded. (5) Duplicate deepfake video samples across different platforms are not excluded. This is because variations in dissemination effectiveness for identical content across platforms effectively illustrate the impact of platform factors, user group characteristics, and account attributes on deepfake video propagation. (6) Samples exhibiting account anomalies, such as deleted disseminator accounts or closed comment sections, are excluded.
After screening, the initial dataset comprised 344 deepfake videos, including 248 videos from the Bilibili platform and 96 videos from the Douyin platform. It should be noted that disseminators exhibit preferences when selecting video platforms, and platforms themselves gradually develop distinct content positioning, including preferences for specific video genres, during their evolution. These factors likely contribute to the uneven distribution of video samples across platforms, a phenomenon also observed in existing dual-platform studies45. Therefore, the unbalanced sample obtained through the aforementioned screening criteria is reasonable and consistent with actual circumstances.
Considering the “long-tail effect” and “seven-day effect” of online information dissemination47, and given that observation of video sample data shows that video data tends to stabilize one month after publication12, the collection period for this study’s AI deepfake video dataset spans from November 6, 2024, to December 6, 2024. During this period, the sample data were reviewed weekly to verify that each video still existed, and video samples missing at any of the four weekly checks were excluded. The retained samples constitute the dataset for this study and represent deepfake videos capable of stable dissemination that exert a certain influence on the external environment. Initial sampling identified 344 video samples. After the one-month observation period, the final dataset comprised 338 video samples: 246 from the Bilibili platform and 92 from the Douyin platform. Some of the video samples are shown in Figs. 2 and 3.
Quantification and normalization of feature elements
Among the AI deepfake video dissemination characteristics identified in Table 1, all technology-related features represent unique elements distinguishing deepfake video dissemination from other UGC and PGC video dissemination. These include three key characteristics: visual dissemination technology level, audiovisual matching dissemination technology level, and account technical content verticality. Among these, the visual dissemination technology level and audiovisual matching dissemination technology level require manual coding for quantification. Currently, detection and identification technologies for deepfake techniques remain in a phase of ongoing development. No software or tools capable of evaluating visual deepfake technology and audiovisual synchronization technology with high accuracy have yet been developed48. Auditory and visual technical features extracted automatically by computers exhibit significant errors, making it difficult to meet the rigorous requirements of empirical research. This study references Fu Shaoxiong’s research methodology17 to quantify the visual and audiovisual matching technical characteristics of deepfake videos through manual coding. The coders comprised five doctoral candidates majoring in Management Science and Engineering at the School of Economics and Management, possessing strong research foundations and information literacy. Following training and trial coding sessions, all coders mastered the coding requirements. During the formal coding phase, each coder was required to score the visual dissemination technology and audiovisual matching dissemination technology of each deepfake video on a scale of 1 to 7 after viewing it. The scores represented the level of deepfake technology, ranging from low to high. 
A score of 1–3 indicates low-level deepfake technology, with relatively crude techniques, low naturalness, and low detection difficulty; a score of 4 indicates medium-level technology, with somewhat improved techniques and acceptable naturalness but still some flaws, and moderate detection difficulty; a score of 5–7 indicates high-level technology, with more sophisticated techniques, high naturalness, and high detection difficulty. Although the medium level is represented by the single score of 4, this value is the exact midpoint of the 1–7 scale. In actual coding, coders weighed subtle differences across the technical indicators and used 4 as the reference point for classifying deepfake videos near the boundaries of the medium level. This division is therefore considered sufficient for the precision required in this study’s assessment of deepfake technology levels. After coding, consistency checks were performed on the coding results. The Krippendorff’s alpha coefficients for the visual dissemination technology and audiovisual matching dissemination technology coding results were 0.859 and 0.820, respectively, both exceeding 0.7, which indicates good consistency in the coding results49. Additionally, the video theme category feature also required manual coding, with the specific coding details shown in Table 2.
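For readers unfamiliar with the consistency check, the following sketch computes Krippendorff’s alpha for a complete coder-by-video score table. It assumes the interval difference metric (squared differences between scores) and no missing scores; the text does not state which metric was used for the 1 to 7 scores, and the example scores are invented for illustration.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval metric.

    `units` is a list with one inner list of coder scores per video.
    Assumes no missing scores and at least two coders per video.
    """
    values = [v for unit in units for v in unit]
    n = len(values)
    # Observed disagreement: squared differences between scores of the same video.
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(unit, 2)) / (len(unit) - 1)
        for unit in units
    ) / n
    # Expected disagreement: squared differences across all pairs of scores.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all scores are identical")
    return 1 - d_o / d_e


# Hypothetical scores from five coders for four videos on the 1-7 scale.
scores = [
    [2, 2, 3, 2, 2],
    [6, 6, 5, 6, 7],
    [4, 4, 4, 5, 4],
    [1, 2, 1, 1, 1],
]
print(round(krippendorff_alpha_interval(scores), 3))
```

An alpha of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.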
All categorical variables in this study were measured through manual coding. The coders were doctoral candidates in Management Science and Engineering with strong research expertise. Because the coding items are objective in nature, a double-blind coding procedure was used: one coder performed the initial coding and a second coder verified it. Discrepancies were resolved by a third coder, and the final coding was tested for consistency. After quantitative data had been obtained for all feature elements, min-max normalization was applied. The video theme category, an ordinal categorical variable, was first frequency-coded and then normalized.
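As a minimal sketch of the preprocessing just described, the following shows frequency coding followed by min-max normalization. The theme labels are hypothetical, and the text does not say whether frequency coding used raw counts or relative frequencies; relative frequencies are assumed here, and after min-max scaling both choices give the same result.

```python
from collections import Counter


def min_max(values):
    """Scale values linearly to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def frequency_encode(categories):
    """Replace each category with its relative frequency in the sample."""
    counts = Counter(categories)
    total = len(categories)
    return [counts[c] / total for c in categories]


# Hypothetical theme labels for six videos.
themes = ["film", "music", "film", "news", "film", "music"]
theme_freq = frequency_encode(themes)   # film 3/6, music 2/6, news 1/6
theme_scaled = min_max(theme_freq)      # most frequent theme maps to 1.0

# A numeric feature (hypothetical video durations in seconds).
duration_scaled = min_max([15, 60, 30, 120])
```

One consequence of this encoding is that two themes with the same frequency become indistinguishable to the model.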
Core feature screening results
To improve model fit and the generalization capability of the prediction model, this study uses the RFECV algorithm to identify the optimal number of feature variables. The experimental results show that performance is stable when eight of the original nine features are retained, that is, after one feature is removed, as shown in Fig. 4. The combined RFECV-GA-PSO-RF algorithm was therefore used to retain the eight most influential of the nine features listed in Table 1 for subsequent experiments, excluding the “user age distribution” feature. The feature importance ranking is illustrated in Fig. 5.
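The recursive elimination step can be sketched with scikit-learn's RFECV wrapped around a random forest. This shows only plain RFECV with an untuned RF, not the GA and PSO tuning that the RFECV-GA-PSO-RF algorithm adds; the synthetic data, the 5-fold setting, and the RMSE scoring are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV

# Synthetic stand-in for the nine candidate features (not the study's data).
rng = np.random.default_rng(0)
X = rng.random((120, 9))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + X[:, 2] + 0.1 * rng.standard_normal(120)

# Drop one feature per step and score each subset size by cross-validation.
selector = RFECV(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    step=1,
    cv=5,
    scoring="neg_root_mean_squared_error",
    min_features_to_select=1,
)
selector.fit(X, y)

print("features kept:", selector.n_features_)
print("selection mask:", selector.support_)
print("elimination ranking (1 = kept):", selector.ranking_)
```

The `ranking_` attribute records the order in which features were eliminated, which is one way to see which single feature would be dropped first.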
Model construction results
Model implementation
This study drew 34 samples from the dataset of 338 entries to form the test set; the remaining 304 entries served as the training set. To evaluate model performance objectively, the XGBOOST model, SVM model, BP neural network, RF model, and GA-XGBOOST model were selected as baselines for comparison with the model developed in this paper. All six models used the eight selected feature indicators as inputs, and the output was the dissemination effectiveness of AI deepfake videos. Each model was implemented in Python with the following parameter settings:
The training set proportion for the PSO-GA-XGBOOST model is set to train_size = 0.900. The GA parameters are configured as follows: maximum number of iterations max_num_iteration = 50, population size population_size = 30, mutation probability mutation_probability = 0.1, elite ratio elit_ratio = 0.01, crossover probability crossover_probability = 0.5, parent proportion parents_portion = 0.3, and early termination after max_iteration_without_improv = 10 iterations without improvement. The convergence curve is shown in Fig. 6. Starting from the GA optimization results, further optimization is then performed with the PSO algorithm, using swarm size swarmsize = 30, maximum iterations maxiter = 30, and variable dimension dim = 4. The four variables and their lower and upper bounds are: number of trees (n_estimators), 50 and 1000; maximum tree depth (max_depth), 3 and 15; minimum child weight (min_child_weight), 1 and 10; and L2 regularization coefficient (reg_lambda), 0 and 2.
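The PSO stage can be sketched as a small, self-contained particle swarm that searches inside the four bounds listed above and is seeded with a GA result. The objective below is a hypothetical placeholder, not the XGBOOST validation error used in the study; the inertia and acceleration coefficients, the seed point, and all function names are likewise illustrative assumptions.

```python
import random

# Bounds from the text: n_estimators, max_depth, min_child_weight, reg_lambda.
LOWER = [50.0, 3.0, 1.0, 0.0]
UPPER = [1000.0, 15.0, 10.0, 2.0]


def pso(objective, lower, upper, swarmsize=30, maxiter=30,
        seed_point=None, w=0.5, c1=0.5, c2=0.5, rng=None):
    """Minimise objective inside box bounds with a basic global-best PSO."""
    rng = rng or random.Random(0)
    dim = len(lower)
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(swarmsize)]
    if seed_point is not None:
        pos[0] = list(seed_point)  # start one particle at the GA result
    vel = [[0.0] * dim for _ in range(swarmsize)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(swarmsize), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(maxiter):
        for i in range(swarmsize):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clip each coordinate back into its allowed range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lower[d]), upper[d])
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def placeholder_error(params):
    """Hypothetical stand-in for a cross-validated RMSE of the model."""
    target = [400.0, 6.0, 3.0, 1.0]  # arbitrary optimum, for illustration only
    return sum(((p - t) / (u - l)) ** 2
               for p, t, l, u in zip(params, target, LOWER, UPPER)) ** 0.5


ga_best = [300.0, 5.0, 2.0, 0.5]  # hypothetical output of the GA stage
best_params, best_error = pso(placeholder_error, LOWER, UPPER, seed_point=ga_best)
```

In a real run the objective would train the booster with the candidate parameters (rounding n_estimators and max_depth to integers) and return a validation error. Seeding one particle with the GA result guarantees that the swarm's best value is never worse than the GA's.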
The baseline models are configured as follows. In the SVM model, the regularization parameter is C = 1.0 and the tolerance of the loss function is epsilon = 0.1. In the BP neural network, the hidden layer has 100 neurons (hidden_layer_sizes = 100) and the maximum number of iterations is max_iter = 1000. In the RF model, the number of trees is n_estimators = 100.
Model performance evaluation and comparison
Figure 7 shows the numerical fitting results of the PSO-GA-XGBOOST model, XGBOOST model, SVM model, BP neural network, RF model, and GA-XGBOOST model for predicting the dissemination effect of AI deepfake videos on the test set. The XGBOOST, SVM, BP neural network, and RF models exhibit large errors, and the GA-XGBOOST model also shows small errors relative to the PSO-GA-XGBOOST model. Overall, the PSO-GA-XGBOOST model achieves the best fit with the lowest error.
Table 3 compares the RMSE (root mean square error), MAPE (mean absolute percentage error), MAE (mean absolute error), and R² (coefficient of determination) of the six models’ predictions on the test set. Across these four metrics, the PSO-GA-XGBOOST model shows better convergence and stability than the XGBOOST, SVM, BP neural network, RF, and GA-XGBOOST models, and significantly higher prediction accuracy, with an average improvement of 42.95% across the four metrics. Its use for predicting the dissemination of AI deepfake videos is therefore both reasonable and scientifically sound.
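For reference, the four evaluation metrics can be computed as in the following minimal sketch. The sample values are hypothetical, and MAPE is returned as a fraction rather than a percentage, which is an assumption about the reporting convention; it also requires all true values to be nonzero.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAPE (as a fraction), MAE, and R^2 for paired lists."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAPE": sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical true and predicted values.
metrics = regression_metrics([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5])
```

Lower is better for RMSE, MAPE, and MAE, while R² is better when closer to 1, so any averaged "improvement" across the four has to treat R² with the opposite sign.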
Interpretability analysis of AI deepfake video dissemination prediction models
As a distributed gradient boosting model, XGBOOST can report the importance of each feature, that is, the “contribution” each feature makes to improving the decision trees during modeling, which gives the model strong interpretability. The PSO-GA-XGBOOST model yields optimal weight values and relative contribution rates for the feature indicators, as illustrated in Fig. 8. The eight selected feature indicators fall into three tiers by optimal weight value and relative contribution rate: high, medium, and low.
(1) High-contribution feature indicators. Disseminator popularity and account technical content verticality contribute substantially, each with a relative contribution rate above 15%. Disseminator popularity has the highest relative contribution rate, 32.24%, and the largest optimal weight value, 1.162; it is the only indicator with a weight value above 1.
(2) Medium-contribution feature indicators. The relative contribution rates of four feature indicators (video duration, video title length, audiovisual matching dissemination technology level, and number of video tags) range between 5% and 15%, with optimal weight values between 0.3 and 0.5.
(3) Low-contribution feature indicators. The relative contribution rates of video theme category and visual dissemination technology level are below 5%, with optimal weight values below 0.2.
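The three-tier grouping above can be sketched as follows. The text does not define how the relative contribution rate is derived from the weight values; this sketch assumes it is each weight divided by the sum of all weights. The feature names and weights are hypothetical, and the treatment of values exactly at 5% or 15% is also an assumption.

```python
def relative_contributions(weights):
    """Assumed definition: each weight as a share of the total weight."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def tier(rate):
    """Map a relative contribution rate to the tiers used in the text."""
    if rate > 0.15:
        return "high"
    if rate >= 0.05:
        return "medium"
    return "low"


# Hypothetical weights, not the values reported in Fig. 8.
weights = {"feature_a": 1.2, "feature_b": 0.6, "feature_c": 0.2, "feature_d": 0.05}
rates = relative_contributions(weights)
tiers = {name: tier(rate) for name, rate in rates.items()}
```

Summing the rates of the features that belong to one dimension would give a dimension-level rate under the same assumption.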
Feature indicators can be aggregated by their respective dimensions to yield the relative contribution rate and optimal weight value for each dimension, as shown in Fig. 9. As illustrated in Fig. 9, for the AI deepfake video dissemination prediction model constructed in this study, the influence weights of the information subject factor and information factor are relatively large, while those of the information technology factor and information environment factor are relatively small. The empirical results indicate that the dissemination of AI deepfake videos in the Web 3.0 era continues to adhere to the principle of “content is king”, reflecting similarities with the propagation of cultural UGC during the Web 2.0 era8.
Conclusion
To address the risk of uncontrolled dissemination of AI deepfake videos in entertainment scenarios, this study fills a gap left by existing video dissemination prediction models, which focus primarily on PGC and UGC and do not specifically consider AIGC deepfake videos in the Web 3.0 era. Adopting an entertainment computing perspective and grounding its theoretical framework in the information ecosystem, the study identifies key predictive factors for AI deepfake video dissemination across four dimensions: information subject, information, information technology, and information environment. It also proposes a quantitative method for measuring the dissemination impact of AI deepfake videos. Feature selection is then performed with the RFECV-GA-PSO-RF hybrid algorithm to obtain the core features for training the prediction model, and a PSO-GA-XGBOOST ensemble prediction model is proposed to forecast the dissemination of AI deepfake videos. Leveraging XGBOOST’s inherent interpretability, the study identifies the key feature indicators and dimensions that influence the spread of AI deepfakes, revealing the underlying logic of their propagation. The proposed ensemble prediction model not only provides a novel predictive tool for the field of entertainment computing but also offers quantitative decision support for dissemination regulation and content ecosystem optimization in the era of intelligent entertainment. The empirical research is based on 338 AI deepfake videos collected from China’s Bilibili and Douyin platforms; it both proposes a relatively systematic method for constructing deepfake video datasets and validates the predictive effectiveness of the model. Finally, based on the interpretability of the ensemble model, the study analyzes the relative contribution levels of the feature indicators and dimensions, providing a theoretical basis for predicting and governing the dissemination of AI deepfake videos.
This paper presents a novel approach for predicting the dissemination of AI deepfake videos. Although the proposed model shows strong predictive performance on the empirical samples, there is room for improvement. Future research could draw on other theoretical perspectives to refine the feature indicator system, identify additional characteristics specific to AI deepfake technology, and improve model performance. In addition, this study compares the hybrid model only with traditional machine learning models, not with recently prominent models for video popularity or sequence prediction such as LSTM and GNNs. Future research could evaluate the model against these approaches more comprehensively.
Data availability
Data will be made available on request. If needed, please contact Jia Wang via email at wangjia1337@bupt.edu.cn.
References
Du, H. et al. Exploring collaborative distributed diffusion-based AI-generated content (AIGC) in wireless networks. IEEE Netw. 38 (3), 178–186 (2024).
Li, Z. et al. A survey on multimodal deepfake and detection techniques. J. Comput. Res. Dev. 60 (6), 1396–1416 (2023).
Mustak, M. et al. Deepfakes: Deceptions, mitigations, and opportunities. J. Bus. Res. 154, 113368 (2023).
He, K. et al. Cognitive Rashomon effect manufacturing: a case study of deepfake in Russia-Ukraine conflict. J. Mass. Soc. 1, 88–96 (2023).
Lee, Y. et al. To believe or not to believe: framing analysis of content and audience response of top 10 deepfake videos on YouTube. J. Cyberpsych Beh Soc. N. 24 (3), 153–158 (2021).
O’Donnell, N. Have we no decency? Section 230 and the liability of social media companies for deepfake videos. U. Ill. Law Rev. 2, 701–740 (2021).
Lasswell, H. D. The structure and function of communication in society. Comm. Ideas. 37 (1), 136–139 (1948).
Ni, Y. et al. Prediction of cultural UGC communication effectiveness from the perspective of multi-source heterogeneous data fusion: a combination modeling of GRA-PSO-WRF method. Manag Rev. 36 (11), 235–247 (2024).
Hsieh, J. K., Hsieh, Y. C. & Tang, Y. C. Exploring the disseminating behaviors of eWOM marketing: persuasion in online video. Electron. Commer. Res. 12 (2), 201–224 (2012).
Chaiken, S. Heuristic versus systematic information processing and the use of source versus message cues in persuasion. J. Pers. Soc. Psychol. 39 (5), 752–766 (1980).
Zhou, T., Liu, J. & Deng, S. Online knowledge video dissemination effects based on heuristic-systematic model. J. Mod. Inf. 44 (08), 61–68 (2024).
Fu, S., Su, Y. & Sun, J. Factors that influence the dissemination effects of short videos of refuting rumors based on heuristic-systematic model. J. Chn Soc. Sci. Tech. Inf. 43 (04), 457–469 (2024).
Hu, B. & Feng, C. Influencing factors of propagation effect of science short videos from a cognitive perspective. Stud. Sci. Sci. 41 (10), 1755–1764 (2023).
Ding, D. & Li, X. An empirical study on influencing factors of video dissemination of university library in Bilibili. Libr. Inf. Serv. 67 (21), 63–72 (2023).
Cen, C. H. et al. Enhancing the dissemination of Cantonese Opera among youth via Bilibili: a study on intangible cultural heritage transmission. Hum. Soc. Sci. Commun. 11 (1), 1038 (2024).
Petty, R. E. et al. The elaboration likelihood model of persuasion. In Advances in Experimental Social Psychology (ed. Berkowitz, L.) 123–205 (Academic Press, 1986).
Fu, S. & Cheng, Q. Research on the influencing factors of false short video dissemination from the perspective of content emotion: based on CAC and ELM dual models. Inf. Doc. Serv. 46 (02), 61–69 (2025).
Hou, Z. et al. A study on dissemination effect and influencing factors of short health education videos. Chn J. Health Educ. 40 (10), 913–918 (2024).
Liu, J. F., Lu, C. Y. & Lu, S. J. H. Research on the influencing factors of audience popularity level of COVID-19 videos during the COVID-19 pandemic. Healthcare 9 (9), 1159 (2021).
Ma, X. et al. Video popularity prediction model based on attention and neural network. J. Hefei Univ. Technol. Nat. Sci. 46 (11), 1472–1478 (2023).
Zhong, Z. et al. Modeling dynamics of online short video popularity based on Douyin platform. J. Univ. Electron. Sci. Technol. China. 50 (05), 774–781 (2021).
Brewer, S. M., Kelley, J. M. & Jozefowicz, J. J. A blueprint for success in the US film industry. Appl. Econ. 41 (5), 589–606 (2009).
Zhang, W. & Skiena, S. S. Improving movie gross prediction through news analysis. In 2009 IEEE/WIC/ACM Int. Conf. on Web Intelligence, WI 2009, Milan, Italy, September 2009, Main Conference Proceedings 15–18 (2009).
Dellarocas, C., Zhang, X. M. & Awad, N. F. Exploring the value of online product reviews in forecasting sales: The case of motion pictures. J. Interact. Mark. 21 (4), 23–45 (2007).
Cho, M., Jeong, D. & Park, E. Predicting popularity of short-form videos using multi-modal attention mechanisms in social media marketing environments. J. Retail Consum. Serv. 78, 103778 (2024).
Wu, W. et al. Deep attention video popularity prediction model fusing content features and Temporal information. J. Comput. Appl. 41 (7), 1878–1884 (2021).
Li, J., Guan, H. & Zhang, S. Modeling on playback volume prediction of self-produced programs of Chinese video websites. J. Inf. Commun. 24 (6), 7–20 (2017).
Zhu, H. et al. Study of short video popularity prediction based on network representation learning. J. Chn Soc. Sci. Tech. Inf. 43 (09), 1105–1115 (2024).
Lin, Y. T., Yen, C. C. & Wang, J. S. Video popularity prediction: an autoencoder approach with clustering. IEEE Access. 8, 129285–129299 (2020).
Halim, Z., Hussain, S. & Ali, R. H. Identifying content unaware features influencing popularity of videos on YouTube: A study based on seven regions. Expert. Syst. Appl. 206, 117836 (2022).
Sangwan, N. & Bhatnagar, V. A framework for video popularity forecast utilizing metaheuristic algorithms. Arab. J. Sci. Eng. 47 (2), 2077–2094 (2022).
Chang, N. et al. An empirical study on the trigger mechanism of public opinion communication of hot events in social media. Inf. Sci. 41 (11), 120–127 (2023).
Dou, Y. et al. Research on the influence of dissemination and interaction of public security affairs Douyin accounts based on information ecology theory and algorithm recommendation. Oper. Res. Manage. Sci. 33 (5), 9–15 (2024).
Xu, X. & Zhao, Z. Research on demand characteristics and participation behavior of intangible cultural heritage information of short video users: taking Huangmei Opera short video online review as an example. J. Mod. Inf. 42 (8), 74–84 (2022).
Wang, C. & Mang, L. How government short videos gain viral impact: a content analysis of government Douyin accounts. E-Gov 7, 31–40 (2019).
Huang, C. The development status and trend of short videos under the integration background. Front 23, 40–47 (2017).
Welbourne, D. & Grant, W. Science communication on YouTube: factors that affect channel and video popularity. Public. Underst. Sci. 25 (6), 706–718 (2016).
Zhao, L. The social construction of algorithmic practice: take an information distribution platform as an example. Sociol. Stud. 37 (4), 23–44 (2022).
Yang, D., Li, S. & Cong, Y. Research on influencing factors of transmission effect of reading promotion short videos on TikTok. Res. Libr. Sci. 23, 34–44 (2021).
Liu, G. & Wang, X. The influence of title features on the effect of digital media content communication: based on an empirical study of WeChat official accounts title in news commentary. J. Commun. Rev. 73 (6), 29–39 (2020).
Kui, J., Wang, L. & Liu, Y. A study on factors influencing users’ book purchase intentions via short videos. Chn Publ J. 6, 8–14 (2020).
Zhang, Y. et al. Empirical study on influencing factors of communication effect of scientific journal video on Bilibili. Chn J. Sci. Tech. Period. 35 (08), 1125–1133 (2024).
Shen, H. et al. Research on the dissemination effectiveness of emergency knowledge short videos: an analysis based on different types of publishers. Inf. Stud. Theor. Appl. 47 (11), 101–110 (2024).
Ning, H. & Yang, W. Empirical analysis of factors influencing the communication effectiveness of major public health emergencies: a case study of health-related government accounts on Douyin. Mod. Commun. 43 (1), 147–151 (2021).
Meng, S. et al. Research on the impact of COVID-19 science information dissemination on social media and the factors influencing the choice of crisis coping strategies: an empirical analysis based on popular micro-blog texts of scientists’ groups. Libr. Inf. Serv. 66 (13), 91–101 (2022).
Liu, C., Chen, M. & Yi, L. Information features extraction and the correlation calculation of the deep fake videos. J. Intell. 43 (8), 92–101 (2024).
Jiang, J. & Wang, W. Research on government Douyin for public opinion of public emergencies: comparison with government microblog. J. Intell. 39 (1), 100–106 (2020).
Yang, H., Li, X. & Hu, Z. Survey of deepfake face generation and detection technologies. J. Huazhong Univ. Sci. Technol. 53 (05), 85–103 (2025).
Krippendorff, K. Agreement and information in the reliability of coding. Commun. Methods Meas. 5 (2), 93–112 (2011).
Chen, W. & Zhou, Y. An empirical study on factors influencing dissemination effect of short videos in popular science journals in China: focusing on 50 Chinese outstanding popular science journals in 2020. Chn J. Sci. Tech. Period. 34 (12), 1616–1622 (2023).
Acknowledgements
This research was funded by the Social Science Foundation of Beijing, China (No. 25JCC128).
Author information
Contributions
Xiaofei Ma: Writing - review & editing, Writing - original draft, Conceptualization. Jia Wang: Writing - review & editing, Writing - original draft, Validation, Methodology, Conceptualization. Enyu Ji: Writing - review & editing, Conceptualization. Zhongyu Wang: Writing - review & editing, Conceptualization.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ma, X., Wang, J., Ji, E. et al. Prediction model for the dissemination of AI-generated deepfake videos in the intelligent entertainment paradigm. Sci Rep 16, 4733 (2026). https://doi.org/10.1038/s41598-025-34789-4
DOI: https://doi.org/10.1038/s41598-025-34789-4