Introduction

Under the Web 3.0 technological paradigm, Generative Artificial Intelligence (GAI) is reshaping the production logic of entertainment content, forming a new content ecosystem characterized by the triadic symbiosis of “technology-entertainment-users”. Artificial Intelligence Generated Content (AIGC) has emerged as a novel content generation model following Professional Generated Content (PGC) and User Generated Content (UGC)1. As a quintessential manifestation of GAI technology, deepfake technology employs generative adversarial networks (GAN) and diffusion models to achieve hyper-realistic forgeries of multimedia elements such as faces and voices2. The resulting deepfake videos have become among the most controversial technological artifacts in the digital entertainment sphere. Such videos profoundly influence contemporary entertainment practices through three key entertainment attributes: technological entertainment (creating surreal audiovisual experiences via algorithms), social entertainment (sparking viral dissemination on short-form video platforms), and ethical entertainment (deconstructing public figures’ images through playful satire). However, this technology-driven entertainment innovation faces a dual paradox. On the one hand, deepfake videos satisfy users’ primal craving for sensational entertainment through “technological deception”, spawning novel entertainment formats such as “deepfake celebrity impersonation shows” and “AI face-swap variety shows” on platforms like TikTok and YouTube. On the other hand, the black-box nature of their algorithms blurs the boundaries of entertainment authenticity, triggering governance crises such as the erosion of news veracity, disordered communication systems, and failed public discourse guidance3. This contradiction creates a unique research tension in the field of entertainment computing: how can technological means achieve a dynamic equilibrium between “entertainment innovation” and “dissemination security”? Consequently, effectively controlling the dissemination of AI deepfake videos and predicting their dissemination trends have become critical issues requiring urgent resolution, as well as new challenges posed by emerging entertainment paradigms in the AI era. Notable cases include: in March 2025, AI-generated propaganda featuring Academician Zhang Boli of the Chinese Academy of Engineering promoting skincare products circulated; in January 2024, a deepfake video of Hong Kong SAR Chief Executive John Lee Ka-chiu selling investment products began spreading; in May 2023, deepfake videos emerged showing Democratic candidate Hillary Clinton endorsing Republican candidate DeSantis and Biden expressing dissatisfaction with transgender individuals, disrupting the run-up to the 2024 U.S. presidential election; in March 2023, YouTube circulated a deepfake video of Ukrainian President Zelenskyy surrendering to Russia4.

The widespread dissemination of AI deepfake videos and their potential societal impacts cannot be overlooked: they can mislead public perception, damage personal reputations, undermine social trust, and even threaten national security5. Deployed as information weapons in great-power competition, deepfake videos pose a significant potential threat to national security, social stability, and public trust. Specifically, at the national security level, deepfake videos involving state leaders could severely damage a nation’s image and disrupt the international relations landscape, while deepfake military operation videos could influence military decision-making and arms control. At the social stability level, deepfake videos containing financial insider information or economic policy content could undermine economic order, while deepfake videos depicting ethnic discrimination or violence could threaten public safety. At the level of public trust, deepfake videos concerning critical issues such as human rights and ethnicity could undermine citizens’ political identity and trigger a crisis of public trust6.

To address the aforementioned issues, this study proposes an AI deepfake video dissemination prediction method based on the information ecosystem and PSO-GA-XGBOOST. First, feature elements for predicting AI deepfake video dissemination are identified using the information ecosystem theory. Next, the RFECV-GA-PSO-RF combined model is employed to screen these features, yielding core features for training the deepfake video dissemination prediction model. Finally, the PSO-GA-XGBOOST combined prediction model forecasts the dissemination of AI deepfake videos. Concurrently, leveraging XGBOOST’s inherent interpretability, it accurately identifies key feature indicators and dimensions influencing deepfake video dissemination, thereby revealing the underlying logical patterns governing their dissemination. This model offers three key advantages. First, during the feature element identification process, it integrates considerations of the technical characteristics that distinguish AI deepfake videos from other PGC and UGC videos, grounded in the theoretical framework of information ecosystems. This approach ensures a more comprehensive and targeted feature element identification process. Second, during feature selection, the RFECV-GA-PSO-RF combined model can systematically identify the feature subset that contributes most significantly to the target variable. This approach enables it to escape local optima, explore a broader feature space, and simultaneously enhance feature selection efficiency. Third, during predictive model construction, the PSO-GA-XGBOOST combined prediction model effectively addresses challenges such as XGBOOST’s feature selection difficulties, complex parameter tuning, and high overfitting risks, thereby enhancing the accuracy of predicting the dissemination of AI deepfake videos.

Literature review

Research on factors affecting video dissemination effectiveness

Existing research on factors influencing video dissemination effectiveness primarily revolves around three theoretical frameworks: the Lasswell 5 W communication model, the heuristic-systematic model, and the elaboration likelihood model. The Lasswell 5 W communication model, proposed by the American scholar Harold Lasswell in 19487, is a framework for analyzing the communication process. It identifies five fundamental elements: the communicator, the message, the channel, the audience, and the effect. Scholars have applied this theory to analyze the communication effects of cultural UGC videos8, online videos9, and similar content. The Heuristic-Systematic Model (HSM) is an information processing model proposed by the psychologist Shelly Chaiken in 198010 to explain users’ thinking and behavior when receiving and processing persuasive information; it comprises two processing modes, heuristic and systematic. Scholars have applied this theory to analyze the dissemination effects of online knowledge-based videos11, rumor-debunking short videos12, science popularization short videos13, university library videos14, and Cantonese opera videos15. The Elaboration Likelihood Model (ELM) is an information processing model proposed by the American psychologists Richard E. Petty and John T. Cacioppo in 198616; it explains the fundamental process by which users are persuaded and change their attitudes, comprising a central route and a peripheral route. Scholars have applied this theory to analyze the dissemination effects of false short videos17, health science popularization short videos18, and other content. Additionally, scholars have analyzed factors influencing video dissemination effectiveness based on information diffusion and technology diffusion models, with research subjects including COVID-19 videos19.

Research on methods for predicting video dissemination effectiveness

Regarding the prediction of video dissemination effectiveness, existing research primarily quantifies it through metrics such as video likes, comments, saves, and shares. Studies vary in how they summarize this composite indicator, with related terms including video popularity20,21; in this study, we uniformly refer to these metrics as video dissemination effectiveness. Prediction methods for video dissemination effectiveness fall into three categories of predictive modeling: statistical models, time series regression models, and machine learning models. Statistical models primarily employ least squares and linear regression for forecasting. For instance, Stephanie M. Brewer used least squares regression to reveal the relationship between video dissemination effectiveness and factors such as budget and reviews22; Wenbin Zhang integrated textual features into linear regression analysis, achieving a significant improvement in predictive performance23. Time series regression models predict outcomes by examining correlations before and after video dissemination. For instance, C. Dellarocas developed a time series prediction model grounded in diffusion theory that accounts for word-of-mouth’s impact on movie dissemination effectiveness, demonstrating superior predictive power compared to baseline models24. Machine learning models gradually improve prediction accuracy from large amounts of training data, minimizing the error between predicted outputs and true labels in order to predict the propagation effects of new video data; they include attention-based prediction models25,26, backpropagation neural networks27, network representation learning algorithms28, autoencoder algorithms29, and support vector machines30,31.

In summary, existing research has explored the factors influencing video dissemination effectiveness and predictive methodologies, yet several limitations remain. First, current studies on factors affecting video dissemination effectiveness primarily focus on PGC and UGC content, lacking analysis of AIGC. This fails to bridge the transition from the Web 2.0 era to Web 3.0, and neglects personalized consideration of emerging entertainment paradigms in the AI era. Second, due to the technical distinctiveness of AI deepfake videos compared to other video formats, existing video dissemination prediction models exhibit limitations in forecasting deepfake video dissemination. Their limited applicability results in suboptimal prediction outcomes. Therefore, there is an urgent need to integrate considerations of AI technological characteristics, identify key dissemination features of AI deepfake videos, and subsequently develop a high-precision prediction model for deepfake video dissemination.

Methodological framework

This paper first identifies the characteristic elements of deepfake video dissemination based on information ecosystem theory. It then collects multi-source data, extracts features for each indicator, and proposes a quantitative method for assessing the dissemination effectiveness of deepfake videos. Next, it employs the combined RFECV-GA-PSO-RF model to screen features, obtaining core features for training the deepfake video dissemination prediction model. Finally, it proposes the PSO-GA-XGBOOST combined prediction model to forecast the dissemination of AI deepfake videos. Leveraging XGBOOST’s inherent interpretability, the approach accurately identifies the key feature indicators and dimensions influencing AI deepfake video dissemination, revealing the underlying logic governing such dissemination. By making the dissemination of deepfake videos predictable, it helps keep AI technology secure and controllable, facilitating governance against the rampant dissemination of AI deepfake videos and balancing AI technological advancement with safety. The research framework is illustrated in Fig. 1.

Fig. 1. Research framework diagram.

Identifying key elements of AI deepfake video dissemination based on the information ecosystem theory

The information ecosystem is an artificial system composed of information subject, information, information technology, and the information environment, possessing certain self-regulating functions. Due to the interactions among these elements, they collectively drive information dissemination and influence the ultimate dissemination effectiveness. Consequently, it is frequently employed to analyze the information dissemination process32,33. The information subject serves as both the starting point and endpoint of the dissemination process, encompassing two roles: the sender and the receiver. The sender is primarily responsible for encoding information, selecting appropriate communication channels, and transmitting the message to the receiver. As the originator of the communication activity, the sender determines the nature, form, and method of the message’s delivery. As the endpoint of the communication process, the receiver is responsible for receiving and decoding the information transmitted by the sender. Furthermore, upon receiving the information, the receiver forms a feedback mechanism through responses or interactions, which in turn influences the sender’s subsequent communication behavior8. In the dissemination process of AI deepfake videos, “information” refers to the deepfake video as the disseminated content, “information technology” denotes the technical characteristics of the AI deepfake technology itself, and “information environment” signifies the information ecosystem in which deepfake videos operate. Based on the information ecosystem theory, this paper identifies nine characteristic elements of AI deepfake video dissemination across four feature dimensions: information subject, information, information technology, and information environment. The specific elements and their definitions are presented in Table 1.

The analysis of deepfake video dissemination proposed in this paper based on information ecosystem theory can also be understood as an advancement of the traditional Lasswell “5 W” communication theory in the information age. It employs information ecosystem theory to identify key elements of deepfake video dissemination, while selecting communication effects from Lasswell’s framework as the output variable. The input design incorporates not only Lasswell’s components—communicator, message, channel, and receiver—but also integrates information technology factors. This approach highlights the technical characteristics of generative AI deepfake technology and its pivotal role in the dissemination process.

Table 1 Characteristics and interpretive framework of AI deepfake video dissemination.

The information subject comprises two categories: disseminators and recipients. The attributes of both directly influence the effectiveness of video dissemination11. This study therefore employs disseminator popularity and user age distribution to represent the attributes of disseminators and recipients, respectively. Disseminator popularity is measured by the disseminator’s number of followers on the video platform13, while user age distribution is assessed by the proportion of users aged 30 and below among the platform’s user base34.

Information refers to the deepfake video content, where different characteristics of the content exert varying influences on its dissemination. Existing research indicates that video theme categories affect dissemination effectiveness35. For deepfake videos, the video theme category reflects both the content theme and the purpose of fabrication, potentially affecting dissemination outcomes. Regarding video duration, fragmentation is a core characteristic of videos in the information age, as users consume content during fragmented leisure time36; a study of the YouTube platform revealed a significant negative correlation between video duration and user attention37, and user attention is a key factor influencing information dissemination38. Regarding video title length, existing research indicates a significant impact on dissemination effectiveness39: title length influences readership40, thereby affecting video propagation. Concerning the number of video tags, content creators reduce search costs and enhance users’ assessment of content value by adding genre-specific tags at publication41, and studies confirm that tag quantity is a crucial factor influencing video dissemination effectiveness42.

Information technology factors constitute the distinctive element that differentiates the dissemination of AI deepfake videos from other types of video content. The developmental objective of AI deepfake technology lies in generating highly realistic multimodal content such as images and videos. Therefore, when analyzing the impact of this technology on the dissemination effectiveness of deepfake videos, the examination primarily focuses on two aspects: visual dissemination technology factors and audiovisual matching dissemination technology factors. Among these, visual dissemination technology factors primarily refer to image generation and processing techniques, examining the authenticity of video footage, image detail levels, color balance, lighting effects, and consistency in dynamic imagery—that is, the degree to which images appear lifelike and whether they exhibit obvious artificiality. Audiovisual matching dissemination technology factors primarily refer to techniques ensuring temporal, content, and emotional consistency between audio and video content generated via deepfake technology. This evaluates the degree of audio-visual alignment, synchronization, and coordination within videos, reflecting the difficulty in discerning deepfake content.

The information environment refers to the information ecosystem in which deepfake videos exist, specifically the channels through which they are disseminated. It serves as the specific medium for the realization of dissemination activities. Drawing upon Zhang Ying’s research42, this study analyzes how differences in video account characteristics influence the dissemination effectiveness of AI-generated deepfake videos. Analogous to Hu Bing’s mechanism analysis of how content verticality influences video dissemination effectiveness13, this study selects account technical content verticality as the personalized factor distinguishing deepfake video dissemination from other UGC and PGC videos. It quantifies this factor using the proportion of AI-related videos within an account’s content, thereby analyzing the impact of the information environment on deepfake video dissemination.

Quantifying the dissemination effectiveness of AI deepfake videos

Assessing the dissemination effectiveness of AI deepfake videos requires comprehensive consideration of multiple indicators; it cannot be quantified through a single metric alone. This study uses the Bilibili and Douyin platforms as data collection sources for deepfake videos and therefore, in quantifying dissemination effectiveness, follows the methodology of Shen Hongzhou’s research43, which likewise relies on Bilibili and Douyin as video data collection platforms. We adopt a currently mainstream measure of dissemination effectiveness44,45, calculated from likes, comments, and shares, as shown in Eq. (1). This calculation is applicable to integrated dissemination effectiveness data from both the Bilibili and Douyin platforms; the constant 1 avoids taking the logarithm of zero.

The dissemination effectiveness s is computed as

$$s=\ln\left(0.5\times \text{Number of shares}+0.3\times \text{Number of likes}+0.2\times \text{Number of comments}+1\right)$$

(1)
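As a minimal illustration of Eq. (1), the following Python snippet computes s from raw engagement counts; the function name and example counts are purely illustrative.

```python
import numpy as np

def dissemination_effectiveness(shares: int, likes: int, comments: int) -> float:
    """Eq. (1): weighted log of engagement counts; the +1 avoids ln(0)."""
    return float(np.log(0.5 * shares + 0.3 * likes + 0.2 * comments + 1))

# Illustrative example: a video with 1,200 shares, 8,500 likes, 640 comments.
print(round(dissemination_effectiveness(1200, 8500, 640), 3))  # ≈ 8.095
```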

Feature selection for AI deepfake video dissemination prediction based on RFECV-GA-PSO-RF

To effectively predict the dissemination trends of deepfake videos, this paper proposes a hybrid feature selection method (RFECV-GA-PSO-RF) based on Recursive Feature Elimination with Cross-Validation (RFECV), the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Random Forest (RF). Through multi-stage feature selection and optimization, this method eliminates redundant features, enhancing the performance and interpretability of the deepfake video dissemination prediction model. The specific steps are as follows; a code sketch follows Step 6.

Step 1: Divide the dataset into training and testing sets.

Step 2: Use Random Forest (RF) as the base model for preliminary training on the extracted features.

Step 3: Evaluate each feature’s contribution to model prediction based on its importance score within the RF model.

Step 4: Gradually eliminate less important features using the RFECV algorithm, evaluating model performance with cross-validation after each removal. Specifically, in each iteration, remove one or more of the least important features, then retrain the model and compute the cross-validation score. Repeat this process until either a preset number of features is reached or model performance no longer improves significantly. Through RFECV, a feature subset is obtained that minimizes the number of features while ensuring model performance.

Step 5: Encode the feature subset selected by RFECV, where each feature corresponds to a gene position whose value is 0 or 1, indicating whether the feature is selected. Randomly generate a set of individuals, each representing a feature combination, to form the initial population. Design a fitness function to evaluate the quality of each individual. Based on the fitness values, select individuals with higher fitness for reproduction, and generate new individuals through crossover and mutation operations. Repeat the selection, crossover, and mutation operations until the preset number of iterations is reached or the convergence criteria are satisfied. Through GA optimization, the feature space can be further explored to discover better feature combinations.

Step 6: Use the feature combinations optimized by GA as the initial particle swarm, where each particle represents a feature combination. Assign an initial velocity to each particle. Update the velocity and position of each particle based on its own historical optimal position and the global optimal position of the swarm. Repeat the velocity and position update operations until the preset iteration count is reached or convergence criteria are satisfied. Through PSO optimization, further fine-tune the feature combinations to enhance the model’s performance.
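The sketch below illustrates Steps 2–5 under simplifying assumptions: placeholder data stand in for the normalized features and dissemination effectiveness, cross-validated R² serves as the fitness function, and the GA settings are illustrative rather than the study’s exact configuration. Step 6 (PSO fine-tuning) would reuse the same fitness function, with continuous particle positions thresholded to binary masks.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((338, 9))   # placeholder for the nine normalized features
y = rng.random(338)        # placeholder dissemination effectiveness s

rf = RandomForestRegressor(n_estimators=50, random_state=0)

# Steps 2-4: RFECV prunes features by RF importance under cross-validation.
mask0 = RFECV(rf, step=1, cv=5, scoring="r2").fit(X, y).support_

def fitness(mask):
    """Step 5 fitness: cross-validated R^2 of RF on the masked feature subset."""
    if not mask.any():
        return -np.inf
    return cross_val_score(rf, X[:, mask], y, cv=5, scoring="r2").mean()

# Step 5: GA over binary feature masks, seeded near the RFECV subset.
pop = np.array([mask0 ^ (rng.random(9) < 0.1) for _ in range(16)])
for _ in range(10):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-8:]]          # selection: keep top half
    children = []
    for _ in range(8):
        a, b = parents[rng.integers(8, size=2)]     # pick two parents
        cut = rng.integers(1, 9)
        child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
        child ^= rng.random(9) < 0.1                # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected feature indices:", np.flatnonzero(best))
```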

AI deepfake video dissemination prediction based on PSO-GA-XGBOOST

This paper employs a PSO-GA-XGBOOST model to fit the selected core features for predicting the dissemination of deepfake videos generated by artificial intelligence. XGBOOST is a powerful gradient boosting tree model, but practical applications often encounter challenges such as difficulty in feature selection, complex parameter tuning, and high risks of overfitting. To address this, this study proposes a hybrid optimization algorithm based on genetic algorithms (GA) and particle swarm optimization (PSO). The hybrid optimization process is similar to that of the aforementioned hybrid optimized random forest algorithm. In this approach, genetic algorithms can retain the features most influential on prediction outcomes, thereby enhancing model accuracy, while eliminating redundant features to reduce model complexity. Particle Swarm Optimization efficiently searches for the optimal hyperparameter combination of XGBOOST within the continuous parameter space. Simultaneously, it avoids getting stuck in local optima, thereby improving the stability of parameter tuning.
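As a hedged sketch of how PSO can drive this search, the objective below returns the cross-validated RMSE of an XGBOOST regressor for one candidate hyperparameter vector; the function name, CV scheme, and choice of four tuned hyperparameters mirror the experimental settings reported later but are otherwise illustrative.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

def xgb_rmse(params, X_train, y_train):
    """Cross-validated RMSE of XGBOOST for one particle's position vector."""
    n_estimators, max_depth, min_child_weight, reg_lambda = params
    model = XGBRegressor(n_estimators=int(n_estimators),
                         max_depth=int(max_depth),
                         min_child_weight=min_child_weight,
                         reg_lambda=reg_lambda,
                         random_state=42)
    neg_mse = cross_val_score(model, X_train, y_train, cv=5,
                              scoring="neg_mean_squared_error")
    return float(np.sqrt(-neg_mse.mean()))  # PSO minimizes this value
```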

Experiments and analysis of results

Data sources and preprocessing

In the Web 3.0 era, the dissemination of deepfake videos is a process involving the interaction of four factors: information subject, information, information technology, and information environment. Relevant data originates from industry research reports, video platform operators, the deepfake videos themselves, and video users. This study references Shen Hongzhou’s analysis of the dissemination effectiveness of emergency knowledge short videos43, similarly selecting Bilibili and Douyin as the two mainstream platforms for video dissemination research. During video data collection, since no vertical channel for “deepfake videos” has been established across these platforms and video labeling lacks standardization and consistency, instances persist where users generate false videos using AI synthesis technology without explicitly indicating “AI” in video titles or tags. Therefore, this study builds upon Liu Chunnian’s deepfake video retrieval methodology46 while further refining search parameters. Keywords including “deepfake”, “AI synthesis”, and “AI-generated” were used to retrieve videos on Bilibili and Douyin. We selected videos from the “Comprehensive Ranking” and “Most Played” lists under each keyword on Bilibili, and from the “Comprehensive Ranking” and “Most Liked” lists under each keyword on Douyin. These videos demonstrate high influence and attention, indicating the samples possess a degree of representativeness. Additionally, these deepfake videos originate from different account entities, suggesting the samples exhibit diversity and heterogeneity.

The selection criteria for the aforementioned deepfake videos are as follows. (1) Authentic videos related to deepfake technology—such as science popularization content, news reports, awareness campaigns, and video generation tutorials—are excluded. (2) Ordinary special-effects videos created with editing and compositing software to add animations, transitions, filters, or other effects that alter neither the fundamental content nor the character features of the video, and whose effects are relatively easy to distinguish from real content, are excluded. (3) Videos featuring virtual digital humans are excluded, owing to significant technical differences from deepfake videos and their generally discernible nature. (4) Duplicate samples obtained from the same video platform under different search conditions are excluded. (5) Duplicate deepfake video samples across different platforms are retained, because variations in the dissemination effectiveness of identical content across platforms illustrate the impact of platform factors, user group characteristics, and account attributes on deepfake video propagation. (6) Samples exhibiting account anomalies—such as deleted disseminator accounts or closed comment sections—are excluded. After screening, the initial dataset comprised 344 deepfake videos: 248 from the Bilibili platform and 96 from the Douyin platform. It should be noted that disseminators exhibit preferences when selecting video platforms, and platforms themselves gradually develop distinct content positioning—including preferences for specific video genres—during their evolution. These factors likely explain the non-equivalent distribution of video samples across platforms, a phenomenon confirmed by existing dual-platform studies45. The imbalanced sample obtained through these screening criteria is therefore reasonable and consistent with actual circumstances.

Considering the “long-tail effect” and “seven-day effect” of online information dissemination47, and given that actual observation of the video samples shows their data tends to stabilize one month after publication12, the collection period for this study’s AI deepfake video dataset spans November 6, 2024, to December 6, 2024. During this period, sample data was reviewed weekly to verify continued existence; video samples missing from any of the four weekly checks within the month were excluded. The retained samples constitute the video dataset for this study, representing deepfake videos capable of stable dissemination and exerting a certain influence on the external environment. Initial sampling identified 344 video samples. After the one-month observation period, the final dataset comprised 338 video samples: 246 from the Bilibili platform and 92 from the Douyin platform. Some of the video samples are shown in Figs. 2 and 3.

Fig. 2. Some Bilibili platform video samples.

Fig. 3. Some Douyin platform video samples.

Quantification and normalization of feature elements

Among the AI deepfake video dissemination characteristics identified in Table 1, the technology-related features are the elements that distinguish deepfake video dissemination from other UGC and PGC video dissemination: visual dissemination technology level, audiovisual matching dissemination technology level, and account technical content verticality. Of these, the visual dissemination technology level and audiovisual matching dissemination technology level require manual coding for quantification. Detection and identification technologies for deepfake techniques remain under development, and no software or tools can yet evaluate visual deepfake technology and audiovisual synchronization technology with high accuracy48; auditory and visual technical features extracted automatically by computers exhibit significant errors, making them unsuitable for the rigorous requirements of empirical research.

This study follows Fu Shaoxiong’s methodology17 and quantifies the visual and audiovisual matching technical characteristics of deepfake videos through manual coding. The coders were five doctoral candidates in Management Science and Engineering at the School of Economics and Management, possessing strong research foundations and information literacy. After training and trial coding sessions, all coders had mastered the coding requirements. In the formal coding phase, each coder scored the visual dissemination technology and audiovisual matching dissemination technology of each deepfake video on a scale of 1 to 7 after viewing it, with higher scores indicating more advanced deepfake technology. A score of 1–3 indicates low-level deepfake technology: relatively crude techniques, low naturalness, and easy detection. A score of 4 indicates medium-level technology: improved techniques and acceptable naturalness, but with remaining flaws and moderate detection difficulty. A score of 5–7 indicates high-level technology: sophisticated techniques, high naturalness, and high detection difficulty. Although the single score of 4 may seem a narrow band for the medium level, it sits exactly at the midpoint of the 1–7 range and serves as a core reference point: coders weighed subtle differences across the technical indicators and used 4 to identify and categorize borderline cases, ensuring accurate and objective evaluations. This division therefore meets the precision requirements of this study’s assessment of deepfake technology levels.

After coding, consistency checks were performed. The Krippendorff’s alpha coefficients for the visual dissemination technology and audiovisual matching dissemination technology coding results were 0.859 and 0.820, respectively, both exceeding 0.7, indicating good consistency49. The video theme category feature also required manual tagging; the specific coding scheme is shown in Table 2.

All categorical variables in this study were measured through manual coding. The coders were doctoral candidates in Management Science and Engineering with strong research expertise. Given the objective nature of the coding items, a double-blind coding system was employed: initial coding by one coder followed by verification by another, with discrepancies resolved by a third coder and the final coding subjected to consistency testing. After obtaining quantitative data for all feature elements, min-max normalization was applied. For the video theme category, an ordinal categorical variable, frequency coding was performed first, followed by normalization.
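The following minimal sketch illustrates this preprocessing, assuming a pandas DataFrame df whose column names are illustrative; frequency coding replaces each theme category with its sample proportion before min-max scaling.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Illustrative data: an ordinal theme category plus one numeric feature.
df = pd.DataFrame({"theme_category": ["politics", "finance",
                                      "politics", "entertainment"],
                   "video_duration": [45, 120, 30, 600]})

# Frequency-encode the ordinal theme category before normalization.
df["theme_category"] = df["theme_category"].map(
    df["theme_category"].value_counts(normalize=True))

# Min-max normalization of all feature columns to [0, 1].
df[df.columns] = MinMaxScaler().fit_transform(df)
print(df)
```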

Table 2 Coding categories for key features in the dissemination of AI deepfake videos.

Core feature screening results

To achieve better model fitting and enhance the generalization capability of the prediction model, this study employs the RFECV algorithm to identify the optimal number of feature variables. Experimental results indicate that performance stabilizes when one of the original nine feature variables is removed, i.e., when eight features are retained, as shown in Fig. 4. Therefore, the combined optimized RFECV-GA-PSO-RF algorithm was employed to retain the eight most influential of the nine features listed in Table 1 for subsequent experiments, excluding the “user age distribution” feature. The feature importance ranking is illustrated in Fig. 5.

Fig. 4. Iterative process diagram of root mean square deviation.

Fig. 5. Feature element importance ranking.

Model construction results

Model implementation

This study extracted 34 samples from the dataset of 338 entries to form the test set, with the remaining 304 entries serving as the training set. To objectively evaluate model performance, the XGBOOST, SVM, BP neural network, RF, and GA-XGBOOST models were selected as baselines for comparison with the model developed in this paper. All six models use the selected eight feature indicators as inputs and the dissemination effectiveness of AI deepfake videos as the output. Each model was implemented in Python with the following parameter settings:

The training set proportion for the PSO-GA-XGBOOST model is set to train_size = 0.900. The GA parameters are configured as follows: maximum iteration count max_num_iteration = 50, population size population_size = 30, mutation probability mutation_probability = 0.1, elite ratio elit_ratio = 0.01, crossover probability crossover_probability = 0.5, parents_portion = 0.3, and early-termination condition max_iteration_without_improv = 10. The convergence curve is shown in Fig. 6. Next, based on the GA optimization results, further optimization is performed with the PSO algorithm using swarm size swarmsize = 30, maximum iterations maxiter = 30, and variable dimension dim = 4. The search bounds are: 50 to 1000 for the number of trees (n_estimators), 3 to 15 for the maximum tree depth (max_depth), 1 to 10 for the minimum leaf node weight (min_child_weight), and 0 to 2 for the L2 regularization coefficient (reg_lambda).
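Under these settings, the PSO stage can be sketched with the pyswarm library, whose pso function exposes the swarmsize and maxiter parameters named above; xgb_rmse is the illustrative objective from the earlier sketch, and the bounds correspond to the four tuned hyperparameters.

```python
from pyswarm import pso  # one possible PSO implementation

# Bounds for n_estimators, max_depth, min_child_weight, reg_lambda.
lb = [50, 3, 1, 0]
ub = [1000, 15, 10, 2]

best_params, best_rmse = pso(xgb_rmse, lb, ub,
                             args=(X_train, y_train),  # training data assumed
                             swarmsize=30, maxiter=30)
print("optimal hyperparameters:", best_params, "CV-RMSE:", best_rmse)
```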

Fig. 6. Convergence curve of the GA algorithm.

The comparative base model settings are as follows: First, in the SVM model, the regularization parameter C = 1.0 and the loss function tolerance epsilon = 0.1; Second, in the BP neural network, the hidden layer structure is set to 100 neurons, i.e., hidden_layer_sizes = 100, with a maximum iteration count max_iter = 1000; Third, the number of trees in the RF model is set to n_estimators = 100.

Model performance evaluation and comparison

The numerical fitting results of the PSO-GA-XGBOOST, XGBOOST, SVM, BP neural network, RF, and GA-XGBOOST models for predicting the dissemination effectiveness of AI deepfake videos on the test set are shown in Fig. 7. As shown in Fig. 7, the XGBOOST, SVM, BP neural network, and RF models exhibit significant errors, and even the GA-XGBOOST model shows small errors relative to the PSO-GA-XGBOOST model. Overall, the PSO-GA-XGBOOST model achieves the best fitting accuracy with the lowest error.

Fig. 7. Comparison of prediction results for the test set across different models.

Table 3 compares the RMSE (Root Mean Square Error), MAPE (Mean Absolute Percentage Error), MAE (Mean Absolute Error), and R² (Coefficient of Determination) of the six models’ predictions on the test set. Across these four evaluation metrics, the PSO-GA-XGBOOST model demonstrates greater convergence and stability than the XGBOOST, SVM, BP neural network, RF, and GA-XGBOOST models. In summary, the PSO-GA-XGBOOST model delivers significantly superior prediction accuracy, achieving an average improvement of 42.95% across the four evaluation metrics, which makes its application to predicting the dissemination of AI deepfake videos both reasonable and scientifically sound.

Table 3 Comparison table of evaluation metrics for test set predictions across different models.
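For reference, the four metrics in Table 3 can be computed as in the following sketch, where y_true and y_pred are assumed to hold the test-set dissemination effectiveness values and a model’s predictions.

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error,
                             mean_absolute_percentage_error,
                             mean_squared_error, r2_score)

def evaluate(y_true, y_pred):
    """RMSE, MAPE, MAE, and R^2 as reported in Table 3."""
    return {"RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
            "MAPE": float(mean_absolute_percentage_error(y_true, y_pred)),
            "MAE": float(mean_absolute_error(y_true, y_pred)),
            "R2": float(r2_score(y_true, y_pred))}
```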

Interpretability analysis of AI deepfake video dissemination prediction models

As a distributed gradient boosting model, XGBOOST can determine the importance of each feature, intuitively showing the “contribution” each feature makes to the boosted trees during modeling, and thereby offers strong interpretability. The PSO-GA-XGBOOST model yields optimal weight values and relative contribution rates for the feature indicators, as illustrated in Fig. 8. As shown in Fig. 8, the eight selected feature indicators fall into three tiers of contribution: high, medium, and low.
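Before turning to the tiered results, the following minimal sketch shows how such importances can be read from a fitted model; model is assumed to be the XGBRegressor trained with the PSO-GA-optimized hyperparameters, and feature_names is an illustrative list of the eight core features.

```python
import pandas as pd

# Relative contribution rate (%) of each feature in the fitted model.
importances = pd.Series(model.feature_importances_, index=feature_names)
contribution = importances / importances.sum() * 100
print(contribution.sort_values(ascending=False).round(2))
```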

(1) Key indicators for high-level contribution. The disseminator popularity and the account technical content verticality demonstrate significant contributions, each exceeding 15% relative contribution rate. Among these, disseminator popularity exhibits the highest relative contribution rate at 32.24%, with the largest optimal weight value of 1.162—the sole indicator exceeding a weight value of 1.

(2) Moderately contributing feature indicators. The relative contribution rates of the four feature indicators—video duration, video title length, audiovisual matching dissemination technology level, and number of video tags—range between 5% and 15%, with optimal weight values between 0.3 and 0.5. These are classified as moderately contributing feature indicators.

(3) Low-level contribution indicators. The relative contribution rates of video theme category and visual dissemination technology level are below 5%, with optimal weight values below 0.2, classifying them as low-contribution indicators.

Fig. 8. Optimal weight values and relative contribution rates of feature indicators.

Feature indicators can be aggregated by their respective dimensions to yield the relative contribution rate and optimal weight value for each dimension, as shown in Fig. 9. As illustrated in Fig. 9, for the AI deepfake video dissemination prediction model constructed in this study, the influence weights of the information subject factor and information factor are relatively large, while those of the information technology factor and information environment factor are relatively small. The empirical results indicate that the dissemination of AI deepfake videos in the Web 3.0 era continues to adhere to the principle of “content is king”, reflecting similarities with the propagation of cultural UGC during the Web 2.0 era8.
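A final sketch aggregates the feature-level contributions (the contribution Series from the previous sketch, indexed by the shorthand names below) into the four information-ecosystem dimensions; the English feature names are illustrative stand-ins for the indicators in Table 1.

```python
# Map each retained feature to its information-ecosystem dimension (Table 1).
dimension_of = {
    "disseminator_popularity": "information subject",
    "video_theme_category": "information",
    "video_duration": "information",
    "video_title_length": "information",
    "video_tag_count": "information",
    "visual_tech_level": "information technology",
    "audiovisual_matching_tech_level": "information technology",
    "account_technical_verticality": "information environment",
}
by_dimension = contribution.groupby(dimension_of).sum()
print(by_dimension.sort_values(ascending=False))
```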

Fig. 9. Optimal weight values for feature dimensions and their relative contribution rates.

Conclusion

Addressing the risk of uncontrolled dissemination of AI deepfake videos in entertainment scenarios, this study fills a gap left by existing video dissemination prediction models, which focus primarily on PGC and UGC content and lack personalized consideration of AIGC deepfake videos in the Web 3.0 era. Adopting an entertainment computing perspective and grounding its theoretical framework in the information ecosystem, the study identifies key predictive factors for AI deepfake video dissemination across four dimensions: information subject, information, information technology, and information environment. It also proposes a quantitative methodology for measuring the dissemination effectiveness of AI deepfake videos. Next, feature selection is performed using the RFECV-GA-PSO-RF ensemble model to obtain core features for training the deepfake video dissemination prediction model. Finally, we propose the PSO-GA-XGBOOST ensemble prediction model to forecast the dissemination of AI deepfake videos. Leveraging XGBOOST’s inherent interpretability, we identify the key feature indicators and dimensions influencing the spread of AI deepfakes, revealing the underlying logic governing their propagation. The proposed ensemble prediction model not only provides a novel predictive tool for the field of entertainment computing but also offers quantitative decision support for dissemination regulation and content ecosystem optimization in the era of intelligent entertainment. The study’s empirical research draws on 338 AI deepfake videos collected from China’s Bilibili and Douyin platforms; it proposes a relatively systematic method for constructing deepfake video datasets and validates the model’s predictive effectiveness. Finally, based on the interpretability of the ensemble model, it analyzes the relative contributions of the feature metrics and dimensions, providing a theoretical basis for predicting and governing the dissemination of AI deepfake videos.

This paper presents a novel approach for predicting the dissemination of AI deepfake videos. Although the proposed model demonstrates strong predictive performance on the empirical samples, room for optimization remains. Future research could explore different theoretical perspectives to further refine the feature indicator system, identify additional characteristics aligned with AI deepfake technology, and enhance model performance. In addition, this paper compares the hybrid model only against traditional machine learning models, not against recently prominent models for video popularity or sequence prediction such as LSTMs and GNNs; future research could benchmark against these to more comprehensively evaluate the model’s advancement.