Introduction

In the context of current education reform and comprehensive promotion of quality education, the evaluation of college students’ mental health quality has become a key research field. This process not only focuses on the actual situation of students’ psychological development, but also provides a basis for formulating standards for educational quality1. The establishment of a psychological health quality evaluation system not only helps to supervise students’ academic progress within schools, but also provides important references for the industry to improve education quality standards in the future. Although traditional mental health assessment methods can to some extent reflect students’ psychological state, they usually rely on standardized tests and questionnaire surveys, which are susceptible to subjective bias and have limitations in data processing and analysis abilities2,3. In contrast, Generative Adversarial Network (GAN), as a deep learning algorithm, has stronger data modeling and generation capabilities. GAN can not only learn complex mental health features from large amounts of data, but also generate samples that are highly similar to real data to improve and validate mental health assessment models. The motivation for choosing GAN as the core algorithm component instead of traditional methods or other generative models lies in its unique adversarial training mechanism. This mechanism enables GANs to capture more implicit features when processing non-linear, high-dimensional mental health data, improving the accuracy and reliability of the model. In addition, GAN also has the ability to self improve. With the increase of training data, the quality and diversity of generated samples will continue to improve, thus better reflecting the actual mental health status of college students. 
These advantages give GAN significant potential in the evaluation of mental health quality and provide technical support for building a more scientific, fair, and transparent evaluation system4.

Text generation methods are categorized into extraction-based and neural network-based approaches5. Extraction-based methods center on learned patterns and often rely on comprehensive knowledge systems built through substantial human effort. These models offer high interpretability and robustness, but their performance is contingent on the quality of the extraction design; they remain prevalent in various commercial applications6. Yuan (2021) proposed an automatic composition generation framework based on extraction7. Zahila et al. (2021) introduced a multi-text summarization model grounded in content extraction. Biswas et al. (2021) presented a Wikipedia generation model employing article extraction and template learning8,9. In contrast, neural network-based text generation employs distributed representations in the network and context encoding with long-distance dependencies10. This approach has gained prominence, particularly with the advent of the Generative Adversarial Network (GAN)11 (Cernis et al., 2022), which has stimulated advances in adversarial training, an unsupervised learning mode demonstrating efficacy in tasks such as virtual information cancellation and image super-resolution processing12. Shui et al. (2022) established a new theoretical framework by directly proving upper and lower bounds on the target risk based on the Jensen-Shannon divergence of the joint distributions, and further derived bidirectional upper bounds for marginal and conditional shifts. The proposed framework is flexible across different transfer learning problems and can be used in various scenarios13. The theory of Shui et al. (2020) provided a general guideline that unified the principles of semantic conditional matching, feature marginal matching, and label marginal shift correction, and empirically validated the benefits of the framework on real datasets14. Sala et al. (2021) suggested a Maximum Likelihood Enhancement GAN for discrete data, utilizing a logarithmic likelihood-based output to derive a low-variance target, directly quantifying the disparity between the generated and actual data distribution15.

Work on text generation has made progress on issues such as controllability, correctness, and diversity. However, these challenges have not been completely resolved, especially in neural network-based text generation models, which leaves significant room for further work. This study applies the GAN and the Sequence Generative Adversarial Network (SeqGAN) from deep learning to evaluate the quality of students’ mental health. A reward function is incorporated throughout the evaluation process. The research therefore addresses not only the assessment of mental health quality but also the methodological and technological aspects of such assessment.

Method and model design

GAN and SeqGAN

GAN

The GAN functions as a structural framework rather than a specific network model16. The schematic representation of this network is delineated in Fig. 1.

Fig. 1

The overall structure of the GAN framework.

In Fig. 1, the architecture of the GAN is depicted, consisting primarily of two components: the “Generator” and the “Discriminator”17. Within this configuration, the generator’s sole responsibility is to produce data conforming to the authentic sample distribution without formal constraints18. While the generator can function independently of a discriminator, the training objectives diverge. When operating autonomously, the generator necessitates the definition of a loss function. Nevertheless, manually defined loss functions exhibit limited representational capacity and are often unilaterally focused, resulting in model convergence and outcomes that may fall short of expectations, particularly in tasks related to text processing. GAN offers a partial remedy to this issue. The role of the discriminator within this framework is akin to that of a dynamically self-updating objective function. Its primary objective is to minimize the probability of the generated sample being classified as authentic and the probability of the genuine sample being deemed false. Adversarial training dictates that the updates to the generator are contingent upon the discriminator’s outcomes, aiming to minimize the distribution disparity between the generated and authentic samples19.

Within the GAN framework, the generator is conceptualized as a function \(G(\cdot)\). The sample distribution generated by \(G(\cdot)\) is denoted as \(p_{g}(x)\), while the distribution of real samples is denoted as \(p_{data}(x)\). The discriminator is abstracted as a function \(D(\cdot)\), with its output being a scalar representing a binary outcome. Given sufficient data sampling, \(D(\cdot)\) can implicitly characterize the distribution gap between \(p_{g}(x)\) and \(p_{data}(x)\) within the sample space20.

The overarching optimization objective of GAN is to determine a generating network G*, as illustrated in Eq. (1):

$$\:{G}^{*}=arg\underset{G}{{min}}\underset{D}{{max}}V(G,D)$$
(1)

The function \(V(G,D)\) measures how well the discriminator separates the two distributions, where \(\underset{D}{\max}V(G,D)\) represents the distinction between the real data distribution \(p_{data}(x)\) and the generated data distribution \(p_{g}(x)\), as expressed in Eq. (2):

$$V\left(G,D\right)=\mathbb{E}_{x\sim{p}_{data}(x)}\left[\log D(x)\right]+\mathbb{E}_{x\sim{p}_{g}(x)}\left[\log \left(1-D(x)\right)\right]$$
(2)

The GAN is solved in two sequential phases. First, \(D^{*}\) is determined to maximize \(V(G,D)\) for a fixed generator; then \(G^{*}\) is sought to minimize the gap between \(p_{g}(x)\) and \(p_{data}(x)\).

In the event that the discriminator function is continuous, leveraging the relationship between the probability density function and the expectation function allows the transformation of Eq. (2) into Eq. (3):

$$\:V\left( {G,D} \right) = \int {\left[ {{p_{data}}(x)logD(x) + {p_g}\left( x \right){\text{log}}(1 - D(x))} \right]dx}$$
(3)

With the generator held fixed, the discriminator maximizes Eq. (3). Differentiating the integrand of Eq. (3) with respect to \(D(x)\) and setting the derivative to zero gives the extreme value, so the optimal discriminator takes the form in Eq. (4):

$$\:{D}^{*}\left(x\right)=\frac{{p}_{data}\left(x\right)}{{p}_{data}\left(x\right)+{p}_{g}\left(x\right)}$$
(4)
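The step from Eq. (3) to Eq. (4) can be made explicit pointwise. The following is a short sketch of the standard argument, assuming \(p_{data}(x)\) and \(p_{g}(x)\) are both positive at the point \(x\); the shorthand \(a\), \(b\), and \(f\) is introduced here only for illustration.

$$f(D)=a\log D+b\log (1-D),\quad a={p}_{data}(x),\; b={p}_{g}(x),\; D\in (0,1)$$

$$f'(D)=\frac{a}{D}-\frac{b}{1-D}=0\;\Longrightarrow\; D=\frac{a}{a+b}$$

$$f''(D)=-\frac{a}{D^{2}}-\frac{b}{(1-D)^{2}}<0$$

Because the second derivative is negative, the stationary point is a maximum, and it has the form of \(D^{*}(x)\) in Eq. (4).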

The solution derived from Eq. (4) is subsequently substituted into \(\:\underset{D}{{max}}V(G,D)\), yielding the expression delineated in Eq. (5):

$$V\left(G,{D}^{*}\right)=-\log 4+2\cdot JSD\left({p}_{data}(x)\,\|\,{p}_{g}(x)\right)$$
(5)

In Eq. (5), when the distribution produced by the generator, \(p_{g}(x)\), does not overlap with the real distribution \(p_{data}(x)\), the Jensen-Shannon Divergence (JSD) is constant and equals \(\log 2\). When the two distributions are identical, the JSD is zero, signifying that the generator has reached its optimum.
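The relationship between the optimal discriminator and the JSD can be checked numerically on small discrete distributions. The sketch below is only an illustration: the function names and the example probabilities are made up here, and sums over a finite set replace the integral of Eq. (3).

```python
import math

def kl(a, b):
    """KL(a || b) in nats for discrete distributions; terms with a_i = 0 contribute 0."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def value_at_optimal_d(p_data, p_g):
    """Discrete analogue of Eq. (3), evaluated at D*(x) from Eq. (4)."""
    total = 0.0
    for pd, pg in zip(p_data, p_g):
        if pd + pg == 0:
            continue  # outcome has no mass under either distribution
        d_star = pd / (pd + pg)
        if pd > 0:
            total += pd * math.log(d_star)
        if pg > 0:
            total += pg * math.log(1 - d_star)
    return total

# Illustrative distributions over four outcomes (made-up numbers).
p_data = [0.5, 0.3, 0.2, 0.0]
p_g = [0.1, 0.2, 0.3, 0.4]

lhs = value_at_optimal_d(p_data, p_g)
rhs = -math.log(4) + 2 * jsd(p_data, p_g)
print(lhs, rhs)  # the two values agree up to floating-point error
```

The two limiting cases in the text can be reproduced the same way: `jsd([1, 0], [0, 1])` gives log 2 for non-overlapping distributions, and `jsd(p, p)` gives zero.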

Furthermore, solving for \(G^{*}\) amounts to minimizing the divergence between the real distribution and the generated distribution. Define \(L(G)\) as in Eq. (6):

$$\:\underset{\text{D}}{{max}}V(G,D)=L\left(G\right)$$
(6)

Then:

$${G}^{*}=\arg\underset{G}{\min}\,\underset{D}{\max}V(G,D)=\arg\underset{G}{\min}\,L\left(G\right)$$
(7)

Subsequently, the gradient descent method is employed to find the optimal generator \(G^{*}\) that satisfies Eq. (7).

SeqGAN

The inherent structure of the GAN exhibits specific limitations. Conceived primarily for the generation of continuous, real-valued data, the native GAN architecture encounters challenges when tasked with directly generating discrete sequences. Notably, it struggles to propagate gradient updates from the discriminator network to the generator network effectively21,22.

As a remedy, the SeqGAN model is employed for generating college students’ mental health quality evaluation texts23. This model treats sequence generation as a sequential decision-making process and incorporates reinforcement learning. Within this framework, the generator network assumes the role of the reinforcement learning agent, the discrete sequence generated so far is the current state, and the action determines the next generated token24,25. Notably, the generator network performs policy gradient optimization using only the discriminator’s scores for the generator’s output sequences, without directly computing the policy outcomes. The architectural configuration of the SeqGAN model is depicted in Fig. 2.

Fig. 2

Structure of SeqGAN.

In Fig. 2, the fundamental structure of SeqGAN aligns with the original GAN framework: real and generated data are combined for discriminator training, and the generator is updated after receiving feedback from the discriminator26. The pivotal components of this structure are reward computation and policy gradient updating.

The primary objective of the generator is to maximize the cumulative rewards over the entire generative sequence, as expressed in Eq. (8):

$$\:J\left( {\theta \:} \right) = E\left[ {{R_T}|{s_0},\theta } \right] = \sum\nolimits_{_{{y_1} \in \:Y}} {{G_{\theta \:}}\left( {{y_1}|{s_0}} \right)Q_{{D_{\varphi {\kern 1pt} }}}^{{G_{\theta {\kern 1pt} }}}\left( {{s_0},{y_1}} \right)}$$
(8)

Within Eq. (8), \(R_{T}\) denotes the reward for a complete sequence and \(Q_{D_{\varphi}}^{G_{\theta}}(s,a)\) represents the action-value function of a sequence. The REINFORCE algorithm is employed to estimate the action-value function, utilizing the probability estimated by the discriminator as the reward, as delineated in Eq. (9):

$$\:{Q}_{{D}_{\varphi\:}}^{{G}_{\theta\:}}\left(s={Y}_{1:T-1},a={y}_{T}\right)={D}_{\varphi\:}\left({Y}_{1:T}\right)$$
(9)

However, the discriminator can only yield a valid output when the sequence is complete. Because sentence generation aims to optimize the long-term return, for an incomplete sequence at step \(t\) the roll-out policy uses Monte Carlo search to sample the remaining \(T-t\) tokens, and this process is repeated27.
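The one-step form of Eq. (8) is small enough to check by hand. The sketch below assumes, purely for illustration, that the generator is a softmax over a three-token vocabulary parameterized directly by logits, and that the action values are fixed made-up numbers standing in for discriminator scores. Under that assumption the exact gradient of the objective with respect to logit k is G(k)(Q(k) − J), and one gradient-ascent step raises the expected reward.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_reward(logits, q_values):
    """J = sum_y G(y | s0) * Q(s0, y), the one-step form of Eq. (8)."""
    probs = softmax(logits)
    return sum(p * q for p, q in zip(probs, q_values))

def policy_gradient(logits, q_values):
    """Exact gradient of J w.r.t. the logits: dJ/dz_k = G(k) * (Q(k) - J)."""
    probs = softmax(logits)
    j = sum(p * q for p, q in zip(probs, q_values))
    return [p * (q - j) for p, q in zip(probs, q_values)]

# Hypothetical setup: uniform initial policy, made-up action values.
logits = [0.0, 0.0, 0.0]
q_values = [0.1, 0.9, 0.5]

grad = policy_gradient(logits, q_values)
stepped = [z + 1.0 * g for z, g in zip(logits, grad)]  # one ascent step, step size 1.0

print(expected_reward(logits, q_values), expected_reward(stepped, q_values))
```

In a full SeqGAN the same gradient is estimated from sampled sequences rather than computed exactly, because summing over all sequences is intractable.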

Text generation of college students’ mental health quality evaluation based on SeqGAN

Text generation guided by abstract semantics

In order to achieve targeted text generation for psychological quality evaluation, this study introduces a new variable, \(h\), and a hidden variable \(h_{0}\) within each network. \(h_{0}\) is associated with a specific attribute that the sentence aims to control, while \(h\) governs other attributes. Both the generator and discriminator are equipped with an encoder E to update the distribution of \(h_{0}\)28,29. The constructed model is presented in Fig. 3.

Fig. 3

Process of the training algorithm.

In Fig. 3, the text generation process guided by abstract semantics unfolds through five distinct steps:

1) Data Processing: Conventional text preprocessing, encompassing text content segmentation, extraction of the corresponding indicators, word segmentation, removal of noise words, and establishment of a dictionary.

2) Initialization: The generator, discriminator, and data loader are initialized based on several hyperparameters derived from preprocessing, such as dictionary size and word index list. Additionally, a corresponding metric object is instantiated.

3) Generator Pre-training: Pre-training the generator is important because the discriminator influences its parameter adjustments during adversarial training; adequate pre-training ensures model stability. During pre-training, the direction of the generator’s parameter adjustments depends solely on the dataset. The maximum likelihood estimation method is employed for monitoring the training process.

4) Discriminator Pre-training: The supervision signals provided by a pre-trained discriminator help the generator make adjustments more effectively. Discriminator pre-training is supervised: both real text and text generated from random semantic input are fed in and labeled as true or false for training.

5) Adversarial Training: In the adversarial process, the generator functions as a reinforcement learning agent, using the scores assigned by the discriminator as rewards. The generator is updated with policy gradients.
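The data processing step can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: whitespace tokenization stands in for a real word segmenter, and the special tokens, function names, thresholds, and sample sentences are all hypothetical.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens

def preprocess(raw_texts, stop_words, min_tokens=2):
    """Tokenize, drop noise words, and remove duplicate, empty, or very short texts."""
    seen, cleaned = set(), []
    for text in raw_texts:
        tokens = [t for t in text.lower().split() if t not in stop_words]
        key = " ".join(tokens)
        if len(tokens) < min_tokens or key in seen:
            continue
        seen.add(key)
        cleaned.append(tokens)
    return cleaned

def build_vocab(token_lists, min_count=1):
    """Map each token to an integer id, most frequent first (ties broken alphabetically)."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {PAD: 0, UNK: 1}
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncating or padding to a fixed sentence length."""
    ids = [vocab.get(t, vocab[UNK]) for t in tokens[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))

# Made-up evaluation comments for illustration only.
raw = [
    "The student adapts well to training",
    "the student adapts well to training",  # duplicate after lowercasing
    "good",                                 # too short
    "",                                     # empty
    "The student shows strong will",
]
cleaned = preprocess(raw, stop_words={"the", "to"})
vocab = build_vocab(cleaned)
encoded = [encode(tokens, vocab, max_len=6) for tokens in cleaned]
print(len(vocab), encoded)
```

The dictionary size `len(vocab)` and the fixed sentence length are the kind of preprocessing-derived hyperparameters that the initialization step then passes to the generator and discriminator.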

Discriminator structure

The primary task of the discriminator is to ascertain the authenticity of a given sentence as real data. Throughout the supervised training process, data encoding with the true tag encompasses both textual and abstract semantic content.

The SeqGAN model employs a Convolutional Neural Network (CNN) as the discriminator, a choice validated for its effectiveness in text classification30. Further details regarding the discriminator’s training process are provided in Table 1.

Table 1 Pseudocode for discriminator.

Generator structure

The primary objective of the generator is to generate a sentence with a high reward value, maximizing the likelihood of being deemed authentic by the discriminator, as articulated in Eq. (8).

To address common challenges such as vanishing and exploding gradients during backpropagation, the generator adopts the Long Short-Term Memory (LSTM) structure31. The model iteratively applies an update function g to map the word embedding vectors of the input sequence \(\{x_{1},x_{2},\dots,x_{T}\}\) to the hidden vectors \(\{h_{1},h_{2},\dots,h_{T}\}\). The function g is formally expressed in Eq. (10):

$${h}_{t}=g({h}_{t-1},{x}_{t})$$
(10)

The word distribution probability of the softmax output layer z, mapping the hidden state to the output, is delineated in Eq. (11):

$$\:p\left({y}_{t}|{x}_{1},{x}_{2},\dots\:,{x}_{t}\right)=z\left({h}_{t}\right)=softmax(c+V{h}_{t})$$
(11)

Within Eq. (11), the parameter c denotes the bias term, and V signifies the weight matrix.
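The recurrence of Eq. (10) followed by the output layer of Eq. (11) can be sketched in a few lines. Note the simplification: the update below is a plain tanh recurrence used as a stand-in for g, not a full LSTM cell with gates, and all weights, dimensions, and inputs are made-up values for illustration.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def softmax(z):
    """Numerically stable softmax."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def g_simplified(h_prev, x_t, w_h, w_x):
    """Stand-in for g in Eq. (10): h_t = tanh(W_h h_{t-1} + W_x x_t). Not an LSTM cell."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_h, h_prev), matvec(w_x, x_t))]

def output_distribution(h_t, v_matrix, c_bias):
    """Eq. (11): p(y_t | x_1..x_t) = softmax(c + V h_t)."""
    return softmax([c + s for c, s in zip(c_bias, matvec(v_matrix, h_t))])

# Hypothetical sizes: embedding dim 2, hidden dim 2, vocabulary of 3 tokens.
w_h = [[0.5, -0.2], [0.1, 0.3]]
w_x = [[1.0, 0.0], [0.0, 1.0]]
v_matrix = [[0.4, -0.1], [0.0, 0.7], [-0.3, 0.2]]
c_bias = [0.0, 0.1, -0.1]

h = [0.0, 0.0]
for x_t in [[0.1, 0.2], [0.0, -0.3]]:  # two made-up embedding vectors
    h = g_simplified(h, x_t, w_h, w_x)

probs = output_distribution(h, v_matrix, c_bias)
print(probs)  # one probability per vocabulary token, summing to 1
```

Replacing `g_simplified` with a gated LSTM update changes how `h` is computed but leaves the output layer of Eq. (11) unchanged.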

The training process of the generator is bifurcated into two stages: pre-training and adversarial training.

In the supervised pre-training phase, the token of the real data at each step is combined with the encoding of the abstract semantics. This integration reinforces the association between tokens and their corresponding abstract semantics, so that the generation direction can be determined quickly in subsequent processes. The generator’s output is a selection probability, with the loss calculated according to Eq. (12). Here, \(p_{data}\) represents a one-hot distribution:

$$L=-\sum\nolimits_{t=1}^{T}{p_{data}}\left({y_t}|{Y_{1:t-1}}\right)\cdot{G_{\theta}}\left({y_t}|{Y_{1:t-1}}\right)$$
(12)

During unsupervised adversarial training, the input at each inference step is encoded with the abstract semantics, a process converse to supervised pre-training. The objective is to leverage the abstract semantics to identify more consistent tokens. Penalties are additionally imposed according to the rewards. As depicted in Eq. (13), when the real data and generation probability remain constant, higher rewards result in smaller losses, whereas lower rewards correspond to greater losses:

$$L=-\sum\nolimits_{t=1}^{T}{p_{data}}\left({y_t}|{Y_{1:t-1}}\right)\cdot{G_{\theta}}\left({y_t}|{Y_{1:t-1}}\right)\cdot Q_{D_{\varphi}}^{G_{\theta}}\left({Y_{1:t-1}},{y_t}\right)$$
(13)

Calculation of reward

The primary function of the reward is to incentivize generators to produce distributions resembling the real text utilized in discriminator training, particularly during generator updates. When the generated sentence closely aligns with the characteristics of real text, the associated reward for that sentence increases32.

The reward, in this context, is a function that assesses the action value at each step of a given policy π. Employing Monte Carlo sampling, this function generates multiple simulation trajectories starting from the current state. Utilizing a Generator \(\:{G}_{\beta\:}\) identical to \(\:{G}_{\theta\:}\), multiple distinct trajectories may exist under each action. In order to mitigate discrepancies and obtain a more precise evaluation of action values, the generator is executed N times from the current state to the conclusion of the sequence, yielding a batch of output samples. These trajectories, collectively forming a batch, are then employed to compute the average return, thereby estimating the value associated with each action.
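The averaging over N roll-outs can be sketched as below. The roll-out policy and the discriminator here are toy stand-ins chosen so the result is easy to verify, not the trained networks of the study: tokens are sampled uniformly from a three-token vocabulary, and the "discriminator" simply scores a sequence by the fraction of tokens equal to 1. All names are hypothetical.

```python
import random

VOCAB = [0, 1, 2]  # hypothetical token ids

def toy_sample_next(prefix, rng):
    """Stand-in roll-out policy: choose the next token uniformly at random."""
    return rng.choice(VOCAB)

def toy_discriminator(sequence):
    """Stand-in discriminator: fraction of tokens equal to 1, a score in [0, 1]."""
    return sum(1 for tok in sequence if tok == 1) / len(sequence)

def rollout_reward(prefix, seq_len, sample_next, discriminator, n_rollouts, rng):
    """Estimate the action value of a prefix by averaging the discriminator's
    score over n_rollouts Monte Carlo completions of that prefix."""
    if len(prefix) >= seq_len:
        return discriminator(prefix)  # complete sequence: score it directly
    total = 0.0
    for _ in range(n_rollouts):
        seq = list(prefix)
        while len(seq) < seq_len:
            seq.append(sample_next(seq, rng))
        total += discriminator(seq)
    return total / n_rollouts

rng = random.Random(0)
estimate = rollout_reward([1, 1], seq_len=4, sample_next=toy_sample_next,
                          discriminator=toy_discriminator, n_rollouts=2000, rng=rng)
print(estimate)  # close to (2 + 2/3) / 4, the exact expected score for this toy setup
```

Increasing `n_rollouts` lowers the variance of the estimate at a proportional cost in discriminator evaluations, which is the trade-off behind the choice of N.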

The detailed calculation process of the reward is outlined in Table 2.

Table 2 Pseudocode for the reward calculation.

Experimental design

Selection of data sets

The dataset originates from the mental health quality assessment data covering all high school and university sports disciplines within a specific province, city, and region during the period of 2020–2021. Certain inconsistencies were identified in the original dataset, including insufficient content length and heterogeneous descriptors such as “excellent,” “good,” “A,” “B,” or simple four-character phrases. Following preprocessing tasks such as deduplication and the removal of empty and excessively brief text, a curated subset of 12,087 records, collectively comprising approximately 800,000 words, was selected. On average, each student’s evaluation content spans 52 words. The dataset encompasses three fundamental pieces of information: city, gender, and sports events. Additionally, it incorporates four objective achievement metrics: cognition (RZ), emotion (QX), will (YZ), and adaptation (SY).

The detailed structure of the data items is elucidated in Table 3.

Table 3 Examples of data items.

Experimental environment and parameter settings

The experimental environment and parameter settings are shown in Tables 4 and 5. Table 4 enumerates the environment configuration, including the processor, GPU, memory, operating system, programming language, and deep learning framework, to support the stability and reproducibility of the experimental setup. Table 5 presents the hyperparameter settings employed during the experiments, including the optimizer, word embedding dimension, hidden vector dimension, sentence length, and batch size, as well as the parameters for LSTM generator pre-training, CNN discriminator pre-training, and adversarial training. The selection of these parameters is informed by prior research experience and aims to ensure the effectiveness and stability of the experiments.

Table 4 Environment configuration.
Table 5 Hyperparameter setting.

Results

Model pre-training

The generator is pre-trained with the Negative Log Likelihood (NLL) loss. The pre-trained generator then serves as the foundation for pre-training the discriminator, which is trained on a shuffled mixture of generated and real texts to minimize cross-entropy. The training dynamics of both the generator and discriminator are illustrated in Fig. 4.

Fig. 4

Pre-training losses for generators and discriminators.

The textual composition of the dataset is not overly complex. As Fig. 4 shows, across multiple experimental runs the pre-training of the generator stabilizes after the 10th epoch: additional iterations do not produce a significant decrease in the loss. The pre-training of the discriminator begins to stabilize gradually after the 35th epoch.

Analysis of the impact of the ratio of adversarial training rounds on the model

An inappropriate ratio of adversarial training rounds can cause the model’s NLL to diverge. With the other hyperparameters of the generator and discriminator held constant, the performance of the model under different ratios of generator to discriminator training rounds is illustrated in Fig. 5.

Fig. 5

The loss situation of the model under different ratios of generator and discriminator rounds.

Figure 5 illustrates the model’s loss dynamics under varying ratios of generator and discriminator iterations. It can be observed that when the generator’s update rate surpasses that of the discriminator, the model’s loss fails to converge. However, as the ratio between the two iterations decreases, the model’s convergence significantly improves. This observation underscores the critical importance of judiciously adjusting the iteration ratio during adversarial training to ensure the robustness and effectiveness of the model.

Performance comparison experiment

To evaluate the model’s performance, three comparative models are selected: Bidirectional Encoder Representations from Transformers-Generative Pre-Trained Transformer (BERT-GPT), Text-To-Text Transfer Transformer (T5), and Conditional Transformer Language Model (CTRL). The evaluation metrics included Metric for Evaluation of Translation with Explicit ORdering (METEOR), Perplexity, Recall-Oriented Understudy for Gisting Evaluation - Longest Common Subsequence (ROUGE-L), and Distinct-n. The experimental results are presented in Fig. 6.

Fig. 6

Performance comparison results (a) METEOR (b) Perplexity (c) ROUGE-L (d) Distinct-n.

In Fig. 6, the optimized model achieves the highest performance in the METEOR metric across the lower, middle, and upper high school groups, with a score of 0.962, followed by the university group with a score of 0.558. Comparatively, the CTRL model reaches a high score of 0.989 within the upper high school group. For the Perplexity metric, the optimized model demonstrates lower perplexity across all groups, notably reducing perplexity within the university group. In contrast, BERT-GPT and T5 show higher perplexity in the upper high school group, indicating lower adaptability to text generation. In terms of ROUGE-L, the optimized model obtains the highest score in the university group, closely followed by the high score of the T5 model. Overall, the optimized model exhibits superior coherence. The Distinct-n metric further reveals that the optimized model produces the most diverse text, achieving the highest scores in the lower high school group, reflecting a significant advantage in lexical diversity.

The results indicate that the optimized model consistently improves performance across various student groups. In both the METEOR and ROUGE-L metrics, the model demonstrated balanced performance, particularly excelling in semantic coherence and fluency compared to the baseline models. The lower Perplexity values suggest that the optimized model generates text that aligns more closely with natural language distributions, enhancing text generation quality. Furthermore, the higher Distinct-n scores indicate a richer vocabulary diversity, reducing repetitive content. Overall, the optimized model outperforms the BERT-GPT, T5, and CTRL baseline models in semantic consistency, fluency, and diversity, making it suitable for generating texts aimed at promoting mental health literacy.
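Of the metrics above, Distinct-n is the simplest to state in code. The sketch below uses the common corpus-level definition, the number of unique n-grams divided by the total number of n-grams; whether the reported scores were pooled over the corpus or averaged per sentence is not specified, so this form is an assumption, and the sample sentences are made up.

```python
def distinct_n(sentences, n):
    """Corpus-level Distinct-n: unique n-grams divided by total n-grams."""
    total = 0
    unique = set()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0

# Made-up token sequences for illustration.
generated = [["a", "b", "a", "b"], ["a", "b", "c"]]
print(distinct_n(generated, 1))  # 3 unique unigrams out of 7
print(distinct_n(generated, 2))  # 3 unique bigrams out of 5
```

A model that repeats the same phrases scores low, so higher values indicate less repetitive output.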

Evaluation indicators and results

The Bilingual Evaluation Understudy (BLEU) score is selected as the metric for evaluating the proposed model. The evaluation procedure for BLEU involves generating comments given an objective input. If the content of the generated comments is similar to the reference content from the original sample set, the generation quality is deemed satisfactory; accuracy is computed using a 3-gram approach. To highlight the correlation between generation and the controlled semantics, a specific level of a metric is chosen. Initially, BLEU is computed over all reference texts. Subsequently, the text content corresponding to the selected metric and level is filtered and compared. The “A” level of the “Ethical Quality” category in the generated results is compared with all real texts, as indicated by the data under the “All” metric in Fig. 7. Following this, BLEU calculation is performed on the comments corresponding to the “All” level of the “Ethical Quality” category in the real texts, yielding the values under the “selected” metric in Fig. 7. The specific BLEU results for each dimension are presented in Fig. 7.
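For reference, a sentence-level BLEU computation can be sketched as follows. The text states only that a 3-gram approach is used, so this sketch assumes cumulative BLEU over 1-grams to 3-grams with uniform weights, clipped n-gram counts, and the standard brevity penalty, without smoothing. The function names and the example sentences are illustrative.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=3):
    """Sentence-level BLEU with uniform weights over 1..max_n and a brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        if not cand:
            return 0.0  # candidate shorter than n tokens
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        # Clip each candidate n-gram count by its maximum count in any reference.
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        if clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / sum(cand.values())))
    c = len(candidate)
    # Reference length closest to the candidate length (shorter one on ties).
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)

# Made-up sentences for illustration.
reference = "the student adapts well to training".split()
candidate = "the student adapts well".split()
print(bleu(candidate, [reference]))  # all n-grams match; only the brevity penalty applies
```

Filtering the reference set to the texts of one metric and level before calling `bleu` corresponds to the "selected" comparison, while passing all real texts corresponds to the "All" comparison.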

Fig. 7

BLEU score based on training data.

Figure 7 illustrates the BLEU scores based on training data, wherein the generated results for the “A” level of the “Ethical Quality” category are compared with all real texts, and BLEU calculations are performed for the “All” category in real texts, reflecting the model’s evaluation effectiveness. Because schools control the proportion of each score, the samples for the “D” category are relatively few, which may impair the model’s learning for this category. Additionally, the highest average score is for adaptability quality, indicating a notable correlation between the generated comments and adaptability quality. These results support the effectiveness of the proposed model and the successful implementation of semantic control. Compared with the research done by Li (2022)33, the advantage of this study lies in the introduction of GAN, which enhances the model’s ability to represent mental health data through the adversarial training mechanism. Li’s research mainly relies on traditional statistical methods and standardized testing. Although it can provide certain evaluation results, it often struggles to capture implicit features when dealing with complex and nonlinear mental health data. In contrast, this study achieves a more refined and dynamic evaluation of mental health quality through GAN, improving the accuracy and robustness of the evaluation. Compared with the study by Aldaisari et al. (2022)34, the advantage of this study lies in proposing a training method for associative semantic fusion and combining it with the SeqGAN model to achieve controllable semantic generation. Aldaisari et al.’s study used a machine learning-based classification method; although it achieves some results in evaluating mental health qualities, the controllability and semantic consistency of the generated content are relatively limited.
This study enhances the control ability of generated content by designing new generators and discriminators, and introducing specific semantic labels, making the evaluation results more in line with practical application needs and further improving the applicability of the model in different scenarios.

Conclusion

With the widespread popularity of sports, the assessment of psychological health quality for middle and high school students, as well as university students, has become a research hotspot. Therefore, this study aims to design a practical method for evaluating middle and high school and university students’ psychological health quality. Firstly, this study proposes an associative semantic fusion training method, utilizing SeqGAN and meeting practical application needs. We design new generators and discriminators to adjust randomly generated content into controllable semantic generation through input. Experimental results show that during the pre-training phase of the generator, stability is achieved after ten iterations, while the discriminator stabilizes gradually after 35 rounds. Secondly, the experiment preprocesses psychological education data for primary, middle, and high school students, applying it to real-world scenarios and establishing a training set. Special labels related to the text content in the training set are fused as abstract semantics to map distribution relationships. During adversarial training, an improper ratio of generator and discriminator rounds may lead to the model’s NLL divergence. Adjusting this ratio improves the model’s convergence, emphasizing the importance of reasonably adjusting the round ratio in adversarial training to ensure the model’s robustness and effectiveness. Additionally, this study proposes and validates a theoretical hypothesis: using special text labels as input to control the constrained generation process. Experimental results indicate that despite deficiencies in the Monte Carlo sampling phase, particularly due to limitations in sampling frequency resulting in completely random and uncontrollable path directions, this study underscores the effectiveness of the model’s semantic control.

There are also some shortcomings. On the one hand, the proposed model involves complex settings and requires fine tuning of specific hyperparameters, which may limit the repeatability and scalability of the model in practical environments. Due to the dependence on hyperparameters, the model may be difficult to directly migrate and reuse in different datasets or application scenarios, thereby affecting its potential for widespread application. On the other hand, although this study briefly mentions the challenges of the Monte Carlo sampling stage, the discussion on potential biases and model generalizability is relatively limited. To address the current issues of model complexity and sensitivity to hyperparameters, future research can focus on developing more automated hyperparameter optimization methods or designing more simplified and robust model architectures. By reducing reliance on manual adjustments, the model’s repeatability and adaptability in different scenarios are improved, thereby enhancing its scalability in practical applications. And in response to the problem of random and uncontrollable path directions in the Monte Carlo sampling stage, future research can explore more refined sampling strategies or combine them with other advanced sampling techniques to improve the accuracy and coverage of sampling, reduce the impact of potential biases, and ensure that the content generated by the model is more stable and reliable. Meanwhile, future research can further expand the validation of the model in different datasets and application scenarios, exploring how to improve its generalizability without sacrificing model performance. Through more comprehensive experiments and comparative analysis, future research can clarify the scope and limitations of the model, laying the foundation for its application in a wider range of educational evaluation fields. 
Finally, future research can explore combining current models with big data frameworks to process larger scale mental health data. This not only helps to improve the processing ability and efficiency of the model, but also provides a more comprehensive and intelligent solution for evaluating mental health qualities, promoting further development in the field of education.