Introduction

Artificial Intelligence (AI) refers to the development of computer systems capable of performing tasks that typically require human intelligence, such as learning, reasoning, and decision-making1. It enables machines to analyze data, recognize patterns, and adapt to new information, thereby automating complex processes across various domains including healthcare, social network analysis2, finance, education, social media analysis3 and human resources. AI systems integrate algorithms and computational models to simulate cognitive functions4, enabling applications like natural language processing5, computer vision, and predictive analytics6. As AI continues to evolve, it plays an increasingly vital role in enhancing efficiency, personalization, and innovation across industries7. Within this broad scope, AI-driven research on career development and job market dynamics has gained significant momentum8. Career satisfaction, a critical factor influencing employee productivity and retention, is increasingly being studied through the lens of AI models that consider both behavioral traits and academic performance9. The integration of these diverse factors offers a holistic understanding of career success, opening new avenues for personalized career guidance and workforce optimization10.

Despite its importance, predicting career satisfaction remains a complex challenge due to the multifaceted nature of influencing factors and their nonlinear interactions11. This motivates the need for advanced AI methodologies capable of capturing intricate patterns in multidimensional data12. This study addresses this gap by applying state-of-the-art transformer-based models, specifically BERT, to predict career satisfaction based on educational and behavioral traits, offering enhanced accuracy and interpretability compared to conventional approaches13.

In this study, we explore a standard, widely used public dataset encompassing comprehensive academic and behavioral features to analyze career success rates. Using state-of-the-art transformer models, particularly the BERT architecture, the research aims to capture complex, nonlinear relationships within these multifaceted traits. By integrating diverse data points such as educational performance indicators and behavioral metrics, the model provides a nuanced understanding of factors influencing career satisfaction. This approach not only advances predictive accuracy but also facilitates deeper insights into the interplay between academic achievements and personal attributes, thereby contributing to more effective career guidance and workforce development strategies. The key contributions of this study are:

  • Development of a multi-factor predictive model leveraging BERT’s transformer architecture to integrate academic and behavioral data for career satisfaction prediction.

  • Comparative evaluation of transformer-based models against traditional machine learning and deep learning methods, demonstrating superior performance.

  • The proposed BERT-based model achieves a classification accuracy of 98% by effectively capturing both syntactic and semantic patterns, enabling complex, context-aware prediction of career satisfaction levels.

The remainder of this paper is organized as follows: Sect. 2 reviews related work; Sect. 3 describes the dataset and preprocessing steps; Sect. 4 presents experimental results and analysis; Sect. 5 concludes the paper with future research directions.

Related work

The quest to comprehend career success and its prospects has shifted from conventional sociological and psychological models to data-centered analysis and prediction, spurred by the explosion of digital data and the rapid advances in machine learning (ML) and deep learning (DL)14. Recent literature, summarized in Table 1, indicates a shift towards multi-factor predictive models that utilize multiple sources of data.

The study15 applied vocational interest inventory scores to categorize individuals’ careers. Compared to traditional profile matching, their ML-augmented approach increased the overall accuracy of predicting common occupations, while still underpredicting rare job categories. In the same vein, another study16 developed a Big 5 based ANFIS fuzzy-inference model to categorize experts (software vs. data scientist) and showed how personality patterns can be used effectively for career placement. A further study17 paired objective alumni information (i.e., grades, major, demographics) with subjective survey items in an analysis that integrated a genetic algorithm to predict career success, discovering no structure in the subjective features. These multi-trait models show that combining hard skills (such as GPA) with soft skills (interests, personality, satisfaction) offers a finer-grained picture of career outcomes, and they have proved useful in question-and-answer interview sessions for analyzing the cognitive skills of hiring candidates. However, they usually target specific communities (e.g., a university, a field), so generalization is limited18. AI in smart education can further support career prediction strategies: after analyzing students’ skills, interests, and performance data, students are guided toward potentially suitable career paths. This allows for more personalized guidance that enhances decision making and aligns curricula with workforce demand19. Systems that predict careers also face challenges of human–AI collaboration. Just as human–machine plan conflicts can increase cognitive burden and decrease trust, unfair or opaque career prediction can negatively affect users’ confidence and decision making.
Combining fairness-aware AI models with transparency mechanisms and adaptive feedback loops can help alleviate these problems, ensuring that predictions are explainable, bias-aware, and aligned with users’ goals. Beyond the soft skills themselves, AI-infused career counseling benefits from increased trust and acceptance, so that education and workforce outcomes may better serve those who can now make informed, equitable choices about their careers20. Workplace analytics driven by AI can detect early signals of exclusion, providing HR with empirical evidence to design interventions aimed at fairness. If career prediction tools are integrated with well-being considerations, AI may facilitate inclusive growth and level the playing field in careers21.

Academic performance is a good predictor of an individual’s efforts and an indicator of overall school performance. The study22 employed several ML algorithms (decision trees, gradient boosting, neural nets) on an alumni survey that connected academic experiences to career satisfaction. They found that a Gradient Boosting model slightly surpassed logistic/ordinal regression (approx. 2–3% in accuracy) and highlighted the “frequency of applying learned knowledge in work” as the most important predictor of career satisfaction. For salary prediction, an LSTM network (using both maximum-likelihood and Bayesian techniques) was proposed23 and shown to be far better than traditional regressions at predicting alumni salaries. Similarly, ML-based classifiers were built on students’ high-school grades, college GPA, and socio-economic status to predict career fields, with pre-university grades observed to be particularly predictive24. Even less structured educational data has been exploited: the study25 extracted lexical descriptors from high-school essays, trained a Random Forest model, and achieved “excellent” accuracy when predicting students’ career readiness. These works demonstrate that tangible educational outcomes (grades, curriculum, and application of knowledge) have a significant impact on career paths, but such analyses require rich data and sophisticated feature engineering (e.g., text mining).

Table 1 Summary of existing studies.

Other behavioral and personal characteristics have also entered prediction models. For instance, the study29 applied Random Forests to survey data of 409 college graduates with disabilities to forecast job and career satisfaction at a later point in time. Their model achieved ~ 72% accuracy, demonstrating that meeting academic accommodation needs during education significantly and positively influenced long-term career satisfaction. This result indicates how an individual’s personal context (e.g., access to support) can be harnessed by ML for career prediction. Personal traits also matter: numerous works verify that augmenting models with style or personality features improves predictive performance30. Other AI models rely only on behavioral traits, but these vary markedly among individuals, so models learned on restricted samples tend not to generalize well31. Indeed, systematic reviews show that, despite the rise of more complex ML models (random forests, SVMs, neural nets), data scarcity, class imbalance, and lack of interpretability prevent such models from being widely deployed. In brief, ML can measure the impact of characteristics and experiences on success but must contend with the empirical and ethical difficulties (fairness, transparency) of modeling human behavior32.

Some work has also introduced new data modalities and deep architectures. One study26 integrated biomechanical features (gait and posture metrics) with behavioral questionnaires into an RF model that predicted the career outcomes of 4-year college students; this multimodal model achieved ~ 82.6% prediction accuracy and demonstrated that biomechanical information can significantly increase prediction performance. On the deep-learning side, an encoder–decoder LSTM27 took students’ vectors of exam scores and demographics as the input sequence to recommend study subjects, obtaining very high accuracy while allowing for more personalized forecasts27. These approaches demonstrate that rich data and DL can improve prediction, but they demand large samples and often sacrifice interpretability33. Overall, Random Forests, SVMs, and neural nets lead the field, yet many papers report only marginal improvements in accuracy over simpler methods34.

In brief, the studies under consideration use ML/DL to analyze how education and behavior predict career success. They share strengths (the ability to capture non-linear interactions and to personalize predictions) and weaknesses (limited learning due to dataset size, risk of overfitting, and opacity)28. For instance, large-scale studies show the accuracy benefits of big data, while smaller studies show sensitivity to feature selection. Autonomy and utility concerns (privacy, algorithmic bias) are also flagged as major problems35. Therefore, although the literature reflects clear advances in AI-based modeling of career outcomes, it also calls for more generalizable and interpretable models evaluated on broad, longitudinal data sources.

Proposed research methodology

This analysis follows a data-driven approach to forecasting career satisfaction from an extensive collection of educational and behavioral variables. Advanced predictive models are trained and tested after extensive preprocessing and feature engineering, as shown in Fig. 1. The trained models are based on the BERT architecture because it is known to successfully learn intricate, context-dependent relationships in high-dimensional data.

Fig. 1
figure 1

Framework map of proposed research methodology.

Unlike classic machine learning models and RNNs, BERT features a transformer-based rather than recurrent architecture, leveraging multi-head self-attention and deep input representations to capture complex cross-variable interactions bidirectionally. This allows the model to adequately capture both the linear and non-linear relationships present among the multifactorial predictors of career success. The methodology thus unites solid data preparation with state-of-the-art deep learning to achieve both high predictive accuracy and interpretability in career satisfaction classification36.

Unlike previous literature, which mostly applied deep learning or transformer-based architectures to textual or domain-specific sequence data, we apply BERT to structured tabular data with educational and behavioral attributes. Parameter sharing must be managed carefully to scale the model to millions of user–job pairs, but the main contribution lies on the feature-embedding side: each feature, whether academic (e.g., GPA, SAT score, university rank) or behavioral (e.g., networking score, work–life balance, internships), is projected into a dense space where values are normalized and contextualized. These embeddings are then fed into BERT’s multi-head self-attention layers, which model the interactions across heterogeneous characteristics. For example, the model can optimally weight the interplay of “University GPA” and “Networking Score”, capturing information that standard tree-based or shallow neural techniques cannot. These relationships are further refined by residual connections and feed-forward layers, which preserve both local and global dependencies among features. BERT’s transformer blocks therefore act not only as a sequence model but as a relational learner for tabular data, learning syntactic structure in numeric hierarchies and categorical encodings as well as semantic relationships among behavioral tendencies associated with career outcomes. This methodological novelty, combining embeddings with context-aware attention over mixed features, is the key contribution of our proposed method and yields consistently better performance than baseline works.

Data collection and preprocessing

This study utilized a dataset \(\mathcal{D}=\{(x_{i},y_{i})\}_{i=1}^{N}\), where each instance \(x_{i}\in\mathbb{R}^{d}\) is a feature vector composed of educational and behavioral attributes and \(y_{i}\in\{0,1,2\}\) denotes the categorical career satisfaction class (Low, Medium, High); the features are listed in Table 2. The dataset was collected from a longitudinal survey of graduates, incorporating continuous variables such as GPA, test scores, and soft skills scores, as well as categorical variables including gender and field of study. Preprocessing involved data cleansing, in which instances with missing entries were removed \((\mathcal{D}'=\{(x_{i},y_{i}):x_{i},y_{i}\ne \text{null}\})\), and categorical features \(x_{j}\) were encoded via one-hot or ordinal mappings to numeric vectors, ensuring that \(x_{i}\in\mathbb{R}^{d}\) is fully numeric. The target variable \(y\) was discretized by mapping the original satisfaction scores \(s_{i}\in[1,10]\) into classes using Eq. 1.

$$y_{i}=\begin{cases}0 & \text{if } 1\le s_{i}\le 3\\ 1 & \text{if } 4\le s_{i}\le 6\\ 2 & \text{if } 7\le s_{i}\le 10\end{cases}$$
(1)

Numerical features were standardized via z-score normalization \(x'_{ij}=\frac{x_{ij}-\mu_{j}}{\sigma_{j}}\), where \(\mu_{j}\) and \(\sigma_{j}\) are the mean and standard deviation of feature \(j\). Outliers beyond \(\pm 3\) standard deviations were removed to ensure robust model training. The final cleaned and transformed dataset \(\mathcal{D}''\) was then partitioned into training and testing subsets using stratified sampling with an 80:20 ratio, preserving class distributions for reliable evaluation.
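The preprocessing steps described above (class discretization per Eq. 1, z-score normalization, \(\pm 3\sigma\) outlier removal, and the stratified 80:20 split) can be sketched as follows; the synthetic data stands in for the actual dataset, whose columns are not reproduced here.

```python
# Sketch of the preprocessing pipeline on synthetic stand-in data.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))         # numeric feature matrix (placeholder)
s = rng.integers(1, 11, size=1000)     # satisfaction scores in [1, 10]

# Eq. 1: discretize scores into Low (0), Medium (1), High (2)
y = np.digitize(s, bins=[4, 7])        # 1-3 -> 0, 4-6 -> 1, 7-10 -> 2

# z-score normalization per feature
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xz = (X - mu) / sigma

# drop rows with any value beyond +/- 3 standard deviations
mask = (np.abs(Xz) <= 3).all(axis=1)
Xz, y = Xz[mask], y[mask]

# stratified 80:20 split preserving class proportions
X_train, X_test, y_train, y_test = train_test_split(
    Xz, y, test_size=0.2, stratify=y, random_state=42)
```
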

Table 2 Dataset feature analysis.

The original dataset included student records; after preprocessing, only valid instances were retained. The data is fully anonymized, with no personally identifiable information divulged, in line with privacy regulations. Consequently, characteristics such as geographic locality, ethnic affiliation, or socio-economic status are unavailable, which reduces the representativeness of the dataset and may affect generalization. In addition, some fields may be over-represented, which can introduce bias. To prevent overfitting, we adopted a stratified 80–20 hold-out split combined with 5-fold cross-validation preserving the class distribution. Cross-validation hyperparameter tuning was conducted separately for each model on the training folds, and the test folds were held out to prevent data leakage. Despite the high accuracy (98%) of the BERT model in this study, this result should be interpreted cautiously, and further studies should include external validation on independent datasets to test generalization.
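The leakage-free tuning protocol above can be illustrated as follows; the estimator and search grid are hypothetical stand-ins, not the settings of Table 4.

```python
# Hyperparameters are chosen by 5-fold CV on the training split only;
# the held-out test split is scored once for the final estimate.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     train_test_split)

X, y = make_classification(n_samples=500, n_classes=3,
                           n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      {"n_estimators": [100, 200]}, cv=cv)
search.fit(X_tr, y_tr)            # tuning never sees the test split
test_acc = search.score(X_te, y_te)
```
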

Feature engineering

In this study, the raw dataset comprises multiple variables that characterize individuals’ academic and behavioral profiles. For modeling purposes, these variables are grouped into two primary categories: educational traits and behavioral traits. Consider an individual indexed by \(\:i\) within the dataset of size \(\:N\). Let the overall feature space be represented as in Eq. 2.

$$\:{x}_{i}\in\:\mathcal{X}\subseteq\:{\mathbb{R}}^{d}$$
(2)

where \(\:d\:=\:m\:+\:n\) is the dimensionality corresponding to \(\:m\) educational and \(\:n\) behavioral traits, respectively.

Educational traits consist of measurable academic indicators reflecting formal learning achievements and credentials. Mathematically, we define the educational feature vector for the \(i\)-th individual via the mapping shown in Eq. 3.

$$\mathcal{E}:\mathcal{X}\to\mathbb{R}^{m},\qquad x_{i}\mapsto E_{i}=\left(e_{i1},e_{i2},\dots,e_{im}\right)^{\top}$$
(3)

with each component \(\:{e}_{ij}\) modeled as in Eq. 4.

$$e_{ij}=\phi_{j}\left(\theta_{j}\left(x_{ij}\right)\right),\qquad \theta_{j}:\mathbb{R}\to\mathbb{R},\quad \phi_{j}:\mathbb{R}\to\mathbb{R}$$
(4)

where \(\theta_{j}\) is a feature scaling or normalization function (e.g., z-score normalization), given by Eq. 5,

$$\theta_{j}\left(x_{ij}\right)=\frac{x_{ij}-\mu_{j}}{\sigma_{j}}$$
(5)

and \(\phi_{j}\) is a nonlinear transformation, such as a polynomial or sigmoid activation, that enhances representational richness, computed using Eq. 6,

$$\phi_{j}\left(z\right)=\frac{1}{1+e^{-\alpha_{j}z}}$$
(6)

with \(\alpha_{j}>0\) controlling the intensity of the nonlinearity.

Behavioral traits capture aspects of an individual’s skills, experiences, and personal development that influence career outcomes beyond pure academic performance. The behavioral feature vector is defined by the mapping in Eq. 7, with components given by Eq. 8.

$$\mathcal{B}:\mathcal{X}\to\mathbb{R}^{n},\qquad x_{i}\mapsto B_{i}=\left(b_{i1},b_{i2},\dots,b_{in}\right)^{\top}$$
(7)
$$b_{ik}=\psi_{k}\left(\gamma_{k}\left(x_{ik}\right)\right),\qquad \gamma_{k}:\mathbb{R}\to\mathbb{R},\quad \psi_{k}:\mathbb{R}\to\mathbb{R}$$
(8)

For categorical or ordinal behavioral traits, \(\gamma_{k}\) may denote an embedding or encoding function, computed using Eq. 9.

$$\gamma_{k}\left(x_{ik}\right)=\sum_{l=1}^{L}w_{kl}\cdot\mathbb{I}\left(x_{ik}=l\right)$$
(9)

where \(\mathbb{I}(\cdot)\) is the indicator function and \(w_{kl}\in\mathbb{R}\) are learned embedding weights. The combined feature vector is the concatenation shown in Eq. 10.

$$x_{i}=\left[E_{i}^{\top},B_{i}^{\top}\right]^{\top}\in\mathbb{R}^{d}$$
(10)

Define the continuous career satisfaction score as \(s_{i}\in[1,10]\) and the discrete target class \(y_{i}\in\{0,1,2\}\) via the function in Eq. 11.

$$y_{i}=\sum_{c=0}^{2}c\cdot\mathbb{I}\left(s_{i}\in\mathcal{S}_{c}\right)$$
(11)

where the class partitions are intervals defined as in Eq. 12.

$$\mathcal{S}_{0}=\left[1,3\right],\qquad \mathcal{S}_{1}=\left(3,6\right],\qquad \mathcal{S}_{2}=\left(6,10\right]$$
(12)

The indicator function \(\mathbb{I}(\cdot)\) evaluates to 1 if its argument is true and 0 otherwise. This formulation embeds preprocessing and feature extraction within the scalings \(\theta_{j}\), \(\gamma_{k}\) and the nonlinear transforms \(\phi_{j}\), \(\psi_{k}\), highlighting the multistage, nonlinear nature of feature construction. The target mapping uses crisp interval-based indicator functions for categorical class assignment, suitable for classification learning frameworks.
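A minimal NumPy sketch of the pipeline in Eqs. (2)–(12) may help: \(\theta_{j}\) is z-score scaling, \(\phi_{j}\) a sigmoid with steepness \(\alpha_{j}\), \(\gamma_{k}\) an embedding lookup (the weights below are illustrative placeholders, not learned values), and the label follows the interval partition of Eq. (12).

```python
import numpy as np

def theta(x, mu, sigma):          # Eq. 5: z-score scaling
    return (x - mu) / sigma

def phi(z, alpha=1.0):            # Eq. 6: sigmoid with alpha > 0
    return 1.0 / (1.0 + np.exp(-alpha * z))

def gamma(level, weights):        # Eq. 9: embedding of a categorical level
    return weights[level]

def label(s):                     # Eqs. 11-12: interval-based class
    if 1 <= s <= 3:
        return 0
    if 3 < s <= 6:
        return 1
    return 2

# educational component e_ij = phi_j(theta_j(x_ij))
e = phi(theta(3.6, mu=3.0, sigma=0.5), alpha=1.2)
# behavioral component from a categorical trait (placeholder weights)
w = {"Entry": 0.1, "Mid": 0.5, "Senior": 0.9}
b = gamma("Mid", w)
x = np.concatenate([[e], [b]])    # Eq. 10: concatenated feature vector
y = label(8)                      # satisfaction 8 -> class 2 (High)
```
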

Although the mathematical representation in Eqs. (2)–(12) gives a strict definition of the feature space, an intuitive example may help clarify how it works in practice. Suppose we have a student with the following values:

Educational traits: High_School_GPA = 3.6 (scale 0–4), SAT_Score = 1450 (range 900–1600), University_GPA = 3.4 (scale 0–4), Projects_Completed = 4 (range 0–9), Certifications = 2 (range 0–5).

Behavioral traits: Internships_Completed = 2 (range 0–4), Soft_Skills_Score = 8 (range 1–10), Networking_Score = 7 (range 1–10), Work_Life_Balance = 6 (range 1–10), Entrepreneurship = Yes.

In preprocessing, the educational traits are normalized (Eqs. 3–6). For illustration, min-max scaling is used here so that all features fall in the [0, 1] interval; the normalized values are:

$$x'_{(\text{SAT})}=\frac{1450-900}{1600-900}\approx 0.79$$
$$x'_{(\text{GPA})}=\frac{3.6}{4.0}=0.90$$
$$x'_{(\text{Certifications})}=\frac{2}{5}=0.40$$

Behavioral traits follow the same rule:

$$x'_{(\text{SoftSkills})}=\frac{8-1}{10-1}\approx 0.78$$
$$x'_{(\text{Internships})}=\frac{2}{4}=0.50$$

Categorical behavioral traits (Eqs. 7–9) are encoded. For example, “Entrepreneurship = Yes” is encoded as 1 and “No” as 0. Job levels (Entry, Mid, Senior) are encoded as embeddings so that co-occurrence patterns between categories can be captured. The educational and behavioral features are then combined into a single representation, the transformed feature vector of Eq. 10. In this instance, the student’s profile combines normalized test scores, GPAs, project counts, and encoded behavioral features in a single feature space.

The resulting feature vector (Eq. 10) concatenates these normalized values. For this student, the vector contains the values [0.90, 0.79, 0.85, 0.44, 0.40, 0.50, 0.78, 0.67, 0.56, 1.0], combining academic and behavioral indicators.

Finally, the stated Career Satisfaction = 8 maps to the discrete target class High (2), since values of 7–10 fall into the High satisfaction class (Eq. 12).

This illustration grounds the abstract equations in a concrete workflow, demonstrating how raw academic and behavioral inputs are transformed into a [0, 1] feature space suitable for transformer-based modeling, and how a reported satisfaction score is assigned to one of the three classes (Low, Medium, High).
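The worked example above can be reproduced with min-max scaling, \(x'=(x-\min)/(\max-\min)\), applied per feature over its stated range:

```python
# Min-max scaling of the example student's features to [0, 1].
def minmax(x, lo, hi):
    return (x - lo) / (hi - lo)

sat     = minmax(1450, 900, 1600)  # SAT score, approx. 0.79
gpa     = minmax(3.6, 0, 4.0)      # High-school GPA, 0.90
certs   = minmax(2, 0, 5)          # Certifications, 0.40
soft    = minmax(8, 1, 10)         # Soft skills score, approx. 0.78
interns = minmax(2, 0, 4)          # Internships completed, 0.50
```
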

Proposed model

The proposed model is based on the Bidirectional Encoder Representations from Transformers (BERT) architecture, which uses the multi-head self-attention mechanism to capture complex contextual dependencies among input features; its workings are shown in Fig. 2. Unlike traditional sequence models, BERT uses transformer encoder blocks that let each element in the input sequence attend to all elements in both directions, yielding richer representations. At the heart of this architecture is the self-attention function, which computes attention weights from scaled dot-products between query (\(Q\)), key (\(K\)), and value (\(V\)) vectors derived from the input embeddings37. Mathematically, the attention output is calculated using Eq. 13.

$$\text{Attention}\left(Q,K,V\right)=\text{softmax}\left(\frac{QK^{\top}}{\sqrt{d_{k}}}\right)V$$
(13)

where \(d_{k}\), the dimensionality of the key vectors, serves as a scaling factor that prevents excessively large gradients. The multi-head attention mechanism concatenates several such attention outputs, allowing different representation subspaces to jointly attend to information from one another38.
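Eq. (13) can be implemented in a few lines of NumPy; this is a single-head sketch for illustration, without the learned projection matrices of a full transformer layer.

```python
# Minimal scaled dot-product attention (Eq. 13).
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # scaled dot products
    return softmax(scores, axis=-1) @ V       # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = attention(Q, K, V)                      # shape (4, 8)
```
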

Fig. 2
figure 2

Architecture of BERT Model.

The feature embeddings are iteratively refined by stacked transformer layers, each consisting of multi-head attention, position-wise feed-forward networks, layer normalization, and residual connections; all layers are defined in Table 3. This contextualization helps capture refined correlations between educational and behavioral characteristics, improving career satisfaction classification.

Although tree-based models such as RF and GB, and classical DL architectures such as multilayer perceptrons or recurrent networks, achieve good predictive performance on structured data, they may fail to exploit complex interactions and contextual dependencies across disparate attributes. Transformer-based models such as BERT, by contrast, use multi-head self-attention, enabling mutually weighted, bidirectional interactions between diverse features so that both local and global dependencies in the data are preserved. This is especially beneficial in career achievement prediction, where educational traits (e.g., GPA, university ranking) and behavioral traits (e.g., networking score, work–life balance) interact in nonlinear and context-specific ways.

Table 3 Analysis of BERT working based on layers.

Utilizing attention and embeddings not only improves predictive performance but also provides a more interpretable representation of feature contributions than previous works. We used a pre-trained BERT model and adapted its embedding mechanism for structured tabular data by mapping educational and behavioral traits into embedding vectors rather than raw tokens. Instead of language tokens, each feature dimension was embedded and processed through the transformer blocks, allowing BERT to capture contextual interactions across heterogeneous attributes. The results of this study, with BERT achieving 98% accuracy and significantly outperforming both tree-based and recurrent baselines, provide empirical evidence that transformer models are well suited to tabular predictive tasks.
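The adaptation described above can be sketched in miniature: each scalar feature is projected into a \(d\)-dimensional embedding (playing the role of a token embedding), and the resulting "sequence" of feature tokens is mixed by one self-attention step. The random weights below are stand-ins for learned (BERT) parameters.

```python
# One normalized profile of 10 features embedded as "feature tokens"
# and mixed by a single self-attention layer (weights are random
# placeholders, not pre-trained BERT weights).
import numpy as np

rng = np.random.default_rng(0)
n_features, d = 10, 16
x = rng.normal(size=n_features)

# per-feature affine projection: scalar -> R^d
W = rng.normal(size=(n_features, d))
c = rng.normal(size=(n_features, d))
tokens = x[:, None] * W + c                   # shape (10, 16)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# self-attention mixing across the feature tokens
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
mixed = softmax(Q @ K.T / np.sqrt(d)) @ V     # contextualized features
pooled = mixed.mean(axis=0)                   # profile representation
```
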

Baseline models

The baseline models chosen for this work comprise widely adopted machine learning and deep learning techniques, offering an extensive performance comparison. Support Vector Machine (SVM) is a strong linear and nonlinear classifier that maximizes the margin between classes via kernel functions and handles high-dimensional data well. Logistic Regression (LR) is a basic probabilistic linear model widely applied to binary and multi-class classification for its simplicity and interpretability. Random Forest (RF), an ensemble of decision trees, enhances prediction accuracy and prevents overfitting, performing particularly well on structured tabular data thanks to bagging and feature randomness. Finally, the Gated Recurrent Unit (GRU) is an RNN variant developed to model temporal dependencies in sequences; its gating mechanisms address the vanishing gradient problem, and it has been shown to outperform plain RNNs on time-series and ordered-feature tasks. Taken together, these baselines span a range of modeling paradigms and allow the proposed predictive framework to be evaluated effectively.
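The classical baselines can be instantiated as below on synthetic stand-in data; the GRU baseline is omitted here since it requires a deep-learning framework, and these settings are illustrative rather than those of Table 4.

```python
# Cross-validated comparison of the classical baselines.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_classes=3,
                           n_informative=6, random_state=0)
baselines = {
    "SVM": SVC(kernel="rbf"),
    "LR":  LogisticRegression(max_iter=1000),
    "RF":  RandomForestClassifier(n_estimators=200, random_state=0),
}
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in baselines.items()}
```
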

For transparency and reproducibility, we report the hyperparameters, training settings, and computational resources for all models employed in this work. Table 4 summarizes the parameter settings of the baseline machine learning and deep learning methods and the BERT-based transformer alongside our proposed model, including hardware and software, for ease of comparison.

Table 4 Model training setup and hyperparameters.

Performance measures

The proposed model was evaluated with a range of metrics, defined in Table 5, to measure the effectiveness and reliability of the classification. The principal indices were accuracy, which measures the proportion of correctly classified instances; precision and recall, which quantify, respectively, the model’s ability to label positive samples correctly and to retrieve all actual positives, thereby providing information on false positives and false negatives; and the F1-score, the harmonic mean of precision and recall, used to balance the two, especially under class imbalance39.

Table 5 Analysis of evaluation metrics.
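These metrics correspond directly to standard scikit-learn implementations; macro averaging is shown here as one reasonable choice for the three-class setting (the averaging mode is an assumption, not stated in Table 5).

```python
# Accuracy, precision, recall, and F1 on a small toy prediction.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

acc = accuracy_score(y_true, y_pred)            # 6 of 8 correct -> 0.75
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
```
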

Results and discussion

This section discusses the results of the predictive analysis of career satisfaction based on combined educational and behavioral characteristics. First, we perform exploratory data analysis (EDA) to examine feature distributions and their relationships. Then the performance of different predictive models (classical machine learning methods and deep learning networks) is analyzed and contrasted, with particular focus on the BERT transformer model, which shows improved prediction accuracy and robustness. Comprehensive empirical results clarify the behavior of the model and the importance of the features and present the verification metrics that justify the efficiency of the multi-factor approach in learning the complicated patterns that determine career outcomes. The bar plots in Fig. 3 show the total number of job offers by field of study. The data show that graduates from Arts and Mathematics receive the highest volume of job offers, followed closely by Law, Business, Engineering, Medicine, and Computer Science. This indicates that not only popular technology-focused fields but also traditional and broader degree fields such as Arts and Mathematics yield good employability outcomes in these data.

Fig. 3
figure 3

Distribution of job offers by field of study.

This observation could reflect industry requirements or networking opportunities in these domains. The salary distribution histogram in Fig. 4, split by current job level, shows the expected hierarchical trends: junior-level jobs are concentrated in the lower salary ranges, while mid, senior, and executive positions have salary distributions peaking at increasingly higher values. Interestingly, at all but the top level the data skew toward lower salaries, which could indicate early-career attrition or industries in which earning potential is capped. This is further direct evidence of the correlation between job level and starting salary that is so critical to career satisfaction.

Fig. 4
figure 4

Distribution of starting salaries by current job level.

The t-SNE plot in Fig. 5 projects the high-dimensional feature space into two dimensions and colors the points by career satisfaction class (Low, Medium, High). The overlap between class clusters suggests that extracting features that unambiguously separate the classes is difficult, as the underlying feature relationships are complex and non-linear. Nevertheless, a few localized areas show partial clustering, indicating that certain combinations of traits tend to co-occur with particular satisfaction levels. This supports the use of complex models, such as deep learning or ensemble methods, to capture these subtle interactions.
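A projection like that of Fig. 5 can be produced as follows; the perplexity value is an illustrative guess, as the t-SNE settings used for the figure are not reported.

```python
# t-SNE projection of a stand-in feature matrix into two dimensions;
# the resulting points would be colored by satisfaction class.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 10))     # placeholder for the feature matrix
emb = TSNE(n_components=2, perplexity=30,
           random_state=0).fit_transform(X)
```
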

Fig. 5
figure 5

t-SNE visualization of student records over academic and behavioral features, colored by career satisfaction class (0 = Low, 1 = Medium, 2 = High). The clusters partly overlap where features are shared between classes, but distinguishable grouping patterns remain, implying that the satisfaction levels are partially separable in the latent space.
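A projection of this kind can be produced with scikit-learn's `TSNE`. The sketch below uses a small synthetic feature matrix as a stand-in for the student records (the shapes and class labels are illustrative assumptions, not the paper's data):

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in for the academic/behavioral feature matrix:
# 60 records, 8 numeric features, 3 satisfaction classes (0/1/2).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = rng.integers(0, 3, size=60)

# Project to 2-D; perplexity must stay well below the sample count.
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)
print(emb.shape)  # (60, 2) -- one 2-D point per record, ready to color by y
```

The resulting `emb` array is what a plot like Fig. 5 scatters, with point colors taken from the class labels.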

Furthermore, the heatmap in Fig. 6 shows weak to moderate correlations between the numerical features, though some strong positive relationships exist, e.g., between University GPA and High School GPA, and between Soft Skills Score and Networking Score. Note that career satisfaction has almost no direct linear relationship with most attributes, which implies that satisfaction may be driven by a complex combination of factors, or by latent variables that are not linearly related to the observed ones. This emphasizes the value of multivariate modeling approaches.

Fig. 6
figure 6

Correlation heatmap of the numerical attributes of the original dataset. Apart from a few moderate effect sizes, most correlations are weak in magnitude and close to zero, suggesting that educational and behavioral traits carry complementary, non-redundant information for predicting career satisfaction. The unit values on the diagonal confirm the integrity of all features.
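The matrix behind such a heatmap is a plain Pearson correlation matrix. A minimal sketch with synthetic columns (the variable names are illustrative stand-ins for the dataset's attributes):

```python
import numpy as np

# Toy numeric columns: university GPA is constructed to correlate with
# high-school GPA, mirroring the strong cell in Fig. 6; networking is independent.
rng = np.random.default_rng(1)
hs_gpa = rng.normal(3.0, 0.4, 200)
uni_gpa = hs_gpa * 0.7 + rng.normal(0, 0.2, 200)
networking = rng.normal(5, 2, 200)

data = np.vstack([hs_gpa, uni_gpa, networking])
corr = np.corrcoef(data)          # 3x3 Pearson correlation matrix
print(np.round(corr, 2))          # diagonal is exactly 1, off-diagonals in [-1, 1]
```

Plotting `corr` with any heatmap routine reproduces the structure described above: a unit diagonal, one strong off-diagonal cell, and near-zero entries elsewhere.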

The pair plot in Fig. 7 further illustrates, in scatter form, the distributions of behavioral traits such as number of internships completed, soft skills, networking score, and work-life balance across the career satisfaction classes.

Fig. 7
figure 7

Pair plot of selected behavioral traits colored by career satisfaction class. Distribution-wise, higher career satisfaction generally coincides with higher soft skills, networking, and work-life balance scores.

Within every satisfaction bracket, the Soft Skills and Networking scores correlate strongly with satisfaction, indicating a significant behavioral impact on the career experience. The work-life balance values cluster at discrete levels, possibly because survey participants answered on a categorical scale, but its distribution looks broadly similar across the satisfaction groups.

The feature importance analysis in Fig. 8 reveals which predictors the model relies on most when determining career satisfaction. Remarkably, SAT Score, University Ranking, and University GPA, all commonly used academic performance benchmarks, emerge as the top factors alongside Publications: First Author. This emphasizes the large extent to which conventional educational achievements shape career paths and satisfaction. Starting Salary and the other highest-ranked features are consistent with findings that economic reward is positively associated with perceived career success and satisfaction. In addition to these strong academic and financial metrics, behavioral and experiential characteristics, such as Soft Skills Score, Work-Life Balance, and Networking Score, also show compelling importance, albeit at lower ranks. They highlight the complexity of what makes a career satisfying: not just grades or paychecks, but also people skills, professional relationships, and personal well-being. The model's use of these characteristics confirms the study's hypothesis that career outcomes result from educational and behavioral factors acting in concert. Features such as Job Offers and Certifications have moderate but non-zero importance scores, showing that concrete achievements and external validations are also accounted for in career satisfaction. At the same time, Gender and Entrepreneurship carry relatively little importance, indicating that their direct effect is small or mediated through other features. Overall, this importance ranking confirms the utility of comprehensive feature engineering and of employing advanced models on multiple modalities of data.

Fig. 8
figure 8

Feature Importance Analysis.
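Rankings like the one in Fig. 8 are commonly read off a tree ensemble's impurity-based importances. A hedged sketch with scikit-learn on synthetic data (only the first two features drive the label, loosely mimicking the dominance of academic scores; all names and shapes here are assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 300 records, 5 features; only features 0 and 1 matter.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.8 * X[:, 1] > 0).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
imp = rf.feature_importances_      # impurity-based importances, summing to 1
print(np.round(imp, 3))            # features 0 and 1 dominate the ranking
```

Sorting `imp` and plotting it as a horizontal bar chart yields a figure of the same shape as Fig. 8.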

The statistical summary heatmap in Fig. 9 provides an overview of the dataset's distributional properties across the numerical attributes. The summary reflects the anticipated central tendencies and spread for the school-performance measures: the means of High_School_GPA, SAT Score, and University GPA sit near the middle of each measure's observed range, indicating a relatively normal academic profile across the sample. With a maximum above 100,000, the starting salary is clearly right-skewed. This discrepancy indicates that, although many respondents start out earning intermediate-level salaries, a minority is compensated significantly more, which may vary by discipline, location, or experience. This broad salary range has consequences for modeling and indicates the importance of normalization or robust regression methods to handle the right skewness. Soft_Skills_Score and Networking Score have moderate means with limited deviation, which is unsurprising given their subjective, self-reported nature. The distributions of ordinal-like continuous variables, such as Work_Life_Balance and Entrepreneurship, confirm that they follow discrete or ordinal scales. The low standard deviations of the academic measures, compared with salary and a few behavioral scores, point to the differing stability of formal education measures versus subjective experiences or situational influences on career satisfaction. Model development should account for these statistical trends through appropriate feature handling and the integration of multiple data modalities to guarantee robustness.

Fig. 9
figure 9

Summary of numeric columns with statistics.
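The right skew noted above is typically handled with a log transform before modeling. A minimal sketch, using a synthetic log-normal salary column as a stand-in for the dataset's starting salaries:

```python
import numpy as np

def skewness(x):
    """Sample skewness (third standardized moment)."""
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean()

# Hypothetical right-skewed starting salaries (log-normal), as in Fig. 9;
# the mean/sigma values are illustrative assumptions.
rng = np.random.default_rng(5)
salary = rng.lognormal(mean=10.5, sigma=0.5, size=1000)

log_salary = np.log1p(salary)   # compresses the long right tail
print(round(skewness(salary), 2), round(skewness(log_salary), 2))
```

The transformed column has near-zero skew, which makes it far friendlier to models that assume roughly symmetric inputs.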

The exploratory phase indicates that educational backgrounds and behavioral traits are intertwined with career outcomes. Academic metrics such as GPA rank among the most important features but do not alone capture career satisfaction, which highlights the significance of behavioral attributes such as networking and soft skills. Satisfaction tends to correlate with salary and job level, and the t-SNE visualization conveys the nonlinearity of the data. These findings support the need for sophisticated machine learning methods that combine diverse characteristics to accurately forecast job success and satisfaction.

Classification results and comparative analysis of models

Career satisfaction was classified into Low, Medium, and High levels using Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Gated Recurrent Unit (GRU), and BERT models. These models used multi-factor input features spanning educational and behavioral attributes (e.g., GPA, mentorship, university ranking, soft skills, and work-life balance). Among the classical machine learning methods, Random Forest slightly outperformed SVM and Logistic Regression, with an accuracy of 81% and balanced precision-recall figures around 72–74%. This suggests the robustness of RF on such heterogeneous tabular data and its ability to model nonlinear feature interactions. Adapting a GRU (a kind of RNN) to model sequential dependencies in the data markedly improved the classification results (accuracy = 85%), with strong recall and precision, in line with the RNN's capacity to capture temporal or sequential relationships among behavioral attributes. The BERT model (based on the transformer architecture) is clearly superior to all baselines, reaching an accuracy of 98% with precision, recall, and F1 scores of 96–98%, as shown in Table 6. This significant performance gain is evidence of BERT's strong ability to generalize by contextualizing feature embeddings in its multi-layer transformer architecture. The transformer blocks in BERT use self-attention mechanisms that allow the model to dynamically assign importance to input features, thereby capturing complex interdependencies among behavioral and educational features. Unlike sequential recurrent architectures, BERT attends to all inputs in parallel, so it does not need to propagate information step by step through long sub-sequences, avoiding interference between earlier and later inputs.

Table 6 Results analysis of applied model.
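The classical baselines in Table 6 can be benchmarked with a uniform scikit-learn loop. The sketch below uses a synthetic 3-class dataset as a stand-in for the Low/Medium/High task (the accuracies it prints are for the toy data only, not the paper's reported figures):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic 3-class stand-in: label derived from a weighted feature sum.
rng = np.random.default_rng(3)
X = rng.normal(size=(600, 6))
y = np.digitize(X[:, 0] + 0.5 * X[:, 1], bins=[-0.5, 0.5])   # classes 0, 1, 2

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

accs = {}
for name, model in [("SVM", SVC()),
                    ("LR", LogisticRegression(max_iter=1000)),
                    ("RF", RandomForestClassifier(random_state=0))]:
    model.fit(X_tr, y_tr)
    accs[name] = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: {accs[name]:.2f}")
```

Holding the train/test split fixed across models, as here, is what makes accuracy figures like those in Table 6 directly comparable.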

In BERT, embedding layers project the sparse or categorical features into dense high-dimensional vectors so that semantic similarities are preserved and the model generalizes better. These embeddings are then refined by a stack of transformer layers, each comprising multi-head self-attention, feed-forward networks, and normalization layers with residual connections. This layered attention mechanism helps the model capture subtle details in the data that are essential for nuanced classification of career satisfaction levels. The confusion matrices in Fig. 10 show that, for the traditional models (SVM and LR), the confusion is mostly concentrated on adjacent classes (e.g., Low confused with Medium), showing that these groups lie close together in the feature space. RF and GRU alleviate some of these misclassifications, but confusion remains high between the Medium and High classes. In contrast, the BERT model achieves almost perfect classification with no confusion between classes, demonstrating its strong discriminative capability: even subtle differences among the career satisfaction categories are properly separated. The comparison illustrates the importance of the transformer architecture in capturing the complexity of multifactorial career outcomes. BERT's stronger performance indicates that intricately interconnected, nonlinear relationships spanning educational and behavioral data can be modeled effectively when advanced attention mechanisms and rich contextualized embeddings are in place, enabling highly accurate and stable classification of career satisfaction. This capability makes BERT a strong candidate for practical career analytics and personalized career development interventions.
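The core of each transformer layer described above, scaled dot-product self-attention, can be written in a few lines of NumPy. This is a minimal single-head sketch (the dimensions and random weights are illustrative), showing how every position attends to every other in one parallel step rather than through recurrent state:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    Each row of X is one embedded input feature; the softmax rows are the
    attention weights that dynamically assign importance across inputs."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)        # row-wise softmax
    return w @ V, w

rng = np.random.default_rng(4)
X = rng.normal(size=(7, 16))                  # 7 feature tokens, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
out, w = self_attention(X, Wq, Wk, Wv)
print(out.shape)                              # (7, 16); each row of w sums to 1
```

BERT stacks many such heads and layers (with residuals and normalization), but the weighting mechanism that lets it mix educational and behavioral features is exactly this softmax over pairwise scores.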

Figure 11 shows the training and validation loss trends over 50 epochs of BERT model training. Both decrease steadily, indicating a healthy learning curve and good convergence. The training loss falls from approximately 0.7 to around 0.11, reflecting the model's growing ability to reduce errors on the training data. The validation loss follows a similar trajectory, starting high before decreasing and flattening out around 0.14 by the final epochs.

Fig. 10
figure 10

Confusion Matrix analysis of applied models. (a) SVM. (b) LR. (c) RF. (d) GRU. (e) BERT.

The small gap between the training and validation losses is strong evidence of minimal overfitting, so the model is likely to generalize well to previously unseen data. The zoomed inset over the last epochs supports this convergence, with the training and validation losses nearly coinciding at the validation-loss minimum (epoch 50). This convergence suggests that the training was well regularized: the model balances bias and variance effectively and settles on good parameters thanks to proper regularization and learning-rate schedules.

Fig. 11
figure 11

Model training and validation loss analysis over various set of epochs.

The plot in Fig. 12 compares the validation accuracy of the proposed BERT-attention model against its GRU baseline across training epochs. Overall, the learning curve and final performance of the BERT model are significantly better. Beginning with a higher initial accuracy (74% vs. the GRU's 70%), BERT's validation accuracy grows faster during the rapid learning phase (epochs 1–10), staying 4–6% above the GRU. As training proceeds into the stabilization phase (epochs 30–50), the BERT model climbs more stably to the top, ultimately clearly exceeding the GRU's plateau of around 86%. The graph is shaded to mark these phases, accentuating the performance gap between the models. The labeled percentage increases at different epochs show strong increments, emphasizing the role of contextual embeddings and attention in learning patterns in the joint educational and behavioral feature space.

Fig. 12
figure 12

Deep and transformer model accuracy comparison.

Figure 13 shows the model's prediction confidence across the three career satisfaction classes (Low, Medium, High). Each bubble is a particular prediction, with bubble size proportional to the confidence level. Most importantly, the clusters sit in the confidence regime close to 1.0 for all classes, showing highly confident predictions. The tight clustering at high confidence values indicates good performance and strong discriminative power. The very low dispersion below 0.6 confidence indicates few ambiguous predictions, further confirming the model's strong generalization. This highly confident distribution also agrees with the accuracy and loss analyses, underlining BERT's ability to classify fine-grained career satisfaction levels with high precision and confidence.

Fig. 13
figure 13

Model Confidence Score analysis over classes.
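A confidence score of the kind plotted in Fig. 13 is simply the maximum softmax probability of each prediction. A minimal sketch (the logit values are made up for illustration):

```python
import numpy as np

def softmax(z):
    """Numerically stable row-wise softmax."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical classifier logits for four predictions over (Low, Medium, High).
logits = np.array([[0.2, 1.1, 4.0],
                   [3.5, 0.4, 0.1],
                   [0.3, 2.9, 0.8],
                   [1.0, 1.2, 1.1]])

probs = softmax(logits)
confidence = probs.max(axis=1)     # bubble size in a plot like Fig. 13
predicted = probs.argmax(axis=1)   # predicted class index per row
print(np.round(confidence, 2), predicted)   # predicted -> [2 0 1 1]
```

The last row illustrates an ambiguous prediction: its three logits are nearly equal, so its confidence falls well below the others.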

Interpretability analysis with LIME and SHAP shows that both educational and behavioral features are important factors in predicting career satisfaction. Starting Salary, Current Job Level, Certifications, and Job Offers contribute most to the class prediction under both methods, while Soft Skills Score and Networking Score are found to be only marginally effective. Figure 14 gives a local explanation of a single prediction and shows how each feature pushes the model's output toward or away from the predicted class. The contributions of starting salary, certification level, internship count, and organization size are positive and raise the predicted class probability, whereas other features (e.g., age, work-life balance) make weak or even negative contributions.

Fig. 14
figure 14

LIME-style local explanation illustrating top feature contributions for a predicted Medium satisfaction case.

This highlights how individual-level factors can combine in ways that deviate from the global trend, providing personalized interpretability.

Figure 15 shows the average global importance of the features with respect to the trained model in a bar plot. Academic features such as GPA and SAT scores contribute less here, while soft skills, networking, and work-life balance have minimal global impact. This emphasizes the importance of practical career factors, rather than purely academic performance, in shaping outcomes. Together, these results suggest that measurable career outcomes (salary, promotions, job level) remain the decisive predictors of satisfaction, whereas softer indicators, although relevant, play a comparatively minor role.

Fig. 15
figure 15

Global SHAP feature importance plot showing the average impact of predictors on career satisfaction classification.
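The paper's global view uses SHAP; a simpler model-agnostic alternative that conveys the same idea is permutation importance: shuffle one feature at a time and measure the accuracy drop. The sketch below is exactly that substitute technique, not the SHAP computation itself, run on a toy model where only one feature matters (all names and shapes are illustrative):

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = np.random.default_rng(seed)
    base = (predict(X) == y).mean()
    drops = []
    for j in range(X.shape[1]):
        d = 0.0
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])          # break feature j's link to the label
            d += base - (predict(Xp) == y).mean()
        drops.append(d / n_repeats)
    return np.array(drops)

# Toy "model": thresholds feature 0 only, standing in for a dominant
# predictor such as Starting_Salary (hypothetical mapping).
rng = np.random.default_rng(6)
X = rng.normal(size=(500, 4))
y = (X[:, 0] > 0).astype(int)
predict = lambda X: (X[:, 0] > 0).astype(int)

imp = permutation_importance(predict, X, y)
print(np.round(imp, 3))   # feature 0 dominates; the rest are near zero
```

Sorting these per-feature drops and plotting them as bars yields a global-importance chart analogous in spirit to Fig. 15.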

Taken together, these findings offer convincing evidence of the superiority and stability of the proposed BERT model in modeling career satisfaction in terms of multivariate educational and behavioral characteristics. Although BERT achieved a very high accuracy of 98%, this result should be interpreted with caution. Professional data in real-world situations are usually noisy and subjective and span demographics and socio-economic backgrounds that may not be fully represented in the dataset40. Hence, the almost-perfect scores observed here are, at least to some extent, influenced by characteristics or biases specific to this dataset, and validation on more diverse cohorts is needed. Nevertheless, the feature importance analysis provides useful insights in line with previous career development research. For instance, key educational indicators such as SAT scores are robust predictors of early-career success, echoing findings from educational psychology, while behavioral factors such as networking and work-life balance resonate with human capital management and organizational behavior research on social capital and sustainable professional engagement. This concordance with prior work underscores that, while overfitting remains a concern, the proposed model learns meaningful associations between educational and behavioral attributes and career satisfaction outcomes. BERT's attention mechanisms effectively capture complex dependencies in its end-to-end learning process, leading to fast convergence and highly confident, accurate predictions that surpass classical recurrent models such as GRU. This demonstrates the viability of transformer-based architectures for intelligent career analytics applications.
Our proposed BERT model for career satisfaction prediction significantly outperforms the existing state-of-the-art models in the field.

Previous work, summarized in Table 7, trained a Gradient Boosting (GB) model on data such as university alumni surveys and achieved an AUC of 89%15, and a hybrid LSTM and RF model trained on longitudinal student data reached an accuracy of 89%17, while our BERT model achieved 98% accuracy. Earlier deep learning solutions, such as a single-layer LSTM on alumni salary data, reported moderate accuracy of around 78%16, while a BERT model applied to corporate demographic and exam score data reported an accuracy of about 73%26. Classic machine learning models such as Random Forest and Support Vector Machines (SVM) achieved accuracies between 68% and 77% on similar educational datasets18,19. The improved performance of our model results from the incorporation of both behavioral and academic characteristics and from contextual learning of their intricate relationships via the attention mechanisms of transformers. This broad feature representation enables a better understanding and prediction of career satisfaction and showcases how state-of-the-art transformer architectures can transform educational and career analytics.

Table 7 Comparison of proposed with state-of-the-art.

Conclusion and future work

This study presents a multi-factor predictive model for career success and satisfaction that integrates both educational and behavioral traits using advanced machine learning techniques. Our findings demonstrate that the proposed BERT-based model significantly outperforms traditional classifiers and recurrent neural networks by effectively capturing complex interactions through its transformer-based attention mechanisms. Key educational features such as SAT scores, university ranking, and GPA, alongside behavioral factors like soft skills and networking, collectively contribute to accurate prediction of career satisfaction levels. The model’s superior performance, achieving 98% accuracy, underscores the value of combining diverse data modalities and leveraging deep contextual embeddings for nuanced career analytics. Moreover, the exploratory data analysis revealed intricate relationships between educational background, personal traits, and career outcomes, affirming that career satisfaction is influenced by multifaceted factors rather than isolated indicators. The t-SNE visualization and feature importance analysis further highlighted the non-linear and interdependent nature of these attributes. The current study has limitations: first, the data were collected through self-reported survey responses, which may be subject to bias or may misrepresent actual academic and career performance. Second, the lack of external validation or deployment on other datasets limits generalizability to other populations, institutions, or socio-economic settings. Third, although BERT performed with very high accuracy, transformer models often act as black boxes; their decision-making process is not as clearly interpretable as that of tree-based methods. Additionally, incorporating the temporal dynamics of career progression and integrating external economic or industry-specific variables may further improve model robustness.