Introduction

The rapid development of the Internet industry has made online recruitment increasingly prominent (Cao and Wang, 2023). Traditional analytical methods struggle to handle the statistical analysis of online recruitment data comprehensively, which makes it difficult for college graduates to keep up with constantly changing market demands when seeking employment (David and Azad, 2014). This challenge often leads to career mismatch and job dissatisfaction (Gu, 2012). As a result, the gap between the skills taught in university education and those sought by employers has widened, creating a dual challenge: enterprises have difficulty filling vacancies, while students encounter obstacles in finding employment. Job seekers majoring in Teaching Chinese to Speakers of Other Languages (TCSOL) also face this problem.

The TCSOL program aims to cultivate bilingual, cross-cultural, and well-rounded professionals who possess the basic qualities and skills required in international Chinese education. TCSOL graduates should possess theoretical knowledge and practical application abilities in both Chinese and foreign languages, enabling them to engage effectively in international Chinese language teaching and to promote Chinese culture in educational institutions at home and abroad. With the introduction of international talent training strategies, the number of TCSOL students has been steadily increasing. Since the beginning of the 21st century, TCSOL has continuously expanded its undergraduate, graduate, and doctoral programs in China. According to statistics from the Ministry of Education of the People’s Republic of China, by 2010, 285 universities offered TCSOL programs, and this number increased to 342 by 2012, with a total enrollment of 63,933 students. By 2023, 409 institutions were offering four-year undergraduate programs, and 198 provided professional master’s degrees. In addition, in 2022, TCSOL introduced a doctoral degree program offered by 23 institutions (Li and Wu, 2024). Against this background, the employment prospects of TCSOL professionals have attracted growing attention.

Recruitment data related to TCSOL are disseminated through various online platforms and are an important resource for understanding the market demand for such talent. Because electronic recruitment data are mainly presented as text, text mining techniques are well suited to extracting the information they contain (Zhu, 2020). Text mining captures and analyzes recruitment information, providing job seekers with insights into salary levels, work locations, educational requirements, and company sizes (Yao et al. 2022). Students majoring in TCSOL are particularly concerned with questions specific to their field: Where is demand for Chinese language education majors greatest? What types of jobs can TCSOL applicants do? What job skills do job seekers need to master? All of these questions can be addressed through text mining (Wan et al. 2023). Furthermore, the results of text mining analysis provide valuable insights for career guidance centers within educational institutions, helping to develop tailored talent training programs and course structures that meet current market demands (Li and Gao, 2018). With the rise of text mining, recruitment data analysis has become more fine-grained, allowing for a deeper understanding of market demand.

The research on text mining applications for analyzing talent demand in the context of academic employment mainly focuses on long-term data analysis in fields such as information engineering (Li et al. 2023), big data analytics (Wang and Yao, 2024), and graphic intelligence (Ma, 2022). However, in humanities disciplines such as TCSOL, there is a clear lack of research on text mining of recruitment data. Despite extensive discussions on the employment prospects of TCSOL graduates, there is limited research specifically focused on data processing and analysis of recruitment information for this major. This gap highlights the necessity of using text mining methods to clarify recruitment information related to TCSOL and provide valuable insights for students and educators. This article uses web crawling technology to aggregate recruitment data from online platforms and conducts text mining on basic information about the relevant job market, aiming to explore the career needs of recruitment units for TCSOL professionals. It can not only help job seekers grasp the needs of the recruitment market, but also provide some references for universities and educators to develop teaching systems and talent training goals.

The main contributions of this paper are as follows. (1) By integrating three methodologies, namely LDA topic modeling, BERT-BiLSTM-CRF based named entity recognition, and co-occurrence network analysis, a comprehensive multi-level analysis framework is established that spans the macro to micro levels and content mining to relationship analysis. Among these methods, LDA topic modeling clusters the job descriptions in the recruitment information, and the named entity recognition model based on BERT-BiLSTM-CRF extracts the keyword entities explicitly mentioned in the recruitment texts. Based on the extracted entities, co-occurrence network analysis is conducted to explore the correlation and structural characteristics of different job requirements and to reflect their synergistic effects in recruitment demands. This combination of methods is applied for the first time to the mining of TCSOL professional talent demands, filling gaps in previous studies. (2) This study introduces time series analysis to examine the trend of skill demand. By incorporating a temporal dimension, the analysis covers the demand characteristics of TCSOL professionals from specific time points to annual trends, studying the trends and predictions of skill-related keywords over time. (3) In addition to the text analysis methods, this paper constructs a hierarchical model of talent demand and development in TCSOL, incorporating perspectives from employers, job seekers, educators, and policymakers. This model visualizes key insights into the skills and development strategies required for future professionals in the field. This study aims to address the following key questions:

  1. What specific aspects constitute the primary focus of talent recruitment in the domain of international Chinese education?

  2. What key skills and qualities are essential for TCSOL professionals?

Literature Review

Internet recruitment platforms are a key channel for information exchange between enterprises and college graduates. More and more enterprises disseminate recruitment information through various online platforms, thereby reducing information asymmetry in the job market. Scholars have responded by conducting in-depth research on online recruitment information and proposing various analytical methods. Todd et al. (1995) conducted keyword extraction and frequency analysis on information system positions advertised in newspapers across the United States and Canada. Lee and Lee (2006) compiled a classification catalog of skills in information technology positions and an original recruitment dictionary. Building on the work of Todd and Lee, Sodhi and Son (2010) carried out cross analysis and correlation analysis of keyword frequencies and compiled a dictionary of skill keywords. They proposed an empirical method for inferring employers’ operational research (OR) skill requirements through content analysis of online job advertisements. Their research provides a reference for using intelligent analysis methods to analyze talent needs. However, like Den et al. (2006), these studies predominantly relied on simple keyword frequency analyses and did not examine the relationship between skills and positions in depth.

Text mining has emerged as a pivotal research field that is gaining increasing attention within academic circles. Text mining, also known as text analysis, is the process of using technical methods to extract potentially relevant information contained in text (Feldman and Dagan, 1995). Visualization software is then used to present the analysis results and provide data support for subsequent decisions (Zhu et al. 2024).

Leon et al. (2017), Xia et al. (2016) and Lv and Han (2012) were among the pioneering scholars to employ intelligent research tools for the automated processing of job advertisements. With the advancement of research, an increasing number of scholars have undertaken comprehensive text mining analyses of job advertisements from diverse perspectives. Qian et al. (2022) believe that competitive intelligence requires detailed analysis of the recruitment information publicly released by enterprises. The researchers used the attractiveness theory and text mining methods to build an enterprise competitiveness analysis model. Guo (2020) discussed the recruitment process from big data perspective, analyzed the factors influencing effective recruitment, and built a more precise recruitment system by combining employer situations with specific company information, verifying the results.

Chen (2020) analyzed the recruitment situation of industries related to the keyword “data analysis” in six typical cities, applying statistical analysis to the structured data and using the Word2Vec model to supplement the keywords of the unstructured data in order to obtain an overview of the topic. Pan et al. (2022) explored the employment needs of recruiters for big data positions in the job market and examined the associated skill requirements. Wei (2022) analyzed the employment situation of statistics majors by collecting recruitment information, analyzing data on enterprises, qualifications, and positions, extracting key terms related to job requirements, and building decision tree and association rule models to mine correlations between various factors. Tong and Xu (2024) designed an analytical framework that uses text mining to identify professional positions and skill requirements from the unstructured data of online recruitment information. Their research provides a reference for national talent policy, for the development of academic programs in colleges and universities, and for job seekers who need information on social demand.

In recent years, text mining technology has been improving, and some scholars have begun to use some intelligent research tools to automate the processing of recruitment information. Various methods based on natural language processing (NLP) are widely used in text mining tasks, among which the topic probability model and the named entity recognition (NER) method based on deep learning are representative.

Latent Dirichlet Allocation (LDA), first introduced by Blei et al. (2003), is one of the most popular methods in topic modeling. LDA is a generative probabilistic model that identifies potential topic information within large document sets or corpora. Some scholars use the LDA topic model to visually display the core job requirements in the recruitment information and analyze the recruitment needs. Yang (2019) and Zhu (2020) both analyzed the situation of data positions. Yang focused on the job requirements of Shandong Province, while Zhu explored the job conditions of 297 industries in 7 regions, established the LDA model, constructed the subject situation of data positions, and discussed the talent requirements. Yue et al. (2022) collected the recruitment information of the mobile Internet industry, established the LDA topic model of the post requirement text to analyze the talent demand. Wu et al. (2021) devised a feature scoring algorithm based on the LDA topic model, dynamically scoring subject words and multiple features within recruitment text to extract requisite skills. Li (2021) used Scrapy to crawl web technology job postings in response to the new situation, helping candidates understand enterprise talent demand.

Although LDA and similar models have been widely used for topic modeling, they have some shortcomings. LDA represents each document as a bag of words, ignoring word order and context, and therefore fails to capture the semantic associations between words. Deep learning models such as Bi-directional Long Short Term Memory (BiLSTM) and Bidirectional Encoder Representation from Transformers (BERT) have been shown to capture contextual semantic associations (Wang and Gao, 2021). Using the word vectors of deep learning models such as BERT as input can provide a richer semantic representation for the LDA model and improve the accuracy of topic modeling (Zhao et al. 2021). Named entity recognition methods based on deep learning have been applied in many fields with excellent results (Li et al. 2021). Schuster and Paliwal (1997) proposed the bidirectional recurrent architecture underlying the Bi-directional Long Short Term Memory (BiLSTM) network, which learns information from both directions of the input sequence to obtain better encodings and thereby improves entity recognition. However, although the BiLSTM model can automatically extract text sequence features for the target entity, it cannot learn the constraints and dependencies between the output labels. For this reason, many scholars combine the BiLSTM model with the Conditional Random Fields (CRF) model (Lafferty et al. 2001), which takes the order among output labels into account and outputs the annotation sequence with the highest joint probability.
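To make the role of the CRF layer concrete, the following is a minimal sketch of Viterbi decoding, the standard procedure for finding the label sequence with the highest joint score. It is an illustration only and not the implementation used in this study: the function name `viterbi_decode`, the label set, and all emission and transition scores are hypothetical values chosen for the example.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    labels: List[str],
) -> List[str]:
    """Return the label sequence with the highest total score.

    emissions[i][y]   : score of label y at position i (e.g. produced by a BiLSTM)
    transitions[a][b] : score of moving from label a to label b (the CRF part)
    """
    if not emissions:
        return []
    # best[i][y] holds the best score of any path that ends in label y at position i.
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[p][y])
            scores[y] = best[-1][prev] + transitions[prev][y] + emissions[i][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical scores: the transition O -> I-jy is heavily penalised because an
# entity cannot start with an "inside" tag.
labels = ["O", "B-jy", "I-jy"]
transitions = {
    "O": {"O": 0.0, "B-jy": 0.0, "I-jy": -10.0},
    "B-jy": {"O": 0.0, "B-jy": 0.0, "I-jy": 0.5},
    "I-jy": {"O": 0.0, "B-jy": 0.0, "I-jy": 0.5},
}
emissions = [
    {"O": 1.0, "B-jy": 0.8, "I-jy": 0.0},
    {"O": 0.2, "B-jy": 0.1, "I-jy": 1.0},
    {"O": 0.3, "B-jy": 0.0, "I-jy": 0.9},
]
decoded = viterbi_decode(emissions, transitions, labels)
greedy = [max(labels, key=lambda y: e[y]) for e in emissions]
```

In this toy case, picking the best label at each position independently would start the sequence with "O" followed by "I-jy", which is not a valid BIO sequence, whereas the joint decoding returns a sequence that begins with "B-jy".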

Huang et al. (2015) first proposed applying the BiLSTM-CRF model to sequence labeling data sets. Liang et al. (2021) carried out named entity recognition on data science recruitment texts and, by constructing a corpus and running model comparison experiments, concluded that the BiLSTM-CRF model achieved the best entity recognition performance. He et al. (2023) built a corpus for academic named entity recognition in colleges and universities through entity labeling and conducted comparative experiments on four named entity recognition models. The results showed that the BiLSTM-CRF model performed best on the corpus they constructed. Cheng et al. (2022) used the BiLSTM-CRF model to extract skill entities from IT job recruitment texts and conducted in-depth analysis to provide guidance for job seekers in the IT field. The BiLSTM-CRF model can therefore achieve good results in named entity recognition tasks. However, it cannot resolve polysemy, where one word carries several meanings in Chinese text, and it does not capture semantic information from text well (Zheng et al. 2023). Therefore, some scholars add a pre-trained model, such as BERT, to the traditional BiLSTM-CRF model.

Devlin et al. (2019) of Google proposed a bidirectional encoder representation BERT method based on Transformer, which is a deep bidirectional representation pre-training model that fully considers word ambiguity and possesses strong semantic analysis capabilities. Wang et al. (2019) combined the BERT model and BiLSTM-CRF model for named entity recognition, and obtained a 94.86% F1 value on the People’s Daily dataset. Hu et al. (2022) used the BERT-BiLSTM-CRF model for named entity recognition of educational technology main courses, and achieved high recognition performance on the corpus data set of educational technology main courses constructed. Wang et al. (2023) used the optimized BERT-BiLSTM-CRF algorithm to solve the problem that the text of earthquake emergency information constantly changes with the time of earthquake occurrence. Based on the historical earthquake emergency information dataset of the past 10 years, the model can efficiently and accurately extract earthquake emergency information from online media, and its performance is better than other baseline models. Zhuang et al. (2024) suggested that geological entity identification from texts was significant for researchers to mine and analyze geological data.

Based on the LDA topic model and named entity recognition, it is feasible to extract key information about the job capacity requirements related to the TCSOL major from unstructured online recruitment texts automatically and accurately. The current capacity requirements for TCSOL professionals in the recruitment market can thus be obtained at a deeper and more precise level, avoiding the limitations of descriptive statistical analysis. Because words in online recruitment texts are often polysemous, entity types are easily misidentified. The BERT-BiLSTM-CRF model is therefore applied to the employment requirement data of TCSOL-related positions, where it can accurately extract job skill requirements and work experience descriptions and provide support for career recommendations.

Research Methodology

This study uses a multi-method framework combining LDA topic modeling, BERT-BiLSTM-CRF-based named entity recognition (NER), and social network analysis (SNA) to comprehensively analyze the recruitment demand for TCSOL professionals. The specific analysis steps are illustrated in Fig. 1. The detailed description of research procedures and text feature extraction methods is available in Appendices A1 and A2. Additionally, the architecture of the BERT-BiLSTM-CRF model is shown in Fig. A1.

Fig. 1: Data analysis process.

The data analysis includes four steps: identifying data sources, collecting recruitment information via web crawlers, preprocessing data, and building models for visual analysis.

Data Acquisition

According to the latest data from iiMedia Research (2023), the survey of the recruitment platforms most used by Chinese enterprises in 2023 (TOP5) shows that 51job is the most used platform, used by 53.7% of those surveyed, followed by Boss Zhipin at 51.9%, Zhaopin at 46.2%, Liepin at 27.8%, and LinkedIn at 20.3%. LinkedIn, which mainly focuses on overseas recruitment, was excluded from this study. Therefore, four commonly used recruitment platforms in China (Boss Zhipin, Zhaopin.com, 51Job.com, and Liepin) were chosen as data sources.

Three Chinese keywords were selected for search: “对外汉语教学” (Teaching Chinese as a Foreign Language), “汉语国际教育” (Teaching Chinese to Speakers of Other Languages), and “国际中文” (International Chinese Language). These keywords reflect both the historical and contemporary evolution of Chinese language education terminology. The process for selecting these keywords is detailed in Appendix A3. A comparison of key factors in the evolution of the three terminologies is shown in Appendix Table A1. The data collection spanned from August 25 to August 27, 2023.

The collected job information encompasses three main parts: detailed information related to the enterprise (e.g., name, type, scale, industry), job-related information (e.g., job title, location, experience, education, salary, recruitment number), and detailed job descriptions (e.g., responsibilities, requirements).

This study used the Bazhuayu collector (https://www.bazhuayu.com) as the data acquisition tool, which provided abundant data and thereby improved the reliability of the analysis results. Additionally, Python was used during the data preprocessing stage to handle more complicated text content. A total of 4313 recruitment postings related to the TCSOL major were initially collected. A portion of the collected results is presented in Appendix Table A2.

A detailed description of the recruitment data cleaning steps is provided in Appendix A4. After removing duplicates and irrelevant text, a total of 3310 recruitment messages were saved in Excel tables. The specific source distribution is as follows: 784 pieces of data were collected from the Boss Zhipin recruitment website, 752 from the 51job.com recruitment website, 1325 from the Zhaopin.com recruitment website, and 449 from the Liepin website.
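The deduplication and per-platform tally described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the cleaning procedure of Appendix A4: the field names (`source`, `company`, `title`, `description`), the choice of duplicate key, and the sample records are all hypothetical.

```python
import re
from collections import Counter


def normalize(text: str) -> str:
    """Collapse whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(postings):
    """Keep the first posting for each (company, title, description) key."""
    seen, kept = set(), []
    for posting in postings:
        key = tuple(normalize(posting[f]) for f in ("company", "title", "description"))
        if key not in seen:
            seen.add(key)
            kept.append(posting)
    return kept


# Hypothetical records; the second is a whitespace variant of the first.
raw_postings = [
    {"source": "Zhaopin", "company": "A", "title": "对外汉语教师", "description": "负责 汉语教学"},
    {"source": "Zhaopin", "company": "A", "title": "对外汉语教师", "description": "负责  汉语教学 "},
    {"source": "Liepin", "company": "B", "title": "国际中文教师", "description": "线上授课"},
]
clean_postings = deduplicate(raw_postings)
per_source = Counter(p["source"] for p in clean_postings)
```

Applied to the real data, a tally of this kind is what yields the per-platform counts reported above, which sum to the 3310 retained postings (784 + 752 + 1325 + 449).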

Data Preprocessing

Data preprocessing is a critical step in ensuring the reliability and accuracy of subsequent analysis (Li, 2022). This study used NLP techniques such as data cleaning, Chinese word segmentation, stop word removal, NER, and LDA topic modeling to process recruitment information.

Data cleaning involved removing irrelevant and duplicate entries, special characters, and short or incomplete texts. The cleaned text was then divided into Chinese words. Details of the data cleaning process and Chinese word segmentation are provided in Appendix A5. For NER, an entity tag set was defined based on recruitment characteristics, covering four categories: comprehensive abilities, character traits, experience requirements, and educational background (Table 1).

Table 1 Part of the recruitment information collection results.

Using the Doccano annotation tool, over 110,000 words were labeled following the BIO annotation strategy, which categorizes text into entity beginnings (B-X), interiors (I-X), and non-entities (O). For instance, for the sentence “{普通话标准 (Mandarin proficiency standard), 有1年以上对外汉语教学经验 (more than 1 year of experience in teaching Chinese as a foreign language)}”, the BIO annotation should be “{‘B-comprehensive abilities (nl)’, ‘I-comprehensive abilities (nl)’, ‘I-comprehensive abilities (nl)’, ‘I-comprehensive abilities (nl)’, ‘I-comprehensive abilities (nl)’, ‘O’, ‘O’, ‘O’, ‘O’, ‘O’, ‘O’, ‘B-experience (jy)’, ‘I-experience (jy)’, ‘I-experience (jy)’, ‘I-experience (jy)’, ‘I-experience (jy)’, ‘I-experience (jy)’, ‘I-experience (jy)’, ‘I-experience (jy)’}”. Here, “Mandarin proficiency standard” is annotated as the entity “comprehensive abilities”, and “with more than 1 year of experience in teaching Chinese as a foreign language” is annotated as the entity “experience”.
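The tag sequence in this example can be reproduced with a short sketch. It assumes character-level tagging (the 19 tags above match the 19 characters of the sentence when the two phrases are joined by a full-width comma) and uses the short labels `nl` and `jy` from the example; the helper names `find_span` and `bio_tags` are hypothetical.

```python
def find_span(text: str, phrase: str, label: str):
    """Locate a phrase in the text and return (start, end, label), end exclusive."""
    start = text.index(phrase)
    return start, start + len(phrase), label


def bio_tags(text: str, spans):
    """Assign one BIO tag per character from a list of entity spans."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Assumption: the raw sentence joins the two phrases with a full-width comma.
sentence = "普通话标准，有1年以上对外汉语教学经验"
spans = [
    find_span(sentence, "普通话标准", "nl"),        # comprehensive abilities
    find_span(sentence, "对外汉语教学经验", "jy"),  # experience
]
tags = bio_tags(sentence, spans)
for char, tag in zip(sentence, tags):
    print(char, tag)
```

Note that in this tagging the comma and the characters of “有1年以上” receive “O”, so the tagged experience entity is the eight-character phrase “对外汉语教学经验”.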

The annotated dataset was split into training, test, and validation sets at an 8:1:1 ratio. Three models—BiLSTM, BiLSTM-CRF, and BERT-BiLSTM-CRF—were trained and evaluated for NER tasks using metrics such as accuracy, recall, and F1-score. Detailed descriptions of the variables for model evaluation and the recognition results of three NER models are provided in Appendix A6. The recognition results of these models are presented in Table A3. Specific model parameter settings are summarized in Appendix Table A4.
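As an illustration of the split and of entity-level scoring, the sketch below shuffles a list of annotated sentences into 8:1:1 partitions and computes precision, recall, and F1 by exact span match. It is a sketch under assumptions rather than the evaluation code of this study: the random seed, the exact-match criterion, and the use of precision (the measure conventionally paired with recall when computing F1) are choices made for the example, and the definitions actually used are those in Appendix A6.

```python
import random


def split_8_1_1(items, seed=42):
    """Shuffle and split into training, test, and validation parts (8:1:1)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * 0.8)
    n_test = int(len(items) * 0.1)
    return (
        items[:n_train],
        items[n_train:n_train + n_test],
        items[n_train + n_test:],
    )


def extract_entities(tags):
    """Turn a BIO tag list into a set of (start, end, label) spans."""
    entities, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the current entity
        if label is not None:
            entities.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities


def precision_recall_f1(gold_tags, pred_tags):
    """Entity-level scores: a prediction counts only if span and label match exactly."""
    gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


train, test, valid = split_8_1_1(range(100))
gold = ["B-nl", "I-nl", "O", "B-jy", "I-jy"]
pred = ["B-nl", "I-nl", "O", "B-jy", "O"]  # second entity is cut short
scores = precision_recall_f1(gold, pred)
```

In the toy comparison, one of the two predicted entities matches a gold entity exactly, so all three scores are 0.5.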

This comprehensive preprocessing framework ensured high-quality data input, enabling robust model training and accurate results.

Results

Structured Data

Data collected from recruitment websites includes structured and unstructured data. The structured data contains fields such as work location, work experience requirements, education background, salary, company type and size, and job type. The format is unified for analysis, and descriptive statistics are summarized below.

The dataset records work locations at the city level. The histogram shows the distribution of work locations (Fig. 2). Recruitment is concentrated in first-tier cities such as Beijing, Shanghai, Guangzhou, and Shenzhen. Shanghai has the highest demand (15% of all posts), while regions in the northeast and northwest, such as Hulun Buir and Yanji, show minimal demand. This geographical difference indicates that TCSOL opportunities are concentrated in economically developed and densely populated urban centers, with relatively low demand in remote areas.

Fig. 2: Distribution statistics of duty stations.

Recruitment is concentrated in first-tier cities such as Beijing, Shanghai, Guangzhou, and Shenzhen. Shanghai has the highest demand.

Work experience requirements vary across positions, reflecting the diversity in industry needs and job responsibilities. As shown in Fig. 3, 42.9% of jobs require 1–3 years of experience, followed by 30.02% with no explicit requirement, and 16.96% demand 3–5 years of experience. Jobs requiring over 10 years of experience are rare, accounting for only 0.29%, typically limited to leadership or highly specialized positions. Figure 3 shows TCSOL’s focus on both experienced educators and trainable graduates, valuing practical teaching skills while ensuring a steady flow of new talent.

Fig. 3: Work experience length requirements.

The pie chart shows the distribution of job postings based on work experience length requirements.

Academic qualifications are a crucial criterion for businesses when assessing resumes and hiring talent, and they reflect the knowledge and technical requirements of the industry. The educational requirements are categorized into six levels: doctorate, postgraduate, undergraduate, junior college, high school or below, and no stated requirement. As illustrated in Fig. 4, most positions require a bachelor’s degree (59.35%) or junior college (25.33%), while doctorate-level positions are rare, comprising only 0.72% of the total.

Fig. 4: Educational requirements.

The pie chart shows the distribution of educational qualifications required for job positions.

The salary level reflects the talent shortage and technical demands of the position. Due to the use of different units in the original data (e.g., annual salary or “ten thousand”), all salaries were standardized to monthly salaries in thousands (k) RMB. Salaries were divided into seven ranges: 0–5k, 5–10k, 10–15k, 15–20k, 20–25k, 25–30k, and above 30k. The normal distribution of overall salary is plotted in Fig. 5 according to the sorted salary levels. As shown in Fig. 5, 49% of positions offer salaries between 5–10k per month, aligning with the expectations of educators. Salaries usually increase with experience and years of service, while higher salary positions are less common, with those above 30k accounting for only 1%. Higher salaries attract job seekers, but they also require more experience and skills.
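The standardization and binning steps can be sketched as follows. The raw salary formats are not given in the text, so the patterns handled here (a low-high range followed by 千, 万, or K, with an optional /月 or /年 suffix) are assumptions, as are the use of the range midpoint and the convention that each bin includes its lower bound and excludes its upper bound.

```python
import re

# Assumed units: 千 or K = thousand RMB, 万 = ten thousand RMB.
UNIT_TO_K = {"千": 1.0, "k": 1.0, "K": 1.0, "万": 10.0}
PATTERN = re.compile(r"(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)(千|万|k|K)(?:/(月|年))?")

BINS = [
    (0, 5, "0-5k"), (5, 10, "5-10k"), (10, 15, "10-15k"),
    (15, 20, "15-20k"), (20, 25, "20-25k"), (25, 30, "25-30k"),
]


def monthly_salary_k(raw: str):
    """Return the midpoint monthly salary in thousands of RMB, or None if unparsable."""
    match = PATTERN.search(raw)
    if match is None:
        return None
    low, high = float(match[1]), float(match[2])
    midpoint = (low + high) / 2 * UNIT_TO_K[match[3]]
    if match[4] == "年":  # annual figure: convert to a monthly value
        midpoint /= 12
    return midpoint


def salary_range(salary_k: float) -> str:
    """Map a monthly salary (in k RMB) to one of the seven ranges."""
    for low, high, label in BINS:
        if low <= salary_k < high:
            return label
    return "above 30k"


# Hypothetical raw values in the assumed formats.
examples = ["1-1.5万/月", "6-8千/月", "15-20万/年", "面议"]
parsed = [monthly_salary_k(s) for s in examples]
ranges = [salary_range(v) if v is not None else None for v in parsed]
```

Postings whose salary cannot be parsed, such as “面议” (negotiable), return `None` here and would need to be handled separately.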

Fig. 5: Salary distribution.

The histogram shows the frequency of job postings across different salary ranges (in thousands).

The company type is a key factor for job seekers, as it impacts career prospects and development opportunities. The collected data (Fig. 6) shows that recruitment information from public institutions and government agencies is limited as they typically follow unified recruitment schedules and rarely post on commercial recruitment platforms. These sectors only account for 2.53% and 2.8% of enterprise types, respectively. In contrast, private enterprises dominate, representing 58.72% of recruitment postings, underscoring their significant role in China’s economy. Foreign-invested enterprises follow, accounting for 15.85%, including European-American and non-European-American entities, providing diverse choices for international Chinese language education professionals.

Fig. 6: Company type distribution.

The pie chart shows the proportions of various company types.

This study categorizes recruitment companies into six scale ranges, ranging from less than 50 employees to over 10,000 employees. Figure 7 shows the scale distribution of recruitment companies. Fifty-three percent of hiring companies are small businesses with 50–500 employees, accounting for the majority. Additionally, 24% are small enterprises with less than 100 employees, while medium-sized companies (500–1000 employees) account for 7%. Large enterprises with over 10,000 employees are the smallest group, accounting for only 3% of the total.

Fig. 7: Company size distribution.

The figure categorizes recruitment companies into six size ranges, from less than 50 employees to over 10,000.

Due to challenges such as inconsistent job titles and data noise in job descriptions, this study adopted a standardized classification of TCSOL-related positions from online recruitment platforms. Over 3,000 recruitment messages were categorized by position type, as illustrated in Fig. 8.

Fig. 8: Job type distribution.

The chart shows the percentage distribution of various job types.

TCSOL talents are distributed across various industries, showing a diverse trend. However, the distribution is uneven, most of which are concentrated in education (39.36%), professional services (17.17%), Internet positions (9.79%) and import and export trade (8.08%). Although educational positions account for the largest share, they still account for less than half of the total demand due to the wide range of available positions. In contrast, industries such as government/public undertakings, social organizations/social security, and banking show minimal demand for TCSOL professionals, accounting for only 0.1–0.3%.

Unstructured Data

Unstructured data focuses on analyzing the content of job descriptions from recruitment postings to understand enterprise requirements for candidates. This study examines the needs for TCSOL-related positions and their dynamic changes from three perspectives: LDA topic modeling, BERT-BiLSTM-CRF named entity recognition, and co-occurrence network analysis.

LDA Topic Model Extends Text Features

The perplexity index was used to determine the optimal number of topics. The analytical details of this selection are given in Appendix B1. Figure B1 shows the perplexity obtained for each candidate number of topics. By combining the perplexity results with the visualization module pyLDAvis, the optimal number of topics for this study was determined to be 3. The intertopic distance map for different topic numbers is shown in Appendix Fig. B2. The three selected topics were analyzed for their thematic content, with keyword distributions visualized in Figs. 9–11. In these figures, the word on the left of each row is a keyword representing the topic, and each bar reflects the frequency of that term. The red and blue bars represent the term’s weight in the specific topic and in the entire corpus, respectively. A longer red bar indicates higher relevance to the topic.
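For reference, perplexity is the exponential of the negative average log-likelihood per token, so lower values indicate a better fit. The sketch below computes it from a document-topic matrix and a topic-word matrix; the names `theta` and `phi` and the toy values are hypothetical, and in practice these matrices would come from an LDA model fitted for each candidate number of topics.

```python
import math


def perplexity(docs, theta, phi):
    """Perplexity of a topic model on tokenised documents.

    docs  : list of documents, each a list of word ids
    theta : theta[d][k] = probability of topic k in document d
    phi   : phi[k][w]   = probability of word w under topic k
    """
    log_likelihood, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            # p(w | d) = sum over topics of p(k | d) * p(w | k)
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_likelihood += math.log(p)
            n_tokens += 1
    return math.exp(-log_likelihood / n_tokens)


docs = [[0, 1], [2, 3]]  # two toy documents over a four-word vocabulary

# A model that spreads probability uniformly over the vocabulary.
uniform = perplexity(docs, [[0.5, 0.5], [0.5, 0.5]], [[0.25] * 4, [0.25] * 4])

# A model whose two topics each explain one document.
focused = perplexity(
    docs,
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]],
)
```

The uniform model has a perplexity equal to the vocabulary size (4), while the focused model halves it (2). Since perplexity often keeps decreasing as topics are added, it is commonly read together with a visual check of topic separation, as done here with pyLDAvis.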

Fig. 9: Top-30 most relevant terms for Topic 1.

$\mathrm{saliency}(w) = \mathrm{frequency}(w) \cdot \sum_t p(t \mid w) \log \frac{p(t \mid w)}{p(t)}$ for topics $t$; see Chuang, Manning and Heer (2012). $\mathrm{relevance}(w \mid t) = \lambda \, p(w \mid t) + (1 - \lambda) \, \frac{p(w \mid t)}{p(w)}$; see Sievert and Shirley (2014). Minor discrepancies due to rounding may cause totals to deviate slightly from 100%, without impacting the interpretation of the data.

Fig. 10: Top-30 most relevant terms for Topic 2.

$\mathrm{saliency}(w) = \mathrm{frequency}(w) \cdot \sum_t p(t \mid w) \log \frac{p(t \mid w)}{p(t)}$ for topics $t$; see Chuang et al. (2012). $\mathrm{relevance}(w \mid t) = \lambda \, p(w \mid t) + (1 - \lambda) \, \frac{p(w \mid t)}{p(w)}$; see Sievert and Shirley (2014). Minor discrepancies due to rounding may cause totals to deviate slightly from 100%, without impacting the interpretation of the data.

Fig. 11: Top-30 most relevant terms for Topic 3.

$\mathrm{saliency}(w) = \mathrm{frequency}(w) \cdot \sum_t p(t \mid w) \log \frac{p(t \mid w)}{p(t)}$ for topics $t$; see Chuang et al. (2012). $\mathrm{relevance}(w \mid t) = \lambda \, p(w \mid t) + (1 - \lambda) \, \frac{p(w \mid t)}{p(w)}$; see Sievert and Shirley (2014). Minor discrepancies due to rounding may cause totals to deviate slightly from 100%, without impacting the interpretation of the data.

It is crucial to define the core meaning of each topic. Although statistical indicators have advanced, the accuracy of the output cannot be guaranteed because of the complexity of language (Grimmer and Stewart, 2017). Therefore, when summarizing the subject-word matrix of each topic, it is necessary to assign titles manually so that they accurately reflect the internal connection and context of the corresponding keywords.

As shown in Table 2, there is a notable degree of overlap between the keywords of the main themes, especially terms such as “school”, “international”, and “Chinese as a foreign language”. These terms mainly refer to positions for international Chinese language educators. However, the importance of each keyword varies across topics, resulting in differing emphases. Thirty keywords were selected for each topic in the LDA topic model, as shown in Table 2. Topic 1 has the highest proportion of postings, accounting for 51% of the corpus. The most relevant keywords for Topic 1 include “school”, “company”, “management”, “study abroad”, and “Japanese”. It differs significantly from the other topics, so Topic 1 is classified as an educational position, mainly involving TCSOL teaching and management in schools or training institutions abroad. The second largest is Topic 2, relating to the “copy editor position”, which accounts for 25.3% of the total thematic distribution. Topic 3, on the thematic dimension of “sales position”, contributes 23.8%. Based on keywords such as “children”, “client”, “tutorship”, and “sales”, it may be related to children’s tutoring organizations and involve sales-related responsibilities.

Table 2 A series of a subject-word matrix and weight of three topics.

BERT-BiLSTM-CRF based named entity recognition

When revealing the specific content of job requirements, simple thematic analysis struggles to capture the key entities explicitly mentioned in the recruitment text (such as “children”, “treatment”, “arrange”, etc.). The BERT-BiLSTM-CRF model offers certain advantages in the entity recognition task for TCSOL professionals and has shown good recognition performance. Therefore, this study uses the BERT-BiLSTM-CRF model to extract recruitment entities from a corpus of 3,310 TCSOL job postings. The extracted results are then visualized and analyzed.

Four categories of entities are identified through word frequency statistics: comprehensive ability, character requirements, experience requirements and educational requirements. Given the extensive number of entities, only high-frequency terms are selected for detailed focus and analysis.
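The post-processing step implied here, turning token-level tags from a sequence labeler into entity counts per category, can be sketched as follows. The BIO tag scheme, the category labels (`ABILITY`, `CHARACTER`), and the two tagged postings are assumptions made for illustration; the study does not specify its label set, and joining tokens with a space is a convenience for English examples only.

```python
from collections import Counter, defaultdict


def extract_entities(tokens, tags):
    """Collect (category, text) spans from parallel token and BIO tag lists."""
    entities, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; close any entity still open.
            if current:
                entities.append((label, " ".join(current)))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            # Continuation of the open entity.
            current.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag: close the open entity.
            if current:
                entities.append((label, " ".join(current)))
            current, label = [], None
    if current:
        entities.append((label, " ".join(current)))
    return entities


def count_by_category(tagged_postings):
    """Word frequency statistics of extracted entities, grouped by category."""
    counts = defaultdict(Counter)
    for tokens, tags in tagged_postings:
        for label, text in extract_entities(tokens, tags):
            counts[label][text] += 1
    return counts


# Invented example postings (hypothetical model output).
postings = [
    (["good", "communication", "ability", "and", "patience"],
     ["O", "B-ABILITY", "I-ABILITY", "O", "B-CHARACTER"]),
    (["strong", "communication", "ability"],
     ["O", "B-ABILITY", "I-ABILITY"]),
]
counts = count_by_category(postings)
```

Calling `counts["ABILITY"].most_common()` then yields the high-frequency terms of one category, which is the kind of ranking used to select entities for closer analysis.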

From the perspective of comprehensive ability, TCSOL employers have highly diverse and practical requirements for job seekers. In particular, the high-frequency demands for “communication ability” (260 times) and “expression ability” (91 times) indicate that language transmission and interaction skills are core competencies. These requirements are closely related to the professional nature of TCSOL. As implementers of language education, teachers need to communicate language and culture effectively in cross-cultural environments. Clear expression and accurate understanding of others are the foundation for achieving teaching goals.

Additionally, the high frequency of soft skills such as “teamwork” (114 times) and “anti-pressure ability” (66 times) reflects the greater demands on collaboration and psychological resilience in diverse teaching environments. Especially in international cooperative teaching or high-intensity teaching tasks, these skills help ensure a smooth teaching process. Moreover, job seekers with “standard Mandarin” (103 times) and a “teacher qualification certificate” (97 times) are generally favored by recruiters, indicating that basic teaching qualifications are an important threshold for employers when screening candidates. With the popularization of online education and digital teaching, proficiency in “office software” (70 times) and information-based teaching tools has become a basic skill. Notably, “bilingualism” (80 times) is becoming a bonus point for international education positions, reflecting the need for linguistic diversity in cross-cultural teaching.

Both structured data and unstructured data contain descriptions of educational requirements. Educational qualifications for job seekers have become mandatory in recruitment information. A bachelor’s degree is the basic threshold for TCSOL, while a master’s degree or above is more related to high-end positions or research-oriented jobs.

In structured data, the “experience requirements” of recruitment positions are mainly analyzed from the perspective of time, while in unstructured data, the “experience requirements” are more detailed. As shown in Table 3, “teaching” (169 times) is the most valued type of work experience, which is highly consistent with the professional attributes of TCSOL. The accumulation of teaching experience can not only improve teachers’ classroom organization ability and teaching effect, but also help teachers better cope with the learning needs of students with different language backgrounds. Additionally, “study abroad” (93 times), “worked in an international school” (17 times) and “worked in different countries” (8 times) are considered valuable assets, indicating that candidates with an international background or relevant career experience are more competitive in a cross-cultural teaching environment. In contrast, work experiences such as “sales”, “edit” and “translation” are mentioned less frequently, reflecting the specialized nature of TCSOL roles. However, these requirements indicate that diverse skills and experiences are becoming increasingly important, encouraging TCSOL job seekers to explore broader developmental opportunities to enhance their career prospects.

Table 3 Entity type word frequency statistics.

Personality characteristics are important factors for employers to consider when selecting candidates for TCSOL positions, because personality affects a person’s work and learning styles to some extent. The personality traits most valued by employers are “sense of responsibility” (mentioned 310 times), “patience” (133 times) and “passion for education” (104 times). This shows that TCSOL positions place high demands on teachers’ professional and psychological qualities. As the scope of Chinese education expands further, teachers will face more complex teaching environments and cultural differences. Teachers need to maintain their passion for education, take responsibility for students’ learning results, and remain patient enough to resolve difficulties in teaching and improve students’ learning experience.

“Affinity” (95 times) and “caring” (80 times) reflect that friendly and caring educators are more likely to gain students’ goodwill and trust. Traits such as being “proactive” (26 times) and having a “pleasant personality” (24 times) help enhance students’ learning enthusiasm and create a relaxed and pleasant classroom atmosphere. Strong “professional ethics” (67 times) are a key factor that makes a candidate stand out.

In some service industries, such as sales, customer service and human resources, there is an emphasis on “service awareness” (42 times) and the ability to “bear hardships and stand hard work” (35 times). Screening for these qualities can improve the quality of employment and help employers recruit suitable service personnel.

In summary, comprehensive abilities and personality traits are highly valued in TCSOL-related positions. This has a certain guiding effect on the future development direction of job seekers.

Co-occurrence network analysis

Based on entity extraction, this study further explores the correlations and structural characteristics of different job requirements through co-occurrence network analysis. For example, the co-occurrence frequency of different competencies (e.g., “communication ability” and “anti-pressure ability”) can reflect their synergies in recruitment needs. The segmented text is imported into ROST CM6.0 to construct a co-occurrence matrix and a co-occurrence network diagram; part of the matrix is shown in Table 4. By analyzing the network diagram, this study reveals the core competence combinations in job requirements and their correlation patterns, supporting an overall understanding of job requirements.
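A co-occurrence matrix of this kind can be approximated in a few lines of Python. The sketch below is not the ROST CM6.0 procedure; it assumes that two feature words co-occur when they appear in the same segmented posting, and the three postings and the vocabulary are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, vocabulary):
    """Count, for each unordered word pair, the documents containing both words."""
    pair_counts = Counter()
    for words in documents:
        present = sorted(set(words) & vocabulary)  # each word counted once per document
        for a, b in combinations(present, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def weighted_degree(pair_counts):
    """Sum of co-occurrence weights attached to each word (node strength)."""
    degree = Counter()
    for (a, b), n in pair_counts.items():
        degree[a] += n
        degree[b] += n
    return degree


# Invented segmented postings (hypothetical data).
docs = [
    ["teaching", "experience", "priority"],
    ["education", "priority", "communication"],
    ["teaching", "priority", "communication"],
]
vocab = {"teaching", "experience", "priority", "education", "communication"}
pairs = cooccurrence_counts(docs, vocab)
degree = weighted_degree(pairs)
```

In this toy example `degree.most_common(1)` returns “priority”, since it co-occurs with every other word; a node with a high weighted degree corresponds to a word with many strong paths in the network diagram.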

Table 4 Co-occurrence matrix of some feature words.

The co-occurrence semantic network diagram can visualize the feature words with high frequency co-occurrence relationship more clearly, as shown in Fig. 12. Most enterprises tend to use the words “education background”, “teaching”, “education”, “communication”, “experience” and “responsibility” to directly indicate the requirements for the professional ability and quality of candidates. These factors are the most fundamental factors in determining whether a company will hire a job applicant.

Fig. 12: Co-occurrence semantic networks.

Nodes represent characteristic words. Arrows represent the co-occurrence paths in documents. The network is weighted, fully connected, and undirected.

In the co-occurrence semantic network, characteristic words with a higher co-occurrence frequency have more complex paths. Analysis of the co-occurrence matrix reveals that the word frequencies associated with “education” and “teaching” are notably high, reflecting the disciplinary nature of the TCSOL major. Specifically, “priority” co-occurs most frequently with “education background” (297 instances), followed closely by “teaching” (258 instances). In the semantic network diagram, the path of “priority” is very complicated: it is intricately connected to concepts such as “teaching”, “education”, “communication”, “responsibility”, and “experience”. This shows that enterprises attach great importance to job seekers’ work experience, education and vocational skills, which form the core competitiveness that determines whether candidates are given priority in hiring.

Besides “priority”, the pathways for “relevant”, “students”, “teaching”, “courses” and “training” are also quite complex. In particular, within the network centered on “relevant”, “relevant” is connected to “priority”, indicating that words associated with “relevant” and “priority” are prominently emphasized in job advertisements. For example, for training positions, enterprises may give priority to job seekers majoring in education-related fields. The word “team” is connected to “communication”, “cooperation”, “good” and “skilled” to form a complex pathway relationship, indicating that enterprises attach great importance to employees’ team spirit and their communication and coordination abilities.

The co-occurrence of terms such as “Mandarin”, “team”, “training”, “management”, “give lessons”, “sales” and “service” indicates a demand in the social market for international Chinese teachers who possess Mandarin proficiency, are team-oriented, and exhibit strong communication skills, along with substantial teaching experience.

Time series analysis

The overall analysis and correlation analysis of skill demands reflect the changes in skill demands at the current time point. To understand the future demands for certain common skills, the concept of time series analysis needs to be introduced. Time series analysis involves decomposing historical data into four parts: trend, cycle, season, and random factors, and then proposing predictions by integrating these factors (Katarya and Prasad, 2017).

In this study, Python libraries such as matplotlib and pandas are used to analyze and model the time series. The time series data is generated based on the formula described above, and a graph displaying the trend, seasonality, and final demand (Fig. 13) is plotted to show the dynamic change of recruitment demand. The monthly recruitment demand analysis for 2023 is presented in Appendix Table A5. The trend component takes the total demand of 22,066 items in 2023 as the benchmark. The TCSOL industry is showing a certain growth trend with the rising popularity of Chinese learning and the global advancement of education internationalization (Lei, 2024). According to relevant education market research reports and comprehensive forecasts of industry development over the next few years, recruitment demand in the TCSOL field is forecast to grow at an annual rate of 5%, reaching an estimated 23,169 jobs in 2024.

Fig. 13: Simulated recruitment demand trend (TCSOL).

The red dotted line (Trend) represents a steady long-term growth trend in recruitment demand, reflecting the continued development of the TCSOL field. The green dotted line (Trend + Seasonality) indicates peak and trough periods, showing cyclical fluctuations that align with established recruitment market patterns. The blue solid line (Final Demand) adds random fluctuations to the trend and seasonality, which brings it closer to the actual situation.

In the seasonal component, based on the general cyclical law of the recruitment industry (Gong, 2009), the TCSOL field also shows obvious seasonal characteristics. The residual component accounts for random fluctuations in the actual data that cannot be explained by trend or seasonality; these are modeled as normally distributed with a standard deviation (σ) of 100 items.
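The three components described in this section can be combined in a short simulation. The sketch below is an illustration under stated assumptions rather than the study's actual model: it assumes an additive form (demand = trend + seasonality + noise), starts the trend at the 2023 monthly average, and uses hypothetical seasonal offsets in `SEASONAL`. Only the 2023 total of 22,066, the 5% annual growth rate, and σ = 100 are taken from the text.

```python
import random

ANNUAL_TOTAL_2023 = 22066   # total demand reported for 2023
ANNUAL_GROWTH = 0.05        # forecast annual growth rate
NOISE_SD = 100              # standard deviation of the residual component

# Hypothetical seasonal offsets for January..December (assumed values that
# sum to zero, with a peak in July); not taken from the study.
SEASONAL = [-400, -350, 150, 200, -200, -150, 500, 300, 150, 50, -50, -200]


def simulate(months=24, seed=42):
    """Return one row per month with trend, trend + seasonality, and final demand."""
    rng = random.Random(seed)
    base = ANNUAL_TOTAL_2023 / 12                    # simplification: 2023 monthly average
    monthly_growth = (1 + ANNUAL_GROWTH) ** (1 / 12)
    rows = []
    for m in range(months):
        trend = base * monthly_growth ** m
        seasonal = SEASONAL[m % 12]
        noise = rng.gauss(0, NOISE_SD)               # zero-mean normal noise (assumed mean)
        rows.append({
            "month": m + 1,
            "trend": trend,
            "trend_plus_seasonal": trend + seasonal,
            "final_demand": trend + seasonal + noise,
        })
    return rows


# Annual forecast implied by 5% growth on the 2023 total.
forecast_2024 = round(ANNUAL_TOTAL_2023 * (1 + ANNUAL_GROWTH))
rows = simulate()
```

The list of rows can be passed to `pandas.DataFrame(rows)` and plotted with matplotlib to obtain three lines analogous to those in Fig. 13.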

According to the “2023 China Online Recruitment Industry Market Research Report” released by the China Economic Industry Research Institute (2023), online recruitment typically follows a regular pattern with distinct peak seasons and off-seasons, while recruitment demand shows a degree of stability across the year (Chen and Fomby, 1999).

Demand was relatively low at the beginning of the year, specifically in January and February. It then rose rapidly in March and April, as educational institutions implemented their plans for the new semester and spring recruitment created supplementary demand for positions. After this increase, demand fell back in May and June and rose again in July, coinciding with the graduation season of college students and preceding the traditional peak recruitment season of “golden September and silver October”. From August to November, schools carried out autumn recruitment, but a relatively large share of these positions targeted fresh graduates, and job demand decreased gradually month by month. With the conclusion of autumn recruitment and other factors such as holidays, recruitment gradually became less active. Numerically, the more obvious trough periods were from December to February and from May to June. The values in the traditional peak recruitment months of March and April were higher, but the overall performance was not as good as that from June to August. In July, the overall indicator level reached its peak. This seasonal pattern is closely related to the teaching cycle in the education industry, the learning patterns of students, and seasonal changes in market supply and demand, presenting a relatively stable and predictable cyclical fluctuation.

According to the time series model constructed in this study and its results, recruitment demand in the TCSOL field exhibits a significant long-term growth trend and cyclical fluctuations. This dynamic change provides an important reference for the development of the industry and offers a basis for optimizing strategies for job seekers, recruitment platforms, and policy makers.

Based on the simulated time series above, and drawing on the four entity categories in Table 3 (comprehensive ability, character requirements, experience requirements and educational requirements), a time series analysis is carried out on specific recruitment demand keywords. The current word frequency data was used as the baseline trend, and a time series model was constructed under reasonable assumptions to predict future changes in recruitment demand. Because the number of entities is large, eight keywords with word frequencies exceeding 100 were selected for analysis. The monthly demand trends are shown in Fig. 14.

Fig. 14: Estimated monthly trends for high frequency demand entities (2023).

This line graph takes the word frequencies of the selected recruitment demand keywords in Table 3 as the baseline trend; a time series model was constructed under reasonable assumptions to predict future changes in recruitment demand.

Overall, the frequency of recruitment demand entities from December to January was relatively low, followed by a slow upward trend from February to April. It began to decline in April and reached a lower point in May. Significant fluctuations occurred from April to August, with the highest value observed in July. After August, it showed a slow downward trend, and the fluctuation frequency was not obvious, which is consistent with our previous assumptions of the recruitment timeline.

Among these eight entities, “Bachelor degree or above” had the highest frequency. Academic qualifications have become one of the hard conditions for job seekers. Especially in July, during the graduation season, many fresh graduates entered the job market. Enterprises increased their recruitment efforts for candidates with “bachelor degree or above” qualifications to meet their needs, resulting in peak demand reaching 450 in July. Enterprises usually formulate campus recruitment plans and focus on recruiting many fresh graduates for talent reserve and cultivation during the college graduation season. This also leads to significant fluctuations in the demand for “bachelor degree or above” around the graduation season.

The fluctuation trends of “communication ability”, “Bachelor degree or above”, and “patience” are closely aligned. During the peak job-hunting period for fresh graduates, enterprises examine communication ability through interviews, group discussions and similar methods. From April to July, many enterprises enter their business peak season or a critical stage of project delivery. In industries such as e-commerce or education, frequent communication with partners is required to solve problems. Therefore, strong communication skills are essential for enhancing workplace efficiency. The business peak seasons or critical nodes of some industries may be concentrated in specific months. For example, some industries experience production peak seasons from April to July and sales peak seasons from October to November. During these periods, heavy workloads and tight schedules demand that individuals maintain “patience” under high-pressure conditions to ensure projects are completed on time and goals are met.

A “sense of responsibility” also occupies a considerable share in the entity demand. The difference is that although this demand fluctuates with the recruitment season, it remains relatively stable during the recruitment peak season.

“Teaching experience” shows similar fluctuation patterns, especially in TCSOL-related positions, where educational roles are abundant. Teachers’ sense of responsibility is pivotal in tasks such as course design, teaching, and student assessments. Similarly, in other industries, project-critical phases or teamwork-intensive tasks make this attribute a priority.

The fluctuation trends of “teamwork”, “standard Mandarin”, and “passion for education” are interconnected, particularly in education-related roles. Teaching in both schools and training institutions frequently involves collaboration between teachers and administrative staff, making teamwork indispensable. “Standard Mandarin” is the basic tool for effective teaching communication, especially for students from diverse linguistic backgrounds. It ensures the accurate transfer of knowledge and minimizes misunderstandings. People with a passion for education are more willing to contribute positively to their teams and continuously refine their teaching and language skills, thereby creating a more effective and engaging educational environment.

Hierarchical Model of Talent Demand and Development

Based on these findings, a hierarchical model of talent demand and development was constructed, incorporating perspectives from employers, job seekers, educators, and policymakers (Fig. 15). This model visualizes the core skills, theoretical contributions, and practical applications needed to bridge the gap between TCSOL training and market needs.

Fig. 15: Talent demand and development in TCSOL.

The diagram outlines essential skills, theoretical contributions, and practical applications from the perspectives of employers, job seekers, educators, and policymakers.

Figure 15 shows the Hierarchy of Talent Demand and Development in TCSOL, mapping the essential skills, theoretical contributions, and practical applications required for effective talent cultivation. The model visualizes the dynamic demands of the TCSOL job market from the perspectives of employers, job seekers, educators, and policymakers.

The model highlights the core competencies prioritized by employers such as teaching ability, proficiency in Mandarin and English, and the ability to navigate cross-cultural communication effectively. For job seekers, practical experience and certifications are emphasized as critical factors to meet the market’s demands. Educators are tasked with reshaping training programs to encompass interdisciplinary skills, including areas such as management and sales, which reflect the broader scope of professional requirements. Meanwhile, policymakers are encouraged to promote cross-disciplinary education through policies that support private sector demands and internship programs.

By presenting the perspectives of these four stakeholder groups, the model provides a structured framework to understand how TCSOL professionals can develop the diverse skill sets required in today’s competitive labor market. This visualization also offers practical insights for educational institutions to align their curriculum with the specific demands of employers, as well as for policymakers to foster supportive frameworks for interdisciplinary and practical training.

Discussion

The recruitment trends revealed by the seven types of structured data above have implications for TCSOL talent training. In terms of work location, demand is concentrated in first-tier cities such as Beijing and Shanghai. This suggests that, when training talent, the education sector should focus on improving students’ teaching abilities and cross-cultural communication skills in international and diverse environments, to meet the needs and characteristics of first-tier cities. To address insufficient demand in economically less developed areas, skills training related to online teaching could be expanded so that students can overcome regional restrictions when providing teaching services.

From the perspective of salary distribution, 5–10k falls within the normal wage range for most industries, while positions with higher work-experience requirements also tend to offer higher salaries. Zhu (2021) used association rule analysis to study the hidden relationship between the basic characteristics of various positions and their starting salary levels. It is therefore reasonable to speculate that the work location analyzed in this article has a certain impact on salary levels. Based on regional differences in salary levels, students can be given career planning guidance. For example, employment in first-tier cities may offer higher income but entail high living costs, while employment in some newly developed areas may offer a lower initial salary but greater room for development.

Given that most enterprises prefer candidates with 1–3 years of experience, educational institutions should strengthen their practical teaching components. This may include establishing on-campus training facilities that simulate real-life teaching scenarios, and encouraging students to engage in part-time teaching or volunteer activities off-campus. These measures can help fresh graduates compensate for their lack of experience and improve their competitiveness in the job market.

As for educational requirements, enterprises mainly require undergraduate or junior college degrees, and educational posts have higher requirements. Although both types of workers are in oversupply, the former have higher market adaptability. In an industry where the employment threshold is not particularly high, how to make students more competitive in the market is a problem worth discussing. Educational institutions should optimize their curricula to maintain high-quality undergraduate teaching while emphasizing the balanced development of professional knowledge and teaching skills. They should also provide guidance for students interested in pursuing further studies to address the demand for highly educated professionals in the education sector.

Private enterprises are still the main force in absorbing employment. In recent years, there has been a surge in entrepreneurship driven by national initiatives promoting self-employment, along with favorable policies supporting entrepreneurship. This wave of entrepreneurial activity has led to the emergence of numerous small private enterprises, contributing to their relatively high representation in the market. State-owned enterprises recruit seasonally and set high requirements; educational institutions can therefore establish long-term cooperative relations with private enterprises, carry out order-type training, and customize courses according to the needs of enterprises.

In terms of company scale, small and micro enterprises have great demand but limited welfare. They represent a promising option for job seekers in the TCSOL field. Educational institutions can guide students to establish a correct employment concept, recognize the development potential and personal growth opportunities of small and micro enterprises, and cultivate students’ entrepreneurial spirit and adaptability.

The distribution of job types shows a diversified trend. Owing to the nature of the major, positions in the field of education are dominant. Moreover, the demand for TCSOL talent in government or public institutions is limited to specific clerical positions or positions without restrictions on majors, and the competition is intense. TCSOL graduates working in enterprises such as publishing houses, sales companies, biotech companies, and investment companies, where their specialties do not align with the positions, can consider applying their educational background and language skills to industries such as services, advertising, public relations or exhibitions. These industries may require talent with cross-cultural communication skills, language proficiency, and an educational background. Given these types of posts, universities should incorporate interdisciplinary training into TCSOL programs, combining language teaching with other fields such as business, law, technology, and media. This will equip graduates with versatile skills that enable them to work in diverse roles beyond traditional education, including content development, project management, and corporate training. Institutions can cultivate professionals who are not only language teachers but also specialists in specific professional fields. For example, training in Business Chinese or Legal Chinese could equip professionals to work in corporate environments, significantly expanding their career opportunities beyond traditional teaching roles. This strategy could enhance the employability of TCSOL graduates in sectors that are currently underexplored.

The visual analysis of job descriptions employs multiple methods, including LDA topic modeling, named entity recognition (BERT-BiLSTM-CRF), co-occurrence network analysis, and time series analysis. These methods illustrate the diverse professional requirements enterprises seek for TCSOL professionals. Firstly, the LDA topic model was employed for text clustering. The overlapping keywords elucidate the overall nature of roles related to teaching Chinese as a foreign language; however, different positions place varying emphases on skill requirements. For instance, Topic 1 highlights study abroad initiatives or instruction in less commonly taught languages, Topic 2 focuses on editing and publishing, and Topic 3 underscores sales and service. This guidance helps job seekers broaden their job search and career planning, while industry stakeholders can adjust recruitment strategies to attract talent from a wider pool.

Second, in comprehensive ability cultivation, the high demand for “communication ability” and “expression ability” implies that TCSOL talent training programs should enhance oral and written communication courses. For example, programs could incorporate more practical oral communication exercises, such as cross-cultural dialog simulations, and written composition tasks that focus on cultural and educational topics. To meet the need for “teamwork” and “anti-pressure ability”, team projects and high-intensity teaching practice simulations can be added. This could involve group teaching plan design projects where students must collaborate under time and resource constraints, similar to real teaching task pressures. Given the importance of “bilingualism” and “office software” skills, language immersion programs and digital teaching tool training workshops should be integrated. For instance, institutions could organize language exchange activities with native speakers of different languages and provide hands-on training in the software used in teaching, such as online teaching platforms and educational software for creating teaching materials.

To address the high value placed on “teaching” experience, TCSOL institutions should establish affiliated teaching practice bases, either on-campus or in cooperation with local schools. These bases can provide students with regular teaching practice opportunities, starting from observing classes to gradually taking over teaching responsibilities under the guidance of experienced teachers. The significance of “study abroad” and “relevant work experience” calls for international exchange programs and industry cooperation initiatives. For example, partnering with international schools or language teaching institutions abroad to offer short-term exchange teaching opportunities or internships.

To cultivate the essential personality traits such as “sense of responsibility”, “patience”, and “passion for education”, a series of professional ethics and teacher’s attitude courses can be designed. These courses can include case studies of successful and unsuccessful teaching experiences, where students analyze the role of teacher’s personality in teaching outcomes. For enhancing “affinity” and “caring”, practical training in student-centered teaching methods can be provided. This involves activities such as role-playing where students act as teachers and must handle different student personalities and needs, and then reflecting on their performance in terms of building good teacher-student relationships.

Furthermore, co-occurrence network analysis effectively highlights the interrelationships between various skills and reflects the market demand for skill sets. Among these factors, “experience” and “educational background” frequently appear in job descriptions and exhibit the highest co-occurrence frequency with the term “priority” within the semantic co-occurrence matrix. Most companies tend to prioritize candidates who possess relevant work experience or internships. In contemporary society, a degree has become an important gateway for job seekers, as it is widely believed to be related to professional knowledge and skill levels (Ainun, 2024). For job seekers who wish to enter a specific field (such as international schools, Chinese language training institutions, publishing houses, etc.), the unique needs of that field can be understood by analyzing co-occurring words. Job seekers can plan their careers around their own abilities and strive to make their experience a stepping-stone to future jobs. Moreover, to enhance the value of the educational background, institutions can encourage students to pursue additional certifications related to TCSOL, such as the International Chinese Teacher Certificate or relevant language proficiency certificates in multiple languages. This would demonstrate not only their academic achievements but also their practical abilities in the field, making them more competitive in the job market.

The estimation of time series on recruitment demand also has a certain impact on the training of TCSOL professionals. The observed long-term growth trend in TCSOL recruitment demand indicates a need for educational institutions to expand their TCSOL talent training capacity. They should consider increasing the number of available seats in TCSOL programs and investing in faculty development to ensure high-quality instruction. For example, institutions could offer scholarships or incentives to attract more students into TCSOL majors and recruit experienced TCSOL practitioners as adjunct faculty to bring real-world insights into the classroom. Moreover, the curriculum should be updated to cover emerging trends and technologies in TCSOL. This could include courses on online teaching platforms and tools, as the field is likely to see an increase in digital teaching methods. Additionally, courses on intercultural communication in the context of a globalized TCSOL environment should be enhanced to prepare students for diverse teaching scenarios.

The cyclical fluctuations in recruitment demand have significant implications for TCSOL talent training. Educational institutions could adjust their curriculum according to the seasonal patterns of recruitment demand. In the months leading up to the peak recruitment seasons, they could offer intensive practical training courses and career preparation workshops. For example, in February and March, resume writing, interview skills training, and teaching demonstration practice could be provided to help students present themselves effectively in the job market. In the trough periods, such as from December to February and May to June, institutions can focus on theoretical learning and professional development activities. This could involve organizing seminars, workshops, and research projects related to TCSOL. For example, students could conduct research on effective teaching methods for specific regions or age groups during the trough months to enhance their academic knowledge and research capabilities.
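To make the notion of seasonal peaks and troughs concrete, the sketch below computes a simple seasonal index for each calendar month: the mean number of postings in that month divided by the overall monthly mean. This is a simplified stand-in rather than the time series model estimated in this study; the function name `seasonal_indices` and the monthly counts are hypothetical, and the index does not remove the long-term trend.

```python
from collections import defaultdict
from statistics import mean


def seasonal_indices(monthly_counts):
    """Map each calendar month to (mean count in that month) / (overall mean)."""
    by_month = defaultdict(list)
    for (_year, month), count in monthly_counts.items():
        by_month[month].append(count)
    overall = mean(list(monthly_counts.values()))
    return {month: mean(values) / overall
            for month, values in sorted(by_month.items())}


# Hypothetical posting counts keyed by (year, month); invented for illustration.
monthly_counts = {
    (2022, 1): 40, (2022, 3): 120, (2022, 6): 60, (2022, 9): 100,
    (2023, 1): 60, (2023, 3): 140, (2023, 6): 80, (2023, 9): 120,
}

indices = seasonal_indices(monthly_counts)

# An index above 1 marks a month with above-average demand.
peak_months = [month for month, index in indices.items() if index > 1]
trough_months = [month for month, index in indices.items() if index < 1]
```

An institution could schedule career preparation workshops shortly before the months in `peak_months` and reserve the months in `trough_months` for seminars and research projects. With a strong growth trend, the counts should be detrended first so that later years do not dominate the averages.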

In summary, the research findings highlight the dynamism and diversity of employment opportunities in the TCSOL field. Although demand is strong across industries and regions, candidates must combine academic qualifications, work experience, and relevant skills to succeed in this competitive environment. The text mining method used in this study is readily extensible: it can accommodate additional samples and could be applied to other professional humanities disciplines. This extensibility helps universities, job seekers, and employers quickly and accurately understand the market demand for professional talent.

Conclusion

Main Findings

This study provides a comprehensive analysis of the demand and skill requirements for TCSOL professionals using structured and unstructured data from 3,310 job postings. Analyzing eight key dimensions (workplace, experience, education, salary, company nature, company scale, job type, and job description), together with text feature extraction modeling, yields a comprehensive view of the TCSOL job market and its trends.

The structured data reveal an imbalance in regional demand. There is strong demand for TCSOL professionals in first-tier cities such as Beijing, Shanghai, Guangzhou, and Shenzhen, and this demand is particularly high for undergraduate-level candidates, while recruitment in smaller cities and inland provinces remains limited. This regional difference suggests the need for policy interventions to promote Chinese language education more evenly across the country, potentially unlocking new opportunities in underserved areas. Small and medium-sized private enterprises have an advantage in recruiting TCSOL professionals. This trend may indicate that Chinese language education is becoming more commercialized, which underscores the need for job seekers to have adaptability, entrepreneurial spirit, and awareness of market-driven demands.

Unstructured data analysis reveals that soft skills such as communication, teamwork, and responsibility are highly valued by employers. These skills are critical across all job categories and frequently co-occur with “education” and “teaching”, highlighting their importance in employer decision-making. The evolving professional environment for TCSOL professionals requires candidates not only to develop comprehensive skills and traits, but also to continuously accumulate practical work experience. The time series analysis highlights the long-term growth trend and seasonal fluctuations in TCSOL demand, guiding curriculum adjustments for educational institutions. In addition to these findings, a hierarchical model of talent demand and development in TCSOL was constructed, incorporating perspectives from employers, job seekers, educators, and policymakers. This model visualizes the core skills, theoretical contributions, and practical applications necessary to bridge the gap between TCSOL training and market demands.

Both structured and unstructured data highlight the growing importance of professional skills. Teaching skills and work experience are still given priority. Employers increasingly seek professionals with interdisciplinary expertise, including management, curriculum development, and digital literacy, marking a shift from traditional teaching-focused roles to diverse, market-oriented responsibilities. Therefore, graduates need to diversify their skills and explore non-traditional career paths to maintain competitiveness.

Limitations and Future Research

Although this study provides valuable insights, it has some limitations. Recruitment data from online platforms may not fully reflect the diversity of opportunities in less developed regions or smaller institutions, where online recruitment is less common. Future research could add qualitative analysis, such as interviews with recruiters or TCSOL professionals, to gain a broader understanding of the employment situation. Furthermore, the analysis could benefit from a more detailed classification of job positions, considering different teaching levels (e.g., primary, secondary, higher education) and specific skill requirements (e.g., digital teaching tools). Addressing these gaps would help to deepen the understanding of the evolving TCSOL job market.