Introduction

Multimodal pedagogies refer to the ways in which educators design learning experiences using a range of multimodal resources (Bezemer and Kress 2016), and they have emerged as a pivotal approach in modern education. According to Peng (2019), there has been growing interest in researching multimodal pedagogies, which involves exploring the many ways in which communication may be enhanced through various modes of expression. Multiliteracies highlight the dynamic nature of textual interpretation in a society characterized by an unparalleled expansion of communication channels and cultural variety. They stress the significance of taking social context and various meaning-making approaches into account. Lotherington and Jenson (2011) noted that multimodality, while not a new phenomenon, has garnered more attention due to the widespread use of digital technology and changing social behaviors. Multimodality has had a major impact on the function of literacy in the twenty-first century. Integrating different modes is crucial in education, and multimodal teaching is rapidly gaining popularity. The reason for this growth is that the needs of educators and students change constantly over time, and traditional classroom methods have struggled to keep pace with these changing demands. Multimodal teaching has therefore gained prominence for its ability to adapt to diverse learning styles and preferences. Swan (2018) stated that multimodal teaching has emerged as a prominent characteristic of contemporary education.

The COVID-19 pandemic had a profound impact on global education systems, forcing a rapid shift to online and remote teaching. This shift further accelerated the adoption of multimodal teaching methods, such as video conferencing, virtual reality (VR), and augmented reality (AR), to facilitate learning. With the suspension of traditional face-to-face instruction, multimodal teaching became an essential tool for effective communication between teachers and students. The rapid growth of research on multimodal teaching in 2021 and 2022 reflects the widespread application of these methods during the pandemic and highlights their significance in remote education.

The emergence of multimodal teaching not only aligns with the demands of contemporary society but also reflects societal progress. In light of this, many studies have adopted multimodal teaching in different educational contexts, such as anatomy (Johnson et al. 2012), physical education (Liu et al. 2022a), and multimodal writing (Jiang et al. 2021). Additionally, research on the application of new technologies in multimodal teaching has become a hot topic, covering AR (Lin et al. 2022a), online teaching platforms (Qianjing and Lin 2021), and video games (Nash and Brady 2022). These studies consistently highlight the positive outcomes of multimodal teaching applications and underscore its promising future, further emphasizing its versatility and potential.

Despite the extensive research on multimodal teaching, few studies have employed bibliometric methods to assess the academic literature and examine current trends in this field. Bibliometric analysis applies statistical methods to explore research trends across academic publications, assessing elements such as publication volume, citation impact, and collaboration networks. This approach enables the evaluation of productivity and influence among researchers, institutions, and nations, while also mapping the development of emerging interdisciplinary fields. By analyzing bibliographic data such as publication year, citation count, authorship, and keywords, bibliometric analysis offers a quantitative overview that complements traditional literature reviews (Guo et al. 2021). It provides a macroscopic overview of large amounts of academic literature, allowing for an objective and comprehensive visualization of research trends. Jia et al. (2014) contended that the publication history, characteristics, and development of scientific output in a specific research field can be documented through quantitative analysis. Compared to traditional reviews, visualization through bibliometric analysis can display existing multimodal-related publications more comprehensively and objectively. The resulting graphs show not only the current state of the research field but also the relationships between authors and institutions, providing a reference for subsequent researchers or research institutions seeking cooperation opportunities and guiding investment. Donthu et al. (2021) asserted that the bibliometric method is regarded as the most objective among the various methods of conducting literature reviews, owing to its reliance on a review protocol and the application of quantitative analytical techniques.

To address the limitations of the traditional literature review, this study employs both content analysis and bibliometric analysis to investigate global research publications in multimodal teaching. It examines the scope of research on multimodal teaching, focusing on key themes, influential contributors, and emerging trends in this field. The study maps influential authors, countries, and journals, while also tracing the developments in the field. Additionally, by analyzing keyword distributions and trends, this study uncovers the main research focuses and proposes new perspectives to guide future research.

This study primarily addresses the following research questions:

RQ1. How has research in multimodal teaching progressed over time?

RQ2. What are the primary topics and current concerns in multimodal teaching research?

RQ3. What theoretical and practical implications arise from these findings, and what opportunities exist for future exploration?

Literature review

Applications of multimodal teaching

The concept of “multimodal teaching” was initially introduced by the New London Group (1996) from a teaching standpoint, applying the theory of multimodal discourse analysis to teaching practices. They advocated for teachers to engage students’ various senses and collaborative learning processes in language teaching to enhance teaching effectiveness. Stein (2000) further elaborated on the concept of the multimodal teaching mode, suggesting that all communication activities should be carried out in a multimodal teaching environment, both in teachers’ instructional practices and in students’ learning processes. Stein also explored the practical application of this mode in teaching. Following the New London Group, more and more scholars have shifted their focus towards investigating the relationship between modal theory and language teaching. Consequently, there has been a proliferation of studies on how multimodal teaching can be used in specific situations.

In educational contexts, multimodal teaching offers a plethora of advantages by incorporating diverse communication methods. Liang and Lim (2021) presented a multimodal teaching method, suggesting that learning and communicative activities in classroom instruction ought to encompass multimodal communication. Li (2017) argued that multimodality should be used in all parts of teaching, such as pedagogy, material delivery, and evaluation.

Multimodal teaching methods are widely used in language teaching. As Song (2017) mentioned, the multimodal teaching method may be used in English classes covering reading, writing, listening, and speaking abilities. Multimodal teaching techniques are beneficial not just for English but also for other languages, including Chinese, Spanish, and Japanese. In international Chinese language education, combining multimodal courses with mobile learning materials makes it easier to create dynamic virtual classrooms that keep students interested and improve their overall language skills. Wang and Zhang (2021) confirmed that multimodal Chinese teaching greatly improves learning efficiency when compared to traditional listening and speaking classes. Multimodal teaching optimizes language learning by engaging learners’ senses and employing diverse modes for knowledge input and output, thereby improving teaching quality and compensating for the limitations of traditional language instruction.

Beyond language teaching, multimodal teaching has also become prevalent in fields such as physics education, nursing education, and design education. Han and Black (2011) provided students with visual, aural, and haptic information by simulating the force and kinesthetic movement they experience when using physical objects (simple machines in that study). This information serves as a reference point for students’ future learning and aids their comprehension. Hardie et al. (2020) explored the use of virtual reality (VR) among nursing and midwifery students, employing immersive VR storytelling as a valuable and efficacious pedagogical tool that heightened learners’ attention and interest. Patti and Vita (2017) added the principles behind educational games to traditional training programs, aiming to improve the potential of a multimodal educational system that uses Edutainment and Game-Based Learning to foster active learning via simulated experiences. These examples highlight the interdisciplinary nature and wide applicability of multimodal teaching.

Multimodal teaching caters to learners of all ages, including children, youth, and adults. For children, a multimodal learning environment with robots and IoT-based 3D books was constructed by Lin et al. (2022b). They demonstrated that multimodal task-based learning systems have the potential to enhance learner agency, positive emotions, and learning motivation by delivering robot and multimedia feedback. For youth, Dunn and Sweeney (2018) highlight the iPad’s potential for creating technology-mediated learning spaces in the classroom, fostering engagement and content learning. Such spaces enable teachers to deliver content effectively and cultivate a more engaging environment. For adults, Qin et al. (2023) show that a Human-Computer Interaction (HCI)-based multimedia teaching mode works better for students with limited English proficiency, fostering their interest and ability in English reading. These examples highlight the age-neutral nature of multimodal teaching, demonstrating that its flexible teaching style can adapt to learners of all ages.

The importance and benefits of multimodal teaching

In the 21st century, the advance of digital technologies has transformed contemporary communication practices, which have transitioned from page to screen and from monomodal to multimodal (Belcher 2017). Liu and Jiang (2017) argue that the conventional teaching model often results in teacher-dominated English classrooms. As a consequence, students experience a decline in their enthusiasm for English and make limited progress in their overall English language skills.

The multimodal teaching approach was found to be positively received by students, as demonstrated by Pan and Zhang (2020). Students considered the multimodal teaching method more interesting and better able to stimulate their interest in learning than traditional teaching methods. In Liu’s (2017) study, a multimodal teaching technique was implemented for English writing instruction. The results showed improvements in students’ self-efficacy and motivation towards writing. Lidar et al. (2020) suggest that bringing a multimodal teaching mode into an English classroom might effectively transform the language learning experience from dull and unengaging to dynamic and interactive, thereby motivating students to switch from a passive to an active role. When utilized appropriately, multimodal teaching can contribute to the improvement of students’ reading, listening, speaking, and communication abilities. Tai and Wei (2023) highlight that teachers can employ a diverse array of multilingual and multimodal resources to enrich the educational experiences of their students, even without specific training on how and when to use them strategically.

These studies underscore the necessity and efficacy of multimodal teaching in contemporary education, emphasizing its ability to engage students, enhance learning outcomes, and foster active participation in the learning process. By leveraging diverse modes of communication, multimodal teaching holds promise for transforming traditional classrooms into dynamic learning environments conducive to student success.

Method

The data utilized in this study were gathered from the Web of Science Core Collection (WoSCC). This database is one of the most authoritative and widely used for scientific research (Zhu et al. 2022). On December 6, 2023, a comprehensive search was conducted in the WoSCC for literature on multimodal teaching. The search query used was: topics= (“multimodal teaching OR multimodal education OR multimodal pedagogy OR multimodal instruction”), with the publication year limited to 1995–2023 and the language restricted to English. The concept of multimodal teaching began to gain attention in the mid-1990s, notably with the New London Group’s work (1996), which marked the formal introduction of multimodal discourse in teaching practices. Since then, multimodal pedagogies have evolved into a significant research area in education. Selecting 1995 as the starting year therefore allows for a thorough analysis of the evolution and development of multimodal teaching practices over time.

Data screening process

To ensure the rigor and relevance of the selected literature, the following inclusion and exclusion criteria were applied:

Inclusion criteria

The inclusion criteria were as follows: (1) The literature source must be the WoSCC. (2) The publication period was set from January 1, 1995, to December 6, 2023, to capture relevant developments in multimodal teaching research. A total of 6704 records were retrieved at this step. (3) The language of publication was restricted to English to focus on widely accessible international scholarly work, resulting in the removal of 416 non-English articles. (4) Document types were restricted to “Article” and “Review,” leading to the exclusion of 1062 conference papers, commentaries, editorials, and other non-qualifying publications. After applying these criteria, the initial set of 6704 records was reduced to 5226 articles. (5) The literature must explicitly explore topics related to multimodal teaching, multimodal pedagogy, multimodal education, or multimodal instruction.
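The record counts reported in these criteria can be checked with simple subtraction. The short sketch below only re-derives the reported figures; the variable names are illustrative and are not part of the original workflow.

```python
# Record counts as reported in the inclusion criteria.
initial_records = 6704            # retrieved from WoSCC for 1995-2023
non_english_removed = 416         # removed by the language restriction
other_doc_types_removed = 1062    # conference papers, commentaries, editorials, etc.

# Apply the two automatic filters in the order described in the text.
after_language_filter = initial_records - non_english_removed
after_type_filter = after_language_filter - other_doc_types_removed

print(after_language_filter)  # 6288
print(after_type_filter)      # 5226
```

The result agrees with the 5226 articles reported as the pool passed on to manual screening.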

Exclusion criteria

The exclusion criteria were as follows: (1) Publications that mention “multimodal” or “teaching” but do not significantly address multimodal teaching practices or pedagogy. (2) Non-academic or non-peer-reviewed documents, such as conference proceedings, editorials, commentaries, or book chapters. (3) Papers that focus on general teaching methodologies but are unrelated to the topic of multimodal teaching.

Manual screening

Each publication was manually reviewed by three researchers who adhered to the specified inclusion and exclusion criteria. The initial screening focused on titles and abstracts, and in cases where relevance was uncertain, the full text was accessed to ensure the publication’s alignment with the research topic. To ensure consistency in inclusion decisions, inter-rater reliability among the reviewers was calculated using Cohen’s kappa coefficient, achieving a kappa value of 0.86, which indicates a strong level of agreement among the reviewers. Discrepancies in inclusion decisions were resolved through discussion and consensus among the researchers. This multi-step process was crucial for ensuring both the objectivity and reliability of the selection process, ultimately leading to the identification of 689 high-quality articles for detailed analysis.
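For readers unfamiliar with the statistic, the following is a minimal sketch of Cohen’s kappa for a single pair of raters. The paper does not state how agreement was aggregated across its three reviewers, so this covers only the two-rater case; the function name and the include/exclude decisions are hypothetical and are not the study’s data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label proportions.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for ten records (illustration only).
rater_a = ["include"] * 4 + ["exclude"] * 6
rater_b = ["include"] * 3 + ["exclude"] * 6 + ["include"]

kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3))  # 0.583
```

In this toy example the raters agree on 8 of 10 records, but chance agreement is 0.52, so kappa is well below the raw agreement rate.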

The review process ensured consistency through regular discussions and well-defined inclusion and exclusion criteria. All three reviewers independently evaluated the publications, and discrepancies were first addressed through discussion. Where disagreement persisted, inclusion decisions were resolved by majority vote, with at least two researchers agreeing on the final decision. Following systematic review principles similar to PRISMA (Tetzlaff et al. 2020), the literature selection process was carefully documented to enhance transparency and replicability. This collaborative approach helped maintain consistency and objectivity throughout the screening process.

Research process

Figure 1 displays the search terms and strategy used in this study on multimodal teaching.

Fig. 1: Screening process for publications.

Note: The data were retrieved from WoSCC, last updated on December 6, 2023.

As shown in Fig. 1, Stage 1 involved data collection from the WoSCC using an advanced search to retrieve relevant literature on multimodal teaching between 1995 and 2023. The search was refined to articles and reviews published in English and, after manual screening, resulted in 689 documents.

Stage 2 applied bibliometric analysis and information visualization to examine publication trends, co-authorship networks, and international collaborations. Co-keyword analysis and keyword citation burst analysis were also used to identify key research themes.

Stage 3 involved content analysis, in which highly cited publications and emerging publications attracting high attention were examined in detail to understand evolving research interests.

Stage 4 provided conclusions based on the identified academic trends, collaboration patterns, and key research issues, offering a comprehensive understanding of the evolution of multimodal teaching research.

Analytical tools and methods

With the dataset established, the next step was to select appropriate analytical tools for conducting the bibliometric and content analysis. While packages in R and Python, such as bibliometrix and NetworkX, offer flexibility for general citation analysis, they often require extensive customization to perform advanced academic network analyses, such as burst detection and temporal clustering. CiteSpace, in contrast, was chosen for its specialized functionalities tailored to academic citation analysis, with built-in tools for identifying citation bursts, performing co-citation clustering, and visualizing trend evolution. These capabilities make CiteSpace particularly suited to this study’s goal of identifying emerging trends and key nodes in the development of multimodal teaching research.

CiteSpace, developed by Chen Chaomei, was used to examine and visualize the interconnections among authors, countries, journals, and keywords. CiteSpace offers visualization and interactive functionalities that enhance the comprehension and interpretation of network and historical patterns. It was used to build visual co-citation maps, define co-citation networks, discover turning points, search for important nodes, and explore the development of the field, as stated by Chen (2006). In addition, CiteSpace’s clustering and keyword co-occurrence analysis tools were applied to help identify key research themes, hotspots, and influential publications in the field of multimodal teaching (Synnestvedt et al. 2005). This approach is valuable for both researchers and educators to track and visualize the content and trends in multimodal teaching research.

This study utilized CiteSpace to conduct various analyses, including co-citation analysis, co-occurrence analysis, and keyword burst detection. Co-citation networks were employed to reveal the relationships between frequently co-cited references, helping to identify influential scholars and key publications in the field. The keyword co-occurrence map, on the other hand, was used to explore frequent themes and research focuses. These visualizations were selected for their ability to present a comprehensive view of research connections and highlight the central themes within the field of multimodal teaching. The nodes in the co-citation network represent publications, while the thickness of the connecting lines indicates the frequency of co-citation, thereby illustrating the strength of the relationship between publications. In co-citation analysis, the centrality metric was used to measure the importance of a node within the network, indicating the extent to which a publication acts as a bridge connecting different research clusters. Centrality values, ranging from 0 to 1, highlight influential publications that significantly connect diverse areas of research. Additionally, the silhouette score was used in clustering to assess the reliability and coherence of identified clusters, with values ranging from −1 to 1. Higher values indicate more distinct and well-defined groupings. Keyword burst detection was applied to identify emerging trends by measuring sudden increases in keyword frequency over specific periods, reflecting shifts in research focus within multimodal teaching.
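To make the centrality metric concrete, the sketch below computes normalized betweenness centrality on a small undirected graph using Brandes’ algorithm. It is an illustration written for this explanation, not CiteSpace’s internal implementation; the node labels, edge list, and function names are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def betweenness_centrality(adjacency):
    """Normalized betweenness (0 to 1) for an undirected, unweighted graph."""
    nodes = list(adjacency)
    score = {v: 0.0 for v in nodes}
    for source in nodes:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        predecessors = {v: [] for v in nodes}
        path_count = {v: 0 for v in nodes}
        distance = {v: -1 for v in nodes}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies from the farthest nodes backwards.
        dependency = {v: 0.0 for v in nodes}
        while order:
            w = order.pop()
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    n = len(nodes)
    # Each unordered pair is visited from both endpoints, so dividing by
    # (n - 1)(n - 2) both halves the count and normalizes it to [0, 1].
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: score[v] * scale for v in nodes}


# Hypothetical network: two triangles joined through a single bridge node "D".
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G")]
centrality = betweenness_centrality(build_adjacency(edges))
print(round(centrality["D"], 2))  # 0.6
print(round(centrality["A"], 2))  # 0.0
```

Node "D" scores highest because every shortest path between the two triangles passes through it, which is the bridging role the centrality metric is meant to capture.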

To enhance the visualization of the knowledge map, the “Top N%” parameter in CiteSpace was set to 10%, and the “per slice” setting was adjusted to 50. This study analyzes the results using both structural and temporal metrics. Structural metrics include modularity Q, silhouette score, and betweenness centrality. Modularity Q, which ranges from 0 to 1, indicates the extent to which the network can be divided into distinct thematic research clusters. Higher values suggest a well-structured network. Betweenness centrality assesses how well a node connects disparate nodes within the network, with higher values typically associated with seminal works linking otherwise unrelated research clusters. Temporal metrics, such as citation burstness, indicate sudden surges in citations over specific timeframes, serving as a measure of a document’s impact within the relevant field.
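As an illustration of modularity Q, the sketch below evaluates the standard Newman-Girvan formula, Q = Σ_c [L_c/m − (d_c/2m)²], for a given partition of a small undirected graph, where L_c is the number of edges inside cluster c, d_c is the total degree of its nodes, and m is the number of edges. How CiteSpace derives its clusters is not reproduced here; the graph, the partition, and the function name are hypothetical.

```python
def modularity(edges, communities):
    """Newman-Girvan modularity Q of a partition of an undirected graph."""
    m = len(edges)
    if m == 0:
        raise ValueError("the graph needs at least one edge")
    # Map each node to the index of its community.
    membership = {node: index
                  for index, community in enumerate(communities)
                  for node in community}
    intra_edges = [0] * len(communities)   # L_c: edges inside each community
    degree_sum = [0] * len(communities)    # d_c: total degree per community
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra_edges[cu] += 1
    return sum(intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in range(len(communities)))


# Hypothetical network: two triangles linked by one edge (C-D).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F")]
q_split = modularity(edges, [{"A", "B", "C"}, {"D", "E", "F"}])
q_single = modularity(edges, [{"A", "B", "C", "D", "E", "F"}])
print(round(q_split, 3))   # 0.357
print(round(q_single, 3))  # 0.0
```

Splitting the graph along its natural grouping yields a clearly positive Q, whereas treating everything as one cluster yields zero, matching the reading that higher values indicate a better-separated cluster structure.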

The study employed a mixed-method approach, combining bibliometric analysis and content analysis. The bibliometric analysis allows for a macroscopic view of the research landscape, while content analysis offers a qualitative approach to interpreting the key themes and trends identified in the literature. According to Tlili et al. (2022), this mixed-method approach, known as data triangulation, enhances the validity of the research by providing a multi-dimensional perspective. According to Shashi et al. (2020), bibliometric analysis mitigates bias in data selection through its reliance on quantitative techniques, allowing researchers to prioritize essential and relevant articles. Content analysis can be employed to investigate the content of articles, abstracts, and titles, providing insights into the research priorities related to a specific topic. By combining these methods, the study provides a comprehensive understanding of the evolution of multimodal teaching research and identifies potential future research directions by mapping the historical and structural patterns within the academic field.

Results

Bibliometric analysis

This section provides a comprehensive review of scientific publications on multimodal teaching. It examines the countries and regions, institutions, journals, and authors that have been most productive. It also uses citation and co-citation analysis to gain insights into prominent scholars, influential journals, and collaborative networks. These findings reveal the current state of research on multimodal teaching, providing a valuable reference for researchers and practitioners alike. The following analyses were conducted to answer RQ1.

Annual publication and citations

The number of published papers serves as a crucial indicator reflecting the trajectory of scientific research development. Additionally, the frequency with which an article is cited by others constitutes a significant metric for publication quality assessment. Figure 2 presents the annual publications in multimodal teaching, offering insights into the macro trend.

Fig. 2: Trends in publications and citations.

Yearly number of publications and citations related to multimodal teaching research based on WoSCC data. Note: “a” indicates the data covers the period from January 1, 1995 to December 6, 2023.

The 689 screened articles, published between 1995 and 2023, reflect steady growth in the field of multimodal teaching. This growth pattern can be divided into three stages: a slow development period (1995–2008), a stable growth period (2009–2015), and a major development period (2016–2023). Notably, the annual paper count increased significantly from 77 to 106 between 2021 and 2022, primarily due to technological advancements, emerging educational needs, and innovations in remote teaching tools and methods during the COVID-19 pandemic, which accelerated the adoption of multimodal teaching as a response to remote learning challenges.

From 1995 to 2023, a total of 689 scholarly publications in multimodal teaching accumulated 10,230 citations, averaging 14.85 citations per article. While the slow development period saw limited citations due to a lack of publications, the stable growth period marked a significant increase, highlighting it as a critical phase for the field’s development.
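The reported average follows directly from the two totals. The brief check below uses only the figures stated above; the variable names are illustrative.

```python
# Totals as reported for the 1995-2023 dataset.
total_citations = 10230
total_publications = 689

# Mean citations per article.
mean_citations = total_citations / total_publications
print(round(mean_citations, 2))  # 14.85
```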

In the third stage, the amount of academic literature on multimodal teaching rose consistently between 2016 and 2023. In particular, there was a significant spike in 2022, with over 100 articles published. These findings suggest that research on multimodal teaching has progressively evolved over the past 28 years and reflects a positive trajectory of development. Furthermore, Fig. 2 indicates an escalating interest in multimodal instruction within academia, with increasing attention in journal publications.

Throughout the major development period, citation numbers continued to grow steadily, reflecting the ongoing interest and relevance of the research in this field. This trend underscores the increasing influence and recognition of multimodal instruction within the academic community.

Institutions analysis

Institutions analysis aims to uncover the key institutional contributions to multimodal teaching research and identify their influence in the field. The publication record of an institution reflects its competitiveness and leadership in the research sector. The top 10 global institutions in multimodal teaching and learning research are listed in Table 1.

Table 1 Top 10 institutions with the most publications on multimodal teaching research.

Nanyang Technological University (NTU) contributed the most with 18 articles, followed by the State University System of Florida and Australian Catholic University. Furthermore, the number of publications across these institutions is relatively similar, indicating comparable contributions to the field of multimodal teaching.

The distribution of institutions that have published papers by time zone is illustrated in Fig. 3. As shown in Table 1 and Fig. 3, there was a notable increase in publications on multimodal teaching around 2009, with institutions such as NTU and the University of London producing significant outputs. Figure 3 illustrates a growing trend in institutional involvement from 2009 onward, likely driven by global emphasis on digital literacy and technological integration in education, spurring institutions worldwide to prioritize multimodal teaching research.

Fig. 3: Time-zone map of institutions.

This map illustrates the time-zone evolution of institutions publishing research on multimodal teaching, based on WoSCC data from 1995 to 2023. Each node represents an institution.

The prominence of NTU and the State University System of Florida reflects the strategic educational priorities and governmental support. For instance, Singapore’s Ministry of Education has invested consistently in digital literacy, likely supporting NTU’s research leadership. The increased research output from NTU may also relate to Singapore’s educational reforms emphasizing digital skills and innovative pedagogy.

Over the 28-year time span covered, the timeline highlights influential institutions and research milestones, shedding light on contributors to multimodal teaching research. This analysis underlines the impact of national policies and institutional priorities in advancing the field.

Distribution analysis of countries and regions

The objective of the country and region distribution analysis is to explore the geographic landscape and collaborative networks in multimodal teaching research. Table 2 lists the top 10 most productive countries or regions.

Table 2 Top 10 countries or regions in multimodal teaching research publications.

Table 2 shows that the USA, China, and Australia are leading contributors in this field. Notably, the USA alone accounts for 32.65% of total publications, demonstrating its dominant role. Developed nations like Australia and Spain also show strong research output, while developing countries, such as China and South Africa, have steadily increased their contributions, underscoring the global expansion of this research area. This geographic distribution aligns with the findings from the section “Annual Publication and Citations”, where the rise in publications since 2016 is closely linked to increased international collaboration and governmental support.

Although the United States continues to lead, its share of total contributions has declined, dropping from 42.86% in 2021 to 26.13% in 2023. This shift indicates that other countries, such as China, are increasing their contributions to the field. China’s contribution has increased notably over time, from just one publication during the 2015–2016 period to 33 publications in 2022. This surge can be attributed to the Chinese government’s strong financial support, evidenced by acknowledgments in research articles for funding from the Ministry of Humanities and Social Sciences Project, the China Scholarship Council, and the National Social Science Foundation.

These findings highlight the pivotal role of government support in driving research productivity. Developed countries like the United States and Australia benefit from well-established research funding systems, enabling sustained productivity and international collaboration. Developing countries, such as China and South Africa, are enhancing their presence in the field through increased financial support, signaling a shift toward a more balanced, globally collaborative landscape. This trend aligns with findings from the section “Institutions Analysis”, which identified key contributing institutions, including Nanyang Technological University and the State University System of Florida.

Figure 4 illustrates the collaboration network between countries and regions using CiteSpace. The network comprises 65 nodes representing countries or regions and 83 collaboration links, emphasizing the prominence of international cooperation in multimodal teaching research. Chen et al. (2010) stated that nodes with high betweenness centrality (which ranges from 0 to 1) play a crucial role in fostering collaborative relationships. They illustrate a transition pattern and highlight principal network themes and hot topics. In the graph, nodes with high centrality are marked by purple circles, indicating that they play a more active and closer role in cooperative relationships with other nodes. Betweenness centrality is also referred to as mediator centrality. The USA (0.38), England (0.28), Spain (0.17), Australia (0.14), Germany (0.10), and the People’s Republic of China (0.09) have the highest betweenness centrality values and serve as crucial bridges in international research collaboration. The dominance of the United States in the field of multimodal teaching is reflected not only in its publication volume but also in its prominent research institutions and strong international collaborations. These collaborations have significantly contributed to interdisciplinary applications of multimodal teaching, particularly in language and science education.

Fig. 4: Collaboration network between countries/regions.

Visualization of international collaboration on articles related to multimodal teaching. The nodes represent countries or regions.
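To make the notion of betweenness centrality used above concrete, the following is a minimal sketch of the standard normalized measure for an undirected, unweighted network, computed with Brandes' algorithm. The toy network and its node names are invented for illustration and are not the study's data; CiteSpace's own implementation is not described in the text and may differ in detail.

```python
from collections import deque


def betweenness_centrality(adj):
    """Normalized betweenness for an undirected, unweighted graph.

    adj maps each node to the list of its neighbours. Scores fall in [0, 1]:
    the share of shortest paths between other node pairs that pass through
    the node (Brandes' algorithm).
    """
    raw = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search from source, counting shortest paths.
        order = []
        preds = {v: [] for v in adj}
        n_paths = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Back-propagate path dependencies from the farthest nodes inward.
        dependency = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                raw[w] += dependency[w]
    n = len(adj)
    # Each unordered pair was counted twice, so dividing by (n-1)(n-2)
    # normalizes by the (n-1)(n-2)/2 possible pairs of other nodes.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: raw[v] * scale for v in adj}


# Hypothetical toy network: two pairs of collaborators linked only via "hub".
toy_network = {
    "hub": ["a1", "a2", "b1", "b2"],
    "a1": ["a2", "hub"],
    "a2": ["a1", "hub"],
    "b1": ["b2", "hub"],
    "b2": ["b1", "hub"],
}
centrality = betweenness_centrality(toy_network)
print(round(centrality["hub"], 2))  # 0.67
print(centrality["a1"])             # 0.0
```

In this toy case the bridging node lies on four of the six shortest paths between the other nodes, which is the kind of brokerage role the high-centrality countries play in Fig. 4.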

Journals and authors analysis

This section aims to identify the major publishing platforms and key contributors in the field of multimodal teaching, providing insights into the primary channels through which research findings are disseminated. Journal co-citation analysis explores relationships among journals by analyzing how often different journals are cited together. According to Liu et al. (2022b), journal co-citation analysis reveals the interdependence and cross-relationships between journals. Journals serve as platforms for authors to publish papers, allowing them to showcase core areas of the research field. By analyzing the co-citation of journals, the academic framework of multimodal teaching can be better understood. Table 3 lists the top 10 journals in multimodal teaching, ranked by citation count.

Table 3 Top 10 journals with the most citations.

Table 3 highlights the journals with the most citations in the field of multimodal teaching research, reflecting its interdisciplinary appeal. High citation counts in journals such as Harvard Educational Review, TESOL Quarterly, Journal of Adolescent & Adult Literacy, and Computers & Education indicate that multimodal teaching is gaining recognition across education, language studies, and technology. This trend emphasizes the growing importance of multimodal approaches not only in language and literacy education but also in broader educational and technological contexts.

Reyes-Gonzalez et al. (2016) noted that the study of co-authorship is an essential component of bibliometrics, as it serves as an indicator of the level of collaboration within a research field and reflects its current state. Figure 5 shows the network of author collaborations. Node size represents the frequency of each author’s appearance in publications. The network consists of 425 nodes and 243 connections, with a network density of 0.0027, indicating sparse collaboration among authors. Although there are many small groups, the lack of connections between them suggests limited collaboration across the network. Central nodes, such as Lianjiang Jiang, Shulin Yu, Kathy A. Mills, and Blaine E. Smith, act as bridges within the network, linking smaller clusters and facilitating knowledge exchange across different research subfields. Additionally, the largest connected cluster contains 115 nodes, accounting for 27% of the network, which points to a subset of authors who engage in larger, more cohesive collaborative groups.

Fig. 5: Collaboration network between authors.

Visualization of author collaborations in multimodal teaching research. Each node represents an author.

These patterns illustrate the distribution of collaboration intensity within the multimodal teaching research community. The network’s structure reflects a field where smaller collaborative groups operate in parallel, with a few influential authors serving as connecting nodes that promote cross-group collaboration. This analysis provides a detailed view of the collaboration dynamics within the research landscape of multimodal teaching.
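As a quick check on the figures reported for the author network, the sketch below recomputes the density and the share of the largest connected cluster from the node and link counts given in the text. It assumes the usual definition of density for an undirected network without self-loops, 2E / (N(N - 1)); the function name is illustrative only.

```python
def network_density(nodes: int, links: int) -> float:
    """Density of an undirected simple network: actual links / possible links."""
    if nodes < 2:
        return 0.0
    return 2 * links / (nodes * (nodes - 1))


# Counts reported in the text for the author collaboration network (Fig. 5).
author_density = network_density(nodes=425, links=243)
largest_cluster_share = 115 / 425

print(round(author_density, 4))        # 0.0027
print(f"{largest_cluster_share:.0%}")  # 27%
```

Under the same assumption, the keyword network reported later (N = 449, E = 1749) gives a density of about 0.0174, matching the value listed in the CiteSpace parameters.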

Co-citation Analysis

Co-citation analysis aims to identify influential scholars, reputable journals, and cooperation models in the field of multimodal teaching. Co-cited references serve as pivotal indicators in bibliometrics, representing the frequency with which two publications are cited together by other publications. Clustering analysis can group many similar references into several knowledge units, objectively summarizing the major content of each unit and reflecting the dynamic evolution of a field (Lv et al. 2022). According to Small (1973), reference co-citation occurs when two or more articles are cited together by one or more subsequent papers, establishing a co-citation link between the works. The strength of a co-citation relationship therefore depends on the number of later papers that cite both works.
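The counting behind reference co-citation, as defined by Small (1973), can be sketched in a few lines: every pair of references that appears together in one citing paper's reference list gains one co-citation. The reference labels and citing papers below are invented placeholders, not data from this study.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how many citing papers cite each pair of references together."""
    counts = Counter()
    for references in reference_lists:
        # Deduplicate and sort so each pair has one canonical key per paper.
        for pair in combinations(sorted(set(references)), 2):
            counts[pair] += 1
    return counts


# Hypothetical reference lists of four citing papers.
citing_papers = [
    ["ref_A", "ref_B", "ref_C"],
    ["ref_A", "ref_B"],
    ["ref_B", "ref_C"],
    ["ref_A"],
]
pair_counts = cocitation_counts(citing_papers)
print(pair_counts[("ref_A", "ref_B")])  # 2
print(pair_counts[("ref_A", "ref_C")])  # 1
```

The resulting pair counts are the link weights of a co-citation network; tools such as CiteSpace then cluster and visualize that network.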

Using CiteSpace, the references were analyzed with several key parameters: citation frequency, centrality, and cluster analysis. Figure 6 presents the resulting co-citation network, which is composed of 867 nodes and 2026 links representing the relationships between co-cited publications in the field of multimodal teaching. Each node represents a specific publication, and the size of the node corresponds to the number of times that publication has been co-cited with others: the more frequently a publication is co-cited, the larger its node. Nodes with higher centrality and influence are highlighted with larger labels and bolder circles. For example, Saldaña (2021) and Kress (2010) stand out as highly co-cited references, indicating their significant influence. Saldaña provides foundational knowledge on qualitative research methods, while Kress contributes seminal insights into multimodal discourse, both of which are essential to advancing multimodal teaching research. The colors of the links between nodes correspond to different time periods, allowing researchers to track the development of these relationships over time. The clustering patterns in Fig. 6 reveal distinct groups of research focused on similar topics, showcasing key areas of focus within multimodal teaching research.

Fig. 6: Literature co-citation network.

This map was created using CiteSpace. Each node represents a publication, and the node size corresponds to its number of co-citations.

In order to highlight the most frequently cited works in the field, Table 4 provides a summary of the top five cited references, which play a significant role in shaping current multimodal teaching research.

Table 4 Top 5 most cited references (Frequency ≥ 10).

The primary predictors of citations are considered to be the quality of the paper itself, the journal impact factor, and the accessibility and visibility of the paper (Tahamtan et al. 2016). As shown in Table 4, Hafner and Ho (2020) rank first with 21 citations, highlighting the importance of this study. It proposes a process-based model for assessing digital multimodal compositions, offering a framework for teachers to evaluate multimodal communicative competencies. The second-ranked article is Jiang (2017). This groundbreaking study confirms the potential of digital multimodal writing in promoting students’ participation in English learning. It provides empirical evidence of why digital multimodal composing can promote English learning and offers valuable insights for future reference. Kress (2010) contributes theoretical insights into multimodality, defining multimodal discourse. Jiang (2018) extends previous studies on the inclusion of digital multimodal composing, exploring how various identity orientations affect learners’ responses and engagement in multimodal activities. Table 4 thus gives researchers access to the most influential articles in the field of multimodal teaching, enabling them to gain a comprehensive understanding of its foundational aspects.

Figure 7 shows the co-citation network of journals. The journals that have the highest number of citations are Harvard Educational Review, TESOL Quarterly, and Journal of Adolescent & Adult Literacy. These journals have a substantial influence in the field and are widely cited, with several being considered high-impact journals.

Fig. 7: Journal co-citation network.

This map was generated using CiteSpace. Nodes represent journals, and their size reflects their citation impact.

According to White and McCain (1998), author co-citation analysis is distinct from reference co-citation analysis. It concerns two authors being cited together in a third author’s work, regardless of which specific works of theirs are mentioned. The more frequently two authors are cited together, the stronger the academic correlation between them and the more similar their research topics. Author co-citation analysis makes it possible to identify the most productive and influential authors, as well as the specific research areas in which they are active. This analysis aids in understanding the knowledge structure of the subject.

Figure 8 shows the network of high-impact authors, where the size of each node represents the frequency of citations. The most influential authors in the network include Kress G., Jewitt C., Cazden C., and Cope B. These authors’ high citation counts reflect their authority and influence within this domain.

Fig. 8: Author co-citation network.

A co-citation network of high-impact authors, where node size represents citation frequency and highlights influential contributors.

In addition to identifying influential publications, understanding the contributions of key authors is essential for mapping the research landscape in multimodal teaching. Table 5 presents the top 10 most frequently cited authors in this field, showing the major contributors driving advancements in multimodal teaching.

Table 5 Top 10 most frequently cited authors.

As shown in Table 5, Kress G. and Jewitt C. occupy the top two positions and are widely recognized for their foundational contributions to multimodal theories. Cazden C., ranking third, focuses on children’s language acquisition, including phenomena such as U-shaped development in second language acquisition and the acquisition of English inflectional forms; the frequent citation of this work underscores its significance in the field of multimodal teaching. The prevalence of citations for Kress G., Jewitt C., Cazden C., and others suggests their high authority and influence in this field.

Theme hotspot analysis

Theme hotspot analysis serves to identify and contextualize the key areas of focus within multimodal teaching research, highlighting current interests and emerging trends. Keywords in academic papers serve as vital indicators of the central content and theme of the publication, while the citation count of an article reflects its impact. Chen et al. (2012) state that keyword analysis is used to identify hotspots in current research and possible future directions in order to obtain a deeper understanding of a field. Consequently, keyword analysis is an indispensable tool for acquiring a more profound understanding of the fundamental principles of a field, as well as for identifying current hotspots and potential directions for future development.

Figure 9 shows the keyword co-occurrence map of multimodal teaching, generated using CiteSpace. The analysis yielded 449 keywords, with the following parameters set:

(1) Slice Length = 1;

(2) Selection criteria: g-index (k = 25), LRF = 3.0, L/N = 10, LBY = 5, e = 1.0;

(3) Network: N = 449, E = 1749 (Density = 0.0174);

(4) Pruning: Pathfinder;

(5) Nodes labeled: 1.0%.

Fig. 9: Keyword co-occurrence network.

This map was generated using CiteSpace, where each node represents a keyword, and its size reflects the frequency of occurrence.
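The g-index selection criterion listed in the parameters above can be illustrated with a short sketch. With k = 1 it is the standard g-index: the largest g such that the g most-cited items together have at least g² citations. The scaled form used here, g² ≤ k × (citations of the top g items), is our reading of how CiteSpace applies the scaling factor k within each time slice and should be treated as an assumption; the citation counts are invented.

```python
def scaled_g_index(citations, k=1):
    """Largest g with g**2 <= k * (total citations of the g most-cited items).

    k = 1 gives the standard g-index. The scaling factor k is an assumption
    about CiteSpace's variant: a larger k admits more items per time slice.
    """
    ranked = sorted(citations, reverse=True)
    g = 0
    running_total = 0
    for rank, count in enumerate(ranked, start=1):
        running_total += count
        if rank * rank <= k * running_total:
            g = rank
    return g


# Hypothetical citation counts for the items in one time slice.
slice_citations = [10, 5, 3, 1, 0]
print(scaled_g_index(slice_citations))       # 4
print(scaled_g_index(slice_citations, k=2))  # 5
```

Raising k therefore relaxes the threshold, which is why a setting such as k = 25 retains a comparatively large set of nodes for the network.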

A modularity value (Q) approaching 1 indicates a well-defined cluster structure, while a mean silhouette value (S) close to 1 signifies greater reliability of the clustering. Clustering is considered successful when S exceeds 0.6. Figure 9 illustrates the generated keyword graph, where Modularity Q = 0.6385 and mean S = 0.8566, indicating suitability for further analysis. The network shows that research on multimodal teaching attracts substantial interest and covers a variety of important issues. A node’s purple border signifies its centrality, with thicker borders indicating higher interdependence and academic impact in the analysis. For example, in Fig. 9, three pivotal keywords are highlighted: “design”, “literacy”, and “education”. The research focus lies predominantly in literacy education, as indicated by the frequent use of keywords such as “education”, “literacy”, “language”, “English”, and “technology”. Conversely, less frequently used keywords like “animated movies”, “curriculum and instruction”, and “epistemic position” suggest emerging areas that may benefit from further exploration. Overall, keyword frequency shows that research efforts have primarily concentrated on literacy education, with a particular emphasis on language instruction supplemented by technology integration. English teaching accounts for a significant portion, owing to the widespread use of English as a primary language. Additionally, the primary focus of these studies is on children, highlighting the importance of developing effective multimodal approaches for younger learners.
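To show what the modularity value Q measures, the following minimal sketch computes Newman's modularity for a partition of an undirected, unweighted network: for each cluster, the fraction of links that fall inside it minus the fraction expected if links were placed at random with the same node degrees. The toy network and cluster labels are invented, and CiteSpace's internal computation is not described in the text, so this is illustrative only.

```python
def modularity(edges, cluster_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges is a list of (u, v) pairs; cluster_of maps each node to a label.
    Q = sum over clusters of (internal links / m) - (degree sum / 2m) ** 2.
    """
    m = len(edges)
    internal_links = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        # Each link adds one to the degree of both of its endpoints.
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal_links[cu] = internal_links.get(cu, 0) + 1
    return sum(
        internal_links.get(label, 0) / m - (total / (2 * m)) ** 2
        for label, total in degree_sum.items()
    )


# Hypothetical keyword network: two triangles joined by a single link.
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
toy_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q = modularity(toy_edges, toy_clusters)
print(round(q, 3))  # 0.357
```

Putting all six nodes into a single cluster would give Q = 0, so a higher Q indicates that the chosen partition separates the network into denser-than-expected groups.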

The cluster view of keyword analysis aims to uncover the academic hotspots in the field of multimodal teaching by analyzing keywords and indexed terms associated with journals. Cluster labels were determined by selecting noun phrases from the article titles, abstracts, and keywords within each cluster, with the highest-ranking noun phrase chosen as the cluster label; each cluster has a silhouette score greater than 0.5. In Fig. 10, a total of 13 clusters are identified, including “augmented reality”, “conversation analysis”, “multimodal composing”, “media literacies”, “multimodal texts”, “cognitive load”, “early childhood education”, “video conferencing”, “multimedia human-computer interaction”, “science education”, “learner attitudes”, and “multimodal teaching and learning environment”. Cluster numbers range from 0 to 12, with a lower number signifying a larger cluster containing more closely related keywords. As described by Chen (2017), each cluster is composed of multiple interconnected keywords, reflecting significant thematic concentrations within multimodal teaching research.

Fig. 10: Keyword clustering map.

Thirteen keyword clusters in the field of multimodal teaching were generated using CiteSpace, with each cluster represented by a distinct color.

Figure 10 reveals that most clusters focus on text attributes and their role in multimodal teaching. Kress and van Leeuwen (2001) discuss these multimodal structures, proposing a two-part theory of multimodal communication: (1) the semiotic resources of communication, encompassing modes and media, and (2) the communicative practices that utilize these resources. This work not only clarifies various concepts and definitions related to multimodality but also formally introduces the term “multimodal,” distinguishes between “mode” and “medium,” and for the first time defines “multimodal discourse.” Thus, it bridges discourse analysis with multimodality. Stein (2000) further suggests that teachers and students should collaboratively view classrooms as semiotic spaces, that is, environments where individuals can construct multimodal texts and create their own meaning.

Through thematic analysis and synthesis, the reviewed studies have been grouped into three primary thematic areas for further discussion: (1) engagement with multimodal texts in the classroom, (2) emphasis on learners’ attitudes in multimodal learning, and (3) challenges and considerations regarding the integration of new technologies.

Theme evolution analysis

The theme evolution analysis aims to explore the progression and evolution of topics related to multimodal teaching over time. A comprehensive understanding of the development of themes and knowledge systems may be achieved by examining the fluctuations in keywords throughout time. This analysis facilitates the identification of the shifting focuses, emerging trends, and trajectories of academic research. Figure 11 shows the spatial and temporal distribution of keywords, while Fig. 12 presents the timeline of keyword development.

Fig. 11: Time-zone map of keywords.

Time-zone evolution of keywords in multimodal teaching research, visualized using CiteSpace to highlight their frequency and temporal distribution.

Fig. 12: Timeline visualization map of keywords.

This map was generated using CiteSpace. It highlights the temporal dynamics and evolving significance of keywords in multimodal teaching research.

Combining Figs. 11 and 12 reveals how the themes have evolved, with significant changes in research themes over time. Initially, the literature focused primarily on fundamental topics, including language and instructional design, with particular emphasis on the research subjects (students and teachers) and on cognitive load. Subsequently, the emphasis shifted towards literacy, which was analyzed thoroughly from several perspectives and with several methodologies. Recently, building on previous research, attention has shifted towards the practical application of new technologies, particularly media literacies. The scope of research is also expanding and has begun to affect related fields and industries. Figure 11 also shows that from 2009 to 2014, the keywords related to multimodal teaching expanded, particularly those associated with literacy. After 2018, more new keywords emerged, signifying ongoing development, and some of them may become focal points of future research.

Keyword citation bursts represent significant surges in the frequency of a specific word or article over time, serving as crucial indicators for identifying cutting-edge research fields. Figure 13 presents the top 18 keywords with citation bursts, ordered by the time of their occurrence, with the red portion indicating the burst period.

Fig. 13: Citation bursts map.

Top 18 keywords with the strongest citation bursts, highlighting periods of significant citation increases.

The earliest burst keyword in multimodal teaching research is “elementary education”, and its burst lasts relatively long (Fig. 13). The bursts of keywords such as “technology”, “multimodal literacy”, and “adolescents” are also long, and “augmented reality” remains active, indicating that current research focuses on applying AR technology to assist teaching. Notably, since 2019, there has been a significant burst of citations related to “teaching strategies” (burst strength: 2.65), highlighting it as an emerging research hotspot. This observation emphasizes the need to focus not only on teaching students but also on training teachers. Given the increasing appeal of the Internet, teachers need to keep acquiring new technologies for educational innovation, which confirms the popularity and relevance of multimodal teaching practice.

The sequence of burst keywords from “elementary education” to “higher education” reveals a shift in the primary subjects of multimodal teaching, from children initially to adults. Furthermore, language teaching has always been a central topic of pedagogical research, with particular emphasis on writing. The emergence of new keywords reflects the evolution of research topics, indicating that new research areas are constantly emerging in the field of multimodal teaching.

Content analysis

Understanding the research landscape of multimodal teaching is crucial for visualizing its current status and identifying potential research gaps and future directions. This section focuses on addressing RQ2.

To gain a comprehensive understanding of its growth trends, a thorough examination of high-quality and valuable documents is needed. The number of citations of a document represents its popularity and recognition (Zhong and Lin 2022). Building on the quantitative analysis of keyword co-occurrence, keyword bursts, and citations, this section examines in depth both the highly cited literature and the emerging high-attention literature, using content analysis to explore the important content of multimodal teaching research more comprehensively.

Content analysis of highly cited literature

Citation frequency serves as an indicator of the recognition and impact of an article. Literature is arranged in descending order based on citation frequency, and the top ten most-cited articles are chosen for analysis. This analysis aims to investigate the main content of significant literature in the area of multimodal instruction.

Firstly, highly cited literature focuses on multimodal composing. The most frequently cited study is Jiang and Luk (2016), which analyzes the sources of multimodal composing as a motivating English learning activity through in-depth semi-structured interviews and written reflections. The seven factors summarized by the authors are frequently cited by subsequent studies. Smith et al. (2017) discuss the use of heritage languages by bilingual students during the process of multimodal code-meshing. The study offers fresh insights into the simultaneous occurrence of recursion on various interconnected levels across modes, phases, and sections. Lim and Polio (2020) shift the research object to college students, investigating the types of second language multimodal writing tasks required by undergraduate courses and discussing the research significance of developing second language multimodal tasks.

Secondly, the highly cited literature focuses on digital multimodal writing. Hafner and Ho (2020) explored the criteria that teachers need to adopt when evaluating multimodal composition and designed a process-based assessment model for digital multimodal composition that comprises four primary stages. Kim and Belcher (2020) explored the writing quality of students who engage in digital multimodal composing. They compared traditional writing with digital multimodal writing, concluding that there is no difference in linguistic complexity, accuracy, or students’ perceptions of the writing tasks. This suggests that digital multimodal writing does not lessen the emphasis placed on language. Jiang (2017) provided empirical evidence to support the possibilities of using several modes of digital composition in English as a Foreign Language (EFL) acquisition. Additionally, Jiang proposed a framework that demonstrates how digital multimodal composing may enhance the process of learning a foreign language. Jiang (2018) explored the effect of digital multimodal composing on English learners’ writing in the classroom. The study examined how students change their investment in English writing and identified the factors that influence their responses and investment change. The findings suggest that students’ multiple identities have a significant influence on their negotiation of digital multimodal composing. Smith et al. (2021) systematically reviewed research on bilingual students’ digital multimodal composing in middle school classrooms, inductively summarized five main themes in the findings, and discussed the significance of these themes and key new directions.

Finally, the highly cited literature focuses on the application of multimodality in the classroom. Jewitt (2008) presented a literature review on multimodality and literacy in the classroom, discussing concepts related to multimodality and literacy in contemporary school classrooms and exploring the future direction of multimodality, classroom policies, and school education. The thorough organization of concepts in this review is of great help to subsequent literature in terms of research background and theory. Grapin (2019) distinguished weak and strong versions of multimodality, based on how multimodality is conceptualized in English language learner education and in the content areas, and argued that the strong version is necessary under today’s content standards.

Overall, highly cited literature mainly comprises empirical research or literature reviews, with second language learners of English as the main research subjects. These studies primarily focus on the promotion effect of multimodality on education, specifically examining the factors that contribute to the promotion of multimodal composing, the specific types of multimodal composing required, and the assessment of multimodal writing. These findings highlight the wide range of research conducted in this area. However, the research objects are mainly students and English teaching. Future research could focus more on teachers and on the teaching of other languages to promote diversification in research.

Content analysis of emerging high attention literature

The publication date significantly influences citation frequency. Exploring emerging high-profile literature from recent years can highlight the current research hotspots, offering valuable references for future multimodal teaching research. From the relevant literature published in 2021, 2022, and 2023, the 60 publications with the highest citation frequency were selected for focused analysis.

Firstly, concerning the multimodal tools used, some studies still rely on traditional teaching tools such as the blackboard (Tai and Wei 2023), but more are using new multimodal tools to assist teaching. Lin et al. (2022a), for instance, showed how AR technology can strengthen the teaching of second language writing and effectively improve learners’ self-regulation of long-term memory, short-term memory, and cognitive processes in writing. According to Lin et al. (2022b), a well-designed multimodal learning environment can significantly improve learning outcomes by incorporating 3D books enhanced by educational robots and Internet of Things technology into vocabulary instruction in EFL classrooms. Tai and Wei (2021) found that the iPad can create a humorous and safe space for bilingual or multilingual students and can expand teachers’ semiotic spatial repertoires to achieve their teaching goals. Liang and Hwang (2023) demonstrated the successful integration of robots into digital story design activities as a means of enhancing language learning. Their study serves as a valuable resource for educators and researchers seeking to develop impactful robotics activities within educational environments.

Secondly, the emerging high-profile literature focuses on applying new concepts in the field of education. Shu and Gu (2023) designed a smart education model supported by Edu-Metaverse, which significantly improved students’ learning outcomes and featured a high degree of creative freedom, multimodal interaction, resource sharing, and an intensely immersive experience.

Thirdly, many studies use mixed methods; for example, they collect data through case analysis, interviews, questionnaire surveys, and analysis of students’ performance before and after the intervention, combining quantitative and qualitative analysis to objectively evaluate the adopted multimodal teaching methods. Using case analysis, Nguyen (Ruby) (2023) concluded that VoiceThread has considerable potential to enhance stakeholders’ online engagement and enjoyment of distance learning, thereby promoting the development of an online community. Yap and Gurney (2023) carried out interviews and classroom observations at an intermediate school in New Zealand, using a narrative inquiry approach to examine how a teacher incorporates digital technologies and texts in literacy instruction.

Finally, the emerging high-attention literature has expanded the range of research subjects. Jiang et al. (2021) explored second language teachers’ engagement, revealing three modes of teachers’ participation in digital multimodal composing as well as the factors affecting their engagement. This research aligns with the need for teachers to participate in digital multimodal composing within the context of curriculum reform in the digital age, providing important inspiration for future second language teachers engaging in this practice.

The majority of the studies reviewed here were qualitative case studies based on observational data. Revising existing understandings of multimodal pedagogies requires a wider range of methodologies, including quantitative and mixed methods research as well as extended qualitative approaches. Combining quantitative and qualitative studies could yield new insights in this area. Quantitative methods could be used to assess the learning outcomes of multimodal teaching and to account for background variables associated with the social diversity of learners.

The focus on new technologies and expanding research horizons underscores multimodal teaching’s interdisciplinary nature. This shift highlights its international relevance and adaptability across age groups, introducing new concepts into the field.

Conceptual framework

Based on the findings from the bibliometric and content analysis, the Multimodal Teaching Research (MTR) conceptual model was developed to capture the dynamic and multifaceted nature of multimodal teaching. Grounded in constructivist learning theory, the model provides a structured visual representation of the key antecedents, in-class and after-class activities, and outcomes associated with multimodal teaching. The constructivist theory posits that learning is an active process where new knowledge builds on prior understanding and is shaped by social interactions within the learning environment (Vygotsky 1978). In a constructivist classroom, the teacher’s role is to establish a collaborative, student-centered atmosphere (Oliver 2000).

As shown in Fig. 14, the MTR model organizes multimodal teaching into three main stages: Pre-class, In-class, and After-class activities. Each stage highlights how different modalities—such as vision, gesture, space, and audio—work together to foster a conducive learning environment. Antecedents are divided into personal factors (e.g., teachers’ vocal expressions) and structural factors (e.g., technology, language policies), while outcomes address both teacher and student impacts, focusing on improved teaching strategies and learning achievements.

Fig. 14: Conceptual model of multimodal teaching.

This framework highlights the key elements of multimodal teaching and their impacts on both teachers and students.

In the Pre-class stage, teachers prepare multimodal resources and assess students’ learning attitudes to adapt their instructional approach. During the In-class stage, various modalities such as visual aids, gestures, and spatial arrangements are used to engage students and support comprehension, with Fig. 14 illustrating how these modes synergize for an integrated learning experience. The After-class stage involves reflection and assessment, which helps refine teaching methods for continuous improvement.

According to the New London Group, modes are divided into space, gesture, vision, and audio. The theoretical framework of multiliteracies emphasizes a focus on students and on situated practice in order to build a suitable learning environment for them. Wang (2017) mentioned that multimodal teaching can activate students’ vision, hearing, and other senses, thereby mobilizing their motivation for autonomous learning, stimulating their interest in acquiring knowledge, and fostering their proficiency in applying language skills.

In today’s digital age, students can obtain knowledge from many sources. Teachers therefore need to change their teaching concept, shifting the focus from imparting knowledge to promoting students’ internalization of knowledge, which makes multimodal knowledge resources before class necessary. The classroom should be student-centered: teachers need to investigate students’ learning attitudes before class, adjust the teaching content in a timely manner, and deploy modes according to the content. In the classroom, various modes are invoked to create a suitable learning environment for students, and multimodal course resources are used. The multimodal facilities of digital technologies enable image, sound, and movement to enter the classroom in new and significant ways (Jewitt 2008). Maluleke (2019) suggests that, after class, it is important to focus on evaluating teaching methods. The multimodal teaching mode involves using different communication methods, with an emphasis on teachers using multimodal teaching techniques and students engaging in multimodal learning and interactive assessment.

Among the antecedents of multimodal teaching, structural factors such as modality and clarity manipulations (Limperos et al. 2015), enlightened language policies (Ntelioglou et al. 2014), technology (Liu et al. 2022a), and the subject matter studied, the cultural environment, and the educational goals (Poyas and Eilam 2012) influence multimodal teaching. Personal factors such as teachers’ voice and facial expressions (Peng 2019), learner characteristics (Chan and Unsworth 2011), and learners’ perspectives (Dunn and Sweeney 2018) are also critical antecedents analyzed in the research so far. Regarding the outcomes of multimodal teaching, the literature shows that multimodal teaching has benefits for both educators and learners.

The conceptual framework is primarily designed to help clarify and visualize key knowledge points in multimodal teaching. By utilizing this framework, educators can systematically understand the content and knowledge structure of multimodal teaching, enabling them to identify antecedent factors, components, and learning outcomes. This understanding allows teachers to select appropriate instructional modes, enhancing the effectiveness of teaching. Additionally, the framework supports the recommendation of personalized learning resources and planning of learning paths for students, thereby improving the quality and efficiency of their learning.

The MTR conceptual model synthesizes the essential components of multimodal teaching, providing a structured lens through which the impact of multimodal teaching on both teachers and students can be better understood. This framework not only supports practical applications in teaching but also offers clear directions for future research in the field.

Discussion

This study employed bibliometric analysis and content analysis to explore the research trends and key developments in multimodal teaching, analyzing 689 documents from 1995 to 2023. The findings reveal distinct phases of development and offer valuable insights into the key contributors, themes, and future directions for multimodal teaching research.

Regarding RQ1, from a temporal perspective, research on multimodal teaching from 1995 to 2023 has undergone three distinct phases: a slow development period (1995–2008), a stable growth period (2009–2015), and a major development period (2016–2023). Few research results appeared in the slow development period; earlier studies were mainly published as books and rarely in journals. This finding aligns with Yang (2019). Between 2009 and 2015, research activity grew steadily, spurred by the rise of new media and increasing learner demands. Kress’s (2009) proposition that the “world of meaning is multimodal” raised a range of questions in the field of education. Numerous researchers have since conducted extensive studies on this issue, leading to a steady increase in research on multimodal teaching since 2009. In the third phase, the volume of literature on multimodal teaching produced between 2016 and 2023 exhibits a general upward trend, with a notable increase in 2022, when over 100 papers were published. This surge can be partly attributed to the impact of the COVID-19 pandemic, which pushed educators to adopt digital and multimodal teaching strategies to engage students in virtual environments. These multimodal approaches helped address the challenges of remote learning. It is widely acknowledged that the integration of multimodality and multiliteracies can effectively support the pedagogical task of cultivating students’ explicit comprehension of a wide range of multimodal systems and their design.

The rise of new technologies has also played a pivotal role in shaping modern teaching practices. Pyo (2016) states that online activities provide teenagers with many chances to engage deeply in a globalized world, establish transnational and transcultural spaces inside and across cultures, and use various communication techniques to construct significance. The organization of academic conferences also facilitated the advancement of multimodal teaching during this period. Examples include Archer and Breuer’s (2015) Multimodality in Writing: The State of the Art in Theory, Methodology and Pedagogy (a collection of thirteen seminal papers) and Zhang and Huang’s (2018) Multimodal and Foreign Language Education Research, which is the outcome of the 1st China Multimodality Forum held at Tongji University in Shanghai.
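
The three-phase periodization described above amounts to a simple binning rule over publication years. The following is a minimal sketch of that rule, not the procedure used in this study. The phase boundaries are taken from the text, whereas the function names and the sample years are hypothetical and serve only to illustrate the counting.

```python
from collections import Counter

# Phase boundaries as stated in the text (inclusive years).
PHASES = [
    ("slow development", 1995, 2008),
    ("stable growth", 2009, 2015),
    ("major development", 2016, 2023),
]


def phase_of(year: int) -> str:
    """Return the name of the phase that contains the given publication year."""
    for name, start, end in PHASES:
        if start <= year <= end:
            return name
    raise ValueError(f"year {year} is outside the 1995-2023 study window")


def count_by_phase(years):
    """Count publications per phase, keeping the phases in chronological order."""
    counts = Counter(phase_of(year) for year in years)
    return {name: counts.get(name, 0) for name, _, _ in PHASES}


# Hypothetical publication years for illustration only (not the study's data).
sample_years = [1997, 2010, 2012, 2016, 2021, 2022, 2022]
print(count_by_phase(sample_years))
```

Applied to an exported list of publication years, the same rule would give the number of documents falling into each phase.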

Institutional analysis reveals that institutions such as Nanyang Technological University lead in research volume on multimodal teaching. This is primarily because Singapore’s education system is undergoing ongoing systematic examination and transition. Since 1997, Singapore has implemented three successive Masterplans on ICT (information and communication technologies). The first two Masterplans facilitated the establishment of the physical and institutional foundations. The Ministry of Education, Singapore, encouraged more research and development (R&D) work to support cutting-edge and innovative ICT-focused educational approaches (Koh and Lee 2008). The third Masterplan began in 2009. Efforts evolved to enhance ICT integration within curriculum, pedagogy, and assessment in order to keep pace with 21st-century competencies (Ministry of Education, Singapore 2008a). This promoted research on multimodal teaching by Singaporean researchers, among whom those at Nanyang Technological University published the highest number of papers. The English language syllabus explicitly requires teachers to “create opportunities for pupils to be exposed to a wide range of rich texts which model good writing and the use of language” (Ministry of Education, Singapore 2008b, p. 29). The Singaporean government’s strategic focus on “making learning come to life with multimedia and interactive elements” (Info-communications Development Authority of Singapore 2010, p. 22) is laid out in the ten-year national ICT Master plan, “The Intelligent Nation 2015”. This highlights the crucial role of national policies in driving institutional research output and aligns with the observed increase in multimodal studies from Singapore.

Globally, the United States, the People’s Republic of China, and Australia are prominent contributors to this field. Research on multimodal teaching is primarily conducted in these countries and is closely linked to their national policies. Moreover, most publications on multimodal teaching focus on education, computer science, and communication. This analysis serves as a valuable reference for academics seeking a suitable journal for publishing their research in this field. In the United States, the Common Core State Standards emphasize multimodal texts, as noted by Jacobs (2012), and advocate for the use of multimodal teaching methods. From another perspective, Yelland (2018) mentioned that governments and the general public are too fixated on national test results that primarily assess reading and writing abilities, as well as the demonstration of fundamental skills, in order to ensure that students achieve national standards. Overemphasis on curriculum standardization and testing is increasingly impacting learners. Even as digital technology becomes more prevalent and the definition of literacy evolves, the emphasis in schools remains on traditional literacy skills. Therefore, it is crucial to conduct a thorough study of multimodal teaching within this context. China features on the list because the Chinese government has been committed to attracting talented overseas scholars through strong economic incentives and government policy (Zhang 2019). Earlier programs, such as the ‘100 Talents Program’ and the ‘Yangtze River Scholars Program’, targeted leading scientists and researchers who had worked overseas and made significant scientific achievements (Chinese Academy of Science 2014; Ministry of Education of China 2014). In the past 15 years, these two programs alone have recruited more than 4000 Chinese researchers (Zhao and Zhu 2009). Such policies have driven multimodal research growth in China, underlining the connection between governmental support and academic output. This global distribution analysis underscores the close ties between national policies and multimodal research development.

The increasing trend of publications, especially after 2016, signifies a growing recognition of multimodal teaching’s relevance, with notable contributions from leading institutions like Nanyang Technological University and significant regional outputs from countries such as the USA, China, and Australia. This aligns with global educational shifts towards digitalization and multimodal learning environments, indicating that the research landscape mirrors broader societal and technological developments.

Regarding RQ2, research in the field of multimodal teaching is highly popular, and new technologies and concepts are increasingly adopted, but the scope of research needs to be expanded. At first, the scientific literature focused on core themes such as language and teaching design, took students and teachers as its research objects, and concentrated on cognitive load. Later, the emphasis shifted towards literacy, with several research perspectives and methodologies used to conduct thorough analysis. Recently, building on previous research, the focus has shifted to the practical application of new technologies, emphasizing media literacies. According to Stornaiuolo and LeBlanc (2016), there is a growing trend in education to view literacy as multiliteracies, which is further emphasized in the classroom due to the increasing use of digital technologies, leading to the emergence of new literacy practices. This reflects changing times and the constant evolution of new media. According to Bingimlas (2009), given the fast growth and popularity of multimedia technology, it is commonly accepted that information and communication technology (ICT) is crucial in teaching and learning.

The MTR model integrates key antecedents of multimodal teaching, such as personal factors (teacher’s voice, learner characteristics) and structural factors (modality, technological applications), to outline their influence on teaching outcomes. This framework also addresses RQ2 by identifying and categorizing main topics like modality integration, learner engagement, and technological advancement. By organizing these elements, the framework demonstrates how different modes—such as visual, auditory, and gestural—contribute to enhanced pedagogical practices and learning experiences. Furthermore, it categorizes teaching phases (pre-class, in-class, and after-class) to show the dynamic nature of multimodal pedagogy and how it adapts to diverse educational contexts. This allows for a more comprehensive understanding of how multimodal teaching strategies respond to the evolving concerns in education, such as increased digital tool integration and diverse learner needs.

For RQ3, from a theoretical standpoint, this study synthesizes key research on multimodal teaching, offering a comprehensive overview of its knowledge domain. Through bibliometric analysis and content analysis, this study identifies the hot spots and evolution of the research topic, providing theoretical guidance for future research. From a practical perspective, this research offers scholars in the field research concepts and topic references. The analysis of influential journals and institutions helps researchers find suitable journals and institutions. In addition, identifying the shortcomings in the research field has the potential to support researchers in advancing the process of knowledge reconstruction and fostering disciplinary innovation. This research also examines development patterns in the area of multimodal teaching.

The MTR model also lays the foundation for exploring theoretical and practical outcomes of multimodal teaching, offering insights for future research, as addressed in RQ3. The framework provides a clear representation of how multimodal teaching positively impacts both teachers and students. For teachers, it enhances instructional methods and professional growth, allowing them to respond more effectively to current trends in multimodality and multiliteracy pedagogy. For students, it increases perceptions of learning, promotes active classroom participation, and fosters autonomy and literacy engagement. These outcomes are visualized through the framework’s depiction of teaching phases and modes of communication, emphasizing the importance of structured pre-class preparations, in-class multimodal engagement, and after-class reflections. This structured approach helps educators implement multimodal strategies effectively, thereby aligning theory with classroom practices. This framework not only serves to answer RQ1 but also effectively addresses RQ2 and RQ3. By providing a comprehensive overview of the knowledge network associated with multimodal teaching, the framework facilitates a deeper understanding for future researchers, enabling them to adopt it as a foundational research framework.

This research offers valuable insights for anyone engaged in the study of multimodal education. Based on the aforementioned findings, there are some potential directions for future development in the field of multimodal teaching. (1) Multimodal teaching encompasses various disciplines such as language instruction, computer science, multimodal writing, psychology, and education; however, research is currently conducted within disciplinary boundaries. Therefore, there is a need for more interdisciplinary research in multimodal teaching in the future. (2) Future research should investigate what other country-level factors influence research on multimodal teaching. (3) Research should employ diverse research methodologies, such as conducting expert interviews, to validate the findings and enhance the study’s credibility.

Conclusions

Despite a significant increase in the scientific literature on multimodal teaching, there remains a notable lack of comprehensive bibliometric and visual analyses that provide an overview of current research. This study employed CiteSpace to explore multimodal teaching research trends from 1995 to 2023, uncovering the following key findings:

(1) Publications and citations in multimodal teaching research have consistently increased since 2016, reflecting growing scholarly interest and the expanding influence of multimodal teaching methodologies.

(2) Nanyang Technological University is the most prolific institution, with substantial contributions also from the USA, People’s Republic of China, Australia, Spain, and England. Focusing on these institutions and countries can provide researchers with insights into current advancements and regional priorities in multimodal teaching.

(3) The top 12 research hotspots in multimodal teaching include augmented reality, conversation analysis, multimodal composing, media literacies, multimodal texts, cognitive load, early childhood education, video conferencing, multimedia human-computer interaction, science education, learner attitudes, multimodal teaching and learning environments. These clusters highlight the interdisciplinary nature of multimodal teaching, integrating insights from fields such as language education, computer science, and psychology.

(4) The emphasis in multimodal teaching research has gradually shifted from younger learners to adult learners, reflecting a broader application of multimodal techniques across age groups and education levels.

(5) Hafner and Ho’s (2020) work on assessing digital multimodal composing is the most cited, reflecting its significant impact. Additionally, Kress emerges as a central figure in the field, while Harvard Educational Review is the leading publication, underscoring its role as a repository for multimodal research.

(6) Highly cited literature primarily investigates multimodality’s impact on teaching effectiveness, including factors like multimodal composition types and assessment methods. Emerging research adopts increasingly interdisciplinary methods and new technologies, broadening the scope and relevance of multimodal studies.

(7) The MTR conceptual framework synthesizes key elements of multimodal teaching, illustrating how teaching strategies, instructional activities, and learning outcomes are interrelated. This framework serves as a guide for future interdisciplinary research in multimodal teaching.

In contrast to traditional reviews, this study provides a structured, systematic perspective on multimodal teaching through bibliometric and content analysis, offering both theoretical and practical insights. However, the study has some limitations. First, only English-language articles were examined; papers written in other languages were not taken into account. Further investigation of literature in other languages, across diverse academic databases within the field of multimodal teaching, is warranted. Moreover, since this article draws its sample solely from WoS, it may not encompass all relevant journals in the field. Future research may consider analyzing data from additional databases, including Scopus, PubMed, the Education Resources Information Center (ERIC), and Academic Search Elite (EBSCO). Finally, the dataset included only journal articles and literature reviews, which may limit its comprehensiveness and impartiality as a representation of all existing research on multimodal teaching. To provide readers with a deeper comprehension of the most recent findings in the field of multimodal teaching, future research could consider broadening the data collection to include other types of publications, such as conference papers and book chapters.