Introduction

Peer tutoring has long been recognized as a critical component of academic support programs in higher education institutions worldwide. It typically involves academically proficient students assisting their peers in understanding course content under the supervision of faculty or academic staff (Topping, 1996). Over the years, peer tutoring has gained widespread recognition for its benefits, such as enhanced academic performance, improved student confidence, and the development of critical thinking skills among both tutors and tutees (Arco-Tirado et al., 2019; Murtonen et al., 2023). Moreover, research by Falchikov (2001) found that peer tutors who received structured training were better equipped to handle diverse learning needs and could foster more meaningful academic interactions with tutees. Despite its effectiveness, most of the research in this domain has focused on categorizing different types of peer tutoring, including distinctions based on year of study, subject matter, and the duration of tutor-tutee interactions (Topping, 1996; Arco-Tirado et al., 2011).

As higher education continues to modernize and more attention is paid to improving the quality of instruction, peer-tutoring initiatives need to adapt to the needs of modern-day learners. However, a key limitation on the effectiveness of these programs is the lack of formal pedagogical training for peer teaching; Di Benedetti et al. (2022) noted comparable shortcomings among graduate teaching assistants. While peer tutoring has been shown to improve academic outcomes, the absence of structured teaching methods in many programs raises concerns about whether tutors are adequately equipped to facilitate meaningful learning experiences. In many cases, pedagogical training is offered only to professional teaching staff, leaving peer tutors, who may excel academically but lack formal teaching skills, without the necessary tools to maximize their effectiveness (Murtonen & Vilppu, 2020).

In many university peer tutoring programs, including those in business and science disciplines, students are recruited based on academic qualifications rather than teaching ability. Although their expertise in the subject matter is clear, their ability to convey that knowledge in an accessible and engaging manner remains in question. Research by Delquadri et al. (1986) highlights that while peer tutoring can boost students’ confidence and subject knowledge, the role of pedagogical training in enhancing tutoring effectiveness has been overlooked. A growing body of literature suggests that pedagogical training can play a crucial role in developing tutors’ ability to deliver content more effectively, leading to better learning outcomes for tutees (Di Benedetti et al., 2022; Gardner & Jones, 2011).

The aim of this study is to investigate the impact of pedagogical training on peer tutoring outcomes within the context of a Sino-American university. Although existing literature emphasizes the importance of pedagogical skills for effective tutoring, there is limited research on how such pedagogical training specifically enhances the peer tutoring experience, both the process and outcomes. Meanwhile, the university’s academic center operates a large peer tutoring team, yet it remains unclear to what extent pedagogical training has been integrated and how it affects the quality of tutoring provided. This research addresses this gap by examining the effect of pedagogical training on tutoring outcomes and exploring how peer tutors with formal teaching training influence tutees’ academic performance and learning attitudes. This study seeks to contribute to the broader conversation on how universities can improve peer tutoring programs through the integration of pedagogical training, thereby better supporting students’ academic and personal development. In the long run, findings from this study could lead to the development of standardized training modules tailored to different academic disciplines, and such advancements may contribute to creating a more inclusive and supportive learning environment for all students. Furthermore, this research holds the potential to influence educational policy within higher education institutions, advocating for the integration of pedagogical frameworks into peer tutoring as a means to promote equity in academic success.

Literature review

The effectiveness of peer tutoring in higher education

Earlier literature defined peer tutors as substitute teachers who transfer knowledge from teachers to students (Topping, 1996). As peer tutoring developed, a more modern conception emerged: individuals from similar social groupings who are not qualified teachers helping one another to learn and learning themselves by teaching (Alegre Ansuategui & Moliner Miravet, 2017; Topping, 2020). Peer tutoring is characterized as more beneficial and cost-effective for tutees in higher education than traditional remedial programs, which are primarily concerned with passive learning and have often been criticized by scholarship organizations for not involving students in professional activities (Topping, 1996; Stump et al., 2011). Numerous studies have reported that undergraduate students in peer tutoring groups achieved significantly higher academic performance than students who were not peer tutored (Topping, 1996; Sobral, 2002; Saunders, 1992; Maynard & Almarzouqi, 2006; Comfort & James McMahon, 2014; Topping et al., 2017; Raja et al., 2018). Apart from enhancing academic performance, peer tutoring also fosters positive learning attitudes and student motivation (Malone et al., 2019). Núñez-Andrés et al. (2021) examined 103 undergraduate students and evaluated the suggested peer learning experience and its influence on their sustainability mindsets and education. Sanchez-Aguilar (2021) reported that both tutors and tutees had active and effective experiences in their English-language tutoring. The results showed that peer tutoring increased students’ knowledge, motivation, and dedication to the disciplines in which they were taught.

Cross-year small-group tutoring, one form of peer tutoring, is when senior undergraduates or postgraduates act as tutors for lower-year undergraduates, with each tutor working with a small group of students at the same time (Topping, 1996). Research has shown that cross-year small-group tutoring is effective in enhancing students’ academic performance, reducing dropout rates, and increasing confidence. Bobko (1984) found that although course grades did not improve markedly compared to previous years among 25 tutees who were peer tutored 12 h per week, there were notable benefits: tutees reported increased confidence and reduced anxiety, while tutors improved their own knowledge and communication skills. Lidren et al. (1991) compared peer tutoring groups of six with groups of twenty using randomized control groups. Both groups outperformed students who did not receive tutoring in terms of test scores and favorable subjective evaluations, and the smaller peer-taught groups achieved better results than the larger ones. Arnemen and Prosser (1993) found that both tutors and tutees gained confidence through peer tutoring in Australia. American River College (1993) found that even though tutees had a lower overall Grade Point Average (GPA), they outperformed non-tutored students in the disciplines in which they received tutoring.

BOPPPS model in teaching practice: heterogeneous and multidimensional

The BOPPPS model, a participatory teaching approach, serves as a pedagogical framework for lesson planning and design. Pattison and Russell (2006) introduced this model, which was originally developed by Douglas Kerr and his research team in Canada, in the Instructional Skills Workshop (ISW) book. They described its six phases as follows:

B: Bridge-in. Build motivation and link the lesson objective to the learners.

O: Objective. Clarify what learners should know, think, value, or do by the end, including conditions and performance levels.

P: Pre-assessment. Identify what learners already know about the learning topic or content.

P: Participatory Learning. Engage learners actively in the learning activities.

P: Post-assessment. Find out at which level learners meet the objective.

S: Summary. Allow learners to reflect and integrate the lesson.

They stated that the model underscores the necessity of tailoring pedagogical strategies and instructional approaches to diverse learner profiles to effectively achieve educational objectives. Additionally, BOPPPS prioritizes student agency in the learning process, fosters dynamic interaction between instructors and students, and maintains the coherence of instructional sequences. It is widely used in teaching practice across heterogeneous disciplines such as English (Li, 2016; Wang et al., 2022; Guo, 2024), Computer Science (Wang, 2020), Communication (Yang et al., 2023), Business Etiquette (Shih & Tsai, 2019), and Medical Education (Liu et al., 2022; Xu et al., 2023; Ma et al., 2021; Li et al., 2023). These studies underline the effectiveness of the BOPPPS model with the following salient traits:

  1. Stimulate students’ learning autonomy (Guo, 2024; Li, 2016; Liu et al., 2022);

  2. Vibrant learning or classroom atmosphere (Xu et al., 2023; Yang et al., 2023; Li et al., 2023);

  3. Better learning outcomes on tests and exams (Ma et al., 2021; Shih & Tsai, 2019; Wang et al., 2022).

As the application of the BOPPPS model has become more proficient and extensive in teaching practice, educators and researchers have begun combining it with other pedagogical strategies to scaffold the learning process. Li et al. (2023) reported that team-based learning combined with BOPPPS helped nursing students develop critical thinking skills and stimulated their interest in learning. Xu et al. (2023) and Liu et al. (2022) both investigated the effectiveness of the hybrid BOPPPS (HBOPPPS) model, concluding that HBOPPPS is flexible and reproducible and enhances learners’ practical ability and satisfaction. The combination of blended learning and the BOPPPS model (Ma et al., 2021) likewise reflects the multidimensional application and practical significance of the BOPPPS model in teaching practice.

In short, the BOPPPS model is used heterogeneously across disciplines and along multiple dimensions. Even so, a critical issue persists regarding the effectiveness of tutors who, despite receiving pedagogical training, are tasked with incorporating the BOPPPS model alongside different scaffolding strategies during the tutoring process. This challenge calls into question whether such an approach truly enhances the tutoring experience. Therefore, this study aims to answer the following research questions (RQs):

RQ1: How do peer tutors with pedagogical training, particularly using the BOPPPS structure, influence the effectiveness of tutoring outcomes in comparison to those without pedagogical training?

RQ2: What are the perceptions of tutees regarding the support provided by peer tutors who have received pedagogical training?

RQ3: To what extent, if at all, do tutoring sessions led by peer tutors with pedagogical training influence tutees’ academic confidence and their attitudes towards peer tutoring programs?

Research methodology

Research setting

The study was conducted at a Sino-American university. Each year, the university enrolls approximately 900 students, many of whom face challenges adapting to the English-language learning environment. In response, the Student Academic Support and Retention Center (SASRC) recruits 50 peer tutors to provide support, offering 186 tutoring courses across 21 majors. In 2023 alone, these peer tutors provided 4291 hours of one-on-one tutoring. Approximately 21% of the undergraduate population participated in the program.

Peer tutoring program

The SASRC at the university administers the Peer Tutoring Program, which aims to help students enhance their academic performance. Candidates with a cumulative GPA of 3.7 or higher are eligible; once appointed as peer tutors, they may offer tutoring in subjects in which they earned A or A- grades.

Peer tutors with pedagogical training

In this study, peer tutors with pedagogical training are defined as those who completed the EMSE 3420 course BASIC THEORY/PRAC ENG LEARN (Basic Theory/Practice of English Learners) with an A grade at the university. They have received structured pedagogical training, have formulated written teaching strategies, and have had the opportunity to trial their teaching plans in a secure setting. They adopt Bloom’s Taxonomy and follow the ‘BOPPPS’ lesson structure when designing each lesson plan, and they use various scaffolding techniques during their teaching.

Tutoring course

The tutoring course in this research is ENG 1430 COLL COMP II FOR ELL (Composition II for English-Language Learners). The university offers year-long academic writing courses tailored for undergraduate students to acclimate them to the English for Academic Purposes setting. The composition curriculum within the academic program comprises two components: ENG 1300 and ENG 1430. Students who achieve a grade of seventy percent or higher in ENG 1300 are eligible to progress to ENG 1430 in the subsequent semester, thereby fulfilling the course requirements. The ENG 1430 course requires the completion of four distinct categories of written compositions: (a) Rhetorical Analysis, (b) Bibliography, (c) Annotated Bibliography, and (d) Argumentative Essay. This study only offered tutoring sessions for the first three categories.

Research design

This study employed a posttest-only control group experimental design with repeated measures, conducted over an academic term. Fourteen candidates from the College of Liberal Arts (CLA) were recruited by the SASRC and categorized into four types of tutors, as shown in Table 1.

Table 1 Peer tutor profile within each type.

Types A (pedagogical training tutors with tutoring experience) and B (pedagogical training tutors without tutoring experience) have three peer tutors conducting each tutoring session, while types C (non-pedagogical training tutors with tutoring experience) and D (non-pedagogical training tutors without tutoring experience) have only one peer tutor per session. Three rounds of group tutoring were offered, each consisting of four sessions, one conducted by each of the four tutor types. More specific details of the timing and participant samples are presented in Table 2.

Table 2 Three rounds of group tutoring sessions.

The participants were randomly assigned to these four groups according to their academic performance in ENG 1300 (e.g., GPA), and the Kolmogorov–Smirnov test (p > 0.05) indicated that students’ academic levels in each group did not depart from a normal distribution. The study used a mixed-methods approach, combining quantitative analysis for RQ1 with qualitative inquiry for RQ2 and RQ3, to steer the investigation.

Research instrument

The study used both a survey protocol (Appendix A) and an interview protocol (Appendix B) as research instruments. Researchers applied SPSS 26.0 to examine the reliability of the survey protocol and ran confirmatory factor analysis (CFA) in AMOS 26.0 to examine the measurement models. The detailed structure and scales are presented below:

I) Respondents identification: This includes a combined question (Q1) of the project ID and the engaged tutoring session ID of participants;

II) Tutor teaching persona: This scale engages five questions (Q2–6) to inquire about the tutor’s teaching persona trait with a 5-point Likert scale. The internal consistency alpha coefficient demonstrated excellent reliability with a value of α = 0.94;

III) Tutoring satisfaction: Six questions (Q7–12) on a 5-point Likert scale are included for this scale to inquire about the tutee’s feelings and overall satisfaction degree of the tutoring session, with high reliability of α = 0.91;

IV) Teacher Autonomy Support: This scale (Q20, 22, 25, 29) evaluates students’ perceptions of their teachers’ support for autonomy using a 4-item version of the Learning Climate Questionnaire (LCQ; Williams & Deci, 1996) with a single-factor structure on a 5-point Likert scale. Item loadings ranged from 0.79 to 0.90. With a Cronbach’s alpha of 0.91, the scale showed strong internal consistency. To better fit the Chinese collegiate context of this study, ‘teachers’ was changed to ‘tutors’ in this study’s version of the LCQ. An example item is ‘I feel that my tutor provides me with choices and options’. A CFA for a one-factor solution yielded satisfactory fit indices: χ²(2) = 0.37, CMIN/DF = 0.19, n = 305, p > 0.05, CFI = 1.00, TLI = 1.01, SRMR = 0.01, and RMSEA = 0.00.

V) Teacher Control Questionnaire (TCQ): A brief six-item self-report questionnaire was used to gauge how controlling students perceived the instructional conduct in class to be (Q21, 23–24, 26–28). The items are rated on a 5-point Likert scale and show high internal consistency (α = 0.94); examples include ‘My tutor tries to control everything I do’ and ‘My tutor puts much pressure on me.’ As with teacher autonomy support, the TCQ was modified for this research by replacing ‘teachers’ with ‘tutors’ to better reflect the Chinese college context of this study. Item loadings ranged from 0.81 to 0.89. A CFA for a single-factor solution yielded satisfactory fit indices: χ²(9) = 21.14, CMIN/DF = 2.35, n = 305, p < 0.05, CFI = 0.99, TLI = 0.99, SRMR = 0.02, and RMSEA = 0.07.

VI) Intrinsic Value: Three items from the Motivated Strategies for Learning Questionnaire, Revised Chinese Version (MSLQ-RCV; Lee et al., 2010) were used to represent intrinsic value (Q14, 16, 18). An example item is ‘I will do more than is required of me for this assignment’. Internal consistency reliability for this construct was 0.77. Item loadings ranged from 0.42 to 0.90.

VII) Self-efficacy: Four items from the Motivated Strategies for Learning Questionnaire, Revised Chinese Version (MSLQ-RCV; Lee et al., 2010) were used to represent self-efficacy (Q13, 15, 17, 19). An example item is ‘I am sure I can do an excellent job on this assignment’. Internal consistency reliability for this construct was 0.76. Item loadings ranged from 0.55 to 0.89.

VIII) Open-ended questions: Two open-ended questions (Q30-31) are added for participants to share their feelings and suggestions about the tutoring session.
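The internal consistency coefficients reported for the scales above are Cronbach's alpha values. As a rough illustration of how such a coefficient is obtained, the following minimal sketch computes alpha from a small table of item scores. The function name and the demo responses are hypothetical and are not drawn from the study data; the study itself used SPSS 26.0.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(responses[0])                      # number of items in the scale
    items = list(zip(*responses))              # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical 5-point Likert answers to a four-item scale (not study data).
demo = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 4],
]
alpha = cronbach_alpha(demo)
print(round(alpha, 2))
```

Alpha rises when the items vary together across respondents, which is why scales whose items move in step report values above 0.90.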

A semi-structured interview was used to understand comprehensively what these tutoring sessions brought to tutees and how the sessions further affected them. Through three primary questions, researchers sought to discover tutees’ personal attitudes toward the tutoring sessions. The questions helped participants recall their experience of attending the sessions, and placed emphasis on what participants learned and how they benefited. Researchers sought to establish whether the tutoring sessions helped participants in practice when they learned new things or reviewed their knowledge. During the interview, to encourage participants to talk more about their personal feelings, researchers used prompts such as ‘Besides these, can you remind me more?’ and ‘Compared to the classes before you attend the tutoring session, how do you feel?’ These prompts helped researchers learn more from participants’ own words.

The interviews were administered by researchers who were not involved in the tutoring sessions to ensure impartiality in data collection. The primary objective was to elicit genuine insights and feedback from the participants. To enhance the credibility and rigor of the qualitative data obtained through interviews, multiple methodological safeguards were implemented. Member checking was employed by summarizing key interview responses and presenting them to participants for verification, ensuring the accuracy of the interpretations and alignment with their intended meanings. Additionally, peer debriefing sessions were conducted with research colleagues who critically examined the coding framework and engaged in discussions regarding emerging themes, thereby mitigating potential researcher bias and enhancing the reliability of the findings. The coding was carried out by the first author alone using NVivo 14.0. After coding, all researchers in this study reviewed the codes and agreed on the final themes and their contents.

Thematic saturation was assessed through an ongoing review of emerging themes during the data analysis process. Interviews continued until no new themes or substantial variations emerged, indicating that saturation had been reached. By explicitly addressing thematic saturation, the study ensures that the qualitative findings are comprehensive and representative of the participants’ perspectives. This strengthens the validity of the analysis and demonstrates the depth of the qualitative insights obtained.

Data collection

For this mixed-methods study, an online survey was used to collect quantitative data on tutees’ perceptions of the tutoring sessions. Participants were required to fill out the evaluation survey after each tutoring session in which they participated. Researchers collected 206 data points in total (90 participants at time 1, 66 at time 2, and 50 at time 3) before data cleaning. Invalid responses were excluded before data analysis based on several criteria: (1) respondents selecting the same score for every item on the scale, (2) questionnaires that were completely or partially blank, (3) students providing an incorrect tutor number, (4) incorrect student IDs, and (5) surveys completed in less than one minute. After screening the data, we deleted 20 cases based on criterion 1 and 12 cases based on criterion 5 across all three time points. No students left any items blank or incomplete, so there were no missing item responses at any time point. Additionally, no students entered incorrect tutor IDs or student IDs. Ultimately, 174 valid data points (73 at time 1, 54 at time 2, and 47 at time 3) were retained. The interviews were conducted after all tutoring sessions had ended. Ten participants were scheduled for interviews. Each joined voluntarily under an incentive reward system, which gave them a benefit in a subsequent auction held by the researchers to stimulate their interest. Each interview lasted at most 15 min. Researchers asked for approval to record before each interview officially began.
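The two screening criteria that actually removed cases (identical scores on every item, and completion in under one minute) can be sketched as a simple filter. This is only an illustration under assumed record fields (`id`, `answers`, `seconds`); the study's real data schema and cleaning script are not described in the text.

```python
def screen(responses, min_seconds=60):
    """Split survey records into kept and dropped, using two of the criteria."""
    kept, dropped = [], []
    for rec in responses:
        straight_lined = len(set(rec["answers"])) == 1   # criterion 1
        too_fast = rec["seconds"] < min_seconds          # criterion 5
        (dropped if straight_lined or too_fast else kept).append(rec)
    return kept, dropped


# Hypothetical records; field names are illustrative, not the study's schema.
demo = [
    {"id": "s01", "answers": [5, 4, 5, 4], "seconds": 240},
    {"id": "s02", "answers": [3, 3, 3, 3], "seconds": 180},  # same score throughout
    {"id": "s03", "answers": [4, 5, 4, 3], "seconds": 45},   # under one minute
]
kept, dropped = screen(demo)
kept_ids = [r["id"] for r in kept]
dropped_ids = [r["id"] for r in dropped]
print(kept_ids, dropped_ids)

# Bookkeeping reported in the text: 206 collected, 20 + 12 removed, 174 retained.
assert 206 - 20 - 12 == 174 == 73 + 54 + 47
```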

Data analysis

To assess the effectiveness of pedagogical training and tutor experience, we examined group differences according to these two factors. Group comparisons were conducted at each time point using the Scheirer–Ray–Hare test, a nonparametric alternative to two-way ANOVA suitable for data that violate normality assumptions (Scheirer et al., 1976). The primary benefit of the Scheirer–Ray–Hare test lies in its capacity to analyze ranked data from factorial designs and examine both main effects and interaction effects between categorical variables (Sheskin, 2003). In contrast, the Kruskal–Wallis test is limited to assessing group differences based on a single factor (Field, 2014). Therefore, the Scheirer–Ray–Hare test was preferred, as it allows a more comprehensive analysis of both main effects and interactions. The test was performed in R (v4.4.1) with the rcompanion package.
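To make the logic of the Scheirer–Ray–Hare test concrete, the sketch below ranks all observations, computes the usual two-way sums of squares on those ranks, and divides each by the total mean square to obtain H, which is then referred to a chi-square distribution. It is a simplified illustration that assumes a balanced 2 × 2 design (so every effect has one degree of freedom); it is not the rcompanion implementation used in the study, and the demo scores are hypothetical.

```python
import math
from itertools import product


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def scheirer_ray_hare_2x2(cells):
    """cells maps (a, b) with a, b in {0, 1} to equally sized score lists."""
    keys = list(product((0, 1), repeat=2))
    sizes = {len(cells[k]) for k in keys}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes a balanced design")
    n = sizes.pop()
    flat = [x for k in keys for x in cells[k]]
    ranks = average_ranks(flat)
    total = len(flat)
    ranked = {k: ranks[i * n:(i + 1) * n] for i, k in enumerate(keys)}
    grand = sum(ranks) / total
    ms_total = sum((r - grand) ** 2 for r in ranks) / (total - 1)

    def ss_between(groups):
        return sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)

    ss_a = ss_between([ranked[(a, 0)] + ranked[(a, 1)] for a in (0, 1)])
    ss_b = ss_between([ranked[(0, b)] + ranked[(1, b)] for b in (0, 1)])
    ss_ab = ss_between(list(ranked.values())) - ss_a - ss_b
    out = {}
    for name, ss in (("A", ss_a), ("B", ss_b), ("A:B", ss_ab)):
        h = ss / ms_total
        # Chi-square survival function for df = 1: P(X > h) = erfc(sqrt(h / 2)).
        out[name] = (h, math.erfc(math.sqrt(max(h, 0.0) / 2)))
    return out


# Hypothetical scores: factor A (e.g. training) and factor B (e.g. experience).
demo = {(0, 0): [1, 2, 3], (0, 1): [4, 5, 6], (1, 0): [7, 8, 9], (1, 1): [10, 11, 12]}
result = scheirer_ray_hare_2x2(demo)
for effect, (h, p) in result.items():
    print(f"{effect}: H = {h:.3f}, p = {p:.4f}")
```

For unbalanced cells, as in the study, the sums of squares depend on the chosen type (sequential or partial), which is one reason to rely on an established package rather than a hand-rolled version.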

A Mann–Whitney U-test was used to follow up significant interaction effects identified by the Scheirer–Ray–Hare test, owing to its suitability for small samples and non-normal data (Kolmogorov–Smirnov test, p < 0.05) (Nachar, 2008). It is robust to outliers, reducing the risk of misleading significance compared to the t-test (Nachar, 2008; Siegel & Castellan, 1988). While bootstrapping is an alternative for non-normal data, it has key drawbacks compared to the Mann–Whitney U-test. It is computationally intensive, requiring thousands of resampling iterations, whereas the Mann–Whitney U-test is faster and more efficient (Gibbons & Chakraborti, 2011). Bootstrapping is also sensitive to outliers, which can distort estimates, unlike the Mann–Whitney U-test, which is rank-based and more robust (Efron & Tibshirani, 1993). Moreover, bootstrapped confidence intervals may be unstable, especially in small-to-moderate samples (Davison & Hinkley, 1997). Given these factors, the Mann–Whitney U-test was preferred for its efficiency, robustness, and direct hypothesis testing capability. To account for multiple comparisons, we applied a correction for Type I error using the Holm–Bonferroni method. This adjustment ensured that our Mann–Whitney U-test comparisons remained statistically robust while controlling for the increased likelihood of false positives (Aickin & Gensler, 1996). The Mann–Whitney U-test was carried out in R (v4.4.1).
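The Holm–Bonferroni step-down adjustment mentioned above can be sketched in a few lines: the smallest of m p-values is multiplied by m, the next by m − 1, and so on, with the adjusted values forced to be non-decreasing. The function name and the raw p-values below are hypothetical and serve only to illustrate the procedure.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so a larger raw p never gets a smaller adjusted p.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from three follow-up comparisons (not study data).
raw = [0.01, 0.04, 0.03]
adj = holm_adjust(raw)
print([round(p, 2) for p in adj])
```

Because only the smallest p-value receives the full Bonferroni multiplier, the procedure is never less powerful than a plain Bonferroni correction while still controlling the family-wise error rate.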

Additionally, aligned ranks transformation ANOVA (ART ANOVA) is a nonparametric method that facilitates the analysis of multiple independent variables, their interactions, and repeated measures within factorial designs (Wobbrock et al., 2011). In this study, it was used to investigate the effects of tutor experience, pedagogical training, and time on tutees’ perceptions of teaching style, satisfaction, intrinsic value, and self-efficacy. The key advantages of ART ANOVA include its ability to handle repeated measures and accurately evaluate both main and interaction effects in factorial designs, making it particularly useful when analyzing within-subject factors over multiple time points (Salter & Fawcett, 1993). Additionally, it is robust to heterogeneity of variances and unequal sample sizes, ensuring reliable statistical inferences even when assumptions of homoscedasticity are not met (Salter & Fawcett, 1993). By applying ART ANOVA, this study ensures that both main effects and interaction effects are properly assessed, providing a more comprehensive understanding of how the effects of tutor experience and pedagogical training evolve over time.

ART ANOVA cannot handle missing data (Wobbrock et al., 2011). This study addressed that limitation by replacing missing data with the median, which keeps the analysis robust while maintaining the integrity of the dataset. ART ANOVA was implemented using the ARTool package in R (v4.4.1).

Results and discussions

Pedagogical expertise: comparative performance

Table 3 presents comparisons of the outcome variables based on two factors: tutoring experience (tutors vs. non-tutors) and pedagogical training (pedagogical vs. non-pedagogical) by the Scheirer–Ray–Hare test, a nonparametric alternative to two-way ANOVA, to examine the main and interaction effects. Mann–Whitney U-tests were conducted to further explore the significant interaction effect.

Table 3 Scheirer–Ray–Hare test and post hoc Mann–Whitney U-tests for the effect of tutor experience and pedagogical training on outcome variables across three time points.

At time 1, pedagogical training showed a significant main effect on controlling teaching, H = 0.02, p < 0.05, with a medium effect size, η² = 0.08. Pedagogically trained tutors (Md = 1.83, n = 44) received significantly higher controlling teaching scores than those without training (Md = 1.33, n = 29), as shown in Fig. 1. The observed difference may be attributed to pedagogical training enhancing tutors’ ability to manage classroom dynamics and guide students more effectively. Pedagogical training typically provides educators with strategies and techniques to better understand and respond to student needs (Darling-Hammond, 2000). For example, studies have shown that tutors with formal training are more adept at creating structured learning environments and using instructional strategies (Guskey, 2002). Additionally, pedagogical training often emphasizes the development of reflective practices and classroom management skills, which may contribute to higher controlling teaching scores (Hattie, 2009). The Scheirer–Ray–Hare test showed no other significant main effects of pedagogical training on outcomes across the three sessions (p > 0.05).

Fig. 1 The difference of controlling teaching between pedagogical training and non-pedagogical training group at the first session.

Peer tutor experience’s influence: enhancing learning dynamics

Table 3 also shows the main effects of tutoring experience on student outcomes across the three time points. At time 1, tutor experience had a significant main effect on teacher autonomy support, H = 6.21, p < 0.05, with a medium effect size, η² = 0.09. Students whose tutors had prior tutoring experience reported higher levels of teacher autonomy support (Md = 4.25, n = 40) than those whose tutors lacked such experience (Md = 4.00, n = 33).

At time 2, tutor experience significantly affected all measured outcomes except controlling teaching. Tutors with tutoring experience were rated higher on teaching persona (Md = 5.00, n = 31) than those without (Md = 5.00, n = 23), H = 8.18, p < 0.01, with a large effect size, η² = 0.15. Students reported higher teaching satisfaction when taught by tutors (Md = 5.00, n = 31) than by non-tutors (Md = 4.50, n = 23), H = 7.43, p < 0.01, with a large effect size, η² = 0.14. Similarly, self-efficacy was higher among students whose tutors had tutoring experience (Md = 4.75, n = 31) than among those whose tutors did not (Md = 4.50, n = 23), H = 6.99, p < 0.01, with a medium effect size, η² = 0.13. Intrinsic value was also greater among students of tutors (Md = 4.67, n = 31) than non-tutors (Md = 4.33, n = 23), H = 4.86, p < 0.05, with a medium effect size, η² = 0.09. Finally, teacher autonomy support remained significantly higher in the tutor group (Md = 4.67, n = 31) than in the non-tutor group (Md = 4.33, n = 23), H = 10.54, p < 0.01, with a large effect size, η² = 0.20. At time 3, there was no significant main effect of tutoring experience on the targeted outcomes.
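The effect sizes reported for time 2 appear consistent with the common rank-based conversion η² = H / (N − 1), taking N = 31 + 23 = 54. The text does not state which formula was used, so this conversion is an assumption inferred from the numbers; the short check below simply recomputes each value from the reported H statistics.

```python
def eta_squared_from_h(h, n_total):
    """Rank-based eta squared, assuming the conversion eta^2 = H / (N - 1)."""
    return h / (n_total - 1)


n_time2 = 31 + 23   # group sizes reported for time 2
reported = {        # outcome: (H, eta squared as reported in the text)
    "teaching persona": (8.18, 0.15),
    "teaching satisfaction": (7.43, 0.14),
    "self-efficacy": (6.99, 0.13),
    "intrinsic value": (4.86, 0.09),
    "teacher autonomy support": (10.54, 0.20),
}
recomputed = {}
for outcome, (h, eta_reported) in reported.items():
    recomputed[outcome] = eta_squared_from_h(h, n_time2)
    print(f"{outcome}: {recomputed[outcome]:.2f} (reported {eta_reported:.2f})")
```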

Figures 2–7 are box plots showing the significant differences between the tutor group and the non-tutor group. These results suggest that tutoring experience may positively impact various aspects of the student experience. Tutors with tutoring experience at the university are more familiar with teaching processes, allowing them to respond and adjust more quickly to tutees’ performance. Consistent with Hattie (2009), tutors often provide more personalized attention and feedback, which can enhance teaching persona and increase teaching satisfaction. Also, the higher levels of teacher autonomy support in tutor groups align with research showing that autonomy-supportive practices can boost student motivation and engagement (Deci & Ryan, 2000). In short, the significant differences observed between tutor and non-tutor groups underscore the impact of tutoring experience on various aspects of tutee engagement and satisfaction. These findings highlight the value of tutoring as a means to improve study outcomes and engagement in educational settings.

Fig. 2
figure 2

The difference of teacher autonomy support between tutor group and non-tutor group at the first session.

Fig. 3
figure 3

The difference of teaching persona between tutor group and non-tutor group at the second session.

Fig. 4
figure 4

The difference of teaching satisfaction between tutor group and non-tutor group at the second session.

Fig. 5

Difference in self-efficacy between the tutor and non-tutor groups at the second session.

Fig. 6

Difference in intrinsic value between the tutor and non-tutor groups at the second session.

Fig. 7

Difference in teacher autonomy support between the tutor and non-tutor groups at the second session.

Pedagogical training matters: moderated by tutoring experience

Table 3 displays the interaction effects between pedagogical training and tutoring experience, with follow-up comparisons made using Mann–Whitney U-tests. These results indicate that the effect of pedagogical training on outcomes depends on prior tutoring experience. Figures 8–11 are interaction plots that visualize how the effect of pedagogical training differs with tutoring experience.

Fig. 8

Interaction between pedagogical training and tutor experience on teaching persona at T1. Note. The red and blue points at x = 2 overlap due to identical medians.

Fig. 9

Interaction between pedagogical training and tutor experience on teaching satisfaction at T1. Note. The red and blue points at x = 2 overlap due to identical medians.

Fig. 10

Interaction between pedagogical training and tutor experience on self-efficacy at T1.

Fig. 11

Interaction between pedagogical training and tutor experience on intrinsic value at T1.

At Time 1, significant interaction effects were identified for all measured outcomes except teacher autonomy support and controlling teaching. The interaction effect for teaching persona was significant, H = 5.78, p < 0.05, with a medium effect size, η² = 0.08. Follow-up Mann–Whitney U-tests revealed that, among non-tutors, students taught by pedagogically trained teachers (Md = 5.00, n = 23) rated teaching persona higher than those taught by untrained teachers (Md = 4.20, n = 10), U = 119.00, z = −2.27, Holm-adjusted p = 0.049, with a medium effect size, r = 0.39. A similar pattern emerged for teaching satisfaction, with a significant interaction effect (H = 3.86, p < 0.05, η² = 0.05). Among non-tutors, students reported higher satisfaction when taught by pedagogically trained teachers (Md = 4.67, n = 23) than by untrained teachers (Md = 3.83, n = 10), U = 104.50, z = −2.60, Holm-adjusted p = 0.020, with a medium effect size, r = 0.45. The interaction effect for self-efficacy was also significant (H = 6.71, p < 0.01, η² = 0.09), but the Holm-adjusted p value for the non-tutor group comparison did not reach significance (p = 0.07), suggesting only a trend. Within the non-tutor group, students of pedagogically trained teachers (Md = 4.25, n = 23) reported higher self-efficacy than students of untrained teachers (Md = 3.63, n = 10), U = 117.00, z = −2.10, with a medium effect size, r = 0.37. In contrast, a strong interaction effect was found for intrinsic value, H = 10.15, p < 0.01, with a large effect size, η² = 0.14. Among non-tutors, the pedagogically trained group (Md = 4.67, n = 23) demonstrated significantly higher intrinsic value than the untrained group (Md = 3.50, n = 10), U = 97.00, z = −2.90, Holm-adjusted p = 0.005, with a large effect size, r = 0.51.
At Time 2 and Time 3, no significant interaction effects were found for any variable, and all Holm-adjusted p values exceeded 0.05, indicating that the moderating role of tutoring experience diminished over time.
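
The follow-up effect sizes at Time 1 are consistent, within rounding of the reported z values, with the common conversion r = |z| / √N, where N = 23 + 10 = 33 is the combined size of the two non-tutor subgroups. The sketch below checks that arithmetic and also illustrates the Holm step-down adjustment in general terms. The conversion formula is an inference from the reported figures, the function names are illustrative, and the raw p values passed to the Holm function are hypothetical, since the paper reports only adjusted values.

```python
import math

N_NON_TUTOR = 23 + 10  # trained + untrained non-tutor subgroups at Time 1

# Reported z statistics and effect sizes r for the non-tutor comparisons.
REPORTED_R = {
    "teaching persona": (-2.27, 0.39),
    "teaching satisfaction": (-2.60, 0.45),
    "self-efficacy": (-2.10, 0.37),
    "intrinsic value": (-2.90, 0.51),
}


def r_from_z(z, n_total):
    """Effect size r under the assumed conversion |z| / sqrt(N)."""
    return abs(z) / math.sqrt(n_total)


def holm_adjust(p_values):
    """Holm step-down adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


for outcome, (z, reported) in REPORTED_R.items():
    print(f"{outcome}: computed r = {r_from_z(z, N_NON_TUTOR):.3f}, "
          f"reported r = {reported:.2f}")

# Hypothetical raw p values for a family of two comparisons (not from the paper).
print(holm_adjust([0.012, 0.300]))
```

The computed r values differ from the reported ones by less than 0.01, which is what rounding of z to two decimals would produce.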

In summary, the interaction effects observed at Time 1 indicate that the effectiveness of pedagogical training is moderated by prior tutoring experience. Specifically, pedagogical training significantly enhanced teaching persona, satisfaction, and intrinsic value among non-tutors, with a marginal effect on self-efficacy. These results align with existing research demonstrating that formal training supports teachers in engaging students, communicating effectively, and fostering motivation (Ahmed et al., 2021; Ambady & Rosenthal, 1993; Darling-Hammond, 2000; Hattie, 2009). From a motivational standpoint, trained non-tutors may be better equipped to meet students’ needs for competence and relatedness, thereby promoting greater intrinsic value and self-efficacy (Bandura, 1997; Deci & Ryan, 2000). In contrast, tutors may have developed comparable skills through hands-on experience, which reduces the incremental benefit of training, a pattern consistent with the expertise reversal effect (Berliner, 2004). The absence of significant interaction effects for autonomy support and controlling teaching suggests that these aspects may be less responsive to short-term training or more stable in the early stages of teaching.

Effects of pedagogical training and tutoring experience over time

A notable finding from Table 3 is the absence of significant main and interaction effects across outcomes at Time 2 and Time 3 (p > 0.05). To investigate the role of time further, Aligned Rank Transform (ART) ANOVAs were conducted to examine how the effects of pedagogical training and tutoring experience evolved across the three time points. Because the main and interaction effects of these two factors have already been discussed, this section focuses exclusively on results involving time. Table 4 shows the results of the three-way ART ANOVAs, which test the main effect of time and its interactions with tutoring experience and pedagogical training across the six outcome variables. Partial eta squared (\(\eta_{\mathrm{p}}^{2}\)) is reported as the measure of effect size.

Table 4 Aligned rank transform ANOVA results for tutoring experience, pedagogical training, and time.

As shown in Table 4, teaching persona showed significant interaction effects between time and tutoring experience, F(2, 312) = 27.27, p < 0.001, \(\eta_{\mathrm{p}}^{2}\) = 0.15, and between time and pedagogical training, F(2, 312) = 25.97, p < 0.001, \(\eta_{\mathrm{p}}^{2}\) = 0.14, both with large effect sizes. However, neither the main effect of time nor the three-way interaction reached statistical significance (p > 0.05). For teaching satisfaction, there was a significant main effect of time, F(2, 312) = 6.26, p < 0.01, with a small effect size, \(\eta_{\mathrm{p}}^{2}\) = 0.04. Significant interaction effects were also found between time and tutoring experience, F(2, 312) = 13.28, p < 0.001, \(\eta_{\mathrm{p}}^{2}\) = 0.08; between time and pedagogical training, F(2, 312) = 8.32, p < 0.001, \(\eta_{\mathrm{p}}^{2}\) = 0.05; and among time, tutoring experience, and pedagogical training, F(2, 312) = 11.53, p < 0.001, \(\eta_{\mathrm{p}}^{2}\) = 0.07, all indicating medium effect sizes.

As for motivational outcomes, a significant main effect of time on self-efficacy was detected, F(2, 312) = 6.25, p < 0.01, with a small effect size, \(\eta_{\mathrm{p}}^{2}\) = 0.04, while all interactions were non-significant. Intrinsic value was also significantly influenced by time, F(2, 312) = 6.38, p < 0.01, \(\eta_{\mathrm{p}}^{2}\) = 0.04. A significant interaction between time and pedagogical training was identified, F(2, 312) = 5.42, p < 0.01, \(\eta_{\mathrm{p}}^{2}\) = 0.03, while other interactions were not significant.

For teaching style, both the main effect of time on teacher autonomy support, F(2, 312) = 8.82, p < 0.001, \(\eta_{\mathrm{p}}^{2}\) = 0.05, and the interaction between time and tutoring experience, F(2, 312) = 9.06, p < 0.001, \(\eta_{\mathrm{p}}^{2}\) = 0.06, were significant. Other interactions were non-significant. Similarly, controlling teaching was significantly affected by time, F(2, 312) = 13.87, p < 0.001, with a medium effect size, \(\eta_{\mathrm{p}}^{2}\) = 0.08, and by the interaction between time and pedagogical training, F(2, 312) = 7.98, p < 0.001, \(\eta_{\mathrm{p}}^{2}\) = 0.05. Other interaction terms were not significant.
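
As a rough arithmetic check on the ART ANOVA results, partial eta squared can be recomputed from each reported F statistic through the standard identity η_p² = (F · df1) / (F · df1 + df2). The sketch below assumes that every test uses the stated degrees of freedom (2, 312) and that this identity applies to the reported values; the paper does not describe how its effect sizes were computed, and all names in the code are illustrative.

```python
# Sketch: recompute partial eta squared from the reported F statistics.
# Assumption: eta_p^2 = (F * df1) / (F * df1 + df2) with df1 = 2, df2 = 312.

DF_EFFECT, DF_ERROR = 2, 312

# (outcome, term, reported F, reported partial eta squared)
REPORTED_F = [
    ("teaching persona", "time x tutoring", 27.27, 0.15),
    ("teaching persona", "time x training", 25.97, 0.14),
    ("teaching satisfaction", "time", 6.26, 0.04),
    ("teaching satisfaction", "time x tutoring", 13.28, 0.08),
    ("teaching satisfaction", "time x training", 8.32, 0.05),
    ("teaching satisfaction", "three-way", 11.53, 0.07),
    ("self-efficacy", "time", 6.25, 0.04),
    ("intrinsic value", "time", 6.38, 0.04),
    ("intrinsic value", "time x training", 5.42, 0.03),
    ("teacher autonomy support", "time", 8.82, 0.05),
    ("teacher autonomy support", "time x tutoring", 9.06, 0.06),
    ("controlling teaching", "time", 13.87, 0.08),
    ("controlling teaching", "time x training", 7.98, 0.05),
]


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


for outcome, term, f_value, reported in REPORTED_F:
    estimate = partial_eta_squared(f_value, DF_EFFECT, DF_ERROR)
    print(f"{outcome} | {term}: computed {estimate:.3f}, reported {reported:.2f}")
```

Under these assumptions, every recomputed value lies within 0.01 of the reported one.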

The findings suggest that time moderates the sustained impact of tutoring experience and pedagogical training on teaching outcomes. Several factors may explain why the main and interaction effects lost significance over time. First, through experience gained during the sessions, untrained teachers may acquire the relevant skills organically and modify their teaching methods, progressively closing the initial gap with their trained counterparts (Berliner, 2004). Second, as students and teachers build relationships, the quality of that interpersonal connection may influence student perceptions and motivation more strongly than the teacher’s initial training status (Cornelius-White, 2007). Third, the benefits of early pedagogical preparation may matter most in the first stages of the teaching relationship and become less distinctively predictive of results as routines and shared experiences grow. Some training-informed teaching strategies may lose their novelty, or the changing demands of the students may shift the emphasis away from training-based differences (Bailey et al., 2020).

Pedagogy insights: what do tutees say?

Teacher scaffolding, in which instructors provide adaptive or contingent support to students, is widely regarded as an effective pedagogical approach (Omoniyi & Ese, 2018). Earlier studies (Chi, 1996; Chi et al., 2001) have explored the benefits of scaffolding in tutoring. Drawing on the interviews, this section examines the scaffolding strategies that participants mentioned and highlights their efficacy within the tutoring process.

“I find the practical application of technology in teaching during those two sessions [Types A and B] particularly compelling…My professor did not use them in the classroom; hence, for me, it is very interesting. Especially about the Quizizz, the tutor can use the turntable to draw tutees to share their opinions. It is so impressive! I am engaged more with that (Student L).”

In the interviews, tutees frequently highlighted tutors from categories A and B who used gamification technologies such as Quizizz and Kahoot to assess students’ knowledge. As assessment instruments, these tools not only increase students’ interest but also reduce their anxiety about being tested or evaluated, owing to the engaging nature of gamification. Their impact therefore extends beyond mere assessment. As one interviewee noted, features like the turntable encourage students to share ideas and explain questions, transforming them from passive participants into active enablers. In effect, the students begin to ‘tutor’ themselves. This shift enhances their learning motivation and self-efficacy, as evidenced by some students scheduling one-on-one tutoring sessions to delve further into the course content. These findings align with Hernanz et al. (2024), who observed that students reported greater satisfaction with their performance, and that their intrinsic motivation was activated, after engaging in new educational experiences with Quizizz. Padlet, an interactive virtual whiteboard platform, was also well received by tutees.

“I really like the practices we did in Padlet…We polished the sentence that the tutor gave us and then published it after collective, thoughtful discussions. I can also see how other groups doing. Honestly, I learned the most from it (Student Y).”

Megat et al. (2020) conducted a case study of the collaborative learning environment fostered by Padlet, which stimulated students’ interest in learning and enhanced their academic performance through active engagement. In the present study, by contrast, the emphasis is on developing a shared learning community that promotes student autonomy via Padlet. Through Padlet, students independently engage in discussions and give feedback on each other’s academic writing, spontaneously creating an active, student-led learning environment.

One scaffolding strategy discussed by all tutees across various tutoring contexts is the use of instructional language.

“I can not fully understand in the class. Hence, I come to this tutoring… Chinese help me understand easily (Student O).”

“[For Type A tutoring] Although the materials are all in English, they interpret in Chinese when they are conducting instructions. But the essential concept and content they still use English to stress it… Thanks to that, I get good grades for my essay (Student K).”

Because English is the standard language of instruction at this university, first-year students often struggle to adapt fully, especially in English academic writing. The four tutoring groups therefore opted to use their first language (L1) as the primary instructional language, with slight differences. The groups with pedagogical expertise continued to use English for core content and discussions and presented their teaching materials exclusively in English, but they gave instructions and further explanations in the students’ mother tongue. Accordingly, tutees comprehended the content through their L1 but expressed and transferred their learning outcomes in English. Here, the mother tongue functions as a bridge rather than a crutch. In contrast, the other two groups relied primarily on L1 for their tutoring sessions.

Apart from the scaffolding techniques for learning, the tutees’ favorite activity, and the approach that the researchers are most eager to promote as a prompt for learning, was the ‘auction’. Boyd and Boyd (2014) used an auction as a learning environment in which students communicate in economic contexts, practice their audiolingual skills, and gain real-life experience. Tutors in this study, by contrast, used the auction to boost learning autonomy. In essence, tutees earned a set amount of ‘fake money’ each time they actively engaged in a tutoring session; this money could then be spent in an auction organized by the researchers at the end, where students could bid for items they desired.

“I want to go to that session [Type A tutoring]!… My last session did not have the money, and it seems like I did not participate a lot (Student F).”

“The ‘money’ makes me participate in the discussion part. I can share my ideas with the tutors and also friends…Do you still have the tutoring sessions next semester? (Student P)”

This reward system was implemented specifically in tutoring sessions facilitated by peer tutors who had received formal training in pedagogy. These tutors viewed the strategy as an innovative means of encouraging students to take greater initiative in their learning. The outcome matched this purpose: the incentive of monetary or material rewards heightened tutees’ motivation to learn, which in turn fostered greater learning autonomy. In addition, the findings in the previous sections suggest that both the learning atmosphere and students’ willingness to engage improved notably in sessions led by pedagogically trained peer tutors, which is consistent with the interviews.

Conclusion

This study has shown that peer tutors gain far-reaching advantages from targeted pedagogical training, as demonstrated by a marked, statistically significant decrease in the use of controlling teaching styles compared with tutors who did not undergo such training. Trained tutors improved not only in teaching style but also in developing a clearer teaching persona, more effective communication, and greater instructional clarity. These changes enabled tutors to engage tutees more effectively, resulting in higher satisfaction and better overall tutoring experiences. The results align with previous studies (Gardner & Jones, 2011; Di Benedetti et al., 2022) that emphasize the importance of pedagogical training in refining tutors’ teaching methods, which in turn positively affects student outcomes. The data also highlighted the importance of structured frameworks such as the BOPPPS model, which played a key role in fostering an environment where tutees could actively engage and improve their academic confidence. This is in line with studies that emphasize the benefits of participatory and structured teaching approaches (Xu et al., 2023). Tutees in the pedagogically trained groups reported higher self-efficacy and motivation, supporting the idea that a structured, interactive tutoring experience can significantly enhance students’ academic growth and personal development.

Not all tutees experienced the same improvement in academic confidence. A few, particularly those with lower initial motivation or those who faced a mismatch between their learning style and the tutor’s approach, did not report the expected boost in confidence. This echoes Topping’s (2005) work, which pointed out that tutor-tutee mismatches can hinder the benefits of peer tutoring. Some language barriers were also noted, especially where tutors over-relied on the students’ first language for explanations, which may have reduced the effectiveness of the tutoring in improving academic English skills. Despite these challenges, the introduction of an incentive reward system, including gamification tools such as Quizizz and the final auction, notably enhanced student engagement. These findings support previous research suggesting that external rewards can complement intrinsic motivation to boost student participation and academic confidence (Hernanz et al., 2024). The reward system helped to motivate students, especially those who were less intrinsically motivated, and kept them engaged in the learning process.

While this study adds valuable insights into how pedagogical training affects peer tutoring, it also highlights the need for practical improvements in training programs. Based on the findings, several recommendations can be made. First, pedagogical training programs should focus on enhancing tutors’ ability to use scaffolding strategies, such as adaptive questioning, timely feedback, and collaborative learning environments. These strategies have been shown to be highly effective in improving tutee engagement and academic confidence. Second, tutors should be trained to assess and adapt to individual tutee needs more effectively, so that learning styles are matched and the tutor’s approach remains flexible. This is essential for creating a more personalized learning experience and overcoming barriers such as mismatches between teaching and learning styles. Third, institutions should consider integrating more comprehensive training that goes beyond the BOPPPS model to include specific strategies for enhancing tutor-tutee interactions and building a supportive, autonomy-promoting environment. This could involve incorporating role-playing exercises or peer-feedback sessions into the training process to help tutors better understand how to respond to diverse learner needs.

In sum, while the findings underscore the substantial benefits of pedagogical training for peer tutors, they also reveal areas for improvement. Peer tutoring programs can be strengthened by refining training approaches, focusing on scaffolding techniques, and fostering better tutor-student interactions. Institutions developing these programs should focus on supporting both tutors and tutees to create a dynamic, inclusive, and confidence-building learning experience for all involved. “Pedagogy determines how teachers think and act” (Kincheloe, 2005, p. 53), shaping their beliefs and influencing how students learn. As both educators and researchers, we assert that pedagogy directs the educational trajectory and molds the educational landscape, just as education per se illuminates learners.