Introduction

The scientific community has recently been confronted with a concerning phenomenon termed the ‘reproducibility crisis’. At its core, this crisis signifies a substantial loss of validity in research outcomes, leading to diminished confidence in the generated findings. This issue is multifaceted, spanning various disciplines and challenging the fundamental principles of scientific inquiry, which rely on the ability of independent researchers to verify reported results1.

The broad scope of this challenge is evident in estimates across various fields; for instance, it is widely acknowledged that only about 10–25% of biomedical research outcomes can be reliably reproduced, a statistic that has drawn considerable attention to reproducibility in biomedical research. Such figures illustrate the general scope of the reproducibility crisis across research types (e.g., biomedical and social sciences) and are not confined to any single methodology or discipline2. Scholars increasingly link the crisis to issues such as fraud, carelessness, and general unreliability3,4, and the problem is recognized as multifaceted and multistakeholder, with no single cause or solution. At its core, reproducibility is the ability to validate research knowledge using established science and methodology; findings that cannot be replicated are not considered valid knowledge. Reproducibility thus remains a crucial tool for evaluating scientific claims and assessing their validity5.

Understanding the crisis requires a clear conceptual framework. Freedman et al.6 identify three key aspects of reproducibility: reproducibility of methods (requiring detailed explanations that allow experts to accurately reproduce results), reproducibility of results (the technical replication of an experiment), and inferential reproducibility (determining whether a reanalysis or replication yields qualitatively similar conclusions).

Although reproducibility is integral to scientific research, many fields have experienced a decline in it in recent years7. Even as science grows increasingly reliant on reproducible outcomes, many pivotal studies across multiple disciplines have never been properly replicated. An analysis of the top 100 journals in education research found that only 0.13% of publications described reproducibility projects8. A study of 250 psychology articles published between 2014 and 2017 revealed that only 5% discussed reproducibility efforts9, while in the social sciences reproducibility attempts were mentioned only 1% of the time10. This crisis threatens research by wasting resources, hindering the progression of knowledge, and undermining the credibility of scientific journals11.

In medical education, the reproducibility of research results is crucial, as validated findings serve as guiding principles for curriculum development, teaching methods, and student evaluation methods12. Evidence-based decisions in this field are essential for optimizing resource utilization and ensuring improved outcomes for faculty, learners, and health systems. Education strategies based on flawed evidence can compromise student learning and lead to the misallocation of resources toward ineffective initiatives. Moreover, a finding that has not been duly validated damages research credibility and can negatively affect health care systems overall13,14. While significant attention has been given to the reproducibility issue in biomedical research, comparatively less emphasis has been placed on medical education research15,16,17,18.

This disparity highlights a critical research gap. Medical education research often lacks standardized methodologies, making replication difficult. Unlike controlled biomedical research, the field frequently relies on qualitative data and context-specific interventions19.

Reproducibility is further complicated by the unique complexities inherent in the medical education setting, including diverse student demographics, ethical constraints, environmental factors, and methodological issues such as inconsistent measurement tools. Consequently, results, which are often self-reported, are frequently prone to bias20. Recognizing the need for evidence-based improvement, this qualitative study’s primary objective is to explore and interpret the various factors affecting reproducibility, identify the underlying barriers, and propose specific solutions for enhancing quality and reproducibility in Iranian medical education research.

Materials and methods

Study design

This study utilized a qualitative design within an interpretive paradigm to investigate the reproducibility crisis in Iranian medical education in depth. To analyze the interview data, we employed conventional content analysis based on the framework established by Graneheim and Lundman21. This inductive approach was well suited to the study’s objectives.

Graneheim and Lundman’s method offers a systematic rigor specifically designed for health and education contexts, capable of revealing complex phenomena through both manifest (explicit) and latent (underlying) content. This dual capability was essential for capturing the nuanced, context-dependent meanings in our experts’ perspectives, ensuring that the findings remained grounded in the data rather than being limited by pre-existing theoretical biases.

Participants

A total of twenty-four medical education experts affiliated with Iranian universities of medical sciences were recruited using purposive sampling. Participants were selected based on their specific familiarity with the concept of reproducibility. Inclusion criteria mandated a minimum of five years of experience in medical education research and scholarship, along with a record of at least ten published papers or projects in the field. Participants who did not complete the full interview process or were unavailable for follow-up were excluded from the study.

The participants in this study were aged between 35 and 44 years, with the sample exhibiting a gender distribution of 62.5% women and 37.5% men. Their demographic characteristics are summarized in Table 1.

Table 1 The demographic characteristics of the participants.

Data collection

Data were collected through in-depth, semi-structured interviews to gain a comprehensive understanding of participants’ experiences. Before each interview, the study objectives were clearly explained, confidentiality was emphasized, and informed consent (written or verbal) was obtained. Depending on participants’ preferences and convenience, interviews were conducted either face-to-face or via online platforms in a quiet setting to ensure privacy.

The interview guide was developed to encourage participants to express their views freely on research reproducibility. Each session began with general introductory questions to establish rapport, followed by the main questions. The central question was: “In your opinion, what factors influence the reproducibility of research in medical education?” In line with qualitative research traditions, the sequence of questions remained flexible to follow the flow of the participants’ narratives. Probing questions were used extensively to elicit deeper insights and uncover underlying meanings in participants’ perceptions, such as: “What challenges do you face in ensuring research reproducibility?”, “What strategies could enhance reproducibility?”, and “What actions should medical educators take to ensure reproducible research reporting?”.

Interviews lasted approximately 60–90 min per participant. All interviews were recorded and subsequently transcribed verbatim to preserve the original complexity of the data for analysis.

We employed a simultaneous and iterative process of data collection and analysis, which continued until data saturation was achieved, ensuring the generation of a comprehensive and rich dataset. Data were formally considered saturated when several consecutive interviews yielded no new codes or categories relevant to the research questions, and when subsequent interviews primarily served to elaborate or confirm existing themes rather than generate novel conceptual insights.
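As a minimal illustration of this stopping rule, the hypothetical Python sketch below flags saturation once several consecutive interviews yield no new codes. The counts and the threshold k are invented for illustration; they are not data from this study.

```python
# A hypothetical sketch of the saturation check described above: flag
# saturation once k consecutive interviews contribute no new codes.
# The counts below are invented for illustration.
def saturated(new_codes_per_interview, k=3):
    """Return True if the last k interviews each yielded zero new codes."""
    tail = new_codes_per_interview[-k:]
    return len(tail) == k and all(n == 0 for n in tail)

new_codes = [14, 9, 7, 6, 4, 2, 1, 0, 0, 0]  # new codes per successive interview
print(saturated(new_codes))  # True: the last three interviews added nothing new
```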

Data analysis

Data analysis was conducted using the qualitative conventional content analysis approach developed by Graneheim and Lundman21. This inductive method was selected to systematically interpret the data through a process of decontextualization and recontextualization. Initially, the corresponding author transcribed the interviews verbatim. The research team then read the transcripts multiple times to obtain a sense of the whole and achieve immersion in the data. Following this preparatory phase, a line-by-line analysis was performed to identify “meaning units”—words, sentences, or paragraphs containing information relevant to the study’s aim.

In the subsequent procedural steps, these meaning units were “condensed” to shorten the text while preserving the core content, and then abstracted and labeled with codes. Through a process of constant comparison, codes regarding similar subjects were grouped into sub-categories and categories, which represented the “manifest” content (the visible components) of the data. Finally, through deep interpretation and reflection on the underlying meanings across categories, themes were formulated to capture the “latent” content of the phenomenon (Fig. 1).

Fig. 1

From meaning unit to theme: steps in the qualitative content analysis.
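To illustrate how this analytic chain can be kept auditable, the minimal sketch below records the link from meaning unit to code to category as a simple data structure, so that each theme remains traceable to its source data. It is a hypothetical illustration, not the software used in this study; all example content is invented.

```python
# A hypothetical sketch (not the authors' tooling) of recording the analytic
# chain from meaning unit to category. All example content is invented.
from dataclasses import dataclass, field

@dataclass
class MeaningUnit:
    interview_id: str
    text: str        # verbatim span from the transcript
    condensed: str   # shortened while preserving the core content
    code: str        # abstracted label

@dataclass
class Category:
    name: str
    sub_category: str
    units: list = field(default_factory=list)  # MeaningUnit instances

unit = MeaningUnit(
    interview_id="P07",
    text="Papers here rarely explain how the sample size was decided ...",
    condensed="Sample size justification missing from reports",
    code="inadequate sample size",
)
category = Category(
    name="Research methodology",
    sub_category="Choosing appropriate sample size",
    units=[unit],
)
```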

Rigor

To ensure the rigor and trustworthiness of the study, the four criteria proposed by Lincoln and Guba22—credibility, dependability, confirmability, and transferability—were strictly applied. Credibility was established through prolonged engagement with the data and member checking, in which a summary of the coded findings was shared with participants to confirm that the results accurately reflected their experiences. Dependability was ensured through peer debriefing and external audit, with the coding process and initial findings reviewed by two external qualitative researchers and two doctoral candidates. Confirmability was achieved by maintaining a detailed audit trail documenting all analytical steps and decisions, allowing external tracing of the research path. Finally, transferability was facilitated by employing maximum variation sampling across participant demographics and providing a thick description of the Iranian medical education context, allowing readers to judge the applicability of the findings to similar settings (Fig. 2).

Fig. 2

Application of Lincoln and Guba’s22 Criteria for Trustworthiness.

Ethical considerations

This study received ethical approval from the Ethics Committee of Tehran University of Medical Sciences (Ethics Code: IR.TUMS.Medicine.REC.1402.515) prior to the commencement of data collection. All participants were fully briefed on the aims, procedures, and potential implications of the research to ensure their complete understanding. Written or verbal informed consent was obtained from all participants before their inclusion in the study.

Participation was entirely voluntary, and all respondents were unequivocally assured of their right to withdraw at any point without penalty or consequence. Confidentiality and anonymity were strictly maintained throughout the research process. All interview transcripts and data were anonymized immediately upon transcription, and access to the raw data was limited exclusively to the research team. This study adhered to the ethical principles governing research involving human subjects, including the relevant guidelines outlined in the Declaration of Helsinki. This specific research was conducted as a component of a project investigating research misconduct in medical education.

Results

The qualitative conventional content analysis of participant interviews resulted in the identification of three major themes that capture the latent content of the data: factors affecting the reproducibility crisis, consequences of the reproducibility crisis, and solutions to deal with the reproducibility crisis. These themes, along with their respective categories and sub-categories (manifest content), are summarized in Table 2.

Table 2 Overview of the analytical process: categories, sub-categories, and emergent themes.

Theme: factors affecting the reproducibility crisis

This theme encompasses the systemic issues within research practice and the academic environment identified by participants as contributors to the reproducibility crisis. The contributing factors were grouped into three main categories: research methodology, biases, and contextual factors.

Research methodology

Participants highlighted specific methodological weaknesses that compromise the reliability of research findings, including failures in choosing an appropriate sample size, ensuring an accurate study design, and using reliable measurement tools. Participants also implicitly identified statistical errors as methodological failures, which are addressed by the solutions described later. Representative participant insights reflecting these concerns are detailed in Table 3.

Table 3 Participant quotations on factors: research methodology.

Biases

Participants pointed to several forms of bias that distort the scientific literature: publication bias (favoring positive results), sample selection bias (non-representative sampling), and reporting bias (selective reporting or exaggeration of findings). Relevant participant quotations are presented in Table 4.

Table 4 Participant quotations on factors: biases.

Contextual factors

The pressures of the academic system were identified as external drivers, including intense pressure to publish articles (quantity over quality) and the potential for conflicts of interest from research funding. Participant comments highlighting these contextual pressures are shown in Table 5.

Table 5 Participant quotations on factors: contextual factors.

Theme: consequences of the reproducibility crisis

This theme details the negative impacts of non-reproducible research on educational practice and the public’s perception of science.

Influence on educational decisions

Unreliable evidence compromises decision-making concerning curriculum design, the choice of teaching methods, and learning assessment, ultimately reducing the quality of education. Table 6 provides participant perspectives on these consequences for educational decisions.

Table 6 Participant quotations on consequences: educational decisions.

Influence on the progress of science

The crisis creates systemic problems that impede scientific advancement by slowing down the progress of science (wasted time and resources) and causing decreased trust in scientific evidence among the public. The related participant quotations are found in Table 7.

Table 7 Participant quotations on consequences: progress of science.

Theme: solutions to deal with the reproducibility crisis

Participants proposed concrete actions grouped into three major categories that directly address the factors and consequences identified: improving research methodology, changing research culture, and strengthening research supervision.

Improving research methodology

These solutions directly address methodological weaknesses through teaching research methodology, encouraging the use of appropriate statistical methods (to counter analysis errors), and placing a strong emphasis on transparency in reporting. Participant suggestions for improving methodology are presented in Table 8.

Table 8 Participant quotations on solutions: improving research methodology.

Changing research culture

Addressing the contextual and bias-related factors requires fundamental changes such as encouraging the publication of negative results (to counter publication bias), reducing the pressure to publish, and increasing international cooperation. These proposed cultural changes are supported by the quotations in Table 9.

Table 9 Participant quotations on solutions: changing research culture.

Strengthening research supervision

Solutions focusing on oversight include improving the peer review process and creating databases for pre-registration of research (to enhance transparency and counter reporting bias). Table 10 contains the relevant participant quotations.

Table 10 Participant quotations on solutions: strengthening research supervision.

The primary findings, synthesizing the key factors, consequences, and solutions identified in the qualitative analysis, are summarized visually in Fig. 3.

Fig. 3

Reproducibility crisis in medical education research.

Discussion

This study highlights several dimensions of the reproducibility crisis in medical education research. Research reproducibility and validity are explained through three main themes and eight categories.

First theme: factors affecting the reproducibility crisis

This study identified three categories of factors: research methodology, biases, and contextual factors.

The first category, research methodology, is a crucial factor in ensuring reproducibility, confirming the findings of Hildebrandt and Prenoveau23 and Klein24. Our findings highlight that achieving reproducibility depends on three specific sub-categories: choosing an appropriate sample size, an accurate study design (a well-designed study), and the use of reliable measurement tools (well-defined measurements). Furthermore, participants implicitly indicated that flaws in statistical and data analysis methods represent methodological failures. This issue is exacerbated by over-reliance on single metrics such as the p-value and the resultant misuse of statistical methods, which the wider literature identifies as a primary driver of non-reproducibility25,26.
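To make these two points concrete, the Python sketch below illustrates an a priori power analysis to justify sample size, and the reporting of an effect size with a confidence interval alongside the p-value rather than the p-value alone. It is a minimal sketch under assumed values (Cohen's d = 0.5, simulated exam scores); it is not an analysis from this study or from the cited literature.

```python
# A minimal illustrative sketch (not from this study): a priori power analysis
# and effect-size reporting. Effect size, alpha, power, and the simulated
# scores are assumed values chosen for illustration.
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

# (1) A priori sample size for a two-group comparison (assumed d = 0.5).
n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"Required participants per group: {np.ceil(n_per_group):.0f}")

# (2) Report the effect size and its confidence interval alongside the p-value.
rng = np.random.default_rng(42)
group_a = rng.normal(70, 10, 64)  # hypothetical scores, new teaching method
group_b = rng.normal(65, 10, 64)  # hypothetical scores, standard method

t_stat, p_value = stats.ttest_ind(group_a, group_b)
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = (group_a.mean() - group_b.mean()) / pooled_sd

diff = group_a.mean() - group_b.mean()
se = pooled_sd * np.sqrt(1 / len(group_a) + 1 / len(group_b))
ci_low, ci_high = stats.t.interval(0.95, df=len(group_a) + len(group_b) - 2,
                                   loc=diff, scale=se)
print(f"p = {p_value:.3f}, Cohen's d = {cohens_d:.2f}, "
      f"95% CI for the mean difference = ({ci_low:.1f}, {ci_high:.1f})")
```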

Likewise, Wichman et al.27 observe that clinical and translational research (CTR) must uphold rigor and reproducibility, arguing that studies ought to be properly designed and their data adequately collected and analyzed (addressing the statistical element). Tailored approaches are required to overcome the challenges associated with each phase of CTR. A rigorous scientific approach, transparency, and interdisciplinary collaboration can contribute significantly to the advancement of clinical research27.

The second category, biases, seriously compromises the quality of research findings. Our study strongly emphasizes three forms of bias. First, publication bias (the tendency to prefer statistically significant findings) skews actual effect sizes and overestimates findings, a discrepancy also reported by Johnson et al.28 in psychology research, where only 36% of replications were significant compared with 97% of the original studies. Second, sample selection bias (non-representative inclusion), which is compounded by the personalization of online search engines, threatens the completeness and validity of systematic reviews, as noted by Ćurković and Košec29. Third, reporting bias (selective reporting or exaggeration) is a critical issue that compromises scientific integrity. The credibility of meta-analyses is similarly challenged by publication bias, subjective inclusion criteria, and variability in interpretation, as demonstrated by Lakens, Hilgard, and Staaks30. Blanco-Pérez et al.31 and Mohyuddin et al.32 agree that publication and selection bias are significant threats. To overcome these challenges, researchers must report results transparently, adhere to standardized reporting guidelines, share data, and preregister their analysis plans, which, as Simonsohn33 suggests, increases confidence and reduces bias.

The identified biases, including publication bias, sample selection bias, and reporting bias, must be analyzed within the specific institutional and systemic pressures of the Iranian academic environment. The pervasive “publish or perish” culture, driven by strict institutional promotion and ranking requirements, significantly contributes to the perpetuation of these biases. This pressure often forces researchers toward selective reporting (reporting bias) and the pursuit of statistically significant, novel, or positive findings that are more likely to be accepted by local or international journals (publication bias). Furthermore, constraints on funding and access to diverse populations can lead to non-probabilistic or convenience sampling methods, exacerbating sample selection bias. Addressing the reproducibility crisis thus requires not only methodological training but also a fundamental re-evaluation of institutional policies that prioritize quantity over research quality and rigor34.

The third category, contextual factors, points to systemic pressures. Our findings highlight two key drivers. Scientists may be pressured to publish research quickly, which can compromise its quality; Kearney et al.35 confirm that researchers under time pressure may neglect data accuracy. Additionally, our study explicitly revealed that the influence of research funding on research direction and the potential for conflicts of interest are significant contextual factors that introduce bias. To maintain high standards, it is essential to foster a research environment that prioritizes quality over quantity.

Second theme: consequences of the reproducibility crisis

The consequences of the reproducibility crisis are two-fold, affecting both educational decisions and scientific progress.

The first consequence is the influence on educational decisions. Non-reproducible research may result in poorly conceived curricula (flawed curriculum design) and poorly chosen teaching methods. Baker36 affirms that evidence that can be replicated and verified is essential for making informed educational decisions. Abid et al.37 and Ellaway38 concur that transparency and rigorous research are essential for high-quality health professions education. Our study further identified that reliance on unreliable research leads to the use of flawed tools for learning assessment, thereby diminishing the validity of the entire educational system. This issue is compounded by the fact that even basic steps such as literature search strategies are often non-reproducible, lacking essential details such as Boolean operators or search dates, as Maggio et al.39 demonstrated.

The second consequence, influence on the progress of science, primarily involves two impacts. First, non-reproducible research directly slows the progress of science through wasted resources and time. Second, and most critically, lack of reproducibility leads to decreased trust in scientific evidence among the public. Freese et al.40, Wingen et al.25, Mede et al.41, and Nosek et al.26 all confirm that this loss of confidence hinders scientific advancement and destroys scientific credibility. Auspurg et al.42 likewise state that lack of reproducibility destroys confidence in scientific investigations. Researchers must therefore improve the quality and transparency of their research to rebuild public trust.

Third theme: solutions to deal with the reproducibility crisis

Our results indicate that the most important strategies fall under improving research methodology, changing research culture, and strengthening research supervision.

The first solution category, improving research methodology, is supported by previous studies and includes developing comprehensive programs for teaching research methodology and promoting the use of appropriate statistical methods to address data analysis flaws. This solution directly addresses the implicit methodological failure noted by participants, specifically the over-reliance on simplistic metrics, and requires a shift toward rigorous statistical approaches that enhance robustness and transparency25,26. Furthermore, emphasis on transparency in the reporting of results, including detailed descriptions of methods and data, is necessary. As Nosek et al.26 note, transparency adds to the credibility of findings and fosters greater public trust. Providing more detailed descriptions of research methods, releasing code and data, and preregistering studies will enhance replicability and result in better-informed clinical practice.

The second solution category, changing research culture, is essential to address systemic pressures. Key strategies include encouraging the publication of negative results (to counter publication bias) and implementing policies that reduce the pressure to publish. Hail et al.43, Baker36, and Kedron et al.44 emphasize the importance of prioritizing quality over quantity in the research environment. Crucially, the lack of a supportive environment necessitates institutionalizing robust mentorship programs to guide junior faculty through rigorous research protocols45,46,47. Our study further suggests that increasing international cooperation is a viable strategy for quality enhancement and knowledge exchange.

The final proposed solution category, strengthening research supervision, focuses on enhancing oversight mechanisms to ensure research reliability. Participants specifically recommended improving the rigor of the peer review process and, more fundamentally, the mandatory creation of databases for pre-registration33. This practice is strongly promoted as a critical means to increase transparency, reduce selective reporting bias (as highlighted by Simonsohn33), and curb questionable research practices. Furthermore, a core component of enhanced supervision is demanding transparent and accurate reporting of methods and results. This entails the mandatory adoption of guidelines promoted by the EQUATOR Network (e.g., CONSORT, STROBE), as enhancing reporting standards is a fundamental and necessary step toward achieving replicability48,49.

To make these proposed solutions immediately actionable, it is essential to draw upon successful models implemented globally. For instance, the recommendation to improve statistical methodologies can be concretely realized by adopting practices promoted by the Center for Open Science (COS) and initiatives such as the TOP (Transparency and Openness Promotion) Guidelines, which encourage researchers to preregister study protocols and analytic plans50. The adoption of preregistration in fields such as psychology and economics has demonstrably reduced questionable research practices and enhanced the credibility of findings. Furthermore, promoting research transparency can be significantly strengthened by integrating the FAIR (Findable, Accessible, Interoperable, Reusable)51 data principles into institutional research policies. Exemplar models, such as the widespread adoption of standardized reporting guidelines like CONSORT (for trials) and STARD (for diagnostic accuracy)52, have markedly improved the completeness and clarity of published reports across many international journals. By integrating these tangible, proven examples into training and institutional oversight, Iranian medical universities can effectively bridge the gap between aspirational solutions and practical, impactful reforms.
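As a concrete illustration of the FAIR-aligned transparency described above, the sketch below writes a minimal machine-readable metadata record for a hypothetical data deposit. All field names, identifiers, and values are illustrative placeholders, not a prescribed schema; real deposits would follow the metadata standard of the target repository.

```python
# A minimal, hypothetical sketch of FAIR-aligned study metadata. Field names
# and values are illustrative placeholders, not a real repository schema.
import json

record = {
    "title": "De-identified interview codebook, reproducibility study",
    "identifier": "doi:10.xxxx/placeholder",  # findable: persistent ID (placeholder)
    "creators": ["Medical Education Research Team"],
    "license": "CC-BY-4.0",                   # accessible and reusable
    "format": "text/csv",                     # interoperable, non-proprietary
    "keywords": ["reproducibility", "medical education", "qualitative research"],
    "related": {"preregistration": "https://osf.io/placeholder"},  # placeholder
}

# Writing the record alongside the data keeps it machine-readable and citable.
with open("study_metadata.json", "w", encoding="utf-8") as f:
    json.dump(record, f, indent=2, ensure_ascii=False)
```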

A summary of the identified challenges and their corresponding actionable solutions, along with international examples, is provided in Table 11.

Table 11 Summary of identified reproducibility challenges, corresponding actionable solutions, and international best practice examples.

The reproducibility challenges identified in this study are inevitably shaped by the institutional characteristics of Iranian medical education. Research integrity and practice are influenced by a system marked by centralized governance, hierarchical academic structures, and performance metrics that strongly prioritize publication volume. These systemic factors dictate how research quality and reproducibility are understood and enacted. Critically, however, the core concerns identified here—such as methodological flaws, systemic biases, and the prevailing pressure on researchers—align closely with the global discourse on the reproducibility crisis43.

A comparative analysis with international studies from regions like North America and Europe reveals a shared necessity for targeted training, robust institutional policies, and community-led initiatives to reinforce reproducible research practices in the health sciences4. This convergence suggests that while the determinants of irreproducibility are universally acknowledged, their specific manifestation is filtered through local cultural, ethical, and organizational contexts. Therefore, effective intervention requires context-sensitive strategies. These strategies must adapt international best practices (e.g., transparent data sharing and preregistration) to the realities of Iranian medical universities, particularly by reforming incentive structures to reward quality, transparency, and replication rather than prioritizing volume alone.

Conclusion

This qualitative study provides a comprehensive, multi-dimensional understanding of the reproducibility crisis within medical education research. The findings, based on conventional content analysis, clearly delineate the systemic threats and necessary remedies to address this phenomenon. The reproducibility crisis is driven by three main factors: methodological flaws (e.g., inadequate sample size and design and unreliable measurement tools); various biases (specifically publication bias, selection bias, and reporting bias); and powerful contextual factors (e.g., pressure to publish and conflicts of interest from funding). The consequences of these factors are serious, leading to a reduction in the quality of educational decisions (flawed curricula and assessment) and a fundamental loss of public trust in scientific evidence. Crucially, the study identifies robust solutions across three strategic domains: improving research methodology (e.g., training and appropriate statistical methods), changing academic culture (e.g., encouraging negative results and reducing publication pressure), and strengthening research supervision (e.g., pre-registration and improved peer review). Successfully implementing these comprehensive strategies is paramount to restoring scientific rigor and credibility in the field of medical education.

Limitations

This study presents several limitations inherent in its design and execution that should be carefully considered when interpreting the findings. As a qualitative inquiry conducted through interviews within a highly specific geographic and institutional context, the results are intrinsically context-dependent and are not intended for statistical generalization across the global medical education community. Nevertheless, we employed a maximum variation sampling strategy and provided a rich, contextualized description of the setting to enhance the transferability of the findings, allowing readers to evaluate their relevance to similar contexts in medical education.

Additionally, due to the sensitive nature of the topic (research integrity and non-reproducible practices), there is an unavoidable potential for social desirability bias. Participants may have consciously or subconsciously aligned their responses with professional and academic norms, potentially understating the true extent of challenges related to research rigor. Furthermore, this study was primarily designed to identify the nature and existence of the relevant factors and consequences; it did not quantitatively measure their prevalence in the population or the strength of the relationships between the identified variables.

Research suggestions

Future research must primarily focus on quantitative expansion and empirical evaluation. Initially, studies should transition to large-scale quantitative methods (e.g., international surveys) to accurately measure the global prevalence of the identified factors and consequences. Concurrently, comparative studies are needed to distinguish between universal challenges and those that are context-specific to different academic systems.

Crucially, intervention studies with longitudinal designs are required to empirically evaluate the effectiveness of proposed solutions, such as mandatory advanced training and the institutional adoption of pre-registration databases. Finally, given the digital landscape, research must explore the impact of digital tools and search biases on reporting bias and selection bias in evidence synthesis.