Abstract
Observational studies in kidney transplantation often face confounding bias due to the absence of randomization, which can compromise validity and limit generalizability. Propensity score matching (PSM) helps mitigate this bias by mimicking random assignment. This guide outlines the implementation of PSM in kidney transplant research, focusing on methodology, practical considerations, common pitfalls, and reporting standards. We summarize key steps, including selecting covariates related to both treatment assignment and the outcome, estimating propensity scores, applying appropriate matching techniques, assessing balance, and conducting sensitivity analyses to test robustness. Practical considerations include ensuring sufficient overlap in propensity scores and balancing sample size with matching quality. Common challenges involve omitting relevant covariates, inadequate overlap, suboptimal matching, and loss of statistical power due to reduced sample size. By adhering to rigorous methodological practices and transparent reporting, researchers can improve the credibility and impact of their findings. When carefully implemented, PSM can substantially reduce bias from measured confounders, strengthen causal inference, and ultimately support better decision-making in kidney transplantation research. The worked example is illustrative only and does not replace a formal time-to-event analysis accounting for competing risks.
Introduction
Observational studies are inherently vulnerable to various types of bias due to the absence of randomization. Therefore, effectively controlling bias in these studies is essential to enhance the validity of the results, ensure generalizability, and inform decision-making. It is crucial to follow established guidelines, such as the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines, to ensure high-quality reporting in observational studies1.
Mitigating bias presents a significant challenge for researchers, and several strategies have been proposed to address confounding variables, including the use of matching estimators2. Propensity score matching (PSM), a statistical method for reducing confounding bias in observational studies, was first introduced in 1983 by Rosenbaum and Rubin3 and has since gained wide acceptance. When properly implemented, it improves covariate balance between treatment groups and approximates the conditions of a randomized controlled trial, thereby strengthening causal inference4. The strategy combines matching with the propensity score to approximate a quasi-randomized environment5. Beyond reducing confounding, propensity score methods can also be used to define or approximate specific target populations, allowing researchers to emulate the conditions of different randomized controlled trials within observational data6. The methodology is now widely applied in kidney transplantation studies, where researchers must account for multiple patient covariates7.
This method offers a robust analytical framework that reduces confounding by measured covariates, ultimately leading to stronger evidence-based practice8,9,10. PSM is particularly advantageous in scenarios where randomization is impractical or impossible, making it a powerful tool for observational studies in kidney transplantation11. The purpose of this review is to explain PSM, explore its practical applications in the field of kidney transplantation, and provide a practical example.
Applications of PSM in kidney transplant research
In kidney transplant studies, common confounding variables that can affect outcomes include age, sex, dialysis duration, and comorbidities12,13,14,15. PSM can be particularly useful in several scenarios within this field. For example, it enables the comparison of outcomes from different therapies in transplanted patients, providing a robust way to control for confounding variables and derive more accurate conclusions16. Moreover, it is valuable for evaluating the impact of pretransplant conditions on posttransplant outcomes, helping to identify which preexisting factors may influence patient recovery and survival17,18. Additionally, PSM is used to identify factors associated with different outcomes in transplanted patients, facilitating an understanding of variables that can affect transplant effectiveness and posttransplant quality of life7,19,20,21. These applications underscore the versatility of PSM in kidney transplant research, allowing researchers to explore a wide range of questions related to treatment efficacy and patient outcomes11.
Methodology: implementing PSM
Covariate selection
The starting point in any PSM approach is selecting the right covariates. These should include variables that are related both to the likelihood of receiving the treatment and to the outcome, since failing to control for such confounders can bias the results22,23. To strengthen this step, it is advisable to draw on existing literature and expert knowledge, which can help ensure that all relevant factors are taken into account11.
However, not all variables should be included. It’s crucial to distinguish between confounders, which must be adjusted for, and mediators, which lie along the causal pathway and may distort the estimated treatment effect if included. To navigate these decisions systematically, researchers can use tools such as Directed Acyclic Graphs (DAGs), which offer a clear visual representation of hypothesized relationships and help guide covariate selection in a defensible way24.
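As a minimal sketch of how a DAG can guide covariate selection, the Python snippet below encodes a small hypothetical graph and labels each variable relative to the exposure and the outcome. The edges, the node `comorbidity_burden`, and the labeling rule are illustrative assumptions, not the causal structure used in this article. The rule is also a simplification: it does not implement the full back-door criterion and does not detect colliders.

```python
from collections import deque

# Hypothetical DAG (parent -> list of children); edges are illustrative assumptions.
DAG = {
    "age": ["diabetes", "transplant"],
    "subregion": ["diabetes", "transplant"],
    "blood_type": ["transplant"],
    "diabetes": ["comorbidity_burden", "transplant"],
    "comorbidity_burden": ["transplant"],
    "transplant": [],
}


def descendants(graph, node):
    """Return every node reachable from `node` along directed edges."""
    seen, queue = set(), deque(graph.get(node, []))
    while queue:
        child = queue.popleft()
        if child not in seen:
            seen.add(child)
            queue.extend(graph.get(child, []))
    return seen


def classify(graph, treatment, outcome):
    """Label variables with a simplified rule (not a full back-door check)."""
    treatment_desc = descendants(graph, treatment)
    labels = {}
    for node in graph:
        if node in (treatment, outcome):
            continue
        reach = descendants(graph, node)
        if node in treatment_desc and outcome in reach:
            labels[node] = "mediator"            # lies on treatment -> ... -> outcome
        elif treatment in reach and outcome in reach:
            labels[node] = "confounder"          # common cause: adjust for it
        elif outcome in reach:
            labels[node] = "outcome predictor"   # affects the outcome only
        else:
            labels[node] = "other"
    return labels


print(classify(DAG, "diabetes", "transplant"))
```

In this toy graph, age and subregion are labeled confounders, blood type an outcome-only predictor, and the hypothetical comorbidity node a mediator that should be kept out of the propensity score model.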
Estimating propensity scores
Once the covariates are defined, the next step is to estimate each individual’s probability of receiving the treatment. This is commonly done using logistic regression, which models the likelihood of treatment assignment based on the selected covariates. While logistic regression remains the standard, alternatives such as probit models or machine learning techniques (e.g., random forests or gradient boosting) may be better suited to complex scenarios or multi-arm settings. In any case, including interaction terms or nonlinear functions often helps improve model fit and enhances balance across groups. Notably, since the goal is not prediction but covariate balancing, overfitting is not considered a major issue here23.
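To make the estimation step concrete, the following sketch fits a logistic propensity score model by Newton-Raphson in plain Python (standard library only). It is not the R code used in this article. The toy data, with a single hypothetical older-age indicator as the only covariate, are an assumption chosen so the fit can be checked by hand: with one binary covariate, the intercept equals the log-odds of treatment in the reference group and the slope equals the log odds ratio.

```python
import math


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, iters=25):
    """Newton-Raphson for logistic regression; each row of X starts with 1 (intercept)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k                       # X'(y - p)
        info = [[0.0] * k for _ in range(k)]   # X'WX with W = p(1 - p)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for l in range(k):
                    info[j][l] += p * (1.0 - p) * xi[j] * xi[l]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def propensity_scores(X, beta):
    return [sigmoid(sum(b * v for b, v in zip(beta, xi))) for xi in X]


# Hypothetical toy data: intercept + older-age indicator; treatment = diabetes.
X = [[1.0, 0.0]] * 10 + [[1.0, 1.0]] * 10
diabetes = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5
beta = fit_logistic(X, diabetes)
ps = propensity_scores(X, beta)
print([round(b, 3) for b in beta], round(ps[0], 3), round(ps[-1], 3))
```

With 2 of 10 younger and 5 of 10 older patients treated, the fitted propensity scores reproduce the observed proportions (0.2 and 0.5). Interaction or spline terms would simply be extra columns in `X`.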
More recently, approaches like Covariate Balancing Propensity Scores (CBPS) have gained traction. Unlike traditional methods that separate prediction from balance, CBPS directly integrates both objectives, simultaneously optimizing model fit and covariate balance. This makes it particularly useful when there is concern about model misspecification or limited sample size25.
In line with these balance-focused approaches, recent work by Li et al. has demonstrated that enforcing balance directly through estimation methods, such as CBPS and entropy balancing, can lead to more accurate and less biased treatment effect estimates, particularly when traditional models are misspecified. Their simulation studies showed improvements in bias, variance, and mean squared error, supporting the utility of balance-focused methods26. Similarly, Huan et al. proposed a flexible weighting strategy that chooses between global and local scores based on balance quality; this method proved effective for multi-site survival analyses and performed comparably to pooling individual-level data27.
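Entropy balancing, one of the balance-focused methods mentioned above, can be illustrated for a single covariate. The plain-Python sketch below assumes the covariate is on a moderate scale (for example binary or standardized) and finds control weights of the form w_i ∝ exp(λ·x_i) whose weighted mean equals the treated-group mean, solving for λ by bisection. Practical implementations balance many covariates (and often higher moments) jointly by solving a convex dual problem; this one-dimensional version only shows the idea and is not CBPS.

```python
import math


def entropy_balance_1d(x_control, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Control weights w_i proportional to exp(lam * x_i) with weighted mean == target_mean.

    This exponential-tilting form is the minimum-KL-divergence solution for a single
    mean constraint; lam is found by bisection because the weighted mean increases in lam.
    """
    if not min(x_control) < target_mean < max(x_control):
        raise ValueError("target mean must lie strictly inside the range of x_control")

    def weights(lam):
        s = [lam * v for v in x_control]
        top = max(s)                              # subtract the max to avoid overflow
        raw = [math.exp(v - top) for v in s]
        total = sum(raw)
        return [r / total for r in raw]

    def weighted_mean(lam):
        return sum(w * v for w, v in zip(weights(lam), x_control))

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if weighted_mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return weights((lo + hi) / 2)


# Hypothetical example: share of males is 0.25 among controls, 0.5 among treated.
male_control = [0, 0, 0, 1]
w = entropy_balance_1d(male_control, target_mean=0.5)
print([round(v, 4) for v in w])
```

Here the solution is λ = ln 3, which gives each control with value 0 a weight of 1/6 and the single control with value 1 a weight of 1/2.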
Matching
After estimating propensity scores, researchers must choose how to match treated and untreated individuals. The decision should be informed by study goals, sample size, and the distribution of the propensity scores. In the practical exercise, we used 1:1 nearest neighbor matching, which is straightforward and maintains interpretability. Nonetheless, other strategies may be more efficient. For instance, one-to-many (1:K) matching allows each treated subject to be matched with multiple controls, improving precision and reducing standard errors. Evidence from simulations and applied studies suggests that variable-ratio matching often results in lower mean squared error with only a small trade-off in bias28.
Each method brings its own strengths and limitations. Nearest neighbor matching is easy to apply but may yield poor matches in the absence of strong overlap. Caliper matching mitigates this by setting a maximum acceptable difference in propensity scores between matched individuals, though it may reduce the number of matches. Meanwhile, optimal matching seeks to minimize the overall distance across all matched pairs but can be more computationally intensive. In all cases, it is critical to evaluate post-matching balance and justify the selected strategy in light of the dataset’s characteristics11,29.
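Greedy 1:1 nearest neighbor matching with a caliper can be sketched as follows. This is a plain-Python illustration, not MatchIt's implementation: it matches on the logit of the propensity score, sets the caliper to a multiple of the pooled standard deviation of that logit (a common convention, assumed here), processes treated units from the highest to the lowest score, and matches without replacement. The toy scores are hypothetical.

```python
import math
import statistics


def logit(p):
    return math.log(p / (1.0 - p))


def nn_match(ps, treated, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching on logit(ps), without replacement.

    The caliper is caliper_sd times the pooled SD of logit(ps), i.e.
    sqrt((var_treated + var_control) / 2). Returns (treated_idx, control_idx) pairs.
    """
    lps = [logit(p) for p in ps]
    lt = [v for v, t in zip(lps, treated) if t]
    lc = [v for v, t in zip(lps, treated) if not t]
    width = caliper_sd * math.sqrt((statistics.variance(lt) + statistics.variance(lc)) / 2)
    controls = [i for i, t in enumerate(treated) if not t]
    pairs = []
    # Greedy order: treated units from highest to lowest score (one common choice).
    for i in sorted((i for i, t in enumerate(treated) if t), key=lambda k: -lps[k]):
        if not controls:
            break
        j = min(controls, key=lambda c: abs(lps[c] - lps[i]))
        if abs(lps[j] - lps[i]) <= width:      # accept only within the caliper
            pairs.append((i, j))
            controls.remove(j)                 # each control is used at most once
    return pairs


# Hypothetical propensity scores and treatment flags.
ps = [0.60, 0.30, 0.58, 0.31, 0.90, 0.10]
treated = [1, 1, 0, 0, 1, 0]
print(nn_match(ps, treated))
```

The treated unit with a score of 0.90 has no control within the caliper and is left unmatched, which is the loss of sample size that a caliper trades for closer matches.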
Assessing balance
Once matching is completed, it is essential to assess whether the groups are now comparable in terms of baseline covariates. The most widely recommended metric is the standardized mean difference (SMD), which is not affected by sample size and provides a directional indication of imbalance. An SMD below 0.1 in absolute value is generally considered acceptable30. Complementary to numerical metrics, visual diagnostics—such as histograms or box plots—can help detect residual imbalances and verify that the matching process worked as intended22. This step is fundamental for establishing the internal validity of the treatment effect estimate11.
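The SMD itself is simple to compute. The sketch below uses the common pooled-SD definition (difference in means divided by the square root of the average of the two group variances), with the Bernoulli variance p(1 - p) for binary covariates. Software such as cobalt offers options that change this (for example, which variance enters the denominator, or reporting raw differences for binary variables), so values may not match its output exactly; the toy numbers are hypothetical.

```python
import math
import statistics


def smd(x_treated, x_control, binary=False):
    """Standardized mean difference with a pooled SD: sqrt((var_t + var_c) / 2)."""
    m_t, m_c = statistics.fmean(x_treated), statistics.fmean(x_control)
    if binary:
        v_t, v_c = m_t * (1 - m_t), m_c * (1 - m_c)      # Bernoulli variances
    else:
        v_t, v_c = statistics.variance(x_treated), statistics.variance(x_control)
    pooled = math.sqrt((v_t + v_c) / 2)
    if pooled == 0:
        return 0.0 if m_t == m_c else math.copysign(math.inf, m_t - m_c)
    return (m_t - m_c) / pooled


# Hypothetical covariate values before matching.
age_smd = smd([60, 62, 64], [50, 52, 54])
male_smd = smd([1, 1, 1, 0], [1, 0, 0, 0], binary=True)
print(round(age_smd, 3), round(male_smd, 3), abs(male_smd) < 0.1)
```

Because the SMD is a ratio of a mean difference to a spread, it does not grow with sample size the way a test statistic does, which is why it is preferred over p-values for balance checks.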
Sensitivity analysis
Even when good balance is achieved, it is important to assess whether the results are robust to reasonable changes in the matching procedure. Sensitivity analyses may involve varying the caliper width (e.g., 0.05, 0.1, 0.15) or trying alternative matching methods such as optimal or full matching. By comparing the treatment effect estimates across different configurations, researchers can determine whether findings are consistent or dependent on specific modeling choices23.
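A caliper sensitivity analysis amounts to rerunning the matching and the effect estimate over a grid of caliper widths. The self-contained sketch below does this on simulated data in which the true logit propensity score is assumed known (skipping the estimation step) and the caliper is scaled by the full-sample SD of that logit for simplicity. It reports the number of matched pairs and the matched risk difference for each width; the simulation settings, the seed, and the use of a risk difference rather than a logistic model are illustrative choices.

```python
import math
import random
import statistics


def greedy_match(lps, treated, width):
    """Greedy 1:1 nearest-neighbour matching on logit scores within `width`, no replacement."""
    controls = [i for i, t in enumerate(treated) if not t]
    pairs = []
    for i in (k for k, t in enumerate(treated) if t):
        if not controls:
            break
        j = min(controls, key=lambda c: abs(lps[c] - lps[i]))
        if abs(lps[j] - lps[i]) <= width:
            pairs.append((i, j))
            controls.remove(j)
    return pairs


# Simulated data (all settings hypothetical); the true logit propensity is assumed known.
rng = random.Random(2024)
n = 400
x = [rng.gauss(0, 1) for _ in range(n)]
lps = [-0.5 + 0.8 * v for v in x]
treated = [rng.random() < 1 / (1 + math.exp(-l)) for l in lps]
outcome = [rng.random() < 1 / (1 + math.exp(-(-0.3 + 0.6 * v + 0.4 * t)))
           for v, t in zip(x, treated)]

sd = statistics.stdev(lps)
results = {}
for caliper in (0.05, 0.10, 0.15, 0.20):
    pairs = greedy_match(lps, treated, caliper * sd)
    rd = sum(outcome[i] - outcome[j] for i, j in pairs) / len(pairs) if pairs else math.nan
    results[caliper] = (len(pairs), rd)
    print(f"caliper {caliper:.2f} SD: {len(pairs)} pairs, matched risk difference {rd:+.3f}")
```

If the estimates move little across widths while the number of pairs changes, the finding is unlikely to hinge on the particular caliper chosen.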
Estimating treatment effects
With covariates balanced and matching complete, the treatment effect can be estimated. This is typically done by regressing the outcome on the treatment variable in the matched sample. Because matching has already balanced the measured covariates, reintroducing them into the outcome model is not strictly necessary, although adjusting for any covariate with residual imbalance (a doubly robust approach) can reduce remaining bias. Comparing each treated subject only with a similar control keeps the treatment effect straightforward to interpret12, but standard errors should account for the matched structure of the data.
Practical considerations and common pitfalls
While PSM offers a robust framework for addressing confounding, several practical aspects must be considered to ensure its validity.
First, although selecting appropriate covariates has been addressed earlier, it is worth emphasizing that omitting key confounders or including irrelevant variables can still compromise results—either by introducing residual bias or by increasing variance, particularly in small samples11,29.
Second, adequate overlap in propensity scores between treatment groups is essential. A lack of common support can hinder valid comparisons and may require trimming individuals whose propensity scores fall outside the region of common support, potentially reducing statistical power22. Visualizing score distributions can help diagnose this issue early in the process.
Third, match quality should not be assumed. Post-matching balance diagnostics, such as SMD, remain essential to confirm that the procedure was successful and that the matched groups are truly comparable22,25.
Finally, it is important to recognize that matching inherently reduces the available sample size. In settings with limited data, this may threaten power and precision. When excessive data loss occurs, alternative approaches—such as inverse probability weighting or covariate adjustment—may offer a more efficient solution23.
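For the overlap issue raised in the second point, one simple and widely used rule keeps only units whose propensity scores fall between the larger of the two group minima and the smaller of the two group maxima. The plain-Python sketch below applies that min-max rule to hypothetical scores; other trimming rules (for example, fixed thresholds such as 0.1 and 0.9) exist and may be preferable in some settings.

```python
def common_support(ps, treated):
    """Min-max overlap region: [max of group minima, min of group maxima]."""
    ps_t = [p for p, t in zip(ps, treated) if t]
    ps_c = [p for p, t in zip(ps, treated) if not t]
    return max(min(ps_t), min(ps_c)), min(max(ps_t), max(ps_c))


def trim_to_support(ps, treated):
    """Indices of units whose propensity score lies inside the common support."""
    lo, hi = common_support(ps, treated)
    return [i for i, p in enumerate(ps) if lo <= p <= hi]


# Hypothetical scores: controls span 0.05-0.60, treated span 0.30-0.95.
ps = [0.05, 0.20, 0.40, 0.60, 0.30, 0.50, 0.70, 0.95]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
kept = trim_to_support(ps, treated)
print(common_support(ps, treated), kept, f"{len(ps) - len(kept)} units trimmed")
```

Half of this toy sample falls outside the overlap region, illustrating how quickly trimming can erode sample size and power when the groups differ strongly.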
Practical example of PSM in kidney transplant research
In this section, we conduct a practical exercise in which PSM is applied to a publicly available dataset focused on kidney transplants. In this example, diabetes status is treated as the exposure variable, and the propensity score is estimated as the probability of having diabetes given relevant covariates (such as age, dialysis time, sex, blood type, and subregion). Matching is performed to balance these covariates between patients with and without diabetes, enabling a comparison of transplant outcomes between these groups.
This example has been intentionally simplified into a binary outcome analysis (transplanted or not) to illustrate the steps involved in implementing PSM. It does not account for the probability of transplantation as a time-to-event process with competing risks (e.g., death or permanent waitlist removal), and continuous variables are dichotomized while certain confounders are excluded to enhance didactic clarity.
The objective is to perform a PSM analysis examining the association between a diabetes diagnosis and the likelihood of receiving a transplant among patients on a waitlist. This practical example is conducted in R (version 4.3.3)31 using RStudio. By following this example, we aim to demonstrate the step-by-step implementation of PSM, emphasizing the importance of meticulous covariate selection and methodological rigor in observational studies. The exercise offers a detailed, systematic walkthrough of PSM that readers can use as a starting template when applying the methodology to their own data.
Dataset description
The dataset used here is sourced from Kaggle, titled “Waitlist Kidney Brazil”32. It includes patient demographics, clinical factors, and treatment details such as age, time on dialysis, race, sex, underlying disease, diabetes status, blood type, subregion, and transplant outcomes.
Running the analysis directly in R
To execute the full PSM analysis directly, simply copy and paste the script shown in Fig. 1 into RStudio. This script automatically downloads the complete R Markdown file containing the analysis and the kidney transplant waitlist dataset from GitHub and runs the workflow in a fully reproducible way. Both the R Markdown file and dataset, while hosted on GitHub, are also permanently archived and accessible via Zenodo33. The output appears as an HTML report within your R session, without needing to save files manually. This HTML report is also included as Supplementary File 1.
Step-by-step implementation of the PSM analysis
Before starting the main analysis, an automatic script prepares all necessary elements to ensure reproducibility. This script installs and loads the required R packages (MatchIt, dplyr, readr, cobalt, ggplot2, and gtsummary) from a stable CRAN mirror, downloads the kidney transplant waitlist dataset from GitHub, standardizes column names, corrects character encoding, and recodes the outcome variable (Transplant_Y_N) as a binary indicator (1 = Yes, 0 = No). It also recategorizes key variables such as age and dialysis duration into clinically meaningful groups and removes incomplete cases, generating a clean dataset (data_filtered) ready for matching. Although dichotomizing continuous variables may introduce limitations, such as masking residual imbalance across the full range of values, this step was implemented strictly for didactic purposes to streamline the example and facilitate replication. It is not intended as a methodological recommendation. The full setup code is included in the R Markdown file, although it is not displayed in the HTML output for clarity.
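A rough Python analogue of part of this cleaning step is sketched below, covering only column-name standardization, complete-case filtering, and recoding of Transplant_Y_N. The inline CSV, the other column names, and the assumption that the raw outcome is stored as "Yes"/"No" are hypothetical; the actual names and recoding rules are those in the R Markdown file. The age and dialysis categorization is omitted because its cut points are not given here.

```python
import csv
import io

# Hypothetical raw extract: only Transplant_Y_N comes from the article; the other
# column names and the "Yes"/"No" coding are assumptions for illustration.
RAW = """ Age ,Diabetes,Sex,Transplant_Y_N
54,Yes,M,Yes
61,No,F,No
,No,M,Yes
47,Yes,F,No
"""

OUTCOME_CODES = {"yes": 1, "no": 0}


def prepare(text, required=("Age", "Diabetes", "Sex", "Transplant_Y_N")):
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # Standardize column names and values by trimming whitespace.
        record = {k.strip(): (v or "").strip() for k, v in record.items()}
        if any(record.get(col, "") == "" for col in required):
            continue                                    # drop incomplete cases
        code = record["Transplant_Y_N"].lower()
        if code not in OUTCOME_CODES:
            raise ValueError(f"unexpected outcome value: {record['Transplant_Y_N']!r}")
        record["Transplant_Y_N"] = OUTCOME_CODES[code]  # 1 = Yes, 0 = No
        rows.append(record)
    return rows


data_filtered = prepare(RAW)
print(len(data_filtered), [r["Transplant_Y_N"] for r in data_filtered])
```

Raising an error on unexpected outcome codes, rather than silently mapping them to 0, makes encoding problems visible before matching.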
Step 1: sample size before matching
Initially, the analytic dataset size is assessed using the nrow() function to count complete cases. The baseline sample includes 46,817 observations, providing a reference point for evaluating the impact of subsequent matching on sample size and data retention. In addition, Table 1 summarizes baseline characteristics overall and by diabetes status.
Step 2: unadjusted association between exposure and outcome
Prior to matching, a logistic regression (glm) estimates the crude association between the exposure (diabetes) and the outcome (receiving a transplant). In this unadjusted model, the exposure coefficient is the log odds ratio of transplantation comparing patients with and without diabetes, providing an initial baseline. The result (log odds ratio = 0.41, corresponding to an odds ratio of about 1.51; p < 0.001) indicates a statistically significant, but potentially confounded, relationship between diabetes and transplant probability.
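Because the crude model contains a single binary exposure, its coefficient can be checked by hand: the maximum-likelihood estimate equals the log odds ratio from the 2×2 table of exposure by outcome, and its model-based standard error is the square root of the summed reciprocal cell counts. The Python sketch below shows this with hypothetical counts (not the study data) and converts the reported crude log odds ratio of 0.41 to an odds ratio.

```python
import math


def log_odds_ratio(a, b, c, d):
    """2x2 table: a/b = exposed with/without outcome, c/d = unexposed with/without.

    Returns the log odds ratio and its Wald standard error; for a logistic model
    with an intercept and one binary exposure, these equal the exposure coefficient
    and its model-based SE.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


# Hypothetical counts (not the study data).
beta, se = log_odds_ratio(a=30, b=70, c=20, d=80)
print(f"log OR = {beta:.3f}, OR = {math.exp(beta):.2f}, SE = {se:.3f}, z = {beta / se:.2f}")

# Converting the crude coefficient reported in Step 2 to an odds ratio.
print(f"OR for log OR 0.41: {math.exp(0.41):.2f}")
```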
Step 3: propensity score matching procedure
To minimize confounding and enhance the comparability between individuals with and without diabetes, we applied PSM using the matchit() function from the MatchIt package. The approach selected was nearest neighbor matching, supplemented with exact matching on key categorical variables (race, blood type, and subregion) so that pairs were only formed within identical strata. Additionally, a caliper width of 0.2 standard deviations was imposed, restricting matches to individuals with closely aligned propensity scores. This combination of constraints improves the quality and comparability of the matched pairs and reduces bias arising from the measured covariates. Inevitably, some cases remain unmatched under these strict criteria; this is an expected and acceptable trade-off, as it prioritizes matching quality over sample size. Because sex showed residual imbalance in the baseline match (see Steps 5–6), we also examined tighter calipers (0.05–0.15) and an exact-on-sex specification in sensitivity analyses (Steps 8–9).
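Exact matching combined with a caliper can be pictured as running nearest neighbor matching separately within each stratum of the exact-match variables. The plain-Python sketch below does this; the unit records, the stratum labels (standing in for combinations such as blood type and subregion), and the caliper expressed directly in logit units are hypothetical, and MatchIt's internal handling may differ.

```python
from collections import defaultdict


def exact_caliper_match(units, width):
    """Greedy 1:1 nearest-neighbour matching on logit PS, only within identical strata.

    units: dicts with 'id', 'treated' (bool), 'lps' (logit propensity score) and
    'stratum' (tuple of exact-match variables). Returns (treated_id, control_id) pairs.
    """
    strata = defaultdict(lambda: {"treated": [], "control": []})
    for u in units:
        strata[u["stratum"]]["treated" if u["treated"] else "control"].append(u)
    pairs = []
    for group in strata.values():
        controls = list(group["control"])
        for t in sorted(group["treated"], key=lambda u: -u["lps"]):
            if not controls:
                break
            c = min(controls, key=lambda u: abs(u["lps"] - t["lps"]))
            if abs(c["lps"] - t["lps"]) <= width:     # caliper check
                pairs.append((t["id"], c["id"]))
                controls.remove(c)                    # without replacement
    return pairs


# Hypothetical units; strata stand in for (blood type, subregion) combinations.
units = [
    {"id": 1, "treated": True, "lps": 0.50, "stratum": ("O", "S1")},
    {"id": 2, "treated": False, "lps": 0.48, "stratum": ("A", "S1")},
    {"id": 3, "treated": False, "lps": 0.30, "stratum": ("O", "S1")},
    {"id": 4, "treated": True, "lps": -0.20, "stratum": ("A", "S1")},
    {"id": 5, "treated": False, "lps": -0.25, "stratum": ("A", "S1")},
]
print(exact_caliper_match(units, width=0.25))
```

Unit 2 is the closest control to unit 1 overall, but it belongs to a different stratum, so unit 1 is paired with unit 3 instead.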
Step 4: sample size after matching
Following the matching process, we used the nrow() function to determine the number of observations retained. The final matched cohort consisted of 19,504 individuals, reflecting a considered trade-off between preserving statistical power and enhancing covariate balance through stringent matching criteria. Such a reduction in sample size is expected in rigorous propensity score analyses, where the primary focus is on ensuring high-quality matches to reduce potential bias.
Step 5: assessing covariate balance numerically
To determine whether the matching procedure effectively balanced the main covariates, we calculated the SMD for each variable before and after matching using the bal.tab() function from the cobalt package. The most relevant SMD results are summarized in Table 2. After matching, nearly all covariates had SMD values close to zero, reflecting a successful reduction of imbalance between the diabetes and non-diabetes groups. The exception was sex (the “Male” variable), whose post-match SMD of 0.156 remained above the 0.10 threshold; this residual imbalance motivated the sensitivity checks reported in Steps 8–9.
Step 6: visualizing covariate balance with a love plot
To complement the numeric summary, we generated a Love plot using the love.plot() function from the cobalt package, which visually represents the SMDs reported in Table 2 (Fig. 2). In this plot, each covariate appears on the y-axis and SMDs are shown on the x-axis. Red dots represent SMDs before matching, and blue dots indicate SMDs after matching. The vertical dashed line at 0.1 denotes the threshold for acceptable covariate balance. As shown in Fig. 2, after matching, the SMDs for all main covariates except sex fall below this threshold, visually supporting the success of the matching procedure. Consistent with Step 5, sex (“Male”) remains slightly above the 0.10 line in the baseline match.
Step 7: assessing propensity score overlap and distribution
To evaluate whether the PSM procedure achieved adequate overlap and comparability between groups, we examined the distribution of propensity scores using two complementary visual diagnostics generated with the plot() function from the MatchIt package. First, the jitter plot (Fig. 3) displays individual propensity scores among treated (diabetes) and control (non-diabetes) units before and after matching. After matching, most treated and control observations lie within a common range of propensity scores, supporting the validity of comparisons within the matched sample. Second, the histograms (Fig. 4) show the proportion of subjects in each propensity score interval for both groups, before and after matching. The close alignment of these distributions in the matched samples indicates that the two matched groups have similar propensity score distributions. Together with the SMDs, these visualizations indicate substantial overlap in propensity scores within the matched dataset, a prerequisite for credible estimation of treatment effects in subsequent analyses.
Step 8: sensitivity analysis by caliper width
To test the robustness of findings, a sensitivity analysis is conducted with different caliper widths (0.05, 0.10, 0.15) in the matching process. For each caliper, logistic regression estimates the diabetes effect on transplant likelihood. The log odds ratios are consistently positive and statistically significant (caliper 0.05: 0.17; caliper 0.10: 0.25; caliper 0.15: 0.27; all p < 0.001), indicating that the direction and significance of the association are stable despite variations in matching strictness, although its magnitude varies with caliper width. Importantly, sex balance improved to an absolute SMD < 0.10 at calipers 0.05 (−0.067) and 0.10 (−0.078), but not at 0.15 (−0.142).
Step 9: sensitivity analysis by matching method
To check robustness, three nearest-neighbor variants were run at a caliper of 0.10: without replacement, with replacement, and exact-on-sex. In every case, balance on sex improved to an absolute SMD < 0.10. Notably, the exact-on-sex variant achieved perfect balance on sex (SMD = 0.000) while keeping a matched sample size (N) similar to the no-replacement design, and it produced a diabetes coefficient (log odds ratio) of ≈ 0.216 (standard error [SE] ≈ 0.034; p < 1 × 10⁻⁹).
Step 10: adjusted association after matching
Finally, a logistic regression model (glm) was fitted to the matched dataset to estimate the adjusted association between diabetes and the probability of transplant. For inference, dependence within matched sets was addressed using cluster-robust (“sandwich”) standard errors (SEs) at the matched-set level, implemented with the sandwich and lmtest R packages. This is equivalent, up to small-sample corrections, to generalized estimating equations (GEE) with an independence working correlation, is closely related to GEE with an exchangeable working correlation, and is recommended for matched observational data34,35. Under the final specification (nearest-neighbor matching, caliper 0.10, exact-on-sex), the diabetes coefficient (log odds ratio) was ≈ 0.216 with a cluster-robust SE ≈ 0.033 (z ≈ 6.49; p ≈ 8.45 × 10⁻¹¹), which corresponds to an odds ratio (OR) of ~1.24. Compared with the unadjusted estimate (log odds ratio 0.41), this smaller but still significant association is consistent with PSM having removed part of the confounding attributable to the measured covariates. If any covariate had remained at an SMD ≥ 0.10 after matching, a doubly robust outcome model adjusting for that covariate would have been added, still using cluster-robust SEs.
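To show what the cluster-robust correction does, the sketch below fits the same kind of model (binary outcome on a binary exposure with an intercept) and computes a sandwich standard error with matched pairs as clusters, in plain Python. It is a simplified illustration on hypothetical pairs, not a reimplementation of `sandwich::vcovCL`: it applies only a G/(G - 1) small-sample factor, so results may differ slightly from the R output depending on that function's adjustment settings.

```python
import math


def logit_cluster_robust(y, t, cluster):
    """Logistic regression of binary y on binary t (with intercept) plus a
    cluster-robust sandwich SE for the t coefficient.

    The model is saturated, so the MLE has a closed form (logits of group proportions).
    Sandwich: V = B^-1 M B^-1 * G/(G-1), with B = X'WX and M the sum over clusters
    of outer products of the summed scores x_i (y_i - p_i).
    """
    n1 = sum(t)
    p1 = sum(yi for yi, ti in zip(y, t) if ti) / n1
    p0 = sum(yi for yi, ti in zip(y, t) if not ti) / (len(t) - n1)
    b1 = math.log(p1 / (1 - p1)) - math.log(p0 / (1 - p0))

    b00 = b01 = b11 = 0.0                      # entries of B for design rows (1, t)
    scores = {}
    for yi, ti, cid in zip(y, t, cluster):
        p = p1 if ti else p0
        w = p * (1 - p)
        b00 += w
        b01 += w * ti
        b11 += w * ti * ti
        s = scores.setdefault(cid, [0.0, 0.0])
        s[0] += yi - p
        s[1] += (yi - p) * ti
    det = b00 * b11 - b01 * b01
    inv = [[b11 / det, -b01 / det], [-b01 / det, b00 / det]]

    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():                  # one summed score vector per cluster
        for r in range(2):
            for c in range(2):
                meat[r][c] += s[r] * s[c]

    def matmul(left, right):
        return [[sum(left[i][k] * right[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    g = len(scores)
    v = matmul(matmul(inv, meat), inv)
    robust_se = math.sqrt(v[1][1] * g / (g - 1))
    model_se = math.sqrt(inv[1][1])
    return b1, robust_se, model_se


# Hypothetical matched pairs: (treated, control) outcomes in each pair.
y = [1, 0, 1, 1, 0, 0, 1, 0]
t = [1, 0] * 4
pair_id = [1, 1, 2, 2, 3, 3, 4, 4]
b1, robust_se, model_se = logit_cluster_robust(y, t, pair_id)
print(f"log OR {b1:.3f}, OR {math.exp(b1):.2f}, robust SE {robust_se:.3f}, model SE {model_se:.3f}")
```

The point estimate is unchanged by the correction; only the standard error, and hence z and p, changes to reflect correlation within matched pairs.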
Reporting guidelines for PSM studies
Essential elements to report
Covariates included and rationale
It is essential to list all covariates included in the propensity score model and provide a rationale for their inclusion. Covariates should be selected on the basis of theoretical considerations, prior empirical research, or both, ensuring that they are related to both the treatment and the outcome. This comprehensive selection process helps to adequately control for confounding factors36,37.
Matching algorithm and parameters
Describe the matching algorithm used, such as nearest neighbor, caliper matching, or Mahalanobis distance matching. Additionally, specify all parameter settings, such as the caliper width or matching ratio. For example, nearest neighbor matching might use a 1:1 or 1:2 matching ratio, and caliper matching might specify a caliper width of 0.1 standard deviations of the logit of the propensity score29,37,38.
Sample sizes before and after matching
Report the sample sizes of the treatment and control groups both before and after matching. This information is crucial for understanding the extent of data reduction due to matching and the potential impact on statistical power29,37.
Methods and results of balance assessment
Outline the methods used to assess covariate balance between the treatment and control groups after matching. These typically involve SMDs, variance ratios, or graphical methods such as histograms and jitter plots. Reporting the results of these assessments demonstrates how effectively the matching process achieved covariate balance29,36,37,38.
Sensitivity analysis
Conduct a sensitivity analysis to check the robustness of the matching results, for example by varying the caliper width and trying different matching methods. Report whether the estimated treatment effects remain consistent across these specifications, since this shows how stable the conclusions are to different matching parameters and methods.
Statistical analysis of treatment effects
Detail the statistical methods used to estimate treatment effects after matching. These might include regression adjustment, difference-in-differences analysis, or instrumental variable approaches. Present the results of these analyses, including treatment effect estimates, confidence intervals, and significance levels38.
Best practices for transparency and reproducibility
Detailed methodological description
Provide a comprehensive description of all steps in the PSM process, including data preprocessing, propensity score estimation, matching procedures, and balance assessment, so that other researchers can fully understand and replicate the methodology.
Code and data sharing
Share the code used for propensity score estimation and matching, preferably with annotations to explain each step. Where possible, make the dataset or a synthetic version available to allow others to replicate the analysis. This practice enhances transparency and facilitates validation by other researchers.
Documentation of assumptions
Clearly document all assumptions made during the analysis, such as the ignorability (no unmeasured confounding) assumption, and discuss their potential impact on the study’s conclusions. This transparency allows for a better understanding of the limitations and strengths of the study.
Thorough reporting of results
Include detailed tables and figures that show the balance of covariates before and after matching, as well as the estimated treatment effects. Provide supplementary materials if necessary to keep the main text concise. This thorough reporting ensures that the results are clear and interpretable.
Ethical considerations
Discuss any ethical considerations related to the data and analysis, including issues of consent, confidentiality, and potential biases introduced by the matching process. Addressing these issues is crucial for maintaining ethical standards and the integrity of research. It is also recommended to follow established guidelines, such as the STROBE guidelines, to further ensure clarity, transparency, and methodological rigor in reporting observational studies1.
By following these guidelines, researchers can enhance the transparency, reproducibility, and credibility of PSM studies. Ultimately, this contributes to the robustness of causal inference in observational research.
Complementary causal inference approaches
In addition to PSM, it is advisable to use complementary methods to address limitations inherent to matching alone. One valuable tool in this regard is the use of DAGs, which help make causal assumptions explicit and guide the selection of appropriate covariates for adjustment, reducing the risk of including mediators or colliders that could bias estimates39,40. Furthermore, inverse probability of treatment weighting (IPTW) is an alternative to matching that helps preserve sample size and hence statistical power; in particular, marginal structural models (MSMs), typically estimated with IPTW, are well suited when exposures and confounders vary over time, as is often the case in kidney transplant research41,42. For example, recent studies have demonstrated how IPTW can emulate a target trial comparing transplantation with long-term dialysis, providing more robust effect estimates than matching alone43.
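IPTW replaces matching with weights computed from the propensity score e. The plain-Python sketch below builds ATE weights (1/e for treated, 1/(1 - e) for controls) and ATT weights (1 for treated, e/(1 - e) for controls) and compares weighted outcome proportions. The small dataset is hypothetical and constructed so that the true risk difference is 0.25 in each of two strata, while the crude difference is 0.40 because treatment is concentrated in the higher-risk stratum.

```python
def iptw_weights(ps, treated, estimand="ATE"):
    """Inverse probability of treatment weights from propensity scores e."""
    if estimand == "ATE":
        return [1 / e if z else 1 / (1 - e) for e, z in zip(ps, treated)]
    if estimand == "ATT":
        return [1.0 if z else e / (1 - e) for e, z in zip(ps, treated)]
    raise ValueError("estimand must be 'ATE' or 'ATT'")


def weighted_risk_difference(y, treated, w):
    """Difference in weighted outcome proportions, treated minus control."""
    def wmean(group):
        num = sum(wi * yi for wi, yi, z in zip(w, y, treated) if z == group)
        den = sum(wi for wi, z in zip(w, treated) if z == group)
        return num / den
    return wmean(1) - wmean(0)


# Hypothetical data: two strata with propensity 0.2 and 0.8; true risk difference 0.25 in each.
ps = [0.2] * 10 + [0.8] * 10
treated = [1, 1] + [0] * 8 + [1] * 8 + [0] * 2
y = [1, 0] + [1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0] + [1, 0]

crude = weighted_risk_difference(y, treated, [1.0] * len(y))
ate = weighted_risk_difference(y, treated, iptw_weights(ps, treated, "ATE"))
att = weighted_risk_difference(y, treated, iptw_weights(ps, treated, "ATT"))
print(f"crude {crude:.3f}, IPTW-ATE {ate:.3f}, IPTW-ATT {att:.3f}")
```

Unlike matching, every unit keeps a (weighted) place in the analysis, which is why IPTW tends to preserve sample size; in exchange, units with propensity scores near 0 or 1 can receive very large weights.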
Additionally, doubly robust estimators, which combine outcome regression with propensity-based weighting, offer an extra safeguard: they yield consistent causal effect estimates if either the propensity score model or the outcome model is correctly specified, provided there is no unmeasured confounding44,45. This dual protection makes doubly robust methods a valuable complement to both PSM and IPTW, particularly in observational contexts where some model misspecification is likely. By combining DAGs, IPTW, MSMs, and doubly robust estimation alongside PSM, researchers can strengthen the validity, transparency, and interpretability of causal inferences in kidney transplantation studies.
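The double robustness property can be checked numerically with the augmented inverse probability weighting (AIPW) estimator, one common doubly robust estimator. The plain-Python sketch below uses a small hypothetical two-stratum dataset whose true risk difference is 0.25. The estimate recovers 0.25 when either the propensity scores or the outcome predictions are correct while the other is deliberately wrong, but falls back to the crude difference of 0.40 when both are wrong.

```python
def aipw_ate(y, treated, ps, m1, m0):
    """Augmented IPW estimate of the average treatment effect.

    m1/m0 are outcome-model predictions under treatment/control for every unit.
    Consistent if either the propensity scores or the outcome predictions are correct.
    """
    n = len(y)
    mu1 = sum(a + z * (yi - a) / e for yi, z, e, a in zip(y, treated, ps, m1)) / n
    mu0 = sum(b + (1 - z) * (yi - b) / (1 - e) for yi, z, e, b in zip(y, treated, ps, m0)) / n
    return mu1 - mu0


# Hypothetical two-stratum data (stratum propensity 0.2 or 0.8); true risk difference 0.25.
stratum = [0] * 10 + [1] * 10
treated = [1, 1] + [0] * 8 + [1] * 8 + [0] * 2
y = [1, 0] + [1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0] + [1, 0]

good_ps = [0.2 if s == 0 else 0.8 for s in stratum]
bad_ps = [0.5] * 20                                   # deliberately wrong PS model
good_m1 = [0.5 if s == 0 else 0.75 for s in stratum]  # stratum risks if treated
good_m0 = [0.25 if s == 0 else 0.5 for s in stratum]  # stratum risks if untreated
bad_m = [0.5] * 20                                    # deliberately wrong outcome model

print(aipw_ate(y, treated, good_ps, bad_m, bad_m))     # correct PS, wrong outcome model
print(aipw_ate(y, treated, bad_ps, good_m1, good_m0))  # wrong PS, correct outcome model
print(aipw_ate(y, treated, bad_ps, bad_m, bad_m))      # both wrong: no protection
```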
Conclusion
In conclusion, PSM plays an important role in kidney transplant research by providing a structured approach to control for confounding and strengthen the validity of findings from observational data. It improves comparability between treated and control groups on key baseline variables, allowing researchers to estimate treatment effects more reliably. Nonetheless, its effectiveness depends on appropriate covariate selection, consistent data quality, and careful consideration of potential sample size reductions, which may influence statistical power. To address these challenges, combining PSM with complementary methods such as DAGs, IPTW, and MSMs can help account for complex causal structures and time-varying confounding. Maintaining rigorous methodology, clearly reporting each analytical step, sharing code and assumptions, and conducting sensitivity analyses are essential for ensuring transparent and reproducible results. By applying these principles and adhering to established reporting standards such as STROBE, researchers can contribute to more robust and informative evidence that supports decision-making in kidney transplantation.
Data availability
The dataset analyzed in this article is available on [Kaggle](https://www.kaggle.com/datasets/gustavomodelli/waitlist-kidney-brazil). Additionally, the dataset and the R script used for the analysis can be found in the [Colombiana de Trasplantes’ repository](https://github.com/ColTrasplantes/PSM) on GitHub and can be downloaded via [Zenodo](https://doi.org/10.5281/zenodo.17023205).
Abbreviations
- CBPS: Covariate Balancing Propensity Scores
- CRAN: Comprehensive R Archive Network
- DAG: Directed Acyclic Graph
- GEE: Generalized Estimating Equations
- IPTW: Inverse Probability of Treatment Weighting
- MSMs: Marginal Structural Models
- OR: Odds Ratio
- PSM: Propensity Score Matching
- SE: Standard Error
- SMD: Standardized Mean Difference
References
Cuschieri, S. The STROBE guidelines. Saudi J. Anaesth. 13, S31–S34. https://doi.org/10.4103/sja.SJA_543_18 (2019).
Corder, N. & Yang, S. Utilizing stratified generalized propensity score matching to approximate blocked randomized designs with multiple treatment levels. J. Biopharm. Stat. 32 (3), 373–399. https://doi.org/10.1080/10543406.2022.2065507 (2022).
Rosenbaum, P. R. & Rubin, D. B. The central role of the propensity score in observational studies for causal effects. Biometrika 70 (1), 41–55. https://academic.oup.com/biomet/article/70/1/41/240879 (1983).
Austin, P. C. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivar. Behav. Res. 46 (3), 399–424. https://doi.org/10.1080/00273171.2011.568786 (2011).
Wang, J. To use or not to use propensity score matching? Pharm. Stat. 20 (1), 15–24. https://doi.org/10.1002/pst.2051 (2021).
Thomas, L., Li, F. & Pencina, M. Using propensity score methods to create target populations in observational clinical research. JAMA 323, 466–467. https://doi.org/10.1001/jama.2019.21558 (2020).
Okumi, M. et al. Preemptive kidney transplantation: a propensity score matched cohort study. Clin. Exp. Nephrol. 21 (6), 1105–1112. https://doi.org/10.1007/s10157-016-1345-x (2017).
Kane, L. T. et al. Propensity score matching: a statistical method. Clin. Spine Surg. https://doi.org/10.1097/BSD.0000000000000932 (2020).
Nguyen, V. T. et al. Risk of bias in observational studies using routinely collected data of comparative effectiveness research: a meta-research study. BMC Med. 19 (1). https://doi.org/10.1186/s12916-021-02151-w (2021).
Kim, H. Y. et al. Comparison of clinical outcomes between preemptive transplant and transplant after a short period of dialysis in living-donor kidney transplantation: a propensity-score-based analysis. Ann. Transplant. 24, 75–83. https://doi.org/10.12659/AOT.913126 (2019).
Chen, J. W. et al. Best practice guidelines for propensity score methods in medical research: consideration on theory, implementation, and reporting. Arthroscopy 38 (2), 632–642. https://doi.org/10.1016/j.arthro.2021.06.037 (2022).
Schold, J. D., Malamon, J. & Kaplan, B. Statistical confounding in observational research and center performance evaluations in organ transplantation. Curr. Transplant. Rep. 10 (4), 224–229. https://doi.org/10.1007/s40472-023-00420-6 (2023).
Fu, R., Kim, S. J., de Oliveira, C. & Coyte, P. C. An instrumental variable approach confirms that the duration of pretransplant dialysis has a negative impact on the survival of kidney transplant recipients and quantifies the risk. Kidney Int. 96 (2), 450–459. https://doi.org/10.1016/j.kint.2019.03.007 (2019).
Taber, D. J. et al. The impact of diabetes on ethnic disparities seen in kidney transplantation. Ethn. Dis. 23 (2), 238–244. https://www.jstor.org/stable/48667841 (2013).
Isaacs, R. B. et al. Racial disparities in renal transplant outcomes. Am. J. Kidney Dis. 34 (4), 706–712. https://doi.org/10.1016/S0272-6386(99)70397-5 (1999).
Cristelli, M. P. et al. Efficacy of convalescent plasma to treat mild to moderate COVID-19 in kidney transplant patients: a propensity score matching analysis. Transplantation 106, e92–e94. https://doi.org/10.1097/TP.0000000000003962 (2022).
Lozano-Suárez, N., García-López, A., Gómez-Montero, A. & Girón-Luque, F. Relación entre la compatibilidad del HLA y la pérdida del injerto en trasplante renal de donante cadavérico: un análisis por propensity score matching en Colombia [Relationship between HLA compatibility and graft loss in deceased-donor kidney transplantation: a propensity score matching analysis in Colombia]. Revista Colombiana de Cirugía 39, 268–279. https://doi.org/10.30944/20117582.2491 (2024).
Castelli, C. et al. Impact of kidney transplantation in obese candidates: a time-dependent propensity score matching study. Nephrol. Dial. Transplant. 37 (9), 1768–1776. https://doi.org/10.1093/ndt/gfac152 (2022).
Sood, M. M. et al. Risk of major hemorrhage after kidney transplantation. Am. J. Nephrol. 41 (1), 73–80. https://doi.org/10.1159/000371902 (2015).
Han, S. H., Go, J., Park, S. C. & Yun, S. S. Long-term outcome of kidney retransplantation in comparison with first transplantation: a propensity score matching analysis. Transplant. Proc. 51 (8), 2582–2586. https://doi.org/10.1016/j.transproceed.2019.03.070 (2019).
Kosoku, A. et al. Sarcopenia as a predictor of mortality in kidney transplant recipients: a 5-year prospective cohort study with propensity score matching. Int. J. Urol. https://doi.org/10.1111/iju.15539 (2024).
Staffa, S. J. & Zurakowski, D. Five steps to successfully implement and evaluate propensity score matching in clinical research studies. Anesth. Analg. 127 (4), 1066–1073. https://doi.org/10.1213/ANE.0000000000002787 (2018).
Ali, M. S. et al. Propensity score methods in health technology assessment: principles, extended applications, and recent advances. Front. Pharmacol. 10, 973. https://doi.org/10.3389/fphar.2019.00973 (2019).
Arif, S. & MacNeil, M. A. Utilizing causal diagrams across quasi-experimental approaches. Ecosphere 13 (4). https://doi.org/10.1002/ecs2.4009 (2022).
Imai, K. & Ratkovic, M. Covariate balancing propensity score. J. R. Stat. Soc. Ser. B Stat. Methodol. 76 (1), 243–263. https://doi.org/10.1111/rssb.12027 (2014).
Li, Y. & Li, L. Propensity score analysis methods with balancing constraints: A Monte Carlo study. Stat. Methods Med. Res. 30 (4), 1119–1142. https://doi.org/10.1177/0962280220983512 (2021).
Huang, C., Wei, K., Wang, C., Yu, Y. & Qin, G. Covariate balance-related propensity score weighting in estimating overall hazard ratio with distributed survival data. BMC Med. Res. Methodol. 23 (1). https://doi.org/10.1186/s12874-023-02055-8 (2023).
Rassen, J. A. et al. One-to-many propensity score matching in cohort studies. Pharmacoepidemiol. Drug Saf. 21 (Suppl. 2), 69–80. https://doi.org/10.1002/pds.3263 (2012).
Randolph, J. J. et al. A step-by-step guide to propensity score matching in R. Pract. Assess. Res. Eval. 19. https://doi.org/10.7275/n3pv-tx27 (2014).
Austin, P. C. Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Stat. Med. 28 (25), 3083–3107. https://doi.org/10.1002/sim.3697 (2009).
R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/ (2024).
Modelli, G. Waitlist Kidney Brazil. Kaggle. https://www.kaggle.com/datasets/gustavomodelli/waitlist-kidney-brazil (2023).
ColTrasplantes. ColTrasplantes/PSM: v3.0. Zenodo. https://doi.org/10.5281/zenodo.17023205 (2025).
Austin, P. C. A comparison of variance estimators for logistic regression models estimated using generalized estimating equations (GEE) in the context of observational health services research. Stat. Med. 43 (29), 5548–5561. https://doi.org/10.1002/sim.10260 (2024).
Austin, P. C., Kapral, M. K., Vyas, M. V., Fang, J. & Yu, A. Y. X. Using multilevel models and generalized estimating equation models to account for clustering in neurology clinical research. Neurology 103 (9). https://doi.org/10.1212/WNL.0000000000209947 (2024).
Prasad, A. et al. Propensity score matching in otolaryngologic literature: a systematic review and critical appraisal. PLoS ONE 15, e0244423. https://doi.org/10.1371/journal.pone.0244423 (2020).
Yao, X. I. et al. Reporting and guidelines in propensity score analysis: a systematic review of cancer and cancer surgical studies. J. Natl. Cancer Inst. 109, djw323. https://doi.org/10.1093/jnci/djw323 (2017).
Li, M. Using the propensity score method to estimate causal effects: a review and practical guide. Organ. Res. Methods 16, 188–226. https://doi.org/10.1177/1094428112447816 (2013).
Suttorp, M. M., Siegerink, B., Jager, K. J., Zoccali, C. & Dekker, F. W. Graphical presentation of confounding in directed acyclic graphs. Nephrol. Dial. Transplant. 30, 1418–1423. https://doi.org/10.1093/ndt/gfu325 (2015).
Digitale, J. C., Martin, J. N. & Glymour, M. M. Tutorial on directed acyclic graphs. J. Clin. Epidemiol. 142, 264–267. https://doi.org/10.1016/j.jclinepi.2021.08.001 (2022).
Williamson, T. & Ravani, P. Marginal structural models in clinical research: when and how to use them? Nephrol. Dial. Transplant. 32, ii84–ii90. https://doi.org/10.1093/ndt/gfw341 (2017).
Cohen, J. B. et al. Leveraging marginal structural modeling with Cox regression to assess the survival benefit of accepting vs declining kidney allograft offers. Am. J. Transplant. 19 (7), 1999–2008. https://doi.org/10.1111/ajt.15290 (2019).
Strohmaier, S. et al. Survival benefit of first single-organ deceased donor kidney transplantation compared with long-term dialysis across ages in transplant-eligible patients with kidney failure. JAMA Netw. Open 5 (10), e2234971. https://doi.org/10.1001/jamanetworkopen.2022.34971 (2022).
Funk, M. J. et al. Doubly robust estimation of causal effects. Am. J. Epidemiol. 173 (7), 761–767. https://doi.org/10.1093/aje/kwq439 (2011).
Gabriel, E. E. et al. Inverse probability of treatment weighting with generalized linear outcome models for doubly robust estimation. Stat. Med. 43 (3), 534–547. https://doi.org/10.1002/sim.9969 (2024).
Author information
Contributions
Andrea Gomez-Montero: Conceptualization of the article, statistical analysis, and primary manuscript drafting. Andrea Garcia-Lopez: Support in methodological development and critical review of the manuscript. Santiago Cabas: Support in methodological development and critical review of the manuscript. Adrián Alfonso Nieves-Rico: Partial writing and critical review of the manuscript. Fernando Giron-Luque: Overall supervision, support in the study conceptualization, and final review.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Gomez-Montero, A., Garcia-Lopez, A., Cabas, S. et al. Methodological guidance on implementing propensity score matching in observational studies of kidney transplantation. Sci Rep 16, 1878 (2026). https://doi.org/10.1038/s41598-025-31596-9
DOI: https://doi.org/10.1038/s41598-025-31596-9