Scientific Reports
Kernel mean matching enhances risk estimation under spatial distribution shifts
  • Article
  • Open access
  • Published: 02 February 2026

  • Egor Serov¹,
  • Diana Koldasbayeva¹ &
  • Alexey Zaytsev¹,²

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computational biology and bioinformatics
  • Ecology
  • Mathematics and computing

Abstract

Accurate risk estimation under distribution shifts is critical for deploying machine learning models in real-world spatial applications, from ecological forecasting to medical image analysis. Conventional methods such as No Weighting (NW) and Importance Weighting (IW) fail in spatially structured data due to two challenges: (1) density ratio estimation in high-dimensional clustered distributions and (2) non-stationarity from environmental gradients or sampling biases. Classifier-based approaches offer partial improvements but often yield miscalibrated risk estimates by prioritizing discriminative accuracy over distribution alignment. We conduct a systematic evaluation of four risk estimation methods (NW, IW, Kernel Mean Matching (KMM), and classifier-based reweighting) across synthetic benchmarks (with controlled spatial clustering) and real-world datasets (species distributions and immune cell layouts). Results show that KMM achieves superior robustness, reducing Mean Absolute Percentage Error (MAPE) by 12.3–86.5% compared to alternatives in high-dimensional settings. This advantage stems from KMM’s direct minimization of distributional divergence via kernel embeddings, bypassing error-prone density ratio estimation. Our findings demonstrate that KMM is a principled solution for spatial risk estimation, particularly when source and target distributions exhibit complex clustering or sampling artifacts. Its consistency across ecological and biomedical domains suggests broad applicability for reliable model deployment in spatially heterogeneous environments.
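As a rough illustration of the idea summarized in the abstract, the sketch below reweights source samples so that their kernel mean embedding approaches that of the target samples, then uses the weights to estimate target risk and compares the MAPE against the unweighted (NW) estimate. It is not the authors' implementation (that lives in the project repository). All function names, the Gaussian kernel, the bandwidth `gamma`, the weight cap `upper`, the toy 1D data, and the squared-input "loss" are assumptions made for illustration. The usual KMM quadratic program is solved here with plain projected gradient descent under box constraints only, and the constraint on the weight sum is replaced by self-normalization in the risk estimate.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    # Gaussian kernel matrix with entries exp(-gamma * ||a_i - b_j||^2).
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kmm_weights(x_src, x_tgt, gamma=0.5, upper=10.0, n_iter=1000):
    # Minimize 0.5 * b^T K b - kappa^T b subject to 0 <= b <= upper,
    # which matches the weighted source mean embedding to the target one.
    n_s, n_t = len(x_src), len(x_tgt)
    K = rbf_kernel(x_src, x_src, gamma)
    kappa = (n_s / n_t) * rbf_kernel(x_src, x_tgt, gamma).sum(axis=1)
    step = 1.0 / np.linalg.eigvalsh(K)[-1]  # 1 / largest eigenvalue of K
    beta = np.ones(n_s)
    for _ in range(n_iter):
        grad = K @ beta - kappa
        beta = np.clip(beta - step * grad, 0.0, upper)  # projection onto the box
    return beta

def weighted_risk(losses, weights):
    # Self-normalized weighted average of per-sample losses.
    return float(np.sum(weights * losses) / np.sum(weights))

def mape(estimate, truth):
    # Absolute percentage error of a single risk estimate.
    return 100.0 * abs(estimate - truth) / abs(truth)

# Toy covariate shift (assumed): source ~ N(0, 1), target ~ N(1, 1).
rng = np.random.default_rng(0)
x_src = rng.normal(0.0, 1.0, size=(200, 1))
x_tgt = rng.normal(1.0, 1.0, size=(200, 1))

# Hypothetical per-sample loss that depends only on the input location.
loss_src = x_src[:, 0] ** 2
true_target_risk = float(np.mean(x_tgt[:, 0] ** 2))

beta = kmm_weights(x_src, x_tgt)
nw_err = mape(float(np.mean(loss_src)), true_target_risk)
kmm_err = mape(weighted_risk(loss_src, beta), true_target_risk)
print(f"NW MAPE: {nw_err:.1f}%  KMM MAPE: {kmm_err:.1f}%")
```

In practice the bandwidth and the weight cap would need tuning, and a dedicated QP solver would typically replace the projected gradient loop; the point here is only that no density ratio is estimated at any step.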

Data and code availability

The data, preprocessing steps, and modeling details needed to reproduce the calculations are available in the project repository: https://github.com/awesomeslayer/Importance-reweighting.

Funding

The work was supported by the grant for research centers in the field of AI provided by the Ministry of Economic Development of the Russian Federation in accordance with the agreement 000000C313925P4F0002 and the agreement with Skoltech №139-10-2025-033.

Author information

Authors and Affiliations

  1. Skolkovo Institute of Science and Technology, Moscow, Russia

    Egor Serov, Diana Koldasbayeva & Alexey Zaytsev

  2. Beijing Institute of Mathematical Sciences and Applications, Beijing, China

    Alexey Zaytsev

Contributions

Conceptualization: A.Z., E.S. and D.K.; methodology: E.S., A.Z., D.K.; software: E.S.; validation: E.S., A.Z.; formal analysis: E.S., D.K.; investigation: A.Z., E.S.; data curation: D.K., E.S.; writing—original draft preparation: E.S., D.K.; writing—review and editing: D.K., E.S. and A.Z.; visualization: E.S., D.K.; supervision: A.Z.; project administration: A.Z., D.K. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Diana Koldasbayeva.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Serov, E., Koldasbayeva, D. & Zaytsev, A. Kernel mean matching enhances risk estimation under spatial distribution shifts. Sci Rep (2026). https://doi.org/10.1038/s41598-026-36740-7

  • Received: 16 September 2025

  • Accepted: 16 January 2026

  • Published: 02 February 2026

  • DOI: https://doi.org/10.1038/s41598-026-36740-7

Keywords

  • Kernel mean matching
  • Spatial risk estimation
  • Spatial modeling
  • Importance reweighting
  • Distribution shift robustness