Abstract
Nebbia et al. [1] report that foundation models enable re-identification of patients in medical imaging datasets, given a known medical image of the same type from the targeted individual. They postulate that this is due to the ability of foundation models to infer attributes such as age, sex, or ethnicity of a person from such imaging, and they identify this as a risk stemming from medical foundation model research.
Non-foundation models outperform foundation models at “re-identification”
Most re-identification experiments performed by the authors are in retinal imaging and use a foundation model called RETFound [2]. RETFound is used to obtain “features”, i.e. a numeric vector that describes the content of retinal images in a meaningful yet abstract way. One of the datasets they consider is openly available, namely the GRAPE dataset [3]. We replicate their experiments on this dataset, but replace RETFound with a very small convolutional neural network (CNN) – a 10-layer ResNet [4] – that has never been trained on any retinal images. The experiment here involves no training either; the model is kept frozen. Furthermore, we conduct the same experiment using raw pixel values instead of features from a neural network. Our code is available here: https://gist.github.com/justinengelmann/63d5ad32cad4b57bb31627ded9111093.
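For concreteness, the feature-extraction step can be sketched as follows. This is a minimal, hypothetical sketch assuming the timm and torch packages (and that timm ships ImageNet-pretrained weights for resnet10t); our actual code, linked above, may differ in its details.

```python
# Minimal sketch of feature extraction with a small, frozen CNN.
# Assumes `pip install timm torch pillow`; see the linked gist for our actual code.
import timm
import torch
from PIL import Image

# A 10-layer ResNet pretrained on ImageNet only, never on retinal images.
# num_classes=0 makes the model return pooled features instead of class logits.
model = timm.create_model("resnet10t", pretrained=True, num_classes=0).eval()

# Use the preprocessing (resize, normalisation) that matches the model's weights.
cfg = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**cfg, is_training=False)

@torch.no_grad()  # the model stays frozen; no gradients, no training
def extract_features(image_path: str) -> torch.Tensor:
    img = Image.open(image_path).convert("RGB")
    return model(transform(img).unsqueeze(0)).squeeze(0)  # e.g. a 512-d vector
```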
The results are shown in Table 1. ResNet10t achieves substantially higher performance across the board. In other words, a CNN that was never trained on any retinal images allows for better re-identification than RETFound. This calls into question the titular thesis of Nebbia et al. that foundation models enable such re-identification. Furthermore, the very naïve approach of comparing pixel values directly achieves performance that is worse than, yet comparable to, RETFound’s, and substantially better than random guessing. This suggests that the images of a given patient in the GRAPE dataset might simply be very similar to each other, making re-identification trivial in this scenario.
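The matching procedure itself is simple nearest-neighbour retrieval. Below is an illustrative sketch (variable names are ours, not from the gist): each query image is matched to the most similar image in the pool by cosine similarity of feature vectors, and a match counts as a hit if it comes from the same patient. The same procedure works identically whether the feature vectors come from RETFound, the frozen CNN, or raw pixels.

```python
# Illustrative sketch of the matching step; names are ours, not from the gist.
import torch
import torch.nn.functional as F

def top1_match_rate(query_feats, pool_feats, query_ids, pool_ids):
    """query_feats: (Nq, D), pool_feats: (Np, D) tensors; *_ids: patient ID lists."""
    q = F.normalize(query_feats, dim=1)  # unit-length rows, so dot product = cosine
    p = F.normalize(pool_feats, dim=1)
    sims = q @ p.T                         # (Nq, Np) cosine similarity matrix
    nearest = sims.argmax(dim=1).tolist()  # index of most similar pool image
    hits = [query_ids[i] == pool_ids[j] for i, j in enumerate(nearest)]
    return sum(hits) / len(hits)           # fraction matched to the same patient
```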
Figure 1 shows some examples. The successful re-identification examples in Fig. 1a are nearly identical to each other. So, it appears that on the GRAPE dataset, “re-identification” is not particularly complex. This is not surprising, as the follow-up interval in GRAPE is relatively short (mean 18 months, min 5, max 53), the images were quality controlled, and they share the same fixation. In Fig. 1b, we can see that erroneous matches are likewise visually similar, supporting the view that it is this superficial similarity that RETFound matches on.
Fig. 1: Example retinal image pairs from GRAPE, where finding the most similar image using RETFound features retrieved an image from the same patient (a) or a different patient (b), respectively; (c) illustrates a fundus image at normal resolution (top) and at a resolution of 16 × 16 pixels (bottom). The correct matches have very consistent fixation and little apparent change between visits.
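The raw-pixel baseline requires no model at all. A hypothetical sketch, using the 16 × 16 downsampling illustrated in Fig. 1c: each image is shrunk and its pixel values flattened into a vector, which can then be fed to the same matching procedure as above.

```python
# Hypothetical sketch of the raw-pixel baseline: no neural network involved.
import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor

def pixel_features(image_path: str, size: int = 16) -> torch.Tensor:
    # Downsample aggressively (cf. Fig. 1c) and flatten pixels into a vector.
    img = Image.open(image_path).convert("RGB").resize((size, size))
    return to_tensor(img).flatten()  # 16 * 16 * 3 = 768 values
```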
Thus, we think that the lack of non-foundation model baselines in Nebbia et al. might have led to an interpretation of the results that does not hold up to closer examination, namely that foundation models are what enable the re-identification results they obtained.
Predictability of demographic features and “re-identification”
Nebbia et al.’s explanation for why foundation models enable re-identification is that they learn representations that encode demographic information. So, if – as we have shown – non-foundation models can outperform foundation models, what are we to make of the finding that demographic features were marginally better predicted in patients who could be re-identified?
Nebbia et al. argue that this indicates that their “hypothesis that demographic features prediction and re-identification are related was correct”. We would be more cautious. For instance, image quality could explain this difference just as well: some fundus images are blurry or under-illuminated, making most or all of the retina hard to see. From such images, demographic features will be predicted less well, as they contain less information. Poor-quality images would also lead to poorer re-identification, as, e.g., all blurry images look similar to each other, yet a given patient likely has a mix of good- and bad-quality images which look very different from each other.
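This confounder could be probed directly. As a hypothetical example, one could score image sharpness with the variance of the Laplacian (a common blur heuristic, with low values indicating blur) and check whether matching failures and poorer demographic predictions concentrate among low-scoring images; this analysis is our suggestion, not part of Nebbia et al.’s experiments.

```python
# Hypothetical quality probe: variance of the Laplacian as a simple blur score.
# Low scores indicate blurry images; requires `pip install opencv-python`.
import cv2

def blur_score(image_path: str) -> float:
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    return cv2.Laplacian(gray, cv2.CV_64F).var()
```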
Finding similar images is not the same as “re-identification”
A secondary point we want to raise is that the experiments by Nebbia et al., strictly speaking, do not (re-)identify anyone. Instead, they show that, given an image of an individual, it is possible, some of the time, to retrieve an image of the same individual from a larger pool of images. However, the identity of said individual remains unknown if it was not already known.
This distinction is not mere sophistry. In our opinion, the phrase “re-identification from medical imaging” is likely to be misunderstood, especially by a lay audience such as patients concerned about their privacy, as it suggests that one could identify a particular person of interest (e.g. Pearse Keane) given an openly available dataset of medical images (e.g. retinal images). But that is not the case. One needs to already be in possession of a medical image of the same type from the target person. Only then can one attempt to find similar images in the dataset.
Consider what the “attacker” gains when they try to re-identify someone in a dataset of retinal images. To attempt their attack, they must already possess an image of the targeted individual. If they execute their attack successfully, they might identify additional images as well as associated metadata from the dataset. For retinal imaging datasets (e.g., GRAPE), this is typically age, sex, and information about ocular disease.
However, as Nebbia et al. also point out, age and sex are reasonably well predicted from fundus images. Of course, fundus images further contain rich information about ocular health. Thus, prior to executing their attack, the attacker could already infer the age, sex, and ocular health of the targeted individual from the fundus image they needed to possess in order to attempt re-identification. It is then unclear what harm results from the re-identification attack.
In security research, the focus is on “threat models” – under what circumstances can an attacker with specific resources bring about specific harm. In the scenario that Nebbia et al. consider, the harm in question is that the attacker learns protected information about a target individual. But in order to do so, the attacker needs to have resources that allow them to infer said information even without any re-identification.
Conclusion
In summary, the view presented by Nebbia et al. that foundation models enable re-identification from medical imaging appears incompatible with the experimental results presented here. On the contrary, non-foundation models might allow better image matching. The responsible use of patient data in medical research is paramount, and thus studying potential risks is important. However, data need to be carefully interpreted and properly contextualised, especially in an area of particular interest to a lay audience. The re-identification scenario Nebbia et al. consider would require an attacker to already be in a position to infer much of the information that could be learned by re-identifying someone. We recommend that future work consider simple baseline approaches (e.g., matching on pixels) to establish whether image-matching results are non-trivial, and carefully spell out the envisioned threat model, including what resources an attacker needs and what potential harm could result.
Data availability
Our code is available here: https://gist.github.com/justinengelmann/63d5ad32cad4b57bb31627ded9111093.
References
1. Nebbia, G. et al. Re-identification of patients from imaging features extracted by foundation models. npj Digit. Med. 8, 469, https://doi.org/10.1038/s41746-025-01801-0 (2025).
2. Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163, https://doi.org/10.1038/s41586-023-06555-x (2023).
3. Huang, X. et al. GRAPE: a multi-modal dataset of longitudinal follow-up visual field and fundus images for glaucoma management. Sci. Data 10, https://doi.org/10.1038/s41597-023-02424-4 (2023).
4. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).
Acknowledgements
We thank Nebbia et al. for investigating this important topic and for making their code openly available. We further thank Huang et al. for making the GRAPE dataset openly available, which is a substantial contribution to progress in our field.
Contributions
J.E. conceptualised and drafted the manuscript and conducted the experiments. Y.Z. checked the code and reproduced the experiment. Y.Z., S.K.W., and P.A.K. provided feedback on the manuscript and its content.
Competing interests
The authors declare no competing interests.
