Abstract
Nebbia et al. [1] report that foundation models enable re-identification of patients in medical imaging datasets, given a known medical image of the same type from the targeted individual. They postulate that this is due to the ability of foundation models to infer attributes such as age, sex, or ethnicity of a person from such imaging, and they identify this as a risk stemming from medical foundation model research.
Non-foundation models outperform foundation models at “re-identification”
Most re-identification experiments performed by the authors are in retinal imaging and use a foundation model called RETFound [2]. RETFound is used to obtain “features”, i.e. a numeric vector that describes the content of retinal images in a meaningful yet abstract way. One of the datasets they consider is openly available, namely the GRAPE dataset [3]. We replicate their experiments on this dataset, but replace RETFound with a very small convolutional neural network (CNN) – a 10-layer ResNet [4] – that has never been trained on any retinal images. The experiment here involves no training either; the model is kept frozen. Furthermore, we conduct the same experiment using raw pixel values instead of features from a neural network. Our code is available here: https://gist.github.com/justinengelmann/63d5ad32cad4b57bb31627ded9111093.
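For concreteness, the feature-extraction step can be sketched as follows. This is a minimal, hypothetical sketch assuming the timm and torch packages (and that timm ships ImageNet-pretrained weights for resnet10t); our actual code, linked above, may differ in its details.

```python
# Minimal sketch of feature extraction with a small, frozen CNN.
# Assumes `pip install timm torch pillow`; see the linked gist for our actual code.
import timm
import torch
from PIL import Image

# A 10-layer ResNet pretrained on ImageNet only, never on retinal images.
# num_classes=0 makes the model return pooled features instead of class logits.
model = timm.create_model("resnet10t", pretrained=True, num_classes=0).eval()

# Use the preprocessing (resize, normalisation) that matches the model's weights.
cfg = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**cfg, is_training=False)

@torch.no_grad()  # the model stays frozen; no gradients, no training
def extract_features(image_path: str) -> torch.Tensor:
    img = Image.open(image_path).convert("RGB")
    return model(transform(img).unsqueeze(0)).squeeze(0)  # e.g. a 512-d vector
```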
The results are shown in Table 1. ResNet10t achieves substantially higher performance across the board. In other words, a CNN that was never trained on any retinal images allows for better re-identification than RETFound. This calls into question the titular thesis of Nebbia et al. that foundation models enable such re-identification. Furthermore, the very naïve approach of comparing pixel values directly achieves performance that is worse than, yet comparable to, RETFound’s, and substantially better than random guessing. This suggests that the images of a given patient in the GRAPE dataset might simply be very similar to each other, making re-identification trivial in this scenario.
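The matching procedure itself is simple nearest-neighbour retrieval. Below is an illustrative sketch (variable names are ours, not from the gist): each query image is matched to the most similar image in the pool by cosine similarity of feature vectors, and a match counts as a hit if it comes from the same patient. The same procedure works identically whether the feature vectors come from RETFound, the frozen CNN, or raw pixels.

```python
# Illustrative sketch of the matching step; names are ours, not from the gist.
import torch
import torch.nn.functional as F

def top1_match_rate(query_feats, pool_feats, query_ids, pool_ids):
    """query_feats: (Nq, D), pool_feats: (Np, D) tensors; *_ids: patient ID lists."""
    q = F.normalize(query_feats, dim=1)  # unit-length rows, so dot product = cosine
    p = F.normalize(pool_feats, dim=1)
    sims = q @ p.T                         # (Nq, Np) cosine similarity matrix
    nearest = sims.argmax(dim=1).tolist()  # index of most similar pool image
    hits = [query_ids[i] == pool_ids[j] for i, j in enumerate(nearest)]
    return sum(hits) / len(hits)           # fraction matched to the same patient
```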
Figure 1 shows some examples. The successful re-identification examples in Fig. 1a are nearly identical to each other. So, it appears that on the GRAPE dataset, “re-identification” is not particularly complex. This is not surprising, as the follow-up interval in GRAPE is relatively short (mean 18 months, min 5, max 53), the images were quality controlled, and they share the same fixation. In Fig. 1b, we can see that erroneous matches are likewise visually similar, supporting the view that it is this superficial similarity that RETFound matches on.
Fig. 1: Example retinal image pairs from GRAPE, where finding the most similar image using RETFound features retrieved an image from the same patient (a) or a different patient (b), respectively; (c) illustrates a fundus image at normal resolution (top) and at a resolution of 16 × 16 pixels (bottom). The correct matches have very consistent fixation and little apparent change between visits.
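The raw-pixel baseline requires no model at all. A hypothetical sketch, using the 16 × 16 downsampling illustrated in Fig. 1c: each image is shrunk and its pixel values flattened into a vector, which can then be fed to the same matching procedure as above.

```python
# Hypothetical sketch of the raw-pixel baseline: no neural network involved.
import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor

def pixel_features(image_path: str, size: int = 16) -> torch.Tensor:
    # Downsample aggressively (cf. Fig. 1c) and flatten pixels into a vector.
    img = Image.open(image_path).convert("RGB").resize((size, size))
    return to_tensor(img).flatten()  # 16 * 16 * 3 = 768 values
```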
Thus, we think that the lack of non-foundation model baselines in Nebbia et al. might have led to an interpretation of the results that does not hold up to closer examination, namely that foundation models are what enable the re-identification results they obtained.
Predictability of demographic features and “re-identification”
Nebbia et al.’s explanation for why foundation models enable re-identification is that they learn representations that encode demographic information. So, if – as we have shown – non-foundation models can outperform foundation models, what are we to make of the finding that demographic features were marginally better predicted in patients who could be re-identified?
Nebbia et al. argue that this indicates that their “hypothesis that demographic features prediction and re-identification are related was correct”. We would be more cautious. For instance, image quality could explain this difference just as well: some fundus images are blurry or under-illuminated, making most or all of the retina hard to see. From such images, demographic features will be predicted less well, as they contain less information. Poor-quality images would also lead to poorer re-identification, as, e.g., all blurry images look similar to each other, yet a given patient likely has a mix of good- and bad-quality images which look very different from each other.
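This confounder could be probed directly. As a hypothetical example, one could score image sharpness with the variance of the Laplacian (a common blur heuristic, with low values indicating blur) and check whether matching failures and poorer demographic predictions concentrate among low-scoring images; this analysis is our suggestion, not part of Nebbia et al.’s experiments.

```python
# Hypothetical quality probe: variance of the Laplacian as a simple blur score.
# Low scores indicate blurry images; requires `pip install opencv-python`.
import cv2

def blur_score(image_path: str) -> float:
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    return cv2.Laplacian(gray, cv2.CV_64F).var()
```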
Finding similar images is not the same as “re-identification”
A secondary point we want to raise is that the experiments by Nebbia et al., strictly speaking, do not (re-)identify anyone. Instead, they show that, given an image of an individual, it is possible, some of the time, to retrieve an image of the same individual from a larger pool of images. However, the identity of said individual remains unknown if it was not already known.
This distinction is not mere sophistry. In our opinion, the phrase “re-identification from medical imaging” is likely to be misunderstood, especially by a lay audience such as patients concerned about their privacy, as it suggests that one could identify a particular person of interest (e.g. Pearse Keane) given an openly available dataset of medical images (e.g. retinal images). But that is not the case. One needs to already be in possession of a medical image of the same type from the target person. Only then can one attempt to find similar images in the dataset.
Consider what the “attacker” gains when they try to re-identify someone in a dataset of retinal images. To attempt their attack, they must already possess an image of the targeted individual. If they execute their attack successfully, they might identify additional images as well as associated metadata from the dataset. For retinal imaging datasets (e.g., GRAPE), this is typically age, sex, and information about ocular disease.
However, as Nebbia et al. also point out, age and sex are reasonably well predicted from fundus images. Of course, fundus images further contain rich information about ocular health. Thus, prior to executing their attack, the attacker could already infer the age, sex, and ocular health of the targeted individual from the fundus image they needed to possess in order to attempt re-identification. It is then unclear what harm results from the re-identification attack.
In security research, the focus is on “threat models” – under what circumstances can an attacker with specific resources bring about specific harm. In the scenario that Nebbia et al. consider, the harm in question is that the attacker learns protected information about a target individual. But in order to do so, the attacker needs to have resources that allow them to infer said information even without any re-identification.
Conclusion
In summary, the view presented by Nebbia et al. that foundation models enable re-identification from medical imaging appears incompatible with the experimental results presented here. On the contrary, non-foundation models might allow better image matching. The responsible use of patient data in medical research is paramount, and thus studying potential risks is important. However, data need to be carefully interpreted and properly contextualised, especially in an area of particular interest to a lay audience. The re-identification scenario Nebbia et al. consider would require an attacker to already be in a position to infer much of the information that could be learned by re-identifying someone. We recommend that future work consider simple baseline approaches (e.g., matching on pixels) to establish whether image-matching results are non-trivial, and carefully spell out the envisioned threat model, including what resources an attacker needs and what potential harm could result.
Data availability
Our code is available here: https://gist.github.com/justinengelmann/63d5ad32cad4b57bb31627ded9111093.
References
1. Nebbia, G. et al. Re-identification of patients from imaging features extracted by foundation models. npj Digit. Med. 8, 469, https://doi.org/10.1038/s41746-025-01801-0 (2025).
2. Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163, https://doi.org/10.1038/s41586-023-06555-x (2023).
3. Huang, X. et al. GRAPE: a multi-modal dataset of longitudinal follow-up visual field and fundus images for glaucoma management. Sci. Data 10, https://doi.org/10.1038/s41597-023-02424-4 (2023).
4. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).
Acknowledgements
We thank Nebbia et al. for investigating this important topic and for making their code openly available. We further thank Huang et al. for making the GRAPE dataset openly available, which is a substantial contribution to progress in our field.
Contributions
J.E. conceptualised and drafted the manuscript and conducted the experiments. Y.Z. checked the code and reproduced the experiment. Y.Z., S.K.W., and P.A.K. provided feedback on the manuscript and its content.
Competing interests
The authors declare no competing interests.
