We place ‘Model Cards’ and graphical ‘nutrition labels’ for health AI in the context of the information needs of patients, health care providers and deployers. We discuss the applicability of Model Cards to General Purpose AI (GPAI) models. If these approaches are to be useful and safe, they need to be integrated with regulatory approaches, linked to deeper layers of open and detailed model information, and optimized through user testing.
Recently the Coalition for Health AI (CHAI) has proposed ‘Nutrition Label’-inspired ‘Model Cards’ for health AI1. These aim to provide a standard, user-recognizable labelling template that transparently presents key information on the evaluation of AI solutions and on their performance and safety. The information provided in the CHAI Model Card (Fig. 1) includes a description of the model, warnings, the intended use, and approaches to, and metrics for, assessing fairness and equity. The Model Card is primarily aimed at professionals reviewing AI models during health system procurement processes and at vendors of electronic health record (EHR) systems (as an informational starting point), but it is also aimed at patients, clinicians, health system data custodians, and developers1. The CHAI proposal is a great idea, as transparency through a familiar and standardized approach is much needed. What objections could exist to such an approach? Although we support many aspects of this proposal, its implementation will be key to avoiding three well-known problems. The first is duplication and poor integration with other compulsory regulatory processes, such as medical device labels. The second, common to many regulatory initiatives2, is that governance structures (e.g., templates, databases, or standards) are designed without any, or enough, user testing and concept refinement. The third is the risk of superficial transparency, where information is skillfully communicated but, much as in many marketing activities, provides little verifiably true information on model performance or ethical framing.
This layout is based on the Model Card proposed by CHAI1. The Model Card aims to present key information related to the identification of an AI model, its developer, its intended use(s), the target patient populations, the model type, applicable data types, performance metrics, information on accreditations/approvals, requirements for local calibration and maintenance, known risks, out-of-scope uses, recognized bias, ethical considerations and additional information (e.g., clinical studies).
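Purely as an illustration of how such a card’s content could also be handled in machine-readable form (the schema and field names below are our assumption, loosely mirroring the fields listed above, and not a CHAI specification), a structured representation could let procurement teams or EHR vendors compare cards programmatically alongside the human-readable version:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified schema loosely mirroring the fields of the CHAI
# Model Card described above; not an official CHAI format.
@dataclass
class ModelCard:
    model_name: str
    developer: str
    intended_uses: List[str]
    target_populations: List[str]
    model_type: str
    data_types: List[str]
    performance_metrics: dict          # e.g. {"AUROC": 0.87}
    approvals: List[str]               # accreditations / regulatory approvals
    local_calibration: str             # requirements for local calibration and maintenance
    known_risks: List[str]
    out_of_scope_uses: List[str]
    recognized_bias: List[str]
    ethical_considerations: List[str]
    additional_information: List[str] = field(default_factory=list)  # e.g. clinical studies

# Example instance for a fictitious early-warning model (all values illustrative).
card = ModelCard(
    model_name="Example Sepsis Early-Warning Model",
    developer="Example Health AI GmbH",
    intended_uses=["Early warning of sepsis risk in adult inpatients"],
    target_populations=["Hospitalised adults (18+)"],
    model_type="Gradient-boosted decision trees",
    data_types=["EHR vital signs", "laboratory values"],
    performance_metrics={"AUROC": 0.87, "sensitivity_at_fixed_specificity": 0.71},
    approvals=["None (illustrative example)"],
    local_calibration="Recalibrate on local data before deployment",
    known_risks=["Alert fatigue", "Reduced performance in under-represented groups"],
    out_of_scope_uses=["Paediatric patients", "Outpatient settings"],
    recognized_bias=["Training data drawn from a single health system"],
    ethical_considerations=["Fairness and equity assessment across subgroups"],
)
```

Such a representation would complement, not replace, the human-readable card, and could support the side-by-side comparisons discussed below.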
With respect to avoiding duplication and ensuring integration with regulatory approaches, it is particularly relevant to consider the international requirements for the labelling and the ‘Instructions for Use’ (IFU) of medical devices, as internationally AI models with a medical purpose are regulated as medical devices2. Existing regulations require that devices are provided with both a label and an IFU (with a small number of exceptions for software devices, if it can be demonstrated that they provide instructions for use through their interfaces3).
Medicines and medical device labelling in comparison to the Model Card
For medical devices, including AI models, laws specify many requirements for the content of the IFU but do not standardize how this information is laid out. Although IFUs provide much useful information, they do not present capabilities, limitations, safety, performance, validation, or other parameters in a standardized layout as the Model Card does. Patients wanting to know about the suitability of a medical AI, doctors prescribing medical AI, and health system buyers or information system implementation engineers looking to implement AI tools locally in their hospitals would all find it hard to extract information for side-by-side comparison from current IFUs.
In the US, UK and EU, there is already a legal requirement for a standardized compulsory medical device label (and therefore, by extension, for an AI model when it is used for a medical purpose). However, this label provides highly technical information such as lot and serial numbers and the names of assessing certifying bodies (Notified Bodies) and importers (Fig. 2). It does not provide the tangible information a patient is likely to want, or the information critical for AI system introduction and implementation.
We agree that the more standardized layout of information for users in the Model Card, with which they could quickly become familiar, would likely assist them in comparing different AI models. The Model Card proposed by CHAI1 (Fig. 1) does not include all the information required on a medical device label (Fig. 2), but the two concepts could be provided together or merged, with the latter approach being preferable as it avoids duplicated information.
Accessibility and comprehensibility for different users
AI development and validation use a vast array of technical jargon, much of which is not readily understandable to the layperson, or even to health care providers (HCPs) unless they have had advanced technical training4,5. An AI label is of little practical use if it conveys information but does not truly communicate understanding. For many patients, and some HCPs, the information in both the proposed Model Card and the medical device label will be minimally comprehensible. Arguably, neither approach has been developed with these user groups at the forefront. Is it important that labelling and product information are comprehensible to patients? IFUs are required to undergo usability testing, in which the ability of users (including patient users, if applicable) to follow critical instructions is assessed3. Explaining technical information in an accessible manner to users and the public relies upon simplicity and uniformity. We advocate much greater standardization of the information supplied to users, presented in highly usable formats that look the same for every product and use standardized layouts for the inclusion, ordering and ranking of information. When users are faced with highly detailed information in a format that differs for every new product they encounter, this can lead to information overload and disengagement, irrespective of how much effort has been taken to present the information well. The CHAI Model Card delivers a highly standardized format, but interestingly it is entirely text-based, with simple titles in repeated tabular elements1. Under its headings, it provides boxes for free-text model descriptions, but it is unclear how long these responses can be; it also stipulates that links should be provided to the validation process and its justification.
Other authors have proposed graphical approaches to provide information to consumers, including for AI, based on food nutritional quality labelling, food origin labelling and energy efficiency labelling6,7,8 (Fig. 3). These approaches focus on labelling the ethics of AI models, but the domains described overlap with those of the CHAI Model Card (e.g., both6,7,8 and the CHAI card1 address the domains of fairness and equity). Consumers are used to simple coloured rating scales, which rapidly convey complex information and allow fast decisions. Is this approach applicable to AI in the health domain? If such approaches were used to convey critical safety information, they may not be suitable, and they should not be used as a substitute for inherently safe design. They may, however, be applicable for conveying information about the ethical sourcing of data for model training, ethical employment practices in model training, and responsible approaches to data privacy, antidiscrimination, and intellectual property.
Layered verifiable information to avoid the label serving as a conduit for false claims
The AI Model Card has the beauty of simplicity, but its value must be more than skin deep. To be both useful and safe, it is critical that the Model Card (and, if used, the ‘Nutrition Label’ for patients) is linked to verifiable and regularly refreshed data on model safety and performance. Any label or Model Card is only as good as the verifiable, deeper information it summarizes.
The CHAI ‘Model Card’ already has a degree of information layering through the way in which it orders information, starting with goals, then results, then methods, followed by links to external sources for justifications and more detailed method descriptions. The layering of information, starting from higher-level principles and linking successively through to more complex and detailed information, is a longstanding approach to structuring information and is increasingly used in digital information tools, e.g., wikis. As we set out above, layered information serves two primary purposes (Fig. 4). Firstly, it allows the high-level summaries to be simple and accessible to many users (including some patients and non-tech-savvy HCPs), while allowing more curious users, depending on the detail they need, to look at successively deeper layers of information. This serves the second purpose, which is to have the credibility of the information checked by users, who can at least to some degree verify that the summaries are consistent with the linked data. This should be assisted by appropriate mechanisms for reporting problems9.
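To make the layering idea concrete, the sketch below (our illustration, not a CHAI or regulatory format; all names and URLs are hypothetical) shows how a summary-level claim could carry explicit links to successively deeper evidence layers, together with a simple check that flags any layer lacking an inspectable source:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical structure for a layered claim: a short, accessible summary on top,
# with each deeper layer linking to progressively more detailed, verifiable material
# (methods, validation reports, open benchmark data).
@dataclass
class Layer:
    description: str              # e.g. "External validation report"
    evidence_url: Optional[str]   # link a reviewer can follow and audit
    deeper: List["Layer"] = field(default_factory=list)

def missing_evidence(layer: Layer, path: str = "") -> List[str]:
    """Walk the layers and report any claim that has no link to deeper evidence."""
    here = f"{path}/{layer.description}"
    gaps = [] if layer.evidence_url else [here]
    for child in layer.deeper:
        gaps.extend(missing_evidence(child, here))
    return gaps

summary = Layer(
    description="Model detects condition X with AUROC 0.90",
    evidence_url="https://example.org/summary",  # placeholder URL
    deeper=[
        Layer("Validation methods", "https://example.org/methods"),
        Layer("Open benchmark results", None),   # gap: claim with no verifiable source
    ],
)

print(missing_evidence(summary))
# ['/Model detects condition X with AUROC 0.90/Open benchmark results']
```

An auditor or deployer could run such a check to surface summary statements that are not backed by any verifiable deeper layer, supporting the cross-verification described above.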
Patients and non-detail-oriented HCPs can access simple information, but this must always be linked to, and cross-verifiable against, lower-level information and open external test/benchmarking data, to prevent Model Cards from being corrupted from a force for genuine transparency into a vehicle for misleading marketing claims.
Is an accessible layered ‘Model Card’ applicable to general-purpose AI?
There is increasing development and direct and indirect use of general-purpose AI (GPAI) models in health, and these models challenge the approaches that were developed for the oversight and regulation of specifically developed (and generally narrow-scope) AI-enabled medical devices10,11. The EU AI Act sets out a series of requirements for GPAI providers12,13. Developers of GPAI models must make a large amount of information available to the downstream providers of AI systems. When developers of a GPAI model directly apply the system in health, or make it available for medical purposes, they become providers of high-risk AI systems. The delivery of meaningful transparency and the exchange of useful information between GPAI developers and downstream medical device ‘manufacturers’ call for common approaches to transparency and model testing, which are best delivered through standardized approaches14,15. This raises the question: should Model Cards apply only to downstream medical device products (e.g., approved clinical decision support systems) that have defined (even if broad) intended purposes, target populations and clinical indications, or should they also apply to the underlying GPAI models, where these are intended to be the foundations of later approved medical device products? In other words, could Model Cards be used to describe the basic claims made by providers of GPAI models? Downstream providers of healthcare AI systems must receive an extensive range of information from GPAI providers so that they can satisfy the information requirements of the AI Act and, where needed, pass this information on to deployers and users. The high-level Model Card does not directly include all of this information, but, as described in Fig. 4, the concept could readily be linked to it, serving as a starting point, or a form of directory, for more detailed model information. Once GPAI model providers make a Model Card available for their GPAI, would these models then be used directly by HCPs and patients (even where the Model Cards and laws state that they should not), bypassing developed, fine-tuned and approved downstream medical device products? This is likely to occur to a degree if these models are available to downstream users11, and it will require strong engagement with users and strict enforcement action if regulators aim to stop it.
Balancing transparency with practical implementation
The CHAI Model Card summarizes critical model information in a standardized table. Do we propose changing this simple Model Card into a ‘bureaucratic monster’?
We argue that the utility of the Model Card in the hands of its users must be measured through representative quantitative and qualitative assessment, accounting for cultural and international differences. As far as we can determine, the Model Card has not been systematically tested, and this should be done before widespread uptake. It is also important that the approach gains regulatory acceptance; otherwise, it is difficult to see it being sustainable.
An even harder challenge is ensuring that the information provided in the card is reliable and genuinely transparent. If not well audited, the Model Card could turn out to be highly accessible to users but underpinned by information that is predominantly deceptive marketing spin rather than truly reliable, transparent information. This is a clear and present danger for AI nutrition labels and Model Cards that must be avoided to maintain public safety and to avoid the erosion of public trust in health AI. This is not scaremongering: in September 2024, an investigation by the Texas Attorney General was settled after a company that had deployed a GPAI medical documentation and summarization tool at several Texas hospitals made a series of false and misleading statements about the accuracy and safety of its products16. The Model Card could either be a system that serves safety and the ethical use of AI, or it could turn into a charade, with companies in a race to the bottom on true transparency and to the top on claimed performance, each taking steps to outdo the last with percentages massaged into better marketing messages but unsupported by rigorous data or by independent audit under the eyes of regulators or the public. Health AI needs innovative concepts like the CHAI Model Card, and even graphical ‘nutrition labels’ for patients, but it also deserves to have these well validated, integrated with regulatory labelling, and, most importantly, containing verified, auditable and open information.
References
1. CHAI Advances Assurance Lab Certification and ‘Nutrition Label’ for Health AI. https://chai.org/chaiadvances-assurance-lab-certification-and-nutrition-label-for-health-ai/ (2024).
2. Gilbert, S. et al. Learning from experience and finding the right balance in the governance of artificial intelligence and digital health technologies. J. Med. Internet Res. 25, e43682 (2023).
3. International Organization for Standardization. IEC 62366-1:2015 Medical devices — Part 1: Application of usability engineering to medical devices. https://www.iso.org/standard/63179.html (2015).
4. Cetindamar, D. et al. Explicating AI literacy of employees at digital workplaces. IEEE Trans. Eng. Manag. 71, 810–823 (2024).
5. IMDRF AIMD Working Group. Machine Learning-enabled Medical Devices: Key Terms and Definitions. https://www.imdrf.org/documents/machine-learning-enabled-medical-devices-key-terms-and-definitions (2022).
6. Bertelsmann Stiftung, Hustedt, C. & Hallensleben, S. From principles to practice: How can we make AI ethics measurable? (2020).
7. VDE. Franco-German alliance develops label for trustworthy artificial intelligence (AI). https://www.vde.com/en/press/press-releases/deutsch-franzoesisches-ki-label (2022).
8. VDE. VCIO-based description of systems for AI trustworthiness characterisation, VDE SPEC 90012 V1.0 (en). https://www.vde.com/resource/blob/2242194/a24b13db01773747e6b7bba4ce20ea60/vcio-based-description-ofsystems-for-ai-trustworthiness-characterisationvde-spec-90012-v1-0--en--data.pdf (2022).
9. Mathias, R. et al. Safe AI-enabled digital health technologies need built-in open feedback. Nat. Med. https://doi.org/10.1038/s41591-024-03397-6 (2025).
10. Gilbert, S., Harvey, H., Melvin, T., Vollebregt, E. & Wicks, P. Large language model AI chatbots require approval as medical devices. Nat. Med. 1–3 (2023).
11. Gilbert, S. & Kather, J. N. Guardrails for the use of generalist AI in cancer care. Nat. Rev. Cancer 24, 357–358 (2024).
12. European Union. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (2024).
13. Gilbert, S. The EU passes the AI Act and its implications for digital medicine are unclear. npj Digit. Med. 7, 135 (2024).
14. IMDRF Standards Working Group. Optimizing Standards for Regulatory Use. https://www.imdrf.org/sites/default/files/docs/imdrf/final/technical/imdrf-tech-181105-optimizing-standards-n51.pdf (2018).
15. World Health Organization. Regulatory considerations on artificial intelligence for health (2023).
16. Paxton, K. Attorney General Ken Paxton Reaches Settlement in First-of-its-Kind Healthcare Generative AI Investigation. https://www.texasattorneygeneral.gov/news/releases/attorney-general-ken-paxton-reaches-settlement-first-its-kind-healthcare-generative-ai-investigation (2024).
17. International Organization for Standardization. ISO 15223-1:2021 Medical devices — Symbols to be used with information to be supplied by the manufacturer — Part 1: General requirements (2021).
18. US FDA. UDI Basics. https://www.fda.gov/medical-devices/unique-device-identification-system-udi-system/udi-basics.
Acknowledgements
The cooperative work on this paper resulted from interactions through a stakeholder workshop on the human-centered design of AI-based systems in healthcare hosted by the Federal Institute for Occupational Safety and Health (Bundesanstalt für Arbeitsschutz und Arbeitsmedizin). This work was supported by the European Commission under the Horizon Europe Program, as part of project ASSESS-DHT (101137347) via funding to S.G.
Author information
Contributions
S.G., R.A., T.H. and E.W. developed the concept of the manuscript. S.G. wrote the first draft of the manuscript. R.A., T.H. and E.W. contributed to the writing, interpretation of the content, and editing of the manuscript, revising it critically for important intellectual content. S.G., R.A., T.H. and E.W. have read and approved the completed version. S.G., R.A., T.H. and E.W. take accountability for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Ethics declarations
Competing interests
R.A., T.H. and E.W. declare no nonfinancial interests and no competing financial interests. S.G. declares a nonfinancial interest as an Advisory Group member of the EY-coordinated “Study on Regulatory Governance and Innovation in the field of Medical Devices” conducted on behalf of the DG SANTE of the European Commission. S.G. declares the following competing financial interests: he has or has had consulting relationships with Una Health GmbH, Lindus Health Ltd., Flo Ltd, ICURA ApS, Rock Health Inc., Thymia Ltd., FORUM Institut für Management GmbH, High-Tech Gründerfonds Management GmbH, DG SANTE, Prova Health Ltd, Haleon plc and Ada Health GmbH and holds share options in Ada Health GmbH. S.G. is a News and Views Editor for npj Digital Medicine. S.G. played no role in the internal review or decision to publish this News and Views article.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Gilbert, S., Adler, R., Holoyad, T. et al. Could transparent model cards with layered accessible information drive trust and safety in health AI?. npj Digit. Med. 8, 124 (2025). https://doi.org/10.1038/s41746-025-01482-9