Table 2 The STARD-AI checklist
From: The STARD-AI reporting guideline for diagnostic accuracy studies using artificial intelligence
| Section and topic | No. | STARD-AI item |
|---|---|---|
| Title or abstract | 1† | Identification as a study reporting AI-centered diagnostic accuracy and reporting at least one measure of accuracy within title or abstract |
| Abstract | 2 | Structured summary of study design, methods, results and conclusions (for specific guidance, see STARD for Abstracts) |
| Introduction | 3† | Scientific and clinical background, including the intended use of the index test, whether it is novel or an established index test and its integration into an existing or new workflow, if applicable |
| | 4 | Study objectives and hypotheses |
| Methods | | |
| Study design | 5 | Whether data collection was planned before the index test and reference standard were performed (prospective study) or after (retrospective study) |
| Ethics | 6* | Formal approval from an ethics committee. If not required, justify why |
| Participants | 7† | Eligibility criteria: listing separate inclusion and exclusion criteria in the order that they are applied at both participant level and data level |
| | 8 | On what basis potentially eligible participants were identified (such as symptoms, results from previous tests and inclusion in registry) |
| | 9 | Where and when potentially eligible participants were identified (setting, location and dates) |
| | 10 | Whether participants formed a consecutive, random or convenience series |
| Dataset | 11* | Source of the data and whether they have been routinely collected, specifically collected for the purpose of the study or acquired from an open-source repository |
| | 12* | Who undertook the annotations for the dataset (including experience levels and background) and how (within the same clinical context or in a post hoc fashion), if applicable |
| | 13* | Devices (manufacturer and model) that were used to capture data; software (with version number) used to engineer the index test, highlighting the intended use |
| | 14* | Data acquisition protocols (for example, contrast protocol or reconstruction method for medical images) and details of data preprocessing, in sufficient detail to allow replication |
| Test methods | 15a | Index test, in sufficient detail to allow replication |
| | 15b* | How the index test was developed, including any training, validation, testing and external evaluation, detailing sample sizes, when applicable |
| | 15c | Definition of and rationale for test positivity cutoffs or result categories of the index test, distinguishing prespecified from exploratory |
| | 15d* | The specified end-user of the index test and the level of expertise required of users |
| | 16a | Reference standard, in sufficient detail to allow replication |
| | 16b | Rationale for choosing the reference standard (if alternatives exist) |
| | 16c | Definition of and rationale for test positivity cutoffs or result categories of the reference standard, distinguishing prespecified from exploratory |
| | 17a | Whether clinical information and reference standard results were available to the performers or readers of the index test |
| | 17b | Whether clinical information and index test results were available to the assessors of the reference standard |
| Analysis | 18 | Methods for estimating or comparing measures of diagnostic accuracy |
| | 19 | How indeterminate index test or reference standard results were handled |
| | 20 | How missing data on the index test and reference standard were handled |
| | 21 | Any analyses of variability in diagnostic accuracy, distinguishing prespecified from exploratory |
| | 22 | Intended sample size and how it was determined |
| | 23* | Details of any performance error analysis and algorithmic bias and fairness assessments, if undertaken |
| Results | | |
| Participants and dataset | 24 | Flow of participants, using a diagram |
| | 25† | Baseline demographic, clinical and technical characteristics of training, validation and test sets, if applicable |
| | 26a | Distribution of severity of disease in those with the target condition |
| | 26b | Distribution of alternative diagnoses in those without the target condition |
| | 27 | Time interval and any clinical interventions between index test and reference standard |
| | 28* | Whether the datasets represent the distribution of the target condition that one would expect from the intended use population |
| | 29* | For external evaluation on an independent dataset, an assessment of how this differs from the training, validation and test sets |
| Test results | 30 | Cross-tabulation of the index test results (or their distribution) by the results of the reference standard |
| | 31 | Estimates of diagnostic accuracy and their precision (such as 95% confidence intervals) |
| | 32 | Any adverse events from performing the index test or the reference standard |
| Discussion | | |
| | 33 | Study limitations, including sources of potential bias, statistical uncertainty and generalizability |
| | 34 | Implications for practice, including the intended use and clinical role of the index test |
| | 35* | Ethical considerations and adherence to ethical standards associated with the use of the index test and issues of fairness |
| Other information | | |
| | 36 | Registration number and name of registry |
| | 37 | Where the full study protocol can be accessed |
| | 38 | Sources of funding and other support; role of funders |
| | 39* | Commercial interests, if applicable |
| | 40a* | Availability of datasets and code, detailing any restrictions on their reuse and repurposing |
| | 40b* | Whether outputs are stored, auditable and available for evaluation, if necessary |
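Items 18, 30 and 31 together ask authors to report how diagnostic accuracy was estimated, the cross-tabulation of index test results against the reference standard, and the accuracy estimates with their precision. As a minimal sketch of what such reporting rests on, the Python snippet below builds the 2×2 table and computes sensitivity and specificity with Wilson score 95% confidence intervals; the counts are invented for illustration, and the Wilson interval is one reasonable choice among several (the guideline does not prescribe a method).

```python
import math

def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a proportion k/n (z = 1.96 for 95%)."""
    p = k / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

# Hypothetical counts: index test result vs. reference standard (item 30).
tp, fp, fn, tn = 90, 12, 10, 88

# Cross-tabulation of index test results by reference standard results.
print(f"{'':12}{'Ref +':>8}{'Ref -':>8}")
print(f"{'Index +':12}{tp:>8}{fp:>8}")
print(f"{'Index -':12}{fn:>8}{tn:>8}")

# Accuracy estimates with their precision (item 31).
for name, k, n in [("Sensitivity", tp, tp + fn), ("Specificity", tn, tn + fp)]:
    low, high = wilson_ci(k, n)
    print(f"{name}: {k / n:.3f} (95% CI {low:.3f} to {high:.3f})")
```

The Wilson interval is used here rather than the simple Wald interval because it behaves better when sensitivity or specificity is close to 1, a common situation for diagnostic AI models.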