Table 3 Benchmarking workflow, steps and interconnections with roles
From: Federated benchmarking of medical artificial intelligence with MedPerf
| Step | Workflow step | Objective |
|---|---|---|
| 1 | Define and register benchmark | ● The benchmarking process starts with establishing a benchmark committee of healthcare stakeholders: healthcare organizations, clinical experts, AI researchers and patient advocacy groups. ● The benchmark committee identifies a clinical problem for which an effective AI-based solution can have a substantial clinical impact. ● The benchmark committee registers the benchmark on the platform and provides the benchmark assets (see ‘MedPerf Benchmarks’); a hypothetical registration record is sketched after the table. |
| 2 | Recruit data owners | ● The benchmark committee recruits data and model owners, either by inviting trusted parties or by making an open call for participation. ● Dataset owners are recruited so as to maximize aggregate dataset size and diversity on a global scale, although many benchmarking efforts may initially focus on data providers with existing agreements. |
| | Prepare and register datasets | ● In coordination with the benchmark committee, dataset owners are responsible for data preparation (that is, extraction, preprocessing, labelling and review for legal and ethical compliance). ● Once the data are prepared and approved by the data owner, the dataset can be registered with the benchmarking platform (see the registration sketch after the table). |
| 3 | Recruit model owners | ● Model owners modify the benchmark reference implementation. To enable consistent execution on data owners’ systems, solutions are packaged inside MLCube containers (a minimal packaging sketch follows the table). ● Model owners must conduct appropriate legal and ethical review before submitting a solution for evaluation. |
| | Prepare and register models | ● Once implemented by the model owner and approved by the benchmark committee, the model can be registered on the platform. |
| 4 | Execute benchmarks | ● Once the benchmark, dataset and models are registered on the benchmarking platform, the platform notifies data owners that models are available for benchmarking. ● The data owner runs a benchmarking client that downloads the available models, reviews and approves them for safety, and then approves execution (see the execution sketch after the table). ● Once execution is complete, the data owner reviews the results and approves their upload to the benchmark platform. |
| 5 | Release results | ● Benchmark results are aggregated by the benchmarking platform and shared per the policy specified by the benchmark committee, following data owners’ approval. |
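
The registration actions in steps 1 and 2 exchange metadata rather than data. The sketch below is a hypothetical illustration of what such registration records might contain; the field names, UIDs and URL are assumptions made for illustration and do not reproduce the actual MedPerf API schema.

```python
# Hypothetical registration records (field names and values are illustrative
# assumptions, not the actual MedPerf API schema). A benchmark registration
# references the benchmark assets as container UIDs; a dataset registration
# carries metadata and summary statistics, so the underlying patient data
# remain on the data owner's infrastructure.

benchmark_registration = {
    "name": "example-chest-xray-benchmark",      # hypothetical benchmark name
    "data_preparation_container": 101,           # UID of the data-preparation asset
    "reference_model_container": 102,            # UID of the reference model asset
    "metrics_container": 103,                    # UID of the evaluation-metrics asset
    "docs_url": "https://example.org/benchmark-docs",  # hypothetical documentation link
}

dataset_registration = {
    "benchmark": "example-chest-xray-benchmark",
    "name": "site-A-prepared-dataset",           # hypothetical dataset name
    "num_cases": 250,                            # summary statistic only
    "data_hash": "sha256:<prepared-data-hash>",  # integrity check on the prepared data
    "owner_approved": True,                      # explicit data-owner approval (step 2)
}
```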
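Step 3 packages each solution inside an MLCube container so that every model exposes the same interface when run on a data owner’s system. The following minimal sketch shows what a model owner’s inference entry point could look like, assuming an MLCube-style `infer` task that receives a directory of prepared cases and a directory for predictions; the argument names, JSON file format and `predict()` stub are illustrative assumptions, not the actual MedPerf/MLCube interface.

```python
"""Minimal sketch of a model owner's inference entry point (illustrative only)."""
import argparse
import json
from pathlib import Path


def predict(case: dict) -> dict:
    # Placeholder for the model owner's actual model; returns a dummy score here.
    return {"id": case.get("id"), "prediction": 0.5}


def infer(data_path: Path, output_path: Path) -> None:
    """Read prepared cases, run the model, and write one prediction file per case."""
    output_path.mkdir(parents=True, exist_ok=True)
    for case_file in sorted(data_path.glob("*.json")):
        case = json.loads(case_file.read_text())
        result = predict(case)
        (output_path / case_file.name).write_text(json.dumps(result))


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="MLCube-style infer task (sketch)")
    parser.add_argument("--data_path", type=Path, required=True,
                        help="directory of prepared cases from the data-preparation step")
    parser.add_argument("--output_path", type=Path, required=True,
                        help="directory where predictions are written for the metrics step")
    args = parser.parse_args()
    infer(args.data_path, args.output_path)
```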
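In step 4, all computation happens on the data owner’s infrastructure. The sketch below illustrates the control flow of a hypothetical benchmarking client: each model runs locally only after the data owner approves it, metrics are computed locally, and only the metric values are uploaded, again after explicit approval. The helpers `run_container()` and `upload_results()` are placeholders, not MedPerf client calls.

```python
from dataclasses import dataclass

METRICS_CONTAINER_UID = 103  # hypothetical UID of the benchmark's metrics asset


@dataclass
class ModelEntry:
    """A model made available for benchmarking (hypothetical record)."""
    uid: int
    name: str


def run_container(container_uid: int, inputs: dict) -> dict:
    """Placeholder for running a registered container (model or metrics) locally."""
    return {}


def upload_results(benchmark_uid: int, model_uid: int, metrics: dict) -> None:
    """Placeholder for uploading aggregate metric values only -- never raw data."""


def benchmark_local_dataset(benchmark_uid: int, dataset_path: str,
                            models: list[ModelEntry]) -> None:
    for model in models:
        # The data owner reviews and approves each model before it touches local data.
        if input(f"Run model '{model.name}' on the local dataset? [y/N] ").lower() != "y":
            continue
        predictions = run_container(model.uid, {"data_path": dataset_path})
        metrics = run_container(METRICS_CONTAINER_UID,
                                {"predictions": predictions, "labels_path": dataset_path})
        # Results are uploaded only after the data owner reviews and approves them.
        if input(f"Upload metric values for '{model.name}'? [y/N] ").lower() == "y":
            upload_results(benchmark_uid, model.uid, metrics)
```

The point of the sketch, consistent with steps 4 and 5 of the table, is that raw data and predictions never leave the data owner’s system; only approved metric values reach the platform, where they are aggregated and released per the committee’s policy.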