Table 3 Benchmarking workflow, steps and interconnections with roles

From: Federated benchmarking of medical artificial intelligence with MedPerf

Step 1: Define and register benchmark
- The benchmarking process starts with establishing a benchmark committee of healthcare stakeholders: healthcare organizations, clinical experts, AI researchers and patient advocacy groups.
- The benchmark committee identifies a clinical problem for which an effective AI-based solution can have a substantial clinical impact.
- The benchmark committee registers the benchmark on the platform and provides the benchmark assets (see ‘MedPerf Benchmarks’); a registration sketch follows this list.
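For concreteness, the sketch below shows what such a registration might look like programmatically, assuming a simple REST API. The server URL, payload fields and asset identifiers are illustrative assumptions, not MedPerf's actual interface; the point is that a benchmark record bundles committee metadata with references to the three benchmark assets (data preparation, reference model and evaluator containers).

```python
import requests

# Hypothetical REST endpoint and payload layout; the real MedPerf server API
# may differ. Shown only to illustrate what a benchmark registration carries:
# committee metadata plus references to the three benchmark assets.
SERVER = "https://medperf-server.example"  # placeholder URL


def register_benchmark(token: str) -> dict:
    """Submit a new benchmark definition on behalf of the committee."""
    payload = {
        "name": "brain-tumor-segmentation",  # clinical task chosen by the committee
        "demo_dataset_url": "https://example.org/demo.tar.gz",  # public sample for sanity checks
        "data_preparation_mlcube": 101,  # container that standardizes raw data
        "reference_model_mlcube": 102,   # baseline model implementation
        "evaluator_mlcube": 103,         # container that computes the metrics
    }
    resp = requests.post(
        f"{SERVER}/benchmarks/",
        json=payload,
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```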

Step 2: Recruit data owners; prepare and register datasets
- The benchmark committee recruits data and model owners, either by inviting trusted parties or by making an open call for participation.
- Dataset owners are recruited to maximize aggregate dataset size and diversity on a global scale, although many benchmarking efforts may initially focus on data providers with existing agreements.
- In coordination with the benchmark committee, dataset owners are responsible for data preparation (that is, extraction, preprocessing, labelling and review for legal and ethical compliance).
- Once the data are prepared and approved by the data owner, the dataset can be registered with the benchmarking platform, as in the sketch below.
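A minimal data-owner-side sketch follows, assuming the committee ships its preparation step as an MLCube and that registration carries only metadata. The cube name, paths, task parameters and record fields are hypothetical; the command line follows MLCube's documented `mlcube run --task=...` pattern but is illustrative. What the sketch preserves is the federated guarantee: raw data never leaves the data owner's machine.

```python
import hashlib
import subprocess
from pathlib import Path

# Illustrative data-owner flow: run the committee-provided preparation cube
# locally, then build a registration record containing a fingerprint of the
# prepared data rather than the data itself. All names are assumptions.
RAW = Path("/data/raw_scans")      # local, private
PREPARED = Path("/data/prepared")  # output of the preparation cube


def prepare_and_register() -> dict:
    # 1. Run the preparation MLCube on the local data (hypothetical cube name).
    subprocess.run(
        ["mlcube", "run", "--mlcube=data_prep", "--task=prepare",
         f"data_path={RAW}", f"output_path={PREPARED}"],
        check=True,
    )
    # 2. Fingerprint the prepared dataset so the platform can identify it
    #    without ever seeing the records themselves.
    digest = hashlib.sha256()
    for f in sorted(PREPARED.rglob("*")):
        if f.is_file():
            digest.update(f.read_bytes())
    # 3. This metadata record is all that would be sent to the server.
    return {
        "name": "site-A-mri",
        "generated_uid": digest.hexdigest(),
        "num_cases": sum(1 for p in PREPARED.iterdir() if p.is_dir()),
    }
```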

Step 3: Recruit model owners; prepare and register models
- Model owners modify the benchmark reference implementation. To enable consistent execution on data owners’ systems, solutions are packaged inside MLCube containers (see the entry-point sketch after this list).
- Model owners must conduct appropriate legal and ethical review before submitting a solution for evaluation.
- Once implemented by the model owner and approved by the benchmark committee, the model can be registered on the platform.
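MLCube containers expose their functionality as named tasks with declared inputs and outputs. The sketch below is a hypothetical entry point for a model cube's inference task; the task contract (argument names, file layout) is assumed for illustration and is not MedPerf's exact specification.

```python
import argparse
from pathlib import Path

# Hypothetical entry point that a model MLCube's container image would invoke
# for its inference task. The argument names mirror the task's declared inputs
# and outputs but are illustrative, not MedPerf's exact contract.
def main() -> None:
    parser = argparse.ArgumentParser(description="Model cube inference task")
    parser.add_argument("--data_path", type=Path, required=True,
                        help="prepared dataset, mounted read-only by the runner")
    parser.add_argument("--parameters_file", type=Path, required=True,
                        help="hyperparameters shipped inside the cube")
    parser.add_argument("--output_path", type=Path, required=True,
                        help="where predictions are written for the evaluator")
    args = parser.parse_args()

    args.output_path.mkdir(parents=True, exist_ok=True)
    for case in sorted(args.data_path.iterdir()):
        # Placeholder inference: a real cube would load weights and run the model.
        (args.output_path / f"{case.stem}.txt").write_text(f"prediction for {case.name}\n")


if __name__ == "__main__":
    main()
```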

Step 4: Execute benchmarks
- Once the benchmark, dataset and models are registered with the benchmarking platform, the platform notifies the data owners that models are available for benchmarking.
- The data owner runs a benchmarking client that downloads the available models, reviews and approves them for safety, and then approves execution (the control flow is sketched below).
- Once execution is complete, the data owner reviews and approves upload of the results to the benchmarking platform.
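The client's control flow can be summarized in a toy, self-contained sketch; every name below is a stand-in rather than the real client API. What it captures is the approval sequence: nothing executes, and no result leaves the site, without the data owner's explicit consent.

```python
from dataclasses import dataclass
from typing import Callable

# Toy stand-in for a packaged model: in reality this is an MLCube container.
@dataclass
class Model:
    id: int
    name: str
    infer: Callable[[list[str]], list[str]]


def exact_match_metric(predictions: list[str], labels: list[str]) -> float:
    """Placeholder metric (stand-in for the evaluator cube)."""
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)


def run_benchmark(models: list[Model], data: list[str], labels: list[str],
                  approve: Callable[[str], bool]) -> dict[int, float]:
    results: dict[int, float] = {}
    for model in models:
        if not approve(f"Run model '{model.name}' on local data?"):
            continue                      # data owner can reject any model
        predictions = model.infer(data)   # executes entirely on the owner's machine
        score = exact_match_metric(predictions, labels)
        if approve(f"Upload score {score:.2f} for '{model.name}'?"):
            results[model.id] = score     # only aggregate metrics leave the site
    return results


if __name__ == "__main__":
    data, labels = ["a", "b", "c"], ["a", "b", "x"]
    models = [Model(1, "baseline", lambda xs: list(xs))]
    print(run_benchmark(models, data, labels, approve=lambda msg: True))
```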

Step 5: Release results
- Benchmark results are aggregated by the benchmarking platform and shared per the policy specified by the benchmark committee, following the data owners’ approval (a toy aggregation policy is sketched below).
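As a toy illustration of policy-driven release, the sketch below aggregates per-site scores and reports a model only when enough sites approved sharing; the specific threshold is an invented example, not a MedPerf rule.

```python
from statistics import mean

# Invented example policy: report a per-model average only when at least
# `min_sites` data owners approved sharing their result.
def aggregate(results: list[dict], min_sites: int = 3) -> dict[str, float]:
    by_model: dict[str, list[float]] = {}
    for r in results:
        if r["approved"]:  # honor each data owner's decision
            by_model.setdefault(r["model"], []).append(r["score"])
    return {m: mean(s) for m, s in by_model.items() if len(s) >= min_sites}


sites = [
    {"model": "baseline", "score": 0.81, "approved": True},
    {"model": "baseline", "score": 0.77, "approved": True},
    {"model": "baseline", "score": 0.85, "approved": True},
    {"model": "candidate", "score": 0.90, "approved": False},  # withheld by its site
]
print(aggregate(sites))  # only 'baseline' meets the policy and is released
```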