Table 2 Tests for which we compare different score functions (score12, talaris2013, talaris2014, ref2015, ligand, betaNov16, mpframework, ref2015mem, and franklin2019), complete with quality measures, number of targets in each benchmark, number of models created (nstruct), and runtime in CPU hours per score function.


| Test suite | Tests | score12 | ligand | mpframework | talaris13 | talaris14 | ref2015 | ref2015mem | betaNov16 | franklin2019 | Quality measures | Targets | nstruct | Runtime (CPUh) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Docking | docking | x |  |  | x | x | x |  |  |  | I_sc vs. I_rmsd | 10 | 1000 | 150 |
| Design | design_fast | x |  |  | x | x | x |  |  |  | Score vs. seqrec | 48 | 100 | 2600 |
| Loop modeling | loop_modeling_CCD | x |  |  | x | x | x |  |  |  | Score vs. loop_rmsd | 7 | 500 | 500 |
|  | loop_modeling_KIC | x |  |  | x | x | x |  |  |  | Score vs. loop_rmsd | 7 | 500 | 620 |
|  | loop_modeling_KIC_fragments | x |  |  | x | x | x |  |  |  | Score vs. loop_rmsd | 7 | 500 | 760 |
|  | loop_modeling_NGK | x |  |  | x | x | x |  |  |  | Score vs. loop_rmsd | 7 | 500 | 570 |
| Refinement | relax_fast | x |  |  | x | x | x |  |  |  | Score vs. rmsd | 12 | 100 | 120 |
|  | relax_fast5 | x |  |  | x | x | x |  |  |  | Score vs. rmsd | 12 | 100 | 120 |
|  | relax_cart | x |  |  | x | x | x |  |  |  | Score vs. rmsd | 12 | 100 | 120 |
| Ligand docking | ligand_docking |  | x |  |  | x | x |  | x |  | Delta_Isc vs. ligand_rmsd | 50 | 200 | 2000 |
| Membrane proteins | mp_ddg (ddG of mutation) |  |  | x |  |  | x | x |  | x | Pearson correlation | 3 | 50 | 1800 |

1. The ligand docking and membrane ddG applications require specialized score functions.
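
To make the quality-measure columns concrete, below is a minimal Python sketch of how two of them could be evaluated from a set of scored models. The Pearson correlation matches the measure named for mp_ddg; the funnel summary (fraction of near-native models among the top-scoring ones, with a 2.0 Å cutoff and top_n = 10) is an illustrative assumption for a "Score vs. rmsd"-style measure, not the benchmark framework's actual definition.

```python
import numpy as np
from scipy.stats import pearsonr


def pearson_quality(predicted_ddg, experimental_ddg):
    """Pearson correlation between predicted and experimental ddG values
    (the quality measure listed for the mp_ddg test)."""
    r, _ = pearsonr(predicted_ddg, experimental_ddg)
    return r


def funnel_quality(scores, rmsds, top_n=10, rmsd_cutoff=2.0):
    """Illustrative score-vs-rmsd funnel summary: the fraction of the
    top_n lowest-scoring models whose rmsd falls below a cutoff.
    Both top_n and the 2.0 A cutoff are assumptions of this sketch."""
    order = np.argsort(scores)                   # lowest (best) score first
    top_rmsds = np.asarray(rmsds)[order[:top_n]]
    return float(np.mean(top_rmsds < rmsd_cutoff))


# Synthetic example for a single target with 100 models (as if nstruct = 100).
rng = np.random.default_rng(0)
scores = rng.normal(size=100)
rmsds = 1.0 + 3.0 * rng.random(100)
print(funnel_quality(scores, rmsds))
print(pearson_quality([1.2, 0.4, 2.1], [1.0, 0.6, 1.8]))
```

In a full benchmark, a measure like this would be computed per target over all nstruct models and then aggregated across the target set for each score function.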