Table 2 Sample benchmarking challenges
From: Protein–ligand data at scale to support machine learning
Data | Challenge | Experimental validation |
---|---|---|
SMILES and/or fingerprint and enrichment metrics of DNA-encoded chemical library (DEL) screening hits and negatives from a 4–10B compound library | Train machine learning (ML)/artificial intelligence (AI) models on DEL screening data and use them to predict actives from billions of commercial compounds | Procure and test predicted hits with two orthogonal assays |
300k affinity-selection mass spectrometry (AS–MS) compound library (SMILES) and protein target | Predict true and false positives | Compare predictions with screening results, annotated with orthogonal assays |
AS–MS screening and orthogonal hit confirmation data for 80% of a 300k compound library | Challenge 1: predict confirmed hits for the remaining 20% hold-out set; Challenge 2: if successful, predict novel hits from commercial libraries | Challenge 1: unblind existing data from the hold-out set; Challenge 2: procure and test predicted hits with two orthogonal assays |
300k AS–MS compound library (SMILES), protein target and annotated screening results (including orthogonal hit verification) | Challenge 1: use target-based and/or receptor-based virtual screening to predict experimental hits; Challenge 2: if successful, predict novel hits from commercial libraries | Challenge 1: unblind existing data; Challenge 2: procure and test predicted hits with two orthogonal assays |
SMILES and/or fingerprint and enrichment metrics of DEL screening hits and negatives from 4–10B compound libraries against hundreds of targets | Build a foundation model to predict hits from commercial libraries for targets absent from the training set | Procure and test predicted hits with two orthogonal assays |
AS–MS screening and orthogonal confirmation data for 80% of >1,000 targets | Predict hits for homologous and/or unrelated targets | Procure and test predicted hits with two orthogonal assays |
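
The 80%/20% hold-out design in the AS–MS challenges above can be illustrated with a minimal sketch. The table does not say how the hold-out set is chosen; the code below assumes a simple per-compound split keyed on a hash of the SMILES string, so that the assignment is reproducible without storing a random seed. The function names `assign_split` and `split_library` are hypothetical, and the sketch assumes SMILES strings were canonicalized upstream (two different SMILES for the same molecule would otherwise be treated as different compounds). A scaffold-based or cluster-based split would be a stricter alternative.

```python
import hashlib


def assign_split(smiles: str, holdout_fraction: float = 0.2) -> str:
    """Assign a compound to 'holdout' or 'train' from a stable hash of its SMILES.

    Hypothetical helper: the hash-based rule is an assumption for illustration,
    not the procedure used to build any particular benchmark.
    """
    digest = hashlib.sha256(smiles.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "holdout" if bucket < holdout_fraction else "train"


def split_library(smiles_list, holdout_fraction: float = 0.2):
    """Split a list of SMILES strings into (train, holdout) lists."""
    train, holdout = [], []
    for smiles in smiles_list:
        if assign_split(smiles, holdout_fraction) == "holdout":
            holdout.append(smiles)
        else:
            train.append(smiles)
    return train, holdout


if __name__ == "__main__":
    # Toy library of linear alkanes, used only to exercise the split.
    library = ["C" * n for n in range(1, 1001)]
    train, holdout = split_library(library)
    print(len(train), len(holdout))
```

Because the assignment depends only on the SMILES string, the same compound lands in the same partition on every run, which keeps the blinded hold-out labels separated from the training data across repeated experiments.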