Fig. 3: Schematic representation of DEL screen output data and ML/AI workflow. | Nature Reviews Chemistry

From: Protein–ligand data at scale to support machine learning

Affinity-mediated selection of DNA-encoded chemical library (DEL) members leads to the enrichment of potential binders. Deep sequencing is used to identify the DNA barcodes of enriched and unenriched DEL members. Output data are subsequently translated into chemical structures and their corresponding chemical fingerprints. Both positive (enriched) and negative (not enriched) DEL members are included in open machine learning (ML)-ready datasets. The datasets are used to train ML models that are in turn used to recognize and nominate potential small-molecule binders from ultralarge chemical libraries. These compounds are procured, and their binding is tested experimentally in biophysical and/or biochemical assays. All generated data (ML-ready datasets, including chemical structures and/or their corresponding fingerprints, ML models and ligand validation data) are made public in a purpose-built, cloud-based storage system called Artificial Intelligence-Ready CHEmiCal Knowledge base (AIRCHECK). AI, artificial intelligence.
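The data flow in the caption (structures to fingerprints, labelled enriched and not-enriched members, then ranking of a larger library) can be illustrated with a deliberately small sketch. Everything here is an assumption made for illustration: the SMILES strings are made-up placeholders rather than screen data, `toy_fingerprint` hashes SMILES substrings as a stand-in for a real chemical fingerprint, and the nearest-neighbour score stands in for a trained ML model. It is not the method used by the authors or by AIRCHECK.

```python
import zlib

N_BITS = 256  # hypothetical fingerprint length, chosen only for this sketch


def toy_fingerprint(smiles, n_bits=N_BITS, max_len=3):
    """Hash every SMILES substring of length 1..max_len to a bit index.

    A stand-in for a real chemical fingerprint; it is not chemically aware.
    """
    bits = set()
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            bits.add(zlib.crc32(fragment.encode("utf-8")) % n_bits)
    return frozenset(bits)


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def score(candidate_fp, enriched_fps, unenriched_fps):
    """Highest similarity to an enriched member minus highest similarity
    to a not-enriched member. Higher means more binder-like."""
    best_pos = max(tanimoto(candidate_fp, fp) for fp in enriched_fps)
    best_neg = max(tanimoto(candidate_fp, fp) for fp in unenriched_fps)
    return best_pos - best_neg


def nominate(library, enriched, unenriched, top_k=1):
    """Rank library SMILES by score and return the top_k candidates."""
    pos = [toy_fingerprint(s) for s in enriched]
    neg = [toy_fingerprint(s) for s in unenriched]
    ranked = sorted(
        library,
        key=lambda s: score(toy_fingerprint(s), pos, neg),
        reverse=True,
    )
    return ranked[:top_k]


# Placeholder labels: one "enriched" and one "not enriched" DEL member.
enriched = ["c1ccccc1C(=O)N"]
unenriched = ["CCCCCCCC"]
library = ["CCCCCCCC", "c1ccccc1C(=O)N"]
nominated = nominate(library, enriched, unenriched, top_k=1)
```

In a realistic setting the fingerprints would come from a cheminformatics toolkit and the scoring step would be a model trained on both the positive and negative labels; the nominated compounds would then go on to the experimental binding assays described in the caption.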
