Fig. 1: Data generation pipeline. | Nature Reviews Chemistry

From: Protein–ligand data at scale to support machine learning

The workflow for generating data and binders. (1) Purified proteins are produced in experimental hubs, by partners or by the community. (2) Proteins are screened in project screening laboratories, and the resulting data are experimentally annotated in partner laboratories or experimental hubs. (3) Quality-controlled (QC) screening data are deposited into the Artificial Intelligence-Ready CHEmiCal Knowledge base (AIRCHECK). (4) Computational experts in the project and the community build machine learning (ML)/artificial intelligence (AI) models and use them to predict new or improved binders. (5) Predicted compounds are procured and tested in experimental hubs. (6) The quality-controlled assay data, including hits and binding data, are deposited into AIRCHECK. (7) Hits and data are released to the community and made freely available for further research and development.
