Data-driven computational methods have shown promise in predicting compound activities from chemical structures; however, unbiased practical application remains challenging because proper benchmarking methods are lacking. Here, the authors develop a benchmark termed CARA to eliminate the biases in current compound activity data by using new train-test splitting schemes and evaluation metrics, yielding accurate and informative assessments of model performance.
- Tingzhong Tian
- Shuya Li
- Jianyang Zeng