Table 3 Ablation study of the MoleculeFormer model on four datasets, using RMSE as the evaluation metric

From: MoleculeFormer is a GCN-transformer architecture for molecular property prediction

| RMSE | GCN | Atom | Atom + Bond | FP | Atom + Bond + FP | MoleculeFormer |
| --- | --- | --- | --- | --- | --- | --- |
| HLM | 0.571 ± 0.020 | 0.466 ± 0.017 | 0.468 ± 0.012 | 0.472 ± 0.018 | 0.464 ± 0.021 | 0.462 ± 0.021 |
| MDR1-MDCK ER | 0.640 ± 0.027 | 0.465 ± 0.028 | 0.466 ± 0.018 | 0.463 ± 0.031 | 0.465 ± 0.018 | 0.453 ± 0.013 |
| RLM | 0.661 ± 0.023 | 0.553 ± 0.021 | 0.546 ± 0.013 | 0.545 ± 0.025 | 0.562 ± 0.018 | 0.510 ± 0.042 |
| SOLUBILITY | 0.636 ± 0.028 | 0.563 ± 0.047 | 0.561 ± 0.038 | 0.562 ± 0.043 | 0.555 ± 0.015 | 0.556 ± 0.045 |
| Average | 0.627 ± 0.025 | 0.512 ± 0.028 | 0.510 ± 0.020 | 0.511 ± 0.029 | 0.511 ± 0.018 | 0.495 ± 0.030 |
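The last row of the table appears to be the arithmetic mean of the four dataset rows. The short check below is a sketch under that assumption: it transcribes the RMSE means from the table (the ± terms are left out), recomputes each column mean, and compares it with the reported value. The variable names are illustrative only.

```python
# RMSE means transcribed from Table 3; the ± terms are omitted.
COLUMNS = ["GCN", "Atom", "Atom + Bond", "FP", "Atom + Bond + FP", "MoleculeFormer"]
ROWS = {
    "HLM":          [0.571, 0.466, 0.468, 0.472, 0.464, 0.462],
    "MDR1-MDCK ER": [0.640, 0.465, 0.466, 0.463, 0.465, 0.453],
    "RLM":          [0.661, 0.553, 0.546, 0.545, 0.562, 0.510],
    "SOLUBILITY":   [0.636, 0.563, 0.561, 0.562, 0.555, 0.556],
}
REPORTED_AVERAGE = [0.627, 0.512, 0.510, 0.511, 0.511, 0.495]

# zip(*rows) turns the four dataset rows into six per-model columns.
column_means = [sum(col) / len(col) for col in zip(*ROWS.values())]

for name, mean, reported in zip(COLUMNS, column_means, REPORTED_AVERAGE):
    print(f"{name:>18}: computed {mean:.4f}, reported {reported:.3f}")
```

Under this reading, every recomputed mean agrees with the reported value to within 0.001, and the MoleculeFormer column has the lowest average RMSE of the six configurations.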

  1. HLM: log-transformed intrinsic clearance (CLint) in human liver microsomes (HLM). MDR1-MDCK ER: log-transformed efflux ratio (ER) in the MDR1-MDCK cell model. RLM: log-transformed intrinsic clearance (CLint) in rat liver microsomes (RLM). SOLUBILITY: log-transformed drug solubility at pH 6.8. Every ablation experiment used the same 10 random split seeds, so each model was trained and evaluated on the same 10 sets of training, validation, and test data. The training, validation, and test sets were split in an 8:1:1 ratio.
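The split protocol in the note above can be sketched as follows. This is a minimal illustration, not the authors' code: the seed values, the dataset size, and the helper names `split_indices`, `rmse`, and `summarize` are all hypothetical. It also assumes that the ± values in the table are standard deviations over the 10 splits, which the note does not state.

```python
import math
import random
import statistics

def split_indices(n_samples, seed, ratios=(0.8, 0.1, 0.1)):
    """Shuffle sample indices with a fixed seed and cut them 8:1:1."""
    rng = random.Random(seed)          # same seed gives the same split
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(ratios[0] * n_samples)
    n_val = round(ratios[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]   # remainder goes to the test set
    return train, val, test

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def summarize(per_seed_rmse):
    """Mean and sample standard deviation over the per-seed RMSE values."""
    return statistics.mean(per_seed_rmse), statistics.stdev(per_seed_rmse)

SEEDS = list(range(10))   # hypothetical seed values
N_SAMPLES = 1000          # hypothetical dataset size

# Build the 10 splits once and reuse them for every ablation variant,
# so that all models see identical train/validation/test sets.
splits = {seed: split_indices(N_SAMPLES, seed) for seed in SEEDS}
```

Building the splits once and sharing them across all variants means that differences between columns reflect the model configuration rather than the data partition.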