**Table 1** Comparison of classification performance (AUC-ROC) between fine-tuned MLM-FG and existing pre-trained/self-supervised baselines on multiple classification benchmarks
From: *Pre-trained molecular language models with random functional group masking*
Model | BBBP | BACE | ClinTox | Tox21 | SIDER | HIV | MUV |
---|---|---|---|---|---|---|---|
No. molecules | 2039 | 1513 | 1478 | 7831 | 1427 | 41,127 | 93,087 |
No. prediction tasks | 1 | 1 | 2 | 12 | 27 | 1 | 17 |
**Pre-trained models from existing literature** | | | | | | | |
MolCLR-GIN | 0.9307 | 0.7873 | 0.8005 | 0.7644 | 0.5826 | 0.7768 | 0.7386 |
MolCLR-GCN | 0.8432 | 0.7194 | 0.7997 | 0.7179 | 0.5353 | 0.7616 | 0.6701 |
GROVER-base | 0.9022 | 0.7700 | 0.6847 | 0.7187 | 0.5579 | 0.6950 | 0.6265 |
GROVER-large | 0.8861 | 0.7795 | 0.6082 | 0.7155 | 0.5283 | 0.6956 | 0.5132 |
GEM | 0.9103 | 0.8603 | 0.8506 | 0.7791 | 0.6279 | 0.7500 | 0.7253 |
MoLFormer | 0.9037 | 0.8275 | 0.9451 | 0.7734 | 0.5826 | 0.7630 | 0.7599 |
**MoLFormer and RoBERTa models without pre-training** | | | | | | | |
MoLFormer (from scratch) | 0.8636 | 0.7728 | 0.7317 | 0.7461 | 0.5667 | 0.6991 | 0.6863 |
RoBERTa (from scratch) | 0.8711 | 0.7445 | 0.8858 | 0.7369 | 0.5285 | 0.5575 | 0.6674 |
**RoBERTa models pre-trained by random subsequence masking** | | | | | | | |
RoBERTa (10M, rand. subseq.) | 0.8572 | 0.8253 | 0.9284 | 0.7533 | 0.6111 | 0.7006 | 0.6234 |
RoBERTa (20M, rand. subseq.) | 0.9068 | 0.8135 | 0.9011 | 0.7635 | 0.5799 | 0.7477 | 0.6481 |
RoBERTa (100M, rand. subseq.) | 0.9048 | 0.8248 | 0.9167 | 0.7852 | 0.5860 | 0.7683 | 0.6909 |
**MoLFormer and RoBERTa models pre-trained by MLM-FG** | | | | | | | |
MLM-FG (MoLFormer, 10M) | 0.8980 | 0.8044 | 0.9669 | 0.7765 | 0.5811 | 0.7633 | 0.6829 |
MLM-FG (MoLFormer, 20M) | 0.8976 | 0.8088 | 0.9436 | 0.7793 | 0.5992 | 0.7801 | 0.7185 |
MLM-FG (MoLFormer, 100M) | 0.9055 | 0.8040 | 0.9270 | 0.7893 | 0.5786 | 0.7690 | 0.6017 |
MLM-FG (RoBERTa, 10M) | 0.8870 | 0.8265 | 0.9258 | 0.7545 | 0.6054 | 0.7106 | 0.6103 |
MLM-FG (RoBERTa, 20M) | 0.9378 | 0.8458 | 0.8919 | 0.7603 | 0.5908 | 0.7594 | 0.6428 |
MLM-FG (RoBERTa, 100M) | 0.9237 | 0.7981 | 0.9606 | 0.7896 | 0.6042 | 0.7807 | 0.7990 |
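Several of these benchmarks bundle multiple prediction tasks (e.g., 12 for Tox21, 27 for SIDER), yet the table reports a single AUC-ROC per dataset. The standard MoleculeNet convention is to compute ROC-AUC per task, skip molecules with missing labels, and average across tasks. The sketch below illustrates that aggregation; the `multitask_auc_roc` helper and the toy data are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def multitask_auc_roc(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Mean per-task AUC-ROC for a multi-label benchmark such as Tox21.

    y_true:  (n_molecules, n_tasks) binary labels; NaN marks a missing label.
    y_score: (n_molecules, n_tasks) predicted probabilities.
    """
    aucs = []
    for task in range(y_true.shape[1]):
        labels, scores = y_true[:, task], y_score[:, task]
        mask = ~np.isnan(labels)        # MoleculeNet datasets have label gaps
        labels, scores = labels[mask], scores[mask]
        if np.unique(labels).size < 2:  # AUC is undefined with one class
            continue
        aucs.append(roc_auc_score(labels, scores))
    return float(np.mean(aucs))

# Toy example: 12 Tox21-style tasks with ~20% missing labels.
# Random scores should land near the 0.5 chance baseline.
rng = np.random.default_rng(0)
y_true = (rng.random((7831, 12)) < 0.1).astype(float)
y_true[rng.random((7831, 12)) < 0.2] = np.nan
y_score = rng.random((7831, 12))
print(f"mean AUC-ROC: {multitask_auc_roc(y_true, y_score):.4f}")
```

Under this convention, a single scalar such as 0.7893 for Tox21 is a mean over 12 task-level AUCs, so small differences between models can hide task-level variation.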