Fig. 1
From: Domain adaptation of a SMILES chemical transformer to SELFIES with limited computational resources

Overview of the methodology for repurposing a SMILES-pretrained transformer to SELFIES. The workflow includes data collection from PubChem, SELFIES conversion, tokenization checks, domain adaptation via masked language modeling, embedding evaluation, and downstream fine-tuning on ESOL, FreeSolv, and Lipophilicity.
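
As an illustration of the tokenization-check step named in the caption, the following is a minimal sketch and not the authors' procedure. It assumes that SELFIES strings are split into their bracketed symbols and compared against a tokenizer vocabulary; the helper names `split_selfies_symbols` and `unknown_symbol_rate` and the toy vocabulary are hypothetical, and a real check would use the vocabulary of the pretrained SMILES tokenizer.

```python
import re

# A SELFIES string is a sequence of bracketed symbols such as [C], [=O], [Branch1].
SELFIES_SYMBOL = re.compile(r"\[[^\[\]]+\]")


def split_selfies_symbols(selfies_string):
    """Split a SELFIES string into bracketed symbols.

    Raises ValueError if any character falls outside a bracketed symbol.
    Assumption: single-fragment strings only (no '.' separators).
    """
    symbols = SELFIES_SYMBOL.findall(selfies_string)
    if "".join(symbols) != selfies_string:
        raise ValueError(f"not a clean symbol sequence: {selfies_string!r}")
    return symbols


def unknown_symbol_rate(selfies_strings, vocabulary):
    """Return the fraction of symbols that are missing from `vocabulary`."""
    total = 0
    unknown = 0
    for s in selfies_strings:
        for symbol in split_selfies_symbols(s):
            total += 1
            if symbol not in vocabulary:
                unknown += 1
    return unknown / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical toy vocabulary; "[C][C][O]" is the SELFIES form of ethanol (SMILES "CCO").
    toy_vocabulary = {"[C]", "[O]"}
    corpus = ["[C][C][O]", "[C][=O]"]
    print(split_selfies_symbols(corpus[0]))
    print(unknown_symbol_rate(corpus, toy_vocabulary))
```

A high unknown-symbol rate under a check of this kind would suggest that the existing vocabulary covers SELFIES poorly, which is the sort of mismatch the domain-adaptation step is meant to address.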