Fig. 2: Conditional generation with SmileyLlama for fragment growth, and property distributions before and after DPO compared with ChEMBL.
From: SmileyLlama: modifying large language models for directed chemical space exploration

a, Example molecules generated by growing one of the Enamine substructures while satisfying Lipinski’s rule-of-five, using the prompt ‘Output a SMILES string for a drug like molecule with the following properties: a substructure of O=C(O)c1ccc(C(F)(F)F)cc1, < = 500 MW, < = 5 logP, < = 5 H-bond donors, < = 10 H-bond acceptors’. b, Distributions of the four Lipinski rule-of-five properties for ChEMBL molecules (orange), molecules generated by SmileyLlama (blue) with the prompt ‘Output a SMILES string for a drug like molecule with the following properties: < = 5 H-bond donors, < = 10 H-bond acceptors, < = 500 MW, < = 5 logP’, and 1,000 molecules generated by SmileyLlama with the same prompt after DPO (gray). MW and logP distributions were estimated using a Gaussian kernel density estimator (KDE). For all results, 1,000 molecules were generated at a temperature T = 1.0 with a maximum of 128 new tokens.
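
The Gaussian KDE used for the MW and logP curves in panel b can be illustrated with a minimal sketch. The caption does not state the bandwidth or the software used, so the normal-reference rule of thumb below is an assumption, the helper names (`rule_of_thumb_bandwidth`, `gaussian_kde`) are hypothetical, and the molecular weights are made-up illustration values, not data from the paper.

```python
import math
import statistics


def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth, 1.06 * sigma * n^(-1/5).

    Assumed choice: the caption does not say which bandwidth was used.
    """
    n = len(samples)
    sigma = statistics.stdev(samples)
    return 1.06 * sigma * n ** (-1 / 5)


def gaussian_kde(samples, bandwidth=None):
    """Return a function that estimates the density of `samples` at a point x."""
    h = bandwidth if bandwidth is not None else rule_of_thumb_bandwidth(samples)
    n = len(samples)
    # Normalisation so that the estimate integrates to 1.
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps, one centred on each sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density


# Hypothetical molecular weights (Da), for illustration only.
example_mw = [180.2, 250.3, 310.4, 342.4, 365.5, 401.5, 420.6, 455.6, 480.7, 495.0]
mw_density = gaussian_kde(example_mw)

# Evaluate the smoothed curve on a grid from 100 to 600 Da in steps of 10.
grid = [100.0 + 10.0 * i for i in range(51)]
curve = [mw_density(x) for x in grid]
```

Plotting `curve` against `grid` for each molecule set (ChEMBL, SmileyLlama, SmileyLlama after DPO) would give overlaid density curves of the kind described for MW and logP.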