Table 1 Evaluation metrics used in the experiments.
Metric | Description | Values | Direction |
|---|---|---|---|
BLEU\(^{46}\) | Computes similarity as the geometric mean of n-gram precisions, scaled by a brevity penalty | \([0, 1]\) | \(\uparrow\) |
Exact | Represents whether the string match is exact (1) or not (0) | \(\{0, 1\}\) | \(\uparrow\) |
Levenshtein\(^{47}\) | Measures the Levenshtein edit distance between two strings | \([0, \infty)\) | \(\downarrow\) |
MACCS FTS | Computes Tanimoto similarity between two molecular MACCS fingerprints | \([0, 1]\) | \(\uparrow\) |
RDK FTS | Computes Tanimoto similarity between two molecular RDK fingerprints | \([0, 1]\) | \(\uparrow\) |
Morgan FTS | Computes Tanimoto similarity between two molecular Morgan fingerprints | \([0, 1]\) | \(\uparrow\) |
FCD\(^{52}\) | Measures the distance between the distributions of real-world and LLM-generated molecules | \([0, \infty)\) | \(\downarrow\) |
Text2Mol\(^{53}\) | Uses a pretrained model to compute similarity between a SMILES string and text | \([0, 1]\) | \(\uparrow\) |
Validity | Represents whether the generated SMILES string is syntactically valid (1) or not (0) | \(\{0, 1\}\) | \(\uparrow\) |
BLEU-2\(^{46}\) | Computes cumulative 2-gram BLEU score | \([0, 1]\) | \(\uparrow\) |
BLEU-4\(^{46}\) | Computes cumulative 4-gram BLEU score | \([0, 1]\) | \(\uparrow\) |
ROUGE-1 | Measures overlap of unigrams between the candidate and reference strings | \([0, 1]\) | \(\uparrow\) |
ROUGE-2 | Measures overlap of bigrams between the candidate and reference strings | \([0, 1]\) | \(\uparrow\) |
ROUGE-L | Calculates similarity via Longest Common Subsequence (LCS) statistics | \([0, 1]\) | \(\uparrow\) |
METEOR\(^{57}\) | Computes similarity between two strings via a weighted unigram F-score | \([0, 1]\) | \(\uparrow\) |
Text2Mol\(^{53}\) | Uses a pretrained model to compute similarity between two strings | \([0, 1]\) | \(\uparrow\) |
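
To make the value ranges and directions in Table 1 concrete, the following is a minimal sketch of three of the simpler metrics: exact match, Levenshtein distance, and Tanimoto similarity. It is an illustration under stated assumptions, not the implementation used in the experiments. The function names are hypothetical, and fingerprints are represented here as plain sets of on-bit indices rather than being computed from molecules with a cheminformatics toolkit.

```python
from typing import Set


def exact_match(candidate: str, reference: str) -> int:
    """Return 1 if the two strings are identical, otherwise 0."""
    return int(candidate == reference)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto similarity |A ∩ B| / |A ∪ B| of two sets of on-bit indices."""
    if not bits_a and not bits_b:
        # Assumption of this sketch: two empty fingerprints count as identical.
        return 1.0
    return len(bits_a & bits_b) / len(bits_a | bits_b)
```

For example, `levenshtein("CCO", "CCN")` is 1 (one substitution, lower is better), while `tanimoto({1, 2, 3}, {2, 3, 4})` is 0.5 (two shared bits out of four distinct bits, higher is better).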