Table 1 Evaluation metrics used in the experiments.

From: Emerging opportunities of using large language models for translation between drug molecules and indications

| Metric | Description | Values | Direction |
| --- | --- | --- | --- |
| BLEU [46] | Computes similarity as the geometric mean of n-gram precisions scaled by a brevity penalty | \([0, 1]\) | \(\uparrow\) |
| Exact | Represents whether the string match is exact (1) or not (0) | \(\{0, 1\}\) | \(\uparrow\) |
| Levenshtein [47] | Measures the Levenshtein edit distance between two strings | \([0, \infty)\) | \(\downarrow\) |
| MACCS [48, 49] | Computes Tanimoto similarity between two molecular MACCS fingerprints | \([0, 1]\) | \(\uparrow\) |
| RDK [48, 50] | Computes Tanimoto similarity between two molecular RDK fingerprints | \([0, 1]\) | \(\uparrow\) |
| Morgan [48, 51] | Computes Tanimoto similarity between two molecular Morgan fingerprints | \([0, 1]\) | \(\uparrow\) |
| FCD [52] | Measures the distance between the distributions of real-world and LLM-generated molecules | \([0, \infty)\) | \(\downarrow\) |
| Text2Mol [53] | Uses a pretrained model to compute similarity between a SMILES string and text | \([0, 1]\) | \(\uparrow\) |
| Validity | Represents whether the generated SMILES string is syntactically valid (1) or not (0) | \(\{0, 1\}\) | \(\uparrow\) |
| BLEU-2 [46] | Computes the cumulative 2-gram BLEU score | \([0, 1]\) | \(\uparrow\) |
| BLEU-4 [46] | Computes the cumulative 4-gram BLEU score | \([0, 1]\) | \(\uparrow\) |
| ROUGE-1 [54, 55] | Measures the overlap of unigrams between the candidate and reference strings | \([0, 1]\) | \(\uparrow\) |
| ROUGE-2 [54, 55] | Measures the overlap of bigrams between the candidate and reference strings | \([0, 1]\) | \(\uparrow\) |
| ROUGE-L [54, 56] | Calculates similarity via Longest Common Subsequence (LCS) statistics | \([0, 1]\) | \(\uparrow\) |
| METEOR [57] | Computes similarity between two strings via a weighted unigram F-score | \([0, 1]\) | \(\uparrow\) |
| Text2Mol [53] | Uses a pretrained model to compute similarity between two strings | \([0, 1]\) | \(\uparrow\) |

  1. \(\uparrow\): higher values indicate higher string similarity. \(\downarrow\): higher values indicate lower string similarity.
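To make three of the simpler metrics in the table concrete, the following is a minimal sketch of the Exact, Levenshtein, and Tanimoto computations in plain Python. It is not the implementation used in the experiments. The function names `exact_match`, `levenshtein`, and `tanimoto` are hypothetical, and fingerprints are represented here as plain sets of on-bit indices, an assumption that stands in for real MACCS, RDK, or Morgan fingerprints produced by a cheminformatics toolkit.

```python
def exact_match(candidate: str, reference: str) -> int:
    """Return 1 if the two strings are identical, otherwise 0."""
    return int(candidate == reference)


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def tanimoto(bits_a: set, bits_b: set) -> float:
    """Return |A & B| / |A | B| for two sets of on-bit indices.

    Assumption: two empty fingerprints score 0.0; this edge-case
    convention is chosen for the sketch only.
    """
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0
    return len(bits_a & bits_b) / union
```

For example, `levenshtein("CCO", "CCN")` is 1 because a single substitution converts one SMILES string into the other, and `tanimoto({1, 2, 3}, {2, 3, 4})` is 0.5 because two of the four distinct on-bits are shared. This matches the directions in the table: a lower edit distance and a higher Tanimoto score both indicate closer agreement.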