Table 4 Predicted odor profiles of fragrance standards, cosmetic raw materials, and dipeptides using structure-based classifiers

From: A comparative study of machine learning models on molecular fingerprints for odor decoding

| Category | CAS number | Compound name | Predicted label 1 | Predicted label 2 | Predicted label 3 |
| --- | --- | --- | --- | --- | --- |
| Fragrance | 78-70-6 | Linalool | LAVENDER (0.999, ST-LGBM) | PETITGRAIN (0.998, ST-LGBM) | FLORAL (0.980, ST-RF) |
| Fragrance | 5989-27-5 | Limonene | TERPENIC (0.988, ST-LGBM) | PINE (0.963, ST-LGBM) | CITRUS (0.918, ST-LGBM) |
| Fragrance | 60-12-8 | Phenyl ethyl alcohol | HONEY (0.986, ST-LGBM) | ROSE (0.966, ST-LGBM) | FLORAL (0.940, ST-RF) |
| Cosmetic raw material | 6309-51-9 | Isoamyl laurate (skin-conditioning agents) | BRANDY (0.983, ST-LGBM) | APRICOT (0.971, ST-LGBM) | ALCOHOLIC (0.950, ST-LGBM) |
| Cosmetic raw material | 70-18-8 | Glutathione (skin brightening) | YEAST (0.999, ST-LGBM) | ODORLESS (0.859, ST-LGBM) | MILD (0.419, ST-LGBM) |
| Cosmetic raw material | 111-01-3 | Squalane (skin-conditioning agents) | FISHY (0.983, ST-LGBM) | WAXY (0.859, ST-LGBM) | CITRUS (0.854, ST-LGBM) |
| Dipeptide | 114659-59-5 | Gln-Met | MEATY (0.892, ST-LGBM) | SAVORY (0.689, ST-XGB) | CHEESY (0.647, ST-XGB) |
| Dipeptide | 6422-36-2 | Pro-Ala | MEATY (0.747, ST-LGBM) | ODORLESS (0.360, ST-RF) | OTHERS (0.325, ST-LGBM) |
| Dipeptide | 13589-02-1 | Pro-Phe | FLORAL (0.852, ST-LGBM) | ODORLESS (0.463, ST-LGBM) | FRUITY (0.200, ST-RF) |

  1. For each compound, the top three odor labels were predicted by the ST-RF, ST-XGB, and ST-LGBM models trained on Morgan fingerprints; each entry lists the label followed, in parentheses, by its prediction score and the model that produced it. Odor descriptors for known substances reflect commonly recognized or empirically grounded characteristics. The dipeptides were not tested experimentally and are included as exploratory cases to illustrate potential use in novel fragrance discovery.
  2. ST: structural (Morgan) fingerprint; RF: Random Forest; XGB: eXtreme Gradient Boosting; LGBM: Light Gradient Boosting Machine.
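
The top-three selection described in note 1 can be illustrated with a minimal sketch. It assumes that each model returns a score per odor label, that a label is credited to whichever model scores it highest, and that labels are then ranked by that best score; the source does not spell out this procedure, so it is an assumption. The function name `top_labels` is hypothetical. In the example dictionary, the three winning scores echo the linalool row of the table, while every other value is invented purely for illustration.

```python
from typing import Dict, List, Tuple


def top_labels(
    scores_by_model: Dict[str, Dict[str, float]], k: int = 3
) -> List[Tuple[str, float, str]]:
    """Return the k best (label, score, model) triples, one per label.

    Assumed rule (not confirmed by the source): keep the highest score
    seen for each label across all models, then rank labels by it.
    """
    best: Dict[str, Tuple[float, str]] = {}
    for model, scores in scores_by_model.items():
        for label, score in scores.items():
            # Keep only the highest-scoring model for each label.
            if label not in best or score > best[label][0]:
                best[label] = (score, model)
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    return [(label, score, model) for label, (score, model) in ranked[:k]]


# Illustrative scores: 0.999, 0.998 and 0.980 match the linalool row;
# all remaining numbers are made up for this example.
example = {
    "ST-RF": {"FLORAL": 0.980, "LAVENDER": 0.900},
    "ST-XGB": {"FLORAL": 0.950, "PETITGRAIN": 0.910},
    "ST-LGBM": {"LAVENDER": 0.999, "PETITGRAIN": 0.998, "FLORAL": 0.970},
}

for label, score, model in top_labels(example):
    print(f"{label} ({score:.3f}, {model})")
```

Under these assumptions the loop prints entries in the same `LABEL (score, model)` form used in the table cells.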