Table 1 Summary of recent machine learning approaches for odor prediction

From: A comparative study of machine learning models on molecular fingerprints for odor decoding

| Study | Machine learning model | Number of odorants | Feature source | Performance metrics | Reference number |
| --- | --- | --- | --- | --- | --- |
| Nozaki, Y. & Nakamoto, T. (2016) | Deep ANN | 121 | Mass spectral data | R ≈ 0.76 | 3 |
| Shang, L. et al. (2017) | SVM (Boruta-C), ELM (PCA) | 1026 | DRAGON physicochemical parameters (PCA/Boruta) | SVM accuracy: 97.08%; ELM accuracy: 97.53 ± 1.35% | 4 |
| Sharma, A. et al. (2021) | DNN (PPMF), CNN (Xception on 2D images) | 5185 | PaDEL fingerprints; RDKit 2D chemical images | DNN accuracy: 97.3%; CNN accuracy: 98.3%; combined precision: 100% (64 smells) | 5 |
| Saini, K. & Ramanathan, V. (2022) | Daylight-BR | 7374 | Mordred, Morgan, Daylight fingerprints | micro-F1: 0.3523 | 6 |
| Schicker, D. et al. (2023) | Olfactory Weighted Sum (linear) | 64 | SMARTS structural patterns | Predicted accuracy: 0.677; training accuracy: 0.905; random guessing performance: 0.214 | 7 |
| Zhang, M. et al. (2024) | Mol-PECO (deep learning, Coulomb matrix/LPE) | 8503 | Coulomb matrix + LPE encodings | AUROC: 0.813; AUPRC: 0.181 | 8 |
| This study | RF, XGB, LGBM | 8681 | Morgan fingerprints (ST model) | XGB-ST AUROC: 0.828; XGB-ST AUPRC: 0.237; LGBM-ST AUROC: 0.810; LGBM-ST AUPRC: 0.228 | – |


  1. Overview of major studies applying machine learning (ML) models to olfactory prediction tasks. Shown are the model types, number of odorants used for training, feature sources, and key performance metrics.
  2. Abbreviations: ANN, artificial neural network; R, Pearson correlation coefficient; SVM, support vector machine; ELM, extreme learning machine; PCA, principal component analysis; CNN, convolutional neural network; DNN, deep neural network; PPMF, physicochemical properties and molecular fingerprints; BR, binary relevance; SMARTS, SMILES arbitrary target specification; LPE, learned positional encoding; ST, structural (Morgan) fingerprint; AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision–recall curve; RF, random forest; XGB, eXtreme Gradient Boosting; LGBM, Light Gradient Boosting Machine.
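
The AUROC and AUPRC metrics defined above can be made concrete with a small example. The sketch below is illustrative only: the labels and scores are made-up toy values, the function names `auroc` and `average_precision` are hypothetical, and AUPRC is approximated by average precision, which may differ from the estimator used in the cited studies.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values measured at the rank of each positive.

    Used here as a common approximation of AUPRC (an assumption, not
    necessarily the estimator behind the numbers in the table).
    Tied scores keep their input order because the sort is stable.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("average precision needs at least one positive label")
    return precision_sum / hits


# Toy data (made up): 3 positives among 8 molecules for one odor label.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

print(f"AUROC: {auroc(y_true, scores):.3f}")              # 11 of 15 pairs ranked correctly
print(f"AUPRC (AP): {average_precision(y_true, scores):.3f}")
```

For a random ranking, AUROC stays near 0.5 whatever the class balance, whereas average precision falls toward the fraction of positive labels, so AUPRC values are best read against label prevalence.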