Table 1 Summary of popular macromolecular descriptor classes.

From: Sizing up feature descriptors for macromolecular machine learning with polymeric biomaterials

| Descriptor Class | Overview | Implementation | Labor-Intensiveness |
| --- | --- | --- | --- |
| Domain-specific | Task-specific analytical measurements of the underlying system | Tabular encoding of experimental conditions, physical properties, physics-based simulations, and analytical measurement results in accordance with domain knowledge | High (unless data collection is automated or variables are known a priori) |
| Fingerprint | Vector encoding of the macromolecular chemistry in the training dataset | Tabular encoding of the vectorized system; adoption of pre-existing frameworks where compatible | Low |
| String Representation | Molecular structures encoded using a predefined, chemically complete knowledge framework | Incorporation of the framework in tabular data (e.g., SMILES, BigSMILES, SELFIES) | Medium |
| Graph Representation | Encoding and attribution of molecular systems using a graph data structure at a predefined level of abstraction | Custom encoding of nodes and edges for a graph learning task; adoption of pre-existing frameworks where compatible | Medium |
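
To make the contrast between the fingerprint, string, and graph descriptor classes concrete, the following minimal Python sketch encodes one toy copolymer in all three ways. Everything in it is an illustrative assumption rather than a method from the source: the monomer labels `A` and `B`, the example chain, and the helper names `composition_fingerprint`, `string_representation`, and `graph_representation` are hypothetical. The string form is a simple placeholder, not SMILES, BigSMILES, or SELFIES, and the graph is abstracted at the monomer level rather than the atom level.

```python
from collections import Counter

# Hypothetical toy copolymer: a linear chain of monomer labels (not real data).
chain = ["A", "B", "A", "A", "B"]
alphabet = ["A", "B"]  # assumed fixed monomer vocabulary for the dataset


def composition_fingerprint(chain, alphabet):
    """Fingerprint: fixed-length vector of monomer fractions in the chain."""
    counts = Counter(chain)
    return [counts[m] / len(chain) for m in alphabet]


def string_representation(chain):
    """String: toy sequence notation standing in for a real line notation."""
    return "-".join(chain)


def graph_representation(chain):
    """Graph: nodes carry monomer labels, edges connect chain neighbors."""
    nodes = {i: monomer for i, monomer in enumerate(chain)}
    edges = [(i, i + 1) for i in range(len(chain) - 1)]
    return nodes, edges


fingerprint = composition_fingerprint(chain, alphabet)
text = string_representation(chain)
nodes, edges = graph_representation(chain)

print(fingerprint)
print(text)
print(nodes, edges)
```

The fingerprint discards sequence order, the string keeps order but no explicit connectivity, and the graph keeps both, which is one way to read the differing implementation effort listed in the table. A domain-specific descriptor is omitted here because it would require measured or simulated values for the system.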