Table 1 Summary of popular macromolecular descriptor classes.
From: Sizing up feature descriptors for macromolecular machine learning with polymeric biomaterials
| Descriptor Class | Overview | Implementation | Labor-Intensiveness |
|---|---|---|---|
| Domain-specific | Task-specific analytical measurements of the underlying system | Tabular encoding of experimental conditions, physical properties, physics-based simulation outputs, and analytical measurement results, chosen in accordance with domain knowledge | High (unless data collection is automated or the variables are known a priori) |
| Fingerprint | Vector encoding of the macromolecular chemistry in the training dataset | Tabular encoding of the vectorized system; adoption of pre-existing frameworks where compatible | Low |
| String Representation | Molecular structures encoded using a predefined, chemically complete knowledge framework | Incorporation of the framework's strings into tabular data (e.g., SMILES, BigSMILES, SELFIES) | Medium |
| Graph Representation | Encoding and attribution of molecular systems using a graph data structure at a predefined level of abstraction | Custom encoding of nodes and edges for a graph learning task; adoption of pre-existing frameworks where compatible | Medium |
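
To make the four classes in the table concrete, the following minimal sketch encodes one toy chain in each style. Everything in it is an illustrative assumption rather than anything taken from the source: the lactic acid / glycolic acid chain, the function names, and the measurement field are hypothetical, and the string encoding is a deliberately simplified stand-in (dot-separated monomer SMILES) that discards connectivity, which a notation such as BigSMILES is designed to capture.

```python
from collections import Counter

# Hypothetical toy system: a short lactic acid (L) / glycolic acid (G) chain.
MONOMER_SMILES = {"L": "CC(O)C(=O)O", "G": "OCC(=O)O"}
chain = ["L", "L", "G", "L", "G"]


def domain_specific(measurements):
    """Domain-specific: one tabular row of measured or simulated values."""
    return dict(measurements)


def fingerprint(chain, vocabulary=("L", "G")):
    """Fingerprint: fixed-length vector of monomer fractions."""
    counts = Counter(chain)
    return [counts[m] / len(chain) for m in vocabulary]


def string_representation(chain):
    """String: dot-separated SMILES of the distinct monomers (no connectivity)."""
    distinct = sorted(set(chain))
    return ".".join(MONOMER_SMILES[m] for m in distinct)


def graph_representation(chain):
    """Graph: monomer-level nodes, with edges between neighbors along the chain."""
    nodes = list(enumerate(chain))
    edges = [(i, i + 1) for i in range(len(chain) - 1)]
    return nodes, edges


# The measurement value below is a placeholder, not real data.
row = domain_specific({"temperature_C": 37.0})
vector = fingerprint(chain)
text = string_representation(chain)
nodes, edges = graph_representation(chain)
```

The sketch also hints at the labor column: the fingerprint, string, and graph encodings are derived automatically from the chain, whereas the domain-specific row depends on values that must be measured or simulated separately.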