Fig. 3: Workflow for curating the Tg dataset to standardize data and ensure reliability. | npj Computational Materials

From: PolyMetriX: an ecosystem for digital polymer chemistry

The workflow assigns each data point a reliability class based on how often its polymer occurs in the dataset and how statistically consistent its reported Tg values are. Entries are grouped by polymer PSMILES together with their associated Tg values and categorized by occurrence count: (1) polymers that occur only once receive the lowest reliability class (black); (2) polymers that occur twice undergo a Z-score check, and values within ±2 standard deviations of the group mean are considered reliable (yellow); and (3) polymers that occur more than twice undergo the same Z-score check, which classifies their values as gold or red. This structured approach keeps the dataset consistent and reduces errors caused by outliers.
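The grouping and Z-score check described in the caption can be sketched in a few lines of Python. This is a minimal illustration rather than the PolyMetriX implementation: the function name `classify_reliability`, the use of the population standard deviation, and the choice to label every value that fails the check as red (including in the two-occurrence case, which the caption does not specify) are all assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

Z_THRESHOLD = 2.0  # the ±2 standard deviation cutoff named in the caption


def classify_reliability(records, z_threshold=Z_THRESHOLD):
    """Label (psmiles, tg) pairs with a reliability class.

    Hypothetical helper: returns a list of (psmiles, tg, label) tuples,
    where label is one of "black", "yellow", "gold", or "red".
    """
    # Group all reported Tg values by their PSMILES string.
    groups = defaultdict(list)
    for psmiles, tg in records:
        groups[psmiles].append(float(tg))

    labeled = []
    for psmiles, values in groups.items():
        n = len(values)
        if n == 1:
            # Unique occurrence: lowest reliability class.
            labeled.append((psmiles, values[0], "black"))
            continue

        mu = mean(values)
        sigma = pstdev(values)  # assumption: population standard deviation
        for tg in values:
            # Identical values give sigma == 0; treat them as consistent.
            z = 0.0 if sigma == 0 else (tg - mu) / sigma
            within = abs(z) <= z_threshold
            if n == 2:
                # Assumption: a failing duplicate is labeled red.
                label = "yellow" if within else "red"
            else:
                label = "gold" if within else "red"
            labeled.append((psmiles, tg, label))
    return labeled
```

One consequence of this particular choice is worth noting: with exactly two values and the population standard deviation, the Z-scores are always ±1 (or 0 for identical values), so the duplicate check always passes in this sketch. With few values per group, a Z-score cutoff of 2 can only flag an outlier once the group holds at least six values, so a real pipeline may apply the check differently.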
