Fig. 3: Classifier development, performance testing, and validation in independent blinded datasets. | Cell Discovery

From: Artificial intelligence defines protein-based classification of thyroid nodules

a Schematic workflow of the classifier development. Protein features were prioritized based on the discovery dataset. The model was trained using 19 proteins selected from the discovery dataset and further validated in test datasets. More details are described in Materials and Methods. b Importance ranking of the 19 selected protein features, interpreted using the SHapley Additive exPlanations (SHAP) algorithm. c Protein abundance distribution of the 19 features. d Network of the 19 proteins. Blue nodes indicate the protein features and orange nodes indicate connected molecules or pathways. Direct interactions are shown as solid lines and indirect interactions as dashed lines. e ROC plots of seven different machine learning models trained on the 19 selected features. f ROC plots of the discovery set, the retrospective test sets, the prospective test sets, and the Bethesda III and IV samples in the prospective test sets. g UMAP plots showing the separation between benign and malignant groups in the retrospective and prospective test sets using the 19 protein features with latent space. h Overall performance metrics of the neural network model's predictions for five specific histopathological types per set. Graduated colors in the shaded bar indicate accuracy levels. Numbers in the boxes indicate the number of correctly identified samples/total number of samples. HCA and HCC were assigned as FA and FTC, respectively. i Sankey diagram showing the distribution ratio and correspondence between histopathology and cytopathology in the prospective sets. Histopathological type L denotes lymphocytic thyroiditis. Cytopathology scores were assigned by specialized pathologists using the Bethesda System. TP, TN, FP, and FN indicate true positive, true negative, false positive, and false negative results, respectively, as predicted by our classifier model.
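As a reading aid for the TP, TN, FP, and FN labels in panel i, the following minimal Python sketch shows how standard performance metrics are derived from those four counts. The function name `confusion_metrics` and the counts in the usage lines are hypothetical and chosen only for illustration; they are not taken from the figure or the article, and the sketch assumes malignant is treated as the positive class.

```python
import math


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Derive standard binary-classification metrics from confusion counts.

    Assumes the positive class is 'malignant' (an assumption for this sketch).
    A ratio with a zero denominator is returned as NaN rather than raising.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else math.nan

    return {
        "accuracy": ratio(tp + tn, total),   # correct calls / all samples
        "sensitivity": ratio(tp, tp + fn),   # malignant samples called malignant
        "specificity": ratio(tn, tn + fp),   # benign samples called benign
        "ppv": ratio(tp, tp + fp),           # malignant calls that are correct
        "npv": ratio(tn, tn + fn),           # benign calls that are correct
    }


# Hypothetical counts, for illustration only (not data from the study).
metrics = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The per-type boxes in panel h report the same kind of ratio (correctly identified samples over total samples) restricted to a single histopathological type.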