Fig. 1
From: Semi-supervised machine-learning classification of materials synthesis procedures

a Learning curves of the random forest (RF) model, showing that the F1 score improves with more training data. The red plus and blue cross symbols represent model F1 scores evaluated on the training data sets and test data sets, respectively. The shaded areas denote the standard deviations of the curves. The performance converges to high F1 scores with training data sets as small as a few hundred paragraphs. b Precision, recall, and F1 scores of the RF model. The model was trained using 5000 training paragraphs and cross-validated using 1000 test paragraphs. Training paragraphs were randomly drawn from the annotated data set several times to calculate the standard deviation.
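The repeated-draw procedure behind the error bars in panel b can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `precision_recall_f1` and `repeated_draw_scores`, the `evaluate` callback, the number of repeats, and the use of the sample standard deviation are all assumptions, since the caption only says that training paragraphs were drawn "several times".

```python
import random
import statistics


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one class from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def repeated_draw_scores(evaluate, pool, train_size, n_repeats=5, seed=0):
    """Draw `train_size` items from `pool` `n_repeats` times (n_repeats >= 2).

    `evaluate` is a hypothetical callback that trains a model on the drawn
    subset and returns (precision, recall, f1) on a held-out test set.
    Returns {metric: (mean, sample standard deviation)} over the draws.
    """
    rng = random.Random(seed)
    runs = [evaluate(rng.sample(pool, train_size)) for _ in range(n_repeats)]
    summary = {}
    for name, values in zip(("precision", "recall", "f1"), zip(*runs)):
        summary[name] = (statistics.mean(values), statistics.stdev(values))
    return summary
```

Under these assumptions, `evaluate` would wrap whatever classifier is being studied (for example, an RF trained on the drawn paragraphs), and the returned standard deviations correspond to the spread reported in the figure.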