Table 4 Text extraction model accuracies.
From: Text-mined dataset of gold nanoparticle synthesis procedures, morphologies, and size entities
Pipeline Component | ML Method | F1 (precision / recall) |
---|---|---|
Article filtering | regex/TF-IDF | 0.96 (1.00 / 0.92) |
Synthesis paragraph classification | BERT classification | 0.90 (0.96 / 0.85) |
Characterization paragraph classification | BERT classification | 0.90 (0.93 / 0.87) |
Materials Entity Recognition | BiLSTM+CRF (MatBERT embeddings) |  |
– materials¹ |  | 0.95 (0.95 / 0.95) |
– precursors¹ |  | 0.90 (0.89 / 0.91) |
– targets¹ |  | 0.85 (0.86 / 0.83) |
Morphology Entity Recognition | Fine-tuned MatBERT NER model |  |
– Micro average |  | 0.87 (0.89 / 0.84) |
– MOR (morphology) |  | 0.92 (0.90 / 0.95) |
– DES (descriptor) |  | 0.56 (0.70 / 0.52) |
– MES (measurement) |  | 0.70 (0.83 / 0.64) |
– SIZ (size value) |  | 0.69 (0.81 / 0.62) |
– UNT (unit) |  | 0.91 (0.94 / 0.91) |
Synthesis actions² | BiLSTM (Word2Vec embeddings) | 0.89 (0.90 / 0.88) |
Synthesis conditions³ | Rule-based |  |
– Temperature |  | 0.94 (0.97 / 0.92) |
– Time |  | 0.93 (0.98 / 0.89) |
Material quantities³ | Rule-based | 0.87 (0.90 / 0.85) |
Seed-mediated tag | Rule-based | 1.00 (1.00 / 1.00) |
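
For readers who want to sanity-check the table, the F1 score is conventionally the harmonic mean of precision and recall. The following is a minimal sketch under that assumption; the source does not state how its scores were computed or averaged, and the function name `f1_score` and the `rows` dictionary are illustrative, not part of the original pipeline.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from three rows of the table.
rows = {
    "Article filtering": (1.00, 0.92),
    "Synthesis paragraph classification": (0.96, 0.85),
    "Seed-mediated tag": (1.00, 1.00),
}

for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.2f}")
```

For these three rows the computed values round to the reported 0.96, 0.90, and 1.00. Reported scores that were averaged over classes or folds may not match this calculation exactly, since an average of F1 scores is generally not the harmonic mean of the averaged precision and recall.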