Table 4 Text extraction model accuracies.

From: Text-mined dataset of gold nanoparticle synthesis procedures, morphologies, and size entities

Pipeline Component                         ML Method                        F1: (precision|recall)
-----------------------------------------  -------------------------------  ----------------------------------
Article filtering                          regex/TF-IDF                     0.96: (1.00|0.92)
Synthesis paragraph classification         BERT classification              0.90: (0.96|0.85)
Characterization paragraph classification  BERT classification              0.90: (0.93|0.87)
Materials Entity Recognition               BiLSTM+CRF (MatBERT embeddings)  0.95: (0.95|0.95) - materials¹
                                                                            0.90: (0.89|0.91) - precursors¹
                                                                            0.85: (0.86|0.83) - targets¹
Morphology Entity Recognition              Fine-tuned MatBERT NER model     0.87: (0.89|0.84) - Micro average
                                                                            0.92: (0.90|0.95) - MOR (morphology)
                                                                            0.56: (0.70|0.52) - DES (descriptor)
                                                                            0.70: (0.83|0.64) - MES (measurement)
                                                                            0.69: (0.81|0.62) - SIZ (size value)
                                                                            0.91: (0.94|0.91) - UNT (unit)
Synthesis actions²                         BiLSTM (Word2Vec embeddings)     0.89: (0.90|0.88)
Synthesis conditions³                      Rule-based
  – Temperature                                                             0.94: (0.97|0.92)
  – Time                                                                    0.93: (0.98|0.89)
Material quantities³                       Rule-based                       0.87: (0.90|0.85)
Seed-mediated tag                          Rule-based                       1.00: (1.00|1.00)

  1. Metrics from He et al. (ref. 33).
  2. Metrics from the associated manuscript on synthesis actions extraction (ref. 35).
  3. Metrics from the accepted publication on solution synthesis extraction (ref. 42).
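
The F1 column can be read against its precision and recall pair using the standard definition of F1 as the harmonic mean of the two. The sketch below assumes that standard definition; the helper name `f1_from_pr` is illustrative and does not come from the source. It reproduces the article-filtering row.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Article filtering row: precision 1.00, recall 0.92
print(f"{f1_from_pr(1.00, 0.92):.2f}")  # prints 0.96
```

Recomputing from the rounded precision and recall shown in the table may differ slightly from a reported F1, for example when scores are averaged over classes or evaluation splits.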