Table 1 Summary of the initial training dataset

From: Discovery of new high-pressure phases – integrating high-throughput DFT simulations, graph neural networks, and active learning

| Source | Number of data points | Percentage of the total number of data points (%) | Number of unique structures | Percentage of the total number of unique structures (%) |
| --- | --- | --- | --- | --- |
| CellRelaxDFT | 1884 | 4.0 | 177 | 2.4 |
| MP—EOS | 4174 | 8.8 | 199 | 2.7 |
| MP—bulk modulus | 41,274 | 87.2 | 6879 | 94.9 |
| Total | 47,332 | 100 | 7255 | 100 |