Table 2 Hyper-parameters, train/test protocol, and environment needed to reproduce all results.
| Component | Specification / Setting |
|---|---|
| Adaptive sampler | Four energy regions; region weights \(\{w_\text{therm}=0.10,\;w_\text{res}=0.60,\;w_\text{fast}=0.20,\;w_\text{high}=0.10\}\); 20% uniform + 80% gradient-based points per region; threshold energies always retained; \(N_{\text{keep}}=1\,500\) points per temperature; random seed = 42. |
| KNN baseline | KNeighborsRegressor (scikit-learn 1.3); 5-fold GridSearchCV over \(n_{\text{neighbors}}\in\{3,5,7\}\); weights = distance; metric = Minkowski (\(p=2\)); best model: \(k=5\). |
| Gaussian process baseline | Kernel \(k(E,E')=1.0\times \text{RBF}(\ell =0.1~\text{eV})\); alpha = 1e-2; 10 optimizer restarts; predictive standard deviations clipped to \(\sigma \ge 0\); negative mean predictions clipped to 0. |
| Uncertainty quantification | Developed method: non-parametric bootstrap of the reduced grid; \(n_{\text{boot}}=200\) resamples with replacement; linear interp1d (fill_value='extrapolate') per replica; duplicate energies removed via np.unique to avoid zero-slope artefacts; point-wise 95% CI taken as \([\,P_{2.5},\,P_{97.5}\,]\); random seed = 42; uncertainties clipped via \(\sigma \leftarrow \max(0,\sigma)\). |
| Train/test split | Each temperature processed independently; 80% train / 20% test (stratified by energy decade); metrics averaged over 5 random splits; seed 42. |
| Hardware | LRZ Linux Cluster, Intel(R) Xeon(R) Platinum 8380 CPU (Ice Lake), 2\(\times\)40 cores, 1 TB RAM, single node, no GPU acceleration. |
| Software stack | Python 3.9, NumPy 1.23, SciPy 1.10, scikit-learn 1.3, h5py 3.9, Matplotlib 3.7. |
| Reproducibility | Code and data: all data supporting the findings of this study are provided within the article. |
| Execution footprint | End-to-end pipeline (Fig. 3) reproduced in \(\approx\)19 min wall-time and < 8 GB RAM per temperature. |
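The two baselines in Table 2 can be configured directly from the listed settings. The sketch below is a minimal, hypothetical helper (the function name `fit_baselines` and the toy data are not from the article) showing the KNN grid search and the Gaussian-process kernel as specified, using the stated scikit-learn 1.3 APIs:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF


def fit_baselines(E_train, y_train):
    """Fit the two Table 2 baselines on a 1-D energy grid (sketch)."""
    X = np.asarray(E_train).reshape(-1, 1)

    # KNN: 5-fold grid search over k in {3, 5, 7}, distance weighting,
    # Minkowski metric with p=2 (i.e. Euclidean).
    knn = GridSearchCV(
        KNeighborsRegressor(weights="distance", metric="minkowski", p=2),
        param_grid={"n_neighbors": [3, 5, 7]},
        cv=5,
    )
    knn.fit(X, y_train)

    # GP: constant * RBF kernel with 0.1 eV length scale, alpha=1e-2,
    # 10 optimizer restarts; seed fixed for reproducibility.
    gp = GaussianProcessRegressor(
        kernel=1.0 * RBF(length_scale=0.1),
        alpha=1e-2,
        n_restarts_optimizer=10,
        random_state=42,
    )
    gp.fit(X, y_train)
    return knn, gp
</code above is a sketch; negative GP means would still be clipped to 0 afterwards, per Table 2>
```

Per Table 2, negative GP mean predictions would then be clipped to 0, e.g. `np.clip(gp.predict(X_test), 0.0, None)`.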
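The uncertainty-quantification row describes a non-parametric bootstrap over the reduced grid. A minimal sketch of that procedure, assuming the settings listed in Table 2 (the helper name `bootstrap_ci` and the call signature are hypothetical, not the article's code):

```python
import numpy as np
from scipy.interpolate import interp1d


def bootstrap_ci(energies, values, query, n_boot=200, seed=42):
    """Point-wise 95% bootstrap CI on a reduced (E, value) grid (sketch).

    Resample grid points with replacement, deduplicate energies with
    np.unique (guards against zero-slope artefacts from repeated E),
    linearly interpolate each replica with fill_value='extrapolate',
    clip negative predictions to 0, and take the 2.5/97.5 percentiles.
    """
    energies = np.asarray(energies)
    values = np.asarray(values)
    rng = np.random.default_rng(seed)
    n = len(energies)
    replicas = np.empty((n_boot, len(query)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)              # resample with replacement
        e_uniq, first = np.unique(energies[idx], return_index=True)
        v_uniq = values[idx][first]                   # drop duplicate energies
        f = interp1d(e_uniq, v_uniq, kind="linear",
                     bounds_error=False, fill_value="extrapolate")
        replicas[b] = np.clip(f(query), 0.0, None)    # clip negatives to 0
    lo, hi = np.percentile(replicas, [2.5, 97.5], axis=0)
    return lo, hi
```

The fixed `seed=42` matches the table and makes the resampling reproducible; with `n_boot=200`, the 2.5th/97.5th percentiles are estimated from 200 interpolated replicas per query energy.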