Table 2 Hyper-parameters, train/test protocol, and environment needed to reproduce all results.

From: Development of a novel machine learning-based adaptive resampling algorithm for nuclear data processing

Component

Specification / Setting

Adaptive sampler

Four energy regions with weights \(w_\text{therm}=0.10\), \(w_\text{res}=0.60\), \(w_\text{fast}=0.20\), \(w_\text{high}=0.10\); per region, 20% uniform + 80% gradient-based points; threshold energies always retained; \(N_\text{keep}=1500\) points per temperature; random seed = 42.
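
The per-region selection (80% gradient-based, 20% uniform) can be sketched as follows. This is a minimal illustration for a single energy region; the function name and the use of `np.gradient` as the gradient score are assumptions, and in the full pipeline the region weights would set each region's point budget.

```python
import numpy as np

def adaptive_sample(energies, xs, n_keep=1500, frac_uniform=0.2, seed=42):
    """Sketch: within one region, keep the points with the largest
    cross-section gradient plus a uniform random subset (assumption)."""
    rng = np.random.default_rng(seed)
    n_uniform = int(frac_uniform * n_keep)
    n_grad = n_keep - n_uniform
    # Gradient-based score: |d(sigma)/dE| at each grid point.
    grad = np.abs(np.gradient(xs, energies))
    grad_idx = np.argsort(grad)[-n_grad:]            # steepest points
    rest = np.setdiff1d(np.arange(energies.size), grad_idx)
    uni_idx = rng.choice(rest, size=min(n_uniform, rest.size), replace=False)
    keep = np.union1d(grad_idx, uni_idx)             # sorted, unique
    return energies[keep], xs[keep]
```
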

KNN baseline

KNeighborsRegressor (scikit-learn 1.3); 5-fold GridSearchCV over \(n_\text{neighbors}\in\{3,5,7\}\); weights = distance; metric = Minkowski (\(p=2\)); best model: \(k=5\).
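
A minimal sketch of this grid search, assuming log-energy as the single input feature and MSE as the selection score (both assumptions, not stated in the table):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

def fit_knn_baseline(E_train, xs_train):
    """5-fold grid search over k, matching the table's settings."""
    grid = GridSearchCV(
        KNeighborsRegressor(weights="distance", metric="minkowski", p=2),
        param_grid={"n_neighbors": [3, 5, 7]},
        cv=5,
        scoring="neg_mean_squared_error",  # assumed selection metric
    )
    # log10(E) tames the ~12-decade energy range (assumption).
    grid.fit(np.log10(E_train).reshape(-1, 1), xs_train)
    return grid.best_estimator_
```
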

Gaussian process baseline

Kernel \(k(E,E')=1.0\times \text{RBF}(\ell =0.1~\text{eV})\); \(\alpha=10^{-2}\); 10 optimizer restarts; predictive standard deviations constrained to \(\sigma \ge 0\); negative mean predictions clipped to 0.
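
The GP baseline maps directly onto scikit-learn; a sketch under the assumption that energies are used as raw 1-D features and \(\alpha\) is the `alpha` noise-regularisation argument:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def fit_gp_baseline(E_train, xs_train, seed=42):
    """GP with the table's kernel: 1.0 * RBF(length_scale = 0.1 eV)."""
    kernel = ConstantKernel(1.0) * RBF(length_scale=0.1)
    gp = GaussianProcessRegressor(
        kernel=kernel,
        alpha=1e-2,               # noise regularisation from the table
        n_restarts_optimizer=10,  # 10 restarts from the table
        random_state=seed,
    )
    gp.fit(E_train.reshape(-1, 1), xs_train)
    return gp

def predict_clipped(gp, E):
    """Clip negative means to 0 and enforce sigma >= 0, as in the table."""
    mu, sigma = gp.predict(E.reshape(-1, 1), return_std=True)
    return np.clip(mu, 0.0, None), np.maximum(sigma, 0.0)
```
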

Uncertainty quantification

For the developed method: non-parametric bootstrap of the reduced grid; \(n_\text{boot}=200\) resamples with replacement; linear interp1d (fill_value='extrapolate') per replica; duplicate energies removed via np.unique to avoid zero-slope artefacts; point-wise 95% CI taken as \([P_{2.5},\,P_{97.5}]\); random seed = 42; uncertainties clipped via \(\sigma \leftarrow \max(0,\sigma)\).
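
The bootstrap procedure can be sketched as below; the function name and the evaluation grid argument are illustrative, but the steps (resample with replacement, deduplicate with np.unique, linear interp1d with extrapolation, 2.5/97.5 percentiles, clip negatives) follow the table:

```python
import numpy as np
from scipy.interpolate import interp1d

def bootstrap_ci(E_red, xs_red, E_eval, n_boot=200, seed=42):
    """Point-wise 95% CI from a non-parametric bootstrap of the reduced grid."""
    rng = np.random.default_rng(seed)
    preds = np.empty((n_boot, E_eval.size))
    n = E_red.size
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample with replacement
        # Drop duplicate energies to avoid zero-slope artefacts.
        E_b, uniq = np.unique(E_red[idx], return_index=True)
        f = interp1d(E_b, xs_red[idx][uniq], kind="linear",
                     fill_value="extrapolate")
        preds[b] = np.maximum(f(E_eval), 0.0)   # clip negatives to zero
    lo, hi = np.percentile(preds, [2.5, 97.5], axis=0)
    return lo, hi
```
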

Train/test split

Each temperature processed independently; 80% train / 20% test split, stratified by energy decade; metrics averaged over 5 random splits; seed 42.
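
"Stratified by energy decade" can be realised by labelling each point with \(\lfloor \log_{10} E \rfloor\) and passing that as the stratification target; a sketch (the helper name is hypothetical):

```python
import numpy as np
from sklearn.model_selection import train_test_split

def decade_split(E, xs, test_size=0.2, seed=42):
    """80/20 split stratified by the energy decade floor(log10 E)."""
    decades = np.floor(np.log10(E)).astype(int)
    return train_test_split(E, xs, test_size=test_size,
                            stratify=decades, random_state=seed)
```
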

Hardware

LRZ Linux Cluster, Intel(R) Xeon(R) Platinum 8380 CPU (Ice Lake, 2\(\times\)40 cores), 1 TB RAM, single node, no GPU acceleration.

Software stack

Python 3.9, NumPy 1.23, SciPy 1.10, scikit-learn 1.3, h5py 3.9, Matplotlib 3.7.

Reproducibility

Code and Data: All data supporting the findings of this study are provided within the article.

Execution footprint

End-to-end pipeline (Fig. 3) reproduced in \(\approx\)19 min wall-clock time and < 8 GB RAM per temperature.