Table 2 Hyper-parameters, train/test protocol, and environment needed to reproduce all results.
| Component | Specification / Setting |
|---|---|
| Adaptive sampler | Four energy regions; region weights \(\{w_\text{therm}=0.10,\;w_\text{res}=0.60,\;w_\text{fast}=0.20,\;w_\text{high}=0.10\}\); 20% uniform + 80% gradient-based points per region; threshold energies always retained; \(N_{\text{keep}}=1\,500\) points per temperature; random seed = 42. |
| KNN baseline | KNeighborsRegressor (scikit-learn 1.3); 5-fold GridSearchCV over \(n_{\text{neighbors}}\in\{3,5,7\}\); weights = distance; metric = Minkowski (\(p=2\)); best model: \(k=5\). |
| Gaussian process baseline | Kernel \(k(E,E')=1.0\times \text{RBF}(\ell =0.1~\text{eV})\); alpha = 1e-2; 10 optimizer restarts; predictive standard deviations clipped to \(\sigma \ge 0\); negative mean predictions clipped to 0. |
| Uncertainty quantification | Developed method: non-parametric bootstrap of the reduced grid; \(n_{\text{boot}}=200\) resamples with replacement; linear interp1d (fill_value='extrapolate') per replica; duplicate energies removed via np.unique to avoid zero-slope artefacts; point-wise 95% CI taken as \([\,P_{2.5},\,P_{97.5}\,]\); random seed = 42; uncertainties clipped via \(\sigma \leftarrow \max(0,\sigma)\). |
| Train/test split | Each temperature processed independently; 80% train / 20% test (stratified by energy decade); metrics averaged over 5 random splits; seed 42. |
| Hardware | LRZ Linux Cluster, Intel(R) Xeon(R) Platinum 8380 CPU (Ice Lake), 2\(\times\)40 cores, 1 TB RAM, single node, no GPU acceleration. |
| Software stack | Python 3.9, NumPy 1.23, SciPy 1.10, scikit-learn 1.3, h5py 3.9, Matplotlib 3.7. |
| Reproducibility | Code and data: all data supporting the findings of this study are provided within the article. |
| Execution footprint | End-to-end pipeline (Fig. 3) reproduced in \(\approx\)19 min wall-time and < 8 GB RAM per temperature. |
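The two baselines in Table 2 can be configured directly from the listed settings. The sketch below is a minimal, hypothetical helper (the function name `fit_baselines` and the toy data are not from the article) showing the KNN grid search and the Gaussian-process kernel as specified, using the stated scikit-learn 1.3 APIs:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF


def fit_baselines(E_train, y_train):
    """Fit the two Table 2 baselines on a 1-D energy grid (sketch)."""
    X = np.asarray(E_train).reshape(-1, 1)

    # KNN: 5-fold grid search over k in {3, 5, 7}, distance weighting,
    # Minkowski metric with p=2 (i.e. Euclidean).
    knn = GridSearchCV(
        KNeighborsRegressor(weights="distance", metric="minkowski", p=2),
        param_grid={"n_neighbors": [3, 5, 7]},
        cv=5,
    )
    knn.fit(X, y_train)

    # GP: constant * RBF kernel with 0.1 eV length scale, alpha=1e-2,
    # 10 optimizer restarts; seed fixed for reproducibility.
    gp = GaussianProcessRegressor(
        kernel=1.0 * RBF(length_scale=0.1),
        alpha=1e-2,
        n_restarts_optimizer=10,
        random_state=42,
    )
    gp.fit(X, y_train)
    return knn, gp
</code above is a sketch; negative GP means would still be clipped to 0 afterwards, per Table 2>
```

Per Table 2, negative GP mean predictions would then be clipped to 0, e.g. `np.clip(gp.predict(X_test), 0.0, None)`.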
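The uncertainty-quantification row describes a non-parametric bootstrap over the reduced grid. A minimal sketch of that procedure, assuming the settings listed in Table 2 (the helper name `bootstrap_ci` and the call signature are hypothetical, not the article's code):

```python
import numpy as np
from scipy.interpolate import interp1d


def bootstrap_ci(energies, values, query, n_boot=200, seed=42):
    """Point-wise 95% bootstrap CI on a reduced (E, value) grid (sketch).

    Resample grid points with replacement, deduplicate energies with
    np.unique (guards against zero-slope artefacts from repeated E),
    linearly interpolate each replica with fill_value='extrapolate',
    clip negative predictions to 0, and take the 2.5/97.5 percentiles.
    """
    energies = np.asarray(energies)
    values = np.asarray(values)
    rng = np.random.default_rng(seed)
    n = len(energies)
    replicas = np.empty((n_boot, len(query)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)              # resample with replacement
        e_uniq, first = np.unique(energies[idx], return_index=True)
        v_uniq = values[idx][first]                   # drop duplicate energies
        f = interp1d(e_uniq, v_uniq, kind="linear",
                     bounds_error=False, fill_value="extrapolate")
        replicas[b] = np.clip(f(query), 0.0, None)    # clip negatives to 0
    lo, hi = np.percentile(replicas, [2.5, 97.5], axis=0)
    return lo, hi
```

The fixed `seed=42` matches the table and makes the resampling reproducible; with `n_boot=200`, the 2.5th/97.5th percentiles are estimated from 200 interpolated replicas per query energy.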