Fig. 4: Enhanced linear regression models for DNA folding energetics.
From: High-throughput DNA melt measurements enable improved models of DNA folding thermodynamics

a Adjusted mean absolute error (adjusted MAE) of linear regression models with features based on single-base-pair, nearest-neighbor, or next-nearest-neighbor stacks (stack feature lengths of 1, 2, and 3, respectively), applied to the Array Melt test set. b Comparison of nearest-neighbor parameters for Watson-Crick pairs fitted from Array Melt data with those reported in the literature (ref. 8). c Visualization of dna24 and dna04 free energy parameters for single mismatches. “Context” and “mismatch” definitions are consistent with those in Fig. 3a. Note: The flat GT/TG mismatch columns in the dna04 panel result from NUPACK’s use of placeholder (dummy) parameters for these mismatches. This is a historical artifact in the parameter files, which the NUPACK development team has acknowledged and is currently addressing (Supplementary Discussion; personal communication, N. Pierce, 2025). This note also applies to Supplementary Fig. 3c. d, e Performance comparison of the dna04, dna24, and rich parameter models on held-out Array Melt data. All calculations were performed at 1 M Na+ concentration. f Variances in the data (technical and biological) and variances explained or added by the models, for all Array Melt test data or for the Watson-Crick, mismatch, and bulge subsets only. g Performance comparison of the dna04, dna24, and rich parameter models on an orthogonal dataset of DNA duplexes with varying numbers of mismatches (Oliveira et al., ref. 15). h Adjusted MAE on Array Melt data plotted as a function of the percentage of training data used to fit the linear regression models.
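
The stack features in panel a can be illustrated with a minimal sketch. It is not the authors' feature set or fitting code: it assumes a fully Watson-Crick-paired duplex represented by its top strand only, merges each k-mer with its reverse complement (both describe the same stack), and ignores initiation, terminal, loop, and mismatch terms. The function names `canonical`, `stack_counts`, `design_matrix`, and `fit_parameters` are hypothetical.

```python
from collections import Counter

import numpy as np

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Merge a k-mer with its reverse complement (same duplex stack)."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def stack_counts(seq: str, k: int) -> Counter:
    """Count length-k stack features along the top strand.

    k = 1, 2, 3 correspond to single-base-pair, nearest-neighbor,
    and next-nearest-neighbor stacks, respectively.
    """
    return Counter(canonical(seq[i:i + k]) for i in range(len(seq) - k + 1))


def design_matrix(seqs, k):
    """Return (X, names): one row per sequence, one column per stack feature."""
    counts = [stack_counts(s, k) for s in seqs]
    names = sorted(set().union(*counts))
    X = np.array([[c[n] for n in names] for c in counts], dtype=float)
    return X, names


def fit_parameters(seqs, dG, k):
    """Ordinary least squares fit of dG ~ X @ params (illustrative only)."""
    X, names = design_matrix(seqs, k)
    params, *_ = np.linalg.lstsq(X, np.asarray(dG, dtype=float), rcond=None)
    return dict(zip(names, params))


# Toy call with made-up target values (not measured energies).
toy_params = fit_parameters(["AA", "AT", "AAT"], [1.0, 2.0, 3.0], k=2)
```

Increasing `k` enlarges the feature set, which is the trade-off panel a examines: longer stacks capture more sequence context but need more training data to fit each parameter.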