Extended Data Fig. 5: Comprehensive assessment of reconstruction skill, performed on separate calibration and validation intervals.
From: Forced changes in the Pacific Walker circulation over the past millennium

ΔSLP reconstruction ensemble members were trained on a 1951–2000 calibration interval and assessed against instrumental ΔSLP in an independent validation interval (1900–1950). These scores provide a minimum independent estimate of reconstruction skill; the true skill is probably higher, as we used a longer calibration interval for the reconstruction itself. a,b, Violin-and-box plots (‘voxplots’) summarizing ΔSLP reconstruction skill in terms of correlation coefficient (r), root mean squared error (RMSE) and reduction of error (RE; ref. 132). All tests were performed on all 4,800 ΔSLP reconstruction ensemble members. Voxplots show the distribution of scores; boxes show the median and interquartile range (IQR), whiskers show 1.5 × IQR and points show outliers. Each ensemble member was assessed against the ΔSLP series used to train it, that is, ΔSLP calculated from HadSLP, ICOADS or ERA-20C. a, Skill scores split according to the gridded SLP product used to calculate the ΔSLP training data. b, Skill scores split according to the reconstruction method. c, As in a and b, but with panels showing skill scores for each temporal subset that contributed to the full reconstruction interval. This provides an estimate of the decrease in reconstruction skill back through time (as the number of available proxy records decreases; see Extended Data Figs. 1 and 2).
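For readers who want to reproduce this kind of split-sample assessment, the three scores can be sketched as follows. This is a minimal illustration, not the code used for the figure. It assumes the conventional definition of RE, in which the reference forecast is the calibration-interval mean of the instrumental series. The function name `skill_scores`, its arguments and the synthetic series are hypothetical.

```python
import numpy as np

def skill_scores(recon, obs, years, cal=(1951, 2000), val=(1900, 1950)):
    """Return r, RMSE and RE for one ensemble member over the validation interval.

    Assumption: RE = 1 - SSE / SSE_ref, where SSE_ref uses the
    calibration-interval mean of the observations as the reference forecast.
    """
    recon = np.asarray(recon, dtype=float)
    obs = np.asarray(obs, dtype=float)
    years = np.asarray(years)

    # Inclusive year masks for the calibration and validation intervals
    cal_mask = (years >= cal[0]) & (years <= cal[1])
    val_mask = (years >= val[0]) & (years <= val[1])
    rec_val, obs_val = recon[val_mask], obs[val_mask]

    # Pearson correlation between reconstruction and observations
    r = np.corrcoef(rec_val, obs_val)[0, 1]

    # Root mean squared error over the validation interval
    rmse = np.sqrt(np.mean((obs_val - rec_val) ** 2))

    # Reduction of error relative to the calibration-mean reference
    sse = np.sum((obs_val - rec_val) ** 2)
    sse_ref = np.sum((obs_val - obs[cal_mask].mean()) ** 2)
    re = 1.0 - sse / sse_ref

    return {"r": r, "rmse": rmse, "re": re}

# Synthetic example (illustrative data only, not from the paper)
rng = np.random.default_rng(0)
years = np.arange(1900, 2001)
obs = rng.normal(size=years.size)
recon = obs + rng.normal(scale=0.5, size=years.size)
print(skill_scores(recon, obs, years))
```

Under this definition, RE above zero means the reconstruction outperforms the calibration-interval mean as a predictor over the validation interval. Applying the function to each ensemble member would give the score distributions of the kind summarized in the voxplots.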