Figure 2
From: Monitoring the Age of Mosquito Populations Using Near-Infrared Spectroscopy

The ability of NIRS to predict the age of individual laboratory-reared mosquitoes using standard (I) and iPLS (II to IV) NIRS chemometric methods. (I) The accuracy of NIRS in predicting the age of individual Anopheline mosquitoes, using previously published methods to convert spectra into estimates of age. (II) Improvements in individual-mosquito predictive accuracy and reduction in bias through adoption of the iPLS NIRS chemometric methods. In Panels I and II, the training and testing sets comprised mosquitoes only from Study A (see Table 1); the solid black line indicates the fit of a polynomial regression to the data, the blue points indicate the individual age estimates (with jitter added to the x-axis), and the orange line shows the ideal line (y = x, perfect correlation). In Panel II, the blue shading represents the 95% posterior quantiles for the surrogate model representing the combined action of NIRS and the iPLS machine learning algorithm; the blue points are again the individual age estimates. Versions of Panel II are given for each dataset separately (Fig. S1). (III) Predictions for the individual studies using standard (blue) and iPLS (orange) machine learning methods. The error is estimated by randomly sampling 200 mosquitoes from the training dataset, with the middle, lower and upper edges of the boxes representing the 50%, 25% and 75% quantiles of the average error obtained across 100 replicates. The lower fence shows the 25% quantile minus 1.5 times the interquartile range, and the upper fence shows the 75% quantile plus 1.5 times the interquartile range. The dots indicate outliers, defined as points that lie outside the fences. The “average error” is calculated as the root-mean-square error (RMSE) across all replicates. (IV) Increasing the number of mosquitoes used in the training dataset substantially increases the accuracy with which the iPLS NIRS chemometric method predicts the age of individual mosquitoes.
The solid points represent the median average error for Study A, and the upper and lower fences represent the 75% and 25% quantiles, each calculated across 100 replicates at each sample size. The regression line was estimated as y = 0.44 + 34.81 x^(−0.5), where y is the RMSE and x is the number of mosquitoes in the training dataset.
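The box-plot quantities described for Panel III can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`rmse`, `replicate_rmse`, `box_summary`) are hypothetical, the inputs are taken to be already-computed age predictions, and the step of retraining the chemometric model on each random sample is not reproduced here.

```python
import numpy as np


def rmse(true_age, predicted_age):
    """Root-mean-square error between true and predicted ages."""
    true_age = np.asarray(true_age, dtype=float)
    predicted_age = np.asarray(predicted_age, dtype=float)
    return float(np.sqrt(np.mean((predicted_age - true_age) ** 2)))


def replicate_rmse(true_age, predicted_age, n_sample=200, n_replicates=100, seed=0):
    """RMSE of a random subsample, repeated n_replicates times.

    Assumption: predictions are supplied ready-made; no model is refitted.
    """
    rng = np.random.default_rng(seed)
    true_age = np.asarray(true_age, dtype=float)
    predicted_age = np.asarray(predicted_age, dtype=float)
    n = len(true_age)
    size = min(n_sample, n)
    errors = np.empty(n_replicates)
    for i in range(n_replicates):
        # Sample mosquitoes without replacement for this replicate.
        idx = rng.choice(n, size=size, replace=False)
        errors[i] = rmse(true_age[idx], predicted_age[idx])
    return errors


def box_summary(values):
    """Quartiles, 1.5 x IQR fences, and the points falling outside the fences."""
    values = np.asarray(values, dtype=float)
    q25, q50, q75 = np.percentile(values, [25, 50, 75])
    iqr = q75 - q25
    lower_fence = q25 - 1.5 * iqr
    upper_fence = q75 + 1.5 * iqr
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q25": float(q25),
        "median": float(q50),
        "q75": float(q75),
        "lower_fence": float(lower_fence),
        "upper_fence": float(upper_fence),
        "outliers": outliers.tolist(),
    }
```

Calling `box_summary(replicate_rmse(true_age, predicted_age))` on one study's ages would give the numbers behind a single box; the defaults of 200 mosquitoes and 100 replicates follow the caption.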
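A regression line of the form used in Panel IV, a constant plus a coefficient times a fixed power of the training-set size, can be fitted by ordinary least squares because it is linear in its two coefficients. The sketch below is one possible way to do this and is not taken from the paper; the function name `fit_offset_power` is hypothetical, and the exponent of −0.5 in the usage note is an assumption (an inverse-square-root decay, consistent with the error falling as the training set grows).

```python
import numpy as np


def fit_offset_power(x, y, exponent):
    """Least-squares fit of y = a + b * x**exponent with the exponent held fixed.

    Returns the pair (a, b). The exponent is supplied by the caller rather
    than estimated, so the problem reduces to a two-column linear regression.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: an intercept column and the power-transformed predictor.
    design = np.column_stack([np.ones_like(x), x ** exponent])
    (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(a), float(b)


def predict_offset_power(x, a, b, exponent):
    """Evaluate the fitted curve at new training-set sizes."""
    x = np.asarray(x, dtype=float)
    return a + b * x ** exponent
```

For example, `fit_offset_power(sizes, median_rmse, exponent=-0.5)` would return the intercept and coefficient for a set of training-set sizes and their median RMSE values, and `predict_offset_power` would then trace the curve across the plotted range.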