Extended Data Fig. 1: Imputation accuracy for randomly sampled emissions data.

Emissions of non-electric vehicles were accurately imputed using recursive partitioning regression models (Supplementary Code). Predictions made on a masked random 20% sample of vehicle emissions tests (N = 6,708 holdout make-model-year-engine combinations) showed a high degree of accuracy under ten-fold cross-validation (a; R² on y-axis). Accuracy increased with tree depth, reaching moderate to high levels for the CO₂ (b), nitrogen oxides (NOx; c), and miles per gallon (MPG; d) prediction models; cross-validation accuracy was lowest for total hydrocarbon emissions (THC; e). Model and manufacturer effects accounted for minimal variance independent of the other predictors and were not fitted, which allows imputation of rare makes and models. Filled circles in (a) are red for CO₂, green for MPG, blue for NOx, and mauve for THC.
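
The evaluation described in this legend (a regression tree grown to a given depth and scored by ten-fold cross-validated R²) can be illustrated with a minimal sketch. This is not the Supplementary Code: the data are synthetic, and the predictors (engine displacement and model year), the coefficients, and the function names `fit_tree`, `predict`, `r_squared`, and `cross_val_r2` are all assumptions made for illustration. The masked 20% holdout step is not reproduced; only the depth-versus-accuracy comparison is shown.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values)


def fit_tree(X, y, depth):
    """Recursively partition the rows; a leaf stores the mean response."""
    if depth == 0 or len(y) < 4:
        return mean(y)
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the smallest,
        # so both sides of the split are always non-empty.
        for thr in sorted({row[j] for row in X})[1:]:
            left = [i for i, row in enumerate(X) if row[j] < thr]
            right = [i for i, row in enumerate(X) if row[j] >= thr]
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, thr, left, right)
    if best is None:  # every feature is constant, so no split exists
        return mean(y)
    _, j, thr, left, right = best
    return (j, thr,
            fit_tree([X[i] for i in left], [y[i] for i in left], depth - 1),
            fit_tree([X[i] for i in right], [y[i] for i in right], depth - 1))


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if row[j] < thr else right
    return tree


def r_squared(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    mu = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mu) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_val_r2(X, y, depth, k=10, seed=0):
    """Mean held-out R² over k folds for a tree of the given depth."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(k):
        test = set(idx[f::k])
        train = [i for i in idx if i not in test]
        tree = fit_tree([X[i] for i in train], [y[i] for i in train], depth)
        test_ids = sorted(test)
        preds = [predict(tree, X[i]) for i in test_ids]
        scores.append(r_squared([y[i] for i in test_ids], preds))
    return mean(scores)


# Synthetic stand-in for emissions tests (hypothetical predictors and effects).
rng = random.Random(42)
X, y = [], []
for _ in range(300):
    disp = rng.choice([1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0])  # engine size, litres
    year = rng.randint(2000, 2020)                           # model year
    X.append((disp, year))
    y.append(120 + 55 * disp - 2.5 * (year - 2000) + rng.gauss(0, 10))

r2_by_depth = {d: cross_val_r2(X, y, d) for d in (1, 2, 4)}
for d, r2 in r2_by_depth.items():
    print(f"depth={d}  cross-validated R2={r2:.3f}")
```

Because the tree is fitted only on vehicle attributes such as engine size and year, and not on make or model labels, it can produce a prediction for a make or model that never appears in the training rows. This is the property the legend relies on for imputing rare vehicles.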