Fig. 2: NOVA classification and processing score.
From: Machine learning prediction of the degree of food processing

a Visualization of the decision space of FoodProX via principal component analysis of the probabilities \(\{p_i\}\). The manual 4-level NOVA classification assigns unique labels to only 34.25% of the foods listed in FNDDS 2009–2010 (empty circles). The classification of the remaining foods remains unknown, or requires further decomposition into ingredients. The foods manually classified by NOVA are largely limited to the three corners of the phase space, that is, foods to which the classifier assigns dominating probabilities. b FoodProX assigned NOVA labels to all foods in FNDDS 2009–2010. The symbols in the boundary regions indicate that for these foods the algorithm’s confidence in the classification is not high, hence a 4-class classification does not capture the degree of processing characterizing that food. For each food k, the processing score \(FPro_k\) represents the orthogonal projection (black dashed lines) of \({\overrightarrow{p}}^{k}=({p}_{1}^{k},{p}_{2}^{k},{p}_{3}^{k},{p}_{4}^{k})\) onto the line \(p_1 + p_4 = 1\) (highlighted in dark red). c We ranked all foods in FNDDS 2009–2010 according to FPro. The measure sorts onion products in increasing order of processing, from “Onion, Raw” to “Onion rings, from frozen”. d Distribution of FPro for a selection of the 155 Food Categories in What We Eat in America (WWEIA) 2015–2016 with at least 20 items (Section S2). WWEIA categories group together foods and beverages with similar usage and nutrient content in the US food supply (ref. 52). Sample sizes vary from a minimum of 21 data points for “Citrus fruits” to a maximum of 340 data points for “Fish”. In each box plot, the lower edge of the box marks the lower quartile, the central line represents the median, and the upper edge marks the upper quartile. The upper and lower whiskers represent data outside of the inter-quartile range.
All categories are ranked in increasing order of median FPro. Within each food group, FPro shows remarkable variability, confirming the presence of different degrees of processing. We illustrate this through four ready-to-eat cereals, all manually classified as NOVA 4, yet with rather different FPro. While the differences in nutrient content between Post Shredded Wheat 'n Bran (FPro = 0.5658) and Post Shredded Wheat (FPro = 0.5685) are minimal, with lower fiber content for the latter, fortification with vitamins and minerals together with the addition of sugar significantly increases the processing score of Post Grape-Nuts (FPro = 0.9603), and the further addition of fats results in an even higher processing score for Post Honey Bunches of Oats with Almonds (FPro = 0.9999), showing how FPro ranks the progressive changes in nutrient content. Source data are provided in Source Data Figure 2a–d.xlsx.
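The projection described in panel b can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes that the line p1 + p4 = 1 is the segment joining the corners (1, 0, 0, 0) and (0, 0, 0, 1), and that the score is the normalized position of the projected point along that segment. Under these assumptions the projection reduces to (1 - p1 + p4) / 2. The function name `fpro_from_probabilities` is hypothetical.

```python
from typing import Sequence


def fpro_from_probabilities(p: Sequence[float]) -> float:
    """Project a 4-class probability vector onto the segment p1 + p4 = 1.

    Assumption: the score is the position along the segment from the
    NOVA 1 corner (1, 0, 0, 0) to the NOVA 4 corner (0, 0, 0, 1),
    scaled so that the corners map to 0 and 1 respectively.
    """
    if len(p) != 4:
        raise ValueError("expected four class probabilities")
    if any(x < 0 for x in p) or abs(sum(p) - 1.0) > 1e-6:
        raise ValueError("probabilities must be non-negative and sum to 1")

    start = (1.0, 0.0, 0.0, 0.0)       # NOVA 1 corner
    direction = (-1.0, 0.0, 0.0, 1.0)  # points toward the NOVA 4 corner

    # Orthogonal projection: s = (p - start) . direction / |direction|^2
    dot = sum((pi - si) * di for pi, si, di in zip(p, start, direction))
    norm_sq = sum(di * di for di in direction)
    return dot / norm_sq  # algebraically (1 - p1 + p4) / 2


if __name__ == "__main__":
    # Made-up probability vectors, for illustration only.
    for probs in [(1, 0, 0, 0), (0, 1, 0, 0), (0.1, 0.2, 0.3, 0.4), (0, 0, 0, 1)]:
        print(probs, round(fpro_from_probabilities(probs), 4))
```

In this sketch, a food classified with certainty as NOVA 2 or NOVA 3 lands at the midpoint of the segment. Only the balance between p1 and p4 moves the score toward either end.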