Fig. 4: Generating space-efficient trees.
From: Machine learning enables improved runtime and precision for bio-loggers on seabirds

a Our process for the weighted random selection of features. We start with a list of features along with their required programme memory sizes in bytes (first panel). Each feature is assigned a weight proportional to the inverse of its size, illustrated using a pie chart in which each feature receives a slice proportional to its weight (second panel). We then perform weighted random selection to choose the subset of features to be considered when creating a new node in the tree. In this example, we have randomly placed four dots along the circumference of the circle to simulate the selection of four features (second panel). The resulting subset of candidate features is then compared when creating the next node in the decision tree (third panel). b Example decision tree built with scikit-learn’s default decision tree classifier on the black-tailed gull data described in “Methods”. Each node is coloured according to its corresponding feature’s estimated size in bytes when implemented on board the bio-logger (scale shown in the colour bar). c Several space-efficient decision trees generated with the proposed method from the same data used to create the tree in (b). d Example space-efficient tree selected from the trees shown in (c) that requires much less programme memory than the default tree in (b) while maintaining almost the same accuracy.
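The selection procedure in (a) can be sketched in Python. This is a minimal illustration, not the authors' implementation: the feature names, sizes, and the `weighted_feature_subset` helper are hypothetical, and it assumes each "dot" on the circle picks the feature whose slice it lands in, with duplicate picks collapsing into a set.

```python
import bisect
import itertools
import random

def weighted_feature_subset(feature_sizes, n_dots, rng=random):
    """Choose a subset of features by weighted random selection,
    weighting each feature by the inverse of its size in bytes.

    Mirrors the pie-chart illustration: n_dots points are placed
    uniformly around the circle, and each point selects the feature
    whose slice it lands in. Duplicate picks collapse, so the result
    may contain fewer than n_dots features.
    """
    names = list(feature_sizes)
    weights = [1.0 / feature_sizes[name] for name in names]
    total = sum(weights)
    # Cumulative slice boundaries around the circle, normalised to [0, 1).
    cum = list(itertools.accumulate(w / total for w in weights))
    chosen = set()
    for _ in range(n_dots):
        dot = rng.random()  # a random point on the circumference
        chosen.add(names[bisect.bisect_right(cum, dot)])
    return chosen

# Hypothetical features with assumed programme memory sizes in bytes:
sizes = {"mean": 40, "variance": 80, "fft": 400}
subset = weighted_feature_subset(sizes, n_dots=4, rng=random.Random(0))
```

Small features such as `mean` dominate the wheel (slice width 1/size), so cheap features are proposed far more often, which is what biases the resulting trees towards low memory cost.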