Background & Summary

Understanding large-scale patterns of diversity, and the ecological and evolutionary origins and consequences of these patterns, is of growing interest. These efforts have historically been constrained by the limited availability of comparative quantitative trait datasets at large spatial and taxonomic scales1. As large datasets have become available, they have stimulated significant advances in macroecology and macroevolution.

While large-scale trait datasets are increasingly available across a range of taxa (e.g., vascular plants2, lizards3, and freshwater fish4), birds are a model system in macroecology and macroevolution, with their well-known distributions5, extinction risks5, phylogenetic relationships6,7, ecological niches8,9, life history strategies10, nesting biologies11, and external morphologies12. These diverse datasets have been integrated to answer a wide range of questions spanning ecology and evolution (e.g.13,14,15,16,17) and are increasingly being used to understand human impacts on natural systems (e.g.18,19,20,21,22). Although much has been learned from existing large-scale datasets, in animals, the availability of trait data spanning multiple anatomical systems would open new avenues of research and could allow for more mechanistic understanding of morphological patterns. Bird skeletons, which are well-represented in natural history collections, present an underutilized opportunity to develop such a dataset.

The accumulation of comparative skeletal trait data for many traits and across many species and individuals has lagged far behind the generation of data from the measurement of external traits23. In birds, aspects of the skeleton provide key insights into bird locomotion24, the physics of flight25,26, directional evolution27, phylogenetic relationships28, and responses to environmental change29, and are often used to better understand the morphologies of fossil birds30. Further, the utility of skeletal traits expands significantly when they can be easily studied in conjunction with other types of phenotypic data. For example, although the tendency for appendages to be longer in warmer climates (i.e., Allen’s Rule31) is a classic pattern in macroecology and has been the focus of intensive research for over a century, the integration of skeletal and plumage trait data revealed a novel morphological trend that generated new insights into the mechanistic basis underlying Allen’s Rule32. As such, the availability of comparative skeletal data may provide new insights into macro-scale patterns in avian morphology and improved mechanistic understanding of the drivers of bird morphological variation across space and time.

This dataset encompasses a large portion of the diversity within Passeriformes, the most diverse order of Neornithes (modern, living birds). The 2,057 species in the dataset comprise 34% of passerine species and represent 89% of passerine families6. The sampling is also spatially expansive and includes specimens from all continents where passerines are resident (Fig. 1). Multiple individuals were measured per species when possible, resulting in a dataset that includes 14,419 individuals. We targeted twelve skeletal elements for each specimen. Combined with our estimates for missing values, the dataset includes 173,028 trait values (12 traits × 14,419 individuals). The dataset could be expanded in the future in two ways: by photographing additional taxa and applying the existing model to them, or by training the model to identify and measure new elements (or new aspects of the elements we already identify) on the existing body of specimen images. The data are presented in three formats: 1) a specimen-level dataset that only includes trait values that were directly measured, 2) a specimen-level dataset with no missing data that includes both the directly measured trait values and imputed trait values, and 3) a complete species-level dataset derived by applying a multivariate evolutionary model. The taxonomy in the datasets has been unified to the BirdLife Version 3 taxonomy to facilitate integration with existing large-scale datasets and to simplify conversion to other widely used taxonomies using recently published taxonomic crosswalks12. As such, it should be straightforward to integrate our data with data on the phylogenetic history of birds6, bird range maps5, IUCN threat statuses5, and existing comprehensive external trait data12. Importantly, the methods used to generate these data are open source23 and easily applied, enabling future expansion of the dataset.

Fig. 1

Dataset coverage. (A) The dataset includes species spanning all continents where passerines are resident. The ranges of all species included in the dataset are plotted, with colour indicating the number of species included in our dataset at each point in space. (B) The species in the dataset span Passeriformes, the most diverse taxonomic order within modern birds. Each bar associated with a tip on the phylogeny represents a species that is included in the dataset, with the height of the bar indicating the number of individuals of that species that were measured and included in the dataset, reflecting the high intra-specific sampling for some species and high variation in sampling among species. The bird skeleton highlights the bones that were measured in darker green.

Methods

Sampling

The large majority of our data come from skeleton specimens held in the University of Michigan Museum of Zoology (UMMZ), one of the largest and most diverse bird skeletal collections in the world, where we effectively photographed and measured the entirety of the UMMZ’s passerine skeletal collection (N = 12,421, number of species = 1,881). We supplemented this dataset with specimens at the Field Museum of Natural History (FMNH; N = 1,998, number of species = 438), targeting families that are well represented in both the UMMZ and FMNH collections, with an emphasis on species found in the Neotropics. Access to the specimens was granted upon request by the curators of UMMZ and FMNH. Ultimately, we photographed 14,419 specimens spanning 2,057 species, from 86 families (Fig. 1). Every trait measurement has an associated specimen catalogue number that can be used to link each measurement to the specimen, housed in UMMZ or FMNH, and the images of the UMMZ specimens are accessible on the Deep Blue Data repository (https://doi.org/10.7302/69fn-md77).

Photographing

Trait measurements were generated using Skelevision23, a deep neural network-based approach for identifying and measuring skeletal elements in photographs of bird skeleton specimens. In this approach, museum skeleton specimens are first removed from their containers and spread randomly on a standard background, except for the keel and the skull, which are consistently oriented to display their profile. They are then photographed from a fixed distance before being returned to their boxes. Each specimen is photographed individually, independently, and in its entirety.

The photographs were taken with the same imaging equipment that was used in the Skelevision methods paper23. All images were collected from ~400 mm above the specimen, using a SONY IMX183 sensor on a FLIR Blackfly S camera. This generated photographs with a pixel size of 0.07 mm.

Trait measurement

We applied the Skelevision method23 for segmenting, identifying, and measuring target bones to each photograph. This method integrates a U-Net33 and Mask R-CNN34 trained on images annotated by hand35,36 to identify pixels in the images that are bone, determine which element the pixels belong to, and then segment the elements (the model was only trained to segment the 12 target elements). The pipeline then takes the segmented masks for all elements and measures the longest linear dimension of each element by drawing a bounding box around it and measuring the longest diagonal23. We used this method to measure 12 traits: the lengths of the tibiotarsus, humerus, tarsometatarsus, ulna, radius, keel, carpometacarpus, 2nd digit 1st phalanx, furcula, and femur; the maximum outer diameter of the sclerotic ring; and the length from the back of the skull to the tip of the bill (treating the rhamphotheca as part of the bill when it remains present on the specimen).
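The measurement step can be illustrated with a minimal sketch. It assumes an axis-aligned bounding box and a binary mask stored as nested lists; neither detail is specified here, and the function name `bbox_diagonal_mm` is hypothetical rather than part of Skelevision. The only value taken from the text is the 0.07 mm pixel size.

```python
import math

PIXEL_SIZE_MM = 0.07  # pixel size reported for the imaging setup


def bbox_diagonal_mm(mask, pixel_size_mm=PIXEL_SIZE_MM):
    """Diagonal of the axis-aligned bounding box of a binary mask, in mm.

    `mask` is a list of rows, each a list of 0/1 values (1 = bone pixel).
    Returns None when the mask contains no bone pixels.
    Assumption: the box is axis-aligned; the real pipeline may differ.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return None
    height_px = max(rows) - min(rows) + 1
    width_px = max(cols) - min(cols) + 1
    # Convert the pixel-space diagonal to millimetres.
    return math.hypot(height_px, width_px) * pixel_size_mm


# A toy element occupying a 3 x 4 pixel box.
example_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
example_length_mm = bbox_diagonal_mm(example_mask)
```

In this toy case the box is 3 × 4 pixels, so the diagonal is 5 pixels, or roughly 0.35 mm at the stated pixel size.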

Skelevision estimates the probability that its classification of each element is correct, given the classification options (‘bprob’). Because elements that are classified with a lower certainty (i.e., a low bprob) are at a higher risk of false positives23, we filtered out all trait estimates with a bprob < 0.95; this threshold has been found to result in a relatively low rate of false negatives without increasing the risk of false positives23. For specimens with multiple high-confidence estimates of a trait (e.g., if two femurs were confidently identified and measured from a specimen), we combined these measures by taking the mean. In this way, a single high-quality estimate of each trait was made for each specimen whenever at least one example of an element was confidently identified (Skelevision-Only Dataset). Traits without at least one high-confidence estimate (i.e., no measure with a bprob ≥ 0.95) were marked as missing data (given a value of ‘NA’ in the Skelevision-Only Dataset).
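The filtering and averaging rule described above can be sketched as follows. The tuple layout of the detections and the function name `summarise_specimen` are assumptions made for illustration; only the 0.95 threshold, the use of the mean, and the ‘NA’ marker come from the text.

```python
from collections import defaultdict
from statistics import mean

BPROB_THRESHOLD = 0.95  # confidence cut-off stated in the text


def summarise_specimen(detections, traits, threshold=BPROB_THRESHOLD):
    """Collapse per-element detections into one value per trait.

    detections: iterable of (trait_name, length_mm, bprob) tuples for a
        single specimen (hypothetical layout).
    traits: the trait names expected in the output.
    Returns a dict mapping each trait to the mean of its confident
    measurements, or to "NA" when no measurement passes the threshold.
    """
    kept = defaultdict(list)
    for trait, length_mm, bprob in detections:
        if bprob >= threshold:  # discard low-confidence classifications
            kept[trait].append(length_mm)
    return {t: (mean(kept[t]) if kept[t] else "NA") for t in traits}


# Two confident femurs are averaged; the low-confidence ulna becomes NA.
example = summarise_specimen(
    [
        ("femur", 20.0, 0.99),
        ("femur", 21.0, 0.97),
        ("femur", 30.0, 0.50),
        ("ulna", 25.0, 0.90),
    ],
    traits=["femur", "ulna", "keel"],
)
```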

Phylogenetic data imputation and validation

To generate a 100% complete dataset, we imputed values for all missing data in the Skelevision-Only Dataset using Rphylopars, a maximum likelihood approach for fitting multivariate phylogenetic models and estimating missing values in comparative data37. An advantage of this approach is that it can model variation at the level of individual specimens along with variation among species, providing estimates of missing values at the individual level. This approach allowed us to leverage the dataset’s large size and dimensionality (12 dimensions × 14,419 individuals), along with the expected hierarchical structure due to phylogeny, to estimate missing values. To approximate phylogenetic relationships among the included species, we downloaded 1,000 trees from the posterior distribution of a phylogeny for all birds6,38 and constructed a consensus tree following Rubolini et al.39. Using this consensus tree, we used Rphylopars to estimate variance-covariance structures (both within and between species) according to the expectations of a multivariate Brownian motion (‘mvBM’) process. All data were log-transformed before model fitting with Rphylopars; otherwise, we left all user options at their defaults. The full code pipeline is available on Zenodo (https://doi.org/10.5281/zenodo.15256923).
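The intuition behind covariance-based imputation can be shown with a deliberately simplified sketch. It is not the Rphylopars procedure: it ignores phylogeny and the within- and between-species partition, uses only two traits, and assumes natural logarithms (the base is not stated above). It shows only how a correlated, observed trait informs the estimate of a missing one through a conditional mean on the log scale. All parameter values are made up.

```python
import math


def impute_log_scale(observed_mm, mean_obs, mean_miss, var_obs, cov):
    """Conditional-mean estimate of a missing trait from one observed trait.

    mean_obs, mean_miss: log-scale means of the observed and missing traits.
    var_obs: log-scale variance of the observed trait.
    cov: log-scale covariance between the two traits.
    Returns the estimate back-transformed to millimetres.
    """
    log_obs = math.log(observed_mm)
    # Bivariate-normal conditional mean: mu_m + (cov / var_o) * (x_o - mu_o)
    log_est = mean_miss + (cov / var_obs) * (log_obs - mean_obs)
    return math.exp(log_est)


# Illustrative numbers only: an individual whose observed trait sits one
# log unit above its mean shifts the missing trait up by cov / var_obs.
estimate_mm = impute_log_scale(
    observed_mm=math.exp(2.0), mean_obs=1.0, mean_miss=2.0, var_obs=1.0, cov=0.5
)
```

With zero covariance the estimate collapses to the back-transformed mean of the missing trait, which is why strong multivariate correlations matter for imputation accuracy.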

To evaluate the accuracy of trait imputation, we validated our approach by iteratively withholding 10–90% of the non-missing data for each trait to use as test data. We then estimated the mvBM model, excluding the test data in the model fitting, and imputed the withheld test data. For each level of missing data, we repeated the analyses ten times, selecting a different random sample of the dataset to use as test data for each replicate. We then compared the known true values of the test data to their estimated values by calculating the root-mean-squared error (RMSE) and percentage bias (p-bias; Fig. 2).
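The two accuracy metrics and the withholding step can be sketched as below. The p-bias formula used here, 100 × Σ(estimate − true) / Σ(true), is one common definition and is an assumption, since the exact formula is not given above. The helper names are hypothetical, and the model fitting that would occur between withholding and scoring is omitted.

```python
import math
import random


def rmse(true_vals, est_vals):
    """Root-mean-squared error between true and estimated values."""
    squared = [(e - t) ** 2 for t, e in zip(true_vals, est_vals)]
    return math.sqrt(sum(squared) / len(squared))


def p_bias(true_vals, est_vals):
    """Percentage bias (assumed definition): 100 * sum(est - true) / sum(true)."""
    return 100.0 * sum(e - t for t, e in zip(true_vals, est_vals)) / sum(true_vals)


def withhold(values, fraction, rng):
    """Pick a random share of the non-missing entries to use as test data.

    values: one trait's measurements, with None marking missing data.
    Returns the sorted indices of the withheld entries.
    """
    observed = [i for i, v in enumerate(values) if v is not None]
    n_test = round(fraction * len(observed))
    return sorted(rng.sample(observed, n_test))


# Ten replicates at one level of withheld data, each with a different sample.
trait = [1.0, None, 2.0, 3.0, None, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
replicates = [withhold(trait, 0.5, random.Random(seed)) for seed in range(10)]
```

Each replicate would then be scored by imputing the withheld entries and passing the true and imputed values to `rmse` and `p_bias`.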

Fig. 2

Variation in trait imputation accuracy across different levels of missing data. We masked an increasing percentage of the data generated by Skelevision (10–90% in intervals of 10%) and then imputed the missing values. We estimated the RMSE and p-bias across ten randomized sets of data (comparing the imputed values to the Skelevision-generated values) at each level of masked data and present the mean per trait. RMSE and p-bias are generally low and stable across the range of missing data levels.

We present the data from these analyses as complete datasets both at the specimen level (Complete Trait Dataset) and at the species level (Species-level Data Estimates). The species-level dataset uses species averages estimated by Rphylopars and includes both the averages and the estimated standard error associated with each species mean. The species-level means may include individuals of varying ages, although we endeavoured to exclude any obviously immature specimens from the dataset.

Data Records

The datasets are available at Dryad (https://doi.org/10.5061/dryad.v41ns1s4c)40. For all datasets, we include species binomials following the BirdLifeV3 taxonomy. For the specimen-level data, we also include museum specimen catalogue numbers40. The data are presented in three datasets:

Skelevision-only dataset

This dataset includes all the Skelevision data measured with high confidence (bprob ≥ 0.95). It does not include any imputed trait measurements. The data are available for download as a comma-separated values file, “Skelevision_Only_Dataset_v1.csv”.
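As a rough illustration of working with this file, the sketch below counts ‘NA’ cells per column using only the Python standard library. The column names in the toy example are hypothetical, because the file's actual headers are not listed here; only the ‘NA’ marker and the file name come from the text.

```python
import csv
import io


def count_missing(csv_file, na_token="NA"):
    """Count cells equal to `na_token` in each column of an open CSV file."""
    reader = csv.DictReader(csv_file)
    counts = {name: 0 for name in (reader.fieldnames or [])}
    for row in reader:
        for name, value in row.items():
            # Skip overflow cells that have no matching header.
            if name in counts and value == na_token:
                counts[name] += 1
    return counts


# Toy stand-in for the real file; these column names are made up.
sample = io.StringIO(
    "species,catalogue_number,femur,ulna\n"
    "Species a,1001,20.5,NA\n"
    "Species a,1002,NA,NA\n"
    "Species b,1003,18.2,24.1\n"
)
missing_per_column = count_missing(sample)

# With the downloaded file, the same function could be applied as:
# with open("Skelevision_Only_Dataset_v1.csv", newline="") as fh:
#     print(count_missing(fh))
```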

Complete trait dataset

This dataset includes all the Skelevision data measured with a high degree of confidence and imputed trait values for all missing data. The data are available for download as a comma-separated values file, “Complete_Trait_Dataset_v1.csv”.

Species-level data estimates

This dataset includes species means from the model fit using Rphylopars. We also provide estimates of the species-level standard error, variance, and 95% confidence intervals around trait means for downstream analyses. The data are available for download as a comma-separated values file, “Skelevision_species_complete_v1”.

Technical Validation

Skelevision accuracy

The accuracy of the processed data (Skelevision-Only Dataset) generated from the UMMZ image capture setup and specimens has been quantified previously. Weeks et al.23 compared 100 handmade measurements of each trait (except the furcula) to Skelevision estimates of the same traits on the same specimen and found a mean RMSE of 0.89 mm, with some variation in error across bone types (Table 1).

Table 1 Estimated error of Skelevision-generated data.

Skelevision could perform differently across image-capturing contexts (e.g., locations that differ in lighting), and specimen preparation varies somewhat between UMMZ and FMNH (e.g., in the degree to which bones remain articulated). To address these risks, and to collect validation data for the furcula, we conducted a similar validation test with a subset of the FMNH data. For a random sample of 30 specimens from FMNH, a single person measured each trait of interest from the photographs of the specimens using ImageJ software41. We then compared these handmade measurements of the trait values to the Skelevision measurements for the same trait on the same skeleton. As with the UMMZ samples, assuming the handmade measurements are correct, we find Skelevision is accurate, with a mean RMSE of 1.78 mm across all traits. This is higher than the RMSE of the UMMZ specimens but remains comparable to inter-human measurement error. The errors are not uniform among the element types: many have similar or lower errors compared to the UMMZ data, while a few have elevated error levels, albeit on a similar scale to the expected range of human measurement error (Table 1)9.

Trait imputation accuracy

The RMSE of the imputed data was uniformly low across replicates and increasing levels of additional missing data (Table 2; Fig. 2). The low levels of error in the imputed data suggest that uncertainty in phylogenetic relationships has a negligible impact on the trait imputation accuracy, and that the imputation is robust to variation in the level of missing data. The maximum mean RMSE (~1.15 mm) was observed for estimates of the length of the second digit; notably, this occurred only when we simulated maximal amounts of additional missing data (90%; or ~85% missing data overall). This maximum mean error is similar to that observed from human measurement9. We also observed very low p-bias (<1%) in estimated values throughout the range of evaluated levels of missing data, lending further credence to the validity of our approach. Maximum p-bias values ranged from ~−0.4% (radius) to 0.5% (second digit) and thus were essentially negligible, particularly at the levels of missing data within the dataset (Table 2); most traits had a mean p-bias centered near zero.

Table 2 Estimated error of imputed data.

In general, when anatomical traits have strong phylogenetic signals and multivariate correlations, we expect to be able to estimate missing values with high accuracy and precision under mvBM. Our approach highlights the power of multivariate phylogenetic models to generate complete datasets at the level of individual specimens and should provide a useful framework for future research.