A harmonized dataset of ground-mounted solar energy in the US with enhanced metadata

Stid, Jacob T.; Kendall, Anthony D.; Anctil, Annick; Rapp, Jeremy; Bingaman, James C.; Hyndman, David W.

doi:10.1038/s41597-025-05862-4

Data Descriptor
Open access
Published: 29 September 2025

Scientific Data volume 12, Article number: 1586 (2025)

Abstract

Solar energy generating systems are critical components of our expanding energy infrastructure, yet available datasets remain incomplete or not publicly available–particularly at the sub-array level. Combining the best open access datasets in the US with image analysis on freely available remotely-sensed imagery, we present the Ground-Mounted Solar Energy in the United States (GM-SEUS) dataset, a harmonized, open access geospatial and temporal repository of solar energy arrays and panel-rows. GM-SEUS v1.0 includes over 15,000 commercial- and utility-scale ground-mounted solar photovoltaic and concentrating solar energy arrays (186 GW_DC) covering 2,950 km² and includes 2.92 million unique solar panel-rows (466 km²). We use these newly compiled and delineated solar arrays and panel-rows to harmonize and independently estimate value-added attributes to existing datasets including installation year, azimuth, mount technology, panel-row area and dimensions, inter-row spacing, ground cover ratio, tilt, and installed capacity. By harmonizing and estimating attributes of the distributed US solar energy landscape, GM-SEUS supports diverse applications in renewable energy modeling, ecosystem service assessment, and infrastructure planning.

Background & Summary

High-quality spatiotemporal characterization of solar energy systems (photovoltaic–PV and concentrating solar power–CSP) has historically been sparse, incomplete, or held behind privacy barriers or paywalls. This gap in data availability has hindered regional- and global-scale analysis of a key component of the growing and diversifying energy landscape and has inhibited investigation of solar energy design, distribution, and monitoring. Recently, numerous groups have attempted to fill this gap using remote sensing, manual digitization, crowdsourcing, and machine learning techniques to spatiotemporally characterize solar energy across the globe (e.g., refs. ^{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40}). Others maintain databases of value-added attributes for a variety of applications (e.g., refs. ^{41,42,43,44,45,46,47,48,49,50,51}). However, dataset availability, quality, and completeness vary widely, leaving key solar energy design information unknown for the broader scientific community. We aim to fill this gap by compiling a harmonized spatiotemporal dataset of ground-mounted solar energy arrays in the United States (US). We go further by using high-spatial-resolution aerial imagery alongside high-temporal-resolution satellite imagery to independently estimate a suite of installation design metadata that contributes new knowledge on the solar energy landscape.

Understanding the location and design of current renewable energy infrastructure allows for more effective modeling, monitoring, and planning efforts for future infrastructure. The United States Large-Scale Solar Photovoltaic Database (USPVDB) is the most comprehensive publicly available, regularly updated, and standardized dataset of georectified utility-scale solar arrays in the US^26,52. Importantly, this database contains valuable permitting data from the US Energy Information Administration (EIA) Form 860 including installation year, installed capacity, mount technology (fixed-axis, single-axis tracking, or dual-axis tracking), tilt, azimuth, prior land use, agrivoltaic acceptance, and more. Although USPVDB is the current best available solar metadata dataset in the US, it does have limitations. USPVDB reports utility-scale (≥1 MW_DC) solar PV installations, which comprise the majority of installed capacity in the US²⁶. However, this database omits the more numerous and distributed commercial-scale (<1 MW_DC) solar PV projects^53,54 and CSP installations. These limitations leave critical data gaps in our understanding of distributed energy resources and the solar energy landscape. There are key differences in the ecosystem service and economic land use trade-offs between commercial- and utility-scale installations⁵⁵. Proportionally, commercial-scale systems tend to reside on cropland more often than utility-scale installations¹⁴ and experience less regulation and oversight⁵⁶. Recent work has also shown that reported metadata can be incomplete or contain errors⁵³ and can underestimate the total extent of installed solar energy⁵⁷. There is thus a need for a comprehensive and independent metadata characterization of this rapidly deployed technology.

Kruitwagen et al.¹⁴ produced the first global and publicly available geospatial dataset of solar energy installations. The follow-on product, the TransitionZero Solar Asset Mapper (TZ-SAM), is an open-access, global, and regularly updated dataset of commercial- and utility-scale solar facilities, derived using machine learning with Copernicus Sentinel-2 imagery (10 m), trained and validated on existing and hand annotated datasets³⁶. OpenStreetMap (OSM), one of the richest geographical databases in existence, also provides access to commercial- and utility-scale solar arrays and panel-row data generated by open collaboration–crowd-sourced hand annotation of aerial and satellite imagery⁵⁸–that has been used in a number of previous solar data acquisition efforts (e.g., refs. ^8,9,14,36,40). There are concerns about the spatial quality and consistency of medium-coarse resolution remote sensing and crowd-sourced solar array delineation^17,26,36. For example, remote sensing and crowd-sourced datasets are known to overestimate the total area of an array due to ambiguous array definitions or to classification of medium resolution satellite imagery^9,17,26. Yet, datasets like TZ-SAM and OSM are critical for filling temporal, scale, and reporting bias limitations. Together, USPVDB, TZ-SAM, OSM, and similar high-fidelity spatial delineation and metadata acquisition efforts provide the foundation for understanding the renewable energy landscape and for the dataset presented here.

Solar array siting, management, and design choices have long-term impacts on electricity production, the physical landscape, and related ecosystem services. Most often, available datasets derived from permitting data, remote sensing, or even manual annotation stop at the project or array scale. However, panel-row design metadata would allow for the scaling of in-depth design and design-impact analyses that are often completed at a single system level⁵⁶ and thus limited by a lack of high-resolution data. Several modern tools and approaches have been published that work to optimize solar designs for electricity production^59,60,61,62, co-production of electricity and vegetation^63,64,65, and stormwater runoff^{66,67,68,69,70}. Along with numerous other tools, these models depend on array location and design characteristics that have not been widely available prior to this work.

Here, we leverage the best available datasets and databases to compile a comprehensive ground-mounted solar array dataset that is up to date, open access, and not limited to utility-scale capacity. We also compile existing panel-row datasets and, where available, use high spatial-resolution imagery to delineate new panel-row objects within solar array bounds. We use this new panel-row delineation to standardize and enhance existing array and panel-row boundaries, addressing concerns about accuracy and harmonization of manually digitized datasets and medium to coarse remote sensing-derived datasets^17,26,36. We add value to the dataset by independently estimating several array- and panel-row attributes including installation year, installed capacity, mount technology, ground cover ratio (GCR), fixed-axis tilt, panel-row azimuth and dimensions, and inter-row spacing.

The Ground-Mounted Solar Energy in the United States (GM-SEUS) v1.0 dataset contains 15,017 ground-mounted solar PV and CSP arrays covering 2,944 km². The dataset includes 9,631 utility-scale arrays composing an estimated 184.2 GW_DC and 5,386 commercial-scale arrays composing an estimated 2.1 GW_DC, making this the largest publicly available US solar repository to date (Fig. 1). For 9,042 arrays (83.1 GW_DC), we delineated 2.92 million high-quality solar panel-rows, thus improving array geometries and providing sub-array design metadata. Collectively, the solar panel-row geometries compose 466 km² in total area. Including harmonized metadata, 42% of arrays were fixed-axis, 21% were single-axis, 2.1% were dual-axis, 2.6% were mixed, and 33% were unknown. Solar PV GCR varies with mount type, and on average was 53% for fixed-axis, 42% for single-axis tracking, 50% for dual-axis tracking, and 63% for arrays with mixed mounts (GCR₁).

The goal of this effort is to provide researchers and policymakers with a distributed ground-mounted solar array dataset and panel-row design metadata on the existing US solar energy landscape. Additional use cases may include nowcast modeling and transmission planning^{71,72,73,74,75}; grid pricing and incentive planning⁷⁶; tracking policy effectiveness⁷⁷; evaluating property value impacts^78,79; assessing public perception towards projects with varying scale and levels of community engagement⁸⁰; tracking agricultural production opportunity costs⁵⁵; modeling carbon storage and sequestration potential⁸¹; modeling potential for habitat connectivity and pollination services⁸²; pattern recognition and deep learning semantic segmentation models^3,36; soiling and performance monitoring^83,84,85; repowering preparation⁸⁶; and tracking spatiotemporal material stock and recycling potential in existing infrastructure^87,88,89,90.

Several efforts have extracted panel-row design metadata (e.g., refs. ^{21,23,28,53,91,92}), but this is the first endeavor to provide a publicly available dataset of this magnitude and spatial coverage. Importantly, the dataset is open access with all code and data available for training and acquisition of new array datasets both in the US and other countries. Greater knowledge of global solar PV panel-level distribution would extend the use cases reported here to the global PV market and all impacted landscapes. We intend to update this dataset annually and invite others to continue to introduce new value-added attributes to this dataset.

Methods

The five development phases of GM-SEUS are outlined in Fig. 2 and described in detail in the following Methods and Technical Validation sections. In phase 1, Compile Existing Geospatial Data, we collected and harmonized freely available solar array and panel-row datasets with existing geospatial data, removing duplicates. In phase 2, Georeference Metadata, we spatially referenced value-added attributes from solar array point data to polygon boundaries by proximity and manual digitization, delineating new array boundaries where necessary. Phase 3, Acquire Panel-Row Shapes, involved using a combination of image analysis approaches to classify solar panel-rows using high-resolution aerial imagery combined with existing panel-row data. In phase 4, Enhance Design Metadata, we used this panel-row data to generate new array boundaries and estimate several design attributes using remote sensing and geospatial analysis. Finally, in phase 5, Validate and Share, we performed technical validation of the derived design metadata and new spatial boundaries using high-quality reference datasets, while also ensuring that we maintain open access data principles.

Compiling existing geospatial data

Existing solar array data in the US

We compiled a dataset of distributed multi-scale ground-mounted solar arrays across the contiguous US (CONUS) complete through December 2024. We chose to use only freely available datasets to ensure availability of our results. We used existing ground-mounted solar array datasets in the US that contained explicit array polygons. Each dataset is unique in coverage and metadata completeness and was created for a distinct purpose. We collected data from the following open repositories: the United States Large-Scale Solar Photovoltaic Database v2.0 (USPVDB)^26,52, project and panel-row annotations from OpenStreetMap (OSM)⁵⁸, the TransitionZero Global Solar Asset Mapper Q3-2024 (TZ-SAM)^36,57, a California Central Valley solar PV dataset (CCVPV)^21,93, and a Chesapeake Watershed solar dataset (CWSD)^25,94.

We defined a solar array footprint or boundary as adjacent, existing, and connected solar panel-rows (PV or CSP) of the same installation year including the inter-row spacing between them (Fig. 3). We qualitatively assessed the spatial quality of input array geometries based on adherence to our array boundary definition (Fig. 3), inferred from reported delineation methods and visual inspection. High-quality sources reported clearly documented methods that aligned with our array footprint definition. Lower quality sources were those with less certain boundary definitions or those derived from medium-resolution imagery, often delineating project boundaries rather than array footprints. The resulting order of adherence was: USPVDB, CCVPV, CWSD, OSM, and TZ-SAM, with USPVDB, CCVPV, and CWSD most closely following the provided definition of an array.

We also compiled existing value-added solar energy datasets that contained location (latitude and longitude) spatial data without explicit array boundaries. These data were from the following open repositories: the National Renewable Energy Laboratory (NREL) Agrivoltaic Map from the InSPIRE initiative⁴⁵, the Lawrence Berkeley National Lab (LBNL) Utility-Scale Solar (USS), 2024 Edition report⁴⁹, the NREL Photovoltaic Data Acquisition initiative (PVDAQ)⁴⁶, the International Energy Agency and NREL hosted Solar Power and Chemical Energy Systems (SolarPACES) initiative CSP.guru data product^47,95, Global Energy Monitor’s (GEM) Global Solar Power Tracker (GSPT)⁴⁸, and the World Resources Institute’s Global Power Plant Database v1.3.0 (GPPDB)^42,96. Some of these datasets were compilations of each other and existing datasets including the EIA Form 860, Wiki-Solar⁴¹, and various other regional and global sources including those used here. We joined these locations with existing array boundaries using a 190 m radius, the distance at which ~75% of solar location data is associated with an existing array⁵³. For all geospatial datasets, we excluded decommissioned arrays where that information was available. Existing array and panel-row data sources are described in Table 1.
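The proximity join described above can be sketched in a few lines. This is a minimal illustration and not the GM-SEUS workflow: it matches each point to the nearest array within 190 m using the haversine distance, and it represents each array by a single coordinate, whereas the actual join is made against polygon boundaries. The function names, the spherical Earth radius, and the sample coordinates are assumptions introduced for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation, assumed)
JOIN_RADIUS_M = 190.0         # join radius stated in the text


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_points_to_arrays(points, arrays, radius_m=JOIN_RADIUS_M):
    """Map each point id to the id of the nearest array within radius_m, or None.

    points and arrays are dicts of id -> (lat, lon). Array coordinates stand in
    for polygon boundaries (a simplification made for this sketch).
    """
    matches = {}
    for pid, (plat, plon) in points.items():
        best_id, best_d = None, radius_m
        for aid, (alat, alon) in arrays.items():
            d = haversine_m(plat, plon, alat, alon)
            if d <= best_d:
                best_id, best_d = aid, d
        matches[pid] = best_id
    return matches


# Hypothetical example: one array, one nearby point (~111 m), one distant point (~1.1 km).
arrays = {"A": (42.0, -84.0)}
points = {"p1": (42.001, -84.0), "p2": (42.01, -84.0)}
matches = join_points_to_arrays(points, arrays)
```

Points left unmatched (`None`) correspond to the case handled in the next section, where a boundary must be digitized or located by other means.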

Table 1 Existing and publicly available geospatial datasets of solar energy systems.

Existing solar panel-row data in the US

To our knowledge, only two data repositories contain large quantities of ground-mounted solar panel-row geospatial data in the US: our recently published dataset of panel-row geometries in California’s Central Valley^21,93, and mixed array and panel-row data within OSM, most often tagged with generator:source = solar^8,58. We took guidance and motivation from existing OSM solar data extraction methods to process current solar array and panel-row data from OSM^8,97, though we developed our own independent workflow. We extracted complete polygon data from OSM for both the generator:source = solar tag (likely panel-rows) and the plant:source = solar tag (likely arrays). Within both tags, we separated panel-rows from arrays by identifying geometries whose area and perimeter-to-area ratio were comparable to those of the panel-rows reported in Stid et al.²¹. We removed duplicate panel-row objects, prioritizing hand-digitized objects from OSM over the imagery classification approach of CCVPV.

Georeferencing metadata

Digitizing missing solar array boundaries

The 190 m georeference distance for missing boundaries is shorter than the distances used for metadata attribution in similar studies, which range from 300 m (ref. ⁹) to 400 m (ref. ⁸). There were 1,616 arrays with value-added reference point data but without georeferenced solar array boundary data within 190 m. For the initial GM-SEUS v1.0, we manually delineated 126 missing array boundaries from the NREL Agrivoltaic Map using the most recently available imagery, including National Agriculture Imagery Program (NAIP) aerial imagery⁹⁸, Copernicus Sentinel-2 imagery⁹⁹, and Google Maps basemap imagery¹⁰⁰. We followed delineation logic from Fujita et al.²⁶ and our array definition, creating new boundaries that encompass panel-rows and the space between them. If the array geometry was present in the existing solar array datasets but was outside the 190 m radius, we georeferenced information to that shape, and added omitted array boundaries where necessary. Where possible, we investigated context using the array name in a Google search (which often pointed to InSPIRE, OSM, and GSPT repositories), leading to georeferencing and value-added attribute joining of new and existing objects between 191 m and ~50 km from the provided coordinates. We omitted 27 solar arrays installed after available reference imagery, or without available imagery and context. In total, we added 4.26 km² of new array area for 34 arrays. We intend to fully delineate and georeference the new and remaining 1,490 point data arrays in future version updates.

A complete reference dataset of existing ground-mounted panel-row and array data

We excluded rooftop solar arrays by removing existing and delineated array boundaries that had more than a 50% areal intersection with the Global Google-Microsoft Open Buildings Dataset¹⁰¹, and panel-rows within those array boundaries. The resulting preliminary dataset of existing ground-mounted solar arrays contained 14,905 arrays with over 3,056 km² in original direct land use area. For 4,470 of those existing arrays, the preliminary dataset contained 1.07 million unique panel-row objects composing 137.3 km² in direct panel-row area.

Acquiring solar panel-row shapes

We acquired solar panel-row shapes within solar array boundaries using high-resolution aerial imagery and a combination of pixel-based and object-based image analysis approaches. Though solar installations possess a distinctive spectral signature^15,21, pixel-based classifications alone can suffer from spectral confusion (noise) due to variance across imagery acquisition conditions and some spectral similarity to shadows, water, and impervious surfaces^21,102. The consistent pattern and constrained layout of solar panel-rows within solar arrays make geographic object-based image analysis particularly effective for improving classification accuracy. We thus used a combination of supervised machine learning approaches and unsupervised object-based image analysis including Random Forest¹⁰³, X-means clustering¹⁰⁴, simple non-iterative clustering (SNIC)¹⁰⁵, and gray-level co-occurrence matrix (GLCM) texture^106,107. The integration of these methods is described in the following section and is well-supported in the literature for both general land cover mapping and solar classification (e.g., refs. ^{15,21,34,39,108,109}).

NAIP 4-band imagery is the only free and widely available imagery with spatial resolution capable of delineating individual solar PV and CSP panel-rows. NAIP is collected during the primary regional growing season every two to three years at the state-level at 0.3 to 0.6 m resolution (Fig. 4). At the time of writing, the most recent NAIP mosaics made available in Google Earth Engine range from 2021 to 2023 depending on the state and flight contracts for specific years. During GM-SEUS processing, NAIP 2023 was actively being uploaded to Google Earth Engine, replacing 2021 imagery in some states. NAIP imagery dates used in the development of GM-SEUS are shown in Fig. 4.

Panel-row image classification

We classified panel-rows using five spectral indices with reported utility in identifying solar panel-rows²¹ and arrays¹⁵. The indices were the normalized difference photovoltaic index (NDPVI)²¹, the normalized blue deviation (NBD)²¹, 4-band brightness (Br), the normalized difference vegetation index (NDVI)^110,111, and the normalized difference water index (NDWI)¹¹². These indices are calculated by:

$$\mathrm{NDPVI}=\frac{\alpha B-\mathrm{NIR}}{\alpha B+\mathrm{NIR}} \tag{1}$$

$$\mathrm{NBD}=\frac{B-\frac{R+G}{2}}{B+\frac{R+G}{2}} \tag{2}$$

$$\mathrm{Br}=\frac{R+G+B+\mathrm{NIR}}{4} \tag{3}$$

$$\mathrm{NDVI}=\frac{\mathrm{NIR}-R}{\mathrm{NIR}+R} \tag{4}$$

$$\mathrm{NDWI}=\frac{G-\mathrm{NIR}}{G+\mathrm{NIR}} \tag{5}$$

where α is a weighting coefficient (0.5) to reduce the importance of variations in the blue band when differentiating impervious surfaces from the rest of the landscape²¹. The letters R, G, B, and NIR indicate the red, green, blue, and near infrared bands of the aerial imagery, respectively.
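As a worked illustration of Eqs. (1) to (5), the sketch below computes the five indices for a single pixel. It is not the Google Earth Engine implementation used in the study; the function names are hypothetical, and returning 0.0 when a denominator is zero is an assumption made here to keep the example total.

```python
ALPHA = 0.5  # blue-band weighting coefficient from Eq. (1)


def _norm_diff(a, b):
    """Normalized difference (a - b) / (a + b); 0.0 if a + b == 0 (assumed convention)."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def spectral_indices(r, g, b, nir, alpha=ALPHA):
    """Return the five indices of Eqs. (1)-(5) for one pixel's band values."""
    mean_rg = (r + g) / 2
    return {
        "NDPVI": _norm_diff(alpha * b, nir),  # Eq. (1)
        "NBD": _norm_diff(b, mean_rg),        # Eq. (2)
        "Br": (r + g + b + nir) / 4,          # Eq. (3)
        "NDVI": _norm_diff(nir, r),           # Eq. (4)
        "NDWI": _norm_diff(g, nir),           # Eq. (5)
    }


# Hypothetical band values for a bluish, low-NIR pixel.
idx = spectral_indices(r=60, g=70, b=100, nir=50)
```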

To incorporate spatial context, we clustered imagery within array boundaries using SNIC of the given spectral indices and a GLCM textural measure (sum average) of each index. SNIC is a polygonization segmentation approach that generates superpixel objects across a seed grid based on spatial-spectral context parameters such as compactness, connectivity, and neighborhood¹⁰⁵ that has shown promise in delineating solar arrays in combination with Random Forest¹⁵. GLCM textural metrics, specifically the sum average, further capture the unique spatial-neighborhood relationships between proximal panel-row-like pixels. GLCM has demonstrated utility in mapping high-resolution land cover (NAIP)¹⁰⁹ and in solar classification³⁴. SNIC and GLCM sum average were calculated for each spectral index using native Google Earth Engine functions.

We randomly sampled the SNIC superpixel clusters of the five spectral indices and the sum average for each index at 1,000 points within each array boundary to train a locally relevant X-means clustering algorithm. We used a minimum number of clusters of 2 (solar and non-solar) and a maximum of 4 clusters, allowing one level of variability in solar and non-solar supercluster averages (e.g., two module types or two ground covers). For large arrays (>5 ha) and arrays with multi-polygon boundaries, we split imagery within the array sub-boundary into equal area chunks (no greater than 5 ha) to enhance computational efficiency and to allow large arrays to have greater X-means variability.

To classify the unsupervised X-means clusters, we trained and ran a Random Forest model to identify panel-rows (distinct from other land covers) within each array. We generated a new CONUS NAIP training dataset with 12,000 training points composed of 6 classes and 2,000 sample points per class (solar: 0, developed: 1, vegetated: 2, water: 3, snow/ice: 4, barren/sparse vegetation: 5). This training dataset is distributed along with the GM-SEUS data to facilitate land use classification with NAIP data by others. To generate solar samples, we randomly sampled 2,000 panel-row centroids in existing solar panel-row data. We acquired land cover samples by randomly sampling 2018 and 2019 NAIP imagery within 25,000 Land Change Monitoring, Assessment, and Projection (LCMAP) validated reference plots from Pengra et al.¹¹³, ensuring each class had as close to 2,000 samples as possible given class limitations. Given that LCMAP contains few examples of snow/ice plots, we also randomly sampled ~2,000 snow/ice points from the Randolph Glacier Inventory¹¹⁴ within CONUS. Qualitatively, we observed that within array bounds where the surrounding ground cover was relatively constrained to vegetation, barren surfaces, and impervious surfaces, CSP and thin-film panel-rows exhibited spectral signatures distinct from those of solar PV. CSP panel-rows were often spectrally similar to snow/ice due to the high reflectance of CSP reflectors, and thin-film panel-rows could be spectrally similar to water because of their high absorbance of the visible and NIR spectrum. Thus, the final solar image classification included those respective classes for CSP and thin-film module type installations.

We classified the original NAIP imagery using the five spectral indices and the new NAIP training dataset with 200 trees and a bag fraction of 0.5. The Random Forest model had an overall accuracy of 99.6% based on the aggregated performance per tree and an out-of-bag error estimate of 0.35. Each cluster with the majority of its area classified as solar was assigned to the solar class. We then eroded small islands (commission errors) and filled holes (omission errors). Panel-rows were then vectorized and negative-buffered by a single pixel width to dissolve single-pixel inter-row connections. Given the large number of vertices captured by sub-meter image classification, we improved processing and storage efficiency by saving the convex hull of each pixel-based panel-row as the final geometry. All imagery and data were accessed, trained, classified, and analyzed in Google Earth Engine¹¹⁵.
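The majority rule that assigns a cluster to the solar class can be illustrated as follows. This is a simplified sketch, not the Earth Engine code: it assumes equal-size pixels so that pixel counts stand in for area, treats "majority" as strictly more than half, and uses a hypothetical function name. The `solar_classes` parameter is an illustrative way to also accept the snow/ice or water classes for CSP and thin-film installations, as described above.

```python
from collections import Counter

SOLAR_CLASS = 0  # class code for solar in the training set described above


def solar_clusters(pixels, solar_classes=(SOLAR_CLASS,)):
    """Return the set of cluster ids whose pixels are mostly classified as solar.

    pixels is an iterable of (cluster_id, predicted_class) pairs. With equal-size
    pixels, pixel counts are used as a proxy for area (an assumption).
    """
    totals, solar = Counter(), Counter()
    for cluster_id, predicted in pixels:
        totals[cluster_id] += 1
        if predicted in solar_classes:
            solar[cluster_id] += 1
    # Strict majority: more than half of the cluster's pixels must be solar.
    return {cid for cid, n in totals.items() if solar[cid] > n / 2}


# Hypothetical pixels: cluster 1 is 2/3 solar; clusters 2 and 3 are exactly half.
pixels = [(1, 0), (1, 0), (1, 2), (2, 2), (2, 0), (3, 0), (3, 1)]
result = solar_clusters(pixels)
```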

Filtering for high-quality panel-rows

The vectorized panel-row dataset contained commissions that were geometrically dissimilar to true positives within the dataset (universally) and within the array (locally). Thus, we universally removed panel-rows based on several criteria. We removed panel-row objects that were outside a minimum (15 m²) and maximum (2000 m²) panel-row area based on the minimum and maximum panel-row areas of the existing panel-rows dataset. We then universally removed any panel-row object with a perimeter to area ratio less than the minimum of the existing panel-row dataset (0.18).
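The universal filter amounts to three threshold checks per panel-row. The sketch below applies the thresholds quoted above; the function name is hypothetical, and treating the bounds as inclusive is an assumption, since the text does not say how values exactly at a threshold were handled.

```python
MIN_AREA_M2 = 15.0     # minimum panel-row area from the text
MAX_AREA_M2 = 2000.0   # maximum panel-row area from the text
MIN_PA_RATIO = 0.18    # minimum perimeter-to-area ratio from the text (1/m)


def passes_universal_filter(area_m2, perimeter_m):
    """True if a candidate panel-row passes the area and perimeter-to-area checks."""
    if not (MIN_AREA_M2 <= area_m2 <= MAX_AREA_M2):
        return False
    return perimeter_m / area_m2 >= MIN_PA_RATIO


# Hypothetical rectangles given as (width, length) in metres.
def rect(width, length):
    return width * length, 2 * (width + length)


long_row = passes_universal_filter(*rect(2, 50))     # 100 m^2, ratio 1.04
tiny_blob = passes_universal_filter(*rect(3, 3))     # 9 m^2, too small
square_blob = passes_universal_filter(*rect(40, 40)) # 1600 m^2, ratio 0.10
```

A long, narrow rectangle passes, while a compact 40 m square is rejected by the perimeter-to-area check even though its area is in range.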

Locally (within each array) we removed panel-rows in which two or more of five geometric similarity measures failed: (1) mount technology of the panel composed less than 10% of the array, (2) ratio of the long-edge to the short-edge (length ratio) more than three standard deviations from the array mean, (3) the ratio of the panel-row area to the bounding box area more than three standard deviations from the array mean, (4) the perimeter to area ratio more than three standard deviations from the array mean, and (5) the Polsby-Popper ratio of compactness more than three standard deviations from the array mean. The Polsby-Popper ratio, first used to defend against gerrymandering¹¹⁶, is defined by:

$$\mathrm{Compactness}=\frac{4\pi \cdot \mathrm{rowArea}}{\mathrm{rowPerimeter}^{2}} \tag{6}$$
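A minimal sketch of Eq. (6) and of the "two or more failed measures" rule follows. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not specify which estimator was used. Note that with a population standard deviation, a single value can only exceed three standard deviations when the array has at least 11 rows.

```python
import math
from statistics import mean, pstdev


def polsby_popper(area, perimeter):
    """Polsby-Popper compactness, Eq. (6): 1.0 for a circle, smaller for elongated shapes."""
    return 4 * math.pi * area / perimeter ** 2


def outlier_flags(values, n_sigma=3.0):
    """Flag values more than n_sigma standard deviations from the array mean."""
    mu, sigma = mean(values), pstdev(values)  # population std is an assumption
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) > n_sigma * sigma for v in values]


def local_removals(measures, minority_mount, n_sigma=3.0, min_failures=2):
    """Return one bool per row: True where the row fails at least min_failures checks.

    measures: dict of measure name -> list of per-row values (the geometric measures).
    minority_mount: list of bools, True where the row's mount type makes up <10% of the array.
    """
    flags = [outlier_flags(vals, n_sigma) for vals in measures.values()]
    flags.append(list(minority_mount))
    return [sum(row_flags) >= min_failures for row_flags in zip(*flags)]


# Hypothetical array: 20 similar rows plus one row that deviates on two measures.
measures = {
    "length_ratio": [1.0] * 20 + [100.0],
    "compactness": [0.2] * 20 + [0.9],
}
removed = local_removals(measures, minority_mount=[False] * 21)
```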

While accounting for universal and within-array outliers removed a considerable quantity of commissions, some arrays contained overall low-quality classifications, leading to retention of low-quality panel-row objects. To address this, we created temporary solar array boundaries from the panel-row objects (see Enhancing existing array boundaries with panel-rows). We then calculated the area and the perimeter-to-area ratio of both the new array shape and the original existing array shape. We removed all panel-row objects within an array when the new array area was less than 25% of the original array area and the new perimeter-to-area ratio was greater than the 99th percentile of the existing arrays' perimeter-to-area ratios.

To create the final GM-SEUS panel-row dataset, newly derived panel-rows were merged with existing panel-rows giving preference to existing panel-rows. The result was 1.07 million panel-rows from existing sources, and 1.85 million newly delineated panel-rows. In addition to the quality-controlled dataset of array and panel-row boundaries, the data repository contains the raw Google Earth Engine output for all NAIP classified panel rows. The panel-row and enhanced array boundary delineation workflow is shown in Fig. 5.

Enhancing design metadata

Enhancing existing array boundaries with panel-rows

We created new solar array geometries from GM-SEUS panel-rows using a buffer, dissolve, and erode approach. The selected buffer distance was 10 m, allowing for large panel-assemblies in panel-rows and high-latitude panel-rows to both have large (up to 20 m) spacing. This was similar to our previous approach where we used a 5 m buffer to group panel-rows into an array²¹ and to Hu et al.¹⁷ who used a 3 m buffer to assign an array group for rooftop solar. Importantly, the erosion used here removed the area external to the panel-rows and the space between. Thus, this process inherently aligns with our definition of a solar array spatial footprint (see Fig. 5).

The GM-SEUS repository contains a version of all newly created array boundaries from the final panel-row dataset and a version of all existing array boundaries replaced with newly delineated array boundaries where available. We maintained USPVDB and CCVPV array boundaries due to their completeness and matching to our definition of an array. OSM, CWSD, and TZ-SAM arrays do not inherently follow our definition of an array and contain manual delineation (OSM and CWSD) and medium resolution remote sensing (CWSD and TZ-SAM) biases^26,36. Additionally, given that CWSD and TZ-SAM arrays were independent of project-level metadata, we allowed newly delineated disconnected array shapes to be considered as separate arrays in this source dataset. For these arrays, we grouped newly created sub-array geometries by the same installation year (see Estimating installation year), allowing arrays installed in different years to be their own installation. In total, 5,017 arrays (174 km²) received a new array boundary delineation. The original area of these arrays was 281 km².

Estimating panel-row azimuth, mount technology, inter-row spacing, and tilt

We estimated the azimuth and mount technology for each panel-row object. We defined the azimuth as the primary south-facing cardinal direction of the short-edge vector in the minimum bounding rectangle (±180°), given that all arrays were in the northern hemisphere. In the final GM-SEUS dataset, the average azimuth (avgAzimuth) for single-axis mounted solar arrays was corrected to the southward-normal (perpendicular) angle to the panel-row face direction to follow azimuth definitions in existing datasets^26,49. The final panel-row dataset maintains azimuth (rowAzimuth) as the primary direction of the short-edge vector (panel-row face).

To classify mount technology, we also calculated the length ratio of the long vector to the short vector and the ratio of panel-row area to bounding box area. The conditions for classifying mount technology were (see Fig. 3): single-axis, where the azimuth is within 30° of east or west and the length ratio is greater than 2.5; fixed-axis, where the azimuth is within 60° of south and the length ratio is greater than 2.5; and dual-axis, where the length ratio is less than 2.5. We also calculated the distance between each panel-row and the nearest panel-row (rowSpace) in the azimuthal direction for fixed- and single-axis panel-rows and in all directions for dual-axis panel-rows. Azimuth and mount classification logic is similar to that of Edun et al.⁹¹ and Perry et al.⁹².
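The mount classification conditions can be written as a small decision function. This is a sketch under stated assumptions rather than the published code: azimuth is taken as a compass bearing in degrees (90 = east, 180 = south, 270 = west), the single-axis test is applied before the fixed-axis test where the two ranges touch, and rows that meet none of the stated conditions (including a length ratio of exactly 2.5) are labeled "unknown". The function names are hypothetical.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two compass bearings, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def classify_mount(azimuth_deg, length_ratio):
    """Classify a panel-row from its short-edge azimuth and long/short length ratio."""
    if length_ratio < 2.5:
        return "dual-axis"
    if length_ratio > 2.5:
        east_west = min(angular_distance(azimuth_deg, 90),
                        angular_distance(azimuth_deg, 270))
        if east_west <= 30:
            return "single-axis"
        if angular_distance(azimuth_deg, 180) <= 60:
            return "fixed-axis"
    return "unknown"  # conditions in the text do not cover this case (assumed label)


# Hypothetical rows: (azimuth in degrees, length ratio).
labels = [classify_mount(az, lr) for az, lr in [(180, 10), (95, 30), (270, 30), (200, 1.2), (10, 8)]]
```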

Optimal fixed-axis tilt (tiltEst) is generally assumed to correlate with array latitude, with slight deviations at higher latitudes¹¹⁷. However, local climate and topography also play important roles¹¹⁸. Using newly acquired azimuth and mount metadata, we estimated the optimum tilt angle (tiltEst) for fixed-axis solar PV arrays (and mixed-mounted arrays) using the pvlib iotools package^60,119. The latitude and longitude of each array were used to retrieve local typical meteorological year data from the PVGIS-ERA5 v5.3 database¹²⁰. The typical meteorological year data provided location-specific irradiance data that incorporate shading from local topography. Global plane of array irradiance for orientations between 10 and 70 degrees from horizontal facing the avgAzimuth was modeled using the typical meteorological year and an isotropic model in Python. The tilt with the greatest annual modeled global plane of array irradiance was selected for tiltEst.
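The tilt search can be illustrated with a self-contained stand-in. The study used pvlib with PVGIS typical meteorological year data; the sketch below instead hand-codes a basic isotropic transposition and takes irradiance and solar position as inputs, so it shows the shape of the search rather than the actual implementation. The function names, the 1° tilt step, the albedo of 0.2, and the sample records are all assumptions.

```python
import math


def poa_isotropic(dni, dhi, ghi, sun_zenith, sun_azimuth, tilt, surface_azimuth, albedo=0.2):
    """Global plane-of-array irradiance (W/m^2) using an isotropic sky model.

    Angles are in degrees; azimuths are compass bearings. albedo=0.2 is assumed.
    """
    z, t = math.radians(sun_zenith), math.radians(tilt)
    daz = math.radians(sun_azimuth - surface_azimuth)
    cos_aoi = math.cos(z) * math.cos(t) + math.sin(z) * math.sin(t) * math.cos(daz)
    beam = dni * max(cos_aoi, 0.0)                  # no beam when sun is behind the panel
    sky = dhi * (1 + math.cos(t)) / 2               # isotropic sky diffuse
    ground = ghi * albedo * (1 - math.cos(t)) / 2   # ground-reflected
    return beam + sky + ground


def estimate_tilt(records, surface_azimuth, tilts=range(10, 71)):
    """Return the tilt (degrees) that maximizes summed POA irradiance over records.

    records: iterable of (dni, dhi, ghi, sun_zenith, sun_azimuth) tuples,
    e.g. one per hour of a typical meteorological year.
    """
    records = list(records)

    def total_poa(tilt):
        return sum(poa_isotropic(*rec, tilt, surface_azimuth) for rec in records)

    return max(tilts, key=total_poa)


# Hypothetical single-record checks: beam only with the sun due south at 40 degrees zenith,
# and diffuse only, which favors the flattest allowed tilt.
beam_only = estimate_tilt([(800, 0, 0, 40, 180)], surface_azimuth=180)
diffuse_only = estimate_tilt([(0, 100, 100, 60, 180)], surface_azimuth=180)
```

With beam irradiance only, the best tilt equals the solar zenith angle because the panel then faces the sun directly; with diffuse irradiance only, the search returns the lower bound of the tilt range.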

Estimating ground cover ratio (GCR)

Ground cover ratio (GCR), sometimes referred to as packing factor (PF), has previously been defined by two different relationships. We calculated both:

$${{GCR}}_{1}=\frac{{totRowArea}}{{{totArea}}^{\ast }}$$

(7)

$${{GCR}}_{2}=\frac{{rowWidth}}{{rowWidth}+{rowSpace}}$$

(8)

where totRowArea is the top-down or apparent total panel-row area within an array at peak solar inclination (for tracking arrays); totArea^* is the total land area of the panel-rows and the spacing between them^{21,117,121,122,123,124}, which is equivalent to totArea for arrays with complete panel-row delineation and arrays where a new boundary was delineated and replaced an original boundary; rowWidth is the distance from the bottom edge to the top edge of a row along the short edge; and rowSpace is the distance (azimuthal distance for fixed- and single-axis mounts) from a panel-row edge to the nearest panel-row edge. The sum of rowWidth and rowSpace is the horizontal ground distance between any identical point of a module in a directly adjacent row^59,62,125. We filled in gaps for arrays without panel-row information by estimating GCR₁ and GCR₂ using a multiple linear regression between GCR and latitude and longitude from arrays with panel-row information for each mount and module technology.
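As a worked illustration of Eqs. 7 and 8, the two ratios can be computed as follows. The function names and the example numbers are hypothetical; the regression-based gap filling is not shown.

```python
def gcr1(tot_row_area_m2, tot_area_m2):
    """Eq. 7: panel-row area divided by the land area of rows plus spacing."""
    if tot_area_m2 <= 0:
        raise ValueError("tot_area_m2 must be positive")
    return tot_row_area_m2 / tot_area_m2


def gcr2(row_width_m, row_space_m):
    """Eq. 8: row width divided by row pitch (row width plus row spacing)."""
    pitch = row_width_m + row_space_m
    if pitch <= 0:
        raise ValueError("row_width_m + row_space_m must be positive")
    return row_width_m / pitch


if __name__ == "__main__":
    # Illustrative values only: 4 m wide rows separated by 6 m gaps.
    print(gcr1(4200.0, 10000.0))  # area-based ratio
    print(gcr2(4.0, 6.0))         # pitch-based ratio
```

The two definitions agree only for idealized layouts; GCR₁ also reflects gaps at row ends and irregular array outlines, which is one reason to report both.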

Estimating installation year

Solar array installation year is often acquired by change detection and manual validation of aerial and satellite remote sensing imagery where permitting data is not available^{14,21,29,33,36}. Given the quantity of arrays in GM-SEUS and the lack of permit data for commercial-scale installations, we needed a way to independently and automatically estimate the year of completed installation. We used the Google Earth Engine implementation of Landsat-based detection of trends in disturbance and recovery (LandTrendr) algorithms¹²⁶ v0.2.0 to estimate the solar array installation year. LandTrendr is a suite of temporal segmentation algorithms tailored to detecting changes in forested areas at 30 m resolution, but with broader applications. We previously used LandTrendr and NDPVI to detect solar installation years between 2008 and 2018 in California with a 79% accuracy within one year of the manually validated installation year²¹.

Temporal segmentation requires a significant change in pixel spectral trajectory, which is dependent on the historical land use and land cover, and the subsequent land management between the arrays. Landsat pixels (30 m) contain mixed land cover of panel-rows and the space between them, meaning the post-installation reflectance is dependent on GCR, panel-row area, and ground cover management. Given the broader spectrum of possible spectral histories and variables affecting solar reflectance across the US, and reported utility in employing multiple indices for LandTrendr disturbance detection¹²⁷, we modified our original approach to include a multi-index performance-weighted average of LandTrendr years of disturbance across twelve spectral indices. These were the indices with reported utility in solar detection (NDPVI, NBD, Br, NDVI, and NDWI) along with seven land use change indices built into LandTrendr including the enhanced vegetation index (EVI)¹²⁸, the normalized burn ratio (NBR)¹²⁹, the normalized difference moisture index (NDMI)¹³⁰, and the Tasseled Cap-Transformations (greenness, brightness, wetness, and angle)^131,132. All built-in indices except EVI take advantage of short-wave infrared bands that Landsat includes but NAIP does not.

We used LandTrendr and these indices to estimate the year of newest disturbance, or land use change, within the boundaries of each array polygon between 2009 and 2023, noting that a significant amount of existing solar has been installed in the last decade²⁶. For arrays without permitted installation years, if an input array shape contained multiple newly delineated or existing polygons, we broke it into component polygons to check for unique installation years, identifying and separating mistakenly grouped arrays within the original boundary. This step ensures consistency with our array definition. We also removed segmented boundary years (2008 and 2024) to address known biases in the LandTrendr segmentation at edge years²¹. To promote accuracy, we subset LandTrendr indices where the mean absolute error (MAE) between the USPVDB permitted installation year values and LandTrendr estimated installation year (~4,000 arrays) was less than two years (NBD, NDPVI, NDWI, TCG, and NDMI). For these indices, we applied an inverse variance-weighted average to calculate the installation year. Similar performance-weighted average approaches are common for various applications (e.g., refs. ^127,133,134). We then included indices with greater MAE to fill in non-detect installation years (TCA, NDVI, NBR, Br, TCW, EVI, and TCB). Of the over 16,000 input array polygon boundaries, this LandTrendr method only omitted installation year estimates for 41 array polygons, only 2 of which did not have installation years from other sources.
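The inverse variance-weighted average can be sketched as below. This assumes that each index's weight is the reciprocal of its error variance against permitted years; the source does not spell out exactly how the variances were computed, and the function name and example values are hypothetical.

```python
def weighted_install_year(years, variances):
    """Combine per-index disturbance years with inverse-variance weights.

    years: dict mapping index name -> detected year, or None for non-detects.
    variances: dict mapping index name -> error variance (years^2), assumed
        to come from comparison against permitted installation years.
    Returns the weighted year rounded to an integer, or None if no index
    detected a disturbance.
    """
    num = 0.0
    den = 0.0
    for index, year in years.items():
        if year is None:
            continue  # this index found no disturbance for the array
        weight = 1.0 / variances[index]
        num += weight * year
        den += weight
    if den == 0.0:
        return None
    return round(num / den)


if __name__ == "__main__":
    # Made-up detections and variances for demonstration only.
    detected = {"NBD": 2015, "NDPVI": 2017, "NDWI": None}
    var = {"NBD": 1.0, "NDPVI": 4.0, "NDWI": 2.0}
    print(weighted_install_year(detected, var))
```

A `None` result corresponds to the non-detect case, which the text fills using the indices with larger MAE or, failing that, manual interpretation.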

We independently estimated the installation year for all arrays. However, we also retain installation years from existing datasets in the following order: USPVDB, InSPIRE, USS, SolarPACES, GSPT, GPPDB, TZ-SAM, CWSD, and OSM. We omitted CCVPV since the multi-index performance-weighted average method is more robust than our original single index approach. We also only included installation year from TZ-SAM and CWSD for 2018 or later, since these approaches are based on Sentinel-2 availability. When LandTrendr did not result in a year of detected disturbance, we manually analyzed the installation year using available historical satellite and aerial imagery for each array and LandTrendr time series plots, following the methods of Stid et al.^21,55. We allowed new array shapes derived within TZ-SAM array boundaries to be independent arrays, and regrouped shapes if they were installed in the same year. We provide both a compiled existing installation year (instYr) and the new LandTrendr-derived installation year (instYrLT) in the final GM-SEUS dataset.
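The source priority described above amounts to a coalescing rule, sketched here. The function and constant names are hypothetical, and the fallback to the LandTrendr year follows the instYr attribute description (gaps filled by instYrLT) rather than any published code.

```python
# Priority order as listed in the text (highest priority first).
PRIORITY = ["USPVDB", "InSPIRE", "USS", "SolarPACES", "GSPT",
            "GPPDB", "TZ-SAM", "CWSD", "OSM"]
# Sources whose years are only accepted for 2018 or later (Sentinel-2 era).
SENTINEL_BASED = {"TZ-SAM", "CWSD"}


def compiled_install_year(reported, inst_yr_lt):
    """Pick the installation year from the highest-priority usable source.

    reported: dict mapping source name -> reported year (missing or None
        when the source has no year for this array).
    inst_yr_lt: LandTrendr-derived year, used when no source qualifies.
    """
    for source in PRIORITY:
        year = reported.get(source)
        if year is None:
            continue
        if source in SENTINEL_BASED and year < 2018:
            continue  # pre-2018 years from these sources are not retained
        return year
    return inst_yr_lt


if __name__ == "__main__":
    print(compiled_install_year({"TZ-SAM": 2016, "OSM": 2012}, 2013))
    print(compiled_install_year({"TZ-SAM": 2016}, 2015))
```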

Estimating installed capacity

Installed solar capacity is commonly assumed to correlate with panel-row surface area. Others have used statistical regression relationships between known solar PV capacity and panel-row surface area¹⁹ along with several adjustable parameters such as the spectral-intensity of the module surface area^17,135. With temporal information and module composition, we estimated module efficiency and thus installed capacity for solar PV arrays with (Eq. 9) and without (Eq. 10) panel-row information. We thus used the following relationships modified from Martín-Chivelet¹¹⁷ and Phillpott et al.³⁶ to estimate peak installed capacity for solar PV arrays (power, MW_DC):

$${{capMWest}}_{{PV}}={totRowArea}\ast \eta \ast {G}_{{STC}}$$

(9)

$${{capMWest}}_{{PV}}=({{totArea}}^{\ast \ast }\ast {{GCR}}_{{local}})\ast \eta \ast {G}_{{STC}}$$

(10)

where η is the annual average value for ground-mounted systems from the LBNL Tracking the Sun 2024 Report⁴⁴ for each technology (c-Si or thin-film), G_STC is the irradiance at standard test conditions (1 kW_DC m^–2), totArea^** is the total array area adjusted for area bias of the input dataset relative to USPVDB array area, and GCR_local is the estimated GCR₁ for arrays without panel-row information in relation to latitude and longitude by mount technology and module type. Note that Eqs. 9 and 10 are effectively the same, because GCR_local is equivalent to the mount and spatially relevant average ratio of totRowArea to totArea. Area bias was determined by intersecting array shapes from input datasets with USPVDB arrays and acquiring the average percent-difference in array polygon area (Fig. 6B and 6C). This corrects for array datasets that tend to over or underestimate total array area (by our definition), reducing erroneous totRowArea estimates when multiplying by GCR_local.
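Eqs. 9 and 10 reduce to a short calculation once units are aligned (G_STC of 1 kW m⁻² equals 0.001 MW m⁻²). The sketch below uses hypothetical function names and a placeholder efficiency of 0.20, which is not a value taken from the Tracking the Sun report.

```python
G_STC_MW_PER_M2 = 0.001  # 1 kW_DC per square meter, expressed in MW


def cap_mw_from_rows(tot_row_area_m2, efficiency):
    """Eq. 9: capacity (MW_DC) from delineated panel-row area.

    efficiency: module efficiency as a fraction (e.g., 0.20 for 20%).
    """
    return tot_row_area_m2 * efficiency * G_STC_MW_PER_M2


def cap_mw_from_array(tot_area_adj_m2, gcr_local, efficiency):
    """Eq. 10: capacity (MW_DC) when panel-rows are not delineated.

    tot_area_adj_m2: array area after the input-dataset area-bias adjustment.
    gcr_local: regression-estimated GCR1 for the array's location and type.
    """
    est_row_area = tot_area_adj_m2 * gcr_local
    return cap_mw_from_rows(est_row_area, efficiency)


if __name__ == "__main__":
    # Illustrative inputs only.
    print(cap_mw_from_rows(10_000.0, 0.20))
    print(cap_mw_from_array(20_000.0, 0.5, 0.20))
```

Both calls describe the same hypothetical array, which shows why the two equations are effectively equivalent when GCR_local matches the true ratio of totRowArea to totArea.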

After dataset harmonization, 59 of 74 CSP arrays were missing a reported installed capacity. For these arrays, we estimated thermal capacity (MW_th) for solar CSP arrays with (Eq. 11) and without (Eq. 12) panel-row information by:

$${{capMWest}}_{{CSP}}={{totRowArea}}_{{effective}}\ast {C}_{f}$$

(11)

$${{capMWest}}_{{CSP}}={({{totArea}\ast {GCR}}_{{local}})}_{{effective}}\ast {C}_{f}$$

(12)

where totRowArea_effective is the effective panel-row (for CSP, collector) area, estimated for parabolic trough, linear Fresnel, and dish-CSP systems as half of the circumference of a circle with a diameter of rowWidth, and for power tower, beam down tower, and hybrid-CSP systems as the totRowArea; and C_f is the recommended conversion factor of 0.0007 MW_th m^–2 of aperture collector area to installed thermal capacity¹³⁶. Again, totArea * GCR_local is a spatial regression of GCR₁ as it relates to totRowArea. This is a vastly simplified approach with numerous limitations and does not include assumptions about thermal-to-electric efficiency. More robust methods are available (e.g., refs. ^137,138) but are beyond the scope of this initial version of GM-SEUS.

Similar to other new solar array attributes, we estimated installed capacity for all arrays and retain capacity attributes from existing datasets with a capacity attribute in order of perceived quality. We considered high-quality capacity estimates as those that were largely complete and derived from single-source permit records or data directly from industry partners (USPVDB, InSPIRE, USS, SolarPACES, PVDAQ). Lower quality estimates are capacity data that were derived from multiple sources, including some permit or operator data, but with less consistency (GSPT, GPPDB). Due to uncertainty in source or quality, we report only the newly estimated capacity for all arrays from CCVPV, TZ-SAM, and OSM datasets using Eqs. 9–12. CWSD did not report capacity.

Data Records

GM-SEUS v1.0 is available for public use and is provided in the Zenodo Repository¹³⁹. The final data repository provides all geospatial files as geopackage, shapefile, and CSV. We also provide 17,500 input and target images derived from GM-SEUS and NAIP imagery for direct application in deep learning and pattern recognition use cases. When using these products, please cite the original data sources and articles^{21,25,26,36,42,45,46,47,48,49,52,57,58,93,94,95,96} along with this publication. All arrays and panel-rows contain a ‘Source’ attribute, which references the data source of the original spatial information (see README in the data repository for more detail). See Table 1 for attribute-level information in each dataset. Data records are up to date through December 2024. The USPVDB, TZ-SAM, and OSM datasets intend to update their completeness on a regular basis. We intend to provide annual updates to this dataset.

The GM-SEUS open repository contains the following files:

GMSEUS_Arrays_Final: Final array dataset containing boundaries from existing datasets and enhanced by buffer-dissolve-erode technique with GM-SEUS panel-rows containing all array-level attributes (ESRI:102003), geopackage, shapefile, csv
GMSEUS_Panels_Final: Final panel-row dataset containing boundaries from existing datasets and newly delineated GM-SEUS panel-rows containing all panel-row-level attributes (ESRI:102003), geopackage, shapefile, csv
GMSEUS_NAIP_Arrays: All array boundaries created by buffer-dissolve-erode method of newly delineated (NAIP) GM-SEUS panel-rows (ESRI:102003), geopackage, shapefile, csv
GMSEUS_NAIP_Panels: Newly delineated panel-rows from NAIP imagery with low-quality panel-rows removed (ESRI:102003), geopackage, shapefile, csv
GMSEUS_NAIP_PanelsNoQAQC: All newly delineated panel-rows from NAIP imagery without any quality control (ESRI:102003), geopackage, shapefile, csv
NAIPtrainRF: Training dataset of 12,000 NAIP training points (2,000 class^–1) containing class values, spectral index values, the year of NAIP imagery accessed, and point coordinates (EPSG:4326), csv
LabeledImages: Directory containing image and mask subdirectories with ~17,500 input and target images for deep learning pattern recognition applications, GeoTIFF

We provide the following attribute fields in GM-SEUS Final Arrays:

arrayID: unique numeric ID of each solar array in GM-SEUS, unitless
Source: original array boundary source from existing datasets or manual digitization, unitless
nativeID: numeric ID of each solar array from the source spatial dataset if an indexing system existed, unitless
latitude: latitude of the array boundary centroid (EPSG:4269), decimal degrees
longitude: longitude of the array boundary centroid (EPSG:4269), decimal degrees
newBound: binary, whether the array boundary was derived from the existing data sources (0) or from a buffer-dissolve-erode of panel-rows following our definition of an array boundary (1), unitless
totArea: total land footprint of panel-rows and the space between them, m²
totRowArea: If numRow is greater than 0, sum of rowArea within an array. Otherwise, estimated based on totArea and GCR1 estimation where no panel-rows were detected, m²
numRow: number of panel-rows within an array, unitless
instYr: installation year from existing sources, with gaps filled in by instYrLT, year
instYrLT: LandTrendr-derived installation year independent of any data source other than Landsat spectral trajectory, year
capMW: installed capacity from existing sources, with gaps filled in by capMWest, MW_DC or MW_th
capMWest: estimated installed capacity derived from capacity to panel-row area relationships described in Eqs. 9–12 independent of any data source, MW_DC or MW_th
modType: reported panel-row (module) technology at the array level (c-Si, thin-film, csp). If unreported, assumed to be c-Si, unitless
effInit: initial panel-row (module) efficiency from existing sources, with gaps filled in by an efficiency estimate based on modType and instYr taken from the annual Tracking the Sun report, %
GCR1: 0-1, the ratio of totRowArea to the total area of panel-rows and the space between them. For arrays with complete panel-row delineation and arrays where newBound is 1, this total area is equivalent to totArea. This ratio is also called packing factor. If numRow is greater than 0, GCR1 is an actual GCR₁ for the array. Otherwise, GCR1 is estimated by linear regression of latitude and longitude by mount and module type, unitless
GCR2: 0-1, the ratio of the average width of the panel-row short edge (rowWidth) to the horizontal ground distance between identical points on adjacent panel-rows, defined as the sum of avgWidth and rowSpace. If numRow is greater than 0, GCR2 is an actual GCR₂ for the array. Otherwise, GCR2 is estimated by linear regression of latitude and longitude by mount and module type, unitless
mount: mount technology derived from the azimuth and geometry of each panel-row within the array or from existing sources, with preference given to newly derived mount technology. Either ‘fixed_axis’, ‘single_axis’, ‘dual_axis’, or ‘mixed_’ with a lower-case letter denoting the mixed mounts (e.g., mixed_fs), unitless
tilt: panel-row tilt for fixed-axis arrays (including arrays with mixed-mounting) from existing sources and filled in by tiltEst, degrees above horizontal
tiltEst: estimated panel-row tilt for fixed-axis arrays (including arrays with mixed-mounting) estimated using pvlib, degrees above horizontal
avgAzimuth: median estimated azimuth of panel-rows within array bounds or reported azimuth from existing sources, with preference given to newly estimated azimuth. For single-axis tracking arrays this is the cardinal direction of the long-edge. For all other mount types, this is the cardinal direction of the panel-row face, degrees from north
avgLength: median length of the long edge of panel-rows within an array, meters
avgWidth: median length of the short edge of panel-rows within an array, meters
avgSpace: median spacing between panel-rows within an array, measured between panel-row edges projected onto the ground, meters
STATEFP: unique geographic identifier for the U.S. Census Bureau state entity, unitless
COUNTYFP: unique geographic identifier for the U.S. Census Bureau county entity, unitless
geometry: best new or available geometry matching the array definition which contains panel-rows and the space between them, derived from existing sources (newBound = 0) or from a buffer-dissolve-erode of newly delineated panel-rows (newBound = 1), meters
version: GM-SEUS version in which the array geometry and attributes are derived. Each subsequent version will re-derive new geometries and the best delineation from each version will be selected, unitless
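As a usage sketch, the array attributes above can be summarized directly from the CSV export using only the Python standard library. The rows below are made-up placeholders rather than GM-SEUS records, and the function name is hypothetical; only the column names (arrayID, mount, capMW) come from the attribute list.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration; a real run would open the arrays CSV file.
SAMPLE = """arrayID,mount,capMW,GCR1
1,fixed_axis,2.5,0.55
2,single_axis,10.0,0.40
3,fixed_axis,1.5,0.50
"""


def capacity_by_mount(handle):
    """Sum capMW for each mount type from an open CSV file-like object."""
    totals = defaultdict(float)
    for row in csv.DictReader(handle):
        if row["capMW"] == "":
            continue  # skip records with no capacity value
        totals[row["mount"]] += float(row["capMW"])
    return dict(totals)


if __name__ == "__main__":
    print(capacity_by_mount(io.StringIO(SAMPLE)))
```

Because capMW mixes MW_DC (PV) and MW_th (CSP), a real summary would likely filter on modType before summing.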

We provide the following attribute fields in GM-SEUS Final Panel-Rows:

panelID: unique numeric ID of the panel-row in GM-SEUS, unitless
arrayID: unique numeric ID of each solar array in GM-SEUS that the panel-row is associated with, unitless
Source: panel-row boundary source from existing datasets or GM-SEUS, unitless
rowArea: top-down or apparent panel-row area directly from the output of image classification, m²
rowWidth: length of the short-edge of the panel-row, meters
rowLength: length of the long-edge of the panel-row, meters
rowAzimuth: azimuth of the panel-row, with 0 at North, degrees
rowMount: mount technology (fixed-axis, single-axis, or dual-axis) of the panel-row, unitless
rowSpace: the inter-row spacing between the panel-row and the nearest panel-row in the azimuthal direction (fixed- and single-axis) or any direction (dual-axis), meters
geometry: top-down or perceived geometry, meters
version: GM-SEUS version in which the panel-row geometry and attributes are derived. Each subsequent version will re-derive new geometries and the best delineation from each version will be selected, unitless

Technical Validation

GM-SEUS completeness compared to other datasets

Through the end of Q3 2024, the Solar Energy Industries Association and Wood Mackenzie reported that the US had installed 5.3 million solar systems with a total solar capacity of ~220 GW_DC (ref. ⁵⁴). Given available information, ~18% is residential-scale solar (~40 GW_DC), ~12% is commercial- or community-scale (~25 GW_DC), and ~70% is utility-scale (~150 GW_DC), with more than 80 GW_DC installed since 2023⁵⁴.

GM-SEUS reports an estimated 186 GW_DC installed through December 2024 (TZ-SAM and OSM provide the most recent data), or ~107% of estimated non-residential solar capacity (~85% of all solar capacity) through 2024. We also provide new sub-array metadata for 9,042 arrays (83.1 GW_DC), 5,858 (16.7 GW_DC) of which are not contained within USPVDB. Note that we include all arrays from USPVDB and TZ-SAM but refined the TZ-SAM array area where panel-row information is available and independently estimate installed capacity using site-level information.

For reference, within CONUS, the USPVDB v2.0 is complete through Q3 2023 and represents a permitted 90.4 GW_DC (ref. ⁵²), or ~65% of non-residential solar capacity through 2023. TZ-SAM Q3 2024 estimates capacity by array area and country-wide values for GCR and inverter loading ratio and estimates 197 GW_AC within the CONUS (~257 GW_DC assuming a median inverter loading ratio of 1.3 from USPVDB), or ~147% of non-residential solar capacity (~117% of all solar capacity) through Q3 2024⁵⁷. Note, however, that coarse GCR estimates and array geometries generated from medium-coarse satellite imagery may overestimate encompassed area and thus estimated capacity²⁶. This is evident in the TZ-SAM arrays that intersect USPVDB arrays, which overestimate total array area on average by ~45% (by our definition of an array – Figs. 3 and 6) and installed capacity by ~30% (assuming 1.3 inverter loading ratio) for the same arrays.

Many of the input datasets report being the most complete or comprehensive for their scope at the time of publication. We have compiled these data repositories, removed duplicate information while giving preference to higher-quality sources, acquired updated data from OSM, and enhanced spatiotemporal metadata for many existing array datasets. GM-SEUS is thus a harmonization and enhancement of the most comprehensive publicly available ground-mounted solar energy datasets in the US through 2024 (see Table 1). However, we exclude non-contiguous US regions and residential and rooftop systems. The most comprehensive residential and rooftop datasets to date are Bradbury et al.¹ in the US (data: ref. ¹⁴⁰) and Stowell et al.⁹ in the United Kingdom (data: ref. ¹⁴¹).

We have inevitably omitted existing ground-mounted solar energy systems and likely included commissions (non-solar objects). We have no way of knowing the extent of omission error beyond comparing against broad solar industry trends and acknowledging the 1,490 point data sources that we still need to manually georeference and digitize. Since NAIP imagery is often a year behind present due to quality control and inspection requirements, we are also not able to directly determine commission error (non-solar) in the existing dataset compilation. Thus, some arrays derived with moderate resolution remote sensing (CWSD, TZ-SAM) may contain non-solar commissions. For example, we have no way of knowing if arrays without available NAIP imagery including panel-rows are: 1) arrays installed after the most recent NAIP imagery or 2) inclusive of non-solar objects. There may also be situations where erroneous classifications of non-solar objects passed the panel-row quality control. However, if we only consider arrays verifiable with recent imagery (USPVDB, CCVPV, OSM, digitized arrays, and arrays with compiled or identified panel-rows), total GM-SEUS installed capacity is 125 GW_DC, 72% of reported non-residential capacity through 2024. In general, validation of completeness is limited by our reliance on freely available data and our decision not to include additional existing data held behind paywalls. The difficulty in validating GM-SEUS underscores the motivation to create this product along with similar efforts.

Spatial confidence in array and panel-row delineation

USPVDB and EIA Form 860 provide the most rigorous, robust, and widely available data against which to compare our results and are often the source for metadata in other existing datasets. The hand delineated array boundaries in USPVDB ensure completeness and also match our definition of an array (Fig. 3). Thus, for both spatial confidence and attribution technical validation, we compare our results to USPVDB.

We generated panel-row geometries for 9,042 arrays. Although we maintain USPVDB boundaries in the final dataset, we use these high-fidelity hand delineated boundaries to validate our NAIP array delineation approach for other existing array datasets. To evaluate confidence in newly generated array geometries, we use the Jaccard Similarity Index, also known as the Intersection over Union (IoU)¹⁴². IoU is bound by 0 and 1, where zero indicates no overlap and 1 indicates identical overlap of input geometries (A, B). IoU was calculated by:

$${IoU}(A,B)=\frac{A\cap B}{A\cup B}$$

(13)
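Eq. 13 is evaluated on areas: the area of the intersection divided by the area of the union. The sketch below illustrates this for axis-aligned rectangles only, so that it stays dependency-free; real array polygons would need a geometry library. The function names are hypothetical.

```python
def rect_area(rect):
    """Area of an (xmin, ymin, xmax, ymax) rectangle; 0 if it is empty."""
    xmin, ymin, xmax, ymax = rect
    return max(xmax - xmin, 0.0) * max(ymax - ymin, 0.0)


def rect_iou(a, b):
    """Intersection over Union of two axis-aligned rectangles."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]),
               min(a[2], b[2]), min(a[3], b[3]))
    inter = rect_area(overlap)
    union = rect_area(a) + rect_area(b) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    reference = (0.0, 0.0, 100.0, 50.0)   # e.g., a hand-delineated boundary
    candidate = (10.0, 0.0, 100.0, 50.0)  # e.g., a conservative delineation
    print(rect_iou(reference, candidate))
```

The example shows how a delineation that sits entirely inside the reference but omits part of it is penalized through the union term, which is the situation described for conservative panel-row selection.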

We generated panel-rows using NAIP imagery and created array boundaries for 2,871 of 4,185 USPVDB arrays (52.9% of total USPVDB area). We calculated the IoU for all NAIP panel-row delineated array boundaries that intersected with any USPVDB array polygon. Due to USPVDB multi-polygons and connected array shapes, we dissolved all boundaries and considered any individual polygon where boundaries overlapped. The 1,314 array omissions were due either to poor quality panel-row or array delineation or to imagery that predated the installation of the array. Note, however, that this partial coverage applies only to validation of the array boundary delineation method and that we include all USPVDB arrays and array area in GM-SEUS.

The median IoU for GM-SEUS array boundaries was 0.88 (Fig. 6A), which is comparable or even superior to numerous instances of IoU being used to validate solar array boundary delineation (e.g., refs. ^{1,14,15,28,29}). The NAIP panel-row delineation method underestimates USPVDB area on average by ~12% (Fig. 6B and C). This makes sense, given our highly conservative panel-row selection for high-quality sub-array metadata and that we only consider array area within the existing array boundary. We also compared the spatial array delineation of existing datasets to USPVDB using IoU, resulting in median values of 0.95 for CCVPV, 0.85 for CWSD, 0.83 for OSM, and 0.69 for TZ-SAM, with total proportion of USPVDB area captured being 3.5% for CCVPV and CWSD, 93.0% for OSM, and 99.6% for TZ-SAM. Note that CCVPV and CWSD more closely follow the array definition used here (and in USPVDB), whereas OSM and TZ-SAM more closely correspond to project area (Fig. 3), explaining the overestimate of array area relative to USPVDB (Fig. 6C).

We also used IoU to compare newly generated GM-SEUS panel-rows to existing OSM and CCVPV panel-row datasets, given that USPVDB does not provide panel-row spatial data. The median IoU for GM-SEUS panel-row boundaries was 0.48 (Fig. 7A). This is considerably lower than the array boundary IoU; however, comparing high-spatial-resolution individual panel-rows is similar to pixel-wise IoU, which is known to produce lower scores than array-wise IoU¹⁷. Additionally, OSM contributors most often use Bing or Maxar imagery to delineate panel-rows in the OSM user-interface¹⁴³. Horizontal accuracy standards for NAIP rectification require 95% confidence within 4-meters of true ground¹⁰². At the scale of panel-rows, ground sample errors of up to a few meters can mean that a NAIP-derived panel-row partly or entirely misses the corresponding panel-row hand delineated in OSM. Thus, what is more spatially important than panel-row geometric alignment is the correlation between the total estimated panel-row area within each array (Fig. 7B) and the difference in total panel-row area for each individual intersection (Figs. 7C and 7D). Figure 7B shows that array-total panel-row area is highly correlated (log-log transform R² = 0.95) between existing panel-rows and GM-SEUS NAIP-generated panel-rows, and that NAIP panel-rows are ~15% larger than hand delineated panel-rows (Fig. 7C and 7D).

Attribute validation

We estimated several solar array and panel-row characteristics that have available reference data for validation. We estimated the array installation year based on the LandTrendr-derived year of greatest spectral change, module efficiency based on installation year and module type, GCR based on new and existing panel-row delineation, installed capacity based on panel-row area or local GCR, predominant array mount technologies and azimuth based on panel-rows, and panel-row tilt based on latitude. We are not aware of validation datasets for other panel-row metadata metrics.

We harmonized and estimated the installation year for all arrays in GM-SEUS. Comparing GM-SEUS to the utility-scale solar arrays in USPVDB v2.0, the estimated installation year MAE was 1.52 years (Fig. 8). Error was skewed high in early installation years and low in late installation years, displaying limitations in this short-year-range and high spatial-resolution application of LandTrendr temporal segmentation. However, the more years available to segment, the less impact this error will have. Additionally, some indices (NBD and NDPVI) did not have the early installation year bias. For the individual indices used in estimating installation year, NBD had the lowest MAE (1.53 years) and TCB had the greatest MAE (4.07 years). Within USPVDB, individual indices omitted installation years for between 113 arrays (TCG) and 551 arrays (TCB). In total, 66% of estimated installation years were within one year of permitted installation year, and 80% were within two years.
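The two summary statistics reported here (MAE and the share of estimates within a given number of years) can be computed as in the sketch below. The function name and the example years are hypothetical.

```python
def year_errors(estimated, permitted, tolerance=1):
    """Return (MAE in years, fraction of arrays within `tolerance` years).

    estimated, permitted: equal-length sequences of installation years.
    """
    if len(estimated) != len(permitted) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [abs(e - p) for e, p in zip(estimated, permitted)]
    mae = sum(diffs) / len(diffs)
    within = sum(d <= tolerance for d in diffs) / len(diffs)
    return mae, within


if __name__ == "__main__":
    # Made-up years for demonstration only.
    est = [2015, 2017, 2020, 2012]
    ref = [2015, 2016, 2017, 2012]
    print(year_errors(est, ref, tolerance=1))
    print(year_errors(est, ref, tolerance=2))
```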

We acquired permitted capacity from existing data sources for 4,894 arrays and estimated capacity for the remaining 10,123 arrays in GM-SEUS. Compared to USPVDB, estimated installed capacity had a log-log transform R² of 0.84 for solar PV arrays (c-Si and thin-film). Due to limited availability of CSP capacity validation data (15 arrays) and aperture area conversion factor limitations across CSP technologies, we did not perform a comparative statistical analysis for CSP estimated capacity. Estimated solar PV capacity error is shown in Fig. 9.
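A log-log R² of this kind can be sketched as follows, assuming it denotes the squared Pearson correlation between the base-10 logarithms of estimated and reference capacity. The source does not state the exact definition used, so this is one plausible reading; the function name and example values are hypothetical.

```python
import math


def loglog_r2(estimated, reference):
    """Squared Pearson correlation of log10-transformed positive values."""
    lx = [math.log10(v) for v in estimated]
    ly = [math.log10(v) for v in reference]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Made-up capacities (MW) for demonstration only.
    est = [1.2, 4.8, 22.0, 95.0, 410.0]
    ref = [1.0, 5.0, 20.0, 100.0, 400.0]
    print(round(loglog_r2(est, ref), 3))
```

The log transform keeps the handful of very large utility-scale arrays from dominating the statistic, which is a common reason to report fit in log-log space for capacity data spanning several orders of magnitude.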

Relative to USPVDB, GM-SEUS mount types were correct 92% of the time. Regarding azimuth and tilt, Perry et al.⁵³ described accuracy of reported azimuths within 15 degrees of ground truth (64%) and of reported tilt within 5 degrees of ground truth (63%). For GM-SEUS, 89% of azimuth values and 12% of optimal tilt values were within 15 and 5 degrees of permit-reported azimuth and tilt respectively. Note that rather than estimating actual tilt, we estimated optimal tilt for an array using local typical meteorological year and topography data. Expanding the threshold, estimated optimum tilt was within 10 degrees of permitted tilt for 41% of arrays, and 80% within 15 degrees of permitted tilt.

Median GM-SEUS solar PV GCR₁ values were 53% for fixed-axis, 42% for single-axis tracking, 50% for dual-axis tracking, and 63% for arrays with mixed mounts. We compared GM-SEUS GCR₁ estimates with CCVPV packing factor estimates²¹, estimates from Ong et al.¹²³, and USA-wide estimates from Phillpott et al.³⁶. Note, however, that Phillpott et al.³⁶ report GCR for small (43%), medium (36%), and large (30%) arrays, rather than by mount technology. GCR₁ and GCR₂ distributions and their comparison to validation data are shown in Fig. 10.

Fixed-axis and single-axis tracking GCR₁ values are comparable across GM-SEUS, CCVPV, and Ong et al.¹²³, and to the “small” array GCR (43%) reported by Phillpott et al.³⁶. GM-SEUS dual-axis GCR₁ is more than twice (50% vs. 22%) what is reported by Ong et al.¹²³. However, Ong et al.¹²³ reported only 9 dual-axis arrays (and only 83 systems in total). We considered 230 dual-axis systems, 5,282 fixed-axis systems, 2,583 single-axis systems, and 298 mixed-mount systems. Ultimately, panel-row area (Fig. 7B–D) and GCR (Fig. 10) estimates indicate that panel-row area and packing layout are generally consistent with prior findings and data, with a small overestimation of panel-row area.

Usage Notes

GM-SEUS limitations

This dataset provides a broad characterization of solar array design practices. Any characterization of solar array design and management derived from remote sensing imagery should be considered with extreme scrutiny given the limitations of such approaches¹⁷. While our work fills a critical data gap and compiles and enhances existing high-fidelity datasets, the design practices reported here are thus subject to uncertainty and should not be used to represent actual conditions at individual sites. No warranty is expressed or implied regarding accuracy, completeness or fitness for a specific purpose. We publish this dataset as open access, for the broader science community, policy makers, and stakeholders in addressing questions about the existing renewable energy landscape and do not consent to this data being used to target, identify, or make claims about individual arrays, properties, or entities. Any such use case is strictly prohibited.

Despite our best efforts, we acknowledge limitations in the creation of GM-SEUS. We have already noted several limitations. Hu et al.¹⁷ define several additional cautions when performing solar energy characterization using overhead imagery. These include issues with distribution shift, availability of testing data, standardization of comparison metrics, and the scale of evaluation impacting reported performance. Where possible, we mitigated these pitfalls by evaluating performance with data of similar scope and coverage (USPVDB) and at the array or panel-row level, rather than aggregating¹⁷. We also make our dataset publicly available and use common evaluation metrics (e.g., IoU).

There are additional challenges when using NAIP for image classification due to high-resolution aerial imagery metadata variability. These include timing of acquisition (season, date, time of day), camera tilt, look angle, flight path mosaic and georectification artifacts, and low-radiometric resolution¹⁰². On a pixel-wise basis, for example, these challenges could lead to underestimating the panel-row area of a single-axis tracking system if imagery was acquired at the start or end of contracted flight time¹⁴⁴. We have accounted for this bias in the past by trigonometrically correcting panel-row area based on the maximum estimated tilt angle of the tracking mount at the timing of imagery²¹. Though, even at sub-meter resolution, this correction could overestimate panel-row area if edge-pixels (or shadows) were included in the classification or if tilt is lower than expected. In fact, we know that in some arrays, GM-SEUS overestimates panel-row area due to convex-hull inclusion of isolated exterior pixels and commissions including shadows and access roads (Fig. 7C,D and Fig. 11). This example may also explain some uncharacteristically high GCR values, particularly for dual-axis installations (Fig. 10).

We do not recommend using our estimated CSP capacity values for granular analyses of the CSP contribution to energy infrastructure. Our current approach is a highly simplified first-order estimate of thermal capacity (Eqs. 11 and 12) for 59 of 74 included CSP arrays. While the applied estimation approach is not recommended for tower-CSP plants, we also extended it to these systems due to a lack of comparable alternatives. Additionally, we made no assumptions regarding thermal-to-electric efficiency and did not convert thermal capacity (MW_th) to electric capacity (MW_e). As a result, our values include a mix of estimated thermal capacity and electric capacity values from existing sources. Including both retained and estimated values, the total estimated GM-SEUS CSP capacity was 1.96 GW_th+e, compared to reported values from SolarPACES (1.51 GW_e), USS (1.77 GW_e), and GSPT (1.81 GW_e).

Similar to the results reported in Fujita et al.²⁶, GM-SEUS array boundaries do not represent total project area and thus total land transformation (see Fig. 3). GM-SEUS array boundaries are also spatially conservative, at times omitting array and panel-row area and under-representing the actual spatial footprint of the array (Fig. 11). This is due to our exclusion of panel-rows that were not geometrically consistent and panel-rows that produced unreasonable array boundaries. These limitations reduce the total intersection area relative to reference area and result in a low IoU (Fig. 11). We accept this limitation here because we valued high-quality panel-row metadata over within-array panel-row completeness. However, any calculation of solar energy-land interactions (e.g., land-use efficiency, land transformation, or footprint) should consider this knowledge and report limitations accordingly¹²⁴.
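
To make the effect of a spatially conservative boundary on IoU concrete, the sketch below computes intersection over union for two axis-aligned rectangles. Real array boundaries are arbitrary polygons, so the rectangle representation, the function names, and the example coordinates are simplifying assumptions for illustration only.

```python
def rect_area(rect):
    """Area of an axis-aligned rectangle (xmin, ymin, xmax, ymax); 0 if empty."""
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def rect_iou(a, b):
    """Intersection over union of two axis-aligned rectangles."""
    intersection = (
        max(a[0], b[0]),
        max(a[1], b[1]),
        min(a[2], b[2]),
        min(a[3], b[3]),
    )
    inter_area = rect_area(intersection)
    union_area = rect_area(a) + rect_area(b) - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


# Hypothetical reference boundary: 100 m x 100 m.
reference = (0.0, 0.0, 100.0, 100.0)
# Hypothetical conservative delineation that omits 40% of the rows.
conservative = (0.0, 0.0, 100.0, 60.0)

iou = rect_iou(reference, conservative)  # 6000 / 10000
```

Here the delineated boundary lies entirely inside the reference, so there is no commission error, yet the IoU is only about 0.6 because the omitted area still counts toward the union.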

Input array polygon and point metadata limitations

Aside from manual digitization of point data to polygon data, GM-SEUS does not search for new arrays that are not contained within existing reference polygon datasets (Table 1). Thus, GM-SEUS completeness is currently limited by coverage of existing datasets. The coverage, metadata completeness, and quality of input datasets vary depending on the scope and age of the dataset. Below, we outline key limitations associated with the metadata attributes of these input datasets (see Table 1).

The TZ-SAM dataset³⁶,⁵⁷ contains metadata on installation year, installed capacity, and GCR. TZ-SAM installation year is based on Copernicus Sentinel-2 imagery (S2) change detection, which is limited to installations after 2017. TZ-SAM also reports installation year as a range of potential dates based on the timing of imagery where change was detected. We select the median date within this range (only if the start of the range is 2018 or later) as the TZ-SAM provided installation year. TZ-SAM installation year is also not complete. TZ-SAM GCR estimates are derived from an OpenStreetMap validation dataset, are used to estimate installed capacity, and are provided at the country-level rather than the array level.
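
The installation-year rule described for TZ-SAM can be sketched as follows. The sketch treats the median of a date range as its midpoint and returns no year when the range starts before 2018. The function and parameter names are hypothetical and do not correspond to TZ-SAM field names or to the GM-SEUS codebase.

```python
from datetime import date


def tzsam_install_year(range_start, range_end, min_start_year=2018):
    """Return the year of the midpoint of a detection date range.

    Returns None when the range starts before min_start_year, mirroring
    the rule that only ranges starting in 2018 or later are used.
    """
    if range_start > range_end:
        raise ValueError("range_start must not be after range_end")
    if range_start.year < min_start_year:
        return None
    # date + timedelta ignores the sub-day remainder of the halved span.
    midpoint = range_start + (range_end - range_start) / 2
    return midpoint.year


# Hypothetical range spanning two calendar years.
year = tzsam_install_year(date(2020, 10, 1), date(2021, 6, 1))

# A range starting in 2017 yields no TZ-SAM installation year.
missing = tzsam_install_year(date(2017, 6, 1), date(2018, 3, 1))
```

Ranges that straddle a year boundary are the sensitive case, since a shift of a few weeks in either end of the range can change the assigned year.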

The CWSD dataset²⁵,⁹⁴ contains metadata on installation year. As with TZ-SAM, CWSD installation year is based on S2 change detection, which is limited to installations after 2017.

The OSM dataset⁵⁸ contains metadata on installation year, mount, and installed capacity. OSM installation years are based on the ‘start_date’ attribute, which is crowdsourced and therefore carries uncertainty in its definition. The same issue exists with mount and installed capacity reporting in OSM.

The SolarPACES or CSP.guru dataset⁴⁷,⁹⁵ contains metadata on installation year and installed capacity. SolarPACES also contains technology information (e.g., Parabolic Trough, Power Tower), from which we made assumptions about mount technologies.

The World Resources Institute’s GPPDB dataset⁴²,⁹⁶ contains metadata on installation year and installed capacity. GPPDB installation years were inferred from a commissioning year attribute, which may not be the same as the year of completed installation.

Use case product: labeled imagery for semantic segmentation

To demonstrate the utility of the granularity of GM-SEUS arrays and panel-rows, we provide solar panel-row labeled images as an auxiliary data product intended for training deep learning models for pattern recognition (e.g., semantic segmentation). Deep learning convolutional neural networks require abundant and well-labeled training data and are used in a number of existing solar identification and characterization efforts (e.g., refs. ³,⁴,⁵,⁶,⁷,¹⁰,¹¹,¹²,¹³,¹⁴,¹⁶,¹⁷,¹⁸,¹⁹,²⁰,²³,²⁵,²⁷,²⁸,³⁰,³²,³³,³⁶,⁵³,⁹¹,⁹²).

Labeled imagery was created for solar energy arrays within GM-SEUS that contained NAIP-generated panel-rows (CCVPV or GM-SEUS Source) and at least 10 identified panel-rows. To reduce panel-row omission error, imagery was only selected in sub-array regions where panel-rows were present. Images (inputs) are 4-band (R, G, B, and NIR) rasters, and masks (targets) are binary single-band rasters (0: non-solar, 1: solar), where solar labels are GM-SEUS panel-row vectors rasterized at local NAIP resolution and projection. Images and masks are provided at 256 × 256 pixel dimensions. We allowed each array to contain a number of random point-centered image windows equal to 50% of the panel-row-containing array area divided by the tile area. This resulted in ~17,500 images and masks within 4,452 arrays. We intend to augment this dataset with higher-density sampling to include 200,000+ training images in subsequent GM-SEUS versions. Example images and masks for fixed-, single-, and dual-axis mounted installations are shown in Fig. 12.
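
The sampling rule above can be read as: windows per array = 0.5 × (array area containing panel-rows) / (ground area of one 256 × 256 tile). A minimal sketch follows. The 0.6 m pixel size, the floor rounding, and the function name are assumptions for illustration, since NAIP resolution varies by acquisition and the rounding rule is not stated here.

```python
import math


def n_image_windows(array_area_m2, pixel_size_m=0.6, tile_px=256, fraction=0.5):
    """Number of random image windows allowed for one array.

    array_area_m2: area of the array region that contains panel-rows (m^2).
    pixel_size_m: assumed ground sample distance of the imagery (m).
    tile_px: tile width and height in pixels.
    fraction: share of the area-to-tile ratio to sample.
    """
    tile_area_m2 = (tile_px * pixel_size_m) ** 2
    return math.floor(fraction * array_area_m2 / tile_area_m2)


# A hypothetical 20 ha (200,000 m^2) array at 0.6 m resolution:
# one tile covers about 153.6 m x 153.6 m, or roughly 2.36 ha.
windows = n_image_windows(200_000.0)
```

Under these assumptions, arrays smaller than about two tile footprints would receive no windows, so a production rule may well apply a minimum of one window per qualifying array.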

All files are stored as GeoTIFF files in native NAIP imagery projections (the UTM zone of the source imagery location; spatial reference information is included in both images and masks in case reprojection is needed). Images and masks retain the same file naming logic for easy application. Importantly, file nomenclature includes the respective arrayID from GM-SEUS, meaning images can be selected for metadata-specific applications (e.g., avgAzimuth, mount). File nomenclature is (for example): “id3044_tile0.tif”, where ‘3044’ is the arrayID from GM-SEUS and ‘tile0’ is the tile number for that array.
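
Because the arrayID is embedded in each file name, tiles can be filtered by array-level metadata without opening the rasters. The sketch below parses the naming pattern given above and selects tiles by mount type. The helper names, the example file list, and the mount lookup table are hypothetical; in practice the lookup would be built from the GM-SEUS array attributes.

```python
import re

# Pattern for names such as "id3044_tile0.tif".
TILE_NAME = re.compile(r"^id(?P<array_id>\d+)_tile(?P<tile>\d+)\.tif$")


def parse_tile_name(filename):
    """Return (arrayID, tile number) parsed from a tile file name."""
    match = TILE_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected tile file name: {filename!r}")
    return int(match.group("array_id")), int(match.group("tile"))


def select_tiles(filenames, mount_by_array, mount):
    """Keep tiles whose array has the requested mount type."""
    return [
        name
        for name in filenames
        if mount_by_array.get(parse_tile_name(name)[0]) == mount
    ]


# Hypothetical file list and metadata lookup for illustration.
files = ["id3044_tile0.tif", "id3044_tile1.tif", "id12_tile0.tif"]
mounts = {3044: "single_axis", 12: "fixed_axis"}
single_axis_tiles = select_tiles(files, mounts, "single_axis")
```

Since images and masks share the same naming logic, the same selection can be applied to both directories to keep inputs and targets paired.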

Code availability

The codebase for data acquisition, harmonization, and processing is available via GitHub (https://github.com/stidjaco/GMSEUS) and a Zenodo release (ref. ¹⁴⁵). Intermediate data products are available upon request, and the data repository README contains links to all source datasets. The code provided in the GitHub repository includes both Python Jupyter notebooks and Google Earth Engine JavaScript files. These Google Earth Engine files are intended to be uploaded and processed in the Google Earth Engine Code Editor (https://code.earthengine.google.com/), requiring an account and associated cloud repository.

We intend to update this work annually, adding new or updated source datasets, re-extracting metadata with updated NAIP imagery, and selecting the highest-quality array delineation from each version. Code and data updates will be made accordingly, with appropriate version increments.

References

Bradbury, K. et al. Distributed solar photovoltaic array location and extent dataset for remote sensing object identification. Sci. Data, https://doi.org/10.1038/sdata.2016.106 (2016).
Carr, N. B., Fancher, T., Freeman, A. T. & Manley, H. M. B. Surface area of solar arrays in the conterminous United States. ScienceBase-Catalog, https://doi.org/10.5066/F79S1P57 (2016).
Malof, J. M., Bradbury, K., Collins, L. M. & Newell, R. G. Automatic detection of solar photovoltaic arrays in high resolution aerial imagery. Appl. Energy, https://doi.org/10.1016/j.apenergy.2016.08.191 (2016).
Imamoglu, N., Kimura, M., Miyamoto, H., Fujita, A. & Nakamura, R. Solar Power Plant Detection on Multi-Spectral Satellite Imagery using Weakly-Supervised CNN with Feedback Features and m-PCNN Fusion. British Mach. Vis. Conf., https://doi.org/10.5244/C.31.183 (2017).
Camilo, J., Wang, R., Collins, L. M. & Malof, J. M. Application of a semantic segmentation convolutional neural network for accurate automatic detection and mapping of solar photovoltaic arrays in aerial imagery. 2017 IEEE Appl. Imag. Pattern Recognit. AIPR Workshop, https://doi.org/10.48550/arXiv.1801.04018 (2018).
Yu, J., Wang, Z., Majumdar, A. & Rajagopal, R. DeepSolar: A Machine Learning Framework to Efficiently Construct a Solar Deployment Database in the United States. Joule 2, 2606–2617 (2018).
Hou, X., Wang, B., Hu, W., Yin, L. & Wu, H. SolarNet: A Deep Learning Framework to Map Solar Power Plants In China From Satellite Imagery. ICLR 2020 6 (2019).
Dunnett, S., Sorichetta, A., Taylor, G. & Eigenbrod, F. Harmonised global datasets of wind and solar farm locations and power. Sci. Data 7, 1–12 (2020).
Stowell, D. et al. A harmonised, high-coverage, open dataset of solar photovoltaic installations in the UK. Sci. Data 7, 394 (2020).
Zhuang, L., Zhang, Z. & Wang, L. The automatic segmentation of residential solar panels based on satellite images: A cross learning driven U-Net method. Appl. Soft Comput. 92, 106283 (2020).
Costa, M. V. C. V. D. et al. Remote Sensing for Monitoring Photovoltaic Solar Plants in Brazil Using Deep Semantic Segmentation. Energies 14, 2960 (2021).
Jiang, H. et al. Multi-resolution dataset for photovoltaic panel segmentation from satellite and aerial imagery. Earth Syst. Sci. Data 13, 5389–5401 (2021).
Kausika, B. B., Nijmeijer, D., Reimerink, I., Brouwer, P. & Liem, V. GeoAI for detection of solar photovoltaic installations in the Netherlands. Energy AI 6, 100111 (2021).
Kruitwagen, L. et al. A global inventory of photovoltaic solar energy generating units. Nature 598, 604–610 (2021).
Plakman, V., Rosier, J. & Van Vliet, J. Solar park detection from publicly available satellite imagery. GIScience Remote Sens. 59, 462–481 (2022).
Ge, F. et al. A Hierarchical Information Extraction Method for Large-Scale Centralized Photovoltaic Power Plants Based on Multi-Source Remote Sensing Images. Remote Sens. 14, 4211 (2022).
Hu, W. et al. What you get is not always what you see—pitfalls in solar array assessment using overhead imagery. Appl. Energy 327, 120143 (2022).
Jiang, H. et al. Geospatial assessment of rooftop solar photovoltaic potential using multi-source remote sensing data. Energy AI 10, 100185 (2022).
Mayer, K. et al. 3D-PV-Locator: Large-scale detection of rooftop-mounted photovoltaic systems in 3D. Appl. Energy 310, 118469 (2022).
Ortiz, A. et al. An Artificial Intelligence Dataset for Solar Energy Locations in India. Sci. Data 9, 497 (2022).
Stid, J. T. et al. Solar array placement, electricity generation, and cropland displacement across California’s Central Valley. Sci. Total Environ. 835, 155240 (2022).
Zhang, X., Xu, M., Wang, S., Huang, Y. & Xie, Z. Mapping photovoltaic power plants in China using Landsat, random forest, and Google Earth Engine. Earth Syst. Sci. Data 14, 3743–3755 (2022).
Arnaudo, E. et al. A Comparative Evaluation of Deep Learning Techniques for Photovoltaic Panel Detection From Aerial Images. IEEE Access 11, 47579–47594 (2023).
Clark, C. N. & Pacifici, F. A solar panel dataset of very high resolution satellite imagery to support the Sustainable Development Goals. Sci. Data 10, 636 (2023).
Evans, M. J., Mainali, K., Soobitsky, R., Mills, E. & Minnemeyer, S. Predicting patterns of solar energy buildout to identify opportunities for biodiversity conservation. Biol. Conserv. 283, 110074 (2023).
Fujita, K. S. et al. Georectified polygon database of ground-mounted large-scale solar photovoltaic sites in the United States. Sci. Data 10, 760 (2023).
Kasmi, G. et al. A crowdsourced dataset of aerial images with annotated solar photovoltaic arrays and installation metadata. Sci. Data 10, 59 (2023).
Ravishankar, R., AlMahmoud, E., Habib, A. & De Weck, O. L. Capacity Estimation of Solar Farms Using Deep Learning on High-Resolution Satellite Imagery. Remote Sens. 15, 210 (2022).
Tao, S., Rogan, J., Ye, S. & Geron, N. Mapping photovoltaic power stations and assessing their environmental impacts from multi-sensor datasets in Massachusetts, United States. Remote Sens. Appl. Soc. Environ. 30, 100937 (2023).
Wang, J. et al. PVNet: A novel semantic segmentation model for extracting high-quality photovoltaic panels in large-scale systems from high-resolution remote sensing imagery. Int. J. Appl. Earth Obs. Geoinformation 119, 103309 (2023).
Xia, Z. et al. Mapping global water-surface photovoltaics with satellite images. Renew. Sustain. Energy Rev. 187, 113760 (2023).
Chen, D. et al. Classification and segmentation of five photovoltaic types based on instance segmentation for generating more refined photovoltaic data. Appl. Energy 376, 124296 (2024).
Chen, Y., Zhou, J., Ge, Y. & Dong, J. Uncovering the rapid expansion of photovoltaic power plants in China from 2010 to 2022 using satellite data and deep learning. Remote Sens. Environ. 305, 114100 (2024).
Feng, Q. et al. A 10-m national-scale map of ground-mounted photovoltaic power stations in China of 2020. Sci. Data 11, 198 (2024).
Liu, J., Wang, J. & Li, L. Vectorized solar photovoltaic installation dataset across China in 2015 and 2020. Sci. Data 11, 1446 (2024).
Phillpott, M. et al. Solar Asset Mapper: A continuously-updated global inventory of solar energy facilities built with satellite data and machine learning (2024).
Wang, J. et al. Mapping national-scale photovoltaic power stations using a novel enhanced photovoltaic index and evaluating carbon reduction benefits. Energy Convers. Manag. 318, 118894 (2024).
Li, Q., Yu, K., Snow, C. & Chen, D. Enabling Automatic Solar PV Array Identification using Big Satellite Imagery. ACM J. Comput. Sustain. Soc., https://doi.org/10.1145/3723040 (2025).
Mao, H. et al. A method on mapping the distribution of photovoltaic power stations in complex terrain regions. Renew. Energy 247, 122964 (2025).
Robinson, C. et al. Global Renewables Watch: A Temporal Dataset of Solar and Wind Energy Derived from Satellite Imagery. arXiv cs.LG (2025).
Wolfe, P. The Wiki-Solar Database. Wiki-Solar (2012).
Global Energy Observatory, Google, KTH Royal Institute of Technology in Stockholm, Enipedia, & World Resources Institute. Global Power Plant Database version 1.3.0. WRI https://datasets.wri.org/datasets/global-power-plant-database (2021).
Deline, C. et al. PV Fleet Performance Data Initiative: March 2020 Methodology Report, https://www.nrel.gov/docs/fy20osti/76687.pdf (2020).
Barbose, G. L., Darghouth, N. R., O’Shaughnessy, E. & Forrester, S. Tracking the Sun: Pricing and Design Trends for Distributed Photovoltaic Systems in the United States, 2024 Edition. https://emp.lbl.gov/publications/tracking-sun-pricing-and-design-3 (2024).
NREL. Agrivoltaics Map. InSPIRE OpenEI (2025).
Deline, C. et al. Photovoltaic Data Acquisition (PVDAQ) Public Datasets. NREL, https://doi.org/10.25984/1846021 (2021).
Thonig, R., Gilmanova, A. & Lilliestam, J. CSP.guru 2023-07-01. Zenodo, https://doi.org/10.5281/zenodo.1318151 (2023).
GEM. Global Solar Power Tracker, Global Energy Monitor, June 2024 release. Global Energy Monitor, Creative Commons CC BY 4.0 International License (2024).
Seel, J. et al. Utility-Scale Solar, 2024 Edition. OEDI, https://doi.org/10.25984/2460457 (2024).
Xu, K., Chan, G. & Kannan, S. Sharing the Sun Community Solar Project Data. NREL Data Catalog https://doi.org/10.7799/2438583 (2024).
EPA. Re-Powering America’s Land Initiative: Project Tracking Matrix. https://www.epa.gov/system/files/documents/2024-12/re_on_cl_tracking_matrix_draft_2024_final_508_12.09.24.pdf (2024).
Fujita, K. S. et al. United States Large-Scale Solar Photovoltaic Database. ScienceBase-Catalog https://doi.org/10.5066/P9IA3TUS (2024).
Perry, K., Nguyen, Q. & White, R. Quantifying Error in Photovoltaic Installation Metadata. in vol. NREL/CP-5K00-90318 (NREL, Seattle, Washington, 2024).
SEIA & Wood Mackenzie. Solar Market Insight Report Q4 2024. https://seia.org/research-resources/solar-market-insight-report-q4-2024/ (2024).
Stid, J. T. et al. Impacts of agrisolar co-location on the food–energy–water nexus and economic security. Nat. Sustain. 8, 702–713 (2025).
Gómez‐Catasús, J. et al. Solar photovoltaic energy development and biodiversity conservation: Current knowledge and research gaps. Conserv. Lett. 17, e13025 (2024).
Phillpott, M. et al. Dataset: Solar Asset Mapper: A continuously-updated global inventory of solar energy facilities built with satellite data and machine learning. Zenodo https://doi.org/10.5281/zenodo.11368203 (2024).
OpenStreetMap Contributors. OpenStreetMap Elements. OpenStreetMap Wiki (2024).
Gilman, P. et al. SAM Photovoltaic Model Technical Reference Update. https://doi.org/10.2172/1429291 (2018).
Holmgren, W. F., Hansen, C. W. & Mikofski, M. A. pvlib python: a python package for modeling solar energy systems. J. Open Source Softw. 3, 884 (2018).
Wagner, M. J. & Wendelin, T. SolarPILOT: A power tower solar field layout and characterization tool. Sol. Energy 171, 185–196 (2018).
Prilliman, M. et al. Technoeconomic Analysis of Changing PV Array Convective Cooling Through Changing Array Spacing. IEEE J. Photovolt. 12, 1586–1592 (2022).
Jamil, U., Hickey, T. & Pearce, J. M. Solar energy modelling and proposed crops for different types of agrivoltaics systems. Energy 304, 132074 (2024).
Warmann, E., Jenerette, G. D. & Barron-Gafford, G. A. Agrivoltaic system design tools for managing trade-offs between energy production, crop productivity and water consumption. Environ. Res. Lett. 19, 034046 (2024).
Williams, H. J., Wang, Y., Yuan, B., Wang, H. & Zhang, K. M. Rethinking agrivoltaic incentive programs: A science-based approach to encourage practical design solutions. Appl. Energy 377, 124272 (2025).
Gullotta, A., Aschale, T. M., Peres, D. J., Sciuto, G. & Cancelliere, A. Modelling Stormwater Runoff Changes Induced by Ground-Mounted Photovoltaic Solar Parks: A Conceptualization in EPA-SWMM. Water Resour. Manag. 37, 4507–4520 (2023).
Nair, A. A., Rohith, A. N., Cibin, R. & McPhillips, L. E. A Framework to Model the Hydrology of Solar Farms Using EPA SWMM. Environ. Model. Assess. 29, 91–100 (2024).
McCall, J. et al. PV Stormwater Management Research and Testing (PV-SMaRT) (Final Technical Report). https://doi.org/10.2172/2203518 (2023).
Mulla, D., Galzki, J., Hanson, A. & Simunek, J. Measuring and modeling soil moisture and runoff at solar farms using a disconnected impervious surface approach. Vadose Zone J. 23, e20335 (2024).
Galzki, J. & Mulla, D. Stormwater runoff calculator for evaluation of low impact development practices at ground-mounted solar photovoltaic farms. Discov. Water 4, 35 (2024).
US DOE. National Transmission Planning Study. Executive Summary. https://www.energy.gov/gdo/national-transmission-planning-study (2024).
Bright, J. M., Killinger, S., Lingfors, D. & Engerer, N. A. Improved satellite-derived PV power nowcasting using real-time power data from reference PV systems. Sol. Energy 168, 118–139 (2018).
Samu, R. et al. Applications for solar irradiance nowcasting in the control of microgrids: A review. Renew. Sustain. Energy Rev. 147, 111187 (2021).
Buster, G., Benton, B. N., Glaws, A. & King, R. N. High-resolution meteorology with climate change impacts from global climate model data using generative machine learning. Nat. Energy 9, 894–906 (2024).
Price, I. et al. Probabilistic weather forecasting with machine learning. Nature 637, 84–90 (2025).
Laws, N. D., Epps, B. P., Peterson, S. O., Laser, M. S. & Wanjiru, G. K. On the utility death spiral and the impact of utility rate structures on the adoption of residential solar photovoltaics and energy storage. Appl. Energy 185, 627–641 (2017).
Crago, C. L. & Chernyakhovskiy, I. Are policy incentives for solar power effective? Evidence from residential installations in the Northeast. J. Environ. Econ. Manag. 81, 132–151 (2017).
Elmallah, S., Hoen, B., Fujita, K. S., Robson, D. & Brunner, E. Shedding light on large-scale solar impacts: An analysis of property values and proximity to photovoltaics across six U.S. states. Energy Policy 175, 113425 (2023).
Hao, S. & Michaud, G. Assessing property value impacts near utility-scale solar in the Midwestern United States. Sol. Compass 12, 100090 (2024).
Hoesch, K. W. et al. What to expect when you’re expecting engagement: Delivering procedural justice in large-scale solar energy deployment. Energy Res. Soc. Sci. 120, 103893 (2025).
Krasner, N. Z. et al. Impacts of photovoltaic solar energy on soil carbon: A global systematic review and framework. Renew. Sustain. Energy Rev. 208, 115032 (2025).
Walston, L. J. et al. Examining the Potential for Agricultural Benefits from Pollinator Habitat at Solar Facilities in the United States. Environ. Sci. Technol. 52, 7566–7576 (2018).
Supe, H. et al. Google Earth Engine for the Detection of Soiling on Photovoltaic Solar Panels in Arid Environments. Remote Sens. 12, 1466 (2020).
Cardoso, A., Jurado-Rodríguez, D., López, A., Ramos, M. I. & Jurado, J. M. Automated detection and tracking of photovoltaic modules from 3D remote sensing data. Appl. Energy 367, 123242 (2024).
Oulefki, A. et al. Detection and analysis of deteriorated areas in solar PV modules using unsupervised sensing algorithms and 3D augmented reality. Heliyon 10, e27973 (2024).
Curtis, T. et al. Best Practices at the End of the Photovoltaic System Performance Period. https://www.nrel.gov/docs/fy21osti/78678.pdf (2021).
Choi, J.-K. & Fthenakis, V. Crystalline silicon photovoltaic recycling planning: macro and micro perspectives. J. Clean. Prod. 66, 443–449 (2014).
Farina, A. & Anctil, A. Material consumption and environmental impact of wind turbines in the USA and globally. Resour. Conserv. Recycl. 176, 105938 (2022).
Hanna, F., Nain, P. & Anctil, A. Material availability assessment using system dynamics: The case of tellurium. Prog. Photovolt. Res. Appl. 32, 253–266 (2024).
Yuan, L., Nain, P., Kothari, M. & Anctil, A. Material intensity and carbon footprint of crystalline silicon module assembly over time. Sol. Energy 269, 112336 (2024).
Edun, A. S., Perry, K., Harley, J. B. & Deline, C. Unsupervised azimuth estimation of solar arrays in low-resolution satellite imagery through semantic segmentation and Hough transform. Appl. Energy 298, 117273 (2021).
Perry, K. & Campos, C. Panel Segmentation: A Python Package for Automated Solar Array Metadata Extraction Using Satellite Imagery. IEEE J. Photovolt. 13, 208–212 (2023).
Stid, J. T. et al. Spatiotemporally Characterized Ground-Mounted Solar PV Arrays Within California's Central Valley. figshare https://doi.org/10.6084/m9.figshare.23629326.v1 (2023).
Evans, M. J., Mainali, K. & Soobitsky, R. Chesapeake Solar Footprints. OSF.io https://doi.org/10.17605/OSF.IO/VQ7MT (2021).
Lilliestam, J., Labordena, M., Patt, A. & Pfenninger, S. Empirically observed learning rates for concentrating solar power and their responses to regime change. Nat. Energy 2, 17094 (2017).
Byers, L. et al. A Global Database of Power Plants. 1–18, https://files.wri.org/d8/s3fs-public/2021-07/global-power-plant-database-technical-note-v1.3.pdf?VersionId=KNA6zn0E2HgUcEsXhtZuvfAlIqWOjLib&_gl=1*t24zup*_gcl_au* (2021).
Dunnett, S. Harmonised global datasets of wind and solar farm locations and power: dataset. figshare https://doi.org/10.6084/m9.figshare.11310269.v6 (2020).
USDA FPAC-BC-GEO. National Agriculture Imagery Program (NAIP) Imagery. Google Earth Engine Data Catalog (2023).
ESA. Copernicus Sentinel data. Google Earth Engine Data Catalog (2024).
Google Maps. [Contiguous United States]. Google Earth Engine Data Catalog (2024).
VIDA, Google, & Microsoft. Global Google-Microsoft Open Buildings. awesome-gee-community-catalog (2023).
Maxwell, A. E., Warner, T. A., Vanderbilt, B. C. & Ramezan, C. A. Land Cover Classification and Feature Extraction from National Agriculture Imagery Program (NAIP) Orthoimagery: A Review. Photogramm. Eng. Remote Sens. 83, 737–747 (2017).
Breiman, L. Random Forests. Mach. Learn. 45, 5–32 (2001).
Pelleg, D. & Moore, A. W. X-means: Extending K-means with Efficient Estimation of the Number of Clusters. in Proceedings of the Seventeenth International Conference on Machine Learning 727–734, https://doi.org/10.5555/645529.657808 (Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2000).
Achanta, R. & Susstrunk, S. Superpixels and Polygons Using Simple Non-iterative Clustering. in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 4895–4904, https://doi.org/10.1109/CVPR.2017.520 (IEEE, Honolulu, HI, 2017).
Haralick, R. M., Shanmugam, K. & Dinstein, I. Textural Features for Image Classification. IEEE Trans. Syst. Man Cybern. SMC-3, 610–621 (1973).
Haralick, R. M. & Shanmugam, K. S. Combined spectral and spatial processing of ERTS imagery data. Remote Sens. Environ. 3, 3–13 (1974).
Tassi, A. & Vizzari, M. Object-Oriented LULC Classification in Google Earth Engine Combining SNIC, GLCM, and Machine Learning Algorithms. Remote Sens. 12, 3776 (2020).
Maxwell, A. E. et al. Large-Area, High Spatial Resolution Land Cover Mapping Using Random Forests, GEOBIA, and NAIP Orthophotography: Findings and Recommendations. Remote Sens. 11, 1409 (2019).
Kriegler, F. J., Malila, W. A., Nalepka, R. F. & Richardson, W. Preprocessing transformations and their effect on multispectral recognition. in The Proceedings of the Sixth International Symposium on Remote Sensing of Environment 97–131 (University of Michigan, Ann Arbor, 1969).
Rouse, J. W. Jr., Haas, R. H., Schell, J. A. & Deering, D. W. Monitoring vegetation systems in the Great Plains with ERTS. in vol. PAPER-A20 (NASA, Greenbelt, Maryland, 1974).
McFeeters, S. K. The use of the Normalized Difference Water Index (NDWI) in the delineation of open water features. Int. J. Remote Sens. 17, 1425–1432 (1996).
Pengra, B. W. et al. Quality control and assessment of interpreter consistency of annual land cover reference data in an operational national monitoring program. Remote Sens. Environ. 238, 111261 (2020).
RGI Consortium. Randolph Glacier Inventory - A Dataset of Global Glacier Outlines (NSIDC-0770, Version 7). National Snow and Ice Data Center, https://doi.org/10.5067/F6JMOVY5NAVZ (2023).
Gorelick, N. et al. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 202, 18–27 (2017).
Polsby, D. D. & Popper, R. The Third Criterion: Compactness as a Procedural Safeguard Against Partisan Gerrymandering. Yale Law Policy Rev. 9, 301–353 (1991).
Martín-Chivelet, N. Photovoltaic potential and land-use estimation methodology. Energy, https://doi.org/10.1016/j.energy.2015.10.108 (2016).
Al Garni, H. Z., Awasthi, A. & Wright, D. Optimal orientation angles for maximizing energy yield for solar PV in Saudi Arabia. Renew. Energy 133, 538–550 (2019).
Jensen, A. R. et al. pvlib iotools—Open-source Python functions for seamless access to solar irradiance data. Sol. Energy 266, 112092 (2023).
European Commission. Photovoltaic Geographical Information System (5.3). EU Science Hub (2024).
Gordon, J. M. & Wenger, H. J. Central-station solar photovoltaic systems: Field layout, tracker, and array geometry sensitivity studies. Sol. Energy 46, 211–217 (1991).
Narvarte, L. & Lorenzo, E. Tracking and ground cover ratio. Prog. Photovolt. Res. Appl. 16, 703–714 (2008).
Ong, S., Campbell, C., Denholm, P., Margolis, R. & Heath, G. Land-Use Requirements for Solar Power Plants in the United States. NREL/TP-6A20-56290, 1086349 https://doi.org/10.2172/1086349 (2013).
Cagle, A. E. et al. Standardized metrics to quantify solar energy-land relationships: A global systematic review. Front. Sustain. 3 (2023).
Tonita, E. M., Russell, A. C. J., Valdivia, C. E. & Hinzer, K. Optimal ground coverage ratios for tracked, fixed-tilt, and vertical photovoltaic systems for latitudes up to 75° N. Sol. Energy 258, 8–15 (2023).
Kennedy, R. et al. Implementation of the LandTrendr Algorithm on Google Earth Engine. Remote Sens. 10, 691 (2018).
Cohen, W. B., Healey, S. P., Yang, Z., Zhu, Z. & Gorelick, N. Diversity of Algorithm and Spectral Band Inputs Improves Landsat Monitoring of Forest Disturbance. Remote Sens. 12, 1673 (2020).
Huete, A. et al. Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sens. Environ. 83, 195–213 (2002).
García, M. J. L. & Caselles, V. Mapping burns and natural reforestation using thematic Mapper data. Geocarto Int. 6, 31–37 (1991).
Gao, B. NDWI—A normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sens. Environ. 58, 257–266 (1996).
Kauth, R. J. & Thomas, G. S. The Tasselled Cap–A Graphic Description of the Spectral-Temporal Development of Agricultural Crops as Seen by LANDSAT. in vol. Paper 159 (Purdue University, 1976).
Crist, E. P. & Cicone, R. C. A Physically-Based Transformation of Thematic Mapper Data–The TM Tasseled Cap. IEEE Trans. Geosci. Remote Sens. GE-22, 256–263 (1984).
Comte, L. & Olden, J. D. Climatic vulnerability of the world’s freshwater and marine fishes. Nat. Clim. Change 7, 718–722 (2017).
Merrifield, A. L., Brunner, L., Lorenz, R., Medhaug, I. & Knutti, R. An investigation of weighting schemes suitable for incorporating large ensembles into multi-model ensembles. Earth Syst. Dyn. 11, 807–834 (2020).
So, B. et al. Estimating the electricity generation capacity of solar photovoltaic arrays using only color aerial imagery. in International Geoscience and Remote Sensing Symposium (IGARSS) https://doi.org/10.1109/IGARSS.2017.8127279 (2017).
IEA SHC. New Conversion Factor for Concentrating Collector Statistics. https://www.iea-shc.org/Data/Sites/1/publications/2023-07-Task64-New-Conversion-Factor.pdf (2023).
Wagner, M. J. & Zhu, G. A Generic CSP Performance Model for NREL’s System Advisor Model. In: Proc. of 17th SolarPACES Symp., Granada Spain. National Renewable Energy Laboratory, Conference Paper No.: NREL/CP-5500-52473 (2011).
Dobos, A., Neises, T. & Wagner, M. Advances in CSP Simulation Technology in the System Advisor Model. Energy Procedia 49, 2482–2489 (2014).
Stid, J. T., Kendall, A. D., Rapp, J. & Bingaman, J. C. A comprehensive ground-mounted solar energy dataset with sub-array design metadata in the United States. Zenodo https://doi.org/10.5281/zenodo.14827819 (2025).
Bradbury, K. et al. Distributed Solar Photovoltaic Array Location and Extent Data Set for Remote Sensing Object Identification. figshare https://doi.org/10.6084/m9.figshare.3385780.v4 (2020).
Stowell, D. et al. Solar panels and solar farms in the UK - geographic open data (UKPVGeo). Zenodo https://doi.org/10.5281/zenodo.4059881 (2020).
Levandowsky, M. & Winter, D. Distance between Sets. Nature 234, 34–35 (1971).
OpenStreetMap Wiki. OpenStreetMap Using Aerial Imagery. OpenStreetMap Wiki (2024).
Bunis, L. & Mootz, J. Aerial Photography Field Office-National Agriculture Imagery Program (NAIP) Suggested Best Practices-Final Report. https://www.fsa.usda.gov/Internet/FSA_File/naip_best_practice.pdf (2007).
Stid, J. T., Kendall, A. D., Rapp, J. & Bingaman, J. C. GM-SEUS Initial Release. Zenodo https://doi.org/10.5281/zenodo.14829530 (2025).

Acknowledgements

This work was mostly supported by the USDA National Institute of Food and Agriculture INFEWS grant number 2018-67003-27406 (accession No. 1013707). Additional funding came from the Michigan State University Climate Change Research Support Program and the Foundation for Food and Agriculture Research (FFAR) Seeding Solutions Program, grant ID 23-000780. The authors thank Dr. Francis Hanna for his thorough feedback on the initial draft, as well as the contributors of source datasets and all OpenStreetMap contributors and community members. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the USDA, Michigan State University, or FFAR.

Author information

Authors and Affiliations

Department of Earth and Environmental Sciences, Michigan State University, East Lansing, MI, 48824, USA
Jacob T. Stid, Anthony D. Kendall, Jeremy Rapp & James C. Bingaman
Department of Civil and Environmental Engineering, Michigan State University, East Lansing, MI, 48824, USA
Annick Anctil
Department of Sustainable Earth Systems Science, School of Natural Sciences and Mathematics, The University of Texas at Dallas, Richardson, TX, 75080, USA
David W. Hyndman

Contributions

J.T.S. led the initial dataset acquisition, method and code development for compiling the existing solar array and panel-row dataset, digitizing array boundary omissions, generating the new GM-SEUS panel-row and array dataset, technical validation, and wrote the initial draft. A.D.K. significantly contributed to method development and paper edits. A.A. aided in method conceptualization and use case reporting. J.R. aided in spatial technical validation and method conceptualization. J.C.B. led the estimation of optimal tilt angle. A.D.K., A.A., and D.W.H. contributed to funding acquisition. All authors meaningfully contributed to manuscript development.

Corresponding author

Correspondence to Jacob T. Stid.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Stid, J.T., Kendall, A.D., Anctil, A. et al. A harmonized dataset of ground-mounted solar energy in the US with enhanced metadata. Sci Data 12, 1586 (2025). https://doi.org/10.1038/s41597-025-05862-4

Received: 14 May 2025
Accepted: 20 August 2025
Published: 29 September 2025
DOI: https://doi.org/10.1038/s41597-025-05862-4