Abstract
Granular, localized data are essential for generating actionable insights that facilitate the transition to a net-zero energy system, particularly in underdeveloped regions. Understanding residential electricity consumption—especially in response to extreme weather events such as heatwaves and tropical storms—is critical for enhancing grid resilience and optimizing energy management strategies. However, such data are often scarce. This study introduces a comprehensive dataset comprising hourly transformer-level residential electricity load data collected between 2022 and 2023 from 23 residential communities across 10 cities in Guangxi Province, China. The dataset is augmented with meteorological data, including temperature, humidity, and records of extreme weather events. Additionally, calendar-related data (e.g., holidays) are included to facilitate the analysis of consumption patterns. The paper provides a detailed overview of the methodologies employed for data collection, preprocessing, and analysis, with a particular emphasis on how extreme weather influences electricity demand in residential areas. This dataset is anticipated to support future research on energy consumption, climate change adaptation, and grid resilience.
Background & Summary
The increasing frequency of extreme weather events attributed to climate change significantly affects energy consumption in urban residential areas. Specifically, during periods of both low and high ambient temperatures, building heating, ventilation, and air conditioning (HVAC) systems are key to maintaining thermal comfort1,2. The heightened reliance on HVAC systems leads to increased peak electricity demand across various regions, such as the U.S.3, Europe2, Mexico4, and China5. Consequently, accurate forecasting of electricity load under diverse weather conditions is essential for effective power system operation and planning, encompassing aspects such as unit commitment, economic dispatch, demand-side management, and transmission line cogeneration management. Given that power grids are designed to meet maximum load, the availability of high-quality electricity demand data is critical for analyzing electricity consumption patterns and informing the deployment of peak generation capacity.
Currently, numerous load datasets are accessible from various countries, including Australia6, Austria7,8, Canada9, China10,11, Germany7,12,13, Ireland14, Italy8, Japan15,16, Korea17, Portugal18,19, Spain20, UK21,22,23,24,25,26,27,28,29, Uruguay30, and USA31,32,33. For instance, the ECD-UY30 dataset provides comprehensive electricity consumption records for residential households in Uruguay, featuring the total household consumption with sample intervals ranging from one to fifteen minutes. Another dataset was collected from an industrial park with multiple types of buildings in Suzhou, China11, spanning six years at 5-minute intervals. The Spanish dataset provides valuable insights into the electricity consumption patterns of Spanish households20, comprising 25,559 raw hourly time series collected over three years. Additionally, the EWELD dataset represents a large-scale compilation of electricity consumption data from 386 industrial and commercial users in southern China, recorded over six years at fifteen-minute intervals10. Furthermore, the IoT-based data collected in a single-family apartment in China34 include information on occupant behavior, thermal environment, and appliance electricity use. These datasets are instrumental for analyzing energy consumption patterns, conducting energy disaggregation studies, and designing energy billing plans.
Table 1 presents a summary of the existing load demand datasets. Most of these datasets are outdated, with limited coverage of the most recent five years. Furthermore, many of them feature a restricted number of supply points, including AMPds2, FIKElectricity, UK-DALE, GREEND, CoSSMic, NOVAREF, DEDDIAG, REFIT, ENERTALK, ADRES, WPuQ, NESEMP, IDEAL, EULP, Elergone, EWELD, and MFRED. Their sample size is typically fewer than 500 houses and represents the usage patterns of a small group of residents, which is insufficient for reliable demand analysis. Additionally, climate change and extreme weather events exert considerable influence on energy systems, impacting both energy demand and supply resilience. However, only a limited number of datasets, such as AMPds2, REFIT, WPuQ, NESEMP, IDEAL, EULP, EWELD, SERL, and SGSC, provide climate data, and even when such data are available, they consist predominantly of temperature information. Furthermore, there is a notable lack of load datasets pertaining to China and residential areas, as well as a lack of publicly accessible data on load forecasting in the context of extreme weather events. Most of the existing datasets originate from English-speaking countries that share similar cultural attributes, which may introduce biases in the interpretations and conclusions drawn from the data, particularly in the domain of human behavior analysis.
Comprehensive datasets with the above features are seldom employed for research purposes and are often difficult to obtain. Furthermore, certain available load datasets apply only to single-family households and do not extend to building-level analysis within residential communities. Most public load datasets are limited by small sample sizes, a shortcoming that has become increasingly evident in recent years. Additionally, there is a significant lack of load datasets pertaining to China and its residential areas, as well as of publicly available datasets for load analysis during extreme weather events.
We are committed to establishing an open load dataset that incorporates extreme weather information to fill the existing gaps. This study focuses on the electricity load of residential communities, which contributes to regional diversity in accessible energy datasets. Specifically, the dataset includes extreme weather events such as tropical storms and extremely high temperatures, collected from Jan. 2022 to Nov. 2023 from 23 residential communities in ten cities of Guangxi, covering more than 80,000 family households. Despite being limited in scope, the data offer valuable insights that can significantly contribute to regional decision-making and planning.
The proposed dataset has the following distinguishing characteristics.
- It exhibits a high degree of diversity, encompassing electricity consumption data from 23 residential communities across 10 cities, which collectively represent over 80,000 family households in Guangxi, China. The dataset comprises aggregated load profiles derived from electricity transformer meters and emphasizes consumption patterns at the community level. The variety of sources enhances the dataset's comprehensiveness, facilitating electricity load forecasting under a range of extreme conditions through data-driven methodologies.
- The dataset covers a collection period from 2022 to 2023 at an hourly sampling frequency. This recent record captures contemporary trends in energy consumption and provides a representation of current societal energy usage patterns. The hourly resolution enables high-fidelity analysis, such as real-time forecasting of electricity demand during this period, and allows for an in-depth examination of electricity consumption behaviors.
- The dataset integrates weather data as well as extreme weather events. It encompasses 12 distinct definitions of extreme weather and over 13 types of weather-related information. This facilitates quantitative analyses of climate change impacts on electricity consumption, such as peak demand. Examining load variations in response to weather fluctuations deepens the current understanding of the interplay between external environmental factors and power consumption.
Methods
Data acquisition and collection
The dataset comprises hourly electricity demand data from 423 electricity transformer meters, which serve a total of 23 residential communities encompassing approximately 80,000 family households. Figure 1 illustrates the methodology employed in constructing the dataset. Table 2 presents essential information regarding the communities, including transformer capacity, the number of transformers, and the number of households. The dataset is aggregated based on electricity transformers with rated capacities ranging from 400 kVA to 5,000 kVA. Variations in the number of transformers are attributed to the population size of the respective residential community. The consumption patterns observed across different transformer meters within the same community exhibit highly comparable trends, which motivates the aggregation of each community into a single load profile. The data span from January 2022 to November 2023 and encompass a wide array of information relevant to electricity demand, geographical location, holidays, climatic conditions, and occurrences of extreme weather.
The electricity consumption data utilized in this study were collected and validated by the local electrical department, ensuring high accuracy and reliability. The data were obtained through a hierarchical metering system that monitors electricity usage at multiple levels, including individual households, residential buildings, and transformers. At the household level, electricity meters record the energy consumption of individual residences, while each residential building is equipped with a meter to aggregate the consumption of all its households. At the community scale, transformer meters capture the total electricity demand of all buildings connected to a specific transformer.
The dataset used in this study comprises exclusively transformer-level data, providing aggregated load profiles for residential communities. These transformer meters, specifically the Holley DTSD546 Smart Meter, are equipped with advanced monitoring technology capable of recording hourly electricity consumption (http://www.holleymeter.com/prodel.html?id=117). The Holley DTSD546 is a multi-function, three-phase electronic meter designed for high precision in energy measurement. It features capabilities such as real-time data collection, remote monitoring, and comprehensive diagnostics to detect issues such as abnormal voltage, current imbalances, and meter tampering. The meter’s high-resolution data capture ensures detailed insights into energy usage trends, making it ideal for load analysis and energy planning. The electrical department conducts routine validation and quality assurance to ensure the integrity of the data. This includes flagging and addressing anomalies such as irregular consumption patterns or meter errors during the preprocessing stage, thereby maintaining the dataset’s reliability and accuracy.
The location-specific climate data come from the National Oceanic and Atmospheric Administration (NOAA) and provide essential information, including daily averages, maximum and minimum temperatures, humidity levels, mean wind speed, maximum sustained wind speed, gusts, precipitation amounts, and mean station pressure. These climate data serve as a crucial component for understanding the relationship between electricity consumption patterns and meteorological conditions. The solar radiation data used in this study were obtained from the ERA5 reanalysis dataset, provided by the European Centre for Medium-Range Weather Forecasts (ECMWF)35. This dataset, specifically titled “Reanalysis Era5 Single Levels,” contains hourly surface net solar radiation data for the years 2022 and 2023. The data were collected for a specific geographical area, encompassing latitudes from 20°N to 27°N and longitudes from 104°E to 113°E, which covers the geographic scope of the dataset. The data were retrieved in GRIB format and compressed into a ZIP file to facilitate efficient handling and analysis.
The criteria for defining extreme weather events are outlined in Table 3, which includes classifications such as heat index caution, heat index extreme caution, heat index danger, heat index extreme danger, wind chill, wind chill frostbite danger, wind chill great frostbite danger, tropical storm, severe tropical storm, typhoon, strong typhoon, and super typhoon. The extreme weather data are incorporated into the dataset using two distinct methodologies. First, we gathered data on extreme weather events from local media sources, which identified the “Top Ten” extreme weather events for 2022 and 2023 in the region. This approach ensures that significant local weather occurrences are well represented. Second, a statistical approach was employed, using NOAA weather data to compute relative humidity, heat index, and wind chill metrics, which are important for understanding extreme temperature conditions. Additionally, tropical storms and typhoons were categorized based on the Beaufort wind force scale, with gust-related threats excluded due to the absence of gust data at most monitoring stations. For example, a day is classified as a high-temperature day if the temperature exceeds the upper limit of the 95% confidence interval established within the dataset.
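To illustrate the wind-based categories named above, the following minimal sketch maps a maximum sustained wind speed to a storm grade. The thresholds are an assumption: they follow the commonly cited Chinese national grading of tropical cyclones (GB/T 19201), whose grades correspond to Beaufort force ranges, and they are not taken from Table 3. The function and constant names are hypothetical, and the knot conversion is included only because NOAA daily summaries often report wind speed in knots.

```python
KNOT_TO_MS = 0.514444  # one knot expressed in metres per second

# (lower bound in m/s, label), ordered from strongest to weakest.
# Assumed thresholds; verify against Table 3 of the paper before use.
STORM_GRADES = [
    (51.0, "super typhoon"),
    (41.5, "strong typhoon"),
    (32.7, "typhoon"),
    (24.5, "severe tropical storm"),
    (17.2, "tropical storm"),
]


def storm_grade(max_sustained_wind_ms):
    """Return the storm grade for a wind speed in m/s, or None below storm force."""
    if max_sustained_wind_ms is None:
        return None
    for lower_bound, label in STORM_GRADES:
        if max_sustained_wind_ms >= lower_bound:
            return label
    return None


def storm_grade_from_knots(max_sustained_wind_knots):
    """Same classification for a wind speed given in knots."""
    if max_sustained_wind_knots is None:
        return None
    return storm_grade(max_sustained_wind_knots * KNOT_TO_MS)
```

Because gust data are excluded, only the sustained wind speed is used as input here.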
Table 4 provides a summary of the statistics related to electricity consumption data from 23 residential communities across 10 cities. It includes metrics such as mean values, standard deviation, skewness, and percentiles. Notably, significant disparities exist in the mean electricity consumption among the various communities, which can be attributed to differences in household sizes. All weather data are recorded daily and encompass mean and maximum values. Table 5 summarizes the statistics of the weather data from the same 23 residential communities, detailing parameters such as temperature, wind speed, dew point, and station pressure.
Data processing
Data censoring
To ensure data privacy and prevent the reverse engineering of transformer identities, we implemented a data censoring technique. In the original dataset, transformers were numbered according to their respective cities and communities in a logical order. Subsequently, each transformer across all cities was randomized and renumbered. As a result, transformers sharing the same city and community numbers indicate that they belong to the same community within the same city; however, the specific transformer numbers and corresponding community names have been censored. All code associated with the randomization process, as well as the original data prior to censoring, has been completely removed to safeguard privacy. This method guarantees that sensitive information regarding individual transformers cannot be traced back to specific locations.
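The renumbering idea can be sketched as follows. This is not the authors' code, which has been removed; it is an illustrative assumption of how transformer identifiers might be shuffled while the city and community codes are preserved. The record layout and the function names are hypothetical.

```python
import random


def anonymize_transformers(records, seed=None):
    """Map each original transformer identifier to a shuffled integer label.

    `records` is a list of (city, community, transformer_id) tuples.
    Leaving `seed` as None gives a shuffle that cannot be reproduced later.
    """
    rng = random.Random(seed)
    original_ids = sorted({tid for _, _, tid in records})
    new_labels = list(range(len(original_ids)))
    rng.shuffle(new_labels)
    return dict(zip(original_ids, new_labels))


def apply_mapping(records, mapping):
    """Rewrite records with the new labels; city and community codes are kept."""
    return [(city, community, mapping[tid]) for city, community, tid in records]
```

For the censoring to hold, the mapping itself would have to be discarded after use, which matches the paper's statement that the randomization code and the original data were removed.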
Data cleaning
To ensure the quality of our dataset, we identified and removed outliers in the electricity consumption data, specifically instances where the maximum power consumption exceeded the rated capacity of the transformer. These cases were deemed erroneous and excluded from the dataset. For the weather data, outliers were identified through statistical analysis to confirm that measurements—such as temperature, humidity, and precipitation—fell within reasonable ranges for the region. Any data points that fell outside these expected ranges were also considered erroneous and removed to maintain the integrity of the dataset.
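A minimal sketch of these two cleaning rules is given below. It assumes that load readings are in kW and that rated capacity is in kVA, so a power factor is needed to compare them; the default of 1.0 is an assumption, as are the function names and the choice to mark removed points as None rather than deleting rows.

```python
def drop_over_capacity(readings_kw, rated_capacity_kva, power_factor=1.0):
    """Replace readings above the transformer's rated capacity with None."""
    limit_kw = rated_capacity_kva * power_factor
    return [
        None if (value is not None and value > limit_kw) else value
        for value in readings_kw
    ]


def drop_out_of_range(values, lower, upper):
    """Keep weather values inside [lower, upper]; everything else becomes None."""
    return [
        value if (value is not None and lower <= value <= upper) else None
        for value in values
    ]
```

The plausible range passed to `drop_out_of_range` (for example a temperature band for the region) has to be chosen by the analyst; the paper does not list the bounds it used.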
Extreme weather events
The extreme weather events were compiled using two distinct methodologies: local identification and a statistical approach. The local identification method involved gathering data on the “Top Ten” extreme weather events for 2022 and 2023 from local media sources, thereby identifying the most significant weather occurrences in the region. The statistical approach utilized a script to retrieve historical weather data from NOAA. From this data, we calculated relative humidity, heat index, and wind chill metrics to characterize extreme temperature conditions. Additionally, tropical storms and typhoons were categorized based on the Beaufort wind force scale. Due to the lack of gust data at most monitoring stations, gust-related threats were excluded from the extreme weather categorization.
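The derived metrics mentioned above can be sketched as follows. The formulas are assumptions, since the paper does not print them: relative humidity uses the Magnus approximation from temperature and dew point, the heat index uses the Rothfusz regression published by the U.S. National Weather Service (without its low- and high-humidity adjustments), and wind chill uses the 2001 NWS formula. The category cut-offs follow the usual NWS heat index bands and should be checked against Table 3.

```python
import math


def c_to_f(temp_c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


def relative_humidity(temp_c, dew_point_c):
    """Relative humidity (%) from temperature and dew point in deg C (Magnus)."""
    a, b = 17.625, 243.04
    saturation = math.exp(a * temp_c / (b + temp_c))
    actual = math.exp(a * dew_point_c / (b + dew_point_c))
    return 100.0 * actual / saturation


def heat_index_f(temp_f, rh):
    """Heat index in deg F; simplified NWS procedure without adjustments."""
    simple = 0.5 * (temp_f + 61.0 + (temp_f - 68.0) * 1.2 + rh * 0.094)
    if simple < 80.0:
        return simple  # the simple formula is used for mild conditions
    return (
        -42.379 + 2.04901523 * temp_f + 10.14333127 * rh
        - 0.22475541 * temp_f * rh - 0.00683783 * temp_f ** 2
        - 0.05481717 * rh ** 2 + 0.00122874 * temp_f ** 2 * rh
        + 0.00085282 * temp_f * rh ** 2 - 0.00000199 * temp_f ** 2 * rh ** 2
    )


def heat_index_category(hi_f):
    """Assumed NWS-style bands; returns None below the caution level."""
    if hi_f >= 125.0:
        return "extreme danger"
    if hi_f >= 103.0:
        return "danger"
    if hi_f >= 90.0:
        return "extreme caution"
    if hi_f >= 80.0:
        return "caution"
    return None


def wind_chill_f(temp_f, wind_mph):
    """NWS wind chill in deg F; defined for temp <= 50 F and wind > 3 mph."""
    if temp_f > 50.0 or wind_mph <= 3.0:
        return temp_f  # outside the formula's validity range
    v = wind_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v
```

For example, `heat_index_category(heat_index_f(90.0, 70.0))` evaluates to `"danger"` under these assumed bands.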
Imputation of missing data
The dataset includes several categories, such as electricity consumption data, weather data, extreme weather indicators, and holiday data. Figure 2 presents statistics on the missing values for each meter in Community 0 of City 0. Among the 200 transformer meters, 180 exhibit data completeness exceeding 95%. Excluding a transformer meter's entire record is considered acceptable if its proportion of missing values exceeds 30%.
Missing data distribution of Community 0 in City 0 from 2022 to 2023. The sparkline on the right provides a concise summary of the data completeness, highlighting the rows with the highest and lowest number of missing values in the dataset. The y-axis displays datetime, while the x-axis depicts transformers. The white spaces denote the absence of data.
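The completeness check and the 30% exclusion rule described above can be sketched as follows. The data layout (a dictionary of hourly series with None for missing readings) and the function names are assumptions made for illustration.

```python
def missing_fraction(series):
    """Share of entries that are None in one meter's hourly series."""
    if not series:
        return 1.0  # an empty series carries no data at all
    return sum(value is None for value in series) / len(series)


def select_meters(meters, max_missing=0.30):
    """Keep the meters whose missing share does not exceed `max_missing`."""
    return {
        meter_id: series
        for meter_id, series in meters.items()
        if missing_fraction(series) <= max_missing
    }
```

Setting `max_missing=0.05` instead would reproduce the stricter 95% completeness criterion quoted for Figure 2.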
To address missing values, we employed the forward-backward imputation method for both the electricity consumption and weather datasets. This approach was chosen to ensure that missing values were imputed with context from both preceding and subsequent time periods. For the electricity consumption data, we implemented forward-backward imputation by averaging the data from the previous and following weeks. This method is effective for filling gaps in data where consumption patterns are expected to be similar within a given week. The forward-backward method guarantees that the imputed values are balanced and reflect both prior and subsequent consumption trends. Similarly, for the weather data, the forward-backward imputation method was applied, with missing values filled by averaging the values from the day before and the day after the missing data point. This approach is particularly effective in situations where weather conditions follow daily cycles, as any missing data would likely resemble the values of the surrounding days. Three commonly used standard imputation methods were employed for comparison, as illustrated in Fig. 3. The linear interpolation method tends to yield suboptimal results when the time interval exceeds 24 hours; however, it performs more favorably than other methods for very short time intervals. It is crucial to recognize that missing values may convey significant information, such as meter failures due to extreme weather conditions. Furthermore, the peak demand within the load profile can be approximated using the mean value, as peak demand is integral to power grid dispatch strategies. Consequently, the raw data is preserved in the repository for future research endeavors.
Comparison of various imputation methods for addressing missing data. The selected forward-backward imputation method is denoted by the black line, while the shaded background indicates the presence of missing data. Three days in June 2022 are selected. Additional imputation methods, including linear interpolation, forward imputation, and backward imputation, are also evaluated for comparison purposes. Forward imputation addresses missing values by utilizing data from the preceding week, whereas backward imputation employs data from the subsequent week to fill in gaps. The forward-backward imputation method integrates both approaches by averaging the data from the week prior and the week following the missing values.
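The forward-backward scheme described above can be sketched in a few lines. This is a simplified illustration rather than the repository's implementation: it assumes a gap-free hourly index with None marking missing readings, and the fallback to a single neighbour when only one side is available is an assumption that the text does not specify.

```python
HOURS_PER_WEEK = 168


def forward_backward_impute(load, lag=HOURS_PER_WEEK):
    """Fill None entries with the mean of the values `lag` steps before and after.

    Only originally observed values act as donors. If just one neighbour is
    observed it is used alone; if neither is, the gap is left as None.
    """
    filled = list(load)
    for i, value in enumerate(load):
        if value is not None:
            continue
        donors = []
        if i - lag >= 0 and load[i - lag] is not None:
            donors.append(load[i - lag])
        if i + lag < len(load) and load[i + lag] is not None:
            donors.append(load[i + lag])
        if donors:
            filled[i] = sum(donors) / len(donors)
    return filled
```

With hourly load data the default lag of 168 corresponds to the previous and following weeks; for the daily weather series, `lag=1` gives the average of the day before and the day after.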
Data Records
The dataset is available on Figshare and can be imported into any software that is compatible with the SQLite3 database format36. A detailed analysis workflow is provided in the Code Availability section. Note that the deposited dataset comprises only raw data, while the analyses reported here are conducted on processed data. The dataset comprises seven distinct tables: raw transformer data, transformer metadata, weather data, weather station metadata, extreme weather data sourced from the internet, calculated extreme weather data, and festival data. The timestamps are recorded in UTC+8 in the format YYYY-MM-DD hh:mm:ss. The temporal range of the data extends from January 2022 to November 2023, with measurements taken at one-hour intervals.
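As a starting point for working with the database, the sketch below lists the tables in an SQLite file and parses timestamps in the stated format using only the Python standard library. The actual table and column names are not given in this section, so the code discovers them at run time instead of assuming them; the helper names and the file path in the usage comment are placeholders.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

UTC_PLUS_8 = timezone(timedelta(hours=8))


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def parse_timestamp(text):
    """Parse 'YYYY-MM-DD hh:mm:ss' and attach the UTC+8 offset."""
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=UTC_PLUS_8)


# Usage (the path is a placeholder for the downloaded Figshare file):
# conn = sqlite3.connect("path/to/dataset.db")
# print(list_tables(conn))
```

Attaching the offset explicitly avoids silent shifts when the timestamps are later compared with UTC-based sources such as reanalysis products.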
Technical Validation
This section evaluates the quality and reliability of the dataset using systematic validation approaches. We concentrate on verifying the key attributes of the data, including temporal trends, geographical patterns, and the influence of meteorological factors on electricity consumption. The validation process guarantees the dataset’s robustness for future applications and research.
Temporal validation of electricity consumption patterns
We first validate the electricity consumption patterns across various time scales—annual, weekly, and daily—to ensure consistency with established trends. We examine the seasonal and daily consumption profiles of 23 residential communities, emphasizing the variation in electricity demand throughout different periods of the year. For instance, Fig. 4 illustrates the annual electricity consumption profiles of 10 cities, demonstrating a clear increase in demand during the summer months, which aligns with typical seasonal trends. Additionally, Fig. 5 presents the weekly and daily electricity consumption profiles, identifying peak consumption times, including midday and evening periods. The daily profiles reveal distinct peaks during midday and evening hours, driven by residential activities such as cooling, cooking, and lighting, with a noticeable dip during early morning hours when household activity is minimal. Weekly patterns indicate higher consumption on weekdays compared to weekends, reflecting standard workweek schedules. Monthly profiles reveal significant seasonal variations, with increased demand in the summer months due to air conditioning use and decreased demand in cooler months, influenced by weather conditions. These findings are validated against historical consumption patterns in the region to confirm their accuracy and reliability.
The Diversity Factor (Fig. 6) is evaluated as a measure of variability in electricity demand over time. This factor emphasizes the fluctuations in demand across different seasons and hourly intervals. The diversity factor is higher in summer and winter, which corresponds to increased variability in electricity consumption during extreme temperature conditions. Notably, the diversity factor is particularly pronounced during the late afternoon and early evening hours (hours 17–22), confirming the dataset’s capacity to capture demand fluctuations effectively.
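For readers who wish to compute a comparable metric, the sketch below uses the conventional power-engineering definition of the diversity factor: the sum of the individual peak demands divided by the peak of the combined demand. The exact formula behind Fig. 6 is not stated in this section, so this definition is an assumption; applying it to subsets of the data (for example one season or one hour of day at a time) is one possible way to obtain seasonal or hourly values.

```python
def diversity_factor(profiles):
    """Sum of individual peaks divided by the peak of the summed demand.

    `profiles` is a list of equally long load series, one per transformer
    (or per community), with no missing values.
    """
    if not profiles:
        raise ValueError("at least one profile is required")
    length = len(profiles[0])
    if any(len(profile) != length for profile in profiles):
        raise ValueError("all profiles must have the same length")
    sum_of_peaks = sum(max(profile) for profile in profiles)
    coincident_peak = max(sum(values) for values in zip(*profiles))
    return sum_of_peaks / coincident_peak
```

Under this definition the value is 1.0 when all series peak at the same time and grows as the individual peaks spread out.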
Geographical validation
We evaluate the geographical variability in electricity consumption across various cities, selected for their diverse socio-economic conditions and geographic locations. The six chosen cities in Guangxi, each characterized by distinct climates and economic structures, are validated for their representative attributes. Figure 5 illustrates the differences in electricity consumption profiles among the cities, including the influence of varying work schedules on consumption patterns. These geographical variations are supported by external data sources, such as demographic and economic reports, ensuring that the selected cities accurately represent broader regional trends.
Validation of meteorological influence on electricity consumption
Moreover, we examined the effects of meteorological variables on electricity consumption patterns, using C0 and C2 as case studies. C0 refers to the coastal city of BH, which has an average temperature of 22.05 degrees Celsius. Coastal cities like BH are particularly vulnerable to marine-related disasters, such as typhoons. In contrast, C2 is located in the northern part of Guangxi, an area prone to cold waves and rainstorms. Figure 7 illustrates the relationship between electricity consumption and various meteorological indicators for C0, including mean temperature and humidity. The correlation is assessed using the Pearson correlation coefficient, which reveals a strong association of both temperature and humidity with electricity consumption, suggesting that these factors exert a significant positive influence on power usage, consistent with prior knowledge.
The relationship between electricity consumption and meteorological indicators for City 0 and City 2 throughout the data collection period. (a) City 0. (b) City 2. The upper triangles present the Pearson correlation coefficients among the various factors, while the lower triangles illustrate the corresponding diagrams. The correlation coefficient is represented by the Pearson coefficient, and the second row indicates the p-values associated with these correlations.
The lower left section of Fig. 7(a) illustrates the daily fluctuations in electricity consumption in relation to temperature. The data indicates a U-shaped relationship between temperature and electricity consumption. Specifically, the curve remains relatively flat within the comfortable temperature range of 18–25 °C, while it rises sharply when temperatures exceed 25 °C and increases more gradually when temperatures fall below 18 °C.
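Because a U-shaped response can mask the strength of the relationship when a single Pearson coefficient is computed over the full temperature range, it may help to evaluate the cold and hot sides separately. The sketch below is an illustration under that assumption: the 18 °C and 25 °C break points come from the text, while the function names and the idea of splitting the sample are not part of the paper's stated procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)


def split_correlation(temps_c, loads, low=18.0, high=25.0):
    """Pearson r of load against temperature below `low` and above `high`."""
    cold = [(t, l) for t, l in zip(temps_c, loads) if t < low]
    hot = [(t, l) for t, l in zip(temps_c, loads) if t > high]
    r_cold = pearson(*zip(*cold)) if len(cold) >= 2 else None
    r_hot = pearson(*zip(*hot)) if len(hot) >= 2 else None
    return r_cold, r_hot
```

A negative coefficient on the cold side together with a positive one on the hot side is what a U-shaped curve would be expected to produce.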
Validation of extreme weather event impact
Table 6 provides a summary of the proportion of extreme weather events recorded annually at the provincial level. In 2023, drought conditions were the most prevalent extreme weather event, comprising 9.9% of all recorded weather occurrences. As global temperatures continue to rise, climate change is likely to have significant implications for peak electricity demand during the summer months, necessitating an increase in generation capacity. Figure 8 presents the effects of extreme weather on electricity consumption patterns in two selected cities, indicating that different types of extreme weather exert varying impacts on electricity demand. Figure 9 compares electricity demand profiles on typical days versus extreme weather days, which include conditions such as extreme heat indices, high and low temperatures, elevated humidity, level 5 winds, and precipitation exceeding 100 mm.
The impacts of extreme weather on electricity consumption for selected cities, C0 and C2. (a) City 0. (b) City 2. Extreme weather events are assessed based on the criteria outlined in Table 3. The colored background serves to highlight these weather occurrences.
Usage Notes
HVAC energy consumption is significantly influenced by various factors, including outdoor temperature, relative humidity, solar radiation, building design, indoor setpoints, and user behavior. In addition to the outdoor temperature, solar radiation serves as a crucial determinant that affects the thermal comfort experienced within a building37. The architectural characteristics of a building, such as insulation quality, window placement, and material selection, play a vital role in determining its thermal efficiency. For instance, buildings equipped with high-performance glazing and strategically designed shading devices can achieve a reduction in solar heat gain of approximately 30% to 50%, resulting in considerable energy savings for HVAC operations38. Moreover, indoor setpoints are critical, as adjustments to thermostat settings can lead to significant energy savings. Specifically, adjusting the setpoint can reduce the overall energy consumption of HVAC systems by over 7%39. User behaviors also significantly impact electricity consumption, particularly when meteorological variables fall within certain thresholds. For example, extreme heat conditions often lead to increased air conditioner usage40. These factors should be considered in future studies to enhance the accuracy and applicability of energy demand modeling. A summary of relevant datasets related to these factors is presented in Supplementary Table 1.
The dataset is derived from a limited number of cities, each possessing unique geographical and socio-economic characteristics. While this may present challenges for direct generalization to other cities, the dataset still offers a valuable foundation for decision-making and planning. By analyzing load patterns and their responses to meteorological variations, this dataset can serve as a baseline for extrapolating insights to cities with similar climates and population dynamics. The cities exhibit distinct relationships concerning average temperature, reflecting the climatic variability across Guangxi. According to the Statistics Bureau of China (https://data.stats.gov.cn/), Guangxi's GDP per capita in 2023 was comparatively low, at approximately $8,317, positioning it 29th among all provinces. Nanning, the provincial capital, has a population of approximately 8.89 million and a GDP per capita of around $9,474. Nanning experiences a subtropical climate, with average temperatures ranging from 10 °C to 18 °C in January and from 26 °C to 34 °C in July. Guilin, located in the northeastern region of Guangxi, has a population of about 4.95 million and a GDP per capita of approximately $7,851. Guilin's climate is characterized by mild winters and warm, humid summers, with average temperatures ranging from 6 °C to 15 °C in January and from 26 °C to 34 °C in July. In the western part of Guangxi, Baise has a population of 3.57 million and a GDP per capita of roughly $7,980. Baise has a tropical monsoon climate, with average temperatures ranging from 10 °C to 20 °C in January and from 25 °C to 35 °C in July.
Moreover, this dataset serves as a resource for developing time-of-use electricity pricing strategies, providing a data-driven approach to structuring tariffs that align with consumption behaviors. By analyzing peak and off-peak usage patterns, policymakers can design pricing schemes that encourage consumers to shift their energy consumption to periods of lower demand. This approach alleviates stress on the grid during peak times and promotes more efficient energy use41.
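As a simple illustration of how peak and off-peak windows might be derived from the hourly data, the sketch below averages the load by hour of day and selects the highest-demand hours. The choice of four peak hours and the function names are assumptions; an actual tariff design would involve further criteria that are outside the scope of this dataset.

```python
from collections import defaultdict
from datetime import datetime


def mean_hourly_profile(records):
    """Average load for each hour of the day from (datetime, load) pairs."""
    totals, counts = defaultdict(float), defaultdict(int)
    for stamp, load in records:
        if load is None:
            continue  # skip missing readings
        totals[stamp.hour] += load
        counts[stamp.hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(counts)}


def peak_hours(profile, n_peak=4):
    """Return the `n_peak` hours with the highest mean load, in clock order."""
    ranked = sorted(profile, key=profile.get, reverse=True)
    return sorted(ranked[:n_peak])
```

Running the same two functions separately for summer and winter months would show whether the candidate peak window shifts with the season.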
While prospective climate scenarios may differ from the conditions observed during the data collection phase, this dataset serves as a vital resource for long-term strategic planning by providing baseline information on energy consumption trends. Historical data offers a foundational reference for analyzing the potential evolution of energy consumption patterns in response to rising temperatures. For instance, Matteo et al. demonstrated the use of historical data in modeling the relationship between average temperature and energy demand across Italy42. Furthermore, the integration of historical datasets with advanced climate models enables researchers to predict energy demand under various climate change scenarios, thereby strengthening the planning for resilient energy systems43.
Climate change is characterized by long-term alterations in average weather patterns, where immediate changes may not be readily observable. The data presented in this study remain instrumental as a foundational reference for comprehending current HVAC energy consumption trends, which can be leveraged to track future variations. The Intergovernmental Panel on Climate Change (IPCC) projects that if current warming trends persist, future temperatures could rise by 1.5 °C above pre-industrial levels between 2030 and 2052, which would have significant implications for energy consumption44. Our dataset, gathered over a period exceeding one year, predominantly captures typical conditions, thereby providing a solid basis for evaluating future shifts in energy usage attributable to rising temperatures. Additionally, the dataset encompasses instances of extreme weather, including both high and low temperature events, which can serve as benchmarks for assessing the effects of such occurrences on load demand. A study examining the relationship between electricity usage and temperature across various regions of China revealed consistent patterns over time40. Although climate change unfolds gradually, the frequency of extreme weather events, including heatwaves and cold spells, is increasing annually. Such events have a pronounced effect on residential electricity consumption, as evidenced by recent extreme weather occurrences in different regions. Consequently, this dataset remains instrumental in understanding user behavior and electricity consumption patterns over an extended timeframe, thereby supporting adaptive planning for future climate scenarios.
Achieving net-zero energy buildings, structures that generate as much energy as they consume, requires the effective integration of distributed energy resources (DERs), such as solar photovoltaic (PV) systems and energy storage solutions. Net-zero buildings not only contribute to sustainability but also reduce reliance on grid electricity. The critical challenge in this integration is addressing the temporal mismatch between energy generation and consumption. Solar PV systems, for example, produce peak energy during midday, while residential electricity demand often peaks in the late afternoon and evening. This discrepancy underscores the necessity for efficient energy storage solutions to balance supply and demand45. Accurate modeling of energy consumption patterns allows for the optimal sizing and placement of renewable energy systems, ensuring they effectively meet the energy requirements of buildings. Research has demonstrated that analyzing temporal variations in energy usage is crucial for integrating energy storage systems, enabling the storage of excess energy during low-demand periods and its utilization during peak demand times46.
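The temporal mismatch described above can be illustrated with a simple hourly simulation. The following sketch applies a greedy dispatch rule (charge the battery from PV surplus, discharge it to cover deficits) and reports the resulting grid import and PV export. The dispatch rule, the single charging-efficiency parameter, and the hourly profiles are simplifying assumptions for illustration, not a method from this study.

```python
def simulate_battery(load_kwh, pv_kwh, capacity_kwh, efficiency=0.9):
    """Greedy hourly dispatch of a battery between PV output and load.

    Returns (grid_import_kwh, pv_export_kwh). Losses are applied on charging only.
    """
    if len(load_kwh) != len(pv_kwh):
        raise ValueError("load and PV series must have the same length")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    stored = 0.0
    grid_import = 0.0
    pv_export = 0.0
    for load, pv in zip(load_kwh, pv_kwh):
        net = pv - load
        if net >= 0:
            # Surplus hour: store what fits, export the remainder.
            charge = min(net * efficiency, capacity_kwh - stored)
            stored += charge
            pv_export += net - charge / efficiency
        else:
            # Deficit hour: discharge first, import whatever is still missing.
            discharge = min(-net, stored)
            stored -= discharge
            grid_import += -net - discharge
    return grid_import, pv_export


# Hypothetical profiles: midday PV peak, evening load peak (kWh per hour).
load = [1.0, 1.0, 1.0, 3.0, 3.0]
pv = [0.0, 3.0, 3.0, 0.0, 0.0]

for capacity in (0.0, 4.0):
    imported, exported = simulate_battery(load, pv, capacity, efficiency=1.0)
    print(f"capacity {capacity} kWh: import {imported} kWh, export {exported} kWh")
```

Running the simulation over a range of capacities against measured load profiles gives a rough indication of how much storage is needed before additional capacity stops reducing grid import.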
This dataset offers valuable insights into consumption patterns and load demands, serving as a key tool for planning distributed energy systems aimed at achieving net-zero energy goals. It facilitates the design of cost-effective energy systems by enabling precise modeling of energy needs, which is essential for sustainable and resilient building designs. In summary, leveraging detailed consumption data and optimizing the integration of distributed energy resources and storage systems are fundamental steps toward realizing net-zero energy buildings. These efforts are vital for creating a sustainable and resilient built environment, aligning with global initiatives to reduce carbon emissions and mitigate climate change47.
Code availability
The code was implemented in Python 3.11. The source code, which covers data preprocessing, data analysis, and data visualization, is available on GitHub (https://github.com/yuruotao/District-power) under the MIT license. The repository includes a README.md file with detailed instructions, and all functions are commented.
References
Petri, Y. & Caldeira, K. Impacts of global warming on residential heating and cooling degree-days in the United States. Scientific Reports 5, 12427 (2015).
Wenz, L., Levermann, A. & Auffhammer, M. North–south polarization of European electricity consumption under future warming. Proceedings of the National Academy of Sciences 114, E7910–E7918 (2017).
Auffhammer, M., Baylis, P. & Hausman, C. H. Climate change is projected to have severe impacts on the frequency and intensity of peak electricity demand across the United States. Proceedings of the National Academy of Sciences 114, 1886–1891 (2017).
Davis, L. W. & Gertler, P. J. Contribution of air conditioning adoption to future energy use under global warming. Proceedings of the National Academy of Sciences 112, 5962–5967 (2015).
Cao, J. et al. Spatially heterogeneous effect of temperature on electricity consumption in Shenzhen, China. Building and Environment 241, 110468 (2023).
in Smart-Grid Smart-City Customer Trial Data (2022).
Anvari, M. et al. Data-driven load profiles and the dynamics of residential electricity consumption. Nature Communications 13, 4593 (2022).
Monacchi, A., Egarter, D., Elmenreich, W., D’Alessandro, S. & Tonello, A. M. in 2014 IEEE International Conference on Smart Grid Communications (SmartGridComm). 511–516 (IEEE).
Makonin, S., Ellert, B., Bajić, I. V. & Popowich, F. Electricity, water, and natural gas consumption of a residential house in Canada from 2012 to 2014. Scientific Data 3, 1–12 (2016).
Liu, G. et al. EWELD: A Large-Scale Industrial and Commercial Load Dataset in Extreme Weather Events. Scientific Data 10, 615 (2023).
Zhou, K., Hu, D., Hu, R. & Zhou, J. High-resolution electric power load data of an industrial park with multiple types of buildings in China. Scientific Data 10, 870 (2023).
Data Platform – Open Power System Data, https://open-power-system-data.org/ (2020).
Wenninger, M., Maier, A. & Schmidt, J. DEDDIAG, a domestic electricity demand dataset of individual appliances in Germany. Scientific Data 8, 176 (2021).
Commission for Energy Regulation. (Irish Social Science Data Archive, Dublin, Ireland, 2012).
in Sustainable Open Innovation Initiative.
Komatsu, H. & Kimura, O. Customer segmentation based on smart meter data analytics: Behavioral similarities with manual categorization for building types. Energy and Buildings 283, 112831 (2023).
Shin, C. et al. The ENERTALK dataset, 15 Hz electricity consumption data from 22 houses in Korea. Scientific Data 6, 193 (2019).
Pereira, L. et al. FIKElectricity: A Electricity Consumption Dataset from Three Restaurant Kitchens in Portugal. Scientific Data 10, 779 (2023).
Trindade, A. in UCI Machine Learning Repository-Electricity Load Diagrams 2011–2014 dataset (2016).
Quesada, C., Astigarraga, L., Merveille, C. & Borges, C. E. An electricity smart meter dataset of Spanish households: insights into consumption patterns. Scientific Data 11, 59 (2024).
Murray, D., Stankovic, L. & Stankovic, V. An electrical load measurements dataset of United Kingdom households from a two-year longitudinal study. Scientific Data 4, 1–12 (2017).
in Energy demand research project: Early smart meter trials, 2007–2010 (2018).
in METER: UK Household Electricity and Activity Survey, 2016–2019: Secure Access (eds Grunewald, P. & Diakonova, M.) (2020).
Pullinger, M. et al. The IDEAL household energy dataset, electricity, gas, contextual sensor data and survey data for 255 UK homes. Scientific Data 8, 146 (2021).
Craig, T., Polhill, J. G., Dent, I., Galan-Diaz, C. & Heslop, S. The north east scotland energy monitoring project: exploring relationships between household occupants and energy usage. Energy and Buildings 75, 493–503 (2014).
Webborn, E. et al. The SERL Observatory Dataset: Longitudinal smart meter electricity and gas data, survey, EPC and climate data for over 13,000 households in Great Britain. Energies 14, 6934 (2021).
DataStore, L. in Smartmeter energy consumption data in London households (ed UK Power Networks) (2013).
Rushby, T., Anderson, B., James, P. & Bahaj, A. Solent Achieving Value from Efficiency (SAVE) Data, 2017–2018 (2020).
Kelly, J. & Knottenbelt, W. The UK-DALE dataset, domestic appliance-level electricity demand and whole-house demand from five UK homes. Scientific Data 2, 1–14 (2015).
Chavat, J., Nesmachnow, S., Graneri, J. & Alvez, G. ECD-UY, detailed household electricity consumption dataset of Uruguay. Scientific Data 9, 21 (2022).
Wilson, E. J. et al. End-use load profiles for the US building stock: Methodology and results of model calibration, validation, and uncertainty quantification. (National Renewable Energy Lab.(NREL), Golden, CO (United States) 2022).
Wang, C. et al. Impacts of climate change, population growth, and power sector decarbonization on urban building energy use. Nature Communications 14, 1–16 (2023).
Meinrenken, C. J. et al. MFRED, 10 second interval real and reactive power for groups of 390 US apartments of varying size and vintage. Scientific Data 7, 375 (2020).
Wang, C., Li, X., Sun, W., An, J. & Gao, S. Occupant behavior, thermal environment, and appliance electricity use of a single-family apartment in China. Scientific Data 11, 65 (2024).
Bell, B. et al. The ERA5 global reanalysis: Preliminary extension to 1950. Quarterly Journal of the Royal Meteorological Society 147, 4186–4227 (2021).
Li, B. et al. A Large-The provincial shapefile and whole database. figshare https://doi.org/10.6084/m9.figshare.26333452.v1 (2024).
La Gennusa, M., Nucara, A., Pietrafesa, M. & Rizzo, G. A model for managing and evaluating solar radiation for indoor thermal comfort. Solar Energy 81, 594–606 (2007).
Tzempelikos, A. & Athienitis, A. K. The impact of shading design and control on building cooling and lighting demand. Solar Energy 81, 369–382 (2007).
Kusiak, A., Li, M. & Tang, F. Modeling and optimization of HVAC energy consumption. Applied Energy 87, 3092–3102 (2010).
Li, Y., Pizer, W. A. & Wu, L. Climate change and residential electricity consumption in the Yangtze River Delta, China. Proceedings of the National Academy of Sciences 116, 472–477 (2019).
Nicolson, M. L., Fell, M. J. & Huebner, G. M. Consumer demand for time of use electricity tariffs: A systematized review of the empirical evidence. Renewable and Sustainable Energy Reviews 97, 276–289 (2018).
De Felice, M., Alessandri, A. & Catalano, F. Seasonal climate forecasts for medium-term electricity demand forecasting. Applied Energy 137, 435–444 (2015).
Morais, L. B. S. et al. Short-term load forecasting using neural networks and global climate models: An application to a large-scale electrical power system. Applied Energy 348, 121439 (2023).
Allan, R. P. et al. in Climate change 2021: The physical science basis. Contribution of working group I to the sixth assessment report of the intergovernmental panel on climate change 3–32 (Cambridge University Press, 2023).
Lund, H., Möller, B., Mathiesen, B. V. & Dyrelund, A. The role of district heating in future renewable energy systems. Energy 35, 1381–1390 (2010).
Wörman, A., Pechlivanidis, I., Mewes, D., Riml, J. & Bertacchi Uvo, C. Spatiotemporal management of solar, wind and hydropower across continental Europe. Communications Engineering 3, 3 (2024).
O’Shaughnessy, E., Cutler, D., Ardani, K. & Margolis, R. Solar plus: Optimization of distributed solar PV through battery storage and dispatchable load in residential buildings. Applied Energy 213, 11–21 (2018).
Schlemminger, M., Ohrdes, T., Schneider, E. & Knoop, M. Dataset on electrical single-family house and heat pump load profiles in Germany. Scientific Data 9, 56 (2022).
Acknowledgements
This work was supported in part by the Specific Research Project of Guangxi for Research Bases and Talents (2022AC21257), and the National Natural Science Foundation of China (52122706).
Author information
Contributions
Conceptualization and methodology, B.L., R.Y.; visualization and writing, B.L., R.Y., M.Y.; formal analysis, H.Z., G.R.; investigation, W.D., S.L., G.R., D.X.; funding acquisition, W.D., H.Z.; supervision, H.Z. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, B., Yu, R., Gan, K. et al. A Large-Scale Residential Load Dataset in a Southern Province of China. Sci Data 12, 450 (2025). https://doi.org/10.1038/s41597-025-04766-7
DOI: https://doi.org/10.1038/s41597-025-04766-7