Background & Summary

China has become the leading global source of bilateral financing for energy and continues to play a central role in investment through the Belt and Road Initiative (BRI)1,2. The BRI, China’s foreign initiative since 2013, is often noted for its immense scale3, which amounts to trillions of dollars, and is widely regarded as one of the largest infrastructure initiatives of the past decade4. The initiative engages approximately two-thirds of the world’s population and accounts for one-third of the global economy5. Moreover, the Chinese approach under the BRI prioritises “hardware” investments6,7, especially in the energy infrastructure sector2,8,9.

China’s expansive overseas energy investments have generated considerable debate. Globally, electricity demand is projected to rise rapidly10, driven by population growth, urbanisation, and industrial expansion. Achieving universal electricity access in low- and middle-income countries is estimated to require approximately USD 3.1 trillion in power infrastructure investments between 2016 and 203011. Bilateral energy finance can help bridge these infrastructure gaps1,12, but it also raises concerns regarding environmental impact and financial sustainability. Early BRI lending supported numerous coal-fired power plants, drawing criticism that the initiative exported high-carbon assets13. Many host countries have taken on substantial external debt to finance these projects, and the associated loan terms shape project feasibility, debt service profiles, and long-term refinancing risks14,15. Understanding the activities of key financiers is crucial for interpreting the economic and strategic drivers of these capital flows. These financiers include policy banks13,16,17, such as the China Development Bank (CDB) and the Export–Import Bank of China (CHEXIM), as well as state-owned commercial banks and enterprises. BRI energy investments are not isolated events18,19; rather, they are deeply embedded within the dynamics of the global energy transition. Ongoing shifts influence these projects in energy technology, international financial norms, and the evolving geopolitics of China’s global engagement. In this context, a clear view of the scope, distribution, and conditions of BRI energy finance is therefore essential to assess alignment with sustainable development and climate objectives.

While many studies have examined the BRI at the country or regional level20,21,22 or by broad geographic region23,24,25,26,27,28, a critical yet under-explored analytical approach is the economic corridors lens. These corridors are explicitly defined in China’s policy documents (e.g., the 2015 Vision and Actions White Paper29) and serve as structured regional groupings for targeted infrastructure investment and economic integration30,31. Researchers have noted that a corridor-based perspective can reveal variations in policy focus, project selection, and risk profiles that would be obscured in larger regional or global analyses32,33. This analytical lens is further validated by official Chinese government documents34, which define and prioritise these corridors, suggesting they may represent distinct arenas for strategic policy implementation. For example, some corridors might be prioritised for resource access (e.g., the China–Mongolia–Russia Corridor32 and the China–Central Asia–West Asia Corridor35), whereas others emphasise market development (e.g., the New Eurasian Land Bridge31), resulting in differentiated investment patterns and risk environments.

Energy investments are central to this corridor-based framework31,32,36, and are intended to reduce infrastructure barriers while fostering new sources of growth6,37. Since corridor boundaries often correspond to known transit routes, energy needs, and regional partnerships, analysis at this level can reveal fine-grained investment patterns, such as preferences for specific fuel types or financing structures in each corridor31,36. This specificity enables a clearer assessment of whether BRI-related energy projects strengthen local power grids, meet urgent electricity needs, or help facilitate broader economic integration. Some corridors feature heavier fossil-fuel development, whereas others lean toward renewable projects, reflecting differing resource endowments and policy alignments6,38. Focusing on these officially designated corridors allows a precise analysis of how regional contexts and policy frameworks shape investment patterns, financing conditions, and development outcomes, thereby overcoming the limitations of broader geographic classifications.

Despite the BRI’s significance, comprehensive, project-level data on its energy investments remain elusive. No centralised official database exists, and existing public sources differ in scope, methodology, and coverage, precluding integrated analysis11,15. To build our dataset, we therefore reviewed six prominent sources that capture Chinese overseas activity: William & Mary’s Global Chinese Development Finance (v3.0) (AidData)39, Boston University’s China’s Global Energy Finance (GGEF) database40, the American Enterprise Institute’s China Global Investment Tracker (AEI)41, the World Resources Institute’s China Overseas Finance Inventory (COFI) database42, Fudan University’s Green Finance & Development Centre BRI Database (GFDC)43, and Global Energy Monitor (GEM) database44. For the construction of a project-level finance dataset, we selected four primary sources, AidData, CGEF, AEI, and COFI, because each provides project-level observations with investment or loan amounts and citable provenance. AidData offers broad official-finance coverage and, uniquely among public sources, reports financing terms for many loans, though with incomplete coverage7,39,45; CGEF restricts to policy bank sovereign energy loans2,46; AEI captures large corporate investments and construction contracts across sectors41; and COFI provides plant-level records for power generation in BRI countries42. The GFDC and GEM databases serve distinct but ancillary roles. GFDC publishes BRI country lists and headline deal aggregates, but does not release a public, project-level micro dataset with harmonised finance fields43. Its inclusion criteria differ from those of our primary sources; we therefore excluded GFDC from the dataset construction. GEM offers excellent plant-level technical resolution and selective finance trackers in coal and gas, but most plant trackers lack investment amounts. We excluded GEM for this reason as well. Table 1 summarises the applicability and limitations across sources, motivating our harmonisation approach. Methodological differences, distinctions between official versus corporate finance data, and varying reporting thresholds collectively necessitated the creation of an integrated and de-duplicated dataset11,47,48.

Table 1 Comparison of major datasets on Chinese overseas finance and their coverage gaps.

To address these research gaps, we constructed a comprehensive and validated dataset of Chinese power sector investments under the BRI for the period 2013–2023. The dataset consolidates records from four publicly available sources. We applied a multi-stage entity-resolution protocol to standardise project identity, assign each project to an economic corridor, and harmonise technical attributes (for example, technology and installed capacity) with financial fields such as investment amount, lender, and financing instrument. A distinctive feature of the dataset is the systematic curation of loan terms (interest rate, maturity, and grace period) recorded in parallel as reported and as imputed values, with explicit flags indicating uncertainty. An ancillary file provides modelled annual debt-service schedules for analytical use. The final database contains 412 unique power sector investments and is version-controlled and openly licensed to support replication and extension.

By resolving fragmentation across prior sources and disclosing financing structures in detail, the dataset enables comparable measurement of Chinese overseas power sector finance across institutions, instruments, technologies, countries, and BRI corridors. It supports rigorous analysis of investment trends and technology mixes, enables corridor-based comparisons of financing conditions, and facilitates modelling of concessionality, debt service burdens, and refinancing risk at the project, country, and corridor levels. The project-level identifiers and full provenance fields allow reliable linkage to external economic, energy, and environmental datasets, thereby providing a validated evidence base for assessing the BRI’s implications for global energy finance, debt sustainability, and progress toward climate objectives. This Data Descriptor refrains from policy interpretation; instead, it provides a validated evidence base to enable others to analyse trends in BRI overseas energy finance.

Methods

The goal of this work was to create a comprehensive and validated dataset of China’s energy investments under the BRI. We implemented a two-component workflow to compile the project-level dataset. First, for data collection and processing, we merged four public databases and filtered the projects to those in BRI participant countries based on the data inclusion criteria and definitions. Second, for financing data processing, we extracted disclosed loan terms and, when missing, generated flagged estimates using a conservative nearest neighbour procedure and produced modelled annual debt service schedules.

Data collection and inclusion criteria

To capture all forms of BRI energy investments, we used four publicly available databases as the core input: William & Mary’s Global Chinese Development Finance Dataset, Boston University’s China Global Energy Finance (CGEF), American Enterprise Institute’s China Global Investment Tracker (AEI), and World Resources Institute’s China Overseas Finance Inventory (COFI). Together, these sources span official development finance with partial disclosure of loan terms (AidData), policy bank sovereign lending in energy (CGEF), large outward investment deals across sectors (AEI), and plant-level records for power generation in BRI countries (COFI). Merging them broadens coverage while preserving a consistent, finance-focused unit of observation for each project. These roles are consistent with Table 1 and the variable mapping in Supplementary Table 2.

Clear inclusion and exclusion criteria were applied to ensure the dataset’s analytical focus and consistency. To be included, a project had to meet four conditions.

  1. (1)

    Start year within BRI era: For the time frame, we filtered each source to include projects with commitment or investment dates in 2013 or later, up to the end of 2023. This captures the period from BRI’s announcement through the present, aligning with the initiative’s timeline.

  2. (2)

    Host is a BRI country: BRI participant countries were defined as those that joined the initiative by signing a Belt and Road cooperation Memorandum of Understanding with China. In addition, if a project occurred in 2013 but the country joined BRI later, we included it; the criterion was that the host country was a BRI country at any point. The information on BRI countries, as listed in source databases, is included in Supplementary Table 2.

  3. (3)

    Energy infrastructure focus: For this dataset, “energy infrastructure” denoted investment in physical assets within the power and fuel systems of BRI countries from 2013 to 2023. This definition covered both the development of new assets that add, connect, or supply capacity and financial transactions involving existing assets. Eligible projects therefore comprised: (I) the construction or expansion of power generation facilities across all fuel types (coal, gas, oil, hydro, solar, wind); (ii) the development of grid infrastructure, including transmission lines and substations; (iii) the acquisition or merger of existing power generation facilities; and (iv) the creation of fuel-supply infrastructure that was integrally linked to power provision. Borderline cases were adjudicated using a “link-to-power-system” rule; for example, an upstream fossil-fuel extraction project was included only when its output was explicitly dedicated as feedstock for a specific, jointly developed power plant in the host country. We excluded purely extractive-for-export or trade-only activities that did not result in the creation or transfer of ownership of physical energy infrastructure. The technical information, as listed in source databases, is included in Supplementary Table 3.

  4. (4)

    Chinese financial involvement: The project had to involve at least one Chinese entity in the investment or financing process. We defined ‘Chinese financing’ to include sovereign loans from policy banks, loans from major state-owned commercial banks, and direct investments by Chinese state-owned or private enterprises. For each project, we identified the primary Chinese financier or investor and noted the type of financing (e.g., policy bank loan, commercial bank loan, enterprise equity) based on source information. If a project had multiple financiers (Chinese and international), all co-financiers were recorded, but at least one Chinese entity had to be involved for the project to qualify as a BRI energy investment. The investors and their coverage information, as listed in source databases, were included in Supplementary Table 4.

Information regarding the filtering process and the screening procedures for data that does not meet the requirements can be found in Table 2.

Table 2 Data filtering and merging based on the data inclusion criteria.

By applying these criteria, we acknowledged that some ambiguous or marginal cases might have been excluded. For example, a small, privately financed solar facility developed by a Chinese manufacturer in a BRI country may not have been captured in our sources and would therefore be absent from the dataset. Since the analysis focused solely on officially recognised BRI participants, Chinese-financed energy projects in countries outside the BRI, such as India or Brazil, were not included even when they occurred between 2013 and 2023. We adopted these boundaries to preserve analytical clarity and relevance to BRI policy discourse. The dataset, therefore, reflects what was publicly reported and verifiable rather than a complete census of all Chinese overseas energy activities. As a result, smaller private investments and projects in jurisdictions that are not BRI participants may be underrepresented.

Data integration and verification

To construct a comprehensive and validated dataset, we designed and executed a multi-stage, reference-based entity resolution workflow (Fig. 1). This process was designed to harmonise records from four heterogeneous sources, eliminate duplication, and ensure the highest possible level of data accuracy. The workflow was structured into four distinct phases.

Fig. 1
Fig. 1
Full size image

Overview of the data construction process for compiling the BRI energy investment dataset.

Phase 1: Initial scoping and reference set generation

The workflow begins by using AidData’s Global Chinese Development Finance database as the seed list to identify candidate BRI energy projects. AidData provides the broadest project-level coverage of China’s official overseas development finance, including loans and grants, along with geocoding and extensive source documentation39. We first applied the first four core inclusion criteria, as defined previously (project initiation in or after 2013, location in an official BRI participant country, and involvement in tangible energy infrastructure), to filter AidData’s extensive records. Since AidData does not explicitly categorise physical energy infrastructure projects, we applied field-level filtering to isolate physical energy infrastructure construction or expansion projects. In practice, this meant screening by sector codes, project titles, purpose descriptions, and narrative fields to exclude entries that were purely exploratory, trade-related, or non-infrastructure (e.g., technical assistance, feasibility studies, fuel trade deals).

Additionally, since AidData’s entries lack certain technical details, such as installed capacity or specific energy subtype, we took extra steps to refine this seed list. We cross-checked the project narratives in AidData against the original record descriptions to confirm that each candidate project satisfied our inclusion criteria (Chinese financed, power sector infrastructure, located in an officially recognised BRI participant, and committed between 2013 and 2023). We then standardised project names and removed duplicates or variant name entries. This filtering and cleaning process yielded a preliminary list of Chinese-financed energy infrastructure projects that met our inclusion criteria and fell within the scope of the BRI. This seed list served as the anchor for subsequent cross-database matching.

Phase 2: Reference-based matching and consolidation

With the reference set established, each project in the AidData seed list was systematically cross-referenced against the other three databases (CGEF, AEI, and COFI). Using the AidData entries as our reference, we applied multi-source fuzzy matching to identify cases where entries from different sources refer to the same real-world project. Each potential match was evaluated on multiple attributes, and we did not rely solely on the project name. We required agreement within defined tolerances to treat two records as the same project. Key matching criteria included:

  • Host country

  • Project or plant name, normalised to account for different spellings, translations, or abbreviations.

  • Installed capacity of the facility, with a tolerance (i.e., allowing less than 10 MW difference for smaller projects, to account for rounding or unit conversions).

  • Investment amount or loan size, with a tolerance (i.e., 5% difference in reported dollar value, considering currency conversions or rounding differences).

  • Year of commitment or financial close, allowing for slight offsets, as different sources may record either the agreement year or the construction start year. A difference of approximately one year is generally acceptable.

Matched records were consolidated into single, composite project entries. For each confirmed project, we merged information from all sources that referred to it. This consolidation involved standardising the project name and ensuring all unique identifiers were linked to the unified record. When sources provided differing data for an attribute, we resolved conflicts by prioritising the most authoritative or well-documented source, such as official disclosures (e.g., from government or multilateral reports). This Phase 2 process yielded a core list of confidently identified projects, each of which appeared in multiple public datasets. Entries present in only one source were flagged and moved to Phase 3 for further verification.

Phase 3: Unique entry identification and verification

Following cross-source consolidation, some projects remained that were recorded in only one source. Phase 3 involved reviewing these single-source records to determine whether they should be included in the final dataset. We imposed a strict verification rule to maintain data reliability, especially following the principle used in the manuscript’s approach: namely, that no project was included based on a lone source unless independently confirmed.

Our procedure for single-source cases was as follows:

  • Inter-source check: If a project from one of the supplemental databases (BU, AEI, or COFI) did not match anything in AidData’s list, we first checked whether that project might appear in any of the other two databases. For example, if AEI listed a 2018 solar farm in Country B that AidData did not have, we checked to see if BU or COFI had any records of a similar project in Country B around the same time.

  • Public documentation search: If the project remained unique to a single database, we then sought independent public documentation to corroborate it. This meant searching for official announcements, government or utility press releases, company reports, or reputable media articles that mention the project and confirm its basic details.

Only if we found at least one independent external source confirming the project did we include it in the final dataset. Essentially, a single-source project had to be backed by a second source (even if that second source was not one of the major databases, but rather a news article or an official document). This mirrors a conservative approach to avoid false positives (e.g., cases where a database might have listed a planned project that never actually proceeded). If any other source could not corroborate a project, it was excluded from the final inventory. These were often cases of tentative deals or reported memoranda that were not confirmed elsewhere. By excluding such entries, we improved the reliability of the dataset, focusing only on projects with evidence of actual commitment or implementation. By the end of Phase 3, all projects retained in the dataset were supported by multiple sources of evidence.

Phase 4: Normalisation, enrichment and finalisation

The final phase unifies all verified projects into a clean, standardised dataset. This ensured each real-world project appears exactly once, with a complete set of technical and financial attributes, and with flags indicating any data that required imputation or special treatment. Key tasks in this phase included:

  • Project name standardisation: Unified project naming conventions by resolving different transliterations, abbreviations, or punctuation. This also included tagging each project with a unique identifier to avoid any ambiguity in future referencing.

  • Lender and investor labelling: Standardised the names of Chinese banks, companies, and agencies involved. For instance, “China Development Bank” and “CDB” were recorded under a single standardised label, and distinctions between types of finance were captured in separate fields.

  • Technical attribute normalisation: We ensured that key technical fields were recorded in consistent units and categories. The Installed Capacity of power generation projects was stored as a numeric field in megawatts (MW). We defined standardised categories for Energy Source (e.g., Coal, Natural Gas, Hydro, etc.) and for Energy Sub-sector (e.g., Generation, Transmission, or Extraction Infrastructure).

  • Provenance and metadata: We recorded complete provenance information for each project. Each entry includes references to the source databases from which it originated, as well as any external documentation used for verification. We also retained any original descriptive notes from sources in a comments field. This level of documentation ensures transparency, allowing users to trace each data point back to its origin.

A key enrichment step, and a distinctive feature of this dataset, was the provision of fields for loan terms (interest rate, maturity, and grace period), which were rarely disclosed in public announcements. Many otherwise valid loan-financed projects, therefore, lack complete loan term information. To preserve analytical coverage while maintaining transparency, we implemented a two-tier imputation strategy and retained parallel “Original” and “Estimate” columns for every term, together with explicit flags indicating whether a value was imputed.

First, we checked for reported loan terms from the same energy sub-sector and the same BRI corridor. The assumption is that projects of a similar nature in the same corridor likely received loans on comparable terms, given similar economic conditions and policy contexts. If that was not available, we then used projects with the same lender and loan type within the broader region as a reference. For example, a policy-bank loan for a power plant in a neighbouring country. In a similar case, if the interest rate for a CHEXIM loan to a solar project in East Africa was unknown, but we had other solar project loans by CHEXIM in Africa, we used their average terms as a proxy for the unknown rate.

All imputed values are clearly flagged in the dataset, and the original (pre-imputation) fields are preserved alongside them (often as blank or null for missing values). This way, any analysis can either include the imputed values for completeness or ignore them if only the actual reported data was desired. We also conducted validation tests in the later Technical Validation section (Validation of Imputed Financial Terms), comparing the distribution of imputed interest rates and maturities against those reported, to ensure that our imputation did not introduce bias. The imputed distributions closely mirrored the reported ones, giving us confidence that the approach was reasonable and did not, for instance, systematically under- or over-estimate loan generosity.

To facilitate explicit, region-specific analysis, an economic corridor designation was assigned to each project (Fig. 2). This classification leverages the six major overland corridors outlined in official BRI planning documents34. The maritime groupings were also necessary to conduct a comprehensive geographic analysis, as these regions are major recipients of Chinese energy finance but are not included in the six officially designated overland economic corridors. For example, Latin America was not part of the BRI’s original scope, but by 2017, it was being described by Chinese officials as the “natural extension” of the 21st-century Maritime Silk Road49,50. To capture these maritime regions, we defined two additional analytical groupings for maritime BRI partner nations: an “SSA” corridor covering countries participating in the BRI in Sub–Saharan Africa, and a “LATAM” corridor covering countries participating in the BRI in Latin America and the Caribbean. With each project mapped into one of these eight corridors, the dataset allows for region-specific analysis of Chinese investment patterns. Table 3 provides the list of countries in each corridor.

Fig. 2
Fig. 2
Full size image

The BRI Economic Corridor.

Table 3 Economic corridor classifications, constituent countries, and typical energy focus.

After Phase 4, we finalised the master dataset of 412 unique BRI energy infrastructure projects (2013–2023), with each project represented by one record containing all reconciled information. Approximately 82% of these projects are corroborated by multiple public sources, defined as at least two of the four core databases, while the remaining 18% are documented in a single primary source that nonetheless provides project-level information with citable provenance. All records have consistent fields and units, and any data point that was adjusted or estimated (such as an imputed loan term or a converted amount) was accompanied by an explicit flag for full transparency.

Data attributes and preparation

Each project in the final dataset was described by a comprehensive set of attributes (Table 4). These fields characterise each project across multiple dimensions and can be logically grouped into five categories: (1) project identification and timeline; (2) geographical and strategic context; (3) technical specifications; (4) financial structure; and (5) data provenance and qualitative information.

  1. (1)

    Project Identification & Timeline: Provides basic characteristics and key dates for the project. This category includes a unique Project ID, the Project Title, the Commitment Year (the year financing was committed, 2013–2023), the Implementation Year, the Completion Year, and the calculated Project Duration (in years, between implementation and completion).

  2. (2)

    Geographical & Strategic Context: Provides information about where the project is located and its strategic grouping. This includes the Recipient Country (host country where the project is implemented), the assigned BRI Corridor, and a broader Recipient Region.

  3. (3)

    Technical Specifications: Provides technical details of the project. This includes Installed Capacity for generation projects (in megawatts, MW), Energy Source (primary energy type, e.g., Coal, Hydro, Gas, Solar, etc.), Energy Source Code (a three-letter code for the energy source, e.g., COA, HYD, GAS, SOL, TRN), and Energy Sub-sector (the project’s role in the energy value chain, e.g., Power Generation, Resource Extraction, Transmission & Distribution). It also includes three flag fields indicating whether the project is considered clean energy (based on the energy source; coded as G = low-carbon, N = carbon-intensive, T = neutral for grid infrastructure) and whether it involves physical infrastructure construction (Yes/No; for example, construction or expansion of facilities = Yes, whereas design, merger/acquisition, or sale of assets = No).

  4. (4)

    Financial Structure: Provides a breakdown of the project’s financing and terms. This covers the Total Investment (in million USD, typically the Chinese contribution or total project cost if Chinese involvement is dominant), the Financing Method (bilateral commercial loan, syndicated loan, export buyer’s credit loan, concessional loan, supplier’s credit loan, enterprises investment), Investment Type (Debt, Equity, or Both), Share Size (percentage of Chinese ownership for equity investments, if applicable), Main Lender or Investor, Funding Agency Type (institutional category of the main lender, e.g., “State-owned Policy Bank”), Co-financing Agencies (other involved institutions, if any), and an Entity Combo classification describing the mix of Chinese financing entities. For projects financed by loans, this category also includes detailed loan terms, such as Maturity, Interest Rate, and Grace Period, each provided in both the “Original” value (as reported) and the “Estimate” (as imputed). We also record whether the investment amount was estimated (a flag indicating if the amount was calculated due to a lack of direct disclosure). Additionally, a field indicates if the financing was bilateral or part of a multilateral arrangement.

  5. (5)

    Data Provenance & Qualitative Information: Provides fields that capture the source and context for each project entry. This includes a Detailed Info narrative providing a descriptive synopsis of the project (history, context, notable impacts, etc.), the Data Source (which public dataset or source the record was drawn from, e.g., AidData, AEI, CGEF, or COFI), the Original Source Code (the unique project identifier from the original source dataset, if applicable), and up to two Web Source fields linking to additional documentation or news articles used for verification. These web sources (archived URLs) allow users to trace each data point back to primary evidence.

Table 4 BRI energy investment dataset description.

Data Records

The dataset described in this Data Descriptor is publicly available in the Figshare repository51. The primary data file, titled “BRI_ENERGY_DATASET_SciDATA”, is provided in Microsoft Excel (.xlsx). The dataset is also provided as Supplementary Information with this article.

The main project-level data is contained in the “INV_DATA” sheet in the Excel workbook. Each row corresponds to a single energy project, and each column corresponds to one attribute as described in the Methods section (and summarised in Table 4). The accompanying data dictionary (readme) provides detailed definitions for each field and notes any special coding or caveats. The data files are version-controlled; the version associated with this publication is the initial release described in this document.

In addition to the main project-level data, the Excel file includes an ‘INV + LOAN’ sheet. This sheet contains the calculated annual debt service schedules for each loan, illustrating repayment obligations over time based on the loan terms. These are modelled values for analytical convenience (not raw data from sources). All data are provided in plain text or numeric formats to facilitate easy import into common analysis tools (e.g., Excel, R, Python). Users are encouraged to consult the readme for guidance on each field, including code definitions (such as corridor codes C1–C6 or lender abbreviations) and any data-specific caveats. Finally, to ensure broad accessibility, the dataset has been released under a CC BY 4.0 (Creative Commons Attribution 4.0 International) licence.

Technical Validation

The dataset underwent a multi-step protocol to validate the quality and consistency of the compiled dataset, recognising that it draws from heterogeneous sources with varying degrees of verification. This protocol combined cross-source verification, statistical checks, and comparison with external benchmarks.

Cross-source verification

We performed a cross-source consistency check. Wherever a project was listed in multiple sources, we cross-verified the key attributes such as investment amount, capacity, and dates. Discrepancies were investigated and resolved by consulting third-party sources. For example, if AidData reported a loan amount of USD 100 million for a project but AEI’s tracker listed USD 120 million, we searched for an official press release or ministry report for that project’s signing to see which figure was accurate. We gave preference to figures from official or primary documents or from sources with a clearly documented methodology. In many cases, AidData’s project entries include references to Chinese or local government announcements, which we found helpful in confirming details. Through this process, we enhanced the accuracy of fields such as investment amounts and dates, surpassing the accuracy of any single source. We also ensured that each real-world project appears only once in the dataset: if two sources had the same project under slightly different names, we merged them and dropped duplicate entries. This reconciliation prevents double counting, a common issue when aggregating data from different databases.

Range and outlier analysis

We conducted range and outlier analysis on all numerical fields. We sorted projects by investment amount and capacity to identify any outliers. The largest project (a USD 6.5 billion nuclear plant) and the smallest (a USD 5 million hydropower upgrading project) were cross-checked against external reports and confirmed to be correctly recorded. We also reviewed the distribution of loan interest rates and tenors: as expected, concessional loans generally fell in the 2–5% interest range with 15–20 year maturities, consistent with policy bank lending norms. No data point lay outside realistic bounds for projects of this nature.

Validation of imputed financial terms

To validate our imputation methodology, we conducted a direct statistical and visual comparison between the originally reported loan terms and the values generated by our estimation procedure. This validation ensures that the inclusion of imputed values for completeness does not introduce systematic bias into the dataset, as stated before. Table 4 presents a quantitative comparison of the descriptive statistics (mean, median, standard deviation, and count) for both the original and estimated values for interest rates, maturities, and grace periods. The statistics showed a high degree of consistency between the two groups. For instance, the mean interest rate for the original data is 3.79%, while the mean for the imputed data is 3.85%, a negligible difference. Similar alignment was observed for the median values and standard deviations across all three financial terms.

To further validate the imputation, Fig. 3 provides a visual comparison of the distributions for reported (original) and imputed (estimated) values using density plots. The plots for maturity, interest rate, and grace period all show that the imputed distributions (orange) closely mirror the shape and peaks of the reported distributions (blue) (Fig. 3). This visual evidence corroborates the statistical findings in Table 4, confirming that our imputation method preserves the underlying characteristics of the financial data and does not create artificial outliers or distortions. This transparent, evidence-based approach provides us with confidence that the imputed values are plausible and in line with expectations, enabling more comprehensive analyses without compromising data integrity. Furthermore, all imputed fields are clearly marked in the dataset, so analysts can exclude or treat them separately if desired.

Fig. 3
Fig. 3
Full size image

Density plots comparing the distributions of reported (original) and imputed (estimated) loan terms. The plots show the distributions for Maturity (in years), Interest Rate (%), and Grace Period (in years).

The minor discrepancies observed, particularly in the interest rate distribution, are not indicative of methodological bias but rather are a logical reflection of the underlying structure of the missing data (Table 5). The prevalence of missing financial terms is not random; it varies across different financing methods (detailed in Supplementary Tables 5 and 6). For instance, ‘Concessional loans’ exhibit a particularly high rate of missing interest rate data. These loans are known to carry standardised, lower interest rates, typically in the 2–4% range. Consequently, our imputation method, which draws from similar project types, naturally generates a cluster of estimated values in this specific range, which explains the prominent peak in the imputed distribution. Conversely, missing data from ‘Bilateral Commercial Loans’, which carry higher, market-based rates, contribute to a secondary cluster of imputed values. This non-random pattern of data disclosure, where certain loan types are less likely to have their terms publicly reported, is therefore logically mirrored in the imputed distributions. The closer alignment for maturity and grace period distributions suggests these terms are more standardised across different financing types.

Table 5 Comparison of descriptive statistics for original versus estimated loan terms.

Usage Notes

The BRI Energy Projects Dataset is designed as a resource for a broad range of analytical applications by researchers, policymakers, financial analysts, and civil society organisations.

Analysing investment trends

Users can analyse temporal and spatial investment trends by aggregating the data by year, country, or economic corridor. The corridor field makes it straightforward to study groups of countries that are part of the same official initiative, which is useful for evaluating China’s region-specific strategies. Users can aggregate the data by corridor, e.g., comparing total investment in the China-Pakistan Corridor versus the China–Indochina Peninsula Corridor, to observe differences in regional investment priorities.

Technology mix and energy transition studies

The dataset’s detailed categorisation of energy sub-sectors allows analysis of China’s overseas energy portfolio and the BRI’s role in the global energy transition. Researchers can measure the “greenness” of investments in each corridor by calculating the ratio of renewable to fossil-fuel capacity (using the Installed Capacity and Energy Source fields) and tracking how this ratio changes over time (using the Commitment Year field). Such analysis can reveal whether specific corridors are shifting toward cleaner energy investments.

Financial flows and Debt analysis

The data on loan terms can be used to model debt servicing burdens for host countries. A supplementary “INV + LOAN” spreadsheet in the dataset provides illustrative repayment schedules that can help identify potential debt sustainability concerns. For example, an analyst might sum the projected repayments for all projects in each country to identify periods of peak debt service obligations. (These projections assume no defaults or restructurings; in reality, some loans may be renegotiated or restructured, so users should interpret the schedules as scenario analyses rather than predictions.) By using the loan term fields, one can also identify which projects or corridors carry higher interest rates on average, or which lenders provided more concessional versus commercial terms – insights that were previously difficult to quantify at scale.

Comparative analysis with other datasets

Since our dataset combines multiple sources, users can integrate it with other datasets for more comprehensive analysis. For instance, one could combine our project data with country-level indicators (e.g., GDP, debt levels, electrification rates from World Bank or IMF data) to examine correlations between BRI energy investments and development outcomes. Each project entry is tagged with country and year, facilitating merges with country-year datasets. Similarly, researchers can cross-reference projects in our dataset with other project databases or case studies to validate and contextualise findings.

Case studies and qualitative follow-up

Researchers focusing on specific countries or projects can use the dataset as a starting point to identify relevant cases. For example, a researcher studying energy development in Pakistan under the BRI can filter for that country and retrieve all projects (our data shows several major coal, solar, and wind projects in the China-Pakistan Economic Corridor). The dataset will give key facts about each project (financier, size, status, etc.), which can guide a more detailed case study. By providing a comprehensive list, the dataset helps ensure that case studies are set in context – one can see if a particular project was one of many of a similar type or a unique case in that corridor.