Background & Summary

Considering climate change1,2 and unprecedented economic growth3, tropical cyclone (TC, locally referred to as typhoon in the Northwest Pacific) risks are increasing for coastal regions worldwide, with both the frequency and intensity of TCs showing concerning upward trends4,5,6, especially in coastal China7,8,9. As a cornerstone of sustainable development10, agriculture is particularly vulnerable to extreme meteorological events such as typhoons. Among various meteorological hazards, typhoons pose particularly severe threats to coastal agricultural systems through torrential rainfall, destructive winds, storm surges11,12,13, and saltwater intrusion14, all of which can devastate crops within hours and cause long-term impacts on agricultural productivity and rural livelihoods. While numerous studies have investigated the impacts of typhoons on agricultural systems15,16,17, the spatial resolution of previous works is often insufficient to reach county level18,19. The existing county-level analyses of typhoon disasters are limited either by short temporal coverage20 or by restricted geographical scope21. Systematic research on typhoon-induced agricultural damage in China remains limited, which is primarily due to data scarcity in existing databases, with most studies focusing on isolated events rather than long-term patterns22. This highlights the urgent need for a comprehensive, high-resolution, and long-term dataset on crops affected by typhoons. Such a dataset is essential for improving impact assessment capabilities and developing robust risk management strategies.

Given that typhoons are relatively infrequent meteorological events, existing typhoon disaster datasets often have some limitations, such as inadequate temporal coverage, discontinuous spatial distributions, and insufficient resolution. Some international disaster datasets are widely used, such as the Emergency Events Database (EM-DAT) from the Centre for Research on the Epidemiology of Disasters (CRED), NatCat from Munich Re, Sigma of Swiss Re and the GLobal IDEntifier (GLIDE) database (https://glidenumber.net/glide/public/search/search.jsp) developed by the Asian Disaster Reduction Center23,24. However, these datasets predominantly focus on national-scale data and provide limited documentation of the impact on agriculture. Datasets with county-level resolution have been established in several regions, such as the United States (The Spatial Hazard Events and Losses Database for the United States, SHELDUS25), Canada (The Canadian Disaster Database, https://www.publicsafety.gc.ca/cnt/rsrcs/cndn-dsstr-dtbs/), Japan (KITAMOTO Asanobu @National Institute of Informatics (NII): http://agora.ex.nii.ac.jp/digital-typhoon/disaster/damage/), and Taiwan Province of China26, supporting local disaster management and risk assessment frameworks.

Although there are some national or provincial disaster databases in China, they are difficult to share or make publicly available, facing challenges in terms of data reliability, coverage extensiveness, and standard consistency27. Since 1984, the Shanghai Typhoon Institute (STI) and the National Climate Center (NCC) under the China Meteorological Administration (CMA) have systematically collected typhoon disaster data28, but these data are limited to the provincial-level spatial resolution. From 2004 to 2013, the NCC provided county-level typhoon disaster records covering 15 provinces in mainland China. Thus, most related studies using these data were constrained to this decade-long period29,30. Furthermore, records spanning longer than five years are available for only eight coastal provinces (Shandong, Jiangsu, Shanghai, Zhejiang, Fujian, Guangdong, Guangxi and Hainan). Based on records from these eight provinces, Wu et al.31 reconstructed a county-level dataset of typhoon-induced direct economic losses (DELs) from 1980 to 2018, which was subsequently applied to create a DEL preassessment model32. However, this dataset only focuses on DEL metrics and excludes other critical indicators, such as the crop area affected by typhoons.

This study introduces a reconstructed county-level dataset of crop areas affected by typhoons from 1980 to 2022, which is derived from historical disaster records (2004–2013) from eight coastal provinces. We have developed and adopted a rigorous and robust reconstruction methodology, ensuring the accuracy and reliability of the reconstructed data. The reconstructed dataset can provide strong support for multiscale agricultural risk assessment and decision-making, from national policies to county-level strategies. Furthermore, this dataset can be applied to advance typhoon-related agricultural disaster research, increase forecasting accuracy, and improve early warning systems.

Methods

The 1980–2022 affected crop area dataset described in this paper was reconstructed by establishing, for each station, relationships between typhoon meteorological data and disaster data from 2004 to 2013. The process is described in detail below.

Original data acquisition

The original typhoon disaster data, comprising 2,091 raw records across the eight coastal provinces from 2004 to 2013, were obtained from the National Climate Center through cooperation under a National Key Research and Development Program of China project (No. 2019YFC1510205). One of the projects supporting this study is a follow-up to that project, so these original disaster data can be used for reconstruction. The data from this period are valuable and detailed. Each entry includes the typhoon name, the affected county and four impact metrics: DEL, affected crop area, affected population and death toll. In this study, we focused on the affected crop area.

The meteorological data include daily precipitation observations (24-hour accumulation from 12:00 UTC on the previous day) and maximum daily wind speeds derived from 2-minute averages observed at 1-hour intervals, spanning the period 1980–2022. These data were provided by the National Meteorological Information Center (NMIC) of the CMA (http://idata.cma/cmadaas/), covering 530 meteorological stations matched with corresponding county-level administrative districts across the eight provinces (Fig. 1). Quality control procedures have been applied to these data33,34.

Fig. 1

Distribution of the eight coastal provinces and 530 county-level stations in China.

The historical typhoon best track dataset was provided by the STI of the CMA (http://tcdata.typhoon.org.cn)35,36. This dataset includes positional coordinates and intensity information at 6-hour intervals35.

Province-level annual planting areas for major crops were acquired from the National Bureau of Statistics of China (https://data.stats.gov.cn/easyquery.htm?cn=E0103).

Data preprocessing

Because crop planting areas vary from year to year, the crop area potentially exposed to typhoons also varies over time. To make affected crop areas comparable among years, we standardized all historical records to 2022 using provincial data on annual planting areas of major crops. The adjustment was performed using the following formulas:

$${A}_{i}=\frac{{S}_{2022}}{{S}_{i}}$$
(1)
$${{Ls}}_{1}={{Ls}}_{0}\times {A}_{i}$$
(2)

where \({A}_{i}\) is the correction factor for a province for year i (i ∈ [2004, 2013]), accounting for relative variations in crop planting areas between year i and 2022. \({S}_{2022}\) and \({S}_{i}\) denote the total planting areas of major crops within the province for the baseline year (2022) and target year i, respectively. \({{Ls}}_{0}\) and \({{Ls}}_{1}\) represent the affected crop areas before and after standardization, respectively.
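As a minimal sketch of Eqs. (1) and (2), the standardization can be written as a single function. The function name and the numbers in the example call are hypothetical and serve only to illustrate the arithmetic.

```python
def standardize_to_2022(affected_area, planting_area_year, planting_area_2022):
    """Scale a recorded affected crop area to the 2022 planting level."""
    if planting_area_year <= 0:
        raise ValueError("planting area must be positive")
    a_i = planting_area_2022 / planting_area_year  # Eq. (1): correction factor A_i
    return affected_area * a_i                     # Eq. (2): Ls_1 = Ls_0 * A_i


# Hypothetical example: the provincial planting area grew by 10% between year i and 2022.
ls1 = standardize_to_2022(1.2, planting_area_year=5000.0, planting_area_2022=5500.0)
```

Because the factor is provincial, every county record in a given province and year is scaled by the same value.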

The daily precipitation and wind speed data of the 530 meteorological stations from 1980 to 2022 contain missing values, accounting for 1.06% and 1.61% of the total data, respectively. The lack of meteorological data may reduce the utilization of valuable disaster records in subsequent analyses. Thus, the inverse distance weighting (IDW) method was employed to interpolate and complete the missing meteorological values. The formula was as follows:

$${q}^{{\prime} }=\frac{{\sum }_{i=1}^{sn}\frac{1}{{d}_{i}^{w}}q(i)}{{\sum }_{i=1}^{sn}\frac{1}{{d}_{i}^{w}}}$$
(3)

where w represents a weight parameter, which was set to 2; sn represents the number of neighboring stations, with a constant value of 10; di represents the distance between the target station and its ith neighboring station; q represents the original meteorological value of each neighboring station before interpolation; and q′ represents the interpolated meteorological value at the target station. When all neighboring stations reported missing values, the value for the target station was assigned as −999. Using this methodology, complete wind speed and precipitation data from 1980 to 2022 were successfully obtained.
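The interpolation in Eq. (3) can be sketched as follows. The function name is hypothetical, and one detail is an assumption rather than something stated above: neighbors that are themselves missing are skipped, and the weights are renormalized over the remaining neighbors. Distances are assumed to be strictly positive.

```python
MISSING = -999.0


def idw_fill(distances_km, neighbor_values, w=2, sn=10):
    """Inverse-distance-weighted estimate at a target station (Eq. 3)."""
    # Keep the sn nearest neighbours.
    pairs = sorted(zip(distances_km, neighbor_values))[:sn]
    # Assumption: neighbours with missing values are skipped.
    valid = [(d, q) for d, q in pairs if q is not None and q != MISSING]
    if not valid:
        return MISSING  # all neighbours missing, as described in the text
    weights = [1.0 / d ** w for d, _ in valid]
    return sum(wt * q for wt, (_, q) in zip(weights, valid)) / sum(weights)


# Two neighbours at 10 km and 20 km with values 4.0 and 8.0.
estimate = idw_fill([10.0, 20.0], [4.0, 8.0])
```

With w = 2, the neighbor at 10 km carries four times the weight of the one at 20 km, so the estimate lies closer to 4.0 than to 8.0.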

The objective synoptic analysis technique (OSAT)37,38 was utilized in this study to differentiate typhoon-associated precipitation and wind data from precipitation and wind data associated with other meteorological events. Given that typhoon interactions may lead to the overidentification of the precipitation caused by typhoons without direct impacts, which may inflate the reconstructed values, the results obtained via the OSAT were further corrected. First, a polyline comprising the grid points that were closest to the actual coastline in a 0.5° × 0.5° latitude and longitude grid was selected to approximate the continental coastline. The minimum distance between the track coordinates of each typhoon and the coastline was computed. Typhoons that stayed more than 450 km away from the mainland coastlines were excluded. Through this objective identification and filtration process, we acquired metrics during the typhoon impact period, such as the total precipitation, maximum daily precipitation, and maximum wind speed, covering 570 typhoons affecting the eight coastal provinces from 1980 to 2022, stored in the file named ‘Typhoon meteorological data(1980–2022).xlsx’.
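The 450-km coastline filter can be illustrated with a short sketch. It assumes that the coastline is given as a list of (latitude, longitude) grid points and that the distance is measured to those points with the haversine formula on a spherical Earth. The function names and the sample coordinates are hypothetical, and the distance metric actually used in the study is not specified above.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_distance_to_coast(track, coast):
    """Smallest distance between any track point and any coastline point."""
    return min(haversine_km(a, b, c, d) for a, b in track for c, d in coast)


def stays_offshore(track, coast, limit_km=450.0):
    """True if the typhoon never comes within limit_km of the coastline."""
    return min_distance_to_coast(track, coast) > limit_km


coast = [(30.0, 122.0), (25.0, 119.0)]   # hypothetical coastline grid points
near_track = [(30.0, 123.0)]             # about 96 km east of the first point
far_track = [(20.0, 135.0)]              # well over 450 km away
```

Typhoons for which `stays_offshore` returns True would be dropped before the impact metrics are computed.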

In this study, a station was considered affected by a typhoon if it experienced typhoon-related winds or precipitation, or both. To facilitate the establishment of empirical relationships between typhoons and their associated disasters, disaster records (2004–2013) were matched with typhoon meteorological data. Records were excluded if they were duplicates, if they lacked a corresponding station, or if they had null values for both the wind and precipitation metrics. Finally, 1,845 validated disaster records were retained, covering 398 county-level meteorological stations and 75 typhoons from 2004 to 2013.

Reconstruction

For the 530 stations in this study, an average of only 4.6 original disaster records per station was insufficient to establish reliable statistical relationships. Furthermore, about a quarter of those stations have no records. Given the limitations in terms of both the temporal span (10 years at most) and spatial coverage, it is essential to reconstruct a disaster dataset that covers a longer period and broader range through a systematic reconstruction approach. The reconstruction process involves three main phases (Fig. 2).

Fig. 2

Flow chart of the reconstruction process for the reconstructed dataset of crop areas affected by typhoons.

Phase 1 Acquire extended datasets for single stations

The statistical reliability of the functional relationships was compromised for stations with insufficient disaster records, necessitating the integration of disaster data from adjacent stations. An adjacent station was defined as a station located within a distance D of the target station. According to Wu et al.31, D should be consistent for all stations. In this case, the results showed that the correlation between the reconstructed data and the original disaster records decreased as D and the number of adjacent stations increased. To optimize the results of fitting quadratic polynomials, we implemented a variable sample size parameter, G (25 ≤ G ≤ 85), with seven values (G1-G7) at intervals of 10. For example, extended datasets for single stations can be obtained for G1 according to the following steps:

Step 1: Assess the sample-size requirements. For a single station, if the number of disaster records is equal to or exceeds G1, all the disaster records of that station should be included in its dataset without modification. Otherwise, the dataset expansion process moves on to Step 2.

Step 2: Define the adjacent stations. The extended distance parameter D is set so that the adjacent stations can be defined. For the station with fewer records than G1, its adjacent stations are defined as those stations within a distance D of the station. D ranges from 0 to 450 km at 25-km intervals until the total number of records from this station and its adjacent stations reaches G1.

Step 3: Compile the extended dataset for each station. The disaster records in the extended dataset of a single station consist of records from this station and its adjacent stations. The extended single-station datasets are subsequently used to establish fitting relationships between the typhoon disasters and comprehensive typhoon impact factors.

We compiled extended datasets for 530 stations from 2004 to 2013, meeting the sample size requirement of G1, with varying distance parameters D. Next, we replaced G1 with G2 through G7 in succession, repeating Steps 1 through 3 iteratively, and obtaining seven groups of extended datasets. These single-station extended datasets were stored in a zip file named ‘Extended datasets for single stations (2004–2013).zip’ with seven subfolders according to G1-G7, respectively.
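Steps 1 through 3 can be sketched as a simple search over D. The data structures are hypothetical: `records_by_station` maps a station ID to its list of disaster records, and `distances_km[target]` maps every station ID (including the target itself, at distance 0) to its distance from the target. What happens when G is not reached at 450 km is not stated above; the sketch simply returns everything pooled at that distance.

```python
def extend_station(records_by_station, distances_km, target, G, d_max=450, step=25):
    """Return (pooled station IDs, D used) for one target station."""
    pooled = [target]
    for D in range(0, d_max + step, step):
        # Stations within distance D of the target (D = 0 keeps only the target).
        pooled = [s for s, d in distances_km[target].items() if d <= D]
        n_records = sum(len(records_by_station.get(s, [])) for s in pooled)
        if n_records >= G:
            return pooled, D
    return pooled, d_max  # assumption: fall back to the widest search radius


# Hypothetical example: station A has 10 records, B (30 km away) 20, C (60 km away) 40.
records = {"A": [0] * 10, "B": [0] * 20, "C": [0] * 40}
dist = {"A": {"A": 0, "B": 30, "C": 60}}
stations, D_used = extend_station(records, dist, "A", G=25)
```

Running the same function with G = 35, 45, …, 85 would produce the remaining six groups of extended datasets.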

Phase 2 Establish relationships between disasters and comprehensive typhoon impact factors

The following steps were performed based on the seven groups of extended datasets for 530 stations from 2004 to 2013:

Step 1: Standardize the affected crop area and meteorological data. To eliminate the differences in units and magnitudes among different variables, each variable in the extended single-station dataset was standardized using the Z score method. The formula is as follows:

$${{Z}}_{{sij}}=\frac{{x}_{sij}-{\mu }_{si}}{{\sigma }_{si}}$$
(4)

i = 1, 2, 3, 4; j = 1, 2, 3, …, ns

s = 1, 2, …, 530

where s is the index of the meteorological station; i takes the values 1, 2, 3 and 4, corresponding to the affected crop area (hm²), total precipitation (mm), maximum daily precipitation (mm), and maximum wind speed (m/s) during the typhoon impact period, respectively; \({n}_{s}\) represents the total number of samples in the extended dataset of the sth station; j denotes the sample index; \({x}_{sij}\) and \({Z}_{sij}\) represent the values before and after the standardization of the jth sample for the ith variable, respectively; and \({\mu }_{si}\) and \({\sigma }_{si}\) are the mean and standard deviation of variable i in the extended dataset of the sth station.

Step 2: Identify comprehensive typhoon impact factors. The canonical correlation analysis (CCA) method proposed by Hotelling39 was used to analyze the station-specific relationships between the affected crop area and the three typhoon impact factors. Thus, the weight coefficients of three standardized typical variables (total precipitation, maximum daily precipitation, and maximum wind speed), \({a}_{s}\), \({b}_{s}\) and \({c}_{s}\), were obtained. These coefficients were subsequently utilized to construct comprehensive typhoon impact factors that were optimally correlated with the affected crop area. The formula is as follows:

$${y}_{s}={a}_{s}{Z}_{s2}+{b}_{s}{Z}_{s3}+{c}_{s}{Z}_{s4}$$
(5)

s = 1, 2, …, 530

where ys represents the comprehensive typhoon impact factor sequence for the sth station; \({Z}_{s2}\), \({Z}_{s3}\) and \({Z}_{s4}\) represent the standardized total precipitation, maximum daily precipitation, and maximum wind speed sequences during the impact of the typhoon for the sth station, respectively; and as, bs and cs denote the corresponding weight coefficients for these normalized typical variables, respectively.
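The standardization of Eq. (4) and the composite factor of Eq. (5) can be sketched with NumPy. This is an illustration, not the authors' implementation: it relies on the fact that, when one side of a CCA contains a single variable, the canonical weights on the other side are proportional to ordinary least-squares coefficients. Using the population standard deviation and rescaling the factor to unit variance are assumptions, and the function names are hypothetical.

```python
import numpy as np


def zscore(x):
    """Column-wise Z score (Eq. 4); population standard deviation assumed."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)


def impact_factor(area, meteo):
    """Weights (a_s, b_s, c_s) and composite factor y_s (Eq. 5) for one station.

    meteo columns: total precipitation, maximum daily precipitation,
    maximum wind speed.
    """
    z1 = zscore(area)
    Z = zscore(meteo)
    # With a single dependent variable, CCA weights are proportional to OLS coefficients.
    weights, *_ = np.linalg.lstsq(Z, z1, rcond=None)
    weights = weights / (Z @ weights).std()  # assumption: unit-variance canonical variate
    return weights, Z @ weights
```

By construction, the returned factor correlates with the standardized affected crop area at least as strongly as any single meteorological variable does.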

Step 3: Establish the fitting relationships between the affected crop areas and the comprehensive typhoon impact factors. The relationship between the station-specific disaster variable \({Z}_{s1}\) (the standardized affected crop area sequence for the sth station) and the comprehensive typhoon impact factor sequence \({y}_{s}\) was fitted as a quadratic polynomial:

$${L}_{s}={d}_{2s}{y}_{s}^{2}+{d}_{1s}{y}_{s}+{d}_{0s}$$
(6)

s = 1, 2, …, 530

where Ls represents the standardized fitted affected crop area sequence of the sth station, and \({d}_{0s}\), \({d}_{1s}\) and \({d}_{2s}\) are the constant, linear, and quadratic coefficients, respectively. Seven groups of fitting relationships were established based on the seven groups of extended datasets with different sample numbers (G1-G7). In other words, seven extended datasets with different sample sizes (G1-G7) are available for each station, as well as the seven corresponding fitting relationships.
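A least-squares fit of Eq. (6) can be sketched with `numpy.polyfit`. The function names are hypothetical, and the fitting routine used in the study is not stated above.

```python
import numpy as np


def fit_quadratic(y, z1):
    """Least-squares fit of z1 = d2*y^2 + d1*y + d0; returns (d0, d1, d2)."""
    d2, d1, d0 = np.polyfit(y, z1, deg=2)  # polyfit returns the highest power first
    return d0, d1, d2


def evaluate(coeffs, y):
    """Evaluate the fitted relationship L_s(y_s) of Eq. (6)."""
    d0, d1, d2 = coeffs
    y = np.asarray(y, dtype=float)
    return d2 * y ** 2 + d1 * y + d0
```

For each station, the fit would be repeated on each of the seven extended datasets, yielding seven candidate coefficient triples.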

Step 4: Screen the fitting relationships. To ensure the validity of the fitting relationships, the seven candidate fitting relationships for each station were screened using the two-sided Pearson correlation test (the same test applies hereinafter wherever a correlation coefficient or significance level is reported). The evaluation began with the G1-based relationship, progressing to higher G values only when the current relationship failed to meet the significance threshold (0.05). This process continued until a statistically valid relationship was identified. The G7-based relationship was selected by default when none of the preceding relationships was valid. Finally, 530 fitting relationships corresponding to the 530 stations were obtained and denoted as Ls (ys), where s represents the serial number of the station.
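The screening cascade can be sketched as follows, assuming the p-value of each candidate relationship has already been computed. The function name and the example p-values are hypothetical; the seven G values follow from 25 ≤ G ≤ 85 at intervals of 10.

```python
G_VALUES = (25, 35, 45, 55, 65, 75, 85)  # G1 through G7


def select_fit(p_value_by_G, alpha=0.05, order=G_VALUES):
    """Return the first G whose fitted relationship is significant, else G7."""
    for G in order:
        if p_value_by_G[G] < alpha:
            return G
    return order[-1]  # default to the G7-based relationship


# Hypothetical station: the G1 and G2 fits are not significant, the G3 fit is.
p_values = {25: 0.21, 35: 0.08, 45: 0.01, 55: 0.02, 65: 0.03, 75: 0.04, 85: 0.04}
chosen = select_fit(p_values)
```

Starting from the smallest G keeps each station's extended dataset as local as possible, and more distant stations are added only when the local fit is not significant.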

These final relationships for the 530 stations are stored in the file ‘Final fitting relationships for 530 stations.xlsx’, whose seven sheets are arranged according to the G values and record the coefficients \({d}_{0s}\), \({d}_{1s}\) and \({d}_{2s}\), R2, and significance levels. The final number of stations for each sample size threshold was distributed as follows: 266 (G1), 73 (G2), 61 (G3), 124 (G4), 2 (G5), 1 (G6), and 3 (G7) stations. Notably, the number of significant relationships at the 0.05 level was 527 (99.43%), providing robust support for the reconstruction process.

Phase 3 Reconstruct the disaster dataset

Step 1: Introduce comprehensive typhoon impact factors from 1980 to 2022. First, the sample size for the sth station was expanded from \({n}_{s}\) (2004–2013, the original recording period) to \({n}_{s}^{{\prime} }\) (1980–2022, the reconstructed period). The comprehensive typhoon impact factor \({y}_{s}^{{\prime} }\) for the reconstructed period of 1980–2022 was subsequently calculated through Eqs. (4) and (5), incorporating the previously determined parameters \({\mu }_{si}\) and \({\sigma }_{si}\) from Eq. (4) and \({a}_{s}\), \({b}_{s}\) and \({c}_{s}\) from Eq. (5).

Step 2: Reconstruct the affected crop area data for each station. The derived \({y}_{s}^{{\prime} }\) values were then substituted into the fitting relationship \({L}_{s}({y}_{s})\) to generate the standardized affected crop area value \({L}_{s}^{{\prime} }\). Finally, these standardized values were transformed into absolute affected crop areas through an inverse standardization process using Eq. (7), completing the reconstruction of the station-specific data from 1980 to 2022.

$${R}_{s}={L}_{s}^{{\prime} }{\sigma }_{s1}+{\mu }_{s1}$$
(7)

s = 1, 2, …, 530

where \({R}_{s}\) represents the affected crop area sequence (1980–2022) for the sth station, with nonpositive values taken as 0 to indicate the absence of a disaster. \({L}_{s}^{{\prime} }\) is the standardized affected crop area sequence (1980–2022) of the sth station derived from \({L}_{s}({y}_{s})\). \({\mu }_{s1}\) and \({\sigma }_{s1}\) are the mean and standard deviation of the affected crop area in the extended dataset (2004–2013) of the sth station, respectively.

In addition, few disasters in the original records had a total precipitation of less than 4 mm and a maximum wind speed of less than 5 m/s. Consequently, the affected crop area values were set to zero in the reconstructed data under the same circumstances, indicating that no disaster occurred.
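For a single station and typhoon, Steps 1 and 2 together with the zero-disaster rule can be sketched as below. The function name and the example numbers are hypothetical. The sketch assumes that the composite factor y′ has already been computed with the 2004–2013 parameters, and it reads "the same circumstances" as both conditions holding at once (total precipitation below 4 mm and maximum wind speed below 5 m/s).

```python
def reconstruct_area(y_prime, coeffs, mu1, sigma1, total_precip_mm, max_wind_ms):
    """Reconstructed affected crop area for one station and one typhoon."""
    d0, d1, d2 = coeffs
    # Weak events: treated as no disaster.
    if total_precip_mm < 4.0 and max_wind_ms < 5.0:
        return 0.0
    l_prime = d2 * y_prime ** 2 + d1 * y_prime + d0  # Eq. (6) applied to y'
    r = l_prime * sigma1 + mu1                       # Eq. (7): inverse standardization
    return max(r, 0.0)                               # nonpositive values mean no disaster


# Hypothetical station with a purely linear relationship (d0 = 0, d1 = 1, d2 = 0).
value = reconstruct_area(1.0, (0.0, 1.0, 0.0), mu1=2.0, sigma1=3.0,
                         total_precip_mm=120.0, max_wind_ms=18.0)
```

Note that the mean and standard deviation passed in are those of the 2004–2013 extended dataset; they are not re-estimated from the 1980–2022 data.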

Step 3: Obtain the disaster dataset. The reconstructed disaster data from 530 individual stations from 1980 to 2022 were compiled as the preliminary reconstructed dataset. Statistical validation revealed a significant positive correlation (r = 0.6545, p < 0.01) between the 1,845 original disaster records and their corresponding reconstructed values. This indicates reasonable agreement between the recorded and reconstructed data.

Error revisions

To enhance the reliability of the preliminary reconstructed dataset, error analyses and revisions were conducted from two perspectives: single stations and typhoon cases. Since all disaster records are nonzero values, stations at which no typhoon disaster was recorded were assigned crop damage values of zero to facilitate analysis.

In the station-specific error revision process, we first calculated the mean error \({{Mean}}_{s}\) using the following formula:

$${{Mean}}_{s}=\frac{{\sum }_{i=1}^{{T}_{1}}({R}_{s}(i)-{S}_{s}(i))}{{T}_{1}}$$
(8)

s = 1, 2, …, 530

where \({R}_{s}(i)\) and \({S}_{s}(i)\) are the reconstructed and recorded affected crop area values, respectively, for the ith typhoon at the sth station; \({T}_{1}\) refers to the total number of disaster-causing typhoons recorded from 2004 to 2013, which is a constant of 75; and \({{Mean}}_{s}\) is the mean error of the sth station.

The spatial distribution of the mean errors across the 530 stations is presented in Fig. 3. For most stations, mean errors were positive, indicating that the reconstructed values were generally larger than the original records. Some stations in coastal regions, such as south-central and southeastern Jiangsu and southwestern Guangdong, have mean errors exceeding 5 thousand hm², with some reaching 10 thousand hm². Other notable errors (2.5–5 thousand hm²) were observed at stations in northeastern Zhejiang, northern Hainan, midwestern Guangdong, and the southern coast of Guangxi. Nevertheless, the reconstructed data are relatively accurate for approximately 83% of the stations, whose mean errors fall within the range of (0, 2.5] thousand hm².

Fig. 3

Distribution of the mean errors between the reconstructed and recorded affected crop areas across the 530 stations (unit: 10³ hm²).

The reconstructed value of the affected crop area for the sth station was revised to \({R}_{s}^{{\prime} }\) by subtracting the mean error \({{Mean}}_{s}\) from \({R}_{s}\). A non-positive corrected value \({R}_{s}^{{\prime} }\) indicated that no disaster occurred, and was set to zero.
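The station-specific revision can be sketched in a few lines. The function names and the four-typhoon example are hypothetical; in the study, the sum in Eq. (8) runs over the 75 recorded typhoons, with typhoons that have no record at the station entering as zeros.

```python
def station_mean_error(reconstructed, recorded, t1=75):
    """Mean error of one station over the recorded typhoons (Eq. 8)."""
    return sum(r - s for r, s in zip(reconstructed, recorded)) / t1


def revise_station(values, mean_error):
    """Subtract the station's mean error; non-positive results mean no disaster."""
    return [max(v - mean_error, 0.0) for v in values]


# Hypothetical station with four typhoons instead of 75.
rec = [3.0, 1.0, 0.0, 2.0]   # reconstructed values
obs = [1.0, 0.0, 0.0, 1.0]   # recorded values (0 where nothing was recorded)
bias = station_mean_error(rec, obs, t1=4)
revised = revise_station(rec, bias)
```

The same station-level bias is then subtracted from every reconstructed value of that station over 1980–2022.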

The typhoon-specific reconstruction error was further analyzed. The recorded and reconstructed affected crop areas of the 75 recorded typhoons (2004–2013) and their mean errors (MeanTC) are shown in Fig. 4. MeanTC was calculated using the following formula:

$${{Mean}}_{{TC}}=\frac{{\sum }_{i=1}^{{T}_{1}}({R}_{i}-{S}_{i})}{{T}_{1}}$$
(9)

where Ri and Si represent the reconstructed and recorded values of affected crop area, respectively, for the ith typhoon among the 75 typhoons. T1 represents the total number of typhoons covered by the original records from 2004 to 2013, with a constant value of 75 events.

Fig. 4

Recorded and reconstructed affected crop areas for the 75 recorded typhoons from 2004 to 2013 and their mean errors (the red line) (unit: 10³ hm²).

\({{Mean}}_{{TC}}\) was calculated as 432.11 thousand hm², necessitating further correction for typhoon cases. Notably, \({{Mean}}_{{TC}}\) exceeded the reconstructed values for some originally recorded disaster-causing typhoons. Subtracting \({{Mean}}_{{TC}}\) directly from the reconstructed value for each typhoon would yield unrealistic negative values. Thus, an alternative correction approach was applied:

$${D}_{{TC}}={R}_{{TC}\min }-{S}_{{TC}\min }$$
(10)

where \({R}_{{TC}\min }\) represents the minimum reconstructed affected crop area among the 75 typhoon cases, \({S}_{{TC}\min }\) denotes the corresponding recorded affected crop area, and \({D}_{{TC}}\) serves as a correction parameter, which was calculated as 86.90 thousand hm².

The revised value of reconstructed affected crop area \({R}_{{t}_{0}}^{{\prime} }\) for typhoon t0 (570 in total, t0 = 1, 2, 3, …, 570) was obtained by subtracting the correction parameter DTC from the initial reconstruction result \({R}_{{t}_{0}}\). In this process, we excluded 56 typhoons with negative \({R}_{{t}_{0}}^{{\prime} }\) values, resulting in 514 validated typhoon events (t = 1, 2, 3, …, 514).
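The typhoon-level offset of Eq. (10) and the subsequent exclusion of negative totals can be sketched as follows. The function names and the example numbers are hypothetical.

```python
def typhoon_offset(reconstructed, recorded):
    """D_TC (Eq. 10): error of the typhoon with the smallest reconstructed total."""
    k = min(range(len(reconstructed)), key=lambda i: reconstructed[i])
    return reconstructed[k] - recorded[k]


def revise_typhoons(reconstructed_by_typhoon, d_tc):
    """Subtract D_TC from every typhoon total and drop typhoons that turn negative."""
    revised = {t: r - d_tc for t, r in reconstructed_by_typhoon.items()}
    return {t: r for t, r in revised.items() if r >= 0}


# Hypothetical totals (thousand hm²) for three recorded typhoons.
d_tc = typhoon_offset([500.0, 120.0, 300.0], [400.0, 30.0, 310.0])
kept = revise_typhoons({"TC-a": 500.0, "TC-b": 50.0, "TC-c": 90.0}, d_tc)
```

Anchoring the offset on the smallest reconstructed case keeps the correction from exceeding the reconstructed totals of the recorded typhoons, which is the problem noted for the mean error.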

To account for the variations of the disaster severity among different typhoons at the same station, a secondary correction for station-specific disasters was implemented using the following coefficients:

$${C}_{t}=\frac{{R}_{t}^{{\prime} }}{{R}_{t}}$$
(11)

t = 1, 2, 3, …, 514

where Ct is the correction coefficient for the tth typhoon; Rt represents the reconstructed affected crop area value of the tth typhoon before correction, which is the summation of the station-level values \({R}_{s}^{{\rm{{\prime} }}}\) across all stations affected by the tth typhoon; and \({R}_{t}^{{\prime} }\) represents the reconstructed affected crop area value of the tth typhoon after typhoon-specific correction. The final corrected station-level reconstructed affected crop area values \({R}_{s}^{{\prime} {\prime} }\) were obtained by applying the correction coefficients Ct to \({R}_{s}^{{\prime} }\):

$${R}_{s}^{{\prime} {\prime} }={R}_{s}^{{\prime} }\times {C}_{t}$$
(12)
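The secondary correction can be sketched for a single typhoon: the coefficient is the ratio of the revised typhoon total to the sum of the station values, and each station value is then rescaled by it. The function name and the numbers are hypothetical, and returning zeros when the station sum is zero is an assumption.

```python
def apply_typhoon_coefficient(station_values, revised_total):
    """Rescale station values of one typhoon so that they sum to the revised total."""
    total = sum(station_values.values())       # R_t: sum of station-level values
    if total == 0:
        return {s: 0.0 for s in station_values}  # assumption: nothing to rescale
    c_t = revised_total / total                # Eq. (11)
    return {s: v * c_t for s, v in station_values.items()}  # Eq. (12)


# Hypothetical typhoon affecting two stations, with a revised total of 50.
final = apply_typhoon_coefficient({"s1": 30.0, "s2": 70.0}, revised_total=50.0)
```

The rescaling preserves the relative severity among stations within one typhoon while making the station values consistent with the corrected typhoon total.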

After performing these two error corrections (station-specific and typhoon-specific), the correlation coefficients between the 1,845 original records and their reconstructed values increased from 0.6545 to 0.6625 and 0.6679, successively, at the 0.01 significance level. These results demonstrate the effectiveness of the corrections. The revised reconstructed dataset is stored in the file ‘Reconstructed dataset of crop area affected by typhoons.xlsx’.

Classification

The reconstructed disaster data were classified into four levels (light, moderate, severe, and extremely severe disasters) based on single stations and typhoon cases, respectively. Four methods were considered for this task: the percentile method (PM), natural break classification (NBC) method, K-means clustering algorithm, and optimal partition method (OPM). The PM relies on the subjective determination of grade percentages according to data distribution characteristics.

In terms of the station-level disaster data, after comprehensive analysis and comparative testing, a hybrid classification strategy was adopted. First, the 1,845 historical records were classified using the K-means algorithm. Then, the proportions of each class obtained in the first step were applied to the other sets of data. The K-means algorithm produced the following severity distributions: light (80.05%), moderate (16.05%), severe (3.47%), and extremely severe (0.43%). These proportions were applied to classify the reconstructed data with positive affected crop area values for both the 75 recorded typhoons (2004–2013) and the 514 typhoons in the reconstructed dataset (1980–2022). The detailed thresholds are presented in Table 1. This hybrid approach (K-means and PM) enhances the objectivity and rigor of the classification process.

Table 1 Grading criteria for affected crop areas based on single stations and typhoon events for three sets of data (unit: 10³ hm²).

The affected crop area data of the typhoon events were classified in a similar way. The proportions of the four grades (light, moderate, severe, and extremely severe) were 42.67%, 37.33%, 13.33%, and 6.67%, respectively. The detailed classification thresholds for all the datasets are presented in Table 1. Single-station disaster records and disaster-causing typhoons in the reconstructed dataset are graded according to the corresponding grading criteria in the last column of Table 1. This gives users a quick overview of disaster severity.
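The second half of the hybrid strategy, which converts class proportions into thresholds for another dataset, can be sketched as below. The empirical-quantile convention used here (each cut is the largest value in its class) is an assumption, and the function names are hypothetical; the thresholds actually adopted are those listed in Table 1.

```python
import bisect

LEVELS = ("light", "moderate", "severe", "extremely severe")


def thresholds_from_proportions(values, proportions):
    """Upper bounds of the first three classes, from cumulative class proportions."""
    data = sorted(v for v in values if v > 0)  # only positive affected areas are graded
    cuts, cum = [], 0.0
    for p in proportions[:-1]:
        cum += p
        idx = min(int(round(cum * len(data))), len(data)) - 1
        cuts.append(data[max(idx, 0)])
    return cuts


def grade(value, cuts):
    """Severity level of one value; a value equal to a cut falls in the lower class."""
    return LEVELS[bisect.bisect_left(cuts, value)]


# Hypothetical data: values 1..100 with class shares of 80%, 10%, 5% and 5%.
cuts = thresholds_from_proportions(range(1, 101), (0.80, 0.10, 0.05, 0.05))
```

For the station-level data, the proportions would be the K-means shares reported above (80.05%, 16.05%, 3.47%, and 0.43%).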

Data Records

The reconstructed affected crop area dataset and its related files are stored in a Figshare repository (https://doi.org/10.6084/m9.figshare.28388378.v1)40, with each entry representing an individual disaster record. Table 2 describes each column of the dataset. Table 3 gives basic descriptions of the other relevant files, including their file names, summaries, and contents.

Table 2 Descriptions of the reconstructed affected crop area dataset (the file named Reconstructed dataset of crop area affected by typhoons.xlsx).
Table 3 Basic descriptions of the relevant files for the reconstructed affected crop area dataset.

Technical Validation

To better demonstrate the reliability of the reconstructed dataset, we compared the characteristics of the recorded and reconstructed data.

First, we analyzed the annual variations in the affected crop area from 2004 to 2013 through three metrics (Fig. 5). The annual cumulative affected crop area represents the total area of crops affected by typhoons across all stations within a year. The annual cumulative frequency is the total number of times that all affected stations recorded nonzero values for the affected crop area in one year.

Fig. 5

Annual variations in (a) the annual cumulative affected crop area (unit: 10³ hm²), (b) the annual cumulative frequency (unit: times), and (c) the annual cumulative number of disaster-causing typhoons on the basis of the 1,845 original records (red line with dots), reconstructed data corresponding to these 1,845 records (yellow line with diamond marks), reconstructed data of 75 recorded TCs (light blue line with square marks), and reconstructed data of all 117 TCs from 2004 to 2013 (dark blue line with triangular marks).

The changes in the annual cumulative affected area and frequency were consistent between the 1,845 records and their reconstructed counterparts (Fig. 5a,b), with correlation coefficients of 0.9807 and 0.9981, respectively, at the 0.01 significance level. In addition, the reconstructed data cover 117 disaster-causing typhoons from 2004 to 2013, which is more than the 75 recorded typhoons. This discrepancy indicates that 42 typhoons affected the mainland without any disaster data recorded. Furthermore, the reconstruction process has taken most affected stations into account, resulting in larger annual cumulative affected areas and higher frequencies in the reconstructed datasets for the 75 recorded typhoons and 117 reconstructed typhoons than in the original recorded data.

Figure 6 shows the spatial distributions of the average annual frequency and affected crop area for the recorded data from 2004 to 2013 and the reconstructed data over 10 years (2004–2013) and 43 years (1980–2022). A quantitative analysis of the spatial frequency distributions (Fig. 6a–c) revealed statistically significant correlations (p < 0.01) between the recorded and reconstructed data, with coefficients of 0.4414 and 0.4113 for the 10-year and 43-year reconstructions, respectively. These correlations indicate that the reconstructed data can generally capture the spatial patterns in the original records, although discrepancies exist in some regions. Overall, the average annual frequencies in the reconstructed data are greater than those in the recorded data. According to the recorded data (Fig. 6a), the most frequently impacted regions are the Zhejiang–Fujian and Guangxi–Guangdong provincial junctions and their adjacent areas, with the highest frequencies clustered along their coasts. In the two sets of reconstructed data, in addition to the aforementioned regions, some stations in central Guangdong and at the Fujian–Guangdong junction are also frequently affected, and the differences between the coastal and inland areas are smaller.

Fig. 6

Spatial distributions of the average annual frequency (a–c) (unit: times) and average annual affected crop area (d–f) (unit: 10³ hm²) based on (a,d) 10-year (2004–2013) records; (b,e) 10-year (2004–2013) reconstructed data; (c,f) 43-year (1980–2022) reconstructed data.

The spatial distributions of the average annual affected crop area (Fig. 6d–f) exhibit consistent patterns across the three sets of data, with the disaster severity decreasing from coastal to inland areas. The southern coasts of Guangdong and Guangxi, northern Hainan, coastal Zhejiang and Fujian, and eastern coasts of Jiangsu experienced the worst typhoon disasters. The correlation coefficients between the records and the two reconstructed datasets were 0.5652 (2004–2013) and 0.5706 (1980–2022) at the 0.01 significance level, indicating that the reconstructed data effectively reflect the spatial patterns of the original disaster records. Nevertheless, there are discrepancies in some regions, such as central and southern Jiangsu. The reconstructed data show that several stations in this area were more severely affected compared to the historical records. These differences may have occurred owing to the inherent limitations of the correlation-based reconstruction approach or other objective factors, such as inconsistencies in local disaster documentation practices, and regional variations in disaster response and mitigation capabilities.

Usage Notes

Combining county-level disaster records (2004–2013) with typhoon meteorological data (1980–2022), this study reconstructed a county-level dataset of crop area affected by typhoons across 8 coastal provinces in China for the period 1980–2022 using canonical correlation analysis (CCA), bias correction techniques and other statistical methods. This dataset is suitable for applications such as the development of typhoon risk forecasting and early warning models, and the formulation and implementation of disaster prevention policies. This study provided a detailed and transparent description of the dataset reconstruction methodology. Interested researchers can access the relevant data through the provided platforms to replicate the procedures and obtain the same results. This transparency will facilitate systematic improvements to the dataset in the future. Furthermore, it may inspire researchers undertaking reconstructions of analogous data, such as datasets for specific crops affected by typhoons, or other disaster metrics like the number of collapsed houses. All the data described in this study are publicly available and can be used with a citation.