Introduction

Crash cost estimates reflect the effects of road safety management policies and allow the appraisal of the economic benefits of road safety improvement projects1,2. However, estimating crash costs is complex because many factors are involved, including direct costs (such as property damage, medical, and legal costs), indirect costs (such as congestion, productivity loss for work and family, and tax losses), and intangible costs (such as loss of life or degradation in quality of life, and the pain and suffering of both victims and their families)3,4,5,6,7. External cost estimation is particularly challenging, as it requires consideration of multiple stakeholders, including victims, insurance companies, government agencies, and affected individuals such as families and friends8.

Previous studies on crash cost estimates have largely focused on the magnitude of aggregated values derived from large-scale crash data, providing an overall compilation of the estimates without exploring the finer details. For instance, Blincoe et al.9 and Wijnen et al.1 estimated that national-level crash costs range from 0.4% to 4.1% of Gross Domestic Product (GDP) in European countries and the United States. Similarly, a global comparison of crash fatality costs showed the same trends, but further indicated that less-developed countries have a lower percentage of GDP attributed to crash costs10. Levinson et al.11 calculated crash costs per vehicle-kilometer-traveled (vkt), reporting $0.04/vkt for rural highways and $0.02/vkt for urban highways. This supports previous observations that rural areas suffer higher crash costs12,13. Parry14 examined external costs borne by others, finding an average of 2.2–6.6 cents per mile annually for crashes with and without fatalities during 1998–2000. However, these studies often assume an even distribution of crashes and fail to capture spatial variations across road networks.

In reality, crash costs are influenced by road characteristics and traffic conditions, leading to significant fluctuations at a fine geographical scale15,16,17. The principles of “Sustainably-Safe Traffic” emphasize the importance of encouraging drivers to select safer routes to reduce crash casualties18,19, which requires more localized estimates of crash cost differences. Unfortunately, this aspect has not been adequately addressed in previous research.

To address these gaps, this study proposes a methodological framework for link-based crash cost analysis that considers both internal and external cost factors and explains how to quantify the average and marginal values at a microscopic level. It allows for an examination of spatial variations in crash costs, identifying which road segments are safer for travel.

As a proof of concept, this framework is applied to the Minneapolis-St. Paul (Twin Cities) metropolitan area to assess crash costs across the region’s road network. The methodology, data, results, and conclusions of this study are presented in turn in the Methodology, Data, Results, and Conclusion sections.

Methodology

The National Safety Council (NSC) introduced the “KABCO” injury scale to classify crash severity levels: “K” for fatal crashes, “A” for incapacitating injuries, “B” for non-incapacitating injuries, “C” for pain complaints, and “O” for property-damage-only crashes. This scale has been widely adopted for reporting crash records and establishing crash cost estimates20. Previous studies typically estimated overall crash frequency without distinguishing crash types initially, and then analyzed crash severity levels separately21,22,23,24. These estimates can be further used for crash cost analysis by applying unit crash cost specifications to monetize the results. These steps form the line of reasoning followed in this study, as outlined in Fig. 1.

This framework can be applied to any region where the necessary data, including crash records and link-related variables, are accessible. Detailed methodologies are presented in the following subsections.

Fig. 1 Methodological framework: internal and external crash cost estimates.

Crash frequency

Safety Performance Functions (SPFs), as defined in the Highway Safety Manual (HSM)25, are statistical base models that can be used to estimate the average crash frequency based on specific roadway features under existing conditions or to predict crash frequency under projected future conditions26. SPFs are applicable at both the micro level (e.g., individual road segments or intersections) and the macro level (e.g., transportation analysis zones or counties)27.

In this study, micro-level estimation is performed, considering individual link segments as observations. The resulting models are applied to all links across the metropolitan area. Notably, conventional variables used in SPFs include Annual Average Daily Traffic (AADT) and segment length. However, additional variables, such as those representing driving speed or speed variance, are also tested to improve model performance.

We use the Negative Binomial Distribution for the SPFs, which is an extension of the Poisson Distribution that effectively models count data with overdispersion (where variance exceeds the mean)26. This overdispersion often arises from the nature of crash occurrences, as many roads report a frequency of zero crashes, while only a few locations experience multiple incidents. Traditional models may underestimate the variability in such cases, leading to inaccurate representations of crash patterns.
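
As an informal illustration of the overdispersion described above, the following sketch compares the variance of link-level crash counts with their mean. The function name `dispersion_ratio` and the example counts are hypothetical and are not taken from the study data; a ratio well above 1 is only a rough signal, not a formal test.

```python
from statistics import mean, pvariance


def dispersion_ratio(counts):
    """Return variance / mean of crash counts.

    Values well above 1 suggest overdispersion relative to a Poisson model.
    """
    average = mean(counts)
    if average == 0:
        raise ValueError("mean crash count is zero; the ratio is undefined")
    return pvariance(counts) / average


# Hypothetical link-level crash counts: mostly zeros, a few larger values.
example_counts = [0, 0, 0, 0, 0, 0, 1, 0, 2, 7]
ratio = dispersion_ratio(example_counts)
print(f"variance-to-mean ratio: {ratio:.2f}")
```

A ratio close to 1 would be consistent with a Poisson assumption, whereas the zero-heavy example here gives a ratio well above 1.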

The Negative Binomial model for SPFs is expressed as follows,

$$\begin{aligned} \ln (y)=\beta _{0}+\sum _{k=1}^{K}\beta _{k}x_{k} \end{aligned}$$
(1)

where y represents the dependent variable measuring the number of crashes in SPFs, \(x_{k}\) represents the independent variables, K represents the number of independent variables, \(\beta _{0}\) is the intercept, and \(\beta _{k}\) are the coefficients.
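
To make Eq. (1) concrete, the sketch below evaluates the log-linear form for a single link, assuming the two conventional regressors (AADT and segment length) enter in natural-log form, as reported later in the Results. The function name and all coefficient values are hypothetical placeholders, not the estimates from this study.

```python
import math


def spf_expected_crashes(aadt, length_km, beta0, beta_q, beta_l):
    """Evaluate Eq. (1) with ln(Q) and ln(L) as regressors.

    ln(y) = beta0 + beta_q * ln(Q) + beta_l * ln(L)  =>  y = exp(...)
    """
    if aadt <= 0 or length_km <= 0:
        raise ValueError("AADT and segment length must be positive")
    linear_predictor = (
        beta0 + beta_q * math.log(aadt) + beta_l * math.log(length_km)
    )
    return math.exp(linear_predictor)


# Hypothetical coefficients for illustration only.
expected = spf_expected_crashes(
    aadt=12_000, length_km=0.8, beta0=-6.0, beta_q=0.7, beta_l=0.9
)
print(f"expected crashes over the analysis period: {expected:.3f}")
```

With log-transformed regressors, each coefficient acts as an elasticity: a coefficient of 1 on ln(L) would mean that doubling segment length doubles the expected crash count.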

Crash severity

The ordered probit model is widely used for crash severity analysis and is suitable for cases involving categorical dependent variables28,29,30. Specifically, it recognizes the ordinal nature of the data and operates under the assumption of a continuous underlying variable with normally distributed errors, enabling more efficient parameter estimation while reducing the number of required parameters. Moreover, this model effectively addresses the limitations found in traditional approaches, such as the independence of irrelevant alternatives and the lack of a closed-form likelihood30,31.

Its general specification is expressed as:

$$\begin{aligned} y^{*}_{j}=X_{j}\beta +\varepsilon _{j} \end{aligned}$$
(2)

where \(y^{*}_{j}\) is a latent variable describing the crash severity of the \(j^{th}\) crash, \(X_{j}\) is a vector of independent variables, \(\beta\) is a vector of coefficients, \(\varepsilon _{j}\) is the random error term.

The observed variable \(y_{j}\) is determined by the following mapping,

$$\begin{aligned} y_{j}= \left\{ \begin{array}{l l} 1 & \quad \text {if} \ \ -\infty< y^{*}_{j} \le \mu _{1} \\ 2 & \quad \text {if} \ \ \mu _{1}< y^{*}_{j} \le \mu _{2}\\ 3 & \quad \text {if} \ \ \mu _{2}< y^{*}_{j} \le \mu _{3}\\ 4 & \quad \text {if} \ \ \mu _{3}< y^{*}_{j} \le \mu _{4}\\ 5 & \quad \text {if} \ \ \mu _{4}< y^{*}_{j} < \infty \\ \end{array}\right. \end{aligned}$$
(3)

where \(y_{j}=(1,2,3,4,5)\) stands for the different severity levels, denoting property-damage-only, complaint of pain, non-incapacitating injury, incapacitating injury, and fatal crashes, respectively; \(\mu _{1}\), \(\mu _{2}\), \(\mu _{3}\) and \(\mu _{4}\) are the threshold values to be estimated.
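
The mapping in Eqs. (2) and (3) implies that, with standard normal errors, the probability of severity level k is the difference between two normal CDF values evaluated at adjacent thresholds. The sketch below computes these probabilities; the function names, the linear predictor value, and the thresholds are hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def severity_probabilities(linear_predictor, thresholds):
    """Return P(y = 1), ..., P(y = K+1) for an ordered probit model.

    P(y = k) = Phi(mu_k - X*beta) - Phi(mu_{k-1} - X*beta),
    with mu_0 = -infinity and mu_{K+1} = +infinity.
    """
    thresholds = list(thresholds)
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cuts = [-math.inf] + thresholds + [math.inf]
    cdf_values = [normal_cdf(cut - linear_predictor) for cut in cuts]
    return [cdf_values[k + 1] - cdf_values[k] for k in range(len(cuts) - 1)]


# Hypothetical linear predictor and thresholds mu_1..mu_4.
probs = severity_probabilities(0.0, [-1.0, 0.0, 1.0, 2.0])
for level, p in enumerate(probs, start=1):
    print(f"P(severity level {level}) = {p:.4f}")
```

Four thresholds yield five probabilities that sum to one, matching the five KABCO-based severity levels.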

Unit crash cost specifications

From a traveler’s perspective, crash costs include both internal and external components. Internal costs refer to the expenses borne by each traveler involved in a crash, while external costs account for the impact on others, such as victims, insurance companies, and government agencies8.

Blincoe et al.9 evaluated the economic costs of motor vehicle crashes by examining various cost factors based on the severity of incidents. These cost factors include direct expenses, such as medical expenses, property damage, legal fees, and vehicle repairs, as well as indirect costs like lost productivity, insurance premiums, and administrative expenses. However, a key question remains: to what extent are these cost factors external to individual travelers?

Vickrey32 defined crash externality as the increase in crash costs experienced by existing drivers due to additional vehicles on the roads. This definition simplifies the analysis by bypassing the complexities of joint effects, such as driving behavior, insurance policies, and traffic laws. Empirical studies have shown that marginal changes in crash costs can sometimes be negative, as increased congestion raises crash rates but lowers severity due to slower speeds33,34.

Parry14 quantified the external portion of various cost factors for single- and multi-vehicle crashes, providing a foundation for subsequent crash cost studies8,35,36. In our study, we define unit crash cost specifications using estimates from Blincoe et al.9 as a reference (see Table 1). This table outlines the cost factors considered in our crash cost estimates and the external proportion of each cost factor based on Parry14’s study.

In summary, in our estimates,

  • The internal costs of crashes fully account for lost productivity, in both market and household settings, and partially include medical expenses, property damage, emergency service costs, insurance administration, and legal fees. The proportions of these costs are determined by individual health and vehicle insurance policies; however, we use an aggregated average here. The loss of quality of life is considered an internal cost if the incident involves a single vehicle; in the case of multi-vehicle crashes, it is only partially accounted for as an internal cost.

  • The external costs of crashes include the remaining medical expenses, property damage, emergency service costs, insurance administration, legal fees, and the loss of quality of life in multi-vehicle incidents. This category also fully accounts for workplace disruptions due to employee loss or absence, as well as congestion costs resulting from vehicle crashes.

Based on the unit crash cost specifications, the average internal and external crash costs can be estimated as follows:

$$\begin{aligned} C_{\bar{s},i_{f,Q}}=\sum _{z} \frac{N_{s,i_{f,Q}}*R_{s,i_{f,Q},z}*u_{s_{z}}}{N_{Y}*N_{D}*Q} \end{aligned}$$
(4)

where \(C_{\bar{s},i_{f,Q}}\) is the average crash cost on link \(i_{f}\), in which f specifies the functional road classification and Q refers to the Annual Average Daily Traffic (AADT), defining the traffic condition; \(N_{s,i_{f,Q}}\) is the expected crash frequency on link \(i_{f}\); \(R_{s,i_{f,Q},z}\) is the probability that a crash on link \(i_{f,Q}\) is of severity level z; \(u_{s_{z}}\) is the unit crash cost, internal or external, specific to severity level z; and \(N_{Y}\) and \(N_{D}\) describe the duration of the analysis period, representing the number of years of crash counts and the number of days in a year, respectively.
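
Equation (4) can be read as "expected monetized crash losses over the analysis period, divided by the number of vehicles that used the link in that period". A minimal sketch follows; the function name, severity shares, and unit costs are hypothetical placeholders rather than the values in Table 1.

```python
def average_crash_cost(expected_crashes, severity_probs, unit_costs,
                       aadt, n_years, n_days=365):
    """Eq. (4): sum_z N * R_z * u_z / (N_Y * N_D * Q), in $ per vehicle."""
    if len(severity_probs) != len(unit_costs):
        raise ValueError("one unit cost is needed per severity level")
    if aadt <= 0 or n_years <= 0 or n_days <= 0:
        raise ValueError("exposure terms must be positive")
    monetized = sum(
        expected_crashes * share * cost
        for share, cost in zip(severity_probs, unit_costs)
    )
    exposure = n_years * n_days * aadt  # vehicles over the analysis period
    return monetized / exposure


# Hypothetical inputs, ordered from property-damage-only to fatal.
cost = average_crash_cost(
    expected_crashes=12.0,
    severity_probs=[0.70, 0.15, 0.10, 0.04, 0.01],
    unit_costs=[5_000, 30_000, 60_000, 300_000, 5_000_000],
    aadt=10_000,
    n_years=12,
)
print(f"average crash cost: ${cost:.4f} per vehicle")
```

Calling the function once with internal unit costs and once with external unit costs gives the two average cost components for the same link.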

Marginal crash costs are calculated by introducing one additional vehicle on each link to evaluate the combined effects of changes in crash frequency (\(N_{s,i_{f,Q}}\)) and severity (\(R_{s,i_{f,Q},z}\)), expressed as:

$$\begin{aligned} C_{\hat{s},i_{f,Q}}=(C_{\bar{s},i_{f,Q+1}}-C_{\bar{s},i_{f,Q}})\times Q \end{aligned}$$
(5)
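
Equation (5) is a one-vehicle finite difference of the average cost, scaled by the existing traffic Q. The sketch below applies it to any function that returns the average cost at a given AADT; the function names and the linear example cost function are hypothetical and serve only to show the mechanics.

```python
def marginal_crash_cost(average_cost_at, aadt):
    """Eq. (5): (C(Q + 1) - C(Q)) * Q.

    average_cost_at is a callable mapping AADT to the average crash cost
    per vehicle on the link (for example, Eq. (4) re-evaluated at Q + 1).
    """
    if aadt <= 0:
        raise ValueError("AADT must be positive")
    return (average_cost_at(aadt + 1) - average_cost_at(aadt)) * aadt


def example_average_cost(aadt):
    # Hypothetical: average cost per vehicle rises linearly with traffic.
    return 1e-5 * aadt


marginal = marginal_crash_cost(example_average_cost, 10_000)
print(f"marginal crash cost: ${marginal:.4f} per vehicle")
```

For this linear example the result matches the average cost at the same AADT, because the additional vehicle raises the per-vehicle cost of all Q existing vehicles by the same small amount. If the average cost falls as traffic grows, the same formula returns a negative value.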

Table 1 Unit crash cost specifications and their external portions.

Data

This study incorporates multiple datasets to implement the proposed framework in a real-world scenario, as outlined in the process shown in Fig. 1. The geographical scope of the study area, the seven-county Minneapolis-St. Paul Metropolitan Area, is shown in Fig. 2.

Fig. 2 Map of the Minneapolis-St. Paul (Twin Cities) Region.

Crash records

Crash records from 2003 to 2014 were obtained from the Minnesota Department of Transportation (MnDOT). Note that these records track police-reported crashes, which are more reliable for documenting severe crashes but likely under-report minor ones. Throughout the remainder of the text, crash data refers only to reported crashes.

The records for each year include GIS attributes, e.g., route numbers, reference points, and coordinates, along with crash-related details like type, severity, weather conditions, and lighting. These records are provided as GIS shapefiles, enabling precise mapping onto the road network. The data can be aggregated by link segment to calculate crash counts for use as dependent variables in safety performance functions or analyzed individually to examine crash severity for ordered probit models.
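
The aggregation step described above can be sketched as a simple count of crash records per link, keeping links with no recorded crashes as zeros so that they remain in the estimation sample. The record layout and the field name `link_id` are assumptions made for illustration; the actual shapefile attributes may differ.

```python
from collections import Counter


def crash_counts_by_link(crash_records, link_ids):
    """Count crashes per link, returning 0 for links without any record."""
    counts = Counter(record["link_id"] for record in crash_records)
    return {link_id: counts[link_id] for link_id in link_ids}


# Hypothetical records already matched to network links.
records = [
    {"link_id": "A", "severity": "O"},
    {"link_id": "A", "severity": "B"},
    {"link_id": "C", "severity": "K"},
]
counts = crash_counts_by_link(records, link_ids=["A", "B", "C"])
print(counts)
```

Keeping the zero-crash links matters for the count models, since dropping them would bias the estimated crash frequency upward.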

Table 2 summarizes the number of crashes, categorized by crash type, severity, and Functional Road Classification, over the 12-year period.

Table 2 Total number of crashes (12 years) by crash type, severity, and functional road classification.

Link variables

The TomTom road network, sourced from the Metropolitan Council, is drawn as a polyline shapefile that provides spatial details of roadways within the Twin Cities network37. Variables such as link length and road type are derived directly from this dataset.

TomTom speed data offers speed estimates aggregated from millions of GPS records and linked to the TomTom road network. These estimates are stratified by time periods, dividing a day into seven parts to account for peak and non-peak hours, as well as by speed percentiles, ranging from the fastest 5% to the slowest 5% of recorded speeds. For this study, the 50th percentile (median) speed during morning peak hours (7 a.m. to 9 a.m.) is selected to represent travel speed, while the difference between the 10th and 90th percentiles is used as an indicator of speed variance. These measures serve as independent variables in crash frequency estimation models. Notably, the speed variance reflects an aggregated yearly index across all vehicles using the link, rather than intra-day or intra-vehicle variations.
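
A minimal sketch of how the two speed measures in this paragraph could be derived from percentile speeds for one link and one time period is given below. The dictionary layout, function name, and speed values are hypothetical; the sketch follows the 10th and 90th percentile definition stated in this paragraph and takes the absolute difference so that the result does not depend on how the percentiles are ordered.

```python
def speed_indicators(percentile_speeds):
    """Return (median speed, speed spread) for one link and time period.

    percentile_speeds maps a percentile (10, 50, 90) to a speed in km/h.
    """
    median_speed = percentile_speeds[50]
    speed_spread = abs(percentile_speeds[90] - percentile_speeds[10])
    return median_speed, speed_spread


# Hypothetical morning-peak percentile speeds for one link (km/h).
am_peak = {10: 48.0, 50: 62.0, 90: 71.0}
v, v_var = speed_indicators(am_peak)
print(f"median speed: {v} km/h, spread: {v_var} km/h")
```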

The MnDOT Traffic Volume Program38 provides Annual Average Daily Traffic (AADT) estimates for Minnesota, based on data collected from approximately 33,000 count locations on trunk highways, county state aid highways (CSAH), county roads (CR), and municipal state aid streets (MSAS). Traffic counts, typically recorded over short durations (e.g., 48 hours), are adjusted using seasonal and axle correction factors (for trunk highways). As a standard independent variable in safety performance functions, AADT data for the Twin Cities metro region was extracted and integrated with the TomTom road network.

The Federal Urban/Rural GIS Shapefile, obtained from MnDOT’s Transportation Data and Analysis division39, delineates roadways in Minnesota by Federal Adjusted Urban Area boundaries into Urban, Small Urban, and Rural classifications. This dataset is also linked to the TomTom road network for further analysis.

Results

Separate models are developed for estimating crash frequency and severity, specific to crash type and functional road classifications, for two main reasons: first, these estimates are used in crash cost analysis, where internal and external cost assignments differ by crash type and by single- versus multi-vehicle crashes; second, using specialized models for different road types is statistically validated, as road types have distinct attributes influencing crash characteristics23,40.

Safety performance function

The selected independent variables for the SPFs are described in Table 3. Note that \(V_{\text {Var}}\) is not the typical measure of speed variance, which reflects the dispersion of space-mean speeds among drivers within or across lanes at the same time41,42. Instead, it more likely represents the dispersion of time-mean speeds, calculated as the difference between the fastest 5% and the slowest 5% speeds over a specific time period, such as morning peak hours across a year.

Table 3 Definitions and descriptive statistics of independent variables selected for the safety performance functions.
Table 4 Safety performance function results for single-vehicle and multi-vehicle crashes by roadway class.

The regression results for the SPFs are presented in Table 4. Note that speed (V) is excluded from all models to avoid a multicollinearity problem, as it is highly correlated with AADT (Q). Both AADT (Q) and segment length (L) are transformed into their natural logarithmic forms for better model performance. Alternative functional forms were tested but did not improve the model fits.

The conventional variables (Q and L) have significant positive effects on crash counts for both single- and multi-vehicle crashes across all road classifications. As expected, links with higher AADT or longer lengths experience more crashes, regardless of the crash type. Speed variance, which indicates on-road shockwaves, is positively correlated with crash counts, highlighting that more severe stop-and-go driving conditions are associated with higher collision rates, particularly for multi-vehicle crashes.

Additionally, urban roadways tend to have higher crash counts than rural ones. This pattern may be attributed to network structure features, such as higher road density in urban areas.

Ordered probit model

The same link property attributes are used as independent variables in the ordered probit models to estimate the probability of each injury severity category for a given crash. Additional dummy variables are included to capture road surface conditions, including \(W_{\text {Wet}}\), \(W_{\text {Snow}}\), \(W_{\text {Iced}}\), and \(W_{\text {Others}}\), with dry road surface serving as the baseline. To account for light conditions, two dummy variables are added to identify the road light-on (\(D_{\text {light-on}}\)) and light-off (\(D_{\text {light-off}}\)) scenarios, using daylight as the reference group. The descriptive statistics for these variables are summarized in Table 5.

As in the safety performance functions, the natural logarithmic transformations of Q and L are used, as they yield lower AIC values and higher pseudo \(R^{2}\) compared to the untransformed variables. The regression results for the ordered probit models are shown in Table 6.
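
The two fit criteria mentioned above can be computed directly from model log-likelihoods. The sketch below uses the standard AIC definition and, as an assumption, McFadden's form of the pseudo \(R^{2}\), since the text does not state which variant is used. All log-likelihood values and parameter counts are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def mcfadden_pseudo_r2(log_likelihood, null_log_likelihood):
    """McFadden pseudo R-squared: 1 - ln(L_model) / ln(L_null)."""
    if null_log_likelihood == 0:
        raise ValueError("null log-likelihood must be non-zero")
    return 1.0 - log_likelihood / null_log_likelihood


# Hypothetical log-likelihoods for two specifications of the same model.
null_ll = -1100.0
ll_log_transformed = -950.0
ll_untransformed = -980.0

aic_log = aic(ll_log_transformed, n_params=8)
aic_raw = aic(ll_untransformed, n_params=8)
r2_log = mcfadden_pseudo_r2(ll_log_transformed, null_ll)
r2_raw = mcfadden_pseudo_r2(ll_untransformed, null_ll)
print(f"AIC: {aic_log:.1f} vs {aic_raw:.1f}")
print(f"pseudo R2: {r2_log:.3f} vs {r2_raw:.3f}")
```

In this made-up comparison the log-transformed specification has the lower AIC and the higher pseudo \(R^{2}\), which is the pattern of evidence used here to prefer it.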

The analysis reveals that AADT is negatively correlated with crash severity, indicating that higher traffic volumes are associated with less severe crashes, likely because drivers tend to exercise greater caution on busier roadways. Segment length shows a positive relationship with crash severity, as longer segments are associated with more severe crashes. Speed variance has some impact, particularly in cases such as multi-vehicle crashes on primary and minor arterials; however, the coefficients are too small to significantly affect the probability of each injury severity category. This finding is surprising, as greater speed variance was expected to have a stronger positive effect, particularly for multi-vehicle crashes. Future studies should consider this question. Additionally, urban roadways are associated with less severe crashes, likely due to lower speeds and shorter travel distances in urban environments.

The road surface condition variables show that crashes on wet, snowy, and icy roads are associated with lower injury severity compared to crashes on dry roads. These results align with previous findings that adverse winter conditions, such as snow and ice, tend to reduce injury severity due to lower average speeds and more cautious driving. However, property damage-only crashes are more frequent during winter, as noted in other studies43,44,45.

Link-based crash frequency and severity estimates

Link-based crash frequency and severity, along with their marginal changes, are estimated under current traffic conditions. The mean values across different link types are summarized in Table 7. As expected, highways experience a higher number of crashes compared to surface roadways for both single- and multi-vehicle crashes; however, the proportion of injury crashes is comparatively lower, with property damage-only (PDO) crashes accounting for over 70% of the total. Urban roads generally have a higher crash frequency than rural roads, while roads in core cities, Minneapolis and St. Paul, exhibit lower crash frequencies compared to suburban areas (urbanized areas outside the core cities). The severity distribution across area types indicates that fatal crashes, for both single- and multi-vehicle types, represent a higher percentage in rural areas. Additionally, the introduction of more vehicles on roads increases crash frequency but reduces injury severity, consistent with the regression results.

Table 5 Descriptive statistics of independent variables selected for the ordered probit models.
Table 6 Regression results of ordered probit models to estimate crash severity for single-vehicle and multi-vehicle crashes by roadway class.
Table 7 Estimates of average crash frequency and severity and their marginal changes.

Crash cost estimates

The internal and external crash costs are estimated for each link in the Twin Cities road network, considering both average and marginal costs. Weighted average costs are calculated to summarize the estimates using the following formula:

$$\begin{aligned} \bar{C}_{w,s}=\frac{\sum _{i}C_{s,i}*Q_{i}*L_{i}}{\sum _{i}Q_{i}*L_{i}} \end{aligned}$$
(6)

where \(\bar{C}_{w,s}\): Weighted average crash cost; \(C_{s,i}\): Crash cost on link i ($/veh); \(Q_{i}\): AADT on link i; \(L_{i}\): Segment length of link i (km).
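
Equation (6) weights each link's cost by its vehicle-kilometers of travel (AADT times length), so that heavily used links count for more in the network summary. A minimal sketch with hypothetical link values:

```python
def weighted_average_cost(links):
    """Eq. (6): sum_i C_i * Q_i * L_i / sum_i Q_i * L_i.

    links is an iterable of (cost, aadt, length_km) tuples.
    """
    links = list(links)
    total_weight = sum(aadt * length for _, aadt, length in links)
    if total_weight <= 0:
        raise ValueError("total AADT-length weight must be positive")
    weighted_sum = sum(cost * aadt * length for cost, aadt, length in links)
    return weighted_sum / total_weight


# Hypothetical links: (crash cost, AADT, segment length in km).
example_links = [(0.10, 1000, 1.0), (0.20, 3000, 1.0)]
network_cost = weighted_average_cost(example_links)
print(f"weighted average crash cost: {network_cost:.3f}")
```

Because the second link carries three times the traffic, the weighted average sits closer to its cost than a simple mean of the two links would.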

The results, presented in Table 8, show that the weighted average internal crash cost across all links in the Twin Cities is approximately $0.13/veh-km, with 94.5% of links having a value below $1.00/veh-km. While costs are similar by area type, roads in core cities generally exhibit higher internal crash costs than suburban roads, and rural roads are slightly more hazardous. Highways, however, are significantly safer than other surface roadways, with a weighted average cost of about one-third to one-half of that for other roads.

Table 8 Link-based Internal and External Crash Cost Estimates ($/vkt): This table shows that the internal crash costs incurred by travelers themselves are higher than the external costs imposed on others, and that highway crash costs are lower compared to those for other road types.

The external average crash cost imposed on others is much lower than the internal costs borne by travelers. The weighted average external cost is approximately $0.08/veh-km, with 98% of links having an external average crash cost below $1.00/veh-km. The trends for external costs by area type and road type are the same as those for internal crash costs.

Marginal internal and external crash costs are lower than their average counterparts, but the patterns remain consistent: internal crash costs borne by travelers exceed external costs imposed on others, and highways and suburban roads (outside core cities) are safer than other roadways.

Our estimates reveal no significant conflicts with previous studies in terms of magnitude. For instance, Levinson and Gillen46 reported an average crash cost of $0.048 per veh-km for intercity highways. Although this estimate does not specify the proportions of internal and external costs, we believe it primarily reflects the internal costs of crashes, as the factors considered mainly pertain to the functional years lost due to personal injury severity. Additionally, several studies have estimated external crash costs, e.g., Lemp and Kockelman36, who reported $0.077 per veh-km, and Parry and Small47, who reported $0.026 per veh-km in the United States. However, to the best of our knowledge, no previous studies provide fine-grained crash cost estimates that differentiate between internal and external costs by urban area and road type. Therefore, no further comparisons can be made at this stage.

Figure 3 illustrates the spatial distribution of crash cost estimates across the network. It confirms that internal crash costs exceed external costs for both average and marginal calculations (Fig. 3a vs. 3b and Fig. 3c vs. 3d). Additionally, highways are notable for being safer than surface roadways, as indicated by the blue shape in the network maps.

Fig. 3 Link-based crash cost estimates for both internal and external versions ($/veh-km): highways are notable in these maps for being safer than surface roadways, as indicated by the blue shape.

Conclusion

This study develops a framework for link-based crash cost analysis, which quantifies the internal and external costs of vehicle crashes at a detailed, microscopic level and implements these costs across a metropolitan road network. To be more specific, internal costs refer to those borne by travelers involved in crashes, while external costs are those imposed on others, such as victims, insurance companies, or government agencies.

The framework is applied to the Twin Cities region as a case study. The results indicate that the weighted average internal crash cost is approximately $0.130/veh-km, while the average external crash cost is about $0.079/veh-km. This demonstrates that travelers bear significantly higher crash costs themselves compared to what they impose on others. Importantly, highways exhibit lower average internal and external crash costs than surface roadways, reinforcing the conclusion that they are safer, in line with 2022 statistics from the Insurance Institute for Highway Safety48. This finding suggests that the superior design standards of highways, such as separation of traffic, access control, and surface configurations, enhance both travel efficiency and safety. Therefore, allocating resources to upgrade existing roadway infrastructure to higher standards could effectively reduce the incidence of crashes and enhance overall safety in urban areas. However, specific investments should be guided by detailed benefit-cost evaluations, consideration of government budgets, and alignment with strategic plans, particularly if other safety management projects need to take priority for greater efficiency.

Given a value of time of $18.30/hr ($0.305/min)49, which is about twice the average internal crash cost and three times the external cost (based on the network’s mean 50th percentile speed of 62.58 km/hr), crash costs appear to play a smaller role than travel time in route decisions, even if travelers were aware of these costs50. However, insurance companies could play a crucial role by offering incentives for safe driving practices. For instance, reduced premiums or rewards for drivers who maintain a clean driving record could encourage more cautious behavior behind the wheel. This approach, however, raises important technical questions about how to keep link-based datasets dynamically updated and how to communicate this information effectively to drivers. If these challenges can be addressed, a positive feedback loop could be established: as dangerous routes are improved and become safer, the updated estimates would reflect these changes, potentially altering drivers’ route preferences even further. Policymakers may also benefit from addressing these challenges, as doing so will help identify areas that require targeted safety interventions. Additionally, these datasets could be particularly important for the decision-making processes of intelligent automated vehicles, which, while generally safer, will likely still face risks in mixed environments with human-driven vehicles, pedestrians, and bicyclists.
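
The comparison between the value of time and crash costs at the start of the preceding paragraph relies on converting per-kilometer costs to per-minute costs at the network's mean speed. The sketch below reproduces that conversion using only the figures quoted in the text; the function name is hypothetical.

```python
def cost_per_minute(cost_per_veh_km, speed_km_per_hr):
    """Convert a cost in $/veh-km to $/min of travel at a given speed."""
    return cost_per_veh_km * speed_km_per_hr / 60.0


value_of_time_per_min = 18.30 / 60.0   # $/min, from $18.30/hr
mean_speed = 62.58                     # km/hr, network mean 50th percentile

internal_per_min = cost_per_minute(0.130, mean_speed)
external_per_min = cost_per_minute(0.079, mean_speed)

ratio_internal = value_of_time_per_min / internal_per_min
ratio_external = value_of_time_per_min / external_per_min
print(f"value of time / internal crash cost: {ratio_internal:.2f}")
print(f"value of time / external crash cost: {ratio_external:.2f}")
```

Under these inputs the value of time per minute comes out a little above twice the internal crash cost and between three and four times the external cost.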

Notably, this study uses historical crash records, constrained by limited data access, to validate the practicality of the theoretical framework for link-based crash cost estimates. While we do not anticipate that our key findings would change significantly with a more recent dataset, given the recent statistics from crash reports and the consistency of previous studies over the past decade, we strongly encourage future research to keep the data updated, preferably in a dynamic manner. This will better inform safety-related policy adjustments and enhance policy implementation.

Additionally, the estimates of crash frequency and severity, which serve as the foundation for the link-based crash cost analysis, are derived using negative binomial regression and ordered probit models. Currently, these models lack several important link attributes, such as the number of lanes, speed limits, curvature, and slope, all of which are anticipated to enhance predictive accuracy. While this limitation is not significant for the case of the Twin Cities, which are predominantly flat with well-maintained roads and effective lighting, it would be beneficial for future studies, especially those applying our framework in areas with varying slopes and road conditions, to collect these data for more reliable estimates.

Finally, for the unit cost specification, the framework employs Parry14’s settings for the external proportion of crash costs, which are derived from plausible yet heuristic methods. Refinements to these settings would improve accuracy, but would require more detailed data, including information on driver behavior, insurance policies, crash records, responsibility distribution, vehicle types, and other relevant factors.