Introduction

Condensation trails (contrails) are cirrus clouds that form when water vapor in cold, humid air at high altitudes condenses onto soot particles emitted from aircraft engines. Like other cirrus clouds, contrails have a consequential impact on the planet’s temperature by absorbing outgoing longwave radiation and reflecting incoming solar radiation. The Intergovernmental Panel on Climate Change estimates that contrails make a substantial contribution to aviation’s impact on global warming1.

The estimated warming effect comes primarily from persistent contrails, which are created by a small fraction of flights and last from ten minutes to over twenty hours2. Persistent contrails are formed when planes fly through ice-supersaturated regions (ISSRs), where the relative humidity with respect to ice is greater than 100%, and temperature conditions satisfy the Schmidt-Appleman Criterion3. Aircraft do not often fly through these conditions, and analysis of flight paths has shown that only a small percentage of flights (2–13%) would need to make small adjustments to avoid the majority of estimated contrail warming4,5,6,7,8. Indeed, simulations of navigational contrail avoidance (the practice of flying to avoid contrail-forming regions) have shown that it could be an extremely cost-effective way to reduce anthropogenic climate forcing from aviation8,9. ISSRs have a typical size of ≈ 150 km horizontally10 and several hundred meters vertically11,12, so navigational contrail avoidance primarily involves flying above or below the contrail-forming regions.
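The ice-supersaturation condition described above (relative humidity with respect to ice greater than 100%) can be illustrated with a short sketch. It assumes the Murphy and Koop (2005) fit for saturation vapor pressure over ice, which is a common choice but is not stated in this paper, and all function names are illustrative. The Schmidt-Appleman Criterion is not covered here.

```python
import math

EPSILON = 0.622  # ratio of the molar masses of water vapor and dry air


def saturation_pressure_ice(temperature_k: float) -> float:
    """Saturation vapor pressure over ice in Pa.

    Assumption: Murphy and Koop (2005) parameterization; the paper does not
    say which formula its forecast models use.
    """
    t = temperature_k
    return math.exp(9.550426 - 5723.265 / t + 3.53068 * math.log(t) - 0.00728332 * t)


def relative_humidity_ice(specific_humidity: float, pressure_pa: float,
                          temperature_k: float) -> float:
    """Relative humidity with respect to ice, as a fraction (1.0 = 100%)."""
    # Convert specific humidity (kg/kg) to water vapor partial pressure (Pa).
    vapor_pressure = specific_humidity * pressure_pa / (
        EPSILON + (1.0 - EPSILON) * specific_humidity
    )
    return vapor_pressure / saturation_pressure_ice(temperature_k)


def is_ice_supersaturated(specific_humidity: float, pressure_pa: float,
                          temperature_k: float) -> bool:
    """True when RHi exceeds 100%, the ISSR condition."""
    return relative_humidity_ice(specific_humidity, pressure_pa, temperature_k) > 1.0


# Hypothetical air parcel at cruise altitude: 205 hPa, 218 K.
print(is_ice_supersaturated(8e-5, 20500.0, 218.0))
```

The input values in the last line are made up for illustration; in practice they would come from numerical weather forecast fields.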

The practical implementation of contrail avoidance faces several challenges. An important hurdle is accurately predicting contrail likely zones (CLZs), regions where contrails are likely to form and persist long enough to have substantial climate impact. Numerical weather prediction models, while valuable, are subject to inaccuracies when predicting ice-supersaturation at the fine-grained spatial scales relevant for contrail formation13,14,15,16. Operational constraints within airline systems, including air traffic management, can also limit the seamless implementation of flight adjustments that are possible in simulations. Proving that these prediction and operational difficulties can be overcome is necessary in order to evaluate whether navigational contrail avoidance is a viable climate change mitigation strategy17. Furthermore, rigorously confirming that CLZ avoidance translates to reduced contrail formation requires robust methodologies, such as well-designed randomized controlled trials.

To test the feasibility of contrail avoidance by a single commercial airline, we conducted a study with American Airlines. Ten senior pilots participated in the trial, completing 44 flights between January and June 2023. Half of the flights (22) were rerouted to avoid CLZs, while the other half kept to their planned routes through CLZs, providing an effective control. We focused exclusively on demonstrating the feasibility of contrail avoidance on a per-flight basis and did not consider the radiative forcing of avoided contrails.

To forecast CLZs, we used both CoCiP18,19,20 (a physics-based simulation model) and a machine learning (ML) model trained on a database of automatically detected contrails21, which attempts to correct for shortcomings in the weather forecast data (see Methods section). Using satellite imagery and human evaluators who were blinded to treatment and control group assignment, we assessed contrail formation on a per-flight basis. We assessed the effectiveness of our forecasts with a crossover randomized controlled trial, where outbound and return flights in a pair were randomized to pass through or avoid the same forecasted CLZs. This design helped us control for confounding factors such as weather conditions and aircraft engines. Flights in the treatment group adjusted their routes to avoid CLZs, and we observed the resulting flights in satellite imagery.

Simulations of navigational contrail avoidance6,7 show that small adjustments can prevent persistent contrails. Our methods using satellite imagery do not allow us to test this hypothesis directly; instead, we study the impact of flight adjustments on the formation of satellite-detectable contrails.

A recent study tested contrail avoidance in the Maastricht Upper Area Control region by adjusting flight altitudes every other day when potential persistent contrails were forecasted22. They used satellite imagery and a contrail detection algorithm to assess whether the deviations were successful on average in an ice-supersaturated region (ISSR). While their approach provided insights into the effectiveness of avoidance maneuvers, it required intervention at the airspace level, limiting its applicability for individual airlines to implement specific mitigation strategies.

Our study builds upon and expands these findings by focusing on a targeted, per-flight approach. This allowed us to pinpoint whether detectable contrails were formed by specific flights of interest, eliminating the need for full airspace control and enabling airline-level avoidance implementation. Furthermore, our crossover trial design provided robust statistical significance while impacting a substantially smaller number of flights compared to the alternate-day approach used in the Maastricht study22. Lastly, our per-flight intervention enabled the airline to track the fuel usage impact of avoidance maneuvers, a crucial factor for operationalizing contrail avoidance strategies.

Methods

Flight sample inclusion & exclusion criteria

To select flights for inclusion in the trial, we applied several screening criteria. First, we identified flights whose paths intersected forecasted CLZs. Next, we selected flights that departed a hub (outbound) airport, typically Dallas or Phoenix, to a satellite airport and then returned to the hub (inbound) on the same day. To fly through the same atmospheric conditions on both outbound and inbound flight legs, we restricted the sample to flights where the satellite airport was near a CLZ, and the flight time between hub and satellite airports was less than 4 hours each way. To enable the PACE software to receive CLZ prediction updates, we also required that participating aircraft have internet connections, which are provided by vendor WiFi networks through a combination of ground-to-air and satellite systems. Note that the PACE software is not directly integrated with the aircraft’s systems. The candidates from this screening were selected manually approximately two days in advance. The trial included a total of n = 44 flights. Details on these flights are listed in Table 1 of the supplementary material.

This study focused on tactical near-airport detectable contrail avoidance: delaying the plane’s climb after takeoff or descending early before landing to avoid flying through the CLZ. Contrail forecasts were not integrated into flight planning systems at the time of the study, so participating pilots were required to avoid contrails en route. Focusing on delaying the climb or descending early was the simplest way for pilots to optimize the route, work with air traffic control and manage other operational constraints. An example of an application of this strategy is in Fig. 1.

Fig. 1: Successful contrail avoidance as seen on the PACE panel.

The PACE panel shows the vertical profile (purple) of a late ascent contrail avoidance maneuver. A contrail likely zone (CLZ) is shown in gray, just above the left side of the flight path. The pilot originally planned to fly at FL360 (36,000 feet), the level of the gray line. By staying at FL320 (32,000 feet) for part of the flight, the CLZ was avoided and no detectable contrails were created.

We used two methods to forecast CLZs: a physics-based simulation based on the Contrail Cirrus Prediction model (CoCiP)18,19 implemented in the open-source pycontrails library23 and a machine learning (ML) system trained on contrail detections and collocated numerical weather data. Both methods use numerical weather forecast data along an advection path as their primary input. CoCiP uses physical processing based on cloud microphysics, while the ML model is a neural network trained using satellite contrail detection labels.

To identify candidate flights for the trial and plan contrail avoidance routes, we generated altitude-specific CLZ predictions for altitudes between 8 and 13 km at hourly resolution, using both the CoCiP and ML models two days before the planned flights. We used the European Centre for Medium-Range Weather Forecasts (ECMWF) high-resolution forecast model for weather inputs. Flights were selected when the two forecasts agreed that a CLZ would be near the turnaround airport for both flights. On the day of the flight, pilots coordinated with dispatchers and air traffic control to make the recommended vertical flight adjustments based on updated predictions of the ML-based model.
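The agreement requirement between the two forecasts can be sketched as a simple set intersection. This is only an illustration under stated assumptions: the paper does not describe how agreement was computed (candidates were selected manually), and the grid-cell representation and function names below are hypothetical.

```python
# Hypothetical representation: each CLZ forecast is a set of grid cells,
# where a cell is (lat_index, lon_index, flight_level, hour_utc).


def agreed_cells(cocip_cells, ml_cells):
    """Grid cells flagged as a CLZ by both the CoCiP and the ML forecast."""
    return set(cocip_cells) & set(ml_cells)


def is_candidate_pair(cocip_cells, ml_cells, outbound_cells, return_cells):
    """True if both forecasts agree on a CLZ crossed by both flight legs.

    outbound_cells / return_cells are the grid cells each planned leg
    passes through near the turnaround airport.
    """
    agreed = agreed_cells(cocip_cells, ml_cells)
    return bool(agreed & set(outbound_cells)) and bool(agreed & set(return_cells))


# Made-up example: both models flag cell (10, 20, 360) at hours 18 and 19.
cocip = {(10, 20, 360, 18), (10, 20, 360, 19), (11, 20, 360, 18)}
ml = {(10, 20, 360, 18), (10, 20, 360, 19)}
outbound = {(10, 20, 360, 18)}
inbound = {(10, 20, 360, 19)}
print(is_candidate_pair(cocip, ml, outbound, inbound))
```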

Contrail likely zone avoidance planning and execution

Prior to the departure of each participating flight, the outbound flight was randomly assigned to either the control group which flew through the CLZ as originally planned, or the treatment group which adjusted its flight to avoid the CLZ. The return flight was assigned to the opposite group to serve as a matched pair. Since flights were chosen with a CLZ near a turn-around, the same plane both flew through and avoided each CLZ 1 to 2 hours apart.
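The matched-pair assignment described above can be sketched in a few lines. This is a minimal illustration, assuming an equal-probability coin flip for the outbound leg; the paper does not specify the randomization mechanism, and the function name is hypothetical.

```python
import random


def assign_pair(rng: random.Random) -> dict:
    """Randomly assign the outbound leg; the return leg gets the opposite group."""
    outbound = rng.choice(["control", "treatment"])
    inbound = "treatment" if outbound == "control" else "control"
    return {"outbound": outbound, "return": inbound}


rng = random.Random(2023)  # fixed seed only so the sketch is reproducible
assignments = [assign_pair(rng) for _ in range(22)]  # 22 pairs = 44 flights

# Every pair contributes exactly one control and one treatment flight.
n_control = sum(1 for a in assignments for g in a.values() if g == "control")
print(n_control, "control flights out of", 2 * len(assignments))
```

By construction, each group always contains exactly half of the flights, regardless of how the individual coin flips fall.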

Contrail avoidance flight legs were planned using flight management system software developed by PACE, integrated with the ML contrail forecasts. This system enabled flight planners and pilots to make tactical decisions to avoid contrails, such as manually changing the altitude before takeoff or adjusting the altitude in flight. The platform’s interface was modeled on clear air turbulence, a concept already familiar to the pilots.

Figure 1 shows an example of the contrail screen used for an experiment flight in which the pilot delayed ascent after takeoff to avoid detectable contrails. This flight adjustment decreased the probability of contrail formation because the flight was at a lower altitude during the avoidance treatment than the CLZ lower bound. Note that the aircraft initially climbed to FL320 (pressure = 275hPa, temperature = 235K), which was below the relevant CLZ but still an altitude where contrails can form under certain conditions, and climbed to FL380 (pressure = 205hPa, temperature = 218K) once the CLZ had been passed.
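The pressures quoted for the two flight levels can be roughly checked against the International Standard Atmosphere (ISA). The sketch below is an illustration only: it assumes ISA conditions, which need not match the actual atmosphere on the day, and it covers pressure only, not the quoted temperatures.

```python
import math

FEET_TO_M = 0.3048
P0_HPA = 1013.25        # ISA sea-level pressure
T0_K = 288.15           # ISA sea-level temperature
LAPSE_K_PER_M = 0.0065  # ISA tropospheric lapse rate
TROPOPAUSE_M = 11000.0
G = 9.80665             # gravitational acceleration, m/s^2
R_DRY = 287.053         # specific gas constant of dry air, J/(kg K)


def isa_pressure_hpa(flight_level: int) -> float:
    """ISA pressure in hPa at a flight level (hundreds of feet)."""
    h = flight_level * 100 * FEET_TO_M
    exponent = G / (R_DRY * LAPSE_K_PER_M)
    if h <= TROPOPAUSE_M:
        return P0_HPA * (1.0 - LAPSE_K_PER_M * h / T0_K) ** exponent
    # Above the tropopause the ISA temperature is constant (isothermal layer).
    p_trop = P0_HPA * (1.0 - LAPSE_K_PER_M * TROPOPAUSE_M / T0_K) ** exponent
    t_trop = T0_K - LAPSE_K_PER_M * TROPOPAUSE_M
    return p_trop * math.exp(-G * (h - TROPOPAUSE_M) / (R_DRY * t_trop))


for fl in (320, 380):
    print(f"FL{fl}: {isa_pressure_hpa(fl):.1f} hPa")
```

Under these assumptions the results land within a few hPa of the 275 hPa and 205 hPa values quoted above.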

Satellite image-based verification

To determine whether contrails were created or avoided for each flight, we used a sequence of false-color GOES-16 infrared satellite images, which included wind-advected Automatic Dependent Surveillance Broadcast flight trajectories of both the target flight and other nearby flights at 10-minute intervals. The false color helped to highlight the presence of clouds; in particular, thin cirrus clouds should show up as dark features in the image21,24. Access to all image sequences used can be found through links in Note 1 of the supplementary material. Three evaluators (authors of this work) independently assessed whether a contrail was present in each image sequence and whether it was formed by the target flight. The evaluators were blinded with respect to which flights were in the control versus the treatment group, as well as to flight altitude information, which correlated with treatment. Evaluators assessed contrail formation at any point along the flight path, not just near the turnaround airports where flight adjustments took place. This was to avoid subjective judgment calls about which portions of the control flights were close enough to the target airports to be labeled.

Figure 2 shows an example satellite image, where the orange line depicts the expected location of the wind-advected flight trajectory over time and the blue lines indicate contrails detected by an automated computer vision system21. The linear contrail intersects the wind-advected flight trajectory 30 minutes after the flight passed through the advected airspace. Each labeling task consisted of analyzing sequences of these images, which were generated for the entire flight path, following each wind-advected flight trajectory for 2 hours post-flight. Evaluators were tasked with judging whether a contrail (which would show up as a dark, linear feature in the image) lines up with the expected location of the flight trajectory. Automated contrail detections were provided to the evaluators, though evaluators had the option to label flights as having formed a contrail if that contrail was visible to them, even if it was not detected by the automated system.

Fig. 2: GOES-16 satellite perspective of original and wind-advected flight paths, with contrail detections.

Example of one frame of the GOES-16 satellite imagery sequence over the Gulf Coast area. This was used for labeling whether American Airlines flight 189 created a detectable contrail. Thick lines show the original flight path and wind-advected flight trajectory, along with contrails detected by the computer vision system21. Other advected flight paths have a variety of lighter colors on thinner lines. In this case the alignment between the advected flight path and the observed contrail led the evaluators to conclude that this flight made a detectable contrail.

Randomized crossover trial

We conducted a randomized crossover intervention trial to establish causality between contrail avoidance and contrail presence. In particular, we were interested in testing the one-sided alternative hypothesis that the treatment group (contrail avoidance) would have fewer contrails than the control group (no contrail avoidance). The outbound and return flights sharing the same turnaround airport and CLZ were treated as matched pairs, controlling for confounders such as weather conditions and aircraft engines.

To account for the relatively small sample size, we used nonparametric permutation testing. This approach does not make any assumptions about the distribution of the data. For each matched set of flights, we randomly permuted the treatment and control labels and computed a paired-sample exact sign-test statistic on the binary results25,26. We repeated this process 200,000 times to generate the null distribution, and calculated the one-sided p-value as the proportion of permuted datasets in which the test statistic was smaller or equal to the test statistic for the observed dataset. A link to the necessary code and data to reproduce the analysis can be found in Note 2 of the supplementary material.
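The permutation procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code (which is linked in the supplementary material): the statistic is assumed to be the sum of treatment-minus-control outcomes over pairs, and the example data are hypothetical, not the trial results.

```python
import random


def sign_statistic(pairs):
    """Sum of (treatment - control) over matched pairs.

    Each pair is (control_outcome, treatment_outcome), with 1 = contrail
    observed and 0 = no contrail. Negative values mean fewer contrails
    in the treatment flights.
    """
    return sum(treatment - control for control, treatment in pairs)


def permutation_p_value(pairs, n_permutations=200_000, seed=0):
    """One-sided p-value: share of label permutations whose statistic is
    smaller than or equal to the observed one."""
    rng = random.Random(seed)
    observed = sign_statistic(pairs)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null, swapping labels within a pair is equally likely.
        permuted = [(t, c) if rng.random() < 0.5 else (c, t) for c, t in pairs]
        if sign_statistic(permuted) <= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical outcomes for 8 matched pairs (NOT the trial data).
toy_pairs = [(1, 0)] * 5 + [(0, 1)] + [(1, 1), (0, 0)]
p_value = permutation_p_value(toy_pairs, n_permutations=20_000)
print(f"one-sided p-value: {p_value:.3f}")
```

Pairs with identical outcomes do not change the statistic under label swaps, so only the six discordant pairs matter in this toy example. The estimate should therefore be close to the exact sign-test value of 7/64, or about 0.109.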

Contrail likely zone forecast models

We used two different models to forecast CLZs. The Contrail Cirrus Prediction (CoCiP) model is a physics-based model that simulates contrail formation, evolution and impact using atmospheric conditions, aircraft type, flight path, and other features18,19. Based on weather variables provided by a forecast or reanalysis product, CoCiP evaluates whether a contrail will form and persist at a given spacetime location and models its lifetime through the initial downdraft and three-dimensional advection with a second-order Runge-Kutta method. Throughout this process, the model continuously compares with local weather conditions to determine whether the contrail continues to persist or sublimates. The model was configured with Exponential Boost Latitude Correction for humidity scaling, a segment length of one kilometer, an integration time step of 5 minutes, a maximum contrail age of 12 hours, a weather update interval of 1 hour, and a wind shear (dsn_dz) factor of 0.665. After modeling the full evolution of the contrail, climate impact quantities such as radiative forcing are calculated.
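For reference, the configuration values listed above can be collected in one place. The sketch below is a plain record of the stated settings; the class and field names are illustrative and are not the parameter names of the pycontrails API.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class CocipSettings:
    """Settings reported in the text; field names are hypothetical."""
    humidity_scaling: str = "Exponential Boost Latitude Correction"
    segment_length_m: float = 1000.0
    integration_step: timedelta = timedelta(minutes=5)
    max_contrail_age: timedelta = timedelta(hours=12)
    weather_update_interval: timedelta = timedelta(hours=1)
    dsn_dz_factor: float = 0.665  # wind shear factor

    def max_integration_steps(self) -> int:
        """Upper bound on time steps for a contrail that reaches maximum age."""
        return self.max_contrail_age // self.integration_step

    def steps_per_weather_update(self) -> int:
        """Integration steps between successive weather updates."""
        return self.weather_update_interval // self.integration_step


settings = CocipSettings()
print(settings.max_integration_steps(), settings.steps_per_weather_update())
```

With these values, a contrail that survives to the maximum age is integrated over 144 steps, with weather refreshed every 12 steps.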

We ran a gridded version of the model, which evaluates CoCiP on a regular spatiotemporal grid rather than requiring flight waypoints. In order to translate gridded outputs into CLZs, authors of this work examined visualizations of the energy forcing of each grid point predicted by the model20 and assessed its agreement with the ML model prediction. We used ECMWF’s high-resolution forecasts as input to CoCiP27. The CoCiP model also uses cloud microphysics to determine which contrails persist, accounting for initial downdraft, fall, and sublimation.

A drawback of using weather forecast data to predict contrail formation is that existing weather products are often incorrect about the locations of high-altitude ISSRs14,15, with previous works finding that more than 80% of high-altitude ISSRs predicted in ECMWF ERA5 are found by in-situ measurements to be incorrect13. It has been suggested that ISSR prediction skill could be improved by using other atmospheric variables as dynamical proxies, which have been shown to be correlated with ice supersaturation28. In this work we use these dynamical proxies to improve a prediction model by training a neural network to predict contrail formation. For a given flight waypoint, the neural network takes as inputs not only the weather quantities directly related to contrail formation (humidity and temperature) but also other weather variables: wind velocity, relative vorticity, fraction of cloud cover, cloud ice water content, specific snow water content, and divergence. We also used local solar time, day of year, latitude, longitude, and altitude of flight waypoints as input features.

To train, validate and test the model, we used a dataset produced to study contrail formation on a per-flight basis15. This dataset comprised all flights available in ground-based Automatic Dependent Surveillance Broadcast data from FlightAware29 over a region roughly spanning the contiguous United States on 28 randomly selected days between April 4, 2019 and April 2, 2020. To avoid train-test contamination, the flights in each set were separated by at least one day, as in the original study15.

In particular, this dataset is created by advecting the flight trajectories using ECMWF high-resolution numerical weather forecast wind data and a third-order Runge-Kutta method30, and then comparing these advected trajectories to automatically detected contrails21 from the GOES-16 Advanced Baseline Imager infrared images. The result is a set of ≈ 6 million flight segments, each of which is labeled as matching or not matching a contrail. The neural network model used the individual flight segment data and their associated weather fields noted above as training examples. It was trained by minimizing the cross-entropy loss of its predicted CLZ probability against the binary label of whether it matched a contrail or not, using stochastic gradient descent. When evaluated on the per-flight contrail formation dataset15, the model’s precision (for a given recall) was about twice that of the weather forecast models evaluated. That is, when the ML model predicts a flight will match a contrail, that prediction is about twice as likely to agree with the dataset as the physics-based models evaluated in the same study15.
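The trajectory advection step can be illustrated with a generic third-order Runge-Kutta integrator. This is a sketch under stated assumptions: it uses Kutta's classical third-order scheme (the paper does not say which variant was used), works in flat two-dimensional coordinates in meters, and takes the wind as a user-supplied function rather than real forecast data.

```python
def rk3_step(position, t, dt, wind):
    """Advance (x, y) by one step of Kutta's third-order method.

    wind(x, y, t) returns the horizontal wind (u, v) in m/s.
    """
    x, y = position
    k1 = wind(x, y, t)
    k2 = wind(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], t + 0.5 * dt)
    k3 = wind(x - dt * k1[0] + 2.0 * dt * k2[0],
              y - dt * k1[1] + 2.0 * dt * k2[1],
              t + dt)
    return (x + dt / 6.0 * (k1[0] + 4.0 * k2[0] + k3[0]),
            y + dt / 6.0 * (k1[1] + 4.0 * k2[1] + k3[1]))


def advect(position, wind, duration_s, dt_s=60.0, t0=0.0):
    """Advect a waypoint through the wind field for duration_s seconds."""
    n_steps = int(round(duration_s / dt_s))
    t = t0
    for _ in range(n_steps):
        position = rk3_step(position, t, dt_s, wind)
        t += dt_s
    return position


def uniform_wind(x, y, t):
    """Hypothetical constant wind: 10 m/s eastward, 5 m/s southward."""
    return (10.0, -5.0)


# Where a waypoint released at the origin ends up two hours later.
print(advect((0.0, 0.0), uniform_wind, duration_s=7200.0))
```

In the constant-wind case the displacement is simply wind speed times elapsed time, which makes the sketch easy to check by hand; a real application would interpolate forecast winds in space, time, and altitude.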

The ML model was a four hidden-layer fully connected classification neural network with leaky rectified linear unit activation functions and a sigmoid function in the final layer. Dropout was used for regularization. After training, the model’s predictions were used as a proxy of GOES-16-detected contrail formation likelihood. These scores were available at every point of a latitude-longitude-altitude voxel grid given numerical weather prediction features along the gridpoints’ advection paths.
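The architecture described above can be sketched as a forward pass in plain Python. Layer widths, the feature count, the leaky-ReLU slope, and the weight initialization are all assumptions made for illustration; the paper does not report them. Dropout is a training-time technique and is omitted from this inference-only sketch.

```python
import math
import random


def leaky_relu(x, slope=0.01):
    """Leaky rectified linear unit (slope value is an assumption)."""
    return x if x > 0 else slope * x


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dense(inputs, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def build_network(n_features, hidden_sizes, rng):
    """Random (untrained) weights for hidden layers plus one output unit."""
    sizes = [n_features, *hidden_sizes, 1]
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def predict_probability(network, features):
    """Hidden layers use leaky ReLU; the final layer uses a sigmoid."""
    activations = features
    for weights, biases in network[:-1]:
        activations = [leaky_relu(v) for v in dense(activations, weights, biases)]
    weights, biases = network[-1]
    return sigmoid(dense(activations, weights, biases)[0])


def binary_cross_entropy(probability, label, eps=1e-12):
    """Loss minimized during training, for one example with a 0/1 label."""
    p = min(max(probability, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


rng = random.Random(0)
# Hypothetical sizes: 14 input features, four hidden layers of 32 units.
network = build_network(n_features=14, hidden_sizes=[32, 32, 32, 32], rng=rng)
features = [rng.uniform(-1.0, 1.0) for _ in range(14)]
print(predict_probability(network, features))
```

Because the weights here are random, the printed score has no physical meaning; the sketch only shows how a waypoint's features map to a probability-like output.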

In order to better understand the ML model, we performed an ablation study by removing weather features from the model and measuring its performance, computed as the area under the Receiver Operating Characteristic curve. These scores vary between 0.5 (random predictions) and 1 (perfect predictions). We expect that removing more important features will lower the score more than removing less important ones. The results can be seen in Table 1. The analysis agrees with earlier studies: we find that humidity data is the most important feature. Second most important is cloud data, which is closely related to humidity, although the presence of clouds may also affect whether we can detect a contrail. Vertical transport is third most important; such quantities have earlier been suggested as dynamical proxies for contrail formation28. Note that correlations between weather quantities may allow the model to estimate quantities even if they have been nominally removed. This may explain why, for example, removing humidity data still allows the model some success in predicting contrails even though humidity is a crucially important quantity. Note that removing both humidity and cloud formation information, which is highly correlated with it, leads to a much larger performance drop.
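The score used in the ablation study can be computed from labels and model scores with a rank-based formula. The sketch below is a generic implementation of the area under the ROC curve (equivalent to the normalized Mann-Whitney U statistic), not the authors' evaluation code, and the example values are made up.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from 0/1 labels and real-valued scores.

    Uses average ranks so that tied scores are handled correctly.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of examples sharing the same score.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Made-up labels (1 = segment matched a contrail) and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

In an ablation setting, this function would be evaluated once for the full model and once for each model retrained without a feature group, and the drops compared.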

Table 1 Ablation study of the contrail prediction ML model

Figure 3 shows an example of forecasts from the ML model (left) and CoCiP model (right) used to select candidate flights. When choosing flights to participate in the experiment, both the ML and CoCiP models needed to show a CLZ on the flight path. In this example, Chicago ORD was a good candidate for a turnaround airport, as shown by the CLZs in both forecasting systems. Once a flight had been selected, the updated ML model forecasts were used when adjusting the flight to avoid CLZs (as in Fig. 1).

Fig. 3: Machine learning and CoCiP based contrail likely zone forecasts.

Example contrail likely zone (CLZ) forecasts used to select flights for Thursday, March 23, 21:00 UTC from the ML model (left) and the CoCiP model (right). The left image shows all-altitude (FL260 to FL420), color-coded CLZ probability thresholds, with red, yellow, and green corresponding to high, medium, and low probabilities of contrail formation, respectively. The right image shows the forecast for flight level 360, with blue coloring indicating net-cooling contrails and red coloring indicating net-warming contrails. The contrail’s net impact was not considered for the purposes of this trial.

Post-Flight Verification

To determine whether contrails were created or avoided for each flight in the trial, three evaluators (authors of this work) independently completed post-flight analyses to assign individual binary labels. Figure 2 shows an example of the satellite imagery used. The yellow line shows the flight path, which was obtained from Automatic Dependent Surveillance Broadcast data licensed from FlightAware29.

The evaluators assessed whether a flight made a contrail based on the following criteria:

  • Proximity and direction: How close is the contrail to the advected flight trajectory, and is it aligned in the same direction?

  • Presence: Is there a high level of confidence that the object in the frame is a contrail?

  • Timing: Did the suspected contrail first appear 20-40 minutes after the flight passed?

  • Persistence: Is the contrail visible in multiple frames?

  • Speed: Does the suspected contrail move at the same speed as the advected plume of the flight?

  • Exclusivity: Are there no other flights in the FlightAware database that are a substantially better match for the contrail in question?

We note that the criteria were used as guidance, as there were cases for which it was not possible to assess all criteria and others where evaluators determined matches even when one or more criteria were not met. The timing criterion was present since it often takes some time for contrails to become large enough to be detected in at least two consecutive GOES-16 frames. Like all the criteria this is guidance only, as contrails can appear much earlier than 20 minutes under certain weather conditions and temporal alignments with satellite scans31.

To account for uncertainty in contrail avoidance compliance, the evaluators assigned a binary label based on whether a contrail was created anywhere along the flight path above 415 hPa (6915 meters), typical contrail formation altitudes32. Disagreements among evaluators were resolved by majority vote. See Fig. 4 for a geographic coverage of the flight sample.
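The majority-vote rule for combining the three evaluators' binary labels is simple enough to state in code. This is a minimal sketch with hypothetical names and made-up votes.

```python
def majority_label(votes):
    """Return True if most evaluators judged that the flight made a contrail.

    votes is a sequence of booleans, one per evaluator. An odd number of
    evaluators is required so that ties cannot occur.
    """
    if len(votes) % 2 == 0:
        raise ValueError("majority vote needs an odd number of evaluators")
    return 2 * sum(bool(v) for v in votes) > len(votes)


# Made-up example: two of three evaluators saw a contrail.
print(majority_label([True, False, True]))  # True
```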

Fig. 4: Flight sample considered in the randomized crossover trial.

Geographic coverage of flights considered for the trial. Colors are used to distinguish different flights.

Results

Ascent/descent adjustments lead to a decrease in detectable contrail formation

As shown in Table 2, the treatment group, which aimed to avoid contrails, made 4 detectable contrails, 63.6% fewer than the 11 observed in the control group. We rejected the null hypothesis of no change in detectable contrail formation between the treatment and control groups (p = 0.0331, permutation rank sign test). These results suggest that the early descent or delayed ascent informed by the CLZ forecasting system caused a statistically significant reduction in detectable contrail formation. They also demonstrate the operational feasibility of detectable contrail avoidance on a per-flight basis.
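The 63.6% figure follows directly from the two contrail counts reported above, as this short check shows (the function name is illustrative).

```python
def relative_reduction(control_count, treatment_count):
    """Fractional reduction of the treatment count relative to control."""
    return (control_count - treatment_count) / control_count


# 11 detectable contrails in the control group, 4 in the treatment group.
print(f"{relative_reduction(11, 4):.1%}")  # 63.6%
```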

Table 2 Number of flights with GOES-16 detected and undetected contrails, total detected contrail length, and total flight distance for the control and treatment groups

Detectable contrails accounted for 1.97% and 0.89% of total flight kilometers in the control and treatment groups respectively. This represents a 54.4% reduction in detectable contrails per flight kilometer in the contrail avoidance flights. We tracked fuel usage throughout the experiment using aircraft movements and inflight position reports sent through Aircraft Communication Addressing and Report System via Very High Frequency and Satellite Communications. The treatment group used an average of 2% more fuel per adjusted flight, corresponding to an additional 0.26 kg of CO2 emissions per kilometer of treatment group flight.

Discussion

Our work has shown a reduction in observed contrail formation with a treatment group size of 22 flights. We opted for a tactical near-airport contrail avoidance intervention for several key reasons: 1) This approach allowed for an effective crossover trial design, as we had flexibility in selecting destination airports near CLZs. This minimized the time difference between outbound and inbound flights encountering the same CLZ, thus controlling for potential confounders like aircraft type and atmospheric conditions. 2) Near-airport maneuvers simplified operations and ensured higher pilot compliance. 3) Compared to modifying cruise levels, which could require prolonged low-altitude flight or multiple altitude changes, delayed ascent/early descent minimized additional fuel usage by avoiding extra climbing. Additionally, lower altitudes are associated with a reduced probability of contrail formation even in the presence of some ice supersaturation. It is worth noting that, although statistical significance is influenced by sample size, it is also heavily dependent on the signal-to-noise ratio of the data33. Our crossover trial design, together with the tactical near-airport avoidance intervention, limited the differences in atmospheric conditions between the treatment and control groups, thereby controlling for potential weather-related and other confounders, which helped reduce noise34. Given the small sample size, we avoided asymptotic assumptions about normality and adopted a non-parametric approach, enhancing the statistical power of our hypothesis test in this context25,26.

An important future direction would be to show this method extends to larger scale, such as by performing a similar experiment with hundreds or thousands of flights. In addition to demonstrating operational feasibility, such a trial could expand the route selection criteria to allow testing avoidance of mid-flight CLZs, horizontal as well as vertical path adjustments, geographic regions with different weather patterns (e.g. North Atlantic or subtropical routes) and nighttime flights which can have a particularly strong warming impact. Moreover, future trials might consider relaxing the requirement for agreement between ML and CoCiP models in CLZ identification to explore contrail avoidance feasibility under a broader range of prediction scenarios. Eventually, trials should consider the radiative forcing impact of contrails when making flight-adjustment decisions, so that the focus can be on evaluating the feasibility of minimizing the warming impact as opposed to just contrail formation. In this experiment, we validated detectable contrail avoidance manually for each flight, but a larger trial would likely need to use an automated system15,31.

There are a few important factors to consider when interpreting the results. For example, having observed 11 contrails in the control group, where 22 flights flew through CLZs, suggests CLZ forecasting is challenging and can lead to false positive CLZs. Weather forecasts of humidity at contrail-relevant altitudes are subject to inaccuracies13,16,35. We controlled for some of this uncertainty by using two different approaches for CLZ forecasting (physics-based and empirical ML-based), and with our inclusion/exclusion criteria for the flight sample. The 50% rate of contrail observation for flights predicted to form a contrail in the control group is relatively high compared to previous approaches13,14,15. It is also worth noting that the probability of the ISSR not being present on the turnaround leg is balanced between the treatment and control groups due to the randomization of the crossover trial design. Therefore, it should not systematically bias the overall conclusions of the trial.

Difficulties in CLZ forecasting may also explain the observed contrails in the treatment group. Alternatively, these false positives could have been caused by attributing the contrail to the wrong flight, which can happen in image sequences with a high density of flight paths and contrails such as the example in Fig. 2. False positives could also be attributed to our labeling strategy. Our evaluation was conservative in that it did not take into account where in the flight the contrail was made: to ensure objective evaluation, as mentioned in the Methods section, a contrail formed in any part of the flight path was counted towards the treatment group, even if it formed after a successful CLZ avoidance intervention. Some contrails may not be visible in the images, either because of the chosen infrared channel color scheme, because the 2 km satellite resolution might not show faint or optically thin contrails, or because they were blocked by higher clouds15,21. Finally, some flights (e.g. military flights) may be missing from the Automatic Dependent Surveillance Broadcast flight trajectory data provided by FlightAware29, which could complicate flight attribution.

Previous simulation studies6,7 have argued that small-scale flight deviations can avoid creating contrails, but this work’s use of satellite imagery for evaluation tested whether satellite-detectable contrails were avoided. Research on the radiative properties of contrails could help to quantify under what conditions contrails form but are not detectable by satellite.

Conclusion

In this work we performed a randomized controlled trial of whether small-scale flight deviations can reduce detectable contrail formation. Using machine learning and physics-based CLZ forecasting models, we found a statistically significant reduction in the number of observed contrails in the flights that attempted to avoid contrail-forming regions. This study provides a proof of concept that commercial airlines can verifiably avoid detectable contrail formation, one of the first steps towards developing a comprehensive avoidance strategy. We hope that these findings motivate further research into contrail avoidance via flight route planning at a global scale.