Introduction

Automation has been a key feature of manned spaceflight since the first orbital flight of Vostok 1 in 1961, which required no input from the cosmonaut. The Soviet shuttle, the Buran, performed its only orbital flight in 1988 without crew. The US shuttle program was capable of full automation from launch to landing, but a programmatic decision was made to have the commander take over the controls for the final two-minute descent and touchdown1. Launch of the Soyuz spacecraft and docking with the International Space Station (ISS) are also fully automated. But as long as there has been automated control of spacecraft there have been failures: Voskhod 2 (1965), Gemini VIII (1966), Soyuz 1 (1967) and Apollo 13 (1970) all featured potentially catastrophic failures of automation and equipment requiring human intervention and manual control of the spacecraft. More recently, errors in the automated Soyuz-ISS docking sequence have required crewmembers to override the system and perform a manual docking procedure (see2 for review).

The Orion spacecraft of the Artemis lunar program utilises an automated Apollo-like water landing. However, some form of operator control and supervision will likely be required for lunar and Martian landings, thus the ability of crewmembers to perform these tasks after periods of microgravity exposure remains relevant. Although no manual landings have been performed after long-duration spaceflight, the observed degradation in shuttle landing performance after short-duration missions (less than 18 days) suggested that performance may be impaired. Our study of the first 100 shuttle missions found that 20% of landings were outside of acceptable limits in terms of touchdown (TD) speed, and the maximum speed of 217 kn (main gear tire limit) was exceeded six times1.

Multiple factors could be involved in the observed decrements in pilot performance after spaceflight. Acute physiological changes occur in a 24–72 h period following gravity transitions; from 1-g to 0-g during launch and 0-g to 1-g upon return to Earth. Immediately upon entering orbit a likely sensory mismatch between actual and expected vestibular input occurs, particularly during roll and pitch head movements in which both angular (semicircular canal) and linear (otolithic) input is expected; in microgravity otolithic input encoding head tilt with regard to gravity is absent. This sensory conflict may underlie space motion sickness symptoms such as headaches, fatigue, and nausea3. Shuttle-era studies have shown that space motion sickness4 and spatial disorientation5 commonly occur during the 0-g to 1-g landing transition, particularly in response to combined canal-otolith activation via head pitch and roll tilts5. Moreover, there is a 2-L shift of blood and fluids from the legs and trunk towards the head upon entering weightlessness6,7, with a subsequent increase in intracranial pressure that may contribute to motion sickness symptoms such as headaches7.

Following the acute phase of adaptation to microgravity long-term physiological changes continue throughout the mission. The sustained increase in intracranial pressure can cause serious long-term (and in one case seemingly permanent) post-flight decrements in visual acuity through pressure-induced changes to eye structure (optic disc oedema, choroidal folds, cotton wool spots, globe flattening and distended optic nerve sheaths)8. In-flight cardiovascular adaptation results in a 10% decrease in total blood volume9, which persists early post-flight and is likely a key contributor to the post-flight orthostatic intolerance (fainting) commonly experienced by crewmembers on the day of return10. Although motion sickness symptoms dissipate after about 3 days on-orbit the central nervous system (CNS) continues to adapt to the lack of gravitational input. Pitch and roll head movements are no longer provocative, which may be due to the CNS downregulating low-frequency otolith input (< 0.3 Hz – the tilt response11) in a microgravity environment over the course of a mission, as evidenced by the significant post-flight decrease in gain of the ocular-counterrolling (OCR) reflex (a direct measure of the otolith tilt response via counter-rotation of the eyes about the line of sight towards the vertical in response to lateral head tilt) after ISS missions12. Inflight unloading of the musculoskeletal system results in a 1–2% monthly loss of bone mineral density13,14, mostly in the gravity-bearing lumbar spine, pelvis, femur and tibia15, and atrophy of postural (lumbar, quadriceps, hamstring and calf) muscle volume of around 13%16 during ISS missions. In addition to these gross physiological changes there is some evidence for a minor impairment in manual tracking17 during spaceflight, and crewmembers are exposed to a variety of other stressors including altered light–dark cycle, sleep deprivation, elevated CO2 concentration, confinement, and high mental and physical workload.

This study was designed in response to a directed NASA request for proposals addressing the risk of impaired ability to maintain control of vehicles and other complex systems following spaceflight, in particular the sensorimotor gap SM6 (Need to perform a seated Manual/Visual performance assessment after long-duration spaceflight)18. Our selected flight study implemented a seated test battery to quantify post-flight cognitive/sensorimotor deficits and correlate these results with operator performance during relevant simulations: driving a car, operating a Mars rover, and, the subject of the current paper, performing a T-38 Talon landing simulation. The overarching aim was to determine the underlying causes of deficits in post-flight operator proficiency to facilitate countermeasure development. The test battery and driving simulation results have been previously published2: post-flight deficits in driving performance and sensorimotor and cognitive function were observed in a cohort of eight astronaut subjects after ISS missions averaging 171 days. The test battery found significant post-flight impairments in fine motor control and the ability to dual task, and increased fatigue, on the day of return from the ISS, and in a full-motion driving simulation crewmembers exhibited a markedly reduced ability to maintain lane position compared to baseline, with significantly increased incursions into the wrong lane and a longer time to correct. In addition to the test battery and driving simulation task reported above2, a subset of 5 professional pilots from the original cohort of 8 subjects also performed a T-38 Talon full-motion landing simulation. In this paper we describe the effects of spaceflight on pilot performance in this smaller cohort.

The standard overhead approach (see Fig. 2a) is well known to military and NASA pilots. The pattern is designed to minimise exposure to enemy fire and facilitate multiple aircraft landings19: the aircraft approaches the runway threshold at speed and scrubs energy during a series of four 90° turns over the airfield (creating an elongated oval flightpath) before the final approach and touchdown. In this study we utilised an overhead T-38 approach and landing simulation to assess post-flight piloting ability, as both the overhead pattern and the T-38 trainer aircraft were well known to our military/NASA pilot cohort (a critical point as crew time constraints dictated that minimal training time was available for learning a novel task). We posited that performing a series of controlled banking turns immediately post-flight would be a challenge due to the blunting of the low-frequency (tilt) vestibular response observed after spaceflight12. On Earth, low-frequency otolithic input feeds postural and ocular reflexes and motion perception, to help maintain balance, vision and spatial orientation in response to tilts of the gravito-inertial acceleration (GIA) vector and head, and similar neural pathways mediate the response to tilts of the visual scene20,21. Our recent study of 25 returning ISS crewmembers showed a significant 24% decrease in the OCR response12 2–3 days after return from the ISS (the first available data point), thus it is likely that the tilt response was even more suppressed on the day of landing. While this blunting is suited to the microgravity environment, we hypothesised that its persistence into the early post-flight period could adversely affect a pilot’s ability to control the aircraft during the banking manoeuvres of the overhead approach, resulting in off-nominal performance as previously observed in shuttle landings1.

Methods

The experiments were approved by the Program for the Protection of Human Subjects at Icahn School of Medicine at Mount Sinai (study 08–1009) and the Institutional Review Board at NASA Johnson Space Center (protocol CR00000550), and all testing was performed in accordance with the relevant guidelines and regulations pertaining to the protection of human subjects as described in the Declaration of Helsinki and Health and Human Services 45 CFR 46. Subjects gave their written informed consent and were free to withdraw at any time. All test sessions were conducted in Building 266 at NASA Johnson Space Center (JSC), Houston, Texas. Inclusion criteria, determined by NASA, were straightforward – an astronaut assigned to a mission aboard the ISS.

Subject demographics

Five astronauts (all male), assigned to missions aboard the ISS from October 2012 until June 2015, participated in the T-38 flight simulation study. Mean age was 46 years (SD 8.3), and time aboard the ISS ranged from 146 to 200 days (mean 169.6, SD 19.5). These subjects were a subset of a group of 8 subjects who performed a cognitive/sensorimotor test battery and driving and Mars rover simulations as reported previously2. For the current study of the effect of microgravity exposure on piloting performance, this subset of five was selected on the basis of pilot experience; four were military test pilots and the fifth was an experienced NASA pilot with extensive T-38 experience.

Testing schedule

These astronaut subjects were tested four times pre-flight and three times post-flight (Fig. 1). A briefing on the tests to be performed and assessment criteria were provided prior to each test session. The first 90-min session, scheduled on average 156.8 days (SD 38.4) prior to launch, was used to familiarize crewmembers with the T-38 landing simulator (data from these sessions were not analysed). Baseline data were obtained from the subsequent three 60-min pre-flight sessions (Fig. 1a), which occurred 134.6 (SD 9.9), 84.2 (SD 13.4) and 77.2 (SD 10.8) days before launch. Crewmembers were tested late evening at JSC on the day of return from the ISS (Fig. 1b) approximately 25-h after touchdown in Kazakhstan (operationally this was still considered R + 0). Due to mission constraints one subject (S5) was not available for testing until 7:00am Houston time the day after landing (R + 1), approximately 36-h after touchdown. The mean gap between the final pre-flight test and the first post-flight session was 247 days (SD 11.9; range 239–267). The second and third post-flight sessions were conducted 3.8 days (SD 0.4; range 3–5) and 7.6 days (SD 1.1; range 7–9) after return (labelled R + 4 and R + 8, respectively).

Fig. 1

(a) Pre-flight test schedule. (b) Post-flight test schedule.

Crew activities during the post-flight testing period were constrained to limit the risk of crew injury and motion sickness22. Treatment for motion sickness, usually with Meclizine or Promethazine, typically occurred at the landing site in Kazakhstan or on the refuelling stop prior to the final return flight back to Houston23. This experiment was prioritised such that our simulator activities were scheduled first in the post-flight testing schedule at JSC to avoid potential confounds from other testing activities.

Test battery

A battery of nine tests was performed seated as previously described2; the tests are briefly summarised here:

Stanford Sleepiness Scale: A subjective ordinal scale from 1 (wide awake) to 7 (struggle to remain awake)24.

Static Visual Acuity: Subjects viewed a Landolt ‘C’ eye chart positioned 3.05 m (10 ft) away. Visual acuity was determined as the smallest line on which the subject could correctly identify the orientation of at least 3 'C' optotypes, measured in logMAR.

Manual Dexterity: The Purdue Pegboard test25. Number of pins placed sequentially within 30 s (left, right and both hands).

Manual Tracking: A randomly moving target was tracked by a mouse-driven cursor with the dominant hand. The primary measure was mean tracking error (pixels).

Dual Tasking: Subjects were required to perform the tracking task above whilst responding to prompts from a second computer monitor for a 4-digit code to be entered on a keypad with the non-dominant hand.

Reaction Time: Simple reaction time was assessed by having the subject press the left mouse button as soon as possible after a circular icon appeared on a black screen.

Perspective Taking: A computerized perspective taking task based on the Directional Orientation Test from the Test of Basic Aviation Skills (TBAS), used by the US Air Force to assess potential pilot recruits2,26.

Match to Sample: Short term memory for learned associations was assessed with the match to sample task using simple 4 × 4 patterns.

Motion Perception: The task was performed with the subject seated in the motion simulator, tasked with indicating gravitational vertical with the control stick as the cabin moved in a pseudorandom manner driven by a sum of seven sines with frequencies at 0.12, 0.25, 0.32, 0.43, 0.62, 0.80 and 0.98 Hz, first in roll for 60 s, then, after a short break, in pitch.
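As an illustration of the pseudorandom stimulus described for the motion perception task, the following is a minimal sketch of a sum-of-sines tilt profile. The seven frequencies and the 60 s duration are taken from the text; the per-component amplitude, the phases, the sample rate and the function names are assumptions introduced here for illustration and do not come from the study.

```python
import math

# Frequencies (Hz) as listed in the text for the motion perception task.
FREQS_HZ = [0.12, 0.25, 0.32, 0.43, 0.62, 0.80, 0.98]


def sum_of_sines(t, freqs=FREQS_HZ, amplitude_deg=1.0, phases=None):
    """Commanded cabin tilt (deg) at time t (s).

    amplitude_deg and phases are assumed values; the source does not
    report the amplitude or phase of each component.
    """
    if phases is None:
        phases = [0.0] * len(freqs)
    return sum(
        amplitude_deg * math.sin(2.0 * math.pi * f * t + p)
        for f, p in zip(freqs, phases)
    )


def tilt_profile(duration_s=60.0, rate_hz=50.0):
    """Sample the stimulus over one axis (roll or pitch) at an assumed rate."""
    n = int(duration_s * rate_hz)
    return [sum_of_sines(i / rate_hz) for i in range(n)]
```

Because the component frequencies are not harmonically related, the summed waveform does not repeat within a 60 s trial, which is what makes the motion difficult for a subject to predict.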

Testing order

After entering the test room subjects were seated at a desk to perform the laptop-based test battery. At completion subjects were seated in the motion simulator cabin and the motion perception test was performed. The three simulations were then performed in the following order – 1) the T-38 landing, 2) driving2 and 3) Mars rover.

T-38 Talon flight simulation

The full-motion flight simulation was implemented with X-Plane 9.0 (Laminar Research, Columbia, SC), utilizing software drivers to provide input to a six degree-of-freedom motion base (CKAS V7, Melbourne Australia – see2 for a full description). The T-38 Talon aircraft model was designed by Jacques Brault and Bruce Cogan (the latter from NASA Armstrong Flight Research Center). Although not an FAA certified flight simulator, the CKAS Stewart platform forms the basis of commercial Level D flight simulators (CKAS Mechatronics Pty Ltd, Melbourne Australia) and provided washout algorithms for use with X-Plane. Subjects were tasked with landing the T-38 using an overhead approach (Fig. 2a) with a break to the left. The T-38 was chosen for two reasons. First, it is used to train US military pilots and NASA mission specialists, groups who were expected to (and did) provide the pilot subjects. Second, the overhead approach required the pilot to perform a series of banking turns while maintaining altitude and airspeed, a manoeuvre that was hypothesized to be problematic after microgravity due to observed blunting of the ‘tilt’ response in astronauts post-flight1,2,12.

Fig. 2

(a) The overhead landing pattern used in the T-38 simulation. The initial position of the aircraft was 5 km due north of runway 17R at Ellington Field, Houston, with an airspeed of 300 KIAS at an altitude of 1500 ft. Yellow text boxes are adapted from the United States Air Force T-38 Training Manual (1978). Map data: Google, DigitalGlobe. (b) Landing parameters used to assess pilot performance. TD—touchdown; Hdot—sink rate or vertical velocity; KIAS—knots-indicated airspeed.

Data were processed in LabVIEW (National Instruments, Austin TX) to provide landing metrics (Fig. 2b) including touchdown speed (KIAS – knots indicated airspeed), range (touchdown distance from runway threshold), height and vertical velocity (sink rate or Hdot) at runway threshold, and landing gear touchdown force (lbs). In keeping with US aviation terminology, altitude and sink rate were expressed in feet and ft/s.
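To make the landing metrics concrete, the sketch below shows one way they could be extracted from a logged flight trace. It is not the processing used in the study: the `Sample` record, its field names, the linear interpolation at the threshold, the sign convention for Hdot (negative when descending) and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical log record from the simulator."""
    t: float         # time (s)
    x_m: float       # distance past the runway threshold (m); negative before it
    alt_ft: float    # height above the runway (ft)
    kias: float      # indicated airspeed (knots)
    on_ground: bool  # True once the main gear is on the runway


def landing_metrics(samples):
    """Return threshold height, Hdot at threshold, touchdown speed and range."""
    threshold = None
    # Find the first pair of samples that straddles the runway threshold.
    for a, b in zip(samples, samples[1:]):
        if a.x_m < 0.0 <= b.x_m:
            frac = (0.0 - a.x_m) / (b.x_m - a.x_m)
            height_ft = a.alt_ft + frac * (b.alt_ft - a.alt_ft)
            hdot_ft_s = (b.alt_ft - a.alt_ft) / (b.t - a.t)
            threshold = (height_ft, hdot_ft_s)
            break
    # Touchdown is taken as the first sample flagged as on the ground.
    touchdown = next((s for s in samples if s.on_ground), None)
    if threshold is None or touchdown is None:
        return None
    return {
        "threshold_height_ft": threshold[0],
        "hdot_ft_s": threshold[1],
        "td_speed_kias": touchdown.kias,
        "td_range_km": touchdown.x_m / 1000.0,
    }


# Made-up trace: four samples around the threshold and touchdown.
trace = [
    Sample(0.0, -100.0, 60.0, 130.0, False),
    Sample(1.0, 100.0, 40.0, 128.0, False),
    Sample(2.0, 300.0, 20.0, 125.0, False),
    Sample(3.0, 500.0, 0.0, 120.0, True),
]
metrics = landing_metrics(trace)
```

Touchdown force is omitted because it would depend on a gear-load channel whose format is not described in the text.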

Ellington Field was selected as the airport due to its familiarity (Ellington is routinely used as a gateway for astronaut travel) and the fact that runway 17R is oriented due south (heading 179°), greatly simplifying navigational requirements (Fig. 3a). The simulation began 5 km due north of, and aligned with, runway 17R, at 300 KIAS airspeed (Fig. 3b) at an altitude of 1500 ft (Fig. 3c). By eliminating the need for navigational input from the subject we aimed to reduce the piloting task to simply that of control of the aircraft; maintaining altitude and airspeed while performing the banking turns (Fig. 3d) required to align with the runway for final approach and touchdown.

Fig. 3

Exemplar T-38 landing simulation performed by a military test pilot. (a) Overhead view of the flight path. Map data: Google, DigitalGlobe. (b) Airspeed, (c) Altitude, (d) Roll (bank) of the aircraft. The pilot maintained airspeed at 300 KIAS until the first turn, then gradually reduced airspeed through the left bank and downwind leg whilst deploying the landing gear and flaps. Altitude was maintained at 1500 ft until the initiation of the second leftward turn, then the aircraft gradually descended through the turn and the final approach with a touchdown at 120 KIAS.

An exemplar T-38 landing profile from an experienced military test pilot is shown in Fig. 3. The pilot flew due south maintaining altitude (1500 ft) and airspeed (300 KIAS), initiating the first banking turn to the left at the end of the runway (maintaining 1500 ft altitude) until the aircraft was heading due north parallel to the runway. On this downwind leg the landing gear and flaps were lowered, reducing airspeed (while maintaining 1500 ft altitude), with the final 180° turn and descent bringing the aircraft heading to due south aligned with the runway for the final approach and touchdown at 120 KIAS. Each subject performed 2–3 landings in each of the three pre-flight baseline data collection sessions (6–9 total pre-flight). On the day of return (R + 0) each astronaut performed two landings, and one landing each on R + 4 and R + 8.

Statistics

Significance of post-flight changes at the group level with respect to baseline was assessed with the t-test (socscistatistics.com). A single-tailed test with a significance threshold of p < 0.05 was used because, based on post-flight changes reported in the literature and our own previous experience, we hypothesized that long duration spaceflight would only impair astronaut function post-flight (i.e., we were not expecting improved performance on landing day compared to pre-flight).
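For readers wishing to reproduce this kind of comparison, the following is a minimal sketch of a single-tailed two-sample t-test. The text does not state which form of the t-test was used, so the pooled-variance (Student) form shown here is an assumption, as are the function names. The p-value is obtained by numerically integrating the t density rather than by calling a statistics library.

```python
import math


def pooled_t(baseline, post):
    """Student's two-sample t statistic and degrees of freedom.

    Assumes equal variances in the two groups; the source does not say
    whether a pooled, Welch or paired test was used.
    """
    n1, n2 = len(baseline), len(post)
    m1, m2 = sum(baseline) / n1, sum(post) / n2
    v1 = sum((x - m1) ** 2 for x in baseline) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in post) / (n2 - 1)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return t, n1 + n2 - 2


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)


def one_tailed_p(t, df, steps=2000):
    """P(T >= |t|), integrating the density from 0 to |t| by Simpson's rule.

    This is the single-tailed p-value only when the observed change lies
    in the hypothesised direction. steps must be even.
    """
    a = abs(t)
    if a == 0.0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3.0
```

With 18 degrees of freedom, `one_tailed_p(1.734, 18)` should fall very close to the 0.05 threshold, which provides a quick sanity check against standard t tables.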

Study Limitations

In common with many spaceflight studies there were significant limitations. NASA operational constraints determined the number of subjects, the pre- and post-flight test schedule, and the length of the test sessions (60 min). Within each 1-h session the test battery, plus T-38, driving and Mars rover simulations, were performed. The limited time available for pre-flight familiarisation and testing (4.5 h total) necessitated implementation of simulations that required minimal training – that is, operational tasks in which the subjects were already skilled. The T-38 landing simulation was chosen on the expectation of recruiting former military and NASA professional pilots; the T-38 Talon is used for pilot training by the US Air Force and Navy, NASA (all astronauts must complete annual T-38 flight and simulator training27 and astronaut pilots routinely use the T-38 for travel between NASA facilities) and pilots from NATO nations. Of the 8 subjects selected for the broader study2 five were experienced T-38 pilots and required minimal training in our T-38 flight simulator. This study was somewhat underpowered, as evidenced by the lack of significant changes in TD parameters at the group level on R + 0, and actual kinematics of the simulator cabin were not measured during the T-38 landing simulations.

Results

Test battery

In our previous paper with a larger cohort of 8 ISS crewmembers (which included the 5 pilot subjects from the current study) we found significant post-flight decrements in manual dexterity (left hand), dual tasking and self-reported sleepiness2; no post-flight changes were observed in reaction time, perspective taking, match to sample, tracking alone, nor static visual acuity. A significant decrease in the post-flight pitch tilt perception response at 0.12 Hz was also observed2. Not surprisingly, analysis of test battery results for the subset of 5 pilots from the original cohort of 8 ISS subjects did not appreciably alter these findings (Table 1). There was a small but significant decrease (t[18] = 2.36 p = 0.015) in manual dexterity (left hand number of pins inserted in 30s on the Purdue Pegboard), from a pre-flight mean of 14.8 (SD 1.1) to 13.4 (SD 1.3) on R + 0. Similar decreases in performance with the right and both hands were observed on R + 0 but did not reach significance. There was no change in manual tracking accuracy alone (mean error in pixels) from a baseline of 31.7 (SD 10.4) to 33.6 (SD 9.7) on landing day. However, when a distracting task was added tracking error increased 63% post-flight (t[18] = − 2.09 p = 0.025) from a baseline of 49.5 pixels (SD 19.2) to 80.6 (SD 49) on R + 0, and subjects reported feeling sleepier (t[18] = − 3.18 p = 0.003) on R + 0 with a Stanford Sleepiness Scale mean score of 3.8 (SD 1.5) relative to the pre-flight mean of 2.3 (SD 0.7). One difference between the full (N = 8) cohort and the smaller pilot group in this study was that the 39% post-flight decrease in the pitch motion perception response at 0.12 Hz observed in the pilots did not reach significance (t[18] = 1.40 p = 0.09), likely due to increased variability with lower N. Subjective sleepiness, manual dexterity and dual tasking performance returned to baseline by the second post-flight R + 4 test session. 
Consistent with previous results2 there were no post-flight changes observed in reaction time, perspective taking, match to sample nor static visual acuity in the 5 pilots (Table 1).
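The percentage changes quoted above follow directly from the reported group means. A minimal sketch of the arithmetic, using the dual-tasking and manual-tracking means from the text (the function name is an assumption for illustration):

```python
def percent_change(baseline, value):
    """Change of value relative to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


# Dual-task tracking error (pixels): pre-flight mean vs R + 0 mean.
dual_task = percent_change(49.5, 80.6)

# Tracking alone (pixels): pre-flight mean vs R + 0 mean.
tracking_only = percent_change(31.7, 33.6)
```

Rounded to the nearest whole number, the dual-task value reproduces the 63% increase reported in the text, whereas tracking alone changes by only about 6%.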

Table 1 Results from the cognitive/sensorimotor test battery and the driving simulation for the 5 pilot subjects. Significant changes shown in BOLD font.

Driving simulations

As reported previously we found significant post-flight performance decrements in the original N = 8 cohort during a full-motion driving simulation, with subjects tasked to maintain position in the righthand lane whilst driving along a 3-km winding mountain road2. On the day of return from the ISS subjects made more crossings into the wrong lane, spent a greater percentage of time in the wrong lane and took a longer time to correct lane excursions. Focusing on the subset of 5 pilots in the current study a similar pattern was observed (Table 1 and Fig. 4). On the day of return from the ISS (R + 0) the mean number of lane crossings increased significantly (t[18] = − 3.05 p = 0.003) from a pre-flight mean of 5.8 (SD 4.2) to 12 (SD 2.8) (Fig. 4b); the percentage of time spent in the wrong lane increased (t[18] = − 4.94 p = 0.00005) from a baseline of 6.4% (SD 5.3) to 24.3% (SD 11.1) on landing day (Fig. 4c); and the time to correct (return to lane) increased (t[18] = − 2.19 p = 0.02) from 1.33 s (SD 0.89) pre-flight to 2.36 s (SD 0.97) on R + 0 (Fig. 4d). Driving performance returned to baseline by R + 4.

Fig. 4

Performance on the mountain driving simulation for the pilot cohort. (a) Lane deviations were assessed; (b) number of lane crossings, (c) time to recover lane position, (d) percent time in wrong lane. Error bars denote SD.

T-38 landing simulations

Pre- and post-flight touchdown (TD) parameters, speed (kn), landing force (lbs), range (km), Hdot (ft/s), and height above runway threshold (ft), are shown in Table 2. At the group level we compared pre-flight TD parameters with those of the two landing attempts on the day of return (R + 0) from the ISS. TD speed exhibited a trend towards an increase on R + 0 (t[45] = − 1.26 p = 0.11), from a baseline mean of 113.6 kn (SD 10.7) to 118.4 kn (SD 11.0) on R + 0. There were no significant pre-/post-flight changes at the group level in touchdown force (lbs) (pre-flight: 7482.5 [SD 1754.8]; R + 0: 6668.8 [SD 2087.6]), range (km) (pre-flight: 0.89 [SD 0.34]; R + 0: 0.91 [SD 0.37]), nor Hdot (ft/s) (pre-flight: 75.2 [SD 8.0]; R + 0: 77.8 [SD 6.4]). It is perhaps unsurprising there were no significant post-flight changes at the group level, given the small group size (N = 5) and the limited number of landings in the post-flight sessions. There was also considerable variability in landing styles apparent pre-flight. For example, the mean touchdown speed for each subject ranged from 104.1 to 121.2 kn. This partly reflected different piloting styles; for example, there is no preflare manoeuvre for carrier landings thus naval aviators tend to perform ‘hotter’ landings.

Table 2 Pre-flight: individual and group mean and SD of the touchdown parameters for all pre-flight trials; touchdown (TD) speed (KIAS), landing force (lbs), distance (range) of touchdown from the runway threshold (km), vertical sink rate (Hdot ft/s), and height over runway threshold (ft).

To assess individual post-flight performance, we compared each pilot’s post-flight landing parameters to the mean and SD of their pre-flight tests, considering a post-flight value more than 2 standard deviations from the mean pre-flight value as a minor degradation of performance and values more than 3 SD from the pre-flight mean as strongly indicative of a major performance decrement. Using this criterion 80% of the first landing attempts after return from spaceflight (R + 0) exhibited degradation in performance (subjects 1, 3, 4, 5). Three of the five subjects exhibited touchdown values exceeding the 3 SD threshold (Table 2; red cells); subjects 1 (Fig. 5a—height above runway threshold), 3 (Fig. 5b—range and height above runway threshold) and 5 (Fig. 5c—TD speed).
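The individual-level criterion described above can be sketched as follows. The 2 SD and 3 SD thresholds are taken from the text; the use of the sample standard deviation, the function name and the example values are assumptions made for illustration and are not study data.

```python
from statistics import mean, stdev


def classify_landing(pre_flight_values, post_value, minor=2.0, major=3.0):
    """Label a post-flight value by its distance from the pre-flight mean.

    Uses the sample SD (n - 1 denominator); the source does not state
    which SD estimator was used.
    """
    m = mean(pre_flight_values)
    sd = stdev(pre_flight_values)
    z = (post_value - m) / sd
    if abs(z) > major:
        return "major", z
    if abs(z) > minor:
        return "minor", z
    return "nominal", z


# Hypothetical touchdown speeds (KIAS) for one pilot, not data from the study.
pre = [100.0, 102.0, 98.0, 101.0, 99.0]
label_a, _ = classify_landing(pre, 101.0)  # within 2 SD
label_b, _ = classify_landing(pre, 104.0)  # between 2 and 3 SD
label_c, _ = classify_landing(pre, 105.0)  # beyond 3 SD
```

Because each pilot serves as their own baseline, this approach accommodates the differences in pre-flight landing style noted above, at the cost of relying on an SD estimated from only 6–9 pre-flight landings.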

Fig. 5

Landing parameters from three subjects demonstrating significant changes in the first landing on R + 0. *** denotes a value > 3 SD above the individual’s pre-flight mean; ** denotes a value > 2 SD above the individual’s pre-flight mean. Error bars on pre-flight data represent SD. In all cases the aircraft passed the runway (RW) threshold significantly higher than during pre-flight, which led to significant increases in range (panels a and b) and touchdown speed (panel c). Note the rapid recovery to pre-flight values on the second R + 0 attempt and on subsequent post-flight testing.

Four of the five subjects (S1, S3, S4, S5) exhibited landing parameters above 2 SD on the first R + 0 landing (Table 2 orange cells), and subject 4, whilst not exceeding the 3 SD threshold in any post-flight touchdowns, had landing parameters more than 2 SD from the pre-flight mean on both R + 0 landings and on R + 4 (TD force, not shown) and R + 8 (Fig. 6b). A promising result was that on the second R + 0 attempt only one subject (S4) exceeded the 2 SD threshold (increased range and Hdot) and none of the five exceeded the 3 SD threshold, suggesting that a rapid recovery of operator performance is possible after exposure to the task. Only one subject (S2) did not exceed the 2 or 3 SD performance thresholds on R + 0; however, their R + 4 landing was significantly shorter and harder than the pre-flight mean, with touchdown force > 3 SD above it (Fig. 6a).

Fig. 6

Landing parameters from two subjects demonstrating minor changes in the first landing on R + 0. *** denotes a value > 3 SD above/below the individual’s pre-flight mean; * denotes a value > 2 SD above/below the individual’s pre-flight mean. Error bars on pre-flight data represent SD. (a) Subject 2 was the only subject with R + 0 landing parameters within the nominal range (< 2 SD from pre-flight mean) on the initial R + 0 landing attempt. However, this subject was the only pilot to perform an off-nominal (heavy) landing in subsequent post-flight testing, with a landing force > 3 SD above pre-flight mean and a short landing (range > 2 SD less than pre-flight) on R + 4. (b) Subject 4 was the only pilot with all post-flight landing parameters < 3 SD from baseline, although both R + 0 landings featured TD speed > 2 SD from pre-flight values, and TD range was more than 2 SD from baseline on the second R + 0 attempt and on R + 8.

A general observation from Fig. 5 is that all three subjects with major performance decrements (> 3 SD) on the first landing attempt after spaceflight crossed the runway threshold at a significantly higher altitude compared to their individual pre-flight means. As this likely reflected errors in aircraft control in the earlier phases of flight, we assessed each of these subjects’ R + 0 landings in its entirety to determine what factors contributed to the off-nominal touchdown parameters. Beginning with subject 3 (Fig. 5b), we see that the height at the runway threshold and the distance of touchdown from the runway threshold (range) were both > 3 SD above their pre-flight mean. Figure 7a shows the altitude for all the pre-flight (blue lines) and first (thick red line) and second (thin red line) R + 0 landing attempts for subject 3. All pre-flight landings follow a similar pattern to that observed in the exemplar of Fig. 3, where altitude was maintained at 1500 ft until the initiation of the second banking turn (Fig. 7b,c), upon which the aircraft gradually descends through the turn and the final approach to touchdown. During the initial R + 0 landing, however, it is readily apparent that the pilot was unable to maintain altitude upon entering the first leftward turn, ascending rapidly during the banking manoeuvre to over 2500 ft (Fig. 7a). The pilot attempted to correct the altitude during the downwind leg, descending to around 1750 ft, before once again ascending during the second 180° turn to a little over 2000 ft. Rather than descending gradually upon entering the final approach, the pilot initiated a more aggressive descent, passing the runway threshold at almost 3 times the pre-flight height and touching down almost 1000 m from the runway threshold.

Fig. 7

Landing data from subject 3. (a) Altitude—blue lines are the pre-flight landings; thick red line is the first attempt on R + 0; thin red line is the second R + 0 landing. (b) Aircraft roll, showing the two 180° turns, and (c) overhead flight path, from the initial R + 0 landing. Map data: Google DigitalGlobe. Note the pilot’s inability to maintain altitude during banking on the initial R + 0 attempt.

A similar loss of altitude control was observed in Subject 5 (Fig. 8). On the first R + 0 attempt this subject passed the runway threshold at a height four times their pre-flight mean (> 2 SD) and touchdown speed, at 141.4 KIAS, was a little more than 20 KIAS above pre-flight mean (> 3 SD) (Fig. 5c; Fig. 8a inset—thick red and blue lines). The aircraft’s altitude began to rise soon after initiating the first banking turn (Fig. 8b—thick red line) and remained 200–400 ft higher than the pre-flight patterns throughout the final turns and approach. The increased touchdown speed was likely a consequence of the steeper descent just before touchdown (Fig. 8b). Deviations from pre-flight were also apparent in the banking manoeuvres (Fig. 8c) and flight path (Fig. 8d). The first 180° turn was incomplete, with the aircraft exiting the turn on a NNE heading rather than due north parallel to the runway (Fig. 8d). A further 3 banking manoeuvres were initiated to align the aircraft with the runway for final approach (Fig. 8c,d); rather than the standard two 180° turns the pilot performed a total of four smaller turns. Again, landing parameters were consistent with pre-flight values on the second R + 0 landing (Fig. 8a,b—thin red line).

Fig. 8

Landing data from subject 5. (a) Airspeed (KIAS): blue lines are pre-flight landings; thick red line is the first attempt on R + 0; thin red line is the second R + 0 landing. (b) Altitude. (c) Aircraft roll (first R + 0 attempt), showing an incomplete first turn and three subsequent turns. (d) Overhead flight path (from the initial R + 0 landing), showing the consequence of the incomplete first turn that left the aircraft on a NNE heading. Map data: Google DigitalGlobe.

In contrast to the other two subjects above (S3 and S5) who exhibited R + 0 touchdown parameters > 3 SD above their pre-flight mean, subject 1 maintained altitude close to pre-flight values throughout the initial approach and turns of the first R + 0 attempt (Fig. 9b, thick red line). However, this attempt was arguably the most dramatic of the post-flight landings. It is apparent from the pre-flight data that subject 1 tended to follow a NNE heading out of the first turn and perform a wider final turn to align with the runway for final approach (Fig. 9a, blue lines). During the initial R + 0 approach the pilot appeared to experience spatial disorientation after exiting the first turn on a NNE heading, performing a small left turn towards the runway (Fig. 9c, between points 2 and 3) but then continuing across the runway approach on a WNW heading, passing over the Sam Houston Tollway towards downtown Houston (Fig. 9a, thick red line). At this point the pilot appeared to realise their navigational error and initiated a large leftward banking turn (Fig. 9c), approaching the runway threshold on an ESE heading and performing a 50° right banking turn over the runway apron (Fig. 9c). The aircraft passed over the threshold at an altitude of 470 ft with a loss of roll control, resulting in a rapid series of left/right banking manoeuvres over the runway until touchdown almost 1.6 km from the runway threshold, almost overrunning the 2.7 km long runway. The second R + 0 landing attempt was an improvement: although the subject lost altitude control at the beginning of the first turn, ascending to 2000 ft, they recovered quickly (Fig. 9b, thin red line) and passed over the runway threshold at a more reasonable 212 ft, but range was still above the pre-flight mean at 1.3 km.

Fig. 9

Landing data from subject 1. (a) Overhead flight path from pre-flight (blue lines) and the initial (thick red line) and second (thin red line) landings on R + 0. (b) Altitude (ft): blue lines are pre-flight landings; thick red line is the first attempt on R + 0; thin red line is the second R + 0 landing. (c) Aircraft roll (first R + 0 attempt). In this instance the pilot appeared to experience spatial disorientation, crossing over the runway approach before turning back towards the runway, crossing the threshold at about 500 ft whilst performing a rapid series of L/R roll manoeuvres over the runway before touchdown almost 1.6 km from the runway threshold. Map data: Google DigitalGlobe.

Discussion

The results of this study demonstrate that extended microgravity exposure during ISS missions adversely affected post-flight simulated landing performance in a small cohort of highly experienced astronaut pilots. Four of the five subjects (S1, S3, S4, S5) exhibited touchdown parameters on the initial R + 0 landing attempts > 2 to 3 SD outside of their mean pre-flight performance, although only one subject’s landing (S1) could be seen as a potentially dangerous (simulated) outcome. The only subject to maintain touchdown parameters within 2 SD of their pre-flight performance on landing day, subject 2, exhibited an off-nominal landing three days after return from the ISS (R + 3), touching down close to the runway threshold with a landing force well outside of the nominal range and > 3 SD above their pre-flight mean. Subject 4 was the only subject to maintain all post-flight landing parameters within 3 SD of their pre-flight mean but exhibited minor performance degradations on both R + 1 attempts (this subject was not available for testing until 36 h after return) and on R + 3. Subjects 1, 3, and 5 exhibited nominal touchdown parameters on the two outer post-flight test days (R + 4 and R + 8). The observed degradation in piloting performance on R + 0 was consistent with the significant decrements in driving performance on the day of return from the ISS2. Our pilot subjects made more crossings into the wrong lane, took longer to correct, and spent a greater percentage of time in the wrong lane compared to pre-flight (see Fig. 4). Performance returned to baseline in both the T-38 and driving tasks by R + 4.
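The SD-based criteria used throughout these comparisons can be sketched in a few lines. This is a minimal illustration only, not the analysis code used in the study: the function names, the choice of sample SD, the "minor"/"major" labels and the pre-flight values are all hypothetical. Only the post-flight value of 141.4 KIAS is taken from the text (subject 5's first R + 0 touchdown speed).

```python
from statistics import mean, stdev


def sd_deviation(preflight, postflight_value):
    """Signed distance of a post-flight value from the pre-flight mean,
    expressed in units of the pre-flight SD (sample SD is an assumption)."""
    sigma = stdev(preflight)
    if sigma == 0:
        raise ValueError("pre-flight values have zero variance")
    return (postflight_value - mean(preflight)) / sigma


def classify(z, minor=2.0, major=3.0):
    """Label a deviation using illustrative 2 SD and 3 SD thresholds."""
    magnitude = abs(z)
    if magnitude > major:
        return "major"
    if magnitude > minor:
        return "minor"
    return "nominal"


if __name__ == "__main__":
    # Hypothetical pre-flight touchdown speeds (KIAS) for one pilot.
    preflight_speeds = [118.0, 121.0, 119.5, 122.0, 120.5, 119.0]
    z = sd_deviation(preflight_speeds, 141.4)
    print(f"deviation = {z:.1f} SD, classified as {classify(z)}")
```

Normalising each landing parameter by the individual pilot's own pre-flight variability, rather than by a cohort-wide value, matches the way the decrements above are reported against each subject's own pre-flight mean.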

The NASA requirement for a seated assessment protocol negated the possibility that gross physiological changes associated with long-duration spaceflight, such as atrophy of postural muscles, loss of load-bearing bone mass and post-flight orthostatic hypotension, adversely affected pilot performance, and a static visual acuity test found no clinical signs of vision loss on landing day due to increased intracranial pressure on orbit. What then are the critical factors underlying the decline in operator proficiency on R + 0? Returning from the ISS for post-flight testing is a complex process. After detaching from the station the Soyuz capsule descends to Earth over a 3-h period (with a nominal maximum g-load of 4.5-g28) followed by a parachute landing in Kazakhstan, recovery from the landing site and transfer to Houston on a NASA Gulfstream III jet28, arriving at Johnson Space Center around 24 h after departure from the ISS. Not surprisingly, astronauts’ self-reported sleepiness was significantly higher on the day of return from the ISS. However, we believe that fatigue alone is not the primary cause of post-flight performance deficits. A sleep-restricted control cohort showed no evidence of performance decrements on the driving simulation task after a 30-h sleep restriction protocol that generated a similar heightened score on the Stanford Sleepiness Scale as the astronaut group on R + 02, indicating that the landing day decrements in driving ability in our pilot cohort were not primarily due to fatigue. Although we lacked a similar sleep-restriction control study with experienced T-38 pilots, astronaut pilots in the current study did not exhibit changes in the simple reaction time task on R + 0, suggesting an ability to attend to the task at hand for short periods such as the T-38 simulation. The significant improvement in pilot performance on the second R + 0 landing attempt, a minute after the initial attempt and presumably achieved at a similar level of fatigue, also suggests that increased self-reported sleepiness on landing day was not by itself the primary factor in post-flight pilot performance decrements.

We feel it is unlikely that the astronaut pilots simply ‘forgot’ how to perform the T-38 overhead approach and landing simulation during the approximately 247 days between the final pre-flight test session (mean 77 days before launch) and the R + 0 session following 170 days on orbit. All five subjects were experienced T-38 pilots and were required by NASA to maintain currency with annual T-38 flight and simulator training27, including the standard overhead landing pattern. In keeping with their status as professional T-38 pilots, no training on our landing simulation was required beyond a familiarisation session with the simulator control hardware. No learning effect was observed, with no significant changes in touchdown (TD) parameters between the initial and final baseline data collection (BDC) sessions (for example, TD speed BDC 1 [116.9 kn SD 13.3] vs. BDC 3 [113.6 kn SD 9.3]). All subjects were familiar with Ellington Air Force Base and runway 17R (the simulated landing site); in fact all subjects had landed at Ellington Field just prior to R + 0 testing on their return from Kazakhstan. The simulation was designed such that navigation to the airport was not required, as the initial conditions had the aircraft aligned with (and within sight of) runway 17R. Moreover, as previously reported2, a ground control group tested at the same temporal spacing as the astronauts showed no change in performance on the driving simulator task after a 247-day gap. Although we did not have the resources to perform a similar ground-based control study with experienced T-38 pilots, a cohort of five private pilots tested at the same temporal spacing as the astronauts exhibited minimal changes in performance on the T-38 landing simulation after a 247-day gap (see supplemental data file), although it must be stated that their performance in general was far below that of the experienced pilot cohort and thus highly variable.

The case studies of the three pilots (S1, S3, S5) who exhibited major performance decrements (> 3 SD from pre-flight mean) on the initial R + 0 landing attempts demonstrated two underlying causes: an inability to maintain altitude while performing the sweeping banking turns of the overhead approach, and spatial disorientation (a lack of awareness of the aircraft’s position and orientation with respect to the runway). The vestibular system is intimately associated with both. The two ‘hottest’ shuttle landings on record demonstrate the potential for piloting errors in these situations; STS-90 touched down at 224 kn after the commander experienced spatial disorientation (described as ‘tumbling the gyros’) following head movement late in the final approach, and a pilot-induced pitch oscillation of the shuttle nose occurred following main gear touchdown of STS-3 at 220 kn, possibly due to an error in perception of the pitch gravito-inertial acceleration (GIA) vector during deceleration1. Although we did not quantify vestibular stimuli in the current study, simulator kinematics during banking turns involved negligible cabin motion. In flight the cabin vertical is aligned with the GIA vector (the sum of gravity and centripetal acceleration) during aircraft banking, which is simulated by alignment of the cabin with the gravitational vertical (i.e. upright) during virtual banking; it is the visual scene alone that generates much of the sensation of tilt even during full-motion simulations. Our previous study of roll tilts of the head and eye during 45° banking turns in a fixed-base flight simulator found a combined head-eye roll gain of 25% towards the scene vertical29, likely an optostatic cervical/ocular response to sustained tilt of the visual scene30,31. A direct measure of the otolith-tilt response, OCR gain, was significantly reduced by 24% 2–3 days after ISS missions in 25 astronauts12, and was likely even more depressed on landing day given the recovery of astronaut performance by R + 4 observed in the current and previous studies2. Similarly, motion perception results from our recent ISS study (which included these five pilot subjects) demonstrated a reduction in post-flight sensitivity to roll and pitch motion of the simulator cabin at a frequency of 0.12 Hz2. As optokinetic and vestibular reflexes share substantial neural pathways20, we propose that post-flight impairment of the ability to process low-frequency tilt information, whether from the otoliths or vision, was a likely contributor to the pilots’ inability to maintain control during the banking turns of the T-38 approach, which corresponded to GIA tilts of around 0.017 Hz, well within the 0.33 Hz limit of the human tilt response11.
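The geometry behind this argument can be made explicit with a short worked sketch. It assumes an idealised coordinated, level banking turn, which the text does not state; the symbols (bank angle \(\phi\), airspeed \(v\), turn radius \(r\), centripetal acceleration \(a_c\), GIA vector \(\mathbf{f}\), tilt frequency \(f_{\text{tilt}}\) and period \(T\)) are introduced here for illustration and are not quantities measured in the study.

```latex
% Idealised coordinated level turn (assumption): the centripetal
% acceleration is set by the bank angle, and the GIA magnitude grows
% with bank while its direction stays along the cabin vertical.
\[
  a_c = \frac{v^{2}}{r} = g\tan\phi ,
  \qquad
  \lvert \mathbf{f} \rvert = \sqrt{g^{2} + a_c^{2}} = \frac{g}{\cos\phi}
\]
% Period corresponding to the tilt frequency quoted in the text.
\[
  T = \frac{1}{f_{\text{tilt}}} = \frac{1}{0.017\ \mathrm{Hz}} \approx 59\ \mathrm{s}
\]
```

Under these assumptions, a hypothetical 45° bank gives a GIA magnitude of about 1.41 g directed along the cabin vertical, and a tilt stimulus at 0.017 Hz takes roughly a minute to complete one cycle, far slower than the 0.12 Hz cabin motion used in the motion perception test.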

There was a small but significant decline in manual dexterity on landing day in the pilot cohort that was not observed in the ground control study2. It is possible that this small decrement (around 10%) in fine motor control may have impacted pilot performance, although it is hard to imagine such a minor change causing the obvious deviations from pre-flight pilot performance observed in the R + 0 landings. Perhaps more relevant was the significant performance deficit when dual tasking2; adding a distracting task significantly reduced manual tracking accuracy, suggesting a post-flight lack of cognitive reserve. Dual-tasking was unaffected by a 247-day gap in testing or by a sleep restriction protocol in the ground control cohorts2, indicating that this deficit was related to spaceflight. In addition to deficits in the tilt response described above, an inability to effectively multi-task post-flight was likely a contributing factor to the inability of our pilot subjects to maintain proper altitude and heading whilst banking the aircraft.

The results of this T-38 study, together with our previous associated study of sensorimotor/cognitive and driving performance after spaceflight2 and our previous study of actual shuttle landings1, demonstrate that subtle physiological changes in-flight degrade post-flight pilot performance. Our results show that highly experienced professional pilots had difficulty controlling an aircraft in a gravitational environment after 6 months in microgravity, with an inability to accurately process vestibular and visual tilts and to carry out multiple conflicting tasks simultaneously. It is difficult to quantify the risks these impairments may pose in future missions if unaddressed, although the worst-case scenario of loss of control during a landing millions of kilometres from aid is readily imaginable. It is arguably more valuable to use the results to inform countermeasure development. A positive result was that our astronaut pilots were able to successfully perform the landing task on the second attempt on R + 0, suggesting a rapid recovery once exposed to the task at hand. Pre-task (‘just in time’) simulation training, such as the laptop landing simulator used by shuttle pilots on orbit prior to return, may help maintain proficiency during extended spaceflight. Measures to aid pilots in resolving GIA and visual tilts in provocative inertial environments, such as improved visual displays or tactile vests32,33 to indicate the GIA or gravitational vertical, should be included in design considerations for future exploration class spacecraft. In addition, limiting dual or competing tasks during mission-critical phases should be considered.