Introduction

The fatigue detection of pilots is an important subject in aviation because it directly affects the safety of flights and passengers1. Pilots are a primary factor in ensuring flight safety2,3,4. According to flight accident investigations, about 70% of accidents are directly related to human factors, and most of the investigated accidents are caused by pilots operating under fatigue-dominated conditions5,6. Numerous studies have found strong correlations between facial expressions and human fatigue levels7,8. Accordingly, pilots' fatigue states can be efficiently determined by monitoring changes in their facial expressions during flights. Two different approaches are generally employed to pinpoint fatigue levels: (1) physiological signals and (2) facial information.

When fatigue detection is based on physiological signals, electroencephalogram (EEG), electrocardiogram (ECG), electromyographic (EMG), and other measurements are primarily employed to capture pilots' brain, heart, and muscle signals through various sensors, so that the fatigue stages of pilots can be measured numerically9,10. Using ECG, EMG, pulse, and respiration data collected during flights, Jiayun et al.11 proposed an algorithm to evaluate pilot workload and optimize the ergonomic design of the aircraft cockpit. Fei et al.12 extracted attributes of pilots' EEG signals and employed the support vector machine (SVM) algorithm to classify pilots' fatigue states. Xu et al.13 proposed a hybrid multi-class Gaussian process model to pinpoint pilots' fatigue levels by analyzing surface electromyographic signals measured from the pilot's neck and upper-arm muscles. Hu et al.14 provided psychological insights into available non-invasive fatigue measurements of drivers and pilots by distinguishing between drowsiness and mental fatigue. Du15 investigated pilots' fatigue using EEG signals.

Alternatively, fatigue information can be extracted from human faces. Yang16 introduced a network framework based on the length and angle attributes of face grid points to address the low accuracy of facial expression detection under skewed face postures when the face structure is represented point-wise. Wang et al.17 proposed a method for monitoring pilot fatigue levels based on human eye detection. You18 suggested a camera-based method using machine vision and the percentage of eyelid closure over the pupil over time (PERCLOS) algorithm to derive information for fatigue detection: pilot images were collected for facial recognition, eye recognition, and eye-state determination to monitor pilots' fatigue in real time. Zhang et al.19 suggested a method that detects and tracks pilots' head positions based on a cascade CNN, which can effectively monitor the head movements of the pilot during the cockpit simulation training stage. Liu et al.20 proposed a deep learning algorithm that detects fatigue from facial expressions, aiming to improve the accuracy and timeliness of driver fatigue detection: twenty-four facial attributes were extracted, two parameters describing the drivers' fatigue states were computed, and a fuzzy inference system was then implemented to determine the drivers' fatigue states.

Most of the available literature models and analyzes fatigue using physiological data and face images. However, few studies have addressed detecting the facial feature points of pilots. In this article, the facial feature points of flight trainees were extracted while land-air calls were simulated, and a face fatigue model, called EMF, was constructed from these feature points. A particle swarm optimization-based CNN (PSO-CNN) algorithm was then proposed to build a face fatigue recognition model and was implemented to determine the fatigue levels of flight trainees.

Experimental design

Participants

In the study, a total of forty male students aged between 20 and 22 years (mean 21.5 years, standard deviation 0.47 years), pursuing a degree in flight technology at the College of General Aviation and Flight at Nanjing University of Aeronautics and Astronautics (NUAA), participated in the tests. All had normal vision (or corrected-to-normal vision), showed no vestibular symptoms or neurological disorders, and received training before the flight simulations were run. Consuming alcohol or taking any neurological drugs was not allowed before the experiment. Five of the subjects had sufficient sleep (more than 8 h) on the day before the experiment, while five did not (5–6 h).

Experimental flight subjects

The airfield traffic pattern was chosen as the simulated flight subject, based on a comprehensive literature survey and consultation with senior flight instructors that underlined its importance. The airfield traffic pattern is a pivotal component of pilot training since it comprises the maneuvers flown around an airport: takeoff, climb, turning, leveling off, descent, and landing. Figure 1 depicts a schematic diagram of an airfield traffic pattern.

Fig. 1. A diagram of the airfield traffic pattern.

Procedure

Participants completed the five-section flight simulation in the Primary Flight Simulation Laboratory at NUAA, where video recordings of the subjects' faces were collected while the pilots' land-air calls were simulated during the flights.

The Cessna C172SP Skyhawk airplane was used, and Beijing Capital International Airport was chosen for the experiment. The environmental conditions were set to a clear sky with a wind velocity of 5–15 knots. The simulated flight followed the airfield traffic pattern and lasted about 12 min. Figure 2 depicts the whole process together with a picture of a flight trainee's face.

Fig. 2. The experimental process.

The experiment required the flight trainees to maneuver the airplane properly and make land-air calls while the simulation was running. The whole experiment was videotaped, and the video clips of maneuvering, land-air calls, and yawning were screened. After the experiment ended, the recorded video was played back and the flight trainees were asked to report how they had felt, to collect subjective data.

The extraction of facial feature points

In this article, the open-source Dlib library is employed to derive facial feature points20. The shape_predictor_68_face_landmarks.dat model in the Dlib library is used to pinpoint the feature points of the faces in the video recorded during the pilots' land-air calls. Thus, the coordinates of the 68 feature points of the pilots' faces are obtained for each frame. Figure 3 depicts the 68 facial feature points, illustrated on one of the authors' own pictures.
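A minimal sketch of this extraction step is given below, using the standard Dlib and OpenCV APIs; only the model file name comes from the text, and the per-frame video handling is left to the caller.

```python
# A minimal sketch of the Dlib landmark extraction described above; the
# detector/predictor calls follow the standard Dlib API.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_landmarks(frame):
    """Return the 68 (x, y) feature points of the first detected face, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)  # upsample once to help with small faces
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    return [(shape.part(i).x, shape.part(i).y) for i in range(68)]
```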

Fig. 3. The 68 facial feature points.

Facial feature model

EMF feature model for face fatigue recognition

The EMF face fatigue model comprises three feature dimensions: the eyes, the mouth, and the face contour. The face fatigue detection model is constructed based on these three feature groups, which are illustrated on the author's own picture.

Eye features

When flight trainees operate the aircraft under normal conditions, the eye aspect ratio (EAR) remains stable around a certain value, except during blinks. When yawning occurs, however, the EAR changes substantially. Figure 4a depicts the eye feature points. A total of six eye feature points are pinpointed, represented by P1–P6 in Fig. 4a. Equation (1) presents the eye aspect ratio.

$$ EAR = \frac{{\left\| {p_{2} - p_{6} } \right\| + \left\| {p_{3} - p_{5} } \right\|}}{{2\left\| {p_{1} - p_{4} } \right\|}} $$
(1)
Fig. 4. Feature points for face fatigue detection.

Mouth features

When flight trainees speak during land-air conversations, the mouth aspect ratio (MAR) changes. When flight trainees yawn, as depicted in Fig. 5, the MAR changes markedly. Figure 4b depicts the mouth feature points. Similarly, the six feature points of the mouth are represented by P7–P12. Equation (2) presents the MAR.

$$ MAR = \frac{{\left\| {p_{8} - p_{12} } \right\| + \left\| {p_{9} - p_{11} } \right\|}}{{2\left\| {p_{7} - p_{10} } \right\|}} $$
(2)
Fig. 5. Alterations in facial features.

Facial contour features

When a flight trainee yawns during a land-air conversation, his or her facial contour changes substantially, in a way distinct from normal speech. Thus, the aspect ratio of the facial contour (FAR) can reflect the distinction between these two occurrences. Figure 4c depicts the facial contour feature points. Similarly, the six feature points of the FAR, represented by P13–P18, are chosen from the points pinpointed by the Dlib model package. Equation (3) presents the FAR.

$$ FAR = \frac{{\left\| {p_{14} - p_{18} } \right\| + \left\| {p_{15} - p_{17} } \right\|}}{{2\left\| {p_{13} - p_{16} } \right\|}} $$
(3)

The feature points of the eyes, mouth, and face contour reflect the fatigue characteristics of the trainees well. Thus, the EMF face fatigue recognition model is constructed based on the feature points of these three regions.
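Since Eqs. (1)–(3) share the same form, a single helper suffices; the sketch below assumes six points ordered as in the equations. The left-eye index mapping shown is the standard Dlib one, while the paper's mouth and contour mappings are not listed here.

```python
# A sketch of the shared aspect-ratio computation in Eqs. (1)-(3): the sum of
# the two vertical distances over twice the horizontal distance.
import numpy as np

def aspect_ratio(pts):
    """pts: six (x, y) points ordered as (p1..p6) in Eq. (1)."""
    p = np.asarray(pts, dtype=float)
    vertical = np.linalg.norm(p[1] - p[5]) + np.linalg.norm(p[2] - p[4])
    horizontal = np.linalg.norm(p[0] - p[3])
    return vertical / (2.0 * horizontal)

# Example: EAR of the left eye, using the standard Dlib indices 36-41.
def left_eye_ear(landmarks):
    return aspect_ratio([landmarks[i] for i in range(36, 42)])
```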

EMF's face fatigue features

Processing of facial feature data

In this article, a 20 s video of a flight trainee's face was captured at 10 frames/s with a resolution of 640 × 360. The whole video was converted into 200 images, some of which are shown in Fig. 5. The video contains both normal speech and yawning, and the change in each dimension of the EMF model is analyzed from it. The portrait in Fig. 5 is of the flight trainee under test.

The characteristic changes of the EMF model

In the research, the variations of the EAR, MAR, and FAR over the above 20 s video were derived, as depicted in Fig. 6. In the figure, the X-axis is the image frame index (in frames) and the Y-axis is the corresponding aspect ratio (unitless).
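A sketch of how these per-frame curves can be derived is given below, reusing the extract_landmarks and aspect_ratio helpers above; the video file name and the mouth/contour index mappings are assumptions, since the paper does not list the Dlib indices it uses.

```python
import cv2

EYE_IDX = list(range(36, 42))            # standard Dlib left-eye indices
MOUTH_IDX = [48, 50, 52, 54, 56, 58]     # hypothetical mapping for p7..p12
FACE_IDX = [0, 4, 6, 16, 12, 10]         # hypothetical mapping for p13..p18

cap = cv2.VideoCapture("trainee.mp4")    # hypothetical file name
ears, mars, fars = [], [], []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    pts = extract_landmarks(frame)
    if pts is None:
        continue                         # skip frames with no detected face
    ears.append(aspect_ratio([pts[i] for i in EYE_IDX]))
    mars.append(aspect_ratio([pts[i] for i in MOUTH_IDX]))
    fars.append(aspect_ratio([pts[i] for i in FACE_IDX]))
cap.release()
```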

Fig. 6. Changes in the facial feature points.

For the video under consideration, normal speech occurred from frames 60 to 90 and a yawn occurred from frames 120 to 160, as shown in Fig. 6. The facial feature data reveal that when yawning occurred, there were significant variations in the EAR, MAR, and FAR. During normal speech, by contrast, the MAR and FAR varied while the variation in the EAR was insignificant. Hence, the proposed EMF model reflects the changes in facial features well.

Methodology

Experimentally collected data

Recordings from 10 subjects were selected for processing and analysis. Facial feature data were labeled as fatigue representations when yawning occurred, and as non-fatigue representations during normal speaking or when no facial expression change occurred. A total of 766 facial feature samples were derived: 225 images of flight trainees maneuvering the aircraft in simulation, 287 of land-air talk, and 254 of yawning, labeled 1, 2, and 3, respectively. Table 1 presents them.

Table 1 The extracted data.
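A sketch of how such a labeled dataset can be assembled is shown below; storing one (EAR, MAR, FAR) triple per image and the .npy file names are assumptions about the storage format, which the paper does not specify.

```python
import numpy as np
from tensorflow import keras

X = np.load("emf_features.npy").reshape(-1, 3, 1)  # hypothetical: 766 x (EAR, MAR, FAR)
y = np.load("labels.npy")                          # 1 = maneuver, 2 = land-air talk, 3 = yawn
y_onehot = keras.utils.to_categorical(y - 1, num_classes=3)
```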

PSO-CNN

Particle swarm optimization

A particle swarm optimization (PSO) is initialized with a group of random particles, and the optimal solution is then determined through iteration21. Equations (4) and (5) present the fundamental mathematical expressions of the PSO.

$$ v_{i} = v_{i} + c_{1} \times rand() \times \left( pbest_{i} - x_{i} \right) + c_{2} \times rand() \times \left( gbest_{i} - x_{i} \right) $$
(4)
$$ x_{i} = x_{i} + v_{i} $$
(5)

where N represents the total number of particles; vi denotes the velocity of the particle; rand() designates a random number in (0,1); xi denotes the particle's current position, with i = 1, 2, 3, …, N; c1 and c2 represent the learning factors, generally set to 2; and pbest and gbest represent the two extremes (the particle's own best position and the swarm's global best) that the particle follows.
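A minimal sketch of this update rule for a one-dimensional particle (e.g., a candidate neuron count) might look as follows; the fitness evaluation that drives pbest and gbest is kept abstract here.

```python
import random

def pso_step(x, v, pbest, gbest, c1=2.0, c2=2.0):
    """One velocity/position update per Eqs. (4) and (5)."""
    v = (v
         + c1 * random.random() * (pbest - x)
         + c2 * random.random() * (gbest - x))
    return x + v, v
```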

CNN

Figure 7 depicts a CNN, a kind of feed-forward neural network defined with an activation function and composed of a convolutional layer, a pooling layer, and a fully connected layer22.

Fig. 7. A CNN algorithm.

The convolutional layer includes multiple convolutional kernels, each covering an area called the "receptive field". The pooling layer selects features to decrease the number of features in the input data. The fully connected layer fits the derived features to the output non-linearly.

The combination of the EMF model with the PSO-CNN

In this research, the combined EMF-PSO-CNN is employed for facial fatigue recognition. First, the facial feature points are recognized and the aspect ratios of the EMF model are calculated; the recognition results are then visualized. The process can run video recognition synchronously with the calculation of the aspect ratios of the three EMF dimensions.

Then, the PSO-CNN algorithm is employed for training. As described above, the convolutional layer contains multiple convolutional kernels covering the receptive field, the pooling layer is used for feature selection to reduce the number of features in the input data, and the fully connected layer fits the extracted features to the output non-linearly. Figure 8 depicts the structure of the algorithm.
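A sketch of how the PSO can drive the CNN's hyperparameter search is given below, using the pso_step helper from Eqs. (4) and (5) to tune the neuron count of the second fully connected layer. The layer sizes in build_cnn and the swarm settings are illustrative assumptions (the authors' actual values are in Table 2), and X_tr/y_tr/X_va/y_va denote the 75/25 split described in the next subsection.

```python
import random
import numpy as np
from tensorflow import keras

def build_cnn(n_units, input_shape=(3, 1), n_classes=3):
    """1-D CNN over the EMF ratios; sizes are illustrative, not Table 2's."""
    return keras.Sequential([
        keras.layers.Input(shape=input_shape),
        keras.layers.Conv1D(16, 2, padding="same", activation="relu"),
        keras.layers.MaxPooling1D(2),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(int(n_units), activation="relu"),  # PSO-tuned layer
        keras.layers.Dense(n_classes, activation="softmax"),
    ])

def fitness(n_units, X_tr, y_tr, X_va, y_va):
    """Short training run; validation accuracy serves as the fitness."""
    model = build_cnn(n_units)
    model.compile(optimizer="rmsprop", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_tr, y_tr, epochs=20, batch_size=16, verbose=0)
    return model.evaluate(X_va, y_va, verbose=0)[1]

def pso_search(X_tr, y_tr, X_va, y_va, n_particles=5, n_iter=10):
    """Search the neuron count in [8, 128] with a small swarm."""
    xs = [random.uniform(8, 128) for _ in range(n_particles)]
    vs = [0.0] * n_particles
    pbest = xs[:]
    pbest_fit = [fitness(x, X_tr, y_tr, X_va, y_va) for x in xs]
    gbest = pbest[int(np.argmax(pbest_fit))]
    for _ in range(n_iter):
        for i in range(n_particles):
            xs[i], vs[i] = pso_step(xs[i], vs[i], pbest[i], gbest)
            xs[i] = float(np.clip(xs[i], 8, 128))
            f = fitness(xs[i], X_tr, y_tr, X_va, y_va)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = xs[i], f
        gbest = pbest[int(np.argmax(pbest_fit))]
    return int(round(gbest))
```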

Fig. 8. The structure of the algorithm.

Network training and prediction

The training process employs the Keras architecture on top of the TensorFlow deep learning framework. The experimental environment is as follows: CPU: Intel i5-10400F, 3.2 GHz; operating system: Windows 10 64-bit; programming language: Python 3.7.7; deep learning framework: TensorFlow 2.3.0 with Keras 2.4.3.

75% of the data were allocated as training data and 25% as test data. After the second fully connected layer is optimized by the PSO, the optimal number of neurons is 38. Table 2 presents the parameters of the whole network.

Table 2 The model parameters of the PSO-CNN.

Figure 9 depicts the specific structure of the PSO-CNN algorithm.

Fig. 9. The structure of the proposed PSO-CNN.

As Fig. 10a depicts, the model optimization is carried out using an RMSprop optimizer with a learning rate of 0.001, a training batch size of 16, and 500 iterations. The final detection accuracy of the proposed algorithm is reported on both the training and test data.
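Combining the 75/25 split with the optimizer settings above, the final training run might be sketched as follows; the categorical cross-entropy loss is an assumption matching the three-class labels (the text quotes it explicitly only for the transformer baseline).

```python
from sklearn.model_selection import train_test_split
from tensorflow import keras

X_tr, X_te, y_tr, y_te = train_test_split(X, y_onehot, test_size=0.25)
model = build_cnn(n_units=38)            # PSO-optimal neuron count from above
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(X_tr, y_tr, batch_size=16, epochs=500,
                    validation_data=(X_te, y_te))
```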

Fig. 10. Accuracy of the model recognition.

After running 500 training sessions with the PSO-CNN, the accuracy reached 93.9% on the validation set.

For comparison, Fig. 10b presents the CNN with the same fully connected layer when the PSO is not used for optimization, showing the final detection accuracy on the training and test datasets.

The highest accuracy of the plain CNN, 89.6% on the validation set after 500 sessions, is lower than that of the PSO-CNN.

Comparative analysis

To further validate the proposed approach, a transformer deep learning algorithm is employed for comparison. The number of heads in the transformer layer is set to 4, and Table 3 presents the structure of the algorithm.

Table 3 The model parameters of the transformer.

The transformer model likewise utilizes 75% of the data for training and 25% for testing. The iteration number is set to 500, the loss function is categorical cross-entropy, and the optimizer is the Adam algorithm. The model accuracy is depicted in Fig. 11.
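A comparable transformer classifier might be sketched as below with a single encoder block; only the 4 attention heads, the Adam optimizer, and the categorical cross-entropy loss come from the text, while the projection width and feed-forward size are illustrative stand-ins for Table 3. Note that keras.layers.MultiHeadAttention requires a newer TensorFlow than the 2.3.0 quoted above.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(3, 1))                  # EMF ratio triple
x = layers.Dense(16)(inputs)                        # project to model width
attn = layers.MultiHeadAttention(num_heads=4, key_dim=16)(x, x)
x = layers.LayerNormalization()(x + attn)
ffn = layers.Dense(32, activation="relu")(x)
ffn = layers.Dense(16)(ffn)
x = layers.LayerNormalization()(x + ffn)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(3, activation="softmax")(x)

transformer = keras.Model(inputs, outputs)
transformer.compile(optimizer="adam", loss="categorical_crossentropy",
                    metrics=["accuracy"])
transformer.fit(X_tr, y_tr, epochs=500, batch_size=16,
                validation_data=(X_te, y_te))
```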

Fig. 11. The accuracy of model recognition with the transformer.

To validate the reliability of the PSO-CNN, two conventional machine learning methods, the random forest23 and support vector machine24 algorithms, were picked for comparison. These two algorithms were each run for up to 50 iterations, and the optimal recognition accuracy was obtained. Table 4 summarizes the recognition results of each algorithm.
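The two baselines can be reproduced in outline with scikit-learn as below; default hyperparameters are used here, whereas the paper tunes each algorithm over up to 50 iterations.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

Xf = X.reshape(len(X), -1)              # flatten (3, 1) features for sklearn
yc = np.argmax(y_onehot, axis=1)        # class indices instead of one-hot
Xf_tr, Xf_te, yc_tr, yc_te = train_test_split(Xf, yc, test_size=0.25)

for clf in (RandomForestClassifier(), SVC()):
    clf.fit(Xf_tr, yc_tr)
    print(type(clf).__name__, f"{clf.score(Xf_te, yc_te):.3f}")
```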

Table 4 Comparison of algorithms.

Table 4 shows that, among the four algorithms compared, the optimized PSO-CNN achieves a high recognition rate with good robustness and can accurately classify the face fatigue levels of flight trainees.

The proposed algorithm is fast, though restricted by hardware conditions, while the conventional machine learning algorithms run slightly more slowly. Overall, the proposed algorithm can be implemented to detect the fatigue condition of flight trainees.

Verification and validation of the algorithm

To further verify the accuracy of the model, additional data were collected to verify and validate the algorithm. Ten more flight trainees majoring in flight technology were recruited according to the same criteria; their mean age was 20.5 years with a standard deviation of 0.71 years. The face feature point extraction process is shown in Fig. 12.

Fig. 12. The face feature point extraction process.

A total of 358 samples were collected in the validation test: 112 images of flight trainees maneuvering the aircraft in simulation, 124 of land-air talk, and 122 of yawning.

The facial feature points are extracted by the EMF model, and some of the resulting data are shown in Table 5.

Table 5 The data extracted in the validation experiment.

The trained PSO-CNN model was used to recognize the validation data, achieving an accuracy of 91.2%. This ability to recognize the flight trainees' facial expressions well verifies the reliability of the model.
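The validation step itself reduces to a prediction pass, sketched below under the assumption that X_val/y_val hold the 358 new samples prepared in the same format as the training data.

```python
import numpy as np
from sklearn.metrics import accuracy_score

pred = np.argmax(model.predict(X_val), axis=1)
true = np.argmax(y_val, axis=1)
print(f"Validation accuracy: {accuracy_score(true, pred):.1%}")  # paper: 91.2%
```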

Conclusion

The following conclusions were obtained from the investigation of flight trainees' simulated airfield traffic patterns with land-air calls:

(1) The face video recordings of flight trainees were collected during simulated land-air calls, and the facial feature points were obtained by employing the Dlib package.

(2) Based on the extracted feature points of the flight trainees' faces, an EMF fatigue model was constructed.

(3) A PSO-CNN algorithm was constructed and implemented to train on and predict the fatigue attributes of the flight trainees' face data in simulation, with a prediction accuracy of 93.9%. A comparison study against the RF and SVM algorithms was run for validation.

(4) By screening the facial feature data used to pinpoint the fatigue levels of flight trainees' faces, the training time of the model can be effectively reduced and the recognition efficiency improved.

In future research, we plan to further optimize the algorithm to improve the accuracy when flight trainees conduct land-air calls.