Introduction

With the development of web services and mobile technology over the last decade, smartphone-based ride-hailing services have emerged as a supplement to public transport. Ride-hailing services first appeared in 2011, and the market has since grown rapidly, although without proper regulation in the early stages of its development1. Nevertheless, ride-hailing offers greater flexibility for daily travel at reduced cost and has become increasingly popular worldwide, especially in fast-developing countries. In China, for example, by the end of 2022 there were more than 210 ride-hailing firms nationwide, serving nearly 340 million registered users, over 6% of whom were daily active users2. As travel demand has risen steadily, the industry currently employs over 3.08 million drivers and manages approximately 1.16 million vehicles, with over 300 new drivers joining every day3. At the same time, the surge in travel demand translates into high driving exposure on the roads, accompanied by a high rate of reported traffic accidents4. In Shenzhen, one of China's first-tier cities, 176,948 traffic violations were recorded in 2017, and 3,879 accidents were found to be related to ride hailing. The accident rate of ride-hailing cars was as high as 7.15%, whereas that of private cars was only 0.28% over the same period5. There is therefore an urgent need to better understand the underlying causes of ride-hailing accidents.

Research on the effect of ride-hailing cars on traffic accident rates has so far proved inconclusive. Some studies found a decline in certain types of accidents after ride-hailing entered road traffic6,7,8. In reality, many ride-hailing drivers work only part-time and drive mainly while commuting, which keeps the level of vehicle miles traveled lower. In contrast, Barrios et al.9 pointed out that ride-hailing taxis increase the rate of traffic accidents as the number of vehicles and miles traveled on the roads grows. Other studies, however, showed evidence that ride-hailing taxis may not always raise the rate of road traffic accidents9,10. Moreover, ride-hailing services may attract travelers away from public transportation such as buses and subways, resulting in more congestion on urban road networks, especially in fast-developing countries. It should also be noted that most existing studies were conducted in developed countries; few have investigated the risk factors related to ride-hailing drivers in the wider social, demographic and economic context of developing countries. Since ride-hailing cars usually operate in cities, their accident risk may be relatively higher in developing countries, given the more complex and mixed road traffic conditions there.

Compared with traditional professional drivers, ride-hailing drivers have several characteristics that may pose risks to road safety. For example, many ride-hailing drivers have a full-time job and offer ride-hailing services in their off-duty time. In that case, these “part-time” drivers may be driving after extended periods of wakefulness or at night. Previous studies showed that long periods of continuous driving are more likely to lead to an accident11,12. Accordingly, traffic regulators stipulate maximum continuous driving hours and maximum daily working hours for professional drivers. These regulations, however, are not sufficiently effective for ride-hailing drivers, because it is difficult to evaluate how fatigue from their primary job affects their part-time driving or to ensure that they rest adequately before driving ride-hailing cars. Moreover, Mao et al.13 found that nearly 33% of ride-hailing drivers tend to work long-shift bookings, with daily driving distances of up to 162.7 km; this heavy workload is an important reason for the high accident rate of ride-hailing drivers. Given these working schedules, Berneking et al.1 pointed out that fatigue and sleepiness are inherent safety risks in the ride-hailing industry, both of which have proved to be key factors in increasing the risk of drowsy-driving accidents14.

Risky driving behaviors have also been identified as major contributors to accidents in the authors’ recent studies11,15. For ride-hailing drivers, there are several reasons why risky driving behaviors may lead to a higher accident risk. For instance, ride-hailing drivers typically have not received the rigorous training and screening required for taxi or truck fleets13. Therefore, these drivers may fall short of professional drivers in driving experience and skills, emergency handling, and emotional control. In addition, most ride-hailing drivers lack experience in the service industry. As a result, they are more susceptible to passenger influence while driving, leading to more risky driving behavior. As highlighted by Acheampong16, reckless driving and risk-taking behaviors appear to be frequent in ride-hailing services and have seriously affected road safety. Ride-hailing drivers were found to be more pressed to save time and earn profits, and thus more likely to engage in risky driving behaviors such as neglecting to use turn signals, encroaching on other lanes, speeding, and running red lights17. Furthermore, since the smartphone is an important intermediary between ride-hailing drivers and their customers, smartphone-related risky behaviors are quite common among these drivers. According to the Provisions on Penalties for Road Traffic Safety Violations, using a phone while driving is classified as a traffic violation. Drivers who violate this provision face penalties, including demerit points and fines, because this behavior can easily lead to serious traffic accidents. A survey by Truong and Nguyen18 compared the frequency of smartphone use while driving between traditional professional drivers and their ride-hailing counterparts. Their findings showed that the latter used smartphones more frequently and caused more phone-related traffic accidents.
Ansar et al.19 examined the effect of driving distraction caused by passenger interference, smartphone use and entertainment elements on ride-hailing drivers. They concluded that phone-related behaviors might make drivers more susceptible to emotional changes while driving, thus increasing the risk of accidents.

Given the risk factors noted above, safety issues in ride-hailing services have attracted significant attention in academia, although studies on ride-hailing drivers in developing countries remain limited. In this paper, we attempt to fill this gap, taking China as an example to investigate the relationship between the severity of accidents caused by ride-hailing drivers and a range of influencing factors, including demographics, working conditions, driver fatigue, and risky driving behaviors. To this end, we used a Random Forest (RF) model to identify the factors that most strongly influence accident severity, and a Bayesian Network (BN) model to explore the causal structure of the associations between the identified risk factors and accident severity, taking the interactions among all variables into account.

Materials & methods

Data collection

In this study, participants completed an electronic questionnaire through an online survey platform (www.wjx.cn). Ethics approval was granted by the Chang’an University Human Ethics Committee, and we confirm that all experiments were performed in accordance with the relevant guidelines and regulations. The research team partnered with a professional survey company in China, which was responsible for distributing the questionnaires to ride-hailing drivers. Drivers first received the survey link shared in WeChat groups and filled out the questionnaire voluntarily. Participants were informed of their rights and of the protection of their personal information in an electronic consent form, which also explained the purposes of the study and promised that the survey data would be used for academic research only. Importantly, the questionnaires were completely anonymous. The minimum response time for each questionnaire was 10 min, and drivers who completed the questionnaire were compensated with 12 CNY. Only experienced ride-hailing drivers, each with at least 2 years of experience, were invited. The survey was conducted between October 10 and November 13 in 2023. The questionnaires were screened to ensure that respondents had answered all key questions on demographic information, working conditions, fatigue status, risky driving behavior, and accident history. Surveys lacking responses to any of these critical sections were classified as incomplete and deemed invalid. Of the 1,534 drivers surveyed, 1,356 (about 88.4%) returned valid questionnaires; their geographical distribution is shown in Fig. 1.

Fig. 1

Geographical distribution of the sample.

The questionnaire consists of four parts, as shown in the appendix (Table A1). The first part includes questions on demographic information, such as gender, age and education level, for examining the relationship between socio-demographic variables and traffic accidents4,15. Considering that many ride-hailing drivers have multiple jobs and work under adverse conditions, which may increase their accident risk12, the second part deals with the drivers’ working conditions, including employment status, daily working hours, daily income, dissatisfaction with income, and the commission rate charged by the ride-hailing company on each order. Questions in the third part concern driver fatigue, covering continuous driving profile, fatigue perceptions, and sleep problems; these were adapted from reports on professional drivers who are vulnerable to fatigue or sleepiness20. In the fourth part, drivers reported the frequency of risky behaviors while driving on a five-point Likert scale from “Never” to “Always”. Questions on risky behaviors were adapted from Wang et al.11 and Nguyen-Phuoc et al.17. For example: “As a ride-hailing driver, how often do you engage in the following risky behaviors: Using apps on your phone? Using the phone for navigating? Crossing a red light? etc.” Note that three questions on phone use while driving were included because of ride-hailing drivers’ dependence on smartphones18. Finally, accident records for each participant were collected over the past two years, consistent with the time window used in previous studies to reduce potential recall bias regarding accidents21,22.

Research design

The BN model has been gaining popularity in recent transportation safety research23,24,25,26,27. It visualizes the complex relationships and interactions among independent and dependent variables to support probabilistic inference and diagnosis. This method is therefore suitable for modeling inter-correlated independent variables and for explaining how changes in the attributes of ride-hailing drivers heterogeneously influence accident injury severity. However, the BN model has difficulty identifying the significant factors affecting the target variable, and each additional input factor increases the computational complexity exponentially28. The number of possible BN structures with \(n\) variables grows super-exponentially with \(n\): it is about \(4.2 \times 10^{18}\) for \(n=10\) and about \(3.2 \times 10^{22}\) for \(n=11\)29. Therefore, to make BN estimation feasible and efficient, it is essential to apply a variable selection procedure that excludes variables that do not contribute to model performance. In this study, an RF model is used to select significant variables and screen out unnecessary and redundant attributes. The details of the RF and BN models are provided in the following sections.
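The super-exponential growth of the structure space can be checked directly. A minimal sketch, assuming the count of interest is the number of labelled directed acyclic graphs on \(n\) nodes, computed with Robinson's recurrence; the function name `count_dags` is hypothetical:

```python
from math import comb


def count_dags(n: int) -> int:
    """Number of labelled DAGs on n nodes, via Robinson's recurrence."""
    a = [1]  # a[0] = 1: the empty graph
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            # k nodes with no incoming arcs, chosen in comb(m, k) ways; each of the
            # k*(m-k) possible arcs from them to the other m-k nodes is present or
            # absent; the alternating sign is inclusion-exclusion over k.
            total += (-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
        a.append(total)
    return a[n]


for n in (3, 5, 10, 11):
    print(n, f"{count_dags(n):.2e}")
```

The recurrence gives 25 structures for three variables and about \(4.2 \times 10^{18}\) and \(3.2 \times 10^{22}\) for ten and eleven variables, which is why exhaustive structure search is infeasible and a variable selection step is needed first.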

Random forest based variable selection

RF is an ensemble learning algorithm consisting of a set of tree-structured classifiers, each grown from independent, identically distributed random vectors30. For classification problems, the model outputs the class that receives the most votes across all trees. Because the number of trees can be set as large as needed, RF gives stable results. Moreover, compared with traditional variable selection methods (e.g., classification trees and stepwise regression), RF can handle multicollinearity among candidate variables and is robust against overfitting. RF has therefore been frequently used to rank variable importance and select significant variables in traffic safety studies31,32,33. Details of the RF algorithm can be found in Breiman34.

In this study, RF was used to identify the significant factors that strongly influence the accident severity of ride-hailing drivers. To this end, the importance of each variable for accident severity was calculated based on the \(Gini\) index:

$$Gini(t)=1 - \sum\limits_{{i=1}}^{m} {{p^2}(i|t)}$$
(1)

where \(Gini(t)\) is the \(Gini\) index of node \(t\), \(m\) is the number of classes at this node, and \(p(i|t)\) denotes the estimated probability of class \(i\) at node \(t\). For a splitting variable \({X_i}\), the importance of \({X_i}\) is the mean decrease in the \(Gini\) index produced at the nodes split on \({X_i}\), averaged over all trees in the forest.
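Eq. (1) and the per-node impurity decrease behind the importance measure can be sketched in plain Python. This is a minimal illustration with hypothetical function names (`gini`, `gini_decrease`); in the usual mean-decrease-Gini formulation, each node's decrease is additionally weighted by the number (or share) of training cases reaching that node, summed over the nodes split on \(X_i\), and averaged over trees, which this sketch does not do.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node, Eq. (1): 1 - sum_i p(i|t)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Drop in Gini index when `parent` is split into `left` and `right` (size-weighted)."""
    n = len(parent)
    weighted_children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted_children


node = ["none", "none", "damage", "injury"]  # made-up severity labels at one node
print(gini(node))                                                       # 0.625
print(gini_decrease(node, ["none", "none"], ["damage", "injury"]))      # 0.375
```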

After the importance ranking of all variables was determined, a stepwise procedure was followed to select the significant variables to be included in the BN model. Variables were dropped one by one while checking the out-of-bag (OOB) error rates of the RF models; the OOB error rate is calculated as the proportion of cases whose voted class differs from the true class, averaged over all cases in the OOB data.
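The OOB error rate and the stepwise selection can be sketched as follows. The names `oob_error_rate`, `select_top_k` and the `oob_error_for_k` callback are hypothetical. In a real run, the callback would refit an RF on the top-\(k\) ranked variables and return its OOB error (for example, 1 minus `oob_score_` of a scikit-learn `RandomForestClassifier` fitted with `oob_score=True`); here it is replaced by a made-up lookup table.

```python
from collections import Counter


def oob_error_rate(oob_votes, true_labels):
    """Share of cases whose majority out-of-bag vote differs from the true class.

    oob_votes[j] holds the predictions for case j made only by trees that did not
    see case j in their bootstrap sample; cases that were never out of bag are skipped.
    """
    wrong = counted = 0
    for votes, truth in zip(oob_votes, true_labels):
        if not votes:
            continue
        voted_class = Counter(votes).most_common(1)[0][0]  # ties: first-seen class wins
        wrong += voted_class != truth
        counted += 1
    return wrong / counted


def select_top_k(ranked_vars, oob_error_for_k):
    """Return the k top-ranked variables with the lowest OOB error (ties -> smaller k)."""
    errors = {k: oob_error_for_k(k) for k in range(1, len(ranked_vars) + 1)}
    best_k = min(errors, key=lambda k: (errors[k], k))
    return ranked_vars[:best_k], errors


# Toy usage with made-up numbers (not the values in Fig. 2b).
ranked = ["daily_working_hours", "speeding", "sleep_problems"]
fake_errors = {1: 0.31, 2: 0.27, 3: 0.27}
chosen, _ = select_top_k(ranked, fake_errors.get)
print(chosen)  # ['daily_working_hours', 'speeding']
```

Breaking ties toward the smaller \(k\) keeps the BN as small as possible when several subset sizes reach the same minimum error.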

Bayesian network

A BN model is a directed acyclic graph consisting of a set of nodes, arcs, and probabilities. The nodes represent variables and the arcs represent causal relationships between them. The directed arcs imply local conditional dependencies between parent and child nodes, and the probabilities quantitatively describe the causal relationships represented by the arcs from one node to another.

Based on conditional independence and the chain rule35, the network encodes the joint probability distribution of a set of random variables \(V=\left\{ {{X_1},{X_2},{X_3}, \cdots ,{X_n}} \right\}\) as:

$$P(V)=\prod\limits_{{i=1}}^{n} {P({X_i}|Parent({X_i}))}$$
(2)

where \(P(V)\) is the joint probability of V and \(Parent({X_i})\) represents the parent nodes of \({X_i}\).
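To make Eq. (2) concrete, the sketch below evaluates the factorized joint probability of a hypothetical three-node network (Sleep → Fatigue, and both → Severity). The structure, state names and all probabilities are made-up illustration values, not estimates from the survey.

```python
from itertools import product

# Hypothetical network: Sleep -> Fatigue, {Sleep, Fatigue} -> Severity.
parents = {"Sleep": (), "Fatigue": ("Sleep",), "Severity": ("Sleep", "Fatigue")}
states = {"Sleep": ("seldom", "often"), "Fatigue": ("no", "yes"),
          "Severity": ("none", "damage", "injury")}
# CPTs keyed by the tuple of parent states; every row sums to 1.
cpt = {
    "Sleep": {(): {"seldom": 0.7, "often": 0.3}},
    "Fatigue": {("seldom",): {"no": 0.6, "yes": 0.4},
                ("often",): {"no": 0.3, "yes": 0.7}},
    "Severity": {("seldom", "no"): {"none": 0.6, "damage": 0.3, "injury": 0.1},
                 ("seldom", "yes"): {"none": 0.45, "damage": 0.4, "injury": 0.15},
                 ("often", "no"): {"none": 0.5, "damage": 0.35, "injury": 0.15},
                 ("often", "yes"): {"none": 0.3, "damage": 0.45, "injury": 0.25}},
}


def joint(assignment):
    """Eq. (2): P(V) = product over nodes of P(X_i | Parent(X_i))."""
    p = 1.0
    for var, pa in parents.items():
        p *= cpt[var][tuple(assignment[q] for q in pa)][assignment[var]]
    return p


order = list(parents)
total = sum(joint(dict(zip(order, combo)))
            for combo in product(*(states[v] for v in order)))
print(round(total, 10))  # a valid factorization sums to 1 over all assignments
print(joint({"Sleep": "often", "Fatigue": "yes", "Severity": "injury"}))  # 0.3*0.7*0.25
```

Only 2 + 4 + 12 conditional entries are stored instead of a full 12-cell joint table with arbitrary dependencies; the saving grows quickly with the number of nodes.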

According to Bayes’ theorem, the BN model updates the prior probability of an event with new information, denoted \(E\), to yield the posterior probability. This can be represented by the following equation:

$$Pr(V|E)=\frac{{Pr(E|V)Pr(V)}}{{Pr(E)}}$$
(3)

where \(Pr(E|V)\) is the probability of event \(E\) conditional on the occurrence of event \(V\), and \(Pr(V)\) and \(Pr(E)\) are the marginal (unconditional) probabilities of events \(V\) and \(E\), respectively.
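A minimal numeric sketch of Eq. (3) for a three-level target, with \(Pr(E)\) obtained by summing \(Pr(E|v)Pr(v)\) over the target states. The function name `posterior` and all probabilities are hypothetical illustration values.

```python
def posterior(prior, likelihood):
    """Eq. (3): P(V|E) = P(E|V) P(V) / P(E), with P(E) = sum_v P(E|v) P(v)."""
    evidence = sum(likelihood[v] * prior[v] for v in prior)
    return {v: likelihood[v] * prior[v] / evidence for v in prior}


# Made-up numbers: prior over severity and P(E | severity) for some evidence E.
prior = {"none": 0.5, "damage": 0.35, "injury": 0.15}
likelihood = {"none": 0.2, "damage": 0.3, "injury": 0.5}
post = posterior(prior, likelihood)
print({k: round(v, 3) for k, v in post.items()})
```

Here \(Pr(E) = 0.28\), so the injury probability rises from 0.15 to about 0.268 once \(E\) is observed.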

Modeling process and sensitivity analysis

The BN modeling process includes two steps: structure learning and parameter learning. Generally, there are three ways to build a BN model: (1) relying on the knowledge of domain experts; (2) using data-driven algorithms; (3) combining (1) and (2)36. Because expert knowledge is subjective and limited, and structures learned purely by algorithms can be unrealistic, the third method was selected in this study. First, the PC algorithm, a popular constraint-based method, was used to learn the initial BN structure. This method infers the relationship between each pair of variables through conditional independence tests and uses a backward stepwise testing procedure to determine which edges are absent from the resulting graph37. Then, edges inconsistent with theory were deleted according to expert knowledge and previous literature. With the structure established, parameter learning was conducted by estimating the conditional probability tables (CPTs) from the dataset. This was accomplished with GeNIe Software 3.0 (BayesFusion, LLC; http://www.bayesfusion.com/), which computes maximum likelihood estimates using the Expectation-Maximization algorithm38.
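Assuming complete records (no missing answers), maximum likelihood estimation of a CPT reduces to relative frequencies within each parent configuration; EM converges to the same values in that case and matters mainly when some entries are missing. The sketch below shows only this complete-data case and does not reproduce GeNIe's implementation; `mle_cpt` and the records are hypothetical.

```python
from collections import Counter, defaultdict


def mle_cpt(records, child, parents):
    """Maximum-likelihood CPT P(child | parents) from complete records (relative frequencies)."""
    counts = defaultdict(Counter)
    for r in records:
        counts[tuple(r[p] for p in parents)][r[child]] += 1
    return {cfg: {state: c / sum(cnt.values()) for state, c in cnt.items()}
            for cfg, cnt in counts.items()}


records = [  # made-up survey rows
    {"Fatigue": "often", "Severity": "injury"},
    {"Fatigue": "often", "Severity": "damage"},
    {"Fatigue": "often", "Severity": "damage"},
    {"Fatigue": "seldom", "Severity": "none"},
    {"Fatigue": "seldom", "Severity": "none"},
    {"Fatigue": "seldom", "Severity": "damage"},
]
cpt = mle_cpt(records, "Severity", ["Fatigue"])
print(cpt)
```

States never observed under a parent configuration (here, "injury" with "seldom" fatigue) get no entry, i.e. probability zero; sparse parent-state combinations make such zero or unstable estimates more likely.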

Once the modeling process is completed, the prior probability of the target variable can be obtained. Sensitivity analysis calculates how the probabilities change as new evidence becomes available. Each time, the probability of one state of an explanatory variable was set to \(100\%\) while the other variables remained unchanged, and the resulting change in the probability of the target variable was calculated as:

$$\text{Sensitivity}=\frac{\text{Probability}(\text{set }100\%) - \text{Prior probability}}{\text{Prior probability}}$$
(4)
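A quick check of Eq. (4) against the worked example in the sensitivity analysis (priors of 46.6%, 37.2% and 16.2%, which become 31.9%, 40.3% and 27.8% after setting “sleep problems” to “often”). The sketch computes both the relative change defined by Eq. (4) and the plain difference in percentage points; the function names are hypothetical. The changes of 14.7, 3.1 and 11.6 quoted for that example match the absolute differences, whereas Eq. (4) as written gives about −31.5%, +8.3% and +71.6%.

```python
def sensitivity(posterior, prior):
    """Eq. (4): relative change in a target-state probability after setting evidence."""
    return (posterior - prior) / prior


def absolute_change(posterior, prior):
    """Plain difference in percentage points, for comparison with Eq. (4)."""
    return posterior - prior


prior = {"none": 46.6, "damage": 37.2, "injury": 16.2}        # % (from the text)
sleep_often = {"none": 31.9, "damage": 40.3, "injury": 27.8}  # % (from the text)
for level in prior:
    rel = sensitivity(sleep_often[level], prior[level])
    pp = absolute_change(sleep_often[level], prior[level])
    print(f"{level:>6}: Eq. (4) {rel:+.1%}, absolute {pp:+.1f} pp")
```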

Results

Descriptive statistics

Table 1 presents sample statistics on the demographic characteristics, working conditions and fatigue of the respondents. Of the 1,356 valid responses, more than half reported accident records: 43.9% of the drivers had been involved in property damage accidents and 8.5% in injury accidents. The proportion of male drivers was nearly twice that of female drivers. The imbalance among the “No accidents”, “Property damage” and “Injury” categories may affect the performance of our model; however, it broadly reflects real-world conditions, in which serious injuries are relatively rare. Maintaining this realism is likely to enhance the transferability of our findings. Over 90% of the surveyed drivers were under the age of 40, and the vast majority (86.7%) had received higher education. The data suggest that these ride-hailing drivers were mostly young, male and well educated, yet they reported a high accident rate. This may be because younger and more educated drivers have less driving experience and poorer judgment of road conditions.

Table 1 Characteristics of ride-hailing taxi drivers (n = 1356).

The results also paint a gloomy picture of ride-hailing drivers’ working conditions. Of the respondents, 63.1% worked as ride-hailing drivers part-time. Notably, although 93.5% of the drivers stated that their continuous driving time was less than four hours, 43.4% reported feeling fatigued while driving and 33.5% had sleep problems. Ride-hailing drivers bear a heavy workload but do not earn much: 46.4% reported a daily income of less than 200 yuan. Despite the low income of many drivers, ride-hailing companies still charged a high management fee, with a commission rate greater than 15% on each order. Relatedly, only 11.7% of the drivers were satisfied with their daily income.

Table 2 shows the frequency of risky driving behaviors among ride-hailing drivers. As expected, phone-related behaviors were quite frequent: using a phone to navigate while driving (M = 3.91 ± 1.04) was the most frequent behavior, followed by using a phone to talk (M = 2.79 ± 1.21) and using apps on the phone (M = 2.03 ± 0.91). Among the other nine risky behaviors, the three most frequent were honking at other drivers, driving with one hand and failure to use an indicator lamp.

Table 2 Distribution of risky driving behaviors while driving (%).

RF results and important variable selection

Initially, the 23 explanatory variables listed in Fig. 2a were input to the RF model of accident severity. To obtain stable results, different numbers of trees were tried; 500 trees proved sufficient in this study. The number of variables randomly sampled as candidates at each split was set to the square root of the number of explanatory variables (\(\sqrt {23} \approx 5\)), as recommended in the literature39.

For the RF model with all explanatory variables, the OOB error rate was 28.6%, and the importance ranking of the variables based on their mean decrease Gini (MDG) values is shown in Fig. 2a. The driver’s daily working hours, with an MDG value of 69.6, ranked first and thus had the greatest impact on the classification of accident severity.

The RF model was then refitted repeatedly, with the number of input variables increased from 1 to 23 in order of importance; the resulting OOB error rates are shown in Fig. 2b. The OOB error rate reached its minimum of 0.27 when 16 variables were included, indicating that the seven least important variables could be removed from the model. Therefore, the top 16 variables ranked by the RF model were used to construct the BN model.

Fig. 2

Selection of factors predicting accident severity by RF. (a) Variables ranked by MDG. (b) Variable selection by OOB error.

Bayesian network modeling results

Model performance

To validate the model, 80% of the 1,356 records were randomly selected as a training set and the remaining 20% as a test set. The original five-level categorization resulted in small sample sizes (< 5%) for some combinations of states, which undermined the stability of the conditional probability tables. Therefore, the five-point Likert scale measurements were reduced to three levels40: “never” and “seldom” were merged into “seldom”, “often” and “always” were merged into “often”, and the intermediate value “sometimes” remained unaltered.
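The recoding and the random 80/20 split can be sketched as follows. The mapping reproduces the merge rule described above; the names `COLLAPSE`, `collapse_likert` and `split_80_20` and the fixed seed are hypothetical, and the actual split procedure used in the study is not specified beyond “randomly selected”.

```python
import random

# Five-point answers collapsed to the three levels used in the BN.
COLLAPSE = {
    "never": "seldom",
    "seldom": "seldom",
    "sometimes": "sometimes",
    "often": "often",
    "always": "often",
}


def collapse_likert(response: str) -> str:
    """Map a five-point Likert answer to the three-level scale."""
    return COLLAPSE[response.strip().lower()]


def split_80_20(records, train_share=0.8, seed=42):
    """Shuffle a copy of the records with a fixed seed and cut it into train/test parts."""
    rows = list(records)
    random.Random(seed).shuffle(rows)
    cut = round(train_share * len(rows))
    return rows[:cut], rows[cut:]


train, test = split_80_20(range(1356))
print(len(train), len(test))  # 1085 271
```

With 1,356 records, 80% is 1,084.8, so the cut has to be rounded; this sketch rounds to 1,085 training and 271 test records.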

Table 3 presents the confusion matrices of the BN model predictions for both the training and testing datasets. As shown in Table 3, the prediction accuracy rates for “no accidents”, “property damage” and “injury” in the testing dataset were 90.3%, 60.0%, and 60.7%, respectively. The overall accuracy of the trained BN model on the testing dataset was 74.8%, indicating good predictive performance.
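A minimal sketch of how per-class and overall accuracy follow from a confusion matrix, assuming the per-class rates are computed row-wise (the share of each actual class that is predicted correctly). The matrix below is hypothetical and is not Table 3; `accuracy_rates` is an illustrative name.

```python
def accuracy_rates(matrix, labels):
    """Per-class rate (correct / actual cases of that class) and overall accuracy.

    matrix[i][j] = number of cases whose actual class is labels[i]
    and whose predicted class is labels[j].
    """
    per_class = {lab: row[i] / sum(row) for i, (lab, row) in enumerate(zip(labels, matrix))}
    correct = sum(matrix[i][i] for i in range(len(labels)))
    total = sum(sum(row) for row in matrix)
    return per_class, correct / total


labels = ["no accidents", "property damage", "injury"]
matrix = [[90, 8, 2],    # made-up counts
          [20, 60, 20],
          [5, 6, 17]]
per_class, overall = accuracy_rates(matrix, labels)
print({k: round(v, 3) for k, v in per_class.items()}, round(overall, 3))
```

Overall accuracy weights each class by its number of cases, so it always lies between the lowest and highest per-class rates.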

Table 3 BN model classification confusion matrix for the training and testing dataset.

The Receiver Operating Characteristic (ROC) curve was also used to evaluate the overall performance of the BN model. The area under the ROC curve (AUC) is a quantitative index of overall classification performance; an AUC above 0.7 typically indicates good predictive performance31. Figure 3 shows the ROC curves and AUCs achieved by the BN model. The AUCs for the “no accidents”, “property damage” and “injury” classes in the testing dataset were 0.85, 0.86, and 0.84, respectively, further supporting the validity of the model.
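Each class-specific AUC can be read as a one-vs-rest quantity: the probability that a randomly chosen case of that class receives a higher score than a randomly chosen case of any other class, with ties counting one half. A minimal sketch, assuming the score is the BN posterior probability of the class for each test record; `auc_one_vs_rest` and the data are hypothetical.

```python
def auc_one_vs_rest(scores, labels, positive):
    """AUC of `positive` vs. all other classes via pairwise comparisons (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up posterior probabilities of "injury" for four test records.
scores = [0.9, 0.8, 0.4, 0.3]
labels = ["injury", "none", "injury", "none"]
print(auc_one_vs_rest(scores, labels, "injury"))  # 0.75
```

This pairwise form equals the area under the empirical ROC curve; it is quadratic in the sample size, which is acceptable for a test set of a few hundred records.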

Fig. 3

The ROC curve for three levels of accident severity (testing data). (a) No accidents (AUC = 0.85). (b) Property damage (AUC = 0.86). (c) Injury (AUC = 0.84).

Relationship identification

Using the 16 explanatory variables identified by RF and the target variable (i.e., accident severity), a BN model with 17 nodes was obtained by combining the PC algorithm with expert knowledge, as shown in Fig. 4. All input variables are related to accident severity, directly or indirectly. Nine of the explanatory variables were directly related to accident severity: fatigue feeling, sleep problems, using apps on the phone, using a phone to navigate, using a phone to talk, honking at other drivers, driving with one hand, parking in forbidden areas and speeding. These variables were also correlated with each other. The remaining seven explanatory variables were indirectly related to accident severity. For instance, long daily working hours was one of the causes of sleep problems, whose frequency in turn directly affected accident severity.

Fig. 4

The BN model learned by the PC algorithm.

Sensitivity analysis

With the BN model, inferences can be made by updating the probabilities of the different states of each node. The compiled BN model is shown in Fig. 5. Before any evidence was entered, the prior probabilities of “no accidents”, “property damage” and “injury” were 46.6%, 37.2% and 16.2%, respectively. Since the 16 explanatory variables have a total of 49 categories, 49 states required sensitivity analysis. For instance, when the state “often” of the variable “sleep problems” was set to 100%, the predicted probabilities of “no accidents”, “property damage” and “injury” changed to 31.9%, 40.3% and 27.8%, respectively. In this case, the probabilities of the three levels of accident severity changed by 14.7, 3.1 and 11.6 percentage points, respectively (a decrease for “no accidents” and increases for the other two levels).

Fig. 5

The compiled BN model for ride-hailing taxi accident severity.

Table 4 presents the sensitivities of each state of the variables affecting accident severity. The causal factors differ across the three severity levels. For the “no accidents” level, often feeling fatigued while driving (15.9%), often experiencing sleep problems (14.7%), often speeding (14.4%), and often or sometimes parking in forbidden areas (14.1%; 14.0%) were the top five states reducing the probability of not being involved in an accident. For the “property damage” level, often or sometimes feeling fatigued while driving (4.3%; 3.6%), often using a phone to talk (3.3%), often experiencing sleep problems (3.1%), and working over 10 h daily (3.1%) were the top five states increasing the probability of property damage accidents. For the “injury” level, often or sometimes speeding (17.1%; 13.9%), often or sometimes parking in forbidden areas (15.4%; 15.2%), and often using apps on the phone (13.4%) were the top five states increasing the probability of injury accidents.

Table 4 Sensitivity analysis of different variable state for three accident severity (%).

Discussion

While many previous studies have explored the risk factors of traffic accidents for different types of professional drivers, this study focused on an emerging occupation, ride-hailing drivers, to investigate the influence of risk factors on different levels of accident severity. Overall, the results reveal that both part-time and full-time ride-hailing drivers are subject to heavy workloads and low income, which are potential triggers for more risky driving behaviors and thus a higher risk of accidents. The workload findings portray an even more worrisome picture than for similar professions such as taxi drivers12, because most ride-hailing drivers hold multiple jobs and work extremely long hours1.

The results also suggest that the main causes differ across the three levels of accident severity, especially between property damage accidents and injury accidents. Factors related to driver fatigue were strongly associated with property damage accidents, whereas risky driving behaviors such as speeding, parking in prohibited areas, and driving with one hand had a greater impact on injury accidents, and mobile phone use had a great impact on both types of accidents. Similar discrepancies were also found in previous studies on accident severity29,41, although their study populations and explanatory variables differed from those of the present study. The results also indicated that reducing the frequency of risky behaviors, managing driver fatigue and improving working conditions could increase the probability of no accidents to different degrees. These findings clarify which factors should be prioritized for different levels of accident severity and shed light on how to efficiently improve the safety performance of ride-hailing drivers.

In addition, although previous studies have demonstrated the relationship between risky driving behaviors and accident involvement, the present study adds to the literature by finding that phone-related behaviors, including using apps on the phone, using the phone for navigating, and talking on the phone, are directly related to accident severity. Moreover, the more frequently drivers use their phones, the higher the probability of causing a serious accident. Indeed, using a phone while driving can seriously distract drivers and weaken their ability to perceive and react to changes in a complex traffic environment. Under such circumstances, drivers have less control over their vehicles and tend to miss important road safety information, making them more prone to accidents, especially in unexpected situations or emergencies. Most importantly, this study found that ride-hailing drivers have become somewhat dependent on mobile phones. In addition to processing orders on ride-hailing apps, almost all ride-hailing drivers rely on navigation aids while driving. One possible explanation is that this group includes a large number of part-time workers. These people may not be professional drivers, and some are even novice drivers who are unfamiliar with most road conditions in the city. Another possible reason is that navigation facilitates advance route planning, which saves travel time and thus allows drivers to complete more orders.

This investigation also clarifies the correlations among phone-related behaviors, driver fatigue, and other risky driving behaviors. The results indicate that drivers who frequently use apps on the phone or use the phone for talking are more likely to drive with one hand or honk at other drivers. Using phone functions generally requires handheld or one-handed operation, so phone-related behaviors while driving are likely to trigger more frequent one-handed driving. Besides, when drivers receive orders via their phone, they need to reach the passenger’s location as quickly as possible. At such times, drivers may become more aggressive and more likely to honk at slower drivers, especially in traffic jams. Meanwhile, the frequency of phone use while driving reflects the busy working situation of ride-hailing drivers. As noted by Peng et al.12, the desire to make more money drives drivers to take on heavier workloads, which further exposes them to more severe physical and mental stress. It is therefore not surprising that phone use is also closely linked to driver fatigue and sleep problems, not to mention that mobile phone use while driving complicates driving tasks and causes drivers to expend energy on other, non-work-related matters.

Additionally, the results of this study support the view that driver fatigue and sleep problems influence accident severity both directly and indirectly. On the one hand, driving while fatigued leads to impaired driving skills, lack of concentration, and slower response times, which have been shown to be highly correlated with higher accident severity14. Meanwhile, serious sleep problems not only exacerbate driver fatigue but also reduce memory and alertness, causing drivers to make more mistakes while driving42. On the other hand, this study also found that driver fatigue and sleep problems are directly linked to more frequent risky driving behaviors, such as speeding, parking in forbidden areas, and failure to use an indicator lamp, all of which have been considered important precursors of traffic accidents in previous studies11,15,17.

Regarding demographic factors, only age was found to have an indirect effect on accident severity, through its influence on drivers’ use of mobile navigation. As mentioned above, ride-hailing drivers use mobile navigation very frequently, and this finding further suggests that younger drivers rely more on navigation to guide their driving. Since ride-hailing is a profession dominated by young drivers, and the industry has inevitably taken on some drivers with less driving experience and poorer judgment of road conditions, interventions should fully account for the particular characteristics of this group of drivers.

As for working conditions, these factors affect accident severity in several indirect ways. For example, drivers with long daily working hours tend to drive for longer continuous periods and are more likely to feel fatigued or fall asleep while driving. Drivers who are dissatisfied with their income or subject to a high commission ratio tend to use their phones more frequently while driving because they do not want to miss any orders. Consistent with previous studies13,18, this study also confirmed that driver fatigue, sleep problems, and phone-related behaviors are important precursors of traffic accidents. Hence, working under adverse conditions makes ride-hailing drivers more vulnerable to accidents. Moreover, the findings showed that drivers who worked 5 to 8 h or more than 10 h daily have a higher probability of causing serious accidents. Because the income of ride-hailing drivers depends largely on their working hours, both part-time and full-time drivers appear to carry heavy workloads, as they tend to work longer hours to increase their income. Notably, when working similar hours, full-time drivers are more likely than part-time drivers to cause accidents. One possible reason is that ride-hailing drivers are often hired as “independent contractors,” so they are not screened for medical problems that might reduce alertness, such as obstructive sleep apnea1. Another possible explanation is that full-time drivers accumulate longer exposure to the road, and hence to accident risk factors43.

Conclusions

This study identified risk factors for accidents involving ride-hailing drivers and investigated how demographics, working conditions, driver fatigue, and risky driving behaviors relate to accident severity among Chinese ride-hailing drivers. An online self-reported survey was conducted from October 10 to November 13, 2020, and a total of 1,356 valid questionnaires were collected. A two-step method was applied: first, a random forest (RF) was used to rank the importance of factors predicting accident severity, and then a Bayesian network (BN) model was employed to establish the relationships between accident severity and the significant factors. Based on the out-of-bag (OOB) error rate, the RF results showed that the seven least important variables could be eliminated. The 16 retained significant factors were then used to construct the BN model, which indicated that nine variables were directly related to accident severity: fatigue feeling, sleep problems, using apps on the phone, using the phone to navigate, using the phone to talk, honking at other drivers, driving with one hand, parking in forbidden areas, and speeding. The remaining demographic and working-condition variables influence accident severity indirectly. Finally, a sensitivity analysis explored the main contributors to each level of accident severity.
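The RF variable-selection step summarized above can be sketched as a backward elimination along a fixed importance ranking: drop the k least important variables, refit, record the OOB error, and keep the subset with the lowest error. This is a minimal sketch under stated assumptions, not the authors' code. `select_by_oob` is a hypothetical helper, and `oob_error` is assumed to be a user-supplied function that refits a random forest on a feature subset and returns its OOB error (with scikit-learn, for instance, one could fit `RandomForestClassifier(oob_score=True)` and return `1 - oob_score_`). The toy error values in the usage lines are invented and are not results from this study.

```python
from typing import Callable, Sequence


def select_by_oob(
    ranked: Sequence[str],
    oob_error: Callable[[list[str]], float],
    min_keep: int = 1,
) -> tuple[list[str], dict[int, float]]:
    """Hypothetical helper: backward elimination along a fixed importance ranking.

    `ranked` lists features from most to least important. For each k, the k
    least important features are dropped and `oob_error` is evaluated on the
    rest. The subset with the lowest OOB error wins; ties go to the smaller
    subset (more features dropped).
    """
    if not 1 <= min_keep <= len(ranked):
        raise ValueError("min_keep must be between 1 and len(ranked)")
    errors: dict[int, float] = {}
    for k in range(len(ranked) - min_keep + 1):
        kept = list(ranked[: len(ranked) - k])
        errors[k] = oob_error(kept)
    # Lowest error first; on ties, prefer the larger k (more parsimonious model).
    best_k = min(errors, key=lambda k: (errors[k], -k))
    return list(ranked[: len(ranked) - best_k]), errors


# Usage with a made-up OOB error curve indexed by subset size (illustration only).
toy_error_by_size = {5: 0.22, 4: 0.20, 3: 0.19, 2: 0.21, 1: 0.25}
kept, errs = select_by_oob(
    ["f1", "f2", "f3", "f4", "f5"], lambda fs: toy_error_by_size[len(fs)]
)
print(kept)  # ['f1', 'f2', 'f3']
```

Keeping the ranking fixed avoids recomputing importances at every step; re-ranking after each removal is a common alternative. Because OOB error varies with the random seed, fixing the seed or averaging several fits makes the comparison between subsets more stable.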

The findings have several practical implications. To reduce serious accidents, managing ride-hailing drivers' risky driving behavior should be prioritized. Possible measures include strengthening the management and assessment of new drivers, appropriately intensifying the monitoring and punishment of risky driving behaviors, and creating incentives that reward drivers who maintain good driving behavior over a designated period. Because the phone is an important means of accepting online orders, phone-related risky behaviors call for more targeted countermeasures, such as graded penalties ranging from no penalty to license revocation depending on the possible consequences of the phone use, and more voice-operated functions in ride-hailing applications. Measures are also needed to reduce driver fatigue. Ride-hailing companies should be responsible for assessing the number of part-time workers, the shifts drivers typically work, and the frequency of drivers' fatigue and sleep problems, and for advocating hours-of-service regulations and appropriate work-rest schedules. A more rational order-allocation scheme should also be adopted so that drivers can schedule shifts and rest more effectively. Importantly, the findings indicate that higher income could reduce drivers' working hours and reliance on phone use, which would in turn alleviate driver fatigue. Achieving this goal requires government financial support for the industry, allowing ride-hailing companies to lower the commission ratio of each order and improve service quality to attract more customers.

A few limitations remain. First, although the study acknowledges the geographic distribution of respondents, it does not explicitly address subtle regional differences in driving conditions, regulations, and accident rates across cities in China. Future comparative analyses will examine how these factors influence accidents among ride-hailing drivers in different regions. Second, categorizing crashes solely by their highest severity is a limitation, as it overlooks the frequency and cumulative effects of multiple minor crashes. This approach may obscure important behavioral differences among drivers, particularly between those with several minor crashes and those with no crashes or a single severe one. Future research should investigate the interplay between accident frequency and severity to better understand accident risk.