Abstract
Service issues and the limitations of generic measures (scales, instruments) call for well-developed customized measures to evaluate service quality. Existing approaches have two methodological limitations: measure items and dimensions are derived from researchers’ own thoughts and second-hand knowledge (existing literature and materials), and measure validity (e.g., measurement invariance) receives limited consideration. To address them, this study proposes a generic approach for customized measure development. It first employs grounded theory to preliminarily derive measure items and dimensions from first-hand knowledge of customers’ thoughts (customers’ experiences and perceptions expressed through focus group discussions), then combines exploratory factor analysis and confirmatory factor analysis to purify and assess the measure, and finally conducts multi-group confirmatory factor analysis to test measurement invariance across different user cohorts. The approach was applied to the less researched ride-hailing industry. Data were collected from 1464 ride-hailing users in Suzhou, China. The results suggested that a 3-dimension 12-item model represented by Service, Integrity and Efficiency is a meaningful and valid measure. This multidimensional model can be viewed as a more parsimonious alternative to using a vast set of measure items individually in analyzing ride-hailing service quality, and it may provide operators with managerial implications for improving the service.
Introduction
As a new app-based door-to-door transport mode, ride-hailing has exploded in popularity around the world. In China, for instance, the numbers of certificated platforms, drivers, vehicles, and users had reached 337, 6.57 million, 2.79 million, and 528 million, respectively, by December 2023, with 894 million trips completed in the single month of December. However, many service issues lie behind this growth. For instance, users in Shanghai reported multifaceted grievances in the first quarter of 2023: (a) procedural failures (e.g., drivers abruptly terminating trips before destinations), (b) ethical breaches (e.g., big data-enabled price discrimination exploiting frequent users), (c) safety compromises (e.g., dangerous driving patterns), and (d) economic exploitation (e.g., unjustified detours and surge pricing). There is ample scope to improve ride-hailing services, but strategies should be informed by in-depth knowledge of users’ needs from the service quality perspective. Therefore, how to measure, evaluate, and ultimately improve ride-hailing service quality so as to sustain its competitive edge has become a development priority for this industry.
Berry et al. (1990) stated that “customers are the sole judges of service quality”. Service quality hinges on customers’ perceptions of each specific attribute characterizing the service (De Oña et al., 2013). How to develop a scale (measure, instrument) embracing appropriate attributes (attribute-specific items, indicators) to measure service quality from the customer’s perspective has been extensively explored. It is generally recognized that service quality is multidimensional (Parasuraman et al., 1988). SERVQUAL (Parasuraman et al., 1988), SERVPERF (Cronin and Taylor, 1992), and E-S-QUAL (Parasuraman et al., 2005) are the three multidimensional generic measures most commonly used and replicated in the literature. However, doubts have long been raised about using these scales and their simple adaptations across a broad spectrum of services, as they cannot cover the unique features of a particular service. For example, these generic scales lack constructs (dimensions, factors) or items to capture ethical breaches such as big data-enabled price discrimination, which is unique to ride-hailing services and a common grievance among users. As a consequence, research emphasis has shifted from adapting generic measures to developing customized measures for specific industry settings. Customized measures have attracted great interest from practitioners and researchers in traditional transportation industries such as air transport (e.g., Bezerra and Gomes, 2016), railway transport (e.g., Nathanail, 2008), and especially public transport (bus and urban rail) (e.g., Wen et al., 2005; Lai and Chen, 2011; De Oña et al., 2013; Yaya et al., 2015; Soltanpour et al., 2018; Zhang et al., 2019). By contrast, customized measures for the newly emerging paratransit industry of ride-hailing have received very limited research attention, and the few that exist are confined to specific regions and cannot be applied to other geographical contexts.
Therefore, this study, as the first step of an extensive research project aiming at improving ride-hailing services and attracting more users, places its emphasis on scale development and validation. Its objectives and contributions are twofold. First, it customizes a multidimensional service quality measure for the less researched ride-hailing industry. Second, and more importantly, it proposes a generic approach (for the development and validation of a multidimensional scale) that can be readily and effectively transplanted to any service setting (any industry or region). To accomplish these objectives, this study employs a mixed-methods design combining qualitative and quantitative analysis. First, grounded theory is applied to derive service quality dimensions and items directly from user narratives through focus group discussions. These dimensions and items then undergo rigorous statistical validation via exploratory and confirmatory factor analyses. Finally, multi-group invariance testing checks the measure’s robustness across diverse user cohorts.
This study proceeds as follows. Section “Literature review” presents the literature review. It is followed by a description of our research methods and data in Section “Methods and data”. Sections “Results” and “Discussion” present the results and a discussion, respectively. The paper ends with conclusions and a discussion of future work in Section “Conclusions and future work”.
Literature review
A comprehensive review of existing literature on ride-hailing service quality reveals a notable focus on exploring relationships with constructs like satisfaction and loyalty via structural equation modeling (SEM), while neglecting the systematic development and validation of measurement scales (Nguyen-Phuoc et al., 2020; Su et al., 2021; Akram et al., 2024; Ricardianto et al., 2024; Katili et al., 2024). Ride-hailing represents a key form of paratransit, which can take various forms (e.g., jeepneys, jitneys, samlors, taxis, etc.) besides ride-hailing. Therefore, recent studies on customized service quality measures for paratransit were extensively collected, studied and summarized in Table 1.
Contextual variability in service quality dimensions and items
As shown in Table 1, although all these measures are for paratransit and most have considered the multidimensional nature of service quality, dimensions and items vary markedly across service types (jitney, jeepney, taxi, ride-hailing, etc.). Even within the same paratransit service type (e.g., traditional taxis), measures differ significantly across regions. These differences reflect both universal priorities and context-specific user needs. For traditional taxis, items concerning drivers’ attire and etiquette, facility cleanliness and conditions, waiting time, and journey time appear consistently across regions, underscoring their foundational role in service quality. However, exclusive items have also been derived for specific regional contexts such as Doha (Shaaban and Kim, 2016), Santander (Alonso et al., 2018), Hong Kong (Wong and Szeto, 2018), and Melbourne (Rose and Hensher, 2018). These disparities align with the assertion of De Oña and De Oña (2015) that the service aspects appreciated by users are highly dependent on their geographical area. Therefore, it is crucial to develop service quality measures that are pertinent to the service type and regional context.
Ride-hailing: the hybrid challenge of physical and digital touchpoints
Ride-hailing, as a fusion of traditional taxi services and mobile technology, introduces unique measurement requirements (Shah, 2020). Studies commonly replicate taxi-related items (e.g., driving security, waiting time). However, dimensions and items concerning app use have been considered by Nguyen-Phuoc et al. (2020), Shah (2020), Nguyen-Phuoc et al. (2021), Li et al. (2022), Kumar et al. (2022) and Boar et al. (2023), but not by Su et al. (2021), Shah and Hisashi (2022), Vega-Gonzalo et al. (2023) or Wang et al. (2023). This inconsistency highlights a critical gap in capturing the full spectrum of ride-hailing service quality, for which both offline (driver behavior, vehicle condition) and online (app functionality, booking reliability) interactions are essential.
Neglect of measurement invariance and subgroup differences
Past practices have given limited consideration to measure validity. With only a few exceptions (e.g., Rose and Hensher, 2018), most studies have tested reliability and validity to some extent, usually by exploratory factor analysis (EFA), confirmatory factor analysis (CFA), or both. However, researchers seldom consider measurement invariance, that is, whether the factor structure and parameter estimates of a measurement model are statistically the same across different user groups. As the service aspects appreciated by users depend heavily on their sociodemographic and travel characteristics (e.g., travel reason) (De Oña and De Oña, 2015), failing to test invariance limits the generalizability of findings across cohorts. Therefore, it is imperative to test measurement invariance across cohorts defined by gender, income, trip purpose, etc.
Overreliance on expert-driven design and second-hand knowledge
Methodologically, most studies (Sumaedi et al., 2012; Rose and Hensher, 2018; Wong and Szeto, 2018; Nguyen-Phuoc et al., 2020; Shah, 2020; Nguyen-Phuoc et al., 2021; Su et al., 2021; Kumar et al., 2022; Halakoo et al., 2022; Li et al., 2022; Vega-Gonzalo et al., 2023; Wang et al., 2023; Boar et al., 2023; Ricardianto et al., 2024) followed researchers’ or experts’ own judgments in selecting and adapting dimensions and items from the existing literature and materials to construct a scale. Even the studies that incorporated the judgments of a small number of users through a pilot survey (e.g., Shah, 2020; Askari et al., 2021; Kumar et al., 2022) remained anchored in the existing literature and materials. Participants’ perspectives on the social phenomenon of interest should unfold as participants view it, not as researchers view it (Marshall and Roseman, 1989). Expert-driven design goes against the philosophy that “customers are the sole judges of service quality”, because the dimensions and items chosen are those considered important by researchers or experts rather than by users or customers. More importantly, the existing literature and materials are second-hand knowledge that cannot authentically and comprehensively represent the service aspects appreciated by users in a specific context. Scales grounded in expert-driven design and second-hand knowledge may introduce conceptual risks: misinterpreting customer perceptions, overlooking context-specific needs, and weakening the theoretical validity of measurement tools. Such limitations underscore the need for methodologies that center user-generated insights so that scales authentically reflect the service aspects that users themselves deem critical.
The current study seeks to fill these gaps by holding focus groups to collect first-hand knowledge of users’ experiences and perceptions on ride-hailing from which measure dimensions and items will then be extracted via grounded theory coding techniques, and by testing measurement invariance across a wide range of variables via multi-group confirmatory factor analysis (MGCFA) when validating the measure. It takes Suzhou, a prefecture-level city of Jiangsu Province in eastern China, as an example. Our selection of Suzhou serves dual purposes: 1) validating the operational feasibility of the generic approach through concrete implementation, and 2) demonstrating its effectiveness in generating customized measurement tools. The city exhibits unique demographic and economic characteristics that differ from other cities (e.g., 65.2% of its population being immigrants and a GDP ranking 6th nationally). This may lead to divergence in service quality indicators prioritized by ride-hailing users in Suzhou and in other cities, necessitating a tailored measure rather than using the generic measures for Suzhou. To our knowledge, it’s the first customized measure research for the specific context of ride-hailing in Suzhou.
Methods and data
Dimensions and items generation
Intangibility, production-consumption inseparability, and heterogeneity make measuring service quality very complex (Yaya et al., 2015). For such a complex issue, grounded theory which allows the natural emergence of theory from data via systematic coding procedures is an effective inductive tool. It derives dimensions and items directly from user narratives rather than drawing and retrofitting from existing literatures and materials, which can accurately grasp user-articulated pain points like ethical breaches (e.g., big data-enabled price discrimination exploiting frequent users). This perfectly aligns with the core need to identify unknown service quality dimensions and items from unstructured consumer experiences when developing scales. Besides, the constant comparison method transforms fragmented user feedback into a hierarchical conceptual system (i.e., main theme → themes (dimensions) → categories (indicators)), directly supporting the design of a hierarchical indicator framework for scales. In contrast, alternative methodologies exhibit critical limitations. Ethnography requires long-term immersion to observe the holistic culture of ride-hailing systems (e.g., driver communities, platform rules), producing thick descriptions rather than operationalizable indicators, which are difficult to translate into measurable scale items. Phenomenology focuses on revealing the essence of subjective experiences, but scales require cross-group stable measurable variables. Overemphasis on individual perceptions may compromise scale generalizability across different subgroups. The method chosen to generate measure dimensions and items here favors the original grounded theory proposed by Glaser and Strauss (1967). It lays emphasis on the bottom-up extraction of themes without using the predetermined frameworks favored by Strauss and Corbin (1990), which follows certain data collection and data analysis strategies.
Data collection
In-depth interviews and focus groups are the most commonly used qualitative data collection methods (Mars et al., 2016). Focus groups allow participants to respond to each other’s comments, helping to generate new concepts. Considering this merit and the efficiency of data collection, multiple rounds of focus groups were held to collect first-hand knowledge of users’ experiences and perceptions based on their day-to-day ride-hailing practices. A recruitment notice was posted to residents in Suzhou via the WeChat app in November 2022. People who saw the notice and were interested in participating could get in touch and report their gender, age, ride-hailing adoption, and usage frequency. Those who had not used ride-hailing in the previous three months were excluded so that all recruited participants could vividly remember their ride-hailing experiences and perceptions in Suzhou. Following Morgan’s (1992) suggestions of 3–5 focus groups per study and 6–10 participants per group, stratified random sampling was applied to engage 32 users in each round of four focus group discussions (Table 2). We created age-homogeneous groups so that each was relatively similar in composition, while gender and usage frequency within each group were kept heterogeneous to elicit diverse voices and views. Each recruited participant signed an informed consent form and was rewarded with a ¥150 gift card for attendance. Four topics were introduced in turn in each group to prompt the discussion: (a) instances of and reasons for satisfaction/dissatisfaction when using ride-hailing services in Suzhou; (b) descriptions of an ideal ride-hailing service; (c) factors important in evaluating ride-hailing service quality; (d) performance expectations concerning ride-hailing services. The focus group discussions were held on December 10, 2022 (Saturday); each lasted nearly two hours and was digitally recorded (audio) and transcribed verbatim.
Data analysis
Kurniawan et al. (2018) pointed out that “respondents’ verbal expressions which signify events, actions, reactions, beliefs, values, attitudes, aspirations, deliberations, concerns, experiences and feelings are instances of code-worthy data”. These expressions in the original transcripts were analyzed through substantive and theoretical coding procedures following the guidelines provided by Glaser and Strauss (1967) to uncover key concepts, categories, themes, and their interrelationships. During the substantive coding stage, the code-worthy data were coded with straightforward terms, which were then refined into concepts and categories through constant comparison. Constant comparison is a defining characteristic of grounded theory: codes identified in the data are constantly compared with previously identified codes, thereby revealing patterns in the data and allowing new concepts and categories to emerge where statements do not fit any of the existing ones (Glaser, 1992). Once the categories had been developed, theoretical coding began. Again using constant comparison, relations were sought between categories, allowing them to be merged and combined into superordinate themes that eventually developed into the main theme. Coding ended after 7 rounds of focus group discussions, when saturation was reached, that is, when new data only supported previously identified concepts and did not introduce any new ones. To ensure coding agreement, the whole process was performed by two different teams. Intercoder reliability was assessed using Cohen’s Kappa, with discrepancies discussed and resolved through iterative consensus meetings. Examples of code-worthy sentences and their analysis results are shown in Table S2 in the supplementary information file. The analysis ultimately yielded 103 concepts that were sorted into 12 categories and further refined into 5 themes.
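As an illustration of the intercoder reliability check, the following Python sketch computes Cohen’s Kappa for two coding teams. The code labels and segment assignments are hypothetical and are not taken from the study’s transcripts; the function name `cohens_kappa` is likewise an assumption made for this example.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same segments."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of segments")
    n = len(labels_a)
    # Observed agreement: share of segments given the same code by both teams.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two teams' marginal code frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both teams used one identical code throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical codes assigned by two teams to six transcript segments.
team_1 = ["detour", "detour", "price", "price", "safety", "safety"]
team_2 = ["detour", "detour", "price", "safety", "safety", "safety"]
kappa = cohens_kappa(team_1, team_2)
print(f"Cohen's kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over a simple percentage of matching codes when some codes are used much more often than others.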
Therefore, a 5-dimension 12-item scale was preliminarily developed (Fig. 1). It is essentially a hypothesis (this 5-dimension 12-item scale can effectively measure the ride-hailing service quality in Suzhou) that needs to be empirically tested with large samples. The next step is to perform a questionnaire survey and test this hypothesis based on collected samples.
Questionnaire design and data collection
The questionnaire was designed on “Wenjuanxing”, a professional online survey platform in China. At the very beginning, a screening question excluded respondents who had not used ride-hailing in Suzhou in the previous three months. The 12 items derived in Section “Dimensions and items generation” were recast into scale statements. A five-point Likert scale ranging from “strongly disagree” (1) to “strongly agree” (5) was adopted to elicit users’ perceptions of these attributes. In addition, the survey contained questions about a range of relevant sociodemographic and travel characteristics, including gender, age, occupation, monthly household income, private car ownership, ride-hailing usage frequency, ride-hailing usage scenario (occasion/reason for travel, trip purpose), choice of alternative mode of transportation if ride-hailing services were unavailable (alternative travel mode, mode substituted), and ride-hailing trip timing (time of day), as well as questions not pertinent to the present study. Before the full-scale survey, a paper-based pilot survey was carried out with 30 employees of CCDI (Suzhou) Exploration & Design Consultant CO., Ltd., and the questionnaire was corrected accordingly.
The survey link was shared with residents in Suzhou through the WeChat app from January 2, 2023, to February 3, 2023. WeChat red packets (a form of electronic money) were offered by lottery as a gift for participation. Questionnaires with extremely short completion times, inconsistent, illogical or incomplete answers, or straight-lining answer patterns were removed. For questions with an “others” option, some participants chose “others” but declined to specify their exact answers; these questionnaires were treated as incomplete and were also removed. This left a total of 1464 valid questionnaires, which constituted the analytical sample.
Scale purification and assessment
EFA can aid in unveiling the real factor structure represented by a series of measure items. Therefore, an EFA with varimax rotation (orthogonal rotation) was conducted in SPSS Statistics 22 to suggest a more likely factor structure for the 12 items. Before the analysis, (a) univariate and multivariate normality, and multicollinearity of these 12 items were checked via skew, kurtosis, Mardia’s multivariate kurtosis, and VIF (variance inflation factor); (b) their internal consistency was tested via Cronbach’s alpha; (c) their appropriateness for EFA was assessed by KMO and Bartlett’s test.
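The internal consistency check in step (b) can be sketched in a few lines of Python. This is a minimal illustration of the standard Cronbach’s alpha formula, not the SPSS procedure used in the study; the ratings below are hypothetical five-point responses, and the function name `cronbach_alpha` is an assumption of this example.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores is a list of respondents, each a list of k ratings."""
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of the respondents' summed scores.
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("summed scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: four respondents, three items on a 1-5 Likert scale.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
]
alpha = cronbach_alpha(ratings)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Alpha rises when items covary strongly relative to their individual variances, which is why a value above the conventional 0.7 threshold is read as evidence that the items measure a common underlying construct.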
Once the actual factor structure was known, we specified a CFA in Amos 24 to evaluate it. The maximum likelihood method with a bootstrap procedure (5000 resamples) was selected, since bootstrapping helps to address violations of normality (Kline, 2015). The measurement model must show acceptable model fit, reliability, and validity; otherwise it should be adjusted. Model fit was assessed with the indexes most used in the literature, namely χ2 (chi-square), df (degrees of freedom), CFI (comparative fit index), RMSEA (root mean square error of approximation), TLI (Tucker–Lewis index) and GFI (goodness of fit index) (Jackson et al., 2009). The cut-offs are CFI, TLI, and GFI > 0.9 and RMSEA < 0.08 (Hu and Bentler, 1999). Reliability and validity were checked via indicator reliability, internal consistency reliability, convergent validity, and discriminant validity (Hair et al., 2009; Urbach and Ahleman, 2010). The criteria are shown in Table 3. Poor model fit, reliability, and validity may be associated with items that have low factor loadings or large modification index values. The methodological proposal of Chen and Hwang (2006) was followed for model adjustment: first, items with factor loadings smaller than 0.45 are deleted; then, items with large modification index values are deleted. Model fit is rechecked every time an item is deleted. In addition, the model was checked for offending estimates, such as negative or nonsignificant variances and standardized factor loadings above 1.0 in absolute value (Marsh et al., 1998; Hair et al., 2009; Kolenikov and Bollen, 2012).
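For orientation, composite reliability (CR) and average variance extracted (AVE) are commonly computed from the standardized loadings of a construct, and discriminant validity is often checked by comparing construct correlations with the square root of AVE (the Fornell-Larcker comparison). The Python sketch below assumes standardized loadings and uncorrelated measurement errors, and it uses hypothetical loadings and correlations. It is illustrative only; the criteria actually applied in this study are those listed in Table 3.

```python
import math


def composite_reliability(loadings):
    """CR from standardized loadings, assuming uncorrelated errors."""
    sum_loadings = sum(loadings)
    # Error variance of a standardized indicator is 1 - loading^2.
    error_variance = sum(1 - l * l for l in loadings)
    return sum_loadings ** 2 / (sum_loadings ** 2 + error_variance)


def average_variance_extracted(loadings):
    """AVE: mean squared standardized loading of a construct."""
    return sum(l * l for l in loadings) / len(loadings)


def fornell_larcker_ok(ave_a, ave_b, correlation):
    """True if the construct correlation is below both square roots of AVE."""
    return abs(correlation) < math.sqrt(min(ave_a, ave_b))


# Hypothetical standardized loadings for one three-item construct.
loadings = [0.8, 0.8, 0.7]
cr = composite_reliability(loadings)
ave = average_variance_extracted(loadings)
print(f"CR = {cr:.3f}, AVE = {ave:.3f}")

# Hypothetical second construct (AVE = 0.64) correlated 0.5 with the first.
print("discriminant validity ok:", fornell_larcker_ok(ave, 0.64, 0.5))
```

Because both statistics depend only on the loadings, an item with a low loading lowers CR and AVE at the same time, which is one reason low-loading items are the first candidates for deletion during model adjustment.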
Measurement invariance
To further probe measure validity, invariance tests were performed across different subgroups of gender, age, occupation, monthly household income, private car ownership, usage frequency, usage scenario, alternative travel mode, and trip timing by means of MGCFA in AMOS 24.
MGCFA involves a sequence of hierarchical steps that starts with identifying a baseline model (Byrne, 2012). The measurement model obtained through scale purification and assessment was estimated separately in each subgroup to examine its suitability (model fit, factor loadings) as the baseline model. After this preliminary task, the configural invariance test (a test for the equivalence of factor structure) was conducted by examining the CFI, RMSEA, TLI, and GFI of the unconstrained model (Table 4). According to Byrne (2012), factor loadings, factor covariances, and factor variances are the key and most commonly tested parameters in determining measurement invariance, while the equivalence of error variances is now widely regarded as an excessively stringent test. Therefore, three further invariance tests (factorial invariance, factor covariance invariance, and factor variance invariance, i.e., tests for the equivalence of parameter estimates) were performed by comparing a series of nested models (2-1, 3-2, 4-2 in Table 4). Whether the difference between two nested models is significant can be tested via a chi-square difference test: the changes in χ2 and df between the nested models are calculated, and the corresponding p-value is obtained from Δχ2 and Δdf (Long, 1983). A p-value > 0.05 means that the corresponding null hypothesis in Table 4 is not rejected. If it is rejected because the p-value < 0.05, the practical criterion |ΔCFI| < 0.01 (ΔCFI being the change in CFI between two nested models, a robust statistic for testing between-group invariance) indicates that the difference is practically negligible, and the hypothesis is still retained (Cheung and Rensvold, 2002). |ΔCFI| ≥ 0.01, supplemented by |ΔRMSEA| ≥ 0.015, would indicate nonequivalence (F. F. Chen, 2007).
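The decision sequence for comparing two nested models can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the AMOS procedure used in the study: the function names and the χ2, df, ΔCFI and ΔRMSEA values in the example are hypothetical, and the chi-square tail probability is computed with a closed-form expression that is valid only for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite correction sum.
    total = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def invariance_decision(chi2_constrained, df_constrained, chi2_base, df_base,
                        delta_cfi, delta_rmsea, alpha=0.05):
    """Apply the chi-square difference test, then the dCFI and dRMSEA criteria."""
    p_value = chi2_sf(chi2_constrained - chi2_base, df_constrained - df_base)
    if p_value > alpha:
        return p_value, "invariant"              # difference not significant
    if abs(delta_cfi) < 0.01 or abs(delta_rmsea) < 0.015:
        return p_value, "invariant_in_practice"  # significant but negligible
    return p_value, "non_invariant"              # both practical criteria exceeded


# Hypothetical nested comparison: significant chi-square difference,
# |dCFI| slightly above 0.01 but |dRMSEA| well below 0.015.
p, verdict = invariance_decision(130.0, 60, 100.0, 50,
                                 delta_cfi=0.011, delta_rmsea=0.003)
print(f"p = {p:.4f}, verdict = {verdict}")
```

The practical criteria are applied only after the chi-square difference test is significant because that test becomes very sensitive in large samples, where trivial differences between groups can still yield p-values below 0.05.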
Results
Results of preliminary statistical analysis
Ride-hailing adoption by sociodemographic and travel characteristics is presented in Fig. 2. Respondents were mainly female (54.7%) and aged below 40 (69.7%). More than half were private company workers (51.3%). Monthly household income was skewed towards the low-to-medium segments (64.2% below ¥20,000). Car ownership was high among respondents, with 64.4% reporting that they had one. Almost two-thirds (66.3%) reported using ride-hailing at least three times a week. Usage scenarios were diverse, with commuting and business trips significantly more common than the others. In the absence of ride-hailing, most respondents reported that they would choose traditional taxis (41.8%) or public transport (41.5%) as the alternative mode, both significantly higher than private cars (16.7%). Most trips were made during 6:30–19:00, while trips during the late evening and night periods were substantially fewer. Non-parametric tests of difference were applied to check whether different sociodemographic cohorts displayed significantly different levels of “usage frequency”. As shown in Table 5, users who were female, aged over 55, students or private company workers, from households earning less than ¥10,000 a month, or owners of a private car used ride-hailing less frequently. This trend may stem from multiple barriers: safety concerns (e.g., fear of harassment or assault, particularly among women due to media-reported incidents), limited tech literacy (e.g., older users struggling with app use), cost sensitivity (e.g., students and low-income groups prioritizing affordability over convenience), and unpredictable schedules (e.g., workers relying on cheaper, reliable fixed-route alternatives). Ride-hailing operators (Transportation Network Companies, or TNCs, such as Didi and Uber) and officials can tailor appropriate tactics to increase the usage frequency of these groups if they want to further increase company revenue, reduce private car ownership and use, reduce parking demand, etc.
Ride-hailing users in Suzhou appeared to be relatively satisfied with the service, but there was still considerable room for improvement, as 3 of the 12 attributes (i.e., deviation, price markup and discrimination, detour) did not exceed the passing score of 3.00 (Fig. 3). “Deviation” may stem from algorithmic limitations (e.g., outdated traffic data or failure to account for real-time variables like weather) or opaque communication, which users perceive as intentional manipulation. “Price markup and discrimination” may arise from unregulated surge pricing and algorithmic profiling (e.g., charging frequent users higher rates). For “detour”, driver incentives to inflate fares and poor real-time oversight may be key drivers. To improve the overall quality of ride-hailing service, these three weaknesses should be addressed first.
Results of exploratory factor analysis and confirmatory factor analysis
Skew ∈ (−0.903, 0.540), below 1 in absolute value, and kurtosis ∈ (−1.331, 0.736), below 2 in absolute value, demonstrated that all items were approximately univariate normal. Mardia’s multivariate kurtosis = 52.821 (critical ratio = 55.128 > 5.0) indicated a deviation from multivariate normality, supporting the use of the bootstrap. The VIFs of all items ∈ (1.436, 2.849) were below 5, confirming the absence of multicollinearity. Cronbach’s alpha = 0.755 > 0.7 confirmed internal consistency. As KMO = 0.851 > 0.7 and the significance level of Bartlett’s test = 0.000 < 0.01, the data were adequate for EFA. The EFA results are summarized in Table 6. The analysis revealed 3 factors (instead of the 5 indicated by grounded theory) with eigenvalues ≥ 1, which were labeled “Service”, “Integrity”, and “Efficiency” and together explained 68.600% of the variance.
According to the EFA results, a 3-dimension 12-item measurement model was built in Amos 24 (Fig. 4). It achieved an adequate model fit (χ2 = 273.627, df = 51, CFI = 0.972 > 0.9, RMSEA = 0.055 < 0.08, TLI = 0.964 > 0.9, GFI = 0.970 > 0.9). The results for reliability and validity are summarized in Table 7. All factor loadings were above 0.5 (minimum 0.569, maximum 0.914) and all p-values were below 0.05, so indicator reliability was met. AVE (average variance extracted) values were all above 0.5, and CA (Cronbach’s alpha) and CR (composite reliability) were consistently above 0.7, meeting the requirements for convergent validity and internal consistency reliability. Discriminant validity was supported, as all correlations between constructs were smaller than the corresponding square roots of AVE. In addition, all variances were positive and significant, and all standardized factor loadings were below 1.0, indicating no offending estimates. Therefore, it is reasonable to take this 3-dimension 12-item measure as the baseline model for the following analysis.
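The reported degrees of freedom and RMSEA can be cross-checked from the figures in this paragraph. The Python sketch below assumes a standard simple-structure CFA (one loading per factor fixed to 1, no cross-loadings, no correlated errors) and the common single-group RMSEA point estimate with N − 1 in the denominator; some software uses N instead, which does not change the rounded result here. The function names are assumptions of this example.

```python
import math


def cfa_degrees_of_freedom(n_items, n_factors):
    """df of a simple-structure CFA with one marker loading fixed per factor."""
    sample_moments = n_items * (n_items + 1) // 2
    free_loadings = n_items - n_factors
    error_variances = n_items
    factor_variances_covariances = n_factors * (n_factors + 1) // 2
    return sample_moments - (free_loadings + error_variances
                             + factor_variances_covariances)


def rmsea(chi2, df, n_obs):
    """Single-group RMSEA point estimate, using N - 1 in the denominator."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n_obs - 1)))


# Values reported for the 3-dimension 12-item model (N = 1464 respondents).
df_model = cfa_degrees_of_freedom(n_items=12, n_factors=3)
rmsea_model = rmsea(chi2=273.627, df=51, n_obs=1464)
print(df_model)               # 51, matching the reported df
print(round(rmsea_model, 3))  # 0.055, matching the reported RMSEA
```

That both quantities reproduce the reported values suggests the fitted model has no cross-loadings or correlated error terms beyond the basic three-factor structure.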
Results of multi-group confirmatory factor analysis
The 3-dimension 12-item model was estimated separately in each subgroup. All model fit indexes met the cut-off values (Table 8). All factor loadings were above 0.5 except for that of “deviation” in several subgroups (i.e., students, ¥20,001–¥50,000, not more than twice a week, business trips, escorting children to and from school, 16:30–19:00, 19:00–6:30), where it nevertheless remained above 0.45. These results confirm the suitability of this 3-dimension 12-item model as the baseline model.
The results of the measurement invariance tests are presented in Table 9. The columns on the left show that all models, especially the unconstrained models, achieved adequate model fit. Therefore, configural invariance was verified. The comparison results for the nested models are presented in the columns on the right. For nested models with p-value > 0.05, the corresponding invariance hypotheses were confirmed. For nested models with p-value < 0.05, all |ΔCFI| values were below 0.01 (except for the nested model 4-2 of usage frequency), indicating that the corresponding invariance hypotheses should be accepted in practice. Although |ΔCFI| for the nested model 4-2 of usage frequency was 0.011 > 0.01, the absolute change in RMSEA was 0.003 < 0.015, indicating that the difference was not substantial and that the null hypothesis should not be rejected. To sum up, the factor structure, factor loadings, factor covariances, and factor variances of the 3-dimension 12-item model were invariant across different subgroups of gender, age, occupation, monthly household income, private car ownership, usage frequency, usage scenario, alternative travel mode and trip timing, which further confirmed the validity of this measure.
Discussion
Theoretical Implications
The resulting 3-dimension 12-item structure confirms the necessity of developing an industry-specific scale customized to a specific geographical context, and verifies the feasibility and validity of the proposed generic approach for customizing a multidimensional scale. As expected, this customized measure differs considerably from generic measures (SERVQUAL, SERVPERF, E-S-QUAL), from customized measures for the ride-hailing industry in other geographical contexts (e.g., Nguyen-Phuoc et al., 2021; Kumar et al., 2022; Shah and Hisashi, 2022; Vega-Gonzalo et al., 2023; Wang et al., 2023), and from those for other industries (e.g., Bezerra and Gomes, 2016; Soltanpour et al., 2018; Tiglao et al., 2020). These differences in turn confirm the necessity of developing customized scales. Despite these differences, essential items mentioned in most of the transport service literature, such as crew attire and etiquette, facility cleanliness, and facility conditions, are nonetheless replicated here. All other items (except for “price markup and discrimination”) have identical or similar counterparts in the published ride-hailing, taxi, or other paratransit literature. This close similarity to previous studies offers strong evidence that the grounded theory approach is effective in obtaining scientifically sound measure items from the first-hand knowledge of customers’ thoughts. The joint use of EFA, CFA, and MGCFA proved feasible and valid for further refining, assessing, and testing the scale, as it yielded a meaningful and valid multidimensional scale represented by Service, Integrity, and Efficiency. “Price markup and discrimination” may have been included because this phenomenon is so much more severe in Suzhou than elsewhere that users constantly raised it in focus group discussions; this is consistent with its score (2.53), the lowest among the 12 items.
With this item introduced as a critical determinant of perceived quality, “Integrity” emerges as a standalone dimension that fundamentally challenges and extends classical service quality frameworks such as SERVQUAL and SERVPERF. Unlike traditional dimensions (e.g., tangibility, reliability), “Integrity” incorporates algorithmic fairness into the theoretical system, explicitly addressing the socio-technical inequities inherent in digitally mediated services.
Practical implications
Ride-hailing adoption is more prevalent among women and employees of private companies, while men and people who work in government and public institutions tend to use it more frequently. Consistent with much previous research (e.g., Clewlow and Mishra, 2017; Lavieri and Bhat, 2019), ride-hailing adopters and frequent users in Suzhou tend to be younger. This may be because younger individuals have more exposure to new technologies, products, and services through their more extensive social networks. Although ride-hailing is often cited as a possible mobility solution for the aging population, older adults’ low adoption and usage frequency suggest that there are significant hurdles to overcome. The majority of ride-hailing users in Suzhou are in the low-to-medium income segments (<¥20,000), while usage frequency is lowest in the low-income segment (<¥10,000). Most ride-hailing users own private cars but use ride-hailing less frequently. These two results suggest that, for the low-income segment and for users who own private cars, ride-hailing may serve more as a convenience for one-off trips than as an accessibility facilitator for routine trips. The former group is probably constrained by low income and consumption power, while the latter may rely more on private cars for daily travel. Similar results were reported by Lavieri and Bhat (2019). To address the uneven ride-hailing adoption, TNCs and urban planners should implement safety-focused features like real-time SOS alerts and certified “Women-Safe Driver” programs, enhance affordability through subscription plans or income-tiered discounts, and simplify accessibility via lite app versions or SMS-based booking for tech-averse users. Additionally, targeted community engagement, such as workshops demonstrating time-saving benefits for students or partnerships with safety advocates, could rebuild trust and drive adoption.
By systematically addressing safety, cost, and accessibility barriers, TNCs can expand their reach to these underserved segments while promoting inclusive mobility. To address the low scores for deviation, platforms should integrate real-time data (e.g., traffic sensors, weather APIs) and offer in-app explanations for discrepancies (e.g., “30% delay due to congestion”), coupled with automatic refunds for fare deviations exceeding 10%. Solutions to the low scores for price markup and discrimination include government-mandated surge caps (at most 1.5 times the base fare), third-party audits to detect biased pricing patterns, and user controls such as “wait-for-lower-fare” options. The detour issue can be mitigated by GPS-based route compliance checks (penalizing deviations >5% without consent) and in-app alerts that notify passengers of route changes, alongside bonus programs for drivers who adhere to recommended paths.
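To make the proposed thresholds concrete, the following Python sketch flags a single trip against the three rules mentioned above: a 10% fare deviation, a 1.5 surge cap, and a 5% route deviation without consent. It is purely illustrative. The `Trip` fields, the `audit_trip` function, and the way each deviation is measured (charged versus quoted fare, driven versus planned distance) are assumptions, not part of any platform's actual system.

```python
from dataclasses import dataclass

@dataclass
class Trip:
    # All fields are hypothetical; a real platform would define its own schema.
    quoted_fare: float        # fare shown to the rider at booking
    charged_fare: float       # fare actually charged
    surge_multiplier: float   # multiplier applied to the base fare
    planned_km: float         # length of the recommended route
    driven_km: float          # length of the route actually driven
    rider_consented: bool = False  # rider approved a route change

def audit_trip(trip, refund_threshold=0.10, surge_cap=1.5, detour_threshold=0.05):
    """Return the list of rule violations for one trip."""
    flags = []
    # Fare deviation: charged fare exceeds the quote by more than 10%.
    if trip.charged_fare > trip.quoted_fare * (1 + refund_threshold):
        flags.append("refund_fare_deviation")
    # Surge cap: multiplier above 1.5 times the base fare.
    if trip.surge_multiplier > surge_cap:
        flags.append("surge_cap_exceeded")
    # Detour: route more than 5% longer than planned, without rider consent.
    too_long = trip.driven_km > trip.planned_km * (1 + detour_threshold)
    if too_long and not trip.rider_consented:
        flags.append("unapproved_detour")
    return flags

example = Trip(quoted_fare=20.0, charged_fare=23.0, surge_multiplier=1.2,
               planned_km=10.0, driven_km=10.2)
print(audit_trip(example))
```

Keeping the thresholds as keyword arguments makes it easy to test alternative policy values without changing the rules themselves.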
This 3-dimension 12-item scale offers an effective tool for evaluating ride-hailing service quality in Suzhou. Using this scale, operators can compute an overall service quality score for Suzhou’s ride-hailing services as a weighted average of the item scores. Comparing scores across time periods shows whether service quality has improved and how effective individual improvement initiatives have been. Comparing the overall score with those of other cities reveals the strengths and weaknesses of Suzhou’s ride-hailing service quality at the national level. Moreover, Importance-Performance Analysis (IPA), which combines indicator weights with scores, can be used to prioritize the indicators to improve. This gives operators a sound basis for targeted service optimization.
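The weighted-average score and the IPA prioritization can be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not the authors' procedure. Only the score of 2.53 for “price markup and discrimination” comes from this study. The other item names, scores, and all weights are hypothetical, and the quadrant cut-offs are assumed to be the mean weight and the mean score, which is one common convention in IPA.

```python
def weighted_score(scores, weights):
    """Overall service quality as the weighted average of item scores."""
    total_weight = sum(weights[item] for item in scores)
    return sum(scores[item] * weights[item] for item in scores) / total_weight

def ipa_quadrants(scores, weights):
    """Assign each item to an IPA quadrant, using means as cut-offs (an assumption)."""
    perf_cut = sum(scores.values()) / len(scores)
    imp_cut = sum(weights.values()) / len(weights)
    labels = {}
    for item in scores:
        high_importance = weights[item] >= imp_cut
        high_performance = scores[item] >= perf_cut
        if high_importance and not high_performance:
            labels[item] = "concentrate here"
        elif high_importance and high_performance:
            labels[item] = "keep up the good work"
        elif high_performance:
            labels[item] = "possible overkill"
        else:
            labels[item] = "low priority"
    return labels

# Hypothetical subset of items; only the 2.53 score is taken from the study.
scores = {
    "price markup and discrimination": 2.53,
    "detour": 3.0,
    "vehicle cleanliness": 4.2,
    "driver etiquette": 4.0,
}
weights = {
    "price markup and discrimination": 0.30,
    "detour": 0.30,
    "vehicle cleanliness": 0.28,
    "driver etiquette": 0.12,
}
print(round(weighted_score(scores, weights), 3))
print(ipa_quadrants(scores, weights))
```

Items that land in the “concentrate here” quadrant (high weight, below-average score) would be the first candidates for improvement.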
Conclusions and future work
On the theoretical front, this study contributes a generic approach for the development and validation of a multidimensional scale that is applicable to any service setting (not limited to ride-hailing in Suzhou). The approach can be summarized as follows: (1) collect first-hand knowledge of users’ experiences and perceptions of the service studied through focus group discussions, and extract measure dimensions and items using grounded theory coding techniques; (2) purify and assess the scale against the criteria of model fit, reliability, and validity using EFA and CFA; (3) test measurement invariance across different user cohorts using MGCFA. The first use of grounded theory in scale development offers new insights into the approach. First-hand knowledge gives the derived measure conceptual soundness and ensures that it genuinely reflects customer perceptions. Measurement invariance tests further confirm the validity of the measure. On the practical side, this study identified a 3-dimension 12-item measure for the less researched ride-hailing service quality in the Suzhou context. Additionally, the Suzhou data reveal uneven ride-hailing adoption and usage frequency across sociodemographic cohorts. Together, these findings advance the knowledge of transportation officials, urban planners, and TNC operators about ride-hailing services in Suzhou, which will help improve the services and attract more users.
The current study has some limitations that suggest directions for further research. Firstly, the reliance on self-reported data may introduce biases such as social desirability and memory inaccuracies. Future studies should incorporate objective measures, such as real-time usage data or behavioral analytics, to complement self-reported findings. Secondly, the influence of culture on perceived service quality has not been adequately addressed. Cross-cultural studies should be conducted in the future to explore how cultural differences shape user expectations and evaluations of ride-hailing services. Finally, the study does not explore dynamic aspects of service quality, such as real-time feedback mechanisms or longitudinal changes in user expectations. Future research should investigate how service quality perceptions evolve over time and how platforms can adapt to these changes.
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
References
Akram U, Lavuri R, Mathur S (2024) Hey boomer, “your ride has arrived”: Are you willing to continue using the ride-hailing app? J Retail Consum Serv 77:103678. https://doi.org/10.1016/j.jretconser.2023.103678
Alonso B, Barreda R, Dell’Olio L et al. (2018) Modelling user perception of taxi service quality. Transp Policy 63:157–164. https://doi.org/10.1016/j.tranpol.2017.12.011
Askari S, Peiravian F, Tilahun N et al. (2021) Determinants of users’ perceived taxi service quality in the context of a developing country. Transp Lett 13:125–137. https://doi.org/10.1080/19427867.2020.1714844
Berry LL, Zeithaml VA, Parasuraman A (1990) Five imperatives for improving service quality. Sloan Manag Rev 31:9–38
Bezerra GCL, Gomes CF (2016) Measuring airport service quality: a multidimensional approach. J Air Transp Manag 53:85–93. https://doi.org/10.1016/j.jairtraman.2016.02.001
Boar A, Bastida R, Marimon F (2023) Sustainable development goals and quality practices: a winning combination for customer loyalty in ride-hailing companies. Transp Lett 15:1–12. https://doi.org/10.1080/19427867.2023.2233213
Byrne BM (2012) Structural Equation Modeling with AMOS: Basic Concepts, Applications, and Programming, 3rd ed. Routledge Taylor & Francis Group, New York
Chen FF (2007) Sensitivity of goodness of fit indexes to lack of measurement invariance. Struct Equ Model 14:464–504. https://doi.org/10.1080/10705510701301834
Chen YJ, Hwang FM (2006) Assessing measurement invariance of organizational citizenship behavior scale between male and female staff. Psychol Test 53:297–325
Cheung GW, Rensvold RB (2002) Evaluating goodness-of-fit indexes for testing measurement invariance. Struct Equ Model 9:233–255. https://doi.org/10.1207/S15328007SEM0902_5
Chin WW (1998) The partial least squares approach to structural equation modelling. In: Modern Methods for Business Research. Lawrence Erlbaum Associates, Mahwah, p 295–336
Clewlow RR, Mishra GS (2017) Disruptive transportation: the adoption, utilization, and impacts of ride-hailing in the United States. Institute of Transportation Studies, University of California, Davis, Research Report UCD-ITS-RR-17-07
Cronin JJ, Taylor SA (1992) Measuring service quality: a reexamination and extension. J Mark 56:55–68
De Oña J, De Oña R (2015) Quality of service in public transport based on customer satisfaction surveys: a review and assessment of methodological approaches. Transp Sci 49:605–622. https://doi.org/10.1287/trsc.2014.0544
De Oña J, De Oña R, Eboli L et al. (2013) Perceived service quality in bus transit service: a structural equation approach. Transp Policy 29:219–226. https://doi.org/10.1016/j.tranpol.2013.07.001
Fornell C, Larcker DF (1981) Evaluating structural equation models with unobservable variables and measurement error. J Mark Res 18:39–50. https://doi.org/10.2307/3151312
Glaser BG (1992) Basics of grounded theory analysis: emergence vs. forcing. Sociology Press
Glaser BG, Strauss AL (1967) The discovery of grounded theory: strategies for qualitative research. Aldine Publishing Company, Chicago
Hair JF, Black WC, Babin BJ, Anderson RE (2009) Multivariate data analysis, 7th ed. Prentice Hall, Upper Saddle River
Halakoo M, Mesbah M, Habibian M et al. (2022) Modelling quality of service in a fixed route shared taxi (Jitney). Case Stud Transp Policy 10:1988–2000. https://doi.org/10.1016/j.cstp.2022.08.010
Hu LT, Bentler PM (1999) Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Model 6:1–55. https://doi.org/10.1080/10705519909540118
Jackson DL, Gillaspy JA, Purc-Stephenson R (2009) Reporting practices in confirmatory factor analysis: an overview and some recommendations. Psychol Methods 14:6–23. https://doi.org/10.1037/a0014694
Joewono TB, Kubota H (2007) User satisfaction with paratransit in competition with motorization in Indonesia: anticipation of future implications. Transportation 34:337–354. https://doi.org/10.1007/s11116-007-9119-7
Katili FA, Robby FA, Handayani PW (2024) The influence of the ride hailing apps loyalty program on customer loyalty: a case study in Indonesia. Transp Res Interdiscip Perspect 26:101141. https://doi.org/10.1016/j.trip.2024.101141
Kline RB (2015) Principles and practice of structural equation modeling. The Guilford Press, New York
Kolenikov S, Bollen KA (2012) Testing negative error variances: is a Heywood case a symptom of misspecification? Sociol Methods Res 40:124–167. https://doi.org/10.1177/0049124112442138
Kumar A, Gupta A, Parida M et al. (2022) Service quality assessment of ride-sourcing services: a distinction between ride-hailing and ride-sharing services. Transp Policy 127:61–79. https://doi.org/10.1016/j.tranpol.2022.08.013
Kurniawan JH, Ong C, Cheah L (2018) Examining values and influences affecting public expectations of future urban mobility: a Singapore case study. Transp Policy 66:66–75. https://doi.org/10.1016/j.tranpol.2017.12.014
Lai WT, Chen CF (2011) Behavioral intentions of public transit passengers-the roles of service quality, perceived value, satisfaction and involvement. Transp Policy 18:318–325. https://doi.org/10.1016/j.tranpol.2010.09.003
Lavieri PS, Bhat CR (2019) Investigating objective and subjective factors influencing the adoption, frequency, and characteristics of ride-hailing trips. Transp Res Part C Emerg Technol 105:100–125. https://doi.org/10.1016/j.trc.2019.05.037
Lederer AL, Sethi V (1992) Root causes of strategic information systems planning implementation problems. J Manag Inf Syst 9:25–45. https://doi.org/10.1080/07421222.1992.11517946
Lewis BR, Snyder CA, Rainer RK (1995) An empirical assessment of the information resource management construct. J Manag Inf Syst 12:199–223. https://doi.org/10.1080/07421222.1995.11518075
Li X, Du M, Zhang Y, Yang J (2022) Identifying the factors influencing the choice of different ride-hailing services in Shenzhen, China. Travel Behav Soc 29:53–64. https://doi.org/10.1016/j.tbs.2022.05.006
Long JS (1983) Confirmatory factor analysis: a preface to LISREL. Sage Publications
Mars L, Arroyo R, Ruiz T (2016) Qualitative research in travel behavior studies. Transp Res Procedia 18:434–445. https://doi.org/10.1016/j.trpro.2016.12.057
Marsh HW, Hau KT, Balla JR et al. (1998) Is more ever too much? The number of indicators per factor in confirmatory factor analysis. Multivar Behav Res 33:181–220. https://doi.org/10.1207/s15327906mbr3302_1
Marshall C, Rossman GB (1989) Designing qualitative research. Sage Publications, Newbury Park
Morgan DL (1992) Designing focus group research. In: Tools for primary care research. Sage Publications, Thousand Oaks
Nathanail E (2008) Measuring the quality of service for passengers on the Hellenic railways. Transp Res Part A Policy Pr 42:48–66. https://doi.org/10.1016/j.tra.2007.06.006
Nguyen-Phuoc DQ, Su DN, Tran PTK et al. (2020) Factors influencing customer’s loyalty towards ride-hailing taxi services – A case study of Vietnam. Transp Res Part A Policy Pr 134:96–112. https://doi.org/10.1016/j.tra.2020.02.008
Nguyen-Phuoc DQ, Vo NS, Su DN et al. (2021) What makes passengers continue using and talking positively about ride-hailing services? The role of the booking app and post-booking service quality. Transp Res Part A Policy Pr 150:367–384. https://doi.org/10.1016/j.tra.2021.06.013
Nunnally JC, Bernstein IH (1994) Psychometric Theory, 3rd ed. McGraw-Hill, New York
Parasuraman A, Zeithaml VA, Berry LL (1988) SERVQUAL: a multiple-item scale for measuring consumer perceptions of service quality. J Retail 64:12–40
Parasuraman A, Zeithaml VA, Malhotra A (2005) E-S-QUAL: a multiple-item scale for assessing electronic service quality. J Serv Res 7:213–233
Ricardianto P, Ikhsan RB, Suryobuwono AA et al. (2024) What makes consumers attitudinal loyalty on ride-hailing services? An investigation Indonesian consumers’ perceived safety in using ride-hailing apps. J Open Innov Technol Mark Complex 10:100306. https://doi.org/10.1016/j.joitmc.2024.100306
Rose JM, Hensher DA (2018) User satisfaction with taxi and limousine services in the Melbourne metropolitan area. J Transp Geogr 70:234–245. https://doi.org/10.1016/j.jtrangeo.2018.06.017
Shaaban K, Kim I (2016) Assessment of the taxi service in Doha. Transp Res Part A Policy Pr 88:223–235. https://doi.org/10.1016/j.tra.2016.04.011
Shah SAH, Hisashi K (2022) Analyzing travelers’ attitude towards ride-hailing services in developing countries: case of Lahore, Pakistan. IATSS Res 46:223–235. https://doi.org/10.1016/j.iatssr.2021.12.006
Shah TR (2020) Service quality dimensions of ride-sourcing services in Indian context. Benchmarking Int J 28:249–266. https://doi.org/10.1108/BIJ-03-2020-0106
Soltanpour A, Mesbah M, Habibian M (2018) Perceived Service Quality in Urban Rail Transit: a Comparison of Structural Equation Models. Paper presented at the Transportation Research Board 97th Annual Meeting
Straub DW (1989) Validating instruments in MIS research. MIS Q Manag Inf Syst 13:147–165. https://doi.org/10.2307/248922
Strauss AL, Corbin JM (1990) Basics of Qualitative Research: Grounded Theory Procedures and Techniques. Sage Publications
Su DN, Nguyen-Phuoc DQ, Johnson LW (2021) Effects of perceived safety, involvement and perceived service quality on loyalty intention among ride-sourcing passengers. Transportation 48:369–393. https://doi.org/10.1007/s11116-019-10058-y
Sumaedi S, Mahatma IG, Bakti Y, Yarmen M (2012) The empirical study of public transport passengers’ behavioral intentions: the roles of service quality, perceived sacrifice, perceived value, and satisfaction (case study: paratransit passengers in Jakarta, Indonesia). Int J Traffic Transp Eng 2:83–97
Tiglao NCC, De Veyra JM, Tolentino NJY et al. (2020) The perception of service quality among paratransit users in Metro Manila using structural equations modelling (SEM) approach. Res Transp Econ 83:100955. https://doi.org/10.1016/j.retrec.2020.100955
Urbach N, Ahlemann F (2010) Structural equation modeling in information systems research using Partial Least Squares. J Inf Technol Theory Appl 11:5–40
Vega-Gonzalo M, Aguilera-García Á, Gomez J et al. (2023) Traditional taxi, e-hailing or ride-hailing? A GSEM approach to exploring service adoption patterns. Transportation. https://doi.org/10.1007/s11116-022-10356-y
Wang T, Zhang Y, Li Y et al. (2023) Determinants of passenger satisfaction with ride-hailing services: case study in Nanjing, China. J Urban Plan Dev 149:1–15. https://doi.org/10.1061/jupddm.upeng-3960
Wen CH, Lan LW, Cheng HL (2005) Structural equation modeling to determine passenger loyalty toward intercity bus services. Transp Res Rec 249–255. https://doi.org/10.3141/1927-28
Wong RCP, Szeto WY (2018) An alternative methodology for evaluating the service quality of urban taxis. Transp Policy 69:132–140. https://doi.org/10.1016/j.tranpol.2018.05.016
Yaya LHP, Fortià MF, Canals CS et al. (2015) Service quality assessment of public transport and the implication role of demographic characteristics. Public Transp 7:409–428. https://doi.org/10.1007/s12469-014-0099-7
Zhang C, Cao X, Nagpure A et al. (2019) Exploring rider satisfaction with transit service in Indore, India: an application of the three-factor theory. Transp Lett 11:469–477. https://doi.org/10.1080/19427867.2017.1398484
Acknowledgements
This work was funded by the Academic Leader under Jiangsu Qinglan Project of China (5010712124), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (24KJB470021), and the Innovation and Entrepreneurship program of Jiangsu Province (JSSCBS20211126).
Author information
Contributions
Jin Li: Conceptualization, Methodology, Formal analysis and investigation, Data curation, Writing-original draft preparation; Xiaomei Xu: Conceptualization, Methodology, Writing-original draft preparation, Writing-review and editing, Funding acquisition; Hui Xu: Funding acquisition, Supervision, Project administration; Yali Song: Formal analysis and investigation, Data curation.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
Ethical approval for this study was granted by the ethics review panel convened by the Academic Committee at Suzhou City University on October 13, 2022. The approval scope explicitly covers all methodology involving human participants detailed in the research protocol, including: (1) audio-recorded focus group discussions with Suzhou ride-hailing users to capture their service experiences and perceptions, and (2) anonymous questionnaire surveys collecting sociodemographic characteristics, travel characteristics, and service perception ratings. All procedures complied with institutional and national ethical standards, the Declaration of Helsinki (1964), and its subsequent amendments.
Informed consent
Informed consent was obtained from all participants prior to involvement. For focus group discussions, written informed consent was secured on December 10, 2022, using Chinese-language forms detailing: research objectives (audio-recording their discussions on ride-hailing experiences/perceptions), privacy safeguards (anonymization and post-transcription audio destruction), voluntary participation rights, and data usage (academic purposes). For the questionnaire survey, digital informed consent was obtained via a mandatory agreement embedded at the beginning of questionnaires, requiring participants to check “I understand and voluntarily participate” before answering questions (Date: from January 2, 2023, to February 3, 2023). The survey maintained strict anonymity (no collection of names, IDs, or identifiers), with all data confidentially maintained for academic purposes.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Li, J., Xu, X., Xu, H. et al. Measuring customer-perceived service quality in the ride-hailing industry: a generic approach for the development and validation of a multidimensional scale. Humanit Soc Sci Commun 12, 1570 (2025). https://doi.org/10.1057/s41599-025-05902-9