Main

Poor sanitation is an enduring public health challenge, with 1.5 billion people globally lacking access to a basic toilet1. Studies of the effectiveness of sanitation improvements often focus on disease2 and behaviour change3. Sustainable Development Goal (SDG) 6.2 measures progress on sanitation by type of toilet service, which is important and objectively measurable1. Improvements in people’s subjective sanitation experiences are also important but rarely rigorously measured. These experiences could be things that happen to, or are felt by, people while they carry out sanitation practices. Sanitation-related quality of life refers to how sanitation practices and services directly affect people’s experiences, for example privacy, safety and disgust4. Measuring these outcomes is important because they are often rated highly (and alongside or above disease) as drivers of household sanitation decisions5,6,7 and contribute to health in its broadest sense8.

‘Health-related quality of life’ experiences have routinely been measured in health studies since the 1980s (ref. 9). Field-specific quality of life measurement in many areas of environmental health has been limited, but the recently developed water insecurity experiences (WISE) scales10,11 are now delivering insights into the causes and consequences of water insecurity12,13. Sanitation has lagged behind, but the Sanitation-related Quality of Life index (SanQoL-5) has now been applied in 15 populations across 9 countries. SanQoL-5 measures people’s sanitation experiences across five attributes: privacy, disgust, safety, shame and disease14. Each attribute is measured by a short question with answers on a three-level frequency scale (Table 1). The quality of life theory underlying SanQoL-5 is the capability approach to welfare economics15,16. The five attributes measure outcomes people have ‘reason to value’ about sanitation, as identified in qualitative work4 and prior literature17,18. The questions measure ‘functionings’ (people’s achievement of capabilities in the past 30 days) rather than the capabilities themselves (the broader set from which people have freedom to choose).

Table 1 SanQoL-5 questions (descriptive system)

The SanQoL-5 draws on methods common in health economics for developing measures to use in economic evaluation19. Measures intended for economic evaluation need a small number of attributes or questions (typically <7), as in the ‘EQ-5D’, which measures and values health-related quality of life20,21. Attributes are selected primarily for content validity—the extent to which the most relevant and important aspects of a concept are captured22. Rather than using factor analysis as applied in scale development, the attributes are valued as an index anchored at 0 and 1, with weights based on preference elicitation23. The index then represents a given population’s relative valuation of those preselected attributes.

The original SanQoL-5 development study14 found that the index demonstrated favourable psychometric properties, but only in one urban setting (Maputo, Mozambique) with a modest sample size (n = 424). Since initial development, there have been refinements to the questions based on mixed-methods cognitive testing and piloting in Zambia and Ghana. Given that SanQoL-5 has now been applied in 15 populations and its questions recommended by the United Nations Children’s Fund (UNICEF)/World Health Organization (WHO) Joint Monitoring Programme for gender monitoring24, revalidation of the measure in broader settings and larger samples is warranted but has not yet been undertaken.

In this Article, we assess the validity and reliability of the SanQoL-5 index by interviewing 6,165 people across rural and urban areas in six countries: Ethiopia, India, Kenya, Malawi, Mozambique and Zambia. In doing so, we address the need for validated sanitation measures focused on quality of life, which can be simply and consistently deployed in studies and routine monitoring.

Sample characteristics

Samples ranged from 0% rural to 100% rural (Table 2), and sampling and representativeness varied by country (Methods). Samples were approximately gender-balanced in all countries apart from Malawi, where 83% of respondents were women (Table 2). The mean age was about 40 years in all countries. Samples varied in terms of types of sanitation. In Mozambique, the most common toilet (50%) was a pit latrine with concrete slab, with 32% using a flush or pour-flush toilet. In India, all those using a toilet used a flush or pour-flush. In all other countries (pour-)flush toilets were very uncommon (<1%), with most people (61–83%) using a pit latrine with wood/soil slab. Open defecation (at the last time people defecated) ranged from 3% in Malawi to 35% in India. We present distributions of SanQoL-5 attribute levels (Fig. 1), as well as histograms of SanQoL-5 index values (Supplementary Information A). All attribute levels received at least 10% of responses in all countries. SanQoL-5 value sets (weighted indices) are provided in Supplementary Information B.

Table 2 Respondent and toilet characteristics
Fig. 1: Distributions of SanQoL-5 attribute levels by country.

a, Ethiopia, n = 1,570. b, India, n = 1,212. c, Kenya, n = 988. d, Malawi, n = 1,400. e, Mozambique, n = 601. f, Zambia, n = 365.

Validity and reliability

There was good evidence for the construct validity of SanQoL-5. There was evidence at P < 0.10 for 91% of the hypothesized associations with toilet quality characteristics in individual regressions (87% at P < 0.05) and 86% (71% at P < 0.05) in concurrent regressions (Table 3). All associations were in the hypothesized direction, indicating that better-quality toilets were associated with higher SanQoL-5. Full regression output is reported in Supplementary Information C. The only hypothesized variable showing a lack of association with SanQoL-5 in more than one country was whether the toilet was shared with other households. Only 10% of negative controls were associated with SanQoL-5 at P < 0.10.

Table 3 P values on coefficients for hypothesized associations in GLMM regressions, individually and concurrently

People with higher levels of sanitation service tended to have higher SanQoL-5 index values, which supports ‘known groups’ validity (Fig. 2). In the two samples in which >15% of the sample practised open defecation (OD), there was a SanQoL-5 gain from toilet use over OD (Table 3 and Fig. 2). Correlation between SanQoL-5 and sanitation visual analogue scale (SanVAS)25 scores (Supplementary Information D) ranged from 0.28 in Malawi to 0.58 in Mozambique, all in the hypothesized direction at P < 0.001, which is evidence of convergent validity (Supplementary Information E). Considering discriminant validity, SanQoL-5 was uncorrelated with two EQ-5D health-related quality of life (QoL) variables in the two countries in which we collected those data (India and Kenya). Specifically, the five-level EQ-5D variable for ‘problems in walking about’ was not correlated with SanQoL-5 in either country (Kenya r = −0.04 (P = 0.33), India r = −0.04 (P = 0.89)). EQ-5D ‘pain or discomfort’ was also not correlated with SanQoL-5 (Kenya r = −0.08 (P = 0.06), India r = −0.033 (P = 0.26)). Cronbach’s α ranged from 0.73 to 0.92 per country (0.85 in pooled data), indicating good internal reliability (Supplementary Information E).

Fig. 2: SanQoL-5 index kernel density distributions by toilet type.

a, Ethiopia. b, India. c, Kenya. d, Malawi. e, Mozambique. f, Zambia.

Item response theory

In the pooled and country-specific item response theory (IRT) models, category characteristic curves show a distinct peak for the ‘sometimes’ level across all attributes (Supplementary Information F), confirming that it is appropriate to retain three-level attributes rather than collapse them to binary ones. The item information functions show that all attributes provide good information across the construct (theta), with privacy and shame providing more information than the others (Supplementary Information F). The five attributes have similar ‘difficulty’ (no Guttman ordering), which feeds into a smooth test information function covering the breadth of the construct. Neither the pooled nor country-specific models raise any concerns.

Measurement invariance

With 5 attributes and 15 possible comparisons among 6 countries, there were 75 possible instances of differential item functioning (DIF). Among these, seven (9%) exhibited DIF in ordinal logistic regression that was ‘meaningful’, following the widely used26 cut-off of a 2% increase in pseudo-R² (Supplementary Information G). This provides good evidence of equivalent measurement and meaning across countries. While further assessment of DIF in larger representative samples is recommended, these results broadly support the cross-cultural comparability of SanQoL-5.

Comparing question framings

Analyses comparing the old question framings to the current framings (Table 1) in Ethiopia and Zambia support use of the current questions, where ‘always’ is the worst outcome. A higher proportion of hypothesized variables had statistically significant associations in both countries (Ethiopia and Zambia) under the current questions. In no area did the old questions perform better (Supplementary Information H).

Discussion

This study evaluated multiple aspects of validity and reliability of a five-attribute Sanitation-related Quality of Life index (SanQoL-5), using rigorous psychometric methods across six countries in diverse rural and urban settings. The SanQoL-5 questions are short and simple, and together take around 1–2 min to administer. The SanQoL-5 covers a breadth of what people value about sanitation: avoiding disgust, avoiding shame, avoiding disease risks, having safety and having privacy. Rather than focusing on toilet types like SDG 6, the SanQoL-5 index captures people’s sanitation-related experiences.

Our study included populations using a variety of sanitation types in rural and urban settings in six countries, with SanQoL-5 responses covering the full range of attribute levels (Fig. 1). We have presented evidence for different types of validity and reliability generated using these datasets, and our findings on measurement invariance support cross-cultural comparability. We believe that SanQoL-5 can be widely applied with adults, but at this stage there is good evidence of validity only in the five African countries studied here and in northern India. Validation is a continuous process, even for long-established measures27. Further exploration of the validity of SanQoL-5 in other world regions is required, and we recommend that piloting and/or cognitive interviews ideally be undertaken before application in new settings or languages28. We also recommend that users of SanQoL-5 undertake their own validity and reliability assessments, wherever possible. There was prior evidence of test–retest reliability14, although further exploration of this is needed, and future studies should also investigate predictive validity (for example, SanQoL-5 at one timepoint ‘foretelling’ some subsequent outcome). Translations in several languages are available at www.SanQoL.org. Before this study, an earlier version of SanQoL-5 had been validated in one setting (urban Mozambique). Improvement of the questions based on mixed-methods research in multiple countries resulted in the updated version of SanQoL-5 that we have evaluated here, which is easier to understand and performed better in head-to-head comparisons (Supplementary Information H).

There are several possible applications of the SanQoL-5 index. First, SanQoL-5 can be used as an outcome in impact evaluation (for example, differences compared with a counterfactual), as already done in several studies29,30,31. Second, it can be used in the monitoring and evaluation of programmes (for example, differences in a group over time). SanQoL-5 has already been used for this purpose by the non-governmental organizations World Vision and Water & Sanitation for the Urban Poor, and in a high-frequency monitoring study of container-based sanitation32. Third, it can be used in needs assessment, for example characterizing the scale and nature of sanitation problems in a population. Fourth, it can be used in economic evaluation of sanitation programmes, as in a cost-effectiveness analysis in Mozambique33 and a benefit–cost analysis in progress in Malawi29. It is for this economic purpose that SanQoL-5 was designed as a weighted index20, thereby capturing the value of sanitation to people. This design feature was a necessary condition for allowing SanQoL-5 gains to be given monetary value in benefit–cost analysis, based on willingness to pay.

In all of these uses, SanQoL-5 provides complementary subjective information to objective ‘quality of service’ measures (for example, roof or wall quality in the ‘sanitation quality index’34). Subjectivity is a characteristic of all quality of life measures35. It can be helpful to know, for example, whether people’s subjective perception of disease risk has changed as a result of a programme, as compared with actual disease cases. These two things may not always be closely related36.

In all six settings evaluated here, people with progressively higher levels of sanitation service tended to have progressively higher SanQoL-5 (Fig. 2). This is evidence of known groups validity but also demonstrates the potential of SanQoL-5 to evaluate relative QoL gains arising from different sanitation programmes and policies. There was diversity of SanQoL-5 experience within each sanitation service category (Fig. 2). This is unsurprising because each category contains a variety of individuals with their own characteristics and experiences, as well as toilet subtypes in different states of repair, consistent with the conceptual model underlying SanQoL-5 (ref. 4).

Sharing toilets with other households is often assumed to deliver worse outcomes than private toilets37. On the one hand, therefore, it was unexpected that the variable for sharing was not statistically significantly associated with SanQoL-5 in Ethiopia and Zambia. On the other hand, these samples were predominantly rural (81% and 100%, respectively), and it is plausible that any negative consequences of sharing are more acute in dense urban settings. Sharing toilets may be more palatable in sparsely populated rural areas with smaller numbers of sharing households. Among those who shared toilets, the median number of households sharing was 2 in all countries (predominantly rural settings) except Mozambique (urban) where it was 3. Further exploration in urban settings of the relationship between sharing and SanQoL-5 is required.

SanQoL-5 has thus far only been used in adult populations, but it could be useful in adolescent or child populations, for example, in the context of school sanitation as well as households. About one-sixth of the world’s population are adolescents38, who may experience sanitation in different ways to adults39,40. Further work on content validity and ease of understanding is required for its use among children and adolescents. Questions may also need amending, as in the youth version of the EQ-5D41.

The SanQoL-5 captures five dimensions of sanitation-related quality of life and makes no claim to measure all aspects of sanitation-related QoL that may be important. Users requiring more granularity might administer longer measures alongside it. For example, scales in the Agency, Resources, and Institutional Structures for Sanitation-related Empowerment (ARISE) family capture many aspects of sanitation-related QoL among women in more detail, but with a large number of questions that take more time42. Users are reminded that SanQoL-5 development followed design principles common in measures for economic evaluation23, with attributes selected for content validity22 rather than based on factor analysis. As with any measure development effort, alternative methods might have delivered a different instrument. As above, we recommend measuring quality of service alongside QoL outcomes34.

A priority for future research is a more detailed exploration of which toilet types or characteristics are associated with the biggest gains in SanQoL-5, to inform policy and programming decisions. A further priority is the investigation of gender differences in SanQoL-5 (in particular, intrahousehold differences), as investigated for water and food security43,44. A strength of SanQoL-5 is that it is applicable to any gender, meaning it can identify gaps or inequalities between women and men.

Strengths of our study include the diversity of countries, rural and urban milieus and toilet types used, as well as the variety of analytical methods for assessing different aspects of validity and reliability. Limitations include that data were not collected for some aspects of reliability, for example test–retest, although this was assessed in the earlier SanQoL-5 study14. Responsiveness of SanQoL-5 to changes in sanitation services over time could also not be assessed in this study, although it was demonstrated in an earlier study45. Other limitations include that, although samples were large enough for validity and reliability assessment, they were in relatively small geographic areas within each country (apart from Kenya, which was nationally representative). Furthermore, no high-income countries were included, but evidence suggests that there may be sanitation-related quality of life deficits in those countries (e.g., among groups often excluded from sanitation services owing to poverty or discrimination)46.

The SanQoL-5 index provides a short and simple measure capturing the outcomes people value about sanitation, which are also what often motivate toilet purchases and upgrades. A single overall score, combining five important experiences, is practical for assessing the impact of sanitation improvements. Monitoring for SDG 6 focuses on toilet types, but achieving and sustaining progress on sanitation will require efficient resource allocation, which takes account of people’s experiences, too. Understanding which programme designs and technologies are associated with the largest gains in SanQoL-5 can help to target investments to where they will see the greatest uptake and economic returns.

Methods

Study settings

We use data from previous studies in Ethiopia, India, Kenya, Malawi29, Mozambique47 and Zambia. The Ethiopia sample comprised 1,586 people from 24 communities (81% rural) across six districts (woredas) in three regions of the country. The India sample comprised 1,213 people from 60 communities (87% rural) representative of two states (Bihar and Uttar Pradesh), specifically two people (one male, one female) per household in 607 households. The Kenya sample comprised 1,000 people from 60 communities (71% rural) in a nationally representative sample of 600 households with a secondary respondent in 400 households. The Malawi sample comprised 1,400 people from 70 rural villages in Chiradzulu district. The Mozambique sample comprised 601 people, half from 24 urban blocks (quarteirões) in Maputo City and half from 18 blocks in the large town of Dondo in Sofala province. The Zambia sample comprised 365 people from nine rural villages in the Chongwe district. Our final sample includes 6,165 people across the six countries, representing heterogeneous geographies, cultures and sanitation infrastructure availability. Households were randomly sampled in all sites, although the sampling design differed between sites. Further details of underlying studies are in Supplementary Information I.

SanQoL-5 data and weighting

The SanQoL-5 questions are presented in Table 1. Answers are combined into a single score ranging from 0 to 1. Higher SanQoL-5 scores are better, with 1 denoting ‘full sanitation capability’ (maximum QoL) and 0 ‘no sanitation capability’ (minimum QoL). With a three-level response to each of the five questions, there are 243 (= 3⁵) possible combinations of SanQoL-5 attribute levels. The attributes are not assumed to be equally important: in a given population, reduced disgust might hold greater value for people on average than improved privacy. These preferences are important to account for in the economic applications that SanQoL-5 is designed for, for example, benefit–cost analysis48. Therefore, rather than assuming that disgust has the same value as privacy, preferences can be elicited from the relevant population using methods such as discrete choice. Because the weights are elicited from people themselves, the SanQoL-5 index represents the value of sanitation to people in that population.
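To make the weighting concrete, the sketch below builds a value set for all 243 states using a simple additive model. The decrements in `HYPOTHETICAL_DECREMENTS` are made-up numbers, chosen only so that the best state scores 1 and the worst scores 0; they are not any of the elicited SanQoL-5 value sets, and real value sets need not be additive. The level coding (0 = never, 1 = sometimes, 2 = always) is likewise an assumption about Table 1.

```python
from itertools import product

import numpy as np

# HYPOTHETICAL additive value set for illustration only; these are NOT the
# elicited SanQoL-5 weights for any country.
# Assumed level coding: 0 = never, 1 = sometimes, 2 = always, where 'always'
# is the worst outcome under the current question framing.
ATTRIBUTES = ("privacy", "disgust", "safety", "shame", "disease")
HYPOTHETICAL_DECREMENTS = {
    "privacy": (0.0, 0.08, 0.18),
    "disgust": (0.0, 0.10, 0.24),
    "safety": (0.0, 0.09, 0.20),
    "shame": (0.0, 0.07, 0.16),
    "disease": (0.0, 0.10, 0.22),
}
# Worst-level decrements sum to 1, anchoring the index at 1 (best) and 0 (worst).
assert np.isclose(sum(d[2] for d in HYPOTHETICAL_DECREMENTS.values()), 1.0)


def sanqol_index(levels):
    """Return 1 minus the summed decrements for a 5-tuple of attribute levels (0-2)."""
    return 1.0 - sum(
        HYPOTHETICAL_DECREMENTS[name][level] for name, level in zip(ATTRIBUTES, levels)
    )


# Enumerating every combination of levels gives the 3**5 = 243 states of a value set.
VALUE_SET = {state: sanqol_index(state) for state in product(range(3), repeat=5)}
```

Under these made-up weights, `VALUE_SET[(0, 1, 0, 0, 2)]` is 1 − 0.10 − 0.22, approximately 0.68. An elicited value set (from discrete choice, scoring or ranking) would replace the dictionary of decrements, or replace the additive function altogether.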

The set of preference weights for the 243 attribute combinations is known as a value set. Four of our studies apply value sets generated within the study, using a discrete choice experiment (Mozambique), attribute scoring (Malawi) and attribute ranking (Ethiopia and Zambia) (Supplementary Information J). The India and Kenya samples apply the discrete choice experiment value set. The SanQoL-5 index represents a given population’s relative valuation of the attributes, so weights are typically slightly different in different countries, as with health-related QoL indices such as EQ-5D20.

Overall study design

We apply a combination of classical test theory and IRT to assess different aspects of validity and reliability. First, we assessed construct validity—whether an instrument measures the construct it intends to measure. We took a predictive approach to construct validity, by testing hypotheses about how SanQoL-5 would covary with hypothesized variables. Second, we assessed convergent validity—whether two instruments aiming to measure similar constructs are correlated (an aspect of construct validity). We assessed this by correlation between SanQoL-5 and a SanVAS with scores ranging from 0 to 100 (Supplementary Information D)25. We used Spearman’s rank correlation (r) because, like EQ-5D index values49, SanQoL-5 index values are not usually normally distributed in a given population. We hypothesized that there would be moderate correlation (0.4 < r < 0.6), following norms for health VAS50. Third, we assessed discriminant validity (the opposite concept to convergent) by correlation between SanQoL-5 and the two EQ-5D questions (on mobility and pain) included in the India and Kenya questionnaires21. We used Spearman’s rank correlation for the same reason, hypothesizing no correlation (r = 0). Fourth, we assessed known groups validity—whether an instrument can discriminate between two groups expected to differ in terms of the outcome (another aspect of construct validity). We explored this by assessing whether people with higher levels of sanitation service tended to have higher SanQoL-5 index values. Finally, we assessed internal reliability—how consistently different questions in a measure capture the same construct35. We assessed internal reliability using Cronbach’s α (>0.7)51 and item-total correlation (>0.4)52. In statistical tests, P < 0.05 in a two-tailed test was considered statistically significant evidence of association.
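The correlation and reliability statistics named above can be sketched in a few lines of NumPy, assuming complete data with one row per respondent and one column per attribute. Spearman’s r is computed as the Pearson correlation of average ranks (ties share their mean rank). The item-total correlation is shown in its ‘corrected’ form (each item against the sum of the other items); that is one common variant and is an assumption here, as the text does not say which form was used.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i : j + 1]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


def cronbach_alpha(items):
    """items: n_respondents x k_items array of attribute scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)


def corrected_item_total(items):
    """Correlation of each item with the sum of the remaining items."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array(
        [np.corrcoef(items[:, j], total - items[:, j])[0, 1] for j in range(items.shape[1])]
    )
```

For example, `spearman_r(sanqol_values, sanvas_scores)` would give the convergent validity coefficient, and `cronbach_alpha` applied to the five attribute columns would be compared against the 0.7 threshold. The P values reported in the text require a separate test, which this sketch does not include.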

Hypotheses for construct validation

We prespecified hypotheses about the presence of associations between SanQoL-5 index values and a set of toilet characteristics (hypothesized variables)53,54,55. These were predominantly fieldworker observations of toilet characteristics including: walls being solid; faeces not being observed on the pan/slab; the pan/slab being concrete or similar; a water seal being present; the toilet having an inside lock; and the toilet not being shared with other households. Variables were binary coded such that positive regression coefficients are hypothesized (1 = better outcome, 0 = worse). For example, we hypothesized that solid walls are more likely to provide privacy and safety than makeshift or absent walls, and hence that solid walls would be positively associated with SanQoL-5. In making hypotheses, we drew on the literature on sanitation and mental well-being, as well as motives for sanitation behaviours17,18. Further details and rationales for hypothesized variables are provided in Supplementary Information K.

We also included negative controls hypothesized not to be strongly associated with SanQoL-5 (ref. 56), namely household size and whether the respondent had a partner. These are imperfect, because we were limited by what was asked in the original surveys. For example, household size could influence SanQoL-5 if it means more people are sharing a toilet. However, we would not hypothesize household size to be a strong predictor of SanQoL-5 in samples of only around 1,000 people.

Construct validity analyses were completed for each country separately. We assessed a binary variable for a given country only if ≥15% of the sample with non-missing data was in each category, to ensure minimum statistical power. We tested hypotheses using generalized linear mixed models (GLMMs) in Stata 18. In India and Kenya, where there were two respondents per household, we used three-level GLMMs with random effects at the household and community level. In other countries, we used two-level GLMMs, with the exception of Zambia where there were only nine clusters, so we used wild bootstrap inference with linear regression57. We clustered standard errors at the community level. For each country, we regressed SanQoL-5 index values on each hypothesized variable in turn. We also explored the consequences of accounting for covariance between toilet characteristics, by including all hypothesized variables as covariates concurrently.
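For the Zambia analysis with only nine clusters, the idea behind wild bootstrap inference can be sketched as a restricted wild cluster bootstrap with Rademacher weights, one of several variants. The choice of variant, the single-regressor model and the function names (`cluster_robust_t`, `wild_cluster_bootstrap_p`) are assumptions for illustration, not the study’s exact specification. Each bootstrap draw flips the sign of whole clusters’ residuals from the intercept-only model that imposes the null, and the p-value is the share of bootstrap cluster-robust t-statistics at least as extreme as the observed one.

```python
import numpy as np


def cluster_robust_t(X, y, clusters, j):
    """OLS t-statistic for coefficient j with CR1 cluster-robust standard errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    groups = np.unique(clusters)
    meat = np.zeros((k, k))
    for g in groups:
        score = X[clusters == g].T @ resid[clusters == g]
        meat += np.outer(score, score)
    g_count = len(groups)
    scale = g_count / (g_count - 1) * (n - 1) / (n - k)  # CR1 small-sample factor
    vcov = scale * bread @ meat @ bread
    return beta[j] / np.sqrt(vcov[j, j])


def wild_cluster_bootstrap_p(y, x, clusters, n_boot=999, seed=0):
    """Restricted wild cluster bootstrap p-value for H0: slope on x = 0.

    Rademacher weights (+1/-1) are drawn once per cluster and applied to the
    residuals of the intercept-only model, which imposes the null hypothesis.
    """
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    X = np.column_stack([np.ones_like(y), np.asarray(x, dtype=float)])
    t_obs = cluster_robust_t(X, y, clusters, j=1)
    fitted_null = np.full_like(y, y.mean())
    resid_null = y - fitted_null
    groups, cluster_index = np.unique(clusters, return_inverse=True)
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_boot):
        weights = rng.choice([-1.0, 1.0], size=len(groups))
        y_star = fitted_null + weights[cluster_index] * resid_null
        if abs(cluster_robust_t(X, y_star, clusters, j=1)) >= abs(t_obs):
            exceed += 1
    return t_obs, exceed / n_boot
```

With nine clusters there are only 2⁹ = 512 distinct Rademacher sign patterns, which limits how finely the bootstrap p-value can be resolved; other weight distributions (for example, six-point weights) are sometimes preferred in that situation.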

Item response theory

We used the graded response model (GRM) to assess the psychometric properties of each attribute and its contribution to the information function for unweighted SanQoL-5. GRM is widely used in the evaluation of health-related QoL measures because it accommodates polytomous items, that is, items with more than two response levels58,59. GRM is not part of the Rasch family because it allows discrimination to vary across items35. For IRT analyses, we pooled data across countries, as well as running models for individual countries where n ≥ 1,000 (Ethiopia, India, Kenya and Malawi)35. Based on the GRM, we present item information and test information functions, as well as category characteristic curves.
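The GRM quantities reported in the results (category characteristic curves, item information and test information) follow directly from Samejima’s formulation once item parameters are known. The sketch below computes them for a grid of theta values; the discrimination and threshold values in `ITEMS` are hypothetical placeholders, not estimates from this study, and higher categories are assumed to correspond to higher theta.

```python
import numpy as np


def grm_bounds(theta, a, b):
    """Cumulative curves P*(X >= k | theta), padded with 1 (k = 0) and 0 (k = K)."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(b, dtype=float)  # K - 1 ordered thresholds
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    n = len(theta)
    return np.hstack([np.ones((n, 1)), p_star, np.zeros((n, 1))])


def grm_category_probs(theta, a, b):
    """Category characteristic curves: P(X = k) = P*(X >= k) - P*(X >= k + 1)."""
    bounds = grm_bounds(theta, a, b)
    return bounds[:, :-1] - bounds[:, 1:]


def grm_item_information(theta, a, b):
    """Samejima's item information: sum over k of (dP_k/dtheta)^2 / P_k."""
    bounds = grm_bounds(theta, a, b)
    d_bounds = a * bounds * (1.0 - bounds)  # derivative of each logistic bound
    d_probs = d_bounds[:, :-1] - d_bounds[:, 1:]
    return np.sum(d_probs**2 / grm_category_probs(theta, a, b), axis=1)


# HYPOTHETICAL (discrimination a, thresholds b) per attribute; not estimates from this study.
ITEMS = {
    "privacy": (2.5, (-0.8, 0.6)),
    "disgust": (1.8, (-0.6, 0.9)),
    "safety": (1.9, (-0.9, 0.7)),
    "shame": (2.4, (-0.7, 0.8)),
    "disease": (1.6, (-0.5, 1.0)),
}
theta_grid = np.linspace(-4, 4, 81)
# Test information is the sum of the item information functions.
test_information = sum(grm_item_information(theta_grid, a, b) for a, b in ITEMS.values())
```

A distinct interior peak in the middle column of `grm_category_probs` corresponds to the ‘sometimes’ peak described in the results, and items with larger discrimination `a` contribute more information; in practice the parameters would come from fitting the GRM to the response data.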

Measurement invariance via DIF

For measures to be compared across countries or settings, it is important that there is equivalence of measurement and meaning. We explored measurement invariance using DIF by ordinal logistic regression because SanQoL-5 attributes are polytomous. Specifically, we followed the approach of Penton et al.26 based on level sum scores (LSS), as recommended for EQ-5D60. LSS is the sum of attribute level scores and can be thought of as an unweighted SanQoL-5 index score (Supplementary Information G). With 6 countries, there were 15 possible country pairs and hence, with 5 attributes, 75 possible instances of DIF overall. For each of the 15 pairs, we ran 2 models. In model 1, we regressed each attribute score (ranging from 0 to 2) on LSS using ordinal logistic regression, for those two countries only. Model 2 was the same regression with an added dummy variable for country (for example, 0 for Kenya and 1 for India). We calculated the difference in pseudo-R² between models 1 and 2, interpreting a difference of >2% as ‘meaningful’ DIF between those two countries, provided the coefficient on the country dummy had P < 0.05. This is the same cut-off used by Penton et al.26 and earlier authors61,62. We took the more conservative approach of not first ‘purifying’ LSS, as some studies do63.
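The two-model comparison for one attribute and one country pair can be sketched with a hand-written proportional-odds (ordinal logit) likelihood maximized by SciPy. Two choices here are assumptions rather than details from the text: the pseudo-R² is taken to be McFadden’s, and the P < 0.05 check on the country coefficient is omitted. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

COUNTRIES = ("Ethiopia", "India", "Kenya", "Malawi", "Mozambique", "Zambia")
N_DIF_INSTANCES = len(list(combinations(COUNTRIES, 2))) * 5  # 15 pairs x 5 attributes


def ordinal_logit_loglik(y, X):
    """Maximized log-likelihood of a proportional-odds model P(Y <= k) = expit(c_k - X @ beta)."""
    y = np.asarray(y, dtype=int)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    n_cuts = y.max()  # categories coded 0..K-1, so K - 1 cut-points
    rows = np.arange(n)

    def neg_loglik(params):
        beta, raw = params[:p], params[p:]
        # cumulative sum of positive gaps keeps the cut-points increasing
        cuts = np.cumsum(np.concatenate([raw[:1], np.exp(raw[1:])]))
        cdf = expit(cuts[None, :] - (X @ beta)[:, None])
        cdf = np.hstack([np.zeros((n, 1)), cdf, np.ones((n, 1))])
        probs = cdf[rows, y + 1] - cdf[rows, y]
        return -np.sum(np.log(np.clip(probs, 1e-300, None)))

    result = minimize(neg_loglik, np.zeros(p + n_cuts), method="BFGS")
    return -result.fun


def null_loglik(y):
    """Thresholds-only model: its MLE reproduces the observed category shares."""
    _, counts = np.unique(y, return_counts=True)
    return float(np.sum(counts * np.log(counts / counts.sum())))


def dif_delta_pseudo_r2(attribute, lss, country_dummy):
    """Change in McFadden pseudo-R2 from adding the country dummy (model 2 vs model 1)."""
    ll0 = null_loglik(attribute)
    ll1 = ordinal_logit_loglik(attribute, np.column_stack([lss]))
    ll2 = ordinal_logit_loglik(attribute, np.column_stack([lss, country_dummy]))
    return (1 - ll2 / ll0) - (1 - ll1 / ll0)
```

Under these assumptions, a returned value above 0.02 would flag candidate ‘meaningful’ DIF for that attribute and country pair; repeating the call over all pairs and attributes gives the 75 comparisons.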

Comparing question framings

In the first two studies in which the SanQoL-5 was used14,45, the questions had been framed such that ‘always’ was the best outcome, for example, ‘Can you use the toilet without feeling disgusted?’. Mixed-methods cognitive and piloting work in support of the Zambia study found this framing difficult to understand in local languages without further explanation (as well as in other languages spoken by the team, for example, Hindi). To facilitate a comparison, we included the old (‘always = best’) questions alongside the new/current question framing (Table 1) in Zambia. A third of the Ethiopia sample (n = 506), who were surveyed at a similar time, were also asked both sets of questions. A further analysis in the present study therefore compared the performance of the ‘always = best’ and ‘always = worst’ framings, using the same validity and reliability methods as above. For example, we tested the construct validity hypotheses under the two question framings for the five SanQoL-5 attributes and compared results. For a fair comparison in Ethiopia, we compared results only for the n = 506 respondents who completed both question formulations (rather than the full n = 1,586 sample).

Ethics

The Malawi study received prior approval from the National Committee on Research in the Social Sciences and Humanities (ref: NCST/RTT/2/6) in Malawi. The Mozambique study received prior approval from the Comité Institucional de Ética do Instituto Nacional de Saúde (ref: 028/CIE-INS/2023) in Mozambique. The Zambia study received prior approval from the University of Zambia Biomedical Research Ethics Committee (ref: UNZA-1389/2020). The India study received prior approval from Convergent Institutional Review Board (ref: 2023-24/019) in India. The Kenya study received prior approval from the AMREF Ethical and Scientific Review Committee (ref: P1508-2023) in Kenya. The Ethiopia data were collected as part of an internal evaluation by World Vision, who secured a prior approval letter from each district sampled for data collection. Use of the Ethiopia data was approved by the London School of Hygiene & Tropical Medicine (LSHTM) because anonymized data had been made openly available online by World Vision at https://osf.io/x5myz/ before this study commenced. The protocol covering Ethiopia and Zambia was approved by the LSHTM MSc Research Ethics Committee (Ref: 29049), while the LSHTM Observations/Interventions Research Committees approved the studies in India/Kenya (ref: 29640), Malawi (ref: 28249) and Mozambique (ref: 28190). Informed consent was obtained from all research participants before studies commenced. Participants were not compensated in Ethiopia, India, Kenya, Malawi or Zambia. In Mozambique, participants were given a paper calendar. This study was performed in line with the principles of the Declaration of Helsinki.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.