Introduction

The Sustainable Development Goals (SDGs) adopted by the United Nations Member States set ambitious targets for water, sanitation and hygiene (WASH) under Goal 6, in which Target 6.1 calls for universal access to safe drinking water for all1,2. The implementation of ambient water quality monitoring programs is important, as reflected in SDG indicator 6.3.2; however, most developing nations lack this capacity3,4. The availability of physical and process infrastructure, along with effective planning and management, is a basic and essential requirement for ensuring the safety of drinking water5. Water testing is highly important for monitoring the efficiency of water supply operations6. Countries should ensure that water supplies, whether public or private, are operated on the basis of risk-management strategies and that water safety is ensured through the regular testing of all types of supplies7. Water testing provides an early warning that allows remedial actions to be planned to reduce undesirable constituents to acceptable levels; hence, water quality data should be reliable and dependable5,8.

Challenges of water quality monitoring

Various studies have identified constraints on testing water quality in well-equipped laboratories, particularly in developing nations and low-income settings. Owing to inadequate water quality testing infrastructure in public health laboratories, the regular generation of accurate water quality data is a great challenge for developing nations3,9,10,11. In accordance with the operational guidelines of the Jal Jeevan Mission in India, the sub-district water quality laboratory must test 100% of the water sources within its jurisdiction, the district water quality laboratory must test 3,000 samples per year, and the state water quality laboratory must test at least 5% of the total drinking water samples/sources across all districts6. During 2023–24, approximately 2,100 water quality testing laboratories across all states and Union territories tested approximately 7.42 million water samples12, which clearly indicates that water quality testing requires tremendous resources. Hence, owing to the limited resources available to set up well-equipped water testing laboratories in adequate numbers, laboratory testing of the water quality of every source is quite challenging in developing countries10,13. The most viable solution under such circumstances is to monitor water quality via easy-to-use water quality field test kits (FTKs). The initial screening and periodic monitoring of all drinking water sources can be performed cost-effectively with FTKs by involving communities. FTK results can also be confirmed by accurate laboratory analysis wherever required14.

Community-based water quality monitoring and surveillance

Community-based monitoring is desirable for generating data in challenging situations when the government or other institutions are unable to monitor the quality of the environment due to limited resources15,16,17,18. The village-level water and sanitation committee must ensure that all drinking water sources are tested via the FTK by involving the local community; to achieve this objective, five people, preferably women, in each of the 600 thousand villages of India are trained to use the FTK6,19. The role of women is central in the provision and management of water according to Goal 6 of the 2030 Agenda for Sustainable Development4. As per the operational guidelines for implementation of the Jal Jeevan Mission (JJM), the active participation of women at all levels of institutional arrangements, with special emphasis on the village level, has to be ensured6. After adequate training for 2 to 3 days, FTK operators can test many water sources and provide test results in time for further interventions. FTKs are expected to become more prevalent as technological interventions improve them5,6,20,21,22. Community-based water quality monitoring data have been used to evaluate water quality trends and identify hotspots for decision-making processes, thereby emphasizing the need for affordable testing via FTK23,24. FTKs have been used to test water sources for arsenic contamination to establish water safety25.

Reliability of water quality field test kits

To ensure accuracy, reliability and dependability, water quality testing programs, whether in the laboratory or in the field, should be subjected to quality assurance (QA) and quality control (QC) procedures. Measurement accuracy is the closeness of agreement between a measured quantity value and a true quantity value of a measurand8. Generally, QA and QC procedures are implemented in laboratory settings and are not available for field test kits. FTKs, when used appropriately, are useful tools for the rapid and cost-effective testing of many contaminants in a non-formal laboratory setting in comparison with tests performed in conventional laboratories25. The performance of FTKs suffers whenever demand for a large number of units emerges, owing to the lack of a quality assurance system at the manufacturer's end26. Hence, proper evaluation and validation of the FTK prior to its application is an essential prerequisite5. The QA plan for FTK should include laboratory evaluation of kits, training and capacity building of operators, systems for data collection and analysis and checking of QC samples in the laboratory14. The critical parameters for performance evaluation of the FTK include accuracy, linearity, operator bias, and ease of handling27. It is also critical to assess FTK results for false positives and false negatives by comparing them with the results of the same parameters obtained via accurate laboratory analysis10,14. Hence, the sensitivity and specificity of kits for detecting the parameter of interest are significant attributes. The Bureau of Indian Standards (BIS) specification specifies test parameters and permissible tolerances for checking the conformity of the FTK at ± 5% and recommends that if three consecutive readings of the FTK are beyond the acceptable limits but within the specified tolerances, then reverification via a laboratory method may be employed28.
In view of the above, the objectives of the reliability assessment of water quality FTKs include (i) performance of a testing regime in which spiked water samples of different concentrations are tested according to the FTK testing instructions, with replicate tests conducted on each water sample to arrive at statistically reliable results; (ii) performance of corresponding tests via standard laboratory analytical methods; and (iii) comparison of the results of tests performed via FTK with the corresponding laboratory test results via appropriate statistical methods.

Adequacy and safety of water quality test kits

Apart from producing reliable test data, water quality FTKs have other significant attributes, such as ‘adequacy of FTK’, ‘safety aspects of FTK’, and ‘information made available to users’. The BIS specification describes the requirements of the apparatus and reagents, including the shelf-life of chemicals, fabrication, labelling and marking28; however, in the present study, we intend to provide inputs for the development of a protocol to quantify the adequacy, safety and information aspects of the FTK. The ‘adequacy of FTK’ is an important aspect, which means that (i) the FTK has sufficient consumables to perform the stated number of tests; (ii) the FTK is easily portable for on-site analysis; (iii) the FTK is not congested with material; (iv) the FTK is user-friendly, meaning that its operating procedures are not complicated; and (v) the main apparatus and accessories are adequate10. The ‘safety aspects of FTK’ are another very significant parameter to be considered during evaluation, as FTKs are to be used by community members who are not familiar with the harmful effects of chemicals. The attributes related to safety are (i) hazards involved in using the FTK; (ii) leakage of chemicals from droppers, reagent bottles, etc.; (iii) leakage of gas, where gas evolves during testing; (iv) packaging of chemicals; and (v) fragility of the apparatus and reagent bottles10. The ‘information for users’ is also a very prominent aspect, which has the following attributes: (i) the user’s instructions are simple and adequate; (ii) necessary precautions are covered; (iii) procedures are provided for the disposal of used material; (iv) specifications of chemicals are available in the public domain; and (v) remedial measures are provided in case of accidents10.

Evaluation of field test kits in the laboratory and field

The present study presents a comprehensive evaluation of four commercially available arsenic FTKs at the Shriram Institute for Industrial Research, Delhi, India. All the FTKs were procured and codified as AsKit-1, AsKit-2, AsKit-3 and AsKit-4 to maintain confidentiality, as the major focus of this study is to develop and validate the FTK evaluation protocol. The chemicals/reagents supplied in the kits are codified, and their identities are not available in the public domain. However, field test kits for arsenic testing are generally based on the Gutzeit method with some modifications. In general, as per this method, arsenic in a water sample is reduced to arsine (AsH3) gas by reaction with powdered metallic zinc or any other reducing agent in the presence of acid; the gas is passed through a column containing a roll of cotton or filter impregnated with lead acetate [Pb(C2H3O2)2] solution to remove interferences. The generated arsine produces a yellow–brown or pink stain/colour on a test paper or graduated detector tube (generally impregnated with mercuric bromide) marked with arsenic concentration points. The intensity of the stain/colour is proportional to the arsenic concentration in the sample. The test results obtained with the kit are qualitative or semi-quantitative and need to be validated with accurate testing procedures. Hence, a laboratory evaluation of FTKs with respect to their effectiveness in testing arsenic was performed, followed by a field evaluation of the most effective FTK.

Materials and methods

Experimental water preparation and arsenic recovery

Arsenic (as As) concentrations ranging between 5 and 500 µg/L were selected on the basis of the concentration points specified in the FTKs under evaluation to determine the maximum number of samples spiked with standards required for the evaluation of FTKs (Table 1). Experimental water samples were prepared by spiking standards using certified reference materials (CRM) of arsenic traceable to the National Institute of Standards and Technology (NIST). The standards were spiked at different concentrations in demineralized (DM) water as well as in natural (ground) water to obtain water samples of different matrices. Groundwater samples from two different sources were selected: water with total dissolved solids of 550 mg/L, termed matrix-1 (M1) water, and water with total dissolved solids of 1200 mg/L, termed matrix-2 (M2) water. Demineralized water represented control water (true negative). The arsenic concentration (as As) was analysed via the continuous hydride generation atomic absorption spectrometric (AAS-HG) method8,29.

Table 1 Arsenic concentration range and arsenic (as As) detection by FTKs.

Technical efficiency of the FTK

The ability of the FTK to detect the presence/absence of arsenic was evaluated via (i) detection of the test parameter by the FTK in ‘experimental water’ at, or approximately at, the specified concentration level, indicating a true positive ‘a’; (ii) detection of the test parameter by the FTK in ‘blank water’, or at a level higher than the specified concentration, indicating a false positive ‘b’; (iii) non-detection of the test parameter by the FTK in ‘experimental water’, or detection at a level lower than the specified concentration, indicating a false negative ‘c’; and (iv) non-detection of the test parameter by the FTK in ‘blank water’ or in water containing the specified component below the detection level, indicating a true negative ‘d’. The reliability of the FTK in terms of technical efficiency was evaluated through estimation of sensitivity, specificity and overall testing efficiency. Sensitivity (SEArsenic) is the ability of the FTK to correctly identify the presence of arsenic in a sample and is associated with true positives and false negatives (Eq. 1). Specificity (SPArsenic) is the ability of the FTK to correctly identify the absence of arsenic in a sample and is associated with true negatives and false positives (Eq. 2)30,31. The overall testing efficiency (OTEArsenic) is, therefore, associated with both the sensitivity and specificity (Eq. 3)10,32.

Hence,

$$\text{SE}_{\text{Arsenic}} = \frac{\text{a}}{\text{a} + \text{c}} \times 100$$
(1)
$$\text{SP}_{\text{Arsenic}} = \frac{\text{d}}{\text{b} + \text{d}} \times 100$$
(2)
$$\text{OTE}_{\text{Arsenic}} = \frac{\text{a} + \text{d}}{\text{a} + \text{b} + \text{c} + \text{d}} \times 100$$
(3)
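Equations 1 to 3 can be evaluated directly from the four counts. The following is a minimal Python sketch, not the authors' own software; the function name `technical_efficiency` and the example counts are hypothetical and serve only to illustrate the arithmetic.

```python
def technical_efficiency(a, b, c, d):
    """Return (SE, SP, OTE) in percent, following Eqs. 1-3.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    if a + c == 0 or b + d == 0:
        raise ValueError("need at least one positive and one negative sample")
    se = 100.0 * a / (a + c)                  # Eq. 1: sensitivity
    sp = 100.0 * d / (b + d)                  # Eq. 2: specificity
    ote = 100.0 * (a + d) / (a + b + c + d)   # Eq. 3: overall testing efficiency
    return se, sp, ote


# Hypothetical counts, for illustration only (not data from this study)
se, sp, ote = technical_efficiency(a=18, b=2, c=1, d=19)
print(f"SE = {se:.1f}%, SP = {sp:.1f}%, OTE = {ote:.1f}%")
```

Because the sensitivity depends only on a and c, and the specificity only on b and d, a kit can show 100% sensitivity while false positives still pull down its specificity and overall testing efficiency.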

Data analysis and interpretation for overall performance evaluation

The major parameters of performance evaluation of FTK are technical efficiency (KTe), adequacy (KAdq), safety (KSafe) and information (KInf), which are quantified by assigning a statistical weight on a 1–10 scale to each parameter to obtain the individual parameter weight (IPW)33. Accordingly, five indicators each of adequacy (A1 to A5), safety (S1 to S5) and information (I1 to I5) were identified (Table 2), and each FTK was scored on the basis of compliance with set criteria: each indicator carries ‘2’ points for compliance, ‘1’ point for partial compliance and ‘0’ points for non-compliance (Table 2). The opinions of five people were taken into account when assigning weights to each indicator. The technical efficiency weight (KTe) was estimated as one-tenth of OTEArsenic, whereas the weights for adequacy (KAdq), safety (KSafe) and information (KInf) were estimated by \(\sum\nolimits_{{\text{i}} = 1}^{5} {\text{A}}_{\text{i}}\), \(\sum\nolimits_{{\text{i}} = 1}^{5} {\text{S}}_{\text{i}}\) and \(\sum\nolimits_{{\text{i}} = 1}^{5} {\text{I}}_{\text{i}}\), respectively. The overall performance of the FTK was assessed by assigning a parameter importance factor (PIF) to each parameter34 to estimate the cumulative parameter weight (CPW). FTK adequacy and information are the parameters of least importance, as shortcomings with respect to these parameters can be remedied with some interventions; the safety parameter is at least twice as important as adequacy or information, as remedying shortcomings with respect to this parameter may require some research work; the technical efficiency parameter is five to six times as important as adequacy or information, as remedying shortcomings with respect to this parameter may require major research work10. Hence, the cumulative parameter weight (CPW = 6KTe + KAdq + 2KSafe + KInf) can be estimated on a 1–100 scale.
Accordingly, the ranking and rating of FTKs can be performed by adopting the following criteria: 90 ≤ CPW ≤ 100 (Excellent/A1 grade); 80 ≤ CPW < 90 (Good/A2 grade); 70 ≤ CPW < 80 (Satisfactory/B1 grade); 60 ≤ CPW < 70 (Average/B2 grade); and CPW < 60 (Below average/C grade)10.
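The weighting and grading scheme can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: it applies an importance factor of 6 to technical efficiency, 2 to safety and 1 each to adequacy and information (consistent with the weighted values reported in Table 4), and the function names and example indicator scores are hypothetical.

```python
def cumulative_parameter_weight(ote_percent, adequacy, safety, information):
    """CPW on a 100-point scale from OTE (%) and three lists of five indicator scores."""
    for name, scores in (("adequacy", adequacy), ("safety", safety),
                         ("information", information)):
        if len(scores) != 5 or any(not 0 <= s <= 2 for s in scores):
            raise ValueError(f"{name}: expected five scores between 0 and 2")
    k_te = ote_percent / 10.0      # technical efficiency weight, 0-10
    k_adq = sum(adequacy)          # each parameter weight ranges from 0 to 10
    k_safe = sum(safety)
    k_inf = sum(information)
    # Importance factors: 6 (technical), 1 (adequacy), 2 (safety), 1 (information)
    return 6 * k_te + k_adq + 2 * k_safe + k_inf


def grade(cpw):
    """Map a CPW value to the rating bands described in the text."""
    if cpw >= 90:
        return "Excellent/A1"
    if cpw >= 80:
        return "Good/A2"
    if cpw >= 70:
        return "Satisfactory/B1"
    if cpw >= 60:
        return "Average/B2"
    return "Below average/C"


# Hypothetical kit: OTE of 90% and illustrative indicator scores
cpw = cumulative_parameter_weight(90.0, [2, 2, 2, 1, 2], [2, 2, 2, 2, 1], [2, 1, 1, 0, 1])
print(round(cpw, 2), grade(cpw))
```

With these factors the maximum attainable CPW is 60 + 10 + 20 + 10 = 100, so technical efficiency dominates the rating, as intended by the importance ordering.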

Table 2 Indicators for FTK Adequacy, Safety and Information.

Field testing of water via FTK and laboratory testing

Field testing of water via FTK was performed in the Bhojpur district of Bihar, where groundwater sources at many locations are reported to have concentrations of arsenic above the permissible level29,35,36. Field-tested samples were preserved at the site and transported to the laboratory to test arsenic via the AAS-HG method within the maximum permissible time limits as per the guidelines of Standard Methods for Examination of Water and Wastewater8.

Results

Arsenic recovery in experimental water

The percent relative standard deviation (%RSD) for the average recovery (n = 5) of arsenic (as As) in experimental water (with a true concentration ranging between 5 and 100 µg/L) was estimated to range from 2.2 to 5.0% in the DM water matrix, 1.8 to 5.3% in the M1 water and 2.5 to 6.7% in the M2 water (Fig. 1). The results of multiple linear regression indicate a very strong correlation (Pearson) between the true arsenic concentration (T) and arsenic recovery in DM water (DM), M1 water and M2 water (T: DM equals 0.9998; T: M1 equals 0.9998; T: M2 equals 0.9999). The one-way ANOVA test statistic, F (df 3, 44; right-tailed), equals 0.00052322, which lies within the 95% region of acceptance (p value = 0.99998); the mean ± SD (n = 12) is 48.3 ± 31.8 µg/L for true values of arsenic, 48.8 ± 31.9 µg/L for arsenic recovery in DM water, 48.6 ± 31.9 µg/L for arsenic recovery in M1 water and 48.8 ± 32.1 µg/L for arsenic recovery in M2 water.
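For readers who wish to reproduce the %RSD of replicate recoveries, a minimal Python sketch follows. It assumes that the sample standard deviation (n − 1 denominator) is used, which the text does not state explicitly; the function name `percent_rsd` and the replicate values are hypothetical.

```python
from statistics import mean, stdev


def percent_rsd(values):
    """Percent relative standard deviation of replicate measurements."""
    if len(values) < 2:
        raise ValueError("at least two replicates are required")
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; %RSD is undefined")
    # Assumption: sample standard deviation (n - 1 in the denominator)
    return 100.0 * stdev(values) / m


# Hypothetical replicate recoveries (µg/L) at one concentration point, n = 5
replicates = [49.1, 50.6, 48.7, 51.2, 50.0]
print(f"%RSD = {percent_rsd(replicates):.1f}")
```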

Fig. 1 Average recovery of arsenic (as As) at each concentration point (n = 5) in different matrices.

Laboratory evaluation results

The results obtained with the FTKs were compared with the results of AAS-HG (in the range of ± 5%). SEArsenic, SPArsenic and OTEArsenic for (i) AsKit-1 (n = 42 for pooled matrix & n = 14 for the respective matrix) were estimated to be 100.0%, 92.3% and 97.6%, respectively (OTEArsenic estimated at 100.0% each in DM water & M1 water and 92.9% in M2 water); (ii) AsKit-2 (n = 27 for pooled matrix & n = 9 for the respective matrix) were estimated to be 100.0%, 80.0% and 88.9%, respectively (OTEArsenic estimated at 88.9% each in DM water, M1 water and M2 water); (iii) AsKit-3 (n = 33 for pooled matrix & n = 11 for the respective matrix) were estimated to be 100.0%, 70.6% and 84.8%, respectively (OTEArsenic estimated at 90.9% in DM water and 81.8% each in M1 water and M2 water); and (iv) AsKit-4 (n = 33 for pooled matrix & n = 11 for the respective matrix) were estimated to be 100.0%, 66.7% and 81.8%, respectively (OTEArsenic estimated at 81.8% each in DM water, M1 water and M2 water) (Table 3). The technical efficiency weights (KTe) for AsKit-1, AsKit-2, AsKit-3 and AsKit-4 were estimated to be 9.76, 8.89, 8.48 and 8.18, respectively. The corresponding KTe values with importance factors were estimated to be 58.56, 53.34, 50.88 and 49.08, respectively (Table 4). The limit of detection is 10 µg/L for AsKit-1, 10 µg/L for AsKit-2, 5 µg/L for AsKit-3 and 5 µg/L for AsKit-4 (Table 1). The estimated values of OTEArsenic are 97.6% (RSD 4.2%) for AsKit-1, 88.9% (RSD < 1%) for AsKit-2, 84.8% (RSD 6.2%) for AsKit-3 and 81.8% (RSD < 1%) for AsKit-4 (Table 3).

Table 3 Sensitivity, specificity and overall testing efficiency of the FTK evaluation in the laboratory.
Table 4 Statistical weights (individual parameters and cumulative parameters).

The weights for adequacy (KAdq), safety (KSafe) and information (KInf) based on the set criteria (Table 2) and the application of importance factors were estimated to be 10.0, 20.0 and 4.0, respectively, for AsKit-1; 10.0, 20.0 and 4.0, respectively, for AsKit-2; and 10.0, 20.0 and 4.0, respectively, for AsKit-3; and 6.0, 12.0 and 2.0, respectively, for AsKit-4 (Table 4). Information is one parameter in which all the kits score low points, as critical information for the users is not included in the kit manual. Accordingly, the CPW estimated for AsKit-1 was 92.56 (Excellent/A1 grade); for AsKit-2, it was 87.34 (Good/A2 grade); for AsKit-3, it was 84.88 (Good/A2 grade); and for AsKit-4, it was 69.08 (Average/B2 grade) (Table 4).

Field testing results vis-à-vis laboratory testing

AsKit-1 was used in the field to test arsenic (as As) in water samples drawn from 80 sources, the results of which ranged from < 10 µg/L (minimum detection level of AsKit-1) to 100 µg/L, with arsenic not detected in 28 samples (Table 5). The results of all the samples analysed via the AAS-HG method indicated that the concentration of arsenic (As) ranged from < 5 µg/L (minimum detection level of AAS-HG) to 97 µg/L. Among the 28 samples in which arsenic was not detected in the field via AsKit-1 (n = 80), 27 were found to contain arsenic at concentrations < 5 µg/L via the AAS-HG method, and one was found to contain arsenic at a concentration of 5.3 µg/L, which is below the 10 µg/L detection level of AsKit-1; all 28 samples were therefore counted as true negatives (Table 5). Compared with the AAS-HG results (within ± 5%), the AsKit-1 results were true positive in 16 samples, false positive in 19 samples and false negative in 17 samples; thus, SEArsenic, SPArsenic and OTEArsenic for AsKit-1 (n = 80) in the field vis-à-vis AAS-HG were estimated to be 48.5%, 59.6% and 55.0%, respectively (Table 5). Compared with the AAS-HG results (within ± 10%), the AsKit-1 results were true positive in 44 samples, false positive in 3 samples and false negative in 5 samples; SEArsenic, SPArsenic and OTEArsenic for AsKit-1 (n = 80) in the field compared with AAS-HG were thus estimated to be 89.8%, 90.3% and 90.0%, respectively (Table 5). Hence, FTK results are best compared with accurate laboratory results at a tolerance of ± 10%.
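The effect of the tolerance band on the classification of paired results can be illustrated with a short Python sketch. The decision rule below is one plausible reading of the definitions of a, b, c and d given in the methods, not a procedure confirmed by the authors: a sample counts as arsenic-positive when the laboratory result reaches the kit detection level, and a kit reading outside the tolerance band around the laboratory result is counted as a false positive (too high) or false negative (too low). The function name `classify` and the example pairs are hypothetical.

```python
from collections import Counter


def classify(kit, lab, kit_lod=10.0, tolerance=0.10):
    """Classify one paired result as 'TP', 'FP', 'FN' or 'TN'.

    kit: kit reading in µg/L, or None when arsenic was not detected.
    lab: reference laboratory result in µg/L.
    Assumed rule (see lead-in): the band is lab * (1 ± tolerance).
    """
    lab_positive = lab >= kit_lod
    if kit is None:
        return "FN" if lab_positive else "TN"
    if not lab_positive:
        return "FP"
    if kit > lab * (1 + tolerance):
        return "FP"   # kit overestimates beyond the tolerance band
    if kit < lab * (1 - tolerance):
        return "FN"   # kit underestimates beyond the tolerance band
    return "TP"


# Hypothetical (kit, lab) pairs in µg/L, for illustration only
pairs = [(None, 4.0), (None, 5.3), (50.0, 46.0), (25.0, 24.1), (10.0, 14.0)]
for tol in (0.05, 0.10):
    counts = Counter(classify(k, l, tolerance=tol) for k, l in pairs)
    print(f"±{tol:.0%}:", dict(counts))
```

Under this rule, a kit reading of 50 µg/L against a laboratory value of 46 µg/L is a false positive at ± 5% but a true positive at ± 10%, which shows how a narrow band penalizes kits that report at fixed concentration points.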

Table 5 Arsenic test results: AsKit-1 in the field and AAS-HG in the laboratory.

Relationship between the AAS-HG results and AsKit-1 results

The present study sought to determine the relationship between the AsKit-1 test results obtained in the field and the AAS-HG results obtained in the laboratory via statistical models such as regression and chi-square analysis. Before these models are applied, it is important to confirm that their assumptions are fulfilled. The initial assumption of regression analysis is a linear relationship between the dependent and independent variables, with the dependent variable exhibiting a normal distribution. The Q–Q plot generated via log-transformed data (Fig. 2) shows that the data are symmetrically distributed and tend to cluster towards the middle of the plot, thereby supporting normality after transformation. The box and whisker plot (n = 52) (Fig. 3) indicates lower quartile values (Q1) of 10.00 µg/L for AsKit-1 and 11.00 µg/L for AAS-HG, upper quartile values (Q3) of 65.00 µg/L for AsKit-1 and 64.25 µg/L for AAS-HG and an interquartile range (IQR) of 55.00 µg/L for AsKit-1 and 53.25 µg/L for AAS-HG. Another important assumption of regression analysis is homoscedasticity, that is, the variance of the residuals should be constant. The residual plot reveals a symmetric distribution around the 0 (central) line; hence, the assumption of homoscedasticity is also fulfilled. The standard deviation of the residuals (Sres) equals 5.565 (Fig. 4), which indicates that the regression line fits the data well. The regression line equation for the AAS-HG results (Y) and the AsKit-1 results (X) above the detection level (10 µg/L) of AsKit-1 (n = 52) is \({\hat{\text{Y}}}\) = 0.8907 + 0.9867X (Fig. 5). An R-squared (R2) value of 0.9678 means that 96.8% of the variability of the AAS-HG results is explained by the AsKit-1 results. The regression equation thus confirms a strong relationship between the AAS-HG and AsKit-1 results, with most of the variability explained by the model.
The regression plot shows that the model has a small mean squared error, as the points cluster tightly around the best-fit line. Moreover, these results are supported by the prediction interval plot, in which most of the data points fall within the 95% prediction interval, highlighting the model’s accuracy. A correlation coefficient (R) of 0.9838 means that there is a very strong direct relationship between the results of AsKit-1 and AAS-HG. The slope, b1 = 0.9867 (CI 0.9356, 1.0378), means that for each unit increase in the AsKit-1 result, the AAS-HG result increases by 0.9867 (Fig. 5 and Fig. 6). The test of overall regression significance (right-tailed) gives F (df1:1; df2:50) = 1503.52 with a p value < 0.05, which means that the null hypothesis H0 of no linear relationship can be rejected (Fig. 7).
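The reported regression statistics are linked by standard identities for simple linear regression, which a short Python sketch can make explicit. This is an illustrative implementation of ordinary least squares, not the authors' software; the function names and the example data are hypothetical. In simple regression, F = R² / (1 − R²) × (n − 2), so the reported R² of 0.9678 with 50 residual degrees of freedom corresponds to F ≈ 1503, in line with the reported value of 1503.52 (the small difference arises from rounding of R²).

```python
from statistics import mean


def f_from_r2(r2, df2):
    """Overall F statistic of a simple linear regression from R-squared."""
    return r2 / (1.0 - r2) * df2


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5       # Pearson correlation coefficient
    r2 = r * r
    f = float("inf") if r2 >= 1.0 else f_from_r2(r2, n - 2)
    return {"slope": slope, "intercept": intercept, "r": r, "r2": r2, "f": f}


# Hypothetical paired readings in µg/L (x = kit, y = laboratory)
kit = [10.0, 25.0, 50.0, 50.0, 100.0]
lab = [11.2, 23.8, 47.5, 53.1, 96.4]
fit = simple_linear_regression(kit, lab)
print({k: round(v, 4) for k, v in fit.items()})
print(round(f_from_r2(0.9678, 50), 1))  # consistency check on the reported values
```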

Fig. 2 Residual Q–Q plot (log-transformed data).

Fig. 3 Box and whisker plot.

Fig. 4 Residual plot.

Fig. 5 Line fit plot (results of AsKit-1 vis-à-vis AAS-HG).

Fig. 6 Prediction interval plot (results of AsKit-1 vis-à-vis AAS-HG).

Fig. 7 Distribution: F (df1:1; df2:50), right-tailed.

The present study also aimed to examine whether the observed frequencies differ significantly from the expected values. The hypotheses are as follows: (i) null hypothesis (H0): there is no significant difference between the observed frequencies and expected values; and (ii) alternative hypothesis (H1): there is a significant difference between the observed frequencies and expected values. A goodness-of-fit model using the chi-square (χ2) distribution (k = 8; df = 7; n = 159; right-tailed) was applied to the observed frequencies and expected values of AsKit-1 and the observed frequencies and expected values of AAS-HG. The values in both cases were categorized into four different concentration ranges: < 10, ≥ 10 < 20, ≥ 20 < 50, and ≥ 50 < 100. The modelling indicates that the test statistic χ2 equals 0.6892, which lies within the 95% region of acceptance (-∞: 14.0671). Since the p value (0.9984) > α, H0 is not rejected, and the statistical model fits the observations (Fig. 8). The observed effect size phi (Φ) is small (0.066), indicating that the magnitude of the difference between the observed and expected data is small (Table 6). Hence, there is no significant difference between the observed frequencies and expected values.
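The goodness-of-fit arithmetic can be sketched in Python as follows. This is an illustrative sketch with hypothetical function names and frequencies; it assumes that the effect size is computed as Φ = √(χ²/n), which reproduces the reported value of 0.066 from χ² = 0.6892 and n = 159, and it takes the critical value of 14.0671 (df = 7, α = 0.05) from the text rather than computing it.

```python
from math import sqrt

CRITICAL_CHI2_DF7_95 = 14.0671  # upper bound of the 95% acceptance region, as quoted in the text


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def phi_effect_size(chi2, n):
    """Effect size phi, assumed here to be sqrt(chi2 / n)."""
    return sqrt(chi2 / n)


# Hypothetical observed and expected frequencies in k = 8 categories
observed = [28, 8, 14, 30, 27, 9, 13, 30]
expected = [27.5, 8.5, 13.5, 30.0, 27.5, 8.5, 13.5, 30.0]
chi2 = chi_square_statistic(observed, expected)
print(f"chi2 = {chi2:.4f}; H0 rejected: {chi2 > CRITICAL_CHI2_DF7_95}")
print(f"phi for the reported statistic = {phi_effect_size(0.6892, 159):.3f}")
```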

Fig. 8 χ2 distribution.

Table 6 Goodness-of-fit Chi (χ2) square model inputs.

Discussions

Technical performance of the FTK

The FTK conformity is checked at ± 5% of the true concentration of a parameter to be tested in the laboratory28. Within ± 5% of the arsenic at different concentration points in the variable water matrix (Table 2), we identified one false positive by AsKit-1 (n = 42), three false positives by AsKit-2 (n = 27), five false positives by AsKit-3 (n = 33) and six false positives by AsKit-4 (n = 33). Reddy et al. (2020) evaluated eight commercially available kits by checking the performance of the kits vis-à-vis AAS-HG analysis results and reported that two kits provided both accurate and precise estimates of arsenic, four kits were either accurate or precise, and two kits were neither accurate nor precise37. Kearns and Tyson (2012) reported improved performance of a kit with increased reaction time38. George et al. (2012) carried out extensive studies with a kit and reported good performance, with a positive bias at higher concentrations24. Steinmaus et al. (2006) reported an overall good correlation between the results of arsenic testing by laboratory methods and by FTK but also identified a few instances in which the two results differed markedly; these discrepancies were attributed to the presence of chemical interferences in the water sources22. Kearns et al. (2019) reported poor performance of the arsenic FTK, particularly for waters containing relatively high concentrations of arsenic39. van Geen et al. (2019) reported that field test kits tend to overestimate the concentration of arsenic40. Our study revealed lower specificity (SPArsenic) for three kits, viz. AsKit-2 (80.0%), AsKit-3 (70.6%) and AsKit-4 (66.7%). The reason for the lower specificity is the increased number of false positives in the variable water matrix. AsKit-1, for which the specificity was estimated to be 92.3% and the overall testing efficiency 97.6%, was therefore selected for field testing of water.
The water quality testing conditions in the field are more variable and dynamic, in addition to the fact that field operators may face more challenges than laboratory technicians5,14. Field samples (n = 80) were tested via AsKit-1, and 19 false positives were detected during the comparison of the results, with a ± 5% range for the same samples tested via AAS-HG and 3 false positives during the corresponding comparison in the ± 10% range. This means that when testing by FTK, which is a screening tool to test water quality, the criterion of ± 5%, as given in BIS 18283 (2023), is quite stringent28. According to Nickson et al. (2019), ± 20% accuracy can be maintained by FTK41.

Adequacy, safety and information of FTK

BIS 18283 (2023) describes the qualitative and general requirements of the kit apparatus or instrument, reagents and chemicals, kit fabrication, legible and indelible labelling and marking and availability of the instruction sheet or operation manual28. Arora et al. (2009) reported that the ease of use of a field test kit influences the ability and effectiveness of the kit in performing the test25. Reddy et al. (2020) assessed kits considering several factors, including procedures with active steps, color chart readability, reagent packaging and information available in the kit manual37. We identified five indicators (each carrying a maximum weight of ‘2’) to assess FTK in terms of adequacy (KAdq), safety (KSafe) and information (KInf) with a maximum weight of ‘10’ for each parameter. With the application of importance factors, the non-technical portion of the CPW [(KAdq + 2KSafe + KInf); maximum score of 40] was estimated to be 34.0 each for AsKit-1, AsKit-2 and AsKit-3 and 20.0 for AsKit-4 (Table 2). The shortfall from the maximum score is attributed to inadequate information covered in the instruction manual. A quantitative assessment of FTKs based on IPW (KTe, KAdq, KSafe and KInf) and subsequent estimation of CPW can be an effective tool for developing a decision support system to select the most effective FTK to strengthen community-based water quality monitoring programs.

Conclusions

Field test kits play an important role in strengthening community-based water quality monitoring and surveillance programs in developing countries. However, the performance of FTKs varies widely; hence, there is a pressing need to develop a reliability model to evaluate the effectiveness of FTKs. In the present study, four arsenic FTKs were evaluated in the laboratory by testing specified concentration ranges in water of different matrices; the most effective FTK was subsequently used in the field for in situ testing of water, and the same water samples were analysed in the laboratory via the AAS-HG technique. During the laboratory evaluation of FTK, when the results of the kit were compared with those of AAS-HG in the ± 5% range, the overall testing efficiency (OTEArsenic), which reflects both sensitivity and specificity, was estimated to be 97.6% for AsKit-1, 88.9% for AsKit-2, 84.8% for AsKit-3 and 81.8% for AsKit-4. When AsKit-1 was used in the field to test the arsenic samples (n = 80), the OTEArsenic was estimated to be 55.0% when the field test results were compared with the AAS-HG results in the ± 5% range and 90.0% when the field test results were compared with the AAS-HG results in the ± 10% range. The lower overall testing efficiency of FTK in the field may be due to variable and dynamic conditions in the field and the possibility of operator bias due to the involvement of field staff.
On the basis of the present study, key recommendations for the development of protocols for the reliability assessment of FTKs are as follows: (i) an appropriate tolerance for variation between FTK results and accurate laboratory test results can be set at ± 10% in view of field dynamics and variability; (ii) FTK adequacy assessment can include indicators such as ease of handling, availability of consumables, kit portability, space in the kit box, etc.; (iii) FTK safety assessment can include indicators such as hazard assessment, leakage of liquid and gas, packaging of chemicals, fragility of the glass apparatus, etc.; and (iv) FTK information assessment can include indicators such as the availability of simplified instructions for the various methods of operation, safety measures, necessary precautions, instructions for the disposal of used material, etc. To quantify the assessment findings for the development of a decision support system to select the most effective kit, a weighting scheme can be applied to each parameter and indicator to estimate the IPW, and importance factors can be applied to estimate the CPW. In addition, training and orientation in the use of FTKs are important aspects; hence, an enabling environment may be created to increase the capacity of community members for the safe use of FTKs.