Introduction

The fraction of exhaled nitric oxide (FeNO) is a marker of type 2 airway inflammation used in asthma1,2,3,4,5,6. FeNO is measured in a short point-of-care pulmonary function test with portable chemoelectric analyzers, which have largely replaced stationary chemiluminescence analyzers because previous, often smaller, studies found the results to be clinically interchangeable7,8,9,10,11,12,13,14,15. FeNO is affected by airway flow, participant demographics, and lung function parameters16,17. As proper breathing maneuvers are crucial, it is unclear whether a participant’s ease of use of the analyzer influences FeNO results. Usability has not been comprehensively assessed for any FeNO analyzer18, and only a few studies have analyzed usability with visual analog scales or exam time14,19,20.

To assess the usability and clinical equivalence of FeNO analyzers, the current study was performed prospectively in a large adult general population cohort from the LEAD (Lung, hEart, sociAl, boDy) study. Three portable devices, NIOX VERO (CN, Circassia Pharmaceuticals plc, Oxford, United Kingdom), NObreath (BN, Bedfont, United Kingdom), and Vivatmo pro (BV, Bosch Healthcare Solutions, Waiblingen, Germany), were compared with each other and with the CLD 88 Analyzer (EC, Ecomedics, Durnten, Switzerland), a stationary chemiluminescence analyzer used as reference, regarding usability and measurement interchangeability.

Methods

Details of the LEAD study have been published elsewhere21. Briefly, the LEAD study recruited participants from 2011 until data collection in 2022. This study was a cross-sectional prospective study and was approved by the Ethics Committee of Vienna (EK-11-117-0711). All participants or their legal representatives provided signed and informed consent. All methods were performed in accordance with the relevant guidelines and regulations.

This study included participants from the LEAD general population cohort, comprising respiratory healthy and comorbid participants aged 18–91 years (Figure S1)21. A complete dataset of FeNO measurements was required, in addition to assessments of age, height, weight, sex, spirometry results, respiratory diseases (self-reported), and symptoms (respiratory symptoms, coughing, sputum production, breathlessness, wheezing)22. This study aimed for a general population cohort with randomly included comorbidities, specifically including participants with asthma and other respiratory comorbidities (Supplemental table S1). Participants not able to perform all planned measurements were excluded.

Devices were used in random order. Participants filled out usability questionnaires and completed other assessments, including a patient history with doctor-diagnosed comorbidities, as previously published21. All FeNO measurements were completed within a one-hour span on the same day, and not after lung function analyses with forced breathing maneuvers. All assessments were done by trained technical personnel and physicians. Measurements were done in a seated position, with visual and auditory cues, to establish an exhalation time of 10 s at a standardized exhalation flow rate of 50 mL/s. Participants had two minutes or more of relaxed tidal breathing between measurements11.

A measurement acceptance was defined as a medical staff-approved and valid single FeNO level usable in standard clinical procedures. One measurement was defined as one attempt to perform the FeNO exhalation maneuver while the device recorded, exempting at most one tutorial or training attempt without measurement14. Per participant and device, one trial (first accepted measurement) was analyzed.

The ease of use, device handling, technical issues, and procedure comfort were assessed. Furthermore, exam time per attempt and overall testing time were recorded by the examiners. The study was performed in two consecutive parts. The first 235 of 486 participants and the examiners responded to questionnaires focused on test attempts, test time, and issues arising during testing, adapted in particular from the technology-agnostic23 System Usability Scale (SUS) survey. The subsequent 251 of 486 participants received a measurement with EC and no SUS questionnaires. The questionnaires were based on the SUS point scoring system24, in addition to test attempt counts, time scales, yes–no categorical answers, and subjective numerical rating scales with five guide points, which were interpreted as percentages. The general SUS threshold for acceptance in the industry is defined at 68 points (0–100), and a higher score indicates better usability25,26,27. In addition to the SUS survey, questions about ease of use, device handling, technical issues, and procedure comfort were asked from the user’s and examiner’s perspective.
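For orientation, the generic SUS scoring rule behind the 0–100 scale and the 68-point threshold can be sketched as follows. This is a minimal Python sketch of the standard ten-item SUS computation, not of the adapted questionnaire used in this study; the function name `sus_score` and the example responses are hypothetical.

```python
from typing import Sequence

SUS_ACCEPTANCE_THRESHOLD = 68.0  # industry threshold cited in the text


def sus_score(responses: Sequence[int]) -> float:
    """Standard SUS score (0-100) from ten item responses on a 1-5 scale.

    Odd-numbered items are positively worded (contribution = response - 1);
    even-numbered items are negatively worded (contribution = 5 - response).
    The summed contributions (0-40) are multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for index, response in enumerate(responses):
        item_number = index + 1
        if item_number % 2 == 1:
            total += response - 1
        else:
            total += 5 - response
    return total * 2.5


# Hypothetical participant, not study data.
example_responses = [5, 1, 4, 2, 5, 1, 4, 2, 5, 1]
example_score = sus_score(example_responses)
is_acceptable = example_score >= SUS_ACCEPTANCE_THRESHOLD
```

A neutral answer (3) on every item yields the scale midpoint, which is why scores are read against the 68-point threshold rather than against 50.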

Statistical analysis

Statistical analysis was performed using R 3.5.1 (R Core Team, 2021) and GraphPad Prism 9 (GraphPad Software, Inc., La Jolla, CA, USA), with data management in Excel 2013 (Microsoft, Redmond, USA). Descriptive statistics were calculated. The Friedman test with Dunn’s multiple comparisons test or pairwise Wilcoxon signed-rank tests (with Bonferroni correction for multiple comparisons) were used to compare usability and comfort scales between devices. Agreement analysis with sensitivity and specificity was based on contingency tables. Bland–Altman plots were used to assess bias and agreement by displaying absolute differences and limits of agreement of FeNO measurements. Bland–Altman analysis was performed on untransformed data and reported as absolute values for clarity, particularly as it is unclear whether log transformation would be appropriate for this analysis7. Spearman’s rank correlation coefficients were calculated for correlation comparisons among the devices10. An analysis range of FeNO ≤ 70 ppb was chosen for clinical relevance11. The clinical equivalence thresholds for good agreement followed guidelines citing a 10 ppb difference at values < 50 ppb and a 20% difference at values ≥ 50 ppb20,28; these guideline-derived values also defined the thresholds used to judge adequate clinical agreement in the Bland–Altman analysis20,28.
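The Bland–Altman quantities and the guideline-derived margins described above can be illustrated with a short Python sketch. It assumes the usual definition of the 95% limits of agreement as bias ± 1.96 × the sample standard deviation of the paired differences, and it treats a difference equal to the margin as acceptable; both choices, the function names, and the example values are assumptions for illustration and do not reproduce the R or Prism analysis of this study.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(device: Sequence[float],
                 reference: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower LoA, upper LoA) in ppb for paired measurements."""
    if len(device) != len(reference) or len(device) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [d - r for d, r in zip(device, reference)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)  # sample SD of the differences
    return bias, bias - spread, bias + spread


def acceptance_margin(reference_ppb: float) -> float:
    """Margin from the text: 10 ppb below 50 ppb, 20% at or above 50 ppb."""
    return 10.0 if reference_ppb < 50 else 0.20 * reference_ppb


def within_margin(device_ppb: float, reference_ppb: float) -> bool:
    """True if a single paired difference stays inside the margin (assumed <=)."""
    return abs(device_ppb - reference_ppb) <= acceptance_margin(reference_ppb)


# Hypothetical paired values in ppb, not study data.
portable = [12.0, 25.0, 40.0, 18.0]
stationary = [10.0, 27.0, 38.0, 20.0]
bias, lower_loa, upper_loa = bland_altman(portable, stationary)
```

Because the analysis in the text uses untransformed data, the sketch works on absolute ppb differences and applies no log transformation.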

To analyze the influence of SUS on the inter-device measurement differences, linear models were calculated. A regression analysis was performed to assess FeNO measurement differences to EC, including symptoms, respiratory diseases, lung function values, and demographics. The significance level was set to 0.05, and no correction for multiple testing was performed if not otherwise stated, as this was an exploratory study. Figures were truncated for better readability.
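As an illustration of the univariate linear models mentioned above, the following Python sketch fits an ordinary least squares line with the SUS score as predictor and the inter-device FeNO difference as outcome. The function name and the data are hypothetical, and the sketch omits the standard errors and P-values that a full analysis would report.

```python
from statistics import mean
from typing import Sequence, Tuple


def simple_linear_regression(x: Sequence[float],
                             y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical SUS scores and device-minus-reference differences (ppb).
sus_scores = [60.0, 70.0, 80.0, 90.0, 100.0]
feno_differences = [3.0, 2.5, 2.0, 1.5, 1.0]
slope, intercept = simple_linear_regression(sus_scores, feno_differences)
```

A slope close to zero would correspond to the situation in which usability is not associated with inter-device measurement differences.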

Results

Demographics and comorbidities of the 486 participants from this LEAD general population subgroup cohort are described in Table 1 (and supplemental table S1); the first 235 participants were asked to answer usability questionnaires and the latter 251 to use the EC device. Of these, 1.95% (n = 5) were not able to complete the measurements with all 4 devices. Mean measured FeNO (mean ± standard deviation) ranged from 17 ± 18 to 27 ± 25 ppb across devices (range of individual values: 0–300 ppb), with the EC device at 23 ± 20 ppb (Figure S2).

Table 1 Demographic parameters of the cohort.

Usability analysis

Overall test time and number of attempts needed for a successful measurement were significantly higher for CN than for BN, which were also significantly higher than for BV (Fig. 1). Exam time per attempt for all hand-held analyzers ranged from 10.9 ± 3.9 to 17.7 ± 9.5 s, and overall testing time from 97 ± 56 to 165 ± 43 s. Prior participant experience levels with FeNO analyzers (scale 0–100%) were similar, with 15.8% ± 32.6%, 17.3% ± 33.2%, and 21.1% ± 35.1% for CN, BN, and BV, respectively (with P-values 0.8, 0.2, 0.08, respectively).

Fig. 1

Usability and measurement time in different hand-held analyzers. (A) The BN device showed significantly better usability scores adapted from the SUS questionnaire than CN or BV. CN and BV did not have significantly different scores. (B)–(D) The BV device was the most time-efficient compared to CN and BN and required the fewest additional attempts. (E)–(F) The examiners reported subjective differences as % agreement to a stated quality between hand-held analyzers. Assessed qualities included how intuitive the usability of the devices was and how easy the calibration was. P-values denote results of the Friedman test (A) and pairwise differences from Wilcoxon signed-rank tests with a Bonferroni threshold of p = 0.017 (B–F). SUS, System Usability Scale (range of 0–100, low to high usability, dashed line: threshold for acceptance in industry); CN: NIOX VERO; BN: NObreath; BV: Vivatmo pro.

Usability measured by the SUS questionnaire showed that BN was rated better, with 87.6 ± 15.0 points, than both CN and BV, with 83.6 ± 15.7 and 83.1 ± 16.8 points, respectively (Friedman test, p = 0.0008; for multiple comparisons see Fig. 1).
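The Friedman comparison of three paired device scores can be sketched in Python as follows. This is a simplified illustration and not the analysis code of the study: it applies no tie correction, uses the chi-square approximation (which is rough for small samples), and omits the Dunn and Wilcoxon post hoc tests. The function names and example scores are hypothetical. With three devices the statistic has two degrees of freedom, for which the chi-square tail probability is exactly exp(−x/2).

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def friedman_three_devices(scores: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Friedman chi-square and approximate p-value for rows of three scores."""
    n, k = len(scores), 3
    if n == 0 or any(len(row) != k for row in scores):
        raise ValueError("each participant needs exactly three scores")
    rank_sums = [0.0] * k
    for row in scores:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    statistic = (12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums)
                 - 3 * n * (k + 1))
    return statistic, math.exp(-statistic / 2)  # df = 2 tail probability


# Three pairwise comparisons give the Bonferroni threshold of about 0.017.
BONFERRONI_ALPHA = 0.05 / 3

# Hypothetical SUS-like scores per participant (CN, BN, BV), not study data.
example_scores = [[80, 90, 70], [75, 85, 60], [82, 95, 78], [70, 88, 65]]
statistic, p_value = friedman_three_devices(example_scores)
```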

BN was seen as the best device overall by 39.8% of participants, CN by 34.5%, and BV by 25.4%, whereas test execution was generally good with all devices subjectively assessed at 73.7–80.1% optimal execution rate (Supplemental table S2).

Examiners reported that BN was significantly more intuitively usable (examiners n = 10, Figure S3). High explanation effort was most often necessary for CN, with 6.4%, followed by BN, with 4.7%. Very few technical problems arose with all hand-held devices, no problem affected examination, and no device caused a critical problem (rate for any problem: CN: 33%, BN: 22%, BV: 0%). Questionnaires showed that learning curves were short, devices were considered safe, and all devices were notably easy to use in daily routine, with no significant difference between devices.

Agreement analysis

Assessments of clinical equivalence were performed on 486 participants, 251 of whom had EC measurements. There was a high overall agreement between the devices in measuring above or below clinically relevant FeNO thresholds, including 20, 25, and 40 ppb, representing thresholds proposed by the Global Initiative for Asthma (GINA), American Thoracic Society (ATS) guidelines, or recent guidelines to assist in asthma rule-in2,4,29, as seen in Fig. 2.

Fig. 2

All devices showed high overall agreement at the clinically relevant FeNO cut-offs of ≥ 20, ≥ 25, and ≥ 40 ppb, with a particularly good negative percent agreement. OA: Overall agreement. PPA: Positive percent agreement with 95% CI. NPA: Negative percent agreement with 95% CI. CI, Confidence interval. EC, Ecomedics CLD 88, reference device. CN, NIOX VERO. BN, NObreath. BV, Vivatmo pro.
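The agreement indices named in this caption can be computed from a two-by-two contingency table at a given cut-off. The Python sketch below shows one way to do so; the function name and example values are hypothetical, and the 95% confidence intervals shown in the figure are not computed here.

```python
from typing import Dict, Sequence


def percent_agreement(device: Sequence[float],
                      reference: Sequence[float],
                      cutoff_ppb: float) -> Dict[str, float]:
    """Overall, positive, and negative percent agreement at one cut-off.

    A value counts as positive when it is at or above the cut-off.
    """
    if len(device) != len(reference) or not device:
        raise ValueError("need equally long, non-empty paired measurements")
    pairs = [(d >= cutoff_ppb, r >= cutoff_ppb)
             for d, r in zip(device, reference)]
    both_positive = sum(1 for d, r in pairs if d and r)
    both_negative = sum(1 for d, r in pairs if not d and not r)
    reference_positive = sum(1 for _, r in pairs if r)
    reference_negative = len(pairs) - reference_positive

    def percent(part: int, whole: int) -> float:
        return 100.0 * part / whole if whole else float("nan")

    return {
        "OA": percent(both_positive + both_negative, len(pairs)),
        "PPA": percent(both_positive, reference_positive),
        "NPA": percent(both_negative, reference_negative),
    }


# Hypothetical paired values in ppb, not study data.
reference_values = [10, 22, 30, 45, 15, 26]
device_values = [12, 24, 27, 41, 18, 23]
agreement_at_25 = percent_agreement(device_values, reference_values, 25)
```

In this example one reference-positive value falls below the cut-off on the portable device, which lowers PPA while NPA stays at 100%.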

The Bland–Altman plots showed good clinical agreement between all hand-held devices and the chemiluminescent EC analyzer, as well as with each other (Fig. 3). The mean inter-device difference for CN vs. EC was − 0.7 ppb (95% limits of agreement [LoA]: − 11.6 and 10.2 ppb), for BN vs. EC 7.5 ppb (95% LoA: − 8.4 and 23.5 ppb), and for BV vs. EC − 2.5 ppb (95% LoA: − 17.3 and 12.3 ppb). Therefore, the mean inter-device differences were considered within the bounds of clinical acceptance criteria, and the portable chemoelectrical devices were shown to be clinically equivalent20,28. The analysis showed medium-wide limits of agreement with increasing inter-device variability at higher FeNO values7. All inter-device correlations were highly significant, with r-values of 0.72–0.86 (all p < 0.0001, Table 2, Supplemental Figure S4).
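As a plausibility check of the reported values, the following Python sketch recomputes the midpoint of each pair of limits and the standard deviation of the differences that they imply. It assumes that the limits were calculated as bias ± 1.96 × SD, which the text does not state explicitly; the implied SD values are therefore derived quantities and not results reported by the study.

```python
Z_95 = 1.96  # multiplier assumed for the 95% limits of agreement

# (bias, lower LoA, upper LoA) in ppb versus EC, as reported in the text.
reported = {
    "CN": (-0.7, -11.6, 10.2),
    "BN": (7.5, -8.4, 23.5),
    "BV": (-2.5, -17.3, 12.3),
}


def summarize(bias: float, lower: float, upper: float):
    """Return (midpoint of the limits, implied SD of the differences)."""
    midpoint = (lower + upper) / 2
    implied_sd = (upper - lower) / (2 * Z_95)
    return midpoint, implied_sd


summary = {name: summarize(*values) for name, values in reported.items()}

# Mean differences compared with the 10 ppb margin used below 50 ppb.
bias_within_10_ppb = {name: abs(values[0]) < 10.0
                      for name, values in reported.items()}
```

Under this assumption the midpoints match the reported mean differences to within rounding, and the implied SDs are roughly 5.6, 8.1, and 7.6 ppb for CN, BN, and BV, respectively.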

Fig. 3

Bland–Altman analysis between devices, n = 486. (A)–(C) Comparison of the chemiluminescence analyzer EC versus the portable chemoelectric analyzers CN, BN, and BV. (D)–(F) Inter-device differences of the portable devices at FeNO < 70 ppb. CN: NIOX VERO; BN: NObreath; BV: Vivatmo pro; EC: CLD88 analyzer.

Table 2 All devices showed a high correlation with P value < 0.0001 in all comparisons.

Handling difficulties with the devices (as assessed by SUS) were not associated with significant inter-device measurement differences (Table 3). In multiple regression analyses, inter-device measurement differences were associated only with the presence of coughing for CN, and with no factors for BN and BV (supplemental table S3).

Table 3 Impact of usability on analyzer result differences, assessed by univariate linear regression analyses.

Discussion

In the current general population study, participants achieved an optimal execution rate of > 74% with all hand-held devices, even though prior experience levels were low. A study of patients with asthma reported a high experience rate with the analyzers with similarly high success rates11. Taken together with a published technical standard and other studies indicating that a single measurement is enough to measure FeNO validly11,30,31, this suggests that the learning curve and effort for all assessed devices are low, which facilitates practical use.

Previous studies assessed usability with attempt numbers or visual analog scales11,14,20. One study in adolescents showed that there was a trend for CN to be more difficult to use than BN on a visual scale14. Another study showed that BV required the fewest extra attempts for a successful measurement compared to CN11. The current general population study in adults showed similar results. Test attempt counts in a study of different analyzers including CN were generally similar to the current study20. Time to successful measurement was assessed by one previous study, detecting 110–118 s total exam time in two different devices including CN, which was slightly shorter than the current exam time measured for CN and BN, but considerably longer than the current mean exam time of BV20.

To our knowledge, FeNO analyzer usability has not yet been independently assessed with a standardized questionnaire or other usability tools, including protocols or interviews of participants or examiners18,20. The SUS has been widely used to assess medical systems, software, and home appliances24,32,33, and has a high internal validity with much larger, detailed questionnaires in the non-medical system areas18,34,35. SUS questions were used within this study24, and the score from participants was significantly better for BN than for either BV or CN, which were similar. All mean SUS scores were well above the threshold for acceptance in the industry25,26,27.

Previous FeNO equivalence studies were inconsistent with sometimes smaller or varied cohorts8,9,11,14,36,37,38, comparing mostly two portable analyzers7,31, with mixed cohorts of the general population or participants with respiratory diseases, especially asthma. Other previous studies compared one or more portable devices with a reference device11,19,39,40,41,42,43,44.

Despite the diverse cohorts including children or adults, respiratory healthy or comorbid participants, most studies indicated a high to very high inter-device correlation, with coefficients often in the range of 0.9 or above, between references, BN37,38, BV11,12, CN11,39, and NIOX mino19,42,45,46. The current study confirmed these results in a larger cohort of 486 participants. In particular, the mean differences between the reference device, BN, and BV were < 10 ppb, and between the reference and CN, < 1 ppb, which represent good to excellent clinical agreements between the devices.

Previous studies mostly saw bias values below the clinically relevant threshold20,28, but the reporting of Bland–Altman results was variable7. Studies with log-transformed results including BN and CN7,11,19,46, and other studies citing relative LoA values19 and absolute LoA values including CN, BV, and NIOX mino39,42,45, concluded that the measured devices had an acceptable degree of agreement and were clinically equivalent. Similarly, a study of BN in participants with asthma detected clinical equivalence with a small bias toward lower FeNO values37. Some previous studies contradicted these results, including small studies of 5–32 patients of mainly the NIOX mino device with references, showing bias in both directions7. Another study of CN and BN found differences not only between devices but also between three attempts on the same analyzer, possibly because of a different cohort of children with asthma, which could lead to a higher FeNO variability14,47. Another study saw high variability particularly at high values of FeNO in patients with asthma48. The discrepancies could be explained by uncommon equivalence tests in some studies or by comparable results at low FeNO values with large variability at high values7, as analyzers have higher variability in high FeNO ranges11,19,48.

Relevant clinical decisions4,22,29 commonly depend on thresholds of 20–50 ppb determined by ATS, GINA, or ERS guidelines2,11,49. The guideline-determined thresholds to assist asthma diagnosis rule-in of ≥ 25–40 ppb2,4,6 were detected in the current study by all devices with a high reference agreement index, corroborating the results of another study comparing portable devices11. A high overall agreement index was also measured with BN, which showed an isolated lower positive percent agreement index.

Other studies assessed analyzer agreement for FeNO as a biomarker for airway eosinophilia20,38 or asthma diagnosis7,36 with medium to high rates of success7; however, FeNO is not a biomarker for eosinophilia1, and thresholds determined as primary asthma diagnostic are questionable4.

Flow is an essential component of FeNO measurement and is dependent on airway caliber50,51,52,53. In clinical settings with increased airway inflammation, such as exacerbated asthma or airway reactions after an allergen challenge, concomitant bronchial constriction is known to reduce the airway caliber, potentially increasing the variability of high FeNO values53,54. Changes of FeNO in high ranges are deemed clinically relevant above 20% change from baseline; therefore, the current study focussed on agreement at clinically relevant cut-offs and a range below 70 ppb11.

A potential limitation of FeNO equivalence studies, including ours, is the repeated measurement of FeNO in quick succession, which might lead to a decline in measured FeNO values55. However, more than 2 min of normal tidal breathing was allowed between tests, and the order of the devices was randomized per protocol. Serial FeNO tests are also known to not significantly deviate from same-device validity tests12.

Another limitation could be that this study did not take into account ambient NO levels11. However, the placement of the tests did not change throughout the study, and environmental changes of NO previously did not show any influence on inter-device measurability11.

Conclusion

In this study, all portable FeNO analyzers showed overall good usability with an above-average SUS usability score. The best usability score was observed with the BN device, while the BV device had the shortest measuring time and the fewest additional attempts. The assessed devices showed short learning curves, no critical problems, and high daily routine usability. Concerning result equivalence, the smallest difference from the stationary EC analyzer was observed with the CN device. All three devices had high overall FeNO agreement rates at relevant clinical thresholds. This study is the first extensive usability assessment of FeNO analyzers and highlights relevant aspects for usability improvement and further research.