Main

Sepsis, an often fast-progressing and potentially fatal condition, is defined as a life-threatening acute organ dysfunction caused by a dysregulated host response to an infection1. Sepsis requires rapid administration of antimicrobials and fluids2,3 because the risk of death from sepsis continues to be high4,5. Sepsis is often subtle and difficult to detect, and the diagnosis is frequently missed, with potentially deadly consequences. As a result of difficult diagnosis and an imperative for early treatment, many patients with suspected sepsis who have alternative diagnoses are overtreated with antibiotics and hospitalization, which is costly, potentially harmful and contributes to antibiotic resistance. The need to determine whether a patient with suspected sepsis should be treated with anti-infective therapy (for example, antimicrobials and/or surgical source control) or not, and whether they should be hospitalized or not, underscore the need to assess a patient with suspected infection along two ‘axes’: infection presence and severity6,7.

There are currently no tests used in clinical practice that rapidly and reliably determine the presence and severity of infection6,8. Protein-based biomarkers, such as C-reactive protein (CRP) and procalcitonin, or direct pathogen detection tests can assist in the diagnosis of infection but often lack the accuracy and/or speed to allow identification of the presence of infection or to distinguish bacterial from viral etiology that is important for antibiotic decision-making at the time of presentation9,10. Patients with suspected infection (for example, belly pain) may in fact have a non-infectious cause of illness (pancreatitis); hence, a ‘sepsis’ test should also be able to diagnose whether a patient is infected or not. Although early admission to the intensive care unit (ICU) is imperative for patient survival advantages11, biomarkers (for example, lactate and cellular morphology tests) and clinical scores (for example, Sequential Organ Failure Assessment (SOFA) and quick SOFA (qSOFA)) for estimating risk can be time-consuming to collect and have limited accuracy for predicting the likelihood of decompensation in broader populations12,13.

We developed the TriVerity test to address this unmet need by determining (1) the likelihood of a bacterial infection, (2) the likelihood of a viral infection and (3) all-cause risk of mechanical ventilation, vasopressor use and/or new renal replacement therapy (RRT) within 7 days (‘ICU-level care’). TriVerity assigns a sample into one of five interpretation bands ranging from Very Low to Very High for each of the three likelihood scores using machine learning algorithms applied to semiquantitative measurements of 29 host immune mRNAs, which were previously reported as associated with infection status, type and severity14,15,16,17. Testing takes approximately 30 minutes using the cartridge-based Myrna instrument, with an operator hands-on-time of under 1 minute. The scores each fall into one of five interpretation bands ranging from Very Low to Very High.

Here we report the results of the SEPSIS-SHIELD study conducted for US Food and Drug Administration (FDA) clearance (obtained 10 January 2025; K241676) across 22 emergency departments in the United States and Europe. Building on previous observational studies18,19,20,21, we prospectively recruited 1,441 patients with suspected acute infection or suspected sepsis and evaluated whether TriVerity can (1) diagnose bacterial infections versus viral infections versus non-infectious mimics and (2) predict need for ‘ICU-level care’. We compared the accuracy of TriVerity to established diagnostic and prognostic biomarkers and performed a preliminary investigation of potential clinical utility for improving appropriate administration of antibiotics and prediction of sepsis.

Results

Between March 2020 and May 2024, 1,441 adult patients with suspected acute infection or suspected sepsis and (1) at least one abnormal vital sign or (2) at least two vital sign changes with a blood culture order were enrolled from 22 emergency departments (Fig. 1 and Supplementary Table 1). After excluding screen failures and withdrawals, 1,222 patients had valid TriVerity results, and 729 of these were clinically adjudicated as consensus for the presence of a bacterial and/or viral infection (Yes or No adjudication status; Methods). We present these consensus-adjudicated patients as the main population (primary diagnostic endpoint); the secondary outcome of forced adjudication includes all 1,222 patients but with less certain adjudication status (Yes plus Probable and Unlikely plus No adjudications are grouped together; Methods). For the prognostic endpoint, 1,120 patients were evaluable (Fig. 1).

Fig. 1
figure 1

Overview of the study design and results for consensus clinical adjudication and clinical outcome.

Characteristics of study participants at emergency department presentations

Mean age, sex, race and ethnicity were representative of the US emergency department population (Extended Data Table 1). Among patients evaluable for the diagnostic endpoint, the mean age was 50.6 years, and 47.3% of the patients were female. Most patients were White (63.2%), followed by Black (30.6%) and Hispanic/Latino (13.3%). Similar percentages were observed among those evaluated for the prognostic endpoint (Extended Data Table 1). Metabolic/endocrinological, respiratory and cardiovascular diseases were the most prevalent medical conditions. There were no marked differences between patients evaluable for the diagnostic endpoint versus those evaluable for the prognostic endpoint. Overall, 132 patients (18.1%) evaluable for the diagnostic endpoint were immunosuppressed compared to 206 patients (18.4%) evaluable for the prognostic endpoint. Malignancies were the most frequently found type of immunosuppression (approximately 10% of patients), followed by solid organ transplantation, steroid treatment and HIV/AIDS (Extended Data Table 1).

The mean leukocyte counts in patients eligible for the diagnostic endpoint was 11.8 × 109 per liter, the mean neutrophil percentage was 76.9 and the mean lymphocyte percentage was 12.5; similar numbers were observed in patients eligible for the prognostic endpoint (Supplementary Table 2a). Mean concentrations of biomarkers (CRP, procalcitonin and lactate) are shown in Supplementary Table 2b.

Among 1,120 patients eligible for the prognostic endpoint, 24.3% were discharged to home, 55.8% were admitted to a regular ward and 13.8% were admitted to an ICU (Extended Data Table 2a). The mean length of hospital stay was 5 days and of ICU stay was 6.1 days; there was no significant difference in consensus-adjudicated patients. Out of the 1,120 patients, 122 (10.9%) met the primary severity endpoint, which included mechanical ventilation (n = 63, 51.6%), vasopressor use (n = 99, 81.2%) and/or RRT (n = 23, 18.9%) (Extended Data Table 2b). Among 147 patients who were transferred to the ICU, 98 (66.7%) received the ‘ICU-level care’ interventions (composite primary severity endpoint). Extended Data Table 2b also shows differences in clinical outcomes stratified by clinically adjudicated infection status.

Infection status and anatomical location of infection

Out of 729 consensus-adjudicated patients for diagnostic endpoint, 448 (61.5%), 165 (22.6%) and 12 (1.6%) had bacterial, viral and bacterial–viral co-infection, respectively, whereas 104 (14.3%) were adjudicated to not have an infection (Table 1). Among 460 patients adjudicated to have bacterial infections, urinary tract infections (n = 142, 30.9%) and skin or soft tissue infections (n = 133, 28.9%) were most frequent, followed by bloodstream (n = 92, 20%), gastrointestinal tract (n = 69, 15%) and respiratory tract (n = 58, 12.6%) infections. Among 177 patients adjudicated to have viral infections, 169 (95.5%) had respiratory tract infections, followed by gastrointestinal tract infections (n = 4, 2.3%). Three parasitic infections were diagnosed (Giardia lamblia enterocolitis, n = 1; Trichomonas vaginalis vulvovaginitis, n = 2). The percentage of bacterial infections was relatively high (63.1%), driven by seasonal epidemiology and the broad inclusion criteria allowing for enrollment of patients with all types of suspected infections (not only respiratory infections). Infection status for the forced adjudication cohort is shown in Supplementary Table 3a.

Table 1 Clinically adjudicated infection status and anatomical localization of infection

TriVerity result output

TriVerity provides three scores: Bacterial, Viral and Severity. Each score ranges from 0 to 50 and is divided into five interpretation bands (Very Low (0–10), Low (11–20), Moderate (21–30), High (31–40) and Very High (41–50)) that reflect increasing likelihoods of the corresponding infection type or severity. The suggested clinical interpretation of the two highest bands (‘Very High’ and ‘High’) is ‘rule-in’, whereas, for the two lowest bands (‘Very Low’ and ‘Low’), it is ‘rule-out’. Accuracy of TriVerity scores was evaluated as compared to post hoc clinical adjudications (Methods and ref. 22). For each of the three scores, when considering them separately, 80–86% of the patients were assigned to one of the rule-in or rule-out bands. When considered together, almost all of TriVerity results (99.6% and 99.3% of patients in the consensus and forced adjudication cohort, respectively) fell into one of the four clinically actionable interpretation bands (that is, Very Low, Low, High and Very High) for at least one of the diagnostic and prognostic scores.

Accuracy of TriVerity for diagnosis of bacterial infection

The Bacterial score had an AUROC of 0.83 (80% CI: 0.81–0.85) for detecting bacterial infections in the consensus population. Probability of bacterial infection, as measured by likelihood ratio, ranged over 100-fold and increased monotonically by interpretation band (likelihood ratios: Very Low, 0.08 (80% CI: 0.05–0.11); Low, 0.54 (0.45–0.63); Moderate, 1.14 (0.94–1.42); High, 2.50 (1.97–3.31); Very High, 8.04 (5.66–12.43) (Table 2). The Bacterial score had specificity of 95.5% and 90.7% for the Very High and High bands, respectively, and sensitivity of 97.2% and 81.5% for the Very Low and Low bands, respectively (Table 2). Notably, 81.3% of the patients with consensus adjudication (and 80.4% of those with forced adjudication) fell into one of the clinically actionable Very Low, Low, High and Very High interpretation bands for bacterial infection. At a prevalence of 63.1% for bacterial infections, the probability of having a bacterial infection for the Very High interpretation band was 93.2% and for Very Low was 12.1%. Using forced adjudication (entire population including ‘uncertain’ adjudication—that is, probable and unlikely cases), the area under the curve (AUC) for the detection of bacterial infections was 0.76 (80% CI: 0.75–0.78) (Supplementary Table 3b); sensitivity, specificity and likelihood ratio for the accuracy of TriVerity Bacterial scores are shown in Extended Data Table 3a.

Table 2 Accuracy of TriVerity Bacterial score for the diagnosis of bacterial and viral infections

Accuracy of TriVerity for the diagnosis of viral infection

The Viral score had an AUROC of 0.91 (80% CI: 0.89–0.93) for the detection of viral infections. Likelihood of viral infection increased monotonically more than 400-fold by interpretation band (likelihood ratios: Very Low, 0.09 (80% CI: 0.05–0.14); Low, 0.32 (80% CI: 0.22–0.41); Moderate, 0.87 (80% CI: 0.64–1.13); High, 2.36 (80% CI: 1.68–3.25) and Very High, 40.93 (80% CI: 27.73–72.16)). The Viral score had specificity of 98.6% and 94.0% for the Very High and High bands, respectively, and sensitivity of 95.5% and 90% for the Very Low and Low bands, respectively (Table 2). Notably, 86.1% of patients with consensus adjudication (and 81.3% of those with forced adjudication) fell into the clinically actionable Very High, High, Low and Very Low bands. At a prevalence of 24.3% for viral infections, the probability of having a viral infection in the Very High band was 92.9% and for Very Low was 2.9%. In the forced adjudication, the Viral score had an AUROC of 0.83 (0.81–0.85) for the detection of viral infections; sensitivity, specificity and likelihood ratio for the accuracy of TriVerity Viral scores are shown in Extended Data Table 3b.

The accuracy of the Viral score was robust in patients diagnosed with SARS-CoV-2 (Extended Data Table 4a), demonstrating its applicability and generalizability to emerging pathogens. Median Viral scores were highest in patients with infection from influenza A/B and SARS-CoV-2; patients diagnosed with human metapneumovirus and respiratory syncytial virus had intermediate Viral scores, whereas those with adenovirus and rhinovirus/enterovirus had the lowest Viral scores (Extended Data Table 4b).

Cross-classifications of both Bacterial and Viral scores for patients clinically adjudicated as bacterial infection, viral infection, co-infection and non-infected are shown in Supplementary Table 4a–d.

Overall, the Bacterial and Viral scores were strongly associated with increasing likelihoods of bacterial and viral infections, respectively. Notably, the Bacterial and Viral scores increased monotonically, and the 80% CIs for adjacent bands did not overlap in any of the analyses (Supplementary Fig. 1a,b).

Accuracy of TriVerity compared to commonly used biomarkers

The AUROC of the Bacterial score (0.83, 80% CI: 0.81–0.85) was significantly higher than those of commonly used biomarkers for the diagnosis of infections, including procalcitonin (AUROC = 0.71, 80% CI: 0.68–0.73), CRP (AUROC = 0.74, 80% CI: 0.72–0.77) and white blood cell (WBC) counts (0.76, 80% CI: 0.73–0.78) (P < 0.0001 for all comparisons; Extended Data Table 5a). Supplementary Fig. 2 shows the correlation of WBC, CRP and procalcitonin concentrations with TriVerity Bacterial interpretation bands; Extended Data Table 2d shows the AUROCs for these biomarkers in the forced adjudication cohort.

In addition, the diagnostic accuracy of TriVerity generalized across races better than other biomarkers. Specifically, although the overall AUROC for procalcitonin was 0.71 for diagnosis of bacterial infection, it was substantially lower in Blacks (AUROC = 0.66) and other races (AUROC = 0.62) compared to Whites (AUROC = 0.74), highlighting lower clinical utility of procalcitonin in non-White populations. By contrast, the Bacterial score’s overall AUROC of 0.83 remained virtually identical in Whites (0.82) and Blacks (0.83) and was higher in other races (0.91) (Extended Data Table 5b).

Because TriVerity measures the host immune response to infection, we investigated whether the Bacterial and Viral scores maintained their diagnostic accuracy in immunocompromised patients, who also have an increased risk of infection. The AUROCs for TriVerity Bacterial and Viral scores were not significantly different between immunocompromised and immunocompetent patients (0.80, 80% CI: 0.75–0.85 versus 0.83, 80% CI: 0.81–0.85 for Bacterial scores; 0.89, 80% CI: 0.86–0.94 versus 0.91, 80% CI: 0.89– 0.94 for Viral scores) (Extended Data Table 6a,b). Lastly, when considering median Bacterial and Viral scores for specific anatomical sites of infection, patients adjudicated as positive for bacterial infection in the bloodstream had the highest median Bacterial score, followed by patients with bacterial infections of the respiratory and urinary tracts (Extended Data Table 7a,b). Only AUCs for the Bacterial scores in patients with bloodstream infections were significantly (P < 0.05) higher than the overall bacterial AUCs.

Prognostic accuracy of the TriVerity Severity score

The Severity score predicted the need for ‘ICU-level care’, defined as an acute need for mechanical ventilation, vasopressor use and/or RRT within 7 days, with an AUROC of 0.78 (80% CI: 0.75–0.81). Risk of requiring ICU-level care monotonically increased by interpretation band over 50-fold (likelihood ratios: Very Low, 0.22 (80% CI: 0.14–0.31); Low, 0.43 (80% CI: 0.30–0.57); Moderate, 1.63 (80% CI: 1.35–1.97; High, 2.41 (80% CI: 1.96–2.91); Very High, 11.33 (80% CI: 7.07–17.75)). The Severity score demonstrated specificity of 98.7% and 86.1% for the Very High and High bands, respectively, and sensitivity of 91.8% and 87.7% for the Very Low and Low bands, respectively (Table 3). Most patients (79.6%) were in the clinically actionable Very High, High, Low and Very Low interpretation bands. At a prevalence of 10.9%, the probability of requiring ICU-level care within 7 days was 58.1% for the Very High band and 2.7% for the Very Low band. Likelihood ratios of the Severity scores for the three different individual components of ‘ICU-level care’ were similar to the overall likelihood ratios presented above (Extended Data Table 8). Kaplan–Meier survival analysis also found significantly increasing hazard ratios for the need of ‘ICU-level care’ between days 0 and 7 from ‘Very Low’ to ‘Very High’ Severity bands (Supplementary Fig. 3).

Table 3 Accuracy of TriVerity Severity score for the prediction of the need of mechanical ventilation, vasopressor use and/or RRT within 7 days

Lactate, a mandated biomarker commonly used to estimated severity as part of the SEP-1 bundle, had an AUROC of 0.76 (80% CI: 0.73–0.80) for predicting the need for ICU-level care. In the same patients, the Severity score had an AUROC of 0.78 (80% CI: 0.75–0.80). Lactate higher than 4 mmol l−1 demonstrated specificity of 95.8%, similar to the Very High and High bands for TriVerity Severity score. By contrast, sensitivity in patients with lactate lower than 2 mmol l−1 was 66.7% (Extended Data Table 9a), which was substantially lower than the sensitivity for Very Low and Low bands for the Severity score (>87%). In total, 213 patients had indeterminate lactate concentrations (2–4 mmol l−1), of whom 58 needed ‘ICU-level care’ or died (Extended Data Table 9b). Of these 58, TriVerity identified 46 (79.3%) as Moderate to Very High risk of severe illness, substantially reducing the uncertainty in identifying patients at higher risk of severe illness compared to lactate (Extended Data Table 9b). Hence, despite similar AUROCs, TriVerity demonstrated significantly lower false-negative and higher true-positive rates in patients with indeterminate lactate concentrations.

The severity of a patient’s clinical condition at the time of presentation in the emergency department can be assessed using clinical scores, such as the qSOFA score. We investigated whether integrating qSOFA with the TriVerity Severity score would further improve the accuracy of predicting ‘ICU-level care’. We determined pre-test and post-test probabilities of sequential qSOFA plus Severity scores (Fig. 2 and Supplementary Table 5). ICU-level care requirement among patients with low-risk qSOFA scores (0–1) was 7.7% and for those with high-risk qSOFA scores (2–3) was 46.3% (Fig. 2). However, stratifying patients by TriVerity Severity score interpretation bands markedly increased predictive accuracy in both the low and high clinical risk patients. For instance, a patient with qSOFA 0–1 (7.7% risk overall) would increase to a 52% risk with a ‘Very High’ TriVerity Severity score, whereas a patient with qSOFA 2–3 (46% risk overall) would decrease to a 18–25% risk with a ‘Low’ or ‘Very Low’ Severity score. The overall sensitivity of predicting ‘ICU-level care’ using qSOFA scores alone was 33.93%. When we combined qSOFA with the Severity score, the sensitivity increased significantly to 84.82% (P < 0.05).

Fig. 2: TriVerity Severity scores used in combination with qSOFA to predict the need for mechanical ventilation, vasopressor use and/or RRT (‘ICU-level care’) within 7 days.
figure 2

a, Patients with low qSOFA risk (scores 0–1). b, Patients with high qSOFA risk (scores 2–3). Overall percentages of patients with need for mechanical ventilation, vasopressor use and/or RRT (‘ICU-level care’) within 7 days independent of qSOFA and TriVerity results (= pre-test probabilities) are shown in black circles, whereas gray circles represent percentages of patients requiring ‘ICU-level care’ based on qSOFA scores alone. Colored circles represent the percent of patients requiring ‘ICU-level care’ based on combined qSOFA plus TriVerity Severity scores; colors represent TriVerity Severity score interpretation bands (Very Low, dark green; Low, bright green; Moderate, orange; High, bright red; Very High, dark red).

Finally, the Severity score also predicted the composite need for ‘ICU-level care’ and/or 28-day mortality with a specificity of 99.0% for the rule-in Very High band and a sensitivity of 95.0% for the rule-out Very Low band, which further supports its predictive value of severe illness (Supplementary Table 6).

Potential clinical utility

To examine the potential clinical utility of TriVerity, we performed several preliminary analyses. We note that these analyses were not part of the preplanned analyses. First, we investigated whether the Bacterial scores could help reduce inappropriate antibiotic treatment using post hoc clinically adjudicated infection status as the gold standard (Fig. 3a). Thirty-three patients were adjudicated to have a bacterial infection but did not receive antibiotics on the day of presentation, of whom 10, 11 and 3 (overall 24 (72.7%)) had a Bacterial score of Moderate, High or Very High, respectively (Fig. 3a), suggesting that TriVerity Bacterial score could have helped emergency department providers avoid delays in antibiotic administration. On the other hand, 103 patients were adjudicated to not have a bacterial infection but received antibiotics on the day of presentation, of whom 62 (60.2%) had a Bacterial score of Low or Very Low (Fig. 3a), suggesting that TriVerity results could have helped emergency department providers avoid antibiotic overprescription.

Fig. 3: Potential diagnostic and prognostic clinical utility of TriVerity.
figure 3

a, Administration of antibiotics on day 0 compared to true infection status (post hoc clinically adjudicated bacterial infection). b, Post-test probabilities of bacterial infection as determined by pre-test probability and TriVerity likelihood ratios for the five Bacterial score bands (dark red, Very High; red, High; orange, Moderate; green, Low; dark green, Very Low). Pre-test and post-test probabilities for the five TriVerity Bacterial score bands are hypothetical but cover the entire range of possible probabilities. Dotted vertical and horizontal lines indicate physician decision thresholds (treat or not treat with antibiotics)23. c, Percent of patients in each combination of TriVerity Bacterial and Severity bands with bacterial sepsis, as by adjudicated bacterial infection and either delta in SOFA ≥ 2 or requirement of ICU-level care within 7 days.

Second, we applied likelihood ratios associated with TriVerity Bacterial bands to calculate theoretical post-test probability of bacterial infection for a range of hypothetical pre-test probabilities—ranging from 0 to 1 in 0.1-increment bins (Fig. 3b)—to reflect that physicians will use the Bacterial score in a variety of patients, including in those they judge infection is unlikely (for example, pre-test probability of 10%) or very likely (for example, pre-test probability of 90%). Using a threshold of 69% probability of bacterial infection for providers to prescribe antibiotics23, the post-test probability crosses the threshold in patients with ‘Very High’ Bacterial scores and pre-test probability as low as 0.3. Similarly, a ‘High’ Bacterial score results in a post-test probability of 0.69 in patients with a pre-test probability greater than 0.5. Conversely, Bacterial scores in the Very Low and Low bands reduce post-test probability below the 0.69 threshold for patients with a pre-test probability less than 90% and 80%, respectively (Fig. 3b). This analysis demonstrates that TriVerity Bacterial scores can assist emergency department providers in providing antibiotics to patients most likely to have bacterial infections and avoid antibiotics in patients less likely to have bacterial infections.

Finally, we explored whether the combination of high Severity and Bacterial scores would identify cases of bacterial sepsis. Because sepsis was not an adjudicated endpoint in the study, we used the definition of adjudicated bacterial infection and a change in SOFA score of ≥2 or requiring ICU-level care as a surrogate and examined the percent of patients with each combination of Bacterial and Severity score who met this definition for sepsis (Fig. 3c). We found that 68% of patients with a ‘Very High’ Bacterial score and a ‘Very High’ Severity score met this definition of sepsis. In addition, 33% of patients with a ‘High’ Bacterial score and a ‘Very High’ Severity score, or vice versa, met this definition of sepsis. In contrast, no patients with a ‘Very Low’ Bacterial score and a ‘Low’ or ‘Very Low’ Severity score had sepsis. Less than 3% of patients with a ‘Low’ Bacterial score and a ‘Low’ or ‘Very Low’ Severity score had sepsis.

Using a logistic regression model for infection status, including vital signs and laboratory values, and TriVerity (even though the adjudicators were blinded to all TriVerity results), the Bacterial score was the most significant variable associated with infection status, followed by WBC and procalcitonin (Supplementary Table 7).

Discussion

Diagnosis and prognosis of patients presenting with symptoms of acute infections or suspected sepsis is an ongoing clinical challenge. Current diagnostic tools have limited sensitivity or require a long time to obtain results and do not assess prognosis or etiology6,8,24. Patients with suspected acute infection or suspected sepsis need to be evaluated along two clinical ‘axes’ (infection and severity) for appropriate treatment6. However, no single biomarker can adequately resolve a patient regarding presence (infected or not), type (bacterial or viral) and severity (risk of decompensation and organ dysfunction) of infection.

The SEPSIS-SHIELD study demonstrated high overall accuracy of the FDA-cleared TriVerity test, with sensitivities and specificities in the Very Low and Very High bands of 92–99% for establishing infection status and need for ‘ICU-level care’. This was true across multiple subpopulations and outperformed existing biomarker tests. Although a high proportion of patients were immunosuppressed, no significant differences were observed in the accuracy of TriVerity compared to immunocompetent patients. Notably, the performance in immunocompromised patients has not been determined in clinical studies for other host response tests, as these were validated in patient cohorts excluding immunosuppression25.

Our results suggest that TriVerity has potential to assist emergency department clinicians in several clinical decisions, including identifying patients who would benefit from anti-infective therapy, while mitigating antibiotic overuse, thereby assisting in reducing common side effects of antimicrobials and combating further emergence of antimicrobial resistance26,27. Notably, TriVerity met the expert consensus target product profile for diagnostic assays to differentiate between bacterial and non-bacterial infections for reducing antimicrobial overuse28. Recently introduced protein-based host response tests only distinguish bacterial versus non-bacterial etiologies but do not distinguish viral infections from non-infectious inflammation or co-infections29,30. By contrast, to our knowledge, TriVerity is the first direct ‘biomarker’ for the diagnosis of viral illness; specific and accurate identification of patients with viral infections will be adjunctive to other approaches for emergency department clinicians to decide on pathogen-specific viral diagnostics, antiviral treatment and isolation measures to prevent spread of infection. Furthermore, the simultaneous presentation of TriVerity Bacterial and Viral scores may reinforce the impact of a single bacterial result in support of antimicrobial stewardship. TriVerity Viral results were stable across different viruses, including influenza and SARS-CoV-2, although SEPSIS-SHIELD enrollment occurred before, during and after the COVID-19 pandemic. The virus-agnostic design and the high accuracy of TriVerity for the diagnosis of infection with SARS-CoV-2 also suggests its potential to assist in detection of emerging viral pathogens before pathogen-specific tests become available.

In addition, TriVerity also diagnosed bacterial–viral co-infections. Although the prevalence of co-infection is low in the emergency department, ruling out bacterial co-infections is important to allow withholding antibiotics, especially when radiographic or other clinical features introduce diagnostic uncertainty and during pandemic threats31,32. Finally, TriVerity results indicating low probabilities of bacterial and viral infection allow clinicians to focus their efforts on non-infectious diagnoses, thereby avoiding ordering of expensive pathogen identification tests.

The early identification and treatment of suspected sepsis is critical33, but there is disagreement among major societies regarding exact guidance and timing for optimizing clinical care. For example, the SEP-1 bundle from the Centers for Medicare & Medicaid Services and Surviving Sepsis Campaign guidelines strongly recommend the rapid administration of antibiotics to patients with suspected sepsis, but physician-based societies (for example, the Infectious Diseases Society of America) have argued that providers do not have adequate tools to judge who has sepsis, resulting in overtreatment and substantial costs associated with patients given unnecessary antibiotics34,35, including the rise of antimicrobial resistance. However, occult sepsis continues to be overlooked in a substantial proportion of emergency department patients. Although qSOFA ≥ 2 conveys a high mortality rate, further classification of patients with qSOFA ≤ 1 is needed because, despite low mortality rate, 30% of sepsis deaths occur in patients with low qSOFA36. In a post hoc analysis, after stratifying patients based on qSOFA, using TriVerity Severity score further improved the risk prediction dramatically by improving accuracy in those patients clinically at low risk but at substantially higher risk of severe illness. TriVerity also correctly identified those patients with substantially lower risk who had been clinically identified as high risk. The Severity score also provided additional information to aid in identification of patients with sepsis. An alternative test based on host mRNA expression was developed as an aid to differentiate infection-positive sepsis from infection-negative systemic inflammation but is cleared only for ICU use37. TriVerity also demonstrated high accuracy in ruling in and ruling out ICU transfer and predicted 28-day mortality with high accuracy. Taken together, these results suggest that TriVerity could improve patient management in the emergency department, with results offering potential for better informed clinical decision-making, including disposition decisions.

Taylor et al.23 recently surveyed 153 US clinicians responding to real-world derived vignettes of suspected sepsis to analyze decision thresholds to treat or not with antibiotics, using infection prediction models and varying case severities. Although the overall threshold to treat was 69% probability of infection, it varied significantly, such that it was 55% for high illness severity versus 84% for low illness severity. These findings strongly suggest the potential of the combined TriVerity diagnostic (Bacterial and Viral) and illness severity scores to further increase the threshold for antibiotic initiation. Incorporating context-dependent thresholds—potentially specific for TriVerity results in the outer Very High and Very Low bands versus the inner High and Low bands—into discriminating and well-calibrated models will inform the development of future clinical decision support systems.

Our preliminary results also demonstrate the potential clinical utility of the diagnostic and prognostic scores from TriVerity. In the SEPSIS-SHIELD study, TriVerity could have substantially reduced inappropriate antibiotic treatment by reducing both false positives (treated with antibiotics but no bacterial infection) and false negatives (bacterial infection not treated with antibiotics), which, in turn, could assist emergency department providers in acting as antimicrobial stewards. TriVerity also correctly identified patients who died without receiving the ICU-level care, demonstrating its potential to reduce the false negatives for prognosticating illness severity. The limitations of clinical estimation of infection and severity probabilities by emergency department providers reported here are in line with reports in the literature24. Further interventional studies are required to determine the clinical benefits of TriVerity.

Results of the present study, along with those of other studies18,19,20,21, support broad generalizability of our findings. Demographics of participants were representative of the emergency department patients with respect to age, sex, race and ethnicity, underlying conditions, immune status and clinical outcomes. Unlike procalcitonin, which has historically been less accurate in Black/African American and Hispanic subgroups38, TriVerity Bacterial scores maintained significantly higher accuracy than procalcitonin across White, Black and other races, suggesting its potential for improved diagnostic accuracy in minority populations. We also observed broad generalizability across patients with diverse suspected infection sites. Notably, we did not observe significant differences in the performance of TriVerity among patients with infections localized at different anatomical sites, except for bloodstream infections, which is expected because bloodstream infections would lead to stronger immune responses.

Notably, our statistical analysis approach ensured inclusion of all patients, as there was no equivocal adjudication (adjudication categories were Yes, Probable, Uncertain or No) and included all patients with results that fell into the ‘Moderate’ TriVerity bands. This approach is critical because several studies of other tests have been published in which statistical analysis excluded both patients with uncertain adjudication (a clinically critical ‘gray zone’ population) or uninfected and those in the middle (for example, equivocal) interpretation bands, thereby unfairly overstating those testsʼ overall accuracy25,39,40,41,42.

Our study has limitations. First, without a true gold standard, we used clinical adjudication as the reference standard to determine patient infection status22. As expected, the accuracy of the TriVerity test for diagnosing bacterial and viral infections was highest in the consensus population (certain infection status). Although consensus adjudication is the most accurate comparator to assess infection status, it excludes a fraction of patients with uncertain infection. As expected, including patients with uncertain infection status reduced the accuracy due to potential misclassifications in forced adjudication. Second, the prevalence of bacterial infections was higher than viral infections and non-infectious diseases, which was likely driven by (1) the broad inclusion criteria resulting in approximately 50% of all patients having abdominal, skin and soft tissue or urinary tract diseases (rarely caused by viral pathogens) and (2) the seasonality of bacterial versus viral infection. Depending on local epidemiology and setting, TriVerity may have varying positive and negative predictive values. However, we note that SEPSIS-SHIELD enrolled patients over cold and other seasons. Therefore, we think that the results reported here are unlikely to be affected by these factors. Furthermore, the development of the machine-learning-based TriVerity classifiers was based on a large number of datasets from highly diverse cohorts of patients with suspected infections and/or suspected sepsis43. Lastly, the percentage of patients who required ICU-level care was lower than expected; thus, although results of the TriVerity Severity score indicate high accuracy for the prediction of ‘ICU-level care’, future validation studies are warranted prior to use in substantially sicker populations. We previously showed consistently high accuracy of the TriVerity Severity score among patients admitted to the ICU44,45,46.

In conclusion, TriVerity rapidly detects bacterial infections, viral infections and bacterial–viral co-infections, can rule out bacterial and viral infections and has superior accuracy compared to routine biomarkers. TriVerity can also predict the severity of illness and, when combined with clinical scores, adds granularity and potential clinical utility for predicting severe infection, including sepsis. Interventional studies are needed, and are currently underway, to provide evidence of actionability and clinical benefit of TriVerity. These will determine whether TriVerity can enable personalized management of patients with suspected acute infections and suspected sepsis for improved overall healthcare outcomes.

Methods

Ethics statement

This study was approved by local institutional review boards (enrolling sites) or a central institutional review board (Advarra). Written consent was obtained from each patient. The study was conducted with the highest respect for the individual patients and in accordance with the protocol; the ethical principles originating from the Declaration of Helsinki; the informed consent regulations stated in Title 21 Code of Federal Regulations, Part 50; International Council for Harmonization of Technical Requirements for Pharmaceuticals for Human Use; Good Clinical Practice (E6) §4.8; and all applicable local and FDA regulations.

Study design

Patients were enrolled in the ‘TriVerity in the Diagnosis and Prognosis of Emergency Department Patients with Suspected Infections and Suspected Sepsis’ (SEPSIS-SHIELD; NCT04094818) study, a prospective, non-interventional, minimal-risk study enrolling adult patients presenting to the emergency department with (1) suspected acute infection and at least one abnormal vital sign or (2) suspected sepsis with at least two vital sign changes and a blood culture order. Patients were enrolled from 22 emergency departments (Fig. 1 and Supplementary Table 1). Emergency departments of community and academic hospitals were located at 21 geographically diverse locations throughout the United States and at one site in Europe (Supplementary Table 1). During the ‘FROZEN’ phase of the study (2 March 2020 to 16 February 2023), blood samples were frozen and sent to one of two reference laboratories to perform TriVerity after thawing the frozen samples. During the ‘FRESH’ phase of the study (8 December 2023 to 28 May 2024), TriVerity was performed on freshly obtained blood samples (no freezing) at the enrolling site. Fresh–frozen equivalency was established by comparing the accuracy between results obtained in the FRESH phase versus those in the FROZEN phase of the study (Supplementary Fig. 4).

The study plan, conduct and statistical analysis were guided by detailed discussions with the FDA via the Q-Submission Program and in sprint discussions under the FDA’s Breakthrough Devices Program. Power analyses were based on performance characteristics of TriVerity (bacterial, viral and illness severity) and estimated prevalences of infections, non-infectious differential diagnoses and severe outcomes.

Patients and sample and data collection

Inclusion criteria were as follows:

  1. 1.

    age ≥18 years

  2. 2.

    suspected acute infection (for example, respiratory, urinary, abdominal, skin and soft tissue infection, meningitis/encephalitis or any other infection)

  3. 3.

    at least one vital sign change (heart rate >90 beats per minute, temperature >38 °C or <36 °C, respiratory rate >20 breaths per minute or PaO2 <60 mmHg or SpO2 < 90%, systolic heart pressure <100 mmHg, altered mental status per clinical examination) or suspected sepsis of any cause (defined as blood culture order by the treating physician and at least two vital sign changes)

  4. 4.

    able to provide informed consent or consent by a legally authorized representative

Exclusion criteria were as follows:

  1. 1.

    Patient-reported treatment with systemic antibiotics, systemic antiviral agents or systemic antifungal agents within the past 7 days prior to the emergency department visit (use of antiviral treatment for HIV and hepatitis B and C, topical antibiotics, topical antivirals or topical antifungal agents, anti-herpes prophylaxis, perioperative (prophylactic) antibiotics and a single dose of antimicrobials during the present emergency department visit (<10 hours before blood draw) did not result in exclusion)

  2. 2.

    patients receiving palliative or hospice care or those receiving limited interventional care

  3. 3.

    prisoners, those mentally disabled or those unable to give consent

  4. 4.

    patients receiving experimental therapy or already enrolled in an interventional clinical trial in which a patient received some type of intervention, which could include, but was not limited to, investigational drugs, medical devices or vaccines

  5. 5.

    patients previously enrolled in this clinical trial

Of note, patients with any form of immunosuppression were eligible for SEPSIS-SHIELD.

All patients had routine samples collected as per the standard of care and were managed as per standard of care independent of the study conduct. The study intervention consisted of blood collection via venipuncture (2.5 ml of whole blood into PAXgene Blood RNA Tubes (PreAnalytix) for TriVerity testing and 5 ml of whole blood for central laboratory testing of serum CRP and procalcitonin) and a nasopharyngeal swab collection from patients with suspected upper respiratory tract infections. Patients were followed for up to 28 days; those discharged earlier were contacted via phone call to collect follow-up information (readmission, etc.).

Data collection using a case report form occurred from the day of presentation in the emergency department (covering 7 days before enrollment) to the follow-up phone call after day 28. Data collected included detailed information on demographics; medical history; clinical, laboratory and imaging findings; and information related to management, including treatment with antibiotics and ICU-level treatment. Furthermore, at the time of enrollment, a questionnaire was presented to the emergency department provider and was completed no later than 24 hours after the time of enrollment. The questionnaire included questions on the provider’s presumptive diagnosis, the provider’s assessment of probability of bacterial and viral infection as well as the need for ‘ICU-level care’. All data were stored pseudonymized in a secure database (Medrio).

Endpoints

The primary endpoints for bacterial and viral etiology were determined by independent adjudicators following a transparent and standardized clinical adjudication process developed for the SEPSIS-SHIELD trial by a multidisciplinary team of physicians and laboratorians22. Adjudicators were blinded to the TriVerity results at all times and did not adjudicate cases enrolled at their own institution. In brief, two independent physicians (randomly chosen from a pool of 12 emergency department physicians) reviewed comprehensive clinical, laboratory and other patient information to adjudicate the presence or absence of bacterial and viral infections into Yes, Probable, Unlikely and No categories (no equivocal adjudication allowed). Discordant cases were resolved by one of two expert physicians (experienced and involved in generating the adjudication protocol) who were blinded to the initial reviewer results. Cases adjudicated as Yes and No (the subgroup with certain infection status) formed the consensus cohort presented in the main body of this report. Cases adjudicated as Probable or Unlikely were forced into categories of Yes/Probable and No/Unlikely to form the forced cohort (all patients including uncertain and certain adjudications). Results for the forced adjudication analysis are presented in Extended Data Table 2a–d. The primary severity outcome (patients requiring ‘ICU-level care’) was defined as receipt of acute mechanical ventilation, vasopressor and/or new RRT within 7 days. Admission to the ICU was captured independently.

TriVerity test

TriVerity (formerly HostDx or InSep) is a gene expression profiling assay that quantifies the relative expression of 29 host response genes from 2.5 ml of whole blood collected in PAXgene Blood RNA Tubes. The TriVerity system (Extended Data Fig. 1a and Supplementary Fig. 5a) comprises the TriVerity cartridge and the Myrna instrument with embedded software and proprietary preprocessing and machine learning classification algorithms that process expression data and deliver result readouts. In brief, the 29 mRNAs were selected from extensive multicohort analyses of thousands of blood transcriptome profiles across tens of independent cohorts to diagnose presence16, type15 and severity17 of infection and were further optimized19. On the Myrna device, the signal processing estimates 29 relative gene expression values based on the cartridge well camera images taken at regular intervals during loop-mediated isothermal amplification47. We apply two logistic regression classifiers to the mRNA expression data to estimate probabilities of bacterial, viral or no infection and the probability of severe illness. These classifiers (described in detail in ref. 43) were finetuned using a combination of hyperparameter optimization and grouped cross-validation approaches, which was found to reduce overfitting during a classifier development. The system is designed for single-sample testing. TriVerity converts probabilities to the three scores corresponding to (1) the likelihood of a bacterial infection, (2) the likelihood of a viral infection and (3) the severity of the patient’s illness. They fall into actionable interpretation bands ranging from Very High to High, Moderate, Low and Very Low (Extended Data Fig. 1b). TriVerity received FDA clearance on 10 January 2025. Results of analytical studies, including limit of detection, equivalency of fresh versus frozen samples, interfering substances, quality controls and other details of the TriVerity system, are described in the FDA Decision Summary (https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/pmn.cfm?ID=K241676) and in a separate report48.

Statistical methods

The analysis population consisted of all patients who provided consent, met all inclusion criteria, met no exclusion criteria and had PAXgene blood samples successfully processed (resulting in three TriVerity scores). The overall analysis population was divided into diagnostic and prognostic analysis populations. The diagnostic endpoint population included patients who had been adjudicated for bacterial and viral infection to define the ground truth, defined as either a certain (Yes or No) or an uncertain (Probable or Unlikely) infection status. The diagnostic population was further classified as Consensus (only contains patients with a certain (Yes or No) adjudication) and Forced (all patients adjudicated as (Yes or Probable) versus (No or Unlikely)). The primary prognostic endpoint population included patients with complete information regarding the presence of acute need for mechanical ventilation, vasopressor use and/or RRT within 7 days. The secondary prognostic endpoint included patients who met the primary endpoint and/or 28-day mortality.

TriVerity bands were assessed in terms of likelihood ratio, sensitivity, specificity and predictive values (post-test probabilities) dependent on the band (whether intended as a rule-in band with a case being a true positive or a rule-out band with a case being a false negative). For continuous variables (for example, age and WBCs), results were summarized with the numbers of observations, means, standard deviations, medians and ranges and 80% CIs. These calculations and endpoints were determined in discussions with the FDA and were used in prior FDA registrational clinical studies (https://www.accessdata.fda.gov/cdrh_docs/reviews/K210254.pdf). Fisher’s exact test determined significance for differences in need of ‘ICU-level care’ across patients as well as tested if there were significant differences in positive rate for each of the TriVerity Severity bands stratified by qSOFA scores. P values for differences between the AUROCs of TriVerity and biomarker values, clinical scores (qSOFA) and differences across races were calculated using DeLong’s test; P values comparing immunocompromised and immunocompetent patients as well as COVID-19 versus non-COVID-19 patients were calculated using bootstrapping. Statistical analyses were performed in SAS software version 9.4 or R software version 4.2.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.