Introduction

Meningitis is a life-threatening condition derived from the inflammation of the meninges, the membranes surrounding the central nervous system. Most meningitis cases are caused by viral pathogens and often have a benign course,1 but when caused by bacteria or fungi, they can rapidly progress to severe disease and jeopardize survival. Early suspicion, accurate differential diagnosis and prompt treatment are crucial to minimize mortality or sequelae among survivors.2,3

Although substantial progress has occurred in the last decades,4,5,6 the burden of acute bacterial meningitis (ABM) remains unacceptably high. ABM-associated mortality has declined since 1990 by 21%, but the goal of defeating meningitis -ambitiously set by the World Health Organization (WHO) to 2030 in their roadmap7 - seems, in the short-term, unrealistic. Meningitis cases among children under the age of 5 years accounts for over half (51%) of all cases (1.28 M annual cases), with 112,000 deaths estimated to occur in 2019 (47% of the total mortality share).7,8 Globally, meningitis resulted in 21.9 million disability-adjusted life-years lost and 1.48 million years of life lived with a disability.8,9 Among ABM survivors, sequelae are often severe and life-altering, including neuromotor disability, chronic seizures, hydrocephalus, hearing and visual loss, and cognitive and behavioral impairment, being present in around 13% of all cases.10,11 The risk of sequelae can be exacerbated by comorbidities, immunocompromising conditions, and, especially, delays in diagnosis and treatment.2

In children, clinical presentation of ABM is often subtle, insidious, or overlapping with that of many other common conditions, and can include fever, headache, impaired consciousness, neck stiffness or vomiting. Neonates and young infants have the most unspecific presentation, which can partially be explained by their immature immunological system, but also due to their different anatomy of the cranium, which allows -due to the unclosed fontanel- some degree of inflammation without presenting the typical irritation and compression symptoms that occur in older children and adults with a solid and closed surrounding bony structure.12,13

Given the high morbidity and mortality associated with ABM, and precisely on account of its unspecific and overlapping symptomatology, clinical guidelines in high-income countries (HICs) tend to proactively recommend conducting lumbar punctures (LPs) to rule out this condition,14 fostering a protocolized screening approach. An accurate confirmation of ABM requires the analysis of the cerebrospinal fluid (CSF) obtained through a LP. Such analysis may confirm the inflammatory nature of the process, by documenting an abnormally increased white blood cell (WBC) count in the fluid, and in some cases the identification of the causative pathogen. LP is not exempt of risks and requires clinical expertise, particularly when conducted in younger patients and neonates, and can induce adverse events such as apnea and desaturation in the previously sick newborn.15 Additionally, even with all available resources, 14–40% of LPs are hematic or traumatic. Neonates are at a higher risk of traumatic LPs, with the risk doubling in comparison to that of older children.15 The presence of blood in the CSF, including red blood cells (RBCs) and WBCs, further hinders the lab-interpretation of the CSF’s cellularity.16 Importantly, the invasiveness of the technique is a major concern for caregivers, being a stressful situation for them and patients.17 The protocolized approach to screen for ABM in place in most HICs induces the proactive performance of LPs in those children (0–28 days of life) with a higher risk of invasive bacterial disease, to ensure that no ABM cases are missed. However, and given the relatively low incidence of this infection, less than five percent of all LPs yield positive results.18,19,20

The burden of ABM relies mostly on low and middle-income countries (LMICs), especially in sub-Saharan Africa,9 but also in northern Africa (including Morocco).21,22

In LMICs, the invasiveness of the procedure and the lack of capacity to perform it by untrained personnel, along with the scarcity of advanced laboratory support services,23 make CSF analysis seldom feasible. As a result, meningitis is often managed empirically whenever suspected, and many initially subtle cases (such as febrile or clinically septic neonates) are missed. For instance, the WHO Integrated Management of Childhood Illness (IMCI) guidelines for neonates up to two months of age, and for children under 5, recommend treating with high doses of antibiotics patients with suspected meningitis prior to transferring them (if possible), with no mention to LPs.24,25 This likely majorly underestimates the number of real ABM cases26 and contributes to unnecessary prolonged and broad-spectrum antibiotic use, increasing the risk of antimicrobial resistances.27 Despite all these shortcomings, LP remains the gold standard for screening and diagnosing any form of meningitis.7

To try to bypass the aforementioned challenges, we have developed a novel non-invasive high-resolution ultrasound-based (HRUS) device (Neosonics®), capable of detecting backscatter signals from WBC in the CSF through the anterior fontanel. By using deep learning (DL) models, the technology is expected to classify patients according to their WBC level in CSF, thus theoretically allowing a non-invasive and quick approach for the screening of infant meningitis and, therefore, narrowing down indications of LPs.

The aim of this study is to validate the safety and accuracy of our device in patients with suspected meningitis, where this was eventually confirmed or ruled out by CSF laboratory-based WBC counting.

Methods

From June 2020 to June 2023, three Spanish University Hospitals (La Paz University Hospital, Madrid; Quirónsalud University Hospital, Madrid; Sant Joan de Déu University Hospital, Barcelona), one Mozambican public teaching hospital (Hospital Central de Maputo, Maputo) and one Moroccan University Hospital (Hôpital d’Enfants-Centre Hospitalier Universitaire Ibn Sina, Rabat) prospectively recruited neonates and infants with an open anterior fontanel and suspected meningitis. Inclusion criteria were: i. having an open anterior fontanel; ii. age <24 months iii. indication for a LP due to suspected meningitis; iv. a LP performed within 24 h from image acquisition; v. willingness to participate and have a consent form signed by legal guardians (convenience sample). Patients with central nervous system malformations, or a history of traumatic events or suspected/confirmed intracranial bleeding were excluded.

Patients were managed based on standard-of-care clinical guidelines at each of the hospitals. Clinicians were blinded to the results of the device evaluations at the time of decision-making. Positive microbiological results (bacterial growth in CSF or pathogen detection with molecular diagnostic techniques) were indicative of a meningitis confirmed case. WBC count was performed using the Fuchs-Rosenthal chamber in Spain, the Malassez chamber in Morocco, and Neubauer chamber in Mozambique. We applied a standard correction formula in hematic punctures by reducing 1 white blood cell (WBC) for each 1000 RBCs,28 and samples with ≥100.000 RBC were excluded from the analysis.29 For the purpose of the analysis, the positivity threshold was established at a WBC count in CSF 30 or more WBC/mm3. We systematically collected demographic data, clinical variables at presentation, CSF test results, blood test results, outcome and final diagnoses (according to ICD-10 guidelines), using the RedCap® open-access online platform.

The Neosonics® device is a non-commercially available device that has not yet received any certification clearance. It is a compact ultrasound system working at a central frequency of 20 MHz and able to detect white blood cells in suspension, to analyse serous body fluids composition at a higher sensitivity to changes not captured by conventional ultrasound systems. DL is used in the device to provide automatic and accurate classification information. All clinicians received training for the use of the device before starting the enrollment.

Clinicians who recruited patients received a 2-h training of the device use, irrespectively of their prior knowledge or expertise in ultrasounds. In this study, images were displayed to the user to generate a first proof-of-concept validation and to train an algorithm to automatically locate CSF regions (Fig. 1a). The process of acquiring CSF frames for analysis involves two steps. The initial step is a sanity check in low-resolution (LR) to manually choose images located in the CSF area and eliminating ones with visible poor coupling or bad image quality. A second step involves the acquisition of multiple images per participant in high-resolution (HR) (as many as possible during the exam, with no pre-established threshold). HR images were externally processed post-hoc by standard image processing techniques. For the algorithm training we included patients with optimal acquisitions, with the aim to feed the diagnostic tool with minimal errors.30 For case/control classifications, each frame in HR mode was labeled as “Control” if cellularity according to gold standard was below 30 cells/mm3 or “Case” if cellularity was equal to or above 30 cells/mm3, as most expected patients were neonates.31 The pre-processed images were then trained on a DL model based on the ResNet50 architecture initialized with pretrained weights,32 with a leave-one-out strategy to train the model in each iteration on N-1 patients and test the model on the remaining dataset (Fig. 1b). With these models, each single image was given a probability of either belonging to the case or the control class (Fig. 1c). The sensitivity and specificity at the frame level were computed as the true positive and true negative rate over single images, respectively. We calculated the median value over all images for each patient to obtain one final probability value per class and patient (Fig. 1c).30

Fig. 1: Pipeline from image acquisition to final prediction.
figure 1

a Image acquisition pipeline. Image of the prototype of the customized HRUS probe placed on top of a newborns’ fontanelle, with image display shown on a computer screen. The image is first shown in LR, with visible structures of the fontanelle, the cortex, and the region of interest: CSF space. A LR image provides the first sanity control to avoid bad coupling images and ensure CSF space is located. Multiple HR acquisitions are then obtained in areas of CSF as it naturally flows beneath the fontanelle. Examples of different WBC concentrations are shown with their paired gold standard results (WBC count in LP) in each frame (a negative case with 0 WBC/mm3, a positive case with 85 WBC/mm3 (meningitis with low cellularity) and a positive case with 2400 WBC/mm3 (meningitis with high cellularity). b Each HR processed frame is labeled as meningitis if the LPs result shows 30 or more WBC/mm3 and as control if <30 WBC/mm3. By using a ResNet50 pretrained deep learning architecture, we apply a leave-one-out strategy to train the algorithm to be able to test it on each patient. c The algorithm is tested on all images acquired for each patient. Here we can see an example of a control (Patient “a”) and a meningitis case (Patient “b”). All images are processed by the algorithm, and each frame is given an individual probability to belong to the meningitis and the control classes. For each patient, images’ probabilities were combined by calculating the median probability to be a control or a case and are shown in the figure as black crosses. Finally, that median probability from all individuals’ images gives us a classification as a control or meningitis case with a probability in the range of [0,1].

The study was conducted following the clinical trial protocol available at https://kriba.ai The study in Spain was approved by the Spanish ethical committee (HULP 5183) and by the Spanish Agency of Medicines and Medical Devices (AEMPS 711/18/EC) according to the European Regulation of Medical Devices (EU) 2017/745; in Mozambique by the Mozambican National Bioethics Committee (608/CNBS/23); and in Morocco by the Moroccan Ethics Committee (C70-20), by Barcelona Clinic Hospital (HCB/2022/0300) and by the Directorate of Medicines and Pharmacy from Ministry of Health and Social Protection of the Kingdom of Morocco (N°01 RB/DMP/18/AIC).

Data analysis was performed using Microsoft Excel (Version 16.4) and Python 3.10.6. The deep learning models have been trained using the PyTorch framework with an NVIDIA GeForce RTX 3090 graphics processing unit. Figures have been edited using Adobe Illustrator 25.4.8.

Results

Between June 2020 and June 2023, 30 Spanish patients from 4 University Hospitals, 68 Mozambican participants from one public teaching hospital, and 127 patients from one Moroccan University Hospital were enrolled in the study. No adverse events were reported in relation to the use of the device throughout the study. We ultimately included 2237 optimal HR images to train the algorithm from 34 multiple-image acquisitions and paired LPs; 11 corresponded to patients with \(\ge\)30 WBC/mm3 in CSF and 23 controls. Once trained, the algorithm was tested on 3782 HR images corresponding to 76 acquisitions and paired LPs, of which 18 were cases with \(\ge\)30 WBC/mm3 and 58 were controls, including the cohort of 42 participants from Morocco as an independent test-set, not previously used to train the DL algorithm (Fig. 2).

Fig. 2: Participants flowchart.
figure 2

Images were only used to train a LR algorithm due to suboptimal acquisitions (wrong anatomical location or bad coupling) from 11/30 (37%) Spanish and 16/68 (24%) Mozambican participants. These images have been used to train a LR quality algorithm to automatically detect bad coupling and wrong anatomic locations. Among patients with correctly located images, 19/52 (37%) participants from Mozambique, 3/19 (5%) from Spain, and 15/42 (12%) did not have gold standard (LP results) to be compared with. In the Spanish cohorts, three patients had unsuccessful LPs. In Mozambique, they had either unsuccessful of very traumatic LPs, or the laboratory was unable to process the sample and come up with a cellularity count, and in Morocco LPs were either unsuccessful or were highly haematic and, therefore, excluded as reliable gold standard to compare the technology with. Finally, 17 Mozambican patients’ images were not used to train the algorithm because of low fontanelle tissue saturation, indicative of high signal attenuation, CSF images obtained out of the optimal focus or a small number of HR frames obtained. Due to clinical indications, two patients with meningitis (cases) had a repeated LP performed a few days after recruitment, resulting in an additional HRUS acquisition for each of them.

Demographics and patients’ characteristics

Participants from Mozambique were older (47.5 days [IQR 33–100]), compared to those from Spain (23 days [IQR 8–33.5]) and Morocco (3.5 days [IQR 2–19], p < 0.05 (Table 1). Patients received antibiotics prior to the LPs in 8/17 (47%) of the cases in Spain, in 12/17 (71%) in Mozambique and in 39/42 (93%) in Morocco. The median number of days from admission to first LP was 0 days (IQR 0–4) in Spain, and 1 day in Mozambique (IQR 0–2.5) and Morocco (IQR 0–3).

Table 1 Characteristics of participants, main CSF results, clinical diagnoses and results of the HRUS device

Among included participants (<100.000RBC/mm3), 23/42 (55%) LPs contained RBC > 1.000/mm3, requiring WBC count correction. In total, there were 23 meningitis cases defined by clinicians; 12 only due to a WBC count over their age-related threshold, 7 microbiologically confirmed cases with an elevated WBC count, and 4 microbiologically confirmed cases with <30WBC/mm3. In Mozambique, where CRP and PCT are not available, all LPs were performed due to clinical suspicion. The laboratory was not able to provide with a quantitative result of RBC when the sample contained more than 100.000 RBC, therefore highly hematic samples were excluded (Table 1).

Accuracy of the HR device

The mean time to obtain the HRUS scans with the prototype was 12.1 min (CI 9.35–14.77) per patient in Spain, 17.9 (CI 14.26–21.63) in Mozambique and 17.5 (CI 13.12–29.6) in Morocco. The device correctly classified 17/18 samples with a cellularity equal or higher than 30WBC/mm3 (sensitivity 94.4%), and 55/58 controls (specificity 94.8%), achieving an overall accuracy of 94.7%. Therefore, the positive predictive value (PPV) was 85.0% and the negative predictive value (NPV) 98.2%.

Some meningitis cases did not present with a high cellularity, and therefore, were not detected by the device. A 41 days-old meningitis patient from Mozambique presented 25 cells/mm3 at the time of the LP and HRUS acquisition, which was 6 days after starting antibiotics for meningitis suspicion - LP was not initially performed due to clinical instability. Four Moroccan participants had a microbiologically confirmed meningitis without elevated WBC count; 3 of them having received antibiotics prior to LP due to initial clinical instability, and one enterovirus case (Table 1).

We evaluated the devices’ performance both at image frame and patient levels. The accuracy (overall percentage of correct classification) reached 82.0%, with a sensitivity of 72.2% and a specificity of 86.3% for each single frame (Fig. S1, supplementary material). The final prediction on a patient level, though, relies on a global analysis of all the acquired HR frames per patient. The model gave for each image the probability for belonging either to the meningitis or the control class, and a final probability per patient based on the median probability of all patients’ frames (Table 1, Figs. 2, 3, and 4).

Fig. 3: Probability distribution of the model classification of all image frames shown per subject.
figure 3

Each image shows the probabilities of every frame acquired during the test. According to gold standard results, controls with cellularity under 30WBC/mm3 are labeled in blue and cases in orange. Each dot shows the probability of each obtained frame of belonging to the control class (X axis) versus belonging to the meningitis class (Y axis). The black cross shows for each subject the median value over probabilities over all dots - aka combined probabilities to belong to a “meningitis” or “control” class.

Fig. 4: Probability distribution of the model classification of overall image frames.
figure 4

Each dot shows the probability of each frame belonging to the control class (X axis) versus belonging to the meningitis class (Y axis). Blue dots belong to the control class individuals’ frames and orange dots to the meningitis class. The red cross shows the mean probability for all cases to be classified as “cases”, and the blue cross represents the mean probability of all control frames to be classified as “controls”.

Discussion

The primary objective of this study was to assess the accuracy of a novel, non-invasive device, to correctly classify infants and neonates according to their WBC count in CSF, compared to the gold standard technique: the LP and laboratory-based CSF analysis. This novel technology aims to provide results comparable to those obtained through CSF laboratory WBC counts to modulate meningitis suspicion and indications of LPs.

Meningitis screening in high- and low-income settings

In HICs, where the incidence of ABM is relatively low, the highly unspecific clinical presentation in neonates and infants, and its high associated mortality, necessarily lead to a proactive screening approach. These protocols result in a high number of LP indications, but a low yield of positive results.16,19 In fact, it is notable that 80% of patients in our Spanish cohort underwent a LP due to nonspecific presentations such as fever or elevated inflammatory markers in blood along with their young age.16,19 Importantly, 5% of Spanish initially enrolled patients were excluded due to unsuccessful LPs, highlighting the need for alternative tests when LPs are not feasible or when they are traumatic, as CSF results become challenging to interpret. Since one criterion for inclusion was to have a LP performed, we cannot account for patients with suspected meningitis in which LPs were not done due to clinical contraindications.

Efficient identification of meningitis cases remains a challenge in LMICs, where scarce resources often result in underdiagnosis or delayed diagnosis and treatment, significantly increasing morbidity, sequelae, and mortality rates.33,34,35,36,37 In Morocco, a middle-income country, where the study was performed in a neonatal unit, all participants had elevated CRP in blood, and only a few presented with clinical manifestations. Interestingly, up to 15 participants were excluded due to heavily traumatic LPs,29 inaccurate to serve as a reliable gold standard to compare the device with. Among included participants, more than a half necessitated for a leukocyte correction due to a moderately elevated RBC count (1.000–100.000 RBC/mm3).

In Mozambique, a low-income country, where CRP and PCT were not available, all LPs were done upon clinical suspicion of meningitis. This fact probably explains the older age of Mozambican patients, as clinical manifestations are milder among the youngest, and suggest that many neonates in LICs are usually not subject to the practice of a LP. Notably, we could not use data from almost 40% of the initially enrolled cohort due to lack of gold standard results to be compared to, as LPs were either unsuccessful or highly contaminated with blood. This is likely caused by the generalized use of intramuscular - instead of higher-gauge - needles to perform the procedure.38 Obtaining successful CSF results was a challenge, even in the main city of Mozambique, in the context of a study, with extra staff and laboratory support.

Less than half of meningitis cases in the cohort had microbiological confirmation. This can be explained in LMICs by the very proactive use of empirical treatment given before transferring patients to a hospital with capacity to perform LPs,24,25 by the limited sensitivity of CSF cultures, and by the lack of molecular diagnostic methods in place in Mozambique. Furthermore, almost all LPs were performed at least one day after admission and antibiotic treatment, jeopardizing the reliability of laboratory-based tests. Altogether, this supports the idea of meningitis being mostly treated based on clinical suspicion in LMICs, as indicated by IMCI guidelines,24 which has been shown inaccurate.39

Despite the difficulty of obtaining reliable golden standard results in our cohort, which has been a limitation, our results reflect the reality of meningitis diagnosis challenges globally; LPs are very difficult to obtain and analyse in LMICs, and even when all the resources are available, they are often performed after a necessary stabilization of critically ill patients and antibiotic treatment. Also, they are often traumatic, necessitating for correction formulas to obtain an approximate number of WBCs. Of note, for the study purposes we have used the 1:1000 correction formula, but the 5 different hospitals used in their clinical practice up to 3 different correction methods.29

The most important strength of the HRUS device is its non-invasiveness, which would allow clinicians to narrow down indications of LPs, especially in HICs where complementary tools (such as CRP and PCT) are available19; to perform the test even in severely ill patients much before the clinical stability necessary for LPs, and to repeat it as many times as necessary throughout the course of the disease, for instance to monitor the response to treatment. Finally, given the high incidence of traumatic LPs, especially among neonates, the non-invasive device could, in many occasions, work as an alternative method to modulate clinical suspicion of meningitis.

HRUS device’s training and performance

The HRUS prototype device correctly classified 17/18 patients with 30 or more WBC/mm3 (sensitivity 94.4%) and 55/58 patients with <30 WBC/mm2 (specificity 94.8%). Sixty-nine initially enrolled patients did not present good quality HR images, manually selected by clinicians, of the CSF. However, most of them were among the first patients enrolled, which indicates that were due to the usual learning process of both clinicians and engineers in the implementation of a novel technique. Nevertheless, their LR images have been used to develop a quality control algorithm able to automatically locate appropriate CSF locations,30 which is already being used in the new prototype of the device.

There were 5 participants with clinically diagnosed meningitis and <30 WBC/mm3. However, the aim of the device is to be equal as the laboratory-based WBC count and has been shown as accurate in doing that, and all patients with high suspicion of meningitis should always be managed accordingly, as if they had a low WBC count obtained through a laboratory test. While we cannot know the CSF count prior to their antibiotics administration and clinical stability, one can assume that probably, a HRUS test done upon arrival could potentially show higher levels of WBC than those obtained through LP after treatment. This delayed LP performance is common, globally, when infants are clinically unstable, but especially in LMICs, as LPs are not a part of the initial protocol, and antibiotics are administered empirically.24

Limitations

The device wrongly classified one patient with 40WBC/mm3 as a control. This participant had a traumatic LP and presented 50 total WBC with 10.000 RBC. However, by applying the blood RBC/WBC ratio, which was 343 (instead of 1000) in this case, the result would have been 21 corrected WBC and, the classification of HRUS would have been correct. We did not use this alternative conversion to keep consistency across the cohort, but these results highlight that the variability of interpretation of the gold standard29 poses a challenge in evaluating the diagnostic accuracy of an alternative test. This is especially important in samples with relatively low WBC counts, as interobserver variability and conversion factors can affect the final classification (meningitis vs. no-meningitis). Also, this participant had a thick fontanel (>4 mm), and we have observed that the devices’ precision decreases in patients with thicker fontanels while clinical features or severity don’t seem to alter the precision of the results (Fig. S2, supplementary material). To try and overcome that, in a newer version of the technology the signal strength is increased when the fontanel is thicker than 4 mm. The device misclassified three participants with <30WBC/mm3. Their CSF space was very narrow (under 2 mm) and their images seem to show a high level of clutter noise. Further investigations are needed though to overcome these challenges, and a larger number of acquisitions per patient has been implemented in our current investigations. Images acquisition with this prototype took around 12 min per patient, which is suboptimal. That was mainly due to the necessity to manually locate CSF areas. The automated version of the device being used in the current investigations bypasses this challenge, and the usability goal is to obtain good quality images in under three minutes per patient. This new prototype will be tested in the efficacy phase of the study. While these proof-of-concept results are promising, an efficacy phase with a much larger number of patients is necessary to confirm that all limitations can be addressed while maintaining the high accuracy of the predictions.

The non-invasive WBC counter, can lead to a more accurate indication for LP and/or indications of treatment, resulting in better patient outcomes and in a reduction of the risk of long-term sequelae,7,8,9 along with a likely more cost-effective use of resources in all settings.40 In HICs it would especially spare patients’ discomfort and health-care systems’ costs, and in LMICs it would be effective in reducing the burden of sequelae and mortality attributed to meningitis.

In conclusion, this non-invasive ultrasound-based technique, utilizing DL models to determine WBC in CSF, shows great promise in screening for meningitis in neonates and infants with a permeable fontanel. In this proof-of-concept study, the device showed a high-accuracy in classifying patients according to their laboratory WBC count. Most outstanding benefits of the device would be to modulate meningitis suspicion when LPs and CSF analyses are not feasible (severe patients globally and most patients from LMICs). Importantly, it would also increase meningitis suspicion in LICs, especially among the youngest, in whom LPs are not performed systematically, while it would decrease indications for LPs in HICs among patients with low degree of suspicion. This could allow a more efficient use of resources and efforts in defeating meningitis strategies, improving patient outcomes at both the individual and global health levels.