NGS-based Aspergillus detection in plasma and lung lavage of children with invasive pulmonary aspergillosis

Wesdorp, Emmy; Rotte, Laura; Chen, Li-Ting; Jager, Myrthe; Besselink, Nicolle; Vermeulen, Carlo; Hagen, Ferry; van der Bruggen, Tjomme; Lindemans, Caroline; Wolfs, Tom; Bont, Louis; de Ridder, Jeroen

doi:10.1038/s41525-025-00482-8

Article
Open access
Published: 17 March 2025

npj Genomic Medicine volume 10, Article number: 24 (2025)

Abstract

In immunocompromised pediatric patients, diagnosing invasive pulmonary aspergillosis (IPA) poses a significant challenge. Next-Generation Sequencing (NGS) shows promise for detecting fungal DNA but lacks standardization. This study aims to advance towards clinical evaluation of liquid biopsy NGS for Aspergillus detection, through an evaluation of wet-lab procedures and computational analysis. Our findings support using both CHM13v2.0 and GRCh38.p14 in host-read mapping to reduce fungal false-positives. We demonstrate the sensitivity of our custom kraken2 database, cRE.21, in detecting Aspergillus species. Additionally, cell-free DNA sequencing shows superior performance to whole-cell DNA sequencing by recovering higher fractions of fungal DNA in lung fluid (bronchoalveolar lavage [BAL] fluid) and plasma samples from pediatric patients with probable IPA. In a proof-of-principle, A. fumigatus was identified in 5 out of 7 BAL fluid samples and 3 out of 5 plasma samples. This optimized workflow can advance fungal-NGS research and represents a step towards enhancing diagnostic certainty by enabling more sensitive and accurate species-level diagnosis of IPA in immunocompromised patients.

Introduction

Invasive mold disease (IMD) is a threat to immunocompromised children, especially those with hematological malignancies or undergoing hematopoietic stem cell transplantation (HSCT)¹. Despite antifungal prophylaxis, breakthrough IMD can still occur, with an incidence of up to 20%^2,3,4,5,6, and Aspergillus is the most common cause⁷. Early and accurate identification of fungal pathogens is crucial for tailoring antifungal treatment, because different pathogenic species require different antifungal treatments. For aspergillosis, an azole is recommended as first-line treatment, whereas for mucormycosis amphotericin B is the first-choice treatment⁸.

The current diagnostic toolbox for IMD includes radiologic imaging, microbiological bronchoalveolar lavage (BAL) fluid analysis (i.e., culture, antigen testing and PCR) and antigen testing on serum. While valuable⁹, these microbiological tests have limited sensitivity, particularly because prior antifungal treatment, which is common in pediatric patients, compromises their performance. For instance, BAL PCR shows a sensitivity of 17% (95% CI, 5–45%), while BAL galactomannan has a sensitivity of 60%^10,11. Additionally, the galactomannan antigen test does not provide species level identification, which is crucial for differentiating invasive aspergillosis from invasive disease caused by non-Aspergillus species. Therefore, there is an urgent clinical need to expand the diagnostic toolbox for early, sensitive and accurate species level diagnosis, preventing disease progression, improving patient outcomes and avoiding unnecessary exposure to prolonged toxic antifungal therapy.

Microbial next-generation sequencing (NGS) detects microbial DNA of pathogens in patients with infectious diseases and holds promise for IMD diagnosis^{12,13,14,15,16}. Microbial NGS enables species-level identification of pathogens and is regarded as sensitive when applied to BAL samples¹⁶, while its sensitivity in plasma appears to depend on the specific pathogen responsible for the IMD^15,16. Yet, the preferred microbial NGS workflow for pediatric IMD diagnosis is unclear and multiple technical gaps exist. Computationally, there is a lack of standardization of taxonomic classification for fungal identification in samples from patients with IMD. Specifically, the impacts of reference database composition, genomic sequence processing (i.e., masking low-complexity and contaminated regions) and threshold settings for sequencing read classification in fungal diagnosis remain unclear. Given that only a small fraction of short-read sequences corresponds to potential pathogens, any taxonomic misassignment due to suboptimal computational methods or parameters can greatly affect pathogen identification and subsequent therapeutic decisions. Such misclassification may occur when reads coincidentally match multiple genomes in the reference database, or persistently match an incorrect genome or no genome at all, leading to false positive or false negative outcomes¹⁷.

Additionally, it remains unclear what the optimal wet-lab strategy is to maximize fungal DNA yield. Specifically, little is known about the impact of sample source, DNA type, and DNA isolation method on the success of fungal diagnosis. Although traditionally whole-cell DNA (wcDNA) sequencing has been applied to samples collected near the infection sources, like BAL fluid, recent research indicates that BAL fluid cell-free DNA (cfDNA) sequencing may outperform wcDNA sequencing in pulmonary infections¹⁸. At the same time, plasma microbial cfDNA sequencing has also shown potential in mostly small cohort studies involving adult IMD patients^12,13,15,16, and one pediatric study¹⁴, highlighting its potential but also emphasizing the need for further investigation. Methods like DNA isolation and adapter ligation can also affect fungal DNA abundance. Although single-stranded (ss) sequencing libraries are preferred for e.g., bacterial and viral cfDNA over double-stranded (ds) ligation-based DNA library preparation¹⁹, further exploration through comparative evaluation has not yet been done to determine the optimal approach for recovering fungal mold DNA from liquid biopsies.

In this study, we focus on the detection of Aspergillus, the predominant pathogen associated with pulmonary IMD. We aim to close the above-mentioned technical knowledge gaps by optimizing six key steps through comparative experimental testing (Q#1-6; detailed in Fig. 1). We introduce a refined wet-lab strategy together with an open-source cfDNA pathogen identification workflow, referred to as the cell-free DNA Single-strand Pathogen Identification pipeline (cfSPI). cfSPI is optimized for the detection of Aspergillus species with minimal false-positive taxonomic mislabeling and maximal accuracy of true-positive detections. The cfSPI pipeline uses paired-end Illumina sequencing data and incorporates host genome mapping. Unmapped reads are subsequently classified using kraken2²⁰, with enhancements such as an improved reference database through dustmasking and cleanup^20,21, and an optimal confidence threshold (CT) for classification accuracy. We show that these factors, all refined in this study, can impact Aspergillus classification accuracy^{17,22,23,24,25}.

**Fig. 1: NGS library strategies and the cfSPI open-source workflow for *Aspergillus* detection.**

Finally, as a proof-of-principle, we applied cfDNA sequencing with cfSPI to seven pediatric patients (7 BAL fluid; 5 plasma) with invasive pulmonary aspergillosis (IPA) and eighteen external controls (9 BAL fluid; 9 plasma), successfully detecting Aspergillus species in the majority (6/7) of these IPA cases. This work establishes the groundwork for large cohort evaluations of the accuracy and sensitivity of cfDNA NGS for Aspergillus diagnosis in suspected patients, ultimately contributing to the potential implementation of NGS in the IMD diagnostic work-up.

Results

cfSPI is an open-source pipeline optimized for accurate Aspergillus detection. The pipeline processes paired-end Illumina sequencing data through quality filtering, host genome mapping, and classification using kraken2, a high-performance DNA-to-DNA tool leveraging 31-mer matches against a reference database. Each step is carefully fine-tuned, as described in the next section. To validate the results produced by cfSPI, we generated 87 simulated Illumina sequencing cfDNA datasets (55 Aspergillus, 7 Penicillium, 25 other fungi). Moreover, we performed Illumina sequencing on samples from seven patients with probable invasive pulmonary aspergillosis and 18 external controls to further showcase the utility of cfSPI on patient data.

Optimizing host read subtraction and kraken2 database composition for the cfSPI pipeline

Detecting Aspergillus-derived DNA fragments from (cf)DNA sequencing data is a ‘needle-in-a-haystack’ challenge, where the vast majority of DNA reads will be derived from the host. For this reason, host-read subtraction through mapping to the host genome is a critical initial step in cfSPI to minimize the risk of overestimating microbial counts. Previous work already highlighted that improper subtraction can inflate bacterial counts²³. There are a number of frequently used human genome versions, such as GRCh38.p14 and CHM13v2, which vary in completeness. The impact of mapping to these different genome versions on detecting fungal-derived reads in liquid biopsies remains unclear. To evaluate this, we mapped the sequencing reads from our external control samples (9 BAL; 9 plasma) to reference genomes GRCh38.p14 or CHM13v2. In addition, we performed dual-mapping, in which reads were mapped to a combined reference containing both GRCh38.p14 and CHM13v2. Our results revealed that both the CHM13v2 and the dual-mapping strategies significantly reduced the fraction of unmapped reads compared to mapping to GRCh38.p14, with rates dropping from 2.27% to 1.77% and 1.71%, respectively (Fig. 2a).

**Fig. 2: Determining the impact of CHM13v2 host genome mapping for optimizing microbial read quantification in control samples.**

Unmapped reads were taxonomically classified using kraken2 (see Methods), using the default confidence threshold (CT = 0). We found that fungal-classified reads dropped from 81.60 reads per million (RPM) with GRCh38.p14 to 60.23 RPM and 58.86 RPM with CHM13v2 and dual-mapping (Fig. 2b)²⁴. These findings indicate that including CHM13v2 is essential to prevent inflated fungal counts. Overall, dual-mapping emerges as the preferred strategy for reducing misclassifications to the fungal kingdom.
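The reads-per-million figures above can be illustrated with a minimal sketch that reads clade-level counts from a kraken2-style report and normalizes them. The report text and the library size below are invented for illustration, the function names are hypothetical, and the choice of total sequenced reads as the denominator is an assumption; the normalization actually used by cfSPI may differ.

```python
from typing import Dict

# Invented, heavily simplified kraken2-style report (tab-separated columns:
# percent, clade reads, direct reads, rank code, taxid, indented name).
EXAMPLE_REPORT = "\n".join([
    " 70.00\t700000\t700000\tU\t0\tunclassified",
    " 30.00\t300000\t500\tR\t1\troot",
    " 29.00\t290000\t290000\tS\t9606\t    Homo sapiens",
    "  0.30\t3000\t400\tK\t4751\t    Fungi",
    "  0.01\t100\t20\tG\t5052\t      Aspergillus",
    "  0.01\t80\t80\tS\t746128\t        Aspergillus fumigatus",
])


def clade_reads_by_name(report_text: str) -> Dict[str, int]:
    """Map each taxon name to the number of reads in its clade."""
    counts: Dict[str, int] = {}
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        counts[fields[-1].strip()] = int(fields[1])
    return counts


def reads_per_million(clade_reads: int, total_reads: int) -> float:
    """Normalize a read count to reads per million (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return clade_reads * 1_000_000 / total_reads


# Hypothetical library size before host-read subtraction (an assumption).
TOTAL_SEQUENCED_READS = 50_000_000

counts = clade_reads_by_name(EXAMPLE_REPORT)
fungal_rpm = reads_per_million(counts["Fungi"], TOTAL_SEQUENCED_READS)
print(f"Fungal RPM: {fungal_rpm:.2f}")
```

Because only unmapped reads reach the classifier, the library size has to be supplied separately rather than summed from the report.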

Next, we focused on optimizing the kraken2-mediated taxonomic classification of non-human reads. To address the possibility of human reads remaining unidentified during host mapping, we evaluated whether incorporating CHM13v2 into the standard NCBI RefSeq database used by kraken2, which traditionally included only GRCh38.p14, could help reduce misclassifications (see Supplementary Fig. 2 for database details). Indeed, inclusion of CHM13v2 (which we refer to as database ‘uR.7’) increased reads labeled as human (Fig. 2c, CT = 0, the default setting in kraken2; Supplementary Fig. 3a CT = 0.8), even after dual-mapping, while reducing fungal (Fig. 2d), but not other microbial counts (Supplementary Fig. 3b-c). Additionally, omission of CHM13v2 from the hash-table database led to misclassification of reads as microbial (Supplementary Fig. 4a), including as Aspergillus (Supplementary Fig. 4b). Therefore, in our cfSPI pipeline, we utilize kraken2 databases that include both CHM13v2 and GRCh38.p14 to further mitigate the risk of misclassifying human-derived reads as fungal or microbial taxa.
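To make the role of the confidence threshold concrete, the following is a simplified sketch of confidence scoring as described in the kraken2 documentation: a read's label is moved up the taxonomy until the fraction of its k-mers that fall within the labeled clade reaches the threshold. The toy taxonomy, the k-mer hit list, and the tie-breaking (first maximum) are assumptions made for illustration; the real tool works on minimizers and a compact hash table.

```python
from typing import Dict, List, Optional

# Toy taxonomy (child -> parent); an illustrative assumption, not a real database.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Homo sapiens": "root",
    "Fungi": "root",
    "Aspergillus": "Fungi",
    "A. fumigatus": "Aspergillus",
    "A. fischeri": "Aspergillus",
}


def lineage(taxon: str) -> List[str]:
    """Return the taxon followed by all of its ancestors up to the root."""
    path = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def classify(kmer_hits: List[Optional[str]], threshold: float) -> str:
    """Label one read from its per-k-mer taxon hits (None = k-mer not in database)."""
    direct: Dict[str, int] = {}
    for hit in kmer_hits:
        if hit is not None:
            direct[hit] = direct.get(hit, 0) + 1
    if not direct:
        return "unclassified"

    # Initial call: the taxon whose root-to-leaf path collects the most hits.
    best = max(direct, key=lambda t: sum(direct.get(a, 0) for a in lineage(t)))

    # Confidence step: climb until (hits inside clade) / (all k-mers) >= threshold.
    node: Optional[str] = best
    while node is not None:
        in_clade = sum(n for hit, n in direct.items() if node in lineage(hit))
        if in_clade / len(kmer_hits) >= threshold:
            return node
        node = PARENT[node]
    return "unclassified"


# Ten k-mers from one hypothetical read.
hits = ["A. fumigatus"] * 5 + ["Aspergillus"] * 2 + ["Fungi"] + [None, None]
for ct in (0.0, 0.6, 0.75, 0.9):
    print(ct, classify(hits, ct))
```

With a threshold of 0 the most specific label is always kept, which is why higher thresholds trade species-level calls for fewer misassignments.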

The default kraken2 NCBI RefSeq database (‘uR.7 w/o CHM13v2’) traditionally contains only seven out of the over 300 known Aspergillus species (Supplementary Fig. 2a-b), leading to inadequate Aspergillus classification. Specifically, 88.1% of reads from 55 simulated Aspergillus datasets remained unclassified (i.e., not classified to any taxa; Fig. 3b) when using a CT of 0.8, due to the absence of these species in the database. With the aim to enhance (species-level) classification of Aspergillus, we replaced the fungal genomes in the uR.7 database with cleaned, dustmasked or unaltered fungal sequences from EuPathDB²⁶ and MycoCosm²⁷ (Fig. 3a; database details in Supplementary Fig. 2, Methods, and Aspergillus species in Supplementary Data 3). This resulted in new kraken2 databases: uRE.21 and dRE.21 (which include 21 Aspergillus species), uRE.31 and dRE.31 (which include 31 Aspergillus species), and dREM.258 and dREM.260 (which include 258 and 260 Aspergillus species, respectively).

**Fig. 3: Database-dependent taxon detection in simulated *Aspergillus* samples.**

When evaluating the performance of these new kraken2 databases, we first assessed the effects of dustmasking and cleanup, followed by the impact of database augmentation on the classification of simulated Illumina cfDNA Aspergillus datasets. Using a cleaned or dustmasked database proved crucial for preventing taxonomic misassignments of Aspergillus. With these databases, we found a slight but significant reduction in overall classification rates (7.0–8.2%) and true positives at the genus (6.7–7.6%) and species levels (8.1–9.1%) with a CT of 0.8 (Supplementary Fig. 5a-c, e-g). However, classification accuracy improved significantly at CT values < 0.2 (Supplementary Fig. 5h), reaching a 0.30–0.38% reduction in false positives (Supplementary Fig. 5d). These findings suggest that masking unreliable sequences in reference genomes is highly recommended. Second, we evaluated the impact of database augmentation on classification accuracy across different taxonomic levels. By incorporating fungal sequences from EuPathDB²⁶ and MycoCosm²⁷ into the kraken2 database, we identified a trade-off between database size and classification accuracy. Specifically, we observed the following trends linked to the extended databases: improved overall classification rates (Fig. 3b), increased true positives at the genus level (Fig. 3c), a broader range of detectable Aspergillus species (Supplementary Data 4), and a reduced false positive rate during species classification (Fig. 3e; Supplementary Fig. 6). However, we also observed a lower percentage of reads classified at the species level (Fig. 3d). Meanwhile, misclassification of Penicillium as Aspergillus generally remained low but increased when expanding the database with MycoCosm genomes, particularly in dREM.260, where we noted 1.97% misclassification at the genus level and 1.90% at the species level (Fig. 3f).

Overall, we conclude that medium-sized databases with 21 to 31 Aspergillus genomes are optimal for species level detection, while larger databases (258 to 260 genomes) excel in broader genus level detection. The curated cRE.21 database demonstrated the highest sensitivity for species level detection (mean 53.8%; Fig. 3d), while dREM.260 excelled in genus level detection (mean 73.6%; Fig. 3c). Consequently, the cfSPI pipeline uses cRE.21 for species identification and dREM.260 for genus identification.

Optimizing sample workup for fungal NGS

Sample preparation can impact the sensitivity of shotgun microbial NGS for Aspergillus detection. We isolated cfDNA and wcDNA from BAL fluid and cfDNA from plasma, constructing sequencing libraries using either single-stranded (ss) or double-stranded (ds) ligation methods (for schematic overview see Fig. 1; for details see Methods). Sequencing was performed on 60 libraries (see Supplementary Data 2). With the optimized computational workflow, elevated fungal counts are interpreted as indicative of improved fungal DNA retrieval rather than false positive observations.

Previous work, focusing on cfDNA in plasma, demonstrated that ss-ligation produced a higher yield of short (<100 bp) microbial cfDNA compared to dsDNA ligation¹⁹. We assessed whether the same holds true for fungal and Aspergillus cfDNA in our set of IPA patient samples and external control samples, using the cRE.21 database and a CT of 0.8. Across all samples, ss-ligation tended to yield higher fungal (Fig. 4a) and Aspergillus (Supplementary Fig. 7a) relative abundance in both plasma and BAL samples, although the difference was not significant (n.s., Wilcoxon rank-sum test; p > 0.05). We observed Aspergillus reads in almost all of these liquid biopsy samples (Supplementary Fig. 7c). In addition, ss-ligation resulted in a narrower library-size range compared to ds-ligation (Supplementary Fig. 8), circumventing DNA yield-reducing bead-based size selection (i.e., elimination of DNA molecules >700 bp; see Methods and Supplementary Data 2). Together, these results are consistent with earlier findings that ss-cfDNA NGS is more effective than ds-cfDNA NGS for the recovery of fungal DNA from liquid biopsy samples¹⁹.

**Fig. 4: ss-ligation of cfDNA most effective in retrieving fungal DNA.**

Further analysis of the ss-cfDNA workup revealed a significantly higher fungal (Fig. 4b) and Aspergillus (Supplementary Fig. 7b) abundance in cfDNA when compared to wcDNA BAL sequencing (p ≤ 0.05, one-tailed Wilcoxon rank-sum test with Bonferroni correction). This difference could not be attributed to a difference in sequencing depth, as there is no correlation between the total read count and the relative number of fungal reads in our external control samples (Supplementary Fig. 7e; p > 0.05, Pearson correlation). These observations thus confirm earlier reports^18,28 that cfDNA contains a relatively higher fraction of fungal DNA molecules than wcDNA. Moreover, our study reveals that fungal counts in IPA patient BAL fluid ss-cfDNA samples are, on average, 5.4x higher than the relative fungal abundance in IPA patient plasma samples (n.s., one-tailed Wilcoxon rank-sum test with Bonferroni correction). This may be attributed to the direct sampling of BAL fluid at the presumed site of the Aspergillus infection. Taken together, the ss-cfDNA library preparation method results in higher fungal DNA relative abundance from both BAL and plasma samples, with abundances tending to be higher in BAL fluid.

Optimizing confidence thresholding: analyzing theoretical minimum for Aspergillus detection

After application of the cfSPI pipeline some Aspergillus counts were above zero in control samples (see Supplementary Fig. 7b,d for Aspergillus prevalence and Supplementary Fig. 7b for relative abundance in external control samples). These Aspergillus background levels must thus be considered when conducting diagnostic testing. Overall, background levels in control samples were higher at the genus (dREM.260-mediated) than at the species level (cRE.21-mediated), and higher in plasma samples compared to BAL samples (mean 1.5x and 1.2x using the cRE.21 and dREM.260, respectively) (Fig. 5a,c). Furthermore, these background levels were influenced by classification confidence thresholding (Fig. 5a,c). Acknowledging that both computational choices thus directly affect our ability to detect elevated Aspergillus DNA levels in patients suspected of a pulmonary fungal infection, we developed a methodology to explore the relationship between fungal background levels, classification rates in simulated datasets, and theoretical Limits of Significant Detection (LoSD).

**Fig. 5: Computational analysis of the theoretical minimum fraction required for *Aspergillus* detection to optimize database and parameter selection.**

In this LoSD analysis, we computed the theoretical minimum number of molecules per million (MPM) necessary for the detection of significantly elevated Aspergillus taxa above the control background, i.e., the background levels observed in immunocompromised pediatric patients without suspicion of a fungal infection (see Methods for details). Based on the previously observed species level classification rates (Fig. 3), our hypothesis was that the cRE.21 database would outperform the dREM.260 for detecting Aspergillus in clinical samples. And indeed, our LoSD analysis showed that species detection with the dREM.260 necessitated a substantially higher cfDNA load than species level detection with the cRE.21 (Fig. 5b; for details see Supplementary Fig. 9a).

Recognizing the interplay between database and CT choices in our LoSD analysis, we established the optimal CT for each database. Species level cRE.21-mediated detection was most sensitive when employing a kraken2 CT of 0.4, while genus level dREM.260-mediated detection required a CT of 0.9 (Fig. 5b,d). These findings are based on plasma, where fungal load is generally lower, making optimal sensitivity critical for reliable detection. Employing the cRE.21 for species level detection (CT = 0.4), the theoretical minimum for detection was at most 4 MPM across all 20 simulated Aspergillus datasets of species included in the cRE.21 (Fig. 5b). When utilizing the dREM.260 (CT = 0.9) for genus level detection, detection of ≤4 MPM was achieved in 93% of the plasma and BAL ss-cfDNA simulations (n = 52), respectively (Fig. 5d). Surprisingly, the LoSD appeared relatively stable across different library sizes (Supplementary Fig. 10). Nevertheless, our findings discourage sequencing fewer than 40 million reads due to the adverse impact (i.e., substantial increase in the minimal required MPM) on the theoretical LoSD (Supplementary Fig. 10).
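As an illustration of how the two database-specific thresholds could be applied in practice, the sketch below assembles kraken2 command lines with the CT values reported above. The `--db`, `--confidence`, `--threads`, `--paired`, `--report` and `--output` options are standard kraken2 options; the database directories, file names, and the helper function itself are hypothetical and are not taken from the cfSPI code base.

```python
from typing import List

# Confidence thresholds reported in the text for each database.
CONFIDENCE = {"cRE.21": 0.4, "dREM.260": 0.9}


def kraken2_command(db_name: str, db_dir: str, reads_1: str, reads_2: str,
                    out_prefix: str, threads: int = 8) -> List[str]:
    """Build the argument list for one paired-end kraken2 run."""
    return [
        "kraken2",
        "--db", db_dir,
        "--confidence", str(CONFIDENCE[db_name]),
        "--threads", str(threads),
        "--paired",
        "--report", f"{out_prefix}.{db_name}.report.txt",
        "--output", f"{out_prefix}.{db_name}.kraken.txt",
        reads_1,
        reads_2,
    ]


# Hypothetical paths; the inputs are the read pairs left unmapped after host subtraction.
for name, directory in (("cRE.21", "/databases/cRE.21"),
                        ("dREM.260", "/databases/dREM.260")):
    cmd = kraken2_command(name, directory,
                          "sample.unmapped_R1.fastq", "sample.unmapped_R2.fastq",
                          "sample")
    print(" ".join(cmd))
```

The argument list can be passed to `subprocess.run(cmd, check=True)` on a system where kraken2 and the databases are installed.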

While the cRE.21 is thus established as the most sensitive for species level identification, we emphasize that the impact of database selection on the minimum required MPM can vary among different Aspergillus species. For example, the theoretical minimal A. oryzae MPM was 2 for both cRE.21 and dREM.260, while for A. niger our analysis indicated a minimum of 0.25-0.5 MPM required with cRE.21 compared to 2 MPM with dREM.260 (Supplementary Fig. 9b-c). Exceptions notwithstanding, we confirmed our prior hypothesis that detection of Aspergillus species should be performed using cRE.21, supplemented by dREM.260 for genus level detection if species-specific results are negative.

Diagnostic performance assessment: a proof-of-principle with seven IPA patients

As a proof-of-principle, we applied ss-cfDNA cfSPI to seven IPA cases which met the inclusion criteria (see Methods; Supplementary Fig 11; Table 1) and were classified as probable according to the EORTC/MSG criteria⁹ (A01-A05, A14-A15). Patient A03 had been classified as a probable IPA due to host factors (Table 1), repetitive high positive serum galactomannan, and suspected lesions on imaging (i.e., HRCT), but the initial BAL procedure showed negative results during diagnostic work-up. Subsequent repeat BAL procedure confirmed aspergillosis through positive BAL galactomannan at a later time point (this subsequent BAL sample was not included in our study). In our retrospective study, we subjected the initial BAL sample to ss-cfDNA NGS analysis. In total, we sequenced 7 BAL fluid samples (for A01-A05, A14-A15), paired with corresponding plasma (for A01-A05) and internal control plasma samples (for A01-A05) where possible.

Table 1 Clinical details of pediatric patients with probable IPA

Comparing IPA liquid biopsy samples (plasma and BAL of A01-A05 and A14-A15, obtained at diagnosis) to 18 external control liquid biopsy samples from immunocompromised pediatric patients without suspected IPA infection (9 BAL; 9 plasma), A. fumigatus was significantly elevated in 5/7 IPA BAL samples and 3/5 IPA plasma samples (Fig. 6a; mean pairwise Fisher’s exact test, p ≤ 0.001; for details see Methods). Importantly, none of the 18 external control samples tested positive with either of the databases (Supplementary Fig. 13b,d), indicating high specificity for ss-cfDNA NGS in immunocompromised pediatric patients. Similar results were observed when comparing IPA plasma to the internal control plasma, as shown in Supplementary Fig. 12a (pairwise Fisher’s exact test, p ≤ 0.001). The positive samples exhibited fractional abundance of A. fumigatus ranging between 1.26 and 40.90 RPM in BAL and 0.31 and 7.71 RPM in plasma (Fig. 6b). In total, 6/7 patients (all except patient A05) had a positive result in at least one liquid biopsy using cfSPI ss-cfDNA NGS. Notably, cfSPI did not detect A. fumigatus in the BAL of patient A05, aligning with negative GM test results at the time of collection. Furthermore, cfSPI was the only molecular test that could detect Aspergillus in patient A03 compared to standard fungal molecular diagnostics (Table 1), showcasing the potential added value of our workflow.
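The idea behind a mean pairwise Fisher’s exact test can be sketched as follows: the patient sample is compared with each control sample in a 2×2 table of target versus non-target reads, and the resulting p-values are averaged. This is a minimal illustration under stated assumptions (a one-sided test, read counts as the unit, invented example counts, hypothetical function names); the exact procedure is defined in the Methods and may differ.

```python
from math import exp, lgamma
from typing import List, Tuple


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    a, b: target and non-target reads in the patient sample
    c, d: target and non-target reads in the control sample
    Returns P(X >= a) under the hypergeometric null of equal proportions.
    """
    total = a + b + c + d
    target = a + c   # all target reads across both samples
    sample = a + b   # all reads in the patient sample
    log_denominator = log_comb(total, sample)
    p_value = 0.0
    for k in range(a, min(target, sample) + 1):
        p_value += exp(log_comb(target, k)
                       + log_comb(total - target, sample - k)
                       - log_denominator)
    return min(p_value, 1.0)


def mean_pairwise_p(patient: Tuple[int, int],
                    controls: List[Tuple[int, int]]) -> float:
    """Average one-sided p-value of one patient against each control.

    Each sample is given as (target_reads, total_reads).
    """
    p_values = [
        fisher_greater(patient[0], patient[1] - patient[0],
                       ctrl[0], ctrl[1] - ctrl[0])
        for ctrl in controls
    ]
    return sum(p_values) / len(p_values)


# Invented counts: (A. fumigatus reads, total reads) per sample.
controls = [(1, 1_000_000), (0, 1_000_000), (2, 1_000_000)]
patient = (40, 1_000_000)
print(mean_pairwise_p(patient, controls) <= 0.001)
```

Log-gamma arithmetic keeps the hypergeometric terms stable for libraries with millions of reads, where direct binomial coefficients would be impractically large.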

**Fig. 6: Elevated *Aspergillus* levels in a subset of IPA patient samples processed via ss-cfDNA NGS.**

The LoSD experiments demonstrated that if cRE.21-mediated species level diagnostics yield negative results, then the simultaneously conducted dREM.260-mediated genus level detection results should be interpreted. Among our IPA samples, the majority (8 out of 12) tested positive for A. fumigatus when using cRE.21, which led to no further interpretation of their dREM.260 results. The remaining 4 samples (BAL samples from patients A03 and A05, as well as plasma samples from patients A04 and A05), also tested negative with the dREM.260 database (Fig. 6c-d; and Supplementary Fig. 12b).
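The two-tier interpretation rule described above can be written out as a short sketch. The per-sample results are derived only from what the text states (BAL for A01-A05 and A14-A15, plasma for A01-A05, with the four listed samples negative in both databases); the function and variable names are hypothetical, and genus-level results of species-positive samples are recorded as not interpreted.

```python
from typing import Dict, Optional, Tuple

BAL_PATIENTS = ["A01", "A02", "A03", "A04", "A05", "A14", "A15"]
PLASMA_PATIENTS = ["A01", "A02", "A03", "A04", "A05"]

# Samples reported as negative with both cRE.21 and dREM.260.
NEGATIVE_IN_BOTH = {("A03", "BAL"), ("A05", "BAL"),
                    ("A04", "plasma"), ("A05", "plasma")}


def interpret(species_positive: bool, genus_positive: Optional[bool]) -> str:
    """Species-level result first; genus-level result only as a fallback."""
    if species_positive:
        return "positive (species level, cRE.21)"
    if genus_positive:
        return "positive (genus level, dREM.260)"
    return "negative"


def build_calls() -> Dict[Tuple[str, str], str]:
    """Apply the rule to every sample described in the text."""
    calls: Dict[Tuple[str, str], str] = {}
    for material, patients in (("BAL", BAL_PATIENTS), ("plasma", PLASMA_PATIENTS)):
        for patient in patients:
            if (patient, material) in NEGATIVE_IN_BOTH:
                calls[(patient, material)] = interpret(False, False)
            else:
                # Genus-level result is not interpreted for species-positive samples.
                calls[(patient, material)] = interpret(True, None)
    return calls


calls = build_calls()
positive_patients = {p for (p, _), call in calls.items() if call != "negative"}
print(sorted(positive_patients))
```

Aggregating per patient reproduces the statement that every patient except A05 had at least one positive liquid biopsy.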

Notably, the internal control sample of patient A05, the only patient in whom we could not detect Aspergillus with cfSPI in samples collected at diagnosis (Fig. 6; Supplementary Fig. 12), showed elevated A. fumigatus cfDNA levels 24 days before diagnostic work-up (Supplementary Fig. 13a; see Supplementary Fig. 11 for timeline). To gain insight into the time course of Aspergillus ss-cfDNA levels within this patient, we subjected two additional plasma samples (plasma A05-2 and A05-3; collected 31 and 38 days prior to diagnosis, respectively) to sequencing. Both samples showed no elevated Aspergillus cfDNA levels (Supplementary Fig. 13a,c; employing the cRE.21 and dREM.260, resp.), suggesting a potential false positive Aspergillus observation in patient A05 obtained by cfSPI 24 days before diagnosis. However, the positive internal control sample for patient A05 could also represent a true biological signal detected by cfSPI, even though the GM test was negative. In conclusion, elevated cfDNA levels detected by cfSPI in the absence of clinical or radiological symptoms, as observed in patient A05, should be interpreted with considerable caution at this time.

As a quality control, we systematically aligned the reads classified as A. fumigatus from each sample to all Aspergillus genomes incorporated in constructing the cRE.21 database. Across positive cases, a median of 97.9% and 97.6% of A. fumigatus classified reads aligned with a high mapping quality to A. fumigatus strains A1163 (= CBS121325) and Af293 (= CBS126847), respectively (Supplementary Fig. 14). Mapping to the closely related species A. fischeri and A. novofumigatus, on the other hand, only resulted in 54.0% and 36.2% mapped reads and low mapping quality (Supplementary Fig. 14). Minimal alignment was observed for all other Aspergillus species (Supplementary Fig. 14a). Reassuringly, a uniform distribution of cRE.21 classified reads was observed when mapping to the consensus genome A. fumigatus Af293 (Supplementary Fig. 15). Taken together, these findings confirm the presence of true-positive A. fumigatus classified cfDNA reads in our IPA patient liquid biopsy samples.
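A check of this kind can be sketched by computing, from alignments in SAM format, the fraction of classified reads that map to a given genome with high mapping quality. The MAPQ cutoff of 30, the tiny example alignment, and the function name are assumptions for illustration; the text does not state which aligner or cutoff was used.

```python
def high_mapq_fraction(sam_text: str, min_mapq: int = 30) -> float:
    """Fraction of primary records that are mapped with MAPQ >= min_mapq."""
    total = 0
    confident = 0
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary and supplementary alignments
        total += 1
        is_unmapped = bool(flag & 0x4)
        if not is_unmapped and int(fields[4]) >= min_mapq:
            confident += 1
    return confident / total if total else 0.0


# Invented four-read example: two confident hits, one low-MAPQ hit, one unmapped read.
EXAMPLE_SAM = "\n".join([
    "@SQ\tSN:contig_1\tLN:100000",
    "read1\t0\tcontig_1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "read2\t16\tcontig_1\t900\t60\t50M\t*\t0\t0\t*\t*",
    "read3\t0\tcontig_1\t4000\t3\t50M\t*\t0\t0\t*\t*",
    "read4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
])

print(f"{high_mapq_fraction(EXAMPLE_SAM):.2f}")
```

Running the same calculation against each candidate genome gives the kind of per-species comparison described above, where the true source species should stand out.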

Discussion

Our objective was to enhance the standardization of liquid-biopsy shotgun microbial NGS and to translate this knowledge into the cfSPI open-source analysis workflow that can be readily accessed and utilized by the scientific community. Our analysis involved both real-world (60 sequence libraries, generated from 46 different liquid biopsy samples; Supplementary Data 2) and simulated Illumina sequencing data (of 87 simulated datasets). We demonstrated that dual mapping to the human CHM13v2.0 and GRCh38.p14 references, along with the incorporation of both genomes into the classification database, is essential to prevent incorrect classification of human reads as microbial reads, potentially reducing the chances of false positive diagnoses. Additionally, we demonstrated that the NCBI RefSeq kraken2 standard database is unable to detect most Aspergillus species, thus highlighting the necessity of database extension, particularly accomplished by incorporation of (pathogenic) fungal species and closely related taxa to enable species level Aspergillus detection. From a wet lab perspective, our results indicate that ss-cfDNA sequencing is a superior method for detecting fungal cfDNA compared to wcDNA sequencing. This is supported by the fact that our wcDNA isolation protocol yielded a maximum fungal count of only 2.55 RPM while the cfDNA yielded up to 12.55 RPM. The insights gained from our comparative research are poised to enhance the optimization of microbial NGS procedures and contribute to exploration of more precise and reliable liquid biopsy-based diagnostics for IMD.

In our retrospective proof-of-principle study, we demonstrated that cfSPI enables the detection of elevated levels of A. fumigatus in 71% (5/7) of BAL samples and 60% (3/5) of plasma samples compared to controls. This suggests that ss-cfDNA sequencing of BAL fluid supernatant, a relatively understudied liquid biopsy type in pediatrics, may offer diagnostic potential, given that BAL fluid fungal fractions typically exceed those in patient plasma samples. This finding aligns with previous research on detecting bacterial and viral pathogens in local body fluids^29,30, as well as prior studies on Aspergillus detection in BAL samples from adults³¹. Nevertheless, due to the minimally invasive nature of plasma sampling, there remains a strong interest in exploring the diagnostic potential of plasma microbial NGS for IMD. The observed sensitivity of plasma sequencing (60%) is consistent with previous sequencing findings in adult cohorts, where sensitivity ranges between 38.5%¹⁶ and 61%¹⁵, suggesting that BAL testing might be unnecessary in up to 60% of IPA cases. Although this is a small proof-of-principle cohort, sequencing of BAL samples resulted in a better sensitivity than the standard diagnostics in BAL^10,11. Importantly, we noted minimal false positive rates, indicating that a positive cfSPI result offers compelling evidence for the presence of Aspergillus.

While the exclusive detection of A. fumigatus in our IPA patient set is in line with findings by Hill et al.¹⁵, the size and composition of our IPA patient set limit our ability to evaluate the effectiveness of cfSPI ss-cfDNA NGS in identifying other Aspergillus species. Nonetheless, our simulations suggest the capability of detecting diverse Aspergillus species and distinguishing clinically relevant ones such as A. fumigatus and A. terreus. The limited availability of retrospective samples also restricts our ability to definitively demonstrate (or rule out) potential additional benefits of dREM.260 as a supplementary test for pan-Aspergillus detection at the genus level. Our hypothesis therefore requires further investigation and validation: namely, that when the pathogen in a patient is not covered by the compact curated cRE.21 database, resulting in negative species-specific findings, the supplementary dREM.260 results could be crucial for comprehensive pan-Aspergillus detection. Moreover, a comparison between our optimized ss-cfDNA NGS workflow for Aspergillus detection and conventional diagnostics remains impossible due to the highly limited size of our proof-of-principle cohort and the exclusive application of ss-cfDNA NGS to probable IPA cases, thus precluding insight into its diagnostic performance in fungal infections.

To advance this technical research towards potential clinical implementation, our workflow still requires an external validation cohort comprising pediatric patients suspected of IMD, including those classified as possible, probable, and proven cases. A critical step in demonstrating clinical applicability is testing whether our cfSPI workflow maintains its sensitivity and specificity outside of a retrospective, controlled research environment. By making our workflow open-source and designed for in-house use, we aim to facilitate external validation efforts while minimizing turnaround times by eliminating the need for sample shipping. This approach also supports the development of local expertise and ensures adaptability to specific populations through the seamless integration of customized classification databases. These features provide a significant advantage over commercial tests like the Karius Test, whose database notably overlaps substantially with the genomes integrated in our cRE.21 database (see Supplementary Fig. 16).

Collectively, our study provides valuable insights into the use of ss-cfDNA NGS for Aspergillus detection in pediatric immunocompromised hosts. The cfSPI framework introduced here has the potential to expedite future fungal-NGS investigations through its open-source analysis workflow. Our findings, complemented with new simulations and LoSD analysis and coupled to benchmarking using short-read sequencing data from fungal culture isolates, can solidify the groundwork for enhancing these open-source detection tests, not only for Aspergillus but also for other pathogens responsible for IMD.

Methods

Aim of this study

The aim of this study is optimization of the microbial NGS workflow tailored to Aspergillus detection in liquid biopsy samples. This involves shotgun sequencing of liquid biopsy samples, such as blood or BAL fluid, with the objective of sequencing intact microbial wcDNA or small DNA molecules from degraded microbes (i.e., cell-free DNA, cfDNA). Following microbial NGS, taxonomic identification of the DNA source is performed by tracing the origin of sequenced DNA molecules. This process includes human read subtraction, taxonomic classification, and subsequent statistical analysis to determine if there is supporting evidence for a pathogenic microbe. In this study, we refined both the wet-lab and computational workflow (cfSPI) by optimizing six key steps (Q#1-6) in microbial NGS-based fungal diagnostics, as detailed in Fig. 1.

IPA patients

Diagnostic work-up of IMD suspected patients includes clinical microbiology

At the Princess Máxima Center for Pediatric Oncology (PMC) in Utrecht, The Netherlands, patients suspected of IMD are evaluated by a chest high-resolution computerized tomography (HRCT) scan. If lesions suspicious for IMD are seen on HRCT, a BAL is performed. The BAL fluid provides material for microscopy, fungal culture, galactomannan (GM) assay (Platelia Aspergillus Ag, Bio-Rad), and molecular diagnostics. A BAL GM > 1.0 is considered positive according to the criteria of the European Organization for Research and Treatment of Cancer and Mycoses Study Group (EORTC/MSG)⁹.

Retrospective proof-of-principle: inclusion of IPA patient samples

We included plasma and BAL fluid from immunocompromised pediatric patients who were diagnosed with IPA at the PMC between 2020 and 2023. We searched manually for cases with probable or proven IPA according to the EORTC/MSG criteria⁹ (Table 1).

The date of diagnosis was based on the date of BAL retrieval with a positive microbiological finding. One BAL fluid sample and one blood plasma sample (collected at the same time as the BAL fluid sample or within one or two days of the date of diagnosis) were used from each case, as well as a plasma sample collected approximately fourteen days prior to the date of diagnosis (Supplementary Fig. 11). These earlier plasma samples served as internal control samples. We also included BAL fluid and plasma samples from pediatric immunocompromised patients without IPA, so-called external control samples. BAL fluid control samples were obtained from patients pre-HSCT, during an anesthetic procedure for line insertion³². All materials used for this study were clinical samples routinely stored at -70°C. Before freezing, the plasma samples were prepared by centrifuging EDTA plasma to remove the cells. BAL fluids were stored directly after use without prior centrifugation. By incorporating both internal and external control samples, our study goes beyond other studies that solely reported sequencing results from suspected IPA cases.

In total, this study encompassed 25 pediatric patients, including 7 diagnosed with probable IPA and 18 external immunocompromised controls (9 plasma samples and 9 BAL fluids). All included patients provided written informed consent for participation in the biobank for storage and use of their rest materials (International Clinical Trials Registry Platform: NL7744; https://onderzoekmetmensen.nl/en/trial/21619). For use of samples and data in this study we refer to local Biobank and Data Access Committee approval (approval number PMCLAB2022.364). All patients have also provided informed consent for sequencing and use of these data for publication. This study was conducted in compliance with the principles of the Declaration of Helsinki.

Diagnostic sensitivity and specificity

One of the aims of this pilot study was to examine the diagnostic sensitivity of microbial NGS-based detection of Aspergillus DNA in immunocompromised pediatric patients. The diagnostic sensitivity was gauged by the proportion of probable IPA patients displaying significantly heightened levels of at least one Aspergillus species, or at the Aspergillus genus level, in at least one liquid biopsy sample (blood or BAL fluid) in comparison to external controls.

The specificity of microbial NGS was determined from the proportion of external control patient samples that displayed significantly elevated levels of at least one Aspergillus species, or at the Aspergillus genus level, in at least one liquid biopsy sample (blood or BAL fluid) in comparison to all other external controls; this proportion corresponds to the false-positive rate, of which specificity is the complement.
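As a minimal sketch of these two definitions, the proportions can be computed as shown here. The function names are hypothetical and not part of the cfSPI code base, and the specificity function assumes the conventional definition (one minus the fraction of external controls flagged as elevated). The example counts are those reported in the Discussion.

```python
def sensitivity(n_cases_detected: int, n_cases: int) -> float:
    """Fraction of probable IPA cases with a significantly elevated Aspergillus signal."""
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    return n_cases_detected / n_cases


def specificity(n_controls_flagged: int, n_controls: int) -> float:
    """One minus the fraction of external controls flagged as elevated (assumed convention)."""
    if n_controls <= 0:
        raise ValueError("n_controls must be positive")
    return 1.0 - n_controls_flagged / n_controls


# Counts reported in the Discussion: 5 of 7 BAL samples and 3 of 5 plasma samples.
bal_sensitivity = sensitivity(5, 7)
plasma_sensitivity = sensitivity(3, 5)
print(f"BAL: {bal_sensitivity:.0%}, plasma: {plasma_sensitivity:.0%}")
```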

Wet-lab

Sample preparation and DNA isolation

Plasma was obtained from EDTA blood samples by centrifugation, 10 min at 1500 g, followed by an additional centrifugation step for 5 min at 12,000 g, to remove all cells (both centrifugation steps at room temperature). Plasma was subsequently stored at -80 °C. Fresh BAL fluid samples were stored at -80 °C until further processing. To separate the (whole) cells from the BAL fluid the samples were centrifuged for 5 min at 13,000 rpm at 4 °C. Subsequently, the pellet and supernatant were processed in separate DNA isolation methods for wcDNA sequencing and cfDNA sequencing, respectively.

cfDNA nucleic acids were isolated from BAL supernatant, EDTA plasma and sterile Dulbecco’s Phosphate Buffered Saline (DPBS), using the Circulating Nucleic Acid Kit (Qiagen, 55114) with the following modifications to the manufacturer’s protocol: A select set of our BAL fluid and plasma samples were complemented with DPBS, up to a total volume of 2 mL. Furthermore, the lysis time was increased from 30 to 60 minutes and the final elution of cfDNA was done with Nuclease Free water (Invitrogen, 10977-035).

DNA from 5 (out of 13) BAL pellets was extracted from whole cells by mechanical bead beating (see Supplementary Data 2, “internal” DNA isolation), where 50 µL DPBS and a 5 mm stainless steel bead (Qiagen, 69989) were added to the BAL pellet. Samples were beaten for 2 min (at a frequency of 25 1/s) on the TissueLyser II (Qiagen) and subsequently diluted in ATL buffer (Qiagen, 939011) and transferred to a fresh tube. Additional ATL buffer was added to a total volume of 300 µL, followed by overnight incubation at 56 °C after addition of Proteinase K (Qiagen, 19131). wcDNA isolation was performed using the DNeasy Blood & Tissue Kit (Qiagen, 69506), following the manufacturer’s protocol and adjusting subsequent volumes to accommodate the larger lysis volume.

From the remaining 8 of our 13 BAL pellet samples (see Supplementary Data 2, external DNA isolation) wcDNA was isolated according to a standard protocol for fungal DNA isolation prior to RT PCR (UMC Utrecht, Dept. Medical Microbiology, The Netherlands). In short, BAL pellets were bead beaten and snap lysed (by freezing them at -80 °C and heating them to 96 °C). After addition of a lysis buffer, nucleic acids were isolated using the MagNA Pure system (Roche).

All DNA samples were quantified using the Qubit dsDNA High Sensitivity assay kit or Broad Range assay kit (Thermofisher Scientific, Q32854 and Q32853, respectively). The DNA fragment length distribution and concentration of the cfDNA was evaluated using the TapeStation 2200, D1000HS kit (Agilent, 5067-5585).

Preparation of next-generation sequencing libraries

For ss-ligation-based DNA capture, the SRSLY PicoPlus NGS Library Prep Kit for Illumina (Claret Bioscience, CBS-K250B-96) was used according to the manufacturer’s Moderate Fragment Retention version of the protocol, using a maximum of 5 ng cfDNA as input and 11 PCR cycles. For ds-ligation-based cf/wcDNA capture, the KAPA Library Preparation Kit (Roche) was used, with a maximum of 50 ng of input DNA and 8-12 PCR cycles. For ds-capture of the BAL pellet DNA, the manufacturer’s protocol was followed. For ds-capture of cfDNA, the following changes to the manufacturer’s protocol were made: the fragmentation step was omitted, and bead clean-up steps similar to those of the Moderate Fragment Retention protocol were used to improve the yield of small fragments. All library preparations were quantified using the Qubit dsDNA High Sensitivity Assay Kit (Thermofisher Scientific, Q32854), and the size distribution was analyzed on the TapeStation 2200 using the D1000 and/or D5000 kits (Agilent, 5067-5583/5067-5589). Some samples showed an aberrant size distribution (a substantial fraction of >700 bp fragments); these samples were subjected to a bead-based size selection protocol (see Supplementary Data 2) to remove long fragments, as Illumina sequencing is optimized for fragments <700 bp. Concentration (see Supplementary Data 2, total DNA yield in ng) and size were re-evaluated after size selection. Samples were pooled equimolarly and submitted for sequencing.

Next-generation sequencing

Sequencing of all 60 Illumina libraries was conducted on the NovaSeq 6000, 2×150 bp reads, resulting in 42 to 218 million reads per ss-cfDNA sample, 37 to 126 million reads per ds-cfDNA patient sample, and 34 to 62 million reads per wcDNA patient sample (see Supplementary Data 2).

Read simulations

In order to evaluate the impact of kraken2’s reference hash-table database composition and confidence threshold on the Aspergillus classification accuracy, we simulated cfDNA-like datasets from 87 complete, scaffold, or draft genomes derived from the NCBI RefSeq, comprising 55 Aspergillus, 7 Penicillium, and 25 other pathogenic fungal genomes, with the Illumina read simulator ReSeq³³. To create a realistic error profile of sequencing reads, the unprocessed ss-cfDNA sequencing reads from plasma of patient A01 (i.e., A01Pasp, for details on sample workup see Supplementary Data 2), sequenced with Illumina NovaSeq 6000 2×150 bp, were mapped to the human GRCh38.p14 reference genome using the Bowtie2 alignment software (run with option “-X 2000”)³⁴. Subsequently, the reference sequence statistics were determined using ReSeq (option “--statsOnly”), and 151 bp paired-end reads were simulated in silico using the illuminaPE ReSeq command (option “--noBias”)³³. For each genome we simulated between 99,050 and 101,023 reads; for details on simulated datasets and the number of reads per dataset see Supplementary Data 1.

Database construction

For the purpose of fungal nucleic acid detection, we utilized 9 kraken2 classification databases, namely uR.7, uR.7 w/o CHM13v2, cRE.21, dRE.21, uRE.21, uRE.31, dRE.31, dREM.258 and dREM.260 of which details on the construction and composition are reported in Supplementary Data 5.

Prior to hash-table construction, specific genomic regions of the reference genomes were masked to prevent spurious misclassifications. Masking consisted of one of the following: dustmasking, in which low-complexity regions were masked using the DUST algorithm²¹ as advised by the developers of kraken2; a more rigorous decontamination, in which contaminant organism sequences were masked; or a combination of dustmasking and decontamination (i.e., full cleanup), as in the work of Lu and Salzberg³⁵. Contaminant organism sequences are sequences within a genome assembly that do not accurately represent the genetic information of the organism to which the assembly is attributed.

Database names indicate the masking procedure used (unaltered, dustmasked or cleaned), the database sources (RefSeq, EuPathDB and/or MycoCosm) and the number of Aspergillus species included (Supplementary Fig. 2). For example, cRE.21 refers to the cleaned version of a combination of RefSeq and EuPathDB with a total of 21 Aspergillus species. Each database was built using the NCBI taxonomic information, which was downloaded on 05-05-2023.
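To illustrate this naming convention, the sketch below decodes a database name into its three components. The single-letter codes (u, d, c for the masking procedure; R, E, M for the sources) are inferred from the description above and should be read as assumptions; the function name is hypothetical and not part of the cfSPI pipeline. Names with a free-text suffix, such as “uR.7 w/o CHM13v2”, are deliberately rejected by this simple pattern.

```python
import re

# Letter codes inferred from the naming description; treat them as assumptions.
MASKING = {"u": "unaltered", "d": "dustmasked", "c": "cleaned"}
SOURCES = {"R": "RefSeq", "E": "EuPathDB", "M": "MycoCosm"}

_NAME_PATTERN = re.compile(r"([udc])([REM]+)\.(\d+)")


def parse_database_name(name: str) -> dict:
    """Split a name such as 'cRE.21' into masking, sources, and species count."""
    match = _NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized database name: {name!r}")
    masking_code, source_codes, n_species = match.groups()
    return {
        "masking": MASKING[masking_code],
        "sources": [SOURCES[code] for code in source_codes],
        "n_aspergillus_species": int(n_species),
    }


print(parse_database_name("cRE.21"))
```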

The NCBI RefSeq²² is a comprehensive and curated collection of nucleotide sequences, encompassing a wide range of species, which, in the context of kraken2 classification, is often used as the standard reference for hash-table database construction. Using the “kraken2-build --download-taxonomy” command, genomes of the NCBI RefSeq were downloaded on 15-05-2023 (RefSeq release 218, file creation date 05-05-2023), including 1,495 archaea, 285,827 bacteria, 498 fungi, 98 protozoa, 14,979 viral and 1 human sequence plus 3,137 contigs which were part of the UniVec_Core. UniVec_Core comprises oligonucleotide and vector sequences sourced from bacteria, phage, yeast, and synthetic constructs, excluding vector sequences of mammalian origin.

The EuPathDB encompasses a curated set of genomic sequences of 386 pathogenic fungi, protists, oomycetes as well as evolutionarily related non-pathogenic species²⁶. EuPathDB genomic sequences were downloaded on 16-05-2023 (file creation date was 28-10-2020), consisting of the following subsets: AmoebaDB (n = 30), CryptoDB (n = 18), FungiDB (n = 164), GiardiaDB (n = 10), MicrosporidiaDB (n = 35), PiroplasmaDB (n = 10), PlasmoDB (n = 45), ToxoDB (n = 33), TrichDB (n = 1) and TriTrypDB (n = 42). Upon inspection post-download, we observed that two fungal genomes were omitted from the purified edition of FungiDB-46. Jennifer Lu, one of the authors of the contaminant removal paper³⁵, kindly supplied us with the latest version of these two genomes:

FungiDB-54_PgraminisCRL75-36-700-3_Genome_cleaned_v_final.fna and FungiDB-54_Ptriticina1-1BBBDRace1_Genome_cleaned_v_final.fna.

An updated version of the seqid2taxid.map used for database construction was also provided by Jennifer Lu. In total, EuPathDB version 46 includes 27 Aspergillus genomes, representing 21 Aspergillus species, while version 64 contains 38 genomes, representing 31 Aspergillus species.

The MycoCosm²⁷ is a web-based resource and information portal developed by the Joint Genome Institute (JGI) for fungal genomics. It provides access to a comprehensive collection of fungal genomes, associated functional annotations, and tools for comparative analysis. We acquired all 763 genomic sequences of Aspergillus assembly scaffolds from this resource, along with their corresponding taxonomic annotations.

Host read subtraction

Our objective was to eliminate potential false-positive microbial reads originating from incomplete host read subtraction. To achieve this, we mapped reads to GRCh38.p14, to CHM13v2, or to both at once (dual-mapping) using a consolidated Bowtie2 reference index encompassing GRCh38.p14 and CHM13v2, thereby avoiding a two-step mapping process.

Sequencing and synthetic data processing: the cfSPI-pipeline

Illumina sequencing and synthetic data were processed using the Snakemake³⁶ cfSPI-pipeline available in our GitHub repository (https://github.com/AEWesdorp/cfSPI/cfspi/). In short, duplicates were removed (using nubeam³⁷), after which high-quality sequencing data were generated by removing low-quality reads with default settings and applying a low-complexity filter (using fastp³⁸), and by removing adapters and short (<35 bp) reads (using AdapterRemoval³⁹). After subtraction of host sequences by mapping to the human reference genome using Bowtie2³⁴, the remaining paired-end reads were taxonomically classified using kraken2²⁰ with the 9 kraken2 databases specified above, employing a confidence threshold (CT) ranging from 0.0 (no filter on the fraction of k-mers matching, default setting) to 1.0 (100% of k-mers within the read match a taxon, very stringent setting), in increments of 0.1.
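The confidence-threshold sweep described above could be set up along the following lines. This is only an illustrative sketch and not the Snakemake rule used in cfSPI: the file names, the output naming scheme, and the helper functions are hypothetical, while the kraken2 options shown (--db, --confidence, --paired, --report, --output) are standard kraken2 command-line options. The commands are built as argument lists and are not executed here.

```python
def confidence_thresholds() -> list:
    """Thresholds 0.0, 0.1, ..., 1.0; integer steps avoid float accumulation error."""
    return [step / 10 for step in range(11)]


def kraken2_command(db_dir: str, reads_1: str, reads_2: str, ct: float, out_prefix: str) -> list:
    """Build (but do not run) a kraken2 call for one database and one threshold."""
    tag = f"CT{ct:.1f}"
    return [
        "kraken2",
        "--db", db_dir,
        "--confidence", f"{ct:.1f}",
        "--paired", reads_1, reads_2,
        "--report", f"{out_prefix}.{tag}.report",
        "--output", f"{out_prefix}.{tag}.output",
    ]


# Hypothetical inputs: host-subtracted paired-end reads and a database directory.
commands = [
    kraken2_command("cRE.21", "sample_R1.fq", "sample_R2.fq", ct, "sample")
    for ct in confidence_thresholds()
]
print(" ".join(commands[1]))
```

Each argument list can be passed to a workflow manager or to subprocess.run once the database directory and read files exist.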

Remapping of classified reads

Reads classified as Aspergillus at the species level with the cRE.21 database, along with all lower-ranking taxa within the same clade, were aligned to the respective Aspergillus species genomic sequences used for the cRE.21 database construction, through Bowtie2.

Relative abundance per taxon

The relative abundance per taxon within each sample was quantified as Reads Per Million (RPM), a normalized measure that accounts for, e.g., differences in sequencing depth. RPM was calculated according to the following formula:

$$\mathrm{RPM}=\frac{\mathrm{totalSumTaxonReads}}{\mathrm{totalNumQCReads}}\times 1{,}000{,}000\qquad (1)$$

totalSumTaxonReads corresponds to the number of reads classified at each taxon (e.g., genus- or species level) and all lower-ranking taxa belonging to the same clade. totalNumQCReads is the count of reads that passed quality control (i.e., the number of reads remaining after duplicate removal, low-quality and low-complexity read filtering, and adapter removal; see Supplementary Fig. 1a).
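A minimal sketch of Equation (1) is given here. The function name and the read counts are hypothetical and serve only to show how clade-level counts are summed before normalization.

```python
def reads_per_million(total_sum_taxon_reads: int, total_num_qc_reads: int) -> float:
    """Equation (1): clade-level read count normalized to one million QC-passed reads."""
    if total_num_qc_reads <= 0:
        raise ValueError("total_num_qc_reads must be positive")
    return total_sum_taxon_reads / total_num_qc_reads * 1_000_000


# Hypothetical counts: reads at the species node plus a lower-ranking taxon in the same clade.
clade_reads = {"Aspergillus fumigatus": 40, "Aspergillus fumigatus Af293": 10}
rpm = reads_per_million(sum(clade_reads.values()), 40_000_000)
print(f"{rpm:.2f} RPM")
```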

Limits of Significant Detection

In order to explore the relationship between fungal background levels in immunocompromised individuals, the observed classification rate in simulated datasets, and the resulting theoretical ‘Limits of Significant Detection’, we formulated the following methodology. First, we defined our taxon, database, and CT of interest. Second, we set the theoretical sequencing depth (tSD), ranging from 10 to 100 million reads (with intervals of 15 million) as well as a theoretical number of molecules per million (mpm) ranging from 0.25 to 4096. Third, we obtained the median background abundance (ba) observed in our external immunocompromised samples for the specified taxon, database, and CT of interest (normalized, RPM). Fourth, we determined the classification rate (cr) observed in our simulated datasets for the specified taxon, database, and CT of interest. Subsequently, we applied the following calculation to determine the total taxon read count in our artificial sample, rounded to the nearest integer:

$$\mathrm{total\ taxon\ count}=\left\lfloor (\mathrm{tSD}\times \mathrm{ba})+(\mathrm{tSD}\times \mathrm{mpm}\times \mathrm{cr})+0.5\right\rfloor\qquad (2)$$

Following this, we applied a one-tailed Fisher’s exact test to assess whether the observed total taxon count in our theoretical samples significantly differed from that in our external control samples. A mean p ≤ 0.001 was deemed statistically significant. The lowest mpm value with a p-value ≤ 0.001 was reported as the MPM.
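The procedure above can be sketched as follows, under several stated assumptions: tSD is expressed in millions of reads so that it combines directly with ba (RPM) and mpm in Equation (2); the mean p-value is taken across the external control samples; and the mpm grid uses doubling steps from 0.25 to 4096. The published analysis was performed in R; this standard-library Python version, including all function names and the example control counts, is hypothetical. The one-tailed Fisher's exact test is computed as the upper tail of the hypergeometric distribution using log-gamma terms.

```python
from math import exp, floor, lgamma


def _log_choose(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-tailed Fisher's exact p-value, P(X >= a), for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    log_denominator = _log_choose(n, col1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        p += exp(_log_choose(row1, k) + _log_choose(n - row1, col1 - k) - log_denominator)
    return min(p, 1.0)


def total_taxon_count(tsd_millions: float, ba_rpm: float, mpm: float, cr: float) -> int:
    """Equation (2), with tSD expressed in millions of reads (assumption)."""
    return floor(tsd_millions * ba_rpm + tsd_millions * mpm * cr + 0.5)


def lowest_significant_mpm(tsd_millions, ba_rpm, cr, controls, mpm_grid, alpha=0.001):
    """Lowest mpm whose mean p-value across controls is <= alpha, or None.

    controls: list of (taxon_reads, qc_reads) tuples, one per external control sample.
    """
    sample_reads = int(tsd_millions * 1_000_000)
    for mpm in sorted(mpm_grid):
        count = total_taxon_count(tsd_millions, ba_rpm, mpm, cr)
        p_values = [
            fisher_exact_greater(count, sample_reads - count, c_taxon, c_total - c_taxon)
            for c_taxon, c_total in controls
        ]
        if sum(p_values) / len(p_values) <= alpha:
            return mpm
    return None


# Hypothetical external controls and settings, for illustration only.
controls = [(5, 10_000_000), (6, 12_000_000)]
mpm_grid = [0.25 * 2 ** i for i in range(15)]  # 0.25 ... 4096; doubling steps are an assumption
losd = lowest_significant_mpm(10, 0.5, 1.0, controls, mpm_grid)
print(losd)
```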

Identification of elevated Aspergillus levels

Following taxonomic classification of all clinical samples, we conducted a one-tailed Fisher’s exact test to assess statistical differences in the read count of a specified taxon between our patient samples and internal/external control samples. This test was based on comparing the following two counts in the contingency tables:

1. The number of reads classified at the taxon of interest, including reads at the specified taxonomic level (e.g., genus- or species level) and all lower-ranking taxa within the same clade.
2. The number of reads remaining after duplicate removal, low-quality, and low-complexity read filtering, excluding those classified at the taxon of interest.

A result was considered significant when the mean p-value, obtained by averaging the p-values of all Fisher’s exact tests conducted across the samples, was ≤ 0.001. This analysis aimed to identify meaningful differences in taxon-specific read counts between patient and control groups, considering the overall composition and diversity of microbial taxa in the studied samples.

Statistical analysis

To evaluate the influence of computational choices, including host-read subtraction and database composition, we utilized the one-tailed paired t-test. For comparisons involving sample types and library preparation, we applied the one-tailed Wilcoxon rank-test. To account for multiple testing, we employed Bonferroni correction in these analyses.
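For reference, the Bonferroni correction multiplies each p-value by the number of tests performed and caps the result at 1. The sketch below illustrates this with hypothetical p-values; it is not the R code used for the analyses in this study.

```python
def bonferroni_adjust(p_values: list) -> list:
    """Multiply each p-value by the number of tests, capping the result at 1.0."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical p-values from three comparisons.
adjusted = bonferroni_adjust([0.01, 0.04, 0.5])
print(adjusted)
```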

Software

Data and statistical analyses were conducted in R (v.4.2.0). Figures were generated in R (v.4.2.0), and illustrations created using BioRender and Adobe Illustrator (2024, v28.6).

Data availability

Data will be made available on reasonable request, through the European Genome-phenome Archive (EGA) under accession number EGAS00001008021. Additionally, taxonomic abundance matrices are provided via the GitHub repository upon publication.

Code availability

Details regarding the cfSPI pipeline can be found here: https://github.com/AEWesdorp/cfSPI/cfspi/. The code, encompassing data simulation scripts and the analysis of data presented in this manuscript, is available here: https://github.com/AEWesdorp/cfSPI/.

References

1. Loeffen, E. A. H. et al. Treatment-related mortality in children with cancer: prevalence and risk factors. Eur. J. Cancer 121, 113–122 (2019).
2. Pana, Z. D., Roilides, E., Warris, A., Groll, A. H. & Zaoutis, T. Epidemiology of invasive fungal disease in children. J. Pediatric Infect. Dis. Soc. 6, S3–S11 (2017).
3. Lehrnbecher, T. et al. Incidence and outcome of invasive fungal diseases in children with hematological malignancies and/or allogeneic hematopoietic stem cell transplantation: Results of a prospective multicenter study. Front. Microbiol. 10, 681 (2019).
4. Cesaro, S. et al. Retrospective study on the incidence and outcome of proven and probable invasive fungal infections in high-risk pediatric onco-hematological patients. Eur. J. Haematol. 99, 240–248 (2017).
5. Bartlett, A. W. et al. Epidemiology of invasive fungal infections in immunocompromised children; an Australian national 10-year review. Pediatr. Blood Cancer 66, e27564 (2019).
6. Kazakou, N. et al. Invasive fungal infections in a pediatric hematology-oncology department: a 16-year retrospective study. Curr. Med. Mycol. 6, 37–42 (2020).
7. Bury, D. et al. Clinical presentation and outcome of invasive mould disease in paediatric patients with acute lymphoblastic leukaemia. EJC Paediatric Oncol. 3, 100143 (2024).
8. Groll, A. H. et al. 8th European Conference on Infections in Leukaemia: 2020 guidelines for the diagnosis, prevention, and treatment of invasive fungal diseases in paediatric patients with cancer or post-haematopoietic cell transplantation. Lancet Oncol. 22, e254–e269 (2021).
9. Donnelly, J. P. et al. Revision and update of the consensus definitions of invasive fungal disease from the European Organization for research and Treatment of cancer and the Mycoses Study Group education and research consortium. Clin. Infect. Dis. 71, 1367–1376 (2020).
10. Reinwald, M. et al. Therapy with antifungals decreases the diagnostic performance of PCR for diagnosing invasive aspergillosis in bronchoalveolar lavage samples of patients with haematological malignancies. J. Antimicrob. Chemother. 67, 2260–2267 (2012).
11. de Mol, M. et al. Diagnosis of invasive pulmonary aspergillosis in children with bronchoalveolar lavage galactomannan: BAL Galactomannan Aspergillosis Children. Pediatr. Pulmonol. 48, 789–796 (2013).
12. Hong, D. K. et al. Liquid biopsy for infectious diseases: sequencing of cell-free plasma to detect pathogen DNA in patients with invasive fungal disease. Diagn. Microbiol. Infect. Dis. 92, 210–213 (2018).
13. Ma, X. et al. Invasive pulmonary aspergillosis diagnosis via peripheral blood metagenomic next-generation sequencing. Front. Med. (Lausanne) 9, 751617 (2022).
14. Armstrong, A. E. et al. Cell-free DNA next-generation sequencing successfully detects infectious pathogens in pediatric oncology and hematopoietic stem cell transplant patients at risk for invasive fungal disease. Pediatr. Blood Cancer 66, e27734 (2019).
15. Hill, J. A. et al. Liquid biopsy for invasive mold infections in hematopoietic cell transplant recipients with pneumonia through next-generation sequencing of microbial cell-free DNA in plasma. Clin. Infect. Dis. 73, e3876–e3883 (2021).
16. Huygens, S. et al. Diagnostic value of microbial cell-free DNA sequencing for suspected invasive fungal infections: a retrospective multicenter cohort study. Open Forum Infect. Dis. 11, ofae252 (2024).
17. Marcelino, R. et al. The use of taxon-specific reference databases compromises metagenomic classification. BMC Genomics 21, 184 (2020).
18. He, P. et al. Comparison of metagenomic next-generation sequencing using cell-free DNA and whole-cell DNA for the diagnoses of pulmonary infections. Front. Cell. Infect. Microbiol. 12, 1042945 (2022).
19. Burnham, P. et al. Single-stranded DNA library preparation uncovers the origin and diversity of ultrashort cell-free DNA in plasma. Sci. Rep. 6, 27859 (2016).
20. Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).
21. Morgulis, A., Gertz, E. M., Schäffer, A. A. & Agarwala, R. A fast and symmetric DUST implementation to mask low-complexity DNA sequences. J. Comput. Biol. 13, 1028–1040 (2006).
22. Nasko, D. J., Koren, S., Phillippy, A. M. & Treangen, T. J. RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification. Genome Biol. 19, 165 (2018).
23. Gihawi, A. et al. Major data analysis errors invalidate cancer microbiome findings. MBio 14, e0160723 (2023).
24. Wright, R. J., Comeau, A. M. & Langille, M. G. I. From defaults to databases: parameter and database choice dramatically impact the performance of metagenomic taxonomic classification tools. Microb. Genom. 9, 000949 (2023).
25. Méric, G., Wick, R. R., Watts, S. C., Holt, K. E. & Inouye, M. Correcting index databases improves metagenomic studies. bioRxiv 712166 https://doi.org/10.1101/712166 (2019).
26. Warrenfeltz, S. et al. EuPathDB: The eukaryotic pathogen genomics database resource. Methods Mol. Biol. 1757, 69–113 (2018).
27. Ahrendt, S. R., Mondo, S. J., Haridas, S. & Grigoriev, I. V. MycoCosm, the JGI’s fungal genome portal for comparative genomic and multiomics data analyses. Methods Mol. Biol. 2605, 271–291 (2023).
28. Yu, L. et al. Metagenomic next-generation sequencing of cell-free and whole-cell DNA in diagnosing central nervous system infections. Front. Cell. Infect. Microbiol. 12, 951703 (2022).
29. Gu, W. et al. Detection of cryptogenic malignancies from metagenomic whole genome sequencing of body fluids. Genome Med. 13, 98 (2021).
30. Sun, T. et al. A paired comparison of plasma and bronchoalveolar lavage fluid for metagenomic next-generation sequencing in critically ill patients with suspected severe pneumonia. Infect. Drug Resist. 15, 4369–4379 (2022).
31. Chen, S., Kang, Y., Li, D. & Li, Z. Diagnostic performance of metagenomic next-generation sequencing for the detection of pathogens in bronchoalveolar lavage fluid in patients with pulmonary infections: systematic review and meta-analysis. Int. J. Infect. Dis. 122, 867–873 (2022).
32. Versluys, A. B. et al. High diagnostic yield of dedicated pulmonary screening before hematopoietic cell transplantation in children. Biol. Blood Marrow Transplant. 21, 1622–1626 (2015).
33. Schmeing, S. & Robinson, M. D. ReSeq simulates realistic Illumina high-throughput sequencing data. Genome Biol. 22, 67 (2021).
34. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
35. Lu, J. & Salzberg, S. L. Removing contaminants from databases of draft genomes. PLoS Comput. Biol. 14, e1006277 (2018).
36. Mölder, F. et al. Sustainable data analysis with Snakemake. F1000Res. 10, 33 (2021).
37. Dai, H. & Guan, Y. Nubeam-dedup: a fast and RAM-efficient tool to de-duplicate sequencing reads without mapping. Bioinformatics 36, 3254–3256 (2020).
38. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
39. Schubert, M., Lindgreen, S. & Orlando, L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res. Notes 9, 88 (2016).

Acknowledgements

We acknowledge the Utrecht Sequencing Facility (USEQ) for providing sequencing service and data (USEQ is subsidized by the University Medical Center Utrecht and The Netherlands X-omics Initiative; NWO project 184.034.019). The authors also would like to thank C. Beudeker and M. van der Flier for providing blood plasma samples through the DIAMONDS study. Furthermore, we thank J. Lu for generously supplying an updated version of the EuPathDB seqid2taxid.map and the inclusion of two supplementary genomic FungiDB sequences. The authors would also like to thank A.K.M. Rosendahl Huber for assistance with the preparation of the manuscript as well as W.A.A. Steenhuijsen Piters, A.C. Fluit and J.D.F. Groot-Mijnes for their critical feedback on the setup of the study. Additionally, we would like to express our sincere thanks to all funding agencies whose support and funding have been instrumental in enabling and advancing our research. EW acknowledges support from a Vidi Fellowship (639.072.715) awarded to JdR by the Dutch Organization for Scientific Research (Nederlandse Organisatie voor Wetenschappelijk Onderzoek, NWO).

Author information

These authors contributed equally: Laura Rotte, Li-Ting Chen, Myrthe Jager.

Authors and Affiliations

Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
Emmy Wesdorp, Li-Ting Chen, Myrthe Jager, Nicolle Besselink, Carlo Vermeulen & Jeroen de Ridder
Oncode Institute, Utrecht, The Netherlands
Emmy Wesdorp, Li-Ting Chen, Myrthe Jager, Nicolle Besselink, Carlo Vermeulen & Jeroen de Ridder
Hematopoietic stem cell transplantation, Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
Laura Rotte & Caroline Lindemans
Westerdijk Fungal Biodiversity Institute, Utrecht, The Netherlands
Ferry Hagen
Department of Medical Microbiology, University Medical Center Utrecht, Utrecht, The Netherlands
Ferry Hagen
Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Utrecht, The Netherlands
Ferry Hagen
Department of Medical Microbiology, University Medical Centre Utrecht, Utrecht, The Netherlands
Tjomme van der Bruggen
Department of Pediatric Infectious Diseases and Immunology, Wilhelmina Children’s hospital, UMC Utrecht, Utrecht, The Netherlands
Caroline Lindemans, Tom Wolfs & Louis Bont

Authors

Emmy Wesdorp
Laura Rotte
Li-Ting Chen
Myrthe Jager
Nicolle Besselink
Carlo Vermeulen
Ferry Hagen
Tjomme van der Bruggen
Caroline Lindemans
Tom Wolfs
Louis Bont
Jeroen de Ridder

Contributions

E.W., L.R., L.C., and M.J. conceived experiments and wrote the article. L.R. searched for and collected samples, managed patients, provided clinical information and sample data, and, together with E.W., interpreted the sequencing data in relation to the clinical data. E.W., L.C. and N.B. performed experiments. E.W. and L.C. contributed to constructing the data analysis pipeline and conducting the data analysis. E.W. curated tables and created figures. M.J., C.V., F.H., T.v.d.B., C.L., T.W., L.B., and J.d.R. designed experiments, contributed to the writing of the article and/or provided (clinical) information.

Corresponding authors

Correspondence to Louis Bont or Jeroen de Ridder.

Ethics declarations

Competing interests

F.H. has received products/financial compensation from Pathonostics, OLM Diagnostics, Altona Diagnostics, EWC Diagnostics, CHROMagar, and IMMY in the context of product validation (and has published or will publish about the outcome of these studies). C.L. has received financial support from Pfizer for educational purposes. L.B. has regular interaction with pharmaceutical and other industrial partners. L.B. has not received personal fees or other personal benefits, but UMCU has received significant funding (>€100,000 per industrial partner) for investigator-initiated studies from AstraZeneca, Sanofi, Janssen, Pfizer, MSD, and MeMed Diagnostics, as well as major funding from the Bill and Melinda Gates Foundation and through public-private partnerships such as the IMI-funded RESCEU and PROMISE projects, involving partners GSK, Novavax, Janssen, AstraZeneca, Pfizer, and Sanofi, along with substantial funding from Julius Clinical for participation in clinical studies sponsored by AstraZeneca, Merck, and Pfizer, and minor funding (€1,000-25,000 per industrial partner) for consultation, DSMB membership, or invited lectures by Ablynx, Bavarian Nordic, GSK, Novavax, Pfizer, Moderna, AstraZeneca, MSD, Sanofi, and Janssen. L.B. is the founding chairman of the ReSViNET Foundation. J.d.R. is cofounder and CTO of Cyclomics, a genomics company. L.R., L.C., M.J., N.B., C.V., T.v.d.B. and T.W. declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Wesdorp, E., Rotte, L., Chen, LT. et al. NGS-based Aspergillus detection in plasma and lung lavage of children with invasive pulmonary aspergillosis. npj Genom. Med. 10, 24 (2025). https://doi.org/10.1038/s41525-025-00482-8

Received: 03 May 2024
Accepted: 28 February 2025
Published: 17 March 2025
Version of record: 17 March 2025
DOI: https://doi.org/10.1038/s41525-025-00482-8
