Complete genomic characterization of global pathogens respiratory syncytial virus and human norovirus using probe based capture enrichment

Bhamidipati, Sravya V; Surathu, Anil; Chao, Hsu; Agustinho, Daniel P.; Xiang, Qin; Kottapalli, Kavya; Santhanam, Abirami; Momin, Zeineen; Walker, Kimberly; Menon, Vipin K.; Weissenberger, George; Emerick, Nathanael; Mahjabeen, Faria; Meng, Qingchang; Hu, Jianhong; Sucgang, Richard; Henke, David; Sedlazeck, Fritz J.; Khan, Ziad M.; Metcalf, Ginger A.; Avadhanula, Vasanthi; Piedra, Pedro A.; Ramani, Sasirekha; Atmar, Robert L.; Estes, Mary K.; Petrosino, Joseph F.; Gibbs, Richard A.; Muzny, Donna M.; Javornik, Sara Cregeen; Doddapaneni, Harsha

doi:10.1038/s41598-025-03398-6

Article
Open access
Published: 01 July 2025

Scientific Reports volume 15, Article number: 20526 (2025)

Abstract

Respiratory syncytial virus (RSV) is the leading cause of lower respiratory tract infections in children worldwide, while human noroviruses (HuNoV) are a leading cause of epidemic and sporadic acute gastroenteritis. Generating full-length genome sequences for these viruses is crucial for understanding viral diversity and tracking emerging variants. However, obtaining high-quality sequencing data is often challenging due to viral strain variability, variable sample quality, and low titers. Here, we present a set of comprehensive oligonucleotide probe sets designed from 1,570 RSV and 1,376 HuNoV isolate sequences in GenBank. Using these probe sets and a capture enrichment sequencing workflow, 85 RSV-positive nasal swab samples and 55 HuNoV-positive samples (49 stool and six human intestinal enteroids) encompassing major subtypes and genotypes were characterized. Samples with Ct values of 17.0–29.9 for RSV and 20.2–34.8 for HuNoV were sequenced, along with some HuNoV samples below the detection limit. The percentage of reads mapped to viral genomes was 85.1% for RSV and 40.8% for HuNoV post-capture, compared to 0.08% and 1.15% in pre-capture libraries. Full-length genomes were obtained for all RSV-positive samples and for 47/55 HuNoV-positive samples, a significant improvement over genome recovery from pre-capture libraries. RSV transcriptome (subgenomic mRNA) sequences were also characterized from these data.

Introduction

Respiratory syncytial virus (RSV) and human norovirus (HuNoV) are clinically significant pathogens due to the considerable burden of disease they impose globally^1,2. RSV is the leading cause of severe respiratory illness and mortality, especially in infants and young children, and a major cause of illness in the elderly³. HuNoV is the most common cause of acute gastroenteritis globally⁴. While all viruses warrant attention in virology and public health, the high prevalence and broad impact of RSV and HuNoV infections underline their particular importance. RSV and HuNoV are RNA viruses with distinctive genome structures and characteristics that define their respective families^5,6. RSV belongs to the Pneumoviridae family and Orthopneumovirus hominis species. It carries a single-stranded, negative-sense, non-segmented RNA genome. The RSV genome consists of approximately 15,200 bp containing 10 genes encoding 11 proteins. Each gene encodes a separate mRNA except M2, which contains two overlapping open reading frames (ORFs). HuNoV is a positive-sense, single-stranded RNA virus that belongs to the Caliciviridae family and Norovirus norwalkense species. The genome is between 7,500 and 7,700 bp in length and is divided into three overlapping ORFs⁷. ORF1 encodes a large polyprotein cleaved into six non-structural proteins, while ORF2 and ORF3 encode the major (VP1) and minor (VP2) capsid proteins, respectively. The HuNoV genome is covalently linked at the 5′ end to a small viral protein (VPg), which is instrumental for the initiation of protein synthesis^6,8,9, and is polyadenylated at the 3′ end.

RSV and HuNoV are known for their substantial strain diversity^3,9 and are divided into numerous genotypes, each bearing unique genetic sequences. RSV is divided into two major subtypes, RSV-A and RSV-B, based on major antigenic differences in the G glycoprotein and reactivity to monoclonal antibodies^10,11. These groups are further classified into genotypes based on the nucleotide sequence of the second hypervariable region of the C-terminal end of the G gene. The number of RSV genotypes continues to grow, with 24 lineages within RSV-A and 16 within RSV-B identified thus far¹². However, there is no consensus on the classification for assigning genotypes or their nomenclature. The most recent genotypes circulating worldwide are RSV/A/Ontario (ON) and RSV/B/Buenos Aires (BA), with a unique 72- and 60-nucleotide duplication in the distal third of the G gene, respectively. Based on phylogenetic analysis of major capsid protein VP1 amino acid sequences, noroviruses are divided into ten genogroups (GI-GX), of which human infections are caused by viruses in the GI, GII, GIV, GVIII, and GIX genogroups. Each genogroup is divided into genotypes, and some genotypes are further divided into variants⁹. The prototype HuNoV is the GI.1 Norwalk virus. GII.4 viruses are responsible for a majority of HuNoV outbreaks worldwide, although other genotypes such as GII.17 have emerged as the leading cause of gastroenteritis in some countries in some years⁸. Therefore, obtaining full-length genomes to facilitate accurate characterization of RSV and HuNoV genotypes is important: it will aid in obtaining comprehensive information on viral diversity, tracking the appearance of treatment-induced as well as community-acquired mutants of these viruses, and possibly linking specific viral genome signatures with disease severity. This will also help to monitor the presence of intragenotypic and intergenotypic recombinants for HuNoV.

There are several demonstrated approaches to obtain genomic sequences from viruses¹³. RSV sequencing has been reported using NGS methods such as overlapping amplicon-based and targeted metagenomic sequencing^14,15,16,17. For HuNoV, amplicon-based sequencing¹⁸, capture probe-based enrichment^19,20, PolyA+ enrichment²¹, and long-read sequencing²² have been described. Each of these methods has its caveats, and obtaining full-length genomes from these viruses has been challenging due to the sequence heterogeneity among different genotypes and low viral titers in some samples. Furthermore, current commercial options for capture-based enrichment, such as the Twist Comprehensive Viral Research Panel, are designed to enrich and detect a broad range of viruses rather than to target RSV and HuNoV and all their known genotypes for complete genome sequencing²³. This study aims to provide comprehensive probe sets for these two important viral pathogens and a single workflow that can be used to recover full-length genomes and facilitate accurate genotyping of both viruses. Furthermore, the generated sequence data are used, for the first time, to study RSV genome ORF expression patterns.

Results

We utilized capture probes and a streamlined target enrichment workflow for sequencing and analysis of RSV and HuNoV genomes (Fig. 1). To demonstrate the utility of the capture enrichment methodology, sequencing data from pre- and post-capture libraries of both RSV and HuNoV were analyzed for efficiency of genome recovery and accuracy of genotyping. Samples used in this study were all RSV or HuNoV positive, and their subtypes/genotypes were previously determined using qPCR assays as detailed in the Methods. For RSV, 85 post-capture libraries and 24 pre-capture libraries belonging to RSV-A and RSV-B subtypes were sequenced (Table 1). For HuNoV, 55 post-capture and 55 pre-capture libraries were sequenced. These 55 HuNoV samples represent GI.1, GII.4, and other GII genotypes (GII.3, GII.6, and GII.17) (Table 1).

Table 1 Sample composition, mapping, and genome assembly statistics.

Sequencing results and capture enrichment efficiency

The sequences were trimmed to remove low-quality regions, and the resulting non-human reads were analyzed using the VirMAP pipeline. A summary of the mapping and assembly statistics can be found in Table 1 and Additional Table 1. Overall, a large proportion of post-processing reads (trimmed and host-filtered reads) in the post-capture libraries mapped to their respective target virus; this proportion was significantly lower in pre-capture libraries (Fig. 2).

A total of 1.74 billion raw reads were generated from 85 RSV post-capture libraries, with an average of 20.58 million (SD = 54 million) total raw, 300,000 (SD = 6,7000) host-mapped, and 14 million (SD = 39.1 million) viral genome-mapped reads (Table 1). The mean percentage of post-processing reads mapped to the RSV genome was 85.1%. This pattern was similar between RSV-A and RSV-B subtypes (Fig. 2). To assess the enrichment efficiency of post-capture libraries compared to pre-capture libraries, a subset of 24 pre-capture libraries was randomly selected and sequenced. These libraries generated a total of 0.32 billion raw reads, with an average of 13.3 million (SD = 5.4 million) total raw, 6.7 million (SD = 4.5 million) host-mapped, and 661 (SD = 1,3000) RSV-mapped reads (Table 1). The mean percentage of post-processing reads mapped to the RSV genome in the pre-capture libraries was 0.08% (Fig. 2).

The 55 HuNoV post-capture libraries generated a total of 1.31 billion raw reads, with an average of 23.9 million (SD = 55.5 million) total raw, 81,123 (SD = 304,000) host-mapped, and 13.6 million (SD = 44.1 million) HuNoV-mapped reads (Table 1). To assess the capture efficiency, 55 pre-capture libraries were sequenced. They generated a total of 2.54 billion raw reads, with an average of 46.2 million (SD = 57.8 million) total raw, 12.5 million (SD = 40.5 million) host-mapped, and 128,000 (SD = 638,000) HuNoV-mapped reads (Table 1). The mean percentage of post-processing reads mapped to HuNoV genomes was 40.8% in post-capture libraries and 1.15% in the pre-capture libraries. The percentage of reads that mapped to the HuNoV genomes varied among the genotypes (Fig. 2). Detailed statistics for RSV and HuNoV genomes can be found in Supplementary Table S1.

The comprehensiveness of genome recovery and genotyping

To evaluate the capability of the capture methodology to assemble full-length genomes, the VirMAP pipeline was used to reconstruct RSV and HuNoV genomes. The VirMAP summary statistics are shown in Fig. 3 and Table 1. Genome recovery success using the capture probe sets was evaluated by classifying each genome reconstruction as ‘complete’ (within expected length range, > 90% completeness & > 20x coverage), ‘complete with low coverage’ (within expected length range, > 90% completeness & < 20x coverage) or ‘incomplete’ (below expected length range, < 90% completeness & < 20x coverage).
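The three categories above can be expressed as a small decision rule. The following is a minimal sketch, not the VirMAP implementation: the expected length range is passed in as a parameter because it is not stated in this section, the thresholds are treated as inclusive, assemblies of zero length are labelled 'failed', and any assembly that misses the 'complete' criteria is treated as 'incomplete'. All of these are assumptions, and the names `Assembly` and `classify_assembly` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Assembly:
    length_bp: int        # assembled genome length
    completeness: float   # fraction of the expected genome recovered (0 to 1)
    mean_coverage: float  # mean depth of coverage


def classify_assembly(a, min_len, max_len,
                      min_completeness=0.90, min_coverage=20.0):
    """Assign a genome recovery category (thresholds are assumptions)."""
    if a.length_bp == 0:
        # No contig was produced at all.
        return "failed"
    in_range = min_len <= a.length_bp <= max_len
    if in_range and a.completeness >= min_completeness:
        if a.mean_coverage >= min_coverage:
            return "complete"
        return "complete with low coverage"
    # Mixed cases are not defined in the text; they are grouped here.
    return "incomplete"
```

For example, with a hypothetical HuNoV length range of 7,400 to 7,800 bp, an assembly of 7,600 bp at 90% completeness and 15x mean coverage would fall under 'complete with low coverage'.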

Complete genomes were successfully reconstructed for all 85 post-capture RSV libraries. In the 24 pre-capture libraries, there was one complete genome, six complete with low coverage, and 17 incomplete genomes (Fig. 3). The assembled genome length for the post-capture libraries was between 15,116 and 15,346 bp, and between 11,948 and 15,253 bp in pre-capture libraries (Table 1). The average coverage ranged from 3,153x to 3.05 million x, with a mean of 123,000x (SD = 342,000), in post-capture libraries. In the 24 pre-capture libraries, it ranged from 1x to 59x, with a mean of 6x (SD = 11) (Fig. 3 and Supplementary Table S1). The 85 RSV post-capture genomes had a completeness of 99–100%, allowing the assignment of subtype as RSV-A or RSV-B (Supplementary Table S1).

Of the 55 HuNoV post-capture libraries, 47 samples yielded complete genomes with > 20x coverage (Fig. 3 and Table 1). The complete genomes were assembled from 34/35 samples (97%) with Ct values between 20 and 30, 11/13 samples (85%) with Ct values between 30 and 35, and 2/7 (29%) samples that were below the limit of the standard RT-qPCR assay.

In addition, complete genomes could be assembled in two samples with < 20x coverage. The two samples were p1540-BCM18-4 with a Ct value of 30.4 and p1540-BCM18-5-AP which had a high Ct value of 34.4. p1540-BCM18-4 produced a low coverage (10x) complete genome. Similarly, a low-coverage complete genome (90% and 15x) was recovered from sample p1540-BCM18-5-AP.

The four samples with incomplete genomes had Ct values ranging from 34.5 to Ct below the detection limit. The remaining two samples that failed to produce genome assemblies had Ct values of 28.3 and below the detection limit, respectively, and both underperformed in the pre-capture libraries, pointing to sample-related issues.

Of the 55 HuNoV pre-capture libraries, 18 yielded complete genomes. There were 7 samples with complete genomes at low coverage, 17 with incomplete genomes, and 13 samples for which the genome assembly failed (Fig. 3 and Table 1).

The assembled genome lengths of the HuNoV post-capture libraries were between 0 and 7,671 bp and for pre-capture libraries between 0 and 7,651 bp (Table 1). The genome coverage ranged from 0x to 3.64 million x, with a mean of 241,000x (SD = 782,000) in the post-capture libraries. The pre-capture libraries yielded a genome coverage range between 0x and 78,000x, with a mean of 2,284x (SD = 113,000) (Table 1 and Supplementary Table S1).

Complete HuNoV genome reconstructions were genotyped via the CDC-developed Human Calicivirus Typing Tool²⁴. Of the 47 samples with complete genomes, 22 belonged to GI.1, 15 belonged to GII.4 and the remaining 10 belonged to other GII genotypes (Fig. 3 and Table 1). In both RSV and HuNoV data sets, there was agreement in subtype or genotype assignment between the complete post-capture and pre-capture genomes.

To assess the ability of this probe-based capture enrichment method to enhance viral genome coverage depth, we realigned reads to either a reference genome (RSV) or individual sample-assembled genomes (HuNoV) and calculated the percentage of bases in the genome that are covered at a minimum of 20x in both post- and pre-capture libraries. Through this analysis, three HuNoV samples that met the first genome completeness criteria showed a relatively low breadth of 20x coverage (Fig. 4). To rule out any process-related issues or problems with the capture probes themselves, capture and sequencing were repeated for these three samples (p1540-723-100595-AP, p1540-TCH-17-78-AP, and p1540-BCM18-5-AP) using newly prepared libraries. The results were the same as the first time, indicating that the problem is sample-related.
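The breadth-of-coverage metric described above is the percentage of genome positions whose depth meets a threshold. A minimal sketch follows, assuming per-base depths are already available as a list with one entry per genome position (zero-depth positions included); the tool used to produce those depths is not stated here, and the function name `breadth_at_depth` is hypothetical.

```python
def breadth_at_depth(depths, min_depth=20):
    """Percentage of positions covered at >= min_depth.

    depths: one integer per genome position, including zero-depth positions.
    """
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)
```

Positions with no reads must be present in the input; otherwise the denominator shrinks and the breadth is overestimated.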

RSV ORF expression

To identify and quantify sub-genomic mRNAs, the sequenced RSV reads were aligned to RSV-A or RSV-B reference genomes. The RSV genome has a total of 11 ORFs and the ORF read coverage for genotypes RSV-A and RSV-B are presented as normalized read pair counts (FPKM) (Fig. 5).
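FPKM normalizes the fragment count for an ORF by both the ORF length and the sequencing depth. The sketch below uses the standard definition (fragments per kilobase of feature per million mapped fragments). Which total is used as the denominator in this study (for example, all virus-mapped fragments or only mRNA-sense fragments) is not stated in this section, so `total_mapped_fragments` is an assumption left to the caller.

```python
def fpkm(orf_fragments, orf_length_bp, total_mapped_fragments):
    """Fragments per kilobase of ORF per million mapped fragments."""
    if orf_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("ORF length and total fragments must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return orf_fragments * 1e9 / (orf_length_bp * total_mapped_fragments)
```

As a check on the units, 500 fragments on a 1,000 bp ORF in a library of one million mapped fragments gives an FPKM of 500.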

A total of 46 samples were infected with RSV-A subtype. All 11 ORFs were quantified in post-capture libraries (Fig. 5). ORFs SH and G had the highest expression with an average of 124,303 and 109,011 FPKM respectively (Supplementary Table S2). ORF M2-2 & M2-1, on the other hand, had the lowest expression with 19,890 and 26,690 FPKM respectively.

In comparison, in the 13 pre-capture libraries belonging to the RSV-A subtype, ORFs SH and G showed the highest expression, with an average of 139,449 and 109,086 FPKM, respectively. The lowest expression was seen in ORFs NS2 and M2-2, with an average of 13,659 and 23,684 FPKM, respectively. Incomplete expression of ORFs was recorded in 9 pre-capture libraries, likely due to low read coverage. Notably, NS2 and M2-2 were not detectable in 7 and 6 of the pre-capture libraries, respectively (Supplementary Table S2).

The remaining 39 samples were infected with the RSV-B subtype, and all 11 ORFs were expressed in post-capture libraries. ORFs G and M had the highest average FPKM values of 98,558 and 49,966, respectively, and ORFs M2-2 and N had the lowest values of 16,173 and 25,708 FPKM, respectively (Fig. 5).

In the 11 pre-capture libraries, the expression level was highest in ORFs G and NS1 with average values of 106,693 and 50,260 FPKM, and the lowest values of 13,829 and 20,336 were in ORFs M2-2 and M2-1 respectively. In 6 pre-capture libraries, incomplete expression of ORFs occurred. Expression was not detected in 5 libraries for ORFs M2-2 and M2-1, while SH ORF expression was not detected in 4 libraries (Supplementary Table S2).

Discussion

In this study, comprehensive capture probes were designed and used in conjunction with the capture enrichment method to sequence complete RSV and HuNoV genomes from clinical samples. Acute respiratory infections and acute gastroenteritis caused by viruses are leading causes of morbidity and mortality worldwide and account for over three million healthcare visits annually in the US. RSV is the major respiratory pathogen causing illness and deaths among infants and the elderly, with over 33 million cases of lower respiratory tract illness reported each year². Similarly, HuNoV is the most common cause of acute gastroenteritis worldwide, with an estimated 677 million cases reported each year²⁵. The epidemiology of HuNoV is complex, involving multiple genogroups and genotypes of viruses. New variants of the globally dominant genotype GII.4 are known to periodically replace circulating variants. The incidence of both RSV and HuNoV decreased substantially during the COVID-19 pandemic but has now returned to pre-pandemic levels^26,27. With the introduction of RSV vaccines, it is important to determine whether there are evolutionary changes in the virus due to selection pressure. Tracking genomic changes over time for both these viruses is thus important for monitoring virus evolution and will also be useful to define factors associated with virulence for these two pathogens.

Since these viruses represent two significant pathogens, reliable methods for studying their genomic variability and evolution are imperative. The use of capture enrichment methodology overcomes PCR primer design problems across the diverse viral strains and reduces the non-target sequencing typically seen in standard RNA-seq.

Recently, Baier et al. designed an RSV capture probe set using a total of 1,101 complete genome sequences and used it to characterize the RSV-B outbreak in 2019 in four patients¹⁴. Probe-based capture enrichment for HuNoV from human samples²⁸ and infected oysters²⁰ was reported previously. Brown et al. reported the largest HuNoV probe set of the two studies, which was designed using 622 norovirus partial or complete genomes and tested using different isolates of GI and GII¹⁹. In this study, we report a custom-designed RSV probe set, based on 1,570 genomic sequences and covering 99.79% of targeted isolates, and a HuNoV probe set, designed from 1,376 sequences and covering 99.68% of targeted isolates. To our knowledge, these represent the most comprehensive probe sets designed to date for sequencing RSV and HuNoV.

Several process improvements, such as sorting samples based on Ct values (from high titer to low titer) on a plate during cDNA and library construction and arraying samples in alternate columns on a plate, were implemented to mitigate any potential contamination between samples. For target enrichment, to manage uneven sequence yields among samples, library pools were created based on Ct values, drawing on our previous experience with SARS-CoV-2 enrichment²⁹. While uneven yields were still noted in these pools, enough reads were obtained for all 85 RSV and 47/55 HuNoV samples to generate full-length genomes.
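The plate arrangement described above (samples ordered from high to low titer and placed only in alternate columns) can be sketched as follows. This is an illustration under stated assumptions rather than the laboratory's actual plate map: wells are filled down each odd-numbered column of a 96-well plate, a lower Ct is taken to mean a higher titer, and samples without a measurable Ct can be given `float("inf")` so that they sort last. The function name `plate_layout` is hypothetical.

```python
import string


def plate_layout(samples, rows=8, columns=12):
    """Map wells to sample IDs, highest titer (lowest Ct) first.

    samples: list of (sample_id, ct) tuples.
    Only odd-numbered columns are used (assumed fill order: down each column).
    """
    ordered = sorted(samples, key=lambda s: s[1])
    used_columns = range(1, columns + 1, 2)  # columns 1, 3, 5, ...
    wells = [f"{string.ascii_uppercase[r]}{c}"
             for c in used_columns for r in range(rows)]
    if len(ordered) > len(wells):
        raise ValueError("too many samples for one plate in alternate columns")
    return {well: sample_id for well, (sample_id, _ct) in zip(wells, ordered)}
```

With this layout a 96-well plate holds at most 48 samples, and each occupied column is separated from the next by an empty one.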

A comparison between post- and pre-capture libraries for both RSV and HuNoV samples revealed that the percentage of reads aligning to the target virus genome (Table 1; Fig. 2), as well as the number of samples that resulted in full-length genomes (Fig. 3 and Fig. 4), was significantly higher in the post-capture libraries than in the pre-capture libraries. Post-capture libraries showed 85.1% of reads mapping to the RSV genome, an 850x enrichment over 0.08% in pre-capture libraries. In HuNoV samples, 40.8% of reads mapped post-capture, a 40.8x increase from the 1.15% in pre-capture libraries. These results are in line with previously reported probe-based enrichment methods for viral sequencing^29,30.
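One simple way to express enrichment is the ratio of the mean mapped percentages after and before capture. The sketch below applies that ratio to the rounded means quoted in this section. It is only an approximation: with these rounded inputs the ratio comes out near 1,064 for RSV and near 35 for HuNoV, and the fold values quoted in the text may have been derived differently (for example from unrounded or per-sample values, which are not available here). The function name `fold_enrichment` is hypothetical.

```python
def fold_enrichment(post_capture_pct, pre_capture_pct):
    """Ratio of post-capture to pre-capture mapped-read percentages."""
    if pre_capture_pct <= 0:
        raise ValueError("pre-capture percentage must be positive")
    return post_capture_pct / pre_capture_pct


# Rounded mean percentages reported in the text.
rsv_fold = fold_enrichment(85.1, 0.08)
hunov_fold = fold_enrichment(40.8, 1.15)
```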

Complete genomes were successfully assembled for all 85 RSV post-capture libraries, while only one complete genome was recovered from 24 pre-capture libraries. There were six samples in the ‘complete with low coverage’ category and 17 samples with ‘incomplete’ genomes (Table 1 and Supplementary Table S1). Subtypes could be assigned to all 85 samples, with 46 RSV-A and 39 RSV-B. RSVAB-WGS³¹ is an amplicon-based protocol specifically designed for sequencing the RSV genome. It utilizes 12 primers to cover both subtypes, generating PCR fragments ranging from 1.5 to 2.5 kb. In 34 clinical samples tested with that protocol, over 90% of the genome was recovered for Ct values ≤ 25, while coverage dropped to 60–90% for Ct 26–27 and 50% for Ct above 27. In our study, we recovered full-length genomes from RSV-A and RSV-B subtypes up to Ct 30.

Complete genomes were successfully reconstructed for 47/55 HuNoV post-capture libraries. Among the remaining eight, two samples were categorized as ‘complete with low coverage’, four had ‘incomplete’ genomes, and two samples failed to generate genome assemblies. As previous work has demonstrated, the upper Ct threshold for reliable genome recovery is approximately 30–33 cycles^29,32. Several factors play into this threshold, including the stage of the infection cycle when the viruses are actively replicating as well as the quality and quantity of the viral RNA. In this data set, complete genomes with > 20x coverage could be recovered in 97% (34/35) of the samples with Ct < 30. One sample, p1336-100626 (Supplementary Table S1), with a Ct value of 28.3, failed to produce a full-length genome in both post- and pre-capture libraries, pointing towards a sample-related issue. Similarly, complete genomes with > 20x coverage could be recovered in 65% (13/20) of samples with Ct > 30 using the capture probes. Of note, two samples failed in both post- and pre-capture sequencing (2/8 samples), suggesting low viral titers or poor sample quality, and 5 samples were below the limit of detection. In the pre-capture set, only 18 out of 55 yielded complete genomes (Fig. 3), together suggesting that capture enrichment is highly desirable.

The breadth of coverage at 20x depth was calculated to assess the efficiency of capture enrichment to enhance viral genome coverage depth (Fig. 4). Notably, a substantial increase was observed in RSV, with both RSV-A and RSV-B samples exhibiting a dramatic post-capture rise in 20x coverage. HuNoV samples also displayed increased coverage post-capture, with remarkable coverage improvement across distinct genotypes, suggesting that the capture method offers significant benefits for RSV and HuNoV genome sequencing.

Both the results of this study and previous reports have shown that oligonucleotide capture methods show robust performance as the probes can tolerate variation in target sequences during enrichment, have overlapping designs, and can enrich degraded samples, thereby greatly improving the chances of complete genome recovery^29,30,33. As shown previously in Influenza A H3N2 virus³⁴ and Zika virus³⁵, the capture methodology has built-in tolerance to successfully enrich sequences from strains and genomic regions that could have mutations and therefore, the probes can enrich genomes from genotypes/genogroups.

The capture probes and the methodology described in this paper have been previously utilized to generate whole genome sequences of both RSV and HuNoV clinical samples^36,37. In the RSV study, 69 samples were collected longitudinally from HCT adults with normal (< 14 days) and delayed (≥ 14 days) RSV clearance enrolled in a Ribavirin trial. Full-length genomes obtained from post-capture sequencing were analyzed across RSV-A or RSV-B to determine the inter-host and intra-host genetic variation and the effect on glycosylation³⁶. The inter-host variation of RSV centered mostly in the glycoprotein (G) gene, followed by the polymerase (L) and matrix (M) genes. Intra-host variation occurred primarily in the G gene, followed by the non-structural protein (NS1) and L genes, and only in the delayed viral clearance group, suggesting that prolonged viral shedding and immune deficiency result in RSV variation. Studies involving RSV genomes and transcripts (ORFs) using capture probe–based sequencing in adults receiving hematopoietic stem cell transplantation demonstrated that highly abundant RSV transcripts had lower genetic variation than RSV genomes, implying a putative role in stabilizing the intra-host viral variation. In addition, capture-based RSV sequencing can be used to measure RSV evolutionary dynamics. Whole genome sequencing can be used to study how RSV evolves over time, how different lineages appear, disappear, and accumulate variations, as well as how these changes differ between RSV subtypes and how various parts of the RSV genome evolve over time³⁸.

In the HuNoV study, the evolutionary dynamics of human norovirus in healthy adults were studied using 156 HuNoV sequential samples from a controlled infection study³⁷. Complete genomes were assembled for 123 of 156 samples (79%), including 45% of samples with Ct values below the limit of detection (> 36 cycles) of the GI.1 genotype and collected up to 28 days post-infection. Non-synonymous amino acid changes were observed in all proteins, with capsid VP1 and nonstructural protein NS3 showing the highest variations. These findings indicate limited conserved immune pressure-driven evolution of the GI.1 virus in healthy adults and highlight the utility of capture-based sequencing to understand HuNoV biology.

Studying viral ORF expression is important to understanding viral pathogenesis, differentiation factors between subtypes, and the effects of genomic mutations on gene function including vaccine development. The RSV genome codes for 11 viral proteins, including three transmembrane glycoproteins G, F, and SH; matrix protein (M) and two transcription/replication regulating proteins (M2-1 and M2-2); three proteins related to nucleocapsid (N, P, L), and lastly two non-structural proteins NS1 and NS2³⁹. There are multiple reports of RSV ORF expression analysis where earlier studies suggested a gradient of gene transcription across the genome. ORF NS1 had the highest and ORF L had the lowest expression. Later reports demonstrated non-gradient mRNA levels, with the highest expression levels of the attachment ORF G^40,41,42. Differential patterns in RSV ORF expression in genotypes are also known⁴³. None of these studies used data from capture-enriched libraries that provide higher efficiency in RSV sequence recovery directly from patient samples.

Here, we demonstrate for the first time the use of RSV sequence data generated from strand-specific libraries to study ORF expression. RSV is a negative-sense RNA virus and the ORFs are expressed as positive-strand mRNAs. Therefore, the reads from a strand-specific library derived from the sense strand (mRNA) will map onto the antisense strand of the reference genome, while those obtained from the genomic RNA map onto the sense strand.
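The strand logic described above can be illustrated with the bitwise FLAG field of SAM/BAM alignments. The sketch below is not the study's pipeline. It assumes a dUTP-style directional library in which read 1 (or a single-end read) is antisense to the original RNA and read 2 has the RNA's own orientation, and it assumes the reference sequence is in the positive-sense (mRNA) orientation. The function name `rna_origin` and the returned labels are hypothetical.

```python
# SAM FLAG bits (from the SAM format specification)
UNMAPPED = 0x4
REVERSE = 0x10   # read aligned to the reverse strand of the reference
READ2 = 0x80     # second read of a pair


def rna_origin(flag):
    """Infer the sense of the source RNA from a SAM flag.

    Returns "mRNA-sense", "genome-sense", or None for unmapped reads.
    """
    if flag & UNMAPPED:
        return None
    on_reverse = bool(flag & REVERSE)
    if flag & READ2:
        # Read 2 has the same orientation as the original RNA.
        rna_is_positive = not on_reverse
    else:
        # Read 1 (or a single-end read) is antisense to the original RNA.
        rna_is_positive = on_reverse
    return "mRNA-sense" if rna_is_positive else "genome-sense"
```

Under these assumptions a read 1 aligned to the reverse strand (for example flag 83) is counted as mRNA-derived, whereas a read 1 aligned to the forward strand (flag 99) is attributed to negative-sense genomic RNA. Positive-sense reads may also include antigenome, which this rule cannot separate from mRNA.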

While the ORF expression between post-capture and pre-capture libraries showed similar trends (Fig. 5), differences in ORF expression were not observed in a substantial number of both the RSV-A and RSV-B pre-capture libraries. This is not surprising given the low percentage of viral reads observed in these libraries. These results strongly suggest that the capture methodology significantly increased our ability to analyze ORF expression patterns without inducing any technical biases. Additionally, ORF expression differences were noted between the two subtypes (Fig. 5). RSV-A subtype samples showed the highest expression in transmembrane ORFs SH and G, while ORFs M2-2 and M2-1 showed the lowest expression. RSV-B subtype samples had the highest expression in ORFs G and M and the lowest in ORFs M2-2 and N. Such subtype-specific differences were also reported by our group as well as others^15,44. ORF expression data generated from this approach can be utilized to investigate differences in viral gene expression in vitro within organoid models across various strains and hosts, aiding in the study of RSV pathogenesis.

ORF analysis in HuNoV samples is not possible using the short reads generated in this study, as both the genome and ORFs in HuNoV are positive-strand RNA. Further, unlike the SARS-CoV-2 genome, where each ORF has a 5’ leader sequence, there are no such key ORF sequence differentiators in HuNoV that could be used to identify reads specifically originating from ORFs. Long-read sequencing data is recommended to identify and analyze HuNoV ORF expression profiles.

Conclusion

In conclusion, we describe two comprehensive probe sets and the capture enrichment methodology to successfully recover complete genomes from diverse genotypes of two important human viral pathogens. The methodology described to obtain the complete genome sequences is already in use to study viral genome evolution in these viruses. This type of sequencing data is also useful, as demonstrated here, in studying the RSV ORF expression patterns.

Methods

Samples used in this study

RSV samples are part of active surveillance of pediatric acute respiratory illness (ARI) through the CDC’s New Vaccine Surveillance Network (NVSN)⁴⁵. RSV-positive samples were collected from patients enrolled at the Houston NVSN site only. RSV-positive samples were collected as mid-turbinate nasal and throat swabs in viral transport media, aliquoted, and stored at -80 °C until testing or sequencing was performed. An aliquot of 200 μl was then extracted as described in the RNA isolation section, with no pre-treatment. HuNoV-positive stool samples were collected as part of a controlled human infection model for GI.1 virus⁴⁶, and there was no pre-treatment during RNA isolation. In addition, stool samples that were collected for HuNoV cultivation in human intestinal enteroids were obtained from several sources at the Texas Medical Center. In total, 85 RSV samples and 55 HuNoV samples were characterized. In the HuNoV sample set, 54 samples were unique and one sample was extracted with two different methods, with each extract treated as an independent sample.

All 85 RSV samples and 49/55 HuNoV samples were collected from infected persons, while the remaining 6 HuNoV samples were from HuNoV-infected human intestinal enteroids.

RNA isolation

For the 85 RSV samples, approximately 200 μl of each primary sample was extracted using the PureLink Pro Viral 96 DNA/RNA extraction kit (Thermo 12280096A) following the manufacturer’s instructions. Samples were eluted in 100 μl.

For the 55 HuNoV samples, three RNA extraction kits were used, starting with 0.2 g of primary sample. For 33 samples, the MagAttract PowerMicrobiome DNA/RNA extraction kit (Qiagen 27500-4-EP) was used, and for 16 samples, the AllPrep PowerFecal Pro DNA/RNA extraction kit (Qiagen 80254) was used. For the 6 HuNoV-infected human intestinal enteroids, RNA was isolated using the MagMAX-96 viral RNA isolation kit. Samples were eluted in 100 µl.

Viral titer quantification

Real-time qPCR of RSV was performed using primers targeting the N gene as previously described⁴⁷.

HuNoV titers were assessed by reverse transcription-quantitative polymerase chain reaction (RT-qPCR), using the qScript XLT One-Step RT-qPCR ToughMix reagent with ROX reference dye (Quanta Biosciences). The primer pair and probe COG2R/QNIF2d/QNIFS⁴⁸ was used for GII genotype and NIFG1F/V1LCR/NIFG1P was used for GI.1 genotype. Per sample Ct values can be found in Supplementary Table S1.

Capture probe design

The RSV probe set size was 23.77 Mb and was designed based on 1,570 publicly available genomic sequences of RSV isolates. There are 87,025 unique probes of 80 bp length covering 99.79% of the targeted RSV isolates. The HuNoV probe set size was 9.6 Mb and was designed based on 1,376 publicly available genomic sequences of HuNoV isolates. There are 39,300 unique probes of 80 bp length covering 99.68% of the targeted HuNoV isolates (Table 2). All major HuNoV genogroups were included in the probe design. The GenBank IDs for the references can be found in the capture design files of both RSV and HuNoV (see Supplementary File 1 and Supplementary File 2).

Table 2 The number of isolates used and the final capture probe design details.

cDNA preparation

Samples were processed in alternate columns on a 96-well plate and sorted from top left to bottom right from the highest titer to the lowest titer. On each plate, along with the samples, an aliquot of Universal Human Reference (UHR) RNA (ThermoFisher) was processed, which served as a positive control for cDNA synthesis and library preparation. For each batch of RNA extracted, there was an extraction negative control. cDNA was generated using the NEBNext® RNA First Strand Synthesis Module (E7525L; New England Biolabs Inc.) and the NEBNext® Ultra™ II Directional RNA Second Strand Synthesis Module (E7550L; New England Biolabs Inc.). Total RNA (5 μl) in a 15 μl mixture containing random primers and 2X first-strand cDNA synthesis buffer was incubated at 94°C for 10 min to fragment the RNA to 200–600 bp. RNA was converted to cDNA by adding a 5 μl enzyme mix containing 500 ng Actinomycin D (A7592, Thermo Fisher Scientific), 0.5 μl RNase inhibitor, and 1 μl of ProtoScript II reverse transcriptase, then incubating at 25°C for 10 min, 42°C for 50 min, and 70°C for 15 min, before cooling to 4°C on a thermocycler. Second-strand cDNA was synthesized by adding 60 μl of a mix containing 48 μl H2O, 8 μl of 10X reaction buffer, and 4 μl of second-strand synthesis enzyme, and incubating at 16°C for 1 h on a thermocycler. The double-stranded (ds) cDNA was purified with a 1.8X volume of AMPure XP beads (A63882, Beckman) and eluted into 42 μl of 10 mM Tris buffer (Cat#A33566, Thermo Fisher Scientific). Because these libraries were prepared primarily for sequence capture, rRNA depletion and poly(A)+ RNA isolation steps were not performed.

Capture enrichment and sequencing

RSV and HuNoV cDNA were hybridized in separate pools with biotin-labeled RSV and HuNoV capture probes. The 85 RSV samples were enriched in three library pools consisting of 24 samples with Ct values 17 to 21.5, 31 samples with Ct values 21.8 to 25, and 30 samples with Ct values 25.1 to 29.9 along with samples with Ct ND. The 55 HuNoV libraries were grouped into three pools: one pool containing 14 samples with Ct values between 21.5 and 25.7, a second pool containing 13 samples with Ct values between 26.3 and 34.5 along with samples with Ct values ND, and a third pool containing 28 samples with Ct values between 20.2 and 34.88.
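The Ct-based grouping of RSV libraries described above can be illustrated with a small Python sketch. It is not the authors' procedure: the pool boundaries are the upper ends of the reported Ct ranges, and treating the pools as contiguous bins, along with the names `RSV_POOL_UPPER_CT` and `assign_rsv_pool`, are assumptions made for illustration.

```python
from typing import Optional

# Upper Ct bound of each RSV pool, taken from the ranges reported above.
# Treating the pools as contiguous bins (e.g. a Ct of 21.6 falling into
# pool 2) is an assumption made for this illustration.
RSV_POOL_UPPER_CT = (21.5, 25.0, 29.9)


def assign_rsv_pool(ct: Optional[float]) -> int:
    """Return the 1-based pool number for a sample.

    `ct` is None when the Ct was reported as ND; such samples are placed in
    the highest-Ct pool, as described in the text.
    """
    if ct is None:
        return len(RSV_POOL_UPPER_CT)
    for pool_number, upper in enumerate(RSV_POOL_UPPER_CT, start=1):
        if ct <= upper:
            return pool_number
    raise ValueError(f"Ct {ct} is above the highest pooled value")


if __name__ == "__main__":
    for ct in (17.0, 21.8, 25.1, None):
        print(ct, "-> pool", assign_rsv_pool(ct))
```

Grouping libraries with similar Ct values keeps high-titer samples from dominating a shared hybridization; the same idea would apply to the HuNoV pools, although their reported Ct ranges overlap and so cannot be expressed as simple bins.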

All six pools of cDNA libraries were incubated at 70°C for 16 h followed by enrichment PCR as previously reported²⁹. The amount of each cDNA library pooled for hybridization and the number of post-capture amplification PCR cycles (12–20) were determined empirically according to the virus Ct values. Between 1.8 and 4.0 μg of pre-capture library was used for hybridization with the viral probes, and the post-capture libraries were sequenced on an Illumina NovaSeq S4 flow cell to generate 2 × 150 bp paired-end reads. Pre-capture libraries for 24 RSV samples and all 55 of the HuNoV samples were also sequenced.

RSV and HuNoV genome assembly

Following sequencing, raw data files in binary base call (BCL) format were converted into FASTQs and demultiplexed based on the dual-index barcodes using the Illumina ‘bcl2fastq’ software. Demultiplexed raw FASTQ sequences were processed using BBDuk⁴⁹ to quality trim, remove Illumina adapters, and filter PhiX reads. Trimmed FASTQs were mapped to a combined PhiX and human reference genome database (hg38) using BBMap⁴⁹ to determine and remove human/PhiX reads. Trimmed and host-filtered reads were processed through VirMAP⁵⁰ to assemble complete RSV or HuNoV genomes. The VirMAP summary statistics include information on reconstructed genome length, the number of reads mapped to the reconstruction, and the average coverage across the genome.

HuNoV genome reconstructions were genotyped via the CDC-developed Human Calicivirus Typing Tool²⁴. Final reconstructions were manually inspected using Geneious Prime® 2022.1.1 and aligned against the relevant HuNoV or RSV reference genomes to determine the quality of assemblies. The breadth of coverage at 20x depth was calculated by re-aligning the raw reads to the reference genome (RSV) or to the individual sample-assembled genome (HuNoV) using BWA MEM⁵¹ (version 0.7.17-r1188) with standard parameters. Coverage for each sample was assessed using “samtools depth” (version 1.6), applying a mapping quality filter of 20 phred scores (-q 20). Downstream analysis of summary statistics was done using R⁵².
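As a rough illustration of the breadth-of-coverage calculation, the Python sketch below parses the three-column, tab-separated output of `samtools depth` (reference name, position, depth) and reports the fraction of genome positions at or above 20x. The function name, the single-reference assumption, and the example numbers are illustrative and not taken from the study. Note that in the samtools documentation `-q` sets the minimum base quality and `-Q` the minimum mapping quality for `samtools depth`; the sketch works on whatever depth table is produced.

```python
from typing import Iterable


def breadth_of_coverage(depth_lines: Iterable[str],
                        genome_length: int,
                        min_depth: int = 20) -> float:
    """Fraction of genome positions with depth >= min_depth.

    Assumes a single reference sequence. Positions missing from the input
    (samtools depth omits zero-depth positions unless run with -a) count as
    not covered, which is why the genome length is passed in explicitly.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        if int(fields[2]) >= min_depth:
            covered += 1
    return covered / genome_length


if __name__ == "__main__":
    # Hypothetical depth table for a 5-base reference; position 5 is absent.
    example = ["ref\t1\t25", "ref\t2\t20", "ref\t3\t19", "ref\t4\t40"]
    print(breadth_of_coverage(example, genome_length=5))
```

In practice the lines would come from a file written by `samtools depth`, and the genome length from the reference or the sample-specific assembly.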

RSV expression profile analysis

VirMAP⁵⁰ trimmed reads from both the pre- and post-capture datasets were mapped to RSV-A ON and RSV-B BA reference genomes³⁶, according to the sample genotypes, using BBMap version 39.01. Gene annotation for the reference genomes ON and BAF was conducted using VIGOR⁵³. Since RSV is a negative-stranded RNA virus, read pairs with read 1 mapped to the negative strand are from the viral genome, while read pairs with read 1 mapped to the positive strand of the reference genome are from the viral mRNAs. Read pairs were assigned to each gene using featureCounts version 2.0.1⁵⁴ with “-s 1 -p” options for counting read pairs mapped to the positive strand of the reference genome. The read pair counts assigned to each gene were then normalized to the number of read pairs per kb of gene length and per million mapped reads (FPKM) and plotted using the R ggplot2 package⁵⁵.
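The FPKM normalization described above amounts to dividing each gene's read-pair count by its length in kb and by the number of mapped read pairs in millions. A minimal Python sketch follows; the function name, the choice of denominator (total mapped read pairs supplied by the caller), and the gene lengths and counts in the example are assumptions for illustration, not values from the study.

```python
from typing import Dict


def fpkm(fragment_counts: Dict[str, int],
         gene_lengths_bp: Dict[str, int],
         total_mapped_fragments: int) -> Dict[str, float]:
    """Fragments per kilobase of gene length per million mapped fragments.

    FPKM = count * 1e9 / (gene_length_bp * total_mapped_fragments)
    """
    if total_mapped_fragments <= 0:
        raise ValueError("total_mapped_fragments must be positive")
    values = {}
    for gene, count in fragment_counts.items():
        length_bp = gene_lengths_bp[gene]
        values[gene] = count * 1e9 / (length_bp * total_mapped_fragments)
    return values


if __name__ == "__main__":
    # Hypothetical counts and lengths for two RSV genes.
    counts = {"N": 500, "G": 250}
    lengths = {"N": 1000, "G": 500}
    for gene, value in fpkm(counts, lengths, total_mapped_fragments=1_000_000).items():
        print(gene, round(value, 2))
```

Because the counts are restricted to read pairs whose read 1 maps to the positive strand, the resulting values reflect mRNA-derived fragments rather than genomic RNA.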

Data availability

The datasets generated and/or analyzed during the current study are available in NCBI GenBank and SRA, under the accession numbers PRJNA1195114, PRJNA1195142, and PRJNA1195144. Analysis and figure code are available at the following GitHub link⁵⁶. Correspondence and requests for data and materials should be addressed to the corresponding authors Sara Javornik Cregeen and Harsha Doddapaneni.

References

Ahmed, S. M. et al. Global prevalence of norovirus in cases of gastroenteritis: a systematic review and meta-analysis. Lancet Infect Dis 14(8), 725–730 (2014).
Li, Y. et al. Global, regional, and national disease burden estimates of acute lower respiratory infections due to respiratory syncytial virus in children younger than 5 years in 2019: a systematic analysis. Lancet 399(10340), 2047–2064 (2022).
Yu, J. M., Fu, Y. H., Peng, X. L., Zheng, Y. P. & He, J. S. Genetic diversity and molecular evolution of human respiratory syncytial virus A and B. Sci Rep 11(1), 12941 (2021).
Yen, C. et al. Impact of an emergent norovirus variant in 2009 on norovirus outbreak activity in the United States. Clin Infect Dis 53(6), 568–571 (2011).
Pangesti, K. N. A., Abd El Ghany, M., Walsh, M. G., Kesson, A. M. & Hill-Cawthorne, G. A. Molecular epidemiology of respiratory syncytial virus. Rev Med Virol https://doi.org/10.1002/rmv.1968 (2018).
Ludwig-Begall, L. F., Mauroy, A. & Thiry, E. Noroviruses—the state of the art, nearly fifty years after their initial discovery. Viruses 13(8), 1541. https://doi.org/10.3390/v13081541 (2021).
Atmar, R. L. & Estes, M. K. Diagnosis of noncultivatable gastroenteritis viruses, the human caliciviruses. Clin Microbiol Rev 14(1), 15–37 (2001).
Robilotti, E., Deresinski, S. & Pinsky, B. A. Norovirus. Clin Microbiol Rev 28(1), 134–164 (2015).
Chhabra, P. et al. Updated classification of norovirus genogroups and genotypes. J Gen Virol 100(10), 1393–1406 (2019).
Anderson, L. J. et al. Antigenic characterization of respiratory syncytial virus strains with monoclonal antibodies. J Infect Dis 151(4), 626–633 (1985).
Mufson, M. A., Orvell, C., Rafnar, B. & Norrby, E. Two distinct subtypes of human respiratory syncytial virus. J Gen Virol 66(Pt 10), 2111–2124 (1985).
Goya, S. et al. Standardized phylogenetic classification of human respiratory syncytial virus below the subgroup level. Emerg Infect Dis 30(8), 1631–1641 (2024).
Fitzpatrick, A. H. et al. High throughput sequencing for the detection and characterization of RNA viruses. Front Microbiol 12, 621719 (2021).
Baier, C. et al. Target capture sequencing reveals a monoclonal outbreak of respiratory syncytial virus B infections among adult hematologic patients. Antimicrob Resist Infect Control 11(1), 88 (2022).
Lin, G. L. et al. Simultaneous viral whole-genome sequencing and differential expression profiling in respiratory syncytial virus infection of infants. J Infect Dis 222(Suppl 7), S666–S671 (2020).
Talts, T. et al. Robust and sensitive amplicon-based whole-genome sequencing assay of respiratory syncytial virus subtype A and B. Microbiol. Spectr. https://doi.org/10.1128/spectrum.03067-23 (2024).
Wang, L. et al. Next-generation sequencing of human respiratory syncytial virus subgroups A and B genomes. J. Virol Methods 299, 114335 (2022).
Fitzpatrick, A. H. et al. Amplicon-based high-throughput sequencing method for genotypic characterization of norovirus in oysters. Appl. Environ. Microbiol. 89(5), e0216522 (2023).
Brown, J. R. et al. Norovirus whole-genome sequencing by sureselect target enrichment: a robust and sensitive method. J. Clin. Microbiol. 54(10), 2530–2537 (2016).
Strubbia, S. et al. Metagenomic to evaluate norovirus genomic diversity in oysters: Impact on hexamer selection and targeted capture-based enrichment. Int. J. Food Microbiol. 323, 108588 (2020).
Fonager, J. et al. A universal primer-independent next-generation sequencing approach for investigations of norovirus outbreaks and novel variants. Sci Rep 7(1), 813 (2017).
Flint, A. et al. Genomic analysis of human noroviruses using combined Illumina-Nanopore data. Virus Evol 7(2), 079 (2021).
Kapel, N. et al. Evaluation of sequence hybridization for respiratory viruses using the twist bioscience respiratory virus research panel and the OneCodex Respiratory Virus sequence analysis workflow. Microb Genom 9(9), 001103 (2023).
Human Calicivirus Typing Tool. https://calicivirustypingtool.cdc.gov/bctyping.html
Pires, S. M. et al. Aetiology-specific estimates of the global and regional incidence and mortality of diarrhoeal diseases commonly transmitted through food. PLoS ONE 10(12), e0142927 (2015).
Hamid, S. et al. Seasonality of respiratory syncytial virus–United States, 2017–2023. MMWR Morb Mortal Wkly Rep 72(14), 355–361 (2023).
Kambhampati, A. K. et al. Notes from the field: norovirus outbreaks reported through NoroSTAT - 12 States, August 2012-July 2022. MMWR Morb Mortal Wkly Rep 71(38), 1222–1224 (2022).
Strubbia, S. et al. Characterization of norovirus and other human enteric viruses in sewage and stool samples through next-generation sequencing. Food Environ Virol 11(4), 400–409 (2019).
Doddapaneni, H. et al. Oligonucleotide capture sequencing of the SARS-CoV-2 genome and subgenomic fragments from COVID-19 individuals. PLoS ONE 16(8), e0244468 (2021).
Kuchinski, K. S. et al. Targeted genomic sequencing with probe capture for discovery and surveillance of coronaviruses in bats. eLife https://doi.org/10.7554/eLife.79777 (2022).
Iglesias-Caballero, M. et al. Genomic characterisation of respiratory syncytial virus: a novel system for whole genome sequencing and full-length G and F gene sequences. Eurosurveillance https://doi.org/10.2807/1560-7917.ES.2023.28.49.2300637 (2023).
Xiao, M. et al. Multiple approaches for massively parallel sequencing of SARS-CoV-2 genomes directly from clinical samples. Genome Med 12(1), 57 (2020).
Wylie, K. M. et al. Detection of viruses in clinical samples by use of metagenomic sequencing and targeted sequence capture. J. Clin. Microbiol. https://doi.org/10.1128/JCM.01123-18 (2018).
Ohrmalm, C. et al. Hybridization properties of long nucleic acid probes for detection of variable target sequences, and development of a hybridization prediction algorithm. Nucleic Acids Res 38(21), e195 (2010).
Twist Bioscience. The effects of mismatches on DNA capture by hybridization. https://www.twistbioscience.com/resources/white-paper/effects-mismatch-dna-capture-hybridization
Avadhanula, V. et al. Inter and intra-host diversity of RSV in hematopoietic stem cell transplant adults with normal and delayed viral clearance. Virus Evolut. https://doi.org/10.1093/ve/vead086 (2024).
Ramani, S. et al. Intra- and inter-host evolution of human norovirus in healthy adults. bioRxiv (2023).
Henke, D. et al. Examining intra-host genetic variation of RSV by short read high-throughput sequencing. bioRxiv 217, 1247 (2024).
Sullender, W. M. Respiratory syncytial virus genetic and antigenic diversity. Clin. Microbiol. Rev. 13(1), 1–15 (2000).
Aljabr, W. et al. Investigating the influence of ribavirin on human respiratory syncytial virus RNA synthesis by using a high-resolution transcriptome sequencing approach. J Virol 90(10), 4876–4888 (2016).
Levitz, R. et al. Distinct patterns of innate immune activation by clinical isolates of respiratory syncytial virus. PLoS ONE 12(9), e0184318 (2017).
Noton, S. L. & Fearns, R. Initiation and regulation of paramyxovirus transcription and replication. Virology 479–480, 545–554 (2015).
Piedra, F. A. et al. Non-gradient and genotype-dependent patterns of RSV gene expression. PLoS ONE 15(1), e0227558 (2020).
Piedra, F. A. et al. Modeling nonsegmented negative-strand RNA virus (NNSV) transcription with ejective polymerase collisions and biased diffusion. Front. Mol. Biosci. 9, 1095193 (2022).
New Vaccine Surveillance Network. 2024. https://www.cdc.gov/nvsn/php/about/index.html
Atmar, R. L. et al. Determination of the 50% human infectious dose for Norwalk virus. J Infect Dis 209(7), 1016–1022 (2014).
Avadhanula, V. et al. Infection with novel respiratory syncytial virus genotype Ontario (ON1) in adult hematopoietic cell transplant recipients, Texas, 2011–2013. J Infect Dis 211(4), 582–589 (2015).
Loisy, F. et al. Real-time RT-PCR for norovirus screening in shellfish. J Virol Methods 123(1), 1–7 (2005).
BBMap. https://jgi.doe.gov/data-and-tools/software-tools/bbtools/
Ajami, N. J., Wong, M. C., Ross, M. C., Lloyd, R. E. & Petrosino, J. F. Maximal viral information recovery from sequence data using VirMAP. Nat Commun 9(1), 3205 (2018).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv https://doi.org/10.48550/arXiv.1303.3997 (2013).
The R Project for Statistical Computing. https://www.r-project.org/
Wang, S., Sundaram, J. P. & Spiro, D. VIGOR, an annotation program for small viral genomes. BMC Bioinf. 11, 451 (2010).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30(7), 923–930 (2014).
ggplot2. https://ggplot2.tidyverse.org/
Github link: https://github.com/BCM-GCID/Capture_benchmarking_paper

Acknowledgements

This work was supported by the National Institute of Allergy and Infectious Diseases (Grant#1U19AI144297). No additional external funding was received for this study. The authors are grateful to the production teams at The Alkek Center for Metagenomics and Microbiome Research and Human Genome Sequencing Center for data generation. We would also like to thank Frederick Neill from the HuNoV group.

Author information

Sravya V Bhamidipati and Anil Surathu have contributed equally to this work.

Authors and Affiliations

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
Sravya V Bhamidipati, Hsu Chao, Daniel P. Agustinho, Qin Xiang, Kavya Kottapalli, Abirami Santhanam, Zeineen Momin, Kimberly Walker, Vipin K. Menon, George Weissenberger, Nathanael Emerick, Faria Mahjabeen, Qingchang Meng, Jianhong Hu, Fritz J. Sedlazeck, Ziad M. Khan, Ginger A. Metcalf, Richard A. Gibbs, Donna M. Muzny & Harsha Doddapaneni
Department of Molecular Virology & Microbiology, Baylor College of Medicine, Houston, TX, 77030, USA
Anil Surathu, Richard Sucgang, David Henke, Vasanthi Avadhanula, Pedro A. Piedra, Sasirekha Ramani, Robert L. Atmar, Mary K. Estes, Joseph F. Petrosino & Sara Cregeen Javornik
Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, 77030, USA
Anil Surathu, Richard Sucgang, Joseph F. Petrosino & Sara Cregeen Javornik
Department of Pediatrics, Baylor College of Medicine, Houston, TX, 77030, USA
Pedro A. Piedra
Department of Medicine, Baylor College of Medicine, Houston, TX, 77030, USA
Robert L. Atmar & Mary K. Estes

Authors

Sravya V Bhamidipati
Anil Surathu
Hsu Chao
Daniel P. Agustinho
Qin Xiang
Kavya Kottapalli
Abirami Santhanam
Zeineen Momin
Kimberly Walker
Vipin K. Menon
George Weissenberger
Nathanael Emerick
Faria Mahjabeen
Qingchang Meng
Jianhong Hu
Richard Sucgang
David Henke
Fritz J. Sedlazeck
Ziad M. Khan
Ginger A. Metcalf
Vasanthi Avadhanula
Pedro A. Piedra
Sasirekha Ramani
Robert L. Atmar
Mary K. Estes
Joseph F. Petrosino
Richard A. Gibbs
Donna M. Muzny
Sara Cregeen Javornik
Harsha Doddapaneni

Contributions

S.J.C and H.D: Conceptualization. S.V.B, A.S, D.P.A, S.J.C and H.D: Writing and Analysis. S.V.B, D.P.A, H.C, K.K, Z.M, G.W, N.E, F.M, J.H, V.K.M, Q.M: Data Generation. A.S: Project Administration. Q.X, A.S, K.W, R.S, D.H, F.S, Z.K, G.A.M, V.A, P.O, S.R, R.A, M.E, D.M.M: Review & Editing. R.A.G, J.F.P: Funding Acquisition. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Sara Cregeen Javornik or Harsha Doddapaneni.

Ethics declarations

Competing interests

R.L.A. and M.K.E. have grant support from Hillevax, Inc. and are consultants for that company. Baylor College of Medicine (R.L.A. and M.K.E. as inventors) has a patent for norovirus growth in human intestinal enteroids, and M.K.E. has a patent on methods and reagents to detect and characterize Norwalk virus and related viruses. The rest of the authors declare no competing interests.

Consent to participate

All methods were performed in accordance with relevant guidelines and regulations.

Ethical approval

RSV-positive samples were collected from patients enrolled at the Houston NVSN site only. They were obtained after written informed consent was obtained from the parent/guardian of the eligible children and age-appropriate assent from participating children. Institutional review board approval was obtained locally from Baylor College of Medicine (H-37691) and at the CDC.

Norovirus-positive samples were collected as deidentified samples prior to disposal from a clinical laboratory or as residual samples collected in a clinical study. The latter samples were obtained after written informed consent was obtained from the participants. Institutional review board approval was obtained locally from Baylor College of Medicine for both study groups (H-8390, H-45026).

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Supplementary Information 4.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Bhamidipati, S.V., Surathu, A., Chao, H. et al. Complete genomic characterization of global pathogens respiratory syntical virus and human norovirus using probe based capture enrichment. Sci Rep 15, 20526 (2025). https://doi.org/10.1038/s41598-025-03398-6

Received: 26 November 2024
Accepted: 20 May 2025
Published: 01 July 2025
Version of record: 01 July 2025
DOI: https://doi.org/10.1038/s41598-025-03398-6