Introduction

Group A rotavirus (RVA) remains an important enteric pathogen associated with diarrhoea in children under five years of age1. The World Health Organization (WHO) recommended rotavirus vaccines in 2009 for countries with a high disease burden2. In September 2015, Mozambique introduced the rotavirus vaccine (Rotarix®, GlaxoSmithKline Biologicals, Rixensart, Belgium)3, supported by data from the Global Enteric Multicenter Study (GEMS) conducted in South Asia and sub-Saharan Africa, including Mozambique4. These data showed the contribution of rotavirus to diarrhoeal disease, with attributable fractions of 35.0% for moderate-to-severe diarrhoea (MSD) and 20.0% for less severe diarrhoea (LSD) in Mozambican infants4,5. The Centro de Investigação em Saúde de Manhiça (CISM) has been monitoring the impact of the vaccine through passive surveillance3,6, which demonstrated a decrease in RVA-related cases from 22.9% before to 11.5% after vaccine introduction3. Despite extensive documentation of the beneficial impact of rotavirus vaccination in sub-Saharan Africa7, rotavirus remains among the leading pathogens associated with diarrhoea in children under five years of age8.

RVA has a genome of 11 segments of double-stranded RNA (dsRNA), enclosed in a non-enveloped, icosahedral, triple-layered particle consisting of outer, intermediate and inner protein layers9. These segments encode six structural proteins (VP1–VP4, VP6 and VP7) and five or six non-structural proteins (NSP1–NSP5/NSP6)9. The two outer capsid proteins, VP7 (encoded by segment 9; G type, for glycoprotein) and VP4 (encoded by segment 4; P type, for protease-sensitive), form the basis of the conventional binomial classification of rotaviruses into G and P genotypes, widely used in RVA surveillance studies10. Because of their segmented genome, RVAs are prone to genome reassortment10. Whole-genome characterization has been used to describe and trace the origin of strains detected in humans and animals11. The most common human genotypes are G1, G2, G3, G4, G9 and G12 (VP7) and P[4], P[6] and P[8] (VP4)12. Whole-genome classification identified three main genotype constellations10: the prototype Wa-like (genogroup 1), G1-P[8]-I1-R1-C1-M1-A1-N1-T1-E1-H1; the prototype DS-1-like (genogroup 2), G2-P[4]-I2-R2-C2-M2-A2-N2-T2-E2-H2; and the minor prototype AU-1-like (genogroup 3), G3-P[9]-I3-R3-C3-M3-A3-N3-T3-E3-H3. Some G/P combinations, such as G1P[8] and G9P[8], have been reported with unusual DS-1-like constellations12,13. As of June 30, 2024, nucleotide sequencing had identified 42 G, 58 P, 32 I, 28 R, 24 C, 23 M, 39 A, 28 N, 28 T, 32 E and 28 H genotypes14.
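To make the constellation notation concrete, the following Python sketch parses an 11-segment constellation string and compares its backbone (every segment except VP7 and VP4) with the three prototype constellations. The function names are hypothetical, and the rule that all nine backbone genotypes must match a prototype is a simplifying assumption for illustration; it is not the classification procedure used in this study, which relied on ViPR genotyping.

```python
"""Sketch: compare an RVA genotype constellation with the prototype backbones."""

# Segment order of the standard notation Gx-P[x]-Ix-Rx-Cx-Mx-Ax-Nx-Tx-Ex-Hx.
SEGMENTS = ["VP7", "VP4", "VP6", "VP1", "VP2", "VP3",
            "NSP1", "NSP2", "NSP3", "NSP4", "NSP5/6"]

PROTOTYPES = {
    "Wa-like": "G1-P[8]-I1-R1-C1-M1-A1-N1-T1-E1-H1",
    "DS-1-like": "G2-P[4]-I2-R2-C2-M2-A2-N2-T2-E2-H2",
    "AU-1-like": "G3-P[9]-I3-R3-C3-M3-A3-N3-T3-E3-H3",
}


def parse_constellation(text):
    """Split 'G2-P[4]-I2-...' into a {segment: genotype} dictionary."""
    parts = text.split("-")
    if len(parts) != len(SEGMENTS):
        raise ValueError(f"expected 11 genotypes, got {len(parts)}")
    return dict(zip(SEGMENTS, parts))


def backbone_matches(text):
    """Count, per prototype, how many of the 9 backbone segments
    (all segments except VP7 and VP4) carry the same genotype."""
    query = parse_constellation(text)
    scores = {}
    for name, prototype in PROTOTYPES.items():
        reference = parse_constellation(prototype)
        scores[name] = sum(query[s] == reference[s] for s in SEGMENTS[2:])
    return scores


def classify(text):
    """Return the prototype whose backbone matches on all 9 segments,
    or 'mixed' when no prototype matches completely (assumed rule)."""
    scores = backbone_matches(text)
    best = max(scores, key=scores.get)
    return best if scores[best] == 9 else "mixed"


print(classify("G1-P[8]-I2-R2-C2-M2-A2-N2-T2-E2-H2"))  # DS-1-like
print(backbone_matches("G10-P[11]-I2-R2-C2-M2-A3-N2-T6-E2-H3"))
```

Under this rule, a G1P[8] strain on a DS-1-like backbone is labelled DS-1-like, whereas the bovine-like constellation G10-P[11]-I2-R2-C2-M2-A3-N2-T6-E2-H3 shares only six backbone genotypes with DS-1 and is reported as mixed.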

Previous studies conducted in various regions of Mozambique, including Manhiça district, showed that G1P[8], G2P[4], G12P[6] and G12P[8] predominated before vaccine introduction, with a shift to G3P[4] and G3P[8] after vaccine introduction6,15,16,17. In addition, whole-genome sequencing (WGS) studies of human and animal strains in Maputo revealed the circulation in humans of the typical Wa-like (G1-P[8]-I1-R1-C1-M1-A1-N1-T1-E1-H1)18 and DS-1-like (G2/G8-P[4]-I2-R2-C2-M2-A2-N2-T2-E2-H2)19 constellations, whereas a typical porcine constellation (G4P[6]/G9P[13]-I1/I5-R1-C1-M1-A1/A8-N1-T1/T7-E1-H1)20 and an artiodactyl (bovine-like) constellation (G10P[11]-I2-R2-C2-M2-A3/A11/A13-N2-T6-E2-H3)21 have been found in animal strains.

In Manhiça district, we previously used WGS to characterize G3P[8] strains with a typical Wa-like constellation, but that analysis was limited to rotavirus strains recovered from children with MSD22. Although some WGS data on rotavirus are available for the country18,19,20,21,23, data characterizing different strains with either Wa-like or DS-1-like constellations remain scarce. Here, we aimed to characterize in depth, through whole-genome sequencing, rotavirus strains detected in children with diarrhoea (MSD and LSD) and without diarrhoea (healthy community controls) in Manhiça district, southern Mozambique, to obtain a more precise understanding of their genetic composition.

Results

Sequence analyses and full genome constellations

Twenty-one rotavirus strains (11 collected before and 10 after vaccine introduction) from children under five years of age with MSD, with LSD and without diarrhoea were sequenced. The Illumina MiSeq instrument registered an overall Phred quality score of Q ≥ 30 for all sequence data. Full or near-full-length sequences of all eleven gene segments, encoding VP1–VP4, VP6–VP7 and NSP1–NSP5/6, were determined for the 21 strains. All strains had a DS-1-like genotype constellation (Table 1). The open reading frames (ORFs) of the 11 gene segments of each of the 21 strains are summarized in Supplementary Table S1, and all ORFs were deposited in GenBank under accession numbers PP313617–PP313847.
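Two numbers in this paragraph can be checked with simple arithmetic. The Phred scale defines Q = -10 log10(p), so Q30 corresponds to an error probability of 1 in 1,000 base calls; and the GenBank range PP313617–PP313847 contains exactly one accession per ORF for 21 strains × 11 segments. A minimal Python sketch (helper names are illustrative, and the range parser assumes both accessions share the same letter prefix):

```python
import math


def phred_to_error(q):
    """Phred scale: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Inverse of phred_to_error."""
    return -10 * math.log10(p)


def accession_span(first, last):
    """Count accessions in an inclusive range such as PP313617-PP313847,
    assuming both share the same letter prefix (illustrative helper)."""
    prefix = first.rstrip("0123456789")
    if not last.startswith(prefix):
        raise ValueError("accessions must share the same prefix")
    return int(last[len(prefix):]) - int(first[len(prefix):]) + 1


strains, segments = 21, 11
print(f"Q30 error probability: {phred_to_error(30):.4f}")  # 0.0010
print(accession_span("PP313617", "PP313847"), strains * segments)  # 231 231
```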

Table 1 Genomic constellations of the 11 segments of DS-1-like strains from children with diarrhoea (MSD and LSD) and without diarrhoea in Manhiça District, Mozambique, 2009–2018.

Phylogenetic and sequence analyses of the VP7-encoding genome segment of G2, G3, G8 and G9 strains

Phylogenetic analyses of G2 (pre-vaccine, from children with MSD and a child without diarrhoea), G3 (post-vaccine, from children with MSD and LSD), G8 (pre- and post-vaccine, from children with MSD and LSD) and G9 (post-vaccine, from children with LSD) strains were based on previously described lineages24,25,26,27. In general, the study strains were close to strains detected in other regions (Africa, Europe, Asia and the Americas), with some particularities. G2 strains formed two clusters close to strains detected in Mozambique in 2011–2012 and were more distant from two strains from 2013 (MOZ/0440/2013/G2P[4] and MOZ/0126/2013/G2P[4]), with average identities of 96.0% (nt) and 96.1% (aa) (Fig. 1A and Supplementary Table S2). G3 strains were close to a human strain detected in Pakistan (PAK622/2016/G3P[8]) and to a G3 strain detected in a bovine in India (RVA/Bovine_Bf212/COVASU/Parbhani/2017/G3P[X]), sharing 99.0% nt and aa identity, and were more distant from G3 strains detected in the same period in MSD cases in Manhiça (MOZ/MAN-1811463.8/2021/G3P[8] and MOZ/MAN-1811450.8/2021/G3P[8]), with an average of 98.6% nt and aa identity (Fig. 1B and Supplementary Table S3).

G8 strains fell into two clusters close to strains detected in Mozambique in 2012, with identities of 99.9% (nt) and 98.8% (aa) (Fig. 2A and Supplementary Table S4). The G9 strain clustered near strains detected in South Africa: a human strain (RVA/Human/ZAF/MRC-DPRU4677/2010/G9P[8]), sharing 98.2% nt and aa identity, and two porcine G9 strains (RVA/Pig-wt/ZAF/MRC-DPRU1540/2007/G6G9P[X] and RVA/Pig-wt/ZAF/MRC-DPRU1522/2007/G5G9P[X]), with an average of 98.9% nt and aa identity. It was more distant from a strain detected in Mozambique in 2011 (RVA/Human-wt/MOZ/21162/2011/G9P[8]), with 98.8% nt and aa identity (Fig. 2B and Supplementary Table S5).

Fig. 1

Maximum likelihood phylogenetic tree based on the ORF of the VP7-encoding gene segment, using the Tamura 3-parameter model with gamma-distributed rates (T92 + G): (A) G2 strains; (B) G3 strains. Study strains are indicated by red circles (MSD), black triangles (LSD) and green squares (children without diarrhoea). Roman numerals indicate lineages. Only bootstrap values ≥ 70% are shown next to branch nodes. The Wa-like RVA strain from the USA was included as an outgroup. Scale bars indicate the number of substitutions per nucleotide position.

Fig. 2

Maximum likelihood phylogenetic tree based on the ORF of the VP7-encoding gene segment, using the Tamura 3-parameter model with gamma-distributed rates (T92 + G): (A) G8 strains; (B) G9 strain. Study strains are indicated by red circles (MSD), black triangles (LSD) and green squares (children without diarrhoea). Roman numerals indicate lineages. Only bootstrap values ≥ 70% are shown next to branch nodes. The Wa-like RVA strain from the USA was included as an outgroup. Scale bars indicate the number of substitutions per nucleotide position.

Phylogenetic and sequence analyses of the VP4-encoding genome segment of P[4] and P[6] strains

Phylogenetic trees of P[4] (from children with MSD and LSD and a child without diarrhoea) and P[6] (from children with MSD and LSD) were constructed based on previously described lineages25,28,29. Overall, the strains were close to strains detected in Africa, Europe, Asia and the Americas, with some specificities. P[4] strains combined with G2 or G3 fell into lineage III, in the same cluster as a Mozambican strain (RVA/Human/MOZ/044/2013/G2P[4]), with identities of 98.3–98.7% (nt) and 98.3–98.8% (aa). In contrast, P[4] strains combined with G8 fell into lineage II, close to Mozambican strains from 2012, with identities of 97.5–100% (nt) and 97.6–100% (aa). These strains were also closely related to a P[4] strain detected in a pig in South Africa (RVA/Pig-wt/ZAF/MRC-DPRU1533/2007/G2P[4]), with average identities of 97.8% (nt) and 97.9% (aa) (Fig. 3A and Supplementary Table S6). All P[6] strains were closely related, sharing 98.7% nt and aa identity (Fig. 3B and Supplementary Table S7).

Fig. 3

Maximum likelihood phylogenetic tree based on the ORF of the VP4-encoding gene segment: (A) P[4] strains, using the Tamura 3-parameter model with invariable sites (T92 + I); (B) P[6] strains, using the Tamura 3-parameter model with gamma-distributed rates and invariable sites (T92 + G + I). Study strains are indicated by red circles (MSD), black triangles (LSD) and green squares (children without diarrhoea). Roman numerals indicate lineages. Only bootstrap values ≥ 70% are shown next to branch nodes. The Wa-like RVA strain from the USA was included as an outgroup. Scale bars indicate the number of substitutions per nucleotide position.

Phylogenetic and sequence analyses of VP1–VP3 and VP6 encoding genome segments

Phylogenetic trees of the nucleotide sequences coding for VP1–VP3 and VP6 were constructed based on previously described lineages19,30,31. The study strains clustered in lineage V for VP1 and VP6, in lineage IV for VP2, and in two lineages (V and VII) for VP3. Strains from children with MSD, with LSD and without diarrhoea showed the same topology in these trees. Notably, all G3P[4] strains (from children with MSD and LSD) clustered together. In contrast, G2P[4], G2P[6] and G8P[4] strains showed a more dispersed pattern across the VP1–VP3 trees, although they formed distinct clusters in VP6, regardless of whether they came from children with MSD, with LSD or without diarrhoea (Supplementary Figs. S1–S4). In general, for each of these segments (VP1–VP3 and VP6), the study strains grouped into multiple clusters alongside Mozambican strains (circulating in 2012–2013) and global strains, with nt and aa identities ranging from 97.3 to 100.0% (Supplementary Figs. S1–S4 and Tables S8–S11). The VP1 cognates of G2P[6] strains from MSD cases clustered with a G8P[11] strain detected in a camel in Sudan (RVA/Camel-wt/SDN/MRC-DPRU/447/2002/G8P[11]), sharing 98.9% nt and aa identity (Supplementary Fig. S1 and Table S8).

Phylogenetic and sequence analyses of NSP1-NSP5/6 encoding genome segments

Trees for the NSP1–NSP5/6 encoding genome segments were constructed using a previously described lineage framework19,21,31,32. The NSP1 and NSP2 trees of the 21 strains had similar topologies, each forming two major clusters: one of G3P[4] strains (from children with MSD and LSD) and the other of G2P[6] strains (from a child with MSD) (Supplementary Figs. S5–S6). The NSP3 and NSP5/6 trees also had similar topologies. NSP3 formed two major, closely related clusters, one composed of G3P[4] strains (from children with MSD and LSD) and the other encompassing G2P[4] (from children with MSD and children without diarrhoea) and G2P[6] (MSD) strains; these clusters shared nt and aa identities ranging from 96.0 to 96.6% (Supplementary Figs. S7 and S9 and Tables S14 and S16). NSP5/6 featured one cluster of G3P[4] strains (from children with MSD and LSD) closely related to a human strain from Thailand (THA/DBM2018-105/2018/G2P[4]), with an average of 99.0% nt and aa identity, and to a human strain from Pakistan (PAK663/2016/G3P[4]), with an average of 99.8% nt and aa identity (Supplementary Table S16). The other NSP5/6 cluster comprised G2P[4] (from children with MSD and a child without diarrhoea), G8P[4] (from children with LSD) and G2P[6] (from children with MSD) strains and was close to a strain from Italy (RVA/Human-wt/ITA/PG01/2011/G2P[4]), with average identities of 96.5% (nt) and 98.8% (aa) (Supplementary Fig. S9 and Table S16).

The NSP4 cognates of G8P[4] strains (MSD and LSD) clustered with bovine strains from Mozambique (RVA/Cow-wt/MOZ/MPT-93/2016/G10P[11] and RVA/Cow-wt/MOZ/MPT-307/2016/G10P[11]), with average identities of 89.1% (nt) and 89.5% (aa). They were also similar to a caprine strain from Bangladesh (RVA/Caprine/BD/GO34/1999/NSP4), with average identities of 93.0% (nt) and 93.2% (aa), and to a bovine strain from India (RVA/Bovine/IND/IC15/IVRI/2012/NSP4), with average identities of 93.6% (nt) and 93.8% (aa) (Supplementary Fig. S8 and Table S15). In addition, the NSP4 cognates of G3P[4] strains (from children with MSD and LSD) clustered with the same bovine strain from India (RVA/Bovine/IND/IC15/IVRI/2012/NSP4), with an average of 97% nt and aa identity, and were close to the caprine strain from Bangladesh (RVA/Caprine/BD/GO34/1999/NSP4), with average identities of 92.3% (nt) and 92.6% (aa) (Supplementary Fig. S8 and Table S15).

Comparative amino acid analysis of P[4] strains combined with G2, G3 and G8

The deduced amino acid sequences of the P[4] strains were compared using the prototype DS-1-like strain from the USA (RVA/Human-tc/USA/DS-1/1976/G2P[4]) as the reference. Notably, they showed exclusive aa differences at 10 positions. Examination of the VP4 epitope sites also revealed marked differences among all P[4] strains. The largest number of aa differences was observed in P[4] combined with G8, with 21 residues differing from the other P[4] strains; examples include D150E, R172K, S187R, N189D, S279N, S305L, S603L, V311I, G651R, N684S and I630M. Three aa residues were exclusively different in P[4] strains combined with G2 (V35, V130 and G162), whereas only two were different in P[4] strains combined with G3 (K136T and V607I) (Supplementary Table S17). All P[4] study strains preserved the arginine (R) cleavage sites at residues 230, 240 and 246, as well as the proline (P) residues at positions 68, 71, 224 and 225 in the conserved region (Supplementary Table S17).
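The substitution notation used above (reference residue, position, query residue, e.g. D150E) and the checks on conserved residues can be sketched as follows. This is an illustrative Python sketch, not the authors' pipeline: it assumes the query and the DS-1 reference protein are already aligned to equal length with no gap columns in the reference, so that alignment positions coincide with reference numbering. The toy sequences are synthetic, not real VP4 data.

```python
def substitutions(reference, query):
    """List differences between two aligned, equal-length protein sequences
    as <reference residue><1-based position><query residue>, e.g. 'D150E'.
    Columns with a gap ('-') in either sequence are skipped."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for position, (ref_aa, qry_aa) in enumerate(zip(reference, query), start=1):
        if ref_aa != qry_aa and "-" not in (ref_aa, qry_aa):
            calls.append(f"{ref_aa}{position}{qry_aa}")
    return calls


def residue_conserved(sequence, residue, positions):
    """Report, for each 1-based position, whether `residue` is present there."""
    return {p: sequence[p - 1] == residue for p in positions}


# Synthetic toy sequences, not real VP4 data.
print(substitutions("MDSVRA", "MESIRA"))  # ['D2E', 'V4I']

# Toy 250-residue sequence with arginines placed at the VP4 cleavage positions.
toy = ["A"] * 250
for site in (230, 240, 246):
    toy[site - 1] = "R"
toy = "".join(toy)
print(residue_conserved(toy, "R", [230, 240, 246]))
```

Applied to a real aligned P[4] sequence, the same check would be run for arginine at 230, 240 and 246 and for proline at 68, 71, 224 and 225.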

Comparative amino acid analysis of NSP4

The NSP4 amino acid sequences of the study strains were compared with those of the animal strains with which they clustered in the phylogenetic trees, using the prototype DS-1-like strain from the USA (RVA/Human-tc/USA/DS-1/1976/G2P[4]) as the reference. Several noteworthy aa differences were found across distinct antigenic regions in some strains: in antigenic region IV, L16S, M17R and S19R; in antigenic region III, K115N, Y131H and K133R; in antigenic region II, changes spanning V136I to I142V; and in antigenic region I, R154K, E157K, E160K, N161S and K163R (Supplementary Table S18). Within the enterotoxin domain (aa 114–135), a few aa changes were detected: G3P[4] strains (MSD and LSD) displayed K115N, and G8P[4] and G8P[6] strains, both from LSD, carried Y131H. In addition, all pre-vaccine G2P[4], G2P[6] and G8P[4] strains, from children with MSD, with LSD and without diarrhoea, shared the substitution M135V, which was also observed in post-vaccine G8P[6] and G9P[6] strains from children with LSD (Supplementary Table S18).
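A small helper can place substitution calls such as M135V or V136I into the NSP4 regions named in this study. In this Python sketch, only the two ranges stated in the text are filled in (enterotoxin domain, aa 114–135; antigenic site II, aa 136–150); the boundaries of antigenic regions I, III and IV are not given here and would have to be taken from the cited literature, so they are deliberately omitted. Function and variable names are hypothetical.

```python
import re

# Ranges stated in the text (1-based, inclusive). Antigenic regions I, III and
# IV are omitted because their boundaries are not given here.
NSP4_REGIONS = {
    "enterotoxin domain": (114, 135),
    "antigenic site II": (136, 150),
}

# A substitution call: reference residue, position, query residue.
CALL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def regions_for(call):
    """Return the named NSP4 regions containing a substitution such as 'M135V'."""
    match = CALL.match(call)
    if match is None:
        raise ValueError(f"not a substitution call: {call!r}")
    position = int(match.group(2))
    return [name for name, (start, end) in NSP4_REGIONS.items()
            if start <= position <= end]


for call in ["K115N", "Y131H", "M135V", "V136I", "E157K"]:
    print(call, regions_for(call))
```

With these two ranges, K115N, Y131H and M135V fall in the enterotoxin domain, V136I falls in antigenic site II, and E157K is not assigned to either range.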

Discussion

This is one of the few studies characterizing rotavirus strains by WGS in Mozambique. All strains analysed exhibited a DS-1-like constellation. Whole-genome data on Mozambican DS-1-like rotavirus strains are limited, especially for children with LSD and children without diarrhoea. Many surveillance studies identify rotavirus genotypes by RT-PCR33, and available whole-genome data on DS-1-like strains in Mozambique are limited to G2P[4] and G8P[4] detected in children with MSD19. Phylogenetic analysis of each of the 11 genome segments showed that the study strains (from children with MSD, with LSD and without diarrhoea) clustered together, suggesting that they are genetically similar. This similarity may indicate that host factors are the main determinant of whether infection causes more or less severe symptoms. Additional studies are therefore needed to evaluate host factors such as malnutrition, which has been described to predispose infants to enteric infections34.

In all 11 genome segments, the study strains were also similar to strains detected elsewhere in Mozambique and in other African (Zimbabwe, Zambia, Kenya, South Africa, Ghana, Malawi, Senegal, Gabon, Cameroon and Congo), European (Italy), Asian (Japan, India, Thailand, Pakistan and the Philippines), Oceanian (Australia) and North American (USA) countries, suggesting transmission across regions. Notably, in the NSP4 tree, some study strains clustered with animal strains detected in Mozambique and other countries (Bangladesh and India). NSP4 is known to have a higher evolutionary rate and to reassort more frequently in nature than other genome segments35,36, as demonstrated by analyses of G2P[4] rotaviruses detected in Belgium after vaccine introduction36 and of a DS-1-like G8P[8] strain detected in Thailand37.

It is also worth noting that genome segments of some strains from children with and without diarrhoea were close to strains of genotypes known to circulate in humans but detected in animals, such as P[4] in a pig, G3 in a bovine and G8P[11] in a camel. Previous work in Manhiça revealed possible reassortment events involving several segments (VP7, VP6, VP1, NSP3 and NSP4) of Wa-like G3P[8] strains from MSD cases, which were closely related to porcine, bovine and equine strains22. Overall, the pattern observed in our strains may partly support the hypothesis that, rather than whole genomes being transferred from other species to humans, individual genome segments are frequently exchanged38, a phenomenon mainly observed in rural areas39 such as our study setting3.

Because the P[4] strains clustered in two separate lineages (G2P[4] and G3P[4] in lineage III, and G8P[4] in lineage II), their amino acid sequences were examined to determine how they differed. The analysis revealed several amino acid differences, including in the antigenic regions of the VP4 outer capsid protein, which are critical for inducing neutralizing antibodies40. Many of these substitutions are not expected to alter protein function, for example valine to isoleucine or isoleucine to methionine, which involve non-reactive residues rarely directly involved in protein function41, and asparagine to aspartic acid, two residues that are both frequently involved in protein active or binding sites41. Other substitutions, however, may alter function: for example, a change from arginine, a polar residue often located in active or binding sites, to glycine may reduce cholangiocyte binding and infectivity41. This was demonstrated for a rotavirus VP4 mutant carrying an R446G change generated by reverse genetics42. These findings allow us to speculate that the change observed at residue G162 of P[4] combined with G2, in a strain from a child without diarrhoea, may have reduced virus infectivity, as described above42. All P[4] study strains conserved the potential VP4 arginine cleavage sites (positions 230, 240 and 246), as do most rotavirus strains; these sites are important for virus infectivity43. The proline residues, which are known to cause three-dimensional structural distortion and may influence the conformation of the VP4 protein43, were also conserved in the study strains (positions 68, 71, 224 and 225).

Analysis of the consensus NSP4 amino acid sequences (residues 1–175) of the study strains showed several substitutions at the previously described antigenic site II (ASII: 136–150)44, including V136I in two strains from children without diarrhoea and in one MSD strain, all collected before vaccine introduction. This position lies within the region described to alter NSP4-mediated pathogenesis45. Substitutions within antigenic sites may have an evolutionary impact46. Changes in NSP4, together with changes in the structural proteins VP4 and VP7, have been reported to be associated with asymptomatic infections in humans47, although amino acid variation in NSP4 has not always been associated with asymptomatic infection48. Some amino acid substitutions were also observed in the enterotoxin domain (positions 114–135) of the study strains. NSP4 is associated with rotavirus pathogenesis and has enterotoxin-like activity that induces an intracellular calcium imbalance, resulting in membrane instability and diarrhoea, as shown in experiments in mice49. The substitutions observed within this region in some study strains may therefore affect the toxigenic activity or conformation of NSP4, or even be associated with changes in rotavirus virulence45.

One limitation of this study is the small number of strains from children without diarrhoea that were analysed (three), all of which were obtained before vaccine introduction. In addition, few similar strains were observed in both periods, limiting the comparison of strains before and after vaccine introduction. Although there is evidence of possible reassortment events in Manhiça district, it was not possible to perform a reassortment analysis, because the strains whose NSP4 segment was close to animal strains did not have similar VP7 and VP4 segments. Furthermore, a comprehensive protein–protein interaction analysis or modelling is required to evaluate any potential impact on protein function. For example, NSP4-mediated pathogenesis has previously been described as more likely to be associated with the conformation of the carboxyl-terminal part of the toxigenic region than with the presence of a specific amino acid45.

In conclusion, rotavirus strains from children under five years of age with and without diarrhoea were genetically similar within their respective genotypes, and some strains carried a segment with similarities to animal strains, suggesting possible reassortment events in our setting. Moreover, the aa substitutions observed in non-antigenic regions of P[4] strains may have led them to cluster in different lineages according to their combination with a specific VP7 genotype. The aa change from arginine to glycine within the enterotoxin domain of NSP4 in strains from children without diarrhoea, and from arginine to glycine in VP4 of one strain from MSD, suggests possible changes in the infectivity and pathogenesis of these strains, although further analyses are required. These results highlight the importance of continuous whole-genome surveillance to evaluate the role of non-antigenic regions in RVA infectivity and evolution. They also reinforce the need for studies assessing host factors that may predispose to symptomatic or asymptomatic infection, and for surveillance using a ‘One Health’ approach to assess the interaction between human and animal rotavirus strains and the impact of the vaccine on circulating and emerging strains.

Methodology

Sample source and site description

This study analysed 21 stool samples from children under five years of age with diarrhoea (MSD and LSD) and without diarrhoea (healthy community controls). The samples were collected as part of GEMS, conducted before rotavirus vaccine introduction (2007–2012), and of the rotavirus surveillance conducted after vaccine introduction (2015–2019), in accordance with the relevant guidelines and regulations4. MSD cases met the criteria for hospitalization, whereas LSD cases were seen during outpatient visits, as previously described5,6,50. Samples from eleven children (6 girls and 5 boys) were collected before vaccine introduction and from ten children (7 girls and 3 boys) after vaccine introduction. Of the ten children enrolled after vaccine introduction, five were fully vaccinated, two had received only one dose and three were not vaccinated.

This study included samples that had been genotyped in our previous study6 using conventional reverse transcription polymerase chain reaction (RT-PCR)51. In that study, the genotypes were detected at the following frequencies: before vaccine introduction, G2P[4] in 6% of children with MSD and 3% of children without diarrhoea, and G2P[6] in 8% of children with MSD; after vaccine introduction, G3P[4] in 10% of children with MSD and 20% of children with LSD6. Samples genotyped later, such as G8P[4], G8P[6] and G9P[6], were also included. Samples were additionally selected because they had sufficient RNA for WGS. The number of strains for each case category and period (pre- and post-vaccine introduction) is presented in Supplementary Table S19. GEMS and the rotavirus surveillance were carried out by CISM, which runs morbidity surveillance (in the Manhiça District Hospital and other peripheral health facilities) and demographic surveillance covering the entire district. The characteristics of Manhiça district and of the surveillance platforms have been described previously52.

Laboratory procedures

Double-stranded RNA (dsRNA) extraction and purification

dsRNA was extracted from stool samples using TRIzol™ (Invitrogen, Carlsbad, CA) following a previously described protocol53, with some modifications and purification steps as previously detailed22. Briefly, a suspension of 500 mg of stool in 1 mL of TRIzol was incubated at room temperature for 5 min and centrifuged at 16,000 × g for 15 min at 4 °C (the same conditions were used for the subsequent centrifugation steps). Then, 300 µL of chloroform (Sigma-Aldrich®, St. Louis, MO, United States) was added and the mixture was centrifuged. The supernatant was transferred to a new tube, 650 µL of isopropanol (Sigma-Aldrich®, St. Louis, MO, United States) was added and the tube was centrifuged; the supernatant was then removed and the pellet was dried in the tube. Approximately 95 µL of elution buffer (EB; MinElute Gel Extraction Kit, Qiagen, Hilden, Germany) was added to resuspend the pellet. Subsequently, 30 µL of 8 M lithium chloride (Sigma, St. Louis, MO, USA) was added to the RNA and the mixture was incubated overnight at 4 °C to allow complete removal of single-stranded RNA (ssRNA) and inhibitors of cDNA synthesis by precipitation54. After incubation, the solution was centrifuged at 16,000 × g for 30 min at 4 °C and the supernatant was purified using the MinElute Gel Extraction Kit (Qiagen, Hilden, Germany) according to the manufacturer's protocol. To evaluate RNA integrity, approximately 5 µL of each extract was loaded onto a 1.0% agarose gel alongside a 100 bp molecular size ladder, stained with 0.5 µg/mL ethidium bromide and visualized under ultraviolet light in a gel documentation system (Bio-Rad Laboratories, Hercules, CA, United States).
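The lithium chloride step depends on the final LiCl concentration after the stock is added. Assuming the ~95 µL of resuspended RNA and the 30 µL of 8 M stock are the only volumes involved (pellet volume neglected), the dilution relation C1V1 = C2V2 gives about 1.92 M, close to the roughly 2 M commonly reported for selective precipitation of ssRNA. The sketch below is a worked check under those assumptions, not part of the published protocol.

```python
def final_molarity(stock_molar, stock_ul, sample_ul):
    """C1*V1 = C2*V2: molarity after adding stock_ul of a stock solution
    to sample_ul of sample (volumes assumed to be additive)."""
    return stock_molar * stock_ul / (stock_ul + sample_ul)


# Volumes from the protocol above; the pellet volume is neglected (assumption).
licl_molar = final_molarity(stock_molar=8.0, stock_ul=30.0, sample_ul=95.0)
print(f"final LiCl concentration: {licl_molar:.2f} M")  # 1.92 M
```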

cDNA synthesis, DNA library preparation and whole genome sequencing

The dsRNA was sequenced at the University of the Free State Next Generation Sequencing (UFS-NGS) Unit, South Africa. First, complementary DNA (cDNA) was synthesized from all extracted dsRNA using the Maxima H Minus Double-Stranded cDNA Synthesis Kit (Thermo Fisher Scientific, Waltham, MA), as described by the manufacturer. Briefly, 13 µL of dsRNA was denatured at 95 °C for 5 min, 1 µL of 100 µM random hexamer primer was added and the tube was incubated in a thermocycler at 65 °C. Then, 5 µL of First Strand Reaction Mix and 1 µL of First Strand Enzyme Mix were added and the tubes were incubated at 25 °C for 10 min and then at 50 °C for 2 h, and the reaction was terminated by heating at 85 °C for 5 min. Fifty-five microliters of nuclease-free water, 20 µL of 5X Second Strand Reaction Mix and 5 µL of Second Strand Reaction Enzyme Mix were then added, followed by incubation at 16 °C for 60 min. Six microliters of 0.5 M EDTA and 10 µL of RNase were added to stop the reaction, and the cDNA was incubated at room temperature for 5 min before purification using the MSB® Spin PCRapace Purification Kit (Stratec Molecular, Berlin, Germany). DNA libraries were then prepared using the Nextera® XT DNA Library Preparation Kit (Illumina, San Diego, California, US) and sequenced on an Illumina MiSeq® sequencer (Illumina, Inc., San Diego, CA, USA), as previously described12,22.
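The reaction volumes above can be tallied to check that the first-strand reaction totals 20 µL and the second-strand reaction 100 µL, and to derive the working concentrations of the random hexamer and the 5X second-strand buffer. This Python sketch assumes that no components other than those listed are added; the kit instructions remain the authority.

```python
# Volumes (µL) as listed in the protocol above; assumes nothing else is added.
first_strand = {
    "dsRNA template": 13,
    "random hexamer primer (100 µM)": 1,
    "First Strand Reaction Mix": 5,
    "First Strand Enzyme Mix": 1,
}
second_strand_additions = {
    "nuclease-free water": 55,
    "5X Second Strand Reaction Mix": 20,
    "Second Strand Reaction Enzyme Mix": 5,
}

first_total = sum(first_strand.values())
second_total = first_total + sum(second_strand_additions.values())

# Working concentrations from C1*V1 = C2*V2.
hexamer_um = 100 * first_strand["random hexamer primer (100 µM)"] / first_total
buffer_x = 5 * second_strand_additions["5X Second Strand Reaction Mix"] / second_total

print(first_total, second_total)  # 20 100
print(hexamer_um, buffer_x)       # 5.0 1.0
```

The tally shows the hexamer at 5 µM in the first-strand reaction and the second-strand buffer at 1X in the final 100 µL.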

Data analysis

Genome assembly and generation of whole genome constellations

All sequence reads generated on the Illumina® MiSeq platform (FASTQ format) were trimmed to remove low-quality and short reads using Geneious Prime® (v.2022.0.1). The per-base sequence quality plot was inspected, and only data files whose quality scores were distributed at Q30 were retained for subsequent analysis. De novo assembly (to obtain complete genomes) and reference mapping (to assign reads to specific locations in the genome)55 were then performed using CLC Bio Genomics Workbench (v.22.0, Qiagen)56 and Geneious Prime® (v.2022.0.1)57, respectively, with the prototype DS-1-like G2P[4] strain RVA/Human-tc/USA/DS-1/1976/G2P[4] (accession numbers HQ650116–HQ650126) as the reference. The curated assembled and mapped sequences were used to derive consensus sequences for each genome segment of each strain in Geneious Prime® (v.2022.0.1)57. The genotype of each genome segment was determined using the Virus Pathogen Database and Analysis Resource (ViPR) tool (accessed on 15 December 2022)58: FASTA files containing the nucleotide sequences of a given rotavirus genome segment were submitted and compared with a reference database of known rotavirus sequences, and a combination of nucleotide alignments and genotype-specific cut-off values was used to assign the genotype of each input sequence.
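Genotype assignment by nucleotide-identity cut-offs can be sketched as follows. The percentages below are the per-segment cut-offs commonly cited from the Rotavirus Classification Working Group (RCWG) full-genome classification scheme; treat them as an assumption to verify against current RCWG guidance. The function is a hypothetical simplification that takes precomputed identities to one reference per genotype; it does not reproduce ViPR's implementation.

```python
from typing import Dict, Optional

# Per-segment nucleotide identity cut-offs (%) as commonly cited for the RCWG
# full-genome classification; an assumption to verify before reuse.
CUTOFFS = {"VP7": 80, "VP4": 80, "VP6": 85, "VP1": 83, "VP2": 84, "VP3": 81,
           "NSP1": 79, "NSP2": 85, "NSP3": 85, "NSP4": 85, "NSP5": 91}


def assign_genotype(segment: str, identities: Dict[str, float]) -> Optional[str]:
    """Given % nt identity of a query to one reference per genotype, return the
    best-matching genotype if it reaches the segment cut-off, else None
    (a candidate new genotype). Hypothetical helper, not ViPR's code."""
    genotype, identity = max(identities.items(), key=lambda item: item[1])
    return genotype if identity >= CUTOFFS[segment] else None


# Hypothetical identities, for illustration only.
print(assign_genotype("NSP4", {"E1": 78.4, "E2": 93.1, "E3": 80.2}))  # E2
print(assign_genotype("NSP5", {"H1": 85.0, "H2": 89.5}))  # None
```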

Sequence and phylogenetic analysis

The study sequences, together with representative sequences from different RVA lineages downloaded from GenBank, were included in the datasets for comparison. Each segment-specific dataset was aligned using the Multiple Sequence Comparison by Log-Expectation (MUSCLE) algorithm. For each genome segment, the best-fitting evolutionary model was selected using the corrected Akaike Information Criterion (AICc), and maximum-likelihood trees were inferred with 1,000 bootstrap replicates to assess tree robustness59. Bootstrap values ≥ 70% were considered to indicate robust support. Nucleotide and amino acid sequence identities were estimated from pairwise distance matrices using the p-distance method. All phylogenetic analyses were performed using Molecular Evolutionary Genetics Analysis software (MEGA version 11)59.
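The distance and model-selection quantities named above can be illustrated with a short Python sketch: p-distance (from which percent identity is 100 × (1 - p)), the Tamura 3-parameter (T92) distance underlying the substitution model used for several trees, and AICc for comparing candidate models. Assumptions: gap and ambiguous positions are dropped pairwise, the T92 GC-content term uses the two sequences' own GC contents, and in AICc the sample size n is taken as the number of alignment sites. MEGA's options (e.g. complete versus pairwise deletion) may differ, so this is not a reimplementation of MEGA.

```python
import math

NUCLEOTIDES = set("ACGT")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def compared_sites(a, b, alphabet):
    """Aligned residue pairs, dropping columns where either sequence has a gap
    or a symbol outside `alphabet` (pairwise deletion, an assumption)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(x, y) for x, y in zip(a.upper(), b.upper())
            if x in alphabet and y in alphabet]


def p_distance(a, b, alphabet=NUCLEOTIDES):
    """Proportion of differing sites; percent identity = 100 * (1 - p)."""
    pairs = compared_sites(a, b, alphabet)
    return sum(x != y for x, y in pairs) / len(pairs)


def t92_distance(a, b):
    """Tamura (1992) distance:
    d = -h*ln(1 - P/h - Q) - 0.5*(1 - h)*ln(1 - 2Q), h = t1 + t2 - 2*t1*t2,
    where P and Q are transition and transversion proportions and t1, t2
    are the GC contents of the two sequences (h must be > 0)."""
    pairs = compared_sites(a, b, NUCLEOTIDES)
    n = len(pairs)
    diffs = [frozenset(pair) for pair in pairs if pair[0] != pair[1]]
    p = sum(d in TRANSITIONS for d in diffs) / n
    q = sum(d not in TRANSITIONS for d in diffs) / n
    t1 = sum(x in "GC" for x, _ in pairs) / n
    t2 = sum(y in "GC" for _, y in pairs) / n
    h = t1 + t2 - 2 * t1 * t2
    return -h * math.log(1 - p / h - q) - 0.5 * (1 - h) * math.log(1 - 2 * q)


def aicc(log_likelihood, k, n):
    """Corrected AIC: AIC + 2k(k + 1)/(n - k - 1), with AIC = 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood + 2 * k * (k + 1) / (n - k - 1)


def best_model(fits, n):
    """Pick the model with the lowest AICc from {name: (lnL, k)}."""
    return min(fits, key=lambda name: aicc(fits[name][0], fits[name][1], n))


a = "AAAAACCCCCGGGGGTTTTT"  # synthetic toy alignment
b = "GTAAACCCCCAGGGGTTTTT"  # two transitions, one transversion
print(f"nt identity: {100 * (1 - p_distance(a, b)):.1f}%")  # 85.0%
print(f"T92 distance: {t92_distance(a, b):.4f}")
```

When both sequences have a GC content of 0.5, h = 0.5 and the T92 formula reduces to the Kimura 2-parameter distance, which is a convenient sanity check.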