Introduction

Parasites of the genus Theileria are tick-transmitted, blood-borne protozoa that primarily reside in the red blood cells, lymphocytes, and macrophages of livestock such as bovines, sheep, and equines, causing theileriosis1,2,3,4. Clinical manifestations include loss of appetite, fever, jaundice, anemia, hemoglobinuria, and swollen lymph nodes3,5. Severe infections can be fatal when intervention is delayed or treatment is inadequate. Theileria species fall into two types according to their ability to transform host cells: malignant Theileria, which cause infected host cells to multiply indefinitely as the number of parasites increases, and benign Theileria, which do not6,7,8. Theileriosis caused by either type has resulted in significant economic losses in the livestock industry because of high morbidity and mortality rates. In sub-Saharan African countries such as Burundi, the Democratic Republic of the Congo, Rwanda, southern Sudan, Tanzania, Uganda, Zambia, and Zimbabwe, the disease kills at least 100,000 cattle each year, and it costs hundreds of millions of dollars in annual economic losses across much of southern Europe, Africa, and South Asia9,10. Furthermore, because theileriosis is transmitted primarily by ticks, the expansion of suitable habitats for vector ticks, driven by global environmental and climatic changes, keeps animals continually exposed to infected ticks and contributes to the persistent worsening of theileriosis11. In addition, although drug treatment can alleviate the clinical signs in sick animals, it does not completely eliminate the parasites. Treated animals remain carriers throughout their lives, creating conditions for the spread and persistence of the disease12. Consequently, theileriosis significantly impairs both the quality and yield of livestock products, thereby hindering the sustainable development of the livestock industry.

Accurate identification of Theileria species is a prerequisite for diagnosing theileriosis. Studying the species composition and distribution of Theileria provides insight into the epidemic characteristics, transmission routes, and patterns of theileriosis. Moreover, identifying the particular Theileria species that pose significant threats to animal health enables targeted pharmacological interventions and control strategies. Precise species identification therefore plays a crucial role in the effective prevention and control of theileriosis. However, identification of Theileria parasites has relied primarily on traditional methods that use morphological characteristics, vectors, pathogenicity, host specificity, and epidemiological information as the foundational criteria13. These traditional taxonomic methods are often influenced by subjective factors, leading to biased identification results, and they have limited power to distinguish species with very similar morphologies or different developmental stages of the same species. A rapid and accurate identification method is therefore crucial for the effective control of theileriosis.

Mitochondria are found in most eukaryotic organisms and are crucial for energy production and the maintenance of cellular function14. These organelles possess their own genome, known as the mitochondrial genome. Because of its greater reliability and faster evolutionary rate compared with nuclear DNA, mitochondrial DNA has become a popular molecular marker in phylogenetic studies15. It allows a more in-depth examination of the taxonomic relationships among organisms and serves as a foundation for population genetics research16. Data on Theileria mitochondrial genomes are currently limited, which hampers classification, identification, phylogenetic research, and the prevention and control of theileriosis. Sequencing and analyzing additional Theileria mitochondrial genomes is therefore essential to obtain more genetic data and to establish a basis for accurate identification and molecular biology studies of Theileria.

Theileria velifera belongs to the phylum Apicomplexa, class Piroplasmea, order Piroplasmida, family Theileriidae, and genus Theileria. It is transmitted primarily by ticks of the genus Amblyomma and mainly parasitizes bovines17. It is commonly found in African countries such as Ethiopia, Mozambique, South Africa, Guinea, and Kenya18. In China, T. velifera is mainly prevalent in the southwest, particularly in Yunnan Province. Because T. velifera can infect wild and domesticated bovines worldwide, it can cause significant economic losses19,20. Despite its significance, biological information on this parasite is scarce, particularly at the molecular level, including the sequence and structure of its mitochondrial genome. This study therefore presents the first sequencing and analysis of the complete mitochondrial genome of T. velifera. This work fills a gap in the mitochondrial gene database of the genus Theileria and provides valuable information for the identification and molecular epidemiology of Theileria. The structure and organization of mitochondrial genomes in Apicomplexan parasites were also analyzed, and phylogenetic trees were constructed using two protein-coding genes (PCGs), cox1 and cob, to explore evolutionary relationships. The findings contribute to a better understanding of the mitochondrial genome of T. velifera and lay the groundwork for future investigations into phylogenetic relationships among Apicomplexan species at the mitochondrial genome level.

Materials and methods

Sample collection, DNA extraction and mitochondrial genome sequencing

With the consent of the animal owners, a total of 331 venous blood samples (0.5 ml each) were collected from cattle raised by farmers in Tengchong City, Yunnan Province (25°02′N, 98°50′E), using disposable venous blood collection needles and sterile EDTA anticoagulant tubes. Two of these samples were confirmed to be T. velifera-positive by amplification and sequencing of the 18S rRNA gene at the Yunnan Provincial Institute for Endemic Disease Prevention and Control (Yunnan, China) and were used in this study. Total genomic DNA was extracted from 200 µl of EDTA-anticoagulated blood from a positive sample using the TIANamp Genomic DNA Kit (TIANGEN, Beijing, China) according to the manufacturer’s instructions. The isolated DNA was stored at -20 ℃, and the remaining anticoagulated blood was transferred to a clean, sterile EP tube as a reference specimen and stored at -20 ℃ in the Parasitology Museum of Dali University, Yunnan (voucher number DLUP2022101804). This study was approved by the Animal Ethics Committee of Dali University (Application number: 2022-SL-280, Approval number: 2022-P2-280). The extracted DNA was used to construct an Illumina paired-end library, and the mitochondrial genome was sequenced on the Illumina NovaSeq 6000 platform at Harbin Botai Biotechnology Co., Ltd. (Heilongjiang, China).

Genome assembly, annotation and data analysis

The raw sequencing reads contained a proportion of low-quality reads. To ensure the accuracy and reliability of the downstream analyses, the raw reads were first filtered to obtain clean reads, which were then assembled using IDBA software21. The mitochondrial genome was annotated using the MITOS web server22; the mitochondrial gene sequences of related parasites were then queried by BLAST in the GenBank database, and the homologous proteins were compared to confirm the annotation. The rRNA genes were further annotated by searching with previously reported rRNA gene fragments of T. parva, T. orientalis, and T. uilenbergi, and ClustalW was used to align the identified rRNA sequences with those previously reported. tRNAscan-SE v.2.0 was used to search for tRNA genes in default search mode, as well as with other mitochondrial sequence sources23. Base composition bias was calculated using the formulas AT-skew = (A − T) / (A + T) and GC-skew = (G − C) / (G + C), where A, T, G, and C denote the counts of the respective bases. Nucleotide composition analysis was conducted using DNAStar V7.1, and relative synonymous codon usage (RSCU) was calculated using CodonW 1.4.2. The mitochondrial genome was visualized with the Gene Graphics online tool (https://www.genegraphics.net/).
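The skew formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the DNAStar workflow used in the study; the function name `base_skews` and the toy sequence are hypothetical.

```python
from collections import Counter


def base_skews(sequence: str) -> dict[str, float]:
    """Return AT content, AT-skew and GC-skew for a nucleotide sequence."""
    counts = Counter(sequence.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c  # ambiguous symbols such as N are ignored
    if a + t == 0 or g + c == 0:
        raise ValueError("sequence must contain both A/T and G/C bases")
    return {
        "at_content": (a + t) / total,
        "at_skew": (a - t) / (a + t),  # AT-skew = (A - T) / (A + T)
        "gc_skew": (g - c) / (g + c),  # GC-skew = (G - C) / (G + C)
    }


# Toy sequence for illustration only (not the T. velifera genome):
# A = 2, T = 3, G = 3, C = 1
example = base_skews("AATTTGGGC")
```

A negative AT-skew means T outnumbers A on the analyzed strand, and a positive GC-skew means G outnumbers C.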

Phylogenetic analysis

The nucleotide sequences of cox1 and cob from Apicomplexan parasites were retrieved from GenBank (Table 1). The cox3 gene was omitted because it varies considerably among Theileria and Babesia spp.24. The cox1 and cob nucleotide sequences of these parasites were aligned with ClustalW in MEGA 11.0, which was also used for the genetic distance analysis. For the phylogenetic tree, the concatenated cox1 + cob amino acid sequences of these parasites were subjected to ClustalW multiple sequence alignment in MEGA 11.0. The tree was then constructed by maximum likelihood (ML) analysis with 1,000 bootstrap replicates under the JTT with Freqs (+F) model. Finally, the tree was visualized using iTOL25 (Interactive Tree Of Life, embl.de).
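To make the concatenation and distance steps concrete, the sketch below joins two per-gene alignments and computes a pairwise distance. It rests on stated assumptions: the source does not say which distance model was selected in MEGA, so the uncorrected p-distance with pairwise deletion of gaps is used here purely for illustration, and joining already-aligned genes is one possible ordering of the steps. The function names and the toy alignments are hypothetical.

```python
def concatenate(alignments: list[dict[str, str]]) -> dict[str, str]:
    """Join per-gene alignments (taxon -> aligned sequence) end to end."""
    taxa = set(alignments[0])
    if any(set(alignment) != taxa for alignment in alignments):
        raise ValueError("every gene alignment must contain the same taxa")
    return {
        taxon: "".join(alignment[taxon] for alignment in alignments)
        for taxon in sorted(taxa)
    }


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":  # pairwise deletion of gapped sites
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy alignments for illustration only (hypothetical sequences)
cox1 = {"sp1": "ATGAC-", "sp2": "ATGTCA"}
cob = {"sp1": "TTGA", "sp2": "TTGG"}
supermatrix = concatenate([cox1, cob])
distance = p_distance(supermatrix["sp1"], supermatrix["sp2"])
```

In the toy data, nine ungapped sites are compared and two differ, so the distance is 2/9. A model-corrected distance would generally be larger for divergent sequences.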

Table 1 Comparative analysis of the mitochondrial (Mt) genomes of Apicomplexan parasites.

Results and discussion

General characterization of Theileria velifera mitochondrial genome

The complete mitochondrial genome of T. velifera has been deposited in GenBank under accession number ON684327. Sequence analysis revealed a linear genome with a total length of 6,125 bp. It includes three PCGs (cox1, cox3, and cob), five large subunit (LSU) rRNA gene fragments (LSU1, LSU3, LSU4, LSU5, and LSU6), and a terminal inverted repeat (TIR) at each end. No tRNA genes were found, in line with previous findings that Apicomplexan mitochondrial genomes typically contain three PCGs and fragmented rRNA genes but no tRNA genes24,26,27. The three PCGs are 1,428 bp (cox1), 636 bp (cox3), and 1,173 bp (cob) in length, and the five LSU rRNA fragments are 299 bp (LSU1), 111 bp (LSU3), 82 bp (LSU4), 69 bp (LSU5), and 38 bp (LSU6) (Table 2). The TIRs at both ends are each 64 bp long. This contrasts with previous studies on the mitochondrial genomes of Apicomplexan parasites, in which the TIR is typically around 440–450 bp24. The complete mitochondrial genome of T. velifera has a base composition of A = 33.49%, T = 35.46%, C = 15.36%, and G = 15.69%, with AT and GC contents of 68.95% and 31.05%, respectively. The AT-skew and GC-skew values are -0.029 and 0.011, respectively (Table 2). Among the mitochondrial genes, cox1, LSU1, and LSU4 are encoded on one strand, while cox3, LSU3, LSU6, cob, and LSU5 are encoded on the other (Fig. 2a).
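As a consistency check, the reported skew values follow directly from the base composition percentages given above, using the skew definitions from the Materials and methods:

```latex
\begin{align*}
\text{AT-skew} &= \frac{A - T}{A + T}
  = \frac{33.49 - 35.46}{33.49 + 35.46}
  = \frac{-1.97}{68.95} \approx -0.029,\\[4pt]
\text{GC-skew} &= \frac{G - C}{G + C}
  = \frac{15.69 - 15.36}{15.69 + 15.36}
  = \frac{0.33}{31.05} \approx 0.011.
\end{align*}
```

The denominators are the AT and GC contents (68.95% and 31.05%), which sum to 100%. Percentages can stand in for raw base counts here because the common factor cancels in each ratio.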

Table 2 Composition and skewness of Theileria velifera mitochondrial genome.

Protein-coding genes and rRNA of Theileria velifera

The three PCGs of T. velifera total 3,237 bp, accounting for 52.85% of the complete genome. We calculated the amino acid usage of the three PCGs (Table 3) and their relative synonymous codon usage (RSCU) (Fig. 1). Together, the three PCGs comprise 1,076 codons (excluding stop codons). The most frequently used codon is UUU (75), while CCC, UGA, CGC, CGA, and CGG are not used. Leucine (Leu) is the most frequently used amino acid at 14.13%, followed by serine (Ser) at 9.94% and valine (Val) at 9.01%, whereas arginine (Arg) (0.46%) and cysteine (Cys) (1.30%) are the least used. For RSCU, values above 1 indicate that a codon is used more often than expected under equal use of synonymous codons, values equal to 1 indicate no preference, and values below 1 indicate that the codon is used less often than expected28. The top five codons by RSCU are AGA (4.55), CCA (2.75), UUA (2.17), GCU (2.10), and UCU (2.07). All five end in A or U, reflecting a bias towards A and T. More generally, most codons with RSCU values greater than 1 end with A or U, while those ending with G or C tend to have RSCU values below 1, a pattern also reported for metazoan mitochondrial genomes29,30.
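For readers unfamiliar with RSCU, the sketch below shows how the statistic is defined: a codon's observed count divided by the count expected if all synonymous codons for that amino acid were used equally. It is an illustration only and not a reimplementation of CodonW. As an assumption for this sketch, the standard genetic code (NCBI translation table 1) is used to group synonymous codons; a different table (for example NCBI table 4, in which TGA encodes Trp) can be substituted by editing `CODON_TABLE`. The function name and toy sequence are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1) in TCAG order.
# Assumption for this sketch: replace it if another code applies.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds_sequences: list[str]) -> dict[str, float]:
    """Relative synonymous codon usage pooled over in-frame coding sequences."""
    counts = Counter()
    for cds in cds_sequences:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            # Skip stop codons and codons with ambiguous bases
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1

    families = defaultdict(list)  # amino acid -> its synonymous codons
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[codon] for codon in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        for codon in codons:
            # observed count / (family total / number of synonymous codons)
            values[codon] = counts[codon] * len(codons) / total
    return values


# Toy coding sequence for illustration only: TTT TTT TTC TTA
toy = rscu(["TTTTTTTTCTTA"])
```

In the toy example, Phe is encoded twice by TTT and once by TTC, giving RSCU values of 4/3 and 2/3. Because an amino acid with n synonymous codons has a maximum RSCU of n, values such as the 4.55 reported for AGA are only possible in a six-codon family.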

Table 3 Codon usage in the mitochondrial genome of Theileria velifera.
Fig. 1 Theileria velifera protein-coding gene relative synonymous codon usage (RSCU).

The mitochondrial genome of T. velifera contains five LSU rRNA fragments. Compared with the LSU rRNA regions of other Apicomplexan parasites, which are designated LSU1–LSU6, the LSU2 fragment was not found in the mitochondrial genome of T. velifera. LSU1, LSU3, and LSU6 are situated between the cox3 and cob genes, while LSU4 and LSU5 are located near the cob gene and the terminal TIR (Fig. 2a).

Fig. 2 Mitochondrial genome structure of Theileria velifera (ON684327) (a) and other Theileria spp. and Babesia spp. parasites: (b) Theileria parva (AB499089), (c) Theileria orientalis (AB499090), (d) Theileria uilenbergi (MZ231018), (e) Theileria equi (AB499091), (f) Babesia caballi (AB499086), (g) Babesia bovis (AB499088), (h) Babesia bigemina (AB499085), (i) Babesia motasi (MN605892), (j) Babesia duncani (NC_039721), (k) Babesia gibsoni (KP666169), (l) Babesia rodhaini (AB624360), (m) Babesia microti (NC_031328). Arrows indicate the direction of gene transcription. The protein-coding genes are cox1 (brown arrow), cox3 (yellow arrow), and cob (red arrow); the cox3-like gene of Theileria equi is shown as a purple arrow, and LSU fragments are shown as white arrows. TIR denotes the terminal inverted repeat, shown as a blue box.

Comparison of the mitochondrial genome of Theileria velifera with other apicomplexan parasites

We compared T. velifera with the Apicomplexan parasites available on NCBI in terms of genome length, PCGs, mitochondrial genome structure, A + T content, and parasitic host (Table 1). While Haemoproteus spp., Polychromophilus murinus, and Legerella nova possess circular mitochondrial DNA, T. velifera and the other Apicomplexan parasites have linear mitochondrial DNA24,31,32,33,34,35,36,37,38. The mitochondrial genome of T. velifera and most Apicomplexan parasites is approximately 6,000 bp long. However, T. equi (8,246 bp), Babesia rodhaini (6,929 bp), and B. microti (10,547 bp and 11,109 bp) have considerably longer mitochondrial genomes; that of B. microti, for example, is almost twice as long as those of most other Apicomplexan parasites. Among the parasites examined, T. equi has two cox3-like genes, while T. velifera and the others possess a single cox3 gene. None of them has mitochondrial tRNA genes, and some scholars believe that this absence is compensated for by the import of cytoplasmic tRNAs39,40,41. T. velifera and most Apicomplexan parasites analyzed here have a high A + T content of around 70%14, whereas B. microti, L. nova, and Eimeria necatrix have a slightly lower A + T content of approximately 64%. The parasites also differ in their hosts. Theileria and Babesia spp. can parasitize a range of hosts, including bovine, ovine, equine, canine, and murine hosts as well as humans. Plasmodium spp. primarily infect humans, monkeys, rodents (including murine hosts), birds, and lizards. Toxoplasma gondii mainly infects humans and cats. Eimeria spp. parasitize chickens, geese, and rabbits. The primary hosts of Leucocytozoon spp. are birds, including chickens; Haemoproteus spp. and Isospora spp. parasitize birds exclusively; and Nycteria spp. primarily parasitize bats14,27,42,43.

We also compared the start and stop codons of the PCGs of T. velifera with those of other Theileria and Babesia species (Table 4). The start codons for cox1, cox3, and cob in T. velifera are ATA, ATG, and TTG, respectively, and the stop codons are TAA, TAA, and TAG, respectively. The start codons used in these parasites take the forms ATN, GTN, and TTN, with ATN being the most common; GTN and TTN are used more frequently in cox1 than in cox3 and cob. In addition, the start codon of the cox1 gene of T. annulata is CAT. The stop codons include TAA, TAG, and TGA, with TAA being the most prevalent, followed by TAG. Thus, besides the common ATN pattern, several unusual start codons are used in the PCGs of these parasites, including GTG, GTT, CAT, TTA, TTG, and TTT. These unusual start codons are reportedly corrected during mRNA editing and do not affect the normal translation of the protein44.
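The ATN, GTN, and TTN notation groups start codons by their first two bases, with N standing for any base in the third position. A minimal sketch of that grouping is shown below; the function name is hypothetical, and the list contains only the codon forms named in the text, not their frequencies in Table 4.

```python
from collections import Counter


def start_codon_class(codon: str) -> str:
    """Map a start codon to its ATN/GTN/TTN pattern, or 'other'."""
    codon = codon.upper().replace("U", "T")
    if len(codon) != 3:
        raise ValueError("a codon must have exactly three bases")
    prefix = codon[:2]
    return f"{prefix}N" if prefix in {"AT", "GT", "TT"} else "other"


# Start codon forms mentioned in the text (one entry per form)
named_codons = ["ATA", "ATG", "TTG", "GTG", "GTT", "CAT", "TTA", "TTT"]
pattern_counts = Counter(start_codon_class(codon) for codon in named_codons)
```

Under this grouping, CAT (reported for cox1 of T. annulata) is the only named form that falls outside the three patterns.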

Table 4 Initiation and termination codons of the protein-coding genes in the mitochondrial genomes of Theileria and Babesia parasites.

We compared the distribution and transcription orientations of the PCGs, LSU fragments, and TIRs of T. velifera with those of other Theileria spp. and Babesia spp. (Fig. 2). In terms of PCGs, T. equi contains two cox3-like fragments together with the cox1 and cob genes, while the other Theileria spp. and Babesia spp. have three PCGs: cox1, cox3, and cob. The order of these genes varies among species, with T. velifera and most Theileria and Babesia spp. having the order cox1, cox3, cob. In terms of gene length, there was no significant difference in the size of cox1, cox3, and cob between T. velifera and the other Theileria spp. and Babesia spp. In addition, the majority of species share the same transcription directions as T. velifera: cox1 is transcribed 5′ to 3′, whereas cox3 and cob are transcribed in the 3′ to 5′ direction16,45,46.

In terms of rRNA, T. velifera and the other Theileria and Babesia species contain only LSU fragments, typically five or six. rRNA fragmentation is common in nature and is not exclusive to the mitochondrial genomes of Apicomplexan organisms: it has been observed in Apicomplexans, Dinoflagellates, Oxytricha, Paramecium, and Tetrahymena, with rRNA fragments dispersed across the mitochondrial genome and encoded on either strand47,48. However, how these rRNA fragments assemble to form ribosomes remains unclear49. In most of the Theileria and Babesia parasites in this study, LSU1 and LSU3 are found between the cox3 and cob genes, as are LSU2 and LSU6, whereas LSU4 and LSU5 are mostly located between the cob gene and the terminal TIR. Additionally, in most Babesia and Theileria species, LSU1 and LSU3, as well as LSU4 and LSU5, are transcribed in opposite directions, whereas LSU2 and LSU6 share the same transcription direction.

With regard to TIRs, T. velifera and the other Theileria spp. and Babesia spp. all possess TIR sequences. Notably, in T. equi, the TIR sequences at both ends overlap substantially with the cox3-like sequences. Most Babesia and Theileria species, including T. velifera, have a single pair of TIR sequences located at the two ends of the linear mitochondrial genome, whereas B. rodhaini and B. microti have two long pairs of TIR sequences, named IRa and IRb, with a flip-flop inversion in between50. Individual TIR lengths of 184 to 1,082 bp have been reported for B. rodhaini and B. microti, and 1,563 bp for T. equi, which are notably longer than the typical TIR length previously documented (440–450 bp). B. rodhaini, B. microti, and T. equi also have considerably longer mitochondrial genomes than other Babesia and Theileria spp. These findings suggest that the number and size of TIRs may contribute substantially to the variation in mitochondrial genome size. Other researchers have also argued that TIRs play a pivotal role in the replication and stability of linear mitochondrial genomes14,24,50.

Phylogenetic analysis

Because the cox3 gene is highly divergent in Theileria and Babesia parasites, we obtained the nucleotide sequences of cox1 and cob of Apicomplexan parasites from GenBank. Combining these with the cox1 and cob sequences of T. velifera determined in this study, we analyzed the genetic distances among these Apicomplexan parasites (Supplementary Table S1). T. velifera showed a genetic distance of 0.276 to 0.356 from other Theileria spp., 0.318 to 0.465 from Babesia spp., and 0.426 to 0.438 from Plasmodium spp. Its genetic distance was 0.425 to 0.444 from Leucocytozoon spp., 0.424 to 0.448 from Haemoproteus spp., and 0.453 to 0.460 from Isospora spp. Furthermore, the genetic distance to Eimeria spp. was 0.452 to 0.467, and the distance to T. gondii was 0.486. Notably, T. velifera was genetically closest to T. taurotragi (0.276) and most distant from T. gondii (0.486). Additionally, T. velifera was genetically closer to most of the Babesia spp. examined in this study than to T. equi.

For the same reason, the phylogenetic tree was built from the cox1 and cob amino acid sequences of Apicomplexan parasites available in the NCBI database, together with those from the mitochondrial genome of T. velifera obtained in our study (Table 1). The concatenated amino acid sequences were used as the dataset for constructing the phylogenetic tree (Fig. 3). The tree is divided into two major branches. The first comprises species of the orders Haemosporida, Eugregarinida, and Eucoccidiorida. Within Haemosporida, the genus Plasmodium forms a distinct subgroup, while the genera Haemoproteus, Leucocytozoon, and Nycteria group together, indicating a relatively close relationship among them. Within Eucoccidiorida, Isospora and Eimeria species are relatively closely related. The second major branch consists mainly of parasites of the order Piroplasmida, chiefly the genera Theileria and Babesia, and divides into five main subgroups: (1) the Babesia microti group, including B. rodhaini and B. microti; (2) the Western Babesia group, containing B. duncani; (3) Babesia sensu stricto, encompassing species such as B. gibsoni and B. canis; (4) T. equi; and (5) Theileria sensu stricto, comprising species such as T. orientalis. This classification aligns with previous research51,52. In this study, T. velifera clustered first with T. annulata, T. lestoquardi, T. taurotragi, and T. parva, and then with T. uilenbergi, T. luwenshuni, and T. orientalis, indicating that T. velifera is more closely related to the former four species than to the latter three. This relationship is further supported by the 18S rRNA phylogenetic tree17,18.

Fig. 3 The phylogenetic relationship between T. velifera and Apicomplexan parasites was analyzed using JTT and Freqs models. Maximum likelihood analysis was performed based on the amino acid sequences of cox1 and cob genes to infer the phylogeny, with T. velifera highlighted in orange background. Bootstrap values > 50% from 1000 replicates are shown on the nodes.

The Theileria spp. in the branch containing T. velifera grouped first with B. duncani, then with Babesia sensu stricto, and finally with T. equi. This suggests that T. velifera is more closely related to B. duncani and Babesia sensu stricto than to T. equi, and that Theileria sensu stricto, which includes T. velifera, shares a common ancestral branch with B. duncani and Babesia sensu stricto53,54. Neighbor-joining analysis of 18S rRNA sequences has placed B. duncani in a distinct new clade relative to other Apicomplexan parasites, and 18S rRNA analysis also shows that B. duncani and B. conradae belong to the same clade within the Western Babesia group. However, because the cox1 and cob genes of B. conradae have not been identified, this species was not included in our analysis51,53,54. Numerous studies have provided evidence that Babesia sensu stricto forms a monophyletic group, a conclusion supported by both molecular and biological data. Numerous studies have also demonstrated that T. equi belongs to a phylogenetic lineage distinct from Theileria sensu stricto, B. duncani, and Babesia sensu stricto, a separation supported by two molecular markers, 18S rDNA and RPS8. Furthermore, the mitochondrial genome structure of T. equi differs markedly from those of Theileria sensu stricto (including T. velifera), B. duncani, and Babesia sensu stricto. These findings increasingly suggest that T. equi occupies a unique branch within the phylogeny27,51. Taken together, these results indicate that mitochondrial genome sequences are informative markers for phylogenetic investigations of Apicomplexan parasites.

Conclusion

In this study, the mitochondrial genome of T. velifera is reported for the first time. It contains three PCGs, five LSU fragments, and a pair of TIRs, but no tRNA genes. A comparative analysis with other Apicomplexan parasites was conducted to improve our systematic understanding of their evolutionary relationships. The findings indicate structural and phylogenetic similarities between the mitochondrial genome of T. velifera and those of Babesia sensu stricto and Theileria sensu stricto, with the greatest similarity to T. annulata, T. lestoquardi, T. taurotragi, and T. parva. This study provides molecular evidence for the accurate identification of T. velifera, establishes a foundation for subsequent genomic research, and offers a molecular basis for epidemiological investigations of T. velifera. Overall, this work contributes insights into the phylogeny and population genetics of Apicomplexan parasites and provides molecular evidence for further investigations into their evolutionary mechanisms and genetic diversity.