Detection and functional assessment of structural variants using whole-genome re-sequencing data in Nellore cattle

Marín-Garzón, Natalia A.; Mota, Lucio F. M.; Vargas, Giovana; Arikawa, Leonardo M.; Fonseca, Larissa F. S.; Fernandes Júnior, Gerardo A.; Carvalheiro, Roberto; Albuquerque, Lucia G.

doi:10.1038/s41598-025-14139-0


Article
Open access
Published: 19 August 2025


Scientific Reports volume 15, Article number: 30364 (2025)

Abstract

Ongoing advances in genome sequencing technologies have enabled the identification of numerous structural variants (SVs) in livestock genomes, which are the main determinants of complex traits due to their impact on gene expression. Thus, this study aimed to detect structural variants using whole-genome re-sequencing (WGS) data and to identify positional candidate genes and quantitative trait loci (QTL) overlapping the most frequent SV regions in Nellore cattle. The WGS data of 151 representative Nellore bulls were analyzed to perform genome-wide structural variant calling and to identify common SV regions. Genes and QTL surrounding the most frequent SV regions were identified using the Ensembl Genes and Cattle QTL databases. The identified genes were functionally classified by biological mechanisms and pathways (Gene Ontology, GO) using the PANTHER database. A total of 215,031 SVs were identified, most of them copy number variants (CNVs; 183,032 deletions and 14,013 duplications), plus 17,986 inversions (INVs). These SVs cover, on average, 4.81% of the autosomal genome. Furthermore, we found 3,752 non-redundant SV regions present in at least 5% of the bulls. These SV regions mainly correspond to CNV regions (97%), with the remainder being inversion regions (3%). Together, the SV regions cover 13.13% of the autosomal genome, with 11.4% attributed to CNV regions and 1.7% to inversion regions. Moreover, 532 SV regions were present in more than 50% of the bulls evaluated and overlapped 130 QTL previously associated with economically important traits related to exterior, health, meat and carcass, milk, production, and reproduction. A total of 1,164 positional candidate genes were identified, overlapped by 204 SVRs. These genes are significantly overrepresented in GO terms related to biological processes (BP), molecular functions (MF), and biochemical pathways, and play an essential role in environmental adaptation mechanisms and feed efficiency indicator traits. Our results suggest that genes surrounding SV regions perform key biological functions essential to thermotolerance, immunity, metabolism, tissue integrity, and environmental adaptation in tropical regions.


Introduction

The widespread development of molecular techniques and bioinformatic algorithms has made it possible to unravel the genetic mechanisms driving differences between individuals, which are mainly attributed to single-nucleotide variants (SNVs), small insertions and deletions (indels; < 50 bp), and structural variants (SVs)^1,2,3. Structural variants are generally defined as inherited variations in DNA sequence larger than 50 bp that involve changes in copy number, orientation, and/or chromosomal location relative to the reference genome^4,5. Balanced SVs, such as inversions and translocations, rearrange DNA sequences without a net gain or loss of genetic material^1,5. Unbalanced SVs, in contrast, involve gains or losses of DNA segments and include large insertions and deletions as well as copy number variants (CNVs). CNVs are the most studied SV type in livestock and encompass duplication and deletion events ranging in size from 1 Kb to 5 Mb^3,6,7.

Compared to SNPs, SVs affect larger fractions of the genome and account for the greatest amount of total polymorphic content among individual genomes⁸. SVs can have greater effects than SNPs, ranging from the modification of genes to the disruption of gene expression regulation, and can lead to recessive disorders^6,9. Array-based platforms, such as comparative genomic hybridization and SNP arrays, have commonly been used for genome-wide SV detection in livestock¹⁰. However, array-based approaches lack genome coverage, especially in segmental duplication regions, which are known hotspots for CNV formation¹¹. The advantages of detecting SVs from whole-genome sequencing (WGS) data include higher coverage and resolution, more accurate prediction of copy numbers, more precise detection of breakpoints, and higher power to identify novel SVs¹¹. In cattle, structural variation has received increasing attention because of its association with economically important phenotypes such as fatty acid profile¹², meat tenderness¹³, feed conversion ratio¹⁴, growth traits¹⁵, milk production traits¹⁶, and milk somatic cell score¹⁷.

Although WGS has become more feasible in livestock, most genome-wide structural variant studies are still conducted using SNP array-based approaches. Few published studies have explored structural diversity using whole-genome sequencing data, all with fewer than 100 animals^18,19,20,21,22,23,24. Furthermore, most have focused on copy number variants (CNVs) rather than on other SV types. Detecting balanced and unbalanced structural variants from whole-genome sequencing data may help unravel the molecular mechanisms underlying genetic diversity, breed adaptation, and evolution, especially in Nellore cattle, in which these mechanisms have not been well elucidated so far. Therefore, the aims of this study were: (i) to detect structural variants using whole-genome re-sequencing data of 151 representative Nellore bulls by combining calling algorithms; (ii) to discover non-redundant, highly frequent structural variant regions (SVRs) in the analyzed bulls; (iii) to search for positional candidate genes and quantitative trait loci (QTL) overlapping the most frequent SVRs in the population; and (iv) to assess the functional impact of the positional candidate genes overlapping SVRs through enriched gene ontology (GO) terms related to biological process (BP), molecular function (MF), cellular component (CC), and biochemical pathways.

Materials and methods

DNA samples and re-sequencing

A total of 151 Nellore bulls comprised the population analyzed in the current study. These ancestral bulls were selected based on their contributions to the genetic diversity of the Brazilian Nellore population. For that purpose, pedigree information on 2,688,124 related individuals, including 9,811 sires (6,040 of which were founders) and 915,371 dams, was used. The bulls selected for DNA re-sequencing were those least related to the others, with the largest numbers of progeny, and with the greatest contributions to the genotyped population, which comprises more than 60,000 genotyped animals from five Brazilian Nellore breeding programs: DeltaGen (http://deltagen.com.br/), Nelore Qualitas (https://qualitas.agr.br/), Cia de Melhoramento (https://www.ciademelhoramento.com.br), PAINT (https://www.paintmga.com.br/), and Centro de Pesquisa de Bovinos de Corte - Instituto de Zootecnia (https://iz.agricultura.sp.gov.br/). For further information regarding the selected ancestors, see Fernandes Junior et al.².

Genomic DNA was extracted from semen samples (100 µL) using the DNeasy Blood and Tissue DNA Extraction Kit (Qiagen, Valencia, CA, USA). The quality of each purified DNA sample was checked by 1% agarose gel electrophoresis. Extracted DNA was quantified by absorbance using the NanoDrop 1000 Spectrophotometer (Thermo Fisher Scientific, Santa Clara, CA, USA, 2007).

The whole genomes of the representative bulls were re-sequenced using Illumina sequencing-by-synthesis technology. Of all DNA samples, 52 were re-sequenced on the HiSeq X Ten and the remaining 99 on the NovaSeq 6000 System (Illumina Inc., San Diego, USA). DNA libraries were constructed with end-linked adapters and sequenced in paired-end mode (sequencing both ends of each DNA fragment), producing an average of 360 million 150-bp reads per bull (2 × 150 bp) and an average genome coverage of 14.5× (range 7.8× to 26.3×). Poor-quality reads were trimmed using SeqClean (https://hpc.ilri.cgiar.org/seqclean/seqclean) with default parameters. SAMtools v. 1.22²⁵ (https://www.htslib.org/download/) was used to index reads, and reads were aligned to the bovine reference genome ARS-UCD1.2 (http://www.ncbi.nlm.nih.gov/genome/?term=bos+taurus) using HISAT2 v. 2.2.0²⁶ (https://daehwankimlab.github.io/hisat2), with 97.9% of reads successfully paired. Further details regarding the re-sequencing and alignment can be found in Fernandes Junior et al.² and Fernandes Junior et al.²⁷.

Detection of structural variants

Detection of structural variants (SVs) followed an adaptation of the workflow proposed by Letaief et al.²⁸ for French dairy and beef cattle (Fig. 1). SV calling for each sample was performed using Parliament2²⁹, a software package that integrates multiple bioinformatic tools to call high-quality structural variant events. The consensus strategy of Parliament2 (https://github.com/fritzsedlazeck/parliament2) provides the size, type, location, and calling method of each SV event. As input, we used the BAM and BAI files containing the aligned reads of the 151 Nellore bulls, together with the bovine reference genome sequence (ARS-UCD1.2). Parliament2 was run for each sample with default parameters from a single Docker image on the DNAnexus platform. Three bioinformatic tools were run through Parliament2: CNVnator v. 0.4.1³⁰ (https://github.com/abyzovlab/CNVnator), Lumpy v. 0.3.1³¹ (https://github.com/arq5x/lumpy-sv), and Delly v. 1.2.6³² (https://github.com/dellytools/delly). All three are designed to detect SVs from whole-genome sequencing (WGS) data. CNVnator identifies copy number variations (CNVs) by analyzing read depth. Lumpy uses a hybrid approach that combines paired-end, split-read, and read-depth signals to detect various SV types, including deletions, duplications, inversions, and translocations. Delly, in turn, combines paired-end and split-read mapping to identify a wide range of structural variants, including complex rearrangements.

After running Parliament2, we obtained six VCF files for each of the 151 Nellore bulls: one file each with the variants detected by CNVnator and by Lumpy, and four files with the variants detected by Delly, one per variant type (duplications, deletions, insertions, and inversions).

Filtering of called structural variants

Detected SVs were filtered at the individual and population levels to remove unreliable raw calls and reduce the false discovery rate. The following analyses used utilities from BCFtools³³ and Bedtools³⁴. All called SVs were first filtered at the caller level, following the criteria suggested for each SV caller:

1. CNVnator: only calls that were significant in the t-test (p-value < 0.05) were retained; the null hypothesis of this test is that the mean read-depth signal in the SV region equals the mean read-depth signal of the sample.
2. Delly: only SVs that passed the quality filters (i.e., flagged PASS) and were supported by more than three read pairs (PE support > 3) were kept. After filtering, the VCFs containing the remaining DEL, DUP, and INS variants were concatenated with the BCFtools concat option into a single VCF of copy number variants (CNVs), while the VCF containing the filtered INV variants was kept separate.
3. Lumpy: SVs supported by more than one split read or more than three read pairs (SR > 1 or PE > 3) were retained.
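The caller-specific thresholds listed above can be written as simple predicates. The sketch below is illustrative only: it assumes each call has already been parsed into a dictionary with hypothetical keys (`filter`, `pe`, `sr`, `pvalue`) and does not read real VCF files, whose INFO and FORMAT field names differ between CNVnator, Delly, and Lumpy.

```python
from typing import Callable, Dict, List

# Hypothetical, already-parsed SV call; keys are illustrative, not real VCF field names.
Call = Dict[str, object]


def keep_cnvnator(call: Call) -> bool:
    # Read-depth call: keep only if the t-test p-value is below 0.05.
    return float(call["pvalue"]) < 0.05


def keep_delly(call: Call) -> bool:
    # Keep calls flagged PASS and supported by more than 3 read pairs.
    return call["filter"] == "PASS" and int(call["pe"]) > 3


def keep_lumpy(call: Call) -> bool:
    # Keep calls with more than 1 split read OR more than 3 read pairs.
    return int(call["sr"]) > 1 or int(call["pe"]) > 3


FILTERS: Dict[str, Callable[[Call], bool]] = {
    "cnvnator": keep_cnvnator,
    "delly": keep_delly,
    "lumpy": keep_lumpy,
}


def filter_calls(caller: str, calls: List[Call]) -> List[Call]:
    """Apply the caller-specific predicate to a list of parsed calls."""
    keep = FILTERS[caller]
    return [c for c in calls if keep(c)]


if __name__ == "__main__":
    delly_calls = [
        {"filter": "PASS", "pe": 5},
        {"filter": "PASS", "pe": 3},      # PE not > 3: dropped
        {"filter": "LowQual", "pe": 10},  # not PASS: dropped
    ]
    print(len(filter_calls("delly", delly_calls)))  # 1
```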

SVs that passed these criteria were then compared within each sample to obtain a high-confidence set of SVs for each bull, composed only of SVs detected by all callers able to detect that SV type. Because the three callers differ in their power to detect specific SV types (CNVnator, for example, detects only CNVs), the within-sample comparison was performed separately for each SV type:

1. Identification of high-confidence CNVs: the filtered consensus files from CNVnator and Lumpy and the concatenated, filtered Delly file containing duplications, deletions, and insertions were compared against each other using the Bedtools intersect option with a minimum reciprocal overlap of 90%.
2. Identification of high-confidence INVs: the filtered consensus file from Lumpy and the file containing the inversions detected by Delly were compared against each other using the Bedtools intersect option with a minimum reciprocal overlap of 90%.

A reciprocal overlap threshold of 90% means that SVs reported by different callers were treated as the same event only if their shared segment covered at least 90% of the length of each SV; SVs satisfying this criterion across the compared callers were deemed high-confidence SVs for that bull.
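As a minimal sketch of the reciprocal-overlap criterion, the Python below checks whether two intervals share at least 90% of each other's length and keeps calls from one caller that have such a partner in every other call set. It is not the Bedtools implementation: it uses a naive quadratic search, BED-style half-open coordinates are assumed, and the function names are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end) in 0-based, half-open BED-style coordinates
Interval = Tuple[str, int, int]


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of base pairs shared by two intervals (0 if on different chromosomes)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def reciprocal_overlap(a: Interval, b: Interval, frac: float = 0.9) -> bool:
    """True if the shared segment covers at least `frac` of BOTH intervals."""
    shared = overlap_bp(a, b)
    return shared >= frac * (a[2] - a[1]) and shared >= frac * (b[2] - b[1])


def consensus(reference: List[Interval], *others: List[Interval],
              frac: float = 0.9) -> List[Interval]:
    """Keep calls from `reference` that have a reciprocal-overlap partner in every other set."""
    return [
        a for a in reference
        if all(any(reciprocal_overlap(a, b, frac) for b in other) for other in others)
    ]


if __name__ == "__main__":
    sv = ("1", 10_000, 20_000)                            # 10 Kb call from one caller
    print(reciprocal_overlap(sv, ("1", 10_500, 20_500)))  # True: 95% of each call is shared
    print(reciprocal_overlap(sv, ("1", 10_000, 30_000)))  # False: only 50% of the 20 Kb call
```

The second example shows why the reciprocal condition matters: the 10 Kb call lies entirely inside the 20 Kb call, so a one-directional test would accept the pair, but the larger call is only half covered.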

After within-sample filtering, we obtained two VCF files for each bull: one with the high-confidence inversions (INVs) and the other with the high-confidence copy number variants (CNVs). The per-bull VCF files of each SV type were then concatenated with the BCFtools concat option to obtain two files containing the high-confidence CNVs and INVs, respectively, of all bulls. CNVs and INVs were then filtered further by removing SVs located on sex chromosomes, SVs shorter than 1 Kb, and SVs longer than 5 Mb. The remaining CNVs and INVs were used for downstream analyses.
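The population-level filter above amounts to a single predicate. This sketch assumes autosomes are named "1" to "29" (BTA1 to BTA29) and that 1 Kb = 1,000 bp; real VCF files may use other naming conventions (for example, RefSeq accessions), so the `AUTOSOMES` set is an assumption to adapt.

```python
# Assumed chromosome naming: "1".."29" for BTA1-BTA29; sex chromosomes are excluded.
AUTOSOMES = {str(i) for i in range(1, 30)}
MIN_LEN_BP = 1_000        # SVs shorter than 1 Kb are removed
MAX_LEN_BP = 5_000_000    # SVs longer than 5 Mb are removed


def passes_population_filter(chrom: str, start: int, end: int) -> bool:
    """Keep autosomal SVs whose length lies within [1 Kb, 5 Mb]."""
    length = end - start
    return chrom in AUTOSOMES and MIN_LEN_BP <= length <= MAX_LEN_BP


if __name__ == "__main__":
    calls = [("1", 0, 2_000), ("X", 0, 2_000), ("5", 0, 500), ("7", 0, 6_000_000)]
    print([c for c in calls if passes_population_filter(*c)])  # [('1', 0, 2000)]
```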

Identification of structural variant regions (SVRs)

Merging structural variants into regions is required for population-wide analysis, as it allows the detection of DNA segments harboring different structural variation events³⁵. In the present study, structural variant regions (SVRs) were built by merging, separately for each of the two types of high-confidence SVs (CNVs and INVs), the SVs from all bulls that overlapped by at least 1 bp. For this, we used the Bedtools merge utility (https://github.com/arq5x/bedtools2) with the count and collapse operations, which identified the samples sharing each region and the SV events occurring in it. Regions not shared by at least 5% of the bulls (N = 7) were removed. As a result, we obtained two sets, one containing the copy number variant regions (CNVRs) and the other the inversion regions (INVRs), which were then joined into a single file using the BCFtools concat option to facilitate the descriptive analysis of SVRs (Fig. 2).
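The region-building step can be sketched as an interval sweep: SVs from all bulls are sorted by position, SVs sharing at least 1 bp are merged, and the number of distinct bulls per region is recorded, analogous to the count and collapse operations described above. This is a simplified illustration, not the Bedtools code; it merges only intervals that truly overlap (book-ended intervals stay separate), and the threshold of 7 bulls is taken from the text as the `min_samples` default.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (chromosome, start, end, sample_id) in half-open BED-style coordinates
SV = Tuple[str, int, int, str]
Region = Tuple[str, int, int, int]  # (chromosome, start, end, number of distinct bulls)


def build_regions(svs: List[SV], min_samples: int = 7) -> List[Region]:
    """Merge SVs overlapping by >= 1 bp and keep regions shared by >= min_samples bulls."""
    by_chrom: Dict[str, List[SV]] = defaultdict(list)
    for sv in svs:
        by_chrom[sv[0]].append(sv)

    regions: List[Region] = []
    for chrom in sorted(by_chrom):
        items = sorted(by_chrom[chrom], key=lambda s: (s[1], s[2]))
        cur_start, cur_end, samples = items[0][1], items[0][2], {items[0][3]}
        for _, start, end, sample in items[1:]:
            if start < cur_end:              # shares at least 1 bp with the open region
                cur_end = max(cur_end, end)
                samples.add(sample)
            else:                            # gap: close the region, open a new one
                regions.append((chrom, cur_start, cur_end, len(samples)))
                cur_start, cur_end, samples = start, end, {sample}
        regions.append((chrom, cur_start, cur_end, len(samples)))

    return [r for r in regions if r[3] >= min_samples]


if __name__ == "__main__":
    calls = [("1", 1_000 + 10 * i, 2_000 + 10 * i, f"bull{i}") for i in range(7)]
    calls.append(("1", 5_000, 6_000, "bull0"))  # private SV, shared by one bull only
    print(build_regions(calls))  # [('1', 1000, 2060, 7)]
```

Counting distinct bull identifiers, rather than SV records, prevents two overlapping calls from the same animal from inflating a region's frequency.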

Statistical analysis and graphical visualization of SVs and SVRs

All reported descriptive statistics and plots were produced with functions from the R package tidyverse³⁶ in the RStudio integrated development environment. The association of metrics such as the number, distribution, and average length of the predicted SVs and SVRs with chromosome size and genome coverage was assessed with pairwise Pearson correlations using the R package PerformanceAnalytics³⁷.
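The analysis itself used R (PerformanceAnalytics). Only to make the statistic explicit, the Python sketch below computes the same Pearson coefficient from first principles. The per-chromosome vectors in the usage example are hypothetical placeholders, not values from this study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance of x and y divided by the product of their SDs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-chromosome values, for illustration only.
    chrom_length_mb = [150.0, 130.0, 120.0, 60.0]
    sv_count = [14_000, 13_000, 12_000, 3_000]
    print(round(pearson_r(chrom_length_mb, sv_count), 3))
```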

Functional analysis of highly frequent SVR

The functional implications of the predicted SVRs were assessed by searching for genes and QTLs overlapping the most frequent SVRs in the 151 bulls (SVRs present in more than 50% of the bulls, N > 75) using the R package GALLO³⁸. Genes and QTLs were retrieved from the Ensembl Genes database³⁹ and the cattle database of the Animal QTLdb⁴⁰, respectively. Only significant QTLs (p-value < 0.05) and protein-coding genes overlapping the most frequent SVRs are reported in this study.
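The selection and annotation step can be illustrated without GALLO. The Python sketch below keeps regions present in more than half of the 151 bulls (that is, N > 75) and reports genes whose coordinates overlap them. The gene records are hypothetical stand-ins for an Ensembl annotation, and the function names are illustrative.

```python
from typing import List, Tuple

Region = Tuple[str, int, int, int]  # (chrom, start, end, number of bulls carrying the SVR)
Gene = Tuple[str, int, int, str]    # (chrom, start, end, gene_id); hypothetical annotation


def frequent_regions(regions: List[Region], n_total: int = 151,
                     min_frac: float = 0.5) -> List[Region]:
    """Keep SVRs present in MORE than min_frac of bulls (N > 75 when n_total = 151)."""
    return [r for r in regions if r[3] > min_frac * n_total]


def genes_in_regions(regions: List[Region], genes: List[Gene]) -> List[str]:
    """IDs of genes sharing at least 1 bp with any region (half-open coordinates)."""
    hits = set()
    for chrom, start, end, _ in regions:
        for g_chrom, g_start, g_end, gene_id in genes:
            if g_chrom == chrom and g_start < end and start < g_end:
                hits.add(gene_id)
    return sorted(hits)


if __name__ == "__main__":
    svrs = [("1", 0, 1_000, 80), ("1", 5_000, 6_000, 75)]  # second SVR is not > 75
    genes = [("1", 900, 1_500, "geneA"), ("1", 5_500, 5_600, "geneB")]
    print(genes_in_regions(frequent_regions(svrs), genes))  # ['geneA']
```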

Functional enrichment analysis of the positional candidate genes overlapping the most frequent SVRs was performed with PANTHER 17.0 (https://pantherdb.org/). The PANTHER statistical overrepresentation test was run using, as the analyzed list, the positional candidate genes harbored by the most frequent SVRs (1,164 genes in total) and, as the reference list, the Bos taurus genes contained in the PANTHER databases. The annotation data sets were PANTHER GO-slim Molecular Function, PANTHER GO-slim Biological Process, PANTHER GO-slim Cellular Component, PANTHER Pathway, and Reactome Pathway. A binomial test was applied to determine whether the genes overlapping the SVRs were enriched for GO categories and biochemical pathways; it assesses the probability of observing a given number of genes related to a specific biological mechanism in the analyzed list, given the frequency of such genes among all Bos taurus reference genes. Because many biological functions were tested simultaneously, the Bonferroni correction for multiple testing was applied to reduce the probability of false-positive associations between genes and GO terms or pathways. Only genes with significant associations (p-value < 0.05) with GO terms or pathways are reported. The tidyverse (https://www.tidyverse.org/) and ggplot2 (https://ggplot2.tidyverse.org/) R packages were used to represent the QTL types and trait IDs of significant QTLs, as well as the significantly enriched GO terms and pathways, as pie charts and bar plots.
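To show the logic of the enrichment test, the sketch below computes a one-sided binomial p-value for overrepresentation: the probability of seeing at least k annotated genes among n listed genes when the category frequency in the reference is p. It then applies a Bonferroni adjustment. This is an illustration of the statistical idea, not PANTHER's code, and its exact conventions may differ. The category counts in the usage example are hypothetical; only the list size of 1,164 genes comes from the text. The tail is summed in log space because binomial coefficients for n = 1,164 overflow a float.

```python
from math import exp, lgamma, log, log1p
from typing import List


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")

    def log_pmf(i: int) -> float:
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log1p(-p))

    return min(1.0, sum(exp(log_pmf(i)) for i in range(max(k, 0), n + 1)))


def overrep_pvalue(hits_in_list: int, list_size: int,
                   hits_in_ref: int, ref_size: int) -> float:
    """One-sided test: are category genes more frequent in the list than in the reference?"""
    return binom_sf(hits_in_list, list_size, hits_in_ref / ref_size)


def bonferroni(pvalues: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


if __name__ == "__main__":
    # Hypothetical categories: 200 of 20,000 reference genes; 30 or 12 hits among 1,164 genes.
    raw = [overrep_pvalue(30, 1164, 200, 20_000), overrep_pvalue(12, 1164, 200, 20_000)]
    print(bonferroni(raw))
```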

Results and discussion

Genome-wide SV detection

Number of high-confidence SVs

A total of 215,031 high-confidence structural variants passed all filtering criteria in the analyzed bulls. Summary statistics for the number and length of the high-confidence SVs, including CNVs (deletions and duplications) and inversions, are presented in Table 1. Although insertions were detected by the SV callers, none remained after filtering and reciprocal-overlap comparison within the CNV set, as they were of low quality and corresponded to small insertions (average length of 1 bp), which are not considered structural variants.

Table 1 Summary statistics for the number and length of detected high-confidence SVs.


In this study, SVs were detected using three complementary callers: CNVnator (based on read depth), Delly (based on paired-end and split-read mapping), and Lumpy (which integrates paired-end and split-read signals). Each tool's output was first filtered using tool-specific quality thresholds. Then, for each SV type, only variants shared within the same individual by all callers applicable to that type (all three callers for CNVs; Delly and Lumpy for inversions), based on a 90% reciprocal overlap threshold, were retained as high-confidence SVs. Although this approach did not allow partial overlaps between specific pairs of tools (e.g., CNVnator vs. Delly) to be quantified, it ensured a highly stringent consensus set, minimizing the false discovery rate by focusing exclusively on SVs consistently detected by all applicable methods. This strategy favored precision over sensitivity and reflected the complementary nature of the tools used, each with different strengths depending on SV type and detection principle. As such, the 215,031 high-confidence structural variants reported represent a conservative yet robust subset for downstream analyses (Supplementary Table S1).

Of the detected SVs, 197,045 were CNVs, and deletions were 13.06 times more numerous than duplications (Table 1). This result is similar to that reported by Silva et al.²², who detected 11.8 times more deletions than duplications in Nellore cattle using whole-genome sequencing data. It also agrees with the findings of Santana et al.¹⁴, in which loss events outnumbered gain events among CNVs detected from SNP data in Nellore cattle, although the ratio presented here is much higher than that reported by those authors (1.18 times more losses than gains). Likewise, our result agrees in direction with that of Braga et al.²⁰ in Gyr cattle (1.96 times more deletions than duplications), who detected CNVs from WGS data using CNVnator and Delly, and with the findings of Hu et al.⁴¹ in different cattle breeds, including Nellore, in which deletions outnumbered duplications. Interestingly, Hu et al. pointed out that the proportion of deletions was higher in Bos indicus than in Bos taurus animals. In contrast, the ratio of deletions to duplications observed in this study diverges from that reported by Peripolli et al.¹⁸, who used whole-genome sequencing data from three Brazilian locally adapted taurine breeds and observed 1.85 times more duplications than deletions.
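The counts reported in this section are internally consistent, and the headline ratios follow directly from them. The short check below recomputes the CNV total, the overall SV total, the deletion-to-duplication ratio, and the share of inversions using only the numbers given in the text.

```python
# Counts of high-confidence SVs reported in the text (Table 1).
deletions, duplications, inversions = 183_032, 14_013, 17_986

cnvs = deletions + duplications              # 197,045 CNVs
total = cnvs + inversions                    # 215,031 SVs
del_dup_ratio = deletions / duplications     # about 13.06 deletions per duplication
inv_share_pct = 100 * inversions / total     # about 8.36% of all SVs are inversions

print(cnvs, total, round(del_dup_ratio, 2), round(inv_share_pct, 2))
# 197045 215031 13.06 8.36
```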

The remarkably high number of deletions may be related to the mechanisms that generate structural variation in the genome. Non-allelic homologous recombination (NAHR) is known as the main mechanism causing deletions and translocation events⁴². Our findings, together with previous reports in Nellore and other Bos indicus breeds, may suggest that deletions have contributed to the evolution and diversity of Nellore cattle genomes, although confirming this would require investigating the genomes of a larger number of ancestors. It is also plausible that aligning sequences to the Bos taurus reference genome introduced bias: sequences present in Bos taurus but no longer present in indicine genomes would lack read support in the comparison of copy number and would therefore be called as deleted fragments⁴¹. Thus, a better characterization of structural variation in indicine cattle requires a reliable Bos indicus reference genome.

Regardless of the algorithm used to detect structural variants from short-read sequencing data, more deletions than duplications are often expected. The read-pair approach detects duplicated copies from read pairs that map at a shorter distance than expected from the fragment length, and when the duplicated sequence is longer than the insert size, the algorithm tends to fail to detect that duplication accurately⁴³. The read-depth (RD) approach, on the other hand, detects deletions more easily, as they are signaled by a decrease in read depth (coverage) at a given genomic region¹. Duplications, in contrast, do not always produce a straightforward increase in read depth, because the duplicated region may have variable copy numbers and irregular coverage, making detection more challenging⁴³. Furthermore, increases in read depth are not always indicative of duplicated copies; they can be caused by alignment artifacts that RD approaches cannot distinguish from true duplications⁴⁴.
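The read-depth reasoning above can be made concrete with a toy copy-number estimate. In a diploid genome, the expected copy number of a region is roughly 2 × (local depth / genome-wide mean depth). A heterozygous deletion halves the depth, while a single extra copy raises it by only 50%, and noise or alignment artifacts can mimic that smaller relative change. The sketch below is not CNVnator's algorithm, and the classification thresholds are hypothetical. The 14.5× mean coverage in the example is the average reported for these bulls.

```python
def estimate_copy_number(region_depth: float, mean_depth: float, ploidy: int = 2) -> float:
    """Copy number implied by the ratio of local to genome-wide mean read depth."""
    return ploidy * region_depth / mean_depth


def classify_depth(copy_number: float, loss_max: float = 1.5, gain_min: float = 2.5) -> str:
    """Label a region as a loss, a gain, or normal; the thresholds are hypothetical."""
    if copy_number <= loss_max:
        return "deletion"
    if copy_number >= gain_min:
        return "duplication"
    return "normal"


if __name__ == "__main__":
    mean_cov = 14.5  # average genome coverage reported for the re-sequenced bulls
    for depth in (7.25, 14.5, 21.75):
        cn = estimate_copy_number(depth, mean_cov)
        print(depth, cn, classify_depth(cn))
    # 7.25 1.0 deletion / 14.5 2.0 normal / 21.75 3.0 duplication
```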

Combining different SV calling algorithms should reduce the false discovery rate compared with a single approach. Furthermore, applying a stringent merge strategy and filtering criteria, as in this study, may ensure a more reliable identification of copy number variants²⁸. Detection accuracy for all SV types can be further increased by long-read sequencing (reads up to 80 Kb), in which single reads span complex structural variants, but its large-scale use remains limited by its high cost⁸.

Regarding balanced structural variants, inversions represented 8.36% of all predicted high-confidence SVs (Table 1). This proportion was lower than that reported by Boussaha et al.²⁴, who used WGS data from 380 animals of different Bos taurus dairy and beef breeds, in which inversions represented more than 30% of all detected SVs. Our results also differ from those of Chen et al.⁴⁵, in which inversions exceeded 20% of all detected SVs in a study using WGS data from 308 Holstein animals. However, our results agree with those authors' report on Jersey cattle, in which inversions represented 9% of the SVs detected. The proportion of inversions in our study was higher than that reported by Lee et al.⁴⁶ (~4%), who used RD and SR approaches to detect SVs from WGS data in Holstein cattle.

Balanced structural variants such as inversions remain poorly characterized in Nellore cattle, because most studies in this breed are based on SNP genotyping data^12,14,22, from which balanced rearrangements cannot be identified, only copy number gains and/or losses⁴. Nevertheless, there is increasing interest in creating structural variation maps in cattle, although more studies using whole-genome sequencing data have been published for taurine dairy and beef cattle than for indicine cattle.

In evolutionary genomics, chromosomal inversions are known as a powerful force contributing to local adaptation and diversification, as they can protect inverted sequences from recombination, allowing favorable allele combinations to be maintained⁴⁷. Despite their evolutionary importance, information on this SV type in cattle remains scarce, and most studies continue to focus on deletion and duplication events. This is likely because SVs are often detected with the RD approach, which only allows the identification of CNVs, since balanced rearrangements (inversions and translocations) do not change the read-depth signature^5,48. Furthermore, the complex nature of these variants makes their discovery and interpretation more challenging than for the other SV types. The breakpoints of inversions and translocations usually lie within complex repeats that reduce mappability, so their reliable detection requires bioinformatic tools that combine algorithms such as split reads and paired reads⁴⁸.
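Why inversions are visible to paired-end methods but not to read depth can be shown with the standard orientation signatures of discordant read pairs. The sketch assumes a forward-reverse (FR) Illumina library with both mates on the same chromosome. Concordant pairs point inward (+/−) at the expected insert size. An inversion breakpoint yields same-strand pairs (+/+ or −/−). A tandem duplication yields outward-facing pairs (−/+). A deletion yields inward pairs with an inflated insert size. The three-standard-deviation cutoff is a hypothetical choice, and interchromosomal pairs (translocations) are not handled.

```python
def classify_pair(left_strand: str, right_strand: str, insert_size: int,
                  mean_insert: float, sd_insert: float, k: float = 3.0) -> str:
    """Classify a same-chromosome read pair from an FR library.

    left_strand/right_strand: '+' or '-' for the leftmost and rightmost mapped mate.
    """
    if left_strand == right_strand:
        return "inversion"              # ++ or --: one mate lies in inverted sequence
    if left_strand == "-" and right_strand == "+":
        return "tandem duplication"     # outward-facing pair
    # Inward-facing (+/-) pair: expected orientation, so inspect the insert size.
    if insert_size > mean_insert + k * sd_insert:
        return "deletion"               # mates map too far apart on the reference
    return "concordant"


if __name__ == "__main__":
    print(classify_pair("+", "+", 400, 350, 50))    # inversion
    print(classify_pair("+", "-", 2_000, 350, 50))  # deletion
```

Note that none of these signatures depends on coverage: an inversion leaves the number of reads in the region unchanged, which is why RD-only callers such as CNVnator cannot report it.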

Length classification of high-confidence SVs

The high-confidence SVs have an average length of 86.9 Kb (± 365.1 Kb), ranging from 1 Kb to 4,989.5 Kb (Table 1). Although CNVs, particularly deletions, account for most of the detected SVs, deletions are on average the shortest (22.68 ± 167.31 Kb), while inversions are the longest (613.03 ± 851.34 Kb), followed by duplications (250.61 ± 560.12 Kb) (Table 1). The distribution of SVs across size classes revealed that most CNVs are 1–5 Kb long (73%), whereas inversions occur in all size classes, most of them being longer than 200 Kb (48%) (Fig. 3b). Most deletions fall into the 1–5 Kb and 5–10 Kb classes (77.2% and 16%, respectively), whereas duplications are distributed across all size classes, with the lowest proportion in the 5–10 Kb class (7.2%) (Table 2).
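The size-class summaries can be reproduced by binning SV lengths. The sketch below uses the class boundaries mentioned in this section (1, 5, 10, 50, and 200 Kb). The 10–50 Kb class is inferred rather than stated, and the lower-inclusive boundaries are an assumed convention, so a length of exactly 200 Kb falls into the last class. The example lengths are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List

EDGES_KB = [1, 5, 10, 50, 200]  # lower bounds of each class (10-50 Kb class assumed)
LABELS = ["1-5 Kb", "5-10 Kb", "10-50 Kb", "50-200 Kb", "> 200 Kb"]


def size_class(length_bp: int) -> str:
    """Assign an SV length to a size class; classes are [lower, upper) in Kb."""
    kb = length_bp / 1_000
    if kb < EDGES_KB[0]:
        raise ValueError("SVs shorter than 1 Kb were removed upstream")
    return LABELS[bisect_right(EDGES_KB, kb) - 1]


def class_proportions(lengths_bp: List[int]) -> Dict[str, float]:
    """Percentage of SVs falling in each size class."""
    counts = Counter(size_class(x) for x in lengths_bp)
    n = len(lengths_bp)
    return {label: 100 * counts.get(label, 0) / n for label in LABELS}


if __name__ == "__main__":
    hypothetical_lengths = [1_200, 3_000, 7_500, 60_000, 450_000]
    print(class_proportions(hypothetical_lengths))
```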

Table 2 Summary statistics for the number and proportion of each SV type by size class.


The distribution of high-confidence CNVs across size classes agrees with the results of Bickhart et al.¹⁰, who observed significantly shorter CNVs (< 15 Kb) in a Nellore individual than the average length of CNVs detected in different taurine individuals using WGS data. In contrast, our results differ from those of Lemos et al.¹² in Nellore cattle, who observed that most CNVs detected from SNP data were 10 to 500 Kb long and that less than 30% of them were 1–10 Kb long. Our results also differ from those of Silva et al.²² in Nellore cattle, in which the largest number of CNVs predicted from SNP data ranged from 100 to 900 Kb.

The differences in CNV length between the present study and previous reports are mostly attributable to the nature of the data used and, consequently, to the CNV calling algorithm. WGS data provide higher resolution than SNP data, as they yield sequence reads spanning the entire genome, and WGS-based algorithms can identify smaller structural variants, especially CNVs, with greater power³. In contrast, CNV detection from SNP genotyping data often relies on changes in signal intensity at SNP probes. Commercial SNP arrays differ in probe density, which directly affects CNV results. SNP-based detection methods perform well for large events, but compared with WGS-based algorithms, they tend to lack power to detect small CNVs (< 10 Kb) because of their low probe density and resolution⁴.
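A back-of-the-envelope calculation shows why probe density limits the smallest detectable CNV. If probes are spread evenly and a caller requires a CNV to span a minimum number of consecutive probes, the smallest detectable event is roughly (minimum probes − 1) × mean probe spacing. The genome size of about 2.5 Gb, the probe counts, and the three-probe minimum used below are all assumptions chosen for illustration; they do not describe any specific array or caller used in the cited studies.

```python
def mean_probe_spacing(genome_bp: float, n_probes: int) -> float:
    """Average distance between evenly spaced probes."""
    return genome_bp / n_probes


def min_detectable_cnv(genome_bp: float, n_probes: int, min_probes: int = 3) -> float:
    """Smallest CNV spanning `min_probes` consecutive probes ((min_probes - 1) gaps)."""
    return (min_probes - 1) * mean_probe_spacing(genome_bp, n_probes)


if __name__ == "__main__":
    genome = 2.5e9  # assumed approximate autosomal genome size, in bp
    for probes in (50_000, 777_000):  # hypothetical low- and high-density arrays
        print(probes, round(min_detectable_cnv(genome, probes)))
```

Under these assumptions, a sparse array cannot resolve CNVs much below about 100 Kb, while even a dense array struggles near the 1–10 Kb range in which most WGS-detected CNVs in this study fall.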

In general, inversions are the largest structural variants detected, with an average length of 613.03 ± 851.34 Kb (median 167.45 Kb) (Table 1). This is longer than the inversions reported by Chen et al.⁴⁵ in Holstein (median ~2.53 Kb) and Jersey (median ~1.34 Kb) cattle. Regarding their size distribution, 48% of inversions fall into the > 200 Kb class and 22% into the 50–200 Kb class, while only a small proportion (3%) fall into the 5–10 Kb class (Fig. 4a; Table 2). These results differ from those reported by Boussaha et al.²⁴ in European dairy and beef cattle, in which more than 50% of the detected inversions fell into the 1–10 Kb class.

Notably, combining tools and SV calling algorithms enabled the detection of large inversions, which tend to be harder to identify. Little reliable information is available on inversions and translocations in livestock species, especially cattle, which limits comparisons. More studies of inversions are available in humans than in cattle, owing to increasing interest in the impact of inverted chromosome regions on phenotypes and diseases⁴⁹. Most inversions in the human genomes catalogued in the public InvFEST database⁵⁰ are shorter than 100 Kb; variants shorter than 10 Kb are considered reliable, whereas those longer than 10 Kb are probably false positives⁴⁹.

Nevertheless, genomic studies on inversions reviewed by Wellenreuther & Bernatchez⁴⁷ showed that the average length of chromosomal inversions in different plant and animal species is 8.4 Mb, ranging from 130 Kb to 100 Mb. According to evolutionary genomic studies in different species, including invertebrates and mammals, large chromosomal inversions are more favored by selection than short ones, as they suppress recombination among a larger number of genetically distant loci^47,51.

Chromosomal location and distribution of high-confidence SVs

The predicted high-confidence structural variants were not uniformly distributed across the 29 autosomes. The five chromosomes harboring the most predicted SVs were BTA1 (N = 14,996), BTA2 (N = 13,773), BTA5 (N = 12,834), BTA4 (N = 12,16), and BTA10 (N = 10,372), while the five harboring the fewest were BTA25 (N = 2,770), BTA19 (N = 3,145), BTA22 (N = 3,676), BTA27 (N = 4,035), and BTA24 (N = 4,607). The distribution of each SV type along the 29 autosomes is shown in Fig. 4.

Regarding each SV type, BTA1, BTA2, and BTA5 carry the highest number of deletions, BTA23, BTA15, and BTA10 the highest number of duplications, and BTA4, BTA10, and BTA5 the highest number of inversions (Fig. 4a). Our findings on the distribution of SVs along the chromosomes are consistent with those of Peripolli et al.¹⁸, who analyzed WGS data from three Brazilian locally adapted taurine breeds and found the largest number of structural variants on BTA1 and BTA15; however, those authors predicted only CNVs (duplications and deletions). The chromosomal locations observed here are also similar to those reported by Letaief et al.²⁸ in different French dairy and beef cattle breeds, in which the largest number of SVs was observed on chromosomes 1, 6, and 5 and the smallest on chromosome 25. Those authors also combined different tools for SV calling. Using different bioinformatic algorithms to analyze WGS data, together with filtering and validation of SVs at the software, individual, and population levels, can lead to the identification of high-confidence structural variants. In contrast, the chromosomal location of SVs differed from that reported in Nellore cattle from SNP and WGS data by Silva et al.²², in which most predicted structural variants were located on BTA19, BTA11, BTA2, and BTA18 for SNP-based calls, and on BTA12, BTA15, BTA1, and BTA2 for WGS-based calls. Note, however, that Silva et al.²² assessed only CNVs.

Chromosomal extension covered by high-confidence SVs

Although BTA10 ranked fifth in total number of SVs, the average length of its SVs (728.36 Kb) was markedly higher than on the other chromosomes; it was followed by BTA9, 23, 5, and 15, with average SV lengths ranging from 102.98 Kb to 143.48 Kb (Fig. 4b). However, the Pearson correlation provided no statistical evidence of an association between the number of SVs and their average length per chromosome (r = 0.197, p-value = 0.3053). Our findings are similar to those of Peripolli et al.¹⁸, in which the longest CNV, a deletion, was observed on chromosome 10. BTA10 also had the largest proportion of its length covered by structural variation (average coverage rate = 50.4%), followed by BTA23, 9, 15, and 28, with average coverage rates ranging from 7.6 to 11% (Fig. 4c). The lowest average coverage rate was observed on BTA25, which also harbored the fewest and the smallest structural variants, followed by BTA17, 19, 16, 14, 8, and 3, with average coverage rates ranging from 0.9 to 1.1%. These results differ from those of Letaief et al.²⁸, who found the highest percentage of sequence covered by copy number variants in French dairy and beef cattle breeds on chromosome 27, but agree in that BTA25 was the chromosome least covered by structural variants.
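A chromosomal coverage rate of this kind can be sketched as the length of the union of SV intervals on a chromosome divided by the chromosome length. The minimal Python sketch below assumes 1-based inclusive coordinates and assumes that the "average coverage rate" is the mean of the per-bull rates; neither detail is stated in this section, and the function names are hypothetical.

```python
from statistics import mean


def merge_intervals(intervals):
    """Merge overlapping or adjacent (start, end) intervals (1-based, inclusive)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def coverage_rate(intervals, chrom_length):
    """Percentage of a chromosome covered by the union of SV intervals."""
    covered_bp = sum(e - s + 1 for s, e in merge_intervals(intervals))
    return 100.0 * covered_bp / chrom_length


def average_coverage_rate(svs_by_bull, chrom_length):
    """Mean of the per-bull coverage rates on one chromosome (assumed definition)."""
    return mean(coverage_rate(svs, chrom_length) for svs in svs_by_bull.values())


# Toy example on a hypothetical 1 Mb chromosome
toy = {
    "bull_A": [(1, 100_000), (50_001, 150_000)],  # union = 150,000 bp -> 15%
    "bull_B": [(200_001, 250_000)],               # 50,000 bp -> 5%
}
print(average_coverage_rate(toy, 1_000_000))  # 10.0
```

Merging before summing matters: overlapping calls within one bull would otherwise be counted twice and inflate the coverage rate.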

Pearson correlation analysis revealed a strong, positive, and significant association between the number of structural variants and chromosome size (r = 0.87, p-value < 6.8e-10), which agrees with the findings of Letaief et al.²⁸ but differs from the results reported by Boussaha et al.²⁴ in French dairy cattle. In contrast, no statistical evidence was found for a correlation between chromosome size and the average proportion of the chromosome covered by SVs (average coverage rate; r = 0.04, p-value = 0.833).
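The correlations in this subsection are Pearson coefficients computed across chromosomes. As a sketch, and not the authors' code, r and the t statistic used to test H0: ρ = 0 with n − 2 degrees of freedom can be computed with the standard library alone; the two-sided p-value then comes from Student's t distribution, which a library such as SciPy (`scipy.stats.pearsonr`) returns directly. Taking n = 29 autosomes (an assumption), r = 0.197 gives t ≈ 1.04 on 27 degrees of freedom, which is consistent with the reported p-value of 0.3053.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (n >= 3)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom (|r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Coefficients reported in the text, assuming n = 29 autosomes
print(round(t_statistic(0.197, 29), 2))  # 1.04
print(round(t_statistic(0.87, 29), 2))   # 9.17
```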

At the individual level, an average of 1,437 (± 894.7) SVs was predicted per bull, ranging from 89 to 3,428 SVs. Descriptive statistics on the number and length of each SV type in the analyzed bulls are presented in Supplementary Table S2. Structural variants cover, on average, 4.81% (± 4.38) of each bull's autosomal genome. The highest genome coverage rate observed in a bull was 20.27%, largely attributable to inversions, and the lowest was 0.03% (Fig. 5, Supplementary Table S2). A significant, moderate, and positive correlation was found between the number of SVs predicted per bull on each chromosome and the chromosomal coverage rate (r = 0.36, p-value < 2.2e−16).

Discovery of potential structural variant regions (SVR)

Number, chromosomal location and distribution of SVRs

High-confidence structural variants identified in the genome of bulls were used to detect genomic regions where different types of structural variants may occur. Specifically, a deletion may be observed in one bull and a duplication in another bull (across animals). Although less common, complex rearrangements may also lead to the coexistence of both deletion and duplication affecting overlapping segments within the same individual (within animal), as previously reported for certain types of structural variant architectures. The main purpose of translating structural variants into regions (SVR) is to characterize the genetic architecture underlying population history, to assess the potential impact of structural variations on gene function and expression, and to integrate information into genomic databases, allowing a more comprehensive understanding of differences between species, breeds, and populations³⁵.

Because SV types differ in nature, two datasets were formed: one containing the detected CNVs (duplications and deletions) and the other containing only inversions. This allowed the SV event occurring in a given region to be identified properly. SVRs were then constructed from each dataset by merging the SVs from the bulls' individual genomes that reciprocally overlapped by at least one base pair, resulting in two types of SVRs: CNVRs (copy number variant regions) and INVRs (inversion regions). Only SVRs present in at least 5% of the analyzed Nellore bulls were retained for further analysis. A total of 3,752 frequent SVRs were identified, of which 97% are CNVRs (N = 3,641) and 3% are INVRs (N = 111; Fig. 6a).
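The region-building step can be sketched as merging calls that share at least one base pair, followed by a carrier-frequency filter. This is a minimal illustration under stated assumptions, not the paper's pipeline: calls are hypothetical (bull, chromosome, start, end, type) tuples with 1-based inclusive coordinates, chained overlaps are merged into a single region, and CNV and inversion calls are processed in separate runs. With 151 bulls, the 5% threshold is 7.55, so a region needs at least 8 carriers.

```python
from collections import defaultdict


def build_svrs(calls, n_bulls, min_freq=0.05):
    """Merge SV calls sharing >= 1 bp into regions and keep regions carried
    by at least min_freq of the bulls.

    calls: iterable of (bull_id, chrom, start, end, sv_type), 1-based inclusive.
    Intended to be run once on CNV calls (DEL/DUP) and once on INV calls.
    """
    by_chrom = defaultdict(list)
    for bull, chrom, start, end, sv_type in calls:
        by_chrom[chrom].append((start, end, bull, sv_type))

    regions = []
    for chrom, svs in by_chrom.items():
        current = None
        for start, end, bull, sv_type in sorted(svs):
            if current is not None and start <= current["end"]:
                # Shares at least one base pair with the open region: extend it.
                current["end"] = max(current["end"], end)
                current["bulls"].add(bull)
                current["types"].add(sv_type)
            else:
                current = {"chrom": chrom, "start": start, "end": end,
                           "bulls": {bull}, "types": {sv_type}}
                regions.append(current)

    # 5% of 151 bulls = 7.55, so at least 8 distinct carriers are required
    return [r for r in regions if len(r["bulls"]) >= min_freq * n_bulls]


# Hypothetical toy input: 10 bulls share a deletion, 1 bull has an overlapping duplication
toy_calls = [(f"bull{i}", "BTA1", 100, 200, "DEL") for i in range(10)]
toy_calls.append(("bull10", "BTA1", 150, 300, "DUP"))
for svr in build_svrs(toy_calls, n_bulls=151):
    print(svr["chrom"], svr["start"], svr["end"], len(svr["bulls"]), sorted(svr["types"]))
# BTA1 100 300 11 ['DEL', 'DUP']
```

Counting distinct bulls (a set) rather than calls keeps a bull with several overlapping calls from inflating a region's frequency.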

Of the detected CNVRs, the vast majority (90%, N = 3,286) correspond to deletion events (CNVR-DEL), a small proportion (7%, N = 256) to mixed events in which both deletions and duplications occurred (CNVR-MIXED), and the lowest proportion (3%, N = 99) to duplications only (CNVR-DUP) (Fig. 6b).
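The three CNVR categories follow from the event types merged into each region. The sketch below assumes that each CNVR records the set of DEL/DUP labels of its member calls (the function name is hypothetical) and recomputes the reported percentages from the stated counts.

```python
def classify_cnvr(event_types):
    """Label a CNVR from the set of event types merged into it."""
    types = set(event_types)
    if types == {"DEL"}:
        return "CNVR-DEL"
    if types == {"DUP"}:
        return "CNVR-DUP"
    if types == {"DEL", "DUP"}:
        return "CNVR-MIXED"
    raise ValueError(f"unexpected CNV event types: {sorted(types)}")


# CNVR counts reported in the text
cnvr_counts = {"CNVR-DEL": 3286, "CNVR-MIXED": 256, "CNVR-DUP": 99}
total = sum(cnvr_counts.values())  # 3,641 CNVRs
shares = {label: round(100 * n / total) for label, n in cnvr_counts.items()}
print(total, shares)  # 3641 {'CNVR-DEL': 90, 'CNVR-MIXED': 7, 'CNVR-DUP': 3}
```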

The composition of the CNVRs detected in this study differs from that reported by Lemos et al.¹² in Nellore cattle, whose CNVRs were more evenly split among the three event types (38.5% gains, 33% mixed, and 28.5% losses). It also diverges from the CNVRs reported by Silva et al.²² in Nellore cattle, of which 65% corresponded to mixed events, 31.5% to duplications, and only 2.89% to deletions. Although the CNVRs detected in Nellore cattle by Santana et al.¹⁴ were also attributed more often to copy number losses than to gains, the proportion of deletion CNVRs observed here is higher than theirs (41.6% deletions, 35% duplications, and 23% mixed events).

Using WGS data from Gyr cattle, Braga et al.²⁰ found that the CNVRs detected from CNVnator calls comprised more than 50% deletions, around 36% duplications, and 11% complex events (duplications and deletions). Our results are more consistent with the CNVRs those authors detected from Delly calls, which were attributed 70% to deletions, 7.8% to duplications, and 22.2% to complex events. Likewise, our findings agree with those of Hou et al.⁵² across different breeds, including taurine, indicine, and African cattle, in which most detected CNVRs were composed of loss events (61.3%), although the proportion of deletion regions observed in this study is still higher.

Given the total number of DEL calls obtained, a considerably higher number of deletion than duplication CNVRs was expected. Nevertheless, it is surprising that only a small fraction of duplications are common across bulls. Interestingly, this finding is similar to that reported by Letaief et al.²⁸ in French dairy and beef cattle, in which only 1.6% of CNVRs corresponded to duplications, 4.1% to complex events (deletions and duplications), and more than 95% to deletions. Our results also agree with a recent study in Simmental cattle combining short- and long-read WGS data, in which only 4.2% of CNVRs were duplication regions, 39.6% mixed, and more than 56% deletions¹⁹. Note that the duplications detected in this study are tandem duplications only, as the algorithms and bioinformatic tools used for SV calling cannot identify discontinuous duplications. Other factors that likely explain the small number of duplication calls relative to deletions were discussed earlier: the greater ability of SV callers to detect sequence losses in short-read sequencing data^43,45, the difficulty of distinguishing duplications from alignment artifacts⁴³, and molecular mechanisms such as non-allelic homologous recombination (NAHR) that may produce more deletions⁴².

BTA1 harbored the highest number of frequent SVRs (N = 261), followed by BTA6 and BTA2 with 211 SVRs each. BTA25, which carries the fewest SVs, also has the fewest SVRs (N = 47), followed by BTA19 and BTA22 with 67 SVRs each (Table 3; Fig. 7). Regarding CNVRs, CNVR-DEL and CNVR-MIXED were found on all 29 autosomes, whereas CNVR-DUP was absent from BTA20, BTA24, and BTA28 (Supplementary Figure S1). BTA1 harbored the highest number of CNVRs (N = 254), most of them deletion regions (N = 235 CNVR-DEL), followed by mixed (N = 12 CNVR-MIXED) and duplication regions (N = 7 CNVR-DUP) (Supplementary Table S3, Figure S1). This agrees with previous observations of CNVR distribution in Nellore cattle by Lemos et al.¹² and Silva et al.²², in which the highest number of CNVRs was located on BTA1, but differs in that the smallest proportion of CNVRs in those studies was located on BTA29 and BTA27, respectively. Our findings are similar to those of Letaief et al.²⁸, in which the highest and lowest numbers of CNVRs were on BTA1 and BTA25, respectively, and a similar distribution was observed by Peripolli et al.¹⁸ in two of the three locally adapted taurine breeds. Regarding inversion regions, BTA15 and BTA18 harbored the most INVRs (11 and 9, respectively), while BTA22 and BTA24 carried the fewest (N = 1). INVRs were found on all autosomes except BTA14 (Supplementary Table S3, Figure S2).

Table 3 Summary statistics for the number of SVRs, length, and proportion of chromosome (Chr) length covered by structural variant regions (SVRs) per autosome.


Average length and autosomal genome extension covered by SVRs

The 3,752 SVRs have an average length of 89.91 Kb (± 469.09) and a median of 2.08 Kb. INVRs have an average length of 396.44 Kb (± 951.03 Kb) (Table 4). For CNVRs, the length trend mirrors that of the individual events: deletion regions are the most numerous but the shortest (25.95 ± 232.35 Kb), followed by duplication regions (36.47 ± 91.23 Kb), while mixed-event regions (799 ± 1.24 Kb) are the longest CNVR type (Table 4).

Table 4 Summary statistics for the number, length, and genome coverage rate of each structural variant region (SVR) type (present in at least 5% of bulls).


The longest SVR spans 8,530.07 Kb, and all SVRs together span 337,348.3 Kb, representing 13.12% of the autosomal genome (11.40% attributed to CNVRs and 1.71% to INVRs) (Table 4). This agrees with Lemos et al.¹², who reported that CNVRs (regions with duplication and deletion events) predicted from SNP genotyping data covered 13% of the genome. Notably, the bulls analyzed by those authors belonged to the same Nellore population evaluated here, so similar genome coverage by structural variants was expected. However, some of our findings differ from theirs, mainly because of differences in the number of animals analyzed, the nature of the data, and the combination of strategies and algorithms used for SV calling.
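As a consistency check on Table 4, the total SVR length can be approximated from the number of regions and the mean length of each SVR type, and the mean SVR length recovered from the reported total. The snippet below uses only values stated in this section (the CNVR-MIXED mean of 799 Kb is taken as printed); small discrepancies reflect rounding of the published means, and the variable names are illustrative.

```python
# (number of regions, mean length in Kb) per SVR type, as reported in the text
svr_types = {
    "CNVR-DEL": (3286, 25.95),
    "CNVR-DUP": (99, 36.47),
    "CNVR-MIXED": (256, 799.0),
    "INVR": (111, 396.44),
}
REPORTED_TOTAL_KB = 337_348.3
REPORTED_COVERAGE_PCT = 13.12

n_svrs = sum(n for n, _ in svr_types.values())           # 3,752 SVRs
mean_len_kb = REPORTED_TOTAL_KB / n_svrs                  # ~89.91 Kb
approx_total_kb = sum(n * m for n, m in svr_types.values())
inv_share = svr_types["INVR"][0] * svr_types["INVR"][1] / approx_total_kb
inv_cov_pct = inv_share * REPORTED_COVERAGE_PCT           # ~1.71% of the autosomes
cnv_cov_pct = REPORTED_COVERAGE_PCT - inv_cov_pct         # ~11.41% (reported 11.40%)
# Autosomal length (Mb) implied by 337,348.3 Kb being 13.12% of it
implied_autosomes_mb = REPORTED_TOTAL_KB / (REPORTED_COVERAGE_PCT / 100) / 1000

print(n_svrs, round(mean_len_kb, 2), round(inv_cov_pct, 2), round(implied_autosomes_mb))
# 3752 89.91 1.71 2571
```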

The total genomic coverage of CNVRs found in this study was higher than that reported in other Nellore cattle populations. For example, Santana et al.¹⁴ observed a total CNVR coverage of 9% for predictions from SNP data, and Silva et al.²² reported an autosomal CNVR coverage of 1.5% for predictions from SNP and NGS data. Our coverage rate was also higher than those reported for other taurine and indicine breeds by Bickhart et al.¹⁰ (2.1%), Bickhart et al.²³ (3.1%), Keel et al.⁵³ (6.7%), Hu et al.⁴¹ (1.5%), and Sun et al.¹⁹ (0.18%), in which the reported CNVRs were those shared by diverse cattle breeds, including Nellore, Brahman, Gir, and different Bos taurus breeds.

Regarding size classes, most detected SVRs (73%) are 1–5 Kb long, 14% are 5–10 Kb long, 4% are 10–50 Kb long, 3% are 50–200 Kb long, and 6% are longer than 200 Kb (Table 5). CNVRs fall mainly into the 1–5 Kb and 5–10 Kb classes (Fig. 3a), while INVRs are more evenly distributed across three size classes (10–50 Kb, 50–200 Kb, and > 200 Kb; Fig. 8).
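Assigning regions to the Table 5 size classes is a simple binning step. The sketch below assumes half-open classes [lower, upper) in Kb, since the exact boundary handling is not stated here, and labels anything under 1 Kb separately; the names are hypothetical. For example, the median SVR (2.08 Kb) falls in the first class and the longest SVR (8,530.07 Kb) in the last.

```python
from bisect import bisect_right
from collections import Counter

# Lower bounds (Kb) of the Table 5 size classes; intervals assumed half-open [lower, upper)
BOUNDS_KB = [1, 5, 10, 50, 200]
LABELS = ["1-5 Kb", "5-10 Kb", "10-50 Kb", "50-200 Kb", ">200 Kb"]


def size_class(length_kb):
    """Assign an SVR length (Kb) to a size class."""
    idx = bisect_right(BOUNDS_KB, length_kb) - 1
    return "<1 Kb" if idx < 0 else LABELS[idx]


def class_percentages(lengths_kb):
    """Percentage of regions falling into each size class."""
    counts = Counter(size_class(x) for x in lengths_kb)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


print(size_class(2.08), size_class(8530.07))  # 1-5 Kb >200 Kb
```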

Table 5 Number and proportion of each type of detected SVR according to different size classes.


The five chromosomes with the longest CNVRs on average are BTA11, 26, 21, 2, and 24, with average lengths ranging from 116.22 to 186.99 Kb (Supplementary Figure S3a). Conversely, the five with the shortest CNVRs are BTA6, 16, 25, 17, and 1, with average lengths ranging from 34.23 to 49.5 Kb. Notably, although BTA1 ranks first in number of CNVRs, its CNVRs are short. This supports the Pearson correlation test, which revealed a trend for chromosomes carrying more SVRs to carry shorter SVRs (r = -0.37, p-value = 0.04976). As for inversions, the longest INVRs are carried by BTA29, 10, 7, 12, and 20, while the shortest are located on BTA22, 2, 16, 1, and 6 (Supplementary Figure S4a).

Regarding the proportion of each chromosome covered by SVRs, BTA29 is the most covered (chromosomal coverage rate of 30.53%), followed by chromosomes 26, 28, 20, and 21, with coverage rates ranging from 22.46 to 24.39%. Conversely, BTA25 has the lowest chromosomal coverage rate (5.21%), followed by BTA6, 16, 17, and 1, with coverage rates ranging from 5.82 to 8.07% (Fig. 9; Table 3). BTA26 is the chromosome most covered by CNVRs (24.34%), while BTA25 is the least covered (4.68%) (Supplementary Figure S3b).

The maximum chromosomal coverage observed in this study is higher than the maximum reported by Lemos et al.¹² (24.26%) in Nellore cattle, and the chromosomal distribution of coverage rates also differs from theirs: Lemos et al.¹² identified BTA19, BTA23, and BTA25 as the most covered chromosomes and BTA9, BTA13, and BTA8 as the least covered by CNVRs. The results also differ from those of Silva et al.²² in Nellore cattle, where CNVRs on BTA24 had the greatest total length and those on BTA19 the shortest. In our study, the highest chromosomal coverage by INVRs was observed on BTA29, whereas BTA22 was only sparsely covered by inversions (Supplementary Figure S4b). Notably, BTA29 harbored neither the highest number of SVRs nor the longest SVRs. Rather, its high coverage can be attributed to two large inversions combined with its relatively short length, which increases the proportion of its sequence affected by SVs. Our results coincide with those of Letaief et al.²⁸ in French cattle, where chromosomes 25 and 1 were the least covered by structural variants. Pearson correlation analysis revealed a significant, strong, and positive association between the number of SVRs and chromosome size (r = 0.93, p-value = 4.84e−13), consistent with the findings of Letaief et al.²⁸ and Silva et al.²². In contrast, no significant correlation was detected between chromosome size and the proportion of chromosomal length covered by SVRs (chromosomal coverage rate).

Functional annotation and enrichment of SVR

Most SVRs were present in fewer than 50% of the bulls (Fig. 10), but a considerable number (N = 532) were shared by more than half of the studied population. To assess the functional impact of structural variation in Nellore cattle, genes and QTLs overlapping SVRs shared by more than 50% of the bulls (N > 75) were retrieved from publicly available genomic databases, as described in the Materials and Methods section. Figure 11 shows the chromosomal location of these 532 most frequent SVRs. Of them, 96 overlapped a total of 130 QTLs, distributed across 6 QTL types (exterior, health, meat and carcass, milk, production, and reproduction) and related to 50 traits (Supplementary Table S4). Figure 12 shows the number of SVRs overlapping QTLs by QTL type and associated trait.
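The majority filter and the overlap annotation can be sketched as follows. This is a brute-force illustration with hypothetical names and coordinates (1-based, inclusive), not the annotation workflow used in the study; for genome-scale gene and QTL files, a tool such as `bedtools intersect` or an interval tree would be the usual choice. With 151 bulls, "more than half" means more than 75.5, i.e. at least 76 carriers.

```python
from dataclasses import dataclass

N_BULLS = 151


@dataclass(frozen=True)
class Region:
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int


def overlaps(a, b):
    """True if two regions on the same chromosome share at least one base pair."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def is_majority(n_carriers, n_bulls=N_BULLS):
    """More than half of the bulls: > 75.5, i.e. at least 76 of 151."""
    return n_carriers > n_bulls / 2


def annotate(svrs, features):
    """Map each SVR name to the names of the features (QTLs or genes) it overlaps."""
    return {s.name: [f.name for f in features if overlaps(s, f)] for s in svrs}


# Hypothetical SVR and QTL intervals, for illustration only
svr = Region("svr_example", "6", 72_920_000, 74_150_000)
qtls = [Region("qtl_a", "6", 73_000_000, 73_100_000),
        Region("qtl_b", "6", 80_000_000, 80_500_000)]
print(annotate([svr], qtls))  # {'svr_example': ['qtl_a']}
```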

The highest number of SVRs overlapped QTLs related to residual feed intake (N = 21), followed by structural soundness (N = 16), multiple births (N = 9), clinical mastitis (N = 7), and milk energy yield (N = 7). SVRs overlapping QTLs were found on all chromosomes except BTA24. BTA6 harbored the largest number of QTLs overlapping SVRs (N = 11), followed by BTA7 (N = 8) and BTA3, 4, 5, and 23, with 7 QTLs each. In addition, the region CNVR1799 (BTA6:72.92–74.15 Mb), present in 75 bulls, overlapped 6 QTLs of two types: production (residual feed intake) and exterior (teat placement, foot angle, udder quality, udder depth, and degree of spotting). Four INVRs overlapped QTLs: INVR27 (BTA1:39.48–39.53 Mb), overlapping a QTL related to structural soundness; INVR131 (BTA3:20.94–22.58 Mb), overlapping a QTL related to milk yield; INVR549 (BTA15:78.78–80.14 Mb), overlapping a QTL related to liver abscess; and INVR557 (BTA16:3.11–3.12 Mb), overlapping a QTL related to structural soundness. Most of the 96 SVRs overlapped only one QTL.

Regarding gene annotation, 205 of the 532 most frequent SVRs overlapped known genes (Supplementary Table S5), and a total of 1,164 positional candidate genes surrounding the SVRs were annotated. The most gene-rich CNVR, CNVR3428 (BTA13:57.35–57.36 Mb), harbored 84 genes, followed by CNVR4347 (BTA18:61.87–61.90 Mb) with 58 genes, and CNVR1601 (BTA5:112.69–112.71 Mb), CNVR4532 (BTA20:33.44–33.51 Mb), and CNVR4703 (BTA21:31.13–31.19 Mb), with 51 genes each (Supplementary Table S5).

Among the SVRs overlapping positional candidate genes, 7 were inversion regions, of which INVR549 (BTA15:78.97–78.98 Mb) was the most gene-rich, harboring 44 genes, followed by INVR131 (BTA3:20.94–22.58 Mb), which harbored 24 positional candidate genes (Supplementary Table S5). SVRs carrying known genes were found on all chromosomes, and the largest number of annotated genes was harbored by BTA21 (N = 116), followed by BTA5, 2, 13, and 18, with 109, 105, 100, and 78 annotated genes, respectively.

PANTHER overrepresentation analysis (binomial test with Bonferroni correction) showed that, of the 1,164 positional candidate genes overlapping the most frequent SVRs, 695 were significantly enriched in terms from the three Gene Ontology categories and in one biochemical pathway (Fig. 13). For biological process (BP), 632 genes were grouped into 15 significant GO terms (Supplementary Table S6); for molecular function (MF), 139 genes into 4 significant GO terms (Supplementary Table S7); and for cellular component (CC), 236 genes into 2 significant GO terms (Supplementary Table S8). In addition, 26 genes were significantly associated with a biochemical pathway (Supplementary Table S9).
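The logic of a binomial overrepresentation test with Bonferroni correction can be sketched as follows: the fraction of reference genes annotated to a term gives the expected count in the gene list, the upper binomial tail gives the raw p-value, and that p-value is multiplied by the number of terms tested. This is a generic one-sided sketch with hypothetical counts, not PANTHER's own implementation, which may differ in details such as sidedness; the tail is summed in log space so that large gene lists do not overflow.

```python
from math import exp, lgamma, log, log1p


def binom_sf_inclusive(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log_norm = lgamma(n + 1)
    return sum(
        exp(log_norm - lgamma(i + 1) - lgamma(n - i + 1) + i * log(p) + (n - i) * log1p(-p))
        for i in range(max(k, 0), n + 1)
    )


def overrepresentation(hits_in_list, list_size, hits_in_ref, ref_size, n_tests):
    """One-sided binomial test for over-representation of a term, Bonferroni-adjusted."""
    p_ref = hits_in_ref / ref_size            # fraction of reference genes in the term
    expected = list_size * p_ref              # hits expected by chance in the gene list
    p_raw = binom_sf_inclusive(hits_in_list, list_size, p_ref)
    p_bonferroni = min(1.0, p_raw * n_tests)  # multiply by the number of terms tested
    return expected, p_raw, p_bonferroni


# Hypothetical counts (not from the paper): 63 of 1,164 list genes fall in a term
# that contains 400 of 20,000 reference genes, with 1,000 terms tested.
expected, p_raw, p_adj = overrepresentation(63, 1164, 400, 20_000, 1000)
print(round(expected, 2), p_adj < 0.05)  # 23.28 True
```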

Regarding BP GO terms, the enriched genes were significantly associated with cellular process (GO:0009987, N = 49), biological regulation (GO:0065007, N = 386), regulation of biological process (GO:0050789, N = 369), regulation of cellular process (GO:0050794, N = 352), and response to stimulus (GO:0050896, N = 236). Enriched genes were also associated with biological processes related to cell signaling and response, including cell communication (GO:0007154, N = 195), signaling (GO:0023052, N = 194), signal transduction (GO:0007165, N = 184), G protein-coupled receptor signaling pathway (GO:0007186, N = 72), nervous system process (GO:0050877, N = 61), sensory perception of chemical stimulus (GO:0007606, N = 57), sensory perception (GO:0007600, N = 57), cellular response to stimulus (GO:0051716, N = 31), cellular response to endogenous stimulus (GO:0071495, N = 32), and response to endogenous stimulus (GO:0009719, N = 31) (Supplementary Table S6).

For molecular function, 63 genes were significantly associated with odorant binding (GO:0005549), 129 with transmembrane signaling receptor activity (GO:0004888), 130 with signaling receptor activity (GO:0038023), and 139 with molecular transducer activity (GO:0060089) (Supplementary Table S7). For cellular component, 32 genes were significantly associated with organelle subcompartment (GO:0031984) and 230 with membrane (GO:0016020) (Supplementary Table S8). Finally, 26 genes were significantly associated with the phospholipid metabolism pathway (Supplementary Table S9). Notably, the enriched set includes 91 genes of the olfactory receptor (OR) family, which overlapped SVRs located on BTA3, 5, 7, 15, 23, and 29. Many of these OR genes are harbored by inversion regions: 39 are located in INVR549 (BTA15:78.78–80.14 Mb; Supplementary Table S5), which is shared by 76 bulls and overlaps a health-type QTL (liver abscess; Supplementary Table S4), while INVR131 (BTA3:20.94–22.58 Mb; Supplementary Table S5) and INVR240 (BTA5:58.73–58.94 Mb; Supplementary Table S5) harbor 2 OR genes each and are shared by 83 and 86 bulls, respectively. INVR131 also overlaps a milk-type QTL related to milk yield (Supplementary Table S4). The remaining enriched OR genes are mainly harbored by mixed copy number regions (deletions and duplications), such as CNVR1946 (BTA7:8.71–10.10 Mb; Supplementary Table S5), which harbors 19 OR genes, is common to 101 bulls, and overlaps a production-type QTL related to residual feed intake (Supplementary Table S4). Similarly, 15 OR genes are located in CNVR5606 (BTA29:27.14–27.60 Mb; Supplementary Table S5), which is present in 87 bulls, and 7 OR genes are carried by CNVR4958 (BTA23:29.21–29.32 Mb; Supplementary Table S5), common to 76 bulls.
Five other enriched OR genes overlapped the mixed region CNVR1465 (BTA5:58.63–58.91 Mb; Supplementary Table S5), which is common to 102 bulls and located near a reproduction QTL related to multiple births. The OR5M10C gene is located in CNVR4931 (BTA23:17.75–22.14 Mb; Supplementary Table S5), which overlaps two reproduction QTLs (multiple births and interval to first estrus after calving; Supplementary Table S4) and one health-type QTL related to complement hemolytic activity (Supplementary Table S4). Finally, the OR5AE3 gene is harbored by the deletion region CNVR2043 (BTA7:40.99–41.03 Mb), which is common to 82 bulls.

Several authors have reported significant associations between SNPs located near OR genes and economically important traits in beef cattle. For example, the OR5M10 gene, found in this study within INVR549, was reported by Higgins et al.⁵⁴ near SNP rs136789347, which was significantly associated with average daily gain (ADG) in Irish beef cattle breeds. Here, OR5M10 was annotated to 12 of the 15 enriched BP GO terms and to all enriched MF GO terms (Supplementary Tables S6 and S7). Seabury et al.⁵⁵ also reported significant associations of SNPs near OR genes with average daily gain and residual feed intake in three U.S. beef cattle breeds. In Nellore cattle, Olivieri et al.⁵⁶ found 3 OR genes in a genomic region on BTA15 that explained more than 1% of the additive genetic variance of dry matter intake (DMI).

The mixed region CNVR1465 (BTA5:58.63–58.91 Mb) carried the genes OR6C1, OR6C1N, and OR6C1S, which Santana et al.¹⁴ also found in CNV regions significantly associated with feed conversion ratio in Nellore cattle in a genotype-based study. These genes were annotated to the enriched MF GO terms transmembrane signaling receptor activity (GO:0004888), signaling receptor activity (GO:0038023), and molecular transducer activity (GO:0060089) (Supplementary Table S7). This mixed CNV region overlapped a reproduction QTL related to multiple births. Although most studies in the literature report a functional role of the OR gene family in feed efficiency-related traits, some members of this family have also been associated with reproductive performance in cattle⁵⁷. OR genes are also involved in sperm chemotaxis and in the development of female germ cells and oocytes⁵⁸.

Similarly, the genes OR7A112, OR7A78, OR7A95, OR7A97, and OR7A99, harbored by the mixed region CNVR1946, were also reported by Braga et al.²⁰ within CNVRs in Brazilian dairy Gyr cattle, based on SNP and WGS data. In this study, those genes were annotated to 8 of the 15 enriched BP GO terms, 3 MF GO terms, and the CC GO term membrane (GO:0016020). Among them, the OR795 gene was also reported by Rafter et al.⁵⁹ as flanking a deletion region significantly associated with meat and carcass traits in Charolais, Holstein, and Limousin cattle.

OR genes constitute the largest gene superfamily in the mammalian genome. They encode G protein-coupled chemoreceptors (GPCRs) and are expressed mainly in the numerous olfactory sensory neurons of the nasal cavity^60,61. These protein-coding genes mediate the reception of a wide range of chemical molecules from the environment⁶². Olfaction is essential for animals because it sends signals to the brain that help them locate food sources, danger, or potential mates⁶². Beyond the nasal cavity, olfactory receptors can be expressed in other organs, where they bind nutrients and metabolites to trigger physiological responses such as changes in gut motility, ventilation rate, and cell migration⁶¹. OR activity is mediated by the central nervous system and influenced by hormonal, nutritional, and metabolic mechanisms regulating energy balance, which trigger behaviors such as feed searching and intake^63,64,65.

The structure of the OR gene family has been well studied in several vertebrate species. There is evidence that the OR gene family expanded through duplication mechanisms, which have ensured its maintenance in vertebrate genomes^1,66. Lee et al.⁶² observed large variation in OR gene copy number in Korean Holstein and Angus cattle. Using PCR, those authors found that 40% of the tested OR loci exhibited CNVs, suggesting that copy number variation of OR genes is highly frequent in cattle and that this diversity confers differences in olfactory capacity and, consequently, in fitness. Although duplication has been described as the main mechanism maintaining OR genes, it is particularly interesting that the inversion regions detected in this study harbored so many genes of the olfactory receptor superfamily. Given the functional relevance of OR genes for adaptation and fitness, it is plausible that, under selection pressure, favorable OR alleles located within inversions were protected from recombination, thereby contributing to the adaptation of Nellore cattle to their environmental conditions.

Another important gene family overlapping the most frequent SVRs is the β-defensin gene family (DEFB), of which 6 genes (DEFB1, DEFB10, DEFB119, DEFB124, DEFB13, and DEFB7) were annotated to 11 enriched BP GO terms, mainly related to cell communication, signaling, and response to stimulus (Supplementary Table S6). Defensin genes are protein-coding genes with key functions in the innate immune system of multicellular organisms. Besides their antimicrobial activity, defensins act in diverse biological mechanisms in animals, such as cell signaling, ion channel transport, toxic functions, and enzyme inhibition⁶⁷. Like olfactory receptor genes, immunity-related genes are known to be duplicated more often than other genes. Our results agree with those of Butty et al.²¹, who observed members of the DEFB family in CNV regions predicted from genotypes and WGS data in a Holstein population.

Similarly, in a population of Holstein and Jersey cattle, Chen et al.⁴⁵ found that genes belonging to expanded gene families, such as DEFB, were more often associated with SV regions than other genes. In particular, DEFB1, found here within the mixed copy number region CNVR5333 (BTA27:6.18–7.28 Mb), was also reported in both of these studies. DEFB1 was annotated to the enriched GO terms cellular response to endogenous stimulus, signaling receptor activity, and molecular transducer activity. Other genes near structural variant regions have previously been associated with economically important phenotypes. For example, members of the secretory phospholipase A2 gene family (PLA2G2A, PLA2G2D1, PLA2G2E, and PLA2G5), harbored by the mixed copy number region CNVR717 (BTA2:132.53–133.81 Mb), were also found by Berton et al.⁶⁸ within CNV regions significantly associated with meat and carcass traits in Nellore cattle. In addition, four adhesion G protein-coupled receptor genes (ADGRF1, ADGRF2, ADGRF4, and ADGRF5) were harbored by the mixed region CNVR4931 (BTA23:17.75–22.14 Mb), which overlaps one health QTL related to complement hemolytic activity and two reproduction QTLs related to multiple births and interval to first estrus after calving (Supplementary Table S4). Li et al.⁶⁹ previously reported these genes as significantly associated with feed efficiency indicator traits in Canadian crossbred cattle, such as residual feed intake, daily dry matter intake, average daily gain, and metabolic daily gain.

A large set of genes from the zinc finger protein family was also found near the SV regions. In particular, the mixed regions CNVR4338 (BTA18:57.18–60.17 Mb) and CNVR4347 (BTA18:61.81–63.40 Mb) harbored several members of this family, such as ZNF524, ZNF581, and ZNF677, which Olivieri et al.⁵⁶ previously associated with dry matter intake, average daily gain, and feed efficiency in an experimental Nellore population. Zinc finger proteins usually regulate gene expression by binding to specific DNA sequences through their zinc finger domains and recruiting cofactors that mediate chromatin remodeling and transcriptional repression or activation. In an in silico analysis of olfactory receptor promoter regions in Bos indicus cattle, Samuel & Dinka⁷⁰ found that ZNF genes are likely involved in regulating the expression of olfactory receptor genes. Thus, ZNF genes may also play roles in cell signaling and communication through olfaction, which is essential for environmental adaptation. Overall, the genes identified around SV regions are mainly involved in biological and cellular regulation, response to stimulus, and cell communication and signaling, functions that are essential drivers of the environmental adaptation of genomes. Previous studies in Nellore cattle support most of the genes described here as potential candidates for feed efficiency indicator traits, reflecting biochemical mechanisms that may have contributed to improved fitness.

Functional insights from genes related to high-frequency SVRs

Adaptation to tropical regions is one of the most significant challenges for cattle breeds, requiring a coordinated response to persistent heat stress, seasonal nutritional limitations, and pathogen exposure. By analyzing the 532 most frequent SVRs in Nellore cattle, we identified genomic regions enriched for genes and QTLs associated with thermotolerance, immune function, metabolic regulation, and tissue resilience, all key attributes for tropical adaptation (Supplementary Tables S4 and S10).

The ability to cope with elevated ambient temperatures is one of the most important attributes of cattle raised in tropical regions. We found SVRs encompassing genes related to cellular protection and heat dissipation, both important for thermal tolerance (Supplementary Table S10). Notably, HSP90AB1 (BTA 23, 17.75–17.76 Mb), a heat shock protein essential for maintaining proteostasis under thermal stress, has been directly associated with enhanced thermotolerance in cattle populations⁷¹,⁷². Similarly, PRLR (BTA 20, 38.97–39.10 Mb), particularly through its slick-hair variants, promotes shorter hair coats that improve heat loss through convection and radiation, an adaptation widely documented in tropical breeds⁷³,⁷⁴. Genes such as THRB (BTA 27, 41.46–41.90 Mb) and MC4R (BTA 24, 59.16–59.17 Mb), which modulate metabolic rate and appetite, respectively, may play a pivotal role in limiting endogenous heat production during thermal stress⁷⁵⁻⁷⁷, while ion channel genes such as KCNMA1 (BTA 28, 32.62–33.39 Mb) may facilitate vasodilation and peripheral cooling⁷⁸.

The capacity to mitigate oxidative damage, often a by-product of heat exposure, may be influenced by SVRs mapping to EGLN1 (BTA 28, 4.09–4.15 Mb), a key regulator of the hypoxia-inducible factor (HIF) pathway, as well as to NFE2L2 (BTA 2, 19.62–19.65 Mb), JUN (BTA 3, 87.27–87.27 Mb), and ATF2 (BTA 2, 21.74–21.79 Mb), all of which are involved in activating antioxidant and cellular stress responses⁷⁸⁻⁸⁰. DEFB1 (BTA 27, 6.72–6.72 Mb) may confer additional cutaneous resilience in hot, pathogen-rich environments, while energy sensing mediated by PRKAA1 (AMPK) (BTA 20, 33.67–33.70 Mb) enables rapid metabolic adaptation under heat load⁸¹,⁸².

Beyond thermal regulation, SVRs also surround immune competence genes involved in traits important for adaptation to tropical production systems. SVRs overlapped the BoLA complex, a cornerstone of adaptive immunity whose variation has been repeatedly linked to resistance against mastitis, tick infestation, and other tropical diseases⁸³. A set of innate immune genes (ALCAM, SYK, NLRP2, and FCAR) was identified within SVRs, together with immunoregulatory loci such as LAIR1, SIGLEC10, PILRA, and TNIP3. These immune genes may contribute to a balanced immune response by mitigating chronic inflammation in challenging environments⁸⁴,⁸⁵. Genes regulating hematopoiesis and lymphocyte development, including GATA2, SP110, IL7R, LIFR, and OSMR, were also found⁸⁶⁻⁸⁸. In addition, SLC11A2, which limits iron availability to intracellular pathogens, points to a possible role of nutritional immunity in disease resistance⁸⁹.

Tropical forage systems often show marked seasonal fluctuations in nutrient content and availability. In this context, SVRs were enriched for genes supporting metabolic flexibility and nutritional efficiency. MC4R and PRKAA1 play central roles in regulating feed intake and energy homeostasis⁷⁶. At the cellular level, nutrient sensing and growth coordination are mediated by IGF1R (BTA 21, 7.78–8.08 Mb), RPTOR (BTA 19, 51.69–52.01 Mb), and RICTOR (BTA 20, 35.36–35.50 Mb), key components of anabolic signaling pathways⁹⁰,⁹¹. Genes such as SREBF2 (BTA 5, 112.83–112.89 Mb), PCK1 (BTA 13, 58.61–58.65 Mb), and GPD1 (BTA 22, 6.59–6.69 Mb) support lipid and glucose metabolism under nutritional stress⁹²⁻⁹⁴, while detoxification enzymes encoded by ALDH4A1 (BTA 2, 133.76–133.79 Mb), AOX1 (BTA 2, 89.10–89.17 Mb), and AHR (BTA 4, 25.81–25.86 Mb) may facilitate adaptation to the bioactive compounds found in diverse tropical forages⁹⁵.

Transcription factors located within SVRs, such as MORC3, ETS2, JUN, GATA2, and NFE2L2 (Supplementary Table S10), regulate the expression of gene sets activated by environmental stressors⁹⁶⁻⁹⁹. Epigenetic regulators such as MYSM1 (BTA 3, 87.35–87.39 Mb), transcriptional repressors such as SIN3A (BTA 21, 33.38–33.45 Mb), and stress regulators such as PML (BTA 21, 34.58–34.63 Mb) may further fine-tune immune and metabolic responses¹⁰⁰,¹⁰¹. Moreover, the presence of the circadian regulator NR1D2 (REV-ERBβ) (BTA 27, 41.99–42.01 Mb) suggests that aligning metabolic activity with environmental light-dark cycles may also contribute to tropical resilience¹⁰².

Maintaining cellular integrity under chronic environmental stress is another key adaptation. SVRs overlapping XRCC6, BLM, FANCI, and RUVBL1 point to DNA repair mechanisms that may prevent genomic instability under oxidative and thermal stress¹⁰³⁻¹⁰⁶. Tissue health may be further supported by autophagy regulators (WDFY3 and OMA1) and anti-apoptotic factors such as BCL2L1, while redox regulators such as TXNIP, NOX4, and SESN3 help maintain intracellular oxidative balance¹⁰⁷⁻¹¹².

The structural integrity and plasticity of tissues, especially skin and connective tissue, may be supported by genes within SVRs, including FSTL1, ADAMTS2, COL15A1, HMCN1, BMP7, XYLT1, FAT4, and CTNNA1. These genes contribute to extracellular matrix remodeling and efficient wound repair, which may enhance resistance to ectoparasites and mechanical injury in extensive grazing systems¹¹¹⁻¹¹⁵. Together, these results suggest that SVRs in Nellore cattle capture functionally important genomic regions related to heat resilience, immune defense, metabolic adaptability, and tissue robustness. Incorporating this layer of structural genomic information into tropical breeding programs holds substantial promise for accelerating the development of resilient, high-performing animals suited to increasingly demanding environments.

Conclusion

This study provides a comprehensive assessment of structural variation in the Nellore cattle genome, using WGS-based calling algorithms to detect deletions, duplications, and inversions. We identified 215,031 structural variants (SVs), including 183,032 deletions, 14,013 duplications, and 17,986 inversions, highlighting the widespread contribution of mobile elements to the genomic architecture of the Nellore population in Brazil. To our knowledge, this is the first report of inversion events in Nellore cattle. Among the non-redundant SV regions (SVRs), 3,752 were present in at least 5% of the bulls, and 532 were common to more than 50% of the key ancestors analyzed. Of these 3,752 SVRs, 97% mapped to copy number variation regions (CNVRs) and 3% to inversion regions (INVRs). Among the CNVRs, 90% were deletion regions, 3% were duplication regions, and 7% contained both.
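The region counts above rest on two steps: collapsing overlapping SV calls from individual bulls into non-redundant regions, and filtering those regions by the fraction of bulls that carry them. The Python sketch below shows one simple way to do both. It assumes that calls of the same class (CNV or inversion) overlapping by at least one base are merged, which may differ from the exact merging rule used in this study, and its input calls, bull identifiers, and coordinates are hypothetical. Only the thresholds (at least 5% of bulls; more than 50% for common regions), the 151-bull sample size, and the deletion/duplication/mixed labels come from the study.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SVCall:
    bull: str     # hypothetical animal identifier
    chrom: str
    start: int    # inclusive, base pairs
    end: int      # inclusive, base pairs
    svtype: str   # "DEL", "DUP" or "INV"


@dataclass
class SVRegion:
    chrom: str
    cls: str      # "CNV" (deletions/duplications) or "INV"
    start: int
    end: int
    types: set = field(default_factory=set)
    bulls: set = field(default_factory=set)


def merge_calls(calls):
    """Merge overlapping calls of the same class on the same chromosome into regions."""
    groups = defaultdict(list)
    for c in calls:
        groups[(c.chrom, "INV" if c.svtype == "INV" else "CNV")].append(c)
    regions = []
    for (chrom, cls), group in groups.items():
        group.sort(key=lambda c: (c.start, c.end))
        current = None
        for c in group:
            if current is not None and c.start <= current.end:
                current.end = max(current.end, c.end)          # extend the open region
            else:
                current = SVRegion(chrom, cls, c.start, c.end)  # start a new region
                regions.append(current)
            current.types.add(c.svtype)
            current.bulls.add(c.bull)
    return regions


def label(region):
    """INVR for inversions; CNVRs split into deletion, duplication or mixed."""
    if region.cls == "INV":
        return "INVR"
    if region.types == {"DEL", "DUP"}:
        return "CNVR (mixed)"
    return "CNVR (deletion)" if region.types == {"DEL"} else "CNVR (duplication)"


def summarize(regions, n_bulls, min_freq=0.05, common_freq=0.5):
    """Keep regions carried by at least min_freq of bulls; flag those above common_freq."""
    rows = []
    for r in regions:
        freq = len(r.bulls) / n_bulls
        if freq >= min_freq:
            rows.append((r.chrom, r.start, r.end, label(r), round(freq, 4), freq > common_freq))
    return rows


# Hypothetical calls: 8 bulls share a deletion (one also carries an overlapping
# duplication) on chromosome 1, and 7 bulls share an inversion on chromosome 2.
calls = (
    [SVCall(f"bull{i}", "1", 1_000_000 + i * 1_000, 1_100_000, "DEL") for i in range(8)]
    + [SVCall("bull0", "1", 1_050_000, 1_200_000, "DUP")]
    + [SVCall(f"bull{i}", "2", 5_000_000, 5_100_000, "INV") for i in range(7)]
)

if __name__ == "__main__":
    for row in summarize(merge_calls(calls), n_bulls=151):
        print(row)
```

With 151 bulls as the denominator, the hypothetical chromosome 1 region is carried by 8 bulls (about 5.3%), passes the 5% filter, and is labeled mixed because it contains both a deletion and a duplication; the inversion carried by 7 bulls (about 4.6%) is filtered out. Neither region reaches the 50% threshold used for common SVRs.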

Functional annotation of these high-frequency SVRs identified 1,160 positional candidate genes overlapping QTLs involved in thermotolerance, immune regulation, metabolic flexibility, and tissue integrity, all essential components of adaptation to tropical environments. These functionally relevant variants underline the importance of structural genomic diversity in shaping adaptive phenotypes under environmental stressors such as heat, nutrient fluctuation, and pathogen pressure. These findings improve our understanding of the genetic mechanisms underlying environmental adaptation in Nellore cattle, pointing to structural changes in functionally significant genes. Including SV-based information in genomic evaluations may improve the accuracy of selection for environmentally adaptive traits, complementing SNP-based analyses and offering new opportunities for precision breeding in beef cattle.

Data availability

The data that support the findings of this study are available from the Gensys breeding program (https://gensys.com.br/), but restrictions apply to the availability of these data, which were used under license for the current study, and so they are not publicly available. Data are, however, available from the authors (Dr. Lucia G. Albuquerque, email: galvao.albuquerque@unesp.br) upon reasonable request and with permission of the Gensys breeding program.

References

Bickhart, D. M. & Liu, G. E. The challenges and importance of structural variation detection in livestock. Front. Genet. 5, 77142 (2014).
Article Google Scholar
Fernandes Júnior, G. A. et al. Whole-genome sequencing provides new insights into genetic mechanisms of tropical adaptation in Nellore (Bos primigenius indicus). Scientific Reports 10, 1–7 (2020). (2020).
Ho, S. S., Urban, A. E. & Mills, R. E. Structural variation in the sequencing era. Nature Reviews Genetics 21, 171–189 (2020). (2019).
Alkan, C., Coe, B. P. & Eichler, E. E. Genome structural variation discovery and genotyping. Nature Reviews Genetics 12, 363–376 (2011). (2011).
Escaramís, G., Docampo, E. & Rabionet, R. A decade of structural variants: description, history and methods to detect structural variation. Brief. Funct. Genomics. 14, 305–314 (2015).
Article PubMed Google Scholar
Zhang, F., Gu, W., Hurles, M. E. & Lupski, J. R. Copy number variation in human health, disease, and evolution. Annu. Rev. Genomics Hum. Genet. 10, 451–481 (2009).
Article PubMed PubMed Central CAS Google Scholar
Freeman, J. L. et al. Copy number variation: new insights in genome diversity. Genome Res. 16, 949–961 (2006).
Article PubMed CAS Google Scholar
Couldrey, C. et al. Detection and assessment of copy number variation using PacBio long-read and illumina sequencing in new Zealand dairy cattle. J. Dairy. Sci. 100, 5472–5478 (2017).
Article PubMed CAS Google Scholar
Conrad, D. F. et al. Origins and functional impact of copy number variation in the human genome. Nature 2009 464:7289 464, 704–712 (2010).
Bickhart, D. M. et al. Copy number variation of individual cattle genomes using next-generation sequencing. Genome Res. 22, 778–790 (2012).
Article PubMed PubMed Central CAS Google Scholar
Zhao, M., Wang, Q., Wang, Q., Jia, P. & Zhao, Z. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives. BMC Bioinform. 14, 1–16 (2013).
Article Google Scholar
Lemos, M. V. A. et al. Association study between copy number variation and beef fatty acid profile of Nellore cattle. J. Appl. Genet. 59, 203–223 (2018).
Article PubMed Google Scholar
Silva, V. H. et al. Genome-Wide detection of CNVs and their association with meat tenderness in Nelore cattle. PLoS One. 11, e0157711 (2016).
Article PubMed PubMed Central Google Scholar
Santana, M. H. A. et al. Copy number variations and genome-wide associations reveal putative genes and metabolic pathways involved with the feed conversion ratio in beef cattle. J. Appl. Genet. 57, 495–504 (2016).
Article CAS Google Scholar
Zhou, Y. et al. Genome-wide CNV analysis reveals variants associated with growth traits in Bos indicus. BMC Genom. 17, 1–9 (2016).
Article CAS Google Scholar
Chen, L., Pryce, J. E., Hayes, B. J. & Daetwyler, H. D. Investigating the effect of imputed structural variants from Whole-Genome sequence on Genome-Wide association and genomic prediction in dairy cattle. Animals 11, 541 (2021).
Article PubMed PubMed Central Google Scholar
Durán Aguilar, M. et al. Genome-wide association study for milk somatic cell score in Holstein cattle using copy number variation as markers. J. Anim. Breed. Genet. 134, 49–59 (2017).
Article PubMed Google Scholar
Peripolli, E. et al. Assessment of copy number variants in three Brazilian locally adapted cattle breeds using whole-genome re-sequencing data. Anim. Genet. 54, 254–270 (2023).
Article PubMed CAS Google Scholar
Sun, T. et al. Whole genome sequencing of simmental cattle for SNP and CNV discovery. BMC Genom. 24, 1–11 (2023).
Article Google Scholar
Braga, L. G. et al. Identification of copy number variations in the genome of dairy Gir cattle. PLoS One. 18, e0284085 (2023).
Article PubMed PubMed Central CAS Google Scholar
Butty, A. M. et al. High confidence copy number variants identified in Holstein dairy cattle from whole genome sequence and genotype array data. Sci. Rep. 10 (1), 10–11 (2020). (2020).
Article Google Scholar
Silva, J. M. et al. Genome-wide copy number variation (CNV) detection in Nelore cattle reveals highly frequent variants in genome regions harboring QTLs affecting production traits. BMC Genom. 17, 1–14 (2016).
Article Google Scholar
Bickhart, D. M. et al. Diversity and population-genetic properties of copy number variations and multicopy genes in cattle. DNA Res. 23, 253–262 (2016).
Article PubMed PubMed Central CAS Google Scholar
Boussaha, M. et al. Genome-Wide study of structural variants in bovine holstein, Montbéliarde and Normande dairy breeds. PLoS One. 10, e0135931 (2015).
Article PubMed PubMed Central Google Scholar
Li, H. et al. The sequence alignment/map format and samtools. Bioinformatics 25, 2078–2079 (2009).
Article PubMed PubMed Central Google Scholar
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Article PubMed PubMed Central CAS Google Scholar
Fernandes Júnior, G. A. et al. Imputation accuracy to whole-genome sequence in Nellore cattle. Genet. Selection Evol. 53, 1–10 (2021).
Article Google Scholar
Letaief, R. et al. Identification of copy number variation in French dairy and beef breeds using next-generation sequencing. Genet. Sel. Evol. 49, 77 (2017).
Article PubMed PubMed Central Google Scholar
Zarate, S. et al. Parliament2: accurate structural variant calling at scale. Gigascience 9, (2020).
Abyzov, A., Urban, A. E., Snyder, M., Gerstein, M. & CNVnator An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 21, 974–984 (2011).
Article PubMed PubMed Central CAS Google Scholar
Layer, R. M., Chiang, C., Quinlan, A. R. & Hall, I. M. LUMPY: A probabilistic framework for structural variant discovery. Genome Biol. 15, 1–19 (2014).
Article Google Scholar
Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).
Article PubMed PubMed Central CAS Google Scholar
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, (2021).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Article PubMed PubMed Central CAS Google Scholar
Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444–454 (2006).
Wickham, H. et al. Welcome to the tidyverse. J. Open. Source Softw. 4, 1686 (2019).
Article ADS Google Scholar
Peterson, B. G., Carl, P. & PerformanceAnalytics Econometric Tools for Performance and Risk Analysis. CRAN: Contributed Packages Preprint at (2007). https://doi.org/10.32614/CRAN.package.PerformanceAnalytics
Fonseca, P. A. S., Suárez-Vega, A., Marras, G. & Cánovas, Á. GALLO: an R package for genomic annotation and integration of multiple data sources in livestock for positional candidate loci. Gigascience 9, (2020).
Harrison, P. W. et al. Ensembl 2024. Nucleic Acids Res. 52, D891–D899 (2024).
Article PubMed CAS Google Scholar
Hu, Z. L., Park, C. A. & Reecy, J. M. Bringing the animal QTLdb and CorrDB into the future: meeting new challenges and providing updated services. Nucleic Acids Res. 50, D956–D961 (2022).
Article PubMed CAS Google Scholar
Hu, Y. et al. Comparative analyses of copy number variations between Bos Taurus and Bos indicus. BMC Genom. 21, 1–11 (2020).
Article Google Scholar
Arlt, M. F., Wilson, T. E. & Glover, T. W. Replication stress and mechanisms of CNV formation. Curr. Opin. Genet. Dev. 22, 204–210 (2012).
Article PubMed PubMed Central CAS Google Scholar
Teo, S. M., Pawitan, Y., Ku, C. S., Chia, K. S. & Salim, A. Statistical challenges associated with detecting copy number variations with next-generation sequencing. Bioinformatics 28, 2711–2718 (2012).
Article PubMed CAS Google Scholar
Mills, R. E. et al. Mapping copy number variation by population-scale genome sequencing. Nature 470, 59–65 (2011).
Chen, L., Chamberlain, A. J., Reich, C. M., Daetwyler, H. D. & Hayes, B. J. Detection and validation of structural variations in bovine whole-genome sequence data. Genet. Selection Evol. 49, 1–13 (2017).
Google Scholar
Lee, Y. L. et al. High-resolution structural variants catalogue in a large-scale whole genome sequenced bovine family cohort data. BMC Genom. 24, 1–17 (2023).
Article Google Scholar
Wellenreuther, M. & Bernatchez, L. Eco-Evolutionary genomics of chromosomal inversions. Trends Ecol. Evol. 33, 427–440 (2018).
Article PubMed Google Scholar
Rasekh, M. E. et al. Discovery of large genomic inversions using long range information. BMC Genom. 18, 1–12 (2017).
Google Scholar
Puig, M., Casillas, S., Villatoro, S. & Cáceres, M. Human inversions and their functional consequences. Brief. Funct. Genomics. 14, 369–379 (2015).
Article PubMed PubMed Central CAS Google Scholar
Martínez-Fundichely, A. et al. InvFEST, a database integrating information of polymorphic inversions in the human genome. Nucleic Acids Res. 42, D1027–D1032 (2014).
Article PubMed Google Scholar
Dobigny, G., Britton-Davidian, J. & Robinson, T. J. Chromosomal polymorphism in mammals: an evolutionary perspective. Biol. Rev. 92, 1–21 (2017).
Article PubMed Google Scholar
Hou, Y. et al. Fine mapping of copy number variations on two cattle genome assemblies using high density SNP array. BMC Genom. 13, 1–10 (2012).
Article Google Scholar
Keel, B. N., Keele, J. W. & Snelling, W. M. Genome-wide copy number variation in the bovine genome detected using low coverage sequence of popular beef breeds. Anim. Genet. 48, 141–150 (2017).
Article PubMed CAS Google Scholar
Higgins, M. G. et al. GWAS and eQTL analysis identifies a SNP associated with both residual feed intake and GFRA2 expression in beef cattle. Scientific Reports 2018 8, 14301 (2018).
Seabury, C. M. et al. Genome-wide association study for feed efficiency and growth traits in U.S. beef cattle. BMC Genomics 2017 18:1 18, 1–25 (2017).
Olivieri, B. F. et al. Genomic regions associated with feed efficiency indicator traits in an experimental Nellore cattle population. PLoS One. 11, e0164390 (2016).
Article PubMed PubMed Central Google Scholar
Melo, T. P., De Camargo, G. M. F., De Albuquerque, L. G. & Carvalheiro, R. Genome-wide association study provides strong evidence of genes affecting the reproductive performance of Nellore beef cows. PLoS One. 12, e0178551 (2017).
Article PubMed PubMed Central Google Scholar
Diedrichs, F. et al. Comparative molecular portraits of human unfertilized oocytes and primordial germ cells at 10 weeks of gestation. Int. J. Dev. Biol. 56, 789–797 (2013).
Article Google Scholar
Rafter, P. et al. Genome-wide association analyses of carcass traits using copy number variants and Raw intensity values of single nucleotide polymorphisms in cattle. BMC Genom. 22, 1–16 (2021).
Article Google Scholar
Olender, T., Lancet, D. & Nebert, D. W. Update on the olfactory receptor (OR) gene superfamily. Hum. Genomics. 3, 87–97 (2008).
Article PubMed PubMed Central CAS Google Scholar
Connor, E. E., Zhou, Y. & Liu, G. E. The essence of appetite: does olfactory receptor variation play a role? J. Anim. Sci. 96, 1551–1558 (2018).
Article PubMed PubMed Central Google Scholar
Lee, K. et al. Analysis of cattle olfactory subgenome: the first detail study on the characteristics of the complete olfactory receptor repertoire of a ruminant. BMC Genom. 14, 1–11 (2013).
Article Google Scholar
Julliard, A. K., Koborssy, A., Fadool, D., Palouzier-Paulignan, B. & D. A. & Nutrient sensing: another chemosensitivity of the olfactory system. Front. Physiol. 8, 269622 (2017).
Article Google Scholar
Fleischer, J., Bumbalo, R., Bautze, V., Strotmann, J. & Breer, H. Expression of odorant receptor Olfr78 in enteroendocrine cells of the colon. Cell. Tissue Res. 361, 697–710 (2015).
Article PubMed CAS Google Scholar
Palouzier-Paulignan, B. et al. Olfaction under metabolic influences. Chem. Senses. 37, 769–797 (2012).
Article PubMed PubMed Central CAS Google Scholar
Niimura, Y. & Nei, M. Extensive gains and losses of olfactory receptor genes in mammalian evolution. PLoS One. 2, e708 (2007).
Article ADS PubMed PubMed Central Google Scholar
Islam, S., Akhand, M. R. N. & Hasan, M. Evolutionary trend of bovine β-defensin proteins toward functionality prediction: A domain-based bioinformatics study. Heliyon 9, (2023).
Berton, M. P. et al. Genome-wide association study between copy number variation regions and carcass- and meat-quality traits in Nellore cattle. Anim. Prod. Sci. 61, 731–744 (2021).
Article CAS Google Scholar
Li, J., Mukiibi, R., Wang, Y., Plastow, G. S. & Li, C. Identification of candidate genes and enriched biological functions for feed efficiency traits by integrating plasma metabolites and imputed whole genome sequence variants in beef cattle. BMC Genom. 22, 1–12 (2021).
Article Google Scholar
Samuel, B. & Dinka, H. In Silico analysis of the promoter region of olfactory receptors in cattle (Bos indicus) to understand its gene regulation. Nucleosides Nucleotides Nucleic Acids. 39, 853–865 (2020).
Article PubMed CAS Google Scholar
Collier, R. J., Collier, J. L., Rhoads, R. P. & Baumgard, L. H. Invited review: genes involved in the bovine heat stress response. J. Dairy. Sci. 91, 445–454 (2008).
Article PubMed CAS Google Scholar
Dikmen, S. et al. The SLICK hair locus derived from senepol cattle confers thermotolerance to intensively managed lactating Holstein cows. J. Dairy. Sci. 97, 5508–5520 (2014).
Article PubMed CAS Google Scholar
Huson, H. J. et al. Genome-wide association study and ancestral origins of the slick-hair coat in Tropically adapted cattle. Front. Genet. 5, 79498 (2014).
Article Google Scholar
Sosa, F. et al. Inheritance of the SLICK1 allele of PRLR in cattle. Anim. Genet. 52, 887–890 (2021).
Article PubMed CAS Google Scholar
Weitzel, J. M. et al. Hepatic thyroid signaling of heat-stressed late pregnant and early lactating cows. J. Endocrinol. 234, 129–141 (2017).
Article PubMed PubMed Central CAS Google Scholar
Al-Thuwaini, T. M., Al‐Shuhaib, M. B. S., Lepretre, F. & Dawud, H. H. Two co‐inherited novel SNPs in the MC4R gene related to live body weight and hormonal assays in Awassi and Arabi sheep breeds of Iraq. Vet. Med. Sci. 7, 897–907 (2021).
Article PubMed CAS Google Scholar
Maurya, V. P. et al. Adaptive mechanisms of livestock to changing climate. in Climate Change Impact on Livestock: Adaptation and Mitigation 123–138 (Springer India, New Delhi, doi:https://doi.org/10.1007/978-81-322-2265-1_9. (2015).
Chapter Google Scholar
Sarubbi, J. et al. Hypothalamic neuromodulation and control of the dermal surface temperature of livestock during hyperthermia. Animals 14, 1745 (2024).
Article PubMed PubMed Central Google Scholar
Murata, H. et al. Heat stress induces oxidative stress and activates the KEAP1-NFE2L2-ARE pathway in bovine endometrial epithelial cells. Biol. Reprod. 105, 1114–1125 (2021).
Article PubMed Google Scholar
Zhang, Y. et al. Activation of transcription factor AP-1 in response to thermal injury in rat small intestine and IEC-6 cells. BMC Gastroenterol. 15, 83 (2015).
Article ADS PubMed PubMed Central Google Scholar
Daneshi, M., Caton, J. S., Caixeta, L. S., Eftekhari, Z. & Ward, A. K. Expression, regulation, and function of β-Defensins in the bovine mammary glands: current knowledge and future perspectives. Animals 13, 3372 (2023).
Article PubMed PubMed Central Google Scholar
Wu, L. et al. AMP-Activated protein kinase (AMPK) regulates energy metabolism through modulating thermogenesis in adipose tissue. Front. Physiol. 9, 122 (2018).
Article ADS PubMed PubMed Central Google Scholar
Derakhshani, H., Plaizier, J. C., De Buck, J., Barkema, H. W. & Khafipour, E. Association of bovine major histocompatibility complex (BoLA) gene polymorphism with colostrum and milk microbiota of dairy cows during the first week of lactation. Microbiome 6, 203 (2018).
Article PubMed PubMed Central Google Scholar
Taye, M. et al. Exploring the genomes of East African indicine cattle breeds reveals signature of selection for tropical environmental adaptation traits. Cogent Food Agric 4, (2018).
Colombi, D. et al. Genomic responses to Climatic challenges in beef cattle: A review. Anim. Genet. 55, 854–870 (2024).
Article PubMed CAS Google Scholar
Lacetera, N. et al. Lymphocyte functions in dairy cows in hot environment. Int. J. Biometeorol. 50, 105–110 (2005).
Article ADS PubMed Google Scholar
Chen, D., Tang, T. X., Deng, H., Yang, X. P. & Tang, Z. H. Interleukin-7 biology and its effects on immune cells: mediator of generation, differentiation, survival, and homeostasis. Front. Immunol. 12, 747324 (2021).
Article PubMed PubMed Central CAS Google Scholar
Peters, I. J. A., de Pater, E. & Zhang, W. The role of GATA2 in adult hematopoiesis and cell fate determination. Front. Cell. Dev. Biol. 11, 1250827 (2023).
Article PubMed PubMed Central Google Scholar
Gunshin, H. et al. Slc11a2 is required for intestinal iron absorption and erythropoiesis but dispensable in placenta and liver. J. Clin. Invest. 115, 1258 (2005).
Article PubMed PubMed Central CAS Google Scholar
Zhao, Z. et al. A critical functional missense mutation (T117M) in sheep MC4R gene significantly leads to Gain-of-Function. Animals 14, 2207 (2024).
Article PubMed PubMed Central Google Scholar
Ma, Y. L. et al. Copy number variation (CNV) in the IGF1R gene across four cattle breeds and its association with economic traits. Arch. Anim. Breed. 62, 171–179 (2019).
Article PubMed PubMed Central Google Scholar
Liang, C. et al. Regulatory roles of SREBF1 and SREBF2 in lipid metabolism and deposition in two Chinese representative Fat-Tailed sheep breeds. Anim. 2020. 10, 1317 (2020).
Google Scholar
Huang, J. et al. Comparative transcriptome analysis reveals that PCK1 is a potential gene affecting IMF deposition in Buffalo. BMC Genom. 21, 1–12 (2020).
Article Google Scholar
Li, G. et al. RNA-Seq analysis identifies differentially expressed genes in the longissimus dorsi of Wagyu and Chinese red steppe cattle. Int. J. Mol. Sci. 24, 387 (2023).
Article ADS CAS Google Scholar
Sejian, V., Bhatta, R., Gaughan, J. B. & Dunshea, F. R. & Lacetera, N. Adaptation of animals to heat stress. animal 12, s431–s444 (2018).
Zhong, Z. et al. MORC proteins regulate transcription factor binding by mediating chromatin compaction in active chromatin regions. Genome Biol. 24, 96 (2023).
Article PubMed PubMed Central CAS Google Scholar
He, F., Ru, X. & Wen, T. NRF2, a transcription factor for stress response and beyond. Int. J. Mol. Sci. 2020. 21, 4777 (2020).
Article CAS Google Scholar
Lee, J. et al. Activation of Ets-2 by oxidative stress induces Bcl-xL expression and accounts for glial survival in amyotrophic lateral sclerosis. FASEB J. 23, 1739–1749 (2009).
Article PubMed PubMed Central CAS Google Scholar
Rosati, V. C., Quinn, A. A., Gleadow, R. M. & Blomstedt, C. K. The putative GATA transcription factor SbGATA22 as a novel regulator of Dhurrin biosynthesis. Life 14, 470 (2024).
Article ADS PubMed PubMed Central CAS Google Scholar
Ibeagha-Awemu, E. M. & Zhao, X. Epigenetic marks: regulators of livestock phenotypes and conceivable sources of missing variation in livestock improvement programs. Front. Genet. 6, 139313 (2015).
Article Google Scholar
Jiang, X. X. et al. Epigenetic regulation of antibody responses by the histone H2A deubiquitinase MYSM1. Sci. Rep. 5, 13755 (2015).
Article ADS PubMed PubMed Central Google Scholar
Liu, H. Y. et al. Dietary conjugated Linoleic acid modulates the hepatic circadian clock program via PPARα/REV-ERBα-Mediated chromatin modification in mice. Front. Nutr. 8, 711398 (2021).
Article PubMed PubMed Central Google Scholar
Shah, R. B. et al. FANCI functions as a repair/apoptosis switch in response to DNA crosslinks. Dev. Cell. 56, 2207–2222e7 (2021).
Article PubMed PubMed Central CAS Google Scholar
Rovani, B. T. et al. Analysis of nuclear maturation, DNA damage and repair gene expression of bovine oocyte and cumulus cells submitted to ionizing radiation. Anim. Reprod. 20, e20230021 (2023).
Article PubMed PubMed Central Google Scholar
Patel, D. S., Misenko, S. M., Her, J. & Bunting, S. F. BLM helicase regulates DNA repair by counteracting RAD51 loading at DNA double-strand break sites. J. Cell Biol. 216, 3521–3534 (2017).
Article PubMed PubMed Central CAS Google Scholar
Gorynia, S. et al. Structural and functional insights into a Dodecameric molecular machine – The RuvBL1/RuvBL2 complex. J. Struct. Biol. 176, 279–291 (2011).
Article PubMed CAS Google Scholar
Amills, M. & Bouzat, J. L. Characterization of the bovine BCL2L1 gene and related pseudogenes. Anim. Genet. 34, 457–461 (2003).
Article PubMed CAS Google Scholar
Wu, D. J. et al. Autophagy-linked FYVE containing protein WDFY3 interacts with TRAF6 and modulates RANKL-induced osteoclastogenesis. J. Autoimmun. 73, 73–84 (2016).
Article PubMed PubMed Central CAS Google Scholar
Baker, M. J. et al. Stress-induced OMA1 activation and autocatalytic turnover regulate OPA1-dependent mitochondrial dynamics. EMBO J. 33, 578–593 (2014).
Article PubMed PubMed Central CAS Google Scholar
Gilkerson, R., Kaur, H., Carrillo, O. & Ramos, I. OMA1-Mediated mitochondrial dynamics balance organellar homeostasis upstream of cellular stress responses. Int. J. Mol. Sci. 2024. 25, 4566 (2024).
Article CAS Google Scholar
Porto-Neto, L. R. et al. The genetic architecture of Climatic adaptation of tropical cattle. PLoS One. 9, e113284 (2014).
Article ADS PubMed PubMed Central Google Scholar
Taye, M. et al. Whole genome detection of signature of positive selection in African cattle reveals selection for thermotolerance. Anim. Sci. J. 88, 1889–1901 (2017).
Article PubMed CAS Google Scholar
Ly, T. D. et al. The impact of inflammatory stimuli on Xylosyltransferase-I regulation in primary human dermal fibroblasts. Biomedicines 10, 1451 (2022).
Article PubMed PubMed Central CAS Google Scholar
Fu, C. T. et al. An evolutionarily conserved PTEN-C/EBPα-CTNNA1 axis controls myeloid development and transformation. Blood 115, 4715–4724 (2010).
Kelwick, R., Desanlis, I., Wheeler, G. N. & Edwards, D. R. The ADAMTS (A disintegrin and metalloproteinase with thrombospondin motifs) family. Genome Biol. 16, 1–16 (2015).

Acknowledgements

The authors acknowledge the São Paulo Research Foundation (FAPESP) for the financial support grants number #2009/16118-5, #2017/10630-2, #2018/20026-8 and #2022/11852-7.

Funding

This research was funded by the São Paulo Research Foundation (FAPESP), grant numbers #2009/16118-5, #2017/10630-2, #2018/20026-8 and #2022/11852-7, and by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brazil (CAPES), finance code 001.

Author information

Authors and Affiliations

School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Via de Acesso Prof. Paulo Donato Castelane, Jaboticabal, 14884-900, SP, Brazil
Natalia A. Marín-Garzón, Lucio F. M. Mota, Giovana Vargas, Leonardo M. Arikawa, Larissa F. S. Fonseca, Gerardo A. Fernandes Júnior & Lucia G. Albuquerque
CSIRO Agriculture and Food, Hobart, TAS, 7000, Australia
Roberto Carvalheiro
National Council for Science and Technological Development, Brasilia, 71605-001, DF, Brazil
Lucia G. Albuquerque

Contributions

Conceptualization: N.A.M.G., L.F.S.F., L.M.A. and L.F.M.M.; methodology: N.A.M.G., G.V., L.F.S.F. and G.A.F.J.; resources: L.G.A.; data curation: N.A.M.G., G.V., G.A.F.J. and L.F.M.M.; writing, original draft preparation: N.A.M.G.; writing, review and editing: L.F.M.M. and L.M.A.; visualization: L.F.M.M. and L.M.A.; supervision: R.C. and L.G.A.; project administration: L.G.A. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Lucio F. M. Mota.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval

The animal procedures and data sampling presented in this study were approved by the Animal Care and Ethical Committee of São Paulo State University (UNESP), School of Agricultural and Veterinary Science (protocol number 18.340/16), and were performed following its recommendations. All data sampling was performed in accordance with the CEUA/FCAV-UNESP guidelines and regulations, in agreement with the Regulations for the Administration of Affairs Concerning Experimental Animals (Ministry of Science and Technology, Brazil). In addition, we confirm that the study was conducted following the ARRIVE guidelines.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Marín-Garzón, N.A., Mota, L.F.M., Vargas, G. et al. Detection and functional assessment of structural variants using whole-genome re-sequencing data in Nellore cattle. Sci Rep 15, 30364 (2025). https://doi.org/10.1038/s41598-025-14139-0

Received: 17 January 2025
Accepted: 29 July 2025
Published: 19 August 2025
Version of record: 19 August 2025
DOI: https://doi.org/10.1038/s41598-025-14139-0