Abstract
The A2 β-casein milk variant has attracted increasing scientific interest due to its potential health benefits. This has led to a growing trend in certain countries toward breeding A2A2 genotype cows to produce milk containing only A2 β-casein. However, it is unknown whether the β-casein genotype (A1A1 vs. A2A2) is associated with gene expression differences in the mammary gland that could affect milk synthesis or composition. Therefore, this study aimed to compare the milk fat globule (MFG) transcriptome of cows with different β-casein genotypes (A1A1 vs. A2A2) using RNA-seq (n = 14). Functional enrichment analysis of the highly expressed genes in MFG revealed a strong representation of genes involved in protein synthesis, milk fat secretion and cellular energy metabolism, key processes for lactation. The genes encoding α-S1-casein (CSN1S1), β-lactoglobulin (PAEP/BLG), β-casein (CSN2), glycosylation-dependent cell adhesion molecule-1 (GLYCAM1), cytochrome c oxidase subunits I and III (COX1 and COX3), and κ-casein (CSN3) were the most abundant transcripts in MFG, with mean values of 90,401, 71,022, 53,579, 49,494, 38,551, 33,584, and 30,791 FPKM, respectively. Differential expression analysis between the two experimental groups revealed minimal differences at the gene level: each package identified only two significantly different genes out of the 11,180 (DESeq2) and 11,257 (EdgeR) genes expressed in MFG. Both methods identified NADH dehydrogenase subunit 6 (ND6) as down-regulated in A2A2 cows, with DESeq2 additionally detecting down-regulation of 16S mitochondrial rRNA (16S Mt-rRNA). ND6 and 16S Mt-rRNA are mitochondrial genes involved in oxidative phosphorylation and energy production, and their reduced expression may suggest slightly impaired mitochondrial activity in A2A2 animals. Overall, these findings suggest that the β-casein genotype (A1A1 vs. A2A2) is not strongly associated with gene-level transcriptomic differences in the mammary gland, which could be of interest in the context of current selective breeding strategies for the A2A2 genotype. However, other aspects, such as post-transcriptional regulatory mechanisms, should be analysed further.
Introduction
Cow’s milk contains approximately 3.4% of total protein, of which 78–80% are caseins1. Among them, β-casein is the second most abundant casein type in cow’s milk, after αs1-casein, comprising 30–37% of the total casein content1,2. The β-casein gene (CSN2), located in the casein (CN) locus of chromosome 6, is highly polymorphic. At least 17 genetic variants have been described3,4, with A1 and A2 being the most prevalent in commercial dairy populations5,6. These two variants differ by a single nucleotide polymorphism (SNP) that results in a histidine (A1 β-casein) to proline (A2 β-casein) substitution at position 67 of the protein7,8, potentially affecting its digestibility and interaction with gastrointestinal enzymes. This SNP has attracted interest due to the formation of the bioactive peptide β-casomorphin-7 (BCM-7)9,10 during digestion of A1 β-casein, which has been associated with gastrointestinal effects such as delayed transit and inflammation11,12,13,14,15,16. In contrast, milk containing exclusively A2 β-casein has been associated with softer stools17 and improved digestive comfort12,14,18,19,20,21. As a result, this type of milk is already widely available commercially as an alternative for those with gastrointestinal discomfort12,19.
Despite some uncertainties, particularly the limited number of clinical studies in humans and the absence of sufficient scientific evidence regarding the health effects of A1 β-casein2,22,23, a trend toward selective breeding of A2A2 cows has emerged. Global semen trade companies have introduced the A2A2 genotype in their sire directories as a trait of interest in breeding programs. Furthermore, dairy products containing only the A2 variant, produced by cows that carry the A2A2 genotype, are already available in several regions, such as New Zealand, North America, Australia, China, the United Kingdom, the Netherlands, Italy, and Spain, often at premium prices2,24,25,26. This type of milk represents an opportunity to add value to dairy products and may offer a diversification strategy for small-scale dairy farms in a competitive milk market25,27.
Nevertheless, beyond consumer health, the inclusion of the A2 β-casein genotype as a selection criterion also raises questions about potential impacts on milk synthesis and composition. While numerous association studies have investigated the relationship between β-casein genotypes and milk production traits, results remain inconsistent across populations and should be interpreted with caution. Some found no significant associations28,29,30, but others linked A2A2 to higher milk/protein yield and lower fat percentage or somatic cell count27,31,32,33. Moreover, at the proteomic level, Wang et al.34 showed that specific proteins in casein micelles, whey and the milk fat globule membrane (MFGM) may vary with β-casein genotype. They observed that A1A1 milk was enriched in ceruloplasmin, protein S100-A9, and cathelicidin-2, whereas A2A2 milk showed higher levels of lactoferrin and CD5L, proteins involved in immune response and lipid metabolism. These compositional differences may reflect underlying molecular mechanisms regulated at the transcriptional level. Indeed, milk protein genes, although not regulatory per se, can include SNPs within their coding sequence that may be in linkage disequilibrium with cis-acting regulatory elements or may influence transcript splicing and stability35,36. For example, Huang et al.35 reported associations between SNPs in casein genes and the concentrations of major caseins and total protein content, suggesting possible genotype-dependent variation in gene expression.
High-throughput transcriptomic technologies such as RNA-seq make it possible to investigate whether such genetic variation translates into differences in gene expression in the mammary gland37,38,39,40,41,42,43,44. Previous studies have demonstrated that milk fat globules (MFG) can entrap variable amounts of cytoplasmic material from mammary epithelial cells (MEC), providing a non-invasive alternative to tissue biopsies. This approach does not disrupt the normal lactation process and allows repeated sampling during lactation45,46,47, enabling representative transcriptomic profiling of the lactating mammary gland47,48.
This transcriptomic approach offers a promising framework to explore whether β-casein genotype is associated with differential gene expression patterns during lactation in the context of A2A2-based breeding strategies. Building on this rationale, we investigated whether significant differences exist in gene expression profiles or key metabolic pathways involved in milk synthesis and composition between cows with different β-casein genotypes. Accordingly, the aim of the present study was to compare the MFG transcriptome of Holstein cows with A1A1 and A2A2 genotypes, using conventional gene-level differential expression analysis.
Results
RNA-sequencing, mapping and gene expression estimation
Upon RNA-Seq analysis, a total of 740 million reads were generated from sequencing 14 cDNA libraries, resulting in an average of 52.7 million paired-end reads of 151 bp in length per sample. Results indicated that the quality of the sequencing data was high, with 94% of bases scoring Q30 or above for all samples. In addition, a high alignment rate was obtained with an average of 86.02% of reads mapped to unique positions in the protein coding regions of the bovine reference genome (Table 1).
This alignment rate was similar to the results obtained in RNA samples isolated from milk somatic cells of dairy sheep49. In contrast, a lower alignment rate was obtained by Cánovas et al.38 (60–75%) for RNA isolated from five different sources in the mammary gland (including milk somatic cells, antibody-captured milk mammary epithelial cells (mMEC), laser microdissected mammary epithelial cells (LMEC), mammary gland tissue and MFG), and by Wickramasinghe et al.41 (65%) for RNA isolated from milk somatic cells of dairy cows.
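For reference, the Q30 threshold quoted for the sequencing data corresponds to a base-call error probability of 1 in 1,000 under the standard Phred definition (Q = −10 log10 p). The following minimal Python sketch illustrates the conversion and the fraction-of-bases summary; the function names and the toy quality values are illustrative assumptions and are not part of the pipeline used in this study.

```python
from typing import Sequence


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score into a base-call error probability."""
    return 10 ** (-q / 10)


def fraction_at_or_above(quals: Sequence[int], threshold: int = 30) -> float:
    """Fraction of bases whose quality score is at least `threshold`."""
    return sum(q >= threshold for q in quals) / len(quals)


# Hypothetical per-base qualities for a very short read (illustration only).
toy_quals = [30, 35, 20, 40]

print(phred_to_error_prob(30))          # error probability at Q30
print(fraction_at_or_above(toy_quals))  # share of bases at Q30 or above
```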
After the removal of lowly expressed genes, only genes with more than 10 counts in at least 6 samples were retained. This resulted in 11,180 genes identified using the DESeq2 package and 11,257 genes using the EdgeR package as expressed in the analysed MFG transcriptomes. The gene expression level distribution in the MFG transcriptome is represented in Fig. 1. Based on other studies, genes were categorised into normalised read count groups as follows40,41,49: lowly expressed genes (< 10 FPKM), medium expressed genes (≥ 10 FPKM to 500 FPKM), highly expressed genes (≥ 500 to 4000 FPKM) and most highly expressed genes (≥ 4000 FPKM).
Gene expression level distribution in the milk fat globule (MFG) transcriptome analysed by RNA-sequencing (RNA-seq). Genes were categorised into fragments per kilobase per million mapped reads (FPKM) groups as follows: lowly expressed genes (< 10 FPKM), medium expressed genes (≥ 10 FPKM to 500 FPKM), highly expressed genes (≥ 500 to 4000 FPKM), and most highly expressed genes (≥ 4000 FPKM). The 11,257 expressed genes with EdgeR are represented.
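The filtering rule and the FPKM categories described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names and the toy count vectors are hypothetical, while the thresholds (more than 10 counts in at least 6 samples; 10, 500 and 4000 FPKM) and the three FPKM values in the usage lines (ND6, BTN1A1 and PLIN2) are those reported in this article.

```python
from typing import Dict, List


def keep_gene(counts: List[int], min_count: int = 10, min_samples: int = 6) -> bool:
    """Retain a gene with more than `min_count` reads in at least `min_samples` samples."""
    return sum(c > min_count for c in counts) >= min_samples


def fpkm_category(mean_fpkm: float) -> str:
    """Assign a gene to one of the four expression groups used in Fig. 1."""
    if mean_fpkm < 10:
        return "low"
    if mean_fpkm < 500:
        return "medium"
    if mean_fpkm < 4000:
        return "high"
    return "most highly"


# Hypothetical raw counts across 14 samples (illustration only).
toy_counts: Dict[str, List[int]] = {
    # More than 10 counts in only 3 samples, so this gene is dropped.
    "gene_a": [0, 3, 12, 8, 15, 2, 0, 1, 11, 4, 0, 9, 7, 5],
    # More than 10 counts in all 14 samples, so this gene is retained.
    "gene_b": [120, 95, 130, 88, 101, 76, 140, 99, 110, 87, 93, 105, 98, 115],
}

retained = [gene for gene, counts in toy_counts.items() if keep_gene(counts)]
print(retained)

# Mean FPKM values reported in this article for ND6, BTN1A1 and PLIN2.
for name, fpkm in [("ND6", 86), ("BTN1A1", 625.83), ("PLIN2", 21930)]:
    print(name, fpkm_category(fpkm))
```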
Gene ontology enrichment analysis of highly expressed genes in the milk fat globules
We first examined the global transcriptomic activity of the mammary gland of A1A1 and A2A2 cows, focusing on the most highly expressed genes. A total of 142 genes were identified as highly expressed in the analysed samples, with an average FPKM ≥ 500. To further investigate the functional associations of these highly abundant genes in the bovine MFG, a DAVID GO term enrichment analysis was carried out37,50,51. The significantly enriched GO terms were classified into 27 functional groups, with the 142 highly expressed genes enriched for 8 molecular function terms and 15 biological process terms (FDR < 0.05). The most significantly enriched terms for each category are shown in Figs. 2 and 3.
The specific genes involved in each function and process are detailed in Supplementary Tables S1 and S2. “Structural constituent of ribosome” was the most significantly enriched molecular function (FDR = 2.24 × 10⁻⁸²), which included 67 highly expressed genes and accounted for 50% of the molecular function terms. Other GO terms related to the protein synthesis machinery were “RNA binding” (FDR = 8.62 × 10⁻¹⁴), including 29 genes, followed by “rRNA binding” (FDR = 1.63 × 10⁻⁹) with 10 genes, “translation elongation factor activity” (FDR = 1.68 × 10⁻⁵) with 6 genes and “large ribosomal subunit rRNA binding” (FDR = 0.019) with 3 genes. In addition, two molecular functions related to the cellular energy supply machinery were detected: “NADH dehydrogenase (ubiquinone) activity” (FDR = 8.57 × 10⁻⁶) and “cytochrome-c oxidase activity” (FDR = 5.36 × 10⁻⁴), which comprised 6 and 4 genes, respectively. With respect to biological processes, “translation” was the most enriched GO term (FDR = 2.26 × 10⁻⁶²), which included 52 genes. Other biological processes related to translation were the GO terms “cytoplasmic translation” (FDR = 2.10 × 10⁻¹⁹), “ribosomal small subunit biogenesis” (FDR = 9.26 × 10⁻¹⁸) and “translation elongation” (FDR = 3.87 × 10⁻⁸), comprising 17, 15 and 7 genes, respectively. Furthermore, four biological processes related to hormonal regulation, namely “response to 11-deoxycorticosterone” (FDR = 3.36 × 10⁻⁷), “response to dehydroepiandrosterone” (FDR = 3.36 × 10⁻⁷), “response to progesterone” (FDR = 7.50 × 10⁻⁵) and “response to estradiol” (FDR = 8.71 × 10⁻⁵), were also significant, all of which included the milk protein genes encoding α-S1-casein (CSN1S1), α-S2-casein (CSN1S2), β-lactoglobulin (PAEP/BLG), β-casein (CSN2), κ-casein (CSN3) and α-lactalbumin (LALBA). Finally, biological processes related to the mitochondria and the cellular energy supply were also enriched, with the GO terms “mitochondrial electron transport” (FDR = 5.32 × 10⁻⁴), “electron transport coupled proton transport” (FDR = 2.40 × 10⁻⁴), and “ATP synthesis coupled electron transport” (FDR = 2.40 × 10⁻⁴), which included 5, 4 and 4 genes, respectively.
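As an illustration of how term enrichment and FDR values of this kind are typically obtained, the sketch below computes a one-sided hypergeometric p-value and Benjamini-Hochberg adjusted p-values in plain Python. It is not the DAVID implementation (DAVID relies on a modified Fisher's exact test, the EASE score). The 67 hits among 142 genes are taken from the text; the background of 11,257 expressed genes is reused here only for illustration, and the number of background genes annotated to the term (150) is a hypothetical value.

```python
from math import comb
from typing import List


def hypergeom_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    upper = min(n, K)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# 67 of the 142 highly expressed genes carry the term (from the text);
# K = 150 annotated background genes is a hypothetical assumption.
p_term = hypergeom_tail(k=67, n=142, K=150, N=11257)
print(p_term)

# Toy p-values for four terms (illustration only).
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```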
The results of the DAVID functional annotation clustering (FAC) analysis (Figs. 5 and 6), carried out with the 142 highly expressed genes, showed that these genes are mainly related to protein synthesis, more specifically to the translation machinery, the translation process (including initiation, elongation and termination), translation regulation and post-translational modifications (annotation clusters 1, 2, 3, 4, 5, 7, 10, 11, 12). In addition, significant clusters were found for hormonal regulation of the mammary gland (annotation cluster 6) and mitochondrial processes involved in cellular energy supply (annotation clusters 8 and 9).
The most highly expressed genes in the MFG transcriptome are shown in Table 2 (top 25 transcripts with the highest abundance, FPKM ≥ 4000). The milk protein genes CSN1S1, PAEP/BLG and CSN2, together with the gene encoding glycosylation-dependent cell adhesion molecule 1 (GLYCAM1), showed the highest expression levels, with mean values of 90,400, 71,021, 53,578 and 49,494 FPKM, respectively. Although the other main milk protein genes, CSN1S2, CSN3 and LALBA, were also among the top 25 transcripts, they exhibited lower expression levels, with mean values of 30,790 FPKM, 25,112 FPKM, and 17,897 FPKM, respectively. In addition, several genes involved in cellular energy production were highly expressed, including the cytochrome c oxidase (COX) subunits 1, 2 and 3 (COX1, COX2 and COX3); the NADH-dehydrogenase (ND) subunits 1, 3 and 4 (ND1, ND3 and ND4); the ATP synthase F0 subunit 6 (ATP6); and the cytochrome b (CYTB) genes. COX1 and COX3 were particularly abundant and, although the differences between genotypes were not statistically significant, COX1 showed average expression levels of 25,253 FPKM in A2A2 cows and 54,066 FPKM in A1A1 cows, while COX3 had 23,226 FPKM in A2A2 cows compared to 45,669 FPKM in A1A1 cows.
Furthermore, of the ribosomal protein genes, which accounted for 48.9% of the highly expressed genes, only three showed FPKM values higher than 4000: ribosomal protein lateral stalk subunit P0 (RPLP0), ribosomal protein lateral stalk subunit P1 (RPLP1) and ribosomal protein S2 (RPS2). Among the genes associated with milk fat, perilipin 2 (PLIN2) with 21,930 FPKM, fatty acid binding protein 3 (FABP3) with 10,090 FPKM, and xanthine dehydrogenase/oxidase (XDH) with 5,087 FPKM were found among the most highly expressed genes in the MFG transcriptome.
Differentially expressed genes between cows with different β-casein genotypes
Differential expression analysis was performed on all genes expressed in the MFG transcriptome: 11,180 genes identified with the DESeq2 package and 11,257 genes with EdgeR. The analysis with DESeq2 identified two genes as differentially expressed: 16S rRNA and the NADH dehydrogenase subunit 6 gene (ND6) (Padj < 0.05 and |log2FC| > 1). Both genes were down-regulated in A2A2 genotype cows with respect to A1A1. With EdgeR, two differentially expressed genes (DEGs) were also identified: ND6, down-regulated in A2A2 genotype cows with respect to A1A1, and the calpain-6 gene (CAPN6), which was up-regulated in A2A2 genotype cows (FDR < 0.05 and |log2FC| > 1). The differential expression analysis results are shown in Table 3.
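The significance rule applied to both packages (adjusted p-value or FDR below 0.05 and an absolute log2 fold-change above 1) can be written as a small classification step. The Python sketch below is only an illustration of that rule: the class and function names are hypothetical, the log2FC values for ND6 and CAPN6 are the EdgeR estimates reported in this article, and the adjusted p-values and the third gene are placeholder assumptions.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    log2fc: float  # log2 fold-change, A2A2 relative to A1A1
    padj: float    # adjusted p-value (DESeq2) or FDR (EdgeR)


def call_deg(result: GeneResult, alpha: float = 0.05, min_abs_log2fc: float = 1.0) -> str:
    """Label a gene as 'up', 'down' or 'ns' (not significant) in A2A2 vs A1A1."""
    if result.padj < alpha and abs(result.log2fc) > min_abs_log2fc:
        return "up" if result.log2fc > 0 else "down"
    return "ns"


results = [
    GeneResult("ND6", -1.3433, 0.01),    # log2FC from the text; padj is a placeholder
    GeneResult("CAPN6", 3.7758, 0.01),   # log2FC from the text; padj is a placeholder
    GeneResult("GENE_X", 0.40, 0.001),   # hypothetical gene: significant p, small effect
]

for r in results:
    print(r.gene, call_deg(r))
```

Requiring both conditions means that a gene with a very small adjusted p-value but a modest fold-change, such as the hypothetical GENE_X above, is not reported as a DEG.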
Discussion
The use of MFG as a source of RNA for transcriptomic profiling in the present study was based on evidence that this milk-derived material can entrap variable amounts of cytoplasmic material from MEC43,45 and on its unique advantage as a non-invasive and more ethical sampling method. Consequently, MFG could contain transcripts that directly originate from MEC, supporting their use as a proxy for mammary gland transcriptomes (Fig. 4). Indeed, they have been successfully used in previous studies to explore gene expression dynamics related to milk synthesis and activity in the mammary gland40,42,43,52,53,54,55. Cánovas et al.38 further demonstrated a strong correlation between gene expression profiles from MFG and mammary gland tissue (Pearson r = 0.88–0.92), highlighting their transcriptomic representativeness. Additionally, a recent study reported that approximately 80% of miRNAs were shared between MFG and mammary tissue, further supporting their biological relevance.
Milk fat globule (MFG) formation in mammary epithelial cells (MEC). During MFG formation, triacylglycerol (TAG) cores are surrounded by a phospholipid monolayer derived from the endoplasmic reticulum, giving rise to lipid droplets (LDs). These LDs are then enveloped by an outer bilayer membrane originating from the apical plasma membrane of MEC, forming the milk fat globules (MFG). The resulting MFGs are enclosed by a trilayered milk fat globule membrane (MFGM), which includes both the phospholipid monolayer from the ER and the bilayer from the plasma membrane of MEC. MFG may enclose variable amounts of MEC cytoplasm between the monolayer and bilayer membranes, forming cytoplasmic crescents, which may also contain RNA in varying amounts.
However, the MFG fraction also presents technical challenges. Due to their low cytoplasmic content and enrichment in small RNAs56, MFG contain low amounts of ribosomal RNA (18S and 28S)40,57, resulting in consistently low RIN values. In our study, the mean RIN was 3.59 ± 0.27, below the conventional thresholds established for RNA-seq studies. Nevertheless, other RNA quality indicators, such as RNA IQ and the A260/280 ratio, and sequencing parameters (e.g., read depth, base quality, alignment rate) confirmed the suitability of the RNA extracts, as described previously58. Moreover, the limitations of RIN as a universal metric for RNA quality have been highlighted in other low-rRNA samples, such as spermatozoa, where small RNAs predominate and make RIN values unreliable as a sole quality criterion59. Similarly, transcriptomic studies of milk samples in other dairy contexts have reported acceptable RNA-seq performance despite low RIN values56,60. Altogether, these findings support the use of MFG as a biologically informative and technically viable RNA source, provided that their particularities regarding RIN values are taken into account and that other quality parameters, such as the A260/280 ratio, RNA IQ, and sequencing metrics, are properly met.
During lactation, the demand for protein and energy increases substantially. Protein synthesis requires ribosomal components, amino acids, and large amounts of energy61. In this study, 48.9% of the 142 highly expressed genes (average FPKM ≥ 500) encoded ribosomal proteins, which are key for ribosome biogenesis, assembly, and translation, essential processes to meet lactation demands37,62,63,64. The subsequent GO analysis showed that “structural constituent of ribosome” (GO:0003735) was the most enriched molecular function, grouping 67 of the 142 genes, while “translation” (GO:0006412) was the top biological process with 52 associated genes (Figs. 2 and 3). To clarify the underlying biology, DAVID Functional Annotation Clustering (FAC) was conducted (Figs. 5 and 6) to group together genes that share similar annotations65. Both GO terms and related terms were grouped into cluster 1 (enrichment score: 74.42), including “ribosome” (bta03010), “ribosomal protein” (KW-0689), “ribonucleoprotein” (KW-0687), and “cytosolic large ribosomal subunit” (GO:0022625).
Bubble map of enriched clusters of DAVID functional annotation clustering (FAC) of the highly expressed genes (average FPKM ≥ 500) in the milk fat globule (MFG) transcriptome of cows with A1A1 and A2A2 β-casein genotypes. The figure shows clusters with enrichment score values higher than 4. Clusters with an enrichment score > 1 and FDR < 0.05 were considered significant. Count refers to the number of genes involved in the term and the percentage is calculated as (involved genes/total genes). FDR = false discovery rate.
Bubble map of enriched clusters of DAVID FAC of the highly expressed genes in the milk fat globule (MFG) transcriptome of cows with A1A1 and A2A2 β-casein genotypes. The figure shows clusters with enrichment score values lower than 4. Clusters with an enrichment score > 1 and FDR < 0.05 were considered significant. Count refers to the number of genes involved in the term and the percentage is calculated as (involved genes/total genes). FDR = false discovery rate.
Moreover, annotation clusters 2, 3, 4, 5, 7, 10, 11, and 12 were associated with protein synthesis. Cluster 2 (enrichment score: 29.39) grouped genes involved in post-translational modifications, including “isopeptide bond” (KW-1017), “crosslink Glycyl lysine isopeptide (Lys-Gly) interchain with G-Cter in SUMO2” (CROSSLINK), and “Ubl conjugation” (KW-0832). Cluster 11 (enrichment score: 1.64) focused on the ubiquitin system, with terms such as “ubiquitin protein ligase binding” (GO:0031625) and “ubiquitin-dom” (IPR019956), key to tagging proteins for degradation. Both clusters are linked by ubiquitin-like modifiers such as SUMO2, which conjugates to proteins via isopeptide bonds and participates in various processes regulated by post-translational modification, such as nuclear transport, DNA repair, and proteasomal degradation66,67. Cluster 3 (enrichment score: 21.09) included terms such as “cytosolic small ribosomal subunit” (GO:0022627), “ribosomal small subunit biogenesis” (GO:0042274), “nucleolus” (GO:0005730), and “small-subunit processome” (GO:0032040), all related to ribosomal small subunit biogenesis in the nucleolus68. Cluster 4 (enrichment score: 8.03) featured “RNA binding” (KW-0699) and “rRNA binding” (GO:0019843), reflecting the central role of RNA-binding proteins (including ribosomal proteins and ribosome biogenesis factors) and RNA molecules in ribosome function69. These RNA-binding proteins are also crucial for numerous biological processes, ranging from transcription and splicing to intracellular transport, translation, and decay69. Cluster 5 (enrichment score: 5.42) included “translation prot SH3 like sf” (IPR008991), referring to domains with SH3-like topology, which are found in proteins associated with the translation machinery. In addition, cluster 7 (enrichment score: 3.95) was related to translation elongation, whilst cluster 10 (enrichment score: 1.74) was involved in translation regulation. Finally, cluster 12 (enrichment score: 1.46) featured zinc finger domains, which characterise a family of transcription factors with different DNA-binding domains involved in cell differentiation, development, or apoptosis70. Hence, GO and FAC analysis of the highly expressed genes in the bovine MFG transcriptome corroborated that most of them were associated with protein synthesis.
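To help interpret the enrichment scores quoted for each cluster: to our understanding of the DAVID documentation, the score of an annotation cluster is the geometric mean of the p-values of its member terms, expressed on a −log10 scale, so a score above 1 corresponds to a geometric mean p-value below 0.1. The short Python sketch below illustrates this calculation under that assumption; the function name and the p-values are hypothetical and do not reproduce any cluster from this study.

```python
from math import log10
from typing import Sequence


def cluster_enrichment_score(pvals: Sequence[float]) -> float:
    """Minus log10 of the geometric mean of the member terms' p-values."""
    return -sum(log10(p) for p in pvals) / len(pvals)


# Hypothetical p-values of two terms grouped in one cluster (illustration only).
toy_cluster = [1e-2, 1e-4]
print(cluster_enrichment_score(toy_cluster))  # geometric mean p = 1e-3, score = 3
```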
Protein synthesis is an active and energy-intensive process involving transcription, translation, and protein folding, and thus requires the expression of genes related to cellular energy production37,71,72. Lactation further increases these demands, presenting a significant metabolic challenge for female mammals due to hormonal fluctuations and rising energy requirements, particularly from early to peak lactation72,73. In this study, GO analysis revealed significant enrichment of the molecular function terms “NADH dehydrogenase (ubiquinone) activity” (GO:0008137) and “cytochrome-c oxidase activity” (GO:0004129), corresponding to respiratory chain complexes I and IV, respectively. Additionally, enriched biological processes in MFG included “mitochondrial electron transport” (GO:0006120), “electron transport coupled proton transport” (GO:0015990), and “ATP synthesis coupled electron transport” (GO:0042773). Moreover, annotation cluster 8 (enrichment score: 3.45) and cluster 9 (enrichment score: 2.43) grouped these GO terms with others from KEGG and UP_KW, such as “respiratory chain” (KW-0679), “electron transport” (KW-0249), “oxidative phosphorylation” (bta00190), “mitochondrial respiratory chain complex IV” (GO:0005751), and “mitochondrion” (KW-0496). These results highlight the central role of mitochondria in energy production to support milk synthesis, as they generate up to 90% of cellular ATP through oxidative phosphorylation72,73.
Caseins and whey proteins are the major milk proteins, and their synthesis is hormonally regulated41,61,71,74. In particular, prolactin, growth hormone, thyroid hormone and corticosteroids (including glucocorticoids and mineralocorticoids) play direct roles in milk synthesis71. GO analysis showed enrichment of biological processes such as “response to dehydroepiandrosterone” (GO:1903494), “response to 11-deoxycorticosterone” (GO:1903496), “response to progesterone” (GO:0032570) and “response to estradiol” (GO:0032355) (Fig. 3). The milk protein encoding genes CSN1S1, CSN1S2, CSN2, CSN3 and LALBA were associated with these processes. Furthermore, cluster 6 (enrichment score: 4.28) grouped these GO terms with INTERPRO and UP_KW entries related to milk proteins, such as “casein” (IPR001588), “milk protein” (KW-0494) or “secreted” (KW-0964), reinforcing their link to hormonal regulation. The results are consistent with previous studies; for instance, Jia et al.50 found similar hormonal process enrichment in a proteomic study of the MFGM in Holstein cows. Dehydroepiandrosterone (DHEA), 11-deoxycorticosterone (11DOC), estradiol and progesterone are endogenous steroid hormones that are released by endocrine glands and cells, distributed to various organs and tissues via the bloodstream, and regulate organ metabolism and physiological functions50,75. DHEA and 11DOC modulate responses to stressful stimuli50,76,77, with DHEA having anti-inflammatory and antioxidant properties with protective and regenerative functions50. Estradiol and progesterone promote MEC proliferation and differentiation in preparation for lactation74,78.
Regarding the most highly expressed genes in the MFG transcriptome (FPKM ≥ 4000; Table 2), 25 of the 11,257 detected genes (EdgeR) exhibited very high expression levels, accounting for 63.5% of the total reads. This result suggests that a small number of genes contribute the majority of the RNA transcripts in MFG. Notably, the top-expressed genes included those encoding major milk proteins, namely the four caseins (CSN1S1, CSN1S2, CSN2, and CSN3) and the two whey proteins (PAEP/BLG and LALBA). This aligns with previous reports from bovine mammary gland studies by Cánovas et al.38, Wickramasinghe et al.41 and Becket et al.40. Among them, the CSN1S1 and PAEP/BLG genes showed the highest FPKM values, consistent with Fang et al.79, who analysed RNA-seq data from 91 tissues and cell types, including mammary gland.
The GLYCAM1 gene was also among the most highly expressed genes, with an average FPKM of 49,494. GLYCAM1 is an MFGM-specific protein in milk and one of the most abundant in this fraction80. In the mammary gland, this gene may play a role in the cellular transport and secretion of MFG, while also providing a protective function for immunologically immature offspring81. The GLYCAM1 expression pattern is similar to that of milk proteins, being induced during pregnancy and lactation under the hormonal control of prolactin and progesterone82.
Throughout lactation, milk fat yield increases due to enhanced de novo fatty acid synthesis in the mammary gland83. Indeed, several lipid metabolism-related genes were among the 25 most highly expressed. Notably, PLIN2 showed the highest expression among them (21,930 FPKM). This protein, located in the MFGM84,85,86,87,88, is involved in triglyceride accumulation and acts as a transcriptional regulator of other milk fat genes64. Additionally, the FABP3 gene, with 10,090 FPKM, plays a crucial role in triglyceride formation by transporting long-chain fatty acids into the nucleus and activating key receptors such as the peroxisome proliferator-activated receptor gamma (PPARγ)61,89. During milk fat synthesis, these triglyceride cores are initially enveloped by a monolayer membrane derived from the endoplasmic reticulum, forming lipid droplets. These droplets are then surrounded by an outer bilayer membrane from the apical plasma membrane of MEC, leading to the formation of MFG86,90,91,92. The XDH gene (5,808 FPKM), which encodes the redox enzyme xanthine dehydrogenase/oxidase, plays a key role in the incorporation of lipid droplets into the apical plasma membrane and the subsequent secretion of MFG. Vorbach et al.93 demonstrated that the knockout of XDH impairs MFG secretion, as it is essential for the proper function of proteins such as butyrophilin (BTN1A1)94. Therefore, the higher expression level of XDH compared to BTN1A1 (625.83 FPKM) is consistent with the essential role of XDH. BTN1A1 is also vital for the regulated secretion of lipid droplets, serving both as a structural and signalling receptor through its interaction with XDH94,95,96.
The presence of genes associated with the protein synthesis machinery (RPLP0, RPLP1, RPS2 and RACK1) among the most highly expressed is consistent with the increased protein synthesis demands during lactation, as previously mentioned. Additionally, the TPT1 gene, involved in functions essential for efficient milk production during lactation, such as cell survival, protein synthesis and calcium regulation97,98, also showed high expression (6,674 FPKM). Finally, the FTH1 gene, with an expression level of 4,260 FPKM, participates in iron homeostasis and is involved in various cellular functions, including cellular proliferation and immune responses99.
Several genes related to cellular energy supply were also abundantly expressed, notably components of the mitochondrial respiratory chain: cytochrome c oxidase (COX1, COX2 and COX3) and NADH-dehydrogenase (ND1, ND3 and ND4), along with the CYTB and ATP6 genes. These genes, which encode components of complexes IV, I, III and V, respectively, contribute to electron transport and proton pumping for aerobic energy generation in the form of ATP synthesis via mitochondrial oxidative phosphorylation73,100.
Taken together, all the highly expressed genes detected in our study are commonly expressed during lactation in cows61,86, and these results are consistent with previous studies conducted on MFG38,40,42.
After characterising the overall transcriptomic profile of the mammary gland through MFG analysis, we next assessed whether there were differential gene expression patterns between cows carrying the A1A1 and A2A2 β-casein genotypes. Only two DEGs were detected with each statistical package. The ND6 gene, involved in the energy production machinery via mitochondrial electron transport and proton pumping for ATP synthesis73, was down-regulated in A2A2 cows (log2FC = −1.3567; DESeq2). The average expression of ND6 was 86 FPKM. Similarly, the 16S Mt-rRNA gene, a key component of the mitochondrial ribosomes, called mitoribosomes101,102, showed lower expression in A2A2 cows (log2FC = −1.3086; DESeq2), with an average expression level of 6,484 FPKM in the A1A1 cows versus 2,607 FPKM in the A2A2 cows. These results may suggest slightly enhanced mitochondrial activity in A1A1 cows. On the other hand, EdgeR analysis identified ND6 as down-regulated in A2A2 cows with respect to A1A1 (log2FC = −1.3433), and the calpain-6 gene (CAPN6) as up-regulated in A2A2 cows (log2FC = 3.7758). Calpain genes encode a family of complex multi-domain intracellular cysteine proteases that perform limited cleavage of target proteins in response to calcium signalling103. However, the specific role of this gene in the MEC remains unclear104. The overall DEG count was very low, with only two DEGs out of 11,257 genes expressed in MFG, as shown in the MA plot (Fig. 7). While the functional analysis of highly expressed genes revealed transcriptomic signatures related to milk synthesis and mammary epithelial activity, the minimal number of DEGs identified between genotypes limits the depth of genotype-specific functional interpretation. This may reflect either a true biological similarity between A1A1 and A2A2 cows or the limited sensitivity of differential gene expression analysis. Future studies using alternative analytical approaches, such as isoform-level differential expression analysis105,106, may help uncover additional regulatory differences not detectable in a gene-level analysis, or alternatively confirm the near absence of transcriptomic divergence between genotypes.
The distribution of the detected transcripts is represented as an MA plot (M = log ratio and A = mean average) in Fig. 7.
MA plot (M = log ratio and A = mean average) of the log2 average normalised counts (logCPM) against the log2 fold-change (logFC) across the genes generated by glimmaMA function in EdgeR. Each gene is represented by a grey dot. Significant differentially expressed genes (DEGs) are coloured in red (down-regulated) and in green (up-regulated).
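As a rough illustration of the two axes of an MA plot, the sketch below computes M (log2 ratio) and A (mean of the log2 values) for a single gene from two group means. It uses the mean FPKM values reported above for 16S Mt-rRNA (6,484 in A1A1 and 2,607 in A2A2) purely as an example; the function name `ma_values` is hypothetical, and this is not how DESeq2 or EdgeR estimate fold changes, since both packages work on normalised counts with modelled dispersions.

```python
import math


def ma_values(mean_reference: float, mean_test: float) -> tuple[float, float]:
    """Return (M, A) for one gene from two positive group means.

    M is the log2 ratio test/reference; A is the average of the two
    log2-transformed means. Illustrative only (hypothetical helper).
    """
    log_ref = math.log2(mean_reference)
    log_test = math.log2(mean_test)
    m_value = log_test - log_ref          # log2 fold change (test vs reference)
    a_value = 0.5 * (log_ref + log_test)  # mean log2 expression
    return m_value, a_value


# Mean FPKM of 16S Mt-rRNA reported in the text: A1A1 = 6484, A2A2 = 2607
m_value, a_value = ma_values(mean_reference=6484, mean_test=2607)
print(f"M = {m_value:.2f}, A = {a_value:.2f}")
```

The ratio of raw group means gives an M value of roughly −1.31, close to but not identical to the DESeq2 estimate of −1.3086, which is expected because the package estimate is derived from normalised counts rather than from mean FPKM values.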
Conclusion
The functional analyses of the highly expressed genes in MFG (FPKM ≥ 500) revealed significant enrichment of pathways associated with milk proteins and related hormones, milk fat secretion, ribosomal proteins, and cellular energy metabolism, which are commonly expressed during lactation and align with the high metabolic demands of this stage. In addition, key genes among the most highly expressed (FPKM ≥ 4000) were identified, including those involved in milk proteins (CSN1S1, CSN1S2, CSN2, CSN3; PAEP/BLG, LALBA), milk fat secretion (GLYCAM1, PLIN2, FABP3, XDH), ribosomal proteins (RPLP0, RPLP1, RPS2, RACK1), and, notably, components of the mitochondrial respiratory chain for energy production to support milk synthesis (COX, ND).
The differential expression analysis revealed highly similar gene-level transcriptomic profiles between A1A1 and A2A2 cows. Only two DEGs were identified between genotypes with each statistical package. Among them, ND6 and 16S Mt-rRNA are mitochondrial genes involved in OXPHOS and energy production, and their reduced expression may suggest slightly impaired mitochondrial activity in A2A2 animals. Overall, these results indicate that the β-casein genotype is not strongly associated with substantial transcriptional differences in the mammary gland, a finding that may be relevant in the context of ongoing selective breeding strategies for A2A2 animals in the dairy sector. However, further research is needed to assess whether additional regulatory mechanisms, such as alternative splicing events or non-coding RNAs, may influence milk synthesis in the mammary gland beyond what is observable with a gene-level differential analysis.
Material and methods
Animals and housing
The cows used in this study belonged to a commercial farm (S.A.T Etxeberri, Navarre, Spain) and were housed in a freestall barn with individual cubicles equipped with sand bedding. Animals had free movement in the feeding and walking areas, with continuous access to fresh water. The barn was equipped with mechanical ventilation, including fans, to ensure air circulation, as well as cow brushes to promote comfort and welfare. Milking was performed using an Automated Milking System (AMS) (Lely Astronaut A3, Lely, Maassluis, the Netherlands). All cows received a total mixed ration (TMR) diet twice a day. The detailed ingredients, chemical composition and energy content of the TMR are provided in Table 4. In addition to the TMR, the cows received 6 kg of concentrate feed when being milked in the milking robot (204.4 crude protein (CP), 5.5 ether extract (EE), 55.4 crude fiber (CF), 361.6 starch, 11.3 Ca, 5.3 P, and 3.3 Mg, expressed as g/kg dry matter (DM)).
Since no animals were used, bred, or subjected to experimental procedures, specific ethical approval was not required. Only milk samples, provided by a commercial dairy farm, were used for analysis. The collection of milk samples by responsible staff of the farm was performed in accordance with the regulations of the Ethics, Animal Experimentation and Biosafety Committee of the Public University of Navarre (2007–11-12), and the Spanish national regulation (RD 53/2013), which establishes the basic standards for animal safety in experimentation and other scientific purposes, including teaching107.
Animal selection and milk sample collection
A total of 14 healthy lactating Holstein cows were selected for the study based on β-casein genotype (CSN2 gene): A1A1 (n = 7) vs. A2A2 (n = 7). All cows were in the third lactation, with 225.8 ± 73.1 (mean ± SD) days in milk and an average age of 59.0 ± 3.8 months. They had similar milk yield characteristics, with average kilograms of milk, fat percentage, and protein percentage of 47.8 ± 7.7, 3.83 ± 0.54, and 3.31 ± 0.22 for A1A1 cows, and 47.2 ± 5.7, 3.87 ± 0.55, and 3.56 ± 0.26 for A2A2 cows. These values were obtained from the official records (Spanish Federation of Holstein Cattle; CONAFE) of the test corresponding to the closest day preceding RNA sampling; they represent standardised phenotypic data widely used for genomic evaluations and phenotypic analyses. To increase genetic diversity and reduce potential biases arising from multiple daughters of the same sire, the cows selected were daughters of 12 different bulls. The study was conducted between mid-May and mid-June 2023, during the morning milking (9:00–11:00 am). Prior to sampling, all equipment used was disinfected with both 70% ethanol and RNaseZap (Ambion, Austin, TX, USA). Nipple cleaning was also carried out by applying both 70% ethanol and povidone iodine twice, followed by careful drying of the entire nipple. The first portion of milk was discarded prior to sampling to avoid collecting foremilk and potential contamination, and to ensure sample representativeness. Milk was then manually collected from individual cows by the responsible farm staff and transferred into 50 mL RNase-free tubes. Approximately 20–25 tubes were collected from each animal to ensure enough MFG fraction for RNA isolation. Whole milk samples were immediately snap-frozen in liquid nitrogen and transported to the laboratory, where they were stored at −80 °C until further analysis.
RNA isolation and quality assessment
RNA isolation from MFG was performed as described in detail by Jiménez-Montenegro et al. (2025)58. Briefly, milk samples stored at −80 °C were thawed at room temperature and centrifuged at 6000 g for 10 min at 4 °C. Then, the milk fat layer (the MFG fraction) was collected using a sterile spatula and transferred into 15 mL RNase-free tubes. The total volume of milk used from each cow varied (200 to 500 mL) in relation to the MFG quantity collected from each cow (0.5–5 mg MFG fraction/50 mL tube). The collected MFG fraction was homogenised in Trizol LS reagent (Invitrogen, Waltham, MA, USA) at a proportion of 3 mL Trizol LS reagent per 1 mL of collected MFG, and the resulting mixture was stored at −80 °C until used for RNA isolation. Total RNA was isolated from MFG using Trizol LS reagent (Invitrogen, Waltham, MA, USA) and the GenElute mammalian RNA isolation kit (Sigma-Aldrich, CA, USA). Finally, the isolated RNA was eluted into 40 µL of diethyl pyrocarbonate (DEPC) treated water. Then, the RNA quantity, quality and integrity were evaluated using different instruments to ensure accurate and reliable measurements, as previously described in Jiménez-Montenegro et al.58. The RNA quality and integrity were determined using the Nanodrop 2000 spectrophotometer (Thermo Scientific, Madrid, Spain), which determines the A260/280 and A260/230 ratios; the TapeStation 4200 RNA Screentape (Agilent Technologies, Santa Clara, CA, USA), which determines the RNA Integrity Number (RIN); and the Qubit 4 fluorometer (Life Technologies, CA, USA), which establishes the RNA Integrity and Quality (IQ) value. RNA quantity was measured with the Nanodrop 2000 spectrophotometer, the Qubit 4 fluorometer and the Victor Nivo Multimode Microplate Reader (PerkinElmer, Waltham, MA, USA). Briefly, the mean RNA concentration obtained with Nanodrop was 120.43 ± 22.27 ng/μL, while the Qubit HS assay and the QuantIT RiboGreen assay yielded 102.87 ± 15.64 ng/μL and 109.43 ± 22.69 ng/μL, respectively.
The mean A260/280 and A260/230 ratios were 2.03 ± 0.01 and 1.34 ± 0.06, respectively. The average RIN value of all the RNA extracts was 3.59 ± 0.27 (mean ± SE), which was below the conventional benchmark of 7 for RNA-seq studies108,109,110. In contrast, the average IQ and A260/280 values were within the ranges accepted in the literature for optimal sequencing (9.51 ± 0.15 and 2.03 ± 0.01, respectively)111,112,113. However, as demonstrated in our previous methodological study using the same milk samples and animals58, milk fat globules are a non-conventional RNA source with inherently low rRNA content, leading to artificially low RIN values. Despite this, all samples passed the standard sequencing quality checks, including sequence read lengths (151 bp) and base-coverage (100%), base content (49%), base quality scores (36), and alignment rates (above 85%), indicating high-quality RNA suitable for transcriptome profiling.
Library preparation and RNA sequencing
Sequencing libraries were constructed using the Illumina TruSeq stranded mRNA kit (Illumina, San Diego, CA, USA)58. Briefly, messenger RNA (mRNA) molecules were purified from the total RNA and subsequently fragmented. The cleaved RNA fragments were reverse transcribed into first-strand complementary DNA (cDNA) using SuperScript II reverse transcriptase (Invitrogen, Waltham, MA, USA) and random primers. The second-strand cDNA was synthesised using DNA Polymerase I, RNase H and dUTP. Then, a single ‘A’ nucleotide was added to the 3’ ends of the blunt fragments to prevent them from ligating to each other during the subsequent adapter ligation reaction. The resulting products were then purified and enriched by PCR to create the final cDNA library. The libraries were quantified using KAPA Library Quantification kits for Illumina Sequencing platforms according to the qPCR Quantification Protocol Guide (Kapa Biosystems, Wilmington, MA, USA) and qualified using the TapeStation D1000 ScreenTape (Agilent Technologies, Santa Clara, CA, USA). Indexed libraries were then submitted to an Illumina NovaSeq6000 (Illumina, Inc., San Diego, CA, USA) platform for paired-end (2 × 150 bp) sequencing58.
Bioinformatics analysis
Quality control analysis of the raw sequencing data was performed using the FastQC tool version 0.11.9 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Cutadapt software (version 4.0) was then used to perform the trimming procedure, in which indexing adapters, as well as low-quality reads (those with ambiguous bases and/or shorter than 10 bases), were removed from the analysis. After trimming, the cleaned sequencing reads were again subjected to FastQC analysis. Subsequently, cleaned sequencing reads were aligned to the bovine reference genome, ARS.UCD1.3 (Ensembl release 112), using the Spliced Transcripts Alignment to a Reference (STAR) aligner (version 2.7.11b). Within STAR, the quantMode option was utilised to quantify the number of sequencing reads aligned to each gene. Unmapped reads and reads mapped to multiple positions were excluded from further analysis, and only uniquely mapped reads were considered.
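The read-level filter described above (discarding reads with ambiguous bases and/or fewer than 10 bases) can be sketched in a few lines. This is only an illustration of the stated criteria, not Cutadapt's implementation; the function name `keep_read` and the example sequences are hypothetical, and ambiguous bases are assumed to be encoded as `N`.

```python
def keep_read(sequence: str, min_length: int = 10) -> bool:
    """Return True if a trimmed read passes both filters.

    A read is discarded when it contains an ambiguous base (assumed to be
    encoded as 'N') or when it is shorter than `min_length` bases.
    """
    has_ambiguous_base = "N" in sequence.upper()
    return len(sequence) >= min_length and not has_ambiguous_base


# Hypothetical trimmed reads used only to exercise the filter
trimmed_reads = ["ACGTACGTACGT", "ACGTNACGTACGT", "ACGT", "ACGTACGTAC"]
clean_reads = [read for read in trimmed_reads if keep_read(read)]
print(clean_reads)
```

In this toy input, the second read is dropped because of the ambiguous base and the third because it is shorter than 10 bases, while a read of exactly 10 bases is retained.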
Outlier detection was carried out through a generalized model principal component analysis (GLM-PCA). Samples whose scores in the principal component space deviated significantly from the rest were considered potential outliers and removed from downstream analysis to avoid biasing the results114,115. As a consequence, one animal with the A1A1 genotype (cow 7) was considered an outlier and removed from downstream analysis.
A list of highly expressed genes was obtained by normalising the data through the calculation of FPKM with EdgeR (version 4.0.16). This FPKM value involved two normalisation steps: the number of fragments was normalised firstly by the sequencing depth of each sample in the read counts file, and secondly, by the transcript length of each gene in the bovine reference genome (ARS.UCD1.3; Ensembl release 112). Genes with an average FPKM value greater than 500 were considered highly expressed.
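The two normalisation steps behind FPKM can be written out explicitly. The sketch below is a minimal illustration under stated assumptions: the function names and the example numbers are hypothetical, and the library size is taken as the raw total fragment count, whereas EdgeR may additionally scale library sizes by normalisation factors.

```python
def fpkm(fragment_count: float, gene_length_bp: float, library_size: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    Step 1 normalises by sequencing depth (library_size / 1e6);
    step 2 normalises by gene length (gene_length_bp / 1e3).
    """
    per_million = fragment_count / (library_size / 1e6)
    return per_million / (gene_length_bp / 1e3)


def is_highly_expressed(fpkm_values: list[float], threshold: float = 500.0) -> bool:
    """Apply the criterion from the text: average FPKM greater than 500."""
    return sum(fpkm_values) / len(fpkm_values) > threshold


# Hypothetical gene: 500 fragments, 2,000 bp long, in a library of 10 million fragments
example_fpkm = fpkm(fragment_count=500, gene_length_bp=2000, library_size=10_000_000)
print(example_fpkm)
```

For the hypothetical gene above, 500 fragments in a 10-million-fragment library correspond to 50 fragments per million, and dividing by the 2 kb length gives 25 FPKM, well below the threshold of 500 used to define highly expressed genes.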
Subsequently, a GO enrichment analysis of the highly expressed genes was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID). Multiple testing was accounted for using an FDR threshold of 0.05. Since DAVID supports various ontologies, a functional annotation clustering (FAC) was conducted for categories sharing a significant number of genes116,117. The use of a FAC reduces the complexity of handling similar redundant terms, allowing for a more focused biological interpretation at the group level117,118. The annotation categories considered in the FAC were: Functional Annotations (UP_KW_Molecular_Function, UP_KW_Biological_Process, UP_KW_Cellular_Component, UP_KW_PTM, and UP_SEQ_Feature), Gene Ontology (GOTERM_MF_Direct, GOTERM_BP_Direct, and GOTERM_CC_Direct), Interactions (UP_KW_Ligand), Pathways (KEGG_Pathway), and Protein Domains (INTERPRO, PIR_Superfamily and SMART). Annotation clusters with an enrichment score of at least 1 were considered significant.
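To make the FDR threshold concrete, the sketch below adjusts a list of raw p-values with the Benjamini-Hochberg procedure and keeps the terms with an adjusted value below 0.05. The text does not state which correction DAVID applied, so the choice of Benjamini-Hochberg is an assumption, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n_tests = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(n_tests), key=lambda i: pvalues[i])
    adjusted = [0.0] * n_tests
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n_tests, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n_tests / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw enrichment p-values for four annotation terms
raw_pvalues = [0.01, 0.04, 0.03, 0.20]
adjusted_pvalues = benjamini_hochberg(raw_pvalues)
significant = [p < 0.05 for p in adjusted_pvalues]
print(adjusted_pvalues, significant)
```

With these four hypothetical values, only the first term remains below 0.05 after adjustment, which illustrates how a term with a nominal p-value of 0.03 or 0.04 can fail the FDR criterion.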
To evaluate the DEGs from the RNA-seq read counts, two different Bioconductor packages run in RStudio (version 4.3.2), DESeq2 (version 1.42.0)119 and EdgeR (version 4.0.16)120, were used. These two packages were selected based on literature evidence supporting their robustness: when methods for identifying DEGs are integrated, their combination enhances sensitivity and leads to more reliable outcomes49,121. DESeq2 and EdgeR conduct pairwise comparisons among two or more groups using parametric tests, assuming that read counts follow a negative binomial distribution with a gene-specific dispersion parameter. The main differences between these packages lie in the estimation of the dispersion parameter and the normalisation methods employed49,121. Firstly, read counts were filtered and genes expressed at a very low level were removed from the analysis. Only genes with more than 10 counts in at least six samples were considered to be expressed. For this purpose, the fpm (fragments per million) function of the DESeq2 package (version 1.42.0) and the cpm (counts per million) function of EdgeR (version 4.0.16) were used, respectively. To determine DEGs between A1A1 and A2A2 cows for β-casein, after filtering, read counts were normalised by sequencing depth, gene expression dispersions were estimated, the model was fitted and the contrasts were performed. For DESeq2, genes with an adjusted p-value (Padj) < 0.05 and an absolute log2 fold change (|log2FC|) > 1 were considered differentially expressed. For EdgeR, genes with a False Discovery Rate (FDR) < 0.05 and |log2FC| > 1 were considered differentially expressed.
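The expression filter and the DEG thresholds described above reduce to two simple rules, sketched below. This is a minimal illustration and not the DESeq2 or EdgeR code: the function names are hypothetical, the filter is applied to whatever per-sample values are supplied (the text refers to values obtained with the fpm and cpm functions), and the adjusted p-values in the example are invented, since only the log2FC values are reported in the text.

```python
def is_expressed(values_per_sample: list[float],
                 min_value: float = 10.0,
                 min_samples: int = 6) -> bool:
    """Keep a gene with more than `min_value` in at least `min_samples` samples."""
    n_passing = sum(1 for value in values_per_sample if value > min_value)
    return n_passing >= min_samples


def is_deg(adjusted_p: float, log2_fc: float,
           alpha: float = 0.05, min_abs_log2_fc: float = 1.0) -> bool:
    """Apply the thresholds from the text: adjusted p < 0.05 and |log2FC| > 1."""
    return adjusted_p < alpha and abs(log2_fc) > min_abs_log2_fc


# Hypothetical gene measured in the 13 samples left after outlier removal
gene_values = [12, 15, 11, 20, 30, 11, 0, 0, 0, 0, 0, 0, 0]
print(is_expressed(gene_values))

# log2FC values are taken from the text; adjusted p-values are hypothetical
print(is_deg(adjusted_p=0.01, log2_fc=-1.3567))  # ND6-like case
print(is_deg(adjusted_p=0.01, log2_fc=0.8))      # significant but small effect
```

Requiring both criteria means that a gene with a large fold change but an adjusted p-value above 0.05, or a significant gene with a fold change below twofold, is not counted as differentially expressed.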
Data availability
The datasets generated and/or analysed during the current study are available in the Gene Expression Omnibus (GEO) repository under the accession number: GSE273229. These can be accessed through the NCBI GEO database at https://www.ncbi.nlm.nih.gov/geo/info/seq.html.
References
Kaskous, S. A1- and A2-Milk and their effect on human health. J. Food Eng. Technol. 9, 15–21 (2020).
Jiménez-Montenegro, L., Alfonso, L., Mendizabal, J. A. & Urrutia, O. Worldwide research trends on milk containing only A2 β-Casein: A bibliometric study. Animals 12, 1–17 (2022).
Hewa Nadugala, B., Pagel, C. N., Raynes, J. K., Ranadheera, C. S. & Logan, A. The effect of casein genetic variants, glycosylation and phosphorylation on bovine milk protein structure, technological properties, nutrition and product manufacture. Int. Dairy J. https://doi.org/10.1016/j.idairyj.2022.105440 (2022).
de Vasconcelos, M. L., Oliveira, L. M. F. S., Hill, J. P. & Vidal, A. M. C. Difficulties in Establishing the adverse effects of β-casomorphin-7 released from β-casein variants—A review. Foods https://doi.org/10.3390/foods12173151 (2023).
Daniloski, D., McCarthy, N. A., Huppertz, T. & Vasiljevic, T. What is the impact of amino acid mutations in the primary structure of caseins on the composition and functionality of milk and dairy products?. Curr. Res. Food Sci. 5, 1701–1712. https://doi.org/10.1016/j.crfs.2022.09.026 (2022).
Farrell, H. M. et al. Nomenclature of the proteins of cows’ milk - Sixth revision. J. Dairy Sci. 87, 1641–1674 (2004).
Bell, S. J., Grochoski, G. T. & Clarke, A. J. Health implications of milk containing β-casein with the A2 genetic variant. Crit. Rev. Food Sci. Nutr. 46, 93–100 (2006).
Asledottir, T. et al. Identification of bioactive peptides and quantification of β-casomorphin-7 from bovine β-casein A1, A2 and I after ex vivo gastrointestinal digestion. Int. Dairy J. 71, 98–106 (2017).
Auestad, N. & Layman, D. K. Dairy bioactive proteins and peptides: A narrative review. Nutr. Rev. 79, 36–47 (2021).
Sobczak, M., Sałaga, M., Storr, M. A. & Fichna, J. Physiology, signaling, and pharmacology of opioid receptors and their ligands in the gastrointestinal tract: Current concepts and future perspectives. J. Gastroenterol. 49, 24–45 (2014).
Pal, S., Woodford, K., Kukuljan, S. & Ho, S. Milk intolerance, beta-casein and lactose. Nutrients 7, 7285–7297 (2015).
Choi, Y., Kim, N., Song, C., Kim, S. & Ho Lee, D. The effect of A2 milk on gastrointestinal symptoms in comparison to A1/A2 milk: A single-center, randomized, double-blind, cross-over study. J. Cancer Prev. 29, 45–53 (2024).
Kay, S. I. S. et al. Beneficial effects of milk having A2 β-casein protein: Myth or reality?. J. Nutr. 151, 1061–1072 (2021).
Milan, A. M. et al. Comparison of the impact of bovine milk β-casein variants on digestive comfort in females self-reporting dairy intolerance: A randomized controlled trial. Am. J. Clin. Nutr. 111, 149–160 (2020).
Haq, M. R. U., Kapila, R., Sharma, R., Saliganti, V. & Kapila, S. Comparative evaluation of cow β-casein variants (A1/A2) consumption on Th2-mediated inflammatory response in mouse gut. Eur. J. Nutr. 53, 1039–1049 (2014).
Barnett, M. P. G., Mcnabb, W. C., Roy, N. C., Woodford, K. B. & Clarke, A. J. Dietary A1 β-casein affects gastrointestinal transit time, dipeptidyl peptidase-4 activity, and inflammatory status relative to A2 β-casein in Wistar rats. Int. J. Food Sci. Nutr. 65, 720–727 (2014).
Ho, S., Woodford, K., Kukuljan, S. & Pal, S. Comparative effects of A1 versus A2 beta-casein on gastrointestinal measures: A blinded randomised cross-over pilot study. Eur. J. Clin. Nutr. 68, 994–1000 (2014).
Li, X., Spencer, G. W. K., Ong, L. & Gras, S. L. Beta casein proteins – A comparison between caprine and bovine milk. Trends Food Sci. Technol. 121, 30–43 (2022).
Ramakrishnan, M., Eaton, T. K., Sermet, O. M. & Savaiano, D. A. Milk containing A2 β-casein only, as a single meal, causes fewer symptoms of lactose intolerance than milk containing A1 and A2 β-caseins in subjects with lactose maldigestion and intolerance: A randomized, double-blind, crossover trial. Nutrients 12, 1–15 (2020).
De Gaudry, D. K. et al. Milk A1 β-casein and health-related outcomes in humans: A systematic review. Nutr. Rev. 77, 278–306 (2019).
Jianqin, S. et al. Effects of milk containing only A2 beta casein versus milk containing both A1 and A2 beta casein proteins on gastrointestinal physiology, symptoms of discomfort, and cognitive behavior of people with self-reported intolerance to traditional cows’ milk. Nutr. J. 15, 1–16 (2016).
Gonzales-Malca, J. A., Tirado-Kulieva, V. A., Abanto-López, M. S., Aldana-Juárez, W. L. & Palacios-Zapata, C. M. Worldwide research on the health effects of bovine milk containing A1 and A2 β-casein: Unraveling the current scenario and future trends through bibliometrics and text mining: Impact of A1 and A2 bovine milk on health. Curr. Res. Food Sci. 7, 1–14. https://doi.org/10.1016/j.crfs.2023.100602 (2023).
DATEX Working Group. Scientific report of EFSA: Review of the potential health impact of β-casomorphins and related peptides. EFSA J. https://doi.org/10.2903/j.efsa.2009.231r (2009).
Mounika, V., Kumar, B. G. & Supriya, K. Study on consumer buying behavior, awareness and preference for A2 milk in Hyderabad, India. Asian J. Agric. Ext., Econ. Sociol. 38, 21–29 (2020).
Fernández-Rico, S. et al. A2 milk: New perspectives for food technology and human health. Foods 11, 1–20 (2022).
Oglobline, A. N., Padula, M. P. & Doble, P. A. Quality control of A1-free dairy. Food Control 135, 1–7 (2022).
Alfonso, L., Urrutia, O. & Mendizabal, J. A. Conversion to A2 milk production with regard to a possible market demand for dairy farms: Possibilities and implications. ITEA Inf. Tec. Econ. Agrar. 115, 231–251 (2019).
Ardicli, S., Aldevir, O., Aksu, E. & Gumen, A. The variation in the beta-casein genotypes and its effect on milk yield and genomic values in Holstein-Friesian cows. Anim. Biotechnol. 34, 4116–4125 (2023).
Duifhuis-Rivera, T. et al. Polymorphisms in beta and kappa-casein are not associated with milk production in two highly technified populations of Holstein cattle in Mexico. J. Anim. Plant Sci. 24, 1316–1321 (2014).
Ozdemir, M., Kopuzlu, S., Topal, M. & Bilgin, O. C. Relationships between milk protein polymorphisms and production traits in cattle: A systematic review and meta-analysis. Arch. Anim. Breed 61, 197–206 (2018).
Morris, C. A. et al. Associations between β-casein genotype and milk yield and composition in grazing dairy cows. N. Z. J. Agric. Res. 48, 441–450 (2005).
Olenski, K., Kamiński, S., Szyda, J. & Cieslinska, A. Polymorphism of the beta-casein gene and its associations with breeding value for production traits of Holstein-Friesian bulls. Livest. Sci. 131, 137–140 (2010).
Çardak, A. D. Effects of genetic variants in milk protein on yield and composition of milk from Holstein-Friesian and Simmentaler cows. S Afr. J. Anim. Sci. 35, 1–7 (2005).
Wang, X. et al. Comparative proteomic characterization of bovine milk containing β-casein variants A1A1 and A2A2, and their heterozygote A1A2. J. Sci. Food Agric. 101, 718–725 (2021).
Huang, W. et al. Association between milk protein gene variants and protein composition traits in dairy cattle. J. Dairy Sci. 95, 440–449 (2012).
Si, J. et al. Complete genomic landscape reveals hidden evolutionary history and selection signature in Asian water buffaloes (Bubalus bubalis). Adv. Sci. https://doi.org/10.1002/advs.202407615 (2024).
Cui, X. et al. Transcriptional profiling of mammary gland in Holstein cows with extremely different milk protein and fat percentage using RNA sequencing. BMC Genom. 15, 1–15 (2014).
Cánovas, A. et al. Comparison of five different RNA sources to examine the lactating bovine mammary gland transcriptome using RNA-Sequencing. Sci. Rep. 4, 1–7 (2014).
Dado-Senn, B. et al. RNA-Seq reveals novel genes and pathways involved in bovine mammary involution during the dry period and under environmental heat stress. Sci. Rep. 8, 1–12 (2018).
Beckett, L. et al. Mammary transcriptome reveals cell maintenance and protein turnover support milk synthesis in early-lactation cows. Physiol Genom. 52, 435–450 (2020).
Wickramasinghe, S., Rincon, G., Islas-Trejo, A. & Medrano, J. F. Transcriptional profiling of bovine milk using RNA sequencing. BMC Genom. 13, 1–14 (2012).
Nichols, K., Bannink, A., Van Baal, J. & Dijkstra, J. Impact of post-ruminally infused macronutrients on bovine mammary gland expression of genes involved in fatty acid synthesis, energy metabolism, and protein synthesis measured in RNA isolated from milk fat. J. Anim. Sci. Biotechnol. 11, 1–12 (2020).
Yang, J. et al. Differential expression of genes in milk of dairy cattle during lactation. Anim. Genet. 47, 174–180 (2016).
Bhat, S. A. et al. Comparative transcriptome analysis of mammary epithelial cells at different stages of lactation reveals wide differences in gene expression and pathways regulating milk synthesis between Jersey and Kashmiri cattle. PLoS ONE 14, 1–27 (2019).
Lemay, D. G. et al. RNA sequencing of the human milk fat layer transcriptome reveals distinct gene expression profiles at three stages of lactation. PLoS ONE 8, 1–18 (2013).
Boutinaud, M., Herve, L. & Lollivier, V. Mammary epithelial cells isolated from milk are a valuable, non-invasive source of mammary transcripts. Front. Genet. https://doi.org/10.3389/fgene.2015.00323 (2015).
Brenaut, P. et al. Validation of RNA isolated from milk fat globules to profile mammary epithelial cell expression during lactation and transcriptional response to a bacterial infection. J. Dairy Sci. 95, 6130–6144 (2012).
Chen, Q. et al. Milk fat globule is an alternative to mammary epithelial cells for gene expression analysis in buffalo. J. Dairy Res. 83, 202–208 (2016).
Suárez-Vega, A. et al. Characterization and comparative analysis of the milk transcriptome in two dairy sheep breeds using RNA sequencing. Sci. Rep. 5, 1–11 (2015).
Jia, W., Zhang, R., Zhu, Z. & Shi, L. A high-throughput comparative proteomics of milk fat globule membrane reveals breed and lactation stages specific variation in protein abundance and functional differences between milk of Saanen dairy goat and Holstein bovine. Front. Nutr. 8, 1–13 (2021).
du Plessis, L., Škunca, N. & Dessimoz, C. The what, where, how and why of gene ontology-A primer for bioinformaticians. Brief Bioinform. 12, 723–735 (2011).
Maningat, P. D. et al. Gene expression in the human mammary epithelium during lactation: The milk fat globule transcriptome. Physiol. Genom. 37, 12–22 (2009).
Lemay, D. G. et al. Sequencing the transcriptome of milk production: Milk trumps mammary tissue. BMC Genom. 14, 1–17 (2013).
Deacon, A. M., Blouin, R., Thibault, C. & Lacasse, P. Mechanism underlying the modulation of milk production by incomplete milking. J. Dairy Sci. 106, 783–791 (2023).
Shaban, S. M., Hassan, R. A., Hassanin, A. A. I., Fathy, A. & El Nabtiti, A. A. S. Mammary fat globules as a source of mRNA to model alterations in the expression of some milk component genes during lactation in bovines. BMC Vet. Res. 20(286), 1–12 (2024).
Medrano, J. F., Cánovas, A. & Islas-Trejo, A. RNA Sequencing for the Analysis of Complex Traits in Milk: Detection of Bacteria. In Proceedings, 10th World Congress of Genetics Applied to Livestock Production (California, USA, 2014).
Li, Z. et al. Methods for Preserving Cellular and Milk Fat Globules RNA from Human Milk Samples. Preprint at https://doi.org/10.21203/rs.3.rs-6968867/v1 (2025).
Jiménez-Montenegro, L., Alfonso, L., Soret, B., Mendizabal, J. A. & Urrutia, O. Preservation of milk in liquid nitrogen during sample collection does not affect the RNA quality for RNA-seq analysis. BMC Genom. 26, 525 (2025).
Chen, W., Yu, L., Jia, Z. & Zhu, J. Transcriptome sequencing reveals significant RNA variation in human sperm samples. BMC Res. Notes 18, 288 (2025).
Li, R., Dudemaine, P. L., Zhao, X., Lei, C. & Ibeagha-Awemu, E. M. Comparative analysis of the miRNome of bovine milk fat, whey and cells. PLoS ONE 11, 1–21 (2016).
Bionaz, M., Hurley, W. & Loor, J. Milk protein synthesis in the lactating mammary gland: Insights from transcriptomics analyses. Milk Protein https://doi.org/10.5772/46054 (2012).
Arora, R. et al. Buffalo milk transcriptome: A comparative analysis of early, mid and late lactation. Sci. Rep. 9, 1–9 (2019).
Dai, W. et al. Complementary transcriptomic and proteomic analyses reveal regulatory mechanisms of milk protein production in dairy cows consuming different forages. Sci. Rep. 7, 1–14 (2017).
Bionaz, M. & Loor, J. J. Gene networks driving bovine milk fat synthesis during the lactation cycle. BMC Genom. 9, 1–21 (2008).
Sherman, B. T. et al. DAVID: A web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 50, W216–W221 (2022).
Zamudio-Arroyo, J. M., Peña-Rangel, M. T. & Riesgo-Escovar, R. L. Ubiquitinación: Un sistema de regulación dinámico de los organismos. Revista especializada en ciencias químico-biológicas 15, 1–10 (2012).
Gareau, J. R. & Lima, C. D. The SUMO pathway: Emerging mechanisms that shape specificity, conjugation and recognition. Nat. Rev. Mol. Cell Biol. 11, 861–871 (2010).
Phipps, K. R., Charette, J. M. & Baserga, S. J. The small subunit processome in ribosome biogenesis-progress and prospects. Wiley Interdiscip. Rev. RNA 2, 1–21 (2011).
Catalanotto, C., Barbato, C., Cogoni, C. & Benelli, D. The RNA-binding function of ribosomal proteins and ribosome biogenesis factors in human health and disease. Biomedicines 11, 1–15 (2023).
Li, X. et al. Structures and biological functions of zinc finger proteins and their roles in hepatocellular carcinoma. Biomark. Res. 10, 1–13. https://doi.org/10.1186/s40364-021-00345-1 (2022).
Bionaz, M. & Loor, J. J. Gene networks driving bovine mammary protein synthesis during the lactation cycle. Bioinform. Biol. Insights 5, 83–98 (2011).
Favorit, V., Hood, W. R., Kavazis, A. N. & Skibiel, A. L. Graduate student literature review: Mitochondrial adaptations across lactation and their molecular regulation in dairy cattle. J. Dairy Sci. 104, 10415–10425 (2021).
Álvarez-Delgado, C. The role of mitochondria and mitochondrial hormone receptors on the bioenergetic adaptations to lactation. Mol. Cell Endocrinol. 551, 1–11 (2022).
Bermejo-Haro, M. Y., Camacho-Pacheco, R. T., Brito-Pérez, Y. & Mancilla-Herrera, I. The hormonal physiology of immune components in breast milk and their impact on the infant immune response. Mol. Cell Endocrinol. 572, 1–11 (2023).
Xu, B. et al. Effect of physiological and production activities on the concentration of naturally occurring steroid hormones in raw milk. Int. J. Dairy Technol. 73, 471–478 (2020).
Fustini, M. et al. Overstocking dairy cows during the dry period affects dehydroepiandrosterone and cortisol secretion. J. Dairy Sci. 100, 620–628 (2017).
Vagnerová, K. et al. Profiling of adrenal corticosteroids in blood and local tissues of mice during chronic stress. Sci. Rep. 13, 1–11 (2023).
Kurpińska, A. & Skrzypczak, W. Hormonal changes in dairy cows during periparturient period. Acta Scientiarum Polonorum Zootechnica 18, 13–22 (2020).
Fang, L. et al. Comprehensive analyses of 723 transcriptomes enhance genetic and biological interpretations for complex traits in cattle. Genome Res. 30, 790–801 (2020).
Thum, C., Roy, N. C., Everett, D. W. & McNabb, W. C. Variation in milk fat globule size and composition: A source of bioactives for human health. Crit. Rev. Food Sci. Nutr. 63, 87–113 (2023).
Dowbenko, D., Kikuta, A., Fennie, C., Gillett, N. & Lasky, L. A. Glycosylation-dependent cell adhesion molecule 1 (GlyCAM 1) mucin is expressed by lactating mammary gland epithelial cells and is present in milk. J. Clin. Investig. 92, 952–960 (1993).
Hou, Z. et al. Glycosylation-dependent cell adhesion molecule 1 (GlyCAM 1) is induced by prolactin and suppressed by progesterone in mammary epithelium. Endocrinology 141, 4278–4283 (2000).
Zhang, L. et al. Perspective on calf and mammary gland development through changes in the bovine milk proteome over a complete lactation. J. Dairy Sci. 98, 5362–5373 (2015).
Abdisa, K. B. et al. Metabolic syndrome and biotherapeutic activity of dairy (cow and buffalo) milk proteins and peptides: Fast food-induced obesity perspective—A narrative review. Biomolecules 14, 1–61 (2024).
Nie, C. et al. Structure, biological functions, separation, properties, and potential applications of milk fat globule membrane (MFGM): A review. Nutrients 16, 1–21 (2024).
Lopez, C. Intracellular Origin of Milk Fat Globules, Composition and Structure of the Milk Fat Globule Membrane Highlighting the Specific Role of Sphingomyelin. In Advanced Dairy Chemistry, Volume 2: Lipids 107–131 (Springer International Publishing, 2020).
Fong, B. Y., Norris, C. S. & MacGibbon, A. K. H. Protein and lipid composition of bovine milk-fat-globule membrane. Int. Dairy J. 17, 275–288 (2007).
Guo, Z., Ma, L. & Bu, D. Omics, the new technological approaches to the milk protein researches. Milk Protein-New Res. Appr. https://doi.org/10.5772/intechopen.102490 (2022).
Liang, M. Y. et al. Functional analysis of FABP3 in the milk fat synthesis signaling pathway of dairy cow mammary epithelial cells. In Vitro Cell Dev. Biol. Anim. 50, 865–873 (2014).
Cohen, B. C., Shamay, A. & Argov-Argaman, N. Regulation of lipid droplet size in mammary epithelial cells by remodeling of membrane lipid composition-a potential mechanism. PLoS ONE 10, 1–19 (2015).
Walter, L. et al. Milk fat globule size development in the mammary epithelial cell: A potential role for ether phosphatidylethanolamine. Sci. Rep. 10, 1–13 (2020).
Argov-Argaman, N., Mesilati-Stahy, R., Magen, Y. & Moallem, U. Elevated concentrate-to-forage ratio in dairy cow rations is associated with a shift in the diameter of milk fat globules and remodeling of their membranes. J. Dairy Sci. 97, 6286–6295 (2014).
Vorbach, C., Scriven, A. & Capecchi, M. R. The housekeeping gene xanthine oxidoreductase is necessary for milk fat droplet enveloping and secretion: Gene sharing in the lactating mammary gland. Genes Dev. 16, 3223–3235 (2002).
Ogg, S. L., Weldon, A. K., Dobbie, L., Smith, A. J. H. & Mather, I. H. Expression of butyrophilin (Btn1a1) in lactating mammary gland is essential for the regulated secretion of milk-lipid droplets. Proc. Natl. Acad. Sci. U.S.A. 101, 10084–10089 (2004).
Smith, I. A. et al. BTN1A1, the mammary gland butyrophilin, and BTN2A2 are both inhibitors of T cell activation. J. Immunol. 184, 3514–3525 (2010).
Jeong, J. et al. The PRY/SPRY/B30.2 domain of butyrophilin 1A1 (BTN1A1) binds to xanthine oxidoreductase. Implications for the function of BTN1A1 in the mammary gland and other tissues. J. Biol. Chem. 284, 22444–22456 (2009).
Choudhary, S. et al. Examination of the xanthosine response on gene expression of mammary epithelial cells using RNA-seq technology. J. Anim. Sci. Technol. 60, 1–12 (2018).
Paten, A. M. et al. Functional development of the adult ovine mammary gland-insights from gene expression profiling. BMC Genom. 16, 1–13 (2015).
Zhang, Q. et al. HMOX1 promotes ferroptosis in mammary epithelial cells via FTH1 and is involved in the development of clinical mastitis in dairy cows. Antioxidants 11, 1–20 (2022).
Timón-Gómez, A. et al. Mitochondrial cytochrome c oxidase biogenesis: Recent developments. Semin. Cell Dev. Biol. 76, 163–178 (2018).
Saikia, D. P. et al. Molecular characterization of the mitochondrial 16S rRNA gene of cattle, buffalo and yak. Vet. Arh. 86, 777–785 (2016).
Rathore, S. Structural characterisation of mitochondrial macromolecular complexes using cryo-EM: Mitoribosome biogenesis and respiratory chain supercomplex (Stockholm University, 2020).
Campbell, R. L. & Davies, P. L. Structure–function relationships in calpains. Biochem. J. 447, 335–351 (2012).
Tonami, K. et al. Calpain 6 is involved in microtubule stabilization and cytoskeletal organization. Mol. Cell Biol. 27, 2548–2561 (2007).
Soneson, C., Love, M. I. & Robinson, M. D. Differential analyses for RNA-seq: Transcript-level estimates improve gene-level inferences. F1000Res 4, 1521 (2015).
Jones, E. F., Haldar, A., Oza, V. H. & Lasseigne, B. N. Quantifying transcriptome diversity: A review. Brief. Funct. Genom. 23, 83–94. https://doi.org/10.1093/bfgp/elad019 (2024).
BOE. Royal Decree 53/2013 of 1 February Establishing the Basic Rules Applicable to the Protection of Animals Used for Experimental and Other Scientific Purposes, Including Teaching. (2013).
Keogh, K., Kelly, A. K. & Kenny, D. A. Effect of plane of nutrition in early life on the transcriptome of visceral adipose tissue in Angus heifer calves. Sci. Rep. 11, 1–12 (2021).
Sheng, Q. et al. Multi-perspective quality control of Illumina RNA sequencing data analysis. Brief. Funct. Genom. 16, 194–204 (2017).
Du, L. et al. Integrating genomics and transcriptomics to identify candidate genes for subcutaneous fat deposition in beef cattle. Genomics 114, 1–12 (2022).
Thermo Fisher Scientific. Qubit RNA IQ Assay: A Fast and Easy Fluorometric RNA Quality Assessment. (2018).
Lai, T. C. et al. Achieving the best RNA quality in urologic tumor samples intended for transcriptome analysis. Urol. Sci. 32, 186–192 (2021).
Ahlberg, E., Jenmalm, M. C. & Tingö, L. Evaluation of five column-based isolation kits and their ability to extract miRNA from human milk. J. Cell Mol. Med. 25, 7973–7979 (2021).
Mangiola, S., Thomas, E. A., Modrák, M., Vehtari, A. & Papenfuss, A. T. Probabilistic outlier identification for RNA sequencing generalized linear models. NAR Genom. Bioinform. 3, 1–9 (2021).
Merino, G. A. et al. The impact of quality control in RNA-seq experiments. J. Phys. Conf. Ser. 705, 1–10 (2016).
Sousa, L. P. B. et al. Genome-wide association and functional genomic analyses for various hoof health traits in North American Holstein cattle. J. Dairy Sci. 107, 2207–2230 (2024).
Ng’oma, E. et al. Transcriptome profiling of natural dichromatism in the annual fishes Nothobranchius furzeri and Nothobranchius kadleci. BMC Genom. 15, 1–7 (2014).
Ma, C. et al. Pan-cancer surveys indicate cell cycle-related roles of primate-specific genes in tumors and embryonic cerebrum. Genome Biol. 23, 1–29 (2022).
Love, M. Count Outlier Detection Using Cook’s Distance. https://www.huber.embl.de/DESeq2paper/vignettes/cooksDist.pdf (2014).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2009).
Costa-Silva, J., Domingues, D. & Lopes, F. M. RNA-Seq differential expression analysis: An extended review and a software tool. PLoS ONE 12, 1–18 (2017).
Acknowledgements
This research was funded by the Public University of Navarre within the framework of Young Researcher Projects 2022 and by the Department of University, Innovation, and Digital Transformation of the Government of Navarre, grant number 0011-1408-2022-000024 (RES 95E/2022 of May 17th). The authors would like to thank S.A.T. Etxeberri and Albaitaritza Genetics S.A. for their assistance.
Funding
Navarre Government, 0011-1408-2022-000024; Universidad Pública de Navarra, PJUPNA27-2022
Author information
Contributions
Conceptualization and experiment design: O.U. Investigation: B.S., L.J-M., and O.U. Sample collection: J.A.M., L.A., and L.J-M. Methodology: L.A., L.J-M., and O.U. Laboratory experiment performance: L.J-M. Formal analysis: L.J-M. and O.U. Sequencing data analysis: L.J-M., L.A., and O.U. Writing, original draft preparation: L.J-M. and O.U. Writing, review and editing: B.S., J.A.M., L.A., L.J-M., and O.U.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Jiménez-Montenegro, L., Alfonso, L., Soret, B. et al. Transcriptomic profiling of milk fat globules in cows with different β-casein genotypes. Sci Rep 15, 33511 (2025). https://doi.org/10.1038/s41598-025-17503-2