Introduction

Peroxidases (PODs) are encoded by large multigene families found in many different organisms; these enzymes use hydrogen peroxide (H2O2) as an electron acceptor at their active sites to catalyze oxidation reactions, with the help of metal cofactors. PODs may be divided into two main types, based on their catalytic properties and structural features: heme-containing PODs and non-heme PODs1. Moreover, heme-containing PODs can be further separated into animal and non-animal divisions2. There are three main classes within the non-animal division, Classes I, II, and III3, among which the plant-specific Class III peroxidases (EC 1.11.1.7) are denoted by a number of acronyms, such as POX, POD, Px, PER, and Prx4; we shall use “POD” to refer to Class III peroxidases for the duration of this study.

Class III plant peroxidase enzymes have a highly conserved collection of amino acids that include protoporphyrin IX domains and single peptide chains; they also have eight cysteine residues that form disulfide bridges, as well as two histidine residues that interact with a single heme5,6. These enzymes mainly take part in the peroxide and hydroxyl radical cycles, which limit the production of reactive oxygen species (ROS) and hydrogen peroxide7. Most plant PODs are glycosylated proteins, and their carbohydrate side chains protect them from proteolytic degradation and maintain their stability8,9,10. Additionally, POD proteins support a variety of physiological functions, including the integration of cell wall components, the modification of plant hormones, pathogen resistance, the breakdown of toxic chemicals, and enhancements in salt tolerance11. The function of POD proteins in plants’ responses to biotic and abiotic stress conditions is supported by a number of genetic studies; for example, overexpressing several POD genes (AtPrx22, AtPrx39, and AtPrx69) in Arabidopsis thaliana increases its resistance to cold7, and the tolerance of transgenic tobacco to aluminum stress is increased by upregulating the expression of AtPrx6412. Arabidopsis plants overexpressing the pepper gene CaPOD2 show enhanced resistance to bacterial pathogens, whereas pepper plants in which CaPOD2 is silenced are more vulnerable to Xanthomonas infection13. Furthermore, the GhPOX1 variant in cotton is linked to an increase in the generation of reactive oxygen species14. Taken together, these investigations highlight the beneficial impacts of Class III plant peroxidases on the plant’s response to biological and environmental stressors.

Soybean is among the most widely grown crops in the world and is an essential source of both cooking oil and plant protein15. All stages of soybean growth and maturity are at risk from high-temperature and high-humidity stress, which is worsened by climate change. Such unfavorable conditions can cause a variety of harmful outcomes, including weakened cell membrane integrity, deformed leaf and flower tissues, decreased pollen viability and germination rates, pollen abortion, reduced pod setting, abnormal seed composition, lower seed vitality, slowed or stopped germination, decreased field emergence rates, and diminished nutritional content; together, these effects seriously hinder normal plant growth and significantly reduce crop output and quality16,17,18,19,20. It is estimated that worldwide soybean output decreases by 3.1% for every degree Celsius by which the average global temperature climbs21, and stress from both high temperatures and humidity is a major factor affecting the yield and quality of soybean22. Although the POD gene family has been documented extensively in a variety of plants, how the GmPOD gene family functions under high-temperature and -humidity stress in soybean is still unclear; therefore, characterizing the GmPOD gene family in soybean is crucial, both theoretically and practically, for understanding genes associated with tolerance to this stress. This information may make it easier to understand how soybean responds to stress and to create cultivars that are more resistant to it.
In order to provide insights into how GmPOD family members regulate soybean development and stress responses, we conducted this study with the aims of identifying the GmPOD gene family in soybean and conducting thorough analyses of its phylogenetic relationships, gene structure, domains, chromosomal positioning, collinearity, protein interaction networks, cis-regulatory elements, and tissue-specific expression patterns.

Materials and methods

Identification of GmPOD family members in soybean

The soybean genome sequence (Wm82.a4.v1) was obtained from SoyBase (https://www.soybase.org/) and corresponds to the version in the Phytozome database (https://phytozome-next.jgi.doe.gov/info/Gmax_Wm82_a4_v1). POD protein sequences from pepper (PI 159236 v1.55 genome; https://phytozome-next.jgi.doe.gov/info/Cannuum_PI159236_v1_55) and Arabidopsis (Araport11; https://phytozome-next.jgi.doe.gov/info/Athaliana_Araport11) were obtained from the Phytozome database. Using known POD protein sequences (Additional file 1), Hidden Markov Model (HMM) profiles were constructed and then used to scan the soybean protein database with the HMMER program (version 3.3.2)23. The presence of the Pfam peroxidase domain (PF00141) in the candidate soybean GmPOD proteins was validated using the InterPro database (https://www.ebi.ac.uk/interpro/), the SMART database (http://smart.embl-heidelberg.de/), and NCBI’s Conserved Domain Database (https://www.ncbi.nlm.nih.gov/cdd/). Proteins containing the entire PF00141 domain were identified as GmPOD members, and all other candidates were eliminated. The soybean GmPOD genes were named according to their locations on the physical map. The Compute pI/Mw program (https://web.expasy.org/compute_pi/) was used to determine the isoelectric point (pI) and molecular weight (MW) of each GmPOD protein24. Furthermore, the number of amino acids in each protein was determined using the ProtParam program25.
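The domain-based screening step described above can be illustrated with a short sketch. It assumes that the HMMER search was run with the `--domtblout` option and keeps only proteins whose hit covers most of the profile. The E-value and coverage cutoffs, the function names, and the example rows are illustrative assumptions, not values reported in this study.

```python
# Sketch only: cutoffs and example rows are illustrative assumptions,
# not parameters or data reported in this study.

def parse_domtblout(lines):
    """Yield (protein_id, i_evalue, hmm_coverage) from HMMER --domtblout rows."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        f = line.split()
        protein_id = f[0]                 # target (protein) name
        hmm_len = int(f[5])               # qlen: length of the query profile
        i_evalue = float(f[12])           # independent E-value of this domain
        hmm_from, hmm_to = int(f[15]), int(f[16])
        coverage = (hmm_to - hmm_from + 1) / hmm_len
        yield protein_id, i_evalue, coverage


def select_candidates(lines, max_evalue=1e-5, min_coverage=0.8):
    """Return sorted protein IDs with a significant, near-complete domain hit."""
    keep = set()
    for protein_id, i_evalue, coverage in parse_domtblout(lines):
        if i_evalue <= max_evalue and coverage >= min_coverage:
            keep.add(protein_id)
    return sorted(keep)


# Two made-up rows: protA covers the profile almost fully, protB only a fragment.
demo_rows = [
    "# target name accession tlen query name ...",
    "protA - 320 peroxidase PF00141.26 245 1e-80 270.0 0.1 1 1 1e-83 2e-80 269.5 0.1 2 244 40 300 39 301 0.98 demo",
    "protB - 150 peroxidase PF00141.26 245 1e-20 70.0 0.1 1 1 1e-23 2e-20 69.5 0.1 100 180 10 95 9 96 0.90 demo",
]
candidates = select_candidates(demo_rows)
```

Candidates selected this way would still need the InterPro, SMART, and CDD confirmation described in the text; the coverage filter is only one possible way to approximate "contains the entire domain".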

Chromosomal locations, gene duplication analysis, and synteny analyses of GmPOD genes

GFF3 files were used to collect information on the chromosomal locations of the members of the GmPOD gene family. The “One Step MCScanX” function of TBtools (version 1.120) was used to perform the collinearity analysis of the GmPOD gene family, to detect gene duplication events, and to draw the chromosome mapping and collinearity diagrams26. Circos (version 0.69–9.69) was used to display the locations of gene duplication events27. The duplication patterns and collinearity of GmPOD genes within soybean were investigated using MCScanX (the original version released in 2012), and TBtools (version 1.120) was used to display the results28. Using the Simple Ka/Ks calculator in TBtools, the non-synonymous substitution rate (Ka), the synonymous substitution rate (Ks), and the ratio of non-synonymous to synonymous substitution rates (Ka/Ks) were calculated from the coding sequences26,29.
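As a rough illustration of the first step, gene coordinates can be read from the nine tab-separated columns of a GFF3 file and tallied per chromosome. The sketch below uses only the Python standard library; the gene identifiers and coordinates in the example are invented, and the function names are hypothetical.

```python
from collections import Counter

def gene_locations(gff3_lines, wanted_ids):
    """Return {gene_id: (chrom, start, end, strand)} for the requested genes."""
    locations = {}
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # keep only well-formed 'gene' features
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        gene_id = attrs.get("ID")
        if gene_id in wanted_ids:
            locations[gene_id] = (cols[0], int(cols[3]), int(cols[4]), cols[6])
    return locations

def genes_per_chromosome(locations):
    """Count how many of the located genes fall on each chromosome."""
    return Counter(chrom for chrom, _, _, _ in locations.values())

# Invented example records (not real soybean annotations).
demo_gff3 = [
    "##gff-version 3",
    "Chr01\tdemo\tgene\t1000\t2500\t.\t+\t.\tID=geneA;Name=demoA",
    "Chr01\tdemo\tmRNA\t1000\t2500\t.\t+\t.\tID=geneA.1;Parent=geneA",
    "Chr01\tdemo\tgene\t9000\t9900\t.\t-\t.\tID=geneB",
    "Chr05\tdemo\tgene\t400\t1200\t.\t+\t.\tID=geneC",
    "Chr05\tdemo\tgene\t5000\t6000\t.\t+\t.\tID=geneD",
]
locations = gene_locations(demo_gff3, {"geneA", "geneB", "geneC"})
counts = genes_per_chromosome(locations)
```

The resulting coordinates are the kind of input that chromosome-mapping and collinearity tools expect; the duplication analysis itself was carried out with TBtools and MCScanX, not with this sketch.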

Protein–protein interaction network of GmPOD genes

The protein sequences of the representative transcripts of the 180 identified GmPOD genes were used to search the STRING database (version 11.5)30. The Arabidopsis protein–protein interaction data and reference protein sequence sets were also obtained from the STRING database30. When constructing the GmPOD protein–protein interaction (PPI) network, the default medium-confidence threshold of the database (combined score ≥ 0.4) was used to retain significant interactions and exclude low-confidence ones. The PPI network of the GmPOD family was analyzed using TBtools26, and the results were then visualized using Cytoscape (version 3.9.1).
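The confidence filter described above can be sketched as follows. The example assumes that interactions are available as (protein A, protein B, combined score) tuples with scores already expressed on a 0–1 scale; the protein names and scores are invented, and the degree count is only one simple way of spotting highly connected nodes.

```python
from collections import defaultdict

def filter_edges(edges, min_score=0.4):
    """Keep interactions whose combined score meets the confidence threshold."""
    return [(a, b, score) for a, b, score in edges if score >= min_score]

def node_degrees(edges):
    """Count the number of retained interactions per protein."""
    degree = defaultdict(int)
    for a, b, _ in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)

# Invented example edges (not taken from the STRING results of this study).
demo_edges = [
    ("P1", "P2", 0.92),
    ("P1", "P3", 0.41),
    ("P2", "P3", 0.15),   # below the medium-confidence threshold
    ("P1", "P4", 0.40),   # exactly at the threshold, so it is retained
]
kept = filter_edges(demo_edges)
degrees = node_degrees(kept)
hub = max(degrees, key=degrees.get)
```

In this toy network, P1 has the highest degree, which is the kind of property used to single out central proteins in a PPI network.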

Conserved motif and gene structure analysis of GmPOD genes

TBtools (version 1.120) was used to analyze the exon/intron organization of the soybean GmPOD genes26. The conserved motifs of the GmPOD proteins were identified using the MEME Suite (https://meme-suite.org/meme/index.html); the analysis was limited to a maximum of 10 motifs, and all other parameters were kept at their default values31. The exon/intron structure of each gene (CDS regions marked with yellow boxes and introns marked with black lines) and the distribution of the 10 conserved motifs were then displayed using TBtools (version 1.120).

Phylogenetic tree construction of GmPOD protein

The POD amino acid sequences from Arabidopsis32, pepper33, and soybean (source: https://www.soybase.org) were obtained and combined into a single dataset, after which multiple sequence alignment was carried out on this composite dataset using ClustalW version 2.0.11. The sequences were then aligned using MAFFT version 7. An unrooted phylogenetic tree was generated with the maximum likelihood (ML) approach, using 1000 bootstrap replicates, in MEGA 11.0 software34. The GmPOD genes were categorized according to the topology of the phylogenetic tree and the classification of their Arabidopsis counterparts35. The phylogenetic tree was visualized using the online application iTOL (v6) (http://itol.embl.de)36.

Analysis of cis-acting elements of GmPOD gene family promoters

We obtained the promoter sequences, defined as the two kilobases upstream of the transcription start site (TSS) of every GmPOD gene, from the soybean genome database and searched them for putative cis-acting regulatory elements using the PlantCARE online tool (core version updated in 2012; http://bioinformatics.psb.ugent.be/webtools/plantcare/html/, accessed 27 April 2024)37. We then examined elements linked to phytohormones and stress response38 and used TBtools to visualize the results of the promoter analysis26. When extracting the upstream 2 kb region, two special cases, overlapping promoter regions and promoter regions shorter than 2 kb, were handled with the following standardized strategies. Overlapping promoter regions: when the upstream 2 kb regions of two adjacent genes (for example, gene A and gene B, located on the same chromosome with the same or opposite transcription directions) overlapped (i.e., the distance between the downstream end of gene A and the upstream end of gene B was less than 2 kb), the midpoint between the TSSs of the two genes was taken as the boundary, and the promoter sequence of each gene was extracted up to this boundary. Shorter promoter regions: when the target gene was located at the end of a chromosome, or its upstream region was adjacent to a chromosomal structural region (such as a centromere or telomere), so that the complete 2 kb upstream sequence could not be extracted, the sequence from the gene TSS to the upstream chromosomal boundary or structural region was taken as the promoter sequence.
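The coordinate logic behind these extraction rules can be sketched as below. This is a simplified illustration under stated assumptions: coordinates are 1-based and inclusive, the only structural boundary considered is the chromosome end, and `neighbor_tss` is a hypothetical argument that the caller supplies only when an adjacent gene's upstream window overlaps the current one.

```python
def promoter_interval(tss, strand, chrom_len, neighbor_tss=None, length=2000):
    """Return the 1-based inclusive (start, end) of the upstream promoter window.

    neighbor_tss is the TSS of an adjacent gene whose own window overlaps this
    one; when given, the window is cut at the midpoint between the two TSSs.
    Returns None when no upstream sequence is available.
    """
    if strand == "+":
        # Upstream of a forward-strand gene lies at lower coordinates.
        start, end = max(1, tss - length), tss - 1
        if neighbor_tss is not None:
            start = max(start, (tss + neighbor_tss) // 2)
    elif strand == "-":
        # Upstream of a reverse-strand gene lies at higher coordinates.
        start, end = tss + 1, min(chrom_len, tss + length)
        if neighbor_tss is not None:
            end = min(end, (tss + neighbor_tss) // 2)
    else:
        raise ValueError("strand must be '+' or '-'")
    if start > end:
        return None
    return start, end


full_window = promoter_interval(5000, "+", 100000)            # complete 2 kb window
near_start = promoter_interval(1200, "+", 100000)             # truncated at chromosome start
near_end = promoter_interval(99000, "-", 100000)              # truncated at chromosome end
shared = promoter_interval(5000, "+", 100000, neighbor_tss=4000)  # cut at the midpoint
```

The extracted interval on a reverse-strand gene would still need to be reverse-complemented before motif scanning, a step that is omitted here.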

Tissue-specific expression analysis of GmPOD gene family

The expression levels of GmPOD family members across a variety of soybean tissues, including the roots, stems, young leaves, old leaves, flowers, seeds, embryos, and endosperms, were obtained from the Yanglab website (https://yanglab.hzau.edu.cn/SoyMD/#/transcriptomics/expression?). The original data were converted to log2(TPM + 1) values using TBtools (version 1.120); genes with log2(TPM + 1) < 1 were defined as having low or no expression. The transformed data were then displayed as tissue expression heat maps, grouped by expression level, using TBtools26.
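The transformation described above is simple enough to show directly. In this sketch, the TPM values are invented, and treating a gene as "low or no expression" only when it falls below the cutoff in every tissue is an assumption made for illustration; the source does not state whether the cutoff was applied per tissue or per gene.

```python
import math

def log2_tpm(tpm):
    """Convert a TPM value to log2(TPM + 1)."""
    return math.log2(tpm + 1.0)

def transform(expression, low_cutoff=1.0):
    """Transform {gene: {tissue: TPM}} and list genes below the cutoff everywhere."""
    transformed = {
        gene: {tissue: log2_tpm(value) for tissue, value in tissues.items()}
        for gene, tissues in expression.items()
    }
    low = sorted(
        gene for gene, tissues in transformed.items()
        if all(value < low_cutoff for value in tissues.values())
    )
    return transformed, low

# Invented TPM values for two hypothetical genes.
demo_tpm = {
    "geneA": {"root": 3.0, "leaf": 0.0},
    "geneB": {"root": 0.5, "leaf": 0.2},
}
transformed, low_genes = transform(demo_tpm)
```

Adding 1 before taking the logarithm keeps genes with a TPM of 0 defined (they map to 0) and compresses the range so that a few very highly expressed genes do not dominate the heat map colour scale.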

Prediction of miRNA targeting sites and analysis of GmPOD protein structure

Known soybean miRNA sequences were downloaded from the miRBase database (version 22.1) (https://www.mirbase.org/download/). miRNA targeting sites were predicted using the psRNATarget online tool (version 2.0) (https://www.zhaolab.org/psRNATarget/analysis?function=2)39. The prediction results were screened: predicted sites that met the expectation value threshold were retained, and unreasonable predictions were removed. The three-dimensional structures of the GmPOD proteins were predicted using the AlphaFold2 online platform (version 2.3.1) (https://alphafold.ebi.ac.uk/). Tertiary structure similarity among members of the GmPOD family was evaluated using the TM-score tool (http://zhanggroup.org/TM-score/).

Plant materials, growth conditions, and high-temperature and -humidity treatment

Seeds of the soybean variety “May White” were soaked overnight at room temperature in an incubator, after which they were spread out on damp filter paper and allowed to pre-germinate for 48 h at 25 °C in a Petri dish40. May White is a conventional medium-maturity variety traditionally cultivated in the middle and lower reaches of the Yangtze River, and its germplasm has been registered on the national crop germplasm resources platform (https://www.cgris.net/; registration number: ZJHZ20220518-01). Its agronomic traits are as follows: the plant height is 65–75 cm, with a determinate podding habit; the leaves are ovate, the flowers are white, and the pods are yellowish-brown, with 2–3 seeds per pod; and the seeds are round, with white seed coats and a 100-seed weight of 18–20 g. After pre-germination, the seeds were planted in pots containing a soil and vermiculite mixture (2:1) and cultivated for two weeks at 20 °C in a standard greenhouse with a 16/8-hour light/dark cycle. Thereafter, the seedlings were divided into experimental and control groups. Twelve hours before stress treatment, the potted soybeans in the experimental group were watered heavily in a single application (300 mL per pot) to maintain the soil relative humidity at 85%−90%. During the entire stress treatment period (0–48 h), the temperature was maintained at 40 °C (± 0.5 °C) with a 16/8 h photoperiod (light intensity 400 µmol·m⁻²·s⁻¹), in clear contrast to the control group (20 °C, air relative humidity 50%−55%). In order to evaluate the levels of GmPOD gene expression, leaves from the fourth node were taken at 0, 24, and 48 h; the samples were then subjected to real-time fluorescence quantitative PCR (qRT-PCR) analysis.

RNA extraction and qRT-PCR analysis

RNA was extracted from soybean leaves using a HiPure Universal RNA mini-kit (MGBio, Shanghai, China). Reverse transcription was carried out using Monad™ RTIII Super combined with dsDNase (Monad Biotechnology Co., Ltd., Shanghai, China). Following the manufacturer’s instructions, the MonAmp™ ChemoHS qPCR Mix (Monad Biotechnology Co., Ltd., Shanghai, China) was used to perform quantitative PCR for the target genes on a CFX96 Real-Time PCR machine (Bio-Rad). Relative mRNA expression levels were determined using the comparative threshold cycle (CT) method41, based on three biological replicates. The soybean constitutive actin gene (GmACT1, GenBank accession number: NM_001250579.1) was used as the internal reference gene for normalizing expression levels. Supplementary File 3 contains a list of the primers used for real-time PCR.
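The comparative CT calculation can be illustrated with a minimal sketch. It assumes the commonly used 2^(−ΔΔCT) form of the method, in which the target gene is first normalized to the reference gene and then compared with the control condition; the CT values below are invented, and the function names are hypothetical.

```python
import statistics

def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of the target gene by the 2^(-ddCT) calculation."""
    delta_treated = ct_target_treated - ct_ref_treated   # dCT under stress
    delta_control = ct_target_control - ct_ref_control   # dCT in the control
    return 2.0 ** -(delta_treated - delta_control)

def mean_sd(values):
    """Mean and sample standard deviation across biological replicates."""
    return statistics.mean(values), statistics.stdev(values)

# Invented CT values: the target amplifies 3 cycles earlier under stress,
# while the reference gene is unchanged, giving an 8-fold increase.
fold = relative_expression(22.0, 18.0, 25.0, 18.0)

# Invented fold changes from three biological replicates.
average, sd = mean_sd([7.0, 8.0, 9.0])
```

The calculation assumes roughly equal amplification efficiencies for the target and reference primers, which is the usual precondition for this form of the method.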

Statistical analysis

Three independent replicates were used for each experiment, and the results are shown as the mean ± standard deviation (SD). Duncan’s multiple range test was used to determine the statistical significance of differences, with a P value of less than 0.05 being considered statistically significant.

Results

Genome-wide identification of GmPOD gene family in soybean

In this study, POD protein sequences from Arabidopsis thaliana and Capsicum annuum L. were used to create a Hidden Markov Model (HMM), which was used to identify the soybean GmPOD gene family (Supplementary file 1). In total, 180 GmPOD genes were found in soybean and were named GmPOD-1 through GmPOD-180, based on their positions in the genome (Supplementary file 4). The genomic sequences of the identified GmPOD genes ranged in length from 605 to 21,060 base pairs, and the coding sequences of the putative GmPOD proteins ranged from 315 to 1308 base pairs; the proteins themselves had between 104 and 989 amino acid residues, while their molecular weights (MW) and isoelectric points (pI) ranged from 11.48 to 47.34 kDa and from 4.14 to 9.84, respectively (Supplementary file 4).

Chromosome localization, gene duplication events, and synteny analysis of GmPOD family

To examine the chromosomal distribution of GmPOD genes in soybean, the genes were mapped to their chromosomal locations using the annotation information acquired. The 180 identified GmPOD genes were dispersed among all 20 chromosomes (Fig. 1), as follows: Chr09 had 17 GmPOD genes; Chr02, Chr03, and Chr15 each had 15 genes; Chr17 had 12 genes; Chr11 had 11 genes; Chr01, Chr10, and Chr13 each had 9 genes; Chr14 and Chr16 each had 8 genes; Chr06, Chr12, Chr18, and Chr20 each had 7 genes; Chr04, Chr08, and Chr19 each had 6 genes; Chr07 had 5 genes; and Chr05 had only 1 GmPOD gene (Figs. 1 and 2). Based on each gene’s chromosomal location, homology and gene duplication events were examined to clarify the duplication history of GmPOD genes over the course of evolution, and 80 (44.4%) of the GmPOD genes were found to have tandem duplications. Red lines in Fig. 2 indicate the locations of the 30 pairs of tandem duplications and 56 pairs of segmental duplications that were found in our study, which are also presented in Table 1. A comparison of the collinear regions of the pepper, soybean, and Arabidopsis thaliana genomes showed that many POD genes are homologous across these species (Fig. 3). In particular, 82 POD genes were found to be homologous between Glycine max and Arabidopsis thaliana (Fig. 3).

To better understand the evolutionary dynamics of the GmPOD gene family, and to estimate the divergence times of the duplicated genes and the selective pressures operating on them, the non-synonymous (Ka) and synonymous (Ks) substitution rates of these gene pairs were calculated using TBtools (v.1.120). According to the literature, segmentally and tandemly duplicated gene pairs in plants are subject to positive, purifying, and neutral selection when the Ka/Ks ratio is > 1, < 1, and = 1, respectively42,43. Given that the Ka/Ks ratios were considerably smaller than 1, purifying selection appears to have had a major role in the evolution of GmPOD gene pairs in soybean. Moreover, the formula T = Ks/2λ, as applied to dicotyledons, where λ is the synonymous substitution rate per site per year, was employed to estimate the dates of these duplication events44,45. According to Table 1, these estimations suggest that the tandem duplications may have occurred between 1.38 and 55.16 million years ago, and the segmental duplications between 4.38 and 67.46 million years ago.
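The two calculations in this paragraph, classifying selection from Ka/Ks and dating a duplication with T = Ks/2λ, can be written out as a short sketch. The value of λ is passed in as a parameter because it is not stated in this section; the rate of 6.1 × 10⁻⁹ substitutions per site per year used in the example, like the Ka and Ks values, is an assumption for illustration only.

```python
def selection_type(ka, ks):
    """Classify the selective pressure on a duplicated gene pair from Ka/Ks."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    ratio = ka / ks
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "purifying"
    return "neutral"

def divergence_time_mya(ks, rate):
    """Estimate the duplication date in million years as T = Ks / (2 * rate)."""
    return ks / (2.0 * rate) / 1e6

# Invented values for one hypothetical gene pair.
ASSUMED_RATE = 6.1e-9  # substitutions per site per year (assumption, see lead-in)
pair_selection = selection_type(ka=0.05, ks=0.5)
pair_age = divergence_time_mya(ks=0.122, rate=ASSUMED_RATE)
```

With these example inputs the pair is classified as being under purifying selection, and a Ks of 0.122 corresponds to roughly 10 million years. Because T scales inversely with λ, the estimated dates depend directly on the substitution rate that is chosen.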

Fig. 1

The locations of GmPOD genes on chromosomes. Tandem duplication-containing genes are indicated in red. Every bar chart shows the relevant chromosomal number at the top. On the left side of the figure, the chromosomal size is expressed in millions of base pairs (Mb).

Fig. 2

GmPOD gene homology. Gray lines depict all homologous regions within the soybean genome. The red lines in the center signify gene duplication events.

Table 1 Ka, Ks, and Ka/Ks ratio values for the duplication of gene pairs.
Fig. 3

Synteny analysis of GmPOD genes in soybean alongside two other plants (Arabidopsis and pepper). (A) Comparison of Arabidopsis and soybean. (B) Comparison of soybean and chili pepper. Gray lines indicate significant collinear regions within and between plant genomes; blue lines depict homologous GmPOD gene pairs. The chromosome number is displayed above each chromosome.

Construction of GmPODs family protein interaction network

Analysis of protein–protein interactions (PPIs) is a basic method for understanding how proteins interact and how these interactions affect biological processes. Through close examination of the linkages in a PPI network, interactions between different proteins can be found46,47. Using the STRING database, a network comprising 29 soybean GmPOD proteins was created for this investigation (Fig. 4). As the network illustrates, GmPOD4, GmPOD20, GmPOD30, GmPOD73, GmPOD78, GmPOD148, and GmPOD151 show very robust interactions with other members of the family; thus, these proteins could play a key role in preserving cellular homeostasis and responding to stress conditions associated with high temperatures and humidity. Moreover, GmPOD76, which is located at the center of the network, interacts with several other proteins, suggesting that it is an important candidate for investigating the roles of GmPOD genes in soybean under diverse stress conditions.

Fig. 4

A demonstration of the GmPOD protein interaction network. Each node represents a protein, and each connection represents an interaction. The purple lines represent protein interactions that have been verified through experimentation, the green lines represent protein interactions that are inferred from gene proximity, and the blue lines represent protein interactions that are inferred from gene co-presence.

Phylogenetic analysis of GmPOD proteins

We constructed a phylogenetic tree using 180 GmPOD, 75 CaPOD, and 73 AtPrx proteins, in order to clarify the evolutionary relationships among the members of the POD gene family and to infer their analogous activities; these proteins were categorized into six groups (Groups I–VI), as shown in Fig. 5. Groups I, II, III, and IV were further divided into three (Ia, Ib, and Ic), three (IIa, IIb, and IIc), two (IIIa and IIIb), and two (IVa and IVb) subgroups, respectively. Group I was made up of seventeen AtPrx members, twenty-three CaPOD members, and seventy-three GmPOD members. Group II consisted of ten AtPrx members, nine CaPOD members, and fifteen GmPOD members. Group III, one of the largest groups, consisted of twenty-eight AtPrx members, twenty-eight CaPOD members, and fifty-three GmPOD members. Group IV consisted of fourteen AtPrx members, ten CaPOD members, and seventeen GmPOD members. Group V consisted of four AtPrx members, five CaPOD members, and ten GmPOD members. Group VI had only seven members: GmPOD41, GmPOD44, GmPOD46, GmPOD48, GmPOD94, GmPOD96, and GmPOD100. These results suggest that soybean has evolved many of its PODs during its long-term domestication.

Fig. 5

Analysis of the phylogeny of proteins in soybean, Arabidopsis, and capsicum. The complete amino acid sequences of 180 GmPOD proteins, 73 AtPrx proteins, and 75 CaPOD proteins were aligned using the MUSCLE algorithm within MEGA 11.0. A phylogenetic tree was then built with 1000 bootstrap replicates through MEGA 11.0.

Gene structure and conserved domain analysis of GmPOD family

All members of the GmPOD gene family have numerous exons, according to an intron/exon makeup analysis of these genes. High degrees of conservation and tight evolutionary affinity were shown in the similar intron/exon architecture across the various GmPOD genes within each subgroup (Fig. 6).

Using the MEME Suite (version 5.5.3) (http://meme-suite.org/tools/meme/), we examined the conserved motifs of GmPOD proteins to better understand their functional diversity. Ten conserved motifs (motifs 1–10) were found in the GmPOD protein sequences, as shown in Fig. 7. The phylogenetic tree and MEME analysis show that, although the motif compositions of GmPODs vary greatly between proteins in different phylogenetic groups, they are more consistent within the same group (Fig. 7). Motifs 1, 2, 3, 4, 5, 6, and 7 are common to many GmPODs, whereas motif 9 is found only in Groups II and IV. Overall, the structural domain compositions and configurations of proteins within the same group are comparable, indicating that their activities may be similar, while variations between individual proteins point to functional diversity.

Fig. 6

Phylogenetic affiliation and gene architecture diagram for the soybean GmPOD gene family. With MEGA 11.0, the maximum likelihood approach was used to create the phylogenetic tree, which was backed with 1000 bootstrap repetitions for every branch. TBtools version 1.120 was used to visualize the genetic composition of GmPOD. Black lines represent introns, while a yellow box represents the coding sequence (CDS).

Fig. 7

Examination of the conserved motifs and evolutionary relationships within the soybean GmPOD gene family. Using MEGA 11.0, the maximum likelihood method was used to create the phylogenetic tree, and 1000 bootstrap replicates were used for each branch. Multiple Em Motif Induction (MEME) was used to find conserved motifs. Using TBtools version 1.120 and MEME.XML version 5.5.5, a model representing 10 motifs found in the whole GmPOD amino acid sequence was created. Every conserved motif is shown in the structural diagram as a uniquely colored box, oriented in a way that corresponds to the length of its sequence.

Analysis of cis-acting elements of GmPOD family

Using the PlantCARE database, we searched for cis-acting regulatory elements within the 2000 bp promoter regions of the GmPOD genes in order to explore their possible functions and regulatory processes. Our analysis identified a range of cis-acting elements linked to plant hormone signaling and stress responses, including MeJA response elements (MeJARE), anaerobic response elements (ARE), ABA response elements (ABRE), drought response elements (DRE), low-temperature response elements (LTRE), gibberellin response elements (GARE), defense and stress response elements (DSRE), SA response elements (SARE), and auxin response elements (AuxRE) (Fig. 8). Across the 180 GmPOD genes, the following elements were found: 152 ARE elements, 151 ABRE elements, 123 MeJARE elements, 95 GARE elements, 79 DRE elements, 76 DSRE elements, 71 SARE and AuxRE elements, and 50 LTRE elements (Fig. 8). These findings imply that GmPOD genes may be involved in several plant hormone signaling pathways and stress responses, and thus in how plants respond to their surroundings.

Fig. 8

The relationship between the 2000 bp promoter sequence and the soybean GmPOD gene family was investigated. Using MEGA 11.0, the maximum likelihood method, with 1000 bootstrap repeats per branch, was used to construct the phylogenetic tree. TBtools version 1.098661 was used to display the GmPOD 2000 bp promoter sequence, and the PlantCARE database was used for analysis. Several cis-acting regulatory components are represented with boxes of varying colors.

Analysis of GmPOD family expression in different tissues

We used the SoyMD database (version 2.0) (available at https://yanglab.hzau.edu.cn/SoyMD/#/) to further explore the expression dynamics of GmPOD genes in soybean. The transcriptomic data in this database were derived from normally growing tissues under non-stress conditions. GmPOD gene expression profiles were examined in several soybean tissues, including the roots, stems, young leaves, mature leaves, flowers, seeds, embryos, and endosperm. According to our findings, the constitutive expression of GmPOD gene family members varied noticeably among organs (Fig. 9). The heat map displays the expression patterns of the 180 GmPOD genes across these tissues, with 175 genes detected in at least one of the tissues under investigation (Fig. 9). Fourteen genes (POD128, POD24, POD11, POD55, POD137, POD149, POD59, POD1, POD176, POD85, POD63, POD118, POD157, and POD141) were highly expressed (TPM ≥ 3.0) in soybean floral tissues; three genes (POD38, POD13, and POD53) were comparatively highly expressed (TPM ≥ 3.0) in soybean endosperm tissues; three genes (POD46, POD48, and POD105) had comparatively high expression levels in soybean leaf tissues; four genes (POD147, POD26, POD57, and POD75) displayed comparatively high expression levels (TPM ≥ 3.0) in soybean seed tissues; and two genes (POD45 and POD112) showed increased expression in soybean embryos. Moreover, most genes were expressed at comparatively high levels in soybean roots and stems (Fig. 9). These data indicate that GmPOD genes display distinct tissue-specific expression patterns, which may be related to their functions in coordinating plant development, growth, and stress responses.

Fig. 9

Expression profiles of GmPOD genes in eight different tissues of soybean: root, stem, leaf, bud, pod, tip, callus and seedling. Gene expression data for the GmPOD transcriptome analysis were downloaded from the SoyMD database (https://yanglab.hzau.edu.cn/SoyMD/#/). Heat maps were constructed with TBtools (v.1.120) using log2(TPM + 1) values for each gene, ranging from low expression levels (blue) to high expression levels (red).

Analysis of GmPOD gene expression profile under heat stress

We used qRT-PCR to measure the expression of GmPODs after exposure to high-temperature and -humidity stress in order to explore their possible involvement in the response to these conditions. Nine genes (GmPOD2, GmPOD50, GmPOD67, GmPOD95, GmPOD135, GmPOD144, GmPOD152, GmPOD153, and GmPOD168) were examined; these genes cover the main subfamilies (Ib, Ic, IVb, etc.) of the GmPOD phylogenetic tree (Fig. 5), which helps ensure that they are representative of the entire gene family, and they have complete CDS regions and conserved functional motifs (Figs. 6 and 7), indicating potential enzymatic activity. All nine genes were responsive to high-temperature and -humidity stress. Interestingly, GmPOD135 showed a rapid rise in expression within a brief timeframe (Fig. 10a, b). Among these genes, GmPOD50, GmPOD67, GmPOD153, and GmPOD168, from the Ib, Ic, and IVb subfamilies, showed greater expression levels in leaves under high-temperature and -humidity stress, indicating substantial involvement of these genes in the stress response.

Using relative expression values, we then generated relative expression heat maps for these nine GmPOD genes to measure their sensitivity to high-temperature and -humidity stress (Fig. 10c), and all GmPODs showed higher expression after a stress treatment of 40 °C, compared to the 0-hour treatment. The relative expression levels of GmPOD2 and GmPOD135 in leaves reached their highest at 24 h under the 40 °C condition, whereas GmPOD50 and GmPOD67 reached their peaks at 48 h, as shown in Fig. 10c. In comparison to 0 h, the relative expression of GmPOD67 rose by roughly 10 times at 24 and 48 h, while the expression of GmPOD135 increased by nearly eight times at 24 h.

On the other hand, the relative expression levels of most genes remained largely steady at 24 and 48 h at 20 °C. GmPOD50, GmPOD67, GmPOD153, and GmPOD168 had consistently elevated expression levels during the high-temperature and high-humidity stress, implying that these genes may influence the soybean response to environmental stressors.

Fig. 10

Dynamics of GmPOD gene expression in soybean. (A) Leaf phenotype of soybean after different durations of stress treatment. (B) Relative expression levels of the GmPOD genes in leaves after 24 and 48 h at 20 °C and 40 °C. Data show the means and standard deviations of three biological replicates. Statistical significance was determined with Duncan’s multiple range test and is indicated with asterisks (* p < 0.05, ** p < 0.01, *** p < 0.001). (C) Heat maps showing the expression of GmPOD genes in leaves at different temperatures (20 °C and 40 °C). The color gradient shows the relative expression level of each gene across samples.

MiRNA targeting sites prediction

Using the plant small RNA target server psRNATarget (https://www.zhaolab.org/psRNATarget/analysis?function=2) with default parameters (expectation values set to ≤ 3.0), we identified a total of 49 miRNAs targeting ten of the GmPOD genes (Fig. 11). GmPOD170 was predicted to be targeted by the largest number of miRNAs, while GmPOD168 was targeted by only one miRNA. Most miRNAs were specific to their corresponding GmPOD gene, with cleavage identified as the primary regulatory effect (Fig. 11). The likelihood of miRNA targeting of these GmPOD genes, evaluated by the expectation values from psRNATarget, is positively correlated with circle size in Fig. 11.
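The filtering and counting step described above can be sketched as follows. The record layout, the miRNA and gene identifiers, and the expectation values are hypothetical and do not reproduce the psRNATarget output format; only the ≤ 3.0 expectation cutoff is taken from the text.

```python
from collections import defaultdict

# Hypothetical predictions: (miRNA, target gene, expectation, inhibition mode).
predictions = [
    ("miR_a", "GmPOD_x", 2.5, "Cleavage"),
    ("miR_b", "GmPOD_x", 3.0, "Cleavage"),
    ("miR_c", "GmPOD_x", 4.5, "Translation"),
    ("miR_a", "GmPOD_y", 1.0, "Cleavage"),
]

MAX_EXPECTATION = 3.0  # cutoff stated in the text; lower values mean closer complementarity


def mirnas_per_target(records, max_expectation):
    """Count distinct miRNAs per target gene among records passing the cutoff."""
    targets = defaultdict(set)
    for mirna, gene, expectation, _mode in records:
        if expectation <= max_expectation:
            targets[gene].add(mirna)
    return {gene: len(mirnas) for gene, mirnas in targets.items()}


counts = mirnas_per_target(predictions, MAX_EXPECTATION)
print(counts)
```

Ranking the resulting counts would identify the genes targeted by the most and the fewest miRNAs, which is the comparison reported for GmPOD170 and GmPOD168.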

Fig. 11

Predicted miRNAs targeting ten GmPOD genes in Glycine max. Pink ovals indicate the ten family members, and green solid circles denote the targeting miRNAs. The circle size of each miRNA is inversely correlated with the expectation value determined by psRNATarget, so that larger circles indicate a lower expectation value and thus a higher predicted likelihood of targeting.

GmPOD protein structure analysis

We then used the AlphaFold2 online platform (https://alphafold.ebi.ac.uk/) to predict the 3D structures of the proteins encoded by these ten genes (Fig. 12). Tertiary structural similarity among the GmPOD family members was assessed using the TM-score tool (version 20190822) (http://zhanggroup.org/TM-score/). The results revealed that GmPOD95, GmPOD168, GmPOD50, and GmPOD152 shared higher similarity (all pairwise TM-scores above 0.5), while GmPOD135 displayed distinct structural features compared with the other members (Fig. 12). These findings are consistent with their subfamily classification based on protein sequence similarity.
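The criterion "all pairwise TM-scores above 0.5" can be expressed as a short check. The protein labels and scores below are hypothetical and are not the values underlying Fig. 12; for simplicity the sketch treats each pair as having a single score, although a TM-score depends on which structure is used for normalization.

```python
from itertools import combinations

THRESHOLD = 0.5  # cutoff quoted in the text

# Hypothetical pairwise TM-scores, keyed by unordered protein pair.
scores = {
    frozenset(("A", "B")): 0.82,
    frozenset(("A", "C")): 0.77,
    frozenset(("B", "C")): 0.91,
    frozenset(("A", "D")): 0.41,
    frozenset(("B", "D")): 0.38,
    frozenset(("C", "D")): 0.45,
}


def all_pairs_similar(members, pair_scores, threshold=THRESHOLD):
    """Return True if every pair within `members` scores above the threshold."""
    return all(
        pair_scores[frozenset(pair)] > threshold
        for pair in combinations(members, 2)
    )


core_group_ok = all_pairs_similar(["A", "B", "C"], scores)
with_outlier_ok = all_pairs_similar(["A", "B", "C", "D"], scores)
print(core_group_ok, with_outlier_ok)
```

In this toy example, protein D plays the role of a structurally distinct member: adding it to the group breaks the all-pairs criterion.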

Fig. 12

GmPOD Protein Structure Analysis. Predicted 3D structures of proteins encoded by ten genes and tertiary structural similarity analysis within the GmPOD family.

Discussion

Class III POD enzymes play crucial roles in the control of many physiological processes in plants, especially in responses to various biotic and abiotic stressors32,48. Extensive genome-wide investigations of the POD gene family have been carried out in numerous plant species, including Arabidopsis thaliana32, Populus trichocarpa49, Zea mays5, Pyrus bretschneideri48, Manihot esculenta50, Capsicum annuum51, Litchi chinensis Sonn.52, and Brassica napus53. In this study, a total of 180 GmPOD genes were identified in soybean, and their amino acid sequences and evolutionary relationships allowed us to group the genes into six subfamilies (Fig. 5). This number is higher than those previously reported for Arabidopsis (73), pepper (75), cassava (91), poplar (93), alfalfa (102), and maize (111), suggesting that soybean has a considerably larger POD gene family than these plant species54. Our subfamily classification is consistent with previous findings for POD genes in other plants and highlights the conservation of these genes across many plant species.

Moreover, analyses of conserved domains, motifs, and gene structures support our GmPOD subfamily classification, showing that members of each subgroup have comparable domain arrangements, motif patterns, and exon/intron structures (Figs. 6 and 7). Members of the same class of the GmPOD family generally had comparable motif types and numbers (Fig. 7); for example, GmPOD proteins with similar domain configurations and motif compositions tend to cluster into the same subfamily (Fig. 5), and the phylogenetic analysis results are consistent with the structural makeup of each GmPOD gene cluster. Numerous other species, including pears48, maize5, and potatoes55, also exhibit this pattern, suggesting that GmPOD proteins with comparable domain compositions and motif patterns may have similar activities, as a protein’s domain structure largely determines its activity.

Several studies have demonstrated that, over the course of evolution, introns are progressively integrated and maintained within the genome56. Intron deletion and insertion are frequent events that can lead to a wide range of functional changes during gene evolution. Genes with a comparatively large number of introns may result from the duplication of ancestral genes, which might have started with minor exon shuffling57. Analysis of the structures of the 180 GmPOD genes showed wide variation in the number of introns (zero to six), with most genes having two introns. The presence of multiple introns indicates that the GmPOD gene structure has changed over time. As seen in Figs. 6 and 7, proteins belonging to the same subfamily notably differ in intron/exon architecture and motifs, which suggests that exon shuffling may be a common evolutionary feature of GmPODs and a major contributor to the functional variety of the soybean GmPOD family. The development of multigene families is often influenced by variations in intron/exon structure; of the 180 GmPOD genes, more than half are multigene assemblies. This phenomenon is especially clear in the 4-intron/2-exon model, which is representative of an ancestral intron pattern for the POD gene family and is present in a significant fraction of Arabidopsis32 and pepper33 genes, in addition to a large number of GmPOD genes.

Gene duplication is one of the main processes driving genome evolution57, and segmental and tandem duplication are two types of duplication that are essential to the development of gene families58. These events produce two gene copies and, under relaxed selection pressure, one or both copies may acquire new roles59; as each paralog becomes specialized for a certain task, gene families expand60. Tandem duplication, which frequently results from unequal crossing over, is the duplication of one or more neighboring genes61. Large genomic regions can be duplicated through chromosomal, segmental, or whole-genome duplication, often followed by multiple gene losses and rearrangements60; these are referred to as segmental duplication events. Soybean has had two whole-genome duplications (WGD) and one whole-genome triplication (WGT)62,63; as a result, the soybean genome is highly duplicated and contains many copies of its genes. Within the GmPOD gene family of soybean, we found 86 duplicate gene pairs in total (Table 1), of which 56 are segmental duplicate gene pairs and 30 are tandem duplicate gene pairs; this number is considerably higher than the numbers of duplicate gene pairs found in pear (26 pairs)48, maize (28 pairs)5, and rice (48 pairs)6. Given that segmental duplicate gene pairs outnumber tandem duplicate gene pairs, it is likely that segmental duplication played a major role in the expansion of the GmPOD gene family in soybean. The Ka/Ks ratios of both the segmental and tandem duplicate gene pairs were much lower than one, showing that purifying selection was a major factor in the evolution of the GmPOD gene family in soybean.
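The interpretation of Ka/Ks ratios can be sketched as follows. The pair names and the Ka and Ks values are hypothetical and are not taken from Table 1; the sketch only encodes the conventional reading that a ratio below one suggests purifying selection, a ratio near one neutral evolution, and a ratio above one positive selection.

```python
def selection_mode(ka, ks, tolerance=1e-9):
    """Classify a duplicate gene pair by its Ka/Ks ratio.

    ka: nonsynonymous substitution rate; ks: synonymous substitution rate.
    Returns (ratio, label).
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return ratio, "neutral"
    if ratio < 1.0:
        return ratio, "purifying"
    return ratio, "positive"


# Hypothetical duplicate pairs for illustration only.
pairs = [("pair_1", 0.05, 0.50), ("pair_2", 0.30, 0.20)]
labels = {name: selection_mode(ka, ks)[1] for name, ka, ks in pairs}
print(labels)
```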

According to the phylogenetic analysis, the GmPOD gene family has six subfamilies. Subgroup VI genes stand out from those of the other five subgroups due to their unique multiple-exon signatures, a finding consistent with earlier research in watermelon64, Chinese pears48, and cassava50, which indicates that subgroup VI PODs in these plants may share a common ancestor and may have particular functions. Gene regulatory networks (GRNs) are complex systems that control growth and development in plants; they comprise transcription factors, regulatory RNAs, and enzymes65, and the plant’s response to various environmental stress conditions is largely mediated by these GRNs. The defense regulatory system of plants is stimulated, and downstream gene transcription is activated, via the interaction of transcription factors with cis-regulatory elements in the promoter region66,67, a system essential for controlling plant growth, development, and stress tolerance. To better understand GmPOD regulation, we investigated the cis-acting elements located in the 2.0 kb promoter region. MeJARE, ARE, ABRE, DRE, LTRE, GARE, DSRE, SARE, and AuxRE are examples of cis-acting elements whose functional annotations indicate participation in a variety of regulatory processes, including development, hormone signaling, cell cycle regulation, abiotic/biotic stress responses, and transcription; these results provide a basic understanding of the function of GmPOD genes in several plant processes and stress responses. Auxin, gibberellin, brassinosteroid, ABA, and MeJA are important plant hormones that influence how a plant reacts to stressors such as high temperatures and high humidity68; for example, MeJA specifically increases the activity of antioxidant enzymes to protect cell membranes and support the cells’ capacity for osmotic control when plants are subjected to such stressors.

Gene function may often be inferred from patterns of gene expression69,70; POD gene expression has already been observed and assessed in a number of plant species, including pears48, tobacco71, potatoes54, and peppers33. We therefore used the SoyMD database (https://yanglab.hzau.edu.cn/SoyMD/#/) to investigate GmPOD expression levels in different soybean tissues, namely roots, stems, leaves, buds, seed tissues, apical tissues, calli, and seedlings, and displayed the results using TBtools software. Out of the 180 GmPOD genes, only 5 were found to be either hardly expressed or not expressed at all in the tissues analyzed. The remaining genes were expressed in different tissues and may therefore play important roles in soybean growth and development (Fig. 9), results which support previous studies showing the importance of POD expression in plant growth and development5,32,33,49,50. Group IV includes six genes that were highly expressed in all tissues (GmPOD8, GmPOD90, GmPOD107, GmPOD112, GmPOD153, and GmPOD166), suggesting that these genes may play important maintenance functions in soybean cells grown under typical conditions; nonetheless, some GmPOD genes are expressed differently in distinct tissues. GmPOD26, GmPOD57, GmPOD75, and GmPOD147, for example, were highly expressed in seeds but barely expressed in leaves, suggesting that their primary function is in seed growth and development. It is also noteworthy that roots had the greatest number of GmPOD genes with comparatively high expression levels (Fig. 9). These findings are in line with research on rice6, Arabidopsis32, and maize5, suggesting that the POD family may be essential to plant root function.

Additionally, some GmPOD genes, particularly GmPOD44, GmPOD46, GmPOD105, GmPOD141, and GmPOD168, were strongly expressed in leaves, while GmPOD1, GmPOD11, GmPOD24, GmPOD55, and GmPOD59 all showed increased expression in floral tissues (Fig. 9). These tissue-specific expression patterns point to the functional specialization of these genes and the important roles they play in those tissues. According to an assessment of relative gene expression levels, nine GmPOD genes showed notable changes in expression under high-temperature and high-humidity stress. These genes belonged to the subfamilies Ib, Ic, IIa, IIIa, and IVb, and their promoters contained elements responsive to abiotic stressors, including DRE and ABA-responsive elements; additionally, hormone-related cis-regulatory elements, such as ABRE, GARE, and AuxRE, were found in most of these genes.

The results of our expression study showed that, under conditions of high temperature and high humidity, the expression levels of GmPOD50, GmPOD67, GmPOD152, GmPOD153, and GmPOD168 in leaves initially increased and subsequently decreased over the course of 24 h; over the course of 48 h, this pattern continued for GmPOD50 and GmPOD152. In contrast, the expression levels of GmPOD95 and GmPOD144 first dropped and subsequently rose. Under high-temperature and high-humidity stress, genes of the Ib, Ic, and IVb subfamilies, including GmPOD50, GmPOD67, GmPOD153, and GmPOD168, showed greater expression levels in leaves. After 48 h of stress treatment, the relative expression levels of these four genes had increased in comparison to those of the 0-hour control group, suggesting that these genes are important for the soybean response to these conditions. Cis-elements responsive to plant hormones, which act as important signaling molecules in plant growth, development, and stress response, are found in most of these genes; thus, their response to high humidity and high temperature, evidenced by the upregulation of their expression, may be related to hormone control.

MicroRNAs (miRNAs), small non-coding RNAs of 21–24 nucleotides (nt), regulate gene expression at the post-transcriptional level by targeting complementary mRNAs for cleavage or translational repression, and they play critical roles in diverse plant developmental processes. According to the miRNA target site predictions in Fig. 11, GmPOD170 was targeted by the largest number of miRNAs, while GmPOD168 was targeted by only one miRNA. Most miRNAs were specific to their corresponding GmPOD gene, and cleavage was identified as the primary regulatory mode (Fig. 11). Structural analysis of GmPOD proteins showed that GmPOD95, GmPOD168, GmPOD50, and GmPOD152 had higher similarity, while GmPOD135 showed different structural features compared to the other members (Fig. 12). These findings are consistent with the subfamily classification based on protein sequence similarity.

According to current forecasts, the average global temperature is predicted to rise by 1.5 degrees Celsius within the next 20 years72, and studies have shown that worldwide soybean yield would drop by 3.1% for each degree Celsius of increase in the average global temperature21. For this reason, it is essential to understand how soybean plants react to high-temperature and high-humidity stress, and to improve crop tolerance to these conditions, in order to support human development and ensure worldwide food security. In this study, we used the expression patterns of the GmPOD gene family to identify candidate genes for molecular breeding aimed at improving soybean growth under high-temperature and high-humidity stress. For example, under the high-temperature and high-humidity stress treatment, GmPOD50, GmPOD67, GmPOD153, and GmPOD168 consistently showed comparatively high expression levels, suggesting that these genes may be important for the soybean response to such conditions. The analysis of cis-acting elements in the promoter regions provides a theoretical foundation for further research into how different factors regulate the GmPOD family, which may have complex roles in hormone regulation or signal transduction, as suggested by the presence of many hormone-related cis-acting elements; nevertheless, further investigation is required to verify these hypotheses.

Conclusions

In the current research, a detailed genome-wide analysis of the GmPOD gene family in soybean was carried out, and 180 GmPOD genes were identified. Based on their evolutionary relationships, these genes were arranged into six subfamilies and were found to be distributed throughout 21 chromosomes. To better understand the evolutionary relationships among members of the GmPOD gene family, we evaluated the gene structures, conserved motifs, cis-acting regions, protein interaction networks, and homology of these genes, and performed miRNA target site prediction and GmPOD protein structure analysis. In addition, we used qRT-PCR to explore the GmPOD gene expression patterns in different organs under different stress scenarios, providing information for further studies on the functions of the GmPOD genes. The findings of this research supply a theoretical framework for investigating the molecular regulatory systems of soybean under high-temperature and high-humidity stress and lay the groundwork for discovering important GmPOD candidate genes in soybean that are sensitive to these stressors.