Abstract
Cowpea (Vigna unguiculata (L.) Walp.) is a highly versatile and resilient crop, ranking globally as the third most important grain legume. However, various biotic, abiotic, and physiological challenges often hinder its productivity. Cowpea exhibits complex environmental adaptive responses regulated at the transcriptional and translational levels through mechanisms such as resistance genes (R-genes), transcription-associated proteins (TAPs), and protein kinases (PKs). A comprehensive study based on a whole-genome hybrid assembly (Illumina and Nanopore) of cowpea identified 2188 R-genes (29 classes), 5573 TAPs (118 families) and 1135 PKs (22 groups, 122 families). Among the R-genes, kinases (KIN) and transmembrane proteins (RLKs and RLPs) were prominent, while CCHC (Zn), C2H2, MYB-HB-like, WD40-like, bHLH, and ERF families were notable among TAPs. The largest kinome group, RLK-Pelle, encompassed over three-fifths of the cowpea PKs (VuPKs), followed by the CAMK and CMGC groups. Two novel families in TAPs (ABTB and CW-ZN-B3_VAL) and three in PKs (RLK-Pelle-URK-1, RLK-Pelle-URK-2, TKL-Cr-3) were identified, along with two novel PK groups (NAK and TLK). Dispersed and tandem duplication events under purifying selection mainly contributed to kinome expansion, with chromosome ‘Vu03’ anchoring the maximum number of PKs. This investigation delves into the biological intricacies with manipulative potential to enhance cowpea’s resilience to environmental challenges without compromising yield.
Introduction
Identification of climate resilient crops assumes prominence among the major research priorities for ensuring global food and nutritional security. Legumes are known to evince acclimatizing adaptation to an extensive range of ecological conditions from arid to temperate climates. Among the hardy legumes, cowpea (Vigna unguiculata (L.) Walp.) is a multifarious crop endowed with climate smart attributes and laden with inherent potential to mitigate the vagaries of climate change1,2.
Globally, cowpea is the third most important grain legume crop in terms of area and production, next only to dry beans and chickpea. Cultivated over an estimated area of 15.19 Mha and production of 9.77 Mt3, the productivity of cowpea (643.5 kg/ha) is abysmally low compared to other legumes like chickpea and dry beans. Various biotic and abiotic stresses predominantly hinder cowpea productivity and augmentation of resistance against these factors is a research prerogative of eminence. Even though cowpeas are biologically resilient, the low productivity is essentially attributed to its cultivation under subsistence or marginal conditions with minimal inputs, limited access to improved varieties and largely grown as a mixed intercrop rather than sole crop4. From the genomics perspective, until recently, cowpea genetic improvement was on the backfoot unlike its counterparts like soybean, chickpea, and common bean. Cowpea is a diploid fabid with a chromosome number 2n = 22 and an estimated genome size of about 641 Mbp5. With next-generation sequencing becoming increasingly cost-effective and the various genomic tools and whole genome sequence of cowpea5 becoming readily accessible, the genetic improvement of cowpea has gathered momentum in recent times.
Like other plants, cowpea, when confronted with challenges imposed by various biotic and abiotic stresses, has a myriad of complex adaptive response mechanisms to thwart or minimize the negative impacts stemming out of such situations. The rapid and efficient responses are moderated at the transcriptional or translational levels primarily through resistance genes (R-genes), transcription-associated proteins (TAPs) and protein kinases (PKs) among others. R-genes are largely involved in the response against biotic stresses, especially the disease-causing pathogens6. The disease resistance reaction of plants is the consequence of the interactions between the R-genes and pathogen specific effector molecules called avirulence proteins7. Upon recognizing the invading pathogen, the R-genes elicit defence response signalling cascades against it. These R-genes consist of well conserved domains and motifs such as the N-terminal transmembrane (TM), serine/threonine and tyrosine kinase (STTK), lysin motif (LysM), coiled-coil (CC) and Toll/Interleukin-1 receptor (TIR), central nucleotide binding site (NB-ARC), and the C-terminal leucine rich repeats (LRR)8,9. These conserved structural features are utilized by various bioinformatic tools for mining the R-genes from the whole genome sequences. In plants, among the R-genes, the nucleotide-binding leucine-rich repeat (NLR) class of genes is most widely prevalent10. A collection of 481 NLRs have been validated through experimentation in 31 plant genera11. These R-genes are known to play a pivotal role in disease resistance breeding and therefore, it is imperative to envisage the R-genes prevalent across the genome.
The developmental and resilient adaptive responses of plants to environmental variations are dynamically moderated primarily through the reprogramming of gene expressions by various transcriptional regulatory factors, apart from epigenetic and translational control. In plants, intricate systems of TAPs regulate the transcription of protein encoding genes12,13. TAPs are primarily comprised of: (1) Transcription factors (TFs) that bind in a sequence-specific manner to cis-acting non-coding DNA regulatory regions; (2) Transcriptional regulators (TRs) that bring out regulation by non-specific DNA binding, protein–protein interactions or chromatin remodelling; and (3) Putative TAPs (PTs) with hitherto unknown roles14. Apart from these factors, a variety of other TAPs like RNA polymerases, mediator complexes, polyadenylation factors, transcription elongation and termination factors also work in a coordinated manner to regulate the transcription of genes, ensuring the accurate and controlled synthesis of RNA molecules in response to various cellular signals and environmental cues. As per an estimate, at least 5–7% of the genome encoded proteins in plants are tasked with transcriptional regulation15,16,17. The fraction of TFs in the expressed genome is in positive consonance with the complexity of the organism18.
The third major family of proteins coordinating the signalling pathways in tandem with other regulatory factors are the PKs. Kinomes, the whole set of kinases present in the genome, constituting about 1–2% of functional proteome, usually present a conserved catalytic domain comprising 250–300 amino-acids19. The PKs primarily regulate the adaptive and cellular functions through reversible phosphorylation and post-translational modification of downstream target proteins influencing their activities, localization, inter-protein interactions, and other features20. Activation of the proteins by appending phosphate moieties conjures a cascade of signal transductions eventually modulating the adaptive, developmental, metabolic, and stress-responsive cellular processes. PKs are documented as key elements in plant responses to biotic and abiotic stresses viz., water scarcity, high salinity, low temperature, and pathogen attack20.
In cowpea, the meagre reports on R-genes, TAPs, and PKs encouraged the present study to comprehend these regulatory factors using the hybrid assembly (Illumina and Nanopore) of a whole genome sequence of cowpea cultivar ‘CPD103’. The genome-wide identification and characterization of these regulatory factors will help not only understand the regulatory intricacies controlling the biological processes in legumes but also will pave the way for manipulating the processes to the relative advantage of mankind.
Materials and methods
Materials
The cowpea cultivar ‘CPD103’ is being used at our institute in a cowpea mutation breeding programme for induction of resistance against cowpea aphid-borne mosaic virus. To generate genomic resources, ‘CPD103’, also known as ‘CDS’, was subjected to de novo whole genome sequencing using Illumina and Nanopore sequencing techniques for hybrid assembly.
DNA extraction and quality control
Genomic DNA was extracted in duplicate from young leaves using Qiagen DNeasy Plant Mini kit as per the instructions contained in the kit and previously described21. Finally, DNA was eluted with 50 μl of 10 mM Tris–Cl (pH 8.0). The eluted genomic DNA was quantified and evaluated for quality using Nanodrop 2000 (Thermo Scientific, USA), Qubit (Thermo Scientific, USA), and agarose gel electrophoresis (samples with A260/A280 ratio 1.8–2.0, A260/A230 ratio > 1.8, Qubit concentration > 10–20 ng/µl for Illumina and > 50 ng/µl for Nanopore and those not showing smearing, degradation, RNA contamination or faint bands in gel electrophoresis were only used for sequencing).
Illumina and nanopore library preparation and sequencing
Library construction and sequencing were carried out at M/s Genotypic Technology, Bengaluru, India, as detailed previously21. Briefly, library preparation for the Illumina platform used the NEXTFLEX Rapid DNA-seq kit (BIOO Scientific, Inc. U.S.A.) as per the manufacturer’s directives. The Qubit-quantified DNA (500 ng) was fragmented (200–250 bp) by sonication (Covaris S220, USA), purified, and ligated to multiplex-barcoded adaptors, and the sequencing library was prepared by PCR amplification for 4 cycles using kit-provided primers. The library was thereafter purified, checked for quality, Qubit quantified, and analyzed for fragment size distribution (Agilent 2200 Tape Station). Paired-end sequencing of the equimolar-normalized library post multiplexing was carried out on a HiSeq X Ten Illumina sequencer as per the instructions of the manufacturer (150 cycles). For Nanopore library preparation, the end-repaired (NEBNext ultra II end repair kit, New England Biolabs, MA, USA) and purified DNA (1.5 μg) was ligated with adapter (AMX) at room temperature (20 °C) for 20 min using NEB Quick T4 DNA Ligase (New England Biolabs, MA, USA). The purified reaction mixture was eluted in the elution buffer (15 μl) provided with the ligation sequencing kit (SQK-LSK109), and the eluate constituted the sequencing library. Long read sequencing was accomplished on a GridION X5 (Oxford Nanopore Technologies, Oxford, UK) sequencer with a SpotON flow cell R9.4 (FLO-MIN106) following a 48-h sequencing protocol (20 × depth), and base calling was performed on the raw reads (‘fast5’ format) by Guppy basecaller3 v2.3.4 tool (https://nanoporetech.com/document/Guppy-protocol#windows-guppy). The sequencing samples in both cases consisted of two biological replicates and two technical replicates.
Bioinformatic analysis
The hybrid (MaSuRCA v3.4.2)22 assembled genome was processed for repeat region masking using RepeatModeler v2.023 and RepeatMasker v4.0.624. The completeness of drafted genome assembly was validated (BUSCO v3.0.225) by retrieving the annotation percentage of complete genes. The draft genome of ‘CDS’ sample along with transcript data and reference protein data were used for gene prediction in BRAKER v2.1.4 tool26. The hybrid draft cowpea genome/proteome sequence of ‘CDS’ (Supplementary material SM) was used further for mining R-genes, TAPs, and PKs.
Prediction of R-genes
Putative R-genes in the draft cowpea genome assembly were predicted and annotated by employing Disease Resistance Analysis and Gene Orthology (DRAGO 3)27 pipeline available online through web interface (http://prgdb.org/prgdb4/drago3) using the proteome generated through the BRAKER tool. The DRAGO 3 pipeline makes use of the Pathogen Recognition Genes database (PRGdb v4)27 repository for prediction of putative genes based on the candidate recognition genes. DRAGO 3 performed multiple sequence alignment for each PRG class using MEGA X28 (MUSCLE algorithm with default parameters) prior to creating hidden Markov model (HMM) using HMMER v3 package29. An in-house PERL script filtered the best alignments with a minimum BLOSUM62 score of + 1 and peptides with at least 10 AAs were only considered. The HMM modules created by the PERL script were used to detect LRR, Kinase, NBS and TIR domains, while CC domains and TM domains were detected by DRAGO 3 using COILS v2.230 and TMHMM v2.0c31 programs. In addition, DRAGO 3 also detects LYK and LYP proteins containing LYSM (Lysin motif) in the place of LRR domains and also LECRK proteins containing lectin-like motifs (LECM).
Prediction of transcription associated proteins (TAPs) and protein kinases (PKs)
The TAPs containing the TFs and TRs present in the draft cowpea genome assembly were predicted using three different identification pipelines: the PlantTFcat pipeline32, the iTAK v1.6 pipeline33 and PlantTFDB v5.034 module in PlantRegMap34. The PlantTFcat pipeline utilizes InterProScan v5.59–91.035 to systematically search proteins for TFs/TRs/chromatin remodelling (CR)-related domain signatures. The iTAK pipeline based on PFAM domain models and consensus rules summarized from different pipelines, was used to identify and classify TFs, TRs and PKs from protein or nucleotide sequences into different gene families. PlantTFDB prediction tool adopts an integrative strategy by combining sequence-based prediction (InterProScan), orthologous-based projection, and collection of annotation in canonical sources [The Arabidopsis Information Resource (TAIR36) and UniProt37] to identify TFs. The non-redundant families identified by all three pipelines were used to predict the maximum number of TFs and TAP families in the proteome.
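The pooling of non-redundant families across the three pipelines can be pictured as a set union. The following is only a minimal sketch under stated assumptions: the function name `merge_families` and the toy family lists are hypothetical, and the case and whitespace normalisation is an illustrative choice rather than the procedure used in this study. It does not reconcile families that the pipelines name differently (for example NOZZLE and NZZ/SPL), which would need a curated synonym table.

```python
def merge_families(*pipeline_results):
    """Return the sorted, non-redundant union of family names.

    Each argument is an iterable of family names reported by one pipeline.
    """
    merged = set()
    for families in pipeline_results:
        # Normalise case and surrounding whitespace so that trivially
        # different spellings of the same family collapse into one entry.
        merged.update(name.strip().upper() for name in families)
    return sorted(merged)


# Hypothetical toy outputs, NOT the real pipeline results.
planttfcat = ["C2H2", "WD40-like", "LFY"]
planttfdb = ["bHLH", "C2H2", "LFY"]
itak = ["bHLH", "ERF", "lfy "]

families = merge_families(planttfcat, planttfdb, itak)
print(len(families), families)
```

With real data, each list would be read from the family column of the corresponding pipeline output.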
The cowpea PKs (VuPKs) were predicted using iTAK that is based on significant hit to protein kinase domains (PF00069, PF07714, or PF00481) in the Pfam database38 that were classified into gene families by comparing their sequences to a set of HMMs19. The sub-cellular localizations of these VuPK genes were predicted using CELLO v.2.539 and LOCALIZER v1.0.4 tools40. The VuPK protein sequences were submitted to ProtParam41 (https://web.expasy.org/protparam) to determine the molecular weights and theoretical isoelectric points (pIs).
For comparative analysis of TAPs and PKs, the reference genome of cowpea in NCBI (assembly ASM411807v2) was also analysed using iTAK and the results were compared with the predictions involving our genome.
Expansion mechanisms of VuPK genes
The duplication mechanisms leading to the origin of the VuPK genes were discerned using the Multiple Collinearity Scan toolkit vX42 (MCScanX) software package. MCScanX identified PK homologs along the V. unguiculata genome and categorized the duplication events into tandem and segmental duplications. The PK genes devoid of any duplicates were classified as “singletons”, while those whose duplicates lay fewer than 20 gene ranks away (gene ranks were assigned based on the order of chromosomal location) were considered “proximal duplicates”. Adjacent PK gene pairs differing by a single gene rank were classified as “tandem duplicates”, and those with BLASTp hits more than 20 gene ranks away were termed “dispersed duplicates”. The anchor genes in collinear blocks across chromosomes were regarded as “WGD/segmental duplicates”. Genes with multiple BLASTp hits were uniquely assigned to one of the above classes in accordance with their precedence order (segmental, followed by tandem, proximal, and dispersed). The coding sequences of the tandemly duplicated VuPK genes, after alignment using Clustal Omega43 (EMBL-EBI Job Dispatcher sequence analysis tools framework44), were analysed in MEGA v11.0.13 to determine Ka (non-synonymous substitution)/Ks (synonymous substitution) ratios. The substitution ratios, computed with the standard genetic code following the Nei-Gojobori method (Jukes-Cantor model), served as indicators of the type of selection these VuPK genes were subjected to. Duplicated gene pairs with a Ka/Ks ratio of less than 1 were construed to be under purifying (negative) selection, which conserves amino acid sequences, while those with a ratio of more than 1 were deemed to have undergone positive (Darwinian) selection leading to altered peptides. Duplicated gene pairs with a Ka/Ks ratio equal to 1 were taken to be evolving neutrally, with no selective constraint on amino acid changes45.
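The rank-based rules and the precedence order described above can be sketched as follows. This is an illustration of the stated rules only, not MCScanX's own implementation. The function `classify_gene`, the `(chromosome, rank)` tuple representation and the treatment of a gap of exactly 20 ranks as dispersed are all assumptions made for the example.

```python
# Precedence used when a gene qualifies for several classes.
PRECEDENCE = ["WGD/segmental", "tandem", "proximal", "dispersed"]


def classify_gene(gene, hits, collinear_anchors, max_proximal_gap=20):
    """Assign one duplication class to a gene.

    gene              -- (chromosome, rank) tuple of the query gene
    hits              -- list of (chromosome, rank) tuples for its BLASTp
                         homologs (assumed to exclude the gene itself)
    collinear_anchors -- set of (chromosome, rank) tuples that are anchors
                         in collinear blocks
    """
    if not hits:
        return "singleton"

    labels = set()
    if gene in collinear_anchors:
        labels.add("WGD/segmental")

    for hit in hits:
        same_chrom = hit[0] == gene[0]
        gap = abs(hit[1] - gene[1]) if same_chrom else None
        if same_chrom and gap == 1:
            labels.add("tandem")
        elif same_chrom and gap < max_proximal_gap:
            labels.add("proximal")
        else:
            # Other chromosome, or max_proximal_gap or more ranks away
            # (assumption: a gap of exactly 20 counts as dispersed).
            labels.add("dispersed")

    # Return the highest-precedence label that applies.
    for label in PRECEDENCE:
        if label in labels:
            return label


# Hypothetical example: a gene with an adjacent homolog and a distant one.
print(classify_gene(("Vu03", 10), [("Vu03", 11), ("Vu05", 3)], set()))
```

Because precedence is applied after all hits are inspected, a gene with both an adjacent and a distant homolog is reported once, as tandem.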
Validation of in silico determined R-genes, TAPs and PKs
Twenty gene sequences (CDS) each from the in silico-identified R-genes, TAPs, and PKs were randomly selected for genic primer design using Primer3web v4.1.046 with default parameters. The synthesized primers were used for polymerase chain reaction (PCR) amplification in ten diverse cowpea genotypes (GC3, TC901, C-152, PL-1, ARC-1, PLM211, NBC-1, Vu-89, VBN-3, VBN-1). For each genotype, DNA was extracted from two biological replicates. Each 25 µl PCR reaction contained 75 ng of genomic DNA, 1 µM each of forward and reverse primers, 250 µM dNTPs, 1 × Taq buffer with MgCl2, and 0.85 U of Taq DNA polymerase (Qiagen). Amplifications were performed in a Nexus Eppendorf thermal cycler using the following program: initial denaturation at 94 °C for 4 min; 35 cycles of denaturation at 94 °C for 1 min, annealing at 55–60 °C (depending on primer Tm) for 30 s, and extension at 72 °C for 1 min; followed by a final extension at 72 °C for 6 min. PCR products were separated on 2% agarose gels using a 100 bp DNA ladder as a size marker and visualized using a Syngenius gel documentation system (Syngene, UK). The sizes of the amplified fragments were compared with the expected amplicon lengths to validate primer specificity and amplification efficiency.
Transcriptomic analysis under biotic and abiotic stresses
RNA-seq raw data (Illumina paired-end reads) from cowpea plants subjected to biotic stress (infection with cowpea aphid-borne mosaic virus-CABMV) and abiotic stress (root dehydration) were retrieved from the NCBI Sequence Read Archive. The datasets, accessed via BioProject accessions PRJNA655993 and PRJNA605156, respectively, were then analysed for differential expression of various R-genes, TFs and PKs under these stress conditions. Details pertaining to sampling, stress application methods, sequencing, and cowpea genotypes used for transcriptome analysis have already been published47. Briefly, for biotic stress, young trifoliate leaves of the greenhouse-grown cultivar ‘IT85F-2687’ were mechanically injured with carborundum before applying viral inoculum and leaf samples were collected at 60 min and 16 h post-inoculation. For abiotic stress, root dehydration in the hydroponically grown cultivar ‘Pingo de Ouro’ involved withdrawing the nutrient solution, with root samples taken at 25 min and 150 min after treatment. Both stress conditions were applied during the V3 development stage, and the experimental design consisted of three biological replicates and two technical replicates. Bioinformatic analyses were performed on Galaxy web platform48. Briefly, the raw data were subjected to initial quality assessment with Falco v1.2.449 and the quality was further improved through trimming and filtering using Cutadapt50 with minimum Phred score of 30 and minimum read length of 80 bp. The trimmed paired-end reads were then mapped to the Vigna unguiculata reference genome assembly (ASM411807v2) and gene (gtf) annotation files (downloaded from NCBI) using RNA STAR51 v 2.7.11a (with default settings excepting that the value of 200 was input as the length of genomic sequence around annotated junctions). The resulting BAM files were used for counting the number of reads per annotated gene using FeatureCounts52 v2.0.8 with minimum mapping quality per read of 30. 
The read counts were further used for analysing differential gene expression (DGE) using DESeq253 v2.11.40.8 with normalization for sequencing depth and default settings. The annotated DESeq2 files were filtered to extract genes with a significant change in expression (adjusted p value < 0.05 and |log2FC| > 1) between treated and untreated samples. Volcano plots of the differentially expressed genes were created with ggplot254 v3.5.2 within the Galaxy platform.
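The significance filter (adjusted p value < 0.05 and |log2FC| > 1) can be expressed in a few lines. The sketch below is illustrative only: the function names, the `(gene, log2FC, padj)` row layout and the toy values are hypothetical, and treating a missing adjusted p value as non-significant is an assumption about how such rows would be handled.

```python
import math


def is_significant(log2_fc, padj, fc_cutoff=1.0, alpha=0.05):
    """Apply the thresholds: padj < alpha and |log2FC| > fc_cutoff."""
    if padj is None or math.isnan(padj):
        # Assumption: genes without an adjusted p value are not significant.
        return False
    return padj < alpha and abs(log2_fc) > fc_cutoff


def split_degs(rows):
    """Split significant genes into up- and down-regulated lists."""
    up, down = [], []
    for gene, log2_fc, padj in rows:
        if is_significant(log2_fc, padj):
            (up if log2_fc > 0 else down).append(gene)
    return up, down


# Hypothetical rows (gene, log2FC, padj), NOT results from this study.
rows = [
    ("geneA", 2.3, 0.001),          # up-regulated
    ("geneB", -1.8, 0.02),          # down-regulated
    ("geneC", 0.4, 0.0001),         # significant p, but fold change too small
    ("geneD", 3.0, 0.2),            # large fold change, but not significant
    ("geneE", -2.5, float("nan")),  # no adjusted p value
]
up, down = split_degs(rows)
print(up, down)
```

Both conditions must hold together, so a gene with a large fold change but a weak adjusted p value is excluded, as is a highly significant gene with a small fold change.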
Results
Whole-genome sequencing of cowpea genotype ‘CDS’
The cowpea genome (‘CDS’) was de novo assembled through a hybrid (Illumina and Nanopore) whole genome sequencing approach. Illumina sequencing generated a total of ~ 241 million short reads, while Nanopore sequencing produced ~ 7.7 million long reads, resulting in a sequencing coverage of ~ 120× for Illumina data and ~ 20× for Nanopore data. Prominent assembly features are presented in Supplementary Table S0. The hybrid genome assembly resulted in a haploid genome size of ~ 325 Mb, which covered ~ 87% of the haploid genome size estimated by the KmerGenie program (Supplementary Fig. S1). The final draft assembly was generated after processing the assembled genome for repeat region masking. The completeness of the draft assembly was validated by read utilization and identification of single-copy genes. About 94% read utilization and 93.4% BUSCO (Benchmarking Universal Single-Copy Orthologs) completeness (C: 93.4% (S: 92.0%, D: 1.4%), F: 1.5%, M: 5.1%, n: 5366) confirmed a good draft assembly (Supplementary Fig. S2). The proteome sequence of ‘CDS’ generated by the BRAKER (backronym for Bioinformatics Re-analysis Automation of Known and Expressed Regions) tool was used for downstream analysis.
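The BUSCO notation quoted above is internally consistent: complete BUSCOs (C) are the sum of single-copy (S) and duplicated (D), and C, fragmented (F) and missing (M) add up to 100%. The snippet below is a minimal sketch of that check; the helper `parse_busco` is a hypothetical name, and the regular expression assumes the one-line summary format exactly as quoted in the text.

```python
import re


def parse_busco(summary):
    """Parse a one-line BUSCO summary into a dict of numbers."""
    fields = {}
    for key, value in re.findall(r"([CSDFMn]):\s*([\d.]+)", summary):
        # 'n' is the number of BUSCO groups searched; the rest are percentages.
        fields[key] = int(value) if key == "n" else float(value)
    return fields


summary = "C: 93.4% (S: 92.0%, D: 1.4%), F: 1.5%, M: 5.1%, n: 5366"
busco = parse_busco(summary)

# Complete = single-copy + duplicated (within rounding of one decimal place).
complete_ok = abs(busco["C"] - (busco["S"] + busco["D"])) < 0.05
# Complete + fragmented + missing should account for all searched groups.
total_ok = abs(busco["C"] + busco["F"] + busco["M"] - 100.0) < 0.05
print(busco, complete_ok, total_ok)
```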
Prediction of R-genes
A total of 65,708 protein sequences generated by the BRAKER tool were analysed for prediction of R-genes using DRAGO 3. R-gene related domains and motifs were predicted in 2188 proteins belonging to 28 different classes (Table 1, Supplementary Table S1a). The largest number of proteins containing R-gene related domains belonged to kinases (855), followed by the transmembrane receptors RLKs (Kin-LRR) (258) and RLPs (Ser/Thr-LRR) (238). Eight classes (CNL, TNL, NL, CN, TN, N, CTNL, CNT) harboured nucleotide-binding site domains, encompassing 392 (17.9%) R-domain proteins. The candidate R-genes in cowpea were observed to carry one to five types of domains or motifs. Most of the proteins (1063) carried two types of domains in 13 different combinations, followed by those with three types of domains (610) in 12 different combinations. Single domains (Kin, NBS, LYSM, TIR, TM, LRR, LECM) were observed in 319 proteins encoded by R-genes, while 183 had four types of domains, and only 13 presented five types of domains, all with the lone combination CC-NBS-TM-TIR-LRR. Among the R-genes, proteins with TM-KIN motifs had the maximum representation (686), distantly followed by LRR-TM-KIN (277) and LRR-TM (219). Seven motif combinations (CC-NBS-TM-TIR, CC-LECM-TM, CC-TM-TIR-LRR, CC-TM-LYSM-KIN, LRR-TM-TIR, NBS-TIR, NBS-LRR-TIR) were each represented by a single protein (Table 1). In addition, to ascertain whether the assembled genome properly represented all the classes of R-genes, the reference genome of cowpea in NCBI (assembly ASM411807v2) was also used for R-gene prediction using the DRAGO 3 pipeline (Supplementary Table S1b). The prediction based on the genome assembled in our study identified one class of R-genes, CLEC, that was not present in the reference genome, while the LYP class found in the reference genome remained elusive in ours.
Although the pattern of representation of the different classes of R-genes remained the same in both genome assemblies, the number of proteins within each class was higher in the reference genome, except for the L, LEC and LYS classes (Supplementary Table S1c). The NBS and KIN domains were each represented in 9 classes of R-genes, while LRR domains were found in 10, TIR in 8, LEC in 4, and LYS in 3 classes.
Transcription factors (TFs) and transcription-associated proteins (TAPs)
The repertoire of TFs and TAPs in cowpea was predicted using three different pipelines, and all non-redundant TFs cumulatively identified by these three pipelines were anticipated to be involved in transcriptional regulation. The PlantTFcat pipeline identified a total of 5573 TAP-encoding genes in 98 families, of which CCHC (Zn) (1187), C2H2 (642), MYB-HB like (361), and WD40 like (353) were predominant (Table 2). A total of 33 families showed low representation, each encoded by fewer than 10 genes, while families like JSW1, JMJC-ARID, LFY, MYB-related, NOZZLE, STAT, and TAZ were represented by only one or two genes each. The PlantTFDB pipeline uncovered 2128 genes belonging to 58 families, among which 17 were under-represented with fewer than 10 genes each (Table 2). The prominent TF families included bHLH (195), MYB (167), ERF (149), and C2H2 (128), while NZZ/SPL, HRT-like, LFY, STAT, NF-X1, S1Fa-like, and SAP were represented by only one or two genes each. The iTAK pipeline mined 2198 TFs and 504 TRs from 93 families. The over-represented families, in order of abundance, included MYB (165), bHLH (163), ERF (152) and C2H2 (143). Thirty-four of the TF families were under-represented with fewer than 10 genes each, while HRT, LFY, MED7, SOH1, ULT, and NOZZLE were each represented by a single gene. Altogether, 118 non-redundant families housing the TFs and TAPs were identified using the three pipelines (Supplementary Tables S2a–c). Thus, the largest TF families predicted in cowpea include CCHC-type zinc-finger (CCHC(Zn)), Cys(2)-His(2) type (C2H2), myeloblastosis-homeobox like (MYB-HB like), Trp (W)-Asp (D) repeat proteins (WD-40-like), basic helix-loop-helix (bHLH), MYB, and ethylene response factor (ERF). On comparing the TAPs deduced from the reference and our genomes, it was observed that three families, viz., RB, STAT, and ULT, were found exclusively in the CDS genome.
In general, the number of TAPs in the reference genome was more abundant compared to our genome. In particular, the families belonging to bZIP, C3H, FAR1, HB-BELL, LUG, MADS-MIKC, MYB-related, NAC, PHP and WRKY were predominant (1.2 × to 3.5 ×). In contrast, the TFs belonging to AP2/ERF AP2, B3, C2C2 LSD and CPP were relatively more (1.2 × to 2 ×) in our genome (Supplementary Table S2d).
Genome-wide identification and classification of protein kinases (kinome)
The kinome, comprising the entire set of protein kinases (PKs) encoded by the cowpea genome, was predicted in silico through iTAK (Supplementary Table S3a). A total of 1215 kinases were discerned in cowpea after the exclusion of redundant sequences (Supplementary Table S3b). The identified PKs were classified into groups and families following an approach based on hidden Markov models. Only 1135 of the annotated PKs could be assigned to families following multiple sequence alignment and clustering based on the neighbour-joining method, and these were used for further analysis (Supplementary Table S3c). The 1135 PKs were allocated to 22 groups comprising 122 families (Table 3, Supplementary Table S3a). The receptor-like kinase/Pelle (RLK-Pelle) group was the largest, comprising 56 families and housing about 68.02% (772) of the total PKs in the genome (Fig. 1). The other major groups included Ca2+/calmodulin-dependent protein kinases (CAMK, 86) with 6 families; cyclin-dependent, mitogen-activated, glycogen synthase and CDK (cyclin-dependent kinase)-like protein kinases (CMGC, 76) with 17 families; tyrosine kinase-like kinases (TKL, 57) with 11 families; and serine/threonine kinases (STE, 42) with 6 families. The RLK-Pelle_DLSV family was the largest, with a little more than one-sixth (133) of the RLK-Pelle group PKs. The other major families of the RLK-Pelle group included leucine-rich repeat-XI-1 (LRR-XI-1), leucine-rich repeat-III (LRR-III), receptor-like cytoplasmic kinase-VIIa-2 (RLCK-VIIa-2), L-type lectins (L-LEC), S domain 2b (SD-2b), LRK10-like kinase type 2 (LRK10L-2), and Catharanthus roseus RLK1-like (CrRLK1L-1), each holding 32–58 PKs.
Some of the prominent families from other groups were calcium-dependent protein kinases (CDPK) and CAMK-like checkpoint kinase 1 (CAMKL-CHK1) of the CAMK group, homologous to yeast STE11 (STE11) of the STE group, plant-specific 4 (Pl-4) of the TKL group, ribosomal S6 kinases 2 (RSK-2) of the AGC group, cyclin-dependent kinase-cdc2-related kinase 7-cyclin-dependent kinase 9 (CDK-CRK7-CDK9) of the CMGC group, and nuclear receptor binding protein (NRBP) of the with-no-lysine [K] kinases (WNK) group, each comprising 15–38 members. Of the 25 singleton families, only four were assigned to the largest group, RLK-Pelle. Ninety-nine families were encoded by fewer than 15 genes; only 14 had 20 or more coding genes. The 1135 PKs were unevenly distributed across the 11 cowpea chromosomes. Chromosome 3 contained the highest number, anchoring 169 PKs (14.9%) spanning 68 families, followed by chromosome 5 with 152 PKs (13.4%) across 58 families. In contrast, chromosome 10 harboured the fewest PKs, with 69 members (6.1%) from 42 families, closely followed by chromosome 4, which housed 71 PKs (6.3%) representing 38 families (Fig. 2). One hundred fifty-two of the PK genes, associated with 24 unique families, were devoid of introns, while the rest of the PKs (86.61%) had one or more introns in their genomic structure (Supplementary Table S3c). The average number of introns per family varied from zero (RLK-Pelle_LRR-VII-3) to 28 (PEK_GCN2). Twelve (∼9.8%) of the 122 families, all belonging to the RLK-Pelle group, had an average number of introns between 0 and 0.77, indicating that most of their members do not have introns. In addition, this group also housed one (RLK-Pelle_LRR-XIIIb; mean number of introns: 26.25) of the seven families with a mean of 20 or more introns, reflecting its structural heterogeneity. Among the major groups, STE exhibited the highest average number of introns (14.92), while the largest group, RLK-Pelle, displayed the lowest (4.85).
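The percentages quoted above follow directly from the counts and the total of 1135 VuPKs. The short sketch below reproduces that arithmetic for the four chromosomes whose counts are given in the text and for the intron-containing fraction. The dictionary layout is an illustrative choice, and counts for the remaining chromosomes are deliberately not filled in.

```python
TOTAL_VUPKS = 1135  # PKs assigned to families, as stated in the text

# Only the four chromosomes whose counts are quoted in the text.
pk_counts = {"Vu03": 169, "Vu05": 152, "Vu04": 71, "Vu10": 69}

# Share of the kinome on each chromosome, rounded to one decimal place.
shares = {chrom: round(100 * n / TOTAL_VUPKS, 1) for chrom, n in pk_counts.items()}

# 152 intronless genes leave the remainder with one or more introns.
intronless = 152
with_introns_pct = round(100 * (TOTAL_VUPKS - intronless) / TOTAL_VUPKS, 2)

print(shares)
print(with_introns_pct)
```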
On comparing with the reference genome-based kinome47, it was observed that two PK groups (NAK and TLK) and five families (TKL-Cr-3, RLK-Pelle-URK-1, RLK-Pelle-URK-2, NAK and TLK) were exclusively present in our genome. One (TKL-PI-3) of the 118 families reported in the reference genome remained elusive in our study (Supplementary Table S3d). The disparity in the total number of predicted PKs (1293 in reference vs 1135 in ours) resulted largely from a single group RLK-Pelle (908 vs 772).
Fig. 1. Distribution of VuPKs among 22 kinase groups in the cowpea genome. The circular layout shows each VuPK group (outer ring, in Roman numerals), with bar length and associated Arabic numerals representing the number of genes per group. The variation in bar heights reflects the relative abundance of each group.
Fig. 2. Chromosomal distribution of VuPKs in the cowpea genome. Each arc represents one of the 11 chromosomes (Vu01–Vu11), with the numbers indicating the total count of VuPKs per chromosome. The length of each solid arc is proportional to the number of encoded VuPKs, reflecting their relative abundance. The number of distinct protein kinase families present on each chromosome is listed alongside.
Dispersed duplication was the main mechanism of VuPK expansion in the cowpea genome, accounting for 841 VuPKs (Supplementary Table S4a). None of the VuPK genes showed expansion through a whole genome duplication (WGD) event (Fig. 3). About 10 VuPK genes did not show duplication and were considered singletons (Supplementary Table S4b). Eighty-five VuPK genes belonging to 6 groups (CAMK, CMGC, TTK, WEE, STE and RLK-Pelle) exhibited proximal duplications (Supplementary Table S4c). Seventy-three tandem duplication events covering a total of 198 genes and comprising 119 duplicated gene pairs were identified. The tandemly duplicated genes were observed primarily in 4 groups, viz., CAMK, CMGC, TKL, and RLK-Pelle. About 95% (187) of the tandemly duplicated genes belonged to the RLK-Pelle group (Supplementary Table S4d). The number of tandem duplication events per chromosome varied from 3 to 14, with chromosome 5 (35 genes) and chromosome 7 (31 genes) housing the maximum number of tandemly arrayed VuPK genes. The number of VuPK genes within a tandem array varied from 2 to 7, with chromosome 6 and chromosome 9 carrying the maximum number of genes per tandem event (Supplementary Table S4d).
Expansion mechanisms of protein kinases (VuPKs) across 22 kinase groups in the cowpea genome. The bar plot illustrates the number of VuPKs in each group, with color segments representing different duplication modes: tandem (blue), proximal (orange), dispersed (green), and singleton (yellow). Dispersed duplication is the most prevalent mechanism contributing to kinase group expansion.
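The relative contribution of each duplication mode follows directly from the gene counts reported above. The following is a minimal sketch, assuming the counts of 841 dispersed, 198 tandem, 85 proximal and about 10 singleton genes out of 1135 VuPKs; the variable and function names are hypothetical, and rounding may differ slightly from percentages quoted elsewhere in the article.

```python
# Gene counts per duplication mode as quoted in the text
# (the singleton count is given there as "about 10").
dup_counts = {"dispersed": 841, "tandem": 198, "proximal": 85, "singleton": 10}
TOTAL_VUPKS = 1135  # total predicted cowpea protein kinases


def mode_shares(counts: dict, total: int) -> dict:
    """Percentage of the kinome attributed to each duplication mode."""
    return {mode: 100 * n / total for mode, n in counts.items()}


shares = mode_shares(dup_counts, TOTAL_VUPKS)
for mode, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{mode:<10}{pct:5.1f}%")
```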
Nucleotide substitutions in the coding sequences of the VuPK genes provide the variation on which natural selection acts. The ratio of non-synonymous to synonymous substitution rates (Ka/Ks) is widely used as an indicator of the selection pressure on a gene. Pairwise comparisons between the tandemly duplicated genes showed that the Ka/Ks values varied between 0.08 and 7.75 (Supplementary Table S5), with a mean ratio of 0.67 across the tandem pairs. Eighty-five percent of these gene pairs had a Ka/Ks ratio below one, suggesting that they evolved under purifying selection. About 15% of the gene pairs had a Ka/Ks ratio above one, implicating positive selection in their evolution.
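The interpretation of Ka/Ks used here (below one for purifying selection, above one for positive selection) can be sketched as a small classifier. This is an illustrative sketch only: the function names are hypothetical, the example (Ka, Ks) values are invented for demonstration and are not taken from Supplementary Table S5, and the tool actually used to estimate Ka and Ks is not assumed.

```python
def classify_selection(ka: float, ks: float, tol: float = 1e-9) -> str:
    """Label a duplicated gene pair by its Ka/Ks ratio."""
    if ks <= 0:
        return "undefined"  # ratio cannot be computed without synonymous change
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


def summarize(pairs) -> dict:
    """Fraction of gene pairs falling into each selection class."""
    labels = [classify_selection(ka, ks) for ka, ks in pairs]
    return {label: labels.count(label) / len(labels) for label in set(labels)}


# Hypothetical (Ka, Ks) values for illustration only.
example_pairs = [(0.04, 0.50), (0.20, 0.30), (0.31, 0.04), (0.10, 0.40)]
summary = summarize(example_pairs)
print(summary)
```

Applied to the full set of tandem pairs, a summary of this kind would yield the proportions reported in the text (roughly 85% purifying and 15% positive).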
The subcellular localizations of the VuPKs were predicted with CELLO and LOCALIZER. With CELLO, most of the PKs (419, 36.92%) localized to the plasma membrane, followed by the nucleus (27.67%), cytoplasm (19.91%), chloroplast (6.43%), mitochondria (5.90%) and extracellular space (3.08%); only one PK (0.09%) localized to the endoplasmic reticulum (Fig. 4). About 98.3% of the VuPKs localized to the plasma membrane belonged to the RLK-Pelle group (Supplementary Table S6a). In contrast, LOCALIZER assigned 86 (6%), 46 (4.1%), 502 (44.2%) and 65 (5.7%) VuPKs to chloroplasts, mitochondria, the nucleus without transit peptides and the nucleus with transit peptides, respectively (Supplementary Table S6b).
The isoelectric points (pIs) of the predicted VuPKs varied from 4.2 to 11.08, with molecular weights (MWs) ranging from 9070 to 194,290 Da. Both pI and MW varied widely within groups, which exhibited both extremes of values. CK1 was the only group with a narrow pI range (8.67–10.22) (Supplementary Table S7).
Validation of in silico determined R-genes, TAPs and PKs
All twenty genic primers designed from gene sequences of R-genes, TAPs, and PKs successfully amplified the target regions across the ten cowpea genotypes. While many primers exhibited monomorphic amplification patterns, a subset revealed presence/absence variations among the genotypes. The sizes of the amplified products matched the expected amplicon lengths precisely. The primer sequences along with their expected amplicon lengths are listed in Supplementary Tables S8a–c. Representative amplification profiles for two primers from each gene regulatory class are shown in Figs. 5, 6 and 7 (The original uncropped images of the gels are provided in Supplementary Fig. S3).
Differential gene expression under biotic and abiotic stresses
Biotic stress (cowpea aphid-borne mosaic virus): RNA-seq data from infected and non-infected cowpea plants showed significant upregulation of nine R-genes (Supplementary Table S9a). These R-genes belonged to the typical NBS (1), TNL (1), RLK (1), RLP (1), LECRK (1) and KIN (4) classes, with log2FC ranging from 2.11 (KIN) to 3.11 (NBS). None of the R-genes were downregulated in the resistant cowpea genotype IT85F-2687 after infection with the virus. Twenty-four TFs belonging to AP2/ERF-ERF (8), WRKY (4), TIFY (3), bHLH (2), MYB (2), and one each of GRAS, Jumonji, TCP, SBP and C2H2 were significantly upregulated (log2FC: 1.63–2.8). Conversely, 11 TFs (four AP2/ERF-ERF, three NAC, two WRKY and one each of MYB and PLATZ) were significantly downregulated following infection (log2FC: −2.82 to −1.23). Six PKs (RLK-Pelle_LRR-IV, CMGC_CDK-CRK7-CDK9, STE_STE11, CAMK_CAMKL-CHK1, RLK-Pelle_DLSV and CMGC_CDK-CRK7-CDK9) were upregulated after infection, with log2FC values in the range of 2.1–2.56, while three PKs (CAMK_CAMKL-CHK1, RLK-Pelle_LRR-I-1, RLK-Pelle_CR4L) were significantly downregulated (log2FC: −2.99 to −1.48). The volcano plot of differentially expressed genes in CABMV-infected versus control cowpea plants is depicted in Fig. 8.
Volcano plot depicting differential expression of R-genes, transcription-associated proteins (TAPs), and protein kinases (PKs) in cowpea plants infected with cowpea aphid-borne mosaic virus (CABMV) compared to control plants. Red dots indicate significantly upregulated genes, blue dots represent significantly downregulated genes, and grey dots denote non-significant changes. Selected gene families with significant differential expression are annotated on the plot.
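The three colour classes of a volcano plot correspond to a joint rule on effect size (log2FC) and statistical significance. The sketch below illustrates that rule under stated assumptions: the cut-offs (|log2FC| ≥ 1 and adjusted p < 0.05) and the pseudocount are common conventions assumed here, not thresholds stated in this section, and all function names and example values are hypothetical.

```python
import math


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of normalized expression; the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def volcano_class(log2fc: float, padj: float,
                  fc_cutoff: float = 1.0, alpha: float = 0.05) -> str:
    """Assign a gene to the 'up', 'down' or 'ns' (non-significant) class."""
    if padj < alpha and log2fc >= fc_cutoff:
        return "up"      # drawn in red
    if padj < alpha and log2fc <= -fc_cutoff:
        return "down"    # drawn in blue
    return "ns"          # drawn in grey


# Hypothetical genes: (name, log2FC, adjusted p-value)
genes = [("gene_A", 3.11, 0.001), ("gene_B", -2.82, 0.010),
         ("gene_C", 0.40, 0.200), ("gene_D", 2.50, 0.300)]
classes = {name: volcano_class(fc, p) for name, fc, p in genes}
print(classes)
```

A gene with a large fold change but a non-significant adjusted p-value (such as the hypothetical `gene_D`) stays in the grey class, which is why both axes of the plot are needed.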
Abiotic stress (root dehydration): When cowpea plants were subjected to dehydration, 11 R-genes were upregulated, while 18 were downregulated (Supplementary Table S9b). R-genes belonging to the classes CTNL (2), TNL (2), TN, CK, NL, RLK, CNL, CLK and KIN (1 each), with NBS and/or LRR and kinase domains, were upregulated (log2FC: 1.15–2.45). Likewise, R-genes belonging to RLKs, KINs, TNs, TNLs, LECRK, NLs, CTNLs, CNs, CNLs, and CKs were downregulated (log2FC: −3.69 to −1.11). Dehydration enhanced the expression of RLK-Pelle_DLSV and RLK-Pelle_LRR-XI-1 PKs (log2FC: 1.26–2.45), while other families of the RLK-Pelle group (LRR-XI-2, RLCK-Os, SD-2b, DLSV, RLCK-VIIa-2, LRR-XI-1, LRK10L-2) and the AGC group (RSK-2) were downregulated (log2FC: −2.47 to −1.11). Notably, different isoforms of RLK-Pelle_DLSV and RLK-Pelle_LRR-XI-1 were both up- and downregulated under dehydration. The TFs MADS-MIKC, LOB, and HSF were upregulated (log2FC: 1.08–1.75), while several AP2/ERF TFs (7) were downregulated along with others such as C2H2, WRKY, MYB, NAC, and GRAS (log2FC: −1.70 to −1.04). The volcano plot of differentially expressed genes in cowpea plants subjected to root dehydration versus control plants is depicted in Fig. 9.
Volcano plot showing differential expression of R-genes, transcription-associated proteins (TAPs), and protein kinases (PKs) in cowpea plants subjected to root dehydration stress compared to control plants. Red dots represent significantly upregulated genes, blue dots indicate significantly downregulated genes, and grey dots correspond to non-significant changes. Notable gene families exhibiting significant differential expression are annotated.
Discussion
Cowpea, like other legumes, has evolved intricate molecular networks to mitigate diverse biotic and abiotic stresses. Whole-genome sequencing enables the comprehensive identification and characterization of molecular moderators, including R-genes, TAPs and PKs, facilitating the discovery of key regulators involved in plant stress responses. Our hybrid genome assembly of cowpea, leveraging Illumina and nanopore sequencing data, produced a high-quality draft. Although ~30× coverage is ideal for de novo nanopore assemblies, this study used ~20× nanopore data supplemented with ~120× Illumina reads, balancing cost and computational efficiency without compromising assembly quality. The assembly (~325 Mbp) attained >93% completeness (BUSCO), validating its robustness for functional annotation. The smaller genome size (compared to a previous estimate of 519 Mbp5) is likely due to the tendency of hybrid assemblers to collapse repeat regions55. Optical map-based PacBio sequencing is a more reliable estimator of genome size in repeat-rich genomes like cowpea5. However, this limitation did not compromise our ability to identify key gene families. Functional annotation of biomolecules through computational prediction tools such as Hidden Markov Models (HMM)56 utilizes conserved domains and structural features, offering a cost- and time-effective alternative to experimental methods57.
R-genes
R-genes play a central role in plant immunity by encoding proteins that recognize pathogens and trigger defence responses. Widely used in resistance breeding, the predominant class (NB-LRR or NLR genes58) features a conserved nucleotide-binding domain and a variable leucine-rich repeat domain that determines pathogen specificity. R-genes confer resistance to a wide range of pathogens—including bacteria, fungi, viruses, and nematodes—despite encoding a limited set of proteins with conserved domains58,59,60. Through their modular structure, R-proteins can both recognize pathogen effectors (AVR proteins) and modulate defence signalling. Recognition follows one or more of four models: direct interaction (elicitor-receptor61,62), indirect sensing via modified host targets (guard model59), detection of altered decoy proteins (decoy model63), or incorporation of decoy-like domains within NLRs (integrated decoy model64,65).
With the increasing accessibility of whole-genome sequences, comprehensive analyses of R-genes have been undertaken across several crops, including blackgram, mungbean, chickpea, rice, tomato, Medicago, and Arabidopsis66. In dicots, R-genes typically represent 0.18% (papaya)67 to 5.3% (Arabidopsis)60 of the total gene content. In our study, R-genes comprised 3.3% of the cowpea genome, positioning it well within the reported range. While the proportion is notably higher than that in Medicago58 (1.2%), it is comparable to that in blackgram66 (3.9%), highlighting cowpea’s relatively rich repertoire of immune-related genes among legumes. Strikingly, NBS-domain containing genes accounted for 17.9% of total R-genes in cowpea, more than double the proportion observed in blackgram (8.6%)66. The 392 NBS-domain genes identified in this study are in line with a previous report in cowpea (402)5. The count is also comparable to those reported in other crops such as sorghum (346)68, soybean (319)69, common bean (325)70 and Arachis duranensis (393)71, reinforcing the evolutionary conservation and functional importance of this class. NBS-LRR proteins, including TIR-NBS-LRR (TNL) and coiled coil-NBS-LRR (CNL), are central to effector-triggered immunity (ETI), a critical defence response against pathogen effectors72. Their broad involvement in resistance against fungal (wheat stripe/stem rust73,74, barley powdery mildew75, flax rust76, downy mildew in Arabidopsis77), viral (tobacco mosaic virus in tobacco78) and bacterial (rice blight79, Arabidopsis bacterial wilt80) diseases across crops is well documented, yet their functional relevance in cowpea remains underexplored. Additionally, receptor-like kinases (RLK) and receptor-like proteins (RLP), which mediate pathogen-associated molecular pattern-triggered immunity (PTI)72, comprised 23.3% of total R-genes in cowpea (Table 1), aligning closely with estimates in mungbean (25.7%)66.
Significant differences were observed across the R-gene classes between the reference genome and ours. The CLEC class was detected exclusively in ours, whereas the LYP class was unique to the reference genome. This presence-absence variation suggests genotypic divergence, likely shaped by selective pressures or breeding history81. Such variation also reflects the dynamic nature of the cowpea pan-genome, where non-core genes (genes present in some individuals but not all), often involved in stress responses, contribute disproportionately to genetic diversity81. In addition, our genome exhibited an enrichment of lectin-domain (L, LEC, CLEC) and lysin-motif (LYS) containing R-genes (Supplementary Table S1c), which are key sensors of pathogen-associated molecular patterns like chitin and peptidoglycans, the hallmarks of fungal and bacterial pathogens82. In contrast, the reference genome harboured a higher proportion of canonical R-genes belonging to the CN, CNL, CTNL, RLK, and TNL classes. Thus, our genome is primarily augmented with PTI-related R-genes, while the reference genome shows a relative abundance of ETI-associated R-proteins. This contrasting distribution reveals divergent evolutionary trajectories of immune gene families and suggests potential specialization in pathogen defence across cowpea genotypes. Collectively, these findings not only highlight the richness and diversity of R-genes in cowpea but also underscore the importance of genome-level exploration in uncovering genotype-specific resistance mechanisms. They also provide a strong foundation for functional studies and targeted breeding efforts aimed at enhancing disease resistance in this climate-resilient legume.
Transcription factors (TFs) and transcription associated proteins (TAPs)
TAPs, including TFs, TRs and putative proteins, orchestrate complex gene expression networks that enable plants to respond to developmental cues and environmental stimuli. While TFs directly bind cis-regulatory elements, TRs often function as coactivators/corepressors or as chromatin remodellers. In the pursuit of climate-resilient pulse crop varieties, TFs, which are key regulators of stress and developmental responses, are of paramount importance83.
While several curated databases exist for TF and TAP identification, no single pipeline captures their full diversity13. To address this, we employed a combinatorial approach using PlantTFcat, PlantTFDB, and iTAK, leading to the identification of 6464 TAP-encoding domains from 5226 transcripts, accounting for 9.8% of the genome. This result aligns well with previous reports in cowpea (~7.26% of the transcriptome)13. Although the number of TF families (118) identified in this study was lower than that of Misra et al.13 (136 families), two families (ABTB and CW-Zn-B3_VAL) were uniquely revealed in our assembly, potentially reflecting the increased sensitivity of our hybrid assembly and BRAKER-based annotation strategy. However, these two families have been reported in other legumes such as common bean13. Pipeline-specific differences were also observed on comparison with the previous annotations by Misra et al.13. They identified five TAP families (CW-Zn-B3_VAL, Dicer, JmjC-ARID, Rel, and RF-X) exclusively from the raw cowpea genome using the MAKER84 and AUGUSTUS85 gene prediction tools. In contrast, we were able to detect the first three of these five TAP families even from our transcripts, reinforcing the superiority of BRAKER-based gene prediction. Our study also supports the conservation of stress-regulatory TFs such as NAC and WRKY in cowpea. We identified 99 NAC and 106 WRKY genes, consistent with earlier reports (90 NAC83 and 92 WRKY86). Low-copy TF families (Table 2; HRT, LFY, MED7, SOH1 and ULT with a single copy, and others with few copies), although less represented, play important roles through their specialized, tightly regulated and often conserved functions. Because they have minimal functional redundancy (few or no paralogs), they serve as strategic targets for precision crop improvement through gene editing or transgenic approaches. For instance, in soybean, editing GmE1 (a B3-domain low-copy TF) led to early flowering under long-day conditions87.
Sadhukhan et al.88 identified a DREB2 ortholog in cowpea (VuDREB2A) with implications for drought tolerance and confirmed the role of this candidate gene in conferring water stress tolerance through a transgenic approach. Comparative analysis with the reference genome revealed differential enrichment of TAP families. While all the TAP families in the reference genome were discoverable in ours, three families, RB, ULT, and STAT, were exclusive to our CDS genome. The reference genotype was richer in TFs associated with abiotic/biotic stress responses (e.g., bZIP, NAC, WRKY, MYB-related, C3H)89, and with floral and meristem identity (e.g., MADS-MIKC, FAR1, HB-BELL, LUG, PHP)90. This suggests an adaptive advantage under environmental stresses such as drought, salinity, or pathogen attack. PHP may offer additional epigenetic regulation of flowering time. Conversely, CDS harboured a greater abundance of TFs involved in developmental regulation90, including AP2/ERF-AP2, B3, C2C2-LSD, and CPP. AP2/ERF and B3 TFs are known to regulate seed development and hormone signalling91, while C2C2-LSD is implicated in fine-tuning programmed cell death to limit pathogen spread92. Thus, CDS may exhibit enhanced developmental plasticity, earlier flowering, or higher seed yield under non-stress or moderate stress conditions. These findings underscore functional divergence between the genotypes. While the reference genotype appears better adapted to stress-prone environments, CDS may be optimized for reproductive success and yield stability. Crossbreeding strategies incorporating both could yield cultivars with synergistic improvements in stress resilience and productivity.
Protein kinases (PKs)
Protein kinases form one of the most expansive and functionally diverse gene families in plants, orchestrating complex signalling networks essential for development, environmental sensing, and stress adaptation. In the present study, we identified 1135 VuPKs in cowpea, accounting for 3.6% of predicted proteins, consistent with proportions observed in common bean93 (3.3%), Arabidopsis94 (3.4%), cucumber95 (3.69%), grapevine96 (3.7%) and pineapple97 (2.8%), but lower than in soybean98 (4.7%). This reflects the evolutionary conservation of this regulatory machinery across angiosperms. The slightly higher proportion (4%) and larger gene count (1298) in a previous cowpea kinome47 are likely due to differences in genotype and sequencing depth. We identified 22 PK groups, including two (NAK and TLK) unreported in the reference genome, and 122 families, of which five were novel (TKL-Cr-3, RLK-Pelle-URK-1, RLK-Pelle-URK-2, NAK and TLK). One (TKL-PI-3) of the 118 families in the reference genome-based kinome47 remained elusive in our study. These newly resolved families, largely underexplored in plant systems94,99, expand the known functional repertoire of PKs and highlight the potential for discovering genotype-specific signalling components. Their exclusive detection in our genome suggests lineage-specific expansions or adaptive retention, offering valuable leads for functional validation and targeted crop improvement. Incorporating findings from both studies brings the total known cowpea PK repertoire to at least 123 families across 22 groups. The RLK-Pelle group dominated the cowpea kinome (~68%), mirroring trends in other crops (63–75% in common bean93, Arabidopsis94, pineapple97, soybean98). This was followed by CAMK, CMGC, TKL, STE and AGC; these six groups together formed 94% of the kinome (Table 3, Fig. 1). All groups showed similar abundance in the reference genome and ours, except the RLK-Pelle group.
In the reference genome, we observed a relative enrichment of families like DLSV, LRK10L-2, LRR-XI-1, LRR-XII-1, and WAK within the RLK-Pelle group. This possibly reflects an evolved and diversified receptor system, likely enhancing the plant's ability to sense and respond to a broad range of environmental cues and pathogens100. The predominance of the RLK-Pelle_DLSV family (~17% of the RLK-Pelle group and 11.7% of total VuPKs) and the hierarchy of abundance of the PK groups (RLK-Pelle > CAMK > CMGC > TKL > STE) align with observations in different crop species including common bean93 (Fig. 1). The sparsely represented atypical PK groups with minimal functional redundancy (PI02, TLK, BUB, IRE1, TTK, NAK, PEK, ULK, PI-3, PI-4, SCY1, Aur, and WEE) were congruent with a previous study47 and may serve unique regulatory roles, making them promising candidates for gene function studies. Spatially, VuPKs were unevenly distributed across chromosomes, with Vu03 and Vu05 exhibiting the greatest abundance and diversity, while Vu10 carried the fewest VuPKs (Fig. 2). This finding corroborates a previous study in cowpea47 and also mirrors syntenic patterns seen in common bean93, where chromosomes Pv8 and Pv10 (syntenic with Vu05 and Vu10)5 showed similar trends. The predominance of intron-containing PKs (86.6%) suggests evolutionary selection for structural complexity, potentially enhancing regulatory versatility101. The proportion of intron-less PKs observed (13.4%) was similar to previous reports in cowpea (13.6%)47 and common bean (13.5%)93, and well within the range reported in other crops (9.5–16.6% in grapevine96, pineapple97, and wheat102). The maximum number of introns per family (28) observed in this study is the same as that in other Fabids, including common bean93 and soybean98.
Gene duplication is a pivotal mechanism driving genome evolution and functional diversification, and is responsible for the vast expanse of PKs in plants99. Importantly, dispersed duplication emerged as the primary mechanism driving VuPK expansion (74.2%), followed by tandem (17.4%) and proximal (7.5%) duplications (Fig. 3). This pattern contrasts with legumes like common bean93 and soybean98, where whole-genome or segmental duplications predominate. The lack of recent polyploidy events and a transposon-rich genome5 likely facilitated dispersed and tandem duplications in cowpea, which were responsible for the expansion of ~92% of VuPKs. While all three non-WGD mechanisms were evident in the RLK-Pelle, CAMK, and CMGC groups (Fig. 3), dispersed duplication alone was responsible for expansion in 14 groups (Fig. 3), notably CK1, NEK and WNK. Copies arising through dispersed duplication might be the outcome of different transposition events (replicative, non-replicative or conservative) occurring in different plant genomes42,103. Tandem duplication was the second most frequent mechanism, accounting for the expansion of 17.4% of VuPKs, as previously deduced in cowpea47 and common bean93. PKs expanded through tandem duplication often play roles in biotic stress responses99, and about 85% of tandemly duplicated gene pairs exhibited Ka/Ks < 1, suggesting purifying selection and potential functional redundancy that buffers against gene loss during diversification. Subcellular localization analysis showed that a striking 98.3% of the VuPKs targeted to the plasma membrane belonged to the RLK-Pelle group, consistent with their roles as transmembrane receptors in pathogen detection and hormonal signalling104. Other VuPKs localized to diverse compartments, including the nucleus, cytoplasm, chloroplast, mitochondria, extracellular space and endoplasmic reticulum, reflecting their functional breadth across signalling axes (Fig. 4, Supplementary Table S6a).
The biochemical parameters of VuPKs, such as pI and MW, varied widely even within groups, as in common bean93. This contrasts with other crops like grapevine96, in which the values generally remained similar within a group.
PCR validation of in silico determined genes
All designed primers successfully amplified the expected targets, validating the utility of the genomic data. Due to strong purifying (negative) selection as discussed above, short genic regions (~ 200–300 bp) within exons typically exhibit low polymorphism105. Nevertheless, some primers captured presence-absence variations (Fig. 5), a common feature in regulatory gene families106.
Interplay of R-genes, TFs and PKs under biotic and abiotic stresses
The expression dynamics of R-genes, TAPs, and PKs revealed distinct stress-specific regulatory patterns in cowpea. In the present study, nine R-genes were specifically induced in response to cowpea aphid-borne mosaic virus (CABMV) infection. These included four kinases and one each of a TNL, RLK, RLP and LECRK, in addition to a canonical NLR, aligning with their established roles in pathogen perception and signal activation107,108. Though rarely reported in cowpea, such activation mirrors findings in other legumes. For instance, Co-1 to Co-10 confer resistance to anthracnose (Colletotrichum lindemuthianum) in common bean109, while Phg-1 to Phg-5 and I genes provide resistance against angular leaf spot and bean common mosaic virus, respectively109. In soybean, Rps1–Rps8, rhg1–Rhg4, Rsv1–Rsv4 and Rpp3 (a TNL) mediate resistance to Phytophthora sojae110, soybean cyst nematode111, soybean mosaic virus112, and Phakopsora113 (rust), respectively. In chickpea, the AB4.1 QTL associated with Ascochyta blight encompassed 12 predicted genes including those annotated as NBS-LRR RLK, WAK, zinc finger protein, and STPK114. In mungbean, several NLRs (VrNBS) showed significant activation in response to mungbean yellow mosaic India virus (MYMIV)115. Interestingly, emerging evidence suggests that R-genes, especially those encoding NBS-LRR proteins, may also contribute to abiotic stress responses. In this study, RNA-seq data revealed differential regulation of 29 R-genes under root dehydration stress, with 11 upregulated and 18 downregulated genes, many belonging to the NBS-LRR class. Comparable patterns have been reported in other legumes. In grass pea, nine LsNBS genes (including LsNBS-D18, LsNBS-D204, and LsNBS-D180) exhibited significant stress-dependent expression (both up- and downregulation) under salt stress116. In Arabidopsis, overexpression of ADR1, an NLR gene, enhanced drought tolerance117.
Such findings point to a broader functional scope of R-genes, suggesting their involvement in both biotic and abiotic stress signalling, potentially mediated through crosstalk with hormone-regulated pathways.
TFs demonstrated a complex and context-specific response. CABMV infection upregulated families like TIFY, GRAS, bHLH, TCP, C2H2, SBP, and Jumonji, whereas NAC and PLATZ were exclusively downregulated. TFs like WRKY, MYB, and AP2/ERF showed a mixed response. Most of the upregulated TFs, such as AP2/ERF, MYB and bHLH, are mainly implicated in regulating the synthesis of secondary metabolites such as phenols, lignin, flavonoids and tannins under biotic stress in various crops118,119. At the same time, many of these TFs are also involved in growth and developmental processes118. Therefore, under a given stress, isoforms of these TFs could show contrasting responses within the same genotype, as evident in this study. TFs are also widely implicated in abiotic stress tolerance. In cowpea, two NAC genes, VuNAC1 and VuNAC2, isolated from a drought-hardy genotype imparted tolerance to multiple abiotic stresses such as drought, salinity and oxidative stress7,8. The soybean NAC (GmNAC109) and WRKY (GmWRKY13, GmWRKY21, and GmWRKY54) genes enhance lateral root growth and contribute to drought and salt stress alleviation120,121. A chickpea MYB (1R-MYB) has been reported to co-regulate drought tolerance122. TFs like bZIP, such as OsbZIP62 in rice123, play crucial roles in ABA-mediated signalling pathways and modulate responses to abiotic stresses like drought, salinity and temperature extremes. AP2/ERF and DREB TFs are integral to regulating gene expression in response to abiotic stresses such as drought, salinity and cold, as in cowpea13 and mungbean124. Such TFs, typically associated with abiotic stresses including drought, were predominantly downregulated in this study. This atypical response may reflect adaptation to severe stress (roots completely exposed to air), genotype-specific repression, or a shift toward growth arrest and resource allocation125,126.
In contrast, MADS-MIKC, LOB, and HSF TFs were upregulated under dehydration, suggesting alternative pathways that contribute to root development and protective responses127,128.
A similar trend was observed in PKs. Biotic stress induced the RLK-Pelle, CAMK, and CMGC kinase groups, consistent with their roles in early signal transduction and immune response19. PKs belonging to RLK-Pelle, CMGC, STE and CAMK were also found upregulated in cowpea subjected to CABMV and CPSMV viral infections47. Similarly, significant involvement of RLK-Pelle and CAMK families in response to various stressors was reported in sunflower129. Interestingly, different isoforms of PKs belonging to the same family of the CAMK group were up- and downregulated under CABMV infection. While many PKs are involved in stress response, other isoforms have roles in development and may be downregulated because of the need for resource allocation under stress99. However, several PKs were suppressed under dehydration, possibly reflecting a stress-phase–specific metabolic adjustment. Many families belonging to the RLK-Pelle group (LRR-XI-2, RLCK-Os, SD-2b, RLCK-VIIa-2, LRR-XI-1, LRK10L-2) and the AGC group (RSK-2) were downregulated under root dehydration stress. Likewise, downregulation of RLCK-VIIa-2 was also observed in wheat under waterlogging conditions130. As under biotic stress, different isoforms within the same family (DLSV and LRR-XI-1) were contrastingly expressed under root dehydration. This is congruent with similar observations in wheat130. Interestingly, RLK-Pelle_DLSV was upregulated under both stresses, underscoring its potential role as a convergent signalling hub, similar to reports in cowpea, wheat and sunflower47,129,130.
Thus, the R-genes are primarily involved in signal perception that triggers immunity against invading pathogens, while the PKs relay the signal from the membrane to the nucleus. Signal transduction through this cascade results in phosphorylation or dephosphorylation of the TFs, which modulate the expression of stress-responsive genes by binding to promoters or other cis-elements. The interplay between these three groups occurs at various levels, and their feedback ensures a dynamic, context-specific response that balances defence and growth of the plants.
Conclusion
Cowpea is a hardy legume of high agricultural value, particularly in the context of climate change. As it frequently encounters various stresses, identifying and understanding the roles of key regulatory elements, such as R-genes, TAPs and PKs, in enduring these stresses is essential. The present study provides valuable insights into the repertoire, structural diversity, functional profiles, and genomic organization of these regulatory elements. The highly diversified and structurally complex regulatory units, enriched with novel and under-characterized gene families, may hold the key to unlocking stress tolerance and signalling specificity. The genotype-specific presence of unique gene groups and classes of the regulatory units underscores cowpea’s evolutionary innovation in signal transduction. This presents rich opportunities for molecular breeding and translational research towards developing climate-smart cowpeas.
Data availability
The NGS genomic datasets generated during the current study are available in the NCBI SRA repository [under the accession number PRJNA858559]. All other data generated or analysed during this study are included in this published article (and its Supplementary Information files).
References
Ayalew, T. & Yoseph, T. Cowpea (Vigna unguiculata L. Walp.): A choice crop for sustainability during the climate change periods. J. Appl. Biol. Biotechnol. 10, 154–162. https://doi.org/10.7324/JABB.2022.100320 (2022).
Gonçalves, A., Ribeiro, T., Silva, L. R. & Ferreira, I. C. F. R. Cowpea (Vigna unguiculata L. Walp.), a renewed multipurpose crop for a more sustainable agri-food system: Nutritional advantages and constraints. J. Sci. Food Agric. 96, 2941–2951. https://doi.org/10.1002/jsfa.7644 (2016).
FAOSTAT. Food and agriculture data. https://www.fao.org/faostat (2022).
Mohammed, S. B., Shehu, M. Y. & Singh, B. B. Appraisal of cowpea cropping systems and farmers’ perceptions of production constraints and preferences in the dry savannah areas of Nigeria. CABI Agric. Biosci. 2, 25. https://doi.org/10.1186/s43170-021-00046-7 (2021).
Lonardi, S. et al. The genome of cowpea (Vigna unguiculata [L.] Walp.). Plant J. 98, 767–782. https://doi.org/10.1111/tpj.14349 (2019).
Gururani, M. A., Venkatesh, J. & Tran, L.-S.P. Plant disease resistance genes: Current status and future directions. Physiol. Mol. Plant Pathol. 78, 51–65. https://doi.org/10.1016/j.pmpp.2012.01.002 (2012).
McDowell, J. M. & Woffenden, B. J. Plant disease resistance genes: Recent insights and potential applications. Trends Biotechnol. 21, 178–183. https://doi.org/10.1016/S0167-7799(03)00053-2 (2003).
van Ooijen, G., van den Burg, H. A., Cornelissen, B. J. C. & Takken, F. L. W. Structure and function of resistance proteins in solanaceous plants. Annu. Rev. Phytopathol. 45, 43–72. https://doi.org/10.1146/annurev.phyto.45.062806.094430 (2007).
Li, P. et al. RGAugury: A pipeline for genome-wide prediction of resistance gene analogs (RGAs) in plants. BMC Genomics 17, 852. https://doi.org/10.1186/s12864-016-3197-x (2016).
Jones, J. D. G., Vance, R. E. & Dangl, J. L. Intracellular innate immune surveillance devices in plants and animals. Science 354, aaf6395. https://doi.org/10.1126/science.aaf6395 (2016).
Kourelis, J., Sakai, T., Adachi, H. & Kamoun, S. RefPlantNLR is a comprehensive collection of experimentally validated plant disease resistance proteins from the NLR family. PLoS Biol. 19, e3001124. https://doi.org/10.1371/journal.pbio.3001124 (2021).
Liu, L., White, M. J. & Singer, C. E. Transcription factors and their genes in higher plants. Eur. J. Biochem. 262, 247–257. https://doi.org/10.1046/j.1432-1327.1999.00349.x (1999).
Misra, V. A., Wang, Y. & Timko, M. P. A compendium of transcription factor and transcriptionally active protein coding gene families in cowpea (Vigna unguiculata L.). BMC Genomics 18, 898. https://doi.org/10.1186/s12864-017-4287-7 (2017).
Filiz, E., Vatansever, R. & Ozyigit, I. I. Bioinformatics database resources for plant transcription factors. In Plant Bioinformatics (eds Hakeem, K. R. et al.) 97–116 (Springer, 2017). https://doi.org/10.1007/978-3-319-67156-7_5.
Riechmann, J. L. et al. Arabidopsis transcription factors: Genome-wide comparative analysis among eukaryotes. Science 290, 2105–2110. https://doi.org/10.1126/science.290.5499.2105 (2000).
Qu, L. J. & Zhu, Y. X. Transcription factor families in Arabidopsis: Major progress and outstanding issues for future research. Curr. Opin. Plant Biol. 9, 544–549. https://doi.org/10.1016/j.pbi.2006.07.005 (2006).
Richardt, S., Lang, D., Reski, R., Frank, W. & Rensing, S. A. PlanTAPDB, a phylogeny-based resource of plant transcription-associated proteins. Plant Physiol. 143, 1452–1466. https://doi.org/10.1104/pp.106.091264 (2007).
Levine, M. & Tjian, R. Transcription regulation and animal diversity. Nature 424, 147–151. https://doi.org/10.1038/nature01763 (2003).
Lehti-Shiu, M. D. & Shiu, S.-H. Diversity, classification and function of the plant protein kinase superfamily. Philos. Trans. R. Soc. B 367, 2619–2639. https://doi.org/10.1098/rstb.2012.0003 (2012).
Wang, P. et al. Mapping proteome-wide targets of protein kinases in plant stress responses. Proc. Natl. Acad. Sci. USA 117, 3270–3280. https://doi.org/10.1073/pnas.1919901117 (2020).
Dhanasekar, P. & Souframanien, J. Gamma-rays induced genome wide stable mutations in cowpea deciphered through whole genome sequencing. Int. J. Radiat. Biol. 100, 1072–1084. https://doi.org/10.1080/09553002.2024.2345087 (2024).
Zimin, A. V. et al. The MaSuRCA genome assembler. Bioinformatics 29, 2669–2677. https://doi.org/10.1093/bioinformatics/btt476 (2013).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA 117, 9451–9457. https://doi.org/10.1073/pnas.1921046117 (2020).
Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0. http://www.repeatmasker.org (2015).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212. https://doi.org/10.1093/bioinformatics/btv351 (2015).
Hoff, K. J., Lomsadze, A., Borodovsky, M. & Stanke, M. Whole-genome annotation with BRAKER. Methods Mol. Biol. 1962, 65–95. https://doi.org/10.1007/978-1-4939-9173-0_5 (2019).
García, J. C. et al. PRGdb 4.0: An updated database dedicated to genes involved in plant disease resistance process. Nucleic Acids Res. 50, D1483–D1490. https://doi.org/10.1093/nar/gkab1087 (2022).
Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: Molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 35, 1547–1549. https://doi.org/10.1093/molbev/msy096 (2018).
Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: Interactive sequence similarity searching. Nucleic Acids Res. 39, W29–W37. https://doi.org/10.1093/nar/gkr367 (2011).
Lupas, A., Van Dyke, M. & Stock, J. Predicting coiled coils from protein sequences. Science 252, 1162–1164. https://doi.org/10.1126/science.252.5009.1162 (1991).
Sonnhammer, E. L., von Heijne, G. & Krogh, A. A hidden Markov model for predicting transmembrane helices in protein sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 6, 175–182 (1998).
Dai, X., Sinharoy, S., Udvardi, M. & Zhao, P. X. PlantTFcat: An online plant transcription factor and transcriptional regulator categorization and analysis tool. BMC Bioinform. 14, 321. https://doi.org/10.1186/1471-2105-14-321 (2013).
Zheng, Y. et al. iTAK: A program for genome-wide prediction and classification of plant transcription factors, transcriptional regulators, and protein kinases. Mol. Plant 9, 1667–1670. https://doi.org/10.1016/j.molp.2016.09.014 (2016).
Tian, F., Yang, D. C., Meng, Y. Q., Jin, J. & Gao, G. PlantRegMap: Charting functional regulatory maps in plants. Nucleic Acids Res. 48, D1104–D1113. https://doi.org/10.1093/nar/gkz1020 (2020).
Mulder, N. J. & Apweiler, R. InterPro and InterProScan: Tools for protein sequence classification and comparison. Methods Mol. Biol. 396, 59–70. https://doi.org/10.1385/1-59745-513-9:59 (2007).
Rhee, S. Y. et al. The Arabidopsis information resource (TAIR): A model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res. 31, 224–228. https://doi.org/10.1093/nar/gkg076 (2003).
The UniProt Consortium. UniProt: The universal protein knowledgebase in 2025. Nucleic Acids Res. 53, D609–D617. https://doi.org/10.1093/nar/gkad066 (2025).
Mistry, J. et al. Pfam: The protein families database in 2021. Nucleic Acids Res. 49, D412–D419. https://doi.org/10.1093/nar/gkaa913 (2021).
Yu, C.-S., Chen, Y.-C., Lu, C.-H. & Hwang, J.-K. Prediction of protein subcellular localization. Proteins 64, 643–651. https://doi.org/10.1002/prot.21018 (2006).
Sperschneider, J. et al. Localizer: Subcellular localization prediction of both plant and effector proteins in the plant cell. Sci. Rep. 7, 44598. https://doi.org/10.1038/srep44598 (2017).
Gasteiger, E. et al. Protein identification and analysis tools on the ExPASy server. In The Proteomics Protocols Handbook 571–607 (Humana Press, Totowa, NJ, 2005). https://doi.org/10.1385/1-59259-890-0:571.
Wang, Y. et al. MCScanX: A toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49. https://doi.org/10.1093/nar/gkr1293 (2012).
Sievers, F. & Higgins, D. G. Clustal Omega, accurate alignment of very large numbers of sequences. Methods Mol. Biol. 1079, 105–116. https://doi.org/10.1007/978-1-62703-646-7_6 (2014).
Madeira, F. et al. The EMBL-EBI job dispatcher sequence analysis tools framework in 2024. Nucleic Acids Res. 52, W521–W525. https://doi.org/10.1093/nar/gkae241 (2024).
Nekrutenko, A., Makova, K. D. & Li, W.-H. The KA/KS ratio test for assessing the protein-coding potential of genomic regions: An empirical and simulation study. Genome Res. 12, 198–202. https://doi.org/10.1101/gr.217302 (2002).
Untergasser, A. et al. Primer3—new capabilities and interfaces. Nucleic Acids Res. 40, e115. https://doi.org/10.1093/nar/gks596 (2012).
Ferreira-Neto, J. R. C. et al. The cowpea kinome: Genomic and transcriptomic analysis under biotic and abiotic stresses. Front. Plant Sci. 12, 667013. https://doi.org/10.3389/fpls.2021.667013 (2021).
The Galaxy Community. The Galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update. Nucleic Acids Res. 52, W83–W94. https://doi.org/10.1093/nar/gkae410 (2024).
de Sena Brandine, G. & Smith, A. D. Falco: High-speed FastQC emulation for quality control of sequencing data. F1000Res 8, 1874. https://doi.org/10.12688/f1000research.21142.2 (2021).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10. https://doi.org/10.14806/ej.17.1.200 (2011).
Dobin, A. et al. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21. https://doi.org/10.1093/bioinformatics/bts635 (2013).
Liao, Y., Smyth, G. K. & Shi, W. FeatureCounts: An efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930. https://doi.org/10.1093/bioinformatics/btt656 (2013).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550. https://doi.org/10.1186/s13059-014-0550-8 (2014).
Wickham, H. Ggplot2: Elegant Graphics for Data Analysis (Springer, 2016). https://doi.org/10.1007/978-3-319-24277-4.
Kong, W., Wang, Y., Zhang, S., Yu, J. & Zhang, X. Recent advances in assembly of complex plant genomes. Genom. Proteom. Bioinform. 21, 427–439. https://doi.org/10.1016/j.gpb.2023.04.004 (2023).
Yoon, B. J. Hidden Markov models and their applications in biological sequence analysis. Curr. Genomics 10, 402–415. https://doi.org/10.2174/138920209789177575 (2009).
Peng, W. et al. Improving protein function prediction using domain and protein complexes in PPI networks. BMC Syst. Biol. 8, 35. https://doi.org/10.1186/1752-0509-8-35 (2014).
Ameline-Torregrosa, C. et al. Identification and characterization of nucleotide-binding site-leucine-rich repeat genes in the model plant Medicago truncatula. Plant Physiol. 146, 5–21. https://doi.org/10.1104/pp.107.110041 (2008).
Dangl, J. L. & Jones, J. D. Plant pathogens and integrated defence responses to infection. Nature 411, 826–833. https://doi.org/10.1038/35081161 (2001).
Meyers, B. C., Kozik, A., Griego, A., Kuang, H. & Michelmore, R. W. Genome-wide analysis of NBS-LRR-encoding genes in Arabidopsis. Plant Cell 15, 809–834. https://doi.org/10.1105/tpc.009308 (2003).
Catanzariti, A. M. et al. The AvrM effector from flax rust has a structured C-terminal domain and interacts directly with the M resistance protein. Mol. Plant Microbe Interact. 23, 49–57. https://doi.org/10.1094/MPMI-23-1-0049 (2010).
Steinbrenner, A. D., Goritschnig, S. & Staskawicz, B. J. Recognition and activation domains contribute to allele-specific responses of an Arabidopsis NLR receptor to an oomycete effector protein. PLoS Pathog. 11, e1004665. https://doi.org/10.1371/journal.ppat.1004665 (2015).
van der Hoorn, R. A. & Kamoun, S. From guard to decoy: A new model for perception of plant pathogen effectors. Plant Cell 20, 2009–2017. https://doi.org/10.1105/tpc.108.060194 (2008).
Le Roux, C. et al. A receptor pair with an integrated decoy converts pathogen disabling of transcription factors to immunity. Cell 161, 1074–1088. https://doi.org/10.1016/j.cell.2015.04.025 (2015).
Sarris, P. F. et al. A plant immune receptor detects pathogen effectors that target WRKY transcription factors. Cell 161, 1089–1100. https://doi.org/10.1016/j.cell.2015.04.024 (2015).
Souframanien, J., Raizada, A., Dhanasekar, P. & Suprasanna, P. Draft genome sequence of the pulse crop blackgram (Vigna mungo (L.) Hepper) reveals potential R-genes. Sci. Rep. 11, 11247. https://doi.org/10.1038/s41598-021-90683-9 (2021).
Ming, R. et al. The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature 452, 991–996. https://doi.org/10.1038/nature06856 (2008).
Mace, E. et al. The plasticity of NBS resistance genes in sorghum is driven by multiple evolutionary processes. BMC Plant Biol. 14, 253. https://doi.org/10.1186/s12870-014-0253-z (2014).
Kang, Y. J. et al. Genome-wide mapping of NBS-LRR genes and their association with disease resistance in soybean. BMC Plant Biol. 12, 139. https://doi.org/10.1186/1471-2229-12-139 (2012).
Wu, J., Zhu, J., Wang, L. & Wang, S. Genome-wide association study identifies NBS-LRR-encoding genes related with anthracnose and common bacterial blight in the common bean. Front. Plant Sci. 8, 1398. https://doi.org/10.3389/fpls.2017.01398 (2017).
Song, H. et al. Comparative analysis of NBS-LRR genes and their response to Aspergillus flavus in Arachis. PLoS ONE 12, e0171181. https://doi.org/10.1371/journal.pone.0171181 (2017).
Chisholm, S. T., Coaker, G., Day, B. & Staskawicz, B. J. Host-microbe interactions: Shaping the evolution of the plant immune response. Cell 124, 803–814. https://doi.org/10.1016/j.cell.2006.02.008 (2006).
Liu, W. et al. The stripe rust resistance gene Yr10 encodes an evolutionarily conserved and unique CC-NBS-LRR sequence in wheat. Mol. Plant 7, 1740–1755. https://doi.org/10.1093/mp/ssu112 (2014).
Periyannan, S. et al. The gene Sr33, an ortholog of barley Mla genes, encodes resistance to wheat stem rust race Ug99. Science 341, 786–788. https://doi.org/10.1126/science.1239028 (2013).
Zhou, F. et al. Cell-autonomous expression of barley Mla1 confers race-specific resistance to the powdery mildew fungus via a Rar1-independent signalling pathway. Plant Cell 13, 337–350. https://doi.org/10.1105/tpc.13.2.337 (2001).
Anderson, P. A. et al. Inactivation of the flax rust resistance gene M associated with loss of a repeated unit within the leucine-rich repeat coding region. Plant Cell 9, 641–651. https://doi.org/10.1105/tpc.9.4.641 (1997).
Noel, L. et al. Pronounced intraspecific haplotype divergence at the RPP5 complex disease resistance locus of Arabidopsis. Plant Cell 11, 2099–2112. https://doi.org/10.1105/tpc.11.11.2099 (1999).
Whitham, S. et al. The product of the tobacco mosaic virus resistance gene N: Similarity to toll and the interleukin-1 receptor. Cell 78, 1101–1115. https://doi.org/10.1016/0092-8674(94)90283-6 (1994).
Iyer, A. S. & McCouch, S. R. The rice bacterial blight resistance gene xa5 encodes a novel form of disease resistance. Mol. Plant Microbe Interact. 17, 1348–1354. https://doi.org/10.1094/MPMI.2004.17.12.1348 (2004).
Deslandes, L. et al. Resistance to Ralstonia solanacearum in Arabidopsis thaliana is conferred by the recessive RRS1-R gene, a member of a novel family of resistance genes. Proc. Natl. Acad. Sci. USA 99, 2404–2409. https://doi.org/10.1073/pnas.032485099 (2002).
Liang, Q. et al. A view of the pan-genome of domesticated Cowpea (Vigna unguiculata [L.] Walp.). Plant Genome 17, e20319. https://doi.org/10.1002/tpg2.20319 (2024).
Naithani, S., Komath, S. S., Nonomura, A. & Govindjee, G. Plant lectins and their many roles: Carbohydrate-binding and beyond. J. Plant Physiol. 266, 153531. https://doi.org/10.1016/j.jplph.2021.153531 (2021).
Srivastava, R. & Sahoo, L. Genome-wide analysis of cowpea NAC family elucidating the genetic and molecular relationships that interface stress and growth regulatory signals. Plant Gene 31, 100363. https://doi.org/10.1016/j.plgene.2022.100363 (2022).
Cantarel, B. L. et al. MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008).
Stanke, M. & Morgenstern, B. AUGUSTUS: A web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res. 33, W465–W467 (2005).
Matos, M. K. D. S. et al. The WRKY transcription factor family in cowpea: Genomic characterization and transcriptomic profiling under root dehydration. Gene 823, 146377. https://doi.org/10.1016/j.gene.2022.146377 (2022).
Wan, Z. et al. CRISPR/Cas9-mediated targeted mutation of the E1 decreases photoperiod sensitivity, alters stem growth habits, and decreases branch number in soybean. Front. Plant Sci. 13, 1066820. https://doi.org/10.3389/fpls.2022.1066820 (2022).
Sadhukhan, A. et al. VuDREB2A, a novel DREB2-type transcription factor in the drought-tolerant legume cowpea, mediates DRE-dependent expression of stress-responsive genes and confers enhanced drought resistance in transgenic Arabidopsis. Planta 240, 645–664. https://doi.org/10.1007/s00425-014-2111-5 (2014).
Baillo, E. H., Kimotho, R. N., Zhang, Z. & Xu, P. Transcription factors associated with abiotic and biotic stress tolerance and their potential for crops improvement. Genes 10, 771. https://doi.org/10.3390/genes10100771 (2019).
Jin, J. P. et al. PlantTFDB 4.0: Toward a central hub for transcription factors and regulatory interactions in plants. Nucleic Acids Res. 45, D1040–D1045 (2017).
Yuan, H. Y., Kagale, S. & Ferrie, A. M. R. Multifaceted roles of transcription factors during plant embryogenesis. Front. Plant Sci. 14, 1322728. https://doi.org/10.3389/fpls.2023.1322728 (2024).
Dietrich, R. A., Richberg, M. H., Schmidt, R., Dean, C. & Dangl, J. L. A novel zinc finger protein is encoded by the Arabidopsis LSD1 gene and functions as a negative regulator of plant cell death. Cell 88, 685–694. https://doi.org/10.1016/s0092-8674(00)81911-x (1997).
Aono, A. H. et al. Genome-wide characterization of the common bean kinome: Catalog and insights into expression patterns and genetic organization. Gene 855, 147127. https://doi.org/10.1016/j.gene.2022.147127 (2023).
Zulawski, M., Schulze, G., Braginets, R., Hartmann, S. & Schulze, W. X. The Arabidopsis kinome: Phylogeny and evolutionary insights into functional diversification. BMC Genomics 15, 1–15 (2014).
Costa, F. C. L. & Pereira, W. A. The Cucumis sativus kinome: Identification, annotation, and expression patterns in response to powdery mildew infection. BioRxiv 2023.03.16.532963. https://doi.org/10.1101/2023.03.16.532963 (2024).
Zhu, K. et al. The grapevine kinome: Annotation, classification and expression patterns in developmental processes and stress responses. Hortic. Res. 5, 19. https://doi.org/10.1038/s41438-018-0027-0 (2018).
Zhu, K. et al. The kinome of pineapple: Catalog and insights into functions in crassulacean acid metabolism plants. BMC Plant Biol. 18, 199. https://doi.org/10.1186/s12870-018-1389-z (2018).
Liu, J. et al. Soybean kinome: Functional classification and gene expression patterns. J. Exp. Bot. 66, 1919–1934. https://doi.org/10.1093/jxb/eru537 (2015).
Lehti-Shiu, M. D., Zou, C., Hanada, K. & Shiu, S. H. Evolutionary history and stress regulation of plant receptor-like kinase/pelle genes. Plant Physiol. 150, 12–26 (2009).
Shiu, S. H. & Bleecker, A. B. Receptor-like kinases from Arabidopsis form a monophyletic gene family related to animal receptor kinases. Proc. Natl. Acad. Sci. USA 98, 10763–10768. https://doi.org/10.1073/pnas.181141598 (2001).
Champion, A. et al. Arabidopsis kinome: After the casting. Funct. Integr. Genomics 4, 163–187. https://doi.org/10.1007/s10142-003-0096-4 (2004).
Wei, K. & Li, Y. Functional genomics of the protein kinase superfamily from wheat. Mol. Breed. 39, 141. https://doi.org/10.1007/s11032-019-1045-9 (2019).
Wang, Y., Ficklin, S. P., Wang, X., Feltus, F. A. & Paterson, A. H. Large-scale gene relocations following an ancient genome triplication associated with the diversification of core eudicots. PLoS ONE 11, e0155637. https://doi.org/10.1371/journal.pone.0155637 (2016).
Minkoff, B. B. et al. A cell-free method for expressing and reconstituting membrane proteins enables functional characterization of the plant receptor-like protein kinase FERONIA. J. Biol. Chem. 292, 5932–5942. https://doi.org/10.1074/jbc.M116.761981 (2017).
Li, Y. et al. Contrasting patterns of nucleotide polymorphism suggest different selective regimes within different parts of the PgiC1 gene in Festuca ovina L. Hereditas 154, 11. https://doi.org/10.1186/s41065-017-0032-6 (2017).
Zhong, Y., Liu, S., Zhang, X., Li, Y. & Yang, Q. Evolutionary pattern of the presence and absence genes in Fragaria. Can. J. Plant Sci. 102, 427–436. https://doi.org/10.1139/cjps-2020-0316 (2021).
Sun, Y., Qiao, Z., Muchero, W. & Chen, J. G. Lectin receptor-like kinases: The sensor and mediator at the plant cell surface. Front. Plant Sci. 11, 596301. https://doi.org/10.3389/fpls.2020.596301 (2020).
Tör, M., Lotze, M. T. & Holton, N. Receptor-mediated signalling in plants: Molecular patterns and programmes. J. Exp. Bot. 60, 3645–3654. https://doi.org/10.1093/jxb/erp233 (2009).
Gonçalves-Vidigal, M. C. et al. Co-segregation analysis and mapping of the anthracnose Co-10 and angular leaf spot Phg-ON disease-resistance genes in the common bean cultivar Ouro Negro. Theor. Appl. Genet. 126, 2245–2255. https://doi.org/10.1007/s00122-013-2131-8 (2013).
McCoy, A. G. et al. A global-temporal analysis on Phytophthora sojae resistance-gene efficacy. Nat. Commun. 14, 6043. https://doi.org/10.1038/s41467-023-41321-7 (2023).
Yu, N. et al. Impact of Rhg1 copy number, type, and interaction with Rhg4 on resistance to Heterodera glycines in soybean. Theor. Appl. Genet. 129, 2403–2412. https://doi.org/10.1007/s00122-016-2779-y (2016).
Hayes, A. J. et al. Molecular marker mapping of RSV4, a gene conferring resistance to all known strains of soybean mosaic virus. Crop Sci. 40, 1434–1437. https://doi.org/10.2135/cropsci2000.4051434x (2000).
Bish, M. D. et al. The soybean Rpp3 gene encodes a TIR-NBS-LRR protein that confers resistance to Phakopsora pachyrhizi. Mol. Plant Microbe Interact. 37, 561–570. https://doi.org/10.1094/MPMI-01-24-0007-R (2024).
Li, Y. et al. Genome analysis identified novel candidate genes for Ascochyta blight resistance in chickpea using whole genome re-sequencing data. Front. Plant Sci. 8, 359. https://doi.org/10.3389/fpls.2017.00359 (2017).
Purwar, S. et al. Genome-wide identification and analysis of NBS-LRR-encoding genes in mungbean (Vigna radiata L. Wilczek) and their expression in two wild non-progenitors reveal their role in MYMIV resistance. J. Plant Growth Regul. 42, 6667–6680. https://doi.org/10.1007/s00344-023-10948-7 (2023).
Alsamman, A. M. et al. Identification, characterization, and validation of NBS-encoding genes in grass pea. Front. Genet. 14, 1187597. https://doi.org/10.3389/fgene.2023.1187597 (2023).
Chini, A. et al. Drought tolerance established by enhanced expression of the CC-NBS-LRR gene, ADR1, requires salicylic acid, EDS1 and ABI1. Plant J. 38, 810–822. https://doi.org/10.1111/j.1365-313X.2004.02086.x (2004).
Kajla, M., Roy, A., Singh, I. K. & Singh, A. Regulation of the regulators: Transcription factors controlling biosynthesis of plant secondary metabolites during biotic stresses and their regulation by miRNAs. Front. Plant Sci. 14, 1126567. https://doi.org/10.3389/fpls.2023.1126567 (2023).
Biswas, D., Gain, H. & Mandal, A. MYB transcription factor: A new weapon for biotic stress tolerance in plants. Plant Stress 10, 100252. https://doi.org/10.1016/j.stress.2023.100252 (2023).
Zheng, C. et al. Transcription factors involved in plant stress and growth and development: NAC. Agronomy 15, 949. https://doi.org/10.3390/agronomy15040949 (2025).
Zhou, Q.-Y. et al. Soybean WRKY-type transcription factor genes, GmWRKY13, GmWRKY21, and GmWRKY54, confer differential tolerance to abiotic stresses in transgenic Arabidopsis plants. Plant Biotechnol. J. 6, 486–503 (2008).
Ramalingam, A. et al. Gene expression and yeast two-hybrid studies of 1R-MYB transcription factor mediating drought stress response in chickpea (Cicer arietinum L.). Front. Plant Sci. 6, 1117. https://doi.org/10.3389/fpls.2015.01117 (2015).
Yang, S. et al. A stress-responsive bZIP transcription factor OsbZIP62 improves drought and oxidative tolerance in rice. BMC Plant Biol. 19, 260. https://doi.org/10.1186/s12870-019-1872-1 (2019).
Muhammad, L. A. et al. Genome-wide identification of AP2/ERF transcription factors in mungbean (Vigna radiata) and expression profiling of the VrDREB subfamily under drought stress. Crop Pasture Sci. 69, 1009–1019 (2018).
Shao, H.-B., Chu, L.-Y., Jaleel, C. A. & Zhao, C.-X. Water deficit stress induced anatomical changes in higher plants. C. R. Biol. 331, 215–225. https://doi.org/10.1016/j.crvi.2008.01.002 (2008).
Nakashima, K., Yamaguchi-Shinozaki, K. & Shinozaki, K. The transcriptional regulatory network in the drought response and its crosstalk in abiotic stress responses including drought, cold, and heat. Front. Plant Sci. 5, 170. https://doi.org/10.3389/fpls.2014.00170 (2014).
Reddy, A. S. N., Marquez, Y., Kalyna, M. & Barta, A. Complexity of the alternative splicing landscape in plants. Plant Cell 25, 3657–3683. https://doi.org/10.1105/tpc.113.117523 (2013).
Song, X., Li, Y., Cao, X. & Qi, Y. MicroRNAs and their regulatory roles in plant–environment interactions. Annu. Rev. Plant Biol. 70, 489–525. https://doi.org/10.1146/annurev-arplant-050718-100334 (2019).
Yan, N. et al. Genome-wide characterization of the sunflower kinome: Classification, evolutionary analysis and expression patterns under different stresses. Front. Plant Sci. 15, 1450936. https://doi.org/10.3389/fpls.2024.1450936 (2024).
Yan, J. et al. Phylogeny of the plant receptor-like kinase (RLK) gene family and expression analysis of wheat RLK genes in response to biotic and abiotic stresses. BMC Genomics 24, 224. https://doi.org/10.1186/s12864-023-09303-7 (2023).
Funding
Open access funding provided by Department of Atomic Energy.
Author information
Contributions
D.P. conducted the investigation, analysed the data and results, and wrote the manuscript; S.J. supervised the investigation and assisted with data analysis and manuscript review. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Punniyamoorthy, D., Jegadeesan, S. Whole genome sequencing reveals transcriptional and translational elements potentially regulating biotic and abiotic stress responses in cowpea. Sci Rep 15, 32913 (2025). https://doi.org/10.1038/s41598-025-18334-x
DOI: https://doi.org/10.1038/s41598-025-18334-x











