Abstract
To explore the connection between the chloroplast and coffee leaf rust resistance factors (SH1 to SH9), the whole genomic DNA of 42 coffee genotypes was sequenced and their complete chloroplast genomes were assembled de novo. The chloroplast haplotype network clustered individuals by species rather than by SH factor. However, it allowed, for the first time, the molecular validation of Coffea arabica as the maternal parent of the spontaneous hybrid “Híbrido de Timor”. Individual reads were also aligned to the C. arabica reference genome to relate SH factors to chloroplast metabolism, and an in-silico analysis of 132 selected nuclear-encoded chloroplast proteins was performed. The nuclear-encoded thioredoxin-like membrane protein HCF164 discriminated individuals with and without the SH9 factor, owing to specific DNA variants on chromosome 7c (from the C. canephora-derived sub-genome). The absence of both the thioredoxin domain and the redox-active disulphide center from the HCF164 protein of SH9 individuals suggests possible implications for redox regulation. For the first time, specific DNA variants of chloroplast proteins allow individuals to be discriminated according to their SH profile. This study introduces an unexplored strategy for identifying proteins/genes associated with SH factors and candidate targets of Hemileia vastatrix effectors, thereby opening new perspectives for coffee breeding programs.
Introduction
Coffee is the most important agricultural commodity, with more than 9 million tons consumed each year and an estimated retail value of 70 billion US dollars. Coffee is crucial to the economy of more than 60 countries and is the primary source of income for more than 100 million people1,2. Its production is concentrated in developing countries, where coffee represents a substantial share of export earnings and a crucial source of livelihood for households3.
Although the Coffea genus is estimated to contain around 120 species, the coffee supply comes from only two of them: Coffea arabica and Coffea canephora, which account for about 56% and 44% of global production, respectively3. Coffea arabica is an allopolyploid species (2n = 4x = 44) that resulted from natural hybridization between the diploid species (2n = 2x = 22) Coffea eugenioides and C. canephora4. Coffea arabica shows low genetic diversity, limited climatic adaptability, and susceptibility to diseases and pests5,6,7. In contrast, C. canephora shows higher genetic diversity and better tolerance of adverse conditions such as high temperature, drought, and pathogen challenges8.
Coffee leaf rust (CLR) is one of the diseases that most severely affect Arabica coffee production worldwide1. It was first observed in 1861 on wild coffee plants in East Africa, and in 1869 the biotrophic fungus Hemileia vastatrix was identified as its causal agent1. CLR causes direct damage and premature leaf fall, weakening plants and favouring branch dieback, which reduces the photosynthetic capacity and vigour of infected coffee plants1,9,10. Following the collapse of the coffee industry in several countries, efforts were made to identify and introduce coffee species with higher tolerance to CLR. Coffea liberica and C. canephora were among the earliest species introduced, giving rise to numerous interspecific hybrids. The Kalimas and Kawisari hybrids (C. arabica × C. liberica) showed considerable variability and low productivity, whereas the spontaneous hybrid (C. arabica × C. canephora) known as “Híbrido de Timor” (HDT) proved more resilient11. HDT, a tetraploid arabicoid found on Timor Island that is heterogeneous in appearance and yield, stands out as the most prominent interspecific hybrid. The discovery of this hybrid, resistant to the main rust races, was a breakthrough for the coffee breeding programs carried out over the last 50 years by the Centro de Investigação das Ferrugens do Cafeeiro (CIFC) in Portugal11,12 and references therein. HDT and its derivatives were crucial in controlled crosses with traditional C. arabica varieties. These breeding efforts produced a wide array of progenies and, subsequently, commercial coffee varieties combining strong rust resistance with high yield. These resistant varieties have since been released in coffee-growing regions across Latin America, Africa, and Asia1,12.
Their widespread adoption is evident from the various available variety catalogs, such as the World Coffee Research repository. Studies of resistance inheritance in crosses involving HDT derivatives, several C. arabica varieties, and other Coffea species led to the identification of at least nine rust resistance factors, designated SH1 to SH9. These studies showed that SH6–SH9 derive from C. canephora ancestors, SH3 from a C. liberica introgression, and the remaining factors from C. arabica11,12,13,14 and references therein.
The arms race between plants and fungi involves multiple plant defence mechanisms that the pathogen continually tries to circumvent. From the point of view of a biotrophic fungus, a successful strategy is to control the host's primary metabolism for its own nutrition, whereas the plant may try to block the fungus' access to it. In susceptible coffee plants, H. vastatrix genes involved in sugar transport and metabolism were reported to be upregulated15. Conversely, coffee plants treated with resistance inducers and challenged with H. vastatrix showed a reduced incidence of CLR, which was related to primary metabolic adjustments, namely the up-regulation of proteins from photosynthesis-related pathways and of redox-related enzyme activities16. Coffee resistance to H. vastatrix has been associated with restricted fungal growth in the early stages of infection owing to hypersensitive cell death (HR), accumulation of reactive oxygen species (ROS), haustoria encasement, and cell wall lignification12 and references therein. ROS retrograde signalling is involved in both PTI (PAMP-Triggered Immunity) and ETI (Effector-Triggered Immunity) responses. In PTI the H2O2 signal is generated at photosystem I (PSI), whereas in ETI it is generated at photosystem II (PSII). In both cases, nuclear-encoded chloroplast genes, including photosynthesis-related genes, are strongly suppressed17. Given the importance and interplay of ROS and carbohydrate metabolism in plant–pathogen interactions, the chloroplast is a prime target for manipulation by pathogens18,19. Although the targeting of chloroplasts by effectors from filamentous pathogens is documented, and a dynamic role for chloroplast metabolism in regulating immune responses is foreseen18, knowledge of chloroplast-localised rust effector proteins is very limited17,20.
Chloroplast function can also be disrupted by cytosolic effectors that block the translocation of nuclear-encoded chloroplast proteins from the cytosol to the chloroplast21.
Beyond the chloroplast's role in plant immunity, the plastid genome (cpDNA) is maternally inherited in the different coffee species and interspecific hybrids22 and references therein, which makes it a valuable tool for deducing ancestry and evolutionary relationships. The complete chloroplast genomes of several coffee individuals have been described23,24,25,26,27,28,29,30,31. Annotation of the C. arabica cpDNA revealed a total of 114 unique genes: 80 protein-coding, 30 tRNA, and four rRNA genes29. The proteins encoded by the C. arabica chloroplast genome are involved in photosynthesis (44 proteins), transcription (25 proteins), and functions such as protein degradation, fatty acid metabolism, and carbon fixation (6 proteins), with five further hypothetical reading frames of unknown function (ycf1–ycf5)29. The nuclear genome encodes all the remaining proteins required for chloroplast functions (including DNA replication, genome maintenance, and the regulation of gene expression and protein activity). Thus, most of the 2000–3000 proteins that make up the chloroplast proteome are translated in the cytosol and imported into chloroplasts32. To assess whether chloroplast metabolism is connected with coffee defence responses (SH factors), we address two complementary questions:
1. Does the chloroplast genome reflect SH phenotypes? To answer this question, we use the chloroplast genomes of 42 coffee genotypes from the CIFC collection with different resistance factors to H. vastatrix. We also make use of 18 conspecific genomes available at NCBI in August 2022.
2. Do nuclear-encoded chloroplast proteins reflect SH phenotypes? To answer this question, we perform an in-silico analysis of selected nuclear-encoded protein families acting on chloroplasts, focusing on gene families previously highlighted as being involved in H. vastatrix resistance16,20,21.
Results and discussion
Comparative chloroplast genomic analysis
To address whether cpDNA reflects SH resistance phenotypes, we assembled the complete chloroplast genomes of 42 individuals and studied the topology of the resulting haplotype network. The newly assembled chloroplast genomes ranged from 154,815 to 155,188 bp in length and were grouped into 16 haplotypes (Fig. 1A, Supplemental Table S1).
Haplotype network based on whole chloroplast genomes. (A) Newly assembled chloroplast genomes of 42 Coffea sp. genotypes. (B) Sixty Coffea sp. genotypes, including the 18 conspecific individuals downloaded from GenBank. Numbers within circles identify the haplotypes. Numbers on edges indicate genetic distances greater than one. Network topology reflects genealogical relationships rather than rust resistance phenotype. Genotype and SH details are presented in Supplemental Table S1.
Haplotype H01 was the most common haplotype, comprising 22 individuals with a high diversity of SH factors (Supplemental Table S1). Indeed, each of the nine SH factors was represented in this haplotype at least three times (e.g., SH8) and up to 19 times (SH5) (Fig. 2A). Additionally, seven of the nine SH factors were found in two or more haplotypes (Fig. 2B). These results suggest that the SH resistance factors are not maternally inherited through the chloroplast genome. Consistently, haplotypes were distributed throughout the network, separating individuals by species rather than by SH resistance factor.
Lack of relationship between chloroplast haplotypes and SH factors: every haplotype is composed of individuals carrying multiple SH factors, and seven of the nine SH factors appear in more than one haplotype. (A) Number of occurrences of each SH factor among individuals with haplotype H01. Note that each individual may harbour more than one SH factor. (B) Number of haplotypes containing each SH factor. SH7 and SH9 were the only factors found in a single haplotype (H01).
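The two tallies summarised in Fig. 2 can be computed from a table of individuals, haplotypes and SH factors. The Python sketch below assumes such a table is available as (individual, haplotype, set of SH factors) tuples; the function names and the toy records in the usage block are hypothetical and are not the study's data. Because an individual can carry several SH factors, panel A counts factor occurrences rather than individuals.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, str, Set[str]]  # (individual, haplotype, SH factors carried)


def factor_counts_in(records: Iterable[Record], haplotype: str) -> Dict[str, int]:
    """Occurrences of each SH factor among individuals of one haplotype (cf. Fig. 2A)."""
    counts: Counter = Counter()
    for _individual, hap, factors in records:
        if hap == haplotype:
            counts.update(factors)  # +1 for every factor this individual carries
    return dict(sorted(counts.items()))


def haplotypes_per_factor(records: Iterable[Record]) -> Dict[str, int]:
    """Number of distinct haplotypes in which each SH factor occurs (cf. Fig. 2B)."""
    seen: Dict[str, set] = defaultdict(set)
    for _individual, hap, factors in records:
        for factor in factors:
            seen[factor].add(hap)
    return {factor: len(haps) for factor, haps in sorted(seen.items())}


if __name__ == "__main__":
    toy = [("ind1", "H01", {"SH5", "SH9"}), ("ind2", "H01", {"SH5"}),
           ("ind3", "H02", {"SH2", "SH5"})]  # hypothetical records
    print(factor_counts_in(toy, "H01"))  # {'SH5': 2, 'SH9': 1}
    print(haplotypes_per_factor(toy))    # {'SH2': 1, 'SH5': 2, 'SH9': 1}
```

Sorting factor names as strings is adequate for SH1 to SH9; a tenth factor would need a numeric sort key.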
Most haplotypes of C. arabica, HDT hybrids, and HDT derivatives clustered together (the exceptions being haplotypes H02 and H16, Fig. 1A). This low differentiation was not surprising, given that most of the C. arabica and HDT-derivative individuals are closely related (Supplemental Table S1). For example, haplotype H03 included the female parent C. arabica Dilla and Alghe (CIFC 128/2) and its progeny, the HDT derivative H468/41 (C. arabica 128/2 × HDT 1343/269), and haplotype H05 included the female parent C. arabica S4 Agaro (CIFC 110/5) and its progeny, the HDT derivative H583/5 (C. arabica 110/5 × HDT 1343/269). Furthermore, some of the closest haplotypes included individuals of the same geographic origin, such as H04 and H05 (with Ethiopian backgrounds; Supplemental Table S1). Others, such as haplotypes H03 and H07, grouped landrace genotypes from the northeast African highlands (Rume Sudan Ethiopian landrace; Supplemental Table S1), the geographic origin of C. arabica11,14. When the 18 conspecific individuals from NCBI (13 of them C. arabica) were included in the haplotype network, the Arabica cluster was reinforced (Fig. 1B). The cluster of C. arabica, HDT hybrids, and HDT derivatives was closer to the C. eugenioides haplotypes (genetic distance = 29, Fig. 1B) than to the C. canephora haplotypes (genetic distance greater than 800). This is congruent with the accepted hypothesis that C. eugenioides was the female parent of C. arabica4,6. Indeed, based on the similarity of plastid DNA sequences, previous research has suggested that C. eugenioides was the ovule donor in the C. arabica hybridization event7,33,34,35. Our analysis also sheds light on the maternal lineage of Coffea sp. (CIFC 951/1; haplotype H14): the position of haplotype H14 in the network (between haplotypes H12 and H13) suggests a maternal lineage close to C. eugenioides or a closely related coffee species.
The chloroplast genomes of the HDT hybrids analysed further suggested C. arabica as the female parent. All HDT hybrids belonged to haplotype H01, at a high genetic distance from the C. canephora genotypes (greater than 800, Fig. 1). To our knowledge, this is the first molecular study addressing the maternal donor of HDT, as its C. arabica × C. canephora ancestry had previously been inferred only from morphological characteristics11,14. However, our approach does not allow us to draw conclusions about the initial hybridization event (C. arabica × C. canephora), only about the maternal donor of the CIFC HDT hybrids used in this work. Further research is needed to address this and other alternative hypotheses.
Our analyses also revealed a very low genetic distance between individuals from haplotypes H02 and H16, suggesting close maternal parentage (Fig. 1). Haplotype H16 represented the Kawisari hybrid, one of the oldest Indian hybrids created from C. arabica × C. liberica; haplotype H02 comprised three related individuals: C. arabica S 353 4/5 (CIFC 34/13), the female parent of both C. arabica H147/1 (C. arabica 34/13 × C. arabica 110/5) and the HDT derivative H535/10 (C. arabica 34/13 × HDT 1343/269) (Fig. 1, Supplemental Table S1). The C. arabica S 353 4/5 genotype, from the Central Coffee Research Institute (CCRI, Balehonnur, India), originated from a selection of C. arabica × C. liberica derivatives11. We found a substantial genetic distance (almost one thousand) between haplotype H02 and the other Arabica haplotypes. Indeed, haplotypes H02 and H16 were more closely related to the C. canephora and C. liberica/excelsa haplotypes (genetic distance = 380 and 382, respectively; Fig. 1B). These findings raise the possibility that the genotypes within H02 and H16 originated from a non-arabica/eugenioides female parent. Further research is needed to ascertain their maternal inheritance.
Haplotypes found in C. canephora, C. liberica/excelsa, C. racemosa, and C. eugenioides showed considerable genetic distances within each species, particularly when compared with C. arabica (Fig. 1). Our data reinforce the known lower genetic diversity of C. arabica compared with its diploid relatives36.
Comparative analysis of nuclear genes encoding chloroplast-targeted proteins
After confirming that cpDNA did not explain individual resistance patterns, we focused on nuclear-encoded chloroplast proteins, which have been described as targets of retrograde signalling generated within the chloroplast17. The chloroplast proteome has been estimated at 2100 to 3600 proteins, approximately 3000 of which are nuclear-encoded37. To detect possible associations with SH factors, we focused on the 25 individuals with known SH factors (Supplemental Table S1) and on the following nuclear-encoded protein families involved in resistance and acting on chloroplasts16,20,21: ATP-dependent zinc metalloprotease (FtsH); Elongation factor Tu (EFTU); Ferredoxin-thioredoxin reductase (FTR); Thioredoxin reductase (TRR); d-glycerate 3-kinase (GLYK); NAD(P)H dehydrogenase-like (NDH); Thioredoxin and Thioredoxin-like (TRX); Translation initiation factor (IF); Oxygen-evolving enhancer protein (OEE); and Cytochrome b6-f complex iron-sulphur subunit (ISP). In total, 132 nuclear-encoded chloroplast proteins associated with 89 nuclear regions were analysed, considering DNA variants in the ORF as well as in the upstream and downstream flanking regions (Supplemental Table S2). We found 139 variants unevenly distributed among 11 nuclear regions, corresponding to polymorphisms in 8 proteins (Table 1). In addition, several variants found in the upstream and downstream flanking regions (regions 61 and 21 in Fig. 3A, respectively) may also play a role in controlling transcriptional and post-transcriptional events. A disproportionate number of the variants (114 of 139) were found in the region encoding the membrane-anchored thioredoxin-like protein HCF164; they were exclusively shared among SH9 individuals and were mainly associated with the C. canephora-derived sub-genome (chromosome 7c, 112 variants) (Fig. 3A, Table 1). A detailed analysis of this gene showed clear differences between individuals with and without the SH9 factor (Fig. 3B,C).
The clustering of individuals in the haplotype network estimated for this region suggested a potential relationship between the variants identified in the HCF164 nuclear region and the presence of the SH9 factor (Fig. 3B). Moreover, the observed variants affected the peptide sequence encoded by this region: protein prediction for the 25 studied individuals identified three HCF164 protein isoforms. Two of these isoforms were found exclusively in non-SH9 individuals, and both exhibit the thioredoxin domain and the redox-active disulphide center (CEVC catalytic motif). In contrast, the five SH9 individuals (HDT genotypes: 832/1; 4106; H420/10; HW26/13; H419/20) shared the third isoform, which lacks the thioredoxin domain and the redox-active disulphide center owing to a 19-residue deletion identified in this work (Fig. 3C).
Analysis of the HCF164 sequence on chromosome 7c (C. canephora-derived sub-genome): (A) Variants identified in the genomic region encoding the HCF164 protein on chromosome 7c, following the reference genome annotation (GCA_003713225). The diagram shows the ORF (five exons, represented as grey rectangles) and the 2 kbp upstream and downstream flanking regions. Numbers in brackets give the number of variants identified in the 25 studied individuals, whereas numbers in bold give the variants potentially associated with the SH9 factor (that is, variants found exclusively in SH9 individuals). (B) Haplotype network of the genomic region encoding the HCF164 protein (including the 2 kbp flanking regions) for the 25 studied individuals. (C) Schematic view of the alignment of the three HCF164 protein isoforms predicted in the 25 studied individuals. The thioredoxin domain and the redox-active disulphide center are highlighted in white and black, respectively. Orange represents SH9 individuals and blue represents non-SH9 individuals.
Three-dimensional structural models were developed for the HCF164 proteins expressed in SH9 and non-SH9 individuals. The α-helix from cysteine 163 to aspartate 182 (C163–D182), which contains the typical CEVC catalytic motif of the protein, was completely absent in SH9 individuals (Supplemental Figure S1). This suggests that the lack of the redox-active disulphide center of HCF164 in SH9 individuals might have important biochemical implications, as thioredoxins target several proteins and can modulate their activity. HCF164 is a membrane-anchored thioredoxin-like protein known to be indispensable for the assembly of the cytochrome b6-f complex (Cytb6-f) in the thylakoid membranes; loss-of-function hcf mutants exhibited decreased photosynthetic electron transport rates38.
Cytb6-f provides an essential electronic connection between the light-powered chlorophyll protein complexes, photosystems I and II (PSI and PSII). It is well placed to sense the redox state of the electron transfer chain and the chloroplast stroma, and it interacts with various regulatory elements that transduce these signals to optimise photosynthesis under fluctuating environmental and metabolic conditions39. The Cytb6-f complex is a ~220 kDa functional dimer, each monomer comprising four major subunits (cytochrome f, cytochrome b6, the Rieske iron-sulphur protein (ISP) and subunit IV) as well as four minor subunits39 and references therein. Results obtained by Motohashi and Hisabori40 suggested that the interactions of HCF164 with both the cytochrome f and ISP subunits are important prerequisites for the correct assembly of the Cytb6-f complex. They further demonstrated the physiological significance of HCF164 as a transducer of reducing equivalents within the thylakoid lumen. Besides this complex, HCF164 may interact with, and probably reduce, other target proteins of the thylakoid membrane, such as the metalloproteases FtsH2 and FtsH8, several ATP synthase subunits, and chlorophyll a-b binding proteins38,40.
HCF164 protein–protein interactions were explored with the STRING database using Arabidopsis protein annotations, as interaction networks are better characterised in Arabidopsis than in coffee (only protein–protein interactions retrieved from Experimental/Biochemical Data or Association in Curated Databases were considered). As DNA variants for GLYK (6 variants located on chromosome 4e; Table 1) were also found exclusively in SH9 individuals, we included both proteins in the STRING analysis. Although no direct interaction between HCF164 and GLYK was found, the enrichment p-value obtained (< 1.0e−16) suggests that, as a group, the proteins are metabolically connected (Fig. 4) through redox metabolism, photorespiration, and glycolysis. GLYK catalyses the conversion of glycerate to 3-phosphoglycerate, a step in photorespiration and redox metabolism. The glyceraldehyde-3-phosphate dehydrogenases ALDH7B4 and ALDH3H1 are described as stress-responsive dehydrogenases that catalyse the conversion of glyceraldehyde 3-phosphate to d-glycerate 1,3-bisphosphate. HCF164 shows several interactions with superoxide dismutases (CSD1, CSD2) and peroxiredoxins (2CPA, 2CPB, PRXIIA, PRXIID, PRXIIE) (Fig. 4). Therefore, any change in the balance of these proteins may affect chloroplast metabolism.
STRING-predicted network for the interaction of the Arabidopsis thioredoxin-like protein (HCF164, AT4G37200) and d-glycerate 3-kinase (GLYK, AT1G80380). Only “Experiments” and “Databases” as active interaction sources and “none” in the 1st and 2nd shells were considered. Blue edges: evidence from curated databases; pink edges: experimental evidence. Numbers: 1, evidence suggesting a functional link; 2, evidence suggesting a functional link, and putative homologs were found interacting in other organisms; 3, putative homologs were found interacting in other organisms. Proteins included were: Phosphoglycerate kinase (PGKP2, AT1G56190); Phosphoglycerate mutase (PGAM1, AT1G22170); Glyoxylate/hydroxypyruvate reductase (HPPR2, AT1G79870); Glyceraldehyde-3-phosphate dehydrogenases (ALDH3H1, AT1G44170; ALDH7B4; ALDH2B4, AT3G48000; ALDH2B7); Superoxide dismutase [Cu–Zn] 1 and 2 (CSD; AT1G08830, AT2G28190); Peroxiredoxin II A, D and E (PRXII; AT1G65990, AT1G60740, AT3G52960) and 2-Cys-peroxiredoxin A and B (2CP; AT3G11630, AT5G06290).
Recently, the mechanisms of action of stripe rust (Puccinia striiformis f. sp. tritici) effectors in wheat have been identified. These rust effectors targeted the ISP subunit of the Cytb6-f complex: some interacted with ISP (a nuclear-encoded chloroplast protein) in the cytosol, blocking its translocation to the chloroplast21, whereas others interacted with ISP within the chloroplast, preventing complex assembly20. Both types of effectors interfered with Cytb6-f complex function and ROS production by chloroplasts20,21. The authors further showed that completely blocking Cytb6-f complex assembly was not advantageous for the fungus, as it led to insufficient nutrients for fungal development in later stages of infection. Thus, although biotrophic rust fungi need to suppress the chloroplast-mediated defences of their host plants, they must preserve the biosynthetic capacity of these organelles, which is vital for their survival.
The association of HCF164 polymorphisms on chromosome 7c (C. canephora-derived sub-genome) with the resistance factor SH9 is consistent with existing evidence that the SH9 coffee resistance factor to H. vastatrix derives from major genes from C. canephora (considered a resistance source)12 and references therein. The lack of the thioredoxin domain and redox-active disulphide center in the HCF164 isoform expressed only by SH9 individuals might indicate a biochemical advantage of these individuals over others. This difference in HCF164 function may give SH9 individuals a greater ability to resist fungal infection or to better regulate other biological processes. On the other hand, the redox-related roles of HCF164 might be taken over by other thioredoxin-like proteins; such functional redundancy (or metabolic flexibility) has been proposed but not yet fully characterised. It will be necessary to determine whether SH9-HCF164 is recognized by H. vastatrix effectors and whether it could act as a decoy, preventing the effectors' function(s) while still allowing normal plant development. Our results further support the role of chloroplast-mediated defences against leaf rust, particularly carbon metabolism and redox homeostasis16. This study presents a strategy for searching for proteins/genes associated with SH factors as well as candidate H. vastatrix effector targets, thus opening new perspectives for plant breeding programs.
Material and methods
Biological material
Forty-two coffee genotypes, including HDT, HDT derivatives, C. arabica, C. eugenioides, C. canephora, and other related Coffea sp., were used (Table 2). Genotypes with different coffee resistance factors to H. vastatrix (from highly susceptible to highly resistant to rust) were considered. They were selected to maximise the number of individuals showing every known SH factor while minimising the total number of individuals (n = 25). The remaining seventeen individuals, whose SH factors are unknown, were included to reconstruct the evolutionary and ancestry patterns among the studied individuals (Supplemental Table S1).
DNA extraction, sequencing, and nuclear genome analysis
For every individual, we isolated DNA from fresh leaf tissue using the NucleoSpin Plant II kit (Macherey–Nagel) following the manufacturer's instructions. Libraries were prepared using the TruSeq DNA PCR-Free kit (Illumina), and sequencing was performed by Novogene Inc. on two lanes of the HiSeq X System (Illumina) with 2 × 150 bp paired-end reads. We used CUTADAPT41 to filter the resulting fastq files, retaining fragments with quality values higher than 20. We aligned the filtered reads with BWA42, using default parameters and the C. arabica genome (GenBank accession number GCA_003713225.1) as reference. We used SAMtools43 and BCFtools44 to process mapped reads, call variants, subset vcf files, and obtain fasta DNA sequences per candidate region.
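As an illustration only, the per-sample steps above could be scripted as shell command lines. The Python sketch below merely builds them as strings. The option choices (cutadapt `-q 20` as the quality filter, `bcftools call -mv`), the reference file name and the output naming scheme are assumptions, not the authors' exact commands, and the reference is assumed to have been indexed beforehand with `bwa index` and `samtools faidx`.

```python
import shlex
from typing import List

# Hypothetical local file name for the C. arabica assembly GCA_003713225.1
REFERENCE = "GCA_003713225.1_genomic.fna"


def sample_commands(sample: str, read1: str, read2: str,
                    ref: str = REFERENCE, threads: int = 8) -> List[str]:
    """Shell command lines for trimming, mapping and variant calling of one sample."""
    q = shlex.quote
    trim1, trim2 = f"{sample}_R1.trim.fq.gz", f"{sample}_R2.trim.fq.gz"
    bam, vcf = f"{sample}.sorted.bam", f"{sample}.vcf.gz"
    return [
        # quality-trim read ends below Q20 (cutadapt applies the cutoff to both mates)
        f"cutadapt -q 20 -o {q(trim1)} -p {q(trim2)} {q(read1)} {q(read2)}",
        # BWA-MEM with default parameters, piped into a coordinate sort
        f"bwa mem -t {threads} {q(ref)} {q(trim1)} {q(trim2)} | samtools sort -o {q(bam)} -",
        f"samtools index {q(bam)}",
        # multiallelic caller, variant sites only, bgzipped VCF
        f"bcftools mpileup -f {q(ref)} {q(bam)} | bcftools call -mv -Oz -o {q(vcf)}",
        f"bcftools index {q(vcf)}",
    ]


def region_fasta_command(vcf: str, region: str, out_fasta: str,
                         ref: str = REFERENCE) -> str:
    """Apply one sample's variants to a single reference region (e.g. an ORF)."""
    q = shlex.quote
    return (f"samtools faidx {q(ref)} {q(region)} | "
            f"bcftools consensus {q(vcf)} -o {q(out_fasta)}")


if __name__ == "__main__":
    for line in sample_commands("sampleA", "sampleA_R1.fq.gz", "sampleA_R2.fq.gz"):
        print(line)
```

Building strings rather than calling `subprocess` keeps the sketch side-effect free; the lines can be reviewed and then pasted into a job script.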
Chloroplast genome analysis
De novo assembly
We used NOVOPlasty45 to assemble individual chloroplast genomes from raw fastq files, using the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) from Zea mays (GenBank accession number V00171.1) as the seed and the complete chloroplast genome of C. arabica (EF044213.1) as a reference to resolve conflicting regions found during assembly. Following this strategy, we obtained circularised molecules for all 42 individuals, which were subsequently reset with custom R scripts so that every sequence starts at the motif TAGGCGAACGACGGGAATTGAA (one mismatch allowed). This sequence (in the intergenic region between trnH-GUG and rps19) was chosen because it is the starting point of several C. arabica chloroplast genomes available in GenBank. The resulting complete chloroplast genomes are available in GenBank under accession numbers OQ946685–OQ946726 and were aligned using MAFFT46 with the “--auto” flag. For haplotype analysis, in addition to the genomes of the 42 CIFC individuals, we used 18 Coffea sp. chloroplast genomes available in GenBank (see details below).
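Resetting the start of a circular assembly amounts to rotating the sequence so that it begins at the anchor motif. The authors used custom R scripts; the Python sketch below is an independent, hypothetical re-implementation that scans the forward strand only, allows at most one mismatch, and returns the rotation at the first hit. Assemblies reported in reverse-complement orientation would need an extra step that is not shown here.

```python
ANCHOR = "TAGGCGAACGACGGGAATTGAA"  # trnH-GUG/rps19 intergenic motif given in the text


def mismatches_at(seq: str, motif: str, start: int, limit: int) -> int:
    """Mismatches of `motif` against circular `seq` at `start`; stops once above `limit`."""
    n = len(seq)
    count = 0
    for offset, base in enumerate(motif):
        if seq[(start + offset) % n] != base:  # modulo lets the motif span the wrap point
            count += 1
            if count > limit:
                break
    return count


def reset_start(seq: str, motif: str = ANCHOR, max_mismatches: int = 1) -> str:
    """Return the rotation of circular `seq` that begins at the first motif hit."""
    seq = seq.upper()
    for start in range(len(seq)):
        if mismatches_at(seq, motif, start, max_mismatches) <= max_mismatches:
            return seq[start:] + seq[:start]
    raise ValueError("anchor motif not found within the allowed number of mismatches")
```

Because the comparison wraps around the end of the string, a motif split across the arbitrary start chosen by the assembler is still found.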
We identified haplotypes in the aligned sequences using the haplotype function of the pegas package in R47, setting the argument “strict = TRUE” so that ambiguities and gaps were taken into account when differentiating haplotypes. We also used pegas to compute the haplotype network with the haploNet function and default arguments. AliView48 (v1.25) was used to visualise fasta sequences.
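The idea behind strict haplotype calling can be sketched as grouping aligned sequences that are identical at every column, so that gaps and ambiguity codes count as distinct states, and then counting differing columns between haplotypes (the kind of distance shown on network edges). This Python sketch is not pegas: the H01, H02, ... labelling by decreasing frequency is a hypothetical convention, and the pegas/ape distance functions may treat gaps and ambiguities differently from the plain column count used here.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def strict_haplotypes(alignment: Dict[str, str]) -> Dict[str, List[str]]:
    """Group aligned sequences that are identical at every column.

    Gaps ('-') and ambiguity codes (e.g. 'N') are compared like any other
    character, i.e. treated as distinct states. Labels H01, H02, ... follow
    decreasing haplotype frequency (a hypothetical convention).
    """
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, seq in alignment.items():
        groups[seq.upper()].append(name)
    ordered = sorted((sorted(names) for names in groups.values()),
                     key=lambda names: (-len(names), names[0]))
    return {f"H{i:02d}": names for i, names in enumerate(ordered, start=1)}


def pairwise_differences(alignment: Dict[str, str],
                         haplotypes: Dict[str, List[str]]) -> Dict[Tuple[str, str], int]:
    """Count differing alignment columns between one representative per haplotype."""
    rep = {label: alignment[names[0]].upper() for label, names in haplotypes.items()}
    return {(a, b): sum(x != y for x, y in zip(rep[a], rep[b]))
            for a, b in combinations(sorted(rep), 2)}
```

For example, with sequences "ACGT-A", "ACGTNA" and "TCGTAA", the first two form separate haplotypes one difference apart because the gap and the N are not treated as missing data.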
Coffea chloroplast genomes available at the GenBank
We searched the GenBank Nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore, accessed 19 August 2022) for all sequences containing the word “coffea” and filtered the results by genetic compartment (chloroplast) and sequence length (100,000–200,000 bp). This yielded 32 accessions belonging to 17 species, with genome sizes ranging from 154,545 to 155,277 bp. Only genomic information from species represented among the 42 CIFC genotypes was further considered (18 conspecific individuals). Thirteen of these 18 accessions are classified as C. arabica, which showed the largest genome sizes (from 155,186 to 155,277 bp, whereas the remaining species ranged from 154,751 to 154,951 bp; Supplemental Table S1).
Nuclear-encoded candidate protein selection
Previous studies identified several genes/proteins as candidate coffee resistance markers that are also involved in chloroplast primary metabolism16. In addition, Xu et al.20 and Wang et al.21 highlighted the suppression of chloroplast function by wheat stripe rust effectors that target the cytochrome b6-f complex and, thus, the photochemical reactions. Based on these studies, we selected ten nuclear-encoded chloroplast protein families as potential candidates involved in coffee rust resistance: ATP-dependent zinc metalloprotease (FtsH); Elongation factor Tu (EFTU); Ferredoxin-thioredoxin reductase (FTR); Thioredoxin reductase (TRR); d-glycerate 3-kinase (GLYK); NAD(P)H dehydrogenase-like (NDH); Thioredoxin and Thioredoxin-like (TRX); Translation initiation factor (IF); Oxygen-evolving enhancer protein (OEE); and Cytochrome b6-f complex iron-sulphur subunit (ISP).
Using the C. arabica gff3 file (available at https://www.ncbi.nlm.nih.gov/genome/browse/#!/proteins/77/418079%7CCoffea%20arabica/, last accessed on 4 November 2022), we identified all proteins with annotations matching the following terms: “Ferredoxin-thioredoxin reductase” (4 proteins), “Oxygen-evolving enhancer protein” (4 proteins), “thioredoxin” (113 proteins found, 58 used), “thioredoxin reductase” (4 proteins), “NAD(P)H dehydrogenase” (18 proteins), “glycerate” (6 proteins found, 2 used), “FtsH” (26 proteins found, 18 used), “elongation factor” (74 proteins found, 10 used), “translation initiation factor” (142 proteins found, 13 used), and “b6-f” (1 protein found). Only nuclear-encoded proteins were considered and, overall, our search resulted in a total of 132 proteins associated with 89 different loci (Supplemental Table S2). We used the C. arabica gff3 file to record the start and end positions of every protein-coding gene within the C. arabica genome. We obtained fasta files for every open reading frame (ORF) using BCFtools44 and the vcf file resulting from aligning our sample genomes to the reference, as described above. In addition, we produced two further fasta files, one containing the 2000 bp upstream of the first nucleotide of the ORF and the other containing the 2000 bp downstream of the last nucleotide of the ORF. For each of the three resulting fasta files (the ORF and the two flanking regions), we used a custom R script to identify variants exclusively shared among individuals showing a given SH factor. Only genotypes with known SH factors (n = 25) were considered in this analysis, aiming to associate single-nucleotide polymorphism (SNP) profiles with a particular SH factor.
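The two region-level steps described above, defining 2000 bp flanking windows from gff3 coordinates and keeping variants exclusively shared by carriers of one SH factor, might look like the following Python sketch. It is a hypothetical stand-in for the authors' custom R script: it assumes that upstream and downstream are defined relative to the gene's strand, represents each genotype call as a set of alleles, and skips positions that are not called in every individual.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple, Union

Window = Tuple[int, int]  # 1-based, inclusive, as in gff3
Alleles = Union[Set[str], FrozenSet[str]]


def flanking_windows(start: int, end: int, strand: str, flank: int = 2000,
                     contig_length: Optional[int] = None) -> Tuple[Window, Window]:
    """Return (upstream, downstream) windows around an ORF given gff3 coordinates.

    Windows are clipped at contig ends and can be empty (start > end) at an edge.
    """
    def clip(lo: int, hi: int) -> Window:
        hi = hi if contig_length is None else min(hi, contig_length)
        return (max(1, lo), hi)

    before = clip(start - flank, start - 1)  # lower coordinates than the ORF
    after = clip(end + 1, end + flank)       # higher coordinates than the ORF
    if strand == "+":
        return before, after
    if strand == "-":
        return after, before  # on the minus strand, upstream lies at higher coordinates
    raise ValueError(f"unexpected strand {strand!r}")


def exclusive_variants(genotypes: Dict[str, Dict[int, Alleles]],
                       carriers: Iterable[str]) -> Dict[int, List[str]]:
    """Alleles found in every carrier of one SH factor and in no other individual.

    `genotypes` maps individual -> position -> set of called alleles.
    """
    carriers = set(carriers)
    others = set(genotypes) - carriers
    if not carriers or not others:
        raise ValueError("need at least one carrier and one non-carrier")
    called_everywhere = set.intersection(*(set(calls) for calls in genotypes.values()))
    hits: Dict[int, List[str]] = {}
    for pos in sorted(called_everywhere):
        shared = set.intersection(*(set(genotypes[i][pos]) for i in carriers))
        elsewhere = set().union(*(genotypes[i][pos] for i in others))
        exclusive = shared - elsewhere
        if exclusive:
            hits[pos] = sorted(exclusive)
    return hits
```

Requiring an allele in every carrier and in no non-carrier is a deliberately strict filter; with only five SH9 individuals, a single miscalled genotype would remove a position.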
Nuclear candidate protein isoforms
SNPs and other variants identified in a sequence may alter the primary amino acid sequence, not only by changing a codon but also by changing nucleotides that define intron–exon boundaries or by modifying regulatory sequences. To evaluate the potential impact of variants exclusively shared among individuals showing a given SH, we performed gene prediction on individual sequences with AUGUSTUS49, as implemented in its web interface (available at: http://bioinf.uni-greifswald.de/augustus/submission.php), using the training set for Solanum lycopersicum and selecting the options “Report genes on both strands”, “Middle alternative transcripts”, and “only predict complete genes”. We searched for conserved domains in every resulting predicted peptide using the Expasy ScanProsite tool (available at https://prosite.expasy.org/scanprosite/) and DELTA-BLAST (Domain Enhanced Lookup Time Accelerated BLAST, available at https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome).
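For a quick local check of a predicted peptide, a motif search of the kind run with ScanProsite can be approximated with a regular expression. The sketch below is an illustrative assumption, not the ScanProsite implementation: it translates only a subset of the PROSITE pattern syntax (x, [..], {..}, (n) and (n,m) repeats, < and > anchors) and ignores profiles and rules. The generic C-x(2)-C pattern in the example is a simplified stand-in for a thioredoxin-type redox-active site, not the official PROSITE signature; the peptide is the CEVCRELAPDIYKIEQQYK segment reported as deleted in HCF164 in the modelling section.

```python
import re
from typing import List, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Translate a subset of PROSITE pattern syntax into a Python regex.

    Supported: x (any residue), [ABC] (allowed residues), {ABC} (forbidden
    residues), (n) / (n,m) repeats, '<' (N-terminus) and '>' (C-terminus).
    """
    regex_parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        n_term = element.startswith("<")
        c_term = element.endswith(">")
        element = element.strip("<>")
        # Separate the residue specification from an optional repeat count
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"cannot parse PROSITE element: {element!r}")
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"
        # Single residues and [ABC] classes are already valid regex syntax
        if repeat:
            core += "{" + repeat + "}"
        regex_parts.append(("^" if n_term else "") + core + ("$" if c_term else ""))
    return "".join(regex_parts)


def scan(pattern: str, peptide: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched segment) for non-overlapping hits."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(peptide)]


print(prosite_to_regex("C-x(2)-C"))             # C.{2}C
print(scan("C-x(2)-C", "CEVCRELAPDIYKIEQQYK"))  # [(1, 'CEVC')]
```

Comparing the hits on peptides predicted from genotypes with and without a given SH would flag a lost motif, although domain-level conclusions still require tools such as ScanProsite or DELTA-BLAST.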
Nuclear candidate protein interaction network
Our analyses highlighted two proteins, the thioredoxin-like protein HCF164 (Supplemental Table S2, protein identifier TRX44) and the D-glycerate 3-kinase isoform X1 (Supplemental Table S2, protein identifier GLYK2), as having different SNP profiles in genotypes with and without SH9. To explore their potential effect on metabolic pathways, we reconstructed the interaction network of these proteins using STRING v11.5 (https://string-db.org/; accessed 24 May 2022). The STRING database50 allows the exploration of known and predicted protein–protein interaction networks, particularly in well-characterised model species such as Arabidopsis thaliana. The analysis was performed using the A. thaliana homologs (AT4G37200.1 and AT1G80380.2 for HCF164 and GLYK, respectively), identified from the amino acid sequences of our Coffea candidate proteins with the STRING “search protein by sequence” tool. The interaction network for each protein was obtained with the following settings: “Experiments” and “Databases” as active interaction sources; “0.400” as the minimum required interaction score (default); and “no more than 20 interactions” in the 1st and 2nd shells. Proteins retrieved as interacting with HCF164 or GLYK were then used to generate a protein–protein interaction network (same settings, but with “none” in the 1st and 2nd shells).
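How the two first-shell neighbourhoods are combined into a single network can be illustrated with a small offline sketch. It assumes an edge list of (protein A, protein B, score) triples with scores on STRING's 0 to 1 scale, already restricted to the chosen evidence channels; it does not reproduce STRING's channel scoring or its 2nd-shell expansion, and the partner names in the example (P1 to P4) are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, float]  # (protein_a, protein_b, interaction score in [0, 1])


def first_shell(edges: Iterable[Edge], query: str,
                min_score: float = 0.4, max_partners: int = 20) -> Set[str]:
    """Partners of `query` scoring >= min_score, capped at the top `max_partners`."""
    best: Dict[str, float] = {}
    for a, b, score in edges:
        if score < min_score or query not in (a, b):
            continue
        partner = b if a == query else a
        # Keep the highest score if the same pair appears more than once
        best[partner] = max(score, best.get(partner, 0.0))
    ranked = sorted(best, key=lambda p: best[p], reverse=True)
    return set(ranked[:max_partners])


def merged_network(edges: Iterable[Edge], queries: List[str],
                   min_score: float = 0.4) -> Tuple[Set[str], List[Edge]]:
    """Union of the queries and their partners, with the edges among them."""
    edges = list(edges)
    nodes: Set[str] = set(queries)
    for query in queries:
        nodes |= first_shell(edges, query, min_score)
    kept = [(a, b, s) for a, b, s in edges
            if a in nodes and b in nodes and s >= min_score]
    return nodes, kept


# Hypothetical scores; P3 falls below the 0.400 threshold
edges = [("HCF164", "P1", 0.90), ("P2", "HCF164", 0.50), ("HCF164", "P3", 0.30),
         ("GLYK", "P4", 0.70), ("P1", "P4", 0.45), ("P3", "P4", 0.80)]
nodes, kept = merged_network(edges, ["HCF164", "GLYK"])
print(sorted(nodes))  # ['GLYK', 'HCF164', 'P1', 'P2', 'P4']
```

Capping each neighbourhood by score mirrors the "no more than 20 interactions" setting, while the final filter keeps only edges among the retained proteins, which corresponds to rebuilding the network with no additional shells.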
HCF164 protein molecular modelling
A three-dimensional structural model of the thioredoxin-like domain (G131 to E244) containing the CEVCRELAPDIYKIEQQYK deletion was built for the HCF164 protein, using as template the high-confidence region (T104 to V253) of the AlphaFold structure for C. arabica (https://www.uniprot.org/uniprotkb/A0A6P6TDQ6/entry). One hundred models were generated with Modeller (version 9v22)51, and the model with the lowest value of Modeller’s objective function was selected and validated using PROCHECK52. All molecular figures were created using PyMOL53.
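The model-selection step can be written as a short function. It assumes the list-of-dictionaries structure that Modeller's automodel class exposes as `outputs` after `make()` (with keys such as 'name', 'molpdf' and 'failure', as used in Modeller's tutorial scripts), where 'molpdf' is the objective function value. The model records in the example are hypothetical, and no Modeller installation is needed to run the sketch.

```python
from typing import Dict, List, Optional


def best_model(outputs: List[Dict]) -> Optional[Dict]:
    """Return the successfully built model with the lowest molpdf, or None."""
    built = [m for m in outputs if m.get("failure") is None]
    if not built:
        return None
    return min(built, key=lambda m: m["molpdf"])


# With Modeller, `outputs` would come from an automodel run such as:
#   a.starting_model, a.ending_model = 1, 100   # one hundred models
#   a.make()
#   best = best_model(a.outputs)
# Hypothetical records standing in for a.outputs:
outputs = [
    {"name": "model.B99990001.pdb", "molpdf": 1510.4, "failure": None},
    {"name": "model.B99990002.pdb", "molpdf": 1387.9, "failure": None},
    {"name": "model.B99990003.pdb", "molpdf": 1200.0, "failure": "optimization error"},
]
print(best_model(outputs)["name"])  # model.B99990002.pdb
```

Failed builds are discarded before ranking, so a model with a low but unreliable score cannot be selected; the chosen model would then go on to stereochemical validation with PROCHECK.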
Research Plant Statement
All methods were carried out in accordance with relevant institutional, national, and international guidelines and legislation, complying with the Convention on Biological Diversity (https://www.cbd.int/convention/) and the Convention on International Trade in Endangered Species of Wild Fauna and Flora (https://cites.org/eng). All plants used in this study belong to the collection of the Centro de Investigação das Ferrugens do Cafeeiro (CIFC, Coffee Rust Research Center). CIFC is a research and advanced training center of the Instituto Superior de Agronomia (ISA), Universidade de Lisboa (ULisboa) (https://www.isa.ulisboa.pt/en/cifc/about). The work presented in this manuscript was approved and carried out under the research agreement of the HDT-Coffee project.
Data availability
Individual chloroplast genomes generated for this study are available in GenBank under accession numbers OQ946685–OQ946726.
Change history
25 October 2023
A Correction to this paper has been published: https://doi.org/10.1038/s41598-023-45266-1
References
Talhinhas, P. et al. The coffee leaf rust pathogen Hemileia vastatrix: One and a half centuries around the tropics. Mol. Plant Pathol. 18, 1039. https://doi.org/10.1111/mpp.12512 (2017).
Mehrabi, Z. & Lashermes, P. Protecting the origins of coffee to safeguard its future. Nat. Plants 3, 16209. https://doi.org/10.1038/nplants.2016.209 (2017).
International Coffee Organization. Coffee Market Report, accessed 08 February 2023; www.ico.org/documents/cy2022-23/cmr-0123-e.pdf (2023).
Lashermes, P. et al. Molecular characterisation and origin of the Coffea arabica L. genome. Mol. Gen. Genet. 261, 259–266. https://doi.org/10.1007/s004380050965 (1999).
Davis, A. P. et al. High extinction risk for wild coffee species and implications for coffee sector sustainability. Sci. Adv. 5, eaav3473. https://doi.org/10.1126/sciadv.aav3473 (2019).
Scalabrin, S. et al. A single polyploidization event at the origin of the tetraploid genome of Coffea arabica is responsible for the extremely low genetic variation in wild and cultivated germplasm. Sci. Rep. 10, 4642. https://doi.org/10.1038/s41598-020-61216-7 (2020).
Bawin, Y. et al. Phylogenomic analysis clarifies the evolutionary origin of Coffea arabica. J. Syst. Evol. 59, 953–963. https://doi.org/10.1111/jse.12694 (2021).
Vieira, L. G. E. et al. Brazilian coffee genome project: An EST-based genomic resource. Braz. J. Plant Physiol. 18, 95–108 (2006).
Avelino, J. et al. The coffee rust crises in Colombia and Central America (2008–2013): Impacts, plausible causes and proposed solutions. Food Secur. 7, 303–321. https://doi.org/10.1007/s12571-015-0446-9 (2015).
Silva, M. C. et al. Coffee resistance to the main diseases: Leaf rust and coffee berry disease. Braz. J. Plant Physiol. 18, 119–147. https://doi.org/10.1590/S1677-04202006000100010 (2006).
Bettencourt, A. J. & Rodrigues, C. J. Jr. Principles and practice of coffee breeding for resistance to rust and other diseases. In Coffee Agronomy Vol. IV (eds Clarke, R. J. & Macrae, R.) 199–234 (Elsevier Applied Science Publishers LTD, 1988).
Silva, M. C. et al. An overview of the mechanisms involved in Coffee-Hemileia vastatrix interactions: Plant and pathogen perspectives. Agronomy 12, 326. https://doi.org/10.3390/agronomy12020326 (2022).
Noronha-Wagner, M. & Bettencourt, A. J. Genetic study of the resistance of Coffea spp. to leaf rust. Can. J. Bot. 45, 2021–2031. https://doi.org/10.1139/b67-220 (1967).
Rodrigues, C. J. Jr., Bettencourt, A. J. & Rijo, L. Races of the pathogen and resistance to coffee rust. Annu. Rev. Phytopathol. 13, 49–70 (1975).
Vieira, A. et al. Expression profiling of genes involved in the biotrophic colonisation of Coffea arabica leaves by Hemileia vastatrix. Eur. J. Plant Pathol. 133, 261–277. https://doi.org/10.1007/s10658-011-9864-5 (2012).
Possa, K. et al. Primary metabolism is distinctly modulated by plant resistance inducers in Coffea arabica leaves infected by Hemileia vastatrix. Front. Plant Sci. 11, 309. https://doi.org/10.3389/fpls.2020.00309 (2020).
Littlejohn, G. R., Breen, S., Smirnoff, N. & Grant, M. Chloroplast immunity illuminated. New Phytol. 229, 3088–3107. https://doi.org/10.1111/nph.17076 (2021).
Göhre, V. Photosynthetic defence. Nat. Plants 1, 15079. https://doi.org/10.1038/nplants.2015.79 (2015).
Rojas, C. M., Senthil-Kumar, M., Tzin, V. & Mysore, K. S. Regulation of primary plant metabolism during plant–pathogen interactions and its contribution to plant defense. Front. Plant Sci. 5, 17. https://doi.org/10.3389/fpls.2014.00017 (2014).
Xu, Q. et al. An effector protein of the wheat stripe rust fungus targets chloroplasts and suppresses chloroplast function. Nat. Commun. 10, 5571. https://doi.org/10.1038/s41467-019-13487-6 (2019).
Wang, X. et al. Two stripe rust effectors impair wheat resistance by suppressing import of host Fe–S protein into chloroplasts. Plant Physiol. 187, 2530–2543. https://doi.org/10.1093/plphys/kiab434 (2021).
Suresh, N., Shivanna, M. B. & Ram, A. S. Maternal inheritance of chloroplast DNA in Coffea arabica hybrids. Res. Biotech. 3, 39–44 (2012).
Guyeux, C. et al. Evaluation of chloroplast genome annotation tools and application to analysis of the evolution of coffee species. PLoS ONE 14, e0216347. https://doi.org/10.1371/journal.pone.0216347 (2019).
Park, J., Kim, Y., Xi, H. & Heo, K. I. The complete chloroplast genome of coffee tree, Coffea arabica L. ‘Blue Mountain’ (Rubiaceae). Mitochondrial DNA B 4, 2436–2437. https://doi.org/10.1080/23802359.2019.1636729 (2019).
Park, J. et al. The complete chloroplast genome of high production individual tree of Coffea arabica L. (Rubiaceae). Mitochondrial DNA B 4, 1541–1542. https://doi.org/10.1080/23802359.2019.1600386 (2019).
Park, J. et al. The complete chloroplast genomes of two cold hardness coffee trees, Coffea arabica L. (Rubiaceae). Mitochondrial DNA B 5, 1619–1621. https://doi.org/10.1080/23802359.2020.1715883 (2020).
Min, J., Kim, Y., Xi, H., Heo, K.-I. & Park, J. The complete chloroplast genome of coffee tree, Coffea arabica L. ‘Typica’ (Rubiaceae). Mitochondrial DNA B 4, 2240–2241. https://doi.org/10.1080/23802359.2019.1624213 (2019).
Ly, S. N. et al. Chloroplast genomes of Rubiaceae: Comparative genomics and molecular phylogeny in subfamily Ixoroideae. PLoS ONE 15, e0232295. https://doi.org/10.1371/journal.pone.0232295 (2020).
Mekbib, Y. et al. Chloroplast genome sequence variations and development of polymorphic markers in Coffea arabica. Plant Mol. Biol. Rep. 38, 491–502. https://doi.org/10.1007/s11105-020-01212-3 (2020).
Samson, N., Bausher, M. G., Lee, S. B., Jansen, R. K. & Daniell, H. The complete nucleotide sequence of the coffee (Coffea arabica L.) chloroplast genome: Organization and implications for biotechnology and phylogenetic relationships amongst angiosperms. Plant Biotechnol. J. 5, 339–353. https://doi.org/10.1111/j.1467 (2007).
Wu, D. et al. The complete chloroplast genome sequence of an economic plant Coffea canephora. Mitochondrial DNA B 2, 483–485. https://doi.org/10.1080/23802359.2017.1361353 (2017).
Woodson, J. D. Control of chloroplast degradation and cell death in response to stress. Trends Biochem. Sci. 47, 851–864. https://doi.org/10.1016/j.tibs.2022.03.010 (2022).
Maurin, O. et al. Towards a phylogeny for Coffea (Rubiaceae): Identifying well-supported lineages based on nuclear and plastid DNA sequences. Ann. Bot. 100, 1565–1583. https://doi.org/10.1093/aob/mcm257 (2007).
Tesfaye, K., Borsch, T., Govers, K. & Bekele, E. Characterization of Coffea chloroplast microsatellites and evidence for the recent divergence of C. arabica and C. eugenioides chloroplast genomes. Genome 50, 1112–1129. https://doi.org/10.1139/G07-088 (2007).
Charr, J.-C. et al. Complex evolutionary history of coffees revealed by full plastid genomes and 28,800 nuclear SNP analyses, with particular emphasis on Coffea canephora (Robusta coffee). Mol. Phylogenet. Evol. 151, 106906. https://doi.org/10.1016/j.ympev.2020.106906 (2020).
Negawo, A. T., Crouzillat, D., Pétiard, V. & Brouhan, P. Genetic diversity of Arabica coffee (Coffea arabica L.) collections. Ethiop. J. Appl. Sci. Technol. 1, 63–79 (2010).
Li, H.-M. & Chiu, C.-C. Protein transport into chloroplasts. Annu. Rev. Plant Biol. 61, 157–180. https://doi.org/10.1146/annurev-arplant-042809-112222 (2010).
Lennartz, K. et al. HCF164 encodes a thioredoxin-like protein involved in the biogenesis of the cytochrome b6f complex in Arabidopsis. Plant Cell 13, 2539–2551. https://doi.org/10.1105/tpc.010245 (2001).
Malone, L. A., Proctor, M. S., Hitchcock, A., Hunter, C. N. & Johnson, M. P. Cytochrome b6f—Orchestrator of photosynthetic electron transfer. Biochim. Biophys. Acta BBA Bioenerg. 1862, 148380. https://doi.org/10.1016/j.bbabio.2021.148380 (2021).
Motohashi, K. & Hisabori, T. HCF164 receives reducing equivalents from stromal thioredoxin across the thylakoid membrane and mediates reduction of target proteins in the thylakoid lumen. J. Biol. Chem. 281, 35039–35047. https://doi.org/10.1074/jbc.M605938200 (2006).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12. https://doi.org/10.14806/ej.17.1.200 (2011).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760. https://doi.org/10.1093/bioinformatics/btp324 (2009).
Li, H. et al. The sequence alignment/map (SAM) format and SAMtools. Bioinformatics 25, 2078–2079. https://doi.org/10.1093/bioinformatics/btp352 (2009).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993. https://doi.org/10.1093/bioinformatics/btr509 (2011).
Dierckxsens, N., Mardulyn, P. & Smits, G. NOVOPlasty: De novo assembly of organelle genomes from whole genome data. Nucleic Acids Res. 45, e18. https://doi.org/10.1093/nar/gkw955 (2017).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 30, 772–780. https://doi.org/10.1093/molbev/mst010 (2013).
Paradis, E. pegas: An R package for population genetics with an integrated–modular approach. Bioinformatics 26, 419–420. https://doi.org/10.1093/bioinformatics/btp696 (2010).
Larsson, A. AliView: A fast and lightweight alignment viewer and editor for large data sets. Bioinformatics 30, 3276–3278. https://doi.org/10.1093/bioinformatics/btu531 (2014).
Keller, O., Kollmar, M., Stanke, M. & Waack, S. A novel hybrid gene prediction method employing protein multiple sequence alignments. Bioinformatics 27, 757–763. https://doi.org/10.1093/bioinformatics/btr010 (2011).
Szklarczyk, D. et al. STRING v11: Protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613. https://doi.org/10.1093/nar/gky1131 (2019).
Šali, A., Potterton, L., Yuan, F., van Vlijmen, H. & Karplus, M. Evaluation of comparative protein modeling by MODELLER. Proteins 23, 318–326. https://doi.org/10.1002/prot.340230306 (1995).
Laskowski, R. A., MacArthur, M. W., Moss, D. S. & Thornton, J. M. PROCHECK: A program to check the stereochemical quality of protein structures. J. Appl. Cryst. 26, 283–291. https://doi.org/10.1107/S0021889892009944 (1993).
DeLano, W. L. The PyMOL Molecular Graphics System. Version 0.98 (DeLano Scientific LLC, 2003).
Acknowledgements
This work was funded by Portuguese national funds through FCT (Fundação para a Ciência e Tecnologia, I.P.) under the projects UID/AGR/04129/2020 of LEAF, UIDP/04378/2020 and UIDB/04378/2020 of UCIBIO, and LA/P/0140/2020 of i4HB, and by FCT and FEDER funds through PORNorte under the projects HDT-Coffee (PTDC/ASP-PLA/32429/2017) and CoffeeRES (PTDC/ASP-PLA/29779/2017). H.A. was supported by Portuguese national funds through FCT within the scope of the Stimulus of Scientific Employment, Individual Support (CEECIND/00399/2017/CP1423/CT0004). A.O. was supported at the University of Bristol by Oracle for Research and the Biotechnology and Biological Sciences Research Council ([BB/X009831/1] and [BB/W003449/1]). All molecular modelling work was carried out using the computational facilities of the Advanced Computing Research Centre, University of Bristol (http://www.bris.ac.uk/acrc).
Author information
Contributions
L.G. and A.P. contributed to the conception and design of the study. V.V. selected the biological material for sequencing. J.V., H.A., F.G., A.J., and A.P. contributed to genome sequencing and analysis. A.O. performed the protein modelling and wrote the corresponding section of the manuscript. L.G., C.P., and A.P. performed the protein analysis and wrote the first draft of the manuscript. All authors contributed to manuscript revision, and read and approved the submitted version.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this Article was revised: The original version of this Article contained an error in Figure 3, panels B and C, where the results were incorrectly displayed.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Guerra-Guimarães, L., Pinheiro, C., Oliveira, A.S.F. et al. The chloroplast protein HCF164 is predicted to be associated with Coffea SH9 resistance factor against Hemileia vastatrix. Sci Rep 13, 16019 (2023). https://doi.org/10.1038/s41598-023-41950-4
DOI: https://doi.org/10.1038/s41598-023-41950-4