Abstract
Similar to many eukaryotes, the thermoacidophilic archaeon Saccharolobus islandicus follows a defined cell cycle program, with two growth phases, G1 and G2, interspersed by a chromosome replication phase (S), and followed by genome segregation and cytokinesis (M-D) phases. To study whether and which other processes are cell cycle-coordinated, we synchronized cultures of S. islandicus and performed an in-depth transcriptomic analysis of samples enriched in cells undergoing the M-G1, S, and G2 phases, providing a holistic view of the S. islandicus cell cycle. We show that diverse metabolic pathways, protein synthesis, cell motility and even antiviral defense systems are expressed in a cell cycle-dependent fashion. Moreover, application of a transcriptome deconvolution method defined sets of phase-specific signature genes, whose peaks of expression roughly matched those of yeast homologs. Collectively, our data elucidate the complexity of the S. islandicus cell cycle, suggesting that it more closely resembles the cell cycle of certain eukaryotes than previously appreciated.
Introduction
The life of a cell unfolds through a series of intricately coordinated events that culminate in the production of two daughter cells. Faithful execution of this program, known as the cell cycle, ensures the perpetuation of cellular life. The core processes essential for cell cycle progression, such as accumulation of biomass, genome replication and cytokinesis, are common to all organisms1,2,3, but their coordination and the underlying molecular mechanisms exhibit remarkable diversity. For instance, eukaryotes encode a diverse set of cyclins1, the molecular regulators of the cell cycle, for which no homologs exist in archaea and bacteria2,3. Understanding the interplay between different cellular processes and the evolution of these relationships in different cellular domains is of fundamental interest and practical importance, because it offers insights into how both unicellular and multicellular life forms have evolved on our planet.
Bacteria and eukaryotes rely on distinct genome replication and cell division machineries, with bacteria using the FtsZ-based system for division4 and eukaryotes employing the ESCRT (endosomal sorting complexes required for transport) machinery for membrane abscission during cytokinesis5,6. In most model eukaryotes in which cell cycle has been studied, genome replication and cytokinesis are separated in time with the cell cycle being divided into four phases1,7,8: (i) during the first gap (G1) phase the cell grows and prepares for genome replication; (ii) the synthesis of genomic DNA takes place during the S phase; (iii) the second gap (G2) phase is a period of rapid cell growth and protein synthesis; and, finally, (iv) during the mitosis (M) phase the sister chromatids are segregated and the cell is divided in two. Progression through the cell cycle phases is tightly controlled at several checkpoints1,7,8, with errors at any of the checkpoints leading to devastating consequences at both cellular and organismal levels, including cell death, cancers and various other pathologies9,10,11.
In bacteria, the cell cycle is traditionally divided into three periods2,12: (i) the birth (B) period defined as the time between cell birth and initiation of genome replication, (ii) the C period–from chromosome replication initiation to termination, and (iii) the D period, which corresponds to the time between completion of DNA replication and cell division. Unlike in eukaryotes, the periods of the bacterial cell cycle are typically less strictly separated in time. For instance, bacterial genome replication is concomitant with segregation of chromosomal DNA into developing daughter cells and, depending on the growth conditions, generation time can be shorter than the combined duration of the C and D periods, leading to multiple DNA replication initiation events and overlapping replication cycles in each cell13. Accordingly, bacteria generally lack the cell cycle checkpoints, although in some bacterial models, cell volume and other characteristics (e.g., motility or lack thereof) are important for cell cycle progression14,15,16.
Archaea comprise a distinct domain of life and represent the closest ancestors of eukaryotes17,18. These single-celled organisms display a remarkable diversity of metabolic capabilities, environmental adaptations and molecular machineries responsible for key cellular processes. Similar to bacteria, archaea have circular chromosomes with most genes organized into operons19. However, archaeal proteins involved in replication, transcription and translation are more closely related to homologs in eukaryotes20,21. In contrast, cell division systems can be either bacterial-like, based on the FtsZ rings22,23,24, or eukaryotic-like, centered around the ESCRT complex25,26,27,28.
Thermoacidophilic archaea (optimal growth at ~80 °C and pH ~3) in the order Sulfolobales emerged as models for cell biology and cell cycle studies29. The coccoid-shaped Sulfolobales cells use the ESCRT machinery for division and follow the eukaryotic-like cell cycle paradigm (Fig. 1A). An exponentially growing Sulfolobales cell starts the cycle with a short (<5% of the cell cycle) pre-replicative G1 phase, which is followed by the genome replication S phase (30-35% of the cycle). Then, the cell enters the longest G2 phase (>50% of the whole cycle) during which the cell prepares for genome segregation. Finally, the cycle culminates with two short, M and D, phases (each lasting <5% of the cycle) during which the genome copies are segregated and the cell is divided19,29. The overall program of the cell cycle appears to be conserved throughout the class Thermoprotei (formerly known as phylum Crenarchaeota)30. Importantly, the cell cycle in a Sulfolobales population can be synchronized using a transient treatment with acetic acid, which presumably induces respiration uncoupling, arresting the cells in the post-replicative G2 cell cycle phase. Acetic acid removal allows the near-synchronous resumption of the cell cycle29. Whereas the overall outline of the Sulfolobales cell cycle and coordination between genome replication and cytokinesis have been defined19,25,31,32,33,34, it remains unknown whether other central cellular processes, such as diverse metabolic and catabolic pathways or protein translation, are harmonized with the cell cycle.
A The cell cycle of an exponentially growing Saccharolobus cell. Cell cycle phases occupy an area proportional to their typical duration. Morphological changes occurring throughout the cell cycle are depicted next to the corresponding phases. B Experimental workflow for the transcriptomic analysis. Fifteen S. islandicus colonies were isolated and inoculated in liquid medium. The corresponding cultures were synchronized and samples collected at the indicated time points. The cell cycle phase at each sampled time point was determined by evaluating the genomic content of the cells using flow cytometry in three replicates (a representative profile is shown). The total RNA was extracted and sequenced for each of the 15 cultures during the time points corresponding to the enrichment in cells undergoing the M-G1, S and G2 phases. C Differential gene expression by pair-wise comparison of the three analyzed phases (M-G1, S and G2) represented using volcano plots. The y-axis represents the adjusted p value in logarithmic scale in base 10. The x-axis represents the fold change (FC) between the two phases in logarithmic scale in base 2. The horizontal lines mark the thresholds for significance, i.e., a p value of <0.01 (-log10(0.01) = 2). The vertical lines mark the thresholds for strong differential expression, i.e., a two-fold change in either direction (log2(FC) = ±1). Statistical significance is extracted from the DGE analysis (Supplementary Data 1), where it was calculated with the limma package and p values were adjusted for multiple comparisons using the Benjamini-Hochberg method. Genes of interest are marked with general annotations.
To obtain a more integrated view of the different processes taking place during the archaeal cell cycle, we performed a deep transcriptomic analysis using Saccharolobus islandicus (formerly Sulfolobus islandicus; order Sulfolobales) as a model. We analyzed differential gene expression patterns during distinct cell cycle phases and harnessed the statistical framework of the consensus Gene Co-expression Networks (GCN) to analyze negative and positive correlations between all expressed genes. The two complementary analytical approaches showed that the cell cycle of S. islandicus more closely follows the general eukaryotic paradigm than previously appreciated. We show that not only replication, chromosome segregation and division, but also other cellular processes occur in a cell cycle-dependent manner. Finally, we used a transcriptome deconvolution method to identify signature genes that are specific to particular cell cycle phases. Remarkably, these genes were generally well-conserved across Thermoproteota and their timing of expression matched the peak of expression of their homologs in yeast. Collectively, our data substantially improve the understanding of the S. islandicus cell cycle, opening new avenues for future research.
Results and discussion
Overview of the transcriptional landscape across cell cycle phases
To analyze whether S. islandicus cells coordinate different processes as they progress through the cell cycle, we sequenced the transcriptomes at three time points during which the populations were enriched in cells undergoing the M-G1, S and G2 phases, as evidenced by flow cytometry (Fig. 1B, and Supplementary Fig. S1A). Due to the short duration of the consecutive M, D and G1 phases, populations specifically enriched in these phases could not be collected separately. Hence, the corresponding populations were pooled together within a single sample, denoted herein as M-G1. To obtain additional information about the coordination of gene expression, we leveraged the statistical framework of the consensus Gene Co-expression Networks (GCN). Out of the total of 2630 predicted genes, including protein coding genes, tRNA, rRNA and ncRNA, 2558 were expressed (i.e., at least one read per million reads), with 2356 genes being expressed in all three phases (Supplementary Fig. S1B, and Supplementary Data 1). Principal component analysis showed that 56.4% of the variance can be explained by the first three principal components and that samples can be clustered according to the cell cycle phases (Supplementary Fig. S1C). Notably, comparison of the transcriptomes from synchronized cultures with several publicly available transcriptomes from unsynchronized cultures showed that the two datasets clustered separately (Supplementary Fig. S1C), suggesting that the variance observed in our study is due to differences between the cell cycle phases.
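The expression criterion mentioned above (at least one read per million reads) can be illustrated with a minimal sketch. The gene names, read counts and the helper functions `counts_per_million` and `expressed_genes` are hypothetical and are not taken from the study's pipeline; how the criterion was applied across replicates is not described here, so the sketch handles a single sample only.

```python
def counts_per_million(counts):
    """Convert raw read counts per gene to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no mapped reads")
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def expressed_genes(counts, min_cpm=1.0):
    """Return the genes whose CPM reaches the threshold (default: 1 CPM)."""
    cpm = counts_per_million(counts)
    return {gene for gene, value in cpm.items() if value >= min_cpm}


# Hypothetical raw counts for one sample (illustrative values only).
sample = {"geneA": 1_500_000, "geneB": 499_999, "geneC": 1, "geneD": 0}
print(sorted(expressed_genes(sample)))  # geneC falls below 1 CPM (0.5 CPM)
```

In this toy library of two million reads, a gene needs at least two reads to reach 1 CPM, which is why `geneC` is filtered out.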
The S. islandicus chromosome is structurally organized into A and B compartments that have high and low gene expression, respectively35,36. Consistently, genes in the A compartment had on average higher expression than genes in the B compartment during all cell cycle phases (Supplementary Fig. S1D), with the expression being the highest near the three origins of replication (ori). However, there was only negligible difference (t-test, p = 0.07) in the overall level of expression within the ori proximal regions during different cell cycle phases.
Pairwise comparison of the gene expression profiles during consecutive cell cycle phases, namely, M-G1 vs S, S vs G2 and G2 vs M-G1, revealed 743 differentially expressed genes (i.e., more than 1.4-fold change difference, adjusted p value < 0.01), which were either up- or down-regulated during a particular cell cycle phase (Fig. 1C; Supplementary Fig. S2), suggesting that expression of 28% of S. islandicus genes follows a coordinated pattern throughout the cell cycle. Of the 743 differentially expressed genes, 308 were specifically up- or down-regulated during a particular cell cycle phase, pointing to marked differences between the processes that occur during the three phases. The M-G1 phase displayed the highest number of differentially expressed genes when compared to either S or G2 phase (n = 198 in M-G1 vs n = 3 in S and n = 107 in G2).
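The selection criteria used above (more than 1.4-fold change in either direction, adjusted p value < 0.01) can be sketched as follows. The study computed its statistics with the limma package; the code below does not reproduce limma and only illustrates the Benjamini-Hochberg adjustment and the thresholding step. The function names and the example values are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def is_differential(fold_change, adj_p, min_fc=1.4, max_adj_p=0.01):
    """Apply the fold-change and adjusted p value thresholds in either direction."""
    return abs(math.log2(fold_change)) > math.log2(min_fc) and adj_p < max_adj_p


# Hypothetical fold changes and raw p values for four genes.
fold_changes = [2.0, 0.5, 1.2, 3.0]
p_values = [0.001, 0.008, 0.039, 0.041]
adj = benjamini_hochberg(p_values)
calls = [is_differential(fc, p) for fc, p in zip(fold_changes, adj)]
print(calls)
```

Working on the log2 scale treats up- and down-regulation symmetrically, so a fold change of 0.5 is handled the same way as a fold change of 2.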
Differential gene expression (DGE) analysis provides information on the changes in the extent of expression of genes during a given cell cycle phase. However, although instructive, this information does not capture the subtle changes in the rewiring of gene co-expression patterns, which might have a major impact on the progression of the cell cycle. Thus, to gain an orthogonal view on the co-expression of genes during different phases of the cell cycle, we constructed consensus GCNs for each of the three phases. Each GCN consists of nodes and edges, where nodes are genes and edges, i.e., lines connecting the nodes, represent statistically significant positive or negative correlations between the expression values of the connected genes. Each GCN had a different number of nodes and edges, which were classified into (i) the ‘core’, i.e., common to all three phases, (ii) ‘phase-specific’, i.e., co-expressed during one of the phases or (iii) ‘other’, if the co-expression was detected during two of the three phases (Supplementary Fig. S3). This analysis revealed that the S phase is characterized by the most complex network containing the highest number of edges and nodes (Supplementary Data 1), indicative of higher coordination of gene expression during this phase compared to the other phases. The DGE and GCN analyses show that a considerable fraction of genes follows cell cycle-dependent patterns of expression that manifest either at the level of expression strength or co-expression wiring. Notably, the GCNs can be used to uncover diverse aspects of cell biology, including functional associations between genes and shared promoter signatures (see Supplementary Note 1 for details).
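The three-way classification described above (core, phase-specific, other) amounts to counting in how many phase networks a given edge appears. The sketch below illustrates this for edges only; the data structure, the helper names `classify_edges` and `edge`, and the toy gene identifiers are assumptions made for illustration and do not describe how the consensus GCNs were actually built.

```python
from collections import Counter


def classify_edges(networks):
    """Label each edge as 'core', 'phase-specific' or 'other'.

    `networks` maps a phase name to a set of edges; each edge is a
    frozenset holding the two connected gene identifiers, so that
    (a, b) and (b, a) count as the same edge.
    """
    n_phases = len(networks)
    occurrences = Counter(e for edges in networks.values() for e in edges)
    labels = {}
    for e, count in occurrences.items():
        if count == n_phases:
            labels[e] = "core"
        elif count == 1:
            labels[e] = "phase-specific"
        else:
            labels[e] = "other"
    return labels


def edge(a, b):
    """Build an undirected edge between two gene identifiers."""
    return frozenset((a, b))


# Toy networks with hypothetical gene identifiers.
toy = {
    "M-G1": {edge("g1", "g2"), edge("g2", "g3")},
    "S": {edge("g1", "g2"), edge("g2", "g3"), edge("g3", "g4")},
    "G2": {edge("g2", "g1"), edge("g4", "g5")},
}
labels = classify_edges(toy)
for e, label in sorted(labels.items(), key=lambda item: sorted(item[0])):
    print(sorted(e), label)
```

The same counting logic would apply to nodes by replacing the edge sets with sets of gene identifiers.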
Composition and properties of the core network
In contrast to the phase-dependent expression of genes that, by definition, characterize a particular cell cycle phase, the genes and groups of genes that ensure the maintenance of cell viability are expected to be co-expressed throughout the cell cycle. This co-expression is expected to occur regardless of whether the products of the co-expressed genes are functionally coupled, i.e., function in the same process or pathway. Analysis of such constitutively co-expressed genes, which we refer to as the housekeeping ‘core’ network, allowed us to define the housekeeping tasks necessary for the functioning of a cell. The core GCN, consisting of the core nodes connected by the core edges, contained 417 genes (Supplementary Data 2; note that only groups of four or more co-expressing genes were included as part of the core network). Analysis of the overall expression levels of the genes forming the core network revealed notable variation in expression levels across cell cycle phases, with significantly higher activity during the S and G2 phases (p value < 0.0001; Fig. 2A).
A Violin plots of the expression of the genes included in the core network throughout the three phases: M-G1, S and G2. A dashed horizontal line marks the average of the M-G1 phase as a reference. The expression of each gene corresponds to the average of 15 biological replicates; the expression of 411 core genes included in the DGE analysis is plotted in each violin plot. In the box plots, the center line represents the median; the box limits, the first and third quartiles; whiskers extend 1.5 times the interquartile range; data beyond the whiskers are outliers represented as points. Statistical significance was calculated by a two-sided paired Wilcoxon test. *** = p value < 0.0001. p values M-G1 vs S = 5.03e-18, M-G1 vs G2 = 6.1e-27. B Functional classification of the core genes. Genes were annotated and classified using the arCOG framework (see Methods). Correspondence between the letter code and the name of each category is provided in Supplementary Data 1. The different categories are grouped into four classes. C Topological representation of the core network. Each node is colored according to its arCOG category following the same color coding as in (B). D Essentiality of the genes in the core network compared to the non-core genes. E Compartmentalization of the genes in the core network compared to the non-core genes. Information on compartmentalization of the genome was extracted from data obtained previously36. F Conservation of the genes in the core network compared to the non-core genes. Genes were assigned to four categories: (i) exclusive to Saccharolobus, (ii) restricted to Sulfolobales, (iii) conserved across Thermoproteota, and (iv) present in Thermoproteota and Methanobacteriota, Halobacteriota and Thermoplasmatota. Statistical significance was calculated by performing a Chi-square test. *** = p value = 6.347e-10.
All the nodes (i.e., genes) in the core network were assigned to functional categories according to the archaeal Clusters of Orthologous Genes (arCOG) annotation37 (Fig. 2B). Only ~15% of the genes in the core network could not be functionally annotated (arCOG categories R and S). Most genes in the core network were associated with central cellular processes such as translation, replication, energy production and various metabolic pathways. Analysis of the network topology revealed several differentiated subclusters (Fig. 2C). Their presence suggests the existence of transcriptional fine-tuning in functionally linked genes. For instance, the largest subcluster was enriched in genes responsible for ribosome structure (e.g., ribosomal proteins uL23, uL2, uS3, uS8), transcription (e.g., transcription elongation factor NusA, DNA-directed RNA polymerase), genome replication and repair (e.g., DNA polymerase B1, topoisomerase VI, HerA, NurB), respiration (e.g., ATP-synthase, succinate dehydrogenase), carbon fixation (e.g., succinyl-CoA synthetase, aconitase A) and included genes responsible for the maintenance of the cell envelope (S-layer protein) (Fig. 2C). By contrast, several ABC-type transporters and sulfur metabolism genes (e.g., heterodisulfide reductase, sulfite reductase, ATP-sulfurylase), although also included in the core network, formed separate GCN subclusters. Notably, some of the CRISPR defense system genes (e.g., Cas10, Cmr6g7, Cmr1g7 and Cmr5SS of the Cmr-β cassette) were part of the core network, included in the largest subcluster, whereas other cas genes were expressed during particular cell cycle phases (see below), suggesting subtle transcriptional control of this complex defense system.
We assessed whether the core network is enriched in essential genes. To this end, we took advantage of the genome-wide gene essentiality information available for a closely related S. islandicus strain M.16.438 and compared the fraction of essential genes in the core network to that of the non-core genes. The proportion of essential genes was more than twice as high in the core network as in the rest of the genome (34% vs 13%, p value < 0.01; Fig. 2D). The majority (n = 115, 82%) of the essential genes were found in the largest subcluster of co-expressed core genes (Supplementary Fig. S4A), suggesting the existence of a mechanism ensuring coordinated and stable co-expression of genes that are critical for the cell functions. We also assessed the distribution of the core genes with respect to the chromosomal A and B compartments. Slightly more than half (57.6%) of the core network genes were localized in the A compartment (p value < 0.01; Fig. 2E, Supplementary Fig. S4B). Collectively, these observations suggest that the S. islandicus chromosome has evolved to accommodate essential genes important throughout the cell cycle in the transcriptionally active, less tightly condensed part of the chromosome.
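A comparison of proportions such as the one above (essential genes in core versus non-core sets) can be tested with a chi-square test on a 2x2 contingency table, the test named in the legend of Fig. 2. The sketch below is a minimal illustration under stated assumptions: the counts are invented so that they mirror the reported percentages (34% vs 13%) and are not the study's actual gene counts, no continuity correction is applied, and the function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Rows: core / non-core genes; columns: essential / non-essential.
    Returns the statistic and the p value for one degree of freedom.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table sums to zero")
    statistic = n * (a * d - b * c) ** 2 / denominator
    # For one degree of freedom the chi-square tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Invented counts: 34 of 100 core genes and 13 of 100 non-core genes essential.
stat, p = chi_square_2x2(34, 66, 13, 87)
print(f"chi-square = {stat:.3f}, p = {p:.2e}")
```

With real data, the table would hold the actual numbers of essential and non-essential genes in each set, and the resulting p value depends strongly on those totals.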
We then evaluated to what extent the core network is conserved in other archaea. To this end, genes from the core network were assigned to one of the four categories: (i) genes exclusive to the genus Saccharolobus, (ii) those restricted to the order Sulfolobales, (iii) genes conserved across the phylum Thermoproteota, and (iv) genes present in Thermoproteota and three other archaeal phyla (as defined by GTDB), namely, Methanobacteriota, Halobacteriota and Thermoplasmatota. We found that the core genes displayed significantly higher conservation compared to non-core genes, with 93.5% of the core genes being conserved across Thermoproteota and 62.5% also being conserved in the three other phyla. For the non-core genes, these fractions decreased to 83.7% and 46.1%, respectively (p value < 0.01; Fig. 2F). Moreover, analysis of the network topology showed that the most widely conserved genes occupied the central position of the largest co-expression subcluster (Supplementary Fig. S4C). Indeed, the degree of the most widely conserved genes was significantly higher than that of less conserved genes (Supplementary Fig. S4D), suggesting that the interplay between the core genes evolved prior to the radiation of archaeal diversity and that new, taxon-specific genes and their interactions with the conserved components of the core network have been established subsequently, likely concomitant with archaeal diversification.
Key cellular processes are coordinated with the cell cycle
To determine how S. islandicus coordinates different processes along the cell cycle, we assessed the differential expression and co-expression during different cell cycle phases of all 2558 expressed genes (Fig. 3A, B and Supplementary Fig. S5A). Additionally, KEGG enrichment analysis facilitated the identification of metabolic pathways significantly upregulated during each phase (Fig. 3C, and Supplementary Fig. S5B–D).
A Classification of the strongly differentially expressed genes in each pair-wise comparison by arCOG category. Genes were annotated and classified using the arCOG framework (see Methods). Genes included in the graph were only those strongly differentially expressed, i.e., their fold change (FC) was at least ±2 and their adjusted p value was <0.01. Statistical significance is extracted from the DGE analysis (Supplementary Data 1), where it was calculated with the limma package and p values were adjusted for multiple comparisons using the Benjamini-Hochberg method. The number of genes is shown on the x-axis, genes to the right of the vertical axis are upregulated in the corresponding comparison, while genes to the left are downregulated. B Specific co-expression during the S phase. GCN of the S phase with the specific nodes highlighted, grouped and colored by arCOG category (see Supplementary Fig. S5 for GCNs of the M-G1 and G2 phases). C Differences in the metabolism between phases. A full metabolic map of S. islandicus was extracted from the KEGG database94 (accession number: sir01100 [https://www.kegg.jp/pathway/sir01100]). Each dot represents a metabolite while each line represents the enzymes that transform those metabolites. Each pathway is classified into six groups depending on their peak of activity according to the KEGG enrichment and DGE analysis: (i) ‘All’, if their activity does not change, (ii) ‘M-G1’, (iii) ‘S’ or (iv) ‘G2’, if their activity peaks at each of the corresponding phases, and (v) ‘M-G1/S’ or (vi) ‘S/G2’, if their activity increased at the M-G1 or S phases and was maintained high into the following phase. Different pathways of interest are labeled on the map next to their metabolites.
Lipid biosynthesis and membrane biogenesis
Following cell division, the daughter cells start accumulating biomass and increase in size, up until the next round of cell division19. Increase in the surface area of the cell necessitates the synthesis of additional lipids. Similar to other members of the Sulfolobales, the S. islandicus cell membrane primarily consists of different species of glycerol dibiphytanyl glycerol tetraethers (GDGTs), which contain diverse polar head groups and a variable number of cyclopentane rings in the hydrophobic isoprenoid core39. Nine enzymes that participate in the synthesis of GDGT lipids have been identified40,41,42,43 (Fig. 4A) and five of them are significantly upregulated during the M-G1 phase compared to either S or G2 phase (Fig. 4B). Among these, digeranylgeranylglyceryl phosphate (DGGGP) synthase, which catalyzes the formation of an ether bond linking the second isoprenoid chain to the lipid precursor44, and calditol synthase (Cds), responsible for the synthesis of a unique cyclopentyl head group which plays a key role in the acid resistance of Sulfolobales43, are the most strongly upregulated genes. Notably, the GCN analysis showed that the lipid biosynthesis pathway is highly coordinated during the M-G1 and S phases, compared to G2 (Fig. 4C, inset) and synchronized with expression of diverse membrane proteins, post-translational modification enzymes and various systems embedded in the membrane, such as the Complex II of the electron transport chain and the ATP synthase (Fig. 4C). This observation suggests the existence of a link between cell membrane dependent systems and lipid biosynthesis. During M-G1, we observed higher specific co-expression, compared to S and G2, of genes encoding glycosyltransferases (SiRe_RS03975, SiRe_RS04230, SiRe_RS08195 and SiRe_RS02080), which according to their arCOG annotation are predicted to participate in membrane biogenesis (Supplementary Fig. S5A). 
Furthermore, many of the genes co-expressed with the lipid biosynthesis enzymes encode poorly characterized proteins, which could represent missing players in the membrane biogenesis processes. For instance, glycosyltransferase (SiRe_RS06775) and N-acetylneuraminate lyase (SiRe_RS10400) identified in the GCN analysis could participate in the synthesis of the lipid headgroups (Fig. 4C).
A Major steps of membrane biogenesis in S. islandicus. Dashed line represents possible modifications the polar head may have. IPP, isopentenyl pyrophosphate; DMAP, dimethylallyl monophosphate; GGPP, geranylgeranyl diphosphate; GGGP, geranylgeranylglyceryl phosphate; G1P, glycerol-1-phosphate; DGGGP, digeranylgeranylglyceryl phosphate; Tes, tetraether synthase; Cds, calditol synthase. B Expression of membrane biogenesis enzymes. Box plot of the expression of the enzymes in the 15 replicates. The center line represents the median; the box limits, the first and third quartiles; whiskers extend 1.5 times the interquartile range; data beyond the whiskers are outliers represented as points. Statistical significance is extracted from the DGE analysis, calculated with the limma package and p values were adjusted for multiple comparisons using the Benjamini-Hochberg method. * = p value ≤ 0.01. ** = p value ≤ 0.001. *** = p value ≤ 0.0001. Exact p values are in Supplementary Data 1. C Gene co-expression subnetworks of membrane biogenesis enzymes. Subnetworks contain only those nodes that co-express with at least two membrane biogenesis enzymes. Nodes are colored by arCOG category as in Fig. 2B, positive edges are green and negative edges, red. C inset. Line plot representing the number of nodes of the subnetwork of all genes co-expressing with the membrane biogenesis enzymes. D Expression of adhesive pili and archaellum. Box plot of their expression in the 15 replicates as in (B). Statistical significance as in (B). E Gene co-expression subnetwork at the S phase of the adhesive pili and archaellum. The subnetwork contains only those nodes that co-express with at least two of their components. Coloring as in (C). E inset. Line plots represent the number of nodes of the subnetworks of all genes co-expressing with the adhesive pili and the archaellum.
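The subnetwork rule stated in the legend above (keep only nodes that co-express with at least two membrane biogenesis enzymes) can be sketched as a simple filter over an edge list. The function name `subnetwork_nodes`, the edge representation and the toy identifiers are hypothetical; the sketch also keeps the seed enzymes themselves, which is an assumption rather than something the legend specifies.

```python
def subnetwork_nodes(edges, seeds, min_links=2):
    """Return seed genes plus every gene connected to at least `min_links` seeds."""
    seeds = set(seeds)
    linked = {}
    for a, b in edges:
        if a in seeds and b not in seeds:
            linked.setdefault(b, set()).add(a)
        elif b in seeds and a not in seeds:
            linked.setdefault(a, set()).add(b)
    partners = {gene for gene, hits in linked.items() if len(hits) >= min_links}
    return seeds | partners


# Toy co-expression edges; "enzA" and "enzB" stand in for seed enzymes.
toy_edges = [
    ("enzA", "x"),
    ("enzB", "x"),
    ("enzA", "y"),
    ("y", "z"),
    ("enzA", "enzB"),
    ("x", "enzA"),  # duplicate of the first edge in the opposite orientation
]
kept = subnetwork_nodes(toy_edges, {"enzA", "enzB"})
print(sorted(kept))
```

Collecting the linked seeds in a set means that a repeated edge does not count twice towards the threshold.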
Genome replication and architecture
Genome replication is one of the focal points of the cell cycle and, by definition, occurs during the S phase, as evidenced by flow cytometry analysis (Fig. 1B). Unexpectedly, some of the key components of the replisome, such as replicative DNA polymerase PolB1 (SiRe_RS07370), replicative minichromosome maintenance (MCM) helicase (SiRe_RS06220), Gins23 (SiRe_RS06225), which participates in replication initiation and elongation, PolB1 binding protein 2 (SiRe_RS07230), and PCNA sliding clamp (SiRe_RS08085), are upregulated during the M-G1 phase (versus the G2 phase), a period preceding the actual S phase (Supplementary Data 1). Moreover, Orc1-1 (SiRe_RS08850), Orc1-3 (SiRe_RS00005) and WhiP (SiRe_RS06120), the three replication initiators of S. islandicus, are also upregulated during the M-G1 phase (versus the G2 phase), with expression of Orc1-3 and WhiP also being maintained throughout the S phase. The peak expression of Orc1-1 and WhiP was observed during M-G1, whereas Orc1-3 expression peaks throughout the S phase. These results were validated for selected replisome genes using RT-qPCR and compared to non-synchronized cultures (Supplementary Fig. S6). Notably, the ATP-dependent DNA ligase (SiRe_RS09250), which ligates the Okazaki fragments during the lagging strand synthesis21, is upregulated during the S phase (when compared to the M-G1 phase), indicating that some replisome components have a different temporal expression during the cell cycle. Nevertheless, these observations suggest that the cell prepares for DNA replication in advance, by synthesizing most of the necessary enzymes during the M-G1 phase. Alternatively, the replisome components could also participate in DNA repair, preparing the genome for replication during the S phase. Indeed, some of the DNA repair genes are upregulated during the M-G1 phase as well (see Supplementary Note 2).
Many chromatin proteins, including Cren7 (SiRe_RS05625), two Sul7d family proteins (SiRe_RS03405 and SiRe_RS13370) and two Sso7c4 homologs (SiRe_RS07595 and SiRe_RS09950), were upregulated during the S phase compared to either the M-G1 or G2 phases (Fig. 3A, Supplementary Data 1), suggesting that chromatinization takes place concomitant with DNA replication. It has been recently suggested that Lrs14 family proteins of Sulfolobales should be considered as chromatin organizing proteins45. Our data show that one of the Lrs14 family proteins in S. islandicus, SiRe_RS09945, displays a similar transcription pattern as the main chromatin proteins, supporting the conclusion of De Kock et al.45 that Lrs14 is involved in chromatin organization. Given that chromatin proteins are among the most abundant proteins in the cell, chromatinization is likely to necessitate extensive protein translation during the S phase. Indeed, many translation-related genes, including those encoding a subset of ribosomal proteins, tRNAs, glycyl-tRNA synthetase, a subunit of the RNase P and translation initiation factor 6 as well as thermosome responsible for protein folding, were upregulated during the S phase (see Supplementary Note 3 for details).
Central metabolism
We next assessed whether the central metabolic pathways also displayed differential regulation during the cell cycle phases. To this end, we performed the KEGG enrichment analysis (Supplementary Fig. S5B-D) and analyzed the patterns of differential expression of genes assigned to arCOG categories related to metabolism (Fig. 3A). The two approaches provided congruent and complementary results. Many pathways, including those related to biosynthesis of amino acids and nucleotides (see Supplementary Note 4) as well as carbon metabolism, were not uniformly expressed across different cell cycle phases.
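Pathway enrichment of the kind referred to above is often formulated as a one-sided hypergeometric (over-representation) test; whether the KEGG enrichment in this study used exactly this test is not stated in this section, so the following is only a sketch of the general idea. The function name and the pathway, selection and overlap sizes are hypothetical; only the universe of 2558 expressed genes comes from the text.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """One-sided over-representation p value, P(X >= n_overlap).

    X is the number of pathway genes among `n_selected` genes drawn without
    replacement from `n_universe` genes, `n_pathway` of which are in the pathway.
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: a 30-gene pathway, 200 upregulated genes, 10 in common.
p_enrich = hypergeom_enrichment_p(2558, 30, 200, 10)
print(f"p = {p_enrich:.3g}")
```

In practice the p values obtained for all tested pathways would also need correction for multiple comparisons.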
The tricarboxylic acid (TCA) cycle (also known as the Krebs or citric acid cycle) is one of the key energy-generating metabolic pathways of the cell. Through a series of biochemical reactions, TCA releases the energy stored in nutrients through the oxidation of acetyl-CoA derived from carbohydrates, fats, and proteins. We found that the carbon fixation pathways, an assemblage of metabolic pathways terminating in the TCA, are upregulated during the S phase (versus M-G1 phase) and stay active during G2 (Fig. 3C, Supplementary Fig. S5B-D). These pathways fix carbon through the synthesis of malonyl-CoA, which can then be transformed into succinyl-CoA, a key intermediate in the TCA cycle. The subunits of the acetyl-CoA carboxylase (SiRe_RS01265, SiRe_RS01270, SiRe_RS01275), which synthesizes malonyl-CoA, as well as the methylmalonyl-CoA epimerase (SiRe_RS01085) and mutase (SiRe_RS01080), which catalyze the last step in the transformation to succinyl-CoA, are upregulated during the S phase and even more strongly during G2, compared to M-G1 (Supplementary Data 1). Alternatively, the succinyl-CoA can enter the autotrophic hydroxybutyrate cycle46 and produce acetoacetyl-CoA, which will be broken into two acetyl-CoA molecules. Some enzymes participating in this transformation, namely, succinyl-CoA reductase (SiRe_RS04600), succinate semialdehyde reductase (SiRe_RS07755), 3-hydroxybutyryl-CoA dehydrogenase (SiRe_RS11400) and two acetyl-CoA acetyltransferases (SiRe_RS07425 and SiRe_RS13030), follow expression patterns similar to enzymes producing succinyl-CoA. Moreover, although not significantly enriched in any of the phases, the TCA cycle includes several enzymes, such as succinate dehydrogenase (SiRe_RS00780 and SiRe_RS00785) and the aconitate hydratase (SiRe_RS0595), that are also upregulated during the S and G2 phases, compared to the M-G1 phase. These results indicate that the most central and important pathways for production of energy tend to be less active during M-G1.
Notably, the substrates used for energy production during the S and G2 phases appear to be different. For instance, fatty acid degradation pathways (Supplementary Fig. S5B) are upregulated during the S phase, according to the KEGG enrichment. This is concomitant with the downregulation during the S phase (compared to the M-G1 phase) of the transcriptional repressor FadR (SiRe_RS01515), a TetR-family transcriptional regulator that has been shown to repress β-oxidation47. By contrast, the G2 phase is associated with higher activation of the glycolysis/gluconeogenesis and other carbohydrate-related pathways as well as sulfur metabolism. Many of the differentially expressed genes in G2 are implicated in glycolysis/gluconeogenesis and sulfur metabolism. These include sulfide:quinone oxidoreductase (SiRe_RS13005), implicated in sulfur metabolism, and phosphoenolpyruvate carboxykinase (SiRe_RS02025), aldose 1-dehydrogenase (SiRe_RS11380) and gluconate dehydratase (SiRe_RS10395), which participate in the glycolysis/gluconeogenesis and the pentose phosphate pathways (Fig. 3C). Because many of the enzymes in the glycolysis/gluconeogenesis pathway are shared with the fructose, mannose, sucrose and starch metabolism, the latter pathways were also enriched in the G2 phase (Supplementary Fig. S5C). However, no differential gene expression of enzymes specific for each of these pathways was found during the S and G2 phases. Moreover, because S. islandicus lacks the essential phosphofructokinase needed to perform glycolysis through the canonical Embden-Meyerhof-Parnas pathway48, it is likely that glucose or fructose molecules are metabolized via the pentose phosphate pathway to glyceraldehyde or glycerate and then introduced into the glycolytic pathway to be fully oxidized. Finally, the profound shifts in the metabolic landscape during the G2 phase are also supported by the largest density of specifically co-expressed metabolism-related genes during this phase. In particular, we observed specific co-expression of genes implicated in energy production and conversion and the carbohydrate metabolism genes (Supplementary Fig. S5A).
Cell division, motility and adhesion
As explained above, mitosis (M), division (D) and the pre-replicative first gap (G1) phases occur in rapid succession, precluding us from obtaining populations enriched in these discrete phases. Nevertheless, some of the proteins are known to be markers for the M and D phases. In particular, chromosome segregation during the M phase is mediated by a pair of proteins, SegA and SegB49, while cell division during the D phase is driven by the ESCRT-based machinery27,28. Consistently, during M-G1, we observed strong upregulation of genes encoding the pair of genome segregation proteins (Fig. 3A, Supplementary Data 1) and ESCRT-based cell division machinery (including CdvA, ESCRT-III, ESCRT-III-1, ESCRT-III-2 and Vps4), compared to either S or G2 phase. Moreover, aCcr1, a transcription factor which terminates the cell division by repressing cdvA, exhibits peak expression during the M-G1 phase, consistent with the published results32. Notably, several other transcription factors display strongly pronounced cyclical expression patterns, suggestive of their importance for the progression of the cell cycle (see Supplementary Note 5 for details).
As in other members of the Sulfolobales50,51,52,53,54, motility and adhesion in S. islandicus are mediated by two evolutionarily related but functionally distinct extracellular filaments, the archaeal flagellum (or archaellum) and adhesive pili, respectively. Both filaments are composed of pilins/archaellins related to bacterial type IV pilins55,56,57 that are secreted through a membrane pore with the help of a cognate ATPase motor58,59. Live cell imaging of S. acidocaldarius cells revealed changes in the cell adhesion and motility around the time of division, suggesting the existence of coordination between these processes60. In particular, ~80% of the observed cells underwent a transient loss of adhesion immediately prior or during cell division and >40% of newborn daughter cells rapidly moved away from the site of division60. Our data are fully consistent with these findings. Differential gene expression analysis showed that the pore and ATPase that secrete the archaellins are upregulated during the M-G1 and S phases, compared to the G2 phase (Fig. 4D). By contrast, the pore and ATPase responsible for the export of the adhesive pilins are both upregulated during the G2 phase (Fig. 4D). These patterns suggest that adhesive pili, present during the G2 phase, would be replaced by archaella following cell division. Interestingly, during the S phase, adhesive pilins and the ATPase of the archaellum were inversely co-expressed with an FKBP family peptidyl-prolyl cis-trans isomerase chaperone (SiRe_RS06340), suggesting the involvement of the latter in the switch between swimming motility and adhesion (Fig. 4E).
Notably, the expression of the archaellum system seems to be activated by the one-component system ArnR, with deletion of this system in S. acidocaldarius affecting the expression of the archaellin and impairing motility61. Our data shows the ArnR ortholog in S. islandicus (SiRe_RS00635) is upregulated during the M-G1 phase (compared to G2), consistent with the possibility that activation of the archaellum operon during the M-G1 and S phases is regulated by the ArnR transcription factor. Moreover, the repression of the archaellum operon during the G2 phase is concomitant with the upregulation of the gene encoding coalescin (SiRe_RS00270), an SMC superfamily chromosome organizing protein which has been shown to have a high occupancy in the archaellum operon36. High occupancy of coalescin within particular genomic loci impedes the access of RNA polymerase and results in transcriptional repression.
Defense systems
Our data shows a strong upregulation of various defense related genes during the S and G2 phases, compared to the M-G1 phase (Fig. 3A). Defense against viruses and mobile genetic elements in S. islandicus REY15A is primarily mediated by the CRISPR (clustered regularly interspaced short palindromic repeats) system62 (see Supplementary Note 6 for description of the S. islandicus CRISPR systems). In addition to the more extensively studied CRISPR-Cas systems, S. islandicus REY15A encodes two recently predicted, but functionally uncharacterized defense systems, namely, Hma63, composed of a helicase (SiRe_RS03020), methyltransferase (SiRe_RS03035) and ATPase (SiRe_RS03030), and the Methylation Associated Defense System (MADS) (SiRe_RS00305 and SiRe_RS00315)64 (Fig. 5A).
A Organization of the different defense cassettes in the S. islandicus REY15A genome. CRISPR-Cas and Hma loci of REY15A are shown with different color-coded boxes. A diagonal dotted dash indicates a gap between the genes or cassettes. B Expression of the different defense related genes and cassettes throughout the cell cycle. Pair-wise comparisons between phases are represented in three volcano plots. The y-axis represents the adjusted p value in logarithmic scale in base 10. The x-axis represents the fold change (FC) between the two phases in logarithmic scale in base 2. The horizontal lines mark the threshold for significance, i.e., a p value of <0.01 (-log10(0.01) = 2). The vertical lines mark the threshold for strong differential expression, i.e., a twofold change in either direction (log2FC = ±1). Statistical significance is extracted from the DGE analysis (Supplementary Data 1), where it was calculated with the limma package and p values were adjusted for multiple comparisons using the Benjamini-Hochberg method. Defense genes and cassettes are highlighted and color-coded. C Gene co-expression subnetworks of the defense components. The subnetworks are extracted from the full GCNs by maintaining only those nodes that co-express with at least two defense-related (CRISPR-Cas or Hma) genes. Nodes are colored by arCOG category following the same color code as in Fig. 2B, while edges are colored green if the co-expression is positive and red if it is negative. C inset. Size of the co-expression subnetwork of the CRISPR-Cas interference cassettes and the Hma cassette throughout the cell cycle. Line plot representing the number of nodes of the subnetwork of all genes co-expressing with the Cmr-β, the Cascade, the Cmr-α and the Hma cassettes at each of the phases of the cell cycle.
The differential gene expression analysis showed that different components of the Cascade complex, Cmr-α, and Cas6, which is responsible for the processing of the CRISPR RNA, were upregulated during the S phase, compared to the M-G1 phase, along with the Hma genes (Fig. 5B). All the aforementioned genes are also active during G2, when they are joined by the upregulated Cmr-β genes. Such pattern of expression suggests activation of the defense and surveillance systems during S and G2, albeit with slight variation between the different CRISPR types and modules. Curiously, concomitant with their downregulation during the M-G1 and subsequent de-repression during the S phase, networks of co-expression of CRISPR related genes showed an inverse correlation with transcriptional repressors during these two phases (two Lrp family transcriptional regulators, SiRe_RS02835 and SiRe_RS07120, during M-G1 and S, respectively; and two ArsR family transcriptional regulators, SiRe_RS09190 and SiRe_RS09585, during S phase) (Fig. 5C). In addition, a PIN-domain ribonuclease toxin (SiRe_RS03095) and several transposases (IS110 [SiRe_RS00565] and IS5 [SiRe_RS04210] family transposases and TnpB family nucleases [SiRe_RS04095, SiRe_RS05390, SiRe_RS06735]) also showed similar co-expression, indicative of a possible control by the defense system. By contrast, positive correlation between the expression of Cas nucleases with diverse cellular nucleases, including mRNA ribonuclease (SiRe_RS03015), Rrp4 cap of the RNA-degrading exosome complex65 (SiRe_RS06470) and TatD-family nuclease (SiRe_RS05420), and proteins implicated in DNA repair, such as Mre11 (SiRe_RS00310), PolB3 (SiRe_RS09745) and NreA (SiRe_RS06100) suggests the presence of yet to be discovered cellular partners of the CRISPR-Cas systems. Some of these partner proteins could be, for instance, activated by cyclic oligoadenylate (cOA) as in the case of Csx1 or play a role in preserving genome integrity upon CRISPR-Cas activation. 
During the S phase, which contains the most coordinated network of the CRISPR-Cas systems (Fig. 5C inset), all three components of the Hma system and genes of the putative MADS defense co-expressed with different CRISPR-Cas modules suggesting coordination between these distinct defense systems.
It is generally considered that defense systems are under tight regulatory control and activated only upon invasion of foreign mobile genetic elements, such as viruses or plasmids66,67. Our results suggest that this might not be entirely the case. Activation of some of the defense systems during the S phase could be triggered by the exposed DNA replication intermediates or more active proliferation of transposons. Alternatively, CRISPR systems may additionally function in non-defense contexts, for instance, during DNA repair, as previously hypothesized68. Notably, it has been recently demonstrated that in the halophilic archaeon Haloferax volcanii, the Cas3 protein, a component of type I systems, facilitates rapid recovery from DNA damage69. Indeed, our co-expression networks show coordination of DNA repair and defense systems, suggesting a role of the defense systems in safeguarding the integrity of the genome.
Phase signature genes defined via transcriptome deconvolution
To validate the results of the differential gene expression and GCN analyses, and to identify the signature genes defining each phase, we applied a transcriptome deconvolution method to the transcriptomics and flow cytometry data (Supplementary Data 4; see Methods). The flow cytometry data demonstrates that each sample, although enriched in cells from the targeted phase, includes cells from all three phases (Supplementary Fig. S1). Using non-negative least squares (nnls) optimization we corrected gene expression levels in each sample by excluding transcription signal possibly originating from cells in non-targeted phases. S. islandicus genes were clustered based on corrected expression levels into four different groups depending on whether they tended to be more expressed at one of the three phases studied (groups 1-3) or displayed similar expression throughout the cell cycle (group 4) (Supplementary Data 4). This allowed identification of gene sets showing phase-specific expression patterns. In the next step, to gain a clearer view on the processes taking place during each of the phases, we filtered out all the poorly annotated genes assigned to arCOG categories R or S (Supplementary Data 4). The picture which emerged from this analysis was fully consistent with and complementary to the conclusions drawn from the manual analysis of the differential gene expression patterns and GCNs described in the previous sections (Fig. 6A). In particular, the signature genes specific to the M-G1 phase included those for genome segregation, cell division, DNA replication as well as nucleotide and lipid metabolism; the S-specific gene set included genes for diverse chromatin proteins and translation related genes as well as some of the carbon metabolism genes; the G2 phase was characterized by genes for carbohydrate and sulfur metabolism as well as changes in motility and defense. Moreover, functional enrichment analysis of the phase specific gene sets (Supplementary Data 4) shows significant enrichment of cell division, segregation and replication categories during the M-G1 phase; translation, transcription (which includes chromatin proteins), lipid metabolism and defense categories during the S phase; and defense and cell motility categories during the G2 phase.
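The deconvolution step described above can be illustrated with a minimal Python sketch for a single gene. It is not the authors' implementation (the analysis in this study was R-based): the phase fractions are invented for illustration, whereas in practice they would be derived from the flow cytometry profile of each sample, and the solver is a generic projected-gradient routine named `nnls_projected_gradient` here, which is a hypothetical helper rather than a library function.

```python
# Sketch of phase deconvolution for one gene by non-negative least squares.
# All numbers are illustrative assumptions, not values from the study.

def nnls_projected_gradient(A, b, iterations=5000):
    """Minimise 0.5 * ||A x - b||^2 subject to x >= 0 (projected gradient descent)."""
    n_rows, n_cols = len(A), len(A[0])
    # The squared Frobenius norm of A bounds the largest eigenvalue of A^T A,
    # so its inverse is a safe, if conservative, step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n_cols
    for _ in range(iterations):
        residual = [sum(A[i][j] * x[j] for j in range(n_cols)) - b[i]
                    for i in range(n_rows)]
        gradient = [sum(A[i][j] * residual[i] for i in range(n_rows))
                    for j in range(n_cols)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * gradient[j]) for j in range(n_cols)]
    return x

# Rows: samples enriched in M-G1, S and G2.
# Columns: assumed fraction of cells in M-G1, S and G2 within each sample.
PHASE_FRACTIONS = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
]

# Observed (mixed) expression of one gene in the three samples.
observed = [7.5, 3.0, 4.5]

# Estimated phase-specific expression with the mixing removed.
corrected = nnls_projected_gradient(PHASE_FRACTIONS, observed)
print([round(value, 3) for value in corrected])
```

In this toy case the observed values were generated from phase-specific levels of 10, 0 and 5, and the solver recovers them to within numerical tolerance. The non-negativity constraint matters when noise would otherwise push an estimate below zero, which has no meaning for an expression level.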
A Circular heatmap of the signature genes identified via a transcriptome deconvolution method. The predicted expression estimated by transcriptome deconvolution was normalized in between phases and genes were clustered based on k-means into four different groups depending on whether they tended to be more highly expressed at one of the three phases studied (groups 1–3) or displayed uniform expression throughout the cell cycle (group 4). The clusters of genes were represented in a circular heatmap with their normalized expression using the circlize R package. Genes are ordered by arCOG category and their arCOG class (information storage and processing, cellular processes and signaling or metabolism) is marked in the graph. In a clockwise order starting at 6 o’clock, genes clustering as peaking at M-G1 are shown first, then those peaking at S and finally those that do at G2. Genes of interest are marked with their annotations next to the heatmap. B Graphical summary of the changes occurring during the cell cycle. The core and housekeeping processes which are maintained throughout the cycle are shown in the center of the graphic, whereas the phase-specific processes are depicted at the periphery. The presumed morphological states at each of the cell cycle phases are also depicted.
Notably, some of the signature genes identified after transcriptome deconvolution did not stand out in the differential gene expression analysis. For instance, the signature gene set of the M-G1 phase includes the β-subunit of the proteasome and the proteasome-activating nucleotidase, a regulatory subunit that drives the conformational changes during the proteasome functional cycle70. It has been shown that in the presence of a proteasomal inhibitor, the ESCRT-III rings cannot be disassembled, resulting in cell division arrest in S. islandicus28 and other Sulfolobales species27. These experimental results are consistent with the proteasomal genes being selected as the M-G1 signature genes. Another example is the S-layer: its two subunits, SlaA and SlaB (SiRe_RS08185 and SiRe_RS08180), show their peak of expression during the M-G1 phase, concomitant with the up-regulation of membrane biogenesis pathways observed during that phase. Thus, the three phase-specific gene sets defined via the transcriptome deconvolution method appear to adequately represent the gene expression patterns along the S. islandicus cell cycle and can be used in subsequent analyses to assess the state of cells under different experimental conditions. Importantly, this analysis indicates that many processes in S. islandicus are coordinated with the cell cycle, being expressed during particular phases, resembling, at least qualitatively, the transcriptional landscape of the cell cycle in most eukaryotes.
The identified signature genes were significantly (p value < 0.01) more conserved compared to the rest of the genes. Namely, 89.9% and 51.6% of the signature genes were conserved across Thermoproteota and three other archaeal phyla, respectively, whereas the non-signature genes displayed lower conservation in the corresponding lineages (78.6% and 44.5%, respectively). The higher conservation of the phase-specific genes suggests that the regulation and the overall structure of the Saccharolobus cell cycle is conserved in archaeal lineages beyond the order Sulfolobales. Finally, we assessed whether the expression of the S. islandicus phase-specific genes also follows a cell cycle dependent transcription pattern in eukaryotes. To this end, we compared the cell cycle phase affiliation of genes that are homologous between S. islandicus and eukaryotes represented in the Cyclebase database71 (see Methods). The budding yeast S. cerevisiae shared the largest number (n = 403) of homologs with S. islandicus (Table 1, Supplementary Data 5). Of the 86 S. islandicus signature genes specific to the M-G1 phase, 67 homologous genes displayed peak expression during M or G1 phase in S. cerevisiae. Of the 57 G2 signature genes of S. islandicus, 37 were also expressed during G2 in the budding yeast. The S-specific genes, the largest category with 260 genes, 28.8% (n = 75) of which encode ribosomal proteins and other translation or protein folding related proteins (tRNA synthetases, tRNA ligases, thermosome subunits, etc.), displayed less congruence in the temporal expression, with only 57 genes displaying peak expression during the S phase in both organisms (Table 1, Supplementary Data 5). Thus, despite certain differences in the timing of expression of certain functions, the overall structuring of the transcriptional landscape follows a defined program in S. islandicus as well as in the budding yeast and possibly other Amorphea7.
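The degree of congruence implied by these counts can be made explicit with a few lines of arithmetic. The Python sketch below uses only the numbers quoted in the paragraph above (the three per-phase counts sum to the 403 shared homologs); the variable and function names are illustrative.

```python
# Per-phase counts quoted in the text:
# (S. islandicus signature genes with a yeast homolog,
#  homologs peaking in the matching phase in S. cerevisiae)
SIGNATURE_COUNTS = {
    "M-G1": (86, 67),
    "S": (260, 57),
    "G2": (57, 37),
}

def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole

# Share of each phase's signature genes whose yeast homolog peaks in the same phase.
congruence = {phase: percent(matching, total)
              for phase, (total, matching) in SIGNATURE_COUNTS.items()}

total_genes = sum(total for total, _ in SIGNATURE_COUNTS.values())
total_matching = sum(matching for _, matching in SIGNATURE_COUNTS.values())
overall = percent(total_matching, total_genes)

for phase, value in congruence.items():
    print(f"{phase}: {value:.1f}% congruent")
print(f"overall: {overall:.1f}% of {total_genes} homologs")
```

Read this way, the M-G1 and G2 sets agree with yeast for roughly three quarters and two thirds of their genes, respectively, whereas the large S-phase set agrees for only about one fifth, which is the asymmetry discussed in the text.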
Parallels to the eukaryotic cell cycle and limitations of the study
Our understanding of gene expression during the cell cycle in all organisms, and particularly in archaea, is rather limited. In most bacteria, gene expression appears to be primarily defined by the movement of the replication fork72,73,74,75. Our data suggest that this is not the case in S. islandicus or other Sulfolobales, where the expression program appears to be more akin to that of eukaryotes. In addition to the previously demonstrated cyclic expression of Sulfolobales genes related to cell division and DNA replication25,28,29,32,76,77, our results indicate that many other key cellular processes in S. islandicus, including defense systems, cell motility and adhesion apparatuses as well as diverse metabolic pathways, are coordinated with the progression of the cell cycle (Fig. 6B). Some of these processes are also expressed cyclically in certain eukaryotic model systems, e.g., the budding yeast1,7,8,78.
Despite the disparity in the duration of the G1 phase in most studied eukaryotes8 and Sulfolobales, the overall ‘logic’ appears to be similar, that is, to prepare for genome replication. Indeed, our data suggests a transcriptional activation of DNA repair pathways and expression of the major replisome components, including the replicative DNA polymerase, MCM helicase and two of the three origin recognition genes (Orc1-1 and WhiP), during the M-G1 phase. Similarly, the expression of the MCM subunits, helicase loader Cdc6 and origin recognition subunits (Orc1-6) in G1 have been demonstrated in opisthokonts, including yeast and vertebrates, such as Xenopus laevis79. In most eukaryotes where cell cycle has been studied, G1 is the pivotal moment when the cell’s fate is decided: replicate, differentiate or die1. Whether a similar checkpoint exists in S. islandicus remains to be investigated. In these eukaryotes, the G1 phase is also associated with high biosynthetic activity, accompanied by the increase in cell size1. Although in S. islandicus, most of the metabolic pathways displayed reduced activity during the M-G1, lipid biosynthesis was significantly enhanced, suggesting cell growth.
DNA replication marks the start of the S phase. In opisthokonts, cyclin-dependent kinases initiate replication through an orchestrated process within different origins1,79. Similarly, in Sulfolobales, the S phase starts with the almost simultaneous firing of two of the three origins of replication triggered by Orc1-1 and WhiP80,81, even though their transcripts are produced during the M-G1 phase (see above), consistent with the previous observations76. In opisthokonts, the DNA synthesis during the S phase is coordinated with the production of histones, facilitating the assembly of the newly replicated DNA into chromatin82. Remarkably, our data suggests that DNA replication and chromatinization are also coupled in S. islandicus. We found upregulation of the dominant chromatin-associated proteins, including Cren7, Sul7d and Lrs14, during the S phase45. However, unlike in yeast and vertebrates, where apart from histones, protein synthesis appears to be generally low1, we observed an upregulation of diverse translation-related genes, including those encoding the non-universally conserved core ribosomal proteins, tRNA synthetases, tRNAs and RNase P. Consistently, many genes encoding ribosomal proteins and tRNA synthetases were found as signature genes of the S phase after transcriptome deconvolution. Consistent with the low translation during the S phase in eukaryotes, only 10 of the 75 translation related genes specific to the S phase in S. islandicus showed peak expression at the same phase in S. cerevisiae. Instead, most of the S. cerevisiae homologs showed peak expression during the M or G2 phases (Supplementary Data 5). These differences may result from ultrastructural differences between eukaryotic and archaeal cells. In particular, presence of a nucleus in eukaryotes effectively uncouples transcription and translation, whereas in archaea, the two processes appear to be coupled83,84. Another notable difference between the cell cycles of S. islandicus and opisthokonts concerns the duration of the G1 and G2 phases, with the G1 in eukaryotes being one of the longest and G2 one of the shortest phases8, and the opposite being true for Sulfolobales. Nevertheless, despite these differences, the general logic of the coordination of metabolism, DNA repair, and lipid biosynthesis during one of the two G phases appears to be shared.
In a seminal study, Takemata and colleagues have shown that chromosome organization into A and B compartments in Sulfolobales is coupled with the chromosomal distribution of the SMC superfamily protein coalescin, which is largely defined by the transcriptional state of particular loci36. In non-synchronized S. islandicus populations, the archaellum operon, a heterodisulphide reductase gene cluster and fatty acid metabolism genes displayed high coalescin occupancy, consistent with their low transcriptional activity36. We observed that the three operons display cell cycle-dependent differential expression, with the fatty acid metabolism genes being active during the S phase, archaellum-related genes during M-G1 and S, and the heterodisulphide reductase gene cluster during M-G1. The expression of these genes is inversely related to the highest transcription of the coalescin gene during the G2 phase. These results suggest that non-synchronized cultures in their transcriptional profile, coalescin occupancy and hence chromosome organization are likely to resemble the G2 phase, consistent with the reported DNA content profiles36. Further research on chromosome organization during the cell cycle progression is likely to provide further insights into the functioning of Sulfolobales cells.
One of the limitations of our study is that changes in transcription do not necessarily correlate with protein expression and do not provide information on posttranslational protein modifications, which might be particularly relevant for cell cycle progression. Indeed, post-translational modifications, such as acetylation, phosphorylation and ubiquitylation, are critical to many complex regulatory networks that govern the cell cycle progression in eukaryotic cells85. The proteome of S. islandicus has been shown to undergo extensive lysine methylation and N-terminal acetylation, with the methylation levels increasing with the progression of the growth phases86. Notably, many chromatin proteins displayed differential methylation during different growth phases in non-synchronized populations. It will be interesting to see whether the chromatin methylation patterns change with the progression of the cell cycle. While ubiquitylation is absent in Sulfolobales, alternative mechanisms, e.g., SAMPylation, appear to play similar roles in protein turnover87. Thus, studies on changes in posttranslational modifications throughout the S. islandicus cell cycle appear as a promising future research direction.
Collectively, our results illuminate the complexity of the transcriptional landscape in an archaeal model system. Notably, the overall program of the cell cycle appears to be conserved throughout the class Thermoproteia30, suggesting that our findings could be extrapolated to other members of this archaeal lineage. In this context, the signature genes characteristic of different cell cycle phases could be particularly useful for future comparative studies.
Methods
Strains and growth conditions
Saccharolobus islandicus strain REY15A was grown aerobically at 76 °C with shaking in 25 ml of MTSV medium containing mineral salts (M), 0.2% (wt/vol) tryptone (T), 0.2% (wt/vol) sucrose (S) and a mixed vitamin solution (V); the pH was adjusted to 3.5 with sulfuric acid, as described previously88.
Synchronization of Saccharolobus islandicus
Cells were synchronized using acetic acid (final concentration: 6 mM) as previously described89. Briefly, cells were cultured in 25 ml of MTSV medium until OD600 reached 0.2, then cultures were synchronized by addition of acetic acid to arrest the cells at the end of the G2 phase. Following the incubation for 6 h, the cells were pelleted at 2900 × g for 15 minutes, washed with 0.7% sucrose to remove the acetic acid and resuspended in warm acetic acid-free medium. Once synchronized, the cells were grown in the MTSV medium as described above and the progression of the cell cycle was followed by flow cytometry. Briefly, cells were fixed with 70% ethanol at 4 °C and washed once with PBS. Fixed cells were stained with 40 μg/ml propidium iodide (PI; Invitrogen) in staining buffer (100 mM Tris pH 7.4, 0.5 mM NaCl, 1 mM CaCl2, 0.5 mM MgCl2, 0.1% Nonidet P-40) and their DNA content was analyzed using a CytoFLEX flow cytometer (Beckman Coulter). Fluorescence was measured through the ECD channel with a manual height threshold in that channel at 5993 points, to filter non-fluorescent background, and default gain parameters.
RNA extraction and sequencing
Cells were plated on solid media and 15 single colonies were inoculated in liquid media, forming 15 cultures that represent completely independent biological replicates. The 15 biological replicates were grown in 3 different batches (5 cultures per batch); each batch was grown and synchronized at a different time. RNA was then isolated from each batch on different days; no variance between batches was found using principal component analysis (see Supplementary Data 1 for batch number for each sample). The total RNA was extracted from the 15 biological replicates at three different time-points after synchronization: 2 h 30 min, 4 h and 6 h (Fig. 1B). The time points were selected based on the results of the flow cytometry analysis which showed that samples at these three time points are enriched in cells in the M-G1, S and G2 phases, respectively (Fig. 1C). Total RNA was extracted using TRI Reagent (Sigma-Aldrich), following the manufacturer’s protocol, and treated with DNase (TURBO DNA-free kit; Invitrogen) following the manufacturer’s instructions. DNase treated samples were further purified with RNeasy Mini Kit (Qiagen). RNA samples were quantified using a Qubit Fluorometer (Thermo Fisher Scientific) and assessed for quality with a BioAnalyzer (Agilent) (see Supplementary Data 1 for quality control parameters for each sample). Libraries were prepared using the Illumina® Stranded Total RNA Prep, Ligation with Illumina® Ribo-Zero Plus kit, with custom ribodepletion. Following PCR amplification, all samples underwent two rounds of purification with AMPure beads (Beckman Coulter) to remove small fragments. Libraries were subsequently validated using both the Qubit Fluorometer and the Fragment Analyzer (Agilent). Sequencing was performed on an Illumina NextSeq 2000 sequencer with a P3 50-cycle kit and a target of 30–40 million reads per sample. The reads obtained were mapped to the reference S. islandicus REY15A genome (RefSeq accession number: NC_017276 [https://www.ncbi.nlm.nih.gov/nuccore/NC_017276.1/]) with Bowtie-290 using default parameters.
Differential gene expression analysis
The number of reads per gene was counted with featureCounts91 with default parameters. Data was then processed using R and the edgeR library92. Read counts per gene were standardized to counts per million and filtered to eliminate all of the genes which did not have at least one read per million in all 15 samples. Finally, reads were normalized using the TMM method, which assumes that the majority of genes are not differentially expressed93. The TMM method takes into account the sampling properties of the RNA-seq data and corrects for biases caused by different library size or differences in the expression properties of the whole sample, such as the presence of two chromosomes versus one chromosome (for more information see ref. 93). Logarithmic fold change in base two (log2FC) was calculated by subtracting the logarithmic averages of two groups, e.g., the log2FC of M-G1 vs S is calculated by subtracting the logarithmic average of S from the logarithmic average of M-G1. The data was fitted to a linear model to calculate statistical significance using the limma package and p values were adjusted using the Benjamini-Hochberg method to correct for multiple testing. Genes were considered up- or down-regulated if the absolute log2FC was at least 0.5 (FC of ~1.4) and the adjusted p value was <0.01; genes with an absolute log2FC of at least 1 (FC of 2) were considered to be strongly up- or down-regulated.
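The multiple-testing correction and the calling thresholds described above can be sketched in a few lines of Python. This is an illustration only: the study used limma in R, whose moderated statistics are not reproduced here, and the function names are hypothetical. The sketch also assumes that the "logarithmic average" is the mean of log2-scale expression values, which the text does not state explicitly.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

def log2_fold_change(log2_group_a, log2_group_b):
    """Mean log2 expression of group A minus mean log2 expression of group B."""
    mean_a = sum(log2_group_a) / len(log2_group_a)
    mean_b = sum(log2_group_b) / len(log2_group_b)
    return mean_a - mean_b

def classify(log2fc, adjusted_p, alpha=0.01):
    """Apply the thresholds from the text: |log2FC| >= 0.5, or >= 1 for 'strongly'."""
    if adjusted_p >= alpha or abs(log2fc) < 0.5:
        return "unchanged"
    direction = "up" if log2fc > 0 else "down"
    return ("strongly " if abs(log2fc) >= 1 else "") + direction

# Toy example: one gene measured (log2 scale) in two phases.
lfc = log2_fold_change([8.1, 8.3, 8.2], [6.9, 7.1, 7.0])
print(round(lfc, 2), classify(lfc, adjusted_p=0.002))
```

A log2FC of 0.5 corresponds to a fold change of 2^0.5, about 1.41, which is where the "FC of ~1.4" in the text comes from.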
KEGG enrichment
Information on the different metabolic pathways was extracted from the Kyoto Encyclopedia of Genes and Genomes (KEGG)94. KEGG enrichment p values were calculated by taking into account the adjusted p values of the genes in each pathway and comparing them with the genes outside the pathway by applying a two-sided Wilcoxon test. Significance threshold was set at 0.05. Gene ratio was calculated as the number of genes for which RNA sequencing data was available relative to the number of genes annotated in each pathway. Density plots were generated using the ggplot2 package in R with the transcriptomic data of the genes in each pathway.
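As a rough illustration of this enrichment test, the Python sketch below compares the adjusted p values of genes inside a pathway with those outside it. It assumes that the two-sided Wilcoxon test is the two-sample rank-sum (Mann-Whitney) form and uses a normal approximation with tie correction and no continuity correction, so its p values may differ slightly from those of the R implementation used in the study. Function names and input values are hypothetical.

```python
import math

def rank_sum_p(in_pathway, outside):
    """Two-sided Wilcoxon rank-sum p value (normal approximation, tie-corrected)."""
    n1, n2 = len(in_pathway), len(outside)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must contain at least one value")
    combined = sorted([(v, 0) for v in in_pathway] + [(v, 1) for v in outside])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = rank_sum_1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0  # all values identical: no evidence of a shift
    z = (u1 - mean_u) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))

def gene_ratio(genes_with_data, genes_annotated):
    """Genes with RNA-seq data relative to genes annotated in the pathway."""
    return genes_with_data / genes_annotated

# Hypothetical adjusted p values from the differential expression analysis.
pathway_genes = [0.001, 0.002, 0.003, 0.004, 0.005]
other_genes = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
p = rank_sum_p(pathway_genes, other_genes)
print(f"enrichment p = {p:.4f}, enriched: {p < 0.05}")
```

The test asks whether genes in the pathway tend to have systematically smaller (or larger) adjusted p values than the rest of the genome, rather than counting how many pass a fixed cutoff.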
Gene annotation
Genes were annotated and classified using the archaeal Clusters of Orthologous Genes (arCOG) framework37, where each gene is assigned a code indexed to a specific category according to its orthologs in other archaea. Essentiality information was extracted from the previous study on the closely related S. islandicus strain M.16.438. Information on compartmentalization of the genome was extracted from data obtained previously36. Statistical significance in the distribution of essential genes and compartments in the core network was calculated by performing a Fisher’s exact test comparing the distribution of two groups. For proteins of interest, the arCOG annotations were supplemented with the results from profile-profile comparisons using HHpred.
Codon usage
Codon usage for each gene was calculated using the coRdon library in R (Elek A, Kuzman M, Vlahovicek K (2023). coRdon: Codon Usage Analysis and Prediction of Gene Expressivity. doi:10.18129/B9.bioc.coRdon, R package v1.20.0, https://bioconductor.org/packages/coRdon) using the coding sequences for REY15A extracted from GenBank (accession number: NC_017276 [https://www.ncbi.nlm.nih.gov/nuccore/NC_017276.1/]). Average codon usage for the full genome was calculated using the Sequence Manipulation Suite95 using the bacterial (11) genetic code. Information on codon usage is provided in Supplementary Data 3.
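For readers unfamiliar with the calculation, per-gene codon usage amounts to counting in-frame triplets of a coding sequence. The Python sketch below shows only this counting step; it does not reproduce the coRdon package, and the function names and example sequence are hypothetical.

```python
from collections import Counter

def codon_counts(cds):
    """Count in-frame codons of a coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

def codon_frequencies(cds):
    """Relative frequency of each codon within one coding sequence."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

# Hypothetical toy sequence: ATG AAA AAA TAA
print(codon_frequencies("ATGAAAAAATAA"))
```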
Data visualization
Data was represented in the form of volcano plots, violin plots, box plots or dot plots using the ggplot2 package in R. Ggplot2 was used to generate and plot the regression model of the expression dependent on chromosome position. The method used was a generalized additive model.
Gene co-expression networks
Read count matrices quantifying the gene expression in M-G1, S and G2 were normalized using the VST transformation from the DESeq2 R package96. The normalized matrices were subsampled to maximize the number of gene co-expression network (GCN) inferences and to produce multiple GCNs for each phase of the cell cycle; specifically, from 15 transcriptomes, 5 transcriptomes can be randomly sampled to generate 20 replicates with no more than 2 shared transcriptomes between any pair of replicates. For each cell cycle phase, this protocol returned 20 subsamples, from which 20 GCNs were built. In each GCN, nodes correspond to sequences assigned to genes, connected by edges. To build these GCNs, we used the normalized count matrix as input for the WGCNA package97 and its cor function to compute the Pearson correlation coefficients (PCC). A high PCC threshold for edge inclusion in a GCN was set at 0.8 as a first step to avoid spurious correlations: the edges in each final filtered GCN were thus weighted by either strong positive (PCC > 0.8) or negative (PCC < -0.8) PCC values. Then, for each phase of the cell cycle, consensus GCNs were constructed using a majority rule that only retained high-correlation edges present in more than 80% of the replicate GCNs (see Supplementary Note 7 for a more detailed description of the method used for the construction of the GCNs). This strategy identified strong gene co-expression relationships that are very commonly observed in a cell cycle phase and robust to sampling effects, because they are observed almost irrespective of which transcriptome samples were used to describe each cell cycle phase.
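The thresholding and majority-rule steps can be sketched as follows. This Python example is a simplified stand-in for the WGCNA-based workflow: it computes Pearson correlations directly, keeps edges with |PCC| > 0.8, and retains edges found in more than 80% of replicate networks. All function names and expression values are hypothetical, and genes with constant expression are assumed to have been removed beforehand.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def strong_edges(expr, threshold=0.8):
    """Edges (gene1, gene2, sign) with |PCC| above the threshold in one replicate.

    expr maps each gene to its expression values across the replicate's samples.
    """
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) > threshold:
            edges.add((g1, g2, "+" if r > 0 else "-"))
    return edges

def consensus_edges(replicate_edge_sets, min_fraction=0.8):
    """Keep edges present in more than min_fraction of the replicate networks."""
    counts = Counter(edge for edges in replicate_edge_sets for edge in edges)
    n = len(replicate_edge_sets)
    return {edge for edge, k in counts.items() if k / n > min_fraction}

# Hypothetical replicate with three genes measured in five transcriptomes.
expr = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [5, 1, 4, 2, 3]}
print(strong_edges(expr))
```

With 20 replicate networks per phase, the rule above keeps an edge only if it appears in at least 17 of them.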
Classification of genes by conservation level
To estimate the conservation of S. islandicus genes among archaeal groups, a reference archaeal protein sequence dataset was assembled from 292 complete representative genome assemblies out of all archaeal genomic assemblies recorded in the RefSeq database98 and all proteins of S. islandicus REY15A. Diamond99 was used to perform an all-against-all comparison (with parameters -e 1e-5 -k 1000) using this reference protein sequence dataset. Gene families were computed from the results of the Diamond search using a sequence similarity network, built by filtering results (using standard thresholding parameters: minimum % identity = 30%; minimum mutual sequence coverage = 80%), using the cleanblast and familydetector scripts from the MultiTwin tool100. The resulting gene families were mapped on a reference archaeal phylogeny101 using the ete3 Python package, and further categorized by conservation level, based on the phylogenetic distribution of their members. Differences in conservation levels between genes in the core network and non-core genes were tested by performing a Chi-square test comparing the conservation distribution of the non-core genome with that of the core genes.
Identification of phase-specific signature genes
To estimate the signature genes for each phase, we applied a transcriptome deconvolution method102. We assumed that each sample consisted of three subpopulations of cells: (i) M-G1, (ii) S and (iii) G2. Each subpopulation has a different proportion of cells: \({p}_{M-G1}+{p}_{S}+{p}_{G2}=100\%\) (\({p}_{M-G1},{p}_{S},{p}_{G2}\) = percentage of cells in the M-G1, S and G2 subpopulations, respectively). Moreover, we assumed that each gene has a constant transcription level (number of reads) in all cells from one subpopulation (\({t}_{M-G1},{t}_{S},{t}_{G2}\)= transcription levels of a gene in the subpopulations M-G1, S and G2). With these assumptions, the total transcription level of a gene (\(T\)) can be presented as a sum of the transcription levels from three subpopulations of cells: \(T={p}_{M-G1} \,*\, {t}_{M-G1}+{p}_{S} \,*\, {t}_{S}+{p}_{G2} \,*\, {t}_{G2}\). The total transcription level or number of reads of a gene (\(T\)) was calculated with featureCounts (see above). The percentage of cells in different subpopulations (\({p}_{M-G1},{p}_{S},{p}_{G2}\)) was obtained from the flow cytometry data (see above) of three representative biological replicates (Supplementary Fig. S1A; Supplementary Data 4). From this data, for each gene we have nine linear equations (\(T={p}_{M-G1} \,*\, {t}_{M-G1}+{p}_{S} \,*\, {t}_{S}+{p}_{G2} \,*\, {t}_{G2}\)) with three unknown variables (\({t}_{M-G1},{t}_{S},{t}_{G2}\)). The unknown variables were estimated using the non-negative least squares method in Python (nnls function, SciPy version 1.11.4). As a result, for each gene we estimate \({t}_{M-G1},{t}_{S},{t}_{G2}\), the transcription levels or number of reads in each phase. Once the reads were estimated, we excluded all the genes whose expression was estimated to be lower than 3 counts per million reads in all phases.
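The per-gene deconvolution can be illustrated with a small, dependency-free Python sketch. The study used the nnls function from SciPy; the version below instead solves the same non-negative least squares problem by brute force, trying every subset of unknowns allowed to be non-zero, which is only practical for a handful of unknowns such as the three phases here. The function names, the subpopulation fractions and the read counts are hypothetical.

```python
from itertools import combinations

def solve_linear(M, v):
    """Solve M x = v by Gauss-Jordan elimination; return None if M is singular."""
    n = len(v)
    aug = [row[:] + [val] for row, val in zip(M, v)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def nnls_small(P, T):
    """Minimise ||P t - T|| subject to t >= 0 by enumerating support sets."""
    k = len(P[0])
    best, best_rss = [0.0] * k, sum(value * value for value in T)
    for size in range(1, k + 1):
        for support in combinations(range(k), size):
            # Normal equations restricted to the columns in this support set.
            M = [[sum(row[i] * row[j] for row in P) for j in support] for i in support]
            v = [sum(row[i] * value for row, value in zip(P, T)) for i in support]
            solution = solve_linear(M, v)
            if solution is None or min(solution) < 0:
                continue
            t = [0.0] * k
            for index, value in zip(support, solution):
                t[index] = value
            rss = sum(
                (sum(p * ti for p, ti in zip(row, t)) - value) ** 2
                for row, value in zip(P, T)
            )
            if rss < best_rss:
                best, best_rss = t, rss
    return best

# Hypothetical fractions of cells in M-G1, S and G2 for four samples (rows sum to 1).
P = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]
# Hypothetical total read counts of one gene in the same four samples.
T = [74.5, 33.0, 17.5, 66.5]
print(nnls_small(P, T))
```

With SciPy available, `scipy.optimize.nnls(P, T)` returns the solution vector together with the residual norm and is the more appropriate choice for real data.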
The remaining genes were clustered based on K-means into four different groups depending on whether they were more expressed at one of the three studied phases (groups 1-3) or displayed a similar expression throughout the cell cycle (group 4). A second round of clustering was performed excluding all poorly annotated genes (arCOG categories R and S).
Functional enrichment analysis of phase-specific signature genes was performed using the clusterProfiler package in R. The function enricher was used to test the enrichment and its significance of the different arCOG categories in the phase specific genes (M-G1, S and G2) compared to the full genome. The Benjamini-Hochberg method was used for multiple testing correction. An adjusted p value of less than 0.05 was considered significant.
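For reference, the Benjamini-Hochberg adjustment applied here (and in the differential expression analysis) can be written in a few lines. The Python sketch below is a generic implementation of the procedure rather than the clusterProfiler or limma code; the function name and p values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity of the result.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical raw p values for four arCOG categories.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```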
Phylogenetic distribution of S. islandicus phase-specific genes
To determine whether S. islandicus phase-specific genes have homologs within bacterial and/or eukaryotic genomes, a reference protein sequence dataset was built using all the reference proteomes from Uniprot103 for Bacteria (9285 proteomes) and Eukaryota (2625 proteomes). Diamond99 was used to perform a search (with parameters -e 1e-5 -k 10000) for homologs of 1530 proteins encoded by S. islandicus phase-specific genes against this reference protein sequence dataset. Depending on the phylogenetic diversity in homolog gene sets, S. islandicus phase-specific genes were classified as Prokaryotic if shared with bacteria only, Archaeoeukaryotic if shared with eukaryotes only or Mixed if shared with both groups.
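The classification rule can be expressed as a short function. In this Python sketch the function name and the input format (a collection of the domains in which homologs were found) are hypothetical, and the label returned for genes without any bacterial or eukaryotic hit is an assumption, since the text does not name that case.

```python
def classify_by_homologs(hit_domains):
    """Classify a gene from the domains ('Bacteria', 'Eukaryota') of its homologs."""
    domains = set(hit_domains)
    in_bacteria = "Bacteria" in domains
    in_eukaryotes = "Eukaryota" in domains
    if in_bacteria and in_eukaryotes:
        return "Mixed"
    if in_bacteria:
        return "Prokaryotic"
    if in_eukaryotes:
        return "Archaeoeukaryotic"
    # Assumption: genes without hits in either domain get a separate label.
    return "No bacterial or eukaryotic homolog"

# Hypothetical hit lists for three genes.
for hits in (["Bacteria", "Bacteria"], ["Eukaryota"], ["Bacteria", "Eukaryota"]):
    print(classify_by_homologs(hits))
```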
Phase-specific gene expression in S. islandicus and eukaryotes
To compare the cell cycle-dependent gene expression patterns in archaea and eukaryotes, eukaryotic homologs of S. islandicus phase-specific genes were identified in the CycleBase database71, which records the expression profile of periodically expressed genes during the eukaryotic cell cycle for four eukaryotic species (S. cerevisiae, S. pombe, H. sapiens and A. thaliana). Only the transcriptomic data of the CycleBase dataset was used for comparison. A blastp search was performed (with parameters -evalue 1e-5 -word_size 5) using the proteins encoded by S. islandicus phase-specific genes as query against a eukaryotic protein sequence dataset combining the proteins from Uniprot reference proteomes for S. cerevisiae (UP000002311), S. pombe (UP000002485), H. sapiens (UP000005640) and A. thaliana (UP000006548). Since hits were mostly found within S. cerevisiae proteins, expression peak times were then compared between S. islandicus and S. cerevisiae. For each S. islandicus phase-specific signature gene, its expression was considered similar to the S. cerevisiae cell cycle transcriptomic data when at least one of the three closest periodically expressed S. cerevisiae homologs (best blast hits) was maximally expressed at a corresponding phase, according to either relaxed (correspondences: archaeal M-G1 with eukaryotic G1, G1/S, G2/M and M phases; archaeal S with eukaryotic G1/S and S phases; archaeal G2 with eukaryotic G2 and G2/M phases) or strict (correspondences: archaeal M-G1 with eukaryotic G1 and M phases; archaeal S with eukaryotic S phase; archaeal G2 with eukaryotic G2 phase) phase correspondence rules.
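The relaxed and strict correspondence rules translate directly into lookup tables. The Python sketch below encodes them and checks whether any of the three best-ranked yeast homologs peaks in a corresponding phase; the function name and the input format (peak phases ordered from best to worst blast hit) are hypothetical.

```python
# Phase correspondences as stated in the text: archaeal phase -> eukaryotic phases.
RELAXED = {
    "M-G1": {"G1", "G1/S", "G2/M", "M"},
    "S": {"G1/S", "S"},
    "G2": {"G2", "G2/M"},
}
STRICT = {
    "M-G1": {"G1", "M"},
    "S": {"S"},
    "G2": {"G2"},
}

def peaks_match(archaeal_phase, yeast_peak_phases, rules, max_homologs=3):
    """True if any of the closest yeast homologs peaks in a corresponding phase.

    yeast_peak_phases lists the peak phase of each periodically expressed
    homolog, ordered from best to worst blast hit (hypothetical input format).
    """
    allowed = rules[archaeal_phase]
    return any(phase in allowed for phase in yeast_peak_phases[:max_homologs])

# Hypothetical S-phase signature gene whose best yeast homologs peak in G1/S and M.
print(peaks_match("S", ["G1/S", "M"], RELAXED), peaks_match("S", ["G1/S", "M"], STRICT))
```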
Two step RT-qPCR
After RNA extraction and purification (see above), first-strand cDNAs were synthesized from the total RNAs using the LunaScript® RT SuperMix Kit (New England Biolabs) following the manufacturer’s instructions. Briefly, 900 ng of RNA were mixed with the SuperMix and the reactions were incubated in a thermocycler at 25 °C for 2 minutes, followed by 10 minutes at 55 °C and finally 1 minute at 95 °C to heat inactivate the enzyme. One microlitre of the product was used as template in qPCR to evaluate the mRNA levels of the targeted genes. qPCR was performed using Luna® Universal qPCR Master Mix (New England Biolabs) and gene-specific primers (Supplementary Table S1). The reaction was performed following the manufacturer’s instructions, with denaturation at 95 °C for 2 min followed by 35 cycles of 95 °C for 15 seconds and 60 °C for 30 seconds. The qPCR was performed in a CFX Opus 96 Real-Time PCR system (BioRad) and the normalized expression was calculated using the software provided by the manufacturer with TBP as the reference gene (SiRe_RS05760). Briefly, expression was calculated using the ΔΔCq normalization or comparative Ct method with the reference gene TBP104. This method calculates the relative quantity of the gene of interest (ΔCq) as: \({{\rm{Relative\; Quantity}}}={2}^{({{\rm{CqMin}}}-{{\rm{Cq\; Sample}}})}\) (where CqMin= the average Cq for the sample with the lowest average Cq for the gene of interest; and CqSample= the average Cq for the sample). The relative quantity (RQ) is used to calculate the normalized expression (ΔΔCq) with the following formula: \({{\rm{Normalized\; Expression}}}=\frac{{{\rm{RQsample}}}}{{{\rm{RQsample\; Ref}}}}\).
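The two formulas above can be checked with a short worked example. The Python sketch below implements them as written, using base 2 (that is, assuming the amplification efficiency implied by the formula); the function names and Cq values are hypothetical and the instrument software may apply additional corrections.

```python
def relative_quantity(cq_sample, cq_min):
    """Relative Quantity = 2 ** (CqMin - CqSample)."""
    return 2 ** (cq_min - cq_sample)

def normalized_expression(target_cq, reference_cq):
    """Normalized expression per sample: RQ of the target / RQ of the reference.

    Both arguments map sample names to average Cq values.
    """
    target_min = min(target_cq.values())
    reference_min = min(reference_cq.values())
    return {
        sample: relative_quantity(target_cq[sample], target_min)
        / relative_quantity(reference_cq[sample], reference_min)
        for sample in target_cq
    }

# Hypothetical average Cq values for a gene of interest and the TBP reference.
target = {"M-G1": 20.0, "S": 22.0, "G2": 21.0}
reference = {"M-G1": 18.0, "S": 18.0, "G2": 19.0}
print(normalized_expression(target, reference))
```

In this toy example the gene of interest is four-fold lower in S than in M-G1, while the apparent difference in G2 disappears once the reference gene is taken into account.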
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The raw reads generated in this study were deposited in the European Nucleotide Archive under the accession number PRJEB75364 and in the Gene Expression Omnibus (GEO) repository under the accession number GSE296035.
References
Wang, Z. Cell cycle progression and synchronization: an overview. Methods Mol. Biol. 2579, 3–23 (2022).
Reyes-Lamothe, R. & Sherratt, D. J. The bacterial cell cycle, chromosome inheritance and cell growth. Nat. Rev. Microbiol 17, 467–478 (2019).
Cezanne, A., Foo, S., Kuo, Y. W. & Baum, B. The archaeal cell cycle. Annu Rev. Cell Dev. Biol. https://doi.org/10.1146/annurev-cellbio-111822-120242 (2024).
McQuillen, R. & Xiao, J. Insights into the structure, function, and dynamics of the bacterial cytokinetic FtsZ-Ring. Annu Rev. Biophys. 49, 309–341 (2020).
Hurley, J. H. ESCRTs are everywhere. EMBO J. 34, 2398–2407 (2015).
Vietri, M., Radulovic, M. & Stenmark, H. The many functions of ESCRTs. Nat. Rev. Mol. Cell Biol. 21, 25–42 (2020).
Harashima, H., Dissmeyer, N. & Schnittger, A. Cell cycle control across the eukaryotic kingdom. Trends Cell Biol. 23, 345–356 (2013).
Cooper, G. M. The eukaryotic cell cycle in The Cell: A Molecular Approach (Sinauer Associates, 2000).
Dang, F., Nie, L. & Wei, W. Ubiquitin signaling in cell cycle control and tumorigenesis. Cell Death Differ. 28, 427–438 (2021).
Milletti, G., Colicchia, V. & Cecconi, F. Cyclers’ kinases in cell division: from molecules to cancer therapy. Cell Death Differ. 30, 2035–2052 (2023).
Yang, L., Besschetnova, T. Y., Brooks, C. R., Shah, J. V. & Bonventre, J. V. Epithelial cell cycle arrest in G2/M mediates kidney fibrosis after injury. Nat. Med 16, 535–543 (2010).
Newton, A. & Ohta, N. Cell cycle regulation in bacteria. Curr. Opin. Cell Biol. 4, 180–185 (1992).
Skarstad, K., Steen, H. B. & Boye, E. Escherichia coli DNA distributions measured by flow cytometry and compared with theoretical computer simulations. J. Bacteriol. 163, 661–668 (1985).
Marczynski, G. T., Dingwall, A. & Shapiro, L. Plasmid and chromosomal DNA replication and partitioning during the Caulobacter crescentus cell cycle. J. Mol. Biol. 212, 709–722 (1990).
Si, F. et al. Invariance of initiation mass and predictability of cell size in Escherichia coli. Curr. Biol. 27, 1278–1287 (2017).
Beaufay, F., Coppine, J. & Hallez, R. When the metabolism meets the cell cycle in bacteria. Curr. Opin. Microbiol 60, 104–113 (2021).
Eme, L., Spang, A., Lombard, J., Stairs, C. W. & Ettema, T. J. G. Archaea and the origin of eukaryotes. Nat. Rev. Microbiol 15, 711–723 (2017).
Spang, A. et al. Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature 521, 173–179 (2015).
Lindas, A. C. & Bernander, R. The cell cycle of archaea. Nat. Rev. Microbiol 11, 627–638 (2013).
Olsen, G. J. & Woese, C. R. Archaeal genomics: an overview. Cell 89, 991–994 (1997).
Greci, M. D. & Bell, S. D. Archaeal DNA replication. Annu Rev. Microbiol 74, 65–80 (2020).
Liao, Y., Ithurbide, S., Evenhuis, C., Lowe, J. & Duggin, I. G. Cell division in the archaeon Haloferax volcanii relies on two FtsZ proteins with distinct functions in division ring assembly and constriction. Nat. Microbiol 6, 594–605 (2021).
Zhao, S. et al. Widespread photosynthesis reaction centre barrel proteins are necessary for haloarchaeal cell division. Nat. Microbiol 9, 712–726 (2024).
Nußbaum, P. et al. Proteins containing photosynthetic reaction centre domains modulate FtsZ-based archaeal cell division. Nat. Microbiol 9, 698–711 (2024).
Samson, R. Y., Obita, T., Freund, S. M., Williams, R. L. & Bell, S. D. A role for the ESCRT system in cell division in archaea. Science 322, 1710–1713 (2008).
Makarova, K. S. et al. Diversity, origin, and evolution of the ESCRT systems. mBio 15, e0033524 (2024).
Tarrason Risa, G. et al. The proteasome controls ESCRT-III-mediated cell division in an archaeon. Science https://doi.org/10.1126/science.aaz2532 (2020).
Liu, J. et al. A relay race of ESCRT-III paralogs drives cell division in a hyperthermophilic archaeon. mBio 16, e0099124 (2025).
Lundgren, M. & Bernander, R. Genome-wide transcription map of an archaeal cell cycle. Proc. Natl. Acad. Sci. USA 104, 2939–2944 (2007).
Lundgren, M., Malandrin, L., Eriksson, S., Huber, H. & Bernander, R. Cell cycle characteristics of crenarchaeota: unity among diversity. J. Bacteriol. 190, 5362–5367 (2008).
Hjort, K. & Bernander, R. Changes in cell size and DNA content in Sulfolobus cultures during dilution and temperature shift experiments. J. Bacteriol. 181, 5669–5675 (1999).
Yang, Y. et al. A novel RHH family transcription factor aCcr1 and its viral homologs dictate cell cycle progression in archaea. Nucleic Acids Res 51, 1707–1723 (2023).
Hurtig, F. et al. The patterned assembly and stepwise Vps4-mediated disassembly of composite ESCRT-III polymers drives archaeal cell division. Sci. Adv. 9, eade5224 (2023).
Yen, C. Y. et al. Chromosome segregation in Archaea: SegA- and SegB-DNA complex structures provide insights into segrosome assembly. Nucleic Acids Res 49, 13150–13164 (2021).
Badel, C., Samson, R. Y. & Bell, S. D. Chromosome organization affects genome evolution in Sulfolobus archaea. Nat. Microbiol 7, 820–830 (2022).
Takemata, N., Samson, R. Y. & Bell, S. D. Physical and functional compartmentalization of archaeal chromosomes. Cell 179, 165–179.e118 (2019).
Makarova, K. S., Wolf, Y. I. & Koonin, E. V. Archaeal clusters of orthologous genes (arCOGs): an update and application for analysis of shared features between thermococcales, methanococcales, and methanobacteriales. Life (Basel) 5, 818–840 (2015).
Zhang, C., Phillips, A. P. R., Wipfler, R. L., Olsen, G. J. & Whitaker, R. J. The essential genome of the crenarchaeal model Sulfolobus islandicus. Nat. Commun. 9, 4908 (2018).
Wang, F. et al. Spindle-shaped archaeal viruses evolved from rod-shaped ancestors to package a larger genome. Cell 185, 1297–1307.e11 (2022).
Jain, S., Caforio, A. & Driessen, A. J. Biosynthesis of archaeal membrane ether lipids. Front Microbiol 5, 641 (2014).
Guan, Z. et al. Gene deletions leading to a reduction in the number of cyclopentane rings in Sulfolobus acidocaldarius tetraether lipids. FEMS Microbiol. Lett. 365, fnx250 (2018).
Zeng, Z. et al. Identification of a protein responsible for the synthesis of archaeal membrane-spanning GDGT lipids. Nat. Commun. 13, 1545 (2022).
Zeng, Z., Liu, X. L., Wei, J. H., Summons, R. E. & Welander, P. V. Calditol-linked membrane lipids are required for acid tolerance in Sulfolobus acidocaldarius. Proc. Natl. Acad. Sci. USA 115, 12932–12937 (2018).
Hemmi, H., Shibuya, K., Takahashi, Y., Nakayama, T. & Nishino, T. (S)-2,3-Di-O-geranylgeranylglyceryl phosphate synthase from the thermoacidophilic archaeon Sulfolobus solfataricus. Molecular cloning and characterization of a membrane-intrinsic prenyltransferase involved in the biosynthesis of archaeal ether-linked membrane lipids. J. Biol. Chem. 279, 50197–50203 (2004).
De Kock, V., Peeters, E. & Baes, R. The Lrs14 family of DNA-binding proteins as nucleoid-associated proteins in the Crenarchaeal order Sulfolobales. Mol. Microbiol. 123, 132–142 (2025).
Berg, I. A., Kockelkorn, D., Buckel, W. & Fuchs, G. A 3-hydroxypropionate/4-hydroxybutyrate autotrophic carbon dioxide assimilation pathway in Archaea. Science 318, 1782–1786 (2007).
Wang, K. et al. A TetR-family transcription factor regulates fatty acid metabolism in the archaeal model organism Sulfolobus acidocaldarius. Nat. Commun. 10, 1542 (2019).
Schocke, L., Brasen, C. & Siebers, B. Thermoacidophilic Sulfolobus species as source for extremozymes and as novel archaeal platform organisms. Curr. Opin. Biotechnol. 59, 71–77 (2019).
Kalliomaa-Sanford, A. K. et al. Chromosome segregation in Archaea mediated by a hybrid DNA partition machine. Proc. Natl. Acad. Sci. USA 109, 3754–3759 (2012).
Chaudhury, P., Quax, T. E. F. & Albers, S. V. Versatile cell surface structures of archaea. Mol. Microbiol 107, 298–311 (2018).
Henche, A. L. et al. Structure and function of the adhesive type IV pilus of Sulfolobus acidocaldarius. Environ. Microbiol 14, 3188–3202 (2012).
Lassak, K. et al. Molecular analysis of the crenarchaeal flagellum. Mol. Microbiol 83, 110–124 (2012).
Ghosh, A., Hartung, S., van der Does, C., Tainer, J. A. & Albers, S. V. Archaeal flagellar ATPase motor shows ATP-dependent hexameric assembly and activity stimulation by specific lipid binding. Biochem J. 437, 43–52 (2011).
Shahapure, R., Driessen, R. P., Haurat, M. F., Albers, S. V. & Dame, R. T. The archaellum: a rotating type IV pilus. Mol. Microbiol 91, 716–723 (2014).
Kreutzberger, M. A. B. et al. The evolution of archaeal flagellar filaments. Proc. Natl. Acad. Sci. USA 120, e2304256120 (2023).
Makarova, K. S., Koonin, E. V. & Albers, S. V. Diversity and evolution of Type IV pili systems in archaea. Front Microbiol 7, 667 (2016).
Wang, F. et al. The structures of two archaeal type IV pili illuminate evolutionary relationships. Nat. Commun. 11, 3424 (2020).
Liu, J. et al. Two distinct archaeal type IV pili structures formed by proteins with identical sequence. Nat. Commun. 15, 5049 (2024).
Kreutzberger, M. A. B. et al. Convergent evolution in the supercoiling of prokaryotic flagellar filaments. Cell 185, 3487–3500.e3414 (2022).
Charles-Orszag, A., Lord, S. J. & Mullins, R. D. High-temperature live-cell imaging of cytokinesis, cell motility, and cell-cell interactions in the thermoacidophilic crenarchaeon Sulfolobus acidocaldarius. Front Microbiol 12, 707124 (2021).
Lassak, K., Peeters, E., Wrobel, S. & Albers, S. V. The one-component system ArnR: a membrane-bound activator of the crenarchaeal archaellum. Mol. Microbiol 88, 125–139 (2013).
Manica, A. & Schleper, C. CRISPR-mediated defense mechanisms in the hyperthermophilic archaeal genus Sulfolobus. RNA Biol. 10, 671–678 (2013).
Payne, L. J. et al. Identification and classification of antiviral defence systems in bacteria and archaea with PADLOC reveals new system types. Nucleic Acids Res 49, 10868–10878 (2021).
Maestri, A. et al. The bacterial defense system MADS interacts with CRISPR-Cas to limit phage infection and escape. Cell Host Microbe 32, 1412–1426.e11 (2024).
Cvetkovic, M. A., Wurm, J. P., Audin, M. J., Schutz, S. & Sprangers, R. The Rrp4-exosome complex recruits and channels substrate RNA by a unique mechanism. Nat. Chem. Biol. 13, 522–528 (2017).
Quax, T. E. et al. Massive activation of archaeal defense genes during viral infection. J. Virol. 87, 8419–8428 (2013).
Leon-Sobrino, C., Kot, W. P. & Garrett, R. A. Transcriptome changes in STSV2-infected Sulfolobus islandicus REY15A undergoing continuous CRISPR spacer acquisition. Mol. Microbiol 99, 719–728 (2016).
Babu, M. et al. A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair. Mol. Microbiol 79, 484–502 (2011).
Miezner, G. et al. An archaeal Cas3 protein facilitates rapid recovery from DNA damage. Microlife 4, uqad007 (2023).
Sakata, E., Eisele, M. R. & Baumeister, W. Molecular and cellular dynamics of the 26S proteasome. Biochim Biophys. Acta Proteins Proteom. 1869, 140583 (2021).
Santos, A., Wernersson, R. & Jensen, L. J. Cyclebase 3.0: a multi-organism database on cell-cycle regulation and phenotypes. Nucleic Acids Res 43, D1140–D1144 (2015).
Pountain, A. W. et al. Transcription-replication interactions reveal bacterial genome regulation. Nature 626, 661–669 (2024).
Hajduk, I. V., Rodrigues, C. D. & Harry, E. J. Connecting the dots of the bacterial cell cycle: Coordinating chromosome replication and segregation with cell division. Semin Cell Dev. Biol. 53, 2–9 (2016).
Zhou, P. & Helmstetter, C. E. Relationship between ftsZ gene expression and chromosome replication in Escherichia coli. J. Bacteriol. 176, 6100–6106 (1994).
Arjes, H. A. et al. Failsafe mechanisms couple division and DNA replication in bacteria. Curr. Biol. 24, 2149–2155 (2014).
Samson, R. Y. et al. Specificity and function of archaeal DNA replication initiator proteins. Cell Rep. 3, 485–496 (2013).
Bernander, R., Lundgren, M. & Ettema, T. J. Comparative and functional analysis of the archaeal cell cycle. Cell Cycle 9, 794–806 (2010).
Basu, S., Greenwood, J., Jones, A. W. & Nurse, P. Core control principles of the eukaryotic cell cycle. Nature 607, 381–386 (2022).
Limas, J. C. & Cook, J. G. Preparation for DNA replication: the key to a successful S phase. FEBS Lett. 593, 2853–2867 (2019).
Bell, S. D. Initiation of DNA Replication in the Archaea. Adv. Exp. Med Biol. 1042, 99–115 (2017).
Lundgren, M., Andersson, A., Chen, L., Nilsson, P. & Bernander, R. Three replication origins in Sulfolobus species: synchronous initiation of chromosome replication and asynchronous termination. Proc. Natl. Acad. Sci. USA 101, 7046–7051 (2004).
Nelson, D. M. et al. Coupling of DNA synthesis and histone synthesis in S phase independent of cyclin/cdk2 activity. Mol. Cell Biol. 22, 7459–7472 (2002).
Weixlbaumer, A., Grunberger, F., Werner, F. & Grohmann, D. Coupling of Transcription and Translation in Archaea: Cues From the Bacterial World. Front Microbiol 12, 661827 (2021).
French, S. L., Santangelo, T. J., Beyer, A. L. & Reeve, J. N. Transcription and translation are coupled in Archaea. Mol. Biol. Evol. 24, 893–895 (2007).
Cuijpers, S. A. G. & Vertegaal, A. C. O. Guiding mitotic progression by crosstalk between post-translational modifications. Trends Biochem Sci. 43, 251–268 (2018).
Vorontsov, E. A., Rensen, E., Prangishvili, D., Krupovic, M. & Chamot-Rooke, J. Abundant lysine methylation and N-terminal acetylation in Sulfolobus islandicus revealed by bottom-up and top-down proteomics. Mol. Cell Proteom. 15, 3388–3404 (2016).
Anjum, R. S. et al. Involvement of a eukaryotic-like ubiquitin-related modifier in the proteasome pathway of the archaeon Sulfolobus acidocaldarius. Nat. Commun. 6, 8163 (2015).
Deng, L., Zhu, H., Chen, Z., Liang, Y. X. & She, Q. Unmarked gene deletion and host-vector system for the hyperthermophilic crenarchaeon Sulfolobus islandicus. Extremophiles 13, 735–746 (2009).
Liu, J. et al. Archaeal extracellular vesicles are produced in an ESCRT-dependent manner and promote gene transfer and nutrient cycling in extreme environments. ISME J. 15, 2892–2905 (2021).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28, 27–30 (2000).
Stothard, P. The sequence manipulation suite: JavaScript programs for analyzing and formatting protein and DNA sequences. Biotechniques 28, 1102–1104 (2000).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinforma. 9, 559 (2008).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res 44, D733–D745 (2016).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Corel, E. et al. MultiTwin: a software suite to analyze evolution at multiple levels of organization using multipartite graphs. Genome Biol. Evol. 10, 2777–2784 (2018).
Mendler, K. et al. AnnoTree: visualization and exploration of a functionally annotated microbial tree of life. Nucleic Acids Res 47, 4442–4448 (2019).
Avila Cobos, F., Alquicira-Hernandez, J., Powell, J. E., Mestdagh, P. & De Preter, K. Benchmarking of cell type deconvolution pipelines for transcriptomics data. Nat. Commun. 11, 5650 (2020).
UniProt Consortium. UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Res 51, D523–D531 (2023).
Schmittgen, T. D. & Livak, K. J. Analyzing real-time PCR data by the comparative C(T) method. Nat. Protoc. 3, 1101–1108 (2008).
Acknowledgements
This work was supported by Agence Nationale de la Recherche grant ANR-23-CE13-022 to MK. The work in EB laboratory was supported by an ATM grant from the MNHN (ATM AAP 2023) and an Emergence grant from Sorbonne Université (S21JR31001—IP/S/V2 EMERG-ESPA). MGRV was supported by a stipend from the Pasteur-Paris University (PPU) International PhD Program. We also acknowledge the help of Pierre-Henri Commere and the Flow Cytometry platform at Institut Pasteur. The Biomics Platform, C2RT, Institut Pasteur, Paris, France, is supported by France Génomique (ANR-10-INBS-09) and IBISA.
Author information
Contributions
M.G.R.V. performed the experimental studies, carried out the analysis, interpreted the data, wrote the original draft, reviewed and edited the manuscript. J.T. and Y.D. constructed the gene co-expression networks, carried out the analysis, interpreted the data, reviewed and edited the manuscript. S.M. carried out the analysis, reviewed and edited the manuscript. E.C., P.L., F.J.L., D.B. and L.P.H. reviewed and edited the manuscript. E.T. and M.M. prepared the library, performed the RNA sequencing and reviewed the manuscript. V.C.K. supervised the work, interpreted the data, reviewed and edited the manuscript. E.B. designed the co-expression network analysis, interpreted the data, supervised the work, reviewed and edited the manuscript. M.K. supervised the work, interpreted the data, wrote the original draft, reviewed and edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Daniela Barilla and the other, anonymous, reviewers for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Gomez-Raya-Vilanova, M.V., Teulière, J., Medvedeva, S. et al. Transcriptional landscape of the cell cycle in a model thermoacidophilic archaeon reveals similarities to eukaryotes. Nat Commun 16, 5697 (2025). https://doi.org/10.1038/s41467-025-60613-8
DOI: https://doi.org/10.1038/s41467-025-60613-8