Introduction

Synthetic rewriting technologies have enabled us to perform large-scale modifications of genetic material through synthetic approaches1,2,3,4. This includes technologies for the assembly, transfer, maintenance, and rearrangement of large DNA. These advancements enable the de novo design, synthesis, and reprogramming of entire genomes. Utilizing this technological pipeline can surpass the limitations of natural genomes, allowing for the design of genetic materials independent of native genomic templates and even endowing them with entirely novel functions5.

The use of synthetic rewriting technologies, combined with diverse genomic design strategies, including codon reprogramming6,7,8,9, genome simplification10,11,12,13,14,15,16, and genome expansion17, enables the large-scale, template-free modification of viruses, bacteria, and unicellular eukaryotic cells. These rewritten genomes provide an unprecedented platform for exploring and reshaping complex phenotypes, addressing problems that are difficult to study within existing genetic frameworks. For example, they have facilitated the rapid production of viruses for vaccine development18,19,20, the creation of minimal bacterial cells to advance fundamental cell biology research11,12,21,22, the reprogramming of organisms to incorporate non-standard amino acids6,7,23, and the extensive reorganization of genomes that can be triggered on demand15. Moreover, they offer new insights into several areas, including the functional consequences of structural variants (SVs) at scale24, complex gene regulatory networks25, 3D chromosome topology-function relationships26,27, causal genotype-phenotype mapping28,29,30,31,32, the plasticity of the genome33,34, protection against viral infections35,36,37, and the production of synthetic proteins35,38.

The design and synthesis of synthetic mammalian genomes is considered the next major frontier and challenge in the field of synthetic genomics1,5. This endeavor helps to elucidate the blueprint of life provided by the Human Genome Project (HGP-read) and addresses a range of human health challenges. The field has garnered support from initiatives in the United States, China, the United Kingdom, and other countries, including projects, such as Genome Project-Write (GP-write)39, the Dark Matter Project4, the National Key R&D Program of China, and the Synthetic Human Genome Project (SynHG). These projects aim to accelerate the development of synthetic rewriting technologies, advance the engineering and testing of synthetic mammalian genomes, and establish ethical frameworks. However, mammalian genomes are significantly more complex than those of unicellular organisms, such as Escherichia coli and Saccharomyces cerevisiae. This complexity arises from the vast scale of mammalian genomes and the prevalence of repetitive DNA40,41,42,43,44,45,46. Core technologies still face substantial challenges, as the assembly and transfer of large DNA fragments are often hindered by shearing forces in vitro, leading to cumbersome processes and reducing efficiency. Additionally, the functionality of vectors in mammalian cells is limited, and the difficulty in precise genome integration further presents technological obstacles.

Recent breakthroughs in DNA assembly, transfer, maintenance, and rearrangement technologies have led to significant advancements in synthetic rewriting in mammalian cells. The use of yeast life cycle-based47, haploidization-based17,48, and conjugation-based49,50 assembly techniques has enabled large-scale iterative assembly, greatly improving the efficiency of DNA assembly at the megabase (Mb) -scale and reducing the time required. These methods have successfully assembled various human sequences at the Mb-scale, demonstrating the feasibility of assembling mammalian DNA. Techniques for transferring large DNA into mammalian cells have improved the success rate of DNA transfer by minimizing the presence of ‘naked DNA’51. Additionally, the development of bottom-up approaches for constructing single-copy human artificial chromosomes (HACs)52 and the continuous optimization of integration strategies53,54,55,56,57,58,59 have provided new solutions for the maintenance of large DNA within mammalian cells. Genome rearrangement technologies that do not require whole-genome-scale synthesis offer novel approaches for constructing SV models or simplifying genomes.

This review highlights the emerging field of rewriting mammalian genomes, providing an overview of key breakthroughs in core enabling technologies related to the large DNA assembly-transfer-maintenance-rearrangement technological pipeline (Fig. 1). These advancements, from a technological breakthrough perspective, have provided unprecedented capabilities for the design and de novo synthesis of mammalian DNA. The field is rapidly transitioning from theoretical frameworks to practical engineering applications, thereby accelerating progress in disease modeling and enabling a range of novel biomedical applications. These developments represent a new generation of rewriting tools in synthetic biology, possessing transformative potential to reshape the future landscape of genomic engineering.

Fig. 1: Synthetic rewriting technologies in mammalian cells.

a Mb-scale DNA and repetitive DNA assembly techniques. Rectangles with different colors represent the assembled fragments. Pink triangles represent the repetitive DNA. b Technologies for the transfer of large DNA to mammalian cells. The left section displays the yeast-mediated transfer techniques, the right section displays the bacteria-mediated transfer techniques, and the bottom section displays the microcell-mediated transfer techniques. The purple ring and the blue circle represent the cell surface proteins and the envelope proteins, respectively. c Methods for stable maintenance of exogenous large DNA in mammalian cells. The top section displays large DNA integration into the genome. The bottom section displays human/mammalian artificial chromosomes (HACs/MACs). The circle represents exogenous large DNA. d Genome rearrangement techniques. Blue, uniform cells indicate cells without rearrangement, while colorful cells represent cells that have undergone rearrangement.

Large DNA and repetitive DNA assembly techniques

Large DNA assembly is a crucial technique for de novo genome synthesis. Compared to bacterial and yeast genomes, mammalian genomes are approximately two orders of magnitude larger and contain high-identity duplicated sequences40. In T2T-CHM13, segmental duplications account for approximately 218 Mb, primarily located in centromeric regions and five short arms of human chromosomes41. These differences highlight the greater complexity of mammalian genomes, which significantly complicates the assembly of large mammalian DNA and necessitates the development of assembly techniques for handling large and repetitive DNA (Fig. 2). DNA sequences smaller than 100 kb can typically be assembled in vitro using methods such as restriction enzyme ligation60,61 and Gibson assembly62. In contrast, larger DNA is susceptible to shear forces in vitro and is therefore usually assembled in vivo through homologous recombination, with E. coli and S. cerevisiae serving as the primary hosts. Furthermore, complex repetitive DNA poses challenges to assembly strategies based on homologous recombination. Emerging technologies have provided new approaches for the construction of repetitive DNA63,64.

Fig. 2: Mb-scale iterative DNA assembly techniques and repetitive DNA assembly techniques.

a Iterative assembly techniques of Mb-scale DNA in E. coli. Rectangles with different colors represent the assembled fragments. i. The schematic of the BASIS (bacterial artificial chromosome (BAC) stepwise insertion synthesis) technique. ii. The schematic of CALBIA (conjugation-associated linear-BAC iterative assembling) technique. b Iterative assembly techniques of Mb-scale DNA in S. cerevisiae. Rectangles with different colors represent the assembled fragments. i. The schematic of YLC-assembly (large DNA assembly via yeast life cycle) technique. The pink and blue circles represent yeast spores. ii. The schematic of HAnDy (haploidization-based DNA assembly and delivery in yeast) technique. c Assembly techniques for repetitive DNA. The blue arrow represents the repeat unit. i. The schematic of RCA combined with TAR (rolling circle amplification combined with transformation-associated recombination) technique. ii. The schematic of CAT (construction of a repetitive array through transformation) technique. iii. The schematic of Goldengate technique. The orange rectangles represent the restriction enzyme recognition sites. iv. The schematic of BioBrick technique. The rectangles represent the restriction enzyme recognition sites. v. The schematic of AE (amplification editing) technique. The red line represents the target sequence, and the yellow shape represents nCas9 protein and the blue shape represents the reverse transcriptase.

Assembly techniques for Mb-scale DNA

E. coli serves as an assembly host, benefiting from a shorter assembly cycle compared with S. cerevisiae. However, early DNA assembly in E. coli relied on inefficient RecA-mediated recombination65. The advent of λ-Red recombination66 and other related phage-encoded recombination systems67,68 significantly improved recombination efficiency in E. coli. Despite these advancements, the assembly of large-scale DNA remains challenging. The process of importing and exporting DNA for subsequent rounds of assembly is particularly susceptible to shear forces in vitro, which can lead to prolonged cycle times and hinder the assembly of complete constructs. Several systems have been developed to address the challenges of assembling DNA at the Mb scale. Replicon excision enhanced recombination (REXER)9,69 utilizes the CRISPR/Cas9 system to cleave episomal plasmids in vivo, generating target double-stranded DNA (dsDNA) to facilitate recombination mediated by the λ-Red system. This approach circumvents the challenges associated with delivering long dsDNA fragments into cells. The iterative extension of REXER, known as genome stepwise interchange synthesis (GENESIS)9,69, enables the stepwise replacement of genomic DNA with synthetic DNA. Through the use of REXER and GENESIS, a high-fidelity replacement of approximately 0.5 Mb of the E. coli genome with synthetic DNA can be achieved6. Subsequent conjugation-based assembly enabled the synthesis of a 4 Mb synthetic genome. Building on REXER, conjugation coupled with programmed excision for enhanced recombination (CONEXER)49 combines universal spacers and conjugation-based DNA input, simplifying the workflow by circumventing the variable CRISPR/Cas9 cleavage efficiencies at different loci and the cumbersome electroporation process.
This approach can be extended to the bacterial artificial chromosome (BAC) stepwise insertion synthesis (BASIS) method49, which utilizes conjugation-based transfer for seamless, iterative assembly of Mb-scale DNA using BACs (Fig. 2a. i). With this method, a 1.1 Mb region of human chromosome 21 was successfully assembled. A ∆recA E. coli strain was used for multiple rounds of CONEXER, establishing a rapid and continuous genome synthesis (CGS) method. This approach enabled the replacement of 0.5 Mb of the genome in just 10 days, without the need for sequencing at each step, thereby saving both time and cost. In addition, another conjugation-based assembly strategy, termed conjugation-associated linear-BAC iterative assembling (CALBIA), was developed (Fig. 2a. ii). This method employs the prokaryotic telomere system TelN-tos to generate linear BACs and relies on a single crossover of homologous recombination, thereby facilitating the efficient assembly of large DNA. This approach enabled the assembly and stable maintenance of a 2.1 Mb human DNA, demonstrating sustained genetic stability within the host for up to three days50. To enhance the stability of assembled extrachromosomal large DNA in E. coli, functional replication modules were designed and incorporated into Mb-scale plasmids70.

Due to its efficient homologous recombination capability, S. cerevisiae has become a commonly used platform for large DNA assembly. Previous strategies were mainly based on stepwise assembly, where the initial fragments were assembled into intermediates, linearized in vitro, and then gradually transferred into S. cerevisiae to achieve the assembly of Mb-scale DNA. The assembly of a 1.08 Mb Mycoplasma mycoides genome from eleven 100 kb sequences was completed using yeast spheroplast transformation-associated recombination (TAR)21. In the Synthetic Yeast Genome Project (Sc2.0), the switching auxotrophies progressively for integration (SwAP-in) method was mainly employed to assemble synthetic chromosome fragments to replace wild-type chromosome fragments15,71,72,73,74. These procedures are straightforward, but the need for multiple time-consuming in vitro steps during large DNA assembly reduces the overall efficiency of obtaining the desired large DNA. Parallel assembly strategies combined with multiple iterative methods have significantly reduced the time required for Mb-scale DNA assembly. The meiotic recombination-mediated (MRA) parallel assembly approach exploits the potential for crossover interchanges of sister chromatids during meiosis in diploids to address the time-consuming issue of stepwise replacement of large DNA, facilitating the assembly of the final stages of the synthetic yeast chromosomes synXII (0.98 Mb)75 and synIV (1.45 Mb)26. Cas9-facilitated homologous recombination assembly (CasHRA)76, assembly via yeast life cycle (YLC-assembly)47, and haploidization-based DNA assembly and delivery in yeast (HAnDy)17,48 utilize the cleavage capability of CRISPR/Cas9 to release linear DNA and improve Mb-level DNA assembly efficiency. In CasHRA, the assembled large plasmids were transferred into S. cerevisiae via yeast protoplast fusion and then linearized, which greatly improved the assembly efficiency76.
YLC-assembly47 integrates the assembly process into the yeast life cycle, reducing damage to large DNA by eliminating complex and time-consuming in vitro manipulations during large DNA assembly (Fig. 2b. i). For the assembly of DNA up to Mb size, both the colony counts and assembly accuracy remained consistent, suggesting that YLC-assembly could be a size-independent method. Using this method, a 1.26 Mb human IGH region was successfully assembled. HAnDy17,48 enables yeast to bypass the meiotic process for sustained mating using CRISPR/Cas9-mediated genome elimination (Fig. 2b. ii). This facilitates haploidization and significantly shortens the experimental period otherwise required for sporulation. The method requires no cumbersome in vitro operations. For each round of assembly, more than 10³ colonies can be easily obtained in the selective medium, while maintaining approximately 60% assembly accuracy. Using HAnDy, a 1.024 Mb synthetic accessory chromosome was efficiently assembled starting from commercially synthesized fragments of around 6 kb. Notably, the entire assembly process was carried out in yeast, thereby demonstrating the potential of yeast as an efficient platform for assembling Mb-scale DNA (Table 1).
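
To put the reported HAnDy figures in perspective, the sketch below estimates how many colonies would need to be screened per round to recover at least one correct assembly. It treats the approximately 60% assembly accuracy quoted above as an independent per-colony success probability; that independence assumption, the 99% confidence target, and the function name `colonies_to_screen` are illustrative choices and are not taken from the cited work.

```python
def colonies_to_screen(accuracy: float, confidence: float = 0.99) -> int:
    """Smallest n with P(at least one correct colony among n) >= confidence.

    Assumes each colony is correct independently with probability `accuracy`
    (an illustrative simplification, not a claim from the cited studies).
    """
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")

    n = 1
    all_wrong = 1.0 - accuracy  # probability that every screened colony is incorrect
    while all_wrong > 1.0 - confidence:
        n += 1
        all_wrong *= 1.0 - accuracy
    return n


if __name__ == "__main__":
    # ~60% accuracy as quoted in the text; 99% target is an assumption.
    print(colonies_to_screen(0.6, 0.99))
```

Under these assumptions, screening six colonies would give at least a 99% chance of finding a correct assembly, a small fraction of the more than 10³ colonies reported per round.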

Table 1 Comparison of assembly techniques for Mb-scale DNA and repetitive DNA

Assembly techniques for repetitive DNA

Repetitive DNA plays an important role in maintaining chromosomal spatial structure and the development of diseases77,78. For example, the centromere and telomere regions of the human genome are composed of various types of repetitive DNA, and the genome is also interspersed with a large number of transposable elements79,80,81. The α-satellite sequence in the centromere region provides binding sites for centromeric proteins (such as CENP-A), which are involved in chromosome segregation and maintaining genomic stability82. The telomere regions contain tandem TTAGGG repeats, which protect the chromosome ends and insulate them from chromosome shortening83. Transposable elements are closely associated with genetic diversity and genomic rearrangement, and play key regulatory roles84. Constructing large synthetic repetitive DNA will contribute to understanding their functions and uncovering the mechanisms. Several methods have been reported for assembling repetitive DNA. Based on the properties of the obtainable products, the methods can be classified into random-size assembly methods and precise-size assembly methods (Table 1 and Fig. 2c).

Rolling circle amplification (RCA) combined with TAR allows for the rapid amplification of a few hundred base pairs repeat units into large DNA arrays up to 140 kb85 (Fig. 2c. i). Introducing specific seed sequences (short repetitive unit sequences) carrying selection markers and using selection pressure can generate random copies of the seed sequences. In the construction of the synthetic yeast chromosome synXII, rDNA seed sequence was inserted at the designated locus, and the formation of new rDNA clusters (100–200 copies) was achieved through successive amplification under increasing concentrations of hygromycin B75,86. The method was named construction of a repetitive array through transformation (CAT) in subsequent studies86 (Fig. 2c. ii). While these methods enable the convenient acquisition of large repetitive DNA, the precise length of the obtained sequences remains uncertain and typically requires further techniques, such as pulsed-field gel electrophoresis and sequencing for confirmation.

Where repetitive DNA of a precise size is needed, methods such as Goldengate87,88, BioBrick89,90,91, and amplification editing (AE)63 can be employed. Both Goldengate and BioBrick rely on restriction endonucleases and ligase to achieve the assembly of repetitive DNA, thereby avoiding the instability of repetitive DNA often caused by homologous recombination. Through the Goldengate method, an sgRNA array was successfully constructed, incorporating 43 copies of the repetitive gRNA scaffold and the hU6 promoter, thereby addressing the scarcity of standard biological parts87 (Fig. 2c. iii). By employing multiple rounds of BioBrick ligation, 2.7 kb natural repeat sequences were successfully arrayed to reach a total length of 86 kb90 (Fig. 2c. iv). Notably, a recently developed AE technique can precisely generate in situ duplications of 1 Mb–100 Mb in mammalian genomes. This technique uses Cas9 nickases (nCas9) to cut different strands, creating flaps. By strategically altering the orientation of the nCas9 recognition sites from a PAM-in orientation to a PAM-out orientation, the flaps can then anneal to each other to produce in situ duplications63 (Fig. 2c. v). This provides a clever approach for manipulating repeat sequences in mammalian cells.
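
As a rough consistency check on the BioBrick figures, the sketch below counts how many ligation rounds would be needed to reach a target array length if each round joins two copies of the current array and therefore doubles it. The doubling scheme and the function name `doubling_rounds` are assumptions made for illustration; the cited study may have structured its rounds differently.

```python
def doubling_rounds(unit_kb: float, target_kb: float) -> tuple:
    """Return (rounds, copies) at which copies * unit_kb first reaches target_kb.

    Assumes every round doubles the array (hypothetical scheme for illustration).
    """
    if unit_kb <= 0 or target_kb <= 0:
        raise ValueError("lengths must be positive")

    rounds, copies = 0, 1
    while copies * unit_kb < target_kb:
        copies *= 2   # ligate two copies of the current array together
        rounds += 1
    return rounds, copies


if __name__ == "__main__":
    rounds, copies = doubling_rounds(unit_kb=2.7, target_kb=86)
    print(rounds, copies, copies * 2.7)
```

With a 2.7 kb unit and an 86 kb target, this gives five rounds and 32 copies (about 86.4 kb), which is in line with the reported array length under the doubling assumption.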

Currently, both E. coli and S. cerevisiae have become highly efficient assembly platforms, which enabled the assembly of Mb-scale human genomic regions47,49,50. Regarding repetitive DNA assembly, assembly techniques have progressed from relying on random methods to utilizing precise assembly strategies. AE represents a significant breakthrough, enabling Mb-scale in situ duplication in the mammalian genome. However, the customized assembly of large-scale repetitive DNA remains a challenge. The optimized repetitive DNA assembly technology in the future should focus on the customized and reprogrammed assembly of different types and lengths of repetitive DNA. Large DNA constructs assembled in E. coli or S. cerevisiae can be delivered into mammalian cells using either direct cell-to-cell transfer or in vitro transfer. The details of the transfer technologies will be elaborated in the following section. Furthermore, assembly techniques developed in S. cerevisiae provide a reference for rewriting the mammalian genome15,71,72,73,74, as exemplified by the mSwAP-In method that directly enables genome modifications in mammalian cells59.

Technologies for the transfer of large DNA to mammalian cells

Efficient transfer methods for large DNA are essential prerequisites for its functionality in target cells. Conventional DNA transfer methods primarily include lipid transfection92,93, electroporation (Fig. 3a. i)94, microinjection95, and viral vectors96,97,98,99 (Table 2). However, delivering large DNA to mammalian cells remains a significant challenge. Compared to the delivery of small-sized DNA, large DNA purified in vitro is more susceptible to fragmentation caused by shear forces, and in vivo transfer methods are often limited by efficiency. Recent advancements have provided desirable solutions to address these issues (Fig. 3).

Fig. 3: Schematic of technologies for the transfer of large DNA to mammalian cells.

a Physical and chemical techniques. The red curve represents transferred DNA. i. The schematic of electroporation. ii. The schematic of lipid transfection. iii. The schematic of microinjection. b Biological techniques. i. The schematic of microcell-mediated chromosome transfer. The purple ring and the blue circle respectively represent the cell surface proteins and the envelope proteins. ii. The schematic of yeast spheroplast-mammalian cell fusion. iii. The schematic of bacteria-mediated transfer methods. iv. The schematic of viral vectors.

Table 2 Comparison of methods for transferring large DNA into mammalian cells

Physical and chemical techniques

Employing some materials for compacting large DNA, as well as cleverly utilizing the natural protection of the cell nucleus, can effectively shield large DNA from shear stress during isolation and purification. When combined with physical and chemical transfer methods, these approaches can significantly improve the integrity of large DNA during transfer.

Polycations, positively charged materials that bind tightly to negatively charged DNA, are commonly used to compact large DNA (Fig. 3a. ii). They condense the DNA, protecting large DNA fragments from mechanical forces during in vitro isolation100. The Globin-HAC was isolated using polyamine buffer and 25% sucrose cushion centrifugation and transferred from K562 cells to target mammalian recipient cells via lipid transfection101. In addition, 1.7 and 2.3 Mb yeast artificial chromosomes (YACs) were purified by pulsed-field gel electrophoresis, compacted with poly-L-lysine (PLL) or polyethyleneimine (PEI) to protect them against degradation, and successfully delivered into HT1080 cells via lipid transfection100.

Nucleus extraction is a clever strategy to protect large DNA, not only to avoid breakage caused by the direct extraction of large DNA, but also to remove excess cellular components and material. By employing the nucleus isolation for chromosomes extraction (NICE) method, which isolates yeast nuclei and preserves chromosome structure, in conjunction with nuclear microinjection (Fig. 3a. iii), a massive 1.14 Mb synthetic human DNA was successfully delivered into mouse metaphase II (MII) oocytes51.

Biological techniques

Natural membrane fusion, spheroplast fusion, conjugation, and viral infection phenomena provide new insights for developing mammalian cell delivery methods. These methods primarily include microcell-mediated chromosome transfer (MMCT), yeast spheroplast-mammalian cell fusion, bacterial-to-cell transfer techniques, and viral vectors.

MMCT enables the transfer of chromosome-scale DNA between mammalian cells (Fig. 3b. i). The core steps of MMCT are: (1) Micronucleation: donor cells are treated with mitotic spindle inhibitors, such as colcemid, to induce micronuclei formation. (2) Enucleation of micronucleated cells: cytochalasin B promotes micronuclei extrusion, and microcells are collected by centrifugation. (3) Fusion and selection: microcells are fused with recipient cells, and clones carrying the transferred chromosomes are isolated under selection102. MMCT efficiency depends on the percentage of micronucleated cells and the number of micronuclei per cell during mitotic arrest (step 1), the degree of actin cytoskeleton disruption during micronuclei extrusion (step 2), and the fusion efficiency between microcell and recipient cell membranes (step 3). Traditional MMCT exhibits low efficiency, typically yielding approximately one clone per 10⁵–10⁶ recipient cells. The procedure conventionally employs CHO and A9 cells as donor cells, owing to their high efficiency in generating micronuclei. Recent advancements have targeted the optimization of each core step. For steps 1 and 2, replacing colcemid with TN-16 + griseofulvin, substituting latrunculin B for cytochalasin B, and using collagen/laminin surface coating collectively increased MMCT efficiency ~6-fold103. For step 3, employing the hemagglutinating virus of Japan envelope (HVJ-E) system instead of polyethylene glycol (PEG) improved artificial chromosome transfer efficiency from CHO cells to HT1080 and hiMSC by ~3-fold and ~8-fold, respectively104. Using measles virus (MV) fusogenic envelope proteins (H/F), which specifically bind to the human CD46 or SLAM cell surface proteins and trigger membrane fusion, enhanced chromosome transfer efficiency 50-fold and 100-fold in HT1080 and hiMSC cells, respectively105.
Fusing the MV-H protein to single-chain antibodies further expanded the range of targetable human cells to those that barely express native MV receptors106. Modified murine leukemia virus (MLV) envelope proteins enabled cross-species chromosome transfer (human, monkey, mouse, rat, and rabbit)107. Cryopreservation of microcells at −80 °C has also enhanced experimental flexibility108.
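
The efficiency figures above can be translated into expected clone yields with a short calculation. The sketch below assumes that a reported fold improvement simply multiplies the traditional baseline of roughly one clone per 10⁵–10⁶ recipient cells. In practice each fold change was measured against its own study's control, so the output is only indicative. The function name `expected_clones` and the choice of 10⁷ recipient cells are hypothetical.

```python
def expected_clones(recipient_cells: float,
                    cells_per_clone: float,
                    fold_improvement: float = 1.0) -> float:
    """Expected clone count given a baseline of one clone per `cells_per_clone`
    recipient cells, scaled by an assumed multiplicative fold improvement."""
    if recipient_cells < 0 or cells_per_clone <= 0 or fold_improvement <= 0:
        raise ValueError("inputs must be positive")
    return recipient_cells / cells_per_clone * fold_improvement


if __name__ == "__main__":
    # Baseline range quoted in the text: one clone per 1e5 to 1e6 recipient cells.
    # Fold values are the improvements quoted for individual optimizations.
    for fold in (1, 6, 50, 100):
        low = expected_clones(1e7, 1e6, fold)
        high = expected_clones(1e7, 1e5, fold)
        print(f"{fold:>3}x: {low:g} to {high:g} clones per 1e7 recipient cells")
```

Keeping the baseline as "cells per clone" mirrors how the traditional efficiency is stated in the text and avoids mixing it with the percentage-style efficiencies reported for other transfer methods.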

Preparing yeast spheroplasts via enzymatic degradation of the cell wall, followed by fusion with mammalian cells, represents a promising approach for transferring large DNA from yeast to mammalian systems (Fig. 3b. ii). Current methods have demonstrated the capacity to deliver Mb-scale DNA into diverse cell lines109, including goat fetal fibroblasts110, LA-9 mouse cells111, RCC-1 cells112, mouse embryonic stem (ES) cells113, and mouse L cells114. Notably, this approach exhibits significantly higher fidelity in delivering intact DNA compared to conventional methods such as lipid transfection, although efficiencies remain modest (10⁻⁵–10⁻⁶). Recent optimizations involving mitotic arrest and refined reaction conditions have elevated delivery efficiency to 1/840 in HEK293 cells while expanding applicability to other cells115. Combined with the mature DNA assembly techniques available in yeast, this method may provide a simplified workflow for synthetic rewriting in mammalian cells, from DNA assembly to transfer. Transgenic mice harboring human TCRαβ loci (1.1 Mb and 0.7 Mb) have been successfully generated through yeast spheroplast-mammalian cell fusion113,116. However, current optimization strategies predominantly focus on PEG-mediated fusion protocols. Future developments could employ other membrane fusion mechanisms to improve throughput and broaden the range of suitable cell types.
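
To gauge the scale of the reported optimization, the sketch below compares the 1/840 efficiency in HEK293 cells with the earlier range of 10⁻⁵ to 10⁻⁶. The comparison assumes that efficiency is defined in the same way across studies (events per recipient cell), which may not hold exactly, and `fold_change` is a hypothetical helper.

```python
from fractions import Fraction


def fold_change(new_efficiency: Fraction, old_efficiency: Fraction) -> Fraction:
    """Ratio of two delivery efficiencies, kept exact with Fraction."""
    if new_efficiency <= 0 or old_efficiency <= 0:
        raise ValueError("efficiencies must be positive")
    return new_efficiency / old_efficiency


if __name__ == "__main__":
    optimized = Fraction(1, 840)  # efficiency quoted for HEK293 cells
    for old in (Fraction(1, 10**5), Fraction(1, 10**6)):
        ratio = fold_change(optimized, old)
        print(f"vs {float(old):.0e}: ~{float(ratio):.0f}-fold")
```

Under that assumption, the optimized protocol corresponds to a gain of roughly two to three orders of magnitude over the earlier range.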

Conjugation facilitates the transfer of DNA between bacteria and promotes the transfer of plasmids to be assembled within the bacterial host49,50,117. Similar processes can also occur between distantly related species118, primarily via broad-host-range and F-factor plasmids. Using the RK2 broad-host-range conjugative plasmid system with modified E. coli as donor cells enables the transfer of DNA to mammalian cells119. Agrobacterium tumefaciens facilitates T-DNA transfer from Ti-plasmids to human cells120 via VIRD2 endonuclease-mediated single-strand DNA excision and genomic integration. Highly attenuated pathogenic strains have also been utilized as delivery vehicles for transferring DNA into mammalian cells121,122,123,124, a process that involves bacterial invasion of host cells, followed by bacterial lysis and subsequent plasmid release. Invasion genes can also be introduced into other bacteria for DNA delivery. E. coli engineered to express the invasion gene of Yersinia pseudotuberculosis can invade HeLa cells and deliver DNA, transferring a ~200 kb BAC to HeLa cells with an efficiency of 2.8%125. A similar technique can also enhance DNA transfer to COS-1 and CHO cells126. This approach will greatly improve the utility of existing BAC libraries from humans and other mammalian genomes in functional research (Fig. 3b. iii).

By utilizing the ability of viruses to infect cells, viral vectors can also be used for DNA transfer. Commonly used viral vectors include lentiviral, adenoviral, and adeno-associated viral (AAV) vectors96. However, widely used high-efficiency viral vectors have a small carrying capacity, typically within 10 kb127. There are also viral vectors capable of carrying larger DNA, such as the Epstein-Barr virus (EBV) amplicon vector and the herpes simplex virus type 1 (HSV-1) amplicon vector97,98,99,128, which can deliver foreign DNA larger than 100 kb into mammalian cells (Fig. 3b. iv). An EBV amplicon delivered a 123 kb insert (total size 152 kb) to the Loukes B-cell line and achieved an efficiency 2000 times higher than conventional transfection in B cells99. An HSV-1 amplicon delivered 100 kb of 17α DNA into HUES-2 and HT1080 cells with efficiencies of 16% and 25%, respectively128.

Currently, technologies at the forefront of the field have enabled the delivery of Mb-scale DNA into mammalian cells. These approaches encompass a variety of strategies, ranging from optimized MMCT103,104,105,106,107,108 and yeast spheroplast-mammalian cell fusion115 to novel strategies that ingeniously exploit the native protective mechanisms of the nucleus51. However, the current low transfer efficiency and significant manipulation difficulty remain the key limitations for the pipeline for synthetic rewriting in mammalian cells. Future technical optimization will focus on further protecting DNA integrity and improving the transfer efficiency into diverse cell types.

Methods for maintenance of exogenous large DNA in mammalian cells

Key strategies for the maintenance of large-scale DNA in mammalian cells are genomic integration and HACs/mammalian artificial chromosomes (MACs). Recent studies have made significant advancements in large DNA integration technologies, particularly in terms of scale, precision, and iterability. The successful establishment of single-copy HACs has also provided new possibilities for the application of artificial chromosomes (Fig. 4).

Fig. 4: Methods for maintenance of exogenous large DNA in mammalian cells.

a Large DNA integration into the genome. i. Site-specific recombination-mediated precise integration. The schematic of the Big-in technique. The triangles represent lox sites, the pink rectangle represents the Cre recombinase, and the blue rectangle represents the payload DNA. ii. Homologous recombination-mediated precise iterative integration. The schematic of the mSwAP-in technique. The pink rectangle represents payload A, the blue rectangle represents payload B, and the deep blue rectangle and the orange rectangle represent selection markers. iii. PE-mediated precise integration. A schematic of PASTE technique. The blue line represents Bxb1 attB site, and the pink line represents integrated gene. iv. Transposable elements-mediated precise integration. The left section displays the CAST technique, where the yellow line represents the target site and the blue line represents the edit sequence. The middle section displays the STITCHR technique, where the pink line represents the target site and the blue line represents the edit sequence. The right section displays the PRINT technique, where the red line and the yellow line represent the target sites, and the blue line represents the edit sequence. b Large DNA loading into HACs/MACs. i. The schematic of large DNA loading into alphoidtetO HAC. ii. The schematic of large DNA loading into 4q21LacO HAC. The blue circle represents the M. mycoides genomic DNA.

Techniques for the integration of large DNA into the genome

After large DNA is transferred into mammalian cells, it can be integrated into the genome for persistent maintenance through (1) random integration, (2) site-specific recombinase-mediated precise integration, (3) homologous recombination-mediated precise integration, (4) prime editing (PE)-mediated precise integration, or (5) transposable elements-mediated precise integration.

Large DNA can be stably integrated into cells through random integration113,116,129, but the insertion site, copy number, and orientation are uncertain. Site-specific recombination system-mediated integration of large DNA130,131,132,133 can insert hundreds of kilobases at a time without relying on homologous recombination. For example, using sequential recombinase-mediated cassette exchange (S-RMCE) in ES cells, the entire human immunoglobulin variable-gene repertoire (about 2.7 Mb) was precisely inserted into the mouse genome94. A systematic pipeline for the precise integration of large DNA, called Big-in (Fig. 4a. i), was established through the introduction of a landing pad (LP) that permits payload integration via recombinase-mediated cassette exchange (RMCE). This platform incorporates efficient positive/negative selection, with integration events validated by hybridization capture sequencing (Capture-seq) and bamintersect134.

Homologous recombination-mediated methods can be used for large DNA replacement in mammalian cells. A strategy utilizing customized, large-scale BACs with homology arms achieved the in situ replacement of 6 Mb of mouse immunoglobulin genes with the corresponding human genes135. This technology also enabled the replacement of the murine TCRαβ variable regions, along with the ectodomains of MHC-I and MHC-II and co-receptors CD4 and CD8α/CD8β, with corresponding human sequences136. Researchers have developed a large-scale, efficient, scarless, iterative, and biallelic integration approach in mouse ES cells, called mammalian switching antibiotic resistance markers progressively for integration (mSwAP-In)59. This method is derived from the yeast genome rewriting method (SwAP-In)15,137 and utilizes CRISPR/Cas9-assisted homologous recombination combined with efficient positive and negative selection strategies (Fig. 4a. ii). Using this method, they achieved iterative genome rewriting of up to 115 kb of a customized Trp53 locus, and humanized mice with 116 kb and 180 kb human ACE2 loci. In theory, this approach enables unlimited iterative integration of large DNA in mammalian cells whose homologous recombination capacity is similar to that of mouse ES cells.

With the continuous advancement of gene editing technologies, significant progress has been made in improving the scale and efficiency of gene integration. Programmable PE can be combined with site-specific integrases to achieve gene integration at precisely placed LPs without inducing DNA double-strand breaks (DSBs)53,54,56. The resulting technique, called programmable addition via site-specific targeting elements (PASTE), enables efficient and precise insertion of DNA fragments (up to 36 kb) into mammalian cells, including human cells (Fig. 4a. iii)53. Compared to PASTE, PrimeRoot utilizes many optimized components to enhance integration efficiency, offering an alternative strategy for optimizing large-scale DNA integration in mammalian cells138.

The large DNA targeted integration technology based on transposable elements achieves precise, DSB-free genomic insertion, fundamentally avoiding the risk of random mutagenesis and cellular toxicity associated with the CRISPR/Cas9 system (Fig. 4a. iv). The CRISPR-associated transposases (CAST) utilize Tn7-like transposons for precise large DNA integration. Researchers applied phage-assisted continuous evolution (PACE) to perform hundreds of rounds of mutagenesis on the transposase module of type I-F CAST, resulting in the development of an optimized version, evoCAST, which achieved targeted integration efficiencies of 10%-30% for exogenous genes in human HEK293T cells58. Another system relies on R2 retrotransposons for precise large DNA integration. By harnessing the targeting flexibility of the CRISPR/nCas9 system in combination with the large DNA integration capacity of R2 retrotransposons, site-specific target-primed insertion through targeted CRISPR homing of retroelements (STITCHR) enables scarless integration of DNA up to 12.7 kb57. The precise RNA-mediated insertion of transgenes (PRINT) method marks a breakthrough by utilizing the eukaryotic R2 retrotransposon protein to achieve gene integration. By delivering only two in vitro transcribed RNAs (the messenger RNA encoding the R2 protein and the template RNA carrying the transgene), the technology achieved stable insertion of up to 4 kb139. Further advancements were made through systematic analysis and rational engineering, which significantly optimized the integration efficiency of the R2 retrotransposon system in mammalian cells. For example, the engineered system achieved gene integration efficiencies exceeding 60% in mouse embryos and 25% in human liver cells. The engineered R2 system (en-R2Tg) demonstrated remarkable on-target integration specificity, reaching up to 99% at the intended locus. This near-elimination of random insertion events underscores its high safety profile for potential gene therapy applications140.

Design and construction of HACs/MACs

HACs/MACs offer key advantages, such as avoiding host genome perturbations and ensuring high mitotic stability. The construction of HACs/MACs is mainly divided into two approaches: top-down and bottom-up. Top-down HACs/MACs are typically formed by directional chromosome truncation mediated by artificial telomeres or the Cre-loxP system. Natural chromosomes are reduced to HACs/MACs that retain the essential functional elements, and payloads are then added to the HACs/MACs. Artificial chromosomes derived from human chromosomes X, Y, 14, and 21 have already been constructed using this method, with sizes in the Mb range141,142,143,144. Bottom-up HACs/MACs are typically generated by introducing 30-200 kb of natural or assembled α-satellite sequences into mammalian cells89,90,91,145,146,147,148. Recently, bottom-up HACs have made significant progress, providing new strategies for the functional applications of HACs.

One strategy uses a synthetic alphoid DNA array carrying multiple tet operator (tetO) sequences that enable the attachment of functional proteins to the array as tet repressor (tetR) fusions. By linking natural α-satellite monomers with synthetic monomers that replace the CENP-B binding motif with a 42 bp tetO sequence, a 50 kb alphoid tetO dimeric repeat sequence can be constructed, enabling de novo HAC formation. This designed HAC can be controlled by a tTA transcriptional activator or tTS transcriptional silencer fused to tetR, thereby allowing centromere activity to be inactivated when it is not required149. Using a similar method, histone acetyltransferase (HAT) domains were attached to the array, overcoming the barrier to HAC formation in HeLa cells150.

Furthermore, structural analysis151 and further functional exploration of the formed alphoid tetO HAC have also been conducted. These structurally confirmed HACs/MACs were transferred into homologous recombination-efficient DT40 cells and CHO cells that are amenable to MMCT operations for editing and then transferred into target cells (Fig. 4b. i). Through multiple rounds of MMCT and gene editing, the structure of the HAC remains largely unchanged. A loxP cassette was retrofitted into the alphoid tetO HAC by homologous recombination in DT40 cells for gene loading, and EGFP was then inserted by Cre-loxP recombination in CHO cells, allowing stable expression for at least 12 weeks152. Using similar methods, the 90 kb BRCA1 gene153, ~55 kb NBS1, and ~25 kb VHL genes154 were loaded into HACs/MACs in CHO cells through Cre-loxP recombination and delivered into target cells via MMCT to correct genetic defects and perform a series of functional tests. Chromatin insulators can be used to shield loaded genes from heterochromatin domains and to avoid potential interference with centromere function155. Recently, the maintenance of the alphoid tetO HAC in mouse ES cells and their differentiated progeny, as well as germline transmission in mice, has been evaluated156,157. These studies found that the alphoid tetO HAC is well tolerated in mouse ES cells and their differentiated derivatives, preserves pluripotency in vitro, and exhibits robust maintenance and expression during mouse ontogeny. Alphoid tetO HACs could be maintained as extrachromosomal elements without selection and could be transmitted through both ova and sperm. These studies provide preliminary characterization for subsequent applications in these systems.

Another strategy induces centromere formation by seeding CENP-A nucleosome assembly through targeted recruitment. Seeding CENP-A nucleosome assembly by a pulse expression of targeted LacI-HJURP can induce centromere formation on chr7α-sat BAC LacO and chr11α-sat BAC LacO, which come from CENP-A-poor regions and are deemed incapable of forming de novo HACs. Notably, 4q21 BAC LacO can form HACs that bypass centromeric DNA, and targeted LacI-HJURP could induce HAC formation without the need for genomic sequences158. When 550 kb of M. mycoides DNA was loaded onto the 4q21 BAC LacO in S. cerevisiae, the construct efficiently formed a single-copy HAC in human cells52, offering a potential approach for functional sequence loading (Fig. 4b. ii).

In mammalian cells, large DNA can be maintained via two primary strategies: integration into the genome and loading into HACs/MACs. Integration strategies have evolved from random integration to precise, targeted integration. Key technologies include the mSwAP-In iterative rewriting technique59 and large DNA integration mediated by DSB-free gene editing tools53,54,56,57,58,139,140. Regarding HAC/MAC construction, de novo centromere formation independent of canonical repetitive DNA can be induced by seeding CENP-A nucleosome assembly158. Future research will focus on increasing the payload size and achieving efficient, multiplexed, and precise genomic integration. Moreover, a key objective is to design and construct convenient and efficient HACs/MACs (analogous to BACs and YACs) that can ensure the stable expression of large-scale DNA in mammalian cells. These tools should feature compact structures and high sequence stability, thereby further enriching the synthetic rewriting toolbox for mammalian cells.

Genome rearrangement techniques in mammalian cells

Genome rearrangement is closely associated with driving biological evolution and plays a pivotal role in the development of various complex diseases159,160. The Sc2.0 project enables large-scale rearrangements by incorporating recombination sites during the construction of synthetic chromosomes15,137. However, this process remains highly challenging in mammals. Recent advancements have provided examples of systematically introducing recombination sites into the genome or directly inducing seamless rearrangements (Fig. 5).

Fig. 5: Genome rearrangement techniques.

a Site-specific recombination systems-based genome rearrangement techniques. i. PE-mediated insertion of recombination sites. ii. PiggyBac transposon-mediated insertion of recombination cassette library. The red rectangle represents repeat sequences, and the diamond represents recombination sites. The rectangles of different colors represent the barcodes. b RNA-guided editing systems-based genome rearrangement techniques. i. CRISPR/Cas9 systems-mediated rearrangements. The top section displays that sgRNAs target specific genomic sequences for precise rearrangements. The gray ovals represent telomeres and black ovals represent centromeres. The bottom section displays that sgRNAs target genomic repeat sequences for random rearrangements. The red rectangle represents repeat sequences. ii. Programmable bridge recombinases-mediated rearrangements. The top section displays bridge recombinases-mediated large DNA inversion, and the bottom section displays bridge recombinases-mediated large DNA excision.

Site-specific recombination systems-based genome rearrangement techniques

Site-specific recombination systems, such as the Cre-loxP system, can be used to induce recombination between two specific DNA sites, resulting in deletions, inversions, or other structural variations161. The symmetric loxP (loxPsym) site lacks the directional characteristic of the classical loxP site, and the DNA between two loxPsym sites undergoes inversion and deletion with equal probability. This method has been widely used in the Sc2.0 project to achieve a highly adaptable and multifunctional synthetic yeast genome29,162,163,164,165,166,167,168,169,170,171. Because mammalian genomes are far larger than that of S. cerevisiae, it is currently challenging to introduce site-specific recombination sites at the whole-genome level through de novo synthesis, as was done in yeast. Recent research has therefore focused on systematically inserting recombinase recognition sites into the genome through engineered approaches, so that the human genome can be rearranged using recombinase systems.
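The equal-probability behavior of loxPsym sites described above can be illustrated with a toy simulation. The sketch below is only an illustration under stated assumptions: the genome is modeled as a list of named segments with a loxPsym site assumed at every segment boundary, one recombination event picks two sites uniformly at random, and the function name `recombine_once` is hypothetical rather than taken from any published tool.

```python
import random

def recombine_once(segments, rng):
    """Apply one toy recombination event between two loxPsym sites.

    `segments` is a list of (name, strand) tuples, strand being +1 or -1.
    Assumption: a loxPsym site sits at every boundary, including both ends,
    so n segments give n + 1 candidate sites.
    """
    n = len(segments)
    if n < 1:
        return list(segments), None
    # Choose two distinct sites; the segments between them are affected.
    i, j = sorted(rng.sample(range(n + 1), 2))
    if rng.random() < 0.5:
        # Inversion: reverse the segment order and flip each strand.
        inverted = [(name, -strand) for name, strand in reversed(segments[i:j])]
        return segments[:i] + inverted + segments[j:], "inversion"
    # Deletion: drop the segments between the two sites.
    return segments[:i] + segments[j:], "deletion"

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    genome = [("A", 1), ("B", 1), ("C", 1), ("D", 1)]
    counts = {"inversion": 0, "deletion": 0}
    for _ in range(10_000):
        _, kind = recombine_once(genome, rng)
        counts[kind] += 1
    print(counts)  # the two counts should be roughly equal
```

Real recombination outcomes also depend on factors this model ignores, such as the distance between sites, chromatin context, and the viability of the resulting cells.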

PE was applied to insert loxPsym sites into the targeted enhancer region172. Through Cre recombinase-induced rearrangements, this method was used on the distal super-enhancer of the OTX2 gene to generate a diverse cell pool with a randomized regulatory landscape. Using PE targeting of high-copy-number long interspersed nuclear element-1 (LINE-1) retrotransposons, thousands of recombinase recognition sites were inserted into the human genome (Fig. 5a. i)173. Recombinase-mediated recombination events subsequently generated variants covering three times the human genome in a single experiment. These events included a variety of rearrangements, such as deletions, inversions, extrachromosomal DNAs, fold-backs, and translocations. The resulting cell pool contained cells with an average of over a hundred SVs. Clones with multiple large-scale rearrangements were further characterized, revealing that when variant copy numbers were altered, they significantly impacted gene expression, although they did not affect the expression of adjacent genes. To further improve the efficiency and scale of DNA rearrangements, Sun et al. developed a programmable chromosome engineering (PCE) system, which enabled inversions of up to 12 Mb in mammalian cells174.

Another study developed a method called Genome-Shuffle-seq, which utilizes transposons to integrate loxPsym sites and distinct barcodes into various genomic locations (Fig. 5a. ii). Changes in barcode pairing during the recombination process represent the occurrence of different SVs. By coupling T7 in situ transcription with single-cell RNA-seq, precise localization of SVs and their impact on gene expression can be achieved without the need for whole-genome sequencing. In each experiment with mouse ES cells and human cancer cells, thousands of SVs were generated and localized. The functional effects of large-scale SVs on cellular adaptability, gene expression, and other aspects were further analyzed175. These two technologies provide potential technical approaches for in-depth exploration of the relationship between genotype and phenotype, as well as for the construction of the minimal mammalian genome.
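The idea that changed barcode pairings report different SV classes can be sketched with a toy classifier. This is a simplified geometric model, not the published Genome-Shuffle-seq analysis: it assumes each integrated cassette carries a left ("L") and a right ("R") barcode flanking its loxPsym site, that the cassette's chromosome, position, and strand are known, and that a read links one barcode from each of two sites. The names `Site`, `side`, and `classify` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Site:
    chrom: str
    pos: int
    strand: int  # +1: the "L" barcode lies upstream of the loxPsym site; -1: cassette reversed

def side(site, half):
    """Return 'up' or 'down': the genomic side of the site on which this barcode sits."""
    upstream_half = "L" if site.strand == 1 else "R"
    return "up" if half == upstream_half else "down"

def classify(sites, bc_a, bc_b):
    """Classify a junction from a linked barcode pair.

    Each barcode is a (site_id, half) tuple with half in {"L", "R"}.
    """
    (id_a, half_a), (id_b, half_b) = bc_a, bc_b
    if id_a == id_b:
        return "parental"          # original pairing, no rearrangement
    a, b = sites[id_a], sites[id_b]
    if a.chrom != b.chrom:
        return "translocation"     # junction joins two chromosomes
    side_a, side_b = side(a, half_a), side(b, half_b)
    if side_a == side_b:
        return "inversion"         # two upstream (or two downstream) flanks joined
    up_site, down_site = (a, b) if side_a == "up" else (b, a)
    # Upstream flank of the proximal site joined to the downstream flank of
    # the distal site means the intervening DNA was removed.
    if up_site.pos < down_site.pos:
        return "deletion"
    return "excision_circle"       # reciprocal junction on the excised product

if __name__ == "__main__":
    sites = {
        "s1": Site("chr1", 100, 1),
        "s2": Site("chr1", 500, 1),
        "s3": Site("chr2", 50, -1),
    }
    print(classify(sites, ("s1", "L"), ("s2", "R")))
```

In this model a junction can be classified from the barcode pair alone, which is why barcode-based readouts can avoid whole-genome sequencing. In practice, the reciprocal junction labeled `excision_circle` here could also arise from duplication-type events, which this sketch does not distinguish.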

In addition to the Cre-lox system, researchers have discovered a multiplexed site-specific inversion system that can be used in mammalian cells176. When multiple sites coexist, it preferentially promotes inversions between inversely oriented sites rather than deletions between same-oriented sites, allowing for complex DNA rearrangements through random inversions between multiple sites and providing a versatile tool for rearrangements in mammalian cells.

RNA-guided editing systems-based genome rearrangement techniques

RNA-guided editing systems enable genome rearrangements by using guide RNAs to direct functional enzymes to target sites in the genome, where the enzymes act. These include CRISPR/Cas9-mediated rearrangements (Fig. 5b. i) as well as bridge recombinases-mediated rearrangements (Fig. 5b. ii).

CRISPR/Cas9-mediated rearrangements primarily occur under the guidance of sgRNA: the Cas9 nuclease specifically cleaves both DNA strands, and the cell then rejoins the DNA through the non-homologous end joining (NHEJ) or homology-directed repair (HDR) pathways, stimulating rearrangements. By designing precise targeting sites, large fragment deletions, inversions, translocations, and copy number variations can be achieved in mammalian cells177,178,179,180. This method can also induce genome-wide rearrangements by targeting repetitive DNA in the mammalian genome, but doing so is typically challenging because the extensive DSBs lead to the death of most cells181.

The emergence of bridge recombinases offers a new perspective for precise genome-wide rearrangement182. Bridge RNA (bRNA), in complex with the IS622 recombinase, specifically recognizes the donor and recipient DNA, ultimately enabling precise rearrangements, including large DNA integration, inversion, and excision. Using this technology, a 0.92 Mb inversion and a 0.13 Mb deletion were achieved in human cells183 (Table 3).

Table 3 Comparison of genome rearrangement techniques

Currently, both site-specific recombination systems and RNA-guided editing systems have enabled customized Mb-scale inversion and whole-genome random rearrangements in mammalian cells173,174,175,181,183. Moreover, integration tools, such as PE and transposons, have facilitated the pre-installation of recombination sites173,175. These technological advancements enable the generation of thousands of SVs at the whole-genome level, allowing for rapid functional analysis. Separately, CRISPR/Cas9 stimulates rearrangement by inducing DSBs. However, its primary limitation in large-scale rearrangement is the high cellular toxicity associated with extensive DSBs181. A transformative breakthrough is the DSB-free RNA-guided bridge recombinase system, which can perform precise, Mb-scale DNA rearrangements183. Nevertheless, large-scale rearrangement remains constrained by operational complexity and low efficiency. Future efforts in genome rearrangement technology should focus on expanding the types and enhancing the efficiency of rearrangements via DSB-free RNA-guided enzyme systems. The ultimate goal is to create a scalable platform to enable in-depth exploration of genotype-phenotype relationships and investigation of the genomic regulatory landscape.

Applications of synthetic rewriting techniques to mammalian systems

Synthetic rewriting technologies, such as assembly, transfer, maintenance, and rearrangement, facilitate the introduction of diverse modifications into mammalian genomes and enable the customization of genetic material. These advancements have significantly expanded both the scale and diversity of the customizable genome range. Such progress allows for the construction of better cell models to understand diseases and test new therapies, while also offering insights into the long-standing complexities of genome regulation.

Constructing mammalian cell models and animal models

Synthetic rewriting technologies can support the construction of models that are difficult to achieve through traditional methods, including cell models of abnormal gene amplification and of complex gene loss or mutation, as well as humanized mouse models and aneuploid mouse models.

The in situ abnormal amplification of genes, as well as the extensive deletion or mutation of gene sequences, is difficult to simulate or treat using traditional methods. The AE technology facilitates the study of the functions and underlying mechanisms of genomic SVs by simulating gene amplification and generating repetitive DNA at different scales. The HbH model was established in the erythroid-related K562 cell line for the study of α-thalassemia, and the chromosome microduplication genetic model was established in mouse and human ES cells to explore the functional impact of SVs on chromosome scales63. Duchenne muscular dystrophy (DMD) is caused by multiple deletions or mutations in the dystrophin gene. Using a HAC carrying the complete 2.4 Mb human genomic sequence of the dystrophin gene, the genetic defects in iPS cells of human DMD patients were successfully corrected184. Missense p53 mutations are commonly observed in cancer, owing to frequent deamination of 5-methylcytosine, resulting in C to T conversion. Using the mSwAP-in technology, a 115 kb synTrp53 was generated in mouse ES cells and demonstrated enhanced resistance to spontaneous mutagenesis59.

Mice are commonly employed as preclinical models. However, due to evolutionary divergence, human disease physiology and immune responses are not fully recapitulated. Humanized mouse models constructed via mSwAP-in technology enable better modeling of human diseases. Using the mSwAP-in technology, humanized mice can be efficiently constructed by replacing the mouse Ace2 gene with human ACE2 loci of 116 kb and 180 kb, creating a more human-like infection model59. The introduction of Mb-scale human immunoglobulin genes or human TCR repertoire sequences into mice establishes models for studying human immune responses and developing monoclonal antibodies. Based on the assembly-yeast spheroplast-ES cell fusion-integration pathway, mice with human TCRαβ gene loci113,116 have been successfully established. Mice with human immunoglobulin genes135,185 were established through precise customized BACs and in situ replacements. These mice provide a platform for the discovery of therapeutic antibodies and the identification of pathogenic and therapeutic human TCRs. Constructing aneuploid mouse models is crucial for understanding chromosomal disorders and cancer186. To date, trisomic mouse models, such as Tc1 and TcMAC21, have been established using technologies like MMCT and MAC, successfully recapitulating many phenotypes of Down syndrome (DS) with broad potential to support basic and clinical research187,188. Furthermore, the induction of chromosome loss by simultaneously generating multiple DSBs at target chromosomal locations using CRISPR/Cas9 has successfully deleted sex chromosomes or autosomes in various cell lines, mouse embryos, and in vivo tissues189,190. Prospectively, leveraging insights from chromosomal rearrangement in S. cerevisiae29,162,163,164,165,166,167,168,169,170, synthetic rewriting technologies could be used to introduce programmable recombination sites into aneuploid cells via strategies such as de novo design and assembly or multiplex editing. These approaches would facilitate the establishment of a flexible and reconfigurable mouse trisomy model. Such systems would enable the systematic induction of chromosomal rearrangements, thereby elucidating the key genomic regions and intergenic interactions underlying aneuploidy-associated pathologies.

Revealing the complex regulatory mechanisms of the genome

Synthetic rewriting technologies can be used to study the complex regulatory mechanisms of the genome, including the basic principles of epigenetic establishment, the complex regulatory roles of non-coding elements, and the complex biological effects caused by chromosome fusion.

Due to the heritable and stable characteristics of epigenetic modifications on natural chromosomes, it is difficult to learn the basic principles of their establishment by erasing epigenetic marks. The SynNICE method provides a unique platform for exploring de novo epigenomic regulatory mechanisms. The de novo assembled 1.14 Mb human AZFa (hAZFa) locus was delivered into early mouse embryos, where spontaneous incorporation of mouse histones and the establishment of DNA methylation at the single-cell stage were observed51.

Non-coding ‘dark matter’ constitutes approximately 98% of the human genome, a space where multiple distant regulatory variants interact in unknown ways, and it is increasingly recognized as a significant factor in the genetics of numerous diseases and traits4,191,192,193. Traditional methods, which merely remove or disrupt individual regulatory components and often overlook their context, struggle to comprehensively analyze the complex regulatory mechanisms of the genome. The design-assembly-transfer-maintenance approach enables complete design freedom for synthetic DNA sequences, helping us address long-standing challenges in regulatory genomics. By de novo designing and assembling an 86 kb α-globin super-enhancer, a new class of regulatory elements called ‘facilitator’ was revealed. Facilitators lack intrinsic enhancer activity but potentiate the activity of classical enhancers in a position-dependent manner194. Big-IN was used to rewrite the mouse Sox2 locus, constructing deletions, rearrangements, and inversions of single or multiple distal clusters of DNase I hypersensitive sites (DHSs), as well as surgical alterations to transcription factor (TF) recognition sequences. Analysis of the expression of the rewritten endogenous Sox2 locus demonstrated the role of context in regulating the function of genomic regulatory elements195. By introducing a synthetic ~100 kb human HPRT1 locus reverse sequence (HPRT1R) into S. cerevisiae and mouse ES cells, a new perspective on the default transcriptional state of eukaryotic genomes was explored. In S. cerevisiae, both the reverse and natural HPRT1 loci exhibit widespread activity, whereas in mouse ES cells the reverse locus shows no activity at all, instead displaying repressive chromatin characteristics196. By constructing different rat HoxA variants ranging from 130 to 170 kb in yeast and integrating them into ectopic loci in the genome, the independent ability of these variants to reconstruct different aspects of HoxA regulation at ectopic clusters was tested197.

Effective methods for exploring the complex biological effects caused by chromosome fusion have long been lacking. Employing CRISPR/Cas9-mediated genome rearrangement in mouse cells, centromere-centromere chromosome fusions and centromere-telomere chromosome fusions have been achieved, yielding mice with fused chromosomes177,179,198. These models have been utilized to dissect the multifaceted impact of chromosomal fusion on gene expression, stem cell differentiation, chromatin organization, mouse phenotypes, and mouse development.

Discussion

Synthetic rewriting technologies can be used either independently or within the assembly-transfer-maintenance-rearrangement pipeline and have unique value for constructing mammalian disease models and studying complex regulatory mechanisms of genomes. Despite the substantial progress made in mammalian synthetic rewriting technologies, several limitations remain. (1) Using existing technologies, Mb-scale DNA assembly can be completed efficiently in E. coli and S. cerevisiae. However, challenges remain for assembling mammalian genomic DNA. Key limitations include the absence of methods for larger-scale assembly (such as at the 10 Mb-100 Mb scale) and difficulties in the fidelity and stability of repetitive DNA. Base insertions and deletions have been observed within mononucleotide and TA-rich repeats in some E. coli assembly strains49,50, while deleted DNA fragments have been reported in interspersed repeats at the IGH locus in some S. cerevisiae assembly strains47. To advance mammalian genome rewriting, future efforts should focus on scaling up assembly capacity, developing or modifying assembly platforms to improve the accurate assembly and stabilization of highly repetitive regions, and establishing automated workflows for large DNA construction. Additionally, it is essential to implement convenient detection technologies to identify and correct unintended errors introduced during genome manipulation. (2) Emerging technologies related to the transfer of large DNA into mammalian cells offer new solutions to overcome the limitations of in vitro operations on large DNA. However, they are still constrained by challenges, such as low efficiency, limited recipient cell types, and potential contamination from the host genome. Therefore, developing more efficient, universal, and safe transfer methods is the future trend of large DNA transfer technologies. 
(3) Recent advancements in single-copy artificial chromosomes, combined with the construction of a ~12 Mb yeast chromosome199,200, imply that yeast centromere vectors can serve as hosts for synthetic mammalian chromosomes. These studies may enable the exploration of larger artificial chromosomes constructed in yeast before transferring them into mammalian cells, offering a new possibility for rewriting mammalian chromosomes. (4) Recently, genome rearrangement technologies in mammalian cells have enabled large-scale rearrangements, facilitating the study of the potential mechanisms of SVs in mammalian cells. In terms of rearrangement strategies, enhancing the efficiency of bridge recombinases and developing multi-site rearrangements with them is expected to achieve more extensive and precise rearrangements without the need to introduce recombination sites. Additionally, in combination with technologies such as DNA assembly, it is possible to synthesize re-arrangeable sequences and induce rearrangements in specific regions. In terms of functionality, as in the Sc2.0 project, genome-scale rearrangements coupled with high-throughput analysis and detection methods not only support basic genomic research in mammalian cells but also have the potential to be used to explore phenotypic evolution.

Given that mammals are multicellular organisms, the genetic stability of their germline and tissue function is crucial, making stem cells an important cornerstone in the synthetic rewriting technology-mediated modification of mammalian cells. Stem cells, particularly pluripotent and embryonic stem cells, have expandability and potent differentiation capacity, which provides abundant platforms for subsequent disease modeling and therapy201. Currently, strategies based on DNA assembly, transfer, and maintenance have enabled the construction of animal models and the large-scale analysis of complex regulatory mechanisms in stem cells59,194,195,196,197. Notably, haploid embryonic stem cells (haESCs) are a type of stem cell containing a single set of chromosomes202,203,204, which can be combined with various synthetic rewriting technologies, providing a powerful platform for the construction of complex models and enabling large-scale functional gene screening.

The rapid development of artificial intelligence (AI) in multiple fields provides technical support for the synthetic rewriting of mammalian genomes. Generative AI has demonstrated its potential in designing and predicting functional elements205,206,207,208. For example, AlphaGenome can interpret the function of large-scale non-coding regions in the mammalian genome209, and Evo2 can predict the functional impact of DNA sequence variations, generate genomic-scale sequences, and control epigenetic structures210,211. Cello provides logic gate structures and mathematical models to represent the dynamic behavior of gene circuits212. The combination of AI-assisted genome design and synthetic rewriting technologies will further advance mammalian genome rewriting, significantly enhance the understanding of mammalian synthetic biology, and contribute to the development of human medicine and health.