Abstract
Massively parallel genetic screens have been used to map sequence-to-function relationships for a variety of genetic elements¹⁻⁵. However, as these approaches interrogate only short sequences, it remains challenging to perform high-throughput assays on constructs containing combinations of multiple sequence elements arranged across multi-kb length scales. Overcoming this barrier could accelerate synthetic biology; by screening diverse gene circuit designs and learning ‘composition to function’ mappings, genetic part composability rules could be revealed, enabling rapid identification of behaviour-optimized design variants⁶,⁷. Here we introduce CLASSIC (combining long- and short-range sequencing to investigate genetic complexity), a genetic screening platform that combines long- and short-read next-generation sequencing (NGS) modalities to quantitatively assess pools of constructs of arbitrary length containing diverse genetic part compositions. We show that CLASSIC can measure expression profiles of over 10⁵ gene circuit designs (from 5–20 kb) in a single experiment in human cells. The resulting datasets can be used to train machine-learning models that accurately predict circuit behaviour across expansive circuit design landscapes, revealing part composability rules that govern circuit performance. Our study shows that, by expanding the throughput of each design–build–test–learn cycle, CLASSIC enhances the pace and scale of synthetic biology and establishes an experimental basis for data-driven design of complex genetic systems.
Data availability
All Nanopore and Illumina sequencing datasets generated in this study are available from the Sequence Read Archive (BioProject: PRJNA1347054).
Code availability
All custom scripts used for Nanopore sequencing data analysis are available at GitHub (https://github.com/cbashorlab/WIMPY). Code associated with Illumina data analysis and model training is available at GitHub (https://github.com/cbashorlab/CLASSIC). Any other scripts used for the analyses in this study are available on request.
References
de Boer, C. G. et al. Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat. Biotechnol. 38, 56–65 (2020).
Castillo-Hair, S. et al. Optimizing 5′UTRs for mRNA-delivered gene editing using deep learning. Nat. Commun. 15, 5284 (2024).
Angenent-Mari, N. M., Garruss, A. S., Soenksen, L. R., Church, G. & Collins, J. J. A deep learning approach to programmable RNA switches. Nat. Commun. 11, 5057 (2020).
Sahu, B. et al. Sequence determinants of human gene regulatory elements. Nat. Genet. 54, 283–294 (2022).
Jones, E. M. et al. Structural and functional characterization of G protein-coupled receptors with deep mutational scanning. eLife 9, e54895 (2020).
Zhang, C., Tsoi, R. & You, L. Addressing biological uncertainties in engineering gene circuits. Integr. Biol. 8, 456–464 (2016).
Kitano, S., Lin, C., Foo, J. L. & Chang, M. W. Synthetic biology: learning the way toward high-precision biological design. PLoS Biol. 21, e3002116 (2023).
English, M. A., Gayet, R. V. & Collins, J. J. Designing biological circuits: synthetic biology within the operon model and beyond. Annu. Rev. Biochem. 90, 221–244 (2021).
Mahata, B. et al. Compact engineered human mechanosensitive transactivation modules enable potent and versatile synthetic transcriptional control. Nat. Methods 20, 1716–1728 (2023).
Slusarczyk, A. L., Lin, A. & Weiss, R. Foundations for the design and implementation of synthetic genetic circuits. Nat. Rev. Genet. 13, 406–420 (2012).
Bashor, C. J. & Collins, J. J. Understanding biological regulation through synthetic biology. Annu. Rev. Biophys. 47, 399–423 (2018).
Bashor, C. J., Hilton, I. B., Bandukwala, H., Smith, D. M. & Veiseh, O. Engineering the next generation of cell-based therapeutics. Nat. Rev. Drug Discov. 21, 655–675 (2022).
Beitz, A. M., Oakes, C. G. & Galloway, K. E. Synthetic gene circuits as tools for drug discovery. Trends Biotechnol. 40, 210–225 (2022).
Kitada, T., DiAndreth, B., Teague, B. & Weiss, R. Programming gene and engineered-cell therapies with synthetic biology. Science 359, eaad1067 (2018).
Cameron, D. E., Bashor, C. J. & Collins, J. J. A brief history of synthetic biology. Nat. Rev. Microbiol. 12, 381–390 (2014).
Yeung, E. et al. Biophysical constraints arising from compositional context in synthetic gene networks. Cell Syst. 5, 11–24 (2017).
Lou, C., Stanton, B., Chen, Y. J., Munsky, B. & Voigt, C. A. Ribozyme-based insulator parts buffer synthetic circuits from genetic context. Nat. Biotechnol. 30, 1137–1142 (2012).
Muller, I. E. et al. Gene networks that compensate for crosstalk with crosstalk. Nat. Commun. 10, 4028 (2019).
Kinney, J. B., Murugan, A., Callan, C. G. Jr. & Cox, E. C. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc. Natl Acad. Sci. USA 107, 9158–9163 (2010).
Sharon, E. et al. Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters. Nat. Biotechnol. 30, 521–530 (2012).
Kosuri, S. et al. Composability of regulatory sequences controlling transcription and translation in Escherichia coli. Proc. Natl Acad. Sci. USA 110, 14024–14029 (2013).
Taskiran, I. I. et al. Cell-type-directed design of synthetic enhancers. Nature 626, 212–220 (2024).
Gosai, S. J. et al. Machine-guided design of cell-type-targeting cis-regulatory elements. Nature 634, 1211–1220 (2024).
Agarwal, V. et al. Massively parallel characterization of transcriptional regulatory elements. Nature 639, 411–420 (2025).
Bogard, N., Linder, J., Rosenberg, A. B. & Seelig, G. A deep neural network for predicting and engineering alternative polyadenylation. Cell 178, 91–106 (2019).
Khoroshkin, M. et al. A generative framework for enhanced cell-type specificity in rationally designed mRNAs. Preprint at bioRxiv https://doi.org/10.1101/2024.12.31.630783 (2024).
Gera, T., Jonas, F., More, R. & Barkai, N. Evolution of binding preferences among whole-genome duplicated transcription factors. eLife 11, e73225 (2022).
DelRosso, N. et al. Large-scale mapping and mutagenesis of human transcriptional effector domains. Nature 616, 365–372 (2023).
Zhou, Y. et al. Encoding genetic circuits with DNA barcodes paves the way for machine learning-assisted metabolite biosensor response curve profiling in yeast. ACS Synth. Biol. 11, 977–989 (2022).
Wong, A. S., Choi, G. C., Cheng, A. A., Purcell, O. & Lu, T. K. Massively parallel high-order combinatorial genetics in human cells. Nat. Biotechnol. 33, 952–961 (2015).
Matreyek, K. A. et al. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat. Genet. 50, 874–882 (2018).
Liu, H. et al. Magic pools: parallel assessment of transposon delivery vectors in bacteria. mSystems 3, e00143-17 (2018).
Weber, E., Engler, C., Gruetzner, R., Werner, S. & Marillonnet, S. A modular cloning system for standardized assembly of multigene constructs. PLoS ONE 6, e16765 (2011).
Duportet, X. et al. A platform for rapid prototyping of synthetic gene networks in mammalian cells. Nucleic Acids Res. 42, 13440–13451 (2014).
Petitclerc, D. et al. The effect of various introns and transcription terminators on the efficiency of expression vectors in various cultured cell lines and in the mammary gland of transgenic mice. J. Biotechnol. 40, 169–178 (1995).
Khalil, A. S. et al. A synthetic biology framework for programming eukaryotic transcription functions. Cell 150, 647–658 (2012).
Maeder, M. L., Thibodeau-Beganny, S., Sander, J. D., Voytas, D. F. & Joung, J. K. Oligomerized pool engineering (OPEN): an ‘open-source’ protocol for making customized zinc-finger arrays. Nat. Protoc. 4, 1471–1501 (2009).
Li, H. S. et al. Multidimensional control of therapeutic human cell function with synthetic gene circuits. Science 378, 1227–1234 (2022).
Feil, R., Wagner, J., Metzger, D. & Chambon, P. Regulation of Cre recombinase activity by mutated estrogen receptor ligand-binding domains. Biochem. Biophys. Res. Commun. 237, 752–757 (1997).
Bashor, C. J. et al. Complex signal processing in synthetic gene circuits using cooperative regulatory assemblies. Science 364, 593–597 (2019).
Donahue, P. S. et al. The COMET toolkit for composing customizable genetic programs in mammalian cells. Nat. Commun. 11, 779 (2020).
Muldoon, J. J. et al. Model-guided design of mammalian genetic programs. Sci. Adv. 7, eabe9375 (2021).
Kabadi, A. M. & Gersbach, C. A. Engineering synthetic TALE and CRISPR/Cas9 transcription factors for regulating gene expression. Methods 69, 188–197 (2014).
La Russa, M. F. & Qi, L. S. The new state of the art: Cas9 for gene activation and repression. Mol. Cell. Biol. 35, 3800–3809 (2015).
Sadowski, I., Ma, J., Triezenberg, S. & Ptashne, M. GAL4-VP16 is an unusually potent transcriptional activator. Nature 335, 563–564 (1988).
Shin, Y. et al. Spatiotemporal control of intracellular phase transitions using light-activated optoDroplets. Cell 168, 159–171 (2017).
Schneider, N. et al. Liquid-liquid phase separation of light-inducible transcription factors increases transcription activation in mammalian cells and mice. Sci. Adv. 7, eabd3568 (2021).
Gossen, M. & Bujard, H. Tight control of gene expression in mammalian cells by tetracycline-responsive promoters. Proc. Natl Acad. Sci. USA 89, 5547–5551 (1992).
Shin, Y. & Brangwynne, C. P. Liquid phase condensation in cell physiology and disease. Science 357, eaaf4382 (2017).
Tycko, J. et al. Development of compact transcriptional effectors using high-throughput measurements in diverse contexts. Nat. Biotechnol. https://doi.org/10.1038/s41587-024-02442-6 (2024).
Tague, E. P., Dotson, H. L., Tunney, S. N., Sloas, D. C. & Ngo, J. T. Chemogenetic control of gene expression and cell signaling with antiviral drugs. Nat. Methods 15, 519–522 (2018).
Jiang, K. et al. Rapid in silico directed evolution by a protein language model with EVOLVEpro. Science 387, eadr6006 (2025).
Lin, J., Luo, R. & Pinello, L. EPInformer: a scalable deep learning framework for gene expression prediction by integrating promoter-enhancer sequences with multimodal epigenomic data. Preprint at bioRxiv https://doi.org/10.1101/2024.08.01.606099 (2024).
Wimmer, E., Mueller, S., Tumpey, T. M. & Taubenberger, J. K. Synthetic viruses: a new opportunity to understand and prevent viral disease. Nat. Biotechnol. 27, 1163–1172 (2009).
Brophy, J. A. & Voigt, C. A. Principles of genetic circuit design. Nat. Methods 11, 508–520 (2014).
Pinglay, S. et al. Synthetic regulatory reconstitution reveals principles of mammalian Hox cluster regulation. Science 377, eabk2820 (2022).
Voigt, C. A. Synthetic biology 2020–2030: six commercially-available products that are changing our world. Nat. Commun. 11, 6379 (2020).
Valeri, J. A. et al. Sequence-to-function deep learning frameworks for engineered riboregulators. Nat. Commun. 11, 5058 (2020).
Hollerer, S. et al. Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping. Nat. Commun. 11, 3551 (2020).
Rai, K., Wang, Y., O’Connell, R. W., Patel, A. B. & Bashor, C. J. Using machine learning to enhance and accelerate synthetic biology. Curr. Opin. Biomed. Eng. 31, 100553 (2024).
Karst, S. M. et al. High-accuracy long-read amplicon sequences using unique molecular identifiers with Nanopore or PacBio sequencing. Nat. Methods 18, 165–169 (2021).
Chung, C. T., Niemela, S. L. & Miller, R. H. One-step preparation of competent Escherichia coli: transformation and storage of bacterial cells in the same solution. Proc. Natl Acad. Sci. USA 86, 2172–2175 (1989).
Parrish, J. R. et al. High-throughput cloning of Campylobacter jejuni ORFs by in vivo recombination in Escherichia coli. J. Proteome Res. 3, 582–586 (2004).
Currin, A. et al. Highly multiplexed, fast and accurate nanopore sequencing for verification of synthetic DNA constructs and sequence libraries. Synth. Biol. 4, ysz025 (2019).
De Coster, W., Weissensteiner, M. H. & Sedlazeck, F. J. Towards population-scale long-read sequencing. Nat. Rev. Genet. 22, 572–587 (2021).
Smith, T. F. & Waterman, M. S. Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981).
Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013).
Hermann, M. et al. Binary recombinase systems for high-resolution conditional mutagenesis. Nucleic Acids Res. 42, 3894–3907 (2014).
Acknowledgements
We thank O. Igoshin, A. Patel, Y. Lagisetty, S. Singh and the members of the Bashor laboratory for discussions. This work was supported by grants from NIH R01 EB029483 (C.J.B.), NIH R01 EB032272 (C.J.B.), ONR N00014-21-1-4006 (C.J.B.) and funding from the Robert J. Kleberg Jr and Helen C. Kleberg Foundation (C.J.B.). This work was also supported by the Genetic Design and Engineering Center (GDEC) at Rice University, which is funded by CPRIT RP210116. R.W.O. was supported by a graduate fellowship from the American Heart Association (917746). B.K. was supported by an NLM Training Program in Biomedical Informatics and Data Science fellowship (T15LM007093-31) and by NIH grant P01-AI15299901. K.D.C. was supported by NSF EF-2126387 and the Ken Kennedy Institute Computational Science & Engineering Recruiting Fellowship. T.J.T. was supported by NSF grants IIS-2239114 and EF-2126387, NIH grant P01-AI152999 and AI2Health cluster funding from Ken Kennedy Institute, Rice University. P.M. and J.W.R. were supported by NIH R35GM119461 (P.M.).
Author information
Contributions
R.W.O., K.R. and C.J.B. conceived the study. R.W.O. and K.R. carried out the experiments and developed the analysis software, with assistance from T.C.P., Y.W., L.B.C.B., K.D.S., J.A.W., S.L., T.H.Z., E.M.R. and A.S.; R.W.O., K.R. and T.C.P. developed the modular cloning scheme and LP cell line, with assistance from S.L.; B.K., K.D.C. and T.J.T. helped to develop the barcoding scheme and analysis software. R.W.O., K.R., T.C.P., Y.W., J.W.R., P.M. and C.J.B. analysed the data. C.J.B. supervised the study. R.W.O., K.R. and C.J.B. wrote the manuscript, with input from all of the authors.
Ethics declarations
Competing interests
A provisional patent application that covers technologies described in this Article has been filed by Rice University.
Peer review
Peer review information
Nature thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Extended data figures and tables
Extended Data Fig. 1 Overview of 166k-member gene circuit library construction.
a, Timeline of library construction process from level 0 part fragments to level 3 libraries. b, Schematic of the construction strategy for each level of the assembly process (grey boxes), with number of cells indicated for assembly transformation and plating (brown circles, 10 cm plates; brown rectangles, autoclave trays), colony scraping, and HEK-LP transfection (pink circles, 10 cm plates). The inputs and products for each assembly level are represented to the right. c, Nanopore sequencing read-length distribution of level 3 single-input inducible circuit library, with the relative proportion of each identifiable DNA product denoted: library members (pink), re-circularized level 3 destination vector (orange), empty level 3 destination vector (dark grey), contamination from level 0, 1, or 2 genetic parts (light grey), or unidentified DNA species (blue).
Extended Data Fig. 2 166k-member library balance and barcoding analysis metrics.
a, Confusion matrices showing the percentage of reads unambiguously assigned to each genetic part (on-diagonal) and reads ambiguously linked to two parts (off-diagonal) following level 3 library assembly. Individual values representing <0.5% of the total reads are not shown. The percentage of reads in which a part identity could not be determined is shown at the bottom of the respective part confusion matrices. b, Number of barcodes determined to be uniquely mapped to a single composition (“unique”) or multiple compositions (“non-unique”), as determined from Nanopore sequencing analysis of the level 3 library.
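The unique versus non-unique barcode tally described in panel b can be illustrated with a minimal sketch. It assumes that each Nanopore read has already been reduced to a (barcode, composition) pair; the function name, data layout and example barcodes are hypothetical and are not taken from the WIMPY code.

```python
from collections import defaultdict


def classify_barcodes(read_assignments):
    """Split barcodes by how many distinct compositions they were observed with.

    read_assignments: iterable of (barcode, composition) pairs (assumed input format).
    Returns (unique, non_unique), both dicts keyed by barcode.
    """
    compositions_by_barcode = defaultdict(set)
    for barcode, composition in read_assignments:
        compositions_by_barcode[barcode].add(composition)

    unique, non_unique = {}, {}
    for barcode, compositions in compositions_by_barcode.items():
        if len(compositions) == 1:
            # Barcode maps to exactly one composition.
            unique[barcode] = next(iter(compositions))
        else:
            # Barcode was seen with several compositions and is ambiguous.
            non_unique[barcode] = sorted(compositions)
    return unique, non_unique


# Hypothetical reads: a barcode sequence plus a tuple of part identities.
reads = [
    ("ACGTAC", ("promoter_1", "TA_2")),
    ("ACGTAC", ("promoter_1", "TA_2")),
    ("TTGCAA", ("promoter_3", "TA_1")),
    ("TTGCAA", ("promoter_2", "TA_1")),
]
unique, non_unique = classify_barcodes(reads)
print(len(unique), "unique;", len(non_unique), "non-unique")
```

In this toy input, one barcode is consistently linked to a single composition and the other is linked to two, so the latter would be counted as non-unique.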
Extended Data Fig. 3 Comparison and optimization of ML models.
a, Top: schematic outlining the process of generating training (light grey), validation (blue), test (purple), and isolate (navy) sets. CLASSIC data sets are first divided into high- (>12 barcodes, dark grey) or general-quality (<12 barcodes, light grey) sets before generating the training, validation, and test splits. Bottom: comparison of the performance of 5 different model classes (linear regression, quadratic regression, random forest (RF), convolutional neural network (CNN), and multi-layer perceptron (MLP)) for predicting circuit behaviour using varying amounts of training data (x-axis), as monitored using test (purple line), validation (light blue line), and isolate (navy line) set r² values (y-axis) (see supplementary text section S3.5). Grey shaded region on each plot represents a regime in which the training set is dominated by general-quality reads. For the selected model class (MLP), training curves of the root mean squared error (RMSE) and loss for the validation set are provided. b, Basal (purple line) and induced (navy line) r² values for predicted vs observed CLASSIC expression from the trained MLP. Insets represent CLASSIC vs predicted measurements for >1 (bottom left, r² = 0.43) and >12 (top middle, r² = 0.80) barcodes per composition, and the number of compositions for increasing number of barcodes per composition (bottom right). c, Hyperparameter optimization (HPO) of the MLP, monitored using the validation set: learning rate (LR, y-axis) for different numbers of layers (x-axis), with 4 layers and a learning rate of 5 × 10⁻² providing the highest r² (red outline) (top); solver choice, with stochastic gradient descent with momentum (SGDM) leading to the highest r² (red bar) (bottom left); momentum, with a momentum of 0.9 providing the highest r² (red circle) (bottom right). d, Comparing ground-truth basal expression (left), induced expression (middle), and fold change measurements (right) (x-axis) with HPO MLP predictions (y-axis) for the isolate set (r² = 0.96, r² = 0.91, MAE = 0.22, respectively, n = 40).
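The quality split and the two agreement metrics named in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes r2 denotes the squared Pearson correlation, and it assigns compositions with exactly 12 barcodes to the general-quality set because the caption leaves that boundary unspecified. All function names and example values are hypothetical.

```python
def split_by_quality(barcode_counts, threshold=12):
    """Partition compositions into high- and general-quality sets by barcode count.

    Assumption: a count equal to the threshold is treated as general quality.
    """
    high = sorted(c for c, n in barcode_counts.items() if n > threshold)
    general = sorted(c for c, n in barcode_counts.items() if n <= threshold)
    return high, general


def pearson_r2(observed, predicted):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical barcode counts per composition.
counts = {"comp_A": 30, "comp_B": 12, "comp_C": 3, "comp_D": 15}
high_quality, general_quality = split_by_quality(counts)

# Hypothetical observed and predicted expression values.
observed = [2.0, 3.0, 4.0, 5.0]
predicted = [2.1, 2.9, 4.2, 4.8]
print(high_quality, general_quality)
print(round(pearson_r2(observed, predicted), 3), round(mean_absolute_error(observed, predicted), 3))
```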
Extended Data Fig. 4 Summary of cell lines constructed from the 166k-library design space to validate predictions from the MLP model.
a, Basal and induced eGFP expression levels for each constructed cell line circuit composition overlaid onto single-input behaviour space (clonally isolated, green; constructed out-of-sample, red; constructed in-sample, teal; contour for 97.5% of compositions in the MLP-predicted behaviour space, grey). Dotted lines separate behavioural regions of interest: low basal (<500 AU), purple arrow; high induction (>70k AU), blue; high fold-change (HFC) (>25x), green. b, Comparing fold-change values for MLP model predictions (red, left, n = 136) or CLASSIC measurements (green, right, n = 121) with ground truth cell line measurements. c, Residual between model and CLASSIC measurements. Heatmap corresponds to the Manhattan distance between residuals of basal and induced expression values calculated across a 20 × 20 grid in the behaviour space. Variants were assigned to grid cells based on CLASSIC measured values. d, Basal and induced expression behaviour for the 10 highest-error compositions from the individual variants shown in (a). CLASSIC measurements (grey) and corresponding ground truth values measured from constructed cell lines (green) are shown. Lines (blue dashed) link each pair of CLASSIC and ground truth expression.
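One way to read the heatmap in panel c is sketched below. It assumes that the Manhattan distance for a variant is the sum of the absolute basal and induced residuals, that values are on a log10 scale, and that each grid cell reports the mean distance of the variants assigned to it. None of these details is stated in the caption, and the field names and axis limits are hypothetical.

```python
def grid_index(value, lo, hi, n_bins=20):
    """Map a value in [lo, hi] to a bin index between 0 and n_bins - 1."""
    fraction = (value - lo) / (hi - lo)
    return min(n_bins - 1, max(0, int(fraction * n_bins)))


def residual_heatmap(variants, lo, hi, n_bins=20):
    """Mean Manhattan residual per grid cell; cells without variants are None.

    Rows index induced expression and columns index basal expression,
    both taken from the CLASSIC measurement (assumed log10 AU).
    """
    sums = [[0.0] * n_bins for _ in range(n_bins)]
    counts = [[0] * n_bins for _ in range(n_bins)]
    for v in variants:
        row = grid_index(v["induced_measured"], lo, hi, n_bins)
        col = grid_index(v["basal_measured"], lo, hi, n_bins)
        distance = (abs(v["basal_predicted"] - v["basal_measured"])
                    + abs(v["induced_predicted"] - v["induced_measured"]))
        sums[row][col] += distance
        counts[row][col] += 1
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else None for c in range(n_bins)]
        for r in range(n_bins)
    ]


# Hypothetical variant with measured and model-predicted values (log10 AU).
variants = [
    {"basal_measured": 2.1, "induced_measured": 4.9,
     "basal_predicted": 2.3, "induced_predicted": 4.6},
]
heatmap = residual_heatmap(variants, lo=2.0, hi=5.0)
print(heatmap[19][0])
```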
Extended Data Fig. 5 Clustering analysis of HFC compositions.
a, Gap test for cluster number (left) and subsequent UMAP projection of HFC variants grouped into 3 clusters (cluster A, blue; cluster B, red; cluster C, purple) (middle left). Cluster similarity scores from 100 independent k-means clustering outcomes for all compositions, and adjusted Rand score for all pairs of clustering results (right). Means represent the population cluster similarity and adjusted Rand index, respectively. b, Part usage frequency for variants in each cluster. c, Mapping of basal and induced eGFP expression of compositions from each of the three clusters, overlaid on a contour constructed from 97.5% of the data from the MLP-predicted behaviour space (see Fig. 4a) (grey fill). d, Distribution of basal (dotted line) and induced eGFP expression (solid line) (bottom axis), as well as fold change values (grey line, top axis) for each cluster. Circles represent the median values, boxes span the 25th to 75th percentiles, and the upper and lower whiskers represent the median +/− 1.5x IQR (line ends). Median values are shown to the left of the plot. Sample sizes (n): Cluster A = 5,018, Cluster B = 452, and Cluster C = 62.
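The pairwise stability score in panel a relies on the adjusted Rand index, which compares two cluster assignments of the same items while correcting for chance agreement. The sketch below implements the standard formula with the Python standard library and averages it over all pairs of clustering runs; the example labelings are hypothetical, and the averaging step is an assumption about how the reported mean was obtained.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items (n >= 2)."""
    n = len(labels_a)
    # Pairs of items that share a cluster in both labelings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of items that share a cluster within each labeling.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    maximum = 0.5 * (sum_a + sum_b)
    if maximum == expected:
        # Degenerate case, for example both labelings place everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


def mean_pairwise_ari(clusterings):
    """Average adjusted Rand index over every pair of clustering runs."""
    scores = [adjusted_rand_index(a, b) for a, b in combinations(clusterings, 2)]
    return sum(scores) / len(scores)


# Hypothetical cluster assignments from three independent runs.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first run, with labels swapped
    [0, 0, 1, 2],
]
print(adjusted_rand_index(runs[0], runs[1]))
print(round(adjusted_rand_index(runs[0], runs[2]), 4))
print(round(mean_pairwise_ari(runs), 4))
```

Because the index depends only on which items are grouped together, relabelled but otherwise identical partitions score 1.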
Extended Data Fig. 6 Fine-tuning model.
a, Schematic depicting a proposed method for expanding the model-predicted design space (red) to include 2 new parts (NFZ and no IDP) by fine-tuning the model using small libraries of new parts (green) (left). Representation of the position of the new parts in the synTF architecture (right). b, Schematic outlining the assembly strategy to retroactively add the TA NFZ to the design space. Transparent green boxes signify individual plasmids or plasmid pools that contain the new part. c, Schematic outlining the assembly strategy to retroactively add IDP-less variants to the design space. Transparent green boxes signify individual plasmids or plasmid pools that contain the new part. d, eGFP distributions for the new libraries to explore this sub-space. e, Table of the number of cells sorted into each bin for both inducer conditions during flowSeq. f, 8 individually constructed variants from the sub-space to validate CLASSIC measurements. Grey region, ERCH; Green square, HFC region. g & h, Comparison of basal eGFP expression predictions and CLASSIC measurements for a high-quality test set of compositions lacking an IDP (panel g) or containing an NFZ TA (panel h), using either a base model (white dots with black outline, r² = 0.90 or r² = 0.81, respectively) or a fine-tuned model (purple dots, r² = 0.94 or r² = 0.89, respectively) (left). Breakdown of the basal (purple) and induced (teal) expression prediction accuracy with increasing amounts of fine-tuning data from the IDP-lacking (g) or NFZ-containing (h) libraries, as assessed by monitoring the test set r² (middle). A 2D map outlining the amounts of base library and no IDP library (g) or NFZ-containing (h) data required for optimal fine-tuning of the base model, as determined by the test set r² (right). i, 11 individually constructed variants from the sampled (teal) and un-sampled (red) expanded design space to validate fine-tuned model predictions. Grey region, ERCH; Green square, HFC region.
Extended Data Fig. 7 Hyperparameter tuning and validation of base MLP model.
a, Hyperparameter optimization for the multi-layer fully connected neural network. Validation r² values for (left) 2D combinations of learning rate (y-axis) and number of layers (x-axis) for varied amounts of training data used, (top right) momentum parameter for stochastic gradient descent with momentum (SGDM), and (bottom right) different solvers. The best-performing parameter value from each scan is shown in red. b, Training curves showing RMSE (top) and loss (bottom) as a function of training iteration for the validation set. Training was stopped with a validation patience parameter of 300 iterations. c, Comparison of model predictions to randomly isolated clones from the library. MAE: mean absolute error. d, Scatter plots of the test set from the base model for each of the four input conditions (basal, light grey; OHT only, navy; GZV only, orange; both inducers, green). r², Pearson’s r².
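The stopping rule in panel b can be illustrated with a generic patience-based early-stopping sketch. It assumes that patience is counted in iterations since the last improvement in validation loss; how the authors' training framework counts patience is not stated here, and the function name and loss values are hypothetical.

```python
def early_stopping_iteration(validation_losses, patience=300):
    """Return (stop_iteration, best_iteration) for a sequence of validation losses.

    Training stops once `patience` iterations have passed without a new best loss.
    If that never happens, the last iteration is returned as the stopping point.
    """
    best_loss = float("inf")
    best_iteration = 0
    for iteration, loss in enumerate(validation_losses):
        if loss < best_loss:
            # New best validation loss: remember where it occurred.
            best_loss, best_iteration = loss, iteration
        elif iteration - best_iteration >= patience:
            return iteration, best_iteration
    return len(validation_losses) - 1, best_iteration


# Hypothetical validation losses; a short patience is used to keep the example small.
losses = [1.0, 0.8, 0.6, 0.7, 0.65, 0.9, 0.5]
stop_at, best_at = early_stopping_iteration(losses, patience=3)
print(stop_at, best_at)
```

In this toy run the best loss occurs at iteration 2 and training halts at iteration 5, so the later improvement at iteration 6 is never reached. A longer patience trades extra training time for a lower risk of stopping on a temporary plateau.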
Extended Data Fig. 8 Validation of model predictions with individual measurements.
a, Flow cytometry was used to measure basal, OHT-induced, GZV-induced, and dual-induced eGFP expression levels for 36 individually constructed cell lines harbouring integrated circuit compositions sampled from across the multi-input library behaviour space. Green bar plots (flow cytometry measurement) and dotted red outlines (model predictions) are shown for each circuit for all four conditions (far left, basal; middle left, 4-OHT-induction; middle right, GZV-induction; far right, dual induction). Bars represent the mean of the expression distribution for a single measurement. KL-divergence (DKL) from AND (top) or OR (bottom) gate shown below each plot (prediction, red; measurement, green). Numbers in grey circles represent an index for that circuit. A legend explaining the layout of each plot is shown in a grey rectangle. b, The AND-OR coordinates of each cell line (green dots), superimposed on the contour of the design space (grey).
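A divergence from an ideal gate, as reported below each plot in panel a, could be computed along the following lines. This is only a sketch under stated assumptions: the four expression values are normalized to sum to 1, the ideal AND gate places all signal in the dual-induction condition, the ideal OR gate splits signal evenly across the three induced conditions, and the divergence is taken from the ideal profile to the measured one. The source does not specify these choices, and all names and values are hypothetical.

```python
import math

# Condition order: basal, 4-OHT only, GZV only, both inducers.
IDEAL_GATES = {
    "AND": [0.0, 0.0, 0.0, 1.0],
    "OR": [0.0, 1 / 3, 1 / 3, 1 / 3],
}


def normalize(values, eps=1e-9):
    """Turn non-negative expression values into a probability-like profile."""
    shifted = [max(v, 0.0) + eps for v in values]  # eps avoids division by zero
    total = sum(shifted)
    return [v / total for v in shifted]


def kl_divergence(p, q):
    """D_KL(p || q) in nats; terms where p is zero contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def gate_divergence(expression, gate):
    """Divergence of a measured four-condition profile from an ideal gate."""
    return kl_divergence(IDEAL_GATES[gate], normalize(expression))


# Hypothetical eGFP means (AU) for an AND-like and an OR-like circuit.
and_like = [100, 150, 120, 9000]
or_like = [100, 8000, 8200, 8500]
print(round(gate_divergence(and_like, "AND"), 3), round(gate_divergence(and_like, "OR"), 3))
print(round(gate_divergence(or_like, "AND"), 3), round(gate_divergence(or_like, "OR"), 3))
```

Under these assumptions, a smaller value means the circuit behaves more like the corresponding gate, so the first example is closer to AND and the second is closer to OR.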
Extended Data Fig. 9 Extended AND-gate clustering analysis of the multi-input library.
a, Clustering of the multi-input AND-like behaviour space (top left) and part usage across the clusters. b, AND cluster expression distributions across the 4 input conditions (basal, black; 4-OHT, navy; GZV, orange; Both, green), represented by boxplots outlining the interquartile range (IQR) (box), the median (white band), and the median +/− 1.5x IQR (line ends). Median values are shown to the left of the plot. Sample sizes (n): Cluster A = 11,167, Cluster B = 9,854, and Cluster C = 3,627. c, Cluster stability analysis of compositions in the AND-like behaviour space by computing the cluster similarity index (top) across 100 UMAP projections and cluster calculations, and adjusted Rand index (bottom) across every pairwise combination of clustering results across the 100 UMAP projections and cluster calculations.
Extended Data Fig. 10 Extended OR-gate clustering analysis of the multi-input library.
a, Clustering of the multi-input OR-like behaviour space (top left) and part usage across the clusters. b, OR cluster expression distributions across the 4 input conditions (basal, black; 4-OHT, navy; GZV, orange; Both, green), represented by boxplots outlining the interquartile range (IQR) (box), the median (white band), and the median +/− 1.5x IQR (line ends). Median values are shown to the left of the plot. Sample sizes (n): Cluster A = 9,240, Cluster B = 7,138, Cluster C = 2,908, and Cluster D = 1,545. c, Cluster stability analysis of compositions in the OR-like behaviour space by computing the cluster similarity index (top) across 100 UMAP projections and cluster calculations, and adjusted Rand index (bottom) across every pairwise combination of clustering results across the 100 UMAP projections and cluster calculations.
Supplementary information
Supplementary Information
Supplementary Notes and Supplementary Figures supporting the Article and its Extended Data Figures.
Supplementary Table 1
DNA sequences used in this study. This includes genetic parts and primers.
Supplementary Table 2
Flow cytometry measurements, model predictions and CLASSIC measurements (where applicable) for individually constructed variants and isolated cell lines.
Supplementary Table 3
Part use, MI and clustering information from the single-input and multi-input libraries.
Supplementary Table 4
A breakeven table for calculating the cost of CLASSIC experiments of varying sizes and complexity.
Supplementary Table 5
A list of published single- and dual-input inducible circuits, as well as information such as cell type, integration method, enrichment strategy, inducer molecule, output and FC (where applicable).
About this article
Cite this article
Rai, K., O’Connell, R.W., Piepergerdes, T.C. et al. Ultra-high-throughput mapping of genetic design space. Nature (2026). https://doi.org/10.1038/s41586-025-09933-9