Extended Data Fig. 10: Compositional bias of ribozyme-synthesised amplification products towards family box codons.

Left, estimated percentages of each triplet substrate incorporated into amplification products in the 73-cycle unseeded amplification reaction (Fig. 5b). The triplet composition of pool H products (Extended Data Fig. 9a) was multiplied by the observed yield of RNA product in the reaction (the equivalent of 284.5 pmol of triplets incorporated out of 800 pmol supplied; see Fig. 5c) to estimate the consumption of each triplet (from the 12.5 pmol of each of the 64 triplets available in the analytical sample). Percentage consumption is shown to the right of each triplet sequence and reveals strong biases: some triplets are barely incorporated, whereas others are almost fully consumed. Calculated consumptions exceeding 100% for some triplets may reflect sequence, structure and/or length preferences of RNA intercalation, adapter ligation and/or RT-PCR during the work-up, which is not unusual when sequencing short RNAs (ref. 43). GC-rich triplets are particularly depleted from the substrate pool; their high degree of utilisation might also reflect their better capacity to initiate RNA synthesis on the growing product pool (Extended Data Fig. 4) and their more effective inhibition of strand reannealing (Extended Data Fig. 3). Triplets highlighted in grey represent family box codons (FBCs) in the genetic code, which encode the same amino acid irrespective of their third position. Triplets written in purple represent the anticodons of FBCs. Right, correlations between these classifications and triplet usage. TPR RNA products are 3.8-fold enriched in family box codons, which are thought to have comprised an early genetic code. Both nucleobase composition and triplet sequence preferences contribute: random sequences with identical nucleobase composition yield a less pronounced 2.8-fold FBC preference, and there is just a 1.6-fold preference for family box anticodons, which inherently have the same GC composition as the FBCs.
TPR-generated amplification products would therefore likely generate longer peptides when translated using a putative primordial genetic code (assuming identical triplet/codon register). As might be expected from their base composition, triplets corresponding to modern stop codons were rarely incorporated.
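
The arithmetic behind the consumption estimates and the family box classification can be sketched as follows. This is a minimal illustration, not the authors' analysis code. The function names are hypothetical. The family boxes are taken from the standard genetic code, and the fold preference is assumed here to be the ratio of total FBC usage to total non-FBC usage, which may differ from the exact definition used for the figure. Only the 284.5 pmol and 12.5 pmol values come from the legend; no measured triplet compositions are reproduced.

```python
from itertools import product

BASES = "ACGU"
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]  # all 64 triplets

# Values stated in the legend.
TOTAL_INCORPORATED_PMOL = 284.5    # triplet equivalents incorporated into product
SUPPLIED_PER_TRIPLET_PMOL = 12.5   # each of the 64 triplets (800 pmol in total)

# First two positions of the eight family boxes of the standard genetic code
# (Leu CUN, Val GUN, Ser UCN, Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN).
FAMILY_BOX_PREFIXES = {"CU", "GU", "UC", "CC", "AC", "GC", "CG", "GG"}


def consumption_percent(fraction_of_products):
    """Percentage of one triplet's supply consumed, given its share (0 to 1)
    of all triplets found in the sequenced products."""
    incorporated_pmol = fraction_of_products * TOTAL_INCORPORATED_PMOL
    return 100.0 * incorporated_pmol / SUPPLIED_PER_TRIPLET_PMOL


def is_family_box(triplet):
    """True if the triplet's amino acid is fixed by its first two positions."""
    return triplet[:2] in FAMILY_BOX_PREFIXES


def fbc_fold_preference(composition):
    """Assumed definition: total FBC share divided by total non-FBC share.
    `composition` maps each triplet to its fraction of product triplets."""
    fbc_share = sum(f for t, f in composition.items() if is_family_box(t))
    return fbc_share / (sum(composition.values()) - fbc_share)


def expected_fold_from_bases(base_freq):
    """FBC preference expected for random sequence with the given
    nucleobase frequencies (positions treated as independent)."""
    composition = {
        t: base_freq[t[0]] * base_freq[t[1]] * base_freq[t[2]] for t in TRIPLETS
    }
    return fbc_fold_preference(composition)


if __name__ == "__main__":
    uniform = {t: 1 / 64 for t in TRIPLETS}
    print(consumption_percent(1 / 64))     # consumption for an unbiased product pool
    print(fbc_fold_preference(uniform))    # no FBC preference without bias
```

Under these assumptions, a uniform product composition (1/64 per triplet) corresponds to about 35.6% consumption of every triplet, and a calculated consumption above 100% arises once a single triplet accounts for more than 12.5/284.5, or roughly 4.4%, of the triplets in the sequenced products.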