Figure 2
From: Scaling up DNA digital data storage by efficiently predicting DNA hybridisation using deep learning

Visual summary of the yield distribution at 57 °C (the temperature used throughout the paper). For a discussion of how the yield behaves at other temperatures, see Supplementary Information 14. The yield is binned into 10 groups, each spanning an interval of width 0.1. Brighter colours correspond to higher yield, and the colour scale is shared between the two subfigures. (a) Low and high values are the most numerous, which is expected given our generative procedure. (b) The density is highest at the two extremes, 0 and 1. Given how sensitive the molecules are to even a 1-base change, we treat the entire intermediate range of yields [0.1, 0.9) as a single class when assessing how balanced the dataset is. On this basis, 1,058,364 pairs achieve low yields (\(< 0.1\)), 769,750 achieve very high yields (\(\ge 0.9\)) and 728,862 fall in between. It is important that samples with extremely low or high yields are well represented. In particular, there are virtually endless combinations of base pairs resulting in minimum yield.
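The binning and the three-way balance check described in the caption can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names (`bin_index`, `balance_class`, `summarise`) and the toy yield values are hypothetical, and it assumes yields lie in [0, 1] with a yield of exactly 1 assigned to the last bin. Only the three counts in `REPORTED_COUNTS` are taken from the caption.

```python
from collections import Counter

N_BINS = 10  # ten bins of width 0.1, as in the caption

# Class counts reported in the caption (low < 0.1, high >= 0.9, rest in between).
REPORTED_COUNTS = {"low": 1_058_364, "high": 769_750, "intermediate": 728_862}


def bin_index(y, n_bins=N_BINS):
    """Map a yield in [0, 1] to an equal-width bin index in 0..n_bins-1.

    Assumption: a yield of exactly 1.0 is placed in the last bin.
    """
    if not 0.0 <= y <= 1.0:
        raise ValueError(f"yield out of range: {y}")
    return min(int(y * n_bins), n_bins - 1)


def balance_class(y):
    """Collapse a yield into the three classes used for the balance check."""
    if y < 0.1:
        return "low"
    if y >= 0.9:
        return "high"
    return "intermediate"  # the whole range [0.1, 0.9) counts as one class


def summarise(yields):
    """Return (per-bin histogram, per-class counts) for a list of yields."""
    hist = Counter(bin_index(y) for y in yields)
    classes = Counter(balance_class(y) for y in yields)
    return [hist.get(i, 0) for i in range(N_BINS)], dict(classes)


def class_fractions(counts):
    """Convert class counts into fractions of the total."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    toy_yields = [0.0, 0.05, 0.5, 0.9, 0.95, 1.0]  # hypothetical values
    hist, classes = summarise(toy_yields)
    print("histogram:", hist)
    print("classes:", classes)
    for name, frac in class_fractions(REPORTED_COUNTS).items():
        print(f"{name}: {frac:.1%} of reported pairs")
```

Treating [0.1, 0.9) as one class means the balance check compares three groups of broadly similar size instead of ten bins whose counts differ widely.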