Fig. 1: Data acquisition and overview of the recombinase sequence data for training the deep learning approach. | Nature Communications

From: Prediction of designer-recombinases for DNA editing with generative deep learning

a Illustration of the data collection. Evolved recombinase libraries were collected and sequenced with the PacBio HiFi method to obtain high-accuracy, full-length reads of the recombinase genes. Gene sequences were translated to protein and stored with their respective target sequences. b Illustration of a Cre recombinase dimer binding the loxP DNA target sequence (top). All half-site bases covered by the sequenced recombinase libraries are shown at the bottom. Values in the coverage table indicate the number of target sites carrying the respective base (rows) at the respective half-site position (columns). c Frequency of residues selected for in all sequenced libraries, relative to Cre. Positions mutated in >50% of the sequences or with >7 different residues observed are labeled with their position number. The number of different amino acids selected for at each position is color-coded (Residues observed). Numbers in red had not been highlighted previously. d t-SNE dimensionality reduction of 100 random recombinase sequences from all sequenced libraries. Color indicates the amino acid Hamming distance (dH(Cre)) of each sequence to Cre. Zoom-ins on selected libraries are shown on the right, with their corresponding target sites indicated. Source data are provided as a Source Data file.
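The per-position statistics in panel c and the distance used to color panel d can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes every library sequence has already been translated and aligned to Cre with no insertions or deletions, and it counts "residues observed" as all distinct amino acids seen at a position, including the Cre residue. The function names and toy sequences are hypothetical. Only the two thresholds (>50% mutated, >7 residues observed) come from the caption.

```python
from collections import Counter
from typing import Dict, Sequence


def hamming(a: str, b: str) -> int:
    """Amino acid Hamming distance between two equal-length, aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def flag_positions(
    reference: str,
    variants: Sequence[str],
    min_mutated_fraction: float = 0.5,  # caption: mutated in >50% of sequences
    min_distinct_residues: int = 7,     # caption: >7 different residues observed
) -> Dict[int, dict]:
    """Return 1-based positions that pass either threshold, with their stats.

    Assumption: all variants are aligned to `reference` (same length, no indels).
    """
    n = len(variants)
    flagged = {}
    for i, ref_aa in enumerate(reference):
        # Count which residues occur at this position across the library.
        column = Counter(v[i] for v in variants)
        mutated_fraction = (n - column.get(ref_aa, 0)) / n
        distinct = len(column)  # includes the reference residue if present
        if mutated_fraction > min_mutated_fraction or distinct > min_distinct_residues:
            flagged[i + 1] = {
                "mutated_fraction": mutated_fraction,
                "residues_observed": distinct,
            }
    return flagged


if __name__ == "__main__":
    # Hypothetical toy data, not real Cre or library sequences.
    cre_toy = "MSNLLTV"
    library = ["MSNLLTV", "MSNLLAV", "MSDLLAV", "MSDLLAI"]
    print([hamming(cre_toy, s) for s in library])  # [0, 1, 2, 3]
    print(flag_positions(cre_toy, library))
    # {6: {'mutated_fraction': 0.75, 'residues_observed': 2}}
```

In the toy example, position 3 is mutated in exactly 50% of the sequences and is therefore not flagged, because the caption's criterion is strictly greater than 50%.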