Fig. 1: Bioinformatics and machine learning to derive PET hydrolase sequences from natural diversity.
From: Sourcing thermotolerant poly(ethylene terephthalate) hydrolase scaffolds from natural diversity

A. PET hydrolase candidates (74 total) selected by HMM and ML, shown with a minimum-evolution phylogenetic tree. Sequences retrieved from environmental (meta)genomes in JGI IMG with lower HMM scores (groups 1–3) are notably diverse compared to the sequences that comprise the rest of the tree (groups 4–7). The symbols around the tree show expression, activity, and previously reported PET activity. Full organism names and accession numbers are shown in Supplementary Table 9, and sequence identity between these 74 sequences and previously reported PETases is shown in Supplementary Table 8. A maximum-likelihood phylogenetic tree of all experimentally confirmed PET hydrolases is shown in Supplementary Fig. 1.

B. Sequence Similarity Network (SSN) of PET hydrolases with experimentally confirmed PET hydrolase activity, including sequences examined in this study and previously reported PETases. Edges represent pairwise BLAST similarity with E-value < 1e-10. The SSN clusters are consistent with the associated families in the ESTHER database (ref. 57), and show that most reported PET hydrolases fall in the polyester-lipase-cutinase family. We note that these clusters are different from the phylogenetic groups in (A). Full details of experimentally verified PET hydrolases are shown in Supplementary Tables 1 and 10.
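
The edge rule stated for the SSN (a pairwise BLAST hit with E-value < 1e-10) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes BLAST results in the standard 12-column tabular layout (`-outfmt 6`, E-value in the 11th column), treats connected components as a simple stand-in for SSN clusters, and uses hypothetical sequence IDs (`seqA` to `seqD`) and made-up hit values purely for demonstration.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-10  # threshold stated in the caption

# Hypothetical BLAST tabular rows (assumed -outfmt 6 column order):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
EXAMPLE_ROWS = [
    "seqA\tseqB\t62.0\t250\t95\t0\t1\t250\t1\t250\t1e-80\t300",
    "seqB\tseqC\t35.0\t240\t156\t0\t1\t240\t1\t240\t3e-15\t80",
    "seqC\tseqD\t25.0\t100\t75\t0\t1\t100\t1\t100\t0.002\t30",
]


def build_ssn(blast_lines, cutoff=E_VALUE_CUTOFF):
    """Build an undirected adjacency map; an edge requires E-value < cutoff."""
    graph = defaultdict(set)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        # Register both sequences as nodes even if no edge is drawn.
        graph[query]
        graph[subject]
        if query != subject and evalue < cutoff:
            graph[query].add(subject)
            graph[subject].add(query)
    return graph


def clusters(graph):
    """Return connected components as sorted lists of sequence IDs."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        result.append(sorted(component))
    return result


if __name__ == "__main__":
    for members in clusters(build_ssn(EXAMPLE_ROWS)):
        print(members)
```

In this toy input, the `seqC`/`seqD` hit (E-value 0.002) fails the cutoff, so `seqD` remains a singleton while the other three sequences form one component. Grouping by similarity threshold in this way is independent of tree topology, which is in keeping with the caption's note that the SSN clusters differ from the phylogenetic groups in (A).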