Extended Data Fig. 1: Identification of novel DNA-dependent RNA polymerase B (RNApolB) clades in the sunlit ocean.

The maximum-likelihood phylogenetic tree (LG+F+R10 model, 906 sites) is based on 2,728 RNApolB sequences longer than 800 amino acids and sharing <90% similarity (gray in the inner ring), identified from 11 large marine metagenomic co-assemblies. The analysis also includes 262 reference RNApolB sequences (red in the inner ring) from known archaeal, bacterial, eukaryotic and giant virus lineages for perspective. The middle ring shows, on a log10 scale, the number of RNApolB sequences from the 11 metagenomic co-assemblies that match the selected amino acid sequence with >90% identity. The outer ring displays the selections made for the different clades. Finally, new RNApolB lineages are labelled with a red dot for mirusviruses (subclades were characterized in subsequent analyses) and a blue dot for Proculviricetes.
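
As a minimal sketch of how the length cutoff and the middle-ring values described above could be computed, the Python snippet below filters sequences by length and converts per-representative match counts to log10. The function names and the `cluster_sizes` mapping (representative sequence ID to number of sequences matching it at >90% identity) are hypothetical and are not taken from the authors' pipeline. Whether the count includes the representative itself is likewise an assumption.

```python
import math
from typing import Dict

# Length cutoff stated in the caption: sequences longer than 800 amino acids.
MIN_LENGTH = 800


def passes_length_filter(sequence: str, min_length: int = MIN_LENGTH) -> bool:
    """Return True if the amino acid sequence is longer than the cutoff."""
    return len(sequence) > min_length


def middle_ring_values(cluster_sizes: Dict[str, int]) -> Dict[str, float]:
    """Convert match counts per representative sequence to log10 values.

    cluster_sizes is a hypothetical mapping from a representative ID to the
    number of sequences matching it at >90% identity. Entries with a count
    of zero or less are skipped because log10 is undefined for them.
    """
    return {
        rep: math.log10(count)
        for rep, count in cluster_sizes.items()
        if count > 0
    }
```

For example, a representative matched by 100 sequences would be plotted at a value of 2 in the middle ring, and one matched by a single sequence at 0.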