Fig. 4

Distribution of protein sequence lengths across the modeling types. Length distributions are shown for the whole UniProt reference proteome UP000005640, for sequences not considered in our dataset (omitted), and for sequences considered in our dataset, together with the length distributions of sequences modeled with homology modeling and with the three AI-guided prediction tools AlphaFold 2 (if available, downloaded from the AlphaFold database), OpenFold and ESMFold. Boxplots are superimposed on the respective violin plots. Protein length is shown on a logarithmic scale.