Fig. 6: Flow diagram of CF-random’s blind mode.

A Foldseek database is created from the CF-random-generated structures (collectively called the ensemble) and is used to evaluate each structure’s similarity to every other structure in the ensemble. Unlike the default mode in Fig. 1, blind mode does not require two reference structures, which enables a blind search. Principal component analysis followed by two different clustering algorithms produces the final subset of CF-random conformations that represents the structural variance in the ensemble. From left to right: CF-random blind mode produces 200 predicted structures; all structures are queried against the generated database to create the similarity matrix. Predicted structures of T. maritima IMPase (PDB ID: 2P3V) are depicted within the similarity matrix representation. Principal component analysis and HDBSCAN then produce a reduced space in which similar structures cluster near each other (scatter plot, top right), and K-medoids selects representative structures from the HDBSCAN clusters (2P3V predicted structures for the green and yellow clusters are shown bottom left; structures for the purple and blue clusters are shown in Supplementary Fig. 5). Source data are provided as a Source Data file.
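The dimensionality-reduction and representative-selection steps described in this caption can be illustrated with a minimal NumPy sketch. It is not the CF-random implementation. The function names `pca_project` and `cluster_medoids` are hypothetical, the toy 6 x 6 similarity matrix and its cluster labels are invented for illustration, and choosing exactly one medoid per cluster (K-medoids with k = 1) is an assumption, since the caption does not state how many representatives are drawn from each HDBSCAN cluster.

```python
import numpy as np


def pca_project(similarity, n_components=2):
    """Project each structure's similarity profile (one matrix row)
    onto the leading principal components, using an SVD."""
    X = np.asarray(similarity, dtype=float)
    X = X - X.mean(axis=0)                      # center each column
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def cluster_medoids(points, labels):
    """Return {cluster label: index of its medoid}.

    The medoid is the member with the smallest summed Euclidean
    distance to the other members of its cluster. The label -1
    (the HDBSCAN convention for noise) is skipped.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    medoids = {}
    for lab in np.unique(labels):
        if lab == -1:
            continue
        members = np.flatnonzero(labels == lab)
        sub = points[members]
        dists = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=-1)
        medoids[int(lab)] = int(members[dists.sum(axis=1).argmin()])
    return medoids


# Toy all-vs-all similarity matrix (hypothetical values): structures 0-2
# resemble each other, as do structures 3-5.
toy_similarity = np.array([
    [1.0, 0.9, 0.9, 0.3, 0.3, 0.3],
    [0.9, 1.0, 0.9, 0.3, 0.3, 0.3],
    [0.9, 0.9, 1.0, 0.3, 0.3, 0.3],
    [0.3, 0.3, 0.3, 1.0, 0.9, 0.9],
    [0.3, 0.3, 0.3, 0.9, 1.0, 0.9],
    [0.3, 0.3, 0.3, 0.9, 0.9, 1.0],
])
toy_coords = pca_project(toy_similarity, n_components=2)

# Cluster labels are supplied by hand here; in a real run they would come
# from a density-based clusterer applied to the reduced coordinates.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_representatives = cluster_medoids(toy_coords, toy_labels)
```

In practice the labels could be obtained with `sklearn.cluster.HDBSCAN` (available in scikit-learn 1.3 and later) fitted on the reduced coordinates. That step is left out of the sketch so that it depends only on NumPy.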