Extended Data Table 1 Embedding-based clustering metrics on three datasets

From: Generalized biological foundation model with unified nucleic acid and protein language

  1. Clustering scores, computed with K-Means++, for the four embedding methods on the S1, S2, and S3 datasets.
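
As a rough illustration of how a clustering score of this kind can be obtained from embeddings, the sketch below runs K-Means with K-Means++ seeding on a toy set of vectors and compares the resulting clusters with reference labels. It is not the authors' pipeline. The caption does not name the metrics, so the adjusted Rand index is used here purely as an assumed example. The toy `points`, the `true_labels`, and all function names are hypothetical.

```python
import random
from collections import Counter
from math import comb


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """K-Means++ seeding: the first center is drawn uniformly, each later
    center with probability proportional to its squared distance from the
    nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # All points coincide with existing centers; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's iterations started from K-Means++ seeds; returns cluster labels."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def adjusted_rand_index(true, pred):
    """Adjusted Rand index between two labelings (assumed example metric)."""
    n = len(true)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(true, pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(true).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. every item in a single cluster
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical toy "embeddings": two well-separated groups of 2-D vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
true_labels = [0, 0, 0, 1, 1, 1]

labels = kmeans(points, k=2, seed=0)
print("ARI:", adjusted_rand_index(true_labels, labels))
```

In practice the vectors would be the embeddings produced by each method for a given dataset, and the same procedure would be repeated per method and per dataset to fill one row of scores each.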