Fig. 1: Overview of sequence classification benchmark workflow. | Nature Communications


From: Benchmarking DNA foundation models for genomic and genetic tasks


DNA sequences are input into foundation models, which generate token embeddings from the final layer. These embeddings undergo output pooling to produce a high-dimensional representation of each input sequence. A supervised classifier (random forest) is trained on the pooled embeddings using labeled datasets. Model performance is evaluated on an independent test set using multiple metrics, with AUROC as the primary measure.
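Two steps of this workflow, pooling and AUROC scoring, can be illustrated with a minimal sketch in plain Python. It is not the article's implementation. The caption does not say which pooling method is used, so mean pooling is an assumption here. The function names `mean_pool` and `auroc` and all numeric values are hypothetical. The random forest step is omitted, and the scores stand in for the classifier's predicted probabilities on a test set.

```python
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average token embeddings (tokens x dims) into one vector per sequence.

    Mean pooling is an assumed choice; the caption only says "output pooling".
    """
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    n_tokens = len(token_embeddings)
    n_dims = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n_tokens for d in range(n_dims)]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy input: 3 tokens with 2-dimensional embeddings (made-up values).
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Toy test set: true labels and hypothetical classifier scores.
score = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

With these toy inputs, `pooled` is one fixed-length vector regardless of how many tokens the sequence has, which is what lets a standard classifier consume sequences of different lengths. The pairwise AUROC loop is quadratic in the test set size and is meant only to make the definition explicit.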
