Fig. 3: Model performance on external validation.
From: A versatile CRISPR/Cas9 system off-target prediction tool using language model

On the DIG-seq dataset, CCLMoff outperformed the state-of-the-art (SOTA) model (AUROC=0.985, AUPRC=0.720), indicating that CCLMoff successfully captures the off-target patterns revealed by DIG-seq. On the DISCOVER-seq and DISCOVER-seq+ datasets, CCLMoff achieved superior AUPRC (AUPRC=0.661) and competitive AUROC (AUROC=0.944), indicating that CCLMoff has sufficient capacity to recall potential off-target sites. On the GUIDE-seq dataset, CCLMoff showed limited performance relative to the baseline (AUROC=0.810, AUPRC=0.279) because the baseline model was trained directly on GUIDE-seq data. This suggests that existing models tend to be approach-specific rather than general-purpose off-target site predictors.
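The legend contrasts AUROC and AUPRC, which behave differently when true off-target sites are rare among candidate sites. The sketch below is illustrative only and is not the authors' evaluation code. The function names `auroc` and `auprc` and the toy labels and scores are hypothetical. AUROC is computed as the probability that a randomly chosen positive outranks a randomly chosen negative. AUPRC is computed as average precision, the mean precision at the rank of each positive, under the assumption that all scores are distinct. In practice, library routines such as scikit-learn's `roc_auc_score` and `average_precision_score` are commonly used instead.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auprc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision; assumes distinct scores (ties are broken by input order)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank candidates from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            total += tp / rank  # precision at this positive's rank
    return total / n_pos


if __name__ == "__main__":
    # Hypothetical toy data: 2 true off-target sites among 10 candidates.
    labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    print(auroc(labels, scores))  # 0.875
    print(auprc(labels, scores))  # 0.75
```

On this toy ranking, a single false positive placed above the second true site lowers AUPRC more than AUROC. For a random scorer, the expected AUPRC is roughly the positive fraction (0.2 here), whereas the expected AUROC is 0.5. For that reason, AUPRC values on imbalanced off-target data are typically much lower than AUROC values.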