Table 2 Cas12a protein trans-cleavage activity prediction accuracy using different strategies

From: Discovery of CRISPR-Cas12a clades using a large language model

| Best model + covariates | Accuracy in test set |
| --- | --- |
| LightGBM [1] + embeddings | 69.2% |
| CatBoost [2] + PCA-2 | 69.2% |
| CatBoost [2] + PCA-3 | 84.6% |
| CatBoost [2] + PCA-4 | 92.3% |
| CatBoost [2] + PCA-5 | 84.6% |
| LightGBM [1] + PCA-6 | 92.3% |
| CatBoost [2] + PCA-7 | 76.9% |
| RandomForest [3] + PCA-8 | 92.3% |
| Neural Network [4] + PCA-9 | 84.6% |
| CatBoost [2] + PCA-10 | 76.9% |
| CatBoost [2] + PCA-11 | 84.6% |
| CatBoost [2] + PCA-12 | 69.2% |
| CatBoost [2] + PCA-13 | 69.2% |
| CatBoost [2] + PCA-14 | 84.6% |
| Neural Network [4] + PCA-15 | 69.2% |
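
The accuracies in the table take only four distinct values (69.2%, 76.9%, 84.6%, 92.3%), which suggests a small test set where each protein shifts the accuracy by a fixed step. The table does not state the test-set size, so the following is only a sketch of how one might infer it, assuming each accuracy is a fraction k/n rounded to one decimal place. The function name `smallest_consistent_test_size` and the search bound `max_n` are hypothetical and introduced here for illustration.

```python
# Distinct test-set accuracies reported in the table, in percent.
reported = [69.2, 76.9, 84.6, 92.3]


def smallest_consistent_test_size(percentages, max_n=100):
    """Return the smallest n for which every percentage equals
    round(100 * k / n, 1) for some integer k in 0..n, or None if
    no such n exists up to max_n (an arbitrary search bound)."""
    for n in range(1, max_n + 1):
        achievable = {round(100 * k / n, 1) for k in range(n + 1)}
        if all(p in achievable for p in percentages):
            return n
    return None


# Inferred test-set size under the rounding assumption above.
n = smallest_consistent_test_size(reported)

# Number of correctly classified proteins implied by each accuracy.
correct_counts = {p: round(p * n / 100) for p in reported}

print(n)
print(correct_counts)
```

Under this assumption, the best rows (92.3%) and the weakest rows (69.2%) differ by only a few test proteins, so differences between adjacent PCA settings should be read with that granularity in mind.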