Table 2. Accuracy of Cas12a protein trans-cleavage activity prediction using different strategies
From: Discovery of CRISPR-Cas12a clades using a large language model
Best model + covariates | Accuracy in test set |
---|---|
LightGBM [1] + embeddings | 69.2% |
CatBoost [2] + PCA-2 | 69.2% |
CatBoost [2] + PCA-3 | 84.6% |
CatBoost [2] + PCA-4 | 92.3% |
CatBoost [2] + PCA-5 | 84.6% |
LightGBM [1] + PCA-6 | 92.3% |
CatBoost [2] + PCA-7 | 76.9% |
RandomForest [3] + PCA-8 | 92.3% |
Neural Network [4] + PCA-9 | 84.6% |
CatBoost [2] + PCA-10 | 76.9% |
CatBoost [2] + PCA-11 | 84.6% |
CatBoost [2] + PCA-12 | 69.2% |
CatBoost [2] + PCA-13 | 69.2% |
CatBoost [2] + PCA-14 | 84.6% |
Neural Network [4] + PCA-15 | 69.2% |
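
The rows above can be summarized programmatically. The following is a minimal sketch, not the authors' analysis code: it transcribes the table, reports which model and covariate combinations reach the highest test accuracy, and checks whether each reported percentage matches a whole number of correct predictions for a given test-set size. The names `ROWS`, `top_strategies`, and `implied_correct` are hypothetical. The test-set size of 13 is an assumption inferred from the arithmetic (for example, 9/13 ≈ 69.2% and 12/13 ≈ 92.3%); the table itself does not state it.

```python
# Rows transcribed from Table 2: (best model, covariates, test accuracy in %).
ROWS = [
    ("LightGBM", "embeddings", 69.2),
    ("CatBoost", "PCA-2", 69.2),
    ("CatBoost", "PCA-3", 84.6),
    ("CatBoost", "PCA-4", 92.3),
    ("CatBoost", "PCA-5", 84.6),
    ("LightGBM", "PCA-6", 92.3),
    ("CatBoost", "PCA-7", 76.9),
    ("RandomForest", "PCA-8", 92.3),
    ("Neural Network", "PCA-9", 84.6),
    ("CatBoost", "PCA-10", 76.9),
    ("CatBoost", "PCA-11", 84.6),
    ("CatBoost", "PCA-12", 69.2),
    ("CatBoost", "PCA-13", 69.2),
    ("CatBoost", "PCA-14", 84.6),
    ("Neural Network", "PCA-15", 69.2),
]


def top_strategies(rows):
    """Return the highest accuracy and every (model, covariates) pair that reaches it."""
    best = max(acc for _, _, acc in rows)
    return best, [(model, cov) for model, cov, acc in rows if acc == best]


def implied_correct(acc_percent, n_test):
    """Return k if k / n_test rounds to acc_percent at one decimal place, else None."""
    k = round(acc_percent / 100 * n_test)
    return k if round(100 * k / n_test, 1) == acc_percent else None


best, winners = top_strategies(ROWS)
print(f"Highest test accuracy: {best}%")
for model, cov in winners:
    print(f"  {model} + {cov}")

# ASSUMPTION: 13 test proteins, inferred from the percentages, not stated in the table.
N_TEST = 13
for acc in sorted({acc for _, _, acc in ROWS}):
    print(f"{acc}% -> {implied_correct(acc, N_TEST)} of {N_TEST} correct")
```

If the assumed test-set size holds, adjacent accuracy levels in the table differ by a single protein, so small differences between strategies should be read with caution.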