Fig. 1: The procedures of HMM construction and model optimization.
From: Targeted assemblies of cas1 suggest CRISPR-Cas’s response to soil warming

HMM construction (a): near-full-length Cas1 protein sequences previously used in the Cas1 TIGRFAM and Pfam families or mentioned in the literature were retrieved from NCBI and clustered at 50% sequence identity after being aligned with MAFFT and dereplicated with RDPTools. Eight HMMs were constructed from the seed sequences in each cluster. Model optimization (b): HMM performance was evaluated using simulated reads generated from a set of reference genomes carrying 17 subtypes of CRISPR-Cas systems. We optimized the HMMs by updating the coverage of the corresponding seed sequences and Framebot files.
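To make the clustering step in panel (a) concrete, the following is a minimal sketch of grouping already-aligned sequences at a 50% identity threshold. It is not the RDPTools procedure used in the study: the greedy, representative-based strategy, the function names `pairwise_identity` and `greedy_cluster`, the choice to ignore gapped columns, and the toy sequences are all assumptions made for illustration.

```python
from __future__ import annotations


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def greedy_cluster(aligned: dict[str, str], threshold: float = 0.5) -> list[list[str]]:
    """Assign each sequence to the first cluster whose representative
    (its first member) is at least `threshold` identical; otherwise
    start a new cluster with this sequence as representative."""
    clusters: list[list[str]] = []
    for name, seq in aligned.items():
        for cluster in clusters:
            representative = aligned[cluster[0]]
            if pairwise_identity(seq, representative) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


if __name__ == "__main__":
    # Hypothetical toy alignment, not real Cas1 sequences.
    toy = {
        "s1": "MKLV-AT",
        "s2": "MKLVGAT",
        "s3": "QQRW-PP",
        "s4": "MKIVGAS",
    }
    for members in greedy_cluster(toy, threshold=0.5):
        print(members)
```

In a workflow like the one in the figure, each resulting cluster would supply the seed sequences for one HMM. Greedy clustering depends on input order, so a production tool may well produce different groupings from the same alignment.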