Figure 1
From: Studying pathogens degrades BLAST-based pathogen identification

NCBI nucleic acid and protein databases allow chimeric material to be categorized under the taxon most relevant to the study. For example, NCBI protein accession 6PCI_D is classified as Ebola virus because it is the Ebola virus GP2 protein, studied with the aid of an appended twin streptavidin tag. BLAST matching of new material against such chimeric records can then misidentify its taxon. For example, when the twin streptavidin tag is added to the mRuby protein, BLAST on its 3’ end returns 6PCI_D as the best match, since the two sequences share the tag and the last amino acid before it. The sequence is thus identified as controlled Ebola virus material despite being completely unrelated. In short, chimeric material can mislead BLAST-based identification of controlled sequences, causing benign sequences to be flagged as dangerous or dangerous sequences to be passed as benign.
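The mechanism described in this caption can be illustrated with a small toy sketch. It is not BLAST: the longest exactly shared run of residues is used as a crude stand-in for a local alignment score. Every sequence, the tag string, and the record names below are hypothetical placeholders and are not taken from 6PCI_D, mRuby, or any other database record.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for an affinity tag (illustrative string only).
TAG = "WSHPQFEKGGGSGGGSGGSAWSHPQFEK"

# Toy "database": record name -> (taxon label, sequence). All sequences are made up.
# The chimeric record carries a pathogen label although most of its matchable
# sequence near the end is the tag.
DATABASE = {
    "toy_viral_chimera": ("controlled virus", "MKTLLVAGNQRC" + "K" + TAG),
    "toy_unrelated_protein": ("benign organism", "MSDERTYIPLKVHG"),
}

def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest exact substring shared by a and b.

    A crude proxy for a local alignment score (assumption: no gaps, no
    substitution matrix, unlike real BLAST scoring).
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size

def best_hit(query: str, database: dict) -> tuple:
    """Return (record name, taxon label, score) for the highest scoring record."""
    name = max(database, key=lambda n: longest_shared_run(query, database[n][1]))
    label, seq = database[name]
    return name, label, longest_shared_run(query, seq)

# A made-up benign protein fragment that happens to end in the same residue
# ("K") before the same tag.
query = "MVSKGEELIKENMRG" + "K" + TAG

hit_name, hit_label, hit_score = best_hit(query, DATABASE)
print(hit_name, hit_label, hit_score)
```

In this sketch the tagged benign query is assigned the label of the chimeric record purely because of the shared tag and the single residue preceding it, which mirrors the misidentification the caption describes.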