Kawasaki et al. construct a dataset covering 26 viral families and use large language models pre-trained on nucleotide sequences to identify zoonotic viruses with the potential to infect humans. The models achieved high predictive performance, even on partial viral sequences, although not all zoonotic lineages could be identified.
- Junna Kawasaki
- Tadaki Suzuki
- Michiaki Hamada