Extended Data Fig. 4: Experimental validation of proteins identified through ProTrek searches.
From: A trimodal protein language model enables advanced protein searches

a, Sequence alignment of hUDG-Y147A with proteins identified via ProTrek searches. S-V1 to S-V5 represent the top five hits obtained from the sequence-to-sequence search, while T-V1 to T-V5 represent the top five hits obtained from the text-to-sequence search. Yellow-highlighted regions denote highly conserved sequences. Sequence alignment was performed using Clustal Omega. b, Proteins S-V1 to S-V5 and T-V1 to T-V5 were fused with Cas9n following the introduction of a mutation analogous to UDG-Y147A. Notably, S-V1 and T-V1 refer to the same protein. eGFP was used as a negative control, while the mock group represented cells without any treatment. HeLa cells were transfected with the base editor constructs alongside specific sgRNAs targeting Dicer-1 and VEGFA. Five days post-transfection, thymine nucleotide substitutions at the target sites were quantified using high-throughput sequencing (HTS), with the mutated nucleotide positions annotated relative to the 5′ end of the protospacer. Data are presented as mean ± s.d. from two independent experiments (n = 2).