Fig. 1: Approaches to PPI modeling and MINT overview.

a Existing PLMs process multiple interacting proteins either by concatenating their output embeddings (left) or by concatenating their input tokens (right). The former requires a separate pass through the PLM for each sequence to generate its embedding independently; the embeddings are then concatenated. The latter treats the interacting sequences as a single sequence and generates embeddings for the concatenated sequence. b MINT treats multiple interacting sequences as separate entities and generates embeddings contextually, preserving cross-sequence relationships while maintaining scalability. This enables it to learn from the vast number of physical PPIs in STRING-DB20 using a modified version of the masked language modeling (MLM) loss. c The workflow and architecture of MINT. Each protein sequence is tokenized using the ESM-2 tokenizer10, with special tokens added for the start and end of the sequence. Note that these special tokens are added to each interacting sequence, preserving sequence identity. Our architecture adds cross-attention blocks to the base ESM-2 model, so the output representation of each token is influenced both by tokens in the same sequence and by tokens in the interacting sequences. Each block is repeated L times, where L = 33 for MINT. d A non-exhaustive list of protein types, PPI properties, and research questions that can be evaluated using MINT. We benchmark MINT against other PLMs on general protein complexes, antibodies, and TCR-epitope-MHC interactions, and then illustrate the types of analysis it enables by predicting oncogenic PPIs and SARS-CoV-2 antibody cross-neutralization from experimentally labeled data. Created in BioRender. Ullanat, V. (2025) https://BioRender.com/d80o431.
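
The per-chain tokenization described in panel c can be sketched as follows. This is an illustrative example using the HuggingFace release of the ESM-2 tokenizer; the checkpoint name and the two toy sequences are our own hypothetical choices, not MINT's actual preprocessing code. The point is only that each interacting chain is wrapped in its own start/end special tokens, rather than the pair being tokenized as one string.

```python
# Illustrative sketch: tokenize each interacting chain separately so that
# every chain keeps its own <cls> ... <eos> special tokens (panel c).
from transformers import AutoTokenizer

# Assumption: the public ESM-2 650M checkpoint on HuggingFace.
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")

chain_a = "MKTAYIAKQR"  # hypothetical interacting sequences
chain_b = "GSHMLEDPV"

# add_special_tokens=True wraps each chain in its own start/end tokens,
# preserving sequence identity within the multi-chain input.
tokens_a = tokenizer(chain_a, add_special_tokens=True)["input_ids"]
tokens_b = tokenizer(chain_b, add_special_tokens=True)["input_ids"]
```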
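
The block structure in panel c, self-attention within a chain plus cross-attention to the interacting chain, repeated L = 33 times, might look roughly like the PyTorch sketch below. The class name `MINTBlock`, the pre-norm layout, and the two-chain interface are assumptions made for illustration; only the overall pattern (per-chain self-attention followed by cross-sequence attention) comes from the figure, and the authors' implementation may differ.

```python
import torch
import torch.nn as nn

class MINTBlock(nn.Module):
    """Illustrative layer: self-attention within a chain, then
    cross-attention to the tokens of the interacting chain."""

    def __init__(self, d_model: int = 1280, n_heads: int = 20):
        # 1280 dims / 20 heads match the ESM-2 650M (t33) backbone.
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x: torch.Tensor, partner: torch.Tensor) -> torch.Tensor:
        # Tokens first attend to tokens of the *same* sequence ...
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]
        # ... then to tokens of the *interacting* sequence (cross-attention).
        h = self.norm2(x)
        x = x + self.cross_attn(h, partner, partner, need_weights=False)[0]
        return x + self.ffn(self.norm3(x))

# Stacking L = 33 such blocks, mirroring the depth of the ESM-2 base model.
blocks = nn.ModuleList(MINTBlock() for _ in range(33))
a = torch.randn(1, 120, 1280)  # toy embeddings for chain A
b = torch.randn(1, 95, 1280)   # toy embeddings for chain B
for blk in blocks:
    a, b = blk(a, b), blk(b, a)  # each chain is contextualized by the other
```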