Fig. 1: Model architecture of Ligand-Transformer.

Ligand-Transformer represents a protein-ligand complex as a heterogeneous graph whose nodes are protein residues and ligand atoms, connected by pairwise features. Inputs generated by AlphaFold2 for the protein and GraphMVP for the ligand are re-encoded into an initial complete graph. This graph is then refined by a 12-layer transformer-like network that updates both node and edge representations via self-attention with pair bias. The output is processed by the affinity head for binding affinity prediction and by the distance head for distance distribution prediction. Components within dotted-line boxes have fixed parameters, whereas those within solid-line boxes are trainable.
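The key update rule, self-attention with pair bias, can be sketched as follows. This is a minimal single-head NumPy illustration, not the paper's implementation: the function name, weight matrices, and the use of a scalar bias per node pair are assumptions made for clarity. The idea is that pairwise (edge) features are added directly to the attention logits before the softmax, so the pair representation steers how node information mixes.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_with_pair_bias(nodes, pair, Wq, Wk, Wv):
    """Single-head self-attention where a pairwise bias term is added
    to the attention logits (hypothetical sketch of the mechanism)."""
    q = nodes @ Wq                          # (N, d) queries
    k = nodes @ Wk                          # (N, d) keys
    v = nodes @ Wv                          # (N, d) values
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) + pair    # (N, N): pair features bias the logits
    attn = softmax(logits, axis=-1)         # rows sum to 1
    return attn @ v                         # updated node representations

# Toy usage with random features (sizes are illustrative only).
rng = np.random.default_rng(0)
N, d = 5, 8                                 # 5 nodes, 8-dim features
nodes = rng.standard_normal((N, d))
pair = rng.standard_normal((N, N))          # one scalar bias per node pair
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention_with_pair_bias(nodes, pair, Wq, Wk, Wv)
print(out.shape)
```

In the full model, the pair bias would be projected from the learned edge representations (one bias per attention head), and the edge representations are themselves updated each layer; this sketch only shows how the bias enters the attention computation.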