Fig. 2: Schematic comparison of cross-attention (in text-to-image generation) and LAMA-attention (in protein conformational ensemble generation). | Nature Machine Intelligence

From: Conditional diffusion with locality-aware modal alignment for generating diverse protein conformational ensembles

a, In traditional cross-attention, each pixel in the generated image can attend to every token in the input text, with no prior algorithmic constraint on the attention field. b, In LAMA-attention, each residue-pair (Res-Pair) representation ij attends only to those residues that are biologically likely to interact with residues i and j, which gives a stronger, locality-aware spatial alignment between sequence and structure.
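
The contrast between panels a and b can be illustrated with a small NumPy sketch. It is not the authors' implementation: the caption does not say how "likely to interact" is decided, so the sketch assumes a hypothetical sequence-window rule (residue r is allowed for pair ij when r lies within `window` positions of i or of j). The names `cross_attention`, `locality_mask` and `window` are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become exactly 0.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k, v, mask=None):
    # q: (n_query, d), k: (n_key, d), v: (n_key, d_v)
    # mask: optional (n_query, n_key) boolean array, True = attention allowed.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

def locality_mask(n_res, window):
    # ASSUMPTION (not from the paper): residue r counts as "likely to
    # interact" with residue i when |r - i| <= window along the sequence.
    idx = np.arange(n_res)
    near = np.abs(idx[:, None] - idx[None, :]) <= window   # (n_res, n_res)
    # Pair (i, j) may attend to r if r is near i or near j.
    allowed = near[:, None, :] | near[None, :, :]          # (i, j, r)
    return allowed.reshape(n_res * n_res, n_res)

rng = np.random.default_rng(0)
n_res, d = 6, 4
seq_repr = rng.normal(size=(n_res, d))            # one vector per residue
pair_repr = rng.normal(size=(n_res * n_res, d))   # one vector per pair ij

# Panel a: unrestricted cross-attention, every query sees every key.
_, w_full = cross_attention(pair_repr, seq_repr, seq_repr)

# Panel b: locality-restricted attention for residue-pair queries.
mask = locality_mask(n_res, window=1)
_, w_local = cross_attention(pair_repr, seq_repr, seq_repr, mask)
```

In the unrestricted case every attention weight is non-zero, whereas under the mask the pair (0, 5) places weight only on residues 0, 1, 4 and 5. A real model would derive the allowed set from a biologically motivated criterion rather than a fixed window.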
