Table 2 Mathematical symbols and notation used in the AttentionWrapper formulation. Bold symbols denote vectors or matrices; italics denote scalar quantities.

From: A histopathology aware DINO model with attention based representation enhancement

| Symbol | Description |
| --- | --- |
| \(\textbf{W}\) | Generic weight matrix (layer-specific versions use subscripts, e.g., \(\textbf{W}_1\), \(\textbf{W}_2\)) |
| \(\textbf{W}_1\) | Spatial-attention first linear weight (\(C \times d_s\)) |
| \(\textbf{W}_2\) | Spatial-attention output weight (\(d_s \times 1\)) |
| \(\textbf{W}_3\) | Channel-attention “squeeze” weight (\(C \times d_c\)) |
| \(\textbf{W}_4\) | Channel-attention “expand” weight (\(d_c \times C\)) |
| \(\textbf{b}_1, \textbf{b}_2, \textbf{b}_3, \textbf{b}_4\) | Bias vectors of the corresponding linear layers |
| \(N\) | Number of patches/tokens extracted from a WSI (rows of the embedding matrix) |
| \(B\) | Mini-batch size (number of slides/WSIs per iteration) |
| \(C\) | Embedding/channel dimension of each patch vector |
| \(d_s\) | Hidden size of the spatial-attention MLP |
| \(d_c\) | Hidden size of the channel-attention MLP |
| \(\textbf{F}\) | Patch-embedding matrix (\(N \times C\)), whose \(n\)th row is \(\textbf{f}_n^\top\) |
| \(\textbf{f}_n\) | Embedding vector of the \(n\)th patch (\(C\)-dimensional) |
| \(\textbf{A}_s\) | Spatial-attention weights across tokens (\(N \times 1\)) |
| \(a_{s,n}\) | Spatial-attention score of the \(n\)th patch (scalar in \((0, 1)\)) |
| \(\textbf{A}_c\) | Channel-attention weights across channels (\(C\)-dimensional) |
| \(\textrm{LN}(\cdot)\) | Layer Normalization (applied per token over the channel axis) |
| \(\textrm{GELU}(\cdot)\) | Gaussian Error Linear Unit activation |
| \(\sigma(\cdot)\) | Sigmoid activation |
| \(\textrm{softmax}(\cdot)\) | Softmax normalization |
| \(\odot\) | Element-wise product with broadcasting |
| \(\oplus\) | Residual addition |
| GAP | Global Average Pooling |
| \(\textbf{z}\) | Slide-level representation vector after pooling (\(C\)-dimensional) |
| \(p\) | Dropout probability (in \([0, 1]\)) |
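
To make the notation concrete, below is a minimal PyTorch sketch that wires these symbols together with the shapes listed in the table. It is an illustrative reconstruction, not the authors' code: the class name `AttentionWrapperSketch`, the use of GAP to drive the channel-attention squeeze/expand branch, the default value of \(p\), and the exact placement of LN, GELU, dropout, and the residual addition are all assumptions; only the tensor shapes follow the table.

```python
import torch
import torch.nn as nn


class AttentionWrapperSketch(nn.Module):
    """Illustrative sketch only: shapes follow Table 2, the wiring is assumed."""

    def __init__(self, C: int, d_s: int, d_c: int, p: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(C)  # LN(.), per token over the channel axis
        # Spatial attention: W1 (C x d_s) with b1, GELU, then W2 (d_s x 1) with b2
        self.spatial = nn.Sequential(nn.Linear(C, d_s), nn.GELU(), nn.Linear(d_s, 1))
        # Channel attention: W3 (C x d_c) "squeeze" with b3, GELU, W4 (d_c x C) "expand" with b4
        self.channel = nn.Sequential(nn.Linear(C, d_c), nn.GELU(), nn.Linear(d_c, C))
        self.drop = nn.Dropout(p)  # dropout probability p

    def forward(self, F: torch.Tensor) -> torch.Tensor:
        # F: patch-embedding matrix, shape (B, N, C)
        x = self.norm(F)

        # A_s: spatial-attention weights, softmax over the N tokens, shape (B, N, 1)
        A_s = torch.softmax(self.spatial(x), dim=1)

        # A_c: channel-attention gate via sigmoid, shape (B, 1, C), driven by GAP over tokens
        A_c = torch.sigmoid(self.channel(x.mean(dim=1, keepdim=True)))

        # Element-wise products with broadcasting (⊙) and residual addition (⊕)
        out = F + self.drop(F * A_s * A_c)  # shape (B, N, C)

        # z: slide-level representation via GAP over the attended tokens, shape (B, C)
        return out.mean(dim=1)
```

As a usage example (hyperparameter values here are placeholders, not those of the paper), `AttentionWrapperSketch(C=768, d_s=256, d_c=192)` maps a `(B, N, 768)` bag of patch embeddings to a `(B, 768)` slide-level vector \(\textbf{z}\).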