Table 2. Mathematical symbols and notation used in the AttentionWrapper formulation. Bold symbols denote vectors or matrices; italics denote scalar quantities. A minimal implementation sketch using these symbols follows the table.
| Symbol | Description |
|---|---|
| \(\textbf{W}\) | Generic weight matrix (layer-specific versions use subscripts, e.g., \(\textbf{W}_1\), \(\textbf{W}_2\)) |
| \(\textbf{W}_1\) | Spatial-attention first linear weight (\(C \times d_s\)) |
| \(\textbf{W}_2\) | Spatial-attention output weight (\(d_s \times 1\)) |
| \(\textbf{W}_3\) | Channel-attention “squeeze” weight (\(C \times d_c\)) |
| \(\textbf{W}_4\) | Channel-attention “expand” weight (\(d_c \times C\)) |
| \(\textbf{b}_1, \textbf{b}_2, \textbf{b}_3, \textbf{b}_4\) | Bias vectors of the corresponding linear layers |
| \(N\) | Number of patches/tokens extracted from a WSI (rows of the embedding matrix) |
| \(B\) | Mini-batch size (number of slides/WSIs per iteration) |
| \(C\) | Embedding/channel dimension of each patch vector |
| \(d_s\) | Hidden size of the spatial-attention MLP |
| \(d_c\) | Hidden size of the channel-attention MLP |
| \(\textbf{F}\) | Patch-embedding matrix (\(N \times C\)), whose \(n\)th row is \(\textbf{f}_n^\top\) |
| \(\textbf{f}_n\) | Embedding vector of the \(n\)th patch (length \(C\)) |
| \(\textbf{A}_s\) | Spatial-attention weights across tokens (\(N \times 1\)) |
| \(a_{s,n}\) | Spatial-attention score of the \(n\)th patch (scalar in \((0, 1)\)) |
| \(\textbf{A}_c\) | Channel-attention weights across channels (length \(C\)) |
| \(\textrm{LN}(\cdot)\) | Layer Normalization (applied per token over the channel axis) |
| \(\textrm{GELU}(\cdot)\) | Gaussian Error Linear Unit activation |
| \(\sigma(\cdot)\) | Sigmoid activation |
| \(\textrm{softmax}(\cdot)\) | Softmax normalization |
| \(\odot\) | Element-wise product with broadcasting |
| \(\oplus\) | Residual addition |
| GAP | Global Average Pooling |
| \(\textbf{z}\) | Slide-level representation vector after pooling (length \(C\)) |
| \(p\) | Dropout probability (in \([0, 1]\)) |
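To make the notation concrete, the following is a minimal PyTorch sketch of an attention wrapper assembled only from the symbol table: \(\textbf{W}_1, \textbf{b}_1, \textbf{W}_2, \textbf{b}_2\) form the token-wise spatial-attention MLP whose scores are softmax-normalized into \(\textbf{A}_s\); \(\textbf{W}_3, \textbf{b}_3, \textbf{W}_4, \textbf{b}_4\) form the squeeze/expand channel branch gated by \(\sigma\) into \(\textbf{A}_c\); and \(\odot\), \(\oplus\), GAP, and dropout \(p\) appear as broadcast multiplication, residual addition, token-wise mean pooling, and `nn.Dropout`. The class name, the exact way the two branches are combined, and the final pooling into \(\textbf{z}\) are assumptions for illustration, not the paper's precise formulation.

```python
import torch
import torch.nn as nn


class AttentionWrapperSketch(nn.Module):
    """Illustrative sketch based on the symbol table; how the spatial and
    channel branches are combined and pooled is an assumption."""

    def __init__(self, C: int, d_s: int, d_c: int, p: float = 0.1):
        super().__init__()
        self.ln = nn.LayerNorm(C)  # LN(.) per token over the channel axis
        # Spatial-attention MLP: W1 (C x d_s), W2 (d_s x 1) with biases b1, b2
        self.spatial = nn.Sequential(nn.Linear(C, d_s), nn.GELU(), nn.Linear(d_s, 1))
        # Channel-attention squeeze/expand: W3 (C x d_c), W4 (d_c x C) with biases b3, b4
        self.channel = nn.Sequential(nn.Linear(C, d_c), nn.GELU(), nn.Linear(d_c, C))
        self.dropout = nn.Dropout(p)  # dropout probability p

    def forward(self, F: torch.Tensor) -> torch.Tensor:
        # F: (B, N, C) patch embeddings for B slides with N patches each
        x = self.ln(F)
        # A_s: (B, N, 1), softmax-normalized over the N tokens
        A_s = torch.softmax(self.spatial(x), dim=1)
        # A_c: (B, 1, C), sigmoid-gated channel weights computed from GAP over tokens
        A_c = torch.sigmoid(self.channel(x.mean(dim=1))).unsqueeze(1)
        # Element-wise product with broadcasting, then residual addition (assumed combination)
        F_hat = F + self.dropout(F * A_s * A_c)
        # Slide-level representation z: (B, C); GAP over tokens is assumed here
        return F_hat.mean(dim=1)
```

As a usage check of the shapes in the table, `AttentionWrapperSketch(C=768, d_s=128, d_c=128)(torch.randn(2, 500, 768))` maps a batch of \(B=2\) slides with \(N=500\) patch embeddings of dimension \(C=768\) to a `(2, 768)` slide-level representation.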