Table 2 Mathematical symbols and notation used in the AttentionWrapper formulation. Bold symbols denote vectors or matrices; italics denote scalar quantities.

From: A histopathology aware DINO model with attention based representation enhancement

| Symbol | Description |
| --- | --- |
| \(\textbf{W}\) | Generic weight matrix (layer-specific versions use subscripts, e.g., \(\textbf{W}_1\), \(\textbf{W}_2\)) |
| \(\textbf{W}_1\) | Spatial-attention first linear weight (\(C \times d_s\)) |
| \(\textbf{W}_2\) | Spatial-attention output weight (\(d_s \times 1\)) |
| \(\textbf{W}_3\) | Channel-attention “squeeze” weight (\(C \times d_c\)) |
| \(\textbf{W}_4\) | Channel-attention “expand” weight (\(d_c \times C\)) |
| \(\textbf{b}_1, \textbf{b}_2, \textbf{b}_3, \textbf{b}_4\) | Bias vectors of the corresponding linear layers |
| \(N\) | Number of patches/tokens extracted from a WSI (rows of the embedding matrix) |
| \(B\) | Mini-batch size (number of slides/WSIs per iteration) |
| \(C\) | Embedding/channel dimension of each patch vector |
| \(d_s\) | Hidden size of the spatial-attention MLP |
| \(d_c\) | Hidden size of the channel-attention MLP |
| \(\textbf{F}\) | Patch-embedding matrix (\(N \times C\)), whose \(n\)th row is \(\textbf{f}_n^\top\) |
| \(\textbf{f}_n\) | Embedding vector of the \(n\)th patch (\(C\)-dimensional) |
| \(\textbf{A}_s\) | Spatial-attention weights across tokens (\(N \times 1\)) |
| \(a_{s,n}\) | Spatial-attention score of the \(n\)th patch (scalar in \((0, 1)\)) |
| \(\textbf{A}_c\) | Channel-attention weights across channels (\(C\)-dimensional) |
| \(\textrm{LN}(\cdot)\) | Layer Normalization (applied per token over the channel axis) |
| \(\textrm{GELU}(\cdot)\) | Gaussian Error Linear Unit activation |
| \(\sigma(\cdot)\) | Sigmoid activation |
| \(\textrm{softmax}(\cdot)\) | Softmax normalization |
| \(\odot\) | Element-wise product with broadcasting |
| \(\oplus\) | Residual addition |
| GAP | Global Average Pooling |
| \(\textbf{z}\) | Slide-level representation vector after pooling (\(C\)-dimensional) |
| \(p\) | Dropout probability (in \([0, 1]\)) |
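
To make the notation concrete, below is a minimal PyTorch sketch that wires these symbols together with the shapes listed in the table. It is an illustrative reconstruction, not the authors' code: the class name `AttentionWrapperSketch`, the use of GAP to drive the channel-attention squeeze/expand branch, the default value of \(p\), and the exact placement of LN, GELU, dropout, and the residual addition are all assumptions; only the tensor shapes follow the table.

```python
import torch
import torch.nn as nn


class AttentionWrapperSketch(nn.Module):
    """Illustrative sketch only: shapes follow Table 2, the wiring is assumed."""

    def __init__(self, C: int, d_s: int, d_c: int, p: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(C)  # LN(.), per token over the channel axis
        # Spatial attention: W1 (C x d_s) with b1, GELU, then W2 (d_s x 1) with b2
        self.spatial = nn.Sequential(nn.Linear(C, d_s), nn.GELU(), nn.Linear(d_s, 1))
        # Channel attention: W3 (C x d_c) "squeeze" with b3, GELU, W4 (d_c x C) "expand" with b4
        self.channel = nn.Sequential(nn.Linear(C, d_c), nn.GELU(), nn.Linear(d_c, C))
        self.drop = nn.Dropout(p)  # dropout probability p

    def forward(self, F: torch.Tensor) -> torch.Tensor:
        # F: patch-embedding matrix, shape (B, N, C)
        x = self.norm(F)

        # A_s: spatial-attention weights, softmax over the N tokens, shape (B, N, 1)
        A_s = torch.softmax(self.spatial(x), dim=1)

        # A_c: channel-attention gate via sigmoid, shape (B, 1, C), driven by GAP over tokens
        A_c = torch.sigmoid(self.channel(x.mean(dim=1, keepdim=True)))

        # Element-wise products with broadcasting (⊙) and residual addition (⊕)
        out = F + self.drop(F * A_s * A_c)  # shape (B, N, C)

        # z: slide-level representation via GAP over the attended tokens, shape (B, C)
        return out.mean(dim=1)
```

As a usage example (hyperparameter values here are placeholders, not those of the paper), `AttentionWrapperSketch(C=768, d_s=256, d_c=192)` maps a `(B, N, 768)` bag of patch embeddings to a `(B, 768)` slide-level vector \(\textbf{z}\).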