Table 1 CSW-S network structure details.

Structure	Input	Convolution kernel	Channels	Stride
Convolutional token embedding	24 × 24 × 3	7 × 7	64	54
Structure	Input	Depth	Number head	Dim	Reso	Split-size
Stage1	3136 × 64	1	2	64	56	1
Stage2	3136 × 64	2	4	128	28	2
Stage3	3136 × 64	2	8	256	14	7
Stage4	3136 × 64	1	16	512	7	7

Dim is the feature dimension that the fully connected layer projects the tokens into, and number head is the number of heads in the multi-head attention; each head captures a different set of correlations. Reso (resolution) is the side length of the feature map before it is flattened into a sequence of vectors, and depth is the number of times the block is repeated in the stage.
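
The per-stage settings in Table 1 can be written down as a small configuration and checked with simple arithmetic. The following is only a sketch: the names `StageConfig`, `CSW_S_STAGES`, and `summarize` are hypothetical and are not taken from the authors' code. It assumes that the token count of a stage is the square of its resolution and that the stage dimension is divided evenly among the attention heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """One row of Table 1 (hypothetical container, not the authors' code)."""
    name: str
    depth: int        # number of times the block is repeated
    num_heads: int    # heads in the multi-head attention
    dim: int          # feature dimension of each token
    reso: int         # side length of the feature map before flattening
    split_size: int   # split-size column, copied as-is from the table

    @property
    def num_tokens(self) -> int:
        # Assumption: a reso x reso feature map flattens to reso**2 tokens.
        return self.reso * self.reso

    @property
    def head_dim(self) -> int:
        # Assumption: dim is shared evenly across the attention heads.
        return self.dim // self.num_heads


# Values copied from Table 1.
CSW_S_STAGES = [
    StageConfig("Stage1", depth=1, num_heads=2, dim=64, reso=56, split_size=1),
    StageConfig("Stage2", depth=2, num_heads=4, dim=128, reso=28, split_size=2),
    StageConfig("Stage3", depth=2, num_heads=8, dim=256, reso=14, split_size=7),
    StageConfig("Stage4", depth=1, num_heads=16, dim=512, reso=7, split_size=7),
]


def summarize(stages):
    """Return (name, tokens, dim, per-head dim) for each stage."""
    return [(s.name, s.num_tokens, s.dim, s.head_dim) for s in stages]


if __name__ == "__main__":
    for row in summarize(CSW_S_STAGES):
        print(row)
```

Under these assumptions, Stage1 with a resolution of 56 flattens to 56 × 56 = 3136 tokens of dimension 64, which matches the 3136 × 64 input listed for it in the table.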
