Table 9 Comparison of time complexity and parameter sizes for different modules.

Module	Time complexity	Params
Standard Transformer¹³	\(O(H^2 \times W^2 \times C_{in})\)	4.09M
DCF (Ours)	\(O(H \times W \times C_{in}^2)\)	1.54M
Ordinary Convolution	\(O(H \times W \times K^2 \times C_{in} \times C_{out})\)	65.79K
DyConv⁴¹	\(O(H \times W \times K^2 \times C_{in} \times C_{out} \times N)\)	264.22K
LMC (Ours)	\(O(H \times W \times K^2 \times C_{in} \times C_{out} \times N)\)	287.84K
MHSA¹³	\(O(H^2 \times W^2 \times C_{in})\)	263.17K
W-MSA³⁵	\(O(H \times W \times C_{in}^2 + N_w \times W_s^4 \times C_{in})\)	66.05K
WAT (Ours)	\(O(H^2 \times W^2 \times C_{in} / 16)\)	887.23K

Here, H, W, and \(C_{in}\) denote the height, width, and number of channels of the input feature map, respectively; \(C_{out}\) is the number of output channels; K is the convolution kernel size; \(N_w\) is the number of windows; and \(W_s\) is the spatial size of each window.
The best result is indicated in bold.
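
To make the asymptotic expressions in Table 9 easier to compare, the following minimal sketch evaluates each one as a raw operation count with constant factors dropped. The function name `complexity_terms` and all dimension values are hypothetical and chosen only for illustration; they are not the settings used in the paper. The symbol N is treated simply as the multiplier that appears in the DyConv and LMC rows.

```python
def complexity_terms(H, W, C_in, C_out, K, N, N_w, W_s):
    """Evaluate the big-O expressions of Table 9 (constant factors dropped)."""
    hw = H * W  # number of spatial positions in the feature map
    return {
        "Standard Transformer": hw**2 * C_in,
        "DCF": hw * C_in**2,
        "Ordinary Convolution": hw * K**2 * C_in * C_out,
        "DyConv": hw * K**2 * C_in * C_out * N,
        "LMC": hw * K**2 * C_in * C_out * N,
        "MHSA": hw**2 * C_in,
        "W-MSA": hw * C_in**2 + N_w * W_s**4 * C_in,
        "WAT": hw**2 * C_in / 16,
    }


# Hypothetical sizes (assumptions for illustration only): a 32 x 32 map with
# 64 channels, 3 x 3 kernels, N = 4, and 8 x 8 windows, so N_w = (32 / 8)^2 = 16.
terms = complexity_terms(H=32, W=32, C_in=64, C_out=64, K=3, N=4, N_w=16, W_s=8)

for name, ops in terms.items():
    print(f"{name:22s} {ops:>15,.0f}")
```

With these illustrative sizes, the terms that are quadratic in \(H \times W\) (Standard Transformer, MHSA) exceed the DCF term by a factor of \(H \times W / C_{in} = 16\), and that gap widens as the feature-map resolution grows.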
