Table 1. Parameter counts of the pretrained models. The models comprise the EfficientNet variants (B0-B7) and CLIP-based architectures, including ResNet backbones (RN50, RN101, and scaled RN50 variants) and vision transformer (ViT) models. All models are evaluated for generalization performance in our benchmarking testbed.

From: A practical generalization metric for deep networks benchmarking

| EfficientNet | # Params | CLIP | # Params |
| --- | --- | --- | --- |
| efficientnet-b0 | 5.3M | RN50 | 38M |
| efficientnet-b1 | 7.8M | RN101 | 56M |
| efficientnet-b2 | 9.2M | RN50x4 | 87M |
| efficientnet-b3 | 12M | RN50x16 | 167M |
| efficientnet-b4 | 19M | RN50x64 | 420M |
| efficientnet-b5 | 30M | ViT-B/32 | 87M |
| efficientnet-b6 | 43M | ViT-B/16 | 86M |
| efficientnet-b7 | 66M | ViT-L/14 | 304M |
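
For readers who want to work with these figures programmatically, the following is a minimal sketch that stores the Table 1 entries as strings and converts them to integer parameter counts. The helper names `parse_count` and `rank_by_size` are hypothetical and introduced only for illustration; they are not part of the benchmarking testbed.

```python
# Parameter counts copied verbatim from Table 1 (kept as printed strings).
EFFICIENTNET_PARAMS = {
    "efficientnet-b0": "5.3M", "efficientnet-b1": "7.8M",
    "efficientnet-b2": "9.2M", "efficientnet-b3": "12M",
    "efficientnet-b4": "19M", "efficientnet-b5": "30M",
    "efficientnet-b6": "43M", "efficientnet-b7": "66M",
}
CLIP_PARAMS = {
    "RN50": "38M", "RN101": "56M", "RN50x4": "87M", "RN50x16": "167M",
    "RN50x64": "420M", "ViT-B/32": "87M", "ViT-B/16": "86M", "ViT-L/14": "304M",
}

# Multipliers for the unit suffixes (only "M" occurs in Table 1).
_UNITS = {"K": 10**3, "M": 10**6, "B": 10**9}


def parse_count(text: str) -> int:
    """Convert a string such as '5.3M' into an integer parameter count.

    Hypothetical helper: assumes a numeric value followed by one unit letter.
    """
    value, unit = text[:-1], text[-1].upper()
    return round(float(value) * _UNITS[unit])


def rank_by_size(*tables: dict) -> list:
    """Return model names from all given tables, smallest model first."""
    merged = {}
    for table in tables:
        merged.update(table)
    return sorted(merged, key=lambda name: parse_count(merged[name]))


if __name__ == "__main__":
    for name in rank_by_size(EFFICIENTNET_PARAMS, CLIP_PARAMS):
        count = parse_count({**EFFICIENTNET_PARAMS, **CLIP_PARAMS}[name])
        print(f"{name:16s} {count:>12,d}")
```

Keeping the values as the printed strings avoids implying more precision than the table reports; the conversion to integers is only as accurate as the rounded figures shown.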