Table 9 Ablation study: impact of prompting strategies on classification NAS performance

From: Large language models driven neural architecture search for universal and lightweight disease diagnosis on histopathology slide images

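| Backbone | Dataset | Prompt strategy | Iterations ↓ | API calls ↓ | Prec@1 (%) ↑ | FLOPs ↓ | Params (M) ↓ |
|---|---|---|---|---|---|---|---|
| ShuffleNet | BreakHis | EP (Ours) | 10 | 3.5 | 99.98*** | 213.30M | 1.80 |
| ShuffleNet | BreakHis | GAP | 10 | 3.9 | 95.32 | 238.27M | 2.38 |
| ShuffleNet | BreakHis | NSP | 14 | 3.5 | 94.73 | 225.86M | 2.07 |
| ShuffleNet | Diabetic | EP (Ours) | 10 | 4.2 | 73.22*** | 240.25M | 2.10 |
| ShuffleNet | Diabetic | GAP | 11 | 4.5 | 66.85 | 328.80M | 2.81 |
| ShuffleNet | Diabetic | NSP | 10 | 4.3 | 66.30 | 249.54M | 2.44 |
| ViT | BreakHis | EP (Ours) | 10 | 4.7 | 98.08*** | 4.95G | 25.12 |
| ViT | BreakHis | GAP | 15 | 7.5 | 96.09 | 4.83G | 24.53 |
| ViT | BreakHis | NSP | 15 | 4.6 | 95.30 | 4.60G | 23.35 |
| ViT | Diabetic | EP (Ours) | 10 | 4.6 | 70.38*** | 4.13G | 20.99 |
| ViT | Diabetic | GAP | 10 | 4.6 | 52.17 | 4.95G | 25.12 |
| ViT | Diabetic | NSP | 10 | 5.4 | 63.86 | 4.95G | 25.12 |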

  1. Results are averaged over 5 runs. Within each backbone/dataset group, EP typically achieves the best result on each metric. The Prec@1 of EP is consistently higher than that of GAP and NSP, with very high significance (p < 0.001, marked ***).
  2. EP: Expert Prompt (ours); GAP: Generic Assistant Prompt; NSP: No System Prompt; Prec@1: top-1 accuracy; ↑: higher is better; ↓: lower is better.
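
To make the size of the Prec@1 differences easier to read, the following minimal Python sketch computes EP's lead over the stronger of the two baselines (GAP or NSP) in each backbone/dataset group. The Prec@1 values are copied from the table; the names `PREC_AT_1` and `ep_margin` are illustrative and not from the paper. The sketch only restates the reported averages and does not reproduce the significance test, which would need the per-run results.

```python
# Prec@1 (%) values copied from Table 9; keys are (backbone, dataset).
PREC_AT_1 = {
    ("ShuffleNet", "BreakHis"): {"EP": 99.98, "GAP": 95.32, "NSP": 94.73},
    ("ShuffleNet", "Diabetic"): {"EP": 73.22, "GAP": 66.85, "NSP": 66.30},
    ("ViT", "BreakHis"): {"EP": 98.08, "GAP": 96.09, "NSP": 95.30},
    ("ViT", "Diabetic"): {"EP": 70.38, "GAP": 52.17, "NSP": 63.86},
}


def ep_margin(scores):
    """Return EP's Prec@1 lead (percentage points) over the stronger baseline."""
    best_baseline = max(scores["GAP"], scores["NSP"])
    return round(scores["EP"] - best_baseline, 2)


# Hypothetical helper output: one margin per (backbone, dataset) group.
margins = {group: ep_margin(scores) for group, scores in PREC_AT_1.items()}

for (backbone, dataset), margin in margins.items():
    print(f"{backbone:10s} {dataset:9s} EP lead: {margin:+.2f} pp")
```

On these reported averages, the lead ranges from about 2 percentage points (ViT on BreakHis) to about 6.5 percentage points (ViT on Diabetic).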