Table 3 Detailed search cost and classification performance comparison for one-shot NAS methods on the BreakHis and Diabetic datasets

From: Large language models driven neural architecture search for universal and lightweight disease diagnosis on histopathology slide images

Dataset: BreakHis

| Metric | ShuffleNet: Random search | ShuffleNet: Cream | ShuffleNet: Pathology-NAS | ViT: Random search | ViT: AutoFormer | ViT: Pathology-NAS |
| --- | --- | --- | --- | --- | --- | --- |
| Iterations ↓ | 500 | 300 | 10 | 500 | 300 | 10 |
| GPT-4 API Calls | 0 | 0 | 10 | 0 | 0 | 10 |
| FLOPs ↓ | 275.96M | 442.99M | 213.30M | 4.42G | 1.28G | 4.95G |
| Prec@1 (%) ↑ | 95.21 ± 0.34 | 97.13 ± 0.41 | 99.98 ± 0.27*** | 95.67 ± 0.22 | 96.21 ± 0.39 | 98.08 ± 0.26*** |
| API Cost ($) | 0.00 | 0.00 | 0.13 | 0.00 | 0.00 | 0.17 |
| Latency (hrs) | 0.000 | 0.000 | 0.001 | 0.000 | 0.000 | 0.001 |
| ST (GPU hrs) ↓ | 32.40 | 10.63 | 7.42 | 67.28 | 31.72 | 14.88 |
| TT (GPU hrs) ↓ | 32.400 | 10.630 | 7.421 | 67.280 | 31.720 | 14.881 |

Dataset: Diabetic

| Metric | ShuffleNet: Random search | ShuffleNet: Cream | ShuffleNet: Pathology-NAS | ViT: Random search | ViT: AutoFormer | ViT: Pathology-NAS |
| --- | --- | --- | --- | --- | --- | --- |
| Iterations ↓ | 500 | 120 | 10 | 500 | 300 | 10 |
| GPT-4 API Calls | 0 | 0 | 10 | 0 | 0 | 10 |
| FLOPs ↓ | 246.32M | 440.07M | 240.25M | 4.77G | 1.28G | 4.13G |
| Prec@1 (%) ↑ | 65.03 ± 0.59 | 70.31 ± 0.38 | 73.22 ± 0.34*** | 58.47 ± 0.57 | 67.62 ± 0.24 | 70.38 ± 0.22*** |
| API Cost ($) | 0.00 | 0.00 | 0.12 | 0.00 | 0.00 | 0.18 |
| Latency (hrs) | 0.000 | 0.000 | 0.001 | 0.000 | 0.000 | 0.001 |
| ST (GPU hrs) ↓ | 10.90 | 1.43 | 1.16 | 22.76 | 11.24 | 6.12 |
| TT (GPU hrs) ↓ | 10.900 | 1.430 | 1.161 | 22.760 | 11.240 | 6.121 |

  1. Results are averaged over 5 independent runs. The table breaks down search costs (Iterations, GPT-4 API Calls, Latency (hrs), API Cost ($), ST (GPU hrs), TT (GPU hrs)) alongside performance metrics (FLOPs, Prec@1 (%)) for Pathology-NAS compared with Random search, Cream (ShuffleNet backbone), and AutoFormer (ViT backbone). Best performance values and lowest costs are highlighted in bold where applicable. Statistical significance of the Pathology-NAS Prec@1 (%) result relative to Random search, assessed by independent two-sample Welch's t-tests, is denoted by ***p < 0.001 (very highly significant).
  2. ST: search time; TT: total time (TT = ST + Latency); Prec@1: top-1 accuracy.
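
The two relations named in the footnotes, TT = ST + Latency and the Welch's t-test comparison against Random search, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `total_time` and `welch_t` are hypothetical, and the table does not say whether the ± values are standard deviations or standard errors, so the sketch assumes they are sample standard deviations over the 5 runs. It computes the t statistic and the Welch-Satterthwaite degrees of freedom only, not a p-value.

```python
import math


def total_time(search_time_hrs, latency_hrs):
    """Total time as defined in the footnote: TT = ST + Latency."""
    return search_time_hrs + latency_hrs


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and degrees of freedom from summary statistics.

    Assumption (not stated in the table): sd_a and sd_b are sample
    standard deviations over n_a and n_b independent runs.
    """
    var_a = sd_a ** 2 / n_a  # squared standard error of mean A
    var_b = sd_b ** 2 / n_b  # squared standard error of mean B
    t_stat = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


if __name__ == "__main__":
    # BreakHis, ShuffleNet backbone, Pathology-NAS column: ST and Latency
    print(round(total_time(7.42, 0.001), 3))

    # BreakHis, ShuffleNet backbone: Pathology-NAS vs Random search Prec@1
    t_stat, dof = welch_t(99.98, 0.27, 5, 95.21, 0.34, 5)
    print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```

Working from the summary statistics keeps the sketch independent of the per-run accuracies, which the table does not report; with the raw runs available, a library routine for Welch's test would be the more natural choice.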