Table 4 Impact of ToolAgent on diagnostic performance

From: Early diagnosis of axial spondyloarthritis in primary care using multi-agent systems

| Models | Dataset | UNSURE rate | Sensitivity without UNSURE (95% CI) | Specificity without UNSURE (95% CI) | Accuracy without UNSURE (95% CI) | Accuracy with UNSURE (95% CI) | F1-score without UNSURE (95% CI) | Mean time (s) |
|---|---|---|---|---|---|---|---|---|
| Agents without ToolAgent | Training | 0.0279 | 0.9300 (0.8967–0.9575) | 0.7717 (0.6829–0.8515) | 0.8883 (0.8567–0.9198) | 0.8635 (0.8280–0.898) | 0.9246 (0.900–0.9472) | 55.8 |
| Agents with ToolAgent | Training | 0.0279 | 0.9228 (0.8884–0.9533) | 0.8667 (0.7975–0.9334) | 0.9083 (0.8739–0.9398) | 0.8830 (0.8530–0.9100) | 0.9373 (0.9149–0.9579) | 47.9 |
| Agents without ToolAgent | Validation | 0.0538 | 0.9380 (0.883–0.973) | 0.7234 (0.5908–0.8421) | 0.8807 (0.8295–0.9261) | 0.8333 (0.7680–0.8920) | 0.9202 (0.8794–0.9498) | 55.8 |
| Agents with ToolAgent | Validation | 0.0323 | 0.8615 (0.7969–0.9174) | 0.8000 (0.6800–0.9057) | 0.8444 (0.7889–0.9000) | 0.8172 (0.7560–0.8760) | 0.8889 (0.8413–0.9242) | 47.9 |
| Agents without ToolAgent | Testing | 0.0000 | 0.9062 (0.8077–1.0000) | 0.7368 (0.5238–0.9287) | 0.8431 (0.7451–0.9412) | / | 0.8788 (0.7869–0.9552) | 54.4 |
| Agents with ToolAgent | Testing | 0.0000 | 0.9375 (0.8485–1.0000) | 0.7368 (0.5263–0.9375) | 0.8627 (0.7647–0.9608) | / | 0.8955 (0.8136–0.9677) | 56.3 |

  1. Bold values indicate the best-performing model.
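
The table does not state how "Accuracy with UNSURE" is computed. As a consistency check only, the sketch below assumes that every UNSURE output is scored as incorrect, so that accuracy over all cases equals accuracy on the decided cases multiplied by (1 − UNSURE rate). The function name `accuracy_with_unsure` and the `rows` list are hypothetical and introduced for illustration; the numbers are copied from the training and validation rows of the table.

```python
def accuracy_with_unsure(acc_without_unsure: float, unsure_rate: float) -> float:
    """Accuracy over all cases, ASSUMING each UNSURE output counts as incorrect.

    This definition is an assumption made for illustration; the source table
    does not spell out the formula.
    """
    return acc_without_unsure * (1.0 - unsure_rate)


# (label, UNSURE rate, accuracy without UNSURE, reported accuracy with UNSURE)
rows = [
    ("Training, without ToolAgent", 0.0279, 0.8883, 0.8635),
    ("Training, with ToolAgent", 0.0279, 0.9083, 0.8830),
    ("Validation, without ToolAgent", 0.0538, 0.8807, 0.8333),
    ("Validation, with ToolAgent", 0.0323, 0.8444, 0.8172),
]

for label, rate, acc, reported in rows:
    derived = accuracy_with_unsure(acc, rate)
    # Compare the derived value with the value reported in the table.
    print(f"{label}: derived {derived:.4f}, reported {reported:.4f}")
```

Under this assumption the derived values agree with the reported ones to within rounding of the four-decimal inputs. The testing rows are omitted because their UNSURE rate is 0.0000 and the table reports "/" for that column.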