Table 1 Results detailing the performance of the SVM, SSAST and BNN models on the nine evaluation tasks for each of the four audio modalities: sentence, three coughs, cough and exhalation

From: Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers

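| | | Train | Standard (9,379+ 16,518) | Standard (9,379+ 16,518) | Standard (9,379+ 16,518) | Standard (9,379+ 16,518) | Match (2,599+ 2,599) | Match (2,599+ 2,599) | Match (2,599+ 2,599) | Match (2,599+ 2,599) | Random (20,000+ 37,665) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | | Test | Standard (3,820+ 7,301) | Match (907+ 907) | Long (10,315+ 20,509) | Long match (2,098+ 2,098) | Standard (3,820+ 7,301) | Match (907+ 907) | Long (10,315+ 20,509) | Long match (2,098+ 2,098) | Random (3,514+ 6,663) |
| Sentence | SVM | UAR | 0.669 | 0.566 | 0.699 | 0.570 | 0.658 | 0.567 | 0.646 | 0.579 | 0.721 |
| Sentence | SVM | ROC | 0.732 | 0.596 | 0.766 | 0.591 | 0.714 | 0.600 | 0.693 | 0.597 | 0.796 |
| Sentence | SVM | PR | 0.578 | 0.574 | 0.625 | 0.580 | 0.553 | 0.583 | 0.515 | 0.576 | 0.686 |
| Sentence | SSAST | UAR | 0.733 | 0.594 | 0.739 | 0.583 | 0.692 | 0.602 | 0.666 | 0.572 | 0.763 |
| Sentence | SSAST | ROC | 0.800 | 0.619 | 0.818 | 0.621 | 0.760 | 0.635 | 0.732 | 0.604 | 0.846 |
| Sentence | SSAST | PR | 0.684 | 0.594 | 0.715 | 0.594 | 0.631 | 0.626 | 0.590 | 0.579 | 0.774 |
| Sentence | BNN | UAR | 0.685 | 0.586 | 0.702 | 0.566 | 0.703 | 0.604 | 0.687 | 0.581 | 0.702 |
| Sentence | BNN | ROC | 0.776 | 0.623 | 0.804 | 0.614 | 0.767 | 0.634 | 0.749 | 0.610 | 0.834 |
| Sentence | BNN | PR | 0.645 | 0.613 | 0.689 | 0.593 | 0.634 | 0.629 | 0.619 | 0.593 | 0.752 |
| Three coughs | SVM | UAR | 0.669 | 0.555 | 0.694 | 0.541 | 0.635 | 0.539 | 0.639 | 0.550 | 0.713 |
| Three coughs | SVM | ROC | 0.727 | 0.568 | 0.759 | 0.558 | 0.684 | 0.560 | 0.688 | 0.568 | 0.782 |
| Three coughs | SVM | PR | 0.570 | 0.550 | 0.605 | 0.538 | 0.523 | 0.553 | 0.510 | 0.546 | 0.647 |
| Three coughs | SSAST | UAR | 0.681 | 0.555 | 0.696 | 0.551 | 0.652 | 0.546 | 0.662 | 0.555 | 0.725 |
| Three coughs | SSAST | ROC | 0.750 | 0.577 | 0.781 | 0.569 | 0.714 | 0.571 | 0.723 | 0.568 | 0.809 |
| Three coughs | SSAST | PR | 0.607 | 0.553 | 0.648 | 0.552 | 0.563 | 0.557 | 0.561 | 0.557 | 0.701 |
| Three coughs | BNN | UAR | 0.678 | 0.558 | 0.696 | 0.551 | 0.657 | 0.558 | 0.660 | 0.535 | 0.716 |
| Three coughs | BNN | ROC | 0.751 | 0.578 | 0.786 | 0.578 | 0.713 | 0.578 | 0.720 | 0.558 | 0.807 |
| Three coughs | BNN | PR | 0.601 | 0.550 | 0.647 | 0.556 | 0.551 | 0.554 | 0.563 | 0.551 | 0.691 |
| Cough | SVM | UAR | 0.648 | 0.536 | 0.685 | 0.540 | 0.633 | 0.541 | 0.638 | 0.538 | 0.695 |
| Cough | SVM | ROC | 0.712 | 0.544 | 0.748 | 0.550 | 0.687 | 0.559 | 0.692 | 0.559 | 0.763 |
| Cough | SVM | PR | 0.559 | 0.526 | 0.594 | 0.535 | 0.533 | 0.550 | 0.521 | 0.545 | 0.625 |
| Cough | SSAST | UAR | 0.681 | 0.545 | 0.690 | 0.541 | 0.638 | 0.528 | 0.640 | 0.543 | 0.702 |
| Cough | SSAST | ROC | 0.742 | 0.561 | 0.768 | 0.559 | 0.692 | 0.552 | 0.692 | 0.560 | 0.790 |
| Cough | SSAST | PR | 0.603 | 0.540 | 0.631 | 0.548 | 0.535 | 0.545 | 0.532 | 0.550 | 0.675 |
| Cough | BNN | UAR | 0.647 | 0.540 | 0.661 | 0.534 | 0.618 | 0.532 | 0.638 | 0.541 | 0.672 |
| Cough | BNN | ROC | 0.732 | 0.570 | 0.765 | 0.563 | 0.682 | 0.542 | 0.698 | 0.556 | 0.786 |
| Cough | BNN | PR | 0.581 | 0.556 | 0.621 | 0.549 | 0.511 | 0.526 | 0.522 | 0.541 | 0.678 |
| Exhalation | SVM | UAR | 0.600 | 0.523 | 0.639 | 0.544 | 0.587 | 0.528 | 0.585 | 0.529 | 0.653 |
| Exhalation | SVM | ROC | 0.646 | 0.555 | 0.690 | 0.559 | 0.618 | 0.541 | 0.621 | 0.550 | 0.712 |
| Exhalation | SVM | PR | 0.477 | 0.560 | 0.513 | 0.547 | 0.444 | 0.536 | 0.431 | 0.543 | 0.566 |
| Exhalation | SSAST | UAR | 0.649 | 0.553 | 0.663 | 0.558 | 0.593 | 0.531 | 0.588 | 0.531 | 0.660 |
| Exhalation | SSAST | ROC | 0.701 | 0.581 | 0.725 | 0.580 | 0.653 | 0.552 | 0.644 | 0.556 | 0.750 |
| Exhalation | SSAST | PR | 0.563 | 0.578 | 0.575 | 0.561 | 0.496 | 0.548 | 0.473 | 0.549 | 0.634 |
| Exhalation | BNN | UAR | 0.576 | 0.529 | 0.581 | 0.526 | 0.603 | 0.525 | 0.601 | 0.541 | 0.608 |
| Exhalation | BNN | ROC | 0.683 | 0.569 | 0.722 | 0.578 | 0.679 | 0.570 | 0.675 | 0.567 | 0.744 |
| Exhalation | BNN | PR | 0.539 | 0.581 | 0.573 | 0.563 | 0.519 | 0.573 | 0.507 | 0.551 | 0.620 |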

  1. The metrics corresponding to the highest performance for each of the 18 (evaluation procedure, test set) pairs (that is, for each pair in {UAR, ROC, PR} × {standard, match, long, long match, random}) across all modalities and models are bolded. Each training and test set is shown with the corresponding support of individuals who are COVID+ and COVID−. ROC, ROC–AUC; PR, PR–AUC; UAR, unweighted average recall.
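
For readers less familiar with UAR, the following is a minimal sketch of how unweighted average recall can be computed: recall is taken separately for each class and the per-class values are averaged with equal weight, so the larger COVID− group does not dominate the score. The function name `unweighted_average_recall` and the toy labels are illustrative assumptions, not the study's code or data.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall, with every class weighted equally."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    hits = defaultdict(int)     # correct predictions per true class
    support = defaultdict(int)  # number of examples per true class
    for truth, pred in zip(y_true, y_pred):
        support[truth] += 1
        if truth == pred:
            hits[truth] += 1
    recalls = [hits[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (not from the study): 1 = COVID+, 0 = COVID-
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case recall is 1/2 for the positive class and 5/6 for the negative class, so UAR is their plain average, while accuracy is pulled towards the majority class. A UAR of 0.5 corresponds to chance level for a two-class task.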