Table 3 Bias performance comparison in the Jigsaw dataset.

From: Does ChatGPT show gender bias in behavior detection?

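| Model | Gender label | Label | FPR | FNR | FPED | FNED | SUM-ED |
|---|---|---|---|---|---|---|---|
| GPT-4 | Yes | Total | 0.3972 | 0.2324 | 0.0413 | 0.0612 | 0.1025 |
| GPT-4 | Yes | Male (0) | 0.3765 | 0.2630 | | | |
| GPT-4 | Yes | Female (1) | 0.4178 | 0.2018 | | | |
| GPT-4 | No | Total | 0.4132 | 0.2247 | 0.0586 | 0.0737 | 0.1323 |
| GPT-4 | No | Male (0) | 0.3839 | 0.2615 | | | |
| GPT-4 | No | Female (1) | 0.4425 | 0.1878 | | | |
| GPT-3.5 | Yes | Total | 0.4220 | 0.4680 | 0.0440 | 0.0840 | 0.1280 |
| GPT-3.5 | Yes | Male (0) | 0.4000 | 0.2760 | | | |
| GPT-3.5 | Yes | Female (1) | 0.4440 | 0.1920 | | | |
| GPT-3.5 | No | Total | 0.4660 | 0.2240 | 0.0680 | 0.0960 | 0.1640 |
| GPT-3.5 | No | Male (0) | 0.4120 | 0.2720 | | | |
| GPT-3.5 | No | Female (1) | 0.4800 | 0.1760 | | | |
| Naive Bayes | Yes | Total | 0.2444 | 0.1336 | 0.2473 | 0.0713 | 0.3186 |
| Naive Bayes | Yes | Male (0) | 0.2273 | 0.1427 | | | |
| Naive Bayes | Yes | Female (1) | 0.4746 | 0.0714 | | | |
| Naive Bayes | No | Total | 0.2432 | 0.1333 | 0.1940 | 0.0437 | 0.2377 |
| Naive Bayes | No | Male (0) | 0.2299 | 0.1389 | | | |
| Naive Bayes | No | Female (1) | 0.4239 | 0.0952 | | | |
| SVM | Yes | Total | 0.0572 | 0.3358 | 0.0709 | 0.0763 | 0.1472 |
| SVM | Yes | Male (0) | 0.0523 | 0.3455 | | | |
| SVM | Yes | Female (1) | 0.1232 | 0.2692 | | | |
| SVM | No | Total | 0.0572 | 0.3353 | 0.0650 | 0.0632 | 0.1282 |
| SVM | No | Male (0) | 0.0528 | 0.3434 | | | |
| SVM | No | Female (1) | 0.1178 | 0.2802 | | | |
| Random Forest | Yes | Total | 0.0484 | 0.3850 | 0.0667 | 0.0361 | 0.1028 |
| Random Forest | Yes | Male (0) | 0.0438 | 0.3896 | | | |
| Random Forest | Yes | Female (1) | 0.1105 | 0.3535 | | | |
| Random Forest | No | Total | 0.0481 | 0.4002 | 0.0514 | 0.0346 | 0.0860 |
| Random Forest | No | Male (0) | 0.0446 | 0.4046 | | | |
| Random Forest | No | Female (1) | 0.096 | 0.3700 | | | |
| XGBoost | Yes | Total | 0.0517 | 0.2889 | 0.0650 | 0.0982 | 0.1632 |
| XGBoost | Yes | Male (0) | 0.0473 | 0.3015 | | | |
| XGBoost | Yes | Female (1) | 0.1123 | 0.2033 | | | |
| XGBoost | No | Total | 0.0822 | 0.3365 | 0.0596 | 0.0811 | 0.1407 |
| XGBoost | No | Male (0) | 0.0470 | 0.3731 | | | |
| XGBoost | No | Female (1) | 0.1066 | 0.2920 | | | |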
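
The FPED, FNED, and SUM-ED values appear to follow the usual equality-difference definition: the sum of the absolute gaps between the overall error rate and each gender subgroup's rate, with SUM-ED being FPED plus FNED. The table itself does not state this formula, so the following is a minimal sketch under that assumption. The helper name `equality_difference` is hypothetical, and the inputs are the GPT-4 rows with gender label "Yes".

```python
def equality_difference(total_rate, subgroup_rates):
    """Sum of absolute gaps between the overall rate and each subgroup rate.

    Assumed definition; the table does not spell out the formula.
    """
    return sum(abs(total_rate - rate) for rate in subgroup_rates)


# Rates copied from the GPT-4, gender label "Yes" rows of the table.
fpr = {"total": 0.3972, "male": 0.3765, "female": 0.4178}
fnr = {"total": 0.2324, "male": 0.2630, "female": 0.2018}

fped = equality_difference(fpr["total"], [fpr["male"], fpr["female"]])
fned = equality_difference(fnr["total"], [fnr["male"], fnr["female"]])
sum_ed = fped + fned

# Rounded to four decimals to match the precision used in the table.
print(round(fped, 4), round(fned, 4), round(sum_ed, 4))
```

Under this assumption the sketch reproduces the reported FPED, FNED, and SUM-ED for that row (0.0413, 0.0612, 0.1025). The same check can be repeated for other rows by swapping in their rates.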