Table 1 Quantitative evaluation for camera-only multi-object tracking

From: Towards generalizable and interpretable three-dimensional tracking with inverse neural rendering

| Training data unseen | Method | Car AMOTA ↑ | Car AMOTP (m) ↓ | Car Recall ↑ | Car MOTA ↑ | Motorcycle AMOTA ↑ | Motorcycle AMOTP (m) ↓ | Motorcycle Recall ↑ | Motorcycle MOTA ↑ | Modality |
|---|---|---|---|---|---|---|---|---|---|---|
| × | PF-Track | 0.622 | 0.916 | 0.719 | 0.558 | 0.448 | 1.245 | 0.457 | 0.384 | Camera |
| × | QTrack | 0.692 | 0.753 | 0.760 | 0.596 | 0.531 | 1.098 | 0.861 | 0.500 | Camera |
| × | QD-3DT | 0.425 | 1.258 | 0.563 | 0.358 | 0.253 | 1.543 | 0.437 | 0.243 | Camera |
| ✓ | QD-3DT (trained on WOD) | 0.000 | 1.893 | 0.226 | 0.000 | 0.000 | 2.000 | 0.000 | 0.000 | Camera |
| × (CP) | CenterTrack | 0.202 | 1.195 | 0.313 | 0.134 | 0.011 | 1.636 | 0.141 | 0.033 | Camera |
| ✓ (CP) | AB3DMOT | 0.387 | 1.158 | 0.506 | 0.284 | 0.254 | 1.549 | 0.360 | 0.232 | Camera |
| ✓ (CP) | Inverse neural rendering (ours) | 0.402 | 1.213 | 0.521 | 0.315 | 0.244 | 1.479 | 0.389 | 0.220 | Camera |

  1. Bold entries denote the best and underlined entries the second-best scores for methods that did not train on the dataset or use the same detection backbone.