Table 2 TREC-COVID results.

From: COVID-19 information retrieval with deep-learning based semantic search, question answering, and abstractive summarization

 

Round 1: all submissions (144), all pairs (1.53M) vs. automatic submissions (102), judged pairs (8691)

| Metric | Score (all submissions) | Team rank (all submissions) | Score (automatic) | Team rank (automatic) |
|---|---|---|---|---|
| Bpref | 0.5176 | 2 | 0.5176 | 1 |
| MAP | 0.2401 | 13 | 0.4870 | 1 |
| P@5 | 0.6333 | 19 | 0.8267 | 1 |
| P@10 | 0.5567 | 21 | 0.7933 | 1 |
| nDCG@10 | 0.5445 | 13 | 0.7233 | 1 |

Round 2: all submissions (136), all pairs (2.20M) vs. automatic submissions (73), judged pairs (12,037)

| Metric | Score (all submissions) | Team rank (all submissions) | Score (automatic) | Team rank (automatic) |
|---|---|---|---|---|
| Bpref | 0.5402 | 2 | 0.5232 | 1 |
| MAP | 0.3487 | 1 | 0.5138 | 1 |
| P@5 | 0.8000 | 3 | 0.8171 | 1 |
| P@10 | 0.7200 | 3 | 0.7629 | 1 |
| nDCG@10 | 0.6996 | 1 | 0.7247 | 1 |

Round 3: all submissions (79), all pairs (5.14M) vs. automatic submissions (32), judged pairs (12,713)

| Metric | Score (all submissions) | Team rank (all submissions) | Score (automatic) | Team rank (automatic) |
|---|---|---|---|---|
| Bpref | 0.5665 | 7 | 0.5665 | 1 |
| MAP | 0.3182 | 7 | 0.5385 | 1 |
| P@5 | 0.7800 | 14 | 0.8200 | 2 |
| P@10 | 0.7600 | 12 | 0.7850 | 2 |
| nDCG@10 | 0.6867 | 12 | 0.7065 | 2 |

Round 4: all submissions (72), all pairs (7.10M) vs. automatic submissions (28), judged pairs (13,262)

| Metric | Score (all submissions) | Team rank (all submissions) | Score (automatic) | Team rank (automatic) |
|---|---|---|---|---|
| Bpref | 0.5887 | 7 | 0.5887 | 3 |
| MAP | 0.3436 | 10 | 0.5653 | 3 |
| P@5 | 0.8222 | 14 | 0.8222 | 5 |
| P@10 | 0.7978 | 12 | 0.8133 | 4 |
| nDCG@10 | 0.7391 | 12 | 0.7449 | 6 |

Round 5: all submissions (126), all pairs (9.56M) vs. automatic submissions (49), judged pairs (23,151)

| Metric | Score (all submissions) | Team rank (all submissions) | Score (automatic) | Team rank (automatic) |
|---|---|---|---|---|
| Bpref | 0.5253 | 13 | 0.5253 | 3 |
| MAP | 0.3089 | 14 | 0.4884 | 3 |
| P@5 | 0.8760 | 13 | 0.8760 | 3 |
| P@10 | 0.8260 | 15 | 0.8420 | 3 |
| nDCG@10 | 0.7488 | 16 | 0.7567 | 4 |

  1. Performance evaluation of the COVID-19 search engine on the five rounds of the TREC-COVID challenge dataset. Two contexts are considered. Context 1 ("All submissions", "All pairs") compares our search engine against all participating search engines (manual, feedback, and automatic) over both annotated and non-annotated topic-document pairs. Context 2 ("Automatic submissions", "Judged pairs") compares our search engine strictly against those in its class, automatic search engines, over only the topic-document pairs annotated by experts.
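As a reading aid for the metrics above, the sketch below shows how P@k and nDCG@k can be computed for a single topic, and how the two contexts differ: in the "all pairs" setting, documents without an expert judgment count as non-relevant, while in the "judged pairs" setting the ranking is restricted to judged documents before scoring. This is an illustrative example only, with made-up document IDs and relevance grades; it is not the authors' evaluation code (official TREC-COVID scores are produced with the standard trec_eval tool).

```python
# Illustrative sketch (not the paper's evaluation code): P@k and nDCG@k for one topic.
import math

def precision_at_k(ranked_docs, qrels, k=10):
    """Fraction of the top-k retrieved documents judged relevant (grade > 0)."""
    return sum(1 for d in ranked_docs[:k] if qrels.get(d, 0) > 0) / k

def ndcg_at_k(ranked_docs, qrels, k=10):
    """nDCG@k with a log2 rank discount over graded relevance judgments."""
    dcg = sum(qrels.get(d, 0) / math.log2(i + 2) for i, d in enumerate(ranked_docs[:k]))
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical expert judgments (doc_id -> relevance grade) and a ranked run for one topic.
qrels = {"doc1": 2, "doc2": 0, "doc3": 1, "doc4": 2}
run = ["doc1", "doc5", "doc3", "doc2", "doc4"]

# Context 1 ("all pairs"): unjudged documents (e.g. "doc5") simply count as non-relevant.
print(precision_at_k(run, qrels, k=5), ndcg_at_k(run, qrels, k=5))

# Context 2 ("judged pairs"): drop unjudged documents from the ranking before scoring.
judged_run = [d for d in run if d in qrels]
print(precision_at_k(judged_run, qrels, k=5), ndcg_at_k(judged_run, qrels, k=5))
```

Because Bpref is defined over judged documents only, restricting to judged pairs tends to leave it unchanged, which is consistent with its value often being identical in both contexts of the table, whereas MAP, P@k, and nDCG@k typically increase once unjudged documents are removed.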