Table 2 Averaged results with standard deviations on four domains, where \(d_1 \rightarrow d_2\) indicates the domain adaptation from \(d_1\) (domain of \(D_d\)) to \(d_2\) (domain of \(D_t\)).

From: Subset selection for domain adaptive pre-training of language model

| Domain adaptation | Task dataset | Vanilla model (w/o further pre-training) | RandomSet | SuzukiSet | AlignSet | FullSet |
|---|---|---|---|---|---|---|
| BioMed \(\rightarrow\) BioMed | RCT | 85.71 ± 0.06 | 86.47 ± 0.08 | 86.39 ± 0.08 | 86.52 ± 0.04 | – |
| CS \(\rightarrow\) CS | ACL-ARC | 74.07 ± 5.60 | 75.71 ± 2.01 | 75.56 ± 2.48 | 76.15 ± 1.51 | – |
| News \(\rightarrow\) News | AGNEWS | 93.73 ± 0.19 | 93.90 ± 0.20 | 93.99 ± 0.06 | 94.12 ± 0.13 | 93.84 ± 0.13 |
| News \(\rightarrow\) News | HYPERPARTISAN | 81.81 ± 4.67 | 84.84 ± 4.35 | 84.95 ± 1.26 | 86.45 ± 1.98 | 82.63 ± 2.10 |
| Personality \(\rightarrow\) Personality | First Impressions V2 | 0.1128 ± 0.0013 | 0.1120 ± 0.0004 | 0.1121 ± 0.0004 | 0.1118 ± 0.0001 | – |
| \(\dagger\) Personality \(\rightarrow\) Review | HELPFULNESS | 68.40 ± 0.65 | 68.21 ± 0.39 | 68.19 ± 0.28 | 68.42 ± 0.46 | – |
| \(\dagger\) Personality \(\rightarrow\) Review | IMDB | 93.64 ± 0.01 | 93.32 ± 0.20 | 90.95 ± 0.24 | 93.35 ± 0.04 | – |

  1. We used macro-F1 for the classification tasks, except for the RCT dataset, where micro-F1 was used; mean absolute error (MAE, lower is better) was used for the First Impressions V2 dataset (a brief computational sketch of these metrics follows the notes). \(\dagger\) indicates cross-domain adaptation.
  2. Significant values are in bold.
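Footnote 1 names three evaluation metrics. As a point of reference, below is a minimal Python sketch of how such metrics are typically computed with scikit-learn; the labels, predictions, and per-seed scores are made-up toy values, not numbers from the paper or its evaluation code.

```python
# Illustrative computation of the metrics named in footnote 1 (scikit-learn).
# All labels, predictions, and scores below are hypothetical toy values.
import numpy as np
from sklearn.metrics import f1_score, mean_absolute_error

# Classification tasks (e.g., AGNEWS, ACL-ARC): macro-F1,
# the unweighted mean of per-class F1 scores.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]
print("macro-F1:", f1_score(y_true, y_pred, average="macro"))

# RCT: micro-F1, which pools true/false positives and negatives across
# all classes (equivalent to accuracy in single-label multi-class settings).
print("micro-F1:", f1_score(y_true, y_pred, average="micro"))

# First Impressions V2 (regression): mean absolute error; lower is better.
t_true = np.array([0.52, 0.61, 0.48])
t_pred = np.array([0.50, 0.58, 0.55])
print("MAE:", mean_absolute_error(t_true, t_pred))

# Each table cell reports mean ± standard deviation over repeated runs;
# the number of runs and the std convention (population vs. sample) are
# not stated in this excerpt, so ddof=0 here is an assumption.
scores = np.array([86.47, 86.55, 86.39])  # hypothetical per-seed scores
print(f"{scores.mean():.2f} ± {scores.std(ddof=0):.2f}")
```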