Fig. 2
From: Subset selection for domain adaptive pre-training of language model

Overview of AlignSet. Here \(\hat{D}_d\) is a subset of \(D_d\), \(\hat{M}_d\) is a domain-adaptive language model, and \(V_d\) and \(V_g\) are the embeddings generated by \(M_d\) and \(M_g\), respectively. \(U_d\) and \(U_g\) are these embeddings after the alignment layer projects them into the unified embedding space.
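
The projection step in the caption can be illustrated with a minimal sketch. The caption does not say what form the alignment layer takes, so everything here is an assumption made for illustration: one linear map per model, the embedding sizes (768 and 1024), the unified dimension (256), the L2 normalisation, and the helper names `make_alignment_layer` and `project`. Random arrays stand in for the real \(V_d\) and \(V_g\).

```python
import numpy as np


def make_alignment_layer(in_dim, out_dim, rng):
    """Hypothetical alignment layer: one linear map (an assumption, not from the caption)."""
    weights = rng.normal(scale=1.0 / np.sqrt(in_dim), size=(in_dim, out_dim))
    bias = np.zeros(out_dim)
    return weights, bias


def project(embeddings, layer):
    """Project embeddings into the shared space and L2-normalise each row."""
    weights, bias = layer
    projected = embeddings @ weights + bias
    norms = np.linalg.norm(projected, axis=1, keepdims=True)
    return projected / np.clip(norms, 1e-12, None)


rng = np.random.default_rng(0)

# Stand-ins for the embeddings from M_d and M_g (sizes are assumed).
V_d = rng.normal(size=(4, 768))
V_g = rng.normal(size=(4, 1024))

# One layer per model, both mapping into the same 256-dimensional space.
layer_d = make_alignment_layer(768, 256, rng)
layer_g = make_alignment_layer(1024, 256, rng)

U_d = project(V_d, layer_d)
U_g = project(V_g, layer_g)

# Once both sets live in one space, they can be compared directly,
# for example with cosine similarity (rows are already unit length).
similarity = U_d @ U_g.T
```

The point of the sketch is only that \(V_d\) and \(V_g\) may have different dimensions and are not directly comparable, whereas \(U_d\) and \(U_g\) share one space. In practice the layer weights would be learned rather than drawn at random.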