Fig. 5: Common sequence-based language modeling architectures: BERT⁶⁸ as an example.
From: Application of Artificial Intelligence In Drug-target Interactions Prediction: A Review

The BERT model, read from top to bottom, first encodes the input sequence into token (textual) and learned positional embeddings, giving each token absolute-position information. The encoder is then pretrained with two objectives: masked language modeling, which predicts randomly masked tokens from their bidirectional context, and next sentence prediction, which handles sentence-level classification by judging whether two sentences are consecutive.
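To make the two pretraining objectives concrete, the sketch below (illustrative only, not code from the reviewed work) queries a pretrained BERT with the Hugging Face `transformers` library, assuming the public `bert-base-uncased` checkpoint; the example sentences are hypothetical.

```python
# Minimal sketch of BERT's two pretraining objectives, assuming the
# `transformers` library and the public bert-base-uncased checkpoint.
import torch
from transformers import (BertTokenizer, BertForMaskedLM,
                          BertForNextSentencePrediction)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Masked language modeling: predict the token hidden behind [MASK]
# from its bidirectional context.
mlm_model = BertForMaskedLM.from_pretrained("bert-base-uncased")
inputs = tokenizer("The drug binds to the [MASK] protein.",
                   return_tensors="pt")
with torch.no_grad():
    logits = mlm_model(**inputs).logits
mask_pos = (inputs["input_ids"] ==
            tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print("MLM prediction:", tokenizer.decode(predicted_id))

# Next sentence prediction: a sentence-level classification that
# scores whether sentence B follows sentence A (label 0 = "is next").
nsp_model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
pair = tokenizer("Aspirin inhibits COX enzymes.",
                 "This reduces prostaglandin synthesis.",
                 return_tensors="pt")
with torch.no_grad():
    nsp_logits = nsp_model(**pair).logits
print("P(B follows A):", torch.softmax(nsp_logits, dim=-1)[0, 0].item())
```

The tokenizer supplies the token, position, and segment indices for both objectives; the same pretrained encoder is reused for each task, with only a small task-specific head on top.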