Fig. 4
From: Large-scale transformer-based topic graphs identify thematic links between engineering and biology

Filtering results for the dataset of 101 million abstracts, illustrating the pipeline's step-by-step approach. The figure shows each stage of dataset refinement, from preprocessing through domain-specific classification, that yields a final subset of 126,012 abstracts focused on engineering and biology.
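The staged refinement in this figure can be pictured as a filtering funnel that records how many abstracts survive each stage. The following is a minimal sketch under stated assumptions, not the authors' pipeline. The `Stage` and `run_funnel` names, the five-word minimum used as a "preprocessing" rule, and the keyword check are all hypothetical. The keyword check only stands in for the domain-specific classifier, which this caption does not describe.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Stage:
    """One hypothetical filtering stage: a name plus a keep/drop predicate."""
    name: str
    keep: Callable[[str], bool]


def run_funnel(
    abstracts: Iterable[str], stages: List[Stage]
) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Apply stages in order; return survivors and the count after each stage."""
    current = list(abstracts)
    counts = [("input", len(current))]
    for stage in stages:
        # Each stage only sees what the previous stage kept, so counts never grow.
        current = [a for a in current if stage.keep(a)]
        counts.append((stage.name, len(current)))
    return current, counts


# Toy corpus (illustrative only; not data from the paper).
docs = [
    "Bio-inspired robotic grippers mimic octopus suckers.",
    "",
    "Short.",
    "A survey of medieval poetry and its meter.",
    "Protein folding informs the design of self-assembling materials.",
]

stages = [
    # Hypothetical preprocessing rule: drop empty or very short abstracts.
    Stage("preprocessing", lambda a: len(a.split()) >= 5),
    # Hypothetical keyword stand-in for a domain-specific classifier.
    Stage(
        "domain classification",
        lambda a: any(k in a.lower() for k in ("bio", "protein", "cell")),
    ),
]

kept, counts = run_funnel(docs, stages)
for name, n in counts:
    print(f"{name}: {n}")
```

In a real pipeline, each predicate would be replaced by the actual preprocessing rules and a trained classifier. The per-stage counts are the quantities a funnel figure like this one visualizes.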