Fig. 1: Overview of the PGAP2 workflow.

a The flowchart depicts four sequential stages of the PGAP2 pipeline, arranged from top to bottom: data preprocessing, quality control, gene clustering, and postprocessing analysis. b The core algorithm of PGAP2 begins by constructing an identity network and a synteny map, which serve as the foundational data structures for ortholog inference. Following regional refinement, PGAP2 iteratively merges nodes based on gene cluster diversity, connectivity, and Bidirectional Best Hit (BBH) criteria. The abbreviation “sp.” in this figure refers to “species”.
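
To illustrate the BBH criterion named in panel b, the following is a minimal sketch and not PGAP2's actual implementation. It assumes a hypothetical input format: a dictionary of directed pairwise identity scores keyed by `(query, subject)`, with gene identifiers written as `genome|gene`. The names `genome_of`, `best_hits`, and `bidirectional_best_hits` are introduced here for illustration only. Two genes from different genomes form a BBH pair when each is the other's highest-scoring hit in the opposite genome.

```python
def genome_of(gene):
    """Return the genome prefix of an identifier like 'A|g1' (assumed format)."""
    return gene.split("|", 1)[0]


def best_hits(identity):
    """For each (query gene, target genome), keep the highest-identity subject.

    Ties are resolved in favour of the first hit encountered.
    """
    best = {}
    for (query, subject), score in identity.items():
        if genome_of(query) == genome_of(subject):
            continue  # only hits between different genomes are considered
        key = (query, genome_of(subject))
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return {key: subject for key, (subject, _) in best.items()}


def bidirectional_best_hits(identity):
    """Return the set of gene pairs that are each other's best hit."""
    best = best_hits(identity)
    pairs = set()
    for (query, _), subject in best.items():
        # The pair is reciprocal if the subject's best hit in the
        # query's genome is the query itself.
        if best.get((subject, genome_of(query))) == query:
            pairs.add(tuple(sorted((query, subject))))
    return pairs


# Toy identity network with two genomes, A and B (made-up scores).
identity = {
    ("A|g1", "B|g1"): 0.98, ("B|g1", "A|g1"): 0.98,
    ("A|g1", "B|g2"): 0.71, ("B|g2", "A|g1"): 0.71,
    ("A|g2", "B|g2"): 0.65, ("B|g2", "A|g2"): 0.65,
}

print(sorted(bidirectional_best_hits(identity)))  # [('A|g1', 'B|g1')]
```

In this toy example, `A|g2` selects `B|g2` as its best hit, but `B|g2` prefers `A|g1`, so the pair is not reciprocal and is excluded. The diversity and connectivity criteria mentioned in the caption are not modelled here.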