Fig. 1

For drug nodes, SMILES molecular structures are extracted and converted into MACCS molecular fingerprints, followed by dimensionality reduction using principal component analysis (PCA) [22] to obtain low-dimensional vector representations. For protein nodes, amino acid compositions and dipeptide frequencies are computed from their sequences and similarly reduced via PCA. For side effect and disease nodes, which lack inherent structural or sequential information, the Node2vec algorithm [23] is adopted to generate embeddings. Specifically, a heterogeneous biomedical network is constructed based on known interactions, including drug–drug interactions, drug–disease associations, drug–side effect associations, protein–protein interactions, and protein–disease associations. Node2vec is applied to this network to capture the contextual semantics of side effect and disease entities through biased random walks.
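The protein descriptors described above can be illustrated with a minimal sketch. It computes the 20-dimensional amino acid composition and the 400-dimensional dipeptide frequency vector for one sequence, then concatenates them into a 420-dimensional feature vector. The function names, the residue ordering, the normalization to relative frequencies, and the skipping of non-standard residues are all assumptions made for illustration; the text does not specify these details. The subsequent PCA step is omitted here.

```python
from itertools import product

# The 20 standard amino acids in a fixed order, so feature positions are stable.
# (The ordering is an assumption; any fixed ordering would work.)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Relative frequency of each standard residue (20 values)."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)  # non-standard symbols such as 'X' are ignored
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def dipeptide_frequencies(seq):
    """Relative frequency of each ordered adjacent residue pair (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for first, second in zip(seq, seq[1:]):
        pair = first + second
        if pair in counts:  # skip pairs containing a non-standard symbol
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def protein_features(seq):
    """Concatenate both descriptors into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_frequencies(seq)


if __name__ == "__main__":
    features = protein_features("ACAC")
    print(len(features))  # 420
```

Stacking one such vector per protein gives a matrix with 420 columns, which could then be passed to a PCA implementation to obtain the low-dimensional representation.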