Table 9 Preprocessing scripts to prepare datasets ingested into Petagraph.

From: Petagraph: A large-scale unifying knowledge graph framework for integrating biomolecular and biomedical data

| Preprocess | Script |
| --- | --- |
| Creates Concept node and edge files for 4DN human chromosome loops based on 4DN dot call files and connects them to HSCLO Concept nodes | 4DN_LOOP.R |
| Creates Concept node and edge files for 4DN human chromosome Q values based on 4DN dot call files | 4DN_Q.R |
| Creates node and edge files for the human embryonic heart single-cell marker data from Asp2019 | ASP2019.ipynb |
| Creates edge files for ClinVar links between human genes, diseases and phenotypes | CLINVAR.R |
| Creates edge files for CMAP relationships between compounds and human genes | CMAP.R |
| Creates Concept node and edge files for TPM gene expression and eQTL data from the GTEx project | GTEX.ipynb |
| Creates edge files for GTEXCOEXP relationships between human genes | GTEXCOEXP.R |
| Creates edge files between HPO and HGNC from the HPO project | HGNCHPO.ipynb |
| Creates edge files between HPO and MP Concept nodes based on PhenKnowLator output | HPOMP.ipynb |
| Creates edges between ENSEMBL and HSCLO | HSCLO_GENCODE.R |
| Workflows to process Kids First phenotype and genotype count data | KF_main.ipynb |
| Creates edge files for L1000 relationships between compounds and human genes | L1000.R |
| Creates node and edge files for the mouse data from the IMPC | MPMGI.ipynb |
| Creates node and edge files for MSigDB linking genes to pathways | MSIGDB.R |
| Creates edge files for STRING relationships between human proteins (UniProt IDs) | STRING.R |

  1. All scripts can be found under https://github.com/TaylorResearchLab/Petagraph/tree/main/Scientific_Data_2024/code/preprocessing/.
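
Most of the scripts in Table 9 produce node files, edge files, or both. As a rough illustration only, the Python sketch below turns a few placeholder (subject, predicate, object) triples into one node table and one edge table and serializes them as tab-separated text. The column names (`node_id`, `subject`, `predicate`, `object`), the placeholder identifiers, and the helpers `build_nodes_and_edges` and `to_tsv` are all assumptions made for this example; they are not taken from the Petagraph scripts, whose actual file schemas should be checked in the repository.

```python
import csv
import io


def build_nodes_and_edges(triples):
    """Split (subject, predicate, object) triples into node rows and edge rows.

    Each distinct identifier becomes exactly one node row (first-seen order);
    each triple becomes one edge row. Column names are hypothetical.
    """
    nodes = {}
    edges = []
    for subject, predicate, obj in triples:
        nodes.setdefault(subject, {"node_id": subject})
        nodes.setdefault(obj, {"node_id": obj})
        edges.append({"subject": subject, "predicate": predicate, "object": obj})
    return list(nodes.values()), edges


def to_tsv(rows, fieldnames):
    """Serialize a list of dict rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder triples, not real data: two genes linked to one pathway.
example_triples = [
    ("GENE:A", "in_pathway", "PATHWAY:X"),
    ("GENE:B", "in_pathway", "PATHWAY:X"),
]

node_rows, edge_rows = build_nodes_and_edges(example_triples)
nodes_tsv = to_tsv(node_rows, ["node_id"])
edges_tsv = to_tsv(edge_rows, ["subject", "predicate", "object"])
```

Keeping nodes in a dictionary keyed by identifier means that an entity referenced by many edges is still written only once to the node table, which is the usual reason to emit nodes and edges as separate files.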