Fig. 2: Using GGD data packages. | Nature Communications


From: Go Get Data (GGD) is a framework that facilitates reproducible access to genomic data


a Data recipe environment variables let one use installed data files without knowing where the files are stored or how to retrieve them. For example, after installing the grch38-coding-exons-ensembl-v1 and grch38-reference-genome-ensembl-v1 data packages, one could compute the complement of the coding exons with respect to the reference genome by passing each data file’s unique environment variable to the “bedtools complement” command. The same environment variables can be used in any number of analyses with different bioinformatic tools or scripts. b Using the “get-files” command, one can perform the same analysis on coding exons as in panel a. Data package environment variables are only available within the environment where the packages were installed. The “get-files” command, by contrast, provides access to data files installed by GGD and stored in either the currently active conda environment or a different, non-active conda environment; access to files in a different environment is enabled by the “--prefix” argument. This allows a user to install and store all data packages in a single conda environment while accessing them from any other environment where GGD is installed. GGD commands are shown in orange; environment variables that refer to GGD data package files are shown in blue.
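
The environment-variable lookup described in panel a can be sketched in shell. This is a minimal illustration, not GGD's own code: the helper functions `ggd_var_name` and `ggd_path` are hypothetical, and the naming pattern they rely on (a `ggd_` prefix, the package name with hyphens replaced by underscores, and a `_file` or `_dir` suffix) is an assumption that should be checked against the variables actually exported in the active environment.

```shell
# Hypothetical helper: build the variable name we ASSUME GGD exports for a
# data package ("ggd_" + name with hyphens as underscores + "_file" or "_dir").
ggd_var_name() {
    # $1 = package name, $2 = "file" or "dir" (defaults to "file")
    printf 'ggd_%s_%s\n' "$(printf '%s' "$1" | tr '-' '_')" "${2:-file}"
}

# Hypothetical helper: print the path stored in that variable. It fails when
# the variable is unset, for example when the package was installed in a
# conda environment other than the active one.
ggd_path() {
    _ggd_var="$(ggd_var_name "$1" "${2:-file}")"
    # Refuse names that would not form a valid shell variable name.
    case "$_ggd_var" in
        *[!A-Za-z0-9_]*) echo "unexpected package name: $1" >&2; return 2 ;;
    esac
    # Indirect lookup of the variable whose name is held in _ggd_var.
    eval "_ggd_val=\${$_ggd_var:-}"
    if [ -z "$_ggd_val" ]; then
        echo "$_ggd_var is not set; is the right conda environment active?" >&2
        return 1
    fi
    printf '%s\n' "$_ggd_val"
}
```

With the packages installed, the resolved exon path could then be passed to `bedtools complement` through its `-i` option, together with a genome file through `-g`. When the lookup fails because the package lives in another environment, the “get-files” command with the “--prefix” argument described in panel b is the alternative.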
