Fig. 1: Overview of the creation and use of GGD data recipes. | Nature Communications

From: Go Get Data (GGD) is a framework that facilitates reproducible access to genomic data

a GGD creates a data recipe from a Bash script, which defines the steps taken to access, process, and curate the desired data files. (1) The “ggd make-recipe” command incorporates the Bash script and additional auto-generated files into a complete data recipe. (2) The “ggd check-recipe” command executes required tests and validates the created data recipe. (3) Once a data recipe has been tested and validated, it can be added to the GGD data recipe repository on GitHub. (4) Each data recipe is further tested via an automatic continuous integration system. If validated, the recipe is transitioned into a data package, which is added to the Anaconda Cloud, and the resulting data files are cached on AWS storage.

b Validated data packages can be found via the GGD command-line interface. For example, to find all data packages associated with “grch38” or “hg38” and the keyword “cpg” one would use “ggd search” with “grch38”, “hg38”, and “cpg” as search terms. GGD will identify and return all data packages within the GGD library that are associated with the search terms provided.

c The desired data package is installed via the “ggd install” command. If the data files are cached, they are downloaded directly. If the data package must be built from the recipe, GGD follows the instructions within the recipe while accounting for both software and data dependencies. Installation ends with tracking the version of the installed data package and the creation of local environment variables that facilitate the use of installed data packages.

GGD commands are in orange, GGD data packages are in blue.
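The search and install logic described in panels b and c can be illustrated with a small Python model. This is only a sketch under stated assumptions, not GGD's implementation: the `DataPackage` class, the `search` and `install_plan` functions, and every package name in `LIBRARY` are hypothetical and exist purely for illustration. It assumes a search matches packages whose genome build is one of the requested builds and whose keywords contain the requested term, and that installation either downloads cached files or builds from the recipe after resolving dependencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPackage:
    """Hypothetical stand-in for the metadata of a GGD data package."""
    name: str
    genome_build: str
    keywords: frozenset
    cached: bool = False
    dependencies: tuple = ()


def search(packages, builds, keyword):
    """Return packages whose build is in `builds` and whose keywords contain `keyword`."""
    wanted_builds = {b.lower() for b in builds}
    wanted_keyword = keyword.lower()
    return [
        p for p in packages
        if p.genome_build.lower() in wanted_builds
        and any(wanted_keyword in k.lower() for k in p.keywords)
    ]


def install_plan(pkg):
    """List the install steps from panel c as plain strings (nothing is executed)."""
    if pkg.cached:
        # Cached data files are downloaded directly.
        steps = [f"download cached files for {pkg.name}"]
    else:
        # Otherwise dependencies are resolved first, then the recipe is run.
        steps = [f"install dependency {d}" for d in pkg.dependencies]
        steps.append(f"run recipe for {pkg.name}")
    # Both paths end with version tracking and environment variables.
    steps.append(f"record installed version of {pkg.name}")
    steps.append(f"create environment variables for {pkg.name}")
    return steps


# Illustrative library; these names are invented, not real GGD packages.
LIBRARY = [
    DataPackage("example-grch38-cpg-islands", "GRCh38",
                frozenset({"cpg", "islands"}), cached=True),
    DataPackage("example-hg38-cpg-islands", "hg38",
                frozenset({"CpG"}), dependencies=("example-tool",)),
    DataPackage("example-hg19-cpg-islands", "hg19", frozenset({"cpg"})),
    DataPackage("example-grch38-genes", "GRCh38", frozenset({"genes"})),
]

if __name__ == "__main__":
    for package in search(LIBRARY, ["grch38", "hg38"], "cpg"):
        print(package.name)
        for step in install_plan(package):
            print("  ", step)
```

In this model the cached package yields a three-step plan, while the uncached one first lists its dependency and the recipe run, mirroring the two installation paths in panel c.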
