Figure 2: The process of data acquisition, curation, adjustment, reformatting and modeling. | Nature Genetics

From: Developing predictive molecular maps of human disease through community-based modeling

Data flows into the repository from a number of different sources (examples are shown). Individual datasets typically contain different types of data and are submitted in various formats. Curation involves reformatting the data into a common tab-delimited text matrix format. This curated standard format is available for download and allows workflows to be developed for common manipulations (for example, adjustments for technical covariates, such as gene expression array batch). The 'curated and adjusted' dataset is also available for download. Data analysts or modelers may use either the curated data or the curated and adjusted data for downstream analyses; the key feature is that the version of the dataset used for an analysis is stored, along with the underlying code and workflow. Allowing different types of users to interact with the data at different points in the process has advantages. For example, providing tools that curate a dataset into a standard format gives the user the benefit of easy curation and opens up tools for downstream quality control and analysis.
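
The curation, adjustment and versioning steps described in the caption can be illustrated with a minimal Python sketch. It is not the repository's actual tooling: the function names (`curate`, `adjust_for_batch`, `version_id`), the long-format input records, the sample-to-batch mapping and the toy values are all hypothetical, and per-batch mean centering is used only as a simple stand-in for whatever covariate adjustment a real workflow would apply. The sketch also assumes every gene has a value for every sample.

```python
import csv
import hashlib
import io
from statistics import mean


def curate(records):
    """Pivot (gene, sample, value) records into a tab-delimited text matrix.

    Assumes the records are complete: one value per gene/sample pair.
    """
    genes = sorted({gene for gene, _, _ in records})
    samples = sorted({sample for _, sample, _ in records})
    lookup = {(gene, sample): value for gene, sample, value in records}
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene"] + samples)
    for gene in genes:
        writer.writerow([gene] + [lookup[(gene, sample)] for sample in samples])
    return buf.getvalue()


def adjust_for_batch(matrix_text, batch_of):
    """Subtract each gene's per-batch mean (illustrative stand-in for batch adjustment)."""
    rows = list(csv.reader(io.StringIO(matrix_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    samples = header[1:]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in body:
        values = [float(x) for x in row[1:]]
        batch_means = {
            batch: mean(v for s, v in zip(samples, values) if batch_of[s] == batch)
            for batch in set(batch_of[s] for s in samples)
        }
        adjusted = [v - batch_means[batch_of[s]] for s, v in zip(samples, values)]
        writer.writerow([row[0]] + [f"{a:.3f}" for a in adjusted])
    return out.getvalue()


def version_id(text):
    """Short content hash, so an analysis can record exactly which dataset version it used."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


# Hypothetical submission in long format, with a hypothetical batch assignment.
records = [
    ("TP53", "s1", 5.0), ("TP53", "s2", 7.0), ("TP53", "s3", 10.0), ("TP53", "s4", 12.0),
    ("BRCA1", "s1", 2.0), ("BRCA1", "s2", 4.0), ("BRCA1", "s3", 3.0), ("BRCA1", "s4", 3.0),
]
batch_of = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}

curated = curate(records)
adjusted = adjust_for_batch(curated, batch_of)
print(curated)
print(adjusted)
print("curated version:", version_id(curated))
print("adjusted version:", version_id(adjusted))
```

Because the curated and the adjusted matrices hash to different identifiers, an analysis that stores the identifier alongside its code records which of the two dataset versions it was run on.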
