DICOM datasets for reproducible neuroimaging research across manufacturers and software versions

Rorden, Christopher; Béranger, Benoît; Cheng, Hu; Clemence, Matthew; Debacker, Clément; Fernandez, Brice; Halchenko, Yaroslav O.; Harms, Michael P.; Holla, Bharath; Innis, Isaiah; Kuijer, Joost P. A.; Levitas, Daniel; Litinas, Krisanne; Luci, Jeffrey; Newman-Norlund, Roger; Peltier, Scott; Rehwald, Wolfgang; Reid, Robert I.; Rogers, Baxter; Schwarz, Christopher G.; Shin, Jaemin; Ganesan, Venkatasubramanian; Ganji, Sandeep; Morgan, Paul S.

doi:10.1038/s41597-025-05503-w


Data Descriptor
Open access
Published: 09 July 2025


Christopher Rorden ORCID: orcid.org/0000-0002-7554-6142¹,
Benoît Béranger²,
Hu Cheng³,
Matthew Clemence⁴,
Clément Debacker⁵,
Brice Fernandez⁶,
Yaroslav O. Halchenko⁷,
Michael P. Harms⁸,
Bharath Holla⁹,
Isaiah Innis³,
Joost P. A. Kuijer ORCID: orcid.org/0000-0002-4181-0427¹⁰,
Daniel Levitas ORCID: orcid.org/0000-0003-2279-7447³,
Krisanne Litinas¹¹,
Jeffrey Luci¹²,¹³,
Roger Newman-Norlund¹,
Scott Peltier¹¹,
Wolfgang Rehwald¹⁴,
Robert I. Reid¹⁵,
Baxter Rogers¹⁶,¹⁷,¹⁸,¹⁹,
Christopher G. Schwarz ORCID: orcid.org/0000-0002-1466-8357¹⁵,
Jaemin Shin²⁰,
Venkatasubramanian Ganesan⁹,
Sandeep Ganji¹⁵,²¹ &
Paul S. Morgan²²,²³

Scientific Data volume 12, Article number: 1168 (2025)


Abstract

DICOM is an industry standard for medical imaging data, designed for interoperability across systems. This enables transfer, storage and processing of imaging data regardless of the manufacturer. Pragmatically, manufacturers often store detailed acquisition parameters in private rather than public DICOM tags. In parallel, the DICOM standard itself has gradually evolved by introducing new public tags and properties to better capture emerging imaging technologies. Accurately extracting these details is essential for reproducible neuroimaging research. To address this need, we created a series of DICOM datasets illustrating how various manufacturers encode acquisition details that are critical for modern processing and analysis. These minimal test cases, covering CT and MR modalities, highlight manufacturer-specific conventions, including the use of public tags, private tags, and proprietary data structures. For each DICOM dataset, we provide corresponding NIfTI-formatted images with metadata JSON files following the BIDS standard, using consistent terminology to mitigate variations in how manufacturers encode acquisition details. Our repository provides validation datasets for any tool intended to extract acquisition details from medical imaging data.


Background & Summary

Reproducibility is a critical challenge in neuroimaging research¹. Most analyses involve multiple stages of image processing and complex statistical modeling to mitigate noise and identify meaningful signals². These processes require precise knowledge of acquisition parameters, such as slice timing and phase encoding polarity. Studies aiming to aggregate data across sites must also address variability between scanners in order to ensure generalizability³. Consequently, neuroimaging researchers must be able to reliably extract details about the acquisition parameters. This task is facilitated by the Digital Imaging and Communications in Medicine (DICOM) standard⁴, which dominates medical imaging, promoting interoperability across tools and manufacturers. However, the rapid evolution of imaging technologies often outpaces consensus-based updates to the standard, leading manufacturers to use self-defined (“private”) metadata tags, which are, ideally, later integrated into the DICOM definition as standardized (“public”) tags. We provide a comprehensive collection of DICOM images spanning various manufacturers, modalities, and software versions to address this challenge. We also offer ground truth values for the imaging parameters that are crucial for reproducibility. These datasets enable tool developers to ensure robust and reproducible analyses of neuroimaging data.

Historically, each neuroimaging team used its own idiosyncratic method to provide sequence details for analysis with neuroimaging pipelines. The Brain Imaging Data Structure (BIDS)⁵,⁶ provided a more standardized framework for organizing and describing imaging datasets, defining the imaging format (voxel intensities stored in NIfTI), imaging parameters (in human-readable JSON text files using manufacturer-agnostic terminology), and file naming (conveying the intended purpose of each image), as well as the relevant non-imaging details of an experiment (e.g., participant behavioral data and demographics). The BIDS format allows for automated analyses of datasets regardless of scale, aids reproducibility, and facilitates data sharing and reuse. While this intentionally constrained format is considerably simpler than DICOM, the source data are still acquired in DICOM format, so adopting the BIDS structure does not remove the arduous⁷ task of accurately extracting acquisition details from raw imaging data. Therefore, repositories that share neuroimaging data in BIDS format⁸ require data providers to extract acquisition details prior to sharing, while DICOM repositories⁹,¹⁰ require data users to extract imaging parameters prior to analysis. These needs have become increasingly acute as public funding agencies expect scientists to share large datasets and clinical teams aggregate huge datasets to empower precision medicine. We aim to support both approaches, providing validation datasets that ensure that imaging data and metadata can be correctly extracted from DICOM data regardless of scanner manufacturer and software version. While our team has focused on generating domain-specific BIDS/NIfTI formats, the resulting validation repositories also support efforts to extract standardized parameters across manufacturers for teams that choose to retain the DICOM format¹¹.

Prior work on this topic includes the seminal “Rosetta bit” project¹², which highlighted the importance of validation datasets in neuroimaging and provided gold-standard conversions of DICOM to NIfTI images. However, that project predated BIDS; therefore, while it validated voxel intensities and spatial properties, it did not provide the sequence details crucial for image processing within a site and for data harmonization across sites. Likewise, Rutherford and colleagues¹³ synthesized DICOM datasets to evaluate the performance of de-identification algorithms, but did not address acquisition details. Our datasets extend these traditions, providing updated resources that support modern interpretations of the DICOM standard in general, as well as the more recently introduced enhanced DICOM format¹⁴.

Therefore, our overarching objective is to provide validation DICOM datasets that demonstrate how different manufacturers and different software versions store acquisition parameters. Our datasets include both the original DICOM files and the known solutions for critical acquisition parameters (as text files following the BIDS specification). They provide minimal test cases for understanding manufacturer-specific conventions, including private DICOM tags, and demonstrate edge cases that require careful interpretation. These repositories aid in developing and maintaining tools that read DICOM images, such as dcm2niix, dicm2nii and SPM¹⁵. By providing BIDS-compatible NIfTI images alongside standardized metadata, we offer a practical resource for improving data conversion and enhancing reproducibility.

Methods

We have assembled a collection of 36 distinct DICOM modules, publicly available on Zenodo¹⁶ (Table 1; with mirrors on GitHub), designed to illustrate the diversity of images observed in the neuroimaging domain. Where possible, the datasets use low-resolution images and relatively few volumes to provide concise examples of specific use cases. This contrasts with traditional research repositories, where high spatial resolution and many observations are considered beneficial. We have curated our examples into separate repositories, each highlighting a particular challenge; the rationale for each repository is given in its “README.md” text file. A brief overview of how DICOM and BIDS store sequence information provides context for interpreting these validation datasets.

Table 1 DICOM modules with validated conversion to a harmonized terminology defined by the BIDS specification.


While DICOM files can contain many classes of data (e.g., sounds, waveforms, text documents)⁴, here we focus on files that contain images. A DICOM image file contains both the image data (voxel intensity values) and a series of tags that describe the image acquisition. Each DICOM tag is identified by a pair of 16-bit numbers, referred to as the “group” and “element”, conventionally written in hexadecimal in the form <gggg,eeee>. Tags with even-numbered groups are public tags defined by the DICOM specification. Some public tags are required for a specific image modality (for example, the numeric <0018,0081> “Echo Time” is required for MR images), while others are optional (e.g., the string <0018,1020> “Software Versions”). In contrast, tags with odd-numbered groups are private tags that the manufacturer can define. In the same way that typed programming languages declare variables as strings, integers or floating-point numbers, each DICOM tag is associated with a ‘Value Representation’ (VR) that defines the type of the data. While most DICOM tags store values of a fixed type (e.g., a string, or an array of integers), some manufacturers use the VR ‘Other Byte’ (OB) to store sequence details in their own proprietary formats.

Classic DICOM objects typically store only a single 2D slice in each file. For 4D time series such as functional MRI and diffusion imaging, this can result in tens of thousands of files for a single imaging series. Some manufacturers (Siemens and United Imaging Healthcare) provide the option to save all slices from a 3D volume as tiles (‘mosaics’) in a single DICOM file, dramatically reducing the number of files per series (at the cost of non-compliance with the DICOM standard, and requiring tools to rearrange these 2D mosaics into a 3D volume). More recently, the ‘enhanced DICOM’ specification¹⁴ allows saving multi-frame data, where entire 3D and 4D series are stored in a single DICOM file. While complex, the DICOM standard provides tremendous flexibility and can work across a broad range of medical images.
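To make these conventions concrete, here is a minimal Python sketch that parses a tag written as <gggg,eeee>, classifies it as public or private by the parity of its group number, and splits a mosaic into individual slices. The mosaic layout (a square grid with ceil(sqrt(n)) tiles per row, filled left to right and top to bottom) is an assumption for illustration; a real converter must read the number of slices and the tile size from the file's own metadata. All function names are hypothetical.

```python
import math


def parse_tag(text):
    """Parse a tag written as '<gggg,eeee>' into an integer (group, element) pair."""
    group_hex, element_hex = text.strip("<> ").split(",")
    return int(group_hex.strip(), 16), int(element_hex.strip(), 16)


def is_private(tag):
    """Odd group numbers are private (manufacturer-defined); even groups are public."""
    group, _element = tag
    return group % 2 == 1


def demosaic(mosaic, tile_rows, tile_cols, n_slices):
    """Split a 2D mosaic (a list of pixel rows) into a list of 2D slices.

    Assumes the tiles form a square grid with ceil(sqrt(n_slices)) tiles per row,
    filled left to right and top to bottom; unused trailing tiles are ignored.
    """
    tiles_per_row = math.ceil(math.sqrt(n_slices))
    volume = []
    for s in range(n_slices):
        top = (s // tiles_per_row) * tile_rows
        left = (s % tiles_per_row) * tile_cols
        tile = [row[left:left + tile_cols] for row in mosaic[top:top + tile_rows]]
        volume.append(tile)
    return volume
```

For example, `parse_tag("<0018,0081>")` returns `(24, 129)`, and `is_private(parse_tag("<0029,1010>"))` is `True`.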

The BIDS specification stores computed tomography (CT) and magnetic resonance imaging (MRI) data in two files. The pixel data and spatial properties (e.g., slice angulation) are saved in a binary NIfTI file (which itself contains limited metadata), while other acquisition metadata are stored in a human-readable JSON text file of key:value pairs, for example the numeric value ‘“EchoTime”: 0.03’ and the string value ‘“SoftwareVersions”: “syngo MR B17”’. BIDS stores 3D anatomical images as a single file, and most 4D functional and diffusion time series as a single file (though for multi-echo time series, each echo is stored as a separate file). This format has a more limited scope and is more constrained than DICOM. For example, all slices in a 3D NIfTI volume must be equidistant, while the DICOM format allows for variable slice distances. Likewise, DICOM supports many formats for compressing the voxel data (referred to as transfer syntaxes), many of which are essentially unique to medical imaging (e.g., lossless JPEG formats with 16-bit precision were not widely adopted outside DICOM). In contrast, BIDS images are either uncompressed or use the old but ubiquitous gzip file-level compression. Because BIDS JSON files are human-readable, floating-point numbers must be stored as ASCII text, which can introduce rounding errors and differences in precision compared to binary representations. This limitation is particularly relevant for our validation datasets, as tools that use different conventions for storing values may produce minor discrepancies relative to the provided reference values. As a result, a degree of tolerance may be necessary to distinguish meaningful errors from negligible differences in value representation.

Common terminology across manufacturers

Many terms are clearly defined by the DICOM standard and are unambiguous across manufacturers. Since both DICOM and the BIDS specification attempt to describe the parameters used to acquire imaging data, it is unsurprising that many DICOM public tags map directly to BIDS keys (Table 2). However, different manufacturers interpret some public DICOM tags differently. For example, some MR sequences (such as MP2RAGE) have both a short interval between successive excitations and a much longer interval between successive preparation pulses, and some manufacturers report the short interval while others report the long one in the public tag <0018,0080> “Repetition Time”. In contrast, the BIDS standard attempts to disambiguate these intervals, hence the separate BIDS keys “RepetitionTime”, “RepetitionTimeExcitation” and “RepetitionTimePreparation”.
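A one-to-one mapping of this kind can be written as a lookup table. The sketch below uses a small, illustrative subset of public tags (not a reproduction of Table 2) and converts the millisecond times stored by DICOM into the seconds expected by BIDS. As noted above, a table alone cannot tell which interval a manufacturer placed in <0018,0080>, so that entry should be treated with caution for sequences with more than one repetition interval. The dictionary and function names are hypothetical.

```python
# Illustrative subset of public tags (not a reproduction of Table 2):
# (group, element) -> (BIDS key, factor applied to the DICOM value, or None).
# DICOM stores these times in milliseconds; BIDS expects seconds.
PUBLIC_TO_BIDS = {
    (0x0008, 0x0070): ("Manufacturer", None),
    (0x0018, 0x0080): ("RepetitionTime", 0.001),  # ambiguous: interval varies by manufacturer
    (0x0018, 0x0081): ("EchoTime", 0.001),
    (0x0018, 0x0082): ("InversionTime", 0.001),
    (0x0018, 0x0087): ("MagneticFieldStrength", None),  # tesla in both
    (0x0018, 0x1020): ("SoftwareVersions", None),
    (0x0018, 0x1314): ("FlipAngle", None),  # degrees in both
}


def to_bids(dicom_values):
    """Translate {(group, element): value} into a BIDS sidecar dict, skipping unmapped tags."""
    sidecar = {}
    for tag, value in dicom_values.items():
        if tag not in PUBLIC_TO_BIDS:
            continue  # private or unmapped tags need manufacturer-specific handling
        key, factor = PUBLIC_TO_BIDS[tag]
        sidecar[key] = value * factor if factor is not None else value
    return sidecar
```

For example, a DICOM Echo Time of 30 ms becomes a BIDS “EchoTime” of approximately 0.03 s.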

Table 2 Some DICOM tags have a one-to-one correspondence with BIDS fields.


Manufacturer specific terminology and missing data

Some BIDS keys map onto private tags used by specific manufacturers; Table 3 presents private DICOM tags that map directly onto a single BIDS key:value pair. Manufacturers can also record public DICOM tags nested inside private tags, with values that conflict with the public tag stored at the root level of the file (e.g., our ‘dcm_qa_philips’ repository demonstrates this with the public tag <0020,0032> “Image Position (Patient)”). In addition, as noted previously, some manufacturers use OB data chunks to encode variables in their own proprietary data structures. Furthermore, manufacturers use different coordinate conventions for the diffusion-sensitizing gradient directions that accompany diffusion imaging (some use world space, others image space). Finally, the DICOM files generated by some manufacturers omit acquisition details that are needed by some BIDS-compliant pipelines. For example, Philips DICOM files do not provide the details required to populate the BIDS ‘SliceTiming’ array. This means that a user must either populate that information manually (for example, using ezBIDS¹⁷) or skip the slice-timing correction step, which can improve statistical power in certain acquisition regimes¹⁸. Likewise, Philips DICOM files do not record the phase-encoding polarity or the readout-time parameters necessary for correcting spatial distortions in echo-planar acquisitions¹⁹. Beyond documenting variations in reporting, validation datasets play a crucial role in identifying missing variables and errors in DICOM files. When data are absent, tools such as PET2BIDS²⁰ can supplement missing information to enable subsequent analyses. Documenting both omissions and errors enables manufacturers to correct these issues and allows tools to flag problematic data.
A notable example is recent Siemens enhanced DICOM data, in which features such as slice timing and the MultibandAccelerationFactor were incorrectly specified, as documented in our dcm_xa_61 repository. Because both BIDS and DICOM are evolving community standards, a comprehensive listing is beyond the scope of this work. Instead, for Tables 2 and 3 we provide hyperlinks to the formal specifications and refer readers to the respective websites, which are best suited to track ongoing updates.
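When SliceTiming must be filled in by hand, it helps to see what the array encodes: one onset time per slice (in seconds, relative to the start of each volume), listed in spatial slice order. The sketch below builds such an array under loudly stated assumptions: slices, or simultaneous multiband groups, are spread evenly across the whole TR with no delay, and "interleaved" means slices 0, 2, 4, ... (counting from zero) are acquired first. Real protocols differ, for example in interleave order and acquisition gaps, so any values produced this way must be checked against the scanner protocol. The function name is hypothetical.

```python
def slice_timing(tr, n_slices, order="ascending", multiband=1):
    """Hypothetical SliceTiming (seconds) for each slice, listed in spatial order.

    Assumes slices (or multiband groups) are acquired at evenly spaced times that
    fill the whole TR; "interleaved" means slices 0, 2, 4, ... (counting from
    zero) before slices 1, 3, 5, ...
    """
    if n_slices % multiband:
        raise ValueError("n_slices must be divisible by the multiband factor")
    n_groups = n_slices // multiband  # shots needed to cover every slice once
    if order == "ascending":
        shots = list(range(n_groups))
    elif order == "descending":
        shots = list(range(n_groups - 1, -1, -1))
    elif order == "interleaved":
        shots = list(range(0, n_groups, 2)) + list(range(1, n_groups, 2))
    else:
        raise ValueError(f"unknown slice order: {order}")
    step = tr / n_groups
    times = [0.0] * n_slices
    for shot_index, group in enumerate(shots):
        # With multiband, slices group, group + n_groups, ... are excited together.
        for band in range(multiband):
            times[group + band * n_groups] = shot_index * step
    return times
```

For example, `slice_timing(2.0, 4, "interleaved")` gives `[0.0, 1.0, 0.5, 1.5]`.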

Table 3 Some manufacturer private DICOM tags have one-to-one translations to BIDS terminology.


While the public DICOM tags and defined BIDS fields aim to standardize common acquisition parameters, in practice, manufacturers may use proprietary values or settings that lack an unambiguous one-to-one mapping. Our diverse datasets are intended to illustrate both standardization and areas of divergence. For example, the ‘dcm_qa_cs_dl’ repository includes MRI series from Canon, Philips, and Siemens where deep learning–based image reconstruction is employed, with vendor-specific settings such as different sharpening levels (e.g., off, low, or high). Detecting these proprietary settings can help identify inter-site differences and support statistical modeling when acquisition parameters vary. Thus, while our primary goal is to provide examples of commonalities across manufacturers, the datasets also capture important instances of manufacturer-specific variability.

Third-party modifications

Additionally, some third-party Picture Archiving and Communication Systems (PACS) can modify each DICOM file they handle. This can introduce further challenges, including renumbering of private tags, appending their own tags, or altering the compression scheme used for the voxel intensities (the transfer syntax). These modifications can make it more difficult to interpret the original manufacturer’s proprietary tags and may require additional troubleshooting or adjustments during conversion. Ensuring compatibility with such systems is an ongoing effort that underscores the importance of robust, adaptable conversion tools and detailed documentation. The ‘dcm_qa_ts’ repositories provide concrete examples of these situations.
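Renumbered private tags are less of a problem for tools that locate private elements through their Private Creator string instead of a hard-coded element number. Under the DICOM private-block rules, a creator string stored at <gggg,00xx>, with xx between 10 and FF (hex), reserves elements <gggg,xx00> to <gggg,xxFF>. The sketch below applies that rule to a plain dictionary standing in for a parsed file; the function name is hypothetical, and the creator string "SIEMENS CSA HEADER" with offset 0x10 is used only as an example.

```python
def find_private(dataset, group, creator, offset):
    """Return the value of a private element identified by its creator, or None.

    `dataset` is a plain dict {(group, element): value} standing in for a parsed
    DICOM file. A Private Creator string stored at element 0x00xx of `group`
    (0x10 <= xx <= 0xFF) reserves elements 0xxx00 to 0xxxFF; `offset` is the
    low byte of the wanted element within that block.
    """
    if group % 2 == 0:
        raise ValueError("private tags use odd group numbers")
    for block in range(0x10, 0x100):
        owner = dataset.get((group, block))
        # Creator strings may carry a trailing space that pads them to even length.
        if isinstance(owner, str) and owner.strip() == creator:
            return dataset.get((group, (block << 8) | offset))
    return None
```

If a PACS moves the creator from <0029,0010> to <0029,0011>, the same call then finds the element at <0029,1110> instead of <0029,1010>.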

Data Records

All datasets are available from the dcm_validate master repository, which links to each validation dataset as a Git submodule. The repository is archived at Zenodo (https://doi.org/10.5281/zenodo.15310934)¹⁶. Our validation dataset has a hierarchical structure, with specific edge cases (manufacturer, software, modality) illustrated in independent modular repositories. Table 1 provides a list of the repositories and the URL for data citation. Each of the repositories follows a regular structure. Specifically, the DICOM files are in the “In” folder, with the validated reference NIfTI format images and BIDS text files in the “Ref” folder. Also in the root folder is the “README.md” text file that describes the rationale for the repository, with each repository showcasing unique DICOM properties. The “LICENSE” file details the permissive license used for distribution. All repositories also include the shell script “batch.sh” that will invoke the user’s installed instance of dcm2niix to generate a new set of images in a folder named “Out” and to report any differences between these and the files provided in the “Ref” folder.

Each repository listed in Table 1 is structured as a standalone dataset with the following top-level folders and files:

In/: Contains the original DICOM files in .dcm format. Files are organized by series and maintain the original filenames as exported from the scanner or PACS.
Ref/: Contains validated reference files in BIDS-compliant format:
  - .nii: NIfTI-formatted imaging data.
  - .json: Metadata sidecars with key acquisition parameters (e.g., RepetitionTime, EchoTime, PhaseEncodingDirection).
  - .bvec/.bval: FSL-format gradient directions and magnitudes (b-values) for diffusion images.
README.md: Describes the goal of the repository, key characteristics of the dataset, and instructions for running the validation.
LICENSE: Declares the BSD 2-Clause license, with a dual-license option under CC BY 4.0 for image reuse.
batch.sh: A shell script that calls the user’s local dcm2niix installation to convert In/ to Out/, and compares the results against Ref/.

Some repositories also include domain-specific resources:

*.xlsx: Spreadsheet with parameter sweeps or timing calculations (e.g., dcm_qa).
slicetime.cpp: Minimal C source for slice timing validation (dcm_qa_ge).
*.bval, *.bvec: Diffusion validation code and files (e.g., dcm_qa_dwi).
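The FSL-format .bval and .bvec files are plain text: the .bval file holds one row of N b-values, and the .bvec file holds three rows (x, y and z components) of N entries, one column per volume. The sketch below parses and sanity-checks such a pair; the unit-length tolerance is an assumption, and some derived images (e.g., trace-weighted volumes exported with a zero vector) would be flagged even though they may be valid. The function name is hypothetical.

```python
import math


def read_fsl_gradients(bval_text, bvec_text, unit_tol=0.01):
    """Parse FSL-style .bval/.bvec text and return [(b, (x, y, z)), ...].

    Raises ValueError if the files disagree in length or if a diffusion-weighted
    volume (b > 0) has a direction that is not approximately unit length.
    """
    bvals = [float(v) for v in bval_text.split()]
    rows = [[float(v) for v in line.split()] for line in bvec_text.strip().splitlines()]
    if len(rows) != 3:
        raise ValueError(f"expected 3 rows in .bvec, found {len(rows)}")
    if any(len(row) != len(bvals) for row in rows):
        raise ValueError("number of vectors does not match number of b-values")
    volumes = []
    for b, x, y, z in zip(bvals, *rows):
        norm = math.sqrt(x * x + y * y + z * z)
        if b > 0 and abs(norm - 1.0) > unit_tol:
            raise ValueError(f"non-unit gradient {(x, y, z)} for b={b}")
        volumes.append((b, (x, y, z)))
    return volumes
```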

Folder and file names are consistent across repositories to facilitate automation. Users can expect the same script (batch.sh) and README format in each repository, regardless of modality or manufacturer. The primary variable across repositories is the DICOM content: different manufacturers, modalities, or edge cases (as described in the Comments column of Table 1).

The goal of these repositories is to demonstrate DICOM implementations, and therefore there was an explicit emphasis on low-resolution images with a small number of observations. Many of the repositories include phantoms (water bottles). Some modalities (in particular, diffusion and arterial spin labeling) benefit from images of the human brain to allow proper validation. In these cases, data were acquired from the co-authors with the explicit knowledge that they would be shared on public repositories. These images also include the generalized demographic details, such as age, height and weight, that are required to estimate the specific absorption rate (SAR). The Institutional Review Board (IRB) at the University of South Carolina determined that IRB review was not required for this project, as it does not meet the criteria for “research” as defined under 45 CFR 46.102(l), given that the data from each repository pertain to a single individual and are not generalizable to broader populations. This determination aligns with the intent of the project, which is to provide high-quality validation resources for the neuroimaging community rather than to draw inferences about human health or behavior.

This simple structure allows developers to ensure consistent results across different versions of dcm2niix, including builds compiled with different toolchains, targeting various architectures, or linked against different libraries, all of which can introduce variability in output²¹,²². While these repositories were originally developed to support automated regression testing of dcm2niix¹⁵, portions have since been adopted to assist development and validation of other tools, including nibabel²³, divest for R²⁴, mriconvert¹⁵, SPM¹⁵, FreeSurfer’s mri_convert²⁵, orthanc-neuro²⁶, and dicm2nii¹⁵.

Several of the repositories contain additional files that help developers extract the correct results. For example, ‘dcm_qa’ provides an Excel spreadsheet that demonstrates how the in-plane acceleration factor, partial Fourier and other parameters were varied, and provides the formulas to infer total readout time. Likewise, the repository ‘dcm_qa_ge’ provides a minimal C program (slicetime.cpp), validated by General Electric (GE) engineers, for deriving slice-timing parameters. Another example is ‘dcm_qa_sag’, which includes Python scripts to generate validation tensor files and bitmap images used to confirm the correct definition of the diffusion gradient directions. In all these cases, the supplemental files are described in the repository’s “README.md” file.
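As an illustration of the kind of relationship such a spreadsheet captures, the sketch below follows the BIDS definition TotalReadoutTime = EffectiveEchoSpacing × (ReconMatrixPE − 1) and shows two assumed routes to EffectiveEchoSpacing: from the phase-encode bandwidth per pixel, as 1 / (BandwidthPerPixelPhaseEncode × ReconMatrixPE), or from the nominal echo spacing divided by the in-plane acceleration factor (assuming the acquired and reconstructed phase-encode matrices match). This is not a transcription of the dcm_qa spreadsheet; the function names and example values are hypothetical.

```python
def effective_echo_spacing(bandwidth_per_pixel_pe, recon_matrix_pe):
    """EffectiveEchoSpacing (s), assumed to equal 1 / (BandwidthPerPixelPhaseEncode * ReconMatrixPE).

    bandwidth_per_pixel_pe is in Hz per pixel along the phase-encoding direction.
    """
    return 1.0 / (bandwidth_per_pixel_pe * recon_matrix_pe)


def effective_echo_spacing_from_protocol(echo_spacing, inplane_acceleration):
    """EffectiveEchoSpacing (s) as nominal echo spacing / in-plane acceleration factor,
    assuming the acquired and reconstructed phase-encode matrices match."""
    return echo_spacing / inplane_acceleration


def total_readout_time(effective_spacing, recon_matrix_pe):
    """TotalReadoutTime (s) = EffectiveEchoSpacing * (ReconMatrixPE - 1), as defined by BIDS."""
    return effective_spacing * (recon_matrix_pe - 1)
```

With a bandwidth of 20 Hz per pixel and ReconMatrixPE = 100, EffectiveEchoSpacing is 0.5 ms and TotalReadoutTime is 0.0495 s.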

Most of the repositories in Table 1 focus on the Magnetic Resonance (MR) modality, reflecting its versatility in generating diverse contrasts and its popularity among researchers due to the absence of ionizing radiation exposure for participants. The repository ‘dcm_qa_ct’ provides examples of computed tomography (CT), highlighting unique features of this modality, such as gantry tilt (leading to shear in 3D volumes) and variable inter-slice distances.
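Because NIfTI requires equidistant slices, a converter has to check slice spacing before stacking DICOM slices into a volume. A common approach, sketched below under the assumption that each slice's <0020,0032> “Image Position (Patient)” and <0020,0037> “Image Orientation (Patient)” values have already been read, projects each slice origin onto the slice normal and inspects the gaps; any in-plane offset between consecutive origins indicates sheared slices, such as those produced by gantry tilt. The function names and the 0.01 mm tolerance are hypothetical.

```python
import math


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def slice_gaps(orientation, positions):
    """Gaps (mm) between consecutive slices, measured along the slice normal.

    orientation: the six direction cosines of Image Orientation (Patient).
    positions: one Image Position (Patient) triple per slice, in any order.
    """
    normal = _cross(orientation[:3], orientation[3:])
    depths = sorted(_dot(p, normal) for p in positions)
    return [b - a for a, b in zip(depths, depths[1:])]


def is_equidistant(gaps, tol=0.01):
    """True if every gap matches the first one within `tol` mm."""
    return all(abs(g - gaps[0]) <= tol for g in gaps)


def in_plane_shift(orientation, p0, p1):
    """Length (mm) of the in-plane part of the step between two slice origins.

    A non-zero value means successive slices are offset within the image plane,
    i.e. the volume is sheared (as with CT gantry tilt).
    """
    normal = _cross(orientation[:3], orientation[3:])
    step = [b - a for a, b in zip(p0, p1)]
    along = _dot(step, normal)
    residual = [s - along * n for s, n in zip(step, normal)]
    return math.sqrt(_dot(residual, residual))
```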

As shown in Table 1, each validation repository is modular and follows the same structure. Typically, each repository is designed to showcase a specific edge case, as indicated by the ‘Comments’ column in Table 1 and by the README.md file within each repository, which provides the rationale for that repository. We envision the number of repositories growing to document the evolving use of DICOM. The master repository (‘dcm_validate’) archived on Zenodo includes all submodules, providing a single starting location for all of the modules.

In general, most validation repositories provide DICOM images acquired directly from the scanner without modification. However, there are exceptions where images were exported through a local Picture Archiving and Communication System (PACS); these images can be identified by inspecting the <0002,0013> “Implementation Version Name” tag. No uniform de-identification tool or configuration was applied across datasets; rather, each dataset reflects the local practices at the institution where the data were acquired. We emphasize that private attributes critical for acquisition metadata were retained.
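The Implementation Version Name lives in the Part 10 file meta header, which precedes the dataset: a 128-byte preamble, the characters “DICM”, and group 0002 elements that are always encoded as explicit VR little endian. The minimal parser below extracts these elements, including <0002,0010> “Transfer Syntax UID” and <0002,0013> “Implementation Version Name”; it skips the validation a production reader would perform, and its function names are hypothetical.

```python
import struct

# VRs whose explicit-VR encoding uses 2 reserved bytes plus a 4-byte length.
LONG_VRS = {b"OB", b"OD", b"OF", b"OL", b"OV", b"OW", b"SQ",
            b"SV", b"UC", b"UN", b"UR", b"UT", b"UV"}


def read_file_meta(data):
    """Parse the group-0002 File Meta Information of a Part 10 DICOM file.

    Returns {(group, element): raw bytes}; stops at the first non-0002 element,
    because the dataset that follows may use a different transfer syntax.
    """
    if data[128:132] != b"DICM":
        raise ValueError("not a DICOM Part 10 file (missing 'DICM' magic)")
    pos, meta = 132, {}
    while pos + 8 <= len(data):
        group, element = struct.unpack_from("<HH", data, pos)
        if group != 0x0002:
            break
        vr = data[pos + 4:pos + 6]
        if vr in LONG_VRS:
            (length,) = struct.unpack_from("<I", data, pos + 8)
            pos += 12
        else:
            (length,) = struct.unpack_from("<H", data, pos + 6)
            pos += 8
        meta[(group, element)] = data[pos:pos + length]
        pos += length
    return meta


def meta_text(meta, tag):
    """Decode a string-valued element, dropping DICOM's trailing space/NUL padding."""
    return meta.get(tag, b"").decode("ascii", "replace").rstrip(" \x00")
```

For example, `meta_text(read_file_meta(open(path, "rb").read()), (0x0002, 0x0013))` returns the implementation name written by the software that last saved the file.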

Technical Validation

Each dataset includes reference BIDS-format text files that have been meticulously checked to ensure accurate correspondence with the DICOM information. Since these files were made publicly available, users have been able to identify limitations and extend the content of the JSON files as the BIDS specification is extended and as our understanding of manufacturers’ interpretations of the DICOM standard, and of their use of proprietary DICOM tags, improves. Anyone can open a GitHub issue to suggest how these repositories can be enhanced (our dcm2niix repository already lists 931 closed issues that describe enhancements, feature requests, and limitations). Therefore, these repositories provide a method for the community to work collaboratively to ensure robust data conversion.

Although every effort has been made to ensure accuracy, certain limitations are inherent in interpreting vendor-specific attributes without formal documentation. In the absence of public manufacturer-issued conformance statements, extracted metadata were manually inspected and, when possible, verified with input from manufacturer engineers. By remaining open source, these datasets invite community-driven validation and represent, to our knowledge, the most complete publicly available harmonization effort of this kind.

To ensure the validity and quality of our own derived datasets, we initially used the provided batch.sh script to generate a reference conversion using dcm2niix, with outputs saved in the Ref folder. The resulting BIDS datasets were then evaluated using the bids-validator to confirm compliance. We manually inspected all fields to ensure that required metadata were either correctly populated or documented as unavailable in the source DICOM files. For each new release of dcm2niix, these repositories are revalidated to ensure identical results; any discrepancies are reviewed manually to determine whether they reflect an unintended change or a meaningful improvement. Sharing these datasets publicly has allowed the broader neuroimaging community, including developers of related tools, to provide feedback and help verify that the curated outputs are both accurate and comprehensive (Fig. 1).

Usage Notes

The validation repositories, with DICOM files and their corresponding validated BIDS/NIfTI reference files, are publicly available on Zenodo¹⁶. Each downloaded module includes the Bash script “batch.sh”, which can be executed from the command line. This script uses the version of dcm2niix in the user’s path to convert all of the DICOM files in the “In” folder into a new folder “Out”, and then tests whether all of the files in the “Out” folder match the reference files in the “Ref” folder. Binary NIfTI images are tested for identical matches, while text-based JSON files are compared key by key, with any discrepancies reported.
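The comparison step can be approximated in Python as follows. This is a hypothetical re-implementation, not batch.sh itself: JSON sidecars are compared key by key with an assumed numeric tolerance, every other file (.nii, .bval, .bvec) must match byte for byte, and files present only in Out are not reported.

```python
import json
import math
from pathlib import Path


def _close(a, b, tol):
    """Compare JSON values, allowing numbers to differ by up to `tol`."""
    if isinstance(a, bool) or isinstance(b, bool):
        return a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return math.isclose(a, b, rel_tol=tol, abs_tol=tol)
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(_close(x, y, tol) for x, y in zip(a, b))
    return a == b


def compare_folders(ref_dir, out_dir, tol=1e-6):
    """List the differences between every file in ref_dir and its counterpart in out_dir."""
    ref_dir, out_dir = Path(ref_dir), Path(out_dir)
    problems = []
    for ref in sorted(p for p in ref_dir.rglob("*") if p.is_file()):
        rel = ref.relative_to(ref_dir)
        out = out_dir / rel
        if not out.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif ref.suffix == ".json":
            a, b = json.loads(ref.read_text()), json.loads(out.read_text())
            for key in sorted(set(a) | set(b)):
                if key not in a or key not in b or not _close(a[key], b[key], tol):
                    problems.append(f"{rel.as_posix()}: {key}")
        elif ref.read_bytes() != out.read_bytes():
            # .nii and any other non-JSON file must match exactly
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

Calling `compare_folders("Ref", "Out")` inside a repository returns an empty list when every reference file is reproduced.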

To help users navigate the growing collection of validation datasets, we include a Python script (catalog_datasets.py) that catalogs series-level DICOM metadata across all submodules. This script is designed to run in two stages: the first pass scans all available BIDS JSON files (typically one file per DICOM series) and generates a catalog_fields.txt file listing all encountered fields. Users can then edit this file to select a subset of relevant fields (e.g., Manufacturer, PatientAge, EchoTime). A second run of the script uses the customized field list to generate a comma-separated values (CSV) table summarizing the selected metadata across all datasets. This utility facilitates dataset discovery and helps tool developers identify representative series for specific testing scenarios.

For example, at the time of writing, the field “Manufacturer” appeared in 427 JSON files: Siemens (210), GE (90), Philips (70), Canon (45), Toshiba (5), and UIH (5). Similarly, the “MagneticFieldStrength” field identified 12 series acquired at 1.5 T, 391 at 3 T, and 14 at 7 T, with the remainder corresponding to CT acquisitions. This cataloging script provides a flexible way to search the entire family of repositories, enabling users to identify datasets with specific acquisition properties or scanner configurations without downloading each repository individually.
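The two-stage cataloging idea can be sketched in a few lines of Python. These functions are hypothetical and do not reproduce catalog_datasets.py; in particular, they take the selected fields as a list rather than reading an edited catalog_fields.txt. Pass one collects every key seen in the JSON sidecars, pass two writes a CSV restricted to the chosen fields, and a tally counts the values of a single field such as Manufacturer.

```python
import csv
import io
import json
from collections import Counter
from pathlib import Path


def collect_fields(root):
    """Pass 1: every key seen in any JSON sidecar below `root`, sorted."""
    fields = set()
    for path in Path(root).rglob("*.json"):
        fields.update(json.loads(path.read_text()))
    return sorted(fields)


def catalog(root, fields):
    """Pass 2: CSV text with one row per JSON sidecar, restricted to `fields`."""
    root = Path(root)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["file", *fields], extrasaction="ignore")
    writer.writeheader()
    for path in sorted(root.rglob("*.json")):
        row = json.loads(path.read_text())
        row["file"] = path.relative_to(root).as_posix()
        writer.writerow(row)  # fields absent from a sidecar are written as empty cells
    return buffer.getvalue()


def tally(root, field):
    """Count the distinct values of one field across all sidecars that define it."""
    values = (json.loads(p.read_text()).get(field) for p in Path(root).rglob("*.json"))
    return Counter(v for v in values if v is not None)
```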

Code availability

All repositories are publicly available with URLs for each dataset listed in Table 1. Each repository is available using a permissive open source license with details described in each repository’s “LICENSE” file. Users can make suggestions directly from any of the repository web pages by generating an “Issue”.

References

Vogt, N. Reproducibility in MRI. Nat Methods 20, 34, https://doi.org/10.1038/s41592-022-01737-3 (2023).
Article CAS PubMed Google Scholar
Smith, S. M. et al. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 23(Suppl 1), S208–19, https://doi.org/10.1016/j.neuroimage.2004.07.051 (2004).
Article PubMed Google Scholar
Yu, M. et al. Statistical harmonization corrects site effects in functional connectivity measurements from multi-site fMRI data. Hum Brain Mapp 39, 4213–4227, https://doi.org/10.1002/hbm.24241 (2018).
Article PubMed PubMed Central Google Scholar
Bidgood, W. D. Jr, Horii, S. C., Prior, F. W. & Van Syckle, D. E. Understanding and using DICOM, the data interchange standard for biomedical imaging. J Am Med Inform Assoc 4, 199–212, https://doi.org/10.1136/jamia.1997.0040199 (1997).
Article PubMed PubMed Central Google Scholar
Gorgolewski, K. J. et al. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Sci Data 3, 160044, https://doi.org/10.1038/sdata.2016.44 (2016).
Article PubMed PubMed Central Google Scholar
Poldrack, R. A. et al. The past, present, and future of the brain imaging data structure (BIDS). Imaging Neurosci (Camb) 2, 1–19, https://doi.org/10.1162/imag_a_00103 (2024).
Article PubMed Google Scholar
Horien, C. et al. A hitchhiker’s guide to working with large, open-source neuroimaging datasets. Nat Hum Behav 5, 185–193, https://doi.org/10.1038/s41562-020-01005-4 (2021).
Article PubMed Google Scholar
Markiewicz, C. J. et al. The OpenNeuro resource for sharing of neuroscience data. Elife 10, https://doi.org/10.7554/eLife.71774 (2021).
Petersen, R. C. et al. Alzheimer’s Disease Neuroimaging Initiative (ADNI): clinical characterization. Neurology 74, 201–209, https://doi.org/10.1212/WNL.0b013e3181cb3e25 (2010).
Article PubMed PubMed Central Google Scholar
Herz, C. et al. An Open Source Library for Standardized Communication of Quantitative Image Analysis Results Using DICOM. Cancer Res 77, e87–e90, https://doi.org/10.1158/0008-5472.CAN-17-0336 (2017).
Article CAS PubMed PubMed Central Google Scholar
Fedorov, A. et al. DICOM for quantitative imaging biomarker development: a standards based approach to sharing clinical data and structured PET/CT analysis results in head and neck cancer research. PeerJ 4, e2057, https://doi.org/10.7717/peerj.2057 (2016).
Yvernault, B. C. et al. Validating DICOM transcoding with an open multi-format resource. Neuroinformatics 12, 615–617, https://doi.org/10.1007/s12021-014-9230-9 (2014).
Rutherford, M. et al. A DICOM dataset for evaluation of medical image de-identification. Sci Data 8, 183, https://doi.org/10.1038/s41597-021-00967-y (2021).
Clunie, D. & Erickson, B. J. The new enhanced multiframe CT and MR DICOM objects. In Proc. 2005 Society for Computer Applications in Radiology Annual Meeting (Orlando, FL, 2005).
Li, X., Morgan, P. S., Ashburner, J., Smith, J. & Rorden, C. The first step for neuroimaging data analysis: DICOM to NIfTI conversion. J Neurosci Methods 264, 47–56, https://doi.org/10.1016/j.jneumeth.2016.03.001 (2016).
Rorden, C. neurolabusc/dcm_validate: Initial release. https://doi.org/10.5281/zenodo.15310934.
Levitas, D. et al. ezBIDS: Guided standardization of neuroimaging data interoperable with major data archives and platforms. Sci Data 11, 179, https://doi.org/10.1038/s41597-024-02959-0 (2024).
Sladky, R. et al. Slice-timing effects and their correction in functional MRI. Neuroimage 58, 588–594, https://doi.org/10.1016/j.neuroimage.2011.06.078 (2011).
Andersson, J. L. R., Skare, S. & Ashburner, J. How to correct susceptibility distortions in spin-echo echo-planar images: application to diffusion tensor imaging. Neuroimage 20, 870–888, https://doi.org/10.1016/S1053-8119(03)00336-7 (2003).
Galassi, A. et al. PET2BIDS: a library for converting Positron Emission Tomography data to BIDS. J Open Source Softw 9, https://doi.org/10.21105/joss.06067 (2024).
Renton, A. I. et al. Neurodesk: an accessible, flexible and portable data analysis environment for reproducible neuroimaging. Nat Methods 21, 804–808, https://doi.org/10.1038/s41592-023-02145-x (2024).
Rorden, C. et al. niimath and fslmaths: replication as a method to enhance popular neuroimaging tools. Apert Neuro 4, https://doi.org/10.52294/001c.94384 (2024).
Brett, M. et al. Nipy/nibabel: 5.3.1. https://doi.org/10.5281/ZENODO.591597 (Zenodo, 2024).
Clayden, J. & Rorden, C. Divest: Get images out of DICOM format quickly. CRAN: Contributed Packages The R Foundation https://doi.org/10.32614/cran.package.divest (2016).
Greve, D. N. & Fischl, B. Accurate and robust brain image alignment using boundary-based registration. Neuroimage 48, 63–72, https://doi.org/10.1016/j.neuroimage.2009.06.060 (2009).
Jodogne, S. The Orthanc ecosystem for medical imaging. J Digit Imaging 31, 341–352, https://doi.org/10.1007/s10278-018-0082-y (2018).

Acknowledgements

This project has been supported by the National Institutes of Health (P50-DC014664; RF1-MH133701). As is typical of open-source projects, we appreciate contributions from a wide community. In particular, we note the contributions of Shan C Young.

Author information

Authors and Affiliations

McCausland Center for Brain Imaging, Department of Psychology, University of South Carolina, Columbia, SC, 29208, USA
Christopher Rorden & Roger Newman-Norlund
CENIR, Paris Brain Institute - ICM, Hôpital Pitié-Salpêtrière de Sorbonne Université, Paris, France
Benoît Béranger
Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, 47405, USA
Hu Cheng, Isaiah Innis & Daniel Levitas
Philips Healthcare, Farnborough, UK
Matthew Clemence
Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, IMA-Brain team, 75014, Paris, France
Clément Debacker
GE HealthCare, Buc, France
Brice Fernandez
Department of Psychological and Brain Sciences, Department of Computer Science, Hanover, NH, USA
Yaroslav O. Halchenko
Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA
Michael P. Harms
Department of Psychiatry, Integrative Medicine, National Institute of Mental Health and Neuro-Sciences (NIMHANS), Bengaluru, India
Bharath Holla & Venkatasubramanian Ganesan
Radiology & Nuclear Medicine Amsterdam UMC, Amsterdam, The Netherlands
Joost P. A. Kuijer
Functional MRI Laboratory, Department of Radiology, University of Michigan, Ann Arbor, MI, USA
Krisanne Litinas & Scott Peltier
Center for Advanced Human Brain Imaging Research, Rutgers University, Piscataway, NJ, USA
Jeffrey Luci
Department of Psychiatry, Rutgers Robert Wood Johnson Medical School, Piscataway, NJ, USA
Jeffrey Luci
Siemens Medical Solutions USA, Inc., Malvern, PA, USA
Wolfgang Rehwald
Department of Radiology, Mayo Clinic, Rochester, MN, 56001, USA
Robert I. Reid, Christopher G. Schwarz & Sandeep Ganji
Vanderbilt University Institute of Imaging Science, Nashville, TN, 37232, USA
Baxter Rogers
Vanderbilt University Medical Center Department of Radiology and Radiological Sciences, Nashville, TN, 37232, USA
Baxter Rogers
Vanderbilt University Department of Biomedical Engineering, Nashville, TN, 37232, USA
Baxter Rogers
Vanderbilt University Medical Center Department of Psychiatry and Behavioral Sciences, Nashville, TN, 37232, USA
Baxter Rogers
GE Healthcare, New York, NY, USA
Jaemin Shin
Philips, Cambridge, MA, 02142, USA
Sandeep Ganji
School of Medicine, University of Nottingham, Nottingham, UK
Paul S. Morgan
NIHR Nottingham Biomedical Research Centre, Nottingham, UK
Paul S. Morgan

Contributions

All authors reviewed and contributed to the manuscript. CR developed the concept of validation datasets, was the initial author of the script files and “readme” files, and wrote the original draft of the manuscript. All other authors acquired validation datasets at their centers and verified that their datasets were converted accurately.

Corresponding author

Correspondence to Christopher Rorden.

Ethics declarations

Competing interests

Some authors of this manuscript are employed by imaging equipment manufacturers, specifically Philips (MC, SG), General Electric (BF, JS), and Siemens (WR). Their contributions were made in the interest of promoting transparency and reproducibility in scientific research. These individuals provided technical insights that facilitate the interpretation of vendor-specific attributes and support the broader community’s efforts to harmonize metadata extraction across manufacturers. No commercial products are promoted herein. These contributions are intended to encourage the alignment of vendor practices with evolving open standards, rather than to maintain the status quo. The remaining authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Rorden, C., Béranger, B., Cheng, H. et al. DICOM datasets for reproducible neuroimaging research across manufacturers and software versions. Sci Data 12, 1168 (2025). https://doi.org/10.1038/s41597-025-05503-w

Received: 16 December 2024
Accepted: 01 July 2025
Published: 09 July 2025
Version of record: 09 July 2025
DOI: https://doi.org/10.1038/s41597-025-05503-w