Abstract
DICOM is an industry standard for medical imaging data designed for interoperability across systems. This enables transfer, storage and processing of imaging data regardless of the manufacturer. In practice, manufacturers often store detailed acquisition parameters in private rather than public DICOM tags. In parallel, the DICOM standard itself has gradually evolved by introducing new public tags and properties to better capture emerging imaging technologies. Accurately extracting these details is essential for reproducible neuroimaging research. To address this need, we created a series of DICOM datasets illustrating how various manufacturers encode acquisition details that are critical for modern processing and analysis. These minimal test cases, covering CT and MR modalities, highlight manufacturer-specific conventions, including the use of public tags, private tags, and proprietary data structures. For each DICOM dataset, we provide corresponding NIfTI-formatted images with metadata JSON files following the BIDS standard, using consistent terminology to mitigate variations in how manufacturers encode acquisition details. Our repository provides validation datasets for any tool that is intended to extract acquisition details from medical imaging data.
Background & Summary
Reproducibility is a critical challenge in neuroimaging research1. Most analyses involve multiple stages of image processing and complex statistical modeling to mitigate noise and identify meaningful signals2. These processes require precise knowledge of acquisition parameters, such as slice timing and phase encoding polarity. Studies aiming to aggregate data across sites must also address variability between scanners in order to ensure generalizability3. Consequently, neuroimaging researchers must be able to reliably extract details about the acquisition parameters. This task is facilitated by the Digital Imaging and Communications in Medicine (DICOM) standard4, which dominates medical imaging, promoting interoperability across tools and manufacturers. However, the rapid evolution of imaging technologies often outpaces consensus-based updates to the standard, leading manufacturers to use self-defined (“private”) metadata tags, which are, ideally, later integrated into the DICOM definition as standardized (“public”) tags. We provide a comprehensive collection of DICOM images spanning various manufacturers, modalities, and software versions to address this challenge. We also offer ground truth values for the imaging parameters that are crucial for reproducibility. These datasets enable tool developers to ensure robust and reproducible analyses of neuroimaging data.
Historically, each neuroimaging team used its own idiosyncratic method to provide sequence details for analysis with neuroimaging pipelines. The Brain Imaging Data Structure (BIDS)5,6 provided a more standardized framework for organizing and describing imaging datasets, defining the imaging format (voxel intensity stored in NIfTI), imaging parameters (in human-readable JSON text files using manufacturer agnostic terminology), and file naming (providing hints for intention), as well as the relevant non-imaging details of an experiment (e.g., participant behavioral data and demographics). The BIDS format allows for automated analyses of datasets regardless of scale, aids reproducibility, and facilitates data sharing and reuse. While this intentionally constrained format is considerably simpler than DICOM, it is worth noting that since the source data has DICOM format, adopting the BIDS structure does not replace the arduous7 task of accurately extracting acquisition details from raw imaging data. Therefore, repositories that share neuroimaging data in BIDS format8 require data providers to extract acquisition details prior to sharing, while DICOM repositories9,10 require data users to extract imaging parameters prior to analysis. These needs have become increasingly acute as public funding agencies expect scientists to share large datasets and clinical teams are aggregating huge datasets to empower precision medicine. We aim to support both approaches, providing validation datasets that ensure that imaging data and metadata can be determined from DICOM data regardless of scanner manufacturer and software version. While our team has focused on generating domain-specific BIDS/NIfTI formats, the resulting validation repositories also support efforts to extract standardized parameters across manufacturers for teams that choose to retain the DICOM format11.
Prior work on this topic includes the seminal “Rosetta bit” project12, which highlighted the importance of validation datasets in neuroimaging and provided gold-standard conversions of DICOM to NIfTI images. However, that project predated BIDS; while it provided a validation for voxel intensities and spatial properties, it did not provide the sequence details crucial for image processing within a site and data harmonization across sites. Likewise, Rutherford and colleagues13 synthesized DICOM datasets to evaluate the performance of de-identification algorithms, but did not address acquisition details. Our datasets extend these traditions, providing updated resources to support modern interpretations of the DICOM standard in general, as well as the introduction of the enhanced DICOM format14.
Therefore, our overarching objective is to provide validation DICOM datasets that demonstrate how different manufacturers and different software versions store acquisition parameters. Our datasets include both the original DICOM datasets as well as the known solutions for critical acquisition parameters (using text files in the BIDS specification). Our datasets provide minimal test cases for understanding manufacturer-specific conventions, including private DICOM tags, and demonstrate edge cases that require careful interpretation. These repositories aid in developing and maintaining tools that read DICOM images like dcm2niix, dicm2nii and SPM15. By providing BIDS-compatible NIfTI images alongside standardized metadata, we offer a practical resource for improving data conversion and enhancing reproducibility.
Methods
We have assembled a collection of 36 distinct DICOM modules publicly available on Zenodo16 (Table 1; with mirrors on GitHub) designed to illustrate the diversity of images observed in the neuroimaging domain. Where possible, the datasets use low-resolution images and relatively few volumes with the aim of providing concise examples of specific use cases. This is in contrast to traditional research repositories where high spatial resolution and many observations are considered beneficial. We have curated our examples into specific repositories highlighting specific challenges. The rationale for each of these repositories is included in its “README.md” text file. A brief overview of the DICOM and BIDS methods for storing sequence information will provide context for the challenge of interpreting these validation datasets.
While DICOM files can contain many classes of data (e.g. sounds, waveforms, text documents)4, here we focus on files that contain images. A DICOM image file contains both the image data (voxel intensity values) as well as a series of tags that describe the image acquisition. Each DICOM tag is defined as two 16-bit hexadecimal numbers referred to as “group” and “element”, commonly written as text in the form <gggg,eeee>. All even numbered groups refer to public tags that are defined by the DICOM specification. Some public tags are required for a specific image modality (for example, the numeric <0018,0081> “Echo Time” is required for MR images) while others are optional (e.g., the string <0018,1020> “Software Versions” is optional). In contrast, odd numbered groups are private tags that the manufacturer can define. In the same way that typed programming languages define variables as strings, integers or floating point numbers, each DICOM tag is associated with a ‘Value Representation’ (VR) that defines the type of the data. While most DICOM tags store variables of a fixed type (e.g., a string, or an array of integers), some manufacturers use the VR ‘Other Byte’ (OB) to store sequence details using their own proprietary formats. Classic DICOM objects typically only store a single 2D slice in each file. For 4D time series such as functional MRI and diffusion imaging, this can result in tens of thousands of files for a single imaging series. Some manufacturers (Siemens and United Imaging Healthcare) provide the option to save all slices from a 3D volume as tiles (‘mosaics’) in a single DICOM file, dramatically reducing the number of files per series (at the cost of non-compliance with the DICOM standard and requiring tools to de-interlace these 2D mosaics into a 3D volume). More recently, the ‘enhanced DICOM’ specification14 allows saving multi-frame data where entire 3D and 4D series are stored in a single DICOM file. 
While complex, the DICOM standard provides tremendous flexibility and can work across a broad range of medical images.
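The group/element convention described above can be illustrated with a minimal sketch. The helper names (`parse_tag`, `is_private`) are hypothetical and are not taken from any DICOM library, and the tag `<0019,100C>` is used only as an arbitrary example of an odd-numbered group.

```python
import re

# Accepts tags written as <gggg,eeee> or (gggg,eeee), with optional spaces.
TAG_PATTERN = re.compile(
    r"^[<(]?\s*([0-9A-Fa-f]{4})\s*,\s*([0-9A-Fa-f]{4})\s*[>)]?$"
)


def parse_tag(text):
    """Split a textual DICOM tag into (group, element) integers."""
    match = TAG_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a DICOM tag: {text!r}")
    group, element = (int(part, 16) for part in match.groups())
    return group, element


def is_private(text):
    """Odd group numbers are private (manufacturer-defined); even are public."""
    group, _ = parse_tag(text)
    return group % 2 == 1


# <0018,0081> and <0018,1020> are the public tags mentioned in the text;
# <0019,100C> is an illustrative odd-group (private) tag.
for tag in ("<0018,0081>", "<0018,1020>", "<0019,100C>"):
    print(tag, "private" if is_private(tag) else "public")
```

A check like this only classifies a tag by its group number. Interpreting the contents of a private tag still requires knowledge of the manufacturer's conventions.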
The BIDS specification stores computed tomography (CT) and magnetic resonance imaging (MRI) data via two files. The pixel data and spatial properties (e.g., slice angulation) are saved as a binary NIfTI format file (which itself contains its own limited metadata) while other acquisition metadata are stored in a human-readable text file in the JSON format of key:value pairs. Examples include the numeric value ‘“EchoTime”: 0.03’ and the string value ‘“SoftwareVersions”: “syngo MR B17”’. BIDS stores 3D anatomical images as a single file, and most 4D functional and diffusion time-series as a single file (though note, for multi-echo time series, each echo is stored as a separate file). This format has a more limited scope and is more constrained than DICOM. For example, all slices in a 3D NIfTI volume must be equidistant, while the DICOM format allows for variable slice distances. Another example is that DICOM supports many formats for compressing the voxel data (referred to as transfer syntaxes), with many of these compression schemes essentially unique to medical imaging (e.g., lossless JPEG formats with 16-bit precision were not widely adopted outside DICOM). In contrast, BIDS images can either be uncompressed or use the old but ubiquitous gzip file-level compression. The human-readable nature of BIDS JSONs requires floating-point numbers to be stored as ASCII text, which can introduce rounding errors and differences in precision compared to binary representations. This limitation is particularly relevant for our validation datasets, as tools that use different conventions for storing values may detect minor discrepancies between their outputs and the provided reference values. As a result, a degree of tolerance may be necessary to distinguish meaningful errors from negligible differences in value representation.
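One way such a tolerance might be applied is sketched below. The function name `values_match` and the tolerance values are assumptions chosen for illustration; they are not thresholds used by the repositories or by any specific tool.

```python
import json
import math


def values_match(ref, out, rel_tol=1e-6, abs_tol=1e-9):
    """Compare two decoded JSON values, allowing small numeric differences."""
    # bool is a subclass of int in Python, so handle it before the numeric case.
    if isinstance(ref, bool) or isinstance(out, bool):
        return ref == out
    if isinstance(ref, (int, float)) and isinstance(out, (int, float)):
        return math.isclose(ref, out, rel_tol=rel_tol, abs_tol=abs_tol)
    if isinstance(ref, list) and isinstance(out, list):
        return len(ref) == len(out) and all(
            values_match(r, o, rel_tol, abs_tol) for r, o in zip(ref, out)
        )
    if isinstance(ref, dict) and isinstance(out, dict):
        return ref.keys() == out.keys() and all(
            values_match(ref[k], out[k], rel_tol, abs_tol) for k in ref
        )
    return ref == out  # strings, null


# The second sidecar differs only in the last printed digit of EchoTime.
reference = json.loads('{"EchoTime": 0.03, "SoftwareVersions": "syngo MR B17"}')
converted = json.loads('{"EchoTime": 0.030000001, "SoftwareVersions": "syngo MR B17"}')
print(values_match(reference, converted))
```

The appropriate tolerance depends on the parameter. A relative tolerance suits values spanning several orders of magnitude, while the absolute tolerance guards comparisons near zero.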
Common terminology across manufacturers
Many terms are clearly defined by the DICOM standard and are unambiguous across manufacturers. Since both DICOM and the BIDS specification attempt to describe the parameters used to acquire imaging data, it is unsurprising that many DICOM public tags map directly to BIDS keys (Table 2). However, it is worth noting that different manufacturers interpret some public DICOM tags differently. For example, some MR sequences have both brief and long repetition times, and some manufacturers report the brief duration while others use the long duration for the public tag <0018,0080> “Repetition Time”. In contrast, the BIDS standard attempts to disambiguate different intervals, hence the BIDS keys “RepetitionTime”, “RepetitionTimeExcitation” and “RepetitionTimePreparation”.
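A direct mapping of this kind can be sketched as a small lookup table. This is an illustration only and does not reproduce Table 2: the dictionary `PUBLIC_TAG_TO_BIDS` and the function `to_bids` are hypothetical names, and the sketch assumes the input values have already been read from a DICOM file. It relies on the fact that DICOM stores Echo Time and Repetition Time in milliseconds while BIDS uses seconds.

```python
MS_TO_S = 0.001  # DICOM timing tags are in milliseconds; BIDS uses seconds

# (group, element) -> (BIDS key, scale factor or None for values copied as-is)
PUBLIC_TAG_TO_BIDS = {
    # Caution: as noted above, manufacturers differ in which interval they
    # report in Repetition Time, so this naive mapping can be ambiguous.
    (0x0018, 0x0080): ("RepetitionTime", MS_TO_S),
    (0x0018, 0x0081): ("EchoTime", MS_TO_S),
    (0x0018, 0x1020): ("SoftwareVersions", None),
}


def to_bids(dicom_values):
    """Convert a {(group, element): value} dict into BIDS-style key:value pairs."""
    sidecar = {}
    for tag, value in dicom_values.items():
        if tag not in PUBLIC_TAG_TO_BIDS:
            continue  # private or unmapped tags need manufacturer-specific logic
        key, scale = PUBLIC_TAG_TO_BIDS[tag]
        if scale is None:
            sidecar[key] = value
        else:
            # Round to suppress floating-point noise from the unit conversion.
            sidecar[key] = round(float(value) * scale, 6)
    return sidecar


example = {(0x0018, 0x0081): 30.0, (0x0018, 0x1020): "syngo MR B17"}
print(to_bids(example))
```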
Manufacturer specific terminology and missing data
Some BIDS keys map onto private tags used by specific manufacturers (Table 3). The table presents private DICOM tags that map directly onto a single BIDS key:value pair. Manufacturers can also record information in public DICOM tags that are embedded inside private tags, with values that conflict with the public tag information stored at the root level (e.g., our ‘dcm_qa_philips’ repository demonstrates this with the public tag <0020,0032> “Image Position (Patient)”). In addition, as noted previously, some manufacturers use OB data chunks to encode variables using their own proprietary data structures. Furthermore, different manufacturers use different conventions to encode the spatial direction for the diffusion-sensitizing directions/vectors that accompany diffusion imaging (with some using world space and others using image space). Finally, the DICOM files generated by some manufacturers omit acquisition details that are needed by some BIDS-compliant pipelines. For example, Philips DICOM files do not provide the details required to populate the BIDS ‘SliceTiming’ array. This means that a user must either populate that information manually (for example, using ezBIDS17) or skip the slice time correction processing step that can improve statistical power in certain acquisition regimes18. Likewise, Philips DICOM files do not record the phase encoding polarity or readout-time parameters necessary for correcting spatial distortions in echo-planar acquisitions19. In addition to addressing variations in reporting data, validation datasets play a crucial role in identifying missing variables and errors in DICOM files. When data are absent, tools such as PET2BIDS20 can supplement missing information to enable subsequent analyses. In both cases, documenting the problem enables manufacturers to correct these issues and allows tools to flag problematic data.
A notable example is the recent Siemens enhanced data, where features like slice timing and the MultibandAccelerationFactor were incorrectly specified, as documented in our dcm_xa_61 repository. Given that both BIDS and DICOM are evolving community standards, a comprehensive listing is beyond the scope of this work. Instead, we provide hyperlinks to the formal specifications and refer readers to the respective websites for Tables 2 and 3, which are best suited to track ongoing updates.
While the public DICOM tags and defined BIDS fields aim to standardize common acquisition parameters, in practice, manufacturers may use proprietary values or settings that lack an unambiguous one-to-one mapping. Our diverse datasets are intended to illustrate both standardization and areas of divergence. For example, the ‘dcm_qa_cs_dl’ repository includes MRI series from Canon, Philips, and Siemens where deep learning–based image reconstruction is employed, with vendor-specific settings such as different sharpening levels (e.g., off, low, or high). Detecting these proprietary settings can help identify inter-site differences and support statistical modeling when acquisition parameters vary. Thus, while our primary goal is to provide examples of commonalities across manufacturers, the datasets also capture important instances of manufacturer-specific variability.
Third-party modifications
Additionally, some third-party Picture Archiving and Communication Systems (PACS) can modify each DICOM file they touch. This can introduce further challenges, including renumbering private tags, appending their own tags, or altering the compression scheme used for the voxel intensities (the transfer syntax). These modifications can make it more difficult to interpret the original manufacturer’s proprietary tags and may require additional troubleshooting or adjustments during the conversion process. Ensuring compatibility with such systems is an ongoing effort that underscores the importance of robust, adaptable conversion tools and detailed documentation. The ‘dcm_qa_ts’ repositories provide concrete examples of these situations.
Data Records
All datasets are available from the dcm_validate master repository, which links to each validation dataset as a Git submodule. The repository is archived at Zenodo (https://doi.org/10.5281/zenodo.15310934)16. Our validation dataset has a hierarchical structure, with specific edge cases (manufacturer, software, modality) illustrated in independent modular repositories. Table 1 provides a list of the repositories and the URL for data citation. Each of the repositories follows a regular structure. Specifically, the DICOM files are in the “In” folder, with the validated reference NIfTI format images and BIDS text files in the “Ref” folder. Also in the root folder is the “README.md” text file that describes the rationale for the repository, with each repository showcasing unique DICOM properties. The “LICENSE” file details the permissive license used for distribution. All repositories also include the shell script “batch.sh” that will invoke the user’s installed instance of dcm2niix to generate a new set of images in a folder named “Out” and to report any differences between these and the files provided in the “Ref” folder.
Each repository listed in Table 1 is structured as a standalone dataset with the following top-level folders and files:
- In/: Contains the original DICOM files in .dcm format. Files are organized by series and maintain the original filenames as exported from the scanner or PACS.
- Ref/: Contains validated reference files in BIDS-compliant format:
  - .nii: NIfTI-formatted imaging data.
  - .json: Metadata sidecars with key acquisition parameters (e.g., RepetitionTime, EchoTime, PhaseEncodingDirection).
  - .bvec/.bval: FSL format gradient directions and magnitude for diffusion images.
- README.md: Describes the goal of the repository, key characteristics of the dataset, and instructions for running the validation.
- LICENSE: Declares the BSD 2-Clause license, with a dual-license option under CC BY 4.0 for image reuse.
- batch.sh: A shell script that calls the user’s local dcm2niix installation to convert In/ to Out/, and compares the results against Ref/.
Some repositories also include domain-specific resources:
- *.xlsx: Spreadsheet with parameter sweeps or timing calculations (e.g., dcm_qa).
- slicetime.cpp: Minimal C source for slice timing validation (dcm_qa_ge).
- *.bval, *.bvec: Diffusion validation code and files (e.g., dcm_qa_dwi).
Folder and file names are consistent across repositories to facilitate automation. Users can expect the same script (batch.sh) and README format in each repository, regardless of modality or manufacturer. The primary variable across repositories is the DICOM content: different manufacturers, modalities, or edge cases (as described in the Comments column of Table 1).
The goal of these repositories is to demonstrate DICOM implementations, and therefore there was an explicit emphasis on low resolution images with a small number of observations. Many of the repositories include phantoms (water bottles). Some modalities (in particular, diffusion and arterial spin labeling) benefit from images of the human brain to allow proper validation. In these cases, data was acquired from the co-authors with the explicit knowledge that these would be shared on public repositories. These images also include generalized demographic details required to estimate specific absorption rate (SAR) such as age, height and weight. The Institutional Review Board (IRB) at the University of South Carolina determined that IRB review was not required for this project, as it does not meet the criteria for “research” as defined under 45 CFR 46.102(l), given that the data from each repository pertains to a single individual and are not generalizable to broader populations. This determination aligns with the intent of the project, which is to provide high-quality validation resources for the neuroimaging community rather than to draw inferences about human health or behavior.
This simple structure allows developers to ensure consistent results across different versions of dcm2niix, including builds compiled with different toolchains, targeting various architectures, or linked against different libraries, all of which can introduce variability in output21,22. While these repositories were originally developed to support automated regression testing of dcm2niix15, portions have since been adopted to assist development and validation in other tools, including nibabel23, divest for R24, mriconvert15, SPM15, FreeSurfer’s mri_convert25, orthanc-neuro26, and dicm2nii15.
Several of the repositories contain additional files that help developers extract the correct results. For example, ‘dcm_qa’ provides an Excel format spreadsheet that demonstrates how in-plane acceleration factor, partial Fourier and other parameters were varied, providing the formulas to infer total readout time. Likewise, the repository ‘dcm_qa_ge’ provides a minimal C program (slicetime.cpp) validated by General Electric (GE) engineers for deriving parameters. Another example is ‘dcm_qa_sag’, which includes Python scripts to generate validation tensor files and bitmap images used to confirm the correct definition of the diffusion gradient directions. In all these cases, these supplemental files are described in the “README.md” file of the repository.
Most of the repositories in Table 1 focus on the Magnetic Resonance (MR) modality, reflecting its versatility in generating diverse contrasts and its popularity among researchers due to the absence of ionizing radiation exposure for participants. The repository ‘dcm_qa_ct’ provides examples of computed tomography (CT), highlighting unique features of this modality, such as gantry tilt (leading to shear in 3D volumes) and variable inter-slice distances.
As shown in Table 1, each validation repository is modular, following the same structure. Typically, each repository is designed to showcase a specific edge case, as indicated by the ‘Comments’ section in Table 1, as well as the individual README.md files within each repository. The README file provides the rationale for each repository. We envision the number of repositories growing to document the evolving development of DICOM usage. Zenodo provides a master repository (‘dcm_validate’) that includes all sub-modules. This provides a single starting location for all of the modules.
In general, most validation repositories provide DICOM images acquired directly from the scanner without modification. However, there are exceptions where images were exported through a local Picture Archiving and Communication System (PACS), and these images can be identified by inspecting the Implementation Version Name (0002,0013) tag. No uniform de-identification tool or configuration was applied across datasets; rather, each dataset reflects the local practices at the institution where the data were acquired. We emphasize that private attributes critical for acquisition metadata were retained.
Technical Validation
Each dataset includes reference BIDS format text files that have been meticulously checked to ensure accurate correspondence with the DICOM information. As these have been made publicly available, users have been able to identify limitations and extend the content of the JSON files as the BIDS specification gets extended and our understanding of the manufacturer’s own interpretation of the DICOM standard and their use of proprietary DICOM tags improves. Any individual can use a GitHub issue to make a suggestion for how these repositories can be enhanced (with our dcm2niix repository already listing 931 closed issues that describe enhancements, feature requests, and limitations). Therefore, these repositories provide a method for the community to work collaboratively to ensure robust data conversion.
Although every effort has been made to ensure accuracy, certain limitations are inherent in interpreting vendor-specific attributes without formal documentation. In the absence of public manufacturer-issued conformance statements, extracted metadata were manually inspected and, when possible, verified with input from manufacturer engineers. By remaining open-source, these datasets invite community-driven validation and represent the most complete publicly available harmonization effort.
To ensure the validity and quality of our own derived datasets, we initially used the provided batch.sh script to generate a reference conversion using dcm2niix, with outputs saved in the Ref folder. The resulting BIDS datasets were then evaluated using the bids-validator to confirm compliance. We manually inspected all fields to ensure that required metadata were either correctly populated or documented as unavailable in the source DICOMs. For each new release of dcm2niix, these repositories are revalidated to ensure identical results; any discrepancies are reviewed manually to determine whether they reflect an unintended change or a meaningful improvement. Sharing these datasets publicly has allowed the broader neuroimaging community, including developers of related tools, to provide feedback and help verify that the curated outputs are both accurate and comprehensive (Fig. 1).
Fig. 1 Structure of a Validation Repository. Each repository contains two folders: the “In” folder holds the input files in DICOM format, while the “Ref” folder contains the reference conversion of these DICOM files to BIDS format, with each series producing a NIfTI image file (.nii) and an accompanying JSON metadata file (.json). The figure illustrates three types of 4D time-series inputs: functional imaging (fMRI), stored as mosaics with one DICOM file per 3D volume (32 volumes); diffusion imaging (DWI), saved in classic DICOM format with one file per 2D slice (60 slices); and arterial spin labeling imaging (ASL), saved in enhanced DICOM format, where all slices and volumes are stored in a single file. In all cases, the reference NIfTI files are stored as 4D data. The repository also includes a shell script (batch.sh) that uses dcm2niix to convert the DICOM data from the “In” folder into a BIDS dataset in a new “Out” folder and verifies that the output matches the files in the “Ref” folder. Additionally, a README.md file describes the unique properties of the repository, and a LICENSE file specifies the permissions for sharing the dataset.
Usage Notes
The validation repositories with DICOM files and their corresponding validated BIDS/NIfTI reference files are publicly available on Zenodo16. Once downloaded, the Bash command-line script “batch.sh” included in each module can be executed. This will use the version of dcm2niix in the user’s path to convert all of the DICOM files in the “In” folder to a new folder “Out” and then test that all of the files in the “Out” folder match the reference files in the “Ref” folder. Binary NIfTI images are tested for identical matches, while text-based JSON files are compared key by key, with any discrepancies reported.
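For developers who wish to run a similar check from another tool, the comparison step can be approximated as follows. This is a sketch of the logic described above and is not the contents of batch.sh; the function names are hypothetical, and it assumes the “Out” folder has already been populated by a converter.

```python
import filecmp
import json
from pathlib import Path


def json_differences(ref_path, out_path):
    """Compare two JSON sidecars key by key and describe each discrepancy."""
    ref = json.loads(Path(ref_path).read_text())
    out = json.loads(Path(out_path).read_text())
    problems = []
    for key in sorted(set(ref) | set(out)):
        if key not in out:
            problems.append(f"{key}: missing from output")
        elif key not in ref:
            problems.append(f"{key}: not in reference")
        elif ref[key] != out[key]:
            problems.append(f"{key}: {ref[key]!r} != {out[key]!r}")
    return problems


def compare_folders(ref_dir, out_dir):
    """Return {filename: [problems]} for every reference file that mismatches."""
    report = {}
    for ref_file in sorted(Path(ref_dir).iterdir()):
        if not ref_file.is_file():
            continue
        out_file = Path(out_dir) / ref_file.name
        if not out_file.exists():
            report[ref_file.name] = ["file missing from output"]
        elif ref_file.suffix == ".json":
            problems = json_differences(ref_file, out_file)
            if problems:
                report[ref_file.name] = problems
        elif not filecmp.cmp(ref_file, out_file, shallow=False):
            # Binary files (e.g., NIfTI images) must be byte-for-byte identical.
            report[ref_file.name] = ["binary contents differ"]
    return report


if __name__ == "__main__" and Path("Ref").is_dir() and Path("Out").is_dir():
    for name, problems in compare_folders("Ref", "Out").items():
        print(name, *problems, sep="\n  ")
```

An empty report indicates that every reference file has an identical counterpart in the output folder. The JSON comparison here is exact; a tool that serializes floating-point values differently may need a numeric tolerance.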
To help users navigate the growing collection of validation datasets, we include a Python script (catalog_datasets.py) that catalogs series-level DICOM metadata across all submodules. This script is designed to run in two stages: the first pass scans all available BIDS JSON files (typically one file per DICOM series) and generates a catalog_fields.txt file listing all encountered fields. Users can then edit this file to select a subset of relevant fields (e.g., Manufacturer, PatientAge, EchoTime). A second run of the script uses the customized field list to generate a comma-separated values (CSV) table summarizing the selected metadata across all datasets. This utility facilitates dataset discovery and supports tool developers in identifying representative series for specific testing scenarios. For example, at the time of writing, the field “Manufacturer” appeared in 427 JSON files: Siemens (210), GE (90), Philips (70), Canon (45), Toshiba (5), and UIH (5). Similarly, the “MagneticFieldStrength” field identified 12 series acquired at 1.5 T, 391 at 3 T, and 14 at 7 T, with the remainder corresponding to CT acquisitions. This cataloging script provides a flexible way to search the entire family of repositories, enabling users to identify datasets with specific acquisition properties or scanner configurations without downloading each repository individually.
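The two-stage approach can be illustrated with a short sketch. This is not the code of catalog_datasets.py; the function names, the CSV layout, and the tallying helper are assumptions made for illustration.

```python
import csv
import json
from collections import Counter
from pathlib import Path


def load_sidecars(root):
    """Yield (path, dict) for every readable JSON object found below root."""
    for path in sorted(Path(root).rglob("*.json")):
        try:
            data = json.loads(path.read_text())
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip files that are not valid JSON text
        if isinstance(data, dict):
            yield path, data


def list_fields(root):
    """Stage 1: collect every key that appears in any sidecar."""
    fields = set()
    for _, data in load_sidecars(root):
        fields.update(data)
    return sorted(fields)


def write_catalog(root, fields, csv_path):
    """Stage 2: write one CSV row per sidecar for the selected fields."""
    rows = 0
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", *fields])
        for path, data in load_sidecars(root):
            relative = path.relative_to(root)
            writer.writerow([str(relative), *(data.get(f, "") for f in fields)])
            rows += 1
    return rows


def count_values(root, field):
    """Tally how often each value of one field occurs across all sidecars."""
    return Counter(
        str(data[field]) for _, data in load_sidecars(root) if field in data
    )
```

A typical session would call `list_fields` once, let the user prune the resulting list, and then pass the shortened list to `write_catalog`. Calling `count_values` with a field such as "Manufacturer" produces per-value tallies like those quoted above.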
Code availability
All repositories are publicly available with URLs for each dataset listed in Table 1. Each repository is available using a permissive open source license with details described in each repository’s “LICENSE” file. Users can make suggestions directly from any of the repository web pages by generating an “Issue”.
References
Vogt, N. Reproducibility in MRI. Nat Methods 20, 34, https://doi.org/10.1038/s41592-022-01737-3 (2023).
Smith, S. M. et al. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 23(Suppl 1), S208–19, https://doi.org/10.1016/j.neuroimage.2004.07.051 (2004).
Yu, M. et al. Statistical harmonization corrects site effects in functional connectivity measurements from multi-site fMRI data. Hum Brain Mapp 39, 4213–4227, https://doi.org/10.1002/hbm.24241 (2018).
Bidgood, W. D. Jr, Horii, S. C., Prior, F. W. & Van Syckle, D. E. Understanding and using DICOM, the data interchange standard for biomedical imaging. J Am Med Inform Assoc 4, 199–212, https://doi.org/10.1136/jamia.1997.0040199 (1997).
Gorgolewski, K. J. et al. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Sci Data 3, 160044, https://doi.org/10.1038/sdata.2016.44 (2016).
Poldrack, R. A. et al. The past, present, and future of the brain imaging data structure (BIDS). Imaging Neurosci (Camb) 2, 1–19, https://doi.org/10.1162/imag_a_00103 (2024).
Horien, C. et al. A hitchhiker’s guide to working with large, open-source neuroimaging datasets. Nat Hum Behav 5, 185–193, https://doi.org/10.1038/s41562-020-01005-4 (2021).
Markiewicz, C. J. et al. The OpenNeuro resource for sharing of neuroscience data. Elife 10, https://doi.org/10.7554/eLife.71774 (2021).
Petersen, R. C. et al. Alzheimer’s Disease Neuroimaging Initiative (ADNI): clinical characterization. Neurology 74, 201–209, https://doi.org/10.1212/WNL.0b013e3181cb3e25 (2010).
Herz, C. et al. An Open Source Library for Standardized Communication of Quantitative Image Analysis Results Using DICOM. Cancer Res 77, e87–e90, https://doi.org/10.1158/0008-5472.CAN-17-0336 (2017).
Fedorov, A. et al. DICOM for quantitative imaging biomarker development: a standards based approach to sharing clinical data and structured PET/CT analysis results in head and neck cancer research. PeerJ 4, e2057, https://doi.org/10.7717/peerj.2057 (2016).
Yvernault, B. C. et al. Validating DICOM transcoding with an open multi-format resource. Neuroinformatics 12, 615–617, https://doi.org/10.1007/s12021-014-9230-9 (2014).
Rutherford, M. et al. A DICOM dataset for evaluation of medical image de-identification. Sci Data 8, 183, https://doi.org/10.1038/s41597-021-00967-y (2021).
Clunie, D. & Erickson, B. J. The new enhanced multiframe CT and MR DICOM objects. in (Proc. 2005 Society for Computer Applications in Radiology Annual Meeting, Orlando, FL, 2005).
Li, X., Morgan, P. S., Ashburner, J., Smith, J. & Rorden, C. The first step for neuroimaging data analysis: DICOM to NIfTI conversion. J. Neurosci. Methods 264, 47–56, https://doi.org/10.1016/j.jneumeth.2016.03.001 (2016).
Rorden, C. neurolabusc/dcm_validate: Initial release. https://doi.org/10.5281/zenodo.15310934.
Levitas, D. et al. ezBIDS: Guided standardization of neuroimaging data interoperable with major data archives and platforms. Sci Data 11, 179, https://doi.org/10.1038/s41597-024-02959-0 (2024).
Sladky, R. et al. Slice-timing effects and their correction in functional MRI. Neuroimage 58, 588–594, https://doi.org/10.1016/j.neuroimage.2011.06.078 (2011).
Andersson, J. L. R., Skare, S. & Ashburner, J. How to correct susceptibility distortions in spin-echo echo-planar images: application to diffusion tensor imaging. Neuroimage 20, 870–888, https://doi.org/10.1016/S1053-8119(03)00336-7 (2003).
Galassi, A. et al. PET2BIDS: a library for converting Positron Emission Tomography data to BIDS. J Open Source Softw 9, https://doi.org/10.21105/joss.06067 (2024).
Renton, A. I. et al. Neurodesk: an accessible, flexible and portable data analysis environment for reproducible neuroimaging. Nat Methods 21, 804–808, https://doi.org/10.1038/s41592-023-02145-x (2024).
Rorden, C. et al. niimath and fslmaths: replication as a method to enhance popular neuroimaging tools. Apert Neuro 4, https://doi.org/10.52294/001c.94384 (2024).
Brett, M. et al. Nipy/nibabel: 5.3.1. https://doi.org/10.5281/ZENODO.591597 (Zenodo, 2024).
Clayden, J. & Rorden, C. Divest: Get images out of DICOM format quickly. CRAN: Contributed Packages The R Foundation https://doi.org/10.32614/cran.package.divest (2016).
Greve, D. N. & Fischl, B. Accurate and robust brain image alignment using boundary-based registration. Neuroimage 48, 63–72, https://doi.org/10.1016/j.neuroimage.2009.06.060 (2009).
Jodogne, S. The Orthanc ecosystem for medical imaging. J. Digit. Imaging 31, 341–352, https://doi.org/10.1007/s10278-018-0082-y (2018).
Acknowledgements
This project was supported by the National Institutes of Health (P50-DC014664; RF1-MH133701). As is typical of open source projects, this work benefited from the contributions of a wide community. In particular, we thank Shan C Young.
Author information
Contributions
All authors reviewed and contributed to the manuscript. CR developed the concept of the validation datasets, was the initial author of the script files and “readme” files, and wrote the original draft of the manuscript. All other authors acquired validation datasets at their centers and validated the accuracy of the conversion of those datasets.
Ethics declarations
Competing interests
Some authors of this manuscript are employed by imaging equipment manufacturers, specifically Philips (MC, SG), General Electric (BF, JS), and Siemens (WR). Their contributions were made in the interest of promoting transparency and reproducibility in scientific research. These individuals provided technical insights that facilitate the interpretation of vendor-specific attributes and support the broader community’s efforts to harmonize metadata extraction across manufacturers. No commercial products are promoted herein. These contributions are intended to encourage the alignment of vendor practices with evolving open standards rather than to maintain the status quo. The remaining authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Rorden, C., Béranger, B., Cheng, H. et al. DICOM datasets for reproducible neuroimaging research across manufacturers and software versions. Sci Data 12, 1168 (2025). https://doi.org/10.1038/s41597-025-05503-w



