Table 1. Hierarchical organization of the data fields in the mdCATH dataset, with units and descriptions.

From: mdCATH: A Large-Scale MD Dataset for Data-Driven Computational Biophysics

Field                 Size        Type     Unit         Description
Domain ID/                                              Top-level group named after the CATH domain ID
  chain               N           string                Chain ID
  element             N           string                Chemical element
  pdb                 1           string                PDB file used for the simulation
  psf                 1           string                Topology (PSF) file used for the simulation
  pdbProteinAtoms     1           string                PDB file containing the N reported atoms
  resid               N           integer               Residue number
  resname             N           string                Residue name
  z                   N           integer               Atomic number
  .numResidues        1           integer               Number of residues (attribute)
  320/                                                  Group for the 320 K simulations
    0/                                                  Data of the first replica
      coords          F × N × 3   float    Å            Atom coordinates
      forces          F × N × 3   float    kcal/mol/Å   Forces
      dssp            F × R       string                DSSP secondary-structure assignments
      gyrationRadius  F           double   nm           Radius of gyration
      rmsd            F           float    nm           Root-mean-square deviation with respect to the initial structure
      rmsf            R           float    nm           Cα root-mean-square fluctuation
      box             3 × 3       float    nm           Simulation unit cell
      .numFrames      1           integer               Number of frames in this replica (attribute)
    1/                                                  Data of the second replica
    …
  348/                                                  Group for the 348 K simulations
    …
  …
Note: The groups and fields above are provided in one HDF5 file per simulated CATH domain. Fields prefixed with a dot are HDF5 attributes rather than datasets, and names ending in a slash are groups. Key: N, number of atoms; R, number of residues; F, trajectory length in frames (1 frame corresponds to 1 ns of simulated time).
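The per-replica observables gyrationRadius, rmsd and rmsf are derived from coords, but the table does not say exactly how. The following is a minimal NumPy sketch under stated assumptions, not the documented mdCATH procedure: the radius of gyration is unweighted over all supplied atoms, RMSD is taken after Kabsch superposition onto the first stored frame, and RMSF is computed from caller-supplied Cα coordinates after aligning every frame to the first one. The function names (kabsch_align, gyration_radius, rmsd_to_first_frame, rmsf) are hypothetical. The point it mainly illustrates is the unit bookkeeping: coords are stored in Å, whereas the derived fields are reported in nm.

```python
import numpy as np

# Unit bookkeeping from Table 1: coords are stored in Å, while
# gyrationRadius, rmsd and rmsf are reported in nm.
ANGSTROM_PER_NM = 10.0


def kabsch_align(mobile: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Rigidly superimpose `mobile` (N x 3) onto `reference` (N x 3) with the Kabsch algorithm."""
    mob_center = mobile.mean(axis=0)
    ref_center = reference.mean(axis=0)
    p = mobile - mob_center
    q = reference - ref_center
    # SVD of the 3 x 3 cross-covariance matrix between the centered structures.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the centered mobile structure and move it onto the reference centroid.
    return p @ rotation.T + ref_center


def gyration_radius(coords_A: np.ndarray) -> np.ndarray:
    """Radius of gyration per frame, in nm, for coords of shape F x N x 3 (Å).

    Assumption: unweighted (every atom counts equally), not mass-weighted.
    """
    centered = coords_A - coords_A.mean(axis=1, keepdims=True)
    return np.sqrt((centered ** 2).sum(axis=2).mean(axis=1)) / ANGSTROM_PER_NM


def rmsd_to_first_frame(coords_A: np.ndarray) -> np.ndarray:
    """RMSD of every frame to frame 0 after superposition, in nm (shape F).

    Assumption: frame 0 is used as the reference structure.
    """
    reference = coords_A[0]
    values = np.empty(len(coords_A))
    for i, frame in enumerate(coords_A):
        aligned = kabsch_align(frame, reference)
        values[i] = np.sqrt(((aligned - reference) ** 2).sum(axis=1).mean())
    return values / ANGSTROM_PER_NM


def rmsf(ca_coords_A: np.ndarray) -> np.ndarray:
    """Per-residue RMSF in nm, given Cα coordinates of shape F x R x 3 (Å).

    Assumption: all frames are first aligned to frame 0, then fluctuations
    are measured around the mean aligned position of each residue.
    """
    reference = ca_coords_A[0]
    aligned = np.stack([kabsch_align(frame, reference) for frame in ca_coords_A])
    mean_positions = aligned.mean(axis=0)
    return np.sqrt(((aligned - mean_positions) ** 2).sum(axis=2).mean(axis=0)) / ANGSTROM_PER_NM


if __name__ == "__main__":
    # Two atoms 20 Å apart: each lies 10 Å from the centroid, so Rg = 1 nm.
    print(gyration_radius(np.array([[[0.0, 0.0, 0.0], [20.0, 0.0, 0.0]]])))
```

To try it on a dataset file, one could read a replica with h5py, for example `coords = h5py.File(path, "r")[domain_id]["320"]["0"]["coords"][:]`, where `path` and `domain_id` are placeholders; the frame count would then be `h5py.File(path, "r")[domain_id]["320"]["0"].attrs["numFrames"]`, assuming the attribute is stored under that name without the leading dot. Superimposing before computing RMSD and RMSF removes rigid-body rotation and translation, so only internal motion contributes. Because the reference structure, atom selection and weighting used for the stored fields are not given in the table, recomputed values may differ from them.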