Table 1 Parsed X-Ray data files.

Field for Data File	Description
people.csv: Contains a list of actors, corresponding characters, and IMDb name IDs for actors.
name_id	Name ID of person from IMDb. The URL corresponding to a name_id would be https://www.imdb.com/name/<name_id>. Example: https://www.imdb.com/name/nm0451321 for nm0451321.
person	Name of the person.
character	Name of the character in the movie.
scenes.csv: Contains a list of scenes, along with the start and end timestamps of each scene.
scene	Scene number.
start	Scene start timestamp in milliseconds.
end	Scene end timestamp in milliseconds.
people_in_scenes.csv: Contains a list of scenes with IMDb IDs of people appearing in the scene, along with start and end timestamps.
scene	Scene number.
start	Scene start timestamp in milliseconds.
end	Scene end timestamp in milliseconds.
name_id	Name ID of person from IMDb.
timestamp	Timestamp of the character’s first appearance in the scene, in milliseconds.
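The three files are linked by the scene and name_id fields. As a minimal sketch of that relationship, the Python snippet below joins people_in_scenes.csv to people.csv on name_id. The sample rows and IDs are invented placeholders, the helper name `cast_by_scene` is hypothetical, and the files are assumed to be comma-separated with a header row matching the field names in Table 1.

```python
import csv
import io
from collections import defaultdict

# Invented placeholder rows; real files are assumed to carry these headers.
PEOPLE_CSV = """name_id,person,character
nmA,Person A,Character A
nmB,Person B,Character B
"""

PEOPLE_IN_SCENES_CSV = """scene,start,end,name_id,timestamp
1,0,60000,nmA,1500
1,0,60000,nmB,30000
2,60000,95000,nmB,61000
"""

def cast_by_scene(people_file, people_in_scenes_file):
    """Map each scene number to (person, character, first-appearance ms) tuples."""
    # Assumes one row per name_id; if an ID repeats, the last row wins.
    people = {row["name_id"]: row for row in csv.DictReader(people_file)}
    cast = defaultdict(list)
    for row in csv.DictReader(people_in_scenes_file):
        info = people.get(row["name_id"])
        if info is None:
            continue  # name_id missing from people.csv; skip rather than guess
        cast[int(row["scene"])].append(
            (info["person"], info["character"], int(row["timestamp"]))
        )
    return dict(cast)

cast = cast_by_scene(io.StringIO(PEOPLE_CSV), io.StringIO(PEOPLE_IN_SCENES_CSV))
```

With real data, the two `io.StringIO` objects would be replaced by file handles opened on the corresponding CSV files for a given film.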

These files are provided for each film under the xrays directory, following the schema shown in Fig. 2. Note that the scene timestamps in these files have not been verified to be aligned with the subtitle timestamps contained in the .ttml2 files (Fig. 2). However, the example Jupyter notebook demonstrates how subtitles can be assigned to scenes based on temporal overlap. The majority of subtitle segments fall fully within scene boundaries, indicating a strong degree of alignment between the two timestamp sources. Perfect correspondence is not expected, as spoken dialogue may cross visual scene boundaries.
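One way to make overlap-based assignment concrete is sketched below. This is an illustration under stated assumptions, not necessarily the method used in the example notebook: subtitle timestamps are assumed to have already been parsed from the .ttml2 files into milliseconds, each subtitle is assigned to the scene with which it shares the most time, and the function names and sample intervals are hypothetical.

```python
def overlap_ms(a_start, a_end, b_start, b_end):
    """Length of the intersection of two [start, end) intervals, in ms."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

def assign_subtitles(subtitles, scenes):
    """Assign each subtitle to the scene with the largest temporal overlap.

    subtitles: list of (start_ms, end_ms, text)
    scenes: list of (scene_number, start_ms, end_ms)
    Returns a list of (text, scene_number or None, fully_contained).
    """
    results = []
    for sub_start, sub_end, text in subtitles:
        best_scene, best_overlap = None, 0
        for number, sc_start, sc_end in scenes:
            ov = overlap_ms(sub_start, sub_end, sc_start, sc_end)
            if ov > best_overlap:
                best_scene, best_overlap = number, ov
        # A subtitle lies fully within its scene when the overlap
        # equals the subtitle's own duration.
        fully = best_scene is not None and best_overlap == sub_end - sub_start
        results.append((text, best_scene, fully))
    return results

# Invented sample intervals, in milliseconds.
scenes = [(1, 0, 60000), (2, 60000, 95000)]
subtitles = [
    (1000, 3000, "inside scene 1"),
    (59000, 62000, "crosses the boundary"),
    (100000, 101000, "after the last scene"),
]
assignments = assign_subtitles(subtitles, scenes)
```

The `fully_contained` flag distinguishes subtitles that fall entirely within a scene from those that straddle a boundary, so the share of fully contained segments can be computed directly from the result.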
