Table 1 Summary of the variables.

From: Unfolding the downloads of datasets: A multifaceted exploration of influencing factors

 

| Variable | Abbreviation | Description |
|---|---|---|
| Dependent variables | | |
| Total downloads | Normalized_download | The total number of downloads of files in the dataset. |
| Average downloads | Normalized_average_download | The average daily downloads of files in the dataset. |
| Independent variables | | |
| Length of the descriptive text | Description_length | The number of words in the description metadata field. |
| New Dale-Chall score | Dale_chall | The New Dale-Chall readability score of the description text in the dataset metadata. |
| Degree of file accessibility | File_openness | The proportion of files in the dataset that are completely open and can be directly downloaded by users. |
| Authority of the dataset author's institution | Institution_rank | The impact of the dataset author's institution in the world. |
| Citations of the dataset-related papers | Related_citation | The total number of citations of papers supported by the dataset. |
| Control variables | | |
| Time since the dataset was released | Publication_duration | The number of days between the publication date and the most recent date in the dataset. |
| Number of files | File_num | The number of files in the dataset. |
| Dataset size | File_size | The total size of the files in the dataset. |
| Number of subjects | Subject_num | The number of subjects to which a dataset belongs. |
| Indexed by a registry of research data repositories | Re3data_fairsharing | Whether the data repository that the dataset comes from is indexed by re3data.org or fairsharing.org. |
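
Three of the variables in Table 1 (Description_length, File_openness, and Publication_duration) are defined concretely enough to sketch in code. The Python below is only an illustration under stated assumptions, not the authors' implementation: the function names, the whitespace-based word count, the handling of a dataset with no files, and the choice of end date are all hypothetical, and the normalization applied to the download variables is not shown because the table does not specify it.

```python
from datetime import date


def description_length(description: str) -> int:
    """Number of words in the description field.

    Assumption: words are whitespace-separated tokens; the source
    does not say which tokenizer was used.
    """
    return len(description.split())


def file_openness(file_is_open: list[bool]) -> float:
    """Proportion of files that are completely open.

    Assumption: a dataset with no files is given 0.0.
    """
    if not file_is_open:
        return 0.0
    return sum(file_is_open) / len(file_is_open)


def publication_duration(published: date, end: date) -> int:
    """Days between the publication date and an end date.

    Assumption: `end` stands in for "the most recent date in the
    dataset"; the table does not say how that date is chosen.
    """
    return (end - published).days


# Hypothetical example record, invented for illustration only.
n_words = description_length("Daily rainfall totals for 2020")
openness = file_openness([True, True, False, True])
days = publication_duration(date(2020, 1, 1), date(2020, 1, 31))
print(n_words, openness, days)
```

Keeping each variable in its own small function makes it easy to swap in a different tokenizer or end date without touching the others.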