Table 1 Brief summary of major data sources commonly used in the science of science literature.

From: SciSciNet: A large-scale open data lake for the science of science research

| Data source | Highlights | API | Data dump |
|---|---|---|---|
| Crossref | Data on publications with DOIs registered in Crossref. | ✓ | ✓ |
| OpenAlex | Data connecting publications, authors, institutions, and concepts. | ✓ | ✓ |
| Dimensions | Data connecting publications, grants, datasets, trials, and patents. | — | — |
| Overton | Policy documents and their citations to science and policy. | — | — |
| OpenCitations | DOI-DOI open citation links. | ✓ | ✓ |
| AMiner | Advanced information generated through data mining techniques. | ✓ | ✓ |
| CiteSeerX | Full-text publications, one of the earliest digital library search engines. | ✓ | — |
| ORCID | Data on researchers with ORCID IDs (funding, works, peer review, etc.). | ✓ | ✓ |
| ROR | Data on research organizations with ROR IDs, seeded by GRID. | ✓ | ✓ |
| Retraction Watch | Data on retracted papers and reasons for retraction. | ✗ | — |
| Semantic Scholar | Publication dataset featuring AI-derived products (e.g., embeddings). | ✓ | — |
| Web of Science | Curated by in-house experts, basis for Journal Citation Reports. | — | — |
| PubMed | Biomedical literature with PubMed IDs, linked to NIH projects, clinical trials, and other biomedical entities. | ✓ | ✓ |
| NIH RePORTER | Data on NIH-funded projects, with linkages to publications, patents, and clinical studies. | ✓ | ✓ |
| NSF Awards | Data on NSF-funded projects, with linkages to publications. | ✓ | ✓ |
| Clinical Trials | Information on clinical studies and linkages to references worldwide. | ✓ | ✓ |
| PatentsView | Data on USPTO patents (citations, classifications, inventors, etc.). | ✓ | ✓ |
| Patent Citation to Science | Patent-science citations extracted from USPTO and EPO patents. | ✗ | ✓ |
| Publications of Nobel laureates | Publication records and prize-winning papers of Nobel laureates. | ✗ | ✓ |
| Altmetric | Data on online attention (e.g., mainstream and social media). | ✓ | — |
| CORE | Metadata and full-text information of 87M+ papers. | ✓ | ✓ |
| Unpaywall | Publication metadata and open-access related information. | ✓ | ✓ |
| DOAJ | Community-curated data on open-access journals and papers. | ✓ | ✓ |
| OpenAIRE Research Graph | Data connecting scientific products, organizations, funded projects, etc. from 70K+ sources. | ✓ | ✓ |
| Faculty Opinions with Gender | Metadata of authors from Faculty Opinions with gender classification from Faculty Opinions and Web of Science. | — | ✓ |
| Scopus | Documents selected by an independent review board of experts. | — | — |
| Lens | Citation relationships within and across papers and patents. | — | — |
| Springer Nature SciGraph | Triples connecting multiple entities in the research landscape, including publications, funders, and affiliations. | ✓ | ✓ |
| Google Scholar | Large-scale data on publications, citations, and disambiguated scholar profiles indexed by Google. | ✗ | ✗ |

✓: publicly available; —: available upon application or subscription; ✗: not available to the best of our knowledge. A more detailed summary is given in Table S1.
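The availability columns become easier to query once the table is transcribed into a machine-readable form. The following sketch assumes a hypothetical helper that is not part of SciSciNet or of any of the listed services. It encodes every row of Table 1 using the footnote legend and filters the rows by access mode, such as sources offering both a public API and a public data dump, or sources offering only a public bulk dump.

```python
from dataclasses import dataclass

# Legend from the table footnote.
PUBLIC = "✓"       # publicly available
ON_REQUEST = "—"   # available upon application or subscription
UNAVAILABLE = "✗"  # not available to the best of the authors' knowledge


@dataclass(frozen=True)
class Source:
    name: str
    api: str   # availability of an API
    dump: str  # availability of a data dump


# Rows transcribed from Table 1: (data source, API, data dump).
_ROWS = [
    ("Crossref", "✓", "✓"), ("OpenAlex", "✓", "✓"),
    ("Dimensions", "—", "—"), ("Overton", "—", "—"),
    ("OpenCitations", "✓", "✓"), ("AMiner", "✓", "✓"),
    ("CiteSeerX", "✓", "—"), ("ORCID", "✓", "✓"), ("ROR", "✓", "✓"),
    ("Retraction Watch", "✗", "—"), ("Semantic Scholar", "✓", "—"),
    ("Web of Science", "—", "—"), ("PubMed", "✓", "✓"),
    ("NIH RePORTER", "✓", "✓"), ("NSF Awards", "✓", "✓"),
    ("Clinical Trials", "✓", "✓"), ("PatentsView", "✓", "✓"),
    ("Patent Citation to Science", "✗", "✓"),
    ("Publications of Nobel laureates", "✗", "✓"),
    ("Altmetric", "✓", "—"), ("CORE", "✓", "✓"), ("Unpaywall", "✓", "✓"),
    ("DOAJ", "✓", "✓"), ("OpenAIRE Research Graph", "✓", "✓"),
    ("Faculty Opinions with Gender", "—", "✓"), ("Scopus", "—", "—"),
    ("Lens", "—", "—"), ("Springer Nature SciGraph", "✓", "✓"),
    ("Google Scholar", "✗", "✗"),
]
SOURCES = [Source(*row) for row in _ROWS]


def fully_open(sources):
    """Names of sources whose API and data dump are both publicly available."""
    return [s.name for s in sources if s.api == PUBLIC and s.dump == PUBLIC]


def bulk_only(sources):
    """Names of sources with a public data dump but no public API."""
    return [s.name for s in sources if s.dump == PUBLIC and s.api != PUBLIC]


def not_public(sources):
    """Names of sources with neither a public API nor a public data dump."""
    return [s.name for s in sources if PUBLIC not in (s.api, s.dump)]


if __name__ == "__main__":
    print(len(fully_open(SOURCES)), "fully open sources")
    print("Bulk dump only:", bulk_only(SOURCES))
```

Running the script reports 16 fully open sources. It also lists Patent Citation to Science, Publications of Nobel laureates, and Faculty Opinions with Gender as dump-only. The symbols are stored verbatim so that each row can be checked against the printed table. As the footnote notes, Table S1 provides finer-grained access details.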