Table 1. Brief summary of major data sources commonly used in the science of science literature.
From: SciSciNet: A large-scale open data lake for the science of science research
Data source | Highlights | API | Data dump |
|---|---|---|---|
Crossref | Data on publications with DOIs registered in Crossref. | ✓ | ✓ |
OpenAlex | Data connecting publications, authors, institutions, and concepts. | ✓ | ✓ |
Dimensions | Data connecting publications, grants, datasets, trials, and patents. | — | — |
Overton | Policy documents and their citations to science and policy. | — | — |
OpenCitations | DOI-DOI open citation links. | ✓ | ✓ |
AMiner | Advanced information generated through data mining techniques. | ✓ | ✓ |
CiteSeerX | Full-text publications, one of the earliest digital library search engines. | ✓ | — |
ORCID | Data on researchers with ORCID IDs (funding, works, peer review, etc.). | ✓ | ✓ |
ROR | Data on research organizations with ROR IDs, seeded by GRID. | ✓ | ✓ |
Retraction Watch | Data on retracted papers and reasons for retraction. | ✗ | — |
Semantic Scholar | Publication dataset featuring AI-derived products (e.g., embeddings). | ✓ | — |
Web of Science | Curated by in-house experts, basis for Journal Citation Reports. | — | — |
PubMed | Biomedical literature with PubMed IDs, linked to NIH projects, clinical trials, and other biomedical entities. | ✓ | ✓ |
NIH RePORTER | Data on NIH-funded projects, with linkages to publications, patents, and clinical studies. | ✓ | ✓ |
NSF Awards | Data on NSF-funded projects, with linkages to publications. | ✓ | ✓ |
Clinical Trials | Information on clinical studies and linkages to references worldwide. | ✓ | ✓ |
PatentsView | Data on USPTO patents (citations, classifications, inventors, etc.). | ✓ | ✓ |
Patent Citation to Science | Patent-science citations extracted from USPTO and EPO patents. | ✗ | ✓ |
Publications of Nobel laureates | Publication records and prize-winning papers of Nobel laureates. | ✗ | ✓ |
Altmetric | Data on online attention (e.g., mainstream and social media). | ✓ | — |
CORE | Metadata and full-text information of 87M+ papers. | ✓ | ✓ |
Unpaywall | Publication metadata and open-access related information. | ✓ | ✓ |
DOAJ | Community-curated data on open-access journals and papers. | ✓ | ✓ |
OpenAIRE Research Graph | Data connecting scientific products, organizations, funded projects, etc. from 70K+ sources. | ✓ | ✓ |
Faculty Opinions with Gender | Metadata of authors from Faculty Opinions with gender classification from Faculty Opinions and Web of Science. | — | ✓ |
Scopus | Documents selected by an independent review board of experts. | — | — |
Lens | Citation relationships within and across papers and patents. | — | — |
Springer Nature SciGraph | Triples connecting multiple entities in the research landscape, including publications, funders, and affiliations. | ✓ | ✓ |
Google Scholar | Large-scale data on publications, citations, and disambiguated scholar profiles indexed by Google. | ✗ | ✗ |
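
As a rough illustration of how the availability columns might be used when choosing a source, the sketch below copies a handful of rows from Table 1 and filters them by mark. The helper names `parse_rows` and `with_mark` are hypothetical and not part of any library; only a ✓ is treated as "available", and any other mark is kept verbatim without interpretation.

```python
# A few rows copied from Table 1: source name, API mark, data dump mark.
ROWS = """\
Crossref | ✓ | ✓
OpenAlex | ✓ | ✓
Patent Citation to Science | ✗ | ✓
Publications of Nobel laureates | ✗ | ✓
Google Scholar | ✗ | ✗
"""


def parse_rows(text):
    """Split each 'name | api | dump' line into a (name, api, dump) tuple."""
    rows = []
    for line in text.strip().splitlines():
        name, api, dump = [cell.strip() for cell in line.split("|")]
        rows.append((name, api, dump))
    return rows


def with_mark(rows, column, mark="✓"):
    """Return the source names whose 'api' or 'dump' column equals mark."""
    index = {"api": 1, "dump": 2}[column]  # hypothetical column keys
    return [row[0] for row in rows if row[index] == mark]


rows = parse_rows(ROWS)
api_sources = with_mark(rows, "api")    # sources marked ✓ under API
dump_sources = with_mark(rows, "dump")  # sources marked ✓ under Data dump
```

With the sample rows above, `api_sources` contains Crossref and OpenAlex, while `dump_sources` additionally contains the two patent and Nobel laureate datasets. The same filter could be applied to the full table if all rows were pasted into `ROWS`.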